The problem is the result of clustering

jabra · May 2018

Hello, dear engineers,
I have a data set with five columns, and I want to cluster on the third column, which contains text.
I used the Select Attributes operator to choose the third column for clustering.
In the output, I want all of the original columns plus the cluster column.
What should I do?
Thank you very much for any help.

lionelderkrikor · May 2018

Hi @jabra,

Can you share your dataset and your process, please?

Otherwise, can you give an example of what you want to obtain? I have difficulty understanding what you want to do.

Regards,

Lionel

jabra · May 2018

Hello,
Thanks for your response.
I don't have access to the data or my RapidMiner file right now, so I can't send them.
But here is the situation:
I have five columns named id, name, label, Address and Description.
I want to cluster on the Description column, which I select by its column name.
After the clustering, the output should contain all of the columns plus the cluster column, that is:
id, name, label, Address, Description and cluster.
That way I can tell which label each sentence in a cluster has.
Thank you very much for any help.

lionelderkrikor · May 2018

Hi @jabra

I suggest this process (to be adapted and completed with your own data):

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="label"/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
  </operator>
</process>

Does this process meet your needs?

Regards,

Lionel

lionelderkrikor · May 2018

Hi again @jabra,

Here is a new version of the previous process (maybe better suited to your needs):

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="179" y="34">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID" width="90" x="514" y="238"/>
      <operator activated="true" class="concurrency:k_means" compatibility="8.2.000" expanded="true" height="82" name="Clustering" width="90" x="447" y="34">
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="581" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="cluster"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="715" y="85"/>
      <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="849" y="85">
        <list key="key_attributes"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="983" y="85">
        <list key="function_descriptions">
          <parameter key="a1" value="concat(str([a1]),&quot;_&quot;,[cluster])"/>
          <parameter key="a2" value="concat(str([a2]),&quot;_&quot;,[cluster])"/>
          <parameter key="a3" value="concat(str([a3]),&quot;_&quot;,[cluster])"/>
          <parameter key="a4" value="concat(str([a4]),&quot;_&quot;,[cluster])"/>
        </list>
      </operator>
      <connect from_op="Retrieve Iris" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Select Attributes" from_port="original" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
      <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Regards,

Lionel

marcin_blachnik · May 2018

Hi,
The easiest and fastest way is to assign a special role to every column except the one you want to cluster. Then you don't need any Select Attributes, Join, etc. This works because RapidMiner uses only the regular attributes for any analysis (including clustering, classification, regression, etc.).

Just add a Set Role operator and type a role name of your own into "target role".
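
As a rough sketch of that suggestion, a Set Role operator for the five columns described earlier in the thread might look like the fragment below. The column names (id, name, label, Address) are assumptions taken from the description, and "meta_name" and "meta_address" are made-up custom role names. Description is left as the only regular attribute, so it is the only column the clustering would use. The operator position values are arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <!-- Sketch only: column names and the custom roles meta_name / meta_address are assumptions. -->
  <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
    <parameter key="attribute_name" value="name"/>
    <parameter key="target_role" value="meta_name"/>
    <list key="set_additional_roles">
      <parameter key="id" value="id"/>
      <parameter key="label" value="label"/>
      <parameter key="Address" value="meta_address"/>
    </list>
  </operator>
</process>
```

Each special role has to be unique within the example set, which is why the two custom roles have different names. The special columns should then pass through the clustering operator unchanged and appear next to the cluster column in the clustered set.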

elena2020chao · May 2018

Hello, dear friends,
I use the Process Documents from Data operator. In the output, I want columns for the tokenized words in addition to the main columns, the labels and the cluster.
How should I change the process?
Thank you for helping me too.

jabra · May 2018

Hello,
Thank you very much for the process you sent.
One more question, dear engineer:
What if I want to see the results of Tokenize in the output, as in our friend's question (@elena2020chao)?

And how do I evaluate the outcome?
See the error.

Thanks again if you can send a process.

jabra · May 2018

Hello,
Has anyone done this before? Can anyone help me? I really need this...
Thank you so much for any help.

Thomas_Ott · May 2018

@jabra You have nominal values in your data set that the Performance operator can't use.

You need to convert everything to numerical values.
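
One possible way to do that conversion, shown only as a sketch, is a Nominal to Numerical operator placed before the clustering. The filter type and coding type below are illustrative choices, not settings taken from the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <!-- Sketch only: converts all regular nominal attributes using dummy coding. -->
  <operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="coding_type" value="dummy coding"/>
  </operator>
</process>
```

Dummy coding creates one numerical column per distinct nominal value. For a free-text column that usually means one column per row, so a word-vector step such as Process Documents from Data is likely a better fit for text.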

jabra · May 2018

Hello,
Thank you.
But I am clustering on a text field.
What should I do?
Thanks.
