Generate new attributes depending on ID

olgakulesza2 · May 2018

Hello everyone,

An example of my data is shown below. It is a set of many books. The first column is the ID of a book, and each book has 100 tag_name values. What I would like to obtain is this table:

That is, one row per book that contains all the tag_names for that book.

Could you please help me?

lionelderkrikor · May 2018

Hi again @olgakulesza2,

1. First, here is the new version of the process, which also renames your "tag" columns:

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34">
        <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tag_Name\Tag_name.xlsx"/>
        <parameter key="imported_cell_range" value="A1:D11"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Id.true.integer.attribute"/>
          <parameter key="1" value="Author.true.polynominal.attribute"/>
          <parameter key="2" value="Title.true.polynominal.attribute"/>
          <parameter key="3" value="Tag name.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
        <list key="aggregation_attributes">
          <parameter key="Tag name" value="concatenation"/>
        </list>
        <parameter key="group_by_attributes" value="Author|Title|Id"/>
      </operator>
      <operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="concat(Tag name)"/>
        <parameter key="split_pattern" value="[|]"/>
      </operator>
      <operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">
        <parameter key="number_of_iterations" value="10"/>
        <parameter key="reuse_results" value="true"/>
        <process expanded="true">
          <operator activated="true" class="rename_by_generic_names" compatibility="8.2.000" expanded="true" height="82" name="Rename by Generic Names" width="90" x="313" y="85">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value="concat.*"/>
            <parameter key="generic_name_stem" value="tag"/>
          </operator>
          <connect from_port="input 1" to_op="Rename by Generic Names" to_port="example set input"/>
          <connect from_op="Rename by Generic Names" from_port="example set output" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Split" to_port="example set input"/>
      <connect from_op="Split" from_port="example set output" to_op="Loop" to_port="input 1"/>
      <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
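
The Loop operator in this process runs Rename by Generic Names ten times, but after the first pass no attribute name should still match concat.*, so the later iterations appear to have nothing left to rename. A minimal sketch, assuming that reading is right, replaces the Loop operator and its two connections with the fragment below. It reuses only the operator class, parameters, and port names already present in the process; the x/y position is arbitrary.

```xml
<!-- Sketch (assumption): Rename by Generic Names placed directly after Split, without the Loop wrapper -->
<operator activated="true" class="rename_by_generic_names" compatibility="8.2.000" expanded="true" height="82" name="Rename by Generic Names" width="90" x="581" y="34">
  <!-- Select every attribute created by Split, whose names start with "concat" -->
  <parameter key="attribute_filter_type" value="regular_expression"/>
  <parameter key="regular_expression" value="concat.*"/>
  <!-- Rename the selected attributes using the stem "tag" plus a number -->
  <parameter key="generic_name_stem" value="tag"/>
</operator>
<connect from_op="Split" from_port="example set output" to_op="Rename by Generic Names" to_port="example set input"/>
<connect from_op="Rename by Generic Names" from_port="example set output" to_port="result 1"/>
```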

2. "Thanks @lionelderkrikor, but in that case the letters are split across the columns"

I am surprised, because I have no problem on my side:

Can you post a screenshot of what you get?
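
One possible cause worth checking, offered as an assumption because the thread never shows the failing process: the Split operator reads split_pattern as a regular expression. A bare | is the alternation operator there, so it matches the empty string and would cut the text between every pair of characters. The fragment below repeats the Split operator from the process above, with comments on the pattern.

```xml
<!-- Split operator as used in the process above; split_pattern is a regular expression -->
<operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
  <parameter key="attribute_filter_type" value="single"/>
  <parameter key="attribute" value="concat(Tag name)"/>
  <!-- "[|]" is a character class that matches one literal pipe; "\|" should behave the same way -->
  <!-- Assumption: value="|" alone would match the empty string and yield one letter per column -->
  <parameter key="split_pattern" value="[|]"/>
</operator>
```

If the pattern in your own process is a bare |, changing it to [|] may be enough to get whole tag names in each column.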

Regards,

Lionel

lionelderkrikor · May 2018

Hi @olgakulesza2,

Does this process meet your need (you will have to adapt it to your own dataset)?

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34">
        <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tag_Name\Tag_name.xlsx"/>
        <parameter key="imported_cell_range" value="A1:D11"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Id.true.integer.attribute"/>
          <parameter key="1" value="Author.true.polynominal.attribute"/>
          <parameter key="2" value="Title.true.polynominal.attribute"/>
          <parameter key="3" value="Tag name.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
        <list key="aggregation_attributes">
          <parameter key="Tag name" value="concatenation"/>
        </list>
        <parameter key="group_by_attributes" value="Author|Title|Id"/>
      </operator>
      <operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="concat(Tag name)"/>
        <parameter key="split_pattern" value="[|]"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Split" to_port="example set input"/>
      <connect from_op="Split" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Regards,

Lionel

Telcontar120 · May 2018

The Pivot operator will do this easily for you: group by book ID and index by tag.

olgakulesza2 · May 2018

Thanks @Telcontar120, but then I will have just the tag names as column names and some numbers as values. I want the tag_names to be the values; a column name could be, for example, tag1.

olgakulesza2 · May 2018

Thanks @lionelderkrikor, but in that case the letters are split across the columns.

Telcontar120 · May 2018

But then you can add a "Loop Attributes" operator and, for all your tags, just replace the attribute value with the macro for the attribute name, I think :-)

olgakulesza2 · May 2018

@Telcontar120 I don't think I get it. Could you please explain it in detail? I'm completely new to RapidMiner and I don't know the things you are talking about.

olgakulesza2 · May 2018

Now it works great, thank you @lionelderkrikor!

Altair RapidMiner Community
