The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here

How to split column and pivot values

User139033User139033 Member Posts: 2 Learner III
Hi guys!

I have a problem that I solved with R scripting, but I would like to do it in Rapidminer.

My data looks like:

id    aspects
1     bed: positive, room: positive
2     room: positive, wifi: negative

I would like to transform into this:

id    bed         room     wifi
1     positive  positive
2                   positive  negative
  
Thanks for your help.

Best Answer

Answers

  • rfuentealbarfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Hello @User130933,

    Here is your solution:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.3.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="ID;Aspects&#10;1;bed:positive,room:positive&#10;2;room:positive,wifi:negative"/>
            <parameter key="column_separator" value=";"/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="split" compatibility="9.3.000" expanded="true" height="82" name="Split Aspects" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Aspects"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="split_pattern" value=","/>
            <parameter key="split_mode" value="ordered_split"/>
            <description align="center" color="transparent" colored="false" width="126">Split each aspect first.</description>
          </operator>
          <operator activated="true" class="de_pivot" compatibility="9.3.000" expanded="true" height="82" name="De-Pivot" width="90" x="313" y="34">
            <list key="attribute_name">
              <parameter key="Evaluation" value="Aspects_(.*)"/>
            </list>
            <parameter key="index_attribute" value="Index"/>
            <parameter key="create_nominal_index" value="true"/>
            <parameter key="keep_missings" value="false"/>
            <description align="center" color="transparent" colored="false" width="126">Reorganize to have each aspect in a row.</description>
          </operator>
          <operator activated="true" class="split" compatibility="9.3.000" expanded="true" height="82" name="Split Evaluation" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Evaluation"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="split_pattern" value=":"/>
            <parameter key="split_mode" value="ordered_split"/>
            <description align="center" color="transparent" colored="false" width="126">Split each aspect into name and value.</description>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.3.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Index"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
            <description align="center" color="transparent" colored="false" width="126">Remove the indexed values generated.</description>
          </operator>
          <operator activated="true" class="blending:pivot" compatibility="9.3.000" expanded="true" height="82" name="Pivot" width="90" x="715" y="34">
            <parameter key="group_by_attributes" value="ID"/>
            <parameter key="column_grouping_attribute" value="Evaluation_1"/>
            <list key="aggregation_attributes">
              <parameter key="Evaluation_2" value="first"/>
            </list>
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="default_aggregation_function" value="first"/>
            <description align="center" color="transparent" colored="false" width="126">Recompose into having the type of information you require.</description>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="9.3.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="849" y="34">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="regular_expression" value="first\(Evaluation_2\)_(.*)"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="first\(Evaluation_2\)_(.*)"/>
            <parameter key="replace_by" value="$1"/>
            <description align="center" color="transparent" colored="false" width="126">The pivot will generate strange names. Now you can rename it by using a simple regular expression.</description>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Split Aspects" to_port="example set input"/>
          <connect from_op="Split Aspects" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
          <connect from_op="De-Pivot" from_port="example set output" to_op="Split Evaluation" to_port="example set input"/>
          <connect from_op="Split Evaluation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Pivot" to_port="input"/>
          <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    
    You can use successive splits and de-pivots/pivots, depending on what you want to do. At the end, a Replace by Renaming will fix the names generated by the Pivot. Thanks to @yyhuang for her infinite patience teaching me how to do proper replacing. (She doesn't even know she did, but yes, she showed me some magic tricks at some point in March this year).

    All the best,

    Rodrigo.

  • User139033User139033 Member Posts: 2 Learner III
    It works superbly. 

    Thank you very much for your help.

    U.
Sign In or Register to comment.