The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
De-pivot?
Imagine that I have an example set (after using clustering) that looks like this:
id,cluster,product1,product2
1,clust0,10.5,20.3
2,clust1,15.75,30.12
3,clust0,5.25,27.2
4,clust1,12.75,8.25
5,clust0,18.05,22.3
6,clust1,1.25,3.75
And I need it in the following format:
id,cluster,value,type
1,clust0,10.5,product1
2,clust1,15.75,product1
3,clust0,5.25,product1
4,clust1,12.75,product1
5,clust0,18.05,product1
6,clust1,1.25,product1
1,clust0,20.3,product2
2,clust1,30.12,product2
3,clust0,27.2,product2
4,clust1,8.25,product2
5,clust0,22.3,product2
6,clust1,3.75,product2
How could I do it? If I didn't have id and cluster I could easily use de-pivot. But I don't know how to do it in the presence of the other two attributes.
Tagged:
0
Best Answer
-
yyhuang Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
Hi @earmijo,
You do not have to get rid of id or cluster columns to have your table de-pivoted,
check out this process:
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
<parameter key="text" value="id,cluster,product1,product2 1,clust0,10.5,20.3 2,clust1,15.75,30.12 3,clust0,5.25,27.2 4,clust1,12.75,8.25 5,clust0,18.05,22.3 6,clust1,1.25,3.75"/>
</operator>
<operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
<operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
<parameter key="column_separators" value=","/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="136">
<list key="attribute_name">
<parameter key="value" value="product.*"/>
</list>
<parameter key="index_attribute" value="type"/>
<parameter key="create_nominal_index" value="true"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
<connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_op="De-Pivot" to_port="example set input"/>
<connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>Happy RapidMining!
YY
1
Answers
For fun, I made a process that does what you want but doesn't use de-pivot. It uses the Loop Attributes operator to iterate over each regular attribute to be selected which allows an example set to be created containing the single regular attribute and all other special ones. Some renaming and generation is then done so that the output from the loop operation is a collection that can be appended together to yield the answer.
Andrew
Thank you very much Andrew for taking the time to play with my problem. It works perfectly.
Thank you very much Andrew. It works perfectly.
Thank you very much yyhuang. It works perfectly in this example.
How would I generalize it when the columns instead of being named "product1, product2, product 3..." were named "supermarket,gasoline,hotels, ...."? In the first case I can use (as you did) "product.*". I tried ".*" in my case and it doesn't work.
Very good question @earmijo ! Actually you still can use de-pivot for the columns that are not easily selecte by regex. You just need extra step for table join.
check out this: