Newbie - Append Examples to generate a new row
Hi forum,
I'm very new to RapidMiner, but I have quite a bit of experience with Python.
The problem that I'm facing is that I have a file with data in this form:
Column Value
ContextDataValuesAgeValue 55to64
ContextDataValuesGenderValue Female
ProductId cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating 5.0
RatingRange 5.0
SubmissionTime 2016-09-14T14:39:14.000+00:00
UserLocation Southport, United Kingdom
ContextDataValuesAgeValue 45to54
ContextDataValuesGenderValue Female
ProductId cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating 5.0
RatingRange 5.0
SubmissionTime 2017-11-10T09:31:42.000+00:00
UserLocation London
What I need to do is create a new file in which each group of "columns" becomes a single row, with every column and its corresponding value in that row. The example above contains 2 such groups, so it should produce 2 new rows.
I have tried the Pivot operator, but because the column labels (text) are the same (repeated in different rows), it throws a "Column name already exists" error. I also tried the Loop operator, but I don't know how to tell it "process the first 7 rows, pivot them, generate a new Example (row), and continue grouping the rest of the file". I know this is probably pretty simple, but I really can't find a way to do it.
I really appreciate all the help with this.
Thanks in advance!
Best Answer
Telcontar120
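If you have the same number of attributes every time, you can do this using Pivot, but you need to create a new index variable first. You can do that by first generating a numeric ID with Generate ID and then using Generate Attributes with modulus arithmetic to give each row its position within the record (for your 7 attributes per record, mod(id,7) gives a number from 0 to 6), plus floor((id-1)/7) to give each row its record number so the rows can be grouped. You should then be able to use the first as the index attribute and the second as the group attribute in Pivot. Something like the attached process, which uses 3 in both expressions because its sample data has 3 attributes per record; change that to 7 for your data.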
<process version="8.0.001">
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
<parameter key="csv_file" value="C:\Users\brian\Downloads\sample.txt"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="false" breakpoints="after" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="136">
<parameter key="text" value="Time Location Incident Oct 25th Tampa Robbery Oct 25th Miami Theft Oct 26th Brandon Assault"/>
<description align="center" color="transparent" colored="false" width="126">This contains the type of data which this works on, with each attribute contained in a separate row but cycling through the same attributes in order.</description>
</operator>
<operator activated="true" class="generate_id" compatibility="8.0.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
<operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
<list key="function_descriptions">
<parameter key="index" value="mod(id,3)"/>
<parameter key="example" value="floor((id-1)/3)"/>
</list>
</operator>
<operator activated="true" class="pivot" compatibility="8.0.001" expanded="true" height="82" name="Pivot" width="90" x="447" y="34">
<parameter key="group_attribute" value="example"/>
<parameter key="index_attribute" value="index"/>
</operator>
<operator activated="true" class="rename_by_example_values" compatibility="8.0.001" expanded="true" height="82" name="Rename by Example Values" width="90" x="581" y="34">
<description align="center" color="transparent" colored="false" width="126">Used if the names of the attributes are in the first set of examples.</description>
</operator>
<operator activated="true" class="rename" compatibility="8.0.001" expanded="true" height="82" name="Rename" width="90" x="715" y="34">
<parameter key="old_name" value="0.0"/>
<parameter key="new_name" value="Example"/>
<list key="rename_additional_attributes"/>
<description align="center" color="transparent" colored="false" width="126">Used to rename attributes manually if needed.</description>
</operator>
<operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="849" y="34">
<parameter key="attribute_name" value="Example"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Pivot" to_port="example set input"/>
<connect from_op="Pivot" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
<connect from_op="Rename by Example Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
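Since the original poster mentioned a Python background, here is a rough sketch of the same idea in plain Python, which may make the arithmetic easier to follow. It is not what RapidMiner does internally; the function name `pivot_by_generated_id` and the assumption that each line is a name and a value separated by whitespace are mine. The index is computed as `(id - 1) % n` rather than `mod(id, n)` so that the slots run in file order.

```python
from collections import defaultdict

# Sample lines copied from the question (header line left out).
SAMPLE = """\
ContextDataValuesAgeValue 55to64
ContextDataValuesGenderValue Female
ProductId cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating 5.0
RatingRange 5.0
SubmissionTime 2016-09-14T14:39:14.000+00:00
UserLocation Southport, United Kingdom
ContextDataValuesAgeValue 45to54
ContextDataValuesGenderValue Female
ProductId cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating 5.0
RatingRange 5.0
SubmissionTime 2017-11-10T09:31:42.000+00:00
UserLocation London
"""


def pivot_by_generated_id(lines, attrs_per_record):
    """Turn repeating 'name value' lines into one dict per record."""
    records = defaultdict(dict)
    # Number the rows from 1, as the Generate ID step in the process does.
    for row_id, line in enumerate(lines, start=1):
        # Assumption: the name never contains whitespace, the value may.
        name, value = line.split(maxsplit=1)
        index = (row_id - 1) % attrs_per_record     # position inside the record
        example = (row_id - 1) // attrs_per_record  # same as floor((id-1)/n)
        records[example][index] = (name, value)
    # "Pivot": one output row per example, one column per index slot.
    return [
        {name: value for name, value in (slots[i] for i in sorted(slots))}
        for example, slots in sorted(records.items())
    ]


rows = pivot_by_generated_id(SAMPLE.splitlines(), attrs_per_record=7)
for row in rows:
    print(row["SubmissionTime"], row["UserLocation"])
```

The only value that changes between datasets is `attrs_per_record`, which plays the role of the 3 in the process above.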
Answers
Hello @homero_merino,
I think you just want to use the Transpose operator, maybe followed by Guess Types?
What function would you use in pandas?
Best,
Martin
Dortmund, Germany
Hi Martin,
Thanks for your reply. The answer is yes and no.
The problem with the Transpose operator is that it raises a "Duplicate attribute name" error when the same label is repeated, as in the example above.
I want to group the attributes (it's always the same number of attributes: 7) into one single row per group, as Transpose would, in a simple way.
Thanks again, kind regards!
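To illustrate the difference described in this reply, here is a small Python analogy (not RapidMiner code). The helper names `transpose_naive` and `transpose_in_chunks` are made up for this sketch, and the data is cut down to two of the seven attributes for brevity. Transposing everything at once hits a repeated name, while transposing one fixed-size group at a time does not.

```python
def transpose_naive(pairs):
    """Build a single row keyed by attribute name; refuse repeated names."""
    row = {}
    for name, value in pairs:
        if name in row:
            # Analogous to the duplicate-name error mentioned in the thread.
            raise ValueError(f"Duplicate attribute name: {name}")
        row[name] = value
    return row


def transpose_in_chunks(pairs, size):
    """Transpose each consecutive group of `size` pairs into its own row."""
    return [
        transpose_naive(pairs[start:start + size])
        for start in range(0, len(pairs), size)
    ]


# Two attributes per record here; the real file in the question has 7.
pairs = [
    ("Rating", "5.0"),
    ("UserLocation", "Southport, United Kingdom"),
    ("Rating", "5.0"),
    ("UserLocation", "London"),
]

try:
    transpose_naive(pairs)
except ValueError as err:
    print("whole file at once:", err)

print(transpose_in_chunks(pairs, size=2))
```

This only works when every record has exactly the same number of lines, which is the situation described in this thread.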
Thank you for your reply, Brian. Your solution is correct.
This kind of problem just needs a common "id" for grouping the rows, and with the Pivot operator you just need to select that ID attribute.
Thanks a lot, kind regards!