Generate new attributes depending on ID
Hello everyone,
An example of my data is presented below. It is a set of many books. The first column is the ID of a book, and there are 100 tag_name values for each book. What I would like to obtain is a table like this:
book_id | rating | author | title | userid | tag_name1 | tag_name2 | ... | tag_name100 |
So each row should contain all the tag_names for one book.
Could you please help me?
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi again @olgakulesza2,
1. First, here is the new version of the process, which renames your "tag" columns:
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34">
<parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tag_Name\Tag_name.xlsx"/>
<parameter key="imported_cell_range" value="A1:D11"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Id.true.integer.attribute"/>
<parameter key="1" value="Author.true.polynominal.attribute"/>
<parameter key="2" value="Title.true.polynominal.attribute"/>
<parameter key="3" value="Tag name.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
<list key="aggregation_attributes">
<parameter key="Tag name" value="concatenation"/>
</list>
<parameter key="group_by_attributes" value="Author|Title|Id"/>
</operator>
<operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="concat(Tag name)"/>
<parameter key="split_pattern" value="[|]"/>
</operator>
<operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">
<parameter key="number_of_iterations" value="10"/>
<parameter key="reuse_results" value="true"/>
<process expanded="true">
<operator activated="true" class="rename_by_generic_names" compatibility="8.2.000" expanded="true" height="82" name="Rename by Generic Names" width="90" x="313" y="85">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="concat.*"/>
<parameter key="generic_name_stem" value="tag"/>
</operator>
<connect from_port="input 1" to_op="Rename by Generic Names" to_port="example set input"/>
<connect from_op="Rename by Generic Names" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Split" to_port="example set input"/>
<connect from_op="Split" from_port="example set output" to_op="Loop" to_port="input 1"/>
<connect from_op="Loop" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
2. "Thanks @lionelderkrikor, but in that case I have split letters in each column"
I am surprised, because I have no problem on my side.
Can you post a screenshot of what you get?
Regards,
Lionel
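For readers following along without RapidMiner open, the logic of the process above (Aggregate with concatenation grouped by Author, Title and Id, then Split on the | character, then generic renaming with the stem "tag") can be sketched in plain Python. This is only an illustration under assumptions: the function name tags_to_columns, the sample rows and the exact column names tag1, tag2, ... are hypothetical and are not taken from the actual process output.

```python
def tags_to_columns(rows, sep="|", stem="tag"):
    """Turn (book_id, author, title, tag_name) rows into one row per book.

    Hypothetical helper that mimics Aggregate -> Split -> rename.
    """
    # "Aggregate": collect the tags of each (Id, Author, Title) group.
    grouped = {}
    for book_id, author, title, tag in rows:
        grouped.setdefault((book_id, author, title), []).append(tag)

    # Every output row gets as many tag columns as the longest group.
    width = max((len(tags) for tags in grouped.values()), default=0)

    table = []
    for (book_id, author, title), tags in grouped.items():
        concatenated = sep.join(tags)       # e.g. "fantasy|magic"
        parts = concatenated.split(sep)     # literal split on the separator
        row = {"Id": book_id, "Author": author, "Title": title}
        for i in range(width):
            # Books with fewer tags get a missing value (None).
            row[f"{stem}{i + 1}"] = parts[i] if i < len(parts) else None
        table.append(row)
    return table


# Made-up sample data, one row per (book, tag) pair.
sample_rows = [
    (1, "Author A", "Title 1", "fantasy"),
    (1, "Author A", "Title 1", "magic"),
    (2, "Author B", "Title 2", "crime"),
]
wide = tags_to_columns(sample_rows)
```

Each dictionary in wide then holds one book with its tags spread over the columns tag1, tag2, and so on, which is the shape asked for in the question.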
Answers
Hi @olgakulesza2,
Does this process meet your needs (you will have to adapt it to your own dataset)?
Regards,
Lionel
The Pivot operator will do this easily for you: group by book id and index by tag.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks @Telcontar120, but then I will just have the tag names as column names and some numbers as values. I want the tag names to be the values; the column names could be, for example, tag1.
Thanks @lionelderkrikor, but in that case I have split letters in each column.
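One possible explanation for getting single letters in each column, which the thread itself does not confirm, is a split pattern that contains an unescaped | character. In regular expressions the pipe means "or", so a bare | matches the empty string between every pair of characters, whereas [|] matches a literal pipe. The small Python sketch below (Python 3.7 or later; the sample string is made up) shows the difference. RapidMiner uses Java regular expressions, so the exact pieces may differ slightly, but the effect should be similar.

```python
import re

# Hypothetical value as produced by a concatenation aggregation.
concatenated = "fantasy|magic"

# Unescaped pipe: "empty OR empty", so the string is cut at every position.
pieces_unescaped = re.split("|", concatenated)

# Pipe inside a character class: treated as a literal separator.
pieces_literal = re.split("[|]", concatenated)

print(pieces_unescaped)
print(pieces_literal)
```

If the split pattern in your own process is a bare |, writing it as [|] or \| may be worth trying.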
But then you can add a "Loop Attributes" operator and, for all your tags, just replace the attribute value with the macro that holds the attribute name, I think :-)
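To make this suggestion more concrete, here is a rough Python sketch of the idea: first pivot so that every distinct tag becomes a column, then walk over the columns and replace each non-missing value with the name of its column. The function names, the sample pairs and the assumption that the pivot stores counts are all hypothetical; the real Pivot and Loop Attributes operators may name and fill the columns differently.

```python
def pivot_by_tag(pairs):
    """Group by book id, index by tag: one column per distinct tag.

    Cells hold a count, or None where the book does not have that tag.
    """
    all_tags = sorted({tag for _, tag in pairs})
    table = {}
    for book_id, tag in pairs:
        row = table.setdefault(book_id, {t: None for t in all_tags})
        row[tag] = (row[tag] or 0) + 1
    return table


def fill_with_column_name(table):
    """Loop over the columns and put the column name where a value exists."""
    return {
        book_id: {col: (col if val is not None else None)
                  for col, val in row.items()}
        for book_id, row in table.items()
    }


# Made-up (book_id, tag_name) pairs.
sample_pairs = [(1, "fantasy"), (1, "magic"), (2, "crime")]
pivoted = pivot_by_tag(sample_pairs)
named = fill_with_column_name(pivoted)
```

Note that in this sketch there is still one column per distinct tag, with missing values wherever a book lacks that tag, so it does not by itself produce compact tag1, tag2, ... columns.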
@Telcontar120 I think I don't get it. Could you please explain it in more detail? I'm completely new to RapidMiner and I don't know the things you are talking about.
Now it works great, thank you @lionelderkrikor!