Merging duplicate IDs and their examples together
Hello,
I am pretty new to RapidMiner and can't find a solution to the problem I have.
My example set contains customer IDs, booking years and the booked hotel. Because some customers go on holiday twice or more within a year, duplicate IDs occur. Is it possible with RapidMiner to merge the booked hotels belonging to the same ID into one example? To make it clearer, here is an example of what I mean:
Raw Data Year 20XX:
ID BookedHotel
12 Laplaza
13 Greengarden
12 Ocean
15 Laplaza
The customer with ID 12 goes on holiday twice this year: once to the Laplaza hotel and once to the Ocean hotel. What I want to achieve should look like this:
ID BookedHotel
12 Laplaza; Ocean
13 Greengarden
15 Laplaza
So if a customer books twice or more in a year, the hotels are separated by a semicolon in the same "cell". I already tried to achieve this with Pivot and with Generate Concatenation, but without success.
Thank you for your help and sorry for any mistakes (I'm German).
Best Answers
MartinLiebig
Hi eldenoso,
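The Aggregate operator does the job: choose the "concatenation" aggregation function for the hotel attribute, which gives you concat(BookedHotel), and group by the ID attribute. The default delimiter is |, but you can of course change it afterwards with a Replace operator (Replace takes a regular expression, so escape the pipe as \|).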
~Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
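For readers who want to see the logic of the answer above outside of RapidMiner, here is a minimal Python sketch of the same idea: group by ID, concatenate the hotels with a | delimiter, then replace the delimiter. This is not RapidMiner code, and the function name `concat_by_id` is made up for illustration. The data is the raw table from the question.

```python
from collections import defaultdict

# Rows from the question: (customer ID, booked hotel).
bookings = [
    (12, "Laplaza"),
    (13, "Greengarden"),
    (12, "Ocean"),
    (15, "Laplaza"),
]

def concat_by_id(rows, delimiter="|"):
    """Group rows by ID and join each group's hotels with the delimiter.

    Hypothetical helper, only meant to mirror a "concatenation"
    aggregation grouped by ID.
    """
    groups = defaultdict(list)
    for customer_id, hotel in rows:
        groups[customer_id].append(hotel)
    # Sort by ID so the output order is deterministic.
    return {cid: delimiter.join(hotels) for cid, hotels in sorted(groups.items())}

aggregated = concat_by_id(bookings)

# Swap the default "|" for "; ", as a separate replace step would.
result = {cid: hotels.replace("|", "; ") for cid, hotels in aggregated.items()}

for cid, hotels in result.items():
    print(cid, hotels)
```

With the four rows from the question, customer 12 ends up with "Laplaza; Ocean" in a single entry, while customers 13 and 15 keep their single hotel.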
MartinLiebig
Hi Ingo,
you can also use this concat for some fancy tricks. Since | is the "or" operator in regex, you can extract the concat(att) value into a macro and use it in Select Attributes to select exactly these attributes, or in Filter Examples with a matches expression.
Kudos to @hhomburg for this trick.
~Martin
Example Process:
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="7.3.001" expanded="true" height="82" name="Subprocess" width="90" x="45" y="238">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
<list key="attribute_values">
<parameter key="Keep" value="&quot;attribute_1&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="45" y="136">
<list key="attribute_values">
<parameter key="Keep" value="&quot;attribute_15&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="append" compatibility="7.3.001" expanded="true" height="103" name="Append" width="90" x="179" y="34"/>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
<connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
<connect from_op="Append" from_port="merged set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Create a table of attributes to keep, can be stored in repo or taken from a DB</description>
</operator>
<operator activated="true" class="aggregate" compatibility="7.3.001" expanded="true" height="82" name="Aggregate" width="90" x="179" y="238">
<list key="aggregation_attributes">
<parameter key="Keep" value="concatenation"/>
</list>
</operator>
<operator activated="true" class="extract_macro" compatibility="7.3.001" expanded="true" height="68" name="Extract Macro" width="90" x="313" y="238">
<parameter key="macro" value="keep"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="concat(Keep)"/>
<parameter key="example_index" value="1"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="85">
<parameter key="repository_entry" value="//Samples/data/Sonar"/>
</operator>
<operator activated="true" class="delay" compatibility="7.3.001" expanded="true" height="103" name="Delay" width="90" x="447" y="85">
<parameter key="delay" value="none"/>
<description align="center" color="transparent" colored="false" width="126">Ensure Execution Order</description>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="85">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="%{keep}"/>
<description align="center" color="transparent" colored="false" width="126">The magic happens here!</description>
</operator>
<connect from_op="Subprocess" from_port="out 1" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Delay" to_port="through 2"/>
<connect from_op="Retrieve Sonar" from_port="output" to_op="Delay" to_port="through 1"/>
<connect from_op="Delay" from_port="through 1" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
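The macro trick can also be illustrated outside of RapidMiner. The following is a minimal Python sketch, not RapidMiner code: it joins a list of names with | and uses the result as a regex alternation to pick matching attribute names. The `columns` list is a made-up stand-in for a real data set, and the sketch assumes the pattern has to match the whole attribute name.

```python
import re

# Names to keep, taken from the two values generated in the example process.
keep = ["attribute_1", "attribute_15"]

# Joining with "|" yields a regex alternation, like the concat(Keep) value
# that the process stores in the "keep" macro.
pattern = "|".join(re.escape(name) for name in keep)

# Hypothetical attribute names of the data set to filter (an assumption).
columns = ["attribute_1", "attribute_2", "attribute_15", "attribute_150", "class"]

# fullmatch requires the whole name to match, so "attribute_150" is skipped
# even though it starts with "attribute_15".
selected = [c for c in columns if re.fullmatch(pattern, c)]

print(pattern)
print(selected)
```

If the names themselves can contain regex metacharacters, they need escaping before being joined, which is what `re.escape` does in this sketch.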
Answers
I must admit that I am very impressed. I did not know about the concat function in the "Aggregate" operator and actually started to build a workflow of at least 10 operators to solve this when I saw your post. This works like a charm and is so much more elegant!
Here is a small example process showing how this works.
Cheers,
Ingo
Nice one indeed!
Thank you all for your help. The aggregation solution actually worked pretty well for my case.