Merging examples
Hello,
My example set is similar to the one generated by this process:
Basically, I have something like this:
<operator name="Root" class="Process" expanded="yes">
    <operator name="OperatorChain" class="OperatorChain" expanded="no">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="random"/>
        </operator>
        <operator name="label is regular" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
        </operator>
        <operator name="BinDiscretization - 50" class="BinDiscretization">
            <parameter key="number_of_bins" value="50"/>
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="label is label" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
            <parameter key="target_role" value="label"/>
        </operator>
        <operator name="Nominal2Numerical" class="Nominal2Numerical">
        </operator>
        <operator name="BinDiscretization - 2" class="BinDiscretization">
            <parameter key="range_name_type" value="short"/>
        </operator>
        <operator name="Nominal2Numerical (2)" class="Nominal2Numerical">
        </operator>
        <operator name="Sorting" class="Sorting">
            <parameter key="attribute_name" value="label"/>
        </operator>
    </operator>
</operator>
The data looks like this:
label    att1  att2  att3  att4  att5
range1   1.0   0.0   0.0   0.0   1.0
range1   0.0   0.0   1.0   1.0   0.0
range10  1.0   1.0   1.0   0.0   0.0
range11  1.0   0.0   0.0   1.0   0.0
range11  1.0   0.0   0.0   1.0   1.0
range11  1.0   0.0   0.0   1.0   1.0
....
I would like to merge all the "rangeX" examples, so that for each attribute, the maximum across all examples with the same ID (the label value) is kept. E.g., I want:
label    att1  att2  att3  att4  att5
range1   1.0   0.0   1.0   1.0   1.0
range10  1.0   1.0   1.0   0.0   0.0
range11  1.0   0.0   0.0   1.0   1.0
....
I hope I'm clear here... Unfortunately, I don't have access to the data format, so I must do this crazy trick. I guess I could always write my own operator to do this, but I'm sure RapidMiner already has all the necessary operators for this!
Thanks for any pointers
- R
Answers
Thank you for this excellent post. Solving a problem described in such a detailed manner is fun. So I will not only point you to the Aggregation operator, but I also have an example process for you:
Greetings,
Sebastian
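For readers who want to see what the requested merge does, here is a minimal Python sketch of the logic: group the examples by label and keep the maximum of each attribute within a group. This is not RapidMiner code and not the example process from the answer above. The function name `merge_by_max` is hypothetical, and the rows are the sample data from the question.

```python
def merge_by_max(rows):
    """Group rows by their first field (the label) and keep,
    for every attribute, the maximum value seen in that group."""
    merged = {}  # label -> list of per-attribute maxima (insertion order is kept)
    for label, *values in rows:
        if label in merged:
            # Element-wise maximum of the stored row and the new row.
            merged[label] = [max(a, b) for a, b in zip(merged[label], values)]
        else:
            merged[label] = list(values)
    return [(label, *values) for label, values in merged.items()]


# Sample rows from the question: label, att1 .. att5
rows = [
    ("range1", 1.0, 0.0, 0.0, 0.0, 1.0),
    ("range1", 0.0, 0.0, 1.0, 1.0, 0.0),
    ("range10", 1.0, 1.0, 1.0, 0.0, 0.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 0.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 1.0),
    ("range11", 1.0, 0.0, 0.0, 1.0, 1.0),
]

for row in merge_by_max(rows):
    print(*row)
```

On the sample rows this yields one row per label, matching the desired table in the question. In RapidMiner terms this corresponds to grouping by the label attribute and applying a maximum aggregation to each of att1 to att5.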
Many years later and I had a similar (if not exactly the same) problem as OP.
Found the solution on this post