How to copy rows (samples) based on numerical value of defined attribute?
CausalityvsCorr
Member Posts: 17 Contributor II
I have a dataset with a few thousand rows and a few dozen attributes. One attribute contains integers between 1 and about 100 and can be treated as a kind of sampling weight. I need to copy each row as many times as the value of that attribute (i.e. between 1 and about 100 times) and build a new dataset from the copies.
I cannot find an operator dedicated to this kind of task, but I am sure it can be done with RapidMiner. But how?
Best Answer
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
@CausalityvsCorr pretty simple to do with loops and macros. Something like this?
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="8.1.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
<parameter key="target_function" value="sum classification"/>
<parameter key="number_of_attributes" value="10"/>
</operator>
<operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
<operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
<parameter key="attribute_name" value="id"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="extract_macro" compatibility="8.1.000" expanded="true" height="68" name="Extract Macro" width="90" x="447" y="34">
<parameter key="macro" value="num"/>
<list key="additional_macros"/>
</operator>
<operator activated="true" class="concurrency:loop" compatibility="8.1.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">
<parameter key="number_of_iterations" value="%{num}"/>
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="id.eq.%{iteration}"/>
</list>
</operator>
<operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store" width="90" x="313" y="34">
<parameter key="repository_entry" value="//Local Repository/data/%{iteration}_data"/>
</operator>
<connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_op="Loop" to_port="input 1"/>
<connect from_op="Loop" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
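For readers who want to check the intended result outside RapidMiner, the target of the question (each row repeated as many times as its weight attribute says) can be sketched in plain Python. This is only an illustration of the desired output, not a translation of the process above; the function name `replicate_rows`, the column name `weight`, and the sample rows are all hypothetical.

```python
from typing import Any


def replicate_rows(rows: list[dict[str, Any]], weight_key: str) -> list[dict[str, Any]]:
    """Return a new list in which each row appears row[weight_key] times."""
    expanded: list[dict[str, Any]] = []
    for row in rows:
        count = int(row[weight_key])
        if count < 0:
            raise ValueError(f"negative weight {count} in row {row!r}")
        # Copy the dict each time so the replicated rows are independent objects.
        expanded.extend(dict(row) for _ in range(count))
    return expanded


# Hypothetical sample data: "weight" plays the role of the integer attribute.
rows = [
    {"id": 1, "att1": 0.5, "weight": 3},
    {"id": 2, "att1": 1.5, "weight": 1},
]
expanded = replicate_rows(rows, "weight")
```

With the sample data, the row with `id` 1 appears three times and the row with `id` 2 once, in the original order.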
Answers
Hi,
depending on why you need the rows copied, you may be able to avoid copying the data by setting the numerical attribute as the weight. Many algorithms support weighted examples; see their operator capabilities.
Greetings,
Sebastian
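To illustrate why weighting can stand in for copying, here is a small plain-Python sketch (not RapidMiner code) comparing a mean over physically replicated values with a weighted mean over the original values. The helper `weighted_mean` and the sample numbers are assumptions made for the example; whether a given learner treats example weights exactly like integer replication depends on the algorithm.

```python
def weighted_mean(values: list[float], weights: list[int]) -> float:
    """Mean of values where each value counts weights[i] times."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical attribute values and integer sampling weights.
values = [2.0, 4.0, 10.0]
weights = [3, 1, 2]

# Physically replicate each value according to its weight.
replicated = [v for v, w in zip(values, weights) for _ in range(w)]

mean_replicated = sum(replicated) / len(replicated)
mean_weighted = weighted_mean(values, weights)
```

Both quantities agree for this data, which is the sense in which a weight attribute can replace row copies for statistics and learners that honor weights.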