
How to copy rows (samples) based on numerical value of defined attribute?

CausalityvsCorr Member Posts: 17 Contributor II
edited July 2019 in Help

I have a dataset with a few thousand rows and tens of attributes, where one attribute contains integers between 1 and roughly 100. It can be treated as a kind of sampling weight. I need to copy each row as many times as the value in that attribute (i.e., anywhere from 1 to roughly 100 times) and create a new dataset from the result.

I cannot find any operator dedicated to this kind of task, but I am sure RapidMiner can do it. How?

Best Answer

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    @CausalityvsCorr, this is pretty simple to do with loops and macros. Something like this?

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
        <process expanded="true">
          <!-- Demo data (10 attributes); in practice, connect your own example set here -->
          <operator activated="true" class="generate_data" compatibility="8.1.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
            <parameter key="target_function" value="sum classification"/>
            <parameter key="number_of_attributes" value="10"/>
          </operator>
          <!-- Number the examples; Filter Examples below relies on these ids matching %{iteration} -->
          <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
          <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
            <parameter key="attribute_name" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <!-- No macro type is set, so the default applies: %{num} holds the number of examples -->
          <operator activated="true" class="extract_macro" compatibility="8.1.000" expanded="true" height="68" name="Extract Macro" width="90" x="447" y="34">
            <parameter key="macro" value="num"/>
            <list key="additional_macros"/>
          </operator>
          <!-- Run the inner process %{num} times; %{iteration} counts 1, 2, ..., %{num} -->
          <operator activated="true" class="concurrency:loop" compatibility="8.1.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">
            <parameter key="number_of_iterations" value="%{num}"/>
            <process expanded="true">
              <!-- Keep only the example whose id equals the current iteration -->
              <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
                <list key="filters_list">
                  <parameter key="filters_entry_key" value="id.eq.%{iteration}"/>
                </list>
              </operator>
              <!-- Save each single-example set to the repository as %{iteration}_data -->
              <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store" width="90" x="313" y="34">
                <parameter key="repository_entry" value="//Local Repository/data/%{iteration}_data"/>
              </operator>
              <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Store" to_port="input"/>
              <connect from_op="Store" from_port="through" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Loop" to_port="input 1"/>
          <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
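As written, the loop in this process visits each example once: Filter Examples isolates the row whose id equals %{iteration}, and Store saves it. To get the replication asked for, each isolated row still has to be repeated as many times as its count before the pieces are combined. A minimal sketch of that nested-loop logic in plain Python (not RapidMiner XML), assuming the rows are held as dicts and the count column is named `weight`; both the data layout and the name `replicate_rows` are hypothetical:

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def replicate_rows(rows: List[Row], count_attr: str) -> List[Row]:
    """Return a new list in which each row appears row[count_attr] times.

    Mirrors an outer loop over the examples and an inner loop that
    repeats the current example; rows with a count of 0 are dropped.
    """
    result: List[Row] = []
    for row in rows:                      # outer loop: one pass per example
        count = row[count_attr]
        if not isinstance(count, int) or count < 0:
            raise ValueError(f"{count_attr} must be a non-negative integer, got {count!r}")
        for _ in range(count):            # inner loop: repeat the current example
            result.append(dict(row))      # shallow copy so duplicates are independent
    return result


if __name__ == "__main__":
    data = [
        {"id": 1, "x": 0.5, "weight": 3},
        {"id": 2, "x": 1.5, "weight": 1},
    ]
    expanded = replicate_rows(data, "weight")
    print(len(expanded))                  # 4
    print([r["id"] for r in expanded])    # [1, 1, 1, 2]
```

The number of output rows always equals the sum of the count column, which is a quick sanity check on whatever process does the real work.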

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi,

    Depending on why you need the rows copied, you may be able to avoid copying the data altogether by giving the numerical attribute the weight role. Many algorithms support weighted examples; check each operator's capabilities to see whether it does.

     

    Greetings,

     Sebastian
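Why a weight attribute can stand in for physical copies: for any quantity that sums over examples, an example with integer weight w contributes exactly what w identical copies would. A small Python check of this for a mean, using illustrative values that are not from the thread:

```python
from typing import List, Tuple


def weighted_mean(pairs: List[Tuple[float, int]]) -> float:
    """Mean where each (value, weight) pair counts `weight` times."""
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


def mean_of_copies(pairs: List[Tuple[float, int]]) -> float:
    """Plain mean after physically copying each value `weight` times."""
    copies = [v for v, w in pairs for _ in range(w)]
    return sum(copies) / len(copies)


if __name__ == "__main__":
    data = [(1.0, 2), (4.0, 1), (7.0, 1)]
    print(weighted_mean(data))   # 3.25
    print(mean_of_copies(data))  # 3.25
```

This equivalence holds for sums and means; whether a particular learner treats weights exactly like copies depends on how it uses them, which is why the operator capabilities are worth checking.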

  • u1111082 Member Posts: 5 Learner I
    edited June 2020
    I'm also trying to turn a row whose count attribute is, say, 10 into 10 identical rows, so that I can then run the FP-Growth operator. I'm not sure whether the solutions above will work in my situation. Any comments are appreciated.
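For frequent itemset mining, repeating a transaction k times has the same effect on support as counting it k times. A minimal Python sketch that computes support directly from (transaction, count) pairs and compares it with support over the expanded data; the item names, counts, and function names are hypothetical. Whether RapidMiner's FP-Growth operator can read such a count attribute directly is not covered in this thread.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]


def support_weighted(data: List[Tuple[Transaction, int]], itemset: Transaction) -> float:
    """Support of `itemset` when each transaction carries an occurrence count."""
    total = sum(count for _, count in data)
    hits = sum(count for items, count in data if itemset <= items)
    return hits / total


def support_expanded(data: List[Tuple[Transaction, int]], itemset: Transaction) -> float:
    """Support of `itemset` after copying each transaction `count` times."""
    expanded = [items for items, count in data for _ in range(count)]
    hits = sum(1 for items in expanded if itemset <= items)
    return hits / len(expanded)


if __name__ == "__main__":
    baskets = [
        (frozenset({"bread", "milk"}), 10),
        (frozenset({"bread"}), 5),
        (frozenset({"milk", "eggs"}), 5),
    ]
    query = frozenset({"milk"})
    print(support_weighted(baskets, query))  # 0.75
    print(support_expanded(baskets, query))  # 0.75
```

Since both give the same support, replicating rows by their count before FP-Growth yields the same frequent itemsets as a count-aware implementation would.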