Filtering rows and columns

Dr_Van_Nostrand · February 2010

Hi,

Maybe this belongs under a different thread but anyway: is there an operator for selectively filtering out (i.e. remove) one or more rows or columns from a table with data? This is possible in Knime, but Knime lacks other features that RM has.

Feedback is much appreciated!

MariusHelf · June 2012

There is Filter Examples for examples (rows), and Select Attributes for attributes (columns).

Image: https://us.v-cdn.net/6030995/uploads/editor/c4/k9cnjiyoh60x.png

Image: https://us.v-cdn.net/6030995/uploads/editor/rf/oeinechn8g0j.png

Image: https://us.v-cdn.net/6030995/uploads/editor/js/1qa4bl3163qs.png

Best, Marius

qwertz · June 2012

Hi Marius,

At first I want to say THANK YOU at this time for your fast and competent help with all my issues the last days!
I'm really trying to find answers in this forum and the documentation/wiki first.

I checked the "filter examples" operator you mentioned. Especially "attribute_value_filter" shows manifold capabilities to remove rows.

Though, I still found no solution to remove all rows but the last one with this operator.

haddock · June 2012

Hi there,

You can count the examples into a macro and used that as your range start and end, like this.


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true" height="349" width="681">
      <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="9" y="6"/>
      <operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
        <parameter key="macro" value="exs"/>
      </operator>
      <operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="331" y="23">
        <parameter key="first_example" value="%{exs}"/>
        <parameter key="last_example" value="%{exs}"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Extract Macro" from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

HTH

qwertz · June 2012

Hi haddock,

Thanks for sharing. Indeed this solution works quite effective.

To be a somewhat more flexible in my process I tried another setup which can provide the last x rows of an example set by doing calculation with the macro's value. Unfortunatelly this doesn't work in the "filter example" operator (e.g. "%{exs}-5" as "first example" value).

Oh dear... I thougt it was so simple - now it drives me nuts

Sincerely
Sachs

haddock · June 2012

Hi there,

It's quite simple, I'll leave it to you, think loop counters.

qwertz · June 2012

Sorry, but I still don't get it. Tried the following process but without success... and still not even a clue how to make it

This process delivers x times the (same) last row and provides a collection instead of a single example set...


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true" height="450" width="647">
      <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
      <operator activated="true" class="loop" compatibility="5.2.003" expanded="true" height="76" name="Loop" width="90" x="179" y="30">
        <parameter key="set_iteration_macro" value="true"/>
        <parameter key="iterations" value="5"/>
        <process expanded="true" height="450" width="647">
          <operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
            <parameter key="macro" value="%{abc}"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="179" y="30">
            <parameter key="first_example" value="%{abc}"/>
            <parameter key="last_example" value="%{abc}"/>
          </operator>
          <connect from_port="input 1" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Loop" to_port="input 1"/>
      <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Bye for now
Sachs

haddock · June 2012

Hi,

I should have been more explicit about the count. The last example we know; if you take five from the number of examples
you'll filter to six examples ( think about the loop count ), so you need to add one back, like this.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
    <process expanded="true" height="450" width="748">
      <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
      <operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
        <parameter key="macro" value="%{last}"/>
      </operator>
      <operator activated="true" class="generate_macro" compatibility="5.2.003" expanded="true" height="76" name="Generate Macro" width="90" x="380" y="30">
        <list key="function_descriptions">
          <parameter key="first" value="%{last}-5+1"/>
        </list>
      </operator>
      <operator activated="true" class="filter_example_range" compatibility="5.2.003" expanded="true" height="76" name="Filter Example Range" width="90" x="648" y="30">
        <parameter key="first_example" value="%{first}"/>
        <parameter key="last_example" value="%{last}"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Extract Macro" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
      <connect from_op="Generate Macro" from_port="through 1" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

The more you use RM, the more those pesky macros show up!

qwertz · June 2012

Wow, that's it! Indeed simple - if you once know how it works ;D

Thank you so much!

haddock · June 2012

Indeedy!

The biggest mistake that I see being made on this Forum is that people underestimate RapidMiner, it is a deep and mature environment with many man years behind it. And these are seriously qualified enthusiastic man years, so the chances are that most likely tasks are covered, as you say if only you know how. It's a big toolkit, and it takes time.

Stick with it!

marasma · September 2014

Hi there,
I've got a similar problem and after two days of work, no solutions...
I need to remove the attributes whose sum (on its column) is < 10. How can I do? I only have numeric values under these attributes...
every feedback is very welcome! :-[

MariusHelf · September 2014

Hi marasma,

I would not necessarily call this a similar problem, but try the process below. It calculates the sum of each column using Aggregate with a default aggregation, then cleans the filenames to match the original name, then selects attribute which some up to at least 0 in this example, and finally the filtered data is used as reference data to filter the original data in Reorder Attributes.

I propose to use breakpoints after each operator to understand what's going on.

Best regards,
Marius

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.0.008">
  <context>
    <input/>
    <output/>
    <macros>
      <macro>
        <key>threshold</key>
        <value>0</value>
      </macro>
    </macros>
  </context>
  <operator activated="true" class="process" compatibility="6.0.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="6.0.008" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30"/>
      <operator activated="true" class="aggregate" compatibility="6.0.008" expanded="true" height="76" name="Aggregate" width="90" x="246" y="120">
        <parameter key="use_default_aggregation" value="true"/>
        <parameter key="default_aggregation_function" value="sum"/>
        <list key="aggregation_attributes"/>
      </operator>
      <operator activated="true" class="rename_by_replacing" compatibility="6.0.008" expanded="true" height="76" name="Rename by Replacing" width="90" x="380" y="30">
        <parameter key="replace_what" value=".*\((.*)\)"/>
        <parameter key="replace_by" value="$1"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="6.0.008" expanded="true" height="76" name="Select Attributes" width="90" x="514" y="30">
        <parameter key="attribute_filter_type" value="numeric_value_filter"/>
        <parameter key="numeric_condition" value="&gt; %{threshold}"/>
      </operator>
      <operator activated="true" class="order_attributes" compatibility="6.0.008" expanded="true" height="76" name="Reorder Attributes" width="90" x="648" y="165">
        <parameter key="sort_mode" value="reference data"/>
        <parameter key="handle_unmatched" value="remove"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="original" to_op="Reorder Attributes" to_port="example set input"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Reorder Attributes" to_port="reference_data"/>
      <connect from_op="Reorder Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Here is a link to this process in the Community Repository.

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Filtering rows and columns

Best Answer

Answers