Create own table statistic

swas · November 2017

Hello,

I'm just getting started with RapidMiner and I'd like to ask a probably stupid question, but I'd like to ask it so I get a better understanding.

I want to achieve something really simple:

I have a database with a table that I'm retrieving. Afterwards I select an attribute and want to check whether its values are missing or not. From this I'd like to create a new result with the number of missing values, the number of non-missing values, and the total number.

This should be a rather simple task in RapidMiner, but sadly I don't know how to achieve it. Or is it something I shouldn't do with RapidMiner?

I'd appreciate some thoughts.

Telcontar120 · November 2017

Yes, this is very easy to do in RapidMiner. First, if you simply want to see this information, you can get it from the "Statistics" view after you have imported your data. That will show summary info for each attribute, including the number of missings, like so:

[Screenshot: stats view.PNG]

But if you want to generate a table with this information, you can do so easily by using "Generate Attributes" to flag the missings and then "Aggregate" to count them for any attribute, like so:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
        <parameter key="repository_entry" value="//Samples/data/Titanic"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="85">
        <list key="function_descriptions">
          <parameter key="Missing_Age" value="missing(Age)"/>
        </list>
      </operator>
      <operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="85">
        <list key="aggregation_attributes">
          <parameter key="Name" value="count"/>
          <parameter key="Missing_Age" value="count (percentage)"/>
        </list>
        <parameter key="group_by_attributes" value="Missing_Age"/>
      </operator>
      <connect from_op="Retrieve Titanic" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
      <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
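
For anyone who wants to sanity-check what the process above computes, here is a minimal sketch of the same two steps in plain Python (this is not RapidMiner code). The sample rows, the use of `None` for a missing value, and the function name `aggregate_missing` are all assumptions made for illustration; the percentage scale RapidMiner reports may differ.

```python
from collections import Counter

# Hypothetical sample rows; None stands in for a missing value.
rows = [
    {"Name": "A", "Age": 29.0},
    {"Name": "B", "Age": None},
    {"Name": "C", "Age": 2.0},
    {"Name": "D", "Age": None},
    {"Name": "E", "Age": 30.0},
]


def aggregate_missing(rows, attribute):
    """Group rows by a missing flag and return count and percentage per group."""
    # Step 1 (like Generate Attributes): derive a boolean missing flag per row.
    flags = [row.get(attribute) is None for row in rows]
    # Step 2 (like Aggregate): count rows for each flag value.
    counts = Counter(flags)
    total = len(rows)
    return {
        flag: {"count": n, "percentage": 100.0 * n / total}
        for flag, n in counts.items()
    }


result = aggregate_missing(rows, "Age")
print(result)
```

The result has one entry per group (missing `True` or `False`), which matches the shape of the Aggregate output: one row per value of `Missing_Age`.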

And you could further use a loop to do this automatically for any number of attributes that you like.
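
The loop idea can be sketched outside RapidMiner in plain Python: repeat the missing count for each attribute and collect one summary row per attribute with the three numbers the OP asked for. The sample data, the `None`-as-missing convention, and the name `missing_summary` are hypothetical and only there to make the sketch runnable.

```python
# Hypothetical sample rows; None stands in for a missing value.
rows = [
    {"Name": "A", "Age": 29.0, "Cabin": None},
    {"Name": "B", "Age": None, "Cabin": None},
    {"Name": "C", "Age": 2.0, "Cabin": "C22"},
    {"Name": "D", "Age": None, "Cabin": None},
]


def missing_summary(rows, attributes):
    """Return one summary row per attribute: missing, non-missing, total."""
    total = len(rows)
    summary = []
    for attribute in attributes:
        # Count the rows where this attribute has no value.
        missing = sum(1 for row in rows if row.get(attribute) is None)
        summary.append(
            {
                "attribute": attribute,
                "missing": missing,
                "non_missing": total - missing,
                "total": total,
            }
        )
    return summary


summary = missing_summary(rows, ["Name", "Age", "Cabin"])
for line in summary:
    print(line)
```

Each entry in `summary` corresponds to one pass of the loop, so the final table has one row per attribute.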

MartinLiebig · November 2017

Hi,

Another way to do this is to use the "Extract Statistics" operator, which is included in the Operator Toolbox extension.

Cheers,

Martin

Telcontar120 · November 2017

That's a great operator, but unfortunately it doesn't give the total number of examples or the number of non-missings either, so it won't get exactly what the OP asked for. But that might be a nice enhancement for a future version of the "Extract Statistics" operator :-)
