
Create own table statistic

swas Member Posts: 2 Learner I
edited November 2018 in Help

Hello,

I'm just getting started with RapidMiner and I'd like to ask what is probably a basic question, so that I get a better understanding.

I want to achieve something really simple:

I have a database table that I'm retrieving. I then select an attribute and want to check whether its values are missing or not. From this I'd like to create a new result containing the number of missing values, the number of non-missing values, and the total number.

This should be a rather simple task in RapidMiner, but sadly I don't know how to achieve it. Or is it something I shouldn't do with RapidMiner?

 

I'd appreciate some thoughts.

Best Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    Yes, this is very easy to do in RapidMiner.  First, if you simply want to see this information, you can get it from the "Statistics" view after you have imported your data.  That will show summary info for each attribute, including the number of missings, like so:

    [Screenshot: stats view.PNG]

    But if you want to generate a table with this information, you can do so easily by using "Generate Attributes" to flag the missings and then "Aggregate" to count them for any attribute, like so:

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.6.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
        <process expanded="true">
          <!-- Load the Titanic sample data -->
          <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Samples/data/Titanic"/>
          </operator>
          <!-- Flag each example: Missing_Age is true where Age is missing -->
          <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="85">
            <list key="function_descriptions">
              <parameter key="Missing_Age" value="missing(Age)"/>
            </list>
          </operator>
          <!-- Group by the flag and count the examples in each group, also as a percentage -->
          <operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="85">
            <list key="aggregation_attributes">
              <parameter key="Name" value="count"/>
              <parameter key="Missing_Age" value="count (percentage)"/>
            </list>
            <parameter key="group_by_attributes" value="Missing_Age"/>
          </operator>
          <connect from_op="Retrieve Titanic" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    And you could further use a loop to do this automatically for any number of attributes that you like.
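
    To make the loop idea concrete, here is a minimal sketch in plain Python rather than RapidMiner, so it only illustrates the logic and is not a process you can import. The function names (`is_missing`, `missing_summary`), the sample rows, and the choice to treat `None` and NaN as missing are all assumptions made for this example.

```python
import math
from typing import Any, Dict, List, Sequence


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_summary(rows: List[Dict[str, Any]],
                    attributes: Sequence[str]) -> List[Dict[str, Any]]:
    """Return one summary row per attribute: missing, non-missing, total."""
    total = len(rows)
    summary = []
    for name in attributes:  # the loop over the attributes of interest
        # row.get(name) returns None when the key is absent, so that counts as missing
        missing = sum(1 for row in rows if is_missing(row.get(name)))
        summary.append({
            "attribute": name,
            "missing": missing,
            "non_missing": total - missing,
            "total": total,
        })
    return summary


# Hypothetical sample data, loosely modelled on the Titanic attributes
rows = [
    {"Name": "A", "Age": 22.0},
    {"Name": "B", "Age": None},
    {"Name": "C", "Age": float("nan")},
    {"Name": "D", "Age": 35.0},
]

for line in missing_summary(rows, ["Age", "Name"]):
    print(line)
```

    Each output row carries the three numbers asked for in the question (missing, non-missing, total), one row per attribute.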

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Solution Accepted

    Hi,

    Another way to do this is to use the "Extract Statistics" operator, which is included in the Operator Toolbox extension.

    Cheers,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    That's a great operator, but unfortunately it doesn't give the total number of examples or the number of non-missings either, so it won't get exactly what the OP asked for. But that might be a nice enhancement for a future version of the "Extract Statistics" operator :-)