Create own table statistic
Hello,
I'm just getting started with RapidMiner and I'd like to ask a probably stupid question, so that I get a better understanding.
I want to achieve something really simple:
I have a database with a table that I'm retrieving. Afterwards I select an attribute and want to check whether its values are missing or not. From this I'd like to create a new result with the number of missing values, the number of non-missing values, and the total number.
This should be a rather simple task in RapidMiner, but sadly I don't know how to achieve it. Or is it something I shouldn't do with RapidMiner?
I'd appreciate some thoughts.
Best Answers
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
Yes, this is very easy to do in RapidMiner. First, if you simply want to see this information, you can get it from the "Statistics" view after you have imported your data. That view shows summary info for each attribute, including the number of missing values.
But if you want to generate a table with this information, you can do so easily by using "Generate Attributes" to flag the missing values and then "Aggregate" to summarize for any attribute, like so:
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Samples/data/Titanic"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="85">
<list key="function_descriptions">
<parameter key="Missing_Age" value="missing(Age)"/>
</list>
</operator>
<operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="85">
<list key="aggregation_attributes">
<parameter key="Name" value="count"/>
<parameter key="Missing_Age" value="count (percentage)"/>
</list>
<parameter key="group_by_attributes" value="Missing_Age"/>
</operator>
<connect from_op="Retrieve Titanic" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
<connect from_op="Aggregate" from_port="original" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
You could further use a loop to do this automatically for any number of attributes that you like.
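The process above groups by Missing_Age, so it returns one row per group (missing / not missing) rather than a single summary row. If you want the missing count, the non-missing count, and the total side by side, as asked in the question, a variant along these lines might work. It is only a sketch that I have not run: the attribute names Is_Missing and Not_Missing are made up for illustration, and it assumes the `if(...)` expression function and the "sum" and "count" aggregation functions behave as I expect in this version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.6.001">
  <!-- Untested sketch. Is_Missing and Not_Missing are hypothetical attribute names. -->
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" name="Retrieve Titanic">
        <parameter key="repository_entry" value="//Samples/data/Titanic"/>
      </operator>
      <!-- Create two 0/1 flags so that summing them yields the two counts. -->
      <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" name="Generate Attributes">
        <list key="function_descriptions">
          <parameter key="Is_Missing" value="if(missing(Age),1,0)"/>
          <parameter key="Not_Missing" value="if(missing(Age),0,1)"/>
        </list>
      </operator>
      <!-- No group_by_attributes here, so the result should be a single row. -->
      <operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" name="Aggregate">
        <list key="aggregation_attributes">
          <parameter key="Is_Missing" value="sum"/>
          <parameter key="Not_Missing" value="sum"/>
          <!-- Is_Missing is never missing itself, so its count is the total row count. -->
          <parameter key="Is_Missing" value="count"/>
        </list>
      </operator>
      <connect from_op="Retrieve Titanic" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
    </process>
  </operator>
</process>
```

The layout attributes (x, y, width, height) are left out on purpose; as far as I know RapidMiner positions the operators itself when you paste the XML into the XML view.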
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
Another way to do this is to use the "Extract Statistics" operator, which is included in the Operator Toolbox extension.
Cheers,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany