"Calculating average with missing values"
Hi everyone,
Having a Dataset like the following one:
row - Att1 - Att2 - Att3
(1) 5 7 5
(2) 6 8 9
(3) 7 6
(4) 7
I want to calculate the average so that result would be the following one:
row - Att1 - Att2 - Att3 - Average
(1) 5 7 5 5.6
(2) 6 8 9 7.6
(3) 7 6 6.5
(4) 7 7
I am using operator Generate Attributes, with the following function:
avg(Att1, Att2, Att3)
However, when any of the rows has a missing value, I simply get a Missing Value as Average.
I would like to know how to solve this problem.
Thanks in advance
Best Answers
-
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
the generate aggregation operator is able to do it.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany0 -
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
HI @iesnaola,
You have to use the Generate Aggregation operator and check the parameter ignore missing
Here the process :
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
<parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Average_missing\Average_missing.xlsx"/>
<parameter key="imported_cell_range" value="A1:C5"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Att1.true.integer.attribute"/>
<parameter key="1" value="Att2.true.integer.attribute"/>
<parameter key="2" value="Att3.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="generate_aggregation" compatibility="8.0.001" expanded="true" height="82" name="Generate Aggregation" width="90" x="447" y="34">
<parameter key="attribute_name" value="average"/>
<parameter key="aggregation_function" value="average"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Generate Aggregation" to_port="example set input"/>
<connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>Regards,
Lionel
2
Answers
Thanks! It worked