Extract Aggregates operator: error in function calculations?
lionelderkrikor
RapidMiner Certified Analyst
Hi RM Staff,
First, I hope everyone is doing well.
Second, I think there is a calculation error in the Extract Aggregates operator (Time Series module) for the following functions:
- median
- first quartile
- third quartile
These three functions seem to return the same value as the "minimum" function...
Here are the results for the "Temperature" attribute of the "Golf" dataset:
These curious results prompted me to test the new "percentile" function of the Aggregate operator, which (from my point of view) gives the following correct results:
The process (use RapidMiner 9.1 beta to run it):
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.1.000-BETA2">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000-BETA2" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.1.000-BETA2" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="85">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="time_series:extract_std_descriptive_features" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Extract Aggregates" width="90" x="380" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Temperature"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="add_time_series_name" value="false"/>
        <parameter key="sum" value="true"/>
        <parameter key="mean" value="true"/>
        <parameter key="geometric_mean" value="true"/>
        <parameter key="first_quartile" value="true"/>
        <parameter key="median" value="true"/>
        <parameter key="third_quartile" value="true"/>
        <parameter key="min" value="true"/>
        <parameter key="max" value="true"/>
        <parameter key="std_deviation" value="true"/>
        <parameter key="kurtosis" value="true"/>
        <parameter key="skewness" value="true"/>
        <parameter key="ignore_invalid_values" value="false"/>
      </operator>
      <operator activated="true" class="aggregate" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Aggregate" width="90" x="581" y="136">
        <parameter key="use_default_aggregation" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="default_aggregation_function" value="average"/>
        <list key="aggregation_attributes">
          <parameter key="Temperature" value="median"/>
          <parameter key="Temperature" value="percentile (25)"/>
          <parameter key="Temperature" value="percentile (75)"/>
          <parameter key="Temperature" value="average"/>
          <parameter key="Temperature" value="minimum"/>
        </list>
        <parameter key="group_by_attributes" value=""/>
        <parameter key="count_all_combinations" value="false"/>
        <parameter key="only_distinct" value="false"/>
        <parameter key="ignore_missings" value="true"/>
      </operator>
      <connect from_op="Retrieve Golf" from_port="output" to_op="Extract Aggregates" to_port="example set"/>
      <connect from_op="Extract Aggregates" from_port="features" to_port="result 1"/>
      <connect from_op="Extract Aggregates" from_port="original" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Lionel
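For a rough idea of what correct quartiles look like next to the minimum, here is a minimal Python sketch. It is not RapidMiner's implementation. It assumes a hypothetical 14-value sample rather than the real Golf "Temperature" column, and it uses linear interpolation between sorted values (`statistics.quantiles` with `method="inclusive"`). RapidMiner's `percentile` aggregation may use a different estimator, so exact values could differ slightly.

```python
import statistics

# Hypothetical 14-value sample (NOT the actual Golf "Temperature" column)
values = [60, 62, 65, 67, 70, 71, 73, 74, 76, 78, 80, 82, 84, 88]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
# method="inclusive" interpolates linearly between order statistics;
# this estimator is an assumption, RapidMiner may use another one.
q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")

print("min   :", min(values))
print("Q1    :", q1)
print("median:", med)
print("Q3    :", q3)
```

With any reasonable estimator, Q1, the median and Q3 are clearly separated from the minimum on data like this, so four (nearly) identical values point to a calculation problem rather than to the data.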
Best Answer
tftemme
Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher
Hi @lionelderkrikor,
Thanks for reporting this. I had already spotted it, and it will be fixed in the 9.1 release (the fix is not included in the beta). In fact, the first quartile, median and third quartile features computed the 0.25th, 0.5th and 0.75th percentiles instead of the 25th, 50th and 75th ;-) So for a small data set such as Golf, the result is basically the minimum.
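The mix-up described above can be illustrated with a generic percentile function. The sketch below is hypothetical and is not the actual Time Series code: `percentile` is an illustrative helper that takes `p` on a 0-100 scale and interpolates linearly between sorted values (an assumed estimator), and the sample is made up. Passing 0.25, 0.5 and 0.75 where 25, 50 and 75 were meant asks for fractions of a percent, which land right next to the minimum.

```python
def percentile(values, p):
    """Hypothetical helper: percentile with p on a 0-100 scale,
    using linear interpolation between sorted values (assumed estimator)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    data = sorted(values)
    pos = (len(data) - 1) * p / 100.0  # fractional rank in the sorted data
    lo = int(pos)                       # index of the lower neighbour
    hi = min(lo + 1, len(data) - 1)     # index of the upper neighbour (clamped)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


# Hypothetical 14-value sample (NOT the actual Golf "Temperature" column)
values = [60, 62, 65, 67, 70, 71, 73, 74, 76, 78, 80, 82, 84, 88]

for label, p_correct in (("Q1", 25), ("median", 50), ("Q3", 75)):
    p_buggy = p_correct / 100  # 0.25 / 0.5 / 0.75 passed where 25 / 50 / 75 was meant
    print(f"{label:6} correct={percentile(values, p_correct):.3f} "
          f"buggy={percentile(values, p_buggy):.3f} min={min(values)}")
```

Under this interpolation rule the fractional rank is (n - 1) * p / 100, so for p = 0.75 it stays below 1 as long as n ≤ 134. All three buggy values then fall between the smallest and second-smallest value, which matches the "basically the min" behaviour on a 14-row data set.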
Best regards,
Fabian