Possible bug with Aggregate (mode) Function?
Hi,
I tried to aggregate a set of values using the mode (aggregate) function. See input data below.
User_ID | Month | Coupon |
12245 | Aug-17 | A123 |
55645 | Aug-17 | B774 |
99987 | Aug-17 | B376 |
9890 | Aug-17 | B456 |
9890 | Aug-17 | B456 |
9890 | Aug-17 | B457 |
9891 | Aug-17 | ? |
9891 | Aug-17 | ? |
When aggregating, RM appears to randomly assign a value (mode) to the missing values when the answer for 9891 should be 0. Please see the XML below. Is this a bug?
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.1.003" expanded="true" height="68" name="Retrieve RM_Test" width="90" x="45" y="85">
<parameter key="repository_entry" value="//Local Repository/RM_Test"/>
</operator>
<operator activated="true" class="aggregate" compatibility="8.1.003" expanded="true" height="82" name="Aggregate" width="90" x="246" y="85">
<parameter key="use_default_aggregation" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default_aggregation_function" value="average"/>
<list key="aggregation_attributes">
<parameter key="Coupon" value="mode"/>
</list>
<parameter key="group_by_attributes" value="User_ID|Month"/>
<parameter key="count_all_combinations" value="false"/>
<parameter key="only_distinct" value="false"/>
<parameter key="ignore_missings" value="false"/>
</operator>
<connect from_op="Retrieve RM_Test" from_port="output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Comments
This is the result I get. The answer for 9891 should be 0 and not A123.
Hello @data123,
Indeed, you have discovered some strange behaviour.
Until it is explained, a temporary workaround is to first replace the missing value(s)
with 0 using the Replace Missing Values operator.
Here is the process:
Regards,
Lionel
hmm well I'm not sure 0 would be the expected behavior IMHO but it seems to do what I would expect...
Thanks guys. If we replace the missing values with 0, they are no longer treated as missing, so the aggregate (mode) function counts them like any other value. The result is mathematically correct but not the desired one (e.g. A123, A123, 0, 0, 0 gives 0 instead of A123).
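To make the example above concrete, here is a minimal Python sketch (not RapidMiner code). It assumes the three 0 entries started out as missing values, and `MISSING` and `mode` are hypothetical names used only for illustration. It compares the mode after replacing missing values with 0 against the mode taken over the non-missing values only.

```python
from collections import Counter

MISSING = None  # hypothetical stand-in for RapidMiner's "?" marker


def mode(values):
    """Return the most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


coupons = ["A123", "A123", MISSING, MISSING, MISSING]

# Workaround: replace missing values with "0" first, then take the mode.
replaced = ["0" if v is MISSING else v for v in coupons]
mode_after_replace = mode(replaced)

# Desired in this case: skip missing values and take the mode of the rest.
mode_ignoring_missing = mode([v for v in coupons if v is not MISSING])

print(mode_after_replace)
print(mode_ignoring_missing)
```

Once the missing entries have been replaced, the replacement value competes with the real coupon codes, and it wins whenever it is the most frequent entry in a group.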
Hi all,
we have this on the radar and are working on it. To keep everything consistent, the aggregation function will in future return a missing value if most entries are missing values. So you would still have to use the Replace Missing Values operator afterwards. Will keep you posted.
Cheers
Jan
Quick update: the mode aggregation was always ignoring missing values, regardless of whether the corresponding parameter was set. This will be fixed in the next patch release (8.2.1).
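The behaviour described in this thread can be sketched in Python under some assumptions. This is not RapidMiner's implementation: `aggregate_mode`, `mode`, and `MISSING` are hypothetical names, and the handling of an all-missing group is an assumption. With `ignore_missings=False` a missing value counts as a candidate, so a group where most entries are missing yields a missing result. With `ignore_missings=True` missing entries are dropped first.

```python
from collections import Counter, defaultdict

MISSING = None  # hypothetical stand-in for RapidMiner's "?" marker

# Input data from the original post: (User_ID, Month, Coupon)
rows = [
    ("12245", "Aug-17", "A123"),
    ("55645", "Aug-17", "B774"),
    ("99987", "Aug-17", "B376"),
    ("9890", "Aug-17", "B456"),
    ("9890", "Aug-17", "B456"),
    ("9890", "Aug-17", "B457"),
    ("9891", "Aug-17", MISSING),
    ("9891", "Aug-17", MISSING),
]


def mode(values, ignore_missings):
    """Most frequent value; MISSING is a candidate unless ignored."""
    if ignore_missings:
        values = [v for v in values if v is not MISSING]
    if not values:
        # Assumption: a group with nothing left to count yields MISSING.
        return MISSING
    return Counter(values).most_common(1)[0][0]


def aggregate_mode(rows, ignore_missings):
    """Group by (User_ID, Month) and take the mode of Coupon per group."""
    groups = defaultdict(list)
    for user_id, month, coupon in rows:
        groups[(user_id, month)].append(coupon)
    return {key: mode(vals, ignore_missings) for key, vals in groups.items()}


result = aggregate_mode(rows, ignore_missings=False)
for key, value in result.items():
    print(key, value)
```

In this sketch the group for 9891 comes out as missing under either setting, because it contains no actual coupon value. A caller who wants 0 there would still replace missing values after the aggregation.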
Cheers
Jan
fixed in ver 8.2.1