'Aggregate' by group is BUGGING OUT!

781194025 · October 2017

how do we delete threads?

I gave up on aggregate by group and moved on to LOOP VALUES -> AGGREGATE instead, because that's all I could 'make work'.

Who knows if it's actually working, though...

Edin_Klapic · November 2017

Hi @781194025,

The Operator "Group into Collection" in the extension 'Operator Toolbox' exists for exactly this reason.

The result is a collection of ExampleSets grouped by one Attribute.

Best,

Edin

781194025 · November 2017

Hey ! Thanks for pointing that out!

But, seriously, aggregate is BUGGED!! Even when I split the data (by groups) and then aggregate it, the aggregated examples will gather data from GOD KNOWS WHERE!!!

I'll try Group Into Collection now, I suppose. But I don't want a collection, I want to eliminate redundant rows!!!

Pavithra_Rao · November 2017

Hey,

Would it be possible to share the process XML code here so that we can step through the process and see what is the error?

Cheers,

Pavithra

781194025 · November 2017

THE SAME THING HAPPENS WITH 'GROUP INTO COLLECTION'!!!!

I group by URL, "loop collection" and run aggregate in the loop.
'Aggregate' should ONLY work on the 3 examples grouped by url in that collection. But somehow it aggregates data from the original set!!!!

AGGREGATE IS BUGGED!!!!
IN FACT, when I 'aggregate' a SINGLE EXAMPLE in a completely new process, after saving it as its own independent single example set, it STILL remembers data to 'aggregate'.

I have been trying to do something VERY SIMPLE for literally a month now. Combine two example sets, grouped by url, where the missing fields 'fill in the blanks' of each other. I cannot make it happen even on 2 examples, let alone 2 example sets!!!

It's my fault for gathering data so haphazardly I suppose, but it's tricky because I often exceed my API limits and end up with half-completed data sets that need to be joined with other half-completed ones!!

I don't need to share my process code, just look at these screenshots!!

zprekopcsak · November 2017

Hi,

I am not sure if I understand what you are trying to do, but don't you just need the Remove Duplicates operator keeping one record of every URL?

In your example, you are taking the mode of 100% missing values. The attribute has the metadata about all the possible values and it finds that all of those potential values appear zero times. There is no clear winner so it will just pick one of the values as the mode. You could argue that it should keep it missing instead. Did I understand correctly that this would be the expected behaviour from your perspective?

Thanks, Zoltan

zprekopcsak · November 2017

Also, the Aggregate operator has a parameter called "ignore missings" that is set to true by default. If you set it to false then do you get the result that you expect?

Best, Zoltan

781194025 · November 2017

I'm trying to combine examples by URL where the examples have missing fields, without losing any data from the fields.

Look at attached photo "4 examples for aggregation": I want ALL that data combined in 1 row.
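For readers following along, the goal described here (one row per URL, with each field taken from whichever example has it filled in) can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the intended result, not a RapidMiner process; the field names `title` and `likes` and the helper `coalesce_by_key` are made up for the example, and `None` stands in for a missing value ("?").

```python
def coalesce_by_key(rows, key):
    """Merge rows that share the same key; the first non-missing value per field wins."""
    merged = {}
    for row in rows:
        # One output row per distinct key value, created on first sight.
        target = merged.setdefault(row[key], {key: row[key]})
        for name, value in row.items():
            if value is not None and target.get(name) is None:
                target[name] = value          # fill a blank with real data
            else:
                target.setdefault(name, None)  # keep the column, still missing
    return list(merged.values())


# Hypothetical half-completed data, as if two partial crawls were appended.
rows = [
    {"url": "a.com", "title": "A", "likes": None},
    {"url": "a.com", "title": None, "likes": 12},
    {"url": "b.com", "title": None, "likes": None},
]
combined = coalesce_by_key(rows, "url")
```

Note that a field that is missing in every row of a group stays missing in the output; nothing is invented for it.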

I seem to have 'partially' solved the problem by simply "removing useless attributes" before running aggregate.

The pictures I previously attached clearly show 'aggregate' generating data out of thin air. Yes, I did try all the check-boxes.

My guess is aggregate draws data from the Repository or from the Example Set it was split off from, even if it's saved in an entirely separate Example Set.

Anyway I'm done spending time and effort trying to report this bug when I'm only met with skepticism and cries of user error. Especially since I've found a way around it.

781194025 · November 2017

Attached is a simple process that should show the bug.

Make sure the data you're using is from a larger example set, split off into a subgroup by ID.

zprekopcsak · November 2017

Thanks for the explanation, I think now I get what you are trying to do. I was not sceptical, just did not understand fully.

Believe me that it does not pull the data from thin air. Even if you filter and save a dataset, each nominal attribute remembers all the potential values it ever had. This is quite useful in many cases so we do not intend to change that.

However, when you calculate mode on a group that only has missing values, then mode is counting the occurrences of all potential values. All of them have zero occurrences, so it is doing what it needs to do in case of a draw: picks one. This is a bug, and we need to make sure that if all values have zero occurrences then it picks missing ("?") as a result. I have filed this in our internal bug tracker and it will be fixed in one of the upcoming releases.
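The draw described above can be reproduced with a small Python sketch. It is a toy model under stated assumptions, not RapidMiner's actual code: `declared` plays the role of the potential values the attribute remembers, `None` plays the role of "?", and both function names are hypothetical.

```python
def mode_with_declared_values(values, declared):
    """Toy version of the reported behaviour: count every declared value, skip missings."""
    counts = {v: 0 for v in declared}
    for v in values:
        if v is not None:
            counts[v] = counts.get(v, 0) + 1
    # max() returns the first key with the highest count,
    # so a 0-0-0 draw still produces a value.
    return max(counts, key=counts.get, default=None)


def mode_or_missing(values, declared):
    """Toy version of the proposed fix: an all-zero draw yields missing (None)."""
    best = mode_with_declared_values(values, declared)
    if best is None:
        return None
    occurrences = sum(1 for v in values if v == best)
    return best if occurrences > 0 else None


declared = ["blog", "news", "shop"]   # values remembered from the larger set
group = [None, None, None]            # a group whose attribute is 100% missing

buggy = mode_with_declared_values(group, declared)
fixed = mode_or_missing(group, declared)
```

In this sketch `buggy` is a value that never occurs in the group, which matches the "data from thin air" symptom, while `fixed` stays missing.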

Thanks for bringing this up!

Best, Zoltan

Edin_Klapic · November 2017

Hi @781194025,

Until the bug is fixed perhaps the Operator "Materialize Data" can help.

If you have filtered a dataset and are sure that you do not want to keep the potential values you can use this Operator right after your filtering steps / before your aggregations. It basically recreates the Metadata on the available data.
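To picture why this workaround helps, here is a toy Python model of a nominal column that carries its list of potential values through a filter, plus a `materialize` step that rebuilds the list from the rows that are actually left. The class `NominalColumn` and its methods are hypothetical and only mimic the behaviour described in this thread; they are not how RapidMiner implements it.

```python
class NominalColumn:
    """Toy nominal attribute: the data plus the list of values it has ever held."""

    def __init__(self, values, declared=None):
        self.values = list(values)
        if declared is None:
            # Fresh metadata: only the values present in the data (None = missing).
            declared = sorted({v for v in self.values if v is not None})
        self.declared = list(declared)

    def filter(self, keep):
        """Keep the rows flagged True; the declared values are carried over unchanged."""
        kept = [v for v, flag in zip(self.values, keep) if flag]
        return NominalColumn(kept, self.declared)

    def materialize(self):
        """Rebuild the declared values from the rows that remain."""
        return NominalColumn(self.values)


full = NominalColumn(["news", "shop", None])
subset = full.filter([False, False, True])   # only the all-missing row survives
clean = subset.materialize()
```

Here `subset.declared` still lists `news` and `shop` even though neither occurs in the subset, while `clean.declared` is empty, so a mode computed afterwards has no stale values to pick from.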

Best regards,

Edin

sgenzer · November 2017

Hi @zprekopcsak @Edin_Klapic - if this is a recognized bug, can I move this thread to "Product Feedback" so that Balazs H. can manage?

Scott

sdima · January 2019

Has this issue been resolved?

I am aggregating by grouping multiple factors and one is an integer. After the aggregation, the integer disappears.
Help?

sgenzer · January 2019

hi @sdima so can you please post your XML and your data so we can see what you're doing?

Scott

sdima · March 2019

Hi @sgenzer

Sure. Here it is

[Process XML, version 9.0.002. The operator definitions did not survive; only the stage annotations remain, listed here in order.]

STAGE I - Read Google Storage
STAGE II - Filter UK only LOCAL
UK ONLY JAN ONLY JAGUAR No blank cost
SHOW MLIs NOT MAPED
LOCLA WTY
BRING MLI INFO
SHOW PARTS NOT MAPED
AGGREGATE MLI LEVEL
Here is where I am losing "Retailer Code N" field

sgenzer · March 2019

hi @sdima ok thx for the XML. The CSVs would also help a lot as then I could run your process. Nevertheless I just want to make sure I understand your question. This is the problem?

Image: https://us.v-cdn.net/6030995/uploads/editor/q6/pgg8azhmf9dx.png

because when I look at the parameters, "Retailer Code N" is shown as polynominal, not integer (that's what all the cubes mean):

Image: https://us.v-cdn.net/6030995/uploads/editor/mw/ckba55oav0ey.png

Scott

