Answers
Hi @781194025,
This is exactly why the "Group into Collection" operator exists in the 'Operator Toolbox' extension.
The result is a collection of ExampleSets grouped by one Attribute.
Best,
Edin
But, seriously, aggregate is BUGGED!! Even when I split the data (by groups) and then aggregate it, the aggregated examples will gather data from GOD KNOWS WHERE!!!
I'll try Group Into Collection now, I suppose. But I don't want a collection, I want to eliminate redundant rows!!!
Hey,
Would it be possible to share the process XML code here so that we can step through the process and see what is the error?
Cheers,
Pavithra
I group by URL, "loop collection" and run aggregate in the loop.
'Aggregate' should ONLY work on the 3 examples grouped by url in that collection. But somehow it aggregates data from the original set!!!!
AGGREGATE IS BUGGED!!!!
IN FACT, when I 'aggregate' a SINGLE EXAMPLE in a completely new process, after saving it as its own independent single example set, it STILL remembers data to 'aggregate'.
I have been trying to do something VERY SIMPLE for literally a month now. Combine two example sets, grouped by url, where the missing fields 'fill in the blanks' of each other. I cannot make it happen even on 2 examples, let alone 2 example sets!!!
It's my fault for gathering data so haphazardly I suppose, but it's tricky because I often exceed my API limits and end up with half-completed data sets that need to be joined with other half-completed ones!!
I don't need to share my process code, just look at these screenshots!!
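For readers following along, the merge described above (one row per URL, with each missing field filled in from another row with the same URL) can be sketched in plain Python. This is only an illustration of the intended result, not a RapidMiner process; the function name `coalesce_by_key` and the sample attributes `title` and `likes` are made up for the example.

```python
def coalesce_by_key(rows, key):
    """Merge rows that share the same key value.

    For every field, the first non-missing (non-None) value seen wins.
    """
    merged = {}
    for row in rows:
        # One output row per distinct key value, in order of first appearance.
        target = merged.setdefault(row[key], {})
        for field, value in row.items():
            # Only fill a field that is still missing in the merged row.
            if target.get(field) is None:
                target[field] = value
    return list(merged.values())


# Hypothetical half-completed rows, as from two interrupted API runs.
rows = [
    {"url": "a.com", "title": "A", "likes": None},
    {"url": "a.com", "title": None, "likes": 12},
    {"url": "b.com", "title": None, "likes": 3},
]
result = coalesce_by_key(rows, "url")
```

A field stays missing only if every row for that URL is missing it, as with `title` for `b.com` here.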
Hi,
I am not sure if I understand what you are trying to do, but don't you just need the Remove Duplicates operator keeping one record of every URL?
In your example, you are taking the mode of 100% missing values. The attribute has the metadata about all the possible values and it finds that all of those potential values appear zero times. There is no clear winner so it will just pick one of the values as the mode. You could argue that it should keep it missing instead. Did I understand correctly that this would be the expected behaviour from your perspective?
Thanks, Zoltan
Also, the Aggregate operator has a parameter called "ignore missings" that is set to true by default. If you set it to false then do you get the result that you expect?
Best, Zoltan
Look at attached photo "4 examples for aggregation": I want ALL that data combined in 1 row.
I seem to have 'partially' solved the problem by simply "removing useless attributes" before running aggregate.
The pictures I previously attached clearly show 'aggregate' generating data out of thin air. Yes, I did try all the check-boxes.
My guess is aggregate draws data from the Repository or from the Example Set it was split off from, even if it's saved in an entirely separate Example Set.
Anyway I'm done spending time and effort trying to report this bug when I'm only met with skepticism and cries of user error. Especially since I've found a way around it.
Make sure the data you're using is from a larger example set, split off into a subgroup by ID.
Thanks for the explanation, I think now I get what you are trying to do. I was not sceptical, just did not understand fully.
Believe me that it does not pull the data from thin air. Even if you filter and save a dataset, each nominal attribute remembers all the potential values it ever had. This is quite useful in many cases so we do not intend to change that.
However, when you calculate mode on a group that only has missing values, mode counts the occurrences of all potential values. All of them have zero occurrences, so it does what it needs to do in case of a draw: it picks one. This is a bug, and we need to make sure that if all values have zero occurrences, it picks missing ("?") as the result. I have filed this in our internal bug tracker and it will be fixed in one of the upcoming releases.
Thanks for bringing this up!
Best, Zoltan
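The tie-breaking behaviour described above can be modelled with a small Python sketch. It is a simplified illustration only, not RapidMiner's actual implementation; `mode_with_dictionary` and `mode_missing_aware` are hypothetical names, and `None` stands in for a missing value ("?").

```python
def mode_with_dictionary(values, potential_values):
    """Model of the reported behaviour: every potential value starts at zero."""
    counts = {v: 0 for v in potential_values}
    for v in values:
        if v is not None:
            counts[v] = counts.get(v, 0) + 1
    # max() returns the first key among ties, so an all-missing group
    # still yields some value from the dictionary.
    return max(counts, key=counts.get)


def mode_missing_aware(values, potential_values):
    """Model of the proposed fix: return missing when nothing occurs at all."""
    counts = {v: 0 for v in potential_values}
    for v in values:
        if v is not None:
            counts[v] = counts.get(v, 0) + 1
    if not counts or max(counts.values()) == 0:
        return None
    return max(counts, key=counts.get)


# The attribute remembers values from the larger set it was filtered from.
dictionary = ["red", "green", "blue"]
all_missing = [None, None, None]

surprising = mode_with_dictionary(all_missing, dictionary)
expected = mode_missing_aware(all_missing, dictionary)
```

In this model `surprising` is a value that never occurs in the group, while `expected` stays missing.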
Hi @781194025,
Until the bug is fixed perhaps the Operator "Materialize Data" can help.
If you have filtered a dataset and are sure that you do not want to keep the potential values you can use this Operator right after your filtering steps / before your aggregations. It basically recreates the Metadata on the available data.
Best regards,
Edin
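As a rough conceptual model of why this helps, rebuilding the metadata from the available data means the list of potential values shrinks to what is actually present in the filtered set. The Python sketch below only illustrates that idea; it says nothing about how "Materialize Data" is implemented, and `rebuild_value_dictionary` is a hypothetical helper.

```python
def rebuild_value_dictionary(values):
    """Return the distinct non-missing values, in order of first appearance."""
    seen = []
    for v in values:
        if v is not None and v not in seen:
            seen.append(v)
    return seen


# Potential values remembered from the original, larger set (made-up data).
remembered = ["news", "blog", "shop"]

# After filtering, one column holds only missing values, another only "blog".
rebuilt_empty = rebuild_value_dictionary([None, None, None])
rebuilt_blog = rebuild_value_dictionary(["blog", None, "blog"])
```

With an empty rebuilt list there are no zero-count candidates left for a mode calculation to pick from.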
Hi @zprekopcsak @Edin_Klapic - if this is a recognized bug, can I move this thread to "Product Feedback" so that Balazs H. can manage?
Scott
I am aggregating by grouping multiple factors and one is an integer. After the aggregation, the integer disappears.
Help?
Scott
Sure. Here it is
because when I look at the parameters, "Retailer Code N" is shown as polynominal, not integer (that's what all the cubes mean):
Scott