
"[SOLVED] Aggregation: generating collections grouped by attributes"

Ugo Member Posts: 20 Contributor II
edited June 2019 in Help
Hello,

I have a data set with three attributes: "first", "second" and "rank".
I have sorted the data set so that it is ordered first by "first" and then by "rank".
I need to:
1. generate subsets of the data grouped by "first"
2. for each group's subset, select the first k elements

In the end I would like to be able to select any group's subset via the value of the "first" attribute.

Does anyone know if this is possible?
If so, how?

TIA,
Hugo F

P.S.: I have looked at Aggregation and Collections to no avail.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Hugo,

    Try using the Loop Values operator to loop over the values of "first". In the inner process, use "Filter Examples" so that you keep only the examples with the current value of "first" (you need the attribute_value_filter with an expression like "first=%{loop_value}"). Then sort by "rank" and use Filter Example Range to keep only the first k elements. Outside of the loop, you can use Append to recombine the results of the loop operator.

    Best regards,
    Marius
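
For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same steps: group by "first", sort each group by "rank", and keep the first k rows. The function name `top_k_per_group` and the sample rows are hypothetical and only illustrate the idea; this is not how the RapidMiner operators are implemented.

```python
from collections import defaultdict


def top_k_per_group(rows, k):
    """Group rows by 'first' and keep the k lowest-'rank' rows of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["first"]].append(row)
    # Sort each group by rank and truncate it to its first k rows.
    return {
        key: sorted(members, key=lambda r: r["rank"])[:k]
        for key, members in groups.items()
    }


# Hypothetical sample data using the three attributes from the question.
rows = [
    {"first": "a", "second": "x", "rank": 2},
    {"first": "a", "second": "y", "rank": 1},
    {"first": "a", "second": "z", "rank": 3},
    {"first": "b", "second": "u", "rank": 1},
]

result = top_k_per_group(rows, k=2)
group_a = result["a"]  # select one group's subset via its "first" value
```

Because the result is a dictionary keyed by the value of "first", any group's subset can be looked up directly, which matches the last requirement in the question.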
  • Ugo Member Posts: 20 Contributor II
    Hi Marius,

    Thanks once again for the input. It worked as expected.  :)
    It took some time though (36.34 min), because I am dealing with
    roughly 8.36 million records.

    Which brings me to my next question. Yesterday I considered
    using a database, which I expect would help because of indexing. So
    my question is: do these "Filter" operators also use indexes, or
    would I be better off doing the filtering in SQL?

    TIA,
    Hugo F.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    You should do it directly in the database. Once you have imported the data into RapidMiner, it is just a normal example set without any indices.
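
One possible way to do the per-group selection in the database is a window function. The sketch below uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; the table name `examples`, the index name and the sample rows are assumptions, and it requires an SQLite build with window function support (3.25 or later). The actual database and schema used in this thread are not known.

```python
import sqlite3


def top_k_per_group_sql(conn, k):
    """Return the k lowest-rank rows per value of "first", using ROW_NUMBER()."""
    query = """
        SELECT "first", "second", "rank"
        FROM (
            SELECT "first", "second", "rank",
                   ROW_NUMBER() OVER (
                       PARTITION BY "first" ORDER BY "rank"
                   ) AS rn
            FROM examples
        )
        WHERE rn <= ?
        ORDER BY "first", "rank"
    """
    return conn.execute(query, (k,)).fetchall()


# Hypothetical in-memory table with the three attributes from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE examples ("first" TEXT, "second" TEXT, "rank" INTEGER)')
# An index on (first, rank) may let the database avoid sorting the whole table.
conn.execute('CREATE INDEX idx_first_rank ON examples ("first", "rank")')
conn.executemany(
    "INSERT INTO examples VALUES (?, ?, ?)",
    [("a", "x", 2), ("a", "y", 1), ("a", "z", 3), ("b", "u", 1)],
)

top = top_k_per_group_sql(conn, 2)
```

The query numbers the rows within each "first" group in "rank" order and keeps only those numbered up to k, so the grouping, sorting and truncation all happen inside the database.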
  • Ugo Member Posts: 20 Contributor II

    Ok. Pretty quick for 8M records though.

    Thanks.
    Hugo F.