Add a native Rank operator to RapidMiner Studio

Telcontar120 · February 2019

Several recent threads have asked how to calculate ranks in RapidMiner. There is a Rank operator in the old, unsupported (and somewhat buggy) Finance & Economics extension, but it is hard to recommend that solution, especially to newer users. The alternative using native RapidMiner operators is cumbersome and complex for something as conceptually simple as a rank calculation. It would be much easier if RapidMiner added a native Rank operator to the basic data ETL toolkit.

MartinLiebig · February 2019

Hi @Telcontar120 ,

What would the Rank operator do?

BR,

Martin

Telcontar120 · February 2019

@mschmitz It would calculate the numerical rank of each example based on the values of one or more specified numerical attributes. It is equivalent to sorting the examples by that attribute and then assigning a sequential numerical ID. Take a look at the Rank operator in the Finance & Economics extension for an example that works today and can be applied to any arbitrary set of numerical attributes at once.
A more sophisticated version would also provide options for sorting ascending or descending, for handling tied values (assign lowest rank, assign highest rank, or assign midpoint rank), and for either replacing the original attribute or adding a new attribute with the rank value.
This is conceptually similar to assigning the percentile value to all examples. There are many contexts in which this is a useful transformation, including many non-parametric calculations, or using rank value rather than raw values as predictors in models to eliminate scalar effects (e.g., of outliers) while preserving ordinality.
This can all be done manually now in RapidMiner but it requires a daisy chain of related operators (e.g., Generate Copy, Sort, Generate ID, etc.) that would be nice to combine all into one simple operator.
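
For readers who want to see the requested behavior spelled out, here is a minimal sketch in plain Python (not RapidMiner code). The function name `rank_values` and its parameters `ties` and `descending` are hypothetical and only illustrate the sort-then-number logic with the three tie-handling options described above.

```python
def rank_values(values, ties="average", descending=False):
    """Return the 1-based rank of each value, in the original order.

    ties: "min" (lowest rank), "max" (highest rank) or "average" (midpoint).
    All names here are illustrative assumptions, not an existing operator API.
    """
    if ties not in ("min", "max", "average"):
        raise ValueError("ties must be 'min', 'max' or 'average'")

    # Sort the positions by value: the equivalent of Sort followed by Generate ID.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)

    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' to cover the whole run of tied values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1

        lowest, highest = start + 1, end + 1
        if ties == "min":
            rank = lowest
        elif ties == "max":
            rank = highest
        else:
            rank = (lowest + highest) / 2

        # Every member of the tied run receives the same rank.
        for pos in range(start, end + 1):
            ranks[order[pos]] = rank
        start = end + 1
    return ranks


if __name__ == "__main__":
    print(rank_values([10, 20, 20, 30], ties="average"))
```

A plain Sort plus Generate ID corresponds to the case with no ties; the inner loop is the extra work needed once ties have to be treated consistently.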

MartinLiebig · February 2019

Hi @Telcontar120 ,

We always face the same trade-off: number of operators vs. ease of use. If it's just Sort + Generate ID, I would oppose a new operator. It only makes sense if there is more involved than "just this", i.e., your percentiles.

@tftemme thoughts?

BR,

Martin

Telcontar120 · February 2019

@mschmitz But the reality is that this is a very commonly required transformation. And you often want to do it on a whole set of attributes at once, which means 4 operators (sort, generate id, set role, and rename) inside a Loop Attributes. In my mind that's enough of a hassle to be worth a separate operator.

Telcontar120 · February 2019

Also, the method above doesn't handle ties well; handling them properly with rank values requires even more complexity.
P.S. I'd like there to be a percentile operator for exactly the same reason! Once again, it can be done manually using a Loop and similar operators to the ones above, only with the additional complexity of calculating the percentile value from the raw rank value.

MartinLiebig · February 2019

@Telcontar120 ,

You are aware that Aggregate can now calculate percentiles?

BR,

Martin

Telcontar120 · February 2019

@mschmitz Of course, but it calculates specific requested percentile values; it does not easily provide percentile rankings for all examples. Those are two related but different operations.
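
To make the distinction concrete, here is a small plain-Python sketch (not RapidMiner code). Both function names are hypothetical, and the formulas are assumptions chosen for illustration: the nearest-rank method for a percentile value, and the "values below plus half of the ties" convention for a percentile rank. Other definitions exist and give slightly different numbers.

```python
import math


def percentile_value(values, p):
    """One number for the whole data set: the value at the p-th percentile.

    Uses the nearest-rank method, which is an assumption made for this sketch.
    """
    ordered = sorted(values)
    position = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based position
    return ordered[position - 1]


def percentile_ranks(values):
    """One number per example: where each value sits within the distribution.

    Counts the values strictly below plus half of the tied values, in percent.
    """
    n = len(values)
    result = []
    for v in values:
        below = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        result.append(100 * (below + 0.5 * equal) / n)
    return result


if __name__ == "__main__":
    data = [10, 20, 20, 30]
    print(percentile_value(data, 50))  # a single summary value
    print(percentile_ranks(data))      # one percentile rank per example
```

The first function answers "which value marks the 50th percentile?", while the second answers "at which percentile does each example fall?", which is the per-example output being requested here.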

tftemme · March 2019

Hi @Telcontar120 , @mschmitz

I think needing four or more operators for one frequent transformation is enough to justify a single operator. I will create a ticket for the Operator Toolbox. We will have to see how best to fit it in. If you have further details on how the operator should work or what options it should provide, feel free to post them. The more detail the better.

Best regards,
Fabian

Telcontar120 · March 2019

@tftemme feel free to reach out via PM if you want me to explain in further detail about the specifications that I listed above. Automatic attribute copying/renaming, tie handling, and multi-attribute selection would probably be the most important options to include to save time.
I also realized that a single operator could handle both raw ranks and percentile ranks, with another option to control the output format (rank vs. percentile rank).
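
As a rough illustration of that option set, the following plain-Python sketch (not RapidMiner code) ranks several attributes at once and lets the caller choose the output format and whether to replace the original attribute. The names `add_rank_columns`, `output`, and `replace`, the `_rank` / `_percentile` suffixes, the midpoint tie rule, and the percentile formula `100 * (rank - 0.5) / n` are all assumptions made for the example.

```python
def average_ranks(values):
    """Midpoint ranks (1-based). Quadratic time, kept simple for illustration."""
    ranks = []
    for v in values:
        below = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        ranks.append(below + (equal + 1) / 2)
    return ranks


def add_rank_columns(table, attributes, output="rank", replace=False):
    """Rank each selected attribute of a column-oriented table (dict of lists).

    output:  "rank" for raw ranks, "percentile" for percentile ranks.
    replace: overwrite the original attribute instead of adding a new one.
    """
    if output not in ("rank", "percentile"):
        raise ValueError("output must be 'rank' or 'percentile'")

    # Copy the columns so the input table is left untouched.
    result = {name: list(column) for name, column in table.items()}
    for name in attributes:
        ranks = average_ranks(table[name])
        if output == "percentile":
            n = len(ranks)
            # Derive the percentile rank from the raw (midpoint) rank.
            ranks = [100 * (r - 0.5) / n for r in ranks]
        target = name if replace else f"{name}_{output}"
        result[target] = ranks
    return result


if __name__ == "__main__":
    data = {"age": [30, 20, 20, 40], "income": [5, 7, 6, 8]}
    print(add_rank_columns(data, ["age", "income"]))
    print(add_rank_columns(data, ["income"], output="percentile", replace=True))
```

The loop over `attributes` plays the role of Loop Attributes, and the `target` naming stands in for the copy, set role, and rename steps of the manual process.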

Open for Voting · Last Updated May 2019
