How do I split up scored data into 20 equally sized segments?

simon_philipose Member Posts: 3 Learner I
edited February 2020 in Help

Hi there-- I'm still only a few days into using RapidMiner and wasn't sure if/how I could do the following:

I created a logistic regression model for direct mail marketing and scored it onto new data. What I want to do now is split the scored data into 20 groups by descending confidence(responder) value, so that group A holds the 1/20th most likely responders, group B the next 1/20th, and so on.

Your help is much appreciated.

-Simon




Answers

  • Pavithra_Rao Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 123 RM Data Scientist
    Hi @simon_philipose,

    You can first use the Sort operator to sort the confidence values in descending order, followed by the Split Data operator.
    In the Split Data operator's Parameters window, add 20 partitions, each with a ratio of 0.05 (1/20).

    Hope this helps.

    Cheers,
    Pavithra
  • simon_philipose Member Posts: 3 Learner I

    Hi Pavithra,

    Thank you for your response. So I ran into a few problems with using the Split Data operator.

    1. It splits the dataset into multiple datasets. What I need is one data set but with a field called Model_Group with a value of A, B, C, D, etc. depending on the confidence values.

    2. It appears the maximum number of data sets I can split into is 8, by entering .125 in the partition ratio field 8 times. I can't do 10, much less 20, different splits.

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    I would do the following:

    Sort - by confidence, descending
    Generate ID - to get an index
    Generate Attributes - derive your Model_Group from the id. id%10 would cycle through 10 groups row by row, so for 20 contiguous segments divide the id by the segment size (number of rows / 20) and round down.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
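
The sort, index, and group steps described above can be sketched outside RapidMiner in plain Python. This is only an illustration of the arithmetic, not a RapidMiner process: the function name `assign_model_groups` and the letter labels are assumptions made for the example, and it supports at most 26 groups because it labels them A to Z.

```python
import string

def assign_model_groups(scores, n_groups=20):
    """Return one letter per score; the highest-scoring segment gets 'A'.

    Hypothetical helper for illustration only (supports up to 26 groups).
    """
    n = len(scores)
    # Row indices ordered by descending score (the "Sort" step).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    labels = [None] * n
    for rank, i in enumerate(order):  # rank plays the role of the generated id
        # Integer division maps ranks 0..n-1 onto contiguous groups 0..n_groups-1.
        group = rank * n_groups // n
        labels[i] = string.ascii_uppercase[group]
    return labels

# Example: 100 scored rows with confidences 0.00 .. 0.99
scores = [i / 100 for i in range(100)]
labels = assign_model_groups(scores)
```

With 100 rows, each letter from A to T is assigned to five rows, and the result stays in a single table with one group label per row rather than 20 separate data sets. When the row count is not a multiple of 20, group sizes differ by at most one.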
  • simon_philipose Member Posts: 3 Learner I
    Thank you so much @rfuentealba -- your solution worked perfectly! Very much appreciated!!
  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Wow, so many ways to do this in RapidMiner!
    If you copy your score attribute first, Discretize by Frequency should be able to do this directly: select the copied attribute and set the number of bins to 20. This will create exactly the bins you are looking for, although a large number of ties can sometimes cause problems for the Discretize operators. (You copy the score first because Discretize replaces the selected attribute with the binned one, so if you still want the raw score you need two copies of it: one binned and one not.)
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
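
To see why ties can be awkward for frequency-based binning, here is a minimal Python sketch of one common way to do equal-frequency binning using quantile cut points. It is not RapidMiner's Discretize implementation; the function name `equal_frequency_bins` and the choice of cut points are assumptions for illustration. Bin 0 holds the lowest scores here.

```python
from bisect import bisect_right

def equal_frequency_bins(values, n_bins=20):
    """Bin values by quantile cut points; equal values always share a bin.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Take a cut point at every (n / n_bins)-th sorted value.
    cuts = [ordered[k * n // n_bins] for k in range(1, n_bins)]
    # The bin index is the number of cut points less than or equal to the value.
    return [bisect_right(cuts, v) for v in values]

# Distinct scores: every bin receives the same number of rows.
distinct = [i / 100 for i in range(100)]
even_bins = equal_frequency_bins(distinct)

# Heavily tied scores: 60 rows share one value, so they all land in one bin.
tied = [0.1] * 60 + [0.2 + i / 100 for i in range(40)]
tied_bins = equal_frequency_bins(tied)
```

In the tied example, the 60 identical scores cannot be spread over several bins, so some bins stay empty and fewer than 20 groups come out. If strictly equal group sizes matter more than keeping tied scores together, a rank-based assignment after sorting avoids this.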