
Decision Tree Data exploration with numerical value

b00122599 Member Posts: 26 Contributor II
edited December 2019 in Help
Hey folks,

I am fairly new to data science but would like to use a decision tree to explore a dataset. The dataset has no label, so I am assigning one: a numerical attribute with values from 1 to 20. Would it be possible to have the label target only high scorers on that attribute, so that the class label covers only those objects that score 15-20 on the attribute I select as the label? If this makes sense, would anyone have any ideas on how to do so in RapidMiner?

Any help is much appreciated.

Neil. 

Best Answer

  • b00122599 Member Posts: 26 Contributor II
    Solution Accepted
    Thanks very much for the pointers, guys. Much appreciated.

Answers

  • varunm1 Member Posts: 1,207 Unicorn
    Hi @b00122599

    Trying to understand what you want: you are adding a label column whose values range from 1 to 20 (1, 2, 3, ..., 20), but you want to predict only the labels between 15 and 20, which you treat as high scores. If you apply a decision tree for classification, it will train on all the labels unless you remove the unwanted ones from the data. You can train a model only on labels 15 to 20 by filtering examples, so that the model never trains on samples labeled 1 to 14.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Or perhaps an even better solution would be to discretize your numerical label and turn it into a nominal attribute instead, where values of 15-20 get the class "high" and the others get the class "low." This can be done with several operators in RapidMiner, including Discretize by User Specification or Generate Attributes.
    Then you will simply use that as your label and you will have a typical classification problem, which your Decision Tree learner should handle easily.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
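
For readers who want to see the difference between the two suggestions above outside of RapidMiner, here is a minimal sketch in plain Python. It is not RapidMiner code and does not reproduce any operator; the toy rows, the column layout, and the names `rows`, `is_high`, `high_only`, and `labeled` are all made up for illustration. Only the 15-20 "high" range comes from the question.

```python
# Hypothetical toy data: each row is (feature_a, feature_b, score),
# where score is the numeric attribute in the range 1..20.
rows = [
    (0.2, 3, 4),
    (0.9, 7, 17),
    (0.5, 1, 15),
    (0.7, 9, 20),
    (0.1, 2, 14),
]

HIGH_MIN, HIGH_MAX = 15, 20  # "high scorer" range taken from the question


def is_high(score):
    """Return True when the score falls in the high-scorer range."""
    return HIGH_MIN <= score <= HIGH_MAX


# Suggestion 1 (filtering): keep only the high-scoring examples.
# The numeric score stays as it is; rows scored 1..14 are dropped.
high_only = [row for row in rows if is_high(row[2])]

# Suggestion 2 (discretizing): keep every example, but replace the
# numeric score with a nominal "high"/"low" class label.
labeled = [(a, b, "high" if is_high(score) else "low") for a, b, score in rows]

print(high_only)
print(labeled)
```

With filtering, a learner trained on `high_only` never sees a low scorer, so it cannot learn what separates high from low. With discretizing, `labeled` keeps all rows and gives a two-class problem ("high" vs. "low"), which is the setup described in the second answer.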