
Decision tree

sshilderman Member Posts: 9 Contributor II
edited November 2018 in Help

I'm trying to use a decision tree to predict whether a user will leave.

My data include 4 regular attributes (2 nominal, 2 integer), and 1 special attribute (nominal label).

When I use the Decision Tree operator, the resulting tree does not use all of my attributes: only one of the regular attributes appears (as the root), and the leaves contain the label values (which is OK).

 

What am I doing wrong?


Answers

  • bhupendra_patil Employee-RapidMiner, Member Posts: 168 RM Data Scientist

    Hello, this may simply be happening because the data does not contain patterns that satisfy the criteria you set.

     

    I suggest trying different settings for pruning, prepruning, and confidence.

     

    A better way to find good values for these is to use the "Optimize Parameters (Grid)" operator and give it ranges, so that it tries combinations of the parameters that affect your model.

     

    The help for "Optimize Parameters (Grid)" should include a sample process that shows how this operator works.

     

    Good Luck
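To illustrate how a pre-pruning criterion can leave a tree with a single split, here is a minimal Python sketch. It is not RapidMiner's implementation: it uses plain information gain (the operator's actual criterion and defaults may differ), and the rows, attribute names, and threshold values are all invented for illustration. It computes the gain of each attribute and then sweeps a hypothetical minimal-gain threshold, much as a grid optimization would, to show which attributes would still qualify for a split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, label="label"):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def usable_attributes(rows, attributes, minimal_gain):
    """Attributes whose gain reaches the (hypothetical) minimal-gain threshold."""
    return [a for a in attributes if information_gain(rows, a) >= minimal_gain]


# Made-up churn data: "plan" separates the labels well, "region" barely at all.
ROWS = [
    {"plan": "basic", "region": "north", "label": "leave"},
    {"plan": "basic", "region": "north", "label": "leave"},
    {"plan": "basic", "region": "south", "label": "leave"},
    {"plan": "basic", "region": "south", "label": "leave"},
    {"plan": "pro", "region": "north", "label": "leave"},
    {"plan": "pro", "region": "north", "label": "stay"},
    {"plan": "pro", "region": "south", "label": "stay"},
    {"plan": "pro", "region": "south", "label": "stay"},
]
ATTRIBUTES = ["plan", "region"]

if __name__ == "__main__":
    for name in ATTRIBUTES:
        print(name, round(information_gain(ROWS, name), 4))
    # Sweep assumed threshold values, as a parameter grid would.
    for threshold in (0.01, 0.1, 0.6):
        print(threshold, usable_attributes(ROWS, ATTRIBUTES, threshold))
```

With a loose threshold both attributes qualify, with a moderate one only `plan` does, and with a strict one no split is allowed at all. This is the kind of effect worth checking before concluding that something is wrong with the data.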

  • sshilderman Member Posts: 9 Contributor II

    Follow-up question:

     

    First of all, thank you for your answer.

    I first created a table with patterns manually, to check that I'm doing it right.

     

    Is there a way to know who is located in each leaf?

    I would like to learn which users will have a specific value (the label value) in the future.

     

    Bests. 

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist

    Hi,

     

    What you can do is use the Tree to Rules operator. As a result (see the attached process), you get the paths as strings, which might be helpful as a first step. There is no single-operator solution for applying these rules to a data set to get "leaf IDs", but it might be possible to build a working process with operators like Write as Text and then parse the resulting text files.

     

    Best,

    Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
        <process expanded="true">
          <!-- Load the Golf sample data set -->
          <operator activated="true" class="retrieve" compatibility="7.1.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <!-- Train a decision tree and convert it into a rule model -->
          <operator activated="true" class="tree_to_rules" compatibility="7.1.001" expanded="true" height="82" name="Tree to Rules" width="90" x="246" y="85">
            <process expanded="true">
              <operator activated="true" class="parallel_decision_tree" compatibility="7.1.001" expanded="true" height="82" name="Decision Tree" width="90" x="45" y="34"/>
              <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <!-- Apply the rule model to the example set passed through by Tree to Rules -->
          <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="380" y="85">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve Golf" from_port="output" to_op="Tree to Rules" to_port="training set"/>
          <connect from_op="Tree to Rules" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Tree to Rules" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
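The "write the rules as text, then parse them" idea could be sketched in Python roughly as follows. This is only an illustration under assumptions: the rule lines below are made up (loosely modeled on the Golf data, not actual output of the process above), the `if ... and ... then ...` text format is assumed rather than taken from RapidMiner's documentation, and `parse_rule` and `assign_leaf` are hypothetical helper names. Each rule's position in the list serves as its "leaf ID".

```python
import operator
import re

# Comparison symbols the assumed rule text may contain.
OPS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "≥": operator.ge,
    "≤": operator.le,
}
CONDITION = re.compile(r"^\s*(.+?)\s*(>=|<=|≥|≤|=|>|<)\s*(.+?)\s*$")


def parse_value(text):
    """Interpret a rule value as a number if possible, else as a nominal string."""
    try:
        return float(text)
    except ValueError:
        return text


def parse_rule(line):
    """Parse one 'if A = x and B > 1 then label' line; return None for other lines."""
    match = re.match(r"^if (.+) then (\S+)", line.strip())
    if match is None:
        return None
    conditions = []
    for part in match.group(1).split(" and "):
        parsed = CONDITION.match(part)
        if parsed is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        name, symbol, value = parsed.groups()
        conditions.append((name, OPS[symbol], parse_value(value)))
    return conditions, match.group(2)


def assign_leaf(row, rules):
    """Return (leaf_id, prediction) of the first rule matching `row`, else (None, None)."""
    for leaf_id, (conditions, prediction) in enumerate(rules):
        if all(name in row and op(row[name], value) for name, op, value in conditions):
            return leaf_id, prediction
    return None, None


# Made-up rule text in the assumed format.
RULES_TEXT = """\
if Outlook = overcast then yes
if Outlook = rain and Wind = false then yes
if Outlook = rain and Wind = true then no
if Outlook = sunny and Humidity > 77.5 then no
if Outlook = sunny and Humidity <= 77.5 then yes
"""
RULES = [rule for rule in map(parse_rule, RULES_TEXT.splitlines()) if rule]

if __name__ == "__main__":
    example = {"Outlook": "sunny", "Humidity": 85, "Wind": "false"}
    print(assign_leaf(example, RULES))
```

Grouping the rows of a data set by the returned leaf ID then shows which users fall into each leaf. Nominal values are compared as strings here, so the rows need to use the same spelling as the rule text.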