Decision Tree #entropy #criterion #kappa #accuracy
Hi guys,
Could anyone explain how entropy is defined and used in a Decision Tree? (What do the blue and red labels under each leaf stand for?)
Is 70% accuracy with a kappa of about 0.30 enough for prediction?
Which criterion should I choose for the Decision Tree, "gain_ratio" or "information_gain", to maximise my accuracy and kappa?
regards,
Answers
The blue and red labels under each leaf show how many examples of each class ended up in that node. The ratio of these counts forms the basis of the confidence score generated by the Decision Tree.
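For illustration, here is a minimal sketch (plain Python, not RapidMiner internals) of how the class counts shown under a leaf translate into a prediction and a confidence; the counts themselves are hypothetical.

```python
# Hypothetical counts shown under one leaf: 35 "yes" vs 15 "no" examples.
leaf_counts = {"yes": 35, "no": 15}

total = sum(leaf_counts.values())
prediction = max(leaf_counts, key=leaf_counts.get)   # majority class in the leaf
confidence = leaf_counts[prediction] / total          # share of that class in the leaf

print(prediction, confidence)  # -> yes 0.7
```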
If you want to maximize your tree for accuracy, you can select accuracy directly as the main criterion for tree growth. But it is not possible to say in the abstract whether accuracy of 70% is "good enough" for prediction. In some fields that would be considered great and used with no problem, while in other fields it would be horrible. This question is very domain and dataset specific.
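If you want to sanity-check the two numbers yourself, here is a hedged sketch using scikit-learn (assumed available; the labels are invented for illustration). Accuracy is the share of correct predictions, while Cohen's kappa corrects that agreement for what would be expected by chance, which is why a model can have decent accuracy but a modest kappa on an imbalanced label.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Invented example labels, just to show how the two metrics are computed.
y_true = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
y_pred = ["yes", "no",  "no", "no", "yes", "yes", "yes", "no", "no", "no"]

print("accuracy:", accuracy_score(y_true, y_pred))     # fraction of correct predictions
print("kappa:   ", cohen_kappa_score(y_true, y_pred))  # agreement corrected for chance
```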
Information gain tends to favor attributes with many categories/specific values, because it is not adjusted for the number of possible distinct values. Information gain ratio adjusts for this, so all else being equal, information gain ratio is probably the more robust criterion of the two (which is why it is the default). If you want to understand how to calculate information gain, the Wikipedia article has a good summary: https://en.wikipedia.org/wiki/Information_gain_in_decision_trees
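As a worked sketch of the difference (plain Python with invented class counts, not RapidMiner's implementation): information gain is the drop in entropy from parent to children, and gain ratio divides that by the "split info" of the partition, which penalizes attributes that shatter the data into many small groups.

```python
from math import log2

def entropy(counts):
    """Entropy of a class-count vector, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Hypothetical parent node: 9 "yes" / 5 "no"; a candidate attribute
# splits it into three children with the class counts below.
parent = [9, 5]
children = [[2, 3], [4, 0], [3, 2]]

n = sum(parent)
weighted_child_entropy = sum(sum(c) / n * entropy(c) for c in children)
info_gain = entropy(parent) - weighted_child_entropy

# Split info: entropy of the partition sizes themselves.
split_info = entropy([sum(c) for c in children])
gain_ratio = info_gain / split_info

print(round(info_gain, 3), round(gain_ratio, 3))  # ~0.247 and ~0.156
```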
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts