Decision Tree #entropy #criterion #kappa #accuracy
Hi guys,
Could anyone explain how entropy is defined and used in a Decision Tree? (What do the blue and red labels under each leaf stand for?)
Is 70% accuracy with a kappa of about 0.30 enough for prediction?
Which criterion should I choose for the Decision Tree, "gain_ratio" or "information_gain", to maximise my accuracy and kappa?
regards,
Answers
The blue and red labels under each leaf show how many examples of each class ended up in that node. The ratio of these counts forms the basis of the confidence score generated by the Decision Tree.
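For illustration, here is a minimal sketch (plain Python, not RapidMiner internals) of how the class counts shown under a leaf translate into a prediction and a confidence; the counts themselves are hypothetical.

```python
# Hypothetical counts shown under one leaf: 35 "yes" vs 15 "no" examples.
leaf_counts = {"yes": 35, "no": 15}

total = sum(leaf_counts.values())
prediction = max(leaf_counts, key=leaf_counts.get)   # majority class in the leaf
confidence = leaf_counts[prediction] / total          # share of that class in the leaf

print(prediction, confidence)  # -> yes 0.7
```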
If you want to maximize your tree for accuracy, you can select accuracy directly as the main criterion for tree growth. But it is not possible to say in the abstract whether accuracy of 70% is "good enough" for prediction. In some fields that would be considered great and used with no problem, while in other fields it would be horrible. This question is very domain and dataset specific.
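If you want to sanity-check the two numbers yourself, here is a hedged sketch using scikit-learn (assumed available; the labels are invented for illustration). Accuracy is the share of correct predictions, while Cohen's kappa corrects that agreement for what would be expected by chance, which is why a model can have decent accuracy but a modest kappa on an imbalanced label.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Invented example labels, just to show how the two metrics are computed.
y_true = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
y_pred = ["yes", "no",  "no", "no", "yes", "yes", "yes", "no", "no", "no"]

print("accuracy:", accuracy_score(y_true, y_pred))     # fraction of correct predictions
print("kappa:   ", cohen_kappa_score(y_true, y_pred))  # agreement corrected for chance
```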
Information gain tends to favor attributes with many categories/specific values, because it is not adjusted for the number of possible distinct values. Information gain ratio adjusts for this, so all else being equal, information gain ratio is probably the more robust criterion of the two (which is why it is the default). If you want to understand how to calculate information gain, the Wikipedia article has a good summary: https://en.wikipedia.org/wiki/Information_gain_in_decision_trees
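As a worked sketch of the difference (plain Python with invented class counts, not RapidMiner's implementation): information gain is the drop in entropy from parent to children, and gain ratio divides that by the "split info" of the partition, which penalizes attributes that shatter the data into many small groups.

```python
from math import log2

def entropy(counts):
    """Entropy of a class-count vector, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# Hypothetical parent node: 9 "yes" / 5 "no"; a candidate attribute
# splits it into three children with the class counts below.
parent = [9, 5]
children = [[2, 3], [4, 0], [3, 2]]

n = sum(parent)
weighted_child_entropy = sum(sum(c) / n * entropy(c) for c in children)
info_gain = entropy(parent) - weighted_child_entropy

# Split info: entropy of the partition sizes themselves.
split_info = entropy([sum(c) for c in children])
gain_ratio = info_gain / split_info

print(round(info_gain, 3), round(gain_ratio, 3))  # ~0.247 and ~0.156
```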
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts