"comparing decision trees"

karen · March 2011

Hi! I'm generating decision trees while varying three input parameters: criterion, maximal depth and confidence. To compare the resulting trees I'm mostly considering:
-Accuracy
-Precision
-Recall
-Class Frequencies, because I'm interested in the classes that are predicted most frequently.

But RapidMiner offers, for example, four criteria (gain_ratio, information_gain, gini_index, accuracy) for the Decision Tree operator (I'm not working with the multiway or weight-based decision trees, ID3 or CHAID). Each criterion generates trees with different class frequencies, accuracy, recall and precision.

I was wondering whether there is some kind of framework for comparing these trees. For example, I can obtain classes with high frequencies but not so high precision, or lower frequencies and higher accuracy. How can such trees be compared?

Regarding the usability of the obtained results:
If following one branch of the tree gives some frequency, and following another gives a slightly higher frequency but involves more variables, maybe the latter is better because it's more informative. But how could I compare them? Which branch is "better" if they are all slightly different, or if all of them are quite similar in frequency?

Thanks
Karen

wessel · March 2011

gain_ratio, information_gain, gini_index and accuracy are measures used to decide which attribute to split on, given the examples that reach the current node.
They are parameters of the decision tree learner.
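To make the split criteria concrete, here is a minimal Python sketch of three of them, computed for one candidate split. It uses the textbook definitions; RapidMiner's internal implementation may differ in details, and the function names and the toy labels are my own assumptions, not anything from the tool.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

def gain_ratio(parent, children):
    """Information gain divided by the entropy of the split sizes."""
    n = len(parent)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children if ch)
    return information_gain(parent, children) / split_info if split_info > 0 else 0.0

def gini_reduction(parent, children):
    """Gini impurity of the parent minus the size-weighted Gini of the children."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)

# Hypothetical node with 4 "yes" and 4 "no" examples, split into two branches.
parent = ["yes"] * 4 + ["no"] * 4
children = [["yes"] * 3 + ["no"], ["yes"] + ["no"] * 3]

print(round(information_gain(parent, children), 3))  # 0.189
print(round(gain_ratio(parent, children), 3))        # 0.189 (split info is 1 bit here)
print(round(gini_reduction(parent, children), 3))    # 0.125
```

The learner evaluates a score like this for every candidate attribute at a node and keeps the best one, so the criterion changes which tree gets grown, not how the finished tree is judged.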

Accuracy, precision and recall are performance measures, computed on some test set. RapidMiner provides them in the Performance (Classification) operator.
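For comparison, the performance measures are computed from true versus predicted labels on held-out data. The sketch below also reports how often each class is predicted, since that is the "class frequency" Karen is looking at. The function names and the six toy labels are assumptions made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_report(y_true, y_pred):
    """Precision, recall and predicted frequency for every class label."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        report[label] = {
            # Convention chosen here: report 0.0 when the ratio is undefined.
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "predicted_frequency": (tp + fp) / len(y_pred),
        }
    return report

# Hypothetical test-set labels and the predictions of one tree.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]

print(round(accuracy(y_true, y_pred), 3))  # 0.667
for label, row in per_class_report(y_true, y_pred).items():
    print(label, row)
```

In this toy case class "a" is predicted for half of the examples but only two of those three predictions are correct, which is the "high frequency, not so high precision" situation from the question.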

There are several frameworks for comparing models.
The one best suited to trees, in my opinion, is the Minimum Description Length (MDL) framework.
http://en.wikipedia.org/wiki/Minimum_description_length
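As a rough illustration of the MDL idea, a tree can be scored by the bits needed to describe the tree plus the bits needed to describe the examples it gets wrong. The sketch below uses a deliberately crude two-part code. The coding scheme, the function name description_length and the example numbers are all assumptions for illustration; this is neither a RapidMiner feature nor a standard published MDL code for trees.

```python
import math

def description_length(n_nodes, n_attributes, n_examples, n_errors, n_classes):
    """Crude two-part code length (in bits) for a tree and its errors."""
    # Model part (assumed code): every node names either an attribute to
    # test or a class to predict.
    model_bits = n_nodes * math.log2(n_attributes + n_classes)
    # Data part (assumed code): say which examples are misclassified, then
    # give the correct class for each of them.
    which_bits = math.log2(math.comb(n_examples, n_errors))
    class_bits = n_errors * math.log2(max(n_classes - 1, 1))
    return model_bits + which_bits + class_bits

# Hypothetical comparison: a small tree with more errors against a large
# tree with fewer errors, both on 1000 examples, 10 attributes, 2 classes.
small_tree = description_length(5, 10, 1000, 120, 2)
large_tree = description_length(60, 10, 1000, 100, 2)
print(small_tree, large_tree)
```

Under this scheme the tree with the shorter total description is preferred, so extra nodes only pay off when they remove enough errors to cover their own cost.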

rakirk · April 2011

Hi Karen,

I agree with everything wessel said. I wanted to add that a typical trade-off analysis for learners in general (decision trees are no exception) compares a model's accuracy on the data it was trained on with its accuracy at classifying new data. A model that generalizes better is preferable for predictive analysis. A more accurate, more specialized model is good for understanding one particular data set. Limiting the tree depth is (in my opinion) probably the fastest way to explore these trade-offs.

regards,
rk
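The depth experiment can be tried outside RapidMiner with a small self-contained Python sketch. Everything in it is an assumption for illustration: the tiny Gini-based tree learner, the synthetic two-attribute data with 15% label noise, and the chosen depths. It only shows the shape of the comparison, training accuracy against accuracy on new data for each depth limit.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows):
    """Return (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini([r[-1] for r in rows])
    for f in range(len(rows[0]) - 1):
        for t in sorted({r[f] for r in rows}):
            left = [r[-1] for r in rows if r[f] <= t]
            right = [r[-1] for r in rows if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:  # require a real improvement
                best, best_score = (f, t), score
    return best

def build(rows, max_depth):
    """Grow a tree; a leaf is a class label, an inner node is a tuple."""
    majority = Counter(r[-1] for r in rows).most_common(1)[0][0]
    split = best_split(rows) if max_depth > 0 else None
    if split is None:
        return majority
    f, t = split
    left = [r for r in rows if r[f] <= t]
    right = [r for r in rows if r[f] > t]
    return (f, t, build(left, max_depth - 1), build(right, max_depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def accuracy(tree, rows):
    return sum(predict(tree, r) == r[-1] for r in rows) / len(rows)

rng = random.Random(0)

def make_rows(n):
    """Synthetic data: class depends on x + y, with 15% of labels flipped."""
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = "pos" if x + y > 1.0 else "neg"
        if rng.random() < 0.15:
            label = "neg" if label == "pos" else "pos"
        rows.append((x, y, label))
    return rows

train, test = make_rows(200), make_rows(200)
results = {}
for depth in (1, 2, 4, 8, 16):
    tree = build(train, depth)
    results[depth] = (accuracy(tree, train), accuracy(tree, test))
    print(depth, results[depth])
```

Training accuracy can only stay equal or go up as the depth limit grows, because each deeper tree refines the shallower one. The depth at which accuracy on the new data stops improving is usually the more useful number for prediction.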
