My Decision Tree Shows only one Node
Question
I've trained a decision tree, but instead of a real tree I get only a single node/leaf. Why?
Answer
Decision trees are known to be prone to overtraining (overfitting). To keep a tree from becoming overtrained, the learner has options to prune and preprune it. Pruning cuts away leaves that are not statistically meaningful after the tree has been built. Prepruning prevents such leaves from being built in the first place.
If you have only one node in your tree, it is very likely that the default pruning options are preventing the tree from growing. A drastic way to change this is to deactivate pruning and prepruning. A gentler way is to loosen the requirements for the cuts (splits).
The two most important settings here are:
minimal_gain: specifies how good a cut needs to be for it to actually be executed. A value of 0 means all cuts are made, while a value of 1 means only cuts that make both resulting leaves pure are executed. The default setting of 0.1 is a strict requirement. Common values for minimal gain lie between 0.001 and 0.1.
confidence: a statistical measure, based on the binomial distribution, that determines which branches are pruned away after the tree has been built. 0.25 is a reasonable value, but you might reduce it further to get a bigger tree.
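To see why a minimal gain of 0.1 can be hard to reach, here is a small Python sketch (outside RapidMiner) that computes the information gain of one cut by hand. It assumes plain information gain measured in bits. The gain that the Decision Tree operator actually compares against minimal_gain depends on the chosen criterion and may be computed differently. The class counts and the cut are made-up numbers for illustration.

```python
from math import log2


def entropy(class_counts):
    """Shannon entropy in bits of a list of per-class example counts."""
    total = sum(class_counts)
    return -sum(c / total * log2(c / total) for c in class_counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical data: 1000 examples as [No, Yes] counts, 95% No and 5% Yes.
parent = [950, 50]
# Hypothetical cut that concentrates Yes in the left branch (15% vs. 2.5%).
left, right = [170, 30], [780, 20]

gain = information_gain(parent, [left, right])
upper_bound = entropy(parent)  # no cut can gain more than the parent entropy

for minimal_gain in (0.1, 0.01, 0.001):
    verdict = "cut is made" if gain >= minimal_gain else "cut is rejected"
    print(f"minimal_gain={minimal_gain}: gain={gain:.4f} -> {verdict}")
print(f"largest possible gain on this data: {upper_bound:.4f}")
```

Under these assumptions the cut gains roughly 0.03 bits, and even a perfect cut could gain at most about 0.29 bits. A threshold of 0.1 therefore rejects the cut, while 0.01 or 0.001 lets the tree grow.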
As usual, you should use proper validation (e.g. X-Validation with a hold-out sample) to measure performance.
Dortmund, Germany
Comments
Hi, I have the same problem. My target value / label has two values, Yes / No, and 95% of the values are No. The output is one node: class No. But I want to find the combination of variables that leads to the answer Yes. Please help.
RasTo
Dear Rasto,
Welcome to the community! The problem you are facing is not caused by the decision tree itself, but by your data: you have an unbalanced problem. There are several ways to work with unbalanced problems. The most common techniques are:
Downsampling the majority class: This can be achieved with the Sample operator. You need to use its Balance Data option.
Weighting: You can add weights to your examples, so that a Yes counts 10x a No. The easiest way to balance your classes with weights is the Generate Weight (Stratification) operator. I would recommend trying this first. The Sum of Weights should be set to your sample size.
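The two techniques can be sketched in plain Python to show what they do to the data. This is only an illustration under stated assumptions, not the implementation of the Sample or Generate Weight (Stratification) operators: the function names, the 95/5 label column, and the rule "every class gets the same total weight" are assumptions made for the example.

```python
import random
from collections import Counter


def stratification_weights(labels, sum_of_weights=None):
    """Give every class the same total weight; all weights add up to sum_of_weights."""
    if sum_of_weights is None:
        sum_of_weights = len(labels)  # sample size, as suggested above
    counts = Counter(labels)
    per_class = sum_of_weights / len(counts)
    return [per_class / counts[label] for label in labels]


def downsample_majority(labels, seed=0):
    """Return indices that keep as many examples per class as the rarest class has."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_keep = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


labels = ["No"] * 95 + ["Yes"] * 5  # hypothetical label column, 95% No

weights = stratification_weights(labels)
print("weight of a No:", weights[0], "weight of a Yes:", weights[-1])

kept = downsample_majority(labels)
print("class counts after downsampling:", Counter(labels[i] for i in kept))
```

With these made-up numbers, downsampling keeps only 10 of the 100 examples, while weighting keeps all of them and lets both classes contribute the same total weight. That is one reason to try weighting first.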
Best,
Martin