"Problem with decision tree algorithm"
Hi,
I tried to run the Decision Tree algorithm in RapidMiner and it does not seem to give a correct result. I am not sure whether the problem is caused by the implementation of the algorithm or whether there is another reason. Below is the exercise that I tried to run with RM.
I use the following data (A and B are nominal, binary attributes and there are two classes: + and -):
A,B,Class
T,F,+
T,T,+
T,T,+
T,F,-
T,T,+
F,F,-
F,F,-
F,F,-
T,T,-
T,F,-
I want to build a decision tree using the Gini index as the splitting criterion. RapidMiner selects attribute A as the best one for splitting. However, if I do the calculations manually, B seems to be better. Do you know where the difference comes from? Below are my calculations:
The overall gini before splitting is:
$G_{\text{orig}} = 1 - 0.4^2 - 0.6^2 = 0.48$
The gain in gini after splitting on A is:
$G_{A=T} = 1 - (4/7)^2 - (3/7)^2 = 0.4898$
$G_{A=F} = 0$
$\Delta = G_{\text{orig}} - \frac{7}{10} G_{A=T} - \frac{3}{10} G_{A=F} = 0.1371$
The gain in gini after splitting on B is:
$G_{B=T} = 1 - (1/4)^2 - (3/4)^2 = 0.3750$
$G_{B=F} = 1 - (1/6)^2 - (5/6)^2 = 0.2778$
$\Delta = G_{\text{orig}} - \frac{4}{10} G_{B=T} - \frac{6}{10} G_{B=F} = 0.1633$
Therefore, attribute B should be chosen to split the node (and not A as calculated by RM).
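The hand calculation can be double-checked with a short script. This is only a minimal sketch in plain Python: the helper names `gini` and `gini_gain` are made up for illustration and are not RapidMiner functions, and the script says nothing about how RapidMiner computes its own criterion internally. It simply applies the textbook definition (parent impurity minus the size-weighted impurity of the child nodes) to the ten rows listed above.

```python
from collections import Counter

# Rows copied from the post, in order: (A, B, Class)
ROWS = [
    ("T", "F", "+"),
    ("T", "T", "+"),
    ("T", "T", "+"),
    ("T", "F", "-"),
    ("T", "T", "+"),
    ("F", "F", "-"),
    ("F", "F", "-"),
    ("F", "F", "-"),
    ("T", "T", "-"),
    ("T", "F", "-"),
]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, attr_index):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    parent = gini([row[-1] for row in rows])
    # Group the class labels by the value of the chosen attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


gain_a = gini_gain(ROWS, 0)  # split on attribute A
gain_b = gini_gain(ROWS, 1)  # split on attribute B
print(f"gain(A) = {gain_a:.4f}")  # gain(A) = 0.1371
print(f"gain(B) = {gain_b:.4f}")  # gain(B) = 0.1633
```

Under this definition the script reproduces the manual figures, so B has the larger gain on this data.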
regards,
Szymon
Answers
Thank you for this hint. We will check that, but it might take some time.
Greetings,
Sebastian