
"Problem with decision tree algorithm"

szymekszymek Member Posts: 3 Contributor I
edited June 2019 in Help
hi,

I tried to run the Decision Tree algorithm in RapidMiner and it does not seem to give the correct result. I am not sure whether the problem is caused by the implementation of the algorithm or whether there is another reason. Below is the exercise that I tried to run with RM.

I use the following data (A and B are nominal, binary attributes and there are two classes: + and -):
A,B,Class
T,F,+
T,T,+
T,T,+
T,F,-
T,T,+
F,F,-
F,F,-
F,F,-
T,T,-
T,F,-

I want to build a decision tree using the Gini index as the splitting criterion. RapidMiner selects attribute A as the best one for splitting. However, when I do the calculations manually, B seems to be better. Do you know where the difference comes from? Below are my calculations:
The overall Gini index before splitting is:
$G_{orig} = 1 - 0.4^2 - 0.6^2 = 0.48$

The gain in Gini after splitting on A is:
$G_{A=T} = 1 - (4/7)^2 - (3/7)^2 = 0.4898$
$G_{A=F} = 0$
$\Delta = G_{orig} - \frac{7}{10} G_{A=T} - \frac{3}{10} G_{A=F} = 0.1371$

The gain in Gini after splitting on B is:
$G_{B=T} = 1 - (1/4)^2 - (3/4)^2 = 0.3750$
$G_{B=F} = 1 - (1/6)^2 - (5/6)^2 = 0.2778$
$\Delta = G_{orig} - \frac{4}{10} G_{B=T} - \frac{6}{10} G_{B=F} = 0.1633$

Therefore, attribute B should be chosen to split the node (and not A as calculated by RM).
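
For anyone who wants to double-check the arithmetic, the manual calculation can be reproduced with a short Python sketch. The names ROWS, gini and gini_gain are my own illustrative helpers and have nothing to do with RapidMiner's code. The script only verifies the hand computation on the ten rows listed earlier; it does not model how RapidMiner evaluates its criterion internally.

```python
from collections import Counter

# Rows copied from the data listed in this post: (A, B, Class)
ROWS = [
    ("T", "F", "+"),
    ("T", "T", "+"),
    ("T", "T", "+"),
    ("T", "F", "-"),
    ("T", "T", "+"),
    ("F", "F", "-"),
    ("F", "F", "-"),
    ("F", "F", "-"),
    ("T", "T", "-"),
    ("T", "F", "-"),
]


def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(rows, attr_index):
    """Parent Gini minus the size-weighted Gini of each child partition."""
    gain = gini([row[-1] for row in rows])
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        gain -= len(subset) / len(rows) * gini(subset)
    return gain


gain_a = gini_gain(ROWS, 0)  # split on attribute A
gain_b = gini_gain(ROWS, 1)  # split on attribute B
print(f"gain(A) = {gain_a:.4f}")
print(f"gain(B) = {gain_b:.4f}")
```

Run as is, this prints gain(A) = 0.1371 and gain(B) = 0.1633, which matches the numbers above.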

regards,
Szymon

Answers

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Thank you for this hint. We will check that, but it might take some time.

    Greetings,
    Sebastian