Information Gain Vs Gain Ratio
Hi Guys,
I'm working through a classification decision tree model in RapidMiner and got stuck on an experiment with information gain and gain ratio calculations, after reading the following descriptions:
Information gain: works fine for most cases, unless you have a few variables with a large number of values (or classes).
Information gain is biased towards choosing attributes with a large number of values as root nodes.
Gain ratio: a modification of information gain that reduces its bias and is usually the best option. Gain ratio overcomes the problem with information gain by taking into account the number of branches that would result before making the split.
It corrects information gain by taking the intrinsic information of a split into account.
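The two descriptions above can be made concrete with a minimal Python sketch of the standard definitions: entropy, information gain, intrinsic information (the entropy of an attribute's own values), and gain ratio as their quotient. The function names and the eight-row toy data set, with a unique row ID next to a two-valued attribute, are hypothetical and only illustrate the bias: information gain ranks the ID first, while gain ratio ranks the genuinely predictive attribute first.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (bits) of the value distribution of a list."""
    n = len(items)
    return sum(c / n * log2(n / c) for c in Counter(items).values())

def information_gain(values, labels):
    """Class entropy before the split minus the size-weighted entropy after it."""
    n = len(labels)
    after = 0.0
    for v in set(values):
        branch = [y for x, y in zip(values, labels) if x == v]
        after += len(branch) / n * entropy(branch)
    return entropy(labels) - after

def intrinsic_information(values):
    """Split information: entropy of the attribute's own values (ignores the labels)."""
    return entropy(values)

def gain_ratio(values, labels):
    """Information gain normalised by intrinsic information (0 for a one-valued attribute)."""
    split = intrinsic_information(values)
    return information_gain(values, labels) / split if split > 0 else 0.0

# Hypothetical toy data: 8 rows, a unique ID and a two-valued attribute.
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "yes"]
row_id = [f"r{i}" for i in range(1, 9)]
windy = ["f", "f", "f", "f", "t", "t", "t", "t"]

for name, attr in [("row_id", row_id), ("windy", windy)]:
    print(f"{name}: gain={information_gain(attr, labels):.3f} "
          f"ratio={gain_ratio(attr, labels):.3f}")
# row_id: gain=0.954 ratio=0.318
# windy: gain=0.549 ratio=0.549
```

The ID attribute gets the highest information gain simply because every one of its branches is pure; dividing by its log2(8) = 3 bits of intrinsic information is what pushes it below the predictive attribute.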
When I use the RapidMiner operator "Weight by Information gain ratio" on the sample data below, the gain ratio it calculates for Outlook is quite different from my manual calculation.
Sno   Outlook    Play
----  --------   ----------
A1    overcast   Dont Play
B2    overcast   Play
C3    rain       Play
D4    rain       Play
E5    rain       Play
My manual gain ratio for Sno (0.31) matches RapidMiner's value (0.310917507 ~ 0.31), but my manual gain ratio for Outlook (0.13) does not match RapidMiner's value (0.331559707 ~ 0.33).
Here are my calculations for the gain ratio:
Entropy of Play within each Outlook value
H(Outlook = overcast): 1 Dont Play, 1 Play
= -1/2 log2(1/2) - 1/2 log2(1/2)
= -0.5(-1) - 0.5(-1)
= 1
H(Outlook = rain): 3 Play
= -3/3 log2(3/3)
= -1(0)
= 0
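The per-value entropies above can be checked with a small Python sketch using only the standard library. The `entropy` helper is a hypothetical name; it writes each term as p*log2(1/p), which is the same as -p*log2(p) but avoids printing -0.0 for a pure branch.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits; each term p*log2(1/p) equals -p*log2(p)."""
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())

# Play labels of the rows in each Outlook branch of the sample table
overcast = ["Dont Play", "Play"]   # rows A1, B2
rain = ["Play", "Play", "Play"]    # rows C3, D4, E5

print(entropy(overcast))  # 1.0
print(entropy(rain))      # 0.0
```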
-----------------------------------------------------------------------
Expected entropy after splitting on Outlook (each branch weighted by its share of the rows)
I(Outlook) = 2/5*(1) + 3/5*(0)
= 0.4
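The weighting step can be sketched the same way: each branch's entropy is multiplied by the fraction of rows that fall into it, and the results are summed. The function name `entropy_after_split` is hypothetical; the columns are copied from the sample table.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())

def entropy_after_split(values, labels):
    """Size-weighted average of the class entropy in each branch of the split."""
    n = len(labels)
    total = 0.0
    for v in set(values):
        branch = [y for x, y in zip(values, labels) if x == v]
        total += len(branch) / n * entropy(branch)
    return total

outlook = ["overcast", "overcast", "rain", "rain", "rain"]
play = ["Dont Play", "Play", "Play", "Play", "Play"]
print(entropy_after_split(outlook, play))  # 0.4
```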
-----------------------------------------------------------------------
Entropy for the Sno attribute (each of its five values occurs once in five rows)
H(A1) = -1/5 log2(1/5) = 0.4644
H(B2) = -1/5 log2(1/5) = 0.4644
H(C3) = -1/5 log2(1/5) = 0.4644
H(D4) = -1/5 log2(1/5) = 0.4644
Likewise, H(E5) = 0.4644
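A quick numeric check of this step: -1/5 log2(1/5) is about 0.4644, and adding the five equal terms gives log2(5), about 2.3219, which is the entropy of Sno's value distribution. The `entropy` helper is a hypothetical name; only the standard library is used.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy in bits of the value distribution of a list."""
    n = len(items)
    return sum(c / n * log2(n / c) for c in Counter(items).values())

sno = ["A1", "B2", "C3", "D4", "E5"]
term = 1 / 5 * log2(5)          # same as -1/5 log2(1/5), one term per distinct value
print(round(term, 4))           # 0.4644
print(round(entropy(sno), 4))   # 2.3219, i.e. 5 * 0.4644 = log2(5)
```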
------------------------------------------------------------------------------
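Expected entropy after splitting on Sno (every branch holds a single row, so each branch is pure)
I(Sno)
= 1/5*(-1/1 log2(1/1)) + 1/5*(-1/1 log2(1/1)) + 1/5*(-1/1 log2(1/1)) + 1/5*(-1/1 log2(1/1)) + 1/5*(-1/1 log2(1/1))
= 0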
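The same conclusion in code: with Sno as the split attribute every branch contains exactly one row, so the weighted entropy after the split is zero. A minimal sketch; the `entropy` helper (a hypothetical name) is redefined so the block runs on its own.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())

sno = ["A1", "B2", "C3", "D4", "E5"]
play = ["Dont Play", "Play", "Play", "Play", "Play"]

# Splitting on Sno puts exactly one row in each branch, so every branch is pure.
branches = {v: [y for x, y in zip(sno, play) if x == v] for v in sno}
after = sum(len(b) / len(play) * entropy(b) for b in branches.values())
print(after)  # 0.0
```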
------------------------------------------------------------------------------
I(Outlook, no partition): entropy of Play before any split (1 Dont Play, 4 Play)
I(Outlook, no partition) = -1/5 log2(1/5) - 4/5 log2(4/5)
= -0.2*(-2.32192809) - 0.8*(-0.321928095)
= 0.464385618 + 0.257542476
= 0.72
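The entropy before any split depends only on the Play column, so it is the same starting point for every attribute. A short numeric check in Python (standard library only):

```python
from math import log2

# Class distribution of Play before any split: 1 "Dont Play" and 4 "Play" out of 5 rows
p_dont, p_play = 1 / 5, 4 / 5
h_before = -p_dont * log2(p_dont) - p_play * log2(p_play)
print(round(h_before, 4))  # 0.7219
```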
-----------------------------------------------------------------------------
Information gain (entropy before - entropy after) for Outlook
Gain(Outlook) = I(Outlook, no partition) - I(Outlook) = 0.72 - 0.4
= 0.32
Information gain (entropy before - entropy after) for Sno
Gain(Sno) = I(Outlook, no partition) - I(Sno) = 0.72 - 0
= 0.72
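Both gains can be recomputed end to end from the raw columns with a short sketch (hypothetical helper names, standard library only). Unrounded, they come out at about 0.3219 for Outlook and 0.7219 for Sno, which the steps above round to 0.32 and 0.72.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy before the split minus the size-weighted entropy after it."""
    n = len(labels)
    after = 0.0
    for v in set(values):
        branch = [y for x, y in zip(values, labels) if x == v]
        after += len(branch) / n * entropy(branch)
    return entropy(labels) - after

play = ["Dont Play", "Play", "Play", "Play", "Play"]
outlook = ["overcast", "overcast", "rain", "rain", "rain"]
sno = ["A1", "B2", "C3", "D4", "E5"]
print(round(information_gain(outlook, play), 4))  # 0.3219
print(round(information_gain(sno, play), 4))      # 0.7219
```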
------------------------------------------------------------------------------
Gain ratio:
Intrinsic information = 5*(-1/5*log2(1/5))
= 5*(-0.2*(-2.32))
= 5*(0.464)
= 2.32
Gain Ratio (Outlook) = Gain(Outlook) / Intrinsic information
= 0.32/2.32
= 0.13
Gain Ratio (Sno) = Gain(Sno) / Intrinsic information
= 0.72/2.32
= 0.31
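For comparison, here is a sketch that follows the C4.5 definition of gain ratio, in which the intrinsic (split) information is the entropy of each attribute's own value distribution, computed separately per attribute. Treating that as the intended definition is an assumption; the sketch says nothing about how the RapidMiner operator is implemented internally. Under it, Sno keeps 2.32 as its intrinsic information, while Outlook (2 overcast, 3 rain) gets about 0.971.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy in bits of the value distribution of a list."""
    n = len(items)
    return sum(c / n * log2(n / c) for c in Counter(items).values())

def information_gain(values, labels):
    """Entropy before the split minus the size-weighted entropy after it."""
    n = len(labels)
    after = 0.0
    for v in set(values):
        branch = [y for x, y in zip(values, labels) if x == v]
        after += len(branch) / n * entropy(branch)
    return entropy(labels) - after

def gain_ratio(values, labels):
    """Information gain divided by this attribute's own intrinsic information."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

play = ["Dont Play", "Play", "Play", "Play", "Play"]
outlook = ["overcast", "overcast", "rain", "rain", "rain"]
sno = ["A1", "B2", "C3", "D4", "E5"]

for name, attr in [("Sno", sno), ("Outlook", outlook)]:
    # name, intrinsic information, gain ratio
    print(name, round(entropy(attr), 4), round(gain_ratio(attr, play), 4))
# Sno 2.3219 0.3109
# Outlook 0.971 0.3316
```

In this sketch the Outlook numerator is the same 0.32 as in the manual steps; only the denominator changes, from 2.32 to about 0.971, and the resulting ratios coincide with the RapidMiner figures quoted in this post.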
RapidMiner gain ratio calculation ("Weight by Information gain ratio" operator):
Sno      0.310917507 ~ 0.31
Outlook  0.331559707 ~ 0.33
Why is this so?
Thanks
Sid