# Rapid Miner Gain Ratio Calculation

edited November 2018 in Help
Hi Guys,

While working through a decision tree classification model in RapidMiner, I got stuck on an experiment with the information gain and gain ratio calculations after reading the following descriptions.

Information gain : It works fine for most cases, unless you have a few variables that have a large number of values (or classes).
Information gain is biased towards choosing attributes with a large number of values as root nodes.

Gain ratio : This is a modification of information gain that reduces its bias and is usually the best option. Gain ratio overcomes
the problem with information gain by taking into account the number of branches that would result before making the split.
It corrects information gain by taking the intrinsic information of a split into account.
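
For reference, the descriptions above correspond to the usual C4.5-style definitions, sketched here in notation that is not part of the original descriptions: $S$ is the data set, $A$ an attribute, $S_v$ the subset of rows where $A$ takes value $v$, and $H$ the entropy of the class label. Whether RapidMiner implements exactly these formulas is an assumption.

$$
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
$$

$$
\mathrm{SplitInfo}(S, A) = -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}
$$

$$
\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}
$$

Under these definitions the intrinsic information $\mathrm{SplitInfo}(S, A)$ depends on the attribute $A$ being split on, so each attribute has its own denominator.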

When I use the RapidMiner operator "Weight by Information gain ratio" on the following sample data, the gain ratio it calculates for Outlook is quite different from my manual calculation, shown below.
```
Sno   Outlook    Play
----  --------   ----------
A1    overcast   Dont Play
B2    overcast   Play
C3    rain       Play
D4    rain       Play
E5    rain       Play

Following are my calculations for Gain ratio

Entropy for Outlook
H (Outlook) : Overcast
    -1/2 log2 (1/2) - 1/2 log2 (1/2)
    -0.5 (-1) - 0.5 (-1)
    H (Outlook) : 1
H (Outlook) : Rain
    -3/3 log2 (3/3)
    -1 (0)
    H (Outlook) : 0
-----------------------------------------------------------------------
Information Gain for outlook
    I (Outlook) = 2/5*(1) + 3/5*(0)
                = 0.4
-----------------------------------------------------------------------
Entropy for Sno attribute
H (Sno) : A1    H (A1) = -1/5 log2(1/5)    0.0464
H (Sno) : B2    H (B2) = -1/5 log2(1/5)    0.0464
H (Sno) : C3    H (C3) = -1/5 log2(1/5)    0.0464
H (Sno) : D4    H (D4) = -1/5 log2(1/5)    0.0464
Hence H(E5) = 0.0464
-----------------------------------------------------------------------
Information Gain for Sno attribute
I (Sno) = 1/1*log2(1/1) + 1/1*log2(1/1) + 1/1*log2(1/1) + 1/1*log2(1/1) + 1/1*log2(1/1)
        = 0
-----------------------------------------------------------------------
I (Outlook , no partition)
I (Outlook, no partition) = -1/5 log2 (1/5) - 4/5 log2 (4/5)
                          = -0.2*(-2.32192809) - 0.8(-0.321928095)
                          = 0.464385618 + 0.257542476
                          = 0.72
-----------------------------------------------------------------------
Entropy before - Entropy After for Outlook
I (Outlook, no partition) - I (Outlook) = 0.72 - 0.4
                                        = 0.32
Entropy before - Entropy After for Sno
I (Outlook, no partition) - I (Outlook) = 0.72 - 0
                                        = 0.72
-----------------------------------------------------------------------
Gain Ratio :
Intrinsic information   5*(-1/5*log2(1/5))
                        5*(-0.2(-2.32))
                        5*(0.464)
                        2.32
Gain Ratio (Outlook) = I (Outlook)/Intrinsic information
                     = 0.32/2.32
                     = 0.13
Gain Ratio (Sno) = I (Sno)/Intrinsic information
                 = 0.72/2.32
                 = 0.31
```
The manually calculated "Gain Ratio (Sno)" of 0.31 matches the RapidMiner value (0.310917507 ~ 0.31) shown below, but the manually calculated "Gain Ratio (Outlook)" of 0.13 does not match the RapidMiner value (0.331559707 ~ 0.33).

Rapid Miner Gain ratio calculation
```
Sno      0.310917507 ~ 0.31
Outlook  0.331559707 ~ 0.33
```
Why is that so? I am using the "Weight by Information gain ratio" operator in RapidMiner.

Thanks
Sid
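
As a cross-check on the numbers in this post, here is a minimal Python sketch that recomputes the gain ratio for both attributes of the sample data. It assumes the standard C4.5-style definition (information gain divided by the entropy of the attribute's own value distribution), which may or may not be exactly what the RapidMiner operator does internally. The function names `entropy` and `gain_ratio` are hypothetical helpers written for this illustration, not RapidMiner APIs.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2) of the value distribution in a list."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute, labels):
    """Gain ratio of `attribute` with respect to `labels` (assumed C4.5-style)."""
    n = len(labels)

    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)

    # Weighted entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder

    # Intrinsic (split) information: entropy of the attribute's own values,
    # so it differs from attribute to attribute.
    split_info = entropy(attribute)
    if split_info == 0:
        return 0.0  # single-valued attribute: no split is possible
    return gain / split_info


# Sample data from the post.
sno = ["A1", "B2", "C3", "D4", "E5"]
outlook = ["overcast", "overcast", "rain", "rain", "rain"]
play = ["Dont Play", "Play", "Play", "Play", "Play"]

print(round(gain_ratio(outlook, play), 4))  # 0.3316
print(round(gain_ratio(sno, play), 4))      # 0.3109
```

With this definition, the intrinsic information for Outlook is computed from its own 2/5 and 3/5 split (about 0.971) rather than from the five-way Sno split (2.32), and the resulting 0.3316 and 0.3109 agree with the values RapidMiner reports above.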