Information gain calculation

slayer_666 Member Posts: 1 Contributor I
edited November 2018 in Help
Hi,
    I am new to RapidMiner and would be extremely grateful for any help. I am trying to get "information gain" numbers for a dataset (based on a numerical classifier). I can't use most decision trees, since I am using a numerical classifier, and I can't figure out how to get hold of the information gain numbers (or the gain ratio numbers). Kindly let me know how to obtain them. For example, I want to know which attributes give the maximum information gain about, say, salary figures. There is something called information gain weight, but it has a maximum value of 1 and I can't even specify the classifying attribute. Bottom line:

a) How do I set/specify the classifying attribute when classifying with decision trees?
b) How do I get hold of the information gain and gain ratio numbers specifically?

Regards,
Vikram

Answers

  • dan_agape Member Posts: 106 Maven
    Hi,

    To get started with RapidMiner, you can use the introductory tutorials accessible directly from the software (see Help in the menu), on the software webpage, or on YouTube.

    You may also need an introduction to data mining or machine learning, as the terms you use suggest some confusion about the respective notions. Without a proper introduction, working in the area may be a frustrating experience, as it is ultimately quite technical.

    Returning to your questions: use the Set Role operator to choose the output attribute in RM (called the label here).

    Note that when the output attribute is nominal (or categorical) then this defines a classification problem (so the model is a classifier); when the output attribute is numeric, we speak about estimation (or regression). Note that the output attribute is always indicated by the problem to solve, assuming it is a supervised learning problem.

    Finally, information gain and gain ratio are among the criteria available for choosing the most predictive input attributes when building a decision tree. The information gain or the gain ratio (depending on which one was chosen) is not displayed to the user; it is only used in the recursive process that generates the tree. If you want to see your input attributes evaluated via these criteria, use the attribute weighting operators (Weight by Information Gain or Weight by Information Gain Ratio). The weights are precisely the information gains (or the gain ratios, respectively) of the attributes with respect to the label attribute and the current dataset. Note that building the decision tree usually involves repeated calculations of these weights: the current dataset gets smaller and smaller going deeper into the tree, and at each node the predictiveness of the available attributes must be re-evaluated in order to choose the best one for that node and its corresponding dataset.

    Dan
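
For readers who want to check the numbers by hand, here is a minimal Python sketch of the two criteria described in the answer above. It is not RapidMiner code and makes no claim to reproduce the operators' exact output (for instance, weighting operators may rescale their weights, so values need not match digit for digit). The toy rows, the attribute names `education` and `city`, and the nominal `salary` label ("high"/"low") are all made up for illustration; a numeric salary would first have to be turned into nominal bands for these formulas to apply.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def information_gain(rows, attribute, label):
    """Label entropy minus the weighted label entropy after splitting on `attribute`."""
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def gain_ratio(rows, attribute, label):
    """Information gain divided by the entropy of the attribute itself (split information)."""
    split_info = entropy([r[attribute] for r in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(rows, attribute, label) / split_info

# Hypothetical toy data: "salary" plays the role of the label.
rows = [
    {"education": "degree", "city": "north", "salary": "high"},
    {"education": "degree", "city": "south", "salary": "high"},
    {"education": "degree", "city": "north", "salary": "high"},
    {"education": "school", "city": "north", "salary": "high"},
    {"education": "school", "city": "south", "salary": "low"},
    {"education": "school", "city": "north", "salary": "low"},
]

# Evaluate every input attribute against the label on the full dataset.
for attribute in ("education", "city"):
    print(attribute,
          round(information_gain(rows, attribute, "salary"), 4),
          round(gain_ratio(rows, attribute, "salary"), 4))

# Deeper in a tree the dataset shrinks, so the same attribute is re-evaluated
# on the subset that reached the node (here: the "school" branch).
school_rows = [r for r in rows if r["education"] == "school"]
print("city within school branch",
      round(information_gain(school_rows, "city", "salary"), 4))
```

On this toy data `education` scores far higher than `city` on the full dataset, while `city` becomes more informative once only the `school` rows remain, which is the re-evaluation per node mentioned above.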