Difference between InfoGainRatio and W-GainRatioAttributeEval

D_MD_M Member Posts: 15 Maven
edited November 2018 in Help
Hi,

What is the difference between the InfoGainRatioWeighting and W-GainRatioAttributeEval operators? Do both of them measure the gain ratio of an attribute with respect to the class?

Answers

  • RalfKlinkenbergRalfKlinkenberg Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, Unconfirmed, University Professor Posts: 68 RM Founder
    Hi D.M.,

    yes, both compute the same measure.

    InfoGainRatio is the Rapid-I/RapidMiner implementation and W-GainRatioAttributeEval is the Weka implementation of this metric.

    Best regards,
    Ralf
  • D_MD_M Member Posts: 15 Maven
    Thanks Ralf for replying.

    But the results I am getting are different for the two operators.

    I am using each of them along with AttributeWeightSelection to do attribute selection.

    The attributes selected by AttributeWeightSelection are different in the two cases,
    although I am using the same values for all AttributeWeightSelection parameters in both cases
    [weight = 0.0, weight_relation = greater, and all other parameters set to their default values].
  • D_MD_M Member Posts: 15 Maven
    Let me post the process chain that I am using. This might help to locate the problem.

    Root
      TextInput
          StringTokenizer
          TokenlengthFilter
      W-GainRatioAttributeEval / InfoGainRatioWeighting
      AttributeWeightSelection
      X-Validation
        LibSVMLearner
        OperatorChain
            ModelApplier
            Performance

    I am using TF-IDF as the feature vector.
  • D_MD_M Member Posts: 15 Maven
    Which operator is correct: W-GainRatioAttributeEval or InfoGainRatioWeighting?
    W-GainRatioAttributeEval is giving me much better results. It is also much faster.

    If someone knows, please reply.
  • IngoRMIngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    For nominal attributes there is no difference. You can see this if you turn off normalization for such a data set. For numerical attributes there is a difference in the result as well as in the runtime: the W-... variant discretizes the values first, whereas the other operator does not and instead checks all possible split points for numerical values. This indeed needs more time but usually delivers better weights for numerical values. Again, just perform the discretization yourself before applying the weighting and there will be no difference at all.

    Cheers,
    Ingo
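
The distinction described above can be illustrated with a small Python sketch. It is not the code of either operator: the function names, the equal-width binning, and the midpoint split search are assumptions chosen only to show why "discretize first" and "check all split points" can give different gain ratios for the same numerical attribute, while a nominal attribute has only one possible result.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute with respect to the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

def gain_ratio_binned(xs, labels, bins=2):
    """Assumed scheme: equal-width discretization, then the nominal formula."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0
    binned = [min(int((x - lo) / width), bins - 1) for x in xs]
    return gain_ratio(binned, labels)

def gain_ratio_best_split(xs, labels):
    """Assumed scheme: try every midpoint between distinct values as a binary split."""
    points = sorted(set(xs))
    best = 0.0
    for a, b in zip(points, points[1:]):
        threshold = (a + b) / 2
        best = max(best, gain_ratio([x <= threshold for x in xs], labels))
    return best

# Hypothetical toy data: the class changes between 2 and 3.
xs = [1, 2, 3, 4, 5, 10]
labels = ["a", "a", "b", "b", "b", "b"]
print(gain_ratio_binned(xs, labels))      # coarse bins miss the class boundary
print(gain_ratio_best_split(xs, labels))  # exhaustive search finds it
```

On this toy data the exhaustive search finds the clean boundary and the two-bin discretization does not, at the price of evaluating one candidate split per distinct value. That matches the runtime difference mentioned above, which would be especially noticeable with many TF-IDF attributes.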
  • D_MD_M Member Posts: 15 Maven
    Thanks Ingo for replying.

    Is there any way I can see the gain ratio values calculated for the various attributes?

    I have put a breakpoint in the W-GainRatioAttributeEval / InfoGainRatioWeighting operator. It shows the 'range' column and 'statistics'. But I guess these are not the gain ratio values calculated for an attribute, because I am using AttributeWeightSelection with weight = 0.0 and weight_relation = greater, and the attributes that get pruned do not correspond to the values in these columns.
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    the weighting operators return two results: the input example set and a weights object containing the calculated weights. Both objects will be shown in the result view as tabs. If you switch to the AttributeWeights tab, you will see a table with the (normalized) weights calculated by InfoGainRatioWeighting. If you want to see the original info gain ratio, you should turn off the normalization by deselecting the "normalization" parameter of the operator.

    Greetings,
      Sebastian
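
As a rough illustration of why normalized weights can interact with a `weight > 0.0` selection, here is a minimal Python sketch. It assumes the normalization is a min-max rescaling to the range 0 to 1. That is an assumption for illustration, not something confirmed in this thread, and the helper names and weight values are hypothetical.

```python
def min_max_normalize(weights):
    """Assumed normalization: rescale weights linearly so min -> 0 and max -> 1."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # Sketch choice: leave the weights unchanged if they are all equal.
        return dict(weights)
    return {name: (w - lo) / (hi - lo) for name, w in weights.items()}

def select_greater(weights, threshold=0.0):
    """Keep attributes whose weight is strictly greater than the threshold."""
    return [name for name, w in weights.items() if w > threshold]

# Hypothetical raw gain ratios for three attributes.
raw = {"a": 0.2, "b": 0.5, "c": 0.8}
normalized = min_max_normalize(raw)

print(select_greater(raw))         # all three attributes pass
print(select_greater(normalized))  # "a" is rescaled to 0.0 and is dropped
```

Under that assumption, the attribute with the smallest raw weight always ends up at exactly 0.0 after normalization, so a "greater than 0.0" selection removes it even though its raw gain ratio was positive.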