Weighting and nominal attributes

kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 258   Unicorn
edited November 2018 in Help

Hello there rapidminers :)  

 

A small question about the process that Auto Model uses for weighting attributes.

It has a section that processes nominal attributes, namely by dummy coding them:

 

Screenshot 2018-05-30 16.58.40.png

 

The question is: why is this done specifically for the weighting process?

And how should one then interpret the weighting results?

For example, here are the results from an IP traffic classification, with and without dummy coding. As one can see, the weights for binominal categories are exactly the same in both cases, but how should one interpret the specific values that show up in the first case (all false except for cat_spam = true)?

 

Screenshot 2018-05-30 17.05.58.png: Weights with dummy coding

Screenshot 2018-05-30 17.06.29.png: Weights without dummy coding

 (kindly tagging @IngoRM)

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 1,979  RM Data Scientist

    Hey,

     

    Keep in mind that these weights are Pearson's rhos (correlation coefficients). Pearson correlation is only defined for numerical values, so you can't apply this method to nominals directly and need to convert them with dummy coding first.

     

    Cheers!

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
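
As a rough illustration of the point above, here is a minimal sketch of dummy coding a nominal attribute and then correlating each resulting 0/1 column with the label. The data, the `dummy_code` helper and the column naming are made up for this example; it is not the Auto Model process itself, only the general idea behind it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def dummy_code(name, values):
    """Turn one nominal column into 0/1 columns named 'name = level'."""
    return {
        f"{name} = {level}": [1 if v == level else 0 for v in values]
        for level in sorted(set(values))
    }


# Hypothetical toy data: one nominal attribute and a binary label (1 = true).
category = ["spam", "news", "spam", "shop", "news", "spam"]
label = [1, 0, 1, 0, 0, 1]

# A correlation cannot be computed on the strings themselves,
# but it can be computed on each dummy column.
dummies = dummy_code("category", category)
correlations = {col: pearson(vals, label) for col, vals in dummies.items()}

for col, r in correlations.items():
    print(f"{col}: r = {r:+.3f}")
```

Each nominal value ends up with its own correlation, which is why the weight table with dummy coding lists entries such as `cat_spam = true` rather than one row per original attribute.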
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 258   Unicorn

    Aah yes exactly :) 

    Still, there's a question of interpretation: I struggle to explain the relation between these true/false values and the label. In my example, does 'cat_reputation = false' support or contradict 'label = true'? Or, given the rather low correlation value from the correlation matrix (0.099), is it just 'the most important predictor' among the others while still being quite weak?

     

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 1,979  RM Data Scientist

    Hi,

     

    I think it should support it, if I got it right. But the weights are normalized to 1, so they are all relative to the most influential factor.

     

    Best,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
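
To make the two points in this reply concrete, here is a small sketch under stated assumptions. It assumes the weight is the absolute value of the Pearson correlation divided by the largest absolute value, which this thread does not confirm in detail. The 0.099 figure is the one quoted in the question; every other number and the `normalize_weights` helper are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def normalize_weights(correlations):
    """Assumed scheme: absolute correlation divided by the largest one."""
    largest = max(abs(r) for r in correlations.values())
    return {name: abs(r) / largest for name, r in correlations.items()}


# Part 1: the two dummy columns of a binominal attribute mirror each other.
# Hypothetical data, 1 = true and 0 = false.
reputation_true = [0, 0, 1, 0, 1, 0, 0, 1]
reputation_false = [1 - v for v in reputation_true]
label = [1, 1, 0, 1, 0, 0, 1, 1]

r_true = pearson(reputation_true, label)
r_false = pearson(reputation_false, label)
# Same magnitude, opposite sign: only the sign says which value goes
# together with label = true, and a normalized weight no longer shows it.
print(f"r(reputation = true)  = {r_true:+.3f}")
print(f"r(reputation = false) = {r_false:+.3f}")

# Part 2: a weak correlation still gets weight 1.0 if it is the largest.
# 0.099 is the value quoted in the thread, the other two are made up.
raw = {
    "cat_reputation = false": 0.099,
    "hypothetical_attr_a": -0.040,
    "hypothetical_attr_b": 0.010,
}
weights = normalize_weights(raw)
for name, w in weights.items():
    print(f"{name}: weight = {w:.3f}")
```

Under these assumptions, a positive raw correlation for `cat_reputation = false` would mean that value tends to occur together with `label = true`, while the weight of 1.0 only says it is the strongest of the candidates, not that it is strong in absolute terms. The sign has to be read from the correlation matrix, not from the weight table.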
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor Posts: 2,188  Community Manager

    tagging @IngoRM if he's available...

     

     
