Weighting and nominal attributes

kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn
edited June 2019 in Help

Hello there rapidminers  

 

A small question about the process used for weighting attributes within the Auto Model process.

It has a section that processes nominal attributes, namely by performing dummy coding:

 

[Screenshot 2018-05-30 165840.png]

 

The question is: why is this done specifically for the weighting process?

And how should one then interpret the weighting results?

For example, here are results from an IP traffic classification, with and without dummy coding. As one can see, for binominal categories the weights are exactly the same, but how should one interpret the specific values that show up in the first case (all false, except for cat_spam = true)?

 

[Screenshot 2018-05-30 170558.png: Weights with dummy coding]

[Screenshot 2018-05-30 170629.png: Weights without dummy coding]

 (kindly tagging @IngoRM)

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hey,

     

    Keep in mind that these weights are Pearson's rho values, i.e. correlation coefficients. So you can't apply this method to nominals directly; you need to convert them with dummy coding first.

     

    Cheers!

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
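
To illustrate the reply above: Pearson correlation needs numeric inputs, so each nominal value is first turned into its own 0/1 column and then correlated with the label. The following is only a minimal sketch in plain Python, not the Auto Model process itself; the toy data, the column naming, and the helper names `dummy_code` and `pearson` are assumptions made for the example.

```python
from math import sqrt

def dummy_code(name, values):
    """Turn one nominal column into 0/1 columns named '<name> = <value>'."""
    categories = sorted(set(values))
    return {
        f"{name} = {c}": [1.0 if v == c else 0.0 for v in values]
        for c in categories
    }

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant column: correlation is undefined, report 0
    return cov / (sd_x * sd_y)

# Hypothetical toy data: a nominal attribute and a binary label (1 = true).
category = ["spam", "web", "spam", "mail", "web", "spam"]
label = [1, 0, 1, 0, 0, 1]

dummies = dummy_code("cat", category)
correlations = {col: pearson(vals, label) for col, vals in dummies.items()}

for col, r in correlations.items():
    print(f"{col}: {r:+.3f}")
```

Each dummy column gets its own coefficient, which is why the weight table with dummy coding lists entries such as 'cat_spam = true' rather than one row per original attribute.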
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Aah yes exactly :) 

    Still, there's a question of interpretation: I struggle to explain the relation between these true/false values and the label. In my example, does 'cat_reputation = false' support or contradict 'label = true'? Or, looking at it the other way, given the rather low correlation value from the correlation matrix (0.099), is it just 'the most important predictor' among the others while still being quite weak?

     

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    I think it should support it, if I got it right. But it's normalized to 1, so everything is relative to the most influential factor.

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
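
The two points in the reply above can be checked on a small example. For a binominal attribute, the 'true' and 'false' dummy columns are complements, so their correlations with the label have the same magnitude and opposite sign; the sign tells whether a value supports or contradicts 'label = true'. The sketch below assumes that the reported weight is the absolute correlation divided by the largest one, which is one reading of "normalized to 1" and is not confirmed by the thread. All data and names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical toy data (1 = true, 0 = false).
label = [1, 1, 0, 0, 1, 0, 1, 0]
reputation_true = [0, 0, 1, 0, 0, 1, 0, 1]
reputation_false = [1 - v for v in reputation_true]  # complement column
other_attr = [1, 0, 1, 0, 1, 0, 1, 0]

r_true = pearson(reputation_true, label)
r_false = pearson(reputation_false, label)
r_other = pearson(other_attr, label)

# Assumed weighting scheme: absolute correlation, scaled so the maximum is 1.
raw = {
    "cat_reputation = true": abs(r_true),
    "cat_reputation = false": abs(r_false),
    "other_attr": abs(r_other),
}
top = max(raw.values())
weights = {name: value / top for name, value in raw.items()}

print(f"r(true)  = {r_true:+.3f}")
print(f"r(false) = {r_false:+.3f}")
for name, w in weights.items():
    print(f"{name}: weight {w:.3f}")
```

In this toy data the 'false' column correlates positively with the label, so 'cat_reputation = false' supports 'label = true'. Under the assumed scheme a weight of 1 only means "strongest among these attributes"; the underlying correlation can still be weak in absolute terms.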
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    tagging @IngoRM if he's available...

     

     
