
Weighting and nominal attributes

kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 193   Unicorn
edited November 10 in Help

Hello there rapidminers :)  

 

A small question about the process that Auto Model uses for weighting attributes.

It has a section that handles nominal attributes, namely by dummy coding them:

 

Screenshot 2018-05-30 16.58.40.png

 

The question is: why is this done specifically for the weighting process?

And how should one then interpret the weighting results?

For example, here are the results from an IP traffic classification, with and without dummy coding. As one can see, the weights for binominal categories are exactly the same in both cases, but how should one interpret the specific values that appear in the first case (all false except for cat_spam = true)?

 

Screenshot 2018-05-30 17.05.58.png (Weights with dummy coding)

Screenshot 2018-05-30 17.06.29.png (Weights without dummy coding)

 (kindly tagging @IngoRM)

Answers

  • mschmitz Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 1,801  RM Data Scientist

    Hey,

     

    Keep in mind that these weights are Pearson's rho values. You can't apply this method to nominal attributes directly, so they need to be converted with dummy coding first.

     

    Cheers!

    Martin
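
To make the point above concrete, here is a minimal sketch in plain Python of correlation-based weighting on a dummy-coded nominal attribute. It is not the actual Auto Model process: the toy data, the helper names (`pearson`, `dummy_code`, `correlation_weights`), and the choice to take absolute values and divide by the largest one are all assumptions made for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation
    return cov / sqrt(vx * vy)

def dummy_code(name, values):
    """One 0/1 column per distinct category, keyed as 'name = category'."""
    return {
        f"{name} = {c}": [1 if v == c else 0 for v in values]
        for c in sorted(set(values))
    }

def correlation_weights(columns, label):
    """Absolute correlation with the label, scaled so the largest weight is 1.
    (The scaling rule is an assumption for this sketch.)"""
    raw = {k: abs(pearson(col, label)) for k, col in columns.items()}
    top = max(raw.values())
    return {k: (w / top if top > 0 else 0.0) for k, w in raw.items()}

# Hypothetical toy data: one nominal attribute and a 0/1 label.
category = ["spam", "news", "spam", "shop", "news", "spam", "shop", "news"]
label = [1, 0, 1, 0, 0, 1, 1, 0]

weights = correlation_weights(dummy_code("cat", category), label)
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {w:.3f}")
```

A correlation needs numbers on both sides, which is why the nominal column is first split into one 0/1 column per category; each of those columns then gets its own weight.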

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 193   Unicorn

    Aah yes exactly :) 

    Still, there is a question of interpretation: I struggle to explain the relation between these true/false values and the label. In my example, does 'cat_reputation = false' support or contradict 'label = true'? Or, given the rather low value in the correlation matrix (0.099), is it simply 'the most important predictor' among the others while still being quite weak?

     

  • mschmitz Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 1,801  RM Data Scientist

    Hi,

     

    I think it should support it, if I got it right. But the weights are normalized to 1, so they are all relative to the most influential factor.

     

    Best,

    Martin
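
One way to check the direction on your own data is to look at the signed correlation before any normalization. The sketch below uses made-up values for `cat_reputation` and the label (they are not the data from the screenshots) and a hand-written `pearson` helper. It only illustrates that the "= true" and "= false" dummy columns of a binominal attribute are mirror images, so their correlations with the label have the same magnitude and opposite signs.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation
    return cov / sqrt(vx * vy)

# Hypothetical values, chosen only to illustrate the sign behaviour.
cat_reputation = [False, False, True, False, True, False, True, False]
label = [True, True, False, True, False, False, True, True]

y = [int(v) for v in label]                    # label = true  -> 1
is_true = [int(v) for v in cat_reputation]     # dummy "cat_reputation = true"
is_false = [1 - v for v in is_true]            # dummy "cat_reputation = false"

r_true = pearson(is_true, y)
r_false = pearson(is_false, y)

print(f"cat_reputation = true : r = {r_true:+.3f}")
print(f"cat_reputation = false: r = {r_false:+.3f}")
# A positive r means the dummy tends to be 1 when label = true ("supports"),
# a negative r means the opposite; the magnitude says how strong the link is.
```

If the weight shown is an absolute or normalized value, the sign is no longer visible, which would explain why both dummy columns of a binominal attribute end up with the same weight.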

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager Posts: 1,832  Community Manager

    tagging @IngoRM if he's available...

     

     
