Should you normalize dummy coded variables in clustering?

Curious  Member  Posts: 12  Newbie
edited June 15 in Help
Or can you keep them as dummies and normalize only the numeric variables?
Answers

  • IngoRM  Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor  Posts: 1,666  RM Founder
    Hi,
    I would say this depends on the normalization method. If you normalize the other columns to the range between 0 and 1, you can keep the dummies as they are, since they already take only the values 0 and 1. Otherwise, I would personally normalize all columns the same way (e.g. with a z-transformation).
    Hope this helps,
    Ingo
    RapidMiner Wisdom 2020
    February 11th and 12th 2020 in Boston, MA, USA

  • mschmitz  Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor  Posts: 2,155  RM Data Scientist
    Hi,
    I usually apply PCA after dummy coding to get rid of the problem.
    Best,
    Martin 
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Telcontar120  Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member  Posts: 1,254  Unicorn
    @mschmitz but doesn't that get rid of your underlying attributes as well and replace them with synthetic PCs?  That's probably not a helpful feature for clustering, or at least it wouldn't be for most of the clustering projects I have worked on.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • mschmitz  Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor  Posts: 2,155  RM Data Scientist
    @Telcontar120,
    I later join the original data back to the clustering results and interpret from there.

    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
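
The approaches discussed in this thread can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not RapidMiner's implementation: the toy columns (`age`, `income`, `region`) and the helpers (`dummy_code`, `min_max`, `z_score`, `pca_scores`, `kmeans`) are hypothetical. Plain k-means stands in for whichever clustering operator you actually use, and z-transforming before PCA is an assumption, not something stated in the answers above.

```python
import numpy as np

# Hypothetical toy data (not from the thread): two numeric columns and one
# nominal column that gets dummy coded.
age = np.array([23.0, 35.0, 47.0, 52.0, 61.0, 29.0])
income = np.array([28_000.0, 52_000.0, 61_000.0, 75_000.0, 80_000.0, 33_000.0])
region = np.array(["north", "south", "north", "east", "south", "east"])


def dummy_code(values):
    """Return one 0/1 indicator column per category, plus the sorted categories."""
    categories = np.unique(values)
    return (values[:, None] == categories[None, :]).astype(float), categories


def min_max(x):
    """Rescale each column to the range [0, 1]."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)


def z_score(x):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def pca_scores(x, n_components):
    """Project the centered data onto its leading principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (hypothetical stand-in); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its rows (keep it if the cluster is empty).
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


dummies, categories = dummy_code(region)
numeric = np.column_stack([age, income])

# Min-max the numeric columns only; the dummies are already 0/1 and stay as they are.
X_minmax = np.hstack([min_max(numeric), dummies])

# Alternatively, z-transform every column, dummies included, so all share one scale.
X_z = z_score(np.hstack([numeric, dummies]))

# PCA after dummy coding, cluster on the component scores, then put the labels
# next to the original attributes for interpretation.
scores = pca_scores(X_z, n_components=2)
labels = kmeans(scores, k=2)
for a, inc, reg, lab in zip(age, income, region, labels):
    print(f"cluster {lab}: age={a:.0f}, income={inc:.0f}, region={reg}")
```

Because the dummy columns already span exactly 0 to 1, min-max scaling would leave them unchanged, which is why they can stay as-is in that case. After a z-transformation the numeric columns have standard deviation 1, so leaving raw 0/1 dummies next to them would weight the two kinds of columns differently; transforming every column avoids that. The final loop prints each row's cluster label beside its original attributes, mirroring the join-back step Martin describes for interpreting clusters built on principal components.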