How to weight features before clustering

kayvanjoo Member Posts: 4 Contributor I
edited July 2019 in Help

Dear all,

I have a set of data that I want to cluster, but before clustering I want to weight my features.

My question is how I should label my data, since feature selection methods require a label role and I do not have any real label yet.

Should I set my experiment IDs as the label?

What feature selection methods can I use before I cluster my data?

 

Thanks for your attention!

 

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    If you don't have a label, then how are you planning to assign weights?  Assigning the label tells RapidMiner what you are interested in predicting, and most weighting schemes evaluate the other attributes with respect to that label.  So don't use an ID variable; that would be pointless, because an ID is typically unique to each example and gives the weighting scheme nothing meaningful to measure the other attributes against.  Perhaps you can explain a bit more about what you are trying to accomplish with the weighting prior to clustering?

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • kayvanjoo Member Posts: 4 Contributor I

    Yes, true!

    The reason I want to use attribute weighting is that I want to do feature selection: select the statistically important features, for example those with a weight higher than 0.5, and then use them to classify my data points. However, I currently have no idea how this can be done, or whether it is possible at all!

    Looking forward to your suggestions!

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    When you say "statistically important", that implies you have a reference point: statistically important to what?  That's generally when you have a label. Machine learning problems in general are classified as either supervised learning, where you have some specific target variable (called a label in RapidMiner) in mind, or unsupervised learning, where you don't have such a goal and the algorithms instead look for interesting structures or relationships in the data.

     

    You haven't said much about what you are actually trying to accomplish, but if clustering is the key method, then I would suggest that you go ahead and run your clustering without worrying about weighting yet.

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
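
As a rough illustration of the advice above to cluster first, without a label or attribute weights, here is a minimal sketch in plain Python rather than RapidMiner. The data, the choice of two clusters, and the helper names (`standardize`, `sq_dist`, `kmeans`) are all assumptions made up for this example; it uses a basic k-means (Lloyd's algorithm), which is only one of many clustering methods.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each attribute (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two examples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=20, seed=0):
    """Basic Lloyd's algorithm; no label is used anywhere."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)  # start from k randomly chosen examples
    assignment = [0] * len(rows)
    for _ in range(n_iter):
        # Assign each example to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: sq_dist(row, centroids[j]))
            for row in rows
        ]
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [row for row, a in zip(rows, assignment) if a == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return assignment

# Hypothetical unlabeled data: two groups of 50 examples, 2 attributes each.
data_rng = random.Random(42)
data = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
data += [[data_rng.gauss(8, 1), data_rng.gauss(8, 1)] for _ in range(50)]

# Standardize so no attribute dominates the distance, then cluster.
cluster_ids = kmeans(standardize(data), k=2)
```

The resulting `cluster_ids` can then be inspected on their own, or, if it suits the goal, treated as a label for a later supervised weighting step.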
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Yes, you can do all that. What you want to check out is the Select by Weights operator. There you can set your threshold and automatically select the attributes you want. 
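
To make the thresholding idea concrete, here is a small Python sketch of what selecting attributes by weight amounts to. It is not RapidMiner's Select by Weights implementation: the function name `select_by_weight`, the attribute names, and the weight values are hypothetical, and the weights are assumed to come from some weighting scheme that has already normalized them to the range 0 to 1.

```python
def select_by_weight(weights, threshold=0.5):
    """Return the attribute names whose weight is strictly above the threshold."""
    return [name for name, w in weights.items() if w > threshold]

# Hypothetical normalized weights produced by some weighting scheme.
weights = {"temperature": 0.92, "pressure": 0.48, "ph": 0.61, "batch_no": 0.05}

selected = select_by_weight(weights, threshold=0.5)
print(selected)  # ['temperature', 'ph']
```

The threshold of 0.5 mirrors the "weight higher than .5" example from the question; raising it keeps fewer attributes.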
