clustering - how to customize similarity function
Hi
so clustering is basically based on similarity/distance functions in which one example is compared to all the other examples of the data set.
Now I want a similarity function in which the cell of an attribute/feature is not compared to all the other examples. Instead, that particular cell should be compared only to a given range of defined values. For example: a cell can have two possible values [yes, no], so the similarity function should compare the given cell value with just the values 'yes' and 'no'.
Is this possible with RM? If so, how?
Thank you!
Best Answers
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
If I have understood what you are trying to do, I think you can replicate what you want by simply creating a new attribute that specifies whether a given attribute value is contained in a reference set. You can do this using Generate Attributes and the "contains" function. After doing that (and you can loop through any set of attributes for doing this) you will get a set of yes/no attributes which you can then use to do your clustering as opposed to your original attributes.
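For readers who want to see the idea outside of RapidMiner, here is a minimal Python sketch of the same transformation. It is not the Generate Attributes operator itself: it uses exact set membership rather than the "contains" function, and the helper name `add_validity_flags`, the `_valid` suffix, and the sample data are all made up for illustration.

```python
from typing import Dict, List, Set


def add_validity_flags(
    rows: List[Dict[str, str]], allowed: Dict[str, Set[str]]
) -> List[Dict[str, str]]:
    """Return copies of the rows with one extra yes/no attribute per checked attribute.

    The new attribute is "yes" when the original value is in the reference
    set for that attribute, and "no" otherwise (including missing values).
    """
    flagged = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for attribute, valid_values in allowed.items():
            is_valid = row.get(attribute) in valid_values
            new_row[attribute + "_valid"] = "yes" if is_valid else "no"
        flagged.append(new_row)
    return flagged


# Hypothetical example data: two attributes, each with its own reference set.
rows = [
    {"answer": "yes", "colour": "red"},
    {"answer": "maybe", "colour": "blue"},
    {"answer": "no", "colour": "purple"},
]
allowed = {"answer": {"yes", "no"}, "colour": {"red", "blue"}}

flagged = add_validity_flags(rows, allowed)
for row in flagged:
    print(row)
```

The resulting `answer_valid` and `colour_valid` columns play the role of the yes/no attributes described above, and only those columns would be passed on to the clustering step.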
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
@LeMarc You can chain several "contains" functions inside an IF statement to cover as many allowed values as you like. I think the approach I described will work for you; you will end up with a new set of attributes that tell you whether each of the underlying attributes contains a valid response or not (binary yes/no).
I'm not convinced that clustering is going to be the best way to handle this problem, though. You might want to look at some of the outlier detection algorithms as well.
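Whichever distance-based method ends up being used, the binary yes/no attributes need a distance that makes sense for them. One common choice is the simple matching distance (the fraction of attributes on which two examples disagree). The sketch below is only an illustration of that measure in plain Python, not what any RapidMiner operator does internally; the function name and the sample flags are assumptions.

```python
from typing import Sequence


def simple_matching_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions at which two equally long flag vectors differ."""
    if len(a) != len(b):
        raise ValueError("flag vectors must have the same length")
    if not a:
        return 0.0  # two empty vectors are treated as identical
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


# Hypothetical yes/no validity flags for three examples, two attributes each.
flags = {
    "example_1": ["yes", "yes"],
    "example_2": ["no", "yes"],
    "example_3": ["yes", "no"],
}

# Pairwise distances between all examples.
names = sorted(flags)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = simple_matching_distance(flags[first], flags[second])
        print(first, second, d)

# Examples with at least one invalid value are natural candidates to inspect.
suspicious = [name for name in names if "no" in flags[name]]
print(suspicious)
```

With only a handful of binary attributes, many examples end up at distance 0 from each other, which is one reason a direct check for invalid values may be more informative than clustering here.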