"Weighted Attributes in cluster analysis"
Hello,
I am trying to weight attributes differently in a cluster analysis. For example, attribute 1 should have a weight of 25%, attribute 2 a weight of 50%, and attribute 3 a weight of 25%. I tried the "weight by user" operator and the "scale by weights" operator. I edited all attributes and unchecked both "normalize weights" and "distribute weights". Unfortunately, after execution the weight of one of the attributes is automatically set to 1 instead of 0.25. I have no idea why this happens.
Another problem is that one of my attributes is nominal. I have already binary-coded it. How can I use this attribute in my cluster analysis? It has to be weighted, too.
Thanks for your help.
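For readers following along, here is a minimal sketch in plain Python of what attribute weighting means for a distance-based clustering. It is not RapidMiner's implementation of either operator; the helper names (`scale_by_weights`, `euclidean`) and the sample rows are made up for illustration. The assumption is that each attribute is multiplied by the square root of its weight, so that the squared Euclidean distance counts attribute j with weight w_j. A binary-coded nominal attribute could be scaled in the same way.

```python
import math

def scale_by_weights(rows, weights):
    """Multiply attribute j by sqrt(weights[j]) (hypothetical helper).

    After this scaling, the squared Euclidean distance between two rows
    equals the weighted sum of squared attribute differences.
    """
    factors = [math.sqrt(w) for w in weights]
    return [[x * f for x, f in zip(row, factors)] for row in rows]

def euclidean(a, b):
    """Plain Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Weights from the question: 25%, 50%, 25%.
weights = [0.25, 0.50, 0.25]

# Made-up example rows, assumed to be already normalized to [0, 1].
rows = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
]

scaled = scale_by_weights(rows, weights)
d01 = euclidean(scaled[0], scaled[1])  # all three attributes differ
d02 = euclidean(scaled[0], scaled[2])  # only attribute 1 differs
print(d01, d02)
```

The scaled rows can then be handed to any clustering method that uses Euclidean distance. Normalizing the attributes to a comparable range first matters, because otherwise the original value ranges act as hidden weights.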
Answers
Could you post your process here? That makes it a lot easier to answer your questions. Transforming nominal attributes into numerical ones is possible. Whether it is reasonable for clustering depends on the nature of the nominal values: mapping nominal values to arbitrary unique numbers won't help. Instead, it is better to construct a meaningful measure that fits the statistical distance approach used by many clustering techniques. Maybe you can provide more information about your nominal attributes.
Cheers,
Helge
My second question is whether RM can handle a cluster analysis with both nominal and metric attributes. Textbooks describe two ways to solve this.
The first possibility is to separate the nominal and metric attributes and calculate a separate matrix for each type. For example, you can use Tanimoto/Jaccard similarity for the nominal attributes and Euclidean distance for the metric attributes. Then you can combine both matrices and weight their elements.
The second possibility is to transform the metric attributes into nominal ones.
Can RM create a similarity matrix with Tanimoto/Jaccard or a distance matrix with the Euclidean distance? Or is there another solution to this problem?
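To make the first possibility concrete, here is a rough sketch in plain Python, independent of RapidMiner. It computes a Jaccard distance matrix on the binary-coded nominal attributes, a Euclidean distance matrix on the metric attributes, and a weighted sum of the two. All function names, the sample data, and the 0.5/0.5 weights are assumptions for illustration. Dividing the Euclidean matrix by its maximum, so that both parts lie in [0, 1] before weighting, is one possible choice and not the only one.

```python
import math

def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two binary (0/1) vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are treated as identical (an assumption).
    return 0.0 if either == 0 else 1.0 - both / either

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(rows, dist):
    """Full pairwise distance matrix as a list of lists."""
    n = len(rows)
    return [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def combine(d_nominal, d_metric, w_nominal, w_metric):
    """Weighted sum of two matrices; the metric part is rescaled to [0, 1]."""
    max_metric = max(max(row) for row in d_metric) or 1.0
    return [
        [w_nominal * dn + w_metric * dm / max_metric
         for dn, dm in zip(row_n, row_m)]
        for row_n, row_m in zip(d_nominal, d_metric)
    ]

# Made-up data: three examples, one nominal attribute binary-coded into
# three columns, plus two metric attributes.
nominal = [[1, 0, 0], [1, 0, 0], [0, 1, 0]]
metric = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]

combined = combine(
    distance_matrix(nominal, jaccard_distance),
    distance_matrix(metric, euclidean_distance),
    w_nominal=0.5,
    w_metric=0.5,
)
for row in combined:
    print(row)
```

A combined matrix like this can be used by clustering methods that accept precomputed pairwise distances, for example agglomerative clustering. Methods that need centroids, such as k-means, cannot work from a distance matrix alone.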