"Categorical Clustering"

pvelando Member Posts: 5 Contributor II
edited June 2019 in Help
Hi all,

I'm trying to cluster this data, which has both numerical and categorical attributes:

height 177 180 187 180 177 188 177 189 177 166 166 164 170 170 160 164 167 168
weight 86 79 85 83 87 80 78 80 82 72 66 65 79 67 61 61 63 68
Param1 A M V M A M V V A V N M N V A N A M
Param2 H H H H H H H H H M M M M M M M M M

There is no way to convert the categorical attributes to numerical ones.

I would like to know which algorithm would be right for clustering this data while taking the non-numerical attributes into consideration; they are certainly relevant to the clustering (k-means definitely does not work).

Well, thank you very much in advance,

Answers

  • pvelando Member Posts: 5 Contributor II
    After some testing, I've seen that agglomerative clustering might work, although the results are not very handy.
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    K-means will work with this data if you use the distance measure 'mixed euclidean distance'. You will probably have to normalize the numerical attributes to the range 0 to 1 so that all attributes have an equal influence.

    Regards

    Andrew
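
For anyone who wants to see what Andrew's suggestion amounts to outside RapidMiner, here is a minimal Python sketch of the idea: scale the numerical attributes to the range 0 to 1, then use a distance that handles both attribute types. It assumes the mixed measure adds the squared difference for each numerical attribute and 1 for each nominal mismatch before taking the square root; that is an assumption about the behaviour, not a copy of RapidMiner's code. The names `min_max` and `mixed_euclidean` are made up for this illustration.

```python
import math

# Data from the question (18 records); the first row is read as height.
height = [177, 180, 187, 180, 177, 188, 177, 189, 177,
          166, 166, 164, 170, 170, 160, 164, 167, 168]
weight = [86, 79, 85, 83, 87, 80, 78, 80, 82,
          72, 66, 65, 79, 67, 61, 61, 63, 68]
param1 = "A M V M A M V V A V N M N V A N A M".split()
param2 = "H H H H H H H H H M M M M M M M M M".split()


def min_max(values):
    """Scale a numerical column linearly so its values lie in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mixed_euclidean(a, b):
    """Distance over mixed records (hypothetical helper, see assumptions above).

    Numerical attributes contribute their squared difference;
    nominal attributes (strings) contribute 0 if equal, 1 otherwise.
    """
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str):
            total += 0.0 if x == y else 1.0
        else:
            total += (x - y) ** 2
    return math.sqrt(total)


# One tuple per record: (scaled height, scaled weight, Param1, Param2).
rows = list(zip(min_max(height), min_max(weight), param1, param2))

# Compare the first record with a similar one and with a dissimilar one.
print(mixed_euclidean(rows[0], rows[4]))
print(mixed_euclidean(rows[0], rows[14]))
```

Without the scaling step, a height difference of a few centimetres would swamp the 0/1 contribution of Param1 and Param2, which is why the normalization matters here.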