Extract cluster centroids and compare with other centroids?

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help

Hi,

I want to run k-means clustering (e.g. with k = 3...20) on two datasets, extract the centroids of the resulting clusters, and compare the centroids from dataset 1 with the centroids from dataset 2 (e.g. by Euclidean distance). Is there a way to do that? And when I compare the centroids, how can I extract the two centroids, one from dataset 1 and one from dataset 2, that are closest to each other?

Best Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Fred,

    Check the attached process. I think this is what you want?

    ~Martin


    <?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="136">
    <parameter key="repository_entry" value="//Samples/data/Sonar"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="85"/>
    <operator activated="true" class="k_means" compatibility="7.4.000" expanded="true" height="82" name="Clustering" width="90" x="313" y="34"/>
    <operator activated="true" class="extract_prototypes" compatibility="7.4.000" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="447" y="34"/>
    <operator activated="true" class="replace" compatibility="7.4.000" expanded="true" height="82" name="Replace" width="90" x="581" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="cluster"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="replace_what" value="(.+)"/>
    <parameter key="replace_by" value="Squared_$1"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.4.000" expanded="true" height="82" name="Set Role" width="90" x="715" y="34">
    <parameter key="attribute_name" value="cluster"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="k_means" compatibility="7.4.000" expanded="true" height="82" name="Clustering (2)" width="90" x="313" y="136">
    <parameter key="measure_types" value="NumericalMeasures"/>
    <parameter key="numerical_measure" value="ManhattanDistance"/>
    </operator>
    <operator activated="true" class="extract_prototypes" compatibility="7.4.000" expanded="true" height="82" name="Extract Cluster Prototypes (2)" width="90" x="447" y="136"/>
    <operator activated="true" class="replace" compatibility="7.4.000" expanded="true" height="82" name="Replace (2)" width="90" x="581" y="136">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="cluster"/>
    <parameter key="include_special_attributes" value="true"/>
    <parameter key="replace_what" value="(.+)"/>
    <parameter key="replace_by" value="Manhattan_$1"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.4.000" expanded="true" height="82" name="Set Role (2)" width="90" x="715" y="136">
    <parameter key="attribute_name" value="cluster"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="cross_distances" compatibility="7.4.000" expanded="true" height="103" name="Cross Distances" width="90" x="849" y="85"/>
    <connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Clustering" to_port="example set"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Clustering (2)" to_port="example set"/>
    <connect from_op="Clustering" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
    <connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Cross Distances" to_port="request set"/>
    <connect from_op="Clustering (2)" from_port="cluster model" to_op="Extract Cluster Prototypes (2)" to_port="model"/>
    <connect from_op="Extract Cluster Prototypes (2)" from_port="example set" to_op="Replace (2)" to_port="example set input"/>
    <connect from_op="Replace (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Cross Distances" to_port="reference set"/>
    <connect from_op="Cross Distances" from_port="result set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
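
The process above produces a distance for every pair of centroids (one from each clustering); the closest pair is then simply the pair with the smallest distance. As a minimal sketch of that last step outside RapidMiner, assuming the two centroid tables have been exported with identical attributes in the same order, plain Python could look like this. The centroid names and coordinates below are hypothetical placeholders, not values from the Sonar data.

```python
import math
from itertools import product


def closest_centroid_pair(centroids_a, centroids_b):
    """Return (name_a, name_b, distance) for the closest pair of centroids.

    Each argument maps a centroid name to its list of coordinates.
    Both sets must use the same attributes in the same order.
    """
    best = None
    # Compare every centroid of the first set with every centroid of the second.
    for (name_a, vec_a), (name_b, vec_b) in product(
        centroids_a.items(), centroids_b.items()
    ):
        distance = math.dist(vec_a, vec_b)  # Euclidean distance (Python 3.8+)
        if best is None or distance < best[2]:
            best = (name_a, name_b, distance)
    return best


# Hypothetical centroids standing in for the two extracted prototype tables.
set_1 = {"Squared_cluster_0": [0.0, 0.0], "Squared_cluster_1": [5.0, 5.0]}
set_2 = {"Manhattan_cluster_0": [4.0, 5.0], "Manhattan_cluster_1": [0.0, 3.0]}

print(closest_centroid_pair(set_1, set_2))
```

To cover k = 3...20, the same function could be called once per value of k on the centroids obtained for that k.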
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi,

    So you want to cluster and then check whether there are clusters that contain only a single label? That sounds like aggregate count(label) group_by(cluster). Otherwise you might want to check the operator Map Clustering On Label.

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
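
To illustrate the "count(label) group_by(cluster)" idea, here is a minimal sketch in plain Python rather than a RapidMiner process. It assumes each example is available as a (cluster, label) pair; the rows and the helper names are hypothetical and only serve as an illustration.

```python
from collections import Counter, defaultdict


def label_counts_per_cluster(rows):
    """Count how often each label occurs in each cluster.

    rows is an iterable of (cluster, label) pairs.
    Returns a dict mapping cluster -> Counter(label -> count).
    """
    counts = defaultdict(Counter)
    for cluster, label in rows:
        counts[cluster][label] += 1
    return dict(counts)


def pure_clusters(counts):
    """Return the sorted names of clusters that contain exactly one label."""
    return sorted(c for c, labels in counts.items() if len(labels) == 1)


# Hypothetical clustered examples: (cluster, label).
rows = [
    ("cluster_0", "Rock"),
    ("cluster_0", "Rock"),
    ("cluster_1", "Rock"),
    ("cluster_1", "Mine"),
    ("cluster_2", "Mine"),
]

counts = label_counts_per_cluster(rows)
print(counts)
print(pure_clusters(counts))
```

A cluster is treated as "pure" here when only one distinct label occurs in it; a softer criterion (e.g. a majority share) would need a different check.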

Answers

  • Fred12 Member Posts: 344 Unicorn

    Yes, that was pretty much what I was looking for :) Thanks.

    One more question: is it possible to cluster by label? I mean treating each label as one cluster and then extracting or calculating the centroid of each label group. How does that work? Should I give the label the role "cluster", or is there another way?

  • Fred12 Member Posts: 344 Unicorn
    @mschmitz wrote:

    Hi,

    So you want to cluster and then check whether there are clusters that contain only a single label? That sounds like aggregate count(label) group_by(cluster). Otherwise you might want to check the operator Map Clustering On Label.

    Best,

    Martin


    Yeah, that's part of what I originally wanted to do. Is it at all possible to declare an example set as a cluster model? E.g. after I have aggregated by class label and computed the average (centroid) of the attribute values for each class, can I declare those centroids as a cluster model?

    Edit: sorry, I just noticed that this would no longer be necessary, since the extracted centroids are turned into a normal example set anyway ;)

    Is the formula for getting the centroids by label class the same as the one you described above?
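
The thread does not answer this last question. One possible reading of "centroid per label" is the attribute-wise mean of all examples that share a label, which is also what a k-means centroid is for its cluster. A minimal sketch under that assumption, in plain Python with hypothetical example data and a hypothetical helper name:

```python
from collections import defaultdict


def centroids_by_label(examples):
    """Compute the attribute-wise mean vector for each label.

    examples is an iterable of (label, vector) pairs, where every vector
    has the same length. Returns a dict mapping label -> centroid.
    """
    sums = {}
    counts = defaultdict(int)
    for label, vector in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    # Divide each accumulated sum by the number of examples with that label.
    return {
        label: [total / counts[label] for total in vector_sum]
        for label, vector_sum in sums.items()
    }


# Hypothetical labelled examples: (label, attribute values).
examples = [
    ("Rock", [1.0, 2.0]),
    ("Rock", [3.0, 4.0]),
    ("Mine", [10.0, 0.0]),
]

print(centroids_by_label(examples))
```

The resulting per-label centroids are ordinary vectors, so they can be compared with k-means centroids using the same pairwise distance approach as for two sets of cluster centroids.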
