RapidMiner

RapidMiner

Extract cluster centroids and compare with other centroids?

SOLVED
Elite

Extract cluster centroids and compare with other centroids?

hi,

I want to do clustering with k-means e.g with k= 3...20 on 2 datasets, and I want to extract the centroids from those clusters and compare the centroids from dataset 1 with the centroids from dataset2.. (e.g. by the euclidean distance).. is there some way to do that? and if I compare centroids, how can I extract those 2 centroids from dataset1 and datatset 2 that are closest to eachother?

4 REPLIES
RMStaff

Re: Extract cluster centroids and compare with other centroids?

Fred,

 

check the attached Process. I think this is what you want?

 

~Martin

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="136">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="85"/>
      <operator activated="true" class="k_means" compatibility="7.4.000" expanded="true" height="82" name="Clustering" width="90" x="313" y="34"/>
      <operator activated="true" class="extract_prototypes" compatibility="7.4.000" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="447" y="34"/>
      <operator activated="true" class="replace" compatibility="7.4.000" expanded="true" height="82" name="Replace" width="90" x="581" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="cluster"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="replace_what" value="(.+)"/>
        <parameter key="replace_by" value="Squared_$1"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.4.000" expanded="true" height="82" name="Set Role" width="90" x="715" y="34">
        <parameter key="attribute_name" value="cluster"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="k_means" compatibility="7.4.000" expanded="true" height="82" name="Clustering (2)" width="90" x="313" y="136">
        <parameter key="measure_types" value="NumericalMeasures"/>
        <parameter key="numerical_measure" value="ManhattanDistance"/>
      </operator>
      <operator activated="true" class="extract_prototypes" compatibility="7.4.000" expanded="true" height="82" name="Extract Cluster Prototypes (2)" width="90" x="447" y="136"/>
      <operator activated="true" class="replace" compatibility="7.4.000" expanded="true" height="82" name="Replace (2)" width="90" x="581" y="136">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="cluster"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="replace_what" value="(.+)"/>
        <parameter key="replace_by" value="Manhattan_$1"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.4.000" expanded="true" height="82" name="Set Role (2)" width="90" x="715" y="136">
        <parameter key="attribute_name" value="cluster"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="cross_distances" compatibility="7.4.000" expanded="true" height="103" name="Cross Distances" width="90" x="849" y="85"/>
      <connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Clustering" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Clustering (2)" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
      <connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Replace" to_port="example set input"/>
      <connect from_op="Replace" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Cross Distances" to_port="request set"/>
      <connect from_op="Clustering (2)" from_port="cluster model" to_op="Extract Cluster Prototypes (2)" to_port="model"/>
      <connect from_op="Extract Cluster Prototypes (2)" from_port="example set" to_op="Replace (2)" to_port="example set input"/>
      <connect from_op="Replace (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Cross Distances" to_port="reference set"/>
      <connect from_op="Cross Distances" from_port="result set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Elite

Re: Extract cluster centroids and compare with other centroids?

[ Edited ]

yes that was pretty much what I was looking for Smiley Happy thanks..

but one more question, is it possible to cluster by labels? I mean each label as one cluster, and then extract or calculate the cluster centroid of each label group? how does it work, should I give the label the role "cluster"? or how?

RMStaff

Re: Extract cluster centroids and compare with other centroids?

Hi,

 

so you want to cluster and check if there are clusters with purely one label in? Sounds like aggregate count(label) group_by(cluster)? Otherwise you might want to check the operator Map Clustering On Label. 

Best,

Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Highlighted
Elite

Re: Extract cluster centroids and compare with other centroids?

mschmitz wrote:

Hi,

 

so you want to cluster and check if there are clusters with purely one label in? Sounds like aggregate count(label) group_by(cluster)? Otherwise you might want to check the operator Map Clustering On Label. 

Best,

Martin


yeah, thats part of what I originally wanted to do.. is it any possible to declare an example set as a Cluster model? e.g. after I aggregated the class labels and built average / centroid of all class values, can I declare those centroids as a cluster model? 

edit: sorry I just noticed, that would then be no more necessary as centroids are transformed into normal example sets after then Smiley Wink

 

the formula to get Centroids by label lass is it the same as you described above?