Extract Cluster Prototypes component does not show my id attribute

Jeffersonjpa · April 2019

How can I pass an attribute with the id role through Extract Cluster Prototypes?

I need to identify the cluster centers (centroids) after clustering with k-medoids. My dataset has an identifier attribute, which I assigned the id role using the Set Role operator, but the Extract Cluster Prototypes operator does not show my id attribute. Can someone help me?

varunm1 · April 2019

Hello @Jeffersonjpa

Sorry, I am a bit confused by this question. Extract Cluster Prototypes returns the centroid values of each attribute for each cluster, independent of any label. The operator will carry your label, but it is used only for visualization purposes. If you want the labels and the cluster ID, connect the clustered set output of k-medoids (Clustering.clustered set) to the result port. This will show the attributes, the cluster of each example, and the assigned label.

Jeffersonjpa · April 2019

Thank you @varunm1, but I'm afraid this is not the answer to my problem.

To explain better: my dataset already has an identifier attribute. I want to see this same identifier, unchanged, at the end of the clustering process, as a way to identify each centroid (using k-medoids) within my dataset. In addition, I'm using normalization to improve the distance calculation between the attributes, and my original identifier should not be normalized.
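
To make the normalization requirement concrete, here is a minimal Python sketch (plain Python, not RapidMiner code) of z-scoring every attribute except the identifier. The helper name `z_normalize`, the column names `id`, `att1`, `att2`, and the use of the population standard deviation are all assumptions chosen for illustration; they are not taken from the process in this thread.

```python
from statistics import mean, pstdev


def z_normalize(rows, id_key="id"):
    """Return copies of rows with every column except id_key z-scored.

    Hypothetical helper: the identifier column is copied through untouched.
    """
    columns = [name for name in rows[0] if name != id_key]
    stats = {}
    for name in columns:
        values = [row[name] for row in rows]
        stats[name] = (mean(values), pstdev(values))

    normalized = []
    for row in rows:
        new_row = {id_key: row[id_key]}  # identifier stays exactly as it was
        for name in columns:
            center, spread = stats[name]
            # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
            new_row[name] = (row[name] - center) / spread if spread else 0.0
        normalized.append(new_row)
    return normalized


# Made-up example data, only to show the shape of the input.
rows = [
    {"id": 101, "att1": 1.0, "att2": 10.0},
    {"id": 102, "att1": 3.0, "att2": 30.0},
]
normalized = z_normalize(rows)
```

The identifier never enters the mean or spread calculation, so it can still be used afterwards to look up the original record.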

Telcontar120 · April 2019

The centroid is designed to be an average of all your attributes for each cluster, so it does not output an id. There is only one row per cluster.
If you have used k-medoids, each cluster center is an actual example from your data, so you can use Join to pull the center values into your full dataset and then map those centroids back to specific examples with Generate Attributes.
This will show you which individual records match your cluster centroid. See attached example process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Ripley-Set" origin="GENERATED_TUTORIAL" width="90" x="112" y="34">
        <parameter key="repository_entry" value="//Samples/data/Ripley-Set"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.2.001" expanded="true" height="82" name="Generate ID" width="90" x="246" y="34">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <operator activated="true" class="k_medoids" compatibility="9.2.001" expanded="true" height="82" name="Clustering" width="90" x="380" y="85">
        <parameter key="add_cluster_attribute" value="true"/>
        <parameter key="add_as_label" value="false"/>
        <parameter key="remove_unlabeled" value="false"/>
        <parameter key="k" value="2"/>
        <parameter key="max_runs" value="10"/>
        <parameter key="max_optimization_steps" value="100"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="measure_types" value="MixedMeasures"/>
        <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
        <parameter key="nominal_measure" value="NominalDistance"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="divergence" value="GeneralizedIDivergence"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
      </operator>
      <operator activated="true" class="extract_prototypes" compatibility="9.2.001" expanded="true" height="82" name="Extract Cluster Prototypes" origin="GENERATED_TUTORIAL" width="90" x="581" y="34"/>
      <operator activated="true" class="concurrency:join" compatibility="9.2.001" expanded="true" height="82" name="Join" width="90" x="715" y="85">
        <parameter key="remove_double_attributes" value="false"/>
        <parameter key="join_type" value="right"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="cluster" value="cluster"/>
        </list>
        <parameter key="keep_both_join_attributes" value="false"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="849" y="85">
        <list key="function_descriptions">
          <parameter key="Centroid" value="if(att1==att1_from_ES2&amp;&amp;att2==att2_from_ES2,&quot;centroid&quot;,&quot;not&quot;)"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <connect from_op="Ripley-Set" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
      <connect from_op="Clustering" from_port="clustered set" to_op="Join" to_port="right"/>
      <connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="72"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
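
For readers who want to see the join-and-match logic outside of RapidMiner, the same idea can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `flag_medoids` and the sample rows are made up, and the attribute names `att1`, `att2`, and `cluster` simply mirror the ones used in the process above.

```python
def flag_medoids(clustered, prototypes, attributes=("att1", "att2")):
    """Attach a Centroid flag to each clustered row (hypothetical helper).

    Each row is matched to the prototype of its own cluster, then flagged
    "centroid" when all listed attributes are equal, otherwise "not".
    """
    by_cluster = {proto["cluster"]: proto for proto in prototypes}
    flagged = []
    for row in clustered:
        proto = by_cluster[row["cluster"]]  # the join on the cluster key
        is_match = all(row[name] == proto[name] for name in attributes)
        flagged.append({**row, "Centroid": "centroid" if is_match else "not"})
    return flagged


# Made-up clustered examples; the id column travels with each row.
clustered = [
    {"id": 1, "att1": 0.1, "att2": 0.2, "cluster": "cluster_0"},
    {"id": 2, "att1": 0.3, "att2": 0.4, "cluster": "cluster_0"},
    {"id": 3, "att1": 0.9, "att2": 0.8, "cluster": "cluster_1"},
    {"id": 4, "att1": 0.7, "att2": 0.6, "cluster": "cluster_1"},
]
# One prototype row per cluster, without an id, as in the prototype table.
prototypes = [
    {"cluster": "cluster_0", "att1": 0.3, "att2": 0.4},
    {"cluster": "cluster_1", "att1": 0.7, "att2": 0.6},
]

flagged = flag_medoids(clustered, prototypes)
medoid_ids = [row["id"] for row in flagged if row["Centroid"] == "centroid"]
```

Exact equality is reasonable here only because a medoid's values are copied from a real example. If two examples in a cluster share identical attribute values, both will be flagged.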
