
Extract Cluster Prototypes component does not show my id attribute

Jeffersonjpa Member Posts: 5 Contributor I
edited April 2019 in Help
How can I pass an attribute with the id role through Extract Cluster Prototypes?
I need to identify the cluster centers (centroids) after clustering with k-medoids. My dataset has an identifier attribute, which I assigned the id role using the Set Role operator, but the Extract Cluster Prototypes operator does not show my id attribute. Can someone help me?

Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @Jeffersonjpa

    Sorry, I am a bit confused by this question. Extract Cluster Prototypes returns the centroid of each cluster, one value per attribute, independent of any label. The operator will have your label, but it is used only for visualization purposes. If you want the labels and the cluster assignment, connect the clustered set output of k-medoids (Clustering.clustered set) to a result port. This will show the attributes, the cluster of each example, and the assigned label.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Jeffersonjpa Member Posts: 5 Contributor I
    edited April 2019
    Thank you @varunm1, but I'm afraid this is not the answer to my problem. :D
    To explain better: my dataset already has an identifier attribute. I want to see that same identifier, unchanged, at the end of the clustering process so that I can identify each centroid (from k-medoids) within my dataset. To complicate matters, I'm using normalization to improve the distance calculation between the attributes, and my original identifier should not be normalized.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    The centroid is designed to be an average of all your attributes for each cluster, so it does not output an id. There is only one row per cluster.
    If you have used k-medoids, you can use Join to pull the centroid values into your full dataset and then map those centroids back to specific examples with Generate Attributes.
    This will show you which individual records match your cluster centroid. See the attached example process:
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Ripley-Set" origin="GENERATED_TUTORIAL" width="90" x="112" y="34">
            <parameter key="repository_entry" value="//Samples/data/Ripley-Set"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.001" expanded="true" height="82" name="Generate ID" width="90" x="246" y="34">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="k_medoids" compatibility="9.2.001" expanded="true" height="82" name="Clustering" width="90" x="380" y="85">
            <parameter key="add_cluster_attribute" value="true"/>
            <parameter key="add_as_label" value="false"/>
            <parameter key="remove_unlabeled" value="false"/>
            <parameter key="k" value="2"/>
            <parameter key="max_runs" value="10"/>
            <parameter key="max_optimization_steps" value="100"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="measure_types" value="MixedMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="NominalDistance"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="GeneralizedIDivergence"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
          </operator>
          <operator activated="true" class="extract_prototypes" compatibility="9.2.001" expanded="true" height="82" name="Extract Cluster Prototypes" origin="GENERATED_TUTORIAL" width="90" x="581" y="34"/>
          <operator activated="true" class="concurrency:join" compatibility="9.2.001" expanded="true" height="82" name="Join" width="90" x="715" y="85">
            <parameter key="remove_double_attributes" value="false"/>
            <parameter key="join_type" value="right"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="cluster" value="cluster"/>
            </list>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="849" y="85">
            <list key="function_descriptions">
              <parameter key="Centroid" value="if(att1==att1_from_ES2&amp;&amp;att2==att2_from_ES2,&quot;centroid&quot;,&quot;not&quot;)"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <connect from_op="Ripley-Set" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Join" to_port="right"/>
          <connect from_op="Extract Cluster Prototypes" from_port="example set" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="72"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
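
For readers who want to see the idea outside RapidMiner, here is a rough sketch in plain Python of what the Join and Generate Attributes steps in the process above are doing: look up each example's cluster prototype by the cluster column, then flag the examples whose attribute values equal that prototype. This is not RapidMiner code. The function name `flag_prototypes`, the dictionary layout, and the data values are all made up for illustration; only the attribute names (att1, att2, cluster) and the "centroid"/"not" labels follow the process.

```python
# Illustration only: plain Python, not RapidMiner code.
# The function name, data layout, and values below are hypothetical.

def flag_prototypes(examples, prototypes, attributes=("att1", "att2")):
    """Return a copy of each example with an added 'Centroid' field.

    examples:   list of dicts with 'id', 'cluster', and the attribute columns
    prototypes: dict mapping a cluster name to a dict of attribute values
    """
    flagged = []
    for row in examples:
        # The "join": look up the prototype of this example's cluster.
        proto = prototypes[row["cluster"]]
        # The "generate attributes" step: compare every attribute for equality.
        is_proto = all(row[a] == proto[a] for a in attributes)
        flagged.append({**row, "Centroid": "centroid" if is_proto else "not"})
    return flagged


# Made-up clustered data; the id column is carried along untouched.
examples = [
    {"id": 0, "att1": 0.1, "att2": 0.7, "cluster": "cluster_0"},
    {"id": 1, "att1": 0.2, "att2": 0.6, "cluster": "cluster_0"},
    {"id": 2, "att1": 0.3, "att2": 0.9, "cluster": "cluster_0"},
    {"id": 3, "att1": 0.8, "att2": 0.3, "cluster": "cluster_1"},
    {"id": 4, "att1": 0.9, "att2": 0.1, "cluster": "cluster_1"},
]

# One prototype row per cluster, without any id, as the thread describes.
prototypes = {
    "cluster_0": {"att1": 0.2, "att2": 0.6},
    "cluster_1": {"att1": 0.8, "att2": 0.3},
}

flagged = flag_prototypes(examples, prototypes)
prototype_ids = [row["id"] for row in flagged if row["Centroid"] == "centroid"]
print(prototype_ids)
```

The exact equality test only makes sense when each prototype is an actual example, as with k-medoids. If several examples share identical attribute values, all of them will be flagged, and if the data was normalized before clustering, the comparison has to be made on the normalized values.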
    


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts