"(solved) Clustering and classify unlabelled dataset"

blueearth Member Posts: 42 Contributor II
edited June 8 in Help
Hi all.
I have an example set without any special attributes. Is it possible to run unsupervised clustering or classification on it in order to cluster or classify the data?
For example, I have a set of regular attributes and I want a model to cluster or classify the examples based on those attributes. Is there an operator or process for this purpose?
Thank you.

Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Yes indeed - all the clustering algorithms can do this.

    Here's an example using k-means on the Iris sample data. For fun, it also joins the cluster result back to the original data and maps clusters to labels to come up with a classification performance.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="431" width="1016">
          <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="165">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|a4|a3|a2|a1"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="k_means" compatibility="5.2.008" expanded="true" height="76" name="Clustering" width="90" x="313" y="30">
            <parameter key="k" value="3"/>
            <parameter key="measure_types" value="NumericalMeasures"/>
            <parameter key="numerical_measure" value="CosineSimilarity"/>
          </operator>
          <operator activated="true" class="replace" compatibility="5.2.008" expanded="true" height="76" name="Replace" width="90" x="313" y="300">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="replace_what" value="id_(.*)"/>
            <parameter key="replace_by" value="$1"/>
          </operator>
          <operator activated="true" class="guess_types" compatibility="5.2.008" expanded="true" height="76" name="Guess Types" width="90" x="447" y="300">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="guess_types" compatibility="5.2.008" expanded="true" height="76" name="Guess Types (2)" width="90" x="447" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="join" compatibility="5.2.008" expanded="true" height="76" name="Join" width="90" x="581" y="120">
            <list key="key_attributes"/>
          </operator>
          <operator activated="true" class="map_clustering_on_labels" compatibility="5.2.008" expanded="true" height="76" name="Map Clustering on Labels" width="90" x="715" y="30"/>
          <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="849" y="30"/>
          <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Select Attributes" from_port="original" to_op="Replace" to_port="example set input"/>
          <connect from_op="Clustering" from_port="cluster model" to_op="Map Clustering on Labels" to_port="cluster model"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Guess Types (2)" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_op="Guess Types" to_port="example set input"/>
          <connect from_op="Guess Types" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Guess Types (2)" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Guess Types (2)" from_port="original" to_port="result 2"/>
          <connect from_op="Join" from_port="join" to_op="Map Clustering on Labels" to_port="example set"/>
          <connect from_op="Map Clustering on Labels" from_port="example set" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>


    regards

    Andrew
  • blueearth Member Posts: 42 Contributor II
    Hi, thank you so much,
    but unfortunately I didn't get it.
    That example has special attributes such as label and id, but what I have is an example set without any special attributes or id; it is all just regular attributes. Is it possible to cluster or classify the examples according to the regular attributes?
    Thanks a lot.
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Select the Clustering operator and set a breakpoint before it executes and one after.

    If you run the process you will see that the input to the operator is an example set consisting of 4 regular attributes whilst the output has an id and a cluster attribute added.
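
    To make this easier to see, here is a minimal sketch of the same process with the join and label-mapping steps removed. It keeps only the Retrieve, Select Attributes and Clustering operators from the process above, so the clustering operator receives nothing but the four regular attributes. The layout coordinates and the wiring to the result ports are my assumptions, the distance measure is left at its default, and Iris only stands in for your own data. If your example set has no special attributes to begin with, the Select Attributes step should not be needed.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="431" width="1016">
          <!-- Load the data; replace the repository entry with your own example set -->
          <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <!-- Keep only the four regular attributes; the special attributes (label, id) are filtered out -->
          <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|a4|a3|a2|a1"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <!-- k-means with k=3 on the regular attributes only -->
          <operator activated="true" class="k_means" compatibility="5.2.008" expanded="true" height="76" name="Clustering" width="90" x="313" y="30">
            <parameter key="k" value="3"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <!-- Deliver the cluster model and the clustered example set as results -->
          <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    The example set delivered on result 2 should then show the four regular attributes plus the id and cluster attributes that the clustering operator adds.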

    regards

    Andrew



  • blueearth Member Posts: 42 Contributor II
    Thank you so much :D