[SOLVED] "Hierarchical Classification" operator

mdc Member Posts: 58 Maven
Hello,

I'm trying to do hierarchical classification of documents and I believe the 'hierarchical classification' operator is the way to go as recommended here in the forum. My problem is that I couldn't figure out how to use this operator and what to expect as an output. I couldn't find any example of use in the forum either. Can somebody post a sample process using this operator?

thanks in advance,
Matthew

Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Here's an example of top down clustering. It uses the top down clustering operator, which itself contains another clustering operator; in this case expectation maximization with k = 2. From observation, it works something like this. The outer operator invokes the inner one, which splits the example set into k = 2 clusters. The outer operator then repeats this on the examples in each of these 2 clusters, and the inner operator duly splits each of them into 2 more clusters. This repeats up to the depth defined in the max depth parameter of the top down clustering operator. I believe the flatten clustering operator is what is needed to extract a particular clustering, and to prove this to myself I added a map clustering on labels operator with performance to see how well the clusters map to the ground truth.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="top_down_clustering" compatibility="5.3.008" expanded="true" height="76" name="Clustering (2)" width="90" x="112" y="165">
            <parameter key="max_depth" value="2"/>
            <process expanded="true">
              <operator activated="true" class="expectation_maximization_clustering" compatibility="5.3.008" expanded="true" height="76" name="Clustering" width="90" x="179" y="75"/>
              <connect from_port="example set" to_op="Clustering" to_port="example set"/>
              <connect from_op="Clustering" from_port="cluster model" to_port="cluster model"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_cluster model" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="flatten_clustering" compatibility="5.3.008" expanded="true" height="76" name="Flatten Clustering" width="90" x="112" y="255"/>
          <operator activated="true" class="map_clustering_on_labels" compatibility="5.3.008" expanded="true" height="76" name="Map Clustering on Labels" width="90" x="380" y="210"/>
          <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance" width="90" x="514" y="75"/>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Clustering (2)" to_port="example set"/>
          <connect from_op="Clustering (2)" from_port="cluster model" to_op="Flatten Clustering" to_port="hierarchical"/>
          <connect from_op="Clustering (2)" from_port="clustered set" to_op="Flatten Clustering" to_port="example set"/>
          <connect from_op="Flatten Clustering" from_port="flat" to_op="Map Clustering on Labels" to_port="cluster model"/>
          <connect from_op="Flatten Clustering" from_port="example set" to_op="Map Clustering on Labels" to_port="example set"/>
          <connect from_op="Map Clustering on Labels" from_port="example set" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Map Clustering on Labels" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Performance" from_port="performance" to_port="result 2"/>
          <connect from_op="Performance" from_port="example set" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
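The recursion described above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under stated assumptions: the inner clusterer here is a simple 1-D 2-means split standing in for expectation maximization, the function names (`two_means`, `top_down`, `flatten`) are made up for this sketch, and the exact meaning of the operator's max depth parameter may differ from the `max_depth` used here.

```python
from statistics import mean

def two_means(values, iterations=20):
    """Inner clusterer: split 1-D values into 2 groups (a stand-in for EM with k = 2)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [values]  # all values identical, nothing to split
    centers = [lo, hi]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Recompute each center; keep the old one if a group ended up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [g for g in groups if g]

def top_down(values, max_depth, depth=0):
    """Outer step: recursively apply the inner clusterer for at most max_depth levels."""
    node = {"items": values, "children": []}
    if depth < max_depth and len(values) > 1:
        parts = two_means(values)
        if len(parts) > 1:
            node["children"] = [top_down(p, max_depth, depth + 1) for p in parts]
    return node

def flatten(node):
    """Return the leaf clusters, i.e. one flat clustering extracted from the tree."""
    if not node["children"]:
        return [node["items"]]
    return [leaf for child in node["children"] for leaf in flatten(child)]

data = [1.0, 1.2, 1.1, 5.0, 5.3, 9.8, 10.1, 10.0]
tree = top_down(data, max_depth=2)
leaves = flatten(tree)
print(len(leaves), leaves)
```

With a depth of 2 and k = 2, the tree has at most 4 leaves, which is the flat clustering that then gets compared with the labels.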
    regards

    Andrew
  • mdc Member Posts: 58 Maven
    Thanks Andrew for the reply.
    But I'm looking for hierarchical classification, particularly its operator. I have hierarchical labels which I can enter in the operator's table, but other than that I have no idea how to use it (what input it expects and what output it produces).

    Matthew
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello Matthew

    Good point - I didn't pay attention to the question and substituted clustering for classification.

    I'm not familiar with hierarchical classification in the context of machine learning, but I'm guessing it's something to do with dividing example sets into smaller and smaller pieces based on a rule at each stage. That's sort of what the clustering example is doing, with the proviso that the rule is not controllable because it is the same clustering algorithm at all times. It also produces a prediction, so it is usable as a classifier - again with one proviso: the label results are not derived from the training data, so there would also be ambiguity about the true identity of the clusters.


    regards

    Andrew
  • mdc Member Posts: 58 Maven

    Hi,

    I created a hierarchical classification a couple of years ago similar to what you described: modelling and applying a different set of labels to each divided example set. The sets of labels are hierarchical. But since there is this 'Hierarchical Classification' operator, I thought it could make the process simpler.

    Anyway, if anybody has a sample process, please post it, or maybe a hint on how it works?

    thanks,
    Matthew
  • mdc Member Posts: 58 Maven

    Anybody :( Any hint on how to use that 'hierarchical classification' operator?
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    The following process performs a hierarchical classification on Iris. You have to define the hierarchy in tabular form, starting from a "root" node.
    Please have a look at the process below and come back with any questions you have.

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="hierarchical_multi_class_classification" compatibility="5.3.008" expanded="true" height="76" name="Hierarchical Classification" width="90" x="179" y="30">
            <list key="hierarchy">
              <parameter key="versicolor_virginica" value="Iris-versicolor"/>
              <parameter key="versicolor_virginica" value="Iris-virginica"/>
              <parameter key="root" value="Iris-setosa"/>
              <parameter key="root" value="versicolor_virginica"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="support_vector_machine" compatibility="5.3.008" expanded="true" height="112" name="SVM" width="90" x="179" y="30"/>
              <connect from_port="training set" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Hierarchical Classification" to_port="training set"/>
          <connect from_op="Hierarchical Classification" from_port="model" to_port="result 2"/>
          <connect from_op="Hierarchical Classification" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
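To make the tabular hierarchy easier to read, here is a small Python sketch that turns the same parent/child rows into a tree and lists, for every inner node, which branch each original label belongs to. It assumes the common top-down scheme in which one model is trained per inner node to choose among that node's children; this is an assumption about the general technique, not a description of the operator's internals, and all function names are made up for the sketch.

```python
from collections import defaultdict

# Parent/child rows copied from the "hierarchy" list in the process above.
HIERARCHY = [
    ("versicolor_virginica", "Iris-versicolor"),
    ("versicolor_virginica", "Iris-virginica"),
    ("root", "Iris-setosa"),
    ("root", "versicolor_virginica"),
]

def build_children(rows):
    """Group the rows into a parent -> list of children mapping."""
    children = defaultdict(list)
    for parent, child in rows:
        children[parent].append(child)
    return dict(children)

def leaves_under(node, children):
    """Return the set of original (leaf) labels below a node; a leaf returns itself."""
    if node not in children:
        return {node}
    result = set()
    for child in children[node]:
        result |= leaves_under(child, children)
    return result

def training_tasks(rows):
    """For each inner node, map every leaf label below it to the child branch to predict."""
    children = build_children(rows)
    return {
        node: {leaf: kid for kid in kids for leaf in leaves_under(kid, children)}
        for node, kids in children.items()
    }

tasks = training_tasks(HIERARCHY)
for node, mapping in tasks.items():
    print(node, mapping)
```

Under this reading, the "root" model separates Iris-setosa from the combined versicolor_virginica group, and a second model separates Iris-versicolor from Iris-virginica within that group.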
  • mdc Member Posts: 58 Maven

    Thanks Marius. It works, but if I apply the model to an example set, the result does not show the hierarchical labels, just the original labels (Iris-*). Is there a way to make the prediction include the parent labels too, for example as another column?

    Matthew

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="hierarchical_multi_class_classification" compatibility="5.3.008" expanded="true" height="76" name="Hierarchical Classification" width="90" x="179" y="30">
            <list key="hierarchy">
              <parameter key="versicolor_virginica" value="Iris-versicolor"/>
              <parameter key="versicolor_virginica" value="Iris-virginica"/>
              <parameter key="root" value="Iris-setosa"/>
              <parameter key="root" value="versicolor_virginica"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="support_vector_machine" compatibility="5.3.008" expanded="true" height="112" name="SVM" width="90" x="179" y="30"/>
              <connect from_port="training set" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="75">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model" width="90" x="447" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Hierarchical Classification" to_port="training set"/>
          <connect from_op="Hierarchical Classification" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Hierarchical Classification" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Matthew,

    unfortunately that is not possible with a single operator. It is possible to build a custom process that creates hierarchical labels, but that is way more complex.
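One way to picture what such a custom process would need to do is to look up the parent of each predicted leaf label in the hierarchy table after the model has been applied. The Python sketch below shows that lookup on exported predictions. It is only an illustration, not the custom RapidMiner process itself: it assumes the hierarchy table forms a tree, that the prediction column is named `prediction(label)`, and the added column name `prediction(top_level)` and the function names are hypothetical.

```python
# Same parent/child rows as the "hierarchy" list in the process above.
HIERARCHY = [
    ("versicolor_virginica", "Iris-versicolor"),
    ("versicolor_virginica", "Iris-virginica"),
    ("root", "Iris-setosa"),
    ("root", "versicolor_virginica"),
]

def label_path(leaf, hierarchy, root="root"):
    """Return the labels from the top level down to the given leaf (root excluded)."""
    parent_of = {child: parent for parent, child in hierarchy}
    path = [leaf]
    # Walk upwards until the next parent is the root (or the label is unknown).
    while parent_of.get(path[-1], root) != root:
        path.append(parent_of[path[-1]])
    return path[::-1]

def add_top_level_column(rows, hierarchy, source="prediction(label)",
                         target="prediction(top_level)"):
    """Copy each row and add a column holding the top-level label of its prediction."""
    return [{**row, target: label_path(row[source], hierarchy)[0]} for row in rows]

predictions = [
    {"id": 1, "prediction(label)": "Iris-setosa"},
    {"id": 2, "prediction(label)": "Iris-virginica"},
]
for row in add_top_level_column(predictions, HIERARCHY):
    print(row)
```

For deeper hierarchies, `label_path` returns the full chain of labels, so one column per level could be added in the same way.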

    Best regards,
    Marius
  • mdc Member Posts: 58 Maven

    Thanks. That's good to know.

    Matthew
  • ahkcsit Member Posts: 1 Learner I

    Why don't you view this process graphically? I think it will be much easier than trying to imagine the connections in the code above.

  • mattia_fumagall Member Posts: 3 Contributor I
    Dear All,
    sorry, maybe I am a little bit late, but could you please provide some hints on how to set up the custom process you suggested? I am referring to the following post:
    Matthew,

    unfortunately that is not possible with a single operator. It is possible to build a custom process that creates hierarchical labels, but that is way more complex.

    Best regards,
    Marius
    Thank you in advance!
    Mattia

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @mattia_fumagall, yes, this is an OLD thread :smile: Can you please start a new discussion (click the "Ask a Question" button at the top) and describe exactly what you want to do? Using the code above from RapidMiner 5.3 is just not going to get us very far...

    Scott
