
Using SOM as a clustering operator

siamak_want Member Posts: 98 Contributor II
edited November 2018 in Help
Hi all,

Recently I have found the SOM (Self-Organizing Map) network to be very effective for text clustering (according to several reputable publications), so I'm about to use it for document clustering in RM. But then I noticed something strange:
      "The SOM operator is treated as a visualization operator in RM, not as a clustering operator."
Now, the question is: can I reuse the current SOM visualization algorithm to develop my own SOM clustering operator? (I have bought the manual, so I am familiar with extending RapidMiner 5.0.)

Any ideas would be greatly appreciated.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Of course you could do that, but did you see that there is a Self-Organizing Map operator? Maybe it does exactly what you are planning to implement.

    Best, Marius
  • siamak_want Member Posts: 98 Contributor II
    Hi Marius,
    Thanks for your straightforward advice, but the Self-Organizing Map operator you mentioned does not deliver a cluster model. Any ideas about this?

    Thanks
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    You can either apply a clustering algorithm to the SOM-transformed data, or set the dimensionality of the SOM to 1 and the net size to the desired number of clusters. The SOM operator outputs a preprocessing model, which you can then apply to new data. See the attached process for a simple example.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="500" width="950">
          <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="246" y="75"/>
          <operator activated="true" class="self_organizing_map" compatibility="5.2.003" expanded="true" height="94" name="SOM" width="90" x="447" y="75">
            <parameter key="number_of_dimensions" value="1"/>
          </operator>
          <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data (2)" width="90" x="447" y="210"/>
          <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model" width="90" x="648" y="120">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="SOM" to_port="example set input"/>
          <connect from_op="SOM" from_port="example set output" to_port="result 1"/>
          <connect from_op="SOM" from_port="preprocessing model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Generate Data (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
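The idea behind the second option can be sketched outside RapidMiner. The following is a minimal, hypothetical Python illustration, not RapidMiner's actual SOM code: a 1-D map whose number of nodes equals the desired number of clusters is trained with a shrinking Gaussian neighbourhood, and "applying the model" then returns the index of each example's best-matching node as its cluster. The function names `train_som_1d` and `apply_som`, the simplified linear initialisation, and the learning-rate and radius schedules are all assumptions made for this example.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: _sq_dist(weights[i], x))


def train_som_1d(data, n_nodes, n_iter=2000, lr0=0.5, seed=0):
    """Train a 1-D SOM: a chain of n_nodes units, one per desired cluster.

    Hypothetical sketch; the schedules below are assumptions, not RapidMiner's.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(row[d] for row in data) for d in range(dim)]
    hi = [max(row[d] for row in data) for d in range(dim)]
    # Simplified linear initialisation (an assumption of this sketch):
    # spread the nodes evenly along the diagonal of the data's bounding box.
    weights = []
    for i in range(n_nodes):
        t = i / (n_nodes - 1) if n_nodes > 1 else 0.5
        weights.append([lo[d] + t * (hi[d] - lo[d]) for d in range(dim)])
    sigma0 = max(n_nodes / 2.0, 1.0)
    for step in range(n_iter):
        frac = step / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for i, w in enumerate(weights):
            # Gaussian neighbourhood measured along the 1-D chain of nodes:
            # the winner moves most, its chain neighbours move less.
            h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
            for d in range(dim):
                w[d] += lr * h * (x[d] - w[d])
    return weights


def apply_som(weights, data):
    """'Apply Model' step: each row's best-matching node index is its cluster."""
    return [best_matching_unit(weights, x) for x in data]


if __name__ == "__main__":
    gen = random.Random(42)
    # Two synthetic, well-separated blobs (stand-ins for Generate Data).
    train = ([(gen.gauss(0, 0.3), gen.gauss(0, 0.3)) for _ in range(100)]
             + [(gen.gauss(5, 0.3), gen.gauss(5, 0.3)) for _ in range(100)])
    som = train_som_1d(train, n_nodes=2)  # net size = number of clusters
    print(apply_som(som, [(0.1, -0.2), (4.9, 5.2)]))
```

With a 1-D map, an example's single SOM coordinate is just the position of its winning node along the chain, which is why it can stand in for a cluster label. Because of the neighbourhood updates, adjacent node indices tend to cover adjacent regions of the data, unlike the arbitrary IDs of a k-means model.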
  • siamak_want Member Posts: 98 Contributor II
    Thanks for your fantastic and clever method, Marius.
    I think you solved the problem. Now I can use the "preprocessing model" exactly like a "cluster model".

    Thanks again, Marius.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Yeah, sometimes RapidMiner is not just about data mining, but also about creativity ;)