I want to cluster a data set and then apply feature selection to each cluster

imparveen Member Posts: 1 Learner I
edited December 2018 in Help

I am working on a health data set. I want to split it into 2 clusters and then apply different feature selection methods to each cluster. In RapidMiner, how can I access each cluster separately so that I can apply feature selection techniques to each one?

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @imparveen, welcome to the community. Without seeing your process XML, I can only speak in general terms. Basically, you would first run a clustering algorithm (e.g. k-means) on your data set. The clustered example set it outputs contains a new attribute called "cluster":

     

    Screen Shot 2018-09-24 at 9.51.03 AM.png

     

    If you then want to work on each cluster separately, I would just use Filter Examples:

     

    Screen Shot 2018-09-24 at 9.53.20 AM.png          Screen Shot 2018-09-24 at 9.53.00 AM.png

     

    The XML of that process is here:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve Iris" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="concurrency:k_means" compatibility="9.0.001" expanded="true" height="82" name="Clustering" origin="GENERATED_TUTORIAL" width="90" x="179" y="34">
    <parameter key="k" value="3"/>
    <parameter key="use_local_random_seed" value="true"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="9.0.002" expanded="true" height="103" name="Filter Examples" width="90" x="313" y="85">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="cluster.equals.cluster_1"/>
    </list>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Clustering" to_port="example set"/>
    <connect from_op="Clustering" from_port="clustered set" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
    <connect from_op="Filter Examples" from_port="unmatched example set" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="147"/>
    <portSpacing port="sink_result 3" spacing="21"/>
    <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="409" y="80">output with only cluster_1</description>
    <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="409" y="165">output with the remaining clusters (cluster_0 and cluster_2)</description>
    </process>
    </operator>
    </process>
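
For readers who find code easier to follow, the split-then-select idea can also be sketched outside RapidMiner. The Python below is only an illustration and is not part of the process above: the rows are invented toy values, the helper names `split_by_cluster` and `rank_features_by_variance` are hypothetical, and variance ranking is just a stand-in for whichever feature selection method is actually applied to each cluster.

```python
from collections import defaultdict
from statistics import pvariance

# Toy rows standing in for a clustered example set (values are invented).
# The "cluster" key plays the role of the attribute added by the clustering step.
rows = [
    {"a1": 5.1, "a2": 3.5, "cluster": "cluster_0"},
    {"a1": 4.9, "a2": 3.0, "cluster": "cluster_0"},
    {"a1": 5.0, "a2": 3.6, "cluster": "cluster_0"},
    {"a1": 6.3, "a2": 3.3, "cluster": "cluster_1"},
    {"a1": 5.8, "a2": 3.3, "cluster": "cluster_1"},
    {"a1": 7.1, "a2": 3.3, "cluster": "cluster_1"},
]


def split_by_cluster(rows, label_key="cluster"):
    """Group rows by cluster label, like one filter per cluster value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    return dict(groups)


def rank_features_by_variance(rows, label_key="cluster"):
    """Rank features by population variance within one subset (highest first).

    Assumption: variance is used here only as a simple placeholder score.
    """
    features = [key for key in rows[0] if key != label_key]
    scores = {f: pvariance([row[f] for row in rows]) for f in features}
    return sorted(scores, key=scores.get, reverse=True)


# Split once, then run the selection step independently on each subset.
subsets = split_by_cluster(rows)
ranking_per_cluster = {
    name: rank_features_by_variance(subset) for name, subset in subsets.items()
}

for name, ranking in ranking_per_cluster.items():
    print(name, ranking)
```

Because each subset is scored on its own, the two clusters can end up with different feature rankings, which is the point of selecting features per cluster rather than on the whole data set.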

    Scott
