Show only best clustering result K-means

stephanie_monic Member Posts: 1 Contributor I
edited November 2018 in Help

Hey everybody. I put the K-Means, Cluster Distance Performance (I'm using the Davies-Bouldin index), and Log operators inside Loop Parameters. It shows me the cluster results for every k, along with each k and its Davies-Bouldin index.

 

For example, say the best k based on the Davies-Bouldin index is 4. I want to write only the best clustering result (the one with 4 clusters) to Excel. But it only writes the clustering result for the largest k that I set in Loop Parameters. (For example, if I set k from 2 to 10, right now it only writes the data clustered into 10 clusters.)

 

Do you guys have any idea how to do this? Thank you so much, it means a lot!

Answers

  • earmijo Member Posts: 270 Unicorn

    Try the following code. It should do what you want. Instead of using a loop operator, you could use Optimize Parameters.

     

    It's an example using the Iris dataset. 

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.002" expanded="true" height="68" name="Retrieve Iris" width="90" x="179" y="289">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="normalize" compatibility="7.6.002" expanded="true" height="103" name="Normalize" width="90" x="380" y="289"/>
    <operator activated="true" class="optimize_parameters_grid" compatibility="7.6.002" expanded="true" height="145" name="Optimize Parameters (Grid)" width="90" x="581" y="136">
    <list key="parameters">
    <parameter key="Clustering.k" value="[2.0;8;7;linear]"/>
    </list>
    <process expanded="true">
    <operator activated="true" class="k_means" compatibility="7.6.002" expanded="true" height="82" name="Clustering" width="90" x="380" y="85">
    <parameter key="k" value="8"/>
    </operator>
    <operator activated="true" class="cluster_distance_performance" compatibility="7.6.002" expanded="true" height="103" name="Performance" width="90" x="648" y="238">
    <parameter key="main_criterion" value="Davies Bouldin"/>
    <parameter key="main_criterion_only" value="true"/>
    </operator>
    <operator activated="true" class="log" compatibility="7.6.002" expanded="true" height="82" name="Log" width="90" x="849" y="85">
    <list key="log">
    <parameter key="k" value="operator.Clustering.parameter.k"/>
    <parameter key="performance" value="operator.Performance.value.DaviesBouldin"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Clustering" to_port="example set"/>
    <connect from_op="Clustering" from_port="cluster model" to_op="Performance" to_port="cluster model"/>
    <connect from_op="Clustering" from_port="clustered set" to_op="Performance" to_port="example set"/>
    <connect from_op="Performance" from_port="performance" to_op="Log" to_port="through 1"/>
    <connect from_op="Performance" from_port="example set" to_port="result 2"/>
    <connect from_op="Performance" from_port="cluster model" to_port="result 1"/>
    <connect from_op="Log" from_port="through 1" to_port="performance"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="result 1" to_port="result 2"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="result 2" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>
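
    The process above only sends the results to the result ports. To also write the clustered data to Excel, something like the fragment below might work. It is a rough sketch, not tested against this exact version: it assumes the Write Excel operator (class `write_excel`, parameter `excel_file`, ports `input` and `through`), and the file name is just a placeholder. It also assumes that Optimize Parameters (Grid) delivers the inner results for the best parameter setting on its result ports, so `result 2` would carry the example set clustered with the best k. The two connections go in the top-level process, in place of the existing connection from `result 2` to `result 3`.

    ```xml
    <!-- Sketch only: add inside the top-level <process expanded="true"> element. -->
    <operator activated="true" class="write_excel" compatibility="7.6.002" expanded="true" name="Write Excel">
      <!-- Placeholder output path (assumption); replace with your own file. -->
      <parameter key="excel_file" value="best_clustering.xlsx"/>
    </operator>
    <!-- "result 2" of Optimize Parameters (Grid) is fed by the inner Performance "example set" port. -->
    <connect from_op="Optimize Parameters (Grid)" from_port="result 2" to_op="Write Excel" to_port="input"/>
    <!-- Pass the example set on so it still shows up in the Results view. -->
    <connect from_op="Write Excel" from_port="through" to_port="result 3"/>
    ```

    If the port or parameter names differ in your version, dragging Write Excel onto the canvas and wiring it to the second result port of Optimize Parameters (Grid) by hand should give the same result.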