"How to evaluate and compare two clustering methods, including k-means"

soheil008 Member Posts: 5 Contributor II
edited June 2019 in Help


I want to apply two clustering methods, including k-means, to my data and then compare them. Is there any way in RapidMiner to compare clustering results?



    Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Hi Soheil,


    Yes, that's quite easy to do. Put a Multiply operator after your data set and connect each clustering operator to one of its outputs. Then connect the output ports of every clustering operator (cluster model and clustered set) to the process results. You can also use a Write CSV operator to write out the results.


    Something like this perhaps?

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.001">
    <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="7.1.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34"/>
    <operator activated="true" class="multiply" compatibility="7.1.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
    <operator activated="true" class="k_means" compatibility="7.1.001" expanded="true" height="82" name="Clustering" width="90" x="380" y="34"/>
    <operator activated="true" class="x_means" compatibility="7.1.001" expanded="true" height="82" name="X-Means" width="90" x="380" y="136"/>
    <connect from_op="Generate Data" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Clustering" to_port="example set"/>
    <connect from_op="Multiply" from_port="output 2" to_op="X-Means" to_port="example set"/>
    <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
    <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
    <connect from_op="X-Means" from_port="cluster model" to_port="result 3"/>
    <connect from_op="X-Means" from_port="clustered set" to_port="result 4"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>
    soheil008 Member Posts: 5 Contributor II

    Thanks, but I didn't have any problem applying the algorithms. I have already applied k-Means, k-Medoids and DBSCAN and saved the results. Now I want to compare these results with each other, and I don't know which operator I should use.


    I found "Cluster Distance Performance", "Cluster Density Performance" and "Item Distribution Performance". Which one is suitable for comparing k-Means, k-Medoids and DBSCAN?

    Can I use the Davies-Bouldin index?

    yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist

    The 'Cluster Distance Performance' operator supports two performance measures: the average within-cluster distance and the Davies-Bouldin index.
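    To make those two measures concrete, here is a minimal sketch in plain Python (standard library only, Python 3.8+ for math.dist). It assumes Euclidean distance and takes each cluster's centroid as the mean of its members, so it can score any labeling, including DBSCAN output (where noise points would simply count as one more cluster). The Davies-Bouldin index is computed as DB = (1/k) · Σ_i max_{j≠i} (s_i + s_j) / d(c_i, c_j), where s_i is the average distance of cluster i's points to its centroid c_i; lower values mean tighter, better-separated clusters. The function names are hypothetical and not RapidMiner APIs, and RapidMiner's operator may differ in details such as the distance measure or the sign it reports.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroids(points: Sequence[Point], labels: Sequence[Hashable]) -> Dict[Hashable, Point]:
    """Centroid of each cluster = coordinate-wise mean of its members (an assumption of this sketch)."""
    groups: Dict[Hashable, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in groups.items()
    }


def avg_within_cluster_distance(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Average Euclidean distance from every point to the centroid of its own cluster."""
    cents = centroids(points, labels)
    total = sum(math.dist(p, cents[lab]) for p, lab in zip(points, labels))
    return total / len(points)


def davies_bouldin(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Davies-Bouldin index (lower is better). Assumes at least two clusters with distinct centroids."""
    cents = centroids(points, labels)
    if len(cents) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    # Scatter s_i: mean distance of cluster i's members to centroid c_i.
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for p, lab in zip(points, labels):
        sums[lab] += math.dist(p, cents[lab])
        counts[lab] += 1
    scatter = {lab: sums[lab] / counts[lab] for lab in cents}
    # For each cluster, take its worst (largest) similarity ratio to any other cluster.
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in cents if j != i)
        for i in cents
    ]
    return sum(worst) / len(worst)


# Usage: score two different labelings of the same four points.
data: List[Point] = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # keeps each nearby pair together
bad = [0, 1, 0, 1]   # splits each nearby pair apart
for name, labels in [("good", good), ("bad", bad)]:
    print(name, avg_within_cluster_distance(data, labels), davies_bouldin(data, labels))
```

    Here the "good" labeling gets both the lower average within-cluster distance and the lower Davies-Bouldin value. Keep in mind that the average within-cluster distance tends to shrink as the number of clusters grows, so it is a fairer comparison between runs that produce a similar number of clusters.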


    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,520 RM Data Scientist

    One more quick tip: you can use the Performance to Data operator to turn your performance vector into an example set. After that, it's easy to compare the values with standard ETL tools.
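    The comparison step after that conversion can be sketched outside RapidMiner as well. This minimal Python sketch assumes each method's performance table has already been reduced to (criterion, value) pairs; it tags every pair with the method name, appends the tables into one long table, then filters to a single criterion and sorts the methods best-first. The function names and criterion keys are hypothetical, and the numbers are placeholders for illustration only, not real results.

```python
from typing import List, Tuple

# One row per (method, criterion, value), like an appended long-format comparison table.
Row = Tuple[str, str, float]


def label_rows(method: str, perf_rows: List[Tuple[str, float]]) -> List[Row]:
    """Tag each (criterion, value) pair from one method's performance table with the method name."""
    return [(method, criterion, value) for criterion, value in perf_rows]


def rank(rows: List[Row], criterion: str, lower_is_better: bool = True) -> List[Tuple[str, float]]:
    """Keep only one criterion and sort the methods from best to worst."""
    picked = [(method, value) for method, crit, value in rows if crit == criterion]
    return sorted(picked, key=lambda mv: mv[1], reverse=not lower_is_better)


# Placeholder numbers for illustration only, not real results.
# The criterion names are hypothetical; use whatever your performance table contains.
all_rows: List[Row] = []
all_rows += label_rows("k-Means", [("davies_bouldin", 0.9), ("avg_within_distance", 1.2)])
all_rows += label_rows("k-Medoids", [("davies_bouldin", 0.7), ("avg_within_distance", 1.4)])

for method, value in rank(all_rows, "davies_bouldin"):
    print(method, value)
```

    Whether a lower or higher value is better depends on the criterion and on how it is reported, which is why the sketch exposes it as the lower_is_better flag.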




    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany