"cluster performance evaluation - negative (?) average of distances"

dan_agape Member Posts: 106 Maven
edited May 2019 in Help

I have just tested the operators that measure clustering performance, in particular for a centroid-based scheme. As a measure of clustering quality, the Cluster Distance Performance operator reported negative (?) averages of the distances from the centroids to the instances within the respective clusters, which is surprising because an average of distances cannot be negative. Here is an example process that uses a clustering model built by the first process in http://rapid-i.com/rapidforum/index.php/topic,2608.0.html:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
 <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
   <process expanded="true" height="404" width="599">
     <operator activated="true" class="generate_data" compatibility="5.0.10" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="165">
       <parameter key="number_examples" value="2000"/>
       <parameter key="use_local_random_seed" value="true"/>
       <parameter key="local_random_seed" value="20090"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="255">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="label"/>
       <parameter key="invert_selection" value="true"/>
       <parameter key="include_special_attributes" value="true"/>
     </operator>
     <operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
       <parameter key="repository_entry" value="//NewLocalRepository/models/tmp_kmeans_mod"/>
     </operator>
     <operator activated="true" class="cluster_distance_performance" compatibility="5.0.10" expanded="true" height="94" name="Performance" width="90" x="246" y="165"/>
     <connect from_op="Generate Data (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
     <connect from_op="Select Attributes" from_port="example set output" to_op="Performance" to_port="example set"/>
     <connect from_op="Retrieve" from_port="output" to_op="Performance" to_port="cluster model"/>
     <connect from_op="Performance" from_port="performance" to_port="result 1"/>
     <connect from_op="Performance" from_port="example set" to_port="result 3"/>
     <connect from_op="Performance" from_port="cluster model" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
     <portSpacing port="sink_result 4" spacing="0"/>
   </process>
 </operator>
</process>
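
To make the question concrete, here is a minimal Python sketch of the quantity being discussed. It assumes Euclidean distance and uses hypothetical toy points, labels, and centroids; nothing in it is taken from RapidMiner's implementation. The per-cluster average distance from the centroid to the cluster's instances is non-negative by construction, so a negative reported value would have to come from a sign flip applied afterwards, for example to turn a smaller-is-better distance into a larger-is-better score. That explanation is only an assumption here and is not confirmed anywhere in this thread.

```python
import math


def avg_within_centroid_distance(points, labels, centroids):
    """Average Euclidean distance from each centroid to the points in its cluster."""
    distances_by_cluster = {}
    for point, cluster in zip(points, labels):
        distance = math.dist(point, centroids[cluster])
        distances_by_cluster.setdefault(cluster, []).append(distance)
    return {c: sum(d) / len(d) for c, d in distances_by_cluster.items()}


# Hypothetical toy data: two clusters with known centroids.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
centroids = {0: (0.0, 1.0), 1: (10.0, 2.0)}

averages = avg_within_centroid_distance(points, labels, centroids)

# Assumed convention (not confirmed for RapidMiner): negate the averages so
# that a larger value means tighter clusters.
negated = {cluster: -value for cluster, value in averages.items()}

print(averages)
print(negated)
```

If the operator does follow such a convention, the magnitudes of the reported values should match the plain averages computed this way on the same data.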


    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    This seems strange, but I cannot execute your process because of missing data. Could you please file a bug report for this, too?

    With kind regards,
      Sebastian Land