RapidMiner

Cluster Algorithms do not produce any output

Contributor II

Hello everyone,

I am trying to run the following clustering algorithms on an example set that I generated beforehand from several JSON files.

What I would like to do is build clusters from the example set and measure the quality of each algorithm.
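
For readers less familiar with what these operators compute, here is a minimal sketch of "build clusters, then score them" in plain Python. It is not RapidMiner's implementation: it runs Lloyd's k-means (similar in spirit to the `k_means` operator) and computes the Davies-Bouldin index, which the process logs as `DaviesBouldin` from `cluster_distance_performance`. The function names `kmeans` and `davies_bouldin` are hypothetical, and RapidMiner may normalize or flip the sign of the values it reports, so compare trends across algorithms rather than exact numbers.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index (lower is better).

    Assumes every cluster is non-empty and no two centroids coincide.
    """
    k = len(centroids)
    # S_i: average distance from the members of cluster i to its centroid.
    scatter = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(math.dist(p, centroids[i]) for p in members) / len(members))
    # For each cluster, keep its worst (largest) similarity ratio to any other cluster.
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels, round(davies_bouldin(points, centroids, labels), 3))
```

Lower Davies-Bouldin values mean tighter, better-separated clusters, which is why it is a common way to compare runs with different algorithms or values of k.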

I am facing two basic problems:

1. When I run the process with only the centroid-based algorithms (k-Means, X-Means and k-Medoids in the "Centroid Algo" loop), it finishes successfully but does not produce any clusters, or at least I cannot see them in the results.

2. When I run the full process from the attached XML, it stops because the clustering algorithms do not produce any output.
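
Problem 2 looks like what happens when an output port is read outside a subprocess but nothing is connected to it inside. For example, inside `Subprocess (3)` only `out 1` is connected (from `Log`), while the enclosing `Select Subprocess (2)` forwards `out 2` and `out 3`. A small script can list every such gap. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming (1) the process is a complete, well-formed export, such as the `.rmp` file from File > Export Process (the paste below does not parse as-is, since it is missing its opening `<process>` element), and (2) inner sink ports carry the same names as the outer output ports they feed, as `out 2`, `output 1` and `result 2` do in this process. The function name `find_unfed_outputs` is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_unfed_outputs(xml_text):
    """List (operator, inner process index, port) triples where the enclosing
    process reads an output port of an operator that wraps inner process(es),
    but that inner process never connects anything to the port."""
    root = ET.fromstring(xml_text)
    problems = []
    for proc in root.iter("process"):
        for op in proc.findall("operator"):
            inner_procs = op.findall("process")
            if not inner_procs:
                continue  # ordinary operator without a nested process
            name = op.get("name")
            # Output ports of `op` that the surrounding process consumes.
            used = {c.get("from_port") for c in proc.findall("connect")
                    if c.get("from_op") == name}
            for idx, inner in enumerate(inner_procs, start=1):
                # Inner connections that end at the sink have no `to_op` attribute.
                fed = {c.get("to_port") for c in inner.findall("connect")
                       if c.get("to_op") is None}
                for port in sorted(used - fed):
                    problems.append((name, idx, port))
    return problems


# Tiny hypothetical process: "Sub" only wires "out 1", but its parent reads "out 2" too.
SAMPLE = """<process version="7.4.000">
  <operator class="process" name="Process">
    <process>
      <operator class="subprocess" name="Sub">
        <process>
          <connect from_port="in 1" to_port="out 1"/>
        </process>
      </operator>
      <connect from_op="Sub" from_port="out 1" to_port="result 1"/>
      <connect from_op="Sub" from_port="out 2" to_port="result 2"/>
    </process>
  </operator>
</process>"""

for op_name, idx, port in find_unfed_outputs(SAMPLE):
    print(f"{op_name} (inner process {idx}): nothing connected to '{port}'")
```

To check a real export, pass its bytes, e.g. `find_unfed_outputs(open(path, "rb").read())`; on the process in this thread it should, for instance, report `out 2` and `out 3` of `Subprocess (3)`.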

Could anyone look at my process and give me some suggestions?

Thank you very much!

  <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="example list 1-300"/>
      </operator>
      <operator activated="false" class="sample_kennard_stone" compatibility="7.4.000" expanded="true" height="82" name="Sample (Kennard-Stone)" width="90" x="246" y="340">
        <parameter key="sample_size" value="600"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="7.4.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
        <list key="columns"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (2)" width="90" x="380" y="30"/>
      <operator activated="true" class="loop_parameters" compatibility="7.4.000" expanded="true" height="145" name="Loop Parameters" width="90" x="648" y="289">
        <list key="parameters">
          <parameter key="Select Subprocess (2).select_which" value="[1;3;3;linear]"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (3)" width="90" x="45" y="136"/>
          <operator activated="true" class="select_subprocess" compatibility="7.4.000" expanded="true" height="103" name="Select Subprocess (2)" width="90" x="246" y="34">
            <process expanded="true">
              <operator activated="true" class="dbscan" compatibility="7.4.000" expanded="true" height="82" name="Clustering" width="90" x="112" y="34"/>
              <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="124" name="Subprocess (3)" width="90" x="112" y="289">
                <process expanded="true">
                  <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (8)" width="90" x="179" y="34"/>
                  <operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="447" y="34"/>
                  <operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (2)" width="90" x="447" y="136">
                    <parameter key="measure" value="GiniCoefficient"/>
                  </operator>
                  <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (9)" width="90" x="179" y="289"/>
                  <operator activated="false" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Performance (3)" width="90" x="447" y="289"/>
                  <operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity" width="90" x="179" y="442"/>
                  <operator activated="true" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Performance (4)" width="90" x="447" y="442"/>
                  <operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log" width="90" x="782" y="34">
                    <list key="log">
                      <parameter key="Avg_within_distance" value="operator.Performance (3).value.avg_within_distance"/>
                      <parameter key="Item_Distribution" value="operator.Performance (4).value.clusterdensity"/>
                      <parameter key="Gini" value="operator.Performance (2).value.item_distribution"/>
                      <parameter key="Cluster_Density" value="operator.Performance.value.item_distribution"/>
                      <parameter key="K" value="operator.Loop Parameters.value.iteration"/>
                      <parameter key="Davies" value="operator.Performance (3).value.DaviesBouldin"/>
                    </list>
                  </operator>
                  <connect from_port="in 1" to_op="Multiply (8)" to_port="input"/>
                  <connect from_port="in 2" to_op="Multiply (9)" to_port="input"/>
                  <connect from_op="Multiply (8)" from_port="output 1" to_op="Performance" to_port="cluster model"/>
                  <connect from_op="Multiply (8)" from_port="output 2" to_op="Performance (4)" to_port="cluster model"/>
                  <connect from_op="Performance" from_port="cluster model" to_op="Performance (2)" to_port="cluster model"/>
                  <connect from_op="Multiply (9)" from_port="output 1" to_op="Performance (4)" to_port="example set"/>
                  <connect from_op="Multiply (9)" from_port="output 2" to_op="Data to Similarity" to_port="example set"/>
                  <connect from_op="Data to Similarity" from_port="similarity" to_op="Performance (4)" to_port="distance measure"/>
                  <connect from_op="Log" from_port="through 1" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="source_in 3" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                  <portSpacing port="sink_out 3" spacing="0"/>
                  <portSpacing port="sink_out 4" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Clustering" to_port="example set"/>
              <connect from_op="Clustering" from_port="cluster model" to_op="Subprocess (3)" to_port="in 1"/>
              <connect from_op="Clustering" from_port="clustered set" to_op="Subprocess (3)" to_port="in 2"/>
              <connect from_op="Subprocess (3)" from_port="out 2" to_port="output 1"/>
              <connect from_op="Subprocess (3)" from_port="out 3" to_port="output 2"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="agglomerative_clustering" compatibility="7.4.000" expanded="true" height="82" name="Clustering (2)" width="90" x="112" y="34"/>
              <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="124" name="Subprocess (4)" width="90" x="112" y="289">
                <process expanded="true">
                  <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="82" name="Multiply (10)" width="90" x="179" y="34"/>
                  <operator activated="false" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (5)" width="90" x="447" y="34"/>
                  <operator activated="false" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (6)" width="90" x="447" y="136">
                    <parameter key="measure" value="GiniCoefficient"/>
                  </operator>
                  <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (11)" width="90" x="179" y="289"/>
                  <operator activated="false" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Performance (7)" width="90" x="447" y="289"/>
                  <operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity (3)" width="90" x="179" y="442"/>
                  <operator activated="false" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Performance (8)" width="90" x="447" y="442"/>
                  <operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log (2)" width="90" x="782" y="34">
                    <list key="log">
                      <parameter key="Avg_within_distance" value="operator.Performance (3).value.avg_within_distance"/>
                      <parameter key="Item_Distribution" value="operator.Performance (4).value.clusterdensity"/>
                      <parameter key="Gini" value="operator.Performance (2).value.item_distribution"/>
                      <parameter key="Cluster_Density" value="operator.Performance.value.item_distribution"/>
                      <parameter key="K" value="operator.Loop Parameters.value.iteration"/>
                      <parameter key="Davies" value="operator.Performance (3).value.DaviesBouldin"/>
                    </list>
                  </operator>
                  <connect from_port="in 1" to_op="Multiply (10)" to_port="input"/>
                  <connect from_port="in 2" to_op="Multiply (11)" to_port="input"/>
                  <connect from_op="Multiply (11)" from_port="output 2" to_op="Data to Similarity (3)" to_port="example set"/>
                  <connect from_op="Log (2)" from_port="through 1" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="source_in 3" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                  <portSpacing port="sink_out 3" spacing="0"/>
                  <portSpacing port="sink_out 4" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Clustering (2)" to_port="example set"/>
              <connect from_op="Clustering (2)" from_port="cluster model" to_op="Subprocess (4)" to_port="in 1"/>
              <connect from_op="Clustering (2)" from_port="example set" to_op="Subprocess (4)" to_port="in 2"/>
              <connect from_op="Subprocess (4)" from_port="out 2" to_port="output 1"/>
              <connect from_op="Subprocess (4)" from_port="out 3" to_port="output 2"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="support_vector_clustering" compatibility="7.4.000" expanded="true" height="82" name="Clustering (3)" width="90" x="112" y="34"/>
              <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="124" name="Subprocess (5)" width="90" x="112" y="289">
                <process expanded="true">
                  <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (12)" width="90" x="179" y="34"/>
                  <operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (9)" width="90" x="447" y="34"/>
                  <operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (10)" width="90" x="447" y="136">
                    <parameter key="measure" value="GiniCoefficient"/>
                  </operator>
                  <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (13)" width="90" x="179" y="289"/>
                  <operator activated="false" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Performance (11)" width="90" x="447" y="289"/>
                  <operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity (4)" width="90" x="179" y="442"/>
                  <operator activated="true" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Performance (12)" width="90" x="447" y="442"/>
                  <operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log (3)" width="90" x="782" y="34">
                    <list key="log">
                      <parameter key="Avg_within_distance" value="operator.Performance (3).value.avg_within_distance"/>
                      <parameter key="Item_Distribution" value="operator.Performance (4).value.clusterdensity"/>
                      <parameter key="Gini" value="operator.Performance (2).value.item_distribution"/>
                      <parameter key="Cluster_Density" value="operator.Performance.value.item_distribution"/>
                      <parameter key="K" value="operator.Loop Parameters.value.iteration"/>
                      <parameter key="Davies" value="operator.Performance (3).value.DaviesBouldin"/>
                    </list>
                  </operator>
                  <connect from_port="in 1" to_op="Multiply (12)" to_port="input"/>
                  <connect from_port="in 2" to_op="Multiply (13)" to_port="input"/>
                  <connect from_op="Multiply (12)" from_port="output 1" to_op="Performance (9)" to_port="cluster model"/>
                  <connect from_op="Multiply (12)" from_port="output 3" to_op="Performance (12)" to_port="cluster model"/>
                  <connect from_op="Performance (9)" from_port="cluster model" to_op="Performance (10)" to_port="cluster model"/>
                  <connect from_op="Multiply (13)" from_port="output 2" to_op="Performance (12)" to_port="example set"/>
                  <connect from_op="Multiply (13)" from_port="output 3" to_op="Data to Similarity (4)" to_port="example set"/>
                  <connect from_op="Data to Similarity (4)" from_port="similarity" to_op="Performance (12)" to_port="distance measure"/>
                  <connect from_op="Log (3)" from_port="through 1" to_port="out 1"/>
                  <portSpacing port="source_in 1" spacing="0"/>
                  <portSpacing port="source_in 2" spacing="0"/>
                  <portSpacing port="source_in 3" spacing="0"/>
                  <portSpacing port="sink_out 1" spacing="0"/>
                  <portSpacing port="sink_out 2" spacing="0"/>
                  <portSpacing port="sink_out 3" spacing="0"/>
                  <portSpacing port="sink_out 4" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Clustering (3)" to_port="example set"/>
              <connect from_op="Clustering (3)" from_port="cluster model" to_op="Subprocess (5)" to_port="in 1"/>
              <connect from_op="Clustering (3)" from_port="clustered set" to_op="Subprocess (5)" to_port="in 2"/>
              <connect from_op="Subprocess (5)" from_port="out 2" to_port="output 1"/>
              <connect from_op="Subprocess (5)" from_port="out 3" to_port="output 2"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (4)" width="90" x="313" y="289"/>
          <operator activated="true" class="extract_prototypes" compatibility="7.4.000" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="514" y="391"/>
          <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="103" name="Subprocess (6)" width="90" x="447" y="136">
            <process expanded="true">
              <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (14)" width="90" x="179" y="34"/>
              <operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (13)" width="90" x="447" y="34"/>
              <operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (14)" width="90" x="447" y="136">
                <parameter key="measure" value="GiniCoefficient"/>
              </operator>
              <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (15)" width="90" x="179" y="289"/>
              <operator activated="true" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Performance (15)" width="90" x="447" y="289"/>
              <operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity (5)" width="90" x="179" y="442"/>
              <operator activated="true" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Performance (16)" width="90" x="447" y="442"/>
              <operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log (4)" width="90" x="782" y="34">
                <list key="log">
                  <parameter key="Avg_within_distance" value="operator.Performance (3).value.avg_within_distance"/>
                  <parameter key="Item_Distribution" value="operator.Performance (4).value.clusterdensity"/>
                  <parameter key="Gini" value="operator.Performance (2).value.item_distribution"/>
                  <parameter key="Cluster_Density" value="operator.Performance.value.item_distribution"/>
                  <parameter key="K" value="operator.Loop Parameters.value.iteration"/>
                  <parameter key="Davies" value="operator.Performance (3).value.DaviesBouldin"/>
                </list>
              </operator>
              <connect from_port="in 1" to_op="Multiply (14)" to_port="input"/>
              <connect from_port="in 2" to_op="Multiply (15)" to_port="input"/>
              <connect from_op="Multiply (14)" from_port="output 1" to_op="Performance (13)" to_port="cluster model"/>
              <connect from_op="Multiply (14)" from_port="output 2" to_op="Performance (15)" to_port="cluster model"/>
              <connect from_op="Multiply (14)" from_port="output 3" to_op="Performance (16)" to_port="cluster model"/>
              <connect from_op="Performance (13)" from_port="cluster model" to_op="Performance (14)" to_port="cluster model"/>
              <connect from_op="Multiply (15)" from_port="output 1" to_op="Performance (15)" to_port="example set"/>
              <connect from_op="Multiply (15)" from_port="output 2" to_op="Performance (16)" to_port="example set"/>
              <connect from_op="Multiply (15)" from_port="output 3" to_op="Data to Similarity (5)" to_port="example set"/>
              <connect from_op="Data to Similarity (5)" from_port="similarity" to_op="Performance (16)" to_port="distance measure"/>
              <connect from_op="Log (4)" from_port="through 1" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="source_in 3" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log_to_data" compatibility="7.4.000" expanded="true" height="103" name="Log to Data" width="90" x="648" y="136"/>
          <operator activated="true" class="guess_types" compatibility="7.4.000" expanded="true" height="82" name="Guess Types" width="90" x="849" y="136"/>
          <connect from_port="input 1" to_op="Multiply (3)" to_port="input"/>
          <connect from_op="Multiply (3)" from_port="output 1" to_op="Select Subprocess (2)" to_port="input 1"/>
          <connect from_op="Multiply (3)" from_port="output 2" to_op="Subprocess (6)" to_port="in 2"/>
          <connect from_op="Select Subprocess (2)" from_port="output 1" to_op="Multiply (4)" to_port="input"/>
          <connect from_op="Select Subprocess (2)" from_port="output 2" to_port="result 2"/>
          <connect from_op="Multiply (4)" from_port="output 1" to_op="Subprocess (6)" to_port="in 1"/>
          <connect from_op="Multiply (4)" from_port="output 2" to_port="result 1"/>
          <connect from_op="Multiply (4)" from_port="output 3" to_op="Extract Cluster Prototypes" to_port="model"/>
          <connect from_op="Extract Cluster Prototypes" from_port="example set" to_port="result 4"/>
          <connect from_op="Subprocess (6)" from_port="out 1" to_op="Log to Data" to_port="through 1"/>
          <connect from_op="Log to Data" from_port="exampleSet" to_op="Guess Types" to_port="example set input"/>
          <connect from_op="Guess Types" from_port="example set output" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="loop_parameters" compatibility="7.4.000" expanded="true" height="145" name="Centroid Algo" width="90" x="648" y="30">
        <list key="parameters">
          <parameter key="Select Subprocess.select_which" value="[1;3;3;linear]"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="94" name="Multiply" width="90" x="45" y="120"/>
          <operator activated="true" class="select_subprocess" compatibility="7.4.000" expanded="true" height="103" name="Select Subprocess" width="90" x="246" y="30">
            <parameter key="select_which" value="3"/>
            <process expanded="true">
              <operator activated="true" class="k_means" compatibility="7.4.000" expanded="true" height="82" name="k-Means" width="90" x="45" y="30">
                <parameter key="k" value="9"/>
              </operator>
              <connect from_port="input 1" to_op="k-Means" to_port="example set"/>
              <connect from_op="k-Means" from_port="cluster model" to_port="output 1"/>
              <connect from_op="k-Means" from_port="clustered set" to_port="output 2"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="x_means" compatibility="7.4.000" expanded="true" height="82" name="X-Means" width="90" x="45" y="30"/>
              <connect from_port="input 1" to_op="X-Means" to_port="example set"/>
              <connect from_op="X-Means" from_port="cluster model" to_port="output 1"/>
              <connect from_op="X-Means" from_port="clustered set" to_port="output 2"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="k_medoids" compatibility="7.4.000" expanded="true" height="82" name="K Medoid" width="90" x="45" y="30">
                <parameter key="k" value="9"/>
              </operator>
              <connect from_port="input 1" to_op="K Medoid" to_port="example set"/>
              <connect from_op="K Medoid" from_port="cluster model" to_port="output 1"/>
              <connect from_op="K Medoid" from_port="clustered set" to_port="output 2"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (7)" width="90" x="313" y="340"/>
          <operator activated="true" class="extract_prototypes" compatibility="7.4.000" expanded="true" height="82" name="Extract Cluster Prototypes (2)" width="90" x="581" y="289"/>
          <operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="103" name="Subprocess (2)" width="90" x="447" y="120">
            <process expanded="true">
              <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (5)" width="90" x="246" y="30"/>
              <operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (6)" width="90" x="246" y="210"/>
              <operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity (2)" width="90" x="246" y="345"/>
              <operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Distribution SoS (2)" width="90" x="447" y="30"/>
              <operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Distribution Gini (2)" width="90" x="447" y="120">
                <parameter key="measure" value="GiniCoefficient"/>
              </operator>
              <operator activated="true" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Distance (2)" width="90" x="447" y="210">
                <parameter key="normalize" value="true"/>
              </operator>
              <operator activated="false" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Density (2)" width="90" x="447" y="345"/>
              <operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log: Internal" width="90" x="715" y="30">
                <list key="log">
                  <parameter key="avgWithinDistance" value="operator.Distance (2).value.avg_within_distance"/>
                  <parameter key="itemDistribution" value="operator.Density (2).value.clusterdensity"/>
                  <parameter key="Gini" value="operator.Distribution Gini (2).value.item_distribution"/>
                  <parameter key="clusterDensity" value="operator.Distribution SoS (2).value.item_distribution"/>
                  <parameter key="K" value="operator.Centroid Algo.value.iteration"/>
                  <parameter key="Davis" value="operator.Distance (2).value.DaviesBouldin"/>
                </list>
              </operator>
              <connect from_port="in 1" to_op="Multiply (5)" to_port="input"/>
              <connect from_port="in 2" to_op="Multiply (6)" to_port="input"/>
              <connect from_op="Multiply (5)" from_port="output 1" to_op="Distribution SoS (2)" to_port="cluster model"/>
              <connect from_op="Multiply (5)" from_port="output 2" to_op="Distance (2)" to_port="cluster model"/>
              <connect from_op="Multiply (6)" from_port="output 1" to_op="Distance (2)" to_port="example set"/>
              <connect from_op="Multiply (6)" from_port="output 2" to_op="Data to Similarity (2)" to_port="example set"/>
              <connect from_op="Distribution SoS (2)" from_port="cluster model" to_op="Distribution Gini (2)" to_port="cluster model"/>
              <connect from_op="Log: Internal" from_port="through 1" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="source_in 3" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log_to_data" compatibility="7.4.000" expanded="true" height="94" name="Internal validity measures" width="90" x="648" y="120">
            <parameter key="log_name" value="Log: Internal"/>
          </operator>
          <operator activated="true" class="guess_types" compatibility="7.1.001" expanded="true" height="76" name="Internal" width="90" x="782" y="120"/>
          <connect from_port="input 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Select Subprocess" to_port="input 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Subprocess (2)" to_port="in 2"/>
          <connect from_op="Select Subprocess" from_port="output 1" to_op="Multiply (7)" to_port="input"/>
          <connect from_op="Select Subprocess" from_port="output 2" to_port="result 2"/>
          <connect from_op="Multiply (7)" from_port="output 1" to_op="Subprocess (2)" to_port="in 1"/>
          <connect from_op="Multiply (7)" from_port="output 2" to_op="Extract Cluster Prototypes (2)" to_port="model"/>
          <connect from_op="Extract Cluster Prototypes (2)" from_port="example set" to_port="result 4"/>
          <connect from_op="Subprocess (2)" from_port="out 1" to_op="Internal validity measures" to_port="through 1"/>
          <connect from_op="Internal validity measures" from_port="exampleSet" to_op="Internal" to_port="example set input"/>
          <connect from_op="Internal" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="Centroid Algo" to_port="input 1"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_op="Loop Parameters" to_port="input 1"/>
      <connect from_op="Loop Parameters" from_port="result 2" to_port="result 4"/>
      <connect from_op="Loop Parameters" from_port="result 3" to_port="result 5"/>
      <connect from_op="Loop Parameters" from_port="result 4" to_port="result 6"/>
      <connect from_op="Centroid Algo" from_port="result 2" to_port="result 1"/>
      <connect from_op="Centroid Algo" from_port="result 3" to_port="result 2"/>
      <connect from_op="Centroid Algo" from_port="result 4" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
      <portSpacing port="sink_result 6" spacing="0"/>
      <portSpacing port="sink_result 7" spacing="0"/>
    </process>
  </operator>
</process>
Community Manager

Re: Cluster Algorithms do not produce any output

There's something wrong with the XML you posted. Can you export the process (File > Export Process) and attach it, together with a snapshot of the data?

Regards,
T-Bone
Twitter: @neuralmarket
Contributor II

Re: Cluster Algorithms do not produce any output

Sorry, I missed the first lines of the XML.

Here are the exported process and a snapshot of the example set. The example set contains the data after preprocessing.

Apparently I have too much data (approx. 7 GB): RapidMiner stops due to insufficient memory when I try to run preprocessing and clustering in a single process, even though I have 16 GB of RAM. That's OK for me; I will try to work with subprocesses and save the intermediate results. Or would you recommend something else?

Community Manager

Re: Cluster Algorithms do not produce any output

Are you using pruning in the Process Documents operator? Try pruning to bring the attribute count down; hopefully the data will then make it through your clustering algorithms.
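
For reference, document-frequency pruning (roughly the idea behind percentage-based pruning) boils down to something like the sketch below. It is plain Python, not RapidMiner's implementation; the function name `prune_by_document_frequency`, the `below`/`above` thresholds and the toy documents are hypothetical. Terms that occur in almost every document (like `cluster` here) and terms that occur in very few documents are both dropped, which is what shrinks the attribute count.

```python
from collections import Counter


def prune_by_document_frequency(docs, below=0.05, above=0.95):
    """Return the vocabulary whose document frequency lies within [below, above].

    docs  -- list of token lists, one per document (hypothetical input format)
    below -- drop terms found in fewer than this fraction of documents
    above -- drop terms found in more than this fraction of documents
    """
    n_docs = len(docs)
    # set(doc) counts each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, count in df.items()
                  if below <= count / n_docs <= above)


docs = [
    ["cluster", "pca", "data"],
    ["cluster", "data", "memory"],
    ["cluster", "pca", "model"],
    ["cluster", "data", "typo"],
]
vocab = prune_by_document_frequency(docs, below=0.3, above=0.9)
print(vocab)  # ['data', 'pca']
```

Each surviving term becomes one attribute, so tightening `below` is usually the quickest way to cut a very wide word vector down.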

Regards,
T-Bone
Contributor II

Re: Cluster Algorithms do not produce any output

Yes, I am using pruning while processing the documents. Still, the number of attributes is enormous...

RMStaff

Re: Cluster Algorithms do not produce any output

Hi,

you could try a PCA and cluster in the PCA-space.
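
A minimal sketch of that idea: project the rows onto their first few principal components, then cluster the projected rows. This is plain Python, not how RapidMiner implements PCA or clustering; the helpers `pca_scores` and `kmeans`, the power-iteration approach, the farthest-first seeding and the toy data are all assumptions for illustration. Note that the sketch builds an m x m covariance matrix over the m attributes, so its cost grows quickly with the number of word attributes.

```python
import math
import random


def pca_scores(rows, n_components):
    """Project mean-centred rows onto their leading principal components.

    Builds an m x m covariance matrix and finds eigenvectors by power
    iteration with deflation, so cost grows quickly with the attribute count m.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(m)]
        for _ in range(200):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        if norm < 1e-12:
            break
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(m) for j in range(m))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)]
               for i in range(m)]
    # Score pc_k of a row = weighted sum of its centred attribute values.
    return [[sum(r[j] * c[j] for j in range(m)) for c in components]
            for r in centred]


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with farthest-first initialisation."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centroids.append(list(max(
            points, key=lambda p: min(dist2(p, c) for c in centroids))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rows = [
    [1.0, 1.0, 0.0], [1.2, 0.9, 0.1], [0.9, 1.1, 0.0],  # one group
    [5.0, 5.0, 1.0], [5.1, 4.8, 1.2], [4.9, 5.2, 0.9],  # another group
]
scores = pca_scores(rows, n_components=2)
labels = kmeans(scores, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The clustering step only ever sees the few pc_k columns, which is why clustering in PCA space can stay tractable even when the original word vector is very wide.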

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Contributor II

Re: Cluster Algorithms do not produce any output

Hi Martin,

I had never used PCA before, so I gave it a try. The process took hours for 50 documents... is that normal?

I also don't really understand the resulting example set (see the screenshot). I receive pc_1, pc_2, etc. as results instead of the attributes (words) I had before. How can I interpret this?

[Screenshot: PCA_example set.PNG]

Contributor II

Re: Cluster Algorithms do not produce any output

So, does anyone have any idea regarding my clustering process?

Community Manager

Re: Cluster Algorithms do not produce any output

Oh hey, so PCA is a transformation that replaces correlated variables with a new set of uncorrelated variables, the principal components. Simafore wrote up two great articles on how to do it in RapidMiner and how to interpret it: http://www.simafore.com/blog/bid/57651/When-Principal-Component-Analysis-makes-sense-in-business-ana...

and

http://www.simafore.com/blog/bid/62910/How-to-run-Principal-Component-Analysis-with-RapidMiner-Part-...
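
To make "how do I read pc_1, pc_2?" concrete: each principal component is a weighted sum of the centred original attributes, and the weights (often called loadings) tell you which words drive it. The sketch below is plain Python, not RapidMiner's implementation, and restricts itself to two hypothetical word-count attributes (`cluster` and `centroid`) so that the eigenvectors of the 2x2 covariance matrix have a closed form. It prints the pc_1 loadings and checks that pc_1 and pc_2 come out uncorrelated.

```python
import math


def pca_two_attributes(xs, ys):
    """PCA for exactly two attributes via the closed-form eigenvectors of a 2x2 covariance matrix.

    Returns (loadings_pc1, loadings_pc2, pc1_scores, pc2_scores).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    a = sum(u * u for u in dx) / (n - 1)               # var(x)
    c = sum(v * v for v in dy) / (n - 1)               # var(y)
    b = sum(u * v for u, v in zip(dx, dy)) / (n - 1)   # cov(x, y)
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)  # largest eigenvalue
    if b == 0:
        # Already uncorrelated: the components are just the original axes.
        w1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        norm = math.hypot(b, lam1 - a)
        w1 = (b / norm, (lam1 - a) / norm)             # eigenvector for lam1
    w2 = (-w1[1], w1[0])                               # orthogonal second component
    pc1 = [w1[0] * u + w1[1] * v for u, v in zip(dx, dy)]
    pc2 = [w2[0] * u + w2[1] * v for u, v in zip(dx, dy)]
    return w1, w2, pc1, pc2


# Hypothetical counts of two words that tend to co-occur, one value per document.
cluster_counts = [0, 1, 2, 3, 4]
centroid_counts = [1, 1, 2, 4, 4]
w1, w2, pc1, pc2 = pca_two_attributes(cluster_counts, centroid_counts)
print("pc_1 = %.2f*cluster + %.2f*centroid (centred)" % w1)  # both loadings close to 0.7
cov_pc = sum(p * q for p, q in zip(pc1, pc2)) / (len(pc1) - 1)
print("cov(pc_1, pc_2) = %.1e" % cov_pc)  # essentially zero
```

With real text data the same idea applies, just with thousands of loadings per component; sorting one component's loadings by absolute value shows which words it mostly represents.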

Regards,
T-Bone