Cluster Algorithms do not produce any output

brunabruna Member Posts: 7 Contributor I
edited November 2018 in Help

Hello everyone, 

 

I am trying to run the following clustering algorithms on an example set that I generated beforehand from several JSON files.

What I would like to do is build clusters from the example set and measure the quality of each algorithm.

 

I am facing two basic problems.

 

1. When I run the process with just the centroid algorithms, it finishes successfully but does not produce any clusters, or at least I cannot see them in the results.

 

2. When I run the process as in the attached .xml, it stops because the cluster algorithms do not produce any output.

 

Can anyone look at my process and give me any suggestions?

 

 

Thank you very much!!

 

 

  <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="example list 1-300"/>
</operator>
<operator activated="false" class="sample_kennard_stone" compatibility="7.4.000" expanded="true" height="82" name="Sample (Kennard-Stone)" width="90" x="246" y="340">
<parameter key="sample_size" value="600"/>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="7.4.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
<list key="columns"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (2)" width="90" x="380" y="30"/>
<operator activated="true" class="loop_parameters" compatibility="7.4.000" expanded="true" height="145" name="Loop Parameters" width="90" x="648" y="289">
<list key="parameters">
<parameter key="Select Subprocess (2).select_which" value="[1;3;3;linear]"/>
</list>
<process expanded="true">
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (3)" width="90" x="45" y="136"/>
<operator activated="true" class="select_subprocess" compatibility="7.4.000" expanded="true" height="103" name="Select Subprocess (2)" width="90" x="246" y="34">
<process expanded="true">
<operator activated="true" class="dbscan" compatibility="7.4.000" expanded="true" height="82" name="Clustering" width="90" x="112" y="34"/>
<operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="124" name="Subprocess (3)" width="90" x="112" y="289">
<process expanded="true">
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (8)" width="90" x="179" y="34"/>
<operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="447" y="34"/>
<operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (2)" width="90" x="447" y="136">
<parameter key="measure" value="GiniCoefficient"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (9)" width="90" x="179" y="289"/>
<operator activated="false" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Performance (3)" width="90" x="447" y="289"/>
<operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity" width="90" x="179" y="442"/>
<operator activated="true" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Performance (4)" width="90" x="447" y="442"/>
<operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log" width="90" x="782" y="34">
<list key="log">
<parameter key="Avg_within_distance" value="operator.Performance (3).value.avg_within_distance"/>
<parameter key="Item_Distribution" value="operator.Performance (4).value.clusterdensity"/>
<parameter key="Gini" value="operator.Performance (2).value.item_distribution"/>
<parameter key="Cluster_Density" value="operator.Performance.value.item_distribution"/>
<parameter key="K" value="operator.Loop Parameters.value.iteration"/>
<parameter key="Davies" value="operator.Performance (3).value.DaviesBouldin"/>
</list>
</operator>
<connect from_port="in 1" to_op="Multiply (8)" to_port="input"/>
<connect from_port="in 2" to_op="Multiply (9)" to_port="input"/>
<connect from_op="Multiply (8)" from_port="output 1" to_op="Performance" to_port="cluster model"/>
<connect from_op="Multiply (8)" from_port="output 2" to_op="Performance (4)" to_port="cluster model"/>
<connect from_op="Performance" from_port="cluster model" to_op="Performance (2)" to_port="cluster model"/>
<connect from_op="Multiply (9)" from_port="output 1" to_op="Performance (4)" to_port="example set"/>
<connect from_op="Multiply (9)" from_port="output 2" to_op="Data to Similarity" to_port="example set"/>
<connect from_op="Data to Similarity" from_port="similarity" to_op="Performance (4)" to_port="distance measure"/>
<connect from_op="Log" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="source_in 3" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_op="Subprocess (3)" to_port="in 1"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Subprocess (3)" to_port="in 2"/>
<connect from_op="Subprocess (3)" from_port="out 2" to_port="output 1"/>
<connect from_op="Subprocess (3)" from_port="out 3" to_port="output 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="agglomerative_clustering" compatibility="7.4.000" expanded="true" height="82" name="Clustering (2)" width="90" x="112" y="34"/>
<operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="124" name="Subprocess (4)" width="90" x="112" y="289">
<process expanded="true">
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="82" name="Multiply (10)" width="90" x="179" y="34"/>
<operator activated="false" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (5)" width="90" x="447" y="34"/>
<operator activated="false" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (6)" width="90" x="447" y="136">
<parameter key="measure" value="GiniCoefficient"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (11)" width="90" x="179" y="289"/>
<operator activated="false" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Performance (7)" width="90" x="447" y="289"/>
<operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity (3)" width="90" x="179" y="442"/>
<operator activated="false" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Performance (8)" width="90" x="447" y="442"/>
<operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log (2)" width="90" x="782" y="34">
<list key="log">
<parameter key="Avg_within_distance" value="operator.Performance (3).value.avg_within_distance"/>
<parameter key="Item_Distribution" value="operator.Performance (4).value.clusterdensity"/>
<parameter key="Gini" value="operator.Performance (2).value.item_distribution"/>
<parameter key="Cluster_Density" value="operator.Performance.value.item_distribution"/>
<parameter key="K" value="operator.Loop Parameters.value.iteration"/>
<parameter key="Davies" value="operator.Performance (3).value.DaviesBouldin"/>
</list>
</operator>
<connect from_port="in 1" to_op="Multiply (10)" to_port="input"/>
<connect from_port="in 2" to_op="Multiply (11)" to_port="input"/>
<connect from_op="Multiply (11)" from_port="output 2" to_op="Data to Similarity (3)" to_port="example set"/>
<connect from_op="Log (2)" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="source_in 3" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Clustering (2)" to_port="example set"/>
<connect from_op="Clustering (2)" from_port="cluster model" to_op="Subprocess (4)" to_port="in 1"/>
<connect from_op="Clustering (2)" from_port="example set" to_op="Subprocess (4)" to_port="in 2"/>
<connect from_op="Subprocess (4)" from_port="out 2" to_port="output 1"/>
<connect from_op="Subprocess (4)" from_port="out 3" to_port="output 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="support_vector_clustering" compatibility="7.4.000" expanded="true" height="82" name="Clustering (3)" width="90" x="112" y="34"/>
<operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="124" name="Subprocess (5)" width="90" x="112" y="289">
<process expanded="true">
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (12)" width="90" x="179" y="34"/>
<operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (9)" width="90" x="447" y="34"/>
<operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (10)" width="90" x="447" y="136">
<parameter key="measure" value="GiniCoefficient"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (13)" width="90" x="179" y="289"/>
<operator activated="false" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Performance (11)" width="90" x="447" y="289"/>
<operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity (4)" width="90" x="179" y="442"/>
<operator activated="true" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Performance (12)" width="90" x="447" y="442"/>
<operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log (3)" width="90" x="782" y="34">
<list key="log">
<parameter key="Avg_within_distance" value="operator.Performance (3).value.avg_within_distance"/>
<parameter key="Item_Distribution" value="operator.Performance (4).value.clusterdensity"/>
<parameter key="Gini" value="operator.Performance (2).value.item_distribution"/>
<parameter key="Cluster_Density" value="operator.Performance.value.item_distribution"/>
<parameter key="K" value="operator.Loop Parameters.value.iteration"/>
<parameter key="Davies" value="operator.Performance (3).value.DaviesBouldin"/>
</list>
</operator>
<connect from_port="in 1" to_op="Multiply (12)" to_port="input"/>
<connect from_port="in 2" to_op="Multiply (13)" to_port="input"/>
<connect from_op="Multiply (12)" from_port="output 1" to_op="Performance (9)" to_port="cluster model"/>
<connect from_op="Multiply (12)" from_port="output 3" to_op="Performance (12)" to_port="cluster model"/>
<connect from_op="Performance (9)" from_port="cluster model" to_op="Performance (10)" to_port="cluster model"/>
<connect from_op="Multiply (13)" from_port="output 2" to_op="Performance (12)" to_port="example set"/>
<connect from_op="Multiply (13)" from_port="output 3" to_op="Data to Similarity (4)" to_port="example set"/>
<connect from_op="Data to Similarity (4)" from_port="similarity" to_op="Performance (12)" to_port="distance measure"/>
<connect from_op="Log (3)" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="source_in 3" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Clustering (3)" to_port="example set"/>
<connect from_op="Clustering (3)" from_port="cluster model" to_op="Subprocess (5)" to_port="in 1"/>
<connect from_op="Clustering (3)" from_port="clustered set" to_op="Subprocess (5)" to_port="in 2"/>
<connect from_op="Subprocess (5)" from_port="out 2" to_port="output 1"/>
<connect from_op="Subprocess (5)" from_port="out 3" to_port="output 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (4)" width="90" x="313" y="289"/>
<operator activated="true" class="extract_prototypes" compatibility="7.4.000" expanded="true" height="82" name="Extract Cluster Prototypes" width="90" x="514" y="391"/>
<operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="103" name="Subprocess (6)" width="90" x="447" y="136">
<process expanded="true">
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (14)" width="90" x="179" y="34"/>
<operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (13)" width="90" x="447" y="34"/>
<operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance (14)" width="90" x="447" y="136">
<parameter key="measure" value="GiniCoefficient"/>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (15)" width="90" x="179" y="289"/>
<operator activated="true" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Performance (15)" width="90" x="447" y="289"/>
<operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity (5)" width="90" x="179" y="442"/>
<operator activated="true" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Performance (16)" width="90" x="447" y="442"/>
<operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log (4)" width="90" x="782" y="34">
<list key="log">
<parameter key="Avg_within_distance" value="operator.Performance (3).value.avg_within_distance"/>
<parameter key="Item_Distribution" value="operator.Performance (4).value.clusterdensity"/>
<parameter key="Gini" value="operator.Performance (2).value.item_distribution"/>
<parameter key="Cluster_Density" value="operator.Performance.value.item_distribution"/>
<parameter key="K" value="operator.Loop Parameters.value.iteration"/>
<parameter key="Davies" value="operator.Performance (3).value.DaviesBouldin"/>
</list>
</operator>
<connect from_port="in 1" to_op="Multiply (14)" to_port="input"/>
<connect from_port="in 2" to_op="Multiply (15)" to_port="input"/>
<connect from_op="Multiply (14)" from_port="output 1" to_op="Performance (13)" to_port="cluster model"/>
<connect from_op="Multiply (14)" from_port="output 2" to_op="Performance (15)" to_port="cluster model"/>
<connect from_op="Multiply (14)" from_port="output 3" to_op="Performance (16)" to_port="cluster model"/>
<connect from_op="Performance (13)" from_port="cluster model" to_op="Performance (14)" to_port="cluster model"/>
<connect from_op="Multiply (15)" from_port="output 1" to_op="Performance (15)" to_port="example set"/>
<connect from_op="Multiply (15)" from_port="output 2" to_op="Performance (16)" to_port="example set"/>
<connect from_op="Multiply (15)" from_port="output 3" to_op="Data to Similarity (5)" to_port="example set"/>
<connect from_op="Data to Similarity (5)" from_port="similarity" to_op="Performance (16)" to_port="distance measure"/>
<connect from_op="Log (4)" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="source_in 3" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log_to_data" compatibility="7.4.000" expanded="true" height="103" name="Log to Data" width="90" x="648" y="136"/>
<operator activated="true" class="guess_types" compatibility="7.4.000" expanded="true" height="82" name="Guess Types" width="90" x="849" y="136"/>
<connect from_port="input 1" to_op="Multiply (3)" to_port="input"/>
<connect from_op="Multiply (3)" from_port="output 1" to_op="Select Subprocess (2)" to_port="input 1"/>
<connect from_op="Multiply (3)" from_port="output 2" to_op="Subprocess (6)" to_port="in 2"/>
<connect from_op="Select Subprocess (2)" from_port="output 1" to_op="Multiply (4)" to_port="input"/>
<connect from_op="Select Subprocess (2)" from_port="output 2" to_port="result 2"/>
<connect from_op="Multiply (4)" from_port="output 1" to_op="Subprocess (6)" to_port="in 1"/>
<connect from_op="Multiply (4)" from_port="output 2" to_port="result 1"/>
<connect from_op="Multiply (4)" from_port="output 3" to_op="Extract Cluster Prototypes" to_port="model"/>
<connect from_op="Extract Cluster Prototypes" from_port="example set" to_port="result 4"/>
<connect from_op="Subprocess (6)" from_port="out 1" to_op="Log to Data" to_port="through 1"/>
<connect from_op="Log to Data" from_port="exampleSet" to_op="Guess Types" to_port="example set input"/>
<connect from_op="Guess Types" from_port="example set output" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
<operator activated="true" class="loop_parameters" compatibility="7.4.000" expanded="true" height="145" name="Centroid Algo" width="90" x="648" y="30">
<list key="parameters">
<parameter key="Select Subprocess.select_which" value="[1;3;3;linear]"/>
</list>
<process expanded="true">
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="94" name="Multiply" width="90" x="45" y="120"/>
<operator activated="true" class="select_subprocess" compatibility="7.4.000" expanded="true" height="103" name="Select Subprocess" width="90" x="246" y="30">
<parameter key="select_which" value="3"/>
<process expanded="true">
<operator activated="true" class="k_means" compatibility="7.4.000" expanded="true" height="82" name="k-Means" width="90" x="45" y="30">
<parameter key="k" value="9"/>
</operator>
<connect from_port="input 1" to_op="k-Means" to_port="example set"/>
<connect from_op="k-Means" from_port="cluster model" to_port="output 1"/>
<connect from_op="k-Means" from_port="clustered set" to_port="output 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="x_means" compatibility="7.4.000" expanded="true" height="82" name="X-Means" width="90" x="45" y="30"/>
<connect from_port="input 1" to_op="X-Means" to_port="example set"/>
<connect from_op="X-Means" from_port="cluster model" to_port="output 1"/>
<connect from_op="X-Means" from_port="clustered set" to_port="output 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="k_medoids" compatibility="7.4.000" expanded="true" height="82" name="K Medoid" width="90" x="45" y="30">
<parameter key="k" value="9"/>
</operator>
<connect from_port="input 1" to_op="K Medoid" to_port="example set"/>
<connect from_op="K Medoid" from_port="cluster model" to_port="output 1"/>
<connect from_op="K Medoid" from_port="clustered set" to_port="output 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="103" name="Multiply (7)" width="90" x="313" y="340"/>
<operator activated="true" class="extract_prototypes" compatibility="7.4.000" expanded="true" height="82" name="Extract Cluster Prototypes (2)" width="90" x="581" y="289"/>
<operator activated="true" class="subprocess" compatibility="7.4.000" expanded="true" height="103" name="Subprocess (2)" width="90" x="447" y="120">
<process expanded="true">
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (5)" width="90" x="246" y="30"/>
<operator activated="true" class="multiply" compatibility="7.4.000" expanded="true" height="124" name="Multiply (6)" width="90" x="246" y="210"/>
<operator activated="true" class="data_to_similarity" compatibility="7.4.000" expanded="true" height="82" name="Data to Similarity (2)" width="90" x="246" y="345"/>
<operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Distribution SoS (2)" width="90" x="447" y="30"/>
<operator activated="true" class="item_distribution_performance" compatibility="7.4.000" expanded="true" height="82" name="Distribution Gini (2)" width="90" x="447" y="120">
<parameter key="measure" value="GiniCoefficient"/>
</operator>
<operator activated="true" class="cluster_distance_performance" compatibility="7.4.000" expanded="true" height="103" name="Distance (2)" width="90" x="447" y="210">
<parameter key="normalize" value="true"/>
</operator>
<operator activated="false" class="cluster_density_performance" compatibility="7.4.000" expanded="true" height="124" name="Density (2)" width="90" x="447" y="345"/>
<operator activated="true" class="log" compatibility="7.4.000" expanded="true" height="82" name="Log: Internal" width="90" x="715" y="30">
<list key="log">
<parameter key="avgWithinDistance" value="operator.Distance (2).value.avg_within_distance"/>
<parameter key="itemDistribution" value="operator.Density (2).value.clusterdensity"/>
<parameter key="Gini" value="operator.Distribution Gini (2).value.item_distribution"/>
<parameter key="clusterDensity" value="operator.Distribution SoS (2).value.item_distribution"/>
<parameter key="K" value="operator.Centroid Algo.value.iteration"/>
<parameter key="Davis" value="operator.Distance (2).value.DaviesBouldin"/>
</list>
</operator>
<connect from_port="in 1" to_op="Multiply (5)" to_port="input"/>
<connect from_port="in 2" to_op="Multiply (6)" to_port="input"/>
<connect from_op="Multiply (5)" from_port="output 1" to_op="Distribution SoS (2)" to_port="cluster model"/>
<connect from_op="Multiply (5)" from_port="output 2" to_op="Distance (2)" to_port="cluster model"/>
<connect from_op="Multiply (6)" from_port="output 1" to_op="Distance (2)" to_port="example set"/>
<connect from_op="Multiply (6)" from_port="output 2" to_op="Data to Similarity (2)" to_port="example set"/>
<connect from_op="Distribution SoS (2)" from_port="cluster model" to_op="Distribution Gini (2)" to_port="cluster model"/>
<connect from_op="Log: Internal" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="source_in 3" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log_to_data" compatibility="7.4.000" expanded="true" height="94" name="Internal validity measures" width="90" x="648" y="120">
<parameter key="log_name" value="Log: Internal"/>
</operator>
<operator activated="true" class="guess_types" compatibility="7.1.001" expanded="true" height="76" name="Internal" width="90" x="782" y="120"/>
<connect from_port="input 1" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Select Subprocess" to_port="input 1"/>
<connect from_op="Multiply" from_port="output 2" to_op="Subprocess (2)" to_port="in 2"/>
<connect from_op="Select Subprocess" from_port="output 1" to_op="Multiply (7)" to_port="input"/>
<connect from_op="Select Subprocess" from_port="output 2" to_port="result 2"/>
<connect from_op="Multiply (7)" from_port="output 1" to_op="Subprocess (2)" to_port="in 1"/>
<connect from_op="Multiply (7)" from_port="output 2" to_op="Extract Cluster Prototypes (2)" to_port="model"/>
<connect from_op="Extract Cluster Prototypes (2)" from_port="example set" to_port="result 4"/>
<connect from_op="Subprocess (2)" from_port="out 1" to_op="Internal validity measures" to_port="through 1"/>
<connect from_op="Internal validity measures" from_port="exampleSet" to_op="Internal" to_port="example set input"/>
<connect from_op="Internal" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
<connect from_op="Multiply (2)" from_port="output 1" to_op="Centroid Algo" to_port="input 1"/>
<connect from_op="Multiply (2)" from_port="output 2" to_op="Loop Parameters" to_port="input 1"/>
<connect from_op="Loop Parameters" from_port="result 2" to_port="result 4"/>
<connect from_op="Loop Parameters" from_port="result 3" to_port="result 5"/>
<connect from_op="Loop Parameters" from_port="result 4" to_port="result 6"/>
<connect from_op="Centroid Algo" from_port="result 2" to_port="result 1"/>
<connect from_op="Centroid Algo" from_port="result 3" to_port="result 2"/>
<connect from_op="Centroid Algo" from_port="result 4" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<portSpacing port="sink_result 6" spacing="0"/>
<portSpacing port="sink_result 7" spacing="0"/>
</process>
</operator>
</process>

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    There's something wrong with the XML you posted. Can you export the process (File > Export Process) and attach it, along with a snapshot of the data?

  • brunabruna Member Posts: 7 Contributor I

     

    Sorry, I missed the first lines of the XML.

     

    So here is the exported process and the snapshot of the example set. The example set is the data after preprocessing.

    Since I apparently have too much data (approx. 7 GB), RapidMiner stops with insufficient memory when I try to run the preprocessing and the clustering in a single process, even though I have 16 GB of RAM. That's OK for me; I will try to work with subprocesses and save the intermediate results. Or would you recommend something else?

     

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Are you using pruning in the Process Documents operator? To get the attribute count down, and hopefully process the data through your clustering algorithms, try pruning. 
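
The idea behind pruning can be sketched outside RapidMiner. The following is a minimal Python illustration, not the Process Documents implementation: it keeps only terms whose document frequency falls between a lower and an upper bound. The function name `prune_vocabulary`, the parameters `min_fraction` and `max_fraction`, and the toy documents are all hypothetical and exist only for this example.

```python
from collections import Counter

def prune_vocabulary(docs, min_fraction=0.05, max_fraction=0.8):
    """Return the sorted terms whose document frequency lies within the bounds.

    docs is a list of token lists; both bounds are fractions of the
    number of documents (hypothetical parameters for this sketch).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return sorted(
        term for term, count in doc_freq.items()
        if min_fraction <= count / n_docs <= max_fraction
    )

# Toy, made-up documents: "the" is in every document, three terms are in only one.
docs = [
    ["cluster", "data", "the"],
    ["cluster", "model", "the"],
    ["pca", "data", "the"],
    ["kmeans", "cluster", "the"],
]
kept = prune_vocabulary(docs, min_fraction=0.5, max_fraction=0.9)
print(kept)
```

Terms that are too rare or too common carry little information for clustering, so tightening both bounds is usually the cheapest way to shrink the attribute count before any clustering operator runs.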

  • brunabruna Member Posts: 7 Contributor I

    Yes, I am using pruning while processing the documents. Still, the number of attributes is enormous...

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    you could try a PCA and cluster in the PCA-space.

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • brunabruna Member Posts: 7 Contributor I

    Hi Martin, 

     

    Actually, I have never used PCA before, so I gave it a try. The process took hours for 50 documents... is that normal?

     

    I also do not really understand the resulting example set (see the screenshot). I receive pc_1, pc_2, etc. as results instead of the attributes (words) I had before. How can I interpret this?

     

    PCA_example set.PNG

  • brunabruna Member Posts: 7 Contributor I

    So, does anyone have any idea regarding my clustering process?

     

     

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Oh hey, so PCA is a transformation that gets rid of correlated variables and creates a data set of uncorrelated values. Simafore wrote up a great article on how to do it in RapidMiner and how to interpret it here: http://www.simafore.com/blog/bid/57651/When-Principal-Component-Analysis-makes-sense-in-business-analytics

     

    and

     

    http://www.simafore.com/blog/bid/62910/How-to-run-Principal-Component-Analysis-with-RapidMiner-Part-1
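
To make the pc_1, pc_2 columns less mysterious, here is a small Python sketch of what a PCA does. It is only an illustration under stated assumptions, not RapidMiner's implementation: the eigenvectors are found with plain power iteration plus deflation (chosen to keep the example dependency-free), the word counts are invented toy data, and every function name is hypothetical.

```python
import math
import random

def center(rows):
    """Subtract each attribute's mean so every column averages to zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance_matrix(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def top_eigenpair(matrix, iterations=300, seed=0):
    """Power iteration: eigenvalue and unit eigenvector of the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca(rows, n_components):
    """Return (components, scores): the loadings and the projected data."""
    centered = center(rows)
    cov = covariance_matrix(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        eigenvalue, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the direction just found so the next run finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in centered]
    return components, scores

# Invented word counts: "cluster" and "kmeans" always occur together.
words = ["cluster", "kmeans", "pca"]
rows = [[1, 1, 0], [2, 2, 0], [3, 3, 1], [4, 4, 1], [5, 5, 0], [6, 6, 1]]

components, scores = pca(rows, n_components=2)
for k, comp in enumerate(components, start=1):
    weights = ", ".join(f"{w:+.3f}*{name}" for w, name in zip(comp, words))
    print(f"pc_{k} = {weights}")
```

Each pc_k is a weighted sum of the original (centered) word attributes, so a component can be read by looking at which words have the largest absolute weights. In this toy data the two perfectly correlated words receive identical weights and collapse into a single direction, which is why three attributes can be described by two uncorrelated columns.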

     

     

     
