"(Normal ?) bug : log all criteria / Optimization of cluster model"

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

Hi,

I'd like to report a bug that occurs when the log all criteria parameter is checked during the optimization of a cluster model (k-Means).

When the process is executed, RapidMiner raises the following error:

java.lang.ArrayIndexOutOfBoundsException

When RM creates the Optimize Parameters results, each row in theory has a different length, length(row(i+1)) = length(row(i)) + 1, because for each row RM adds one more Avg. within centroid distance_cluster_i value. So when RM tries to create the second row, it raises an error because the dimensions of the table have changed.
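
The failure mode described above can be illustrated with a small sketch. This is not RapidMiner's code: it is a Python analogue written under the assumption that the log table fixes its column count from the first row. The function names and criterion names are hypothetical. In Python the out-of-range write surfaces as an IndexError, where Java would throw an ArrayIndexOutOfBoundsException.

```python
def criteria_for(k):
    """Hypothetical list of criteria logged for one clustering run with k clusters."""
    names = ["avg_within_centroid_distance"]
    # One extra per-cluster criterion for every cluster, so the list grows with k.
    names += [f"avg_within_centroid_distance_cluster_{i}" for i in range(k)]
    return names


def build_log_table(k_values):
    """Assumed logging scheme: the column count is fixed by the first row."""
    width = len(criteria_for(k_values[0]))
    table = []
    for k in k_values:
        row = [None] * width  # fixed-size row, comparable to a Java array
        for col, name in enumerate(criteria_for(k)):
            row[col] = name  # fails as soon as col >= width
        table.append(row)
    return table


if __name__ == "__main__":
    try:
        build_log_table([2, 3])
    except IndexError as err:
        print("second row does not fit the table:", err)
```

With a constant k every row has the same width and the table builds without error; the problem only appears once k varies between iterations, which is exactly what the grid optimization does.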

I hope that is understandable. Here is the process:

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="open_file" compatibility="8.0.001" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
<parameter key="resource_type" value="URL"/>
<parameter key="url" value="https://archive.ics.uci.edu/ml/machine-learning-databases/00292/Wholesale customers data.csv"/>
</operator>
<operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
<parameter key="csv_file" value="C:\Users\lueth\Desktop\Wholesale customers data.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Channel.true.binominal.attribute"/>
<parameter key="1" value="Region.true.polynominal.attribute"/>
<parameter key="2" value="Fresh.true.integer.attribute"/>
<parameter key="3" value="Milk.true.integer.attribute"/>
<parameter key="4" value="Grocery.true.integer.attribute"/>
<parameter key="5" value="Frozen.true.integer.attribute"/>
<parameter key="6" value="Detergents_Paper.true.integer.attribute"/>
<parameter key="7" value="Delicassen.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Channel|Region"/>
<parameter key="invert_selection" value="true"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="145" name="Multiply" width="90" x="447" y="34"/>
<operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="715" y="391">
<list key="parameters">
<parameter key="Clustering.k" value="[2.0;10;10;linear]"/>
</list>
<parameter key="log_all_criteria" value="true"/>
<process expanded="true">
<operator activated="true" class="k_means" compatibility="8.0.001" expanded="true" height="82" name="Clustering" width="90" x="112" y="34">
<parameter key="k" value="3"/>
</operator>
<operator activated="true" class="cluster_distance_performance" compatibility="8.0.001" expanded="true" height="103" name="Performance" width="90" x="313" y="34"/>
<connect from_port="input 1" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_op="Performance" to_port="cluster model"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Performance" to_port="example set"/>
<connect from_op="Performance" from_port="performance" to_port="performance"/>
<connect from_op="Performance" from_port="cluster model" to_port="model"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="x_means" compatibility="8.0.001" expanded="true" height="82" name="X-Means" width="90" x="715" y="136"/>
<operator activated="true" class="k_means" compatibility="8.0.001" expanded="true" height="82" name="k-Means" width="90" x="715" y="34">
<parameter key="measure_types" value="NumericalMeasures"/>
</operator>
<operator activated="true" class="agglomerative_clustering" compatibility="8.0.001" expanded="true" height="82" name="Agglomerative Clustering" width="90" x="715" y="238">
<parameter key="mode" value="AverageLink"/>
<parameter key="measure_types" value="NumericalMeasures"/>
</operator>
<connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
<connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="k-Means" to_port="example set"/>
<connect from_op="Multiply" from_port="output 2" to_op="X-Means" to_port="example set"/>
<connect from_op="Multiply" from_port="output 3" to_op="Agglomerative Clustering" to_port="example set"/>
<connect from_op="Multiply" from_port="output 4" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 6"/>
<connect from_op="Optimize Parameters (Grid)" from_port="model" to_port="result 7"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter set" to_port="result 8"/>
<connect from_op="X-Means" from_port="cluster model" to_port="result 4"/>
<connect from_op="X-Means" from_port="clustered set" to_port="result 5"/>
<connect from_op="k-Means" from_port="cluster model" to_port="result 1"/>
<connect from_op="k-Means" from_port="clustered set" to_port="result 2"/>
<connect from_op="Agglomerative Clustering" from_port="cluster model" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<portSpacing port="sink_result 6" spacing="0"/>
<portSpacing port="sink_result 7" spacing="0"/>
<portSpacing port="sink_result 8" spacing="0"/>
<portSpacing port="sink_result 9" spacing="0"/>
</process>
</operator>
</process>

 

What do you think: does this deserve a post in "Product Feedback"?

 

Regards, 

 

Lionel

Fixed and Released · Last Updated

Comments

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again,

     

    A small update: the problem lies in log all criteria itself, so the Loop Parameters operator is affected too.

     

    Regards,

     

    Lionel

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I would report this as a bug in the Product Feedback board.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Moving this thread to Product Feedback.


    Scott

     

     

  • jczogalla Employee, Member Posts: 144 RM Engineering

    Hi Lionel,

    We are looking into this. We will keep you updated here!

    Regards

    Jan

  • jczogalla Employee, Member Posts: 144 RM Engineering

    Hi Lionel,

    We have temporarily fixed this for the Beta by throwing an appropriate user error, and we are working on a permanent solution.

    Regards

    Jan

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi Jan,

     

    Thank you for your feedback.

     

    Regards, 

     

    Lionel

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    @jczogalla - should I mark this as resolved, or is it still under investigation?

  • jczogalla Employee, Member Posts: 144 RM Engineering

    @sgenzer - Leave it as investigating please.
