Cannot change the k value in the sample k-means process with plot

kal12 Member Posts: 5 Newbie
edited August 2020 in Help
Hello community. I need to run k-means clustering in RapidMiner and produce an elbow-method graph. I've tested the k-means clustering with plot sample. However, when I change the k value of the k-means operator from 13 to 5 and run the process, the change does not stick: k reverts to 13 and the process produces 13 clusters. Can someone tell me what the problem is and how to solve it?

Thank you in advance.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,453 RM Data Scientist
    Hi @Kal12,
    are you sure there isn't an Optimize operator around your k-means, or that you aren't using X-Means?

    Please post your process here, then we can have a look.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • kal12 Member Posts: 5 Newbie
    Thank you for your response. This is the XML; I have also attached the .rmp file.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.7.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.7.002" expanded="true" name="Root" origin="GENERATED_SAMPLE">
        <parameter key="logverbosity" value="warning"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.7.002" expanded="true" height="68" name="Retrieve" origin="GENERATED_SAMPLE" width="90" x="45" y="34">
            <parameter key="repository_entry" value="../../data/Iris"/>
          </operator>
          <operator activated="true" class="loop_parameters" compatibility="9.7.002" expanded="true" height="82" name="ParameterIteration" origin="GENERATED_SAMPLE" width="90" x="179" y="34">
            <list key="parameters">
              <parameter key="KMeans.k" value="2,3,4,5,6,7,8,9,10,11,13"/>
            </list>
            <parameter key="error_handling" value="fail on error"/>
            <parameter key="synchronize" value="false"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:k_means" compatibility="9.7.002" expanded="true" height="82" name="KMeans" origin="GENERATED_SAMPLE" width="90" x="45" y="34">
                <parameter key="add_cluster_attribute" value="true"/>
                <parameter key="add_as_label" value="false"/>
                <parameter key="remove_unlabeled" value="false"/>
                <parameter key="k" value="13"/>
                <parameter key="max_runs" value="10"/>
                <parameter key="determine_good_start_values" value="false"/>
                <parameter key="measure_types" value="BregmanDivergences"/>
                <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
                <parameter key="nominal_measure" value="NominalDistance"/>
                <parameter key="numerical_measure" value="EuclideanDistance"/>
                <parameter key="divergence" value="SquaredEuclideanDistance"/>
                <parameter key="kernel_type" value="radial"/>
                <parameter key="kernel_gamma" value="1.0"/>
                <parameter key="kernel_sigma1" value="1.0"/>
                <parameter key="kernel_sigma2" value="0.0"/>
                <parameter key="kernel_sigma3" value="2.0"/>
                <parameter key="kernel_degree" value="3.0"/>
                <parameter key="kernel_shift" value="1.0"/>
                <parameter key="kernel_a" value="1.0"/>
                <parameter key="kernel_b" value="0.0"/>
                <parameter key="max_optimization_steps" value="100"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
              </operator>
              <operator activated="true" class="cluster_distance_performance" compatibility="9.7.002" expanded="true" height="103" name="Evaluation" origin="GENERATED_SAMPLE" width="90" x="179" y="34">
                <parameter key="main_criterion" value="Avg. within centroid distance"/>
                <parameter key="main_criterion_only" value="false"/>
                <parameter key="normalize" value="false"/>
                <parameter key="maximize" value="false"/>
              </operator>
              <operator activated="true" class="log" compatibility="9.7.002" expanded="true" height="103" name="ProcessLog" origin="GENERATED_SAMPLE" width="90" x="313" y="34">
                <list key="log">
                  <parameter key="k" value="operator.KMeans.parameter.k"/>
                  <parameter key="DB" value="operator.Evaluation.value.DaviesBouldin"/>
                  <parameter key="W" value="operator.Evaluation.value.avg_within_distance"/>
                </list>
                <parameter key="sorting_type" value="none"/>
                <parameter key="sorting_k" value="100"/>
                <parameter key="persistent" value="false"/>
              </operator>
              <connect from_port="input 1" to_op="KMeans" to_port="example set"/>
              <connect from_op="KMeans" from_port="cluster model" to_op="Evaluation" to_port="cluster model"/>
              <connect from_op="KMeans" from_port="clustered set" to_op="Evaluation" to_port="example set"/>
              <connect from_op="Evaluation" from_port="performance" to_op="ProcessLog" to_port="through 1"/>
              <connect from_op="Evaluation" from_port="example set" to_op="ProcessLog" to_port="through 2"/>
              <connect from_op="ProcessLog" from_port="through 1" to_port="performance"/>
              <connect from_op="ProcessLog" from_port="through 2" to_port="result 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="ParameterIteration" to_port="input 1"/>
          <connect from_op="ParameterIteration" from_port="result 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


  • kal12 Member Posts: 5 Newbie
    Ok, now I see. Thank you very much.
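
Judging from the XML posted above, the KMeans operator is nested inside a Loop Parameters operator (ParameterIteration) whose parameter list assigns KMeans.k each value from 2 to 13 in turn. That would explain the behavior described: whatever k is typed into the k-means operator is overwritten on every iteration, and the operator is left at the last value in the list, 13. A minimal sketch of the change, assuming the aim is to evaluate k only up to 5 (the shortened value list is an illustration, not something taken from the thread), is to edit the list on the loop rather than the k field on the operator:

```xml
<operator activated="true" class="loop_parameters" compatibility="9.7.002" expanded="true" height="82" name="ParameterIteration" origin="GENERATED_SAMPLE" width="90" x="179" y="34">
  <list key="parameters">
    <!-- Hypothetical value list: the loop sets KMeans.k to each of these in turn,
         so the last entry (5) is what the KMeans operator shows after the run. -->
    <parameter key="KMeans.k" value="2,3,4,5"/>
  </list>
  <parameter key="error_handling" value="fail on error"/>
  <parameter key="synchronize" value="false"/>
  <!-- inner <process> with KMeans, Evaluation and ProcessLog unchanged -->
</operator>
```

With a list like this, ProcessLog should record one row per k (2 through 5) for the elbow plot. To run a single k-means with k = 5 instead, the operator would presumably need to sit outside the loop so that nothing overrides its k parameter.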