Hello everyone, how can I restrict the cluster size in a clustering algorithm?

SelimSelim Member Posts: 32 Contributor I
edited June 2019 in Help
For example, I have 3 clusters and 20 items. When I apply k-means, it gives me clusters of 11, 2, and 7 items, but I want the clusters to be of similar size, for example 7, 7, and 6. How can I do that?
Kind regards,

Answers

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @Selim

    The result is not guaranteed, but you can try the DBSCAN operator (another clustering algorithm) and tune its two parameters, epsilon and min points.
    By tuning these parameters, I was able to partition the Iris dataset into 3 clusters of approximately the same size.
    Here is the process:
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="dbscan" compatibility="9.2.001" expanded="true" height="82" name="Clustering" width="90" x="246" y="85">
            <parameter key="epsilon" value="0.8"/>
            <parameter key="min_points" value="40"/>
            <parameter key="add_cluster_attribute" value="true"/>
            <parameter key="add_as_label" value="false"/>
            <parameter key="remove_unlabeled" value="false"/>
            <parameter key="measure_types" value="MixedMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="NominalDistance"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="GeneralizedIDivergence"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
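
To see why the result is not guaranteed, here is a minimal sketch of the DBSCAN idea in plain Python. It is not the RapidMiner implementation; the function name `dbscan` and the toy points are made up for illustration, and only the parameter names `epsilon` and `min_points` come from the process above. The cluster sizes fall out of the data and the two parameters, so they can only be tuned, not prescribed.

```python
from collections import Counter
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, epsilon, min_points):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within epsilon of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= epsilon]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE  # may still become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Toy data (invented): three tight groups of 4 points plus one outlier.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (5, 5), (5, 6), (6, 5), (6, 6),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (20, 20),
]

for eps in (0.5, 1.5, 8.0):
    sizes = Counter(dbscan(points, epsilon=eps, min_points=3))
    print(eps, sorted(sizes.items()))
```

On this toy data, `epsilon=1.5` gives three clusters of 4 points plus one noise point, `epsilon=0.5` marks everything as noise, and `epsilon=8.0` merges the three groups into a single cluster of 12. The same sensitivity applies to real data, which is why the parameters have to be tuned again whenever the data changes.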
    
    Otherwise, here is an interesting link:

    https://stackoverflow.com/questions/5452576/k-means-algorithm-variation-with-equal-cluster-size

    Hope this helps,

    Regards,

    Lionel


  • SelimSelim Member Posts: 32 Contributor I
    First of all, thanks for the answer; I will try it. Also, do you have an idea of how to do this with the Execute Python operator? What Python code would I need to write?
  • SelimSelim Member Posts: 32 Contributor I
    Many thanks again. If I send you my Excel file and my RapidMiner process, can you check them? My Python knowledge is not very good; I am just a beginner. I read your answer to a question from April 2018, so I tried to do it with a Python script in the Execute Python operator. Could you check my process? And if I copy and paste that code, do you think it will work?
  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Selim,

    No, the Python code provided in my previous post will not work if you just copy and paste it (it is only simplified pseudocode).
    But yes, if you provide your Excel file and your RapidMiner process, I will work on your project to provide a process
    that does what you want (produces clusters of the same size).

    Regards,

    Lionel
  • SelimSelim Member Posts: 32 Contributor I
    Thanks a lot again.
  • SelimSelim Member Posts: 32 Contributor I
    I am doing zoning in a warehouse. When I run this process, it gives 5 clusters of similar size, but that does not mean it will give equal-size clusters when I work with 10,000 items, so I want a permanent solution. I think I need to write code in Python. What do you think about this process, and how can we do this?
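
One way to make the sizes hold for any number of items is to give every cluster a fixed capacity and only assign an item to a cluster that still has room. The following is a minimal sketch of that idea in plain Python, not a RapidMiner operator and not the code from any earlier post. The names `balanced_sizes` and `equal_size_kmeans` are hypothetical, the assignment is a simple greedy heuristic (closest pairs first), and the random points merely stand in for real item data.

```python
import random
from collections import Counter
from math import dist


def balanced_sizes(n_items, k):
    """Split n_items into k sizes that differ by at most one, e.g. 20 -> [7, 7, 6]."""
    base, extra = divmod(n_items, k)
    return [base + 1 if c < extra else base for c in range(k)]


def equal_size_kmeans(points, k, max_iter=50, seed=0):
    """Greedy capacity-constrained k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct items
    capacity = balanced_sizes(len(points), k)
    labels = None
    for _ in range(max_iter):
        # Every (distance, point index, cluster index) pair, closest first.
        pairs = sorted(
            (dist(p, c), i, j)
            for i, p in enumerate(points)
            for j, c in enumerate(centroids)
        )
        new_labels = [None] * len(points)
        room = list(capacity)
        for _d, i, j in pairs:
            # Assign a point only if it is still free and the cluster has room.
            if new_labels[i] is None and room[j] > 0:
                new_labels[i] = j
                room[j] -= 1
        if new_labels == labels:
            break  # assignment is stable
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == j]
            centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Invented data: 20 random 2-D items, 3 clusters.
rng = random.Random(1)
items = [(rng.random(), rng.random()) for _ in range(20)]
labels = equal_size_kmeans(items, k=3)
print(sorted(Counter(labels).values(), reverse=True))  # [7, 7, 6]
```

Because the capacities add up to the number of items, every item is assigned and the sizes are always the ones returned by `balanced_sizes`, for 20 items as well as for 10,000. The price is that some items end up in a cluster that is not their nearest one, so the clusters are less compact than with plain k-means.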

  • SelimSelim Member Posts: 32 Contributor I
    When I try to upload a picture of the process, I get an error, so I will describe the process instead:
    Read Excel -> Nominal to Numerical -> Normalize -> Weight by User Specification -> Select by Weights -> Clustering (k-means) -> Performance (distance)
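
The Normalize step matters in a chain like this because k-means works on distances, so an attribute with large raw values would otherwise dominate the result. Here is a minimal sketch of a z-score normalization in plain Python, assuming that is the method the Normalize operator is set to. Whether RapidMiner divides by n or n-1 is not checked here, the helper name `z_normalize` is hypothetical, and the numbers are invented purely for illustration.

```python
from statistics import mean, pstdev


def z_normalize(column):
    """Rescale a numeric column to mean 0 and (population) standard deviation 1."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(x - mu) / sigma for x in column]


# Invented example values for three numeric attributes.
rows = {
    "hacim": [0.2, 0.5, 1.1, 0.9],
    "ağırlık": [12.0, 30.0, 8.0, 22.0],
    "satış miktar": [1500, 200, 900, 4000],
}

normalized = {name: z_normalize(values) for name, values in rows.items()}
for name, values in normalized.items():
    print(name, [round(v, 2) for v in values])
```

After this step all three attributes are on a comparable scale, so no single attribute drives the distance computation just because of its unit.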
  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Selim

    To share your RapidMiner process, follow these instructions:

    Note: This solution requires the "XML" panel which can be opened in the "View" menu and then "Show Panel".  Activate the XML panel if you did not do this before.

    Open your process in RapidMiner and open the XML panel. If you can't find it, make sure to follow the note above.

    Copy the XML code from there and paste it somewhere else, for example into a forum post here on the community portal.  By the way, if you post your XML here, please use the code environment which you get by clicking on the </> icon in the toolbar of the post.

    In order to import such an XML description of your process, e.g. to use a process someone else has posted here in the forum, please follow the following steps:

    1. Create a new process and go to the XML panel (see above).
    2. Clear the view and copy the XML code you got into that panel.
    3. Then press the green checkmark icon on top of the panel.
    4. Switch back to the Process panel.

    Don't forget step 3 above - you need to accept the changed XML code first before you will see any changes in the process!
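
If a pasted process does not show up, it can help to check that the copied XML is complete before importing it. The following is a small sketch using only Python's standard library. The helper name `list_operators` and the shortened sample string are made up for illustration and are not part of RapidMiner.

```python
import xml.etree.ElementTree as ET


def list_operators(process_xml):
    """Return (name, class) for every <operator>; raises ET.ParseError if the XML is broken."""
    root = ET.fromstring(process_xml.encode("utf-8"))
    return [(op.get("name"), op.get("class")) for op in root.iter("operator")]


# Shortened, invented sample in the style of a RapidMiner process.
sample = """<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
  <operator activated="true" class="process" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" name="Retrieve Iris"/>
      <operator activated="true" class="dbscan" name="Clustering"/>
    </process>
  </operator>
</process>"""

for name, op_class in list_operators(sample):
    print(f"{name}: {op_class}")

# A snippet that lost its opening tags while copying is rejected.
truncated = "<process><context/></process></operator></process>"
try:
    list_operators(truncated)
except ET.ParseError as err:
    print("XML is not well-formed:", err)
```

A parse error here usually means that part of the XML was lost while copying, so it is worth copying the full content of the XML panel again.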


    Regards,


    Lionel

  • SelimSelim Member Posts: 32 Contributor I
    Here are the steps of the clustering process in RapidMiner:
  • SelimSelim Member Posts: 32 Contributor I
    edited April 2019
    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>

          <operator activated="true" class="normalize" compatibility="8.1.003" expanded="true" height="103" name="Normalize" width="90" x="112" y="238">
            <parameter key="attribute" value="ağırlık"/>
            <parameter key="attributes" value="hacim|ağırlık|satış miktar"/>
          </operator>
          <operator activated="true" class="weight_by_user_specification" compatibility="8.1.003" expanded="true" height="82" name="Weight by User Specification" width="90" x="313" y="238">
            <list key="name_regex_to_weights">
              <parameter key="hacim" value="2.0"/>
              <parameter key="ağırlık" value="4.0"/>
              <parameter key="satış miktar" value="12.0"/>
            </list>
          </operator>
          <operator activated="true" class="select_by_weights" compatibility="8.1.003" expanded="true" height="103" name="Select by Weights" width="90" x="514" y="238">
            <parameter key="deselect_unknown" value="false"/>
          </operator>
          <operator activated="true" class="concurrency:k_means" compatibility="8.1.003" expanded="true" height="82" name="Clustering" width="90" x="514" y="34">
            <parameter key="k" value="5"/>
            <parameter key="determine_good_start_values" value="true"/>
            <parameter key="measure_types" value="MixedMeasures"/>
          </operator>
          <operator activated="true" class="cluster_distance_performance" compatibility="8.1.003" expanded="true" height="103" name="Performance" width="90" x="648" y="85">
            <parameter key="normalize" value="true"/>
            <parameter key="maximize" value="true"/>
          </operator>
          <connect from_port="input 1" to_op="Read Excel" to_port="file"/>
          <connect from_op="Read Excel" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Weight by User Specification" to_port="example set"/>
          <connect from_op="Weight by User Specification" from_port="weights" to_op="Select by Weights" to_port="weights"/>
          <connect from_op="Weight by User Specification" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
          <connect from_op="Select by Weights" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_op="Performance" to_port="cluster model"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Performance" to_port="example set"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <connect from_op="Performance" from_port="example set" to_port="result 2"/>
          <connect from_op="Performance" from_port="cluster model" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>

  • SelimSelim Member Posts: 32 Contributor I
    Is it okay now?
  • SelimSelim Member Posts: 32 Contributor I
    @lionelderkrikor Hello sir, I have been waiting for your answer. Have you had a chance to look at the process?
  • SelimSelim Member Posts: 32 Contributor I
    edited April 2019
    @lionelderkrikor
    Thank you so much, really. I have done it now. Thank you so much again.
  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Selim

    Yes, you have to copy the XML process I shared and then paste it into the XML panel of RapidMiner.
    Then click the green check mark -> the process will appear in the main window.

    Tell me if you have a problem...

    Regards,

    Lionel
  • SelimSelim Member Posts: 32 Contributor I
    @lionelderkrikor Sir, I have a problem with the Execute Python operator: it gives an error. I set the Python path in the Settings --> Preferences --> Python Scripting window and tested it, but it gave me an error.
    I added a screenshot of the error to a Word file; could you please check it?
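
For errors like this, a quick check that can be run with the same interpreter the path points to is sketched below. It assumes the usual cause is a wrong interpreter or a missing `pandas` package, which the Python Scripting extension relies on as far as I know; the actual cause in this case is not visible from the thread. The helper name `check_python_environment` is hypothetical.

```python
import importlib.util
import sys


def check_python_environment(required=("pandas",)):
    """Report the interpreter in use and which required modules cannot be found."""
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    return {
        "executable": sys.executable,        # compare with the path set in Preferences
        "version": sys.version.split()[0],
        "missing": missing,
    }


report = check_python_environment()
print("Interpreter:", report["executable"])
print("Version:", report["version"])
print("Missing modules:", report["missing"] or "none")
```

If the printed interpreter path differs from the one entered in the Python Scripting preferences, or `pandas` is listed as missing, that is the first thing to fix before testing the operator again.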
