Cluster algorithms

Selim (Member, Posts: 32, Contributor I)
What are X-Means and k-Medoids? And what is the difference between k-Means and these two algorithms?
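For reference, here is a minimal sketch of how k-Means and k-Medoids differ in the way they represent a cluster. It assumes NumPy and Euclidean distance, and the helper names `centroid` and `medoid` are hypothetical. k-Means uses the mean of the members, which need not be an actual example. k-Medoids uses the member (the medoid) with the smallest total distance to the others. Only this representative step is shown, not the full iterative algorithms. X-Means, which additionally searches for the number of clusters k, is not shown.

```python
import numpy as np

def centroid(points: np.ndarray) -> np.ndarray:
    """k-Means representative: the coordinate-wise mean (may not be a real example)."""
    return points.mean(axis=0)

def medoid(points: np.ndarray) -> np.ndarray:
    """k-Medoids representative: the member with the smallest total distance to all others."""
    # Pairwise Euclidean distances, shape (n, n)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return points[dists.sum(axis=1).argmin()]

# One cluster of five examples, one of them an outlier at x = 100
cluster = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [100.0, 0.0]])
mean_rep = centroid(cluster)    # pulled towards the outlier: x = 22
medoid_rep = medoid(cluster)    # an actual example: x = 3
```

The outlier drags the mean to x = 22, far from most members, while the medoid stays at the real example x = 3. This is why k-Medoids is usually described as less sensitive to outliers.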


Answers

  • Selim (Member, Posts: 32, Contributor I)
    Thanks @rfuentealba. I have one more question: are fuzzy c-means and X-Means the same thing?
  • rfuentealba (Moderator, RapidMiner Certified Analyst, Member, University Professor, Posts: 568, Unicorn)
    Hello @Selim

    No. Fuzzy clustering algorithms use a different type of function, a membership function controlled by a parameter called the "fuzzifier", to determine to what degree an example belongs to each cluster. The idea of clustering stays the same, but fuzzy clustering uses similarity, intensity and distance as its three main points of analysis. One example can belong to more than one cluster at the same time, with a different degree of membership in each. That isn't possible with k-Means, k-Medoids and X-Means, because these are "hard labeled": each example is assigned to exactly one cluster.
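    A minimal sketch contrasting the two ideas for a fixed set of cluster centers. It assumes NumPy, and the function names are hypothetical. It applies the standard fuzzy c-means membership formula with fuzzifier m > 1 next to a k-Means-style hard assignment. This covers only the membership step, not the full fuzzy c-means loop, and it is not the Information Selection plugin's implementation.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Fuzzy c-means membership u[i, k] of example i in cluster k.

    u[i, k] = 1 / sum_j (d[i, k] / d[i, j]) ** (2 / (m - 1)),
    where m > 1 is the fuzzifier. Each row sums to 1.
    """
    # Distances from every example to every center, shape (n, c)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    d = np.fmax(d, 1e-12)  # avoid division by zero when an example sits on a center
    # ratio[i, k, j] = (d[i, k] / d[i, j]) ** (2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def hard_labels(X, centers):
    """k-Means-style hard assignment: each example gets exactly one cluster."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return d.argmin(axis=1)  # ties go to the lower cluster index

X = np.array([[0.2, 0.0], [1.0, 0.0], [1.8, 0.0]])
centers = np.array([[0.0, 0.0], [2.0, 0.0]])
U = fuzzy_memberships(X, centers, m=2.0)  # U[1] is [0.5, 0.5]: equally in both clusters
labels = hard_labels(X, centers)          # one label per example, no partial membership
```

    The middle example sits exactly between the two centers. Fuzzy membership expresses that as 0.5 in each cluster, while the hard assignment has to pick one of them.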

    Fuzzy C-Means is available in the "Information Selection" plugin for RapidMiner. It's not part of standard RapidMiner, BTW.

    All the best,

    Rodrigo.
  • Selim (Member, Posts: 32, Contributor I)
    edited May 2019
    @rfuentealba First, thanks for your answers. I have one more question. I am working on a k-means clustering algorithm for zoning a warehouse. It runs in the Execute Python operator and divides the items into clusters of the same size. I also have an attribute called "volume". As you know, volume is very important in a warehouse, so I want the total volume of every cluster to be equal. How can I do that? I have shared my XML below. I am waiting for your answer.
    Kind regards
    ---------------------
    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="9.2.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85">
            <parameter key="excel_file" value="C:\Users\selimcelebi\Desktop\Yeni Microsoft Excel Çalışma Sayfası.xlsx"/>
            <parameter key="sheet_selection" value="sheet number"/>
            <parameter key="sheet_number" value="1"/>
            <parameter key="imported_cell_range" value="A1"/>
            <parameter key="encoding" value="SYSTEM"/>
            <parameter key="first_row_as_names" value="true"/>
            <list key="annotations"/>
            <parameter key="date_format" value=""/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="read_all_values_as_polynominal" value="false"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="StockCode.true.integer.attribute"/>
              <parameter key="1" value="Description.true.polynominal.attribute"/>
              <parameter key="2" value="weight(gram).true.integer.attribute"/>
              <parameter key="3" value="volume(cm3).true.integer.attribute"/>
              <parameter key="4" value="quantity.true.integer.attribute"/>
              <parameter key="5" value="UnitPrice.true.real.attribute"/>
              <parameter key="6" value="fragility.true.integer.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="9.2.001" expanded="true" height="103" name="Normalize" width="90" x="246" y="85">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="|fragility|StockCode|volume(cm3)|weight(gram)|UnitPrice|quantity"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="method" value="Z-transformation"/>
            <parameter key="min" value="0.0"/>
            <parameter key="max" value="1.0"/>
            <parameter key="allow_negative_values" value="false"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="85">
            <list key="function_descriptions"/>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="85">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value="F"/>
            <parameter key="attributes" value="|Description|StockCode"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="715" y="238">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value="F"/>
            <parameter key="attributes" value="|Description"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.001" expanded="true" height="82" name="Generate ID (2)" width="90" x="983" y="187">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="set_macros" compatibility="9.2.001" expanded="true" height="82" name="Set Macros" width="90" x="715" y="85">
            <list key="macros">
              <parameter key="cluster_number" value="10"/>
            </list>
          </operator>
          <operator activated="true" class="python_scripting:execute_python" compatibility="9.2.000" expanded="true" height="103" name="Execute Python" width="90" x="849" y="85">
            <parameter key="script" value="import pandas as pd&#10;from operator import itemgetter&#10;import numpy as np&#10;import random&#10;import sys&#10;from scipy.spatial import distance&#10;from sklearn.cluster import KMeans&#10;&#10;&#10;C = %{cluster_number}&#10;&#10;def k_means(X) : &#10;&#10;  kmeans = KMeans(n_clusters=C, random_state=0).fit(X)&#10;  return kmeans.cluster_centers_&#10;&#10;&#10;&#10;&#10;def samesizecluster( D ):&#10;    &quot;&quot;&quot; in: point-to-cluster-centre distances D, Npt x C&#10;            &#10;        out: xtoc, X -&gt; C, equal-size clusters&#10;       &#10;    &quot;&quot;&quot;&#10;       &#10;    Npt, C = D.shape&#10;    clustersize = (Npt + C - 1) // C&#10;    xcd = list( np.ndenumerate(D) )  # ((0,0), d00), ((0,1), d01) ...&#10;    xcd.sort( key=itemgetter(1) )&#10;    xtoc = np.ones( Npt, int ) * -1&#10;    nincluster = np.zeros( C, int )&#10;    nall = 0&#10;    for (x,c), d in xcd:&#10;        if xtoc[x] &lt; 0  and  nincluster[c] &lt; clustersize:&#10;            xtoc[x] = c&#10;            nincluster[c] += 1&#10;            nall += 1&#10;            if nall &gt;= Npt:  break&#10;    return xtoc&#10;&#10;def rm_main(data):&#10; &#10;  data_2 = data.values&#10;  &#10;  centres = k_means(data_2)&#10;  D = distance.cdist( data_2, centres )&#10;  xtoc = samesizecluster( D )&#10;  data['cluster'] = xtoc&#10;&#10;    &#10;  return data"/>
            <parameter key="use_default_python" value="true"/>
            <parameter key="package_manager" value="conda (anaconda)"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role (2)" width="90" x="983" y="85">
            <parameter key="attribute_name" value="cluster"/>
            <parameter key="target_role" value="cluster"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.001" expanded="true" height="82" name="Generate ID" width="90" x="1117" y="85">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.001" expanded="true" height="82" name="Join" width="90" x="1251" y="187">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <connect from_port="input 1" to_op="Read Excel" to_port="file"/>
          <connect from_op="Read Excel" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Set Macros" to_port="through 1"/>
          <connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
          <connect from_op="Execute Python" from_port="output 1" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
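    One possible direction, sketched under assumptions. Keep the k-means centres and the greedy nearest-first loop of `samesizecluster` from the script above, but cap each cluster's total volume instead of its item count. `volume_capped_cluster` and its `slack` parameter are hypothetical and not part of any library. Because it is greedy, it does not guarantee equal sums.

```python
import numpy as np

def volume_capped_cluster(D, volumes, slack=0.05):
    """Greedy, volume-capped variant of samesizecluster (hypothetical helper).

    in:  D       point-to-cluster-centre distances, shape (Npt, C)
         volumes raw, non-negative item volumes, shape (Npt,), NOT z-normalized
         slack   allowed overshoot; each cluster is capped at (1 + slack) * total / C
    out: xtoc    cluster index per item
    """
    D = np.asarray(D, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    Npt, C = D.shape
    cap = (1.0 + slack) * volumes.sum() / C

    xtoc = np.full(Npt, -1, dtype=int)
    load = np.zeros(C)
    # Same idea as samesizecluster: visit (item, cluster) pairs from nearest to
    # farthest, but check remaining volume capacity instead of item count.
    for flat in np.argsort(D, axis=None, kind="stable"):
        x, c = np.unravel_index(flat, D.shape)
        if xtoc[x] < 0 and load[c] + volumes[x] <= cap:
            xtoc[x] = c
            load[c] += volumes[x]

    # Items that did not fit anywhere: largest first, into the currently
    # lightest cluster. This can exceed the cap, so check the resulting loads.
    leftovers = np.where(xtoc < 0)[0]
    for x in leftovers[np.argsort(-volumes[leftovers], kind="stable")]:
        c = int(load.argmin())
        xtoc[x] = c
        load[c] += volumes[x]
    return xtoc
```

    In the posted script this would take the place of `xtoc = samesizecluster( D )` in `rm_main`. The volumes must come from a raw, un-normalized column, because the Normalize operator in this process z-transforms every numeric attribute, including `volume(cm3)`, and z-scores can be negative. The greedy pass can still leave clusters unbalanced, so a larger `slack` or a post-processing swap step may be needed.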

  • varunm1 (Moderator, Member, Posts: 1,207, Unicorn)
    Hello @Selim

    Are you asking how to sum the volume column per cluster? If so, you can use the Aggregate operator and group by the cluster column.
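    If the clustering already happens inside Execute Python, a pandas `groupby` with a sum gives the same result as that Aggregate setup. This is a minimal sketch. The column names `cluster` and `volume(cm3)` are assumed from the posted process. The rows reuse the volumes and desired split from Selim's example later in this thread, with items 1, 3, 5 in cluster 0 and items 2, 4, 6 in cluster 1. Use the un-normalized volumes, since summing z-scores is not meaningful.

```python
import pandas as pd

def volume_per_cluster(df: pd.DataFrame, volume_col: str = "volume(cm3)") -> pd.Series:
    """Total volume per cluster, like Aggregate (sum of volume, grouped by cluster)."""
    return df.groupby("cluster")[volume_col].sum()

items = pd.DataFrame({
    "cluster": [0, 1, 0, 1, 0, 1],
    "volume(cm3)": [10, 15, 20, 25, 30, 20],
})
totals = volume_per_cluster(items)  # cluster 0 -> 60, cluster 1 -> 60
```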

    If this is not what you are looking for, please explain your requirement in a bit more detail.

    Thanks
    Regards,
    Varun
    https://www.varunmandalapu.com/


  • rfuentealba (Moderator, RapidMiner Certified Analyst, Member, University Professor, Posts: 568, Unicorn)
    Hello @Selim,

    I have the same question as Varun. I will work on your problem tomorrow, I promise.
    BTW, a friendly moderator's advice: you are asking too many questions in the same thread, which makes it hard for others to find the right answers in the future. It's not like we charge you for writing a new post each time you have a question for the community. :wink:
    All the best,

    Rodrigo.
  • Selim (Member, Posts: 32, Contributor I)
    edited May 2019
    @varunm1 @rfuentealba First, thanks for your answers. I will explain it again.
    First, I should say that I am clustering with 4 attributes, one of which is "volume".
    I am doing this clustering for a warehouse (storage), so volume is very important to me: when I cluster the items, the total volume of each cluster has to be equal. Here is an example.
    In the data below, I want cluster 1 to be items 1, 3, 5 and cluster 2 to be items 2, 4, 6, because then the total volume of every cluster is the same (60). I hope you understand what I mean; if not, please tell me. I am waiting for your answer.
      item no   volume
      1         10
      2         15
      3         20
      4         25
      5         30
      6         20
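    For the requirement as stated here (equal total volume, ignoring the other attributes), this is a multiway number-partitioning problem. Below is a minimal sketch of one common heuristic, greedy "largest first" assignment: each item, from largest to smallest volume, goes to the cluster with the smallest total so far. It uses the six items above. `balance_by_volume` is a hypothetical helper, and in general the heuristic does not guarantee perfectly equal sums.

```python
import heapq

def balance_by_volume(volumes, n_clusters):
    """Greedy 'largest first' partition: each item goes to the cluster with the
    smallest total volume so far. A heuristic; it ignores all other attributes
    and does not guarantee perfectly equal sums."""
    # Min-heap of (current total volume, cluster index)
    heap = [(0, c) for c in range(n_clusters)]
    heapq.heapify(heap)
    groups = {c: [] for c in range(n_clusters)}
    # Visit items from largest to smallest volume (stable for equal volumes)
    for item, vol in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        total, c = heapq.heappop(heap)
        groups[c].append(item)
        heapq.heappush(heap, (total + vol, c))
    return groups

volumes = {1: 10, 2: 15, 3: 20, 4: 25, 5: 30, 6: 20}
groups = balance_by_volume(volumes, 2)
```

    On this data it returns the groups {5, 6, 1} and {4, 3, 2}, both with a total volume of 60. That is a different split from the one in the post, but equally balanced. Balancing volume and keeping items that are similar in the other attributes together pull in different directions, so a real solution has to trade one off against the other.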
  • Selim (Member, Posts: 32, Contributor I)
    @varunm1 @rfuentealba Do you have any ideas? I really need to solve this problem. If you want, I can share my process as XML so you can see exactly what I am doing.
    Kind Regards,