Cluster algorithms

Selim · May 2019

What are x-means and k-medoids? And what is the difference between k-means and these two algorithms?

rfuentealba · May 2019

Hello,

Let's start with understanding k-Means. You set the number of clusters (k), and the algorithm assigns each example to the cluster whose center (centroid) is closest to it. Then the centroid of each cluster is recalculated as the average (mean) of the coordinates of all the examples that belong to that cluster. These two steps repeat until the assignments stop changing.
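A minimal NumPy sketch of the two alternating steps described above (the classic Lloyd iteration). The function names and the "pick k random examples" initialisation are illustrative choices, not RapidMiner's actual implementation:

```python
import numpy as np

def _nearest(X, centroids):
    # index of the closest centroid for every example (Euclidean distance)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # start from k distinct examples picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(X, centroids)  # assignment step
        # update step: each centroid moves to the mean of its members' coordinates
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break  # centroids stopped moving, so assignments are stable
        centroids = new
    return _nearest(X, centroids), centroids

labels, centres = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
```

With this toy data the first two and the last two points end up in the same cluster, with centres at (0, 0.5) and (10, 10.5).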

The k-Medoids algorithm is almost the same as the k-Means algorithm, with one difference: the center of a cluster is always one of the actual examples (the medoid, i.e. the member with the smallest total distance to the other members), rather than an imaginary point computed by averaging as described above.
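To make the difference concrete, a small NumPy sketch that compares the mean of one cluster with its medoid. The helper `medoid_index` is a hypothetical name for illustration. An outlier drags the mean away from the data, while the medoid stays on a real example:

```python
import numpy as np

def medoid_index(points):
    # pairwise Euclidean distances between all members of one cluster
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    # the medoid is the member with the smallest total distance to the others
    return int(dists.sum(axis=1).argmin())

cluster = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [20, 0]], dtype=float)
mean_centre = cluster.mean(axis=0)                # (5.2, 0): not one of the examples
medoid_centre = cluster[medoid_index(cluster)]    # (2, 0): a real example
```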

The x-Means algorithm is an improvement: you don't have to set the number of clusters yourself. Instead, it estimates a suitable number of clusters with a heuristic. It starts from a small k, runs k-Means, and then tries splitting each cluster in two, keeping a split only when a model-selection score (the Bayesian Information Criterion, BIC, in the original formulation) improves. Apart from choosing k this way, the algorithm is more or less the same as k-Means.
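The split decision can be sketched as follows: model the points once as one cluster and once as two, score both with BIC, and keep the split only if its score is higher. This is a simplified illustration, assuming the spherical-Gaussian BIC formula as it is usually quoted from the X-Means paper by Pelleg and Moore; the helper names are hypothetical, and the real algorithm applies this test to each cluster inside a range of allowed k values:

```python
import numpy as np

def _lloyd(X, k, n_iter=100):
    # deterministic start: X[0], then repeatedly the point farthest from the chosen centres
    centers = [X[0]]
    while len(centers) < k:
        gap = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[gap.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return labels, centers

def kmeans_bic(X, labels, centers):
    # BIC of a k-means model under identical spherical Gaussians (assumed formula)
    R, M = X.shape
    K = len(centers)
    sse = float(((X - centers[labels]) ** 2).sum())
    var = max(sse / (R - K), 1e-12) if R > K else 1e-12  # pooled variance estimate
    loglik = 0.0
    for j in range(K):
        Rn = int(np.sum(labels == j))
        if Rn == 0:
            continue
        loglik += (Rn * np.log(Rn) - Rn * np.log(R) - Rn / 2 * np.log(2 * np.pi)
                   - Rn * M / 2 * np.log(var) - (Rn - K) / 2)
    n_params = (K - 1) + M * K + 1  # cluster weights, centre coordinates, one variance
    return loglik - n_params / 2 * np.log(R)

def prefers_split(X):
    # X-Means-style test: keep two clusters only if that raises the BIC
    X = np.asarray(X, dtype=float)
    return bool(kmeans_bic(X, *_lloyd(X, 2)) > kmeans_bic(X, *_lloyd(X, 1)))
```

For two well-separated groups of points `prefers_split` returns True; for a single compact group of four points it returns False, because the BIC penalty for the extra parameters outweighs the small gain in fit.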

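There is a lot of n-dimensional geometry behind these algorithms: they treat examples as points in space and compute distances between them. That is why you can use them only with numerical attributes; nominal attributes have to be converted to numbers first.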
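One common way to turn a nominal attribute into numbers that a distance formula can use is one-hot encoding: one 0/1 column per distinct value. A minimal pure-Python sketch (the function name `one_hot` is illustrative):

```python
import math

def one_hot(values):
    # one 0/1 indicator column per distinct nominal value, in sorted order
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

categories, rows = one_hot(["red", "blue", "red", "green"])
# categories == ['blue', 'green', 'red']
# examples with different colours end up sqrt(2) apart, equal colours 0 apart
different = math.dist(rows[0], rows[1])
same = math.dist(rows[0], rows[2])
```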

Hope this helps.

All the best,

Rodrigo.

Selim · May 2019

Thanks @rfuentealba. I have one more question: are fuzzy c-means and x-means the same thing?

rfuentealba · May 2019

Hello @Selim

No. Fuzzy clustering algorithms use a different type of membership function, controlled by a parameter called the "fuzzifier", to decide how strongly an example belongs to each cluster. While the idea of clustering remains the same, fuzzy clustering uses similarity, intensity and distance as its main points of analysis, and every example gets a degree of membership in every cluster (in fuzzy c-means these degrees add up to 1), so one example can partly belong to more than one cluster. That isn't possible with k-Means, k-Medoids and X-Means, because these are "hard labeled": each example gets exactly one cluster.
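For fuzzy c-means specifically, the membership of example i in cluster j is usually computed from its distances to all centres, with the fuzzifier m as an exponent parameter (m = 2 is a common default). A minimal NumPy sketch of that membership step only; the centre-update step is omitted and `fcm_memberships` is a hypothetical name:

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # distance of every example to every cluster centre, shape (n, c)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # avoid dividing by zero when an example sits on a centre
    power = 2.0 / (m - 1.0)
    # u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** power
    return 1.0 / ratio.sum(axis=2)

U = fcm_memberships([[0.5], [1.0]], centers=[[0.0], [2.0]])
# row 0: about 0.9 / 0.1 (close to the first centre); row 1: 0.5 / 0.5 (exactly in between)
```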

Fuzzy C-Means is available in the "Information Selection" plugin for RapidMiner. It's not part of standard RapidMiner, BTW.

All the best,

Rodrigo.

Selim · May 2019

@rfuentealba first of all, thanks for your answers. I have one more question. I have been working on a k-means clustering algorithm for zoning a warehouse. It runs with the Execute Python operator and divides the items into clusters of the same size. I also have an attribute called "volume". As you know, volume is very important in a warehouse, so I want the sum of volume in every cluster to be equal. How can I do that? I have shared my XML below. I am waiting for your answer.
Kind regards
---------------------

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">

</context>

</list>

</operator>

</operator>

</operator>

</operator>

</operator>

</operator>

</list>

</operator>

<parameter key="script" value="import numpy as np
from operator import itemgetter
from scipy.spatial import distance
from sklearn.cluster import KMeans

# %{cluster_number} is a RapidMiner macro, replaced by its value before the script runs
C = %{cluster_number}

def k_means(X):
    # fit k-means and return only the cluster centres
    kmeans = KMeans(n_clusters=C, random_state=0).fit(X)
    return kmeans.cluster_centers_

def samesizecluster(D):
    # in:  D, point-to-cluster-centre distances, shape Npt x C
    # out: xtoc, the cluster index of every point, with equal-size clusters
    Npt, C = D.shape
    clustersize = (Npt + C - 1) // C  # ceil(Npt / C)
    xcd = list(np.ndenumerate(D))  # ((0,0), d00), ((0,1), d01) ...
    xcd.sort(key=itemgetter(1))    # nearest point/centre pairs first
    xtoc = np.full(Npt, -1, dtype=int)
    nincluster = np.zeros(C, dtype=int)
    nall = 0
    for (x, c), d in xcd:
        # give point x to centre c if x is still free and c is not full yet
        if xtoc[x] < 0 and nincluster[c] < clustersize:
            xtoc[x] = c
            nincluster[c] += 1
            nall += 1
            if nall >= Npt:
                break
    return xtoc

def rm_main(data):
    X = data.values
    centres = k_means(X)
    D = distance.cdist(X, centres)
    data['cluster'] = samesizecluster(D)
    return data"/>

</operator>

</operator>

</operator>

</operator>

</process>

</operator>

</process>
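The script above caps the number of items per cluster. One possible direction for the volume requirement is sketched here as a hypothetical helper `samevolumecluster` (not part of RapidMiner, SciPy or scikit-learn): it keeps the same nearest-pair-first greedy loop but caps each cluster's total volume at its fair share instead. It is a heuristic: totals are only approximately equal in general, and items that fit nowhere fall back to the lightest cluster.

```python
import numpy as np

def samevolumecluster(D, volume, slack=0.0):
    # D: point-to-centre distances, shape (Npt, C); volume: one value per point
    D = np.asarray(D, dtype=float)
    volume = np.asarray(volume, dtype=float)
    npt, c = D.shape
    # each cluster may hold at most its fair share of the total volume (plus optional slack)
    cap = volume.sum() / c * (1.0 + slack)
    xtoc = np.full(npt, -1, dtype=int)
    load = np.zeros(c)
    # walk (point, cluster) pairs from nearest to farthest, as samesizecluster does
    for flat in np.argsort(D, axis=None, kind="stable"):
        x, k = np.unravel_index(flat, D.shape)
        if xtoc[x] < 0 and load[k] + volume[x] <= cap:
            xtoc[x] = k
            load[k] += volume[x]
    # points that fit nowhere go to the currently lightest cluster
    for x in np.flatnonzero(xtoc < 0):
        k = int(np.argmin(load))
        xtoc[x] = k
        load[k] += volume[x]
    return xtoc

D = [[0, 5], [1, 4], [2, 3], [6, 0]]
xtoc = samevolumecluster(D, volume=[10, 10, 10, 30])
# xtoc is [0, 0, 0, 1]: both clusters hold 30 units, whereas equal item
# counts (2 + 2) would have given 20 and 40
```

The `slack` parameter loosens the cap slightly, which helps when volumes cannot be split exactly evenly.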

varunm1 · May 2019

Hello @Selim

Are you asking how to sum the volume column per cluster? If so, you can use the Aggregate operator and group by the cluster column.
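If you prefer to do the same inside the Execute Python script, the per-cluster sum is a one-liner with NumPy. A small sketch with illustrative labels and volumes (`volume_per_cluster` is a hypothetical helper name):

```python
import numpy as np

def volume_per_cluster(labels, volume, n_clusters):
    # sum the volume of the items in each cluster, like Aggregate grouped by cluster
    return np.bincount(np.asarray(labels), weights=np.asarray(volume, dtype=float),
                       minlength=n_clusters)

totals = volume_per_cluster([0, 1, 0, 1, 0, 1], [10, 15, 20, 25, 30, 20], 2)
# totals -> array([60., 60.])
```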

If this is not what you are looking for, please explain your requirement in a bit more detail.

Thanks

rfuentealba · May 2019

Hello @Selim,

I have the same question as Varun. I will work on your problem tomorrow, I promise.

BTW, friendly moderator's advice: you are asking too many questions in the same thread, and that makes it difficult for others to find proper answers in the future. It's not like we charge you for writing a new post each time you have a question for the community.

All the best,

Rodrigo.

Selim · May 2019

@varunm1 @rfuentealba first of all, thanks for your answers. I will explain it again.
First, I should say that I am clustering with 4 attributes, one of which is "volume".
I am doing this clustering for a warehouse (storage), so volume is very important to me: when I cluster the items, the sum of the volume in each cluster has to be equal. For example, with the data below I want cluster 1 = items 1, 3, 5 and cluster 2 = items 2, 4, 6, because then the total volume of each cluster is the same, 60. I hope you understand what I mean; if not, please tell me. I am waiting for your answer.

item no   volume
1         10
2         15
3         20
4         25
5         30
6         20
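If only the volume mattered, this would be a number-partitioning problem. A common greedy heuristic (largest item first, always into the currently lightest cluster) is sketched below on the table above. It ignores the other three attributes and does not guarantee equal totals in general, although for this data it reaches 60 and 60, with a different grouping than the one proposed above. The function name `balance_by_volume` is hypothetical:

```python
def balance_by_volume(volumes, n_groups):
    # volumes: {item_no: volume}; returns item lists and total volume per group
    totals = [0] * n_groups
    groups = [[] for _ in range(n_groups)]
    # largest items first; sorted() is stable, so equal volumes keep table order
    for item, vol in sorted(volumes.items(), key=lambda kv: -kv[1]):
        g = min(range(n_groups), key=lambda i: totals[i])  # currently lightest group
        groups[g].append(item)
        totals[g] += vol
    return groups, totals

groups, totals = balance_by_volume({1: 10, 2: 15, 3: 20, 4: 25, 5: 30, 6: 20}, 2)
print(groups, totals)  # [[5, 6, 1], [4, 3, 2]] [60, 60]
```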

Selim · May 2019

@varunm1 @rfuentealba do you have any idea? I really need to solve this problem. If you want, I can share my process as XML so you can see exactly what I am doing.
Kind Regards,

Altair RapidMiner Community
