Create clusters of the same/similar size
dparaskevop
Hello all,
How can I keep my clusters balanced in size? For example, I want groups of 10 people with similar characteristics. At the moment, when I use k-means on the same data set, I get some clusters with 18 people and others with 3. Can I somehow restrict the number of objects per cluster?
Many thanks,
Dimitris
Answers
This requirement isn't a classic application of clustering based on machine learning algorithms. While there are some constrained clustering algorithms out there that can do what you want, I am not aware of any that are implemented in RapidMiner's clustering operators (although I'd love to see one, because this question does come up from time to time). You might be able to find something in R or Python that could be used within RapidMiner, though.
Alternatively, there are of course operators for simple binning by frequency, so you could come up with some kind of synthesized attribute combining the values of other attributes and then create groups based on that.
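A minimal sketch of the binning idea in Python (the language used elsewhere in this thread), assuming all attributes are numeric. The synthesized attribute here is the mean of the z-scored attributes; that choice and the helper name `frequency_bins` are illustrative assumptions, not a RapidMiner operator. Sorting rows by the score and cutting the ranking into equal slices guarantees that group sizes differ by at most one, but people in the same group are only similar along that single combined score.

```python
import numpy as np

def frequency_bins(X, n_groups):
    """Hypothetical helper: split the rows of X into n_groups groups of (near-)equal size.

    The synthesized attribute is the mean of the z-scored columns, an
    illustrative choice; any single score combining the attributes would do.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                       # avoid dividing by zero for constant columns
    z = (X - X.mean(axis=0)) / std            # put all attributes on a common scale
    score = z.mean(axis=1)                    # one synthesized attribute per row
    order = np.argsort(score, kind="stable")  # row indices sorted by the score
    groups = np.empty(len(X), dtype=int)
    # the row at rank r goes to bin r * n_groups // n, so bin sizes differ by at most one
    groups[order] = np.arange(len(X)) * n_groups // len(X)
    return groups

rng = np.random.default_rng(0)
people = rng.random((100, 3))     # 100 made-up people with 3 characteristics each
labels = frequency_bins(people, 10)
print(np.bincount(labels))        # ten groups of 10
```

With 100 rows and 10 groups every bin gets exactly 10 rows; with sizes that do not divide evenly (e.g. 23 rows into 4 groups) the bins come out as 6, 6, 6 and 5.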
@mschmitz any other thoughts on this one?
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi @dparaskevop,
Interesting topic.
As suggested by @Telcontar120, there are resources on the internet. So, rather than reinventing the wheel, here is a process that uses a Python script (via the Execute Python operator):
For example, below are the results of clustering a "school" dataset with 2 attributes:
- 100 examples chosen at random in range [0,1].
- 30 examples per cluster.
In practice, this script generalizes to an n-dimensional space, so it can be applied to your project on people's characteristics.
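Lionel's script itself is not reproduced in this thread, so the following is not it. It is a minimal sketch of one size-constrained variant of k-means that works on any number of numeric attributes, assuming the usual `rm_main(data)` entry point of the Execute Python operator (data arrives and leaves as a pandas DataFrame). The assignment step is a simple greedy heuristic, not an optimal constrained solver: (row, centroid) pairs are taken in order of increasing distance and a row joins the closest cluster that still has room. The function name `balanced_kmeans`, the `cluster_size=10` default and the output column name `cluster` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

def balanced_kmeans(X, cluster_size, n_iter=20, seed=0):
    """Hypothetical helper: k-means where no cluster holds more than cluster_size rows."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    k = -(-n // cluster_size)  # ceil(n / cluster_size), so every cluster gets at least one row
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(n, size=k, replace=False)]  # start from k distinct rows
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # n x k matrix of squared Euclidean distances (works for any number of attributes)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = np.full(n, -1)
        counts = np.zeros(k, dtype=int)
        # greedy capacity-constrained assignment: closest (row, cluster) pairs first
        for flat in np.argsort(d, axis=None):
            i, j = divmod(int(flat), k)
            if labels[i] == -1 and counts[j] < cluster_size:
                labels[i] = j
                counts[j] += 1
        # usual k-means update: move each centroid to the mean of its rows
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

def rm_main(data):
    # Execute Python entry point (assumed convention): use every numeric attribute
    numeric = data.select_dtypes(include="number")
    out = data.copy()
    out["cluster"] = balanced_kmeans(numeric.to_numpy(), cluster_size=10)
    return out

people = pd.DataFrame(np.random.default_rng(1).random((100, 4)), columns=["a", "b", "c", "d"])
result = rm_main(people)
print(result["cluster"].value_counts().sort_index().tolist())  # ten clusters of 10 rows each
```

When the number of rows is a multiple of `cluster_size`, the total capacity equals the number of rows, so every cluster ends up exactly full; otherwise clusters are capped at `cluster_size` and the smallest one takes the remainder. If the attributes use very different scales, normalize them first (e.g. with a Normalize operator before Execute Python), because the distances are plain Euclidean.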
I hope that these elements will be useful to you.
Regards,
Lionel
Very slick! When RapidMiner native operators fail you can always count on @lionelderkrikor to come to the rescue with a clever Python script!