Creating equally sized clusters that are representative of the population

Kristjan_Mar · February 2021

Hi all,

I have a set of data (the population) of individuals who have signed up to be part of a group. When they signed up, they gave some background information, which leaves me with 5 variables that I am mainly focusing on.

What I want to do is create 4 equally sized groups that are as representative of the whole population as possible. That is, I want to create 4 homogeneous groups.

I also have some other columns in the dataset that are important for handling/using the dataset. I would like this information to be included in each of the groups (subsamples) so that every row still matches the respondent it belongs to.

In short: how can I create four homogeneous subsamples that are representative of the population, using only selected variables from the dataset?

Cheers, K

MarcoBarradas · February 2021

Hi @Kristjan_Mar, it seems you need to create 4 stratified samples of your data.
For that, you need to use the Split Data operator with the sampling type set to stratified.
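Outside RapidMiner, the idea behind a stratified split into equally sized groups can be sketched in plain Python. This is a minimal sketch under stated assumptions, not how the Split Data operator is implemented: each respondent is assumed to be a dict (one row with all its columns), and `stratified_split`, the `region` attribute and the example data are hypothetical. Rows are dealt round-robin within each stratum, so every group gets nearly the same share of each class, and whole rows move together.

```python
import random
from collections import defaultdict


def stratified_split(rows, strata_key, k=4, seed=42):
    """Split rows into k groups with (nearly) equal shares of each stratum.

    Assumes rows is a list of dicts (one per respondent, all columns kept)
    and strata_key names a nominal column; both are illustrative choices.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible groups
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[strata_key]].append(row)

    groups = [[] for _ in range(k)]
    position = 0  # running counter across strata keeps overall sizes within 1 of each other
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)  # random order within the stratum
        for row in members:
            groups[position % k].append(row)  # deal round-robin like cards
            position += 1
    return groups


# Hypothetical data: "region" is the nominal attribute to stratify on,
# "score" stands in for any other column that must stay with its respondent.
population = [
    {"id": i, "region": "north" if i % 3 else "south", "score": i}
    for i in range(20)
]
groups = stratified_split(population, "region", k=4)

print([len(g) for g in groups])  # [5, 5, 5, 5]
print([sum(r["region"] == "south" for r in g) for g in groups])  # [1, 2, 2, 2]
```

Stratifying only balances the one attribute you stratify on; the other variables are balanced only by chance, through the random shuffle within each stratum.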

Hope that helps you.

Telcontar120 · February 2021

I think I am confused by the wording of your intended outcome here: "as representative of the whole population as possible" and "homogeneous" are typically not synonymous. If you want the groups to be as representative of the whole as possible, you basically want random subsets, which you can create easily with Split Data by choosing the shuffled sampling type. You would only need the stratified sampling type if you first set a nominal attribute as your label to stratify on and want to make sure that each resulting partition contains the same proportions of the label classes. I suggest you have a look at the tutorial and help explanation of the Split Data operator. (You can use Select Attributes before the split to bring in only the 5 attributes you are interested in, if you only want to look at those.)
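A plain-Python analogue of the shuffled split described above might look like the following. It is a minimal sketch rather than RapidMiner's own implementation: the data is assumed to be a list of dicts, one per respondent, and the names `shuffled_split`, `age` and `member_no` are hypothetical. Because whole rows are shuffled and then sliced, every extra column stays attached to its respondent, and the group sizes differ by at most one.

```python
import random
from statistics import mean


def shuffled_split(rows, k=4, seed=42):
    """Shuffle a copy of rows, then cut it into k groups whose sizes differ by at most one."""
    rng = random.Random(seed)  # fixed seed -> the same split on every run
    shuffled = list(rows)      # copy so the caller's list order is untouched
    rng.shuffle(shuffled)
    base, extra = divmod(len(shuffled), k)
    groups, start = [], 0
    for g in range(k):
        size = base + (1 if g < extra else 0)  # first `extra` groups get one more row
        groups.append(shuffled[start:start + size])
        start += size
    return groups


# Hypothetical data: "age" stands in for one of the five selected variables,
# "member_no" for an extra column that must stay with its respondent.
population = [{"id": i, "age": 20 + i % 40, "member_no": f"M{i:04d}"} for i in range(102)]
groups = shuffled_split(population, k=4)

print([len(g) for g in groups])  # [26, 26, 25, 25]
# Compare each group's mean age with the population mean as a quick representativeness check.
print(round(mean(r["age"] for r in population), 1))
for g in groups:
    print(round(mean(r["age"] for r in g), 1))
```

Random groups are representative only on average, so with small groups the means can still drift apart by chance; stratifying on a key attribute is one way to tighten that for the attribute in question.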

Kristjan_Mar · February 2021

Thank you @MarcoBarradas and @Telcontar120!

Altair RapidMiner Community