Partitioning the DataSet into N samples

John_De_Jong · October 2011

Is there a preprocessing filter in RapidMiner that takes a whole data set and creates N samples, each with the same class distribution as the original data set?
An example:
I have a data set with 1 million rows and two classes. I want to sub-sample it into 50 samples of 20K rows each, i.e. sample1, sample2, ..., sample50 each have 20K rows. When I run the filter, I should get 50 samples of 20K rows, where each sample is made up of unique rows from the 1 million and has the same balance between the labels as the full data set. For instance, if label1 makes up 90% and label2 makes up 10% of the 1 million, each 20K sample has 18K rows of label1 and 2K rows of label2.
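
To make the requirement concrete, here is a minimal sketch of the behaviour I am after, written in plain Python rather than RapidMiner. It is only an illustration: the function name `stratified_partition` and its parameters are made up for this post, and it assumes "unique" means that no row ends up in more than one sample. It shuffles the rows of each label and deals them out in equal slices, so every sample keeps the original label proportions (up to rounding).

```python
import random
from collections import defaultdict


def stratified_partition(labels, n_samples, sample_size, seed=0):
    """Split row indices into n_samples disjoint samples with matching label balance.

    Hypothetical helper for illustration only; not a RapidMiner operator.
    Returns a list of n_samples lists of row indices.
    """
    rng = random.Random(seed)

    # Group row indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    total = len(labels)
    samples = [[] for _ in range(n_samples)]

    for label, indices in by_label.items():
        rng.shuffle(indices)
        # Rows of this label per sample, proportional to its share of the data.
        # Rounding means a sample can differ slightly from sample_size.
        per_sample = round(sample_size * len(indices) / total)
        if per_sample * n_samples > len(indices):
            raise ValueError(f"not enough rows with label {label!r}")
        # Consecutive, non-overlapping slices keep the samples disjoint.
        for k in range(n_samples):
            samples[k].extend(indices[k * per_sample:(k + 1) * per_sample])

    return samples


if __name__ == "__main__":
    # Scaled-down version of the example: 1,000 rows, 90% label1, 10% label2.
    labels = ["label1"] * 900 + ["label2"] * 100
    parts = stratified_partition(labels, n_samples=5, sample_size=200)
    for part in parts:
        n_label2 = sum(1 for i in part if labels[i] == "label2")
        print(len(part), len(part) - n_label2, n_label2)
```

With the real data this would be called with `n_samples=50` and `sample_size=20000`, giving 18K rows of label1 and 2K rows of label2 per sample.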

Any help would be appreciated.
John
