Partitioning the DataSet into N samples

John_De_Jong Member Posts: 10 Contributor II
edited November 2018 in Help
Is there a preprocessing filter in RapidMiner that takes a whole data set and creates N samples, each with the same class distribution as the original data set?
An example:
I have a data set with 1 million examples and two classes, so the original instance has a size of 1 million. I want to sub-sample it into 50 sub-samples of 20K examples each, i.e. sample1, sample2, ..., sample50 each have size 20K. When I run the filter, I should get 50 instances of 20K examples each. The samples should not overlap (each example from the 1 million appears in exactly one sample), and each should have the same label balance as the full data set: if label1 makes up 90% and label2 makes up 10%, each 20K sample contains 18K examples of label1 and 2K of label2.
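
To make the requirement concrete, here is a minimal sketch of the desired behaviour in plain Python. It is not a RapidMiner operator; the function name `stratified_partition` and the toy data are assumptions made for illustration only. It groups row indices by label, shuffles each group, and deals the indices round-robin into N disjoint samples, scaled down to 1,000 rows so it runs instantly.

```python
import random
from collections import Counter, defaultdict


def stratified_partition(labels, n_samples, seed=0):
    """Split row indices into n_samples disjoint groups that each
    keep (as closely as possible) the label proportions of the whole."""
    rng = random.Random(seed)

    # Collect the row indices belonging to each label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    samples = [[] for _ in range(n_samples)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so every sample
        # receives an equal share of this label.
        for pos, idx in enumerate(indices):
            samples[pos % n_samples].append(idx)
    return samples


# Toy data: 1,000 rows with a 90% / 10% label balance (hypothetical).
labels = ["label1"] * 900 + ["label2"] * 100
samples = stratified_partition(labels, n_samples=50)

# Label counts inside the first sample.
first_counts = Counter(labels[i] for i in samples[0])
print(len(samples), len(samples[0]), dict(first_counts))
```

With the full 1 million rows and the same 90/10 balance, the same dealing gives 50 samples of 20K rows, each with 18K of label1 and 2K of label2. If a label's count is not divisible by N, the earlier samples receive one extra row of that label.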

Any help would be appreciated.
John