# "Sample operators"

I have been using the "Sample" operator to reduce the size of my input data, but with limited success, and I would appreciate any help. My main issue is that I don't know which specific operator to use, and I cannot find documentation explaining each one.

My data has 4 labels, with the following share of examples for each: A = 60%, B = 20%, C = 15% and D = 5%.

Irrespective of statistical significance, I want to include at least all 5% of D and an equivalent portion of A, B and C.

How do I do this?

## Answers

If I understand you correctly, then you want to sample the data in such a way that you have the same number of examples for all classes. Beware that this changes the class distribution, so a model trained on this subset but applied to data with the original distribution may be biased.

How to:

RapidMiner is like Lego: there is no single operator that achieves this, but a combination of several does.

Here are the bricks:

- Filter Examples

- Multiply

- Join

- Sample
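
For readers who think more easily in code, the idea behind combining these bricks can be sketched in plain Python. This is only an analogy, not RapidMiner code: the function name `balanced_sample`, the `label_of` parameter and the toy rows are all made up for illustration. It splits the data by label (the role of Filter Examples), draws the same number of rows from each class (the role of Sample), and puts the pieces back together.

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_of, seed=0):
    """Downsample every class to the size of the rarest class.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    rng = random.Random(seed)

    # Step 1 ("Filter Examples"): split the rows into one group per label.
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    if not groups:
        return []

    # The rarest class sets the per-class quota, so all of it is kept.
    quota = min(len(group) for group in groups.values())

    # Step 2 ("Sample"): draw `quota` rows without replacement from each
    # group, then combine the per-class samples into one data set.
    result = []
    for label in sorted(groups):
        result.extend(rng.sample(groups[label], quota))
    return result


# Toy data with the distribution from the question: 60/20/15/5 percent.
rows = (
    [("A", i) for i in range(600)]
    + [("B", i) for i in range(200)]
    + [("C", i) for i in range(150)]
    + [("D", i) for i in range(50)]
)
subset = balanced_sample(rows, label_of=lambda row: row[0])
```

With 1,000 toy rows, class D has 50 examples, so the subset holds 50 rows per class and 200 rows in total. Every D row is kept, while A shrinks from 600 to 50, which is exactly the change in class distribution the warning above refers to.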

Since this is a common task, I added an example process on myExperiment.

Search for "Change Class Distribution of Your Training Data Set by Filtering and Sampling" in the myExperiment View to download the process.

http://www.myexperiment.org/workflows/1775.html

See http://rapid-i.com/component/option,com_myblog/show,Video-on-RapidMiner-Community-Extension-myExperiment-.html/Itemid,172/lang,en/ for an introduction to myExperiment.

See also the "Same Number of Examples per Class" process on myExperiment, http://www.myexperiment.org/workflows/1315.html, for a more sophisticated and generic solution.

