"Sample operators"
Hi,
I have been using the "Sample" operator to reduce the size of my input data. I have had limited success achieving what I want and would appreciate any help. My main issue is that I don't know which of the sampling operators to use, and I cannot find documentation explaining each one.
My data has 4 labels, distributed as A = 60%, B = 20%, C = 15% and D = 5%.
Irrespective of statistical significance, I want to keep at least all of D (the 5%) and an equally sized portion of each of A, B and C.
How do I do this?
Thanks
B
Answers
If I understand you correctly, then you want to sample the data so that every class has the same number of examples. Beware that this changes properties of the data, such as the class distribution, so a model trained on this subset but applied to data with the original class distribution may be biased.
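To make the target sizes concrete, here is a small arithmetic sketch using the class shares from the question. The total of 1,000 examples is a hypothetical figure chosen only for illustration; balancing by undersampling means every class is cut down to the size of the smallest one (D), so all of D is kept.

```python
# Class shares from the question; total = 1000 is a hypothetical size
# used only to make the arithmetic concrete.
shares = {"A": 0.60, "B": 0.20, "C": 0.15, "D": 0.05}
total = 1000

# Number of examples per class in the full data set.
counts = {label: round(share * total) for label, share in shares.items()}

# Balanced undersampling: every class is reduced to the size of the
# smallest class, so the rarest class (D) is kept completely.
per_class = min(counts.values())
balanced = {label: per_class for label in counts}

# Fraction of each class that has to be sampled to reach that size.
fractions = {label: per_class / n for label, n in counts.items()}

print(counts)                   # {'A': 600, 'B': 200, 'C': 150, 'D': 50}
print(balanced)                 # {'A': 50, 'B': 50, 'C': 50, 'D': 50}
print(sum(balanced.values()))   # 200
print({k: round(v, 3) for k, v in fractions.items()})
# {'A': 0.083, 'B': 0.25, 'C': 0.333, 'D': 1.0}
```

With these shares the balanced set ends up at 20% of the original data (4 × 5%), whatever the actual total is.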
How to:
RapidMiner is like Lego: no single operator does this, but a combination of several does.
Here are the bricks:
- Filter Examples
- Multiply
- Join
- Sample
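Outside RapidMiner, the same brick logic can be sketched in plain Python: split the data by class (the Filter Examples step), sample each class down to the size of the smallest one (Sample), and combine the pieces into one data set (the combining step listed as Join above). Multiply has no direct counterpart here because a Python list can simply be read several times. The function name `balance_by_undersampling`, its `label_of` and `seed` parameters, and the toy data are hypothetical, not part of any RapidMiner API.

```python
import random
from collections import Counter


def balance_by_undersampling(examples, label_of, seed=None):
    """Hypothetical helper: return a subset with equal counts per class.

    Every class is sampled down (without replacement) to the size of the
    smallest class, so all examples of the rarest class are kept.
    """
    rng = random.Random(seed)

    # Filter step: build one subset per class value.
    by_class = {}
    for ex in examples:
        by_class.setdefault(label_of(ex), []).append(ex)

    # The smallest class determines the per-class sample size.
    n = min(len(subset) for subset in by_class.values())

    # Sample step, then combine the per-class samples into one list.
    balanced = []
    for subset in by_class.values():
        balanced.extend(rng.sample(subset, n))
    rng.shuffle(balanced)
    return balanced


# Toy data with the class shares from the question (hypothetical records).
data = ([("A", i) for i in range(600)] + [("B", i) for i in range(200)]
        + [("C", i) for i in range(150)] + [("D", i) for i in range(50)])
sample = balance_by_undersampling(data, label_of=lambda ex: ex[0], seed=42)
print(sorted(Counter(label for label, _ in sample).items()))
# [('A', 50), ('B', 50), ('C', 50), ('D', 50)]
```

Undersampling to the minority class throws away most of A; if that loses too much information, sampling a larger fixed number per class (with replacement for the small classes) is a common alternative, at the cost of duplicated examples.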
hope this was helpful,
steffen
Since this is a common task, I added an example process on myExperiment.
Search for "Change Class Distribution of Your Training Data Set by Filtering and Sampling" in the myExperiment View to download the process.
http://www.myexperiment.org/workflows/1775.html
See http://rapid-i.com/component/option,com_myblog/show,Video-on-RapidMiner-Community-Extension-myExperiment-.html/Itemid,172/lang,en/ for a video introducing the myExperiment Community Extension.
See also the "Same Number of Examples per Class" process here on myExperiment http://www.myexperiment.org/workflows/1315.html for a more sophisticated/generic solution.
Ciao Sebastian