Generate more examples based on our dataset
mansour_ebrahim
Hi all,
I asked the following question earlier but haven't received any good advice so far. I would appreciate it if anyone could help. (I don't want to use upsampling or the SMOTE operator.)
Is there an operator in RapidMiner that increases the number of examples in a dataset? I mean an operator that generates more samples from all groups and increases the total number of examples in my dataset. I am running a deep learning (DL) model on my dataset, but I don't have enough samples and cannot collect more, so I need to generate more samples from all groups.
Also, I do not want to balance the number of samples across classes; I just want to increase the size of the dataset, say, threefold.
Regards,
Mansour
Answers
If you just want to repeat your existing examples, use Multiply to create copies of your example set and Append to combine them, as many times as you want.
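The Multiply + Append idea can be sketched outside RapidMiner as plain Python. This is a minimal analogue, not RapidMiner's implementation: an example set is assumed to be a list of dicts, and the function name `replicate_examples` and the toy rows are hypothetical. Every original row appears exactly `factor` times, so the class proportions stay the same, which matches the "threefold, no rebalancing" requirement.

```python
from typing import Dict, List

# Hypothetical representation: one dict per example (attribute name -> value).
Example = Dict[str, object]


def replicate_examples(examples: List[Example], factor: int) -> List[Example]:
    """Return the example set appended to itself so it appears `factor` times.

    Analogue of Multiply + Append: each original row occurs exactly
    `factor` times, so class proportions are unchanged.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    result: List[Example] = []
    for _ in range(factor):
        # Copy each row so later edits (e.g. adding noise) don't alias originals.
        result.extend(dict(row) for row in examples)
    return result


# Hypothetical toy data, for illustration only.
data = [
    {"x1": 1.0, "x2": 5.0, "label": "a"},
    {"x1": 2.0, "x2": 3.0, "label": "b"},
]
tripled = replicate_examples(data, 3)
print(len(tripled))  # 6
```

Copying each dict costs a little memory but keeps the copies independent, which matters if noise is added to them afterwards.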
You can optionally add some noise by randomly changing some attribute values.
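One way to read "randomly changing some attribute values" is Gaussian jitter on numeric attributes. The sketch below assumes that interpretation; the function `add_noise`, the probability `prob`, the noise scale `sigma`, and the toy rows are all hypothetical choices, and a sensible `sigma` depends on the scale of each attribute. The label column is simply not listed among the perturbed attributes, so labels are never changed.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical representation: one dict per example (attribute name -> value).
Example = Dict[str, object]


def add_noise(examples: List[Example], attributes: Sequence[str],
              prob: float = 0.3, sigma: float = 0.1,
              seed: int = 0) -> List[Example]:
    """Return noisy copies of `examples`; the input list is not modified.

    Each listed numeric attribute is perturbed with probability `prob`
    by adding Gaussian noise with mean 0 and standard deviation `sigma`.
    """
    rng = random.Random(seed)
    noisy: List[Example] = []
    for row in examples:
        new_row = dict(row)
        for name in attributes:
            if rng.random() < prob:
                new_row[name] = float(new_row[name]) + rng.gauss(0.0, sigma)
        noisy.append(new_row)
    return noisy


# Hypothetical toy data, for illustration only.
data = [
    {"x1": 1.0, "x2": 5.0, "label": "a"},
    {"x1": 2.0, "x2": 3.0, "label": "b"},
]
# Original rows plus two noisy copies: three times the original size.
augmented = (data
             + add_noise(data, ["x1", "x2"], seed=1)
             + add_noise(data, ["x1", "x2"], seed=2))
print(len(augmented))  # 6
```

Using a separate seed per copy keeps the noisy copies different from each other while still making the result reproducible.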
However, this won't meaningfully improve your model. You usually can't cheat machine learning algorithms by inventing more data than you actually have.
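A short derivation shows why exact copies add no information, assuming the model is fit by minimizing the average loss over the training set (an assumption; the thread does not say which loss is used). Let the original set have n examples (x_i, y_i), let f_θ be the model with parameters θ and ℓ the per-example loss, and replicate every example k times:

```latex
\frac{1}{kn}\sum_{j=1}^{k}\sum_{i=1}^{n} \ell\bigl(f_\theta(x_i), y_i\bigr)
  = \frac{1}{kn}\cdot k\sum_{i=1}^{n} \ell\bigl(f_\theta(x_i), y_i\bigr)
  = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f_\theta(x_i), y_i\bigr)
```

The objective is identical, so its minimizer is too. With mini-batch training for a fixed number of epochs, one epoch over the replicated set is roughly k passes over the original data, which behaves more like training longer than like having new data.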
Regards,
Balázs
For your purpose, you can use the Sample (Bootstrapping) operator, which does exactly what you want: it increases the number of examples without creating any synthetic examples. But as @BalazsBarany already said, this technique won't have any significant effect on model performance.
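Bootstrapping means drawing examples uniformly at random with replacement. The Python sketch below is a minimal analogue of that technique, not the Sample (Bootstrapping) operator's actual code; the function name `bootstrap_sample` and the toy rows are hypothetical. Every returned row is a copy of an existing one, so no synthetic values appear, but some rows may be drawn several times and others not at all.

```python
import random
from typing import Dict, List

# Hypothetical representation: one dict per example (attribute name -> value).
Example = Dict[str, object]


def bootstrap_sample(examples: List[Example], size: int,
                     seed: int = 0) -> List[Example]:
    """Draw `size` examples uniformly at random with replacement."""
    if not examples:
        raise ValueError("cannot bootstrap an empty example set")
    rng = random.Random(seed)
    # Copy each drawn row so duplicates are independent objects.
    return [dict(rng.choice(examples)) for _ in range(size)]


# Hypothetical toy data, for illustration only.
data = [
    {"x1": 1.0, "label": "a"},
    {"x1": 2.0, "label": "b"},
    {"x1": 3.0, "label": "a"},
]
# Three times the original number of examples.
bigger = bootstrap_sample(data, size=3 * len(data), seed=42)
print(len(bigger))  # 9
```

Unlike exact replication with Multiply + Append, bootstrapping preserves class proportions only in expectation, so a particular sample can over- or under-represent a class.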
Vladimir
http://whatthefraud.wtf