Generate more examples based on our existing dataset

mansour_ebrahim Member Posts: 22 Contributor II
Hi all,
I asked the following question earlier but haven't received any useful advice so far. I would appreciate it if anyone could help. (I don't want to use upsampling or the SMOTE operator.)

Is there an operator in RapidMiner that increases the number of examples in a dataset? I mean an operator that generates more samples from all groups and so increases the total number of examples in my dataset. I am training a deep learning (DL) model, but I don't have enough samples and cannot collect more, so I need to generate additional samples from all groups.
Also, I do not want to balance the classes; I just want to increase the size of the dataset, say, threefold.
Regards,
Mansour

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi! 

    If you just want to repeat your existing examples, use Multiply to make copies of your example set and Append to combine them, as many times as you want.

    You can optionally add some noise by randomly changing some attribute values. 

    However, this won't really change your model. You usually can't cheat machine learning algorithms by inventing more data than you actually have.

    Regards,

    Balázs
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn
    Hi @mansour_ebrahim

    For your purpose, you can use the Sample (Bootstrapping) operator, which does exactly what you want: it increases the number of examples without creating any synthetic examples. But as @BalazsBarany already said, this technique won't have any significant effect on model performance.
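Balázs's replicate-and-append idea (with the optional noise) can be illustrated outside RapidMiner in plain Python. This is a minimal sketch, not a RapidMiner operator: it assumes each example is a hypothetical `(attribute_list, label)` pair with numeric attributes, and it models the "noise" as small Gaussian jitter with standard deviation `sigma` on every attribute of the copies. That is only one possible choice. The original examples are kept unchanged and `factor - 1` copies are appended.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical example representation: (numeric attribute values, class label).
Example = Tuple[List[float], str]


def augment(examples: List[Example], factor: int, sigma: float = 0.0,
            seed: Optional[int] = None) -> List[Example]:
    """Return the original examples followed by (factor - 1) copies.

    With sigma > 0, every attribute of each copy is shifted by Gaussian noise
    with mean 0 and standard deviation sigma (an assumed noise model).
    Labels are copied unchanged, so class proportions stay exactly the same.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    rng = random.Random(seed)
    # Keep the originals first, copying the lists so the caller's data is untouched.
    result: List[Example] = [(list(x), y) for x, y in examples]
    for _ in range(factor - 1):
        for x, y in examples:
            noisy_x = [v + rng.gauss(0.0, sigma) for v in x] if sigma > 0 else list(x)
            result.append((noisy_x, y))
    return result


if __name__ == "__main__":
    data = [([1.0, 2.0], "a"), ([3.0, 4.0], "b"), ([5.0, 6.0], "a")]
    tripled = augment(data, factor=3, sigma=0.05, seed=42)
    print(len(tripled))  # 9 examples: 3 originals + 2 noisy copies of each
```

Because every copy carries the same label and (almost) the same attribute values, the model sees no new information, which is Balázs's point. If the enlarged set is later split into training and test data, near-identical copies can land on both sides and make validation scores look better than they are; splitting before duplicating avoids that.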
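kypexin's bootstrapping suggestion amounts to drawing examples uniformly at random, with replacement, from the existing set until the desired size is reached. The plain-Python sketch below only illustrates that idea and is not how the RapidMiner operator is implemented; the function name, the `ratio` parameter and the `(id, label)` example format are hypothetical. With `ratio=3.0` it returns three times as many examples, each of which is a copy of an existing one.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def bootstrap(examples: Sequence[T], ratio: float,
              seed: Optional[int] = None) -> List[T]:
    """Draw round(ratio * len(examples)) examples uniformly with replacement."""
    if not examples:
        raise ValueError("cannot bootstrap an empty example set")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rng = random.Random(seed)
    k = round(ratio * len(examples))
    # random.choices samples with replacement, so some examples appear
    # several times and others may not appear at all.
    return rng.choices(examples, k=k)


if __name__ == "__main__":
    # Hypothetical data: 100 examples, 75 of class "a" and 25 of class "b".
    data = [(i, "a" if i % 4 else "b") for i in range(100)]
    boot = bootstrap(data, ratio=3.0, seed=1)
    print(len(boot))  # 300
    # Class counts are close to 225 / 75 on average, but not exactly.
    print(Counter(label for _, label in boot))
```

Unlike appending exact copies, bootstrapping keeps class proportions only in expectation, not exactly. In both cases no genuinely new information is added.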