Generate more examples based on our dataset data

mansour_ebrahim · August 2019

Hi all
I asked the following question but haven't received any good advice so far. I would appreciate it if anyone could help. (I don't want to use upsampling or the SMOTE operator.)

Is there an operator in RapidMiner that increases the number of examples in a dataset? I mean an operator that generates more samples from all groups and increases the total number of examples. I am running a DL model on my dataset, but I do not have enough samples and cannot collect more, so I have to generate additional samples from all groups.
Also, I do not want to balance the number of samples across classes; I just want to increase the size of the dataset, let's say threefold.
Regards.
Mansour

BalazsBarany · August 2019

Hi!

If you just want to repeat your existing examples, use the Multiply operator to copy your example set and the Append operator to join the copies, as many times as you want.

You can optionally add some noise by randomly changing some attribute values.

However, this won't really change your model. You usually can't cheat machine learning algorithms by inventing more data than you actually have.
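
For anyone who wants to see the repeat-and-add-noise idea outside of RapidMiner, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function name `replicate_with_noise`, the `noise_scale` parameter, and the toy rows are all made up, and the Gaussian jitter proportional to each value is just one possible way to "randomly change some attribute values".

```python
import random

def replicate_with_noise(rows, times=3, noise_scale=0.05, seed=0):
    """Repeat `rows` `times` times; every copy after the first gets small
    Gaussian jitter on float attributes (hypothetical helper, not a RapidMiner API)."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]  # first copy: originals, unchanged
    for _ in range(times - 1):
        for r in rows:
            noisy = {}
            for key, value in r.items():
                if isinstance(value, float):
                    # Assumption: noise standard deviation is a fraction of the value.
                    noisy[key] = value + rng.gauss(0.0, noise_scale * abs(value))
                else:
                    noisy[key] = value  # labels and other non-float fields are kept as-is
            out.append(noisy)
    return out

# Toy data, invented for the example.
data = [
    {"x1": 1.0, "x2": 10.0, "label": "a"},
    {"x1": 2.0, "x2": 20.0, "label": "b"},
]
bigger = replicate_with_noise(data, times=3)
print(len(bigger))  # three times the original number of rows
```

The class proportions stay the same because every row is copied equally often, but the copies carry no new information, which is the caveat above.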

Regards,

Balázs

kypexin · August 2019

Hi @mansour_ebrahim

For your purpose, you can use the Sample (Bootstrapping) operator, which does exactly what you want: it increases the number of examples by sampling with replacement, without creating any synthetic examples. But as @BalazsBarany already said, this technique won't have any significant effect on model performance.
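
To make the bootstrapping idea concrete, here is a small Python sketch of sampling with replacement to a larger size. It is not how the RapidMiner operator is implemented; the function name `bootstrap_sample`, the `factor` parameter, and the toy rows are assumptions made up for the illustration.

```python
import random

def bootstrap_sample(rows, factor=3.0, seed=0):
    """Draw round(len(rows) * factor) rows with replacement
    (hypothetical helper illustrating bootstrapping)."""
    rng = random.Random(seed)
    target_size = int(round(len(rows) * factor))
    # random.choices samples with replacement, so rows can repeat.
    return rng.choices(rows, k=target_size)

# Toy data, invented for the example.
examples = [
    {"x": 0.1, "label": "a"},
    {"x": 0.2, "label": "a"},
    {"x": 0.3, "label": "b"},
    {"x": 0.4, "label": "b"},
    {"x": 0.5, "label": "c"},
]
resampled = bootstrap_sample(examples, factor=3.0)
print(len(resampled))  # three times the original number of rows
```

Every row in the result is an exact copy of an original row, so the class proportions only match the original approximately, and no new information is added.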
