Dealing with Imbalanced Data
I'm studying the consequences of imbalanced data. I'm trying to replicate some earlier papers on the topic (e.g. Japkowicz 2002).
This is what I need to do, but I'm stuck:
1) Take the original dataset
2) Split it according to the value of the label (call the two new example sets Common and Rare).
3) Resample (bootstrap) the Rare ExampleSet until it has the same size as the Common ExampleSet.
4) Join the resampled Rare with the old Common.
I can do it outside Rapid-I, but I was wondering if it can be done with a few operators.
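For reference, done outside Rapid-I, the four steps above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the dataset is a plain list of examples, the label has exactly two values, and the names `oversample_rare` and `label_of` are hypothetical helpers made up for this example, not part of Rapid-I or of any paper's published code.

```python
import random
from collections import Counter


def oversample_rare(examples, label_of, seed=None):
    """Balance a two-class dataset by bootstrapping the rare class.

    `examples` is any list of records; `label_of` is a function that
    returns the label of one record. Both are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Step 2: split by label value into Common and Rare.
    counts = Counter(label_of(e) for e in examples)
    if len(counts) != 2:
        raise ValueError("expected exactly two label values")
    (common_label, _), (rare_label, _) = counts.most_common(2)
    common = [e for e in examples if label_of(e) == common_label]
    rare = [e for e in examples if label_of(e) == rare_label]

    # Step 3: bootstrap Rare (sample with replacement) up to the size of Common.
    resampled_rare = rng.choices(rare, k=len(common))

    # Step 4: join the resampled Rare with the original Common.
    return common + resampled_rare


# Step 1: a toy dataset with 8 common and 2 rare examples.
data = [(i, "neg") for i in range(8)] + [(i, "pos") for i in range(2)]
balanced = oversample_rare(data, label_of=lambda e: e[1], seed=0)
print(Counter(label for _, label in balanced))  # both classes now have 8 examples
```

Because the Rare examples are drawn with replacement, some of them appear several times in the result and others may not appear at all, which is the usual behaviour of a bootstrap sample.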
Thanks in advance for any help,
\E
Answers
http://rapid-i.com/rapidforum/index.php/topic,1246.msg4786.html#msg4786