Dealing with Imbalanced Data
I'm studying the consequences of imbalanced data. I'm trying to replicate some earlier papers on the topic (e.g. Japkowicz 2002).
This is what I need to do, but I'm stuck:
1) Take the original dataset
2) Split it by the value of the label (call the two new example sets Common and Rare).
3) Resample (bootstrap) the Rare ExampleSet until it has the same size as the Common ExampleSet.
4) Join the resampled Rare with the old Common.
I can do it outside Rapid-I, but I was wondering if it can be done with a few operators.
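For reference, here is roughly what the four steps above look like outside Rapid-I. This is only a minimal Python sketch under my own assumptions: the function name `oversample_rare`, the `label_of` accessor, and the toy data are made up for illustration, and it assumes exactly two label values.

```python
import random
from collections import Counter


def oversample_rare(examples, label_of, seed=0):
    """Bootstrap the rarer class up to the size of the more common one.

    `examples` is any list of records; `label_of` (hypothetical helper)
    returns the label of one record.
    """
    rng = random.Random(seed)

    # Step 2: split the dataset by label value.
    groups = {}
    for ex in examples:
        groups.setdefault(label_of(ex), []).append(ex)
    if len(groups) != 2:
        raise ValueError("this sketch assumes exactly two label values")

    # The smaller group is Rare, the larger one is Common.
    rare, common = sorted(groups.values(), key=len)

    # Step 3: draw from Rare with replacement until it matches Common in size.
    resampled_rare = rng.choices(rare, k=len(common))

    # Step 4: join the resampled Rare with the untouched Common.
    return common + resampled_rare


# Toy data (invented): 90 common examples and 10 rare ones.
data = [(i, "common") for i in range(90)] + [(i, "rare") for i in range(90, 100)]
balanced = oversample_rare(data, label_of=lambda ex: ex[1])
counts = Counter(label for _, label in balanced)
```

Because the draw is with replacement, some Rare examples may appear several times and others not at all. If every original Rare example should be kept, the resampled copies could be added on top of the originals instead.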
Thanks in advance for any help,
\E
Answers
http://rapid-i.com/rapidforum/index.php/topic,1246.msg4786.html#msg4786