Dealing with Imbalanced Data

earmijo · December 2009

I'm studying the consequences of imbalanced data. I'm trying to replicate some earlier papers on the topic (e.g. Japkowicz 2002).

This is what I need to do, but I'm stuck:

1) Take the original dataset

2) Split it according to the value of the label (call the two new example sets Common and Rare).

3) Resample (bootstrap) the Rare ExampleSet until it has the same size as the Common ExampleSet.

4) Join the resampled Rare with the old Common.

I can do it outside Rapid-I, but I was wondering if it can be done with a few operators.
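
For reference, steps 1 to 4 can be sketched outside the tool in plain Python using only the standard library. This is a minimal sketch, assuming a two-class problem; the function name `oversample_rare`, the `label_of` callback, and the toy data are hypothetical and not part of any RapidMiner API.

```python
import random
from collections import defaultdict


def oversample_rare(rows, label_of, seed=0):
    """Bootstrap the rare class up to the size of the common class.

    Hypothetical helper for illustration; assumes exactly two label values.
    """
    rng = random.Random(seed)

    # Step 2: split the data according to the value of the label.
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    if len(groups) != 2:
        raise ValueError("expected exactly two label values")
    rare, common = sorted(groups.values(), key=len)

    # Step 3: sample the rare class with replacement until it is as
    # large as the common class.
    resampled_rare = rng.choices(rare, k=len(common))

    # Step 4: join the resampled rare examples with the untouched common ones.
    return common + resampled_rare


# Toy data (assumed): 8 "ok" rows and 2 "terminate" rows.
data = [(i, "ok") for i in range(8)] + [(i, "terminate") for i in range(8, 10)]
balanced = oversample_rare(data, label_of=lambda row: row[1])
counts = {lab: sum(1 for r in balanced if r[1] == lab) for lab in ("ok", "terminate")}
print(counts)  # {'ok': 8, 'terminate': 8}
```

Sampling with replacement means the rare rows are duplicated rather than synthesized, so the balanced set contains no new information, only a different class weighting.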

Thanks in advance for any help,

\E

earmijo · December 2009

Almost immediately after posting my question I found a way to do it. It is not very elegant, and I doubt it will scale well to a huge dataset, but it works fine for me. It is an example of oversampling the small class. I'll share it with you:

<operator name="Root" class="Process" expanded="yes">
    <!-- Generate the demo data set (label values: ok / terminate) -->
    <operator name="ChurnReductionExampleSetGenerator" class="ChurnReductionExampleSetGenerator">
    </operator>
    <!-- Duplicate the ExampleSet so each class can be filtered from its own copy -->
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <!-- First copy: keep only the rare class (label = terminate) -->
    <operator name="IOSelector" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <operator name="ExampleFilter" class="ExampleFilter">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="label = terminate"/>
    </operator>
    <!-- Oversample the rare class; the ratio 13.28 is specific to this data set -->
    <operator name="Bootstrapping" class="Bootstrapping">
        <parameter key="sample_ratio" value="13.28"/>
    </operator>
    <!-- Second copy: keep only the common class (label = ok) -->
    <operator name="IOSelector (2)" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="select_which" value="2"/>
    </operator>
    <operator name="ExampleFilter (2)" class="ExampleFilter">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="label = ok"/>
    </operator>
    <!-- Join the resampled rare examples with the common examples -->
    <operator name="ExampleSetMerge" class="ExampleSetMerge">
    </operator>
</operator>
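
The `sample_ratio` in the Bootstrapping operator has to be set by hand for each dataset. Assuming the goal is for the resampled rare class to match the common class in size, the ratio would be the common count divided by the rare count. A minimal Python sketch of that arithmetic follows; the helper name `bootstrap_ratio` and the class counts are hypothetical and do not come from the process above.

```python
from collections import Counter


def bootstrap_ratio(labels):
    """Return common_count / rare_count for a two-class label column.

    Hypothetical helper; assumes the ratio is meant to bring the rare
    class up to the size of the common class.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two label values")
    # most_common() orders the classes from largest to smallest.
    (_, n_common), (_, n_rare) = counts.most_common(2)
    return n_common / n_rare


# Hypothetical counts, for illustration only.
labels = ["ok"] * 900 + ["terminate"] * 100
ratio = bootstrap_ratio(labels)
print(ratio)  # 9.0
```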

haddock · December 2009

Actually, this issue has already been covered several times, once even by me.

http://rapid-i.com/rapidforum/index.php/topic,1246.msg4786.html#msg4786
