"Imbalanced data: label weights or over/undersample"

Bulkington · February 2011

Hi all,

I have to work with an imbalanced dataset for classification, so I want to try oversampling the minority class or undersampling the majority class. According to this earlier post, RM cannot generate a fixed label distribution through sampling, but the same effect can be simulated with label weights:

http://rapid-i.com/rapidforum/index.php/topic,106.0.html

Now my questions:

1. Where can I find the operator EqualLabelWeighting mentioned in the post? Maybe I'm acting dumb but I just can't find it. btw: I'm using RM 5.1.002
2. The post mentioned above is more than two years old. Is there still no way to actually oversample the minority class or undersample the majority class?

I appreciate your help!

Thanks.

land · February 2011

Hi,
The operator that post referred to is now called Generate Weight (Stratification). But you can do more fine-grained sampling with a process like this, which loops over the label values, bootstraps a fixed number of examples from each class, and appends the results:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.003" expanded="true" name="Process">
    <process expanded="true" height="491" width="788">
      <operator activated="true" class="retrieve" compatibility="5.1.003" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="loop_values" compatibility="5.1.003" expanded="true" height="76" name="Loop Values" width="90" x="179" y="30">
        <parameter key="attribute" value="class"/>
        <parameter key="iteration_macro" value="class"/>
        <process expanded="true" height="509" width="806">
          <operator activated="true" class="filter_examples" compatibility="5.1.003" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="class=%{class}"/>
          </operator>
          <operator activated="true" class="sample_bootstrapping" compatibility="5.1.003" expanded="true" height="76" name="Sample (Bootstrapping)" width="90" x="246" y="30">
            <parameter key="sample" value="absolute"/>
            <parameter key="sample_size" value="200"/>
          </operator>
          <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Sample (Bootstrapping)" to_port="example set input"/>
          <connect from_op="Sample (Bootstrapping)" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="5.1.003" expanded="true" height="76" name="Append" width="90" x="313" y="30"/>
      <connect from_op="Retrieve" from_port="output" to_op="Loop Values" to_port="example set"/>
      <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
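
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of what the process above does: split the examples by label value, draw a fixed number from each class with replacement (bootstrapping), and append the per-class samples. It is an illustration rather than RapidMiner's implementation; the function name `resample_per_class`, its parameters, and the toy data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def resample_per_class(rows, label_of, n_per_class, seed=0):
    """Draw n_per_class rows from every class, sampling with replacement.

    Hypothetical helper mirroring Loop Values -> Filter Examples ->
    Sample (Bootstrapping) -> Append from the process above.
    """
    rng = random.Random(seed)

    # "Loop Values" + "Filter Examples": group the rows by label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    # "Sample (Bootstrapping)" with an absolute size, then "Append".
    balanced = []
    for members in by_label.values():
        balanced.extend(rng.choices(members, k=n_per_class))
    return balanced


# Toy imbalanced data (made up for illustration): 90 "a" rows, 10 "b" rows.
rows = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]
balanced = resample_per_class(rows, label_of=lambda r: r[0], n_per_class=200)
counts = Counter(label for label, _ in balanced)
```

Because the draw is with replacement, a class with fewer than `n_per_class` rows is oversampled (rows repeat), while a larger class is effectively undersampled. Giving each class its own sample size would allow any target label distribution.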

Greetings,
Sebastian
