How to balance examples?

Axel Member Posts: 19 Maven
edited November 2018 in Help
Hello everybody,

I have a classification problem with two classes, and one of those classes is heavily overrepresented in my data set.
I would like to train my learner on roughly equal numbers of examples from the two classes, so I wonder:
is there a way to select only a subset of the examples from the overrepresented class?
I looked at the Sampling operator, but that samples the same fraction from all classes.

Many thanks,

axel

Answers

  • haddock Member Posts: 849 Maven
    Hi there Axel.

    There is probably a much smarter way of doing this, but I'm too wrecked to think of it ;D, so you'll have to make do with the following:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="Count All Examples" class="DataMacroDefinition">
            <parameter key="macro" value="Total"/>
        </operator>
        <operator name="Change label to more tractable attribute" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
        </operator>
        <operator name="Sort examples" class="Sorting">
            <parameter key="attribute_name" value="label"/>
        </operator>
        <operator name="Take a copy for later" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="Remove positives" class="ExampleFilter">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label=positive"/>
            <parameter key="invert_filter" value="true"/>
        </operator>
        <operator name="Count Negatives" class="DataMacroDefinition">
            <parameter key="macro" value="Neg"/>
        </operator>
        <operator name="Calculate Positives" class="MacroConstruction">
            <list key="function_descriptions">
              <parameter key="Pos" value="%{Total}-%{Neg}"/>
            </list>
            <parameter key="use_standard_constants" value="false"/>
        </operator>
        <operator name="Compute First &amp; Last deletions" class="MacroConstruction">
            <list key="function_descriptions">
              <parameter key="First" value="if(%{Neg}&gt;=%{Pos},1,2*%{Neg}+1)"/>
              <parameter key="Last" value="if(%{Neg}&gt;=%{Pos},%{Total}-2*%{Pos},%{Total})"/>
            </list>
        </operator>
        <operator name="Restore copy, trash other" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
            <parameter key="delete_others" value="true"/>
        </operator>
        <operator name="Filter to equalise" class="ExampleRangeFilter">
            <parameter key="first_example" value="%{First}"/>
            <parameter key="last_example" value="%{Last}"/>
            <parameter key="invert_filter" value="true"/>
        </operator>
        <operator name="Restore label" class="ChangeAttributeRole">
            <parameter key="name" value="label"/>
        </operator>
    </operator>
    You'd better test it as well, as I haven't!

    Have fun...
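
For readers who find the macro arithmetic hard to follow, here is a minimal Python sketch of what the `First` and `Last` expressions in the process above appear to compute. It assumes the examples are sorted so that all negatives come before all positives, and that `ExampleRangeFilter` with `invert_filter` set drops rows `First` to `Last` (1-based, inclusive). The function names `deletion_range` and `balance_sorted` are hypothetical, and the early return for already balanced data is an addition that is not in the original process.

```python
def deletion_range(total, neg):
    """Return the 1-based, inclusive (first, last) rows to delete.

    Mirrors the First/Last macros: rows are assumed to be sorted with
    all negatives first, then all positives.
    Returns None when the classes are already balanced (an added guard,
    not part of the original process).
    """
    pos = total - neg
    if neg == pos:
        return None
    if neg >= pos:
        # Too many negatives: drop the first (neg - pos) rows.
        return 1, total - 2 * pos
    # Too many positives: drop the last (pos - neg) rows.
    return 2 * neg + 1, total


def balance_sorted(labels):
    """Drop a contiguous block of rows so both classes have equal counts."""
    total = len(labels)
    neg = sum(1 for label in labels if label != "positive")
    rng = deletion_range(total, neg)
    if rng is None:
        return list(labels)
    first, last = rng
    # Keep everything before row `first` and everything after row `last`.
    return list(labels[: first - 1]) + list(labels[last:])


if __name__ == "__main__":
    data = ["negative"] * 7 + ["positive"] * 3
    print(deletion_range(len(data), 7))
    print(balance_sorted(data))
```

Note that this removes a contiguous block of the sorted examples rather than a random subset, so if the original order of the data carries meaning, shuffling before sorting by label may be worth considering.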



  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    If your learner supports weighted examples, you could use the equal label weighting operator. It assigns example weights so that every label receives the same total weight.
    But I guess we should add some sort of balancing operator in the future...

    Greetings,
      Sebastian
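
The weighting idea can be illustrated with a short Python sketch. This shows only the general principle (each class ends up with the same summed weight); it is not the operator's actual implementation, and the function name `equal_label_weights` and the `total_weight` parameter are assumptions made for the example.

```python
from collections import Counter


def equal_label_weights(labels, total_weight=1.0):
    """Return one weight per example so every class sums to the same weight.

    Each class receives total_weight / number_of_classes, split evenly
    among the examples of that class.
    """
    if not labels:
        return []
    counts = Counter(labels)
    per_class = total_weight / len(counts)
    return [per_class / counts[label] for label in labels]


if __name__ == "__main__":
    labels = ["negative", "negative", "negative", "positive"]
    for label, weight in zip(labels, equal_label_weights(labels)):
        print(label, weight)
```

With weights like these no examples are discarded, which is the main difference from the filtering approach: the rarer class simply counts for more per example.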
  • Axel Member Posts: 19 Maven
    Wow Haddock,

    that's not very pretty, but it works!

    Many thanks,
            Axel

    P.S. But I think RapidMiner really needs a dedicated operator for this...
  • alejandro_tobon Member Posts: 16 Maven
    Hi, I cannot get this code to run in RapidMiner 5. I need help.

    Thanks
    Alejandro
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Well, I think you either have to install RM 4.x, load the process there, save it and then import the resulting file, or you could take another valid RapidMiner 4.x process file, paste the code into it and import that with RapidMiner 5.0.

    Or you simply build the process manually from scratch...

    Greetings,
      Sebastian