[SOLVED] Balancing data - pull with undelete possible?

frasfras Member Posts: 93 Contributor II
edited November 2018 in Help
Hi,

Say we have data consisting of 1000 examples of class 0 and 50 examples of class 1.
Using the "Sample" operator I can down-sample class 0 to e.g. 800.
But I would also like to resample class 1 up to e.g. 100, so I have to blow it up somehow by drawing examples more than once,
which is also called "pull with undelete" (sampling with replacement).
Is this possible?
Thx, Frank

Answers

  • earmijoearmijo Member Posts: 271 Unicorn
    I would use bootstrapping. Take a look at the following process and let me know if it helps you.

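    I'm using the Golf dataset that comes with RapidMiner. It has two classes: yes (9 obs) and no (5 obs). The process splits the classes with Filter Examples, down-samples yes to 8 with Sample, up-samples no to 8 with Sample (Bootstrapping), which draws with replacement, and then joins the two parts with Append. I end up with a new dataset which has yes (8 obs) and no (8 obs). That's exactly what you want.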
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
       <process expanded="true" height="341" width="815">
         <operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="165">
           <parameter key="repository_entry" value="//Samples/data/Golf"/>
         </operator>
         <operator activated="true" class="multiply" compatibility="5.2.006" expanded="true" height="94" name="Multiply" width="90" x="256" y="137"/>
         <operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples" width="90" x="447" y="30">
           <parameter key="condition_class" value="attribute_value_filter"/>
           <parameter key="parameter_string" value="Play = yes"/>
         </operator>
         <operator activated="true" class="sample" compatibility="5.2.006" expanded="true" height="76" name="Sample" width="90" x="581" y="30">
           <parameter key="sample_size" value="8"/>
           <list key="sample_size_per_class"/>
           <list key="sample_ratio_per_class"/>
           <list key="sample_probability_per_class"/>
         </operator>
         <operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples (2)" width="90" x="447" y="210">
           <parameter key="condition_class" value="attribute_value_filter"/>
           <parameter key="parameter_string" value="Play = no"/>
         </operator>
         <operator activated="true" class="sample_bootstrapping" compatibility="5.2.006" expanded="true" height="76" name="Sample (Bootstrapping)" width="90" x="593" y="209">
           <parameter key="sample" value="absolute"/>
           <parameter key="sample_size" value="8"/>
         </operator>
         <operator activated="true" class="append" compatibility="5.2.006" expanded="true" height="94" name="Append" width="90" x="773" y="109"/>
         <connect from_op="Retrieve" from_port="output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
         <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
         <connect from_op="Filter Examples" from_port="example set output" to_op="Sample" to_port="example set input"/>
         <connect from_op="Sample" from_port="example set output" to_op="Append" to_port="example set 1"/>
         <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Sample (Bootstrapping)" to_port="example set input"/>
         <connect from_op="Sample (Bootstrapping)" from_port="example set output" to_op="Append" to_port="example set 2"/>
         <connect from_op="Append" from_port="merged set" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
  • frasfras Member Posts: 93 Contributor II
    Yes, that's it. Separating via "Filter Examples" and finally "Append" was not on my list...
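
For readers who want to see the same idea outside RapidMiner, here is a minimal Python sketch of the split / resample / append pattern from the answer above, using the class counts from the question (1000 of class 0, 50 of class 1, resampled to 800 and 100). The function name `rebalance`, its parameters and the toy data are made up for illustration and are not part of RapidMiner. Classes that are large enough are drawn without replacement (like Sample); smaller ones are drawn with replacement (like Sample (Bootstrapping)).

```python
import random
from collections import Counter


def rebalance(rows, label_of, targets, seed=None):
    """Resample each class to the size given in `targets` (hypothetical helper).

    Classes with at least as many rows as requested are down-sampled
    without replacement; smaller classes are up-sampled with replacement
    (bootstrapping), so some rows appear more than once.
    """
    rng = random.Random(seed)

    # Split the rows by class (the "Filter Examples" step).
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    # Resample each class and join the parts (the "Append" step).
    balanced = []
    for label, size in targets.items():
        group = by_class[label]
        if size <= len(group):
            balanced.extend(rng.sample(group, size))     # without replacement
        else:
            balanced.extend(rng.choices(group, k=size))  # with replacement
    return balanced


# Toy data matching the question: 1000 rows of class 0, 50 rows of class 1.
data = [(i, 0) for i in range(1000)] + [(i, 1) for i in range(1000, 1050)]
result = rebalance(data, label_of=lambda r: r[1], targets={0: 800, 1: 100}, seed=42)
counts = Counter(label for _, label in result)
print(counts[0], counts[1])  # prints: 800 100
```

Note that up-sampling with replacement only repeats existing rows; it adds no new information, so it is usually best applied to the training data only.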