"Balanced sampling"

frankiefrankie Member Posts: 26  Maven
edited May 23 in Help
A question. When I want to create a balanced training dataset using the "Sample" operator I get an error:
"Given index '4000' does not fit the mapped ExampleSet".

My data has two labels, "-1" and "1" and I try to take a 2000 sample from each of them.
The error comes during XValidation (not for example during preprocessing)

Can somebody help out, or guide me how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow workds perfectly without the sampling operator.


-----------------------------------------------------

Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:

  com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
  com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
  com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
  com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
  com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
  com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
  com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
  com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
  com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
  com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
  com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
  com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
  com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.Process.run(Process.java:899)
  com.rapidminer.Process.run(Process.java:795)
  com.rapidminer.Process.run(Process.java:790)
  com.rapidminer.Process.run(Process.java:780)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
Tagged:

Answers

  • awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458   Unicorn
    Hello Frankie,

    In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.

    regards

    Andrew
  • frankiefrankie Member Posts: 26  Maven
    What happens if the two groups contain 4000 and 3000 samples, respectively?
    How will I then sample 2000 from each?


    Thanks,
    Frankie
  • TKTK Member Posts: 14 Contributor II
    You can define absolute Values for each class within the sample operator
  • roman_bednarikroman_bednarik Member Posts: 3 Contributor I
    Hi, picking up on an old thread: how about if the size of the set is not known, e..g we don't know the absolute number of positive and the absolute number of negative examples? Is there a way to select a balanced subset?
  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    Hi, you can filter your data by label and then apply sampling operators on the filtered data sets and append them. I think http://rapid-i.com/rapidforum/index.php/topic,5706.0.html gives an example for that.

    Best, Marius
  • abbasi_samiraabbasi_samira Member Posts: 9 Contributor I

    Hello
    How can I equal the number of classes (50 50) for two feature?












    The class contains two values:

    true:94

    false:569











  • abbasi_samiraabbasi_samira Member Posts: 9 Contributor I

     












    How can I balance the maximum amount of class attribute?











  • sgenzersgenzer 12Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,446  Community Manager
    Hello @abbasi_samira - have you tried the “Balance Data” operator?

    Scott

  • abbasi_samiraabbasi_samira Member Posts: 9 Contributor I

    hi












    I want to over sample balance data

    How can I oversampling balance this?
    Please explain the oversampling balance steps

    please help me

    thanks











  • abbasi_samiraabbasi_samira Member Posts: 9 Contributor I

    hi












    Yes I used the Balance Sample

    read excel--->sample--->balance---->relative or absoulat

    but This is the method undersampling balance

    I need to oversampling balance











  • sgenzersgenzer 12Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,446  Community Manager

    I would recommend going through the Sample operator tutorial (found inside the Sample help pane).

     

    Screen Shot 2017-12-16 at 12.14.56 PM.png

     

    The Mannheim extension also has a Balance data operator.


    Scott

     

    abbasi_samira
  • kypexinkypexin Moderator, RapidMiner Certified Analyst, Member Posts: 280   Unicorn

    Hi @abbasi_samira

     

    There's an extension called 'Operator toolbox' which now contains 'SMOTE upsampling' operator which you could use for oversampling the minority class.

    sgenzermschmitzabbasi_samira
Sign In or Register to comment.