"Balanced sampling"

frankie Member Posts: 26 Contributor II
edited May 2019 in Help
A question: when I try to create a balanced training dataset using the "Sample" operator, I get an error:
"Given index '4000' does not fit the mapped ExampleSet".

My data has two labels, "-1" and "1", and I try to take a sample of 2000 from each of them.
The error occurs during XValidation (not, for example, during preprocessing).

Can somebody help out, or guide me on how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow works perfectly without the Sample operator.


-----------------------------------------------------

Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:

  com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
  com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
  com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
  com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
  com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
  com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
  com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
  com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
  com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
  com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
  com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
  com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
  com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.Process.run(Process.java:899)
  com.rapidminer.Process.run(Process.java:795)
  com.rapidminer.Process.run(Process.java:790)
  com.rapidminer.Process.run(Process.java:780)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)

Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello Frankie,

    In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.

    regards

    Andrew
  • frankie Member Posts: 26 Contributor II
    What happens if the two groups contain 4000 and 3000 samples, respectively?
    How will I then sample 2000 from each?


    Thanks,
    Frankie
  • TK Member Posts: 14 Contributor II
    You can define absolute values for each class within the Sample operator.
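    A minimal sketch of per-class absolute sampling in plain Python, for illustration only (it is not RapidMiner's implementation): examples are grouped by label and a fixed number is drawn from each group without replacement. The function name `sample_per_class` and the `(label, id)` tuple representation are hypothetical. With 4000 examples labelled "-1" and 3000 labelled "1", asking for 2000 of each yields a balanced set of 4000.

```python
import random
from collections import defaultdict


def sample_per_class(examples, label_of, sizes, seed=None):
    """Draw a fixed number of examples from each class, without replacement.

    examples: list of examples
    label_of: function returning the label of one example
    sizes:    dict mapping label -> absolute number of examples to draw
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)

    result = []
    for label, size in sizes.items():
        group = by_label.get(label, [])
        if size > len(group):
            raise ValueError(
                f"class {label!r} has {len(group)} examples, {size} requested")
        result.extend(rng.sample(group, size))
    rng.shuffle(result)  # mix the classes so they are not stored in blocks
    return result


# Usage: 4000 examples labelled "-1", 3000 labelled "1"; take 2000 of each.
data = [("-1", i) for i in range(4000)] + [("1", i) for i in range(3000)]
balanced = sample_per_class(data, lambda ex: ex[0], {"-1": 2000, "1": 2000}, seed=42)
```

    Requesting more examples than a class contains raises a `ValueError` in this sketch; sampling with replacement would be the alternative, at the cost of duplicated examples.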
  • roman_bednarik Member Posts: 3 Contributor I
    Hi, picking up on an old thread: what if the size of the set is not known, e.g. we don't know the absolute numbers of positive and negative examples? Is there a way to select a balanced subset?
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, you can filter your data by label and then apply sampling operators on the filtered data sets and append them. I think http://rapid-i.com/rapidforum/index.php/topic,5706.0.html gives an example for that.

    Best, Marius
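    A sketch of the filter, sample, and append approach for the case where the class sizes are not known in advance, assuming (hypothetically) that examples are Python tuples whose first element is the label: the data is split by label, the size of the smallest class is determined at run time, every class is randomly sampled down to that size, and the pieces are appended. This is plain undersampling, so examples from the larger classes are discarded. The name `undersample_to_minority` is illustrative only.

```python
import random
from collections import defaultdict


def undersample_to_minority(examples, label_of, seed=None):
    """Balance classes by randomly reducing every class to the size of the smallest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:  # "filter" step: one group per label
        by_label[label_of(ex)].append(ex)

    n = min(len(group) for group in by_label.values())  # minority size, found at run time

    result = []
    for group in by_label.values():  # "sample" step, then "append"
        result.extend(rng.sample(group, n))
    rng.shuffle(result)
    return result


# Usage, e.g. with 94 examples labelled True and 569 labelled False.
data = [(True, i) for i in range(94)] + [(False, i) for i in range(569)]
balanced = undersample_to_minority(data, lambda ex: ex[0], seed=7)
```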
  • abbasi_samira Member Posts: 9 Contributor I

    Hello
    How can I get an equal number of examples (50/50) for the two classes?

    The class contains two values:

    true:94

    false:569
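    For the counts above (94 `true`, 569 `false`), the two basic ways to reach a 50/50 split work out as follows: undersampling shrinks `false` to the minority size, while oversampling grows `true` to the majority size by adding duplicated or synthetic examples.

```latex
\begin{aligned}
\text{undersampling:}\quad & n_{\text{true}} = n_{\text{false}} = \min(94,\, 569) = 94
  &&\Rightarrow\ 2 \cdot 94 = 188 \text{ examples} \\
\text{oversampling:}\quad & n_{\text{true}} = n_{\text{false}} = \max(94,\, 569) = 569
  &&\Rightarrow\ 2 \cdot 569 = 1138 \text{ examples } (569 - 94 = 475 \text{ added } \texttt{true})
\end{aligned}
```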

  • abbasi_samira Member Posts: 9 Contributor I
    How can I balance the class attribute up to the size of its largest class?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hello @abbasi_samira - have you tried the “Balance Data” operator?

    Scott

  • abbasi_samira Member Posts: 9 Contributor I

    hi

    I want to balance the data by oversampling.

    How can I do that?
    Please explain the steps for oversampling.

    Please help me.

    thanks
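    A minimal sketch of random oversampling, one common way to do this (not necessarily what any RapidMiner operator does internally): every original example is kept, and each smaller class receives randomly chosen duplicates, drawn with replacement, until it matches the largest class. The function name `oversample_to_majority` and the tuple representation are hypothetical. For 94 `true` and 569 `false` examples this adds 475 copies of `true` examples, giving 1138 in total.

```python
import random
from collections import defaultdict


def oversample_to_majority(examples, label_of, seed=None):
    """Balance classes by adding random duplicates of the smaller classes.

    Every original example is kept; each smaller class receives extra copies,
    drawn with replacement, until it matches the size of the largest class.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)

    n = max(len(group) for group in by_label.values())  # majority size

    result = []
    for group in by_label.values():
        result.extend(group)                                  # all originals
        result.extend(rng.choices(group, k=n - len(group)))  # random duplicates
    rng.shuffle(result)
    return result


data = [(True, i) for i in range(94)] + [(False, i) for i in range(569)]
balanced = oversample_to_majority(data, lambda ex: ex[0], seed=3)
```

    Because the added examples are exact copies, it is usually safer to oversample only the training data inside each validation fold; otherwise copies of the same example can end up in both the training and the test split and inflate the performance estimate.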

  • abbasi_samira Member Posts: 9 Contributor I

    hi

    Yes, I used the balanced sample:

    Read Excel ---> Sample ---> balance data ---> relative or absolute

    but this method balances by undersampling.

    I need to balance by oversampling.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    I would recommend going through the Sample operator tutorial (found inside the Sample help pane).

    The Mannheim extension also has a Balance data operator.


    Scott
  • kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @abbasi_samira

    There's an extension called 'Operator toolbox', which now contains a 'SMOTE upsampling' operator that you could use to oversample the minority class.
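    For intuition, a simplified sketch of the SMOTE idea in plain Python (not the Operator Toolbox implementation): instead of copying minority examples, each synthetic example is placed at a random point on the line segment between a randomly chosen minority example and one of its k nearest minority neighbours. It assumes purely numeric features given as lists of floats; the name `smote` and its parameters are illustrative, and `math.dist` requires Python 3.8 or later.

```python
import math
import random


def smote(minority, n_new, k=5, seed=None):
    """Create n_new synthetic examples from minority-class feature vectors.

    minority: list of numeric feature vectors (lists of floats), minority class only.
    Each synthetic example lies on the segment between a randomly chosen
    minority example and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority examples (Euclidean)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Usage: four minority points on the line y = x; add ten synthetic ones.
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new_points = smote(minority, n_new=10, k=2, seed=1)
```

    To balance 94 `true` against 569 `false`, one would generate 569 - 94 = 475 synthetic `true` examples. Since neighbours are found by Euclidean distance, features on very different scales should usually be normalized first; nominal attributes are not handled by this sketch.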