"Balanced sampling"
I have a question. When I try to create a balanced training dataset using the "Sample" operator, I get an error:
"Given index '4000' does not fit the mapped ExampleSet".
My data has two labels, "-1" and "1", and I am trying to take a sample of 2000 examples from each of them.
The error occurs during XValidation (not, for example, during preprocessing).
Can somebody help out, or guide me on how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow works perfectly without the sampling operator.
-----------------------------------------------------
Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:
com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.Process.run(Process.java:899)
com.rapidminer.Process.run(Process.java:795)
com.rapidminer.Process.run(Process.java:790)
com.rapidminer.Process.run(Process.java:780)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
Answers
In the Sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half of your examples.
regards
Andrew
How would I then sample 2000 examples from each class?
Thanks,
Frankie
Best, Marius
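To make the goal of "2000 from each label" concrete, here is a minimal Python sketch of per-class sampling without replacement. It runs outside RapidMiner and is not the Sample operator's implementation; the function name `sample_per_class`, the toy data, and the class sizes (5000 and 3000) are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def sample_per_class(rows, label_of, n_per_class, seed=0):
    """Draw n_per_class rows without replacement from every class.

    Hypothetical helper for illustration; not a RapidMiner API.
    """
    rng = random.Random(seed)

    # Group the rows by their label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sample = []
    for label, members in by_label.items():
        if len(members) < n_per_class:
            raise ValueError(
                f"class {label!r} has only {len(members)} rows, "
                f"cannot draw {n_per_class}"
            )
        sample.extend(rng.sample(members, n_per_class))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(sample)
    return sample


# Toy data (assumed): 5000 examples labelled "-1" and 3000 labelled "1".
rows = [(i, "-1") for i in range(5000)] + [(i, "1") for i in range(5000, 8000)]
balanced = sample_per_class(rows, label_of=lambda r: r[1], n_per_class=2000)
```

The result holds 4000 rows in total, 2000 per label, and the helper raises an error if a class has fewer rows than requested instead of silently returning an unbalanced sample.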
Hello
How can I equalize the number of examples in each class (50/50) when there are two classes?
Scott
Hi,
Please explain the steps for balancing classes by oversampling.
hi
I would recommend going through the Sample operator tutorial (found inside the Sample help pane).
The Mannheim extension also has a Balance data operator.
Scott
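As a rough illustration of what balancing by oversampling involves, the Python sketch below duplicates randomly chosen minority examples until every class is as large as the biggest one. It is a generic sketch, not the behaviour of the Sample operator or of the Mannheim extension's balancing operator; the name `oversample_to_balance` and the 90/10 toy data are assumptions.

```python
import random
from collections import Counter


def oversample_to_balance(rows, label_of, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest.

    Hypothetical helper for illustration; not a RapidMiner API.
    """
    rng = random.Random(seed)

    # Step 1: group the rows by label.
    by_label = {}
    for row in rows:
        by_label.setdefault(label_of(row), []).append(row)

    # Step 2: the largest class sets the target size.
    target = max(len(members) for members in by_label.values())

    # Step 3: keep every original row, then add random copies (with replacement)
    # until each class reaches the target size.
    balanced = []
    for members in by_label.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))

    rng.shuffle(balanced)
    return balanced


# Toy data (assumed): 90 examples of class "-1" and 10 of class "1".
rows = [(i, "-1") for i in range(90)] + [(i, "1") for i in range(90, 100)]
balanced = oversample_to_balance(rows, label_of=lambda r: r[1])
class_counts = Counter(label for _, label in balanced)
```

Because the added rows are exact copies, oversampling is usually applied to the training data only; otherwise duplicates of the same example can land in both the training and the test folds of a cross-validation.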
Hi @abbasi_samira
There's an extension called 'Operator toolbox', which now contains a 'SMOTE upsampling' operator that you could use to oversample the minority class.
Vladimir
http://whatthefraud.wtf
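For readers unfamiliar with SMOTE, the idea is to create synthetic minority examples by interpolating between a minority example and one of its nearest minority neighbours, rather than copying rows. The Python sketch below shows that idea only; it is not the code of the 'SMOTE upsampling' operator, and the function name `smote_like`, the parameter `k`, and the four toy points are assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Simplified, hypothetical sketch of the SMOTE idea for numeric features.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")

    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority point as the starting point.
        base = rng.choice(minority)

        # Find its k nearest minority neighbours (Euclidean distance).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]

        # Move a random fraction of the way towards one of those neighbours.
        neighbour = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (assumed): four points with two numeric attributes.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=6)
```

Each synthetic point lies on the line segment between two existing minority points, so the new examples stay inside the region the minority class already occupies.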