"Balanced sampling"
A question. When I want to create a balanced training dataset using the "Sample" operator I get an error:
"Given index '4000' does not fit the mapped ExampleSet".
My data has two labels, "-1" and "1", and I am trying to take a sample of 2000 from each of them.
The error occurs during XValidation (not, for example, during preprocessing).
Can somebody help out, or guide me on how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow works perfectly without the sampling operator.
-----------------------------------------------------
Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:
com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.Process.run(Process.java:899)
com.rapidminer.Process.run(Process.java:795)
com.rapidminer.Process.run(Process.java:790)
com.rapidminer.Process.run(Process.java:780)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
Answers
In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.
regards
Andrew
How will I then sample 2000 from each?
Thanks,
Frankie
Best, Marius
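For readers who want to see what "2000 from each class" amounts to independent of RapidMiner, here is a minimal Python sketch. It is not the Sample operator's implementation; the function name balanced_sample, the toy data, and the class sizes are all made up for illustration. It groups rows by label and draws the same absolute number from each group without replacement, which a single relative ratio of 0.5 over the whole set does not do by itself.

```python
import random
from collections import defaultdict

def balanced_sample(rows, label_of, n_per_class, seed=0):
    """Draw n_per_class rows without replacement from every class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    sample = []
    for label, members in by_class.items():
        if len(members) < n_per_class:
            # Cannot draw more rows than a class contains without replacement.
            raise ValueError(
                f"class {label!r} has only {len(members)} rows, need {n_per_class}"
            )
        sample.extend(rng.sample(members, n_per_class))

    rng.shuffle(sample)  # avoid handing the learner one class block after the other
    return sample

# Toy data with made-up sizes: 7000 rows labelled "-1" and 3000 labelled "1".
data = [(i, "-1") for i in range(7000)] + [(i, "1") for i in range(3000)]
balanced = balanced_sample(data, label_of=lambda r: r[1], n_per_class=2000)
```

The result holds 4000 rows, 2000 per label. If a class has fewer rows than requested, the sketch raises an error rather than silently returning an unbalanced set.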
Hello
How can I equalize the number of examples in two classes (50/50)?
Scott
hi
Please explain the steps for balancing the data by oversampling.
hi
I would recommend going through the Sample operator tutorial (found inside the Sample help pane).
The Mannheim extension also has a Balance data operator.
Scott
Hi @abbasi_samira
There's an extension called 'Operator Toolbox' that now contains a 'SMOTE Upsampling' operator, which you could use to oversample the minority class.
Vladimir
http://whatthefraud.wtf
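As a rough illustration of the idea behind SMOTE-style oversampling, here is a simplified Python sketch. It is not the Operator Toolbox implementation; the function name smote_like_oversample, the parameter k, and the toy points are assumptions made for this example. Each synthetic point is placed at a random position on the line between a minority example and one of its k nearest minority neighbours.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolation (simplified sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base by squared Euclidean distance, excluding base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy example: 4 minority points, and a majority class assumed to have 10 rows.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority_size = 10
new_points = smote_like_oversample(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_points
```

After this, the minority class has as many rows as the assumed majority class. Oversampling of this kind should normally be applied only to the training portion inside a validation, so that synthetic points do not leak into the test folds.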