The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
"Balanced sampling"
A question. When I want to create a balanced training dataset using the "Sample" operator I get an error:
"Given index '4000' does not fit the mapped ExampleSet".
My data has two labels, "-1" and "1" and I try to take a 2000 sample from each of them.
The error comes during XValidation (not for example during preprocessing)
Can somebody help out, or guide me how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow workds perfectly without the sampling operator.
-----------------------------------------------------
Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:
com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.Process.run(Process.java:899)
com.rapidminer.Process.run(Process.java:795)
com.rapidminer.Process.run(Process.java:790)
com.rapidminer.Process.run(Process.java:780)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
"Given index '4000' does not fit the mapped ExampleSet".
My data has two labels, "-1" and "1" and I try to take a 2000 sample from each of them.
The error comes during XValidation (not for example during preprocessing)
Can somebody help out, or guide me how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow workds perfectly without the sampling operator.
-----------------------------------------------------
Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:
com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
com.rapidminer.operator.Operator.execute(Operator.java:771)
com.rapidminer.Process.run(Process.java:899)
com.rapidminer.Process.run(Process.java:795)
com.rapidminer.Process.run(Process.java:790)
com.rapidminer.Process.run(Process.java:780)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
Tagged:
0
Contributor II
Answers
In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.
regards
Andrew
How will I then sample 2000 from each?
Thanks,
Frankie
Best, Marius
Hello
How can I equal the number of classes (50 50) for two feature?
Scott
hi
Please explain the oversampling balance steps
hi
I would recommend going through the Sample operator tutorial (found inside the Sample help pane).
The Mannheim extension also has a Balance data operator.
Scott
Hi @abbasi_samira
There's an extension called 'Operator toolbox' which now contains 'SMOTE upsampling' operator which you could use for oversampling the minority class.
Vladimir
http://whatthefraud.wtf