"Balanced sampling"

frankie Member Posts: 26 Contributor II
edited May 2019 in Help
A question. When I want to create a balanced training dataset using the "Sample" operator I get an error:
"Given index '4000' does not fit the mapped ExampleSet".

My data has two labels, "-1" and "1", and I am trying to take a sample of 2000 from each of them.
The error occurs during XValidation (not, for example, during preprocessing).

Can somebody help out, or guide me on how these samples should/can be taken?
I realize it could be hard to pinpoint the source of the problem, but any ideas? The flow works perfectly without the sampling operator.


-----------------------------------------------------

Exception: java.lang.RuntimeException
Message: Given index '4000' does not fit the mapped ExampleSet!
Stack trace:

  com.rapidminer.example.set.MappedExampleSet.getExample(MappedExampleSet.java:137)
  com.rapidminer.example.set.NonSpecialAttributesExampleSet.getExample(NonSpecialAttributesExampleSet.java:78)
  com.rapidminer.example.set.RemappedExampleSet.getExample(RemappedExampleSet.java:128)
  com.rapidminer.example.set.SplittedExampleSet.getExample(SplittedExampleSet.java:200)
  com.rapidminer.example.set.IndexBasedExampleSetReader.hasNext(IndexBasedExampleSetReader.java:62)
  com.rapidminer.example.set.AttributesExampleReader.hasNext(AttributesExampleReader.java:52)
  com.rapidminer.operator.learner.functions.kernel.evosvm.EvoSVMModel.performPrediction(EvoSVMModel.java:178)
  com.rapidminer.operator.learner.PredictionModel.apply(PredictionModel.java:76)
  com.rapidminer.operator.ModelApplier.doWork(ModelApplier.java:100)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.validation.ValidationChain.executeEvaluator(ValidationChain.java:211)
  com.rapidminer.operator.validation.ValidationChain.evaluate(ValidationChain.java:307)
  com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:143)
  com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:133)
  com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:261)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:368)
  com.rapidminer.operator.Operator.execute(Operator.java:771)
  com.rapidminer.Process.run(Process.java:899)
  com.rapidminer.Process.run(Process.java:795)
  com.rapidminer.Process.run(Process.java:790)
  com.rapidminer.Process.run(Process.java:780)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:62)
Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello Frankie,

    In the sample operator, set the sample parameter to relative and then set the sample ratio to 0.5. This will select half your examples.

    regards

    Andrew
  • frankie Member Posts: 26 Contributor II
    What happens if the two groups contain 4000 and 3000 samples, respectively?
    How will I then sample 2000 from each?


    Thanks,
    Frankie
  • TK Member Posts: 14 Contributor II
    You can define absolute values for each class within the Sample operator.
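
To illustrate what drawing an absolute number of examples per class means, here is a minimal sketch in plain Python (standard library only). It is not RapidMiner code: the function name `sample_per_class`, the `(id, label)` row layout, and the 4000/3000 class sizes borrowed from Frankie's question are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_per_class(rows, label_of, counts, seed=0):
    """Draw counts[label] rows without replacement from each class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group the rows by their label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    sample = []
    for label, n in counts.items():
        group = by_label.get(label, [])
        if n > len(group):
            # Asking for more rows than a class contains cannot work
            # without replacement, so fail early with a clear message.
            raise ValueError(
                f"class {label!r} has only {len(group)} rows, cannot draw {n}"
            )
        sample.extend(rng.sample(group, n))

    rng.shuffle(sample)  # avoid leaving the classes in contiguous blocks
    return sample


# Assumed data: 4000 rows labelled "-1" and 3000 rows labelled "1".
rows = [(i, "-1") for i in range(4000)] + [(i, "1") for i in range(3000)]
balanced = sample_per_class(
    rows, label_of=lambda r: r[1], counts={"-1": 2000, "1": 2000}
)
```

The explicit size check is a design choice of this sketch: it turns a request that a class cannot satisfy into an immediate, readable error.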
  • roman_bednarik Member Posts: 3 Contributor I
    Hi, picking up on an old thread: what if the size of the set is not known, e.g. we don't know the absolute number of positive and the absolute number of negative examples? Is there a way to select a balanced subset?
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, you can filter your data by label and then apply sampling operators on the filtered data sets and append them. I think http://rapid-i.com/rapidforum/index.php/topic,5706.0.html gives an example for that.

    Best, Marius
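
The filter, sample, and append steps described above can be sketched in plain Python for the case where the class sizes are not known in advance. This is only an illustration, not a RapidMiner process: `balance_by_undersampling` and the example data are hypothetical, and the sketch assumes that "balanced" means shrinking every class to the size of the smallest one.

```python
import random
from collections import defaultdict


def balance_by_undersampling(rows, label_of, seed=0):
    """Shrink every class to the size of the smallest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Step 1: "filter by label" - split the rows into one list per class.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    if not by_label:
        return []

    # Step 2: the target size is found from the data, not fixed up front.
    target = min(len(group) for group in by_label.values())

    # Step 3: sample each class without replacement and append the parts.
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))

    rng.shuffle(balanced)
    return balanced


# Assumed data: class sizes the caller does not need to know beforehand.
rows = [(i, "positive") for i in range(4000)] + [(i, "negative") for i in range(3000)]
subset = balance_by_undersampling(rows, label_of=lambda r: r[1])
```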
  • abbasi_samira Member Posts: 9 Contributor I
    Hello,
    How can I balance the two classes so that they are equal in size (50/50)?

    The class attribute contains two values:

    true: 94
    false: 569
  • abbasi_samira Member Posts: 9 Contributor I
    How can I balance the class attribute to the size of the largest class?
  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hello @abbasi_samira - have you tried the “Balance Data” operator?

    Scott

  • abbasi_samira Member Posts: 9 Contributor I
    Hi,

    I want to balance the data by oversampling. How can I do that?
    Please explain the steps for balancing by oversampling.

    Please help me.

    Thanks
  • abbasi_samira Member Posts: 9 Contributor I
    Hi,

    Yes, I used the balance option of Sample:

    Read Excel ---> Sample ---> balance ---> relative or absolute

    But this method balances by undersampling.
    I need to balance by oversampling.
  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    I would recommend going through the Sample operator tutorial (found inside the Sample help pane).

    [Screenshot: Screen Shot 2017-12-16 at 12.14.56 PM.png]

    The Mannheim extension also has a Balance data operator.

    Scott

  • kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn
    Hi @abbasi_samira,

    There's an extension called 'Operator Toolbox' which now contains a 'SMOTE Upsampling' operator that you could use for oversampling the minority class.
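
For readers who want to see what oversampling does to the class counts, here is a minimal sketch in plain Python. Note that it is not SMOTE: SMOTE creates synthetic minority examples by interpolating between neighbours, whereas this sketch uses the simpler random oversampling, which duplicates minority rows drawn with replacement. The function name `balance_by_oversampling` and the row layout are hypothetical; the 94/569 class sizes come from the question earlier in the thread.

```python
import random
from collections import Counter, defaultdict


def balance_by_oversampling(rows, label_of, seed=0):
    """Grow every class to the size of the largest class by duplicating rows.

    Random oversampling with replacement; hypothetical helper, not SMOTE.
    """
    rng = random.Random(seed)

    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    if not by_label:
        return []

    target = max(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(group)  # keep every original row
        # Top up with random duplicates until the class reaches the target.
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Class sizes from the thread: true: 94, false: 569.
rows = [(i, "true") for i in range(94)] + [(i, "false") for i in range(569)]
oversampled = balance_by_oversampling(rows, label_of=lambda r: r[1])
class_counts = Counter(label for _, label in oversampled)
```

It is usually safer to oversample only the training portion of the data (for example, inside the training side of a cross-validation) so that duplicated rows do not end up in both the training and the test folds.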
