Minority Classes in Classification

TobiasMalbrecht Moderator, Employee-RapidMiner, Member Posts: 295 RM Product Management
edited September 2019 in Help
New message posted in the sourceforge forum at http://sourceforge.net/forum/forum.php?thread_id=2092429&forum_id=390413:


Hi -

I'm a newbie on the list, so apologies if this has been dealt with before.

Is there a way to oversample minority classes (or undersample majority data) so that a dataset is balanced before using a learner?

Thanks,

- Mark

Answers

  • TobiasMalbrecht Moderator, Employee-RapidMiner, Member Posts: 295 RM Product Management
    Hi Mark,

    a sampling algorithm that generates a fixed label distribution has not yet been implemented in RapidMiner. However, we recently implemented an operator, EqualLabelWeighting, which has roughly the same effect: it generates example weights and sets them so that all label values (classes) are equally weighted in the example set. Of course, the subsequent learner has to be capable of using example weights; otherwise the equal label weighting is ignored.

    Hope that helps!
    Regards,
    Tobias
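
To make the weighting idea above concrete, here is a minimal Python sketch of equal label weighting. It is not RapidMiner's EqualLabelWeighting implementation; the function name `equal_label_weights` and the `total_weight` parameter are assumptions made for illustration. Each class receives the same share of the total weight, split evenly among its examples.

```python
from collections import Counter


def equal_label_weights(labels, total_weight=1.0):
    """Return one weight per example so that every class sums to the same weight.

    Hypothetical helper for illustration; the normalization to `total_weight`
    is an assumption, not taken from RapidMiner.
    """
    counts = Counter(labels)
    # Every class gets an equal share of the total weight.
    per_class = total_weight / len(counts)
    # Within a class, the share is split evenly among its examples.
    return [per_class / counts[y] for y in labels]


labels = ["pos", "neg", "neg", "neg"]
weights = equal_label_weights(labels)

# Sum of weights per class; both classes end up with the same total.
class_totals = {}
for y, w in zip(labels, weights):
    class_totals[y] = class_totals.get(y, 0.0) + w
```

A learner that accepts per-example weights would then treat the single "pos" example as heavily as the three "neg" examples combined. A learner that ignores weights sees no change at all.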
  • keithkeith Member Posts: 157 Maven
    Hi,

    I was searching for an answer on how to oversample an underrepresented portion of the data, and came across this previous question on the same topic.

    I wanted to see whether RM 4.2 has any new features that enable oversampling. If not, is it possible to somehow use the WEKA filter "weka.filters.supervised.instance.Resample", which appears to do it?

    Thanks,
    Keith
  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Keith,

    besides the EqualLabelWeighting operator mentioned above, there are no new sampling operators for over- and undersampling, sorry. Since basically all learning schemes in RM support weighted examples, and methods like threshold variations (in the postprocessing group) and cost-sensitive learning are supported, I actually don't miss those methods. But anyway: I will add them to our todo list. Of course, you are also free to extend RM with this functionality yourself.

    Cheers,
    Ingo
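
For readers who still want the oversampling asked about in this thread, the following is a rough Python sketch of random oversampling with replacement, done outside RapidMiner. It is neither a RapidMiner operator nor the WEKA Resample filter; the function name `random_oversample` and its `seed` parameter are hypothetical. Each minority class is padded with randomly repeated examples until it matches the largest class.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for y, group in by_class.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


rows = [[0.1], [0.2], [0.3], [0.9]]
labels = ["neg", "neg", "neg", "pos"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

Oversampling should be applied to the training portion only; duplicating examples before a train/test split leaks copies of the same row into both sides.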