[Solved] UNBALANCED DATA - Newbie Question
Hello All,
I am new to this forum and I have read through previous posts but I'm not understanding the basic steps needed to set up a process to balance data.
I have a label with the following split (97% = Y, 3% = N). In the past I have used WEKA's "resample" filter, which does what I would like to do in RapidMiner: essentially, it expands the under-represented value to match the over-represented value. My question is, which operator(s) should I use, and with which settings?
Sorry for the rookie question,
Paul
Answers
If you can live with both classes being sampled with replacement, you can use the Sample (Bootstrapping) operator with weighted sampling: assign a higher weight to the minority class so that its examples are more likely to be drawn. Create the weight attribute beforehand with the GenerateAttributes operator, then assign it the role "weight". Please have a look at the attached process for the details and come back here if you have any questions left.
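To illustrate the idea outside of RapidMiner, here is a minimal Python sketch of weighted sampling with replacement. It is not what the Sample (Bootstrapping) operator does internally, only the same principle under a few assumptions: the function name `weighted_bootstrap`, the toy data, and the choice of inverse class frequency as the weight are all made up for this example. Any weighting that favours the minority class would work in a similar way.

```python
import random
from collections import Counter


def weighted_bootstrap(rows, label_key, sample_size, seed=None):
    """Draw rows with replacement, weighting each row by 1 / (size of its class).

    Hypothetical helper for illustration; not part of RapidMiner or WEKA.
    """
    class_counts = Counter(row[label_key] for row in rows)
    # Assumption: inverse class frequency, so every class carries the same
    # total weight and is roughly equally likely to be drawn.
    weights = [1.0 / class_counts[row[label_key]] for row in rows]
    rng = random.Random(seed)
    return rng.choices(rows, weights=weights, k=sample_size)


# Toy data mirroring the 97% / 3% split from the question.
rows = [{"id": i, "label": "Y"} for i in range(970)]
rows += [{"id": i, "label": "N"} for i in range(970, 1000)]

sample = weighted_bootstrap(rows, "label", sample_size=10000, seed=42)
print(Counter(row["label"] for row in sample))
```

With these weights the two classes should come out roughly balanced in the sample. Keep in mind that the minority examples are repeated many times, so the sample contains no new information about that class.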
For alternatives, please have a look at this thread, which has quite a bit of discussion on the topic: http://rapid-i.com/rapidforum/index.php/topic,2190.0.html
All the best,
Marius