"Sample operators"
Hi,
I have been using the "Sample" operator to reduce the size of my input data. I have had limited success in achieving what I want and would appreciate any help. My main issue is that I don't know which specific operator to use, and I cannot find documentation explaining each one.
I have data with 4 labels, represented as follows: A = 60%, B = 20%, C = 15% and D = 5%.
Irrespective of statistical significance, I want to include at least all of the 5% that is D and an equivalent portion of A, B and C.
How do I do this?
Thanks
B
Answers
If I understand you correctly, then you want to sample the data in such a way that you have the same number of examples for all classes. Beware that this changes the class distribution of the data, so a model trained on this subset but applied to data with the original distribution may be biased.
How to:
RapidMiner is like Lego: there is no single operator that achieves this, but a combination of several does.
Here are the bricks:
- Filter Examples
- Multiply
- Join
- Sample
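For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same idea: split the data by label, sample each group down to the size of the rarest class, and put the pieces back together. It is not the RapidMiner process itself. The function name `balance_by_label`, the toy rows, and the fixed seed are made up for illustration, and the mapping of steps to operators in the comments is only a rough analogy.

```python
import random
from collections import Counter, defaultdict


def balance_by_label(rows, label_key, seed=0):
    """Downsample every class to the size of the rarest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Roughly "Filter Examples": split the data into one group per label value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)

    # The rarest class sets the per-class quota, so all of its rows are kept.
    quota = min(len(group) for group in groups.values())

    # Roughly "Sample": draw `quota` rows without replacement from each group,
    # then recombine the pieces into one data set.
    balanced = []
    for label in sorted(groups):
        balanced.extend(rng.sample(groups[label], quota))
    rng.shuffle(balanced)
    return balanced


# Toy data with the 60/20/15/5 split from the question (1000 rows, assumed size).
rows = (
    [{"id": i, "label": "A"} for i in range(600)]
    + [{"id": i, "label": "B"} for i in range(600, 800)]
    + [{"id": i, "label": "C"} for i in range(800, 950)]
    + [{"id": i, "label": "D"} for i in range(950, 1000)]
)

balanced = balance_by_label(rows, "label")
counts = Counter(row["label"] for row in balanced)
print(dict(sorted(counts.items())))  # {'A': 50, 'B': 50, 'C': 50, 'D': 50}
```

With these toy numbers the balanced set keeps all 50 D rows and 50 rows each of A, B and C, that is 200 of the original 1000 rows.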
Hope this was helpful,
steffen
Since this is a common task, I added an example process on myExperiment.
Search for "Change Class Distribution of Your Training Data Set by Filtering and Sampling" in the myExperiment View to download the process.
http://www.myexperiment.org/workflows/1775.html
See http://rapid-i.com/component/option,com_myblog/show,Video-on-RapidMiner-Community-Extension-myExperiment-.html/Itemid,172/lang,en/ for more on the myExperiment extension.
See also the "Same Number of Examples per Class" process here on myExperiment http://www.myexperiment.org/workflows/1315.html for a more sophisticated/generic solution.
Ciao Sebastian