Generate more examples based on our dataset
mansour_ebrahim
Member Posts: 22 Contributor II
Hi all,
I asked the following question earlier but haven't received any good advice so far. I would appreciate it if anyone could help. (I don't want to use upsampling or the SMOTE operator.)
Is there an operator in RapidMiner that increases the number of examples in a dataset? I mean an operator that generates more samples from all groups and so increases the total number of examples in my dataset. I am running a DL model, but I don't have enough samples and cannot collect more, so I have to generate additional samples from all groups.
Also, I do not want to balance the number of samples per class; I just want to increase the size of the dataset, say, threefold.
Regards,
Mansour
Answers
If you just want to repeat your existing examples, use Multiply to copy your example set and Append to append the copies as many times as you want.
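To make the Multiply + Append idea concrete outside RapidMiner, here is a minimal Python sketch. It assumes each example is a plain dict of attribute values, and `replicate` is a hypothetical helper, not a RapidMiner API. Because every row is copied the same number of times, the class proportions stay exactly as they were, which matches the "no balancing" requirement.

```python
from typing import Dict, List

# Hypothetical representation: each example is a dict of attribute -> value.
Example = Dict[str, object]


def replicate(examples: List[Example], times: int) -> List[Example]:
    """Return `times` stacked copies of `examples` (Multiply + Append analogue)."""
    if times < 1:
        raise ValueError("times must be at least 1")
    result: List[Example] = []
    for _ in range(times):
        # Copy each row so editing one copy later does not change the others.
        result.extend(dict(row) for row in examples)
    return result


data = [
    {"x": 1.0, "label": "a"},
    {"x": 2.0, "label": "a"},
    {"x": 5.0, "label": "b"},
]
tripled = replicate(data, 3)
print(len(tripled))  # 9
```

The 2:1 ratio of "a" to "b" in the toy data is still 2:1 (6 to 3) after tripling.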
You can optionally add some noise by randomly changing some attribute values.
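One way to "randomly change some attribute values" is to add small Gaussian noise to numeric attributes while leaving the label alone. The sketch below assumes Gaussian jitter, a hand-picked `scale`, and dict-based examples; all of these are illustrative choices, not something the answer prescribes. In practice the noise level would need to be small relative to each attribute's spread. Keeping the originals and adding two noisy copies gives the threefold size asked for.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical representation: each example is a dict of attribute -> value.
Example = Dict[str, object]


def add_noisy_copies(
    examples: List[Example],
    numeric_attrs: Sequence[str],
    scale: float,
    copies: int,
    seed: int = 0,
) -> List[Example]:
    """Keep the originals and append `copies` jittered versions of every example."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    result = [dict(row) for row in examples]
    for _ in range(copies):
        for row in examples:
            noisy = dict(row)
            for attr in numeric_attrs:
                # Gaussian noise on numeric attributes only; the label is untouched.
                noisy[attr] = float(row[attr]) + rng.gauss(0.0, scale)
            result.append(noisy)
    return result


data = [
    {"x": 1.0, "label": "a"},
    {"x": 2.0, "label": "a"},
    {"x": 5.0, "label": "b"},
]
augmented = add_noisy_copies(data, ["x"], scale=0.1, copies=2)
print(len(augmented))  # 9: the 3 originals plus 2 noisy copies of each
```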
However, this won't really change your model. You usually can't cheat machine learning algorithms by inventing more data than you actually have.
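A small worked check of why plain repetition doesn't help, using ordinary least squares as an illustrative model (it says nothing definitive about every algorithm). Repeating each example k times multiplies every sum in the closed-form fit by k, so the slope's numerator and denominator both scale by k², and the fitted line comes out identical. Exact `Fraction` arithmetic is used so the comparison isn't affected by floating-point rounding.

```python
from fractions import Fraction
from typing import List, Tuple


def fit_line(xs: List[Fraction], ys: List[Fraction]) -> Tuple[Fraction, Fraction]:
    """Closed-form simple linear regression y = slope * x + intercept."""
    n = len(xs)
    sx = sum(xs, Fraction(0))
    sy = sum(ys, Fraction(0))
    sxx = sum((x * x for x in xs), Fraction(0))
    sxy = sum((x * y for x, y in zip(xs, ys)), Fraction(0))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


xs = [Fraction(v) for v in (1, 2, 3, 4)]
ys = [Fraction(v) for v in (2, 3, 5, 4)]

original_fit = fit_line(xs, ys)
tripled_fit = fit_line(xs * 3, ys * 3)  # every example repeated three times
print(original_fit == tripled_fit)  # True
```

For this toy data both fits give slope 4/5 and intercept 3/2.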
Regards,
Balázs
For your purpose, you can use the Sample (Bootstrapping) operator, which does exactly what you want: it increases the number of examples without creating any synthetic examples. But as @BalazsBarany already said, this technique won't have any significant effect on model performance.
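Bootstrapping means drawing examples uniformly at random with replacement until the target size is reached. The following is a rough Python analogue, not the RapidMiner operator itself; `bootstrap_sample` and its `ratio` parameter are hypothetical names chosen for this sketch. Unlike Multiply + Append, a bootstrap sample repeats some rows more than others and may leave some out, so class proportions are preserved only on average, not exactly.

```python
import random
from typing import Dict, List

# Hypothetical representation: each example is a dict of attribute -> value.
Example = Dict[str, object]


def bootstrap_sample(examples: List[Example], ratio: float, seed: int = 0) -> List[Example]:
    """Draw round(len(examples) * ratio) examples uniformly with replacement."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    size = round(len(examples) * ratio)
    # choices() samples with replacement, so some rows repeat and others may be missing.
    return [dict(row) for row in rng.choices(examples, k=size)]


data = [
    {"x": 1.0, "label": "a"},
    {"x": 2.0, "label": "a"},
    {"x": 5.0, "label": "b"},
    {"x": 7.0, "label": "b"},
]
boot = bootstrap_sample(data, ratio=3.0)
print(len(boot))  # 12
```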
Vladimir
http://whatthefraud.wtf