How can I improve the performance of my model on an imbalanced dataset for a classification problem?
Samira_123
Member Posts: 9 Learner II
Hi,
This is my first time using RapidMiner. I have to do a classification task for an assignment.
The dataset is highly imbalanced: only 180 of 12,800 donors donated in the past (class 1), and the remaining donors did not donate (class 0).
After creating and selecting relevant attributes, the class precision values were acceptable, but the class recall for class 1 was very poor: something close to 8%.
However, when I used the 'Sample' operator to balance my dataset, both class recall and class precision were around 60%. I am not sure this is the right thing to do, because I end up with 360 donors instead of 12,800.
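To make the trade-off concrete, here is a minimal Python sketch of random undersampling, which we assume is what balancing with the 'Sample' operator amounts to here (it is not RapidMiner's implementation). The `undersample` helper and the `donors` table are hypothetical stand-ins; only the class counts come from the question. Balancing this way keeps all 180 donors but discards 12,440 of the 12,620 non-donors.

```python
import random


def undersample(rows, label_key="label", seed=42):
    """Hypothetical helper: keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


# Hypothetical stand-in for the donor data: 180 donors (class 1), 12,620 non-donors (class 0).
donors = [{"id": i, "label": 1 if i < 180 else 0} for i in range(12800)]
balanced = undersample(donors)
print(len(balanced))                      # 360
print(sum(r["label"] for r in balanced))  # 180 donors kept
```

The balanced table is small enough that metrics estimated on it can vary a lot between runs.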
In the end, I have to apply the model to a test set of more than 12,000 donors to predict which donors will donate.
Thank you
NB: My kappa is equal to 0.267
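With 180 donors out of 12,800, overall accuracy says little on its own, which is why class recall and kappa are the numbers to watch. The sketch below (plain Python, hypothetical helper `class_metrics`) computes class-1 precision and recall, accuracy, and Cohen's kappa from a 2x2 confusion matrix. A model that always predicts "no donation" already reaches about 98.6% accuracy on these class counts, while its class-1 recall and its kappa are both 0; a positive kappa such as 0.267 means agreement above what chance would give. The second confusion matrix is invented purely for illustration.

```python
def class_metrics(tp, fp, fn, tn):
    """Hypothetical helper: class-1 precision/recall, accuracy and Cohen's kappa."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions -> report 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total  # observed agreement
    # Chance agreement: (predicted-1 * actual-1 + predicted-0 * actual-0) / total^2
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {"precision_1": precision, "recall_1": recall, "accuracy": accuracy, "kappa": kappa}


# Baseline: always predict class 0 for 180 donors and 12,620 non-donors.
baseline = class_metrics(tp=0, fp=0, fn=180, tn=12620)
print(baseline)  # accuracy ~0.986, but recall_1 = 0.0 and kappa = 0.0

# Invented confusion matrix: a model that finds 15 of the 180 donors (recall ~8%).
hypothetical = class_metrics(tp=15, fp=10, fn=165, tn=12610)
print(hypothetical)
```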
Best Answer
varunm1 Member Posts: 1,207 Unicorn
Hello @Samiraaa_123
What's your kappa value? And what did you apply? Also, there is no guarantee that ML will always give excellent results: sometimes the data is close to random, or you may not be tuning your hyperparameters well. Keep trying different models and tuning their hyperparameters, and get to know your data by checking correlations and distributions.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
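As a small example of the "check correlations" step suggested above, the sketch below ranks numeric attributes by their Pearson correlation with the 0/1 label (with a binary label this is the point-biserial correlation). It is plain Python; the attribute names `past_gifts` and `age` and the five rows are made up for illustration. Correlation only captures linear relationships, so it complements looking at distributions rather than replacing it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation undefined, report 0
    return cov / (sx * sy)


def rank_attributes(rows, label_key="label"):
    """Hypothetical helper: sort attributes by |correlation| with the label, strongest first."""
    labels = [row[label_key] for row in rows]
    attrs = [key for key in rows[0] if key != label_key]
    scores = {a: pearson([row[a] for row in rows], labels) for a in attrs}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up rows; attribute names are hypothetical.
rows = [
    {"past_gifts": 5, "age": 40, "label": 1},
    {"past_gifts": 4, "age": 35, "label": 1},
    {"past_gifts": 0, "age": 38, "label": 0},
    {"past_gifts": 1, "age": 41, "label": 0},
    {"past_gifts": 0, "age": 36, "label": 0},
]
ranked = rank_attributes(rows)
print(ranked)
```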
Answers
How are you validating your model? Is it cross-validation or split validation?
Sampling is fine when it is applied to the training set, but it is not recommended to apply it to the whole dataset. As the dataset is small, instead of downsampling you can try upsampling your minority class with the SMOTE operator from the Operator Toolbox extension (download it from the Marketplace).
You can also try weighting your examples instead of sampling; this works only for some algorithms, such as neural networks and decision trees. Weighting doesn't alter your sample sizes but gives both classes equal importance. This can be done with Generate weights (Stratification).
Check whether the algorithm you want to use accepts these weights: right-click the algorithm operator and click Show operator info. If there is a green tick next to "Weighted Examples", the algorithm supports weighting.
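For intuition, here is a minimal Python sketch of the two ideas above. `smote` follows the core SMOTE idea: each synthetic example lies on the line segment between a minority example and one of its k nearest minority neighbours. `stratified_weights` gives every class the same total weight, which is our assumption about what "equal importance to both classes" means in practice. Neither is the Operator Toolbox or RapidMiner implementation, and both function names are hypothetical.

```python
import math
import random
from collections import Counter


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical SMOTE-style sketch: interpolate between minority points and near neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # needs at least two minority points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of `base` by Euclidean distance
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def stratified_weights(labels):
    """Assumed weighting scheme: every class gets the same total weight; weights sum to len(labels)."""
    counts = Counter(labels)
    per_class_total = len(labels) / len(counts)
    return [per_class_total / counts[y] for y in labels]


# Made-up 2-D minority points.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
print(smote(minority, n_synthetic=3, k=2))

# Class sizes from the question: 180 donors, 12,620 non-donors.
weights = stratified_weights([1] * 180 + [0] * 12620)
print(round(weights[0], 3), round(weights[-1], 3))  # 35.556 0.507
```

Unlike undersampling, both approaches keep all 12,800 original rows; weighting only helps if the learner actually uses the example weights.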
Are you tuning the model's hyperparameters? Are you trying different algorithms?
Varun
I used the 'Cross Validation' operator to validate my model. I had already tried to balance my dataset with Generate weights (Stratification), since I saw on the forum that this could work, but I get a message that the 'Random Forest' operator (the one I am using for the classification) will disregard the weights.
Does the SMOTE operator need to be placed just before the cross-validation?
Thank you so much for your answer
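Following the advice above that sampling belongs on the training set only, here is a minimal Python sketch of what that means inside k-fold cross-validation: each fold's training part is balanced, while its test part keeps the original 180 / 12,620 class ratio, so evaluation happens on data with the real imbalance. Random oversampling (duplicating minority rows) stands in for SMOTE to keep it short, and all function names are hypothetical. In RapidMiner terms this would presumably mean placing the sampling operator inside the training side of Cross Validation rather than before it.

```python
import random
from collections import Counter


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds with (roughly) the same class ratio in each."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def oversample(indices, labels, seed=0):
    """Randomly duplicate examples of the smaller classes until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in indices)
    n_max = max(counts.values())
    result = list(indices)
    for cls, n in counts.items():
        pool = [i for i in indices if labels[i] == cls]
        result.extend(rng.choice(pool) for _ in range(n_max - n))
    return result


def balanced_cv_splits(labels, k=5, seed=0):
    """Yield (balanced training indices, untouched test indices) for each fold."""
    folds = stratified_folds(labels, k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield oversample(train_idx, labels, seed + f), test_idx


labels = [1] * 180 + [0] * 12620
for train_idx, test_idx in balanced_cv_splits(labels):
    print(dict(Counter(labels[i] for i in train_idx)), dict(Counter(labels[i] for i in test_idx)))
```

With these class counts, each fold should train on 10,096 examples per class and test on 36 donors versus 2,524 non-donors.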
Varun
Thank you for your advice.
I've been trying what you suggested. My class precision is really good, but the class recall for class 1 is still very poor.
Thank you. You were very helpful.
Wishing you a good weekend.