SMOTE
Hi
My binary classification problem is imbalanced: the outcome occurs in only 5% of the cases.
I used SMOTE for the variable selection and training of the model.
SMOTE comes from the paper linked above.
That paper states: "a combination with the method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance than only under-sampling the majority class."
My question now is: Is applying SMOTE alone sufficient to address the imbalance problem, or do I additionally need an operator for under-sampling the majority class?
Answers
If there are too many samples in the majority class, you can add down-sampling (with the "Sample" operator) before SMOTE. You may also use a similarity analysis to identify near-duplicate data points in the majority class and shrink that population with simple filters. Some R/Python libraries offer more sophisticated under-sampling algorithms, e.g. the Edited Nearest Neighbor Rule, the Condensed Nearest Neighbor Rule, Tomek Links, One-Sided Selection, the Neighborhood Cleaning Rule, ...
Note that the ROC curve does not measure classifier performance well on imbalanced data. Because TPR depends only on the positives, ROC curves do not capture the effect of the negatives, and AUC weights both classes equally, so it does not reflect the minority class well. Try the Precision-Recall curve on imbalanced data instead.
Cheers,
YY