Classification of highly imbalanced data
Hi guys,
I'm working on a churn prediction problem and I'm struggling with highly imbalanced data (only 0.1% churners in the data set). I have tried different types of pre-processing and modeling, but still cannot get decent results (at most 20% real churners in the 10% of records with the highest propensity).
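The top-10% figure above can be read in two ways: as the share of all churners that land in the top 10% (capture rate), or as the share of the top 10% that are churners (precision). Here is a minimal Python sketch that computes both, plus the lift over the base rate. The function name `top_fraction_metrics` and the synthetic data are illustrative assumptions, not the process actually used for this model.

```python
import random


def top_fraction_metrics(scores, labels, fraction=0.10):
    """Return (capture_rate, precision, lift) for the top `fraction` of records by score.

    scores: predicted churn propensities (higher means more likely to churn)
    labels: 1 for a real churner, 0 otherwise
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n = len(scores)
    k = max(1, int(n * fraction))  # number of records in the top segment
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:k])  # churners inside the top segment
    total_pos = sum(labels)
    capture_rate = hits / total_pos if total_pos else 0.0
    precision = hits / k
    base_rate = total_pos / n
    lift = precision / base_rate if base_rate else 0.0
    return capture_rate, precision, lift


# Synthetic example (assumption): roughly 0.1% churners, weakly informative scores.
rng = random.Random(42)
demo_labels = [1 if rng.random() < 0.001 else 0 for _ in range(100_000)]
demo_scores = [rng.random() + 0.3 * y for y in demo_labels]
capture, precision, lift = top_fraction_metrics(demo_scores, demo_labels, 0.10)
print(f"capture rate={capture:.3f}  precision={precision:.5f}  lift={lift:.2f}")
```

With a 0.1% base rate, precision in the top 10% cannot exceed 1% even for a perfect ranking, so capture rate and lift are usually the more informative numbers to compare between models.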
I tried upsampling, downsampling, something in between, clustering the set before classification, normalization, PCA, feature selection... and different modeling techniques: decision trees, neural nets, SVM... as well as bagging, boosting, and misclassification costs. This has helped me improve the accuracy of my model from 2% to 20% in the highest-propensity segment, but this is the most I got.
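Of the techniques listed above, downsampling is the easiest to sketch. The Python below keeps every churner and a random fraction of the non-churners. Because training on such a sample inflates the predicted propensities, it also includes a commonly used prior correction that maps a probability back to the original class balance. The names `downsample_majority` and `correct_probability` are hypothetical, and this is a sketch under those assumptions rather than the setup actually used here.

```python
import random


def downsample_majority(rows, labels, keep_fraction, seed=0):
    """Keep every positive (churner) row and a random `keep_fraction` of the negatives."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), max(1, round(len(neg) * keep_fraction))) if neg else 0
    kept = sorted(pos + rng.sample(neg, n_keep))  # preserve original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


def correct_probability(p_sampled, keep_fraction):
    """Map a probability from a model trained on downsampled data back to the
    original class balance: p = b*ps / (b*ps - ps + 1), with b = keep_fraction."""
    return keep_fraction * p_sampled / (keep_fraction * p_sampled - p_sampled + 1.0)


# Synthetic example (assumption): 10,000 rows with 10 churners, keep 1% of non-churners.
demo_labels = [1 if i % 1000 == 0 else 0 for i in range(10_000)]
demo_rows = list(range(10_000))
small_rows, small_labels = downsample_majority(demo_rows, demo_labels, keep_fraction=0.01)
print(len(small_rows), sum(small_labels))
print(correct_probability(0.5, 0.01))
```

The correction is monotone, so it changes the scale of the propensities but not the ranking of records; it matters when the scores are used as probabilities, not when only the top segment is selected.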
Did anyone work on similar problems? Which technique did you find most helpful?
Thank you in advance,
Bojana