Bad Performance of ChurnPrediction
Tomatenmark
Member Posts: 4 Learner II
in Help
Hey there,
I created a process for ChurnPrediction. My label in the data set is Churn.
1 is for yes and 0 is for no.
I used a decision tree and a cross validation operator, as you can see in my process.
But my model never predicts that a customer will move/churn.
All customers are predicted to stay, therefore my class recall of true 1 is 0%.
I cannot find the reason why my predictions are so bad.
Please find attached the data file, my process and a screenshot of the performance vector.
Thanks for your support
Answers
The default settings of Decision Tree are often good, but not in every case. They are meant to avoid too much overfitting, but this could be inappropriate for your data.
Try disabling pruning and postpruning first. Check the resulting model. Chances are that it will be a very complex tree (likely overfitted), but it will predict both categories, even if the cross validation will show bad results. If this works, you can enable pruning and postpruning again and play with the parameters until you find the optimum.
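To illustrate why a pruned tree can collapse into a single "stay" leaf, here is a minimal Python sketch. It assumes that prepruning rejects any split whose information gain falls below a threshold. The label counts, the candidate split, the `MINIMAL_GAIN` value and the helper names are all hypothetical; none of them come from the attached data or from RapidMiner's actual implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting parent into left and right."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


def majority_class(labels):
    """Label a leaf would predict: the most frequent class."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical imbalanced churn labels: 90 stayers (0), 10 churners (1).
parent = [0] * 90 + [1] * 10
# Hypothetical candidate split that separates the classes only weakly.
left = [0] * 50 + [1] * 3
right = [0] * 40 + [1] * 7

MINIMAL_GAIN = 0.1  # assumed prepruning threshold, for illustration only

gain = information_gain(parent, left, right)
split_accepted = gain >= MINIMAL_GAIN
prediction_without_split = majority_class(parent)

print(f"gain={gain:.4f}, split accepted: {split_accepted}")
print(f"leaf prediction without split: {prediction_without_split}")
```

With a weak split like this one, the gain stays below the assumed threshold, so the node remains a leaf and predicts the majority class 0 for every customer. That matches the symptom of 0% recall for class 1.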
The best way to do this is by using Optimize Parameters. There's a readily usable building block in the Community Samples repository:
Community Building Blocks/Optimize Decision Tree.
Here's an Academy video on parameter optimization:
https://academy.rapidminer.com/learn/video/optimization-of-the-model-parameters
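The idea behind a grid optimization can be sketched in plain Python. This is not the Optimize Parameters operator itself: a toy one-split "stump" on a single made-up numeric feature stands in for the decision tree. The parameter names `min_leaf_size` and `churn_weight`, the data and the fold assignment are all assumptions for illustration. The sketch tries every parameter combination with cross validation and records both accuracy and recall for class 1.

```python
from itertools import product


def weighted_vote(labels, churn_weight):
    """Leaf label: 1 if the weighted churn count exceeds the stay count, else 0."""
    churn = sum(labels)
    return 1 if churn * churn_weight > len(labels) - churn else 0


def fit_stump(xs, ys, min_leaf_size, churn_weight):
    """Learn a one-split tree; returns (threshold, left_label, right_label).

    threshold is None when no split satisfies min_leaf_size.
    """
    root = weighted_vote(ys, churn_weight)
    best, best_score = (None, root, root), -1.0
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if len(left) < min_leaf_size or len(right) < min_leaf_size:
            continue
        labels = (weighted_vote(left, churn_weight), weighted_vote(right, churn_weight))
        # Weighted training score: a correctly classified churner counts churn_weight times.
        score = sum(
            (churn_weight if y == 1 else 1.0)
            for side, label in zip((left, right), labels)
            for y in side
            if y == label
        )
        if score > best_score:
            best, best_score = (t, labels[0], labels[1]), score
    return best


def predict(model, xs):
    t, left_label, right_label = model
    if t is None:
        return [left_label for _ in xs]
    return [left_label if x < t else right_label for x in xs]


def cross_validate(xs, ys, k, min_leaf_size, churn_weight):
    """k-fold cross validation; returns (accuracy, recall of class 1) over all folds."""
    actual, predicted = [], []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit_stump([xs[i] for i in train], [ys[i] for i in train],
                          min_leaf_size, churn_weight)
        predicted += predict(model, [xs[i] for i in test])
        actual += [ys[i] for i in test]
    accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    churn_predictions = [p for a, p in zip(actual, predicted) if a == 1]
    recall = sum(churn_predictions) / len(churn_predictions)
    return accuracy, recall


# Hypothetical data: 40 customers, 5 churners hidden among the highest feature values.
xs = list(range(40))
ys = [1 if x >= 30 and x % 2 == 0 else 0 for x in xs]

grid = {"min_leaf_size": [2, 15], "churn_weight": [1.0, 3.0]}
results = {}
for min_leaf_size, churn_weight in product(grid["min_leaf_size"], grid["churn_weight"]):
    results[(min_leaf_size, churn_weight)] = cross_validate(xs, ys, 5, min_leaf_size, churn_weight)

best_by_accuracy = max(results, key=lambda p: results[p][0])
best_by_recall = max(results, key=lambda p: results[p][1])
for params, (acc, rec) in results.items():
    print(params, f"accuracy={acc:.3f}", f"recall(1)={rec:.3f}")
print("best by accuracy:", best_by_accuracy, "best by recall:", best_by_recall)
```

The point of recording both measures: on imbalanced data, a combination that never predicts churn can still reach high accuracy, so it is worth checking which performance criterion the optimization actually maximizes.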
Lastly, maybe Decision Tree is not the best learner for your data. You could try Gradient Boosted Trees, Random Forests, Naive Bayes, Logistic Regression, Deep Learning, Support Vector Machines etc.
Regards,
Balázs
rfuentealba,
thanks for your answer. As you can see, I used optimize_parameters_grid in my process and tried many different parameter combinations for the decision tree, but it is still not working.
I saw that I can use a correlation matrix to find correlated variables. I will try it out.
Is it fine that most of my variables are numeric, or is that a problem?
Why should I remove the CustomerID? In my Set Role operator I told RapidMiner that it is an ID column.
I hope my explanations are understandable.
Thanks in advance,
Tomatenmark