Optimize Auto Model towards Sensitivity
VCResearcher_0
Hi Experts,
Some context on my problem: I have an unbalanced dataset with 3k observations, of which about 5% are successful companies and 95% unsuccessful ones. The underlying definitions of success/failure are not relevant here, as the dataset contains only the labels 0 (failure) and 1 (success). For every company, I have about 150 features, which were recorded at time t1. The success/failure label was determined at time t2, because at t1 it is unclear whether the company will become successful or not.
Goal: Based on the information available at t1, I want to predict whether the company will be a success or a failure at t2. The model should serve as a pre-selection tool for venture capital investors to figure out which companies to focus their attention on, i.e., which have the highest likelihood of success. In venture capital, only a very small number of portfolio companies account for the majority of the fund's return. The majority of companies are failures and don't return anything. The return distribution is similar to a Pareto distribution, where 20% of companies account for 80% of returns. Consequently, the investor cannot afford to miss out on any of the success cases. This means that while it's okay to wrongly classify a failure as a success, it's not okay to wrongly classify a success as a failure, i.e., I need to optimise the model towards sensitivity (avoid false negatives).
Problem: After running Auto Model, I have 2 questions: 1) With the default settings, only Naive Bayes yields a sensitivity other than 0, namely 87.5%. How can I optimize all models towards sensitivity? 2) How can I limit the number of success predictions? Once I optimize the model towards sensitivity (avoid false negatives), the model could simply predict every company as a success and end up with 100% sensitivity. Is it possible to limit the number of success predictions to a specific threshold, e.g., 20% of the sample size?
Really looking forward to your help, and thanks in advance!
Answers
By default, Auto Model optimizes each model for accuracy.
After opening each process generated by Auto Model (for example, the process associated with the Decision Tree model), you have to:
- Go inside the Optimize Parameters operator -> Cross Validation operator
- Replace the Performance operator with the Performance (Binominal Classification) operator
- Set the main criterion parameter of this operator to sensitivity.
RapidMiner will then optimize the parameter(s) of your model to maximize sensitivity.
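To illustrate why the choice of main criterion matters on a dataset with about 5% successes, here is a minimal Python sketch (not RapidMiner code). The data and function names are made up for illustration. It shows that a model predicting "failure" for every company scores well on accuracy but has zero sensitivity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all observations that are classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_pred):
    """Share of true successes that are predicted as success (recall)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical sample: 100 companies, 5 of them successful (label 1).
y_true = [1] * 5 + [0] * 95
# A degenerate model that labels every company a failure.
all_fail = [0] * 100

print("accuracy:", accuracy(y_true, all_fail))
print("sensitivity:", sensitivity(y_true, all_fail))
```

An optimizer that only looks at accuracy has little reason to move away from such a model, which is consistent with most Auto Model results showing a sensitivity of 0.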
I hope it helps,
Regards,
Lionel
To answer your second question:
By default, for a binary classification problem, RapidMiner applies a threshold of 0.5 to the confidences
to determine the predicted class.
To change (increase) this threshold, you can combine the Create Threshold and Apply Threshold operators.
I suggest increasing the threshold, for example to 0.7. In that case:
If confidence(target = Success) > 0.7, then predicted class = Success
else predicted class = Fail
There is no "automatic way" in RapidMiner to calculate the threshold that corresponds to a final sensitivity of 20%.
Logically, the more you increase the threshold (0.7, 0.8, 0.9, 0.95, etc.), the more the sensitivity decreases.
It's up to you to adjust the threshold by bisection until you obtain a sensitivity of 20%.
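The thresholding rule described above can be sketched in plain Python (this is not RapidMiner code; the confidences and labels are invented for illustration). It applies several thresholds to the same confidences and reports both the sensitivity and the share of companies predicted as success.

```python
def apply_threshold(confidences, threshold):
    """Predict 1 (success) when confidence(success) > threshold, else 0."""
    return [1 if c > threshold else 0 for c in confidences]


def sensitivity(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def positive_rate(y_pred):
    """Share of observations predicted as success."""
    return sum(y_pred) / len(y_pred)


# Hypothetical confidences for the success class and the true labels.
confidences = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for threshold in (0.3, 0.5, 0.7, 0.9):
    preds = apply_threshold(confidences, threshold)
    print(threshold, sensitivity(y_true, preds), positive_rate(preds))
```

On this toy data, raising the threshold lowers both the number of success predictions and the sensitivity, so a higher threshold limits positive predictions at the cost of more false negatives.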
I hope it helps,
Regards,
Lionel
Re your first answer, it only changed the confusion matrix results for the Gradient Boosted Trees (i.e., when opening the generated model in "Design" >> "Optimize Parameters" >> "Cross Validation" >> "Inner Performance (Bin. Class.)" and changing the main criterion to "sensitivity", it jumped from 0% to 18.75%). Unfortunately, results did not change at all for Decision Trees and Random Forest (and I actually could not find the respective operator for Deep Learning, Log Reg, Gen Reg and Naive Bayes). This feels a bit weird though. What do you think?
Re your second answer, I don't want to limit the sensitivity (it should actually be maximized as much as possible) but rather the number of positive predictions, i.e., predict only 20% of observations as success (for a dataset of 200 observations, only 40) but with the highest sensitivity possible. Is there a way to limit the number of success predictions while still maximizing sensitivity?
The ultimate goal is to find a model which has the highest sensitivity and can be limited in the number of positive (success) predictions. Any ideas?
1. This is the expected behaviour of Auto Model:
By default, Auto Model doesn't perform parameter optimization for the models you mentioned. To optimize these models,
you have to open the generated process and manually add an Optimize Parameters operator (use the Decision Tree process as a template, for example).
2. To increase the sensitivity, you can sample your dataset with the Sample operator.
This way, you increase the success / fail ratio in the training set used to train your model(s), which in turn increases the sensitivity.
It's up to you to adjust these two ratios by bisection to maximize the sensitivity while obtaining a success prediction rate of 20%.
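As an alternative to trial and error, a fixed success prediction rate can also be enforced by ranking the companies by confidence and flagging only the top share. The following Python sketch is an assumption about how one might do this outside RapidMiner, not a description of a built-in operator. The function name `flag_top_fraction` and the data are hypothetical.

```python
import math


def flag_top_fraction(confidences, fraction):
    """Flag the `fraction` of observations with the highest confidence as 1.

    Ties are broken by original order because Python's sort is stable.
    """
    n = len(confidences)
    # Small epsilon guards against floating-point results like 2.9999999.
    k = math.floor(n * fraction + 1e-9)
    ranked = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    flagged = set(ranked[:k])
    return [1 if i in flagged else 0 for i in range(n)]


def sensitivity(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical confidences for the success class and the true labels.
confidences = [0.35, 0.95, 0.05, 0.85, 0.55, 0.75, 0.15, 0.65, 0.45, 0.25]
y_true = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

preds = flag_top_fraction(confidences, 0.20)
print("flagged:", sum(preds), "of", len(preds))
print("sensitivity at 20%:", sensitivity(y_true, preds))
```

With the number of positive predictions fixed this way, the sensitivity depends only on how well the model ranks true successes near the top, so models (and sampling ratios) can be compared on sensitivity at the same 20% budget.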
I hope it helps,
Regards,
Lionel
NB: A process using the Sample operator, for inspiration: