All predictions come out as 'no'
Hi, I am new to RapidMiner.
I would like to predict a Credit Payment 'yes or no' outcome with at least two methods (e.g. Naive Bayes, Decision Tree, k-means), but every prediction comes back as 'no'.
Unlike a usual yes/no attribute, this one has three values in the Excel data: yes, no and unknown.
I am using Naive Bayes and Decision Tree, and have set the attribute's role to label and its type to polynominal for prediction.
Thanks in advance for any help.
Regards.
Best Answer
lionelderkrikor (RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)
@Coralion,
Indeed, you have an imbalanced training set:
The outcome = Yes is your minority class.
Without preprocessing, because the examples with outcome = Yes are in the minority, your model struggles to "capture" the relationships between outcome = Yes and your regular attributes, and so it cannot correctly predict outcome = Yes.
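To illustrate why a model can look reasonable while never predicting Yes, here is a minimal Python sketch (outside RapidMiner). The 90/10 class split is a made-up assumption, not the poster's actual data. It shows that a predictor that always answers with the majority class gets high accuracy but zero recall on the minority class.

```python
from collections import Counter

# Hypothetical label column: 90 "no" and 10 "yes" (invented numbers for illustration).
labels = ["no"] * 90 + ["yes"] * 10

# A model that has learned nothing about "yes" effectively predicts the majority class.
majority_class = Counter(labels).most_common(1)[0][0]
predictions = [majority_class] * len(labels)

# Accuracy: share of examples where the prediction matches the real label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall for "yes": correctly predicted "yes" divided by all real "yes".
true_pos = sum(p == "yes" and y == "yes" for p, y in zip(predictions, labels))
recall_yes = true_pos / labels.count("yes")

print(majority_class, accuracy, recall_yes)
```

So accuracy alone can hide the problem; the recall for Yes is the number to watch here.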
In such cases, if correctly predicting outcome = Yes matters for your business case, you have to preprocess your training set by upsampling the minority class. This can be done with the SMOTE Upsampling operator.
Doing this significantly increases the recall (also known as sensitivity), i.e. the ability of your model to correctly predict outcome = Yes.
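For readers curious about what SMOTE-style upsampling does, here is a rough Python sketch of the idea. It is not the RapidMiner operator's implementation. The function name `smote_sketch`, the attribute values, and the choice of k are all hypothetical, and the sketch assumes purely numeric attributes. Each synthetic example is placed on the line segment between a minority example and one of its nearest minority neighbours.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    example and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority examples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical numeric attributes (e.g. age, balance), invented for illustration.
yes_rows = [(25.0, 1200.0), (31.0, 900.0), (28.0, 1500.0), (40.0, 700.0)]
no_rows = [(50.0 + i, 3000.0 + 10 * i) for i in range(20)]

# Generate enough synthetic "yes" rows to match the number of "no" rows.
new_yes = smote_sketch(yes_rows, n_new=len(no_rows) - len(yes_rows))
balanced_yes = yes_rows + new_yes
print(len(balanced_yes), len(no_rows))
```

Only the training set should be upsampled this way; the test set should keep its original class distribution so the measured recall stays honest.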
Example results without and with preprocessing of your training set:
You can find your process, including the preprocessing step, in the attached file.
More generally, for your business problem, you have to "quantify" what is important to you.
To do so, you have to quantify 4 values:
- the (potential) gain when you correctly predict the Outcome = Yes (True positive cases)
- the (potential) gain when you correctly predict the Outcome = No (True negative cases)
- the (potential) cost when you incorrectly predict the Outcome = Yes (the real outcome is No) (False positive cases)
- the (potential) cost when you incorrectly predict the Outcome = No (the real outcome is Yes) (False negative cases)
By setting these 4 values, you define a "cost matrix". RapidMiner will then automatically build the model(s) that maximize the gain (and minimize the cost).
To do that, you have to submit your training set to AutoModel and define your "cost matrix" in the third menu ("Prepare Target").
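As a rough illustration of how a cost matrix turns the four cases above into a single number, here is a small Python sketch. All gains, costs, and confusion counts are invented for the example, and the helper `total_gain` is hypothetical; it is not how Auto Model computes things internally.

```python
# Hypothetical gains (positive) and costs (negative), keyed by (predicted, actual).
cost_matrix = {
    ("yes", "yes"): 100,   # true positive
    ("no", "no"): 10,      # true negative
    ("yes", "no"): -20,    # false positive: predicted yes, real outcome is no
    ("no", "yes"): -80,    # false negative: predicted no, real outcome is yes
}

def total_gain(confusion, costs):
    """Sum count * value over every (predicted, actual) cell."""
    return sum(count * costs[cell] for cell, count in confusion.items())

# Hypothetical confusion counts for two models on the same 100 examples.
always_no = {("yes", "yes"): 0, ("no", "no"): 90, ("yes", "no"): 0, ("no", "yes"): 10}
balanced = {("yes", "yes"): 8, ("no", "no"): 75, ("yes", "no"): 15, ("no", "yes"): 2}

print(total_gain(always_no, cost_matrix))  # 90*10 - 10*80 = 100
print(total_gain(balanced, cost_matrix))   # 8*100 + 75*10 - 15*20 - 2*80 = 1090
```

With these made-up values, the model that sometimes predicts Yes wrongly still comes out far ahead, because missing a real Yes is priced as the most expensive mistake.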
I hope this helps!
Regards,
Lionel
Answers
It's difficult to answer without analysing your data.
Perhaps you have a highly imbalanced training set?
Can you please share your data?
Regards,
Lionel
Attached. There are some 'unknown' values for the education and job attributes. Those could be contributing factors.