Accuracy of models all the same
AizatAlam_129
Member Posts: 14 Contributor II
in Help
Hi,
I ran my data through RM's Auto Model for prediction, and the results showed that all models have the same accuracy rate.
I have no idea why this happened. Can anyone explain what the possible reasons for this might be?
Thank you
Best Answer
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
Having imbalanced data is not necessarily a problem. There are some methods for coping with it.
Try balancing the data before running Auto Model. That won't give you a perfect model you can deploy, but you get an estimate of the model quality on the balanced data, of the importance of attributes, and of which algorithm works best on your data. You should get more complex models and more reasonable confusion matrices from this approach, even if the accuracy might be lower than before.
Here's an Academy video on balancing, sampling and weighting data. These are approaches you can try for creating a good model on imbalanced data:
https://academy.rapidminer.com/learn/video/sampling-weighting-intro
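The weighting idea from the video can be sketched in plain Python (a hypothetical helper, not a RapidMiner operator): each class gets the weight n_samples / (n_classes × class_count), so the minority class contributes as much total weight as the majority.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight w_c = n_samples / (n_classes * count_c), so every class
    contributes the same total weight regardless of its size."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

y = [0] * 90 + [1] * 10            # 90/10 imbalanced labels
print(balanced_class_weights(y))   # minority class 1 gets 5.0, majority class 0 gets ~0.56
```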
So, I would do the following:
1. Downsampling the majority class to be more or less equal to the minority class.
2. Running AutoModel on the balanced data.
3. Choosing a model type for further work.
4. Building a process with an approach for weighting or sampling, e.g. downsampling in the training subprocess (the left part) of a Cross Validation operator.
5. Validating and optimizing the final model.
It is important that you validate your models on the original (imbalanced) distribution even when using some sampling method to build better models.
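Steps 1 and 4 could look like this in plain Python (an illustrative sketch with made-up helper names, not actual RapidMiner operators); the key point from above is that only the training data is balanced, while validation keeps the original distribution:

```python
import random

def downsample_majority(rows, label_of, seed=42):
    """Randomly drop rows from larger classes until all classes are equal
    in size. Apply this only to training folds; test folds must keep the
    original (imbalanced) class distribution."""
    random.seed(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(label_of(r), []).append(r)
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for rows_c in by_class.values():
        balanced.extend(random.sample(rows_c, n_min))
    return balanced

data = [{"x": i, "y": "no"} for i in range(95)] + \
       [{"x": i, "y": "yes"} for i in range(5)]
train = downsample_majority(data, lambda r: r["y"])
print(len(train))  # 10 rows: 5 "yes" + 5 "no"
```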
Regards,
Balázs
Answers
In my experience this happens when the data set is imbalanced and hard to predict. In that case, predicting the majority class is the best choice for every modeling algorithm, so they all just do that. But there could be other possibilities, too.
Are the accuracy rates, AUC values and confusion matrices all the same? You can easily spot the "all models predict the majority class" pattern in the confusion matrix.
Can you take a look at the actual models, e.g. Decision Tree, GBT, Random Forest? They are easy to interpret. If the "trees" are just single two-way splits rather than real trees, then that's the reason.
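To see why every algorithm can end up with the exact same accuracy (a hypothetical example, not the poster's actual data): with a 95/5 class split, any model that degenerates into always predicting the majority class scores exactly 95%.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

y = ["no"] * 95 + ["yes"] * 5          # heavily imbalanced labels
print(majority_baseline_accuracy(y))   # → 0.95
```

If several very different algorithms all report this same number, it is a strong hint that none of them learned anything beyond the class distribution.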
Regards,
Balázs
And upon checking the Decision Tree, GBT and Random Forest, oddly enough they are indeed just simple two-way decisions.
Does this mean that there is a problem with my data and that the models are incorrect?