Standard of reliability: accuracy and gains
Hi everyone,
I am a newbie to RapidMiner. This should be a quick question, and I would welcome any comments from you.
I imported 203 rows of data into RapidMiner and ran Auto Model. The best-performing model is the decision tree, with accuracy = 55.6%, standard deviation ±12, and gains = 20. Can I judge this model to be reliable? Why or why not? Is there a standard for reliability in RapidMiner?
Also, what do the weights of the important factors mean? At what value can two factors be considered strongly related?
Thank you very much!
Answers
There are multiple things you need to look at to see if your model is reliable or not.
One starting point is the model performance. I see it is 55.6%, which is barely above chance accuracy if this is a two-class problem. To interpret that number, you need to look at your class balance and at the model's performance on each class. For example, if your data has two classes, A and B, with 113 samples in class A and 90 in class B, and your model predicts every sample as class A, you get an accuracy of 113/203 ≈ 55.7%, essentially the figure you report. In that case the model is bad, because it never predicts class B. So you need to look at the confusion matrix, or at the precision and recall of each class.
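To make the example above concrete, here is a minimal sketch in plain Python (not RapidMiner itself). The labels are hypothetical and simply mirror the 113/90 split used in the example; the helper names `confusion_counts`, `recall` and `precision` are made up for illustration.

```python
from collections import Counter

# Hypothetical labels mirroring the example: 113 rows of class A, 90 of class B.
y_true = ["A"] * 113 + ["B"] * 90
# A degenerate model that predicts the majority class for every row.
y_pred = ["A"] * len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def recall(counts, label):
    """Share of rows that truly have `label` and were predicted as `label`."""
    total = sum(n for (t, _), n in counts.items() if t == label)
    return counts[(label, label)] / total if total else 0.0


def precision(counts, label):
    """Share of rows predicted as `label` that truly have `label`."""
    predicted = sum(n for (_, p), n in counts.items() if p == label)
    return counts[(label, label)] / predicted if predicted else 0.0


counts = confusion_counts(y_true, y_pred)
accuracy = sum(n for (t, p), n in counts.items() if t == p) / len(y_true)

print(f"accuracy = {accuracy:.3f}")
for label in ("A", "B"):
    print(f"class {label}: precision = {precision(counts, label):.3f}, "
          f"recall = {recall(counts, label):.3f}")
```

The accuracy looks acceptable on its own, but the recall for class B is zero, which is exactly the failure the confusion matrix is meant to expose.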
If you want this in a single metric, there is the "Kappa" performance metric. It corrects for the agreement you would expect by chance, so it is much less misleading than accuracy when the classes are imbalanced, and you can use it instead of accuracy. The standard deviation is also a major factor to consider: with ±12 around 55.6%, one standard deviation below the mean is about 43.6%, which is below chance for two classes. It is hard to say what the best value is, as it depends on the domain and the problem you are trying to solve.
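As a rough illustration of why Kappa is more informative here, the sketch below computes Cohen's kappa by hand in plain Python. It assumes that the "Kappa" criterion refers to Cohen's kappa; the function name `cohen_kappa` and the labels are hypothetical and reuse the 113/90 split from the earlier example.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    n = len(y_true)
    # Observed agreement is the plain accuracy.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if predictions were independent of the true labels,
    # given the same label frequencies on both sides.
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if expected == 1.0:
        # Only one class on both sides: kappa is undefined, return 0 here.
        return 0.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: 113 rows of class A, 90 rows of class B.
y_true = ["A"] * 113 + ["B"] * 90

always_a = ["A"] * len(y_true)   # predicts the majority class every time
perfect = list(y_true)           # predicts every row correctly

print(f"kappa, always-A model: {cohen_kappa(y_true, always_a):.3f}")
print(f"kappa, perfect model:  {cohen_kappa(y_true, perfect):.3f}")
```

The always-A model has an accuracy of about 55.7% but a kappa of zero (up to rounding), because its agreement with the true labels is no better than chance. A perfect model reaches a kappa of 1.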
Domain knowledge is another major factor: are models with 55.6% accuracy accepted in the domain your problem belongs to? I think 55.6% is low in any domain, as it is just above chance accuracy, but I can't make strict comments because I don't know the data.
Other things to consider are feature importance, model validation type, and cost sensitivity matrix.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing