Bad performance when loading and applying an SVM model
hi,
I saved an SVM model as follows:
I got a performance of about 80% / 87% for testing / training accuracy, respectively.
When I now load the saved model and apply it to some test data, I only get about 57%, and the contingency table shows me that there seems to be an issue... here is the design for loading and applying the model:
and here is the performance:
ok, my previous post was cut again... so here it is again:
when I do the same process, only with a k-NN model, I also get about 80% on new test data... that's why I'm wondering whether this is an SVM-operator-related issue, whether I did something wrong in my process, or whether it's an issue with saving the model...
the only log message that appeared was:
Aug 5, 2016 9:18:20 AM WARNING: Kernel Model: The value types between training and application differ for attribute 'ABC', training: real, application: integer
therefore, I imported the data again and configured 'ABC' as real; the message disappeared, but unfortunately the result was the same...
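For reference, outside RapidMiner the same value-type alignment would look roughly like this in Python/pandas (just a sketch; only the attribute name 'ABC' comes from the log above, the data is made up):

```python
import pandas as pd

# Training data: 'ABC' stored as real (float), as the log message says.
train = pd.DataFrame({"ABC": [1.0, 2.5, 3.0]})
# Apply-time data: the same attribute imported as integer.
apply_df = pd.DataFrame({"ABC": [1, 2, 3]})

# Cast the apply-time columns to the dtypes seen at training time,
# so the model gets a consistent schema.
apply_df = apply_df.astype({col: train[col].dtype for col in train.columns})

print(apply_df.dtypes)  # ABC -> float64, matching the training data
```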
Answers
in addition:
I get the same performance (57%) when I split my original Excel file into test and training data (Split Data -> Write Excel...). When I then use the separate test Excel file to test my model, I also get a surprisingly bad performance of 57%...
However, if I apply the Split Data operator directly in my process design, the split test data gives me 80% performance when applied to the previously created model directly (from the Split Data operator). It seems that nothing has changed between the two test data sets, but the performance is quite different when using the Split Data operator versus importing the previously split test data...
I cannot say the same for my saved k-NN model: there, the separately imported test data also performs at around 80%...
Is this a bug related to the SVM operator?
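To rule out a saving-the-model issue, the round-trip check I have in mind would look like this outside RapidMiner (a minimal Python/scikit-learn sketch; the data and file name are made up): if saving/loading itself were broken, the reloaded model would disagree with the original on the very same data.

```python
import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
model = SVC().fit(X, y)

# Round trip: save the trained model and load it back.
joblib.dump(model, "svm_model.joblib")
reloaded = joblib.load("svm_model.joblib")

# If this holds, the save/load step preserves the model exactly.
assert np.array_equal(model.predict(X), reloaded.predict(X))
```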
Hi @Fred12
That 57% is not a great number in itself, and more so because your model seems to be predicting everything as 1.0.
Also, between consecutive runs the random seed may change, leading to different results.
You can specify local seeds to ensure repeatability.
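To sketch what a fixed seed buys you (shown here in scikit-learn terms because it is easy to put in code; in RapidMiner this corresponds to the local random seed parameter mentioned above, and the data is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# A fixed random_state gives the same split, and hence the same
# accuracy, on every run of the process.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42, stratify=y)
print(SVC().fit(X_tr, y_tr).score(X_te, y_te))
```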
Also, Split Data in your case is basically sending over only a part of the data to Apply Model, so you are only testing on a subset.
On that number of records, it happens to get 57% correct.
What kind of attributes do you have, and what are you trying to predict?
You may need to do additional preprocessing: see whether you need to normalize, weight, or do other feature processing first. Right now the model is not usable, because everything is predicted as 1.0.
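On the normalization point specifically: SVMs are sensitive to feature scaling, and whatever preprocessing you fit on the training data has to be applied again when the saved model is used on new data. A minimal sketch of that idea (Python/scikit-learn, everything here is illustrative):

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42, stratify=y)

# The scaler is fitted on the training data only and saved together
# with the SVM, so apply time sees the training-time transformation.
pipe = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr)
joblib.dump(pipe, "svm_pipeline.joblib")

# Loading the pipeline re-applies the stored scaling before predicting.
print(joblib.load("svm_pipeline.joblib").score(X_te, y_te))
```

If only the bare SVM is saved and the new test data is never normalized the same way, accuracy can drop exactly the way you describe.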
Try some of the new algorithms that came with version 7.2 this week; they may yield better performance.
My first question is: how was the SVM model trained? Was it trained using X-Validation? What were the performance measures there?
As @bhupendra_patil pointed out, the model is selecting the one class over the others. What was the data like before you trained the model? Was there a large majority class that overwhelmed a smaller minority class? Sometimes in situations like this, it's unbalanced data that is causing the problem.
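To make the imbalance point concrete, here is a small sketch (Python/scikit-learn, synthetic data, purely illustrative): with a large majority class, a model that predicts only that class already lands in the same accuracy range you are reporting.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier

# Synthetic 3-class data with a skewed 50/30/20 class mix (made up).
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=4,
                           weights=[0.5, 0.3, 0.2], random_state=0)

# Always predicting the majority class already scores about 50%,
# so an accuracy in the high 50s can mean the model learned very little.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(baseline.score(X, y))
```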
yeah, the class distributions (1, 3, 4) were about 50/30/20%... but in X-Validation it had about 80% performance, and I had a stratified data split... therefore I'm wondering: if the model correctly recognises about 80% in X-Validation training/testing, why should it be otherwise when applying the same model to other stratified test data? Does imbalanced data have so much impact on the predictive accuracy of the model if it previously seemed to work just fine in X-Validation?