How to train and apply a one-class SVM?
hi,
I noticed there is a One-Class SVM classifier method in the LibSVM operator. I guess this is for training on only one class (and adapting to that class's specific ranges / distributions).
How do I train one class, and then apply the model for later classification?
E.g. I train 3 classes, each with its own one-class SVM.
Later, I want to classify a dataset containing all 3 classes. Would I have to use 3 SVM classifier models to do that? And how can one model then cope with the other 2 classes? How does it do that? Does it just check whether an example falls within the distribution of the trained class, and then decide yes (it is the same class) or no?
And how can I design my process to apply all 3 trained one-class SVMs to one dataset with all 3 classes in it?
Answers
The community is your friend: http://community.rapidminer.com/t5/RapidMiner-Studio/One-class-SVM-for-text-classfication/m-p/31538 See @mschmitz's solution.
To train three separate classes individually, I would just filter out each class and train three separate SVMs. For scoring, I'd take the unlabeled set and apply it to the three SVM models you created, but I'm not 100% sure why you would go this route.
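RapidMiner processes are built graphically, but just to make the mechanics concrete, here is a minimal sketch of that filter-and-score idea in Python with scikit-learn's OneClassSVM (which wraps libsvm). All data and parameter values below are made up for illustration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Made-up data: X_train / y_train are the labeled examples,
# X_unlabeled is the set to score; labels are 0, 1, 2.
rng = np.random.RandomState(0)
X_train = rng.randn(300, 6)
y_train = rng.randint(0, 3, 300)
X_unlabeled = rng.randn(50, 6)

# One model per class, each fitted only on that class's examples
# (the "filter out each class" step).
models = {}
for label in np.unique(y_train):
    m = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
    m.fit(X_train[y_train == label])
    models[label] = m

# Score every unlabeled example with all three models and assign the
# class whose model returns the highest decision score.
labels = np.array(sorted(models))
scores = np.column_stack(
    [models[lab].decision_function(X_unlabeled) for lab in labels]
)
predictions = labels[scores.argmax(axis=1)]
```

An example that scores below zero for every model falls outside all three learned distributions, which is exactly the yes/no membership check asked about above.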
Isn't it better to just train on all three classes and end up with one model?
Well, I am not sure.
At first I also thought it would be better to train on all 3 classes together, but my class distribution is 50% / 30% / 20%, so I think there will always be some bias toward the 50% majority class. If I do that, I get 86% accuracy with feature selection applied (6 out of 25 attributes), but no matter which learner I try or what I do in addition, I never get more than 86%. Maybe the other 14% are lost cases? And if so, are there techniques to identify those and maybe train / learn them separately?
But if I balance the classes to absolutely equal size, with the same number of examples for all 3, I only get about 82% with the same feature selection applied beforehand...
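For reference, that "absolutely balanced" setup amounts to downsampling every class to the size of the smallest one. A sketch of the idea (the function name and data layout are illustrative, not a RapidMiner API):

```python
import numpy as np
from sklearn.utils import resample

def balance_downsample(X, y, random_state=0):
    """Downsample every class to the size of the smallest class."""
    labels, counts = np.unique(y, return_counts=True)
    n = counts.min()
    parts = [
        resample(X[y == label], replace=False, n_samples=n,
                 random_state=random_state)
        for label in labels
    ]
    return np.vstack(parts), np.repeat(labels, n)
```

Note that downsampling discards examples from the larger classes, which by itself can explain part of the drop from 86% to 82%.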
Which is better now? The 86%? Or should I accept the 82% from the balanced sample as the more trustworthy number?
And what also puzzles me: if I extract the 14% misclassified examples with the Filter Examples operator and train an SVM on them separately, I again get above 80% accuracy on them. So I am really not sure what this should tell me?
It seems that those 14% are really different from the other 86%? And I guess I could not combine the 2 classifiers then, right?
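For what it's worth, combining the two classifiers is possible in principle, but only if a separate "gate" model can reliably predict which examples are the hard cases, because at prediction time you don't know which of the two models to trust. A rough sketch of that two-stage idea (purely illustrative; the gate is usually the hard part):

```python
from sklearn.svm import SVC

def fit_gate(X_train, y_train, model_a):
    # 1 = example that the first model misclassifies ("hard" case)
    hard = (model_a.predict(X_train) != y_train).astype(int)
    gate = SVC(kernel="rbf", gamma="scale")
    gate.fit(X_train, hard)
    return gate

def predict_two_stage(X, gate, model_a, model_b):
    # Route each example to the model the gate believes is competent.
    use_b = gate.predict(X).astype(bool)
    pred = model_a.predict(X)
    if use_b.any():
        pred[use_b] = model_b.predict(X[use_b])
    return pred
```

If the gate cannot separate the hard 14% from the rest, the combined system effectively falls back to the first model's performance.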
That's why I wanted to try another approach: just train on the distributions of the 3 classes separately, and then see whether that helps to distinguish the classes better or not... (but I guess it will probably not do well..)
Accuracy is just one part of the evaluation. How does the precision compare across the different balancing schemes?
As I mentioned: if I leave the data unbalanced and do a 70:30 split with all the data, I get about 86-87% accuracy; if I balance all classes to absolutely equal size (about 500 examples each), I get 82% accuracy.
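To follow up on the precision point: per-class precision and recall will show whether the unbalanced 86% is carried mostly by the 50% majority class. A quick way to get them with scikit-learn (the labels below are made up for illustration):

```python
from sklearn.metrics import classification_report

# Illustrative labels only; in practice y_true / y_pred come from
# the 30% hold-out of the split described above.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0, 2, 0]
print(classification_report(y_true, y_pred, digits=3))
```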