One-class SVM performance problem

Legend · January 2010

Dear all,

I have been experimenting with the one-class LibSVM in RapidMiner, but I cannot get a single negative prediction: whatever SVM parameters I use, every example gets 100% confidence_TRUE.

Does anybody know how to get correct results from a one-class SVM in RM?

I would appreciate your response.
Kind regards,
Danny Seo.

TobiasMalbrecht · January 2010

Hi Danny,

it is quite puzzling that you get 100% confidence for the class with every parameter setting. I was able to get more reasonable results quite easily using generated data, so maybe something is wrong in your process setup or your parameters. Here is the RM5 code for the process I just set up; maybe you can use it as a guide ...


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="415" width="882">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="gaussian mixture clusters"/>
        <parameter key="number_examples" value="1000"/>
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="use_local_random_seed" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="30">
        <list key="function_descriptions">
          <parameter key="label" value="&quot;true&quot;"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
        <parameter key="name" value="label"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="581" y="30">
        <parameter key="svm_type" value="one-class"/>
        <parameter key="gamma" value="1.0"/>
        <list key="class_weights"/>
      </operator>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="715" y="30">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="SVM" to_port="training set"/>
      <connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="SVM" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Kind regards,
Tobias

Legend · January 2010

Dear Tobias Malbrecht,

Thank you for your response.
I have tested your code as shown below
(I only added the generation of some test data).

However, it always predicts "true", even when the test data is generated with bounds between 100 and 200.
How can I classify outliers?

(Is it possible by taking the confidence(true) attribute into account?)

Thanks.
Kind regards,
Danny.



<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="415" width="882">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="gaussian mixture clusters"/>
        <parameter key="number_examples" value="1000"/>
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="use_local_random_seed" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="30">
        <list key="function_descriptions">
          <parameter key="label" value="&quot;true&quot;"/>
        </list>
      </operator>
      <operator activated="true" breakpoints="after" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
        <parameter key="name" value="label"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="581" y="30">
        <parameter key="svm_type" value="one-class"/>
        <parameter key="gamma" value="1.0"/>
        <list key="class_weights"/>
      </operator>
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="120">
        <parameter key="target_function" value="gaussian mixture clusters"/>
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="attributes_upper_bound" value="30.0"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="local_random_seed" value="2010"/>
      </operator>
      <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes (2)" width="90" x="179" y="120">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="120">
        <list key="function_descriptions">
          <parameter key="label" value="&quot;false&quot;"/>
        </list>
      </operator>
      <operator activated="true" breakpoints="after" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="447" y="120">
        <parameter key="name" value="label"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="715" y="75">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="SVM" to_port="training set"/>
      <connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Generate Data (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
      <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

TobiasMalbrecht · January 2010

Hi Danny,

Legend wrote:

However, it always predicts "true", even when the test data is generated with bounds between 100 and 200.
How can I classify outliers?

(Is it possible by taking the confidence(true) attribute into account?)

Of course it predicts class "true": what else should the model do if it only describes one class? Nevertheless, the confidence attribute indicates to what extent a data point belongs to that class. You can define a threshold yourself and classify all instances below it as outliers. Alternatively, you can use an outlier detection scheme directly.
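A minimal sketch of the threshold idea, written in Python rather than as an RM process and assuming the confidence(true) values have already been exported from the Apply Model result into a plain list. The function name `label_by_threshold`, the labels "in"/"out" and the threshold 0.5 are illustrative assumptions, not RapidMiner settings.

```python
from typing import List, Sequence


def label_by_threshold(confidences: Sequence[float], threshold: float) -> List[str]:
    """Label each example "out" (outlier) if its confidence for the single
    training class is below the threshold, otherwise "in"."""
    return ["out" if c < threshold else "in" for c in confidences]


# Hypothetical confidence(true) values, e.g. exported from an Apply Model result.
confidences = [0.97, 0.88, 0.12, 0.64, 0.03]
print(label_by_threshold(confidences, threshold=0.5))
# prints ['in', 'in', 'out', 'in', 'out']
```

Where to put the threshold is a judgment call; a value just below the confidences observed on data known to be normal is one possible starting point.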

Kind regards,
Tobias

dragoljub · January 2010

I have used the C++ version of LibSVM extensively. The one-class SVM in RM does not seem to perform the same type of analysis; in particular, it does not accept more than one class label.

For example, a one-class SVM can be used to train an outlier model on data labeled with two classes. Although training ignores the labels, the resulting model should still be able to differentiate (predict) between the inside and the outside of the one-class model. Therefore RM should accept a binomial class label and predict both classes.
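The evaluation described above (train without labels, then compare inside/outside predictions with a known binomial label) can be sketched in plain Python. It assumes predictions follow LibSVM's one-class convention of +1 for inside and -1 for outside; the label names "in"/"out", the helper names and the data are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# LibSVM's one-class convention: +1 = inside the learned region, -1 = outside.
PREDICTION_NAMES = {1: "in", -1: "out"}


def confusion_counts(
    true_labels: Sequence[str], predictions: Sequence[int]
) -> Dict[Tuple[str, str], int]:
    """Count (true label, predicted label) pairs for a one-class model
    evaluated against a known binomial label."""
    if len(true_labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = (
        (label, PREDICTION_NAMES[pred])
        for label, pred in zip(true_labels, predictions)
    )
    return dict(Counter(pairs))


def accuracy(counts: Dict[Tuple[str, str], int]) -> float:
    """Fraction of examples whose predicted side matches the true label."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no examples to evaluate")
    correct = sum(n for (label, pred), n in counts.items() if label == pred)
    return correct / total


# Hypothetical test set; the model itself would be trained on "in" examples only.
labels = ["in", "in", "out", "out", "in"]
predictions = [1, 1, -1, 1, -1]
counts = confusion_counts(labels, predictions)
print(counts)
print(accuracy(counts))  # prints 0.6
```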

-Gagi

harri678 · February 2010

I am also working with LibSVM's one-class mode in RM, and I miss the classification part (controlled by the nu parameter).
In the log I find lots of entries like the following when using one-class in RM 5.0.3:


...
Feb 24, 2010 11:58:33 AM WARNING: SimpleCriterion: NaN was generated!
...

I remember the "NaN" values from java-libsvm, where NaN indicates that an example was classified as an outlier, so the classification is definitely being computed within RM. Would it be very difficult to add some kind of binominal prediction functionality, where the model result "NaN" is mapped to a prediction label like "out" and the result "1" is mapped to a prediction "in"? I can offer to contribute some code if you give me a hint as to which RM class needs these changes, and if it's not too time-consuming.
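The mapping proposed above is small enough to sketch outside RapidMiner. This Python function only illustrates the mapping logic and is not the RM patch; the function name and the "in"/"out" values are assumptions taken from the post, and -1 is treated like NaN because LibSVM's one-class mode reports outliers as -1.

```python
import math
from typing import List, Sequence


def to_binominal(raw_outputs: Sequence[float]) -> List[str]:
    """Map raw one-class outputs to a two-valued prediction.

    1 -> "in"; -1 or NaN -> "out"; anything else is rejected.
    """
    labels = []
    for value in raw_outputs:
        if math.isnan(value) or value == -1:
            labels.append("out")
        elif value == 1:
            labels.append("in")
        else:
            raise ValueError(f"unexpected one-class output: {value!r}")
    return labels


print(to_binominal([1.0, -1.0, float("nan"), 1.0]))
# prints ['in', 'out', 'out', 'in']
```

Rejecting unexpected values instead of silently mapping them keeps a mismatch between the assumed and the actual model output visible.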

Greetings,
Harald

land · February 2010

Hi Harald,
you could file a feature request on our bug tracker, but our schedule is quite full. So if you need it soon, you are very welcome to contribute the code. I would start by looking at the LibSVMModel class in the com/rapidminer/operator/learner/functions/kernel package.

Greetings,
Sebastian

harri678 · February 2010

Hi Sebastian,

I think I will try to implement it myself! My development environment is already up and running, and I've located the relevant part of the code (thanks for the tip). In one-class mode the results are either "-1" or "1", as expected, but this function does not calculate any probabilities. So I am thinking about adding an optional parameter for the one-class libsvm type that switches between the current and a new classification behavior, to maintain backward compatibility. I still need a better understanding of how the data structures (especially Attribute and Example) work together. Also, the NaN log message is not generated directly by libsvm; I'm sure it has something to do with the label attribute, and I'll look into this too.
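For reference, the "-1"/"1" outputs mentioned above come from the sign of LibSVM's one-class decision value f(x) = sum_i coef_i * K(sv_i, x) - rho. The Python sketch below reproduces that rule with the RBF kernel; the toy support vector, coefficient, rho and gamma are made-up illustration values, not taken from any RM model. Returning the raw f(x) alongside its sign shows one possible source for a graded score in a new classification mode, although how RM should turn it into a confidence is an open assumption here.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2), as defined in LibSVM."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def one_class_decision(
    x: Vector,
    support_vectors: Sequence[Vector],
    coefficients: Sequence[float],
    rho: float,
    gamma: float,
) -> Tuple[float, int]:
    """Return (f(x), label) with f(x) = sum_i coef_i * K(sv_i, x) - rho.

    The label follows LibSVM's one-class rule: +1 if f(x) > 0, otherwise -1.
    """
    value = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, coefficients)
    ) - rho
    return value, (1 if value > 0 else -1)


# Toy model with made-up parameters (a trained model would supply these).
svs = [(0.0, 0.0)]
coefs = [1.0]
print(one_class_decision((0.0, 0.0), svs, coefs, rho=0.5, gamma=1.0))
# prints (0.5, 1)
print(one_class_decision((10.0, 10.0), svs, coefs, rho=0.5, gamma=1.0))
# prints (-0.5, -1)
```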

If it works and isn't too hacky, I'd gladly contribute the code.

Greetings, Harald

UPDATE: a patch is available at http://rapid-i.com/rapidforum/index.php/topic,1746.0.html

Altair RapidMiner Community
