Optimal SVM parameters but very different results?
Hi all,
I am using the grid search parameter optimizer to determine the best parameters (C and gamma) for my SVM. The SVM is embedded in a 10-fold cross-validation.
After the process finishes I get the parameter set and a performance of 100 (!) %. (see Code 1)
When I actually apply the obtained parameters I only get 52.05 %. (see Code 2)
Code 1:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="467" width="748">
<operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
<parameter key="excel_file" value="D:\PSY-DATA\06_HERZRATEN_PROJEKT\HR_KlassDaten.xlsx"/>
<parameter key="imported_cell_range" value="A1:M4901"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Probandennummer.false.integer.attribute"/>
<parameter key="1" value="Alter.false.integer.attribute"/>
<parameter key="2" value="Altersgruppe.false.integer.attribute"/>
<parameter key="3" value="Geschlecht.false.integer.attribute"/>
<parameter key="4" value="Geschlechtsfaktor.false.integer.attribute"/>
<parameter key="5" value="Mens.false.integer.attribute"/>
<parameter key="6" value="RMSSD(ms).true.real.attribute"/>
<parameter key="7" value="mean_RR(ms).true.real.attribute"/>
<parameter key="8" value="std_RR(ms).true.real.attribute"/>
<parameter key="9" value="mean_HR.true.real.attribute"/>
<parameter key="10" value="std_HR.true.real.attribute"/>
<parameter key="11" value="label.false.polynominal.attribute"/>
<parameter key="12" value="label valenz.true.polynominal.label"/>
</list>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="5.2.008" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="447" y="30">
<list key="parameters">
<parameter key="SVM.C" value="0.03125,0.125,0.5,2,8,32,128,512,2048,8192,32768"/>
<parameter key="SVM.gamma" value="0.000030517578125,0.00012207,0.000488281,0.001953125,0.0078125,0.03125,0.125,0.5,2,8"/>
</list>
<process expanded="true" height="487" width="826">
<operator activated="true" class="x_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="179" y="75">
<parameter key="average_performances_only" value="false"/>
<process expanded="true" height="487" width="346">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.2.008" expanded="true" height="76" name="SVM" width="90" x="246" y="30">
<parameter key="gamma" value="0.001953125"/>
<parameter key="C" value="32768"/>
<parameter key="cache_size" value="250"/>
<list key="class_weights"/>
<parameter key="calculate_confidences" value="true"/>
</operator>
<connect from_port="training" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="487" width="300">
<operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="180" y="30">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Code 2:
[These processes only show my setups for troubleshooting.]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="467" width="748">
<operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
<parameter key="excel_file" value="D:\PSY-DATA\06_HERZRATEN_PROJEKT\HR_KlassDaten.xlsx"/>
<parameter key="imported_cell_range" value="A1:M4901"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Probandennummer.false.integer.attribute"/>
<parameter key="1" value="Alter.false.integer.attribute"/>
<parameter key="2" value="Altersgruppe.false.integer.attribute"/>
<parameter key="3" value="Geschlecht.false.integer.attribute"/>
<parameter key="4" value="Geschlechtsfaktor.false.integer.attribute"/>
<parameter key="5" value="Mens.false.integer.attribute"/>
<parameter key="6" value="RMSSD(ms).true.real.attribute"/>
<parameter key="7" value="mean_RR(ms).true.real.attribute"/>
<parameter key="8" value="std_RR(ms).true.real.attribute"/>
<parameter key="9" value="mean_HR.true.real.attribute"/>
<parameter key="10" value="std_HR.true.real.attribute"/>
<parameter key="11" value="label.false.polynominal.attribute"/>
<parameter key="12" value="label valenz.true.polynominal.label"/>
</list>
</operator>
<operator activated="true" class="x_validation" compatibility="5.2.008" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
<parameter key="average_performances_only" value="false"/>
<process expanded="true" height="511" width="365">
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.2.008" expanded="true" height="76" name="SVM" width="90" x="137" y="30">
<parameter key="gamma" value="3.0517578125E-5"/>
<parameter key="C" value="0.03125"/>
<parameter key="cache_size" value="250"/>
<list key="class_weights"/>
<parameter key="calculate_confidences" value="true"/>
</operator>
<connect from_port="training" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="511" width="365">
<operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="205" y="30">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
How is that possible?
Or am I doing something wrong? (By the way: my classification task is binary and my two classes are well balanced; 128 training vectors with 5 features each per class.)
Thanks a lot in advance,
Sasch
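For readers who don't use RapidMiner, here is a rough scikit-learn sketch of what Code 1 above does: a grid search over C and gamma, each candidate evaluated with a 10-fold cross-validation. The file name, label column and grid values are taken from the XML, but the sheet layout and feature column names are assumptions for illustration only, not the original process.
import pandas as pd
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Load features and label (column names are placeholders, not the real sheet layout).
data = pd.read_excel("HR_KlassDaten.xlsx")
X = data[["RMSSD(ms)", "mean_RR(ms)", "std_RR(ms)", "mean_HR", "std_HR"]]
y = data["label valenz"]

# Same C/gamma grid as in the RapidMiner process (powers of two).
param_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 2)],      # 0.03125 ... 32768
    "gamma": [2.0 ** k for k in range(-15, 4, 2)],  # ~3.05e-5 ... 8
}

# 10-fold cross-validation for every parameter combination.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)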
Answers
Are you testing your model on a different dataset than the one you trained on? -> In that case the results may vary slightly.
See these posts (they are about parameter optimization):
http://rapid-i.com/rapidforum/index.php/topic,4034.msg14915.html#msg14915
http://rapid-i.com/rapidforum/index.php/topic,4018.msg14881.html
Maybe they'll help to solve your problem ...(?)
Greetz,
Sasch
However, an accuracy of 100% is unusual. Your processes look fine, so you should have a look at your data: how many examples are you using for the optimization? Are the classes balanced? How did you create the sample? Is it drawn from the same distribution as your test data?
Best, Marius
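A quick way to answer the first two questions, reusing the hypothetical data frame and label column from the sketch further up (an assumption, not the actual spreadsheet):
# Sanity checks: sample size and class balance.
print("number of examples:", len(data))
print(data["label valenz"].value_counts())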
Thanks for taking the time to help me.
- Your processes look fine
=> Uff, thank god, first problem solved, that helped me a lot
- how many examples are you using for the optimization?
=> I have 256 examples, 128 for each class, so the classes are perfectly balanced. I also use 10-fold cross-validation for accuracy estimation.
- How did you create the sample?
=> Each example consists of 5 features and a label for the condition (negative/positive). In my case, all features are derived from heart rate data (e.g. mean, std, RMSSD).
- Is it drawn from the same distribution as your test data?
=> Yes.
So when I put the optimal parameters into an SVM and train on the same data with a 10-fold cross-validation, I only get 52 % accuracy (from my point of view that result doesn't match the phrase "vary slightly").
My problem here isn't the 100 % accuracy, it's the fatal drop of over 40 percentage points...
Thanks again,
Sasch
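What the second process does corresponds roughly to this sketch, again in scikit-learn and reusing the hypothetical X, y and the fitted search object from the earlier snippet: one fixed (C, gamma) pair evaluated with a plain 10-fold cross-validation.
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Evaluate the reported best parameters with a separate 10-fold cross-validation.
best = search.best_params_
model = SVC(kernel="rbf", C=best["C"], gamma=best["gamma"])
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print("mean accuracy:", scores.mean())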
That value does not leave much room for fluctuations during the 10 folds. Still, what's the accuracy's standard deviation in the first process?
How much data do you have in total? 256 examples is not very much; if possible you should really increase it by a factor of 10.
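One way to look at the standard-deviation question, assuming the scores array from the previous sketch: with only 256 examples each test fold holds about 26 examples, so a single misclassification already moves that fold's accuracy by roughly 4 percentage points.
import numpy as np

# Spread of the 10 fold accuracies and the granularity of a single fold.
print("per-fold accuracies:", np.round(scores, 3))
print("standard deviation:", scores.std())
print("accuracy step per example:", 1 / (256 / 10))  # ~0.039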
=> Yes, I know; we always say "god is angry" when you get a 100 % result on biosignal data
- 256 examples is not very much; if possible you should really increase it by a factor of 10
=> We are talking about biosignal data recorded from humans. It's hard to get those features exactly for the conditions we examine, and our chief would send us all to hell if we increased the number of examples artificially...
- That value does not leave much room for fluctuations during the 10 folds. Still, what's the accuracy's standard deviation in the first process?
=> Perhaps this helps:
http://imageshack.us/photo/my-images/3/61745688.jpg
What happens if you run the optimization on the test set?
What do you mean by your last question?
My second process runs with the optimal parameters on the same data set as the first one. I thought the 10-fold cross-validation would do the rest (splitting into test and training sets and so on)?
On such a small dataset the splits created by the X-Validation may have a big impact. Try to set the same local random seed for all X-Validations. That won't improve your analysis, but it will at least make the results comparable, and if the processes are set up correctly, you should get exactly the same performances for equal parameter sets.
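In scikit-learn terms, that suggestion looks roughly like the sketch below (same hypothetical X, y and param_grid as in the first snippet): give both runs the exact same split object, the analogue of setting the same local random seed in every X-Validation. With identical splits and identical parameters, the optimizer's best score and the stand-alone cross-validation should then agree exactly.
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# One fixed split object shared by the optimization and the stand-alone evaluation.
shared_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1992)

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=shared_cv, scoring="accuracy")
search.fit(X, y)

model = SVC(kernel="rbf", **search.best_params_)
scores = cross_val_score(model, X, y, cv=shared_cv, scoring="accuracy")

# With identical folds and identical parameters the two numbers should match.
print(search.best_score_, scores.mean())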