Problem/bug - CV and parameter optimization
marcin_blachnik
Member Posts: 61 Guru
Hello,
Below is a typical process with an embedded parameter optimization subprocess, but it does not work. It looks like a bug.
Best
Marcin
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
<process expanded="true" height="361" width="804">
<operator activated="true" class="retrieve" compatibility="5.2.000" expanded="true" height="60" name="Retrieve" width="90" x="49" y="88">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="x_validation" compatibility="5.2.000" expanded="true" height="112" name="Validation" width="90" x="370" y="108">
<description>A cross-validation evaluating a decision tree model.</description>
<process expanded="true" height="483" width="547">
<operator activated="true" class="multiply" compatibility="5.2.000" expanded="true" height="94" name="Multiply" width="90" x="30" y="120"/>
<operator activated="true" class="optimize_parameters_grid" compatibility="5.2.000" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="179" y="30">
<list key="parameters">
<parameter key="SVM_Opti.C" value="[0.001;1000;3;logarithmic]"/>
<parameter key="SVM_Opti.gamma" value="[0.01;1;3;linear]"/>
<parameter key="Normalize_Opti.method" value="Z-transformation,range transformation"/>
</list>
<process expanded="true" height="483" width="844">
<operator activated="true" class="x_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation (2)" width="90" x="190" y="48">
<description>A cross-validation evaluating a decision tree model.</description>
<process expanded="true" height="654" width="466">
<operator activated="true" class="normalize" compatibility="5.2.000" expanded="true" height="94" name="Normalize_Opti" width="90" x="39" y="261">
<parameter key="method" value="range transformation"/>
</operator>
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.2.000" expanded="true" height="76" name="SVM_Opti" width="90" x="208" y="264">
<parameter key="gamma" value="1.0"/>
<parameter key="C" value="1000.0"/>
<list key="class_weights"/>
</operator>
<operator activated="true" class="group_models" compatibility="5.2.000" expanded="true" height="94" name="Group Models" width="90" x="231" y="19"/>
<connect from_port="training" to_op="Normalize_Opti" to_port="example set input"/>
<connect from_op="Normalize_Opti" from_port="example set output" to_op="SVM_Opti" to_port="training set"/>
<connect from_op="Normalize_Opti" from_port="preprocessing model" to_op="Group Models" to_port="models in 1"/>
<connect from_op="SVM_Opti" from_port="model" to_op="Group Models" to_port="models in 2"/>
<connect from_op="Group Models" from_port="model out" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="654" width="466">
<operator activated="true" class="apply_model" compatibility="5.2.000" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.2.000" expanded="true" height="76" name="Performance (2)" width="90" x="179" y="30"/>
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="5.2.000" expanded="true" height="76" name="Log" width="90" x="417" y="176">
<list key="log">
<parameter key="normalize" value="operator.Normalize_Opti.parameter.method"/>
<parameter key="C" value="operator.SVM_Opti.parameter.C"/>
<parameter key="gamma" value="operator.SVM_Opti.parameter.gamma"/>
<parameter key="acc" value="operator.Validation (2).value.performance"/>
<parameter key="num" value="operator.Optimize Parameters (Grid).value.applycount"/>
</list>
</operator>
<connect from_port="input 1" to_op="Validation (2)" to_port="training"/>
<connect from_op="Validation (2)" from_port="averagable 1" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_parameters" compatibility="5.2.000" expanded="true" height="60" name="Set Parameters" width="90" x="313" y="30">
<list key="name_map">
<parameter key="SVM_Opti" value="SVM"/>
<parameter key="Normalize_Opti" value="Normalize"/>
</list>
</operator>
<operator activated="true" class="normalize" compatibility="5.2.000" expanded="true" height="94" name="Normalize" width="90" x="78" y="370"/>
<operator activated="true" class="support_vector_machine_libsvm" compatibility="5.2.000" expanded="true" height="76" name="SVM" width="90" x="246" y="165">
<parameter key="gamma" value="1.0"/>
<parameter key="C" value="1000.0"/>
<list key="class_weights"/>
</operator>
<operator activated="true" class="group_models" compatibility="5.2.000" expanded="true" height="94" name="Group Models (2)" width="90" x="380" y="300"/>
<connect from_port="training" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Multiply" from_port="output 2" to_op="Normalize" to_port="example set input"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_op="Set Parameters" to_port="parameter set"/>
<connect from_op="Normalize" from_port="example set output" to_op="SVM" to_port="training set"/>
<connect from_op="Normalize" from_port="preprocessing model" to_op="Group Models (2)" to_port="models in 1"/>
<connect from_op="SVM" from_port="model" to_op="Group Models (2)" to_port="models in 2"/>
<connect from_op="Group Models (2)" from_port="model out" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="483" width="397">
<operator activated="true" class="apply_model" compatibility="5.2.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="180" y="30"/>
<operator activated="true" class="log" compatibility="5.2.000" expanded="true" height="76" name="Log (2)" width="90" x="199" y="190">
<list key="log">
<parameter key="normalize" value="operator.Normalize.parameter.method"/>
<parameter key="C" value="operator.SVM.parameter.C"/>
<parameter key="gamma" value="operator.SVM.parameter.gamma"/>
<parameter key="acc" value="operator.Performance.value.performance"/>
</list>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_op="Log (2)" to_port="through 1"/>
<connect from_op="Log (2)" from_port="through 1" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="2"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
What is not working for you? If I run the process, everything seems to work. At least I receive a performance vector as a result.
Best,
Nils
It looks like the Normalization step is not applied in the testing process of the final CV.
This behavior isn't a bug. If you check "create view" in all Normalize and Apply Model operators, everything works as expected.
If you do NOT check "create view", every Normalize and Apply Model operator works directly on the underlying data of the loaded Iris data set. The SVM cannot handle this, because that data changes every time it is normalized.
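As a rough sketch of what this looks like in the process XML, the two operators inside the inner validation could carry the setting as shown below. The parameter key `create_view` is an assumption about how the "create view" checkbox is serialized, and the layout attributes (height, width, x, y) are left out for brevity; the same setting would apply to the outer Normalize and Apply Model operators.

```xml
<!-- Sketch only: the create_view key is assumed to correspond to the "create view" checkbox -->
<operator activated="true" class="normalize" compatibility="5.2.000" expanded="true" name="Normalize_Opti">
  <parameter key="method" value="range transformation"/>
  <parameter key="create_view" value="true"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.2.000" expanded="true" name="Apply Model (2)">
  <list key="application_parameters"/>
  <parameter key="create_view" value="true"/>
</operator>
```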
Best,
Nils
In my opinion, the current implementation of Normalization, and of other RM operators as well, is very dangerous. I have used that process for many real problems, and now I don't know which results are good and which are bad.
Moreover, after running this process the output exampleTable includes 534 attributes, which cannot be garbage-collected by the JVM. That may lead to out-of-memory problems.
An alternative solution is to create a new exampleTable whenever an operator modifies the data. This can also be inefficient when there is a single numerical attribute and many categorical ones, but at least it would be an error-free solution.
Best regards
Marcin
But where exactly do you get an exampleTable with 534 attributes? When I run the process with Materialize Data and then look at the output port of the validation,
there are only 4 regular and 2 special attributes.
Best,
Nils
I think this may be the problem that is also described in another thread (http://rapid-i.com/rapidforum/index.php/topic,4195.0.html). So any operator that adds new attributes to the exampleTable, such as normalization or PCA, should be used very carefully, especially inside loop operators and process optimization. My observation is that whenever one uses an operator of that kind inside a loop or an optimization, the subprocess should start with MaterializeData. The only open question is which operators add new attributes. What is your view on this?
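To make the suggestion concrete, a training subprocess that starts with a materialization step might be wired roughly as follows. This is only a sketch: the operator key `materialize_data` and its port names are assumptions, layout attributes are omitted, and the remaining operators and connections of the subprocess are not repeated here.

```xml
<process expanded="true">
  <!-- Assumed operator key and port names for Materialize Data -->
  <operator activated="true" class="materialize_data" compatibility="5.2.000" expanded="true" name="Materialize Data"/>
  <operator activated="true" class="normalize" compatibility="5.2.000" expanded="true" name="Normalize_Opti">
    <parameter key="method" value="range transformation"/>
  </operator>
  <!-- The incoming training data is copied into a fresh table before it is normalized -->
  <connect from_port="training" to_op="Materialize Data" to_port="example set input"/>
  <connect from_op="Materialize Data" from_port="example set output" to_op="Normalize_Opti" to_port="example set input"/>
</process>
```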
Thank you very much for all your answers
Marcin
I just ran the incorrect process again and still received only 4 attributes. Did you use the newest version, 5.2.1? The problem in the thread you mentioned should have been fixed in 5.2.1.
If you did use version 5.2.1, where exactly did you notice the huge number of attributes? Did you see them in the result view, as an example set result of the first CV?
Best,
Nils
Marcin