Save best model each iteration
Hi everyone,
I've got a YAGGA process set up and working just fine. However, I'd like to output the best model found for every generation (not just the best attributes). By "best" I mean the one with the highest performance found so far. I would be happy if the same file was written to over and over again, as long as I could always stop the process at any time after the first generation/iteration and have the best model saved for me. I would like to do this not just for YAGGA but for just about any other iterating RapidMiner process. That way I could stop long-running processes before they complete and not worry about "losing" the information that they have found so far.
Could someone post a simple example on how to do this?
Thank you!
Answers
Unfortunately, there is currently no easy way to do this. At least I have no clue how to do it, because the YAGGA operator does not provide this information to the process.
One could extend the ProcessBranch operator so that it writes the model out whenever the performance is the best known so far, but that would need some programming work. Another possibility would be to use the Script operator to do something equivalent, or at least to write the performance value into a macro, which could then be used by the process branch.
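To make the idea concrete, here is a minimal sketch of the "write the model only when it is the best known so far" logic. It is plain Python, not RapidMiner code: the class name `BestModelCheckpoint`, the file layout, and the use of `pickle` are all assumptions made for illustration, and a Script operator or an extended ProcessBranch would have to express the same steps with RapidMiner's own objects.

```python
import os
import pickle
import tempfile


class BestModelCheckpoint:
    """Keep the best-scoring model seen so far in one file on disk.

    Hypothetical helper for illustration; not part of RapidMiner.
    """

    def __init__(self, path, higher_is_better=True):
        self.path = path
        self.higher_is_better = higher_is_better
        # Plays the role of the macro that remembers the best performance.
        self.best_score = None

    def update(self, model, score):
        """Rewrite the file only if `score` beats the best score so far.

        Returns True when the file was rewritten, False otherwise.
        """
        if self.best_score is not None:
            if self.higher_is_better:
                improved = score > self.best_score
            else:
                improved = score < self.best_score
            if not improved:
                return False

        # Write to a temporary file and rename it afterwards, so stopping
        # the run mid-write does not leave a half-written model file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as handle:
            pickle.dump({"score": score, "model": model}, handle)
        os.replace(tmp_path, self.path)

        self.best_score = score
        return True
```

Calling `update(model, score)` once per iteration keeps overwriting the same file, so the run can be stopped at any point after the first iteration and the file still holds the best model found up to then.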
Greetings,
Sebastian
I am trying to figure out how to use condition_type = "max_performance_value". I'd like to write out a model only when the performance exceeds the best performance encountered so far.
Here's my rough draft (it isn't working, which is why ProcessBranch is disabled):
<operator name="Root" class="Process" expanded="yes">
  <parameter key="logverbosity" value="warning"/>
  <operator name="LoadData" class="OperatorChain" expanded="yes">
    <operator name="MacroDefinition" class="MacroDefinition">
      <list key="macros">
        <parameter key="baseName" value="test"/>
      </list>
    </operator>
    <operator name="ExampleSource" class="ExampleSource">
      <parameter key="attributes" value="daily2.att"/>
    </operator>
  </operator>
  <operator name="GeneratingGeneticAlgorithm" class="GeneratingGeneticAlgorithm" expanded="yes">
    <parameter key="population_size" value="25"/>
    <parameter key="maximum_number_of_generations" value="1000"/>
    <parameter key="generations_without_improval" value="5"/>
    <parameter key="keep_best_individual" value="true"/>
    <parameter key="p_initialize" value="0.05"/>
    <parameter key="use_plus" value="false"/>
    <parameter key="use_diff" value="true"/>
    <parameter key="use_div" value="true"/>
    <parameter key="max_number_of_new_attributes" value="2"/>
    <operator name="XValidation" class="XValidation" expanded="yes">
      <parameter key="keep_example_set" value="true"/>
      <parameter key="number_of_validations" value="5"/>
      <parameter key="sampling_type" value="shuffled sampling"/>
      <operator name="OperatorChain" class="OperatorChain" expanded="no">
        <operator name="LinearRegression" class="LinearRegression">
          <parameter key="keep_example_set" value="true"/>
          <parameter key="feature_selection" value="none"/>
          <parameter key="eliminate_colinear_features" value="false"/>
        </operator>
      </operator>
      <operator name="ApplierChain" class="OperatorChain" expanded="yes">
        <operator name="Applier" class="ModelApplier">
          <parameter key="keep_model" value="true"/>
          <list key="application_parameters">
          </list>
        </operator>
        <operator name="RegressionPerformance" class="RegressionPerformance">
          <parameter key="main_criterion" value="spearman_rho"/>
          <parameter key="spearman_rho" value="true"/>
          <parameter key="use_example_weights" value="false"/>
        </operator>
        <operator name="ProcessBranch" class="ProcessBranch" activated="no" expanded="yes">
          <parameter key="condition_type" value="max_performance_value"/>
          <parameter key="condition_value" value="1"/>
          <operator name="ModelWriter" class="ModelWriter" breakpoints="after">
            <parameter key="model_file" value="testBestModel"/>
            <parameter key="output_type" value="XML"/>
          </operator>
        </operator>
        <operator name="ProcessLog" class="ProcessLog">
          <list key="log">
            <parameter key="Perf" value="operator.RegressionPerformance.value.performance"/>
            <parameter key="Tries" value="operator.RegressionPerformance.value.applycount"/>
          </list>
        </operator>
      </operator>
    </operator>
  </operator>
</operator>
When the ProcessBranch is enabled, it writes a model file out on every iteration (instead of only when a higher max_performance_value is encountered).
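One possible explanation, which I have not verified against the ProcessBranch documentation: if max_performance_value simply tests whether the current performance is at most the fixed condition_value, then a condition_value of 1 is always satisfied for Spearman's rho (which never exceeds 1), so the branch fires every time. The sketch below contrasts that assumed fixed-threshold check with the running-maximum check that was actually wanted. It is plain Python for illustration only; the function names and the rho values are made up.

```python
def fires_with_fixed_threshold(performances, max_value):
    """Iterations where performance <= max_value.

    This is the ASSUMED behaviour of condition_type "max_performance_value".
    """
    return [i for i, p in enumerate(performances) if p <= max_value]


def fires_on_new_best(performances):
    """Iterations where performance exceeds every earlier value."""
    fired = []
    best = float("-inf")
    for i, p in enumerate(performances):
        if p > best:
            best = p
            fired.append(i)
    return fired


# Made-up Spearman rho values for five iterations.
rhos = [0.31, 0.28, 0.45, 0.45, 0.52]

print(fires_with_fixed_threshold(rhos, 1.0))  # [0, 1, 2, 3, 4]
print(fires_on_new_best(rhos))                # [0, 2, 4]
```

The second function needs state that survives from one iteration to the next (the `best` variable), which a fixed condition_value cannot provide.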
Unfortunately, it is not that simple. You will have to store the maximum performance achieved so far in a macro, and for that you will need the MacroConstruction operator.
But to get the current iteration's performance into a macro in the first place, you will have to extract it somehow: either with the Script operator, or by the more complicated route of Logging / ProcessLog to ExampleSet / DataMacroDefinition.
I'm not quite sure about your application, but I don't see the point of going to this effort to write out the best model generated during a cross-validation. The cross-validation is only there to estimate the performance, and since each of its models is trained on only part of the data, it will generally perform worse than a model trained on all available data.
Greetings,
Sebastian
Are you sure the problem is YAGGA and not the inner operators? Could you post your process? I would then take a quick look.
Greetings,
Sebastian