Question
I was wondering how the maximal number of XValidations embedded into an EvolutionaryParameterOptimization
can be determined.
My settings for the evolutionary parameter optimization are:
"max_generations" value="5"
"generations_without_improval" value="-1" (set on purpose, to make things clearer)
"population_size" value="20"
"tournament_fraction" value="0.3"
And for the XValidation, the parameter "number_of_validations" is set to 2.
Here is the corresponding code:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="../data/polynomial.aml"/>
    </operator>
    <operator name="ParameterOptimization" class="EvolutionaryParameterOptimization" expanded="yes">
        <list key="parameters">
            <parameter key="LibSVMLearner.C" value="0.1:100"/>
            <parameter key="LibSVMLearner.degree" value="2:7"/>
        </list>
        <parameter key="max_generations" value="5"/>
        <parameter key="generations_without_improval" value="-1"/>
        <parameter key="population_size" value="20"/>
        <parameter key="tournament_fraction" value="0.3"/>
        <parameter key="local_random_seed" value="2001"/>
        <parameter key="show_convergence_plot" value="true"/>
        <operator name="Validation" class="XValidation" expanded="yes">
            <parameter key="number_of_validations" value="2"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="LibSVMLearner" class="LibSVMLearner">
                <parameter key="svm_type" value="epsilon-SVR"/>
                <parameter key="kernel_type" value="poly"/>
                <parameter key="C" value="76.53909856172457"/>
                <list key="class_weights">
                </list>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
        <operator name="Log" class="ProcessLog">
            <parameter key="filename" value="paraopt.log"/>
            <list key="log">
                <parameter key="C" value="operator.LibSVMLearner.parameter.C"/>
                <parameter key="degree" value="operator.LibSVMLearner.parameter.degree"/>
                <parameter key="performance" value="operator.Validation.value.performance"/>
                <parameter key="iterations" value="operator.Validation.value.iteration"/>
            </list>
        </operator>
    </operator>
</operator>
I would expect that for each individual within a population 2 validations are performed. Since the population size is 20, there are 2*20 = 40 validations in each generation, so with 5 generations I would expect 200 validations in total.
But when I check the output of the ProcessLog operator, the parameter optimization computes 248 performance values, each of which should, in my opinion, represent one individual with 2 iterations (the two runs of the cross-validation). Thus 2*248 = 496 validations are performed in total. Why not just 200?
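The arithmetic behind the question can be checked with a quick sketch. This is only back-of-the-envelope counting, not RapidMiner internals; the variable names simply mirror the process parameters above:

```python
# Back-of-the-envelope count of cross-validation runs (sketch, not
# RapidMiner internals). Names mirror the process parameters above.

population_size = 20   # "population_size" of the parameter optimization
folds = 2              # "number_of_validations" of the XValidation
generations = 5        # "max_generations"

# Naive expectation: only the population itself is evaluated each generation.
naive_individuals = population_size * generations   # 100 individuals
naive_validations = naive_individuals * folds
print(naive_validations)  # -> 200

# Observed: 248 performance values were logged, i.e. 248 fitness
# evaluations, each consisting of `folds` validation runs.
logged_evaluations = 248
total_validation_runs = logged_evaluations * folds
print(total_validation_runs)  # -> 496
```

The gap between 200 expected and 248 logged evaluations is exactly what the thread goes on to discuss: extra evaluations for offspring created by crossover and mutation.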
Marcus
Answers
I think the population_size parameter specifies the size of the initial population, which might change in the next generations. This might cause the deviation from the expected number.
Greetings,
Sebastian
Evolutionary algorithms with a variable population size are IMHO not that common. Do you have by any chance a reference (paper/URL/book) that describes the principles used in RapidMiner for this parameter optimization?
So, does this mean that the number of validations cannot be
bounded by a maximal number of validations?
Marcus
I had asked a similar question a few months ago, and Ingo gave a little more background on what RM does behind the scenes with evolutionary algorithms:
http://rapid-i.com/rapidforum/index.php/topic,344.0.html
Hope this helps,
Keith
I would have assumed that you evaluate the individuals of generation n, then select some of them for crossover and mutation, and finally put the resulting (possibly new) individuals into generation n+1. In the next round, all new individuals of generation n+1 are evaluated. Thus I would expect that in each generation at most p individuals are evaluated, where p is the population size. But this seems to be wrong. It looks as if the new offspring individuals (after crossover and mutation) are evaluated, but their fitness values are then dropped, so that they have to be re-evaluated in generation n+1.
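If fitness values of offspring really were dropped and recomputed, memoizing them would avoid the redundant cross-validation runs. A minimal sketch of such a cache, keyed by the parameter values of an individual (this is an illustration of the idea, not RapidMiner's actual code):

```python
# Hypothetical fitness memoization (illustration only, not RapidMiner's
# code): an individual whose parameters were already evaluated reuses the
# stored fitness instead of triggering another cross-validation.

fitness_cache = {}
evaluations = 0  # counts real (expensive) cross-validation runs

def evaluate(individual, cross_validate):
    """Return the cached fitness, or run the expensive evaluation once."""
    global evaluations
    key = tuple(individual)  # e.g. the (C, degree) parameter values
    if key not in fitness_cache:
        fitness_cache[key] = cross_validate(individual)
        evaluations += 1
    return fitness_cache[key]

# Toy usage: the same individual appearing in generation n and n+1
# triggers only one real evaluation.
cv = lambda ind: sum(ind)   # stand-in for the 2-fold XValidation
evaluate([76.5, 3], cv)
evaluate([76.5, 3], cv)     # cache hit, no re-evaluation
print(evaluations)  # -> 1
```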
Marcus
It might happen that individuals both mutate and cross over, so that the number of evaluations can exceed the population size.
Greetings,
Sebastian
After the first generation is created with an initial population, the fitness values for each individual are computed. Then, in the selection phase, the fittest individuals (a fraction specified, for example, by 'tournament_fraction') are determined. From those, pairs are randomly selected and crossover is performed with probability 'crossover_prob'. For these new individuals the fitness must be evaluated, so after this step we possibly have some more individuals due to the additional children.
Next, mutation is applied to these children. For the mutated individuals, fitness evaluation must again be performed, so in addition to the "crossover" children we may get new "mutation" children. Together with the parents, these individuals form the offspring.
Finally, the reinsertion step selects the fittest individuals from the offspring and inserts them into the next generation. Which reinsertion strategy is actually used? Depending on the strategy, I assume that, as Sebastian wrote previously, the population size might become larger or smaller in the following generation.
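The generational loop described above (selection, crossover, mutation, reinsertion) can be sketched to show why the evaluation count exceeds population_size per generation. The probabilities, the objective, and the best-of reinsertion rule below are illustrative assumptions, not RapidMiner's actual settings:

```python
import random

# Sketch of the generational loop described above, counting fitness
# evaluations. Probabilities, objective, and the best-of reinsertion rule
# are illustrative assumptions, NOT RapidMiner's implementation.

random.seed(2001)

POP_SIZE, GENERATIONS = 20, 5
CROSSOVER_PROB, MUTATION_PROB = 0.9, 0.25

def fitness(ind):
    return -sum((x - 3) ** 2 for x in ind)   # toy objective to maximize

evaluations = 0
def evaluate(ind):
    global evaluations
    evaluations += 1                          # each call = one full XValidation
    return fitness(ind)

population = [[random.uniform(0, 10) for _ in range(2)] for _ in range(POP_SIZE)]
scores = [evaluate(ind) for ind in population]        # evaluate initial population

for _ in range(GENERATIONS):
    # selection of parent pairs, then crossover with CROSSOVER_PROB
    offspring = []
    for _ in range(POP_SIZE // 2):
        a, b = random.sample(range(POP_SIZE), 2)
        p1, p2 = population[a], population[b]
        if random.random() < CROSSOVER_PROB:
            offspring += [p1[:1] + p2[1:], p2[:1] + p1[1:]]
    # mutation on the children
    offspring = [[x + random.gauss(0, 1) if random.random() < MUTATION_PROB else x
                  for x in ind] for ind in offspring]
    # every new child needs its own fitness evaluation
    child_scores = [evaluate(ind) for ind in offspring]
    # reinsertion: keep the fittest POP_SIZE of parents + children
    merged = sorted(zip(scores + child_scores, population + offspring),
                    key=lambda t: t[0], reverse=True)[:POP_SIZE]
    scores, population = [s for s, _ in merged], [i for _, i in merged]

# 20 initial evaluations plus up to 20 children per generation:
# more than POP_SIZE * GENERATIONS = 100 naive evaluations would suggest
# per-individual, and never a fixed, predictable total.
print(evaluations)
```

Under this model the total depends on how many crossovers actually fire each generation, which is why a hard upper bound on the number of validations is difficult to state in advance.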
The evolutionary parameter optimization has the nice feature 'show_convergence_plot'. How exactly is the blue curve computed? I would imagine that for each generation the average performance is computed.
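Under that assumption, the curve would be a per-generation average of the logged performance values. A tiny sketch with made-up log data (the actual plot logic in RapidMiner may differ):

```python
# Per-generation averaging, as one plausible way to compute a convergence
# curve (assumption; RapidMiner's actual plot logic may differ).

# hypothetical log: (generation, performance) pairs
log = [(1, 0.80), (1, 0.84), (2, 0.86), (2, 0.90), (3, 0.91)]

by_generation = {}
for gen, perf in log:
    by_generation.setdefault(gen, []).append(perf)

curve = {gen: round(sum(vals) / len(vals), 4)
         for gen, vals in by_generation.items()}
print(curve)  # -> {1: 0.82, 2: 0.88, 3: 0.91}
```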
Marcus
I'm sorry, but I'm neither a specialist in this topic nor have I written these operators. Everybody who participated in writing this part of RapidMiner is currently out of office for various reasons, so I cannot give a definitive answer...
Greetings,
Sebastian