# Question

I was wondering how the maximal number of XValidations embedded into an EvolutionaryParameterOptimization can be determined.

My settings for the evolutionary parameter optimization are:

"max_generations" value="5"

"generations_without_improval" value="-1" (on purpose to make things more clear)

"population_size" value="20"

"tournament_fraction" value="0.3"

And for the XValidation, the parameter "number_of_validations" is set to 2.

I would expect that for each individual (within a population) 2 validations are performed. Since the population size is 20, there are 2*20=40 validations in each generation. Using 5 generations, I would expect 200 validations in total.

But when I check the output of the ProcessLog operator, the parameter optimization computes 248 performance values, each of which in my opinion should represent one individual, with 2 iterations (the two runs of the validation).

Thus, 2*248=496 validations are performed in total. Why not just 200?

Here is the corresponding code:

```xml
<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSource" class="ExampleSource">
    <parameter key="attributes" value="../data/polynomial.aml"/>
  </operator>
  <operator name="ParameterOptimization" class="EvolutionaryParameterOptimization" expanded="yes">
    <list key="parameters">
      <parameter key="LibSVMLearner.C" value="0.1:100"/>
      <parameter key="LibSVMLearner.degree" value="2:7"/>
    </list>
    <parameter key="max_generations" value="5"/>
    <parameter key="generations_without_improval" value="-1"/>
    <parameter key="population_size" value="20"/>
    <parameter key="tournament_fraction" value="0.3"/>
    <parameter key="local_random_seed" value="2001"/>
    <parameter key="show_convergence_plot" value="true"/>
    <operator name="Validation" class="XValidation" expanded="yes">
      <parameter key="number_of_validations" value="2"/>
      <parameter key="sampling_type" value="shuffled sampling"/>
      <operator name="LibSVMLearner" class="LibSVMLearner">
        <parameter key="svm_type" value="epsilon-SVR"/>
        <parameter key="kernel_type" value="poly"/>
        <parameter key="C" value="76.53909856172457"/>
        <list key="class_weights">
        </list>
      </operator>
      <operator name="ApplierChain" class="OperatorChain" expanded="yes">
        <operator name="Test" class="ModelApplier">
          <list key="application_parameters">
          </list>
        </operator>
        <operator name="Performance" class="Performance">
        </operator>
      </operator>
    </operator>
    <operator name="Log" class="ProcessLog">
      <parameter key="filename" value="paraopt.log"/>
      <list key="log">
        <parameter key="C" value="operator.LibSVMLearner.parameter.C"/>
        <parameter key="degree" value="operator.LibSVMLearner.parameter.degree"/>
        <parameter key="performance" value="operator.Validation.value.performance"/>
        <parameter key="iterations" value="operator.Validation.value.iteration"/>
      </list>
    </operator>
  </operator>
</operator>
```
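
For reference, the arithmetic behind the expected and the observed counts can be sketched as follows. This is only a back-of-the-envelope check based on the settings in the process; the variable names are made up for illustration, and 248 is the number of performance values read off the ProcessLog output.

```python
# Settings taken from the process definition.
max_generations = 5
population_size = 20
number_of_validations = 2  # validation runs per XValidation

# Expected: every individual is evaluated exactly once per generation.
expected_individuals = max_generations * population_size
expected_validations = expected_individuals * number_of_validations

# Observed: number of performance values written by ProcessLog.
observed_individuals = 248
observed_validations = observed_individuals * number_of_validations

print(expected_individuals, expected_validations)  # 100 200
print(observed_individuals, observed_validations)  # 248 496

# Average number of evaluated individuals per generation, assuming the
# evaluations were spread evenly over the 5 generations.
print(observed_individuals / max_generations)      # 49.6
```

So the observed count is roughly 2.5 times the expected one, or about 50 evaluated individuals per generation instead of 20.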

Marcus

## Answers

I think the population_size parameter specifies the size of the initial population, which might change in the next generations. This might cause the deviation from the expected number.

Greetings,

Sebastian

Evolutionary algorithms with a variable population size are IMHO not that common. Do you have by any chance a reference (paper/URL/book) that describes the principles you are using in RapidMiner for this parameter optimization?

So, does this mean that the number of validations cannot be bounded by a maximal number of validations?

Marcus

I had asked a similar question a few months ago, and Ingo gave a little more background on what RM does behind the scenes with evolutionary algorithms:

http://rapid-i.com/rapidforum/index.php/topic,344.0.html

Hope this helps,

Keith

[...] from the generation.

I would assume that you have evaluated the individuals from generation n, then select some of them for crossover and mutation, and finally put these possibly new individuals into generation n+1. In the next round, all individuals (if new) from generation n+1 are then evaluated. Thus, I would expect that in each generation at most p individuals, with p being the population size, are evaluated. But this seems to be wrong. It seems to me that the new offspring individuals (after crossover and mutation) are evaluated, but their fitness values are dropped, so that they have to be re-evaluated in generation n+1.

Marcus

It might happen that two individuals mutate AND cross over, so that the number might increase beyond the population size.

Greetings,

Sebastian

After the first generation with an initial population, the fitness values for each individual are computed.

Then, in the selection phase, the fittest individuals (the fraction is specified, for example, by 'tournament_fraction') are determined. From those, pairs are randomly selected and crossover is performed with probability 'crossover_prob'. For these new individuals the fitness must be evaluated. So, after this step we possibly have some more individuals due to the additional children.

Next, mutation is applied to these children. For the mutated individuals, fitness evaluation must be performed again. So, in addition to the "crossover" children we may get new "mutation" children. Together with the parents, these individuals represent the offspring.

Finally, the reinsertion step is performed by selecting the fittest individuals from the offspring and inserting them into the next generation. Which reinsertion strategy are you actually using? Depending on the strategy, I assume that, as Sebastian wrote previously, the population size might become larger or smaller in the following generation.
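
To make the steps above concrete, here is a minimal sketch that only counts fitness evaluations under this assumed scheme. It is not RapidMiner's implementation: the individuals are plain floats, the fitness function is a stand-in for one XValidation run, `mutation_prob` and `count_evaluations` are hypothetical names, and the number of crossover attempts per generation (one per population slot) is an assumption as well.

```python
import random


def count_evaluations(generations, population_size, tournament_fraction,
                      crossover_prob, mutation_prob, seed=2001):
    """Count fitness evaluations for the selection/crossover/mutation/
    reinsertion scheme described above (hypothetical model)."""
    rng = random.Random(seed)
    evaluations = 0

    def fitness(x):
        # Stand-in for one XValidation run; every call is counted.
        nonlocal evaluations
        evaluations += 1
        return -(x - 3.0) ** 2

    # Initial population: every individual is evaluated once.
    population = []
    for _ in range(population_size):
        x = rng.uniform(0.0, 10.0)
        population.append((x, fitness(x)))

    for _generation in range(generations):
        # Selection: keep the fittest fraction as parents.
        population.sort(key=lambda ind: ind[1], reverse=True)
        n_parents = max(2, int(tournament_fraction * population_size))
        parents = population[:n_parents]

        # Crossover: random parent pairs produce a child with
        # probability crossover_prob; each child is evaluated.
        children = []
        for _attempt in range(population_size):
            a, b = rng.sample(parents, 2)
            if rng.random() < crossover_prob:
                child = (a[0] + b[0]) / 2.0
                children.append((child, fitness(child)))

        # Mutation: a child may yield an additional mutated individual,
        # which is evaluated as well.
        mutants = []
        for x, _fit in children:
            if rng.random() < mutation_prob:
                m = x + rng.gauss(0.0, 1.0)
                mutants.append((m, fitness(m)))

        # Reinsertion: the fittest of parents and offspring survive.
        pool = parents + children + mutants
        pool.sort(key=lambda ind: ind[1], reverse=True)
        population = pool[:population_size]

    return evaluations


if __name__ == "__main__":
    evals = count_evaluations(generations=5, population_size=20,
                              tournament_fraction=0.3,
                              crossover_prob=0.9, mutation_prob=0.1)
    # Each evaluation corresponds to number_of_validations = 2 runs.
    print(evals, 2 * evals)
```

In this model the count lies between `population_size` and `population_size + generations * 2 * population_size` (220 for the settings in the question), so it can exceed the 100 evaluations expected from generations times population size. It does not by itself reach the 248 observed evaluations, so the real operator presumably evaluates individuals in some additional way.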

The evolutionary parameter optimization has the nice feature 'show_convergence_plot'. How is the blue curve actually computed? I could imagine that the average performance is computed for each generation.

Marcus

I'm sorry, but I'm neither a specialist in this topic nor have I written these operators. Everybody who has participated in writing this part of RapidMiner is currently out of office for various reasons. So I cannot give any definitive answer...

Greetings,

Sebastian