
Apparent "memory leak" with GeneticAlgorithm attribute selection

Mark (Member, Contributor II)
The following simple process works if maximum_number_of_generations is set to a relatively small value. When I increase it to a larger value (3000 in this case), RapidMiner uses more and more memory as it optimizes the attribute selection. After an hour or two, it has used up all the available memory on my computer. If I save the PerformanceVector to a file after the process finishes, I get a *.per file that is gigabytes in size. Is RapidMiner accumulating data related to the PerformanceVector in memory while the process runs? If so, can I turn this off? Or is there something else I can do to reduce the memory footprint?

Thanks,

Mark

<operator name="Root" class="Process" expanded="yes">
    <parameter key="logfile" value="*******/logfile.log"/>
    <parameter key="resultfile" value="******/results.res"/>
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="******.xls"/>
        <parameter key="first_row_as_names" value="true"/>
        <parameter key="id_column" value="1"/>
        <parameter key="label_column" value="34"/>
    </operator>
    <operator name="GeneticAlgorithm" class="GeneticAlgorithm" expanded="yes">
        <parameter key="maximum_number_of_generations" value="3000"/>
        <parameter key="show_stop_dialog" value="true"/>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="NaiveBayes" class="NaiveBayes">
                </operator>
            </operator>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
</operator>

Answers

    IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, RM Founder)
    Hi Mark,

    we are currently planning a revision of the code for the feature selection / weighting / construction operators. One of the reasons is the large amount of memory these operators use for larger numbers of individuals and/or generations. The new code base will probably be part of the upcoming 4.3 release, together with a new mechanism for feature construction/generation. We will take this report into account in order to reduce the memory footprint.

    For the moment, you could try multiple starts (multistarts) instead of a large number of generations. In many applications, several shorter runs with different random seeds reach a performance similar to that of one long optimization run. You could use the IteratingOperatorChain operator for this (set the inner random seed to -1!).
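A minimal sketch of the multistart setup described above, written in the same RapidMiner 4.x process format as the question. It is not taken from this thread and rests on assumptions: that IteratingOperatorChain takes an `iterations` parameter, and that GeneticAlgorithm exposes its inner seed as `local_random_seed`, where -1 means "use the global random generator" so that each iteration draws a different seed. Check both parameter names against the operator reference of your version. The values 10 and 300 are purely illustrative. The data source sits inside the loop so that each run starts from a freshly loaded example set.

```xml
<operator name="Root" class="Process" expanded="yes">
    <!-- Several short GA runs instead of one 3000-generation run -->
    <operator name="IteratingOperatorChain" class="IteratingOperatorChain" expanded="yes">
        <!-- Assumed parameter name: number of independent restarts (illustrative value) -->
        <parameter key="iterations" value="10"/>
        <!-- Reload the data in every iteration so each run starts from the full attribute set -->
        <operator name="ExcelExampleSource" class="ExcelExampleSource">
            <parameter key="excel_file" value="******.xls"/>
            <parameter key="first_row_as_names" value="true"/>
            <parameter key="id_column" value="1"/>
            <parameter key="label_column" value="34"/>
        </operator>
        <operator name="GeneticAlgorithm" class="GeneticAlgorithm" expanded="yes">
            <!-- Much smaller generation limit per run (illustrative value) -->
            <parameter key="maximum_number_of_generations" value="300"/>
            <!-- Assumed parameter name; -1 means use the global random generator, so runs differ -->
            <parameter key="local_random_seed" value="-1"/>
            <operator name="XValidation" class="XValidation" expanded="yes">
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="NaiveBayes" class="NaiveBayes">
                    </operator>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="Performance" class="Performance">
                    </operator>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>
```

The idea behind this layout is that every run stays at a generation count of the size that already worked in the question, and the runs differ only through their random seeds. The best attribute subset would then be chosen by comparing the performance reported for each run.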

    Cheers,
    Ingo



    Mark (Member, Contributor II)
    Hello Ingo,

    I just ran a similar process using GeneticAlgorithm selection with the 4.5 release, and it ran without a problem.  Thank you for following up on this!

    Mark