Save best model each iteration

BAMBAMBAM Member Posts: 20 Maven
edited November 2018 in Help
Hi everyone,

I've got a YAGGA process set up and working just fine. However, I'd like to output the best model found in every generation (not just the best attributes). By "best" I mean the one with the highest performance found so far. I would be happy if the same file were overwritten again and again, as long as I could stop the process at any time after the first generation/iteration and have the best model saved for me. I would like to do this not just for YAGGA but for just about any other iterating RapidMiner process. That way I could stop long-running processes before they complete and not worry about "losing" the information they have found so far.

Could someone post a simple example on how to do this?

Thank you!


Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    unfortunately there is currently no easy way to do this. At least I don't know of one, because the YAGGA operator does not expose this information to the process.
    One could extend the ProcessBranch operator so that it writes the model out whenever the performance is the best seen so far, but that would need some programming work. Another possibility would be to use the Script operator to do something equivalent, or at least to write the performance value into a macro, which ProcessBranch could then work with.

    Greetings,
      Sebastian
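The "write the model out only if the performance is the best known so far" idea can be sketched outside RapidMiner. This is a minimal Python illustration of the logic only, not RapidMiner or Script operator code: the class name `BestSoFarCheckpoint`, the method `offer`, and the JSON file layout are all hypothetical, and the "model" is assumed to be any JSON-serialisable object such as a list of attribute names.

```python
import json
import os
import tempfile


class BestSoFarCheckpoint:
    """Overwrite one file whenever a strictly better performance is seen.

    Hypothetical helper for illustration; not part of any RapidMiner API.
    """

    def __init__(self, path):
        self.path = path
        self.best_performance = None  # nothing recorded yet

    def offer(self, performance, model):
        """Save `model` if `performance` beats the best seen so far.

        Returns True when the file was (re)written, False otherwise.
        """
        if self.best_performance is not None and performance <= self.best_performance:
            return False
        self.best_performance = performance
        # Write to a temporary file, then rename it over the target, so that
        # stopping the process mid-write does not leave a truncated file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump({"performance": performance, "model": model}, handle)
        os.replace(tmp_path, self.path)
        return True
```

Calling `offer(performance, model)` once per iteration keeps a single file that always holds the best result so far, which is what makes it safe to stop a long run early.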
  • BAMBAMBAM Member Posts: 20 Maven
    k thanks, if I figure something out I will let you guys know!
  • BAMBAMBAM Member Posts: 20 Maven
    Could someone please give an example of how to use ProcessBranch without branching on an attribute value?

    I am trying to figure out how to use "condition type" = "max_performance_value" ... I'd like to write out a model only when the performance exceeds the previously encountered performance.

    Here's my rough draft (it isn't working, which is why ProcessBranch is disabled):
    <operator name="Root" class="Process" expanded="yes">
        <parameter key="logverbosity" value="warning"/>
        <operator name="LoadData" class="OperatorChain" expanded="yes">
            <operator name="MacroDefinition" class="MacroDefinition">
                <list key="macros">
                  <parameter key="baseName" value="test"/>
                </list>
            </operator>
            <operator name="ExampleSource" class="ExampleSource">
                <parameter key="attributes" value="daily2.att"/>
            </operator>
        </operator>
        <operator name="GeneratingGeneticAlgorithm" class="GeneratingGeneticAlgorithm" expanded="yes">
            <parameter key="population_size" value="25"/>
            <parameter key="maximum_number_of_generations" value="1000"/>
            <parameter key="generations_without_improval" value="5"/>
            <parameter key="keep_best_individual" value="true"/>
            <parameter key="p_initialize" value="0.05"/>
            <parameter key="use_plus" value="false"/>
            <parameter key="use_diff" value="true"/>
            <parameter key="use_div" value="true"/>
            <parameter key="max_number_of_new_attributes" value="2"/>
            <operator name="XValidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="number_of_validations" value="5"/>
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="OperatorChain" class="OperatorChain" expanded="no">
                    <operator name="LinearRegression" class="LinearRegression">
                        <parameter key="keep_example_set" value="true"/>
                        <parameter key="feature_selection" value="none"/>
                        <parameter key="eliminate_colinear_features" value="false"/>
                    </operator>
                </operator>
                <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                    <operator name="Applier" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="RegressionPerformance" class="RegressionPerformance">
                        <parameter key="main_criterion" value="spearman_rho"/>
                        <parameter key="spearman_rho" value="true"/>
                        <parameter key="use_example_weights" value="false"/>
                    </operator>
                    <operator name="ProcessBranch" class="ProcessBranch" activated="no" expanded="yes">
                        <parameter key="condition_type" value="max_performance_value"/>
                        <parameter key="condition_value" value="1"/>
                        <operator name="ModelWriter" class="ModelWriter" breakpoints="after">
                            <parameter key="model_file" value="testBestModel"/>
                            <parameter key="output_type" value="XML"/>
                        </operator>
                    </operator>
                    <operator name="ProcessLog" class="ProcessLog">
                        <list key="log">
                          <parameter key="Perf" value="operator.RegressionPerformance.value.performance"/>
                          <parameter key="Tries" value="operator.RegressionPerformance.value.applycount"/>
                        </list>
                    </operator>
                </operator>
            </operator>
        </operator>
    </operator>

    When the ProcessBranch is enabled, it writes a model file out on every iteration (instead of only when a higher max_performance_value is encountered).
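One possible explanation for the behaviour described above, offered as an assumption rather than confirmed RapidMiner semantics: if `max_performance_value` compares each performance against the fixed `condition_value`, then a bound of 1 on a Spearman correlation (which can never exceed 1) passes on every iteration. The Python sketch below contrasts that kind of fixed-threshold test with a running-maximum test. Both function names and the sample values are made up for illustration.

```python
def fixed_threshold_fires(performances, max_value):
    """Condition that compares each value against a constant upper bound."""
    return [p <= max_value for p in performances]


def running_max_fires(performances):
    """Condition that is true only when a value beats every earlier one."""
    fired = []
    best = float("-inf")
    for p in performances:
        improved = p > best
        if improved:
            best = p
        fired.append(improved)
    return fired


# Hypothetical per-iteration Spearman rho values (made up for illustration).
rhos = [0.41, 0.38, 0.55, 0.55, 0.61]
print(fixed_threshold_fires(rhos, 1))  # [True, True, True, True, True]
print(running_max_fires(rhos))         # [True, False, True, False, True]
```

The second function needs state that survives from one iteration to the next, which a stateless comparison against a constant cannot provide.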
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi.
    Unfortunately this is not that simple. You will have to store the maximum performance achieved so far in a macro, for which you will need the MacroConstruction operator.
    But to get the current iteration's performance into a macro in the first place, you will have to extract it somehow: either with the Script operator, or in a more complicated way using Logging / ProcessLog to ExampleSet / DataMacroDefinition.

    I'm not quite sure about your application, but I don't see the point of going to this effort to write out the best model generated during a cross-validation. The cross-validation is only there to estimate the performance, and since its models are not trained on all of the data, their performance will be worse than that of a model trained on all available data.

    Greetings,
      Sebastian
  • BAMBAMBAM Member Posts: 20 Maven
    The reason I was trying to do this is that RapidMiner always gives me the "out of memory" error, even when the process has been running happily for hours using 500MB of memory and there is over 1GB free on the machine. I've set MAX_JAVA_MEMORY=4000, and that seemed to take effect, since RapidMiner will now use 1.8GB of virtual memory instead of the 600MB it was using before I changed the setting. Even so, I can never get it to finish a complete run through GGA, YAGGA, etc. I just want to get any reasonably good solution that was found during the search. The main goal is to get a good set of attributes. Once I have a "good" model written out to a file, I can read it back in with a simple process and rebuild the model using the "good" attributes and all the data.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    are you sure the problem is YAGGA itself and not the inner operators? Could you post your process? I would then take a quick look.

    Greetings,
      Sebastian