Propagate ExampleSet

_paul_ Member Posts: 14 Contributor II
edited November 2018 in Help
Hi,

I'd like to find a good parameter combination using GridParameterOptimization, then write
the parameters to disk (for logging purposes), read them back in, apply them to a learner and
finally write this optimized model to disk.

Here is the process:

<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#p#ygt# Often the different operators have many parameters and it is not clear which parameter values are best for the learning task at hand. The parameter optimization operator helps to find an optimal parameter set for the used operators. #ylt#/p#ygt#  #ylt#p#ygt# The inner crossvalidation estimates the performance for each parameter set. In this process two parameters of the SVM are tuned. The result can be plotted in 3D (using gnuplot) or in color mode. #ylt#/p#ygt#  #ylt#p#ygt# Try the following: #ylt#ul#ygt# #ylt#li#ygt#Start the process. The result is the best parameter set and the performance which was achieved with this parameter set.#ylt#/li#ygt# #ylt#li#ygt#Edit the parameter list of the ParameterOptimization operator to find another parameter set.#ylt#/li#ygt# #ylt#/ul#ygt# #ylt#/p#ygt# "/>
    <operator name="Input" class="ExampleSource">
        <parameter key="attributes" value="../data/polynomial.aml"/>
    </operator>
    <operator name="Normalization" class="Normalization">
    </operator>
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="FeatureNameFilter" class="FeatureNameFilter">
            <parameter key="filter_special_features" value="true"/>
            <parameter key="skip_features_with_name" value="a1"/>
        </operator>
        <operator name="FeatureNameFilter (2)" class="FeatureNameFilter">
            <parameter key="filter_special_features" value="true"/>
            <parameter key="skip_features_with_name" value="a3"/>
        </operator>
    </operator>
    <operator name="ParameterOptimization" class="GridParameterOptimization" expanded="yes">
        <list key="parameters">
          <parameter key="Training.degree" value="1,2,3,4,5"/>
        </list>
        <operator name="Validation" class="XValidation" expanded="yes">
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="Training" class="LibSVMLearner">
                <parameter key="svm_type" value="epsilon-SVR"/>
                <parameter key="kernel_type" value="poly"/>
                <parameter key="degree" value="1"/>
                <parameter key="C" value="50"/>
                <parameter key="epsilon" value="0.01"/>
                <list key="class_weights">
                </list>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Test" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluation" class="RegressionPerformance">
                    <parameter key="root_mean_squared_error" value="true"/>
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="normalized_absolute_error" value="true"/>
                    <parameter key="squared_error" value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="Log" class="ProcessLog">
            <parameter key="filename" value="paraopt.log"/>
            <list key="log">
              <parameter key="C" value="operator.Training.parameter.C"/>
              <parameter key="degree" value="operator.Training.parameter.degree"/>
              <parameter key="absolute" value="operator.Validation.value.performance"/>
            </list>
        </operator>
    </operator>
    <operator name="ParameterSetWriter" class="ParameterSetWriter">
        <parameter key="parameter_file" value="parameters.par"/>
    </operator>
    <operator name="ParameterSetLoader" class="ParameterSetLoader">
        <parameter key="parameter_file" value="parameters.par"/>
    </operator>
    <operator name="ParameterSetter" class="ParameterSetter">
        <list key="name_map">
        </list>
    </operator>
    <operator name="Final" class="LibSVMLearner">
        <list key="class_weights">
        </list>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file" value="mymodel.mod"/>
    </operator>
</operator>
The problem is the missing example set for the learner "Final". Can I somehow propagate the
ExampleSet that is also used for the GridParameterOptimization to the second learner ("Final")?

Regards,
Paul

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Paul,
    there are three solutions to this problem. You could simply switch on the "keep_example_set" parameter of the XValidation, which will work in this situation (a minimal snippet for this option is shown right after the next process). A more general solution is to copy the example set with an IOMultiplier, so that a copy of your ExampleSet remains even if the first one is consumed somewhere. The following process shows how to do that:
    <operator name="Root" class="Process" expanded="yes">
        <description text="#ylt#p#ygt# Often the different operators have many parameters and it is not clear which parameter values are best for the learning task at hand. The parameter optimization operator helps to find an optimal parameter set for the used operators. #ylt#/p#ygt#  #ylt#p#ygt# The inner crossvalidation estimates the performance for each parameter set. In this process two parameters of the SVM are tuned. The result can be plotted in 3D (using gnuplot) or in color mode. #ylt#/p#ygt#  #ylt#p#ygt# Try the following: #ylt#ul#ygt# #ylt#li#ygt#Start the process. The result is the best parameter set and the performance which was achieved with this parameter set.#ylt#/li#ygt# #ylt#li#ygt#Edit the parameter list of the ParameterOptimization operator to find another parameter set.#ylt#/li#ygt# #ylt#/ul#ygt# #ylt#/p#ygt# "/>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="FeatureNameFilter" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="a1"/>
            </operator>
            <operator name="FeatureNameFilter (2)" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="a3"/>
            </operator>
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="ParameterOptimization" class="GridParameterOptimization" expanded="yes">
            <list key="parameters">
              <parameter key="Training.degree" value="1,2,3,4,5"/>
            </list>
            <operator name="Validation" class="XValidation" expanded="yes">
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="Training" class="LibSVMLearner">
                    <parameter key="svm_type" value="epsilon-SVR"/>
                    <parameter key="kernel_type" value="poly"/>
                    <parameter key="degree" value="1"/>
                    <parameter key="C" value="50"/>
                    <parameter key="epsilon" value="0.01"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                    <operator name="Test" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="Evaluation" class="RegressionPerformance">
                        <parameter key="root_mean_squared_error" value="true"/>
                        <parameter key="absolute_error" value="true"/>
                        <parameter key="normalized_absolute_error" value="true"/>
                        <parameter key="squared_error" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="Log" class="ProcessLog">
                <parameter key="filename" value="paraopt.log"/>
                <list key="log">
                  <parameter key="C" value="operator.Training.parameter.C"/>
                  <parameter key="degree" value="operator.Training.parameter.degree"/>
                  <parameter key="absolute" value="operator.Validation.value.performance"/>
                </list>
            </operator>
        </operator>
        <operator name="ParameterSetWriter" class="ParameterSetWriter">
            <parameter key="parameter_file" value="parameters.par"/>
        </operator>
        <operator name="ParameterSetLoader" class="ParameterSetLoader">
            <parameter key="parameter_file" value="parameters.par"/>
        </operator>
        <operator name="ParameterSetter" class="ParameterSetter">
            <list key="name_map">
            </list>
        </operator>
        <operator name="Final" class="LibSVMLearner">
            <list key="class_weights">
            </list>
        </operator>
        <operator name="ModelWriter" class="ModelWriter">
            <parameter key="model_file" value="mymodel.mod"/>
        </operator>
    </operator>
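    For the first option, a minimal sketch of the change (only the Validation operator is shown; its inner Training and ApplierChain operators stay exactly as in the processes above) would look roughly like this:
    <operator name="Validation" class="XValidation" expanded="yes">
        <!-- keep_example_set keeps the input ExampleSet available after the validation, so later operators such as the "Final" learner can still use it -->
        <parameter key="keep_example_set" value="true"/>
        <parameter key="sampling_type" value="shuffled sampling"/>
        <!-- inner operators (Training, ApplierChain) unchanged -->
    </operator>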
    A still more sophisticated method would be to use the IOStorage mechanism. With this you can store the data somewhere and retrieve it later on, even if all IOObjects are thrown away or consumed during the process. Here's how it would work:
    <operator name="Root" class="Process" expanded="yes">
        <description text="#ylt#p#ygt# Often the different operators have many parameters and it is not clear which parameter values are best for the learning task at hand. The parameter optimization operator helps to find an optimal parameter set for the used operators. #ylt#/p#ygt#  #ylt#p#ygt# The inner crossvalidation estimates the performance for each parameter set. In this process two parameters of the SVM are tuned. The result can be plotted in 3D (using gnuplot) or in color mode. #ylt#/p#ygt#  #ylt#p#ygt# Try the following: #ylt#ul#ygt# #ylt#li#ygt#Start the process. The result is the best parameter set and the performance which was achieved with this parameter set.#ylt#/li#ygt# #ylt#li#ygt#Edit the parameter list of the ParameterOptimization operator to find another parameter set.#ylt#/li#ygt# #ylt#/ul#ygt# #ylt#/p#ygt# "/>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="FeatureNameFilter" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="a1"/>
            </operator>
            <operator name="FeatureNameFilter (2)" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="a3"/>
            </operator>
        </operator>
        <operator name="IOStorer" class="IOStorer">
            <parameter key="name" value="ExampleSet"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="remove_from_process" value="false"/>
        </operator>
        <operator name="ParameterOptimization" class="GridParameterOptimization" expanded="yes">
            <list key="parameters">
              <parameter key="Training.degree" value="1,2,3,4,5"/>
            </list>
            <operator name="Validation" class="XValidation" expanded="yes">
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="Training" class="LibSVMLearner">
                    <parameter key="svm_type" value="epsilon-SVR"/>
                    <parameter key="kernel_type" value="poly"/>
                    <parameter key="degree" value="1"/>
                    <parameter key="C" value="50"/>
                    <parameter key="epsilon" value="0.01"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                    <operator name="Test" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="Evaluation" class="RegressionPerformance">
                        <parameter key="root_mean_squared_error" value="true"/>
                        <parameter key="absolute_error" value="true"/>
                        <parameter key="normalized_absolute_error" value="true"/>
                        <parameter key="squared_error" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="Log" class="ProcessLog">
                <parameter key="filename" value="paraopt.log"/>
                <list key="log">
                  <parameter key="C" value="operator.Training.parameter.C"/>
                  <parameter key="degree" value="operator.Training.parameter.degree"/>
                  <parameter key="absolute" value="operator.Validation.value.performance"/>
                </list>
            </operator>
        </operator>
        <operator name="ParameterSetWriter" class="ParameterSetWriter">
            <parameter key="parameter_file" value="parameters.par"/>
        </operator>
        <operator name="ParameterSetLoader" class="ParameterSetLoader">
            <parameter key="parameter_file" value="parameters.par"/>
        </operator>
        <operator name="ParameterSetter" class="ParameterSetter">
            <list key="name_map">
            </list>
        </operator>
        <operator name="IORetriever" class="IORetriever">
            <parameter key="name" value="ExampleSet"/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="Final" class="LibSVMLearner">
            <list key="class_weights">
            </list>
        </operator>
        <operator name="ModelWriter" class="ModelWriter">
            <parameter key="model_file" value="mymodel.mod"/>
        </operator>
    </operator>
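    One detail worth checking: for the ParameterSetter to hand the optimized values over to the learner named "Final" rather than to "Training", its name_map presumably needs a mapping entry, along these lines (just a sketch, assuming name_map maps operator names from the parameter set to operator names in the process):
    <operator name="ParameterSetter" class="ParameterSetter">
        <list key="name_map">
            <!-- apply the parameters found for "Training" (e.g. degree) to the operator "Final" -->
            <parameter key="Training" value="Final"/>
        </list>
    </operator>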
    Greetings,
    Sebastian
  • _paul_ Member Posts: 14 Contributor II
    Thank you Sebastian.

    This is exactly what I was looking for. :-)

    Regards,
    Paul