"nested cross validation with rapid miner"

Legacy User Member Posts: 0 Newbie
edited May 2019 in Help
Hi,

I was trying to get one of my tasks with LibSVM done using RapidMiner, but it seems that I am somehow understanding or using the operators in a wrong way.
I want to do cross-validation (data = train + test) on my data to measure the performance of a simple SVM model, but also do a cross-validation (train = paramTrain + paramTest) to find the optimal parameters (say, C and gamma) for the SVM trained on the current cv-training set, which leads to a nested cross-validation.

Therefore I created an outer XValidation containing 1. parameter optimization (train) and 2. evaluation (test), and inside the parameter optimization a second XValidation containing a learner (paramTrain) and an evaluator (paramTest).
However, I have problems passing the model optimized by the inner cross-validation to the outer evaluation. If I create a final model in the inner XValidation by setting the corresponding parameter to true, it is not passed to the outer one. If I add an additional SVM learner after the inner XValidation, this learner complains about a missing ExampleSet.
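
In outline, the nesting I am aiming for looks roughly like this (just a simplified sketch with placeholder operator names, not my actual process):

<operator name="OuterXValidation" class="XValidation">
    <!-- training half of each outer fold: find good C and gamma via an inner cross-validation -->
    <operator name="FindParams" class="OperatorChain">
        <operator name="ParameterOptimization" class="GridParameterOptimization">
            <operator name="InnerXValidation" class="XValidation">
                <operator name="SVM" class="LibSVMLearner"/>
                <operator name="InnerEvaluation" class="OperatorChain">
                    <operator name="ApplyModel" class="ModelApplier"/>
                    <operator name="Performance" class="ClassificationPerformance"/>
                </operator>
            </operator>
        </operator>
        <!-- somewhere here a model trained with the best parameters has to be produced -->
    </operator>
    <!-- test half of each outer fold: apply that model to the held-out data -->
    <operator name="OuterEvaluation" class="OperatorChain">
        <operator name="ApplyModel" class="ModelApplier"/>
    </operator>
</operator>
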
I hope that was not too confusing - here is the full process XML. It would be great if anyone could tell me what I am doing wrong ...

<operator name="Root" class="Process" expanded="yes">
   <description text="#ylt#p#ygt# Often the different operators have many parameters and it is not clear which parameter values are best for the learning task at hand. The parameter optimization operator helps to find an optimal parameter set for the used operators. #ylt#/p#ygt#  #ylt#p#ygt# The inner crossvalidation estimates the performance for each parameter set. In this process two parameters of the SVM are tuned. The result can be plotted in 3D (using gnuplot) or in color mode. #ylt#/p#ygt#  #ylt#p#ygt# Try the following: #ylt#ul#ygt# #ylt#li#ygt#Start the process. The result is the best parameter set and the performance which was achieved with this parameter set.#ylt#/li#ygt# #ylt#li#ygt#Edit the parameter list of the ParameterOptimization operator to find another parameter set.#ylt#/li#ygt# #ylt#/ul#ygt# #ylt#/p#ygt# "/>
   <operator name="SparseFormatExampleSource" class="SparseFormatExampleSource">
       <parameter key="data_file" value="some_sample.svn"/>
       <parameter key="dimension" value="200000"/>
       <parameter key="format" value="yx"/>
   </operator>
   <operator name="XValidation" class="XValidation" expanded="yes">
       <operator name="find libSVMPrarams" class="OperatorChain" expanded="yes">
           <description text="Input: ExampleSetOutput: Model"/>
           <operator name="loopThroughLocalParams" class="GridParameterOptimization" expanded="yes">
               <list key="parameters">
                 <parameter key="trainWithLocalParams.C" value="[0.0;Infinity;10;linear]"/>
                 <parameter key="trainWithLocalParams.gamma" value="[0.0;Infinity;10;linear]"/>
               </list>
               <operator name="crossEvalLocalParams" class="XValidation" expanded="yes">
                   <parameter key="keep_example_set" value="true"/>
                   <operator name="trainWithLocalParams" class="LibSVMLearner">
                       <parameter key="C" value="250.0"/>
                       <parameter key="epsilon" value="0.01"/>
                       <parameter key="kernel_type" value="poly"/>
                   </operator>
                   <operator name="evaluateLocalParams" class="OperatorChain" expanded="yes">
                       <operator name="Test" class="ModelApplier">
                           <list key="application_parameters">
                           </list>
                       </operator>
                       <operator name="ClassificationPerformance" class="ClassificationPerformance">
                           <parameter key="weighted_mean_precision" value="true"/>
                       </operator>
                   </operator>
               </operator>
           </operator>
           <operator name="LibSVMLearner" class="LibSVMLearner">
           </operator>
       </operator>
       <operator name="evaluateLibSvmWithOptimalParams" class="OperatorChain" expanded="yes">
           <operator name="applyModel" class="ModelApplier">
               <list key="application_parameters">
               </list>
           </operator>
       </operator>
   </operator>
</operator>

Thanks a lot
Mome

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Mome,
    there were only a few things missing to complete your process. Look below:
    <operator name="Root" class="Process" expanded="yes">
        <description text="#ylt#p#ygt# Often the different operators have many parameters and it is not clear which parameter values are best for the learning task at hand. The parameter optimization operator helps to find an optimal parameter set for the used operators. #ylt#/p#ygt#  #ylt#p#ygt# The inner crossvalidation estimates the performance for each parameter set. In this process two parameters of the SVM are tuned. The result can be plotted in 3D (using gnuplot) or in color mode. #ylt#/p#ygt#  #ylt#p#ygt# Try the following: #ylt#ul#ygt# #ylt#li#ygt#Start the process. The result is the best parameter set and the performance which was achieved with this parameter set.#ylt#/li#ygt# #ylt#li#ygt#Edit the parameter list of the ParameterOptimization operator to find another parameter set.#ylt#/li#ygt# #ylt#/ul#ygt# #ylt#/p#ygt# "/>
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\sland\Eigene Dateien\Yale\RapidMiner_Zaniah\sample\data\iris.aml"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <operator name="find libSVMPrarams" class="OperatorChain" expanded="yes">
                <description text="Input: ExampleSetOutput: Model"/>
                <operator name="loopThroughLocalParams" class="GridParameterOptimization" expanded="yes">
                    <list key="parameters">
                      <parameter key="trainWithLocalParams.C" value="[0.0;1000.0;2;linear]"/>
                      <parameter key="trainWithLocalParams.gamma" value="[0.0;1000.0;2;linear]"/>
                    </list>
                    <operator name="crossEvalLocalParams" class="XValidation" expanded="yes">
                        <parameter key="keep_example_set" value="true"/>
                        <parameter key="number_of_validations" value="2"/>
                        <operator name="trainWithLocalParams" class="LibSVMLearner">
                            <parameter key="C" value="1000.0"/>
                            <parameter key="epsilon" value="0.01"/>
                            <parameter key="gamma" value="1000.0"/>
                            <parameter key="kernel_type" value="poly"/>
                        </operator>
                        <operator name="evaluateLocalParams" class="OperatorChain" expanded="yes">
                            <operator name="Test" class="ModelApplier">
                                <list key="application_parameters">
                                </list>
                            </operator>
                            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                                <parameter key="weighted_mean_precision" value="true"/>
                            </operator>
                        </operator>
                    </operator>
                </operator>
                <operator name="ParameterSetter" class="ParameterSetter">
                    <list key="name_map">
                      <parameter key="trainWithLocalParams" value="optimizedLearner"/>
                    </list>
                </operator>
                <operator name="optimizedLearner" class="LibSVMLearner">
                    <parameter key="C" value="1000.0"/>
                    <parameter key="gamma" value="500.0"/>
                </operator>
                <operator name="IOConsumer" class="IOConsumer">
                    <parameter key="io_object" value="PerformanceVector"/>
                </operator>
            </operator>
            <operator name="evaluateLibSvmWithOptimalParams" class="OperatorChain" expanded="yes">
                <operator name="applyModel" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance (2)" class="ClassificationPerformance">
                    <parameter key="accuracy" value="true"/>
                    <parameter key="classification_error" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
    I had to insert the ParameterSetter in order to set the best values found by the optimization into the parameters of the final SVM. The next thing was the IOConsumer for deleting the inner performance vector: since all performance vectors get averaged, the outer cross-validation gets confused if the inner one delivers an additional performance vector.
    And lastly, I had to add a performance measure to the outer cross-validation.

    But I admit, it's quite confusing :)

    Greetings,
      Sebastian