"XValidation with customized performance values"

chris_ml Member Posts: 17 Maven
edited May 2019 in Help
Hi,

I'd like to perform a parameter optimization where the quality of the model and its parameters
are not evaluated by the standard performance values like "absolute_error" but by a customized
measure which is generated by an external tool.

My idea looks as follows (not yet complete; based on the first example from the sample directory "07_Meta"):

<operator name="Root" class="Process" expanded="yes">
    <operator name="Input" class="ExampleSource">
        <parameter key="attributes" value="../data/polynomial.aml"/>
    </operator>
    <operator name="ParameterOptimization" class="GridParameterOptimization" expanded="yes">
        <list key="parameters">
          <parameter key="Training.C" value="50,100,150,200,250"/>
          <parameter key="Training.degree" value="1,2,3,4,5"/>
        </list>
        <operator name="Validation" class="XValidation" expanded="yes">
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="Training" class="LibSVMLearner">
                <parameter key="C" value="50"/>
                <parameter key="degree" value="1"/>
                <parameter key="epsilon" value="0.01"/>
                <parameter key="kernel_type" value="poly"/>
                <parameter key="svm_type" value="epsilon-SVR"/>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="ModelWriter" class="ModelWriter">
                    <parameter key="model_file" value="/tmp/mymodel.mod"/>
                    <parameter key="output_type" value="XML"/>
                </operator>
                <operator name="CommandLineOperator" class="CommandLineOperator">
                    <parameter key="command" value="INVOKE EXTERNAL TOOL"/>
                </operator>
                <operator name="Replace1" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Replace2" class="RegressionPerformance">
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="normalized_absolute_error" value="true"/>
                    <parameter key="root_mean_squared_error" value="true"/>
                    <parameter key="squared_error" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
    <operator name="ParameterSetWriter" class="ParameterSetWriter">
        <parameter key="parameter_file" value="/tmp/parameters.par"/>
    </operator>
</operator>

A short explanation:
The idea is to write each model (with the currently considered parameters) into a file ("ModelWriter"), where
it is used together with an external application (invoked via "CommandLineOperator") to generate a
performance value. I'll skip the details; just assume that the performance of the currently considered model
is dumped into a file from which RapidMiner should read it. The cross-validation can then be performed based on
these values. So basically RapidMiner's "ModelApplier" and the performance operator that generates a performance
vector must be replaced by a call to an external tool.

My question:
How can I integrate the results of an external tool into the XValidation chain? Obviously, the two operators named
"Replace1" and "Replace2" must be replaced by an operator that reads the custom performance value (an integer, the
smaller the better) from a file generated by the external tool and transforms it into a valid "PerformanceVector",
which is then used by the XValidation operator. Reading from a file and translating the integer value into
a performance vector are the two issues I haven't been able to solve yet. :-)

Do you have any ideas how to accomplish this?

Thank you in advance.

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Chris,
    this is an interesting question. I think I have solved your problem, provided that your program can write its results to a proper CSV file. The process is shown below:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="Input" class="ExampleSource">
            <parameter key="attributes" value="../data/polynomial.aml"/>
        </operator>
        <operator name="ParameterOptimization" class="GridParameterOptimization" expanded="yes">
            <list key="parameters">
              <parameter key="Training.C" value="50,100,150,200,250"/>
              <parameter key="Training.degree" value="1,2,3,4,5"/>
            </list>
            <operator name="Validation" class="XValidation" expanded="yes">
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="Training" class="LibSVMLearner">
                    <parameter key="svm_type" value="epsilon-SVR"/>
                    <parameter key="kernel_type" value="poly"/>
                    <parameter key="degree" value="1"/>
                    <parameter key="C" value="50"/>
                    <parameter key="epsilon" value="0.01"/>
                    <list key="class_weights">
                    </list>
                </operator>
                <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                    <operator name="ModelWriter" class="ModelWriter">
                        <parameter key="model_file" value="/tmp/mymodel.mod"/>
                        <parameter key="output_type" value="XML"/>
                    </operator>
                    <operator name="CommandLineOperator" class="CommandLineOperator">
                        <parameter key="command" value="INVOKE EXTERNAL TOOL"/>
                    </operator>
                    <operator name="loadPerformance" class="CSVExampleSource">
                    </operator>
                    <operator name="Data2Performance" class="Data2Performance">
                        <parameter key="performance_type" value="data_value"/>
                        <parameter key="attribute_name" value="performance"/>
                        <parameter key="example_index" value="1"/>
                        <parameter key="optimization_direction" value="minimize"/>
                    </operator>
                </operator>
            </operator>
        </operator>
        <operator name="ParameterSetWriter" class="ParameterSetWriter">
            <parameter key="parameter_file" value="/tmp/parameters.par"/>
        </operator>
    </operator>
    The idea is to let the external program write a new example set containing only one attribute and one example. This can be read with the CSVExampleSource, for example. The Data2Performance operator then extracts a value (probably the only one present) and returns it as a PerformanceVector. The optimization direction can be specified as well.

    Greetings,
      Sebastian
  • chris_ml Member Posts: 17 Maven
    Hi Sebastian,

    this seems to be a very good solution. I need just a tiny change:
    the performance value generated by the external application has
    a predefined format. It looks something like

      #Performance 120.25

    Is there an operator that I can use instead of the CSVExampleSource
    that allows me to parse arbitrary strings in order to retrieve the value
    (here 120.25)?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Chris,
    this is possible with the CSVExampleSource itself. Once you have switched to expert mode, several parameters let you specify which characters separate the columns; the default will probably work in your case. Unfortunately, the operator treats # as a comment character by default, so you will have to uncheck the boolean parameter "use_comment_characters".

    Good luck!
      Sebastian
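
If adjusting the CSV reader's comment and separator settings is inconvenient, another option (not the approach suggested above, just a possible workaround) is to convert the tool's output into a plain one-column CSV before RapidMiner reads it. The sketch below is a hypothetical Python helper: it assumes the external tool's output contains a line of the form "#Performance 120.25" as described in the thread, and it writes a file whose single attribute is named "performance", matching the attribute_name given to Data2Performance. The function names and any file paths are illustrative assumptions, not part of RapidMiner.

```python
import csv
import re
from pathlib import Path

# Matches a line such as "#Performance 120.25" (format taken from the thread).
PERFORMANCE_LINE = re.compile(r"^#Performance\s+([-+]?\d+(?:\.\d+)?)$")


def extract_performance(text: str) -> float:
    """Return the number from the first '#Performance <number>' line."""
    for line in text.splitlines():
        match = PERFORMANCE_LINE.match(line.strip())
        if match:
            return float(match.group(1))
    raise ValueError("no '#Performance <number>' line found")


def write_performance_csv(value: float, csv_path: Path) -> None:
    """Write a CSV with a 'performance' header row and one data row."""
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["performance"])  # attribute name read by Data2Performance
        writer.writerow([value])


def convert(tool_output_path: Path, csv_path: Path) -> None:
    """Read the tool's raw output file and write the one-value CSV."""
    value = extract_performance(tool_output_path.read_text())
    write_performance_csv(value, csv_path)
```

Such a helper could be called from the same command that the CommandLineOperator runs, right after the external tool finishes, so that the CSV reader only ever sees a header row and a single numeric value.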