What is the internal mechanism of Optimize Parameters?

seenome (Member, Contributor I)
Edited November 2018 in Help
I found that the parameter value returned by Optimize Parameters doesn't actually give the highest precision or accuracy. So I wonder how the optimal value is decided. Furthermore, can users define their own selection rules?
Thanks.

Answers

  • MariusHelf (RapidMiner Certified Expert, Member, Unicorn)
    Hi,
    there are different Optimize Parameters operators; which one did you use? Some of them do not guarantee finding the best parameter combination, only a "probably quite good" one (like evolutionary parameter optimization).
    Furthermore, how did you find out that your parameters don't deliver the best performance? If you tested on a different dataset than the one you trained on, the results may vary slightly, even if both come from the same distribution. There are some other things to consider as well; maybe this thread can help you: http://rapid-i.com/rapidforum/index.php/topic,4018.msg14881.html

    Cheers, Marius
  • seenome
    Thanks for the reply.

    I put X-Validation inside Optimize Parameters (Grid), and the SVM inside X-Validation (10-fold).
    For example, I set SVM.C to range over [1,5] with 5 steps, with precision as the main criterion for Performance.
    The result shows SVM.C = 2.0. However, when I plot the logged precision series, 2.0 doesn't have the highest precision.
    That's where my question comes from.
    Thanks.
  • MariusHelf
    Which performance does your Log operator log? If it logs the performance of the X-Validation, then in your case 2.0 should indeed also be the maximum of the plot. Logging the values of the Performance operator inside the X-Validation won't work because of the reasons given in the other thread.

    Cheers,
    Marius
  • seenome
    Unfortunately, I already put the Log outside the X-Validation, which should work. I'm posting my process here in case you have time to take a quick look. Thanks a lot!
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.8" expanded="true" name="Process">
        <parameter key="logfile" value="/home/yzheng/workspace/alldata/svm-log-sys.csv"/>
        <parameter key="resultfile" value="/home/yzheng/workspace/alldata/svmgrid-result.csv"/>
        <process expanded="true" height="360" width="1005">
          <operator activated="true" class="read_csv" compatibility="5.0.10" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
            <parameter key="file_name" value="/home/yzheng/workspace/alldata/svmgrid-data-new.csv"/>
            <parameter key="encoding" value="UTF-8"/>
            <parameter key="trim_lines" value="true"/>
            <parameter key="column_separators" value=","/>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="guess_types" compatibility="5.0.10" expanded="true" height="76" name="Guess Types" width="90" x="177" y="84">
            <parameter key="block_type" value="value_matrix"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="Set Role (3)" width="90" x="313" y="120">
            <parameter key="name" value="class"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="optimize_parameters_grid" compatibility="5.1.8" expanded="true" height="112" name="Optimize Parameters (Grid)" width="90" x="514" y="75">
            <list key="parameters">
              <parameter key="SVM.C" value="[1;10;100;linear]"/>
            </list>
            <process expanded="true" height="360" width="1005">
              <operator activated="true" class="x_validation" compatibility="5.1.8" expanded="true" height="112" name="Validation" width="90" x="324" y="133">
                <parameter key="average_performances_only" value="false"/>
                <parameter key="local_random_seed" value="1978"/>
                <process expanded="true" height="360" width="477">
                  <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.0.10" expanded="true" height="76" name="SVM" width="90" x="190" y="86">
                    <parameter key="kernel_type" value="linear"/>
                    <parameter key="C" value="10.0"/>
                    <list key="class_weights"/>
                  </operator>
                  <connect from_port="training" to_op="SVM" to_port="training set"/>
                  <connect from_op="SVM" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="36"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true" height="360" width="477">
                  <operator activated="true" class="apply_model" compatibility="5.1.8" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance_classification" compatibility="5.0.10" expanded="true" height="76" name="Performance" width="90" x="246" y="30">
                    <parameter key="main_criterion" value="accuracy"/>
                    <parameter key="weighted_mean_recall" value="true"/>
                    <parameter key="weighted_mean_precision" value="true"/>
                    <list key="class_weights"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="log" compatibility="5.1.8" expanded="true" height="76" name="Log" width="90" x="514" y="120">
                <parameter key="filename" value="/home/yzheng/workspace/alldata/svmgrid-log-performance.csv"/>
                <list key="log">
                  <parameter key="c" value="operator.SVM.parameter.C"/>
                  <parameter key="accuracy" value="operator.Performance.value.accuracy"/>
                  <parameter key="precision" value="operator.Performance.value.weighted_mean_precision"/>
                  <parameter key="recall" value="operator.Performance.value.weighted_mean_recall"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="90"/>
              <portSpacing port="source_input 2" spacing="18"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write" compatibility="5.0.10" expanded="true" height="60" name="Write" width="90" x="715" y="165">
            <parameter key="object_file" value="/home/yzheng/workspace/alldata/svmgrid-result"/>
            <parameter key="output_type" value="XML"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Guess Types" to_port="example set input"/>
          <connect from_op="Guess Types" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
          <connect from_op="Set Role (3)" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_op="Write" to_port="object"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="result 1" to_port="result 2"/>
          <connect from_op="Write" from_port="object" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="36"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf
    Heya,

    Indeed, it is correct to put the Log outside the XVal. However, you are logging the values of the Performance operator, which reports the result of its most recent application, i.e. the last iteration (fold) of the XVal. To get the averaged result of the XVal (which is what you want), log the performance of the XVal operator itself (named "Validation" in your process).
  • seenome
    Problem solved. Thanks for the insight, which makes perfect sense. ;D
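A minimal Python sketch of the selection logic discussed in this thread, written under stated assumptions and not taken from RapidMiner's source. It assumes that Optimize Parameters (Grid) evaluates every candidate value and keeps the one whose inner performance, here the fold-averaged main criterion delivered by the X-Validation, is highest. The per-fold precision values, the three candidate C values, the fold count (3 instead of 10), and the helper names `grid_search` and `last_fold_only` are all hypothetical. The second helper mimics a Log that reads the inner Performance operator, which only holds the value from the last fold.

```python
from statistics import mean

# Hypothetical, made-up per-fold precision values for each candidate C.
# Only 3 folds are used to keep the example short (the thread used 10).
fold_scores = {
    1.0: [0.70, 0.72, 0.80],
    2.0: [0.78, 0.79, 0.74],
    3.0: [0.73, 0.75, 0.78],
}


def grid_search(scores_by_value):
    """Hypothetical helper: try every candidate, keep the best averaged score.

    Mimics an exhaustive grid search whose inner process returns the
    cross-validation average (what the X-Validation operator delivers).
    """
    best_value, best_score = None, float("-inf")
    for value, fold_values in scores_by_value.items():
        avg = mean(fold_values)  # average over all folds
        if avg > best_score:
            best_value, best_score = value, avg
    return best_value, best_score


def last_fold_only(scores_by_value):
    """Hypothetical helper: what a Log reading the inner Performance
    operator would record, i.e. only the final fold of each run."""
    return {value: fold_values[-1] for value, fold_values in scores_by_value.items()}


best_c, best_avg = grid_search(fold_scores)
logged = last_fold_only(fold_scores)
peak_in_log = max(logged, key=logged.get)

print(f"selected C = {best_c} (mean = {best_avg:.3f})")
print(f"C with highest last-fold value = {peak_in_log}")
```

Running it reports C = 2.0 as the selected value (mean 0.770), while the last-fold values peak at C = 1.0, which is the kind of mismatch seen in the log plot in this thread. Logging the averaged performance of the X-Validation instead makes the logged maximum coincide with the selected C.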