"[Solved] Choosing best parameters resulted by Cross validation"

njasaj Member Posts: 18 Contributor II
edited June 2019 in Help
Hi all,
I need to build an SVM model. I used grid search and cross-validation (k = 2 to 20) to find the best parameters. The problem is that when I log the cross-validation accuracy, many parameter combinations have the same accuracy and the same confusion matrix, but when I apply those parameters to a test data set I get very different accuracies (from 90 down to 60). In real-world problems we have no access to a test data set, so how should I select the best combination?


  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn

    What do you mean by "k"? The SVM does not have a "k" parameter. Please post your process setup. To see how to do that, have a look at the post linked in my signature.

    Best, Marius
  • njasaj Member Posts: 18 Contributor II
    Dear Marius,
    Thanks for your attention. k is the number of folds in cross-validation. I choose the parameters that lead to the _best correlation coefficient_, but unfortunately this combination of C and gamma does not give good results on unseen data: there is a difference of about 20 between the correlation coefficient on the training data and on unseen data. If I choose another combination from the parameter optimization log (one with the same or only a slightly lower correlation coefficient), the model performs much better on unseen data. How should I choose the best parameters from cross-validation and parameter optimization? Should I select the parameters that lead to the best performance, or is there a rule for selecting the best parameters?
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="452" width="873">
          <operator activated="true" class="read_csv" compatibility="5.2.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="75">
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.2.008" expanded="true" height="94" name="Normalize" width="90" x="246" y="75">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="method" value="range transformation"/>
          </operator>
          <operator activated="true" class="optimize_parameters_grid" compatibility="5.2.008" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="380" y="75">
            <list key="parameters">
              <parameter key="SVM.C" value="[1;4000;100;linear]"/>
              <parameter key="SVM.gamma" value="[0.1;0.5;10;linear]"/>
            </list>
            <process expanded="true" height="470" width="891">
              <operator activated="true" class="x_validation" compatibility="5.2.008" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
                <parameter key="sampling_type" value="shuffled sampling"/>
                <process expanded="true" height="470" width="420">
                  <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.2.008" expanded="true" height="76" name="SVM" width="90" x="111" y="25">
                    <parameter key="svm_type" value="epsilon-SVR"/>
                    <parameter key="gamma" value="0.1"/>
                    <parameter key="C" value="3334.0"/>
                    <parameter key="p" value="0.01"/>
                    <list key="class_weights"/>
                  </operator>
                  <connect from_port="training" to_op="SVM" to_port="training set"/>
                  <connect from_op="SVM" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true" height="470" width="420">
                  <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="30" y="22">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance_regression" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="187" y="26">
                    <parameter key="main_criterion" value="correlation"/>
                    <parameter key="absolute_error" value="true"/>
                    <parameter key="correlation" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="447" y="30">
                <list key="log">
                  <parameter key="cc" value="operator.Performance.value.correlation"/>
                  <parameter key="rmse" value="operator.Performance.value.root_mean_squared_error"/>
                  <parameter key="c" value="operator.SVM.parameter.C"/>
                  <parameter key="g" value="operator.SVM.parameter.gamma"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn

    The default of 10 folds is usually a good choice.

    Optimizing the parameters C and gamma is correct, but you should set the range from 1e-6 to 1e3 (1*10^-6 to 1*10^3) on a logarithmic scale for both parameters.
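
    To make the difference from a linear range concrete, here is a minimal Python sketch (outside RapidMiner) of a logarithmic grid over 1e-6 to 1e3. The helper name `log_grid` and the choice of 9 steps (one value per decade) are assumptions for illustration only; they are not taken from the process above.

```python
import math

def log_grid(low, high, steps):
    """Return steps + 1 values evenly spaced on a log10 scale from low to high."""
    lo, hi = math.log10(low), math.log10(high)
    return [10 ** (lo + i * (hi - lo) / steps) for i in range(steps + 1)]

# Hypothetical settings: 9 steps give one value per decade (1e-6, 1e-5, ..., 1e3).
c_values = log_grid(1e-6, 1e3, 9)
gamma_values = log_grid(1e-6, 1e3, 9)

# Every (C, gamma) pair a grid search over these values would evaluate.
grid = [(c, g) for c in c_values for g in gamma_values]

for c in c_values:
    print(f"{c:.0e}")
```

    A linear range such as 1 to 4000 never tries values below 1 at all, whereas the logarithmic grid spends the same number of steps on every order of magnitude.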

    You chose to log the values of the Performance operator - that will give you only the performance of the last fold. Instead, you want to log the performance of the complete X-Validation, so change the Log operator to log the value "performance" of the Validation. Additionally you may want to log the deviation value, which will give the standard deviation of the performance over the 10 folds.
    The Validation provides 4 performance values: "performance" is the criterion which you selected as "main_criterion" in the Performance operator. "performance1,2,3" deliver the first 3 criteria (from top to bottom) activated in the performance operator. The deviation always refers to the main criterion.
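
    The difference between the last fold's value and the value of the whole validation can be sketched in Python as follows. The per-fold numbers are made up purely for illustration, and the sample standard deviation used here is an assumption; RapidMiner's exact deviation formula may differ.

```python
from statistics import mean, stdev

# Hypothetical per-fold correlation values from one 10-fold run (made-up numbers).
fold_scores = [0.81, 0.78, 0.84, 0.80, 0.79, 0.83, 0.77, 0.82, 0.80, 0.66]

# Logging the inner Performance operator only captures the most recent fold.
last_fold = fold_scores[-1]

# Logging the validation itself captures the average over all folds and its spread.
cv_performance = mean(fold_scores)
cv_deviation = stdev(fold_scores)

print(f"last fold:   {last_fold:.2f}")
print(f"performance: {cv_performance:.2f}")
print(f"deviation:   {cv_deviation:.3f}")
```

    In this made-up run the last fold happens to be the worst one, so logging it alone would badly misrepresent the parameter combination.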

    When interpreting the results and choosing parameters, the combination with the highest performance is not always the best. You should also consider the deviation. If the deviation is high, there is a high probability that the performance on new data will differ significantly from the estimated performance. So if the second-best parameter combination has a much lower standard deviation, you should consider using that one instead.
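
    One possible way to turn this advice into a rule is sketched below in Python: keep every combination whose mean performance is within a tolerance of the best, then take the one with the lowest deviation. The function name `pick_stable`, the tolerance of 0.02, and the logged values are all assumptions for illustration; this is one heuristic, not a rule built into RapidMiner.

```python
def pick_stable(results, tolerance=0.02):
    """Among results whose mean is within `tolerance` of the best mean,
    return the one with the smallest standard deviation."""
    best_mean = max(r["mean"] for r in results)
    near_best = [r for r in results if best_mean - r["mean"] <= tolerance]
    return min(near_best, key=lambda r: r["std"])

# Hypothetical log of a parameter optimization (made-up numbers).
results = [
    {"C": 1000.0, "gamma": 0.1, "mean": 0.82, "std": 0.15},
    {"C": 10.0, "gamma": 0.01, "mean": 0.81, "std": 0.03},
    {"C": 0.1, "gamma": 1.0, "mean": 0.70, "std": 0.02},
]

chosen = pick_stable(results)
print(chosen)
```

    Here the second-best combination is chosen because its mean is almost the same as the best one while its deviation is far lower. The third combination has the lowest deviation overall but is excluded because its mean is too far from the best.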

    Happy Mining!
  • njasaj Member Posts: 18 Contributor II
    Hi Marius,
    That was a complete and helpful answer; it actually solved my problem. Thank you very much.
  • mafern76 Member Posts: 45 Contributor II

    I'm reviving this because it came up in my search and I think it is relevant to my question.

    I have already been logging to select parameters based on high performance and low deviation, but what if the deviation itself varies a lot? For example, you run ten 10-fold X-Validations and get deviation values from 0.003 to 0.3. I ran into this when looping over 100 neural networks with parameters obtained from an X-Validation that reported a deviation of 0.003: the AUCs ranged from 0.7 to 0.73.

    I did run 5 folds for that parameter search instead of 10. Would you attribute the issue to that alone, or are there other common, overlooked factors?

    I'm thinking sample size, but assuming that is "OK", could it be possible to blame something else? Algorithm related maybe, regardless of parameters?

    Thanks for your insight and the parameters range suggestion for SVM!

  • mafern76 Member Posts: 45 Contributor II
    Well, I just tested how the deviation changed with the number of folds, and I observed that fewer folds drastically reduced the deviation.

    About sample size: fewer folds means a smaller training sample. Is it reasonable to rule out sample size as a problem, or could increasing the test size actually be part of the solution? On that note, when I establish a minimum sample size, shouldn't it be multiplied by the number of folds my X-Validation has? In my particular case I'm using a very basic rule of thumb: number of attributes x 10 x 2 (2 for the binominal label). I'm thinking about this in order to minimize the deviation due to a small test sample.
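
    As a rough illustration of the fold arithmetic, the Python sketch below computes the training and test sizes per fold and the binomial standard error of a proportion measured on one test fold. The total of 1000 examples and the true rate of 0.7 are hypothetical, and the standard error formula applies to an accuracy-like proportion, so it is only a loose analogy for AUC.

```python
import math

def fold_sizes(n_examples, k):
    """Return (training size, test size) for one fold of k-fold cross-validation,
    assuming n_examples divides evenly into k folds."""
    test = n_examples // k
    return n_examples - test, test

def proportion_standard_error(p, n_test):
    """Binomial standard error of a proportion p estimated on n_test examples."""
    return math.sqrt(p * (1 - p) / n_test)

n_examples = 1000   # hypothetical data set size
true_rate = 0.7     # hypothetical underlying accuracy

for k in (5, 10):
    train, test = fold_sizes(n_examples, k)
    se = proportion_standard_error(true_rate, test)
    print(f"k={k}: train={train}, test={test}, per-fold standard error={se:.3f}")
```

    With fewer folds each test fold is larger, so each per-fold estimate is less noisy, which would be consistent with the lower deviation observed. At the same time each model is trained on fewer examples.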

    As you can see I'm exploring, any help would be much appreciated, thanks.