Inconsistent results with Optimize Parameters (Grid)

lsevel Member Posts: 18 Contributor II
edited November 2018 in Help
Hi all,

I've been working with connectivity values extracted from fMRI data and am trying to use Optimize Parameters (Grid) to tune the parameters of a stacked model (Optimize Parameters --> Cross Validation --> Stacking, etc.). The accuracy reported for the best parameter set inside the Optimize Parameters operator (86.67%) differs from the accuracy I get when I run only the cross validation (Cross Validation --> Stacking, etc.) with ostensibly the same parameters (77.50%). Is this difference to be expected? If so, which operator provides the more valid result?

Thank you,

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Hi,

    are you sure that your optimization does not lead to overfitting?

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • lsevel Member Posts: 18 Contributor II
    I suppose that could be the case, but then presumably the model would also look overfit in the cross validation step performed within the optimization?
  • earmijo Member Posts: 270 Unicorn
    It can also happen if the sample size is too small. I would fix the random seed in the X-val operator in both processes. Then you should get exactly the same results.
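To illustrate why the seed matters, here is a minimal Python sketch (not RapidMiner code) of how a shuffled k-fold split depends on the random seed. The helper `kfold_indices`, the 40 examples, the 10 folds and the seed values are all assumptions chosen for illustration; none of them come from the original process.

```python
import random


def kfold_indices(n_examples, n_folds, seed):
    """Shuffle example indices with a seeded RNG and deal them into folds.

    Hypothetical helper for illustration only; it mimics shuffled sampling
    in a cross validation, not RapidMiner's actual implementation.
    """
    rng = random.Random(seed)
    indices = list(range(n_examples))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]


# Sample size, fold count and seeds below are arbitrary illustration values.
folds_a = kfold_indices(40, 10, seed=1992)
folds_b = kfold_indices(40, 10, seed=1992)
folds_c = kfold_indices(40, 10, seed=7)

print("same seed, same folds:     ", folds_a == folds_b)
print("different seed, same folds:", folds_a == folds_c)
```

With a small data set, a different partition can move the accuracy by several percentage points, so two runs are only directly comparable when they use the same seed and the same sampling type.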
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Out of curiosity, have you considered putting your X-Validation inside the Optimize Parameters operator?
    This might help prevent overfitting.

    See below for a crude example.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.000">
      <operator activated="true" class="log" compatibility="7.0.000" expanded="true" height="82" name="Log" width="90" x="581" y="85">
        <parameter key="filename" value="D:\log_values.txt"/>
        <list key="log">
          <parameter key="Count" value="operator.SVM.value.applycount"/>
          <parameter key=" Testing Error" value="operator.Performance.value.performance"/>
          <parameter key="Training Error" value="operator.Performance (2).value.performance"/>
          <parameter key="SVM C" value="operator.SVM.parameter.C"/>
          <parameter key="SVM gamma" value="operator.SVM.parameter.gamma"/>
        </list>
        <parameter key="sorting_type" value="none"/>
        <parameter key="sorting_k" value="100"/>
        <parameter key="persistent" value="false"/>
      </operator>
    </process>
  • lsevel Member Posts: 18 Contributor II
    Sorry if my initial post wasn't clear. I do use x-validation within Optimize Parameters (and stacking within that cross validation). However, when I run just the x-validation with the same parameters that the optimization found (with x-val nested), I get different results.
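One possible reason for a gap like this, even with the cross validation nested inside the optimization, is selection bias: the optimizer reports the best of many noisy accuracy estimates, and that maximum tends to be optimistic. The Python simulation below is a rough sketch under strong assumptions, not a reconstruction of the process in this thread: every parameter combination is assumed to have the same true accuracy, the estimates are drawn independently, and the function name and all numbers (40 examples, 25 combinations, true accuracy 0.75) are hypothetical.

```python
import random


def simulate(n_examples=40, n_configs=25, true_accuracy=0.75,
             n_trials=2000, seed=0):
    """Compare the best-of-grid accuracy with a fresh re-run of the winner.

    All arguments are illustrative assumptions, not values from the thread.
    """
    rng = random.Random(seed)

    def measured_accuracy():
        # One noisy estimate: each example is classified correctly
        # with probability true_accuracy.
        correct = sum(rng.random() < true_accuracy for _ in range(n_examples))
        return correct / n_examples

    best_total = 0.0
    rerun_total = 0.0
    for _ in range(n_trials):
        # Grid search keeps the highest measured accuracy over all combinations.
        best_total += max(measured_accuracy() for _ in range(n_configs))
        # Re-running the winner on a new split is just another noisy draw,
        # because every combination is equally good in this simulation.
        rerun_total += measured_accuracy()
    return best_total / n_trials, rerun_total / n_trials


optimized, rerun = simulate()
print(f"mean best-of-grid accuracy: {optimized:.3f}")
print(f"mean re-run accuracy:       {rerun:.3f}")
```

In this setting the re-run average stays close to the true accuracy, while the best-of-grid average sits clearly above it. If that mechanism applies here, the plain cross validation figure (or a validation on data held out from the optimization) would be the more trustworthy estimate.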