X-fold-cross validation on predefined folds [SOLVED]

drakula Member Posts: 5 Contributor II
edited November 2018 in Help
Hello,
How can I run an x-fold cross-validation on folds that have already been created? That is, I have 10 predefined train/test set pairs and need to apply all of them to the same learner, so that I can ultimately optimize this learner on the defined folds using the EvolutionaryParameterOptimization operator. I hope this is clear enough. Thanks in advance!

Answers

  • earmijo Member Posts: 270 Unicorn
    Clear enough. Use the Batch-X-Validation operator. It requires one of the attributes in your dataset to have the role "batch"; the values of that attribute define the folds.
  • drakula Member Posts: 5 Contributor II
    Thank you very much!
  • drakula Member Posts: 5 Contributor II
    Um, I have just realized that I am not actually doing a standard 10-fold cross-validation. In my setting, the data is divided into 10 folds, but in each round 6 folds are used as training data and 4 as test data.
    So it would be really helpful if I could somehow just load my train/test pairs and apply them all to the same learner. Or can I somehow set this train/test ratio in Batch-X-Validation?
  • earmijo Member Posts: 270 Unicorn
    Then my previous answer (use Batch-X-Validation) is probably not the best idea. I guess you could get the results you want with the Filter Examples operator, with the condition class set to attribute_value_filter. Or perhaps there is some operator that handles the looping more efficiently. But somebody else will have to give you a hand here; it is beyond my limited expertise :-)
  • earmijo Member Posts: 270 Unicorn
    I'm thinking of something like this (I'm sure there is a more economical way of doing it):
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <process expanded="true" height="476" width="882">
          <operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="210">
            <parameter key="repository_entry" value="//Clases/Datos/batch"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.2.006" expanded="true" height="94" name="Multiply" width="90" x="45" y="30"/>
          <operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples" width="90" x="246" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="pair1 = training"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes" width="90" x="380" y="30">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|income"/>
          </operator>
          <operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression" width="90" x="514" y="30">
            <parameter key="feature_selection" value="none"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples (2)" width="90" x="246" y="165">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="pair1 = testing"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (2)" width="90" x="380" y="165">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|income"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.2.006" expanded="true" height="76" name="Apply Model" width="90" x="581" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.2.006" expanded="true" height="76" name="Performance" width="90" x="733" y="162"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="126"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    The dataset used looks like this (the pair1 and pair2 columns mark each row as training or testing for that pair):

    pair1,pair2,income,consumption
    training,testing,119,154
    training,testing,85,123
    training,training,97,125
    training,testing,95,130
    training,training,120,151
    training,training,92,131
    training,training,105,141
    training,training,110,141
    training,training,98,130
    training,testing,98,134
    training,training,81,115
    training,training,81,117
    training,training,91,123
    training,training,105,144
    training,training,100,137
    training,training,107,140
    training,training,82,123
    training,training,84,115
    training,testing,100,134
    training,testing,108,147
    training,training,116,144
    training,training,115,144
    training,training,93,126
    training,training,105,141
    training,training,89,124
    training,training,104,144
    training,training,108,144
    training,training,88,129
    training,training,109,137
    training,training,112,144
    testing,testing,96,132
    testing,training,89,125
    testing,training,93,126
    testing,testing,114,140
    testing,training,81,120
    testing,training,84,118
    testing,testing,88,119
    testing,training,96,131
    testing,training,82,127
    testing,testing,114,150
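
For readers without RapidMiner at hand, the logic of the process above can be sketched in plain Python. This is only an illustration under assumptions, not a reproduction of the operators: it assumes `consumption` is the label, fits a one-variable least-squares line on `income`, and reports RMSE as the performance measure (the thread does not say which criterion the Performance operator reports). The helper names `load_rows`, `fit_line` and `rmse` are made up for this sketch.

```python
import csv
import io
import math

# The 40 rows posted in the thread, copied verbatim.
DATA = """\
pair1,pair2,income,consumption
training,testing,119,154
training,testing,85,123
training,training,97,125
training,testing,95,130
training,training,120,151
training,training,92,131
training,training,105,141
training,training,110,141
training,training,98,130
training,testing,98,134
training,training,81,115
training,training,81,117
training,training,91,123
training,training,105,144
training,training,100,137
training,training,107,140
training,training,82,123
training,training,84,115
training,testing,100,134
training,testing,108,147
training,training,116,144
training,training,115,144
training,training,93,126
training,training,105,141
training,training,89,124
training,training,104,144
training,training,108,144
training,training,88,129
training,training,109,137
training,training,112,144
testing,testing,96,132
testing,training,89,125
testing,training,93,126
testing,testing,114,140
testing,training,81,120
testing,training,84,118
testing,testing,88,119
testing,training,96,131
testing,training,82,127
testing,testing,114,150
"""


def load_rows(text):
    """Parse the CSV text into dicts with numeric income/consumption."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["income"] = float(rec["income"])
        rec["consumption"] = float(rec["consumption"])
        rows.append(rec)
    return rows


def fit_line(rows):
    """Ordinary least squares for consumption = slope * income + intercept."""
    n = len(rows)
    mean_x = sum(r["income"] for r in rows) / n
    mean_y = sum(r["consumption"] for r in rows) / n
    sxx = sum((r["income"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["income"] - mean_x) * (r["consumption"] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root mean squared error of the fitted line on the given rows."""
    squared = [(r["consumption"] - (slope * r["income"] + intercept)) ** 2 for r in rows]
    return math.sqrt(sum(squared) / len(squared))


rows = load_rows(DATA)
# Counterpart of the two Filter Examples operators: split on the pair1 column.
train = [r for r in rows if r["pair1"] == "training"]
test = [r for r in rows if r["pair1"] == "testing"]

slope, intercept = fit_line(train)
test_rmse = rmse(test, slope, intercept)
print(f"train={len(train)} test={len(test)} RMSE={test_rmse:.3f}")
```

Switching the two list comprehensions to `pair2` evaluates the second predefined pair on the same learner.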

  • drakula Member Posts: 5 Contributor II
    Thank you very much :)
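  • earmijo Member Posts: 270 Unicorn
    I kept playing with your question, and this is the best I could come up with. It uses a Loop operator, which helps when the number of pairs is large. Set the Loop's iterations parameter to the number of pairs (2 here); the iteration macro then selects the column pair1, pair2, and so on.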
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <process expanded="true" height="476" width="1016">
          <operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Clases/Datos/batch"/>
          </operator>
          <operator activated="true" class="loop" compatibility="5.2.006" expanded="true" height="76" name="Loop" width="90" x="313" y="75">
            <parameter key="set_iteration_macro" value="true"/>
            <parameter key="iterations" value="2"/>
            <process expanded="true" height="740" width="969">
              <operator activated="true" class="multiply" compatibility="5.2.006" expanded="true" height="94" name="Multiply (2)" width="90" x="112" y="165"/>
              <operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples (3)" width="90" x="246" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="pair%{iteration} = training"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (3)" width="90" x="380" y="30">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="|income"/>
              </operator>
              <operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression (2)" width="90" x="514" y="30">
                <parameter key="feature_selection" value="none"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples (4)" width="90" x="313" y="300">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="pair%{iteration} = testing"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (4)" width="90" x="447" y="255">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="income|"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="5.2.006" expanded="true" height="76" name="Apply Model (2)" width="90" x="648" y="210">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.006" expanded="true" height="76" name="Performance (2)" width="90" x="782" y="210"/>
              <connect from_port="input 1" to_op="Multiply (2)" to_port="input"/>
              <connect from_op="Multiply (2)" from_port="output 1" to_op="Filter Examples (3)" to_port="example set input"/>
              <connect from_op="Multiply (2)" from_port="output 2" to_op="Filter Examples (4)" to_port="example set input"/>
              <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Select Attributes (3)" to_port="example set input"/>
              <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Linear Regression (2)" to_port="training set"/>
              <connect from_op="Linear Regression (2)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_op="Filter Examples (4)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
              <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Loop" to_port="input 1"/>
          <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="126"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
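
The looping idea in the second process can also be sketched in Python. Again this is a hedged illustration and not the RapidMiner implementation: it assumes the pair columns are named `pair1` to `pairN` (mirroring `pair%{iteration}`, with the iteration counter starting at 1), that `consumption` is the label, and that RMSE is an acceptable performance measure. The function `evaluate_pairs` and the final averaging step are hypothetical additions; the rows are a small excerpt from the dataset posted earlier in the thread.

```python
import math

# A few rows taken from the dataset posted earlier in the thread.
ROWS = [
    {"pair1": "training", "pair2": "testing", "income": 119, "consumption": 154},
    {"pair1": "training", "pair2": "testing", "income": 85, "consumption": 123},
    {"pair1": "training", "pair2": "training", "income": 97, "consumption": 125},
    {"pair1": "training", "pair2": "training", "income": 120, "consumption": 151},
    {"pair1": "training", "pair2": "training", "income": 92, "consumption": 131},
    {"pair1": "testing", "pair2": "testing", "income": 96, "consumption": 132},
    {"pair1": "testing", "pair2": "training", "income": 89, "consumption": 125},
    {"pair1": "testing", "pair2": "training", "income": 93, "consumption": 126},
    {"pair1": "testing", "pair2": "testing", "income": 114, "consumption": 140},
]


def fit_line(rows):
    """Ordinary least squares for consumption = slope * income + intercept."""
    n = len(rows)
    mean_x = sum(r["income"] for r in rows) / n
    mean_y = sum(r["consumption"] for r in rows) / n
    sxx = sum((r["income"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["income"] - mean_x) * (r["consumption"] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root mean squared error of the fitted line on the given rows."""
    squared = [(r["consumption"] - (slope * r["income"] + intercept)) ** 2 for r in rows]
    return math.sqrt(sum(squared) / len(squared))


def evaluate_pairs(rows, n_pairs):
    """Train and test once per pair column (pair1 .. pair<n_pairs>)."""
    scores = {}
    for i in range(1, n_pairs + 1):  # counter assumed to start at 1
        column = f"pair{i}"
        train = [r for r in rows if r[column] == "training"]
        test = [r for r in rows if r[column] == "testing"]
        slope, intercept = fit_line(train)
        scores[column] = rmse(test, slope, intercept)
    return scores


scores = evaluate_pairs(ROWS, n_pairs=2)
# One averaged number is what a parameter optimizer would typically consume.
average = sum(scores.values()) / len(scores)
for column, score in scores.items():
    print(f"{column}: RMSE={score:.3f}")
print(f"average RMSE={average:.3f}")
```

With 10 predefined pairs, the only change would be ten pair columns in the data and `n_pairs=10`, which corresponds to raising `iterations` in the Loop operator.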