Adaptive learning

Danyo83 Member Posts: 41 Contributor II
edited June 2019 in Help
Hi,

When working with time series in real time, it would be helpful to have a tool that learns adaptively, so that one does not have to rebuild the model again and again and only the new incoming data has to be taken into account. It would be a lot less computationally expensive than leave-one-out cross-validation.

Answers

  • wessel Member Posts: 537 Maven
    Hey,

    You can already do this in RapidMiner.
    These are called updatable models.

    Some machine learning techniques can be updated very easily:
    - Nearest Neighbors (just change the data)
    - Naive Bayes (just update the per-class counts, means and variances)
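As a rough illustration of why Naive Bayes is easy to update, here is a minimal Python sketch of a Gaussian Naive Bayes that folds in one example at a time using Welford's running mean/variance update. The class name `RunningGaussianNB` and its methods are made up for this example; this is not how RapidMiner's operator is implemented.

```python
import math


class RunningGaussianNB:
    """Gaussian Naive Bayes whose statistics are updated one example at a time."""

    def __init__(self):
        self.count = {}  # examples seen per class
        self.mean = {}   # per-class list of feature means
        self.m2 = {}     # per-class list of summed squared deviations

    def update(self, x, label):
        """Fold a single labelled example into the model (Welford's update)."""
        if label not in self.count:
            self.count[label] = 0
            self.mean[label] = [0.0] * len(x)
            self.m2[label] = [0.0] * len(x)
        self.count[label] += 1
        n = self.count[label]
        for j, value in enumerate(x):
            delta = value - self.mean[label][j]
            self.mean[label][j] += delta / n
            self.m2[label][j] += delta * (value - self.mean[label][j])

    def predict(self, x, min_var=1e-9):
        """Return the class with the highest log posterior."""
        total = sum(self.count.values())
        best_label, best_score = None, -math.inf
        for label, n in self.count.items():
            score = math.log(n / total)  # class prior
            for j, value in enumerate(x):
                var = max(self.m2[label][j] / n, min_var)
                diff = value - self.mean[label][j]
                score += -0.5 * math.log(2 * math.pi * var) - diff * diff / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for the example.
nb = RunningGaussianNB()
for x, label in [([1.0, 2.0], "a"), ([3.0, 4.0], "a"),
                 ([10.0, 10.0], "b"), ([12.0, 14.0], "b")]:
    nb.update(x, label)
```

Each call to `update` touches only the statistics of one class, so the cost of adding a data point does not depend on how many examples were seen before.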

    Some are harder:
    - Tree models (propagate the new data and re-prune; this cannot be done in RapidMiner yet, but it should be easy to change the Java code)
    - Linear Regression (look for a paper on how to update regression models with 1 extra data point)
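One well-known way to update a linear regression with a single extra data point is recursive least squares, which uses the Sherman-Morrison formula to refresh the inverse matrix instead of recomputing it. The sketch below is plain Python under that assumption; the class name and the `delta` parameter are illustrative and not taken from any RapidMiner operator.

```python
class RecursiveLeastSquares:
    """Linear regression that is refined one (x, y) pair at a time."""

    def __init__(self, n_features, delta=1e6):
        self.n = n_features
        self.w = [0.0] * n_features
        # P tracks the inverse of (X^T X + I / delta); a large delta
        # means the initial regularization is very weak.
        self.P = [[delta if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]

    def predict(self, x):
        return sum(w_i * x_i for w_i, x_i in zip(self.w, x))

    def update(self, x, y):
        """Adjust weights and P for one new observation."""
        Px = [sum(p * x_j for p, x_j in zip(row, x)) for row in self.P]
        denom = 1.0 + sum(x_i * px_i for x_i, px_i in zip(x, Px))
        gain = [px_i / denom for px_i in Px]
        error = y - self.predict(x)
        self.w = [w_i + g_i * error for w_i, g_i in zip(self.w, gain)]
        # Sherman-Morrison: P <- P - (P x)(P x)^T / denom (P is symmetric).
        self.P = [[self.P[i][j] - gain[i] * Px[j] for j in range(self.n)]
                  for i in range(self.n)]


# Toy stream following y = 2 * t + 1; the first feature is a constant bias term.
rls = RecursiveLeastSquares(n_features=2)
for t in range(5):
    rls.update([1.0, float(t)], 2.0 * t + 1.0)
```

After the five updates, `rls.w` should be close to `[1.0, 2.0]`. Every update costs on the order of `n_features` squared operations, regardless of how many points came before.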

    More advanced techniques are not available in a tool like RapidMiner.
    Updating models is a hot topic in Machine Learning.
    If you have a good idea for updating, you can probably write a good paper on this topic!

    Best regards,

    Wessel
  • Danyo83 Member Posts: 41 Contributor II
    Hey Wessel,

    Thanks for the info. But even with k-NN and Naive Bayes, you would still have to train on the whole dataset again, including the new data point, right? It is not that the existing model would only be extended; it would be trained completely anew?! So it would be the same as e.g. with SVM, where the support vectors are constructed again every time you run the training, except that k-NN and NB are far faster...

    Daniel

  • wessel Member Posts: 537 Maven
    @ kNN
    No, because training of kNN is instant...

    @ Other
    Use "Update Model"
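To make the kNN point concrete: a kNN "model" is essentially the stored examples, so adding a data point is just an append and nothing is retrained. A minimal Python sketch, with a made-up class name and toy data:

```python
import math
from collections import Counter


class UpdatableKNN:
    """k-nearest-neighbour classifier; the model is simply the stored examples."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (features, label) pairs

    def update(self, x, label):
        # "Training" on a new point only stores it.
        self.examples.append((list(x), label))

    def predict(self, x):
        # All the work happens at prediction time: find the k closest points.
        nearest = sorted(self.examples, key=lambda ex: math.dist(ex[0], x))[:self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = UpdatableKNN(k=3)
for x, label in [([0, 0], "a"), ([0, 1], "a"), ([1, 0], "a"),
                 ([5, 5], "b"), ([5, 6], "b"), ([6, 5], "b")]:
    knn.update(x, label)
```

The trade-off is that prediction gets slower as the stored data grows, since this naive version scans every example.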

  • wessel Member Posts: 537 Maven
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.005" expanded="true" height="60" name="Retrieve" width="90" x="45" y="120">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="split_data" compatibility="5.3.005" expanded="true" height="76" name="Split Data" width="90" x="246" y="120">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.5"/>
              <parameter key="ratio" value="0.5"/>
            </enumeration>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="5.3.005" expanded="true" height="76" name="Naive Bayes" width="90" x="313" y="255"/>
          <operator activated="false" class="decision_tree" compatibility="5.3.005" expanded="true" height="76" name="Decision Tree" width="90" x="45" y="480"/>
          <operator activated="true" class="update_model" compatibility="5.3.005" expanded="true" height="76" name="Update Model" width="90" x="246" y="345"/>
          <connect from_op="Retrieve" from_port="output" to_op="Split Data" to_port="example set"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Update Model" to_port="model"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Update Model" to_port="example set"/>
          <connect from_op="Update Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Danyo83 Member Posts: 41 Contributor II
    Hey Wessel,

    Thanks a lot, but I have trouble understanding how exactly this works. Consider this case: you have time series features and 1000 instances for learning. You can optimize on the training data (e.g. feature selection) via simple (linear) or sliding window validation. After that, you train the learner with the optimal features on the whole training set again. Then you have one new, unseen instance for testing. In the next period, this instance goes into the training set and the next new instance is used for testing. The already trained model (built on 1000 instances) should be extended or adjusted with the 1 new instance, but not trained completely again on the whole data set (1001 instances).

    How would you realize this?

    I know it is not correct but my approach would be like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="1000_instance_train_data" width="90" x="45" y="30">
            <parameter key="csv_file" value="D:\Promotion\Matlab\Ich\Workspaces\Tag\Feature_Matrix_nonlin_check.csv"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="label.true.binominal.label"/>
              <parameter key="1" value="a1.true.real.attribute"/>
              <parameter key="2" value="a2.true.real.attribute"/>
              <parameter key="3" value="a3.true.real.attribute"/>
              <parameter key="4" value="a4.true.real.attribute"/>
              <parameter key="322" value="a322.true.real.attribute"/>
              <parameter key="323" value="a323.true.real.attribute"/>
              <parameter key="324" value="a324.true.real.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.3.005" expanded="true" height="94" name="Multiply" width="90" x="179" y="30"/>
          <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="1_new_instance" width="90" x="112" y="300">
            <parameter key="csv_file" value="D:\Promotion\Matlab\Ich\Workspaces\Tag\Feature_Matrix_nonlin_check.csv"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="label.true.integer.attribute"/>
              <parameter key="1" value="a1.true.real.attribute"/>
              <parameter key="2" value="a2.true.real.attribute"/>
              <parameter key="3" value="a3.true.real.attribute"/>
              <parameter key="4" value="a4.true.real.attribute"/>
              <parameter key="321" value="a321.true.real.attribute"/>
              <parameter key="322" value="a322.true.real.attribute"/>
              <parameter key="323" value="a323.true.real.attribute"/>
              <parameter key="324" value="a324.true.real.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="optimize_selection_forward" compatibility="5.3.005" expanded="true" height="94" name="Forward Selection" width="90" x="313" y="30">
            <process expanded="true">
              <operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="112" name="Validation" width="90" x="112" y="30">
                <process expanded="true">
                  <operator activated="true" class="naive_bayes" compatibility="5.3.005" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="30"/>
                  <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
                  <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance_classification" compatibility="5.3.005" expanded="true" height="76" name="Performance_train_data_validation" width="90" x="216" y="30">
                    <list key="class_weights"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance_train_data_validation" to_port="labelled data"/>
                  <connect from_op="Performance_train_data_validation" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="example set" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="select_by_weights" compatibility="5.3.005" expanded="true" height="94" name="Select by Weights" width="90" x="179" y="165"/>
          <operator activated="true" class="naive_bayes" compatibility="5.3.005" expanded="true" height="76" name="Naive Bayes (2)" width="90" x="313" y="165"/>
          <operator activated="true" class="multiply" compatibility="5.3.005" expanded="true" height="94" name="Multiply (2)" width="90" x="447" y="165"/>
          <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model_train_data_new" width="90" x="581" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.3.005" expanded="true" height="76" name="Performance_train_data_new" width="90" x="715" y="165">
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="select_by_weights" compatibility="5.3.005" expanded="true" height="94" name="Select by Weights (2)" width="90" x="313" y="300"/>
          <operator activated="true" class="update_model" compatibility="5.3.005" expanded="true" height="76" name="Update Model" width="90" x="447" y="300"/>
          <operator activated="true" class="read_csv" compatibility="5.3.005" expanded="true" height="60" name="1001_instances_new_training" width="90" x="112" y="435">
            <parameter key="csv_file" value="D:\Promotion\Matlab\Ich\Workspaces\Tag\Feature_Matrix_nonlin_check.csv"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="label.true.binominal.label"/>
              <parameter key="1" value="a1.true.real.attribute"/>
              <parameter key="2" value="a2.true.real.attribute"/>
              <parameter key="3" value="a3.true.real.attribute"/>
              <parameter key="4" value="a4.true.real.attribute"/>
              <parameter key="321" value="a321.true.real.attribute"/>
              <parameter key="322" value="a322.true.real.attribute"/>
              <parameter key="323" value="a323.true.real.attribute"/>
              <parameter key="324" value="a324.true.real.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="select_by_weights" compatibility="5.3.005" expanded="true" height="94" name="Select by Weights (3)" width="90" x="380" y="435"/>
          <operator activated="true" class="apply_model" compatibility="5.3.005" expanded="true" height="76" name="Apply Model_to_new_trainingset" width="90" x="581" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.3.005" expanded="true" height="76" name="Performance_train_data_new (2)" width="90" x="715" y="300">
            <list key="class_weights"/>
          </operator>
          <connect from_op="1000_instance_train_data" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Forward Selection" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Select by Weights" to_port="example set input"/>
          <connect from_op="1_new_instance" from_port="output" to_op="Select by Weights (2)" to_port="example set input"/>
          <connect from_op="Forward Selection" from_port="attribute weights" to_op="Select by Weights" to_port="weights"/>
          <connect from_op="Forward Selection" from_port="performance" to_port="result 1"/>
          <connect from_op="Select by Weights" from_port="example set output" to_op="Naive Bayes (2)" to_port="training set"/>
          <connect from_op="Select by Weights" from_port="weights" to_op="Select by Weights (2)" to_port="weights"/>
          <connect from_op="Naive Bayes (2)" from_port="model" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Naive Bayes (2)" from_port="exampleSet" to_op="Apply Model_train_data_new" to_port="unlabelled data"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_op="Apply Model_train_data_new" to_port="model"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_op="Update Model" to_port="model"/>
          <connect from_op="Apply Model_train_data_new" from_port="labelled data" to_op="Performance_train_data_new" to_port="labelled data"/>
          <connect from_op="Performance_train_data_new" from_port="performance" to_port="result 2"/>
          <connect from_op="Select by Weights (2)" from_port="example set output" to_op="Update Model" to_port="example set"/>
          <connect from_op="Select by Weights (2)" from_port="weights" to_op="Select by Weights (3)" to_port="weights"/>
          <connect from_op="Update Model" from_port="model" to_op="Apply Model_to_new_trainingset" to_port="model"/>
          <connect from_op="1001_instances_new_training" from_port="output" to_op="Select by Weights (3)" to_port="example set input"/>
          <connect from_op="Select by Weights (3)" from_port="example set output" to_op="Apply Model_to_new_trainingset" to_port="unlabelled data"/>
          <connect from_op="Apply Model_to_new_trainingset" from_port="labelled data" to_op="Performance_train_data_new (2)" to_port="labelled data"/>
          <connect from_op="Performance_train_data_new (2)" from_port="performance" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
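Outside RapidMiner, the scheme described in this post (predict the new instance first, then fold it into the model) is often called test-then-train or prequential evaluation. Below is a minimal Python sketch of that loop. The `NearestCentroid` class is only a stand-in for any model with `update` and `predict` methods, and all names and data are invented for illustration.

```python
class NearestCentroid:
    """Stand-in updatable model: one running mean vector per class."""

    def __init__(self):
        self.count = {}
        self.centroid = {}

    def update(self, x, label):
        n = self.count.get(label, 0) + 1
        old = self.centroid.get(label, [0.0] * len(x))
        # Running mean: shift the centroid towards the new point by 1/n.
        self.centroid[label] = [c + (v - c) / n for c, v in zip(old, x)]
        self.count[label] = n

    def predict(self, x):
        def sq_dist(label):
            return sum((c - v) ** 2 for c, v in zip(self.centroid[label], x))
        return min(self.centroid, key=sq_dist)


def prequential_accuracy(model, initial, stream):
    """Train on `initial`, then test on each new instance before learning it."""
    for x, label in initial:
        model.update(x, label)
    correct = 0
    for x, label in stream:
        if model.predict(x) == label:  # test on the unseen instance first
            correct += 1
        model.update(x, label)         # then extend the existing model
    return correct / len(stream)


initial = [([0.0], "down"), ([1.0], "up")]
stream = [([0.1], "down"), ([0.9], "up"), ([0.2], "down"), ([0.8], "up")]
accuracy = prequential_accuracy(NearestCentroid(), initial, stream)
```

The model is never rebuilt from the full data set here; each new instance is scored once and then added. Feature selection is assumed to have been done beforehand and kept fixed.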
  • wessel Member Posts: 537 Maven
    And can't you just throw a lot of CPU power at it and retrain all your models?

    You should evaluate how the cross-validation error differs from the sliding window validation error.
    If cross-validation is a good proxy, you can save a lot of computation here (model selection).
    After selecting your optimal scheme, you run 1 final sliding window validation on all your data to get an error estimate for your best model (error estimation).
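For readers unfamiliar with sliding window validation, the idea is to train on a window of consecutive examples, test on the examples right after it, then move the window forward. The Python sketch below only generates the index ranges; the function and parameter names are hypothetical and are not claimed to match the parameters of RapidMiner's operator.

```python
def sliding_windows(n_examples, train_size, test_size, step):
    """Yield (train_indices, test_indices) pairs that respect time order."""
    start = 0
    while start + train_size + test_size <= n_examples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


# 10 examples, train on 5, test on the next 2, move forward by 2 each time.
windows = [(list(train), list(test))
           for train, test in sliding_windows(10, 5, 2, 2)]
```

Unlike ordinary cross-validation, the test examples always come after the training examples, which is what makes the error estimate meaningful for time series.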

    Best regards,

    Wessel