Classifying new examples without re-running the existing trained model

brett_800 Member Posts: 8 Contributor II
edited August 2019 in Help
How can I classify new examples with my trained model without retraining the model every time?

Training the model takes some time (about 1 hour), and I'd like to classify new examples without having to wait every time. I've never separated these two steps before; I have always kept them in the same process, as I don't know how to execute them independently.







Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    You need two processes for that.

    The first one is the training process: create the model, and then use the Store operator to save the model to the repository.

    Then create a second process, the application process: load your new examples, use the Retrieve operator to load the model from the repository, and use Apply Model as usual.

    Best regards,
    Marius
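
    As a minimal sketch of the application process described above (not taken from the thread): the repository paths `//Local Repository/data/new_examples` and `//Local Repository/models/my_model` are hypothetical placeholders, and the operator classes and port names assume RapidMiner 5.3. In the training process, the matching step would be a Store operator whose `input` port receives the learner's model output and whose `repository_entry` parameter points to the same model path.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <!-- New, unlabelled examples (hypothetical repository path) -->
          <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve New Examples" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Local Repository/data/new_examples"/>
          </operator>
          <!-- Model saved earlier by the Store operator of the training process (hypothetical path) -->
          <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve Model" width="90" x="45" y="120">
            <parameter key="repository_entry" value="//Local Repository/models/my_model"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="Apply Model" width="90" x="179" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve New Examples" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Retrieve Model" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    Because this process only retrieves and applies the stored model, running it does not repeat the training.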
  • brett_800 Member Posts: 8 Contributor II
    Thanks for the reply, Marius.

    I have a couple of questions about the process listed below. It is my actual process, except that I have substituted an example dataset.

    First, given the loops in this process, where is the correct place to connect the Store (Model) operator? Second, given the preprocessing in this process, how do I ensure that new, out-of-sample data receives exactly the same preprocessing when I build a new process around the previously stored model? (I think ICA generates custom weights that might have to be written somewhere.)

    The process XML:


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.012">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.012" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" compatibility="5.3.012" expanded="true" height="76" name="Numerical to Binominal" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="20_OV_COVER"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.3.012" expanded="true" height="76" name="Set Role" width="90" x="45" y="120">
            <parameter key="attribute_name" value="class"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.3.012" expanded="true" height="94" name="Normalize" width="90" x="179" y="120"/>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.012" expanded="true" height="94" name="Nominal to Numerical (2)" width="90" x="45" y="210">
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="5.3.012" expanded="true" height="94" name="Replace Missing Values" width="90" x="179" y="210">
            <list key="columns"/>
          </operator>
          <operator activated="true" class="independent_component_analysis" compatibility="5.3.012" expanded="true" height="94" name="ICA" width="90" x="313" y="210">
            <parameter key="number_of_components" value="700"/>
          </operator>
          <operator activated="true" class="optimize_selection_forward" compatibility="5.3.012" expanded="true" height="94" name="Forward Selection" width="90" x="514" y="75">
            <parameter key="maximal_number_of_attributes" value="100"/>
            <parameter key="speculative_rounds" value="10"/>
            <process expanded="true">
              <operator activated="true" class="x_validation" compatibility="5.3.012" expanded="true" height="112" name="Validation" width="90" x="112" y="30">
                <parameter key="number_of_validations" value="5"/>
                <process expanded="true">
                  <operator activated="true" class="naive_bayes" compatibility="5.3.012" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="30"/>
                  <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
                  <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="5.3.012" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="5.3.012" expanded="true" height="76" name="Performance" width="90" x="276" y="30"/>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="example set" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Nominal to Numerical (2)" to_port="example set input"/>
          <connect from_op="Nominal to Numerical (2)" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="ICA" to_port="example set input"/>
          <connect from_op="ICA" from_port="example set output" to_op="Forward Selection" to_port="example set"/>
          <connect from_op="ICA" from_port="original" to_port="result 1"/>
          <connect from_op="ICA" from_port="preprocessing model" to_port="result 2"/>
          <connect from_op="Forward Selection" from_port="example set" to_port="result 3"/>
          <connect from_op="Forward Selection" from_port="attribute weights" to_port="result 4"/>
          <connect from_op="Forward Selection" from_port="performance" to_port="result 5"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="18"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
          <portSpacing port="sink_result 6" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Most preprocessing operators have a "pre" output that delivers a preprocessing model, which can be applied with Apply Model in the same way as a prediction model.

    Since you have several preprocessing steps, you will find the Group Models operator useful: it combines several (pre)processing models so that they can be applied with a single Apply Model operator. This combined model can also be stored in the repository.

    Since Forward Selection only outputs the attribute weights, but not the model itself, you have to train the model in a separate step. See the process below for an example.

    Btw, the attribute weights (the att output of Forward Selection) should also be stored in the repository, and applied with Select by Weights in the application process before the model is applied.

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="optimize_selection_forward" compatibility="5.3.013" expanded="true" height="94" name="Forward Selection" width="90" x="179" y="75">
            <parameter key="maximal_number_of_attributes" value="100"/>
            <parameter key="speculative_rounds" value="10"/>
            <process expanded="true">
              <operator activated="true" class="x_validation" compatibility="5.3.013" expanded="true" height="112" name="Validation" width="90" x="112" y="30">
                <parameter key="number_of_validations" value="5"/>
                <process expanded="true">
                  <operator activated="true" class="naive_bayes" compatibility="5.3.013" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="30"/>
                  <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
                  <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="5.3.013" expanded="true" height="76" name="Performance" width="90" x="276" y="30"/>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="example set" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="averagable 1" to_port="performance"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="5.3.013" expanded="true" height="76" name="Naive Bayes (2)" width="90" x="313" y="30"/>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Forward Selection" to_port="example set"/>
          <connect from_op="Forward Selection" from_port="example set" to_op="Naive Bayes (2)" to_port="training set"/>
          <connect from_op="Naive Bayes (2)" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
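
    The storing and application steps described above could be sketched roughly as follows. This is not part of the original reply: it assumes the preprocessing chain from the question is placed in front of Forward Selection in the process above, that operator classes and port names are those of RapidMiner 5.3, and that the three repository paths are hypothetical placeholders. The fragment goes into the main training process; its connections replace any existing ones from the same output ports (for example the result connections of ICA's preprocessing model, of the attribute weights, and of the Naive Bayes (2) model).

    ```xml
    <!-- Fragment for the training process: combine the preprocessing models in application order and store everything -->
    <operator activated="true" class="group_models" compatibility="5.3.013" expanded="true" height="130" name="Group Models" width="90" x="447" y="210"/>
    <operator activated="true" class="store" compatibility="5.3.013" expanded="true" height="60" name="Store Preprocessing" width="90" x="581" y="210">
      <parameter key="repository_entry" value="//Local Repository/models/preprocessing"/>
    </operator>
    <operator activated="true" class="store" compatibility="5.3.013" expanded="true" height="60" name="Store Weights" width="90" x="581" y="120">
      <parameter key="repository_entry" value="//Local Repository/models/fs_weights"/>
    </operator>
    <operator activated="true" class="store" compatibility="5.3.013" expanded="true" height="60" name="Store Model" width="90" x="581" y="30">
      <parameter key="repository_entry" value="//Local Repository/models/nb_model"/>
    </operator>
    <connect from_op="Normalize" from_port="preprocessing model" to_op="Group Models" to_port="models in 1"/>
    <connect from_op="Nominal to Numerical (2)" from_port="preprocessing model" to_op="Group Models" to_port="models in 2"/>
    <connect from_op="Replace Missing Values" from_port="preprocessing model" to_op="Group Models" to_port="models in 3"/>
    <connect from_op="ICA" from_port="preprocessing model" to_op="Group Models" to_port="models in 4"/>
    <connect from_op="Group Models" from_port="model out" to_op="Store Preprocessing" to_port="input"/>
    <connect from_op="Forward Selection" from_port="attribute weights" to_op="Store Weights" to_port="input"/>
    <connect from_op="Naive Bayes (2)" from_port="model" to_op="Store Model" to_port="input"/>
    <connect from_op="Store Model" from_port="through" to_port="result 1"/>
    ```

    A matching application process, under the same assumptions, retrieves the three stored objects and applies them in the order preprocessing, attribute selection, prediction. Steps that do not deliver a preprocessing model would have to be repeated explicitly in front of it.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <!-- All repository paths below are hypothetical placeholders -->
          <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve New Examples" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Local Repository/data/new_examples"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve Preprocessing" width="90" x="45" y="120">
            <parameter key="repository_entry" value="//Local Repository/models/preprocessing"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve Weights" width="90" x="45" y="210">
            <parameter key="repository_entry" value="//Local Repository/models/fs_weights"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.3.013" expanded="true" height="60" name="Retrieve Model" width="90" x="45" y="300">
            <parameter key="repository_entry" value="//Local Repository/models/nb_model"/>
          </operator>
          <!-- Step 1: apply the grouped preprocessing model to the new data -->
          <operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="Apply Preprocessing" width="90" x="179" y="30">
            <list key="application_parameters"/>
          </operator>
          <!-- Step 2: keep only the attributes chosen by Forward Selection (selected attributes carry weight 1) -->
          <operator activated="true" class="select_by_weights" compatibility="5.3.013" expanded="true" height="94" name="Select by Weights" width="90" x="313" y="30">
            <parameter key="weight_relation" value="greater equals"/>
            <parameter key="weight" value="1.0"/>
          </operator>
          <!-- Step 3: apply the stored prediction model -->
          <operator activated="true" class="apply_model" compatibility="5.3.013" expanded="true" height="76" name="Apply Prediction Model" width="90" x="447" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve New Examples" from_port="output" to_op="Apply Preprocessing" to_port="unlabelled data"/>
          <connect from_op="Retrieve Preprocessing" from_port="output" to_op="Apply Preprocessing" to_port="model"/>
          <connect from_op="Apply Preprocessing" from_port="labelled data" to_op="Select by Weights" to_port="example set input"/>
          <connect from_op="Retrieve Weights" from_port="output" to_op="Select by Weights" to_port="weights"/>
          <connect from_op="Select by Weights" from_port="example set output" to_op="Apply Prediction Model" to_port="unlabelled data"/>
          <connect from_op="Retrieve Model" from_port="output" to_op="Apply Prediction Model" to_port="model"/>
          <connect from_op="Apply Prediction Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```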