Incrementally adding new data to a decision tree

flaviorrrl Member Posts: 9 Contributor II
edited November 2018 in Help
I have three CSV files covering different years (2001, 2005, 2009) of electric energy consumption in the USA.

I've managed to build the decision tree model for 2001, and now I would like to update it incrementally with the data from 2005.

Does anyone know how to do this?

Regards,
flaviorrl

Answers

  • flaviorrrl Member Posts: 9 Contributor II
    Help me please....
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    The decision tree is not an updatable model. You will have to relearn a new one on the updated training data. The only updatable models in the core of Studio currently are k-NN and Naive Bayes.
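
    A minimal sketch of what "relearning on the updated training data" could look like as a process. The two Generate Nominal Data operators (renamed "Data 2001" and "Data 2005" here) are only stand-ins for the real yearly data, and the operator classes `append` and `decision_tree` with their port names are assumed to match the 6.0 core operators:

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.006">
      <operator activated="true" class="process" compatibility="6.0.006" expanded="true" name="Process">
        <process expanded="true">
          <!-- Stand-in for the 2001 data; replace with your own data source -->
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" name="Data 2001">
            <parameter key="use_local_random_seed" value="true"/>
          </operator>
          <!-- Stand-in for the 2005 data -->
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" name="Data 2005">
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="2014"/>
          </operator>
          <!-- Combine both years into a single training set -->
          <operator activated="true" class="append" compatibility="6.0.006" expanded="true" name="Append"/>
          <!-- Relearn the tree from scratch on the combined data -->
          <operator activated="true" class="decision_tree" compatibility="6.0.006" expanded="true" name="Decision Tree"/>
          <connect from_op="Data 2001" from_port="output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Data 2005" from_port="output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    Both inputs need the same attributes for the append step to work, so the yearly files would have to share one column layout.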

    Regards,
    Marco
  • flaviorrrl Member Posts: 9 Contributor II
    Thank you for the reply...

    Can you tell me how this works for k-NN and Naive Bayes, using Update Model?

    I tried but couldn't get it to work...
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    See the following example process, which you just need to save in your repository:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.006" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" height="60" name="Generate Nominal Data (2)" width="90" x="45" y="165">
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="2014"/>
          </operator>
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30">
            <parameter key="use_local_random_seed" value="true"/>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="6.0.006" expanded="true" height="76" name="Naive Bayes" width="90" x="179" y="30"/>
          <operator activated="true" class="store" compatibility="6.0.006" expanded="true" height="60" name="Store" width="90" x="313" y="30">
            <parameter key="repository_entry" value="originalModel"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="6.0.006" expanded="true" height="60" name="Retrieve" width="90" x="514" y="165">
            <parameter key="repository_entry" value="originalModel"/>
          </operator>
          <operator activated="true" class="update_model" compatibility="6.0.006" expanded="true" height="76" name="Update Model" width="90" x="514" y="75"/>
          <connect from_op="Generate Nominal Data (2)" from_port="output" to_op="Update Model" to_port="example set"/>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_op="Update Model" to_port="model"/>
          <connect from_op="Retrieve" from_port="output" to_port="result 2"/>
          <connect from_op="Update Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
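
    The same wiring should carry over to k-NN, the other updatable learner. A minimal sketch, assuming the learner's operator class is `k_nn` with a `k` parameter and the same "training set" and "model" ports as Naive Bayes (the Store and Retrieve steps are left out for brevity):

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.006">
      <operator activated="true" class="process" compatibility="6.0.006" expanded="true" name="Process">
        <process expanded="true">
          <!-- Initial training data -->
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" name="Generate Nominal Data">
            <parameter key="use_local_random_seed" value="true"/>
          </operator>
          <!-- New data used for the update -->
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" name="Generate Nominal Data (2)">
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="2014"/>
          </operator>
          <!-- k-NN in place of Naive Bayes; k=5 is an arbitrary illustrative value -->
          <operator activated="true" class="k_nn" compatibility="6.0.006" expanded="true" name="k-NN">
            <parameter key="k" value="5"/>
          </operator>
          <operator activated="true" class="update_model" compatibility="6.0.006" expanded="true" name="Update Model"/>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="k-NN" to_port="training set"/>
          <connect from_op="k-NN" from_port="model" to_op="Update Model" to_port="model"/>
          <connect from_op="Generate Nominal Data (2)" from_port="output" to_op="Update Model" to_port="example set"/>
          <connect from_op="Update Model" from_port="model" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```
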
    Regards,
    Marco
  • flaviorrrl Member Posts: 9 Contributor II
    Hi,

    Thank you very much for your help, Marco.

    To add a third dataset to this process, is my XML below correct?
    After the first process has run, it retrieves the saved dataset (Retrieve newData) and adds a new dataset (Generate Nominal Data).

    Does RapidMiner support incremental clustering and incremental association rules? For incremental clustering, should I use the Cobweb algorithm from the Weka extension?

    Could you provide a simple example for both?

    Regards,
    flaviorrl
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="generate_nominal_data" compatibility="5.3.015" expanded="true" height="60" name="Generate Nominal Data (2)" width="90" x="45" y="165">
           <parameter key="use_local_random_seed" value="true"/>
           <parameter key="local_random_seed" value="2014"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve newData" width="90" x="45" y="300">
           <parameter key="repository_entry" value="newData"/>
         </operator>
         <operator activated="true" class="generate_nominal_data" compatibility="5.3.015" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30">
           <parameter key="use_local_random_seed" value="true"/>
         </operator>
         <operator activated="true" class="naive_bayes" compatibility="5.3.015" expanded="true" height="76" name="Naive Bayes" width="90" x="179" y="30"/>
         <operator activated="true" class="store" compatibility="5.3.015" expanded="true" height="60" name="Store" width="90" x="313" y="30">
           <parameter key="repository_entry" value="//NewLocalRepository/originalModel"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve originalModel" width="90" x="1050" y="300">
           <parameter key="repository_entry" value="//NewLocalRepository/originalModel"/>
         </operator>
         <operator activated="true" class="update_model" compatibility="5.3.015" expanded="true" height="76" name="Update Model" width="90" x="447" y="210"/>
         <operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="648" y="210">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance_classification" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="782" y="30">
           <parameter key="kappa" value="true"/>
           <parameter key="root_mean_squared_error" value="true"/>
           <parameter key="root_relative_squared_error" value="true"/>
           <parameter key="correlation" value="true"/>
           <list key="class_weights"/>
         </operator>
         <operator activated="true" class="store" compatibility="5.3.015" expanded="true" height="60" name="Store (2)" width="90" x="916" y="120">
           <parameter key="repository_entry" value="newData"/>
         </operator>
         <connect from_op="Retrieve newData" from_port="output" to_op="Update Model" to_port="example set"/>
         <connect from_op="Generate Nominal Data" from_port="output" to_op="Naive Bayes" to_port="training set"/>
         <connect from_op="Naive Bayes" from_port="model" to_op="Store" to_port="input"/>
         <connect from_op="Store" from_port="through" to_op="Update Model" to_port="model"/>
         <connect from_op="Retrieve originalModel" from_port="output" to_port="result 3"/>
         <connect from_op="Update Model" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Update Model" from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
         <connect from_op="Performance" from_port="performance" to_port="result 1"/>
         <connect from_op="Performance" from_port="example set" to_op="Store (2)" to_port="input"/>
         <connect from_op="Store (2)" from_port="through" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
         <portSpacing port="sink_result 4" spacing="0"/>
       </process>
     </operator>
    </process>
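
    For comparison, one possible way to fold in a third dataset is to chain a second Update Model after the first, so that each new batch updates the model produced by the previous step. This is only a sketch under assumptions: the three Generate Nominal Data operators stand in for the three yearly datasets, the seed 2015 is arbitrary, and the version is taken from the example process above rather than 5.3.015:

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.006">
      <operator activated="true" class="process" compatibility="6.0.006" expanded="true" name="Process">
        <process expanded="true">
          <!-- First dataset: used to learn the initial model -->
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" name="Generate Nominal Data">
            <parameter key="use_local_random_seed" value="true"/>
          </operator>
          <!-- Second dataset: first update -->
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" name="Generate Nominal Data (2)">
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="2014"/>
          </operator>
          <!-- Third dataset: second update -->
          <operator activated="true" class="generate_nominal_data" compatibility="6.0.006" expanded="true" name="Generate Nominal Data (3)">
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="2015"/>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="6.0.006" expanded="true" name="Naive Bayes"/>
          <operator activated="true" class="update_model" compatibility="6.0.006" expanded="true" name="Update Model"/>
          <operator activated="true" class="update_model" compatibility="6.0.006" expanded="true" name="Update Model (2)"/>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Update Model" to_port="model"/>
          <connect from_op="Generate Nominal Data (2)" from_port="output" to_op="Update Model" to_port="example set"/>
          <!-- The updated model feeds the second update step -->
          <connect from_op="Update Model" from_port="model" to_op="Update Model (2)" to_port="model"/>
          <connect from_op="Generate Nominal Data (3)" from_port="output" to_op="Update Model (2)" to_port="example set"/>
          <connect from_op="Update Model (2)" from_port="model" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```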
