Using the Select Attribute Operator during training

ml1n Member Posts: 8 Contributor II
edited November 2018 in Help
Hello,
I've trained a neural network, using a Select Attributes operator to reduce the 85 attributes from a database view to the 25 attributes used in training.

After training, I saved the neural network to the repository and tried to apply it to unseen examples loaded with the Read CSV operator.
The unseen sample contains a single record with only the 25 attributes that the Select Attributes operator passed through during training, but I get an error from the neural network:

Apr 11, 2012 9:54:23 PM SEVERE: java.lang.ArrayIndexOutOfBoundsException: DataRow: table index 85 of Attribute avg_score_4 is out of bounds.


If I present an example containing all 85 original variables, it works.
Have I misunderstood what the Select Attributes operator is designed to do? I'm guessing I could make this work by creating a database view containing only the variables I need, but that seems redundant when the Select Attributes operator is available.

What am I missing?

Many thanks,
M.

Answers

  • Nils_Woehler Member Posts: 463 Maven
    Hello,

    With just your description it is hard to understand the problem you are having.
    Please post your training and apply processes as described here: http://rapid-i.com/rapidforum/index.php/topic,4654.0.html

    Best,
    Nils
  • ml1n Member Posts: 8 Contributor II
    Hi,
    Apologies for not posting the processes; I assumed I was doing something obviously stupid.
    The training process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
       <process expanded="true" height="386" width="949">
         <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
           <parameter key="repository_entry" value="//DB/black.pg/Example Sets/public.vw___w_stats_b_0610"/>
         </operator>
         <operator activated="true" class="set_role" compatibility="5.1.017" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
           <parameter key="name" value="hvwo"/>
           <parameter key="target_role" value="label"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="5.1.017" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="hvwo|a_at_strength_10|a_at_strength_4|a_at_strength_40|a_de_strength_10|a_de_strength_4|a_de_strength_40|h_at_strength_10|h_at_strength_4|h_at_strength_40|h_de_strength_10|h_de_strength_4|h_de_strength_40||avg_h_score_40|avg_h_score_4|avg_h_score_10|avg_h_concd_40|avg_h_concd_4|avg_h_concd_10|avg_a_score_40|avg_a_score_4|avg_a_score_10|avg_a_concd_40|avg_a_concd_4|avg_a_concd_10"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="5.1.017" expanded="true" height="94" name="Normalize" width="90" x="447" y="30">
           <parameter key="method" value="range transformation"/>
         </operator>
         <operator activated="true" class="neural_net" compatibility="5.1.017" expanded="true" height="76" name="Neural Net" width="90" x="581" y="30">
           <list key="hidden_layers"/>
           <parameter key="normalize" value="false"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="300">
           <parameter key="repository_entry" value="//DB/black.pg/Example Sets/public.vw___w_stats_b_1011"/>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="5.1.017" expanded="true" height="76" name="Select Attributes (2)" width="90" x="179" y="255">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="avg_a_concd_10|avg_a_concd_4|avg_a_concd_40|avg_a_score_10|avg_a_score_4|avg_a_score_40|avg_h_concd_10|avg_h_concd_4|avg_h_concd_40|avg_h_score_10|avg_h_score_4|avg_h_score_40|a_at_strength_10|a_at_strength_4|a_at_strength_40|a_de_strength_10|a_de_strength_4|a_de_strength_40|h_at_strength_10|h_at_strength_4|h_at_strength_40|h_de_strength_10|h_de_strength_4|h_de_strength_40|hvwo"/>
         </operator>
         <operator activated="true" class="set_role" compatibility="5.1.017" expanded="true" height="76" name="Set Role (2)" width="90" x="313" y="210">
           <parameter key="name" value="hvwo"/>
           <parameter key="target_role" value="label"/>
           <list key="set_additional_roles">
             <parameter key="hvwo" value="label"/>
           </list>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.1.017" expanded="true" height="76" name="Apply Model (2)" width="90" x="451" y="187">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.1.017" expanded="true" height="76" name="Apply Model" width="90" x="581" y="165">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance" compatibility="5.1.017" expanded="true" height="76" name="Performance" width="90" x="715" y="300"/>
         <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="Neural Net" to_port="training set"/>
         <connect from_op="Normalize" from_port="preprocessing model" to_op="Apply Model (2)" to_port="model"/>
         <connect from_op="Neural Net" from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_op="Retrieve (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
         <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
         <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
         <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model (2)" from_port="model" to_port="result 4"/>
         <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
         <connect from_op="Apply Model" from_port="model" to_port="result 3"/>
         <connect from_op="Performance" from_port="performance" to_port="result 1"/>
         <connect from_op="Performance" from_port="example set" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="180"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
         <portSpacing port="sink_result 4" spacing="0"/>
         <portSpacing port="sink_result 5" spacing="0"/>
       </process>
     </operator>
    </process>
    The "running" process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
       <parameter key="logverbosity" value="all"/>
       <process expanded="true" height="460" width="688">
         <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="112" y="30">
           <parameter key="repository_entry" value="../../Models/h_with_ad_as"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve (2)" width="90" x="112" y="120">
           <parameter key="repository_entry" value="../../Models/norm_h_with_ad_as"/>
         </operator>
         <operator activated="true" class="open_file" compatibility="5.1.017" expanded="true" height="60" name="Open File" width="90" x="45" y="255">
           <parameter key="filename" value="C:\Users\ml1n\Desktop\testdata.csv"/>
         </operator>
         <operator activated="true" class="read_csv" compatibility="5.1.017" expanded="true" height="60" name="Read CSV" width="90" x="112" y="345">
           <parameter key="csv_file" value="C:\Users\ml1n\Desktop\testdata.csv"/>
           <parameter key="column_separators" value=","/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <parameter key="encoding" value="windows-1252"/>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="avg_h_score_4.true.real.attribute"/>
             <parameter key="1" value="avg_h_score_10.true.real.attribute"/>
             <parameter key="2" value="avg_h_score_40.true.real.attribute"/>
             <parameter key="3" value="avg_h_concd_4.true.real.attribute"/>
             <parameter key="4" value="avg_h_concd_10.true.real.attribute"/>
             <parameter key="5" value="avg_h_concd_40.true.real.attribute"/>
             <parameter key="6" value="avg_a_score_4.true.real.attribute"/>
             <parameter key="7" value="avg_a_score_10.true.real.attribute"/>
             <parameter key="8" value="avg_a_score_40.true.real.attribute"/>
             <parameter key="9" value="avg_a_concd_4.true.real.attribute"/>
             <parameter key="10" value="avg_a_concd_10.true.real.attribute"/>
             <parameter key="11" value="avg_a_concd_40.true.real.attribute"/>
             <parameter key="12" value="h_at_strength_4.true.real.attribute"/>
             <parameter key="13" value="h_at_strength_10.true.real.attribute"/>
             <parameter key="14" value="h_at_strength_40.true.real.attribute"/>
             <parameter key="15" value="h_de_strength_4.true.real.attribute"/>
             <parameter key="16" value="h_de_strength_10.true.real.attribute"/>
             <parameter key="17" value="h_de_strength_40.true.real.attribute"/>
             <parameter key="18" value="a_at_strength_4.true.real.attribute"/>
             <parameter key="19" value="a_at_strength_10.true.real.attribute"/>
             <parameter key="20" value="a_at_strength_40.true.real.attribute"/>
             <parameter key="21" value="a_de_strength_4.true.real.attribute"/>
             <parameter key="22" value="a_de_strength_10.true.real.attribute"/>
             <parameter key="23" value="a_de_strength_40.true.real.attribute"/>
           </list>
           <parameter key="read_not_matching_values_as_missings" value="false"/>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.1.017" expanded="true" height="76" name="Apply Model (2)" width="90" x="313" y="210">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.1.017" expanded="true" height="76" name="Apply Model" width="90" x="514" y="120">
           <list key="application_parameters"/>
         </operator>
         <connect from_op="Retrieve" from_port="output" to_op="Apply Model" to_port="model"/>
         <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model (2)" to_port="model"/>
         <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
         <connect from_op="Read CSV" from_port="output" to_op="Apply Model (2)" to_port="unlabelled data"/>
         <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="108"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
  • ml1n Member Posts: 8 Contributor II
    Hi,
    Having tried changing various things to understand this problem further, it appears that the only way I can get rid of this error is to store the data in the repository after the Set Role and Select Attributes operators and train from there rather than from the database (i.e. not using a Select Attributes or Set Role operator during training). I also need to remove the normalization stage.
    Neither of those options is available to me for the real use case, though: I need to try different combinations of attributes during training rather than storing specific ones in the repository, and the model works significantly better on the normalized data.

    I can't stop thinking that I'm missing something obvious. Am I doing something stupid?

    This is the only pair of processes I can get to work:
    Learning:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <process expanded="true" height="386" width="949">
          <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="255">
            <parameter key="repository_entry" value="//football/Latest/data/vwo_1011"/>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//football/Latest/data/vwo_0610"/>
          </operator>
          <operator activated="true" class="neural_net" compatibility="5.1.017" expanded="true" height="76" name="Neural Net" width="90" x="581" y="30">
            <list key="hidden_layers"/>
            <parameter key="normalize" value="false"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.1.017" expanded="true" height="60" name="Store" width="90" x="715" y="30">
            <parameter key="repository_entry" value="home_nn"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.017" expanded="true" height="76" name="Apply Model" width="90" x="581" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.1.017" expanded="true" height="76" name="Performance" width="90" x="715" y="300"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Retrieve" from_port="output" to_op="Neural Net" to_port="training set"/>
          <connect from_op="Neural Net" from_port="model" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <connect from_op="Performance" from_port="example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="180"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Applying:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <parameter key="logverbosity" value="all"/>
        <process expanded="true" height="460" width="688">
          <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="112" y="30">
            <parameter key="repository_entry" value="home_nn"/>
          </operator>
          <operator activated="true" class="open_file" compatibility="5.1.017" expanded="true" height="60" name="Open File" width="90" x="112" y="165">
            <parameter key="filename" value="C:\Users\ml1n\Desktop\testdata.csv"/>
          </operator>
          <operator activated="true" class="read_csv" compatibility="5.1.017" expanded="true" height="60" name="Read CSV" width="90" x="112" y="345">
            <parameter key="csv_file" value="C:\Users\ml1n\Desktop\testdata.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="avg_h_score_4.true.real.attribute"/>
              <parameter key="1" value="avg_h_score_10.true.real.attribute"/>
              <parameter key="2" value="avg_h_score_40.true.real.attribute"/>
              <parameter key="3" value="avg_h_concd_4.true.real.attribute"/>
              <parameter key="4" value="avg_h_concd_10.true.real.attribute"/>
              <parameter key="5" value="avg_h_concd_40.true.real.attribute"/>
              <parameter key="6" value="avg_a_score_4.true.real.attribute"/>
              <parameter key="7" value="avg_a_score_10.true.real.attribute"/>
              <parameter key="8" value="avg_a_score_40.true.real.attribute"/>
              <parameter key="9" value="avg_a_concd_4.true.real.attribute"/>
              <parameter key="10" value="avg_a_concd_10.true.real.attribute"/>
              <parameter key="11" value="avg_a_concd_40.true.real.attribute"/>
              <parameter key="12" value="h_at_strength_4.true.real.attribute"/>
              <parameter key="13" value="h_at_strength_10.true.real.attribute"/>
              <parameter key="14" value="h_at_strength_40.true.real.attribute"/>
              <parameter key="15" value="h_de_strength_4.true.real.attribute"/>
              <parameter key="16" value="h_de_strength_10.true.real.attribute"/>
              <parameter key="17" value="h_de_strength_40.true.real.attribute"/>
              <parameter key="18" value="a_at_strength_4.true.real.attribute"/>
              <parameter key="19" value="a_at_strength_10.true.real.attribute"/>
              <parameter key="20" value="a_at_strength_40.true.real.attribute"/>
              <parameter key="21" value="a_de_strength_4.true.real.attribute"/>
              <parameter key="22" value="a_de_strength_10.true.real.attribute"/>
              <parameter key="23" value="a_de_strength_40.true.real.attribute"/>
            </list>
            <parameter key="read_not_matching_values_as_missings" value="false"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.017" expanded="true" height="76" name="Apply Model" width="90" x="514" y="120">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
          <connect from_op="Read CSV" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="108"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Regards,
    M.
  • haddock Member Posts: 849 Maven
    G'Day!

    Probably got the wrong end of the stick, again, but if you want to train on one normalised set, and then apply that model on another set, then the 'Group Models' route looks the biz, it'll put the neural net and the normalisation models together into one model that can be applied elsewhere. Here's an example ...

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
       <process expanded="true" height="245" width="822">
         <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="38" y="31"/>
         <operator activated="true" class="normalize" compatibility="5.2.003" expanded="true" height="94" name="Normalize" width="90" x="179" y="30"/>
         <operator activated="true" class="neural_net" compatibility="5.2.003" expanded="true" height="76" name="Neural Net" width="90" x="320" y="27">
           <list key="hidden_layers"/>
         </operator>
         <operator activated="true" class="group_models" compatibility="5.2.003" expanded="true" height="94" name="Group Models" width="90" x="514" y="30"/>
         <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="165"/>
         <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" height="76" name="Apply Model" width="90" x="648" y="120">
           <list key="application_parameters"/>
         </operator>
         <connect from_op="Generate Data" from_port="output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="Neural Net" to_port="training set"/>
         <connect from_op="Normalize" from_port="preprocessing model" to_op="Group Models" to_port="models in 2"/>
         <connect from_op="Neural Net" from_port="model" to_op="Group Models" to_port="models in 1"/>
         <connect from_op="Group Models" from_port="model out" to_op="Apply Model" to_port="model"/>
         <connect from_op="Generate Data (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
         <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
    Hope that's useful.
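
    If the grouped model needs to be applied from a separate process, as in the original question, one possible approach is to write it to the repository with a Store operator placed after Group Models and retrieve it on the apply side. A minimal sketch of the apply side follows. It assumes a hypothetical repository entry named grouped_nn; Generate Data merely stands in for the real unseen data, and the layout attributes (height, width, x, y) are left out for brevity, on the assumption that they are optional on import. Operator classes and port names are copied from the processes posted in this thread.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true">
          <!-- Hypothetical entry: the stored output of Group Models (normalization + neural net) -->
          <operator activated="true" class="retrieve" compatibility="5.2.003" expanded="true" name="Retrieve">
            <parameter key="repository_entry" value="grouped_nn"/>
          </operator>
          <!-- Placeholder for the unseen examples; replace with Read CSV or a database source -->
          <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" name="Generate Data"/>
          <!-- A single Apply Model should suffice, since the grouped model carries both steps -->
          <operator activated="true" class="apply_model" compatibility="5.2.003" expanded="true" name="Apply Model">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Generate Data" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    Compared with the two chained Apply Model operators in the original "running" process, this keeps the normalization and the neural net together, so they cannot be applied in the wrong order or with mismatched versions.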

  • ml1n Member Posts: 8 Contributor II
    Solved!
    I needed to set the "Create View" option on the Normalize operator. Once I'd used Group Models, I started getting the error during the performance testing in the training process, which led me to the solution.
    Thanks for the help Haddock, I was starting to tear out what little hair I have remaining. The Group Models is a nifty tip that makes my life a lot simpler too.
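
    For anyone who finds this later, the change to the Normalize operator from my training process would look roughly like the fragment below. This is a sketch only: the parameter key create_view is an assumption based on the "Create View" label in the GUI, so check the XML that RapidMiner writes after ticking the box.

    ```xml
    <operator activated="true" class="normalize" compatibility="5.1.017" expanded="true" height="94" name="Normalize" width="90" x="447" y="30">
      <!-- Assumed key for the "Create View" checkbox -->
      <parameter key="create_view" value="true"/>
      <parameter key="method" value="range transformation"/>
    </operator>
    ```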

    Regards,
    M.