"Same accuracy...different predictions"

blueearthblueearth Member Posts: 42 Contributor II
edited June 2019 in Help
Hi, I had a multi-label data set whose attributes I weighted with two methods, Gini index and uncertainty. I collected the attributes that gained a weight of more than 0.5 and made two data sets: one based on the attributes selected by Gini index weighting and the other based on the attributes selected by uncertainty weighting.
I ran both data sets through an X-Validation that trains a Neural Net operator. The achieved accuracy was the same for both data sets: 99.45%.
But when I applied the model to an unknown data set, once with the Gini index attributes and once with the uncertainty attributes, the predictions were completely different. What's the problem? Did I go wrong somewhere?

Answers

  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    The accuracy only estimates the probability that a new, unseen example drawn from the same distribution as the training set is classified correctly. It says nothing about which examples are misclassified, so two models with the same accuracy can still give different predictions on unseen data.
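
A minimal sketch of this point, written in Python rather than RapidMiner and using invented labels and predictions (every name and value below is an illustrative assumption, not data from this thread): two prediction vectors reach the same accuracy against the true labels while disagreeing with each other on some examples.

```python
# Toy illustration: equal accuracy does not imply equal predictions.
# All labels and predictions here are invented for the example.

def accuracy(y_true, y_pred):
    """Fraction of examples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def agreement(pred_a, pred_b):
    """Fraction of examples on which two models predict the same class."""
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)

y_true = ["a", "a", "b", "b", "a", "b", "a", "b", "a", "b"]
# Hypothetical model trained on the Gini index attributes: wrong at index 8.
pred_gini = ["a", "a", "b", "b", "a", "b", "a", "b", "b", "b"]
# Hypothetical model trained on the uncertainty attributes: wrong at index 0.
pred_uncertainty = ["b", "a", "b", "b", "a", "b", "a", "b", "a", "b"]

acc_gini = accuracy(y_true, pred_gini)
acc_uncertainty = accuracy(y_true, pred_uncertainty)
agree = agreement(pred_gini, pred_uncertainty)

print(acc_gini, acc_uncertainty, agree)
```

Two models can only disagree on examples where at least one of them is wrong, so on the same data their disagreement cannot exceed the sum of their error rates. For two models that are each 99.45% accurate on data from the training distribution, that is roughly 1.1% of examples at most. Completely different predictions therefore suggest something beyond this effect, for example a mismatch between the training data and the data being scored.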

    To tell whether you did anything wrong, we need more detailed information on what you did.
    For example, the data to which you apply a model must have the same attributes as the training data. You can't apply a model trained on attribute set A to an example set with attribute set B and expect sensible results. From your description, though, we can't see what exactly you have done.
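
As a rough sketch of such a check outside RapidMiner, the Python snippet below compares the attribute names of a training set with those of the data to be scored. The function `compare_attributes` and the attribute names are hypothetical, chosen only for illustration.

```python
# Sketch: verify that the data to be scored has the same attributes
# as the training data. Names below are hypothetical examples.

def compare_attributes(train_attributes, apply_attributes):
    """Return (missing, unexpected, same_order) for two attribute name lists."""
    # Attributes the model was trained on but the new data lacks.
    missing = [a for a in train_attributes if a not in apply_attributes]
    # Attributes in the new data that the model never saw.
    unexpected = [a for a in apply_attributes if a not in train_attributes]
    # True only if both lists contain the same names in the same order.
    same_order = list(train_attributes) == list(apply_attributes)
    return missing, unexpected, same_order

train_attributes = ["att1", "att3", "att7"]    # e.g. attributes kept by one weighting
unknown_attributes = ["att1", "att4", "att7"]  # e.g. attributes kept by the other

missing, unexpected, same_order = compare_attributes(train_attributes, unknown_attributes)
print("missing:", missing)
print("unexpected:", unexpected)
print("same order:", same_order)
```

If either list is non-empty, the model is being applied to an attribute set it was not trained on, which is the situation described above.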

    Best, Marius
  • blueearthblueearth Member Posts: 42 Contributor II
    Following the Apply Model description, I matched the attributes of my unknown data to the attributes of my training data. Everything, such as the count, label, and order of the attributes, was exactly the same as in the training data set. So I made two unknown data sets, one based on the Gini index attributes and the other based on the uncertainty attributes.
    But as I said before, although the accuracy on both training data sets was the same, the predictions were completely different.
    Here is my weighting process:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="1475" width="768">
          <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="../../Data/F C Data"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="value_type" value="numeric"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="5.2.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="45" y="210">
            <list key="columns"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="45" y="345"/>
          <operator activated="true" class="weight_by_gini_index" compatibility="5.2.008" expanded="true" height="76" name="Weight by Gini Index" width="90" x="179" y="210"/>
          <operator activated="true" class="select_by_weights" compatibility="5.2.008" expanded="true" height="94" name="Select by Weights (5)" width="90" x="380" y="210">
            <parameter key="weight" value="0.7"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.2.008" expanded="true" height="60" name="Store (5)" width="90" x="581" y="210">
            <parameter key="repository_entry" value="../../Results/Attribute Weighting/Gini Index"/>
          </operator>
          <operator activated="true" class="weight_by_uncertainty" compatibility="5.2.008" expanded="true" height="76" name="Weight by Uncertainty" width="90" x="171" y="342"/>
          <operator activated="true" class="select_by_weights" compatibility="5.2.008" expanded="true" height="94" name="Select by Weights (7)" width="90" x="380" y="345">
            <parameter key="weight" value="0.7"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.2.008" expanded="true" height="60" name="Store (6)" width="90" x="581" y="345">
            <parameter key="repository_entry" value="../../Results/Attribute Weighting/Uncertainty"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Weight by Gini Index" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Weight by Uncertainty" to_port="example set"/>
          <connect from_op="Weight by Gini Index" from_port="weights" to_op="Select by Weights (5)" to_port="weights"/>
          <connect from_op="Weight by Gini Index" from_port="example set" to_op="Select by Weights (5)" to_port="example set input"/>
          <connect from_op="Select by Weights (5)" from_port="example set output" to_op="Store (5)" to_port="input"/>
          <connect from_op="Store (5)" from_port="through" to_port="result 1"/>
          <connect from_op="Weight by Uncertainty" from_port="weights" to_op="Select by Weights (7)" to_port="weights"/>
          <connect from_op="Weight by Uncertainty" from_port="example set" to_op="Select by Weights (7)" to_port="example set input"/>
          <connect from_op="Select by Weights (7)" from_port="example set output" to_op="Store (6)" to_port="input"/>
          <connect from_op="Store (6)" from_port="through" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="468"/>
          <portSpacing port="sink_result 2" spacing="162"/>
          <portSpacing port="sink_result 3" spacing="126"/>
        </process>
      </operator>
    </process>
  • blueearthblueearth Member Posts: 42 Contributor II
    And this is my training process. The model application process was built exactly according to the RapidMiner samples. What's the problem? How can I get different predictions when I have the same accuracy? And how can it be fixed?
    At the very least, the predictions should not differ so much from each other.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
       <process expanded="true" height="353" width="701">
         <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="gini index" width="90" x="45" y="30">
           <parameter key="repository_entry" value="../../../Results/Attribute Weighting/Gini Index"/>
         </operator>
         <operator activated="true" class="replace_missing_values" compatibility="5.2.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="246" y="30">
           <list key="columns"/>
         </operator>
         <operator activated="true" class="x_validation" compatibility="5.2.008" expanded="true" height="112" name="Neural Net a" width="90" x="581" y="30">
           <parameter key="use_local_random_seed" value="true"/>
           <process expanded="true" height="506" width="399">
             <operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" height="76" name="Neural Net" width="90" x="154" y="30">
               <list key="hidden_layers"/>
             </operator>
             <connect from_port="training" to_op="Neural Net" to_port="training set"/>
             <connect from_op="Neural Net" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true" height="506" width="399">
             <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="226" y="30"/>
             <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
             <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
             <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Uncertainty" width="90" x="45" y="210">
           <parameter key="repository_entry" value="../../../Results/Attribute Weighting/Gini Index"/>
         </operator>
         <operator activated="true" class="replace_missing_values" compatibility="5.2.008" expanded="true" height="94" name="Replace Missing Values (2)" width="90" x="246" y="210">
           <list key="columns"/>
         </operator>
         <operator activated="true" class="x_validation" compatibility="5.2.008" expanded="true" height="112" name="Neural Net a (2)" width="90" x="581" y="210">
           <parameter key="use_local_random_seed" value="true"/>
           <process expanded="true">
             <operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" name="Neural Net (2)">
               <list key="hidden_layers"/>
             </operator>
             <connect from_port="training" to_op="Neural Net (2)" to_port="training set"/>
             <connect from_op="Neural Net (2)" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true">
             <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" name="Apply Model (3)">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" name="Performance (3)"/>
             <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
             <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
             <connect from_op="Performance (3)" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <connect from_op="gini index" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
         <connect from_op="Replace Missing Values" from_port="example set output" to_op="Neural Net a" to_port="training"/>
         <connect from_op="Neural Net a" from_port="averagable 1" to_port="result 1"/>
         <connect from_op="Uncertainty" from_port="output" to_op="Replace Missing Values (2)" to_port="example set input"/>
         <connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Neural Net a (2)" to_port="training"/>
         <connect from_op="Neural Net a (2)" from_port="averagable 1" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
  • blueearthblueearth Member Posts: 42 Contributor II
    Can someone please explain whether anything is wrong, or whether there is something I should know about the process?
  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hey, you have enabled the shuffle option in the Neural Net operators, which implies the use of random numbers. If you use the same local random seed in both Neural Net operators, you should get identical results.
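
The effect of a fixed seed can be sketched in plain Python. This is only a stand-in for the shuffling step and not RapidMiner's implementation; the function name `shuffled_order` and the seed value are arbitrary assumptions.

```python
import random

def shuffled_order(n_examples, seed):
    """Return a shuffled list of example indices using a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator, independent of global random state
    order = list(range(n_examples))
    rng.shuffle(order)
    return order

# Two runs with the same (arbitrary) seed visit the examples in the same order.
run_a = shuffled_order(10, seed=1992)
run_b = shuffled_order(10, seed=1992)
same_order = run_a == run_b

print(same_order)
```

With the same seed, both runs shuffle the examples identically, so a training procedure that depends on that order becomes reproducible from run to run.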

    Best, Marius