"Weight by user spec/ scale by weights"

schills Member Posts: 16 Contributor II
edited May 2019 in Help
Hello guys

I have been using "weight by user specification" and "scale by weights" to transform my data by weighting the attributes before they enter the learner model (default model, SVM or neural net).
The data seems to be transformed in the correct way by being multiplied by the specified weights.

However, the problem comes with the prediction: the prediction of the label attribute (and the associated probability) is exactly the same using the original, non-weighted data as it is using the new weighted data.
How can this be possible? How can the prediction be exactly the same for two different sets of data? Surely weighting the attributes in different ways will affect the data and give a different prediction from the case where all the attributes are weighted the same?
I have tried this approach of using weighted and unweighted attributes on the same data with many different learner models (e.g. SVM, NN, default model) and it doesn't seem to make a difference.
Maybe I am not understanding the models correctly, but as I see it, transforming the input data via weights should change the prediction.

Please see my process below; I have used the "gold data" from Neural Market Trends to try to predict the "up or down trend".

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.002" expanded="true" name="Process">
    <process expanded="true" height="296" width="882">
      <operator activated="true" class="retrieve" compatibility="5.1.002" expanded="true" height="60" name="Retrieve" width="90" x="26" y="60">
        <parameter key="repository_entry" value="../Gold/label"/>
      </operator>
      <operator activated="true" class="weight_by_user_specification" compatibility="5.1.002" expanded="true" height="76" name="Weight by User Specification (2)" width="90" x="159" y="46">
        <parameter key="normalize_weights" value="false"/>
        <list key="name_regex_to_weights">
          <parameter key="^GDAXI" value="100.0"/>
          <parameter key="CD" value="100.0"/>
        </list>
      </operator>
      <operator activated="true" class="scale_by_weights" compatibility="5.1.002" expanded="true" height="76" name="Scale by Weights" width="90" x="296" y="51"/>
      <operator activated="true" class="default_model" compatibility="5.1.002" expanded="true" height="76" name="Default Model" width="90" x="514" y="75"/>
      <operator activated="true" class="retrieve" compatibility="5.1.002" expanded="true" height="60" name="Retrieve (2)" width="90" x="51" y="187">
        <parameter key="repository_entry" value="../Gold/no label"/>
      </operator>
      <operator activated="true" class="weight_by_user_specification" compatibility="5.1.002" expanded="true" height="76" name="Weight by User Specification" width="90" x="246" y="165">
        <parameter key="normalize_weights" value="false"/>
        <list key="name_regex_to_weights">
          <parameter key="^GDAXI" value="100.0"/>
          <parameter key="CD" value="100.0"/>
        </list>
      </operator>
      <operator activated="true" class="scale_by_weights" compatibility="5.1.002" expanded="true" height="76" name="Scale by Weights (2)" width="90" x="380" y="165"/>
      <operator activated="true" class="apply_model" compatibility="5.1.002" expanded="true" height="76" name="Apply Model" width="90" x="779" y="155">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Weight by User Specification (2)" to_port="example set"/>
      <connect from_op="Weight by User Specification (2)" from_port="weights" to_op="Scale by Weights" to_port="weights"/>
      <connect from_op="Weight by User Specification (2)" from_port="example set" to_op="Scale by Weights" to_port="example set"/>
      <connect from_op="Scale by Weights" from_port="example set" to_op="Default Model" to_port="training set"/>
      <connect from_op="Default Model" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Weight by User Specification" to_port="example set"/>
      <connect from_op="Weight by User Specification" from_port="weights" to_op="Scale by Weights (2)" to_port="weights"/>
      <connect from_op="Weight by User Specification" from_port="example set" to_op="Scale by Weights (2)" to_port="example set"/>
      <connect from_op="Scale by Weights (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Sorry for such a long post, but I have been at this for hours and would really appreciate some help.

Cheers
Schills

Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    I can see two problems.

    Firstly, the "weight by user specification" operator fails to create a weight for the attribute ^GDAXI because the ^ is interpreted as regular-expression syntax (the start-of-string anchor) rather than as a literal character, so the pattern fails to match the attribute. My workaround is to rename the attributes.
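
The caret behaviour described above can be reproduced outside RapidMiner. The following is a minimal sketch in Python rather than in RapidMiner itself, assuming that Python's `re` module treats the `^` anchor the same way as the Java regular expressions RapidMiner uses; the attribute names are made up for illustration.

```python
import re

# Hypothetical attribute names; the leading caret is part of the first name.
attribute_names = ["^GDAXI", "CD", "Gold"]


def matching_names(pattern, names):
    """Return the names in which the regular expression finds a match."""
    return [name for name in names if re.search(pattern, name)]


# Unescaped: '^' is the start-of-string anchor, so the pattern requires the
# name to begin with 'G' and can never match the literal caret.
unescaped = matching_names(r"^GDAXI", attribute_names)

# Escaped: '\^' matches a literal caret character.
escaped = matching_names(r"\^GDAXI", attribute_names)

print(unescaped)  # []
print(escaped)    # ['^GDAXI']
```

Escaping the caret is an alternative to renaming the attributes; whether the operator accepts the escaped form in exactly this way has not been checked here.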

    Secondly, the default model operator is set to "median", which means it always chooses the median of the class labels, which looks like it is "DOWN". Hence applying the model to unlabelled data will always yield "DOWN" regardless of the attributes. If you change this to something like a neural network and add an XValidation block to output a performance, you can get some sort of view of how well the model would behave on unseen data.
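
To see why a constant model cannot react to attribute weights, here is a small Python sketch. It is not the RapidMiner operator: it substitutes the most frequent training label for the operator's "median" setting, and the labels and rows are invented for illustration.

```python
from collections import Counter


def train_default_model(labels):
    """Learn a constant prediction: the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def apply_default_model(constant_label, rows):
    """Predict the same label for every row; attribute values are never read."""
    return [constant_label for _ in rows]


# Made-up training labels and unlabelled rows.
training_labels = ["DOWN", "DOWN", "UP"]
unweighted_rows = [[1.0, 2.0], [3.0, 4.0]]
weighted_rows = [[100.0 * value for value in row] for row in unweighted_rows]

model = train_default_model(training_labels)
unweighted_predictions = apply_default_model(model, unweighted_rows)
weighted_predictions = apply_default_model(model, weighted_rows)

print(unweighted_predictions == weighted_predictions)  # True
```

Because the attribute values never enter the prediction, no weighting scheme can change the output of such a baseline.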

    regards

    Andrew
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.002" expanded="true" name="Process">
        <process expanded="true" height="647" width="1016">
          <operator activated="true" class="retrieve" compatibility="5.1.002" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//DataMining/DataSets/GoldLabelled"/>
          </operator>
          <operator activated="true" class="guess_types" compatibility="5.1.002" expanded="true" height="76" name="Guess Types" width="90" x="179" y="30"/>
          <operator activated="true" class="rename_by_replacing" compatibility="5.1.002" expanded="true" height="76" name="Rename by Replacing" width="90" x="380" y="30">
            <parameter key="replace_what" value="\^"/>
          </operator>
          <operator activated="true" class="weight_by_user_specification" compatibility="5.1.002" expanded="true" height="76" name="Weight by User Specification (2)" width="90" x="514" y="30">
            <parameter key="normalize_weights" value="false"/>
            <list key="name_regex_to_weights">
              <parameter key="GDAXI" value="100.0"/>
              <parameter key="CD" value="100.0"/>
            </list>
          </operator>
          <operator activated="true" class="scale_by_weights" compatibility="5.1.002" expanded="true" height="76" name="Scale by Weights" width="90" x="648" y="30"/>
          <operator activated="true" class="retrieve" compatibility="5.1.002" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="165">
            <parameter key="repository_entry" value="//DataMining/DataSets/GoldUnlabelled"/>
          </operator>
          <operator activated="true" class="guess_types" compatibility="5.1.002" expanded="true" height="76" name="Guess Types (2)" width="90" x="179" y="165"/>
          <operator activated="true" class="rename_by_replacing" compatibility="5.1.002" expanded="true" height="76" name="Rename by Replacing (2)" width="90" x="380" y="165">
            <parameter key="replace_what" value="\^"/>
          </operator>
          <operator activated="true" class="weight_by_user_specification" compatibility="5.1.002" expanded="true" height="76" name="Weight by User Specification" width="90" x="514" y="165">
            <parameter key="normalize_weights" value="false"/>
            <list key="name_regex_to_weights">
              <parameter key="GDAXI" value="100.0"/>
              <parameter key="CD" value="100.0"/>
            </list>
          </operator>
          <operator activated="true" class="scale_by_weights" compatibility="5.1.002" expanded="true" height="76" name="Scale by Weights (2)" width="90" x="648" y="165"/>
          <operator activated="true" class="x_validation" compatibility="5.0.000" expanded="true" height="112" name="Validation" width="90" x="782" y="30">
            <description>A cross-validation evaluating a neural net model.</description>
            <process expanded="true" height="654" width="466">
              <operator activated="true" class="neural_net" compatibility="5.1.002" expanded="true" height="76" name="Neural Net" width="90" x="213" y="30">
                <list key="hidden_layers"/>
              </operator>
              <connect from_port="training" to_op="Neural Net" to_port="training set"/>
              <connect from_op="Neural Net" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="654" width="466">
              <operator activated="true" class="apply_model" compatibility="5.0.000" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.0.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.002" expanded="true" height="76" name="Apply Model" width="90" x="916" y="165">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Guess Types" to_port="example set input"/>
          <connect from_op="Guess Types" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Weight by User Specification (2)" to_port="example set"/>
          <connect from_op="Weight by User Specification (2)" from_port="weights" to_op="Scale by Weights" to_port="weights"/>
          <connect from_op="Weight by User Specification (2)" from_port="example set" to_op="Scale by Weights" to_port="example set"/>
          <connect from_op="Scale by Weights" from_port="example set" to_op="Validation" to_port="training"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Guess Types (2)" to_port="example set input"/>
          <connect from_op="Guess Types (2)" from_port="example set output" to_op="Rename by Replacing (2)" to_port="example set input"/>
          <connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Weight by User Specification" to_port="example set"/>
          <connect from_op="Weight by User Specification" from_port="weights" to_op="Scale by Weights (2)" to_port="weights"/>
          <connect from_op="Weight by User Specification" from_port="example set" to_op="Scale by Weights (2)" to_port="example set"/>
          <connect from_op="Scale by Weights (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Validation" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • schills Member Posts: 16 Contributor II
    Hi Andrew

    Thanks so much for taking the time to reply to my post! It was very informative and solved most of my problems!

    I have one further question regarding the Support Vector Machine (SVM).
    I have used your method (code), but replaced the Neural Net with a SVM as I believe it is more accurate for what I am trying to model (eg FOREX). However, using the SVM gives me the same problem I mentioned before: the predictions and probabilities are exactly the same with weights as without. Do you know why this is, or how to fix it?
    Is it because the SVM applies its own weights, so that any weights you apply to the data beforehand are irrelevant?
    Is there a way to use your own weights with a SVM or, if this is not possible, is there a model similar to a SVM that you can apply weights to?

    Your method works fine using Neural Net and X-Validation, but does not work when the modelling operator is replaced with a SVM.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.002" expanded="true" name="Process">
        <process expanded="true" height="296" width="882">
          <operator activated="true" class="retrieve" compatibility="5.1.002" expanded="true" height="60" name="Retrieve" width="90" x="26" y="60">
            <parameter key="repository_entry" value="../Gold/label"/>
          </operator>
          <operator activated="true" class="support_vector_machine" compatibility="5.1.002" expanded="true" height="112" name="SVM" width="90" x="581" y="75"/>
          <operator activated="true" class="retrieve" compatibility="5.1.002" expanded="true" height="60" name="Retrieve (2)" width="90" x="51" y="187">
            <parameter key="repository_entry" value="../Gold/no label"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.002" expanded="true" height="76" name="Apply Model" width="90" x="779" y="155">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Compare the output prediction generated with the above code (unweighted) vs your code (but changing NN to SVM).

    Any extra info on this issue would be much appreciated.

    Cheers
    Schills
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello,

    In short, I don't know the details without spending more time, but normalising is important for SVMs. I can't be sure, but I think all the work to scale the attributes would be undone by an automatic normalising step that might happen inside the operator; see the following link.
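
The effect suspected above can be illustrated with a short Python sketch. It assumes, without having confirmed it for the RapidMiner SVM operator, that each attribute is z-score normalised before training; the attribute values are made up. Multiplying an attribute by a constant weight scales its mean and standard deviation by the same factor, so the normalised values come out the same as before.

```python
import math
import statistics


def z_score(values):
    """Normalise values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return [(value - mean) / spread for value in values]


# Made-up values for a single attribute, and the same attribute weighted by 100.
original = [1.5, 2.0, 4.5, 3.0]
weighted = [100.0 * value for value in original]

# Compare with a tolerance because floating-point results may differ slightly.
same_after_normalising = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(z_score(original), z_score(weighted))
)

print(same_after_normalising)  # True
```

If the operator does normalise in this way, a learner trained on the weighted data sees the same inputs as one trained on the unweighted data, which would explain identical predictions.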

    http://rapid-i.com/rapidforum/index.php/topic,83.0.html

    Generally, the literature often states that a benefit of SVMs is that they cope well with high-dimensional data, so the number of attributes doesn't really matter; the SVM should ignore the irrelevant ones.

    SVMs require care to get the parameters right; the link above itself has a link to a guide written by the authors of the libsvm code. There is no SilVer Magic bullet, I'm afraid.

    regards,

    Andrew
  • haddock Member Posts: 849 Maven
    Hi Schills,
    "but replaced the Neural Net with a SVM as I believe it is more accurate for what I am trying to model (eg FOREX)"

    I'm interested in Forex so could you expand on why you believe that?

    Many thanks.
  • schills Member Posts: 16 Contributor II
    Andrew
    Thanks again for all the info. I now understand SVMs a lot better!

    Haddock
    I think SVMs are best for Forex and other financial predictions because they work well with a large number of attributes. An SVM is a great starting point and gives you a really good overview. When I compared the predictions to historical values, the SVM appeared to be the most accurate.