Negative weights

tayebasta · December 2017

Hello,

I built a decision tree classification model, and I would like to know how much each attribute contributes to the resulting model. I connected the weights port to the res port and obtained both positive and negative weight values, all in the range [-1, +1]. What do negative weights mean?

Does it mean that what does matter is the absolute value?

pschlunder · December 2017

Hi tayebasta,

In general, an Attribute's weight is the accumulated value it contributed across the model with respect to the chosen criterion (e.g. information gain).
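
As a rough illustration of what such a criterion measures, here is a minimal Python sketch of textbook information gain on a hypothetical four-row toy set. The function names and the data are made up for illustration. This is not RapidMiner's implementation, and nothing in this thread shows how the operator aggregates or normalizes its values. The one thing the sketch does show is that information gain itself cannot be negative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the weighted label entropy after splitting on `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = token occurs in the document, 0 = it does not.
labels = ["M", "M", "F", "F"]
informative = [1, 1, 0, 0]      # separates the two classes perfectly
uninformative = [1, 0, 1, 0]    # tells us nothing about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)

print("gain (informative attribute):  ", gain_informative)
print("gain (uninformative attribute):", gain_uninformative)
```

An attribute that splits the classes cleanly gets the full label entropy as its gain, and one that carries no information gets zero, so a purely gain-based criterion would not by itself explain negative values.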

I'd like to look into that. Could you please share your process if possible, or describe it a bit?

You can just provide the .rmp file or copy the XML code here. To access it, go to "View -> Show Panel -> XML". This opens a new view containing the XML code of your process. BTW: did you know that you can just drag & drop (or Ctrl + V) XML code into your standard process view to copy a process into Studio?

Regards,

Philipp

lionelderkrikor · December 2017

Hi @pschlunder

I'm seeing the same thing (negative weights).

Here is my process (the training set is in the attached file):

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
        <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Predictive_Analytics_and_Data_Mining\Dec 15 2014\09_Text_9.3.2_blog-gender-dataset-removed-missing.xlsx"/>
        <parameter key="imported_cell_range" value="A1:B3232"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="BLOG.true.text.attribute"/>
          <parameter key="1" value="GENDER.true.binominal.label"/>
        </list>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="GENDER.is_not_missing."/>
        </list>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
        <parameter key="prune_method" value="absolute"/>
        <parameter key="prune_below_absolute" value="20"/>
        <parameter key="prune_above_absolute" value="200"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="112" y="136"/>
          <operator activated="true" class="text:filter_stopwords_dictionary" compatibility="7.5.000" expanded="true" height="82" name="Filter Stopwords (Dictionary)" width="90" x="112" y="238">
            <parameter key="file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Predictive_Analytics_and_Data_Mining\Dec 15 2014\09_stop_word.txt"/>
          </operator>
          <operator activated="true" class="text:stem_porter" compatibility="7.5.000" expanded="true" height="68" name="Stem (Porter)" width="90" x="313" y="238"/>
          <operator activated="true" class="text:generate_n_grams_terms" compatibility="7.5.000" expanded="true" height="68" name="Generate n-Grams (Terms)" width="90" x="313" y="136"/>
          <operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="313" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Stopwords (Dictionary)" to_port="document"/>
          <connect from_op="Filter Stopwords (Dictionary)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
          <connect from_op="Stem (Porter)" from_port="document" to_op="Generate n-Grams (Terms)" to_port="document"/>
          <connect from_op="Generate n-Grams (Terms)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
          <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="246" y="187"/>
      <operator activated="true" class="weight_by_svm" compatibility="8.0.001" expanded="true" height="82" name="Weight by SVM" width="90" x="380" y="289"/>
      <operator activated="true" class="weight_by_information_gain" compatibility="8.0.001" expanded="true" height="82" name="Weight by Information Gain" width="90" x="380" y="187"/>
      <operator activated="true" class="select_by_weights" compatibility="8.0.001" expanded="true" height="103" name="Select by Weights" width="90" x="514" y="187">
        <parameter key="weight_relation" value="top k"/>
        <parameter key="k" value="20"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.0.001" expanded="true" height="145" name="Cross Validation" width="90" x="648" y="136">
        <process expanded="true">
          <operator activated="true" class="support_vector_machine" compatibility="8.0.001" expanded="true" height="124" name="SVM" width="90" x="179" y="34"/>
          <connect from_port="training set" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="8.0.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Multiply" to_port="input"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Weight by Information Gain" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Weight by SVM" to_port="example set"/>
      <connect from_op="Weight by SVM" from_port="weights" to_port="result 3"/>
      <connect from_op="Weight by Information Gain" from_port="weights" to_op="Select by Weights" to_port="weights"/>
      <connect from_op="Weight by Information Gain" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
      <connect from_op="Select by Weights" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Select by Weights" from_port="weights" to_port="result 1"/>
      <connect from_op="Cross Validation" from_port="model" to_port="result 4"/>
      <connect from_op="Cross Validation" from_port="example set" to_port="result 5"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 6"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
      <portSpacing port="sink_result 6" spacing="0"/>
      <portSpacing port="sink_result 7" spacing="0"/>
    </process>
  </operator>
</process>

Regards,

Lionel

tayebasta · December 2017

Hello,

I read the Weight by Correlation entry in the RapidMiner documentation:

"A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case, large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case, large values of X tend to be associated with small values of Y and vice versa." Does this apply to attribute weights in modeling (classification)?

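To make the sign of a correlation concrete, here is a small Python sketch of the Pearson coefficient between an attribute and a 0/1-encoded label. The data and the helper name `pearson` are hypothetical, and this is only the textbook formula, not a reproduction of what the Weight by Correlation operator does internally.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: the label is encoded as 1 (positive class) or 0.
label = [1, 1, 1, 0, 0, 0]
attr_up = [5, 4, 6, 1, 0, 2]     # large values go with label = 1
attr_down = [0, 1, 0, 4, 5, 3]   # large values go with label = 0

corr_up = pearson(attr_up, label)
corr_down = pearson(attr_down, label)

print("attr_up   vs label:", corr_up)    # positive association
print("attr_down vs label:", corr_down)  # negative (inverse) association
```

Both attributes are useful for predicting the label. The sign only says in which direction the association points, which is why comparing absolute values is the sensible way to rank relevance for this kind of weight.
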
pschlunder · December 2017

Hi @lionelderkrikor,

Could you please provide the data set? I can't reproduce the problem without the data you're using.

Thanks

@tayebasta Regarding the association with correlations: if we considered the negative weights to be reasonable, it would imply that a split using this Attribute would worsen the decision. But I'd really like to see your process to investigate further. Are you providing a column with the role weight that contains negative weights?

Regards,

Philipp

tayebasta · December 2017

Dear Mr. Philipp,

Please find attached a copy of the process .rmp file.

Regards,

tayebasta · December 2017

Dear Mr. Philipp,

I exported the model from my home laptop to my office computer and ran it on the same data. On both computers, I am using RapidMiner version 8. The only difference is that the PC at home is 64-bit and the one in the office is 32-bit.

All the weights I got this time are positive. They are attached to this message.

I'll go home and compare and see what's up.

Regards,

Basta

lionelderkrikor · December 2017

Hi @pschlunder

Thank you for your feedback.

Indeed, I forgot to attach my input data set.

You can find it in the attached file this time.

Best regards,

Lionel

pschlunder · December 2017

Hi @lionelderkrikor,

Sorry, I still can't reproduce your problem, since your process doesn't define a label but uses label-requiring Operators like 'Weight by Information Gain'.

@tayebasta looking forward to your findings.

Regards,

Philipp

lionelderkrikor · December 2017

Hi @pschlunder

The process works fine on my computer, but I did not mention earlier that my label is set in the "data set meta data information" parameter of the Read Excel operator (GENDER is given the label role there).

I hope the process will run with this information.

Thank you,

regards,

Lionel

pschlunder · December 2017

Sorry, my bad :smileylol:

I just loaded the data straight into Studio and replaced your Read Excel Operator with Retrieve >.<

@lionelderkrikor your process uses an SVM applied to a binominal classification problem. Internally, one label value is treated as true and the other as false. Positive weights therefore indicate relevance for the true label value (in your case Gender = M), which is why the highest positive weight belongs to the word 'husband'. The most negative weight occurs for the token 'wife'. That value is a strong sign that an example does not have the true label value, i.e. Gender = F. In that sense, the weight can be read as something similar to a correlation, as @tayebasta suggested.
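
The sign behaviour described above can be sketched with a tiny linear SVM in plain Python. This is an illustration under stated assumptions, not the Weight by SVM operator: the trainer is a simple subgradient descent on the hinge loss, and the token counts, the function name `train_linear_svm` and its parameters are all hypothetical. The "true" label value is encoded as +1 and the other as -1.

```python
def train_linear_svm(rows, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit w, b for a linear SVM by subgradient descent on the hinge loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization: shrink the weights a little on every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Example is inside the margin or misclassified:
                # move the weights towards its class.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


# Hypothetical token counts per document; columns follow `tokens`.
tokens = ["husband", "wife"]
rows = [[2, 0], [3, 1], [1, 0],    # documents with the "true" label value
        [0, 2], [1, 3], [0, 1]]    # documents with the other label value
labels = [1, 1, 1, -1, -1, -1]     # +1 = "true" label value, -1 = the other

weights, bias = train_linear_svm(rows, labels)

for token, weight in zip(tokens, weights):
    side = "true label value" if weight > 0 else "other label value"
    print(f"{token}: {weight:+.3f} (pushes towards the {side})")
```

Swapping which label value is encoded as +1 flips every sign but leaves the magnitudes unchanged, so a negative weight marks the direction of the evidence and says nothing about the attribute being unimportant.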

Regards,

Philipp

lionelderkrikor · December 2017

Hi,

Thank you, @tayebasta and @pschlunder, for your explanations. It's much clearer to me now.

Best regards,

Lionel
