How to manually give attribute weight

can_yucebas Member Posts: 7 Contributor II
edited August 2019 in Help
Hi to all,

I'm using a categorical dataset with 20 attributes and have analyzed it successfully with an ID3 decision tree. As a next step, I'd like to assign weights to the attributes so that the higher-weighted attributes appear in the first levels of the tree (nearest to the root). Can you help me with this? My biggest problem is that I could not find an operator that lets me assign attribute weights manually.

Answers

  • Skirzynski Member Posts: 164 Maven
    Have you tried the "Weight by User Specification" operator? Here is an example (the weighting operator is nested inside the "Decision Tree (Weight-Based)" operator):

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="431" width="547">
          <operator activated="true" class="generate_nominal_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30"/>
          <operator activated="true" class="decision_tree_weight_based" compatibility="5.3.000" expanded="true" height="60" name="Decision Tree (Weight-Based)" width="90" x="246" y="30">
            <process expanded="true" height="538" width="765">
              <operator activated="true" class="weight_by_user_specification" compatibility="5.3.000" expanded="true" height="76" name="Weight by User Specification" width="90" x="313" y="30">
                <list key="name_regex_to_weights">
                  <parameter key="att1" value="2.0"/>
                  <parameter key="att2" value="1.5"/>
                  <parameter key="att3" value="0.5"/>
                  <parameter key="att4" value="0.1"/>
                  <parameter key="att5" value="1.9"/>
                </list>
              </operator>
              <connect from_port="training set" to_op="Weight by User Specification" to_port="example set"/>
              <connect from_op="Weight by User Specification" from_port="weights" to_port="weights"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_weights" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Decision Tree (Weight-Based)" to_port="training set"/>
          <connect from_op="Decision Tree (Weight-Based)" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
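To make the idea behind the process above concrete, here is a minimal Python sketch. It assumes that a weight-based tree splits each node on the unused attribute with the highest weight; that is an assumption about the operator's behavior, not RapidMiner's actual implementation. The function `build_tree` and the sample rows are hypothetical; only the two weights are taken from the example.

```python
from collections import Counter

def build_tree(rows, label, weights, attributes=None):
    """Assumed behavior: split every node on the highest-weighted unused attribute."""
    if attributes is None:
        attributes = list(weights)
    labels = [row[label] for row in rows]
    # Leaf: the node is pure or nothing is left to split on, so return the majority label.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: weights[a])
    remaining = [a for a in attributes if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        children[value] = build_tree(subset, label, weights, remaining)
    return {best: children}

# Hypothetical nominal data; the weights for att1 and att5 come from the process above.
rows = [
    {"att1": "a", "att5": "x", "label": "yes"},
    {"att1": "a", "att5": "y", "label": "no"},
    {"att1": "b", "att5": "x", "label": "no"},
    {"att1": "b", "att5": "y", "label": "no"},
]
weights = {"att1": 2.0, "att5": 1.9}
tree = build_tree(rows, "label", weights)
print(tree)
```

Under that assumption, att1 (weight 2.0) ends up at the root and att5 (weight 1.9) one level below, which is the ordering the original question asks for.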
  • can_yucebas Member Posts: 7 Contributor II
    OK, thanks for the advice. I'm trying it on my model and will share the results as soon as I've finished.
  • dramhampton Member Posts: 9 Contributor II
    I have a similar request. In my case, I am looking to create a sentiment analysis model for user comments, but I do not have enough training data for that, so I am using a list of words with sentiment weights from -5 to +5 (the AFINN database). I have a document-term matrix that lists the terms occurring in each user comment; in my case the values are term frequencies.

    I wish to multiply these frequencies by the weight given to each term. For example, if the document reads 'This was dire, a complete failure', the word 'dire' appears once and 'failure' appears once. 'Dire' scores -3 and 'failure' scores -2, so the score for that comment would be (1 * -3) + (1 * -2) = -5.

    I can't see how to do that - the Weight by User Specification operator requires each word to be entered separately and there are about 2500 words in my AFINN database so that's no good...

    Many thanks

    David Hampton
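The calculation described in this post can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under stated assumptions: the lexicon is assumed to be available as tab-separated `term<TAB>score` lines, the function names `parse_lexicon`, `term_frequencies`, and `sentiment_score` are hypothetical, and the two sample scores are the ones quoted in the post.

```python
import re
from collections import Counter

def parse_lexicon(lines):
    """Build a term -> weight dict from lines assumed to look like 'term<TAB>score'."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        term, weight = line.rsplit("\t", 1)
        lexicon[term] = int(weight)
    return lexicon

def term_frequencies(text):
    """Lower-case the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def sentiment_score(text, lexicon):
    """Sum of (term frequency * lexicon weight); terms missing from the lexicon count as 0."""
    counts = term_frequencies(text)
    return sum(freq * lexicon.get(term, 0) for term, freq in counts.items())

# Two-line stand-in for the full word list, using the scores quoted in the post.
lexicon = parse_lexicon(["dire\t-3", "failure\t-2"])
score = sentiment_score("This was dire, a complete failure", lexicon)
print(score)  # -5
```

Reading the whole word list from a file this way avoids entering roughly 2500 terms one by one; the per-comment score is then a plain weighted sum over the term frequencies.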
  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    @dramhampton,

    You may want to take a look at Word2Vec. It does exactly that.

    All the best,

    Rodrigo.