"Is assigning custom weights to attributes and using those custom weights in algorithms possible?"

max_groh95 Member Posts: 2 Contributor I
edited June 2019 in Help

I want to assign a custom weight to a binominal attribute so that this attribute's impact on my prediction is much higher. I already know how to set custom weights with "Weight by User Specification", but I do not know how to feed those weights into predictions/algorithms. Is there a way to do this in RapidMiner?

Kind Regards

Max


Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Of course, although not all machine learning algorithms can use weights. If you select any learner operator and then press F1, or right-click and view the operator information, you will get a summary that tells you whether that learner accepts weighted examples. See the attached screenshot for neural nets, for example.

     

    As long as the selected operator can use weights, you don't have to do anything special. Just assign your weights in an earlier step of your process; when the operator reads your ExampleSet, it will use the weights automatically. There is also a web app that tells you which learners accept weights in RapidMiner (check under the advanced options at the bottom left): http://mod.rapidminer.com/#app

     

    [Attachment: operator information.PNG]

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi Brian,

     

    I think you're mixing up weighting examples and attributes ;-)

     

    Weighting examples works as you describe. But I think the original poster asks about weighting attributes.

     

    Machine learning algorithms are all about determining the weights of attributes themselves. For example, Decision Tree selects attributes based on how much they contribute to pure class partitions, and Linear Regression assigns its own weights (coefficients) to numerical attributes. I don't think you can create a better model by manipulating the weight of an attribute yourself.

     

    It might work for k-NN. Normalize the other attributes first, and then try different numeric values (e.g. 0 and 3, 0 and 4, 0 and 5) for the binominal attribute. Because k-NN is distance-based, the wider value range gives that attribute a higher weight than the normalized attributes have.
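The k-NN idea can be tried outside RapidMiner with a minimal Python sketch. Everything here is an assumption made for illustration: the toy rows, the attribute names (age, income, flag), and the 1-nearest-neighbour rule with Euclidean distance. It is not how the RapidMiner k-NN operator is implemented; it only shows why stretching the 0/1 encoding changes which neighbour is closest.

```python
import math

# Hypothetical toy data: (age, income, flag, label). The flag is the binominal
# attribute we want to emphasise, encoded as 0/1.
TRAIN = [
    (20, 20000, 1, "yes"),
    (24, 28000, 1, "yes"),
    (56, 92000, 0, "no"),
    (60, 100000, 0, "no"),
]

def fit_min_max(values):
    """Return a function that maps values to [0, 1] using the training range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return lambda v: (v - lo) / span

scale_age = fit_min_max([row[0] for row in TRAIN])
scale_income = fit_min_max([row[1] for row in TRAIN])

def to_vector(age, income, flag, flag_scale):
    """Normalize the numeric attributes, then stretch the 0/1 flag."""
    return (scale_age(age), scale_income(income), flag * flag_scale)

def predict_1nn(age, income, flag, flag_scale):
    """Label of the nearest training row under Euclidean distance."""
    query = to_vector(age, income, flag, flag_scale)

    def distance(row):
        vec = to_vector(row[0], row[1], row[2], flag_scale)
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, query)))

    return min(TRAIN, key=distance)[3]

# Same query, two encodings of the flag: 0/1 versus 0/3.
print(predict_1nn(56, 92000, 1, flag_scale=1))  # prints "no"
print(predict_1nn(56, 92000, 1, flag_scale=3))  # prints "yes"
```

With the 0/1 encoding the query is matched to a row that has similar age and income but a different flag. With 0/3 a mismatch in the flag costs more distance than the other two attributes can offset, so the nearest neighbour shares the flag value.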

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Balazs, thanks for the clarification!  Indeed you are correct, I had read the OP too quickly and my answer relates to example weights.  As you say, I typically let the algorithm decide the attribute weight on its own, since that's what it is designed to do.  Sorry @max_groh95 for any confusion.

     

    In the meantime, another way to implicitly overweight the binominal attribute of interest is simply to develop separate models/scorecards for your two populations (one where the binominal attribute = yes and one where it = no). This is an old trick from scorecard development in financial services. You are essentially conditioning the subsequent models on a particular value of your initial split. Unfortunately, that can't be expressed as a numerical weight relative to the other attributes in the model, but as a practical matter it imposes a very high importance on the splitting attribute, since everything else is effectively derived as an interaction with it.

     

    I hope this is helpful.

     

     

  • max_groh95 Member Posts: 2 Contributor I

    Thanks for your replies!

    Unfortunately I am still trying to figure out how to connect these "assigned weights" to my algorithms. I already connected them through the example set port, but I do not get any different results.

    [Screenshot: 2016-12-07 08_59_58-__Local Repository_processes_testtestttest – RapidMiner Studio Free 7.2.002 @ MD.png]

    I used different operators for the weights, but it doesn't change the results. Nested in the "Validations" are simple Naive Bayes learners.

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi,

     

    As I described earlier, you're doing a very non-standard thing. It won't work automatically with Naive Bayes or any other learner.

     

    For Brian's solution, you would use "Filter Examples" with your attribute and use the example set and the unmatched outputs for building different models (or doing different validations). You would then apply the right model according to the attribute value later.
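The split-and-route idea can be sketched in plain Python to make the mechanics concrete. This is only an illustration under stated assumptions: `train_nearest_mean` is a hypothetical stand-in learner, the rows are invented, and the grouping step merely plays the role that "Filter Examples" plays in a RapidMiner process.

```python
from collections import defaultdict

def train_nearest_mean(rows):
    """Tiny stand-in learner: predict the class whose mean value is closest."""
    sums = defaultdict(lambda: [0.0, 0])
    for value, label in rows:
        sums[label][0] += value
        sums[label][1] += 1
    means = {label: total / count for label, (total, count) in sums.items()}
    return lambda value: min(means, key=lambda label: abs(means[label] - value))

def train_segmented(rows):
    """Train one model per value of the binary flag."""
    segments = defaultdict(list)
    for flag, value, label in rows:
        segments[flag].append((value, label))
    return {flag: train_nearest_mean(seg) for flag, seg in segments.items()}

def predict_segmented(models, flag, value):
    """Route the example to the model trained on its own segment."""
    return models[flag](value)

# Hypothetical data: (flag, score, label). The same score means different
# things in the two segments.
ROWS = [
    ("yes", 10, "good"), ("yes", 20, "good"), ("yes", 40, "bad"), ("yes", 50, "bad"),
    ("no", 10, "bad"), ("no", 20, "bad"), ("no", 40, "good"), ("no", 50, "good"),
]

models = train_segmented(ROWS)
print(predict_segmented(models, "yes", 15))  # prints "good"
print(predict_segmented(models, "no", 15))   # prints "bad"
```

The flag never appears as an input to either model, yet it decides which model is used, which is why it ends up dominating the prediction.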

     

    For the k-NN approach, you would normalize the data first, then assign the numeric values with the higher weight to your binary attribute after that, and then finally learn the model. It's a good idea to use Group Models inside the cross validation for this.

     

    Regards,

     

    Balázs

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    I think the operator you are looking for here is "Select by Weights". It removes attributes from your dataset that do not meet a threshold criterion.

    Then provide this smaller dataset to the algorithm. 

     

    Here is an example using the Titanic dataset.

     

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="136">
    <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
    </operator>
    <operator activated="true" breakpoints="after" class="weight_by_correlation" compatibility="7.3.000" expanded="true" height="82" name="Weight by Correlation" width="90" x="179" y="34"/>
    <operator activated="true" class="select_by_weights" compatibility="7.3.000" expanded="true" height="103" name="Select by Weights" width="90" x="380" y="30">
    <parameter key="weight_relation" value="greater"/>
    <parameter key="weight" value="0.3"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="7.3.000" expanded="true" height="145" name="Cross Validation (2)" width="90" x="581" y="289">
    <process expanded="true">
    <operator activated="true" class="naive_bayes" compatibility="7.3.000" expanded="true" height="82" name="Naive Bayes (2)" width="90" x="112" y="34"/>
    <connect from_port="training set" to_op="Naive Bayes (2)" to_port="training set"/>
    <connect from_op="Naive Bayes (2)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.3.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="7.3.000" expanded="true" height="82" name="Performance UnWeighted" width="90" x="246" y="34"/>
    <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance UnWeighted" to_port="labelled data"/>
    <connect from_op="Performance UnWeighted" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="7.3.000" expanded="true" height="145" name="Cross Validation" width="90" x="581" y="34">
    <process expanded="true">
    <operator activated="true" class="naive_bayes" compatibility="7.3.000" expanded="true" height="82" name="Naive Bayes" width="90" x="112" y="34"/>
    <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
    <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.3.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="7.3.000" expanded="true" height="82" name="Performance Weighted" width="90" x="246" y="34"/>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance Weighted" to_port="labelled data"/>
    <connect from_op="Performance Weighted" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Weight by Correlation" to_port="example set"/>
    <connect from_op="Weight by Correlation" from_port="weights" to_op="Select by Weights" to_port="weights"/>
    <connect from_op="Weight by Correlation" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
    <connect from_op="Select by Weights" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Select by Weights" from_port="original" to_op="Cross Validation (2)" to_port="example set"/>
    <connect from_op="Select by Weights" from_port="weights" to_port="result 3"/>
    <connect from_op="Cross Validation (2)" from_port="model" to_port="result 5"/>
    <connect from_op="Cross Validation (2)" from_port="performance 1" to_port="result 2"/>
    <connect from_op="Cross Validation" from_port="model" to_port="result 4"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="18"/>
    <portSpacing port="sink_result 3" spacing="336"/>
    <portSpacing port="sink_result 4" spacing="126"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    <portSpacing port="sink_result 6" spacing="0"/>
    </process>
    </operator>
    </process>
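
For readers without RapidMiner at hand, the weight-then-select step of the process above can be approximated in Python. This is a rough analogue under assumptions, not the operators' exact computation: the weight is taken as the absolute Pearson correlation with a 0/1 label, the column names and values are invented, and only the "greater than 0.3" rule is borrowed from the process parameters.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sd_x * sd_y)

def weight_by_correlation(columns, label):
    """Attribute weight = absolute correlation with the label."""
    return {name: abs(pearson(values, label)) for name, values in columns.items()}

def select_by_weights(columns, weights, threshold):
    """Keep only attributes whose weight is greater than the threshold."""
    return {name: vals for name, vals in columns.items() if weights[name] > threshold}

# Hypothetical data: one informative attribute and one uninformative one.
label = [1, 1, 0, 0, 1, 1, 0, 0]
columns = {
    "strong": [5, 6, 1, 2, 7, 6, 2, 1],
    "noise": [1, 2, 1, 2, 2, 1, 2, 1],
}

weights = weight_by_correlation(columns, label)
selected = select_by_weights(columns, weights, threshold=0.3)
print(sorted(selected))  # prints ['strong']
```

Note that this is attribute selection: a surviving attribute is passed to the learner unchanged, and a dropped one is removed entirely. It does not make a kept attribute count more than the other kept attributes.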

     
