
How to add weights to k-NN [SOLVED]

michaelhecht Member Posts: 89 Maven
Hello,

While sitting in a Rapid-I course, I asked how to add weights to the attributes for a k-NN operator. No one could answer this satisfactorily.

So how do I weight attributes (e.g. numeric ones) to get a weighted distance for the k-NN operator?
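For reference, a weighted k-NN usually replaces the plain Euclidean distance with a weighted one, d_w(x, y) = sqrt(sum_i w_i * (x_i - y_i)^2), where w_i is the weight of attribute i. The following Python sketch is not RapidMiner code; the functions `weighted_euclidean` and `knn_predict` and the toy data are hypothetical. It shows how the weights change which neighbour is found:

```python
import math
from collections import Counter

def weighted_euclidean(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)**2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

def knn_predict(train_X, train_y, query, w, k):
    """Majority vote among the k training examples closest to `query`."""
    # Rank training examples by their weighted distance to the query point
    order = sorted(range(len(train_X)),
                   key=lambda i: weighted_euclidean(train_X[i], query, w))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: the first attribute separates the classes, the second is noise
train_X = [[0.0, 5.0], [0.2, -5.0], [1.0, 4.0], [0.9, -4.0]]
train_y = ["no", "no", "yes", "yes"]
query = [0.8, 5.0]

print(knn_predict(train_X, train_y, query, w=[1.0, 1.0], k=1))  # prints no (noise dominates)
print(knn_predict(train_X, train_y, query, w=[1.0, 0.0], k=1))  # prints yes (noise ignored)
```

Setting a weight to 0 removes that attribute from the distance entirely, which is why the second call finds a different nearest neighbour.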

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hey,

    just let me know when your next break starts and I can explain it to you personally :)

    See you later!

    Marius
  • michaelhecht Member Posts: 89 Maven
    OK, in the meantime I figured out what to do: normalize the attributes and then scale them by the weights.

    But what can I do with nominal attributes?
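The normalize-then-scale workaround can be sketched in Python as follows. This is a minimal illustration, not RapidMiner's implementation: it assumes a Z-transformation using the population standard deviation (RapidMiner's Normalize operator may differ in such details), and `z_normalize`, `scale_by_weights` and the data are hypothetical.

```python
import statistics

def z_normalize(columns):
    """Z-transform each attribute column to mean 0 and standard deviation 1."""
    normalized = []
    for col in columns:
        mean = statistics.mean(col)
        sd = statistics.pstdev(col)  # population standard deviation (an assumption)
        normalized.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return normalized

def scale_by_weights(columns, weights):
    """Multiply every value of attribute i by its weight w_i."""
    return [[w * v for v in col] for col, w in zip(columns, weights)]

# Hypothetical attribute values (one list per attribute) and weights
temperature = [10.0, 20.0, 30.0]
humidity = [1.0, 2.0, 9.0]
scaled = scale_by_weights(z_normalize([temperature, humidity]), [2.0, 0.5])
for col in scaled:
    print([round(v, 3) for v in col])
```

After scaling, each attribute's standard deviation equals its weight, so an ordinary, unweighted k-NN effectively treats attributes with larger weights as more important.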
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Well, for nominal attributes it is not possible to apply weights directly. You have to convert them to a numerical representation beforehand and then use the same technique as for numerical attributes, i.e. scaling the values. For the conversion you can e.g. use Nominal to Numerical with dummy_coding.

    Best regards,
    Marius
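A minimal Python sketch of the dummy-coding route for nominal attributes, assuming one 0/1 column per distinct value (mirroring column names such as `Outlook = sunny`). The helper names `dummy_code` and `scale_columns` and the values are hypothetical; this is not RapidMiner's Nominal to Numerical operator.

```python
def dummy_code(values):
    """Create one 0/1 column per distinct nominal value (dummy coding)."""
    categories = sorted(set(values))
    return {c: [1.0 if v == c else 0.0 for v in values] for c in categories}

def scale_columns(columns, weight):
    """Apply the weight of the original nominal attribute to all of its dummy columns."""
    return {name: [weight * v for v in col] for name, col in columns.items()}

# Hypothetical values of a nominal attribute such as Outlook
outlook = ["sunny", "overcast", "rain", "sunny"]
scaled = scale_columns(dummy_code(outlook), 0.5)
for name, col in scaled.items():
    print(f"Outlook = {name}: {col}")
```

Note that two examples with different values differ in two dummy columns, so with Euclidean distance a mismatch contributes 2 * w^2 to the squared distance rather than w^2.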
  • michaelhecht Member Posts: 89 Maven
    I just discussed this with Ralf Klinkenberg. Here is what I implemented tonight in the hotel :)

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Golf" width="90" x="45" y="30">
           <parameter key="repository_entry" value="//Samples/data/Golf"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="5.3.007" expanded="true" height="94" name="Normalize" width="90" x="179" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|Humidity|Temperature"/>
         </operator>
         <operator activated="true" class="nominal_to_binominal" compatibility="5.3.007" expanded="true" height="94" name="Nominal to Binominal" width="90" x="313" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|Wind|Play|Outlook"/>
         </operator>
         <operator activated="true" class="nominal_to_numerical" compatibility="5.3.007" expanded="true" height="94" name="Nominal to Numerical" width="90" x="447" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|Wind|Play|Outlook = sunny|Outlook = rain|Outlook = overcast"/>
           <list key="comparison_groups"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="5.3.007" expanded="true" height="94" name="Normalize (2)" width="90" x="581" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|Wind = true|Wind = false|Outlook = sunny = true|Outlook = sunny = false|Outlook = rain = true|Outlook = rain = false|Outlook = overcast = true|Outlook = overcast = false"/>
         </operator>
         <operator activated="true" class="weight_by_information_gain_ratio" compatibility="5.3.007" expanded="true" height="76" name="Weight by Information Gain Ratio" width="90" x="45" y="210"/>
         <operator activated="true" class="scale_by_weights" compatibility="5.3.007" expanded="true" height="76" name="Scale by Weights" width="90" x="179" y="210"/>
         <operator activated="true" class="k_nn" compatibility="5.3.007" expanded="true" height="76" name="k-NN" width="90" x="313" y="210">
           <parameter key="k" value="2"/>
         </operator>
         <operator activated="true" class="apply_model" compatibility="5.3.007" expanded="true" height="76" name="Apply Model" width="90" x="447" y="210">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance" compatibility="5.3.007" expanded="true" height="76" name="Performance" width="90" x="581" y="210"/>
         <connect from_op="Retrieve Golf" from_port="output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
         <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
         <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Normalize (2)" to_port="example set input"/>
         <connect from_op="Normalize (2)" from_port="example set output" to_op="Weight by Information Gain Ratio" to_port="example set"/>
         <connect from_op="Weight by Information Gain Ratio" from_port="weights" to_op="Scale by Weights" to_port="weights"/>
         <connect from_op="Weight by Information Gain Ratio" from_port="example set" to_op="Scale by Weights" to_port="example set"/>
         <connect from_op="Scale by Weights" from_port="example set" to_op="k-NN" to_port="training set"/>
         <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_op="k-NN" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
         <connect from_op="Performance" from_port="performance" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>

    This performs better than direct application of k-NN with two neighbours, i.e. seems to work as expected.

    Nevertheless, what is missing (in my opinion) is a weight input for at least the k-NN and Bayes operators, so that attribute weighting can be applied in a "natural" way. Weighting is, e.g., part of Weka's WAODE method, so it makes sense there and possibly for other operators as well.
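The process above derives its weights with Weight by Information Gain Ratio. As a hedged illustration of what such a weight measures, here is the textbook gain ratio for a nominal attribute (information gain divided by split information) in Python. RapidMiner's operator may differ in details such as how numeric attributes are discretized and whether the resulting weights are normalized; the function names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the value distribution of a list."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(attribute, labels):
    """Information gain of a nominal attribute divided by its split information."""
    n = len(labels)
    # Conditional entropy H(label | attribute), averaged over the attribute's values
    conditional = 0.0
    for value, count in Counter(attribute).items():
        subset = [y for a, y in zip(attribute, labels) if a == value]
        conditional += (count / n) * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(attribute)  # entropy of the attribute's own values
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data
labels = ["yes", "yes", "no", "no"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # attribute determines the label
print(gain_ratio(["a", "b", "a", "b"], labels))  # attribute is uninformative
```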
  • wessel Member Posts: 537 Maven
    Marius wrote:

    Hey,

    just let me know when your next break starts and I can explain it to you personally :)

    See you later!

    Marius
    Where do you guys hang out?
    Dortmund University?
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    michaelhecht wrote:

    This performs better than direct application of k-NN with two neighbours, i.e. seems to work as expected.
    Just for the record: to know whether it really performs better, you have to validate the model properly, e.g. with cross-validation. Try setting k to 1 and you'll always get an accuracy of 100% on the training data :)

    wessel wrote:

    Where do you guys hang out?
    Dortmund University?
    Rapid-I Headquarters, Dortmund :)
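Marius' point can be illustrated with a small Python sketch that compares training-set accuracy with leave-one-out accuracy (cross-validation in which each fold holds out a single example) for a 1-NN classifier. The data and function names are hypothetical, and the 100% training accuracy assumes that no two examples share identical attribute values with different labels.

```python
import math

def predict_1nn(train, query):
    """Label of the nearest training example (Euclidean distance)."""
    nearest = min(train, key=lambda ex: math.dist(ex[0], query))
    return nearest[1]

def training_accuracy(data):
    """Accuracy when the model is tested on the examples it was built from."""
    return sum(predict_1nn(data, x) == y for x, y in data) / len(data)

def leave_one_out_accuracy(data):
    """Accuracy when each example is predicted by a model that never saw it."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict_1nn(rest, x) == y
    return hits / len(data)

# Hypothetical data whose labels simply alternate, i.e. carry no learnable pattern
data = [((0.0,), "a"), ((1.0,), "b"), ((2.0,), "a"), ((3.0,), "b")]
print(training_accuracy(data))
print(leave_one_out_accuracy(data))
```

On this data, 1-NN scores perfectly on the training examples but misclassifies every held-out example, which is exactly what an evaluation on the training data hides.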
  • michaelhecht Member Posts: 89 Maven
    Well, I added SOLVED to the topic, but in the end the difference from a true weighted k-NN is that the weights enter squared when, e.g., Euclidean distance is used. So it isn't a real solution but a workaround for the missing weight input. ;)
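The squared-weights effect follows from sqrt(sum_i (w_i * x_i - w_i * y_i)^2) = sqrt(sum_i w_i^2 * (x_i - y_i)^2). Within this workaround, scaling each attribute by sqrt(w_i) instead of w_i would reproduce the weighted distance sqrt(sum_i w_i * (x_i - y_i)^2). The Python sketch below checks both identities numerically; the function names and values are hypothetical.

```python
import math

def weighted_euclidean(x, y, w):
    """Distance a 'real' weighted k-NN would use: sqrt(sum_i w_i * (x_i - y_i)**2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

def scaled_euclidean(x, y, s):
    """Plain Euclidean distance after multiplying attribute i by s_i."""
    return math.dist([si * xi for xi, si in zip(x, s)],
                     [si * yi for yi, si in zip(y, s)])

# Hypothetical (already normalized) examples and attribute weights
x, y = [0.3, -1.2, 2.0], [1.1, 0.4, -0.5]
w = [0.9, 0.25, 0.04]

# Scaling by w itself weights the squared differences with w_i^2 ...
print(scaled_euclidean(x, y, w), weighted_euclidean(x, y, [wi ** 2 for wi in w]))
# ... while scaling by sqrt(w_i) reproduces the weighted distance
print(scaled_euclidean(x, y, [math.sqrt(wi) for wi in w]), weighted_euclidean(x, y, w))
```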