"Retrieve KNN Distance Results"

michaelgloven RapidMiner Certified Analyst, Member Posts: 46 Guru
edited June 2019 in Help

Hi, is there an operator to extract the distance results from applying the k-NN lazy learner to labeled and scored data? I would like to see the underlying data driving the predictions.

Best Answer

  • michaelgloven RapidMiner Certified Analyst, Member Posts: 46 Guru
    Solution Accepted

    Good ideas. It looks like I can get what I'm looking for through the Data to Similarity operator. Also, the graph outputs (spring) are especially helpful in visualizing the k-NN method.


  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @michaelgloven

    I'm not 100% sure, but the 'Cross Distances' operator might help you in this case.

    I have never used it on real data myself, but it seems to offer the same distance measures as k-NN does.

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Which distance are you looking for? The distances to the k nearest neighbors themselves would be fine for a k of 1 to 3, but the output will look pretty messy once you reach k=50+.


    Here's a sample process, though personally I'm not too keen on it.


    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="136">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <!-- 70/30 split: partition 1 trains k-NN, partition 2 is the request set -->
          <operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="179" y="136">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.7"/>
              <parameter key="ratio" value="0.3"/>
            </enumeration>
          </operator>
          <operator activated="true" class="k_nn" compatibility="8.2.000" expanded="true" height="82" name="k-NN" width="90" x="313" y="238">
            <parameter key="k" value="3"/>
          </operator>
          <!-- Keep only the 3 closest reference examples for each request example -->
          <operator activated="true" class="cross_distances" compatibility="8.2.000" expanded="true" height="103" name="Cross Distances" width="90" x="380" y="85">
            <parameter key="only_top_k" value="true"/>
            <parameter key="k" value="3"/>
          </operator>
          <!-- Average those distances per request example -->
          <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="514" y="34">
            <list key="aggregation_attributes">
              <parameter key="distance" value="average"/>
            </list>
            <parameter key="group_by_attributes" value="request"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="289">
            <list key="application_parameters"/>
          </operator>
          <!-- Attach the average distance to each scored example (id = request) -->
          <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="648" y="34">
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="id" value="request"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="715" y="136">
            <parameter key="attribute_name" value="average(distance)"/>
            <parameter key="target_role" value="distance_measure"/>
            <list key="set_additional_roles"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Split Data" to_port="example set"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="k-NN" to_port="training set"/>
          <connect from_op="Split Data" from_port="partition 2" to_op="Cross Distances" to_port="request set"/>
          <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="k-NN" from_port="exampleSet" to_op="Cross Distances" to_port="reference set"/>
          <connect from_op="Cross Distances" from_port="result set" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Cross Distances" from_port="request set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
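
For anyone who wants to sanity-check the numbers outside RapidMiner, the idea behind the process (distances from each request example to its k nearest reference examples, then averaged per request example) can be sketched in plain Python. This is only an illustration, not RapidMiner's implementation: the function names `k_nearest_distances` and `average_distance` are hypothetical, Euclidean distance is an assumption, and the toy points are made up.

```python
import math


def k_nearest_distances(request, reference, k=3):
    """For each request point, return its k smallest (distance, reference_index) pairs.

    Assumes plain Euclidean distance; a different measure may be configured
    in the actual process.
    """
    results = []
    for q in request:
        # Distance from this request point to every reference point,
        # tagged with the index of the reference point.
        dists = sorted((math.dist(q, r), j) for j, r in enumerate(reference))
        results.append(dists[:k])
    return results


def average_distance(neighbours):
    """Mean of the distances in a list of (distance, index) pairs."""
    return sum(d for d, _ in neighbours) / len(neighbours)


# Made-up toy data, not taken from the Iris sample.
reference = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 0.0)]
request = [(0.0, 0.0)]

top = k_nearest_distances(request, reference, k=3)
for neighbours in top:
    print(neighbours, average_distance(neighbours))
```

Keeping the reference index next to each distance shows which training examples drive a prediction, which is also why the raw (unaggregated) list becomes hard to read for large k.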

