
K-NN with training and testing CSV sets

Ryujakk
edited June 2019 in Help
Hi there,

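[Edit: Solved. The training and test files did not have the same attribute names. Both Read CSV operators have use_first_row_as_attribute_names set to false, so the generated names were based on each file's name (train.csv_5 vs. test.csv_5 in the process below), and the trained model's attributes did not match the test set's. Now it works just fine!]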

I'm testing RapidMiner 5.0 and have a problem with a classic setup. I have two CSV input files (train and test), and I want to train k-NN on the training file and evaluate it on the test file (the usual setup).
The problem is that k-NN predicts the same value for every test example. When I use the Weka implementation (W-IBk), I don't have that problem.
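For reference, the train/test setup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not RapidMiner's or Weka's implementation: the data are made up, distances are unweighted Euclidean, and the prediction is a simple majority vote with k=4 as in the process. When the training and test attributes line up, different test examples get different predictions.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=4):
    """Predict the label of `query` by majority vote among its k nearest
    training rows, using unweighted Euclidean distance."""
    # Sort training examples by distance to the query point, keep the k closest.
    neighbours = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    # Majority vote; Counter.most_common orders equal counts by first occurrence.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical data: four numeric attributes plus a label in the 5th column
# (the position the process treats as the label, train.csv_5 / test.csv_5).
train = [
    (1.0, 1.0, 1.0, 1.0, "a"),
    (1.2, 0.9, 1.1, 1.0, "a"),
    (0.9, 1.1, 1.0, 0.8, "a"),
    (5.0, 5.1, 4.9, 5.0, "b"),
    (5.2, 4.8, 5.1, 5.0, "b"),
    (4.9, 5.0, 5.0, 5.3, "b"),
]
test = [
    (1.1, 1.0, 0.9, 1.0),
    (5.1, 5.0, 4.9, 5.1),
]

X = [row[:4] for row in train]  # attributes
y = [row[4] for row in train]   # labels
predictions = [knn_predict(X, y, q, k=4) for q in test]
print(predictions)  # ['a', 'b']
```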

Here is my process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
 <context>
   <input>
     <location/>
   </input>
   <output>
     <location/>
     <location/>
   </output>
   <macros/>
 </context>
 <operator activated="true" class="process" expanded="true" name="Process">
   <process expanded="true" height="316" width="577">
     <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV TRAIN" width="90" x="45" y="30">
       <parameter key="file_name" value="train.csv"/>
       <parameter key="use_first_row_as_attribute_names" value="false"/>
     </operator>
     <operator activated="true" class="numerical_to_real" expanded="true" height="76" name="Numerical to Real" width="90" x="179" y="30"/>
     <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="30">
       <parameter key="name" value="train.csv_5"/>
       <parameter key="target_role" value="label"/>
     </operator>
     <operator activated="true" class="k_nn" expanded="true" height="76" name="k-NN" width="90" x="447" y="30">
       <parameter key="k" value="4"/>
     </operator>
     <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV TEST" width="90" x="45" y="165">
       <parameter key="file_name" value="test.csv"/>
       <parameter key="use_first_row_as_attribute_names" value="false"/>
     </operator>
     <operator activated="true" class="numerical_to_real" expanded="true" height="76" name="Numerical to Real (2)" width="90" x="179" y="165"/>
     <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="313" y="165">
       <parameter key="name" value="test.csv_5"/>
       <parameter key="target_role" value="label"/>
     </operator>
     <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="447" y="165">
       <list key="application_parameters"/>
     </operator>
     <connect from_op="Read CSV TRAIN" from_port="output" to_op="Numerical to Real" to_port="example set input"/>
     <connect from_op="Numerical to Real" from_port="example set output" to_op="Set Role" to_port="example set input"/>
     <connect from_op="Set Role" from_port="example set output" to_op="k-NN" to_port="training set"/>
     <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
     <connect from_op="Read CSV TEST" from_port="output" to_op="Numerical to Real (2)" to_port="example set input"/>
     <connect from_op="Numerical to Real (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
     <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
     <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
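As the edit at the top notes, the fix was making the attribute names agree. In the process above, the Set Role operators refer to train.csv_5 and test.csv_5, so the headerless columns evidently get file-specific names, and Apply Model presumably matches attributes by name. The plain-Python sketch below illustrates the idea under that assumption: it treats each row as a dictionary keyed by names of the form <file name>_<column number>, renames the test attributes to the training names, and checks that both schemas agree. The helpers rename_attributes and check_same_attributes are hypothetical. Inside RapidMiner, the same effect could come from, for example, identical header rows in both CSVs read with use_first_row_as_attribute_names enabled, or a renaming step on the test branch.

```python
def rename_attributes(row, old_prefix, new_prefix):
    """Return a copy of `row` (attribute name -> value) in which every attribute
    named '<old_prefix>_<n>' is renamed to '<new_prefix>_<n>'."""
    renamed = {}
    for name, value in row.items():
        if name.startswith(old_prefix + "_"):
            # Keep the '_<n>' suffix, swap only the file-name prefix.
            name = new_prefix + name[len(old_prefix):]
        renamed[name] = value
    return renamed


def check_same_attributes(train_row, test_row):
    """Raise ValueError unless both rows have exactly the same attribute names."""
    missing = set(train_row) - set(test_row)
    extra = set(test_row) - set(train_row)
    if missing or extra:
        raise ValueError(
            f"attribute mismatch: missing={sorted(missing)}, extra={sorted(extra)}"
        )


# Hypothetical rows keyed by the assumed generated attribute names.
train_row = {f"train.csv_{i}": 0.0 for i in range(1, 6)}
test_row = {f"test.csv_{i}": 0.0 for i in range(1, 6)}

try:
    check_same_attributes(train_row, test_row)  # fails: no names in common
except ValueError as err:
    print("before renaming:", err)

aligned = rename_attributes(test_row, "test.csv", "train.csv")
check_same_attributes(train_row, aligned)  # passes: names now match
print(sorted(aligned))  # ['train.csv_1', 'train.csv_2', 'train.csv_3', 'train.csv_4', 'train.csv_5']
```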
Thank you for any help!

- R