K-NN with training and testing CSV sets

Ryujakk Member Posts: 17 Maven
edited June 2019 in Help
Hi there,

[Edit: Solved... The training and testing files did not have the same attribute names. Now it works just fine!]
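
For anyone hitting the same symptom, here is a minimal Python sketch (not RapidMiner code) of the kind of check that would have caught the mismatch. It assumes, based on the Set Role parameters in the process below (train.csv_5 vs. test.csv_5), that columns without a header row end up named after the file they came from. The helper names auto_names, mismatched and align are hypothetical and only serve the illustration.

```python
def auto_names(file_name, n_columns):
    """Hypothetical helper: build names like 'train.csv_1', mirroring the
    naming pattern visible in the Set Role parameters of the process."""
    return [f"{file_name}_{i}" for i in range(1, n_columns + 1)]


def mismatched(train_names, test_names):
    """Return the attribute names that appear in only one of the two sets."""
    return sorted(set(train_names) ^ set(test_names))


def align(names, prefix="att"):
    """Replace file-specific names with neutral positional ones (att1, att2, ...)."""
    return [f"{prefix}{i}" for i in range(1, len(names) + 1)]


# Assumption: both files have 5 columns, the 5th being the label.
train_names = auto_names("train.csv", 5)
test_names = auto_names("test.csv", 5)

# Before aligning: every name differs between the two sets.
print("only in one set:", mismatched(train_names, test_names))

# After aligning: both sets share the same names, so nothing is reported.
print("after renaming:", mismatched(align(train_names), align(test_names)))
```

In the process itself, the equivalent fix is to make both example sets carry identical attribute names before the model is trained and applied, for example by giving both files the same header row.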

I'm testing out RapidMiner 5.0, and I have a problem with a classic setup. I have two CSV input files (train and test), and I want to train K-NN on the train file and test it on the test file (the usual setup, basically).
The problem is that K-NN predicts the same value for every test example. When I use the Weka implementation (W-IBk), I don't have that problem.

Here is my process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
 <context>
   <input>
     <location/>
   </input>
   <output>
     <location/>
     <location/>
   </output>
   <macros/>
 </context>
 <operator activated="true" class="process" expanded="true" name="Process">
   <process expanded="true" height="316" width="577">
     <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV TRAIN" width="90" x="45" y="30">
       <parameter key="file_name" value="train.csv"/>
       <parameter key="use_first_row_as_attribute_names" value="false"/>
     </operator>
     <operator activated="true" class="numerical_to_real" expanded="true" height="76" name="Numerical to Real" width="90" x="179" y="30"/>
     <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="30">
       <parameter key="name" value="train.csv_5"/>
       <parameter key="target_role" value="label"/>
     </operator>
     <operator activated="true" class="k_nn" expanded="true" height="76" name="k-NN" width="90" x="447" y="30">
       <parameter key="k" value="4"/>
     </operator>
     <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV TEST" width="90" x="45" y="165">
       <parameter key="file_name" value="test.csv"/>
       <parameter key="use_first_row_as_attribute_names" value="false"/>
     </operator>
     <operator activated="true" class="numerical_to_real" expanded="true" height="76" name="Numerical to Real (2)" width="90" x="179" y="165"/>
     <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role (2)" width="90" x="313" y="165">
       <parameter key="name" value="test.csv_5"/>
       <parameter key="target_role" value="label"/>
     </operator>
     <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="447" y="165">
       <list key="application_parameters"/>
     </operator>
     <connect from_op="Read CSV TRAIN" from_port="output" to_op="Numerical to Real" to_port="example set input"/>
     <connect from_op="Numerical to Real" from_port="example set output" to_op="Set Role" to_port="example set input"/>
     <connect from_op="Set Role" from_port="example set output" to_op="k-NN" to_port="training set"/>
     <connect from_op="k-NN" from_port="model" to_op="Apply Model" to_port="model"/>
     <connect from_op="Read CSV TEST" from_port="output" to_op="Numerical to Real (2)" to_port="example set input"/>
     <connect from_op="Numerical to Real (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
     <connect from_op="Set Role (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
     <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
Thank you for any help!

- R