
kNN with Optimize Parameters (Grid)

User8259 Member, University Professor Posts: 8
I am trying a few Split-Validation experiments using Optimize Parameters (Grid) with kNN. In the runs below, everything is held the same except for the changes noted, yet the results are inconsistent:

Batch 1:-

Run 1. k = 1, 3, 5, ..., 25
Run 2. k = 1, 3, 5
Run 3. k = 1

The inconsistency is in the results (Accuracy, Kappa, F-Measure) for k = 1.

Run 1 produces different results from Runs 2 and 3, even though everything else is held fixed.
Run 2 differs from Run 1 only when the Local Seed is 1; they agree for the remaining seed values.
The results of Runs 1 and 2 agree for k = 3 and k = 5.

Because the problem appeared to manifest with k = 1, I tried a few more runs that started with k = 3 instead of 1.

Batch 2:-

Run 1. k = 3, 5, ..., 25
Run 2. k = 3, 5, 7, 9, 11
Run 3. k = 3

Again, mutual inconsistencies showed up only with k = 3.

Notably, Run 1 with k = 3 and Local Seed = 1 produced the same results as Run 2 with k = 3 and Local Seed = 11. There may be other such peculiarities, but this one caught my eye.

The Seed = 1 and Seed = 11 results are not the same across the two runs; instead, the grid results for Seed 1 and Seed 11 "criss-cross" between the two runs, as just described.
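One hypothesis that would produce this criss-cross pattern (an assumption, not something established in this thread) is that when the local seed is not actually in effect, every Split-Validation draws its split from one shared random stream, so the split a grid point receives depends only on how many grid points were evaluated before it. The Python sketch below is a toy simulation of that idea, not RapidMiner's implementation; the helper names `draw_split` and `grid_with_shared_stream`, the 20-row data set, the 70/30 split, the global seed 2001, and the seed-outer/k-inner iteration order are all hypothetical.

```python
import itertools
import random


def draw_split(rng, n_rows=20, train_fraction=0.7):
    """Draw one train/test split (as sorted training-row indices) from rng."""
    n_train = int(n_rows * train_fraction)
    return tuple(sorted(rng.sample(range(n_rows), n_train)))


def grid_with_shared_stream(k_values, seeds, global_seed=2001):
    """Toy grid search in which the 'seed' label is ignored (hypothetical).

    Every grid point pulls its split from one generator created once per run,
    so a point's split depends only on its position in the iteration order.
    """
    shared_rng = random.Random(global_seed)
    splits = {}
    # Assumed iteration order: seed in the outer loop, k in the inner loop.
    for seed, k in itertools.product(seeds, k_values):
        splits[(k, seed)] = draw_split(shared_rng)
    return splits


seeds = [1, 11]
long_grid = grid_with_shared_stream(list(range(1, 26, 2)), seeds)  # k = 1, 3, ..., 25
short_grid = grid_with_shared_stream([1, 3, 5], seeds)             # k = 1, 3, 5

# The 4th split drawn in each run lands on differently labelled grid points:
# (k=7, seed=1) in the long grid and (k=1, seed=11) in the short grid.
print(short_grid[(1, 11)] == long_grid[(7, 1)])  # True
```

Under this hypothesis, a grid point's label is effectively just a position in the random stream, so adding or removing k values shifts which split each (k, seed) point gets. That would be one way to explain why the results for a fixed k changed with the list of k values.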

Also, the k = 3 results from the second batch of three runs do not match the k = 3 results from the first batch.

As stated at the outset, all else is exactly the same in these runs, to my knowledge. Am I missing something obvious?

I am running all of these on the same platform. I can email the input file, the respective process files, and an Excel sheet logging the results (for ease of comparison) to anybody who wants to take a look. Please send me an email address.

Thanks!

Best Answer

  • User8259 Member, University Professor Posts: 8
    Solution Accepted
    An Update on the Above:

    I found that if I also specify "Use Local Random Seed" and set this parameter to "true", I get consistent results across runs. In earlier versions, specifying the "Local Random Seed" values alone was enough: RMS apparently treated "Use Local Random Seed" as "true" without my having to specify it separately. I hope the results I am getting now are also correct, but they are certainly mutually consistent.
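Consistent with this fix, if each evaluation builds its own generator from its own local seed, the split for a given (k, seed) point no longer depends on which other grid points run before it. The Python sketch below illustrates that property in a toy model, not RapidMiner's code: the helper names `draw_split`, `run_grid` and `grid_independent`, the 20-row data set, the 70/30 split and the global seed 2001 are hypothetical, and the `use_local_random_seed` flag only mimics the role of the operator parameter of the same name.

```python
import random


def draw_split(rng, n_rows=20, train_fraction=0.7):
    """Draw one train/test split (as sorted training-row indices) from rng."""
    n_train = int(n_rows * train_fraction)
    return tuple(sorted(rng.sample(range(n_rows), n_train)))


def run_grid(k_values, seeds, use_local_random_seed, global_seed=2001):
    """Map each (k, seed) grid point to the split it sees (toy model)."""
    shared_rng = random.Random(global_seed)  # one stream per run
    splits = {}
    for seed in seeds:
        for k in k_values:
            # Honouring the local seed means a fresh generator per evaluation;
            # otherwise the shared stream keeps advancing across grid points.
            rng = random.Random(seed) if use_local_random_seed else shared_rng
            splits[(k, seed)] = draw_split(rng)
    return splits


def grid_independent(use_local_random_seed):
    """True if every point of the short grid gets the same split in the long grid."""
    seeds = [1, 11]
    long_grid = run_grid(range(1, 26, 2), seeds, use_local_random_seed)
    short_grid = run_grid([1, 3, 5], seeds, use_local_random_seed)
    return all(short_grid[p] == long_grid[p] for p in short_grid)


print(grid_independent(use_local_random_seed=True))  # True
```

With `use_local_random_seed=False`, the seed-11 points are drawn from different positions of the shared stream in the two runs, so their splits generally differ. In the seeded toy model, every k with the same seed also shares one split, which keeps the comparison between k values like-for-like.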