Optimize k in k-NN

neginz Member Posts: 17 Maven
edited December 2018 in Help


I am using the k-NN algorithm to predict the categories of some products, with text mining on customer comments.

I have a question about optimizing a parameter in classification with the k-NN algorithm.

I want to optimize k with the "Optimize Parameters" and "Log" operators to get the best accuracy.

However, I have two Performance operators in my process below, and I don't know where I should put the Optimize Parameters and Log operators.

I would appreciate some help with this.





  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Solution Accepted

    Hi @neginz,


    There is a logic behind this.


    What do you want to optimize? The answer is k.

    Where is Waldo... sorry, k? Somewhere inside the Cross Validation operator.

    So you want to enclose the Cross Validation inside the Optimize Parameters operator and choose k from the Parameters panel when configuring the parameter optimization.
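
    Outside RapidMiner, the same nesting can be sketched in plain Python. This is only an illustration under stated assumptions, not the attached process: the toy data set and the names `knn_predict`, `cross_validate`, and `optimize_k` are hypothetical. The inner function plays the role of Cross Validation, and the outer loop plays the role of Optimize Parameters, rerunning the whole validation once per candidate k.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training rows nearest to query."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(data, k, folds=5):
    """Inner step (the 'Cross Validation' role): mean accuracy over the folds."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th row, starting at row f
        train = [row for i, row in enumerate(data) if i % folds != f]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / folds


def optimize_k(data, candidates, folds=5):
    """Outer step (the 'Optimize Parameters' role): one full validation per k."""
    results = {k: cross_validate(data, k, folds) for k in candidates}
    best_k = max(results, key=results.get)  # first k with the highest accuracy
    return best_k, results


# Hypothetical toy data: two 1-D clusters, with one deliberately
# mislabelled point (x = 4) inside the first cluster.
data = [((float(x),), "b" if x == 4 else "a") for x in range(10)]
data += [((float(x),), "b") for x in range(20, 30)]

best_k, results = optimize_k(data, candidates=[1, 3, 5, 7])
print("best k:", best_k)
```

    On this toy data the sketch picks k = 3, because the single mislabelled point misleads k = 1 more often than the larger values of k. Real data will of course behave differently.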


    Please find attached.


    Hope it helps.


    All the best,


  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Solution Accepted



    I am sorry, I forgot about the Log operator.


    It goes inside the Optimize Parameters operator, right after Cross Validation. What you want to log is the performance, so that is where the Log operator belongs.


    One scatter plot is worth a thousand example sets:


    [Image: Screen Shot 2018-07-23 at 01.15.30.png]


    Rule of thumb, according to CRISP-DM (which is a massive thing):


    • Understand the business (you).
    • Understand the data (you).
    • Prepare the data (RapidMiner Studio).
    • Build the model (RapidMiner Studio).
    • Evaluate and optimize the model (RapidMiner Studio).
    • Deploy the model (RapidMiner Server).



    Model < Validation < Optimization


    So the outermost one (optimization) wraps the validation, which in turn contains the model. You want to log one row per optimization run, not the internals of the validation, so the Log operator goes right after the Cross Validation operator, inside Optimize Parameters.
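
    As a rough illustration of that ordering outside RapidMiner, here is a minimal Python sketch. It is not the attached process: `validate` merely stands in for Cross Validation (leave-one-out here, for brevity), the list `log` stands in for the Log operator's table, and the data and all names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: (feature, label) pairs.
POINTS = [(1, "a"), (2, "a"), (3, "b"), (4, "a"),
          (10, "b"), (11, "b"), (12, "b"), (13, "b")]


def validate(k):
    """Stand-in for Cross Validation: leave-one-out accuracy of 1-D k-NN."""
    hits = 0
    for i, (x, y) in enumerate(POINTS):
        rest = POINTS[:i] + POINTS[i + 1:]  # hold out the current point
        nearest = sorted(rest, key=lambda p: abs(p[0] - x))[:k]
        predicted = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        hits += predicted == y
    return hits / len(POINTS)


def run_optimization(candidates):
    """Stand-in for Optimize Parameters: validate first, then log the result."""
    log = []
    for k in candidates:
        accuracy = validate(k)                      # validation runs first
        log.append({"k": k, "accuracy": accuracy})  # then one log row per k
    best = max(log, key=lambda row: row["accuracy"])
    return best, log


best, log = run_optimization([1, 3, 5])
for row in log:
    print(row["k"], row["accuracy"])
```

    Each row of `log` pairs one value of k with the accuracy that came out of the validation, which is the kind of table you would then plot to pick k.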


    Hope it helps.


  • neginz Member Posts: 17 Maven

    Thanks for your reply, @rfuentealba.

    I know about Optimize Parameters, but what about the "Log" operator: where should I put it? Also, because I have two Performance operators, I don't know which one should be compared against k (the one inside Cross Validation or the one outside it, in the main process). My problem is with the performance...

  • Maerkli Member Posts: 84 Guru

    Hello Rodrigo,


    I have opened your second XML file, but the Cross Validation and Log operators don't show up. Is that normal? Should they be in a subprocess of Optimize Parameters?


  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    BTW, I don't know why the "Process Documents from Data" operator appears red in my example. I know red means that no operator with that name can be found, yet I do have that operator.


    Perhaps @neginz can explain the part that I don't have so that we can compose the project properly, or someone else can point me to the solution? (Not that I'm worried about it, I'm just trying to make it easier for others when they read this answer.)


    [Image: Screen Shot 2018-07-25 at 13.23.47.png]

  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hi @Maerkli,


    Following the principle of: "A visualization is worth a thousand example sets":


    Here is what I see when I open the process.


    [Image: Screen Shot 2018-07-25 at 13.09.06.png]

    You see the Optimize Parameters operator? It's selected here:


    [Image: Screen Shot 2018-07-25 at 13.09.14.png]


    If I double click on it, it opens the following:


    [Image: Screen Shot 2018-07-25 at 13.12.18.png]


    There you go, hope it helps.


    HINT: If an operator has a double border, or looks like two operators stacked on top of each other (like the Cross Validation operator, the Optimize Parameters (Grid) operator, and a few others), it is a superoperator, so you can double-click it and explore its contents. In fact, the Subprocess operator is one of these famous superoperators.



  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    (My last two messages were swapped: read the one just above this first, then the one before it... cc @sgenzer wat)

  • neginz Member Posts: 17 Maven

    Hi @rfuentealba, sorry for the delay.

    I don't know why they are red. Even when I run your XML code they appear red for me too, even though that was my own process!

    BTW, thanks for your help with the Log operator; it works well.

  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hello, @neginz! Glad it helped. I've marked the solutions as accepted, if you don't mind.


    I think that the Process Data From Files operator has an error loading here. By any chance, do you use Windows? I tried your process on Mac. By Occam's Razor, I think the culprit is having different filesystem layouts, hence I'll probably try to reproduce it tomorrow and if I can find the problem, submit a bug report.


    All the best!


  • neginz Member Posts: 17 Maven


    It was the "Process Documents from Data" operator. And yes, I use Windows, so maybe that's the problem...

