New Sliding Window Validation operator in 9.4 BETA

Noel · Member · Posts: 81 · Maven
I have a process that was originally exported from Auto Model using a GBT (attached). I opened that process in the 9.4 Beta and replaced the Cross Validation operators with the new Sliding Window Validation operator, because my data is a time series. This is how I'm trying to use the Optimize Parameters and Feature Selection functionality.
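Unlike cross-validation, a sliding window validation keeps the time order: each model is trained on a window of earlier rows and tested on the rows that immediately follow, and then the window moves forward. The Python sketch below only illustrates that general idea. The parameter names (`train_width`, `test_width`, `step`) are hypothetical and are not the operator's actual parameter names, and details such as a horizon gap between the training and test windows are left out.

```python
from typing import Iterator, Tuple


def sliding_windows(n_rows: int, train_width: int, test_width: int,
                    step: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_rows, test_rows) index ranges over time-ordered rows.

    Hypothetical parameter names; this is a sketch of the technique,
    not the RapidMiner operator itself.
    """
    start = 0
    # Stop once a full training window plus its test window no longer fits.
    while start + train_width + test_width <= n_rows:
        train = range(start, start + train_width)
        test = range(start + train_width, start + train_width + test_width)
        yield train, test
        start += step  # slide both windows forward in time


if __name__ == "__main__":
    for train, test in sliding_windows(n_rows=10, train_width=4, test_width=2, step=2):
        print(list(train), "->", list(test))
```

Every test window lies strictly after its training window, so the model is never evaluated on rows that precede the data it was trained on.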

The issue I'm having is that the outputs from the various operators don't agree, and I'm not sure what to make of the conflicting metrics.

For example, the Performance Vector (image 1) indicates 64.94% accuracy / 35.06% error:

This output (image 2), whatever it is, implies 72.6% accuracy / 27.4% error:

I'm not sure what this one (image 3) says:

And finally, the GBT model's output (image 4) says 77.91% accuracy / 22.03% error (twice):


Fixed and Released in 9.4.1 (TSE-115)


  • varunm1 · Moderator, Member · Posts: 1,207 · Unicorn
    edited August 2019
    Hello @Noel

    I have gone through your process and have some input as well as some questions. I see that you changed the process as described in your question.

    Questions: Are all the images in this post from the changed (modified) process? I ask because of the discrepancy I see between image 2 and image 3: the number of trees and the depth differ. In your process you set the Optimize Parameters criteria to Maximal Depth and Number of Trees, which is in line with image 3. Image 2 looks like the default Auto Model Optimize Parameters settings.

    Here is my reading of the images in your question, from top to bottom.

    Image 1: Auto Model splits the data 60:40 (train:test). The 40% test portion is further divided into 7 subsets; the model is tested on each subset and the performance is averaged. So the accuracy you see here is the average performance over all 7 hold-out subsets.
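The averaging described for image 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Auto Model's implementation: it assumes the hold-out rows are cut into contiguous, near-equal chunks and that each chunk's accuracy counts equally in the average. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def accuracy(labels: Sequence[str], predictions: Sequence[str]) -> float:
    """Fraction of rows whose prediction equals the true label."""
    hits = sum(1 for y, p in zip(labels, predictions) if y == p)
    return hits / len(labels)


def averaged_holdout_accuracy(labels: Sequence[str],
                              predictions: Sequence[str],
                              n_subsets: int = 7) -> float:
    """Score n_subsets disjoint, contiguous chunks of the hold-out rows
    separately and return the mean of the per-chunk accuracies."""
    n = len(labels)
    if n < n_subsets:
        raise ValueError("need at least one hold-out row per subset")
    # Integer chunk boundaries; chunk sizes differ by at most one row.
    bounds = [i * n // n_subsets for i in range(n_subsets + 1)]
    scores: List[float] = [
        accuracy(labels[lo:hi], predictions[lo:hi])
        for lo, hi in zip(bounds, bounds[1:])
    ]
    return mean(scores)


if __name__ == "__main__":
    labels = ["yes"] * 14
    predictions = ["no", "no"] + ["yes"] * 12  # only the first chunk misses
    print(averaged_holdout_accuracy(labels, predictions))  # 6/7, about 0.857
```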

    Image 2: Image 2 relates to Optimize Parameters. Here the model is trained with different parameter combinations, generated automatically from the criteria set in Optimize Parameters (for example, maximal depth, number of trees and learning rate). Optimize Parameters picks the best combination based on the error rate; the criterion Auto Model uses is classification error. Image 2 shows the error rate for each combination.
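The selection step described for image 2 is essentially a grid search: evaluate every parameter combination and keep the one with the lowest classification error. The Python sketch below assumes an exhaustive grid over two parameters; `toy_error` is a hypothetical stand-in for "train a GBT and validate it", so the numbers it produces are made up and only serve to make the example runnable.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple


def grid_search(evaluate: Callable[[int, int], float],
                depths: Sequence[int],
                tree_counts: Sequence[int]) -> Tuple[Optional[Dict[str, int]], float]:
    """Evaluate every (maximal depth, number of trees) combination and keep
    the one with the lowest classification error."""
    best_params: Optional[Dict[str, int]] = None
    best_error = float("inf")
    for depth, trees in product(depths, tree_counts):
        error = evaluate(depth, trees)  # e.g. error from an inner validation
        if error < best_error:
            best_params = {"max_depth": depth, "n_trees": trees}
            best_error = error
    return best_params, best_error


def toy_error(depth: int, trees: int) -> float:
    """Hypothetical error surface standing in for training and validating a GBT."""
    return 0.20 + 0.02 * abs(depth - 4) + 0.001 * abs(trees - 90)


if __name__ == "__main__":
    print(grid_search(toy_error, depths=[2, 4, 7], tree_counts=[30, 90, 150]))
```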

    Image 3: Image 3 reflects your changes to the Optimize Parameters criteria; the changed parameters are shown in the image above.

    Why are you getting zeros in image 3?
    The reason is that in the "Log Performances" operator you chose the wrong source to log. As the image below shows, your process uses Sliding Window Validation, but in Log Performances you selected Cross Validation. That operator is not in the process, so there is nothing to log and the values come back as zero. Change the value to Sliding Window Validation. Also, as I said, you should log the error rate.

    Image 4: This is the trained GBT model's performance. @IngoRM can help confirm this.

    Hope this helps. Please make the changes in your process as described. If you need any clarification, let me know here.


  • lionelderkrikor · Moderator, RapidMiner Certified Analyst, Member · Posts: 1,126 · Unicorn
    Hi @Noel,

    I think the second and third metrics (second: 72.6% accuracy; third: 22.093% error) are training errors.
    If I understand correctly, when you use Auto Model, the 72.6% accuracy is calculated during the training phase on the 60% of the initial dataset used for training.
    The first metric (64.94% accuracy) reflects the TEST ERROR. It is calculated on a hold-out dataset (the 40% of the initial dataset that Auto Model did not use to train the model).

    Please correct me if I'm wrong...



  • Noel · Member · Posts: 81 · Maven
    @varunm1, @lionelderkrikor -

    Thanks for the responses. I now better appreciate the difference between outputs that report training accuracy and those that report testing accuracy.

    I still have (at least) two issues, though: 1.) the Sliding Window Validation operator in 9.4 Beta appears to have no "values" that it can report to the optimization operators:

    2.) I wonder whether this apparent lack of communication between operators means that the GBT cannot be optimized and that better-performing sets of features cannot be selected...

  • varunm1 · Moderator, Member · Posts: 1,207 · Unicorn
    Hello @Noel

    I have an alternative: if you want to run Sliding Window Validation without logging, you can connect the per port of Sliding Window Validation directly to the inner output port of the optimization operator. In the Performance operator inside Sliding Window Validation, select classification error as the criterion.

    @tftemme, any idea why the performances of Sliding Window Validation cannot be logged outside of the operator?


  • tftemme · Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member · Posts: 161 · RM Research
    Hello @Noel, @varunm1

    I will investigate why the Sliding Window Validation operator does not provide values for logging.

    Meanwhile, note that the Optimization operator does not use the logged value; it always takes the main criterion from the performance vector delivered to its inner port. So the optimization works independently of the logging, and there is no need to worry about that.
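The point that optimization and logging are independent can be made concrete with a small Python sketch. Everything here is hypothetical and simplified: `PerformanceVector` is a stand-in class holding several criteria plus the name of the main one, and the sketch assumes classification error is minimized while other criteria are maximized. The optimizer decides only from the main criterion returned by the inner step; the logger is a side effect, so even a logger that records only zeros does not change which candidate wins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Assumption for this sketch: these criteria are minimized, all others maximized.
LOWER_IS_BETTER = {"classification_error"}


@dataclass
class PerformanceVector:
    """Hypothetical stand-in: several criteria plus the name of the main one."""
    criteria: Dict[str, float]
    main: str

    def score(self) -> float:
        """Main-criterion value oriented so that larger is always better."""
        value = self.criteria[self.main]
        return -value if self.main in LOWER_IS_BETTER else value


def optimize(candidates: Sequence[int],
             inner: Callable[[int], PerformanceVector],
             logger: Callable[[int, PerformanceVector], None]) -> Optional[int]:
    best: Optional[int] = None
    best_score = float("-inf")
    for candidate in candidates:
        perf = inner(candidate)
        logger(candidate, perf)  # side effect only; never read back
        if perf.score() > best_score:  # the decision uses the main criterion
            best, best_score = candidate, perf.score()
    return best


def toy_inner(depth: int) -> PerformanceVector:
    """Hypothetical inner validation returning a made-up error for each depth."""
    error = 0.20 + 0.02 * abs(depth - 4)
    return PerformanceVector({"accuracy": 1 - error, "classification_error": error},
                             main="classification_error")


if __name__ == "__main__":
    logged: List[float] = []
    # A "broken" logger that records 0.0, like the zeros seen in the log.
    broken = lambda cand, perf: logged.append(0.0)
    print(optimize([2, 4, 7], toy_inner, broken), logged)  # 4 [0.0, 0.0, 0.0]
```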

    Best regards,