
Auto Model Performance. Is it training, testing, or validation?

Konradlk Member Posts: 9 Learner I
edited November 22 in Help

Answers

  • varunm1 Moderator, Member Posts: 970   Unicorn
    Hello @Konradlk

    Auto Model divides the original dataset into a 60:40 split (train:test). The validation in Auto Model is a multi hold-out set validation. The model is trained on the 60% of the data, and the 40% test data is divided into 7 subsets. Once the model is trained, it is used to make predictions on each of the 7 subsets independently, and the performances of these 7 subsets are averaged. So the performance you see in Auto Model comes from the test data, via a multi hold-out validation method.
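    The split-and-average scheme described above can be sketched outside RapidMiner. Below is a minimal Python illustration (the toy data, the trivial model, and the RMSE metric are all made up for the example; Auto Model's actual implementation may differ in details):

```python
import random

random.seed(42)

# Toy dataset: (feature, label) pairs with y ~ 2x + noise (illustrative only).
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]
random.shuffle(data)

# 60:40 train/test split, as Auto Model does.
cut = int(0.6 * len(data))
train, test = data[:cut], data[cut:]

# "Train" a trivial model on the 60% (least-squares slope through the origin).
slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# Split the 40% test data into 7 hold-out subsets and score each independently.
k = 7
subsets = [test[i::k] for i in range(k)]

def rmse(model_slope, subset):
    return (sum((model_slope * x - y) ** 2 for x, y in subset) / len(subset)) ** 0.5

scores = [rmse(slope, s) for s in subsets]
avg_score = sum(scores) / len(scores)  # the reported performance is this average
print(f"per-subset RMSE: {[round(s, 3) for s in scores]}")
print(f"averaged RMSE:   {avg_score:.3f}")
```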

    Hope this helps. Please inform if you need more information.
  • Konradlk Member Posts: 9 Learner I
    Thank you so much @varunm1 . Do you have resources to find the other errors? 
  • varunm1 Moderator, Member Posts: 970   Unicorn
    Do you have resources to find the other errors?
    Can you inform what kind of resources and errors you are looking for?

    If you click on "performance" of each model you can find different performance metrics like accuracy, precision, recall etc
  • Konradlk Member Posts: 9 Learner I
    edited November 22
    @varunm1

    Hi, I'm looking to get the performance vector for each step of the process. So I am looking for the performance vectors of training, validation, and testing.

    I was previously using a process a coworker left me, and they explicitly said that they need errors for all three stages. I am sorry that this is unclear; I do not have the greatest understanding of this and am trying to learn very quickly.

    My goal is to run several different prediction models and compare the performance of the different models.

    The picture below is what I was left with. I can post more information if necessary.


  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 911   Unicorn
    Hi @Konradlk,

    Your process is correct.
    You effectively have:
     - the training performance (given by the Performance operator in the "training" part of the Cross Validation operator)
     - the validation performance (given by the Performance operator in the "testing" part of the Cross Validation operator)
     - the testing performance (given by the Performance operator in the main process)
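    As a rough analogue outside RapidMiner, the same three numbers can be sketched in Python with scikit-learn (a library the thread itself does not use; the dataset and model settings below are made up): the training and validation performances come from inside the cross-validation, while the testing performance comes from data held out before any training.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.neural_network import MLPRegressor

# Synthetic regression data standing in for the real dataset.
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

# Hold out a final test set, mirroring the Performance operator in the main process.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)

# Cross-validation yields both training-side and validation-side performance.
cv = cross_validate(model, X_tr, y_tr, cv=5, return_train_score=True)
print("training R^2:  ", cv["train_score"].mean())
print("validation R^2:", cv["test_score"].mean())

# Refit on all training data, then score the untouched test set.
model.fit(X_tr, y_tr)
print("testing R^2:   ", model.score(X_te, y_te))
```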

    Do you encounter any errors with this process?

    Regards,

    Lionel


  • Konradlk Member Posts: 9 Learner I
    @lionelderkrikor

    I do encounter errors when I try to change the neural network to Deep Learning, Generalized Linear Model, or SVM.

    The problem I encounter is that no matter which predictive model I run, I get the exact same errors for each performance test.

    When I run an Auto Model I get different errors for each model, but not when I change them in my process. I change the models by simply swapping the neural network operator for whatever else I want to run.
  • varunm1 Moderator, Member Posts: 970   Unicorn
    I do encounter errors when I try to change the neural network to Deep Learning, Generalized Linear Model, or SVM.

    Can you inform the details of those errors? If possible provide us with data and .rmp file to debug.

    When I run an auto model I get different errors for each model but not when I change them in my process.
    You might get different errors because the processes are different.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 911   Unicorn
    @Konradlk

    The method of validation is different in AutoModel and in your process:
     - In AutoModel, a split validation with a multi hold-out set validation is performed, as described by Varun. You can open the process generated by AutoModel to understand how your model is validated.
     - In your process, you are using a Cross Validation.

    Although performance should not differ significantly in either case, the use of two different validation methods can explain the differences.

    Moreover, you are applying a preprocessing step to your data (normalization). To my knowledge, AutoModel does not apply such a preprocessing step by default. This difference in preprocessing can explain the difference in the performance results. Once again, you can open the process generated by AutoModel and compare it to your process.
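    For reference, a z-transformation (to my recollection, the default method of RapidMiner's Normalize operator) rescales each attribute to mean 0 and standard deviation 1. A tiny Python sketch with made-up values:

```python
# Made-up attribute values; in RapidMiner this would be one column of the ExampleSet.
values = [10.0, 12.0, 14.0, 20.0]

mean = sum(values) / len(values)
# Sample standard deviation (n - 1 in the denominator).
std = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5

# Each value becomes its distance from the mean, in standard deviations.
normalized = [(v - mean) / std for v in values]
print(normalized)
```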
      
    But so that we can reproduce what you observe and find out exactly what is going on, can you share your data and your process (the process in your screenshot)?

    Regards,

    Lionel
  • Konradlk Member Posts: 9 Learner I
    @varunm1 @lionelderkrikor 
    Once again, thank you both for your time and help. I am going to attach my .rmp file and both Excel files I use. If either of you can help me figure out how to get decent results for the neural network and at least one other predictive model, I would be so grateful.

    For both Excel files, only the last sheet is used.
  • varunm1 Moderator, Member Posts: 970   Unicorn
    edited November 23
    Hello @Konradlk

    Do you have any reference performance values, or values you are aiming for? I modified your process and added an Optimize Parameters (Grid) operator for the neural network. I didn't change the layer configuration inside the neural network, such as adding neurons or layers.

    I attached the working process, which runs without errors. You can change the layers in the neural network operator inside Optimize Parameters (Grid) to see how different layer configurations perform. I will try other settings; you can add layers and try as well. Use squared correlation and RMSE as your performance evaluation metrics.
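    For reference, the two suggested metrics can be computed by hand. A small Python sketch with made-up predictions (not taken from the attached process):

```python
import math

# Hypothetical labels and model predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 6.9, 9.4]

# Root mean squared error: average squared residual, then square root.
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Squared correlation = (Pearson correlation of labels and predictions)^2.
def mean(v):
    return sum(v) / len(v)

mt, mp = mean(y_true), mean(y_pred)
cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
var_t = sum((t - mt) ** 2 for t in y_true)
var_p = sum((p - mp) ** 2 for p in y_pred)
squared_corr = cov ** 2 / (var_t * var_p)

print(f"RMSE: {rmse:.3f}, squared correlation: {squared_corr:.3f}")
```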

    Please let us know if you have more questions.