
Auto Model Performance. Is it training, testing, or validation?

Konradlk Member Posts: 9 Learner I
edited November 22 in Help

Answers

  • varunm1 Moderator, Member Posts: 970   Unicorn
    Hello @Konradlk

    Auto Model divides the original dataset into a 60:40 split (train:test). The validation in Auto Model is a multi hold-out set validation. The model is trained on the 60% of the data, and the 40% test data is divided into 7 subsets. Once the model is trained, it is used to make predictions on each of the 7 subsets independently, and the performances of these 7 subsets are averaged. So the performance you see in Auto Model comes from the test data, via a multi hold-out validation method.
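    The split-and-average scheme described above can be sketched outside RapidMiner. Below is a minimal Python illustration (the toy data, the trivial model, and the RMSE metric are all made up for the example; Auto Model's actual implementation may differ in details):

```python
import random

random.seed(42)

# Toy dataset: (feature, label) pairs with y ~ 2x + noise (illustrative only).
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]
random.shuffle(data)

# 60:40 train/test split, as Auto Model does.
cut = int(0.6 * len(data))
train, test = data[:cut], data[cut:]

# "Train" a trivial model on the 60% (least-squares slope through the origin).
slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# Split the 40% test data into 7 hold-out subsets and score each independently.
k = 7
subsets = [test[i::k] for i in range(k)]

def rmse(model_slope, subset):
    return (sum((model_slope * x - y) ** 2 for x, y in subset) / len(subset)) ** 0.5

scores = [rmse(slope, s) for s in subsets]
avg_score = sum(scores) / len(scores)  # the reported performance is this average
print(f"per-subset RMSE: {[round(s, 3) for s in scores]}")
print(f"averaged RMSE:   {avg_score:.3f}")
```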

    Hope this helps. Please inform if you need more information.
  • Konradlk Member Posts: 9 Learner I
    Thank you so much @varunm1 . Do you have resources to find the other errors? 
  • varunm1 Moderator, Member Posts: 970   Unicorn
    Do you have resources to find the other errors?
    Can you inform what kind of resources and errors you are looking for?

    If you click on "performance" of each model you can find different performance metrics like accuracy, precision, recall etc
  • Konradlk Member Posts: 9 Learner I
    edited November 22
    @varunm1

    Hi, I'm looking to get the performance vector for each step of the process. So I am looking for the performance vectors of training, validation, and testing.

    I was previously using a process a coworker left me, and they explicitly said that they need errors for all three stages. I am sorry that this is unclear; I do not have the greatest understanding of this and am trying to learn very quickly.

    My goal is to run several different prediction models and compare the performance of the different models.

    The picture below is what I was left with. I can post more information if necessary.


  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 911   Unicorn
    Hi @Konradlk,

    Your process is correct.
    You effectively have:
     - the training performance (given by the Performance operator in the "training" part of the Cross Validation operator)
     - the validation performance (given by the Performance operator in the "testing" part of the Cross Validation operator)
     - the testing performance (given by the Performance operator in the main process)
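    As a rough analogue outside RapidMiner, the same three numbers can be sketched in Python with scikit-learn (a library the thread itself does not use; the dataset and model settings below are made up): the training and validation performances come from inside the cross-validation, while the testing performance comes from data held out before any training.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.neural_network import MLPRegressor

# Synthetic regression data standing in for the real dataset.
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

# Hold out a final test set, mirroring the Performance operator in the main process.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)

# Cross-validation yields both training-side and validation-side performance.
cv = cross_validate(model, X_tr, y_tr, cv=5, return_train_score=True)
print("training R^2:  ", cv["train_score"].mean())
print("validation R^2:", cv["test_score"].mean())

# Refit on all training data, then score the untouched test set.
model.fit(X_tr, y_tr)
print("testing R^2:   ", model.score(X_te, y_te))
```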

    Do you encounter any errors with this process?

    Regards,

    Lionel


  • Konradlk Member Posts: 9 Learner I
    @lionelderkrikor

    I do encounter errors when I try to change the neural network to Deep Learning, Generalized Linear Model, or SVM.

    The problem I encounter is that no matter which predictive model I run, I get the exact same errors for each performance test.

    When I run an Auto Model I get different errors for each model, but not when I change them in my process. I change the models by simply swapping the neural network operator for whatever else I want to run.
  • varunm1 Moderator, Member Posts: 970   Unicorn
    I do encounter errors when I try to change the neural network to Deep Learning, Generalized Linear Model, or SVM.

    Can you inform the details of those errors? If possible provide us with data and .rmp file to debug.

    When I run an auto model I get different errors for each model but not when I change them in my process.
    You might get different errors because the processes are different.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 911   Unicorn
    @Konradlk

    The method of validation is different in AutoModel and in your process:
     - In AutoModel, a split validation with a multi hold-out set validation is performed, as described by Varun. You can open the process generated by AutoModel to understand how your model is validated.
     - In your process, you are using a Cross Validation.

    Although performance should not differ significantly in either case, the use of two different validation methods can explain the differences.

    Moreover, you are applying a preprocessing step to your data (normalization). To my knowledge, AutoModel does not apply such a preprocessing step by default. This difference in preprocessing can explain the difference in the performance results. Once again, you can open the process generated by AutoModel and compare it to your process.
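    For reference, a z-transformation (to my recollection, the default method of RapidMiner's Normalize operator) rescales each attribute to mean 0 and standard deviation 1. A tiny Python sketch with made-up values:

```python
# Made-up attribute values; in RapidMiner this would be one column of the ExampleSet.
values = [10.0, 12.0, 14.0, 20.0]

mean = sum(values) / len(values)
# Sample standard deviation (n - 1 in the denominator).
std = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5

# Each value becomes its distance from the mean, in standard deviations.
normalized = [(v - mean) / std for v in values]
print(normalized)
```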
      
    But so that we can reproduce what you observe and find out exactly what is going on, can you share your data and your process (the process in your screenshot)?

    Regards,

    Lionel
  • Konradlk Member Posts: 9 Learner I
    @varunm1 @lionelderkrikor 
    Once again, thank you both for your time and help. I am going to attach my .rmp file and both Excel files I use. If either of you can help me figure out how to get decent results for the neural network and at least one other predictive model, I would be so grateful.

    For both Excel files, only the last sheet is used.
  • varunm1 Moderator, Member Posts: 970   Unicorn
    edited November 23
    Hello @Konradlk

    Do you have any reference performance values, or values you are aiming for? I modified your process and added an Optimize Parameters (Grid) operator for the neural network. I didn't change the layer configuration inside the neural network, such as adding neurons or layers.

    I attached the working process, which runs without errors. You can change the layers in the neural network operator inside Optimize Parameters (Grid) to see how different layer configurations perform. I will try other settings; you can add layers and try as well. Use squared correlation and RMSE as your performance evaluation metrics.
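    For reference, the two suggested metrics can be computed by hand. A small Python sketch with made-up predictions (not taken from the attached process):

```python
import math

# Hypothetical labels and model predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 6.9, 9.4]

# Root mean squared error: average squared residual, then square root.
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Squared correlation = (Pearson correlation of labels and predictions)^2.
def mean(v):
    return sum(v) / len(v)

mt, mp = mean(y_true), mean(y_pred)
cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
var_t = sum((t - mt) ** 2 for t in y_true)
var_p = sum((p - mp) ** 2 for p in y_pred)
squared_corr = cov ** 2 / (var_t * var_p)

print(f"RMSE: {rmse:.3f}, squared correlation: {squared_corr:.3f}")
```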

    Please let us know if you have more questions.