External Validation

norita Member Posts: 29 Contributor I
I built the following process and got good performance results with cross-validation. Now I want to apply this very same model to an external data set. How do I do that?

The Retrieve valdays_complete operator provides the external set, and Filter Examples (2) selects the dementia subgroup (the same subgroup used for modelling).

Best Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi,

    doing the Backward Elimination and the other feature selection *before* the Cross Validation is not the best approach. You want to validate the entire modeling process, and feature selection is an important part of it: if the selection has already seen all the data, the validated performance will be optimistically biased. It takes longer because of the repetitions, but you should put the feature selection inside the Cross Validation in the main process.

    Does the random attribute stay in the data after the feature selection? One would expect it to be eliminated. In that case it isn't in the model, so it isn't relevant for the external data either.

    You can connect any results you're interested in to the result ports. It will be interesting to compare the validation performance to the external data set performance.

    Regards,
    Balázs
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi @Norita,

    you can use the Remember operator inside the Cross Validation, after the feature selection, to store the attribute weights, for example. After the validation, use the Recall operator to retrieve them.

    The list of the attributes is also available in most models, but it's usually harder to retrieve it from those.

    Regards,
    Balázs
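
For readers who want to see the nesting outside RapidMiner, here is a minimal plain-Python sketch under stated assumptions, not RapidMiner's implementation: the nearest-centroid model, the greedy `backward_elimination` scored on a one-third inner split, the toy attributes (`signal`, `noise`, `random`) and every function name are hypothetical. It shows the point of the answers above: the attribute selection runs on each training fold only, so the test fold never influences which attributes are kept, and collecting the per-fold selections (the role Remember/Recall plays in the process) makes the fold-to-fold variation visible.

```python
import random
from collections import Counter


def fit(rows, labels, features):
    """Hypothetical nearest-centroid model: per-class mean of each selected attribute."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = {f: sum(r[f] for r in members) / len(members) for f in features}
    return centroids


def accuracy(model, rows, labels, features):
    """Share of rows whose closest class centroid matches the true label."""
    def predict(row):
        return min(model, key=lambda c: sum((row[f] - model[c][f]) ** 2 for f in features))
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def backward_elimination(rows, labels, features):
    """Greedy backward elimination scored on an inner split of the given rows only."""
    fit_idx = [j for j in range(len(rows)) if j % 3 != 0]
    val_idx = [j for j in range(len(rows)) if j % 3 == 0]

    def score(feats):
        model = fit([rows[j] for j in fit_idx], [labels[j] for j in fit_idx], feats)
        return accuracy(model, [rows[j] for j in val_idx], [labels[j] for j in val_idx], feats)

    current = list(features)
    current_score = score(current)
    while len(current) > 1:
        # Try dropping each remaining attribute; accept the best reduced set
        # if it scores at least as well as the current set.
        candidates = [[f for f in current if f != drop] for drop in current]
        best = max(candidates, key=score)
        best_score = score(best)
        if best_score < current_score:
            break
        current, current_score = best, best_score
    return current


def cross_validate(rows, labels, features, k=5):
    """Outer validation: the feature selection runs inside every training fold."""
    scores, selections = [], []
    for i in range(k):
        test_idx = sorted(range(i, len(rows), k))
        test_set = set(test_idx)
        train_idx = [j for j in range(len(rows)) if j not in test_set]
        tr_rows = [rows[j] for j in train_idx]
        tr_labels = [labels[j] for j in train_idx]
        te_rows = [rows[j] for j in test_idx]
        te_labels = [labels[j] for j in test_idx]
        selected = backward_elimination(tr_rows, tr_labels, features)  # test fold unseen
        selections.append(selected)  # comparable to Remember inside the validation
        model = fit(tr_rows, tr_labels, selected)
        scores.append(accuracy(model, te_rows, te_labels, selected))
    return scores, selections


def make_toy_data(n=200, seed=0):
    """Hypothetical toy data: one informative attribute, two uninformative ones."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        rows.append({
            "signal": 3.0 * y + rng.gauss(0, 1),  # depends on the label
            "noise": rng.gauss(0, 1),             # unrelated to the label
            "random": rng.random(),               # random baseline attribute
        })
        labels.append(y)
    return rows, labels


rows, labels = make_toy_data()
scores, selections = cross_validate(rows, labels, ["signal", "noise", "random"])
print("per-fold accuracy:", [round(s, 3) for s in scores])
# How often each attribute survived the selection across the folds.
print(Counter(f for sel in selections for f in sel))
```

Because the selection is repeated per fold, the selected sets can differ between folds; counting how often each attribute survives is one way to judge how stable the selection is.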

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi @norita,

    the usual way to do this would be *another* Cross Validation around the Backward Elimination. You should validate the entire modeling process, not just the feature selection. The modeling in the outer validation would of course be the same as inside the Backward Elimination.

    The outer Cross Validation has a "mod" output that gives you the model. You can then use Apply Model to apply this model to a new data set (the external data), provided that it has the same attributes, with the same types, as those that went into the model. (Additional attributes don't matter.)

    So if you do a lot of preprocessing in Cleanse.Days.Data, you will need to apply the same preprocessing to the external data to get the attribute structure the model expects.

    Regards,
    Balázs
  • norita Member Posts: 29 Contributor I
    This is how far I got. Could you give me some feedback on what I still need to improve?
    I wonder whether I also have to apply the Generate Attributes operator from the backward elimination (it creates a random attribute to help interpret the feature weights) to the external validation data, or whether it is right to feed the data directly into the produced model.
    I was also wondering whether the final performance (performance 9), i.e. the output of the Performance operator, should go to the result port.

    Kind regards and thank you very much!
  • norita Member Posts: 29 Contributor I
    edited June 2021
  • norita Member Posts: 29 Contributor I
    edited June 2021
    Hi

    Thank you, now it works and gives good results.

    Do you know how I can see which attributes are finally included for prediction? I just realized that the Backward Elimination result changes slightly with each iteration, and setting a breakpoint there gives me different attribute results.
    Do you know how I can assess the final selection of the attributes of the final model?

    Kind regards

    Nora
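
The external-validation step described in the answers (train on all modelling data, run the same preprocessing on the external data, then apply the model) can also be sketched in plain Python. This is an illustration under assumptions, not RapidMiner's API: `cleanse` is a hypothetical stand-in for the Cleanse.Days.Data subprocess, the attributes `age`, `score` and `site` and all values are invented, and the final selection `["age", "score"]` is assumed. Assuming the model at the outer Cross Validation's "mod" port is retrained on the complete input data, its attribute list (here `model.features`) is the one to inspect for the final attributes.

```python
from dataclasses import dataclass


def cleanse(raw_rows):
    """Hypothetical stand-in for the Cleanse.Days.Data preprocessing.

    It must run unchanged on the modelling data and on the external data so
    both end up with the attribute names and types the model expects.
    """
    cleaned = []
    for raw in raw_rows:
        row = dict(raw)                             # extra attributes are kept as-is
        row["age"] = float(row["age"])              # text -> number
        row["score"] = float(row["score"] or 0.0)   # missing value -> 0.0
        cleaned.append(row)
    return cleaned


@dataclass
class CentroidModel:
    features: list   # the attributes that went into the model
    centroids: dict  # class label -> {attribute: mean value}

    def predict(self, row):
        missing = [f for f in self.features if f not in row]
        if missing:
            raise KeyError(f"data lacks model attributes: {missing}")
        # Only the model's own attributes are read; any others are ignored.
        return min(self.centroids, key=lambda c: sum(
            (row[f] - self.centroids[c][f]) ** 2 for f in self.features))


def train(rows, labels, features):
    """Nearest-centroid training on the given (already cleansed) rows."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = {f: sum(r[f] for r in members) / len(members) for f in features}
    return CentroidModel(list(features), centroids)


# Final model: trained on the complete modelling data with the assumed selection.
train_raw = [
    {"age": "61", "score": "2.0"}, {"age": "65", "score": "1.5"},
    {"age": "70", "score": None}, {"age": "78", "score": "8.0"},
    {"age": "81", "score": "9.5"}, {"age": "84", "score": "7.0"},
]
train_labels = [0, 0, 0, 1, 1, 1]
model = train(cleanse(train_raw), train_labels, ["age", "score"])
print("attributes used by the final model:", model.features)

# External data: same cleansing, plus an extra "site" attribute the model never reads.
ext_raw = [
    {"age": "63", "score": "1.0", "site": "B"},
    {"age": "80", "score": "9.0", "site": "B"},
    {"age": "72", "score": None, "site": "C"},
]
ext_labels = [0, 1, 1]
predictions = [model.predict(r) for r in cleanse(ext_raw)]
ext_accuracy = sum(p == y for p, y in zip(predictions, ext_labels)) / len(ext_labels)
print("predictions:", predictions, "external accuracy:", ext_accuracy)
```

The extra `site` attribute is simply ignored, while a missing model attribute raises a `KeyError`, which mirrors the requirement that the external data provide the same attributes as the model. Comparing `ext_accuracy` with the cross-validated estimate is the comparison suggested in the accepted answer.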