Cross validation and AutoModel

tonyboy9 Member Posts: 113 Contributor II
I'm trying to understand how cross-validation works. I've looked at different processes showing cross-validation. This one process is by far the most interesting. I'm imagining what it must be like inside these operators trying to get the parameters correct. I assume this process is the gold standard when it comes to understanding how cross-validation works. 

I compared its classification error with the AutoModel results across eight different models.

As a data analyst working for a boss who wants results yesterday, what is there to gain from risking errors by building such a process when I could just run AutoModel?

I just noticed one of the operators in the process creates a lift chart. In AutoModel each model comes with its own lift chart.

Please advise. Thanks for your time.

Best Answers

    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted
    Hi @tonyboy9, I wrote an article a while back on cross-validation. You can find it here:



    BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi @tonyboy9,

    Data science is more than just building and validating models. You probably know that data preprocessing makes up about 2/3 to 3/4 of the average data science project.

    AutoModel helps you a lot in selecting the appropriate model for a given dataset. Sometimes, though, you select models based on other criteria: how fast they are, whether they can cope with missing or text data, whether you can explain the results well (as with a decision tree), etc.

    AutoModel automates one step of the entire process: selecting the best-performing model. You could still take a few of the best models and optimize them further, or experiment with preprocessing options inside the cross-validation (binning, grouping, feature selection, etc.).

    The depicted process might be overly complex because it standardizes data types and so on, so that the same process works on a large number of data sets. If your data set is already properly preprocessed, you might skip those parts. Likewise, you wouldn't use the Model Simulator while optimizing the model.
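
To make the point about preprocessing inside the cross-validation concrete, here is a minimal sketch in plain Python. It is not the RapidMiner process discussed above and not what AutoModel does internally. The function names, the synthetic data, and the toy nearest-centroid classifier are all assumptions chosen for illustration. The part that matters is that the scaler is fitted on the training rows of each fold only, so the held-out rows never influence the preprocessing.

```python
import random
import statistics


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices once and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_scaler(rows):
    """Learn per-column mean and standard deviation from the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not cause division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(row, scaler):
    """Standardize one row with previously learned means and deviations."""
    means, stds = scaler
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def fit_centroids(rows, labels):
    """Toy model (an assumption for this sketch): one mean vector per class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [statistics.fmean(col) for col in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def cross_validate(rows, labels, k=5, seed=0):
    """Return the classification error measured on each held-out fold."""
    errors = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        # Preprocessing is fitted inside the fold, on training rows only.
        scaler = fit_scaler([rows[i] for i in train_idx])
        train_x = [apply_scaler(rows[i], scaler) for i in train_idx]
        model = fit_centroids(train_x, [labels[i] for i in train_idx])
        wrong = sum(
            predict(model, apply_scaler(rows[i], scaler)) != labels[i]
            for i in test_idx
        )
        errors.append(wrong / len(test_idx))
    return errors


# Synthetic two-class data, made up purely for this example.
rng = random.Random(42)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
rows += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)]
labels = ["a"] * 50 + ["b"] * 50

fold_errors = cross_validate(rows, labels, k=5)
mean_error = statistics.fmean(fold_errors)
print("error per fold:", fold_errors)
print("mean classification error:", mean_error)
```

Swapping `fit_scaler` for binning or feature selection follows the same pattern: whatever is learned from the data gets learned inside the loop, once per fold, and is then applied unchanged to the held-out rows.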
