Cross validation and AutoModel

tonyboy9 · February 2021

I'm trying to understand how cross-validation works. I've looked at different processes showing cross-validation. This one process is by far the most interesting. I'm imagining what it must be like inside these operators trying to get the parameters correct. I assume this process is the gold standard when it comes to understanding how cross-validation works.

Image: https://us.v-cdn.net/6030995/uploads/editor/bi/eyjitn9hy3hl.png

I compared its results with the classification errors that AutoModel reported for eight different models.

As a data analyst working for a boss who wants results yesterday, what do I gain by building such a process myself and risking errors, when I could just run AutoModel?

I just noticed one of the operators in the process creates a lift chart. In AutoModel each model comes with its own lift chart.

Please advise. Thanks for your time.

Image: https://us.v-cdn.net/6030995/uploads/editor/r7/0aytbpcvpxa0.png

sgenzer · February 2021

Hi @tonyboy9, I wrote an article a while back on cross-validation. You can find it here:

https://community.rapidminer.com/discussion/55112/cross-validation-and-its-outputs-in-rm-studio

Scott

BalazsBarany · February 2021

Hi @tonyboy9,

Data science is more than just building and validating models. You probably know that data preprocessing makes up about 2/3 to 3/4 of the average data science project.

AutoModel helps you a lot in selecting the appropriate model for a given dataset. Sometimes you select models based on other criteria: how fast they are, whether they can cope with missing or text data, whether you can explain the results well (as with a decision tree), etc.
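To make the idea of comparing models by cross-validated classification error concrete, here is a minimal sketch in plain Python. It is not how AutoModel or the RapidMiner operators are implemented; the toy data, the two candidate models (a majority-class baseline and a 1-nearest-neighbour classifier) and all function names are assumptions chosen for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_class(train_X, train_y):
    """Baseline model: always predict the most frequent training label."""
    label = max(set(train_y), key=train_y.count)
    return lambda row: label

def nearest_neighbour(train_X, train_y):
    """1-NN model: predict the label of the closest training row."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in train_X]
        return train_y[dists.index(min(dists))]
    return predict

def cv_error(fit, X, y, k=5):
    """Average classification error over k held-out folds."""
    errors = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train only on rows outside the current fold
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        # Score only on the held-out rows
        wrong = sum(model(X[i]) != y[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k

# Hypothetical toy data: two classes centred at (0, 0) and (3, 3)
rng = random.Random(42)
X = [[rng.gauss(c * 3, 1), rng.gauss(c * 3, 1)] for c in (0, 1) for _ in range(50)]
y = [c for c in (0, 1) for _ in range(50)]

candidates = [("majority", majority_class), ("1-NN", nearest_neighbour)]
results = {name: cv_error(fit, X, y) for name, fit in candidates}
for name, err in results.items():
    print(f"{name}: {err:.3f}")
```

Every candidate is scored on the same folds, so the errors are directly comparable. Criteria such as speed or explainability would still have to be weighed separately.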

One step of the entire process, selecting the best performing model, is automatic with AutoModel. But you could still take a few of the best models and optimize them further, or experiment with preprocessing options inside the cross validation (binning, grouping, feature selection, etc.).
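The point about preprocessing inside the cross validation can be sketched as follows: any step that learns something from the data is fitted on the training fold only and then applied unchanged to the held-out fold. This is an illustrative plain-Python sketch, not the RapidMiner process from the screenshot. Standardization stands in for the binning or feature selection mentioned above, and the nearest-centroid classifier, the toy data and all names are assumptions.

```python
import random
from statistics import mean, pstdev

def fit_scaler(rows):
    """Learn per-column mean and standard deviation from the given rows only."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 for constant columns
    return lambda row: [(v - m) / s for v, m, s in zip(row, mus, sds)]

def fit_nearest_centroid(rows, labels):
    """Predict the class whose training centroid is closest."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centroids[lab] = [mean(c) for c in zip(*members)]
    def predict(row):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
        )
    return predict

def cv_error_with_preprocessing(X, y, k=5, seed=0):
    """k-fold classification error with scaling fitted inside each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # Fit the preprocessing on the training fold only ...
        scale = fit_scaler([X[i] for i in train_idx])
        model = fit_nearest_centroid(
            [scale(X[i]) for i in train_idx], [y[i] for i in train_idx]
        )
        # ... then apply the same transformation to the held-out rows
        wrong = sum(model(scale(X[i])) != y[i] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return mean(fold_errors)

# Hypothetical toy data: the two features live on very different scales
rng = random.Random(1)
X = [[rng.gauss(c, 0.5), rng.gauss(1000 * c, 500)] for c in (0, 1) for _ in range(40)]
y = [c for c in (0, 1) for _ in range(40)]

error = cv_error_with_preprocessing(X, y)
print(f"cross-validated error: {error:.3f}")
```

Fitting the scaler on the full dataset before splitting would let information from the held-out rows leak into training, which tends to make the estimated error look better than it really is.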

The depicted process might be overly complex, as it standardizes data types and so on, so that the same process works on a large number of data sets. If your data set is already properly preprocessed, you might skip parts. Or you wouldn't use the Model Simulator while optimizing the model and so on.

Regards,
Balázs
