How can I make Auto Model do cross-validation?

wanglu2014 Member Posts: 19 Contributor II
edited June 2019 in Help

Thanks for your attention. In Auto Model, imported data are split into training and validation sets at a fixed ratio. However, to improve the reliability of the model, can we change the splitting process to cross-validation?

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Certainly. Just open the process for the model you want, change it from split validation to cross-validation, and rerun.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • Fatma Member Posts: 1 Contributor I
    edited March 2019
    Excuse me @Telcontar120, I have the same question and couldn't work out where to change the process from split validation into cross-validation. I'm very sorry, but I'm still a beginner with RapidMiner. I found a Split Data block; is this what you mean? If so, how do I configure the split for, say, leave-one-out, or k = 4 in k-fold cross-validation?
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    No, what I meant is that once you have the process, you can select the split validation operator and replace it with the cross-validation operator instead.  This can be done by right-clicking on the split validation operator, or by manually adding a new cross-validation operator, moving the operators out of the split validation into it, and then deleting the split validation operator.  Both give the same result.  In either case, just make sure you have wired up the internal operators correctly.  See the cross-validation tutorial in the help if you need to double-check.
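    For reference, after the swap the relevant fragment of the process XML (visible via View > Show Panel > XML) might look roughly like the sketch below. This is from my own installation, so the operator class name, compatibility version, and parameter keys may differ slightly in yours; treat it as illustrative, not definitive:

    ```xml
    <!-- Sketch only: a Cross Validation operator as it may appear in the
         process XML. Set number_of_folds (e.g. 4 for k = 4). -->
    <operator activated="true" class="concurrency:cross_validation"
              expanded="true" name="Cross Validation">
      <parameter key="number_of_folds" value="4"/>
      <!-- For leave-one-out instead of k folds, the operator exposes a
           flag along these lines (check your version's parameter panel): -->
      <!-- <parameter key="leave_one_out" value="true"/> -->
      <parameter key="sampling_type" value="stratified sampling"/>
      <!-- The Training subprocess holds the learner; the Testing
           subprocess holds Apply Model and Performance. -->
    </operator>
    ```

    In practice it is usually easier to set these values in the Parameters panel after dropping the operator in, rather than editing the XML directly.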
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Hello,

    Today I was showcasing RapidMiner Auto Model to a new coworker. With the Titanic dataset, if you select a Logistic Regression (that is the case I remember, but there might be others), there is no Split Validation operator at all. Instead, the process uses a Split Data operator at an early stage and applies the Performance operators at the end, which is what I call the manual way to perform validation.

    In that case, it is not as simple as changing the operator. (Others are, though).
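    For context, the manual layout described above starts with a fragment along these lines in the process XML. Again, this is a sketch based on my installation; the exact keys and ratios Auto Model generates may differ:

    ```xml
    <!-- Sketch only: Split Data partitioning the example set into
         training and validation subsets by ratio. -->
    <operator activated="true" class="split_data" expanded="true"
              name="Split Data">
      <enumeration key="partitions">
        <parameter key="ratio" value="0.6"/>
        <parameter key="ratio" value="0.4"/>
      </enumeration>
      <parameter key="sampling_type" value="stratified sampling"/>
    </operator>
    <!-- ...learner trains on the first output; Apply Model and
         Performance consume the second. -->
    ```

    To move such a process to cross-validation, you would replace this Split Data (and the downstream Apply Model / Performance pair) with a single Cross Validation operator and rewire the learner and performance evaluation into its subprocesses.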

    My advice would be to reorder the process and understand how it works, because while Auto Model is a great beginning of a data science project, it is still only a beginning: our project still lacks proper documentation (it cannot yet generate documentation for our domain expertise), removal of boilerplate steps (if our dataset doesn't have text, why handle text?), and adaptation of the process to our use cases.

    I know this is not the kind of happy answer that magically solves our problems, and having to go through the process is especially frustrating for newcomers to RapidMiner, but please keep in mind that RapidMiner has a #noblackboxes philosophy that allows people to go from nought to 60 in a few seconds while still having access to what the process does.

    (@Telcontar120, are you having the same deja vu I had? Wasn't this the topic of our conversation when we met each other?)

    Hope this helps,

    Rodrigo.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited March 2019
    @IngoRM this looks great. For huge datasets, this method in AM works like a gem and also seems reliable based on your tests. I was a bit confused about why you used the holdout sets in the process when you are splitting the data randomly, but now it's clear.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Nice to know that you looked thoroughly into the matter; I trust AM even more now.

    I think that once an adequate model is found in AM, one should train a new model with all the data in a new process, possibly with hyperparameter tuning.

    Regards,
    Sebastian

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    We are actually looking into a new deployment feature for Auto Model as we speak to simplify this process of retraining, etc.  Stay tuned ;-)
  • Yin Member Posts: 17 Contributor II
    @IngoRM I see that your post is from 2019, has this been implemented yet? 
  • yoni1961 Member, University Professor Posts: 14 University Professor
    edited December 2022
    @IngoRM I see that your post is from 2019; has this been implemented yet? Same question. We have a small data set (106 rows) and would like to use cross-validation. Is there anything more we need to know beyond your detailed (and great!) explanation above? (What you call your 2 cents is much more than that!) @Telcontar120