
What type of validation does Auto Model use for small data sets?

Chin Member Posts: 2 Learner I
Hi Everyone,

I am in the middle of using RapidMiner Auto Model for classification in my thesis, but I can't seem to find information about what type of validation Auto Model uses on a data set of 100 items. What type of validation does Auto Model use in my situation, and can someone link me to documentation that I can reference in my write-up?

Also, what is the default split between testing and training data for Auto Model? 

Thanks so much in advance for your help!


Answers

  • varunm1 Moderator, Member Posts: 824   Unicorn
    Hello @Chin,

    Currently, Auto Model splits the dataset into a 60:40 ratio (train:test). This is the same for any dataset and doesn't depend on its size, as far as I understand. Once Auto Model has run, you can open the generated process and see how it works.

    Hope this helps
    Regards,
    Varun
    Rapidminer Wisdom 2020 (User Track): Call for proposals 

    https://www.varunmandalapu.com/
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,665  RM Founder
    Just to add to the great explanation from @varunm1:
    In addition to the 60% : 40% split we do, we then perform a multiple hold-out set validation on the 40% test data, i.e. we split the 40% again into 7 parts, evaluate the model on each part, get rid of the two extremes / outliers, and build the average of the rest (a small sketch of this scheme is included at the end of this thread). This way we keep many of the benefits of a cross-validation without its biggest drawback: a 5x-10x increase in runtime. In my experiments, I did not find significant differences between this approach and cross-validation, and if I ever find the time, I will write a nice blog post about it :-)
    Hope this helps,
    Ingo
    RapidMiner Wisdom 2020
    February 11th and 12th 2020 in Boston, MA, USA

  • varunm1 Moderator, Member Posts: 824   Unicorn
    The only drawback of this method is that it cannot provide predictions for all the samples in our dataset, in case we want to analyze them individually (example: healthcare data for each individual patient). But this is a specific requirement, so in that case we need to manually add a cross-validation to the Auto Model process. There will always be a trade-off :smile:
    Regards,
    Varun
    Rapidminer Wisdom 2020 (User Track): Call for proposals 

    https://www.varunmandalapu.com/
  • Chin Member Posts: 2 Learner I
    Thanks so much for your help, @IngoRM and @varunm1 : ) : ) : ) I really appreciate it.
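For anyone who wants to see the mechanics of the validation scheme described above, the sketch below mimics it in plain Python with scikit-learn: a 60:40 train/test split, then a "multiple hold-out set" estimate on the 40% test portion by splitting it into 7 parts, scoring each part, dropping the two extreme scores, and averaging the remaining five. This is only an illustration of the idea, not RapidMiner's actual implementation; the data set, classifier, and accuracy metric are stand-ins.

```python
# Illustrative sketch (NOT RapidMiner's code) of the validation scheme
# described in this thread: a 60:40 hold-out split plus a 7-part
# multiple hold-out estimate on the test data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # placeholder data set

# 1) 60:40 train/test split (the default ratio mentioned above)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=42, stratify=y)

# Placeholder model; Auto Model would train several model types here.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# 2) Split the 40% test data into 7 parts and score the model on each part
scores = []
for part_idx in np.array_split(np.arange(len(X_test)), 7):
    scores.append(accuracy_score(y_test[part_idx],
                                 model.predict(X_test[part_idx])))

# 3) Drop the two extremes (lowest and highest score) and average the rest
trimmed = sorted(scores)[1:-1]
print(f"Multiple hold-out accuracy estimate: {np.mean(trimmed):.3f}")
```

As noted in the replies, this estimate only ever scores the 40% hold-out, so it cannot give you a prediction for every row; if you need that (e.g. one prediction per patient), you still have to open the process generated by Auto Model and add a cross-validation yourself.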