RapidMiner AutoModel customize validation set

christos_karras Member Posts: 50 Guru
edited October 2019 in Product Ideas
I would like to make a feature request for RapidMiner AutoModel: it should be possible to customize how the data is split into training and validation sets. I often work with time series data, where rows that are close in time are frequently correlated. AutoModel splits the training and validation sets randomly, so information from the validation set leaks into the training set through the correlation between nearby rows. As a result, AutoModel tends to overestimate how well the model will perform on new data of this kind. AutoModel should allow selecting an alternative training-validation splitting method, for example linear sampling. For cases where the built-in methods are not adequate, it should also be possible to pass a custom validation set to AutoModel, which would give the flexibility to split the data with any method before using it in AutoModel.
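To make the leakage concrete, here is a minimal sketch in plain Python (not RapidMiner code, and not how AutoModel is implemented). The helper names `random_split`, `linear_split`, and `adjacent_fraction`, the 500-row dataset size, and the 80/20 ratio are all assumptions made for illustration. It counts how many validation rows have a training row immediately before or after them in time:

```python
import random

def random_split(n_rows, train_ratio, seed=0):
    """Shuffle the row indices, then cut (a random split)."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(n_rows * train_ratio)
    return sorted(idx[:cut]), sorted(idx[cut:])

def linear_split(n_rows, train_ratio):
    """Keep the row order: the first rows train, the last rows validate."""
    cut = round(n_rows * train_ratio)
    return list(range(cut)), list(range(cut, n_rows))

def adjacent_fraction(train_idx, valid_idx):
    """Share of validation rows whose direct time neighbor is a training row."""
    train = set(train_idx)
    hits = sum(1 for i in valid_idx if (i - 1) in train or (i + 1) in train)
    return hits / len(valid_idx)

# Hypothetical dataset: 500 time-ordered rows, 80% used for training.
n_rows, train_ratio = 500, 0.8

rand_train, rand_valid = random_split(n_rows, train_ratio)
lin_train, lin_valid = linear_split(n_rows, train_ratio)

rand_frac = adjacent_fraction(rand_train, rand_valid)
lin_frac = adjacent_fraction(lin_train, lin_valid)

print(f"random split: {rand_frac:.0%} of validation rows touch a training row")
print(f"linear split: {lin_frac:.0%} of validation rows touch a training row")
```

With a random split, most validation rows sit right next to a training row, which is where correlated time series data leaks. With a linear split, only the first validation row touches the training set.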
1 vote

Open for Voting

PROD-898

Comments

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @christos_karras

    Can you provide more details on what kind of feature you are looking for in auto model?
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • christos_karras Member Posts: 50 Guru
    Hi @varunm1, see my edited description (the original comment was saved before I finished writing the details). Thanks
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited October 2019
    Hello @christos_karras

    Thanks, I looked at it. As you already mentioned, the current Auto Model is not intended for time series data, so I won't go into the reasons in depth. You can still use the Auto Model process for time series if you add appropriate operators, such as windowing, before the data is fed to the ML model. This needs manual customization: after Auto Model finishes running, open the generated process and carefully make your changes there. In many instances I change the split to cross-validation, and @Noel uses the Auto Model process for time series forecasting with windowing operators. It is a bit challenging at first, as there are many connections in the Auto Model process that need to be taken care of when customizing manually, but the operator arrangement in the 9.4 Auto Model is far better than in earlier versions, thanks to @IngoRM for that.

    @IngoRM may be able to tell you whether there are any plans for time series. I expect he has some.

    Just my 2c.
    Regards,
    Varun
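For readers unfamiliar with the windowing idea mentioned above, here is a rough sketch in plain Python. It is a generic illustration, not the RapidMiner windowing operator or the process Auto Model generates; `make_windows`, `forward_folds`, the window size, and the toy series are hypothetical. Each row holds the last few values as features and the next value as the label, and the folds always validate on rows that come after the training rows:

```python
def make_windows(series, window_size, horizon=1):
    """Turn a series into (features, label) rows using a sliding window."""
    rows = []
    last_start = len(series) - window_size - horizon + 1
    for start in range(last_start):
        features = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        rows.append((features, label))
    return rows

def forward_folds(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs where validation follows training.

    Leftover rows at the end are simply not used in this simple version.
    """
    fold_size = n_rows // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        valid_end = min(train_end + fold_size, n_rows)
        yield list(range(train_end)), list(range(train_end, valid_end))

# Toy series, purely illustrative.
series = [10, 11, 13, 12, 15, 16, 18, 17, 20, 21]
rows = make_windows(series, window_size=3)
folds = list(forward_folds(len(rows), n_folds=2))

for features, label in rows[:2]:
    print(features, "->", label)
for train_idx, valid_idx in folds:
    print("train rows", train_idx, "validate rows", valid_idx)
```

Note that neighboring windows share most of their values, so a shuffled split over windowed rows would still mix overlapping rows into both sets. That is why the folds here keep the time order.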

  • christos_karras Member Posts: 50 Guru
    Yes, as you explained, I have to resort to customizing the process generated by AutoModel, which takes longer than simply using AutoModel. While more in-depth support for time series built directly into AutoModel would be great, in the short term I think adding the ability to customize the validation set would be an easy way to make it more useful for time series data, or for any other case where a random split is not adequate.

    I understand that AutoModel is not intended to completely replace a customized process, but rather it is just a way to get started faster. However, if AutoModel overestimates the performance of some model types, it may lead to taking the wrong direction for further customizations. For example, with an incorrect training-validation split, AutoModel could determine that a random forest is the best option, but then when I try with a customized process I could find that a random forest is not so good and it would have been better to use a linear model.
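Here is a small sketch of how the split alone can flip which model looks best. It is plain Python on a synthetic series, not AutoModel output. The "memorizer" (copy the training row nearest in time) is only a crude stand-in for a model that cannot extrapolate beyond what it has seen; it is not a random forest. The series shape, the 400/100 split, and all function names are assumptions for illustration:

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mae_linear_model(series, train_idx, valid_idx):
    """Validation MAE of a straight line fitted on the training rows."""
    a, b = fit_line(train_idx, [series[i] for i in train_idx])
    return sum(abs(series[i] - (a + b * i)) for i in valid_idx) / len(valid_idx)

def mae_memorizer(series, train_idx, valid_idx):
    """Validation MAE when each row is predicted by its nearest training row."""
    total = 0.0
    for i in valid_idx:
        nearest = min(train_idx, key=lambda j: abs(j - i))
        total += abs(series[i] - series[nearest])
    return total / len(valid_idx)

# Synthetic series: upward trend plus a smooth cycle (hypothetical data).
n, cut = 500, 400
series = [0.1 * t + 2.0 * math.sin(t / 5.0) for t in range(n)]

shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
splits = {
    "random": (sorted(shuffled[:cut]), sorted(shuffled[cut:])),
    "linear sampling": (list(range(cut)), list(range(cut, n))),
}

results = {}
for name, (train_idx, valid_idx) in splits.items():
    results[name] = {
        "memorizer": mae_memorizer(series, train_idx, valid_idx),
        "linear model": mae_linear_model(series, train_idx, valid_idx),
    }
    best = min(results[name], key=results[name].get)
    print(f"{name} split -> best: {best}, MAE: {results[name]}")
```

On this toy data the memorizer wins under the random split because a near-identical neighbor is almost always in the training set, while under linear sampling it has to predict rows beyond anything it has seen and the linear model comes out ahead.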
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Stay tuned on Auto Model for time series ;)
    And on the second point of quickly trying out the model on another independent data set: the new Model Ops (Deployments) view in RM 9.4 makes this really simple now. Check out the videos below:
    I recommend watching all three, but the second one covers the "Scoring" functionality, which is what you would need...
    Hope this helps,
    Ingo
  • Noel Member Posts: 82 Maven
    @IngoRM Can't wait for Auto Model for time series!!
  • Noel Member Posts: 82 Maven
    @christos_karras (& @varunm1)-

    One needs to be quite careful when editing the exported Auto Model process to "convert" it for use with time series. Personally, I did not appreciate all the interconnections that exist therein. (And I think there was a wholesale change in 9.4.)
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    @Noel

    @IngoRM will always keep us in the hot seat with new releases :smiley:
    Regards,
    Varun
