

Auto Model data set split using choice (e.g. linear sampling)

behnish Member Posts: 4 Contributor I
Hello, I wonder whether it is possible to select a split method, e.g. linear sampling, for generating the training and test data sets within the "Auto Model" module.
Somehow the predicted values are far too good, so for my data set it would be better to split using linear sampling.
Of course it would be possible to do this after Auto Model using the stored process, but for convenience it might be better to choose it up front.
Thank you.

Best Answer

  • ceaperez Member Posts: 116 Unicorn
    Solution Accepted
    Hi @behnish,
    Auto Model performs many operations automatically, following standard good practices for ML. Each model created with these practices has a lot of parameters, which would be unmanageable from a single panel.
    The best solution is to run Auto Model, then go into the model and adapt it.



  • behnish Member Posts: 4 Contributor I
    Hello @ceaperez, thank you for the prompt response. Indeed, Auto Model gives a great overview of models and feature sets. So that is the way to do it: adapt it afterwards.
    Best. T
  • behnish Member Posts: 4 Contributor I

    Hello, it looks like Auto Model is designed to extract interleaved training and test sets at a ratio of 0.6 to 0.4 over the whole example set range. With that split, the model gives a very good regression on my dataset.

    Building the model on training and test sets created with linear sampling (0.9 to 0.1) resulted in roughly four times worse performance. This indicates that the model needs further steps to generalize better, and it shows how important the preparation of the training set is.

    Thus, it would still be nice to have a choice of data set splitting method in Auto Model.

    In addition, the problem remains of how to further optimize the model so that it generalizes better. One way could be to run the model with a variety of data set splits to optimize the model parameters, or to add random noise to the data, as in image recognition approaches.
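
For readers who want to see why the two split strategies discussed in this thread can disagree so strongly, here is a minimal sketch in plain Python. It does not reproduce Auto Model or any RapidMiner operator; the synthetic data, the 1-nearest-neighbour model, and all function names (`linear_split`, `shuffled_split`, `predict_1nn`, `rmse`) are assumptions chosen only for illustration. The idea is that when rows are ordered and the target drifts along that order, an interleaved split lets every test row find a close training neighbour, while a linear split forces the model to extrapolate.

```python
import random


def linear_split(rows, train_ratio):
    """Keep the original row order: the first block trains, the last block tests."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def shuffled_split(rows, train_ratio, seed=0):
    """Shuffle a copy first, so train and test rows are interleaved over the whole range."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def predict_1nn(train, x):
    """1-nearest-neighbour regression on a single feature (illustrative model only)."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def rmse(train, test):
    """Root mean squared error of the 1-NN predictions on the test rows."""
    squared = [(predict_1nn(train, x) - y) ** 2 for x, y in test]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical ordered data set: the target grows with the row index, plus noise.
rng = random.Random(42)
data = [(float(i), 0.05 * i * i + rng.gauss(0.0, 1.0)) for i in range(200)]

# Interleaved 0.6 / 0.4 split versus linear 0.9 / 0.1 split, as in the thread.
train_a, test_a = shuffled_split(data, 0.6)
train_b, test_b = linear_split(data, 0.9)
rmse_interleaved = rmse(train_a, test_a)
rmse_linear = rmse(train_b, test_b)
print(f"interleaved 0.6/0.4 RMSE: {rmse_interleaved:.2f}")
print(f"linear      0.9/0.1 RMSE: {rmse_linear:.2f}")

# Repeating the interleaved split with several seeds shows how stable the estimate is.
seed_scores = [rmse(*shuffled_split(data, 0.6, seed=s)) for s in range(5)]
print("interleaved RMSE over 5 seeds:", [round(s, 2) for s in seed_scores])
```

On this synthetic data the linear split reports a much larger error than the interleaved one, because the last rows lie outside the range seen in training. Which estimate is the more honest one depends on how the model will be used: if future rows continue the ordering (for example, a time series), the linear split is closer to the real task.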
