Achieved decent accuracy with random dependent variable values

tkaiser Member Posts: 8 Contributor I
edited August 2019 in Help

I had a gradient boosted tree (GBT) classification model, generated with Auto Model, that produced a 70% f-measure for a given dependent variable value. I then replaced the dependent variable with random numbers and ran a GBT model again on exactly the same example data, and the f-measure was 65%. That is much closer than I had expected, and I am wondering how that can be the case. Thank you.



  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist



    What is the f-measure if you run a Default Model operator?
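To illustrate why a baseline comparison matters, here is a minimal sketch in plain Python (not RapidMiner's Default Model operator itself). The 70/30 label distribution is a hypothetical assumption, not the poster's data, and the function names are made up for this example. It shows that a trivial model that always predicts the majority class can already reach a high f-measure for that class.

```python
from collections import Counter

def f_measure(y_true, y_pred, positive):
    """F1 score for the class `positive`; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def majority_baseline(y_train, n_predictions):
    """Predict the most frequent training label for every row (a 'default model')."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n_predictions

# Hypothetical labels (assumption for illustration): 70% "yes", 30% "no".
labels = ["yes"] * 70 + ["no"] * 30
preds = majority_baseline(labels, len(labels))

baseline_f_yes = f_measure(labels, preds, "yes")  # majority class
baseline_f_no = f_measure(labels, preds, "no")    # minority class
print(f"f-measure for 'yes': {baseline_f_yes:.3f}")
print(f"f-measure for 'no':  {baseline_f_no:.3f}")
```

If the f-measure of the real model is not clearly above such a baseline for the same class, a similar score on random labels is less surprising.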



    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • tkaiser Member Posts: 8 Contributor I

    Sorry, but I am not sure where that would go in the auto model process. 


    And I have now uncovered a second, perhaps more pressing, problem. Auto Model ran a 3-fold cross-validation, which should validate the future accuracy of the predictive model and guarantee that there is no overlap between training and test sets. F-measure was about 70%, accuracy 90%. I then did a manual hold-out: I gave 90% of my original data set to Auto Model (GBT again), so that I could test the resulting model on the remaining 10%. The performance Auto Model reported on the 90% was a little lower than, but close to, the original performance measures. But when I applied the model to the 10% hold-out test set, it performed terribly. I would very much appreciate some guidance, as I have now lost confidence in my model's ability to predict future data.
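The procedure described above (set aside 10% first, then cross-validate on the remaining 90%) can be sketched as follows. This is plain Python working on row indices only, under the assumption of a hypothetical 1,000-row data set; `holdout_split` and `k_folds` are illustrative names, not RapidMiner operators, and Auto Model's internal splitting may differ.

```python
import random

def holdout_split(n_rows, holdout_fraction=0.1, seed=0):
    """Shuffle row indices and set aside a hold-out set before any modeling."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(n_rows * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]

def k_folds(indices, k=3):
    """Yield (train, test) index lists; each row is in exactly one test fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical data set size (assumption for illustration).
modeling_rows, holdout_rows = holdout_split(1000)
splits = list(k_folds(modeling_rows, k=3))

for train, test in splits:
    # Within each fold, training and test rows never overlap,
    # and hold-out rows never appear in either.
    assert not set(train) & set(test)
    assert not set(holdout_rows) & (set(train) | set(test))
```

One design point worth checking in a setup like this is that every preprocessing step fitted on the 90% is applied unchanged to the 10%, since a mismatch there is one possible cause of a large gap between cross-validated and hold-out performance.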
