"AdaBoost performance on new data (test dataset) MUCH worse than without AdaBoost"

miaque, Member, Posts: 4, Contributor I
edited June 2019 in Help
Hello,
I have the following problem:
I am working with a dataset for a digit-recognition classification problem.

The dataset has 64 regular attributes plus one class attribute. It contains nearly 5000 examples and is divided into a training set (30 digit writers) and a test set (14 other, new writers).

For my study project I am required to use the meta-learning operators. The problem I ran into: without the AdaBoost operator, accuracy is approx. 85% on the training set (X-Validation) and approx. 80% on the test set (new data). When I add AdaBoost, the X-Validation results on the training set improve to approx. 90%, but the results on the new data get MUCH WORSE: only 20% accuracy!
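
To make the comparison concrete, here is a minimal Python sketch that computes the drop from cross-validation accuracy to test accuracy for both setups. The function name `generalization_gap` is hypothetical and chosen only for illustration. The numbers are the approximate figures quoted above, not new measurements.

```python
def generalization_gap(cv_accuracy_pct, test_accuracy_pct):
    """Drop from cross-validation to test accuracy, in percentage points."""
    return cv_accuracy_pct - test_accuracy_pct


# Approximate figures from the post, in percent: (X-Validation, test set).
results = {
    "without AdaBoost": (85, 80),
    "with AdaBoost": (90, 20),
}

# A small gap means the model carries over to the new writers;
# a large gap means the cross-validation estimate is not holding up.
gaps = {
    name: generalization_gap(cv, test)
    for name, (cv, test) in results.items()
}

for name, gap in gaps.items():
    print(f"{name}: gap = {gap} percentage points")
```

The gap is about 5 percentage points without AdaBoost and about 70 with it, so the cross-validation estimate stops being a useful predictor of performance on the new writers once boosting is added.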

Does anyone know what the issue could be here?

Thank you!

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,316  RM Data Scientist
    Seems like you are overtraining, right?
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany