The Altair Community is migrating to a new platform to provide a better experience for you. The RapidMiner Community will merge with the Altair Community at the same time. In preparation for the migration, both communities are on read-only mode from July 15th - July 24th, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here.

"AdaBoost performance on new data (test dataset) MUCH worse than without AdaBoost"

miaquemiaque Member Posts: 4 Contributor I
edited June 2019 in Help
I have the following problem:
I am working on dataset of data suitable for modeling the classification problem of digits recognition.

The database consists of 64 normal attributes + one for the class. It consists of nearly 5000 examples and is divided for training set (30 digit-writers) and test set (another, new 14 writers).

For my study project I am obliged to use the meta-learning operators. I faced the problem, that without use of AdaBoost operator, the results are aprox. 85% for the  training set (X-Validation) and aprox. 80% for testing set (new data). When I try to implement AdaBoost, the results from X-Validation of training set are getting better - aprox. 90%, and MUCH WORSE for the new data - only 20% of accuracy!

Can anyone know what can be the issue here?

Thank you!


  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    seems like you overtrain, right?
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
Sign In or Register to comment.