Bad performance when loading and applying an SVM model

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help

hi,

I saved a SVM model as 

I got an accuracy of about 80% on testing and about 87% on training.

When I now load the saved model and apply it to some test data, I get only about 57%, and the contingency table shows that something seems to be wrong. Here is the process design for loading and applying the model:

Unbenannt.JPG

 

and here is the performance:

 

Unbenannt2.JPG

OK, my previous post was cut off again, so here it is again:

When I run the same process with a k-NN model instead, I also get about 80% on new test data. That's why I'm wondering whether this is an issue with the SVM operator, something I did wrong in my process, or a problem with saving the model.

 

The only log message that appeared was:

Aug 5, 2016 9:18:20 AM WARNING: Kernel Model: The value types between training and application differ for attribute 'ABC', training: real, application: integer

 

Therefore, I imported the data again and configured 'ABC' as real. The message disappeared, but the result was unfortunately the same.
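
The warning above points at a value-type difference between the training data and the re-imported data. A minimal sketch of how such differences could be listed for every attribute at once, in plain Python rather than RapidMiner: the helper names `column_types` and `type_mismatches` and the sample rows are hypothetical, and only the attribute name 'ABC' comes from the log message.

```python
def column_types(rows):
    """Map each column name to the set of Python type names seen in it."""
    types = {}
    for row in rows:
        for name, value in row.items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types


def type_mismatches(train_rows, apply_rows):
    """Return {column: (train_types, apply_types)} for columns whose types differ."""
    train_types = column_types(train_rows)
    apply_types = column_types(apply_rows)
    mismatches = {}
    for name in sorted(set(train_types) | set(apply_types)):
        t = train_types.get(name, set())
        a = apply_types.get(name, set())
        if t != a:
            mismatches[name] = (sorted(t), sorted(a))
    return mismatches


# Hypothetical rows: 'ABC' is real in training but integer after re-import.
train = [{"ABC": 1.0, "label": "1"}, {"ABC": 2.5, "label": "3"}]
test = [{"ABC": 1, "label": "1"}, {"ABC": 3, "label": "4"}]

result = type_mismatches(train, test)
print(result)
```

A check like this only reports type differences; it does not say whether they are the cause of the accuracy drop.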

 

Answers

  • Fred12 Member Posts: 344 Unicorn

    In addition:

    I get the same performance (57%) when I split my original Excel file into training and test data (split Data -> write to Excel...) and then use the separate test Excel file to test my model. That also gives a surprisingly bad performance of 57%.

     

    However, if I apply the Split Data operator directly in the process design, the split test data gives me 80% performance when applied straight to the previously created model. Nothing seems to have changed between the two sets of test data, yet the performance is quite different depending on whether I use the Split Data operator directly or import the previously split test data.

     

    I cannot say the same for my saved k-NN model: there, the separately imported test data also performs at around 80%.

    Is this a bug in the SVM operator?

  • bhupendra_patil Employee, Member Posts: 168 RM Data Scientist

    Hi @Fred12

     

    That 57% is not a great number in itself, all the more so because your model seems to be predicting everything as 1.0.

    Also, the random seed may change between consecutive runs, which leads to different results.

    You can specify local seeds to ensure repeatability.

     

    Also, Split Data in your case is basically sending only a part of the data to Apply Model, so you are testing on just that part.

    On that number of records it happens to get 57% correct.

     

    What kind of attributes do you have, and what are you trying to predict?

    You may need to do additional preprocessing, for example checking whether you need to normalize, weight, or do other feature processing first. Right now the model is not usable, because everything is predicted as 1.0.

     

    Try some of the new algorithms that came with version 7.2 this week; they may yield better performance.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    My first question is: how was the SVM model trained? Was it trained using X-Validation? What were the performance measures there?

     

    As @bhupendra_patil pointed out, the model is selecting one class over the others. What was the data like before you trained the model? Was there a large majority class that overwhelmed a smaller minority class? Sometimes in situations like this, unbalanced data is the cause.

  • Fred12 Member Posts: 344 Unicorn

    Yeah, the class distributions (1, 3, 4) were about 50/30/20%. But in X-Validation the model had about 80% performance, and I used a stratified data split. So I'm wondering: if the model recognises about 80% correctly in X-Validation training/testing, why should it be different when I apply the same model to other stratified test data? Does imbalanced data have that much impact on the predictive accuracy of the model if it seemed to work just fine in the X-Validation beforehand?
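
The observation in this thread that the model predicts everything as 1.0 can be illustrated with a small sketch in plain Python (not RapidMiner). When every example is assigned to one class, accuracy equals that class's share of the test set, so the figure says nothing about whether the model has learned anything. The class counts below are made up for illustration and are not the poster's data; `accuracy` and `confusion_counts` are hypothetical helper names.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs, i.e. a flat contingency table."""
    return Counter(zip(actual, predicted))


# Hypothetical test set: 57 examples of class 1, 25 of class 3, 18 of class 4.
actual = [1] * 57 + [3] * 25 + [4] * 18
# A degenerate model that assigns every example to class 1.
predicted = [1] * len(actual)

acc = accuracy(actual, predicted)
counts = confusion_counts(actual, predicted)
majority_share = Counter(actual).most_common(1)[0][1] / len(actual)

print(acc, majority_share)
print(sorted(counts.items()))
```

Comparing a model's accuracy against this majority-class share, and looking at the per-class counts rather than the single accuracy number, is one way to spot a model that has collapsed to a single prediction.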