How does learning a model take place?

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help


Sorry if this is a very stupid question, but I have a basic question about how models are learned. How does learning take place, i.e. what is the algorithm that learns from the dataset? I know it is different for each of the algorithms out there.


In instance-based methods like k-NN it's quite easy, and I think I understand it: a new instance is compared with the instances already present in the instance space, and the majority class among its nearest neighbours wins the vote. "Learning" new instances basically means just "remembering" them, so they can be used together with the other instances when further new instances come in...
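To make the "learning = remembering" idea concrete, here is a minimal k-NN sketch in plain Python. It is illustrative only and is not how RapidMiner's k-NN operator is implemented; the class name `KNN`, the Euclidean distance and the toy points are assumptions for the example.

```python
import math
from collections import Counter


class KNN:
    """Minimal k-nearest-neighbours classifier (illustrative, not RapidMiner's k-NN operator)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []  # remembered (features, label) pairs

    def fit(self, X, y):
        # "Training" is nothing more than remembering the labelled examples.
        self.points.extend(zip(X, y))
        return self

    def predict_one(self, x):
        # Rank all remembered examples by Euclidean distance to the new one...
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], x))[: self.k]
        # ...and let the k closest vote; the majority label wins.
        return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["a", "a", "a", "b", "b", "b"]
model = KNN(k=3).fit(X, y)
print(model.predict_one((1.5, 1.5)))  # a
print(model.predict_one((8.5, 8.5)))  # b
```

All the work happens at prediction time; `fit` only stores data, which is why k-NN is called a "lazy" learner.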


But what about Naive Bayes, SVM or decision trees?

In X-Validation, each training part is used to learn a model from the instances in that part, and the model is then tested on the test part. But what if the test part has very bad performance, like 10% accuracy? How is that part then "applied", i.e. "incorporated" into the trained model to reach better performance on the test? As I understand it, once the model has been trained it is finished and no further changes are made to it; in particular, there is no subsequent training that incorporates the test part. Or is there? Furthermore, that would skew the test performance, since the model would already have seen the test part during training. Or am I wrong?


My second question is: where can I see which algorithms use attribute weighting? Because I have 3 labels and the classes are imbalanced (60%, 30% and 10% prevalence), I tried weighting with the "Generate Weight (Stratification)" operator and then used the new weighted example set for LIBSVM and k-NN modeling, but RapidMiner said they will make no use of the weights. Why is that? I thought SVM could profit from balanced data?
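The idea behind stratification weights is to up-weight examples of rare classes instead of throwing away examples of frequent ones. The sketch below uses one common scheme, w = N / (K * n_c), so that every class ends up with the same total weight. That RapidMiner's "Generate Weight (Stratification)" uses exactly this formula is an assumption; the function name `stratification_weights` is hypothetical, and the 600/300/100 split simply mirrors the 60/30/10 prevalences with 100 examples in the rarest class.

```python
from collections import Counter


def stratification_weights(labels):
    """Give every class the same total weight (assumed scheme, see text).

    Each example of class c gets w = N / (K * n_c), where N is the number of
    examples, K the number of classes and n_c the size of class c, so the
    weights still sum to N but each class contributes N / K in total.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[label]) for label in labels]


# Hypothetical 60% / 30% / 10% split with 100 examples in the rarest class.
labels = ["A"] * 600 + ["B"] * 300 + ["C"] * 100
weights = stratification_weights(labels)

per_class = Counter()
for label, w in zip(labels, weights):
    per_class[label] += w
print({c: round(total, 2) for c, total in per_class.items()})
# {'A': 333.33, 'B': 333.33, 'C': 333.33}
```

No example is discarded, so the dataset keeps its full size; the weights only have an effect if the learner that receives them actually honours row weights.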


I have weighting methods for the test round, but I haven't found any good weighting methods for the training round. Balanced sampling is not a good solution, as it would leave me with only a small dataset: my least frequent label has only 100 instances. Any ideas how to do this?

Best Answer

    bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist
    Solution Accepted

    Hi @Fred12,


    I will try to answer a few of your questions.

    As far as the algorithms go, these are standard algorithms; the math and exactly how each one learns is something you can find in non-RapidMiner sources.

    @IngoRM does a great job of explaining a few of them in his 5-minute series.
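As one example of what "learning" means for a non-instance-based method: for Naive Bayes it is essentially counting. Training records how often each class occurs and how often each attribute value occurs within each class; prediction combines those frequencies, assuming the attributes are independent given the class. The sketch below handles categorical attributes only, adds Laplace smoothing, and uses hypothetical function names and toy data; it is not RapidMiner's implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """'Learning' is counting: class frequencies and per-class value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> how often that value occurs
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    # Number of distinct values per attribute, used for Laplace smoothing.
    n_values = [len({row[i] for row in rows}) for i in range(len(rows[0]))]
    return class_counts, value_counts, n_values


def predict_naive_bayes(model, row, alpha=1.0):
    """Pick the class maximising log P(class) + sum_i log P(value_i | class)."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Laplace smoothing keeps unseen values from zeroing out the product.
            score += math.log((count + alpha) / (n + alpha * n_values[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("rainy", "mild")))  # yes
print(predict_naive_bayes(model, ("sunny", "hot")))   # no
```

Once the counts are stored the model is fixed; new test data is only scored against them, never added.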




    As far as cross-validation goes, no improvement happens.

    It is just "honest" training and testing, where in each iteration the training data and the test data have no overlap.

    If one fold is only 10% accurate, it will drag down the overall performance, since the performance reported at the end is the average over all folds.
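A small sketch of the loop described above: each fold trains a fresh model on the other folds only, scores it on its own test fold, and nothing from the test fold flows back into any model; the reported figure is just the average. For simplicity the folds here are contiguous and unshuffled (RapidMiner's operator can shuffle and stratify), and the majority-class "learner" and toy data are hypothetical.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k disjoint, contiguous test folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, train, predict, k=10):
    """Return (mean accuracy, per-fold accuracies)."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        # A fresh model per fold, built ONLY from the training rows.
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        # The test fold is only scored; nothing is fed back into the model.
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies), accuracies


# Hypothetical toy learner: always predict the most frequent training label.
def train_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(10))
y = ["a"] * 8 + ["b"] * 2
mean, per_fold = cross_validate(X, y, train_majority, predict_majority, k=5)
print(per_fold)  # [1.0, 1.0, 1.0, 1.0, 0.0]
print(mean)      # 0.8
```

The last fold scores 0% and simply pulls the average down from 1.0 to 0.8; the model is not retrained because of it.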

    Again, here is Ingo's video explaining it really well.



    Also, here is bonus beta access for you. I believe you are talking about row weights, not attribute weights.




    Select your column type, target type and the advanced option (uses row weights), and see which algorithms you can use.
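What "uses row weights" means for a learner can be illustrated with a weighted vote: a weight-aware learner scales each example's contribution by its weight, while a learner that ignores weights behaves as if every weight were 1, so adding a weight column changes nothing for it. The function `weighted_vote` below is a hypothetical illustration (for instance, of how a k-NN vote could honour weights), not a RapidMiner API; the weight values assume a 600/300/100 class split with w = 1000 / (3 * class size).

```python
from collections import defaultdict


def weighted_vote(labels, weights=None):
    """Return the label with the largest summed weight.

    With weights=None every example counts as 1, i.e. a plain majority vote.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    totals = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    return max(totals, key=totals.get)


# Labels of, say, the 3 nearest neighbours of a new example.
neighbours = ["A", "A", "C"]
# Hypothetical stratification weights for a 600/300/100 split: w = 1000 / (3 * class size).
weights = [1000 / 1800, 1000 / 1800, 1000 / 300]

print(weighted_vote(neighbours))           # A  (learner ignores weights)
print(weighted_vote(neighbours, weights))  # C  (learner honours weights)
```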


    Edit: Typo



    Fred12 Member Posts: 344 Unicorn

    OK, I just realized that was a pretty stupid question: training is based on the training data, of course, with the assumption that the test data behaves the same as the training data lol ;)


    juju Member Posts: 39 Guru

    Improving performance happens when you run X-Validation separately for many different parameter settings and choose the setting that yields the best performance.

    You can use the 'Optimize Parameters' operator for this.
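The tuning loop described above, sketched in plain Python: run one complete cross-validation per candidate parameter value and keep the value with the best score. This is roughly the idea an "Optimize Parameters" operator automates, but the functions here are hypothetical: a 1-D k-NN whose neighbour count `k` is the tuned parameter, interleaved folds, and toy data with one deliberately noisy label.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """1-D k-NN: majority label among the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(xs, ys, k, folds=5):
    """Accuracy of k-NN with neighbour count k, pooled over an interleaved cross-validation."""
    n, correct = len(xs), 0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        correct += sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test)
    return correct / n


def grid_search(xs, ys, candidate_ks):
    """Run one full cross-validation per candidate value and keep the best."""
    scores = {k: cv_accuracy(xs, ys, k) for k in candidate_ks}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: class "a" below 10, "b" from 10 on, one noisy label at x=4.
xs = list(range(20))
ys = ["a"] * 10 + ["b"] * 10
ys[4] = "b"
best, scores = grid_search(xs, ys, [1, 3])
print(scores)  # {1: 0.85, 3: 0.9}
print(best)    # 3
```

With k=1 the single noisy point also misleads its neighbour, while k=3 outvotes it. Note that the best score found this way is itself slightly optimistic, because it was used to pick the parameter; an honest estimate of the tuned model needs a further validation around the whole search.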



