
"SVM:

spitfire_ch Member Posts: 38  Guru
edited May 2019 in Help
hi,

sorry for bombarding you with posts currently. I guess it's a symptom of the learning phase.

Anyway, when fiddling around with support vector machines, I often run into the problem that it keeps iterating forever (or at least for many hours) without getting any further. This happens only with certain kernel types; others complete the analysis within minutes. I think this is because the slow ones don't manage to converge. What is the right thing to do in such a situation? I can think of two adjustments:
  • Using a higher convergence epsilon
  • Decreasing max iterations
Is that correct? Or would I have to adjust epsilon rather than convergence epsilon? Are there any recommended values I should use if I want to compare different kernel types with an Optimize Parameters operator? So far I run into the problem that most kernel types do fine, but one will eventually be caught in a loop. I can only hit stop then, and all the performance information gathered for the other kernel types is lost.
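For anyone wanting to see the effect of these two knobs outside the RapidMiner GUI: here is a rough scikit-learn sketch (my own analogue, not RapidMiner's implementation). In scikit-learn's `SVC`, `max_iter` plays the role of max iterations and `tol` roughly corresponds to the convergence epsilon, so a non-converging kernel stops after the cap instead of running forever:

```python
# Hedged scikit-learn analogue (not RapidMiner): cap the solver so a
# non-converging kernel stops instead of iterating for hours.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic dataset just for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# max_iter is a hard cap: training stops even if the solver has not
# converged; tol is the stopping tolerance (the "convergence epsilon").
clf = SVC(kernel="rbf", max_iter=1000, tol=1e-3)
clf.fit(X, y)
print(clf.score(X, y))
```

If the cap kicks in before convergence, scikit-learn emits a ConvergenceWarning but still returns a usable (if possibly poor) model, which mirrors the trade-off discussed here.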

Maybe this would also be an idea to implement: a "stop subprocess" button in addition to the "stop everything" button. It would exit only the current operation (e.g. the validation of the current kernel type) and move on. The best parameters from the operations that were not canceled would then be chosen.

Thanks for your advice
Hanspeter

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751  RM Founder
    Hi,

    sorry for bombarding you with posts currently. I guess it's a symptom of the learning phase.
    Don't worry! Your questions certainly hit more advanced topics, and there certainly is a need for discussion at this level  :)

    This happens only with certain kernel types; others complete the analysis within minutes. I think this is because the slow ones don't manage to converge.
    Yip. This is certainly the main reason. There are at least two other aspects: if you have a large number of features, certain kernel functions indeed need more time for their calculations. But the difference to the linear kernel should not be too high, especially not if the kernel cache can be used a lot. This is less likely for large numbers of examples or if the working set of the SVM changes often. And the latter can again be a sign of missing convergence.

    Your two adjustments are correct. It can also sometimes help to increase the kernel cache (if available). However, I would recommend starting with a lower number of iterations first. Start with a really low number (like 500 or 1000) and check if anything was learned at all. Then increase the number once or twice to check whether the results change much. If they don't, the SVM is probably not able to learn at all with this kernel function / these settings.
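Ingo's "start low, then raise the cap" advice can be sketched in code. Again a scikit-learn analogue of my own (RapidMiner itself does this via the Optimize Parameters operator in the GUI): fit with increasing iteration caps and watch whether the cross-validated score still moves.

```python
# Sketch of the "start with a low iteration cap, raise it only if results
# still change" strategy. scikit-learn analogue, not RapidMiner itself.
import warnings

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

scores = {}
for max_iter in (500, 1000, 2000):
    with warnings.catch_warnings():
        # Non-converged fits raise ConvergenceWarning; that is expected here.
        warnings.simplefilter("ignore")
        scores[max_iter] = cross_val_score(
            SVC(kernel="rbf", max_iter=max_iter), X, y, cv=3
        ).mean()
    print(max_iter, round(scores[max_iter], 3))
```

If the score barely changes between 1000 and 2000 iterations, spending hours on a much higher cap is unlikely to help, which is exactly the diagnostic Ingo describes.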

    I can only hit stop then, and all the performance information gathered for the other kernel types is lost.
    You could log the information with a "Log" operator in persistence mode. That way, you keep the information you have gathered so far.
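The idea behind the Log operator's persistence mode is that each result is written to disk as soon as it is available, so an aborted run loses nothing already logged. Outside RapidMiner, the same pattern is a few lines of Python (file name and function are mine, purely illustrative):

```python
# Analogue of the Log operator in persistence mode: append each result
# immediately, so stopping the process keeps everything logged so far.
import csv

def log_result(path, kernel, score):
    # Opening in append mode and closing right away persists the row
    # even if the overall run is killed later.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([kernel, score])

log_result("svm_results.csv", "rbf", 0.91)
log_result("svm_results.csv", "linear", 0.88)
```

Whichever kernel type later hangs, the rows for the kernels already evaluated remain in `svm_results.csv`.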

    Maybe this would also be an idea to implement: a "stop subprocess" button in addition to the "stop everything" button. It would exit only the current operation (e.g. the validation of the current kernel type) and move on. The best parameters from the operations that were not canceled would then be chosen.
    In general, this would indeed be helpful, and some "anytime" operators (like the genetic feature selection or weighting operators) already offer such a function. Those operators have a parameter indicating whether a "Should Stop?" dialog is shown. We could think of a generic mechanism for this "stop anytime" feature on an operator level, but I doubt it would really be intuitive to offer this on a generic subprocess level as well. And it would probably not even be possible in general.

    Cheers,
    Ingo
  • spitfire_ch Member Posts: 38  Guru
    Hi Ingo,

    thank you very much for your advice and insight. I will follow your suggested steps when trying to optimize an SVM learner. I am also very glad to hear about the Log operator; I wasn't aware of its capabilities. It does indeed sound very useful, as does the "stop" dialog feature on some operators. Thanks for pointing these features out!

    Best regards
    Hanspeter