Accelerate parameter optimization for SVM

qwertz2 Member Posts: 49 Guru
edited December 2018 in Help
Dear all,

I have been using SVMs for a while now, but parameter optimization is new to me. With the default settings, the parameter optimization takes quite a long time in my case.

So I was wondering which parameters might have the highest influence on execution time (while keeping similar performance). I could run the parameter optimization while logging performance and execution time, but I wanted to ask whether there is a common or better approach...

Looking forward to any comments and feedback.

Best regards
Sachs

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    A good approach is to either use the Optimize Parameters (Evolutionary) OR use a logarithmic scale for your options. 
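    To make the logarithmic-scale idea concrete, here is a rough sketch in Python/scikit-learn rather than RapidMiner operators (the data set, the ranges and the number of steps below are placeholder assumptions, not defaults from the Optimize Parameters operator):

    ```python
    # Sketch: tune C and gamma of an RBF SVM over logarithmic steps instead of a
    # dense linear grid, which keeps the number of SVM trainings small.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # placeholder data roughly the size mentioned later in this thread
    X, y = make_classification(n_samples=80, n_features=15, random_state=0)

    param_grid = {
        "C": np.logspace(-2, 3, 6),      # 0.01 ... 1000, a few values per decade
        "gamma": np.logspace(-4, 1, 6),  # 0.0001 ... 10
    }

    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)
    ```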

  • qwertz2 Member Posts: 49 Guru

     

    Hi Thomas,

     

    Thanks for sharing your ideas!

     

    I tried the evolutionary optimizer. The pro: it will probably find the “best” parameters and it runs only as often as required. (It does not go through a whole grid while the performance is already decreasing.) However, I have the impression that I could accelerate the optimizer by changing its own parameters. The number of generations, for example, has a huge influence on the runtime. The default value is 5, but I have no feeling for whether 2 would still be enough to achieve good performance or whether I should rather use 10 instead.

     

    And the optimizer has several more parameters. Of course it would be possible to test the dependencies, but that would take a long time. So I was wondering whether there are basic rules of thumb that give direction (e.g. in the case of many attributes, increase parameter X and decrease Y compared to the default settings). In my case I have a data set of about 10 to 20 attributes and 80 examples, which has to be run many times.
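    As a rough sketch of the “log performance and execution time” idea (in Python/scikit-learn rather than RapidMiner; the per-candidate cost and the assumption that total evaluations are roughly population size times generations are only estimates, not how the operator necessarily counts internally):

    ```python
    # Sketch: time one cross-validated SVM evaluation and extrapolate the total
    # runtime for different numbers of generations.
    import time
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=80, n_features=15, random_state=0)

    start = time.perf_counter()
    cross_val_score(SVC(C=1.0, gamma=0.1), X, y, cv=5)
    per_candidate = time.perf_counter() - start  # seconds for one candidate

    population_size = 5  # assumed value
    for generations in (2, 5, 10):
        estimate = generations * population_size * per_candidate
        print(f"{generations} generations x {population_size} candidates ~ {estimate:.1f} s")
    ```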

     

     

    Best regards

    Sachs

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    With SVMs, IMHO, gamma and C are the most important parameters to optimize. There is a trade-off, of course. Have you seen this image?

     

    http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Financial-Time-Series-Prediction/m-p/33456?lightbox-message-images-33515=551i694A2F729EAC22A8
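    For reference, a heat map of that kind can be sketched roughly like this (plain Python/scikit-learn with placeholder data and ranges, not the process behind the linked image):

    ```python
    # Sketch: cross-validated accuracy of an RBF SVM over a logarithmic C/gamma grid,
    # shown as a heat map to see in which region the useful values lie.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=80, n_features=15, random_state=0)

    C_range = np.logspace(-2, 3, 6)
    gamma_range = np.logspace(-4, 1, 6)

    scores = np.zeros((len(gamma_range), len(C_range)))
    for i, gamma in enumerate(gamma_range):
        for j, C in enumerate(C_range):
            scores[i, j] = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

    plt.imshow(scores, origin="lower", aspect="auto")
    plt.xticks(range(len(C_range)), [f"{c:g}" for c in C_range])
    plt.yticks(range(len(gamma_range)), [f"{g:g}" for g in gamma_range])
    plt.xlabel("C")
    plt.ylabel("gamma")
    plt.colorbar(label="CV accuracy")
    plt.show()
    ```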

  • qwertz2 Member Posts: 49 Guru
    Hi Thomas,

    Thank you for your input. That picture was new to me and I like it a lot, as it gives a good idea of the range to which C and gamma can be limited for parameter optimization.

    By now I have understood which parameters of the SVM to optimize. But when I played around with the evolutionary optimization I came across the terms:
    - Generation
    - Population
    - Individual

    Do you happen to know of any documentation that describes what these terms mean and what they affect? Are they some kind of synonyms for attributes and examples?



    Best regards
    Sachs

  • qwertz2 Member Posts: 49 Guru

     

    I am back with the current status of my research:

     

    1) Using the tutorial process of the "Optimize Parameters (Evolutionary)" operator, I logged the example set provided inside the optimizer. I found that the example set is the same for each iteration?!? But to my understanding it is supposed to change with each loop. *confused*

     

    2) Regarding the terms - after a while of reading - I came to this conclusion:

    individual ~ row
    population ~ combination of several individuals
    generation ~ combination of several populations

     

    Looking forward to enlightenment :smileyhappy:

     

    Kind regards

    Sachs

  • qwertz2 Member Posts: 49 Guru

     

    OK, I got far enough to understand that I was completely wrong in my assumption. The population does not refer to the examples but to the "candidates" for the parameters to be optimized. That is why the example data remains the same for all iterations!
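
    As a rough illustration of the terms (plain Python, not RapidMiner-specific; the values are made up):

    ```python
    # individual = one candidate parameter combination
    individual = {"C": 10.0, "gamma": 0.01}

    # population = the set of candidates evaluated together in one step
    population = [
        individual,
        {"C": 1.0, "gamma": 0.1},
        {"C": 100.0, "gamma": 0.001},
    ]

    # generation = one iteration: evaluate the current population, then build the
    # next population from the best candidates via mutation/crossover.
    # The example set itself is never touched, which matches the observation above.
    ```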

     

    Still the question remains: to accelerate the optimization, should I rather decrease the number of generations or the population size? What is the consequence of each option?

     

     

    Looking forward to any feedback.

    Sachs

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I love this thread, you're answering your own questions! lol. 

     

    Population size tends to have a bigger impact on your performance right away, so I would start with that first. However, generations (when you run it for a while) can have an impact later on.

     

    When we did the PQL model for RapidMiner, we did Multi-Objective Feature Selection. It was, in a way, a method to reduce the number of attributes while extracting the maximum performance. We noticed that after 300 or 600 generations we got some good bumps in performance. So I would start with population size and then generations.
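
    To make the trade-off a bit more concrete, here is a minimal evolutionary-search sketch (plain Python/scikit-learn, not the RapidMiner operator; population size, number of generations and mutation widths are just assumed values). The number of model evaluations is roughly population size times generations, so both knobs scale the runtime about linearly; the population size controls how broadly each step searches, the number of generations how long the refinement continues.

    ```python
    # Sketch: evolutionary tuning of C and gamma with the two knobs made explicit.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=80, n_features=15, random_state=0)

    def fitness(ind):
        # cross-validated accuracy of one candidate (one "individual")
        return cross_val_score(SVC(C=ind["C"], gamma=ind["gamma"]), X, y, cv=5).mean()

    population_size, n_generations = 5, 5   # the two knobs in question

    # initial population: random candidates drawn on a logarithmic scale
    population = [{"C": 10 ** rng.uniform(-2, 3), "gamma": 10 ** rng.uniform(-4, 1)}
                  for _ in range(population_size)]

    for gen in range(n_generations):
        scored = [(fitness(ind), ind) for ind in population]   # population_size evaluations
        scored.sort(key=lambda s: s[0], reverse=True)
        best_score, best = scored[0]
        print(f"generation {gen}: best CV accuracy {best_score:.3f}")

        # next generation: keep the best candidate, fill the rest with mutations of it
        population = [best] + [
            {"C": best["C"] * 10 ** rng.normal(0, 0.5),
             "gamma": best["gamma"] * 10 ** rng.normal(0, 0.5)}
            for _ in range(population_size - 1)
        ]
    ```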

  • qwertz2 Member Posts: 49 Guru

     

    Hi Thomas,

     

    I took the weekend to run a couple of samples to compare. I can confirm that you were right: population size had the bigger impact on both runtime and performance.

     

    Regarding answering my own questions: yes, that has happened a couple of times since I started using RapidMiner. Sometimes I start with a question, and as time passes, knowledge grows along with the hours spent modelling and testing. And when I believe that a result might be useful to the community, I post it back. That's the minimum I can do for a community where I have found lots of support! This sometimes ends up in a post where it looks like I am answering my own questions =)

     

     

    Best regards

    Sachs
