Accelerate parameter optimization for SVM
Dear all,
I have been using SVMs for a while now, but parameter optimization is new to me. With the default parameters, the optimization takes quite a while in my case.
So I was wondering which parameters might have the highest influence on the execution time (while keeping similar performance). I could run the parameter optimization while logging performance and execution time, but I wanted to ask whether there is a common or better approach...
Looking forward to any comments and feedback.
Best regards
Sachs
Answers
A good approach is to either use the Optimize Parameters (Evolutionary) operator or to use a logarithmic scale for your parameter ranges.
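For readers who want to try the logarithmic-scale idea outside of RapidMiner, here is a minimal sketch in Python with scikit-learn; the dataset and parameter ranges are illustrative assumptions, not taken from this thread. Stepping C and gamma in powers of ten lets a handful of grid points cover several orders of magnitude:

```python
# Minimal sketch of a log-scale grid search for an RBF-SVM (scikit-learn).
# Dataset and ranges are illustrative, not from the original thread.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Logarithmic steps: 7 values per parameter instead of hundreds of linear ones.
param_grid = {
    "C": np.logspace(-2, 4, 7),      # 0.01 ... 10000
    "gamma": np.logspace(-5, 1, 7),  # 0.00001 ... 10
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

A coarse log grid like this is often run first to locate the promising region, followed by a finer search around the best combination.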
Hi Thomas,
Thanks for sharing your ideas!
I tried the evolutionary optimizer. The pro: it will probably find the "best" parameters and it runs only as often as required. (It would not walk through a whole grid while the performance is already decreasing.) However, I have the impression that I could accelerate the optimizer by changing its parameters. The number of generations, for example, has a huge influence on the runtime. The default value is 5, but I have no feeling for whether 2 is still enough to achieve good performance or whether I should rather use 10.
And the optimizer has several more parameters. Of course it would be possible to test their dependencies, but that would take a long time. So I was wondering whether there are basic rules of thumb that give direction (e.g., with many attributes, increase parameter X and decrease Y compared to the default settings). In my case I have a data set of about 10 to 20 attributes and 80 examples, which has to be run many times.
Best regards
Sachs
With SVMs, IMHO, gamma and C are the most important parameters to optimize. There is a trade-off, of course. Have you seen this image?
http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Financial-Time-Series-Prediction/m-p/33456?lightbox-message-images-33515=551i694A2F729EAC22A8
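For anyone who cannot follow the link, the idea behind such a picture can be sketched roughly like this in scikit-learn (dataset and ranges are illustrative assumptions): print the cross-validated accuracy over a coarse logarithmic C/gamma grid and look for the plateau of good scores, which suggests the range to limit the optimization to.

```python
# Rough sketch of a C/gamma accuracy landscape (scikit-learn).
# Dataset and ranges are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
Cs = np.logspace(-2, 4, 5)
gammas = np.logspace(-5, 1, 5)

# One row per C value, one column per gamma value.
for C in Cs:
    row = [f"{cross_val_score(SVC(C=C, gamma=g), X, y, cv=3).mean():.2f}"
           for g in gammas]
    print(f"C={C:<8g} " + "  ".join(row))
```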
Thank you for your input. That picture was new to me and I like it a lot, as it gives a good idea of the range to which C and gamma can be limited for parameter optimization.
By now I have understood which parameters of the SVM to optimize. But when I played around with the evolutionary optimization I came across the terms:
- Generation
- Population
- Individual
Do you happen to know of any documentation that describes what these terms mean and what they control? Are they some kind of synonym for attributes and examples?
Best regards
Sachs
I am back with the current status of my research:
1) Using the tutorial process of the "Optimize Parameters (Evolutionary)" operator, I logged the example set provided inside the optimizer. I found that the example set is the same for each iteration?! To my understanding it is supposed to change with each loop. *confused*
2) Regarding the terms, after a while of reading I came to this conclusion:
- individual ~ row
- population ~ combination of several individuals
- generation ~ combination of several populations
Looking forward to enlightenment 🙂
Kind regards
Sachs
OK, I got far enough to understand that I was completely wrong in my assumption. The population does not refer to the examples but to the "candidates" for the parameters to be optimized. This is why the example data remains the same for all iterations!
Still the question remains: to accelerate the optimization, should I rather decrease the number of generations or the population size? What is the consequence of each option?
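To make the terms concrete, here is a toy evolutionary parameter search in Python with scikit-learn; this is a conceptual sketch, not RapidMiner's actual operator, and the dataset, ranges, and mutation scheme are assumptions. Each individual is one (C, gamma) candidate, the population is the current set of candidates, and each generation evaluates the population, keeps the best candidates, and mutates them. The total cost is roughly population size × number of generations fitness evaluations, so shrinking either accelerates the run:

```python
# Toy evolutionary search over (C, gamma) for an RBF-SVM (scikit-learn).
# Conceptual sketch only; not RapidMiner's implementation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)

def fitness(individual):
    C, gamma = 10.0 ** individual  # individuals live in log10 space
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

POP_SIZE, GENERATIONS = 10, 5  # cost ~ POP_SIZE * GENERATIONS evaluations
# Starting population: random (log10 C, log10 gamma) pairs.
pop = rng.uniform([-2, -5], [4, 1], size=(POP_SIZE, 2))

for gen in range(GENERATIONS):
    scores = np.array([fitness(ind) for ind in pop])
    elite = pop[np.argsort(scores)[-POP_SIZE // 2:]]          # keep better half
    children = elite + rng.normal(0, 0.3, size=elite.shape)   # mutate survivors
    pop = np.vstack([elite, children])
    print(f"generation {gen}: best accuracy {scores.max():.3f}")
```

Fewer generations cut the later refinement, while a smaller population cuts the exploration per generation; both reduce the runtime roughly linearly.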
Looking forward to any feedback.
Sachs
I love this thread, you're answering your own questions! lol.
Population size tends to have a bigger impact on performance right away, so I would start with that first. However, generations (when you run it for a while) can have an impact later on.
When we did the PQL model for RapidMiner, we did multi-objective feature selection. It was, in a way, a means to reduce the number of attributes while extracting the maximum performance. We noticed that after 300 or 600 generations we still had some good bumps in performance. So I would start with population size and then generations.
Hi Thomas,
I took the weekend to run a couple of samples to compare. I can confirm that you were right and that population size had the bigger impact on both runtime and performance.
Regarding answering my own questions: yes, that has happened a couple of times since I started using RapidMiner. Sometimes I start with a question, and as time passes, knowledge grows along with the hours spent modelling and testing. When I believe that a result might be useful to the community, I post it back. That's the minimum I can do for a community where I have found lots of support! It sometimes ends up looking like a post where I am answering my own questions.
Best regards
Sachs