Is an SVM invariant to skewness and kurtosis?
I have data that is highly positively skewed, and I want to train an SVM (LibSVM) classifier on it with 3 classes. My question is: do skewness and/or kurtosis affect the performance of an SVM classifier, or is it invariant to these statistical measures? Or should I rather apply a log transform to the skewed columns, for example?
And which other classifiers are invariant to such statistical measures, and which ones are affected by them?
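To make it concrete, here is roughly the kind of preprocessing I have in mind (a sketch in Python with scipy and scikit-learn's LibSVM-based SVC; the skewness threshold of 1.0 and the column handling are just placeholders, not a recommendation):

```python
import numpy as np
from scipy.stats import skew
from sklearn.svm import SVC  # scikit-learn's SVC is built on LibSVM

def log_transform_skewed(X, threshold=1.0):
    """Apply log1p to columns whose sample skewness exceeds the threshold.

    Assumes non-negative features; the threshold is an arbitrary placeholder.
    """
    X = X.copy()
    for j in range(X.shape[1]):
        if skew(X[:, j]) > threshold:
            X[:, j] = np.log1p(X[:, j])
    return X

# X_train, y_train: my data with 3 classes (not shown here)
# X_train = log_transform_skewed(X_train)
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```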
Answers
Fred,
help me to get the connection. Kurtosis and skewness are univariate measures of a distribution; they do not depend on the class label and have little to do with how an SVM works.
The interesting part is a scatter plot of the label against the observed attribute(s). The non-linearity there is the interesting thing to catch.
~Martin
Dortmund, Germany
Well, what I mean is: in linear models like the Generalized Linear Model, I think the distribution plays a role, skewness and so on. At least with my dataset it is useful to apply a log transformation to the skewed columns; I got 10%+ better performance after doing that. Outliers also play an important role, I think.
I just wanted to know whether the same applies to an SVM. I tried it with my dataset, both raw and log-transformed, and somehow I got about 1-1.5% better performance on the log-transformed dataset. How can that be?
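For what it's worth, this is roughly how I compared the two variants (a sketch with scikit-learn; the 5-fold cross-validation and the RBF kernel are just my setup, and X, y stand for my data):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X, y: feature matrix and 3-class labels; X is non-negative, so log1p is safe
scores_raw = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
scores_log = cross_val_score(SVC(kernel="rbf"), np.log1p(X), y, cv=5)
print("raw:", scores_raw.mean(), "log:", scores_log.mean())
```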
Mh,
it's overall an interesting question for a regression problem. The problem with GLMs comes from the underlying distribution assumption: by minimizing least squares you implicitly assume normally distributed errors. Distributions with high skewness/kurtosis violate this assumption and thus perform poorly.
For SVM regression I am not 100% sure whether there is such an assumption in the loss measure. I think it uses an absolute loss with an epsilon that ignores errors below this threshold. Maybe @IngoRM or @RalfKlinkenberg can help; they have more theoretical experience with SVMs.
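For reference, the two losses being compared here, in their standard textbook form (squared loss for least squares, epsilon-insensitive loss for epsilon-SVR):

$L_{\mathrm{sq}}(y, f(x)) = (y - f(x))^2$

$L_\varepsilon(y, f(x)) = \max\bigl(0,\, |y - f(x)| - \varepsilon\bigr)$

Errors below $\varepsilon$ are ignored entirely, and beyond that the penalty grows only linearly, so large residuals are punished far less than under squared loss.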
In any case, your increase in performance is explainable to me from a totally different point of view: if you apply a log to all attributes, your kernel function changes. That makes an obvious difference.
Best,
Martin
Dortmund, Germany
OK, thanks. But what do you mean by "my kernel function gets different"? In what way is it different?
Hey,
if you look, for example, at the RBF kernel (https://en.wikipedia.org/wiki/Radial_basis_function_kernel), you would simply replace x and x' with log(x) and log(x'). That is simply different; not necessarily good or bad, but different.
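A tiny sketch of what I mean (plain Python/NumPy; the gamma value and the sample points are arbitrary):

```python
import numpy as np

def rbf(x, xp, gamma=1.0):
    """RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2)."""
    return np.exp(-gamma * np.sum((x - xp) ** 2))

x, xp = np.array([1.0, 100.0]), np.array([2.0, 300.0])
print(rbf(x, xp))                   # kernel value on the raw features
print(rbf(np.log(x), np.log(xp)))   # same kernel on log-transformed features
# The two values differ: the implicit similarity measure has changed.
```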
~Martin
Dortmund, Germany