X-Validation with large data set using libsvm

side · December 2012

Hello,

I'm trying to use X-Validation on large data sets with libsvm. More specifically, I have 3 data sets of 70, 100, and 105 MB as ARFF files. The data are unbalanced, so I would like to use X-Validation to find the best kernel parameters. However, RapidMiner takes a very long time, and so far I have not been able to get a run to finish, probably because the system has limited CPU power. I run 64-bit Windows 7 on a dual-core AMD Athlon at 2.2 GHz.
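For readers unfamiliar with cross-validation on unbalanced data, here is a minimal sketch in plain Python of how examples could be assigned to folds so that every fold keeps roughly the same class ratio. It is an illustration only, not RapidMiner's X-Validation implementation; the function name `stratified_folds` and the toy labels are assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Return a list giving the fold number (0..k-1) of each example.

    Hypothetical helper: indices of each class are shuffled and then
    dealt round-robin over the folds, so class ratios stay similar.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    fold_of = [None] * len(labels)
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of


# Toy data (assumed): 90 negative and 10 positive examples, 5 folds.
labels = ["neg"] * 90 + ["pos"] * 10
fold_of = stratified_folds(labels, k=5)
for fold in range(5):
    counts = Counter(labels[i] for i, f in enumerate(fold_of) if f == fold)
    print(fold, dict(counts))
```

With this toy data each fold receives 18 negative and 2 positive examples, so the rare class is present in every test fold.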

Can anyone explain to me why the system can't produce the results?

Thank you and happy new year, my friends.

MariusHelf · January 2013

Hi,

Why is your data so large? Do you have many attributes, or many examples? By design, the SVM is quite slow when you have many examples (O(n^3)), but quite fast for many attributes (O(m)). So if you have many examples in your data, you should consider using an algorithm other than the SVM. Decision trees, for example, are quite fast for many examples, but perform poorly on data with many attributes.
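To make the scaling argument concrete, the following sketch takes the O(n^3) figure from this post at face value and computes how much the training cost grows when the number of examples grows. It is a rough back-of-the-envelope model, not a measurement of libsvm; the function name `relative_cost` and the example sizes are assumptions for illustration.

```python
def relative_cost(n_new, n_old, exponent=3):
    """Cost ratio between two training set sizes under an O(n^exponent) model."""
    return (n_new / n_old) ** exponent


# Doubling the number of examples under a cubic model.
print(relative_cost(2000, 1000))   # 8.0

# Tripling the number of examples under a cubic model.
print(relative_cost(15000, 5000))  # 27.0
```

Under this model, doubling the number of examples multiplies the training time by about 8, which is why the number of examples matters much more than the number of attributes.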

Furthermore, you should not use heavily unbalanced data for training, but balance it beforehand. You can use the Sample operator for that, with the balance_data parameter.
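As an illustration of what balancing by sampling means, here is a minimal sketch in plain Python that downsamples every class to the size of the smallest one. It is not the implementation of the Sample operator or of its balance_data parameter; the function name `downsample_to_balance` and the toy data are assumptions for this example.

```python
import random
from collections import Counter, defaultdict


def downsample_to_balance(examples, labels, seed=0):
    """Return (example, label) pairs with equally many examples per class.

    Hypothetical helper: every class is randomly reduced to the size
    of the smallest class, so examples of larger classes are discarded.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for example, label in zip(examples, labels):
        by_class[label].append(example)

    target = min(len(rows) for rows in by_class.values())
    balanced = []
    for label in sorted(by_class):
        for example in rng.sample(by_class[label], target):
            balanced.append((example, label))
    rng.shuffle(balanced)
    return balanced


# Toy data (assumed): 90 examples of class 1 and 10 of class 0.
examples = list(range(100))
labels = [1] * 90 + [0] * 10
balanced = downsample_to_balance(examples, labels)
print(Counter(label for _, label in balanced))
```

The balanced set here contains 10 examples per class, so 80 of the 90 majority examples are dropped. That is the trade-off of this approach: a smaller and faster training set at the price of discarded data.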

Best regards,
Marius

side · January 2013

I have 15000 examples with 2000 attributes, so decision trees are not a good choice. The SVM can handle this, but the results are not good and it takes 4 hours for 3 cross validations.
Using the Sample operator, we lose important information from the ignored examples, so I can't use this feature. Reading about SVM requirements, I learned that the SVM needs O(n^3) time. However, I saw that there is the CVM (Core Vector Machine), which can handle this problem with O(n) time complexity, but RapidMiner doesn't support this algorithm. Will RapidMiner include this algorithm in a future version?

Thanks a lot!

MariusHelf · January 2013

Hi,

We probably won't include the Core Vector Machine in the near future. However, as far as I know, Hendrik Blom from the TU Dortmund implemented the Core Vector Machine in a custom extension during his thesis. You should find his contact data easily via Google.

Best regards,
Marius Helf

side · January 2013

Thank you Marius! Nice to meet you!
