X-Validation with large data set using libsvm
Hello,
I'm trying to use X-Validation on large data sets with LibSVM. More specifically, I have three data sets of 70, 100, and 105 MB as ARFF files. The data are unbalanced, so I would like to use X-Validation to find the best kernel parameters. However, RapidMiner takes a very long time, and so far I have not been able to get the process to finish, probably because the machine has limited CPU power. I am running 64-bit Windows 7 on a 2.2 GHz AMD Athlon dual-core.
Can anyone explain to me why the system can't produce the results?
Thank you and happy new year my friends
Answers
Why is your data so large? Do you have many attributes, or many examples? By design, the SVM is quite slow when you have many examples (training time grows up to O(n^3) in the number of examples n), but quite fast for many attributes (O(m) in the number of attributes m). So if you have many examples in your data, you should consider using a different algorithm instead of the SVM. Decision trees, for example, are quite fast for many examples, but perform poorly on data with many attributes.
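To make the scaling concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the cubic worst case quoted above and treats the number of folds and parameter combinations as hypothetical values; the function names are made up for illustration and are not part of RapidMiner or LibSVM.

```python
def relative_training_cost(n_examples, baseline_examples, exponent=3):
    """Training cost on n_examples relative to baseline_examples,
    assuming cost grows like n ** exponent (cubic worst case by default)."""
    return (n_examples / baseline_examples) ** exponent


def total_training_runs(n_folds, n_parameter_combinations):
    """Number of SVM trainings needed when every parameter combination
    is evaluated with k-fold cross-validation."""
    return n_folds * n_parameter_combinations


# Doubling the number of examples multiplies the (worst-case) cost by 8.
doubling_factor = relative_training_cost(20_000, 10_000)

# Hypothetical setup: 10 folds and a 5 x 5 grid of kernel parameters.
runs = total_training_runs(n_folds=10, n_parameter_combinations=25)

print(f"cost factor when doubling n: {doubling_factor}")
print(f"SVM trainings for the whole search: {runs}")
```

Under these assumptions a parameter search multiplies an already expensive training by a few hundred, which is why such a process may appear to never finish on a large example set.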
Furthermore, you should not use heavily unbalanced data for training, but balance it beforehand. You can use the Sample operator for that, with the balance_data parameter.
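As an illustration of what balancing by downsampling means, here is a minimal Python sketch. It is not the implementation of the Sample operator, only the general idea: keep as many examples from each class as the smallest class has. The function name and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_downsampling(examples, labels, seed=0):
    """Return a shuffled list of (example, label) pairs in which every
    class has as many examples as the smallest class."""
    rng = random.Random(seed)

    # Group the examples by their class label.
    by_label = defaultdict(list)
    for example, label in zip(examples, labels):
        by_label[label].append(example)

    # The smallest class determines how many examples each class keeps.
    smallest = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        for example in rng.sample(group, smallest):
            balanced.append((example, label))

    rng.shuffle(balanced)
    return balanced


# Toy data: 90 negative and 10 positive examples.
toy_examples = list(range(100))
toy_labels = ["neg"] * 90 + ["pos"] * 10
balanced = balance_by_downsampling(toy_examples, toy_labels)
print(len(balanced))
```

The trade-off is that the discarded majority-class examples are not seen during training; in exchange, the training set becomes smaller, which also shortens SVM training.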
Best regards,
Marius
With the Sample operator, we lose important information from the discarded examples, so I can't use this feature. Reading about SVM requirements, I learned that SVM training needs O(n^3) time. However, I saw that there is the CVM (Core Vector Machine), which can handle this problem with O(n) time complexity, but RapidMiner doesn't support this algorithm. Will RapidMiner include this algorithm in a future version?
Thanks a lot!
We probably won't include the Core Vector Machine in the near future. However, as far as I know, Hendrik Blom from TU Dortmund implemented the Core Vector Machine in a custom extension during his thesis. You should be able to find his contact details easily via Google.
Best regards,
Marius Helf