X-Validation with large data set using libsvm

sideside Member Posts: 8 Contributor II
Hello,

I'm trying to run X-Validation on large data sets with libsvm. More specifically, I have 3 data sets of 70, 100 and 105 MB as ARFF files. The data are unbalanced, so I would like to use X-Validation to find the best kernel parameters. However, RapidMiner takes a very long time, and so far I have not been able to get a run to finish, probably because the machine has limited CPU power. I run 64-bit Windows 7 on a dual-core AMD Athlon at 2.2 GHz.

Can anyone explain why the system can't produce the results?

Thank you and happy new year my friends
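For readers following along outside RapidMiner, the parameter search described in the question can be sketched in plain Python. This is a minimal illustration, not what the X-Validation operator does internally: `stratified_folds`, `grid_search_cv` and the `evaluate` callback are hypothetical names, and `evaluate` stands in for whatever trains an SVM on the training indices and scores it on the test indices. Stratifying the folds is an assumption chosen here because the data are unbalanced.

```python
import itertools
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def grid_search_cv(labels, param_grid, k, evaluate):
    """Return (best_params, best_mean_score) over all parameter combinations.

    `evaluate(params, train_idx, test_idx)` is a hypothetical callback that
    trains a model on train_idx and returns a score on test_idx.
    """
    folds = stratified_folds(labels, k)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for i, test_idx in enumerate(folds):
            # Train on every fold except the one held out for testing.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(params, train_idx, test_idx))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score
```

Note that the number of models trained is the number of parameter combinations times the number of folds, which is why a grid search multiplies an already slow training time.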

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    why is your data so large? Do you have many attributes, or many examples? By design, the SVM is quite slow when you have many examples (O(n^3)), but quite fast for many attributes (O(m)). So if you have many examples in your data, you should consider using an algorithm other than the SVM. Decision trees, for example, are quite fast for many examples, but perform poorly on data with many attributes.

    Furthermore, you should not use heavily unbalanced data for training, but balance it beforehand. You can use the Sample operator for that, with the balance_data parameter.

    Best regards,
    Marius
  • sideside Member Posts: 8 Contributor II
    I have 15000 examples with 2000 attributes, so decision trees are not a good choice. The SVM can handle this, but the results are not good and it takes 4 hours for 3 cross-validations.
    With the Sample operator we lose important information from the ignored examples, so I can't use this feature. Reading about SVM requirements, I learned that the SVM needs O(n^3) time. However, I saw that there is the CVM (Core Vector Machine), which can handle this problem with O(n) time complexity, but RapidMiner doesn't support this algorithm. Will RapidMiner include this algorithm in a future version?

    Thanks a lot!
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    we probably won't include the Core Vector Machine in the near future. However, as far as I know, Hendrik Blom from the TU Dortmund implemented the Core Vector Machine in a custom extension during his thesis. You should find his contact data easily via Google.

    Best regards,
    Marius Helf
  • sideside Member Posts: 8 Contributor II
    Thank you Marius! Nice to meet you!
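The trade-off discussed in the thread, balancing by sampling versus training time, can be made concrete with a small sketch. It assumes the O(n^3) training cost quoted above and a made-up class ratio (the thread does not state one); `downsample_to_balance` and `relative_cubic_cost` are hypothetical helper names and not part of RapidMiner or libsvm.

```python
import random
from collections import defaultdict


def downsample_to_balance(labels, seed=0):
    """Return sorted indices with every class cut to the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        # Randomly keep `target` examples of this class, drop the rest.
        kept.extend(rng.sample(indices, target))
    return sorted(kept)


def relative_cubic_cost(n_before, n_after):
    """Training-cost ratio if cost grows as n**3 (the estimate quoted in the thread)."""
    return (n_after / n_before) ** 3


# Assumed example: 15000 examples, 10% of them in the minority class.
labels = [0] * 13500 + [1] * 1500
kept = downsample_to_balance(labels)
ratio = relative_cubic_cost(len(labels), len(kept))
print(len(kept), ratio)
```

With these assumed numbers, 3000 examples remain and the estimated training cost drops to roughly 0.8% of the original. The objection raised in the thread still applies: the discarded majority examples are never seen by the model, so the speed-up is paid for with information.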