ModelApplier needs too much memory with high-dimensional data?

Legacy User
edited November 2018 in Help
Hi again,


I have been experimenting with cross-validation for a while, using one of the templates that ship with RapidMiner and the sparse toy data file. On the toy data, the standard XVal with a LibSVM classification learner + ModelApplier + Evaluator runs in less than 2 seconds.
Then I increased the dimension of the data from the current 25 features to something much larger (e.g. 100000), simply by adding one additional feature with index 99999 and some value to each of my 10 sparse data vectors.
Unfortunately, applying (!) the learned model to the test data now takes an extremely long time and uses enormous amounts of memory. When I do the same without RapidMiner, using a simple Perl script and the standard LibSVM implementation, the XVal again finishes in seconds. Am I using the wrong ModelApplier or the wrong options?
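A short sketch of why one extra high-index feature matters so much: in a sparse "label index:value" line only the non-zero entries are stored, but any dense representation must allocate a slot for every index up to the highest one. This assumes an SVMlight-style line format with 0-based indices (matching "index 99999" giving 100000 features); `parse_sparse_line` and `densify` are hypothetical helpers, not RapidMiner or LibSVM API.

```python
def parse_sparse_line(line):
    """Parse a 'label index:value ...' line (SVMlight-style format assumed)."""
    parts = line.split()
    label = parts[0]
    features = {}
    for token in parts[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features


def densify(features, n_features):
    """Expand a sparse {index: value} dict into a dense list (0-based indices assumed)."""
    dense = [0.0] * n_features
    for index, value in features.items():
        dense[index] = value
    return dense


label, feats = parse_sparse_line("pos 0:0.3 7:1.2 99999:0.5")
n_features = max(feats) + 1  # the highest index alone fixes the dense width
dense = densify(feats, n_features)
print(len(feats), len(dense))  # prints "3 100000"
```

Three stored values turn into 100000 dense slots per example, which is the kind of blow-up a dense data-management setting would incur.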

Thank you so much,
Mome

Answers

  • land
    Hi Mome,
    this might result from some internal conversions, but I'm not sure. Could you please send me the example data file and the process?

    Greetings,
      Sebastian
  • Legacy User
    Sorry for the late reply; another project took up all my time. In the meantime, I found that RapidMiner does indeed work very well, and I found my stupid mistake:
    The SparseFormatExampleSource has a "DataManagement" parameter. When I store 1 million (very sparse) attributes for thousands of samples using double_array, I assume this produces an extremely large (and extremely sparse) dense matrix. Choosing "boolean_sparse_array" instead worked well for my problem. I promise to read the operator descriptions more carefully next time :D
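A rough back-of-the-envelope sketch of the difference between a dense and a sparse layout. The sample count (5000) and the number of non-zeros per example (50) are hypothetical stand-ins for "thousands of samples" and "very sparse"; the byte sizes (8-byte doubles, 4-byte int indices) ignore JVM object overhead and are not RapidMiner's actual internal storage classes. The `store_values=False` case only illustrates how a presence-only sparse layout could drop the per-entry value; it is an assumption, not a description of how boolean_sparse_array is implemented.

```python
BYTES_PER_DOUBLE = 8     # assumed size of one stored value
BYTES_PER_INT_INDEX = 4  # assumed size of one stored attribute index


def dense_bytes(n_examples, n_attributes):
    """Every example stores a value for every attribute, zero or not."""
    return n_examples * n_attributes * BYTES_PER_DOUBLE


def sparse_bytes(n_examples, nonzeros_per_example, store_values=True):
    """Each example stores only (index, value) pairs for non-zero attributes."""
    per_entry = BYTES_PER_INT_INDEX + (BYTES_PER_DOUBLE if store_values else 0)
    return n_examples * nonzeros_per_example * per_entry


n_examples = 5_000        # hypothetical "thousands of samples"
n_attributes = 1_000_000  # "1 million attributes" from the post
nonzeros = 50             # hypothetical density per example

print(dense_bytes(n_examples, n_attributes) / 1e9, "GB dense")          # 40.0 GB dense
print(sparse_bytes(n_examples, nonzeros) / 1e6, "MB sparse")            # 3.0 MB sparse
print(sparse_bytes(n_examples, nonzeros, False) / 1e6, "MB index-only") # 1.0 MB index-only
```

Under these assumptions the dense layout needs four orders of magnitude more memory than the sparse one, which is consistent with the slowdown disappearing once a sparse data-management option was chosen.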

    Thanks a lot
    Mome
