Cache for ExampleSets?

harri678 Member Posts: 34 Contributor II
edited November 2018 in Help

I have been wondering whether there is any way to cache ExampleSets between multiple runs. In my case, loading the sparse data files takes a lot of processing time on every run, even though the data files do not change, so some kind of caching would be a great way to speed things up. Has this already been discussed, or is there another way besides SQL to avoid reloading sparse files on every run?



  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Harald,
    Did you try saving it into the repository? That might speed things up a lot...
    Caching is indeed an issue, but support for it is not planned for the client version of RapidMiner.

  • harri678 Member Posts: 34 Contributor II
    I ran a small benchmark, and "Read AML" on a sparse file is faster than the store/retrieve route through the repository.
    Sparse file specs: 7200 examples, 155340 attributes (16 MB .dat, 11 MB .aml, approx. 90% sparse)

    I used "Read AML" and "Store" to save the data into the repository, then ran several loading-only tests to rule out caching effects. These are the results:

              Retrieve Repo    Read AML (sparse)
    1. run:  02:10            00:18
    2. run:  02:03            00:19
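
To put a number on the gap, the timings above can be turned into ratios. This is only a rough sketch: it assumes the values are in mm:ss (the post does not state the unit), and the helper name `to_seconds` is made up for illustration.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'mm:ss' string to seconds (the mm:ss unit is an assumption)."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)

# (Retrieve from repository, Read AML on the sparse file), copied from the table above
runs = [("02:10", "00:18"), ("02:03", "00:19")]

# How many times longer Retrieve took than Read AML, per run
ratios = [to_seconds(retrieve) / to_seconds(read_aml) for retrieve, read_aml in runs]

for number, ratio in enumerate(ratios, start=1):
    print(f"run {number}: Retrieve took {ratio:.1f}x as long as Read AML")
```

Under that reading, Retrieve took roughly six to seven times as long as Read AML in both runs.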