Process step caching

twuyts Member Posts: 1 Contributor I
edited November 2018 in Help
Greetings Programs!

We are currently evaluating a few tools (SAS Enterprise Miner, IBM SPSS Modeler, RapidMiner, KNIME). This question is NOT about a comparison between those, but rather about a feature I really like in SPSS Modeler that I haven't found in RapidMiner.

When you are creating a process, SPSS Modeler allows you to set a flag on any process step, which tells it to cache the output when run. This allows for a rapid development cycle of your process, because the tool is smart enough not to restart from the beginning of the process, but rather from a cached intermediate result.

For example: I have a CSV file with 12 million records, where I'm doing a lot of transformation and aggregation. At a certain point in the process, the intermediate result set is only 100 thousand records. I mark this spot as 'to be cached'. Next I continue developing my process, and add a few steps. Checking the result is really fast, since it can simply start with the cached set of 100k records each time I run it, and not from the starting set of 12M.

The thing I like about this feature is that it is totally transparent: I only have to mark the spot, and SPSS Modeler handles the rest.

I haven't found this in RapidMiner, which means that each time I want to check the result of my process, it has to start from scratch, running through each and every step again.

Did I overlook something? Is a similar feature available in RapidMiner?

Thanks for your input.



  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering

    While you can view intermediate step results via so-called "breakpoints", you currently cannot start a process from such a breakpoint. You either have to continue the process after entering a breakpoint or start it from the beginning again. There is a workaround using the "Store" and "Retrieve" operators, but I'll admit it's clunky and not exactly convenient to use.
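
    To illustrate the general store-and-retrieve idea outside of RapidMiner, here is a minimal Python sketch. It is not RapidMiner code and does not use any RapidMiner API; the names `cached_step` and `expensive_aggregation` and the use of a pickle file are assumptions made purely for illustration.

```python
import pickle
from pathlib import Path


def cached_step(cache_path, compute):
    """Return the stored result if present; otherwise compute it and store it."""
    path = Path(cache_path)
    if path.exists():
        # "Retrieve": reuse the intermediate result written by an earlier run
        with path.open("rb") as fh:
            return pickle.load(fh)
    # "Store": run the expensive step once and persist its output
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


def expensive_aggregation():
    # Hypothetical stand-in for the slow transformation and aggregation steps
    return [{"group": g, "count": g * 10} for g in range(5)]
```

    Calling `cached_step("cache/aggregated.pkl", expensive_aggregation)` would run the slow step on the first call only and load the file on later calls. Note that this sketch never invalidates the cache by itself: if the upstream steps change, the file has to be deleted by hand, which is part of why this kind of workaround feels clunky.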

    Remembering data at a certain point and allowing the process to be started from there is on our list of "cool features we want to have"; however, it does not yet exist.

  • jeremy Member Posts: 11 Contributor I

    Has process caching moved any further along in development? This would be great to see soon.
