Memory recycling

tennenrishin Member Posts: 177 Contributor II
edited November 2018 in Help
Background:

I have quite a large process (several thousand lines of XML) that generates a set of reports one after the other. Only reports that are not up to date are generated; up-to-date reports are skipped. After generating a report, the process marks it as up-to-date in the file system.

Therefore the process can be interrupted and re-invoked at will. The task of updating all reports will eventually be completed provided the process receives enough running-time in total.
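To make the control flow concrete, here is a minimal Java sketch of the idea (the file names and the ".done" marker convention are hypothetical illustrations, not my actual process):

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    public class ReportUpdater {
        // Hypothetical convention: "<report>.done" marks a report as up to date.
        static boolean isUpToDate(File report) {
            return new File(report.getPath() + ".done").exists();
        }

        static void generateReport(File report) {
            // ... expensive report generation ...
        }

        static void markUpToDate(File report) throws IOException {
            new File(report.getPath() + ".done").createNewFile();
        }

        public static void main(String[] args) throws IOException {
            List<File> reports = List.of(new File("r1.html"), new File("r2.html"));
            for (File report : reports) {
                if (isUpToDate(report)) continue; // skip work that is already done
                generateReport(report);
                markUpToDate(report);             // persist progress in the file system
            }
            // Interrupting and re-invoking main() resumes where it left off.
        }
    }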

Problem:

When this process is run, RM gradually consumes more and more memory and slowly grinds to a halt, eventually reporting something to the effect of "Insufficient memory to complete this process". However, when I manually interrupt and re-invoke the process a few times, the task completes quickly and without any problems.

But ideally I would like this process to complete on its own, without regular user intervention. How can I accomplish this? (It would be an ugly workaround, but is there some way to emulate the effect that stopping and restarting a process has on memory?)
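For instance, I imagine something like the following Java wrapper, which relaunches the process in a fresh JVM until a completion marker appears (the command line and the marker file are placeholders, since I do not know the proper batch invocation):

    import java.io.File;
    import java.io.IOException;

    public class RestartWrapper {
        public static void main(String[] args) throws IOException, InterruptedException {
            // Placeholder command: substitute the real batch invocation of the process.
            String[] cmd = {"run-my-process"};
            // Hypothetical sentinel written by the process once all reports are up to date.
            File allDone = new File("all-reports.done");
            while (!allDone.exists()) {
                Process run = new ProcessBuilder(cmd).inheritIO().start();
                run.waitFor(); // each run is a fresh JVM, so its memory is fully reclaimed
            }
        }
    }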

More background

1. Note that the reports are generated in a separate process, which is called iteratively by the parent process. I would therefore expect RM to release all of the child process's memory when it completes. Is this assumption correct?

2. I have also tried invoking the Free Memory operator after each call to the child process, but this seems to have made no difference.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    it's hard to say anything without actually seeing the process. Are the processes that generate the reports only called when the reports need updating?
    Does the memory problem also occur if you do something else in the subprocesses instead of generating reports?
    How big is your data? Do you have a loop, and have you connected anything to its output port that you no longer need? In that case, you should cut the corresponding connections.

    Best regards,
    Marius
  • tennenrishin Member Posts: 177 Contributor II
    Yes, understandably so. Thanks for responding. I will try to explain it better:

    Let process A have a Loop Repository operator. Inside this loop, process B is called.

    Process B
    * reads some ExampleSet from the repository,
    * performs some pre-processing before returning the data to process A via its output, and
    * optionally generates an HTML report. It decides whether to generate the report by checking whether that report already exists in the file system.

    The important point is that process B's output is not affected by whether it performed this optional last step. As far as process A is concerned, process B therefore behaves the same either way, except that it takes longer when it has to generate the report.

    Now, when the reports already exist in the file system (so that process B does less work), process A completes successfully with plenty of memory to spare. However, when the reports do not exist (so that process B has to do more work), process A runs out of memory before it can complete.

    These observations contradict my expectation that process B would release the memory it used internally when it exits. Instead, memory is consumed cumulatively until I stop both process B and process A.

    (And the peculiar result of all this is that I have to manually stop and restart process A a few times for it to complete successfully, while I actually need to be able to "fire-and-forget" it.)
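
    To illustrate the kind of behaviour I am seeing (this sketch is purely my own assumption about one possible mechanism, not RM internals), a static collection that outlives each child run would produce exactly this cumulative growth:

        import java.util.ArrayList;
        import java.util.List;

        public class LeakSketch {
            // A static collection outlives every child run: anything added here stays
            // reachable until the whole JVM stops, so the GC can never reclaim it.
            static final List<byte[]> retained = new ArrayList<>();

            static void childProcess() {
                byte[] report = new byte[10 * 1024 * 1024]; // stands in for report data
                retained.add(report); // still reachable after childProcess() returns
            }

            public static void main(String[] args) {
                for (int i = 0; i < 1000; i++) {
                    childProcess(); // heap grows ~10 MB per iteration until it is exhausted
                }
            }
        }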

    Thanks,
    Isak
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Isak,

    thank you for this very detailed and comprehensible description.
    tennenrishin wrote:

    These observations contradict my expectation that process B would release the memory it used internally when it exits. Instead, memory is consumed cumulatively until I stop both process B and process A.
    You are right: that is how it is supposed to work. Unfortunately, that is all I can say at the moment. We'll have a look at it.

    Best regards,
    Marius
  • tennenrishin Member Posts: 177 Contributor II
    Thanks Marius.

    This may be the same issue as
    http://rapid-i.com/rapidforum/index.php/topic,6072.0.html
    because, as far as I recall, I usually restart RM (for good measure) before rerunning the process.

    In other words (unless I'm forgetting some evidence), it may in fact be that process B holds onto that memory regardless of whether process A has stopped.

    Regards
    Isak
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Yes, it is most probably the same issue, especially since the problem evidently only occurs when process B creates the reports.

    Best regards,
    Marius