RapidMiner

Read Database causes Out of Memory Error regardless of the amount of data read

Status: Investigating

In many situations you need to split a huge data set by iteratively reading it from the database and processing each chunk. Unfortunately, Read Database seems to have a cache of its own, or at least does not release everything until the process is stopped.

We ran into this even though the subprocess containing Read Database generates only very small results. Every single chunk was of manageable size, but we still ran out of memory.
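
To make the access pattern concrete, here is a minimal Python sketch of chunk-wise reading outside RapidMiner. It is only an illustration: the in-memory SQLite database, the table `records`, and the helper names `build_demo_db`, `iter_chunks`, and `process_in_chunks` are all hypothetical and say nothing about how Read Database is implemented. The point is that each chunk can be released before the next one is fetched, so memory use should stay roughly proportional to the chunk size rather than grow with the number of chunks.

```python
import sqlite3


def build_demo_db(n_rows):
    """Create a hypothetical in-memory table standing in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO records (id, value) VALUES (?, ?)",
        ((i, i * 0.5) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def iter_chunks(conn, chunk_size):
    """Yield lists of at most chunk_size rows, one chunk at a time."""
    cursor = conn.execute("SELECT id, value FROM records ORDER BY id")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


def process_in_chunks(conn, chunk_size):
    """Reduce each chunk to a small result and keep only that result."""
    total = 0.0
    chunk_count = 0
    for rows in iter_chunks(conn, chunk_size):
        # Only the running total survives; the chunk itself can be freed
        # as soon as the next iteration rebinds `rows`.
        total += sum(value for _id, value in rows)
        chunk_count += 1
    return total, chunk_count


if __name__ == "__main__":
    conn = build_demo_db(1000)
    print(process_in_chunks(conn, 100))
```

This is the behaviour we expected from a loop around Read Database: only the small per-chunk result is kept between iterations.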

We then put the Read Database operator into a Cache subprocess. This operator stores the results of its inner operators, so that if the process is re-executed there is no further database access. While the first iteration of the process still crashed, the second ran through, because 80% of the chunks came from the Cache operator.
That made us suspicious. We then removed the Read Database operator and put it into another process, which is executed instead of the operator itself. It seems like a miracle, but then it ran through smoothly! Instead of exhausting 2 GB of memory, the process now used only 200 MB. There seems to be a HUGE memory leak in this operator, which is especially serious because this operator tends to be executed multiple times in a process!

Please give an indication of when this can be fixed, because this is a major issue for us in our projects. Moving the operator into a separate process is not really a convenient workaround.

 
