Write Database for large dataset

elgolubevaelgolubeva Member Posts: 10 Contributor I
edited October 2019 in Help

Hi!

How is it possible to decrease memory usage and execution time of the Write Database operator? 

I must save in a database a dataset of about 2.2 million rows and about 25 columns. I use SQL Server 2012.

I didn't notice any difference between using batch size = 10000 and batch size = 100000: in both ways the operator took about 42 sec.

Writing my dataset first to a .txt file, than using Read CSV and than Write Database also didn't decrease the time.

Maybe Execute SQL can help?

 

Thank you for ideas.

Tagged:

Best Answer

  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted

    Hi!

     

    Java has very special ways for managing memory. Even if Free Memory cleans up some old example sets and starts a garbage collection, the overall memory usage of the Java process will stay the same for a while. It might be lower some minutes later when Java decides to give a portion back to the operating system.

     

    Also, RapidMiner 5 was written in Java 6 times. Current Java versions probably behave differently with it.

     

    Regards,

    Balázs

Answers

  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi!

     

    The batch size depends on many factors. 10,000 could even be too large - try with lower values like 1,000 and 3,000. 

     

    Execute SQL likely won't help: you would need to create the insert strings yourself and execute line by line (or in batches) - that's what Write Database does in optimized code. 

     

    I think 42 seconds for 2.2 million rows is quite good - improbable that you can improve it dramatically. 

     

    Are you filling an existing table? Does it have many indexes and constraints defined? Could you drop the indexes beforehand, insert the data, and rebuild the indexes? This is a frequently mentioned optimization for bulk data import into databases.

    For example, if you make sure that the ID field is unique in RapidMiner, you wouldn't need the database to check it on every insert.

     

    Regards,

    Balázs

  • elgolubevaelgolubeva Member Posts: 10 Contributor I

    Balarz,

    thank you very much for your reply.

    My table has no indexes or constraints defined, I know that they increase the execution time of Write Database operator.

    Using lower values of batch size had no effect, but now I know that using too large batch size doesn't help either.

     

    I would also like to know, if it is possiple to notice the effect of Free Memory operator in RapidMiner 5.

    I look at Task Manager before and after Free Memory operator execution and see no difference in Physical Memory percentage (e.g. 96%)

     

     

     

  • elgolubevaelgolubeva Member Posts: 10 Contributor I

    Balazs,

    thank youfor your help.

Sign In or Register to comment.