"CSV File too big to process?"

eldenoso Member Posts: 65 Contributor I
edited June 2019 in Help

Hello everyone,

I have a problem with my input data. I want to read a 2 GB CSV file, but every time the operator stops at 40% and an error message says there is not enough memory (I have 16 GB RAM). What can I do about that? Since RapidMiner is data mining software, I expected it to handle things like that easily.

Thank you for your help :-)

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    My first suggestion would be to split your CSV into smaller chunks (using any free utility such as a CSV splitter), read them in with a Loop Files operator, and then join or append them together. I don't know how RapidMiner handles memory management for files that large. Perhaps one of the RM staffers will have another suggestion.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
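
If no splitter utility is at hand, the chunking step suggested above can be sketched in a few lines of Python. This is only an illustration, not part of RapidMiner: the function name `split_csv`, the `chunk_NNNN.csv` naming, and the UTF-8 encoding are assumptions. The header row is repeated in every chunk so that each file can be read on its own.

```python
import csv
import os


def split_csv(src_path, out_dir, rows_per_chunk):
    """Split src_path into numbered CSV files, repeating the header in each.

    Returns the list of chunk file paths in the order they were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return chunk_paths
        out_file = None
        writer = None
        try:
            for row_index, row in enumerate(reader):
                # Start a new chunk file every rows_per_chunk data rows.
                if row_index % rows_per_chunk == 0:
                    if out_file is not None:
                        out_file.close()
                    path = os.path.join(
                        out_dir, f"chunk_{len(chunk_paths):04d}.csv"
                    )
                    out_file = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out_file)
                    writer.writerow(header)
                    chunk_paths.append(path)
                writer.writerow(row)
        finally:
            if out_file is not None:
                out_file.close()
    return chunk_paths
```

Only one row is held in memory at a time, so the size of the source file does not matter. The resulting folder can then be pointed at from a Loop Files operator.
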
  • zprekopcsak RapidMiner Certified Expert, Member Posts: 47 Guru

    Hi,

    Can you share a data sample so we can investigate?

    Without knowing anything about the data, I would have two suggestions to try:

    1. Make sure that the attribute types are properly set in the import wizard. If you store a datetime as a nominal instead of a proper datetime, the memory footprint grows significantly. The same applies to attributes that have only missing values in the first X rows: RapidMiner cannot guess their types, so unless you set them manually, they default to nominal.
    2. You may want to try the in-product beta mode that has a lower memory footprint in general. See more details here: http://static.rapidminer.com/rnd/html/rapidminer-7.3-beta-mode.html

    Best,
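
The point about type choice can be illustrated outside RapidMiner. The sketch below is a rough Python analogy, not a measurement of RapidMiner's internal storage: it compares the size of timestamps kept as text with the same timestamps kept as 8-byte integers. The sample size and the date format are assumptions chosen for the example.

```python
import sys
from array import array
from datetime import datetime, timedelta

# Hypothetical sample: 10,000 consecutive one-second timestamps.
start = datetime(2019, 1, 1)
epoch = datetime(1970, 1, 1)
stamps = [start + timedelta(seconds=i) for i in range(10_000)]

# Stored as text, every distinct timestamp is its own string object.
as_text = [s.strftime("%Y-%m-%d %H:%M:%S") for s in stamps]

# Stored as a number, every timestamp fits in one signed 64-bit integer.
as_epoch = array("q", (int((s - epoch).total_seconds()) for s in stamps))

text_bytes = sum(sys.getsizeof(t) for t in as_text)
epoch_bytes = as_epoch.itemsize * len(as_epoch)

print(f"as text:    {text_bytes:,} bytes")
print(f"as integer: {epoch_bytes:,} bytes")
```

Timestamps are close to unique per row, so a text representation cannot benefit much from sharing repeated values, which is why a date column is a likely candidate when memory runs out.
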

  • eldenoso Member Posts: 65 Contributor I

    The only idea I came up with (after searching the web) was to import the CSV into an SQL database (e.g. Postgres), so that I can use the stream data operator.

    Unfortunately I cannot upload the data, but I can describe it. It contains 8 attributes and roughly 80 million examples. The only attribute I had to change in the import wizard was the date (its type was set incorrectly). Ironically, if I don't change the date type, the import succeeds.
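
The database route mentioned above could look roughly like the following Python sketch. It uses SQLite from the standard library as a stand-in for Postgres so that it runs without a server. The table name `examples`, the batch sizes, and both function names are assumptions made for illustration, and the column names are taken from the CSV header.

```python
import csv
import sqlite3
from itertools import islice


def load_csv_into_sqlite(csv_path, db_path, table="examples", batch_size=50_000):
    """Copy a CSV file into an SQLite table in fixed-size batches."""
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        while True:
            # Only batch_size rows are held in memory at any time.
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', batch
            )
            conn.commit()
    return conn


def stream_rows(conn, table="examples", fetch_size=10_000):
    """Yield rows one by one while fetching them from the database in blocks."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    while True:
        rows = cursor.fetchmany(fetch_size)
        if not rows:
            break
        yield from rows
```

All values are loaded as text here. In a real setup the columns would be declared with proper types (date, integer, and so on) when the table is created.
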

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Sometimes dates are not read in correctly by the Read CSV operator, but that's OK. You can always convert those values afterwards with the Nominal to Date operator.
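
The idea of reading the dates as text first and converting them afterwards can be sketched in plain Python. This is an analogy to the conversion step, not the operator itself: the function name `nominal_to_date` and the default date format are assumptions, and values that do not match the format become `None` (a missing value).

```python
from datetime import datetime


def nominal_to_date(values, date_format="%Y-%m-%d %H:%M:%S"):
    """Parse text values into datetimes; unparseable values become None."""
    parsed = []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, date_format))
        except (ValueError, TypeError):
            # Wrong format or a non-string value: treat as missing.
            parsed.append(None)
    return parsed
```

The date format has to match the text exactly, which is the same detail that needs to be specified when converting inside RapidMiner.
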