01-24-2017 04:14 PM
I have a problem with my input data. I want to read a 2 GB CSV file, but every time the operator stops at 40% and an error message says there is not enough memory (I have 16 GB RAM). What can I do about that? Since RapidMiner is data mining software, I expected it to do things like that easily.
Thank you for your help :-)
01-24-2017 05:26 PM
My first suggestion would be to split your CSV into smaller chunks (using any free utility such as a CSV splitter), read them in with a Loop Files operator, and then join/append them together. I don't know how RapidMiner handles memory management for files that large. Perhaps one of the RM staffers will have another suggestion.
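If you don't have a splitter utility at hand, the splitting step could be done with a few lines of Python. This is only a rough sketch: the function name `split_csv`, the `chunk_0000.csv` file naming, and the default chunk size are my own assumptions, not anything RapidMiner requires. It reads the file row by row, so memory use stays small, and it repeats the header line in every chunk so each piece can be read on its own.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_chunk=5_000_000):
    """Split `src` into numbered CSV files, repeating the header in each.

    Returns the list of chunk paths in order. Names and the default
    chunk size are illustrative assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    writer = None
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return paths
        try:
            for i, row in enumerate(reader):
                if i % rows_per_chunk == 0:
                    # Start a new chunk file and write the header first.
                    if out is not None:
                        out.close()
                    path = out_dir / f"chunk_{len(paths):04d}.csv"
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    paths.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return paths
```

For example, `split_csv("data.csv", "chunks", rows_per_chunk=10_000_000)` would turn an 80-million-row file into 8 pieces (the file and folder names here are placeholders). Pick a chunk size that you have checked imports without the memory error.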
01-24-2017 10:50 PM
Can you share a data sample so we can investigate?
Without knowing anything about the data, I would have two suggestions to try:
01-25-2017 03:51 AM
The only idea I came up with (after searching the web) was to import the CSV into an SQL database (e.g. Postgres) so that I can use the stream data operator.
Unfortunately I cannot upload the data, but I can describe it. It contains 8 attributes and roughly 80 million examples. The only attribute I had to change in the import wizard was the date (it was set wrong). Ironically, if I don't change the date type, the import succeeds.
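For reference, the database idea could look roughly like the sketch below. Note the assumptions: it uses SQLite from the Python standard library as a stand-in for Postgres, the function name `load_csv_to_sqlite` and the table name `examples` are made up for illustration, and every column is stored untyped. It inserts the rows in batches, so the whole file never has to sit in memory at once.

```python
import csv
import sqlite3
from itertools import islice


def load_csv_to_sqlite(csv_path, db_path, table="examples", batch_size=100_000):
    """Load a CSV into an SQLite table in batches; returns the row count.

    Assumes the first line is a header and that column names contain
    no double quotes. Table name and batch size are illustrative.
    """
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        con = sqlite3.connect(db_path)
        try:
            con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            while True:
                # Pull at most batch_size rows from the file at a time.
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                con.executemany(
                    f'INSERT INTO "{table}" VALUES ({marks})', batch
                )
                con.commit()
                total += len(batch)
        finally:
            con.close()
    return total
```

Since the columns are untyped here, the date attribute would still arrive as text and need converting afterwards.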
01-25-2017 08:54 AM
Sometimes dates are not read in correctly by the Read CSV operator, but that's OK. You can always convert those date values afterwards with the Nominal to Date operator.