Locking up during data import

drobertson123 Member Posts: 4 Contributor I
edited November 2018 in Help
Hello

I am hoping someone has some advice.  I am new to RapidMiner, so I may be missing something, but this doesn't seem right.

I am seeing a consistent problem when I try to import data from a CSV file.  The file contains roughly 5 million rows.  Each row holds three comma-separated values: a date (example: 12/3/2010), an integer representing the time of day, and a decimal value.  Everything seems to go fine during the import specification process.  When I ask it to finish and actually run the import, the software freezes.  If I go away and come back, the program screen is black.  It stays that way until I kill the RapidMiner process.

In Task Manager, RapidMiner is not using any CPU cycles and is not consuming much RAM.  The program just seems to be blocked.

Does anyone have any idea what is happening and what to do to fix it?


Thanks for the help.

Doug

Answers

  • haddock Member Posts: 849 Maven
    Greetings Doug, and welcome!

    Assuming you've got enough RAM etc., in your position I'd break the problem down by ...

    1. Breaking the data into chunks.
    2. Cutting down the column separator possibilities in the CSV read operator.

    This sort of problem can be caused simply by a wayward column separator, like a space, and scrolling through five million lines is not hugely thrilling!
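For the first step, here is a minimal Python sketch of splitting a large file into smaller pieces that can be imported one at a time. It assumes the file has no header row and that no field contains an embedded line break; the function names, the chunk size, and the `chunk_NNN.csv` output names are made up for illustration and are not part of RapidMiner.

```python
def split_rows(lines, chunk_size):
    """Yield lists of at most chunk_size lines from any iterable of lines."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def write_chunks(source_path, chunk_size=500_000, prefix="chunk"):
    """Write each chunk to its own file and return the list of file names.

    Assumes no header row; lines are copied through unchanged.
    """
    names = []
    with open(source_path, newline="") as src:
        for index, chunk in enumerate(split_rows(src, chunk_size)):
            name = f"{prefix}_{index:03d}.csv"
            with open(name, "w", newline="") as out:
                out.writelines(chunk)
            names.append(name)
    return names
```

Calling `write_chunks("data.csv")` (where `data.csv` is a placeholder path) would write `chunk_000.csv`, `chunk_001.csv`, and so on. If one chunk fails to import while the others succeed, the problem rows are probably in that chunk.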

    Happy hunting, hope that nails it down  ;D
  • drobertson123 Member Posts: 4 Contributor I
    Thanks for the advice.

    I have 8 GB of RAM on a Windows 7 system. I tried a smaller batch of data (4000 records) and it worked.  I am trying to nail down where the issue is, but I still can't seem to find it.

    Are there limitations on the number of rows imported?  I work with large data sets and it would be nice to know what limitations I have.  Also, should I be upping the memory settings anywhere to get better performance on large data sets?

    I appreciate any advice you can give.  This looks like a great tool, but I am still learning a lot.
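One way to narrow down a bad row without scrolling through the file is to scan it outside RapidMiner. The following is a rough Python sketch that assumes the layout described in the question (three comma-separated fields: a date, an integer, a decimal) and assumes the dates are month/day/year, which the single example 12/3/2010 does not confirm. The function name and the `limit` parameter are invented for this example.

```python
import csv
from datetime import datetime


def find_bad_rows(lines, delimiter=",", limit=20):
    """Return up to `limit` (line_number, row) pairs that do not parse.

    A row is treated as good when it has exactly three fields that parse
    as a month/day/year date (an assumption), an integer, and a float.
    """
    bad = []
    reader = csv.reader(lines, delimiter=delimiter)
    for number, row in enumerate(reader, start=1):
        try:
            if len(row) != 3:
                raise ValueError("expected 3 fields")
            datetime.strptime(row[0].strip(), "%m/%d/%Y")
            int(row[1])
            float(row[2])
        except ValueError:
            bad.append((number, row))
            if len(bad) >= limit:
                break
    return bad
```

To run it against a file, open the file with `open(path, newline="")` and pass the file object as `lines`. An empty result means every row matched the assumed layout, which would point away from a stray separator and toward something like memory or the installation.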

    Doug
  • haddock Member Posts: 849 Maven
    Hi there,

    As far as I know the limits are OS imposed, and the memory allowed is tweakable in the startup scripts; but I'm on XP and Vista 64 and not familiar with 7.

    Good luck!

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I would suggest switching to the result perspective while executing the process and watching the memory monitor. If the memory consumption increases steadily until the monitor turns red and the GUI starts to hang, then you simply do not have enough memory.
    Of course, this might be caused by wrong parameter settings in the importer, but that is unlikely if it works with 4000 samples.

    Greetings,
      Sebastian
  • drobertson123 Member Posts: 4 Contributor I
    Thanks, everyone, for the support.

    I figured out what the issue was.  I am running Windows 7 x64 and I had the 32-bit RapidMiner installed.  Even though it was running in a 32-bit space, it seemed to cause many problems.  Please watch out for this in the future.

    I now have the 64-bit version installed and it works fine.

    I appreciate the good advice people gave.

    Thanks,
    Doug Robertson