
Locking up during data import

drobertson123 Member Posts: 4 Contributor I
edited November 2018 in Help
Hello

I am hoping someone has some advice. Being new to RapidMiner, I am not sure whether I am missing something, but this doesn't seem right.

I am seeing a consistent problem when I try to import data from a CSV file. The file contains roughly 5 million rows. Each row holds three comma-separated values: a date (example: 12/3/2010), an integer representing the time of day, and a decimal value. Everything seems to go fine while I set up the import specification. When I actually tell it to finish and run the import, the software freezes. If I leave and come back, the program window is black, and it stays that way until I kill the RapidMiner process.
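To make the row format concrete, here is a minimal Python sketch (outside RapidMiner) that parses one line of the file as described. It assumes the date is month/day/year, which the post does not state: under that reading 12/3/2010 is 3 December, but under day/month/year it would be 12 March. The unit of the time column is also unknown, so it is kept as a plain integer. The `Reading` and `parse_row` names and the sample line are hypothetical.

```python
from datetime import date
from decimal import Decimal
from typing import NamedTuple


class Reading(NamedTuple):
    day: date          # the date column, e.g. "12/3/2010"
    time_of_day: int   # integer time within the day (unit not stated in the post)
    value: Decimal     # the decimal measurement


def parse_row(line: str) -> Reading:
    """Parse one 'date,time,value' line, assuming month/day/year dates."""
    # Unpacking into exactly three names raises ValueError on a wrong field count.
    date_text, time_text, value_text = line.strip().split(",")
    month, day, year = (int(part) for part in date_text.split("/"))
    return Reading(date(year, month, day), int(time_text), Decimal(value_text))


# Hypothetical sample line in the described format.
print(parse_row("12/3/2010,930,1.2345"))
```

A line with the wrong number of fields raises `ValueError` at the unpacking step, which is a quick way to surface a stray separator.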

In Task Manager, RapidMiner is not using any CPU cycles and isn't consuming much RAM. The program just seems to be blocked.

Does anyone have any idea what is happening and what to do to fix it?


Thanks for the help.

Doug

Answers

  • haddock Member Posts: 849 Maven
    Greetings Doug, and welcome!

    Assuming you've got enough RAM etc., in your position I'd break the problem down by ...

    1. Breaking the data into chunks.
    2. Cutting down the column separator possibilities in the CSV read operator.

    I suggest these because this sort of problem can be caused simply by a wayward column separator, like a space, and scrolling through five million lines is not hugely thrilling!
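Outside RapidMiner, the same two checks can be sketched in Python: read the file with the comma as the only separator, and report which rows do not have exactly three fields, grouped by the chunk of rows each one falls in. The function name `find_bad_rows`, the default chunk size, and the sample rows are illustrative assumptions, not anything from RapidMiner.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str], expected_fields: int = 3,
                  delimiter: str = ",", chunk_size: int = 100_000
                  ) -> Dict[int, List[Tuple[int, int]]]:
    """Group malformed rows by chunk: {chunk_index: [(row_number, field_count), ...]}."""
    bad: Dict[int, List[Tuple[int, int]]] = {}
    # Only `delimiter` splits fields, so a stray space stays inside a field.
    for row_number, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        if len(row) != expected_fields:
            chunk = (row_number - 1) // chunk_size  # 0-based chunk index
            bad.setdefault(chunk, []).append((row_number, len(row)))
    return bad


# Hypothetical sample: row 2 is missing a field, row 3 has an extra one.
sample = ["12/3/2010,930,1.5\n", "12/3/2010,945\n", "12/3/2010,1000,2.0,7\n"]
print(find_bad_rows(sample, chunk_size=2))
```

For the real file, something like `with open("data.csv", newline="") as f: print(find_bad_rows(f))` scans the rows one at a time rather than loading them all, since `csv.reader` consumes the file lazily; `data.csv` is a placeholder name.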

    Happy hunting, hope that nails it down  ;D
  • drobertson123 Member Posts: 4 Contributor I
    Thanks for the advice.

    I have 8 GB of RAM on a Windows 7 system. I tried a smaller batch of data (4,000 records) and it worked. I am trying to narrow down where the issue is, but I still can't find it.

    Are there limits on the number of rows that can be imported? I work with large data sets, and it would be nice to know what limitations I have. Also, should I be raising the memory settings anywhere to get better performance on large data sets?
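One rough way to approach the memory question is to estimate the raw size of the data. The Python sketch below assumes each value is stored as a single 8-byte double; that is an assumption, since the thread does not say how RapidMiner represents rows internally, and per-object and JVM overhead will push real usage higher. For 5 million rows of 3 columns it gives 120,000,000 bytes, about 114 MiB, so treat the result as a floor rather than a prediction. The function name `raw_table_bytes` is hypothetical.

```python
def raw_table_bytes(rows: int, columns: int, bytes_per_value: int = 8) -> int:
    """Lower-bound size of a numeric table, assuming one fixed-size value per cell."""
    return rows * columns * bytes_per_value


# The data set described in the question: ~5 million rows, 3 columns.
size = raw_table_bytes(5_000_000, 3)
print(f"{size:,} bytes = {size / 2**20:.1f} MiB")
```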

    I appreciate any advice you can give. This looks like a great tool, but I am still learning a lot.

    Doug
  • haddock Member Posts: 849 Maven
    Hi there,

    As far as I know, the limits are OS-imposed, and the memory allowed is tweakable in the startup scripts; but I'm on XP and Vista 64 and not familiar with 7.

    Good luck!

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I would suggest switching to the result perspective while executing the process and watching the memory monitor. If memory consumption rises steadily until the monitor turns red and the GUI starts to hang, then you simply do not have enough memory.
    Of course, this might be caused by wrong parameter settings in the importer, but that is unlikely if it works with 4,000 samples.

    Greetings,
      Sebastian
  • drobertson123 Member Posts: 4 Contributor I
    Thanks, everyone, for the support.

    I figured out what the issue was. I am running Windows 7 x64, and I had the 32-bit version of RapidMiner installed. Even though it ran in 32-bit mode, it seemed to cause many problems. Please watch out for this in the future.

    I now have the 64-bit version installed, and it works fine.
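One plausible reason the bitness mattered, though the thread does not confirm it: a 32-bit process uses 32-bit pointers, so it can address at most 2^32 bytes (4 GiB) no matter how much RAM is installed, and on Windows the share available to a Java heap is smaller still. The Python sketch below only illustrates that arithmetic and reports the pointer width of the interpreter running it; for RapidMiner, the bitness that matters is that of its Java runtime. The function name `address_space_bytes` is hypothetical.

```python
import struct


def address_space_bytes(pointer_bits: int) -> int:
    """Theoretical upper bound on memory addressable with pointers of this width."""
    return 2 ** pointer_bits


# Pointer width of this Python interpreter (not of RapidMiner's JVM).
bits = struct.calcsize("P") * 8
print(f"This interpreter is {bits}-bit; ceiling {address_space_bytes(bits) / 2**30:.0f} GiB")
print(f"A 32-bit process tops out at {address_space_bytes(32) / 2**30:.0f} GiB")
```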

    I appreciate the good advice people gave.

    Thanks,
    Doug Robertson