"Issues Importing CSV"

jbartot Member Posts: 4 Contributor I
edited May 2019 in Help
Hi,

I am trying to import a CSV file that is about 25 MB in size, and RM really struggles to process it.  It maxes out the processors for about 10 minutes before it finally gives up and runs out of memory.  I explicitly set the Java heap size at launch and can see that the OS is giving RM the 2 GB of memory I specified.  I tried a smaller file (one fifth the size) and got similar behavior.

I have tried importing both into a repository and into the workspace, and I get the same behavior either way.  The data itself is 500 x 12,000 (bag-of-words document vectors).  Even assuming each feature value takes up 8 bytes (a double), that is only about 48 MB of raw values, so it doesn't make sense that this is such a struggle.
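
For reference, here is that back-of-the-envelope estimate as a small script. It is only a rough sketch: it assumes a dense table of 8-byte doubles and ignores any per-object or parsing overhead inside the JVM, and the function name is made up for illustration.

```python
def dense_size_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Raw size of a dense n_rows x n_cols table of fixed-width values.

    Assumption: every cell is stored as one fixed-width number
    (8 bytes for a double) with no extra overhead.
    """
    return n_rows * n_cols * bytes_per_value


# Dimensions from the post: 500 x 12,000 bag-of-words vectors.
size = dense_size_bytes(500, 12_000)

# Heap size from the post: 2 GB, taken here as 2 * 2**30 bytes.
heap = 2 * 2**30

print(f"raw values: {size} bytes ({size / 2**20:.1f} MiB)")
print(f"share of a 2 GB heap: {size / heap:.1%}")
```

Under those assumptions the raw values come to 48,000,000 bytes, which is a small fraction of the heap, so the out-of-memory error would have to come from overhead during import rather than from the data itself.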

Any ideas?  Am I thinking about this right?

Any help would be appreciated.

Jay

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    well, that shouldn't happen. Is the data confidential? If not, could you send it to me? Then I will include it in our checks.

    Greetings,
      Sebastian
  • jbartot Member Posts: 4 Contributor I
    Happy to share the data.  Given its size, where should I send it?

    Thanks

    Jay
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    please send me an email. If you compress the data, it should fit in my mail account.

    Greetings,
      Sebastian