
Automodel - large (?) CSV dataset memory issues

baeikene Member Posts: 1 Learner I
edited June 2019 in Help

I'm doing research on the CIC IDS 2017 dataset which contains 200-300MB of data for one file.

I am trying to run Auto Model to predict the source IP based on the other attributes. I run into memory issues doing this (I have 16 GB of RAM), but I assume that I used too large a dataset or too many attributes for the modeling.

So my question is: roughly how many rows and attributes can I expect Auto Model to handle?

 


Answers

  • rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hello,

     

    RapidMiner is a bit resource-hungry, but loading a file of that size shouldn't be a problem. I have 4 GB of RAM on my MacBook Air and can load the file.

     

    The thing is that with such a limited amount of memory, I usually do three things to keep usage down:

     

    1. I write the CSV into an IOObject.
    2. I split my model into several smaller models that still make sense for the user/developer.
    3. I use the Free Memory operator when I cannot split a model into two or three smaller ones.
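    The chunk-and-free idea behind steps 2 and 3 can be sketched outside of RapidMiner's GUI. Here is a minimal Python illustration (the `summarize` helper and the chunk size are hypothetical, only there to show the pattern of holding one slice of the data at a time and releasing it before moving on):

```python
import csv
import io

def summarize(chunk):
    # Hypothetical per-chunk work; here we just count rows per chunk.
    return len(chunk)

def process_in_chunks(csv_text, chunk_size=2):
    """Read a CSV in fixed-size row chunks so only one chunk is
    resident in memory at a time (analogous to splitting a big
    process into smaller ones and freeing memory between steps)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chunk, results = [], []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            results.append(summarize(chunk))
            chunk = []  # drop the chunk so it can be garbage-collected
                        # (the "Free Memory" step)
    if chunk:
        results.append(summarize(chunk))
    return results

sample = "src_ip,bytes\n10.0.0.1,100\n10.0.0.2,200\n10.0.0.1,50\n"
print(process_in_chunks(sample))  # chunks of 2 rows -> [2, 1]
```

    The same trade-off applies inside RapidMiner: smaller intermediate results mean more steps, but each step fits in memory.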

    However, I also tune my RapidMiner Studio installation to use more memory. In this case:

     

    Under Preferences > System > Data Management, I configure that number to be at least twice the size of the training data.

     

    [Screenshot: Preferences > System > Data Management memory setting]
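    As a back-of-envelope check on the "twice the training data" rule, you can estimate the raw in-memory size of a table from its row and attribute counts. This sketch assumes every cell is stored as an 8-byte double, which is a lower bound (real in-memory tables carry extra per-object overhead); the CIC IDS 2017 figures below are illustrative, not exact:

```python
def estimated_memory_mb(rows, attributes, bytes_per_value=8):
    """Rough lower bound on in-memory table size: every cell
    stored as an 8-byte double, converted to megabytes."""
    return rows * attributes * bytes_per_value / 1024 ** 2

# Illustrative CIC IDS 2017 scale: ~2.8M flow records x ~80 features.
print(round(estimated_memory_mb(2_800_000, 80)))  # -> 1709 (MB)
```

    So a table at that scale needs on the order of 1.7 GB just for the raw values, and by the rule of thumb above you would budget at least twice that before modeling overhead.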

     

    Hope this helps,

     

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi,

     

    can you tell us at which step of Auto Model this happens? If you read the file and save it as an IOObject in the repository, do you still have the same problem?

     

    You can also try avoiding Deep Learning or Gradient Boosting models, which are very resource-intensive.

     

    Cheers,

    Sebastian
