[SOLVED] Hardware Recommendations for Running RapidMiner

jcurry1 Member Posts: 24 Contributor II
edited November 2018 in Help
Folks,

I will be starting a text mining project and hope to use RapidMiner.  The data could get fairly big.

Does anyone have hardware recommendations for running RapidMiner? I may be able to run on a reasonably good UNIX server.

Is there more value in extra memory, for example, or in a faster processor or faster storage?

I will be using the Community Edition at this point, so I presume that processing will be done in memory rather than in the database, in which case fast storage wouldn't offer any benefit. So my guess is that memory and processor are the things to look at. But there may also be a scale beyond which the benefits stop increasing.

(I posted about this last week but didn't get responses. I've tried to rephrase my query in a clearer way.)

Regards,
John.

Answers

  • fritmore Member Posts: 90 Contributor II
    depends on the problem at hand.
    you can run RM on fairly complex problems on a laptop.

    some problems may need 2345GB of ram.
    some will do with 200MB.

    some problems may need all the time in the universe, some 1s.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi John,

    Basically fritmore is right: the hardware recommendations depend heavily on the task at hand. However, some general things can be said:

    - most RapidMiner processes can only use one CPU
    - if you have large amounts of data, you should consider a machine with a reasonable amount of RAM
    - executing long-running processes on your workstation/laptop is at best inconvenient.

    That said, I would recommend setting up RapidAnalytics on your high-performance server. RapidAnalytics offers a repository, i.e. your data and processes are stored on the server, but you can access them in the usual way from within RapidMiner as if they were in your local repository. That way you can:
    - design the processes on your personal laptop/workstation at home or at work
    - execute the processes on the RapidAnalytics server with one click from within RapidMiner
    - access the results as usual from within RapidMiner

    Then it is no problem to shut down your laptop while a process is running, since it's executed on the server, or to design the next process while the previous one is running.

    RapidAnalytics can execute several processes at the same time and thus use multiple CPUs. The only limit is the available RAM. So the first priority for your server should be a reasonable amount of RAM, and the second fast CPUs.

    You probably also want to store your data in a database. That one could run on the same machine as RapidAnalytics, or on another machine. For this I can't give any recommendations without knowing your specific use case and budget.

    Best,
    Marius
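
The sizing advice above (one CPU per process, with RAM as the limit on how many processes run at once) can be turned into a quick planning check. This is only a rough sketch: the function name, the reserved-RAM figure, and the example numbers are illustrative assumptions, not RapidMiner or RapidAnalytics settings.

```python
def max_concurrent_processes(total_ram_gb, ram_per_process_gb, cpu_cores,
                             reserved_ram_gb=2.0):
    """Estimate how many single-CPU processes a server could run at once.

    Assumptions (not taken from any RapidMiner documentation):
    - each process uses one core and a roughly known peak amount of RAM
    - reserved_ram_gb is set aside for the OS and the server itself
    """
    if ram_per_process_gb <= 0:
        raise ValueError("ram_per_process_gb must be positive")
    usable_ram_gb = max(total_ram_gb - reserved_ram_gb, 0.0)
    limit_by_ram = int(usable_ram_gb // ram_per_process_gb)
    # Whichever resource runs out first sets the limit.
    return min(limit_by_ram, cpu_cores)


# Hypothetical server: 64 GB RAM, 16 cores, processes peaking at about 8 GB each.
print(max_concurrent_processes(64, 8, 16))
```

With these made-up numbers the RAM runs out before the cores do, which is the situation described above and the reason to prioritise RAM over CPU count.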
  • jcurry1 Member Posts: 24 Contributor II
    Thanks for that Marius.  Just as regards databases, I will be accessing data from an Ingres SQL Database or Vectorwise Database that may be held on the same server or else will connect through Vnodes to other servers.

    As I understand it, RapidMiner does all of its processing in memory, so the database setup wouldn't affect performance, i.e. the data is read into memory through one of the read-in nodes, and for all of the processing from then on the database is out of the equation. Is it fairly accurate to say this?
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    jcurry1 wrote:
    As I understand it, RapidMiner does all of its processing in memory, so the database setup wouldn't affect performance, i.e. the data is read into memory through one of the read-in nodes, and for all of the processing from then on the database is out of the equation. Is it fairly accurate to say this?
    Yes, that's correct.
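
Since the data ends up in memory, a back-of-the-envelope size estimate can help decide how much RAM to ask for. The sketch below assumes a dense table with 8 bytes per value. That is a simplifying assumption for illustration, not a statement about how RapidMiner actually stores data, and the document and term counts are made up.

```python
def estimate_dense_bytes(n_rows, n_columns, bytes_per_value=8):
    """Rough size of a fully dense numeric table (assumed layout, 8-byte values)."""
    return n_rows * n_columns * bytes_per_value


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Hypothetical text mining job: 100,000 documents, 20,000 distinct terms.
size_bytes = estimate_dense_bytes(100_000, 20_000)
print(f"about {to_gib(size_bytes):.1f} GiB if stored densely")
```

Text data is usually very sparse, so a real footprint may be much smaller than the dense figure. Treat it as an upper bound for planning rather than a prediction.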
  • jcurry1 Member Posts: 24 Contributor II
    That's great.  Thanks for all the help.
    John.