Best Practice for RapidMiner workstation

severin_glaeser Member Posts: 3 Contributor I
edited December 2018 in Help

Dear all,

 

I am a mere system administrator, so I don't know anything about data mining.

 

One of our data miners will get a replacement workstation next year, and I want to know what would be best for him.

 

 

What will increase his performance most?

 

A) more RAM (currently 16 GB; up to 64 GB is possible)

B) more cores (currently 4C/8T)

C) better cores (currently an i7; could be a Xeon)

D) more MHz

E) something completely different

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Dear Severin,

     

    there is no one-size-fits-all answer. In general, the two things one wants are more RAM and more CPU threads. Some algorithms are memory-intensive by default (e.g. FP-Growth); for those I would definitely go for more RAM. The other question is what the future holds in terms of data usage: will he scale up and learn on more data? If yes, RAM.

     

    Otherwise I would go for CPUs. With the recent update, all important operators run in parallel, so doubling the number of cores roughly halves the runtime of those operators.

    Knowing your use case, I would argue for 32 GB RAM and more threads.
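The rule of thumb above (twice the cores, half the runtime) holds when a process parallelizes fully. A minimal sketch using Amdahl's law shows how the gain shrinks when part of the work stays serial. The parallel fractions below are illustrative assumptions, not measurements of any RapidMiner operator, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only `parallel_fraction`
    of the single-core runtime can be spread across `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed fractions, purely for illustration.
    for fraction in (1.0, 0.9, 0.5):
        gain = amdahl_speedup(fraction, 8) / amdahl_speedup(fraction, 4)
        print(f"{fraction:.0%} parallel: going from 4 to 8 cores "
              f"is {gain:.2f}x faster")
```

Only the fully parallel case gives the full factor of 2; the more serial work a process contains, the less extra cores help.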

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research

    Dear Severin,

     

    I completely agree with Martin. But here are some additional thoughts:
    With more threads you will most likely see the biggest speed-up
    (as long as your license also supports them).
    Of course a better CPU will always have some effect, but not so much that I would say it always has to be bleeding edge.

     

    The best amount of RAM is a bit trickier to decide. As Martin said, some algorithms inherently require more RAM. Also, if you run multiple threads, each needs some memory of its own. So if your memory is already near its limit, adding more threads won't help much.
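The interaction between RAM and threads described above can be sketched as a back-of-the-envelope estimate. This is a rough model under stated assumptions: the base footprint and per-thread memory figures are made-up placeholders that would need to be measured for the actual processes, and `max_useful_threads` is a hypothetical helper.

```python
def max_useful_threads(total_ram_gb: float, base_gb: float,
                       per_thread_gb: float, hw_threads: int) -> int:
    """Number of worker threads that fit in memory, capped by the CPU.

    base_gb:       assumed memory held by the OS, the application and the data
    per_thread_gb: assumed extra working memory each thread needs
    """
    if per_thread_gb <= 0:
        raise ValueError("per_thread_gb must be positive")
    available = total_ram_gb - base_gb
    if available <= 0:
        return 0
    fits_in_ram = int(available // per_thread_gb)
    return min(fits_in_ram, hw_threads)


if __name__ == "__main__":
    # Placeholder numbers: 4 GB base footprint, 3 GB per thread, 8 hardware threads.
    for ram in (16, 32, 64):
        print(f"{ram} GB RAM -> {max_useful_threads(ram, 4, 3, 8)} usable threads")
```

With these placeholder figures, a 16 GB machine could keep only half of its 8 hardware threads busy, which is the situation where adding cores without adding RAM brings little.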

     

    Another potential factor is of course data access. If your processes require a lot of file or database access, a slow connection or hard drive can be a serious bottleneck. In that case, investing in an SSD might be smart.
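To get a first impression of whether storage is a bottleneck on the current workstation, a quick sequential-read timing can help. The following is a minimal sketch, not a proper benchmark: the file size is an arbitrary assumption, `read_throughput_mb_s` is a hypothetical helper, and the operating system's file cache can make the result look much faster than the physical drive.

```python
import os
import tempfile
import time


def read_throughput_mb_s(directory=None, size_mb=64, chunk_mb=4):
    """Write a temporary file into `directory`, read it back sequentially,
    and return the observed read rate in MiB per second."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    writes = max(1, size_mb // chunk_mb)
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as tmp:
        path = tmp.name
        for _ in range(writes):
            tmp.write(chunk)
        tmp.flush()
        os.fsync(tmp.fileno())  # push the data to the device before timing
    try:
        total_bytes = 0
        start = time.perf_counter()
        with open(path, "rb") as handle:
            while True:
                block = handle.read(len(chunk))
                if not block:
                    break
                total_bytes += len(block)
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return total_bytes / (1024 * 1024) / elapsed


if __name__ == "__main__":
    print(f"sequential read: {read_throughput_mb_s():.0f} MiB/s")
```

Pointing `directory` at a folder on the drive where the data actually lives, and using a file larger than the installed RAM, gives a more honest number.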

     

    Best,
    David

     

  • severin_glaeser Member Posts: 3 Contributor I
    Hi,

    Thank you for your answer.
    So I think I will order a notebook with a Xeon CPU and 64 GB RAM (a fully equipped Lenovo ThinkPad P51; I always wanted to order one).
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    I've used a P50 myself for a few years and found it pretty decent.
  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi everyone, 

     

    I think this thread is a good place to share my experience with my corporate RM setup, which might not be very common but has turned out to be very handy for me. @severin_glaeser, I don't know your network configuration, requirements or policy, but maybe this could be interesting.

     

    Currently, I run RM Studio on a Windows virtual machine hosted on a dedicated server with two 2.4 GHz Xeons and 16 GB RAM (enough for me now, but easily extendable if needed). My laptop is a MacBook, which connects to the VM via RDP. As the whole configuration stays in the same network, there is no lag or visible latency when working over RDP, even over a VPN connection.

     

    Initially this configuration was offered solely for strict DB security reasons (I cannot connect to the DB from any local workstation that is also connected to the internet), but in the end it proved to be very efficient for a number of reasons:

     

    1. I don't have to care about setting up any DB connections; those are set up on the VM by admins (well, if you are an admin, then this is not a feature for you personally :) ).
    2. I don't have to care about the configuration of the machine (CPU, RAM): if needed, the VM can simply be moved to a more powerful server, or more RAM can be added.
    3. (VERY convenient) All my RM operations run on the remote machine, so in a sense I always have background process execution. I can even start a huge, long-running process, close my laptop and go home, and find the results ready an hour later when I reconnect via VPN from home.
    4. I can use two instances of RM Studio simultaneously: Studio Large runs on the VM for working processes, and Studio Free can run locally on the laptop (of course with no connection to the working DB, but that is not needed). This is very convenient when I want to quickly check some tutorial processes or examples while a working process is still running on the main instance of Studio, without interrupting or closing it.

    So this might not be an answer to your question, but rather a different look at a possible configuration.

  • severin_glaeser Member Posts: 3 Contributor I
    Hi @kypexin,

    Thank you for sharing your best practice.
    I also thought about that idea: get a PowerEdge R630, install Windows 10 (I don't know if Studio will work on a server OS) and run it in the datacenter (next to the DB, so almost no latency).
    But on the other hand, I always wanted to have a fully equipped P51...
    We will see; I will talk with the big data guys :-)