
Optimize Selection Evolutionary, Parallel -Scaling

hughesfleming68 Member Posts: 323 Unicorn
edited November 2018 in Help

Has anyone done any tests to determine how well RapidMiner 7 scales on multicore CPUs? Particularly on machines with 16 threads or more?

 

Many thanks,

 

Alex

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Hi Alex,

     

    One of the best experts for this would be @land, as his company developed an extension specifically for parallelizing efficiently in RapidMiner.

    I think you spoke with him on the forum recently.  I'm sure he has some good information in that area. 

     

    Regards,

    John.

  • hughesfleming68 Member Posts: 323 Unicorn

    Thanks John,

     

    I will ask him. I will also try to set up a test myself over the next couple of weeks.

     

    regards,

     

    Alex

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,

    In principle, keep in mind that each thread needs its own copy of the data, so your memory should scale with the number of threads you use.
    The easiest approach is to use multiple threads for the cross validation; with x threads this directly gives a nearly x-fold speedup.
    However, since one usually uses a 10-fold cross validation (I usually make it 8 to match my CPU cores), this speedup is limited by the number of folds. If you need to utilize more threads, you also need to run outer operators in parallel.
    I usually avoid this and instead run multiple processes in parallel. One usually does not do just ONE optimization run, but several, one for each of multiple methods. This way you can easily load even bigger servers to capacity.
    And of course real-world projects usually need not just one model but several. So you can also loop over groups of data and calculate their models in parallel.
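
The fold limit described above can be made concrete with a small back-of-the-envelope model. This is only a sketch in Python, not anything RapidMiner provides: the function names `ideal_cv_speedup` and `estimated_memory_gb` are hypothetical, and the model assumes that every fold takes the same time, that there is no scheduling overhead, and that each active thread holds one full copy of the data.

```python
import math


def ideal_cv_speedup(folds: int, threads: int) -> float:
    """Upper bound on speedup when only the folds run in parallel.

    Assumption: all folds take equal time, so the folds are processed in
    ceil(folds / threads) sequential rounds.
    """
    if folds < 1 or threads < 1:
        raise ValueError("folds and threads must be positive")
    rounds = math.ceil(folds / threads)
    return folds / rounds


def estimated_memory_gb(data_gb: float, folds: int, threads: int) -> float:
    """Rough memory estimate, assuming one data copy per active thread."""
    active_threads = min(folds, threads)
    return data_gb * active_threads


if __name__ == "__main__":
    # Compare 10 folds vs. 8 folds on machines with 8 and 16 threads.
    for folds, threads in [(10, 8), (8, 8), (10, 16)]:
        speedup = ideal_cv_speedup(folds, threads)
        memory = estimated_memory_gb(2.0, folds, threads)
        print(f"{folds} folds, {threads} threads: "
              f"speedup <= {speedup:.1f}x, ~{memory:.0f} GB for a 2 GB data set")
```

Under these assumptions, 10 folds on 8 threads need two rounds and give at most a 5x speedup, while 8 folds on 8 threads finish in one round for up to 8x. On a 16-thread machine a 10-fold cross validation cannot exceed 10x on its own, which is why the extra threads have to come from outer operators or from separate processes.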

    Until recently we offered the Jackhammer Extension, which added a lot of the necessary functionality.

    Greetings,
    Sebastian