parallelization and CPU optimization

sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
edited December 2018 in Product Feedback - Resolved

With RM 7.3's big improvement on Cross-Validation performance, I would like to suggest that RM parallelize and/or optimize CPU performance on:

1) k-means clustering (on a 6-core machine I still only see use of 1 core)

2) Decision Tree

3) Process Documents from Data (Text Processing extension)

4) Loop (all the variations)

5) Branch and Select Subprocess

 

Scott

 

 


Fixed and Released

Comments

  • zprekopcsak RapidMiner Certified Expert, Member Posts: 47 Guru

    Hi Scott,

     

    Great suggestions, thanks a lot. I can already confirm that some of this is in the making as we speak. :)

    Let me ask a few clarifying questions:

    2) Decision Tree (and Random Forest) has had a parallel implementation since RapidMiner 6.2. Based on our tests, it is on par with some of the fastest tree learner implementations. Can you name specific circumstances (e.g., many nominal attributes) where you feel the execution speed is not great?

    3) Process Documents from Data: this operator was significantly sped up in version 7.2.1 of the Text Processing extension, which was released a few weeks ago. Have you had a chance to test that? Do you still feel that it is too slow?

     

    Thanks, Zoltan

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Good morning Zoltan,

     

    I may have spoken too soon about the Decision Tree. I have not benchmarked it recently to see whether it is indeed using multiple cores. And yes, I usually use Decision Tree with a ton of nominal attributes.

     

    As for Process Documents from Data, this is what I was doing yesterday, and yes, I can confirm that it is only using 1 core. It is slow. I was watching it spin for a long time while simultaneously watching my gorgeous 6-core processor being underutilized.

     

    Thanks!

     

    Scott

     

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    OK, Decision Tree is indeed cranking up CPU usage.  :)

     

    Scott

    [Attached screenshots: Screen Shot 2016-11-14 at 10.19.26 AM.png, Screen Shot 2016-11-14 at 10.18.27 AM.png]



     

     


     
