RapidMiner


parallelization and CPU optimization

Status: Released

With RM 7.3's big improvement in Cross-Validation performance, I would like to suggest that RM parallelize and/or optimize CPU performance on:

1) k-means clustering (on a 6-core machine I still only see use of 1 core)

2) Decision Tree

3) Process Documents from Data (Text Processing extension)

4) Loop (all the variations)

5) Branch and Select Subprocess

 

Scott

 

 

Comments
RM Staff

Hi Scott,

 

Great suggestions, thanks a lot. I can already confirm that some of this is in the making as we speak. :)

Let me ask a few clarifying questions:

2) Decision Tree (and Random Forest) have had a parallel implementation since RapidMiner 6.2. Based on our tests, it is on par with some of the fastest tree learner implementations. Can you name specific circumstances (e.g., many nominal attributes) where you feel the execution speed is not great?

3) Process Documents from Data: this operator has been significantly sped up with version 7.2.1 of the Text Processing extension that was released a few weeks ago. Have you had a chance to test that? Do you still feel that it is too slow?

 

Thanks, Zoltan

Community Manager

Good morning Zoltan,

 

I may have spoken too soon about the Decision Tree - I have not benchmarked it recently to see whether it is indeed using multiple cores. And yes, I usually use Decision Tree with a ton of nominal attributes.

 

As for Process Documents from Data, this is what I was doing yesterday and yes, I can confirm that it is only using 1 core. It is slow. I was watching it spin for a long time while simultaneously watching my gorgeous 6-core processor being underutilized.

 

Thanks!

 

Scott

 

Community Manager

OK, Decision Tree is indeed cranking up CPU usage. :)

 

Scott

[Attached screenshots: Screen Shot 2016-11-14 at 10.19.26 AM.png, Screen Shot 2016-11-14 at 10.18.27 AM.png]


Community Manager: Status: Coming Soon
Community Manager: Status: Released