parallelization and CPU optimization
sgenzer
Community Manager, Posts: 2,959
With RM 7.3's big improvement on Cross-Validation performance, I would like to suggest that RM parallelize and/or optimize CPU performance on:
1) k-means clustering (on a 6-core machine I still only see use of 1 core)
2) Decision Tree
3) Process Documents from Data (Text Processing extension)
4) Loop (all the variations)
5) Branch and Select Subprocess
Scott
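To illustrate why k-means is a natural candidate for multi-core execution, here is a minimal sketch of the assignment step split across worker processes. It is not RapidMiner's implementation; the function names (`nearest_centroid`, `assign_chunk`, `split`, `parallel_assign`) and the default of 6 workers are assumptions chosen for illustration only. Each point's nearest centroid can be computed independently of every other point, so the data can be cut into chunks and each chunk handed to a separate core.

```python
import math
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def assign_chunk(chunk, centroids):
    """Label every point in one chunk; chunks do not depend on each other."""
    return [nearest_centroid(point, centroids) for point in chunk]


def split(points, n_chunks):
    """Cut points into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, math.ceil(len(points) / n_chunks))
    return [points[i:i + size] for i in range(0, len(points), size)]


def parallel_assign(points, centroids, workers=6):
    """Hypothetical helper: run the assignment step on several processes."""
    chunks = split(points, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so labels line up with points.
        results = pool.map(partial(assign_chunk, centroids=centroids), chunks)
        return [label for chunk_labels in results for label in chunk_labels]
```

In a script, `parallel_assign(points, centroids)` would typically be called from under an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform. The centroid update step would still need the per-chunk results to be combined, so the speedup is usually less than the number of cores.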
Comments
Hi Scott,
Great suggestions, thanks a lot. I can already confirm that some of this is in the making as we speak.
Let me ask a few clarifying questions:
2) Decision Tree (and Random Forest) have had a parallel implementation since RapidMiner 6.2. Based on our tests, it is on par with some of the fastest tree learner implementations. Can you name specific circumstances (e.g., many nominal attributes) where you feel the execution speed is not great?
3) Process Documents from Data: this operator has been significantly sped up with version 7.2.1 of the Text Processing extension that was released a few weeks ago. Have you had a chance to test that? Do you still feel that it is too slow?
Thanks, Zoltan
Good morning Zoltan,
I may have spoken too soon about the Decision Tree - I have not benchmarked it recently to see whether or not it is indeed using multiple cores. Yes, I am usually using Decision Tree with a ton of nominal attributes.
As for Process Documents from Data, this is what I was doing yesterday and yes, I can confirm that it is only using 1 core. It is slow. I was watching it spin for a long time while simultaneously watching my gorgeous 6-core processor being underutilized.
Thanks!
Scott
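For anyone who wants a number rather than eyeballing a CPU monitor, one rough way to check whether a workload uses more than one core is to compare CPU time with wall-clock time. The sketch below does this for Python code running in the current process; it is an illustration of the idea only and does not measure RapidMiner itself (for that, an operating system process monitor would be needed). The names `cpu_to_wall_ratio` and `busy_loop` are hypothetical.

```python
import time


def cpu_to_wall_ratio(workload):
    """Run workload() and return CPU seconds divided by wall-clock seconds.

    time.process_time() sums CPU time over all threads of this process,
    so a ratio near 1 suggests one busy core and a ratio well above 1
    suggests several cores were busy at once.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def busy_loop(n=2_000_000):
    """Single-threaded, CPU-bound stand-in for a slow operator."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(f"CPU/wall ratio: {cpu_to_wall_ratio(busy_loop):.2f}")
```

On a 6-core machine, a fully parallel workload could approach a ratio of 6, while a single-threaded one stays at about 1. CPU time spent in child processes is not counted by `time.process_time()`, so this check only applies to threads inside one process.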
OK, Decision Tree is indeed cranking up CPU usage.
Scott