When I want to run a process in the cloud, how do I know which instance size to choose?
It's really up to you: a larger size mainly lets your process run more quickly when the dataset is large. The exception is if you get "out of memory" errors when you run the process locally, in which case you may need a larger size just to ensure it finishes at all.
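One practical way to pick a size is to start from the dataset's on-disk footprint. As a rough, hypothetical rule of thumb (the `expansion_factor` below is an assumption you should tune for your own data and operators), a dataset often needs several times its file size in RAM once it is loaded and processed:

```python
import os

def estimate_memory_gb(csv_path, expansion_factor=3.0):
    """Rough sizing heuristic: on-disk size times an assumed
    in-memory expansion factor. Not a RapidMiner API, just a
    back-of-the-envelope sketch for choosing instance memory."""
    size_gb = os.path.getsize(csv_path) / 1024**3
    return size_gb * expansion_factor
```

If the estimate comes out well above the memory of the instance size you were considering, that is a hint to go one size up before worrying about CPU count.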
I did some experimenting with this on AWS using an instance with 36 vCPUs. That configuration is basically a dual-CPU Intel server with nine physical cores and 18 threads per CPU, plus lots of memory.
What stood out was that I could only ever get RapidMiner Studio to use one CPU (18 threads max) in this setup, and it was not that fast either. After that experience I decided the cloud was not for me.
Hi Alex, and thanks for sharing the results of your testing. I am not sure when it was done, but the number of cores RapidMiner uses varies depending on the operators in your process. RapidMiner has recently made progress on parallel processing by enabling more of the most commonly used, processing-intensive operators to parallelize their work. See this recent announcement, for example, about the changes to the cross-validation operator released earlier this month: https://rapidminer.com/new-parallel-cross-validation/
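To illustrate why cross-validation parallelizes so well: each of the k folds is trained and scored independently, so they can all run at once. This sketch is not RapidMiner's implementation, just the general idea, with a placeholder `evaluate_fold` standing in for the real train-and-score step:

```python
from concurrent.futures import ThreadPoolExecutor

def make_folds(n_rows, k=10):
    """Assign each row index to one of k folds (simple round-robin)."""
    return [[i for i in range(n_rows) if i % k == f] for f in range(k)]

def evaluate_fold(data, test_idx):
    """Placeholder: train on the rows not in test_idx, score on the
    held-out rows. Here we just return the split sizes."""
    held = set(test_idx)
    train = [x for i, x in enumerate(data) if i not in held]
    test = [data[i] for i in test_idx]
    return len(train), len(test)

data = list(range(100))
folds = make_folds(len(data), k=10)

# Each fold is independent of the others, so all 10 evaluations
# can be dispatched to the pool simultaneously.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(lambda idx: evaluate_fold(data, idx), folds))
```

On a machine with enough cores, the wall-clock time approaches that of the slowest single fold rather than the sum of all folds, which is where the speedup in the new operator comes from.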
So if your AWS testing was done a while ago, you might want to redo it to take advantage of the newer operators. I have tested the new cross-validation operator and it is definitely faster than the prior version.