RapidMiner

H2O CPU usage

Community Manager

Good morning everyone,

Is anyone else noticing the H2O engine with RM 7.5 bogging down the CPU tremendously, even when not using Deep Learning?  I have been experiencing these weird "cycles" where, all of a sudden, my CPU usage goes through the roof for a few minutes while it does some H2O stuff, and then cycles back down.  Is there any way to tone this down?


Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
8 REPLIES
RM Certified Expert

I have noticed the same thing. I would be surprised if there is any way to alter the behavior of the H2O operators, although it would be welcome!
Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
RM Staff

Hi Scott, Brian,

Is it something that you only experience with 7.5?

I ask because the high CPU usage is more like a feature. :)

H2O algorithms are all highly parallel algos, not just Deep Learning. They spin up a cluster to leverage many cores.

The global "Number of threads" setting (Preferences...) in Studio or Server can limit their CPU usage.

 

Best,

Peter

Community Manager

No, I definitely like the usage of cores when I'm actually doing the modeling.  The issue I'm having is that the CPU usage seems to cycle up and down when I'm NOT running a process, or when my process does not have any modeling operators in it (ETL stuff).  And when it does cycle up, it really goes whole-hog and practically locks up my machine until it's done.

I can do a video screen capture if you want so you can see the cycling.

 

Scott

 

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Staff

Hmm, maybe this is not H2O related at all but might be a result of the new data core...  Yes, any additional insights or a video would be highly appreciated.

Many thanks,

Ingo


Community Manager

OK, I "caught" RM today doing this cycling thing.  No, it's not an H2O issue; nothing is showing in the log, anyway.  Here's a video screen capture with my CPU usage in the foreground.  Note that I have no processes running at all.  Nothing.

 

https://drive.google.com/file/d/0B0-I7wWw0DZnTXVtaGkxMWM0ajg/view?usp=sharing

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Staff

Hi,

Can you do me a favor and look at your .RapidMiner folder and check the file size of "cta.h2.db"?

If it is large, you can also try closing Studio and then sending said file to me. Afterwards, delete it from the .RapidMiner folder and see if the problem is gone.

 

Regards,

Marco

_________________________________________________________
Team Lead Software Engineering | RapidMiner GmbH
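
For anyone who wants to script the file-size check Marco asks for above, here is a minimal Java sketch. It assumes the .RapidMiner folder sits directly in the user's home directory, which the thread does not state; adjust the path if yours lives elsewhere. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CtaDbSize {

    /** Returns the file size in bytes, or -1 if the file is missing or unreadable. */
    public static long sizeOrMinusOne(Path file) {
        try {
            return Files.isRegularFile(file) ? Files.size(file) : -1L;
        } catch (IOException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        // Assumption: the .RapidMiner folder is located in the user's home directory.
        Path db = Paths.get(System.getProperty("user.home"), ".RapidMiner", "cta.h2.db");
        long bytes = sizeOrMinusOne(db);
        if (bytes < 0) {
            System.out.println("Not found: " + db);
        } else {
            System.out.printf("%s is %.1f MB%n", db, bytes / (1024.0 * 1024.0));
        }
    }
}
```

Run it while Studio is closed, so the size is read from a file that is not being written to.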
Community Manager

Hi Marco -

 

No problem - here you go:

 

Screen Shot 2017-06-26 at 10.13.16 AM.png

 

And yes, I quit and restart RM all the time.  I have a hypothesis on why this is happening.  When I run a large process that has some kind of looping in it (e.g. loop operator, cross-validation, optimize parameters, etc.) and I decide to stop the process before it has finished, I have a hunch that the process does not actually stop, maybe due to its parallel processing somehow.  When I click the "Stop" button, I still see the process icon spinning, and it will remain spinning until I delete that operator.  It will even keep spinning when I start the process again.  So I think what is happening in that activity monitor video is that, even though I have "stopped" a process, it's still going.

Case in point: yesterday I was running a process where RM was taking a large data set (2M+ examples, 50+ attributes) and creating k-means clusters of various sizes inside an optimize parameters operator.  The goal was to optimize the performance of the clusters via cluster density.  Knowing that this process was a monster that was going to take several hours, and could likely crash somehow, I had it store the performance of each cluster density using the Store operator inside.  Well, lo and behold, I saw that this was stalling at some point, so I stopped it.  But my CPU was still going strong and, sure enough, several minutes later I saw another performance pop into my repository.  The only way I could really stop this whole thing was to quit and restart RapidMiner.  It's sort of like that Monty Python movie: "STOP I say - or I will say STOP again!!"

Does this help?  :)

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
RM Staff

Hi,

 

Loops etc. should also stop when you click the stop button. The catch, however, is that this functionality depends on graceful termination by the operator implementation, i.e. the operator has to check, before doing meaningful work, whether the process was stopped by the user. Unfortunately, there are operators out there which do not regularly check whether they should stop, for various reasons (their code being old, using a 3rd-party library we cannot control, etc.). If you have such processes and this occurs, feel free to share the process XML with me (private message if you like) and let me know where the process continued after you pressed the stop button.

You cannot safely terminate a thread in Java (doing so is highly dangerous and may leave other things in undefined states, because a method call could be interrupted mid-execution), which is why the above will remain a problem.

If you can, please also upload the cta.h2.db file for me and send me a download link (again via private message); that would be tremendously helpful.

 

Regards,

Marco

_________________________________________________________
Team Lead Software Engineering | RapidMiner GmbH
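
To illustrate the "graceful termination" Marco describes, here is a minimal Java sketch of a cooperative stop check. It is not RapidMiner's actual operator API: the class, the `runLoop` method, and the `stopRequested` flag are hypothetical names chosen for the example. The worker only stops because it checks the flag before each unit of work; a loop that never checks (or that is stuck inside a long third-party call) would keep running after "Stop" is pressed, which matches the behavior reported in this thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CooperativeStop {

    /**
     * Hypothetical stand-in for an operator's main loop.
     * Returns how many units of work were completed before stopping.
     */
    public static int runLoop(AtomicBoolean stopRequested, int iterations) {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            // Check the stop flag BEFORE starting the next unit of work.
            if (stopRequested.get()) {
                break;
            }
            doUnitOfWork(i);
            completed++;
        }
        return completed;
    }

    /** Placeholder for real work, e.g. one iteration of a parameter optimization. */
    static void doUnitOfWork(int i) {
        Math.sqrt(i);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            int done = runLoop(stop, Integer.MAX_VALUE);
            System.out.println("Stopped after " + done + " iterations");
        });
        worker.start();
        Thread.sleep(100);   // let the worker run briefly
        stop.set(true);      // simulates the user pressing "Stop"
        worker.join();       // returns promptly because the loop checks the flag
    }
}
```

The flag is only observed between units of work, so the time it takes to stop is bounded by the longest single unit. That is why an operator wrapping one long-running library call can appear to ignore the stop button for minutes.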