H2O CPU usage

sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
edited December 2018 in Help

Good morning everyone,

 

Is anyone else noticing the H2O engine in RM 7.5 bogging down the CPU tremendously, even when not using Deep Learning?  I have been experiencing these weird "cycles" where, all of a sudden, my CPU usage goes through the roof for a few minutes while it does some H2O stuff, and then cycles back down.  Is there any way to tone this down?


Scott

Answers

  • Options
    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I have noticed the same thing. I would be surprised if there is any way to alter the behavior of the H2O operators, although it would be welcome!
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • Options
    phellinger Employee, Member Posts: 103 RM Engineering

    Hi Scott, Brian,

     

    Is it something that you only experience with 7.5?

    Because the high cpu usage is more like a feature. :)

    H2O algorithms are all highly parallel algos, not just Deep Learning. They spin up a cluster to leverage many cores.

     

    The global "Number of threads" setting (under Preferences...) in Studio or Server can limit their CPU usage.

     

    Best,

    Peter

  • Options
    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    No, I definitely like the usage of cores when I'm actually doing the modeling.  The issue I'm having is that they seem to cycle up and down when I'm NOT running a process, or when my process does not have any modeling operators in it (ETL stuff).  And when it does cycle up, it really goes whole-hog and practically locks up my machine until it's done.

     

    I can do a video screencapture if you want so you can see the cycling.

     

    Scott

     

  • Options
    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hmm, maybe this is not H2O related at all but might be a result of the new data core...  Yes, any additional insights or a video would be highly appreciated.

    Many thanks,

    Ingo

  • Options
    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    OK, I "caught" RM doing this cycling thing today.  No, it's not an H2O issue, or at least nothing is showing in the log.  Here's a video screen capture with my CPU usage in the foreground.  Note that I have no processes running at all.  Nothing.

     

    https://drive.google.com/file/d/0B0-I7wWw0DZnTXVtaGkxMWM0ajg/view?usp=sharing

     

    Scott

  • Options
    Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering

    Hi,

     

    Can you do me a favor and look in your .RapidMiner folder and check the file size of "cta.h2.db"?

    If the file is large, you can also try closing Studio and then sending the file to me. Afterwards, delete it from the .RapidMiner folder and see if the problem is gone.
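
For anyone who prefers to check the size from code rather than a file browser, here is a minimal Java sketch. It assumes the .RapidMiner folder sits directly in the user's home directory, which may not hold on every installation; the class and method names are hypothetical and not part of RapidMiner.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of RapidMiner.
class CtaDbSize {

    /** Returns the file size in bytes, or -1 if the file is missing or unreadable. */
    static long sizeInBytes(Path file) {
        try {
            return Files.isRegularFile(file) ? Files.size(file) : -1L;
        } catch (IOException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        // Assumption: the .RapidMiner folder is located in the user's home directory.
        Path db = Paths.get(System.getProperty("user.home"), ".RapidMiner", "cta.h2.db");
        long bytes = sizeInBytes(db);
        if (bytes < 0) {
            System.out.println("Not found: " + db);
        } else {
            System.out.printf("%s: %.1f MB%n", db, bytes / (1024.0 * 1024.0));
        }
    }
}
```

Close Studio before running the check so the file is not being written to while you read its size.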

     

    Regards,

    Marco

  • Options
    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi Marco -

     

    No problem - here you go:

     

    Screen Shot 2017-06-26 at 10.13.16 AM.png

     

    And yes, I quit and restart RM all the time.  I have a hypothesis on why this is happening.  When I run a large process that has some kind of looping in it (e.g. loop operator, cross-validation, optimize parameters, etc...) and I decide to stop the process before it has finished, I have a hunch that the process does not stop - maybe due to its parallel processing somehow?  When I click the "Stop" button, I still see the process icon spinning and it will remain spinning until I delete that operator.  It will even keep spinning when I start the process again.  So I think what is happening in that activity monitor video is that, even though I have "stopped" a process, it's still going.  

     

    Case in point: yesterday I was running a process where RM was taking a large data set (2M+ examples, 50+ attributes) and creating k-means clusters of various sizes inside an Optimize Parameters operator.  The goal was to optimize the performance of the clusters via cluster density.  Knowing that this process was a monster that was going to take several hours, and could likely crash somehow, I had it store the performance of each cluster density using the Store operator inside.  Well, lo and behold, I saw that this was stalling at some point, so I stopped it.  But my CPU was still going strong and sure enough, several minutes later, I saw another performance pop into my repository.  The only way I could really stop this whole thing was to quit and restart RapidMiner.  It's sort of like that Monty Python movie: "STOP I say - or I will say STOP again!!"

     

    Does this help?  :)

     

    Scott

  • Options
    Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering

    Hi,

     

    Loops etc. should also stop when you click the stop button. However, this functionality depends on graceful termination by the operator implementation, i.e. the operator has to check, before doing meaningful work, whether the process was stopped by the user. Unfortunately, there are operators out there which do not regularly check whether they should stop, for various reasons (their code being old, using a 3rd party library we cannot control, etc.). If you have such processes and this occurs, feel free to share the process XML with me (private message if you like) and let me know where the process continued after you pressed the stop button.

    You cannot safely terminate a thread in Java (because it's highly dangerous and may leave other things in undefined states because a method call was interrupted mid-execution), which is why the above will remain a problem.
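
To illustrate the cooperative-stop idea described above, here is a minimal Java sketch. It is not RapidMiner's operator API; `StopSignal`, `runLoop`, and the `stopAfter` parameter are hypothetical names made up for this example. The loop checks a shared flag before each unit of work and exits on its own once the flag is set. A loop that never checks the flag would simply keep running, which matches the behavior reported earlier in the thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical names throughout; this is a sketch, not RapidMiner code.
class CooperativeStop {

    /** Shared flag that a "Stop" button would set. */
    static final class StopSignal {
        private final AtomicBoolean stopped = new AtomicBoolean(false);

        void requestStop() {
            stopped.set(true);
        }

        boolean isStopRequested() {
            return stopped.get();
        }
    }

    /**
     * Runs up to maxIterations units of work, checking the flag before each one.
     * To keep the example deterministic, the stop request is simulated after
     * stopAfter completed units. Returns the number of units completed.
     */
    static int runLoop(StopSignal signal, int maxIterations, int stopAfter) {
        int done = 0;
        for (int i = 0; i < maxIterations; i++) {
            if (signal.isStopRequested()) {
                break; // graceful exit between work units
            }
            done++; // stand-in for one unit of real work
            if (done == stopAfter) {
                signal.requestStop(); // simulates the user clicking Stop
            }
        }
        return done;
    }

    public static void main(String[] args) {
        // Stops after 3 of 1000 iterations because the flag is checked each time.
        System.out.println(runLoop(new StopSignal(), 1000, 3));
    }
}
```

The granularity of the check matters: if a single unit of work takes minutes (for example, one call into a third-party library), the stop request only takes effect after that call returns.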

     

    If you can, please also upload the cta.h2.db file for me and send me a download link (again via private message), that would be tremendously helpful.

     

    Regards,

    Marco
