Model training error (H2O)

Serek91 Member Posts: 22 Contributor II
Hi, I got this error during machine learning:



Can anyone help? It happened after about 1 hour of processing, which is a bit annoying. The process and CSV files are attached below.

Sent to Engineering · Last Updated

HHO-124

Comments

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited September 2019
    Hello @Serek91

    I can reproduce the exact error you are getting. I am not sure of the exact reason for it, but I suspected a parallel processing conflict. For this reason, I disabled parallel execution by unchecking the "enable parallel execution" option on the "Cross-Validation (GLM)" operator. The process then completed successfully in 2 hr 45 min on a 12-core processor. I attached the result CSV files here.

    Maybe someone from RM engineering can take a look at this error. In the meantime, you can run the process as described above.

    @sgenzer
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @Serek91 @varunm1, I just ran the process on my 4-core machine with parallel processing and it ran fine: 59 min 12 sec total, no errors. If you can reproduce the failure, can you please send me a log file?

    Scott

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited September 2019
    Hi @sgenzer

    It failed again when I used parallel execution (Cross Validation GLM) on a 6-core (12 logical cores) processor. I attached the error log to this post.

    Regards,
    Varun

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Thanks @varunm1. I really need the whole log though; some of the info I need comes before the section you sent. If it is not too much trouble, can you please restart RM, set logverbosity to "ALL", run the process until it crashes, quit RM, and then send me the full log file rapidminer-studio.log?

    Thank you!

    Scott
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited September 2019
    Hello @sgenzer

    Can you check the attached Notepad file? My log level is already set to ALL under Settings --> Preferences --> User Interface; if this is not the right place to set the log level, please let me know. My older log file is too big (171 MB) to attach because I don't close RM or shut down my system for several days, so I copied all the log entries related to this process out of that big file.

    I also closed and reopened RM as you said, and the process completed without any error in 20 minutes on my PC. It behaved like an old TV that needs a couple of bangs to display a picture, lol :D. This seems to be a tricky problem.

    Regards,
    Varun

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    OK, that's going to make our life harder, unfortunately. Thank you for the log file @varunm1. That's probably good enough for now. I've sent it off to engineering.
  • Serek91 Member Posts: 22 Contributor II
    So the only way to make it work is to run it without parallel execution?
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hey @Serek91

    It did run with parallel execution as well once I closed RM and opened it again. I am not sure why it is behaving like this, so @sgenzer created a ticket to investigate. You can try it, but if parallel execution doesn't work, you can run without it.
    Regards,
    Varun

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    @Serek91 no, again, I was able to run this with parallelization without any problem. I have no idea what is going on with your installation...

    Scott
  • tkenez Employee, RapidMiner Certified Expert, Member Posts: 22 RM Product Management
    Hey folks,

    Following up here. The team was able to pinpoint the root cause of this issue, and the process is expected to fail regardless of whether parallelization is enabled. The problem occurs when a large number of H2O-based models are trained without restarting Studio, or when running in a persistent Job Container.
    We have created a fix for this that will be available in the next product release, so stay tuned :)