Cannot execute log reg calibration learning: Error while training the H2O model: Illegal argument(s)

SabaMomeniKho · December 2019

Hello,
I'm using Auto Model in RapidMiner 9.5 on a crash dataset. The task is prediction, and the "class" column is the target. I chose Decision Tree, Naive Bayes, Gradient Boosted Trees, Random Forest, SVM and Deep Learning. After running, the process only shows results for Naive Bayes and Decision Tree; the others fail with the error below:

Cannot execute log reg calibration learning: Error while training the H2O model: Illegal argument(s) for GLM model: ERRR on field: _response: Response cannot be constant.
As I'm new to this software and need to use it for my MSc thesis, I really need help with this problem. I have also attached my data in case you need to see it.
Thank you.

lionelderkrikor · December 2019

Hi @SabaMomeniKho,

It's an issue known to the RM staff. It occurs because your label has (very) small minority classes:

Image: https://us.v-cdn.net/6030995/uploads/editor/14/hbrpbeppituc.png

There are 2 workarounds:
- First, try grouping your 2 smallest minority classes ("majorinjury" and "fatal") into a single class (called, for example, "other injuries"). You can do that with the Replace Rare Values operator, which is part of the Toolbox extension (install it from the Marketplace).
- If that does not work, filter these minority classes out of your dataset.
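The first workaround can be sketched in plain Python for readers who want to see the idea outside the GUI. This is not how the Replace Rare Values operator is implemented; the function name `merge_rare_classes`, the 5% threshold and the class counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def merge_rare_classes(labels, min_fraction=0.05, merged_name="other injuries"):
    """Replace every class whose share of the rows is below min_fraction."""
    counts = Counter(labels)
    total = len(labels)
    rare = {cls for cls, n in counts.items() if n / total < min_fraction}
    return [merged_name if lbl in rare else lbl for lbl in labels]

# Hypothetical label column: the counts are invented, not taken from the thread's data.
labels = ["pdo"] * 900 + ["minorinjury"] * 80 + ["majorinjury"] * 15 + ["fatal"] * 5

merged = merge_rare_classes(labels, min_fraction=0.05)
print(Counter(merged))
```

With these made-up counts, "majorinjury" and "fatal" fall below the threshold and end up in one class of 20 rows, while "pdo" and "minorinjury" are left untouched.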

Hope this helps,

Regards,

Lionel

lionelderkrikor · December 2019

@SabaMomeniKho,

OK, I understand.
Yes, good idea: you can apply these predictors separately in the design view.
With your highly imbalanced dataset, I think you can present 2 strategies:

1. No data preprocessing :

Please look at Process_1.rmp in the attached file and its results.
Because you have very few examples of your minority classes (minorinjury, majorinjury, fatal), without data preprocessing the algorithms have difficulty capturing the relationships between your regular attributes and these minority classes of your label. As a result, you do get a relatively good accuracy, but only because the algorithms predict almost exclusively the majority class (in your case "pdo"). The downside of this strategy is that the recall of your minority classes is extremely bad (0 or very close to 0); that is, the model is almost never able to correctly predict the minority classes:

Image: https://us.v-cdn.net/6030995/uploads/editor/kn/3pttnq2i2tl6.png
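The gap between accuracy and class recall described for this first strategy can be reproduced with a few lines of Python. The class counts below are invented for illustration and are not the figures from the attached process; the "model" is simply assumed to predict the majority class for every row.

```python
def accuracy(y_true, y_pred):
    """Share of rows where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def class_recall(y_true, y_pred, cls):
    """Share of the rows that truly belong to cls that were predicted as cls."""
    preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in preds_for_cls) / len(preds_for_cls)

# Hypothetical, highly imbalanced label column (counts are made up).
y_true = ["pdo"] * 950 + ["minorinjury"] * 35 + ["majorinjury"] * 10 + ["fatal"] * 5

# Assumed worst case: the model always answers with the majority class.
y_pred = ["pdo"] * len(y_true)

print("accuracy:", accuracy(y_true, y_pred))  # 0.95
for cls in ("pdo", "minorinjury", "majorinjury", "fatal"):
    print(cls, "recall:", class_recall(y_true, y_pred, cls))
```

In this sketch the accuracy is 95% even though the recall of every minority class is 0, which is why accuracy alone is a misleading summary on data like this.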

2. Data preprocessing :

Please look at Process_2.rmp in the attached file and its results.
If your priority is to correctly predict one of your 3 minority classes (contributing to better road safety is a noble task, congratulations!), you have to upsample the minority class you want to predict, meaning that you have to "artificially increase" the number of observations of this minority class. For that you can use the SMOTE Upsampling operator (part of the Toolbox extension, to install from the Marketplace). In the parameters of this operator, uncheck auto detect minority class and set the name of the minority class you want to predict, for example "fatal".

Image: https://us.v-cdn.net/6030995/uploads/editor/oi/4a406v2rlt7w.png
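To make "artificially increase" more concrete, here is a minimal sketch of the interpolation idea behind SMOTE in plain Python. It is not the implementation of the SMOTE Upsampling operator; the function name `smote_like_upsample`, its parameters and the feature values are hypothetical, and the sketch assumes purely numeric attributes.

```python
import math
import random

def smote_like_upsample(minority_rows, n_new, k=2, seed=0):
    """Create n_new synthetic rows, each placed on the segment between a
    minority row and one of its k nearest minority neighbours."""
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        # k nearest other minority rows by Euclidean distance
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position between base (0) and neighbour (1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical numeric attributes of four "fatal" rows (values are made up).
fatal_rows = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]

new_rows = smote_like_upsample(fatal_rows, n_new=6)
print(len(fatal_rows) + len(new_rows), "fatal rows after upsampling")
```

Each synthetic row lies between two existing minority rows, so the new examples stay inside the region the minority class already occupies. Upsampling of this kind is normally applied to the training data only, so that the test set keeps the original class distribution.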

As a result, the class recall of the upsampled minority class is significantly higher than in the first strategy, meaning that your model is now able to correctly predict one of your minority classes (for example "fatal"). The downside of this strategy is that your overall accuracy will decrease:

Image: https://us.v-cdn.net/6030995/uploads/editor/s9/itwup884luu8.png

Next steps :

To enhance the performance of your model(s), you can introduce the concepts of:
- Parameter optimization (via the Optimize Parameters (Grid) operator)
- Feature selection (via the Automatic Feature Engineering / Apply Feature Set operators)
To help you with these concepts, you can go to the RapidMiner Academy, where there are plenty of instructional videos:
https://academy.rapidminer.com/

Don't hesitate to come back if you have other questions during your thesis...

Regards,

Lionel

PS: For my general knowledge, what is the meaning of "pdo" (the majority class of your label)? Thank you...

SabaMomeniKho · December 2019

hi @lionelderkrikor

thanks for helping:)

Actually, this is real data on 2018 roadway crashes in Iowa, USA, and as you saw, there are few crashes that led to a fatality or minor injury! For the whole process in my thesis, I need all four classes.

What do you think about applying these predictors separately in the design view and then comparing the results?

varunm1 · December 2019

@lionelderkrikor it's Property Damage Only (when no bodily injury is involved in the crash).

lionelderkrikor · December 2019

OK, thanks Varun !!

Regards,

Lionel
