Bug report: Calibrate (Rescale Confidences (Logistic)) operator
lionelderkrikor
RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Dear all,
I would like to report a bug that occurs under certain conditions when AutoModel is executing.
You can reproduce this error by:
- executing AutoModel with the data in the attached file,
- setting the Classification attribute as the target variable,
- leaving all AutoModel options at their defaults.
After opening the process and investigating:
- The bug is generated by the Calibrate (Rescale Confidences (Logistic)) operator (inside the Train Model / Optimize subprocesses): when this operator is removed (and the Split Data operator is also removed), the process works fine.
- The bug is linked to the Train/Test split ratio (0.9/0.1). Indeed, if the ratio is set to 0.8/0.2, the process works fine.
- The bug seems linked to the one-hot encoding of the Date attributes. Indeed, if Extract Date Information is disabled in AutoModel (so that AutoModel works with the original attributes), the process works fine.
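To illustrate why the split ratio might matter, here is a minimal Python sketch (not RapidMiner code). It assumes, without confirmation from the process itself, that the smaller partition is the one used for calibration and that the split is stratified. The class counts and the function name `holdout_counts` are hypothetical.

```python
def holdout_counts(class_counts, holdout_ratio):
    """Approximate per-class example counts in a stratified hold-out set."""
    return {label: round(n * holdout_ratio) for label, n in class_counts.items()}


# Hypothetical label distribution with one rare class (not the attached data).
class_counts = {"A": 120, "B": 80, "C": 4}

for ratio in (0.1, 0.2):
    # Compare how many examples of each class land in the smaller partition.
    print(ratio, holdout_counts(class_counts, ratio))
```

Under these assumptions, the rare class can vanish from a 10% partition while still appearing in a 20% one, which would be consistent with the behaviour described above, although it may not be the actual cause.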
If the bug is unavoidable under certain conditions, a possible solution might be to wrap the Calibrate operator in a Handle Exception operator.
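The idea behind this workaround can be sketched in Python (this is an analogy, not the RapidMiner process): try to calibrate, and fall back to the uncalibrated model if calibration raises an error. All function names here are hypothetical stand-ins.

```python
def train_with_optional_calibration(train_model, calibrate, train_set, calibration_set):
    """Train a model, then try to calibrate it; keep the raw model if calibration fails."""
    model = train_model(train_set)
    try:
        return calibrate(model, calibration_set), True
    except Exception as error:  # broad on purpose, like a catch-all exception handler
        print(f"Calibration skipped: {error}")
        return model, False


# Hypothetical stand-ins for the training and calibration steps.
def train_model(train_set):
    return {"name": "raw model"}


def failing_calibrate(model, calibration_set):
    raise ValueError("calibration failed")


model, calibrated = train_with_optional_calibration(
    train_model, failing_calibrate, train_set=[], calibration_set=[]
)
```

The trade-off is that the process keeps running, but the confidences of the fallback model are not rescaled.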
Thank you for your attention,
Regards,
Lionel
Comments
Thank you for your answer!
No, that is not the case for me:
Here are the distributions of the label values for both the training set and the test set entering the Calibrate (Rescale Confidences (Logistic)) operator.
On the other hand, these two example sets have no "predictions" column...
Regards,
Lionel
Ingo
Thanks for your answer.
OK, I now understand the problem and your position.
What do you think about this alternative strategy for handling "rare classes":
Use the Replace Rare Values operator to "group" the "rare classes" into a bigger class. This avoids "losing" the information contained in the rare values:
Here is a (fictional) example of such a strategy:
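As a rough Python illustration of the same idea (not the RapidMiner process itself), the sketch below groups every label value that occurs fewer than a chosen number of times into a single class. The function name `replace_rare_values`, the threshold, and the "Other" label are assumptions made for this example.

```python
from collections import Counter


def replace_rare_values(values, min_count, replacement="Other"):
    """Replace every value seen fewer than min_count times with one shared value."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else replacement for v in values]


# Hypothetical label column: two frequent classes and two rare ones.
labels = ["A"] * 6 + ["B"] * 5 + ["C", "D"]
grouped = replace_rare_values(labels, min_count=3)

# The rare classes C and D are merged, and no example is dropped.
print(Counter(grouped))
```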
Regards,
Lionel
Ingo
I just wanted to report this bug with another dataset. In this case, however, the label is binary and balanced (there are no rare values in the label):
You can also notice that, in this dataset, the polynominal regular attributes are imbalanced but NOT highly imbalanced...
The error occurs with the Naive Bayes model, and to reproduce it you have to enable FEATURE SELECTION and FEATURE GENERATION in AutoModel.
Regards,
Lionel
EDIT : I forgot to attach the data...