Model Simulator - optimize input parameters

User36964 · April 2019

Hi to all,

I'm building a random forest model and use cross-validation to find the best-fitting one. According to the cross-validation results, my model has 74.35% accuracy and an AUC of 0.830. These are good performance results for the model.

I also ran the Model Simulator. Its output lists the contradicting and supporting parameters but indicates that my results are not confident, and it shows the confidence level as 55%. I wonder how this confidence level is calculated and what it indicates. How can the AUC be high when the confidence level is this low?

Another issue is the optimize option of the Model Simulator. In the output window of the Model Simulator there is an optimize button. This option optimizes the input parameters in order to increase the confidence level. The confidence level of my model increased to 90%. When the confidence level increases, some of the contradicting and supporting parameters change. Is there a way to see the newly formed model (by the optimization) and its performance indicators?

IngoRM · April 2019

Hi,

> Another issue is the optimize option of the Model Simulator. In the output window of the Model Simulator there is an optimize button. This option optimizes the input parameters in order to increase the confidence level. The confidence level of my model increased to 90%.

That's actually not really what happened.

The optimization merely tries input combinations so that the confidence is maximized (or whatever the target is). It does not change the overall confidence levels of the model; it just looks for the case where the model is most sure one way or the other. Those settings are then used on the left side when you press on finished, and the model's response is shown on the right side, including the new supporting and contradicting factors for the new model prediction.
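To make the distinction concrete, here is a minimal sketch of that idea in plain Python. It is not the Model Simulator's actual algorithm: the `model_confidence` function, its weights, the input names, and the random search are all hypothetical stand-ins. The point it illustrates is only that the search changes the inputs while the model stays fixed.

```python
import math
import random

def model_confidence(age, income):
    # Hypothetical stand-in for a trained model: confidence for the positive class.
    # The weights are made up for illustration.
    score = 0.04 * (age - 40) + 0.00003 * (income - 50000)
    return 1.0 / (1.0 + math.exp(-score))

def optimize_inputs(confidence_fn, ranges, n_trials=2000, seed=0):
    """Random search over input values; the model itself is never modified."""
    rng = random.Random(seed)
    best_inputs, best_conf = None, -1.0
    for _ in range(n_trials):
        # Draw one candidate input combination inside the allowed ranges.
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        conf = confidence_fn(**candidate)
        if conf > best_conf:
            best_inputs, best_conf = candidate, conf
    return best_inputs, best_conf

ranges = {"age": (18, 80), "income": (10000, 150000)}  # assumed input ranges
default_conf = model_confidence(age=42, income=52000)
best_inputs, best_conf = optimize_inputs(model_confidence, ranges)

print(f"confidence at the default inputs:   {default_conf:.2f}")
print(f"confidence at the optimized inputs: {best_conf:.2f}")
```

Calling `model_confidence(age=42, income=52000)` again after the search returns the same value as before, which is the sense in which "the model is still the same".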

> When the confidence level increases, some of the contradicting and supporting parameters change. Is there a way to see the newly formed model (by the optimization) and its performance indicators?

As I said, the model is still the same. The optimization is done with the same model; only the input values are changed. Please see here for more information: https://docs.rapidminer.com/8.2/studio/operators/scoring/model_simulator.html

> I use Weight by Tree Importance to see the highest-weight attributes (the most relevant or important for the outcome). Unfortunately, these attributes do not match the ones listed in the Model Simulator. For example, the most relevant attribute found by the Model Simulator ranks 973rd among the attribute weights. Vice versa, the highest-weight attributes are not listed in the Model Simulator output.

That can indeed happen, since the two are completely different approaches. One uses artificially generated data points to calculate the local importance of the different factors. The advantage is that this approach works for all model types. More on this here: https://docs.rapidminer.com/9.0/studio/operators/scoring/explain_predictions.html

The tree importance takes the specific model into account, but it obviously only works for tree-based models.

Hope this helps,
Ingo

kypexin · April 2019

Hi @User36964

Regarding your first question:

> And shows the confidence level as 55%. I wonder how this confidence level is calculated and what it indicates. How can the AUC be high when the confidence level is this low?

In fact, confidence and performance are not connected in a direct way. By default, the threshold used for model output is 0.5, which means all binary predictions with confidence above it count as the first class ('True', 'Yes', etc.) and all predictions with confidence below it count as the second class ('False', 'No', etc.). When a metric like accuracy is calculated, only the binary prediction is taken into account, not the confidence, so it doesn't matter whether the confidence is 55% or 90%, as both indicate the same class. AUC does use the confidences, but only through how they rank the examples relative to each other, not through their absolute values, so a 55% confidence for one particular input does not contradict a high AUC.
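A small sketch in plain Python can illustrate this. The labels and confidence values below are invented, and `accuracy` and `auc` are hand-written helpers rather than RapidMiner functions. Accuracy only looks at which side of the 0.5 threshold each confidence falls on, and AUC (in its usual pairwise-ranking definition) only looks at how the confidences order the examples, so a model whose confidences all sit near 50% can score exactly like one with very bold confidences.

```python
def accuracy(labels, confidences, threshold=0.5):
    # Turn confidences into binary predictions, then count the matches.
    predictions = [1 if c > threshold else 0 for c in confidences]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def auc(labels, confidences):
    """Share of (positive, negative) pairs where the positive is ranked higher; ties count half."""
    pos = [c for c, y in zip(confidences, labels) if y == 1]
    neg = [c for c, y in zip(confidences, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
# Two made-up sets of confidences with the same ordering and the same
# side of the threshold for every example, but very different magnitudes.
timid = [0.58, 0.55, 0.48, 0.52, 0.45, 0.42]
bold = [0.95, 0.90, 0.30, 0.70, 0.10, 0.05]

print("accuracy:", accuracy(labels, timid), accuracy(labels, bold))
print("AUC:     ", auc(labels, timid), auc(labels, bold))
```

Both sets of confidences produce identical accuracy and identical AUC, even though no confidence in `timid` is further than 8 points from 50%.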

And I'll let more knowledgeable RapidMiners elaborate on your second question.

User36964 · April 2019

Thank you for the quick response.

I have another question here.

I use Weight by Tree Importance to see the highest-weight attributes (the most relevant or important for the outcome). Unfortunately, these attributes do not match the ones listed in the Model Simulator. For example, the most relevant attribute found by the Model Simulator ranks 973rd among the attribute weights. Vice versa, the highest-weight attributes are not listed in the Model Simulator output.

MartinLiebig · April 2019

Hi,

To add to and emphasize @IngoRM's comment:

Weight by Tree Importance gives you the GLOBAL influence factors across all examples. Explain Predictions gives you LOCAL interpretations, which are only valid for the given example (or its neighbourhood).
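The following toy example in plain Python sketches why the two rankings can disagree. Everything in it is an assumption made for illustration: `model_score` is an invented model, and the two importance measures are simple perturbation averages, not the algorithms used by Weight by Tree Importance or Explain Predictions. Attribute `a` drives the output everywhere, while `b` only matters in a small region of the input space.

```python
def model_score(a, b):
    # Hypothetical model: 'a' always matters, 'b' only matters when a > 0.9.
    return 2.0 * a + (5.0 * b if a > 0.9 else 0.0)

# A small regular grid standing in for the training data.
grid = [0.05 + 0.1 * i for i in range(10)]
dataset = [{"a": a, "b": b} for a in grid for b in grid]

def global_importance(feature):
    """Mean absolute score change when the feature is swapped for each value seen in the data."""
    total, count = 0.0, 0
    for row in dataset:
        base = model_score(**row)
        for value in grid:
            total += abs(model_score(**{**row, feature: value}) - base)
            count += 1
    return total / count

def local_importance(example, feature, deltas=(-0.02, -0.01, 0.01, 0.02)):
    """Mean absolute score change for small perturbations around one example."""
    base = model_score(**example)
    changes = [
        abs(model_score(**{**example, feature: example[feature] + d}) - base)
        for d in deltas
    ]
    return sum(changes) / len(changes)

example = {"a": 0.97, "b": 0.5}  # an example inside the region where 'b' matters
global_scores = {f: global_importance(f) for f in ("a", "b")}
local_scores = {f: local_importance(example, f) for f in ("a", "b")}

print("global:", global_scores)
print("local: ", local_scores)
```

Across the whole data set `a` comes out as the more important attribute, but around this particular example `b` has the larger effect, which mirrors the mismatch described in the question.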

Best,

Martin


Altair RapidMiner Community
