Model Simulator - optimize input parameters

User36964 Member, University Professor Posts: 15 University Professor
Hi to all,
I'm building a random forest model. To find the best-fitting model I use cross-validation. According to the cross-validation results, my model has 74.35% accuracy and an AUC of 0.830. These are good performance results for the model.

I also ran the Model Simulator. Its output lists the contradicting and supporting parameters but indicates that my results are not confident, showing a confidence level of 55%. How is this confidence level calculated, and what does it indicate? How can the AUC be high when the confidence level is this low?

Another issue is the Optimize option of the Model Simulator. The output window of the Model Simulator has an Optimize button, which optimizes the input parameters in order to increase the confidence level. The confidence level of my model increased to 90%. When the confidence level increases, some of the contradicting and supporting parameters change. Is there a way to see the newly formed model (after the optimization) and its performance indicators?



Answers

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 290 Unicorn
    Hi @User36964

    Regarding your first question:
    "And shows the confidence level as 55%. I wonder how this confidence level is calculated and what does it indicate? How can AUC is high when the confidence level is this low?"
    In fact, confidence and performance are not directly connected. By default, the threshold applied to the model output is 0.5, which means all binary predictions with confidence above it count as the first class ('True', 'Yes', etc.) and all predictions with confidence below it count as the second class ('False', 'No', etc.). Accuracy is calculated from the binary prediction only, and AUC depends only on how the confidences rank the examples relative to each other, not on their absolute values. So for these metrics it doesn't matter whether a confidence is 55% or 90%, as both indicate the same class.
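To make the point about thresholds and metrics concrete, here is a minimal sketch in plain Python. The labels and confidence values are made up for illustration, and the two helper functions are simple textbook versions of accuracy and AUC, not RapidMiner's implementation. It compares a "timid" model whose confidences stay near 0.5 with a "bold" model that ranks the examples identically but with confidences near 0 and 1.

```python
def accuracy(labels, confidences, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predictions = [1 if c > threshold else 0 for c in confidences]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def auc(labels, confidences):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Only the ordering of the
    confidences matters here, not their absolute values.
    """
    positives = [c for c, y in zip(confidences, labels) if y == 1]
    negatives = [c for c, y in zip(confidences, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: same labels, same ranking, very different confidence levels.
labels = [1, 1, 1, 0, 0, 0]
timid = [0.58, 0.56, 0.48, 0.52, 0.44, 0.42]  # confidences close to 0.5
bold = [0.95, 0.90, 0.40, 0.60, 0.10, 0.05]   # confidences close to 0 or 1

print("timid:", accuracy(labels, timid), auc(labels, timid))
print("bold: ", accuracy(labels, bold), auc(labels, bold))
```

Both confidence lists yield the same accuracy and the same AUC, because each example lands on the same side of the threshold and the ordering of the examples is unchanged. A single prediction with 55% confidence is therefore compatible with a high AUC measured over the whole test set.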

    And I'll let more knowledgeable RapidMiners elaborate on your second question :) 
  • User36964 Member, University Professor Posts: 15 University Professor
    Thank you for the quick response.
    I have another question.

    I use Weight by Tree Importance to find the highest-weighted attributes (the most relevant or important for the outcome). Unfortunately, these attributes do not match the ones listed in the Model Simulator. For example, the most relevant attribute according to the Model Simulator ranks 973rd among the attribute weights. Conversely, the highest-weighted attributes are not listed in the Model Simulator output.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,282 RM Data Scientist
    Hi,
    to add to and emphasize @IngoRM's comment:
    Weight by Tree Importance gives you the GLOBAL influence factors across all examples. Explain prediction gives you LOCAL interpretations, which are only valid for this example (or its neighbourhood).
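The global versus local distinction can be illustrated with a small toy sketch in plain Python. Everything here is hypothetical: the two-attribute model, the data, and the influence measure (average change in confidence when one attribute is swapped for values seen elsewhere in the data). It is not how Weight by Tree Importance or the Model Simulator compute their results; it only shows why the two kinds of ranking can disagree.

```python
def model_confidence(a, b):
    """Hypothetical model: 'a' decides most cases, 'b' only matters when 'a' is low."""
    if a > 0.5:
        return 0.9
    return 0.8 if b > 0.5 else 0.1


# Hypothetical data set of (a, b) pairs; 'a' is above 0.5 for most rows.
DATA = [(0.9, 0.1), (0.8, 0.9), (0.7, 0.2), (0.9, 0.8),
        (0.6, 0.3), (0.2, 0.9), (0.8, 0.4), (0.1, 0.2)]


def local_influence(example):
    """Average confidence change for ONE example when an attribute is replaced
    by each value observed in the data (a crude local sensitivity)."""
    a, b = example
    base = model_confidence(a, b)
    a_values = [row[0] for row in DATA]
    b_values = [row[1] for row in DATA]
    return {
        "a": sum(abs(model_confidence(v, b) - base) for v in a_values) / len(DATA),
        "b": sum(abs(model_confidence(a, v) - base) for v in b_values) / len(DATA),
    }


def global_influence():
    """Average of the local influences over ALL examples."""
    per_example = [local_influence(row) for row in DATA]
    return {name: sum(p[name] for p in per_example) / len(DATA)
            for name in ("a", "b")}


global_scores = global_influence()
local_scores = local_influence((0.2, 0.9))  # an example where 'a' is low

print("global:", global_scores)
print("local :", local_scores)
```

In this toy setup attribute `a` has the larger influence globally, while for the single example `(0.2, 0.9)` attribute `b` has the larger influence. An attribute that ranks low in a global weighting can still be the most relevant one for a particular example.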
    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany