RM 9.4 feedback (official release) : Costs/Benefits calculation

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Dear all,

First, thank you for implementing the cost/benefit calculation in this new release. I think a lot of users (including me) were waiting for this feature.

Two months ago I asked several questions about the cost/benefit calculation in the thread below, and thanks to @IngoRM's answers everything was clear:

https://community.rapidminer.com/discussion/55904/questions-on-rapidminer-9-4-beta-new-releases

But in this official release, I see that the "Total Cost/Benefit (expected)" and the associated average have been dropped. My first question is: why?

 The "Total Cost/Benefit (expected)" and the associated average are replaced by : 
 - "Total for best option"
 - "Gain"

My second question is: can you explain how these 2 numbers are calculated (despite my efforts I was not able to reproduce them), and why they are more relevant than the "Total Cost/Benefit (expected)"?

Here is my attempt to reproduce these 2 numbers with the Titanic dataset, using an NB model in AutoModel with all options left at their defaults:




Third question: in the new column called "cost", why is the cost not counted as negative when the prediction is wrong? (I am assuming the following cost matrix):

 






Thank you for your attention,

Regards,

Lionel

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi Ingo,

    Yes, your long and detailed explanation helps me a lot to understand these new concepts of Benefits/Costs. #noblackboxes  :)
    Thank you for spending your time answering my questions.

    Now you'll think I'm picky about the details, but I will quote the German philosopher Friedrich Nietzsche: "The Devil is in the details"  >:)
    Here goes:
    The 3 monetary indicators (Total Cost/Benefit, Total for Best Option, Gain) are calculated on the whole validation set (i.e. for the Titanic dataset, on 524 examples [1309 examples x 40%]):



    But the displayed confusion matrix is NOT built on the whole validation set:



    Here we can see that the number of examples used to build this confusion matrix (still for the Titanic dataset) is
    219 + 135 + 7 + 14 = 375 examples, presumably because of the factor 5/7 introduced by the Performance Average (Robust) operator.
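
The arithmetic behind this comparison can be checked with a short Python sketch. It uses only the figures quoted in this post; the 5/7 factor is my hypothesis about the Performance Average (Robust) operator rather than a documented value, and the variable names are purely illustrative.

```python
# Figures quoted in the post (Titanic dataset, AutoModel defaults).
total_examples = 1309      # size of the Titanic dataset
validation_share = 0.40    # share of the data used as validation set

# Size of the full validation set: 1309 x 40% = 523.6, rounded to 524.
validation_size = round(total_examples * validation_share)

# The four cells of the displayed confusion matrix.
confusion_cells = [219, 135, 7, 14]
matrix_total = sum(confusion_cells)

# Share of the validation set actually covered by the displayed matrix,
# compared with the hypothesised 5/7 factor (an assumption, not documented).
observed_ratio = matrix_total / validation_size
hypothesised_ratio = 5 / 7

print(f"validation set size : {validation_size}")
print(f"confusion matrix sum: {matrix_total}")
print(f"observed ratio      : {observed_ratio:.4f}")
print(f"5/7                 : {hypothesised_ratio:.4f}")
```

The observed ratio is close to 5/7 but not exactly equal to it, which is consistent with the hypothesis only up to rounding of the subset sizes.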

    My question is: for the sake of consistency, shouldn't the 3 monetary indicators be calculated from this displayed confusion matrix? In other words, the displayed monetary indicators currently don't correspond directly to the displayed confusion matrix ...

    Thank you for your patience and your attention...

    Regards,

    Lionel



  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    You got me there
    So Friedrich Nietzsche was right ..... >:)

    More seriously, I agree with your point of view, Ingo, and once again, thanks for taking the time to answer me.

    Regards,

    Lionel 
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    This is a very interesting discussion.  I haven't had a chance to dive into this new operator yet, but I had a couple of questions.
    @IngoRM how is the new operator different from the existing Performance(Costs) operator?  Or is it?
    It appears that they require the same inputs (a class order and then a misclassification cost matrix). In this framework, are you still allowed to enter benefits as negative costs?

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts