
Cost Analyses of inaccurate regression model

HaukeV Member Posts: 5 Newbie
Hi,

I have built a model to estimate house prices. I would now like to classify each prediction as correct or incorrect based on its percentage difference from the true price. Any idea how I can do that? I would then like to assign a cost or a benefit to every correct/incorrect prediction.

Kind regards,
Hauke

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @HaukeV,

    You can use the Generate Attributes operator to calculate the relative difference between the predicted price and the true price,
    for example diff = (pred(price) - true(price)) / true(price), and then apply a threshold to that value to assign a cost or a benefit.

    Hope this helps,

    Regards,

    Lionel
  • HaukeV Member Posts: 5 Newbie
    Hi Lionel, thanks for the answer. We now have two columns: one that is true/false depending on whether the prediction is 15% or more too low, and a second that is true/false depending on whether the prediction is 15% or more too high.

    We are now trying to run the cost analysis for classification, so that a true in either column is given the appropriate cost, but we keep running into issues. We tried manually redefining the labels and predictions, among other things. Any help would be much appreciated.
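For readers following along outside RapidMiner, the relative-difference formula and the 15% threshold described above can be sketched in plain Python. This is only an illustration: the function names, the class labels "too low" / "ok" / "too high", and the example prices are assumptions made up for this sketch, not values from the thread.

```python
def relative_error(predicted, actual):
    """Signed relative error: (predicted - actual) / actual."""
    return (predicted - actual) / actual


def classify(predicted, actual, threshold=0.15):
    """Label a prediction by how far it misses the true price.

    threshold=0.15 mirrors the 15% band mentioned in the thread.
    """
    diff = relative_error(predicted, actual)
    if diff <= -threshold:
        return "too low"   # prediction is 15% or more below the true price
    if diff >= threshold:
        return "too high"  # prediction is 15% or more above the true price
    return "ok"


# Hypothetical (predicted, actual) price pairs, for illustration only.
examples = [(200_000, 250_000), (300_000, 250_000), (260_000, 250_000)]
labels = [classify(p, a) for p, a in examples]
print(labels)
```

A single three-valued label like this carries the same information as the two true/false columns, and is easier to feed into a classification-style cost evaluation.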
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Hi,
    You can use the Performance (Costs) operator here. Is this a commercial project? If so, we can help you with presales resources to make your case.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • HaukeV Member Posts: 5 Newbie
    Hi Martin,

    Thanks for the information. This is not a commercial project, just practice for a paper.

    I changed my setup and now have only one attribute, which classifies each prediction as 0, too low, or too high. I then tried to use the Performance (Costs) operator on this attribute to add a cost matrix. To make that work, I first had to add a default model that simply copies the class (0, too low, or too high) to a predicted class. I set all values in the matrix to 0 and then entered the required cost only in the cells where the predicted class equals the actual class.

    However, when I do the math manually (80 cases too low and 100 cases too high, each multiplied by its respective cost), the result does not match what the Performance (Costs) operator computes at all.

    Thanks for your help!
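As a way to cross-check the manual calculation described in the last post, here is a minimal Python sketch that applies a cost matrix to (actual, predicted) pairs and reports both the total cost and the average cost per example. The thread does not say which of these figures the operator reports, so comparing both against its output is a suggestion, not a confirmed diagnosis. The counts of 80 and 100 come from the thread; the 320 cases in class "0" and the cost values 5.0 and 2.0 are hypothetical placeholders.

```python
from collections import Counter


def total_and_average_cost(pairs, cost):
    """Sum the cost of every (actual, predicted) pair.

    Pairs missing from the cost table are charged 0, which matches a
    matrix that was zeroed out before specific cells were filled in.
    """
    counts = Counter(pairs)
    total = sum(cost.get(pair, 0.0) * n for pair, n in counts.items())
    return total, total / len(pairs)


# Predicted class is a copy of the actual class, as in the described setup.
pairs = (
    [("too low", "too low")] * 80       # count from the thread
    + [("too high", "too high")] * 100  # count from the thread
    + [("0", "0")] * 320                # hypothetical count
)

# Hypothetical costs, entered only where predicted == actual.
cost = {
    ("too low", "too low"): 5.0,
    ("too high", "too high"): 2.0,
}

total, average = total_and_average_cost(pairs, cost)
print(f"total cost: {total}, average cost per example: {average}")
```

If the manual figure agrees with one of the two numbers but not the other, the mismatch is a matter of total versus per-example reporting rather than an error in the cost matrix.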