Cannot compute the performance of a linear regression model

aledipo10 Member Posts: 2 Contributor I
edited November 2018 in Help

Hi there,

 

I'm a first-time user of RapidMiner and need to carry out a project for a course.

 

The goal is to create a linear regression model from some data, apply it to a new set of similar data and validate the model. The approach I adopted is the following:

1. Load the data

2. Select the interesting attributes (predictor variables which I believe affect the target)

3. Transform a categorical attribute into dummy variables

4. Apply the linear regression model

5. Load the new data set, apply the model and see the results
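
Step 3 above can be sketched outside RapidMiner as follows. This is only an illustration in plain Python of what dummy coding does, not what the RapidMiner operator runs internally; the attribute names `Segment` and `Years` and the helper `dummy_code` are made up for the example.

```python
def dummy_code(rows, attribute):
    """Replace one categorical attribute with 0/1 indicator columns."""
    # Collect the distinct category values in a stable order.
    categories = sorted({row[attribute] for row in rows})
    encoded = []
    for row in rows:
        # Copy every attribute except the categorical one.
        new_row = {k: v for k, v in row.items() if k != attribute}
        # Add one indicator column per category.
        for category in categories:
            new_row[f"{attribute}={category}"] = 1 if row[attribute] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical example rows; the attribute names are assumptions.
rows = [
    {"Segment": "A", "Years": 2},
    {"Segment": "B", "Years": 5},
    {"Segment": "A", "Years": 1},
]
encoded = dummy_code(rows, "Segment")
print(encoded[0])  # {'Years': 2, 'Segment=A': 1, 'Segment=B': 0}
```

The new data set has to be encoded with the same category list, so that its columns match the ones the model was trained on.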

 

However, I get an error at the end when I try to connect the lab output port of the Apply Model operator to the lab input port of the Performance operator:

"Input ExampleSet does not have a label attribute performance"

 

Do you have any insights on this issue that could help me?

 

Please find attached the .rmp process file.

 

Thanks in advance, A.

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,

     

    The fix is relatively easy: just do what RapidMiner asks you to do and keep the label attribute "Avg_Sale_Amount" in the test data as well (and set its role to label, just as you did for the training data).

     

    Think about it: how is RM supposed to calculate a performance if it does not know what the true values are?  That is why the performance operators need both attributes, the label and the prediction, so that they can compare the two.
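
To make the comparison concrete, here is a minimal sketch of one common regression measure, the root mean squared error. It is plain Python, not RapidMiner code, and the numbers are invented; the point is only that the function cannot be evaluated without the true values.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between true values and predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented values standing in for Avg_Sale_Amount and its prediction.
labels = [100.0, 150.0, 200.0]
predictions = [110.0, 140.0, 200.0]
print(rmse(labels, predictions))
```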

     

    Hope this helps,

    Ingo

  • aledipo10 Member Posts: 2 Contributor I

    Hi IngoRM,

     

    thanks for your reply.

    Actually, it's not that clear to me, I'm sorry. The attribute "Avg_Sale_Amount" is not present in the test data, as it is the target variable that is to be predicted. Indeed, after feeding the Apply Model operator with the output of the linear regression and with the test data, I see the attribute named "prediction(Avg_Sale_Amount)" in the results.

     

    How should I keep the label attribute "Avg_Sale_Amount" also in the test data?

     

    Thanks for your help, A.

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,

     

    I don't have your original data, so I do not know whether the column "Avg_Sale_Amount" is in the original data or not.  If it is, just include it in the list of attributes you are choosing with the operator Select Attributes, and also set its role to label (just as you did for the training part of the process).

     

    If it is NOT part of the test data then... it is actually not test data :-)  The idea of a test data set is that you have the true labels so that you can actually make the comparison with the predictions.  If you do not know the truth, there is nothing to compare to.

     

    In this case forget about your "test" data for now and just do a split on the training data with the operator Split Data to actually create your test data set (including the label column!).  Alternatively you can use one of the validation operators like cross-validation etc. 
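
The hold-out idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Split Data operator itself: the helpers `split_data` and `fit_line`, the 70/30 ratio, the seed, and the single-predictor data are all made up for the example.

```python
import random


def split_data(rows, train_ratio=0.7, seed=42):
    """Shuffle the labelled rows and cut them into a training and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented labelled data: every row carries its true value y = 3 + 2x.
rows = [(float(x), 3.0 + 2.0 * x) for x in range(10)]
train, test = split_data(rows)

# Train on one part only, then compare predictions with the held-back labels.
intercept, slope = fit_line(train)
errors = [abs(y - (intercept + slope * x)) for x, y in test]
mean_abs_error = sum(errors) / len(errors)
print(len(train), len(test), mean_abs_error)
```

Because the test rows come from the labelled data, their true values are available for the comparison, which is exactly what the unlabelled "new" data set lacks.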

     

    In case this is not all clear at this point, I really recommend doing the tutorials in RapidMiner, which you find in the "Need Help?" menu in the top right corner of the screen under "Tutorials".  I especially recommend the tutorials in the section "Modeling, Scoring, and Validation".

     

    Hope this helps,

    Ingo

  • prashant768 Member Posts: 6 Contributor I
    I think the point here is to get predictions for a new dataset that has no target variable. Once we have created the model and measured its performance, we want to use that model to predict the values for a new dataset that does not have the target variable. In that case RM gives an error.
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    You are doing something wrong in this case. Please watch videos in the Academy about the correct way to predict data and evaluate models.

    For building the model you need a dataset with a label (= an attribute marked with the role label) and additional regular attributes.

    Apply Model takes a dataset with or without a label (the label is ignored), but the dataset must contain all the necessary regular attributes.

    It then adds a prediction column, with the role prediction.

    The Performance operators need both the label and the prediction, because the two are compared to determine the machine learning performance. For just making predictions you don't need the label in the new dataset; you still get the prediction from Apply Model.
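
The difference between scoring and evaluating can be sketched as two separate functions. This is a plain Python analogy, not RapidMiner's implementation: `apply_model`, `performance`, the column names, and the model coefficients are hypothetical.

```python
def apply_model(model, rows):
    """Add a prediction to every row; a label, if present, is not used."""
    intercept, slope = model
    return [dict(row, prediction=intercept + slope * row["x"]) for row in rows]


def performance(rows):
    """Mean absolute error; every row needs both 'label' and 'prediction'."""
    if any("label" not in row for row in rows):
        raise ValueError("input does not have a label attribute")
    return sum(abs(r["label"] - r["prediction"]) for r in rows) / len(rows)


model = (3.0, 2.0)  # hypothetical trained model: y = 3 + 2x

# Scoring new, unlabelled data works and yields predictions.
new_rows = [{"x": 1.0}, {"x": 4.0}]
scored = apply_model(model, new_rows)
print([row["prediction"] for row in scored])  # [5.0, 11.0]

# Evaluating the same rows fails, because there is nothing to compare to.
try:
    performance(scored)
except ValueError as err:
    print("cannot evaluate:", err)
```

So for unlabelled data the process should simply end after Apply Model, and the Performance step belongs only in the branch that works on labelled data.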

    Regards,
    Balázs
