Cannot compute the performance of a linear regression model

aledipo10 Member Posts: 2 Contributor I
edited November 2018 in Help

Hi there,

 

I'm a first-time RapidMiner user and need to carry out a project for a course.

 

The goal is to create a linear regression model from some data, apply it to a new set of similar data and validate the model. The approach I adopted is the following:

1. Load the data

2. Select the interesting attributes (predictor variables which I believe affect the target)

3. Transform a categorical attribute into dummy variables

4. Apply the linear regression model

5. Load the new data set, apply the model and see the results

 

However, I get an error at the end when I try to connect the "lab" output port of the Apply Model operator to the "lab" input port of the Performance operator:

"Input ExampleSet does not have a label attribute performance"

 

Do you have any insights on this issue that could help me?

 

Please find attached the .rmp process file.

 

Thanks in advance, A.

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751  RM Founder

    Hi,

     

    The fix is relatively easy: just do what RapidMiner asks you to do and keep the label attribute "Avg_Sale_Amount" also in the test data (also change its role to label just like you did for the training data).

     

    Think about it: how is RM supposed to calculate a performance if it does not know what the true values are?  That is why the performance operators need both attributes, the label and the prediction, so that they can compare the two.

     

    Hope this helps,

    Ingo
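
The point above can be illustrated outside RapidMiner with a minimal Python sketch. It is not how the Performance operator is implemented; the `rmse` helper and the sample values are hypothetical, and only the column names `Avg_Sale_Amount` and `prediction(Avg_Sale_Amount)` come from this thread. It shows that an error measure needs the true value next to each prediction, and fails when the label is missing.

```python
import math

def rmse(rows, label="Avg_Sale_Amount"):
    """Root mean squared error between the label and its prediction column."""
    pred = f"prediction({label})"
    # Without the true label there is nothing to compare the prediction to.
    missing = [i for i, row in enumerate(rows) if label not in row]
    if missing:
        raise ValueError(f"rows {missing} have no '{label}' value to compare against")
    squared = [(row[label] - row[pred]) ** 2 for row in rows]
    return math.sqrt(sum(squared) / len(squared))

# Made-up scored rows that still carry the true label.
scored = [
    {"Avg_Sale_Amount": 100.0, "prediction(Avg_Sale_Amount)": 110.0},
    {"Avg_Sale_Amount": 200.0, "prediction(Avg_Sale_Amount)": 190.0},
]
print(rmse(scored))  # 10.0

# Made-up rows with predictions only, like the unlabeled data in the question.
unlabeled = [{"prediction(Avg_Sale_Amount)": 150.0}]
try:
    rmse(unlabeled)
except ValueError as err:
    print("Cannot compute performance:", err)
```

The second call fails for the same reason the Performance operator complains: the data contains predictions but no label.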

  • aledipo10 Member Posts: 2 Contributor I

    Hi IngoRM,

     

    thanks for your reply.

    Actually, it's not that clear to me, I'm sorry. The attribute "Avg_Sale_Amount" is not present in the test data as it is the target variable that is to be predicted. Indeed, after feeding the "Apply model" block with the output of the linear regression and with the test data, in the results I see the attribute named "prediction(Avg_Sale_Amount)".

     

    How should I keep the label attribute "Avg_Sale_Amount" also in the test data?

     

    Thanks for your help, A.

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751  RM Founder

    Hi,

     

    I don't have your original data so I do not know if the column "Avg_Sale_Amount" is in the original data or not.  If it is, just include it in the list of attributes you are choosing with the operator Select Attributes.  And also set the role to label (just as you did for the training part of the process).

     

    If it is NOT part of the test data then... it is actually not test data :-)  The idea of a test data set is that you have the true labels so that you can actually make the comparison with the predictions.  If you do not know the truth, there is nothing to compare to.

     

    In this case forget about your "test" data for now and just do a split on the training data with the operator Split Data to actually create your test data set (including the label column!).  Alternatively you can use one of the validation operators like cross-validation etc. 

     

    In case this is all not clear at this point, I really recommend doing the tutorials in RapidMiner, which you find in the "Need Help?" menu in the top right corner of the screen under "Tutorials".  I especially recommend the tutorials in the section "Modeling, Scoring, and Validation".

     

    Hope this helps,

    Ingo
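
The hold-out idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not RapidMiner's Split Data operator: the helpers `split_data`, `fit_line` and `rmse` are hypothetical, the 70/30 ratio is an arbitrary choice, and the data is synthetic with a single predictor. The labelled data is split once, the model is fitted on one part, and the error is measured on the other part, which still has its labels.

```python
import math
import random

def split_data(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of the labelled rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def fit_line(rows):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(rows, slope, intercept):
    """Compare each true label y with the model's prediction for x."""
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(sum(errors) / len(errors))

# Synthetic labelled data (x, label) that follows label = 2x + 1 exactly.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]

train, test = split_data(data)          # both parts keep the label
slope, intercept = fit_line(train)      # the model only sees the training part
print(len(train), len(test))            # 14 6
print(rmse(test, slope, intercept))     # close to 0 for this noise-free data
```

Cross-validation repeats the same step over several different splits and averages the resulting errors.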
