Question regarding linear regression model output
akselerator
Member Posts: 3 Learner I
in Help
Hi RapidMiner Community
I built a linear regression model and tested its performance with cross-validation. The output is a linear function:
- 31.472 * Distance in kilometers + 34850.105 * WTG Quantity + 15042.279
The model performs very well at predicting the cost that I am seeking. However, the values in the predict column of the cross-validation do not match this function. If I insert a given distance and a given WTG quantity into the function, the result is not the same as the value in the predict column.
If the first values are inserted into the output function in Row No. 12, with a distance of 48 and WTG quantity of 1, the output is 48,381.73. However, the model predicts 60,651.
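The hand calculation can be reproduced with a few lines of Python. This is only a sketch: the function name `production_model` is made up for illustration, and the coefficients are copied from the linear function quoted above.

```python
def production_model(distance_km, wtg_quantity):
    """Evaluate the linear function reported as the model output."""
    return -31.472 * distance_km + 34850.105 * wtg_quantity + 15042.279


# Values quoted for Row No. 12: distance of 48, WTG quantity of 1.
prediction = production_model(48, 1)
print(round(prediction, 2))
```

The result agrees with the 48,381.73 computed by hand, so the gap to 60,651 does not come from an arithmetic slip.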
Does anyone know how the 'predict' column in cross-validation is computed from the input variables, and why it differs from the result of the linear regression model?
Thanks in advance for taking the time to read my question.
Kind regards
Aksel
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @akselerator,
This is because, during the 10-fold cross-validation, RapidMiner builds 10 different models, one per fold, each trained on the data outside that fold.
However, the model delivered at the output is built with the entire dataset.
Thus the models of the individual cross-validation folds differ from the "production" model (the equation you showed).
That is why you cannot reproduce the predictions of the cross-validation models with the equation of the "production" model.
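To illustrate the point, here is a minimal pure-Python sketch of the same idea. It is not RapidMiner's implementation, and the data is synthetic (a single made-up feature with noise, not the dataset from the question). It fits one model per fold on the remaining data, then one model on everything, and compares their predictions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data, invented for this sketch only.
rng = random.Random(0)
xs = [float(i) for i in range(1, 51)]
ys = [-30.0 * x + 50000.0 + rng.gauss(0, 2000) for x in xs]

# Split the row indices into k folds.
k = 10
indices = list(range(len(xs)))
rng.shuffle(indices)
folds = [indices[i::k] for i in range(k)]

fold_models = []
cv_predictions = {}
for test_idx in folds:
    held_out = set(test_idx)
    # Train on every row that is NOT in the current fold.
    train_x = [xs[i] for i in indices if i not in held_out]
    train_y = [ys[i] for i in indices if i not in held_out]
    slope, intercept = fit_line(train_x, train_y)
    fold_models.append((slope, intercept))
    # Each held-out row is predicted by the model that never saw it.
    for i in test_idx:
        cv_predictions[i] = slope * xs[i] + intercept

# The "production" model is trained on the entire dataset.
full_slope, full_intercept = fit_line(xs, ys)
full_predictions = {i: full_slope * xs[i] + full_intercept for i in indices}

print("production model:", full_slope, full_intercept)
for n, (s, b) in enumerate(fold_models, 1):
    print("fold", n, "model:", s, b)
```

Each fold prints slightly different coefficients, so a row's cross-validation prediction generally does not equal what the full-data equation gives for the same inputs.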
I hope it is clear
Regards,
Lionel
Answers
Thank you so much. That makes a lot of sense.
Kind regards,
Aksel