
Is there a way to calculate the coefficient of variation and standard error of regression for linear regression with cross validation?

binaytamrakar Member Posts: 5 Contributor I
edited November 2018 in Help

I am using RapidMiner for linear regression analysis.

I think some important measures are missing in RapidMiner. For example, there is root mean squared error (RMSE) but no standard error of the regression. As far as I know, RMSE divides by N, whereas the standard error of the regression divides by N-2 (for simple linear regression with one predictor).
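For reference, these are the standard textbook definitions (not taken from RapidMiner's documentation), where y_i is the observed label, ŷ_i the prediction, N the number of examples and k the number of predictors in a model with an intercept. The two measures differ only in the denominator under the square root:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}
\qquad
S = \sqrt{\frac{1}{N-k-1}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}
```

With one predictor (k = 1) the denominator of S is N-2, as stated above; for large N the two values are nearly identical.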

I am also looking for the coefficient of variation of the model (100 × standard error / mean of the dependent variable) to compare the performance of one model against another.
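A minimal sketch of all three measures computed from one set of actual and predicted values, assuming plain Python lists and an ordinary least-squares model with an intercept. The function name and the n_predictors parameter are hypothetical, not part of RapidMiner:

```python
import math

def regression_stats(actual, predicted, n_predictors=1):
    """Return (rmse, standard_error, cv_percent) for one set of predictions.

    n_predictors is the number of input attributes k (hypothetical parameter);
    the standard error divides the squared-error sum by N - k - 1 (N - 2 for k = 1).
    """
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more examples than model parameters")
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    rmse = math.sqrt(sse / n)                               # divides by N
    std_error = math.sqrt(sse / (n - n_predictors - 1))     # divides by N - k - 1
    mean_y = sum(actual) / n
    cv_percent = 100.0 * std_error / mean_y                 # 100 * S / mean(y)
    return rmse, std_error, cv_percent
```

The coefficient of variation is only meaningful when the mean of the dependent variable is well away from zero.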

 

I can do these calculations manually, but with 10-fold cross validation, which averages the performances of all 10 folds, I cannot calculate them for each fold and take the average.
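To make the per-fold (macro) calculation concrete, here is a small sketch outside RapidMiner. It assumes a single predictor, a plain interleaved 10-fold split and the coefficient of variation as defined in the question; all function names are hypothetical:

```python
import math
import statistics

def fit_simple_ols(xs, ys):
    """Least-squares fit of y = a + b * x on the training examples; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fold_cv_percent(actual, predicted):
    """100 * S / mean(actual), with S dividing the squared-error sum by N - 2."""
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    s = math.sqrt(sse / (len(actual) - 2))
    return 100.0 * s / statistics.fmean(actual)

def macro_cv_percent(xs, ys, k=10):
    """Compute the metric on each of k test folds, then average across folds."""
    n = len(xs)
    # Interleaved split (example i goes to fold i % k); nothing is shuffled
    # here, although a real setup would usually shuffle first.
    folds = [list(range(start, n, k)) for start in range(k)]
    per_fold = []
    for test_idx in folds:
        test = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in test]
        train_y = [ys[i] for i in range(n) if i not in test]
        a, b = fit_simple_ols(train_x, train_y)
        test_y = [ys[i] for i in test_idx]
        pred_y = [a + b * xs[i] for i in test_idx]
        per_fold.append(fold_cv_percent(test_y, pred_y))
    # Macro average plus the spread across folds (sample standard deviation).
    return statistics.fmean(per_fold), statistics.stdev(per_fold), per_fold
```

Applying N-2 to the residuals of a held-out fold follows the question's definition; because those predictions come from a model fitted on other data, whether that degrees-of-freedom correction suits test residuals is debatable.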

 

Could someone explain whether we can calculate the standard error of regression and the coefficient of variation for the model using RapidMiner?

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,749  RM Founder

    Hi,

     

    You can use the X-Prediction operator to get cross-validated predictions for all examples. Then you can use Generate Attributes on the original label and the prediction to calculate the error, followed by Aggregate and another division (probably easiest via Generate Macro) to calculate the performance measures you want. This only yields the micro average of the cross validation, though, not the macro average and standard deviations.
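    A minimal sketch of that micro-average chain, assuming the cross-validated labels and predictions have already been exported as two plain lists. The function names are hypothetical, not RapidMiner operators:

```python
import math

def micro_standard_error(labels, predictions, n_predictors=1):
    """Pooled ("micro-averaged") standard error over all cross-validated predictions.

    Mirrors the chain: squared error per example (Generate Attributes),
    summed over the whole table (Aggregate), then one division and square
    root (Generate Macro).
    """
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    sse = sum(squared_errors)
    dof = len(labels) - n_predictors - 1
    return math.sqrt(sse / dof)

def micro_cv_percent(labels, predictions, n_predictors=1):
    """Pooled coefficient of variation: 100 * S / mean(label)."""
    s = micro_standard_error(labels, predictions, n_predictors)
    return 100.0 * s / (sum(labels) / len(labels))
```

    Because every residual is pooled before the single division, this value generally differs from the mean of the per-fold values.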

     

    Another option, if you know some Java, is to create an extension that offers those performance metrics.

     

    Cheers,

    Ingo

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,475  RM Data Scientist

    Hi,

     

    Another way to do this is to compute the performance value by hand (e.g. using Aggregate) and then use Data to Performance to get a performance vector that can be used everywhere in RapidMiner, for example inside cross validation. You can put this construct into a separate process and call it from anywhere using Execute Process.

     
    ~Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • binaytamrakar Member Posts: 5 Contributor I

    Dear Ingo,

     

    Thank you very much for your answer. Yes, following this procedure yields only the micro average, but I prefer the macro average, which is taken across the folds. Is there another option if I want to calculate the macro average?

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,475  RM Data Scientist

    Dear Binaytamrakar,

     

    You can use a Log operator to log the individual performance of each fold and then use Log to Data to get a table covering all folds.
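    Once the per-fold values are in a table, the macro average and its spread are simply the column mean and standard deviation. A sketch, assuming each logged row is a dictionary; the column names "fold" and "cv_percent" are hypothetical, not RapidMiner defaults:

```python
import statistics

def macro_summary(rows, column="cv_percent"):
    """Macro average and sample standard deviation of one logged column."""
    values = [row[column] for row in rows]
    return statistics.fmean(values), statistics.stdev(values)

# Hypothetical rows as they might look after Log to Data: one row per fold.
rows = [{"fold": i, "cv_percent": v} for i, v in enumerate([10.0, 12.0, 14.0], start=1)]
print(macro_summary(rows))  # → (12.0, 2.0)
```

    statistics.stdev is the sample standard deviation (dividing by n - 1); the deviation RapidMiner reports next to a cross-validated performance may use a different convention, so check before comparing the two.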

     

    ~Martin
