
Multiple Nonlinear Regression in RapidMiner

binay Member Posts: 3 Contributor I
edited November 2018 in Help
I am a newbie to RapidMiner. I am using RapidMiner as part of my data mining toolkit for my graduation thesis.

I have a number of independent variables and one numerical dependent variable. I have tried linear regression and polynomial regression. However, I also want to try multiple non-linear regression on my data, to see whether it predicts more accurately than linear regression.

By multiple non-linear regression, I mean that some independent variables enter the model linearly and others non-linearly (e.g., as logarithmic, exponential, or even polynomial terms), and the predicted value is the combination of all of them.

$$Y = a\,C_1 + b\,e^{C_2} + c\,\log C_3 + \cdots$$

Here, a, b, c are coefficients and C1, C2, C3 are independent variables.
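Read with a, b, c as the coefficients and C1, C2, C3 as the independent variables, this model is nonlinear in the variables but linear in the coefficients. It can therefore be fitted with ordinary linear regression once the transformed columns exp(C2) and log(C3) have been created (in RapidMiner, presumably with a transformation step such as the Generate Attributes operator placed before Linear Regression). The sketch below shows the idea in Python/NumPy; the helper names `design_matrix` and `fit_coefficients`, the natural-log choice, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def design_matrix(c1, c2, c3):
    """Columns [C1, exp(C2), log(C3)]; natural log, so C3 must be > 0."""
    return np.column_stack([c1, np.exp(c2), np.log(c3)])

def fit_coefficients(c1, c2, c3, y):
    """Least-squares estimates of (a, b, c) in y = a*C1 + b*exp(C2) + c*log(C3)."""
    X = design_matrix(c1, c2, c3)
    coef, _residuals, _rank, _sv = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic data generated from known coefficients (illustration only).
rng = np.random.default_rng(42)
n = 100
c1 = rng.uniform(0.0, 10.0, n)
c2 = rng.uniform(0.0, 2.0, n)
c3 = rng.uniform(1.0, 50.0, n)
y = 1.5 * c1 + 0.8 * np.exp(c2) - 2.0 * np.log(c3) + rng.normal(0.0, 0.1, n)

a, b, c = fit_coefficients(c1, c2, c3, y)
print(f"Y = {a:.2f}*C1 + {b:.2f}*exp(C2) + ({c:.2f})*log(C3)")
```

The printed formula stays fully interpretable, which is what the question asks for. If the real model needs an intercept, a column of ones can be added to the design matrix; using log base 10 instead of the natural log only rescales c.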

Could anybody explain which operators I can use to achieve this?

Thanks in advance. I am happy to clarify the problem if it is not clear.

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,637  RM Data Scientist
    My first try would be a neural net?

    ~Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • binay Member Posts: 3 Contributor I
    Thanks, Martin, for the answer.

    But a neural net is a black box to the user and also requires a large amount of training data. That is why I am considering regression, where the formula is visible and clear to the user, who can easily see how the dependent variable is a linear, exponential, logarithmic, or polynomial function of the independent variables.

    So, are there any operators in RapidMiner that produce such regression formulas? Or is there any other way to deal with this problem?

    I might use neural networks and other techniques as well to validate my predictions though.

    Thanks!
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 570   Unicorn
    It sounds like you are describing an SVM?
    With a non-linear kernel, the inputs are mapped into a space in which the relationship becomes linear.

    Try one alongside the Create Formula operator. 
  • binay Member Posts: 3 Contributor I
    Thanks, Edward, for your valuable suggestion.

    I implemented your approach, and it produced a very complex formula with about 20 to 30 terms for 5 independent variables.
    Worse, its performance on my data was not very promising.

    I am developing a parametric cost model in which the cost depends on a number of independent variables. The final formula would therefore combine various Cost Estimating Relationship (CER) formulas to predict the cost. I know that this is a multiple non-linear regression problem, but I do not know how to implement it, either in RapidMiner or with other tools.

    Any further help in this direction would be appreciated.
  • joen841030 Member Posts: 8 Contributor II
    @binay Hello, have you figured out how to do multiple nonlinear regression? If so, I'd appreciate it if you could share the process. Thanks!
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,954  Community Manager
    hi @joen841030 hmm this is an OLD thread; I'm not sure user binay is going to pick this up (although who knows? :smile: ).

    Anyway, all the regression operators, including linear (GLM), polynomial, etc., can be found by typing "regression" in the operator search window.

    Is there a particular reason you want to use nonlinear regression models? What is your use case? Have you tried running Auto Model first as a quick test to see what happens?

    Scott
  • joen841030 Member Posts: 8 Contributor II
    edited December 2019
    @sgenzer
    Thanks so much for the reply! I have 1 dependent variable (engagement rate) and 12 independent variables (color features of the picture), all measured on a continuous scale. I first tried linear regression in SPSS, but it didn't really work because, judging from the plot, the data appear to be non-linear. That's why I am now trying nonlinear regression.
    But I am not sure exactly which function I should use in my case... Thanks!
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,087   Unicorn
    Hi @joen841030,

    I highly recommend following Scott's advice and submitting your data to AutoModel.
    Moreover, AutoModel can perform feature selection (and optionally feature generation) automatically for you.
    Your dataset must contain at least 100 rows.

    Regards,

    Lionel
  • joen841030 Member Posts: 8 Contributor II
    Thanks @lionelderkrikor

    I just tried it! The generalized linear model appeared to perform best, though. However, is there any reason why no p-values, etc. are shown?

  • varunm1 Moderator, Member Posts: 1,204   Unicorn
    Hello @joen841030

    To get p-values, uncheck the "use regularization" option in the GLM parameters and check "compute p-values". I also suggest checking the "remove collinear columns" option. With these settings, you will get the p-values.

    Please let us know if you encounter any issues.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • joen841030 Member Posts: 8 Contributor II
    @varunm1 Thanks for the comment! However, GLM now returns an error: "Error while training the H2O model: Found collinear columns in the dataset. P-values can not be computed with collinear columns in the dataset. Set remove_collinear_columns flag to true to remove collinear columns automatically. "

    I wanted to check "remove collinear columns" as you suggested, but I couldn't find that option. Where is it? Thank you very much in advance!
  • varunm1 Moderator, Member Posts: 1,204   Unicorn
    edited December 2019
    Hello @joen841030

    It looks like you didn't check "add intercept". Once "add intercept" is checked, the "remove collinear columns" option becomes available.

    Let us know if you face any other issues.
    Regards,
    Varun

  • joen841030 Member Posts: 8 Contributor II
    @varunm1 Thank you for your earlier comments; they were extremely helpful!
    I eventually decided to use the SVM results, but I am not sure exactly how to interpret the numbers. For example, some attribute weights are 0; does that mean those attributes do not contribute to my DV at all? There are also several other SVM outputs that I am not sure how to interpret. I couldn't find SVM in the Auto Model documentation on the RapidMiner website. It would be great if you have any information about the SVM results generated by Auto Model!
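The parametric cost model binay describes (several Cost Estimating Relationships combined) is often nonlinear in its parameters, for example when a cost driver carries a fitted exponent, so unlike the formula in the question it cannot be turned into a linear regression by transforming columns. A common approach is nonlinear least squares. The sketch below implements a basic Levenberg-Marquardt loop in NumPy for a hypothetical two-driver model cost = a·x1^b + c·x2; the model form, the driver names, the starting guess, and the data are all assumptions for illustration, not binay's actual CERs and not a RapidMiner operator.

```python
import numpy as np

def model(params, x1, x2):
    """Hypothetical CER-style model: cost = a * x1**b + c * x2 (needs x1 > 0)."""
    a, b, c = params
    return a * x1**b + c * x2

def jacobian(params, x1, x2):
    """Columns are d(cost)/da, d(cost)/db and d(cost)/dc."""
    a, b, _c = params
    power = x1**b
    return np.column_stack([power, a * power * np.log(x1), x2])

def fit_levenberg_marquardt(x1, x2, y, p0, max_iter=500, tol=1e-10):
    """Minimise the sum of squared residuals with a basic Levenberg-Marquardt loop."""
    p = np.asarray(p0, dtype=float)
    sse = np.sum((y - model(p, x1, x2)) ** 2)
    lam = 1e-3  # small lam ~ Gauss-Newton step, large lam ~ short gradient step
    for _ in range(max_iter):
        r = y - model(p, x1, x2)
        J = jacobian(p, x1, x2)
        JtJ = J.T @ J
        # Marquardt's scaling: damp each parameter by its own curvature.
        step = np.linalg.solve(JtJ + lam * np.diag(np.diag(JtJ)), J.T @ r)
        p_new = p + step
        with np.errstate(over="ignore", invalid="ignore"):
            sse_new = np.sum((y - model(p_new, x1, x2)) ** 2)
        if sse_new < sse:
            # Step improved the fit: accept it and trust the linearisation more.
            p, sse = p_new, sse_new
            lam *= 0.3
            if np.linalg.norm(step) < tol * (1.0 + np.linalg.norm(p)):
                break
        else:
            # Step made things worse (or overflowed to inf/nan): reject it, damp harder.
            lam *= 10.0
            if lam > 1e12:
                break
    return p

# Synthetic data from known parameters plus small noise (illustration only).
rng = np.random.default_rng(0)
x1 = rng.uniform(1.0, 10.0, 60)  # hypothetical cost driver, e.g. size
x2 = rng.uniform(0.0, 5.0, 60)   # hypothetical second driver, e.g. complexity
y = 2.0 * x1**1.5 + 3.0 * x2 + rng.normal(0.0, 0.05, 60)

params = fit_levenberg_marquardt(x1, x2, y, p0=[1.0, 1.0, 1.0])
a, b, c = params
print(f"cost = {a:.3f} * x1^{b:.3f} + {c:.3f} * x2")
```

The loop is written out only to show what happens under the hood; in practice a library routine such as SciPy's `scipy.optimize.curve_fit` (which uses Levenberg-Marquardt for unbounded problems) does the same job. Nonlinear fits can depend on the starting guess, so a sensible p0 from domain knowledge helps.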
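The H2O error quoted in this thread says p-values cannot be computed when the dataset contains collinear columns. In ordinary least squares the reason is that the standard errors come from the inverse of XᵀX, which does not exist when one column is an exact linear combination of others. The sketch below is plain NumPy OLS, not H2O's GLM implementation: it detects an exactly collinear column via matrix rank, drops it greedily, and then computes standard errors and p-values on the remaining columns. It is unregularised, mirroring the advice to uncheck "use regularization". The helper names, the synthetic data, and the normal approximation used for the p-values (chosen here to avoid a SciPy dependency) are assumptions for illustration.

```python
import math
import numpy as np

def drop_collinear_columns(X, names):
    """Keep a column only if it raises the rank of the columns kept so far."""
    kept = []
    for j in range(X.shape[1]):
        if np.linalg.matrix_rank(X[:, kept + [j]]) == len(kept) + 1:
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]

def ols_with_p_values(X, y):
    """Unregularised OLS with standard errors and approximate two-sided p-values.

    The p-values use a normal approximation to the t distribution, which is
    only reasonable when n - p is large. X must have full column rank.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    # erfc(|z| / sqrt(2)) equals the two-sided tail probability 2 * (1 - Phi(|z|)).
    p_values = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in beta / se])
    return beta, se, p_values

# Synthetic data with an exactly collinear column: x3 = x1 + x2 (illustration only).
rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0.0, 10.0, n)
x2 = rng.uniform(0.0, 10.0, n)
x3 = x1 + x2
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0.0, 1.0, n)

X_full = np.column_stack([np.ones(n), x1, x2, x3])
names = ["intercept", "x1", "x2", "x3"]
print("rank", np.linalg.matrix_rank(X_full), "of", X_full.shape[1], "columns")

X, kept_names = drop_collinear_columns(X_full, names)
beta, se, p_values = ols_with_p_values(X, y)
for name, b_j, s_j, p_j in zip(kept_names, beta, se, p_values):
    print(f"{name:>9}: coef={b_j:7.3f}  se={s_j:.3f}  p~{p_j:.3g}")
```

The greedy pass keeps whichever columns come first, so the column order decides which member of a collinear group survives; a tool that removes collinear columns automatically has to make a similar choice.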