
Multiple nonlinear regression in RapidMiner

binay Member Posts: 3 Contributor I
edited November 2018 in Help
I am a newbie to RapidMiner. I am using RapidMiner as part of my data mining toolkit for my graduation thesis.

I have a number of independent variables and one numerical dependent variable. I have tried linear regression and polynomial regression. However, I also want to try multiple non-linear regression on my data, to see whether it predicts more accurately than linear regression.

By multiple non-linear regression, I mean that some independent variables enter the model linearly and others non-linearly (e.g., logarithmically, exponentially, or polynomially), and the predicted value is the combination of all of these terms.

Y = a·C1 + b·e^(C2) + c·log(C3) + ...

Here, a, b, c are coefficients and C1, C2, C3 are independent variables.
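Since a, b and c enter this formula linearly, one way to fit it is to transform the inputs first (C1, e^(C2), log(C3)) and then run ordinary least squares on the transformed columns. In RapidMiner this would presumably mean creating the transformed attributes (for example with the Generate Attributes operator) before applying Linear Regression. The Python/NumPy sketch below illustrates the idea on synthetic data; the helper name `fit_transformed_linear` is hypothetical, and no intercept is included, to match the formula above.

```python
import numpy as np


def fit_transformed_linear(c1, c2, c3, y):
    """Fit Y = a*C1 + b*exp(C2) + c*log(C3) by ordinary least squares.

    The model is nonlinear in the inputs but linear in a, b, c, so
    transforming the columns first turns it into a plain linear regression.
    """
    # One column per term of the formula; C3 must be strictly positive for log.
    X = np.column_stack([c1, np.exp(c2), np.log(c3)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # array([a, b, c])


# Synthetic, noise-free data generated from known coefficients (illustration only).
rng = np.random.default_rng(0)
c1 = rng.uniform(0.0, 10.0, 200)
c2 = rng.uniform(0.0, 2.0, 200)
c3 = rng.uniform(1.0, 50.0, 200)
y = 2.0 * c1 + 0.5 * np.exp(c2) + 3.0 * np.log(c3)

a, b, c = fit_transformed_linear(c1, c2, c3, y)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
```

If an intercept is needed, add a column of ones to `X`. If a coefficient sits inside a nonlinear term (for example e^(k·C2) with k unknown), the model is no longer linear in its parameters and needs a true nonlinear least-squares fit instead.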

Could anybody explain to me how I can combine operators to achieve this?

Thanks in advance. I am happy to clarify my problem if it is not clear.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
    My first try would be a neural net?

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • binay Member Posts: 3 Contributor I
    Thanks, Martin, for the answer.

    But a neural net is a black box to the user and also requires a large number of inputs. That is why I am considering regression, where the fitted formula is visible and clear to the user, who can easily see how the dependent variable depends linearly, exponentially, logarithmically, or polynomially on each independent variable.

    So, are there any operators in RapidMiner that produce this kind of regression formula? Or is there any other way to deal with such a problem?

    I might also use neural networks and other techniques to validate my predictions, though.

    Thanks!
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    It sounds like you are describing an SVM?
    With a suitable kernel, it transforms the input space so that the relationship becomes linear in the transformed space.

    Try one alongside the Create Formula operator. 
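The "transform the space so it becomes linear" idea is the kernel trick: a kernel function returns the inner product of two points in a transformed feature space without building that space explicitly. The NumPy sketch below checks this for the degree-2 polynomial kernel k(x, z) = (x·z + 1)^2 on two inputs, whose explicit feature map is [1, √2·x1, √2·x2, x1^2, x2^2, √2·x1·x2]. It is a generic illustration and does not reproduce RapidMiner's SVM operator or its kernel parameters; the function names are hypothetical.

```python
import numpy as np


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z + 1)^2, computed implicitly."""
    return (np.dot(x, z) + 1.0) ** 2


def poly2_features(x):
    """Explicit feature map for the kernel above, for a 2-dimensional input x."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2])


x = np.array([1.5, -2.0])
z = np.array([0.5, 3.0])

# Both numbers should agree: the kernel is an inner product in the mapped space.
k_implicit = poly2_kernel(x, z)
k_explicit = np.dot(poly2_features(x), poly2_features(z))
print(k_implicit, k_explicit)
```

A linear model over these six features is a full quadratic model in the original two inputs. With more inputs or higher degrees the number of such terms grows quickly (a degree-2 expansion of 5 inputs already has 21), so a formula extracted from a kernel model can become long.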
  • binay Member Posts: 3 Contributor I
    Thanks, Edward, for your valuable suggestion.

    I implemented your approach, and it produced a very complex formula of about 20-30 terms for 5 independent variables.
    But the worst part was that the performance was not very promising on my data.

    I am developing a parametric cost model in which the cost depends on a number of independent variables, so the final formula would combine various Cost Estimating Relationship (CER) formulas to predict the cost. I know that this is a multiple non-linear regression problem, but I do not know how to implement it, either in RapidMiner or in other tools.

    Any further help in this direction would be appreciated.
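When a coefficient sits inside a nonlinear term, as in the power-law Cost Estimating Relationships often used in parametric cost models, the model has to be fitted with nonlinear least squares. The sketch below uses SciPy's `scipy.optimize.curve_fit` on an assumed CER of the form cost = a · x1^b · x2^c with synthetic data. The model form, the driver names and the use of a log-linear fit to get starting values `p0` are illustrative assumptions, not something RapidMiner provides.

```python
import numpy as np
from scipy.optimize import curve_fit


def cer(X, a, b, c):
    """Hypothetical power-law CER: cost = a * x1**b * x2**c (X has one row per driver)."""
    x1, x2 = X
    return a * x1 ** b * x2 ** c


# Synthetic cost data with about 1% multiplicative noise (illustration only).
rng = np.random.default_rng(42)
x1 = rng.uniform(1.0, 100.0, 100)   # hypothetical cost driver, e.g. size
x2 = rng.uniform(1.0, 20.0, 100)    # hypothetical cost driver, e.g. complexity
X = np.vstack([x1, x2])
cost = cer(X, 2.0, 0.8, 0.5) * rng.normal(1.0, 0.01, 100)

# Starting values: for this particular form, taking logs gives a linear model,
# log(cost) = log(a) + b*log(x1) + c*log(x2), which ordinary least squares can fit.
A = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2)])
(log_a0, b0, c0), *_ = np.linalg.lstsq(A, np.log(cost), rcond=None)

# Refine with nonlinear least squares on the original (unlogged) cost scale.
popt, pcov = curve_fit(cer, X, cost, p0=[np.exp(log_a0), b0, c0])
perr = np.sqrt(np.diag(pcov))  # approximate standard errors of a, b, c
print("a, b, c =", np.round(popt, 3), "std. errors:", np.round(perr, 3))
```

For a model that adds different kinds of terms (for example a linear, an exponential and a logarithmic one), only the function passed to `curve_fit` changes; good starting values matter more when no log transform linearizes the model. The standard errors from `pcov` are only meaningful if the assumed functional form is reasonable.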
  • joen841030 Member Posts: 8 Contributor I
    @binay Hello, I wonder whether you have figured out how to do nonlinear multiple regression? If so, I'd appreciate it if you could share the process. Thanks!!
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    hi @joen841030, hmm, this is an OLD thread; I'm not sure user binay is going to pick this up (although who knows? :smile: ).

    Anyway, all the regression operators, including linear (GLM), polynomial, etc., can be found by simply typing "regression" in the operator search window.



    Is there a particular reason you want to use nonlinear regression models? What is your use case? Have you tried running your data through Auto Model first, as a quick test, to see what happens?

    Scott
  • joen841030 Member Posts: 8 Contributor I
    edited December 2019
    @sgenzer
    Thanks so much for the reply! I have 1 dependent variable (engagement rate) and 12 independent variables (color of the picture), all measured on a continuous scale. I first tried linear regression in SPSS, but it didn't really work because, judging from the graph, the data are non-linear. That's why I am now trying nonlinear regression.
    But I am actually not sure which function I should use for my case... Thanks!
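One common way to choose between candidate functional forms is to compare them on cross-validated error rather than on in-sample fit. The NumPy sketch below does this for a single predictor and three candidate forms (linear, logarithmic, quadratic) on synthetic data whose true relationship is logarithmic; with 12 predictors, the same loop would run over candidate design matrices. The helper `cv_rmse` is hypothetical, and this is not a description of what Auto Model does internally.

```python
import numpy as np


def cv_rmse(X, y, k=5, seed=0):
    """k-fold cross-validated RMSE of an ordinary least-squares fit of y on X."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    fold_mse = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        fold_mse.append(np.mean((X[test] @ coef - y[test]) ** 2))
    return float(np.sqrt(np.mean(fold_mse)))


# Synthetic data whose true relationship is logarithmic (illustration only).
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 100.0, 300)
y = 4.0 + 2.5 * np.log(x) + rng.normal(0.0, 0.3, 300)

# Each candidate is a design matrix with an intercept column plus transformed x.
ones = np.ones_like(x)
candidates = {
    "linear": np.column_stack([ones, x]),
    "log": np.column_stack([ones, np.log(x)]),
    "quadratic": np.column_stack([ones, x, x ** 2]),
}
scores = {name: cv_rmse(X, y) for name, X in candidates.items()}
best = min(scores, key=scores.get)  # form with the lowest held-out error
print(scores, "->", best)
```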
  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @joen841030,

    I highly recommend following Scott's advice and submitting your data to Auto Model.
    Moreover, Auto Model can perform feature selection (and optionally feature generation) automatically for you.
    Your dataset must contain at least 100 rows.

    Regards,

    Lionel
  • joen841030 Member Posts: 8 Contributor I
    Thanks @lionelderkrikor

    I have just tried it out! The generalized linear model appeared to perform best. However, is there any reason why no p-values etc. are shown?





  • varunm1 Member Posts: 1,207 Unicorn
    Hello @joen841030

    To get the p-values, uncheck the "use regularization" option in the GLM parameters and check "compute p-values". I also suggest checking the "remove collinear columns" option. This way you will get the p-values.

    Please let us know if you encounter any issues.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • joen841030 Member Posts: 8 Contributor I
    @varunm1 Thanks for the comment! However, GLM now shows an error: "Error while training the H2O model: Found collinear columns in the dataset. P-values can not be computed with collinear columns in the dataset. Set remove_collinear_columns flag to true to remove collinear columns automatically."

    I wanted to check "remove collinear columns" as you suggested, but I couldn't find that option. Where is it? Thank you very much in advance!
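The error message follows from how classical regression p-values are computed: the coefficient standard errors come from the diagonal of σ²(XᵀX)⁻¹, and that inverse does not exist when one column is a linear combination of the others. The NumPy/SciPy sketch below computes two-sided t-test p-values for an unregularized least-squares fit and refuses to do so when the design matrix is rank-deficient. It is a textbook illustration, not H2O's GLM implementation; `ols_p_values` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Two-sided t-test p-values for OLS coefficients (X should include an intercept column)."""
    n, p = X.shape
    if np.linalg.matrix_rank(X) < p:
        raise ValueError("collinear columns: X'X is singular, p-values are undefined")
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)          # unbiased residual variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))   # standard errors of the coefficients
    t = coef / se
    return coef, 2 * stats.t.sf(np.abs(t), df=n - p)


# Synthetic data: y depends on x1 only; x2 has no real effect (illustration only).
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=100)

X = np.column_stack([np.ones(100), x1, x2])
coef, p_values = ols_p_values(X, y)
print(np.round(coef, 3), np.round(p_values, 4))

# Adding a column proportional to x1 makes X rank-deficient.
X_bad = np.column_stack([X, 2.0 * x1])
try:
    ols_p_values(X_bad, y)
    collinear_detected = False
except ValueError as err:
    collinear_detected = True
    print("error:", err)
```

Dropping one of the collinear columns restores full rank, which is what the remove_collinear_columns option mentioned in the error message is meant to do automatically.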
  • varunm1 Member Posts: 1,207 Unicorn
    edited December 2019
    Hello @joen841030

    It looks like you didn't check "add intercept". Check "add intercept" first; then you will find "remove collinear columns".


    Let us know if you face any other issues.
    Regards,
    Varun

  • joen841030 Member Posts: 8 Contributor I
    @varunm1 Thank you for your earlier comments, they were extremely helpful!
    I eventually decided to use the SVM results, but I am not sure exactly how to interpret those numbers. For example, some of the attribute weights show 0; does that mean those attributes do not contribute to my DV at all? There are also several other SVM outputs that I am not sure how to interpret. I couldn't find SVM in the Auto Model documentation on the RapidMiner website. It would be great if you had some information on the SVM results generated by Auto Model!