
Multiple non linear Regression in Rapid miner

Member Posts: 3 Contributor I
edited November 2018 in Help
I am new to RapidMiner. I am using RapidMiner as the data mining tool for my graduation thesis.

I have a number of independent variables and one numerical dependent variable. I have tried linear regression and polynomial regression. However, I also want to try multiple non-linear regression on my data, to see whether it predicts more accurately than linear regression.

By multiple non-linear regression, I mean a model in which some independent variables enter linearly and others non-linearly (e.g., logarithmic, exponential, or polynomial), and the predicted value is the combination of all of these terms.

Y = a·C1 + b·e^(C2) + c·log(C3) + ...

Here, a, b, c are coefficients and C1, C2, C3 are independent variables.
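A model of this shape is non-linear in the variables but still linear in the coefficients, so one common approach is to generate the transformed attributes first (in RapidMiner, for example, with Generate Attributes) and then fit an ordinary linear regression on them. The sketch below illustrates that idea in plain Python, writing the model as y = β0 + β1·x1 + β2·e^(x2) + β3·log(x3). The feature map, function names, and synthetic data are hypothetical, and in practice a library routine such as numpy.linalg.lstsq would replace the hand-written solver.

```python
import math


def solve_linear_system(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's data is not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from all rows below the pivot row.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def design_row(x1, x2, x3):
    """Hypothetical feature map: intercept, x1 (linear), e^x2, log(x3)."""
    return [1.0, x1, math.exp(x2), math.log(x3)]


def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from known coefficients.
true_coef = [2.0, 1.5, 0.5, -3.0]
samples = [(x1, x2, x3)
           for x1 in (0.0, 1.0, 2.0)
           for x2 in (0.0, 0.5, 1.0)
           for x3 in (1.0, 2.0, 4.0)]
rows = [design_row(*s) for s in samples]
y = [sum(b * f for b, f in zip(true_coef, row)) for row in rows]

coef = fit_least_squares(rows, y)
print([round(c, 6) for c in coef])  # → [2.0, 1.5, 0.5, -3.0]
```

Because the coefficients enter linearly, the same trick works for any fixed transform (squares, logs, exponentials). Models where a coefficient sits inside the non-linearity, such as e^(b·x), need a genuinely non-linear fit instead.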

Could anybody explain how I can combine operators to achieve this?

Thanks in advance. I am happy to clarify the problem if anything is unclear.

Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
My first try would be a neural net?

~Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
Member Posts: 3 Contributor I

But a neural net is a black box to the user, and it also requires a large amount of input data. That is why I am considering regression, where the formula is visible and clear to the user, so the user can easily see how the dependent variable is a linear, exponential, logarithmic, or polynomial function of the independent variables.

So, are there any operators in RapidMiner that produce this kind of regression formula? Or is there another way to deal with this problem?

I might still use neural networks and other techniques to validate my predictions, though.

Thanks!
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
It sounds like you are describing an SVM?
With a non-linear kernel, the SVM transforms the input space so that the relationship becomes linear in the transformed space.

Try one alongside the Create Formula operator.
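The idea of transforming the space so the problem becomes linear is the kernel trick: a kernel function equals a dot product in some feature space, so the SVM fits a linear model there. For a degree-2 polynomial kernel on two inputs, that feature map can be written out explicitly, as in this minimal Python sketch. The function names are illustrative only, not RapidMiner operators.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D input, chosen so that phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, 4.0)
k_implicit = poly2_kernel(x, z)                          # x . z = 11, so 11^2
k_explicit = dot(poly2_features(x), poly2_features(z))   # same value, computed in feature space
print(k_implicit)  # → 121.0
print(k_explicit)  # approximately 121.0, up to floating-point rounding
```

Even in this tiny case the feature space contains squared and cross terms, which hints at why a formula extracted from a kernel model can grow to many terms.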
Member Posts: 3 Contributor I
Thanks, Edward, for your valuable suggestion.

I implemented your approach, and it produced a very complex formula of about 20-30 terms for 5 independent variables.
Worse, the predictive performance on my data was not very promising.

I am developing a parametric cost model in which the cost depends on a number of independent variables. The final formula would combine various Cost Estimating Relationship (CER) formulas to predict the cost. I know that this is a multiple non-linear regression problem, but I do not know how to implement it, either in RapidMiner or in other tools.
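Many cost estimating relationships are power laws of the form cost = a·x^b, which is non-linear in the coefficient b. A classic workaround is to take logarithms, giving log(cost) = log(a) + b·log(x), which is an ordinary linear regression. The sketch below assumes a single hypothetical cost driver (weight) and noise-free synthetic data. Note that fitting in log space minimizes relative (multiplicative) error rather than absolute error, and models that cannot be linearized this way usually need a general non-linear least-squares routine (for example scipy.optimize.curve_fit in Python).

```python
import math


def fit_power_law(x, cost):
    """Fit cost = a * x**b by least squares on log(cost) = log(a) + b * log(x).

    All x and cost values must be strictly positive, since logarithms are taken.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in cost]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    # Simple linear regression in log-log space: slope b, intercept log(a).
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


# Hypothetical, noise-free example: cost = 50 * weight**0.8
weights = [10.0, 20.0, 40.0, 80.0, 160.0]
costs = [50.0 * w ** 0.8 for w in weights]
a, b = fit_power_law(weights, costs)
print(round(a, 6), round(b, 6))  # → 50.0 0.8
```

Several CERs fitted this way can then be combined in the final cost formula, although a jointly fitted non-linear model may behave differently from the sum of separately fitted pieces.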

Any further help in this direction would be appreciated.
Member Posts: 8 Contributor II
@binay Hello, have you figured out how to do multiple nonlinear regression? If so, I'd appreciate it if you could share the process! Thanks!!
Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
hi @joen841030 hmm, this is an OLD thread; I'm not sure user binay is going to pick this up (although who knows?).

Anyway, all the regression operators, including linear (GLM), polynomial, etc., can be found by simply typing "regression" in the operator search window.

Is there a particular reason you want to use nonlinear regression models? What is your use case? Have you tried running Auto Model first as a quick test to see what happens?

Scott
Member Posts: 8 Contributor II
edited December 2019
@sgenzer
Thanks so much for the reply! I have 1 dependent variable (engagement rate) and 12 independent variables (colors of the picture), all measured on a continuous scale. I first tried linear regression in SPSS, but it didn't really work, because the graph suggests the data are non-linear. That's why I am now trying nonlinear regression.
But I am not sure exactly which function I should use in my case... Thanks!
Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @joen841030,

Moreover, AutoModel can perform feature selection (and, optionally, feature generation) automatically for you.
Your dataset must contain at least 100 rows.

Regards,

Lionel
Member Posts: 8 Contributor II
Thanks @lionelderkrikor

I have just tried it out! The generalized linear model appeared to perform best. However, is there any reason why no p-values etc. are shown?

Moderator, Member Posts: 1,207 Unicorn
Hello @joen841030

To get the p-values, uncheck the "use regularization" option in the GLM parameters and check "compute p-values". I also suggest checking the "remove collinear columns" option. With these settings you will get the p-values.

Please let us know if you encounter any issues.
Regards,
Varun
https://www.varunmandalapu.com/

Be Safe. Follow precautions and Maintain Social Distancing

Member Posts: 8 Contributor II
@varunm1 Thanks for the comment! However, GLM now returns an error: "Error while training the H2O model: Found collinear columns in the dataset. P-values can not be computed with collinear columns in the dataset. Set remove_collinear_columns flag to true to remove collinear columns automatically."

I wanted to check "remove collinear columns" as you suggested, but I couldn't find that option. Where is it? Thank you very much in advance!!!
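The error message can be understood from how p-values are computed for an unregularized linear model: in ordinary least squares (a GLM with Gaussian family), the coefficient standard errors come from the diagonal of σ²(X^T X)^(-1). If one column is an exact linear combination of others, X^T X is singular and has no inverse, so those standard errors and p-values are undefined. This minimal sketch only illustrates that algebra on hypothetical three-column data; it is not how H2O implements the check.

```python
def gram_matrix(rows):
    """Compute X^T X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


xs = [1.0, 2.0, 3.0, 4.0]
# Columns: intercept, x, and a hypothetical third attribute equal to 2*x (perfectly collinear).
collinear = [[1.0, x, 2.0 * x] for x in xs]
# Same, but the third attribute is x**2, which is not a linear combination of 1 and x.
independent = [[1.0, x, x * x] for x in xs]

print(det3(gram_matrix(collinear)))    # → 0.0 (singular: no inverse, no standard errors)
print(det3(gram_matrix(independent)))  # → 80.0 (invertible)
```

Dropping one of the collinear columns, which is what the "remove collinear columns" option is described as doing, makes X^T X invertible again.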
Moderator, Member Posts: 1,207 Unicorn
edited December 2019
Hello @joen841030

It looks like you didn't check "add intercept". Check "add intercept" first; the "remove collinear columns" option will then become visible.

Let us know if you face any other issues.
Regards,
Varun


Member Posts: 8 Contributor II