# Polynomial regression with mixed terms

michaelhecht
Member Posts: **89** Maven
I would appreciate it if the polynomial regression could also apply mixed terms. That is, if there are attributes X1, X2, X3 and a label Y, and I specify a second-order polynomial, I could get

Y = a * X1 + b * X2 + c * X3 + d * X1*X2 + e * X2*X3 + f * X1*X3 + g * X1^2 + h * X2^2 + i * X3^2

maybe with some optimization of the terms used, based on an implicit cross-validation, since applying all possible mixed terms could "explode" the size of the polynomial if the number of attributes is large. Alternatively, one could specify the maximum number of attributes per mixed term, e.g. 3, which means that mixing is only allowed among 3 attributes. With higher-order polynomials one would then get terms like: a * X1 * X4^3 * X5^2.
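To make the "explosion" concrete, here is a minimal Python sketch (not RapidMiner code) that enumerates the terms of such a polynomial. The function names and the `max_attrs_per_term` parameter are hypothetical, chosen only to mirror the cap described above.

```python
from collections import Counter
from itertools import combinations_with_replacement


def mixed_terms(attributes, degree, max_attrs_per_term=None):
    """List all polynomial terms up to `degree`, without the intercept.

    Each term is a dict {attribute: exponent}. `max_attrs_per_term` is a
    hypothetical cap on how many different attributes one term may mix.
    """
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(attributes, d):
            exponents = Counter(combo)
            if max_attrs_per_term is None or len(exponents) <= max_attrs_per_term:
                terms.append(dict(exponents))
    return terms


def format_term(term):
    """Render a term such as {'X1': 1, 'X4': 3} as 'X1*X4^3'."""
    return "*".join(a if e == 1 else f"{a}^{e}" for a, e in term.items())


# Second-order polynomial over three attributes: the nine terms of the formula above.
second_order = mixed_terms(["X1", "X2", "X3"], degree=2)
print([format_term(t) for t in second_order])

# With 20 attributes and a third-order polynomial the term count grows quickly.
many = [f"X{i}" for i in range(1, 21)]
print(len(mixed_terms(many, degree=3)))
print(len(mixed_terms(many, degree=3, max_attrs_per_term=2)))
```

Capping the number of attributes per term only slows the growth; selecting terms by validation, as discussed in the answers, is still needed for wide data sets.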


## Answers

**2,531** Unicorn

As you said, generating all mixed terms would let the problem explode. So you would have to do some sort of feature selection internally, and validation is needed too, of course.

I personally believe that this is the wrong way, because you lose so much control. It's a simple approach, but you could do everything you want done internally right inside your process. Use a feature generation that provides multiplications, perhaps a genetic algorithm, just as you like, together with a validation over a linear regression, and you have everything you want...

Greetings,

Sebastian

**89** Maven

If yes, assistance is really welcome. I'm not yet familiar enough with RapidMiner to do this myself, and the documentation doesn't seem to help either. So whether it is easy for you or a challenge, I would be really happy to get a solution or at least some hints. I'm sure that other users of this forum think similarly.

**2,531** Unicorn

Here's just a simple example of what I suggested. The genetic feature generation will produce products of the attributes, and the linear regression will learn a least-squares model on all terms. Hence this is a kind of polynomial regression with mixed terms. The genetic selection algorithm optimizes which mixed terms are used, so it avoids building all possible terms at once.
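Outside RapidMiner, the same idea can be sketched in plain Python. This is only an illustration under stated assumptions: it swaps the genetic search for a simpler greedy forward selection, scores each candidate product term by cross-validated least squares, and uses hypothetical helper names (`design`, `fit`, `cv_error`, `forward_select`) and synthetic data.

```python
import random
from itertools import combinations_with_replacement


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def design(data, terms):
    """Intercept column plus one product column per term (tuple of attribute indices)."""
    rows = []
    for x in data:
        row = [1.0]
        for term in terms:
            product = 1.0
            for idx in term:
                product *= x[idx]
            row.append(product)
        rows.append(row)
    return rows


def fit(rows, y):
    """Least squares via the normal equations, with a tiny ridge for stability."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    for i in range(k):
        XtX[i][i] += 1e-9
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def cv_error(data, y, terms, folds=5):
    """Mean squared error over a simple interleaved k-fold cross-validation."""
    rows = design(data, terms)
    total = 0.0
    for f in range(folds):
        train = [i for i in range(len(rows)) if i % folds != f]
        w = fit([rows[i] for i in train], [y[i] for i in train])
        for i in range(f, len(rows), folds):
            pred = sum(wi * xi for wi, xi in zip(w, rows[i]))
            total += (pred - y[i]) ** 2
    return total / len(rows)


def forward_select(data, y, degree=2, min_gain=1e-6):
    """Greedily add the product term that lowers the cross-validated error most."""
    n_attrs = len(data[0])
    candidates = [c for d in range(1, degree + 1)
                  for c in combinations_with_replacement(range(n_attrs), d)]
    chosen, best = [], cv_error(data, y, [])
    while candidates:
        score, term = min((cv_error(data, y, chosen + [c]), c) for c in candidates)
        if best - score < min_gain:
            break
        chosen.append(term)
        candidates.remove(term)
        best = score
    return chosen, best


# Synthetic data (assumption): Y = 1 + 2*X1 + 3*X1*X2, with X3 irrelevant.
random.seed(0)
data = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(120)]
y = [1 + 2 * x[0] + 3 * x[0] * x[1] for x in data]
chosen, best = forward_select(data, y)
print(chosen, best)
```

Each selected term is a tuple of attribute indices, so `(0, 1)` stands for X1*X2 and `(0, 0)` for X1^2. A genetic search explores combinations that a greedy pass can miss, at the price of more model evaluations.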

Greetings,

Sebastian

**89** Maven

I'm both impressed and confused. Is it really this simple (short)? Is there any detailed description in the RM documentation of how the AGA operator works?

Is there a way to force certain attributes to be used? Specifically, I wanted to plot att1 against the predicted label, but the AGA produced only mixed terms.

Nevertheless, thank you very much again.

I'm sure that RM is able to do much more than I can imagine! So where is the RM book to teach me how to do it?

:-\

**2,531** Unicorn

Unfortunately there's no more detailed description than the operator info. Surely this implementation is based on a scientific publication, but perhaps that would be too detailed?

If you only want to plot one variable against another, you could simply use the scatter plot. If you want to evaluate a particular expression because of some background knowledge, you could use the AttributeConstruction operator. You can enter any expression there; in your case you would type "att1 * prediction(label)". Take a look at the operator info of the AttributeConstruction operator for an overview of the various functions available.

The RM book is planned, but unfortunately the community finds too many bugs, so nobody can work on it. To be serious: we are working on it, but it won't be finished in the near future.

Greetings,

Sebastian