
Polynomial regression with mixed terms

michaelhecht Member Posts: 89 Maven
edited June 2019 in Help
I would appreciate it if the polynomial regression could also handle mixed terms, i.e.
if there are attributes X1, X2, X3 and a label Y and I specify a second-order polynomial,
I could get

Y = a * X1 + b * X2 + c * X3 + d * X1*X2 + e * X2*X3 + f * X1*X3 + g * X1^2 + h * X2^2 + i * X3^2

maybe with some kind of optimization of the terms used, based on an implicit cross-validation,
since generating all possible mixed terms could "explode" the size of the polynomial
if the number of attributes is large. On the other hand, one could perhaps specify the maximum
number of attributes per mixed term, e.g. 3, which means that intermixing is only allowed between
3 attributes. With higher-order polynomials one would then get terms like a * X1 * X4^3 * X5^2.
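To make the combinatorics concrete, here is a small Python sketch (the attribute names and the function name are just placeholders) that enumerates all monomials up to a given degree and shows how quickly the term count grows:

```python
from itertools import combinations_with_replacement

def polynomial_terms(attributes, degree):
    """All monomials up to `degree` over `attributes`, constant term excluded."""
    return ["*".join(c)
            for d in range(1, degree + 1)
            for c in combinations_with_replacement(attributes, d)]

# second order over X1..X3: exactly the 9 terms from the formula above
print(polynomial_terms(["X1", "X2", "X3"], 2))

# with 10 attributes and degree 3 the polynomial already has 285 terms
print(len(polynomial_terms([f"X{i}" for i in range(1, 11)], 3)))
```

For n attributes and degree d the count is C(n+d, d) - 1, so capping the number of attributes allowed per mixed term, as suggested above, quickly pays off.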

Answers

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    as you said, generating all mixed terms would make the problem explode, so some sort of internal feature selection would be needed, and validation too, of course.
    Personally, I believe this is the wrong way, because you lose so much control. It's a simple approach, but you can set up everything you want to happen internally directly inside your process: use a feature generation operator that provides multiplications, perhaps a genetic algorithm, just as you like, together with a validation over a linear regression, and you have everything you want...

    Greetings,
      Sebastian
    michaelhecht Member Posts: 89 Maven
    Hi, thank you for your reply. Do I understand correctly that you suggest doing it all manually in RapidMiner?

    If yes, assistance is very welcome. I'm not nearly familiar enough with RapidMiner to do
    this myself, and the documentation doesn't seem helpful either. So if it is easy for you, or a challenge ;), I
    would really be happy to get a solution or at least some hints. I'm sure other users of this
    forum feel the same.
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Michael,
    here's just a simple example for what I suggested.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="one variable non linear"/>
            <parameter key="number_of_attributes" value="1"/>
        </operator>
        <operator name="AGA" class="AGA" expanded="yes">
            <parameter key="use_plus" value="false"/>
            <parameter key="reciprocal_value" value="false"/>
            <operator name="XValidation" class="XValidation" expanded="yes">
                <parameter key="sampling_type" value="shuffled sampling"/>
                <operator name="LinearRegression" class="LinearRegression">
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="Performance" class="Performance">
                    </operator>
                </operator>
            </operator>
        </operator>
        <operator name="Construction2Names" class="Construction2Names">
        </operator>
    </operator>
    The genetic feature generation will produce products of the attributes, and the linear regression will learn a least-squares model on all terms. Hence, this is a sort of polynomial regression with mixed terms. The genetic algorithm will optimize which mixed terms are used, so that not all possible terms have to be built at once.

    Greetings,
      Sebastian
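Outside of RapidMiner, the approach can be sketched in plain Python. Everything below is an illustrative assumption (toy data, candidate term set, penalty weight, population settings), not the actual AGA implementation: candidate product terms are searched with a small genetic-style loop, and each term set is scored by fitting a linear regression (via the normal equations) and measuring its error on held-out data:

```python
import random
from itertools import combinations_with_replacement

random.seed(1)

# toy data (an assumption for illustration): Y = 3*X1*X2 + X3, no noise
ATTRS = ["X1", "X2", "X3"]

def make_row():
    row = {a: random.uniform(-1, 1) for a in ATTRS}
    row["Y"] = 3 * row["X1"] * row["X2"] + row["X3"]
    return row

train = [make_row() for _ in range(40)]
held_out = [make_row() for _ in range(20)]

# candidate terms: every monomial of degree <= 2 (tuples of attribute names)
CANDIDATES = [c for d in (1, 2) for c in combinations_with_replacement(ATTRS, d)]

def term_value(term, row):
    value = 1.0
    for attr in term:
        value *= row[attr]
    return value

def solve(A, b):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def held_out_sse(terms):
    """Least-squares fit on train (intercept + terms), squared error on held_out."""
    k = len(terms) + 1
    X = [[1.0] + [term_value(t, r) for t in terms] for r in train]
    y = [r["Y"] for r in train]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum(
        (r["Y"] - beta[0] - sum(b * term_value(t, r)
                                for b, t in zip(beta[1:], terms))) ** 2
        for r in held_out
    )

def fitness(terms):  # validation error plus a small penalty per term used
    return held_out_sse(terms) + 0.01 * len(terms)

# genetic-style search: keep the best sets, mutate by toggling a random term
population = [tuple(CANDIDATES)] + [
    tuple(t for t in CANDIDATES if random.random() < 0.5) for _ in range(9)
]
for _ in range(15):
    population.sort(key=fitness)
    survivors = population[:5]
    children = []
    for parent in survivors:
        toggled = set(parent) ^ {random.choice(CANDIDATES)}
        children.append(tuple(sorted(toggled)))
    population = survivors + children

best = min(population, key=fitness)
print("best terms:", ["*".join(t) for t in best])
```

With the noise-free toy target Y = 3*X1*X2 + X3, any surviving term set that still contains both X1*X2 and X3 fits the held-out data almost exactly, and the size penalty pushes the search toward small sets.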
    michaelhecht Member Posts: 89 Maven
    Wow,

    I'm both impressed and confused. Is it really this simple (short)? Is there any detailed description in the RM
    documentation of how the AGA operator works?

    Is there a chance to force certain attributes to be used? Specifically, I wanted to plot att1 against the predicted label,
    but the AGA produced only mixed terms.

    Nevertheless, thank you very much again.

    I'm sure that RM is able to do much more than I can imagine!
    So where is the RM book to teach me how?
    :-\
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    unfortunately there's no more detailed description than the operator info. The implementation is surely based on a scientific publication, but perhaps that would be too detailed? :)

    If you only want to plot one variable against another, you could simply use the scatter plot. If you want to evaluate a special expression based on some background knowledge, you could use the AttributeConstruction operator. You can enter any expression there; in your case you would type "att1 * prediction(label)". Take a look at the operator info of the AttributeConstruction operator for an overview of the various functions available.

    The RM book is planned, but unfortunately the community finds too many bugs, so nobody can work on it :) Seriously though: we are working on it, but it won't be finished in the near future.

    Greetings,
      Sebastian
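For readers outside RapidMiner, the effect of such an AttributeConstruction expression can be mimicked in a few lines of Python (the rows and column names below are made up; in RapidMiner you would type the expression into the operator instead):

```python
# rows stand in for an example set; "prediction(label)" is the model output column
rows = [
    {"att1": 2.0, "prediction(label)": 1.5},
    {"att1": -1.0, "prediction(label)": 0.5},
]

# derive a new attribute from the expression att1 * prediction(label)
for row in rows:
    row["att1_times_pred"] = row["att1"] * row["prediction(label)"]

print(rows)
```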