# Multivariate stepwise robust linear regression learner

michaelhecht (Member, Posts: **89**, Guru)
Since M5P, which is in the Weka part of RapidMiner, doesn't perform acceptably (in my opinion), I would really like to see a learner for piecewise multivariate robust linear regression, comparable to the SRT approach proposed by HUANG and TOWNSHEND in

http://www.landcover.org/pdf/ijrs24_p75.pdf

In my opinion this would be a near-optimal way to approximate arbitrary measured numerical data. Although HUANG and TOWNSHEND didn't apply robust regression, it should definitely be implemented to keep outliers from having too strong an influence. Nevertheless, the proposed SRT approach has the big advantage of producing continuous functions, in contrast to M5P.

If that is not possible, I would also appreciate a multivariate spline approximation of numerical data. This should definitely be available.


## Answers

89 Guru

It seems that something comparable (even if it takes a different approach) is MARS, i.e. Multivariate Adaptive Regression Splines. I don't think such a technique is currently part of RapidMiner.

Either this or the already mentioned SRT model should be part of RapidMiner.
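For readers unfamiliar with MARS: it builds its model from hinge functions of the form max(0, x - t) and max(0, t - x), so the fitted function is piecewise linear but continuous at every knot t. The following is only a minimal single-variable, single-knot sketch of that idea. It is not the full MARS forward/backward procedure and not the `earth` or RapidMiner implementation; the helper names (`hinge`, `fit_one_knot`, `fit_best_knot`) and the toy data are made up for illustration.

```python
def hinge(x, knot, sign=+1):
    """Hinge basis function: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return max(0.0, sign * (x - knot))

def solve(a, b):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_one_knot(xs, ys, knot):
    """Least-squares fit of y ~ c0 + c1*max(0, x-knot) + c2*max(0, knot-x)."""
    rows = [[1.0, hinge(x, knot, +1), hinge(x, knot, -1)] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve(ata, aty)
    sse = sum((sum(c * v for c, v in zip(coef, r)) - y) ** 2
              for r, y in zip(rows, ys))
    return coef, sse

def fit_best_knot(xs, ys):
    """Try every interior x value as the knot and keep the fit with the lowest squared error."""
    candidates = sorted(set(xs))[1:-1]
    return min(((k,) + fit_one_knot(xs, ys, k) for k in candidates),
               key=lambda t: t[2])

def predict(x, knot, coef):
    return coef[0] + coef[1] * hinge(x, knot, +1) + coef[2] * hinge(x, knot, -1)

# Toy data (hypothetical): slope +2 up to x = 4, slope -1 afterwards.
xs = [float(i) for i in range(11)]
ys = [2.0 * x if x <= 4 else 8.0 - (x - 4.0) for x in xs]
knot, coef, sse = fit_best_knot(xs, ys)
print("knot:", knot, "coefficients:", [round(c, 6) for c in coef])
```

Because both hinge functions are zero at the knot, the two linear pieces meet there, which is the continuity property that M5P's separate leaf models do not guarantee.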

157 Guru

I'm curious whether the RM development team has evaluated MARS, and what they think of it as a possible future addition to RM.

Keith

89 Guru

I also found the R implementation called "earth", and I would be really happy if more of the data mining and statistics functions from programs like R (or MATLAB or Scilab) were included in RM, to avoid frequent switching between the two programs. Usually one has to do a certain amount of statistics before applying data mining.

Nevertheless, I hope that a MARS-like implementation will be part of the next RM.

2,525 Unicorn

We are aware that MARS exists, and I have some to-do lists here, dating from two years ago, on which MARS is one of the points left... I really like this algorithm, because it's linear in the number of examples and the number of attributes and seems to perform promisingly.

We have some really great improvements for RM 6 that make the work much easier, but they still need a huge amount of developer time. So I can't promise that we will get this feature into the first version. But I'm glad to re-add it to my current to-do list.

Greetings,

Sebastian

89 Guru

I'm really happy that you waited for exactly this request.

Furthermore, even if I don't want to look greedy, I think that a LOESS algorithm for locally weighted polynomial regression is the other important approach that is missing in RM. Since there are algorithms that use kd-trees to speed it up, it is strongly related to data mining. Even if this approach is lazy, I think it should be available, since there are almost no demands on the structure of the fitted data. As with all algorithms, I would prefer a robust approach ::)
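To make the LOESS request concrete, here is a minimal sketch of locally weighted linear regression in one dimension: for each query point, a straight line is fitted to the nearest neighbours using tricube weights, and only the value at the query point is kept. The function names, the `frac` parameter, and the toy data are assumptions for illustration. The sketch assumes distinct x values and omits the robustness iterations (reweighting by residuals) that the robust variant of the method adds.

```python
def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def loess_point(x0, xs, ys, frac=0.5):
    """Degree-1 locally weighted fit evaluated at x0 (assumes distinct x values)."""
    k = max(3, int(round(frac * len(xs))))       # number of neighbours in the window
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1e-12                    # bandwidth: distance to k-th neighbour
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Toy data (hypothetical): an exactly linear relationship, which a local
# linear fit should reproduce at every query point.
xs = [i * 0.5 for i in range(21)]
ys = [3.0 * x + 1.0 for x in xs]
smoothed = [loess_point(x, xs, ys) for x in xs]
print([round(v, 3) for v in smoothed[:5]])
```

The method is lazy in the sense mentioned above: nothing is trained in advance, and every prediction searches for neighbours, which is where a spatial index such as a kd-tree could be used.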

2,525 Unicorn

I will note LOESS too, but I doubt that we will get everything into the next major release. There's a lot of work to accomplish until then.

We evaluated the use of KD-trees and ball trees a year ago and didn't find them worth the effort. Especially in high dimensions, they lose every performance advantage over a linear search. Perhaps it's because of my implementation, but I don't think so... rather, I suspect it's another variant of the curse of dimensionality...

Greetings,

Sebastian
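One commonly cited explanation for this effect (offered here as background, not as something measured in RapidMiner) is distance concentration: for random points in high dimensions, the nearest and farthest neighbours of a query end up at almost the same distance, so a tree's bounding tests can rarely discard a branch. The small experiment below, with a made-up function name and uniformly random data as its assumptions, prints the nearest-to-farthest distance ratio for several dimensions.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=300, seed=0):
    """Ratio of the nearest to the farthest distance from a random query point."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return min(dists) / max(dists)

for dim in (2, 10, 100, 500):
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

A ratio close to 1 means that pruning by distance bounds saves little work compared with simply scanning all points. Real data sets often have lower intrinsic dimension than uniform noise, so the effect can be milder in practice.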

89 Guru

If it isn't in the next release I can live with that; time doesn't matter...

By the way, are there any plans to turn the polynomial regression into a robust regression? Since we have a lot of "real" industrial data with a certain amount of outliers, all methods, or at least the regression, should be outlier-resistant.
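As an illustration of what "robust regression" can mean here, the sketch below uses iteratively reweighted least squares with Huber weights, one common way to limit the influence of outliers. It fits a straight line only; the same reweighting loop could wrap a polynomial fit. The function names, the tuning constant 1.345, the MAD-based scale estimate, and the toy data are assumptions chosen for the example, not a description of any RapidMiner operator.

```python
import statistics

def weighted_line(xs, ys, w):
    """Weighted least-squares straight line; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def huber_line(xs, ys, c=1.345, iterations=20):
    """Refit repeatedly, shrinking the weight of points with large residuals."""
    w = [1.0] * len(xs)
    for _ in range(iterations):
        intercept, slope = weighted_line(xs, ys, w)
        res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Robust scale from the median absolute residual (guarded against zero).
        scale = statistics.median(abs(r) for r in res) / 0.6745 or 1e-12
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in res]
    return intercept, slope

# Toy data (hypothetical): y = 2x + 1 with one gross outlier at x = 7.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[7] = 100.0

ols_intercept, ols_slope = weighted_line(xs, ys, [1.0] * len(xs))
intercept, slope = huber_line(xs, ys)
print("ordinary least squares slope:", round(ols_slope, 3))
print("Huber-reweighted slope:", round(slope, 3))
```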

I have read a lot of data mining literature, but it seems that there is no approach to robust methods (except for regression). So the main focus is "only" on data preparation.

But how does one decide which data points are outliers and which are not?

There should be a method as follows:

1. Mark the outliers in a data set automatically, e.g. with a robust LOESS/MARS method

2. Apply a "classical" data mining method with decreased weighting on the outliers

This is my request for RM 6 ;D
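The two steps requested above could look roughly like the sketch below. As a stand-in for a robust LOESS/MARS fit, step 1 uses a Theil-Sen line (median of pairwise slopes) and flags points whose residual exceeds three robust standard deviations; step 2 runs ordinary least squares with a reduced weight on the flagged points. The choice of Theil-Sen, the threshold of 3, the outlier weight of 0.05, and all function names are assumptions made for this example.

```python
import statistics
from itertools import combinations

def theil_sen(xs, ys):
    """Robust line: median of all pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in zip(xs, ys))
    return intercept, slope

def mark_outliers(xs, ys, threshold=3.0):
    """Step 1: flag points far from the robust fit, measured in MAD-based scale units."""
    intercept, slope = theil_sen(xs, ys)
    res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    med = statistics.median(res)
    scale = statistics.median(abs(r - med) for r in res) / 0.6745 or 1e-12
    return [abs(r - med) > threshold * scale for r in res]

def weighted_least_squares(xs, ys, w):
    """Step 2: a classical learner (a least-squares line) that accepts example weights."""
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Toy data (hypothetical): y = 2x + 1 with small alternating noise and one outlier.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
ys[3] = 40.0

flags = mark_outliers(xs, ys)
weights = [0.05 if flagged else 1.0 for flagged in flags]
intercept, slope = weighted_least_squares(xs, ys, weights)
print("flagged indices:", [i for i, f in enumerate(flags) if f])
print("weighted fit:", round(intercept, 3), round(slope, 3))
```

Keeping the flags separate from the learner means the second step can be any method that supports example weights, which matches the idea of marking outliers once and reusing the marks.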