Multivariate stepwise robust linear regression learner

michaelhecht Member Posts: 89  Guru
edited June 4 in Help
Since M5P, which is in the Weka part of RapidMiner, doesn't perform acceptably (in my opinion),
I would really like to see a learner for piecewise multivariate robust linear regression, comparable
to the SRT approach proposed by HUANG and TOWNSHEND in

http://www.landcover.org/pdf/ijrs24_p75.pdf

In my opinion this would be a near-optimal way to approximate arbitrary measured numerical data.
Although HUANG and TOWNSHEND didn't apply robust regression, it should definitely be
implemented to keep outliers from having too strong an influence. Even so, the proposed SRT approach
has the big advantage of producing continuous functions, in contrast to M5P.

If not possible, I would also appreciate a multivariate spline approximation of numerical data.
This should definitely be available ;)


Answers

  • michaelhecht Member Posts: 89  Guru
    Since no one has reacted to my post, I'll reply myself ;)

    It seems that MARS, i.e. Multivariate Adaptive Regression Splines, is something comparable
    (even if the approach is different). I don't think that such a technique is currently part
    of RapidMiner.

    Either this or the already mentioned SRT model should be part of RapidMiner.
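
For readers unfamiliar with MARS, the sketch below shows the hinge basis functions it builds on and why a sum of them is continuous and piecewise linear. The knots and coefficients are hand-picked for illustration (they are not fitted to any data), and the names `hinge`, `TERMS`, and `predict` are hypothetical; this is not the "earth" or RapidMiner implementation.

```python
def hinge(x, knot, direction=+1):
    """MARS-style hinge: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


# Illustrative model only: each term is (coefficient, knot, direction).
# f(x) = 1.0 + 2.0*max(0, x-3) - 0.5*max(0, 3-x) + 1.5*max(0, x-6)
INTERCEPT = 1.0
TERMS = [(2.0, 3.0, +1), (-0.5, 3.0, -1), (1.5, 6.0, +1)]


def predict(x):
    """Evaluate the additive hinge model at a single point."""
    return INTERCEPT + sum(c * hinge(x, k, d) for c, k, d in TERMS)


if __name__ == "__main__":
    # The slope changes at the knots (3 and 6), but the function has no jumps.
    for x in (0.0, 2.999, 3.0, 3.001, 6.0, 8.0):
        print(x, predict(x))
```

Because every hinge is zero at its own knot, the pieces always join up, which is the continuity property asked for in the first post. A real MARS learner would additionally search for the knots and prune terms.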

  • keith Member Posts: 157  Guru
    I happened to come across MARS a few days ago myself.  There's an implementation of it in R (it's in a package actually called "earth" instead of "MARS" because the latter is a trademarked term).  In some preliminary testing I found it to be a very effective learner, and computationally a lot faster than some other approaches I've tried.

    Curious if the RM development team has evaluated MARS and what they think of it as a possible future addition to RM?

    Keith
  • michaelhecht Member Posts: 89  Guru
    Hello keith,

    I also found the R implementation called "earth", and I would be really happy if more of the data mining and statistics
    functions of programs like R (or MATLAB or Scilab) were included in RM to avoid frequent switching between programs. Usually one has to do a certain amount of statistics before applying data mining.

    In any case, I hope that a MARS-like implementation will be part of the next RM.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi guys,
    we are aware that MARS exists, and I have some to-do lists here, dating from two years ago, on which MARS is one of the items left... I really like this algorithm because it's linear in the number of examples and the number of attributes, and its performance looks promising.
    We have some really great improvements for RM 6 which make the work much easier, but they still need a huge amount of developer time. So I can't promise that we will get this feature into the first version. But I'm glad to re-add it to my current to-do list :)

    Greetings,
      Sebastian
  • michaelhecht Member Posts: 89  Guru
    Hi again,

    I'm really happy that you waited for exactly this request  ;)

    Furthermore, even if I don't want to look greedy, I think that a LOESS
    algorithm for locally weighted polynomial regression is the other
    important approach that is missing from RM. Since there are algorithms
    that adapt kd-trees to speed it up, it is strongly
    related to data mining. Even if this approach is lazy, I think it
    should be available, since there are almost no demands on the
    structure of the fitted data. As for all algorithms, I would prefer a
    robust approach  ::)
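
To make the LOESS idea concrete, here is a minimal sketch of a locally weighted linear fit for a single predictor, using the usual tricube kernel. It is an assumption-laden illustration: the function names and the `frac` parameter are hypothetical, the neighbour search is a plain linear scan rather than a kd-tree, and the bandwidth is inflated slightly so the farthest neighbour keeps a small positive weight. It is not robust; the robust variant requested above would add reweighting passes based on the residuals.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(x0, xs, ys, frac=0.5):
    """Locally weighted linear fit evaluated at x0 (one predictor, no robustness)."""
    n = len(xs)
    k = min(n, max(2, int(round(frac * n))))       # neighbourhood size
    # Linear scan for the k nearest points; a tree index could replace this.
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    h = max(abs(xs[i] - x0) for i in nearest) or 1.0
    h *= 1.0001                                    # keep the farthest point's weight > 0
    w = [tricube((xs[i] - x0) / h) for i in nearest]
    x = [xs[i] for i in nearest]
    y = [ys[i] for i in nearest]

    # Weighted least squares for a straight line, in closed form.
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0.0 else 0.0        # fall back to the weighted mean
    return my + slope * (x0 - mx)


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [x * x for x in xs]                       # a curve, fitted piece by piece
    for x0 in (1.5, 4.3, 7.8):
        print(x0, loess_point(x0, xs, ys, frac=0.4))
```

The fit is "lazy" in the sense used above: nothing is trained in advance, and every prediction solves its own small weighted regression over the nearest neighbours.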
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi Michael,
    I will note LOESS, too, but I doubt that we will get everything into the next major release. There's a lot of work to accomplish until then...
    We evaluated the use of KD-trees and ball trees a year ago and didn't find them worth the effort. Especially in high dimensions, they lose every performance advantage over a linear search. Perhaps it's because of my implementation, but I don't think so... rather, I suspect it's another variant of the curse of dimensionality...

    Greetings,
      Sebastian
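
The following sketch does not reproduce the KD-tree evaluation mentioned above. It only illustrates one commonly cited reason why tree-based neighbour search struggles in high dimensions: for uniformly random points, the nearest and farthest distances from a query become nearly equal as the dimension grows, so a tree can prune very little. The function name, point count, and seed are arbitrary assumptions.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest Euclidean distance from one random query point."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return min(dists) / max(dists)


if __name__ == "__main__":
    # A ratio near 0 means the nearest neighbour stands out clearly;
    # a ratio near 1 means almost all points are about equally far away.
    for dim in (2, 10, 100, 1000):
        print(dim, round(distance_contrast(dim), 3))
```

When the ratio approaches 1, nearly every branch of the tree could still contain the nearest neighbour, which is consistent with the observation that a linear search ends up being just as fast.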
  • michaelhecht Member Posts: 89  Guru
    Fine,

    if it isn't in the next release I can live with this - time doesn't matter ...

    By the way, are there any plans to turn the polynomial regression
    into a robust regression? Since we have a lot of "real" industrial data
    with a certain amount of outliers, all methods, or at least the regression,
    should be outlier-resistant.

    I have read a lot of data mining literature, but it seems that there is no work
    on robust methods (except for regression). So the main focus is "only" on data preparation.
    But how does one decide which data points are outliers and which are not?

    There should be a method as follows:
    1. Mark the outliers in a data set automatically e.g. with a robust LOESS/MARS method
    2. Apply a "classical" data mining method with decreased weighting on the outliers

    This is my request for RM 6  ;D
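
The two-step method proposed above can be sketched as follows. Note the substitutions: instead of a robust LOESS/MARS fit, step 1 uses a Theil-Sen line (median of pairwise slopes) for a single predictor, and flags points whose residual exceeds a multiple of the scaled median absolute deviation. The cutoff of 3 and the reduced weight of 0.1 are arbitrary assumptions, and all function names are hypothetical. Step 2 stands in for the "classical" method with an ordinary weighted least-squares line.

```python
from statistics import median


def theil_sen(xs, ys):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))
              if xs[j] != xs[i]]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept


def outlier_weights(xs, ys, cutoff=3.0, low_weight=0.1):
    """Step 1: mark outliers automatically and return one weight per example."""
    slope, intercept = theil_sen(xs, ys)
    resid = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = median(resid)
    mad = median([abs(r - center) for r in resid])
    scale = max(1.4826 * mad, 1e-12)               # guard against a zero MAD
    return [low_weight if abs(r - center) > cutoff * scale else 1.0
            for r in resid]


def weighted_line(xs, ys, ws):
    """Step 2: classical least squares for y = slope*x + intercept, with weights."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [2.0 * x + 1.0 for x in xs]
    ys[7] = 100.0                                  # one injected outlier
    ws = outlier_weights(xs, ys)
    print("weights:", ws)
    print("unweighted fit:", weighted_line(xs, ys, [1.0] * len(xs)))
    print("down-weighted fit:", weighted_line(xs, ys, ws))
```

The weights from step 1 are independent of the learner used in step 2, so any method that accepts example weights could take their place in the second step.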