
Attribute Question

Ghostrider Member Posts: 60 Contributor II
I have two time series that I want to feed into a learning algorithm, probably starting with a neural net or SVM.  When I plot the time series, the gap or vertical space between the two lines is meaningful.  Should I make this gap an attribute, or would the absolute position of each point be sufficient (the vertical space can be derived from the absolute positions)?  Generally, how do I know when I should construct a synthetic / derived attribute?

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    yes, if the gap is meaningful, you definitely should add it as a new attribute before transforming the series, e.g. by windowing. Things like this - for example, also extracting additional descriptive features - usually help a lot, since only a few data mining schemes pick up the importance of implicit features like (x_t - y_t) without them being added to the data. Other extracted features often help since they abstract from the actual absolute values.

    How to know? Well, just try it and check whether it improves your prediction performance. In general, many modern data mining schemes are good at giving unnecessary features a low weight (or you could add an additional feature selection step to support this), but they can hardly construct any implicit feature on their own.
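
    If you want to try that outside of RapidMiner first, here is a rough Python sketch on toy data (plain NumPy / scikit-learn, not a RapidMiner process; the mean-reverting second series is just an invented example where the gap really is predictive). It windows both series once with the absolute values only and once with the explicit gap columns added, and compares cross-validated performance:

      import numpy as np
      from sklearn.svm import SVR
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n, width = 500, 10
      x = np.cumsum(rng.normal(size=n))                  # first series: a random walk
      y = np.zeros(n)
      for t in range(1, n):                              # second series drifts back towards the first,
          y[t] = y[t-1] + 0.3 * (x[t-1] - y[t-1]) + rng.normal(scale=0.5)   # so the gap carries signal

      def make_windows(s, w):
          # stack overlapping windows of length w as example rows
          return np.array([s[i:i + w] for i in range(len(s) - w)])

      X_abs = np.hstack([make_windows(x, width), make_windows(y, width)])  # absolute values only
      X_gap = np.hstack([X_abs, make_windows(x - y, width)])               # plus explicit gap x_t - y_t
      label = np.diff(y)[width - 1:]                     # next change of y after each window

      print("without gap:", cross_val_score(SVR(), X_abs, label, cv=5).mean())
      print("with gap:   ", cross_val_score(SVR(), X_gap, label, cv=5).mean())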

    Cheers,
    Ingo
  • Ghostrider Member Posts: 60 Contributor II
    "extracting additional describing features - usually help a lot since only a few data mining schemes get the importance of implicit features like (x_t - y_t) without adding them to the data"

    Which data mining schemes pick up the importance of implicit features?  It seems like those would be the only ones I'd be interested in using.  The only drawback, I'm guessing, would be over-fitting.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    well, the most frequently used candidate for this would probably be genetic programming. In my very early data mining days I used it a lot, but quickly found it not robust and stable enough. Genetic programming is very likely to overfit, and you would have to embed some type of regularization in order to prevent this. I would recommend using a feature generation approach (like the operator YAGGA2) with a robust inner learner instead, and adding some regularization, for example by taking the number and / or the complexity of the features into account.
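
    Just to make the idea concrete outside of RapidMiner (this is my own toy construction in Python, not what YAGGA2 actually does internally): greedily generate a few candidate difference / product features and keep a candidate only if the cross-validated score improves by more than a fixed penalty per added feature - a very crude form of the complexity-based regularization mentioned above:

      import numpy as np
      from itertools import combinations
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import cross_val_score

      def penalized_score(X, y, n_extra, penalty=0.005):
          # cross-validated R^2 minus a small cost per generated feature
          return cross_val_score(Ridge(), X, y, cv=5).mean() - penalty * n_extra

      rng = np.random.default_rng(1)
      X = rng.normal(size=(300, 4))
      y = (X[:, 0] - X[:, 1]) + 0.1 * rng.normal(size=300)   # hidden implicit feature x0 - x1

      best_X, best_score, kept = X, penalized_score(X, y, 0), []
      for i, j in combinations(range(X.shape[1]), 2):
          for name, feat in ((f"x{i}-x{j}", X[:, i] - X[:, j]),
                             (f"x{i}*x{j}", X[:, i] * X[:, j])):
              cand = np.column_stack([best_X, feat])
              score = penalized_score(cand, y, len(kept) + 1)
              if score > best_score:               # keep it only if it pays for its own complexity
                  best_X, best_score = cand, score
                  kept.append(name)
      print("kept generated features:", kept)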

    If you want to read more about this, I would recommend my PhD thesis. About 300 pages of fun stuff around these and related questions :-)

    Cheers,
    Ingo
  • Ghostrider Member Posts: 60 Contributor II
    Ingo,
    I downloaded your thesis and it looks very good.  I think you should share it somewhere, maybe make a sticky thread, as I think others could benefit from reading it... I'm only 10 pages into it... Do you know of any other good machine learning resources for beginners?

    Yes, I have been experimenting with genetic programming, using the ECJ project.  I would actually expect GP to overfit a lot less if you prune / constrain the size of the GP tree.  At least the over-fitting is controllable, unlike in a lot of other machine learning algorithms.  Also, a big advantage of GP is that you can understand what was learned: it produces a readable parse tree.  If I feed data into an ANN or SVM, I have no idea what it has actually learned.
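
    For what it's worth, here is a tiny self-contained Python sketch of those two levers (a hard depth cap plus a parsimony penalty on tree size) - just random search over depth-limited expression trees, not a real evolutionary loop like ECJ runs:

      import operator, random
      import numpy as np

      FUNCS = [(operator.add, 2), (operator.sub, 2), (operator.mul, 2)]
      MAX_DEPTH = 4                                  # hard cap on tree depth

      def random_tree(depth=0):
          # grow a random expression tree, forcing a terminal once MAX_DEPTH is reached
          if depth >= MAX_DEPTH or (depth > 0 and random.random() < 0.3):
              return ("x",) if random.random() < 0.7 else ("const", random.uniform(-1, 1))
          f, arity = random.choice(FUNCS)
          return (f,) + tuple(random_tree(depth + 1) for _ in range(arity))

      def evaluate(tree, x):
          if tree[0] == "x":
              return x
          if tree[0] == "const":
              return np.full_like(x, tree[1])
          return tree[0](*(evaluate(child, x) for child in tree[1:]))

      def size(tree):
          return 1 + sum(size(c) for c in tree[1:] if isinstance(c, tuple))

      def fitness(tree, x, y, parsimony=0.01):
          # mean squared error plus a penalty proportional to tree size
          return np.mean((evaluate(tree, x) - y) ** 2) + parsimony * size(tree)

      random.seed(0)
      x = np.linspace(-1, 1, 100)
      y = x * x + 0.5 * x
      best = min((random_tree() for _ in range(2000)), key=lambda t: fitness(t, x, y))
      print("best tree size:", size(best), "penalized fitness:", round(fitness(best, x, y), 4))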
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    thanks for your kind words. Well, I suppose my PhD thesis would hardly count as a good introduction  ;D  but we had a thread here in the forum a couple of months ago discussing recommendations for several books in the field; maybe those could serve as a good starting point:

    http://rapid-i.com/rapidforum/index.php/topic,1837.msg7910.html

    I would actually expect GP to overfit a lot less if you prune / constrain the size of the GP tree.
    That's true, but it does not really help much if you allow arbitrary functions at every node of the parse tree, since in many cases you will end up with a shallow tree containing every function you can think of, all mixed together. And it does not help with getting stable results: change the data only a little and you will often end up with completely different results.

    At least the over-fitting is controllable, unlike in a lot of other machine learning algorithms.
    Hmm, I don't think so. Almost every learning scheme proposed during the last 20 years has some built-in regularization for controlling over-fitting. Of course it is a user parameter (an annoying fact that serves as one of the major motivations of my PhD), but nevertheless it can be controlled. Actually, the only popular learning scheme that comes to mind which does not really offer anything nice for this is neural networks - which is probably one of the major reasons (besides the loooooong runtimes...) why I don't use them often.
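
    As a tiny illustration of that "user parameter" point (generic scikit-learn, nothing RapidMiner-specific): picking the SVM's C by cross-validation is exactly this kind of over-fitting control - small C means stronger regularization:

      from sklearn.datasets import make_classification
      from sklearn.model_selection import GridSearchCV
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=300, n_features=10, random_state=0)
      search = GridSearchCV(SVC(kernel="rbf"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
      search.fit(X, y)
      print("best C:", search.best_params_["C"], "CV accuracy:", round(search.best_score_, 3))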

    Also, a big advantage of GP is that you can understand what was learned: it produces a readable parse tree.  If I feed data into an ANN or SVM, I have no idea what it has actually learned.
    I completely agree, that really is a strong point of Genetic Programming! At least for an ANN you cannot really understand anything just from the model. Things are, in my opinion, different for SVMs though...

    Cheers,
    Ingo