Numerical attribute transformations? Log, 0;1 range, outliers, etc.

mafern76 Member Posts: 45 Contributor II
Hi!

I was wondering what, in your experience, are the best practices to try when transforming numerical attributes into forms digestible by predictive algorithms such as Neural Networks, Regressions, SVM...

There's the old rule of converting the range to 0;1, but that can be expanded. For example:

If the attribute has negative values, should it still go to 0;1, or to -1;1 with negatives to the left of zero and positives to the right?

Is it better to use a z-transform to center on 0, regardless of original values?

If you use the standard deviation to cap outliers' values, how many standard deviations do you allow? 2? 3? 5?

Do you simply apply a log to every attribute to spread out values or do you manually look into graphs and decide which ones could benefit from getting "logged"?

When using log, how do you deal with values lower than 1?

What other things do you do to adapt values to what algorithms benefit from the most?

What do you understand by "what algorithms benefit from the most"?

Thanks for your insight, best regards.

I'm planning to launch a battery of experiments with these possibilities, but I was hoping to get some insight before starting.
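For anyone who wants to try the options listed above, here is a minimal sketch in plain Python. The function names (`min_max_scale`, `max_abs_scale`, `signed_log1p`) are made up for illustration and are not RapidMiner operators. It shows one way, not the only way, to handle negative values when scaling, and to take a log of values that are below 1, zero, or negative.

```python
import math

def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values onto [low, high]; zero is NOT kept at zero."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant attribute: nothing to spread, return the midpoint.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]

def max_abs_scale(values):
    """Divide by the largest absolute value: result lies in [-1, 1],
    negatives stay left of zero and positives stay right of zero."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        return [0.0 for _ in values]
    return [v / largest for v in values]

def signed_log1p(values):
    """sign(v) * log(1 + |v|): defined for zero and negative values,
    and values between 0 and 1 do not turn negative."""
    return [math.copysign(math.log1p(abs(v)), v) for v in values]

sample = [-4.0, 0.0, 0.5, 2.0, 12.0]
print(min_max_scale(sample))             # range 0;1
print(min_max_scale(sample, -1.0, 1.0))  # range -1;1, zero is shifted
print(max_abs_scale(sample))             # range -1;1, zero stays at zero
print(signed_log1p(sample))
```

Note that a plain min-max mapping to -1;1 moves the original zero (here it ends up at -0.5), while dividing by the largest absolute value keeps it in place. Which one is preferable depends on whether zero has a meaning in the data.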

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    For most of your questions there are no simple answers, just some rules of thumb. For example, we usually prefer the Z-Transformation over range scaling because it is more robust against outliers and allows an easy cut at 2, 3 or 5 standard deviations. How many standard deviations you allow depends on the data and the setting you are in.

    Whether to calculate the log function also depends on the data. You really should look at your data beforehand or, as an alternative, calculate the log function (or square, or root, or whatever) on all attributes and apply a Feature Selection afterwards to select the most expressive features and functions thereof.

    Deciding when to use which algorithm is not so easy either. Again, there are just some rules of thumb, such as "SVM for sparse data with many attributes; Decision Tree never for data with many attributes, but often good for data with a lot of examples", and so on.

    Best regards,
    Marius
  • mafern76 Member Posts: 45 Contributor II
    Hi, thanks for your answer. It's very interesting to know others' opinions.

    I have never thought about adding another layer of attributes consisting of original vs. log vs. square vs. etc., but it goes well with my brute force approach so I'll include it.

    I guess experience will teach me more and more what is reasonable to explore and what leads nowhere.
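
The z-transformation with a standard-deviation cut, and the extra layer of transformed attributes discussed in this thread, can be sketched roughly as follows. This is plain Python with hypothetical helper names (`z_transform`, `clip`, `candidate_features`), not RapidMiner code, and the feature selection step itself is left out because it depends on the tool and the learner.

```python
import math
import statistics

def z_transform(values):
    """Center on 0 and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def clip(values, limit=3.0):
    """Cut z-scores at +/- limit standard deviations (2, 3 or 5 are common picks)."""
    return [max(-limit, min(limit, v)) for v in values]

def candidate_features(values):
    """Original plus transformed copies of one attribute; a feature
    selection run afterwards would decide which ones to keep."""
    return {
        "original": list(values),
        "square": [v * v for v in values],
        "signed_sqrt": [math.copysign(math.sqrt(abs(v)), v) for v in values],
        "signed_log1p": [math.copysign(math.log1p(abs(v)), v) for v in values],
    }

values = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 60.0]  # 60.0 is an obvious outlier
z = z_transform(values)
print(clip(z, limit=2.0))  # the outlier is capped at 2.0, the rest is unchanged
print(candidate_features([-4.0, 0.0, 9.0]))
```

One caveat with this sketch: the outlier itself inflates the mean and standard deviation used for the z-scores, so on small samples the cut limit may need to be lower than expected.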