RapidMiner

Can I log-transform attributes with different scales?

Elite II

hi,

Is it OK to log-transform attributes that are skewed but have different scales? E.g. salary data and age for a certain income group or so...

Like two attributes that are both skewed, but where the scales are very different... Or should I first normalize them and then log-transform?

 

Does the transformation affect the scales in some way? And after log-transformation, should I still normalize them, or is that no longer necessary?

5 REPLIES
Elite III

Re: Can I log-transform attributes with different scales?

From my perspective, these are just two separate transformations. Log transformation will change the shape of the underlying distribution, whereas normalization will not. Normalization is used if you are trying to bring all attributes onto the same absolute value scale, for example when you are using algorithms that are sensitive to the numerical scale of the attributes, such as k-NN or PCA. Log transformations are typically done to make distributions tighter or more "normal" rather than skewed. You can do one or the other or both, depending on what you are trying to accomplish.
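To make the difference concrete, here is a minimal sketch in plain Python (outside RapidMiner). The salary values and the helper names `skewness` and `min_max` are made up for illustration. It compares the skewness of a right-skewed attribute before and after a range normalization and after a log transform.

```python
import math

def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def min_max(values):
    """Range normalization to [0, 1]; a linear rescaling."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical right-skewed salaries, not taken from any real dataset.
salaries = [22000, 25000, 27000, 30000, 32000, 36000, 41000, 55000, 90000, 250000]

raw_skew = skewness(salaries)
norm_skew = skewness(min_max(salaries))
log_skew = skewness([math.log(v) for v in salaries])

print(f"raw:        {raw_skew:.3f}")
print(f"normalized: {norm_skew:.3f}")  # same shape as raw
print(f"log:        {log_skew:.3f}")   # less skewed than raw
```

The normalized values keep the skewness of the raw values because the rescaling is linear, while the log transform pulls in the long right tail.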

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Elite II

Re: Can I log-transform attributes with different scales?

OK, but does log transformation affect the scales too, or just the skew? Will the scales still be in the same kind of ranges?

And if k-NN works better without normalization, as I found with my dataset, should I stick to no normalization or still normalize the data?

Elite III

Re: Can I log-transform attributes with different scales?

Naturally the log transformation alters the scale. And depending on the orders of magnitude involved, it will not necessarily put attributes into the identical scale range either (unlike normalization, which is used to put all attributes onto the same scale). Take a look at this quick sample process, which shows the impact of both normalization and log transforms on the labor negotiations sample dataset.
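For readers without the sample process at hand, the same point can be sketched in plain Python. This is not the RapidMiner process mentioned above; the salary and age values and the helper names are hypothetical.

```python
import math

# Hypothetical values for two attributes on very different scales.
salaries = [22000, 30000, 41000, 90000, 250000]
ages = [19, 24, 31, 45, 67]

def value_range(values):
    return max(values) - min(values)

def min_max(values):
    """Range normalization to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

log_salary = [math.log10(v) for v in salaries]
log_age = [math.log10(v) for v in ages]

# After the log transform the two attributes still occupy different ranges.
print("log10 salary:", round(min(log_salary), 2), "to", round(max(log_salary), 2))
print("log10 age:   ", round(min(log_age), 2), "to", round(max(log_age), 2))

# After range normalization both attributes span exactly [0, 1].
print("normalized salary range:", value_range(min_max(salaries)))
print("normalized age range:   ", value_range(min_max(ages)))
```

The log transform compresses both attributes, but salary still sits several units above age, so a scale-sensitive algorithm would still see them on different footing.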

 

I always normalize when using k-NN.  If k-NN works better in your dataset without normalization, then implicitly what is happening is you are giving more weight in your distance metric to attributes that have larger absolute values.  This may by chance turn out to be a good thing, but typically it is not an intended consequence, nor is the relative weighting of the different attributes necessarily easy to understand.  If you have any nominal attributes and you are therefore using the mixed Euclidean measures distance metric, the asymmetrical impact is typically even worse.  It may really be an indication of model overfitting (even if you are doing cross-validation).  
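A small illustration of that implicit weighting, sketched in plain Python with made-up examples. The attribute ranges, the points, and the helper names `scale` and `nearest` are all assumptions for the sake of the example, and plain Euclidean distance stands in for whatever measure your process uses.

```python
import math

# Hypothetical attribute ranges used for range (min-max) normalization.
AGE_RANGE = (18, 70)
SALARY_RANGE = (20000, 100000)

# Hypothetical examples as (age, salary).
query = (25, 40000)
candidates = {
    "B": (65, 41000),  # similar salary, very different age
    "C": (26, 45000),  # similar age, somewhat different salary
}

def scale(point):
    """Map (age, salary) into [0, 1] x [0, 1] using the assumed ranges."""
    age, salary = point
    return (
        (age - AGE_RANGE[0]) / (AGE_RANGE[1] - AGE_RANGE[0]),
        (salary - SALARY_RANGE[0]) / (SALARY_RANGE[1] - SALARY_RANGE[0]),
    )

def nearest(query_point, points, transform=lambda p: p):
    """Name of the candidate closest to the query in Euclidean distance."""
    q = transform(query_point)
    return min(points, key=lambda name: math.dist(q, transform(points[name])))

nearest_raw = nearest(query, candidates)
nearest_scaled = nearest(query, candidates, transform=scale)

print("nearest without normalization:", nearest_raw)
print("nearest with normalization:   ", nearest_scaled)
```

Without normalization, the 40-year age gap to B barely registers next to a salary difference of 1,000, so B is picked; after normalization both attributes contribute on comparable terms and C becomes the nearest neighbor.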

 

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts

Moderator

Re: Can I log-transform attributes with different scales?

As a rule of thumb, I always normalize when working with k-NN. The z-transformation rescales each attribute to a mean of 0 and a variance of 1, so differences in scale don't come into play. It is a linear rescaling, though, so it does not change the shape of the distribution: a skewed attribute stays skewed.
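What the z-transformation does to mean, variance, and skew can be checked with a few lines of plain Python. This is a sketch with hypothetical data; it uses the population standard deviation, which is an assumption and may differ from what a particular tool uses.

```python
import statistics

def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def z_transform(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical right-skewed attribute.
salaries = [22000, 25000, 27000, 30000, 32000, 36000, 41000, 55000, 90000, 250000]
z = z_transform(salaries)

print("mean:    ", round(statistics.fmean(z), 6))
print("variance:", round(statistics.pvariance(z), 6))
print("skew before:", round(skewness(salaries), 3))
print("skew after: ", round(skewness(z), 3))
```

The mean and variance come out as 0 and 1, while the skewness is the same before and after, so a log transform is still worth considering if the shape of the distribution is the concern.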

Moderator

Re: Can I log-transform attributes with different scales?

Looks like someone is up early on a Saturday... :)