

Do I always need to execute a normalization/z-transformation to compare data within my data set and apply an ML model?

Mike0985 Member Posts: 9 Contributor I
edited May 23 in Help
Dear all,

First of all, I am a beginner with RapidMiner and data science techniques, so please be patient with me. I got the attached NBA data set from Kaggle, which I am using for a university project / exam.

In general, do I always need to execute a normalization (z-transformation) to compare data with each other within my data set, e.g. the NBA statistics in columns L - Q and W - AB, before applying a machine learning model such as Naive Bayes or linear/logistic regression?

Is outlier detection a real machine learning model, or more of a technique to filter out outliers? From what number of detected outliers is it advantageous to apply outlier detection, e.g. 10 or more?

I would be very grateful if someone could help me.

Regards,
Michael


Best Answer

  • Mike0985 Member Posts: 9 Contributor I
    Solution Accepted
    Hello Martin,

    Referring to your first comment: "Well, it depends on the algorithm you are using. In general, normalization never hurts, and it can help quite a bit. Some algorithms, like a decision tree, simply don't care. Then you lose interpretability but not predictive power."

    I still do not know exactly which ML model to use for my data set; I'm still working on that. I put the data set into the Auto Model function, and different ML models, like Naive Bayes or a regression, could be possible judging by e.g. the accuracy. Would you therefore say I should try both variants, with and without normalization, in Auto Model to see and compare which fits best?

    Referring to your second comment, "Outlier techniques can be used in several ways. They can be used to:"

    I applied outlier detection to my data set (more than 21,000 rows), but it reduced the data set by fewer than 10 outliers and took more than 30 minutes. Would you say outlier detection is still useful in this case, or is it better to leave it out and save the 30 minutes for the rest of the data science process?

    Thanks in advance.

    Regards,
    Michael



Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,053  RM Data Scientist
    Hi there,

    In general, do I always need to execute a normalization (z-transformation) to compare data with each other within my data set, e.g. the NBA statistics in columns L - Q and W - AB, before applying a machine learning model such as Naive Bayes or linear/logistic regression?


    Well, it depends on the algorithm you are using. In general, normalization never hurts, and it can help quite a bit. Some algorithms, like a decision tree, simply don't care. Then you lose interpretability but not predictive power.
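To make the idea concrete, here is a minimal sketch of what a z-transformation does (my own plain-Python illustration with a made-up points column, not something from this thread or RapidMiner's implementation): each value has the column mean subtracted and is divided by the column's standard deviation, so attributes on different scales become comparable.

```python
from statistics import mean, pstdev

def z_transform(values):
    """Z-transformation: subtract the column mean, divide by the
    column's (population) standard deviation. Afterwards the column
    has mean 0 and standard deviation 1, so differently scaled
    attributes (e.g. points vs. rebounds) sit on a common scale."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical "points per game" column for illustration
points = [10, 20, 30, 40, 50]
print([round(z, 3) for z in z_transform(points)])
# → [-1.414, -0.707, 0.0, 0.707, 1.414]
```

In RapidMiner itself you would not write this by hand; the Normalize operator offers a z-transformation method that does the equivalent per column.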

    Is outlier detection a real machine learning model, or more of a technique to filter out outliers? From what number of detected outliers is it advantageous to apply outlier detection, e.g. 10 or more?

    Outlier techniques can be used in several ways. They can be used to:

    • Clean the data set to make it more interpretable
    • Get better models, since some models are affected by outliers (e.g. linear regression)
    • Gather information / act as an ML model in its own right, for example in predictive maintenance or fraud detection

    It all depends on how you use it.
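As a rough illustration of the first use (filtering), here is a sketch of my own: a simple z-score cutoff with an assumed threshold. RapidMiner's outlier operators use more sophisticated distance- or density-based methods, but the underlying idea of flagging points far from the bulk of the data is the same.

```python
from statistics import mean, pstdev

def flag_outliers(values, threshold=3.0):
    """Flag values whose z-score exceeds the threshold.

    A naive distance-from-the-mean rule with an assumed cutoff;
    shown only to illustrate the concept of outlier filtering."""
    mu = mean(values)
    sigma = pstdev(values)
    return [abs(x - mu) / sigma > threshold for x in values]

# Hypothetical "minutes played" column; 95 is an obvious outlier
minutes = [30, 31, 29, 32, 28, 30, 31, 95]
print(flag_outliers(minutes, threshold=2.0))
# → [False, False, False, False, False, False, False, True]
```

Rows flagged True could then be filtered out before training, or inspected on their own if the outliers themselves are what you are interested in (as in fraud detection).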


    Cheers,

    Martin



    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany