Normalization (z-transform) formula?

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help

Hi,

I am a bit confused about what the actual formula for z-transform normalization in RapidMiner is. Normally it is (X - arithmeticMean(X)) / std.deviation

 

But which standard deviation formula is used? One like this:

https://docs.tibco.com/pub/spotfire/6.5.0/doc/html/norm/norm_z_score.htm

Or the one used in studentizing:

https://de.wikipedia.org/wiki/Studentisierung

 

Or the sample variance?

https://de.wikipedia.org/wiki/Stichprobenvarianz

 

And what do I use if I don't know the true arithmetic mean of my values X, or the probability distribution of X?

The variance is defined through expectation values of X, i.e. E[X] = SUM(p(x)*x) and variance = SUM(p(x)*(x - E[X])^2), and both require knowing p(x).
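
For reference, the two usual candidates for the standard deviation differ only in the divisor. The block below is a sketch of the standard textbook definitions; the symbols $\bar{x}$, $s_n$ and $s_{n-1}$ are notation introduced here for illustration and are not taken from the RapidMiner documentation.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
s_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
z_i = \frac{x_i - \bar{x}}{s}
```

Here $s$ stands for whichever of $s_n$ or $s_{n-1}$ is chosen. Neither needs the distribution p(x); both are computed from the observed values alone.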

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist

    Fred,

     

    I don't quite get the question. You are doing data mining, so you always work with estimates computed from your data. That is the difference between Erwartungswert (expected value) and Mittelwert (sample mean) in German. The only question is whether you apply the -1 correction (dividing by n - 1 instead of n) or not.

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Fred12 Member Posts: 344 Unicorn

    OK, then I was getting something wrong ;)

    When do I apply the -1 correction, and when not? What is the difference?

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist

    I think the correct way is to use it with -1. That makes the variance estimate an unbiased estimator (erwartungstreuer Schätzer) of the true variance, and the std. dev. is its square root.

     

    It makes practically no difference for large n, though.

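
To make the "-1 or not" point from the answers above concrete, here is a minimal Python sketch. It is not RapidMiner's implementation; the function names `z_transform` and `correction_ratio`, the `ddof` parameter, and the sample data are all assumptions made for illustration. With `ddof=1` the sum of squares is divided by n - 1, with `ddof=0` by n.

```python
import math

def z_transform(values, ddof=1):
    """Return z-scores of `values`.

    ddof=1 divides the sum of squares by n - 1 (the "-1 correction"),
    ddof=0 divides by n. Hypothetical helper, not a RapidMiner API.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - ddof)
    std = math.sqrt(variance)
    return [(x - mean) / std for x in values]

def correction_ratio(n):
    """Ratio of the (n - 1)-based std. dev. to the n-based one.

    It depends only on n, so it shows how quickly the two variants converge.
    """
    return math.sqrt(n / (n - 1))

# Made-up example data, chosen only for illustration.
small = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_sample = z_transform(small, ddof=1)      # divides by n - 1
z_population = z_transform(small, ddof=0)  # divides by n

# The two sets of z-scores differ by one constant factor that shrinks toward 1.
for n in (5, 50, 5000):
    print(n, correction_ratio(n))
```

Because both variants subtract the same mean, the resulting z-scores differ only by the constant factor sqrt(n / (n - 1)), which approaches 1 as n grows. That matches the remark that the choice hardly matters for large n.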