# ANOVA

Hello all of you

I am currently messing around with statistics to check my validation results. After reading some literature, I have a few questions about ANOVA. Since the operator is part of RM, I assume that it is considered useful.

- Do you agree (from your experience) that the assumption of homogeneous variance can be ignored if the checked sequences have equal length and are approximately equally distributed (same distributions, but differing parameters)?
- What about Kruskal-Wallis? It may be more conservative (rejecting H0 less often), but since it is rank-based it can be applied to any performance measure without too much trouble (I suppose).
- What about "local testers" like Scheffé or Tukey? Is their absence in RM a consequence of agreement ("bah. Those are useless") or of time?
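To make the Kruskal-Wallis question concrete, here is a minimal sketch of the H statistic using only the Python standard library. The function names `midranks` and `kruskal_wallis_h` are made up for this illustration, and the sketch deliberately leaves out the correction factor for ties, so treat it as a way to see why the test only needs ranks, not as a reference implementation.

```python
from itertools import chain


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (assumption: no tie correction applied)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = midranks(pooled)
    weighted = 0.0
    offset = 0
    for group in groups:
        # Ranks are stored in pooled order, so each group owns a contiguous slice.
        rank_sum = sum(ranks[offset:offset + len(group)])
        weighted += rank_sum ** 2 / len(group)
        offset += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical per-fold AUC values for three learners (invented numbers).
auc_per_learner = [
    [0.81, 0.79, 0.83, 0.80],
    [0.84, 0.86, 0.85, 0.82],
    [0.78, 0.77, 0.80, 0.76],
]
print(kruskal_wallis_h(auc_per_learner))
```

Under H0 the statistic is approximately chi-square distributed with (number of groups - 1) degrees of freedom, so the printed value would be compared against that distribution. Because only the ranks enter the formula, the same code works for accuracy, AUC or any other performance measure.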

*valid* test setup. I thought really deeply about this and ... I know that significance testing is not the way to ultimate truth, but as a first step I want to create a setup that is acceptable in terms of the current "state of the art". I have talked to other students and people at my home university and read a lot of papers, which led to the picture that significance testing is not thaaaaat important in data mining :-\

My current choice would be the Tukey test. ANOVA is (from my current point of view) as useful as a mathematical proof of existence.

many thanks in advance

greetings

Steffen

## Answers

Tukey

- assumes a normal distribution (since the t-test is allowed for testing performance values like AUC, this should not be a problem)
- assumes that the samples have equal size (no problem)
- tells me where a difference is (unlike ANOVA)
- is not that conservative (unlike the rank-based Steel/Dwass procedure; rank-based procedures may be more reliable, but I prefer less conservative tests)
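As a rough illustration of how Tukey locates the difference, the sketch below computes the studentized range statistic q for every pair of groups, using only the standard library. The function name `tukey_q_statistics` is invented for this example, the sketch assumes equal group sizes (as stated above), and it does not compute the critical value: that has to come from a studentized range table or from a library (recent SciPy versions ship `scipy.stats.tukey_hsd`).

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_q_statistics(groups):
    """Return {(i, j): q} for all pairs of groups (assumption: equal sizes)."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    means = [mean(g) for g in groups]
    # Pooled within-group variance, the same MS_within that ANOVA uses.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_within = len(groups) * (n - 1)
    ms_within = ss_within / df_within
    standard_error = sqrt(ms_within / n)
    return {
        (i, j): abs(means[i] - means[j]) / standard_error
        for i, j in combinations(range(len(groups)), 2)
    }


# Hypothetical per-fold accuracies for three learners (invented numbers).
accuracy_per_learner = [
    [0.71, 0.73, 0.72, 0.70],
    [0.74, 0.75, 0.73, 0.76],
    [0.80, 0.82, 0.81, 0.79],
]
for pair, q in tukey_q_statistics(accuracy_per_learner).items():
    print(pair, round(q, 3))
```

A pair counts as significantly different when its q exceeds the critical value of the studentized range distribution for the chosen alpha, the number of groups, and `df_within`. All pairs are judged against one shared threshold, which is why the result says where the difference is without inflating the alpha error.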

greetings

Steffen

So, back to the questions: I am not too much of an expert on the details (hey, after all I am a data miner) but as far as I know you can ignore the test. At least this is what the statisticians I know usually do.

For all of those, the reason why they are missing is simple: lack of time combined with the fact that no one has asked for them yet. But that's exactly the point for all those significance tests: the results are only valid if the assumptions are correct. And for Tukey the assumptions are pretty similar to those for paired t-tests / ANOVA: if the data does not follow a normal distribution, the results will simply not be valid at all.

But that's also true for the paired t-test, and still I cannot fully recommend it for all cases (besides the assumptions).

Sorry, I cannot comment on that. Anyone else?

Cheers,

Ingo

Thank you Ingo for your assessment. I guess I have to restrict my efforts to finding the best test for my current problem (instead of global truths) or I will never finish the project...

I just want to add a remark: the problem is to find a test which is capable of multiple comparisons. Applying the paired t-test more than once is not valid because the alpha error accumulates. So... ANOVA and Tukey are both capable, but while ANOVA only checks IF there is a difference between the means, Tukey tells me WHERE the difference is.
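To put a number on the accumulation of the alpha error, here is a small sketch. The function names are made up for this example, and the formula 1 - (1 - alpha)^m assumes that the m tests are independent, which pairwise comparisons on the same folds are not, so read the result as an illustration of the trend rather than an exact error rate.

```python
def pairwise_comparisons(num_models):
    """Number of distinct pairs among num_models models."""
    return num_models * (num_models - 1) // 2


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false rejection (assumption: independent tests)."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Comparing 5 learners pairwise means 10 separate t-tests.
tests = pairwise_comparisons(5)
print(tests, round(familywise_error_rate(tests), 3))  # 10 0.401
```

So with five learners and alpha = 0.05 per test, the chance of at least one spurious "significant" difference is already around 40% under the independence assumption.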

Aside: today I stumbled on a paper using the t-test for AUC, of course without an explanation. It is the first one I have seen doing this... I found no argument for it, but ... sometimes I wonder if the problem is on my side, when I am trying to be more correct than some data mining researchers out there >:( Seems to me like the parent-child relationship: children are not allowed to do certain things the parents do, because the children (students) are not able to estimate the consequences...

*grumble*

Steffen