ANOVA
Hello all of you
I am currently messing around with statistics to check my validation results. After reading some literature, I have a few questions about ANOVA. Since the operator is part of RM, I assume that it is considered useful.
- Do you agree (from your experience) that the assumption of homogeneous variance can be ignored if the checked sequences have equal length and are approximately equally distributed (same distributions, but differing parameters)?
- What about Kruskal-Wallis? It may be more conservative (rejecting H0 less often), but since it is rank-based it can be applied to any performance measure without too much trouble (I suppose).
- What about "local testers" like Scheffé or Tukey? Is their absence in RM a consequence of agreement ("bah. Those are useless") or of time?
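For anyone following along, here is a minimal Python sketch (standard library only) of what the two global tests compute: the one-way ANOVA F statistic and the Kruskal-Wallis H statistic. The function names and the AUC values are made up for illustration. H is shown without the tie correction. Turning either statistic into a p-value still needs the F or chi-square distribution, which is left out here.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-group vs. within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_h(groups):
    """Kruskal-Wallis H statistic on pooled ranks (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the average of the ranks i+1 .. j.
        rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    weighted = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical AUC values from three learners, four validation runs each.
auc = [
    [0.81, 0.83, 0.80, 0.84],
    [0.86, 0.88, 0.85, 0.87],
    [0.79, 0.82, 0.78, 0.77],
]
print("F =", anova_f(auc))
print("H =", kruskal_h(auc))
```

Both numbers only answer whether the groups differ at all. Neither says which pair differs.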
My current choice would be the Tukey test. ANOVA is (from my current point of view) about as useful as a mathematical proof of existence.
many thanks in advance
greetings
Steffen
Answers
Tukey
-assumes normal distribution (since t-test is allowed for testing performance values like auc this should not be a problem)
-assumes that the samples have equal size (no problem)
-Tukey tells me where a difference is given (unlike ANOVA)
-Tukey is not that conservative (unlike the rank-based Steel/Dwass. Rank-based procedures may be more reliable, but I prefer less conservative tests)
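A rough sketch of the statistic behind the points above, in plain Python: for equally sized groups, Tukey's procedure compares every pair of means using the studentized range statistic q, built on the pooled within-group variance. The function name and the AUC values are hypothetical. The critical value that q has to exceed comes from the studentized range distribution (tables or a statistics library) and is not computed here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_q(groups):
    """Studentized range statistic q for every pair of equally sized groups."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    means = [mean(g) for g in groups]
    # Pooled within-group variance (mean squared error) with k*n - k degrees of freedom.
    mse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * n - k)
    se = sqrt(mse / n)
    return {(i, j): abs(means[i] - means[j]) / se
            for i, j in combinations(range(k), 2)}


# Hypothetical AUC values from three learners, four validation runs each.
auc = [
    [0.81, 0.83, 0.80, 0.84],
    [0.86, 0.88, 0.85, 0.87],
    [0.79, 0.82, 0.78, 0.77],
]
for pair, q in tukey_q(auc).items():
    print(pair, round(q, 3))
```

Each pair gets its own q, which is what makes it possible to say where the difference is, provided the normality and equal-size assumptions hold.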
greetings
Steffen
So, back to the questions: I am not too much of an expert on the details (hey, after all I am a data miner) but as far as I know you can ignore the test. At least this is what the statisticians I know usually do.
For all of those the reason why they are missing is simple: lack of time combined with the fact that no one asked for them yet. But that's exactly the point for all those significance tests: the results are only valid if the assumptions are correct. And for Tukey the assumptions are pretty similar to those for paired t-tests / ANOVA: if the data does not follow a normal distribution the results will simply not be valid at all.
But that's also true for the paired t-test, and I still cannot fully recommend it for all cases (quite apart from the assumptions).
Sorry, I cannot comment on that. Anyone else?
Cheers,
Ingo
Thank you, Ingo, for your assessment. I guess I have to restrict my efforts to finding the best test for my current problem (instead of global truths), or I will never finish the project...
I just want to add a remark: The problem is to find a test which is capable of multiple comparisons. Applying the paired t-test more than once is not valid because of the accumulation of the alpha error. So... ANOVA and Tukey are both capable, but while ANOVA just checks IF there is a difference between the means, Tukey tells me WHERE the difference is.
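To put a number on the alpha accumulation, here is a small Python sketch. It assumes, as a simplification, that the pairwise tests are independent (which repeated tests on the same validation runs are not exactly), so the result is an illustration rather than an exact error rate. The function name is made up.

```python
from math import comb


def familywise_error(alpha, k):
    """Number of pairwise tests among k models and the chance of at least
    one false rejection, assuming independent tests at level alpha."""
    m = comb(k, 2)
    return m, 1 - (1 - alpha) ** m


for k in (3, 5, 10):
    m, fwer = familywise_error(0.05, k)
    # alpha / m is the Bonferroni-corrected level per single test.
    print(f"{k} models: {m} tests, familywise error {fwer:.3f}, "
          f"Bonferroni level {0.05 / m:.4f}")
```

With five models there are already ten pairwise t-tests, and under this assumption the chance of at least one false positive is about 40% instead of 5%.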
Aside: Today I stumbled on a paper using the t-test for AUC, of course without an explanation. It is the first one I have seen doing this... I found no argument for it, but sometimes I wonder if the problem is on my side, when I am trying to be more correct than some data mining researchers out there >:( . It seems to me like a parent-child relationship: children are not allowed to do certain things the parents do because the children (students) are not able to estimate the consequences...
*grumble*
Steffen