Inferential Statistics: R, Python or Extension
michaelgloven
RapidMiner Certified Analyst, Member Posts: 46 Guru
As a partner, I am looking to use RapidMiner to integrate related inferential statistical methods such as hypothesis testing, confidence intervals, chi-square tests, etc. as part of a client implementation. I see there is a pay-for extension that does this work, but given the simplicity of these methods and the unwanted burden of managing a paid subscription for only occasional use, is there a no-charge library of operators available, or do I need to just leverage R or Python and create my own? We only need a few methods for occasional use, and I'd like to know if there are other options besides R, Python or the pay-for extension. Thanks!
Best Answer

michaelgloven
RapidMiner Certified Analyst, Member Posts: 46 Guru
I normally calculate the z test statistic by taking the sample mean (or median) minus the null hypothesis value (what I'm testing), all divided by the standard error, assuming the constraints of the central limit theorem hold. For the SE I usually use the sample standard deviation divided by the square root of the sample size. I then compare this result with the critical z value (about 1.645 for a one-tailed test at a 5% significance level) to see whether I should reject the null hypothesis or fail to reject it. The math is quite simple; I was just looking for a simple operator to automate the work, given how important testing our data and results is to our particular use cases. I believe I can make all of this work with your suggestions above.
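A minimal sketch of that z test procedure in Python, using only the standard library (statistics.NormalDist supplies the normal quantiles and CDF). The function name and the `tail` parameter are hypothetical, chosen for illustration, and only the mean-based version is shown. Keep in mind that with small samples the sample SD makes the statistic t-distributed, so a t test is the usual choice there.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu0, alpha=0.05, tail="right"):
    """Large-sample z test: z = (mean - mu0) / (s / sqrt(n)).

    tail is "right", "left" or "two-sided" (names chosen for this sketch).
    Returns (z, z_critical, p_value, reject_null).
    """
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)   # standard error from the sample SD
    z = (mean(sample) - mu0) / se       # test statistic
    std_normal = NormalDist()           # standard normal N(0, 1)
    if tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)       # about 1.645 for alpha = 0.05
        p_value = 1 - std_normal.cdf(z)
        reject = z > z_crit
    elif tail == "left":
        z_crit = std_normal.inv_cdf(alpha)           # about -1.645 for alpha = 0.05
        p_value = std_normal.cdf(z)
        reject = z < z_crit
    elif tail == "two-sided":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject = abs(z) > z_crit
    else:
        raise ValueError("tail must be 'right', 'left' or 'two-sided'")
    return z, z_crit, p_value, reject


z, z_crit, p, reject = one_sample_z_test([2, 4, 4, 4, 5, 5, 7, 9], mu0=4)
print(f"z={z:.3f}  z_crit={z_crit:.3f}  p={p:.3f}  reject={reject}")
```

Returning the statistic, the critical value and the p-value together makes it easy to write all three into example set columns rather than just a yes/no decision.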
Answers
Dortmund, Germany
For each selected attribute, a Tukey Test confidence is calculated. This confidence is defined as the distance between the current value and the median, divided by the distance between the lower/upper 'Tukey Test boundary' and the median.
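A minimal Python sketch of the confidence described above. It is not the operator's source code: the 'Tukey Test boundaries' are assumed to be Tukey's fences Q1 - 1.5·IQR and Q3 + 1.5·IQR, quartiles come from statistics.quantiles with its default method, and the function name is hypothetical. Under these assumptions a score above 1 marks a value outside the fences.

```python
from statistics import median, quantiles


def tukey_confidence(values, k=1.5):
    """Hypothetical per-value 'Tukey Test' confidence.

    ASSUMPTIONS: boundaries are Tukey's fences Q1 - k*IQR and Q3 + k*IQR
    (k = 1.5 by convention) and quartiles use Python's default 'exclusive'
    method; the RapidMiner operator may define them differently.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    med = median(values)
    lower, upper = q1 - k * iqr, q3 + k * iqr
    scores = []
    for x in values:
        # distance to the median divided by the matching boundary's distance to it
        dist, span = (x - med, upper - med) if x >= med else (med - x, med - lower)
        if span > 0:
            scores.append(dist / span)
        else:  # degenerate data: the boundary coincides with the median
            scores.append(0.0 if dist == 0 else float("inf"))
    return scores


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
scores = tukey_confidence(data)
print([round(s, 2) for s in scores])
```

Only the value 100 scores above 1 here; because the median and IQR are barely moved by that single outlier, it stands out clearly.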
So instead of the mean and std_dev, we take the interquartile range and the median. The median is more robust to outliers than the mean, so I and many stats people prefer it.
Can you have a look at the Tukey Test? We could also write the same thing using the mean and std_dev if that's what you need.
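A mean/std_dev version of the same idea could look like the following hypothetical sketch; it is not an existing operator. The scoring rule |x - mean| / (k · std_dev) and the default k = 3 are assumptions chosen for illustration. Running it on data with one large outlier also illustrates the robustness point about the median.

```python
from statistics import mean, stdev


def mean_std_confidence(values, k=3.0):
    """Hypothetical mean/std_dev analogue of the Tukey confidence.

    Score = |x - mean| / (k * std_dev), so a score above 1 means x lies
    outside mean +/- k*std_dev. k = 3 is an assumed default.
    """
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [abs(x - mu) / (k * sd) for x in values]


values = list(range(1, 10)) + [100]
conf = mean_std_confidence(values)
print(round(conf[-1], 3))
```

Here the outlier 100 scores roughly 0.95, i.e. below 1, so it is not flagged: it inflates both the mean (14.5) and the standard deviation it is measured against.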
Cheers,
Martin
Dortmund, Germany
Thank you in advance
Dortmund, Germany
In the KS test, the KS statistic and the p-value will be returned, as Dr Martin mentioned above. What significance level do you usually use in practice?
KS test: http://haifengl.github.io/api/java/smile/stat/hypothesis/KSTest.html
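The Smile KSTest class linked above is Java; as a standard-library Python alternative, here is a minimal sketch of a one-sample KS test (function names are hypothetical, and this is not Smile's API). It returns the KS statistic D and an approximate p-value from the asymptotic Kolmogorov distribution with the sqrt(n) + 0.12 + 0.11/sqrt(n) correction used in Numerical Recipes. The p-value assumes the reference CDF is fully specified in advance; if its parameters are estimated from the same data, the p-value is conservative.

```python
import math
from statistics import NormalDist


def kolmogorov_sf(lam, terms=20):
    """P(K > lam) for the limiting Kolmogorov distribution."""
    if lam <= 0:
        return 1.0
    if lam < 1.18:
        # series that converges quickly for small lam: gives P(K <= lam)
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
            for j in range(1, terms + 1))
        return min(1.0, max(0.0, 1.0 - cdf))
    # alternating series that converges quickly for large lam
    sf = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                 for j in range(1, terms + 1))
    return min(1.0, max(0.0, sf))


def ks_one_sample(sample, cdf):
    """One-sample Kolmogorov-Smirnov test against a fully specified CDF.

    Returns (D, approximate p-value).
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)   # largest gap between ECDF and CDF
    sqrt_n = math.sqrt(n)
    return d, kolmogorov_sf((sqrt_n + 0.12 + 0.11 / sqrt_n) * d)


std_normal = NormalDist()
n = 50
near_normal = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
shifted = [x + 2 for x in near_normal]
print(ks_one_sample(near_normal, std_normal.cdf))
print(ks_one_sample(shifted, std_normal.cdf))
```

The first sample is built from evenly spaced normal quantiles, so it fits N(0, 1) almost perfectly; the shifted copy is rejected at any common significance level.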
Hope it helps.
YY
My problem is that I was trying to automate the steps of the t-test and F-test, and I need more than the p-value: I also need the t and F statistics and the critical region.
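The post does not say which variants are meant, so this minimal Python sketch assumes Student's two-sample t test with pooled (equal) variances and the variance-ratio F test; the function names are hypothetical. Both return the statistic together with its degrees of freedom, which are what the critical region depends on.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample Student t statistic assuming equal variances.

    Returns (t, degrees_of_freedom) with df = n_a + n_b - 2.
    """
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


def variance_ratio_f(a, b):
    """F statistic s_a^2 / s_b^2 with (n_a - 1, n_b - 1) degrees of freedom."""
    return variance(a) / variance(b), (len(a) - 1, len(b) - 1)


group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
print(pooled_t_statistic(group_a, group_b))
print(variance_ratio_f(group_a, group_b))
```

If the group variances cannot be assumed equal, Welch's t test (with Welch-Satterthwaite degrees of freedom) is the usual replacement for the pooled statistic.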
Is there any way to calculate columns using the F and t distributions, like in Excel?
Thank you!
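Python's standard library has no t or F distribution, so this sketch builds both from the regularized incomplete beta function (continued-fraction evaluation as in Numerical Recipes' betacf) and inverts the CDFs by bisection. All function names are hypothetical. t_ppf and f_ppf play the role of Excel's T.INV and F.INV (left-tail probability); two-sided and right-tail critical values follow as t_ppf(1 - alpha/2, df) and f_ppf(1 - alpha, d1, d2), matching T.INV.2T(alpha, df) and F.INV.RT(alpha, d1, d2). Where SciPy is available, scipy.stats.t and scipy.stats.f provide cdf and ppf directly.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even and odd steps of the continued fraction
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """P(T <= t) for Student's t with df degrees of freedom."""
    upper_tail = 0.5 * betainc_reg(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return 1.0 - upper_tail if t >= 0 else upper_tail


def f_cdf(f, d1, d2):
    """P(F <= f) for the F distribution with (d1, d2) degrees of freedom."""
    if f <= 0:
        return 0.0
    return betainc_reg(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def _invert(cdf, p, lo, hi):
    """Bisection for x with cdf(x) = p; cdf must be increasing from lo upward."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    while cdf(hi) < p:              # widen the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    for _ in range(200):            # fixed budget avoids float-resolution stalls
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_ppf(p, df):
    """Inverse of t_cdf (left-tail, like Excel's T.INV)."""
    if p < 0.5:
        return -t_ppf(1.0 - p, df)  # symmetry of the t distribution
    return _invert(lambda t: t_cdf(t, df), p, 0.0, 1.0)


def f_ppf(p, d1, d2):
    """Inverse of f_cdf (left-tail, like Excel's F.INV)."""
    return _invert(lambda f: f_cdf(f, d1, d2), p, 0.0, 1.0)


alpha = 0.05
t_stat, df = -2.0, 8                  # illustrative values, not from the thread
t_crit = t_ppf(1 - alpha / 2, df)     # two-sided critical value
p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))
print(f"t_crit={t_crit:.4f} reject={abs(t_stat) > t_crit} p={p_two_sided:.4f}")
print(f"F critical value (alpha=0.05, df 4 and 4): {f_ppf(1 - alpha, 4, 4):.4f}")
```

For the two-sided t test the critical region is |t| > t_crit; for a right-tailed F test of s_a^2 / s_b^2 it is F > f_ppf(1 - alpha, n_a - 1, n_b - 1). Applying these functions row by row gives the Excel-style computed columns.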