[SOLVED] Binomial test on examples whose attributes specify the parameters
tennenrishin
Member Posts: 177 Contributor II
Pardon me if this is a silly question.
Suppose I have an ExampleSet with two attributes, e.g.:
TC SC
005 3
010 7
150 83
etc...
...where TC is the (Bernoulli) "Trial Count" and SC is the "Success Count", and the probability of success at each trial is 0.5. I would like to perform the binomial test on each example. In other words, I would like to generate a new attribute SS (Statistical Significance) indicating (for each example) the probability that TC trials will result in at least SC successes. How should I approach this?
I can't see how I could construct the cumulative binomial distribution from the functions available in the Generate Attributes operator, except perhaps if TC is small, and using loops. I'm going to look into how much Hoeffding's inequality and Chernoff's inequality can help, but am I overlooking any simpler way of doing this? Perhaps some statistical tests already implemented in one of the RM extensions?
Thanks in advance.
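For reference, the quantity being asked for can be written out as below. This is only a restatement of the question in symbols, with X introduced here as a name for the number of successes; the last line is the standard one-sided Hoeffding bound, which is an upper bound on SS and not the exact value.

```latex
\[
\mathrm{SS} \;=\; P(X \ge \mathrm{SC})
\;=\; \sum_{k=\mathrm{SC}}^{\mathrm{TC}} \binom{\mathrm{TC}}{k}\, p^{k} (1-p)^{\mathrm{TC}-k},
\qquad X \sim \mathrm{Binomial}(\mathrm{TC},\, p)
\]
\[
p = \tfrac{1}{2} \;\Longrightarrow\;
\mathrm{SS} \;=\; 2^{-\mathrm{TC}} \sum_{k=\mathrm{SC}}^{\mathrm{TC}} \binom{\mathrm{TC}}{k}
\]
\[
\mathrm{SS} \;\le\; \exp\!\left( -\,\frac{2\,(\mathrm{SC} - \mathrm{TC}/2)^{2}}{\mathrm{TC}} \right)
\qquad \text{for } \mathrm{SC} \ge \mathrm{TC}/2
\]
```

For the second row (TC = 10, SC = 7) the exact value is (120 + 45 + 10 + 1) / 1024 = 176 / 1024 ≈ 0.172, while the Hoeffding bound gives exp(-0.8) ≈ 0.449, so the bound can be quite loose for small TC.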
Answers
It's very likely R will have something (I haven't checked). If so, it's not too hard to call an R script from an RM process.
Alternatively, does a Java library exist to do this calculation? If so, you could use a Groovy script.
regards
Andrew
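To make the calculation itself concrete, here is a minimal standard-library Python sketch of the exact upper tail for p = 0.5, with the Hoeffding bound mentioned in the question alongside for comparison. It is not a RapidMiner process and has not been tried inside one; the function names `binomial_upper_tail` and `hoeffding_upper_bound` are made up for this illustration, and the same logic would still need porting to a Groovy or R script as suggested above.

```python
from math import comb, exp


def binomial_upper_tail(trials: int, successes: int) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    # With p = 0.5 every outcome sequence has probability 2**-trials, so the
    # tail is the number of favourable sequences divided by 2**trials.
    # Integer arithmetic keeps the sum exact even when trials is large.
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2**trials


def hoeffding_upper_bound(trials: int, successes: int) -> float:
    """One-sided Hoeffding bound on P(X >= successes) for p = 0.5."""
    excess = successes - trials / 2
    if excess <= 0:
        return 1.0  # the bound is only informative above the mean
    return exp(-2 * excess**2 / trials)


if __name__ == "__main__":
    # (TC, SC) rows taken from the example table in the question.
    for tc, sc in [(5, 3), (10, 7), (150, 83)]:
        print(tc, sc, binomial_upper_tail(tc, sc), hoeffding_upper_bound(tc, sc))
```

For the first two rows the exact values are 16/32 = 0.5 and 176/1024 = 0.171875. In R, the same upper tail should be available directly as `pbinom(SC - 1, TC, 0.5, lower.tail = FALSE)`, since `pbinom` with `lower.tail = FALSE` returns P(X > q).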
I've never used R before. I'm willing to learn, but do I need to take on the full learning curve at this point, or is it possible to give me some pointers to get me up and running quickly on this particular application?
Best,
Isak
Here's an example of an R script being called from an RM process:
http://rapidminernotes.blogspot.co.uk/2011/06/counting-clusters-part-r.html
You'll have to do a bit of Googling to find the right R library for your specific requirement.
regards
Andrew