"Probability of mean being the same in first and second part of series?"

wessel · March 2011

Assume a series of 1000 data points. The mean of the first 500 points is 2, with s.d. 1. What is the probability of finding mean > 0, on the second 500 data points?
Extra information:
The first 500 points are approximately normally distributed.
The first 500 points are not independently distributed, for most points: point(t) is similar to point(t+1)

rakirk · April 2011

This is actually a more complex question than I thought upon initial inspection.

Knowledge of the first 500 points in the series affects what we know about the rest of the series.

Your question relates to the probability that the mean is > 0, and this requires several tools to answer: Bayes' theorem, a truth table and the characteristics of your normal distribution.

1. Truth table: the polarity of the mean of the outcomes of the two halves of your distribution can be: +,- ; +, + ; -, - ; -, +
However, the question restricts these outcomes to +, - ; +, + ; -, + since the second half has to be > 0
Thus, the prior odds of each series outcome are 1/3

2. There are conditional probabilities for polarity outcomes within the data based upon the evidence from your distribution:
P(+): the SD is 1, so a value < 0 would fall more than 2 SD below the mean (roughly 97% of the data would likely fall above 0 given the distribution you have listed). Thus, P(+) = .97

P(-) = 1 - P(+) = 0.03

3. Now reason using the probabilities and the outcomes:
The probability that the second half of the data is positive = prior odds that positive outcome will occur * probability of obtaining your series outcome
= P(+|+) = P(+) * p(+|+)
~ (.97)*(.33)
~ .3201 or about a 32% chance.
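
As a rough check of the arithmetic in steps 2 and 3, here is a minimal Python sketch. It assumes a single value drawn from a normal distribution with mean 2 and SD 1 (it does not model the mean of 500 serially dependent points), and the helper name `normal_sf` is made up for illustration.

```python
import math

def normal_sf(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd), via the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumption: one draw from Normal(2, 1), as in step 2 above.
p_pos = normal_sf(0.0, mean=2.0, sd=1.0)
p_neg = 1.0 - p_pos

# Step 3 as written in the post: P(+) times the 1/3 weight (rounded to .33).
step3 = 0.97 * 0.33

print(f"P(+) = {p_pos:.4f}, P(-) = {p_neg:.4f}")
print(f"0.97 * 0.33 = {step3:.4f}")
```

The exact tail probability comes out slightly above the rounded .97 used in the post, which does not change the product much.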

So, I believe this is your answer.

regards,

rk

rakirk · April 2011

There was an error in my previous post. I believe that I was correct in stating that the odds of the series outcome are 1/3 and the probability of the mean being positive is .97. However, I conditioned the results incorrectly. Through normalization we can see that the true answer is ~94%:

alpha<[(1/3)*(.97^2)],[(2/3)*(.97*.03)]> = <94.17%, 5.83%>
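
The normalization above can be reproduced with a short Python sketch. It only replays the arithmetic with the post's own numbers (weights 1/3 and 2/3, P(+) = .97, P(-) = .03); the function name `normalize` is an illustrative choice, and the sketch inherits the same conditional-independence assumption.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1 (the 'alpha' step)."""
    total = sum(weights)
    return [w / total for w in weights]

p_pos, p_neg = 0.97, 0.03  # values taken from the earlier post

# Unnormalized weights for "second half positive" vs "second half negative".
unnormalized = [(1 / 3) * p_pos ** 2, (2 / 3) * p_pos * p_neg]
posterior = normalize(unnormalized)

print([round(100 * p, 2) for p in posterior])  # percentages, roughly 94 and 6
```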

wessel · April 2011

Hey, I believe you can only calculate it like this if you assume that points are independently distributed.

I believe the true answer is:
"Not enough information to give an answer"

Best regards,

Wessel

rakirk · April 2011

This approach assumes they are conditionally independent...

wessel · April 2011

Then I do not understand what you are doing.

I think you should look at the trend of the 500 data points.
The mean and variance are not sufficient statistics to say something useful about this trend.

Also I do not understand why you would want to restrict your values to + and -.
The data generated are real numbers.
Possibly by a random walk process.
http://en.wikipedia.org/wiki/Random_walk

If it were a random walk, only the last of the 500 values would say something meaningful.
The mean and variance would be useless statistics.
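
To illustrate the random walk point, here is a small simulation sketch. Everything in it is an assumption for illustration: Gaussian steps with mean 0 and SD 1, walks starting at 0, and the helper names `pearson` and `simulate`. It checks that once the last value of the first half is subtracted, what remains of the second-half mean is essentially uncorrelated with the first-half mean.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n_walks=500, half=500, seed=0):
    """Simulate random walks; return first-half means, last first-half values, second-half means."""
    rng = random.Random(seed)
    first_means, lasts, second_means = [], [], []
    for _ in range(n_walks):
        x, walk = 0.0, []
        for _ in range(2 * half):
            x += rng.gauss(0.0, 1.0)  # assumed step distribution
            walk.append(x)
        first_means.append(sum(walk[:half]) / half)
        lasts.append(walk[half - 1])
        second_means.append(sum(walk[half:]) / half)
    return first_means, lasts, second_means

first_means, lasts, second_means = simulate()

# Part of the second-half mean not explained by the last observed value.
residuals = [s - l for s, l in zip(second_means, lasts)]

corr_last = pearson(lasts, second_means)
corr_resid = pearson(first_means, residuals)
print(f"corr(last value, second-half mean)     = {corr_last:.2f}")
print(f"corr(first-half mean, residual part)   = {corr_resid:.2f}")
```

Under these assumptions the first correlation should be strong and the second close to zero, which is consistent with the last value carrying the useful information.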

Best regards,

Wessel

Altair RapidMiner Community
