# "Probability of mean being the same in first and second part of series?"

Assume a series of 1000 data points. The mean of the first 500 points is 2, with s.d. 1. What is the probability of finding a mean > 0 in the second 500 data points?

Extra information:

The first 500 points are approximately normally distributed.

The first 500 points are not independently distributed: for most points, point(t) is similar to point(t+1).

## Answers

Knowledge of the first 500 points in the series affects knowledge about the rest of the series.

Your question asks for the probability that the mean is > 0, and answering it requires several tools: Bayes' theorem, a truth table, and the characteristics of your normal distribution.

1. Truth table: the signs of the means of the two halves of your series can be: (+, -) ; (+, +) ; (-, -) ; (-, +)

However, the question restricts these outcomes to (+, -) ; (+, +) ; (-, +), since the second half has to be > 0.

Thus, the prior probability of each remaining outcome is 1/3.

2. There are conditional probabilities for polarity outcomes within the data based upon the evidence from your distribution:

P(+): the SD is 1, so values < 0 lie more than 2 SD below the mean of 2. Roughly 97.7% of a normal distribution lies above that point, rounded down here to 97%. Thus, P(+) = 0.97

P(-) = 1 - P(+) = 0.03

3. Now reason using the probabilities and the outcomes:

The probability that the second half of the data is positive = prior probability that the positive outcome occurs * probability of obtaining your series outcome

= prior(+, +) * P(+)

~ (0.33) * (0.97)

~ 0.3201, or about a 32% chance.
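The arithmetic in steps 2 and 3 can be checked with a minimal sketch. It assumes, as the question states, that individual points are roughly N(2, 1); the variable names are illustrative only, and the block reproduces the calculation above rather than validating the reasoning behind it.

```python
from statistics import NormalDist

# Assumption: individual points follow N(mean=2, sd=1), as in the question.
first_half = NormalDist(mu=2.0, sigma=1.0)

# Probability that a single point exceeds 0 (0 lies 2 SD below the mean).
p_pos_exact = 1.0 - first_half.cdf(0.0)

# The answer rounds this to 0.97 and multiplies by a prior of 0.33.
p_pos = 0.97
prior = 0.33
result = p_pos * prior

print(f"exact P(point > 0) = {p_pos_exact:.4f}")
print(f"answer's product   = {result:.4f}")
```

Note that `p_pos_exact` describes a single data point, not the mean of 500 correlated points.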

So, I believe this is your answer.

regards,

rk

Normalizing with a constant alpha: alpha * <(1/3) * (0.97^2), (2/3) * (0.97 * 0.03)> ~ <94.17%, 5.82%>
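The normalization in this comment can be sketched as follows, assuming alpha simply rescales the two weights so they sum to 1. The helper `normalize` is a hypothetical name introduced for illustration.

```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

p_pos, p_neg = 0.97, 0.03

# Unnormalized weights taken from the comment above.
unnormalized = [(1 / 3) * p_pos**2, (2 / 3) * p_pos * p_neg]
posterior = normalize(unnormalized)

print([f"{100 * p:.2f}%" for p in posterior])
```

The first entry comes out near 94.2% and the second near 5.8%, matching the figures quoted up to rounding.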

I believe the true answer is:

"Not enough information to give an answer"

Best regards,

Wessel

I think you should look at the trend of the 500 data points.

The mean and variance are not sufficient statistics to say something useful about this trend.

Also, I do not understand why you would want to restrict your values to + and -.

The data are real numbers, possibly generated by a random walk process.

http://en.wikipedia.org/wiki/Random_walk

If it were a random walk, only the last value of the first 500 points would say something meaningful.

The mean and variance would be useless statistics.
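A small simulation can illustrate the random walk point. This is only a sketch under stated assumptions: the series is taken to be a Gaussian random walk with unit steps (the question does not say so), and `random_walk` and `compare_predictors` are hypothetical helper names. It compares two predictors of the second-half mean: the first-half mean and the last value of the first half.

```python
import random

def random_walk(n, rng):
    """Return n points of a Gaussian random walk with unit-variance steps."""
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path

def mean(xs):
    return sum(xs) / len(xs)

def compare_predictors(trials=500, n=1000, seed=0):
    """Mean absolute error of two predictors of the second-half mean."""
    rng = random.Random(seed)
    half = n // 2
    err_mean = err_last = 0.0
    for _ in range(trials):
        path = random_walk(n, rng)
        first, second = path[:half], path[half:]
        target = mean(second)
        err_mean += abs(target - mean(first))  # predictor: first-half mean
        err_last += abs(target - first[-1])    # predictor: last observed value
    return err_mean / trials, err_last / trials

err_mean, err_last = compare_predictors()
print(f"MAE using first-half mean: {err_mean:.2f}")
print(f"MAE using last value:      {err_last:.2f}")
```

Under these assumptions the last value should give the smaller error, which is consistent with the argument that the first-half mean carries little information for such a process.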

Best regards,

Wessel