# "Probability of mean being the same in first and second part of series?"

Member Posts: 537 Maven
edited June 2019 in Help
Assume a series of 1000 data points. The mean of the first 500 points is 2, with s.d. 1. What is the probability that the mean of the second 500 data points is > 0?
Extra information:
The first 500 points are approximately normally distributed.
The first 500 points are not independently distributed: for most points, point(t) is similar to point(t+1).

Member Posts: 29 Contributor II
This is actually a more complex question than I thought upon initial inspection.

Knowledge of the first 500 points in the series affects knowledge about the rest of the series.

Your question concerns the probability that the mean is > 0, and answering it requires several tools: Bayes' theorem, a truth table, and the characteristics of your normal distribution.

1. Truth table: the polarity of the mean of the outcomes of the two halves of your distribution can be: +,- ; +, + ; -, - ; -, +
However, the question restricts these outcomes to +, - ; +, + ; -, + since the second half has to be > 0
Thus, the prior odds of the series are 1/3.

2. There are conditional probabilities for polarity outcomes within the data based upon the evidence from your distribution:
P(+): the SD is 1, so a value < 0 would fall more than 2 SD below the mean of 2. Under the normal distribution you describe, roughly 97% of the data would fall above 0 (about 97.7%, to be more precise). Thus, P(+) = .97

P(-) =  1 - P(+) = 0.03
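
As a quick check on the figures in step 2, here is a minimal Python sketch that computes the share of a normal distribution with mean 2 and SD 1 lying above 0. The helper name `normal_cdf` is only illustrative, and the sketch assumes the normal approximation from the question holds exactly. Note that it describes individual points, not the mean of 500 points.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution, via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Values taken from the question: first-half mean 2, standard deviation 1.
mean, sd = 2.0, 1.0

p_negative = normal_cdf(0.0, mean, sd)  # share of points below 0
p_positive = 1.0 - p_negative           # share of points above 0

print(f"P(+) = {p_positive:.4f}, P(-) = {p_negative:.4f}")
```

The result is slightly above the rounded .97 used in this post, which is why P(-) comes out a bit below .03.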

3. Now reason using the probabilities and the outcomes:
The probability that the second half of the data is positive = prior odds that a positive outcome will occur * probability of obtaining your series outcome
= P(+) * (1/3)
~ (.97)*(.33)
~ .3201, or about a 32% chance.

So, I believe this is your answer.

regards,

rk

Member Posts: 29 Contributor II
There was an error in my previous post. I believe that I was correct in stating that the odds of the series outcome are 1/3 and that the odds of the mean being positive are .97. However, I conditioned the results incorrectly. Through normalization we can see that the true answer is ~94%:

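alpha * <(1/3)*(.97^2), (2/3)*(.97*.03)> = <.9417, .0583>, i.e. about 94.17% and 5.83%, where alpha is the normalizing constant that makes the two entries sum to 1.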
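
The normalization above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic as stated in this post (weights 1/3 and 2/3, P(+) = .97, P(-) = .03); it does not check whether those weights are the right model for the series, and the function name `normalize` is just illustrative.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1 (the 'alpha' step)."""
    total = sum(weights)
    return [w / total for w in weights]

# Figures as given in the post.
p_pos, p_neg = 0.97, 0.03

unnormalized = [
    (1 / 3) * p_pos ** 2,     # weight for the "second half positive" case
    (2 / 3) * p_pos * p_neg,  # weight for the alternative case
]

posterior = normalize(unnormalized)
print([round(100 * p, 2) for p in posterior])
```

The first entry comes out at about 94.17%, and the two entries sum to 100% up to rounding.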
Member Posts: 537 Maven
Hey, I believe you can only calculate it like this if you assume that points are independently distributed.

I believe the true answer is:
"Not enough information to give an answer"

Best regards,

Wessel
Member Posts: 29 Contributor II
This approach assumes they are conditionally independent...
Member Posts: 537 Maven
Then I do not understand what you are doing.

I think you should look at the trend of the 500 data points.
The mean and variance are not sufficient statistics to say something useful about this trend.

Also, I do not understand why you would want to restrict your values to + and -.
The data are real numbers, possibly generated by a random walk process:
http://en.wikipedia.org/wiki/Random_walk

If it were a random walk, only the last of the 500 values would say something meaningful about the second half.
The mean and variance would be useless statistics.
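
To illustrate the random walk point, here is a small simulation sketch in Python. It assumes a Gaussian random walk with unit-variance steps, which is purely an assumption for illustration (the real data-generating process is unknown). The function names, trial count, and seed are arbitrary. It compares two guesses for the mean of the second 500 points: the mean of the first 500 points, and the last value of the first 500 points.

```python
import math
import random

def simulate_walk(n_steps, rng):
    """Gaussian random walk: each point is the previous point plus N(0, 1) noise."""
    position, path = 0.0, []
    for _ in range(n_steps):
        position += rng.gauss(0.0, 1.0)
        path.append(position)
    return path

def compare_predictors(n_trials=500, n_points=1000, seed=0):
    """Return RMSE of predicting the second-half mean from
    (a) the first-half mean and (b) the last first-half value."""
    rng = random.Random(seed)
    half = n_points // 2
    sq_err_mean = 0.0
    sq_err_last = 0.0
    for _ in range(n_trials):
        path = simulate_walk(n_points, rng)
        first_mean = sum(path[:half]) / half
        last_value = path[half - 1]
        second_mean = sum(path[half:]) / (n_points - half)
        sq_err_mean += (second_mean - first_mean) ** 2
        sq_err_last += (second_mean - last_value) ** 2
    return (math.sqrt(sq_err_mean / n_trials),
            math.sqrt(sq_err_last / n_trials))

rmse_mean, rmse_last = compare_predictors()
print(f"RMSE using first-half mean: {rmse_mean:.2f}")
print(f"RMSE using last value:      {rmse_last:.2f}")
```

Under this assumed process the last value gives the smaller error, but even that error stays large, so neither statistic pins down the second-half mean.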

Best regards,

Wessel