> Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this empirical probability of heads to be equal to the true probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample one million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4.
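
The experiment in the quote can be tried directly. Below is a rough simulation sketch, not the authors' code: the names `conditional_frequency` and `simulate` are made up for illustration, and it assumes that coins with no heads in the first three flips (so no flip follows a head) are simply left out of the average, which the quote does not spell out.

```python
import random

def conditional_frequency(flips):
    """Relative frequency of heads among flips that immediately follow a head.

    Returns None when no flip in the sequence follows a head.
    """
    followers = [flips[i + 1] for i in range(len(flips) - 1) if flips[i] == "H"]
    if not followers:
        return None
    return followers.count("H") / len(followers)

def simulate(num_coins=200_000, num_flips=4, seed=0):
    """Average the per-coin conditional frequency over many simulated coins."""
    rng = random.Random(seed)
    total = 0.0
    counted = 0
    for _ in range(num_coins):
        flips = [rng.choice("HT") for _ in range(num_flips)]
        freq = conditional_frequency(flips)
        if freq is not None:  # assumption: coins with no usable flips are skipped
            total += freq
            counted += 1
    return total / counted

print(simulate())
```

The key detail is that one frequency is computed per coin and those frequencies are then averaged, so every coin gets equal weight no matter how many of its flips followed a head.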
If I try to work this out, I write down the 16 possibilities for four coin flips:
TTTT, TTTH, TTHT, TTHH,
THTT, THTH, THHT, THHH,
HTTT, HTTH, HTHT, HTHH,
HHTT, HHTH, HHHT, HHHH
I count 24 instances where H occurs before the end of the sequence, 12 of which are followed by H and 12 of which are followed by T. So I get the expected 0.5 outcome. The authors do some other calculation, and I don't understand what they are thinking. Can someone explain?
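
The count above is easy to check by brute force. This is a small sketch of the pooled tally, where every H that occurs before the end of any of the 16 sequences goes into one shared bucket (the variable names are just illustrative):

```python
from itertools import product

heads_before_end = 0   # every H that has a flip after it, across all sequences
followed_by_heads = 0  # how many of those are followed by another H

for seq in product("HT", repeat=4):
    for current, following in zip(seq, seq[1:]):
        if current == "H":
            heads_before_end += 1
            if following == "H":
                followed_by_heads += 1

print(heads_before_end, followed_by_heads)   # 24 12
print(followed_by_heads / heads_before_end)  # 0.5
```

Pooling like this weights each flip equally, so a sequence such as HHHT contributes three data points while TTHT contributes only one.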
Here's one way to reproduce the 40% number they get in the paper. Take a sequence of four flips. Consider five cases:
1. 0 heads. Probability of head following a head = 0
2. 1 head. Probability of head following a head = 0
3. 2 heads. Probability of head following a head = 1/3
4. 3 heads. Probability of head following a head = 2/3
5. 4 heads. Probability of head following a head = 1
Now if those five cases were equally likely, then what would be the expected probability of a head following a head?
Answer: (0 + 0 + 1/3 + 2/3 + 1)/5 = 0.4
Is this what they assume gamblers are using for 'empirical probability'? I can't tell.
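
For comparison, here is a sketch of a different reading of the quoted passage: compute the relative frequency separately for each of the 16 sequences, drop the sequences where no flip follows a head (an assumption on my part), and take the plain average of what is left. Whether this is exactly the authors' computation is not something the quote confirms. The helper name `per_sequence_frequency` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def per_sequence_frequency(seq):
    """Fraction of heads among the flips that follow a head, or None if there are none."""
    followers = [b for a, b in zip(seq, seq[1:]) if a == "H"]
    if not followers:
        return None
    return Fraction(followers.count("H"), len(followers))

freqs = [per_sequence_frequency(s) for s in product("HT", repeat=4)]
usable = [f for f in freqs if f is not None]  # drops TTTT and TTTH
exact_average = sum(usable, Fraction(0)) / len(usable)
print(len(usable), exact_average)  # 14 17/42
print(float(exact_average))

# The five-case average from the comment above, for comparison.
five_case = (Fraction(0) + Fraction(0) + Fraction(1, 3) + Fraction(2, 3) + Fraction(1)) / 5
print(five_case)  # 2/5
```

Under this reading the average is 17/42, roughly 0.405, which rounds to the quoted 0.4. The five-case calculation gives exactly 2/5, so the two approaches agree only approximately.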
Their calculation seems to be something like
sumOverRunLengths[P(runLength)*(1 - runLength)].
I also wish I knew how to get HN to keep my newlines like you did.
For three flips I get this sample space:
TTT no data; TTH no data; THT one data point HT; THH one data point HH; HTT one data point HT; HTH one data point HT; HHT two data points HH, HT; HHH two data points HH, HH
Total data points by result: HT 4, HH 4.
For four flips it's the same deal: after an H I'm equally likely to get another H or a T. Of course.
What am I missing? I don't quite understand their definition of 'empirical probability.'
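
One possible answer, sketched in code: the tally above pools all data points across sequences, whereas the quoted passage talks about a relative frequency "for each coin" that is then averaged. The hypothetical `compare` function below computes both numbers for any sequence length, assuming sequences with no data points are excluded from the per-sequence average.

```python
from fractions import Fraction
from itertools import product

def compare(num_flips):
    """Return (pooled frequency, average of per-sequence frequencies)."""
    pooled_heads = 0
    pooled_total = 0
    per_sequence = []
    for seq in product("HT", repeat=num_flips):
        followers = [b for a, b in zip(seq, seq[1:]) if a == "H"]
        pooled_total += len(followers)
        pooled_heads += followers.count("H")
        if followers:  # assumption: sequences with no data points are skipped
            per_sequence.append(Fraction(followers.count("H"), len(followers)))
    pooled = Fraction(pooled_heads, pooled_total)
    averaged = sum(per_sequence, Fraction(0)) / len(per_sequence)
    return pooled, averaged

for n in (3, 4):
    print(n, *compare(n))
# 3 1/2 5/12
# 4 1/2 17/42
```

For three flips the pooled figure is 4/8 = 1/2, matching the tally, but the six sequences with data have frequencies 0, 1, 0, 0, 1/2, 1, which average to 5/12. The gap comes from weighting each sequence equally: HHT and HHH carry two data points each but count no more than THT with one.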