Wednesday, May 6, 2009

Streaks and stats

On Baseball Musings, the fine summary site he runs, David Pinto quotes another site on Milwaukee's current winning streak over Pittsburgh (now at 17):
Let’s assume, for a second, that the Brewers have a 60% chance of beating the Pirates in any given game. The chances that they win two in a row at that rate would be 0.6*0.6, which equals 0.36, or 36%. A three-game sweep? The odds of that are 21.4%. The odds of the Brewers winning 17 in a row? The chances of that are 0.17%. Even if you assume the Brewers have a 70% chance to beat the Bucs, the odds jump to 0.23%. If we bump the odds of a Brewers win in any given game to 80% (which is pretty high, even for the Pirates and Brewers), this sort of thing only happens 2.3% of the time.
I wrote the following comment:
That analysis is fine, as far as it goes, but it's more valid to look at the probability that any matchup over a period of time might bring about that result, in which case the chances are considerably higher.

We are looking, right now, at a result that's caught our eye simply because it's improbable. But there are 30 MLB teams, and 435 possible matchups. A back-of-the-envelope calculation with the 60% figure shows us that there's actually a 52.3% probability that one of the 435 matchups leads to a 17-game streak.
I wrote that this morning, and time precluded me from doing a better job. So let me unfold that a bit more, and correct a big mistake I made.

The original quote figures the probability of one team beating another 17 straight times from a single, fixed probability of winning, and it gets the arithmetic wrong. (We know the fixed probability is unrealistic, as the percentage should actually fluctuate quite a bit depending on starting pitchers, home field advantage, injuries, and what have you, but the overall unlikelihood is roughly correct.) It gets off to a rough start, in that the probability of winning three in a row is 21.6%, not 21.4%. The probability of 17 in a row is actually 0.6^17, or 0.017%, not 0.17%. (With 70%, it is 0.23%. Drat, that's where the first number should have seemed wrong to me; I told you I was in a hurry earlier.)
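For anyone who wants to check these figures, here is a minimal Python sketch. It assumes, as the quote does, that every game is independent and has the same win probability; the function name `streak_probability` is just an illustrative choice.

```python
def streak_probability(p_win: float, games: int) -> float:
    """Chance of winning `games` in a row, assuming independent games
    that all share the same win probability (a simplification)."""
    return p_win ** games


# Reproduce the per-game rates and streak lengths discussed in the quote.
for p_win in (0.6, 0.7, 0.8):
    for games in (2, 3, 17):
        pct = 100 * streak_probability(p_win, games)
        print(f"win rate {p_win:.0%}, {games:>2} straight: {pct:.3f}%")
```

Run as written, the 60% row for 17 games comes out near 0.017%, a factor of ten below the quoted 0.17%.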

As I stated in my original comment, there are 435 (30*29/2) possible matchups between two teams in a 30-team league, so, if you're looking for the probability that some two teams will feature a 17-game streak, you have to use that (unless for some reason you really are focusing just on the Milwaukee-Pittsburgh competition).

So, and here is where I revise my original comment, the probability that one of the 435 matchups will contain at least one 17-game winning streak will be (using 60% - I know that's wrong, I'll get back to it in a minute) 1 - (1 - 0.017%)^435, or 7.1%. At 70%, the number rises to 63.7%.
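The revised calculation can be sketched in a few lines of Python. This assumes all 435 matchups are independent and share one per-game win probability, which is the simplification used above; `prob_any_matchup_streak` is a hypothetical helper name.

```python
from math import comb

TEAMS = 30
MATCHUPS = comb(TEAMS, 2)  # 30*29/2 = 435 distinct pairings


def prob_any_matchup_streak(p_win: float, games: int = 17,
                            matchups: int = MATCHUPS) -> float:
    """Chance that at least one matchup shows a `games`-long streak,
    assuming independent matchups with the same per-game win rate."""
    single = p_win ** games               # one specific matchup
    return 1 - (1 - single) ** matchups   # complement of "none of them"


for p_win in (0.6, 0.7):
    print(f"{p_win:.0%} per game: {prob_any_matchup_streak(p_win):.1%}")
```

The complement form matters here: simply multiplying the single-matchup probability by 435 would overshoot badly at 70%, where the product is already above 100%.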

Now, it's obvious that not every one of those matchups can be at 60% or 70%; one would assume from symmetry that they should average out at 50%. While I could write a quick program to simulate the range of winning percentages, that's a bit more of a project than I would prefer to take on today. I'll just use 50% for all matchups, which puts a floor on the result. (I should explain: any deviation from the 50% figure creates a higher probability of such a streak by one side or the other. You can see that by considering the extreme case where one team has a 100% chance of winning each game against the other team, in which a streak of 17 games, or longer, is a certainty.)
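This is not the full simulation, just a quick check of the floor claim. The sketch below assumes the streak counts no matter which side of the matchup wins all 17, so the chance for one matchup is p^17 + (1 - p)^17; that two-sided reading and the name `either_side_streak` are assumptions made for illustration.

```python
def either_side_streak(p_win: float, games: int = 17) -> float:
    """Chance that one side or the other wins all `games` games,
    given one side's fixed per-game win probability."""
    return p_win ** games + (1 - p_win) ** games


# Scan win probabilities from 0% to 100% and find where the chance is lowest.
grid = [pct / 100 for pct in range(0, 101)]
lowest = min(grid, key=either_side_streak)
print(f"lowest streak chance at p = {lowest:.2f}")

for p_win in (0.5, 0.6, 0.7, 0.8, 1.0):
    print(f"p = {p_win:.1f}: {either_side_streak(p_win):.6%}")
```

Under that assumption the minimum sits at an even 50%, and the value climbs all the way to certainty at 100%.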

At 50%, the probability that one team can beat another 17 times in a row is very small, 0.00076%. Even with 435 matchups, one would expect to see a streak like this only 0.33% of the time, about 1 in 300. But that's the number at one point in time, and there are numerous opportunities to see a streak of this type.
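A short Python sketch of the 50% case follows, again assuming 435 independent matchups. It also compares the exact "at least one" figure with the simpler product 435 × 0.5^17; the variable names are illustrative only.

```python
TEAMS = 30
MATCHUPS = TEAMS * (TEAMS - 1) // 2   # 435 pairings
GAMES = 17

single = 0.5 ** GAMES                 # one team sweeping 17 at even odds
exact = 1 - (1 - single) ** MATCHUPS  # at least one of 435 matchups
approx = MATCHUPS * single            # linear shortcut, fine when tiny

print(f"single matchup: {single:.5%}")
print(f"any matchup (exact):  {exact:.2%}")
print(f"any matchup (approx): {approx:.2%}")
print(f"roughly 1 in {1 / exact:.0f}")
```

At probabilities this small the shortcut and the exact form agree to two decimal places, which is why the back-of-the-envelope version is good enough here.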

I won't essay the math on how often we should see such a streak over the whole of major league baseball; the noise threatens to overtake the real information after a while. I will offer a simplification to at least gauge a result: treat this problem as analogous to that of any 17-game winning streak by a team with a 50% chance in each game. In that case, a given run of 17 games turns out to be a streak 1 time in 131,072 (that's 2^17). But we have 30 teams, and the streak can start in any one of the 162 games in a season, so we might expect to see a streak like this every 131,072/(30 * 162) seasons, which is right around 27.
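The same rough estimate can be written out in Python. It copies the simplification above as is: every game by every team is treated as a possible starting point, and overlapping streaks and season boundaries are ignored. The function name and its defaults are assumptions for illustration.

```python
def seasons_between_streaks(games: int = 17, teams: int = 30,
                            games_per_season: int = 162,
                            p_win: float = 0.5) -> float:
    """Rough number of seasons between streaks of length `games`,
    counting each team-game as one independent chance to start one."""
    one_in = 1 / p_win ** games                  # 131,072 at 50% and 17 games
    starts_per_season = teams * games_per_season  # 4,860 possible starts
    return one_in / starts_per_season


print(f"about one streak every {seasons_between_streaks():.1f} seasons")
```

Changing `p_win` or `games` shows how sensitive the estimate is: the result grows or shrinks geometrically with either one.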

In this light, the fact that this is the longest such streak since 1970 would make us think that, just maybe, we should have expected this before now. It certainly doesn't seem that this quote from the original article, "When one team loses to another seventeen straight times, it's not just a case of good vs. bad. It's a case of good vs. bad with a sprinkle of bad luck and an incredible rash of improbability" is true at all.
