It’s playoff time in the NBA and NHL, so if you walk into a sports bar or, you know, your living room, you’re likely to bump right into a great basketball or hockey game. The basketball and hockey playoffs follow virtually the same format. Each has four rounds, and each round is a best-of-seven series in which two teams play each other up to seven times. The first team to win four games wins the series. Once a team has won four games, the series is over (they don’t keep playing just to reach seven games): the winner advances to the next round of the playoffs and the loser is eliminated. The games in a series are referred to by number: Game One, Game Two, etc. When you watch a playoff game on TV, you’ll almost invariably hear the announcers talk about a statistic that goes something like this:
Teams that win Game X win the series Y percent of the time.
This statistic bugs me because it’s misleading and a transparent ploy on the part of the television networks to retain viewers. Here’s why it’s misleading.
When we hear a percentage, we’re used to evaluating it as if either 0% or 50% is the baseline. If I hear that “people who eat apples at 2:03 p.m. get hit by cars within the next two hours 54% of the time,” I’m going to assume the baseline is close to 0% and go out of my way to avoid apples at that time. If I hear that “teams that wear green win 49% of the time,” that sounds to me like the baseline is 50% and green is a slight disadvantage. The difference with this statistic is that the baseline is not 50%. Not even close! One win in a seven-game series is a big deal! Teams only need to win four games to win the whole series, so a victory in any game is a 25% contribution to the final goal. I don’t know exactly what the math is here (math friends, help!), but since a team that wins a game is 1/4 of the way there, I’m going to add 12.5% (1/4 of 50%) to 50% and use 62.5% as the baseline. Just by winning a game (no matter what number game it is), a team has materially contributed to its own task of winning the series. “Fine,” you say, “but the statistics you hear are even higher than 62.5%.” Just wait, there’s more.
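For the math friends, here’s a minimal sketch of an exact baseline, under a simplifying assumption of my own (not something the announcers claim): the two teams are evenly matched and every game is an independent coin flip. The trick is to pretend all seven games always get played, which never changes who gets to four wins first. A team that has already won one game then needs at least three of the other six. Because every series lasts at least four games, this applies to the winner of any of Games One through Four. The function name is just an illustration.

```python
from fractions import Fraction
from math import comb

def series_win_prob_after_one_win(p=Fraction(1, 2)):
    """Chance a team that has already won one game goes on to win a
    best-of-seven series, assuming each game is independent and the team
    wins each one with probability p (1/2 = evenly matched teams)."""
    # Pretend all six remaining games are played: the team wins the series
    # exactly when it takes at least 3 of them (3 + the win already banked = 4).
    q = 1 - p
    return sum(comb(6, k) * p**k * q ** (6 - k) for k in range(3, 7))

baseline = series_win_prob_after_one_win()
print(baseline, float(baseline))
```

That comes out to 21/32, or about 65.6%, a little above my 62.5% guess. So for evenly matched teams, the “no matter what number game it is” baseline is even higher than I estimated.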
The next trick in this misleading statistic is a problem with how the data is selected. In my last post about misleading statistics, the one on runs in basketball, I described a trick about including too little data in a statistic. Here we have the opposite problem. Instead of excluding data, the clever (and dramatic) people who create these statistics include too much data. Almost every year, there are at least a few seven-game series in the NHL and NBA playoffs that are mismatches. The playoffs are actually designed to create this. The way they work is that the best team in the regular season (the #1 seed) plays the worst qualifying playoff team (the #8 seed) in the first round. #2 plays #7, #3 plays #6, and #4 plays #5. Now, these are professional sports, so usually the difference between a #1 and an #8 is not as great as you might see in March Madness. Still, some #1 teams are just way, way better than the #8 team they face. Maybe the #8 wins one game but loses the series 4-1. Not infrequently, a superior team will actually win four straight games, which is called a sweep.
Sweeps are legitimate playoff series, but they’re not usually all that suspenseful. In a matchup between a clearly superior team and a clearly inferior team, quoting one of these statistics would be silly, because the number of the game is immaterial next to the fact that one team is better. In the NBA, the Cleveland Cavaliers just swept the Boston Celtics. The Cavaliers have the best basketball player in the world, LeBron James, and their second- and third-best players are almost unanimously thought of as better than anyone the Celtics have on their team right now. The Cavaliers are better. The big problem is that this series gets lumped in with all the rest of the data. When you add it in, it’s going to inflate the correlation between winning any of Games One through Four and winning the series.
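To see how lopsided series inflate the number, here’s a rough sketch using the same independent-games assumption as before, except that the stronger team now wins each game with probability p. The value p = 3/4 for a mismatch and the eight-series “season” are made-up illustrations, not real NBA or NHL data, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def at_least_3_of_6(x):
    """Chance of winning at least 3 of 6 independent games, each won with probability x."""
    return sum(comb(6, k) * x**k * (1 - x) ** (6 - k) for k in range(3, 7))

def game_one_winner_takes_series(p):
    """Chance that whichever team wins Game One also wins the best-of-seven series,
    assuming the stronger team wins each independent game with probability p."""
    # Either the stronger team takes Game One (probability p) and then needs 3 of
    # the other 6, or the weaker team takes it (probability 1 - p) and needs 3 of 6.
    return p * at_least_3_of_6(p) + (1 - p) * at_least_3_of_6(1 - p)

even = game_one_winner_takes_series(Fraction(1, 2))      # evenly matched teams
mismatch = game_one_winner_takes_series(Fraction(3, 4))  # hypothetical lopsided series

# Hypothetical season: four even series lumped in with four lopsided ones.
pool = [Fraction(1, 2)] * 4 + [Fraction(3, 4)] * 4
pooled = sum(game_one_winner_takes_series(p) for p in pool) / len(pool)

print(f"even {float(even):.3f}, mismatch {float(mismatch):.3f}, pooled {float(pooled):.3f}")
```

Under these made-up numbers, the “teams that win Game One win the series” figure is about 65.6% for even matchups and about 76.4% for lopsided ones, so the pooled number lands around 71.0%, even though Game One is no more special than any other game.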
What the statistic is really trying to convince us of is that the specific number of the game is important: that this game is more important than the one before it or after it in the series. To do that, it uses too much data (including series between teams of very different skill levels) and also our own assumption about what the baseline of a percentage statistic should be. It’s possible that some game numbers do have more impact than others on the result of a series between two evenly matched teams, and I’d be very interested in seeing a true analysis of that. Until then, ignore what any commentator tells you about the importance of a game. Unless, of course, that game is Game Seven, in which case, even I can tell you that the team that wins Game Seven wins the series 100% of the time.