math Archives - Dear Sports Fan

How do people gamble on horse racing?

Dear Sports Fan,

How do people gamble on horse racing? Like most people, I’ll watch the Kentucky Derby or one of the other Triple Crown races if it’s on but I never understand the gambling talk. Can you help?

Thanks,
Kelly

Dear Kelly,

As with many sports, but perhaps even more so in horse racing, one of the primary attractions is gambling. There are lots of ways to bet on a horse race, so many in fact, that to the uninitiated it may seem like an impossible task. There are really only two key things that need to be deciphered to have a basic understanding of how to gamble on horse racing.

The first is how to understand odds. Each horse has odds expressed as a combination of two numbers that can be written as “40 to 1” or “40/1”. These numbers are simultaneously an expression of what people think is going to happen and how lucrative betting on that horse could be. The easiest way to think about this is by fitting the numbers into the sentence: If the race were run [sum of two numbers] times, you should expect this horse to win [second number] times. As you sub the numbers in, you can see why betting on a 40/1 horse (one that, if the race were run 41 times, should be expected to win only once) is called a long shot bet, or one that is unlikely to pay off. A bet on the favorite, this year a horse named American Pharoah who currently has 5/2 odds (if the race were run seven times, you should expect him to win twice), is more likely to win. That’s why the payouts also vary depending on the odds. A long shot bet on a 20 to 1 horse will typically return $21 for every dollar you bet (your dollar back plus $20 in winnings) while a 5/2 bet like the one you’d place on the favorite this year will typically return only $3.50 for every dollar you bet, or $7 on a $2 bet. There’s no need to memorize the payouts but if you want a cheat sheet, ABC News has a handy one here.
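If you'd rather let a computer do the subbing-in, here is a minimal sketch of the odds sentence and the payout arithmetic. The function names are mine, made up for illustration, and the math ignores the track's cut and any rounding, so treat the dollar figures as approximations rather than what a betting window would actually hand you.

```python
from fractions import Fraction

def implied_win_chance(a, b):
    """For odds of a/b: expect b wins in every (a + b) runnings of the race."""
    return Fraction(b, a + b)

def total_return(a, b, stake):
    """Your stake back plus winnings of a dollars for every b dollars bet."""
    return stake + stake * Fraction(a, b)

for a, b in [(40, 1), (20, 1), (5, 2)]:
    chance = implied_win_chance(a, b)
    print(f"{a}/{b}: wins {chance.numerator} in {chance.denominator}, "
          f"$1 returns ${float(total_return(a, b, 1)):.2f}")
```

Calling `total_return(5, 2, 2)` gives 7, which is the $7 you would get back on a $2 bet at 5/2.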

The second piece of gambling on horses to learn is that there are several different things that you can bet on. This is a little like the prop bets that are so popular around the Super Bowl. In horse racing, betting on which horse is going to win is just the start of things. There are also bets called Place or Show that give you a little flexibility in case your horse doesn’t win. Betting on a horse to place means you win if they come in first or second while show means you win if they come in first, second, or third. With each additional piece of flexibility, you stand to win less though. The other main vector of betting is in the other direction — betting on your ability to predict not just which horse will come in first but also which will come in second, third, fourth, or even fifth. As you add horses that must finish the race in a specific spot, your chances of winning go down and your potential payout goes up. The name for each bet also gets increasingly silly. Predicting the top two exactly is called an Exacta, three a Trifecta, four a Superfecta, and five a Super High-Five.

Unlike other sports, where it’s usually recommended not to split your rooting interests for the sake of gambling (watching a game in which you’ve bet money against your favorite team is a confusing and disheartening experience), at a horse race it’s often more fun to make multiple bets. If you take a liking to two or three horses, it can sometimes be better to bet different combinations of them in exactas or even trifectas than to bet them straight-up.

Now that you have a basic understanding of some of the key concepts and terms in gambling on horse racing, you can go off and lose (or win!) some money or you can test your knowledge. Keep your eyes on Dear Sports Fan for our upcoming annotated version of the classic horse racing gamblers’ song, Fugue for Tinhorns from the musical Guys and Dolls.

Thanks for reading,
Ezra Fischer

The 'this game is important' playoff series trick

It’s playoff time in the NBA and NHL, so if you walk into a sports bar or, you know, your living room, you’re likely to bump right into a great basketball or hockey game. The basketball and hockey playoffs follow virtually the same format. Each has four rounds and each round is a seven game series where two teams play each other for up to seven games. The first team to win four games wins the series. Once a team has won four games, the series is over (they don’t play seven games no matter what) and one team advances to the next round of the playoffs and the other team is eliminated. The games in a series are referred to by number: Game One, Game Two, etc. When you watch a playoff game on TV, you’ll almost invariably hear the announcers talk about a statistic that goes something like this:

Teams that win Game X win the series Y percent of the time.

This statistic bugs me because it’s misleading and a transparent ploy on the part of the television networks to retain viewers. Here’s why it’s misleading.

When we hear a percentage, we’re used to evaluating it as if either 0% or 50% is the baseline. If I hear that “people who eat apples at 2:03 p.m. get hit by cars within the next two hours 54% of the time” I’m going to assume the baseline is close to 0% and go out of my way to avoid apples at that time. If I hear that “teams that wear green win 49% of the time,” that sounds to me like the baseline is 50% and green is a slight disadvantage. The difference with this statistic is that the baseline is not 50%. Not even close! One win in a seven game series is a big deal! Teams only need to win four games to win the whole series. A victory in any game is a 25% contribution to the final goal. I don’t know exactly what the math is here (math friends, help!) but I’m going to say, since they’re 1/4 of the way to winning, let’s add 12.5% (1/4 of 50) to 50% and use that as the baseline. Just by winning a game (no matter what number game it is) a team has materially contributed to its own task of winning the series. Fine, you say, “but the statistics you hear are even higher than 62.5%.” Just wait, there’s more.
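For the math friends: here is one hedged way to pin the baseline down. The sketch assumes two perfectly even teams and treats every game as an independent coin flip, which real series are not (home court, injuries, and so on). The function name and its parameters are my own invention. Under those assumptions, a team that leads 1-0 needs three wins in the remaining six possible games, and that happens 42 times out of 64, or about 65.6% of the time.

```python
from math import comb

def series_win_probability(wins, losses, p=0.5, target=4):
    """Chance that a team with this record wins a first-to-`target` series,
    assuming each remaining game is an independent win with probability p."""
    need = target - wins          # wins this team still needs
    opp_need = target - losses    # wins the opponent still needs
    if need <= 0:
        return 1.0
    if opp_need <= 0:
        return 0.0
    # Pretend all remaining games get played; the team wins the series
    # exactly when it wins at least `need` of them.
    games_left = need + opp_need - 1
    return sum(comb(games_left, k) * p**k * (1 - p)**(games_left - k)
               for k in range(need, games_left + 1))

print(series_win_probability(0, 0))  # before the series starts
print(series_win_probability(1, 0))  # after winning Game One
```

So even with no difference in skill at all, the coin-flip baseline for "teams that win Game One" already sits a bit above the 62.5% guessed at above.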

The next tricky trick trick in this misleading statistic is a problem with how the data is selected. In my last post about misleading statistics, the one on runs in basketball, I described a trick about including too little data in a statistic. Here we have the opposite problem. Instead of excluding data, the clever (and dramatic) people who create these statistics include too much data. Almost every year, there are at least a few seven game series in the NHL and NBA playoffs that are mismatches. The playoffs are actually designed to create this. The way they work is that the best team in the regular season (the #1 seed) plays the worst qualifying playoff team (the #8 seed) in the first round. #2 plays #7, #3 plays #6, and #4 plays #5. Now, these are professional sports, so usually the difference between a #1 and an #8 is not as great as you might see in March Madness. Still, some #1 teams are just way, way better than the #8 team they face. Maybe the #8 wins one game but loses the series 4-1. Not infrequently, a superior team will actually win four straight games, which is called a sweep.

Sweeps are legitimate playoff series, but they’re not usually all that suspenseful. In a matchup between a clearly superior team and a clearly inferior team, use of one of these statistics would be silly because the number of the game is immaterial next to the fact that one team is better. In the NBA, the Cleveland Cavaliers just swept the Boston Celtics. The Cavaliers have the best basketball player in the world, LeBron James, and their second and third best players are almost unanimously thought of as better than anyone the Celtics have on their team right now. The Cavaliers are better. The big problem is that data from series like this gets lumped in with all the rest of the data. When you add it in, it’s going to inflate the correlation between winning Games One through Four and winning the series.

What the statistic is really trying to convince us of is that the specific number of the game is important — that this game is more important than the one before it or after it in the series. To do that, it uses too much data (including series between teams of very different skills) and also our own assumption about what the baseline of a percentage statistic should be. It’s possible that some number games do have more impact on the result of a series between two evenly matched teams than others and I’d be very interested in seeing a true analysis of that. Until then, ignore what any commentator tells you about the importance of a game. Unless, of course, that game is Game Seven, in which case, even I can tell you that the team that wins Game Seven wins the series 100% of the time.

Why runs in basketball are a lie

During virtually every basketball game you watch, men’s or women’s, college or professional, at some point a little graphic seems to float up onto the screen and an announcer will note its content to reinforce its message. “The UC-Irvine Anteaters are on a 9-2 run in the last three minutes and 26 seconds,” the announcer will say. What this means is that in the last X time Team A has scored Y points while Team B has scored Z points and Z is always significantly less than Y. This is supposed to be surprising and impressive. “Wow” the viewer is meant to think, “Team A is really beating up on Team B in a significant way. Scoring Y points and only allowing Z points must mean that Team A is way better than Team B.” This conclusion is certainly true sometimes but not nearly as often as you’re meant to think.

I have a book on my shelves called How to Lie with Statistics. It’s a classic and one of its lessons applies to this situation. A great way to lie with statistics, and one that must be used every time one of these run statistics pops up in a basketball game, is selection bias. Wikipedia defines it as:

Selection bias refers to the selection of individuals, groups or data for analysis such that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed.

In this case, the way that the selection is biased is in its starting point. Its end point is always the current moment of the basketball game. That’s an essential element of the con: “In the last X minutes…” The starting point is not random though, it’s carefully chosen. I guarantee that the second before the television station chooses to start the period, Team B (the one that seems to be losing terribly) scored a basket. Otherwise, why not extend the period further back? The longer it is, the more impressive it is.

If we assume that Team B scored right before the run started, then every time we see or hear about a run, we should add two (or three) points to Team B’s score. A 9-2 run becomes a 9-4 (or 5) run in our heads. A 7-0 run would more fairly be seen as a 7-2 (or 3) run. The reason why I say to add two or three points is the source of another form of trickery. Single points can be scored in basketball but by far the more common forms of scoring involve either two or three points being scored at once. That means a 9-2 run probably only involves four scores on the part of the team with 9 and one from the team with 2. (There are lots of other ways this could happen, but this is the most likely.) A 4-1 run seems less unlikely, and therefore less significant, than a 9-2 run. Basketball’s scoring system makes runs seem more crazy than they actually are.

The other piece of selection bias is this: the television station only points out a run when it happens. I know, that sounds utterly stupid, but it’s true. We don’t notice when the last 11 points have been split relatively evenly between two teams because no one points out that this has just happened.

I suspect that even if basketball were totally random — by which I mean that you could replace the basketball game in this scenario with someone flipping a coin a couple hundred times and marking down every Heads as a score for Team A and every Tails as a score for Team B — that you would expect to see runs worth noting by a commentator in almost every game. After all, a basketball game has around 140 possessions in college and around 190 in the NBA. If you think of it as 140 or 190 coin flips in a row, doesn’t it seem pretty likely that we’d see at least one run of four or five or six or even seven Heads with only one or two Tails mixed in?

I’m quite sure that there’s a mathematician out there who can help with the statistics in our coin flipping game. How likely are what types of runs in a game of 140-190 coin flips? If we can find that mathematician and pair her with a basketball statistics junkie who can find out what runs show up how often in real games, then we’ll be able to figure out whether the runs in basketball are actually notable or simply sleight of hand used by television producers to keep us glued to our seats. My money is on the magic trick.
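While we wait for that mathematician, a rough simulation can stand in for the coin-flipping half of the question. This is a sketch under loud assumptions: every flip counts as one score of equal value (real possessions often end with no points at all), and my stand-in for a "run worth noting" is any stretch of 9 consecutive scores in which one side gets at least 7. Both of those numbers, and the function names, are arbitrary choices of mine.

```python
import random

def has_run(flips, window=9, threshold=7):
    """True if any `window` consecutive scores include at least
    `threshold` for one side (1 = Team A scores, 0 = Team B scores)."""
    for start in range(len(flips) - window + 1):
        heads = sum(flips[start:start + window])
        if heads >= threshold or window - heads >= threshold:
            return True
    return False

def share_of_games_with_run(n_flips, trials=2000, seed=0):
    """Fraction of simulated coin-flip games containing at least one run."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    hits = 0
    for _ in range(trials):
        flips = [rng.randint(0, 1) for _ in range(n_flips)]
        hits += has_run(flips)
    return hits / trials

for n in (140, 190):
    print(n, share_of_games_with_run(n))
```

Under these assumptions the large majority of purely random games contain at least one such stretch, which is at least consistent with the magic trick theory. Comparing this against how often runs show up in real games is still a job for the statistics junkie.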

— — —

Note 1: I use this trick all the time on this blog. I know it’s deceptive, but it is how most sports fans think about games — “this is an important game for my team because they’ve lost six of the last seven (of 82 or 162) regular season games. They need to break the streak!” I even think about games that way when it’s my favorite team involved. Sports fandom is not always or even often rational.

Note 2: The simple way to fix this would be to think about scoring in terms of arbitrary splits: what has the score differential been in the last two minutes or four minutes? This gets rid of one form of selection bias, the starting point, but it would still be vulnerable to the other kind of selection bias, where commentators only note the split when it seems unusual.

March Madness mathematical musings

It’s March Madness time again, which means everyone is wandering around looking at print-outs or electronic versions of a bracket. The bracket shows a tournament with 64 teams divided into four groups of 16 each. Within each group of 16, the teams are ranked or seeded from 1 to 16. In the first round of the tournament, represented on the outside of the bracket, 1 plays 16, 2 plays 15, 3 plays 14, and so on until you reach the 8 vs. 9 game. Many of these pairs of numbers are instantly recognizable to most sports fans. We all know that a 16 has never beaten a 1, that 12 seeds seem to upset 5 seeds more frequently than one would expect, and that once you get to an 8 vs. 9 or a 7 vs. 10 game, the teams are so evenly matched that you can’t call it an upset when the 9 or 10 seed wins. It occurred to me yesterday (this is a pretty obvious realization, but cut me some slack, I did have a fever) that if you add the two seed numbers, every matchup in the first round adds up to 17.

Cool! Now I know lots of ways to add to 17. I wasn’t sure how this was going to help me in life but I kept thinking. 17… 17 is one more than 16. 16 is the number of teams in each quarter of the tournament. So, the seed numbers add up to one more than the number of teams left in each quarter of the bracket. Does that work for later rounds too? Well, let’s assume there are no upsets in the first round. Seeds 1-8 advance, seeds 9-16 lose. 1 plays 8, 2 plays 7, 3 plays 6, and 4 plays 5 in the next round. All of those numbers add up to nine, which is one more than eight. Eight is the number of teams left in that side of the bracket! If you keep going with this logic, again with no upsets, it keeps working for a while. The next round would have 1 playing 4 and 2 playing 3. 1 beats 4, 2 beats 3, and then 1 and 2 play for the right to represent this quarter of the overall tournament in the… Final Four! That’s when the four groups of 16 teams merge and become a single tournament. This is where the logic breaks down, because you would expect all four 1 seeds to make it, so that round’s sum would be two even though there are four teams left and the same would be true for the final game when there are only two teams left.
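Here is a small sketch of that pattern within one quarter of the bracket, assuming no upsets at all. The helper name `chalk_matchups` is made up for this example. In each round the best remaining seed plays the worst remaining seed, so every pair sums to the number of teams left plus one.

```python
def chalk_matchups(teams_left):
    """Pair the best remaining seed with the worst, assuming no upsets so far."""
    return [(seed, teams_left + 1 - seed)
            for seed in range(1, teams_left // 2 + 1)]

teams_left = 16
while teams_left >= 2:
    games = chalk_matchups(teams_left)
    # Every matchup in the round should sum to teams_left + 1.
    print(teams_left, "teams left:", games)
    teams_left //= 2
```

The loop stops at the regional final (1 vs. 2, which sums to 3) because, as noted above, the pattern no longer holds once the four regions merge.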

I might have lost you there for a minute (or maybe forever) but I’m about to bring it back to reality a little. We know that the favorites don’t always win during March Madness. Yesterday it seemed like the favorites were barely going to win at all! Already we’ve had 14 seeds beat 3 seeds, 11 beating 6, and 9 beating 8. This means that things won’t work so nicely in the second round. For example, instead of 3 seed Iowa State playing 6 seed SMU (adds up to 9) in the next round, we’re going to have 14 seed UAB playing 11 seed UCLA. 14 plus 11 is 25 not 9. The sum trick only works if the favorites always win.

Once I realized this, I was disappointed for a few minutes. Being disappointed because upsets ruin my little math trick is silly, of course. Upsets are what make March Madness so great. They’re what puts the Madness in March Madness. Then I had a (minuscule) Eureka moment. We can quantify exactly how “mad” each quarter of the bracket is by adding up the seed numbers of the teams that advance and subtracting the number we would have gotten if all the favorites had won. Call it the Madness Metric™. Using that same example of UAB and UCLA advancing instead of Iowa State and SMU, you would take their seeds, 14 and 11, add them to get 25 and then subtract 9 (the expected seed sum for the next round of the tournament) to get 16. 16 is pretty mad!
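For anyone who wants to keep score at home, here is a minimal sketch of the metric for a single matchup. The function name and the `teams_left` parameter are my own labels; the expected sum is taken to be the number of teams left in the region plus one, which only applies inside a region and not in the Final Four.

```python
def matchup_madness(seed_a, seed_b, teams_left):
    """Seed sum of the actual matchup minus the no-upset sum (teams_left + 1)."""
    return seed_a + seed_b - (teams_left + 1)

# UAB (14) vs. UCLA (11) with eight teams left in the region: 14 + 11 - 9
print(matchup_madness(14, 11, teams_left=8))

# The matchup we expected, Iowa State (3) vs. SMU (6): 3 + 6 - 9
print(matchup_madness(3, 6, teams_left=8))
```

Adding up the result for every matchup in a region gives that region's total for the round, and a region where every favorite wins scores zero.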

It’s not an advanced metric by any means, but it is a fun way to compare the regions (each quarter of the tournament is called a region because it’s played in one spot, not because the teams are from one place) to see which one is the maddest of them all! I’ll report back at the end of each round on this metric.