What does Wins Above Replacement or WAR mean in baseball?

Dear Sports Fan,

What does the stat Wins Above Replacement or WAR mean in baseball?

Thanks,
Rich

— — —

Dear Rich,

Wins Above Replacement is one of the many statistics that have either invaded or enhanced baseball over the past twenty years, depending on your perspective. It’s a single stat made up of many parts that tries to summarize the overall value of a player to the success of their team in a single number. The higher a player’s WAR number, the more they have contributed to their team’s success.

WAR is expressed as the number of wins a player’s team won thanks to their contributions as compared to what the team would have won (speculation warning) if they had been replaced by someone else. Baseball Prospectus, one of several entities that have their own way of calculating WAR, lists Tim Keefe’s 1883 season as the best ever, during which he contributed 20.2 wins to the New York Metropolitans. If, instead of Mr. Keefe, the Metropolitans had had to scour their farm system for another right-handed pitcher, this stat suggests they would have only won 34 games instead of the 54 they actually won.
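
For anyone who likes to see the arithmetic, here is a minimal sketch of that subtraction. The function name is made up for illustration, and it assumes the simplest possible reading of WAR: actual wins minus the player's WAR gives the estimated wins with a replacement.

```python
def wins_without_player(actual_wins, player_war):
    """Estimate team wins if the player were swapped for a replacement.

    Assumes the simplest reading of WAR: subtract it from actual wins.
    """
    return actual_wins - player_war


# Figures quoted above for Tim Keefe and the 1883 Metropolitans.
estimate = wins_without_player(54, 20.2)
print(round(estimate))  # rounds the estimate to the nearest whole win
```

The real calculations behind WAR are far more involved. This only shows how to read the final number.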

Here’s the thing about WAR: because it’s intended to be an all-encompassing statistic, it’s very complicated. It encompasses a lot of stuff! This is a screenshot from the Wikipedia article on WAR:

Woah! Don’t panic though. In order to understand WAR at a basic level, we need to understand two things — what elements are scored to show whether a player is doing well or poorly and what a replacement player is and how their potential contribution is defined.

What contributes to a player’s Wins Above Replacement?

Although different groups calculate WAR using different formulas, there are some general elements that go into figuring out a player’s contributions to their team. Offensive statistics having to do with hitting and base running are used. Defensively, an everyday player’s fielding is looked at whereas all sorts of pitching statistics are available for pitchers as well as the batting statistics of opposing teams. Overarching all of this is availability – in order to contribute, you’ve got to avoid injury.

What is a replacement player and how is their contribution defined?

Here’s where WAR gets all highfalutin and counterfactual. It’s all very well to measure a player’s contribution to their team’s victories but that’s just the first letter in a three letter statistic! The AR in WAR means “above replacement.” In other words, if Old Hoss Radbourn (the second highest single-season WAR player ever) had broken his ankle before the 1884 season, how would the Providence Grays have done? This question raises another question: who would he be replaced by?

The creators of the WAR statistic believe that there is a generally available and definable level of baseball talent that we can assume would replace any player out there. Given baseball’s well-established farm system (there are 19 minor league baseball leagues with 256 teams in the Major League Baseball ecosystem), that’s probably somewhat true. A replacement-level player is defined for most WAR calculations as one who is 80% as good as the average major league baseball player. One exception to that is the catcher position, which despite (or perhaps because of) being the coolest position in the sport, has fewer players who are decent at it. For that reason, replacement-level catchers are defined as 75% of the league average catcher.


So that’s WAR or Wins Above Replacement in baseball. I find it an interesting statistic because it contains within it the terrifying truth of many professional athletes’ lives: that they are eminently replaceable. It’s also sort of funny to think about extending the logic of the statistic to everyday life.

“How’s the fried calamari at this place?”
“It’s great — I’d say it’s about seven YAR”
“YAR?”
“Yums Above Replacement…”

Thanks for reading,
Ezra Fischer

What is a hockey assist?

Dear Sports Fan,

I was playing basketball the other day and one of my teammates complimented me on a “nice hockey assist.” I know that an assist is the pass right before someone makes a shot. What is a hockey assist?

Thanks,
Conrad


Dear Conrad,

A hockey assist refers to a pass that led to a pass that led to a goal or basket. It’s called a hockey assist because hockey is the only one of the major sports to credit players for it in basic official statistics. A hockey player who passes the puck to a teammate who scores is given an assist. A hockey player who passes the puck to a teammate who passes the puck to a teammate who scores is also credited with an assist. To distinguish the two types of assists, the first one is called a primary assist and the other is called a secondary assist. What the hockey world calls a secondary assist, the rest of the world calls a “hockey assist.”

Every sport has a historical group of simple statistics which defined how casual fans and even insiders thought about players for a long time. Examining the statistics can also tell us something about the culture of the sport. In hockey, one of those basic statistics was points, calculated by adding all of a player’s goals and assists. This is perhaps the simplest way to judge a player’s worth. In a player’s cumulative season or career point total, a secondary assist counts just as much as a primary one. From this, we can intuit that hockey values teamwork and spreads out credit for achievements more than most sports. This rings true considering some of hockey’s other traditions, like putting the name of every player from the championship team on the Stanley Cup, hockey’s ultimate trophy.
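
To make the points math concrete, here is a small sketch. The class name, field names, and numbers are all invented for illustration. Official stat sheets usually list a single assists column, so splitting assists into primary and secondary here is my own assumption to show that both count the same.

```python
from dataclasses import dataclass


@dataclass
class SkaterLine:
    """Hypothetical season line for one skater."""
    goals: int
    primary_assists: int
    secondary_assists: int

    @property
    def assists(self):
        # Both kinds of assist are credited equally.
        return self.primary_assists + self.secondary_assists

    @property
    def points(self):
        # Points are simply goals plus all assists.
        return self.goals + self.assists


# Made-up numbers, not a real player's season.
player = SkaterLine(goals=20, primary_assists=25, secondary_assists=15)
print(player.points)
```

Swapping a primary assist for a secondary one leaves the point total unchanged, which is exactly the feature the critics object to.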

The hockey assist is not without its critics. In fact, a quick Google search reveals people who call it a lie, call it pointless, and say it makes less sense than almost any other rule in sports. People need to chill out. The statistical revolution has come to every major sport and has completely transformed the way players are evaluated within teams. No team worth its salt is going to make player decisions on statistics as fundamental as assists or points. Furthermore, as people have become more savvy about looking for meaningful statistics in other sports, the hockey assist has received some serious consideration. Here’s a great blog post by Kevin Yeung for SB Nation’s Memphis Grizzlies blog, Grizzly Bear Blues, in which he explores the hockey assist in a basketball context. It’s worth a read if you’re interested in learning more about the value of your basketball hockey assist!

Thanks for reading,
Ezra Fischer

Why is 50% written .500 and said "five hundred" in sports?

Dear Sports Fan,

Here’s something I don’t get about sports. Why is 50% called “500” in sports? Is this some kind of metric system thing? Or is it for a purpose?

Thanks,
Emily


Dear Emily,

There are a variety of numbers in sports that are expressed as a number between zero and 1,000 when they more naturally might be thought of as a percentage. For example, a team that has won half its games and lost half its games is often said to be a “500 team” or playing “500 ball.” What this means is that they have won 50% of the games they’ve played. Likewise, a basketball player who scores on 38.9% of the three point shots she attempts may be said to be “shooting 389 from downtown.”

There’s nothing magical about these numbers. They’re just like percentages, a way of expressing the result of one number being divided by another. In the case of percentages, we take one number, divide it by another number, and then multiply the result by 100 and smack a percentage sign next to it. Thus one divided by two, which is .5, becomes 50%. The difference between a percentage and a sports number is that instead of multiplying by 100, sports numbers get multiplied by 1,000. At least when spoken out loud. A lot of times when you see a sports number like this written out, it will actually be written as “.500” even though no one would ever read it as “five tenths” instead of “five hundred” in a sports context.
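
Here is a rough sketch of the three ways of writing the same ratio. The function names are invented for illustration, and dropping the leading zero is just my assumption about how the written form is usually produced.

```python
def as_percentage(successes, attempts):
    """Everyday form: multiply the ratio by 100 and add a percent sign."""
    return f"{successes / attempts * 100:.1f}%"


def as_written_sports_number(successes, attempts):
    """Written sports form: three decimal places, leading zero dropped."""
    return f"{successes / attempts:.3f}".lstrip("0")


def as_spoken_sports_number(successes, attempts):
    """Spoken sports form: the ratio multiplied by 1,000."""
    return round(successes / attempts * 1000)


print(as_percentage(1, 2))             # the everyday percentage
print(as_written_sports_number(1, 2))  # how it appears in print
print(as_spoken_sports_number(1, 2))   # how a fan says it out loud
```

All three come from the same division. Only the multiplier and the punctuation change.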

If you always want to express a ratio to the thousandths place, it kind of makes sense to multiply by 1,000 instead of 100. It’s certainly easier to refer to something that happens three times every eight times it’s attempted as “three seventy five” than “point three seventy five” or even “thirty seven point five percent.” It’s not surprising that sports people feel that their numbers need this level of accuracy. People who live in and around sports often seem to be obsessed with accuracy to the point of over-precision. For example, players eligible to be picked in the NBA draft tonight have their height measured down to the quarter inch with and without shoes! Commentators in many sports will often argue about whether it looks like something a player did took .25 seconds or .28 seconds. As if the commentators can actually judge a hundredth of a second difference from their perch at the top of the stadium! Because the difference between winning and losing in sports can sometimes be as slim as a fraction of an inch or a hundredth of a second, it’s tempting to believe that all metrics in sports need to have that level of detail.

In defense of the sports way of expressing percentages, its historic source is a number that reasonably should be expressed to the tenth of a percent: batting average in baseball. Batting average is a player or team’s number of hits divided by their number of at bats. It’s not the best metric in baseball, in fact it’s a reasonably misleading one, but it does have a very long history. For many years, it was considered a key statistic in measuring how well a player was doing against his current competition and for making historical comparisons. Baseball is obsessed with statistics because it creates them so nicely. Its long, 162 game season virtually guarantees that any measure one can imagine will have a statistically significant sample over the course of a year. With 30 teams and well over a dozen position players on each team, not even to mention the 120+ year history of professional baseball, if you really want to know how a player’s batting average compares to his peers, you do need to take that number to the third decimal place. It would be far less interesting to say that Manny Machado and Adrian Gonzalez have the same batting average of 30% than to say that Machado is 18th in the league, with a .304 (pronounced “three oh four”) batting average and Gonzalez is 30th with a .296. Eight tenths of a percent may not seem like much, but over the course of a season (162 games times roughly 3.5 at bats per game) that’s a difference of four or five hits.
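
The back-of-the-envelope math behind that last sentence can be sketched like this. The 3.5 at bats per game is the rough figure used above, not an official number, and the function name is made up.

```python
GAMES_PER_SEASON = 162
AT_BATS_PER_GAME = 3.5  # rough assumption from the text


def expected_hits(batting_average):
    """Hits over a full season for a player who bats in every game."""
    return batting_average * GAMES_PER_SEASON * AT_BATS_PER_GAME


gap = expected_hits(0.304) - expected_hits(0.296)
print(f"{gap:.1f} hits")  # the season-long gap between .304 and .296
```

A gap of eight points of batting average works out to somewhere between four and five hits, which is why the third decimal place earns its keep here.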

Of course, the source of the habit doesn’t matter so much if it is misapplied. Team records are the clearest form of misapplication. The problem with using this kind of number to express a team’s record is that aside from the most obvious numbers like .333, .667, and any of .000, .100, .200, and so on, these figures are very hard for us to translate into numbers in our heads. Quick, tell me how many wins and losses a team whose record is .527 has. According to the current MLB standings, the answer is 39 wins and 35 losses, like the Toronto Blue Jays have. Although these numbers are convenient for creating a standings table (because they allow an easy comparison of teams who have played different numbers of games) they probably should not be displayed. In terms of figuring out how well your team is doing, the order of the teams in the standings and the games back metric are far, far better.
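
A quick sketch shows why going from the number back to the record is so hard. The function names are invented, and every record other than 39 and 35 is made up to illustrate the point.

```python
def winning_percentage(wins, losses):
    """Wins divided by games played (ties ignored for simplicity)."""
    return wins / (wins + losses)


def standings_format(wins, losses):
    """Three decimal places with the leading zero dropped."""
    return f"{winning_percentage(wins, losses):.3f}".lstrip("0")


# The first record is the one quoted above. The others are hypothetical.
for wins, losses in [(39, 35), (29, 26), (78, 70)]:
    print(wins, losses, standings_format(wins, losses))
```

All three records display as the same three digits, so the number alone can't tell you how many games a team has actually won.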

Regardless of how reasonable or unreasonable the sports percentage expression is, it’s deeply engrained in sports culture and seems to be here to stay. It’s easy to wonder though, if this small form of numerical manipulation makes it easier for sports people to mangle numbers in much sillier ways, like the habit of asking players to “give 110%.” That’s a story for another day.

Thanks for your question,
Ezra

Why isn't a shot that hits the post a shot in hockey?

Dear Sports Fan,

Over the past three years, I’ve become an ice hockey fan but there’s one thing that still really annoys me. Hockey fans and commentators often talk about “shots” as a meaningful statistic but it seems totally meaningless to me. Apparently a shot that hits the post doesn’t count as a shot, just the same as a shot that goes twenty feet wide. That distinction should mean something! What does the shot statistic mean and why should I care about it?

Thanks,
Sonja


Dear Sonja,

Sports are full of statistics. From the outside looking in, it might seem like sports fans are just obsessed with statistics for no reason. That’s probably true for some sports fans but the purpose of a stat, the reason why it exists, is to represent some aspect of the game numerically so that it’s easier to know how well a team or player is doing. Stats are supposed to help the viewer understand what’s going on in a given game and to compare the performance of their favorite teams and players not only against their opponents but also against their own past performances. The sports world is in the midst of a thirty year statistical revolution during which many of the older statistics have been torn down and either replaced by new ones or simply discredited. Shots are one of ice hockey’s oldest statistics. Why don’t we examine what the shots statistic is, what it’s trying to tell us, and what some potential replacements could be?

The full name of the statistic which is commonly referred to as “shots” is “shots on goal.” In some ways, this helps explain what the statistic means and in other ways… well, in other ways, it probably serves only to further the confusion. “Shots” sounds like it should include any time a player winds up and shoots the puck, intending to score a goal, even if her shot is blocked or goes three feet wide. When you use the full name of the statistic, it becomes more understandable why shots that are blocked or miss the goal aren’t counted. That’s good — a stat’s name should reflect what it actually is. What you point out about hitting the posts or the crossbar is true though. Those shots are not counted in the shots on goal statistic even though they may feel like they should.

My totally unfounded guess about how this came about is that goalie statistics are a little bit older than skater statistics. Perhaps the shots statistic was created in response to an older goalie statistic. Saves (the number of times a goalie catches or deflects the puck away) makes sense. Want to know how active a goalie has been during a game? How many saves did he make? Shots seems like the reverse of saves plus the number of goals a team scores. Every time the goalie makes a save the opposing team registers a shot. Every time the goalie doesn’t make a save and a goal results, the other team registers a shot. Combining metrics like these would make the life of an early statistics keeper much easier. A shot that hit the post and didn’t go in is clearly not a save, so it didn’t get counted as a shot either.
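
If that guess is right, the bookkeeping would look something like this sketch. The function name and the numbers are made up for illustration.

```python
def shots_on_goal(opposing_goalie_saves, goals_scored):
    """Every shot on goal ends as either a save or a goal."""
    return opposing_goalie_saves + goals_scored


# Hypothetical game: the other goalie made 28 saves and let in 3 goals.
print(shots_on_goal(opposing_goalie_saves=28, goals_scored=3))
```

A puck off the post adds to neither number, so it never shows up in the total.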

The problem with the shots on goal statistic, which I think you are getting at by objecting to the way shots that hit the post are treated, is that it doesn’t do a very good job at telling us anything meaningful about the game. At first glance, it seems like it’s trying to show how well a team or player is doing on offense. Alas, it doesn’t distinguish between a puck that hit the crossbar and one that missed by six feet, even if those two acts are very different from a successful-offense perspective. It counts a harmless, non-threatening long-distance wrist shot but it doesn’t count a puck that nearly goes in before being blocked by a desperate defender. If a team wanted to inflate their shots statistic, they would just wildly throw the puck at the net every time they got near the offensive zone. That’s not a good offensive strategy for winning, so it seems like an offensive statistic shouldn’t encourage it.

Before we get to ideas for replacing this statistic, it’s worth mentioning that in real life, over a large sample size, which the 82 game regular season in the NHL is, shots is not a terrible statistic. Oh sure, in any given game it could be problematic for the reasons we just mentioned, but over time the better offensive players and teams do tend to generate more shots. This past year, the team with the most shots per game during the regular season was the Chicago Blackhawks, now playing in the Stanley Cup Finals, and the player with the most shots was Alexander Ovechkin, who also had by far the most goals. Shots don’t have to be a perfect statistic to be useful in part because no reasonable player or team actually modifies their behavior based on the shots statistic. It’s not perfect but I am still happier when the team I’m rooting for has more shots than the other team does.

One of the reasons players and teams don’t optimize for shots is because they probably don’t even use that statistic anymore. Although it’s still a mainstay of television production and newspaper columns, almost every team has its own group of statisticians who work for it. These folks create and keep much more meaningful proprietary statistics that they hope will give their team an edge over the competition. I have no idea what their statistics are but here are some other stats that could replace or augment the shots statistic. In addition to shots on goal, you’ll sometimes see a “shots attempted” statistic. This counts any shot that misses or is blocked as well as ones that count as shots. That’s good because it’s basically not subjective and it’s process driven instead of outcome driven. A team that has the puck more and is playing better offensively will generate more shots, even if the majority of them miss or get blocked. Another stat that I like is “scoring chances.” This one is totally subjective. It counts any time a team looks like it legitimately might score, even if that moment doesn’t result in a shot. Virtually every time the puck hits the post, it would count as a scoring chance because if it had been an inch to the right or left, you’d have had a goal. Sometimes a scoring chance could happen without even an attempted shot. If a player is wide open in front of the net and whiffs on a pass to her and never makes contact with the puck, it’s still a glorious and missed scoring chance. The problem with scoring chances is that what you or I might think of as a legitimate chance, someone else who has more confidence in the goalie might consider a routine save and not count.
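
Here is a small sketch of how the same sequence of events yields different totals for shots on goal and shots attempted. The event labels and the sample game are invented for illustration, and I am assuming a shot off the post is treated as a miss.

```python
from collections import Counter

# Hypothetical event labels. Only goals and saves count as shots on goal.
ON_GOAL = {"goal", "save"}
# Shots attempted also include misses, posts, and blocked shots.
ATTEMPTED = ON_GOAL | {"miss", "post", "blocked"}


def tally(events):
    """Count shots on goal and shots attempted from a list of event labels."""
    counts = Counter(events)
    return {
        "shots_on_goal": sum(counts[label] for label in ON_GOAL),
        "shots_attempted": sum(counts[label] for label in ATTEMPTED),
    }


# Made-up sequence from one period of play.
events = ["save", "post", "miss", "goal", "blocked", "save"]
print(tally(events))
```

Scoring chances are left out on purpose. They depend on someone's judgment, so there is no simple rule to code up.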

Statistics create a representational model of the sports they seek to quantify. Like drawing a stick figure, a statistic doesn’t need to be perfect, or even good, to be helpful. The shots on goal statistic isn’t a very good one, but when combined with others, it can give a general sense of how a game is going.

Thanks for reading,
Ezra Fischer

The 'this game is important' playoff series trick

It’s playoff time in the NBA and NHL, so if you walk into a sports bar or, you know, your living room, you’re likely to bump right into a great basketball or hockey game. The basketball and hockey playoffs follow virtually the same format. Each has four rounds and each round is a seven game series where two teams play each other for up to seven games. The first team to win four games wins the series. Once a team has won four games, the series is over (they don’t play seven games no matter what) and one team advances to the next round of the playoffs and the other team is eliminated. The games in a series are referred to by number: Game One, Game Two, etc. When you watch a playoff game on TV, you’ll almost invariably hear the announcers talk about a statistic that goes something like this:

Teams that win Game X win the series Y percent of the time.

This statistic bugs me because it’s misleading and a transparent ploy on the part of the television networks to retain viewers. Here’s why it’s misleading.

When we hear a percentage, we’re used to evaluating it as if either 0% or 50% is the baseline. If I hear that “people who eat apples at 2:03 p.m. get hit by cars within the next two hours 54% of the time” I’m going to assume the baseline is close to 0% and go out of my way to avoid apples at that time. If I hear that “teams that wear green win 49% of the time,” that sounds to me like the baseline is 50% and green is a slight disadvantage. The difference with this statistic is that the baseline is not 50%. Not even close! One win in a seven game series is a big deal! Teams only need to win four games to win the whole series. A victory in any game is a 25% contribution to the final goal. I don’t know exactly what the math is here (math friends, help!) but I’m going to say, since they’re 1/4 of the way to winning, let’s add 12.5% (1/4 of 50) to 50% and use that as the baseline. Just by winning a game (no matter what number game it is) a team has materially contributed to its own task of winning the series. Fine, you say, “but the statistics you hear are even higher than 62.5%.” Just wait, there’s more.
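
For the math friends, here is one way the baseline could be worked out. It is a sketch under two big assumptions that are mine, not the networks': the teams are perfectly evenly matched, and every game is an independent coin flip. The function name is made up. It lists every possible run of seven results, throws out the ones where the series would already be over before Game X, and counts how often the Game X winner also wins the series.

```python
from fractions import Fraction
from itertools import product


def game_winner_takes_series(game_number, games=7, needed=4):
    """Chance the winner of a given game wins the series, given it is played.

    Assumes evenly matched teams, so every sequence of results is equally
    likely. Imagining all seven games being played does not change which
    team reaches four wins first.
    """
    reached = 0
    favorable = 0
    for outcome in product("AB", repeat=games):
        earlier = outcome[:game_number - 1]
        if earlier.count("A") >= needed or earlier.count("B") >= needed:
            continue  # series was already over before this game
        reached += 1
        series_winner = "A" if outcome.count("A") >= needed else "B"
        if outcome[game_number - 1] == series_winner:
            favorable += 1
    return Fraction(favorable, reached)


for game in range(1, 8):
    chance = game_winner_takes_series(game)
    print(f"Game {game}: {float(chance):.1%}")
```

Under those assumptions the Game One winner takes the series 21 times out of 32, or about 65.6% of the time. That is a little higher than my 62.5% guess, and it comes purely from the head start.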

The next tricky trick trick in this misleading statistic is a problem with how the data is selected. In my last post about misleading statistics, the one on runs in basketball, I described a trick about including too little data in a statistic. Here we have the opposite problem. Instead of excluding data, the clever (and dramatic) people who create these statistics include too much data. Almost every year, there are at least a few seven game series in the NHL and NBA playoffs that are mismatches. The playoffs are actually designed to create this. The way they work is that the best team in the regular season (the #1 seed) plays the worst qualifying playoff team (the #8 seed) in the first round. #2 plays #7, #3 plays #6, and #4 plays #5. Now, these are professional sports, so usually the difference between a #1 and an #8 is not as great as you might see in March Madness. Still, some #1 teams are just way, way better than the #8 team they face. Maybe the #8 wins one game but loses the series 4-1. Not infrequently, a superior team will actually win four straight games, which is called a sweep.

Sweeps are legitimate playoff series, but they’re not usually all that suspenseful. In a matchup between a clearly superior team and a clearly inferior team, use of one of these statistics would be silly because the number of the game is immaterial next to the fact that one team is better. In the NBA, the Cleveland Cavaliers just swept the Boston Celtics. The Cavaliers have the best basketball player in the world, LeBron James, and their second and third best players are almost unanimously thought of as better than anyone the Celtics have on their team right now. The Cavaliers are better. The big problem with this is that the data gets lumped in with all the rest of the data. When you add their data in, it’s going to inflate the correlation between winning Games One through Four with winning the series.
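
To get a feel for how much a mismatch inflates the number, here is a sketch that assumes every game is independent and the better team wins each one with a fixed probability. The 70% figure is invented for illustration and the function names are made up.

```python
from math import comb


def at_least(k, n, p):
    """Chance of at least k wins in n independent games won with chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def game_one_winner_takes_series(p_favorite):
    """Chance that whoever wins Game One goes on to win a best-of-seven.

    After Game One the winner needs three more wins, which is the same as
    winning at least three of six hypothetical remaining games.
    """
    p_underdog = 1 - p_favorite
    favorite_path = p_favorite * at_least(3, 6, p_favorite)
    underdog_path = p_underdog * at_least(3, 6, p_underdog)
    return favorite_path + underdog_path


print(f"Even matchup: {game_one_winner_takes_series(0.5):.1%}")
print(f"Mismatch:     {game_one_winner_takes_series(0.7):.1%}")
```

The mismatch produces a noticeably higher figure than the even matchup, even though Game One itself is no more important in either series. The better team simply tends to win both Game One and the series.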

What the statistic is really trying to convince us of is that the specific number of the game is important — that this game is more important than the one before it or after it in the series. To do that, it uses too much data (including series between teams of very different skills) and also our own assumption about what the baseline of a percentage statistic should be. It’s possible that some number games do have more impact on the result of a series between two evenly matched teams than others and I’d be very interested in seeing a true analysis of that. Until then, ignore what any commentator tells you about the importance of a game. Unless, of course, that game is Game Seven, in which case, even I can tell you that the team that wins Game Seven wins the series 100% of the time.