Finding Peace In WAR

So, I was just out at the Sloan Sports Analytics Conference in Boston … and it was awesome, as you might expect. I’ll have more to say about the overall conference in the next couple of days, but there were a few baseball tidbits that I thought some of you might find interesting, and I will try to post them throughout the day.

OK, so it’s no secret that I’m a big fan of the statistical concept of Wins Above Replacement (WAR). I like the idea of trying to find a number that can give us a rough idea of how much a player contributes to his team. I like the idea of gathering a player’s production as a hitter, base runner and fielder and trying to estimate how many extra wins he adds (or takes away). I like the idea of trying to isolate a pitcher’s contribution to a team. It’s an imperfect but utterly fascinating statistic now and it will get better over time. I love baseball more because of WAR.

There are several versions of WAR out there, most prominently at Baseball Reference and at Fangraphs, and this has been an issue for many people, even those open to new statistics. They will say, “How can WAR be a good statistic if people figure it differently? How can I put any stock in a statistic if the Fangraphs version of it has Chase Headley as being 7.5 wins above replacement while Baseball Reference has him six wins above replacement?”

There are a few different answers to this. The main one, explained well by Sam Miller in a good piece at ESPN the Magazine, is that baseball is a complicated thing … and complicated things do not lend themselves to clean and easy answers. Fangraphs WAR is figured differently from Baseball Reference WAR, especially for pitching and defense, because people at Fangraphs and Baseball Reference have different beliefs about the best way to measure players. They like their individual methods. And it so happens that their individual methods will sometimes come to surprisingly divergent conclusions (though, more often than not, the two are in basic agreement).

I think one thing about embracing advanced statistics is that you just have to be open to a little bit of chaos. Sure, batting average will give you the same answer every time. But batting average isn’t a very telling statistic.

I will say, though, that I do think it would be better for everyone if the B-Ref and Fangraphs versions came somewhat closer together. I don’t mean they should change the way they measure players. I’m just saying that their methods are more similar than different, and it would be great for them to find common ground where common ground can be found.

Well, it seems like they might — MIGHT — be coming a little closer together in an important way. I ran into the brilliant Sean Forman at the Sloan Conference and he told me something — apparently Fangraphs and Baseball Reference have a slightly different idea of what constitutes replacement level. As you already know, WAR measures players against a fictional “replacement player,” who is supposed to represent the sort of player a team could easily acquire from the minors or on the waiver wire.

I didn’t know that Fangraphs and Baseball Reference had slightly different replacement levels. It’s a very small difference, but very small differences in a statistic like this — which is figured to a tenth of a point — are very important.

Sean told me that, for instance, Baseball Reference WAR and Fangraphs WAR have very similar views on the production of Jack Morris. The two figure pitching in different ways — with Fangraphs relying a lot on walks, strikeouts and home runs while Baseball Reference gives significance to runs allowed — but Sean said their numbers should, more or less, line up. They don’t.

Fangraphs has Morris being worth 56.9 wins above replacement. Baseball Reference has him at 39.3 WAR.

Forman says almost the entire difference is based on the value the two groups give a replacement player.
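
To get a feel for how a modest replacement-level difference could add up over a long career, here is a rough sketch. It takes the two Morris totals quoted above and spreads the gap over an assumed 18-season career. The season count and the function name are assumptions for illustration only; nothing here reflects how either site actually computes WAR.

```python
def per_season_gap(war_a, war_b, seasons):
    """Average per-season difference between two career WAR totals."""
    if seasons <= 0:
        raise ValueError("seasons must be positive")
    return (war_a - war_b) / seasons


FANGRAPHS_MORRIS = 56.9   # career total quoted in the article
BREF_MORRIS = 39.3        # career total quoted in the article
ASSUMED_SEASONS = 18      # assumption for illustration, not from the article

career_gap = FANGRAPHS_MORRIS - BREF_MORRIS
season_gap = per_season_gap(FANGRAPHS_MORRIS, BREF_MORRIS, ASSUMED_SEASONS)

print(f"career gap: {career_gap:.1f} wins")
print(f"per-season gap: {season_gap:.2f} wins")
```

Under that assumption, a career gap of 17.6 wins works out to a little under one win per season, which is the kind of difference a shifted baseline could plausibly produce year after year.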

And so, good news, he says that he intends to meet with Fangraphs folks and try to hammer out a consistent value for replacement level. He said they might try to get Tom Tango and others involved too. I think this would be great news for the statistic. It’s a fine thing for Fangraphs WAR and Baseball Reference WAR to be different. But it would be great if they could start in the same place.

16 Responses to Finding Peace In WAR

  1. djangoz says:

    That would be great. I really like WAR too, but getting those two different WARs closer together sure would help.

    Though what you said about walks, strikeouts and home runs vs runs allowed makes me think that from here on out I’ll be paying attention to Fangraphs WAR a lot more.

  2. Joe Ram says:

    I’m not feeling this idea. The point of a statistic is to tell us something about a player. If they have different calculations, call them different things (you may have mentioned these before, but “bWAR” and “fWAR” work for me). To simply say you guys should modify your criteria so that you match more closely, and therefore make people who are uncomfortable with statistics more comfortable, is not logical. If both sides want to agree on the same definition of how to value a replacement player, that is fine, I guess, but not necessary.

    “Let’s change this because people are too stupid or too lazy to understand the differences,” is just wrong.

  3. Joe Tiburzi says:

    It’s worth pointing out (well, worth it to me) that while Batting Average is consistent, it doesn’t always tell you the same thing. Two guys with the same batting average can and usually do have wildly different value, depending on their number of at bats, walks, power, etc. The difference between the two WARs bothers me too, but all stats have flaws.

  4. Richard says:

    I’d like to see the concept of a “zero player”. As I understand it, a team of replacement players would win some small number of games, 35 to 40. (On the other hand, it’s a big deal whether a replacement-level team would win 35 or 40 games.)

    Teams in their losses get outscored by a margin of about 2.2 to 1, a margin that’s actually quite consistent over the years. So Team Zero would be just good enough to create that ratio, maybe getting outscored 1,100 runs to 500 or so. Put Mike Trout on Team Zero and that’s a 10-win team, scoring 600 runs and giving up 1,090 or something. Or you could do it in reverse: Remove Trout from the Angels and add in the CF from Team Zero, who bats .220/.290/.350, and they’re a 79-win team.

    • Jake says:

      whoops, I said something dumb because I forgot the definition of Pythagorean wins and losses.

      I did sqrt(RS^2/(RS^2+RA^2)), which gave Team Zero 48 victories per year.

      eliminating the sqrt(), making the formula correct, gave 14 wins, which jibes better with the idea that Team Zero really, really sucks.

  5. Tangotiger says:

    WAR is a framework. Fangraphs has their implementation of WAR (fWAR) and Reference has their version (rWAR). The reader out there has their own as well (though perhaps not as well-defined and not as consistent), since every reader has his final opinion/list.

    I like that fWAR and rWAR have very distinct ideas as to how to handle pitching. As Bill James once noted, since we are providing estimates, it’s nice if we have different ways of trying to estimate the same thing. That’s a feature, not a bug. Sometimes they are not close, but most of the time they are.

    And I like Poz’s view that in the places where they are close, you may as well get closer there if you can.


    Interesting thought about setting replacement level as the RS/RA level of the losses. The ratio of runs scored to runs allowed in those cases is about 3:1, which would set the win% at .100. I don’t think that works, but it’s an interesting thought. The best guess for replacement level for a team is somewhere between .250 and .350. That is, have them play long enough so that bad luck and good luck will cancel out, and that’s what you’ll find.

    • Jake says:

      I don’t think, conceptually, it makes sense to set replacement level at RS/RA level of all the losses.

      that implies that we start with a premise that the replacement team loses every game – but we are trying to find out what percentage of games the replacement loses, and thus by setting it to 0, we are doing some circular reasoning.
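
The Pythagorean arithmetic in this thread can be checked with a short sketch. It assumes the classic exponent of 2 and a 162-game season; the function names are hypothetical and chosen only for illustration.

```python
import math

GAMES = 162  # assumption: a full modern regular season


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x).

    exponent=2 is the classic form; other exponents are sometimes used.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def mistaken_sqrt_win_pct(runs_scored, runs_allowed):
    """The slip described in the thread: taking sqrt() of the correct ratio."""
    return math.sqrt(pythagorean_win_pct(runs_scored, runs_allowed))


# A team outscored 3 to 1 (only the ratio matters, not the raw totals).
correct = pythagorean_win_pct(1, 3)
mistaken = mistaken_sqrt_win_pct(1, 3)

print(f"correct:  {correct:.3f} -> {correct * GAMES:.1f} wins")
print(f"mistaken: {mistaken:.3f} -> {mistaken * GAMES:.1f} wins")
```

With a 3:1 ratio the formula gives a .100 winning percentage, matching the figure in the comment above. The stray square root inflates any winning percentage below 1.000. The same relationship links the two win totals quoted earlier in the thread: 14 wins in 162 games is about .086, and its square root is about .294, or roughly 48 wins.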

  6. Grover Jones says:

    We shouldn’t worry about differing WARs. If I’m looking to buy a stock, and Vanguard and T. Rowe Price have different statistics for it, does that mean the statistics are worthless? Of course not! They’re both experts looking at the same “player” (stock), but I can consider both companies’ opinions fairly even if they don’t agree.

  7. Vidor says:

    “There are a few different answers to this. The main one — explained well by the Sam Miller in a good piece at ESPN the Magazine — is that baseball is a complicated thing “

    This is assuming facts not in evidence. The easiest answer for why people can’t agree on how to calculate WAR is that WAR is a bad statistic.

  8. Grover Jones says:

    Vidor, you seem to be calling for a conclusion of the witness. I object.

  9. Sounds promising. I was pleased when Bill James came up with the concept of win shares. The numbers he calculated were congruent with this fan’s perceptions. And now it was possible to quantify (approximately of course) just how foolish Cleveland’s 1951 Minoso trade was, instead of just saying “It was a bad trade.”
