
A Bit More on WAR

I hope you saw Sean’s response to my thoughts on the role that Baseball Reference WAR played in the Cy Young battle between Rick Porcello and Justin Verlander. I thought it was thoughtful, especially considering that he wrote it while on vacation. I want to discuss it in a bit more detail here.

At the beginning Sean points out:

1. That Baseball Reference very clearly states that the difference of 1-2 runs should not be considered definitive.

2. They break out each component so there’s nothing whatsoever hidden.

These things are absolutely true and if I in any way suggested otherwise then I misspoke and take it back. I was not trying to suggest that B-R was openly lobbying for Justin Verlander because he had more WAR than Porcello (6.6 to 5.0) OR that I had uncovered some hidden secret when pointing out that the entire difference came down to the defensive differences of the Tigers and the Red Sox. I simply did the math. It’s completely on me that I had not done it before.

What I think — hope — I was doing was making parallel points that the WAR difference between Verlander and Porcello DID, like it or not, have a real impact on the Cy Young race (and the reaction afterward) and that I seriously doubt most people did the math to figure out why Verlander had that edge.

And, once those points were made, I wanted to make the big one: I don’t think using overall team defensive numbers to separate a pitcher’s performance from his fielders the way WAR does is compelling or convincing. And I think, based on everything I can actually see in the Verlander-Porcello record, it was an incorrect adjustment in that specific case.


To very briefly recap, Baseball Reference WAR takes a pitcher's runs allowed and compares them, after various adjustments, to league average. Porcello and Verlander, after ballpark adjustments, saved almost exactly the same number of runs against the average pitcher. But because the good folks at Baseball Info Solutions had the Boston Red Sox as an excellent defensive team (53 runs saved) and the Detroit Tigers as a terrible defensive team (minus-50 runs), WAR assumes that much of Porcello's value actually belongs to his fielders, while Verlander's numbers should be adjusted significantly upward because he would have been better with even an average defense behind him.

That’s the entire 1.6 WAR difference.

Now, I pointed out that everything I can find, whether it's old-fashioned stuff like errors, base-stealers thrown out and unearned runs or newer details like batting average on line drives, suggests that the Tigers defense behind Verlander was BETTER than the Red Sox defense behind Porcello. I don't know if that's true (though I think with the incredible progress being made with Statcast we will soon know a lot more). But I do know that when you break it down batter by batter, there seems no possible way that Verlander's defense was that much worse than Porcello's.

And so here then is the part of Sean’s response I want to talk about:

Maybe it’s true that the Tigers were above average fielders when Verlander was on the mound, maybe not, but keep in mind BIS had the Sox at +59 and the Tigers at -49 for the season. How on earth does a team that’s the 3rd worst defensive team transform itself into an above average defensive team for the 228 innings JV was on the mound given they’d then have to be EVEN worse the other 1200 innings to get to -49?

If we assume for the moment the Tigers were in fact “excellent” behind Verlander, then the question becomes how to handle this. We apply the team’s DRS to each pitcher based on the percentage of the team’s balls in play which in probably 95% of the cases is a good way to do this and may still be in Verlander’s case. If you start to dice things up by the pitcher on the mound you then run into very small samples where something like Mookie Betts pulling back a home run and getting a double play has a dramatic impact on the pitcher’s WAR. I don’t think you’d like the alternative as you run the risk of conflating random variance with real performance differences.

Sean’s numbers are slightly different for Baseball Info Solutions than the ones I have, but that’s OK. His point does seem irresistible; I do sound a bit crazy to think that the Tigers were excellent defensively when Verlander was on the mound when they were so clearly inferior the rest of the time.
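Sean's two points reduce to a little arithmetic: a pro-rata share of the team's defensive runs, and what the rest of the staff would have to look like if the defense behind Verlander had actually been above average. Here is a minimal Python sketch using Sean's figures (-49 runs, 228 innings for Verlander, about 1,200 other innings). It assumes innings share as a stand-in for the balls-in-play share Sean describes, since no BIP counts are given here, and the +5 runs behind Verlander is purely hypothetical.

```python
def apportion_team_defense(team_runs_saved, pitcher_units, team_units):
    """Pro-rata share of team defensive runs saved charged to one pitcher.

    Sean describes using balls in play as the unit; this sketch accepts any
    unit, and innings are used below only because BIP counts aren't given.
    """
    return team_runs_saved * pitcher_units / team_units


def implied_rest_of_team(team_runs_saved, runs_behind_pitcher):
    """Runs saved the rest of the staff must account for so the season total holds."""
    return team_runs_saved - runs_behind_pitcher


TEAM_DRS = -49          # Sean's BIS figure for the Tigers
JV_INNINGS = 228        # Verlander's innings, per Sean
OTHER_INNINGS = 1200    # Sean's round figure for the rest of the staff
TOTAL_INNINGS = JV_INNINGS + OTHER_INNINGS

# What the current method charges to Verlander's fielders (innings as proxy).
share = apportion_team_defense(TEAM_DRS, JV_INNINGS, TOTAL_INNINGS)

# Hypothetical: suppose the fielders were 5 runs above average behind Verlander.
HYPOTHETICAL_JV_DEFENSE = 5
rest = implied_rest_of_team(TEAM_DRS, HYPOTHETICAL_JV_DEFENSE)

season_rate = TEAM_DRS / TOTAL_INNINGS * 9   # defensive runs per nine, whole season
rest_rate = rest / OTHER_INNINGS * 9         # defensive runs per nine, everyone else

print(f"pro-rata share behind Verlander: {share:.1f} runs")
print(f"season rate {season_rate:.2f} per nine; rest-of-staff rate {rest_rate:.2f} per nine")
```

The rest-of-staff rate only gets moderately worse than the season rate, which is why the question is less "impossible" than it first sounds: a few runs swung toward one pitcher are spread thin over 1,200 innings.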

So … let’s do something else. Let’s compare two other pitchers: Toronto’s Marco Estrada and Tampa Bay’s Jake Odorizzi. But instead of comparing their WAR, I want to compare their won-loss records. Yes, it’s true, I don’t like won-loss record, but I’m trying to make a different point. I hope it will make sense as we go.

Estrada had a 9-9 record.

Odorizzi had a 10-6 record.

Now, we all know enough about the quirks of baseball to know that won-loss record is altered by all sorts of things: the timing of runs scored, the effectiveness of the bullpen, etc. So let's say that we wanted to figure out what their won-loss records SHOULD HAVE BEEN. There are various ways of doing this that are well beyond my mathematical means, but for our purposes let's use Baseball Reference WAR to make an estimate.


We start with runs allowed:

Estrada gave up 73 runs in 176 innings, or 3.73 runs per nine innings.

Odorizzi gave up 80 runs in 187 2/3 innings, or 3.84 runs per nine innings.

Estrada was ever so slightly better at preventing runs. His advantage goes up a touch more because he faced slightly better competition. It goes up a little bit more because of ballpark factors; Toronto's ballpark is better for hitters than Tampa Bay's.
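The runs-per-nine figures above are plain arithmetic once innings are converted to outs (box-score notation like "187.2" means 187 2/3 innings). A small Python sketch; the helper names are illustrative, and this is only the raw rate, before the competition, park, and defense adjustments Baseball Reference layers on top:

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings notation ("187.2" = 187 2/3) to outs."""
    whole, _, thirds = ip.partition(".")
    extra = int(thirds) if thirds else 0
    if extra not in (0, 1, 2):
        raise ValueError(f"bad innings notation: {ip!r}")
    return int(whole) * 3 + extra


def runs_per_nine(runs: int, ip: str) -> float:
    """Runs allowed per nine innings, i.e. per 27 outs."""
    return runs * 27 / innings_to_outs(ip)


estrada = runs_per_nine(73, "176")
odorizzi = runs_per_nine(80, "187.2")
print(f"Estrada {estrada:.2f}, Odorizzi {odorizzi:.2f}")
# prints: Estrada 3.73, Odorizzi 3.84
```

Working in outs avoids the common mistake of treating "187.2" as 187.2 innings.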

Do you have the image in your mind? Now, Baseball Reference makes the defensive adjustment. The Blue Jays defense (52 runs saved) was better than Tampa Bay’s defense (22 runs saved) but they were both better than average and so both pitchers have their numbers adjusted downward, with Estrada’s going down a little bit more.

Got it? In the end, Baseball Reference has Estrada saving 17 runs against average vs. Odorizzi’s 12 runs above average. That makes Estrada’s WAR 3.4 and Odorizzi’s 3.0, so fairly close, slight edge to Estrada … BUT remember in this case we’re not looking for their WAR. We’re trying to look at won-loss records.

And in order to look at won-loss record, well, yes, we have one more thing to look at: Run support.

I’m guessing you can see where I’m going with this.

If we mirror Baseball Reference’s defensive system, we should figure out run support by looking at how many runs their teams scored over the season.

Toronto scored 759 runs, which is 28 runs above league average.

Tampa Bay scored 672 runs, which is 59 runs below league average.

And so, it's obvious that Toronto's offense (to use my own quote) is much, much, much better than Tampa Bay's offense. From this, then, we have to assume that Toronto gave Estrada much better run support than Tampa Bay gave Odorizzi. Right? I mean, how on earth would the second-worst offense in the league transform itself into an offensive machine for the 187 innings that Jake Odorizzi was on the mound? How on earth could the powerful Blue Jays offense go into the tank for the 176 innings that Marco Estrada was pitching?

Like I say, I think you know where this is going.


The Rays averaged 5.18 runs per nine innings when Odorizzi was on the mound.

The Blue Jays averaged 3.63 runs per nine when Estrada was on the mound.
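To make the contrast concrete, here is a rough Python comparison of each club's season scoring rate with the support its starter actually got. It assumes a 162-game season for both teams and treats runs per game as a stand-in for runs per nine, which is only approximately true:

```python
GAMES = 162  # assumption: a full 162-game schedule for both clubs


def season_runs_per_game(team_runs: int, games: int = GAMES) -> float:
    """Team's average scoring over the season (rough proxy for runs per nine)."""
    return team_runs / games


# pitcher -> (team season runs, run support per nine with him on the mound)
clubs = {
    "Estrada (Blue Jays)": (759, 3.63),
    "Odorizzi (Rays)": (672, 5.18),
}

for pitcher, (team_runs, support) in clubs.items():
    typical = season_runs_per_game(team_runs)
    gap = support - typical
    print(f"{pitcher}: team {typical:.2f} per game, with him {support:.2f} per nine, gap {gap:+.2f}")
```

Each pitcher's support sits about a run away from his team's season norm, in opposite directions.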

I am one of Baseball Reference's biggest fans, obviously, and so what I'm writing here is not intended as a criticism but as a friendly suggestion. I don't think using team defensive adjustments for individual pitchers works in B-R WAR. From the perspective of the lamest of laymen, I just don't think they're persuasive. Even if Verlander-Porcello is an anomaly the way that Estrada-Odorizzi is an anomaly, I don't think there's any way that pitchers get exactly the same level of defense behind them. That seems utterly obvious to me. Look at the Red Sox run support:

Rick Porcello (33 starts), 7.6 runs per nine.

David Price (35 starts), 6.6 runs per nine.

Steven Wright (24 starts), 6.4 runs per nine.

Clay Buchholz (21 starts), 4.6 runs per nine.

Drew Pomeranz (13 starts), 3.9 runs per nine.

Eduardo Rodriguez (20 starts), 3.6 runs per nine.
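A quick Python sketch of how wide that spread is, using only the numbers listed above. Weighting each pitcher's runs-per-nine figure by his starts is a crude approximation (starts aren't all nine innings), so treat the mean as illustrative:

```python
# pitcher -> (starts, run support per nine), as listed above
support = {
    "Porcello": (33, 7.6),
    "Price": (35, 6.6),
    "Wright": (24, 6.4),
    "Buchholz": (21, 4.6),
    "Pomeranz": (13, 3.9),
    "Rodriguez": (20, 3.6),
}

starts = sum(s for s, _ in support.values())
# Crude: weight each rate by starts rather than by innings pitched.
weighted_mean = sum(s * r for s, r in support.values()) / starts
high = max(r for _, r in support.values())
low = min(r for _, r in support.values())

print(f"{starts} starts, start-weighted mean {weighted_mean:.2f}, "
      f"spread {high - low:.1f} runs per nine")
```

Even with that crude weighting, the gap between the best- and worst-supported starters is about four runs per nine on the same team in the same season.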

Why would offensive run support fluctuate so much and defense NOT fluctuate? This is especially true because different pitchers allow different sorts of balls in play: ground balls, fly balls, line drives, choppers, bloopers, bat-breaking balls, you name it. I am of the belief that pitchers do not have very much control of whether balls put in play become hits, but I think it's obvious that they do have clear tendencies. The strength of the Red Sox defense was in the outfield, where their right and center fielders saved 44 runs. This, theoretically, should help a fly ball pitcher like Eduardo Rodriguez more than a ground ball pitcher like Pomeranz.

Theoretically, anyway.

I don’t believe that Baseball Reference breaks down ground balls or fly balls or where balls were hit — I think they just use total runs saved by a defense and apply that generally to balls in play. It seems clumsy for an elegant formula.

Sean wrote that if they find an issue with defensive numbers they will look at it because they’re always looking to improve. I believe that’s true. This is a friendly suggestion. Sean and company are a lot smarter than I am and if they find little to no merit in what I’m saying, I will not take any offense. My kids don’t think I know what I’m talking about either.



49 Responses to A Bit More on WAR

  1. Neil says:

    Another thing to consider is each individual pitcher's contribution to defense. While a small piece, it is at least one out of nine defenders that is different for each pitcher.

    • Eric says:

      This really is a fantastic point. It also reminded me how some teams have personal catchers for pitchers – Lester frequently had Ross behind the plate for example, or Dickey/Thole. In the case of Lester, for example, he’d have a different pitcher (himself) and catcher (Ross) – so 2/9 of the defense is different than the rest of the Cubs.

  2. Lee says:

    As an Australian who knows nothing about baseball, this makes sense to me. Though when I saw they gave pitchers wins, I thought this must be on those rare occasions where the pitcher personally scored more runs than he gave up, because the alternative (what I later learned to be the truth) seemed insane in a team sport. So what would I know?

    • Marc W. Schneider says:

      I find it somewhat annoying that a person who admittedly “knows nothing about baseball” still feels qualified to comment that attributing wins to pitchers is “insane.” I certainly recognize that pitcher wins is a flawed and often misleading statistic, but there is a history behind the number that I, at least, value.

      • Lee says:

        Just telling u what someone with no knowledge of the history of the sport thinks on encountering that bizarre stat for the first time. It's equally annoying how in America you assign QBs wins but not, say, middle linebackers; it's as if u don't get the core concept of team sports, that it requires everyone to have success.

      • SDG says:

        Yes, I love baseball history too, but wins (and RBIs) are to me not only useless statistics, in that they tell you little about the performance of the individual in question, but dangerous, because they make an issue of performance into one of character. You get wins because you have the heart of a lion and the will of a soldier! You get RBIs because you’re a true-blue teammate who sacrifices his personal glory for his band of brothers! Conversely, if you don’t have those stats, it’s because you’re a lazy, selfish, jerk.

  3. CJ says:

    Well Joe I think you’re spot on here that defense CAN fluctuate for individual pitchers and I might take it a step further to say that the pitcher may influence that defensive performance.

    Obviously we know that Porcello is a ground ball pitcher and Verlander is a fly ball pitcher. 43.1% of Porcello’s batted balls and 33.7% of Verlander’s batted balls were on the ground. They had similar %’s of line drives and HR/FB.

    With this being the case, that terrible Detroit defense probably had an easier time playing behind Verlander than that great Boston defense had playing behind Porcello.

    In fact, we see that Verlander's BABIP is consistent with the BABIPs he's had in the past, whereas Porcello had his lowest BABIP ever, and by quite a margin below his average.

    All of this is to say, Verlander takes the defense out of it more than Porcello does by allowing fewer ground balls, which historically lead to a higher BABIP than fly balls do. Porcello rode his great defense this year, and that made him a great pitcher.

    Verlander essentially pitched around his poor defense and Porcello pitched to his great defense. I think this makes Baseball Reference’s calculation inaccurate because it assigns a full season value of defense to Verlander, when Verlander takes his defense out of the equation more often than Porcello.

    FWIW, FanGraphs had them both at 5.2 WAR, tied with Chris Sale, who had only 4.9 bWAR.

  4. Fivejackace says:

    Beautiful. Just beautiful, Joe. An excellent response that had me nodding my head in agreement the whole time.

  5. Craig from Az says:

    While the runs scored per pitcher comparison is interesting, I don't think anybody would agree that defense is anywhere near as variable as offense. So while a good offense will be alternately good and bad on any given day (although more good than bad), a good defense will pretty much be good every day.

    I do think the fly ball/ground ball tendencies of the pitcher might impact the effectiveness of any particular defense, although still to a much smaller degree than offensive variability.

    But I have no data to prove any of this 😉

    • Anon says:

      Why would defense not vary as much as offense? Why wouldn’t fielders have good days and bad days just as hitters do? And why wouldn’t there be an element of randomness to where the good and bad days fall?

      • Mr Fresh says:

        To oversimplify… good hitters are successful 30% of the time. Good defenders are successful 98+% of the time. Obviously there is a lot more to it than that.. but I agree with the premise that run support varies a lot more than the quality of the defense.

        • Donald A. Coffin says:

          You're using errors as your measure of defensive success, aren't you? I think, rather, it should be something like Bill James' defensive efficiency ratio or batting average on balls in play. Verlander's BABIP was .254; Porcello's, .269. They both faced about 900 batters (903 for V, 890 for P), so the difference is about 12 fewer baserunners allowed on balls in play by Verlander. Detroit's overall BABIP was .280 (so somewhat higher than .280 for everyone but Verlander); Boston's was .272, virtually the same for Porcello and for the rest of the staff. So SOMETHING Verlander was doing, or that the fielders behind him were doing, was very different.

      • JaLaBar says:

        I think, quite simply, that it comes down to something this simple: Mike Trout going 0-5 isn’t that rare. You just aren’t going to see Andrelton Simmons kick a buncha balls around the infield.

        • SDG says:

          You’re going to see a bunch of balls he might have gotten to if he positioned himself differently. We just aren’t used to judging that.

  6. jpdg says:

    “I am of the belief that pitchers do not have very much control of whether balls put in play become hits, but I think it’s obvious that they do have clear tendencies.”

    Honestly Joe, you should just stick to using the FIP based WAR by FanGraphs. The FanGraphs version is extreme in that it works under the assumption that the pitcher, aside from inducing infield pop-ups, has ZERO control over whether a ball in play becomes a hit and strips away the contribution of the defense entirely. Again, it’s extreme but I believe that’s the more sound approach because I don’t see how we can assume defensive performance is static over the course of 162 games. The same way a terrible offensive team like Philadelphia could have a seven run outburst several times a year, a poor defensive team like Detroit could have had several brilliant defensive performances over the course of a long season. Both versions of WAR have their merits but the FanGraphs version just makes more sense to me. The Baseball Reference version just makes too many assumptions for my liking.

  7. Daniel says:

    There was one sentence in Forman’s response that jumped out at me: “If you start to dice things up by the pitcher on the mound you then run into very small samples where something like Mookie Betts pulling back a home run and getting a double play has a dramatic impact on the pitcher’s WAR.”

    The thing that struck me is that it seems like it’s the *current* system that has the problem he’s pointing out. Let’s say that Betts makes that play, saving (say) a three-run homer. Then he just took 3+ runs off the raw numbers that BR is going to use to calculate Porcello’s WAR — which seems like it’s exactly the dramatic impact that Forman is talking about. That impact gets partially corrected, because the play improves the Red Sox’ defensive metrics a bit, but not totally, because the current method basically divides up that luck among the whole pitching staff.

    More generally: I think we have a natural intuition that there are just so many defensive plays in every single game that it should all average out. But you rightly point out that there’s a competing intuition, which is that we all know offense fluctuates a lot, and there’s no reason to think that defense should be different. I think that the way to resolve these two competing intuitions is to realize that the vast majority of balls in play are either easy outs or no-doubt hits. The marginal cases, the ones where the difference between good defense and bad defense matters, the ones that we’re worried about here, are much rarer, and so it’s no surprise that they end up being unevenly distributed.

  8. Timothy James Selenski says:

    Best response EVER!!! Joe just dropped the mic…

  9. the_slasher14 says:

    Throw in the fact that it has often been said that pitchers who work slooooowly put a defense to sleep behind them, while the Bob Gibson get-the-ball-back-and-throw-it types keep the defense on its toes, and you've got yet another reason to think results might be quite different from pitcher to pitcher. Sean's point that considering individual pitchers' defensive numbers might produce weird results because of the dreaded small sample size has some merit, but I don't see how it can outweigh the fact that pitcher A and pitcher B can produce legitimately different fielding results for valid reasons. Well done, Joe.

  10. Mark says:

    I have always thought the randomness of baseball is primarily driven by offense not defense. So as soon as I saw where Joe’s post was going I was not swayed. Yet I admit I cannot prove or even give a decent argument for my belief. It is, to me, self-evident, yet obviously that is a personal judgment.

    FWIW, I believe that defense is less random because there is no defensive equivalent of hitting the ball right on the nose and having it become a line drive out, or a weak ground ball that travels 47 feet yet becomes a hit. Defense is less random because you are not hitting a round ball with a round bat. Defense is less random because effort is rewarded more on defense than on offense. Defense is less random because there is less failure.

    I realize these are less-than-persuasive arguments. But Joe's argument for defensive variability is IMO even less persuasive. Joe mostly lists lots of stats that prove offensive variability and then suggests we just accept those as evidence of defensive variability. I'm not buying. Then there is this: “I am of the belief that pitchers do not have very much control of whether balls put in play become hits, but I think it's obvious that they do have clear tendencies.”

    Got it. Pitchers have no effect on defensive performance, except when they do, and when that happens they deserve no credit for it.

    Has nobody ever done a comparison of the variability of oWAR compared to dWAR? It seems to me that would go a long ways towards clearing this up.

    • Joe Posnanski says:

      Tom Tango points out that the one time he saw individualized fielding numbers it was for the 2012 Tigers, which of course had Verlander, Porcello AND Scherzer. That Tigers defense was terrible.

      Verlander’s fielders were +8
      Porcello’s were -9
      Scherzer’s were -11.

  11. Atom says:

    Baseball Reference pitcher WAR has always thrown up some really, really funky numbers on occasion. Not all the time, but every now and again, their system would spit out numbers for two pitchers that would have you absolutely flummoxed.

    Some examples
    1993, National League
    Jose Rijo – 2.48 ERA, 162 ERA+, 257 IP, 76 R, 71 ER, 19 HR, 62 BB, 227 K, 2.93 FIP, 3.66 K/BB
    Maddux – 2.36 ERA, 170 ERA+, 267 IP, 85 R, 70 ER, 14 HR, 52 BB, 197 K, 2.85 FIP, 3.79 K/BB

    You could argue either way which was better. Maddux coughed up more unearned runs, which certainly doesn't help, but gave up fewer homers in more innings, and had a slightly better K/BB rate and FIP. Rijo struck out more guys, matched him in several categories, and had most of his runs accounted for in ERA.

    Anyway, it’s close, right?

    According to WAR, not at all. Maddux posted a 5.8 WAR, Rijo’s was *9.3*.

    Another example would be 1979.
    JR Richard – 2.71 ERA, 130 ERA+, 292 IP, 98 R, 88 ER, 13 HR, 98 BB, 313 K, 2.21 FIP, 3.19 K/BB
    Phil Niekro – 3.39 ERA, 119 ERA+, 342 IP, 160 R, 129 ER, 41 HR, 113 BB, 208 K, 4.16 FIP, 1.84 K/BB

    Niekro coughed up far, far more runs (31 unearned; as a knuckleballer, it's likely no coincidence that the Braves led the league in passed balls). On the whole, he gave up about 1.2 more runs per nine innings than Richard. Granted, he played in a more hitter-friendly environment, but that would be a huge gap to make up. In sabermetric stats, Richard destroys him. Niekro has his durability. How much ground does that make up?

    Richard: 5.6
    Niekro: 7.6

    I can listen to the argument that Niekro was more valuable based on his innings. I won't agree with it, but fine. But *2 wins* better?

    Every few years or so, you'll see one of these utterly perplexing results. It's enough to make me not take their pitcher WAR too seriously.

    • invitro says:

      If you want to know the reason for the brWAR’s, why don’t you post their components? They’re easily available, on the same line where you got the total WAR. I mean RA9def, PPFp, the numbers on that row.

      Niekro’s 50 more innings is not a small amount, compared to the 2-win WAR difference. It’s 17% more innings, and 17% more than 5.6 WAR is 6.6 WAR. So if they pitched at the same level, that’s half the 2-WAR difference right there, and now you’re down to having a serious problem with a 1-WAR difference, which is a little hard to take seriously. (I hope you’re not saying this difference means brWAR doesn’t pass the smell test. 😉 )

      • Cooper Nielson says:

        But Atom isn’t claiming “they pitched at the same level.” He presented overwhelming evidence that Richard “destroys him” on a per-inning basis, so Niekro’s only advantage was his durability (innings).

        17% more innings could lead to 1.0 more WAR *if* Niekro had been as good as Richard on a per-inning basis, but since he wasn’t, the difference should be less than 1.0 WAR, not more.
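The arithmetic in this exchange is easy to check in Python from Atom's lines. These are raw runs per nine with no park or unearned-run adjustments, and the scaled figure is invitro's "same level" ceiling, not a WAR calculation:

```python
def runs_per_nine(runs: float, innings: float) -> float:
    """Raw runs allowed per nine innings (no park or defense adjustment)."""
    return runs * 9 / innings


richard = runs_per_nine(98, 292)    # J.R. Richard, 1979, per Atom's line
niekro = runs_per_nine(160, 342)    # Phil Niekro, 1979, per Atom's line
innings_ratio = 342 / 292

print(f"Richard {richard:.2f}, Niekro {niekro:.2f}, gap {niekro - richard:.2f} runs per nine")
print(f"Niekro pitched {innings_ratio - 1:.0%} more innings")
# Only valid if both pitched at the same per-inning level (invitro's assumption).
print(f"Richard's 5.6 WAR scaled to Niekro's innings: {5.6 * innings_ratio:.1f}")
```

Richard's edge is roughly 1.2 runs per nine, and Niekro's extra innings are worth about one win only if the two pitched at the same level, which is Cooper's point.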

  12. Nick says:

    Great post, Joe. Apologies if the point has been made already and for being perhaps overly simplistic, but just looking at the 2016 Red Sox stats you get a sense of the natural variation in defense, no doubt common throughout the league. The Sox made 20 errors in Porcello's 33 starts, or 0.61 per game. They made just 55 errors in the other 129 starts, or 0.43 per game. Obviously the 'error' stat leaves a lot to be desired, but that seems like a pretty huge difference: the great Sox defense had a roughly 40% increase in errors during Porcello starts! And it's not surprising. As has been stated, Porcello is a ground ball pitcher who had one great defender in the infield (Pedroia), one above average (Shaw), and two well below average (Bogaerts and Ramirez). He had an average infield defense to work with. What made the Sox great defensively WAS their outfield, which Porcello didn't take full advantage of, both because of his low fly ball rates and because of his bad luck: Bradley/Betts happened to make half their combined errors this season in games Porcello started (2/4).
    I wonder if the solution is to add some weight behind plays the pitcher was on the mound for. Count those stats 5 times or something and all others once to get your defensive ranking for the 2016 Red Sox behind Rick Porcello. Baseball Ref really does need to think about making some changes to how they value defense… forget this year, when they have Ted Higuera ahead of Roger Clemens in the 1986 WAR leaders, something is off.

  13. Darrel says:

    The premise that defense, over a small sample size (which one pitcher's season certainly is), could vary from starter to starter on the same team seems a very reasonable position. If measured that way, I believe Joe would be right and we would see variances yearly amongst a team's SP. Where he lost me, though, was the comparison to run support. Defense is more or less an independent variable. That is to say that once a ball is in play, a team, or a player, will either turn it into an out or not. There are a relatively small number of external forces acting on that: things like weather, field conditions and the like, and IMO those things are rare and have very little impact on the totality of the measured sample. Offense is a whole nother kettle of fish.

    Maybe Estrada always faced the other team's ace. Maybe the manager always put a better D player in, in place of a better offensive one (personal catchers come to mind). Maybe Estrada faced better hitting teams than one of his fellow SP due to the way the rotation fell. Maybe he more often pitched in better pitchers' stadiums. Maybe opposing managers always stacked their lineups against him instead of resting a starter or two. Oh yeah, and all of the things that might affect defense by definition affect offense as well. Way, way too many external factors to make the analogy hold up for me, despite the interesting premise.

    • invitro says:

      “Things like weather, field conditions and the like and IMO those things are rare and have very little impact on the totality of the measured sample.” — Just a quick note. I’ve seen Bill James write that weather, in particular temperature, has a large effect on runs. For example, it’s probably the main reason why offense tends to be down in the playoffs.

      • Darrel says:

        Runs yes but not defense. A play that should be made on D should be made despite the difference in temperature from July to October.

  14. invitro says:

    Why doesn’t someone actually see what the Defensive Runs Saved* was during Verlander’s and Porcello’s innings? That information exists, right? Maybe it exists but is not available? Or it’s available but not freely available? Or is freely available but too hard to use or understand? (* and other competing defensive stats)

  15. MGL says:

    Joe, your run support analogy is good but there is no particular reason why the variance from pitcher to pitcher in run support should be the same as the variance in a defensive measure like UZR, DRS or BABIP. In fact it’s likely that there’s much more variance in run support which supports Sean’s argument.

    However, both of you are missing the most important piece of the puzzle: each pitcher's BABIP. I don't have their numbers handy, but I believe that Verlander's is lower than Porcello's. That strongly suggests that the defense did play well behind Verlander, maybe even better than with Porcello, despite the fact that over the whole season the Boston D was likely much better.

    If we knew nothing about the pitchers' BABIPs, then Sean's method would be correct (although he is likely using too extreme a number), but once we incorporate those BABIPs everything changes. It then becomes a Bayesian exercise.

    • Adam S says:

      MGL – To be fair to Joe, he makes precisely that point in his first article “Yep. Defense. Even though Porcello gave up more unearned runs than Verlander, and even though Porcello’s batting average on balls in play was considerably higher (.269 to .256)”.

      • MGL says:

        Right, he did, and that's the critical data that essentially makes Sean's argument irrelevant. Whether it is rare or not for individual pitchers to vary a lot in terms of the defense behind them, we have strong evidence that the defense played well behind Verlander, perhaps even better than behind Porcello.

        • Brian Cartwright says:

          Verlander’s defense converted more balls into outs. That does not mean they ‘played better’. As I show below, Verlander presented his defense with easier balls to field, as 36% of his batted balls (including grounders) were hit above 28 degrees, giving them considerable hang time and a very low expected BABIP.

          • MGL says:

            I understand. Unless we have that data then we do have to assume that when a pitcher has a low BABIP that their fielders played better than they did over the whole season. We also assume that he allowed easier to field BIP.

          • Brian Cartwright says:

            With Statcast we can measure that directly, and Tango’s made some tweets the past few days on that subject.

            Where there's only play-by-play data, GB%, PU%, HR% and others can suggest the distribution of vertical angles with a good amount of accuracy.
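MGL's "Bayesian exercise" can be sketched with the common regression-to-the-mean form: treat the team's BABIP as the prior, the pitcher's own BABIP as the evidence, and weight the evidence by n / (n + k). This is a minimal Python sketch, not how Baseball Reference or BIS actually work. The .280 team and .256 Verlander figures come from the comments above; the 600 balls in play and the constant k = 1000 are hypothetical placeholders:

```python
def shrink_toward_team(pitcher_babip: float, pitcher_bip: int,
                       team_babip: float, k: float) -> float:
    """Regress a pitcher's BABIP toward his team's BABIP.

    The weight on the pitcher's own number is n / (n + k): the more balls in
    play he allowed, the more his own BABIP counts. k is a regression
    constant; the value used below is a placeholder, not an estimate.
    """
    w = pitcher_bip / (pitcher_bip + k)
    return team_babip + w * (pitcher_babip - team_babip)


estimate = shrink_toward_team(pitcher_babip=0.256, pitcher_bip=600,  # 600 BIP: hypothetical
                              team_babip=0.280, k=1000)              # k: hypothetical
print(f"{estimate:.3f}")
# prints: 0.271
```

As Brian notes, even the shrunk number blends fielding with the kind of batted balls the pitcher allowed; this form cannot separate the two.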

  16. Mike says:

    Joe – how do we account for the effect one’s bullpen has on their ERA? As noted in a Boston Globe article, “Rick Porcello certainly needs to thank his bullpen for his Cy Young victory over Justin Verlander. Porcello bequeathed 11 runners to his relievers and none scored; Verlander bequeathed 15 and eight scored. Had Verlander had better relief, he would have led the league in ERA.” Since earned runs matter for purposes of calculating BR WAR, Verlander’s bullpen significantly hurt his BR WAR and hindered his Cy Young chances while Porcello’s bullpen significantly bolstered his. It seems that some adjustment should be made for the number of runners left on base or the number of runs allowed by the bullpen that are charged to the starter after a starter is pulled.

    • MGL says:

      That can be done quite easily. You subtract any runs charged to them with another pitcher on the mound and you “complete” the inning using run expectancy tables adjusted for the pitcher’s runs allowed for the season.

      Say the pitcher leaves a game with runners on 1st and 2nd and 1 out. Rather than charge those runners to the starter if they score with a reliever on the mound you merely charge the starter with the number of runs equal to the run expectancy of that situation, say .75 runs, and credit him with 2 more outs. That’s how many runs allowed we expect on the average if he were to complete the inning. Actually the .75 runs is for an average pitcher so we adjust it according to his RA9 for the season. If it’s 80% of league average then we use .8 * .75 runs.

      Even though runs allowed by relievers but charged to the starter will even out in the long run, in any single season you can have gross disparities like the one you mentioned. The fairest way to avoid that in the short term is to use the method I described. It’s messy, though. Most people would prefer to just know a starter’s ERA, not one adjusted for partial innings.
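The "complete the inning" adjustment described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Baseball Reference's or MGL's actual code: the function names are hypothetical, the run expectancy value is supplied by the caller (the .75 figure is the comment's own "say .75"), and the scaling uses the pitcher's raw season RA9 relative to league average.

```python
def complete_inning_charge(run_expectancy, outs_when_pulled, pitcher_ra9, league_ra9):
    """Runs and outs to charge a starter who leaves mid-inning.

    Hypothetical helper: scales the league-average run expectancy for the
    base-out state by the starter's RA9 relative to league average, and
    credits him with the outs remaining in the inning.
    """
    if not 0 <= outs_when_pulled <= 2:
        raise ValueError("a starter can only leave mid-inning with 0, 1 or 2 outs")
    ratio = pitcher_ra9 / league_ra9
    expected_runs = run_expectancy * ratio
    outs_credited = 3 - outs_when_pulled
    return expected_runs, outs_credited


def adjust_season(runs_charged, outs_recorded, exits, pitcher_ra9, league_ra9):
    """Swap bequeathed-runner runs for expected runs on every mid-inning exit.

    exits: list of (run_expectancy, outs_when_pulled, bequeathed_runs_that_scored).
    Assumes runs_charged already includes the bequeathed runners who scored.
    Returns (adjusted runs, adjusted innings as a decimal).
    """
    runs, outs = runs_charged, outs_recorded
    for re_value, outs_when_pulled, scored in exits:
        expected_runs, extra_outs = complete_inning_charge(
            re_value, outs_when_pulled, pitcher_ra9, league_ra9
        )
        runs += expected_runs - scored  # remove what relievers allowed, add expectation
        outs += extra_outs               # credit the outs he would have had to get
    return runs, outs / 3


if __name__ == "__main__":
    # The comment's example: 1st and 2nd, 1 out, RA9 at 80% of league average.
    print(complete_inning_charge(0.75, 1, 4.0, 5.0))
```

For the comment's example this charges roughly .6 runs and credits 2 outs, matching the ".8 * .75" arithmetic.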

  17. Brian Cartwright says:

    Pitchers have almost no control over the outcome of a ground ball, but they do have a lot to say about a ball in the air – through their control of vertical angles.

    Forget line drives – they are too subjective. A ball in the air to the outfield is a hit about 40% of the time, compared to 25% on the ground. Extreme fly ball pitchers allow a 30% hit rate on balls in the air to the outfield, while for extreme ground ball pitchers it’s 50%. The difference is that getting a higher percentage of balls on the ground means that the balls in the air are also shifted to lower angles (the entire distribution moves) – and the low angle balls are short, with less hang time, and harder to catch. (See my piece in the 2012 Hardball Times Annual).

    Second point: BABIP is a noisy stat, requiring a lot of regression, because it’s an aggregation of several rate stats, each with its own properties. There are bunt hits, infield hits, ground-ball hits to the outfield, and hits on balls in the air to the outfield.
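To make the "aggregation" point concrete, here is a minimal sketch that treats BABIP as a share-weighted sum of component hit rates. The category names and all numbers are hypothetical placeholders, not anyone's actual 2016 data; the point is only that a change in batted-ball mix moves BABIP even when every component rate is held fixed.

```python
def aggregate_babip(components):
    """Combine component hit rates into a single BABIP.

    components: dict mapping a batted-ball category to (share_of_bip, hit_rate).
    The shares must sum to 1 (every ball in play falls in exactly one category).
    """
    total_share = sum(share for share, _ in components.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"shares sum to {total_share}, not 1")
    return sum(share * rate for share, rate in components.values())


# Hypothetical mixes with identical component hit rates.
typical = {
    "bunt": (0.01, 0.30),
    "ground ball": (0.44, 0.24),   # infield hits plus grounders through to the OF
    "infield popup": (0.10, 0.02),
    "air to outfield": (0.45, 0.40),
}
pop_heavy = {
    "bunt": (0.01, 0.30),
    "ground ball": (0.34, 0.24),   # fewer grounders...
    "infield popup": (0.20, 0.02), # ...more popups, same hit rates everywhere
    "air to outfield": (0.45, 0.40),
}

if __name__ == "__main__":
    print(round(aggregate_babip(typical), 4), round(aggregate_babip(pop_heavy), 4))
```

With these made-up inputs, shifting ten points of ground balls into infield popups lowers BABIP from .2906 to .2686 without any change in fielding.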

    For all the other Tigers’ pitchers in 2016, an infield grounder was beaten out for a hit at a rate of .078. For Verlander, .084: no difference.

    For other Tigers’ pitchers, a ground ball got through to the outfield at an average rate of .193. For Verlander, .201. Again, no difference.

    The infield defense for Verlander was the same as, or even slightly worse than, it was for the other Tigers’ pitchers.

    On balls in the air to the outfield, other Tigers’ pitchers allowed a hit rate of .416 (slightly below average). For Verlander, .314. This is a large difference. Was it bad defense or the types of flies that Verlander allowed?

    Other Tigers’ pitchers allowed a ground ball rate of .471, Verlander .358. Fairly large difference. Verlander’s GB% was low, but not Chris Young-level extreme.

    Other Tigers’ pitchers got an infield popup at a rate of .161 of all balls in the air. For Verlander, .197. That’s quite a high popup rate, and infield popups are caught for outs 98% of the time.

    I have a regression formula, but just eyeballing it for now, the play-by-play shows Verlander getting fewer balls on the ground and more infield pops. That strongly suggests his balls in the air to the outfield were at higher than average angles.

    Statcast verifies this. Verlander’s rate of allowing balls with vertical angles between -90 and -4 was 71% of MLB average. His rate of balls from 44 to 90 (infield and outfield pops, with a BABIP of .026) was 203% of MLB average.

    Vertical angle (degrees) | MLB BABIP | Verlander (rate vs. MLB avg) | Porcello (rate vs. MLB avg)
    -90 to -4 | .157 | 0.71 | 0.92
    -4 to 12  | .508 | 0.86 | 0.98
    12 to 28  | .545 | 0.94 | 1.02
    28 to 44  | .107 | 1.27 | 0.99
    44 to 90  | .026 | 2.03 | 0.98

    Weighting the MLB-average BABIP of each vertical-angle group by the number of balls each pitcher allowed in that group, Verlander allowed 5.7 fewer base hits than expected. Porcello allowed 11.3 fewer (perhaps the better defense).
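The expected-hits calculation described above can be sketched as a weighted sum: each ball in play is assigned the MLB-average BABIP of its vertical-angle group. The BABIP values come from the commenter's table, with the second group read as -4 to 12 degrees since the groups are contiguous. The per-group counts below are hypothetical, not Verlander's or Porcello's actual Statcast data.

```python
# MLB-average BABIP by vertical-angle group (degrees), from the comment's table.
MLB_BABIP_BY_ANGLE = {
    (-90, -4): 0.157,
    (-4, 12): 0.508,
    (12, 28): 0.545,
    (28, 44): 0.107,
    (44, 90): 0.026,
}


def expected_hits(bip_counts):
    """Expected hits if every ball in play fell at the MLB rate for its angle group."""
    return sum(n * MLB_BABIP_BY_ANGLE[group] for group, n in bip_counts.items())


def hits_above_expected(bip_counts, hits_allowed):
    """Actual minus expected hits; negative means fewer hits than the angle mix predicts."""
    return hits_allowed - expected_hits(bip_counts)


if __name__ == "__main__":
    # Hypothetical pitcher: 500 balls in play spread across the angle groups.
    counts = {(-90, -4): 150, (-4, 12): 120, (12, 28): 100, (28, 44): 80, (44, 90): 50}
    print(round(expected_hits(counts), 2), round(hits_above_expected(counts, 143), 2))
```

Because the expectation already reflects how catchable each pitcher's batted balls were, whatever remains (positive or negative) is what is left to split between defense, park and luck.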

    36% of Verlander’s batted balls were high in the air, where they rarely fall for hits, compared to 24% for Porcello. 44% of Verlander’s balls were in the groups with the highest hit rates (-4 to 28), compared to 51% for Porcello. That’s Verlander’s skill in avoiding base hits.

    A model that considers neither park factors nor defense would credit each pitcher with his expected hits allowed, based on the distribution of vertical angles on his balls in play, which represents how catchable the balls were. Where Statcast is not available, this can be estimated fairly well (r^2 of .70 for BABIP on balls in the air to the outfield) using GB%, PU% and SO%, assuming MLB-average hit rates for balls on the ground.

    • Hamster Huey says:

      Wow – having not thought about it for too long, this seems like a great analysis to me, and might represent a middle ground between Joe and Sean. The Tigers defense _did_ perform better (at least, convert more batted balls into outs) for Verlander than it did for the rest of their pitchers – but Verlander might actually deserve more of the credit for this than a simple “BABIP is out of the pitcher’s control” absolutist interpretation would admit.

  18. kdon says:

    A nice response, and very clever, but the analogy to run support is simply not valid. As many people have pointed out, offense does fluctuate significantly more than defense. It’s simple: luck.

    Anyone who watches baseball sees hard-hit balls get caught and cheap pop-ups fall in for hits. This hardly ever happens with defense. A defensive player who does all the athletic things right (fields, dives, throws, whatever) is almost always rewarded. There is no luck in catching a liner in the gap or throwing a strike to 1B, so defense can’t fluctuate to the degree that offense does.

    • Brian Cartwright says:

      There will be more variance with fewer chances. Verlander made 34 starts; that’s the number of opportunities in ‘runs per game’. Innings go on until someone makes the third out. One game with 14 runs of support would be 3% of his starts but 10% of the runs.

      Verlander also allowed over 500 batted balls. The difference between the best and worst fielder is 0.1. If you hit a bunch of flies to left field, the best fielder will catch 60% and the worst 50%. That can potentially be 50 or 60 base hits over the course of the season, but the small range and the large sample provide for much less randomness over the course of a season.
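The sample-size argument above can be checked with a simple binomial sketch. It assumes, as the comment does, that the ~500 batted balls are comparable chances, and it uses a 40% hit rate (the "about 40% of the time" figure for balls in the air to the outfield given earlier in the thread); treating each ball as independent is a simplifying assumption, and the helper names are hypothetical.

```python
import math


def hit_count_sd(n_balls, hit_rate):
    # Binomial SD of the number of hits when each ball independently
    # falls for a hit with probability hit_rate (pure chance, no skill).
    return math.sqrt(n_balls * hit_rate * (1 - hit_rate))


def hit_rate_se(n_balls, hit_rate):
    # Standard error of the observed hit rate; shrinks like 1 / sqrt(n).
    return math.sqrt(hit_rate * (1 - hit_rate) / n_balls)


def fielder_gap_hits(n_balls, best_catch_rate, worst_catch_rate):
    # Systematic difference in hits between the best and worst fielder.
    return n_balls * (best_catch_rate - worst_catch_rate)


if __name__ == "__main__":
    n, p = 500, 0.40
    print(f"best-vs-worst fielder gap: {fielder_gap_hits(n, 0.60, 0.50):.0f} hits")
    print(f"chance variation (1 SD): {hit_count_sd(n, p):.1f} hits")
    print(f"rate SE with 500 balls: {hit_rate_se(500, p):.3f}")
    print(f"rate SE with 50 balls:  {hit_rate_se(50, p):.3f}")
```

Under these assumptions the skill gap (about 50 hits) is several times larger than one standard deviation of pure chance (about 11 hits), and the rate's standard error roughly triples when the sample drops from 500 to 50 chances.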

  19. Kevin says:

    Joe has some smart readers. Great comments! Hope BR is taking notes.

  20. Mark Daniel says:

    Here’s what bugs me about defensive WAR. Take a look at Josh Donaldson’s positioning on the play in this video –
    It’s Salvador Perez’s walk-off hit in the 2014 AL Wild Card game (A’s vs. Royals), if you don’t trust the link and want to look it up yourself.

    Go to the 30-second mark of that video, and you’ll see a camera angle from somewhere behind home plate. You can see that Donaldson is playing far off the bag and has to dive full out to try to get the ball. He misses it, obviously, but I believe WAR would ding him a substantial amount for not making the play, because a ball hit at that speed and in that location would probably be converted into an out a significant percentage of the time.

    This is a really hard play to make. Perhaps impossible. So why ding Donaldson points for a play that pretty much no one would make?
    I wish we had access to the chart BIS uses to determine the ease or difficulty of various plays.

  21. Yo says:

    To me, this is the key: “That Baseball Reference very clearly states that the difference of 1-2 runs should not be considered definitive.” I enjoy looking at the WAR totals and comparing them to past years, but what do they really mean? We’re talking about totals under 7, so 1-2 is a pretty big “discrepancy.” It seems clear that WAR shouldn’t be used – by itself – to determine an MVP, Cy Young or anything else.

    And it gets worse when some then use a $ multiplier (WAR = $7M, or something) to determine whether a contract is a good one or a bad one. You’re multiplying one estimate by another.
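Multiplying one estimate by another compounds their uncertainty. A minimal sketch using first-order (delta-method) error propagation, assuming the two estimates are independent; the ±1.0 WAR and ±$1M-per-WAR figures are hypothetical, chosen only to illustrate, and the function name is made up.

```python
import math


def product_with_uncertainty(a, sd_a, b, sd_b):
    """Value and approximate SD of a * b for independent estimates a and b.

    First-order approximation: Var(ab) ~ b^2 Var(a) + a^2 Var(b).
    It ignores the small sd_a * sd_b cross term.
    """
    value = a * b
    sd = math.sqrt((b * sd_a) ** 2 + (a * sd_b) ** 2)
    return value, sd


if __name__ == "__main__":
    # Hypothetical: 5.0 +/- 1.0 WAR at $7M +/- $1M per win.
    value, sd = product_with_uncertainty(5.0, 1.0, 7.0, 1.0)
    print(f"${value:.1f}M +/- ${sd:.1f}M")
```

With these inputs the contract value comes out to about $35M ± $8.6M, a wider relative band (about 25%) than either input alone (20% and about 14%).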

    • Mark Daniel says:

      I agree. If we’re going to use WAR, it should come with a variance measure, such as 5.0 +/- 1.5 or something. But even that wouldn’t be good enough. I’m certain that the variance of the hitting stats and the variance of the baserunning and fielding stats are quite different (there’s a lot more variability in the baserunning and fielding stats).

  22. shagster says:


    Nice analysis of the ‘comps.’ If the writing thing doesn’t work out, you can fall back on a career as a commercial banker, fixed income, or hedge fund trader. Good bankers are former analysts who now assign/hire an analyst, review the analyst’s numbers, toss them out, and THEN interview and write the story. It’s a progression. Some analysts can’t make the leap and simply compare the differences in numbers. If writing doesn’t pan out, try your hand at trading/banking. ; )

  23. Ian says:

    What blows me away is that 1-2 wins in a season is not definitive. I always thought the difference was something like .5 wins. We (fans) sure use WAR as very definitive, and I never see anyone argue that a 4.2 player might be better than a 5.3 player. Heck, most of Pos’s argument against Jack Morris was his low WAR total. So, did I misunderstand Sean’s post? If not, if B-R thinks an avg player is worth 2 wins but a difference of 1-2 wins isn’t definitive, then what is the point? Is FanGraphs WAR more accurate, or does it suffer from the same range?

    • Mark Daniel says:

      I don’t think FanGraphs is more accurate. They use more or less the same formula, but the adjustments differ, I believe.
      From the FanGraphs site:

      “Positional player WAR values typically only differ dramatically when the various systems disagree about a player’s defense. The hitting and running stats are different, but they usually aren’t different enough to significantly alter the values you see.
      Pitchers, however, are valued very differently by the different systems. FIP is a linear weights based system that treats all balls in play as equally valuable and ignores sequencing. Baseball-Reference starts with runs allowed and works backwards. Baseball Prospectus uses a complex modeling system to attempt to derive the value of individual events while controlling for contextual factors. You have to decide which method is the one you prefer, although looking at each site is the best way to get a complete picture of the player.”
