
Porcello v. Verlander

One thing that ticks me off in sports is when an announcer says one thing, the replay clearly shows something else, and the announcer refuses to back off. You see it all the time. An announcer (often Phil Simms) will explain a quarterback sack away by saying that there was nobody open downfield or praise a defender for covering a receiver without committing pass interference.

Then they will go to the replay and it will show two receivers waving their arms madly to indicate how open they are or it will show a cornerback absolutely mugging a receiver.

And the announcer will continue to explain the sack as no one being open and continue to say that the cornerback played perfect coverage.

I feel a little bit like that on this Porcello-Verlander thing. I have said on multiple occasions that, while it’s close, I would have voted Justin Verlander over Rick Porcello for Cy Young. I still believe the first part; it’s absurdly close. There’s no WRONG answer as to which pitcher deserved the Cy Young Award. None of the following words change that.

But I said I would have given the slight edge to Verlander because it seemed to me — at first glance — that he had the slightly better season as judged by the advanced stats, particularly Baseball Reference WAR.

So, upon further review, I am obliged to say this: I think Baseball Reference WAR in this case doesn’t pass the smell test.

* * *

Let’s start with this fairly obvious statement: Not very long ago, there would have been no debate whatsoever about who should win the 2016 American League Cy Young Award.

Porcello went 22-4.

Verlander went 16-9.

Those won-loss records would have ended all arguments.

Bill James did an interesting study on this that I think he will unveil any time now: he looked into the Cy Young voting to see how it has changed in regard to the emphasis voters put on won-loss records. I won’t give any spoilers here except to say that the data shows that up until 1990 or so it’s pretty clear that won-loss record was EASILY the most important factor in Cy Young voting.

Then, things began to very slowly change. Why? I think it came down to three things:

1. Pitchers stopped going deep into games and they made fewer starts, which naturally brought down win totals. The last 25 years, there have been 81 pitchers who won 20 or more in a season, and no one won 25. In the 25 years before that, 182 pitchers won 20 or more and 14 of them won 25-plus. With fewer wins, voters had to look elsewhere.

2. Some of the greatest pitchers in baseball history pitched in the 1990s — Greg Maddux, Roger Clemens, Randy Johnson, Pedro Martinez, etc. — but their greatness was rarely reflected by wins and losses. Maddux won 20 just twice. Johnson and Martinez both won Cy Young Awards with 17-win seasons, Clemens with an 18-win season.

3. New statistics came along that were better indicators of a pitcher’s skill than his won-loss record. And certain annoying people like yours truly and Brian Kenny began ranting against the pitcher win.

Over the last 10 or so years, the voters have often rebelled hard against the won-loss record. When Felix Hernandez won the Cy Young in 2010 with a 13-12 record (over, among others, C.C. Sabathia, who went 21-7), the wall came a tumblin’ down.

This year’s Cy Young duel between Verlander and Porcello seemed just the latest battle between advanced statistics and pitcher won-loss record. That’s certainly how many people told the story … and I probably fell for that a little bit too.

There was just one problem with that story: Most of the advanced stats you looked at did not favor Verlander. It was tempting to make the 2016 Cy battle a lot like the 1999 battle when Mike Hampton went 22-4 and Randy Johnson went 17-9. But that one was very, very different. Unit dominated EVERY STATISTIC except won-loss record. His ERA was about a half run better. He had 200 more strikeouts while walking fewer batters. He pitched 32 more innings with a much lower WHIP. By Baseball Reference WAR he was two and a half wins better.

Verlander’s advantages are, well, considerably more subtle if they even exist. His ERA advantage (3.04 to 3.15) is negligible at best. Truth is, in context, when you consider ballpark, Porcello has the clear edge. Porcello’s ERA+ of 145 is better than Verlander’s 136.

Verlander did have 64 more strikeouts (and he led the league in Ks) but Porcello countered by having 25 fewer walks. It was Porcello who had the better strikeout-to-walk ratio. Verlander pitched just four more innings than Porcello, and Porcello completed one more game. Verlander had the slightest of edges in WHIP (.008 is hardly an edge) but Porcello gave up fewer home runs.

Verlander certainly played in front of a worse defense, but Porcello actually had the lower FIP, which only considers strikeouts, walks and home runs allowed.

Fangraphs WAR, which builds around FIP, had them exactly even at 5.2 wins above replacement.
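For anyone who wants to see exactly what FIP counts, here is a minimal Python sketch of the commonly published FanGraphs-style formula: 13 times home runs, plus 3 times walks and hit batters, minus 2 times strikeouts, all divided by innings, plus a league constant. Two loud caveats: the stat line below is made up purely for illustration (neither pitcher's component numbers are listed here), and the constant changes every season, so the 3.10 used here is a placeholder, not the real 2016 value.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    """Fielding Independent Pitching.

    Only home runs, walks, hit batters and strikeouts enter the formula;
    balls in play (and therefore the fielders) are ignored entirely.
    `constant` rescales the result onto the league-ERA scale and varies by season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical stat line and placeholder constant, NOT real 2016 numbers.
example = fip(hr=20, bb=40, hbp=0, k=180, ip=200, constant=3.10)
print(f"FIP: {example:.2f}")  # FIP: 3.20
```

Because nothing about the defense appears among the inputs, the function returns the same number no matter who is standing in the outfield, which is the whole point of FIP and also its main limitation.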

So, you will ask, why was there a perception that Verlander had the better advanced metrics season?

Answer: Baseball Reference.

Baseball Reference WAR

Verlander 6.6

Porcello 5.0

Now, let me pause here to say: Baseball Reference is a miracle. It is the joy of my life and the joy of most baseball writers’ lives. If forced to give up Baseball Reference or a family member, well, it would depend on which family member. But I am convinced that the main reason Justin Verlander got 14 first-place Cy Young votes to Porcello’s 8 is that fairly sizable gap in Baseball Reference WAR. There might be other factors, but I would wager that this is by far the biggest one.

I say that because Baseball Reference WAR is absolutely the biggest reason I thought that Verlander had the better statistical season.

Hey, I check Baseball Reference WAR every single day of the season. Well, I’m on the site every single day (I imagine many baseball writers are), and WAR is in a box on the front page, updated constantly. That Verlander lead in Baseball Reference WAR absolutely played in my mind all season long. Everything else about the two pitchers was so close that, for me, it came down to Porcello’s won-loss record or Verlander’s 1.6-win edge on Baseball Reference.

Of course I chose Baseball Reference. I don’t judge pitchers by wins and losses.

But here’s the thing: I had NO IDEA WHY Verlander had such an edge in Baseball Reference WAR. And at some point it occurred to me: I should know why. So I stretched my mathematical understanding to its breaking point and looked more closely at it. And, um, I say this with love: I think the Baseball Reference WAR formula got it very wrong.

* * *

Let me give this final caveat out of respect to Sean Forman and all the good folks at Baseball Reference: I might have messed up on my math here. It’s no secret that I am mathematically challenged. I did run my numbers by a couple of much smarter people, and they seemed to agree with what I’m saying. But if the basic takeaway from the numbers is wrong, I will certainly correct the error.

OK, let’s break down Baseball Reference WAR for Porcello and Verlander.

First: Baseball Reference calculates its WAR based on runs allowed and innings pitched. This is in contrast with Fangraphs, which, as mentioned, builds its formula around strikeouts, walks and home runs allowed. Baseball Reference takes how many runs a pitcher has allowed (unearned AND earned runs) and then, after making a few adjustments, compares those runs to league average. The adjustments can be a bit complicated, but the idea of comparing runs allowed to league average is simple.

OK, we start with runs allowed — and this includes unearned runs.

Porcello gave up 85 total runs in 223 innings. That’s 3.43 runs per nine innings.

Verlander gave up 81 runs in 227 2/3 innings. That’s 3.20 runs per nine innings.
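Those runs-per-nine figures are simple arithmetic: total runs (earned and unearned) times nine, divided by innings pitched. The only wrinkle is baseball's innings notation, where 227 2/3 means 227 full innings plus two outs. Here is a small Python sketch using the numbers above; the helper names are illustrative and are not anything from Baseball Reference.

```python
def innings(whole: int, thirds: int = 0) -> float:
    """Convert baseball notation (e.g. 227 2/3 innings) into decimal innings."""
    return whole + thirds / 3


def runs_per_nine(runs: int, ip: float) -> float:
    """Runs allowed per nine innings (RA9), counting earned AND unearned runs."""
    return runs * 9 / ip


porcello_ra9 = runs_per_nine(85, innings(223))
verlander_ra9 = runs_per_nine(81, innings(227, 2))
print(f"Porcello  {porcello_ra9:.2f}")   # Porcello  3.43
print(f"Verlander {verlander_ra9:.2f}")  # Verlander 3.20
```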

Porcello gave up three more unearned runs than Verlander, which is why there’s a bigger gap here than between their ERAs. Next, we compare those runs allowed to the league average and here is what we get.

Porcello is 26 runs better than average.

Verlander is 33 runs better than average.

OK, perfect. Verlander is a little bit better. Next, there’s a small adjustment made based on whether the pitcher is a starter or reliever. You can read all the reasoning for this adjustment and all the others over at Baseball Reference. Porcello and Verlander are obviously both starters, so you add 4.5 runs to their total.

The scoreboard:

Porcello is 30.5 runs better than average.

Verlander is 37.5 runs better than average.

Easy enough. Next comes ballpark adjustment. Fenway Park was tough on pitchers, so Porcello gains 5.7 runs. Comerica Park, meanwhile, leaned slightly toward the pitcher and so Verlander has 1.2 runs knocked off his total. You can agree or disagree with these adjustments; Bill James, for one, believes the adjustments are too small.

The scoreboard:

Porcello is 36.2 runs better than average.

Verlander is 36.3 runs better than average.
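The running tally so far fits in a few lines of Python: start from the runs-above-average figures and apply the starter and park adjustments in order. The numbers are the ones quoted above; the function is just illustrative bookkeeping, not Baseball Reference's actual code.

```python
def running_totals(start: float, steps: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Apply each (label, runs) adjustment in order and record the total after it."""
    totals = []
    current = start
    for label, runs in steps:
        current += runs
        totals.append((label, round(current, 1)))
    return totals


porcello_steps = running_totals(26.0, [("starter", 4.5), ("park", 5.7)])
verlander_steps = running_totals(33.0, [("starter", 4.5), ("park", -1.2)])

for (label, p), (_, v) in zip(porcello_steps, verlander_steps):
    print(f"after {label:<8} Porcello {p:5.1f}   Verlander {v:5.1f}")

gap = verlander_steps[-1][1] - porcello_steps[-1][1]
print(f"gap before defense: {gap:.1f} runs")
```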

Now let’s stop right here and marvel at how close the two pitchers are. This FEELS right to me. They had almost identical seasons when you consider all factors, and here you have the two pitchers within a tenth of a run of each other. If the formula stopped here, they would basically have the exact same Baseball Reference WAR. And if that were the case, I think Porcello would have won the Cy Young Award more convincingly.

But it doesn’t stop here. You are probably wondering what adjustment could come along that would separate the two pitchers by almost two full wins.

Answer: Defense.

Yep. Defense. Even though Porcello gave up more unearned runs than Verlander, and even though Porcello’s batting average on balls in play was considerably higher (.269 to .256) and even though the Red Sox committed five more errors behind Porcello and threw out significantly fewer base stealers, the Baseball Info Solutions stats say that Boston was a much, much, much better defensive team than Detroit.

I should say here that overall I do believe wholeheartedly that Boston WAS a much, much, much better defensive team than Detroit. I just don’t know how that specifically affected these two pitchers.
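Since batting average on balls in play carries a lot of weight in this argument, it helps to spell it out. The usual formula is hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies, so it only looks at balls the fielders had a chance to handle. Here is a minimal Python version; the stat line is invented purely to show the arithmetic and does not belong to either pitcher.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play.

    Home runs leave the park and strikeouts never reach a fielder, so both
    are removed; sacrifice flies are balls in play that ended in an out.
    """
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Invented numbers for illustration only.
print(f"{babip(h=140, hr=20, ab=600, k=100, sf=0):.3f}".lstrip("0"))  # .250
```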

The Baseball Reference WAR formula concludes it affected them a lot. I mean, seriously, A LOT. By my admittedly shaky calculations, Baseball Reference takes away NINE RUNS ABOVE AVERAGE from Porcello’s total and ADDS FOUR RUNS ABOVE AVERAGE to Verlander’s total.

And so, in the end, this is the final scoreboard:

Porcello is 27 runs above average.

Verlander is 40 runs above average.

Wow: That’s some gap now. And it is that 13-run difference that gives Verlander his 6.6 to 5.0 WAR edge. All 13 runs come from the defensive adjustment.
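To make the defensive swing concrete, here is the last step as a back-of-the-envelope Python sketch: start from the 36.2 and 36.3 totals after the park adjustment and apply the defensive adjustments estimated above (minus nine for Porcello, plus four for Verlander). Those defensive figures are the article's own rough reconstruction, not published Baseball Reference numbers, and the runs-per-win line is only the ratio implied by these figures; Baseball Reference's real runs-to-wins conversion depends on the run environment and is not reproduced here.

```python
# Totals after the starter and park adjustments, from the scoreboard above.
BEFORE_DEFENSE = {"Porcello": 36.2, "Verlander": 36.3}
# Defensive adjustments as estimated in the article (approximate, not official).
DEFENSE = {"Porcello": -9.0, "Verlander": 4.0}
# Final Baseball Reference WAR.
BREF_WAR = {"Porcello": 5.0, "Verlander": 6.6}

after = {name: round(BEFORE_DEFENSE[name] + DEFENSE[name], 1) for name in BEFORE_DEFENSE}
gap_before = BEFORE_DEFENSE["Verlander"] - BEFORE_DEFENSE["Porcello"]
gap_after = after["Verlander"] - after["Porcello"]
war_gap = BREF_WAR["Verlander"] - BREF_WAR["Porcello"]

print(f"after defense: {after}")
print(f"gap before defense: {gap_before:.1f} runs")
print(f"gap after defense:  {gap_after:.1f} runs")
print(f"share of the gap from defense: {(gap_after - gap_before) / gap_after:.0%}")
print(f"implied runs per win (rough): {gap_after / war_gap:.1f}")
```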

Now, like I say, maybe I’m doing the math all wrong. Maybe defense doesn’t represent all 13 runs. But it unquestionably is the bulk of that 13-run difference. And, well, I’m just not buying it at all. Yes, I’m all for trying to isolate a pitcher’s contribution away from the defense’s. And I’m a big fan of Baseball Info Solutions. But this sort of massive defensive adjustment makes no sense to me.

For one thing, I think it’s quite likely that Detroit played EXCELLENT defense behind Verlander, even if they were shaky behind everyone else. I’m not sure how you can expect a defense to allow less than a .256 batting average on balls in play (the second-lowest of Verlander’s career and second-lowest in the American League in 2016) or allow just three runners to reach on error all year (the lowest total of Verlander’s career).

For another, the biggest difference between the two defenses was in right and center field. The Red Sox centerfielder and rightfielder saved 44 runs, because Jackie Bradley and Mookie Betts are awesome. The Tigers’ centerfielders and rightfielders cost 49 runs, because Cameron Maybin, J.D. Martinez and a cast of thousands are not awesome.

But the Tigers outfield certainly didn’t cost Verlander. He allowed 216 fly balls in play, and only 16 were hits. Heck, the .568 average he allowed on line drives was the lowest in the American League. I find it almost impossible to believe that the Boston outfield would have done better than that.

What I’m saying here is that while the defensive adjustments seem shaky and unpersuasive, the stark final WAR number (6.6 to 5.0) is there in your face. I don’t know how many people voted for Verlander because of the Baseball Reference WAR numbers, but I suspect at least a handful did.

And I wonder how many of them realized they were voting for a defensive adjustment. I love the concept of WAR, and I appreciate the efforts to make it better all the time. And I know the Baseball Reference people do not claim that it is the perfect statistic or that anyone should base their entire award ballot on it. But WAR does have real sway in the baseball community. And in this case, I think it was pretty misleading.


42 Responses to Porcello v. Verlander

  1. nightfly says:

    That is very good stuff, Pos. It seems ironic that in a world of highly-advanced metrics, some very basic things (unearned runs, errors) should show a more accurate picture.

    I wonder, is it possible to do the defensive adjustment based only on how the defense performs behind a particular pitcher, or would that make it too complex a formula? It seems like one should account for defense somehow, and consider how much help a pitcher got from his fielders. But simply assuming that the same defense played evenly and equally well behind everyone all the time is a surprisingly crude mistake considering the kind of fine detail baseball statistics have offered lately. It’s like assuming that every pitcher on a team has equal run support.

    • Cb says:

      Build your formula around FIP. It, as the name implies, eliminates the fielders from the formula and gives you a better idea of “raw pitching.”

      • Mark Daniel says:

        That’s what Fangraphs did. Their pitcher WAR is based on FIP; as a result, both Verlander and Porcello had the exact same WAR of 5.2, which is tied for 1st in the AL (along with Chris Sale, and Kluber at 5.1).

        • Chris says:

          The primary problem with the FG model is that it’s two steps removed from actual on-field results. FIP is great, but if you’re looking at a metric for evaluating what happened in the past, I’d argue that you should err in favor of a more comprehensive view of what happened on the field, as opposed to isolating components and independently weighting them.

    • Rob Smith says:

      Yes, that would be ideal. But getting the reporting down to that level of detail may not be very easy. I’m guessing they don’t have a way to get that info at that level, keeping in mind that some pitchers throw to one batter, some are flyball pitchers for which outfielders are more important & some are groundball pitchers, some are good at holding runners negating stolen bases, others don’t hold runners well at all (negating the excellent throwing ability of their catcher) & other oddities.

      So, I can see where BBR took the approach that most statistical quirks and issues sort themselves out over the course of a long season. So an overall defensive average should generally work fine. But, as this analysis shows, averages don’t always reflect reality for any one pitcher in any one season. But I don’t know that they can really pull anything better at this point.

  2. Patrick says:

    Bravo to you for digging into the numbers, and for writing about it!! I hope Sean Forman responds. He is always open and thoughtful when it comes to discussing WAR and BR.

  3. Edward says:

    If Porcello and Verlander were equal (and having watched Ricky and Justin when they were both members of the Tigers, I would take JV every single start, no matter what kind of season Ricky would have), this still doesn’t explain why two writers would inexplicably leave Verlander off their ballots altogether. If you believe the two of them are equal-ish, then you list them close together on the ballot.

  4. Edward says:

    I think in time that Porcello’s Cy Young is going to be measured up against Steve Stone’s 1980 Cy Young — a solid, competent innings-eater who for one year put everything together.

  5. Travis says:

    Maddux won 20 in 1992 and in 1993.

  6. Big Daddy Bobo says:

    Joe, I think you have revealed in a way that many folks can understand, three of the inherent problems with statistical values such as WAR. First is the tendency of people to evaluate performance based on a single value – whether it is an old-fashioned and imperfect value such as pitcher wins, or whether it is a new-fangled amalgam such as WAR. We’ve all been told that WAR is the best representation of a player’s value, and so we believe it is the end-all-be-all. Just like pitcher wins, WAR is a shortcut attempt to define overall value. It may be an improvement over pitcher wins, but it is still a shortcut, and much can be lost with the shortcut. I’m not bashing on WAR, I just think it is important for folks to understand that over-reliance on any kind of shortcut can lead to improper analysis.

    Second, your discussion of the impact the defensive adjustment input has on the WAR calculation shows that the variables and adjustments which are the inputs to a calculated value must be vigilantly assessed and reassessed over time. I’ve seen way too many computer models in science and engineering get out of whack because of inputs, adjustments, and assumptions. Statistical analysis is important, but we can never rest in thinking we’ve got it right all the time.

    Third, your analysis demonstrates that the WAR calculation has a subjective element to it. Too many people forget the simple fact that calculated values often have subjective adjustments built into them, and such adjustments, as you have shown, can lead to inaccurate results. WAR is not based on an immutable, measured scientific truth such as the acceleration due to gravity being 9.8 m/s². Again, our understanding of the difference between measured values and calculated values should drive us towards refining our inputs into the calculated values and never being satisfied.

    I do think it is interesting you characterize the impact of the defensive adjustment as an “error”. To me, an error is a mechanical failure – somebody used the correct formula but made an error by inputting the wrong number or executing the calculation incorrectly. I think it would be better to characterize the role of the defensive adjustment in the WAR calculation as an inaccurate assumption or as an inaccurate input.

    • Rob Smith says:

      I agree. This is true for any current statistic. It drives me crazy when people hang their hat heavily on FIP & debate whether a pitcher can really control the type of contact a hitter can have. But, FIP is effective in showing certain things about a pitcher. How much do they dominate hitters, for example. So, this is a great example of why we shouldn’t cling too tightly to ANY specific measure. They all need to be considered when evaluating a pitcher. My guess is that WAR is pretty good when looking at a career (except for the obvious concern about the impact of an extra long career in building up WAR). But, even over one year, it’s not always going to fully capture reality. It’s just an indicator.

  7. Chill says:

    Thanks, Joe. Maybe my favorite thing you’ve written about baseball this season.

  8. PhilM says:

    See, all the more reason to use ERA+ and negative binomial (Pascal) distribution to determine “neutral” win-loss records! (It’s like Pythagorean, but without the annoying run-context sensitivity.) Verlander’s “neutral” record for 2016 is 17-8, while Porcello’s is 18-8. This satisfies the “I want to see win-loss records” as well as the “but let’s use better statistics to determine them” crowds. And of course Kluber’s is 19-8, so he was actually the best. 😉

  9. the_slasher14 says:

    Well said and informative. My main reason for liking the Porcello choice was it gave us Verlander’s lady friend’s priceless response.

    • Rob Smith says:

      Is it creepy that, while I generally despise when significant others go to Twitter to defend their partners, I now give Kate Upton complete carte blanche to do so whenever she likes? In fact, I encourage her to do so regularly. Maybe she can date Aaron Rodgers next. The possibilities boggle the mind.

  10. Brent says:

    I see FIP is mentioned above. Let me point out that for any pitcher who has half his games at Fenway, FIP is less reliable than for pitchers with any other home park. There are balls regularly hit at Fenway that a) aren’t home runs (and obviously aren’t walks or strikeouts) and b) aren’t catchable. I don’t mean sometimes uncatchable, like line drives that happen to hit a gap; I mean never catchable (by anyone not using a trampoline or some other device to assist in jumping). Yet, they are still in the formula as plays that can possibly be made. There are other stadiums with tall walls, but none as tall as the Monster. And I think it can be proven that both FIP and BABIP are affected, at least slightly, by the balls off the wall.

    • Rob Smith says:

      I can see your point about BABIP since the wall catches some fly ball outs & turns them into doubles off the wall. But I’m not sure how FIP is impacted since only HRs play into FIP on balls in play. It was my understanding that the wall also stole some line drive HRs, turning them into doubles and sometimes singles. I believe the HR average at Fenway is actually below league average. So, FIP would actually be slightly lower at Fenway.

      • Rob Smith says:

        I was having trouble understanding your point…. maybe you were making the same point I was making?

      • Brent says:

        Yes, I am not sure I disagree with you. My point is that FIP will be affected in ways not seen in other ballparks. In other words, FIP in Fenway is not consistent with FIP in Kauffman Stadium, because all balls hit within the confines of Kauffman are “in play” and can be affected by fielding and positioning of fielders. But not every ball hit within the confines of Fenway can be characterized that way. Any ball that hits more than 10 feet up on the Monster should really be treated like HRs if you want Fenway FIP to be the same as Kauffman FIP. How about saying it this way? Balls 10 feet up or higher on the Monster are as independent of fielding as HRs are and should be treated as such when calculating FIP.

      • Chuck H says:

        Flip side of the wall catching fly ball outs and turning them into doubles is the wall stopping line drive home runs and turning them into singles.

  11. Bill Smith says:

    So…what did Kate say after she read your post?

  12. invitro says:

    After reading the article, and before reading the comments, I just want to say… this is a great, great article. This is exactly what sportswriters should do, what seemingly all of them are too lazy to do, one small reason why I love reading this site even when I get tired of the “there’s something in my eye” stories and comments. I actually hadn’t looked closely at these guys’ WAR’s… I know that pitching WAR is pretty hotly debated (in contrast to hitting WAR).

    I’m glad to hear Joe admit he bases his award voting mostly off the WAR numbers. Maybe he’s said so before, maybe not so bluntly. I think it’s a good thing to do. It does feel a bit awkward, though, when Joe (or someone else) goes through all the stats when talking about an award voting decision and I’m thinking… c’mon now. It’s the WAR. It’s all about the WAR.

    All of the big controversies of the last few years with WAR have been about defense, in particular how much more defense counts with b-r’s WAR than the unaware might think. This is generally in regard to a hitter’s defense. We just went through a big one with Heyward. It’s a change to see it with a pitcher.

    I don’t have any idea yet whether b-r’s defensive adjustment is right in this case. I’m not sure how to know that without doing everything Joe did, and then go much further by breaking down the defense of the outfielders. I do have a strong sense that outfielders’ defense is much, much more important than people think, or thought five years ago, or is measured so by the b-r formulas. Off the cuff example: I think the Royals had their two-year run mainly due to their outfield defense, and getting it relatively cheaply. Anyway, I’m eager to read the comments now :).

  13. invitro says:

    One thing that “ticks me off” about Joe’s article is the use of “smell test”. Joe says that if WAR is off by 1.6 from what he expects it to be, then it doesn’t pass the smell test. This is ridiculous. The “smell test” is whether WAR is off by a *large* amount. If someone computed Verlander’s WAR to be 1.0, or 10.5, then it wouldn’t pass the “smell test”. To say that a difference of 1.6 doesn’t pass this test is to misunderstand the meaning of the term.

    Now, I know that “smell test” is a hot buzzword, and so many writers try to use it as often as possible. But it really is not appropriate here.

    • Steve Sherman says:

      But 1.6 is a very significant difference in WAR. It definitely qualifies as a ‘large amount’. Look at Porcello’s bWAR as a percentage of Verlander’s and this should be clear.

      • invitro says:

        Here’s a quote from Sean Forman, from the latest article: “We state clearly that we don’t find differences of 1-2 wins to be definitive.” Were you aware of that? Does it change your opinion?

  14. MikeD says:

    Interesting article, as always, but please don’t give Brian Kenny credit for anything. It’s unfortunate he’s become the face of sabermetrics for many baseball fans. He simply regurgitates many ideas that have been around since the early ’80s when Bill James began to popularize them, and by many other writers and analysts since. He talks about a “revolution” that’s been ongoing long before he ever showed up. That’s fine, but he’s as guilty as the old guard he picks on. He’s regimented in his thinking. He actually discussed the Porcello/Verlander vote today on his program. It would have been nice if he even acknowledged Joe’s article in an attempt to present a clearer picture.

  15. SteveC says:

    Good breakdown. I’m a Detroit and Verlander fan, but both guys were worthy. No crime was committed. One thing that hasn’t been mentioned is, more than defensive support, the possible role of offensive support. Porcello got 6.61 runs per start, the highest in the league. Verlander got 3.97, 30th in the league and the lowest among the contenders. He had three 1-0 games, two were losses and one was a no decision. He had another 2-1 loss. He had eight other no decisions, in six of them he pitched at least 7 innings; in all of them he gave up no more than 2 runs.

    If he gets any support, and the 1-0 games and the 2-1 game go his way, his record is now 19-9. Three of those no decisions ended up being Tiger wins. In all three, he pitched 7 innings and gave up 2. If those become wins for him, he’s 22-9. I wonder if that makes his case more convincing. I didn’t crunch Porcello’s numbers as closely, but he did not have as many no decisions or close losses.

    I know that ifs make for great and endless discussions, but I wanted to add that to the pile.

    • SteveC says:

      Sorry everyone. My math was wrong. If the 1-0 and 2-1 games went Verlander’s way, his record would have been 20-5. If he then got the three no decision wins, it would have been 23-5. I think that’s right. Feel free to correct me.

    • MikeN says:

      Any of those numbers would have given Justin the win. However, you have to make the same adjustment for Rick and see what is his new record.

      • SteveC says:

        Without a doubt. Porcello had close losses – 1-0, 3-2 and 3-1. If he gets those, he’s 25-1 and it doesn’t matter about any other stats, he’d win.

        But you wonder about run support. I just picked two numbers, 8 and 6, and wondered how many times their teams scored at least that much in games they started. Granted, it’s rough. I’m not looking at when the runs were scored but it gives a sense of security. Porcello got at least 8 runs in 13 starts. Verlander got that in 5. Porcello got at least 6 in 18 starts. Verlander got it in 7.

        Fun with numbers.

    • Simon says:

      SteveC, this is what I thought of too – the differences in run support are huge, and this is one of the most important reasons to reduce the importance of pitcher wins. I also think both pitchers are worthy enough for the award.

      Then I DID run some numbers. This is going to be a bit of a blog post on its own – apologies in advance.

      Porcello has 33 starts and Verlander has 34, so to compare them I removed Verlander’s first start of the year, a 6-IP, 3ER, no-decision 8-7 win for the Tigers. I took the 33 games of Porcello-Boston run support, and the 33 games of Verlander-Detroit run support (in total over 9IP). I took the Runs allowed (R, not Earned Runs ER) for each of the pitchers over their 33 starts. I did not include bullpens or the timing of runs scored.

      I randomized the order of Boston run support and Detroit run support, and compared these 33 numbers, in sequence, to the runs allowed by Porcello and Verlander. If the pitcher gives up fewer runs than the team scores, he gets a win. If it’s more, it’s a loss. If it’s the same, it’s a no-decision. Again, no bullpens or timing of runs, which is going to over-estimate wins and under-estimate no-decisions. I repeated the randomization & comparison 100,000 times.

      Using Boston’s runs scored, their Median win-loss records are:
      Porcello 27-4
      Verlander 27-4

      Using Detroit’s runs scored, their Median W-L are:
      Porcello 20-9
      Verlander 22-8

      Boston scores so many runs that the comparison is almost identical – they did have similar seasons – and not as interesting. Detroit scored so few for Verlander that it DOES get a little more interesting. Why is there a difference?

      The two pitchers have a different distribution of runs allowed over their 33 or 34 starts (back to Real Life Stats). Porcello allowed 5ER or more only once – 5 runs to Baltimore in a 12-7 no-decision Boston loss. (I’m using ER Earned Runs this time, since this is less about team wins and losses and run support). Verlander allowed 5ER or more three times – games of 7, 7, and 8 ER in which he took the loss every time.

      The pitchers allowed 0 – 1 – 2 – 3 – 4 ERs this many times:
      Porcello 3 – 6 – 7 – 11 – 5
      Verlander 4 – 9 – 10 – 6 – 2

      Verlander had 13 starts with 0 or 1 ER and 23 starts with 2 or less ER.
      Porcello had 9 starts with 0 or 1 ER and 16 starts with 2 or less ER.

      Porcello never blew up – he pitched 5 or more innings in every start and failed to complete six innings only three times. That’s amazing. His team had a good chance to win every start. In more than half of his starts (18/33), he gave up 2 or 3 ER. Verlander got shelled three times, giving his team almost no chance. Otherwise, he was generally outstanding. In more than half his starts (19/34), he gave up 1 or 2 ER.

      Using the same method as described above with 100,000 simulations, I compared runs allowed (not ER) between Porcello and Verlander directly. This is probably more unfair than the other comparison because of park factors and differences between Fenway Park and Comerica Park, but Verlander comes out well ahead on the strength of his larger number of great starts and his few terrible starts.
      Porcello 10-17
      Verlander 17-10

      Porcello was certain to give you a good start. Verlander was more likely to give you a great start. I think your choice boils down to how much you value great over good – or vice versa. For Boston, scoring 6.6 runs per game for Porcello, his always good was usually good enough. For Detroit, scoring less than 4 runs per game for Verlander, great was needed more often, and he delivered amazingly well.

      I choose Verlander. I’d take his distribution of 3 guaranteed losses* and 23 starts of 2ER or less over Porcello’s 0 guaranteed losses* and 16 starts of 2ER or less. That’s 4 more games in which I think I have a great chance to win. I choose likely greatness over always good-ness.

      *guaranteed losses are not an Official Statistic, but I hope you know what I mean.

      • SteveC says:

        Hey Simon. Good number crunching and good post. Thanks for taking the time. I think every stat had them close and that’s the way the vote came out. Again, both worthy of the award. To play off what you said well, Verlander’s best was probably better, but Porcello had the better all-season consistency.

  16. KHAZAD says:

    It is fun to talk about going away from wins as an indicator, and we have seen some examples in recent years. But Porcello eked out a victory because there was no truly dominant (meaning clearly better than the other contenders) pitcher and he was 22-4.
    Does anyone think that if their win-loss records were reversed, Porcello would finish higher than 4th, or that a 22-4 Verlander would not win easily?

  17. jg says:

    “For one thing, I think it’s quite likely that Detroit played EXCELLENT defense behind Verlander, even if they were shaky behind everyone else.”

    Love Joe but this is absurd.

  18. Pete R says:

    One of my favorite statistics is WPA, even though it completely ignores defense. Coincidentally, I think, Verlander again beats Porcello by about 1.5 (3.792 to 2.374).

    I looked at their baseball-reference 2016 Game Log pages- yes, it is indeed a wonderful site- and I clicked on “WPA” to rank their games from best to worst.

    And I found something very strange. In Verlander’s six BEST games, his Tigers went 1-5. Now that seems to be virtually impossible. The reason they did so badly: Verlander was charged with three runs, total, over six games: while his bullpen was charged with 15 runs. That’s with him pitching seven or eight innings in each of the six games.

    Can we give him those five games, which would turn 16-9 into 21-8? Well, if we did, we should turn Porcello’s 22-4 into 23-3 by the same token. But then it’s a lot closer.

    And by the way, let’s look at Porcello’s worst four games (four or five runs in about six innings): he went 0-1 in those four games. Run support doesn’t hurt.

  19. bosoxfor life says:

    After reading this article as well as the Bill James article that Joe mentions, it appears that statisticians are confusing number crunching with the scientific method. Automatically assuming that the Tiger defense cost Verlander runs and, conversely, that the Red Sox defense saved Porcello runs is nothing but sheer guesswork. Now, that guesswork is an educated guess, but that is all it is. Without seeing and judging every play made behind both pitchers there is no truth, and for BR to assume such does nothing to advance some of the more debatable sabermetric ideas. This comes from somebody who believes in statistics as a predictor of probability but understands there are limitations.

  20. […] and publicly, Bill James, Tom Tango, and Joe Posnanski have been arguing about Baseball Prospectus’ version of Wins Above Replacement. Specifically, […]

