Stats I Like
Posted: March 10th, 2008 | Filed under: Baseball | 57 Comments
The day is here! The Soul of Baseball paperback is now officially out, available on-line, available in bookstores, available at my house (send your checks!). Buy the award-winning book that President George Bush called “Thank you for the copy.” The book that inspired my daughter Elizabeth to say, “What’s in the box? Oh, no, not MORE of these books!” We’re waiting for the Barack Obama endorsement (Yes, he can). But you’ll want to beat the rush and buy immediately.
* * *
I certainly do not claim to be on the cutting edge of baseball analysis. I think of this a bit like my Bruce Springsteen fanhood — on one level, I’m probably a very big Bruce Springsteen fan. I have all of his mainstream albums, I’ve seen him in concert a few times (well, four times, going on five), I was for a short time a subscriber to the fan-mag “Backstreets,” and I own two Springsteen books (though neither of them is exactly ABOUT Springsteen — I’ve never read Born to Run or Glory Days). I would imagine this puts me in the top, oh, 5% of Springsteen fans.
But I would also say that my resume would not get me into a Springsteen Ivy League school. I have at least 10 friends who are bigger Springsteen fans than I am, friends who more or less TAKE PITY on my puny but gallant efforts to appreciate the Boss.
That’s more or less how I feel about baseball analysis — I love the game, and probably spend more time thinking about baseball (my job notwithstanding) than 95% of baseball fans. But I’m pretty lousy at analyzing it, and I obviously have many many many many friends and acquaintances who use statistics, knowledge, sources and just plain good sense to offer ten-thousand-times better insight into baseball than I do. Unfortunately for them, I happen to care more about Gilligan’s Island and the Buggles, which is why you are here when you should be reading their stuff instead.
I say all that as an explanation: A handful of people have asked me to write about some baseball stats I like, why I like them, what they do, and how I break down baseball. It’s an incredibly self-serving thing to do, which is why I have avoided doing it up to now, and I would say that to most of you it probably will read like a Dick and Jane book. I would recommend that 99.9993% of you skip it — basically everyone except those who were in my wedding party.
For those of you still with me, here are just a few of the ones I like and how they work. This isn’t meant as a comprehensive list … it really is just about a few stats I like:
* * *
Hitters
The Stat Trio: batting average/on-base percentage/slugging percentage.
This is my favorite way to get a quick glance at a player’s abilities. You have batting average, a stat I don’t like on its own but it is quite interesting when you can see it smacked up against the other numbers. You have on-base percentage, which tells you how often the guy gets on base. You have slugging percentage, which tells you what kind of extra-base power the batter’s giving you. Those three numbers should give you a pretty good feel for the kind of player.
When considering a player, I like to start with the baseline: .300/.400/.500. That’s a very, very good player — almost an ideal player, really. If you could put up those numbers over the course of a career, you probably will go to the Hall of Fame. Only 20 players in baseball history (with more than 5,000 at-bats) have surpassed all those benchmarks — all the eligibles are in.
For a season, a .300/.400/.500 is a really, really nice year. Think Al Rosen in 1954 or Eddie Murray in 1984 or Chipper Jones in 2003. You know, a very good, very steady, gets on-base, hits-with-power kind of player.
So you start with .300/.400/.500 and then you go up and down from there.
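For anyone who wants to build the Stat Trio from raw counting stats, here is a minimal Python sketch using the standard definitions (AVG = H/AB, OBP = (H+BB+HBP)/(AB+BB+HBP+SF), SLG = total bases/AB) and checking the result against the .300/.400/.500 baseline. The `BattingLine` class, the `meets_baseline` helper and the season numbers in the example are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    # Season counting stats (the usage example below is hypothetical).
    ab: int       # at-bats
    h: int        # hits
    doubles: int
    triples: int
    hr: int
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    def avg(self) -> float:
        return self.h / self.ab

    def obp(self) -> float:
        # Times on base over the plate appearances that count toward OBP.
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        # Total bases per at-bat: single 1, double 2, triple 3, homer 4.
        singles = self.h - self.doubles - self.triples - self.hr
        total_bases = singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab

    def trio(self) -> str:
        # Format in the usual .300/.400/.500 style (leading zero dropped).
        return "/".join(f"{x:.3f}".lstrip("0") for x in (self.avg(), self.obp(), self.slg()))


BASELINE = (0.300, 0.400, 0.500)


def meets_baseline(line: BattingLine) -> tuple:
    # Which of the three benchmarks (AVG, OBP, SLG) the line reaches.
    stats = (line.avg(), line.obp(), line.slg())
    return tuple(stat >= mark for stat, mark in zip(stats, BASELINE))


hypothetical = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=25, bb=80, hbp=5, sf=5)
print(hypothetical.trio())           # .300/.398/.530
print(meets_baseline(hypothetical))  # (True, False, True)
```

With those made-up numbers the player clears the batting average and slugging marks and misses the on-base mark by two points, which is exactly the kind of "up and down from there" read the baseline is for.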
Let’s say you see a player who has had a long career and his Stat Trio is: .272/.370/.470
Well, just from that, I would think, hey, there’s a very good player, maybe a Hall of Famer, maybe not, depending on other factors (like what position he played, what era he played in, his choice of performance-enhancing drug, etc.). His average is a bit low for the Hall, but he clearly walked a lot (see the gap between batting average and on-base percentage), and he hit with power, though perhaps not ferocious power. Good bet that this was an outstanding and very steady player.
And he was: That’s Dwight Evans.
Let’s say you see another longstanding player with a Stat Trio of: .281/.330/.331.
From that, you can guess that he was a middle infielder — that .331 slugging is a dead giveaway. Any player with a long career and a slugging percentage that low, almost without exception, is a well-regarded defensive middle infielder. I would guess that because of the .281 lifetime average — about as high as you can go with a slugging percentage that low — this guy had a lot of speed, and was probably a leadoff hitter. Of course, I’m cheating because I know who this is, but I think that’s how I would look at it.
Anyway, that’s Maury Wills.
Seeing all those numbers in a row like this gives you a good sense of how much that batting average means. Don’t get me wrong, it’s good to hit .300. But it’s not necessarily great.
Say you have a player who hits .264/.386/.554. With that on-base percentage and slugging, maybe you shouldn’t worry about his low average too much. He’s walking and hitting with a lot of power. That’s Adam Dunn.*
You have a player who hits .300/.322/.387. Now, that’s an empty .300. And because he DOES hit .300, you can guess that some manager who should know better is leading him off even though the guy can’t get on base. You bet. That’s Ralph Garr.
You have someone hitting .314/.381/.394. Well, you hope that someone with that low a slugging percentage can run and play a premium position. If he can’t run, and he is your designated hitter, that’s trouble. And that’s Jose Vidro. No wonder some people want the Mariners to sign Barry Bonds.
*I know there were some people who felt like my last blog post on stats was pointed toward a friend of mine, Paul Daugherty, and I just want to say it was not, or at least it was not consciously pointed his way. I did read his column about Dusty Baker, which had a couple of swipes at Bill James, and I’m sure I locked that away in my mind. But the post itself was started a couple of days before, and when I wrote the Bill James Pozterisk, I was actually thinking about someone else who had just sent me an essay he had done where he ripped Bill James for taking the fun out of baseball. Maybe reading both that and Doc’s column on the same weekend was too much — I’m not saying Doc had nothing to do with it. I’m also not going to say I liked Doc’s column because I did not, and I suspect Dusty Baker will be a fiasco in Cincinnati, and Bill James will probably keep on doing all right in Boston. But I respect Doc a lot, I’ve long admired his work, and I’m entirely grateful to him — he was one of two or three key people who helped me get my breakthrough job in Cincinnati.
Also, while we’re here, I would ask for everyone to please tone down some of the name-calling and personal attacks in the comments section. I know I have joked about being angry at comments directed at me before, but to be serious for a second, I don’t mind those … rip me all you like for this FREE STINKING BLOG THAT I’M GIVING YOU FOR FREE AND DID I MENTION IT HAPPENS TO BE FREE? (See, I’m joking again). But lay off each other. Give peace a chance. I’ve had friends have to cut off the comments on their blogs because the name calling just got ugly, and I would rather not do that. OK, enough of that. Back to the countdown.
One more comparison. Take two guys fighting for the MVP spot:
Player 1: .308/.352/.605
Player 2: .343/.402/.501
First thing I notice is the HUGE slugging percentage of Player 1. I suspect that guy led the league in homers, probably RBIs and with that .308 average I would bet on him winning the MVP. He obviously did not walk at all, and his on base percentage — the most important of the three averages when it comes to scoring runs — is nothing special.
The second player hits all three baselines, and I would guess he had a more well-rounded year. As it turns out, the second player also stole 21 bases and played a premium defensive position well and, in my opinion, should have been the runaway MVP. The second player is Alan Trammell, the first George Bell, the year 1987.
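The reads in this section lean on two gaps: OBP minus AVG (roughly, how much a hitter adds by walking) and SLG minus AVG (extra bases beyond singles, a number often called isolated power). Here is a small sketch that computes both for the Stat Trio lines quoted above. The `gaps` helper is just for illustration, and OBP minus AVG is only a rough walk measure, since the two stats use slightly different denominators.

```python
# Stat Trio lines quoted in the post: (AVG, OBP, SLG).
TRIOS = {
    "Dwight Evans (career)": (0.272, 0.370, 0.470),
    "Maury Wills (career)": (0.281, 0.330, 0.331),
    "Adam Dunn": (0.264, 0.386, 0.554),
    "Ralph Garr": (0.300, 0.322, 0.387),
    "Jose Vidro": (0.314, 0.381, 0.394),
    "George Bell, 1987": (0.308, 0.352, 0.605),
    "Alan Trammell, 1987": (0.343, 0.402, 0.501),
}


def gaps(avg: float, obp: float, slg: float) -> tuple:
    # OBP minus AVG: roughly what walks (and hit-by-pitches) add to hits.
    on_base_gap = round(obp - avg, 3)
    # SLG minus AVG (isolated power): extra bases beyond singles.
    power_gap = round(slg - avg, 3)
    return on_base_gap, power_gap


for name, (avg, obp, slg) in TRIOS.items():
    on_base_gap, power_gap = gaps(avg, obp, slg)
    print(f"{name:24} on-base gap {on_base_gap:.3f}  power gap {power_gap:.3f}")
```

Garr's .022 on-base gap is what makes his .300 "empty," Wills' .050 power gap is the middle-infielder giveaway, Dunn has the biggest on-base gap of the group, and Bell has the biggest power gap.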
OPS+: This is OPS (on-base percentage plus slugging percentage) as measured against league average with ballpark effects taken into consideration. It’s a complicated formula, but at the end of the day 100 is league average and every point over or under is a percentage against league average that year (a player with a 113 OPS+, for instance, performed 13 percent better than league average).*
*You will find this stat, and almost every other stat mentioned here, on the remarkable, incredible, gift from the Gods, www.baseball-reference.com. I spend roughly 800 hours there per week.
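Baseball-Reference describes its OPS+ roughly as 100 × (OBP/lgOBP + SLG/lgSLG − 1), where the league figures are park-adjusted. A minimal sketch, assuming the park adjustment has already been applied to the league numbers you pass in (the park-factor calculation itself is not shown). The league and player lines in the example are hypothetical, and `naive_ops_ratio` is included only for comparison.

```python
def ops_plus(obp: float, slg: float, lg_obp: float, lg_slg: float) -> float:
    """OPS+ as 100 * (OBP / lgOBP + SLG / lgSLG - 1).

    lg_obp and lg_slg are assumed to be park-adjusted already.
    """
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


def naive_ops_ratio(obp: float, slg: float, lg_obp: float, lg_slg: float) -> float:
    # Plain OPS divided by league OPS, shown only for comparison.
    return 100 * (obp + slg) / (lg_obp + lg_slg)


# Hypothetical league (.330 OBP, .420 SLG) and player (.396 OBP, .504 SLG).
print(round(ops_plus(0.396, 0.504, 0.330, 0.420)))         # 140
print(round(naive_ops_ratio(0.396, 0.504, 0.330, 0.420)))  # 120
```

Because each ratio above 1.0 counts in full, OPS+ moves faster than a plain OPS-to-league-OPS ratio (140 versus 120 here), which is worth keeping in mind when reading the number as "percent better than league average."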
I’m not overly crazy about OPS — I’m not exactly sure why you ADD on-base percentage and slugging to get a statistic. I guess this is a fairly common complaint among people who know a lot more than I do about math. Also, most people would say that OBP is quite a bit more important when it comes to scoring runs than SLG, so it’s wrong to just add them together, like they’re worth the same thing. Like I say, I prefer looking at the Stat Trio and making judgments from there.
That said, I like OPS+ because of the way it tries to put the number in context. For instance, take a look at these two lines for a season:
Player 1: .317/.360/.536
Player 2: .313/.359/.531
They’re awfully close, no? Well, we could look at some counting stats to separate them too:
Player 1: 31 doubles, 11 triples, 29 homers, 119 RBIs, 105 runs.
Player 2: 39 doubles, 3 triples, 31 homers, 141 RBIs, 114 runs.
Again, fairly close, though Player 2 certainly seems to have the better of it. Obviously you know by now it’s a trick … here are their OPS+.
Player 1: 146 OPS+
Player 2: 112 OPS+
These seasons weren’t close at all because of the time and ballparks. Player 1 is Roberto Clemente in 1966, when he won his MVP award. He played in a fairly neutral ballpark in a year when pitchers dominated … the league OPS was only .722. Clemente’s 146 OPS+ — being 46 percent better than league average — is outstanding.
Player 2 is Dante Bichette in 1996. He played in a crazy hitters’ park — Coors Field for about a five-year period there was probably the craziest hitters’ park since that ridiculous Lake Front Park in Chicago in 1884* — and he played in a big-scoring era when league OPS (adjusted to Coors) was .830.
It isn’t always quite this tidy, but I think OPS+ is an excellent way to judge players in different eras.
*One of the great rarely told stories is of the Chicago White Stockings in 1884. They played in Lake Front Park, where the right field fence was only 215 feet away. It was so ridiculously short that until 1884, a ball hit over that fence was called a ground-rule double. But that year, the White Stockings and their racist manager Cap Anson – sorry, can’t mention Anson without throwing in that little aside — decided to make any ball hit over the fence a home run. Apparently this is how rules were made in 1884.
So here is how the teams ranked in home runs in 1884:
Buffalo Bisons, 39
Boston Beaneaters, 36
Detroit Wolverines, 31
Ned Williamson, 27
Fred Pfeffer, 25
New York Gothams, 23
Abner Dalrymple, 22
Providence Grays, 21
Racist Cap Anson, 21
Cleveland Blues, 16
Philadelphia Quakers, 14
Obviously the four individual names on that list (Williamson, Pfeffer, Dalrymple and Anson) are people rather than teams, and they all played for the White Stockings. The White Stockings hit 142 home runs as a team over that Little League fence — Ned Williamson set a major league record by hitting three homers in a game, and then two others followed with three-homer games in the ballpark. Williamson would hold the single-season home run record until the end of the Dead Ball Era, when it was smashed by Babe Ruth.
Equivalent Average (Eqa): A complicated but very useful Baseball Prospectus statistic that takes into account times on base, total bases, stolen bases and times caught stealing, and spits out an average adjusted for ballpark effects and for what kind of offensive year it was in baseball. The cool part about Eqa is that an average season is always .260 … so you know that if a guy has a .300 Eqa, that’s really good; you can view Eqa exactly the way you view batting average.
It’s a nice one-stop shopping number. Last year, for instance, A-Rod led the American League with a .340 Eqa, followed by David Ortiz (.338), Magglio Ordonez (.336) and Carlos Pena (.336).
One thing Eqa does better than perhaps anything I’ve seen is tell you how ridiculous Barry Bonds’ Balco run really was:
2000: .362 Eqa (led league by 13 points)
2001: .427 Eqa (led league by 59 points)
2002: .453 Eqa (led league by 101 points)
2003: .412 Eqa (led league by 50 points)
2004: .457 Eqa (led league by 112 points)
Bonds led the National League in Eqa last year too.
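To put those leads in context: in baseball shorthand a "point" is one thousandth, so subtracting each lead from Bonds' Eqa recovers what the next-best NL hitter posted, and subtracting .260 shows how far Bonds sat above an average hitter. A small sketch that works in integer thousandths to avoid floating-point noise; it assumes "points" means thousandths, as usual, and the helper names are just for illustration.

```python
# Bonds' Eqa and his lead over the next-best NL hitter, both in
# thousandths ("points"), copied from the list above.
BONDS_EQA = {
    2000: (362, 13),
    2001: (427, 59),
    2002: (453, 101),
    2003: (412, 50),
    2004: (457, 112),
}

AVERAGE_EQA = 260  # Eqa is scaled so an average season is .260


def runner_up(eqa: int, lead: int) -> int:
    # The next-best hitter's Eqa, in thousandths.
    return eqa - lead


def fmt(thousandths: int) -> str:
    # 352 -> ".352" (every value here is below 1.000)
    return f".{thousandths:03d}"


for year, (eqa, lead) in BONDS_EQA.items():
    print(f"{year}: Bonds {fmt(eqa)}, next best {fmt(runner_up(eqa, lead))}, "
          f"{eqa - AVERAGE_EQA} points above an average hitter")
```

In 2002, for instance, the next-best hitter was at .352, a very good season that Bonds still beat by more than a hundred points.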
Win Shares: Talk about complicated — it took Bill James about 200 pages to explain how Win Shares work. But the end result is very simple. Bill found a way to statistically break down a team’s wins — that is, he gave different players their “share” of the team’s victories. There are all sorts of numbers involved in figuring Win Shares — starting with the fact that each team actually gives out three shares per win — but I only worry about the end result. You could break down the numbers like so:
40+ Win Shares: A historic season.
30+ Win Shares: An MVP-type season.
22-29 Win Shares: An All-Star-type season.
17-21 Win Shares: A good, solid season.
12-16 Win Shares: Depending on the playing time, an OK season.
You can compare how Win Shares ranked Barry Bonds’ historic Balco run:
2000: 32 (third in league)
2001: 54 (led league by 12 points)
2002: 49 (led league by 17 points)
2003: 39 (trailed Albert Pujols)
2004: 48 (led league by 11 points)
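The Win Shares scale above is easy to turn into a lookup. A minimal sketch, assuming the tiers are checked from the top down (so "30+" effectively means 30 to 39, since 40 and up is its own tier) and labeling anything under 12 as off the chart, since the list stops there. It also applies the three-shares-per-win rule to a hypothetical 90-win team. This covers only the scale; Bill's actual Win Shares calculation is far more involved and is not attempted here.

```python
# Tiers from the list above, checked from the top down.
TIERS = [
    (40, "a historic season"),
    (30, "an MVP-type season"),
    (22, "an All-Star-type season"),
    (17, "a good, solid season"),
    (12, "an OK season, depending on playing time"),
]


def describe(win_shares: int) -> str:
    for floor, label in TIERS:
        if win_shares >= floor:
            return label
    return "below the tiers listed"  # the list stops at 12


def team_win_shares(wins: int) -> int:
    # Three Win Shares are handed out for every team win.
    return 3 * wins


BONDS = {2000: 32, 2001: 54, 2002: 49, 2003: 39, 2004: 48}
for year, ws in BONDS.items():
    print(year, ws, describe(ws))

print(team_win_shares(90))  # hypothetical 90-win team: 270 shares to divide up
```

Run against Bonds' numbers, 2001, 2002 and 2004 come out historic, while 2000 and 2003 land in the MVP tier.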
* * *
Pitching
ERA+: This is the same concept as OPS+, only obviously it’s ERA, which explains the name. Again, it is measured against the league average that year and it takes ballpark effects into account. How about another comparison?
Player 1: 19-17, 2.63 ERA.
Player 2: 19-7, 2.64 ERA
Well, obviously Player 1 was involved in many more decisions, and so probably threw many more innings. And in this case — because Player 1 threw SO many more innings — Win Shares is of no help:
Player 1: 26 Win Shares.
Player 2: 26 Win Shares.
ERA+ makes it pretty clear who had the better year:
Player 1: 114 ERA+
Player 2: 181 ERA+
Yes, Player 1 is Don Drysdale pitching in that enormous Dodger Stadium with the mound roughly the height of Mount Kilimanjaro. The league ERA that year, ballpark adjusted, was 2.99.
Player 2 is Randy Johnson in 2000, pitching in the very hitter-friendly Bank One Ballpark. He won the second of four consecutive Cy Youngs, and he deserved it. The league ERA that year — again ballpark adjusted — was 4.78.
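ERA+ flips the ratio compared with OPS+, because a lower ERA is better: it is the (park-adjusted) league ERA divided by the pitcher's ERA, times 100. A minimal sketch, assuming the park adjustment is already folded into the league ERA you pass in; plugging in the ballpark-adjusted league ERAs quoted above reproduces both pitchers' numbers.

```python
def era_plus(era: float, lg_era: float) -> float:
    """ERA+ as 100 * league ERA / pitcher ERA.

    lg_era is assumed to be park-adjusted already; above 100 is better
    than league average.
    """
    return 100 * lg_era / era


print(round(era_plus(2.63, 2.99)))  # Player 1, Drysdale: 114
print(round(era_plus(2.64, 4.78)))  # Player 2, Randy Johnson in 2000: 181
```

Nearly identical ERAs, very different seasons: the whole difference comes from the league context in the numerator.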
Runs Saved Against Average: This is, as far as I know, a Lee Sinins creation — he is the producer of the Sabermetric Encyclopedia CD which is a great, great thing to have. If you’re curious, Sinins says Runs Saved is very similar to Total Baseball’s “Pitching Runs,” although they figure ballpark effects differently and Sinins allows his pitchers to have negative numbers.
I like the negative numbers. In 2005, Jose Lima scored a -50 Runs Saved — meaning he gave up roughly 50 more runs than the average pitcher. I was proud to see so many of those games.
Basically the stat is just what its name claims … it tries to estimate how many fewer (or more) runs that pitcher gave up compared to what the average pitcher would have done in the same number of innings in the same ballpark context. Bill James swears by Runs Saved — it gives you a good feel, again in one number, for how well that pitcher pitched that year.
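The idea behind Runs Saved can be sketched as: runs an average pitcher would allow over the same innings, minus the runs this pitcher allowed. The sketch below measures both through ERA and treats the league ERA as already park-adjusted; those are simplifying assumptions of this sketch, and Sinins' actual calculation may differ in its details. The two pitchers in the example are hypothetical.

```python
def runs_saved(era: float, lg_era: float, innings: float) -> float:
    """Rough runs saved against an average pitcher.

    Assumptions: runs are measured through ERA, lg_era is already
    park-adjusted, and innings are a true decimal (200 2/3 innings is
    200.667, not 200.2).
    """
    average_runs = lg_era * innings / 9  # what an average pitcher would allow
    actual_runs = era * innings / 9      # what this pitcher allowed
    return average_runs - actual_runs    # negative means worse than average


# Hypothetical pitchers, 180 innings each, in a 4.50-ERA league.
print(runs_saved(3.50, 4.50, 180))  # 20.0
print(runs_saved(6.50, 4.50, 180))  # -40.0
```

Note that the stat scales with innings: the same ERA gap is worth more runs over 250 innings than over 150, which is why workhorse seasons dominate lists like the one below.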
Here are the top Runs Saved seasons over the last 50 years:
1. Pedro Martinez, 2000, 77
2. Pedro Martinez, 1999, 71
3. Roger Clemens, 1997, 69
4. Pedro Martinez, 1997, 65
5. Greg Maddux, 1995, 64
6. Randy Johnson, 2002, 62
7. Randy Johnson, 1999, 60
8. Randy Johnson, 2001, 59
9. Dwight Gooden, 1985, 58
(tie) Sandy Koufax, 1966, 58
(tie) Ron Guidry, 1978, 58
12. Juan Marichal, 1965, 57
(tie) Randy Johnson, 2000, 57
14. Vida Blue, 1971, 56
(tie) Bob Gibson, 1968, 56
(tie) Greg Maddux, 1994, 56
17. Greg Maddux, 1998, 55
(tie) Randy Johnson, 1995, 55
(tie) Randy Johnson, 1997, 55
(tie) Roger Clemens, 1990, 55
You think Randy Johnson could be on the list a little more? Obviously the list is overwhelmed by Selig Era pitchers because that was such a huge offensive era. Pedro’s 2000 season (18-6, 1.74 ERA) looks plenty good at a glance. But it was better than that. As you can see, it was, in its own way, significantly better than Gibson in ’68, Koufax in ’66, Guidry in ’78. It was better than Dizzy Dean in ’34 (when he went 30-7 and saved 66 runs) and better, even, than Walter Johnson’s 1913 (the famed 36-7, 1.14 ERA season — the Big Train saved 75 runs, which is INCREDIBLE in the Deadball Era, but still not there with Pedro).
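The list above uses standard competition ranking: tied seasons share a rank, and the next rank skips ahead by the number of ties (three seasons at 58 share ninth, so the next one is 12th). A small sketch that reproduces those ranks from the list's own numbers and counts how often each pitcher appears; `competition_ranks` is just an illustrative helper name.

```python
from collections import Counter

# (pitcher, season, runs saved), in the order of the list above.
SEASONS = [
    ("Pedro Martinez", 2000, 77),
    ("Pedro Martinez", 1999, 71),
    ("Roger Clemens", 1997, 69),
    ("Pedro Martinez", 1997, 65),
    ("Greg Maddux", 1995, 64),
    ("Randy Johnson", 2002, 62),
    ("Randy Johnson", 1999, 60),
    ("Randy Johnson", 2001, 59),
    ("Dwight Gooden", 1985, 58),
    ("Sandy Koufax", 1966, 58),
    ("Ron Guidry", 1978, 58),
    ("Juan Marichal", 1965, 57),
    ("Randy Johnson", 2000, 57),
    ("Vida Blue", 1971, 56),
    ("Bob Gibson", 1968, 56),
    ("Greg Maddux", 1994, 56),
    ("Greg Maddux", 1998, 55),
    ("Randy Johnson", 1995, 55),
    ("Randy Johnson", 1997, 55),
    ("Roger Clemens", 1990, 55),
]


def competition_ranks(values: list) -> list:
    # Rank = 1 + how many values are strictly larger, so ties share a rank
    # and the next rank skips ahead (9, 9, 9, then 12).
    return [1 + sum(other > v for other in values) for v in values]


ranks = competition_ranks([runs for _, _, runs in SEASONS])
for rank, (name, year, runs) in zip(ranks, SEASONS):
    print(f"{rank:>2}. {name}, {year}, {runs}")

appearances = Counter(name for name, _, _ in SEASONS)
print(appearances.most_common(1))  # [('Randy Johnson', 6)]
```

The count answers the question in the text: Randy Johnson holds six of the 20 spots, twice as many as Pedro or Maddux.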
* * *
Baserunning
As far as I know, Bill James is the first mainstream writer to really take on base running stats — in his Handbook every year, he breaks down base runners into a plus/minus system that is really a lot of fun. But what I want to talk about briefly here is how the numbers can give us a pretty good sense of how much speed means in baseball.
Let’s take perhaps the fastest runner in the American League — I’m saying Carl Crawford, though I see that Carlos Gomez is now telling everyone that HE’S the fastest (Advice: Score a few runs, kid, before yapping). And let’s take perhaps the slowest runner in the American League … Big Papi.*
*I realize that, technically, no player is slower than Bengie Molina — no LAND MASS is slower than Bengie Molina — but he was in the NL in 2007.
OK, so we know that Carl Crawford will steal you 50 bases a year while Papi will steal, um, well last year he got 3, which isn’t bad. Crawford also hit nine triples last year, while Papi, well, he hit 1 — he’s hit at least one every year this decade.
But what does it mean on the bases? Last year, Crawford scored 93 runs. He hit 11 homers, so let’s leave those out for the moment. We’ll say he scored 82 non-homer runs.
Papi, meanwhile scored 81 non-homer runs (116 minus 35). So that’s pretty close.
Here’s how Bill breaks down the base-running numbers.
Score from second on a single:
Crawford: 14 out of 23
Papi: 18 out of 31
Score from first on a double:
Crawford: 4 out of 8
Papi: 1 out of 9
Go first to third on a single:
Crawford: 5 out of 20
Papi: 8 out of 36
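Turning Bill's breakdown into percentages makes the comparison easier to see. A minimal sketch using only the numbers listed above; the `rate` helper is just for illustration.

```python
# (successes, chances) from Bill James' 2007 base-running breakdown above.
ADVANCES = {
    "Score from second on a single": {"Crawford": (14, 23), "Papi": (18, 31)},
    "Score from first on a double": {"Crawford": (4, 8), "Papi": (1, 9)},
    "First to third on a single": {"Crawford": (5, 20), "Papi": (8, 36)},
}


def rate(made: int, chances: int) -> float:
    # Share of opportunities in which the runner took the extra base.
    return made / chances


for situation, runners in ADVANCES.items():
    parts = [f"{who} {rate(made, chances):.0%}" for who, (made, chances) in runners.items()]
    print(f"{situation}: " + ", ".join(parts))
```

Crawford scores from second 61 percent of the time to Papi's 58, scores from first on a double 50 percent to 11, and goes first to third 25 percent to 22. The only big gap is the double, and that is a small sample on both sides.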
Crawford did take 26 bases on wild pitches, passed balls, balks and sac flies. But Papi took 27 bases.
So what does it all mean? I mean, there are a lot of ways to break it down. Have at it. This is how I see things: Crawford scored 82 runs in 210 times on base (39%), which is fabulous. Papi scored 81 in 262 times on base (31%), which is really not bad for a big man.
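The run-scoring rate works the same way: runs minus homers (so trips around the bases on their own home runs don't count), divided by times on base. A sketch using the figures from this section (93 runs and 11 homers for Crawford, 116 runs and 35 homers for Papi, and the 210 and 262 times on base given above); the function name is just for illustration.

```python
def non_homer_scoring_rate(runs: int, homers: int, times_on_base: int) -> float:
    # Runs scored after reaching base, leaving out their own home runs.
    return (runs - homers) / times_on_base


crawford = non_homer_scoring_rate(93, 11, 210)  # 82 / 210
papi = non_homer_scoring_rate(116, 35, 262)     # 81 / 262
print(f"Crawford {crawford:.0%}, Papi {papi:.0%}")  # Crawford 39%, Papi 31%
```

Same run total from very different inputs: Crawford gets there with the higher rate, Papi with 52 more trips on base.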
Crawford obviously turned many more singles into doubles with his stolen bases, and he scored from second base at a higher percentage (though not a lot higher). He got three more runs than Papi just scoring from first on doubles. He was more effective going first to third.
But really, what I get out of this is that over a long season, speed on the bases can only get you so much. I know there are many who believe these slow guys don’t really help you when they walk — they clog up the bases, they stifle rallies, and so on.
But I take from these numbers a different story. Over a full season, the fastest guy might — again MIGHT — score 10 to 20 more runs than the slowest guy if given the same opportunities. That’s not inconsequential. But it’s probably less than I had expected. And that again is only if given the SAME OPPORTUNITIES. Speed’s just dandy. But it’s still all about getting on base. The slowest guy in this case got on base 50 more times than the fastest guy and scored about the same number of runs on the bases.
And that doesn’t even get into the 25 more homers he hit.
I’ll say this: If the slowest guy hits 25 more home runs and gets on base 50 more times — well, let’s just say, I don’t care if the first guy runs like Mo Greene and the second guy runs like Lorne Greene, it’s absolutely no contest.
* * *
Fielding
John Dewan’s Plus/Minus: This is a fielding system, developed by John Dewan, that essentially involves Dewan and his merry men looking at every single play, plotting it on a computer, and determining what percentage of big-league fielders at that position would have made the play.
A ground ball is hit in the hole between short and third. Derek Jeter stabs it, makes his patented jump throw, and gets the runner. Dewan punches it into his computer and discovers that Major League shortstops only make that play about 25 percent of the time. OK. That means that’s a plus-.75 for Jeter.
The next ball hit is exactly 11 inches to the left of Jeter. He dives but cannot get there. Dewan punches it in and sees that 90 percent of Major League shortstops make the play. That’s a -.90 for Jeter.
At the end of the year, then, Dewan adds it all up and gives you a plus/minus number — showing you how many more (or fewer) plays than the AVERAGE that player made. It’s not perfect — ballparks play a role in fielding too — but I believe it to be the best stat I’ve seen on fielding.
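The bookkeeping described above can be sketched in a few lines: a made play earns one minus the share of fielders who make it, and a missed play costs the share who would have made it. This is a simplified sketch of the idea as described here, not Dewan's actual system, which also has to decide how each play is classified in the first place. The two plays in the example are the Jeter plays from the text.

```python
def play_credit(made_play: bool, league_success_rate: float) -> float:
    # Made play: credit for the share of fielders who would NOT have made it.
    # Missed play: debit for the share who WOULD have made it.
    return (1 - league_success_rate) if made_play else -league_success_rate


def plus_minus(plays) -> float:
    # plays: iterable of (made_play, league_success_rate) pairs.
    return sum(play_credit(made, rate) for made, rate in plays)


jeter = [(True, 0.25), (False, 0.90)]
print(round(plus_minus(jeter), 2))  # -0.15
```

The two plays net out to -0.15: the tough play he made doesn't quite make up for the makeable one he missed. Over a full season of plays, the total is the "how many more (or fewer) plays than average" number.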
Here were the plus/minus leaders and trailers for 2007:
First base
Albert Pujols, +37
Dmitri Young, -22
Second base
Aaron Hill and Chase Utley, +22
Dan Uggla, -19
Shortstop
Troy Tulowitzki, +35
Hanley Ramirez, -37
Third base
Pedro Feliz, +27
Ryan Braun, -41*
*Yes, that’s new left fielder Ryan Braun.
Left field
Eric Byrnes, +28
Manny Ramirez, -38
Center field
Carlos Beltran, +25
Gary Matthews Jr., -26*
*Yes, I was surprised because I always think of that great catch he made at the wall.
Right field
Franklin Gutierrez, +22
Jermaine Dye, -37*
*Yikes
Pitcher
Greg Maddux, +10
Daniel Cabrera, -9
There is no plus/minus, unfortunately, for catchers. All of this, along with Win Shares and base running info, is available in the 2008 Bill James Handbook.
* * *
Blog Post Length
This blog post is 3,431 words long. This should give you a good idea of how stupid the author really is.
I really like Dye, but he is awful in RF now. He just gives up tons of triples. I kind of wish the White Sox would have let him go and went with Swisher/Anderson/Quentin in the OF. That would have been some nice defense. Instead I will get to watch Owens not get on base.
I picked up Bill James Gold Mine yesterday. It isn’t as in-depth as I was hoping, but it’s a really enjoyable read and I’m already 120 pages into it.
Joe,
I’m going to make this sound as uncheesy as possible, but there really is nothing that gets me more excited (well, a few things) than when I click your blog and see you have a new one posted and it takes me like 2 minutes to just scroll through the whole thing. I haven’t read it yet but when I do I know I will take great pleasure in it. So um, thanks, I guess.
Mr. Posnanski, you’re not stupid: you’re practicing. And it isn’t some 78-mph batting-practice fastball here, you have a real audience. So keep it up. Blogging will keep you sharp.
Just not brief.
Also, thanks for the B-Mo joke.
Joe, I’m sure most of us don’t mind your lengthy posts, and your tangents/sidenotes make them more entertaining than they already are. If we wanted to read a constrained article of yours, there are plenty of those over at the KC Star website.
Thanks, Joe!
BTW, your blogs are almost as long as my sermons.
@Steven: I’ve never seen “I didn’t read this because it was too long” meant as a compliment before. Well done.
As for the post, these are pretty much the same stats I like the best; the ones I will defend in arguments with total strangers at 4 a.m. at IHOP (not much is worth defending at such a time and place).
I just wish more people could embrace these stats, NOT because I think baseball should be some kind of dry science, but because Bill James, Baseball Prospectus, and all the rest are laboring to give us new ways to continue enjoying baseball. You can be a wide-eyed, almost childlike fan with a wild passion for The Game, and at the same time be interested in Win Shares, VORP, Eqa, and all the rest.
“The next ball hit is exactly 11 inches to the left of Jeter. He dives but cannot get there.”
This absolutely made my night, Joe. And great rhythm too.
I thought it was fantastic — Thanks, Joe.
That was a great read! I think the baserunning comparison really does highlight what a smart baserunner Big Papi really is. Also, if I’m remembering correctly, Balco Bonds had an insane run of never being thrown out at home, despite declining speed.
Great read. My one quibble is that I don’t like stats that measure against the average player, for reasons that James and BP have hashed out too many times to mention. The Pedro-Big Train comparison perfectly illustrates the problem. Walter Johnson may not have saved as many runs relative to the average pitcher as Pedro. But he was playing a more pitcher-dominated game. Walter threw 346 innings, which BP gives a whopping WARP value of 19.4 (about 58 win shares).
Ask yourself: what is the purpose of name-dropping?
White Sox fan, John Dewan’s products are fine money makers. He deserves praise for his entrepreneurship, which is what most of this is about. But the plus/minus system is laughably unscientific. How good is the collection method? How good is the rating technique?
And since you jumped into a weird attempt at humor to laugh at the responses you got for your tantrum in your last post: you were a sphincter. You weren’t honest or fair. Sorry if it disparages your clique or affects the cool kids’ profitability, but every now and then kids need to hear the truth. If someone clocks you, remember you deserve it.
How is Dewan’s +/- laughably unscientific? It’s just about the most scientific defensive metric available (right there with UZR and PMR).
Q: Why was MLB all white from the 1880s until 1947?
A: It was all Cap Anson’s fault. (Even though he died in 1922.)
You know how some Hall voters say “I don’t want to vote for McGwire, but I honestly don’t have a way to decide exactly which players used steroids” ? Not to defend Anson or anything, but was he really acting alone?
Another great post. Keep up the good work Joe.
Joe,
Perhaps you may want to also have a look at Dan Fox’s baserunning metrics on BP. They look like pretty solid measures for that aspect of the game.
As usual, ignore my comment if you know what I’m talking about and already considered my suggestion when writing another fabulous blog post.
“Sorry if it disparages your clique …”
Wait a minute, there’s a clique here? How come no one told me? What, I’m not good enough for your precious little group of buddies? Well you go on and hang out with them! See what I care! I have plenty of friends! Really! There’s, well, my sister. And my mom. I’ve got a cousin Scott who talks to me when he’s in town. And that girl Susan whose skin keeps breaking out and has had braces for about eight years, she just asked me if I had plans for the prom. She didn’t come right out and ask me to go, cause that wouldn’t be cool, and besides, I wouldn’t say yes anyway ’cause I’ve got options. I swear. I’m not just going to go with anyone just so I can say I went. And besides, not going to the prom is my CHOICE. I have that right if I want it. Most of the really cool people aren’t going anyway. I might decide to spend the whole night hanging out with the guys from the band, ’cause that’s how I roll.
I don’t need your stinking clique.
Poseurs.
I’m partial to Win Probability Added. Fangraphs does a fantastic job of tracking this and looking at a game’s progress through WPA is a really insightful way to see the truly important moments in a game.
Joe,
Read Born To Run – it’s terrific.
Donate Glory Days to your local library – Marsh abandoned any sense of critical distance in it.
Joe, I crunched the numbers recently for the last 10 years to figure out the probability of scoring once you are on base (not counting homers), and it was basically 1/3 of the time. So Big Papi scoring 31% of the time is not bad, just barely below average. By comparison, Bengie Molina is only 22.5%.
Just wanted to say that I frequent BaseballThinkFactory.com every day and I probably only read 5% of the articles posted there.
The exception is your blog. I always RTFA.
Most of us who read this blog and sites like Baseball Prospectus like to think of ourselves as “modern” fans who buy into the new wave of statistical analysis. However, for nearly all of us there is a substantial leap of faith as to most of the statistics you cite. I like Win Shares and EqA but I, like nearly everyone else, lack the statistics background to truly evaluate their accuracy or importance. I like Bill James’ writing and I respect the BP crowd, so I tend to defer to their analysis. But I sure don’t have any objective way to evaluate it. If someone says that his studies show that slugging percentage is 2.5 times more important than on-base percentage, then I can either believe them or not, but I can’t run my own numbers to test his findings.
This may be the reason that more traditional stats like RBI and batting average still resonate with most fans. At least fans can get their hands around them. If a runner is on second and the next guy singles him home, well then, we can see that batter did his job and earned an RBI for his work. It’s more accessible, and doesn’t require the leap of faith.
“I’m not exactly sure why you ADD on-base percentage and slugging to get a statistic.”
There’s nothing wrong with adding them, or doing anything you have to do with them, to get a strong correlation between your cobbled-up number and team runs scored.
Paul, you like Jim Rice way too much to be in the clique…
Joe, I think you’re looking for Eddie Murray’s 1984 season, not ‘74, since Murray’s rookie year was in 1977.
RE: OPS
Bill James goes into GREAT detail about the inaccuracy of OPS in his New Historical Abstract, for the very reasons mentioned. He suggests multiplying SLG by 4 or something along those lines . . .
I think OPS+ and ERA+ are fantastic, short-hand indicators of a player’s true value – so long as I’m not the one doing the math to figure them out!!! (Ha! – Thank GOD for baseball-reference.com)
Wait a minute…
REM’s “Low” or Cracker’s “Low”?
Every scientific defensive statistic compares a player to a composite average of all players at a position. The problem with that – and it’s not an easy problem to solve – is that the probability that a particular player will make a particular play depends a great deal on where the player is positioned at the start of the play. And defensive positioning varies from team to team, and from batter to batter. For that reason, we simply don’t know in most cases how close we are when we estimate the probability that an average player would make a play when positioned where the Yankees position Jeter.
“Paul, you like Jim Rice way too much to be in the clique…”
I knew that was going to bite me on the ass eventually….
Keith K: you’ve hit on the biggest problem with internet sabermetrics “research”, imho. We are constantly seeing articles that give a very basic overview of the hypothesis tested (if we’re lucky; often the hypothesis is not clearly stated at all) and the methodology used, sort of, and then give “results” that are very incomplete. We’re absolutely never given the source data that was used or even the full result data set, making it impossible to replicate the findings.
And yet these findings are broadcast to millions and accepted as gospel, defended almost religiously by people who consider themselves “scientific fans” or something, while no scientific/engineering organization (ITEA, AIAA, etc) would accept a paper written like that for a major conference.
I know a lot of it is the audience and sports fans aren’t going to get into an article that goes into the level of detail that true scientific studies would entail, but we accept way too much on faith…
Keith:
“If someone says that his studies show that slugging percentage is 2.5 times more important than on-base percentage, then I can either believe them or not, but I can’t run my own numbers to test his findings……This may be the reason that more traditional stats like RBI and batting average still resonate with most fans….It’s more accessible, and doesn’t require the leap of faith.”
I don’t quite understand this line of thinking. This is basically deciding to go about your life believing the world is flat and the sun orbits the earth because it’s “easier to get your hands around,” and you don’t want to actually learn classical physics. So you just use the second-best system because it’s easier and you understand it. And that’s fine; you can do what you want, and it doesn’t affect me. But just because the old system is easier and maybe used by the layman more often doesn’t mean you can’t use the results that come out of a new system of analysis. For example, a recent study might show that eating spinach fights colon cancer; just because you don’t understand the methods by which this information was arrived at doesn’t mean you shouldn’t use this information. But this is exactly what you (and, more importantly, people who vote on MVP awards or the HOF) are doing in the field of baseball analysis.
I was a dedicated Alan Trammell fan, but let’s be fair. George Bell did score more runs than Trammell did in 1987, even though it was close and Trammell played a few fewer games. OBP doesn’t tell EVERYTHING about how many runs you score. The best stat for that is R.
Mike:
“For that reason, we simply don’t know in most cases how close we are when we estimate the probability that an average player would make a play when positioned where the Yankees position Jeter.”
It seems to me you could normalize to the distance from the player’s original position. All that matters from the defender’s point of view is where it was hit to, how hard and at what trajectory.
Thanks, Joe. Great post.
Only two minor quibbles: David Ortiz isn’t even the slowest guy on his own team. He’d beat Doug Mirabelli, even considering last year he was suffering from a bum knee.
Also, I’d be willing to bet Jacoby Ellsbury is faster than Carl Crawford.
Cheers
“We’re absolutely never given the source data that was used or even the full result data set, making it impossible to replicate the findings.”
This isn’t entirely true; SABR publishes a journal just like anyone else. But in some cases it is true, and that is just the way things go in the private sector. If, say, Amgen comes up with a new drug for diabetes, or is even just trying to come up with this drug, do you think they publish all the work they put into it so it can be reproduced by a competitor? So, yes, sometimes in life we just have to trust the experts.
“while no scientific/engineering organization (ITEA, AIAA, etc) would accept a paper written like that for a major conference.”
Umm, this is baseball. Why would a major conference in an academic field (i.e., engineering, biology, etc.) accept an off-topic paper?
“I know a lot of it is the audience and sports fans aren’t going to get into an article that goes into the level of detail that true scientific studies would entail, but we accept way too much on faith…”
Did you notice that Bill James has about 200 pages dedicated to explaining WS that are available to the public? True, some stats’ methodology is hidden, but again this is just a product of private organizations doing much of this research to make a profit, just like a major drug company.
Joe:
Building on the OPS+ discussion of Clemente and Bichette, what kind of numbers would it have taken for a player at Coors in 1996 to post a 146 OPS+? Those numbers would have to be obscene.
SleepyCA, you make a valid point: there is indeed no overseeing body to review baseball-related academic papers. However, I sincerely doubt that the data work that went into ANY of these stats that Joe is talking about has not been checked and re-checked ad nauseam.
The people and sites that publish these stats and information have significant reputations to uphold…not to mention many of them sell products based on the stat work they do, so they are also responsible to the consumer. Thus, they have every incentive to make 100% sure that what they are producing is accurate and statistically sound to a great degree.
Every quality statistic that is published comes along with a set of criticisms for why it may slightly misrepresent the reality of baseball. This shows us that the mathematics behind the statistics have been analyzed to the fullest extent possible. Thus, we should have confidence that every angle has been pursued.
There are plenty of academic articles in the world that present interesting findings for which the basis is beyond our reach. That doesn’t give us cause to treat everything that is published in this manner as unproven and of suspect veracity.
If Moe Greene really could run that fast, he wouldn’t have gotten shot in the eye.
I have a bone to pick with you people. First of all, you’re all morons. Kidding of course. But I leave for 2 days and I come back to this? People being called morons on a light-hearted baseball blog? Wowza! You gotta love the Internet. Or no you don’t.
Unrelated topic – has anyone been watching the horrible and unwatchable and yet strangely hypnotic at the same time show “October Road”? I’m not sure which part of that show I hate the most – the acting, the dialogue, the premise, the fact that they all get together [seriously] and play in a fake band with no instruments, or the fact that none of the characters are remotely likeable. And yet, I watch it every week and I was glued to the TV last night. What the frig?!?! Can I blame the wife?
Can Joe or someone else answer a question for me? In the Trammell/George Bell conversation, Joe said that the difference in on-base percentage indicated that Bell did not walk at all. The way I read it, the difference between Bell’s batting average and on-base percentage is 44 points (.044). Trammell’s difference is 59 points (.059). If my math is correct, over 500 plate appearances, Trammell walked 7-8 times more than Bell. Am I doing the math right? And if I am, is that really that big a difference? Yes, I could go look up the stats at baseball-reference.com, but I’m trying to think for myself here. Thanks.
Never mind, I just looked it up. Clearly, my math is wrong, as Trammell walked 21 more times than Bell that year. So I guess my next question is how do we measure that value to the team? Over the course of a season, every 8 games (or more or less once a week), Trammell drew a walk where Bell made an out. How valuable is that, when it comes down to it?
Sorry if these questions prove I’m no smarter than a fifth grader. (I can take personal shots at myself, right?)
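A quick bit of algebra shows why the shortcut above undercounts. Ignoring hit-by-pitches and sacrifice flies (an assumption to keep it simple), OBP minus AVG works out to BB × (1 − AVG) / PA, where PA = AB + BB. So each walk only raises the gap by (1 − AVG)/PA, not 1/PA, and you have to divide by (1 − AVG) to get walks back out. The function names and the .300 hitter below are hypothetical, for illustration only.

```python
def obp_minus_avg(hits, at_bats, walks):
    """OBP minus AVG, ignoring hit-by-pitches and sacrifice flies."""
    avg = hits / at_bats
    obp = (hits + walks) / (at_bats + walks)
    return obp - avg

def walks_from_gap(gap, avg, plate_appearances):
    """Invert gap = BB * (1 - AVG) / PA to estimate walks."""
    return gap * plate_appearances / (1 - avg)

# Naive reading: treat every point of the gap as one walk per plate appearance.
print(round(0.059 * 600, 1))
# Corrected: a hypothetical .300 hitter with a 59-point gap over 600 PA.
print(round(walks_from_gap(0.059, 0.300, 600), 1))
```

The corrected figure (about 51 walks) is well above the naive 35, which is part of why the back-of-the-envelope estimate came out so far below the real 21-walk difference; the two players’ different batting averages and plate appearances matter too.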
“It seems to me you could normalize to the distance from the player’s original position.”
If you knew it, you could do it. But you rarely know it with enough accuracy – you certainly can’t tell based on the typical tight TV shots, unless you’re lucky enough to have a replay from the right angle – and no one at the ballpark records the location of every fielder when the ball is put into play.
Geez, about half the stuff you write is stuff I’ve been meaning to write, the other half is stuff I never could write. This was in the first group, and I liked it.
Worth noting in the future … OPS+ has a great virtue that is wildly underpublicized, and that is that when it is derived correctly (as B-R does), it more or less corrects the biggest problem with OPS, which is that it underweights OBP in favor of slugging, in addition to being park- and league-neutral.
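To make the OBP-weighting point concrete, here is the OPS+ formula Baseball-Reference documents, 100 × (OBP/lgOBP + SLG/lgSLG − 1), with the park adjustment left out to keep the sketch short. Because league OBP is smaller than league SLG, one point of OBP moves OPS+ more than one point of SLG. The league averages and the two hitters below are hypothetical.

```python
def ops_plus(obp, slg, lg_obp, lg_slg):
    """OPS+ without park adjustment: 100 * (OBP/lgOBP + SLG/lgSLG - 1)."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)

LG_OBP, LG_SLG = 0.330, 0.420  # hypothetical league averages

# Two hypothetical hitters with the same .830 OPS
on_base_guy = ops_plus(0.380, 0.450, LG_OBP, LG_SLG)
slugger = ops_plus(0.330, 0.500, LG_OBP, LG_SLG)
print(f"{on_base_guy:.0f} vs {slugger:.0f}")

# Each point of OBP is worth lgSLG / lgOBP points of SLG in this formula
print(f"OBP point worth {LG_SLG / LG_OBP:.2f}x a SLG point")
```

Raw OPS would rate these two hitters identically; the league-relative version tilts toward the on-base guy.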
“However, I sincerely doubt that the data work that went into ANY of these stats that Joe is talking about has not been checked and re-checked ad nauseam.”
You’d be surprised. People make basic errors in their assumptions that would get themselves an F grade in Stats 101.
I have a question about the runs saved pitching stat.
I understand it is runs saved against average, and so I am assuming that 77 runs saved in 2000 has been equalized in some manner against 75 runs saved in 1913, but can somebody explain how?
On the face of it, from my uneducated viewpoint, it would seem that saving 75 runs in 1913 would carry greater value than saving 77 runs in 2000. Percentage-wise, saving 77 runs (about 2 runs a game over a 30-plus-start season) when the average game has 9 or 10 runs scored carries less value than saving 2 runs a game when the average game has only 5 or 6 runs scored.
I’d be interested in hearing how the statistic handles that.
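One way to do the percentage-wise comparison described above is to divide runs saved by the runs a league-average pitcher would have allowed in the same innings. This is only a sketch of that idea, not necessarily how the list in the post normalizes across eras; the innings totals and league run rates are hypothetical.

```python
def expected_runs(innings, league_runs_per_9):
    """Runs a league-average pitcher would allow in the same innings."""
    return league_runs_per_9 * innings / 9

def relative_runs_saved(runs_saved, innings, league_runs_per_9):
    """Runs saved as a fraction of what an average pitcher would allow."""
    return runs_saved / expected_runs(innings, league_runs_per_9)

# Hypothetical seasons with similar raw totals in very different run environments
low_scoring = relative_runs_saved(75, innings=300, league_runs_per_9=3.6)
high_scoring = relative_runs_saved(77, innings=250, league_runs_per_9=5.0)
print(f"low-scoring era:  {low_scoring:.1%} fewer runs than average")
print(f"high-scoring era: {high_scoring:.1%} fewer runs than average")
```

Converting runs to wins would be another route, since a run buys more wins when fewer runs are scored, but that needs a runs-per-win estimate this sketch does not attempt.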
Wally, et al:
Of course you are right — just because one does not fully understand the analytical basis for a particular theory or principle does not mean one should disregard it in favor of something simpler. I’m on your side on this — I was just speculating as to why some fans and media still cling to things like batting average and RBI as significant measures of a player’s performance.
The answer may be that most fans don’t consider baseball to be on the level of complexity of classical physics or colon cancer. It’s baseball! It’s a game kids play! To quote Nuke LaLoosh, you throw the ball, you hit the ball, you catch the ball. Many fans may think that something as simple as baseball may be fairly evaluated with simple statistics without resorting to complicated statistical analysis.
By the way, Tim Keown has a good column today on espn.com in which he compares the war between statheads and “traditionalists” to the creationism/evolution debate. In his words, this is a war which may never be over, because each side has such disdain for the other.
Mike:
“If you knew it, you could do it. But you rarely know it with enough accuracy – you certainly can’t tell based on the typical tight TV shots, unless you’re lucky enough to have a replay from the right angle – and no one at the ballpark records the location of every fielder when the ball is put into play.”
I was under the assumption that this is exactly what is going on in these +/- fielding metrics. I’m not 100% certain, but I do believe I read a Neyer or BP piece about this. Particularly about the ones done by the teams themselves that are not released to the public. Do you honestly think that the Red Sox, for example, aren’t recording all this when the average player on their team is worth 5-6 million a year?
And you wouldn’t need every possible angle recorded, just 2 cameras over the whole field and you could triangulate the position of any ball hit and any fielder.
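A rough sketch of the two-camera idea from the comment above, on a flat field in two dimensions: each camera reports the bearing to the ball (or fielder) in a shared coordinate system, and the position is where the two sight lines cross. The camera placement, the bearings, and the `triangulate` helper are hypothetical; a real tracking system would also need calibration, height, and synchronized timing.

```python
import math

def triangulate(cam1, bearing1, cam2, bearing2):
    """Intersect two sight lines on a flat field.

    cam1, cam2: (x, y) camera positions; bearing1, bearing2: angles in radians,
    measured counterclockwise from the +x axis of the same field coordinates.
    """
    (x1, y1), (x2, y2) = cam1, cam2
    dx1, dy1 = math.cos(bearing1), math.sin(bearing1)
    dx2, dy2 = math.cos(bearing2), math.sin(bearing2)
    denom = dx1 * dy2 - dy1 * dx2  # 2D cross product of the two directions
    if abs(denom) < 1e-12:
        raise ValueError("sight lines are parallel; position is undetermined")
    # Solve cam1 + t*d1 == cam2 + s*d2 for t (Cramer's rule)
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / denom
    return x1 + t * dx1, y1 + t * dy1

# Hypothetical cameras 100 feet apart, both sighting a fielder at (50, 50)
print(triangulate((0, 0), math.atan2(50, 50), (100, 0), math.atan2(50, -50)))
```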
“You’d be surprised. People make basic errors in their assumptions that would get themselves an F grade in Stats 101.”
Sure, that happens; we all make mistakes. But if you really keep up with baseball research, it is checked and rechecked not only inside one institution but across several (at least the stuff the public sees). What Bill James writes one day gets studied by BP the next, and so on. In short, the assumptions get checked just as the math gets checked, and while an individual may be prone to mistakes, the field as a whole is relatively mistake-free, self-checking, and, most importantly, pretty upfront about the shortcomings of any method. For example, any stat normalized to a league average or to replacement level will be upfront that where you set that bar greatly influences the outcome (for instance, a pitcher who throws 180 innings at an ERA+ of 200 will be far ahead of a guy who pitches 250 innings at an ERA+ of 120 when both are measured against league average, but against replacement level the workhorse’s extra 70 innings earn more credit, and he closes much of that gap). And determining where to set that bar can be difficult in itself.
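To see how much the bar matters, here is a sketch that turns ERA+ back into ERA with the simplified formula ERA+ = 100 × lgERA / ERA (no park adjustment) and then counts runs saved against two baselines. The league ERA of 4.50 and the replacement-level ERA of 5.50 are hypothetical round numbers, not anyone’s published values.

```python
def era_from_era_plus(era_plus, league_era):
    """Invert a simplified ERA+ = 100 * lgERA / ERA (no park adjustment)."""
    return 100 * league_era / era_plus

def runs_above(innings, era, baseline_era):
    """Runs saved relative to a pitcher with the baseline ERA over the same innings."""
    return innings * (baseline_era - era) / 9

LEAGUE_ERA = 4.50       # hypothetical league average
REPLACEMENT_ERA = 5.50  # hypothetical replacement level

workhorse = (250, era_from_era_plus(120, LEAGUE_ERA))  # (innings, ERA)
ace = (180, era_from_era_plus(200, LEAGUE_ERA))

for label, baseline in (("league average", LEAGUE_ERA), ("replacement", REPLACEMENT_ERA)):
    w = runs_above(*workhorse, baseline)
    a = runs_above(*ace, baseline)
    print(f"vs {label}: workhorse {w:.1f}, ace {a:.1f}, gap {a - w:.1f}")
```

With these inputs the ace stays ahead either way, but the gap shrinks from about 24 runs to about 16 once the baseline drops to replacement level, because every inning the workhorse pitches above that bar earns credit.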
Keith K, I definitely agree with your point that the acceptability of stats is directly related to their accessibility. People will always love the classic stats because you can watch a game and understand how everything that happens affects those, i.e. someone gets a hit and their average goes up, etc. It’s certainly tough to sit in front of the TV keeping track of how everyone’s Win Shares are piling up.
What’s the name of the sports site that runs all the NFL simulations and can tell you the chances of winning (over thousands of simulations) using different strategies (like choosing to go for it almost exclusively on 4th down)?
Does the site do the same thing for baseball? Could you plug in two different parameters thusly: one Mark Teahen bats second and Mark Grudzielanek 7th, the other Grudz 2nd and Teahen 6th (use an average of their ‘06 and ‘07 stats as the baseline for each player; I guess you’d need more than stats, you’d need the site’s database to have a record of their exact performance in every situation over the past several years and be able to generalize that to the likelihood of those events being repeated over the course of thousands of simulations).
Could it then tell you the average number of runs the Royals would score over thousands of sims? Just as importantly, could it tell you what the per-game mode would be and how that would translate into the Royals’ likely record in close games?
All the information on how to value individual players is very, very interesting, but it’s not as interesting as how it relates to ways in which players can be used in combination to maximize victories for a team.
Supposedly the Brewers and Cardinals are considering batting the pitcher 8th (you could even make an argument for 7th). This strategy would seem to be best used in conjunction with batting your 3 best hitters in the top three spots. Anybody have the baseball knowledge and mathematical acuity to disprove the conjecture that the following lineup would outscore the conventional lineup? 1. Gordon 2. Butler 3. Guillen 4. Teahen 5. Grudz 6. Buck 7. Pena 8. DeJesus 9. Gathright
The obvious argument is that Gordon will lead off some games by homering or doubling and by leading off the Royals lose his RBI potential, except if he homers, the Royals have the lead, and if he doubles, with Butler, Guillen, Teahen to follow, he’s going to usually score, and again, Royals lead. Not only that, but the Royals are then set up better for the 3rd and 4th innings as well.
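This is not the site asked about above, just a toy Monte Carlo sketch of the kind of lineup simulation described. The per-plate-appearance rates are made up, and the rules are deliberately crude: no double plays, steals, errors or sacrifices; on a hit every runner advances exactly as many bases as the batter; on a walk runners move only when forced.

```python
import random

EVENTS = ("out", "walk", "single", "double", "triple", "homer")
BASES_ON_HIT = {"single": 1, "double": 2, "triple": 3, "homer": 4}

def walk(bases):
    """Batter to first; runners move up only when forced. Returns (bases, runs)."""
    first, second, third = bases
    runs = 1 if (first and second and third) else 0
    if first and second:
        third = True
    if first:
        second = True
    return [True, second, third], runs

def hit(bases, n):
    """Batter and every runner advance exactly n bases (a crude simplification)."""
    new_bases, runs = [False, False, False], 0
    for base, occupied in enumerate(bases):  # base 0 = first, 2 = third
        if occupied:
            if base + n >= 3:
                runs += 1
            else:
                new_bases[base + n] = True
    if n >= 4:
        runs += 1                # batter scores on a homer
    else:
        new_bases[n - 1] = True  # batter stops at first, second or third
    return new_bases, runs

def simulate_game(lineup, rng, innings=9):
    """Play one game; lineup is a list of dicts mapping each event to a rate."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            rates = lineup[batter % len(lineup)]
            event = rng.choices(EVENTS, weights=[rates[e] for e in EVENTS])[0]
            batter += 1
            if event == "out":
                outs += 1
            elif event == "walk":
                bases, scored = walk(bases)
                runs += scored
            else:
                bases, scored = hit(bases, BASES_ON_HIT[event])
                runs += scored
    return runs

def average_runs(lineup, games=10_000, seed=0):
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games

# Hypothetical per-plate-appearance rates (each dict sums to 1)
average_hitter = {"out": 0.67, "walk": 0.09, "single": 0.16,
                  "double": 0.05, "triple": 0.005, "homer": 0.025}
good_hitter = {"out": 0.60, "walk": 0.12, "single": 0.17,
               "double": 0.06, "triple": 0.005, "homer": 0.045}
pitcher = {"out": 0.85, "walk": 0.03, "single": 0.10,
           "double": 0.015, "triple": 0.0, "homer": 0.005}

conventional = [average_hitter] * 2 + [good_hitter] * 3 + [average_hitter] * 3 + [pitcher]
stacked_top = [good_hitter] * 3 + [average_hitter] * 4 + [pitcher, average_hitter]
print(average_runs(conventional), average_runs(stacked_top))
```

The gap between two sensible orders tends to be small next to game-to-game noise, so it takes many simulated games before a difference means anything; answering the close-games question would mean keeping the per-game run distribution instead of just the average.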
Bret – Ellis Burks had a 149 OPS+ in 1996. He had a monster year:
.344 /.408 /.639
142 runs
211 hits
45 doubles
8 triples
40 home runs
128 RBIs
32 stolen bases
“Over a full season, the fastest guy might — again MIGHT — score 10 to 20 more runs than the slowest guy if given the same opportunities. ”
That’s a pretty good estimation of the value of baserunning (although not necessarily of actually scoring runs in the literal sense). The stats I developed add up the contribution across five different aspects of running, and the greatest single season of the Retrosheet era was at +20 runs for Maury Wills in 1962 (http://danagonistes.blogspot.com/2008/02/baserunning-for-ages.html).
I also found Willie Wilson to be the best overall baserunner of the Retrosheet era.
As far as James’ numbers are concerned, the problem is that he doesn’t take into consideration very much context and so looking at overall percentages on advancements for single seasons doesn’t necessarily give the complete picture. I talked about some of the issues in a column in 2006 (http://www.baseballprospectus.com/article.php?articleid=5774).
Great article though. Thanks
I learned a lot from the article about the different SABR stats, what they are, and how they are useful, but there is one huge, glaring mistake, in my opinion, in evaluating baserunning value: the impact of good baserunning on run scoring is completely dependent on the type of players who hit behind, and perhaps in front of, the player in question.
Papi may have been close to Crawford simply because for much of the year Crawford had Ty Wigginton hitting behind him, whereas Papi had Manny behind him. Obviously, you can see how this difference can confound the effect of baserunning on scoring runs.
In addition, the value of baserunning has to be taken in the context of the team the player is on. I bet baserunning would be more valuable for a guy that has contact hitters behind him than a guy who has power hitters behind him.
Another factor is how the threat of a guy stealing second, for example, impacts the quality of pitches the next batter sees and therefore how their production is affected. It would only be human for any pitcher to be slightly less effective when he has to worry about the guy on first.
I am wondering whether there are more developed methods for evaluating the impact of baserunning that take into account some of the above considerations.
I’ll second G. Young’s question about runs saved. First, it seems odd that only 6 of the top 20 seasons on that list occurred before the 1990s. And there is no question that one run saved in 1968 was more valuable than one run saved in 1997. Are these numbers normalized by era? Am I not reading this list the right way?
In response to Pete Ridges: no Anson surely did not act alone. But there were black major leaguers (not a lot of them but some) prior to Anson being the first manager to refuse to play against them. That’s an historical fact.
In Springsteen’s song “The Angel,” what does The Boss mean by “baseball cards poked in his spokes?” We’re talking about a guy on a motorcycle, right? A modern day cowboy roaming the untamed, man-made, flatlands of Jersey. I get that people used to put cards in the spokes of their bicycles, but in the spokes of a motorcycle? That, I don’t get.
For fielding stats, I don’t think you want to account for the starting position of fielders. The anecdote is Cal Ripken’s ability to position himself well given the pitcher and hitter. He didn’t have the best movement at shortstop, but he still got to more balls than most. I do agree you somehow need to account for pitcher and hitter tendencies when computing the expected results to use as a baseline.
I was first shocked to see Clemens’s 1997 season rank so high, but then I checked baseball-reference.com and noticed that he led his league in innings pitched and complete games, all while boasting a 221 ERA+. The Win Shares system credits him with 32 Win Shares, which is actually the highest total we’ve seen from any modern pitcher. Granted, considering he accomplished it at age 34, it’s likely he was dabbling with illegal drugs, but putting that aside for a second, I think Clemens’s ’97 season tends to be underrated when discussing the “best” seasons by a pitcher ever. I was one of those people.
I think it is fairly obvious that positioning is taken into account for this particular fielding stat.
The stat relies not on the movement of the player to get to the ball, but rather on the location of the ball relative to the field as a whole and the “normal” position of a fielder.
For example, take a batted ball hit 20 feet to the right of the “normal” position of a shortstop. If the shortstop has chosen for that pitch to position himself 20 feet to the right of his “normal” position, and he makes the play on a ball hit right at him, then he is credited as such, and his plus/minus rating improves based not upon the athleticism of the play but on his positioning.
It appears to me that positioning is well accounted for.
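A generic plus/minus-style sketch (not the proprietary system discussed in the post) shows both sides of this argument. Each fielder gets actual outs minus the league-average out rate for the bucket each ball falls in; the only thing that changes is how balls are bucketed. Bucketing by where the ball went on the field rewards good positioning, as described above, while bucketing by distance from where the fielder started (the earlier suggestion) wipes that credit out. The play records and their fields are hypothetical.

```python
from collections import defaultdict

def expected_out_rates(plays, key):
    """League-wide share of balls in each bucket that were turned into outs."""
    outs, chances = defaultdict(int), defaultdict(int)
    for play in plays:
        bucket = key(play)
        chances[bucket] += 1
        outs[bucket] += int(play["out"])
    return {bucket: outs[bucket] / chances[bucket] for bucket in chances}

def plus_minus(plays, fielder, key):
    """Sum of (actual - expected) outs on the balls charged to one fielder."""
    rates = expected_out_rates(plays, key)
    return sum(int(p["out"]) - rates[key(p)] for p in plays if p["fielder"] == fielder)

def by_field_zone(play):
    return play["zone"]  # where the ball went, relative to the field

def by_distance_from_start(play):
    return play["feet_from_start"] // 10  # how far from where the fielder stood

# Hypothetical data: A shades into the hole and converts every ball there;
# B plays straight up, has to range 20 feet, and gets to none of them.
plays = (
    [{"fielder": "A", "zone": "hole", "feet_from_start": 0, "out": True}] * 4
    + [{"fielder": "B", "zone": "hole", "feet_from_start": 20, "out": False}] * 4
)
print(plus_minus(plays, "A", by_field_zone), plus_minus(plays, "B", by_field_zone))
print(plus_minus(plays, "A", by_distance_from_start),
      plus_minus(plays, "B", by_distance_from_start))
```

By field zone, A comes out +2 and B −2; by distance from the starting spot, both come out even, because each was judged only against balls hit the same distance from where he stood.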
Joe,
I just re-read this post for the first time since you posted it months ago. This is the first time I’ve commented here, but I wanted to say that your blog is a must-read for me. You really are the best baseball blogger out there. Keep up the good work.