Well, you all are about to get a little view into my number one hobby of the last two years: baseball!
To those who don't know the game, baseball seems extremely dreary, drawn out, and rather lacking in athleticism. To avoid rehashing the same old arguments, I won't even address those concerns. Instead, I want everyone to know that behind the scenes, baseball has perhaps the most advanced math of any sport, and lots of smart math and CS people are making their careers in baseball nowadays.
Since this isn't a baseball blog, I will keep this short. Essentially, the way baseball was analyzed up until the 1970s was completely wrong. Given the sport's long history and its tradition of keeping accurate statistics, you may be surprised, or even offended, by that statement.
I assure you, it is true. Bill James is sort of the Knuth of baseball: he pioneered the rigorous statistical analysis of the game. Michael Lewis brought all of this screaming into the light with Moneyball, and the baseball world really didn't like it. I won't bother trying to convince you of anything here, but it's a little like a more believable quantum mechanics: everyone denies it could be true, but experiments bear it out.
Among the central tenets of sabermetrics (a fancy word for investigating the inner workings of baseball through its numbers) is that closers are overrated. If you don't follow baseball, a closer is the team's best relief pitcher, who typically pitches only in a save situation, which nowadays usually means just the last inning, with his team ahead by three runs or fewer.
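For the curious, the official scoring rule for a save is a bit more detailed than "ahead by three or fewer in the last inning." Here is a rough Python sketch of it, with simplifications made on purpose: it ignores the official scorer's judgment call about whether a long reliever pitched "effectively," it assumes a save requires entering with the lead, and the function and parameter names are just made up for illustration.

```python
def earns_save(team_won: bool, finished_game: bool, is_winning_pitcher: bool,
               outs_recorded: int, lead_on_entry: int, runners_on_entry: int) -> bool:
    """Simplified save rule (hypothetical helper); skips the scorer's 'effective' judgment."""
    # He must finish a game his team won, not get the win, and record at least one out.
    if not team_won or not finished_game or is_winning_pitcher or outs_recorded < 1:
        return False
    # Simplifying assumption: a save requires entering with his team ahead.
    if lead_on_entry < 1:
        return False
    # Lead of three runs or fewer, and he pitches at least one full inning (3 outs).
    small_lead_full_inning = lead_on_entry <= 3 and outs_recorded >= 3
    # Potential tying run on base, at bat, or on deck:
    # the runners plus the batter plus the on-deck hitter are enough to tie.
    tying_run_threatening = lead_on_entry <= runners_on_entry + 2
    # Pitches at least three innings (9 outs), regardless of the lead.
    three_innings = outs_recorded >= 9
    return small_lead_full_inning or tying_run_threatening or three_innings


# Classic closer outing: enters the 9th up by 3, bases empty, gets 3 outs.
print(earns_save(True, True, False, 3, 3, 0))  # True
# Enters the 9th up by 5, bases empty, gets 3 outs: no save.
print(earns_save(True, True, False, 3, 5, 0))  # False
```

This is why a team that plays lots of close games can hand its closer a pile of save chances without him pitching any differently.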
A rather important event this year was Francisco Rodriguez (K-Rod) breaking the single-season saves record. Of course, the numbers guys say that although it's a great achievement, it is way overblown. Basically, the Angels got K-Rod into an enormous number of save situations this year, so he got the record not through some extraordinary feat of pitching, but by being in the right place at the right time.
Again, the fact that he actually pulled it off is still very impressive. But if you need any proof that saves are overrated, take a look at Joe Borowski. He had 45 saves in 2007, even though his 5.07 ERA was far worse than average for a reliever, let alone a closer. In 2008, he's out of baseball.
The evidence doesn't end there, though. Check out this rather amazing comparison between Boston's closer, Jonathan Papelbon, and K-Rod.
What you're seeing here is a ridiculous similarity between K-Rod and Papelbon. Papelbon has had far fewer save situations, and he hasn't converted them quite as reliably, but his overall numbers are either identical or significantly better!
Why did this happen? The common criticism of modern stats analysis is that it reduces players to robots. Seen that way, it happened because of sheer luck: Papelbon's earned runs (ER) came at bad times. (In fact, anyone who watches the Sox knows that Papelbon has been the victim of a lot of bloops and errors this year.) But there is more to it: players are not robots.
The basic misunderstanding about numbers analysis concerns what statisticians throw away as "noise" or "luck". When a player is on a hot streak, hitting .450 over the past 10 games, a statistician will tell you, "Yes, but his batting average on balls in play (BABIP) is unsustainably high and will go down." Does that mean the player has definitely just been lucky?
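For reference, BABIP is usually computed as (H - HR) / (AB - K - HR + SF): hits that weren't home runs, divided by the at-bats that ended with the ball in the field of play (plus sacrifice flies). League-average BABIP tends to hover around .300. Here is a small Python sketch; the stat line is a made-up hot streak, not any real player's numbers.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Plain batting average: H / AB."""
    return hits / at_bats


def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int = 0) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical 10-game hot streak (invented numbers for illustration).
streak = dict(hits=18, home_runs=2, at_bats=40, strikeouts=6, sac_flies=0)
print(f"AVG   {batting_average(streak['hits'], streak['at_bats']):.3f}")  # AVG   0.450
print(f"BABIP {babip(**streak):.3f}")                                    # BABIP 0.500
```

A BABIP of .500 means half of every ball this hitter put in play fell for a hit, which is the sort of number the stat guy expects to drift back toward the league norm.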
No.
What it means is that there is no way (yet invented, anyway) of telling for sure whether the player has been lucky or just extremely good for the last 10 games (probably a mix of both). Even further, there is absolutely no way (and this probably won't change) of predicting when the player's hot streak will end. The hot streak may be due to a fight with his wife being resolved, a sleep problem being fixed, or any number of factors we simply can't know about. The player may not have simply been unsustainably good for the past 10 games; he may have really put his swing mechanics together and be seeing the ball really well.
The key to baseball analysis, though, is realizing that those factors change so much that you may as well call them statistical noise. If the player defies a statistician's expectations and actually hits .450 for the whole season, it's not as if the stat guy will be affronted and angry; he'll be amazed, since that has never happened. It may be possible, but given the long history of baseball, it probably won't happen.
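To put a rough number on the noise argument, you can treat each at-bat as an independent coin flip that comes up "hit" with the player's true batting average. That is a big simplifying assumption (it ignores exactly the hot-streak factors above), and the .300 true average and the 40 and 600 at-bat counts are hypothetical round numbers standing in for "about 10 games" and "a full season." The sketch computes the binomial chance that such a hitter bats .450 or better over each span.

```python
from math import comb


def prob_at_least(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


TRUE_AVG = 0.300  # assumed true-talent batting average (hypothetical)

# 18 for 40 and 270 for 600 are both exactly .450.
short = prob_at_least(18, 40, TRUE_AVG)     # roughly a 10-game stretch
season = prob_at_least(270, 600, TRUE_AVG)  # roughly a full season

print(f"10-game .450+ chance: {short:.4f}")
print(f"season  .450+ chance: {season:.2e}")
```

Under this toy model, a .450 run over 10 games from a .300 hitter is unusual but not shocking, while keeping that pace for a whole season is astronomically unlikely. That gap is why the stat guy is comfortable calling the hot streak noise.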
Well, I know that was a lot of rambling about baseball, and I may write more posts about it in the future, but it is quite difficult to introduce a subject to an audience and then talk about a specific part of it. If anyone has suggestions for making any part of this clearer (it's hard to write clearly, since all of this is basic knowledge to me now, though it surely wasn't two years ago!), please tell me and I'll work on it.
Sunday, September 21, 2008