Sunday, September 21, 2008

Save The Day

Well, you all are about to get a little view into my number one hobby of the last two years: baseball!

For those who don't know the game, baseball seems extremely dreary, drawn out, and rather lacking in athleticism. To avoid rehashing the same old arguments, I won't even address those concerns. Instead, I want everyone to know that beneath baseball's surface lies perhaps the most advanced math of any sport, and lots of smart math and CS people are making their careers in baseball nowadays.

Since this isn't a baseball blog, I will keep this short. Essentially, the entire way that baseball was analyzed up until the 1970s was completely wrong. Given the long history of the sport and its tradition of keeping accurate statistics, you may be surprised, or even offended, by that statement.

I assure you, it is true. Bill James is sort of the Knuth of baseball: he pioneered the rigorous, numbers-driven analysis of the game. Michael Lewis brought all of this screaming into the light with Moneyball, and the baseball world really didn't like it. I won't bother trying to convince you of anything here, but it's a little like a more believable quantum mechanics: everyone denies it could be true, but experiments bear it out.

Among the central tenets of sabermetrics (a fancy word for investigating the inner workings of baseball) is that closers are overrated. If you don't follow baseball, a closer is a team's best relief pitcher, who typically pitches only in save situations, which nowadays usually means the last inning when his team is ahead by three or fewer runs.

A rather important event this year was Francisco Rodriguez (K-Rod) breaking the all-time single-season saves record. Of course, the numbers guys say that although it's a great achievement, it is way overblown. Basically, the Angels this year put K-Rod into an enormous number of save situations, so he got the record not through any extraordinary feat of pitching, but by being in the right place at the right time.

Again, the fact that he actually pulled this off is still very impressive. But if you need any proof that saves are overrated, take a look at Joe Borowski. He had 45 saves in 2007, even though his ERA of 5.07 was far worse than average for a reliever, let alone a closer. In 2008, he's out of baseball.

The evidence doesn't end there, though. Check out this rather amazing comparison between Boston's closer, Jonathan Papelbon, and K-Rod.

What you're seeing here is a ridiculous similarity between K-Rod and Papelbon. Papelbon has had far fewer save situations, and he hasn't executed quite as well in some of them, but his overall numbers are either identical or significantly better!

Why did this happen? The common criticism of modern stats analysis is that it reduces players to robots. Viewed that way, it happened because of sheer luck: Papelbon's earned runs (ER) came at bad times. (In fact, anyone who watches the Sox knows that Papelbon has been the victim of a lot of bloops and errors this year.) But there is more to it: players are not robots.

The most basic misunderstanding of numbers analysis concerns what statisticians throw away as "noise" or "luck". When a player is on a hot streak, hitting .450 over his past 10 games, a statistician will tell you "yes, but his batting average on balls in play (BABIP) is unsustainably high and will go down". Does that mean the player has definitely just been lucky?

No.

What it means is that there is no way (yet invented, anyway) of telling for sure whether the player has been lucky or just extremely good for the last 10 games (probably a mix of both...). Even further, there is absolutely no way (and this probably won't change) of predicting when the player's "hot streak" is going to end. The hot streak may be due to a resolved fight with his wife, a sleep problem that got fixed, or any number of factors we simply can't know about. The player may not have simply been lucky: he may have really put his swing mechanics together and be seeing the ball really well.

The key to baseball analysis, though, is realizing that those factors change so much that you may as well call them statistical noise. If the player defies a statistician's expectations and actually hits .450 for the whole season, it's not like the stat guy will be affronted and angry. He'll be amazed, since that has never happened. It may be possible, but based on the long history of baseball, it probably won't happen.
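To make the "noise" idea a bit more concrete, here is a rough sketch. It assumes (and this is a big simplification) that a hitter's at-bats are independent coin flips, each a hit with a fixed probability equal to his "true" average. The .300 true average, the roughly 4 at-bats per game (so 40 at-bats for a 10-game streak), and the 500 at-bats for a full season are all hypothetical numbers I picked for illustration. The function just sums the binomial tail probability of getting at least a given number of hits.

```python
from math import comb


def prob_at_least(hits: int, at_bats: int, true_avg: float) -> float:
    """Chance of at least `hits` hits in `at_bats` tries, assuming each
    at-bat is an independent hit with probability `true_avg` (binomial tail)."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )


# Hypothetical .300 hitter hitting .450 (18 for 40) over a 10-game stretch.
short_streak = prob_at_least(18, 40, 0.300)

# The same hitter hitting .450 (225 for 500) over a full season.
full_season = prob_at_least(225, 500, 0.300)

print(f"10-game .450 streak: {short_streak:.3f}")
print(f"full-season .450:    {full_season:.2e}")
```

Under these assumptions, a .300 hitter stumbles into a .450 ten-game stretch about 3 percent of the time, which across a league full of hitters means someone is always "hot". Stretching that same pace over a full season has a probability well under one in a billion, which is why the stat guy expects the streak to cool off.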


Well, I know that was a lot of rambling about baseball, and I may write some more posts about it in the future, but it is quite difficult to introduce a subject to an audience and then talk about a specific part of it. If anyone has suggestions for making any part of this clearer (it is hard to write clearly, since I know all of these things as basic knowledge now, but I surely didn't 2 years ago!), please tell me and I'll work on it.

Wednesday, September 17, 2008

90 miles per hour, girl, is the speed I drive

I was directed towards this article about removing traffic signs and creating a "shared space" to reduce accidents. I had read about this many times in the past, and sure enough, a quick check on Wikipedia confirmed it: Bohmte, the town in question, enacted this policy in 2007. By all accounts, it seems to have succeeded so far.

Ah, well, it's not the first time that my attention has been diverted towards a rehashed topic. In fact, I've thought quite a lot about traffic in the past couple of years, and I'd like to share my thoughts in a couple of blog entries.

The pith of the shared space concept is that signs are less effective than properly designed intersections. I can see several intuitive reasons for at least the initial success of this program.

First of all, I strongly believe that informational signs are becoming part of the ignored content that the brain automatically filters out. Much like ads on a frequently visited web page, which readers gloss over so thoroughly that the site may as well remove them (more on this another time, I hope), signs just aren't read by most commuters. Apparently, this problem is worse in Europe.

Second, there may be a correlation between the amount of information provided and the responsibility felt by the commuter. By responsibility, I don't mean that commuters literally feel that they are absolved of accident guilt. It is more of a responsibility to pay attention and not drive on autopilot.

Personally, I sometimes drive for minutes on my home town roads without really thinking about anything. I know each turn, each blind driveway, and each intersection, so I know where I have to worry about cars coming, and I know where I can pass bikers safely...right?

Not necessarily. Most accidents happen because of unexpected events. The less I have to think about the act of driving, the slower I am to react to out-of-the-ordinary events. If an animal wanders into the road, I'm sure I would react better if I had been actively scanning the road for bikers, pedestrians, and other cars than if I had been driving on autopilot.

This is sort of like a theory some of my high school teachers followed: move the kids around the classroom throughout the year, and they pay better attention. I may write a full post on this sort of subject, so I'll keep this tangent brief, but I absolutely believe that humans are far more alert in new situations than in ones done a hundred times before.


So, should we start tearing down signage across America? No, I don't think so. First of all, the concept of shared space relies on efficient road design, which is sorely lacking in America. Removing signs is not enough: you have to redesign the traffic flow to allow for the actual use cases. Shared space suggests rotaries, something that Americans (or at least New Englanders) struggle with quite a lot. Plus, I feel like this idea would completely break down in cities.

Still, I can't help but wonder. Part of my driving philosophy is to aggressively yield and maintain large gaps, so I'm all for anything that encourages such conscious effort by everyone. Again, this is a whole other post, but a lot of traffic problems could really be solved with a simple change in driving style.


What do you think? Should we start implementing the shared space plan in some rural towns? Or even throw it into a section of, say, downtown Portland, Oregon?

Friday, September 12, 2008

And, Back

But first, a non-technical post! Well, sort of...

This experiment in revealing detail was recently posted by the excellent Design with Intent, which I found a long while back as a neat explanation of how architecture can be used to influence behavior. Since then, the author has done a simply fabulous job of making his topics accessible, which is no small feat, since they usually aren't.

Anyway, back to the experiment. It is still an interesting application of the web for content delivery (I really like sites/pages that show ideas that can only be shown on the internet), so I suppose it is technological...but the main concept is merely an in-depth examination of everyday life. Writing one of these pages would be a good exercise for a creative writer or poet, I would assume, although I am neither.

I like clicking on the words in different orders. In that sense, it is somewhat like a limited choose-your-own-adventure story...

But seriously, I think this experiment emphasizes a couple of interesting things:

1) The amazing amount of decision-making, judgment, and coordination involved in everyday life
2) The fact that this is made almost entirely unconscious by our brain's amazing powers (which I will hopefully write about later). This makes human-level A.I. so hard to imagine at this point.
3) What level of detail is appropriate? In writing, details can be lauded as so realistic that the reader is transported into the scene...but can also be entirely boring. In memory, we subconsciously choose which details to remember (and usually, it's only a small bit of the total detail).

Actually, my Classical Music professor did a nice demonstration of typical detail memorization while trying to illustrate the limitations of oral tradition (versus writing down songs/stories). She rattled off an absurdly detailed story about nothing really important, then asked a student to recite it. He managed to remember almost the entire outline of the story, but had to make up words to fill in the details.

This was entirely expected, of course, but I still came away impressed by our ability to remember the essential details of life. If you want to try a similar exercise, just try to remember the details of the tea story from before- I bet you can't!



Well, that's it for now. My schedule for the semester is intense:
Algorithms (CS theory)
Artificial Intelligence (CS theory/application)
Analysis (Math)
Classical Music (yay!)
I will also be learning Ruby on Rails (to start...perhaps Django on Python later) with my friend Seth, who worked at YouTube this summer and absolutely rocked their socks off. Check out his mugshot!

Point is, you may be seeing quite a few posts regarding the 5 topics above. I hope to finish up the posts I had in mind over the summer, as well.