Monday, August 04, 2008

Baseball as a Metaphor for Index Construction

Yesterday's TV program, 60 Minutes, had a segment on Bill James, the statistical guru for the Boston Red Sox. Baseball fans are often great fans of baseball statistics, able to cite the batting averages of many players from years past and to recite records and record holders. As I understand Mr. James' history, he has developed new measures that are useful in predicting individual players' contributions to team success, and has introduced the use of statistics based on those measures in front-office decision making.

For example, for decades fans have kept track of the number of games won and lost per season by each pitcher, and considered the best pitchers to be those credited with the most wins. Yet it seems obvious that winning baseball games depends on the performance of the whole team and not just the pitcher, and that a pitcher's record will also depend to some degree on the overall strength of the pitching staff and on management decisions about whom to play and when. Mr. James has an alternative view:
As for pitching, he has said that won-loss records do not tell how good or how bad a pitcher is. "The most accurate thing is to focus on the strikeouts, the walks, the home runs allowed. And to evaluate the pitcher on that level," James explains.
Let me suggest that the right index depends on the use of the information. For most fans, knowing that a pitcher won more than 20 games in each of several seasons may suffice to justify the opinion that he is a very good pitcher. Assuming the fan does not bet the house on the outcome of games that the pitcher starts, that readily available and memorable statistic is enough. On the other hand, for the baseball executive seeking to allocate his financial resources to hire a team that will win the most games and have a chance to win the World Series, detailed analysis of complex data to understand the potential contribution of various pitchers to the team's success is very important, and worth considerable time and money. The more complex statistics that indicate in detail how well the pitcher performs are more than justified.
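To make the contrast between the two kinds of index concrete, here is a minimal sketch in Python. It is an illustration only and not Mr. James' own formula: the component index borrows the conventional weights of the "Fielding Independent Pitching" statistic (13 for home runs, 3 for walks, 2 for strikeouts), leaves out hit batters for simplicity, and uses an assumed scaling constant of 3.10. Both pitchers and all of their numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    """One season of pitching totals (all values here are invented)."""
    name: str
    wins: int
    losses: int
    innings: float
    strikeouts: int
    walks: int
    home_runs: int


def win_pct(p: PitcherSeason) -> float:
    """Traditional index: share of decisions that were wins (higher is better)."""
    return p.wins / (p.wins + p.losses)


def component_index(p: PitcherSeason, constant: float = 3.10) -> float:
    """FIP-style index built only from strikeouts, walks and home runs.

    Lower is better. The weights 13/3/2 follow the usual FIP convention;
    the constant is an assumed value that only shifts the scale.
    """
    raw = 13 * p.home_runs + 3 * p.walks - 2 * p.strikeouts
    return raw / p.innings + constant


# Two hypothetical pitchers with the same workload.
pitcher_a = PitcherSeason("Pitcher A", wins=20, losses=8, innings=210.0,
                          strikeouts=130, walks=70, home_runs=28)
pitcher_b = PitcherSeason("Pitcher B", wins=12, losses=13, innings=210.0,
                          strikeouts=220, walks=45, home_runs=15)

for p in (pitcher_a, pitcher_b):
    print(f"{p.name}: win pct {win_pct(p):.3f}, "
          f"component index {component_index(p):.2f}")
```

In this invented example the two indices disagree: Pitcher A looks better by won-loss record, while Pitcher B looks better on the things the pitcher controls most directly. Which index to trust depends, as argued above, on what the information will be used for.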

In general, the more that depends on the decision to be made, the more can be spent obtaining and analyzing information on which to base the decision. Mr. James is quite right in suggesting that in modern baseball, which is big business, relying on indices merely because of their historical importance is costly, and that it pays to figure out exactly what information is needed and then, if necessary, to construct new indices that better approximate that information.
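The point that the worth of information grows with the stakes can be sketched with a standard decision-analysis quantity, the expected value of perfect information. This is my own illustration rather than anything from the interview; the function name, the probability, and the dollar amounts are all hypothetical. The decision maker either "acts" (the fan places a small bet, the executive signs the pitcher) or does nothing, and the question is how much it would be worth to know in advance whether the pitcher is really good.

```python
def value_of_perfect_information(p_good: float,
                                 gain_if_good: float,
                                 loss_if_bad: float) -> float:
    """Expected value of knowing in advance whether the pitcher is good.

    Without information, the decision maker acts only if the expected
    payoff of acting is positive; otherwise the payoff is zero.
    With perfect information, the decision maker acts only when the
    pitcher is in fact good, and so never suffers the loss.
    """
    ev_act = p_good * gain_if_good - (1 - p_good) * loss_if_bad
    ev_without_info = max(0.0, ev_act)
    ev_with_info = p_good * gain_if_good
    return ev_with_info - ev_without_info


# Same belief about the pitcher, very different stakes (hypothetical numbers).
fan_value = value_of_perfect_information(0.6, gain_if_good=100,
                                         loss_if_bad=100)
executive_value = value_of_perfect_information(0.6, gain_if_good=10_000_000,
                                               loss_if_bad=10_000_000)

print(f"Worth of better information to the fan: ${fan_value:,.0f}")
print(f"Worth of better information to the executive: ${executive_value:,.0f}")
```

Real indices are never perfectly informative, so this quantity is only an upper bound on what it is worth paying for analysis. Even so, the bound scales directly with the amount at stake, which is why a memorable statistic is enough for the fan while the executive can justify a statistical staff.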

The interview ended with the following exchange:
"There's something in baseball that you really can't quantify. And that is, the mix of guys at a given moment, there's some magic or whatever, that goes on. That all the James-ian theory in the world will never find the answer to," Safer says.

"It's mostly intangible," James says. "I mean, I don't understand most of it. I don't think that anybody in the Red Sox would tell you that we have that magic stuff figured out. But there are people here who understand that part of the equation a lot better than I do."
This seems to me to be a real challenge. I agree that a baseball team is more than the sum of the skills of its individual players, but it seems to me that one should be able to develop indicators of "fit". If marriage bureaus can do matchmaking using quantitative techniques, then why should a baseball team not be able to develop indices to measure the compatibility of team members?

I would think one could develop quantitative information, for example from individual and organizational psychology, that would complement the performance statistics, and be used in conjunction with scouting reports and success in tryouts to improve final decisions on player selection.

Generalizing, if you can formulate the hypothesis that dimensions you are not currently measuring influence the outcomes you seek to predict, then try to figure out whether indices can be found to measure those dimensions. Just as baseball executives should not limit their thinking to traditional baseball statistics, so analysts in general should not limit their search for indices and data to those that have commonly been used in the past.
