Wednesday, September 19, 2012

Thinking about the information basis of analysis

I found this interesting. It comes from Chapter 5, "Do You Really Need More Information?", of Psychology of Intelligence Analysis by Richards J. Heuer, Jr.:
Eight experienced horserace handicappers were shown a list of 88 variables found on a typical past-performance chart--for example, the weight to be carried; the percentage of races in which horse finished first, second, or third during the previous year; the jockey's record; and the number of days since the horse's last race. Each handicapper was asked to identify, first, what he considered to be the five most important items of information--those he would wish to use to handicap a race if he were limited to only five items of information per horse. Each was then asked to select the 10, 20, and 40 most important variables he would use if limited to those levels of information. 
At this point, the handicappers were given true data (sterilized so that horses and actual races could not be identified) for 40 past races and were asked to rank the top five horses in each race in order of expected finish. Each handicapper was given the data in increments of the 5, 10, 20 and 40 variables he had judged to be most useful. Thus, he predicted each race four times--once with each of the four different levels of information. For each prediction, each handicapper assigned a value from 0 to 100 percent to indicate degree of confidence in the accuracy of his prediction. 
When the handicappers' predictions were compared with the actual outcomes of these 40 races, it was clear that average accuracy of predictions remained the same regardless of how much information the handicappers had available. Three of the handicappers actually showed less accuracy as the amount of information increased, two improved their accuracy, and three were unchanged. All, however, expressed steadily increasing confidence in their judgments as more information was received. 
This ties into my last post. Note that handicappers may change their predictions modestly on "test and retest," which suggests that any single prediction might well be regarded as one draw from a distribution of alternative estimates.

I don't really have an explanation for the three handicappers whose predictions became less accurate as they tried to incorporate more information.

It occurs to me that the handicappers may have used the additional indicators mainly to validate the information they were already relying on, rather than to improve their predictions. A handicapper might treat a horse's time in its last race as a key indicator. If the handicapper later discovers that the horse stumbled coming out of the starting gate (a rare occurrence), that might explain a poor time. Discovering instead that the start was normal might add confidence to the original judgement. If the later information was indeed used to validate assumptions about the normally useful indicators, it could have increased confidence in the estimates without improving their accuracy.

Note that there is an economics of adding information to a prediction. One would normally expect decreasing returns from each additional piece of information on which a prediction is based. On the other hand, each piece of information added has a cost. Thus, at some point, the value of an additional piece of information should fall below the cost of obtaining it. If you were going to bet a dollar with a friend on a race, it would not make much sense to spend a lot of time handicapping that race. If you were advising a foreign potentate who planned to bet a million dollars on the race, you might be exhaustive in data mining before making your estimate.
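To make the stopping rule concrete, here is a minimal sketch under purely hypothetical numbers. It assumes that the k-th item of information raises the expected return on a bet by `stake * base_edge * decay**(k - 1)`, so each item is worth a fixed fraction of the one before it (decreasing returns), and that every item costs the same flat amount to obtain. The function name, the 2% first-item edge, the 0.7 decay factor, and the $1 cost per item are all illustrative assumptions, not figures from Heuer; the cap of 88 items simply echoes the 88 variables in the study.

```python
def items_worth_gathering(stake, base_edge=0.02, decay=0.7,
                          cost_per_item=1.0, max_items=88):
    """Count how many items of information are worth gathering for a bet.

    Hypothetical model (assumption, not from the source): the k-th item
    (k = 1, 2, ...) adds stake * base_edge * decay**(k - 1) of expected
    value, and every item costs cost_per_item to obtain.
    """
    count = 0
    for k in range(1, max_items + 1):
        marginal_value = stake * base_edge * decay ** (k - 1)
        # Marginal values only shrink, so the first item that costs more
        # than it is worth marks the point to stop gathering.
        if marginal_value < cost_per_item:
            break
        count += 1
    return count


if __name__ == "__main__":
    for stake in (1, 1_000, 1_000_000):
        print(stake, items_worth_gathering(stake))
```

Under these assumptions the one-dollar bet with a friend justifies gathering nothing (0 items), a $1,000 bet justifies 9 items, and the potentate's million-dollar bet justifies 28. The exact counts depend entirely on the invented parameters; the point is only that, with decreasing returns and a fixed cost per item, the sensible amount of handicapping grows with the stake.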
