Monday, September 17, 2012

Some tests are better at determining quality of the performer.

There is no doubt in a golf match whether or not the ball falls into the cup. In tournament play, scores are very carefully kept. Courses are carefully designed to present individual challenges while maintaining considerable inter-course similarity; they define par for each hole, so that scores for 18 or 72 holes with respect to par are relatively similar from course to course. Courses do play differently on different days.

The idea is that the outcome of a tournament is primarily determined by the play of the golfer. Players obviously differ from one another in their ability. The score on a specific day is also affected by how well the golfer is playing that day and by what might be termed luck -- how a ball bounces or the specific place in which it lands. These effects seem to be relatively random. The first day of a tournament often sees relatively low-ranked players in the top ten, but after four days and 72 holes the leaders are usually among the highest-ranked players on the tour. Indeed, over thousands of holes played during a year, the best players seem to come out on top of the rankings.
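A rough simulation can illustrate why luck matters less as more holes are played. The sketch below is only an illustration under made-up assumptions: two hypothetical golfers whose average rounds differ by one stroke, with day-to-day luck modeled as normally distributed noise. The function name, the 70-stroke baseline, the skill gap and the luck spread are all invented for the example, not taken from any real data.

```python
import random


def better_player_leads(rounds, skill_gap=1.0, luck_sd=3.0, trials=20000, seed=0):
    """Estimate how often the stronger golfer has the lower total after `rounds` rounds.

    All parameters are hypothetical: the stronger golfer averages 70 strokes a round,
    the weaker one averages 70 + skill_gap, and luck is Gaussian noise with
    standard deviation luck_sd added independently to every round.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = 0
    for _ in range(trials):
        strong_total = sum(rng.gauss(70.0, luck_sd) for _ in range(rounds))
        weak_total = sum(rng.gauss(70.0 + skill_gap, luck_sd) for _ in range(rounds))
        if strong_total < weak_total:  # lower score wins in golf
            wins += 1
    return wins / trials


after_one_round = better_player_leads(rounds=1)
after_four_rounds = better_player_leads(rounds=4)
print(f"stronger golfer ahead after 1 round:  {after_one_round:.2f}")
print(f"stronger golfer ahead after 4 rounds: {after_four_rounds:.2f}")
```

Under these assumptions the stronger golfer is ahead noticeably more often after four rounds than after one, because the skill difference accumulates with every round while the luck partly cancels out.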

One way of looking at a player's performance is the Z score -- the distance between the score of that golfer and the average of those playing in the same tournament, measured in standard deviations. (I draw on a recent article by Bill Barnwell for the following.) Tiger Woods, in perhaps the best performance by a golfer in a century, produced a 15-stroke win at the U.S. Open in 2000, which had a Z score of -4.12. That is, he was more than four standard deviations better than the average of the world's best golfers who had played the full 72 holes of one of the four most important pro-golf tournaments of the year. His average Z score over the many tournaments he played that year was -2.14. Thus he was consistently scoring two standard deviations better than the professional golfers "making the cut", something that would have a vanishingly small probability of occurring were he but an average professional. In the U.S. Open, however, his score was a further two standard deviations better relative to the average for his competitors, suggesting that he was playing at the top of his form and that luck was with him during that tournament.
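For readers who want to see the arithmetic, here is a minimal sketch of a Z score calculation. The 72-hole totals are invented for illustration and are not real tournament results, and I am assuming the population standard deviation of the whole field (the winner included); the article I drew on may have computed it somewhat differently.

```python
from statistics import mean, pstdev


def z_score(player_total, field_totals):
    """Return how far one total lies from the field average, in standard deviations.

    Negative values are better than average, since lower scores win in golf.
    """
    field_average = mean(field_totals)
    field_sd = pstdev(field_totals)  # population SD of the field: an assumption
    if field_sd == 0:
        raise ValueError("all totals are identical, so the Z score is undefined")
    return (player_total - field_average) / field_sd


# Hypothetical 72-hole totals for golfers who made the cut (NOT real data).
field = [272, 287, 287, 288, 289, 290, 291, 292, 293, 295]
winner_z = z_score(272, field)
print(f"winner's Z score: {winner_z:.2f}")
```

With a field this small no one can be much more than a few standard deviations from the average, so a Z score like -4.12 is only possible in a large field with one truly exceptional score.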

I would suggest that scoring on a standardized multiple choice test, such as some of those produced by the Educational Testing Service, would be similar at some level to that of golf. There would be very little uncertainty about how any answer by any test taker should be scored. While the ETS works to see that the tests are standardized, there would be some variation in the scores of a student taking a test several times due to changes in the specific questions to which she responded. The more important sources of variation would be how well the student took the test on the specific day, and how good the student was generally at taking such tests.

Contrast this with an essay test. Variation is introduced not only by the student, but also by the person grading the essays. One could, for example, run a set of essays by a panel of reviewers and estimate the distribution of grades.
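As a rough sketch of what such a panel exercise might look like, the snippet below summarizes the grades several reviewers give to the same essay. The essay labels and grades are made up, and the mean and sample standard deviation are just one simple way to describe the spread among graders.

```python
from statistics import mean, stdev

# Hypothetical grades (0-100) from a three-person panel; NOT real data.
panel_grades = {
    "essay_A": [80, 85, 90],
    "essay_B": [70, 70, 70],
    "essay_C": [60, 75, 90],
}


def grade_spread(grades):
    """Return (average grade, sample standard deviation across graders)."""
    return mean(grades), stdev(grades)


spreads = {essay: grade_spread(grades) for essay, grades in panel_grades.items()}
for essay, (average, spread) in spreads.items():
    print(f"{essay}: average {average:.1f}, grader spread {spread:.1f}")
```

An essay on which the graders disagree widely has an average that depends heavily on who happened to be on the panel, which is exactly the kind of variation a multiple choice test avoids.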

I got to thinking about this while watching Broadway or Bust, a contest in which 60 high school students compete in singing, dancing and acting to test their potential for the Broadway musical theater. The students, who have different talents and different levels of training, must combine acting, dancing and singing skills in their performances (and few if any are expert at all three). They perform different numbers emphasizing different skills. They are judged on their performances by a panel of Broadway professionals, who themselves have different backgrounds and abilities.

In some sense the competition is fair, in that the successful students will be those who most impress that set of judges with that specific performance. The successful student will probably be at the top of his or her form, with a number that reaches those judges, who happen to like that performance. It seems unlikely that the student who will have the best career on Broadway will be recognized in this competition, but we will need a lifetime to be sure. Let's hope the students realize that they are all winners.

