Saturday, May 11, 2013

Subjective Probabilities, Ratings, Handicappers, and Open Access Online Scientific Literature


In a previous post I proposed an iterative process that can be used to sequentially add information to the rating of research proposals, yielding quantitative results. This post extends that method to show how information on the source of the proposal and the source of the review can be used to improve the results. It also provides a measure of the information contributed by the review at each stage. The method is further extended to the rating of open-access online scientific publications and to reviews by non-peers.

The Economist has published an article describing the rapid increase in open-access scientific publishing. The Internet potentially makes the distribution of scientific papers much less expensive than in the past, when they had to be printed on paper and sent through the mail. The expense of professional print journals was becoming an increasing burden on society as the number of scientists, and the number of papers they wrote and needed to read, kept growing.

The economic model in the past was that professional societies sponsored journals, and their members paid dues to the societies, which in turn paid for the publishing and distribution of those journals. In other cases, commercial scientific publishers produced the journals, financing them through subscriptions.

Increasingly, open-access costs are paid when papers are submitted or when they are accepted for publication. Governments and foundations that finance research are increasingly covering the publication costs for the results of the research they fund.

Under both models, peer review was used to select the superior submissions for publication. Major journals such as Science and Nature publish only one in ten papers submitted to them. Peer review by such prestigious journals was doubly effective, in that scientists would not go to the effort of preparing and submitting papers to them unless they believed the paper had a decent chance of being accepted.

I quote from the article:
Outsell, a Californian consultancy, estimates that open-access journals generated $172m in 2012. That was just 2.8% of the total revenue journals brought their publishers (some $6 billion a year), but it was up by 34% from 2011 and is expected to reach $336m in 2015. The number of open-access papers is forecast to grow from 194,000 (out of a total of 1.7m publications) to 352,000 in the same period. 
Open-access publishers are also looking at new ways of doing business. Frontiers, for example, does not try to judge a paper's significance during peer review, only its accuracy—an approach also adopted by the Public Library of Science (PLoS), a non-commercial organisation based in San Francisco that was one of the pioneers of open-access publishing. It thus accepts 80-90% of submissions.
The New York Times recently published an article describing a problem that is emerging with the increase in open-access, online publishing. There is now a
parallel world of pseudo-academia, complete with prestigiously titled conferences and journals that sponsor them. Many of the journals and meetings have names that are nearly identical to those of established, well-known publications and events ... 
The number of these journals and conferences has exploded in recent years as scientific publishing has shifted from a traditional business model for professional societies and organizations built almost entirely on subscription revenues to open access, which relies on authors or their backers to pay for the publication of papers online, where anyone can read them ... 
But some researchers are now raising the alarm about what they see as the proliferation of online journals that will print seemingly anything for a fee. They warn that nonexperts doing online research will have trouble distinguishing credible research from junk. “Most people don’t know the journal universe,” Dr. Goodman said. “They will not know from a journal’s title if it is for real or not.”
The NYT article was informed by an earlier article in Nature, part of its series on the future of scientific publishing.

About Peer Review

I spent more than a decade managing peer review processes, and came to some conclusions about them. First, it is important to find real "peers": people who are not only experts, but whose expertise is truly applicable to the things they are asked to review. The peer reviewers must be carefully prepared with instructions on the criteria for their evaluations. Ideally they should be experienced, so that they will have learned how to do peer review well. They must be motivated, in part by their understanding of the importance of their task; in-person peer review, where reviewers must defend their judgments before other scientists, also proves motivating. Care must also be taken to ensure that no biases creep into the peer review.

Even with care given to all these aspects of managing the peer review process, reviewers still turn in reviews of varying quality. In my experience, it was important to make judgments on the quality of reviewers, to give credence to reviews according to their perceived quality, and not to invite people back to review new things if they had not done well on previous reviews.

I recently posted on this blog presenting a method, based on experience in peer review, to quantify the subjective probabilities associated with peer review.

The Information Provided by Peer Review

The post suggested a ten-point scale for evaluation; the method could, however, be applied to a scale of any length. It provided a probability, P(Ai), that the average peer reviewer (of the cadre used by the managers of the process) would assign rating Ai to a given submission, and it allowed the probabilities to be refined by a sequential process of inviting reviews one at a time.

The entropy of a probability distribution over a set of ratings is defined as the average uncertainty:

H = - Σi P(Ai) log2 P(Ai)
The information provided by another review is the decrease in entropy. Thus, a sequential process could be monitored by the rate of change of information with additional reviews.
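
As a rough illustration of these two ideas, the sketch below (in Python, with invented rating distributions on the ten-point scale) computes the entropy before and after a hypothetical additional review and reports the difference as the information that review provided:

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a probability distribution over ratings."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions P(Ai) over a ten-point scale, before and after
# folding in one more review; the numbers are invented for illustration.
p_before = [0.05, 0.05, 0.10, 0.10, 0.15, 0.15, 0.15, 0.10, 0.10, 0.05]
p_after  = [0.01, 0.02, 0.05, 0.07, 0.10, 0.20, 0.30, 0.15, 0.07, 0.03]

info_gained = entropy(p_before) - entropy(p_after)
print(f"Entropy before: {entropy(p_before):.3f} bits")
print(f"Entropy after:  {entropy(p_after):.3f} bits")
print(f"Information provided by the review: {info_gained:.3f} bits")
```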

Reviewers Are Not All Equal

There are known biases in peer reviews. Some reviewers have a central bias, tending to concentrate their ratings in the middle of the scale; others do the opposite, tending toward ratings at the upper and lower extremes. Some reviewers are notably sour, giving low ratings in most reviews; others are generous, giving higher-than-average ratings.

I even found one reviewer who had a negative correlation with the other reviewers with whom he was associated!

If you have data, it is possible to establish a set of estimated probabilities P(Ai | Aj) that the average high-quality reviewer's rating would be Ai, given that the specific reviewer gave rating Aj. Thus one can adjust the sequential review process described in the previous post to account for the individual tendencies of each specific reviewer. Contact me if you need more specifics (john.daly@gmail.com).
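
As a minimal sketch of what such an adjustment might look like (the multiplicative pooling rule and the three-point scale are my assumptions for illustration, not the exact scheme of the earlier post), the specific reviewer's rating is replaced by the corresponding column of P(Ai | Aj) and pooled with the current distribution:

```python
def adjust_for_reviewer(current, cond_table, observed_rating):
    """
    current         : list of P(Ai), the current distribution over ratings
    cond_table      : cond_table[j][i] = P(Ai | the specific reviewer gave Aj)
    observed_rating : index j of the rating this reviewer actually gave
    Pools the reviewer's bias-corrected evidence with the current distribution
    (multiply and renormalize) and returns the updated distribution.
    """
    evidence = cond_table[observed_rating]
    pooled = [c * e for c, e in zip(current, evidence)]
    total = sum(pooled)
    return [p / total for p in pooled]

# Hypothetical three-point scale, for brevity.
current = [0.2, 0.5, 0.3]
# A "generous" reviewer: when they give the top rating, the consensus is often lower.
cond_table = [
    [0.7, 0.2, 0.1],   # reviewer gave rating 1
    [0.3, 0.5, 0.2],   # reviewer gave rating 2
    [0.1, 0.5, 0.4],   # reviewer gave rating 3
]
print(adjust_for_reviewer(current, cond_table, observed_rating=2))  # reviewer gave the top rating
```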

Entropy as a Measure of Reviewer Quality

Consider a table of rating frequencies in which each column corresponds to one of the ratings given by the reviewer of interest, and the cells in that column count the ratings given by other reviewers to the submissions that received that rating from the target reviewer.

From such a table one could calculate a probability table, which would allow an estimate of the entropy associated with the probabilities.

Entropy is a measure of the average uncertainty in a probabilistic process. It can be calculated from such a table using the formula given earlier, and it serves as a measure of the quality of the ratings made by a target reviewer.
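
A short sketch of that calculation (with an invented three-point counts table, and assuming each submission receives the same number of other reviews) follows; it converts the counts to probabilities and returns both the entropy of the target reviewer's own rating frequencies and the average entropy of the other reviewers' ratings given the target's rating:

```python
import math

def column_entropies(counts):
    """
    counts[j] is the column of other-reviewer rating counts for submissions
    that the target reviewer rated Aj (counts[j][i] = how often others gave Ai).
    Returns (H_target, H_other_given_target), assuming the column totals are
    proportional to how often the target reviewer gave each rating.
    """
    col_totals = [sum(col) for col in counts]
    grand_total = sum(col_totals)

    def H(freqs, total):
        return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)

    h_target = H(col_totals, grand_total)
    h_conditional = sum(
        (col_totals[j] / grand_total) * H(counts[j], col_totals[j])
        for j in range(len(counts)) if col_totals[j] > 0
    )
    return h_target, h_conditional

# Invented counts on a three-point scale: heavy diagonal, so the target
# reviewer's ratings predict the others' ratings well (low conditional entropy).
counts = [
    [8, 2, 0],    # target gave 1
    [3, 10, 3],   # target gave 2
    [0, 2, 12],   # target gave 3
]
print(column_entropies(counts))
```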

The entropy can be divided between the entropy of the distribution of the target reviewer's ratings and the average entropy of the probabilities of other raters given the target reviewer's ratings. (The proof of that statement is too long to be given here.)
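
In standard information-theoretic notation (a restatement of the claim rather than the omitted proof), this is the chain rule for entropy, with T the target reviewer's rating and O the ratings of the other reviewers:

```latex
H(T, O) = H(T) + H(O \mid T),
\qquad
H(O \mid T) = \sum_{j} P(T = A_j)\, H(O \mid T = A_j)
```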

If a target reviewer tends to give some ratings much more frequently than others, the information per rating is reduced.

If the target reviewer's ratings are highly predictive of those of the other reviewers, the entropy is low. On the other hand, if there is wide variation between the target reviewer's ratings and those of the other reviewers, the entropy is high and little information is gained from the target reviewer's ratings. Thus, the more the probability columns for ratings 1 through 10 resemble the overall distribution of the other reviewers' ratings, the higher the entropy, and the less value there is in the target reviewer's ratings.

A further useful calculation is the correlation between the ratings of the target reviewer and those of the other reviewers, since it is even possible that a high rating by one reviewer might predict low ratings by others. One might identify a positive or negative correlation by eye, or go further and calculate, for example, the probability of agreement between the target reviewer's rating and the other reviewers' ratings, the probability of a difference of plus or minus 1, of plus or minus 2, and so on. That would provide a probability distribution over rating differences.
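
A minimal sketch of those diagnostics (the pairing of the target's rating with the mean of the other reviewers' ratings, and the invented data, are assumptions for illustration):

```python
def rating_diagnostics(target, others):
    """
    target, others: paired lists of ratings for the same submissions
    (the target reviewer's rating and, say, the average of the other reviews).
    Returns the Pearson correlation and the empirical distribution of the
    differences (target minus others).
    """
    n = len(target)
    mt, mo = sum(target) / n, sum(others) / n
    cov = sum((t - mt) * (o - mo) for t, o in zip(target, others)) / n
    st = (sum((t - mt) ** 2 for t in target) / n) ** 0.5
    so = (sum((o - mo) ** 2 for o in others) / n) ** 0.5
    correlation = cov / (st * so)

    diffs = [t - o for t, o in zip(target, others)]
    diff_distribution = {d: diffs.count(d) / n for d in sorted(set(diffs))}
    return correlation, diff_distribution

# Invented ratings for eight submissions.
target = [9, 7, 8, 3, 5, 6, 2, 8]
others = [8, 6, 9, 4, 5, 5, 3, 7]
print(rating_diagnostics(target, others))
```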

Submitters Are Not All Equal

It is a poorly hidden secret that some scientists are more successful than average in having their proposals funded and their papers published. This is, of course, partly because some scientists regularly propose better-than-average projects and write better-than-average papers.

Where reviewers can identify the person responsible for the submission they are reviewing, they may rationally modify their estimate of the a priori probability that the submission is going to be good. A serious and expert reviewer will of course use the information obtained from a detailed reading and consideration of the submission, but will also use other information concerning the source of the submission.
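
One simple way to express that adjustment (the blending rule and the numbers are purely illustrative assumptions) is to shift the a priori probability of a good submission toward the submitter's past acceptance rate:

```python
def track_record_prior(accepted, submitted, base_rate, pseudo_count=5):
    """
    Blend a submitter's past acceptance rate with the overall base rate.
    'pseudo_count' controls how strongly the track record moves the prior;
    its value here is an arbitrary illustrative choice.
    """
    return (accepted + pseudo_count * base_rate) / (submitted + pseudo_count)

# Hypothetical: 6 of 10 past submissions accepted, against a 10% overall base rate.
print(track_record_prior(6, 10, 0.10))   # roughly 0.43
```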

What About Non-Peer Review?


I think the Amazon.com process of having customers give ratings of one to five stars is prototypical of a process that provides information to the online reader from people who may or may not be experts. One presumes that the number of stars represents the quality of the offering compared with other similar offerings on Amazon.com. Some of the people providing ratings are experts; in the case of a book of history, some will be professional historians whose research is closely related to the book they are rating. They may or may not be "peers" of the average reader. In one example, the 8 reviewers who gave one star to a book that had 264 five-star and 136 four-star ratings may have been right, but more probably they were not giving accurate ratings.

Amazon.com improves the rating system by asking people not only to give a star rating but also to write a review, and it then asks customers to rate the reviews according to how useful they found them. Thus the reader can judge the quality of a star rating with the added information about the usefulness of the accompanying review.
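
A hedged sketch of how review helpfulness might be folded back into the star score (this weighting rule is an illustrative assumption, not Amazon.com's actual algorithm):

```python
def helpfulness_weighted_average(reviews):
    """
    reviews: list of (stars, helpful_votes, total_votes) tuples.
    Weights each star rating by a smoothed fraction of customers who found
    the accompanying review helpful.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for stars, helpful, total in reviews:
        weight = (helpful + 1) / (total + 2)   # smoothed helpfulness score
        weighted_sum += weight * stars
        weight_total += weight
    return weighted_sum / weight_total

# Invented reviews: (stars, helpful votes, total votes).
reviews = [(5, 120, 130), (4, 40, 60), (1, 2, 50), (5, 10, 12)]
print(round(helpfulness_weighted_average(reviews), 2))
```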

Handicappers, Horse Racing and Probability Estimates

It is known that the odds in pari-mutuel betting closely match the frequencies with which the horses actually win their races; if the favorite goes off at odds of 3 to 1, it will win about a quarter of the time. How can that be?

Part of the answer is that horse racing fans are rather good at estimating the probability that horses will win races. They study the horses, their breeding, their training, and their past performance, and make their judgments accordingly. Their subjective judgments about probabilities are not uninformed.

Moreover, there are professional handicappers who are paid to make the odds on horse races, and those odds are published; they are available in newspapers or in tip sheets sold at the race track. Punters take the handicappers' predictions into account in making their own judgments.

In pari-mutuel betting the actual odds are based on the amounts bet, not the number of bettors. The casual race course visitor may make a $2 bet on his inexpert judgment, while the informed professional gambler will make much larger bets when the odds seem right. Thus the size of a bet carries implicit information about the credibility of the punter's probability estimates, and the larger bets naturally have more influence on the final pari-mutuel odds.
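
The arithmetic behind that point can be sketched as follows (the pool amounts and the 15% track takeout are invented for illustration): each horse's share of the win pool is the crowd's implied probability that it will win, and the payable pool determines the posted odds.

```python
def pari_mutuel(pool_amounts, takeout=0.15):
    """
    pool_amounts: dict of horse -> total amount bet to win on that horse.
    Returns each horse's pari-mutuel odds (to one) and its implied win
    probability (its share of the pool).  The takeout rate is illustrative.
    """
    total = sum(pool_amounts.values())
    payable = total * (1 - takeout)
    results = {}
    for horse, amount in pool_amounts.items():
        odds_to_one = (payable - amount) / amount   # e.g. 3.0 means 3-to-1
        implied_probability = amount / total        # crowd's implicit estimate
        results[horse] = (round(odds_to_one, 2), round(implied_probability, 3))
    return results

# Invented win pool: larger bets shift these shares more than small ones do.
print(pari_mutuel({"Favorite": 5000, "Contender": 3000, "Longshot": 2000}))
```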

So What?

The procedures described above can not only provide quantitative estimates of the quality of scientific papers (or books or movies), but can also be used to improve those estimates sequentially as more and more reviews are received. They can also provide quantitative estimates of the amount of information in the current rating, and they can be modified to include information on the track record of the author of a submission and on the past performance of the reviewers.

It would seem important to find such means to help scientists choose which papers to read in the burgeoning corpus of open-access online scientific publications. Such methods may even find use with online vendors such as Amazon.com and eBay.com.
