This is the Wikipedia entry for information retrieval. Note that it includes definitions for key indicators used for evaluating information retrieval systems: precision, recall, fallout, F-measure, and average precision. It seems that most of the published work on indicators for search engines is directed at companies interested in how highly their webpages appear in the list of responses when a search engine is queried. Since most product queries are generic, a company is interested in how well it shows up relative to other companies for generic queries about products it could supply.
There is also a set of indicators for data and information bases, designed to show how well the software answers questions put to the information base. Thus "precision" is the proportion of retrieved documents that are relevant, and "recall" is the proportion of the relevant documents that are retrieved in response to a request.
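These definitions are easy to state as code. Here is a minimal sketch in Python, assuming binary relevance judgments and documents represented as set elements; the names are illustrative, not taken from any particular library.

```python
def precision(retrieved, relevant):
    """Proportion of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Proportion of relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def fallout(retrieved, relevant, collection):
    """Proportion of non-relevant documents that were retrieved."""
    nonrelevant = collection - relevant
    return len(retrieved & nonrelevant) / len(nonrelevant) if nonrelevant else 0.0

def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall."""
    p, r = precision(retrieved, relevant), recall(retrieved, relevant)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0
```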
The Internet differs from a data or information base in obvious ways. One is that queries are usually not intended to return specific individual websites, but rather to search cyberspace and return the most relevant websites, preferably in order of relevance. It is important that the list of returns not contain large numbers of irrelevant webpages. Since cyberspace is so large and people seldom go far down the list of responses to a question, there is little value in measures comparable to "recall"; who cares whether 100,000 or 1,000,000 responses are returned to a general query? I would also suggest that relevance might better be considered a continuous variable rather than a binary one. So one might consider an indicator of the total relevance values for the first five or ten returns from a search.
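As a sketch of that idea, one could simply sum graded relevance scores over the first few results; with a rank discount added, this resembles the cumulative-gain measures used in the IR literature. The function below is hypothetical, assuming each returned page has already been assigned a relevance score on some continuous scale by a human assessor.

```python
import math

def top_k_relevance(scores, k=10, discount=False):
    """Total graded relevance of the first k results.

    scores   -- relevance of each returned page, in rank order, on a
                continuous scale such as 0.0 to 1.0 (hypothetical input
                from human assessors).
    discount -- if True, weight rank i by 1/log2(i + 1), so relevance
                near the top of the list counts for more.
    """
    top = scores[:k]
    if discount:
        return sum(s / math.log2(i + 1) for i, s in enumerate(top, start=1))
    return sum(top)
```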
Google News is an example of a search engine that clusters responses. Since many news service stories are picked up by many organizations for posting on the web, the same or slightly different stories may show up many times. Google News gathers all such stories into one group and goes on to another response or group. Thus one would perhaps want a number that represents the overall information on the topic of interest provided by the first five or ten responses (e.g., what percentage of the information being sought is included in those responses taken as a group).
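One hypothetical way to make that percentage concrete: have assessors enumerate the distinct pieces of information being sought, tag each clustered group of responses with the pieces it contains, and compute what fraction of them the first few groups cover. The sketch below assumes that tagging has already been done; the names are invented for illustration.

```python
def coverage_at_k(groups, sought, k=10):
    """Fraction of the sought information covered by the first k groups.

    groups -- list of sets, each holding the labels of the information
              items found in one cluster of near-duplicate stories
              (hypothetical assessor input).
    sought -- set of all information-item labels the searcher wanted.
    """
    covered = set()
    for group in groups[:k]:
        covered |= group & sought
    return len(covered) / len(sought) if sought else 0.0
```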
It is clear that not all information on the Internet is equally valid. It would be nice to have search engines that could return estimates of the quality of the data provided at the returned URLs. Thus I am interested not only in how complete the answer to my query is, but also in how much confidence I should have in that answer.
It is clear that different people have different levels of skill in searching the Internet using Google or other search engines. Thus an indicator of search quality might serve as well to measure the search capabilities of users as to measure the quality of responses from search engines.
I note that, as a user of search engines, I am interested not only in whether I get a useful response on the first search, but in whether I can find what I want through a sequence of searches.
Is it then possible to construct some indicators that would be useful? Perhaps one could have panels of people use a search engine periodically and report on their experience. They might estimate the degree of completeness of the responses to a set of queries and their confidence in the information provided. Trends in the averages over time might well serve as a means of monitoring improvement in the quality of the search engine.
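As a sketch of how such panel reports might be aggregated, suppose each panelist rates completeness and confidence on, say, a 0-to-10 scale in each survey period; the data layout here is invented for illustration.

```python
from statistics import mean

def panel_averages(reports):
    """Average completeness and confidence ratings per survey period.

    reports -- mapping from a period label (e.g. "2009-Q1") to a list of
               (completeness, confidence) ratings from panelists
               (hypothetical layout).
    Tracking these averages from period to period gives the trend line
    for monitoring search-engine quality.
    """
    return {
        period: (mean(c for c, _ in ratings), mean(f for _, f in ratings))
        for period, ratings in reports.items()
    }
```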