Relaxation in Web Search: A New Paradigm for Search by Boolean Queries |
Parke Godfrey |
March 1998 draft. Submitted.
Search by boolean queries suffers from a paradox of precision: if the query is too general (with respect to the corpus), the response is an avalanche; if the query is too specific, the response is empty. It is difficult to cast a query balanced on this scale of specificity that retrieves a reasonable number of items. Nowhere is this more evident, perhaps, than in search on the World Wide Web.
For this reason, many Web search engines employ information retrieval (IR) techniques other than boolean search. Indeed, this problem of specificity is a key reason, in general, that other IR techniques have been researched. For instance, AltaVista's standard interface employs an IR scoring technique across the document list to rank-order a result list. (AltaVista's "advanced" interface is for boolean search.) However, the IR searches have problems too. Quite often, IR search queries result in avalanches. If the IR scoring is not "natural" for the query at hand, the documents-of-interest may be buried in the results, never to be found.
The utility of boolean queries could be saved if a good method to recover from over-specified queries were evident. In this paper, we work to devise such a method. We show how boolean queries can be relaxed automatically, by throwing away just as many keywords as necessary for the query to succeed. With this relaxation facility, a new search paradigm becomes possible: ask a specific query; it will be relaxed to the points of success to find the closest related documents.