a study of aboutness in information retrieval

A Study of Aboutness in Information RetrievalP.D. BruzaSchool of Information SystemsQueensland University of [email protected] T.W.C. HuibersDepartment of Computer ScienceUtrecht UniversityThe [email protected] paper addresses the notion of aboutness in information retrieval. First, an expositionis given on how aboutness relates to relevance - a fundamental notion in information retrieval.A short summary is given on how aboutness is de�ned in more prominent information retrievalmodels. A model-theoretic de�nition of aboutness is then analyzed in an abstract setting usingso called information �elds. These allows properties of aboutness to be expressed independentof any given information retrieval model. As a consequence, information retrieval models canbe theoretically compared according to what aboutness postulates they support. The Booleanand Coordinate retrieval models are compared in this fashion. In addition to model-theoreticaboutness, preferential entailment and conditional probabilities are employed to de�ne about-ness between primitive information carriers. The preferential entailment approach is basedon a preference semantics derived from nonmonotonic logics. The nonmonotonic behaviourof aboutness under information composition is highlighted. Rules describing how aboutnessmay be preserved under composition are proposed. Finally, a term aboutness de�nition drawnfrom a network-based probabilistic framework is analyzed. Conclusions regarding the impliedretrieval e�ectiveness are drawn.Keywords: information retrieval theory, relevance, belief networks.1 INTRODUCTIONThe information retrieval problem can be described as the quest to �nd the set of relevant infor-mation objects corresponding to a given information need, which is represented by a request. Theimport of the information retrieval problem need not be argued as information oods towards usin ever increasing tides. Figure 1 depicts the so called information retrieval paradigm. It involvesa person with an information need that (s)he wishes to ful�ll. (It is beyond the scope of this articleto investigate how this need arises etc. Ingwersen (1994) has recently broached this area from acognitive view point). Henceforth, we will denote this person as the searcher and the informationneed as N . The information need is expressed in the form of a request, denoted q, which is givento an information retrieval system, or a human intermediary, such as a librarian. The intentionis that the request be as good as possible a description of the need N . In addition there is theinformation to be retrieved. This is modelled as a set O of information objects. The informationobjects are also referred to as documents. Each object O is characterized to facilitate its retrieval.The characterization, denoted �(O), consists of a set of terms drawn from the characterizationlanguage C. The terms are used to capture some of the content of object O. The characterizationis arrived at via a process of indexing. Due to practical considerations, indexing algorithms usuallyonly produce characterizations consisting of a small set of terms also known as keywords. Notethat such characterizations are very incomplete descriptions of the associated information object.Once the information need has been formalized in the form of a request, it can be respondedto via manual means, for example, by a librarian locating potentially relevant books, or as is more1

�� @ �� AA - requestq character-ization� �6formulation 6indexingmatching�� informationneedN��

Figure 1: The Information Retrieval Paradigmcommon these days, by an automated information retrieval system. The latter case is driven bya process called matching. This process involves comparing the characterization of an object Owith the request q. If deemed su�ciently similar, then O is assumed relevant and returned. Inboth the manual and automatic cases, information objects are returned, but most often not all ofthe objects in the result set will be relevant with respect to the speci�c information need N .The question of relevance is an important one and has long been scrutinized and philosophizedin information retrieval research, simply because the goal of information retrieval is to returnas many relevant objects as possible to a given information need. In order to ful�ll this goalit is necessary to de�ne formally what relevance is. The underlying problem seems to be thatthe information need de�es formalization. Cooper (1971a) describes the information need as a\psychological state" and as such not a \visible object or complex of symbols : : : somethingnot directly observable". Furthermore, the determination of relevance appears to be a subjectiveprocess. Blair (1990) cites research which concludes that searchers can consistently and easilydetermine the relevance of an information object with respect to their information need withoutbeing able to enunciate the criteria they use for this. This phenomenon has been compared withhow people can recognize faces without being able to elucidate how they do this. Schamber et al(1990) provide a comprehensive survey of relevance in information retrieval. They conclude \webelieve relevance is a multidimensional concept; that it is dependent on both internal (cognitive)and external (situational) factors; that it is based on a dynamic human judgment process; andthat it is a complex but systematic and measurable phenomenon". In short, relevance appears tobe a clearly de�ned notion with respect to the searcher, but di�cult to de�ne operationally.Cooper (1971a) made an important contribution to increasing the understanding what rele-vance is by making a distinction between so called logical relevance and utility. A given object islogically relevant to the information need if the object is topically related to the need. Utility, onthe other hand, is purely a pragmatic notion: Is the object useful to the searcher? The di�erencebetween the two notions is made clear when one considers the possibility that even though anobject may be topically related to an information need, the searcher may reject it because they donot trust the information in it as being accurate. The foothold for operationally de�ning relevancehas been via this notion of topical relatedness. If the object is deemed to be topically related, orabout, the request, the probability of relevance of the object with respect to the information needis assumed to increase. Observe the level of indirection here. Information retrieval systems donot typically deal with the information need and relevance directly, but rather with requests andcomputations of aboutness between an object and a request.A diverse range of information retrieval models has emerged based on the way they computeaboutness. We will brie y discuss some of the more prominent models. Boolean retrieval assumes2

that the request is represented as Boolean formula constructed from keywords and the logicalconnectives ^;_ and :. An information object O is deemed to be about the request q if and onlyif the request can be proven from its characterization (�(O) ` q).As the name suggests, vector space retrieval adopts a geometric stance (Salton 1993). Bothinformation objects and requests are represented as vectors in an n�dimensional space; one di-mension corresponding to each term in the characterization language C. An information object Ois assumed to be about the request q if and only if the cosine of the angle between ~�(O) and ~q isgreater than zero.Various logic-based information retrieval models have recently appeared. They all trace theirroots back to Van Rijsbergen's pioneering work in this area (Van Rijsbergen 1986,1989). In anutshell, the logic-based approaches embody aboutness in the following way: Object O is deemed tobe about request q if �(O) ! q, whereby the connective ! signi�es implication. Van Rijsbergen'swork did not specify details of the logic surrounding this connective. As a consequence variousspeci�c logics have spawned (Bruza 1992; Nie 1992; Meghini et al 1993; Marshall 1991; Sembok& Van Rijsbergen 1990; Chiaramella & Chevallet 1992). In practice, it is often the case that thetruth of �(O) ! q cannot be established. In this eventuality, the probability that O is about qis computed, i.e. (Pr(�(O) ! q). Van Rijsbergen's original work described how this could bedone via a process called imaging (Van Rijsbergen 1989). Imaging is based on a Kripke structurewhose worlds can be understood to be object characterizations i.e. partial models based on theassociated object. The intuition behind the accessibility relation is \similarity" between worlds.Imaging has been investigated by Nie (1992) and more recently by Crestani & Van Rijsbergen(1995). Alternative approaches to imaging have been studied. Bruza (1993) talks of minimalaxiomatic extension. In this approach, ! is underpinned by a strict inference mechanism de�nedon the characterization language C. If the proof of q based on �(O) does not go through, then�(O) is extended in a minimal way until the proof succeeds. The probability of �(O) ` q (i.e.�(O) ! q) is assumed to be inversely proportional to the amount of extension of �(O) that wasnecessary to let the proof go through.So far, scarce mention has been made of probabilistic approaches. Probability theory is a soundbasis for information retrieval as it is an inherently uncertain process; the request is typically not anexact representation of the information need, the characterizations of objects are, due to practicalreasons, incomplete. In the late seventies, Maron (1977) studied aboutness in a probabilisticsetting. He introduced three types of aboutness. Subjective aboutness (S-about) corresponds tothe notion of relevance as discussed above. Objective aboutness (O-about) is similar to Cooper'slogical relevance, and retrieval aboutness (R-about), which is de�ned in terms of a probabilitydistribution spanning the elements of the characterization language C. Maron was concernedwith the indexing problem, so R-about was formulated as follows: Object O is R-about descriptori; (i 2 C) is de�ned to be the probability that if O were to satisfy the user's information need, thenthat user would be employing search term i. Maron formulated this probability as Pr(i j A;O)where A designates the class of user, and O designates the class of events each of which consistsof having O satisfy some given information need.The above survey on aboutness in information retrieval is rounded o� with mention of Hutchin-son's work (1977). He describes how aboutness may be derived based on linguistic cues.In summary, most information retrieval models compute aboutness between an object O and arequest by using the object characterization �(O). In the last thirty years of information retrievalresearch, aboutness has usually only been de�ned within the framework of a given informationretrieval model. In addition, the assumptions regarding the aboutness decision are often notexplicitly stated. As a result, it is di�cult to compare the matching processes of di�erent retrievalsystems at a theoretical level. Recent research has attempted to examine di�erent retrieval systemswithin a general theoretical framework (Nie 1986; Van Rijsbergen 1986a,1986b; Bruza & Huibers1994; Huibers & Bruza 1994). This research is in its infancy and the many details of the frameworkhave yet to be worked out. The only consensus seems to be that the framework should be logic-based. We believe information-based logics are a reasonable point of departure, as informationretrieval systems deal with information, not truth.3

2 INFORMATIONAL FUNDAMENTALSWe begin by abstracting from notions such as descriptors, documents and queries and introducethe notion of an information carrier. The central theme of this article is to express the assump-tions made in information retrieval models regarding aboutness. The logic-based approach toinformation retrieval allows aboutness to be considered within a logical framework. In our earlierwork we have introduced aboutness as a model-theoretic concept (Bruza & Huibers 1994):De�nition 2.1 (Model-theoretic Aboutness)An information carrier i will be said to be about information carrier j if the informationborne by j holds in i �That is, i is taken to be a model in which j is interpreted, for this reason the notation i j=a j willbe used. As information retrieval systems typically work with documents, i would typically be anatural language text. As a result the models that we are dealing with are not those traditionallyencountered in logic1.The above de�nition can be found explicitly or implicitly in several papers on logic-based infor-mation retrieval (Cooper 1971b; Van Rijsbergen 1986b; Bruza & Van der Gaag 1994; Meghini etal 1993; Sebastiani 1994). Note that this conception of aboutness is not applicable for documentclustering where aboutness is determined by the overlap between respective document charac-terizations. The de�nition is however useful for studying aboutness between a document and aquery.For purposes of illustration the elements of the index expression language will sometimes beused as information carriers (see Bruza 1993). Informally speaking, an index expression consistsof a number of keywords, separated by means of connectors which model relationship (types).Keywords are taken from a given set K of keywords and correspond to nouns or noun-qualifyingadjectives; connectors are taken from a set C of connectors and are basically restricted to theprepositions and the so-called null connector. More formally, the language L(K;C) of indexexpressions over K and C is de�ned by the following syntax:Eexpr ! Keyword fConnector Eexprg�Keyword ! k, k 2 KConnector ! c, c 2 CExamples of index expressions are people in (need of (information)) and e�ective � (information �((retrieval)); in the latter index expression � denotes the null connector. The brackets emphasizethe tree structure of index expressions. For reasons of brevity we will ignore them from this pointon.Information ContainmentSome information carriers convey more information than others. According to Landman (1986)and Barwise & Etchemendy (1990) information can be partially ordered with respect to informa-tion containment (denoted by !):i ! j i� the information that information carrier i carries already contains the infor-mation that carrier j carriesIn other words, carrier i bears more information that carrier j. Keep in mind that carrier i \is lessthan" carrier j in the ordering. Information containment is related to speci�city. For example,the information carrier little � green � martians is more speci�c than green � martians which is morespeci�c than martians. Note that little � green � martians! green � martians! martians.The information containment relation ! is assumed to be re exive, antisymmetric and transi-tive. Transitivity in the context of information containment is also known as the Xerox Principle(Barwise & Etchemendy 1990).1It is a matter of ongoing investigation to provide a suitable model-theoretic basis for information retrieval.Recent e�orts have centered around situation theory (Huibers & Bruza 1994; Lalmas & Van Rijsbergen 1992; VanRijsbergen 1993) 4

Another manifestation of information containment is the ISA relationship. For example, salmonISA �sh so salmon ! �sh. Information containment can also be considered within in a logicalframework. If a formula is derivable from a formula �, then can be thought of as beinginformationally contained in �. This corresponds to the intuition that the information containedby a given formula is all the theorems provable from it.Information CompositionConsider the Boolean retrieval query shuttle ^ design. This is an expression of the need to beinformed about the design of the space shuttle. Observe how ^ is used to compose the twoindividual terms into shuttle ^ design. This is an example of information composition. Here isanother based on index expression carriers. Given the information carriers green � martians andlittle � martians. These can be composed to form the information carrier little � green � martians.Note how the latter carrier bears precisely the information furnished by the combination of thetwo previous carriers. This is the fundamental property of information composition. Informationcomposition will be denoted by the operator �. Based on the above intuition, it is reasonable toassume that i � j ! i and i� j ! j. Furthermore, it is assumes that � is idempotent. Observe,however, that properties such as the commutativity and associativity of � cannot be taken forgranted. The properties of � depend on the language chosen for the information carriers and therules speci�c to that language which govern composition. In the case of the index expressions,composition turns out to be quite complex. At the level of keywords, composition is realized viainsertion of a connector. For example, given the keywords system and information. These can becomposed using the null connector in two ways: system � information and information � system.This example also demonstrates that information composition is not commutative in the caseof index expressions as the latter carrier corresponds to the noun phrase \information system"whereas the previous one corresponds to \system information".If one carries information composition to the extreme one attains total information, whichaccording to Landman is too much information (Landman 1986). In a state of total information,the information is so dense that it cannot be comprehended. The total information carrier will bedenoted by 0 which constitutes the bottom element of the partial order of information carriers.As a consequence, all information carriers are informationally contained in 0. The carrier 0 isintuitively similar to falsum in the context of classical logic in the sense that all formulae canbe derived from it. As a �nal note on information composition, it will be assumed that for alli; j 2 =; i� j 2 = or j � i 2 =.Information PreclusionNot all information carriers can be meaningfully composed. The reason for this is that they areincompatible; the information they share clashes, or is contradictory. In other words, carriers iand j are said to preclude each other, denoted i ? j. It is natural to assume that facts precludetheir negation. Information preclusion, however, is not restricted to being a relation over facts. Itcan be argued that green � martians precludes blue � martians under the assumption that martiansare either blue or green, but not both. This intuition behind this phenomenon can be explained interms of possible worlds (Landman 1986). After characterizing a world as being a \green martian"world, it cannot be re-characterized as a \blue martian" world. Note that little � martians does notpreclude green � martians. Several authors regard information preclusion as being fundamental toa theory of information (Landman 1986; Barwise & Etchemendy 1990).Information preclusion arises naturally in information retrieval. For example, if you are search-ing for documents about river � pollution you are probably not interested in documents about air� pollution. Hence, river � pollution ?N air� pollution whereby the N emphasizes that the preclu-sion relationship is a product of the given information need N . Later in this article this type ofpreclusion, dubbed preferential preclusion, will be investigated in more detail.Information FieldsTo summarize, a framework has been proposed in which the notion of information carrier isfundamental. Such a framework is termed an information �eld. An information �elds draws its5

underlying concepts from theories of information being currently developed in information-basedlogics (speci�cally infon algebras (Barwise & Etchemendy 1990) and data semantics (Landman1986)). The intention is that an information �eld o�ers the necessary building blocks to expressproperties of aboutness. More about this in the next section.De�nition 2.2 (Information Field)An information �eld is a structure (=;!;�;?N ; 0) such that1. = is a non-empty set of information carriers2. (=;!) is a poset3. 0 2 = and for all i 2 =, 0 ! i4. 8 i; j 2 = :� i� j ! i and i� j ! j� i� i = i� i� j 2 = or j � i 2 =5. ?N� =� = �3 MONOTONIC PROPERTIES OF ABOUTNESSWhen an information retrieval model is developed certain assumptions are made as to what pro-duces \good" retrieval. The problem is that these assumptions are often not expressed, and whenthey are, they tend not to be expressed in a general enough way to determine whether two di�erentretrieval mechanisms are governed by the same or similar sets of assumptions. At the moment, thequestion as to whether the assumptions behind the matching process of coordinate retrieval arethe same as those that govern vector space retrieval cannot be satisfactorily answered. If it could,the two systems could be formally compared, for example, by using representation theorems. Suchan approach has been used with success in comparing nonmonotonic reasoning systems.This section proposes a system of postulates expressed in terms of concepts from the informa-tion �eld. The postulates are intended to characterize the assumptions inherent within a givenretrieval mechanism with regard to aboutness. Retrieval mechanisms can then be compared ac-cording to which postulates they are governed by. The notion of aboutness is embodied by abinary relation j=a over the set of information carriers =. The �rst postulate expresses that aninformation carrier is about itself.Postulate 1 (Re exivity (R)) i j=a iAn information carrier is about the information it contains. This is the premise behind theContainment postulate.Postulate 2 (Containment (C)) i! ji j=a jThis premise is implicit in automatic indexing. In this process descriptors contained in the docu-ment are identi�ed to facilitate its retrieval. The underlying assumption is that the document inquestion is about each of these descriptors.Say there is a document d which is about green �martians. Observe that green �martians !martians. Therefore, d is about martians. This is an example of Right Containment Monotonicity.6

Postulate 3 (Right Containment Monotonicity (RCM))k j=a i i! jk j=a jRight Containment Monotonicity is fundamental to many systems proposed in situation theory(Barwise & Etchemendy 1990). RCM is, however, sometimes a dubious assumption for drivinginformation retrieval. Consider a document d about protozoas (d j=a protozoa). Protozoas areanimals, hence protozoa! animal. RCM permits the conclusion d j=a animal. Observe that d willtherefore be returned in response to the query animal. It is doubtful that a typical searcher woulddeem this document to be relevant. Later in this article, questions of this nature will be exploredfurther by taking searcher preferences into account.Postulate 4 (Context-Free And (C-FA))k j=a i k j=a jk j=a i � j(Note: for reasons of simplicity, the dual postulate kj=ai kj=ajkj=a j�i is not stated. This will be continuedthroughout).Boolean retrieval, for one, is founded on this postulate. For example, if a document d is aboutriver and the same document is about pollution it is assumed that d is about river ^ pollution.Recent research has shown that this can be a dubious assumption, particularly at lower levels ofinformation granularity (Callan & Croft 1993; Salton et al 1993). The problem lies in the factthat the carrier river ^ pollution bears implicitly the assumption that river and pollution are relatedwhich doesn't have to be the case. In order to alleviate this problem, a context sensitive approachcan be adopted (Bruza & Huibers 1994).If a document is not about pollution, then it can't be about river pollution. This is the intuitionbehind the so called Negation Rationale.Postulate 5 (Negation Rationale (NR))k 6j=a ik 6j=a i � jIt has been recently proven that inference network models implicitly embody this property if thetopology of the network is determined by the information containment relation (Bruza & Van derGaag 1994).If it can established that an information carrier i is about another carrier k, then compos-ing i with another carrier j will not violate this. In other words, aboutness is preserved undercomposition. There are both right and left variants of the compositional monotonicity rule:Postulate 6 (Left Compositional Monotonicity (LM))i j=a ki � j j=a kPostulate 7 (Right Compositional Monotonicity (RM))i j=a ki j=a j � kAt �rst sight Compositional Monotonicity is not an unreasonable property. We will illustrate it byusing LM in coordinate retrieval systems as an example. The information carriers of these systemsare represented as sets of terms. Information composition is realized by set union. Informationcontainment is simply the subset relationship. Consider a document d characterized by the termsfti; tjg and the query ftjg. Now, ftjg ! ftjg as ftjg � ftjg. Using the Containment postulate7

renders ftjg j=a ftjg. Application of Compositional Monotonicity yields fti; tjg j=a ftjg. Hencedocument d would be returned to the searcher.Compositional monotonicity also underlies query expansion. This is a process whereby theinitial query of the user is expanded to hopefully include aspects of the information need notexplicitly stated by the searcher. For example, a query pollution could be expanded to river �pollution using Left Compositional Monotonicity:pollution j=a pollutionriver � pollution j=a pollutionNote that the expansion should not proceed in an uncontrolled manner. Most query expansionsystems expand by using terms that are semantically related to those present in the query underexpansion. In other words, a conservative form of monotonicity is being employed. We will havemore to say about this in the next section.If information carriers preclude each other, then it doesn't seem unreasonable to assume thatthey are not about each other.Postulate 8 (Preclusion (P)) i ? ji 6j=a j and j 6j=a iApplications of this assumption can be readily found in information retrieval. For example, thebasic probabilistic retrieval model operates according to a term independence assumption. Inother words, independent terms (viewed as information carriers) are assumed not to be abouteach other. Furthermore, term vectors in vector space retrieval are orthogonal to each other. Thisis a geometric expression of information preclusion.The abundance of probabilistic retrieval systems (founded on Bayesian inference) suggests thataboutness should exhibit nonmonotonic character. One immediate example is Boolean retrieval,which is known to function according to a Closed World Assumption (Van Rijsbergen 1986a).Postulate 9 (Closed World Assumption (CWA))i 6j=a j j ? ki j=a kThe Closed World Assumption is a postulate that describes nonmonotonicity with regard to about-ness. At an operational level, nonmonotonicity means that an information retrieval mechanismwould be able to retract previous aboutness decisions in the light of new information, for exam-ple, relevance feedback from the searcher. Relevance feedback is a process whereby the searcheridenti�es relevant objects in the result set. Using terms in the identi�ed relevant objects, a newquery result is computed. Some retrieval mechanisms, for example the vector space model, cannotretract an aboutness decision once it has been made. That is once an information object is addedto the result set, it cannot be removed on the basis of the aboutness decision. For this reason, cuto� values are often used as an ad hoc means of excluding objects. It would seem that aboutnessdoes have nonmonotonic character, and that retrieval mechanisms should take it into account.4 NONMONOTONIC PROPERTIES OF ABOUTNESSUp till now the intuition behind i j=a j is that i is a framework in which j is interpreted. In somelogic-based approaches to information retrieval each individual document is considered to be aframework in which the query is interpreted. Observe that documents are generally much larger,more complex and more descriptive than queries. This is not true of document characterizations.For example, a commonly used document characterization consists of a set of keyword descriptors.These are very primitive information carriers, and viewed in isolation are often ambiguous. Forexample, the information carrier crane. Is this referring to a bird or lifting machinery? As thematching process basically compares document characterizations with the query, the questionarises as to the nature of aboutness between information carriers which are primitive. This sectionargues that preferential entailment is a useful mechanism for examining this question.8

4.1 Information Carriers and Preferential StructuresConsider the information carrier migration. This carrier can be seen as discriminating a set ofobjects dealing with migration. For example, there may be a document dealing with the migrationof sparrows, whilst another may be about the migration of salmon. Driven by their speci�cinformation need a searcher will prefer some information objects over others. For example, inthe above case, a searcher may prefer the document about the migration of salmon because thatis what he or she wants to be informed about. Observe that the information need imposes apreferential ordering on information carriers. Figure 2 depicts a preferential ordering over a set ofinformation carriers dealing with migration.Crane Sparrow

Salmon

AboriginiesGreeksFigure 2: Example preferential ordering on information carriers about migrationInformation retrieval systems attempt to approximate the preferential ordering via documentranking. Each document receives a match coe�cient between its characterization and the query.The coe�cient is used to order the documents on likelihood of relevance to the searcher.Based on the above intuition, the preference relation can be formalized as follows. Let i @N jdenote that carrier i is preferable to carrier j in the light of information need N . It is natural toassume that @N is irre exive, antisymmetric and transitive. Stated simply, @N is a strict partialorder on a set of information carriers =. The structure h=;@N i will be referred to as a preferentialstructure. We will take the set = to be �nite. As a consequence a so called smoothness conditionis guaranteed, the details of which are not relevant to this article.Preferential structures have been investigated within meta-theories for nonmonotonic reasoning(Kraus et al 1990). In these investigations, the preference relation, denoted @ has to do with\normality". For example, the world in which Tweety is a bird and ies, is considered more normalthan a world where Tweety is a bird and doesn't y. Here \world" refers to an interpretation of astandard logic (propositional or �rst-order predicate, either classical or modal). This logic, whichis based on a preferential structure h=;@i, is termed a preferential logic and is denoted by [email protected] of the intuitive meaning given to the preference relation, the following theorem canbe stated (Shoham 1989):Theorem 4.1 (Shoham) Let L@ be a preferential logic such that @ is smooth. Then L@ ismonotonic if and only if @ is an equivalence relation.Monotonicity of logic L@ refers to the following property, where i; j and k are formulae of L@:i j= ki ^ j j= k9

This property bears a striking resemblance to the Left Compositional Monotonicity postulateintroduced in the previous section. That is, if we know that information carrier i is about carrierk, then composing j to i will not disturb the aboutness relationship with k:i j=a ki� j j=a kLet us now place Shoham's theorem within the context of information retrieval and the follow-ing question arises. Given that @N in h=;@N i is not an equivalence relation (because it is astrict partial order), does this imply that aboutness is not monotonic with regard to informationcomposition?4.2 Defaults in Information RetrievalAssume for the moment that there is a searcher with an information need dealing with migration.Furthermore, assume the information need is satis�ed by being informed about the migration ofbirds. Observe that if the searcher enters the query migration, (s)he expects information about themigration of birds. In other words, that the migration is bird migration is assumed, that is, takenas a default. This default is expressed as follows:migration �N birdwhich reads \in the light of the information need N , migration preferentially entails birds". Ob-serve that the default is parameterized with the information need N . Remember that it is thisneed that is imposing the ordering on information carriers.Defaults arise out of searcher biases arising out of the information need at hand. More sig-ni�cantly, defaults imply a preference for some information carriers over others. In other words,defaults are intimately tied to the underlying preferential structure. The nature of this relationshipwill be explored in the rest of this section.We begin by adapting Shoham's (1989) de�nition of preferential satisfaction. The basic in-tuition that was presented earlier remains the same. Documents will be considered a frameworkin which other information carriers (queries) are interpreted. This is augmented by taking thepreference relation into account.De�nition 4.1 (Preferential Satisfaction)An information carrier i preferentially satis�es an information carrier k (written i j=N k)if � i j=a k� there is no information carrier j such that j @N i and j j=a kThe carrier i is said to be a preferred situation which supports k. �With regard to the preferential structure depicted in �gure 2, the preferred information carriersare the ones dealing with the migration of cranes and sparrows. Defaults can now be de�ned interms of preferential entailment as follows:De�nition 4.2 (Preferential Entailment (Aboutness))An information carrier i preferentially entails information carrier j (i �N j) if for allinformation carriers k:� if k j=N i then k j=a j �For example, the default: migration �N bird expresses that preferred information carriers dealingwith migration are also about birds. 10

4.3 Reasoning with Defaults under Information CompositionWhen a searcher wishes to learn about bird migration (s)he typically wants to become informedabout such aspects as their migration patterns, where they migrate to and why etc. In terms ofa preferential structure the preferred information carriers will contain information of the abovesort. Within this speci�c information need, the searcher would seemingly not want to be informedabout salmon migration. With this as background, consider the following application of LeftCompositional Monotonicity which is now based on defaults:migration �N birdsalmon�migration �N birdThis example demonstrates that aboutness is not monotonic with regard to information composi-tion because the preferred information carriers dealing with salmon migration need not be aboutbirds. Observe that the following is intuitively acceptable:migration �N birdspring�migration �N birdWhat is at work here? It seems that the default migration �N bird precludes any discussion aboutsalmon in the context of migration. That is,migration ?N salmonThe above preclusion arises directly from the preference for information carriers dealing with birdmigration. Generalizing from the example leads to the conclusion that preclusion is also tied tothe underlying preferential structure. The following de�nition establishes information preclusionwithin this framework.De�nition 4.3 (Preferential Preclusion)An information carrier i preferentially precludes information carrier j (i ?N j) if for all k:� if k j=N i then k 6j=a j �Preferential preclusion relationships open the door to one type of conservative monotonicitywith regard to aboutness. Guarded Compositional Monotonicity states that information composi-tion may only occur when no preclusion relationships are violated. In other words, an informationcarrier (read term) can only be composed to another carrier if it is not inconsistent with thepreferences inherent in the given information need.Postulate 10 (Guarded Left Compositional Monotonicity (GLM))i �N j i 6?N ki � k �N jAssuming that part of the migration information need would typically involve some aspect of theseasonal movements of birds, we cannot conclude that all preferred information carriers aboutmigration do not support some information about seasons. That is, migration 6?N spring. As aconsequence, the following is a valid application of Rational Compositional Monotonicity:migration �N bird migration 6?N springspring�migration �N birdThe dual of GLM is Guarded Right Compositional Monotonicity:Postulate 11 (Guarded Right Compositional Monotonicity (GRM))i �N k i 6?N ji �N j � k11

Observe that GLM and GRM embody a conservative notion of information composition in com-parison with the unguarded variants LM and RM.Cautious Compositional Monotonicity also provides a conservative approach to informationcomposition, namely, it can only occur within the presence of suitable defaults:Postulate 12 (Cautious Compositional Monotonicity (CautM))i �N j i �N ki � j �N kGuarded Left Compositional Monotonicity2 and Cautious Compositional Monotonicity parallelsimilar rules found in (Kraus et al 1990).5 THEORETICAL COMPARISON OF INFORMATION RETRIEVALMECHANISMSIn the ideal situation, an information retrieval mechanism only returns those objects that arerelevant to the information need at hand. In reality, objects can be returned that are not relevant(i.e. noise) or objects that are relevant are not returned (i.e. lying by omission). The �eld ofinformation retrieval has produced a number of criteria for judging how e�ective an informationretrieval mechanism is at returning relevant objects with a minimum amount of noise. Foremostamong them are recall and precision. Given that R is the set of relevant objects and res(q) is theset of objects returned, then recall is the ratio of relevant returned objects to relevant objects andprecision is the ratio of relevant returned objects to returned objects. More formally,recallN (q) = jres(q) \RjjRjprecisionN (q) = jres(q) \Rjjres(q)jRecall and precision are employed in an experimental setting. For a given database of informa-tion objects and set of queries, two information retrieval mechanisms can be compared via theirrespective average recall and precision results. Statistical tests of signi�cance are employed todecide whether one mechanism is superior to the other. The experimental information retrievalparadigm has long been under �re, but nevertheless employed because of a lack of an alternativemethod for investigating e�ectiveness.Some recent research has attempted to compare information retrieval mechanisms in a theo-retical way (Bruza & Huibers 1994; Huibers et al 1995; Huibers & Bruza 1994). The basic ideabehind the theoretical comparison is the following. Two information retrieval models are mappeddown into an information �eld and are compared according to what aboutness postulates theysupport. We demonstrate this method by comparing the Boolean and coordinate retrieval models.For this purpose, the model-theoretic postulates (1-9) will be employed.5.1 Boolean vs. Coordinate RetrievalTo distill the postulates which govern Boolean retrieval requires a de�nition which establishesaboutness within this model. This de�nition functions as a foothold which can be used to examinewhich postulates are supported by the underlying information �eld.De�nition 5.1 (Boolean Aboutness)Let T be a vocabulary (set of terms) and B(T ) the Boolean language de�ned on T using thelogical connectives :;_;^. Furthermore, let BR be a set of strict inference rules de�ned onB(T ). Let O be a set of objects (documents) whereby for o 2 O; �(o) (�(o) � T ) denotes thecharacterization of o. Let � 2 B(T ), theno j=a � i� �(o) `BR �2Also known as rational monotonicity 12

�In other words, a query � holds in an object o if and only if the query can be deduced from o'scharacterization using the strict inference rules which are de�ned as follows. For simplicity, onlyrules involving : and ^ are speci�ed:1. if � 2 �(o) then �(o) `BR �2. if �(o) `BR �; �(o) `BR � then �(o) `BR � ^ �3. if �(o) 6`BR � then �(o) `BR :�The basis of this inference mechanism (clause 1) can be explained in terms of information con-tainment: � 2 �(o) is nothing more than an a�rmation that the index term � is informationallycontained in o, hence o ! �. Therefore, both objects and terms are considered information car-riers in the Boolean Information Field. This demonstrates that information �elds can consist ofinformation carriers of di�erent types. Furthermore, clause 1 and de�nition 5.1 imply o j=a �,hence a Boolean Information Field supports the Containment postulate.Complex Boolean formulae are also considered to be information carriers and are orderedinformationally in a natural way whereby information composition � is modelled by ^: �^� ! �and � ^ � ! �. Note that in this set up information composition is only de�ned for formulae.There is no composition operator for objects. This is consistent with the view held in the Booleanretrieval model that objects are disjoint, amorphous things with no operators de�ned on them. Asa consequence, the Left Compositional Monotonicity postulate is not applicable. The right versionof this rule is applicable, but is clearly not supported. To complete the Boolean Information �eld,all formulae are deemed to informationally preclude their negation: � ?N :�.Thus far we have established that Boolean Information Fields embody the Containment postu-late. Furthermore, it can be directly shown that clause 2 functions according to the Context-FreeAnd postulate as follows:o j=a � Def 5.1�(o) `BR � o j=a � Def 5.1�(o) `BR � clause 2�(o) `BR � ^ � Def 5.1o j=a � ^ �In a similar way, it can be shown that clause 3 cloaks the Closed World Assumption and that theRight Containment Monotonicity (RCM) postulate is also supported. Using RCM it can quickly bedemonstrated that the Negation Rationale is also supported by proceeding reductio ad absurdum:Given there are carriers i; j; k and k 6j=a i, k j=a i ^ j is assumed. By de�nition, i ^ j ! i, andapplying RCM k j=a i ^ j i ^ j ! ik j=a ileads to a contradiction. Therefore, k 6j=a i^j must have been the case. Analysis of the remainingaboutness postulates leads to the following theorem which states which postulates are supportedby a Boolean Information Field.Theorem 5.1 Let BF be a Boolean Information Field with j=a � BF= � BF= and j=a de�ned asin De�nition 5.1, then BF supports the postulates C,CF-A,CWA,RCM,NR.The same process is now repeated for coordinate retrieval. The matching function whichdrives coordinate retrieval measures overlap, for example, between the set of terms characterizinga document d and the set of terms comprising the query q.13

De�nition 5.2 (Coordinate Aboutness)Let T be a vocabulary, with � � T . Furthermore, let O be a set of objects whereby foro 2 O; �(o) (�(o) � T ) denotes the characterization of o. Then,o j=a � i� � \ �(o) 6= ? �The mapping of coordinate retrieval to an information �eld proceeds in a similar way to Booleanretrieval. Both object characterizations and queries are modelled as information carriers consistingof a set of terms. Information containment within this framework is modelled by the subset relationover }(T ). Furthermore, the indexing relation once again determines an information containmentrelation between objects and terms: if t 2 �(o) then o ! �t. The information compositionoperator � is realized by set union. The notion of information preclusion is foreign to coordinateretrieval, hence ?N= ?. It can be shown that the Strict Coordinate Information Field embodiesthe following aboutness postulates.Theorem 5.2 Let CF be a Strict Coordinate Information Field with j=a � CF= � CF= and j=ade�ned as in De�nition 5.2, then CF satis�es the postulates C,CF-A,RMComparing the representation theorems from Boolean retrieval and coordinate retrieval, we seethat coordinate retrieval supports Right Compositional Monotonicity whereas Boolean Retrievaldoes not. This re ects the fact that in coordinate retrieval only a non-trivial overlap is neededbetween the object characterization and query to determine aboutness. It has been argued thatRight and Left Compositional Monotonicity adversely a�ect precision (Bruza & IJdens 1995).This suggests that coordinate retrieval will not be precise. Boolean retrieval, on the other hand,supports the Closed World Assumption. This postulate degrades precision. Therefore, coordinateretrieval and Boolean retrieval systems are likely to be lacking in precision, but for di�erentreasons.The NR postulate is supported by the Boolean information �eld. This is a precision-orientedpostulate. For example, if an object is not about pollution, we do not want to conclude that it isabout river � pollution. (Here the null connector � is being used to realize information composition).Matching functions based on overlap will allow such conclusions. As a consequence, Booleanretrieval may prove to be more precise than coordinate retrieval. Experimentation would beneeded to con�rm this.Observe that the Boolean Information �eld supports RCM whereas the coordinate information�eld does not. It has been argued that RCM promotes recall (Bruza & IJdens 1995). In thisarticle, we have argued that it can be a dubious assumption for driving information retrieval,thus degrading precision. However, such dubious aboutness decisions principally arise when theinformation containment relation spans keywords, for example, via an ISA relation. This isn't thecase in Boolean retrieval, so RCM will not be precision degrading.5.2 Network-based Probabilistic RetrievalBelief networks have been investigated by several authors for driving information retrieval (Turtle& Croft 1990; Turtle 1990; Fung & Del Favoro 1995; Bruza & Van der Gaag 1994; Bruza 1993).Their favour can be attributed to the fact that they embody probabilistic reasoning wherebyterm dependencies can be neatly represented in the topology of the network. Furthermore, beliefnetworks allow multiple sources of evidence to be processed. It has been found that this contributesinformation retrieval e�ectiveness. This section demonstrates how aboutness between terms canbe investigated within so called information containment belief networks (Bruza & IJdens 1995).Informally speaking, a belief network is a graphical representation of a problem domain depict-ing the probabilistic variables discerned in the said domain and their interdependencies. Theseinterdependencies are quanti�ed by means of conditional probabilities. Take, for example, thesimple belief network depicted in �gure 3. From the viewpoint of information retrieval, this beliefnetwork expresses our belief that a document is about the pollution of rivers is dependent on14

it being about pollution and it being about rivers. This belief is quanti�ed via the conditionalprobability Pr(POLLUTION OF RIVERS j POLLUTION^ RIVERS).sPOLLUTION s RIVERS�� AAAA AAKs POLLUTION OF RIVERSFigure 3: Simple Belief NetworkThis example demonstrates that conditional probabilities are re ected as directed edges in thebelief network. The topology of a belief network is a directed acyclic graph (digraph), G = (V;A)where the vertex set V constitutes a set of probabilistic variables as introduced above, and theedge (Vi; Vj) 2 A models that variable Vj is directly in uenced by variable Vi.The advantage of belief networks is that by supplementing the topology with a set � of con-ditional probability assessments, a malleable representation of a joint probability distribution Prde�ned over V1 : : : Vn arises. Only a relatively small number of conditional probability assessmentsneed to be speci�ed because the belief network topology represents explicitly (in)dependencies be-tween variables.Pearl (1988) has shown that the topology together with the local probability assessments de�nea joint probability distribution. In other words, belief networks can be used as a reasonably e�cientmechanism to drive probabilistic inference. Our research has centered around belief networksconstructed from index expressions (IEBNs) (Bruza 1993; Bruza & Van der Gaag 1994; Bruza &Van der Gaag 1993; IJdens 1994; Bruza & IJdens 1994). Index expressions are assumed to beordered by a relation of information containment. For example, poll of riv ! poll. Furthermore,information containment is assumed to be certain within the context of an IEBN: i! j ) Pr(j ji) = 1. Abstracting from index expressions allows the de�nition of a belief network whose topologyis determined by the information containment relation over a set of terms and which embodiesinformation containment as being certain.De�nition 5.3 (Information Containment Belief Network)Let B = (G;�) be a belief network such that VG = fV1; : : : ; VMg. The belief networkB = (G;�) is termed an information containment belief network (ICBN) i� all of the fol-lowing conditions hold:� (fv1; : : : ; vmg;!;�;?N ; 0) is an information �eld� (Vj ; Vi) 2 GA i� vi ! vj� if vi ! vj then Pr(vj j vi) = 1 �The nodes in the ICBN topology correspond to terms in the characterization language C. Thisallows a probabilistic de�nition of aboutness between term descriptors to be investigated. Forillustration purposes term aboutness as de�ned by Index Expression Belief Networks (an ICBNbased on index expressions) will be used (see Bruza & Van der Gaag 1994). Within IEBNs,expression i is deemed to be about expression j if and only if Pr(i j j) > Pr(i). The intuitionbehind this de�nition for information retrieval is as follows. Assume i to be a query and j to bea characterization of an object O, then if j increases our belief in i, the probability of O beingrelevant is assumed to increase. Note that i and j are index expressions. As a consequence,model theoretic aboutness is not applicable. Neither is the preferential entailment approach, as jincreasing the belief in i o�ers no guarantee that the preferred i-objects also deal with j. Clearly,15

this de�nition of aboutness is weaker than preferential entailment. For this reason i / j will beused to denote this weaker form of aboutness between i and j.By analyzing what aboutness postulates are implied by Pr(i j j) > Pr(i), important in-sights may be gained as to the potential retrieval e�ectiveness of this term aboutness de�ni-tion. Bruza & IJdens recently analyzed aboutness within IEBNs using the base set of postulatesC,RCM,RM,LM,GRM,GLM,CautM which were de�ned in terms of /3. The analysis yielded thefollowing theorem.Theorem 5.3 Let B = (G;�) be an ICBN with I; J 2 VG such thati / j i� Pr(i j j) > Pr(i)Then, B supports the postulates C, LM, RM, GRM, GLM, and CautM.The fact that the above aboutness de�nition supports the LM and RM postulates raises seriousquestions about precision. Within an IEBN, the LM postulate is manifested as follows. Giventhat Pr(pollution j river � pollution) > Pr(pollution), thenPr(air � pollution j river � pollution) > Pr(air � pollution)Here the null connector � is functioning as the composition operator �. In other words, using theabove aboutness de�nition the term air � pollution is deemed to be about the term river � pollution.An information retrieval mechanism could then conclude on the basis of this that a documentdealing with river pollution should be returned in response to the query \air pollution". Precisionwill degrade as a result.The postulates GRM, GLM, and CautM turn out to be trivially supported via RM and LMbeing supported. As a consequence no insights can be gleaned with regard to these postulates.Bruza & IJdens also provide an analysis of an alternative aboutness de�nition which includesa probabilistic interpretation of preclusion relationships. These can have an important bearing onthe aboutness decision between descriptors. By way of illustration, consider the term migration.Assuming, for a given information need N , that all the preferred documents satisfying N wouldbe about the migration of birds, then this information need implicitly generates aboutness rela-tions like migration about sparrow, migration about duck etc. Also inherent in N , are preclusionrelationships like migration ?N salmon. Viewed within the framework of an ICBN, the preclu-sion relationships can be considered as negative evidence. In the above example, :salmon shouldbe propagated when determining those descriptors about migration. It turns out that preclusionrelationships implemented in this fashion can nullify the precision degrading LM and RM pos-tulates and render an aboutness de�nition that should be more precise than the one stated intheorem 5.3.6 CONCLUSIONS AND FURTHER RESEARCHThis article has attempted to look fundamentally at the notion of aboutness in information re-trieval. Aboutness has essentially been explored from two perspectives. The �rst perspectiveconsiders aboutness from a model-theoretic stance. Nine postulates regarding aboutness were ex-pressed in terms of concepts fundamental to a so called information �eld. This perspective opensthe door for the theoretical comparison of information retrieval models as well as the analysisof term aboutness de�nitions. This article has demonstrated how the Boolean retrieval modelcan be theoretically compared to the coordinate retrieval model according to which aboutnesspostulates they support. It should be mentioned, however, that there is not yet an agreed set ofaboutness postulates with which an information retrieval mechanism can be analyzed. Nor is itpossible at this stage to formally analyze the e�ect of a given postulate on recall and precision. Itis the authors' belief that the latter is a re ection of a lacking underlying model theory in which3Some of these postulates are motivated from a model-theoretic perspective, whereas others are motivatedfrom a preferential entailment perspective. The question being tackled is which of these postulates manifest in aprobabilistic setting 16

aboutness postulates can be examined. Even though theoretical information retrieval is still in itsinfancy, work is proceeding to develop such a theory (Huibers & Bruza 1994; Bruza & Huibers1994; Lalmas & Van Rijsbergen 1992; Van Rijsbergen 1993; Huibers et al 1995). The ability tobe able to map an arbitrary retrieval model into a comprehensive theoretical framework wouldbe a breakthrough for information retrieval theory. Within such a framework, general patternsof aboutness could be studied. Furthermore, the framework would o�er the freedom to exploreissues in relation to information retrieval independent from the idiosyncrasies and formalisms ofa given model.The second perspective o�ered in this article is that in information retrieval aboutness alsoplays a role between primitive information carriers such as term descriptors. Both preferential en-tailment and conditional probabilities were used to de�ne this type of aboutness. The preferentialentailment approach has the advantage that rules motivated from an underlying model-theoreticsemantics (preference structures) can be used to drive a conservative form of information compo-sition. This is potentially useful for query expansion as well as for driving information retrieval.A preference structure arises out of biases inherent in the searcher's given information need. In apractical situation, the underlying preference will not be directly available. (In fact, the goal ofinformation retrieval is to deliver this structure!). We are currently investigating how the under-lying preference structure (and associated defaults) can be approximated via relevance feedback(see (Bruza 1995) for some initial ideas on this).This article has argued that aboutness demonstrates nonmonotonic character with regard toprimitive information carriers under information composition. We are currently investigating otherinference rules in the context of preferential structures. More research is needed to understand thenonmonotonic aspects of information retrieval systems. Default logic is a potentially fruitful areaof investigation (Hunter 1995). Our ultimate goal is to produce an aboutness preserving inferencemechanism founded on a suitable information-based framework. Such a mechanism could formthe brains of intelligent information agents which will help us sift our way through the informationage.References[BE90] J. Barwise and J. Etchemendy. Information, Infons, and Inference. In R. Cooper,K. Mukai, and J. Perry, editors, Situation Theory and its Applications, volume 1 ofCSLI Lecture Note Series, pages 33{78. CSLI, 1990.[BH94] P.D. Bruza and T.W.C. Huibers. Investigating Aboutness Axioms using InformationFields. In Proceedings of the Seventeenth ACM SIGIR Conference on Research andDevelopment in Information Retrieval, pages 112{121, 1994.[BI94] P.D. Bruza and J.J IJdens. E�cient Probabilistic Inference through Index ExpressionBelief Networks. In Proceedings of the Seventh Australian Joint Conference on Arti�cialIntelligence (AI94), pages 592{599. World Scienti�c, 1994.[BI95] P.D. Bruza and J.J IJdens. Deciding Term Aboutness Probabilistically, 1995. Submit-ted to the Arti�cial Intelligence Journal. Available from the authors.[Bla90] D.C. Blair. Language and Representation in Information Retrieval. Elsevier, 1990.[Bru93] P.D. Bruza. Strati�ed Information Disclosure: A Synthesis between Information Re-trieval and Hypermedia. PhD thesis, University of Nijmegen, 1993.[Bru95] P.D. Bruza. Intelligent Filtering using Nonmonotonic Inference, 1995. Submitted tothe First Australian Document Computing Symposium. Available from the author.[BvdG93] P.D. Bruza and L.C. van der Gaag. E�cient Context-Sensitive Plausible Inference forInformation Disclosure. In Proceedings of the Sixteenth ACM SIGIR Conference onResearch and Development in Information Retrieval, pages 12{21, 1993.17

[BvdG94] P.D. Bruza and L.C. van der Gaag. Index Expression Belief Networks for InformationDisclosure. International Journal of Expert Systems, 7(2):107{138, 1994.[CBC93] J.P. Callan and W. Bruce Croft. An Evaluation of Query Processing Strategies usingthe TIPSTER collection. In Proceedings of the Sixteenth ACM SIGIR Conference onResearch and Development in Information Retrieval, pages 347{355, 1993.[CC92] Y. Chiarmarella and J.P. Chevallet. About Retrieval Models and Logic. The ComputerJournal, 35(3):233{242, 1992.[Coo71a] W.S. Cooper. A De�nition of Relevance for Information Retrieval. Information Storageand Retrieval, 7:19{37, 1971.[Coo71b] W.S. Cooper. A De�nition of Relevance for Information Retrieval. Information Storageand Retrieval, 7:19{37, 1971.[CR95] F. Crestani and C.J. van Rijsbergen. Probability Kinematics in Information Retrieval.In Proceedings of the Eighteenth International ACM SIGIR Conference on Researchand Development in Information Retrieval, pages 291{299, 1995.[FDF95] R. Fung and B. Del Favoro. Applying Bayesian Networks to Information Retrieval.Communications of the ACM, 38(3):42{48, 1995.[HB94] T.W.C. Huibers and P.D. Bruza. Situations: A general framework for studying Infor-mation Retrieval. In Proceedings of the 16th BCS Information Retrieval Colloquium.British Computer Society, 1994.[HIC95] T. Huibers, O. Iadh, and J. Chevallet. Axiomatization of a Conceptual Graph Formal-ism for Information Retrieval in a Situated Framework. Under Preparation, 1995.[Hun95] A. Hunter. Using default logic in information retrieval. In C Froidevaux and J Kohlas,editors, Symbolic and Quantitative Approaches to Uncertainty, volume 946 of LectureNotes in Computer Science, pages 235{242, 1995.[Hut77] W.J. Hutchinson. On the problem of aboutness in Information Retrieval. Journal ofInformatics, 1:17{35, 1977.[IJd94] J.J. IJdens. Using Index Expression Belief Networks for Information Disclosure|towards e�ective use of the IEBN model as a disclosure system. Master's thesis, UtrechtUniversity, March 1994.[Ing94] P. Ingwersen. Polyrepresentation of Information Needs and Semantic Entities. InProceedings of the Seventeenth ACM SIGIR Conference on Research and Developmentin Information Retrieval, pages 101{110, 1994.[KLM90] S. Kraus, D. Lehmann, and M. Magidor. Nonmonotonic Reasoning, Preferential Modelsand Cumulative Logics. Arti�cial Intelligence, 44:167{207, 1990.[Lan86] F. Landman. Towards a Theory of Information. Foris, 1986.[LR92] M. Lalmas and C.J. van Rijsbergen. A Logical Model of Information Retrieval based onSituation Theory. In Proceedings of the 14th BCS Information Retrieval Colloquium.British Computer Society, Springer-Verlag, 1992.[Mar77] M.E. Maron. On Indexing, Retrieval and the Meaning of About. Journal of the Amer-ican Society for Information Science, 28(1):38{43, 1977.[Mar91] R. Marshall. Manipulating Full-Text Scienti�c Databases: A logic-based Semantico-pragmatic Approach. The Computer Journal, 34(3):245{253, 1991.18

[MSST93] M. Meghini, F. Sebastiani, U. Straccia, and C. Thanos. A Model of information Re-trieval based on Terminological Logic. In Proceedings of the Sixteenth ACM SIGIRConference on Research and Development in Information Retrieval, pages 298{307,1993.[Nie86] J. Nie. An Outline of a General Model for Information Retrieval Systems. In Proceedingsof the Ninth ACM SIGIR Conference on Research and Development in InformationRetrieval, pages 495{506, 1986.[Nie92] J. Nie. Towards a Probabilistic Modal Logic for semantic-based Information Retrieval.In Proceedings of the Fifteenth ACM SIGIR Conference on Research and Developmentin Information Retrieval, pages 140{151, 1992.[Pea88] J. Pearl. Probabalistic Reasoning in Intelligent Systems: Networks of Plausible Infer-ence. Morgan Kaufman Publishers, Palo Alto, 1988.[Rij86a] C.J. van Rijsbergen. A New Theoretical Framework for Information Retrieval. InProceedings of the Ninth International ACM SIGIR Conference on Research and De-velopment in Information Retrieval, pages 194{200, 1986.[Rij86b] C.J. van Rijsbergen. A non-classical logic for information retrieval. The ComputerJournal, 29(6):481{485, 1986.[Rij89] C.J. van Rijsbergen. Towards an Information Logic. In Proceedings of the Twelfth ACMSIGIR Conference on Research and Development in Information Retrieval, pages 77{86, 1989.[Rij93] C.J. van Rijsbergen. What is Information Anyway? In Two Essays in InformationRetrieval. University of Glasgow, 1993. Research Report IR-93-03.[SAB93] G. Salton, J. Allan, and C. Buckley. Approaches to Passage Retrieval in Full Text Infor-mation Systems. In Proceedings of the Sixteenth ACM SIGIR Conference on Researchand Development in Information Retrieval, pages 49{58, 1993.[Sal83] G. Salton. Introduction to Modern Information Retrieval. McGraw-Hill Book Company,1983.[Seb94] F. Sebastiani. A Probabilistic Terminological Logic for Modelling Information Re-trieval. In Proceedings of the Seventeenth ACM SIGIR Conference on Research andDevelopment in Information Retrieval, pages 122{130, 1994.[SEN90] L. Schamber, M. Eiseneberg, and M. Nilan. A Re-examination of Relevance: Towarda dynamic situational de�nition. Information Processing and Management, 26(6):755{776, 1990.[Sho89] Y. Shoham. E�cient Reasoning about Rich Temporal Domains. In R.H. Thomason,editor, Philosophical Logic and Arti�cial Intelligence, pages 191{222. Kluwer AcademicPublishers, 1989.[SvR90] T.M.T. Sembok and C.J. van Rijsbergen. SILOL: A Simple Logical-Linguistic Docu-ment Retrieval System. Information Processing and Management, 26(1):111{134, 1990.[TC90] H.R. Turtle and W. Bruce Croft. Inference Networks for Document Retrieval. InJ.L. Vidick, editor, Proceedings of the 13th International ACM SIGIR Conference onResearch and Development in Information Retrieval, pages 1{24, 1990.[Tur90] H.R. Turtle. Inference Networks for Document Retrieval. PhD thesis, University ofMassachusetts, Amherst, 1990. Available as technical report 90-92.19