beyond information searching and browsing: acquiring knowledge from digital libraries*1

24
Beyond information searching and browsing: acquiring knowledge from digital libraries q Ling Feng a, * , Manfred A. Jeusfeld b , Jeroen Hoppenbrouwers b a Department of Computer Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands b Infolab, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands Available online 18 May 2004 Abstract Digital libraries (DLs) are a resource for answering complex questions. Up to now, such systems mainly support keyword-based searching and browsing. The mapping from a research question to keywords and the assessment whether an article is relevant for a research question is completely with the user. In this paper, we present a two-layered digital library model. The aim is to enhance current DLs to support different levels of human cognitive acts, thus enabling new kinds of knowledge exchange among library users. The low layer of the model, namely, the tactical cognition support layer, provides users with requested relevant documents, as searching and browsing do. The upper layer of the model, namely, the strategic cognition support layer, not only provides users with relevant documents but also directly and intelligently answers users’ cognitive questions. On the basis of the proposed model, we divide the DL information space into two subspaces, i.e., a knowledge subspace and a document subspace, where documents in the document subspace serves as the justification for the corresponding knowledge in the knowledge subspace. Detailed description of the knowledge subspace and its construction, as well as query facilities against the enhanced DLs for users’ knowledge sharing and exchange, are particularly discussed. Ó 2004 Elsevier Ltd. All rights reserved. Keywords: Digital libraries; Tactical support; Strategic support; Document subspace; Knowledge subspace 1. Introduction In today’s networked environment, people demand new evolutionary technology for information re- trieval and intellectual exchange. DLs serve as an increasingly important channel to a vast array of information sources and services. They provide new opportunities to assemble, organize, and access large volumes of information from multiple repositories, while making distributed heterogeneous resources q An earlier version of the paper was presented at the ICADL’02 Conference. * Corresponding author. Tel.: +31-53-489-3772; fax: +31-53-489-2927. E-mail addresses: [email protected] (L. Feng), [email protected] (M.A. Jeusfeld), [email protected] (J. Hoppenbrouwers). URLs: http://www.cs.utwente.nl/ling (L. Feng), http://infolab.uvt.nl/people/jeusfeld/ (M.A. Jeusfeld), http://infolab.uvt.nl/people/ hoppie/ (J. Hoppenbrouwers). 0306-4573/$ - see front matter Ó 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.ipm.2004.04.005 Information Processing and Management 41 (2005) 97–120 www.elsevier.com/locate/infoproman

Upload: l

Post on 30-Dec-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Information Processing and Management 41 (2005) 97–120

www.elsevier.com/locate/infoproman

Beyond information searching and browsing:acquiring knowledge from digital libraries q

Ling Feng a,*, Manfred A. Jeusfeld b, Jeroen Hoppenbrouwers b

a Department of Computer Science, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlandsb Infolab, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands

Available online 18 May 2004

Abstract

Digital libraries (DLs) are a resource for answering complex questions. Up to now, such systems mainly support

keyword-based searching and browsing. The mapping from a research question to keywords and the assessment

whether an article is relevant for a research question is completely with the user. In this paper, we present a two-layered

digital library model. The aim is to enhance current DLs to support different levels of human cognitive acts, thus

enabling new kinds of knowledge exchange among library users. The low layer of the model, namely, the tactical

cognition support layer, provides users with requested relevant documents, as searching and browsing do. The upper

layer of the model, namely, the strategic cognition support layer, not only provides users with relevant documents but

also directly and intelligently answers users’ cognitive questions. On the basis of the proposed model, we divide the DL

information space into two subspaces, i.e., a knowledge subspace and a document subspace, where documents in the

document subspace serves as the justification for the corresponding knowledge in the knowledge subspace. Detailed

description of the knowledge subspace and its construction, as well as query facilities against the enhanced DLs for

users’ knowledge sharing and exchange, are particularly discussed.

� 2004 Elsevier Ltd. All rights reserved.

Keywords: Digital libraries; Tactical support; Strategic support; Document subspace; Knowledge subspace

1. Introduction

In today’s networked environment, people demand new evolutionary technology for information re-

trieval and intellectual exchange. DLs serve as an increasingly important channel to a vast array ofinformation sources and services. They provide new opportunities to assemble, organize, and access large

volumes of information from multiple repositories, while making distributed heterogeneous resources

qAn earlier version of the paper was presented at the ICADL’02 Conference.* Corresponding author. Tel.: +31-53-489-3772; fax: +31-53-489-2927.

E-mail addresses: [email protected] (L. Feng), [email protected] (M.A. Jeusfeld), [email protected] (J. Hoppenbrouwers).

URLs: http://www.cs.utwente.nl/ling (L. Feng), http://infolab.uvt.nl/people/jeusfeld/ (M.A. Jeusfeld), http://infolab.uvt.nl/people/

hoppie/ (J. Hoppenbrouwers).

0306-4573/$ - see front matter � 2004 Elsevier Ltd. All rights reserved.doi:10.1016/j.ipm.2004.04.005

98 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

spread across the network appear to be a single uniform federated source (Schatz & Chen, 1999). Under the

assistance of DL systems, users can move from source to source, seeking and linking information auto-

matically or semi-automatically to solve their various information problems.

Traditionally, when people retrieve information, their activities are classified into two broad categories:searching and browsing. Searching implies that the user knows exactly what to look for, while browsing

should assist users navigating among correlated searchable terms to look for something new or interesting.

Currently, most of the major work on DL systems focus on supporting these two kinds of activities.

However, with more DLs built up to handle users’ information requests, we ask questions: Are the existing

information technologies powerful enough to facilitate DL users’ problem solving? Or what technologies do we

still need to help users better do their work? Compared to traditional physical libraries, are there any important

features missing from the current DL systems? To answer these questions, let us first look at a scenario on

the use of a DL system.Kooper works in a city plan and management office. He is investigating the precaution policy against

flood this year. If there will be a flood in the coming summer, necessary actions (e.g., strengthening the

embankment, resource allocation, etc.) must be taken now to prevent the city from sustaining losses.

According to Kooper’s previous experience, it seems that A wet-winter might cause flood in summer. To

confirm this pre-conceived hypothesis, he rewords a DL system to request information talking about the

reasons of flood. The DL system returns a number of articles. He browses through the returned article list

and selects three articles that look most relevant. All of the three articles mention that A precursor to the

flooding in summer is a wet-winter. To assure this information is also valid for the area where Kooper lives,he accesses the DL again to ask for documents reporting river floods in the city before. From the articles

returned, he gets to know the latest three severe floods that happened in 1986, 1995, 1997, respectively, in

the city. He then navigates to the meteorological repository of the DL, and accesses the weather infor-

mation of the city during the winter time of these three years. He notes a tight correlation between wet-

winter and summer-flood. Based on the information obtained from the DL, Kooper is pretty sure now about

his prior flood prediction assumption for the city. He logs off the system. Having experienced a very wet-

winter this year, Kooper decides to draw up a city flood precaution plan immediately.

This scenario shows us a few shortcomings that exist in current DLs. Below we outline some of theproblems and propose our work to solve them.

1.1. Inadequate strategic level cognition support

The aim of DL systems is to empower users to find useful information to solve their problems. When a

person goes into a library to look for something, she/he usually has certain purposes in mind. For example,

she/he may want to read a specific article written by a certain author. In this situation, the target of the user

is precise and clear. Sometimes, the user wishes to explore the available resources first before exploiting

them. This exploration may be targeted at refining a prior understanding of a certain information context,

or formulating a further concrete requirement for specific documents. Most efforts of the current DLsystems aim to support these two kinds of users’ activities, namely, searching and browsing.

However, besides simply entertaining users with documents located by searching and browsing, DL

systems also should consider supporting human’s strategic level cognitive work which can directly enable

correct actions and problem solving. Typically, users have already some prior domain-specific knowledge

or pre-conceived hypotheses. They may expect the library systems to confirm/deny their existing concepts,

or to check whether there is some exceptional/contradictory evidence against pre-existing notions, or to

provide some predictive information so that users can take effective actions. Under this circumstance, the

users’ information needs are not only for relevant documents, but also for intelligent answers to theirquestions together with supporting literature for justification and explanation, as illustrated in the above

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 99

scenario. To distinguish this kind of information need from traditional searching and browsing requests, we

categorize traditional users’ information searching and browsing activities into tactical level cognition act

and the latter into strategic level cognition act.

As Checkland and Holwell (1998) says, The nature of an information system is to provide informational

support to people as they carry out their intentional tasks. Developing one of the most complex and advanced

forms of information systems, DL system designers must have a comprehensive understanding of users’

information needs and their purposes for using DLs. In the above example, if the DL system could

automate user’s rational exploration from the knowledge space, consisting of propositions and assertions,

to the corresponding justification/explanation space, consisting of data sources in various forms like textual

articles, reports, databases, etc., the information role of the DLs in helping users derive effective decisions

will be greatly enforced. Unfortunately, such high-order cognition support and its impact on users’ work

have not received much exploration in current DL systems.

1.2. Inadequate knowledge sharing facilities

Traditional libraries are a public place where a large extent of mutual learning, knowledge sharing, and

exchange can happen. A user may ask a librarian for search assistance. Librarians themselves may col-laborate in the process of managing, organizing, and disseminating information, or share experiences in

using consistently-emerging new systems and tools to tackle users’ search questions. Users may commu-

nicate and learn from each other by observing how others use library’s resources, or by asking for help.

With paper sources digitalized and physical libraries moving to virtual DLs, these valuable features of

traditional libraries should be retained. We believe DLs of the future should not just be simple storage and

archival systems. To be successful, DLs must become a knowledge place where knowledge acquisition,

sharing, and propagation take place. For example, if the DL in the above scenario could make readily

available expertise and answers to strategic level questions, which might require time-consuming search orconsultation with experts, it can help users to better exploit the DL and improve working effectiveness.

Also, as computerized knowledge does not deteriorate with time as that human knowledge does, for long-

term retention, DLs offer ideal repositories for the knowledge in the world, and make it universally

accessible. Currently, there are some DL efforts aiming in this direction, see Marshall et al. (2003) for good

reference.

1.3. Our work

We propose a two-layered DL functional model to support both tactical level and strategic level cog-

nitive tasks of users according to their information needs. The model moves beyond simple searching and

browsing across multiple correlated repositories, to acquisition of knowledge. On the basis of the proposed

functional model, we further divide the DL information space into two subspaces, i.e., a knowledge subspace

and a document subspace. Documents in the document subspace serve as the justification for the corre-

sponding knowledge in the knowledge subspace. We give a detailed description of the DL knowledgesubspace, as well as query facilities against such enhanced DLs. Further issues regarding the construction of

the knowledge subspace and its linkage with the document subspace also are addressed in the paper.

The remainder of the paper is organized as follows. In Section 2, we review some closely related work. A

two-layered DL model, consisting of a tactical cognition support layer and a strategic cognition support

layer, is presented in Section 3. A formal description of the DL knowledge subspace and query facilities

against the enhanced DLs are presented in Section 4 and Section 5, respectively. Two mechanisms

for the construction of a DL’s knowledge subspace and its linkage to the document subspace are described

in Section 6. One of them is based on the manual input from experienced users. Another approachaddresses (semi-)automatic knowledge acquisition by correlating and analyzing data sources from multiple

100 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

repositories in DLs. Section 7 discusses the fundamental differences between such an enhanced DL system

and traditional knowledge-based information systems. A number of issues are meanwhile pointed out for

future work.

2. Related work

DLs integrate a variety of information technologies, calling for multi-disciplinary cooperation in

information science. In this section, we provide a brief review of relevant work in different fields, including

ontologies and lexical relations, knowledge representation, information retrieval, and multi-source inte-

gration in DLs.

2.1. Ontologies and lexical relations

The Oxford English Dictionary defines ontology as the ‘‘the science or study of being’’. In philosophy, the

word ontology refers to a systematic description of a minimal set of concepts that a language needs to

express all its other concepts (Hoppenbrouwers, 1997). In linguistic and lexicographic contexts, an ontology

is a specification of the concepts and their relationships that exist for a community of people or automatedagents. Its purpose is to enable knowledge sharing and reuses (Gruber, 1993). To reflect cognitive relations

among the concepts as expressed by or embodied in words in natural languages, the theory of lexical

relations was developed (Kiefer, 1969), where information is represented as a collection of nodes connected

by labeled arcs that express links or relationships between the nodes. These links may represent semantic

and syntactic relationships among concepts, and can be used to reflect not only major aspects of word

meaning, but also their morphological relationships and implicatures (Evens & Smith, 1979; Kiefer, 1969;

Markowitz, Nutter, & Evens, 1992; Nutter, 1989; Nutter, Fox, & Evens, 1990). The most widely recognized

relationships include is-a, synonym, antonym, and so on. In the literature, a link can be denoted using eithera labeled arc or a proposition node (Cercone & McCalla, 1987; Findler, 1979; Brachman, Ciccarelli,

Greenfeld, & Yonke, 1978). Further, a methodology for identifying, evaluating, and describing different

relationships, along with their logical properties in modeling human reasoning, was presented (Markowitz

et al., 1992). It begins with a study of dictionary and folk definitions to obtain the linguistic formulae used

to express the link in question. Then it discovers how this link appears in anthropology, linguistics, phi-

losophy, and psychology, and the way it interacts with other links and the way it is used in conceptual

information processing. A hierarchy of lexical–semantic relations is thus constructed which contains both

basic and non-basic semantic relations as well as a large number of lexical relations. Many (but not all) ofthe lexical relations so far identified can be extracted automatically from dictionary definitions (Nutter

et al., 1990). CYC (Lenat, 1995) aims to build a common sense knowledge base consisting of roughly

100,000 general concepts spanning all aspects of human reality. WordNet is another remarkable and widely

used lexicon, which groups the words together by means of synonym sets. It aims to provide an aid to

search in a lexicon in a conceptual rather than an alphabetical way (Miller, Beckwith, Fellbaum, Gross, &

Miller, 1993).

2.2. Knowledge representation

Knowledge is an important element of any AI application. A number of knowledge representation

schemas are designed in the AI field so that the knowledge can be applied in the reasoning process to solve

problems. These techniques can be roughly categorized as either declarative methods, in which most of the

knowledge is represented as a static collection of facts accompanied by a small set of general procedures formanipulating them; or procedural methods, in which the bulk of the knowledge is represented as procedures

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 101

for using it (Anderson, 1992; Garnham, 1988; Rich, 1983). Typical declarative methods include predicate

logic, semantic nets, frames, and scripts. (1) Predicate logic involves using standard forms of logical

symbolism to represent real-world facts as statements, written as well-formed formulae, which are made up

of predicates, constant terms, variables, logical connectives, and quantifiers. (2) Semantic nets take thecomplex structure of the world into consideration, where information is represented as a set of nodes

connected to each other by a set of labeled arcs, representing relationships among the nodes (Minsky,

1968a, 1968b). Both (3) frames and (4) scripts serve as general-purpose knowledge structures that represent

some common features of things or sequence of events in a particular context. Some procedural knowledge

representation methods include procedures and production rules.

In our study, we explore the use of predicate logic and semantic network mechanisms to represent

hypotheses and their inter-relationships in the DL knowledge subspace.

2.3. Information retrieval

To support efficient information searching and browsing activities, many efforts have been made in

developing retrieval models, building document and index spaces, or extending and refining queries for

information retrieval systems (Crestani, Lalmas, van Rijsgergen, & Campbell, 1998; Fidel, 1994; Frakes &

Baeza-Yates, 1992). A three-dimensional model of information exploration is presented in Waterworth and

Chignell (1991), with the attempt to clarify the respective roles of the human and the system in browsing

and information retrieval, and to characterize alternative interaction styles to maximize retrieval effec-

tiveness. In Dunlop and van Rijsbergen (1993), index terms are automatically extracted from documentsand a vector–space paradigm is exploited to measure the matching degrees between queries and documents.

Indexes and metadata also can be created manually, from which semantic relationships are captured (Beard

& Sharma, 1997; Dao, 1998). Besides, the large collection of documents can be semantically partitioned

into different clusters, so that queries can be evaluated against relevant clusters (Willett, 1988). To improve

query recall and precision, several query expansion and refinement techniques based on relational lexicons/

thesauri or relevance feedback have been explored (Efthimiadis, 1993; Jones et al., 1995; V�elez, Weiss,Sheldon, & Gifford, 1997). Ingwersen (1994) applies a cognitive polyrepresentation of information needs to

information retrieval systems, aiming to improve the intellectual access to information sources on an en-riched contextual platform. Beaulieu and Jones (1998) investigate the interface design issues for a highly

interactive information retrieval system based on a probabilistic retrieval model with relevance feedback.

Belkin and Marchetti (1990) propose a method for specifying the functionality of an intelligent interface to

large-scale information retrieval systems, and for implementing those functions in an operational envi-

ronment. Furthermore, Chen (1992) outlines a theoretical framework for understanding the information

storage and retrieval process in terms of the ‘‘agents’’ involved in the indexing and searching processes. It

shows how human agents are involved in the retrieval processes, the types of knowledge these agents

possess, and the observed characteristics of their indexing and searching behaviors. A recent work incor-porates knowledge about the document structures into information retrieval, and the presented query

language allows the assignment of structural roles to individual query terms (Wolff, Fl€orke, & Cremers,2000). Individual differences in using information retrieval systems also have been extensively studied in

Borgman (1987), McKnight, Dillon, and Richardson (1989), Belkin, Borgman, and Dumais (1994), Spink

and Saracevic (1998) and Dumais (1999).

2.4. Multi-source integration in DLs

The emerging field of DLs brings together participants from many existing research areas (Fox &Marchionini, 1998, 2001). From a database or information retrieval perspective, DLs may be seen as a form

102 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

of federated databases, since a DL usually contains lots of distributed and heterogeneous repositories which

may be autonomously managed by different organizations. To facilitate users’ easy access to diverse

sources, many efforts have been engaged in handling various structural and semantics variations and

providing users with a coherent view of a massive amount of information. Currently, the interoperabilityproblem has sparked vigorous discussions in the DL community (Chen, 1999; Paepcke, Chang, Garcia-

Molina, & Winograd, 1998; Payette, Blanchi, Lagoze, & Overly, 1999; Schatz, 1996, 1998; Schatz & Chen,

1999; Schatz et al., 1999). The concept extraction, mapping, and switching techniques, developed in Ben-

nett, He, Chang, and Schatz (1999), Mead and Gay (1995), Chen, Smith, and Ng (1997), enable users in a

certain area to easily search the specialized terminology of another area. A dynamic mediator infrastructure

(Melnik, Garcia-Molina, & Paepcke, 2000) allows mediators to be composed from a set of modules, each

implementing a particular mediation function, such as protocol translation, query translation, or result

merging (Paepcke et al., 2000). (Jr & Lagoze, 1997; Payette & Lagoze, 1999) present an extensible digitalobject and repository architecture FEDORA, which can support the aggregation of mixed distributed data

into complex objects, and associate multiple content disseminations with these objects. Kahn and Wilensky

(1995), Paepcke et al. (1996) and Paepcke and Hassan (1996) employ the distributed object technology to

cope with interoperability among heterogeneous resources. According to topic areas, a distributed semantic

framework is proposed to contextualize the entire collection of documents in a DL for efficient large-scale

searching (Papazoglou & Hoppenbrouwers, 1999). Experiences in designing and implementing DLs,

including the archival repository architecture, user interface, cross-access mechanism, and intellectual

property protection, etc., are substantially presented in Bates (2002); Hoppenbrouwers and Paijmans(2000); Crespo and Garcia-Molina (2000); Cooper, Crespo, and Garcia-Molina (2000); House, Butler,

Ogle, and Schiff (1996); House (1995); Phelps and Wilensky (1997); Beard and Sharma (1997); Buttenfield

and Larsen (1997). Recommendations for DL evaluation from both a longitudinal and multifaceted

viewpoint are provided in Marchionini (2000) and Kantor (1998).

On the other hand, applying AI to library science, many library-oriented expert systems have been

developed (Lancaster & Sandore, 1997; Lancaster & Smith, 1990). However, most of these systems

essentially aid in carrying out the support operations of libraries, such as descriptive cataloging, collection

development, disaster planning and response, database searching, document delivery, etc. (Lancaster &Sandore, 1997; Lancaster & Smith, 1990). There is a fundamental difference from the work reported in this

paper, whose intention is to answer library users’ strategic questions.

3. A two-layered DL function model

Supporting users’ information searching and browsing has been the focus of DL research for a long time.

However, as social humans, their information expectations for a DL are more than pure searching andbrowsing. In this section, we extend the traditional role of DLs from information provider to information

and knowledge provider. Fig. 1 illustrates a two-layered DL functional model, consisting of a tactical

cognition support layer and a strategic cognition support layer.

3.1. Tactical level vs. strategic level cognition support

3.1.1. Tactical level cognition support

We view traditional DL searching and browsing as tactical level cognitive acts. The target of searching is

towards certain specific documents. One searching example is ‘‘Look for the article written by John Brown

in the proceedings of VLDB88’’. As the user’s request can be precisely stated beforehand, identifying the

target repository where the requested document is located is relatively easy. Primarily, the ability to searchindexes of repositories can support this kind of searching activities. In comparison to searching whose

Fig. 1. A two-layered DL functional model.

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 103

objective is well-defined, browsing aims to provide users with a conceptual map, so that users can navigate

among correlated items to hopefully find some potentially useful documents, or to formulate a more precise

retrieval request further. For instance, a user reads an article talking about a water reservoir construction

plan in a certain region. She/he wants to know the possible influence on ecological balance. By following

semantic links for the water reservoir plan in the DL, she/he navigates to the related ‘‘ecological protection’’

theme, under which a set of searchable terms with relevant documents are listed for selection. To facilitate

browsing, DLs must integrate diverse repositories to provide users with a uniform searching and retrieval

interface to a coherent collection of materials. The capability that enables navigation among a network ofinter-related concepts, plus the searching capability on each individual repository, constitute the funda-

mental support to browsing activities.

As the techniques of searching and browsing have been extensively studied and published in the liter-

ature, we will not discuss these any further here.

3.1.2. Strategic level cognition support

In contrast to tactical level cognition support which intends to provide users with requested documents,

strategic level cognition support not only provides relevant documents but also intelligently answers high-

order cognitive questions, and meanwhile provides justifications and evidences. Taking the scenario in

Section 1 as an example, the purpose of Kooper on the use of the DL is to confirm his prior knowledgeabout ‘‘Wet-winter causes summer-flood’’. Instead of retrieving documents using keywords like wet-winter,

summer-flood, cause, etc., the user would prefer to pose a direct question Q1 as follows, and expect a direct

confirmed/denied answer from the DL system rather than a list of articles lacking explanatory semantics

and waiting for his assessment.

Q1: ‘‘Tell me whether a wet-winter will cause summer-flood’’.

Other high-order cognitive request examples are like:

Q2: ‘‘Tell me what will possibly cause summer-flood. Give me referent articles that talk about this’’.

Q3: ‘‘Give me articles which state the necessary influences of wet winter’’.In response to different questions as described above, it is desirable for DLs to provide knowledge-level

answers, together with the justifications for these answers. For example, the justification for Q1 is a series of

articles talking about wet-winter causes flood in summer, as well as evidence articles which talk about,

respectively, wet-winter in certain years and summer-flood in the next years in certain particular regions.

The provision of strategic level cognition support adds values to DLs beyond simply providing docu-

ment access. It reinforces the exploration and utilization of information in DLs, and advocates a more close

and powerful interaction between users and DL systems. With this high-order cognitive assistance, DL

users will be able to find things to solve their real information problems themselves. From the viewpoint ofDLs, to realize such a strategic level cognition function, substantial information analysis and abstraction

Fig. 2. An enlarged DL information space.

104 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

need to be done. This inevitably involves the navigation and correlation of information items across

multiple repositories in DLs, and production of intelligent knowledge in answering users’ strategic ques-tions.

3.2. An enlarged DL information space

In order to support the two kinds of cognitive acts, we further divide the DL information space into two

subspaces, i.e., a knowledge subspace and a document subspace, as shown in Fig. 2. Note that in this figure,

we only illustrate the organization of documents for knowledge justification purpose. Documents in a DL

are in fact also indexed and clustered based on the ontology and specific topics.

3.2.1. The knowledge subspace

The basic constituent of the knowledge subspace is knowledge, such as hypothesis, rule, belief, etc. 1

Each hypothesis describes a certain relationship among a set of concepts. For example, the hypothesis ‘‘H2:wet-winter causes summer-flood’’ explicates a causal relationship between a cause wet-winter and the effect

summer-flood it has. Considering that the DL knowledge subspace is for users to retrieve strategical level

knowledge, it will inevitably be subject to the classical information retrieval’s vocabulary (synonymy and

polysemy) problem. Previous research (Bates, 1986; Chen, Lynch, Basu, & Ng, 1993) demonstrated thatdifferent users tend to use different terms to seek identical information. Furnas, Landauer, Gomex, and

Dumais (1987) showed that the probability of two people using the same term to classify an object was less

than 20%. To enable knowledge exchange and reusability, we formulate various relationships including

equivalence, specification/generalization, and opposition over the hypotheses knowledge, expanding a single

user’s hypothesis into a network of related hypotheses. For example, a more general hypothesis with respect

1 In this initial study, we focus on hypothesis knowledge in empirical sciences.

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 105

to H2 is ‘‘H1: wet-winter is related with river behavior’’. Section 4 gives a detailed description of the

knowledge subspace. A query language against the enhanced DLs is provided in Section 5.

3.2.2. The document subspace

Under each hypothesis is a justification set, giving reasons and evidences for the knowledge. These

justifications, made up of articles, reports, etc., constitute the document subspace of the DL information

space. Taking the above hypothesis H2 for example, the articles mentioning exactly that wet-winter is an

indicator of summer-flood constitute the justification for that hypothesis. Returning to the previous example

questions Q1, Q2, and Q3, the DL system can provide not only intelligent answers according to thehypothesis, but also a series of links to the justifying articles for users’ reference.

It is worth notice here that the document subspace challenges traditional DLs on literature organization,

classification, and management. For belief justifications, we must extend the classical keyword-based index

schema, which is mainly used for information searching and browsing purposes, to knowledge-based index

schema, in order that the information in DLs can be easily retrieved by both keywords and knowledge. Two

mechanisms for constructing the knowledge subspace and its linkage with the document subspace are

described in Section 6.

4. A formal description of DL knowledge subspace

In this section, we define the basic constituent of the DL knowledge subspace-hypothesis, starting with its

two constructional elements, i.e., concept terms and relation terms. Different concept-term-based and

relation-term-based relationships are then defined. They form the base for the definitions of hypotheses and

the equivalence, specification/generalization, and opposition relationships of hypotheses. Throughout the

discussion, we assume the following notation is used.

• A finite set of entity concepts EConcept ¼ fe1; e2; . . . ; etg.• A finite set of relation concepts RConcept ¼ fr1; r2; . . . ; rsg.• A finite set of concepts Concept, where Concept ¼ EConcept [ RConcept.• A finite set of contextual attributes Att ¼ fa1; a2; . . . ; amg.The domain of ai 2 Att is denoted as DomðaiÞ.

• A finite set of concept terms CTerm ¼ fn1; n2; . . . ; nug.• A finite set of relation terms RTerm ¼ fm1;m2; . . . ;mvg.• A finite set of hypotheses Hypo ¼ fH1;H2; . . . ;Hwg.

4.1. Concepts

Concepts represent real-world entities, relations, states, and events. Here, we classify concepts into two

categories, i.e., entity concepts and relation concepts, according to the roles they play in representingknowledge, which will be addressed in this section. Entity concepts are used to describe concept terms,

including attributes. They are similar to entities in classical Entity-Relationship (ER) diagrams. Some entity

concept examples are dog, animal, traffic-jam, wet-winter, summer-flood, etc. In contrast, relation concepts

are mainly for presenting various relationships among concept terms. They thus look like relationships in

ER diagrams. For example, the causal relation between a wet-winter in the north and a consequence

summer-flood in the south can be expressed with the relation concept cause.

Concepts are the building blocks of a sentence. They convey the most fundamental cognitive knowledge

in human society. Based on the substantial work on lexicography and ontology (Sowa, 1984; Kiefer, 1969;Evens & Smith, 1979; Nutter et al., 1990; Gruber, 1993; Guarino & Welty, 2000), three typical primitive

106 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

relationships between concepts can be established. They are Is-A, Synonym, and Opposite, each of which is

denoted using a binary predicate. For example, Is-A (summer-flood, river-behavior) represents the notion

that summer-flood is a river-behavior, Opposite (wet-winter, dry-winter) represents a pair of antonym con-

cepts, wet-winter and dry-winter, and Is-A (cause, relate) specifies a more general relation concept relatewith respect to cause.

Property 1. The three primitive relationships of concepts (which can be either entity concepts or relation

concepts) have the following properties: (1) Is-A is reflexive and transitive. (2) Synonym is reflexive, tran-

sitive, and symmetric. (3) Opposite is symmetric.

4.2. Concept terms

Observing that real-world users’ investigation and understanding of concepts are usually under certain

specific contexts, in this study, we associate each entity concept with a conceptual context to denote the

circumstance under which the entity concept is considered. Basically, we can declare a conceptual context

by a set of attributes called contextual attributes, each of which represents a dimension in the real world.

Typical contextual attributes include time, space, temperature, and so on. Note that the contextual attri-

butes could be of any kind as long as they are meaningful and make commitment to the existence of entity

concepts. For example, the context for the entity concepts wet-winter and summer-flood could be con-

structed by two contextual attributes, Period and Region, indicating the time and place for the occurrencesof wet-winter and summer-flood. The context of an entity concept can be further instantiated by assigning

concrete values to its constructional attributes. For example, we can restrict the contextual Region and Year

of the wet-winter concept to the north and south areas in 2000 by setting Region :¼ f\north"; \south"g andyear:¼ {‘‘2000’’}. Region :¼ DomðRegionÞ assigns all applicable regions (‘‘south’’, ‘‘north’’, ‘‘east’’, and‘‘west’’) to contextual attribute Region.

In this paper, we view an entity concept and its associated context as an integral unit, where the context

spans the universe of discourse for that entity concept. To distinguish it from the traditional notion of

entity concept, we name it as a concept term.

Definition 1. A concept term n 2 CTerm is of the form n ¼ ejAV , where e 2 EConcept and AV ¼fa :¼ Va j ða 2 AttÞ ^ ðVa DomðaÞÞg.Let Att ¼ fa1; a2; . . . ; amg be a set of contextual attributes which constitute the context under consid-

eration. The default setting for attribute ai (where ai 2 Att) is the whole set of applicable values in thedomain of ai, i.e., ai :¼ DomðaiÞ. AV ¼ fa1 :¼ Domða1Þ; a2 :¼ Domða2Þ; . . . ; am :¼ DomðamÞg depicts auniverse context, covering all possible values for every constructional attribute. A simple and equivalent

representation of a universe context is AV ¼ �.

Example 1. Suppose the context under consideration is comprised of two attributes Region and Year.

wet-winter jfRegion:¼f\north";\south"g;Year:¼f\2000"gg is a concept term, denoting a wet-winter entity concept in the north

or south in 2000. The concept term wet-winter jfYear:¼f\2000"gg implies a wet-winter concept in 2000 in all regions,including ‘‘north’’, ‘‘south’’, ‘‘west’’, and ‘‘east’’. The Context in wet-winterjfRegion:¼DomðRegionÞ;Year:¼DomainðYearÞg

spans the universe of discourse, equivalent to the expression wet-winter j� according to Definition 1.

Different relationships of concept terms can be identified based on their entity concepts and associated

contexts. Before giving the formal definitions, we first define three relationships between two conceptualcontexts.

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 107

Definition 2. Let AV1 and AV2 be two instantiated contexts.

• AV16 a AV2 ðor AV2 P a AV1Þ, iff 8ða :¼ V2Þ 2 AV2 9ða :¼ V1Þ 2 AV1ðV1 V2Þ;• AV1 ¼a AV2, iff AV1, 6 a AV2 and AV26 a AV1;• AV1 <a AV2 ðorAV2 >a AV1Þ, iff ðAV16 a AV2 ^ AV1 6¼a AV2Þ.

According to Definition 1, for any instantiated context AV , we have AV 6 a �.

The ¼ a relationship between two instantiated contexts AV1 and AV2 indicates that they both have exactlythe same contextual attributes with the same attribute values. AV16 a AV2 implies a broader contextualscope of AV2 which embodies the contextual scope of AV1.

Example 2. Assume we have four instantiated contexts:

AV1 ¼ �,AV2 ¼ fYear :¼ f\1999"; \2000"; \2001"gg,AV3 ¼ fRegion :¼ f\north"g; Year :¼ f\2000"gg, andAV4 ¼ fRegion :¼ f\south"g; Year :¼ f\2000"gg.

Following Definition 2,

AV2 <a AV1, AV3 <a AV1, AV4 <a AV1, AV3 <a AV2, and AV4 <a AV2.

Property 2. The four context-based relationships defined above have the following properties:

(1) 6 a (or P a) is reflexive and transitive.

(2) ¼a is reflexive, transitive, and symmetric.

(3) <a (or >a) is transitive.

Based on the primitive relationships of concepts (Is-A, Synonym, Opposite), as well as their contexts

ð6 a;¼a; <aÞ, we declare the following three concept-term-based relationships, i.e., equivalence EQct,

specification SPECct, and opposition OPSIct. Assume n1 ¼ e1jAVi and n2 ¼ e2jAV2 are two concept terms inthe following definitions, where n1, n2 2 CTerm.

Definition 3 (Equivalence EQctðn1; n2Þ). n1 is equivalent to n2, iff the following two conditions hold: (1)ðe1 ¼ e2Þ _ Synonymðe1; e2Þ; (2) ðAV1 ¼a AV2Þ.

Example 3. Given two concept terms: n1 ¼ wet-winter j� and n2 ¼ high-rainfall-winterjfRegion:¼DomðRegionÞg,

EQctðn1; n2Þ since Synonymðwet-winter; high-rainfall-winterÞ.

Definition 4 (Specification SPECctðn1; n2Þ). n1 is a specification of n2 (conversely, n2 is a generalization ofn1), iff the following two conditions hold: (1) ðe1 ¼ e2Þ _ Is� Aðe1; e2Þ _ Synonymðe1; e2Þ and (2) ðAV16 a AV1Þ.

Example 4. Given three concept terms: n1 ¼ wet-winter jfRegion:¼f\north"g;Year:¼f\2000"gg, n2 ¼ wet-winterjfYear:¼f\2000"gg, and n3 ¼ winter j�, we have SPECctðn1; n2Þ, SPECctðn1; n3Þ, and SPECctðn2; n3Þ, since fRegion :¼f\north"g; Year :¼ f\2000"gg <a fYear :¼ f\2000"gg, fRegion :¼ f\north"g; Year :¼ f\2000"gg <�

a,fYear:¼ f\2000"gg <�

a, and Is-Aðwet-winter;winterÞ.

108 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

Definition 5 (Opposition OPSIctðn1; n2Þ). n1 is opposite to n2, iff the following two conditions hold (1)Oppositeðe1; e2Þ; (2) ðAV1 ¼a AV2Þ.

Example 5. Let n1 ¼ wet-winter jfRegion:¼f\north"gg and n2 ¼ dry-winterjfRegion:¼f\north"gg be two concept terms, wehave OPSIctðn1; n2Þ since Oppositeðwet-winter; dry-winterÞ.

To facilitate the description of hypothesis-based inter-relationships in Section 4.4, we further extend the

three relationships (i.e., EQct, SPECct, and OPSIct) defined over a pair of concept terms to the relationships(i.e., EQCT , SPECCT , and OPSICT ) over a pair of concept term sets. Let N1 and N2 be two concept term sets,where N1, N2 CTerm.

Definition 6 (Equivalence EQCT ðN1;N2Þ). N1 is equivalent to N2, iff 8n1 2 N1 9n2 2 N2 EQctðn1; n2Þ ^ 8n2 2N2 9n1 2 N1 EQctðn2; n1Þ.

Definition 7 (Specification SPECCT ðN1;N2Þ). N1 is specification of N2, iff 8n2 2 N2 9n1 2 N1 SPECctðn1; n2Þ.

Definition 8 (Opposition OPSICT ðN1;N2Þ). N1 is opposite to N2, iff 9n1 2 N1 9n2 2 N2 OPSIctðn1; n2Þ _ 9n2 2N2 9n1 2 N1 OPSIctðn2; n1Þ.

As long as there exists a pair of opposite concept terms in the two concept term sets, we declare they areopposite.

Property 3. The three relationships (EQCT , SPECCT , and OPSlCT) defined above have the following properties:

(1) EQCT is reflexive, transitive, and symmetric. (2) SPECCT is reflexive and transitive. (3) OPSICT is sym-

metric.

4.3. Relation terms

A conceptual relation explicates a certain correlation among a set of conceptual terms. Unlike entity

concepts, a relation concept can be affiliated with different kinds of modals like necessity, possibility, per-

mission, etc. to qualify the truth of the relationship. Reword we explore the use of well-established modallogic (Chellas, 1980) to our conceptual relation study. Modal logic, as mentioned in the Stanford Ency-

clopedia of philosophy (Garson, 2001), is, strictly speaking, the study of deductive behavior of the

expressions ‘‘it is necessary that’’ and ‘‘it is possible that’’. Here, by prefixing a relation concept r with thesymbol � or }, we can achieve different levels of ascertainability regarding relation r. For example, �causeimplies a definite causal relation, while }cause implies a possible cause relation.

Definition 9. A relation term m 2 RTerm is of the form m ¼ dr, where r 2 RConcept, and d could be �, }, oromitted representing an empty modal.

Definition 10. According to modal logic, we define the order \�" < \" < \}" for symbols �, empty modal,and }.

The three relation-term-based relationships can be defined using the same names (i.e., EQ, SPEC, andOPSI) as concept-term-based relationships but with a different subscript flag ‘‘rt’’ to make the difference.

Definition 11 (Equivalence EQrtðd1r1; d2r2Þ). d1r1 is equivalent to d2r2, iff the following two conditions hold:(1) ðr1 ¼ r2Þ _ Synonymðr1; r2Þ and (2) ðd1 ¼ d2Þ.

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 109

Definition 12 (Specification SPECrtðd1r1; d2r2Þ). d1r1 is a specification of d2r2 (conversely, d2r2 is a general-ization of d1r1), iff the following two conditions hold: (1) ðr1 ¼ r2Þ _ Is-Aðr1; r2Þ _ Synonymðr1; r2Þ and (2)ðd1 ¼ d2Þ _ ðd1 < d2Þ.

Definition 13 (Opposition OPSIrtðd1r1; d1r2Þ). d1r1 is opposite to d2r2, iff the following two conditions hold:(1) Opposite ðr1; r2Þ and (2) ðd1 ¼ d2 6¼ \}"Þ.

Example 6. EQrtð}cause;}lead-toÞ, SPECrtð�cause;}causeÞ, SPECrtðcause; relateÞ, OPSIrtð�relate;�unr-elateÞ.

The three relation-term-based relationships defined above have the same properties as their corre-

sponding concept-term-based relationships.

4.4. Hypotheses

A hypothesis communicates a human’s cognitive idea or thinking about things in existence, such as the

causal connection of situations, the sequential occurrence of events, etc. Here, we view each piece of

hypothesis as a conceptual relation (situated at the intermediate of Fig. 3) which correlates a set of

input concept terms to a set of output concept terms. For example, the hypothesis ‘‘wet-winter in the

north causes summer-flood in the south and hot summer in the east’’ causally relates concept term

wet-winter jfRegion:¼f\north"gg to summer-flood jfRegion:¼f\south"gg and hot-summer jfRegion:¼f\east"gg.

Definition 14. A hypothesis H 2 Hypo is of the form H ¼ drðIN ;ON Þ, where dr 2 RTerm, IN CTerm, and

ON CTerm. A hypothesis H also can be pictorially denoted as ‘‘H : IN )dr

ON ’’.

Various hypothesis-based inter-relationships can be established based on their component relationships,

i.e., concept term sets and relation terms. Assume that H1: IN1 )d1r1

ON1 and H2: IN2 )d2r2

ON2 are two hypotheses

in Hypo.

Definition 15. H1 is equivalent to H2, written as H1 �h H2, iff EQrtðd1r1; d2r2Þ ^ EQCT ðIN1 ; IN2Þ ^EQCT ðON1 ;ON2Þ.

For two equivalent hypotheses, they must have equivalent relation terms, as well as equivalent input and

output concept term sets.

Definition 16. H1 is a specification of H2 (conversely, H2 is a generalization of H1), written as

H1 �h H2ðH2 �h H1Þ, iff SPECrtðd1r1; d2r2Þ ^ SPECCT ðIN1 ; IN2Þ ^ SPECCT ðON1 ;ON2Þ.

Fig. 3. A schematic view of a hypothesis.

110 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

We call H1 a strict specification of H2 (conversely, H2 a strict generalization of H1), written asH1 �h H2ðH2 �h H1Þ, iff ðH1 �h H2Þ ^ ðH1 6� h H2Þ.If H1 is a specification of H2 and a specification of H3, then H1 is a common specification of H2 and H3.

Conversely, if H1 is a generalization of H2 and a generalization of H3, then H1 is a common generalization ofH2 and H3.

Example 7. Given the following three hypotheses:

H1: fwet-winter jfRegion:¼f\north"gg;warm-winter jfRegion:¼f\north"ggg )�cause

fsummer-flood jfRegion:¼f\south"ggg,H2: fwet-winterjfRegion:¼f\north"ggg )

}causefsummer-floodjfRegion:¼f\south"ggg, and

H3: fwet-winterjfRegion:¼f\north"ggg )}cause

friver-behavior j�g,

H1 is more specific than H2 and H3, and H2 is also more specific than H3 (i.e., H1 �h H2, H1 �h H3, andH2 �h H3). Besides, all of them are strict specifications. H1 is a common specification of H2 and H3, and H3 isa common generalization of H1 and H2.

Definition 17. H1 is opposite to H2, written as H1 /h H2, iff either of the following conditions holds:

(1) OPSIrtðd1r1; d2r2Þ ^ EQCT ðIN1 ; IN2Þ ^ EQCT ðON1ON2Þ; or(2) EQrtðd1ri; d2r2Þ ^ EQCT ðIN1 ; IN2Þ ^ OPSICT ðON1ON2Þ.

For two opposite hypotheses, they may have equivalent input/output concept term sets but with oppositerelation terms (Case 1 of the definition), or they may have equivalent relation terms and input concept term

sets, but with at least one opposite output concept term pair (Case 2 of the definition).

Example 8. Given the following three hypotheses:

H1: fwet-winter j�g )�relate

fsummer-flood j�; hot-summer j�g,H2: fwet-winter j�g )

�unrelatefsummer-flood j�; hot-summer j�g, and

H3: fwet-winter j�g )�relate

fcool-summer j�g,

we have ðH1 /h H2Þ and ðH1 /h H3Þ, since OPSIrtð�relate;�unrelateÞ and OPSICT ðhot-summer j�;cool-summer j�Þ.

Property 4. The relationships defined on hypotheses have the following properties: (1) Equivalence �h is

reflexive, transitive, and symmetric. (2) Specification �h is reflexive and transitive. (3) Strict specification �h

is transitive. (4) Opposition /h is symmetric.

4.5. The knowledge subspace

Hypotheses and their inter-relationships constitute a DL knowledge subspace. At an abstract level, a

knowledge subspace can be viewed as an oriented diagram consisting of a series of nodes (each representing

a hypothesis) that are connected to each other through directed labeled edges (representing various rela-

tionships between hypotheses), as shown in the upper part of Fig. 2. To make the diagram connected, we

introduce two special hypotheses to Hypo: the universal hypothesis > that is a generalization of all otherhypothesis, and the absurd hypothesis ? that is a specification of all other hypotheses.

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 111

Definition 18. A DL knowledge subspace KSpace ¼ ðNode;EdgeÞ is composed of a set of nodes representinghypotheses in Hypo, and a set of directed edges representing relationships of equivalence �h, (strict)

specification ð�hÞ �h, and opposition /h among the hypotheses.

The DL knowledge subspace provides users with a huge exploratory space, where knowledge acquisi-

tion, navigation, and deduction can be performed. If a user’s inquiry has the form of a hypothesis, the

above relationships like equivalence, specification/generalization, opposition can be explored to find

matching hypotheses in the knowledge subspace. The hypotheses together with the backing documents,

serving as the justification of hypotheses, are returned to the user as a part of the answer to his/her stra-

tegical request.

4.6. The linkage between the knowledge subspace and document subspace

The DL document subspace accommodates all the documents in a library. They are the sources for

answering users’ information searching and browsing requests. In addition, for an enhanced DL systemproposed in this paper, documents in the DL document subspace also serve as the justification for the

corresponding knowledge in the knowledge subspace. That is, under each hypothesis is a justification,

giving reasons and evidences for that knowledge. This justification comes from articles, reports, and so on.

Taking the hypothesis H : fwet-winter j�g )�cause

fsummer-flood j�g for example, its supporting documents arethose which state exactly that ‘‘wet-winter is an indicator of summer-flood’’. Let Doc denote the whole set ofdocuments in a DL. All the documents in the DL that support a hypothesis constitute the referent for that

hypothesis.

Definition 19. Let H 2 Hypo be a hypothesis in the knowledge subspace. The referent of H , written asWH , isthe set of all documents in the library that support H .For the two special hypotheses, we assume W ?¼ 0 and W> ¼ Doc.

The defined relationships (i.e., equivalent, specific/general, and opposite) among hypotheses lead us to thefollowing axiom and theorem.

Axiom 1. Let H1, H2 be two hypotheses in Hypo.

• If H1 is a (strict) specification of H2 (i.e., H1 �h H2 or H1 �h H2), then WH1 WH2.• If H1 is equivalent to H2 (i.e., H1 �h H2), then WH1 ¼ WH2• H1 is opposite to H2 (i.e., H1 /h H2), then WH1 \WH2 ¼ ;.

Using Definition 16 of speciality/generality between hypotheses, we can be sure that if a hypothesis is

consistent with a set of documents, any generalization of it also will be consistent with this document set. In

contrast, if a document does not conform to a hypothesis, it cannot conform to any specialization of that

hypothesis either. For any hypothesis H 2 Hypo where ð?�h H �h >Þ, it is obvious that ; ¼ W ?WH W> ¼ Doc.

Theorem 1. Let H1, H2, H3 be three hypotheses in Hypo, where H1 is a common generalization of H2 and H3i.e., ðH2;�h H1Þ and ðH3 �h H1Þ. We have ðWH2 [WH3Þ WH1.

Proof. Since ðH2 �h H1Þ, according to Axiom 1, ðWH2 WH1Þ. Similarly, ðWH3 WH1Þ because ofðH3 �h H1Þ. Thus, ðW3H2 [WH3Þ WH1. h

112 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

5. Queries against the enhanced DLs

With the proposed two-layered DL information space, users can pose to DLs various strategic level

questions which involve knowledge elements like ‘‘what will cause a summer-flood in the south?’’ ‘‘Does

mobile phone cause brain cancer? If so, give me documents to prove that’’. To express this kind of information

requests, we present a query language against the enhanced DLs, using a syntax similar to logical

expressions.

5.1. Query constructors

Basically, a DL query consists of a single head and a single tail separated by the j symbol, hheadi j htaili.The tail describes the query condition, while the head describes the query result, which can be knowledge(i.e., either a complete hypothesis or part of a hypothesis) with/without justification documents, or a

boolean indicating whether the knowledge is true or false.

A query tail is a list of predicates connected by logical connectives AND ð^Þ, OR ð_Þ, or NOT ð:Þ,where each predicate falls into any of the following three categories:

• an element binding constraint FuncðeÞ=Funcðe;EÞ;• a hypothesis of a binary predicate form tðIN ;ON Þ;• a comparison constraint between two expressions hexpressionihopcompihexpressioni.

An element in a query tail can be either a variable or a constant. It can represent a hypothesis, a

hypothesis component (which can be a concept term, a contextual attribute, a contextual attribute value, a

relation term, or a modal symbol), or a referent document of a hypothesis, i.e.,

helementi ::¼ hhypothesisi j hhypothesis componenti jhreferent-documenti

hhypothesis componenti ::¼ hconcept-termi j hcontextual-attributei jhcontextual-attribute-valuei jhrelation-termi j hmodali

We use different constraint functions of the form FuncðeÞ or Funcðe;EÞ to bind an element to the corre-sponding category.

• HypoðeÞ––e denotes a hypothesis;• Ref ðe;HÞ––e denotes a referent document of hypothesis H ;• CTermðe;HÞ––e denotes a concept term in hypothesis H ;• Attðe;HÞ––e denotes a contextual attribute in hypothesis H ;• Valðe; aÞ––e denotes a value of the contextual attribute a;• RTermðe;HÞ––e denotes a relation term in hypothesis H ;• Modalðe;HÞ––e denotes a modal in hypothesis H .

An expression in a comparison constraint hexpressionihopcompihexpressioni can be an element or anarithmetic expression involving single-value-oriented operators {+, ), *, /} or set-value-oriented operatorsf\;[g. That is,

hexpressioni ::¼ helementi j helementihopmathihexpressionihop i ::¼ þ j � j � j = j \ j[

math

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 113

To query the knowledge in a DL, we expand the traditional comparison operators to include the

hypothesis-based comparison operators f�h;�h;�h;�h;�h; 6� h;/hg defined in the previous section, i.e.,

hopcompi ::¼< j6 j ¼ j 6¼ j > jP j ! j j " j # j �h j �h j �h j 6� hj /h j �h j �h

Within a query (tail and head), we use e to indicate a single element, and brace feg to indicate a set ofelements.

5.2. Query examples

We illustrate the usage of the presented query constructors through a set of examples. In this subsection,

we formulate our descriptions according to different types of query outputs.

(1) Query the tenability of an input knowledge (e.g., hypothesis)Q1: Tell me whether wet-winter will cause summer-flood.

jcauseðfwet-winter j�g; fsummer-flood j�gÞ

An empty query head merely inquires the tenability of the input knowledge. If it holds, the query returns

true, and otherwise, false.

(2) Query the referent documents of a hypothesis

Q2: Give me documents which talk about the causal relation between wet-winter and summer-flood.

fdg jHypoðHÞ ^ Ref ðd;HÞ ^ H �h causeðfwet-winter j�g; fsummer-flood j�gÞ

Some intermediate variables with corresponding binding constraints (e.g., H in this query example) can beemployed for the ease of query expressions.

(3) Query the hypotheses and their referent documents

Q3: Tell me all hypotheses in contrast to the one ‘‘wet-winter will cause summer-flood’’. Give me all the

referent documents.

fH ; fdgg jHypoðHÞ ^ Ref ðd;HÞ ^ H /h causeðfwet-winter j�g; fsummer-flood j�gÞ

The result is a set whose element is made up of two fields, a hypothesis and a set of supporting documents.

(4) Query the concept terms in a hypothesis

Q4: Tell me what factor(s) will possibly cause a summer-flood. Give me the referent documents.

fng; fdg jHypoðHÞ ^ CTermðn;HÞ ^ Ref ðd;HÞ ^ H �h }causeðn:fsummer-flood j�gÞ

(5) Query the relation term in a hypothesis

Q5: Tell me whether there is a relation between wet-winter and summer-flood. Give me the referent docu-

ments.

m; fdg jHypoðHÞ ^ RTermðr;HÞ ^ Ref ðd;HÞ ^ H �h mðfwet-winter j�g; fsummer-flood j�gÞ

(6) Query the modal of a relation term in a hypothesis

Q6: Tell me whether wet-winter will necessarily or possibly cause summer-flood.

d jHypoðHÞ ^Modalðd;HÞ ^ H �h dcauseðfwet-winter j�g; fsummer-flood j�gÞ

The query answer will be �, }, or empty.(7) Query the contextual values in a hypothesisQ7: Tell me which region(s) will have summer-flood due to the wet-winter in the south. Give me the referent

documents.

114 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

fvg; fdg jHypoðHÞ ^ Valðv;RegionÞ ^ Ref ðd;HÞ ^ H �h causeðfwet-winter jRegion:¼fvgg;fsummer-flood jRegion:¼fsouthggÞ

(8) Query the contexts (attribute-value pairs) in a hypothesis

Q8: Tell me the contexts under which wet-winter will cause summer-flood.

fa :¼ fvagg; fb :¼ fvbgg jHypoðHÞ ^ Attða;HÞ ^ Valðva; aÞ ^ Attðb;HÞ ^ Valðvb; bÞ^H �h causeðfwet-winter jfa:¼fvaggg; fsummer-flood jfb:¼fvbgggÞ

6. The construction of DL knowledge subspace

The construction of the DL knowledge subspace and its linkage to the associated justification (docu-

ment) subspace can be done in two ways: (1) human-centered knowledge acquisition. Experienced humans

input hypothesis knowledge manually, based on which justifying articles are collected by performing

searching and browsing on DL systems; (2) machine-centered knowledge acquisition. Under the assistance of

users, DL systems automatically deduce hypothesis knowledge by correlating and analyzing data sources.We discuss these two methods in detail in the following sections.

6.1. Human-centered knowledge acquisition

The central problem in all attempts to leverage a digital library by adding knowledge to mainly symbolic

data (either in the form of raw data or document collections) is to understand the data. Currently, machines

hardly extract proper knowledge from data alone. Therefore, any practical application of the techniques

described in this paper involves human knowledge acquisition efforts.

The sheer amount of data available through current DLs, including the Web, prevents ‘‘charting of the

knowledge space’’, as traditionally done by librarians and other knowledge workers. Their efforts are still

invaluable, as all ‘‘portals’’ on the Internet prove: few true search engines without human knowledge charts

survive to date. However, humans alone on the knowledge provider side cannot properly chart all docu-

ments and data that become, or already are, available to (potential) knowledge consumers.What is needed is a human input from the consumer side. As even detailed logs of search engines prove,

these logs (taken at the tactical level of information seeking) only reveal attempts of users to find what they

want using coarse and/or dumb tools. It is like attempting to reconstruct the history of the great pyramids

in Egypt by looking only at the cuts the tools of the stone carvers made on the sand stone blocks. What is

totally missing from search engine query logs is the actual strategic intention of the user behind the key-

board.

Acquiring this strategic intention is difficult. Few users will volunteer a semi-formal description of their

intentions when there is no single form of short-term reward for their efforts. Users will never take the timeto learn a formal language to express their strategic targets, and any user interface trying to guide them will

likely become either too complex to understand or too coarse to be of use.

In order to overcome at least some of these problems and to be able to run some experiments on strategic

user intention acquisition, we intend to concentrate on a very limited subset of knowledge acquisition

(hypothesis support/denial). Such a limited goal can be pursued with limited tools, making it suitable for a

small group of users with a well-defined type of knowledge need. On top of the limited, but understandable

and simple strategic intention acquisition tool, we intend to implement some kind of immediate reward

system that makes it attractive for users to specify their strategic intentions to the system before embarkingon their traditional, tactical search effort. Currently, we think of a system that extends the user’s quota for

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 115

certain activities, typically inter-library loans or other kinds of priced assets that are required for the work,

but are in short supply due to the costs involved. Users who are active in supplying proper strategic targets

are rewarded by increasing quota for related activities, facilitating their work. Existing systems which use

this ‘‘return on investment’’ policy have showed that such a policy has a self-regulating property, 2 andindeed may lead to a coarse but useful filtering of data items (documents) useful to support a strategic goal.

One of our future work will concentrate on finding a proper test case for such a system, and to gather

data on the success of a ‘‘return on investment’’ policy, the collected strategic knowledge targets in respect

to the retrieved documents, and the re-usability of these targets and the document result sets for other users

with comparable information needs.

6.2. Machine-centered knowledge acquisition

On the other hand, with more and more digital information in a wide variety of disciplines accumulated

in DLs today, the automatic and/or user-assisted semi-automatic extraction of inherent knowledge from

such a large volume of data becomes indispensable. In this section, we outline a framework for machine-

centered knowledge discovery across multiple repositories in DLs. Basically, it proceeds in six steps: (1) setup knowledge discovery targets, (2) identify relevant resources, (3) filter out interesting concepts from identified

resources, (4) correlate concepts according to contextual information, (5) extract knowledge and justifications

from correlated concepts, and (6) evaluate the discovered knowledge and justifications.

Phase-1: Set Up Knowledge Discovery Tasks. Before extracting knowledge and associated justifications,

we need to know what kind of knowledge and justifications are expected from users. Does the user want to

know the inherent associations of some of concepts in certain areas? Or does the user want to obtain some

knowledge on classifying or discriminating objects? Or is the user interested in the sequential evolution of

certain objects in order to make predictions? Here, a friendly user interface is important to the natural andaccurate specification of such knowledge acquisition targets.

Phase-2: Identify Relevant Resources. Confronted with a huge collection of data sources scattered in

heaps of repositories, we must identify repositories which contain the most likely relevant resources in

response to a given knowledge request. Otherwise, the following knowledge discovery process will be

lengthy, aimless, and inefficient. To do this, we must categorize the information content in each repository.

By querying the concepts, which are elicited from the user’s request and background, against these meta-

data catalogs, we may identify relevant data sources. For example, if a user is interested in the correlations

between ‘‘summer-flood’’ and ‘‘meteorological factors’’, the meteorological repository in the observatoryheadquarters and the repository in the river management office will be identified as mostly relevant.

Phase-3: Filter Out Interesting Concepts from Identified Resources. The identified resources from Phase-2

could be in various formats. They can be unstructured articles, multimedia documents, formatted reports,

database records, semi-structured files, or hypertexts, etc. To make knowledge discovery across multiple

heterogeneous repositories possible, we need to transform the original data sources into a uniform format.

A record structure consisting of a set of keywords can be exploited to describe each documental entity.

These keywords explicate the major concepts conveyed by the corresponding documents. For example,

from a textual article which mentions ‘‘high rainfall amounts in November and December in 1996’’, we canfilter out ‘‘wet-winter’’ concept ‘‘in 1996’’. Here, the transformation from heterogeneous resources into

keyword-based records solicits a wide range of techniques, including natural language processing, infor-

mation analysis, categorization and summarization, textual and multimedia data mining, etc.

Phase-4: Correlate Concepts According to Contextual Information. Data records filtered out of diverse

resources must be logically linked together so that their inherent associations can be detected. This could be

2 http://www.slashdot.org

Table 1

A logical connection example between two resource records

linked record ID context Info. weather record (Concept) river behavior record (Concept)

1 1984: wet winter, warm serious flood summer

2 1985: dry winter, warm no flood summer

3 1986: wet winter, cold no flood summer

4 1994: wet winter, cold flood summer

5 1996: except wet winter, cold flood summer

116 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

done based on their common contexts. For example, to find possible relationships between season and riverbehavior as a consequence, we can link a yearly weather record obtained from one resource, with the corre-

sponding river behavior record in the same year, which is obtained from another resource. Here, ‘‘year’’

conveys a kind of contextual information with which the existence of concepts ‘‘weather’’ and ‘‘river behavior’’

have meaning. Table 1 shows a list of logical connection examples between these two kinds of records.

Phase-5: Extract Knowledge and Justifications from Correlated Concepts. After preparing linked data

records, we are now in a position to find their inherent knowledge and justifications. Plenty of efforts have

been made on message understanding and information extraction in the literature (Sundheim, 1993; Bagga,

1998; Lin, 1998). Our current study focuses on the discovery of correlationships of concepts. The associ-ation rule technique developed in the data mining area can be applied to this knowledge extraction phase

(Agrawal, Imielinski, & Swami, 1993; Agrawal & Srikant, 1994).

The problem of mining association rules from transactional data was first introduced in (Agrawal et al.,

1993). The application is sales data of supermarkets. It aims to discover the associations among items

purchased by customers such that the presence of some items in a transaction will imply the presence of

other items in the same transaction. The following is a mathematical model to address the problem of

mining association rules. Let I ¼ fi1; i2; . . . ; img be a set of literals, called items. Let D be a set of transactionrecords, where each transaction record A consists of a set of items such that A I . Let X be a set of items. Atransaction A is said to contain X if and only if X A. An association rule is an implication of the formX ) Y , where X ! I , Y ! I , and X \ Y ¼ /. The rule X ) Y holds in the transaction set D with confidencec if c% of transactions in D that contain X also contain Y . The rule has support s if s% of transactions in Dcontain X [ Y .Applying the concept of association rules in transactional data to our linked records illustrated in

Table 1, we can find the correlationship of concept terms like ‘‘wet-winter ) summer-flood’’. Since amongthe total five records in this small database, three of them contain {wet-winter, summer-flood}, thus sup-

port¼ 3/5¼ 60%. Within the four records stating wet-winter, three of them also state summer-flood.Therefore, we have confidence 3/4¼ 75%. These supporting records, together with the support and confi-

dence measurement values, can serve as the justifications for the extracted knowledge.

Phase-6: Evaluate the Discovered Knowledge and Justifications. Obtaining a set of knowledge and jus-

tifications is not the end. Besides using statistical indicators such as support and confidence to measure the

strength of discovered knowledge, we may also incorporate subjective measurements to evaluate and rank

their significance with respect to users’ cognitions and their real-life problems.

7. Conclusion

Information is a basic human need. DLs contain a set of information resources collected and organized

in a way that accommodates the actual tasks and activities that people engage in when they create, seek,

and use information sources. Motivated by the problems: (i) inadequate strategical level cognition support;(ii) inadequate knowledge sharing facilities––with the present-day DLs, we first introduce a two-layered DL

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 117

functional model to support different levels of human cognitive acts. The tactical level cognition support

aims to provide users with requested relevant documents, as searching and browsing do, while strategic

level cognition support can provide not only documents but also intelligent answers to users’ high-order

cognitive questions. Second, to address users’ high-order cognitive requests, we propose an enlargedinformation space comprised of a knowledge subspace and a document subspace. A formal description of the

knowledge subspace, as well as query facilities against the enhanced DLs for knowledge sharing and

propagation among DL users, are described.

Compared with traditional knowledge-based systems, enhanced DL systems with knowledge elements

have the following distinguished features.

• A cognitive function. A knowledge-based system is usually designed to apply logical inference rules to

make judgements in processing business routines or come up with a conclusion to a certain pre-definedproblem. The mission of a DL system equipped with a knowledge subspace is to make expertise knowl-

edge widely available to the public. From such DLs, users can obtain not only requested documents, but

also intelligent answers to their strategic level questions.

• A broad scope. A knowledge-based system intends to solve problems in a narrow domain, e.g., company

delivery charge, heart disease diagnosis, etc. The rules stored in its knowledge bases are thus limited to a

particular field. On the contrary, the scope of the DL knowledge subspace is broad, covering a wide

spread of scientific and engineering disciplines.

• An extensive content. DLs collect a huge body of documents, which can serve as knowledge justifications.Comparatively, knowledge-based systems contain a limited amount of rules and facts in a particular field

of expertise.

The major contributions of the paper are twofold. First, the presented DL information space extends the

traditional role of DLs from information provider to information and knowledge provider. Second, the

traditional simple keyword-based index schema is expanded to the strategic knowledge-based level, con-

sisting of inter-related hypotheses that are backed by documents. One can say that the essence of the

documents is reconstructed in the DLs’ knowledge level.The work reported here is a first step towards knowledge-based DLs. For future work, a lot of issues

deserve investigation. (1) One immediate work is to build up the knowledge subspace and link it to the

document subspace. (2) To facilitate strategic level cognitive activities, efficient storage and management of

the knowledge and document subspaces is very important and must be carefully planned. This demands

effective indexing strategies for both knowledge and justifying documents. (3) Efficient query evaluation

and knowledge navigation mechanisms must be built to help users make the best of information and

knowledge assets in solving their problems. (4) Our eventual goal is to develop a practical DL system, which

can empower humans with real actionable knowledge in solving their information problems.

Acknowledgements

The authors would like to thank the guest editors Edward Fox and Elisabeth Logan, and the referees for

their insightful comments which have greatly helped to improve the paper.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of

the 1993 ACM SIGMOD international conference on management of data. Washington, DC, USA (pp. 207–216).

118 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th international conference on

very large data bases, Santiago, Chile (pp. 478–499).

Anderson, R. (1992). Information and knowledge based systems: An introduction. New York: Prentice Hall, Inc.

Bagga, A. (1998). Analyzing the complexity of a domain with respect to an information extraction task. In Proceedings of the seven

message understanding conference, Virginia, USA. Available: http://www.itl.nist.gov/iaui/894.02/related.projects/muc/proceed-

ings/muc_7_toc.html#infoextract/.

Bates, M. (1986). Subject access in online catalogs: a design model. Journal of the American Society for Information Science, 37(6),

357–376.

Bates, M. (2002). The cascade of interactions in the digital library interface. Information Processing and Management, 38(3), 381–400.

Beard, K., & Sharma, V. (1997). Multidimensional ranking for data in digital spatial libraries. Journal on Digital Libraries, 1(2),

153–160.

Beaulieu, M., & Jones, S. (1998). Interactive searching and interface issues in the Okapi best match probabilistic retrieval system.

Interacting with Computers, 10, 237–248.

Belkin, N., Borgman, C., Dumais, S., & Beaulieu M. H. (1994). Evaluating interactive retrieval systems. In Proceedings of the 17th

international conference on research and development in information retrieval, Dublin, Ireland (p. 361).

Belkin, N., & Marchetti, P. (1990). Determining the functionality and features of an intelligent interface to an information retrieval

system. In Proceedings of the 13th international conference on research and development in information retrieval, Brussels,

Belgium (pp. 151–177).

Bennett, N., He, Q., Chang, C., & Schatz, B. (1999). Concept extraction in the interspace prototype. Technical Report, UIUCDCS-R-

99-2095, Department of Computer Science, University of Illinois at Urbana-Champaign.

Borgman, C. (1987). Individual differences in the use of information retrieval systems: some issues and some data. In Proceedings of the

10th international conference on research and development in information retrieval, New Orleans, USA (pp. 61–71).

Brachman, R., Ciccarelli, E., Greenfeld, N., & Yonke, M. (1978). KL-ONE reference manual. Technical report, Bolt Beranek and

Newman Inc. BBN Report No. 3848, Cambridge, MA.

Buttenfield, B., & Larsen, S. (1997). Building a digital library for earth system science: the Alexandria project. In Proceedings of annual

meeting of the Geoscience Information Society, USA (pp. 23–26).

Cercone, N. & McCalla, G. (Eds.). (1987). The Knowledge Frontier. New York: Springer-Verlag (S. C. Shapiro, & W. J. Rapaport,

Chapter: SNePS considered as a fully intensional propositional semantic network).

Checkland, P., & Holwell, S. (1998). Information, systems and information systems––making sense of the field. New York: John Wiley &

Sons, Inc.

Chellas, B. (1980). Modal logic: An introduction. Cambridge: Cambridge University Press.

Chen, H. (1992). Knowledge-based document retrieval: framework and design. Journal of Information Science, 18, 293–314.

Chen, H. (1999). Semantic research for digital libraries. D-Lib Magazine, 5(10), Available: http://www.dlib.org/dlib/october99/chen/

10chen.html.

Chen, H., Lynch, K., Basu, K., & Ng, T. (1993). Generating, integrating, and activating thesauri for concept-based document retrieval.

IEEE Expert, 8(2), 25–34.

Chen, H., Smith, T., & Ng, T. (1997). Geoscience self-organizing map and concept space. In Proceedings of 2nd ACM international

conference on digital libraries, Philadelphia, USA (p. 257).

Cooper, B., Crespo, A., & Garcia-Molina, H. (2000). Implementing a reliable digital object archive. In Proceedings of the fourth

European conference on research and advanced technology for digital libraries, Lisbon, Portugal (pp. 128–143).

Crespo, A., & Garcia-Molina, H. (2000). Modeling archival repositories for digital libraries. In Proceedings of the fourth European

conference on research and advanced technology for digital libraries, Lisbon, Portugal (pp. 190–205).

Crestani, F., Lalmas, M., van Rijsgergen, C., & Campbell, I. (1998). ‘Is this document relevant?. . . probably’: a survey of probabilisticmodels in information retrieval. ACM Computing Surveys, 30(4), 528–552.

Dao, T. (1998). An indexing model for structured documents to support queries on content, structure and attributes. In Proceedings of

the IEEE forum on research and technology advances in digital libraries, California, USA (pp. 88–97).

Dumais, S. (1999). Beyond content-based retrieval: modeling domains, users, and interaction. In Proceedings of the IEEE advanced in

digital libraries, Maryland, USA (pp. 144–145).

Dunlop, M., & van Rijsbergen, C. (1993). Hypermedia and free text retrieval. Information Processing and Management, 29(3), 287–298.

Efthimiadis, E. (1993). A user-centered evaluation of ranking algorithms for interactive query expansion. In Proceedings of the 16th

international conference on research and development in information retrieval, Pittsburgh, PA (pp. 146–159).

Evens, M., & Smith, R. (1979). A lexicon for a computer question–answering system. American Journal of Computational Linguistics,

83, 1–93.

Fidel, R. (1994). User-centered indexing. Journal of the American Society for Information Science, 45(8), 572–576.

Findler, N. (Ed.). (1979). Associative networks: The representation and use of knowledge by computers. New York: Academic Press

(S. C. Shapiro, Chapter: The SnePS semantic network processing system).

Fox, E., & Marchionini, G. (1998). Toward a world wide digital library. Communications of the ACM, 41(4), 28–32.

L. Feng et al. / Information Processing and Management 41 (2005) 97–120 119

Fox, E., & Marchionini, G. (2001). Digital libraries: introduction. Communications of the ACM, 44(5), 30–32.

Frakes, W., & Baeza-Yates, R. (1992). Information retrieval: Data structures and algorithms. New Jersey: Prentice-Hall, Inc.

Furnas, G., Landauer, T., Gomex, L. M., & Dumais, S. (1987). The vocabulary problem in human–system communication.

Communications of the ACM, 30(11), 964–971.

Garnham, A. (1988). Artificial intelligence: An introduction. London: Routledge & Kegan Paul, Inc.

Garson, J. (2001). The Stanford encyclopedia of philosophy, winter 2001 ed. Edward n. zalta. Available: http://plato.stanford.edu/

archives/ (Chapter: Modal logic).

Gruber, T. (1993). A translation approach to portable ontologies. Knowledge Acquisition, 5(2), 199–220.

Guarino, N., & Welty, C. (2000). Ontological analysis of taxonomic relationships. In Proceedings of the 19th international conference on

conceptual modeling, Salt Lake City, UT, USA (pp. 210–224).

Hoppenbrouwers, J. (1997). Conceptual modeling and the lexicon. PhD thesis, Tilburg University, The Netherlands.

Hoppenbrouwers, J., & Paijmans, H. (2000). Invading the fortress: how to besiege reinforced information bunkers. In Proceedings of

the IEEE advanced in digital libraries, Washington, DC USA (pp. 27–35).

House, N. V. (1995). User needs assessment and evaluation for the UC Berkeley electronic environmental library project. In

Proceedings of the 2nd international conference on the theory and practice of digital libraries, Texas, USA. Available: http://

csdl.tamu.edu/DL95/.

House, N. V., Butler, M., Ogle, V., & Schiff, L. (1996). User-centered iterative design for digital libraries: the Cypress experience. D-Lib

Magazine, 2(2), Available: http://www.dlib.org/dlib/february96/02vanhouse.html.

Ingwersen, P. (1994). Polyrepresentation of information needs and semantic entities: elements of a cognitive theory for information

retrieval interaction. In Proceedings of the 17th international ACM SIGIR conference on research and development in information

retrieval, Dublin, Ireland (pp. 101–110).

Jones, S., Gatford, M., Robertson, S., Beaulieu, M.-H., Seeker, J., & Walker, S. (1995). Interactive thesaurus navigation: intelligence

rules OK. Journal of the American Society for Information Science, 46(1), 52–59.

Jr, R. D., & Lagoze, C. (1997). Extending the Warwick framework: From metadata containers to active digital objects. D-Lib

Magazine, 3(11), Available: http://www.dlib.org/dlib/november97/daniel/lldaniel.html.

Kahn, R., &Wilensky, R. (1995). A framework for distributed digital object services. Corporation for National Research Initiatives.

Available: http://www.cnri.reston.va.us/home/cstr/arch/k-w.html.

Kantor, P. (1998). Observation and measurement in evaluating digital libraries. Available: http://www.canis.uiuc.edu/seminar.html.

Kiefer, F. (Ed.). (1969). Studies in syntax and semantics. Dordrecht, Holland (Y. D. Apresyan, L. A. Melcuk, & A. K. Zolkovsky,

Chapter: Semantics and lexicography: towards a new type of unilingual dictionary).

Lancaster, F., & Sandore, B. (1997). Technology and management in library and information services. University of Illinois Graduate

School of Library and Information Science.

Lancaster, F., & Smith, L. (Eds.), (1990). Artificial intelligence and expert systems: will they change the library? The Board of Trustees

of the University of Illinois (pp. 2–49). (D. P. Metzler, Chapter: Artificial intelligence: what will they think of next?).

Lenat, D. (1995). CYC: a large-scale investment in knowledge infrastructure. Communications of ACM, 13(11), 32–38.

Lin, D. (1998). Using collocation statistics in information extraction. In Proceedings of the seven message understanding conference,

Virginia, USA. Available: http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceed-ings/muc_7_toc.html#infoextract.

Marchionini, G. (2000). Evaluating digital libraries: a longitudinal and multi-faceted view. Library Trend, 49(2), 304–333.

Markowitz, J., Nutter, J., & Evens, M. (1992). Beyond is-a and part-whole: more semantic network links. Journal of Computers and

Mathematics with Applications, 23(6–9), 377–390.

Marshall, B., Zhang, Y., Chen, H., Lally, A., Shen, R., Fox, E., & Cassel, L. (2003). Convergence of knowledge management and

E-learning: the GetSmart experience. In Proceedings of the third joint ACM/IEEE-CS international conference on digital libraries,

Houston (pp. 135–146).

McKnight, C., Dillon, A., & Richardson, J. (1989). Problems in hyperland? A human factors perspective. Hypermedia, 1(2), 167–178.

Mead, J., & Gay, G. (1995). Concept mapping: an innovative approach to digital library design and evaluation. ACM SIGOIS Bulletin,

16(2), ID-14.

Melnik, S., Garcia-Molina, H., & Paepcke, A. (2000). A mediation infrastructure for digital library services. Technical report, Stanford

University.

Miller, G., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1993). Introduction to wordnet: an on-line lexical database. Available:

http://www.cogsci.princeton.edu/wn/5papers.pdf.

Minsky, M. (Ed.). (1968a). Semantic information processing. Cambridge, MA: MIT Press (R. Quillian, Chapter: Semantic memory).

Minsky, M. (Ed.). (1968b). Semantic information processing. Cambridge, MA: MIT Press (B. Raphael, Chapter: A computer program

for semantic information retrieval).

Nutter, J. (1989). Representing knowledge about words. In Proceedings of the second national conference of AI and cognitive science,

London (pp. 313–328).

Nutter, J., Fox, E., & Evens, M. (1990). Building a lexicon from machine-readable dictionaries for improved information retrieval.

Journal of Literary and Linguistic Computing, 5(2), 129–137.

120 L. Feng et al. / Information Processing and Management 41 (2005) 97–120

Paepcke, A., Brandriff, R., Janee, G., Larson, R., Ludaescher, B., Melnik, S., & Raghavan, S. (2000). Search middleware and the

simple digital library interoperability protocol. D-Lib Magazine, 6(3), 5–8, Available: http://www.dlib.org/dlib/march00/paepcke/

03paepcke.html.

Paepcke, A., Chang, C., Garcia-Molina, H., & Winograd, T. (1998). Interoperability for digital libraries worldwide. Communications of

the ACM, 41(4), 33–43.

Paepcke, A., Cousins, S., Garcia-Molina, H., Hassan, S., Ketchpel, S., Roscheisen, M., & Winograd, T. (1996). Using distributed

objects for digital library interoperability. IEEE Computer, 29(5), 61–68.

Paepcke, A., & Hassan, S. (1996). Combining CORBA and the World-Wide Web in the Stanford digital library project. In Proceedings

of the OMG’s WWW/CORBA workshop.

Papazoglou, M., & Hoppenbrouwers, J. (1999). Contextualizing the information space in federated digital libraries. SIGMOD Record,

28(1), 40–46.

Payette, S., Blanchi, C., Lagoze, C., & Overly, E. (1999). Interoperability for digital objects and repositories: the Cornell/CNRI

experiments. D-Lib Magazine, 5(5), Available: http://www.dlib.org/dlib/may99/payette/05payette.html.

Payette, S., & Lagoze, C. (1999). Flexible and extensible digital object and repository architecture (FEDORA). In Proceedings of the

2nd European conference on research and advanced technology for digital libraries, Crete, Greece (pp. 41–59).

Phelps, T., & Wilensky, R. (1997). Multivalent annotations. In Proceedings of the 1st European conference on research and advanced

technology for digital libraries, Pisa, Italy (pp. 287–303).

Rich, E. (1983). Artificial Intelligence. Singapore: McGraw-Hill, Inc.

Schatz, B. (1996). Information analysis in the net: the interspace of the twenty-first century. In Proceedings of the 7th workshop on

digital libraries. Tsukuba Science City, Japan. Available: http://www.canis.uiuc.edu/archive/talks/plenary.pdf.

Schatz, B. (1998). High-performance distributed digital libraries: building the interspace on the grid. In Proceedings of 7th IEEE

international symposium on high-performance distributed computing (pp. 224–234).

Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, 32(2), 45–50.

Schatz, B., Mischo, W., Cole, T., Bishop, A., Harum, S., Johnson, E., Neumann, L., Chen, H., & Ng, D. (1999). Federated search of

scientific literature. IEEE Computer, 32(2), 51–59.

Sowa, J. (1984). Conceptual structures: Information processing in mind and machine. Boston: Addison-Wesley, Longman, Inc.

Spink, A., & Saracevic, T. (1998). Human–computer interaction in information retrieval: nature and manifestations of feedback.

Interacting with Computers, 10(3), 249–267.

Sundheim, B. (1993). Tipster/MUC-5 information extraction system evaluation. In Proceedings of the fifth message understanding

conference, San Mateo, USA (pp. 27–44).

V�elez, B., Weiss, R., Sheldon, M., & Gifford, D. (1997). Fast and effective query refinement. In Proceedings of the 20th international

ACM SIGIR conference on research and development in information retrieval, Philadelphia, USA (pp. 6–15).

Waterworth, J., & Chignell, M. (1991). A model for information exploration. Hypermedia, 3(1), 35–58.

Willett, P. (1988). Recent trends in hierarchical document clustering: a critical review. Information Processing and Management, 24(5),

577–597.

Wolff, J., Fl€orke, H., & Cremers, A. (2000). Searching and browsing collections of structural information. In Proceedings of the IEEE

advances in digital libraries, Washington, DC, USA (pp. 141–150).