
Vol.17 No.5 J. Comput. Sci. & Technol. Sept. 2002

Progress in the Development of National Knowledge Infrastructure

CAO Cungen, FENG Qiangze, GAO Ying, GU Fang, SI Jinxin, SUI Yuefei, TIAN Wen, WANG Haitao, WANG Lili, ZENG Qingtian, ZHANG Chunxia, ZHENG Yufei and ZHOU Xiaobin

Knowledge Acquisition and Sharing Group, Key Laboratory of Intelligent Information Processing

Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100080, P.R. China

E-mail: [email protected]

Received January 14, 2002; revised May 10, 2002.

Abstract This paper presents recent progress in a long-term research project called the National Knowledge Infrastructure (NKI). Initiated in early 2000, the project aims to develop a multi-domain shareable knowledge base for knowledge-intensive applications. To develop NKI, we have used domain-specific ontologies as a solid basis, and have built more than 600 ontologies. Using these ontologies and our knowledge acquisition methods, we have extracted about 1.1 million domain assertions. For users to access our NKI knowledge, we have developed a uniform multi-modal human-knowledge interface. We have also implemented a knowledge application programming interface for various applications to share the NKI knowledge.

Keywords National knowledge infrastructure, knowledge acquisition, domain-specific ontology, human-knowledge interface, knowledge application programming interface

1 Introduction

The term national knowledge infrastructure (NKI) was first coined by the first author in July 1995. A project was formally initiated in early 2000 for a period of three years. Afterwards, additional funds were obtained from the National Natural Science Foundation and the Ministry of Science and Technology of China.

NKI has two major purposes. First, it aims to develop multi-domain knowledge bases to be shared by various knowledge-intensive tasks, such as natural language understanding, speech understanding, planning and diagnosis, through a standard knowledge application programming interface (KAPI). Second, it serves as a knowledge provider for people of different backgrounds through a society-oriented knowledge service interface. The generic architecture of NKI is depicted in Fig.1. From the figure, it can be seen that knowledge is the center of NKI.

Large-scale knowledge acquisition and sharing have been researched in several well-known projects. Cyc[1–5] is probably the first project to develop a large-scale shareable knowledge base. Within nearly 18 years, roughly 1.7 million commonsense assertions were acquired, and hundreds of microtheories were constructed. BKB (botany knowledge base) is another well-known project for constructing a shareable knowledge base of botany[6,7]. The BKB knowledge base can be used in several applications, e.g., intelligent tutoring systems, knowledge-based reasoning and language processing.

Cyc and BKB are knowledge bases that were largely constructed manually. In recent years, researchers have attempted to extract domain knowledge automatically. A typical example is MindNet[8,9].

Our strategy of acquiring domain knowledge is to subdivide the whole process into three phases.

This work is supported by a grant from the Chinese Academy of Sciences (Grant No.#2000-4010), a grant from the

National Natural Science Foundation of China (Grant No.#20010010-A), and a grant from the Ministry of Science and

Technology (Grant No.#2001CCA03000).

524 CAO Cungen, FENG Qiangze et al. Vol.17

Fig.1. General architecture of NKI.

The first phase is to semi-automatically acquire domain knowledge from text sources (i.e., Internet webpages, technical dictionaries and textbooks) to generate a basic domain knowledge base (BDKB). The main tasks of the phase are to develop domain-specific ontologies and ontology-driven methods for knowledge acquisition and analysis. We have extended GFP[10] both as an ontology representation language and as a knowledge representation language.

Our second phase is to automatically extract domain knowledge from other texts. The core of the second phase is natural language understanding (NLU). Unlike current NLU practice, our NLU method will be heavily based on the BDKB and domain-specific ontologies built in the first phase.

The third phase is to elicit the personal experience of experts in different domains. This phase should be easier than traditional knowledge elicitation, because the main role of domain experts is to refine the existing domain knowledge bases, rather than build the knowledge bases from scratch.

The paper mainly reports our methods for the first phase, and briefly mentions our recent progress in the second phase. The rest of the paper is organized as follows. Section 2 discusses the strategies for designing and sharing NKI ontologies. Section 3 presents our ontology-based knowledge acquisition methods. In Section 4, we focus on the society-oriented knowledge service interface, and present a natural language knowledge user interface to which we add an error-tolerant ViaVoice speech front-end. Section 5 presents the first version of our knowledge application programming interface (KAPI). Section 6 concludes the paper and mentions a few problems for future research.

2 Design and Management of Domain-Specific Ontologies

Generally, an ontology is an explicit specification of a conceptualization[11], and is fundamental to sharing knowledge among different agents and across different applications. However, ontology has been a controversial term in current AI practice, and so far no formal definition exists. In the literature, we can identify two sorts of ontologies: engineering ontologies and formal ontologies[12]. Formal ontologies are abstract and axiomatic conceptualizations of a domain, while engineering ones are mostly informal and sometimes misleading.

In our work, we have elected to use the term domain-specific ontologies (DSO) and attempt to design a theoretic system of such ontologies covering all domains (66 in total). The reason for choosing domain-specific ontologies is two-fold. First, domain-specific ontologies are not far away from concrete domains. This makes them very useful in knowledge acquisition. Second, our experience has repeatedly shown that many ontological constraints or axioms are hard to formulate at a high level of abstraction, but are easy to identify and formalize within domain-specific ontologies. More importantly, these lower-level axioms are more useful in ontology-based knowledge analysis (see below).

Because there may eventually be a tremendous number of categories in various domains, they

must be organized for ontology comprehension, design and analysis. We have considered two general

relationships between categories: inheritance and dependence.

The inheritance relationship between two categories C1 and C2, denoted as inherits(C1, C2), builds categories into a hierarchical structure, and categories at higher levels of abstraction can be shared and reused by those at lower levels.

The dependence relationship, denoted as depends(C1, C2), reflects the fact that the slots of C2 can be shared by C1, but C2 does not represent any entities in the universe of discourse. In other words, C2 represents a chunking of slots that are shareable among a number of other categories, but it does not have any meaning without C1.

defcategory <category-name> <relevant-categories>
{
    <slot-def> {<slot-def>}
}

<category-name> ::= <string without white spaces>

<relevant-categories> ::= inherits <categories>
    | depends <categories>
    | inherits <categories>; depends <categories>

<categories> ::= <category-name> {, and <category-name>}

<slot-def> ::= <slot-type>: <slot-name>
    :type <slot-type>
    [:synonym <slot-synonyms>]
    [:parasynonym <slot-parasynonyms>]
    [:antonym <slot-antonyms>]
    [:unit <slot-units>]
    [:domain <slot-value-domain>]
    [:default <default-slot-value>]
    [:facet <slot-facets>]
    [:reverse <reverse-slots>]
    [:property <slot-properties>]
    [:subslot <sub-slots>]
    [:relslot <properties>]

Fig.2. NKI category language.

Fig.2 illustrates a frame-based language for defining a domain-specific category. The language consists of three parts:

• Category header. The category header begins with the keyword defcategory, which is followed by the name of the category*. A category may inherit content from its super categories, which is represented by the relation inherits. It may also extend what we call slot categories (also called attribute categories), which is represented by the relation depends. A slot category is one that does not correspond to any entity in the domain of interest; it is only a set of slots that are shareable among different categories. Because there may eventually be a tremendous number of categories in the domains, inherits and depends play a critical role in organizing the categories into a hierarchy of abstraction, where categories at higher levels of abstraction can be shared across different domains and reused in defining lower-level categories.

• Slot definitions. A slot definition consists of an (unordered) list of slots. A slot may have a number of facets to constrain its interpretation.

• Category axioms. Category axioms are first-order well-formed formulae (WFF) for constraining the interpretation of categories and slots.

* It should be stressed that NKI does not assume the uniqueness of category naming.

With the definition above, we have constructed domain-specific categories covering 17 disciplines such as mathematics, biology, ethnology, modern medicine, traditional Chinese medicine, sports and geography. More categories are under construction and verification.

The top-level category of entities is called Thing, and the top-level slot category is called Name (see Fig.3). The category Thing summarizes common attributes and relations in a single thing, and is inherited by all other categories, whereas the category Name defines a list of common slots for describing naming information for an entity.

defcategory Name
{
    attribute: full-name
        :type String
    attribute: abbreviated-name
        :type StringArray
    attribute: English-name
        :type StringArray
    attribute: common-name
        :type StringArray
}

defcategory Thing depends Name
{
    attribute: informal-definition
        :type String
    attribute: formal-definition
        :type WFFcategory
    attribute: is-instance-of
        :type Category
    attribute: has-part
        :type Category
}

Fig.3. The categories Name and Thing.*

Fig.4 depicts a category for geographic location called GeoLocation. Currently, the category contains 39 slots for specifying geographic information.

GeoLocation is a general category in the sense that it is widely applicable to a number of domains. For example, GeoRegion depends on GeoLocation, as shown in Fig.5.

defcategory GeoLocation
{
    relation: in-east-of
        :type PhysicalEntity
    relation: in-south-of
        :type PhysicalEntity
    relation: in-west-of
        :type PhysicalEntity
    relation: in-north-of
        :type PhysicalEntity
    relation: in-middle-of
        :type PhysicalEntity
}

Fig.4. NKI location ontology.

defcategory GeoRegion inherits PhysicalThing; depends GeoLocation
{
    attribute: population
        :type Integer
        :facet time
    attribute: land-area
        :type Real
        :unit km²
    attribute: average-summer-whether
        :type Real
        :unit °C
}

Fig.5. The category GeoRegion.

The GeoRegion is in turn inherited by the categories of City and Country, as shown in Figs.6

and 7.
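The way inherits and depends together determine which slots are available in a category can be illustrated with a minimal Python sketch. It is not the NKI implementation: the `Category` class and `all_slots` function are hypothetical names, only one slot per category is shown, PhysicalThing is assumed to be empty and to inherit Thing, and letting local definitions override shared ones is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    name: str
    slots: dict                                   # slot name -> declared type
    inherits: list = field(default_factory=list)  # super categories
    depends: list = field(default_factory=list)   # slot categories

def all_slots(cat):
    """Collect the slots visible in `cat`: shared ones first, then its own."""
    merged = {}
    for parent in cat.inherits + cat.depends:
        merged.update(all_slots(parent))
    merged.update(cat.slots)  # local definitions win (an assumption)
    return merged

# Abridged versions of the categories in Figs.3 to 6.
name = Category("Name", {"full-name": "String"})
thing = Category("Thing", {"has-part": "Category"}, depends=[name])
physical_thing = Category("PhysicalThing", {}, inherits=[thing])  # assumed content
geo_location = Category("GeoLocation", {"in-east-of": "PhysicalEntity"})
geo_region = Category("GeoRegion", {"population": "Integer"},
                      inherits=[physical_thing], depends=[geo_location])
city = Category("City", {"city-flower": "PlantEntity"}, inherits=[geo_region])

print(sorted(all_slots(city)))
# ['city-flower', 'full-name', 'has-part', 'in-east-of', 'population']
```

In this sketch the only difference between the two relations is bookkeeping: both contribute slots, but a depended-on category such as GeoLocation is never expected to have instances of its own.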

For slots in a category, we have to specify one or more axioms to constrain their interpretation.

These constraints are actually integral components of our categories. For our geographical ontology

consisting of categories in Figs.4 to 7, we have the following axioms.

Axiom 1. Given two geographical entities X and Y, in-east-of(X, Y) → in-west-of(Y, X).

* Note that unless otherwise stated, all the categories given in this paper are partial. Interested readers may contact the authors to get a full copy of the domain-specific categories.

defcategory City inherits GeoRegion
{
    relation: geographical-part-of
        :type GeographicalEntity
    attribute: city-flower
        :type PlantEntity
}

Fig.6. The category City.

defcategory Country inherits GeoRegion
{
    attribute: capital-city
        :type GeographicalEntity
    attribute: first-regional-division
        :type GeographicalEntity
    attribute: national-flower
        :type GeographicalEntity
    attribute: number-of-nationalities
        :type Integer
}

Fig.7. The category Country.

Axiom 1 is often too strong, and may not represent the real relative location of two geographical entities. We may need a weaker but reasonable axiom like Axiom 1′ below.

Axiom 1′. Given two geographical entities X and Y, in-east-of(X, Y) → in-west-of(Y, X) ∨ in-northwest-of(Y, X) ∨ in-southwest-of(Y, X).

Axioms 1 and 1′ apply to a single category. Some axioms may involve several slots in different categories. The relation geographical-part-of in City is such an example, as shown below.

Axiom 2. Given a city X and a country Y, geographical-part-of(X, Y) → population(X) ≤ population(Y).

Axiom 2 reveals an interesting property relating the relation geographical-part-of and the attribute population: the ordering of populations is preserved under the geographical parts of countries. In other words, if a place is part of a larger place, the population of the former is no greater than that of the latter. A similar case is the attribute number-of-nationalities.

Axiom 3. Given a city X and a country Y, geographical-part-of(X, Y) → number-of-nationalities(X) ≤ number-of-nationalities(Y).

Fig.8. An interface for the category Country.


A related axiom concerns the attribute capital-city: the capital city of a country is a geographical part of the country.

Axiom 4. Given a city X and a country Y, capital-city(X, Y) → geographical-part-of(X, Y).

From Axioms 2, 3 and 4, we have the following result:

Proposition 1. Given a city X and a country Y, capital-city(X, Y) → population(X) ≤ population(Y) ∧ number-of-nationalities(X) ≤ number-of-nationalities(Y).
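To make the use of such axioms in knowledge analysis concrete, the following Python sketch applies Axiom 4 to derive geographical-part-of facts and then tests Axioms 2 and 3 against stored attribute values. The function names and the entities CityA and CountryB, with their figures, are hypothetical placeholders rather than NKI data, and the paper does not state that its analysis is implemented this way.

```python
def derive_parts(capital_city, geographical_part_of):
    """Axiom 4: capital-city(X, Y) -> geographical-part-of(X, Y)."""
    return set(geographical_part_of) | set(capital_city)

def violations(parts, population, nationalities):
    """Return (axiom, X, Y) for every pair that breaks Axiom 2 or Axiom 3."""
    found = []
    for x, y in sorted(parts):
        if population[x] > population[y]:
            found.append(("Axiom 2", x, y))
        if nationalities[x] > nationalities[y]:
            found.append(("Axiom 3", x, y))
    return found

# Placeholder facts for a hypothetical city and country.
population = {"CityA": 8, "CountryB": 60}
nationalities = {"CityA": 5, "CountryB": 12}
parts = derive_parts({("CityA", "CountryB")}, set())

print(violations(parts, population, nationalities))
# []
print(violations(parts, {"CityA": 80, "CountryB": 60}, nationalities))
# [('Axiom 2', 'CityA', 'CountryB')]
```

The second call shows how an acquired population value that contradicts Proposition 1 would be flagged for a knowledge engineer to review.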

To share domain-specific ontologies among different designers, we have built a management system called OKEE (Ontology and Knowledge Engineering Environment). It provides a set of basic operations on categories and their slots, including

1) Creating categories and their slots.

2) Retrieving categories and their slots.

3) Removing categories and their slots.

4) Merging two categories.

5) Finding common slots in different categories.

6) Identifying possible syntactic and semantic errors in categories.

Fig.8 shows the interface for the category Country, selected from the list of all geographical categories on the left side. We can add, remove and change any part of the category on the interface. A full introduction to OKEE is given in [13].

3 Knowledge Acquisition from Text: Ontology-Driven Methods

In recent years, knowledge acquisition from text has received much attention[8,9,12,14–31]. A key reason is that the majority of the knowledge of a domain is presented in domain texts and documents[27].

In the literature on knowledge acquisition from text (KAT), two kinds of methods can more or less be identified. The first kind relies on general-purpose algorithms to automatically extract concepts and their relations from text. However, the extreme complexity of natural language makes it difficult for a fully automated KAT system to arrive soon. The second kind of KAT methods is semi-automated in the sense that knowledge engineers mark a source text with semantic tags or keywords[12,14,27,32], and the rest of the KAT task is performed by computer.

We have developed three systems of knowledge acquisition from text (e.g., [12, 27, 32]). The first system mainly consists of a high-level knowledge language (HLKL) for knowledge engineers to formalize text, and an HLKL compiler for compiling the HLKL text into knowledge frames. Because the language is very natural in syntax and semantics, it is quite easy to use. The object language is an IO-model representation, and the compilation of HLKL texts is presented elsewhere in [12, 27, 32].

The second system, called OMKE, is an ontology-mediated knowledge extractor. The input to this system is semi-structured text. By semi-structured text, we mean that the syntax of the text is relatively fixed and thus can be easily summarized manually.

OMKE works as follows. For each slot in a category and for each input text (usually very large), we design one or more knowledge extraction agents to represent the slot. Each agent has a pre-defined working context (a separate syntax). When an agent's working context appears in some segment of the input text, the agent is invoked to perform a list of pre-defined semantic operations. The most common operation is to fill in, based on the instantiated working context in the text, the slot of a frame which an agent represents. After the input text is processed, knowledge frames are collectively extracted by the agents. A full introduction to OMKE can be found elsewhere in [33].
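The agent mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: working contexts are modelled as regular expressions, the only semantic operation is slot filling, and the class `ExtractionAgent`, the two patterns and the sample sentence about Freedonia are all hypothetical. OMKE's actual context syntax is described in [33].

```python
import re
from dataclasses import dataclass

@dataclass
class ExtractionAgent:
    slot: str            # the slot this agent represents
    context: re.Pattern  # working context with named groups `frame` and `value`

    def run(self, text, frames):
        """Fill the agent's slot wherever its working context occurs in the text."""
        for m in self.context.finditer(text):
            slots = frames.setdefault(m.group("frame"), {})
            slots.setdefault(self.slot, []).append(m.group("value"))

# Hypothetical agents for two slots of the category Country.
AGENTS = [
    ExtractionAgent("capital-city", re.compile(
        r"The capital of (?P<frame>[A-Z]\w+) is (?P<value>[A-Z]\w+)\.")),
    ExtractionAgent("population", re.compile(
        r"(?P<frame>[A-Z]\w+) has a population of (?P<value>[\d,]+)\.")),
]

def extract(text, agents=AGENTS):
    """Run every agent over the text and return the collected frames."""
    frames = {}
    for agent in agents:
        agent.run(text, frames)
    return frames

sample = "The capital of Freedonia is Fredville. Freedonia has a population of 1,000."
print(extract(sample))
# {'Freedonia': {'capital-city': ['Fredville'], 'population': ['1,000']}}
```

Because each agent is tied to one slot, adding coverage for a new slot or a new source text amounts to writing another agent rather than changing the extractor.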

The third KAT system is a frame language for knowledge engineers to formalize text, together with a frame compiler. After the text is formalized, the frame compiler compiles the frames into IO-models based on relevant categories. Although this method is not as natural as the first one, most of our project knowledge engineers choose the frame language (NKI-FL) in formalizing domain knowledge. The reason is two-fold. First, frames are straightforward and easy to understand. Second, frames are grammatically close to the categories, which are also described in a frame fashion. In fact, most knowledge engineers design categories and formalize domain texts alternately.


After a text is formalized, our frame compiler parses and analyzes the formal description. In the

following, we only discuss the NKI-FL compilation.

The compilation of NKI-FL texts is ontology-driven. Given a knowledge frame, the compiler parses each slot in the frame with reference to the frame's corresponding ontology, checking the value domains and default values (if any) of the slots.
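A minimal Python sketch of this kind of ontology-driven checking is given below. It is an assumption-laden illustration, not the NKI-FL compiler: the dictionary `COUNTRY` is a simplified stand-in for the category Country, the value domain and the `official-language-count` slot with its default are hypothetical, and `check_frame` is an invented name.

```python
# Simplified stand-in for a category: slot name -> type, optional domain and default.
COUNTRY = {
    "capital-city": {"type": str},
    "number-of-nationalities": {"type": int, "domain": range(1, 1000)},  # assumed domain
    "official-language-count": {"type": int, "default": 1},             # hypothetical slot
}

def check_frame(frame, category):
    """Check each slot of a frame against its category; return (checked, errors)."""
    checked, errors = {}, []
    for slot, value in frame.items():
        spec = category.get(slot)
        if spec is None:
            errors.append(f"unknown slot: {slot}")
        elif not isinstance(value, spec["type"]):
            errors.append(f"{slot}: expected {spec['type'].__name__}")
        elif "domain" in spec and value not in spec["domain"]:
            errors.append(f"{slot}: value {value!r} outside domain")
        else:
            checked[slot] = value
    # Slots the frame leaves out fall back to their default values, if any.
    for slot, spec in category.items():
        if slot not in frame and "default" in spec:
            checked[slot] = spec["default"]
    return checked, errors

frame = {"capital-city": "Beijing", "number-of-nationalities": 56, "to-west-of": "Japan"}
checked, errors = check_frame(frame, COUNTRY)
print(checked)
# {'capital-city': 'Beijing', 'number-of-nationalities': 56, 'official-language-count': 1}
print(errors)
# ['unknown slot: to-west-of']
```

In this sketch `to-west-of` is rejected only because the stand-in category omits the slots that Country would obtain through GeoRegion and GeoLocation.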

When a frame is parsed, IO-models are generated. IO-models are representations of interconnections between concepts. They are special semantic networks for efficient retrieval and inference[12]. After IO-models are generated, they must be connected with other relevant IO-models to form a bigger IO-model. This step is automatically performed via common concepts. It is in this step that our knowledge acquisition methods are superior to purely manual methods.

As an illustration of the knowledge compilation and automatic model connection processes, Figs.9

and 10 show two separate knowledge frames in NKI-FL.

defframe People's Republic of China: Country
{
    Chinese-name: 中华人民共和国
    first-regional-division: Beijing, Tianjin, Shanghai, Liaoning, etc.
}

Fig.9. A frame for People's Republic of China.

defframe People's Republic of China: Country
{
    capital-city: Beijing
    to-west-of: Japan
}

Fig.10. Another frame for People's Republic of China.

First, the IO-model for Fig.9 is generated, as depicted in the upper part of Fig.11. Then the IO-model for Fig.10 is generated, as shown in the lower part of Fig.11. These two IO-models have a common concept, i.e., Beijing, and thus should be connected. Redundant occurrences of common concepts are then removed by adding what we call io-sockets and io-plugs (see the smaller boxes in Fig.11).

Fig.11. Knowledge compilation and IO-model connection.

It is worth mentioning that it is this sort of IO-model that makes our knowledge server efficient in retrieving concepts and the relationships between various concepts. This is because all knowledge related to a concept is bound together via io-sockets and io-plugs, and thus global search for a piece of knowledge of some concept is reduced to a local search around that concept.
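The effect of connecting models through common concepts can be sketched in Python as follows. The sketch replaces io-sockets and io-plugs, whose internal structure is not described here, with a plain adjacency index keyed by concept; `compile_frame`, `connect` and the `inverse:` link labels are hypothetical.

```python
from collections import defaultdict

def compile_frame(name, slots):
    """Turn one frame into a small model: a list of (concept, slot, concept) links."""
    return [(name, slot, value) for slot, values in slots.items() for value in values]

def connect(models):
    """Merge models on their common concepts into one index: concept -> links."""
    index = defaultdict(set)
    for model in models:
        for subject, slot, obj in model:
            index[subject].add((slot, obj))
            index[obj].add((f"inverse:{slot}", subject))  # make the link two-way
    return index

# Abridged versions of the frames in Figs.9 and 10.
prc = "People's Republic of China"
m1 = compile_frame(prc, {"first-regional-division": ["Beijing", "Tianjin", "Shanghai"]})
m2 = compile_frame(prc, {"capital-city": ["Beijing"], "to-west-of": ["Japan"]})
kb = connect([m1, m2])

# Everything known about Beijing is found with one local lookup.
print(sorted(kb["Beijing"]))
# [('inverse:capital-city', "People's Republic of China"), ('inverse:first-regional-division', "People's Republic of China")]
```

Beijing occurs in both source models but appears once in the merged index, which is the property that turns a global search into a local one.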

4 A Uniform Human-Knowledge Interface (HKI)

NKI also aims to provide knowledge to a wide range of human users. For that purpose, we have implemented a uniform multi-modal human-knowledge interface (HKI) as a part of the society-oriented knowledge services. This section gives a brief overview of the techniques we have developed in the HKI.

4.1 A Parametric Knowledge Query Language

Technically, several options can be considered to develop HKI, e.g., natural language understanding, speech technology and multi-media technology. However, initial effort showed that the barrier between knowledge and people is not entirely a technical one.

The first problem is that a user can ask for a piece of knowledge in different ways. For example, to ask for the height of Huangshan (a scenic mountain in Anhui Province of China), a user may query as follows:

• How high is Huangshan?

• Please tell me the height of Huangshan.

• What is the height of Huangshan?

The second problem is whether to provide a single interface for users to query all the knowledge from different domains, or to provide an interface for each domain. We have chosen to develop a single uniform interface for all users. To achieve this goal, we have designed a hierarchical system of query templates (QPs), based on object-oriented analysis (OOA). QPs at higher levels of abstraction are highly general and can be inherited by lower-level QPs. Fig.12 illustrates template inheritance and instantiation.

Fig.12. Illustrating query template inheritance and instantiation.

In the figure, Q1 is an abstract template for making queries about "part-of" relations, in which <?be>, <?parts> and <C> are three template variables. Variables without question marks are to be instantiated to concepts, attributes, relationships, etc. in a concrete user query. Variables with a question mark are domain-customizable variables, and an interface designer can customize these variables when designing domain-specific queries. For example, to query about "body parts" of some living entity, users may use Q11, but to query about "geographical parts" of some geographical entity, Q12 can be used.
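Template inheritance, customization and instantiation can be sketched in Python as below. The wording of Q1, Q11 and Q12 is not reproduced from Fig.12; the English pattern, the `QueryTemplate` class and the use of regular expressions for matching are all assumptions made for illustration.

```python
import re

PLACEHOLDER = re.compile(r"<(\??)(\w+)>")

class QueryTemplate:
    def __init__(self, pattern=None, parent=None, **custom):
        # A lower-level template inherits its parent's pattern and customizations.
        self.pattern = pattern if pattern is not None else parent.pattern
        inherited = parent.custom if parent is not None else {}
        self.custom = {**inherited, **custom}

    def to_regex(self):
        parts, pos = [], 0
        for m in PLACEHOLDER.finditer(self.pattern):
            parts.append(re.escape(self.pattern[pos:m.start()]))
            customizable, name = m.group(1) == "?", m.group(2)
            if customizable and name in self.custom:
                parts.append(re.escape(self.custom[name]))  # fixed by the designer
            elif customizable:
                parts.append(r".+?")                        # still abstract
            else:
                parts.append(f"(?P<{name}>.+?)")            # bound by the user query
            pos = m.end()
        parts.append(re.escape(self.pattern[pos:]))
        return re.compile("".join(parts), re.IGNORECASE)

    def instantiate(self, query):
        """Return the bindings of the plain variables, or None if the query does not fit."""
        m = self.to_regex().fullmatch(query.strip())
        return m.groupdict() if m else None

Q1 = QueryTemplate("What <?be> the <?parts> of <C>?")
Q11 = QueryTemplate(parent=Q1, be="are", parts="body parts")
Q12 = QueryTemplate(parent=Q1, be="are", parts="geographical parts")

print(Q12.instantiate("What are the geographical parts of China?"))
# {'C': 'China'}
print(Q11.instantiate("What are the geographical parts of China?"))
# None
```

A domain designer only supplies the customized variables, so one abstract template can serve many domains while the matching logic stays in a single place.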

To end this subsection, we stress that, in using the natural language interface, we find that users often make misspellings, causing the server to respond with "No answer to your query or please rephrase your question!" Some kind of automated misspelling correction is needed, and Subsection 4.3 will address this problem.

4.2 Speech Input

We have developed a speech input interface for users to query our knowledge server easily. The system is based on IBM ViaVoice 2000. Before a user uses the speech interface, some speech training is necessary to improve the speech recognition of ViaVoice. Our demonstrations have shown that no matter how the training is performed, various errors still occur in user queries.

4.3 Knowledge-Based Error Checking for Knowledge Queries

Generally, error checking is intractable (if not impossible). But in our human-knowledge interface,

error checking is tractable because it is based on the huge multi-domain knowledge base.

We define three similarity measures between two Chinese characters:

1) $\mathrm{SSIM}(C_1, C_2)$: the stroke similarity of $C_1$ and $C_2$. It is the maximal length of common strokes between $C_1$ and $C_2$.

2) $\mathrm{MSIM}(C_1, C_2)$: the matrix similarity of $C_1$ and $C_2$. It is the number of overlapping bits in the $24 \times 24$ matrix divided by $24^2$.

3) $\mathrm{SIM}(C_1, C_2)$: the overall similarity of $C_1$ and $C_2$. It is defined as $\alpha\,\mathrm{SSIM}(C_1, C_2) + \beta\,\mathrm{MSIM}(C_1, C_2)$, where $\alpha$ and $\beta$ are two parameters to be determined by experiments*.

Accordingly, we define the stroke similarity between two words $W_1 = C_1 C_2 \ldots C_n$ and $W_2 = D_1 D_2 \ldots D_n$ as $\mathrm{SSIM}(W_1, W_2) = \sum_{i=1}^{n} \mathrm{SSIM}(C_i, D_i)$, and the matrix similarity as $\mathrm{MSIM}(W_1, W_2) = \sum_{i=1}^{n} \mathrm{MSIM}(C_i, D_i)$. The overall similarity between $W_1$ and $W_2$ is $\mathrm{SIM}(W_1, W_2) = \alpha'\,\mathrm{SSIM}(W_1, W_2) + \beta'\,\mathrm{MSIM}(W_1, W_2)$, where $\alpha'$ and $\beta'$ are also two (different) parameters to be determined by experiments.

Now, our error checking method is as follows. Given a query $Q$ with a possibly misspelled word $W_1 = C_1 C_2 \ldots C_n$ with $n$ characters, if $W_1$ is found in our knowledge base, then we conclude that $W_1$ is correctly spelled. Otherwise, we search the knowledge base for a word $W_2$ such that $\mathrm{SIM}(W_1, W_2) > \delta$, where $\delta$ is a parameter to be determined by experiments. If $W_2$ is found, we replace $W_1$ with $W_2$.
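The checking procedure can be sketched in Python under several assumptions. "Maximal length of common strokes" is read here as the longest common contiguous run of stroke codes, which is only one possible reading. Characters are stood in for by the letters A, B and C with made-up stroke codes and bitmaps. The word-level weights default to 0.5, the knowledge base is a set of words, and the best-scoring candidate above the threshold is returned. All function names are hypothetical.

```python
def ssim(strokes1, strokes2):
    """Stroke similarity: length of the longest common run of stroke codes (assumed reading)."""
    best = 0
    prev = [0] * (len(strokes2) + 1)
    for a in strokes1:
        cur = [0] * (len(strokes2) + 1)
        for j, b in enumerate(strokes2, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def msim(bits1, bits2, size=24):
    """Matrix similarity: overlapping set bits divided by size squared."""
    return len(bits1 & bits2) / size ** 2

def word_sim(w1, w2, features, a=0.5, b=0.5):
    """Overall word similarity: weighted sums of per-character SSIM and MSIM."""
    s = sum(ssim(features[c][0], features[d][0]) for c, d in zip(w1, w2))
    m = sum(msim(features[c][1], features[d][1]) for c, d in zip(w1, w2))
    return a * s + b * m

def correct(word, kb_words, features, delta):
    """Return `word` if known, else the most similar known word above delta, else None."""
    if word in kb_words:
        return word
    same_length = [w for w in kb_words if len(w) == len(word)]
    best = max(same_length, key=lambda w: word_sim(word, w, features), default=None)
    if best is not None and word_sim(word, best, features) > delta:
        return best
    return None

# Toy features: character -> (stroke codes, set of lit (row, column) bits).
FEATURES = {
    "A": ("hvh", {(0, 0), (0, 1)}),
    "B": ("hvp", {(1, 1), (1, 2), (2, 2)}),
    "C": ("hvn", {(1, 1), (1, 2), (3, 3)}),
}

print(correct("AB", {"AC"}, FEATURES, delta=2.0))
# AC
```

Since SSIM is a length while MSIM lies between 0 and 1, the two terms are on different scales in this reading, so the threshold has to be tuned together with the weights.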

To check for pronunciation errors, we define a speech similarity between two Chinese characters based on their pinyins. We divide all Chinese characters into classes of identical pronunciation. These classes are actually a partition of the characters. Then, the classes are further grouped into similarity classes based on how similar they sound. For example, one character may be in the class YIN and another in the class YING, but in reality these two characters sound very similar. Thus, we put the classes YIN and YING into a similarity class called YIN-YING.

With this processing, the algorithm for checking pronunciation errors is straightforward: To check for possible errors, we choose different combinations of characters in the generated classes of identical or similar pronunciation so that the combinations form meaningful concepts in our knowledge base.
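A small Python sketch of this combination search follows. The pinyin table, the similarity classes and the knowledge base are toy data chosen for illustration, tones are ignored (an assumption), and `sound_class` and `candidates` are hypothetical names.

```python
from itertools import product

# Toy data: toneless pinyin for a few characters, and one similarity class.
PINYIN = {"音": "yin", "因": "yin", "英": "ying", "国": "guo", "果": "guo"}
SIMILAR = [{"yin", "ying"}]   # e.g. the similarity class YIN-YING
KB = {"英国", "因果"}          # words that name concepts in the knowledge base

def sound_class(char):
    """All characters pronounced identically or similarly to `char`."""
    p = PINYIN[char]
    group = next((g for g in SIMILAR if p in g), {p})
    return sorted(c for c, q in PINYIN.items() if q in group)

def candidates(word, kb=KB):
    """Combine same-sounding characters and keep the combinations found in the KB."""
    combos = ("".join(chars) for chars in product(*(sound_class(c) for c in word)))
    return [w for w in combos if w in kb]

# "音国" is not a known concept; two similar-sounding known words are proposed.
print(sorted(candidates("音国")))
```

The number of combinations grows multiplicatively with word length, so a practical version would presumably prune candidates against the knowledge base character by character rather than enumerating every combination.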

5 Knowledge Application Programming Interface

For the NKI knowledge to be accessed by knowledge-intensive applications and our built-in knowledge services, we need a knowledge application programming interface (KAPI). To build such a KAPI, two key components, a knowledge model and a set of model operations, must be created.

We have implemented KAPI in Java and C. Its architecture is depicted in Fig.13. It consists of the two components: a knowledge model and a set of operations on the model. The knowledge model is a frame formalism, as shown in Figs.9 and 10 in Section 3.

Fig.13. KAPI architecture.

On the client side, operations are mainly classified into three categories. The first category contains operations on knowledge bases. Examples are

1) connect-to(kb, user, password). Establish a connection to the knowledge base kb for a user with a password.

2) open(kb). Open the knowledge base kb for reading and writing.

3) remove(kb). Remove the knowledge base kb.

4) close(kb). Close the opened knowledge base kb.

5) disconnect-from(kb, user, password). Disconnect an established connection to kb.

* In our experiments, we simply choose $\alpha = \beta = 0.5$.

The second category of operations is on knowledge frames. Examples are

1) open(frame)

2) get-frame(frame)

3) remove(frame)

4) remove-attribute(attribute, frame)

5) remove-attribute(facet, attribute, frame)

6) change-attribute(old-attribute, new-attribute, frame)

7) replace-attribute-value(attribute, old-value, new-value, frame)

8) close(frame)

The third category of operations consists of basic operations, and provides a basis to implement the first two categories of operations. Examples are

1) get-attribute-value(frame, attribute)

2) add-attribute-value(frame, attribute)

3) find-frame-by-name(frame)

The server-side operations are merely corresponding operations for executing the client-side operations with equivalent semantics: The client-side operations are transmitted to Apache via the HTTP protocol, and an Apache module on the server side invokes corresponding operations to perform the requests on the knowledge bases (KBs).
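The layering of frame operations on top of the basic operations can be illustrated with an in-memory Python sketch. It is not KAPI, which is implemented in Java and C and works through a server: the class `InMemoryKB` is hypothetical, the operation names are adapted from the lists above, and the extra `value` argument of `add_attribute_value` is an assumption, since the listed signature does not show how the value is passed.

```python
class InMemoryKB:
    """A toy frame store: frame name -> {attribute: [values]}."""

    def __init__(self):
        self.frames = {}

    # Basic operations (third category).
    def find_frame_by_name(self, frame):
        return self.frames.get(frame)

    def get_attribute_value(self, frame, attribute):
        return list(self.frames[frame].get(attribute, []))

    def add_attribute_value(self, frame, attribute, value):  # `value` is an assumed argument
        self.frames.setdefault(frame, {}).setdefault(attribute, []).append(value)

    # Frame operations (second category), written in terms of the basic ones.
    def remove_attribute(self, attribute, frame):
        self.frames[frame].pop(attribute, None)

    def replace_attribute_value(self, attribute, old_value, new_value, frame):
        values = self.get_attribute_value(frame, attribute)
        self.remove_attribute(attribute, frame)
        for v in values:
            self.add_attribute_value(frame, attribute, new_value if v == old_value else v)

kb = InMemoryKB()
prc = "People's Republic of China"
kb.add_attribute_value(prc, "capital-city", "Beijing")
kb.add_attribute_value(prc, "first-regional-division", "Beijing")
kb.add_attribute_value(prc, "first-regional-division", "Tianjin")
kb.replace_attribute_value("first-regional-division", "Tianjin", "Shanghai", prc)

print(kb.get_attribute_value(prc, "first-regional-division"))
# ['Beijing', 'Shanghai']
```

In a client-server setting each of these calls would be serialized into a request and executed by the corresponding server-side operation, so keeping the basic operation set small limits what has to be mirrored on the server.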

6 Conclusion and Future Research

We have built more than 600 domain-specific ontologies covering 17 different domains. These ontologies are much more useful than formal abstract ontologies. In fact, based on the ontologies, about 1.1 million domain assertions have been formalized from text. The knowledge services that we developed are also based on those ontologies.

We believe that with the current methodology, more basic domain knowledge can be acquired fairly quickly. However, NKI is quite young, and we have quite a number of problems to solve before it becomes mature.

First of all, we need to further develop and improve our system of domain-specific ontologies, and more ontologies should be added to the system. Further, there are many other sorts of domain knowledge that we have not yet acquired, and these will be a focus of future work.

Second, we are developing automated knowledge acquisition methods and tools to speed up the

knowledge acquisition process. The current approach is based on text pattern matching for acquiring

semi-structured knowledge from handbooks and dictionaries, etc. We are focusing more on extracting

knowledge from unstructured text.

Third, we have been developing a set of inference tools. While the inheritance-based inference

engine has been fully implemented, several other common engines have been considered, including

similarity-based reasoning, diagnostic inference, temporal reasoning and default reasoning.

Finally, various knowledge applications are possible based on our huge domain knowledge base.

Currently, we have developed an NKI-based OCR system and a prototype of intelligent tutoring

system. Many other applications, such as medical diagnosis, geographic information systems and

natural language understanding systems are on our research agenda.

References

[1] Guha R V, Lenat D B. Cyc: A midterm report. AI Magazine, 1990, 11(3): 32–59.

[2] Guha R V. Contexts: A formalization and some applications. Tech. ACT-CYC-423-91, MCC, Austin, Texas, 1991.

[3] Lenat D B. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 1995, 38(11): 33–38.

[4] Lenat D B, Guha R V. Building Large Knowledge-Based Systems. Addison-Wesley, MA, 1990.

[5] Lenat D B, Miller G A, Yokoi T. CYC, WordNet and EDR – critiques and responses – discussion. Communications of the ACM, 1995, 38(11): 45–48.

[6] Clark P, Porter B. Building domain representations from components. AI96-241, University of Texas at Austin, 1996.

[7] Clark P, Porter B. Building concept representation from reusable components. In Proceedings of 1997 AAAI, AAAI Press, 1997, pp.369–376.

[8] Richardson S. Determining similarity and inferring relations in a lexical knowledge base [Dissertation]. City University of New York, 1997.

[9] Richardson S, Dolan W B, Vanderwende L. MindNet: Acquiring and structuring semantic information from text. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, ACL, 1998, CONF 17, Vol.2, pp.1098–1102.

[10] Chaudhri V K, Farquhar A et al. The generic frame protocol 2.0. SRI International Technical Report, 1997.

[11] Gruber T R. A translation approach to portable ontology specification. Knowledge Acquisition, 1993, 5(2): 199–220.

[12] Cao C. Medical knowledge acquisition from encyclopedic texts. Lecture Notes in Computer Science 2101, 2001, pp.268–271.

[13] Si J, Cao C et al. OKEE: A multi-domain ontology and knowledge engineering environment. To appear in Proceedings of EDICS, 2002.

[14] Bowden P R, Halstead P, Rose T G. Extracting conceptual knowledge from text using explicit relation markers. Ad-

vances in Knowledge Acquisition, Shadbolt N, Ohara K, Schreiber G (eds.), Lecture Notes in Arti�cial Intelligence,

Vol.1076, Springer-Verlag, Berlin, 1996, pp.147{162.

[15] Delisle S, Barker K, Copek T, Szpakowicz S. Interactive semantic analysis of technical texts. International Journal of Computational Intelligence, 1996, 12: 273–306.

[16] Gomez F. Acquiring knowledge about the habitats of animals from encyclopedic texts. In Proceedings of the Workshop for Knowledge Acquisition (KAW-95), 1995, Vol.1, pp.1–22.

[17] Gomez F, Hull R, Segami C. Acquiring knowledge from encyclopedic texts. In Proceedings of the 4th Conference on Applied Natural Language Processing (ANLP94), 1994, pp.84–90.

[18] Hahn U, Romacker M, Schulz S. Discourse structures in medical reports – Watch out! The generation of referentially coherent and valid text knowledge bases in the MEDSYNDIKATE System. International Journal of Medical Informatics, 1999, 53: 1–28.

[19] Hahn U, Schnattinger K. Towards text knowledge engineering. In Proceedings of the 15th National Conference on Artificial Intelligence and 10th Conference on Innovative Applications of Artificial Intelligence, Cambridge, MA: AAAI Press/MIT Press, 1998, pp.524–553.

[20] Hahn U, Schnattinger K. Deep knowledge discovery from natural language texts. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Heckerman D, Mannila H, Pregibon D, Uthurusamy R (eds.), Menlo Park, CA: AAAI Press, 1999, pp.175–178.

[21] Hahn U, Klenner M, Schnattinger K. Learning from texts: A terminological metareasoning perspective. In Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, Wermter S, Riloff E, Scheler G (eds.), Lecture Notes in Artificial Intelligence, Vol.1040, Springer-Verlag, Berlin, 1996, pp.453–468.

[22] Hahn U, Romacker M. Content management in the SYNDIKATE System – How technical documents are automatically transformed to text knowledge bases. IEEE Trans. Data & Knowledge Engineering, 2000, 35: 137–159.

[23] Hull R, Gomez F. Automatic acquisition of biographic knowledge from encyclopedic texts. International Journal of Expert Systems with Applications, 1999, 16: 261–270.

[24] Kazawa K, Fujimoto K, Matsuzawa K. Attribute dependency acquisition from formatted text. In Proceedings of the 3rd International Conference on Knowledge-Based Intelligent Information Engineering Systems, 1999, pp.464–468.

[25] Lapalut S. Text clustering to help knowledge acquisition from documents. In Advances in Knowledge Acquisition, Shadbolt N, O'Hara K, Schreiber G (eds.), Lecture Notes in Artificial Intelligence, Vol.1076, Springer-Verlag, Berlin, 1996, pp.115–130.

[26] Plant R T. Techniques for knowledge acquisition from text. International Journal of Computer Information Systems, 1994, 35: 64–70.

[27] Lu R, Cao C. Towards knowledge acquisition from domain books. In Current Trends in Knowledge Acquisition, Wielinga B, Gaines B, Schreiber G, van Someren M (eds.), Amsterdam: IOS Press, 1990, pp.289–301.

[28] Sato H, Fujimoto K. A new approach to semantic word-matching for knowledge acquisition from text containing daily-used words. In Advances in Intelligent Systems: Theory and Applications, Mohammadian M (ed.), Frontiers in Artificial Intelligence and Applications, 2000, 59: 135–140.

[29] Schmidt G. Knowledge acquisition from text in a complex domain. In The 5th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, Belli F, Radermacher F J (eds.), Lecture Notes in Artificial Intelligence, Vol.604, Springer-Verlag, Berlin, 1992, pp.529–538.

[30] Tschaitschian B, Abecker A, Schmalhofer F. Information tuning with KARAT: Capitalizing on existing documents. In Knowledge Acquisition, Modeling and Management, Plaza E, Benjamins R (eds.), Lecture Notes in Artificial Intelligence, Vol.1319, Springer-Verlag, Berlin, 1997, pp.269–284.

[31] Vanderwende L. The analysis of noun sequences using semantic information extracted from on-line dictionaries [Dissertation]. Georgetown University, Washington, DC, 1996.

[32] Tian W, Cao C. A framework for extracting knowledge of the human blood system from medical texts. In Proceedings of the 16th International Conference for Young Computer Scientists, 2001, pp.501–505.

[33] Cao C, Wang H et al. An ontology-mediated knowledge programming framework for rapidly acquiring domain knowledge from semi-structured text. To appear in Proceedings of Pacific Knowledge Acquisition Workshop, 2002.

CAO Cungen is a professor and Ph.D. advisor of the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS). His major research interests are knowledge acquisition and sharing.

FENG Qiangze is a Ph.D. candidate of ICT, CAS. His research interest is human-knowledge interface.

GAO Ying is an M.S. candidate of the Institute of Software, CAS. Her research interest is musical knowledge acquisition.

GU Fang is a Ph.D. candidate of ICT, CAS. Her research interest is ontological analysis.

SI Jinxin is a Ph.D. candidate of ICT, CAS. His research interest is IT knowledge acquisition.

SUI Yuefei is a professor and Ph.D. advisor of ICT, CAS. His research interest is the logical foundation of artificial intelligence.

TIAN Wen is a Ph.D. candidate of ICT, CAS. Her research interest is human commonsense knowledge acquisition.

WANG Haitao is an M.S. candidate of ICT, CAS. His research interest is automatic knowledge acquisition.

WANG Lili is an M.S. candidate of the Institute of Software, CAS. Her research interest is religious and ethological knowledge acquisition.

ZENG Qingtian is a Ph.D. candidate of ICT, CAS. His research interests are mathematical knowledge acquisition and Petri net theory.

ZHANG Chunxia is a Ph.D. candidate of ICT, CAS. Her research interest is automatic knowledge acquisition from text.

ZHENG Yufei is a research associate of ICT, CAS. His research interest is knowledge service.

ZHOU Xiaobin is an M.S. candidate of ICT, CAS. His research interest is medical knowledge acquisition.