

A Framework for Characterizing Terminological Systems

R. Cornet, N. F. de Keizer, A. Abu-Hanna

Department of Medical Informatics, Academic Medical Center, Universiteit van Amsterdam, Amsterdam, The Netherlands

Summary

Objectives: The notion of a terminological system (TS) is complex due to the broad range of systems, applications, and clinical domains. A uniform approach to describe the characteristics of TSs is lacking. This impedes furthering understanding, applicability, mutual comparison and development of TSs. For these reasons we propose a terminological systems characterization framework.

Methods: Relevant issues pertaining to TSs and terminology servers have been extracted from literature describing requirements and functionality of TSs. From these issues, features have been distilled and further refined. A categorization has been developed to provide a convenient arrangement of these features.

Results: The framework distinguishes between application-dependent and application-independent features of TSs. Definitions are provided for measures of content coverage, which was identified as the only application-dependent feature. Application-independent features are categorized along two axes: their respective type of TS and the particular element within that system, i.e. the formalism, the content, or the functionality. For each feature we provide an explicit question, the answer to which yields a feature value. The framework has been applied to SNOMED CT and the CLUE browser.

Conclusions: We present and apply a framework to support a feature-based characterization of terminological systems. Standardized methods for content coverage studies reduce the effort of assessing the applicability of a TS for a specific clinical setting. A two-axial categorization provides a convenient arrangement of the large number of application-independent features. Application of the framework increases comparability of terminological systems. This framework may also help TS developers determine how their system can be improved.

Keywords

Terminological systems, evaluation, methods, definitions

Methods Inf Med 2006; 45: 253–66

1. Introduction

Changes in health care organization and technological development have resulted in different health care information systems. This has led to an evolution of medical terminological systems aimed at satisfying the demands for re-use and faithful transmission of data and computer-based management of semantics. In accordance with [1] we define a terminological system as “a model of concepts and relationships together with the terms pertaining to them”.

Rossi Mori et al. [2] describe the evolution of terminological systems in terms of three generations. First-generation systems, e.g. the ICD-family [3] and the Medical Subject Headings (MeSH) [4], are characterized by a fixed organization (typically hierarchical) and a simple representation such as a systematic list that is alphabetically indexed. Second-generation systems, such as the medical dictionary for regulatory activities MedDRA [5], LOINC [6] and SNOMED International [7], have a dynamic organization (i.e. provide multiple hierarchies) and are compositional, combining the simple list representation of concepts with a knowledge base to define and extend these concepts. Third-generation systems, e.g. SNOMED CT [8], GALEN [9], Gene Ontology (GO) [10] and the Foundational Model of Anatomy (FMA) [11], are based on formal models providing symbols denoting concepts and a set of formal rules to manipulate them. Throughout these generations, terminological systems have developed from single-purpose, inextensible systems to extensible multi-purpose systems. The range of domains that terminological systems cover is broad, as is indicated by the examples above. It covers among others patient data, anatomy, drugs, genomics, and medical literature. The increase of terminological systems both in number and size is demonstrated by the growth of the UMLS Metathesaurus which integrates a large number of terminological systems [12]. The 2004AC Metathesaurus contains information about over 1 million biomedical concepts and 4.3 million concept names (i.e. terms) from more than 100 terminological systems (see footnote a).

Due to the multiplicity and dynamics of terminological systems, a need for understanding their characteristics has emerged. Based on a review of literature and relevant standards, a typology of these systems is defined [1], which is summarized in Table 1. Each terminological system is a terminology, i.e. a list of terms denoting concepts in a domain, with possibly additional characteristics, e.g. it is also a vocabulary when the system includes definitions of the concepts of the terminology. Based on this typology, (recent versions of) the International Classification of Diseases (ICD) can be typified as not only a classification, but also a thesaurus, terminology and a coding system. As another example, SNOMED CT (Systematized Nomenclature of Medicine) is not only a nomenclature, but can also be typified as all of the system types mentioned in Table 1.

This typology provides a first means for categorizing terminological systems. Other approaches have been advocated that distinguish terminological systems for example by their prototypical use. In [13], systems for recording detailed patient data are referred to as nomenclatures, whereas systems used for statistical purposes are referred to as (statistical) classifications.

The fields of application of medical terminological systems have expanded over the years. One of the first terminological systems, the ICD, was developed in order that “the medical terms reported by physicians, medical examiners, and coroners on death certificates can be grouped together for statistical purposes”. Contemporary terminological systems enable a much broader use; the NHS Information Authority sketches a spectrum of applications for terminological systems: Documentation in the EPR/EHR; Decision support; Clinical audit; Reporting; Summaries; Administrative and management information; Epidemiology; Billing; and Resource management (see footnote b).

© 2006 Schattauer GmbH

Methods Inf Med 3/2006

a http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html (last visited February 14, 2005)

Given this broad range of possible applications and the large number of highly different terminological systems, it is hard to compare TSs and to determine which terminological system(s) can fulfil specific needs. There is no generic solution satisfying all needs, and the usefulness of a specific terminological system may differ from one situation to another. The aim of this paper is to provide a framework to describe features of TSs. This framework can support comparison between terminological systems, assessing the fulfilment of requirements and development of a terminological system. To distinguish the various notions used throughout this paper explicitly, Figure 1 presents them schematically. Figure 2 provides a simplified representation of the process of determining the applicability of a terminological system to meet application-specific requirements. A requirement consists of a characteristic and its constraint(s). Characteristics of a TS can be made explicit by features and their values. In the framework presented in this paper, features (also called attributes, e.g. “number of concepts”, “underlying formalism”) are explicitly distinguished from feature values (a.k.a. attribute values, e.g. “1038 concepts”, “frames representation”). To determine the degree to which a requirement is satisfied, the feature values of terminological systems are matched with application-specific requirements. This requires the existence of a well-defined and well-described set of features and feature values of terminological systems. However, such a description of features and their values does not currently exist, which hampers the assessment of the applicability of a TS and the comparability of terminological systems.

As shown in Figure 2 our approach for extracting these features and their values consists of two steps. In the first step, requirements relevant to terminological systems are identified. In the second step possible features are derived and questions are formulated to obtain feature values of a TS. This enables the characterization of terminological systems in a structured way, thus providing insight into the similarities of and differences between various terminological systems. Thereby it increases the comparability of terminological systems and the support for assessment of their applicability.

As the total number of features may become very large, the focus in this paper will be on the intrinsic features of terminological systems. Hence licensing issues or organizational topics such as maintenance or versioning will not be discussed.
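The matching of feature values against requirements described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Requirement`, the function `unmet_requirements`, the feature name "term languages" and the threshold of 1000 concepts are hypothetical and do not come from the framework itself; the example values ("1038 concepts", "frames representation", Dutch versus English and French terms) are borrowed from the text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Requirement:
    """A characteristic (feature name) together with a constraint on its value."""
    feature: str
    constraint: Callable[[Any], bool]
    description: str


def unmet_requirements(feature_values: Dict[str, Any],
                       requirements: List[Requirement]) -> List[Requirement]:
    """Return the requirements that the given feature values do not satisfy.

    A feature for which no value has been recorded counts as unmet.
    """
    unmet = []
    for req in requirements:
        if req.feature not in feature_values:
            unmet.append(req)
        elif not req.constraint(feature_values[req.feature]):
            unmet.append(req)
    return unmet


# Hypothetical feature values of one terminological system.
ts_features = {
    "number of concepts": 1038,
    "underlying formalism": "frames representation",
    "term languages": {"English", "French"},
}

# Hypothetical application-specific requirements.
requirements = [
    Requirement("term languages", lambda langs: "Dutch" in langs,
                "concepts must be designated by Dutch terms"),
    Requirement("number of concepts", lambda n: n >= 1000,
                "at least 1000 concepts"),
]

unmet = unmet_requirements(ts_features, requirements)
print([r.description for r in unmet])
# prints ['concepts must be designated by Dutch terms']
```

Keeping the feature values free of any application knowledge, and putting all application-specific judgment into the constraints, mirrors the separation between features and requirements that the framework relies on.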

It has often been pointed out (e.g. in [14, 15]) that desired characteristics of and criteria for terminological systems may vary with their intended usage. In [16], this application-dependence is mentioned as the first barrier to evaluation of terminological systems. Contrary to desired requirements (e.g. “concepts must be designated by Dutch terms”), values of features of terminological systems are almost exclusively application-independent (e.g. “concepts are designated by terms in English and French”).

b http://www.nhsia.nhs.uk/snomed/pages/events/snomed_update/text5.html (last visited February 14, 2005)

| Type of terminological system | Distinctive characteristic |
|---|---|
| Terminology | List of terms referring to concepts in a defined particular domain |
| Thesaurus | Terms are ordered, e.g. alphabetically. Concepts are described by more than one (synonymous) term |
| Vocabulary | Concepts have definitions, either formally or in free text |
| Nomenclature | A set of rules for composing new complex concepts or the terminological system resulting from this set of composition rules |
| Classification | Concepts are arranged using generic (is_a) relationships |
| Coding system | Codes designate concepts |

Table 1 Overview of types of terminological systems, as defined in [1]. Each terminological system is a terminology and possibly one or more of the following: thesaurus, classification, vocabulary, nomenclature, and/or coding system.
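The typology of Table 1 is additive: every system is a terminology, and each further distinctive characteristic adds a type. A minimal Python sketch of this idea follows. The enum `TSType`, the function `typify` and its boolean parameters are hypothetical names chosen for illustration, and reducing each distinctive characteristic to a single yes/no flag is a simplification (for a thesaurus, the ordering of terms is folded into `has_synonyms`).

```python
from enum import Flag, auto


class TSType(Flag):
    """System types of Table 1; a system may combine several of them."""
    TERMINOLOGY = auto()
    THESAURUS = auto()
    VOCABULARY = auto()
    NOMENCLATURE = auto()
    CLASSIFICATION = auto()
    CODING_SYSTEM = auto()


def typify(has_synonyms=False, has_definitions=False,
           has_composition_rules=False, has_is_a_relations=False,
           has_codes=False):
    """Derive the set of types from simplified yes/no characteristics."""
    types = TSType.TERMINOLOGY  # every terminological system is a terminology
    if has_synonyms:
        types |= TSType.THESAURUS
    if has_definitions:
        types |= TSType.VOCABULARY
    if has_composition_rules:
        types |= TSType.NOMENCLATURE
    if has_is_a_relations:
        types |= TSType.CLASSIFICATION
    if has_codes:
        types |= TSType.CODING_SYSTEM
    return types


# ICD as typified in the text: classification, thesaurus, terminology, coding system.
icd = typify(has_synonyms=True, has_is_a_relations=True, has_codes=True)
print([t.name for t in TSType if t in icd])
# prints ['TERMINOLOGY', 'THESAURUS', 'CLASSIFICATION', 'CODING_SYSTEM']
```

A flag-style enum is used here because the types are not mutually exclusive, which matches the observation that SNOMED CT can be typified as all of the system types at once.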

Fig. 1 Schematic presentation of the notions used in the literature for various issues of terminological systems. Examples are given in italics.


The only application-dependent feature identified is “content coverage”, a quintessential feature of terminological systems. Our framework explicitly distinguishes between (application-dependent) content coverage and application-independent features, as shown in Figure 2.

This paper is organized as follows. An overview of related research and the background for the framework are presented in Section 2. Section 3 describes the process of developing the framework for characterizing terminological systems. Section 4 concerns the application-dependent feature “content coverage” and summarizes methods to evaluate the content of a terminological system. Furthermore, Section 4 presents a categorization of features formulated as explicit questions for the application-independent description of terminological systems. Section 5 presents the application of the framework to the SNOMED CT terminological system (see footnote c), which is receiving increasing attention. Section 6 discusses the merits and limitations of this framework by looking at various applications of the framework. This section also relates the proposed framework to the literature and addresses issues that require further research. Section 7 concludes this paper.

2. Background

Although first-generation terminological systems were developed in a paper-based era, this does not mean that these systems are useless in today’s computerized environment. Each of the three generations of terminological systems does have its advantages and disadvantages in terms of use, maintenance and costs. Therefore it is important to understand the features that characterize these systems and evaluate them with regard to the requirements of their potential users. Fortunately the topics of standardization and understanding of terminological systems are getting increasing attention. This has resulted in various publications that address description and evaluation of terminological systems and terminology servers [14-18]. We define terminology servers as “software modules that provide functionality for navigation, manipulation and/or modification of a terminological system by means of a (standardized) application-programming interface”. These modules are often closely tied to a specific terminological system and therefore we include their functionality in our framework.

A number of publications have paid attention to describing requirements for terminological systems. Among these publications is [14], specifying twelve desiderata that were distilled (mainly) from literature from the 1990s. The desiderata (such as “concept orientation”, “polyhierarchy” and “formal definitions”) are proposed as a checklist to address the requirements of intended users of second and third-generation terminological systems. The desiderata state the characteristics a terminological system should possess, but do not pay attention to methods for measuring these characteristics, their significance or interdependence.

The “Standards Specification for Quality Indicators for Controlled Health Vocabularies” [15] is a further step towards structured specification of terminological systems. Largely based on [14], it distinguishes “general information and characteristics” and “characteristics describing the structure” of the terminology model. Furthermore, it describes characteristics influencing maintenance, and characteristics and measures for the evaluation of terminological systems.

In the USA in 2003, the National Committee on Vital and Health Statistics Subcommittee on Standards and Security made an inventory of about 40 terminological systems to arrive at national terminology standards for Patient Medical Record Information [17]. This inventory was based on a questionnaire that contained between 40 and 100 (depending on the level of detail considered) questions regarding a large number of characteristics of terminological systems (and their developers). This questionnaire intends to use independent, continuous measures of well-defined characteristics without paying attention to their significance and interdependence. As such, it is the first effort known to the authors to describe a large number of terminological systems in a structured manner.

c http://www.snomed.org/snomedct/index.html (last visited February 14, 2005)

Fig. 2 Schematic representation of using features of terminological systems for determining the satisfaction of requirements for a specific application


The Object Management Group (OMG) has taken a functionality-oriented approach in the Lexicon Query Service Specification (LQS) [18], which defines “methods for accessing the content of medical terminology systems”. Rather than defining characteristics of terminological systems, it provides a reference of functions that terminology servers should offer. As many functions depend on characteristics of the underlying terminological system, the functionality also provides insight into requirements on a terminological system.

Other recent research focuses on the ontological correctness of the contents of terminological systems [19, 20]. These papers provide an analysis of the interactions between ontological and epistemological components of terminological systems, and on the distinction between classes and concepts. Analyses in [20] aim at determining among others: terms containing classification criteria, and terms reflecting detectability, modality, uncertainty, and vagueness. In [19] a discussion is provided on the use of classes and concepts in terminological systems, where classes do indicate naturally delimited sets (e.g. fracture, breast cancer), as opposed to concepts, that provide artificially constructed sets (e.g. fracture without intracranial injury). Both papers demonstrate the need for in-depth study of the contents of terminological systems and the need for developing methods to guard their ontological correctness.

Recently, a framework comparable to the one we present in this paper has been developed [21, 22]. The distinction between the frameworks is that the work of Supekar focuses on formal ontologies in the context of the semantic web, whereas we aim at providing a more generic description, not only of systems based on formal representation (i.e. ontologies), but also on traditional terminological systems. Moreover, our framework is restricted to terminological systems in the domain of health care.

This overview shows that various efforts have been made towards description or (means for) evaluation of terminological systems. This paper builds on these efforts, by proposing a general framework that explicitly distinguishes features and feature values, categorizes these features, and suggests methods to determine the feature values in an objective and reproducible manner.

3. Process of Formulation of the Framework

The framework presented in this paper consists of features that we have extracted from the literature and then categorized, in order to show the interdependence and significance of the features. Furthermore, features have been refined when appropriate, and methods to determine feature values are provided where applicable.

3.1 Collection of Features

The work mentioned in Section 2 [14, 15, 17, 18] forms our starting point for the literature that was consulted. This is due to the generic approach of these papers and because of their focus on requirements for terminological systems. From these papers, issues relevant for characterizing terminological systems were extracted. These issues are either desiderata (required characteristics), or generic characteristics (ways of describing terminological systems). Based on the issues found, we have derived relevant features.

The value of the feature “content coverage” (the extent to which the concepts and terms used in the terminological system cover the domain) is highly application-dependent. The content coverage depends on the intended application and domain of use; hence this cannot be determined independently of the application of a terminological system. In order to make outcomes of content coverage studies comparable, agreed-upon methods for assessing content coverage are useful. Such methods are described in Section 4.1.

In contrast to “content coverage”, all other characteristics of terminological systems can be assessed independently of an application. For example, whether a system provides “synonyms” is independent of the user of the system, e.g. physicians or researchers in internal medicine, surgery or primary care. The application-independent features are organized according to the categorization presented in the next subsection.

3.2 Categorization of Application-independent Features

To provide further structure for the set of application-independent features, we categorize these according to two axes: the elements of terminological systems and servers, as described below, and the type of terminological system (terminology, thesaurus, vocabulary, nomenclature, classification, coding system, as depicted in Table 1).

3.2.1 Elements of Terminological Systems and Servers

The “elements of terminological systems and servers” axis consists of “formalism”, “content (domain knowledge)”, and “functionality” [23]. We take this issue of functionality of terminology servers into account as many contemporary terminological systems are packaged with some “default” services, and because the use of a terminological system in a computerized environment is commonplace. Moreover, issues mentioned in literature often involve both systems and servers.

Within a terminological system, one can distinguish the domain knowledge and the formalism that is used to represent the domain knowledge. The formalism (e.g. frame-based representation, entity-relationship modeling, or description logic) is fully separated from the represented domain knowledge. Others further subdivide concepts and relations into, for example, “categorical structure” (a meta-model of concept classes and their relations) and “system of concepts” (the set of concepts of the specific domain) [24], or “Top-level Ontology”, and “Domain Ontology” [25]. As such a distinction is relevant for concepts and relations, but not for other content, such as the terms or codes in a terminological system, we do not further subdivide domain knowledge.


● Formalism-related features are those that relate to the formalisms underlying the representation of terminological knowledge. For example, whether the formalism of the system allows the expression of a poly-hierarchy, or whether the formalism restricts the maximum granularity.

● Content (domain knowledge)-related features describe the actual content of (a specific version/release of) a system. Examples thereof are the number of concepts, the average number of parent concepts (to measure use of poly-hierarchical definitions), and the covered clinical domains. Note that these characteristics present an overview of the content. Statements about e.g. the completeness of the terminological system’s content are part of the application-dependent characterization of terminological systems.

Table 2 Illustration of the process of extracting features from the literature and categorizing them, applied to “desiderata for controlled medical vocabularies” from [14]. The first two columns provide quotations from [14], the additional remarks and categorization in the last two columns are provided by the authors.

| Desideratum | Mentioned issues | Additional remarks | Categories |
|---|---|---|---|
| Content | Add terms as they are encountered; Compositional extensibility; Formal methodology for expanding content; Methods for recognizing and filling gaps in content | In a wider definition, content refers to terms, concepts, relations, composition rules; No restrictions may be put on the breadth or depth of the taxonomy; Compositional extensibility implies that a system is a nomenclature; Maintenance must be based on formal methods | formalism, content; terminology, classification, nomenclature, coding |
| Concept Orientation | Nonvagueness; Nonambiguity; Nonredundancy, in the context of pre-coordinated concepts | These issues need to be asserted during modeling; Nonvagueness is very hard to evaluate; Nonambiguity and Nonredundancy can partly be evaluated when (formal) definitions are available | content; vocabulary, nomenclature |
| Recognize Redundancy | Provide a mechanism by which redundancy can be recognized in the context of post-coordination | This is functionality that depends on the representation | formalism, function; vocabulary, nomenclature |
| Formal Definitions | Expressed using e.g.: frames, semantic networks, classification operators, categorical structures, conceptual graphs | The formalism needs to be explicitly described, e.g. supported structures and their semantics; Most notably: Description Logics | formalism; vocabulary |
| Representing Context | Coping with contexts may be easier if such contexts are modeled in the vocabulary | This is not necessarily part of the vocabulary, but loosely related to it | formalism; vocabulary |
| Polyhierarchy | Allow multiple hierarchies to coexist | Concepts can have multiple parents; Other non-taxonomic hierarchies (e.g. partonomy) must be possible | formalism; classification, vocabulary |
| Concept Permanence | The meaning of a concept is inviolate | This has to be asserted during modeling; Concept deletion is not allowed, a mechanism is required for marking a concept “obsolete”; It is unclear how to deal with concepts for which the set of subordinate concepts has changed | formalism, content; classification, vocabulary |
| Evolve Gracefully | Give a clear, detailed description of what changes occur and why | This is a maintenance issue, not an intrinsic feature | -- |
| Multiple Granularities | The more macroscopic the level of discourse, the coarser the granularity of the concepts; hence vocabularies be capable of handling both fine-grained and general concepts | Like “content”, no restrictions may be put on the breadth or depth of the taxonomy | formalism, content; classification, vocabulary |
| Multiple Consistent Views | An application may restrict coding to coarse-grained concepts, hide intermediate classes or limit the user to a single, strict hierarchy | This requires representation of “relevance” for various domains; It is arguable whether this is part of the terminological system | function; vocabulary |
| Nonsemantic Concept Identifier | Concepts must have a unique identifier, free of hierarchical or other implicit meaning | Using a random identifier is required | formalism; coding |
| Reject “N.E.C.” | Definition can only be based on knowledge of the rest of concepts in the vocabulary, leading to “semantic drift” | This can be regarded a versioning problem, but poses constraints on domain knowledge | content; classification, vocabulary |


● Function-related features describe a terminology server in terms of the provided functionality, e.g. retrieval of descendant concepts, or translating a term from one language to another. Ideally, a terminology server is separated from a terminological system, so that a server can be used with more than one system, and likewise, a system can be addressed by more than one server.
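Two of the examples named in this list, the content-related feature "average number of parent concepts" and the function "retrieval of descendant concepts", can be made concrete on a toy poly-hierarchy. The following Python sketch is an illustration only: the concept names, the dictionary layout and both function names are hypothetical, and excluding root concepts from the average is an assumption rather than a definition taken from the framework.

```python
# Hypothetical poly-hierarchy: each concept maps to its set of parents (is_a).
parents = {
    "disease": set(),
    "infection": {"disease"},
    "lung disease": {"disease"},
    "pneumonia": {"infection", "lung disease"},
    "viral pneumonia": {"pneumonia"},
}


def average_parent_count(parents):
    """Mean number of parents per non-root concept.

    A value of 1.0 indicates a strict hierarchy; values above 1.0
    indicate use of poly-hierarchical definitions.
    """
    non_root = [ps for ps in parents.values() if ps]
    if not non_root:
        return 0.0
    return sum(len(ps) for ps in non_root) / len(non_root)


def descendants(parents, concept):
    """Return all direct and indirect children of a concept."""
    # Invert the parent links once to obtain child links.
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    found, stack = set(), [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


print(average_parent_count(parents))  # prints 1.25
print(sorted(descendants(parents, "infection")))
# prints ['pneumonia', 'viral pneumonia']
```

The first function reads only the content, while the second is the kind of operation a terminology server would expose; both work on the same data, which illustrates why a server can in principle be kept separate from the system it addresses.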

The two axes described above result in a 3 by 6 grid in which application-independent features are placed. This provides explicit and comprehensive clusters. We will use this grid in Section 4.2.
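The 3 by 6 grid can be pictured as a mapping from (element, type) pairs to lists of features. The Python sketch below is one possible representation, not the authors' implementation; the feature questions and the cells they are placed in are hypothetical examples, and `place` is an illustrative helper name.

```python
from itertools import product

# The two axes of the categorization.
ELEMENTS = ("formalism", "content", "functionality")
TS_TYPES = ("terminology", "thesaurus", "vocabulary",
            "nomenclature", "classification", "coding system")

# One cell per (element, type) combination: 3 x 6 = 18 cells.
grid = {cell: [] for cell in product(ELEMENTS, TS_TYPES)}


def place(feature, element, ts_type):
    """Put a feature (phrased as a question) into one cell of the grid.

    Raises KeyError if the element or type is not on one of the axes.
    """
    grid[(element, ts_type)].append(feature)


# Hypothetical placements, for illustration only.
place("Does the formalism allow a poly-hierarchy?", "formalism", "classification")
place("How many concepts does the system contain?", "content", "terminology")
place("Can descendant concepts be retrieved?", "functionality", "classification")

print(len(grid))  # prints 18
print(grid[("formalism", "classification")])
# prints ['Does the formalism allow a poly-hierarchy?']
```

A feature that is relevant to several system types, as several rows of Table 2 suggest, would simply be placed in more than one cell.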

3.3 Refinement of FeaturesThe process of placing application-indepen-dent features into the above-mentioned gridfrequently required further refinement offeatures, as placement of conceived featureswas non-trivial. We also defined and catego-rized additional features, similar to the onesfound in the literature but not mentioned assuch. To illustrate how we performed theprocess of feature extraction, refinementand categorization, Table 2 shows how thedesiderata from [14] have been processed.Criteria from other literature mentionedhave been processed in the same way, butthis is not represented in Table 2. The twocolumns on the left describe the issues men-tioned for each desideratum. The two col-umns on the right present additional

coverage are made explicit. Second, theapplication-independent description is pre-sented, in which features are organizedaccording to the 3 by 6 grid categorizationthat was introduced in Section 3.2.

4.1 Application-dependentDescription: Content CoverageContent coverage is one of the most im-portant aspects of a terminological system,since physicians need to be able to com-pletely and accurately depict the patientstatus or care process. Also, clinical re-searchers need to be able to construct pa-tient groups at any desired level of aggre-gation and be ensured that all patients in-volved are included in these groups [26].The content of a terminological system in-cludes all concepts, the relationships be-tween these concepts and the terms thatdescribe these concepts (and relations) innatural language(s), as well as any composi-tion rules, concept definitions and codes.Coverage of concepts and terms, which aremeasured in relation to the intended domainand usage, are application-dependent.

Various methods have been applied toevaluate the coverage of the concepts orterms. One example is to measure the cover-age of concepts in a terminological systemalready in use based on the number of con-cepts that had to be added to the system dueto under-representation in the terminologi-cal system [27].

Another approach to assess the coverageof concepts and terms is through ‘conceptmatching’ and ‘term matching’ [28-35].These two measures may give different re-sults in the situation where synonymy issupported but some synonymous terms arenot included in a terminological system. For‘concept matching’ and ‘term matching’, arepresentative subset of concepts and terms,respectively, is extracted from documen-tation in the domain of intended application.For example, if a terminological system isevaluated with regard to its use by nursesfor documentation of nursing information,then the subset of concepts could be wellextracted from existing nursing documen-tation in medical records [28, 30]. This sub-set of concepts or terms is then matched

remarks to these issues, plus the relevantcategories of the features from the twoaxes. For example, in [14] the desideratum“Recognize Redundancy” (as expressed incolumn 1) is defined and the accompanyingtext (summarized in column 2) mentions“As vocabularies evolve, gracefully or not,they will begin to include this kind of redun-dancy [i.e. multiple ways to code a concept].Rather than pretend it does not happen, weshould embrace the diversity it representswhile, at the same time, provide a mech-anism by which we can recognize redundan-cy and perhaps render it transparent.” Thethird column provides a summary of theanalyses of the issues mentioned in the liter-ature, which lead to the applicable cat-egories of the framework (i.e. formalismand functionality of both the vocabulary andnomenclature).

The results of the process of featureextraction, refinement and categorizationhave been used to define and position thefeatures as shown in Table 4.

4. Description of theFrameworkThe process as described in Section 3 has re-sulted in a framework that consists of twomain parts: content coverage, which turnedout to be the only application-dependentfeature identified, and a description and cat-egorization of application-independent fea-tures. First, methods to determine content

Table 3 Definitions for various measures for content coverage

Concept coverage: the extent to which the concepts within a subset, representative for the domain of interest, can be representedby the concepts within the terminological system.

Concept token coverage: concept coverage using a subset in which each concept may occur more than once, indicating the oc-currence of that concept in practice.

Concept type coverage: concept coverage using a subset in which each concept occurs at most once.

Post-coordinated concept coverage: the extent to which the concepts within a representative subset can be represented by theconcepts (either pre-existing or created with use of composition rules) within the terminological system.

Term coverage: the extent to which the terms within a representative subset exist in the terminological systems’ content, providedthat the terms relate to concepts that are present in the terminological system.

Term token coverage: term coverage using a subset in which each term may occur more than once, indicating the occurrence ofthat term in practice.

Term type coverage: term coverage using a subset in which each term occurs at most once.

259

A Framework for Characterizing Terminological Systems

Methods Inf Med 3/2006

with the content of the terminological sys-tem. In both term coverage and conceptcoverage we can distinguish between the“token” coverage, where concepts or termsare counted in accordance to their frequencyof use, and the “type” coverage, in whichconcepts and terms contribute equally, irre-spective of their frequency of use. This dis-tinction has been made for example in [32,35]. For nomenclatures, one can determinepost-coordinated concept coverage, whichtakes into account both pre-coordinated(concepts as such present in the TS) andpost-coordinated concepts (compositions ofpre-coordinated concepts). The definitionsfor the various measures for content cover-age are provided in Table 3.

The extent to which a concept or termcan be matched with concepts in a termino-logical system is mostly presented as a‘match score’ [32, 36, 37]. The coverage ofthe content can be represented, for example,by calculating the percentage of perfectmatches, approximate matches and non-matches.

In [32] a systematic comparison is pre-sented of the concept coverage of seventerminological systems for five “semanticdomains” (i.e. “diagnoses”, “findings”,“modifiers”, “other”, and “treatments andprocedures”), distinguishing “incidentsamples” (i.e. concept token coverage) and“unique subsets” (i.e. concept type cover-age). Availability of such subsets to a broadpublic and reproducible methods to deter-mine and present coverage can result inbenchmarks for application-dependent as-sessment of terminological systems. Amade-up example of the presentation of re-sults of various content coverage measuresis shown in Figure 3.
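The difference between type and token coverage, and the presentation of match scores as percentages, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function names, the three match labels and the sample data are hypothetical and are not taken from the studies cited above.

```python
from collections import Counter


def coverage(occurrences, in_system):
    """Return (type_coverage, token_coverage) as fractions.

    occurrences: concepts (or terms) as they occur in the sample,
                 repeated once per occurrence in practice.
    in_system:   set of concepts (or terms) the terminological
                 system can represent (hypothetical input).
    """
    counts = Counter(occurrences)
    covered_types = sum(1 for item in counts if item in in_system)
    covered_tokens = sum(n for item, n in counts.items() if item in in_system)
    return covered_types / len(counts), covered_tokens / sum(counts.values())


def match_distribution(scores):
    """Percentage of perfect, approximate and non-matches."""
    counts = Counter(scores)
    return {label: 100.0 * counts[label] / len(scores)
            for label in ("perfect", "approximate", "none")}


# Made-up sample: "sepsis" occurs three times, the other concepts once.
occurrences = ["sepsis", "sepsis", "sepsis", "pneumonia", "rare syndrome"]
in_system = {"sepsis", "pneumonia"}
type_cov, token_cov = coverage(occurrences, in_system)

distribution = match_distribution(["perfect", "perfect", "approximate", "none"])
```

In this made-up sample two of three distinct concepts are covered (type coverage), but four of five occurrences are covered (token coverage), because the frequently used concept is present in the system.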

Fig. 3 Example of presentation of content coverage measurement results

4.2 Application-independent Description

Section 3 has described the process of formulating the framework. After extracting the application-independent features from the literature, we categorized these features according to two axes. Table 4 shows the result of this categorization by type of terminological system on the horizontal axis and by elements of terminological systems and servers on the vertical axis. Features are categorized in the most applicable category. For example, the feature “number of concepts” is placed under “terminology”, as it is relevant for all (concept-oriented) terminologies.

The features in Table 4 are presented as explicit questions. Answering these questions provides a description of the application-independent characteristics of a terminological system. An example of this is described in Section 5.

5. Application of the Framework to SNOMED CT

We applied the framework as has been described above to the July 2003 UK version of SNOMED CT, used in combination with the CLUE browser 5.5. SNOMED CT was chosen as it is recommended as the foundation of a standard vocabulary in both the USA and the UK, and consequently it has recently been receiving much attention.

5.1 Application-dependent Description: Content Coverage of SNOMED CT

We have performed a provisional concept coverage study for SNOMED CT in the domain of intensive care. For this study, we used the same data set as in [38]. This data set consists of all diagnoses that formed (a part of) the in- and exclusion criteria of clinical studies that appeared in two important intensive care journals (Intensive Care Medicine and Critical Care Medicine) between January 1, 2001 and July 1, 2001. Figure 4 presents the concept and term coverage for this application. It shows that “token coverage” is higher than “type coverage”, indicating that the concepts and terms that are present in SNOMED CT are those that are used more frequently. The relatively low coverage of about 70% can be explained by the fact that many aggregations are based on highly domain-specific concepts, such as “encephalopathy with pathogenesis other than sepsis (e.g. hepatic encephalopathy)”. It is worth addressing the question of whether such concepts should be represented in a terminological system, but such a treatise is beyond the scope of this paper.

5.2 Application-independent Description of SNOMED CT

Now we address the questions described in Table 4 for SNOMED CT. In Table 5 we summarize each question in an italic typeface, followed by a short answer. Figure 5 shows statistics of three features related to the content of the thesaurus (the number of concepts with synonymous terms), the nomenclature (the number of refinable concepts), and the classification (the number of parents per concept), respectively.
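As an illustration of how answers to such questions could be recorded against the 3 by 6 grid, the following sketch stores question and answer pairs per grid cell. The data structure and the function names are assumptions made for this example; the paper does not prescribe a storage format. The three recorded answers are taken from Table 5.

```python
# Hypothetical encoding of the two axes of the framework.
ELEMENTS = ("formalism", "content", "functionality")
TYPES = ("terminology", "thesaurus", "classification",
         "vocabulary", "nomenclature", "coding system")


def empty_grid():
    """One dictionary of question -> answer per (element, type) cell."""
    return {(element, ts_type): {} for element in ELEMENTS for ts_type in TYPES}


def record(grid, element, ts_type, question, answer):
    """Store an answer; reject cells that are not part of the grid."""
    if (element, ts_type) not in grid:
        raise KeyError(f"unknown grid cell: {element!r}, {ts_type!r}")
    grid[(element, ts_type)][question] = answer


snomed_ct = empty_grid()
record(snomed_ct, "content", "terminology", "total number of concepts", 352662)
record(snomed_ct, "content", "terminology", "total number of terms", 939705)
record(snomed_ct, "formalism", "classification",
       "allows polyhierarchy", "yes, unrestricted")
```

Keeping answers keyed by grid cell makes it straightforward to compare two characterizations cell by cell, which is the kind of comparison the framework is meant to support.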

Table 4 Two-axial categorization of questions to obtain application-independent characteristics of terminological systems. Formalism and content describe the terminological system; functionality describes the terminology server.

Terminology (list of terms)
Formalism:
● Are “concepts” and “terms” explicitly distinguished?
● Is length of terms restricted?
● Which character encoding mechanism is used?
● Can concepts be marked as obsolete?
Content:
● How many total concepts and terms are in the terminology?
● Which areas/domains are covered?
Functionality:
● How can terms be searched? E.g. convert code to text, keyword match, lookup phrases (incl. wildcards), case insensitive, etc.

Thesaurus (indexing and synonyms)
Formalism:
● Are terms indexed?
● Are synonyms allowed, i.e. can multiple terms have the same meanings?
● How is synonymy represented?
● Can multiple languages be represented?
● Are synonyms for fragments allowed? (e.g. cardiac ~ heart)
Content:
● In which way(s) are the terms indexed?
● In what languages are terms described?
Functionality:
● Can terms be translated from one language to another?

Classification (is-a relationships)
Formalism:
● Can hierarchical relationships between concepts be defined? If yes, which? Part-of? Is-a?
● Is poly-hierarchy supported?
● Is hierarchy restricted in depth or breadth?
● Can classification be inferred based on a concept’s definition?
Content:
● Can properties be inherited to subordinate concepts?
● What is the distribution of the number of parents per concept?
Functionality:
● Can all descendants of a concept be retrieved at once?

Vocabulary ((formal) definitions)
Formalism:
● Is the meaning of concepts represented in free text?
● Is the meaning of concepts represented formally? If yes: how? e.g. frames, Description Logic (DL). If DL: which DL?
● Are relationships explicitly defined?
Content:
● Are all concepts defined/described, or only “core concepts”? e.g. diseases, but not anatomy
● How many concepts are vague, ambiguous, or redundant?
● How many and which relation types do exist?
Functionality:
● Are multiple consistent views provided?
● Can properties of a concept be retrieved (e.g. definition retrieval)?
● Is basic inference supported, e.g. subsumption testing, instance checking?

Nomenclature (composition rules)
Formalism:
● Is composition of concepts possible?
● How is this represented?
● Can equivalent definitions be detected automatically?
● Can compositions change the meaning of a concept, or do they only specify concepts in more detail?
Content:
● How many concepts can be combined or further specified?
Functionality:
● How is a user supported in constructing composite concepts?
● Can refinable relations be retrieved?

Coding system (codes)
Formalism:
● Are codes assigned to concepts? If yes, is there a code generation mechanism?
● Are lengths of codes restricted?
● Is there a meaning to these codes (e.g. mnemonic)?
● Do the codes limit the taxonomic placement of concepts?
Content:
● Are all concepts coded?
● Are the codes proprietary or cross-mapped to another system?
Functionality:
● Can codes be cross-mapped to codes in another coding system?

Fig. 4 Content coverage of SNOMED CT w.r.t. concepts that were retrieved from studies published in two Intensive Care journals [38]

6. Discussion

The framework as described and applied in this paper aims at providing a characterization of terminological systems. This characterization needs to strike a balance between conciseness and completeness. A complete characterization is impractical if not impossible, as there may always remain new features that can be defined. Hence, the features defined in this paper provide a good starting point, not a definitive collection. The process we followed to identify the features makes it fair to assume that these features are indeed important ones.

The distinction between formalism, domain knowledge and functionality helps to identify the strengths and weaknesses of systems, and the possibility to overcome the weaknesses. Generally, shortcomings in the content can be solved relatively easily, whereas shortcomings in the formalism are harder to overcome. Likewise, if a terminology server lacks functionality, this can only be implemented if the formalism underlying

a terminological system provides support for such functionality. For example, to provide word normalization, the formalism should allow for the representation of normal forms and inflections of terms.

Table 5 Two-axial categorization of questions to obtain application-independent characteristics of SNOMED CT (formalism and content) and the CLUE Browser (functionality)

Terminology
Formalism:
● concepts and terms distinguished: yes
● term length restriction: none
● character encoding: UTF-8
● concept obsoletion mechanism: yes, concept status flag, a.o. retired, with motivations
Content:
● total number of concepts: 352662
● total number of terms: 939705
● covered areas: disorders, subjective symptoms, findings, procedures, lab, radiology, anatomy, medication, chemicals, devices, care management, assessment tools
Functionality:
● convert code to text: only for SNOMED concept- and description Ids
● lookup phrases for a string: yes
● lookup phrases matching a string (with wildcards); inexact match: yes, works on parts of words
● code refinement: no
● keyword matching: no
● case-insensitive:

Thesaurus
Formalism:
● terms indexed: none
● supports synonymy: yes, concepts can be represented by multiple terms in different languages
● synonym representation: description status, which can be preferred, synonym, fully specified or unspecified
● multilingual representation: yes, by means of language code
● description obsoletion mechanism: yes, description status flag, a.o. retired, with motivations
● synonyms for fragments: no, only for full terms
Content:
● number of concepts with synonymous terms: see Figure 5a (average = 2.66)
● languages: UK English, US English, German, Spanish
Functionality:
● translation to other languages: no, one language-specific version is used

Classification
Formalism:
● hierarchical relationships: Is-a, Part-of
● allows polyhierarchy: yes, unrestricted
● hierarchical depth restriction: none
● hierarchical breadth restriction: none
● classification inferred based on concept def: yes, DL based. Included in distributed version, not supported by CLUE browser
Content:
● properties inherited to subordinates: yes
● number of parents per concept: Figure 5c (average = 1.30)
Functionality:
● retrieve descendants: yes

Vocabulary
Formalism:
● supports free-text concept definition: none
● supports formal concept definition: yes, DL: ELH + role groups + role composition + right identity axioms
● atomic concepts distinguished: no
● explicitly defined relationships: yes
Content:
● approx number of vague concepts: 13151 (3.7%) (based on preferred terms containing “ or ” or “/or”)
● approx number of ambiguous concepts: n/a
● approx number of redundant concepts: n/a
● relationships used: 993 attributes are defined, 42 are used
Functionality:
● providing multiple consistent views: yes, subsets can be defined by means of reference to concepts to be included or excluded (with or without their subsumees)
● retrieve definitions: yes
● subsumption testing: no
● instance checking: no
● detection of equivalent definitions: no
● query for concepts matching structural criteria: no

Nomenclature
Formalism:
● composition possible: yes
● composition formalism: characteristic type = 0 or 1 (0 = Defining; 1 = Qualifier), refinability = 0, 1 or 2 (0 = Not refinable; 1 = Optional: May be refined by selecting subtypes; 2 = Mandatory: Must be refined by selecting a subtype)
● detection of equivalent definitions: the formalism (DL) supports this, as do DL reasoners. No known tools that support postcoordination and detection of equivalent definitions
● compositions change meaning or only specify more detail: more detail only
Content:
● number of refinable concepts: 153080 (43%)
● distribution of refinable relations per concept: see Figure 5b
Functionality:
● support in concept composition: no
● retrieve refinable relations: yes

Coding system
Formalism:
● codes assigned: yes
● code generation mechanism: sequential number + partition identifier + check digit
● code length restriction: 18 positions
● meaning of identifiers: none
● limitation of taxonomic placement: no
Content:
● all concepts coded: yes
● crossmappings: CTV-3, CDT-2, HHCC, ICD-9-CM, ICD-10, ICD-O, LOINC, NIC, NANDA, PNDS, OMAHA, OPCS-4
Functionality:
● cross coding: no
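Table 5 approximates the number of vague concepts by counting preferred terms that contain “ or ” or “/or”. A minimal sketch of that heuristic is given below. The function names and the four sample terms are made up for illustration; the counts reported in Table 5 come from the paper, not from this code.

```python
def looks_vague(preferred_term):
    """Heuristic from Table 5: the term contains ' or ' or '/or'."""
    return " or " in preferred_term or "/or" in preferred_term


def vague_share(preferred_terms):
    """Return (count, percentage) of terms flagged by the heuristic."""
    flagged = sum(1 for term in preferred_terms if looks_vague(term))
    return flagged, 100.0 * flagged / len(preferred_terms)


# Made-up preferred terms, not actual SNOMED CT content.
terms = [
    "Fracture of femur",
    "Burn or scald of hand",
    "Nausea and/or vomiting",
    "Asthma",
]
count, percentage = vague_share(terms)
```

Such a string-based count is only an approximation: it can miss vague concepts that are phrased differently and can flag terms in which “or” does not express a disjunction of meanings.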

In this section we will further discuss the limitations and possible drawbacks of this framework by looking at various application tasks of the framework: comparison between terminological systems, fulfilment of requirements, and development of a terminological system. We will furthermore relate the framework to the literature, and look at the possibilities for and benefit of sharing experiment results using this framework.

6.1 Using the Framework for Comparing Terminological Systems

The utility of the framework presented in this paper will increase if researchers and developers of terminological systems address the questions described in Table 4 and make their answers publicly available. The availability of a structured characterization of various terminological systems will support their comparability, but some problems will still remain. The first problem is that, although the feature values are described, the interpretation of their implication may be difficult. For example, the representation formalisms of terminological systems can be described as different description logics, but it may be hard to interpret what the (practical) consequences are of these different formalisms. To further enhance comparability, not only the features should be explicitly specified, but also their allowed values, i.e. the possible feature values, for example “DL, frames, other” for the feature “Formalism used”. Currently, no categories for feature values are presented; instead they are specified in free text. Secondly, some features in the categorization are hard to measure, such as the number of vague, ambiguous or redundant concepts. Thirdly, measurement of the application-dependent feature “content coverage” remains labor-intensive, and should be performed for each domain and application, as existing subsets may not always be representative for intended new usage. It is important that these subsets are made publicly available so that similar subsets can be used to evaluate content coverage of different terminological systems.

Fig. 5 Use of terms and compositionality in SNOMED CT. Figure a shows a histogram of the number of synonyms (on a log scale), Figure b shows the histogram of the number of refinable relationships (the composition mechanism in SNOMED CT), Figure c shows the histogram of the number of parents.
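Two of the classification-related features, the distribution of the number of parents per concept and the retrieval of all descendants of a concept, can be sketched on a toy poly-hierarchy. The hierarchy, the dictionary representation and the function names below are assumptions for illustration only; whether root concepts are included in the average reported in Table 5 is not stated in the paper.

```python
from collections import Counter

# Toy poly-hierarchy: concept -> list of direct parents (made-up data).
parents = {
    "disease": [],
    "infection": ["disease"],
    "lung disease": ["disease"],
    "pneumonia": ["infection", "lung disease"],
}


def parent_count_histogram(parents):
    """Number of concepts per number of direct parents."""
    return Counter(len(direct) for direct in parents.values())


def mean_parents(parents):
    """Average number of direct parents, here including the root."""
    return sum(len(direct) for direct in parents.values()) / len(parents)


def descendants(parents, concept):
    """All direct and indirect children of a concept."""
    children = {}
    for child, direct in parents.items():
        for parent in direct:
            children.setdefault(parent, set()).add(child)
    found, stack = set(), [concept]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:  # a concept may be reached via several parents
                found.add(child)
                stack.append(child)
    return found


histogram = parent_count_histogram(parents)
below_disease = descendants(parents, "disease")
```

The check on already visited concepts matters in a poly-hierarchy, because a concept such as “pneumonia” is reachable through more than one parent.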

6.2 Using the Framework for Requirements Fulfilment

Figure 2 depicts how features and their values can be used in the process of selecting a terminological system for a specific application. These feature values need to be matched with requirements that the application poses on the system. This is not necessarily a straightforward process. One needs to determine the domain of interest (e.g. intensive care) and the application (e.g. recording the reason for admission). Three types of application of a terminological system can be distinguished according to [39]: 1) entering and presenting data about patients, 2) sharing and integrating information, and 3) querying and retrieving information. The NHS Information Authority describes nine scenarios that were mentioned in Section 1, which we will place into these three categories. As a rule of thumb, the following requirements hold for these applications:

● Entering and presenting data about patients (NHS: Documentation in the EPR/EHR, Decision support): a terminological system should capture all concepts and terms that are applicable. Either these concepts must be present in the system, or it must be possible to compose them using pre-existing concepts, a process also called post-coordination. The latter poses demands on the nomenclature-related features of a terminological system. Properly handling terms entered by clinicians involves terminology- and thesaurus-related features of the system.

● Sharing and integrating information (NHS: Clinical audit, Reporting, Summaries, Decision support): semantics of the data should be well understood by all people involved and by other involved software applications, such as decision support systems. Hence, definitions are important (vocabulary-related features). In addition, it may be desirable to present information at varying levels of detail, especially when information is exchanged between various specialties, for example between a cardiac surgeon and a general practitioner. This poses demands on the classification-related features.

● Querying and retrieving (Administrative and management information, Epidemiology, Billing, and Resource management): this requires the ability to aggregate data (classification-related features). It needs to be determined how concepts need to be aggregated, e.g. whether there are predefined axes (sometimes referred to as chapters), or whether it should be possible for users to freely combine concept-criteria. The latter case requires concepts to be explicitly characterized by properties and relations to other concepts (i.e. vocabulary-related properties).

These descriptions of the types of applications indicate which terminological system types (e.g. vocabulary, nomenclature) are the most relevant. Explicit categorization by the terminological system type (represented by the columns of Table 4) supports focusing on those features that are essential for fulfilling the requirements of the terminological system. A more detailed requirement analysis is outside the scope of this paper.

6.3 Using the Framework for Development of a Terminological System

Many contemporary systems have evolved over time by adding contents, broadening the scope, and aiming at supporting a wider range of applications. These systems often provide workarounds for limitations of previous versions, without really solving the underlying problem. For example, ICD-10 has introduced the “dagger/asterisk” mechanism to enable dual coding, which is a partial solution for creating a poly-hierarchy, which was not supported in earlier versions. A clear understanding of the initial and future use of a terminological system, together with the requirements that emerge from this usage, can help determine the necessary features and feature values that the terminological system should possess. We believe that the framework described in this paper will also contribute to development of terminological systems from scratch, as the framework encourages developers to determine which features are important for their specific application.

6.4 Relation between Framework and Literature

In this section we will relate our work to the literature on which we based our framework, although we do not claim that this bibliography provides a complete overview.

The desiderata specified in [14] focus on application-independent and application-dependent characteristics. These mainly involve formalism-related issues (concept orientation, concept permanence, nonsemantic identifiers, polyhierarchy, formal definitions, rejection of “Not Elsewhere Classified”, multiple granularities, multiple consistent views, representation of context). “Content” in itself is defined as the most important characteristic of terminological systems. However, methods to evaluate this application-dependent characteristic, as summarized in Section 4.1, are not provided in [14]. “Recognition of redundancy” relates both to functionality and formalism for detection of equivalent (post-coordinated) concept definitions. “Graceful evolution” is not within the scope of this paper since it involves a formalized organization for keeping track of changes between versions of terminological systems, and in our framework we do not take maintenance and versioning issues into account.

The Quality Indicators from [15] cover the application-independent characteristics as mentioned in [14] but add more detail to these characteristics, and provide some additional characteristics, such as: clearly stated “purpose and scope” of terminological systems, and functionality for “normalization of content and semantics”. It is furthermore stated that composition of concepts must be possible, i.e. that a terminological system is a nomenclature. A notable issue mentioned in [15] is the need for specification of some application-specific requirements, such as: persistence and extent of (primary) use, and the degree of automatic inference intended. As described in Figure 2, these application-specific requirements should be mapped to the feature values of a terminological system.
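The mapping of application-specific requirements to feature values can be illustrated by a simple check that lists the requirements a given characterization does not meet. This is a sketch under assumptions: the function name, the representation of requirements as sets of acceptable values, and the example requirements are hypothetical. The feature values are simplified from Table 5.

```python
def unmet_requirements(requirements, feature_values):
    """Return the features whose value is not among the acceptable ones.

    requirements:   feature -> set of acceptable values (hypothetical format)
    feature_values: feature -> value from a system's characterization
    """
    return [feature for feature, acceptable in requirements.items()
            if feature_values.get(feature) not in acceptable]


# Feature values simplified from Table 5 (SNOMED CT with the CLUE browser).
characterization = {
    "composition possible": "yes",
    "allows polyhierarchy": "yes",
    "subsumption testing": "no",
}

# Made-up requirements for an application that needs post-coordination
# and automatic inference.
requirements = {
    "composition possible": {"yes"},
    "subsumption testing": {"yes"},
}

gaps = unmet_requirements(requirements, characterization)
```

A feature that is missing from the characterization is reported as unmet, which seems a reasonable default when a system has not yet been described for that feature.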

LQS [18], being a specification for implementation of a terminology server, can also be used as a structured reference for required functionality. In this way it can play an important role for the application-independent description of functionality of existing terminology servers, and guide development of new terminology servers. The functions described in LQS outnumber the functionality defined in our framework, as we defined functionality at the level of use cases rather than function calls in an Application Programming Interface (API). This restriction is motivated by the need to keep the framework at a conveniently high level of granularity. The development of new terminology server interfaces such as HL7 Common Terminology Services (CTS; see footnote d) may be a next step towards providing means for the characterization and implementation of terminology servers, and provide a valuable addition to our framework.

The National Committee on Vital and Health Statistics (NCVHS) Questionnaire [17] is the first effort known to the authors that delivers a structured, application-independent description of a variety of terminological systems. If the results become available electronically they will provide a valuable source of information for comparison, evaluation or development of terminological systems. The main added value of our framework compared to [17] is the categorization of features, and the provision of explicit measures and methods for content coverage.

By making characteristics explicit and by striving to make them objectively measurable, the barriers to evaluation of terminological systems as described by Hales [16] can be overcome. The first three barriers “1) evaluations are application dependent, 2) assessment is empirical instead of independent, 3) dichotomous measures of characteristics (presence or absence) are more in use than continuous measures” have been overcome by using our general approach presented in Figure 2. By using a structured categorization of characteristics (Table 4) in the first phase of this approach we paid attention to the remaining barriers “4) poor definition of characteristics, 5) large number of characteristics, 6) different significance of characteristics, and 7) interdependence of characteristics”.

6.5 Reuse of Results – Possibilities and Benefits

The framework presented in this paper provides a template for organizing features of terminological systems. This is of importance for increasing the understanding of terminological systems. However, its additional value is in the application of the framework, and in sharing the results. The usefulness of the framework will increase with the number of terminological systems and applications it is applied to. If descriptions are shared and detailed methods are made available, comparison, evaluation and development of TSs will become less complex than it currently is. As making an application-independent description is a one-time effort for each (version of a) terminological system, and requirements specification is a one-time effort for each application, this framework can contribute to a reduction of the effort to be put in comparison and evaluation, assuming that researchers share their results. This sharing of results requires a repository in which the characterizations of individual (versions of) terminological systems are held. Providing an open (e.g. web-based) environment will increase the utility of this framework. Firstly, such an environment can provide more information about systems, their feature values, and the methods to determine these feature values. Secondly, if results are shared, the relative cost of comparison of terminological systems will decrease. Finally, performing such comparisons may provide insight into the most prominent possibilities for improvement.

The realization of such an infrastructure comprises further work we are planning to undertake.

7. Conclusion

The framework described in this paper aims to describe the essential characteristics of terminological systems. This enhances the understanding of these systems, which is necessary for comparison, application, and development of terminological systems. Since most characteristics are application-independent, their description can be reused in different applications. The proposed categorization as described in Table 4 supports the explicit “once-only” description of the application-independent characteristics of a terminological system. Thereby, this framework aims to reduce the effort of determining which terminological system is applicable for a certain clinical setting. This framework may also help terminological system developers to determine in what way their system can be improved to serve more or broader needs.

We have combined the two axes “Elements of terminological systems and servers” and “Types of terminological systems”, resulting in a 3 by 6 grid of application-independent characteristics as presented in Table 4. In this grid we have specified questions as examples of the characteristics of terminological systems. These questions are examples of how to specify the application-independent description of terminological systems, the first step in the evaluation process. We plan to realize a website through which the set of questions is made available. In this way, characterizations of individual terminological systems as well as future additions to the current set of questions can be made publicly accessible.

The process described in this paper reveals the prominent features of terminological systems. The framework provides a structured categorization of features that constitute characterization of a terminological system. Beyond understanding terminological systems, the main application tasks that this framework supports are comparison of terminological systems, fulfilment of requirements, and development of a terminological system.

The application of the framework to SNOMED CT and the CLUE Browser demonstrates the applicability of the framework. Further application of the framework to a variety of terminological systems and servers will help to increase the understanding of their individual merits, and ease the process of determining the applicability of a terminological system for a specific application and domain.

d http://informatics.mayo.edu/index.php?page=11 (last visited February 14, 2005)

Acknowledgments

This work is supported by the Netherlands Organization for Scientific Research (NWO) program “Information & Communication Technology in Healthcare” (ICZ) for the project entitled “Terminology and Semantics: Making semantics explicit”, number 014-18-014.

Methods Inf Med 3/2006

Correspondence to:
Ronald Cornet
Department of Medical Informatics, J1b-114
Academic Medical Center, Universiteit van Amsterdam
P.O. Box 22700
1100 DE Amsterdam
The Netherlands
E-mail: [email protected]
