


  • Taxonomies and Indexing: A Technical Strategy
    Diane Vizine-Goetz
    Office of Research
    OCLC Online Computer Library Center, Inc.

  • Context
    - Techniques and approaches developed by & for libraries and other institutions responsible for preserving the human record
    - Broad scope
    - Long tradition of information organization

  • Why organize information? For:
    - Search and retrieval
    - Use
    - Preservation & disposition

  • Why Organize Information by Subject?
    - Find information on a particular subject
      - Only relevant information (precision)
      - All relevant information (recall)
    - Find related information
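The two goals above correspond to the standard retrieval measures: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that are retrieved. A minimal Python sketch, assuming relevance judgments are available as sets of document IDs (the IDs below are made up for illustration):

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|   ("only relevant information")
    recall    = |retrieved & relevant| / |relevant|    ("all relevant information")
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved = {"d1", "d2", "d3", "d7"}
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Raising one measure often lowers the other: retrieving everything gives perfect recall and poor precision.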

  • How?
    - Subject analysis
      - Conceptual analysis: determining what an information object is about
      - Translating concepts into a knowledge organization (KO) scheme, e.g., subject indexes, thesauri, classification schemes
    - Automated, semi-automated, or human/intellectual

  • Automation & Subject Analysis
    Subject analysis = conceptual analysis + translation of concepts into a KO scheme

    - Web search engine + KO scheme
      Conceptual analysis: automatic identification of key concepts (names, words, and phrases)
      Translation: automatic translation to KO scheme

    - Machine-aided indexing/classification
      Conceptual analysis: automatic identification of key concepts
      Translation: human-controlled translation to KO scheme

    - Traditional indexing & library cataloging
      Conceptual analysis: human identification of key concepts (topics, names, time periods, forms/genre)
      Translation: human translation to KO scheme

  • Automated Concept Identification
    - Automated indexing
      - Ranges from simply identifying words in a document to sophisticated analyses that identify key names, words, and phrases
      - Example: the WordSmith Project
    - Classification
      - Automated assignment of documents to categories or classes
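At the simple end of that range, automated indexing can be sketched as counting word n-grams that neither begin nor end with a stopword and ranking them by frequency. This is not the WordSmith method, whose details the slide does not give; the stopword list and the sample text are illustrative assumptions.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (assumption); real systems use far larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "by",
             "is", "are", "was", "were", "with", "its", "it", "that", "this"}


def candidate_phrases(text: str, max_len: int = 3) -> Counter:
    """Count word n-grams (1..max_len) that neither start nor end with a stopword."""
    counts: Counter = Counter()
    # Split at punctuation so that candidate phrases never cross it.
    for chunk in re.split(r"[.,;:!?()\n]+", text.lower()):
        words = re.findall(r"[a-z]+(?:'[a-z]+)?", chunk)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return counts


text = ("The federal reserve board met today. The chairman of the federal "
        "reserve board said family planning programs and fair housing "
        "were not on the agenda of the federal reserve.")
for phrase, freq in candidate_phrases(text).most_common(5):
    print(freq, phrase)
```

Frequency alone ranks "federal", "reserve", and "federal reserve" together; distinguishing real concepts from fragments is where the more sophisticated analyses come in.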

  • Political News Concepts Extracted by WordSmith
    fair housing, fair housing act, family planning, family planning programmes, family planning programs, family planning services, federal government, federal government deficit, federal reserve, federal reserve bank, federal reserve board, federal reserve chairman alan greenspan, federal reserve system

  • Advantages of automatic concept identification
    - Inexpensive
    - Suitable for indexing/categorizing large quantities of text
    - Can identify popular and emerging concepts and terminology

  • Why use knowledge organization schemes?
    - Knowledge organization schemes such as subject heading lists, thesauri, & classification schemes are specialized languages designed for retrieving information
    - Goal: to reduce the ambiguities that cause precision & recall failures

  • Free text vs. controlled subject retrieval language
    - WordSmith: family planning, family planning programmes, family planning programs, family planning services
    - Library of Congress Subject Headings (LCSH): Birth control clinics
        UF Family planning services
           Planned parenthood services
        BT Clinics
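One way to connect the two columns is to treat the UF (used for) references as an entry vocabulary: every non-preferred term points to its preferred heading. A minimal sketch, assuming exact matching after case-folding; only the heading, its two UF terms, and its BT come from the LCSH record above.

```python
from typing import Optional

# Entry vocabulary taken from the LCSH record on this slide.
LCSH_RECORDS: dict[str, dict[str, list[str]]] = {
    "Birth control clinics": {
        "UF": ["Family planning services", "Planned parenthood services"],
        "BT": ["Clinics"],
    },
}


def build_entry_index(records: dict[str, dict[str, list[str]]]) -> dict[str, str]:
    """Map each preferred heading and each UF (non-preferred) term to the preferred heading."""
    index: dict[str, str] = {}
    for heading, fields in records.items():
        index[heading.casefold()] = heading
        for variant in fields.get("UF", []):
            index[variant.casefold()] = heading
    return index


def to_heading(free_text_term: str, index: dict[str, str]) -> Optional[str]:
    """Return the preferred heading for a free-text term, or None if it is not in the index."""
    return index.get(free_text_term.strip().casefold())


index = build_entry_index(LCSH_RECORDS)
print(to_heading("family planning services", index))    # Birth control clinics
print(to_heading("family planning programmes", index))  # None
```

The second WordSmith phrase finds no heading: an exact-match entry index reaches only as far as its UF references, which is one reason controlled vocabularies record many variant forms.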


  • MeSH Heading vs. LCSH
    - MeSH: Family Planning
        Note: Programs or services designed to assist the family in controlling reproduction by either improving or diminishing fertility.
        Entry Terms: Birth Control; Planned Parenthood; Basal Body Temperature Method; Birth Limiting; Births Averted; Family Planning Surveys ...
    - LCSH: Birth control (19880919)
        UF Family planning; Planned parenthood; Population control; Pregnancy--Prevention
        BT Hygiene, Sexual; Sexual ethics
        RT Contraception; Family size
        NT Abortion; Birth Intervals; Childlessness ...

  • Characteristics of subject retrieval languages
    - Terminology is often domain-specific: Medicine > MeSH; Engineering > INSPEC; Agriculture > Agrovoc
    - Control vocabulary (synonyms & homonyms)
    - Express relationships between terms

  • Within a domain, terms are context independent
    Ei Thesaurus™
    Bank protection
      UF Coastal engineering--Bank protection
         Inland waterways--Bank protection
      SN Protection of river banks and lake shores. For seacoasts, use SHORE PROTECTION
      DT January 1993
      BT Protection
      RT Banks (bodies of water); Coastal engineering; Environmental engineering; Erosion; Inland waterways; River control; Shore protection; Slope protection; Soil conservation
      MC 407.2; 407.3
      OC 914.1

  • Controlled Vocabulary
    - Preferred way of expressing a concept, e.g., popular vs. technical: Heart attack vs. Myocardial infarction
    - Non-used vocabulary often included:
      - Synonyms, including current/outdated terms: Disabled/Handicapped
      - Lexical variants, e.g., phrase/inverted forms: Bilingual education/Education, Bilingual
      - Quasi-synonyms, e.g., synonyms/antonyms: Literacy/Illiteracy
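Of these kinds of non-used vocabulary, only lexical variants can be matched by rule; synonyms such as Disabled/Handicapped and quasi-synonyms need explicit references. A sketch of a lookup normalizer that case-folds, collapses whitespace, and un-inverts a heading with a single comma. The rule that any single-comma term is an inverted phrase is a simplifying assumption; headings whose commas do not mark inversion would need explicit handling.

```python
def normalize_heading(term: str) -> str:
    """Normalize a term for matching against an entry vocabulary.

    - case-fold and collapse runs of whitespace
    - un-invert "X, Y" (exactly one comma) into "Y X"  (simplifying assumption)
    """
    term = " ".join(term.split()).casefold()
    parts = [p.strip() for p in term.split(",")]
    if len(parts) == 2 and all(parts):
        term = f"{parts[1]} {parts[0]}"
    return term


print(normalize_heading("Education, Bilingual"))      # bilingual education
print(normalize_heading("  Bilingual   education "))  # bilingual education
```

Both forms now compare equal, so either one can be used to look up the preferred heading.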

  • Relationships
    - Equivalence: synonymous terms
    - Hierarchy
      - Generic relationship (kind)
      - Whole-part relationship
      - Instance relationship (example)
    - Association
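These three relationship types map onto the familiar thesaurus tags: USE/UF for equivalence, BT/NT for hierarchy, and RT for association. Each relationship is normally recorded in both directions. A minimal in-memory sketch that stores reciprocals automatically and uses them to expand a query. The class and method names are hypothetical; the sample data is the LCSH record for Birth control clinics shown earlier. ANSI/NISO Z39.19 also defines tags that distinguish generic, whole-part, and instance hierarchy; this sketch folds them into plain BT/NT.

```python
from collections import defaultdict

# Each tag and the tag recorded on the other term.
RECIPROCAL = {"UF": "USE", "USE": "UF", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    """Minimal thesaurus store: every relationship is recorded in both directions."""

    def __init__(self) -> None:
        # term -> tag -> set of related terms
        self.rel: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, term: str, tag: str, other: str) -> None:
        self.rel[term][tag].add(other)
        self.rel[other][RECIPROCAL[tag]].add(term)

    def get(self, term: str, tag: str) -> set[str]:
        return set(self.rel.get(term, {}).get(tag, set()))

    def expand(self, term: str) -> set[str]:
        """Resolve USE to the preferred term, then add its UF terms and,
        transitively, all of its narrower terms (NT)."""
        preferred = next(iter(self.get(term, "USE")), term)
        result = {preferred} | self.get(preferred, "UF")
        stack = list(self.get(preferred, "NT"))
        while stack:
            narrower = stack.pop()
            if narrower not in result:
                result.add(narrower)
                stack.extend(self.get(narrower, "NT"))
        return result


t = Thesaurus()
t.add("Birth control clinics", "UF", "Family planning services")
t.add("Birth control clinics", "UF", "Planned parenthood services")
t.add("Birth control clinics", "BT", "Clinics")

print(sorted(t.expand("Family planning services")))
# ['Birth control clinics', 'Family planning services', 'Planned parenthood services']
print(sorted(t.expand("Clinics")))
# ['Birth control clinics', 'Clinics']
```

Recording BT once yields the reciprocal NT for free, which is what lets a search on "Clinics" reach the narrower heading.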

  • Subject Retrieval using a controlled vocabulary

  • Related Terms in LCSH

  • Classification / Categorization System
    - A systematic arrangement of knowledge into useful categories
    - General schemes (DDC, LCC, UDC) & special schemes (AGRIS, MSC)
    - Present a generalized view of knowledge at varying levels of depth
    - May be enumerative or synthetic

  • Some Advantages of Traditional Schemes
    - Meaningful notation
    - Well-developed hierarchies
    - Well-defined categories
    - Rich network of relationships

  • Meaningful Notation (DDC)
    005.1 Programming
    005.1 Programmation
    005.1
    005.1 Programación

  • DDC Notation Indicates Hierarchy
    600 Technology
      633 Field and plantation crops
        633.11 Wheat
        633.12 Buckwheat
        633.13 Oats
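Because each added digit narrows the class, the broader numbers can be read off a notation by dropping digits one at a time and padding the first three with zeros. A sketch under that purely notational assumption. Real DDC has places where the structural hierarchy departs from the notation, and the sketch yields 633.1 as an intermediate level that the slide does not show.

```python
def broader_classes(notation: str) -> list[str]:
    """List the broader DDC numbers implied by a notation, nearest first.

    Assumes purely notational hierarchy: each dropped digit moves one level up,
    and the three-digit base is padded with zeros (63 -> 630, 6 -> 600).
    """
    digits = notation.replace(".", "")
    result: list[str] = []
    for k in range(len(digits) - 1, 0, -1):
        prefix = digits[:k]
        if k <= 3:
            candidate = prefix.ljust(3, "0")
        else:
            candidate = f"{prefix[:3]}.{prefix[3:]}"
        # Skip the notation itself (e.g. 630 -> "63" pads back to 630) and repeats.
        if candidate != notation and candidate not in result:
            result.append(candidate)
    return result


print(broader_classes("633.11"))  # ['633.1', '633', '630', '600']
print(broader_classes("512.3"))   # ['512', '510', '500']
```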

  • Well-developed Hierarchies

  • Hierarchies & Categories
    - Hierarchical, from general to specific
    - Categories have superordinate, coordinate, and subordinate relationships within the hierarchy
    - Subcategories must be mutually exclusive
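Mutual exclusivity is a property of how categories are defined, but a quick proxy check is to look for items that were assigned to two sibling subcategories. A sketch, assuming a hypothetical input that maps each subcategory of one parent to the set of item IDs assigned to it; the assignments below are made up.

```python
from itertools import combinations


def overlapping_siblings(children: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of sibling subcategories that share assigned items.

    An empty result means the siblings are mutually exclusive for this set of items.
    """
    clashes = []
    # Sort by category name so the pairs come out in a stable order.
    for (a, items_a), (b, items_b) in combinations(sorted(children.items()), 2):
        shared = items_a & items_b
        if shared:
            clashes.append((a, b, shared))
    return clashes


# Hypothetical assignments under "Field and plantation crops".
crops = {
    "Wheat": {"doc1", "doc2"},
    "Buckwheat": {"doc3"},
    "Oats": {"doc2", "doc4"},
}
print(overlapping_siblings(crops))  # [('Oats', 'Wheat', {'doc2'})]
```

A clash flags either an item that belongs somewhere else or category definitions that overlap.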

  • Hierarchies & Categories
    Top > Recreation > Automotive > Driving > Road Rage
    Social Problems > Public Safety > Traffic Hazards > Highways > Road Rage
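The slide does not say whether the two paths come from one scheme or from two. Treating them as one polyhierarchy, where "Road Rage" has two broader categories, every root-to-category path can be recovered from child-to-parent links. A sketch, assuming the hierarchy is acyclic; the parent map simply encodes the two paths shown above.

```python
# Child -> broader categories, encoding the two paths on the slide.
PARENTS: dict[str, list[str]] = {
    "Recreation": ["Top"],
    "Automotive": ["Recreation"],
    "Driving": ["Automotive"],
    "Public Safety": ["Social Problems"],
    "Traffic Hazards": ["Public Safety"],
    "Highways": ["Traffic Hazards"],
    "Road Rage": ["Driving", "Highways"],
}


def paths_to(category: str) -> list[list[str]]:
    """Return every root-to-category path (assumes no cycles)."""
    parents = PARENTS.get(category, [])
    if not parents:
        return [[category]]
    return [path + [category] for parent in parents for path in paths_to(parent)]


for path in paths_to("Road Rage"):
    print(" > ".join(path))
```

Each printed path supplies a different context for the same category, which is what lets a browser reach "Road Rage" from either recreation or public safety.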

  • Hierarchies, Categories, Relationships
    500 Science
      510 Mathematics
        512 Algebra, number theory
          512.3 Fields
            Class here field theory, Galois theory
            Class linear algebra in 512.5; class number theory in 512.7

  • Advantages of Category Schemes
    - Facilitate retrieval based on concepts, not simply keywords
    - Provide context for search terms (disambiguation)
    - Facilitate browsing & search refinement

  • Advantages & Disadvantages of Formal KO Schemes
    Advantages:
    - Bring like items together
    - Provide context & show relationships
    - Support browsing
    - May accommodate multilingual usage
    Disadvantages:
    - Reactive to emerging topics (new terms are added only after they appear)
    - Terminology may not match that of users
    - Not practical to apply to everything

  • Advantages & Disadvantages of Free Text
    Advantages:
    - Latest terminology
    - Application not an issue
    Disadvantages:
    - User must produce synonyms and relationships
    - Limited browsing
    - Little multilingual support

  • Other Solutions
    - Combine approaches
    - Map among KO schemes
    - Map free text terms to KO schemes
    - Produce supplemental browsable indexes from free text
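The last option, a supplemental browsable index, can be as simple as inverting automatically extracted phrases into an alphabetical phrase-to-documents list. A sketch, assuming a phrase extractor has already produced a list of phrases per document; the document IDs and phrases below are hypothetical, with phrases borrowed from the WordSmith examples.

```python
from collections import defaultdict


def browse_index(doc_phrases: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Invert document -> phrases into an alphabetical list of (phrase, documents)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, phrases in doc_phrases.items():
        for phrase in phrases:
            # Case-fold so "Family planning programs" and "family planning programs" merge.
            index[phrase.casefold()].add(doc_id)
    return [(phrase, sorted(docs)) for phrase, docs in sorted(index.items())]


# Hypothetical output of an automatic phrase extractor for three documents.
extracted = {
    "doc1": ["federal reserve board", "federal reserve"],
    "doc2": ["family planning programs", "federal reserve"],
    "doc3": ["fair housing act", "Family planning programs"],
}
for phrase, docs in browse_index(extracted):
    print(f"{phrase}: {', '.join(docs)}")
```

Such an index keeps the latest free-text terminology browsable without any controlled vocabulary, though it still leaves variants such as "programs"/"programmes" as separate entries.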


    From the Great Library of Alexandria (started around 300 B.C.; 500,000 scrolls) to present-day Web resources

    Hierarchical: meets the test of superordination, coordination, and subordination

    Animal husbandry (636) is part of two broader classes: Agriculture (630) and Technology (600). Under 636 are several subordinate classes, two of which contain dogs and cats. Dogs and cats are coordinate classes (the characteristic of division is domestic animals). Animal husbandry is superordinate to both.
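Within a single hierarchy, the superordination/coordination/subordination test in this note can be made mechanical. A sketch, assuming a simple child-to-parent map built from the classes named in the note (class numbers omitted); the function names are hypothetical.

```python
# Child -> parent links for the classes named in the note.
PARENT = {
    "Agriculture": "Technology",
    "Animal husbandry": "Agriculture",
    "Dogs": "Animal husbandry",
    "Cats": "Animal husbandry",
}


def ancestors(cls: str) -> list[str]:
    """Broader classes of `cls`, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def relation(a: str, b: str) -> str:
    """Describe how class `a` relates to class `b` in one hierarchy."""
    if a in ancestors(b):
        return "superordinate"
    if b in ancestors(a):
        return "subordinate"
    if a != b and PARENT.get(a) is not None and PARENT.get(a) == PARENT.get(b):
        return "coordinate"  # same immediate parent, i.e. same step of division
    return "unrelated or distant"


print(relation("Animal husbandry", "Dogs"))  # superordinate
print(relation("Dogs", "Cats"))              # coordinate
print(relation("Cats", "Agriculture"))       # subordinate
```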