Page 1: Taxonomies and Indexing: A Technical Strategy Diane Vizine-Goetz Office of Research OCLC Online Computer Library Center, Inc

Taxonomies and Indexing: A Technical Strategy

Diane Vizine-Goetz

Office of Research

OCLC Online Computer Library Center, Inc.


Context

Techniques and approaches developed by & for libraries and other institutions responsible for preserving the human record

Broad scope

Long tradition of information organization


Why organize information?

For:
• Search and retrieval
• Use
• Preservation & disposition


Why Organize Information by Subject?

Find information on a particular subject
• Only relevant information (precision)
• All relevant information (recall)

Find related information
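Precision ("only relevant information") and recall ("all relevant information") can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming a search result and a set of relevance judgments are both available as document identifiers; the function name and the identifiers d1 to d5 are hypothetical. Precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Hypothetical helper: return (precision, recall) for sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example: 4 documents retrieved, 3 judged relevant, 2 in common.
    p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d5"})
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Adding d5 to the retrieved set would raise recall to 1.0 while precision falls to 0.60, a small instance of the trade-off between finding only and finding all relevant information.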


How?

Subject analysis
• Conceptual analysis: determining what an information object is “about”
• Translating concepts into a knowledge organization (KO) scheme, e.g.,
   • Subject indexes
   • Thesauri
   • Classification schemes

Automated, semi-automated, or human/intellectual


Automation & Subject Analysis

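Subject analysis by approach (conceptual analysis / translation of concepts into a KO scheme)

Automated: Web search engine
• Conceptual analysis: automatic identification of key concepts (names, words, and phrases)
• Translation into KO scheme: NA

Automated: Web search engine + KO scheme
• Conceptual analysis: automatic identification of key concepts (names, words, and phrases)
• Translation into KO scheme: automatic translation to KO scheme

Semi-automated: machine-aided indexing/classification
• Conceptual analysis: automatic identification of key concepts
• Translation into KO scheme: human-controlled translation to KO scheme

Human/Intellectual: traditional indexing & library cataloging
• Conceptual analysis: human identification of key concepts (topics, names, time periods, forms/genre)
• Translation into KO scheme: human translation to KO scheme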


Automated Concept Identification

Automated Indexing
• Ranges from simply identifying words in a document to sophisticated analyses that identify key names, words, and phrases
• WordSmith Project http://orc.rsch.oclc.org:5061/
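The simpler end of this range can be sketched in a few lines. The code below is a naive illustration, not the WordSmith method: it assumes that candidate concepts are runs of up to three words that do not cross punctuation or a small stopword list, and it keeps only candidates that recur. The stopword list and function name are hypothetical; more sophisticated analyses typically add part-of-speech patterns, name recognition, and statistical weighting.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "in", "is", "of", "on", "or", "the", "to", "was", "were", "with"}


def candidate_phrases(text, max_len=3, min_count=2):
    """Hypothetical helper: count word n-grams (n <= max_len) that do not
    span punctuation or stopwords, keeping those seen at least min_count times."""
    counts = Counter()
    # Punctuation marks phrase boundaries, so treat each chunk separately.
    for chunk in re.split(r"[.,;:!?()\"]", text.lower()):
        run = []
        # A trailing None acts as a sentinel that flushes the last run.
        for word in re.findall(r"[a-z]+", chunk) + [None]:
            if word is not None and word not in STOPWORDS:
                run.append(word)
                continue
            # Stopword or end of chunk: emit every n-gram of the current run.
            for n in range(1, max_len + 1):
                for i in range(len(run) - n + 1):
                    counts[" ".join(run[i:i + n])] += 1
            run = []
    # Most frequent first, then alphabetical.
    return sorted(((p, c) for p, c in counts.items() if c >= min_count),
                  key=lambda pc: (-pc[1], pc[0]))


if __name__ == "__main__":
    news = ("The Federal Reserve Board raised rates. Federal Reserve Board officials met. "
            "Family planning programs and family planning services were debated.")
    for phrase, count in candidate_phrases(news):
        print(count, phrase)
```

With this sample text, "federal reserve board" and "family planning" recur and are kept, while "family planning programs" occurs once and is dropped by the `min_count` threshold.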

Automated Classification
• Automated assignment of documents to categories or classes
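Automated assignment to categories can be illustrated with a deliberately simple rule-based scorer. This is a sketch under stated assumptions, not a description of any particular system: the category names, keyword sets, and function name are hypothetical, and each document goes to the category whose keywords it mentions most often. Production classifiers are usually trained on documents that have already been assigned to classes.

```python
import re

# Hypothetical categories, each described by a handful of keywords.
CATEGORIES = {
    "Economics": {"bank", "deficit", "inflation", "rates", "reserve"},
    "Health": {"clinics", "contraception", "family", "fertility", "planning"},
}


def classify(text, categories=CATEGORIES):
    """Hypothetical helper: return the best-matching category, or None if no keyword occurs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Score = number of token occurrences found in the category's keyword set.
    scores = {name: sum(token in keywords for token in tokens)
              for name, keywords in categories.items()}
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(classify("The Federal Reserve Bank cut rates again."))  # Economics
    print(classify("Weather forecast for Tuesday"))                # None
```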


Political News Concepts Extracted by WordSmith

fair housing

fair housing act

family planning

family planning programmes

family planning programs

family planning services

federal government

federal government deficit

federal reserve

federal reserve bank

federal reserve board

federal reserve chairman alan greenspan

federal reserve system


Advantages of automatic concept identification

• Inexpensive
• Suitable for indexing/categorizing large quantities of text
• Can identify popular and emerging concepts and terminology


Why use knowledge organization schemes?

Knowledge organization schemes such as subject heading lists, thesauri, & classification schemes are specialized languages designed for retrieving information

Goal--to reduce ambiguities that cause precision & recall failures


Free text vs. controlled subject retrieval language

WordSmith

family planning

family planning programmes

family planning programs

family planning services

Library of Congress Subject Headings (LCSH)

Birth control clinics

UF Family planning services

Planned parenthood services

BT Clinics

19860211
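The UF (used for) references are what let a controlled vocabulary absorb free-text variants. The sketch below is an assumption-laden illustration, not how LCSH is distributed or searched: it builds a hypothetical lookup table from the Birth control clinics record shown above and tries to map the WordSmith phrases to it. Only exact, case-insensitive matches on a preferred or UF term succeed; the function and variable names are hypothetical.

```python
def build_entry_index(records):
    """Hypothetical helper: map every preferred and UF term (lowercased)
    to its preferred heading."""
    index = {}
    for preferred, used_for in records.items():
        index[preferred.lower()] = preferred
        for variant in used_for:
            index[variant.lower()] = preferred
    return index


# The LCSH record from this slide: preferred heading -> UF terms.
LCSH_RECORDS = {
    "Birth control clinics": ["Family planning services", "Planned parenthood services"],
}

if __name__ == "__main__":
    index = build_entry_index(LCSH_RECORDS)
    wordsmith_terms = ["family planning", "family planning programmes",
                       "family planning programs", "family planning services"]
    for term in wordsmith_terms:
        # dict.get returns None for free-text terms with no entry in the vocabulary.
        print(f"{term!r} -> {index.get(term.lower())}")
```

Three of the four phrases find no match in this single record; "family planning" appears instead as a UF reference under the broader LCSH heading Birth control.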


MeSH Heading vs. LCSH

Family Planning

Note: Programs or services designed to assist the family in controlling reproduction by either improving or diminishing fertility.

Entry Term Birth Control Planned Parenthood Basal Body Temperature Method Birth Limiting Births Averted Family Planning Surveys ...

Birth control (19880919)

UF Family planning; Planned parenthood; Population control; Pregnancy--Prevention

BT Hygiene, Sexual; Sexual ethics

RT Contraception; Family size

NT Abortion; Birth Intervals; Childlessness; ...


Characteristics of subject retrieval languages

Terminology is often domain-specific
• Medicine > MeSH; Engineering > INSPEC; Agriculture > Agrovoc

Control vocabulary (synonyms & homonyms)

Express relationships between terms


Within a domain, terms are context independent

Ei Thesaurus™

Bank protection

UF Coastal engineering--Bank protection; Inland waterways--Bank protection

SN Protection of river banks and lake shores. For seacoasts, use SHORE PROTECTION

DT January 1993

BT Protection

RT Banks (bodies of water); Coastal engineering; Environmental engineering; Erosion; Inland waterways; River control; Shore protection; Slope protection; Soil conservation

MC 407.2; 407.3 OC 914.1


Controlled Vocabulary

Preferred way of expressing a concept, e.g., popular vs. technical

• Heart attack vs. Myocardial infarction

Non-used vocabulary often included:

Synonyms
• Current/Outdated terms > Disabled/Handicapped

Lexical variants
• Phrase/Inverted forms > Bilingual education/Education, Bilingual

Quasi-synonyms
• Synonyms/Antonyms > Literacy/Illiteracy


Relationships

Equivalence
• Synonymous terms

Hierarchy
• Generic relationship (kind)
• Whole-part relationship
• Instance relationship (example)

Association
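The three relationship types can be held in a small data structure. This is a minimal sketch: the class and method names are hypothetical, and the sample entries come from the Ei Thesaurus record for Bank protection. Each relationship is stored in both directions (USE/UF, BT/NT, and symmetric RT), in keeping with the reciprocal relationships that thesaurus guidelines such as ANSI/NISO Z39.19 describe. The sketch does not distinguish generic, whole-part, and instance hierarchy; a fuller model might tag each BT/NT link with its type.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical minimal thesaurus that stores each relationship reciprocally."""

    def __init__(self):
        self.use = {}               # non-preferred term -> preferred term (USE)
        self.uf = defaultdict(set)  # preferred term -> non-preferred terms (UF)
        self.bt = defaultdict(set)  # term -> broader terms (BT)
        self.nt = defaultdict(set)  # term -> narrower terms (NT)
        self.rt = defaultdict(set)  # term -> related terms (RT)

    def add_equivalence(self, preferred, non_preferred):
        self.use[non_preferred] = preferred
        self.uf[preferred].add(non_preferred)

    def add_hierarchy(self, broader, narrower):
        self.bt[narrower].add(broader)
        self.nt[broader].add(narrower)

    def add_association(self, a, b):
        # Association is symmetric, so record it on both terms.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def preferred(self, term):
        """Resolve a term to its preferred form (unchanged if already preferred)."""
        return self.use.get(term, term)


if __name__ == "__main__":
    t = Thesaurus()
    # Entries from the Ei Thesaurus record for "Bank protection".
    t.add_equivalence("Bank protection", "Inland waterways--Bank protection")
    t.add_hierarchy("Protection", "Bank protection")
    for related in ["Erosion", "Shore protection", "River control"]:
        t.add_association("Bank protection", related)
    print(t.preferred("Inland waterways--Bank protection"))  # Bank protection
    print(sorted(t.nt["Protection"]))                         # ['Bank protection']
    print(sorted(t.rt["Erosion"]))                            # ['Bank protection']
```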


Subject Retrieval using a controlled vocabulary


Related Terms in LCSH


Classification / Categorization System

A systematic arrangement of knowledge into useful categories

General schemes & special schemes
• General: DDC, LCC, UDC
• Special: AGRIS, MSC

Present a generalized view of knowledge at varying levels of depth

May be enumerative or synthetic


Some Advantages of Traditional Schemes

• Meaningful notation
• Well-developed hierarchies
• Well-defined categories
• Rich network of relationships


Meaningful Notation (DDC)

005.1 Programming

005.1 Programmation

005.1 Программирование

005.1 Programación


DDC Notation Indicates Hierarchy

600 Technology

630 Agriculture

633 Field and plantation crops

633.1 Cereals

633.11 Wheat

633.12 Buckwheat

633.13 Oats
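Because the notation encodes the hierarchy, broader classes can often be derived mechanically. The sketch below assumes the simple pattern on this slide holds: drop a trailing decimal digit; otherwise replace the last non-zero digit after the first in the three-digit base with 0. DDC notation does not always express hierarchy this way (centered entries, for example, span ranges of numbers), so this is an illustration rather than a reliable DDC parser. The function names are hypothetical; the captions come from the slide.

```python
# Captions from the slide (English edition).
CAPTIONS = {
    "600": "Technology",
    "630": "Agriculture",
    "633": "Field and plantation crops",
    "633.1": "Cereals",
    "633.11": "Wheat",
    "633.12": "Buckwheat",
    "633.13": "Oats",
}


def broader(notation):
    """Hypothetical helper: next broader number under the simple truncation rule, or None."""
    if "." in notation:
        base, decimals = notation.split(".")
        decimals = decimals[:-1]  # drop the last decimal digit
        return f"{base}.{decimals}" if decimals else base
    # Three-digit base: set the last non-zero digit (after the first) to 0.
    if notation[2] != "0":
        return notation[:2] + "0"
    if notation[1] != "0":
        return notation[0] + "00"
    return None  # a main class such as 600 has no broader class


def ancestors(notation):
    """Hypothetical helper: list every broader class, nearest first."""
    chain = []
    current = broader(notation)
    while current is not None:
        chain.append(current)
        current = broader(current)
    return chain


if __name__ == "__main__":
    for number in ["633.11"] + ancestors("633.11"):
        print(number, CAPTIONS.get(number, "?"))
```

Run as a script, this walks from 633.11 Wheat up through 633.1 Cereals, 633, and 630 to 600 Technology, reproducing the chain on the slide.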


Well-developed Hierarchies


Hierarchies & Categories

Hierarchical from general to specific

Categories have superordinate, coordinate, and subordinate relationships in the hierarchy

Subcategories must be mutually exclusive


Hierarchies & Categories

Top > Recreation > Automotive > Driving > Road Rage

Social Problems > Public Safety > Traffic Hazards > Highways > Road Rage


Hierarchies, Categories, Relationships

500 Science
510 Mathematics
512 Algebra, number theory
512.3 Fields
   Class here field theory, Galois theory
   Class linear algebra in 512.5; class number theory in 512.7


Advantages of Category Schemes

Facilitate retrieval based on concepts not simply keywords

Provide context for search terms (disambiguates)

Facilitate browsing & search refinement


Advantages & Disadvantages of Formal KO Schemes

+ Bring like items together
+ Provide context & show relationships
+ Support browsing
+ May accommodate multilingual usage

- Reactive to emerging topics
- Terminology may not match users' vocabulary
- Not practical to apply to everything


Advantages & Disadvantages of Free Text

+ Latest terminology
+ Application not an issue

- User must produce synonyms and relationships
- Limited browsing
- Little multilingual support


Other Solutions

Combine approaches
• Map among KO schemes
• Map free text terms to KO schemes
• Produce supplemental browsable indexes from free text
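A supplemental browsable index from free text can be as simple as grouping extracted phrases under their first word, much like the alphabetical WordSmith list of political news concepts. This sketch assumes the phrases have already been extracted; the function name is hypothetical, and grouping by leading word is an illustrative choice, not a documented WordSmith feature.

```python
from collections import defaultdict


def browse_index(phrases):
    """Hypothetical helper: group phrases alphabetically under their lowercased first word."""
    groups = defaultdict(list)
    # Deduplicate, skip blank entries, and sort case-insensitively.
    cleaned = {p.strip() for p in phrases if p.strip()}
    for phrase in sorted(cleaned, key=str.lower):
        groups[phrase.split()[0].lower()].append(phrase)
    # Return the groups in alphabetical order of their heading word.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    extracted = ["federal reserve board", "family planning", "fair housing act",
                 "federal reserve", "fair housing", "family planning services"]
    for head, entries in browse_index(extracted).items():
        print(head)
        for entry in entries:
            print("   ", entry)
```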


Resources

ANSI/NISO Z39.19-1993 (Revision of ANSI Z39.19-1980) Guidelines for the Construction, Format, and Management of Monolingual Thesauri <http://www.niso.org/stantech.html#z3919>

Controlled vocabularies, thesauri and classification systems available in the WWW. DC Subject <http://www.lub.lu.se/metadata/subject-help.html>

The Intellectual Foundation of Information Organization by Elaine Svenonius. MIT Press; ISBN: 0262194333

List of Web Subject Resources <http://www.loc.gov/catdir/pcc/saco/resources.html>

The Organization of Information (Library and Information Science Text Series) by Arlene G. Taylor. Libraries Unlimited; ISBN: 1563084988

Resources for Indexers <http://www.asindexing.org/asires.shtml>

