Copyright © 2006 Access Innovations, Inc.
Building Taxonomies, Part 4
Alice Redmond-Neal, Access Innovations, Inc.
Enterprise Search Summit, New York City, May 21, 2006
Evaluating terms
• Do terms represent all necessary concepts?
  – Gap analysis
• Do terms capture necessary details?
  – Level of granularity
• Are terms understood by users?
  – Domain expert vs. common user
Talk about terms
• Term format
• Grammatical issues
• Singular and plural forms
• Spelling
• Abbreviations and acronyms
• Capitalization
• Other punctuation
• Consistency
Term format
• KISS – Keep it short and simple
  – 1-2-3 words
• Effect on search
• Factoring, postcoordination (coming)
• Grammatical issues
  – Nouns and noun phrases
  – Verbish things
  – Adjectives
  – Adverbs
  – Initial articles
Most terms are nouns
• Nouns or simple noun phrases (phrase = compound or bound term)
  – Adj + Noun – Art history (ANSI/NISO standard)
  – Noun + Prep + Noun – History of art (ISO standard)
  – Exceptions – Burden of proof, Coats of arms, Prisoners of war, Birds of prey, etc.
Other parts of speech
• Verbs
  – Gerund form: Fishing
• Adjectives
  – Not used in isolation
  – Very rare (lots in Art & Architecture Thesaurus)
  – OK when combined with another term – Dental bridges
• Adverbs
  – No, except as part of proper name – Very Large Array
• Articles
  – No, except as part of proper name – El Salvador, Le Mans
Singular and plural forms
• Plural form for count nouns
  – “how many” clouds, animals, highways
• Singular form for mass nouns
  – “how much” security, oxygen, rain
• Exceptions
  – Body parts in medicine singular (heart, foot)
  – Unique entities singular (Brooklyn Bridge)
  – User warrant plural/singular (fishes)
stocks? fishes? monies?
Term spelling
• Preferred spelling depends on audience
  – Multinational company may need alternative spellings in same taxonomy
• Use most widely accepted spelling
• Use secondary spelling as NonPreferred Term (synonym)
• Exception:
  – Proper names – Labour Party
Abbreviations and acronyms
• Use only when full form is rarely seen
  – SCUBA, LASER, DNA, LASIK
• Use full form if abbreviation is not widely used and understood
  – Automated teller machines – for ATM
  – Driving while intoxicated – for DWI
• Alternative becomes NonPreferred Term
• Use and acceptance always shifting
• Be consistent
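The preferred/NonPreferred arrangement for abbreviations can be sketched as a simple lookup. This is only an illustration, not the interface of any particular thesaurus tool: the names `USE_REFERENCES`, `PREFERRED_TERMS`, and `resolve` are hypothetical, and the entries are the examples from the slide.

```python
from typing import Optional

# Hypothetical table: NonPreferred Term -> preferred term.
# Keys are lower-cased so that matching ignores capitalization.
USE_REFERENCES = {
    "atm": "Automated teller machines",
    "dwi": "Driving while intoxicated",
}

# Preferred terms, including abbreviations whose full form is rarely seen.
PREFERRED_TERMS = {
    "Automated teller machines",
    "Driving while intoxicated",
    "SCUBA",
    "LASER",
    "DNA",
    "LASIK",
}


def resolve(entry: str) -> Optional[str]:
    """Return the preferred term for a user entry, or None if it is unknown."""
    key = entry.strip().lower()
    if key in USE_REFERENCES:
        return USE_REFERENCES[key]
    for term in PREFERRED_TERMS:
        if term.lower() == key:
            return term
    return None
```

Under these assumptions, `resolve("ATM")` leads to the full form, while `resolve("dna")` returns the abbreviation itself because that is the preferred form.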
Capitalization
• Standards: use all lower case
  – Exceptions:
    • Initialisms – DNA
    • Proper names – Queen Mary
    • Trade names – Thesaurus Master™
    • Taxonomic names – Homo sapiens
• Much variation in practice
Parentheses
• Use only for
  – Parenthetical qualifiers to disambiguate homographs
    • Bridges (Dentistry), Bridges (Roadways), Bridges (Music)
  – Different meanings for singular / plural word forms
    • Bridges [all the above] vs. Bridge (Card game)
    • Wood (Material) vs. Woods (Forest)
    • Damage (Injury) vs. Damages (Law)
  – Facet indicators – Paint (by finish)
  – Part of the term – benzo(a)pyrene
  – Trademark indicator (tm) becomes ™
Hyphens
• Generally avoid – nonfiction
• Use only if
  – Omitting the hyphen would be ambiguous
    • cocitation vs. co-occurrence
  – The hyphen is part of the term
    • n-body problem
    • p-benzoquinone
    • CD-ROM
Other punctuation bits
• Apostrophes
  – Keep for possessive case
• Diacritical marks
  – Keep if possible – Québec
• Other random marks
  – Keep if part of a proper name – A&W Root Beer, Standard & Poor's
Compound terms (aka bound terms) and factored terms
• Term consisting of more than one word that represents a single concept
• Keep compound term or factor out (split)?
Compound terms are precoordinated
• Elements are bound together to specify a concept at the indexing stage
• Can’t change the parts
  – Water pollution
  – Library science
  – Television influence on preschoolers
  – Chicken dinner with turnips and rutabagas – no substitutions of menu items!
Factored terms can be postcoordinated
• Elements can be strung together to specify a concept at the search stage
• Elements can be mixed and combined as needed
  – Few clothing pieces → several outfits
• The sum of the elements reflects the concept (usually)
To factor or not to factor
Is each factor a single concept?
Is each factor in your thesaurus?
If YES, break term down to factors:
  California highway construction → California + Highways + Construction
If NO, or if factoring would be confusing, retain the compound term:
  Children’s television → Television + Children ??
  Science library → Library + Science ??
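The two questions above can be sketched as a small decision routine. It is an illustration only: whether a factor is "a single concept" and whether splitting "would be confusing" remain editorial judgments, represented here by a hand-maintained `RETAIN` list. The names `THESAURUS`, `RETAIN`, and `factor` are hypothetical, and the word-to-term mapping is an assumption made so the slide's example works.

```python
# Hypothetical thesaurus: normalized word form -> preferred term.
THESAURUS = {
    "california": "California",
    "highway": "Highways",
    "construction": "Construction",
    "television": "Television",
    "children": "Children",
    "library": "Libraries",
    "science": "Science",
}

# Compound terms the editors chose to retain because splitting would confuse.
RETAIN = {"children's television", "science library"}


def factor(term):
    """Return the list of factors, or [term] if the compound should be kept."""
    if term.lower() in RETAIN:
        return [term]
    factors = []
    for word in term.lower().split():
        if word not in THESAURUS:
            # A factor is missing from the thesaurus: keep the compound term.
            return [term]
        factors.append(THESAURUS[word])
    return factors
```

With these assumptions, `factor("California highway construction")` yields the three factors, while `factor("Science library")` returns the compound unchanged even though both of its words are in the thesaurus.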
Precoordination positives
• User expectations – Rapid transit
  – Occurs commonly in data
  – Splitting would be odd
  – Reflects a single concept for the audience
• Better accuracy – captures specific concepts precisely
• Fewer false drops
• Term information is retained (Related Terms, NonPreferred Terms, Scope Notes, …)
Precoordination negatives
• Poorer total recall
• Term proliferation
  – Combinations and permutations increase thesaurus size
• Higher cost
• Limited flexibility in expressing new concepts
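A small count shows how quickly term proliferation sets in. The three lists below are hypothetical (only California, Highways, and Construction appear in the slides), and joining the words with spaces is just a stand-in for real compound terms.

```python
from itertools import product

# Hypothetical facet values, three per facet.
places = ["California", "Nevada", "Oregon"]
structures = ["Highways", "Bridges", "Tunnels"]
activities = ["Construction", "Maintenance", "Inspection"]

# Precoordinated: one compound term per combination of the three facets.
precoordinated = [" ".join(combo) for combo in product(places, structures, activities)]

# Factored: each element is listed once and combined at search time.
factored = places + structures + activities

print(len(precoordinated))  # 27
print(len(factored))        # 9
```

With three values per facet the compound list is already three times the size of the factored list, and the gap widens with every value added.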
Postcoordination pros and cons
✓ Higher recall
✓ Lower cost
✓ Greater flexibility – enables expression of new concepts through novel combinations
x Lower accuracy, some false drops
  – Library science NOT = Library + Science
  – Art museums NOT = Art + Museums
• Postcoordination is implicit in most online searches (implied AND between search words)
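The implied AND, and the false drops it allows, can be sketched with a toy index. Everything here is hypothetical: the document ids, the assigned terms, and the `search` function are made up for illustration.

```python
# Hypothetical index: document id -> factored terms assigned by the indexer.
INDEX = {
    "doc1": {"Library", "Science"},   # about library science (the discipline)
    "doc2": {"Library", "Science"},   # about a science library (a building)
    "doc3": {"Art", "Museums"},
    "doc4": {"Library"},
}


def search(*terms):
    """Implied AND: return ids of documents indexed with every search term."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in INDEX.items() if wanted <= assigned)
```

Here `search("Library", "Science")` returns both doc1 and doc2. The factored index cannot tell them apart, so for someone looking for library science, doc2 is a false drop.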
About “and”
• Avoid “and” in terms – not a single concept
  – Instead of: Children and television
  – Factor and postcoordinate
  – USE Media influence + Television + Children
• “and” OK when both elements are members of a broader class
  – Ships and boats (broader class: Vessels)
• Your need for granularity may dictate your choice
So far you’ve got
• Hierarchy
• Complete term records
  – Broader and Narrower Terms
    • Polyhierarchies when needed
  – Preferred/NonPreferred Terms (equivalence relationships)
  – Related Terms (associative relationships)
  – Scope Notes
  – Correct term format
  – Compound terms when needed
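A complete term record might be modelled along these lines. This is a sketch under assumptions: the class name, field names, and the sample relationships are illustrative and not taken from any specific thesaurus software.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One thesaurus term with the relationships listed above."""
    term: str
    broader: list = field(default_factory=list)        # more than one = polyhierarchy
    narrower: list = field(default_factory=list)
    non_preferred: list = field(default_factory=list)  # equivalence relationships
    related: list = field(default_factory=list)        # associative relationships
    scope_note: str = ""

    def is_polyhierarchical(self):
        """True when the term sits under more than one Broader Term."""
        return len(self.broader) > 1


# Illustrative record; the relationships are assumptions for the example.
water_pollution = TermRecord(
    term="Water pollution",
    broader=["Pollution", "Water quality"],
    non_preferred=["Water contamination"],
    related=["Wastewater treatment"],
    scope_note="Contamination of bodies of water; hypothetical note.",
)
```

Using `default_factory=list` gives each record its own empty lists, so adding a relationship to one term never leaks into another.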
Notation
• Symbols (numbers, letters, hyphens, colons…)
  – 1: Apples
    • 1.1: Granny Smith
    • 1.2: Winesap
• Another kind of ordering (non-alphabetic)
  – Chronological, positional, numeric sequence, or other logical sequence for user group
  – Same terms presented differently
  – Different user groups, different purposes
• Adjunct to verbal expression of term
• Secondary to verbal concept organization
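The difference between notation order and alphabetical order can be shown with a short sketch. The apple entries come from the slide; the entries numbered 2 and 10 and the names `TERMS` and `notation_key` are hypothetical additions, included only to show why the notation should be compared numerically.

```python
# Notation -> term. "2" and "10" are made-up entries for the example.
TERMS = {
    "1": "Apples",
    "1.1": "Granny Smith",
    "1.2": "Winesap",
    "2": "Pears",
    "10": "Quinces",
}


def notation_key(notation):
    """Turn '1.2' into (1, 2) so that '10' sorts after '2'."""
    return tuple(int(part) for part in notation.split("."))


# Same terms, presented two ways.
by_notation = [TERMS[n] for n in sorted(TERMS, key=notation_key)]
alphabetical = sorted(TERMS.values())
```

Sorting the notation as plain text would place "10" before "2", which is why the key converts each part to an integer.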
Automatic taxonomy construction
• Words and phrases from documents
• Based on frequency and co-occurrence of words
• No semantic analysis
• Produces list of possible terms
• Requires editorial analysis
  – hierarchical and conceptual organization
  – association of related concepts
  – identifying and deduplicating equivalent concepts
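The frequency and co-occurrence step can be sketched in a few lines. This is a toy illustration, not how any specific product works: the stop list, the sample sentences, and the function name `candidate_terms` are assumptions. As the slide says, the output is only a list of possible terms that still needs editorial analysis.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny stop list for the example; a real one would be much longer.
STOP_WORDS = {"the", "of", "and", "in", "a", "is", "to", "from"}


def candidate_terms(documents):
    """Count word frequency and within-document co-occurrence of word pairs."""
    word_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
        word_counts.update(words)
        # Each unordered pair is counted once per document it appears in.
        pair_counts.update(combinations(sorted(set(words)), 2))
    return word_counts, pair_counts


docs = [
    "Water pollution in the river",
    "Sources of water pollution",
    "Air pollution and health",
]
words, pairs = candidate_terms(docs)
```

In this sample, "pollution" is the most frequent word and the pair ("pollution", "water") co-occurs in two documents, which would make "water pollution" a candidate for an editor to review.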
Review, edit, test, edit, use, edit, and maintain, i.e. edit
• Review
  – Users
  – Expert reviewers
• Test
  – Index 500+ documents (more for variable writing style; fewer for strict style)
  – Monitor search log
• Edit and maintain
  – Add term
  – Change existing term
  – Change term status
  – Delete term
  – Add term relationship
  – Delete term relationship
  – Add/modify Scope Note
  – Change overall structure
• Consider machine automated / assisted indexing software