Making Decisions in Creating Taxonomies



DESCRIPTION

Taxonomy Boot Camp conference presentation 2007

TRANSCRIPT

Page 1: Making Decisions in Creating Taxonomies

November 8, 2007

Copyright © 2007 Viziant Corporation. All Rights Reserved. Proprietary & Confidential.

Making Decisions in Creating Taxonomies

Heather Hedden
Information Taxonomist, Viziant Corporation

Page 2: Making Decisions in Creating Taxonomies


Background

• Heather Hedden’s taxonomy development experience
  – Controlled vocabularies for periodical index databases (Gale)
  – Matching of controlled vocabulary to keywords for consumer products/services directories (various “yellow pages” clients)
  – Enterprise taxonomies for corporate web sites and intranets (Earley & Associates)
  – Base and custom taxonomies integrated within a knowledge discovery and data mining product (Viziant)

• Viziant Corporation
  – A provider of information access and intelligence systems for enterprises and government

Page 3: Making Decisions in Creating Taxonomies


Decisions for the Taxonomist

• Decisions of the taxonomy owner
  – Approximate number of top-level nodes and number of levels
  – Structure: primarily facets or tree
  – Interface design: number and layout of displayed nodes
  – Presence of polyhierarchies
  – Automated search & retrieval or human indexing/tagging

• Decisions often left to the taxonomist
  – Exact/final number of levels, nodes per level
  – Arrangement of the node hierarchy, placement within facets
  – Degree of term pre- or post-coordination
  – Extent of use of variants/cross-references

Page 4: Making Decisions in Creating Taxonomies


Number of levels, nodes per level

• 3 levels and 6-8 nodes per level is a nice ideal
  – Web site/intranet menu navigation
    • Menu is confined to a bar across the top or a margin to the side
    • Menus pull down or topic trees expand in place

• More levels and nodes per level are often needed
  – Content management/document retrieval for large content repositories
    • Industries, products, fields of science, diseases, geographies, named entities

• Decision: Make more levels or make more nodes per level

Page 5: Making Decisions in Creating Taxonomies


Number of levels, nodes per level: Examples

Deep: Many levels

Geographies
- North America
-- United States
--- New England
---- Massachusetts
----- Boston
------ North End
------- Old North Church
- South America
- Europe
- Asia
-- Central Asia
-- Middle East
-- South Asia
-- Southeast Asia
- Africa
- Oceania

Broad: Many nodes per level

Geographies
- U.S. cities
-- Albuquerque
- U.S. states
-- Alabama
- Countries
- World cities
- Continents
- Landmarks
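The trade-off between the deep and broad geography examples above can be made concrete by measuring a tree's depth and its widest level. The following is a minimal sketch only: the nested-dict representation and the function names `depth` and `max_breadth` are assumptions made for illustration, not part of the presentation, and the trees contain only the sample nodes shown on the slide.

```python
# Hypothetical representation: each node name maps to a dict of its child nodes.
deep = {
    "North America": {
        "United States": {
            "New England": {
                "Massachusetts": {
                    "Boston": {"North End": {"Old North Church": {}}}
                }
            }
        }
    },
    "Asia": {
        "Central Asia": {},
        "Middle East": {},
        "South Asia": {},
        "Southeast Asia": {},
    },
}

broad = {
    "U.S. cities": {"Albuquerque": {}},
    "U.S. states": {"Alabama": {}},
    "Countries": {},
    "World cities": {},
    "Continents": {},
    "Landmarks": {},
}


def depth(tree):
    """Number of levels below the root (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def max_breadth(tree):
    """Largest number of sibling nodes found under any single parent."""
    if not tree:
        return 0
    return max(len(tree), max(max_breadth(c) for c in tree.values()))


print(depth(deep), max_breadth(deep))    # levels vs. widest level, deep sample
print(depth(broad), max_breadth(broad))  # levels vs. widest level, broad sample
```

Numbers like these can be compared against what the display interface can show in one view and how many clicks the intended users will tolerate.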

Page 6: Making Decisions in Creating Taxonomies


Number of levels, nodes per level: Examples

Deep: Many levels (SIC, NAICS style with 10-20 upper-level nodes)

Industries
- Transportation services
-- Air transportation
--- Scheduled air transportation services
---- Scheduled air freight transportation services

Broad: Many nodes per level (job search sites, 50-80 nodes per level)

Industries (second levels at select nodes only: Healthcare, Sales)
- Accounting/Auditing
- Administrative Support Services
- Advertising/Marketing/Public Relations
- Aerospace/Aviation/Defense
- Agriculture, Forestry, & Fishing
- Airlines
- etc.

Page 7: Making Decisions in Creating Taxonomies


Number of levels, nodes per level

• Decision factors
  – Display interface/horizontal and vertical real estate
  – Speed of displaying deeper levels
  – User market, needs, and expectations
    • Industry experts, internal employees, general public, students, etc.

• Need to balance how much can be easily skimmed in one view vs. how many levels the user has the patience to click down through

• More levels lead to less consistency across levels.

Page 8: Making Decisions in Creating Taxonomies


Arrangement of node hierarchy

• Decision: What’s the best method to handle different means of classification within the same hierarchy?
  – Industries by traditional SIC/NAICS classification or by vertical market
  – Products by manufacturing technology or by end-use
  – Places by physical geographic location or by type
  – Organizations by goals/objectives or by political/religious affiliation
  – Government agencies by type or by country/state of affiliation

• Even within facets, there often are hierarchies.

• Even allowing polyhierarchies, a top-level classification is needed, and too many polyhierarchies can be confusing.

Page 9: Making Decisions in Creating Taxonomies


Arrangement of node hierarchy: Examples

1. Governmental bodies & agencies
- U.S. governmental bodies & agencies
-- U.S. Courts
-- U.S. executive branch agencies
-- U.S. legislative branch
-- State bodies & agencies
- Foreign governmental bodies & agencies
-- Foreign courts
-- Foreign legislatures
-- Foreign national agencies
-- Foreign state & provincial government agencies

2. Governmental bodies & agencies
-- Foreign legislatures (+ instances)
-- U.S. legislatures (+ U.S. federal and state instances)

3. Governmental bodies & agencies
- Legislative bodies
-- National legislatures (+ instances, both foreign and U.S.)
-- State & provincial legislatures (+ all instances alphabetical for U.S. and foreign)

4. Governmental bodies & agencies
- Legislative bodies (+ all instances, U.S. and foreign, in one alphabetical list)

Page 10: Making Decisions in Creating Taxonomies


Arrangement of node hierarchy

• Decision: When linking named entities to topical subjects, should each link at the lowest node level possible, or should all of them be grouped together at a higher level?

• Example: Link specific churches at the broader term, Churches (denominations), at the appropriate narrower term, or at both

Churches (denominations)
- Catholic churches
- Orthodox churches
- Protestant churches

Does the user know where to look for the Maronite Church or the Assyrian Church of the East?

Page 11: Making Decisions in Creating Taxonomies


Arrangement of node hierarchy

• Decision factors:
  – Knowledge of users as to where to categorize an entity
  – Likelihood of users to browse rather than search for entities
  – Existence of entities that don’t belong in a subcategory
  – Purpose to teach users (students) where entities belong

• Linking entities at both the specific and the broader level makes them easier to find, but it clutters the taxonomy, slows performance, and may not seem logical at first to the user
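The effect of linking an entity at both the specific and the broader level can be sketched with a simple mapping from entity to the nodes it is linked at. This is an illustrative assumption only: the `links` structure and the `entities_at` helper are hypothetical, and the particular link assignments are one possible choice rather than the presenter's.

```python
# Hypothetical entity-to-node links (assignments chosen for illustration).
links = {
    # Linked at both the narrower and the broader node.
    "Maronite Church": ["Catholic churches", "Churches (denominations)"],
    # Fits none of the three narrower nodes, so linked at the broader node only.
    "Assyrian Church of the East": ["Churches (denominations)"],
}


def entities_at(node):
    """Return the entities a user sees when browsing the given node."""
    return sorted(entity for entity, nodes in links.items() if node in nodes)


print(entities_at("Churches (denominations)"))  # both entities appear here
print(entities_at("Catholic churches"))         # only the narrower-linked one
```

A user who does not know the denomination can still find both entities at the broader node, at the cost of a longer list there.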

Page 12: Making Decisions in Creating Taxonomies


Arrangement of node hierarchy

• Decision factors
  – User market, needs, and expectations
    • How the users classify the subject matter
    • Whether a topic is even likely to be browsed for in the taxonomy or rather entered in the search box
  – Support for polyhierarchies
  – Permissibility of nodes as category labels, not linked to content, at various intermediate levels within the hierarchy
    • e.g. Foreign legislatures

• Need to consider
  – Whether to create nodes that are difficult to distinguish in indexing
    • e.g. both Legislative bodies and National legislatures

Page 13: Making Decisions in Creating Taxonomies


Placement within facets

• Facets may be determined by the taxonomy owner, but placement of certain nodes may not be obvious
  – Institutions could be Places or Organizations
    • Places of worship, educational institutions, museums, libraries
  – Business activities could be Actions or Topics
    • Acquisitions, Contracts, Joint ventures, Sales

• Decisions:
  – In which facet to put these nodes
  – Whether two (parenthetically modified) nodes for the concept should be created, one for each facet, e.g. Hotels (buildings) and Hotels (companies)
  – Or whether a node can be in more than one facet

Page 14: Making Decisions in Creating Taxonomies


Placement within facets

• Decision factors
  – System support for two occurrences of the same-named node
  – Automated or manual indexing
    • Automated indexing may not distinguish between different facet-meanings of a term: action or topic, place or organization, etc.

Page 15: Making Decisions in Creating Taxonomies


Term pre-coordination or post-coordination

• Hierarchical trees or thesauri serve pre-coordination
  – User browses for the most specific concept

• Facets serve post-coordination
  – User chooses a combination of concepts from multiple facets (e.g. place, product type, usage issue, customer type)

• But topic trees/thesauri may be used within a UI supporting multiple search terms (to narrow a search)

• But hierarchies can exist within facets

• Decisions:
  – In a topic tree/thesaurus, whether to expect post-coordination
  – In a faceted taxonomy, whether and how much to have pre-coordination

Page 16: Making Decisions in Creating Taxonomies


Term pre-coordination or post-coordination

• Place and Topic facets
  – Russian foreign policy or Russia and Foreign policy
  – French embassies or France and Embassies
  – United States-Canadian relations

• Ethnicity and Occupation facets
  – Hispanic writers or Hispanics and Writers

• Body part and Disease facets
  – Ovarian cancer or Ovaries and Cancer

• Business action and Product facets
  – Drug trials or Product testing and Drugs
  – CRM Software or Customer Relations Management and Software/Marketing software
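The difference between the two approaches in the examples above can be sketched as two ways of tagging and retrieving the same documents. This is a minimal illustration under stated assumptions: the document IDs, their tag assignments, the term "Russian embassies", and the functions `search_post` and `search_pre` are all hypothetical and not taken from the presentation.

```python
# Hypothetical indexes; document IDs and tag assignments are invented.
post_coordinated = {
    "doc1": {"Place": {"Russia"}, "Topic": {"Foreign policy"}},
    "doc2": {"Place": {"Russia"}, "Topic": {"Embassies"}},
    "doc3": {"Place": {"France"}, "Topic": {"Embassies"}},
}

pre_coordinated = {
    "doc1": {"Russian foreign policy"},
    "doc2": {"Russian embassies"},  # illustrative term, not from the slides
    "doc3": {"French embassies"},
}


def search_post(index, **facet_values):
    """Post-coordination: the user combines one value per chosen facet."""
    return sorted(
        doc_id
        for doc_id, facets in index.items()
        if all(
            value in facets.get(facet, set())
            for facet, value in facet_values.items()
        )
    )


def search_pre(index, term):
    """Pre-coordination: the user picks one already-combined term."""
    return sorted(doc_id for doc_id, terms in index.items() if term in terms)


print(search_post(post_coordinated, Place="Russia", Topic="Foreign policy"))
print(search_post(post_coordinated, Topic="Embassies"))
print(search_pre(pre_coordinated, "French embassies"))
```

In the post-coordinated index a single facet value such as Embassies retrieves across all places, while the pre-coordinated index needs one node per combination.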

Page 17: Making Decisions in Creating Taxonomies


Term pre-coordination or post-coordination

• Decision factors
  – Human or automated indexing/tagging
    • If human indexing, all could be post-coordinated
  – Keyword searching or taxonomy browse
    • If keyword searching, pre-coordinated is preferred
  – Nature and volume of content
    • Specific content serves narrower pre-coordinated subjects
  – Scope of the content
    • Wide range of articles is better served by pre-coordination

Page 18: Making Decisions in Creating Taxonomies


Term pre-coordination or post-coordination

• Advantages of pre-coordinated terms
  – Provide more precise retrieval results, if used correctly
  – Better suited for specific, custom taxonomies
  – Better suited for phrase (search string) searching

• Disadvantages of pre-coordinated terms
  – Narrower nodes might be overlooked by the user.
  – More complex to index correctly.

• Flexibility in degree of pre- or post-coordination is OK, but consistency of application makes the taxonomy more usable.

Page 19: Making Decisions in Creating Taxonomies


Variants and cross-references

• Variants, Nonpreferred terms, Nonpostable terms, Equivalent terms, See references, Cross-references, Keywords

• First, take into consideration:
  – Human or automated indexing/tagging
  – Automated stemming
  – Taxonomy browse, search, or both. If both, which is dominant
  – Content from divergent sources, countries
  – System/UI support for a variant pointing to more than one node

Page 20: Making Decisions in Creating Taxonomies


Variants and cross-references

• Decision: whether a concept should be a node or its variant (when they are not synonyms)
  – Create a more specific/narrower node, or use it as a variant
    • Hydroelectric plants USE Electric power plants
    • Factories USE Plants & factories
  – Differentiate closely related terms, or use one as a variant
    • Foreign policy vs. International relations
    • Colleges & universities vs. Higher education
  – Differentiate topics from actions, or use one as a variant
    • Contracts vs. Contracting
    • Investments vs. Investing
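The USE references above amount to a lookup from a variant to its preferred node. A minimal sketch follows, assuming a plain dictionary as the storage format: the names `use_references`, `preferred_terms`, and `resolve` are hypothetical, and the "plants" entry is an invented case added only to show a variant pointing to more than one node, which the system or UI may or may not support.

```python
# Preferred node names taken from the USE examples on the slide.
preferred_terms = {"Electric power plants", "Plants & factories"}

# Variant (lowercased) -> list of preferred nodes it points to.
use_references = {
    "hydroelectric plants": ["Electric power plants"],
    "factories": ["Plants & factories"],
    # Invented ambiguous variant pointing to two nodes.
    "plants": ["Electric power plants", "Plants & factories"],
}


def resolve(entry):
    """Map what a user or indexer typed to zero or more preferred nodes."""
    key = entry.strip().lower()
    for term in preferred_terms:
        if term.lower() == key:
            return [term]  # already a preferred node name
    return use_references.get(key, [])


print(resolve("Factories"))
print(resolve("Hydroelectric plants"))
print(resolve("plants"))
```

If Hydroelectric plants were instead made a narrower node, it would move from `use_references` into `preferred_terms` and would need its own content links.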

Page 21: Making Decisions in Creating Taxonomies


Variants and cross-references

• Decision: whether a term should be a node or its variant (when synonyms)
  – Plural vs. singular
  – Acronym vs. spelled-out form
  – Technical/academic vs. popular term
  – Synonyms also for a word within a phrase-term
    • administration vs. management
    • oil vs. petroleum
    • communications vs. telecommunications
    • health vs. medical

Page 22: Making Decisions in Creating Taxonomies


Variants and cross-references

• Decision factors for the number of variants per node
  – Users as monolithic or diverse
  – Size of taxonomy (if browsable)
    • If small and easily learned, a large number of variants is unnecessary
  – Human or automated indexing/tagging
    • Automated indexing needs many more variants
  – Keyword searching or taxonomy browse
    • Keyword searching needs more variants
  – Nature and volume of content
    • Broad/general content needs more variants
  – Display of cross-references
    • Limit the number of variants if they display in the UI

Page 23: Making Decisions in Creating Taxonomies


Variants and cross-references

• Decision factors for the choice of term as node or variant
  – User background, level of expertise, expectations
  – Political correctness, instructiveness to users
  – Number of characters in display width

• The more stakeholders involved, the more complex the decision in choosing the preferred name of the node

Page 24: Making Decisions in Creating Taxonomies


Conclusions

• Taxonomy creation is a decision-making task
• Different decisions are based on different factors
• Each taxonomy project is unique
• Creators/editors of the taxonomy need to know:
  – Who the users are and what their needs are
  – What the nature of the content is
  – What the user interface will look like
  – What the system supports (faceted search, multiple cross-refs)
  – How the content will be indexed/tagged

Page 25: Making Decisions in Creating Taxonomies


Questions?

Heather Hedden
Information Taxonomist
Viziant Corporation
Two International Place, Suite 410
Boston, MA 02110
www.viziantcorp.com

[email protected] ext. 104
978-467-5195 (cell)