Taxonomy Fundamentals Workshop
Taxonomy Boot Camp, October 16, 2012, Washington, DC
Marjorie Hlava, President, Access Innovations, Inc., www.accessinn.com
Heather Hedden, Hedden Information Management, www.hedden-information.com

Post on 18-Oct-2014

DESCRIPTION

Opening presentation for Track 1 of the 2012 Taxonomy Boot Camp, October 16, 2012. Presented by Marjorie M.K. Hlava of Access Innovations and Heather Hedden of Hedden Information Management.

TRANSCRIPT

Page 1: Taxonomy Fundamentals Workshop

Taxonomy Fundamentals Workshop

Taxonomy Boot Camp, October 16, 2012, Washington, DC

Marjorie Hlava, President
Access Innovations, Inc.
www.accessinn.com

Heather Hedden
Hedden Information Management
www.hedden-information.com

Page 2: Taxonomy Fundamentals Workshop

Introductions

Marjorie Hlava
President, Access Innovations, Inc.

Heather Hedden
Taxonomy Consultant, Hedden Information Management
Author, The Accidental Taxonomist

Page 3: Taxonomy Fundamentals Workshop

Outline

• The basics – 30 minutes

• More details: Polyhierarchies and Facets – 30 minutes (including exercises)

• “Taxonomatch” – 15 minutes

• Implementation and applications – 15 minutes

• Q & A

Page 4: Taxonomy Fundamentals Workshop

© 2012. Access Innovations, Inc. All Rights Reserved.

The Basics – 30 minutes

• What is a taxonomy?
• What are the parts of a taxonomy?
• How do you build one?
• Guidelines for the terms
• Subject Matter Experts (SMEs)
• 40 slides

Page 5: Taxonomy Fundamentals Workshop

What is a Taxonomy? ANSI/NISO Z39.19-2005

“A collection of controlled vocabulary terms organized into a hierarchical structure.”

controlled

Missing: equivalence, associative relationships, and notes

Yes!

Page 6: Taxonomy Fundamentals Workshop

© 2011. Access Innovations, Inc. All Rights Reserved.

The Semantic Road Map: Knowledge Organization Systems

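[Diagram: knowledge organization systems arranged from most to least complex]

• Semantic network
• Ontology
• Thesaurus
• Taxonomy
• Controlled vocabulary
• Synonym set/ring
• Name authority file
• Uncontrolled list

Simple end of the scale: unrelated entities, ambiguity; simple, low value
Complex end of the scale: linked entities, contextual specificity; complex, high value

Uncontrolled list: highest cost over time!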

Page 7: Taxonomy Fundamentals Workshop

Basic features - The term record

• Main Term (MT) = subject term, heading, node, category, descriptor, class
• Top Term (TT)
• Broader Terms (BT)
• Narrower Terms (NT)
• Related Terms (RT)
  – See also (SA)
• Non-Preferred Term (NP)
  – Used for (UF), See (S)
  – Synonyms
• Scope Note (SN)
• History (H)

TAXONOMY

THESAURUS

ONTOLOGY
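
As a rough illustration of the term record above, the fields could be held in a small Python data class. This is only a sketch: the class name `TermRecord` and its attribute names are assumptions made for the example, not part of any standard or of the presenters' software. The sample values reuse the Politics/Elections hierarchy shown later in the deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermRecord:
    """Hypothetical container for one term record (field names are illustrative)."""
    main_term: str                                       # MT
    top_term: Optional[str] = None                       # TT
    broader: List[str] = field(default_factory=list)     # BT
    narrower: List[str] = field(default_factory=list)    # NT
    related: List[str] = field(default_factory=list)     # RT / See also
    used_for: List[str] = field(default_factory=list)    # UF (non-preferred terms)
    scope_note: str = ""                                 # SN
    history: str = ""                                    # H


elections = TermRecord(
    main_term="Elections",
    top_term="Politics",
    broader=["Politics"],
    narrower=["Presidential elections", "Gubernatorial elections", "Mayoral elections"],
)
```

A plain taxonomy would typically fill only the hierarchical fields; a thesaurus would also populate the related-term, used-for, and scope-note fields.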

Page 8: Taxonomy Fundamentals Workshop

Taxonomy? Thesaurus?
• Often used interchangeably
• Thesaurus is a taxonomy with extras
  – Related Terms
  – Non-preferred Terms (USE/Used for)
  – Scope Notes
  – More
• Taxonomies often have the actual information object at the final node.
• CMS and SharePoint tend to support only the hierarchical view, definitions, and USE references

Page 9: Taxonomy Fundamentals Workshop

Copyright © 2005 - Access Innovations, Inc.

Taxonomy view

Thesaurus Term Record view

Page 10: Taxonomy Fundamentals Workshop

How do you build a taxonomy?

• Define subject field
• Collect terms
• Organize terms
• Fill in gaps
• Flesh out and interrelate terms
• Apply to your data

You’re done!

Page 11: Taxonomy Fundamentals Workshop

Define subject field

• Review representative collection of content
• Determine:
  – Core areas
  – Peripheral topics
• Scope can be modified later

[Diagram labels: Psychology, Education, Sociology, Law]

Page 12: Taxonomy Fundamentals Workshop

Build, buy, augment?
• Survey existing thesaurus/taxonomy resources for your domain
• Test for
  – Scope
  – Depth
  – Make-or-break terms
  – Cost
• Adoption of existing taxonomies
  – Term registries
  – Taxobank
  – Taxonomy Warehouse
  – Other resources

Don’t reinvent the wheel!

Page 13: Taxonomy Fundamentals Workshop

Foundations
• Start with what is known
• Build from there
• Use the literature, your data
• Use internal lists
• Built-in continuous review throughout the process, and beyond
• Who is involved?
  – Taxonomists
  – Subject matter experts
  – Project management
  – Users

Page 14: Taxonomy Fundamentals Workshop

Collect terms
• Your documents and databases
• Departmental terminology
• Textbooks and their indexes
• Book tables of contents and indexes
• Journal quarterly indexes
• Encyclopedias
• Lexicons, glossaries on the topic
• Web resources
• Users and experts
• Search logs

Page 15: Taxonomy Fundamentals Workshop

Gather terms from search logs

• Top 100 search terms from search logs
• Terms used more than 50 times
• Match to website with appropriate answer
• Basis for favorites or best bets, presented at the top of results list
• Behavior-based taxonomy
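
The two thresholds above (top 100 terms, used more than 50 times) can be sketched in a few lines of Python. This assumes the search log has already been reduced to a list of query strings. The function name `candidate_terms`, the sample log, and the choice to apply both thresholds together are illustrative assumptions.

```python
from collections import Counter


def candidate_terms(queries, min_count=50, top_n=100):
    """Return (term, count) pairs for the most frequent queries.

    Keeps at most `top_n` terms and only those used MORE than `min_count` times.
    Queries are normalized by trimming whitespace and lowercasing.
    """
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [(term, n) for term, n in counts.most_common(top_n) if n > min_count]


# Hypothetical log: three queries with different frequencies.
log = ["Taxonomy"] * 60 + ["thesaurus "] * 51 + ["ontology"] * 50
candidates = candidate_terms(log)
```

In this sample, "ontology" is dropped because it was used exactly 50 times, not more than 50.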

Page 16: Taxonomy Fundamentals Workshop

How do you choose terms?

• Importance in the subject area
• Use in the literature, by the organization or community
• Necessary degree of specificity or detail
• Relationship with other controlled vocabularies
• Single concept = single term

Page 17: Taxonomy Fundamentals Workshop

One term / one concept

• Terms represent a simple or unitary concept
• A unit of thought
• May be a single-word term
• May be a multiword term if required to represent the concept
• Three main categories
  – Concrete entities
  – Abstract concepts
  – Proper nouns

“A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.”

Page 18: Taxonomy Fundamentals Workshop

Concrete entities as terms

• Things and their physical parts
  – Birds
    • Feathers
  – Buildings
    • Floors
• Materials
  – Cement
  – Wood
  – Lead
  – Cards and Chips

Page 19: Taxonomy Fundamentals Workshop

Abstract concepts as terms
• Actions and events
  – evolution, skating, management, ceremonies
• Abstract entities
  – law, theory
• Properties of things, materials, and actions
  – strength, efficiency
• Disciplines and sciences
  – physics, meteorology, mathematics
• Units of measurement
  – pounds, kilograms, miles, meters, nanoseconds

Page 20: Taxonomy Fundamentals Workshop

Proper nouns as terms

• Individual entities
  – “classes of one”
  – expressed as proper nouns
  – San Francisco, Lake Michigan

Thesaurus standards exclude proper names, persons, and trade names, leaving them to authority files. Taxonomies include them as final nodes.

Page 21: Taxonomy Fundamentals Workshop

Organize terms – roughly
• Sort terms into several major categories – logical groups of similar concepts as Top Terms
  – Identify core areas and peripheral topics
  – 10 – 20 to start
  – Consider moving proper names to authority files
• Result: loose collection of terms under several main headings
  – Rough and tentative – see how it fits as you go
  – Initial gap analysis
  – Add / modify / delete as needed

Page 22: Taxonomy Fundamentals Workshop

How do terms relate?

• Hierarchical relationships
  – Parents and their children
• Equivalence relationships
  – Aliases
• Associative relationships
  – Cousins
  – See also’s

TAXONOMY

THESAURUS

Page 23: Taxonomy Fundamentals Workshop

Hierarchical relationships

• Broader Term represents the class, whole, or genus
• Narrower Term is a member, part, or species
  – Generic relationship
  – Whole-part relationship
  – Instance relationship
• NTs inherit all the BT characteristics
• BTs/NTs have a reciprocal relationship
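
The reciprocal BT/NT relationship can be sketched as a structure that always records both directions at once. This is a minimal sketch, not the presenters' tool: the class name `Hierarchy` and its methods are assumptions, and the sample terms are taken from the Politics/Elections example in this deck.

```python
class Hierarchy:
    """Stores BT/NT links so that each link is recorded in both directions."""

    def __init__(self):
        self.broader = {}    # term -> set of broader terms (BT)
        self.narrower = {}   # term -> set of narrower terms (NT)

    def add_bt(self, term, broader_term):
        # Adding "term BT broader_term" also adds "broader_term NT term".
        self.broader.setdefault(term, set()).add(broader_term)
        self.narrower.setdefault(broader_term, set()).add(term)

    def ancestors(self, term):
        # Every term reachable by following BT links upward; a narrower
        # term inherits the characteristics of all of these.
        seen = set()
        stack = list(self.broader.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.broader.get(current, ()))
        return seen


h = Hierarchy()
h.add_bt("Elections", "Politics")
h.add_bt("Presidential elections", "Elections")
```

Because `add_bt` writes both maps, a BT without its matching NT cannot occur through this interface.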

Page 24: Taxonomy Fundamentals Workshop

Broader to narrower terms

Politics
  Elections
    Presidential elections
    Gubernatorial elections
    Mayoral elections

Page 25: Taxonomy Fundamentals Workshop

Hierarchy – Whole-part relationship

• Four general types
  – Body systems and organs
    • Ear › Middle ear
  – Geographical locations
    • Bernalillo County › Albuquerque
  – Fields of study
    • Geology › Physical geology
  – Hierarchical social structures
    • Ontario › Manitoulin District

Page 26: Taxonomy Fundamentals Workshop

Hierarchy – Instance relationship

• General category (common noun) as BT, with individual example (proper noun) as NTI (Narrower Term Instance)

Seas
  Baltic Sea
  Caspian Sea
  Mediterranean Sea

French cathedrals
  Chartres Cathedral
  Rheims Cathedral
  Rouen Cathedral

Essentially identical to “final node” in taxonomies

Page 27: Taxonomy Fundamentals Workshop

Polyhierarchical relationship

• Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)

• Part of ISO standards, new to ANSI/NISO

Copyright © 2009 - Access Innovations, Inc.

Nurses › Nurse administrators
Health administrators › Nurse administrators

Finance › Accounting
Careers › Accounting

Page 28: Taxonomy Fundamentals Workshop

Generic relationship test – 1
• Both terms in same fundamental category
• “All-and-some” test

Rodents / Squirrels: SOME rodents are squirrels; ALL squirrels are rodents
Pests / Squirrels: SOME pests are squirrels; NOT ALL squirrels are pests

Inheritance or inclusion – what’s true of the parent (BT) is true for all children (NTs)

Page 29: Taxonomy Fundamentals Workshop

Generic relationship test – 2

[Diagram: Squirrels, Rodents, Pests]

ALL squirrels are rodents
x NOT ALL squirrels are pests
x NOT ALL pests are rodents
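
One way to picture the all-and-some test is with sets of class members: a generic BT/NT link holds when ALL members of the narrower class belong to the broader class, while only SOME members of the broader class belong to the narrower one. The sketch below is illustrative only. The member lists are invented sample data, and the function name `passes_all_and_some` is an assumption.

```python
def passes_all_and_some(narrower_members, broader_members):
    """True when the narrower class is a non-empty proper subset of the broader class."""
    return bool(narrower_members) and narrower_members < broader_members


# Invented sample members, for illustration only.
rodents = {"gray squirrel", "red squirrel", "house mouse", "brown rat"}
squirrels = {"gray squirrel", "red squirrel"}
pests = {"house mouse", "brown rat", "gray squirrel", "cockroach"}

squirrels_under_rodents = passes_all_and_some(squirrels, rodents)  # ALL squirrels are rodents
squirrels_under_pests = passes_all_and_some(squirrels, pests)      # NOT ALL squirrels are pests
pests_under_rodents = passes_all_and_some(pests, rodents)          # NOT ALL pests are rodents
```

Only the first pairing passes, so only "Squirrels BT Rodents" would be a valid generic relationship in this sample.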

Page 30: Taxonomy Fundamentals Workshop

Equivalence relationship
• Preferred Term
  – Thesaurus term and valid for indexing
  – Thesaurus notation: USE
• Non-Preferred Term
  – Not valid for indexing
  – An alias or imposter
  – Entry point, directs user to Preferred Term
  – Thesaurus notation: UF or NPT

Spiders
  UF Arachnids

Plant pathology
  USE Phytopathology

Page 31: Taxonomy Fundamentals Workshop

Equivalence – when to use
• Synonyms, slang, quasi-synonyms
• Scientific and trade names
  – Ibuprofen UF Motrin™
• Lexical variants
  – Fiber optics UF Fibre optics
  – Mouse UF Mice
• Upward posting of narrow concepts not specified in taxonomy or thesaurus
  – Social class UF Elite, Middle class, Working class

Get equivalent terms from search logs, brainstorming…
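
The USE/UF pairs in the equivalence slides can be sketched as a lookup that sends any entry term, preferred or not, to its preferred term. This is a minimal sketch using the examples from the slides. The names `USED_FOR`, `ENTRY_POINTS`, and `resolve` are assumptions, and matching is simply case-insensitive.

```python
# Preferred term -> its non-preferred (UF) entry terms, taken from the slides.
USED_FOR = {
    "Spiders": ["Arachnids"],
    "Phytopathology": ["Plant pathology"],
    "Fiber optics": ["Fibre optics"],
    "Mouse": ["Mice"],
    "Social class": ["Elite", "Middle class", "Working class"],  # upward posting
}

# Reverse index: every entry point (lowercased) -> preferred term.
ENTRY_POINTS = {}
for preferred, aliases in USED_FOR.items():
    ENTRY_POINTS[preferred.lower()] = preferred
    for alias in aliases:
        ENTRY_POINTS[alias.lower()] = preferred   # alias USE preferred


def resolve(entry):
    """Return the preferred term for an entry term, or None if it is unknown."""
    return ENTRY_POINTS.get(entry.strip().lower())
```

For example, `resolve("arachnids")` gives "Spiders", and `resolve("Working class")` is posted up to "Social class". Only the preferred term would then be used for indexing.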

Page 32: Taxonomy Fundamentals Workshop

Associative relationship
• Related Terms (RTs) – cousins
• “…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
• Both valid for indexing
• Reciprocal relationship with each other
• Expands user’s awareness, reflects thesaurus coverage of unanticipated areas
• Main basis for the ontology
• 14 main options offered in Z39.19

Page 33: Taxonomy Fundamentals Workshop

Scope Notes (SN)
• Indicate meaning of the term in the context of this thesaurus, for this audience
  – Stress – Mental, Psychological, Physiological
• Could be the definition or glossary
• Indicate any restriction in meaning
• Indicate range of topics covered
• Provide direction for indexers; for terms often confused, may suggest an alternative term
• Use as needed – may not be for every term
• Use a style guide
• Be concise

Page 34: Taxonomy Fundamentals Workshop

Stating the terms

• Term format
• Grammatical issues
• Singular and plural forms
• Spelling
• Abbreviations and acronyms
• Capitalization
• Other punctuation
• Consistency

Page 35: Taxonomy Fundamentals Workshop

Term format
• KISS – Keep it short and simple
  – 1-2-3 words
  – Effect on search
  – Pre- and Post-Coordination
• Establish a policy – follow Chicago Manual of Style
• Grammatical issues
  – Nouns and noun phrases
  – Verbs: use gerunds
  – Adjectives – no
  – Adverbs – no
  – Initial articles – no

Page 36: Taxonomy Fundamentals Workshop

Compound terms – nope!

• “Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard)

• “Compound terms should be factored (split) into simple elements…” (ANSI/NISO standard)

• Term phrases are okay (bigrams)
  – Adjective-Noun
  – American history
• Two concepts combined are not
  – Aromatherapy for bloating

Page 37: Taxonomy Fundamentals Workshop

Pre and post coordinate terms

• Pre coordinates – two concepts
  – Subject headings – Library of Congress
    • American history – Civil War
  – Back of the book
  – Put together in advance by the publisher
• Post Coordinate
  – Taxonomy terms
  – Single concept
  – Put together by the user / searcher

Page 38: Taxonomy Fundamentals Workshop

So far you’ve got
• Hierarchy
  – Broader and Narrower Terms
  – Polyhierarchies when needed
• Preferred/Non-Preferred Terms
  – Equivalence relationships
• Related Terms
  – Associative relationships
• Scope Notes
• Complete term records
  – Correct term format

Page 39: Taxonomy Fundamentals Workshop

Review, edit, test, edit, use, edit, and maintain, i.e. edit
• Review
  – Users
  – Expert reviewers
• Test
  – Index 500+ documents (more for variable writing style; fewer for strict style)
  – Monitor search log
• Edit and maintain
  – Add term
  – Change existing term
  – Change term status
  – Delete term
  – Add term relationship
  – Delete term relationship
  – Add/modify Scope Note
  – Change overall structure

Consider automated / assisted indexing software

Page 40: Taxonomy Fundamentals Workshop

Subject Matter Experts

• Work first from the literature
• Establish literary warrant for terms
• Have someone else do the clerical work
• Differentiate the lexicography work from the subject matter expert work
• Let SMEs do the review and tailoring
• Expert review ensures the proper term use and application
• Advisory Board…advisable!

Page 41: Taxonomy Fundamentals Workshop

© 2012 Hedden Information Management

More Details

Polyhierarchies Facets

Page 42: Taxonomy Fundamentals Workshop

Polyhierarchies

[Diagram: a Hierarchy (Term › Child Term 1, Child Term 2 › Grand-child 1 to 4) shown beside a Polyhierarchy (Term › Child Term 1, Child Term 2 › Grand-child 1, Grand-child 2)]

Page 43: Taxonomy Fundamentals Workshop

Polyhierarchies
• A term has a polyhierarchy if it has more than one broader term.
• Polyhierarchy is permitted if the hierarchical relationship is valid in both/all cases
• Remember “All-and-Some” test for each generic hierarchical relationship

Page 44: Taxonomy Fundamentals Workshop

Polyhierarchies
Based on generic relationship

Motor vehicles › Trucks › Light trucks
Motor vehicles › Cars › Light trucks

Professions › Educators › Music Teachers
Professions › Musicians › Music Teachers

Page 45: Taxonomy Fundamentals Workshop

Polyhierarchies
Based on different kinds of hierarchical relationships/different means of categorizing (less common)

United States › Utah › Great Salt Lake
Bodies of Water › Lakes › Great Salt Lake

Page 46: Taxonomy Fundamentals Workshop

Polyhierarchy - Pluses

Polyhierarchy is useful when…
• It is obviously logical for select terms (cross-overs/hybrids, e.g. Music teachers or Light Trucks)
• It is indicated by different stakeholder views
• Indexers/taggers browse the taxonomy hierarchically
• End-user testing/input (e.g. card-sorting) indicates users are split as to where in the hierarchy an item belongs

Page 47: Taxonomy Fundamentals Workshop

Polyhierarchy - Pluses

Retail website case study example:

Health & Fitness › Portable Fitness Electronics › Fitness GPS Watches

Car, Marine & GPS › GPS Navigation › Handheld GPS › Fitness GPS Watches

Sports taxonomy case study example:

Back Exercises

› Dead Lifts

Hamstring Exercises

› Dead Lifts

Page 48: Taxonomy Fundamentals Workshop

Polyhierarchy - Minuses

Polyhierarchy is not so good when…
• It violates hierarchical relationship standards
• It becomes excessive, perhaps more common than mono-hierarchies
• It is the result of different kinds of categorization, and the presence of different kinds of categorization is confusing
• It is a small taxonomy and the user doesn’t need or expect polyhierarchy

Page 49: Taxonomy Fundamentals Workshop

Polyhierarchy - Minuses

Problems with excessive polyhierarchies:
• Familiar tree structure is lost. Users cannot see the logical hierarchy.
• Users spend too much time clicking through categories.

Page 50: Taxonomy Fundamentals Workshop

Polyhierarchy - Minuses

Logical polyhierarchies, if done consistently, could become extensive.

Example: creating polyhierarchies for products based on different classifications

Tableware › Wine Glasses
Glass Products › Wine Glasses

Balls › Soccer Balls
Soccer Equipment › Soccer Balls

Page 51: Taxonomy Fundamentals Workshop

Polyhierarchy - Minuses

Multiple, potentially confusing categorizations:
• Place names in hierarchies for both geographic location and for place type
• Products in hierarchies for both material and for use
• Physical exercises in hierarchies for both body part and purpose/type (strength, endurance, etc.)

“It’s OK, we can have polyhierarchies”
This is not always the best solution.
Maybe facets should be used instead.

Page 52: Taxonomy Fundamentals Workshop

Polyhierarchies - Cases

Violating hierarchical relationship standards
• Might be OK in some cases in some taxonomies
• But avoid overuse in polyhierarchies

Case study example:
• Accessories as a narrower term to a product category
• Services as a narrower term to a product category

Computers & Tablets
  Laptop & Netbook Computers
  Tablets, iPads & E-Readers
  Desktop & All-in-One Computers
  Monitors
  Mice & Keyboards
  Printers
  Hard Drives & Storage
  Computer Memory
  Video Cards & PC Components
  Networking & Wireless
  Software
  Computer Accessories
  Computer Setup & Services

Page 53: Taxonomy Fundamentals Workshop

Polyhierarchies - Cases

Violating hierarchical relationship standards within limits

Computers & Tablets
  Laptop & Netbook Computers
    PC Laptops
    MacBooks
    Chromebooks
    Netbooks
      All Netbooks
      Netbook Cases
      Computer Setup & Services (Not OK)
    Laptop Accessories
    Computer Setup & Services (OK)
  Desktop & All-in-One Computers
    All-in-One Computers
    Towers Only
    Desktop Packages
    Computer Setup & Services (OK)

Page 54: Taxonomy Fundamentals Workshop

Polyhierarchies - Cases

Do not create a polyhierarchy to both a “parent” and a “grandparent.”

Cameras (grandparent of Digital SLR Cameras)
  Digital Cameras (parent of Digital SLR Cameras)
    Digital SLR Cameras
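
The parent-and-grandparent rule lends itself to an automated check. The sketch below assumes the taxonomy is held as a plain dictionary mapping each term to its set of broader terms. The function names and the dictionary layout are illustrative assumptions, not a feature of any particular taxonomy tool.

```python
def ancestors(term, broader):
    """All terms reachable from `term` by following broader-term links upward."""
    seen = set()
    stack = list(broader.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def redundant_broader_terms(term, broader):
    """Broader terms of `term` that are already ancestors of another of its broader terms."""
    parents = broader.get(term, set())
    return {
        p for p in parents
        if any(p in ancestors(other, broader) for other in parents if other != p)
    }


# The case from the slide: Digital SLR Cameras placed under both levels.
broader = {
    "Digital SLR Cameras": {"Digital Cameras", "Cameras"},
    "Digital Cameras": {"Cameras"},
    "Light trucks": {"Trucks", "Cars"},
    "Trucks": {"Motor vehicles"},
    "Cars": {"Motor vehicles"},
}
flagged = redundant_broader_terms("Digital SLR Cameras", broader)
```

Here `flagged` contains only "Cameras", the grandparent. The Light trucks polyhierarchy is not flagged, because neither of its parents is an ancestor of the other.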

Page 55: Taxonomy Fundamentals Workshop

Polyhierarchies - Cases

Might be better not to have polyhierarchies when the taxonomy is small and the top-level categories are few

Case study: Client management documents of a financial services company have 114 topical terms categorized under just five broader terms:
• Account Information
• Client Information
• Client Status
• Disclosures & Notifications
• Approvals/Guidance

Decided against polyhierarchies. Reason: Repeat users can memorize the small hierarchy. They don’t expect polyhierarchy here.

Page 56: Taxonomy Fundamentals Workshop

Polyhierarchies - Conclusions
• Some is good. More isn’t necessarily better.
• Polyhierarchies are best for isolated terms that can fall into two categories.
• Polyhierarchies can become too many in cases of overlays of two different categorization methods for numerous terms. (Facets may be better.)
• Polyhierarchies are useful, no matter how extensive, in term-focused thesauri
• Polyhierarchies should be more limited in fully displayed taxonomies

Page 57: Taxonomy Fundamentals Workshop

Polyhierarchies - Exercise

Propose two broader terms for each:

• Hotel managers
• Printers
• Fish
• Egypt
• Bill Gates

Page 58: Taxonomy Fundamentals Workshop

Facets

• For serving faceted classification, which allows the assignment of multiple classifications to an object
• A “dimension” of a query; a type of concept
• Intended for searching with multiple terms in combination (post-coordination), one from each facet
• Can be for topics or for named entities, but generally not both
• Reflect the domain of content
• A subset of metadata fields

Page 59: Taxonomy Fundamentals Workshop

Facets

Faceted Classification
Mathematician/librarian S.R. Ranganathan (1920s) developed “Colon Classification” as an alternative to the Dewey Decimal System for books:

1. Personality – topic or orientation

2. Matter – things or materials

3. Energy – actions

4. Space – places or locations

5. Time – times or time periods

Page 60: Taxonomy Fundamentals Workshop

Facets

Facets are suitable for:
• Structured data with discernible metadata fields or database records
• Homogeneous data with similar types of characteristics (e.g. products in an e-commerce site)

Example types of facets:
• For products: category, brand, size, color, price range, features
• For people: name, job title, gender, birth year, location, department
• For reports: author, subject, audience, document type, language
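
Faceted, post-coordinated search over structured records can be sketched as follows: the user picks one value from each facet they care about, and the selections are combined at search time. The product records, facet names, and function names below are invented for illustration.

```python
from collections import Counter

# Hypothetical product records; each key other than "name" acts as a facet.
PRODUCTS = [
    {"name": "Item A", "category": "Shoes", "brand": "Brand X", "color": "Black"},
    {"name": "Item B", "category": "Shoes", "brand": "Brand Y", "color": "Black"},
    {"name": "Item C", "category": "Bags", "brand": "Brand X", "color": "Red"},
]


def facet_filter(items, **selected):
    """Keep items that match every selected facet value (one value per facet)."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(item[facet] for item in items)


black_shoes = facet_filter(PRODUCTS, category="Shoes", color="Black")
remaining_brands = facet_counts(black_shoes, "brand")
```

The counts are what a faceted interface would typically show next to each remaining refinement. Facets can be applied in any order with the same result.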

Page 61: Taxonomy Fundamentals Workshop

Facets

For enterprise taxonomies: Patrick Lambe, Organising Knowledge
• People and organizations
• Things and parts of things
• Activity cycles
• Locations

For Web sites: Rosenfeld and Morville, Information Architecture
• Topic
• Product
• Document type
• Audience
• Geography
• Price

Page 62: Taxonomy Fundamentals Workshop

Facet Examples

1. Shoebuy.com - advanced search
http://www.shoebuy.com/s.jsp/r_as

2. My Recipes
http://search.myrecipes.com

3. Microbial Life Educational Resources
http://serc.carleton.edu/microbelife/resources

Page 63: Taxonomy Fundamentals Workshop
Page 64: Taxonomy Fundamentals Workshop

Page 65: Taxonomy Fundamentals Workshop

My Recipes

Page 66: Taxonomy Fundamentals Workshop
Page 67: Taxonomy Fundamentals Workshop

Page 68: Taxonomy Fundamentals Workshop

Facets & Hierarchies

Combining Facets and Hierarchies

1. Have hierarchies within facets

2. Start with hierarchical categories and then limit further with facets

Page 69: Taxonomy Fundamentals Workshop

Facets & Hierarchies

1. Hierarchies within facets: indented display

World Bank documents advanced search

http://documents.worldbank.org/curated/en/docadvancesearch

Page 70: Taxonomy Fundamentals Workshop
Page 71: Taxonomy Fundamentals Workshop

Facets & Hierarchies

2. Hierarchies of topics, then facets to narrow results:

ThomasNet business directory
http://ps.thomasnet.com/productsearch

Buzzillions product reviews
http://www.buzzillions.com

Amazon.com books browse
http://www.amazon.com

Page 72: Taxonomy Fundamentals Workshop

Taxonomy Structures: Hierarchies

One level per web page

Yahoo directory

http://search.yahoo.com/dir

ThomasNet browse

http://www.thomasnet.com/browse

Page 73: Taxonomy Fundamentals Workshop

Page 74: Taxonomy Fundamentals Workshop
Page 75: Taxonomy Fundamentals Workshop
Page 76: Taxonomy Fundamentals Workshop

Buzzillions

Page 77: Taxonomy Fundamentals Workshop
Page 78: Taxonomy Fundamentals Workshop

Amazon > Books

Page 79: Taxonomy Fundamentals Workshop

Facets - Conclusions

Advantages
• Supports more complex search queries by users
• Allows users to control the search refinement, narrowing or broadening in any manner or order

Disadvantages
• Only suitable for somewhat structured, unified types of content that share the same multiple facets
• Might not support multiple terms selected at once from the same facet
• Often hidden from users under “Advanced Search”
• Requires investment of thorough (multifaceted) indexing/tagging

Page 80: Taxonomy Fundamentals Workshop

Facets - Conclusions

Facet Design Tips
• Number of facets: 4-8, with 5-6 as ideal
• Facets listed in logical, not alphabetical order
• Number of terms per facet: 2-25
  – Ideally not much more than can be viewed in a scroll box
  – If the list is obvious (US states), then more is OK.
  – Exception can be made for hierarchical “Topics” facet
• If <12 terms, then a logical display order
• If >12 terms, then alphabetical
• A two-level hierarchy (indented) within a facet is possible
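
The display-order tip can be expressed as a small helper. This is a sketch under two assumptions: the caller supplies short lists already in their logical order, and a list of exactly 12 terms (which the tip leaves open) is treated like a short list. The function name `display_order` is hypothetical.

```python
def display_order(terms, threshold=12):
    """Return facet terms in display order.

    More than `threshold` terms: alphabetical (case-insensitive).
    Otherwise: keep the logical order supplied by the caller.
    """
    terms = list(terms)
    if len(terms) > threshold:
        return sorted(terms, key=str.casefold)
    return terms


sizes = display_order(["Small", "Medium", "Large"])
long_list = display_order([f"Term {chr(ord('M') - i)}" for i in range(13)])
```

`sizes` keeps its small-to-large order, while the 13-term list comes back alphabetized from "Term A" to "Term M".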

Page 81: Taxonomy Fundamentals Workshop

Facets - Exercise

Designate a set of 4-7 facets for a tour operator web site selling vacation packages.

Page 82: Taxonomy Fundamentals Workshop

TAXONOMATCH

• Designed to enhance understanding and retention of the vocabulary concepts necessary for creating a taxonomy, ontology, thesaurus, or controlled vocabulary.
• Game supplies:
  – 1 Deck of Orange Question and Challenge Cards
  – 1 Deck of Green Answer Cards
• Game setup:
  – Shuffle the deck of Green Answer cards
  – Deal the entire deck to the players.
  – Shuffle the deck of Orange Question and Challenge cards
  – Place them facedown in a pile in the middle of the table so that all players can reach the pile.
• Reinforce what you just heard!
• Have fun!

Page 83: Taxonomy Fundamentals Workshop

1. Play moves to the left of the dealer

2. Draw a card from the top of the Orange cards. Read it aloud to all of the players.

3. The player who read the card says out loud what they think the answer is.

4. Each player looks at the Green Answer cards in their hand.

   a. If they have the correct answer to the Question or Challenge, they show their card to everyone at the table.

   b. If everyone agrees that the answer is correct, the player holding the correct answer card gives it to the player who read the Question or Challenge card.

5. The player places their associated pair of cards – one Orange Question and Challenge card and one Green Answer card – face up on the table in front of them.

6. Play passes to the person who held the correct Green Answer card in their hand. Play continues as in step 2 above.

7. Discussion among the players to arrive at the correct answer is permissible and encouraged!

8. If players do not arrive at a consensus regarding the correct answer, the Orange Question and Challenge card may be returned to the bottom of the pile, and play passes to the person to the left of the player who drew the previous card.

9. When all of the Orange Question and Challenge cards have been drawn, read aloud, and matched with their Green Answer cards, the game ends.

10. If there are any Orange Question and Challenge cards remaining to which players cannot agree on an answer, players may consult their notes or ask the session speaker.

TAXONOMATCH RULES

Page 84: Taxonomy Fundamentals Workshop

Implementation and applications

• Adding the terms to the information objects
• Search and other applications
• Taxonomy use cases – implementation
• Opportunities and Obstacles
• 30 minutes

Page 85: Taxonomy Fundamentals Workshop

Parts of the puzzle
• The taxonomy
  – The words to use
  – In the order you want the users to browse
• Applications
  – Search, CMS, SharePoint, etc.
• Implementation / actions
  – Making the links
  – Adding terms to information objects
• Most people confuse the parts, and they act very differently

Page 86: Taxonomy Fundamentals Workshop

The Workflow

[Diagram. Components shown: Client Data (full text: HTML, PDF, data feeds, etc.); Client Taxonomy; Thesaurus Master; Machine Aided Indexer (M.A.I.™); Inline Tagging; Metadata and Entity Extractor; Automatic Summarization; Database Repository; Search Software; Search Presentation Layer (Browse by Subject, Auto-completion, Broader Terms, Narrower Terms, Related Terms). Callouts: “Fully integrated with MOSS”; “Increases accuracy”.]

Workflow steps: Gather source data › Tag and create metadata › Put in database with tags › Build search inverted index › Create user interface

Page 87: Taxonomy Fundamentals Workshop

Adding terms to information objects

• Part of the record
  – XML
  – MARC
• A relational table pointing the terms to a record ID number (secondary key)
• Adding data to the HTML
  – META NAME KEYWORD element

• Many other options

Page 88: Taxonomy Fundamentals Workshop

Part of the record - XML

• Added as an element in the XML record
• Need an element to put the data in
  – <Taxonomy Term>
• Capture the terms when creating the records
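
Adding terms as elements of an XML record might look like the sketch below, which uses Python's standard `xml.etree.ElementTree` module. The record layout (`<record>`, `<title>`) and the function name are invented for the example. The element is written as `TaxonomyTerm` because XML element names cannot contain spaces, so the slide's label is treated here as a display name rather than a literal tag.

```python
import xml.etree.ElementTree as ET


def add_taxonomy_terms(record_xml, terms):
    """Append one <TaxonomyTerm> element per term to an XML record string."""
    record = ET.fromstring(record_xml)
    for term in terms:
        ET.SubElement(record, "TaxonomyTerm").text = term
    return ET.tostring(record, encoding="unicode")


# Hypothetical record with a single title element.
tagged = add_taxonomy_terms(
    "<record><title>Sample report</title></record>",
    ["Elections", "Politics"],
)
```

One element per term keeps each term separately addressable, which is what later makes field-level search and faceting possible.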

Page 89: Taxonomy Fundamentals Workshop

Editorial Workflow Integration: Author Submission Module

The author fills in the data to the document template, attaching images and graphs as necessary

An API calls Data Harmony and generates a list of indexing terms based on the content

Page 90: Taxonomy Fundamentals Workshop

Editorial Workflow Integration: Author Submission Module

Authors review the indexing and may change it

Content is stored into a data repository as HTML, XML, etc.

Page 91: Taxonomy Fundamentals Workshop

In the HTML record
• Makes it crawlable for the Internet
• Used in CMS applications
  – Content Management Systems
• Add to the HTML
  – Manually
  – In Dreamweaver
  – In your CMS like Ektron
• Author Submissions Example
• Do the same with SharePoint

Page 92: Taxonomy Fundamentals Workshop

META NAME “KEYWORDS”
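
A keywords META element can be generated from a list of taxonomy terms with a few lines of Python. This is only a sketch: the function name `keywords_meta` is an assumption, and the terms are joined with commas, the usual convention for this element.

```python
from html import escape


def keywords_meta(terms):
    """Build an HTML <meta name="keywords"> element from taxonomy terms."""
    content = ", ".join(terms)
    # Escape &, <, > and quotes so the terms are safe inside the attribute value.
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


tag = keywords_meta(["Elections", "Presidential elections"])
escaped_tag = keywords_meta(["Computers & Tablets"])
```

The result goes inside the page's `<head>` section, where a crawler or CMS that reads this element can pick up the terms.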

Page 93: Taxonomy Fundamentals Workshop

In Relational Database Table
• Primary key – the record
• Secondary key – all the metadata
  – Like taxonomy terms
  – Like author
  – Like publication date
• Used in Oracle, SQL, etc.
  – Need a field to put the taxonomy data in
• Supports “Faceted Search”
  – Each item in a separate field or element or table
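
The relational approach can be sketched with Python's built-in `sqlite3` module: one table holds the records, and a second table points taxonomy terms at record IDs. The table names, column names, and sample rows are invented for illustration, and SQLite stands in for whichever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (
    record_id INTEGER PRIMARY KEY,   -- primary key: the record
    title     TEXT NOT NULL
);
CREATE TABLE record_term (           -- one row per (record, taxonomy term)
    record_id INTEGER NOT NULL REFERENCES record(record_id),
    term      TEXT NOT NULL,
    PRIMARY KEY (record_id, term)
);
""")
conn.execute("INSERT INTO record VALUES (1, 'Mayoral race results')")
conn.execute("INSERT INTO record VALUES (2, 'State budget report')")
conn.executemany(
    "INSERT INTO record_term VALUES (?, ?)",
    [(1, "Mayoral elections"), (1, "Politics"), (2, "Finance")],
)


def records_for_term(term):
    """Titles of all records tagged with the given taxonomy term."""
    rows = conn.execute(
        "SELECT r.title FROM record r "
        "JOIN record_term t ON t.record_id = r.record_id "
        "WHERE t.term = ? ORDER BY r.record_id",
        (term,),
    )
    return [title for (title,) in rows]
```

Keeping the terms in their own table lets a record carry any number of terms, and similar tables for author or publication date would give the separate fields that faceted search relies on.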

Page 94: Taxonomy Fundamentals Workshop

Relational database diagram

Page 95: Taxonomy Fundamentals Workshop

Using taxonomies in applications

• Improve search
• Subject browsing
• Mobile intelligence
• Targeted resources based on subject or user role
• Link to society resources
• Author submission module
• Author authority database
• Expert reviewer identification
• Member profiles
• Data visualization
• More like this

• In “indexing” or categorizing, as subject metadata
• In content management systems
• In SharePoint
• In mashups
• In social networking sites
• In author tagging
• In filtering data – e.g., spam filters and RSS feeds
• In web crawlers
• Social media - community

Page 96: Taxonomy Fundamentals Workshop

Why does search fail?

• Most large organizations have 5 search software packages
  – All disappointing and on the shelf
• Inconsistent results
• Unclear path to results
• Lack of a single, unified, clear, consistent vocabulary
• Not tied to data governance
  – Taxonomy
  – Other metadata

Page 97: Taxonomy Fundamentals Workshop

Parts of Search

• Search software
  – Inverted Index
  – Search algorithms
• Presentation layer
  – Search box
  – Autocompletion
  – Related and narrower terms
  – Hierarchical display

Page 98: Taxonomy Fundamentals Workshop

Creating an Inverted File Index

Sample DOCUMENT

Outline of Presentation
1 Define key terminology
2 Thesaurus tools
  – Features
  – Functions
3 Costs
  – Thesaurus construction
  – Thesaurus tools
4 Why & when?

Page 99: Taxonomy Fundamentals Workshop

Simple inverted file index
The terms from the “outline”

&
1
2
3
4
construction
costs
define
features
functions
key
of
outline
presentation
terminology
thesaurus
tools
when
why

Page 100: Taxonomy Fundamentals Workshop

Complex inverted file index
Placement location

& - Stop
1 - Stop
2 - Stop
3 - Stop
4 - Stop
construction - L7, P2, SH
costs - L6, P1, H
define - L2, P1, H
features - L4, P1, SH
functions - L5, P1, SH
key - L2, P2, H
of - Stop
outline - L1, P1, T
presentation - L1, P3, T
terminology - L2, P3, H
thesaurus - (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
tools - (1) L3, P2, H; (2) L8, P2, SH
when - L9, P3, H
why - L9, P1, H
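
The complex index can be reproduced from the sample outline with a short script. This is a minimal sketch, not the presenters' implementation. It assumes that L is the line number, P the word position within the line, and that T, H, and SH mark title, heading, and subheading; those readings are inferred from the slide. To match the slide's positions, outline numbers are skipped without taking a position, while the stop words "of" and "&" still occupy one.

```python
from collections import defaultdict

# The sample document, one (zone, text) pair per line.
DOCUMENT = [
    ("T", "Outline of Presentation"),
    ("H", "1 Define key terminology"),
    ("H", "2 Thesaurus tools"),
    ("SH", "Features"),
    ("SH", "Functions"),
    ("H", "3 Costs"),
    ("SH", "Thesaurus construction"),
    ("SH", "Thesaurus tools"),
    ("H", "4 Why & when?"),
]
STOP_WORDS = {"&", "of"}


def build_index(document):
    """Map each non-stop term to a list of (line, position, zone) postings."""
    index = defaultdict(list)
    for line_no, (zone, text) in enumerate(document, start=1):
        position = 0
        for raw in text.split():
            token = raw.strip("?.,").lower()
            if token.isdigit():
                continue  # outline numbers: stop entries that take no position
            position += 1
            if token in STOP_WORDS:
                continue  # stop words keep their position but are not indexed
            index[token].append((line_no, position, zone))
    return dict(index)


index = build_index(DOCUMENT)
for term in sorted(index):
    postings = "; ".join("L{}, P{}, {}".format(*p) for p in index[term])
    print("{} - {}".format(term, postings))
```

Storing the zone with each posting lets a search engine weight a match in a title above a match in a subheading.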

Page 101: Taxonomy Fundamentals Workshop

Improve search www.mediasleuth.com

Navigate the full taxonomy “tree”

BROWSE

Auto-completion using the taxonomy

Guide the user
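
Auto-completion from a taxonomy can be sketched as a prefix match over preferred terms and their synonyms. The sample terms and the `autocomplete` function are hypothetical, and this is not how any particular product implements it. Matching on a synonym still returns the preferred term, which is how the taxonomy guides the user to the controlled vocabulary.

```python
# Hypothetical taxonomy: preferred term -> list of synonyms (equivalence terms).
TAXONOMY = {
    "Breast cancer": ["Breast neoplasms", "Mammary carcinoma"],
    "Brain cancer": [],
    "Folic acid": ["Folate", "Vitamin B9"],
}


def autocomplete(prefix, taxonomy, limit=10):
    """Suggest preferred terms whose label or any synonym starts with prefix."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = []
    for preferred, synonyms in taxonomy.items():
        labels = [preferred] + synonyms
        if any(label.lower().startswith(needle) for label in labels):
            matches.append(preferred)
    return sorted(matches)[:limit]


print(autocomplete("br", TAXONOMY))
print(autocomplete("fola", TAXONOMY))
```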

Page 102: Taxonomy Fundamentals Workshop

Subject browsing

Page 103: Taxonomy Fundamentals Workshop

Targeted resources based on subject or user role

CONFIDENTIAL

Page 104: Taxonomy Fundamentals Workshop

Linked data

Job Posting for Expert on Topic A
Author Networks
Social Networking
Journal Article on Topic A
Other Journal Articles on Topic A
Upcoming Conference on Topic A
Podcast Interview with Researcher Working on Topic A
Grant Available for Researchers Working on Topic A
CME Activity on Topic A

Page 105: Taxonomy Fundamentals Workshop

Link to society resources

Cancer Epidemiology Biomarkers & Prevention Vol. 12, 161-164, February 2003
© 2003 American Association for Cancer Research
Short Communications

Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the American Cancer Society Cancer Prevention Study II Nutrition Cohort
Heather Spencer Feigelson, Carolyn R. Jonas, Andreas S. Robertson, Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle
Department of Epidemiology and Surveillance Research, American Cancer Society, National Home Office, Atlanta, Georgia 30329-4251

Recent studies suggest that the increased risk of breast cancer associated with alcohol consumption may be reduced by adequate folate intake. We examined this question among 66,561 postmenopausal women in the American Cancer Society Cancer Prevention Study II Nutrition Cohort.

Related Press Releases
• How What and How Much We Eat (And Drink) Affects Our Risk of Cancer
• Novel COX-2 Combination Treatment May Reduce Colon Cancer Risk Combination Regimen of COX-2 Inhibitor and Fish Oil Causes Cell Death
• COX-2 Levels Are Elevated in Smokers

Related AACR Workshops and Conferences
• Frontiers in Cancer Prevention Research
• Continuing Medical Education (CME)
• Molecular Targets and Cancer Therapeutics

Related Meeting Abstracts
• Association between dietary folate intake, alcohol intake, and methylenetetrahydrofolate reductase C677T and A1298C polymorphisms and subsequent breast
• Folate, folate cofactor, and alcohol intakes and risk for colorectal adenoma
• Dietary folate intake and risk of prostate cancer in a large prospective cohort study

Related Working Groups
• Finance
• Charter
• Molecular Epidemiology

Related Education Book Content
• Oral Contraceptives, Postmenopausal Hormones, and Breast Cancer
• Physical Activity and Cancer
• Hormonal Interventions: From Adjuvant Therapy to Breast Cancer Prevention

Related Awards
• AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards
• ACS Award
• Weinstein Distinguished Lecture

Webcasts
• Related Webcasts

Think Tank Report
• Related Think Tank Report Content

Page 106: Taxonomy Fundamentals Workshop

Authors at a place

Page 107: Taxonomy Fundamentals Workshop

Member profile tagging

User pastes or uploads CV

Button to auto-extract taxonomy attributes

Page 108: Taxonomy Fundamentals Workshop

Adding terms to SharePoint

[Diagram labels: Microsoft SharePoint Server 2010; Event Handler; TaxoTerm Server; Data Harmony (M.A.I.); Returns subject metadata]

User uploads a document to SharePoint space

Before uploading to SharePoint server, the EventHandler sends the document to Data Harmony.

Data Harmony automatically attaches indexing terms before uploading to MOSS

Page 109: Taxonomy Fundamentals Workshop

SharePoint 2010 only shows 10 lines of the taxonomy

This add-on makes it all viewable

Page 110: Taxonomy Fundamentals Workshop

Taxonomies added in search example

[Diagram: Core Architectural Components of a search platform. Labels shown: content sources (Web Content; Files, Documents; Databases; Custom Applications; Email, Groupware); Web Crawler, File Traverser, Database Connector, Email Connector, Custom Connector; Content API and Content Push; Document Processor pipeline; Search Server; Index DB; Filter Server; Agent DB; Alerts; Query Processor pipeline; Query API with Query and Results; Vertical Applications, Portals, Custom Front-Ends, Mobile Devices; FAST Management API; Administrator’s Dashboard. Callout: “Use taxonomy terms here”, with Data Harmony Governance API, MAIstro, and Search Harmony.]

Page 111: Taxonomy Fundamentals Workshop

Autosuggestion of taxonomy terms

Populate Keywords, Descriptors, Indexing terms, etc.

Allow for manual review of auto-tagging for quality assurance.
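
The suggest-then-review workflow can be illustrated with a deliberately simple matcher. This is a sketch under stated assumptions, not the rule-based indexing a commercial tool performs: `suggest_terms` only looks for whole-phrase matches of a term or its synonyms, and `review` stands in for the editor's screen. The taxonomy entries and function names are hypothetical.

```python
import re

# Hypothetical taxonomy: preferred term -> synonyms.
TAXONOMY = {
    "Breast cancer": ["Breast neoplasms"],
    "Folic acid": ["Folate"],
    "Alcohol drinking": ["Alcohol consumption"],
}


def suggest_terms(text, taxonomy):
    """Return {preferred term: label that matched} for whole-phrase matches."""
    suggestions = {}
    for preferred, synonyms in taxonomy.items():
        for label in [preferred] + synonyms:
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                suggestions[preferred] = label
                break  # one matching label is enough for this term
    return suggestions


def review(suggestions, rejected=(), added=()):
    """Apply an editor's decisions: drop rejected terms, add missing ones."""
    kept = [term for term in suggestions if term not in rejected]
    return sorted(set(kept) | set(added))


text = "Adequate folate intake may reduce the breast cancer risk linked to alcohol."
suggested = suggest_terms(text, TAXONOMY)
final_terms = review(suggested, added=["Alcohol drinking"])
print(suggested)
print(final_terms)
```

Here the matcher misses "Alcohol drinking" because only the bare word "alcohol" appears in the text, and the reviewer adds it by hand, which is the kind of gap manual review is meant to catch.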

Page 112: Taxonomy Fundamentals Workshop

More Innovations

• Link topic to article to author to event
• Make visual links within domain
• Enable authors to submit and categorize conference submissions
• Create author authority database linking to co-authors, topics, locations, etc.
• Create expert reviewer database
• Create member profiles with alternate names, publications, tagged by topic
• Visualize data and domain distribution
• Display interest connections in social network
• Deliver accurate targeted information through mobile applications
• Etc.

Page 113: Taxonomy Fundamentals Workshop

Taxonomy standards

• Z39.19 (2005) Controlled Vocabularies
• BS 8723 Parts 1 – 5
• ISO 25964 Parts 1 - 2
• TAG 37 and 46 standards
• SKOS - Simple Knowledge Organization System
• OWL - Web Ontology Language
• AND more!
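
To make one of these standards concrete, here is a small sketch that writes a single taxonomy term as a SKOS concept in Turtle syntax. The `skos:` namespace and the properties `prefLabel`, `altLabel`, and `broader` come from the SKOS vocabulary; the `ex:` namespace, the concept identifiers, and the helper function are made up for illustration. The helper does no escaping, so it assumes labels contain no quotation marks.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/taxonomy/> .\n"
)


def concept_to_turtle(concept_id, pref_label, alt_labels=(), broader=()):
    """Serialize one concept: preferred label, synonyms, and broader terms."""
    lines = [
        "ex:{} a skos:Concept ;".format(concept_id),
        '    skos:prefLabel "{}"@en'.format(pref_label),
    ]
    for alt in alt_labels:
        lines[-1] += " ;"
        lines.append('    skos:altLabel "{}"@en'.format(alt))
    for parent in broader:
        lines[-1] += " ;"
        lines.append("    skos:broader ex:{}".format(parent))
    lines[-1] += " ."  # close the statement
    return "\n".join(lines)


turtle = PREFIXES + "\n" + concept_to_turtle(
    "breastCancer", "Breast cancer",
    alt_labels=["Breast neoplasms"], broader=["cancer"],
)
print(turtle)
```

The three properties correspond to the preferred term, the equivalence (synonym) relationship, and the hierarchical relationship of a thesaurus.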

Page 114: Taxonomy Fundamentals Workshop

IT is often Fire, Ready, Aim!

• Choose the hardware
• Choose the software
• Decide on the format
• Convert the data
• Fix the data
• Tack on a taxonomy
• Ignore the standards

Page 115: Taxonomy Fundamentals Workshop

Change to Ready, Aim, Fire!

• Follow the data
• Look at the data, format and content
• Design taxonomy for data
• Leverage the standards
• Use taxonomy to tag data
• Choose search and repository software for data
• Load the data into the system
• Keep your eye on the target

Page 116: Taxonomy Fundamentals Workshop

Summary

• We covered the basics
• We talked about the implementation
• Application of the terms to your content
• We reinforced the learning with activities
• Now go hear the case studies of the next two days!

Page 117: Taxonomy Fundamentals Workshop

Questions?

Heather Hedden

Taxonomy Consultant

Hedden Information Management

www.hedden-information.com

www.accidental-taxonomist.com

[email protected]

978-467-5195

Marjorie M.K. Hlava

President

Access Innovations, Inc.

www.accessinn.com

www.data-harmony.com

[email protected]

505-998-0800