+Taxonomy 101: Controlled Vocabularies and Beyond
Barbara McGlamery, Marthastewart.com
+About Me
9+ years Time Inc.
Entertainment Weekly
This Old House
Time
People
InStyle
Recipe Finder
1+ years Martha Stewart
Martha Stewart Living
Martha Stewart Weddings
Whole Living
+Agenda
Basics of taxonomy and controlled vocabularies
Developing a taxonomy
Taxonomy software and tagging tools
Records management and taxonomy
+What is a controlled vocabulary?
Predefined, authorized terms that can be consistently applied to content
Types:
Lists
Synonym rings
Authority Files
Facets
+What is a taxonomy?
Classification of a controlled vocabulary in a hierarchical list
Types:
Taxonomy
Thesaurus
Ontology
+Controlled Vocabulary
Predefined, authorized terms
that can be consistently applied
to content
Relationship is between the list
value and class
+Controlled Vocabulary
Units of Measure
Cup
Tablespoon
Teaspoon
+Synonym Ring
Extends a CV by adding synonyms as
equivalent terms
Relationship is between list value and its
synonyms
+Synonym Ring
Units of Measure
Cup = C = c
Tablespoon = Tbl = T
Teaspoon = tsp = t
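A minimal sketch of how the synonym ring above might be held in memory, assuming plain Python sets. The names SYNONYM_RINGS, ring_for, and are_equivalent are hypothetical, not taken from any taxonomy tool. Matching is deliberately case-sensitive, since "T" and "t" belong to different rings.

```python
# Each ring is a set of equivalent terms; no term is preferred over another.
SYNONYM_RINGS = [
    {"Cup", "C", "c"},
    {"Tablespoon", "Tbl", "T"},
    {"Teaspoon", "tsp", "t"},
]

def ring_for(term):
    """Return the ring containing term, or a one-member ring if unknown."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return ring
    return {term}

def are_equivalent(a, b):
    """True when both terms belong to the same synonym ring."""
    return b in ring_for(a)

print(are_equivalent("Tbl", "T"))  # True
print(are_equivalent("T", "t"))    # False: matching is case-sensitive here
```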
+Authority File
Extends CVs and synonym rings further by
assigning one term as the preferred term,
to which all other synonyms point
Relationship assigns a property (Preferred
Term) to one term and marks all others as
synonyms
+Authority File
Units of Measure
(Preferred Term) Cup
Syn: C, c
(PT) Tablespoon
Syn: Tbl, T
(PT) Teaspoon
Syn: tsp, t
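One way to put the authority file above to work is to normalize incoming values to their preferred term. This is only a sketch under simple assumptions (an in-memory dictionary, exact case-sensitive matching); AUTHORITY_FILE, TO_PREFERRED, and normalize are hypothetical names.

```python
# Preferred term -> its synonyms, as listed on the slide.
AUTHORITY_FILE = {
    "Cup": ["C", "c"],
    "Tablespoon": ["Tbl", "T"],
    "Teaspoon": ["tsp", "t"],
}

# Reverse index: every synonym (and the preferred term itself) points to the PT.
TO_PREFERRED = {}
for preferred, synonyms in AUTHORITY_FILE.items():
    TO_PREFERRED[preferred] = preferred
    for syn in synonyms:
        TO_PREFERRED[syn] = preferred

def normalize(term):
    """Return the preferred term, or None if the term is not authorized."""
    return TO_PREFERRED.get(term)

print(normalize("tsp"))   # Teaspoon
print(normalize("T"))     # Tablespoon
print(normalize("dash"))  # None
```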
+Facets
Terms are broken down individually by
unique properties, allowing a mix and match
approach to search and retrieval
Relationship is between one facet node and
multiple values
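The mix-and-match idea can be sketched as a filter over tagged items. The recipe records and facet names below are invented for illustration, and filter_by_facets is a hypothetical helper rather than part of any search product.

```python
# Hypothetical records; each facet (key) holds one or more values.
RECIPES = [
    {"title": "Tomato soup", "Course": ["Starter"],
     "Main Ingredient": ["Tomatoes"]},
    {"title": "Caprese salad", "Course": ["Starter", "Side"],
     "Main Ingredient": ["Tomatoes", "Cheese"]},
    {"title": "Cheese plate", "Course": ["Dessert"],
     "Main Ingredient": ["Cheese"]},
]

def filter_by_facets(items, selected):
    """Keep items that carry the selected value for every selected facet."""
    return [
        item for item in items
        if all(value in item.get(facet, [])
               for facet, value in selected.items())
    ]

hits = filter_by_facets(RECIPES, {"Course": "Starter", "Main Ingredient": "Cheese"})
print([r["title"] for r in hits])  # ['Caprese salad']
```

Each added facet value narrows the result set, which is what lets a user combine independent properties in any order.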
+Taxonomy
Classification of a controlled vocabulary in
a hierarchical list
Relationship is in assigning a hierarchy to
list values
+Taxonomy
Food
  Main Ingredient
    Vegetables (ahem…fruit)
      Tomatoes
        Beefsteak tomatoes
        Cherry tomatoes
        Sundried tomatoes
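A rough sketch of the hierarchy above as nested dictionaries, assuming each term on the slide is nested under the one before it and the three tomato varieties are siblings. TAXONOMY and path_to are hypothetical names.

```python
# Parent -> children; a leaf term has an empty dict.
TAXONOMY = {
    "Food": {
        "Main Ingredient": {
            "Vegetables": {
                "Tomatoes": {
                    "Beefsteak tomatoes": {},
                    "Cherry tomatoes": {},
                    "Sundried tomatoes": {},
                }
            }
        }
    }
}

def path_to(term, tree=TAXONOMY, trail=()):
    """Return the list of terms from the root down to term, or None if absent."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == term:
            return list(here)
        found = path_to(term, children, here)
        if found:
            return found
    return None

print(" > ".join(path_to("Cherry tomatoes")))
# Food > Main Ingredient > Vegetables > Tomatoes > Cherry tomatoes
```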
+Thesaurus
CVs in a hierarchical structure with
predefined relationships between terms
(Broader Term, Narrower Term, Preferred
Term, etc.)
Relationship is in assigning standardized
properties to list values
+Thesaurus
Food
(BT) Main Ingredient
(BT) Vegetables (ahem…fruit)
(BT) Tomatoes
(NT) Beefsteak tomato
(NT) (PT) Cherry tomato
(RT) Roma tomato
(NT) Sundried tomato
(RT) Tomato sauce
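The thesaurus relationships can be sketched as labeled statements between terms. The reading of the slide used here is an assumption: the tomato varieties are narrower terms of Tomatoes, Roma tomato is related to Cherry tomato, and Tomato sauce is related to Sundried tomato. STATEMENTS, INVERSE, and build are hypothetical names. The sketch also stores each relationship's reciprocal (BT with NT, RT with RT), as thesaurus standards expect.

```python
from collections import defaultdict

# (term, relation, target): BT = broader term, RT = related term.
STATEMENTS = [
    ("Tomatoes", "BT", "Vegetables"),
    ("Beefsteak tomato", "BT", "Tomatoes"),
    ("Cherry tomato", "BT", "Tomatoes"),
    ("Sundried tomato", "BT", "Tomatoes"),
    ("Cherry tomato", "RT", "Roma tomato"),
    ("Sundried tomato", "RT", "Tomato sauce"),
]

# Every relationship has a reciprocal: BT pairs with NT, RT is symmetric.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}

def build(statements):
    """Index each statement together with its reciprocal relation."""
    index = defaultdict(lambda: defaultdict(set))
    for term, rel, target in statements:
        index[term][rel].add(target)
        index[target][INVERSE[rel]].add(term)
    return index

thesaurus = build(STATEMENTS)
print(sorted(thesaurus["Tomatoes"]["NT"]))
# ['Beefsteak tomato', 'Cherry tomato', 'Sundried tomato']
print(sorted(thesaurus["Roma tomato"]["RT"]))  # ['Cherry tomato']
```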
+Ontology
CVs in a hierarchical structure with complex
relationships defined
Relationship is in assigning predetermined
standardized and freeform properties to list
values
+Ontology
Beefsteak tomatoes
(isMainIngredient)
Tomato sauce
Will Smith
(isLeadActor)
Men in Black 3
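The two examples above read naturally as subject, property, object statements. A minimal sketch, assuming a plain list of tuples rather than a real RDF store; TRIPLES and query are hypothetical names.

```python
# (subject, property, object) statements taken from the slide.
TRIPLES = [
    ("Beefsteak tomatoes", "isMainIngredient", "Tomato sauce"),
    ("Will Smith", "isLeadActor", "Men in Black 3"),
]

def query(subject=None, predicate=None, obj=None):
    """Return triples matching every field that is not None."""
    return [
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

print(query(predicate="isLeadActor"))
# [('Will Smith', 'isLeadActor', 'Men in Black 3')]
```

Because the property name is data rather than fixed structure, new relationship types can be added without changing the shape of the store.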
+Semantic (semantic) Web
Big S
Initiative from the W3C to create a web of machine-readable data by marking up content with consistently applied, standardized, and freeform properties
RDF/OWL
Proprietary
Little s
Various standards that mark up content with agreed-upon and freeform properties
Microformats
Microdata
Proprietary
+Pros and Cons of CVs and Taxonomy
Benefits
Greater precision in search and retrieval
Allows for faceted browsing
Facilitates aggregation of content
Clearly defines relationships between things
Limitations
Initial costs
Upkeep
Can spiral out of control
May be too complex for some organizations
+What is taxonomy used for in the web world?
Search and retrieval
Faceted browsing
Aggregation of content
Internal organization of assets
+Developing a taxonomy
Strategy and planning
Choosing style and method
Determine classes and relationships
Gather terms and organize
Add terms and relationships
Review and approval
+Strategy and Planning
Identify business case
ROI
Money saved
Money earned
Scope
Use cases
Front-end
Back-end
Approval
Wireframes and functional specification
+Choose Style and Method
Method
Top down
Bottom up
Styles
CV
Synonym ring
Authority file
Facets
Taxonomy
Thesaurus
Ontology
+Determine Classes and Relationships
Classes
As few as necessary
Relationships between terms
As few as necessary
With a taxonomy, determine the nature of the hierarchy
(e.g., "type of")
With a thesaurus, use the predefined relationships, though you may not want to use all of them
With an ontology, determine the complex relationships
+Gather Terms and Organize
Research
Competitive analysis
Identify existing external CVs that might be reused (e.g., SIC codes)
Meet with stakeholders
Get as much input as possible
Stick to the business case (guards against the spiraling problem)
You are the final decision maker
Terms must conform to the structure decided upon; otherwise, mass chaos
Always keep use cases in mind
+Add Terms and Relationships
Things to keep in mind:
Synonyms, misspellings, special characters
Homonyms
Different database identifiers or different names
Shower (baby and bathroom)
Duplicates
Technical considerations if the duplicates have different children
Breads as a main ingredient or as a dish
Bruschetta (dish, but not main ingredient)
Descriptions
Identifying duplicates or notes regarding the application to content
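The homonym point above can be sketched by giving each term its own identifier, so two terms may share a label without colliding. The IDs, parent names, and descriptions below are invented for illustration; TERMS and find_by_label are hypothetical names.

```python
# Hypothetical identifiers; two terms share a label but not an ID or parent.
TERMS = {
    "t-001": {"label": "Shower", "parent": "Baby",
              "description": "Baby shower (party)"},
    "t-002": {"label": "Shower", "parent": "Bathroom",
              "description": "Bathroom fixture"},
    "t-003": {"label": "Bruschetta", "parent": "Dish",
              "description": "Dish, not a main ingredient"},
}

def find_by_label(label):
    """Return every term ID whose label matches, ignoring case."""
    wanted = label.casefold()
    return sorted(
        tid for tid, term in TERMS.items()
        if term["label"].casefold() == wanted
    )

print(find_by_label("shower"))  # ['t-001', 't-002']
```

A lookup that returns more than one ID is a signal to fall back on the description or parent to pick the right term.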
+Review and Approval
Thorough review by all stakeholders
This can take several sessions if the taxonomy is big
Final approval and sign-off
Critical for buy-in
+Taxonomy and Tagging Tools
Relational databases
FileMaker Pro
Microsoft Access
MySQL
Content management software
Drupal
SharePoint
Proprietary applications
Thesaurus and taxonomy tools
Open source
Protégé
Commercial
SchemaLogic (Thesaurus)
TopBraid Composer (Ontologies), Pro
Auto categorization and text mining
Data Harmony MAIstro, Nstein
+Tagging the Content
Manual
Good for small, controlled sets of documents
Highly accurate
Time consuming
Automated
Good for large unwieldy sets of documents
Fast and getting more accurate daily
Expensive, 3rd party apps
Hybrid
Manual – content or document creators insert valuable metadata
Automated – other data extracted and matched to taxonomy
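A rough sketch of the hybrid approach, assuming the automated step is simple whole-phrase matching against known surface forms (commercial auto-categorizers do far more). TERM_LOOKUP, auto_tags, and hybrid_tags are hypothetical names, and the sample sentence is invented.

```python
import re

# Hypothetical lookup from lower-cased surface form to preferred term.
TERM_LOOKUP = {
    "mergers and acquisitions": "Mergers and acquisitions",
    "m and a": "Mergers and acquisitions",
    "m&a": "Mergers and acquisitions",
}

def auto_tags(text):
    """Return preferred terms whose surface forms occur as whole phrases."""
    lowered = text.lower()
    found = set()
    for surface, preferred in TERM_LOOKUP.items():
        # Lookarounds stop a form from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(preferred)
    return found

def hybrid_tags(text, manual):
    """Combine tags entered by the author with automatically matched tags."""
    return sorted(set(manual) | auto_tags(text))

doc = "The M&A team reviewed the filing."
print(hybrid_tags(doc, manual=["HR"]))
# ['HR', 'Mergers and acquisitions']
```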
+Real World Application of Taxonomy for Records Management
Classifying
Storing and retrieving
Securing
Archiving or destroying
+Real World Applications
CV
List of Departments (HR, IT, Marketing)
Synonym rings
Mergers and acquisitions = M and A = M&A
Authority File
(PT) Mergers and acquisitions
Syn: M and A, M&A
Facets
Authors, Departments, Security Level
Taxonomy/Thesaurus
Organizational chart
Investment Bank Director
SVP Investments
EVP Investments
Investment Analyst
Ontology
Relationships between affiliations and departments/industries
ARMA (isProfessionalAssn) for Records Managers
+What could it be used for in your world?
http://www.yutope.com/2008/07/is-your-email-inbox-overflowing/
+Industry standards
Taxonomy specific
Dublin Core (DC)
Thesaurus construction
ANSI/NISO Z39.19
ISO 2788; ISO 5964
Ontology development
W3C
Resource Description Framework (RDF)
Web Ontology Language (OWL)
Records Management specific
Metadata management
ISO 23081-1
ISO 23081-2
+Questions?
+My contact info
Barbara McGlamery
Taxonomist
Martha Stewart Living Omnimedia
(212)827-8817