taxonomy fundamentals workshop
Post on 18-Oct-2014
DESCRIPTION
Opening presentation for Track 1 of the 2012 Taxonomy Boot Camp, October 16, 2012. Presented by Marjorie M.K. Hlava of Access Innovations and Heather Hedden of Hedden Information Management.
TRANSCRIPT
Taxonomy Fundamentals Workshop
Taxonomy Boot Camp, October 16, 2012, Washington, DC
Marjorie Hlava, President
Access Innovations, Inc.
www.accessinn.com
Heather Hedden
Hedden Information Management
www.hedden-information.com
Introductions
Marjorie Hlava
President, Access Innovations, Inc.
Heather Hedden
Taxonomy Consultant, Hedden Information Management
Author, The Accidental Taxonomist
Outline
• The basics – 30 minutes
• More details: Polyhierarchies and Facets – 30 minutes (including exercises)
• “Taxonomatch” – 15 minutes
• Implementation and applications – 15 minutes
• Q & A
© 2012. Access Innovations, Inc. All Rights Reserved.
The Basics – 30 minutes
• What is a taxonomy?
• What are the parts of a taxonomy?
• How do you build one?
• Guidelines for the terms
• Subject Matter Experts (SMEs)
• 40 slides
What is a Taxonomy? ANSI/NISO Z39.19-2005
“A collection of controlled vocabulary terms organized into a hierarchical structure.”
controlled
Missing: equivalence, associative relationships, and notes
Yes!
The Semantic Road Map: Knowledge Organization Systems
(Figure: knowledge organization systems on a scale from complex to simple)
• Complex, high value: linked entities, contextual specificity
– Semantic network
– Ontology
– Thesaurus
– Taxonomy
– Controlled vocabulary
– Synonym set/ring
– Name authority file
– Uncontrolled list
• Simple, low value: unrelated entities, ambiguity
• Uncontrolled list: highest cost over time!
Basic features - The term record
• Main Term (MT) = subject term, heading, node, category, descriptor, class
• Top Term (TT)
• Broader Terms (BT)
• Narrower Terms (NT)
• Related Terms (RT)
– See also (SA)
• Non-Preferred Term (NP)
– Used for (UF), See (S)
– Synonyms
• Scope Note (SN)
• History (H)
TAXONOMY
THESAURUS
ONTOLOGY
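As a minimal sketch, a term record with the fields listed above could be modeled as follows. The class name `TermRecord`, its attribute names, and the `display` helper are illustrative assumptions rather than the data model of any particular thesaurus tool; the sample values reuse the Politics / Elections hierarchy shown elsewhere in this workshop.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermRecord:
    """One term record; attribute names are hypothetical, tags follow the slide."""
    main_term: str                                            # MT
    top_term: Optional[str] = None                            # TT
    broader: List[str] = field(default_factory=list)          # BT
    narrower: List[str] = field(default_factory=list)         # NT
    related: List[str] = field(default_factory=list)          # RT / See also
    non_preferred: List[str] = field(default_factory=list)    # NP / UF
    scope_note: str = ""                                      # SN
    history: str = ""                                         # H

    def display(self) -> str:
        """Render the record in a tagged, indented layout."""
        lines = [self.main_term]
        if self.top_term:
            lines.append(f"  TT {self.top_term}")
        tagged = (("BT", self.broader), ("NT", self.narrower),
                  ("RT", self.related), ("UF", self.non_preferred))
        for tag, values in tagged:
            lines.extend(f"  {tag} {value}" for value in values)
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        if self.history:
            lines.append(f"  H {self.history}")
        return "\n".join(lines)


record = TermRecord(
    main_term="Elections",
    top_term="Politics",
    broader=["Politics"],
    narrower=["Presidential elections", "Gubernatorial elections", "Mayoral elections"],
)
print(record.display())
```

A taxonomy would typically populate only the hierarchical fields; a thesaurus would also fill in the related terms, non-preferred terms, and scope note.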
Taxonomy? Thesaurus?
• Often used interchangeably
• Thesaurus is a taxonomy with extras
– Related Terms
– Non-preferred Terms (USE/Used for)
– Scope Notes
– More
• Taxonomies often have the actual information object at the final node.
• CMS and SharePoint tend to support only the hierarchical view, definition, and USE
Taxonomy view
Thesaurus Term Record view
How do you build a taxonomy?
• Define subject field
• Collect terms
• Organize terms
• Fill in gaps
• Flesh out and interrelate terms
• Apply to your data
You’re done!
Define subject field
• Review representative collection of content
• Determine:
– Core areas
– Peripheral topics
(Figure labels: Psychology, Education, Sociology, Law)
• Scope can be modified later
Build, buy, augment?
• Survey existing thesaurus/taxonomy resources for your domain
• Test for
– Scope
– Depth
– Make-or-break terms
– Cost
• Adoption of existing taxonomies
– Term registries
– Taxobank
– Taxonomy Warehouse
– Other resources
Don’t reinvent the wheel!
Foundations
• Start with what is known
• Build from there
• Use the literature, your data
• Use internal lists
• Built-in continuous review throughout the process, and beyond
• Who is involved?
– Taxonomists
– Subject matter experts
– Project management
– Users
Collect terms
• Your documents and databases
• Departmental terminology
• Textbooks and their indexes
• Book tables of contents and indexes
• Journal quarterly indexes
• Encyclopedias
• Lexicons, glossaries on the topic
• Web resources
• Users and experts
• Search logs
Gather terms from search logs
• Top 100 search terms from search logs
• Terms used more than 50 times
• Match to website with appropriate answer
• Basis for favorites or best bets, presented at the top of results list
• Behavior-based taxonomy
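The first two bullets amount to a frequency count over the search log. A rough sketch, assuming the log is available as a plain list of query strings (the function name `candidate_terms` and the sample log are hypothetical):

```python
from collections import Counter


def candidate_terms(queries, top_n=100, min_count=50):
    """Return (query, count) pairs: the top_n queries used more than min_count times."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [(query, n) for query, n in counts.most_common(top_n) if n > min_count]


# Hypothetical search log: one entry per query typed by a user.
search_log = ["taxonomy"] * 60 + ["Thesaurus "] * 51 + ["ontology"] * 50
print(candidate_terms(search_log))
```

Queries are normalized for case and surrounding whitespace before counting so that trivial variants are tallied together; matching each surviving query to a best-bet page remains an editorial step.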
How do you choose terms?
• Importance in the subject area
• Use in the literature, by the organization or community
• Necessary degree of specificity or detail
• Relationship with other controlled vocabularies
• Single concept = single term
One term / one concept
• Terms represent simple or unitary concepts
• A unit of thought
• May be a single-word term
• A multiword term may be required to represent the concept
• Three main categories
– Concrete entities
– Abstract concepts
– Proper nouns
“A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.”
Concrete entities as terms
• Things and their physical parts
– Birds
  • Feathers
– Buildings
  • Floors
• Materials
– Cement
– Wood
– Lead
– Cards and Chips
Abstract concepts as terms
• Actions and events
– evolution, skating, management, ceremonies
• Abstract entities
– law, theory
• Properties of things, materials, and actions
– strength, efficiency
• Disciplines and sciences
– physics, meteorology, mathematics
• Units of measurement
– pounds, kilograms, miles, meters, nanoseconds
Proper nouns as terms
• Individual entities – “classes of one” – expressed as proper nouns – San Francisco, Lake Michigan
Thesaurus standards exclude proper names, persons, and trade names, which belong in authority files. Taxonomies include them as final nodes.
Organize terms – roughly
• Sort terms into several major categories – logical groups of similar concepts as Top Terms
– Identify core areas and peripheral topics
– 10 – 20 to start
– Consider moving proper names to authority files
• Result: loose collection of terms under several main headings
– Rough and tentative – see how it fits as you go
– Initial gap analysis
– Add / modify / delete as needed
How do terms relate?
• Hierarchical relationships
– Parents and their children
• Equivalence relationships
– Aliases
• Associative relationships
– Cousins
– See also’s
TAXONOMY
THESAURUS
Hierarchical relationships
• Broader Term represents the class, whole, or genus
• Narrower Term is a member, part, or species
– Generic relationship
– Whole-part relationship
– Instance relationship
• NTs inherit all the BT characteristics
• BTs/NTs have a reciprocal relationship
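One way to keep BT/NT links reciprocal is to record both directions in a single operation. The sketch below is an assumption about how that might be coded, not a description of any thesaurus software; the class `Hierarchy` and its methods are hypothetical names.

```python
from collections import defaultdict


class Hierarchy:
    """Stores BT/NT links so that each one always has its reciprocal."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> its Broader Terms
        self.narrower = defaultdict(set)   # term -> its Narrower Terms

    def add_bt(self, term, broader_term):
        """Record 'term BT broader_term' and the reciprocal NT link."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def ancestors(self, term):
        """All terms above `term`; an NT inherits the characteristics of each."""
        found, stack = set(), [term]
        while stack:
            for parent in self.broader.get(stack.pop(), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


h = Hierarchy()
h.add_bt("Elections", "Politics")
h.add_bt("Presidential elections", "Elections")
print(sorted(h.narrower["Politics"]))
print(sorted(h.ancestors("Presidential elections")))
```

Because `add_bt` is the only way links are created, a BT can never exist without its matching NT.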
Broader to narrower terms
Politics
  Elections
    Presidential elections
    Gubernatorial elections
    Mayoral elections
Hierarchy – Whole-part relationship
• Four general types
– Body systems and organs
  • Ear › Middle ear
– Geographical locations
  • Bernalillo County › Albuquerque
– Fields of study
  • Geology › Physical geology
– Hierarchical social structures
  • Ontario › Manitoulin District
Hierarchy – Instance relationship
• General category (common noun) as BT, with individual example (proper noun) as NTI (Narrower Term Instance)
Seas
  Baltic Sea
  Caspian Sea
  Mediterranean Sea
French cathedrals
  Chartres Cathedral
  Rheims Cathedral
  Rouen Cathedral
Essentially identical to “final node” in taxonomies
Polyhierarchical relationship
• Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)
• Part of ISO standards, new to ANSI/NISO
Nurses › Nurse administrators
Health administrators › Nurse administrators
Finance › Accounting
Careers › Accounting
Generic relationship test – 1
• Both terms in same fundamental category
• “All-and-some” test
– Rodents / Squirrels: SOME rodents are squirrels; ALL squirrels are rodents
– Pests / Squirrels: SOME pests are squirrels; NOT ALL squirrels are pests
Inheritance or inclusion – what’s true of the parent (BT) is true for all children (NTs)
Generic relationship test – 2
(Figure: Squirrels, Rodents, Pests)
ALL squirrels are rodents
x NOT ALL squirrels are pests
x NOT ALL pests are rodents
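The all-and-some test can be pictured as a proper-subset check on class membership. The sketch below assumes each class can be listed as a set of members; the member lists are made up for illustration, and `passes_all_and_some` is a hypothetical helper name.

```python
def passes_all_and_some(narrower_members, broader_members):
    """True when ALL narrower members belong to the broader class,
    while only SOME broader members belong to the narrower class."""
    narrower_members = set(narrower_members)
    broader_members = set(broader_members)
    # '<' on sets is the proper-subset test: contained in, but not equal to.
    return bool(narrower_members) and narrower_members < broader_members


# Illustrative memberships only; a real taxonomist reasons about concepts, not lists.
squirrels = {"gray squirrel", "red squirrel", "flying squirrel"}
rodents = squirrels | {"house mouse", "brown rat"}
pests = {"gray squirrel", "house mouse", "brown rat", "cockroach"}

print(passes_all_and_some(squirrels, rodents))  # Squirrels BT Rodents
print(passes_all_and_some(squirrels, pests))    # Squirrels BT Pests
print(passes_all_and_some(pests, rodents))      # Pests BT Rodents
```

Only the first pairing passes, which matches the slide: squirrels can sit under Rodents, but neither Squirrels under Pests nor Pests under Rodents is a valid generic relationship.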
Equivalence relationship
• Preferred Term
– Thesaurus term and valid for indexing
– Thesaurus notation: USE
• Non-Preferred Term
– Not valid for indexing
– An alias or imposter
– Entry point, directs user to Preferred Term
– Thesaurus notation: UF or NPT
Spiders
  UF Arachnids
Plant pathology
  USE Phytopathology
Equivalence – when to use
• Synonyms, slang, quasi-synonyms
• Scientific and trade names
– Ibuprofen UF Motrin™
• Lexical variants
– Fiber optics UF Fibre optics
– Mouse UF Mice
• Upward posting of narrow concepts not specified in taxonomy or thesaurus
– Social class UF Elite, Middle class, Working class
Get equivalent terms from search logs, brainstorming…
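In practice, UF entries are inverted into a USE lookup so that any entry term leads to its preferred term. A minimal sketch using the examples from this slide; the names `USED_FOR`, `build_use_index`, and `resolve` are assumptions for illustration.

```python
# Preferred term -> its non-preferred (UF) entry terms, taken from the slide.
USED_FOR = {
    "Ibuprofen": ["Motrin"],
    "Fiber optics": ["Fibre optics"],
    "Mouse": ["Mice"],
    "Social class": ["Elite", "Middle class", "Working class"],
}


def build_use_index(used_for):
    """Invert UF lists into a case-insensitive 'entry term -> preferred term' map."""
    index = {}
    for preferred, variants in used_for.items():
        index[preferred.casefold()] = preferred
        for variant in variants:
            index[variant.casefold()] = preferred
    return index


USE = build_use_index(USED_FOR)


def resolve(entry, use_index=USE):
    """Return the preferred term for an entry term, or None if it is unknown."""
    return use_index.get(entry.strip().casefold())


print(resolve("mice"))
print(resolve("Working class"))
```

The last UF entry shows upward posting: three narrow concepts all resolve to the single broader preferred term.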
Associative relationship
• Related Terms (RTs) – cousins
• “…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
• Both valid for indexing
• Reciprocal relationship with each other
• Expands user’s awareness, reflects thesaurus coverage of unanticipated areas
• Main basis for the ontology
• 14 main options offered in Z39.19
Scope Notes (SN)
• Indicate meaning of the term in the context of this thesaurus, for this audience
– Stress – Mental, Psychological, Physiological
• Could be the definition or glossary
• Indicate any restriction in meaning
• Indicate range of topics covered
• Provide direction for indexers; for terms often confused, may suggest an alternative term
• Use as needed – may not be for every term
• Use a style guide
• Be concise
Stating the terms
• Term format
• Grammatical issues
• Singular and plural forms
• Spelling
• Abbreviations and acronyms
• Capitalization
• Other punctuation
• Consistency
Term format
• KISS – Keep it short and simple
– 1-2-3 words
– Effect on search
– Pre- and Post-Coordination
• Establish a policy – follow Chicago Manual of Style
• Grammatical issues
– Nouns and noun phrases
– Verbs: use gerunds
– Adjectives – no
– Adverbs – no
– Initial articles – no
Compound terms – nope!
• “Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard)
• “Compound terms should be factored (split) into simple elements…” (ANSI/NISO standard)
• Term phrases are okay (bigrams)
– Adjective-Noun
– American history
• Two concepts combined are not
– Aromatherapy for bloating
Pre and post coordinate terms
• Pre-coordinate – two concepts
– Subject headings – Library of Congress
  • American history – Civil War
– Back of the book
– Put together in advance by the publisher
• Post-coordinate
– Taxonomy terms
– Single concept
– Put together by the user / searcher
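Post-coordination can be sketched as a set intersection performed at search time: each single-concept term has its own list of tagged documents, and the searcher combines terms on the fly. The document IDs and the `post_coordinate` helper below are hypothetical.

```python
# Hypothetical postings: single-concept term -> IDs of documents tagged with it.
INDEX = {
    "American history": {"doc1", "doc2", "doc3"},
    "Civil War": {"doc2", "doc3", "doc4"},
    "Aromatherapy": {"doc5", "doc6"},
    "Bloating": {"doc6"},
}


def post_coordinate(*terms, index=INDEX):
    """Documents tagged with every one of the given single-concept terms."""
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


print(sorted(post_coordinate("American history", "Civil War")))
print(sorted(post_coordinate("Aromatherapy", "Bloating")))
```

With this approach the taxonomy needs no compound term such as "Aromatherapy for bloating"; the combination is made by the user rather than fixed in advance by the publisher.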
So far you’ve got
• Hierarchy
– Broader and Narrower Terms
– Polyhierarchies when needed
• Preferred/Non-Preferred Terms
– Equivalence relationships
• Related Terms
– Associative relationships
• Scope Notes
• Complete term records
– Correct term format
Review, edit, test, edit, use, edit, and maintain, i.e. edit
• Review
– Users
– Expert reviewers
• Test
– Index 500+ documents (more for variable writing style; fewer for strict style)
– Monitor search log
• Edit and maintain
– Add term
– Change existing term
– Change term status
– Delete term
– Add term relationship
– Delete term relationship
– Add/modify Scope Note
– Change overall structure
Consider automated / assisted indexing software
Subject Matter Experts
• Work first from the literature
• Establish literary warrant for terms
• Let someone else do the clerical work
• Differentiate the lexicography work
– From the Subject Matter Expert work
• Let SMEs do the review and tailoring
• Expert review ensures the proper term use and application
• Advisory Board…advisable!
© 2012 Hedden Information Management
More Details
Polyhierarchies Facets
Polyhierarchies
(Diagram: Hierarchy vs. Polyhierarchy. Each tree shows a Term with Child Term 1 and Child Term 2 and their Grand-children; in the polyhierarchy, a Grand-child sits under more than one Child Term.)
Polyhierarchies
• A term has a polyhierarchy if it has more than one broader term.
• Polyhierarchy is permitted if the hierarchical relationship is valid in both/all cases
• Remember “All-and-Some” test for each generic hierarchical relationship
Polyhierarchies
Based on generic relationship
Motor vehicles › Trucks › Light trucks
Motor vehicles › Cars › Light trucks
Professions › Educators › Music Teachers
Professions › Musicians › Music Teachers
Polyhierarchies
Based on different kinds of hierarchical relationships / different means of categorizing (less common)
United States › Utah › Great Salt Lake
Bodies of Water › Lakes › Great Salt Lake
Polyhierarchy - Pluses
Polyhierarchy is useful when…
• It is obviously logical for select terms (cross-overs/hybrids, e.g. Music teachers or Light Trucks)
• It is indicated by different stakeholder views
• Indexers/taggers browse the taxonomy hierarchically
• End-user testing/input (e.g. card-sorting) indicates users are split as to where in the hierarchy an item belongs
Polyhierarchy - Pluses
Retail website case study example:
Health & Fitness › Portable Fitness Electronics › Fitness GPS Watches
Car, Marine & GPS › GPS Navigation › Handheld GPS › Fitness GPS Watches
Sports taxonomy case study example:
Back Exercises
› Dead Lifts
Hamstring Exercises
› Dead Lifts
Polyhierarchy - Minuses
Polyhierarchy is not so good when…
• It violates hierarchical relationship standards
• It becomes excessive, perhaps more common than mono-hierarchies
• It is the result of different kinds of categorization, and the presence of different kinds of categorization is confusing
• It is a small taxonomy and the user doesn’t need or expect polyhierarchy
Polyhierarchy - Minuses
Problems with excessive polyhierarchies:
• Familiar tree structure is lost. Users cannot see the logical hierarchy.
• Users spend too much time clicking through categories.
Polyhierarchy - Minuses
Logical polyhierarchies, if done consistently, could become extensive.
Example: creating polyhierarchies for products based on different classifications
Tableware › Wine Glasses
Glass Products › Wine Glasses
Balls › Soccer Balls
Soccer Equipment › Soccer Balls
Polyhierarchy - Minuses
Multiple, potentially confusing categorizations:
• Place names in hierarchies for both geographic location and for place type
• Products in hierarchies for both material and for use
• Physical exercises in hierarchies for both body part and purpose/type (strength, endurance, etc.)
“It’s OK, we can have polyhierarchies”
• This is not always the best solution.
• Maybe facets should be used instead.
Polyhierarchies - Cases
Violating hierarchical relationship standards
• Might be OK in some cases in some taxonomies
• But avoid overuse in polyhierarchies
Case study example:
• Accessories as a narrower term to a product category
• Services as a narrower term to a product category
Computers & Tablets
  Laptop & Netbook Computers
  Tablets, iPads & E-Readers
  Desktop & All-in-One Computers
  Monitors
  Mice & Keyboards
  Printers
  Hard Drives & Storage
  Computer Memory
  Video Cards & PC Components
  Networking & Wireless
  Software
  Computer Accessories
  Computer Setup & Services
Polyhierarchies - Cases
Violating hierarchical relationship standards within limits
Computers & Tablets
  Laptop & Netbook Computers
    PC Laptops
    MacBooks
    Chromebooks
    Netbooks
      All Netbooks
      Netbook Cases
      Computer Setup & Services (Not OK)
    Laptop Accessories
    Computer Setup & Services (OK)
  Desktop & All-in-One Computers
    All-in-One Computers
    Towers Only
    Desktop Packages
    Computer Setup & Services (OK)
Polyhierarchies - Cases
Do not create a polyhierarchy to both a “parent” and a “grandparent.”
Cameras (grandparent of Digital SLR Cameras)
  Digital Cameras (parent of Digital SLR Cameras)
    Digital SLR Cameras
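This rule lends itself to an automated check: a direct broader term is redundant if it is also an ancestor of another direct broader term. A sketch under the assumption that BT links are held in a simple dictionary; `redundant_broader_terms` and `BROADER` are hypothetical names.

```python
def redundant_broader_terms(term, broader):
    """Direct BTs of `term` that are also ancestors of another direct BT."""

    def ancestors(start):
        found, stack = set(), [start]
        while stack:
            for parent in broader.get(stack.pop(), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    direct = set(broader.get(term, ()))
    redundant = set()
    for parent in direct:
        redundant |= direct & ancestors(parent)
    return redundant


# term -> its direct Broader Terms
BROADER = {
    "Digital SLR Cameras": {"Digital Cameras", "Cameras"},  # parent AND grandparent
    "Digital Cameras": {"Cameras"},
    "Light trucks": {"Trucks", "Cars"},                     # a legitimate polyhierarchy
    "Trucks": {"Motor vehicles"},
    "Cars": {"Motor vehicles"},
}

print(redundant_broader_terms("Digital SLR Cameras", BROADER))
print(redundant_broader_terms("Light trucks", BROADER))
```

The first call flags Cameras as the link to drop; the second returns an empty set because Trucks and Cars are siblings, not ancestor and descendant.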
Polyhierarchies - Cases
Might be better not to have polyhierarchies when the taxonomy is small and the top-level categories are few
Case study: Client management documents of a financial services company have 114 topical terms categorized with just five broader terms:
• Account Information
• Client Information
• Client Status
• Disclosures & Notifications
• Approvals/Guidance
Decided against polyhierarchies.
Reason: Repeat users can memorize the small hierarchy. They don’t expect polyhierarchy here.
Polyhierarchies - Conclusions
Some is good. More isn’t necessarily better.
• Polyhierarchies are best for isolated terms that can fall into two categories.
• Polyhierarchies can become too many in cases of overlays of two different categorization methods for numerous terms. (Facets may be better.)
• Polyhierarchies are useful, no matter how extensive, in term-focused thesauri
• Polyhierarchies should be more limited in fully displayed taxonomies
Polyhierarchies - Exercise
Propose two broader terms for each:
• Hotel managers
• Printers
• Fish
• Egypt
• Bill Gates
Facets
• For serving faceted classification, which allows the assignment of multiple classifications to an object
• A “dimension” of a query; a type of concept
• Intended for searching with multiple terms in combination (post-coordination), one from each facet
• Can be for topics or for named entities, but generally not both
• Reflect the domain of content
• A subset of metadata fields
Facets
Faceted Classification
Mathematician/librarian S.R. Ranganathan (1920s) developed it as an alternative to the Dewey Decimal System for books: “Colon Classification”
1. Personality – topic or orientation
2. Matter – things or materials
3. Energy – actions
4. Space – places or locations
5. Time – times or time periods
Facets
Facets are suitable for:
• Structured data with discernible metadata fields or database records
• Homogeneous data with similar types of characteristics (e.g. products in an e-commerce site)
Example types of facets:
• For products: category, brand, size, color, price range, features
• For people: name, job title, gender, birth year, location, department
• For reports: author, subject, audience, document type, language
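To make the idea concrete, the sketch below filters a small set of product records by one value per facet and then counts what remains in another facet, as a faceted navigation panel might. The product data and the function names `facet_filter` and `facet_counts` are invented for illustration.

```python
from collections import Counter

# Hypothetical product records; each key other than "name" is a facet.
PRODUCTS = [
    {"name": "Trail shoe", "category": "Shoes", "brand": "Acme", "color": "Blue"},
    {"name": "Road shoe", "category": "Shoes", "brand": "Zenith", "color": "Blue"},
    {"name": "Rain jacket", "category": "Jackets", "brand": "Acme", "color": "Red"},
]


def facet_filter(items, **selected):
    """Keep the items that match one chosen value in each selected facet."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


def facet_counts(items, facet):
    """Count the remaining items per value of a facet (for display beside each value)."""
    return Counter(item[facet] for item in items if facet in item)


blue_shoes = facet_filter(PRODUCTS, category="Shoes", color="Blue")
print([product["name"] for product in blue_shoes])
print(facet_counts(blue_shoes, "brand"))
```

This works only because every record carries the same fields, which is the "homogeneous data" condition stated above.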
Facets
For enterprise taxonomies: Patrick Lambe, Organising Knowledge
• People and organizations
• Things and parts of things
• Activity cycles
• Locations
For Web sites: Rosenfeld and Morville, Information Architecture
• Topic
• Product
• Document type
• Audience
• Geography
• Price
Facet Examples
1. Shoebuy.com - advanced search
http://www.shoebuy.com/s.jsp/r_as
2. My Recipes
http://search.myrecipes.com
3. Microbial Life Educational Resources
http://serc.carleton.edu/microbelife/resources
My Recipes
Facets & Hierarchies
Combining Facets and Hierarchies
1. Have hierarchies within facets
2. Start with hierarchical categories and then limit further with facets
Facets & Hierarchies
1. Hierarchies within facets: indented display
World Bank documents advanced search
http://documents.worldbank.org/curated/en/docadvancesearch
Facets & Hierarchies
2. Hierarchies of topics, then facets to narrow results:
ThomasNet business directory
http://ps.thomasnet.com/productsearch
Buzzillions product reviews
http://www.buzzillions.com
Amazon.com books browse
http://www.amazon.com
Taxonomy Structures: Hierarchies
One level per web page
Yahoo directory
http://search.yahoo.com/dir
ThomasNet browse
http://www.thomasnet.com/browse
Buzzillions
Amazon > Books
Facets - Conclusions
Advantages
• Supports more complex search queries by users
• Allows users to control the search refinement, narrowing or broadening in any manner or order
Disadvantages
• Only suitable for somewhat structured, unified types of content that share the same multiple facets
• Might not support multiple terms selected at once from the same facet
• Often hidden from users under “Advanced Search”
• Requires investment in thorough (multifaceted) indexing/tagging
Facets - Conclusions
Facet Design Tips
• Number of facets: 4-8, with 5-6 as ideal
• Facets listed in logical, not alphabetical order
• Number of terms per facet: 2-25
– Ideally not much more than can be viewed in a scroll box
– If the list is obvious (US states), then more is OK.
– Exception can be made for hierarchical “Topics” facet
• If <12 terms, then a logical display order
• If >12 terms, then alphabetical
• A two-level hierarchy (indented) within a facet is possible
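The numeric guidelines above could be turned into a simple advisory check. This is only a sketch: `check_facet_design` is a hypothetical helper, its warnings are prompts for review rather than hard rules, and it does not model the stated exceptions (obvious lists such as US states, or a hierarchical "Topics" facet).

```python
def check_facet_design(facets):
    """`facets` maps each facet name to its list of terms in display order."""
    warnings = []
    if not 4 <= len(facets) <= 8:
        warnings.append(f"{len(facets)} facets (guideline: 4-8)")
    for name, terms in facets.items():
        if not 2 <= len(terms) <= 25:
            warnings.append(f"{name}: {len(terms)} terms (guideline: 2-25)")
        if len(terms) > 12 and list(terms) != sorted(terms, key=str.casefold):
            warnings.append(f"{name}: more than 12 terms, list them alphabetically")
    return warnings


# Hypothetical facet sets for a product catalog.
good = {
    "Category": ["Shoes", "Jackets"],
    "Brand": ["Acme", "Zenith"],
    "Size": ["S", "M", "L"],
    "Color": ["Blue", "Red"],
    "Price range": ["Under $50", "$50 and up"],
}
bad = {
    "Color": ["Red", "Blue", "Green", "Yellow", "Black", "White", "Brown",
              "Gray", "Pink", "Orange", "Purple", "Beige", "Teal"],
    "Brand": ["Acme"],
}

print(check_facet_design(good))
for warning in check_facet_design(bad):
    print(warning)
```

Short lists such as Size are deliberately left in their logical order (S, M, L); only lists longer than 12 terms are checked for alphabetical order.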
Facets - Exercise
Designate a set of 4-7 facets for a tour operator web site selling vacation packages.
TAXONOMATCH
• Designed to enhance understanding and retention of the vocabulary concepts necessary for creating a taxonomy, ontology, thesaurus, or controlled vocabulary.
• Game supplies:
– 1 Deck of Orange Question and Challenge Cards
– 1 Deck of Green Answer Cards
• Game setup:
– Shuffle the deck of Green Answer cards
– Deal the entire deck to the players
– Shuffle the deck of Orange Question and Challenge cards
– Place them facedown in a pile in the middle of the table so that all players can reach the pile
• Reinforce what you just heard!
• Have fun!
1. Play moves to the left of the dealer
2. Draw a card from the top of the Orange cards. Read it aloud to all of the players.
3. The player who read the card says out loud what they think the answer is.
4. Each player looks at the Green Answer cards in their hand.
– If they have the correct answer to the Question or Challenge, they show their card to everyone at the table.
– If everyone agrees that the answer is correct, the player holding the correct answer card gives it to the player who read the Question or Challenge card.
5. The player places their associated pair of cards – one Orange Question and Challenge card and one Green Answer card – face up on the table in front of them.
6. Play passes to the person who held the correct Green Answer card in their hand. Play continues as in step 2 above.
7. Discussion among the players to arrive at the correct answer is permissible and encouraged!
8. If players do not arrive at a consensus regarding the correct answer, the Orange Question and Challenge card may be returned to the bottom of the pile, and play passes to the person to the left of the player who drew the previous card.
9. When all of the Orange Question and Challenge cards have been drawn, read aloud, and matched with their Green Answer cards, the game ends.
10. If there are any Orange Question and Challenge cards remaining to which players cannot agree on an answer, players may consult their notes or ask the session speaker.
TAXONOMATCH RULES
Implementation and applications
• Adding the terms to the information objects
• Search and other applications
• Taxonomy use cases – implementation
• Opportunities and Obstacles
• 30 minutes
Parts of the puzzle
• The taxonomy
– The words to use
– In the order you want the users to browse
• Applications
– Search, CMS, SharePoint, etc.
• Implementation / actions
– Making the links
– Adding terms to information objects
• Most people confuse the parts, and they act very differently
The Workflow
(Diagram, labeled “Fully integrated with MOSS”)
Steps:
1. Gather source data
2. Tag and create metadata
3. Put in database with tags
4. Build search inverted index
5. Create user interface
Components shown:
• Client Data: full text, HTML, PDF, data feeds, etc.
• Client Taxonomy
• Thesaurus Master
• Machine Aided Indexer (M.A.I.™)
• Inline Tagging
• Metadata and Entity Extractor
• Automatic Summarization
• Database / Repository
• Search Software
• Search Presentation Layer: browse by subject, auto-completion, broader terms, narrower terms, related terms
• Increases accuracy
Adding terms to information objects
• Part of the record
– XML
– MARC
• A relational table pointing the terms to a record ID number (Secondary key)
• Adding data to the HTML
– META NAME KEYWORD Element
• Many other options
Part of the record - XML
• Added as an element in the XML record
• Need an element to put the data in
– <TaxonomyTerm>
• Capture the terms when creating the records
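A short sketch of adding terms as elements of an XML record, using Python's standard `xml.etree.ElementTree`. The record layout (`Record`, `Title`) and the helper `add_taxonomy_terms` are assumptions for illustration; the element is named `TaxonomyTerm`, written without a space because XML element names cannot contain spaces.

```python
import xml.etree.ElementTree as ET


def add_taxonomy_terms(record, terms):
    """Append one <TaxonomyTerm> child element per indexing term."""
    for term in terms:
        ET.SubElement(record, "TaxonomyTerm").text = term
    return record


# Hypothetical record; a real schema would define where the element belongs.
record = ET.fromstring("<Record><Title>Mayoral elections in 2012</Title></Record>")
add_taxonomy_terms(record, ["Elections", "Mayoral elections"])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

One repeated element per term, rather than a single delimited string, keeps each term individually addressable for search and faceting.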
Editorial Workflow Integration
Author Submission Module
The author fills in the data to the document template, attaching images and graphs as necessary
An API calls Data Harmony and generates a list of indexing terms based on the content
Authors review the indexing and may change it
Content is stored into a data repository as HTML, XML, etc.
In the HTML record
• Makes it crawlable for the Internet
• Used in CMS applications
– Content Management Systems
• Add to the HTML
– Manually
– In Dreamweaver
– In your CMS like Ektron
• Author Submissions Example
• Do the same with SharePoint
META NAME “KEYWORDS”
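For illustration, the keywords element could be generated from a list of taxonomy terms as below. The helper `keywords_meta` is a hypothetical name; it escapes the terms so that characters such as "&" or quotation marks do not break the HTML attribute.

```python
from html import escape


def keywords_meta(terms):
    """Build an HTML keywords meta element from taxonomy terms."""
    content = ", ".join(terms)
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


print(keywords_meta(["Elections", "Mayoral elections"]))
print(keywords_meta(["Mice & Keyboards"]))
```

The resulting element belongs in the page's head section, whether it is added manually or by a CMS template.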
In Relational Database Table
• Primary key – the record
• Secondary key – all the metadata
– Like taxonomy terms
– Like author
– Like publication date
• Used in Oracle, SQL, etc.
– Need field to put the taxonomy data in
• Supports “Faceted Search”
– Each item in a separate field or element or table
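A minimal sketch of this arrangement using Python's built-in `sqlite3` module: one table for the records and a second table that points taxonomy terms to record IDs. The table names, column names, and sample rows are assumptions for illustration, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (
    record_id INTEGER PRIMARY KEY,      -- primary key: the record
    title     TEXT NOT NULL
);
CREATE TABLE record_term (              -- one row per (record, taxonomy term)
    record_id INTEGER NOT NULL REFERENCES record(record_id),
    term      TEXT NOT NULL,
    PRIMARY KEY (record_id, term)
);
""")
conn.executemany("INSERT INTO record VALUES (?, ?)",
                 [(1, "Mayoral races in 2012"), (2, "Presidential debates")])
conn.executemany("INSERT INTO record_term VALUES (?, ?)",
                 [(1, "Elections"), (1, "Mayoral elections"),
                  (2, "Elections"), (2, "Presidential elections")])


def records_tagged(term):
    """Titles of all records tagged with the given taxonomy term."""
    rows = conn.execute(
        "SELECT r.title FROM record r "
        "JOIN record_term rt ON rt.record_id = r.record_id "
        "WHERE rt.term = ? ORDER BY r.record_id", (term,))
    return [title for (title,) in rows]


print(records_tagged("Elections"))
print(records_tagged("Mayoral elections"))
```

Other metadata such as author or publication date would get their own columns or tables in the same way, which is what makes each one usable as a separate facet.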
Relational database diagram
Using taxonomies in applications
• Improve search
• Subject browsing
• Mobile intelligence
• Targeted resources based on subject or user role
• Link to society resources
• Author submission module
• Author authority database
• Expert reviewer identification
• Member profiles
• Data visualization
• More like this
• In “indexing” or categorizing, as subject metadata
• In content management systems
• In SharePoint
• In mashups
• In social networking sites
• In author tagging
• In filtering data – e.g., spam filters and RSS feeds
• In web crawlers
• Social media - community
Why does search fail?
• Most large organizations have 5 search software packages
– All disappointing and on the shelf
• Inconsistent results
• Unclear path to results
• Lack of single unified clear consistent vocabulary
• Not tied to data governance
– Taxonomy
– Other metadata
Parts of Search
• Search software
– Inverted Index
– Search algorithms
• Presentation layer
– Search box
– Autocompletion
– Related and narrower terms
– Hierarchical display
Creating an Inverted File Index
Sample DOCUMENT
Outline of Presentation
1 Define key terminology
2 Thesaurus tools
– Features
– Functions
3 Costs
– Thesaurus construction
– Thesaurus tools
4 Why & when?
Simple inverted file index
The terms from the “outline”
&, 1, 2, 3, 4, construction, costs, define, features, functions, key, of, outline, presentation, terminology, thesaurus, tools, when, why
Complex inverted file index
Placement location
& - Stop
1 - Stop
2 - Stop
3 - Stop
4 - Stop
construction - L7, P2, SH
costs - L6, P1, H
define - L2, P1, H
features - L4, P1, SH
functions - L5, P1, SH
key - L2, P2, H
of - Stop
outline - L1, P1, T
presentation - L1, P3, T
terminology - L2, P3, H
thesaurus - (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
tools - (1) L3, P2, H; (2) L8, P2, SH
when - L9, P3, H
why - L9, P1, H
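The index above can be reproduced with a few lines of code. The sketch below is one possible reading of the slide, under these assumptions: each entry records line number (L), word position within the line (P), and zone (T, H, or SH, taken here to mean title, heading, and subheading); stop words still occupy a position; and the outline's list numbers are left out of the text instead of being indexed as stop entries. The names `DOCUMENT`, `STOP_WORDS`, and `build_index` are hypothetical.

```python
import re

STOP_WORDS = {"&", "of"}

# The sample document as (zone, text) pairs, one per line.
DOCUMENT = [
    ("T", "Outline of Presentation"),
    ("H", "Define key terminology"),
    ("H", "Thesaurus tools"),
    ("SH", "Features"),
    ("SH", "Functions"),
    ("H", "Costs"),
    ("SH", "Thesaurus construction"),
    ("SH", "Thesaurus tools"),
    ("H", "Why & when?"),
]


def build_index(lines, stop_words=STOP_WORDS):
    """Map each non-stop word to a list of (line, position, zone) placements."""
    index = {}
    for line_no, (zone, text) in enumerate(lines, start=1):
        tokens = re.findall(r"[a-z]+|&", text.lower())
        for position, token in enumerate(tokens, start=1):
            if token in stop_words:
                continue  # stop words hold a position but are not indexed
            index.setdefault(token, []).append((line_no, position, zone))
    return index


index = build_index(DOCUMENT)
for word in sorted(index):
    print(word, index[word])
```

Dropping the placements and keeping only the sorted keys gives the simple inverted file; keeping them is what lets a search engine weight a hit in a title above a hit in a subheading.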
Improve search www.mediasleuth.com
Navigate the full taxonomy “tree”
BROWSE
Auto-completion using the taxonomy
Guide the user
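Auto-completion driven by a taxonomy can be approximated with a prefix match over both preferred and non-preferred terms, always suggesting the preferred form. This is a simplified sketch, not how any particular search product works; `autocomplete`, `TERMS`, and `USE` are hypothetical names, and the sample terms come from examples used elsewhere in this workshop.

```python
def autocomplete(prefix, preferred_terms, use_index, limit=10):
    """Suggest preferred terms whose own text, or one of whose entry terms,
    starts with the typed prefix."""
    typed = prefix.strip().casefold()
    if not typed:
        return []
    suggestions = {term for term in preferred_terms
                   if term.casefold().startswith(typed)}
    suggestions |= {preferred for entry, preferred in use_index.items()
                    if entry.casefold().startswith(typed)}
    return sorted(suggestions, key=str.casefold)[:limit]


TERMS = ["Elections", "Mayoral elections", "Phytopathology",
         "Presidential elections", "Spiders"]
# Non-preferred entry term -> preferred term
USE = {"Arachnids": "Spiders", "Plant pathology": "Phytopathology"}

print(autocomplete("ara", TERMS, USE))
print(autocomplete("p", TERMS, USE))
```

A user who starts typing a non-preferred term is guided to the term that is actually used for indexing.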
Subject browsing
Targeted resources based on subject or user role
Linked data
(Diagram: resources linked through Topic A)
• Journal Article on Topic A
• Other Journal Articles on Topic A
• Job Posting for Expert on Topic A
• Author Networks
• Social Networking
• Upcoming Conference on Topic A
• Podcast Interview with Researcher Working on Topic A
• Grant Available for Researchers Working on Topic A
• CME Activity on Topic A
Link to society resources
(Screenshot: a journal article page with links to related society resources)
Cancer Epidemiology Biomarkers & Prevention Vol. 12, 161-164, February 2003
© 2003 American Association for Cancer Research
Short Communications
Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the American Cancer Society Cancer Prevention Study II Nutrition Cohort
Heather Spencer Feigelson, Carolyn R. Jonas, Andreas S. Robertson, Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle
Department of Epidemiology and Surveillance Research, American Cancer Society, National Home Office, Atlanta, Georgia 30329-4251
Recent studies suggest that the increased risk of breast cancer associated with alcohol consumption may be reduced by adequate folate intake. We examined this question among 66,561 postmenopausal women in the American Cancer Society Cancer Prevention Study II Nutrition Cohort.
Related Press Releases
• How What and How Much We Eat (And Drink) Affects Our Risk of Cancer
• Novel COX-2 Combination Treatment May Reduce Colon Cancer Risk Combination Regimen of COX-2 Inhibitor and Fish Oil Causes Cell Death
• COX-2 Levels Are Elevated in Smokers
Related AACR Workshops and Conferences
• Frontiers in Cancer Prevention Research
• Continuing Medical Education (CME)
• Molecular Targets and Cancer Therapeutics
Related Meeting Abstracts
• Association between dietary folate intake, alcohol intake, and methylenetetrahydrofolate reductase C677T and A1298C polymorphisms and subsequent breast
• Folate, folate cofactor, and alcohol intakes and risk for colorectal adenoma
• Dietary folate intake and risk of prostate cancer in a large prospective cohort study
Related Working Groups
• Finance
• Charter
• Molecular Epidemiology
Related Education Book Content
• Oral Contraceptives, Postmenopausal Hormones, and Breast Cancer
• Physical Activity and Cancer
• Hormonal Interventions: From Adjuvant Therapy to Breast Cancer Prevention
Related Awards
• AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards
• ACS Award
• Weinstein Distinguished Lecture
Webcasts
• Related Webcasts
Think Tank Report
• Related Think Tank Report Content
Authors at a place
Member profile tagging
User pastes or uploads CV
Button to auto-extract taxonomy attributes
Adding terms to SharePoint
(Diagram: Microsoft SharePoint Server 2010, Event Handler, and TaxoTerm Server / Data Harmony (M.A.I.))
• User uploads a document to SharePoint space
• Before uploading to SharePoint server, the EventHandler sends the document to Data Harmony.
• Returns subject metadata
• Data Harmony automatically attaches indexing terms before uploading to MOSS
SharePoint 2010 only shows 10 lines of the taxonomy
This add-on makes it all viewable
Taxonomies added in search example
(Diagram: Core Architectural Components)
• Content sources: Web Content; Files, Documents; Databases; Custom Applications; Email, Groupware
• Connectors: Web Crawler, File Traverser, Database Connector, Email Connector, Custom Connector; Content Push
• Content API and Document Processor pipeline
• Search Server with Index DB; Filter Server with Agent DB; Alerts
• Query Processor pipeline and Query API (Query, Results)
• Front ends: Vertical Applications, Portals, Custom Front-Ends, Mobile Devices
• FAST Management API and Administrator’s Dashboard
• Use taxonomy terms here: Data Harmony Governance API (MAIstro, Search Harmony)
Autosuggestion of taxonomy terms
Populate Keywords, Descriptors, Indexing terms, etc.
Allow for manual review of auto-tagging for quality assurance.
More Innovations
• Link topic to article to author to event
• Make visual links within domain
• Enable authors to submit and categorize conference submissions
• Create author authority database linking to co-authors, topics, locations, etc.
• Create expert reviewer database
• Create member profiles with alternate names, publications, tagged by topic
• Visualize data and domain distribution
• Display interest connections in social network
• Deliver accurate targeted information through mobile applications
• Etc.
Taxonomy standards
• Z39.19 (2005) Controlled Vocabularies
• BS 8723 Parts 1 – 5
• ISO 25964 Parts 1 - 2
• TAG 37 and 46 standards
• SKOS - Simple Knowledge Organization System
• OWL - Web Ontology Language
• AND more!
IT is often Fire, Ready, Aim!
• Choose the hardware
• Choose the software
• Decide on the format
• Convert the data
• Fix the data
• Tack on a taxonomy
• Ignore the standards
Change to Ready, Aim, Fire!
• Follow the data
• Look at the data, format and content
• Design taxonomy for data
• Leverage the standards
• Use taxonomy to tag data
• Choose search and repository software for data
• Load the data into the system
• Keep your eye on the target
Summary
• We covered the basics
• We talked about the implementation
• Application of the terms to your content
• We reinforced the learning with activities
• Now go hear the case studies of the next two days!
Questions?
Heather Hedden
Taxonomy Consultant
Hedden Information Management
www.hedden-information.com
www.accidental-taxonomist.com
978-467-5195
Marjorie M.K. Hlava
President
Access Innovations, Inc.
www.accessinn.com
www.data-harmony.com
505-998-0800