termset metadata tagging presentation - taxonomy bootcamp london 2016
TRANSCRIPT
TAGGING DOCUMENTS MADE EASY, USING MACHINE LEARNING
Brendan [email protected]
BRENDAN CLARKE
• A Microsoft ECM expert
• Co-Founded TermSet three years ago
• Got the scars from real world IA projects
AGENDA
• Creating Taxonomies; 7
• NLP; 3
• Demo; 10
• Tagging; 10
• Demo; 10
PART ONE – APPROACHES FOR BUILDING TAXONOMIES
TOP DOWN - APPROACH
• Defines top level containers and works downwards.
• Usually broad (3-10 wide) and shallow (3-4 deep)
• Simple, high level classification (functional)
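The "broad and shallow" rule of thumb can be checked mechanically. The sketch below is illustrative only: the nested-dict taxonomy, the term names and the helper functions are all hypothetical, and it assumes "3-10 wide" refers to the number of top level containers.

```python
# Hypothetical top-down taxonomy as nested dicts: term -> child terms.
taxonomy = {
    "Finance": {"Invoices": {}, "Budgets": {}},
    "HR": {"Policies": {}, "Contracts": {}},
    "Projects": {"Plans": {}, "Reports": {}},
}


def depth(tree):
    """Number of levels in the tree (an empty tree has depth 0)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def max_width(tree):
    """Largest number of sibling terms under any single parent."""
    if not tree:
        return 0
    return max(len(tree), max(max_width(c) for c in tree.values()))


def is_broad_and_shallow(tree, width_range=(3, 10), max_depth=4):
    """Rule of thumb from the slide: 3-10 top level terms, at most 4 levels."""
    low, high = width_range
    return low <= len(tree) <= high and depth(tree) <= max_depth


print(depth(taxonomy), max_width(taxonomy), is_broad_and_shallow(taxonomy))
```

A check like this is only a sanity test on shape; it says nothing about whether the terms themselves are the right ones.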
TOP DOWN – TERMS
• Manually defined or replicated from existing structures
• Imported from other systems
• Industry standards / purchased taxonomies
TOP DOWN – SUMMARY
• People / Committee Driven approach
• Some guesswork of what terms should be
• Simple, high level classification (functional) – Way better than folders!
BOTTOM UP - APPROACH
• Terms driven by the words and phrases within your content
• More complex taxonomies
• Detailed, accurate terms that are subject or facet level
BOTTOM UP - TERMS
• Manual analysis of the documents
• Statistical analysis of terms and phrases
• Natural Language processing
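As a rough illustration of the statistical route, the sketch below counts how many documents each word or two-word phrase appears in and keeps the ones that recur. It is a minimal sketch under stated assumptions, not how TermSet works: the stopword list, the `min_docs` threshold and the sample documents are invented for the example, and a real NLP pipeline would do far more (stemming, part-of-speech filtering, entity detection).

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(text, max_len=2):
    """Yield 1..max_len word phrases that neither start nor end with a stopword."""
    tokens = tokenize(text)
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def candidate_terms(documents, min_docs=2, max_len=2):
    """Return (phrase, document count) pairs for phrases found in >= min_docs documents."""
    doc_freq = Counter()
    for text in documents:
        # Use a set so each phrase counts once per document.
        doc_freq.update(set(candidate_phrases(text, max_len)))
    return [(p, n) for p, n in doc_freq.most_common() if n >= min_docs]


docs = [
    "Health and safety policy for the warehouse",
    "Updated safety policy and fire procedure",
    "Warehouse fire procedure checklist",
]
for phrase, count in candidate_terms(docs):
    print(phrase, count)
```

Counting documents rather than raw occurrences keeps one long, repetitive file from dominating the candidate list.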
BOTTOM UP - SUMMARY
• Technology driven approach (or a very tough people process)
• Produces detailed taxonomies that reflect the actual content
• Extra granularity in tagging
AND THE WINNER IS…
• Combining top down and bottom up is the best approach
• Top down classifies the type of documents
• Bottom up classifies the subject of the document
• New technology allows bottom up to be realistic
TermSet adds accurate consistent metadata without placing any burden on end users or your IT team.
• Builds taxonomies (bottom up) using NLP
• Applies tags
• Metadata as a service™
WHAT EXACTLY IS NLP ?
DEMO – CREATING TERMS FROM YOUR DOCUMENTS USING NLP
PART TWO – APPLYING YOUR TAGS
MANUAL TAGGING
• Adoption problem
• Asbestos problem / GIGO
• Challenging to do retrospectively (migration tools can help)
MANUAL TAGGING
• Infer as many terms as possible from: Document types, Location, Function
• Mandate as few tags as possible
• Stay shallow or flat with hierarchies
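Inferring terms from location and function can be as simple as reading them off the path a document is saved to, so users are never asked for them. The sketch below is a hypothetical illustration: the lookup tables, the folder names and the `infer_tags` helper are assumptions for the example, and it treats the folder or library name as the signal for document type.

```python
from pathlib import PurePosixPath

# Hypothetical lookup tables; real values would come from your own taxonomy.
FUNCTIONS = {"hr": "Human Resources", "finance": "Finance", "legal": "Legal"}
DOC_TYPES = {"policies": "Policy", "invoices": "Invoice", "contracts": "Contract"}


def infer_tags(path):
    """Derive Location, Function and Document type tags from a document path."""
    folders = [f for f in PurePosixPath(path).parts[:-1] if f != "/"]
    tags = {}
    if folders:
        tags["Location"] = "/".join(folders)
    for name, table in (("Function", FUNCTIONS), ("Document type", DOC_TYPES)):
        # Take the first folder that matches the lookup table, if any.
        for folder in folders:
            value = table.get(folder.lower())
            if value:
                tags[name] = value
                break
    return tags


print(infer_tags("/sites/HR/Policies/leave-policy.docx"))
```

Anything inferred this way is a tag the user does not have to be asked for, which leaves the mandatory list short.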
MACHINE TAGGING
• Simple machine tagging can use search to match taxonomy terms to the content of documents
• More advanced taggers allow rules or weights to be assigned to each tag (tags not context aware)
• New technologies (NLP) provide a new approach to creating taxonomies
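A weighted, rule-based tagger of the kind described in the second bullet might look like the sketch below. It is a minimal illustration, not any vendor's implementation: the rules, weights and threshold are invented, and the matcher only counts phrase occurrences, so it has exactly the "not context aware" limitation noted above.

```python
import re

# Hypothetical tagging rules: tag -> {phrase: weight}.
RULES = {
    "Health and Safety": {"safety": 1.0, "risk assessment": 2.0, "ppe": 2.0},
    "Finance": {"invoice": 2.0, "budget": 1.5, "purchase order": 2.0},
}


def score_tags(text, rules=RULES):
    """Sum weight * occurrences for each tag's phrases found in the text."""
    lowered = text.lower()
    scores = {}
    for tag, phrases in rules.items():
        total = 0.0
        for phrase, weight in phrases.items():
            # Whole-word match so "budget" does not fire inside "budgetary".
            hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
            total += weight * hits
        if total:
            scores[tag] = total
    return scores


def suggest_tags(text, threshold=2.0, rules=RULES):
    """Return tags scoring at or above the threshold, highest score first."""
    scores = score_tags(text, rules)
    passing = [t for t, s in scores.items() if s >= threshold]
    return sorted(passing, key=lambda t: -scores[t])


print(suggest_tags("Risk assessment for site safety. The invoice is attached."))
```

Weights let a distinctive phrase such as "risk assessment" count for more than a common word such as "safety", and the threshold stops a single weak match from applying a tag.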
TERMSET TAGGING
• TermSet recommends the right taxonomies for each library (context aware tagging)
• TermSet automates building the underlying IA in SharePoint
• Extra cool NLP tags can be added (Summaries, Sentiment and Language)
• Monitors for new documents and terms arriving into your world
DEMO – TAGGING DOCUMENTS
WRAP UP
• TermSet automates a bottom up approach to create and use taxonomies for SharePoint
• Visit www.termset.com or e-mail [email protected] for a free licence
• If you need assistance with top down taxonomies, or you use a different DMS, e-mail me to join the beta program for www.taxononica.com