Text Analytics for Search Applications Workshop Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com

Post on 21-Jan-2016

Page 1

Text Analytics for Search Applications

Workshop

Tom Reamy, Chief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 2

Agenda

Introduction – Text Analytics & Infrastructure Platform
  – Text Analytics Features
  – Semantic Infrastructure – Taxonomy, Metadata, Technology
  – Value of Text Analytics
  – Getting Started with Text Analytics
Development – Taxonomy, Categorization, Faceted Metadata
Text Analytics Applications
  – Integration with Search and ECM
  – Platform for Information Applications
Questions / Discussions

Page 3

KAPS Group: General

Knowledge Architecture Professional Services – Network of Consultants
Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching
  – Attensity, Clarabridge, Lexalytics
Strategy – IM & KM - Text Analytics, Social Media, Integration
Services:
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: Text based applications – design & development
Clients:
  – Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc.
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Presentations, Articles, White Papers – http://www.kapsgroup.com

Page 4

Agenda – Introduction

Text Analytics & Semantic Infrastructure
Text Analytics Features
  – Categorization & Extraction
Semantic Infrastructure
  – Taxonomy, Metadata, Technology
Value of Text Analytics
  – Enterprise Search that works
Getting Started with Text Analytics
  – Text Analytics Strategy & Vision
  – Text Analytics Evaluation / Quick Start

Page 5

Introduction to Text Analytics

Text Analytics Features

Noun Phrase Extraction / Fact Extraction
  – Catalogs with variants, rule based dynamic
  – Relationships of entities – people-organizations-activities
Sentiment Analysis
  – Objects and phrases – statistics & rules – Positive and Negative
Summarization – replace snippets
Auto-categorization – built on a taxonomy
  – Training sets, Terms, Semantic Networks
  – Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
Auto-categorization as Foundation
  – Disambiguation - Identification of objects, events, context
  – Build rules based, not simply Bag of Individual Words
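
The rule operators listed above (AND, OR, NOT, DIST, SENTENCE) can be illustrated with a minimal sketch. This is not any vendor's rule language: the function names, the simple tokenizer, and the sample "Banking" rule are all assumptions made for illustration. A PARAGRAPH operator would follow the same pattern as SENTENCE with a different splitter.

```python
import re

def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

def sentences(text):
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def has(term):
    """Rule: the term occurs somewhere in the text."""
    return lambda text: term in tokens(text)

def AND(*rules):
    return lambda text: all(r(text) for r in rules)

def OR(*rules):
    return lambda text: any(r(text) for r in rules)

def NOT(rule):
    return lambda text: not rule(text)

def DIST(a, b, n):
    """Rule: terms a and b occur within n tokens of each other."""
    def rule(text):
        toks = tokens(text)
        pos_a = [i for i, t in enumerate(toks) if t == a]
        pos_b = [i for i, t in enumerate(toks) if t == b]
        return any(abs(i - j) <= n for i in pos_a for j in pos_b)
    return rule

def SENTENCE(*rules):
    """Rule: all given rules match inside a single sentence."""
    return lambda text: any(all(r(s) for r in rules) for s in sentences(text))

# Hypothetical rule for a "Banking" category that disambiguates "bank"
banking = AND(
    OR(SENTENCE(has("bank"), has("loan")), DIST("bank", "account", 3)),
    NOT(has("river")),
)

doc1 = "The bank approved the loan. Rates were low."
doc2 = "We walked along the river bank. A loan was never discussed."
print(banking(doc1))  # True
print(banking(doc2))  # False
```

The second document contains both "bank" and "loan", so a bag-of-words match would fire, but the SENTENCE and NOT operators keep it out of the category.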

Page 6

Case Study – Categorization & Sentiment

Page 7

Case Study – Categorization & Sentiment

Page 8

Page 9

Introduction to Text Analytics

Taxonomy & Metadata

Thesauri, Controlled Vocabulary, Glossaries, Product Catalogs
  – Resources to build on
SharePoint – Managed Metadata Services
  – Term stores – corporate taxonomies
  – Enterprise Keywords (Folksonomy)
Metadata standards – Dublin Core - Mostly syntactic not semantic
  – Semantic – keywords – very poor performance, no structure
Facets – classes of metadata
  – Standard - People, Organization, Document type-purpose
  – Requires huge amounts of metadata

Page 10

Introduction to Text Analytics

TA & Taxonomy: Complementary Information Platform

Taxonomy provides a consistent and common vocabulary
  – Enterprise resource – integrated not centralized
Text Analytics provides consistent tagging
  – Human indexing is subject to inter- and intra-individual variation
Taxonomy provides the basic structure for categorization
  – And candidate terms
Text Analytics provides the power to apply the taxonomy
  – And metadata of all kinds
Text Analytics and Taxonomy Together – Platform
  – Consistent in every dimension
  – Powerful and economic

Page 11

Introduction to Text Analytics

Taxonomy and Text Analytics

Standard Taxonomies = starter categorization rules
  – Example – MeSH – bottom 5 layers are terms
Categorization taxonomy structure
  – Tradeoff of depth and complexity of rules
  – Easier to maintain taxonomy, but need to refine rules
Analysis of taxonomy – suitable for categorization
  – Structure – not too flat, not too large
  – Orthogonal categories
Smaller modular taxonomies
  – More flexible relationships – not just Is-A-Kind/Child-Of
Different kinds of taxonomies
  – Sentiment – products and features
    • Taxonomy of Sentiment, Emotion - Expertise – process
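
The structural checks above ("not too flat, not too large") can be approximated with a few simple measurements. The following is a sketch only, assuming the taxonomy is available as a nested dict; the sample taxonomy is hypothetical, and what counts as "too flat" or "too large" is left to the evaluator.

```python
def taxonomy_stats(tree):
    """Return node count, maximum depth, and widest sibling group.

    `tree` maps each node name to a dict of its children (empty for leaves).
    """
    nodes, max_depth, max_breadth = 0, 0, len(tree)
    stack = [(tree, 1)]
    while stack:
        level, depth = stack.pop()
        for name, children in level.items():
            nodes += 1
            max_depth = max(max_depth, depth)
            max_breadth = max(max_breadth, len(children))
            if children:
                stack.append((children, depth + 1))
    return {"nodes": nodes, "max_depth": max_depth, "max_breadth": max_breadth}

# Hypothetical sample taxonomy
sample = {
    "Products": {"Hardware": {"Laptops": {}, "Phones": {}}, "Software": {}},
    "Services": {"Consulting": {}, "Support": {}},
}
stats = taxonomy_stats(sample)
print(stats)  # {'nodes': 8, 'max_depth': 3, 'max_breadth': 2}
```

A taxonomy with a maximum depth of 1 and a very large breadth is a flat list, which is the case the slide warns against.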

Page 12

Introduction to Text Analytics

Metadata - Tagging

How do you bridge the gap – taxonomy to documents?
Tagging documents with taxonomy nodes is tough
  – And expensive – central or distributed
Library staff – experts in categorization not subject matter
  – Too limited, narrow bottleneck
  – Often don’t understand business processes and business uses
Authors – Experts in the subject matter, terrible at categorization
  – Intra and Inter inconsistency, “intertwingleness”
  – Choosing tags from taxonomy – complex task
  – Folksonomy – almost as complex, wildly inconsistent
  – Resistance – not their job, cognitively difficult = non-compliance
Text Analytics is the answer(s)!

Page 13

Introduction to Text Analytics

Content Management – SharePoint

Mind the Gap – Manual, Automatic, Hybrid
All require human effort – issue of where and how effective
Manual - human effort is tagging (difficult, inconsistent)
Automatic and Hybrid - human effort is prior to tagging
  – Build on expertise – librarians on categorization, SMEs on subject terms
Hybrid Model
  – Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
  – Cognitive task is simple -> react to a suggestion instead of select from head or a complex taxonomy
  – Feedback – if author overrides -> suggestion for new category
  – Facets – Requires a lot of Metadata - Entity Extraction feeds facets
Hybrid – Automatic is really a spectrum – depends on context
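
The hybrid model described above (analyze, suggest, let the author react, feed overrides back) can be sketched in a few lines. Everything here is an assumption for illustration: the `CATEGORY_TERMS` table stands in for real categorization rules, and `TaggingSession` is a hypothetical name, not part of SharePoint or any text analytics product.

```python
from dataclasses import dataclass, field

# Hypothetical category rules: category -> trigger terms
CATEGORY_TERMS = {
    "Finance": {"budget", "invoice", "revenue"},
    "HR": {"hiring", "benefits", "payroll"},
}

@dataclass
class TaggingSession:
    # Author overrides that name an unknown category are collected here
    new_category_candidates: list = field(default_factory=list)

    def suggest(self, text):
        """Analysis step: propose categories whose terms appear in the text."""
        words = set(text.lower().split())
        return sorted(c for c, terms in CATEGORY_TERMS.items() if words & terms)

    def review(self, suggestions, author_choice=None):
        """Author step: accept the suggestions or override them.

        An override naming an unknown category is logged as a
        candidate new category (the feedback loop).
        """
        if author_choice is None:
            return suggestions
        if author_choice not in CATEGORY_TERMS:
            self.new_category_candidates.append(author_choice)
        return [author_choice]

session = TaggingSession()
doc = "Q3 budget and revenue forecast"
suggested = session.suggest(doc)
print(suggested)                                               # ['Finance']
print(session.review(suggested))                               # ['Finance']
print(session.review(suggested, author_choice="Forecasting"))  # ['Forecasting']
print(session.new_category_candidates)                         # ['Forecasting']
```

The author only confirms or corrects a short suggestion list, and each correction becomes input for the taxonomy team rather than being lost.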

Page 14

Introduction to Text Analytics

Benefits of Text Analytics

Why Text Analytics?
  – Enterprise search has failed to live up to its potential
  – Enterprise Content management has failed to live up to its potential
  – Taxonomy has failed to live up to its potential
  – Adding metadata, especially keywords has not worked
What is missing?
  – Intelligence – human level categorization, conceptualization
  – Infrastructure – Integrated solutions not technology, software
Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

Page 15

Text Analytics Platform – Benefits

IDC White Paper

Time Wasted
  – Reformat information - $5.7 million per 1,000 per year
  – Not finding information - $5.3 million per 1,000
  – Recreating content - $4.5 million per 1,000
Small Percent Gain = large savings
  – 1% - $10 million
  – 5% - $50 million
  – 10% - $100 million
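
The figures above can be combined into a quick estimate. This is a sketch under stated assumptions: that "per 1,000" means per 1,000 employees per year for all three lines (the slide says "per year" only on the first), and that the percentage gains apply to the total wasted cost. The headcount of 10,000 is hypothetical.

```python
# Wasted cost per 1,000 employees per year, in dollars (figures from the slide)
WASTE_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}

def annual_waste(employees):
    """Total estimated waste per year for a given headcount."""
    return sum(WASTE_PER_1000.values()) * employees / 1000

def savings(employees, percent_gain):
    """Savings from recovering `percent_gain` percent of the waste."""
    return annual_waste(employees) * percent_gain / 100

# Hypothetical headcount of 10,000 employees
for pct in (1, 5, 10):
    print(f"{pct}% gain: ${savings(10_000, pct):,.0f}")
```

The three categories add up to $15.5 million per 1,000 per year. The slide's "1% - $10 million" corresponds to a total of about $1 billion in wasted cost, a base the slide does not state.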

Page 16

Text Analytics Platform – Benefits

Findability within and outside the enterprise
  – Savings per year - $millions
Rescue enterprise search and ECM projects
  – Add semantics to search
Clean up enterprise content
  – Duplication and accurate categorization
Improve the quality of information access
  – Finding the right information can save millions
Build smarter applications
  – Social networking, locate expertise within the enterprise

Page 17

Text Analytics Platform – Benefits

Understand your customers
  – What they are talking about and how they feel about it
Empower your employees
  – Not only more time, but they work smarter
Understand your competitors
  – What they are working on, talking about
  – Combine unstructured content and rich data sources – more intelligent analysis

Page 18

Text Analytics Platform – Dangers

Text Analytics as a software project
Not enough resources – to develop, to maintain-refine
Wrong resources – SMEs, IT, Library
  – Need all of the above and taxonomists+
Bad Design:
  – Start with bad taxonomy
  – Wrong taxonomy – too big or too flat
Bad Categorization / Entity Extraction
  – Right kind of experience

Page 19

Getting Started with Text Analytics

Text Analytics Vision & Strategy

Strategic Questions – why, what value from the text analytics, how are you going to use it
  – Platform or Applications?
What are the basic capabilities of Text Analytics?
What can Text Analytics do for Search?
  – After 10 years of failure – get search to work?
What can you do with smart search based applications?
  – RM, PII, Social
ROI for effective search – difficulty of believing
  – Problems with metadata, taxonomy

Page 20

Getting Started with Text Analytics

Text Analytics Vision & Strategy

Simple Subject Taxonomy structure
  – Easy to develop and maintain
Combined with categorization capabilities
  – Added power and intelligence
Combined with people tagging, refining tags
Combined with Faceted Metadata
  – Dynamic selection of simple categories
  – Allow multiple user perspectives
    • Can’t predict all the ways people think
    • Monkey, Banana, Panda
Combined with ontologies and semantic data
  – Multiple applications – Text mining to Search
  – Combine search and browse

Page 21

Step 1: TA Information Audit

Start with Self Knowledge

Info Problems – what, how severe
Formal Process - KA audit – content, users, technology, business and information behaviors, applications - Or informal for smaller organizations
Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining
Category modeling – Cognitive Science – how people think
Natural level categories mapped to communities, activities
    • Novices prefer higher levels
    • Balance of informativeness and distinctiveness
Text Analytics Strategy/Model – forms, technology, people

Page 22

Step 1: TA Information Audit

Start with Self Knowledge

Ideas – Content and Content Structure
  – Map of Content – Tribal language silos
  – Structure – articulate and integrate
  – Taxonomic resources
People – Producers & Consumers
  – Communities, Users, Central Team
Activities – Business processes and procedures
  – Semantics, information needs and behaviors
  – Information Governance Policy
Technology – CMS, Search, portals, text analytics
  – Applications – BI, CI, Semantic Web, Text Mining

Page 23

Step 2: TA Evaluation

Varieties of Taxonomy / Text Analytics Software

Taxonomy Management - extraction
Full Platform
  – SAS, SAP, Smart Logic, Concept Searching, Expert System, IBM, Linguamatics, GATE
Embedded – Search or Content Management
  – FAST, Autonomy, Endeca, Vivisimo, NLP, etc.
  – Interwoven, Documentum, etc.
Specialty / Ontology (other semantic)
  – Sentiment Analysis – Attensity, Lexalytics, Clarabridge, Lots
  – Ontology – extraction, plus ontology

Page 24

Step 2: Text Analytics Evaluation

Different Kind of software evaluation

Traditional Software Evaluation - Start
  – Filter One - Ask Experts - reputation, research – Gartner, etc.
    • Market strength of vendor, platforms, etc.
    • Feature scorecard – minimum, must have, filter to top 6
  – Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
  – Filter Three – In-Depth Demo – 3-6 vendors
Reduce to 1-3 vendors
Vendors have different strengths in multiple environments
  – Millions of short, badly typed documents, Build application
  – Library 200 page PDF, enterprise & public search
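
The feature scorecard in Filter One (minimums, must-haves, filter to the top 6) could be run along the following lines. This is a sketch, not a prescribed method: the vendor names, features, ratings, minimums, and weights are all hypothetical placeholders.

```python
# Hypothetical vendors and feature ratings (0-5); not real evaluation data
vendors = {
    "Vendor A": {"categorization": 5, "entity_extraction": 4, "sentiment": 2},
    "Vendor B": {"categorization": 3, "entity_extraction": 0, "sentiment": 5},
    "Vendor C": {"categorization": 4, "entity_extraction": 3, "sentiment": 3},
}
MUST_HAVE = {"categorization": 3, "entity_extraction": 1}  # minimum ratings
WEIGHTS = {"categorization": 3, "entity_extraction": 2, "sentiment": 1}

def shortlist(vendors, must_have, weights, top_n=6):
    """Drop vendors below any must-have minimum, then rank by weighted score."""
    scored = []
    for name, ratings in vendors.items():
        if any(ratings.get(f, 0) < minimum for f, minimum in must_have.items()):
            continue  # fails a must-have feature
        score = sum(weights[f] * ratings.get(f, 0) for f in weights)
        scored.append((score, name))
    # Highest score first; ties broken alphabetically by vendor name
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:top_n]]

print(shortlist(vendors, MUST_HAVE, WEIGHTS))
# [('Vendor A', 25), ('Vendor C', 21)]
```

Vendor B has the best sentiment rating but is removed before scoring because it misses a must-have, which is the point of applying the minimums as a filter rather than as one more weighted feature.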

Page 25

Design of the Text Analytics Selection Team

Traditional Candidates – IT, Business, Library
IT - Experience with software purchases, needs assessment, budget
  – Search/Categorization is unlike other software, deeper look
Business - understand business, focus on business value
They can get executive sponsorship, support, and budget
  – But don’t understand information behavior, semantic focus
Library, KM - Understand information structure
Experts in search experience and categorization
  – But don’t understand business or technology

Page 26

Design of the Text Analytics Selection Team

Interdisciplinary Team, headed by Information Professionals
Relative Contributions
  – IT – Set necessary conditions, support tests
  – Business – provide input into requirements, support project
  – Library – provide input into requirements, add understanding of search semantics and functionality
Much more likely to make a good decision
Create the foundation for implementation

Page 27

Step 3: Proof of Concept / Pilot Project

4 weeks POC – bake off / or short pilot
Real life scenarios, categorization with your content
2 rounds of development, test, refine / Not OOB
Need SMEs as test evaluators – also to do an initial categorization of content
Measurable Quality of results is the essential factor
Majority of time is on auto-categorization
Need to balance uniformity of results with vendor unique capabilities – have to determine at POC time
Taxonomy Developers – expert consultants plus internal taxonomists
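
One common way to make "measurable quality of results" concrete is to score each vendor's auto-categorization against the SMEs' initial categorization using precision and recall. The slides do not name a metric, so this choice is an assumption, and the document ids and categories below are hypothetical.

```python
def precision_recall(sme_tags, auto_tags):
    """Compare auto-categorization with SME judgments.

    Both arguments map document id -> set of category names.
    Counts are pooled over all (document, category) pairs (micro-averaged).
    """
    tp = fp = fn = 0
    for doc_id in sme_tags.keys() | auto_tags.keys():
        gold = sme_tags.get(doc_id, set())
        predicted = auto_tags.get(doc_id, set())
        tp += len(gold & predicted)   # categories both agree on
        fp += len(predicted - gold)   # assigned by the software only
        fn += len(gold - predicted)   # assigned by the SME only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical test set
sme = {"doc1": {"Finance"}, "doc2": {"HR", "Legal"}, "doc3": {"Finance"}}
auto = {"doc1": {"Finance"}, "doc2": {"HR"}, "doc3": {"HR"}}
p, r = precision_recall(sme, auto)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Running the same test set through each round of rule refinement gives a number to compare across the two development rounds and across vendors.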

Page 28

Questions?

Tom Reamy

[email protected]

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 29

Resources

Conferences:
  – Text Analytics World – All aspects of text analytics
    • Call for Speakers – Oct 3-4 Boston
  – Text Analytics Summit – social media focus
LinkedIn Groups:
  – Text Analytics World
  – Text Analytics Group
  – Data and Text Professionals
  – Sentiment Analysis
  – Metadata Management
  – Semantic Technologies

Page 30

Resources

Books
  – Women, Fire, and Dangerous Things
    • George Lakoff
  – Knowledge, Concepts, and Categories
    • Koen Lamberts and David Shanks
  – The Stuff of Thought – Steven Pinker
Journals
  – Academic – Cognitive Science, Linguistics, NLP
  – Applied – Scientific American Mind, New Scientist