Sound foundations - taxonomies that work (k2s.ca/ihfarmer.pdf, 2007-08-16)
Second Knowledge Solutions: http://k2s.ca
Sound foundations - taxonomies that work
Linda Farmer, Second Knowledge Solutions
Beth Golden, Factiva
David M. Scott, NewsEdge
Information Highways 2002
Linda Farmer
Second Knowledge Solutions
[email protected]
http://k2s.ca
Taxonomies: Naming & Structuring Your Content
Here’s What We’ll Cover
1. Building a taxonomy
2. Populating the taxonomy with content
3. Automatic categorization
4. Benefits and ROI
5. Future directions
What is a taxonomy?
[Diagram: Animal > Bird > Canary, Eagle, Sparrow; Animal > Mammal]
A set of named, mutually exclusive hierarchical categories.
A systematic way to name and structure information about a given subject
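The bird example can be sketched as a small tree in which every category has exactly one parent, which is what keeps the categories mutually exclusive. This is only an illustration: the `Category` class and its method names are assumptions, not part of the presentation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    """One named category in a hierarchical taxonomy (hypothetical helper class)."""
    name: str
    # A single parent per category keeps sibling categories mutually exclusive.
    parent: Optional["Category"] = field(default=None, repr=False, compare=False)
    children: list["Category"] = field(default_factory=list)

    def add_child(self, name: str) -> "Category":
        """Create a sub-category under this one and return it."""
        child = Category(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Category names from the root down to this category."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


animal = Category("Animal")
bird = animal.add_child("Bird")
mammal = animal.add_child("Mammal")
canary = bird.add_child("Canary")
eagle = bird.add_child("Eagle")
sparrow = bird.add_child("Sparrow")

print(" > ".join(canary.path()))
```

The path from root to leaf is the kind of navigation trail a portal could expose as a breadcrumb.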
What is a thesaurus?
BIRD
  NT CANARY, EAGLE, SPARROW
  BT ANIMAL
  RT BEAK, ORNITHOLOGY, WINGS
  UF Aves
A list of preferred and non-preferred terms accompanied by a standardized set of relationship indicators: hierarchical (BT, NT), associative (RT) and equivalent/synonymous (UF).
Indexing/Word Mapping: promotes consistency in indexing documents and retrieving more relevant search results.
Vocabulary Control
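As a rough sketch of vocabulary control, the BIRD entry can be held in a small record and used to map non-preferred terms onto the preferred term at indexing or search time. The `ThesaurusEntry` class and `build_lookup` function are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One preferred term with its standard relationship indicators."""
    preferred: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    used_for: list[str] = field(default_factory=list)  # UF (non-preferred terms)


bird_entry = ThesaurusEntry(
    preferred="BIRD",
    broader=["ANIMAL"],
    narrower=["CANARY", "EAGLE", "SPARROW"],
    related=["BEAK", "ORNITHOLOGY", "WINGS"],
    used_for=["Aves"],
)


def build_lookup(entries):
    """Map each preferred and non-preferred term (lower-cased) to its preferred term."""
    lookup = {}
    for entry in entries:
        lookup[entry.preferred.lower()] = entry.preferred
        for term in entry.used_for:
            lookup[term.lower()] = entry.preferred
    return lookup


lookup = build_lookup([bird_entry])
print(lookup["aves"])
```

An indexer that runs every candidate term through such a lookup tags documents about "Aves" and "bird" identically, which is the consistency the slide refers to.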
Thesaurus and Taxonomy
Thesaurus:
BIRD
  NT CANARY, EAGLE, SPARROW
  BT ANIMAL
  RT BEAK, ORNITHOLOGY, WINGS
  UF Aves
Taxonomy (classification/categorization):
[Diagram: Animal > Bird > Canary, Eagle, Sparrow; Animal > Mammal]
Example item: Tweety Bird
What is an ontology?
Taxonomy
Thesaurus
Ontology = extended taxonomy
• Concepts, relations, facts & principles
• Catalogue of a world/domain
• How it works, how it’s put together (B2B)
A taxonomy on steroids
How Do You Build A Taxonomy?
1. Business goals: What business process/objective will this taxonomy serve?
2. Create a domain statement.
3. Do other taxonomies exist in this domain?
4. Bring together a sample set of documents & identify key concepts. Select name.
5. Determine relationships between concepts.
6. Create hierarchical categories for domain.
7. Choose implementation tools.
Domain = Knowledge Management
[Diagram: sample taxonomy for the Knowledge Management domain. Labels: People, Knowledge, Technology, Strategies, Enterprise; Knowledge Representation, Content Management, Knowledge Sharing; Social Capital, Intellectual Capital, Human Capital, Structural Capital, Customer Capital; Management, Publication, Collection]
Challenges in Building a Taxonomy
1. Staff inexperienced, untrained in thinking taxonomically
2. Interesting but laborious process
3. Expensive and time-consuming part of a portal investment
4. Subject Matter Experts (SMEs) too busy
5. Translate taxonomy into a web/portal interface
6. Keeping it up-to-date
Populating Taxonomy with Content
1. Revisit existing content and assign categories from taxonomy.
2. Incorporate metadata tags for taxonomy category.
Challenges: Populating a Taxonomy
1. Existing corpus of documents is huge: unstructured, untagged, unindexed.
   Who’s going to do it? How long will it take? Will we need to hire additional staff? How much old stuff is worth keeping?
2. Keeping up with large volume of new content.
Taxonomy Maintenance
1. Large volume of new content
2. Need to adjust taxonomy categories
3. Reclassify content to incorporate changes
4. Need good technical support
Volume of Information Imposes Constraints
Categorization Approaches
1. Manual
2. Automatic
3. Hybrid/cyborg
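One way to picture the hybrid approach is a routing rule: software handles the documents it is sure about and people handle the rest. The function name and both thresholds below are hypothetical values picked for illustration, not figures from the presentation.

```python
def route_document(confidence: float,
                   auto_threshold: float = 0.8,
                   manual_threshold: float = 0.4) -> str:
    """Decide who categorizes a document, given a classifier confidence between 0 and 1."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= auto_threshold:
        return "automatic"     # accept the software's category as-is
    if confidence >= manual_threshold:
        return "human review"  # software suggests, a person confirms
    return "manual"            # a person categorizes from scratch


for score in (0.95, 0.6, 0.1):
    print(score, route_document(score))
```

Raising `auto_threshold` shifts work toward people and lowers the risk of misfiled content; lowering it does the opposite.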
Categorization Software Tools
1. Create taxonomy categories
2. Classify existing collections of unstructured content
3. Apply metadata to content
The new lifeline for the enterprise swimming in unstructured information
Phase 1: Taxonomy Creation
Unstructured networked sources:
• Intranet
• Internet
• Domain corpus (Lotus Notes, Documentum, etc.)
Information extraction engine:
• Linguistic analysis
• Statistical clustering
• User-defined vocabulary
• Proximity analysis
• Stemming
• Proprietary techniques
Flow: unstructured networked sources -> information extraction engine -> extracted phrases -> categories + concepts -> taxonomy
• Taxonomy import
• Vocabulary preferences
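A toy version of the extraction step, assuming nothing about how any commercial engine works: it lower-cases the text, drops a small hypothetical stopword list, applies a deliberately crude suffix-stripping stemmer (not a real stemming algorithm such as Porter's), and keeps two-word phrases that recur across documents. All names here are illustrative.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list for the example.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "are", "with", "on"}


def naive_stem(word: str) -> str:
    """Very rough stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def candidate_phrases(documents, min_docs=2):
    """Return stemmed two-word phrases that occur in at least `min_docs` documents."""
    doc_counts = Counter()
    for text in documents:
        tokens = [naive_stem(t) for t in re.findall(r"[a-z]+", text.lower())
                  if t not in STOPWORDS]
        # Adjacent word pairs (after stopword removal) stand in for proximity analysis.
        bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
        doc_counts.update(bigrams)  # each phrase counted once per document
    return sorted(p for p, n in doc_counts.items() if n >= min_docs)


docs = [
    "Knowledge sharing improves content management.",
    "Content management systems support knowledge sharing.",
    "Sharing knowledge across the enterprise.",
]
phrases = candidate_phrases(docs)
print(phrases)
```

The surviving phrases are only candidates; a person would still decide which become category names.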
Phase 2: Categorization
Inputs: training set/topic, taxonomy, unstructured documents
Classifier system (“bag of words”):
• Directory management
• Confidence scoring
• Workflow
• Crawler software
Outputs: categorized documents, tagged (XML) content
Delivered to:
• Intranet
• Extranet
• Portals
• Website
• Applications
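A minimal sketch of the categorization phase, under simple assumptions: each category is represented by the bag of words of its training text, a new document goes to the category with the highest cosine similarity, and that similarity doubles as the confidence score. The training texts, XML element name and attribute names are invented for the example; commercial classifiers use their own, more elaborate methods.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def bag_of_words(text):
    """Word counts, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count bags (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(training_set):
    """training_set maps a category name to a list of example texts."""
    return {cat: bag_of_words(" ".join(texts)) for cat, texts in training_set.items()}


def classify(model, text):
    """Return (best_category, confidence), with confidence between 0 and 1."""
    doc = bag_of_words(text)
    scores = {cat: cosine(doc, bag) for cat, bag in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def tag_as_xml(text, category, confidence):
    """Wrap the text in a hypothetical <document> element carrying the category metadata."""
    doc = ET.Element("document", category=category, confidence=f"{confidence:.2f}")
    doc.text = text
    return ET.tostring(doc, encoding="unicode")


model = train({
    "Bird": ["canary eagle sparrow wings beak feathers"],
    "Mammal": ["dog cat whale fur milk"],
})
category, confidence = classify(model, "The eagle spread its wings")
tagged = tag_as_xml("The eagle spread its wings", category, confidence)
print(tagged)
```

The confidence value is what a workflow step would inspect to decide whether a person should check the assignment.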
Phase 3: Taxonomy Maintenance
Flow: new documents -> information extraction engine -> new concepts, new categories -> taxonomy
• Concept maps
• Taxonomy viewers
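Maintenance can be pictured as a recurring check of new documents for frequent terms the taxonomy does not yet cover. The function below is a simplified sketch; its name, the minimum count and the minimum word length are assumptions made for the example.

```python
import re
from collections import Counter


def new_concept_candidates(new_documents, taxonomy_terms, min_count=2, min_length=4):
    """Terms that recur in new documents but are missing from the taxonomy."""
    known = {term.lower() for term in taxonomy_terms}
    counts = Counter()
    for text in new_documents:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        word for word, n in counts.items()
        if n >= min_count and len(word) >= min_length and word not in known
    )


taxonomy_terms = ["Animal", "Bird", "Mammal", "Canary", "Eagle", "Sparrow"]
new_docs = [
    "The penguin is a flightless bird.",
    "A penguin colony and an eagle nest.",
]
candidates = new_concept_candidates(new_docs, taxonomy_terms)
print(candidates)
```

Each candidate still needs a human decision: add it as a new category, map it to an existing one, or ignore it.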
Some Automatic Categorizers
Semio
Inxight
Autonomy
Quiver
Cartia
Mohomine
Metacode
GlobalWisdom
ArchiText (Yellowbrix)
Benefits of a Taxonomy
Your information is now content:
1. Organized around a purpose
2. Named and given context & meaning
3. Structured for quick access
4. Tagged for computer manipulation
Which means that …
• Information is easier to find.
• It can be shared and reused.
• Duplication of project efforts is minimized.
• Full text searching is enhanced, more effective.
• Content quality improved with context and visibility across the enterprise.
• Provides navigation paths through corporate knowledge.
• Shows up what information is missing.
AND…
Taxonomy Is Now a Knowledge Framework
Knowledge Repository, Content, Structure, Metadata
[Diagram: Animal > Bird > Canary, Eagle, Sparrow; Animal > Mammal]
Labels: PERSONALIZATION, ENTERPRISE CONTENT MANAGEMENT, TRAINING, COLLABORATION
Content Management Process
Flow: raw info -> Collection -> content components -> Management -> content components -> Publication
Collection:
• Authoring
• Acquisition
• Conversion
• Aggregation
• Collection services
Management:
• Repository
• Administration
• Workflow
Publication:
• Templates
• Publishing services
• Publications: web, print, CD
[Diagram label: TAXONOMIES]
Content Management Resource
Bob Boiko, Content Management Bible, 2002
www.metatorial.com
Upfront Expense and ROI
• What is the business case for taxonomies?
• Does the cost of creating & maintaining a taxonomy outweigh its benefits? $$$$
Some Costs
• Categorization software: $125,000 +
• Content management software
• Staff to work on setting up taxonomy
• Continual maintenance
• Modifications to content management processes
• Technical staff support
Some Returns on Investment
• Enhanced productivity
  e.g. 30% reduction in time searching for info. 10 - 40% increase in productivity.
• Improved staff effectiveness
  Find info. easier. Reduced development cycles, effective sales calls, enhanced customer support, etc.
• Streamlining of processes
  Enhanced info. sharing with reduction in duplication of effort
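To make the cost and return figures concrete, here is a back-of-the-envelope payback calculation. Only the $125,000 software cost and the 30% search-time reduction come from the slides; the staff count, hours spent searching, hourly cost and working weeks are hypothetical inputs, and the other cost items (content management software, staff, maintenance) are left out because the slides give no figures for them.

```python
def annual_search_savings(staff, hours_searching_per_week, reduction,
                          hourly_cost, weeks_per_year):
    """Yearly value of the time no longer spent searching for information."""
    return staff * hours_searching_per_week * reduction * hourly_cost * weeks_per_year


SOFTWARE_COST = 125_000       # from the slides ("$125,000 +"), so a lower bound
SEARCH_TIME_REDUCTION = 0.30  # from the slides

# Everything below is an assumed scenario, not data from the presentation.
savings = annual_search_savings(
    staff=100,
    hours_searching_per_week=5,
    reduction=SEARCH_TIME_REDUCTION,
    hourly_cost=40,
    weeks_per_year=48,
)
payback_years = SOFTWARE_COST / savings

print(f"Annual savings: ${savings:,.0f}")
print(f"Payback period on software alone: {payback_years:.2f} years")
```

Adding the unquantified costs would lengthen the payback period, so a real business case needs those numbers as well.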
The Big Questions
Can our enterprise afford what it takes to effectively organize our knowledge?
Do we have any alternatives or is this the cost of doing business?
Are there other ways to remain competitive in an information-critical marketplace?
One Person’s View
• Marc Auckland, Chief Knowledge Manager, British Telecom
  At a CKO Summit, September 2000
“If you think knowledge is expensive, try ignorance.”
Future Trends
3rd wave of Internet-related software - the “semantic wave” involving meaning and understanding.
High performance knowledge processing
Knowledge Bases (KBs)
• Merging of collaborative multiple KBs created by distributed teams of domain experts
Spinning The Semantic Web
Berners-Lee & W3C (www.w3.org)
The “smart network” that understands the meaning of words and the logical relationships among them.
Replaces the “web of links” with the “web of meaning”.
Distributed Ontologies
Artificial intelligence
XML/RDF
Intelligent Agents
Databases
Natural Language Processing
Topic Maps
Distributed Ontologies
• Taxonomy + metadata about the properties for each category or class
• Encoded with logic-based language that enables automated reasoning by intelligent agents and web applications
• Goal: Provide highly reusable, extensible, long-lived semantic structure for content
Taxonomy on steroids
Ontology for Class Named “Bird”
[Diagram: Animal > Bird > Canary, Eagle, Sparrow; Animal > Mammal]
Properties shown for the class Bird:
• Migration patterns
• Diseases
• Mythology
• Aerodynamics
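A small sketch of "taxonomy + metadata about the properties for each class": the class Bird carries the four property names from the slide, and a sub-class picks them up from its parent. Treating inheritance as the reasoning step is an assumption made for illustration; a real ontology would be written in a logic-based language rather than Python, and `OntologyClass` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyClass:
    """A taxonomy category extended with named properties."""
    name: str
    parent: Optional["OntologyClass"] = field(default=None, repr=False, compare=False)
    properties: set[str] = field(default_factory=set)

    def all_properties(self) -> set[str]:
        """Own properties plus everything inherited from ancestor classes."""
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties


animal = OntologyClass("Animal")
bird = OntologyClass(
    "Bird",
    parent=animal,
    properties={"Migration patterns", "Diseases", "Mythology", "Aerodynamics"},
)
canary = OntologyClass("Canary", parent=bird)

print(sorted(canary.all_properties()))
```

An agent asking what can be described about a canary gets the Bird properties without anyone restating them for each sub-class.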
Application Areas
• E-commerce
• Enterprise integration
• Digital libraries
• Medicine
• Biology
• Bioinformatics
• Geographic information systems
• Legal information systems
Topic Maps - Hot New Technology
• New paradigm for knowledge navigation and synthesis (based on the index to a book)
• Provide a navigation map or style sheet for an information set
• Emerging ISO standard (XTM 1.0): www.topicmaps.org
The TAO of Topic Maps
Info. set = Opera
Topics: M.Butterfly, Tosca, Puccini, Verdi, Lucca, Rome, Italy
Associations: “Born In”, “Is In”, “Written By”, “Influenced By”
Occurrences: monograph, article, picture, commentary, etc.
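The three building blocks (topics, associations, occurrences) can be sketched as a tiny in-memory structure. The flattened slide does not show which topics each association connects, so the pairings below are assumptions made for illustration, as are the class name, the method names and the example.org occurrence reference. This is not XTM syntax.

```python
from dataclasses import dataclass, field


@dataclass
class TopicMap:
    topics: set[str] = field(default_factory=set)
    # Each association is (source topic, association type, target topic).
    associations: list[tuple[str, str, str]] = field(default_factory=list)
    # Occurrences link a topic to information resources about it.
    occurrences: dict[str, list[str]] = field(default_factory=dict)

    def associate(self, source: str, assoc_type: str, target: str) -> None:
        """Record a typed association and register both topics."""
        self.topics.update((source, target))
        self.associations.append((source, assoc_type, target))

    def sources_of(self, assoc_type: str, target: str) -> list[str]:
        """Topics linked to `target` by the given association type."""
        return [s for s, a, t in self.associations if a == assoc_type and t == target]


opera = TopicMap()
# Assumed pairings between the slide's topics and association types.
opera.associate("Puccini", "Born In", "Lucca")
opera.associate("Lucca", "Is In", "Italy")
opera.associate("Rome", "Is In", "Italy")
opera.associate("Tosca", "Written By", "Puccini")
opera.associate("M.Butterfly", "Written By", "Puccini")
opera.associate("Puccini", "Influenced By", "Verdi")
# Hypothetical occurrence: a placeholder reference, not a real resource.
opera.occurrences["Puccini"] = ["article: http://example.org/puccini"]

print(sorted(opera.sources_of("Written By", "Puccini")))
```

Because the associations live apart from the documents themselves, the same map can be laid over different pools of occurrences.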
A Topic Map
Info. set = Opera
Topic map (RDF/XML/XTM):
  Topics: M.Butterfly, Tosca, Puccini, Verdi, Lucca, Rome, Italy
  Associations: “Born In”, “Is In”, “Written By”, “Influenced By”
Occurrences
Applications of Topic Maps
• Enhance & extend existing taxonomies
• Provide customizable, personalized routes to information
• Build a structured semantic network link over web, portals and intranets.
• Apply multiple maps (views) to the same information pool
• Maps collectable, interchangeable
Parting Words
• Create a taxonomy that reflects the needs and organizational logic of your business - nothing more, nothing less.
• Plan to commit enough people with expertise in the subject matter to create and maintain the taxonomy.
• Keep up with & evaluate the taxonomy technology.
• Use a phased approach to implementation.
• Make enterprise content management part of your taxonomy initiative.