
Taxonomy Strategies LLC

May 22, 2005. Copyright 2005 Taxonomy Strategies LLC. All rights reserved.

Workshop: Why and How to Use Dublin Core for Enterprise-Wide Metadata Applications

Ron Daniel & Joseph Busch

Taxonomy Strategies

Taxonomy Strategies LLC: The business of organized information

Workshop goals

1. What is the Dublin Core?

2. Answer these enterprise-wide metadata ROI questions:
   What is the value proposition for adding metadata to content? Does metadata make content reusable? Findable? Does it improve productivity? How can metadata value be measured in a way that quantifies its contribution to the bottom line?

3. Answer these business process questions:
   How is Dublin Core tagging being done on content to expose metadata to portals, search engines, and other metadata-aware applications? How are metadata value spaces (controlled vocabularies) maintained within an enterprise? Across enterprises?

4. Answer these technology questions:
   What tools exist to use Dublin Core and other metadata standards in enterprise information management environments?


Agenda

3:30 Introductions: Us and you
3:45 Background: Metadata & controlled vocabularies
4:00 Dublin Core: Elements, issues, and recommendations
4:30 Dublin Core in the wild: CEN study and remarks
4:45 Enterprise-wide metadata ROI questions
5:00 Break
5:15 ROI (cont.)
5:30 Business processes
6:15 Tools & technologies
6:30 Q&A
6:45 Adjourn


Who we are: Joseph Busch

Over 25 years in the business of organized information:
  Founder, Taxonomy Strategies
  Director, Solutions Architecture, Interwoven
  VP, Infoware, Metacode Technologies (acquired by Interwoven, November 2000)
  Program Manager, Getty Foundation
  Manager, Pricewaterhouse

Metadata and taxonomies community leadership:
  President, American Society for Information Science & Technology
  Director, Dublin Core Metadata Initiative
  Adviser, National Research Council Computer Science and Telecommunications Board
  Reviewer, National Science Foundation Division of Information and Intelligent Systems
  Founder, Networked Knowledge Organization Systems/Services


Who we are: Ron Daniel, Jr.

Over 15 years in the business of metadata & automatic classification:
  Principal, Taxonomy Strategies
  Standards Architect, Interwoven
  Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000)
  Technical Staff Member, Los Alamos National Laboratory

Metadata and taxonomies community leadership:
  Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
  Acting chair, XML Linking working group
  Member, RDF working groups
  Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports


Recent & current projects

Government:
  Commodity Futures Trading Commission
  Defense Intelligence Agency
  ERIC
  Federal Aviation Administration
  Federal Reserve Bank of Atlanta
  Forest Service
  GSA Office of Citizen Services (www.firstgov.gov)
  Head Start
  Infocomm Development Authority of Singapore
  NASA (nasataxonomy.jpl.nasa.gov)
  Small Business Administration
  Social Security Administration
  USDA Economic Research Service
  USDA e-Government Program (www.usda.gov)

Commercial:
  Allstate Insurance
  Blue Shield of California
  Debevoise & Plimpton
  Halliburton
  Hewlett Packard
  Motorola
  PeopleSoft
  PricewaterhouseCoopers
  Siderean Software
  Sprint
  Time Inc.

Commercial subcontracts:
  Agency.com (top financial services)
  Critical Mass (Fortune 50 retailer)
  Deloitte Consulting (big credit card)
  Gistics/OTB (direct selling giant)

NGOs:
  CEN
  IDEAlliance
  IMF
  OCLC


What we do

Organize Stuff


Who are you? Tell us:

Your name
Your organization
Your job title
The things you want to get from this workshop



Metadata: Different definitions

Library & Information Science:
  Author/Title/Subject
  Controlled vocabularies for subject codes (e.g. Dewey)
  Authority files for author names

Database:
  Tables/Columns/Datatypes/Relationships
  References for some values


Metadata: Why it matters

“Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”

“Enriching content with structured metadata is critical for supporting search and personalized content delivery.”

“Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”

“Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.”


Metadata: Supports core functions

Asset metadata (Who): Creator, Publisher, Contributor, Type, Format, Identifier

Subject metadata (What, Where & Why): Subject, Title, Description, Coverage

Relational metadata (Links between and to): Source, Relation

Use metadata (When & How): Date, Language, Rights

[Figure: enabled functionality plotted against complexity, ranging from "More efficient editorial process" to "Better navigation & discovery"]

Source: http://dublincore.org/documents/dces/
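The four groupings above can be written down as a lookup table and used to check which functional groups a record actually populates. A minimal sketch: the grouping is copied from the slide, while `coverage_by_group` and the sample record are hypothetical.

```python
# Grouping of the 15 Dublin Core elements, as presented on the slide.
DC_GROUPS = {
    "asset": ["creator", "publisher", "contributor", "type", "format", "identifier"],
    "subject": ["subject", "title", "description", "coverage"],
    "relational": ["source", "relation"],
    "use": ["date", "language", "rights"],
}


def coverage_by_group(record):
    """Return, per group, the elements that have a non-empty value in `record`.

    `record` is a hypothetical dict mapping DC element names to values.
    """
    return {
        group: [element for element in elements if record.get(element)]
        for group, elements in DC_GROUPS.items()
    }


record = {"identifier": "http://example.com/doc/42", "title": "Q2 report", "date": "2005-05-22"}
for group, present in coverage_by_group(record).items():
    print(f"{group}: {present}")
```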


What is a taxonomy? Systematics view

Hierarchical classification of things into a tree structure

Linnaeus: Kingdom > Phylum > Class > Order > Family > Genus > Species
  Animalia > Chordata > Mammalia > Carnivora > Canidae > Canis > C. familiaris

UNSPSC: Segment > Family > Class > Commodity
  44 Office Equipment and Accessories and Supplies
    .12 Office Supplies
      .17 Writing Instruments
        .05 Mechanical pencils
        .06 Wooden pencils
        .07 Colored pencils
  …
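In the UNSPSC example each level contributes two digits, so a commodity's full code is the concatenation of its segment, family, class, and commodity numbers (44 + 12 + 17 + 05 gives 44121705 for mechanical pencils). A minimal sketch of that path-to-code relationship; the nested dict is a hand-copied excerpt of the slide, not a full UNSPSC load, and `leaf_codes` is a hypothetical helper.

```python
# Excerpt of the UNSPSC hierarchy shown on the slide: code -> (label, children).
UNSPSC = {
    "44": ("Office Equipment and Accessories and Supplies", {
        "12": ("Office Supplies", {
            "17": ("Writing Instruments", {
                "05": ("Mechanical pencils", {}),
                "06": ("Wooden pencils", {}),
                "07": ("Colored pencils", {}),
            }),
        }),
    }),
}


def leaf_codes(tree, prefix="", path=()):
    """Yield (full_code, label_path) for every leaf, concatenating the two-digit level codes."""
    for code, (label, children) in tree.items():
        full, labels = prefix + code, path + (label,)
        if children:
            yield from leaf_codes(children, full, labels)
        else:
            yield full, labels


for code, labels in leaf_codes(UNSPSC):
    print(code, " > ".join(labels))
```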



Dublin Core: A little more complicated

Elements: 1. Identifier, 2. Title, 3. Creator, 4. Contributor, 5. Publisher, 6. Subject, 7. Description, 8. Coverage, 9. Format, 10. Type, 11. Date, 12. Relation, 13. Source, 14. Rights, 15. Language

Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid

Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF

Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
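Each refinement in the lists above narrows one of the 15 elements, so software that does not recognize a refinement can still fall back to the element it refines (DCMI calls this "dumbing down"). The sketch below assumes a partial table keyed by the `dcterms:` property names (e.g. `modified`, `isPartOf`); `dumb_down` and the sample record are hypothetical.

```python
# Partial map from a DCMI refinement (dcterms: name) to the element it refines.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "modified": "date",
    "dateCopyrighted": "date",
    "extent": "format",
    "medium": "format",
    "spatial": "coverage",
    "temporal": "coverage",
    "isPartOf": "relation",
    "hasPart": "relation",
    "accessRights": "rights",
    "license": "rights",
    "bibliographicCitation": "identifier",
}


def dumb_down(record):
    """Collapse refined properties into their parent elements, keeping every value."""
    simple = {}
    for prop, values in record.items():
        element = REFINES.get(prop, prop)  # unrefined elements pass through unchanged
        simple.setdefault(element, []).extend(values)
    return simple


qualified = {"title": ["Annual Report"], "alternative": ["AR 2004"],
             "created": ["2005-01-10"], "modified": ["2005-05-22"]}
print(dumb_down(qualified))
```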


Dublin Core framework for corporate use

Not just 15 elements: a framework to enable cross-resource exploration and use.

Dublin Core is the framework for “integration metadata” at BellSouth.

Source: Todd Stephens, BellSouth


Metadata: A data specification – a recipe example

Element | Data type | Length | Req./Repeat | Source | Purpose | DC mapping

Asset metadata
Unique ID | Integer | Fixed | 1 | System supplied | Basic accountability | dc:identifier
Recipe Title | String | Variable | 1 | Licensed Content | Text search & results display | dc:title
Recipe Summary | String | Variable | 1 | Licensed Content | Content | dc:description
Main Ingredients | List | Variable | ? | Main Ingredients vocabulary | Key index to retrieve & aggregate recipes, & generate shopping list | X

Subject metadata
Meal Types | List | Variable | * | Meal Types vocabulary | Browse or group recipes & filter search results | X
Cuisines | List | Variable | * | Cuisines | | X
Courses | List | Variable | * | Courses vocabulary | | X
Cooking Method | Flag | Fixed | * | Cooking vocabulary | | X

Link metadata
Recipe Image | Pointer | Variable | ? | Product Group | Merchandise products | dcterms:hasPart

Use metadata
Rating | String | Variable | 1 | Licensed Content | Filter, rank, & evaluate recipes |
Release Date | Date | Fixed | 1 | Product Group | Publish & feature new recipes | dc:date

Legend: ? = 1 or more; * = 0 or more

Fixed values: dc:type=“recipe”, dc:format=“text/html”, dc:language=“en”
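The legend makes the specification checkable by machine. This sketch validates a record against the Req./Repeat column; it assumes that "1" means exactly one value, treats "?" and "*" as the legend defines them, and uses hypothetical snake_case field names and a hypothetical `check_record` helper.

```python
# Cardinality per field, from the Req./Repeat column.
#   "1" = exactly one (assumed), "?" = 1 or more, "*" = 0 or more (per the legend)
SPEC = {
    "unique_id": "1",
    "recipe_title": "1",
    "recipe_summary": "1",
    "main_ingredients": "?",
    "meal_types": "*",
    "cuisines": "*",
    "courses": "*",
    "cooking_method": "*",
    "recipe_image": "?",
    "rating": "1",
    "release_date": "1",
}


def check_record(record):
    """Return cardinality violations for a recipe record (field -> list of values)."""
    problems = []
    for field, card in SPEC.items():
        n = len(record.get(field, []))
        if card == "1" and n != 1:
            problems.append(f"{field}: expected exactly 1 value, got {n}")
        elif card == "?" and n < 1:
            problems.append(f"{field}: expected 1 or more values, got {n}")
        # "*" fields accept any number of values, so there is nothing to check.
    return problems


record = {
    "unique_id": [1017],
    "recipe_title": ["Pad Thai"],
    "recipe_summary": ["Stir-fried rice noodles with shrimp."],
    "main_ingredients": ["rice noodles", "shrimp"],
    "rating": ["4"],
    "release_date": ["2005-05-22"],
}
print(check_record(record))  # the record has no recipe_image
```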


Why Dublin Core?

Dublin Core is a de facto standard across many other systems and standards:
  RSS (1.0), OAI
  Inside organizations: portals, CMS, …

Mapping to DC elements from most existing schemes is simple.
  Beware of force-fits.

Why will metadata already exist? Because search projects, portal integration projects, etc. are creating it or standardizing a mapping.

[Figure (source: Todd Stephens, BellSouth): layers labeled "Per-source data types, access controls, etc.", "Dublin Core and similar", and "Taxonomies, vocabularies, ontologies"]
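A crosswalk from an existing scheme is often little more than a field-to-element table; the harder part is refusing to force-fit fields that have no clean Dublin Core home. A minimal sketch with hypothetical CMS field names and a hypothetical `to_dublin_core` helper: unmapped fields are reported instead of being pushed into a near-miss element.

```python
# Hypothetical field names from an in-house CMS, mapped to Dublin Core elements.
CROSSWALK = {
    "doc_title": "title",
    "author_name": "creator",
    "business_unit": "publisher",
    "pub_date": "date",
    "doc_url": "identifier",
    "mime": "format",
}


def to_dublin_core(source):
    """Map a source record to DC, returning (dc_record, unmapped_fields)."""
    dc, unmapped = {}, []
    for field, value in source.items():
        element = CROSSWALK.get(field)
        if element is None:
            unmapped.append(field)  # no clean DC home: report it rather than force-fit it
        else:
            dc.setdefault(element, []).append(value)
    return dc, unmapped


source = {"doc_title": "Travel policy", "author_name": "Smith, J.",
          "pub_date": "2005-05-22", "approval_workflow_id": "WF-88"}
print(to_dublin_core(source))
```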


Creator

“An entity primarily responsible for making the content of the resource”

In other words: author, photographer, illustrator, … Potential refinements by creative role are rarely justified.

Creators can be persons or organizations.

Key point (reminder): name variations are a big data-quality issue:
  Ron Daniel
  Ron Daniel, Jr.
  Ron Daniel Jr.
  R.E. Daniel
  Ronald Daniel
  Ronald Ellison Daniel, Jr.
  Daniel, R.

Name fields may contain other information: <dc:creator>Case, W. R. (NASA Goddard Space Flight Center, Greenbelt, MD, United States)</dc:creator>

Best practice: validate names against LDAP or another “authority file”.

Refinements

None

Encodings

None
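Validating against an authority file usually means normalizing the incoming string and looking it up in an index of known variants. A minimal sketch, assuming a hypothetical in-memory authority file (a real deployment might query LDAP instead). Its limits matter: it only recognizes variants already recorded, and it cannot tell apart two different people who share a name, so unmatched or ambiguous names still need human review.

```python
import re
import unicodedata

# Hypothetical authority file: canonical form -> known variants.
AUTHORITY = {
    "Daniel, Ron, Jr.": ["Ron Daniel", "Ron Daniel, Jr.", "Ron Daniel Jr.",
                         "R.E. Daniel", "Ronald Daniel",
                         "Ronald Ellison Daniel, Jr.", "Daniel, R."],
}


def _key(name):
    """Normalize accents, case, punctuation, and spacing so trivial variants compare equal."""
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z]+", " ", name.lower()).strip()


# Reverse index: normalized variant -> canonical form.
INDEX = {_key(v): canon for canon, variants in AUTHORITY.items() for v in variants + [canon]}


def validate_creator(name):
    """Return the canonical authority-file form, or None if the name needs review."""
    return INDEX.get(_key(name))


print(validate_creator("ron daniel JR"), validate_creator("Carl Lagoze"))
```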


Example – Name mismatches

One of these things is not like the other:

Ron Daniel, Jr. and Carl Lagoze; “Distributed Active Relationships in the Warwick Framework”

Hojung Cha and Ron Daniel; “Simulated Behavior of Large Scale SCI Rings and Tori”

Ron Daniel; “High Performance Haptic and Teleoperative Interfaces”

Differences may not matter. If they do:
  This error cannot be reliably detected automatically.
  Authority files and an error-correction procedure are needed.


Contributor

“An entity responsible for making contributions to the content of the resource.”

In practice, rarely used:
  Difficult to distinguish from Creator.
  Adds UI complexity for no real gain.

Best practice? Recommendation: don't use.

Refinements

None

Encodings

None


Publisher

“An entity responsible for making the resource available”.

Problems:
  All the name-handling issues of Creator.
  Hierarchy of publishers (Bureau, Agency, Department, …)

Refinements

None

Encodings

None


Title

“A name given to the resource”.

Issues:
  Hierarchical titles, e.g. Conceptual Structures: Information Processing in Mind and Machine (The Systems Programming Series)
  Untitled works
  Metaphysics

Refinements

Alternative

Encodings

None


Identifier

“An unambiguous reference to the resource within a given context”

Best practice: URL

Future best practice: URI?

Problems:
  Metaphysics
  Personalized URLs
  Multiple identifiers for the same content
  Non-standard resolution mechanisms for URIs

Recommendation: plan how to introduce long-lived URLs.

Refinements

Bibliographic Citation

Encodings

URI
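One practical response to personalized URLs and multiple identifiers for the same content is to normalize URLs before using them as dc:identifier values. A minimal sketch using Python's standard `urllib.parse`; the set of personalization parameters is a hypothetical example, and real rules depend on how each site builds its URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that personalize a URL without changing the content.
PERSONALIZATION_PARAMS = {"sessionid", "userid", "ref"}


def canonical_identifier(url):
    """Normalize a URL so personalized or cosmetic variants collapse to one identifier."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in PERSONALIZATION_PARAMS]
    query.sort()  # parameter order should not create a "new" identifier
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", urlencode(query), ""))  # fragment dropped


print(canonical_identifier("HTTP://Intranet.Example.com/docs/view?id=42&sessionid=abc#top"))
```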


Date

“A date associated with an event in the life cycle of the resource”

Woefully underspecified.

Typically the publication or last modification date.

Best practice: YYYY-MM-DD

Refinements

Created, Valid, Available, Issued, Modified, Date Accepted, Date Copyrighted, Date Submitted

Encodings

DCMI Period, W3CDTF (profile of ISO 8601)
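Legacy systems rarely store dates as YYYY-MM-DD, so a normalization step is usually needed before a value can be published in the recommended form. A minimal sketch; the list of input layouts is an assumption, and `%m/%d/%Y` deliberately assumes US month-first order, exactly the kind of ambiguity to settle per source system. `to_w3cdtf_date` is a hypothetical helper.

```python
from datetime import datetime

# Assumed input layouts seen in legacy systems; extend per source.
# "%m/%d/%Y" assumes US month-first order.
INPUT_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %B %Y", "%Y%m%d"]


def to_w3cdtf_date(text):
    """Return the date as YYYY-MM-DD (W3CDTF at day precision), or raise ValueError."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized date: {text!r}")


for raw in ["May 22, 2005", "05/22/2005", "22 May 2005", "20050522"]:
    print(raw, "->", to_w3cdtf_date(raw))
```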


Subject

The topic of the content of the resource.

Best practice: use predefined subject schemes, not user-selected keywords. The supported encodings are probably not useful for most corporate needs.

Factor “Subject” into separate facets:
  People, places, organizations, events, objects, services
  Industry sectors
  Content types, audiences, functions
  Topic

Some of these facets are already defined in DC (Coverage, Type) or DCTERMS (Audience).

Refinements

None

Encodings

DDC, LCC, LCSH, MESH, UDC
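Factoring Subject into facets only pays off if each facet value is checked against its controlled list. A minimal sketch with hypothetical facets and values (here `content_type` would correspond to DC Type and `audience` to DCTERMS Audience); values outside the lists are set aside for review rather than accepted as free keywords.

```python
# Hypothetical facet vocabularies; real ones would come from a managed taxonomy.
FACETS = {
    "industry": {"Energy", "Financial services", "Retail"},
    "content_type": {"Policy", "Procedure", "Press release"},
    "audience": {"Employee", "Customer", "Partner"},
    "topic": {"Benefits", "Security", "Travel"},
}


def validate_subjects(tags):
    """Split (facet, value) tags into accepted values and rejected, uncontrolled ones."""
    accepted, rejected = {}, []
    for facet, value in tags:
        if value in FACETS.get(facet, ()):
            accepted.setdefault(facet, []).append(value)
        else:
            rejected.append((facet, value))
    return accepted, rejected


tags = [("content_type", "Policy"), ("topic", "Travel"), ("topic", "trips"), ("mood", "urgent")]
print(validate_subjects(tags))
```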


Coverage

“The extent or scope of the content of the resource”.

In other words – places and times as topics.

Key point: locations are important in SOME environments and irrelevant in others. Time periods as subjects are rarely important in commercial work.

Best Practice – ISO 3166-1, 3166-2

Refinements

Spatial, Temporal

Encodings

Box (for Spatial), ISO3166 (for Spatial), Point (for Spatial), TGN (for Spatial), W3CDTF (for Temporal)
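ISO 3166 codes are easy to check mechanically. A minimal sketch that accepts either an ISO 3166-1 alpha-2 country code or an ISO 3166-2 subdivision code such as `US-MD`; the country list is a hypothetical subset, the subdivision part is checked for shape only (not against the official list), and `check_spatial` is a hypothetical helper.

```python
import re

# Hypothetical subset of ISO 3166-1 alpha-2 codes; load the full list in practice.
COUNTRIES = {"US", "CA", "GB", "DE", "SG"}

# Country code, optionally followed by "-" and a 1-3 character alphanumeric subdivision.
PATTERN = re.compile(r"^([A-Z]{2})(?:-([A-Z0-9]{1,3}))?$")


def check_spatial(code):
    """Return (country, subdivision) for a well-formed code with a known country, else None."""
    m = PATTERN.match(code.strip().upper())
    if not m or m.group(1) not in COUNTRIES:
        return None
    return m.group(1), m.group(2)


for value in ["us-md", "SG", "Maryland", "XX"]:
    print(value, "->", check_spatial(value))
```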


Description

“An account of the content of the resource”.

In other words, an abstract or summary.

Key point: what's the cost/benefit tradeoff for creating descriptions?
  The quality of auto-generated descriptions is low.
  For search results, hit highlighting is probably better.

Refinements

Abstract, Table of Contents

Encodings

None


Type

“The nature or genre of the content of the resource”

Best current practice: create a custom list of content types and use that list for the values. Try to avoid “image”, “audio”, and other format names in the list of content types; they can be derived from “Format”.

No broadly acceptable list has been found yet.

Refinements

None

Encodings

DCMI Type
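Since format-like type names can be derived from Format, a system can compute a broad type from the Internet Media Type and keep the custom content-type list for business-specific values. A minimal sketch; the mapping from the top-level media type to DCMI Type terms is an assumption, it deliberately leaves types such as `application/pdf` unmapped, and `derived_type` is a hypothetical helper.

```python
# Assumed mapping from the top-level part of an Internet Media Type to a DCMI Type term.
MEDIA_TO_DCMI_TYPE = {
    "image": "StillImage",
    "audio": "Sound",
    "video": "MovingImage",
    "text": "Text",
}


def derived_type(media_type):
    """Derive a broad DCMI Type from a dc:format value, or None if it is ambiguous."""
    top = media_type.split("/", 1)[0].strip().lower()
    return MEDIA_TO_DCMI_TYPE.get(top)


for fmt in ["image/jpeg", "audio/mpeg", "application/pdf"]:
    print(fmt, "->", derived_type(fmt))
```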


Format

“The physical or digital manifestation of the resource.”

In other words – the file format

Best practice: Internet Media Types

Outliers: File sizes, dimensions of physical objects

Refinements

Extent, Medium

Encodings

IMT
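When a repository holds files, a first-pass dc:format value can be guessed from the file extension with Python's standard `mimetypes` module. A minimal sketch: it looks only at the name, not the content, and the fallback `application/octet-stream` is a choice rather than a requirement. `internet_media_type` is a hypothetical helper.

```python
import mimetypes


def internet_media_type(filename, default="application/octet-stream"):
    """Guess an Internet Media Type for dc:format from a file name's extension."""
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type or default


for name in ["minutes.html", "report.pdf", "README"]:
    print(name, "->", internet_media_type(name))
```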


Language

“A language of the intellectual content of the resource”.

Best Practice: ISO 639, RFC 3066

Dialect codes: Advanced practice

Refinements

None

Encodings

ISO639-2, RFC1766, RFC3066
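RFC 3066 tags have a simple shape: a primary language subtag (two or three letters for ISO 639 codes, or `i`/`x` for IANA-registered and private-use tags) followed by optional subtags, where a two-letter second subtag such as `US` in `en-US` is a country code, the usual way to express a dialect. A minimal sketch that checks this shape only; it does not confirm that the codes are registered, and `parse_language` is a hypothetical helper.

```python
import re

# Primary subtag: 2-3 letters (ISO 639) or "i"/"x"; then 1-8 character alphanumeric subtags.
TAG = re.compile(r"^(?:[A-Za-z]{2,3}|[iIxX])(?:-[A-Za-z0-9]{1,8})*$")


def parse_language(tag):
    """Split a language tag into (primary, other subtags), lowercased; raise ValueError if malformed."""
    if not TAG.match(tag):
        raise ValueError(f"not a well-formed RFC 3066 tag: {tag!r}")
    primary, *rest = tag.lower().split("-")
    return primary, rest


print(parse_language("en-US"), parse_language("de"))
```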


Relation

“A reference to a related resource”

Very weak meaning – not even as strong as “See also”.

Best practice: Use a refinement element and URLs.

Refinements

Is Version Of, Has Version, Is Replaced By, Replaces, Is Required By, Requires, Is Part Of, Has Part, Is Referenced By, References, Is Format Of, Has Format, Conforms To

Encodings

URI
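Most of the Relation refinements come in inverse pairs, so a link recorded in one direction implies the other. A minimal sketch that derives the implied inverse links from (resource URL, refinement, resource URL) triples; the URLs are placeholders, `with_inverses` is a hypothetical helper, and `conformsTo` is left one-way because DCMI defines no inverse for it.

```python
# Inverse pairs among the dcterms relation refinements.
INVERSE = {
    "isVersionOf": "hasVersion",
    "isReplacedBy": "replaces",
    "isRequiredBy": "requires",
    "isPartOf": "hasPart",
    "isReferencedBy": "references",
    "isFormatOf": "hasFormat",
}
INVERSE.update({v: k for k, v in INVERSE.items()})  # make the lookup work in both directions


def with_inverses(triples):
    """Return the given triples plus every inverse link they imply."""
    out = set(triples)
    for subject, prop, obj in triples:
        if prop in INVERSE:
            out.add((obj, INVERSE[prop], subject))
    return out


links = [("http://example.com/ch1", "isPartOf", "http://example.com/book"),
         ("http://example.com/spec", "conformsTo", "http://example.com/std")]
for triple in sorted(with_inverses(links)):
    print(triple)
```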


Source

“A reference to a resource from which the present resource is derived”

Original intent was for derivative works

Frequently abused to provide bibliographic information for items extracted from a larger work, such as articles from a Journal

Refinements

None

Encodings

URI


Rights

“Information about rights held in and over the resource”

Could be a copyright statement, or a list of groups with access rights, or …

Refinements

Access Rights, License

Encodings

None



CEN/ISSS Workshop on Dublin Core. Guidance information for the deployment of Dublin Core metadata in Corporate Environments

http://www.cenorm.be/cenorm/businessdomains/businessdomains/isss/cwa/cwa15247.asp


Dublin Core: CEN/ISSS Workshop on Dublin Core Metadata – corporate uses

Applied Information Technique

AstraZeneca, BBC, BellSouth, Cisco, DaimlerChrysler, Giunti Labs, GSK, Halliburton, HP, IBM, Intel, John Wiley & Sons, Lilly, PeopleSoft, Rohm and Haas, SAP, Software AG, Unisys


How is Dublin Core used in corporate environments?

[Bar chart: De facto 57%; Simple 43%; Access enabler 43%; Compliance 29%]

Base: 20 corporate information managers. CEN/ISSS Workshop on Dublin Core: Guidance information for the deployment of Dublin Core metadata in Corporate Environments


Taxonomy: e-Forms example

Facets (each a controlled vocabulary):

Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of the President; 0003 Executive departments: 1200 Agriculture, 1300 Commerce, 9700 Defense, 9100 Education, 8900 Energy, 7500 HHS, 7000 DHS, 8600 HUD, 1400 Interior, 1500 Justice, 1600 Labor, 1900 State, 6900 Transportation, 2000 Treasury, 3600 Veterans; Independent agencies; International organizations

Form Type: Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction

Keyword Topic: Agriculture & food; Commerce; Communications; Education; Energy; Environmental protection; Foreign relations; Government; Health & safety; Housing & community development; Labor; Law; Named groups; National defense; Natural resources; Recreation; Science & technology; Social programs; Transportation

Audience: All; General; Citizen; Business; Government; Employee; Native American; Non-resident; Tourist; Special group

Industry Impact: 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construction; 31-33 Manufacturing; 42 Wholesale; 44-45 Retail; 48-49 Transportation; 51 Information; 52 Finance; 54 Professional; 55 Management; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Administration

Jurisdiction: Federal; State +; Local +; Other +

BRM Impact: Citizen services; Social services; Defense; Disasters; Economic development; Education; Energy; Environmental management; Law enforcement; Judicial; Correctional; Health; Security; Income security; Intelligence; International affairs; Natural resources; Transportation; Workforce; Science; Delivery; Support; Management


How is Dublin Core extended?

[Bar chart: Doc types 100%; Products & services 86%; Roles 57%; Inconsistent encoding 57%]

Base: 20 corporate information managers. CEN/ISSS Workshop on Dublin Core: Guidance information for the deployment of Dublin Core metadata in Corporate Environments


Custom business process document types? Ouch!

Oil & gas services company document types

analysis, appraisals, assessments, forecasts, predictions

agendas, plans, designs, schedules, workflow

applications, proposals, requests, requirements

permits, consents, approvals, rejections, certificates

work orders, correspondence

auditing, compliance, testing, inspections, operations reports

lessons learned, after-action reviews, meeting minutes, FAQs

policies, procedures, training manuals, standards, best practices

research notes, journal articles

newsletters, bulletins, press releases

ads, brochures, data sheets, technical notes, case studies, price lists

checklists, templates, forms, logos, branding

software, database forms


The power of taxonomy facets

4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4).
  Easier to maintain
  Can be easier to navigate
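The arithmetic behind the claim: with four facets of ten values each, an editor maintains 4 × 10 = 40 terms, while choosing one value per facet yields 10^4 = 10,000 distinct combinations. A short check that enumerates the combinations explicitly:

```python
from itertools import product

facets, values_per_facet = 4, 10

# Terms an editor has to maintain: one 10-value list per facet.
maintained_terms = facets * values_per_facet

# Distinct selections a user can make by picking one value from each facet.
combinations = sum(1 for _ in product(range(values_per_facet), repeat=facets))

print(maintained_terms, combinations)  # 40 10000
```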


Taxonomic metadata example: Form SS-4, Employer Identification Number (EIN)

Facet: Values
  Agency: IRS
  Content Type: Information Submission
  Industry Impact: Generic
  Jurisdiction: Federal
  Programs & Services: Support Delivery of Services/General Government/Taxation Management
  Keyword Topic: Commerce/Employment taxes
  Audience: Business
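Two of the SS-4 values are paths through a hierarchy. If such values are stored as "/"-separated paths (an assumption about representation; the slide does not specify one), a selection at a broader level can also retrieve the form. A minimal sketch; the record is copied from the slide and `matches` is a hypothetical helper.

```python
# The SS-4 facet values, with hierarchical values stored as "/"-separated paths.
ss4 = {
    "Agency": ["IRS"],
    "Content Type": ["Information Submission"],
    "Industry Impact": ["Generic"],
    "Jurisdiction": ["Federal"],
    "Programs & Services": ["Support Delivery of Services/General Government/Taxation Management"],
    "Keyword Topic": ["Commerce/Employment taxes"],
    "Audience": ["Business"],
}


def matches(record, facet, selected):
    """True if a value of `facet` equals `selected` or lies beneath it in the hierarchy."""
    return any(v == selected or v.startswith(selected + "/") for v in record.get(facet, []))


print(matches(ss4, "Keyword Topic", "Commerce"))  # broader topic still finds the form
print(matches(ss4, "Keyword Topic", "Comm"))      # partial words do not count
```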



Fundamentals of metadata ROI

Tagging content with metadata and a taxonomy is a cost, not a benefit.

There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.

Putting metadata and a taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.

You need to determine those changes, and their costs, as part of the ROI.


Common metadata ROI scenarios

Catalog site: increased sales; increased productivity.

Customer support: cutting costs; increased sales.

Compliance: avoiding penalties.

Knowledge worker productivity: less time searching, more time working.

Executive mandate: no ROI study, just someone with a vision and a budget.


Metadata ROI: Catalog site

Guided navigation:
  2-3 clicks to product
  No dead ends

http://www.tesco.com/winestore
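Guided navigation of this kind is commonly built by recounting facet values over the current result set after every click and offering only values with a non-zero count. A minimal sketch of that idea with a hypothetical wine catalog and a hypothetical `facet_counts` helper; it is not a description of how the Tesco site works, which the slide does not cover.

```python
from collections import Counter

# Hypothetical catalog records, each tagged with one value per facet.
wines = [
    {"colour": "Red", "country": "France", "price": "under 10"},
    {"colour": "Red", "country": "Chile", "price": "under 10"},
    {"colour": "White", "country": "France", "price": "10 to 20"},
    {"colour": "Sparkling", "country": "Spain", "price": "10 to 20"},
]


def facet_counts(items, selected):
    """Count remaining facet values among items that match every current selection.

    Only values that actually occur are counted, so every link offered to the
    user leads to at least one product: the "no dead ends" rule.
    """
    matching = [it for it in items if all(it.get(f) == v for f, v in selected.items())]
    counts = {}
    for it in matching:
        for facet, value in it.items():
            if facet not in selected:
                counts.setdefault(facet, Counter())[value] += 1
    return counts


print(facet_counts(wines, {"colour": "Red"}))
```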


Metadata ROI: Catalog site

Increased sales:
  Product findability
  Product cross-sells and up-sells
  Customer loyalty

1-5% increase in sales ($57.6B sales and $2.1B net income in ’04): $600M to $2.9B/year in sales; $21M to $105M/year in net income

Enterprise portal cost: $6M

1-5% increase in productivity ($50K average cost per employee, 310,400 employees in ’04): $155M to $776M/year
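The ranges on this slide are percentage arithmetic on the reported figures, which makes them easy to recompute when the inputs change. A minimal sketch; `benefit_range` is a hypothetical helper, and the 1% and 5% bounds are the slide's.

```python
def benefit_range(base, low=0.01, high=0.05):
    """Return the (low, high) annual benefit of a low..high fractional improvement on base."""
    return base * low, base * high


sales = benefit_range(57.6e9)                   # $57.6B sales ('04)
net_income = benefit_range(2.1e9)               # $2.1B net income ('04)
productivity = benefit_range(50_000 * 310_400)  # $50K average cost x 310,400 employees

for label, (lo, hi) in [("Sales", sales), ("Net income", net_income),
                        ("Productivity", productivity)]:
    print(f"{label}: ${lo / 1e6:,.0f}M to ${hi / 1e6:,.0f}M per year")
```

Set against the $6M portal cost, even the low ends of these ranges are large.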


Metadata ROI: Customer support model

Policy categories for browsing

Type and go to search for specific policies

Good search results for policy topics, e.g., “pets”

Refine search offered with results

Help on search page, not a click away.


Metadata ROI: Customer support model

Self service: Fewer customer calls. Faster, more accurate CSR responses through better information access.

25-50% service efficiency increase, based on 300K customer service calls per month at $6 cost per call: $5.4M to $10.8M/yr

Manual processing: 100,000 documents, 2 pages per document, $4 per page: $800K

1-5% increased sales, based on $18.6B sales (’04) and ($761M) net income (’04): $186M to $930M/year in sales; ($575M) to $169M/year in net income
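The customer support figures can be reproduced the same way. This sketch works in integer dollars (and millions for the sales line) so the results come out exact. One thing the recomputation makes visible: the slide's net income range adds the entire sales gain to the 2004 net loss, which assumes every extra sales dollar reaches the bottom line.

```python
# All money amounts in US dollars; inputs as stated on the slide.
calls_per_month = 300_000
cost_per_call = 6
annual_call_cost = calls_per_month * 12 * cost_per_call        # $21.6M/year

# 25-50% service efficiency increase applied to the annual call cost.
call_savings = (annual_call_cost * 25 // 100, annual_call_cost * 50 // 100)

# One-time manual processing cost: documents x pages per document x $ per page.
manual_processing = 100_000 * 2 * 4

# 1-5% sales increase, in millions of dollars. The net income range adds the
# full sales gain to the ('04) net loss, as the slide does.
sales_m, net_income_m = 18_600, -761
sales_gain_m = (sales_m * 1 // 100, sales_m * 5 // 100)
net_income_after_m = tuple(net_income_m + g for g in sales_gain_m)

print(call_savings)        # (5400000, 10800000)
print(manual_processing)   # 800000
print(sales_gain_m)        # (186, 930)
print(net_income_after_m)  # (-575, 169)
```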

Page 51

Metadata ROI: Compliance

Avoiding penalties for breaching regulations
• SOX: up to 5 years in jail
• SOX: up to $5M

Following required procedures
• Loss of company: $100B revenue (’00)
• Loss of partner companies: Arthur Andersen

Page 52

Chart segments: Searching, Creating, Communicating

Knowledge workers spend up to 2.5 hours each day looking for information …

… But find what they are looking for only 40% of the time.

— Kit Sims Taylor

Page 53

High cost of not finding information

“The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”

— Sue Feldman,

High cost of poor classification

Poor classification costs a 10,000-user organization $10M each year, about $1,000 per employee.

— Jakob Nielsen, useit.com

But “better search” itself is a weak ROI

Page 54

Chart segments: Creating new content, Recreating existing content, Searching, Communicating (values shown: 26%, 9%)

Knowledge workers spend more time re-creating existing content than creating new content

— Kit Sims Taylor

Page 55

Metadata ROI: Productivity

Decreased cost to market
• Decreased development cost: 1-5% decrease in drug development cost, based on $800M/drug: $8M to $16M/drug
• Increased R&D productivity: 5-10% increase in R&D productivity, based on R&D at 13% of revenue and $39B in sales (’04): $254M to $507M/year
• Reduced time for sales & marketing: 10-20% decrease in time for sales & marketing, based on sales & marketing at 13% of revenue: $254M to $507M/year

Enterprise document management system cost: $10M
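A quick recomputation from the stated inputs, in integer dollars, helps check the productivity figures. The R&D row reproduces ($253.5M, which rounds to $254M, up to $507M). The other two rows come out higher: 1-5% of $800M is $8M to $40M per drug, and 10-20% of sales & marketing spend is about $507M to $1,014M per year. The slide's ranges for those rows correspond to 1-2% and 5-10% respectively, so they are worth double-checking.

```python
def pct_range(base: int, low_pct: int, high_pct: int) -> tuple[int, int]:
    """Return (low, high) for a percentage of a base amount, in whole dollars."""
    return base * low_pct // 100, base * high_pct // 100

drug_dev_cost = 800_000_000        # $800M per drug
revenue = 39_000_000_000           # $39B sales ('04)
rd_budget = revenue * 13 // 100    # R&D at 13% of revenue (slide)
sm_budget = revenue * 13 // 100    # sales & marketing at 13% of revenue (slide)

rows = {
    "drug development (1-5%)": pct_range(drug_dev_cost, 1, 5),
    "R&D productivity (5-10%)": pct_range(rd_budget, 5, 10),
    "sales & marketing time (10-20%)": pct_range(sm_budget, 10, 20),
}
for name, (lo, hi) in rows.items():
    print(f"{name}: ${lo / 1e6:,.1f}M to ${hi / 1e6:,.1f}M")
```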

Page 56

Metadata FAQ: Executive mandate is key

There is no ROI out of the box, just someone with a vision…

…and the budget to make it happen.

What’s really needed? Demos and proofs of value, so that a stronger cost-benefit argument can be made for continuing the work.

Page 57

Metadata FAQ: How do you sell it?

Don’t sell “metadata” or “taxonomy”, sell the vision of what you want to be able to do.

Clearly understand what the problem is and what the opportunities are.

Do the calculus (costs and benefits): design the taxonomy, in terms of level of effort (LOE), in relation to the value at hand.

Page 58

Agenda

3:30 Introductions: Us and you
3:45 Background: Metadata & controlled vocabularies
4:00 Dublin Core: Elements, issues, and recommendations
4:30 Dublin Core in the wild: CEN study and remarks
4:45 Enterprise-wide metadata ROI questions
5:00 Break
5:15 ROI (Cont.)
5:30 Business processes
6:15 Tools & technologies
6:30 Q&A
6:45 Adjourn

Page 59

Overview of metadata practices

• Identify the team
• Use (or map to) Dublin Core for basic information.
• Extend with custom elements for specific facts.
• Use pre-existing, standard vocabularies as much as possible:
  – ISO country codes for locations
  – Product & service info from ERP system
  – Validate author names with LDAP directory
• Design a QC process
  – Start with an error-correction process, then get more formal on error detection
  – Large-scale ontologies may be valuable in automated error detection
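The error-detection step can start as a simple lookup of each element's values against its controlled vocabulary. The sketch below is a minimal version under stated assumptions: the vocabularies are tiny hypothetical stand-ins for the ISO 3166 code list, the ERP product master, and an LDAP author directory, and `find_errors` is an illustrative name.

```python
# Hypothetical controlled vocabularies; a real deployment would load these from
# the ISO 3166 code list, the ERP product master, and an LDAP directory.
ISO_COUNTRY_CODES = {"US", "GB", "ES", "FR", "DE"}
PRODUCT_NAMES = {"Widget Pro", "Widget Lite"}
AUTHOR_NAMES = {"Jane Doe", "John Smith"}

VOCABULARIES = {
    "dc:coverage": ISO_COUNTRY_CODES,
    "dc:subject": PRODUCT_NAMES,
    "dc:creator": AUTHOR_NAMES,
}

def find_errors(record: dict[str, list[str]]) -> list[str]:
    """Report values that are not in the controlled vocabulary for their element."""
    errors = []
    for element, values in record.items():
        allowed = VOCABULARIES.get(element)
        if allowed is None:
            continue  # free-text element such as dc:title: nothing to check against
        for value in values:
            if value not in allowed:
                errors.append(f"{element}: unknown value {value!r}")
    return errors

record = {
    "dc:title": ["Q3 product brochure"],
    "dc:coverage": ["US", "UK"],   # 'UK' is not an assigned ISO 3166-1 alpha-2 code ('GB' is)
    "dc:creator": ["Jane Doe"],
}
print(find_errors(record))  # ["dc:coverage: unknown value 'UK'"]
```

Flagged values feed the manual error-correction process; more formal detection (for example, ontology-based consistency checks) can be layered on later.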

Page 60

Factor “Subject” into smaller facets

• Size: DMOZ tries to organize all web content, and has more than 600k categories!
  – Difficulty in navigating, maintaining
• Hidden facet structure: “Classification Schemes” vs. “Taxonomies”
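Here is why factoring "Subject" into facets keeps things manageable. A scheme that pre-combines facet values needs one category per combination, while a faceted scheme maintains each list separately. The facet sizes below are hypothetical, chosen only to show the multiplicative versus additive growth.

```python
from math import prod

# Hypothetical facet sizes, for illustration only.
facets = {"Topic": 200, "Location": 250, "Content Type": 40, "Audience": 12}

# An enumerative scheme that pre-combines every facet value needs one
# category per combination; a faceted scheme maintains each list on its own.
enumerated_categories = prod(facets.values())
faceted_terms = sum(facets.values())

print(f"enumerated: {enumerated_categories:,} categories")
print(f"faceted:    {faceted_terms:,} terms in {len(facets)} separately maintained lists")
```

In practice an enumerative scheme only creates the combinations it actually needs. Even so, the count grows multiplicatively, which is the navigation and maintenance problem illustrated by DMOZ's 600k+ categories.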

Page 61

Sources for 7 common vocabularies

• Organization: Organizational structure.
  Potential sources: FIPS 95-2, U.S. Government Manual, your organizational structure, etc.
• Content Type: Structured list of the various types of content being managed or used.
  Potential sources: DC Types, AGLS Document Type, AAT Information Forms, records management policy, etc.
• Industry: Broad market categories such as lines of business, life events, or industry codes.
  Potential sources: FIPS 66, SIC, NAICS, etc.
• Location: Place of operations or constituencies.
  Potential sources: FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div., US Postal Service, etc.
• Function: Functions and processes performed to accomplish mission and goals.
  Potential sources: FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
• Topic: Business topics relevant to your mission and goals.
  Potential sources: Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
• Audience: Subset of constituents to whom a piece of content is directed or intended to be used.
  Potential sources: GEM, ERIC Thesaurus, IEEE LOM, etc.
• Products and Services: Names of products/programs & services.
  Potential sources: ERP system, your products and services, etc.

Dublin Core elements annotated on the slide: dc:publisher, dc:type, dc:coverage, dc:subject, dcterms:audience
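Once facet values are chosen, they can be serialized as Dublin Core. The sketch below uses Python's standard `xml.etree.ElementTree` with the real DCMI namespace URIs. Two things are assumptions: the facet-to-element mapping (Organization→dc:publisher, Content Type→dc:type, Location→dc:coverage, Topic→dc:subject, Audience→dcterms:audience), inferred from the elements annotated on the slide, and the arbitrary `<metadata>` wrapper element.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

# Assumed facet-to-element mapping, in ElementTree's {namespace}local-name form.
FACET_TO_ELEMENT = {
    "Organization": f"{{{DC}}}publisher",
    "Content Type": f"{{{DC}}}type",
    "Location": f"{{{DC}}}coverage",
    "Topic": f"{{{DC}}}subject",
    "Audience": f"{{{DCTERMS}}}audience",
}

def to_dc_xml(facet_values: dict[str, list[str]]) -> str:
    """Serialize facet values as Dublin Core elements inside a wrapper element."""
    root = ET.Element("metadata")  # wrapper name is arbitrary, not a DC requirement
    for facet, values in facet_values.items():
        tag = FACET_TO_ELEMENT[facet]
        for value in values:
            ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")

print(to_dc_xml({
    "Organization": ["Office of the CIO"],
    "Content Type": ["Manuals & Forms"],
    "Location": ["ES"],
    "Audience": ["Employees"],
}))
```

Industry, Function, and Products and Services would typically be added as further dc:subject values or as custom elements, depending on how the facets are declared.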

Page 62

Cheap and Easy Metadata

Some fields will be constant across a collection. In the context of a single collection those kinds of elements add no value, but they add tremendous value when many collections are brought together into one place, and they are cheap to create and validate.
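Here is a minimal sketch of why constant fields are cheap. They can be set once per collection and stamped onto each record when collections are merged. The collection names and default values are hypothetical.

```python
# Hypothetical collection-level defaults: constant within each collection,
# so they are set once per collection rather than tagged item by item.
COLLECTION_DEFAULTS = {
    "hr-policies": {"dc:publisher": "Human Resources", "dc:type": "Policy"},
    "eng-specs": {"dc:publisher": "Engineering", "dc:type": "Specification"},
}

def merge_collections(collections: dict[str, list[dict]]) -> list[dict]:
    """Combine records from several collections, stamping each with its
    collection's constant fields (item-level values win if present)."""
    merged = []
    for name, records in collections.items():
        defaults = COLLECTION_DEFAULTS.get(name, {})
        for record in records:
            merged.append({**defaults, **record})
    return merged

combined = merge_collections({
    "hr-policies": [{"dc:title": "Travel Policy"}],
    "eng-specs": [{"dc:title": "Widget Interface Spec"}],
})
# Within one collection dc:publisher is redundant; in the merged set it is
# what lets a portal separate HR content from Engineering content.
hr_only = [r for r in combined if r["dc:publisher"] == "Human Resources"]
print([r["dc:title"] for r in hr_only])  # ['Travel Policy']
```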

Page 63

Taxonomy Business Processes

• Taxonomies must change, gradually, over time if they are to remain relevant

• Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions

• A team will need to maintain the taxonomy on a part-time basis

• Taxonomy team reports to some other steering committee

Page 64

Controlled Vocabulary Governance Environment

Diagram components: Syndicated Terminologies (ISO 3166-1, Other External, ERP, Other Internal); Vocabulary Management System; Custodians; Other Controlled Items; Notifications; Change Requests & Responses; Published CVs and STs; Consuming Applications (Intranet Search, Intranet Nav., Web CMS, DAM, Archives, ERMS).

1: Syndicated Terminologies change on their own schedule
2: CV Team decides when to update CVs
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of CVs published to consuming applications
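The four numbered steps can be sketched as a small versioned-vocabulary model. This is a hypothetical illustration of the workflow, not the interface of any particular vocabulary management system: the team pulls in upstream changes when it decides to, adds local value such as synonyms, and publishes a new version that consuming applications are notified about.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ControlledVocabulary:
    """A versioned CV maintained by a custodian team (hypothetical model)."""
    name: str
    terms: set[str]
    synonyms: dict[str, str] = field(default_factory=dict)  # synonym -> preferred term
    version: int = 1
    subscribers: list[Callable[[str, int], None]] = field(default_factory=list)

    def adopt_upstream(self, upstream_terms: set[str]) -> None:
        # Step 2: the team decides when to pull in a syndicated terminology's
        # changes, instead of tracking every upstream release automatically.
        self.terms = set(upstream_terms)

    def add_synonym(self, synonym: str, preferred: str) -> None:
        # Step 3: value added locally on top of the upstream terms.
        if preferred not in self.terms:
            raise ValueError(f"{preferred!r} is not a term in {self.name}")
        self.synonyms[synonym] = preferred

    def publish(self) -> None:
        # Step 4: bump the version and notify consuming applications.
        self.version += 1
        for notify in self.subscribers:
            notify(self.name, self.version)

received = []
countries = ControlledVocabulary("Location", {"US", "GB"})
countries.subscribers.append(lambda name, v: received.append((name, v)))  # e.g. intranet search

# Step 1 happened upstream: a new release of the country list appeared.
countries.adopt_upstream({"US", "GB", "ES"})
countries.add_synonym("UK", "GB")
countries.publish()
print(received)  # [('Location', 2)]
```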

Page 65

Other Controlled Items

Taxonomy Team will have additional items to manage:
• Charter, Goals, Performance Measures
• Editorial rules
• Team processes
• Tagger training materials (manual and automatic)
• Outreach & ROI
  – Communication plan: Website, Presentations, Announcements
• Roadmap

Page 66

Taxonomy governance | Generic team charter

Taxonomy Team is responsible for maintaining:
• The Taxonomy, a multi-faceted classification scheme
• Associated taxonomy materials, such as:
  – Editorial Style Guide
  – Taxonomy Training Materials
  – Metadata Standard
  – Team rules and procedures (subject to CIO review)

Team evaluates costs and benefits of suggested changes.

Taxonomy Team will:
• Manage the relationship between providers of source vocabularies and consumers of the Taxonomy
• Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
• Promote awareness and use of the Taxonomy

Page 67

Other Controlled Items - Editorial Rules

To ensure consistent style, rules are needed.

Issues commonly addressed in the rules: Sources of Terms, Abbreviations, Ampersands, Capitalization, Continuations (More… or Other…), Duplicate Terms, Hierarchy and Polyhierarchy, Languages and Character Sets, Length Limits, “Other” (Allowed or Forbidden?), Plural vs. Singular Forms, Relation Types and Limits, Scope Notes, Serial Comma, Spaces, Synonyms and Acronyms, Term Arrangement (Alphabetic or …), Term Label Order (Direct vs. Inverted)

The rules must also address what to do when rules conflict: which are more important?

Rule name: Editorial rule

• Use Existing Vocabularies: Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
• Ampersands: The character ‘&’ is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
• Special Characters: Retain accented characters in Term Labels. Example: España
• Serial comma: If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’, which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
• Capitalization: Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”
• …
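Several of these rules are mechanical enough to check automatically before a term label is accepted. The sketch below covers only the Ampersands, Serial comma, and Capitalization rules as stated above. The function name is illustrative, and the capitalization check is deliberately simple: it would also flag all-caps acronyms, which the separate "Synonyms and Acronyms" rule would need to handle.

```python
import re

ARTICLES = {"a", "an", "the"}

def check_term_label(label: str) -> list[str]:
    """Return editorial-rule violations for one term label (empty list = OK)."""
    problems = []
    # Ampersands: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("use '&' instead of 'and'")
    # Serial comma: the '&' before the last item is not preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("no comma before the final '&'")
    # Capitalization: title case; articles stay lowercase unless they come first.
    # Note: this simple check also flags all-caps acronyms such as "ERP".
    words = re.findall(r"[^\W\d_]+", label)  # runs of letters, accented ones included
    for i, word in enumerate(words):
        if i > 0 and word.lower() in ARTICLES:
            if word != word.lower():
                problems.append(f"article '{word}' should be lowercase")
        elif not (word[0].isupper() and word[1:] == word[1:].lower()):
            problems.append(f"'{word}' should be title case")
    return problems

for label in ["Education, Learning & Employment", "Education, Learning, & Employment",
              "Manuals and Forms", "EDUCATION, LEARNING & EMPLOYMENT", "España"]:
    print(f"{label!r}: {check_term_label(label) or 'OK'}")
```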

Page 68

Roles in Two Taxonomy Governance Teams

• Executive Sponsor: Advocate for the taxonomy team.
• Business Lead: Keeps team on track with larger business objectives. Balances cost/benefit issues to decide appropriate levels of effort (specialists help in estimating costs). Obtains needed resources if those in the team can’t accomplish a particular task.
• Technical Specialist: Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc. Helps obtain data from various systems.
• Content Specialist: Team’s liaison to content creators. Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc. Small-scale metadata QA responsibility.
• Taxonomy Specialist: Suggests potential taxonomy changes based on analysis of query logs and indexer feedback. Makes edits to the taxonomy and installs them into the system with the aid of the IT specialist.

Team structure at a different org.:
• Content Owner: Reality check on process change suggestions.
• Business Lead
• Custodians: Responsible for content in a specific CV.
• Training Representative: Develops communications plan, training materials.
• Work Practices Representative: Develops processes, monitors adherence.
• IT Representative: Backups, admin of CV tool.
• Info. Mgmt. Representative: Provides CV expertise, tie-in with larger IM effort in the organization.

Page 69

Taxonomy governance | Where changes come from

Diagram components: End User (experience), Application UI, Application Logic, Content, Taxonomy, Tagging Logic, Tagging UI, Tagging Staff, Firewall, Taxonomy Editor, Taxonomy Team.

Sources of change shown in the diagram:
• Tagging Staff notes on ‘missing’ concepts
• Query log analysis
• Requests from other parts of the organization (e.g., other parts of NASA)

Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content

Team considerations
1. Business goals
2. Changes in user experience
3. Retagging cost
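Query log analysis is the most mechanical of these sources. A common first pass is to rank frequent queries that match neither a taxonomy label nor a known synonym. The sketch below assumes a hypothetical label set, synonym table, and a log that is simply a list of query strings. Its output is a list of candidates for the Taxonomy Editor to review, not changes in themselves.

```python
from collections import Counter

# Hypothetical inputs: taxonomy labels plus synonyms, and a raw query log.
TAXONOMY_LABELS = {"travel policy", "benefits", "expense reports"}
SYNONYMS = {"expenses": "expense reports", "t&e": "expense reports"}

def missing_concept_candidates(queries: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Rank frequent queries that match neither a label nor a synonym.

    These are candidates for the Taxonomy Editor to review as possible
    'missing' concepts or missing synonyms.
    """
    known = TAXONOMY_LABELS | set(SYNONYMS)
    normalized = (q.strip().lower() for q in queries)
    unmatched = Counter(q for q in normalized if q not in known)
    return [(q, n) for q, n in unmatched.most_common() if n >= min_count]

log = ["Expenses", "telework", "benefits", "Telework ", "parking", "telework"]
print(missing_concept_candidates(log))  # [('telework', 3)]
```

Whether a candidate becomes a new synonym (a small change) or a new concept requiring retagging (a large change) is then weighed against the team considerations listed above.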

Page 70

Principles

Basic facets with identified items – people, places, projects, instruments, missions, organizations, … Note that these are not subjective “subjects”; they are objective “objects”.

Clearly identify the Custodians of the facets, and the process for maintaining and publishing them.

Subjective views can be laid on top of the objective facts, but should be in a different namespace so they are clearly distinguishable.

For example, labels like “Anarchist” or “President” can be applied to the same person at different times (e.g., Nelson Mandela).
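One way to keep subjective views distinguishable is to put every statement's predicate in an explicit namespace and give subjective labels a validity period. The sketch below is a hypothetical model: the `example.org` namespace URIs, the person identifier, and the labels are all illustrative. It shows how facts and views can then be filtered separately.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical namespace URIs, for illustration only.
FACTS = "http://example.org/facts/"   # objective "objects": people, places, projects
VIEWS = "http://example.org/views/"   # subjective labels layered on top

@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str                     # full URI, so its namespace is explicit
    value: str
    valid_from: Optional[int] = None   # year the label started to apply
    valid_to: Optional[int] = None     # year it stopped applying (None = still applies)

statements = [
    Statement("person/0001", FACTS + "name", "A. Example"),
    Statement("person/0001", FACTS + "memberOf", "org/0042"),
    Statement("person/0001", VIEWS + "label", "Activist", 1960, 1990),
    Statement("person/0001", VIEWS + "label", "Head of State", 1994, 1999),
]

def in_namespace(stmts, ns):
    """Statements whose predicate belongs to the given namespace."""
    return [s for s in stmts if s.predicate.startswith(ns)]

def labels_in(stmts, year):
    """Subjective labels that applied in a given year."""
    return [s.value for s in in_namespace(stmts, VIEWS)
            if (s.valid_from is None or s.valid_from <= year)
            and (s.valid_to is None or year <= s.valid_to)]

print(len(in_namespace(statements, FACTS)))  # 2
print(labels_in(statements, 1995))           # ['Head of State']
```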

Page 71

Enterprise Portal challenges when organizing content

• Multiple subject domains across the enterprise
  – Vocabularies vary
  – Granularity varies
  – Unstructured information represents about 80% of content
• Information is stored in complex ways
  – Multiple physical locations
  – Many different formats
• Tagging is time-consuming and requires SME involvement
• Portal doesn’t solve content access problem
• Knowledge is power syndrome
  – Incentives to share knowledge don’t exist
  – Free flow of information TO the portal might be inhibited
• Content silo mentality changes slowly
  – What content has changed? What exists? What has been discontinued?
  – Lack of awareness of other initiatives

Page 72

Challenges when organizing content on enterprise portals

• Lack of content standardization and consistency
  – Content messages vary among departments
  – How do users know which message is correct?
• Re-usability low to non-existent
  – Costs of content creation, management and delivery may not change when portal is implemented: similar subjects, BUT diverse media, diverse tools, different users
• How will personalization be implemented?
• How will existing site taxonomies be leveraged?
• Taxonomy creation may surface “holes” in content

Page 73

Agenda

3:30 Introductions: Us and you
3:45 Background: Metadata & controlled vocabularies
4:00 Dublin Core: Elements, issues, and recommendations
4:30 Dublin Core in the wild: CEN study and remarks
4:45 Enterprise-wide metadata ROI questions
5:00 Break
5:15 ROI (Cont.)
5:30 Business processes
6:15 Tools & technologies
6:30 Q&A
6:45 Adjourn

Page 74

Methods used to create & maintain metadata

Bar chart:
• Forms: 71%
• Distributed production: 57%
• Centralized production: 43%
• Not automated: 43%

Base: 20 corporate information managers. Source: CEN/ISSS Workshop on Dublin Core – Guidance information for the deployment of Dublin Core metadata in Corporate Environments

Page 75

The Tagging Problem

How are we going to populate metadata elements with complete and consistent values?

What can we expect to get from automatic classifiers?

Page 76

Tagging

• Province of authors (SMEs) or editors?
• Taxonomy often highly granular to meet task and re-use needs.
• Vocabulary dependent on originating department.
• The more tags there are (and the more values for each tag), the more hooks to the content.
• If there are too many, authors will resist and use “general” tags (if available).
• Automatic classification tools exist, and are valuable, but their results are not as good as what humans can do.
• “Semi-automated” is best.
• Degree of human involvement is a cost/benefit tradeoff.
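"Semi-automated" tagging usually means routing classifier output by confidence: high-confidence tags are applied automatically, a middle band goes to a human reviewer, and the rest are dropped. The sketch below is a generic illustration of that pattern. It assumes the classifier returns a score between 0 and 1, and the threshold values are hypothetical. Moving them is exactly the cost/benefit tradeoff in human involvement.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    doc_id: str
    term: str
    score: float   # classifier confidence in [0, 1] (assumed)

def route(suggestions, accept_at=0.9, reject_below=0.5):
    """Split classifier output into auto-accepted tags and a human review queue.

    Suggestions scoring below reject_below are dropped. Thresholds are
    hypothetical; lowering accept_at saves reviewer time at the cost of quality.
    """
    accepted, review = [], []
    for s in suggestions:
        if s.score >= accept_at:
            accepted.append(s)
        elif s.score >= reject_below:
            review.append(s)
    return accepted, review

accepted, review = route([
    Suggestion("doc-1", "Expense Reports", 0.97),
    Suggestion("doc-1", "Travel", 0.62),
    Suggestion("doc-2", "Benefits", 0.31),
])
print([s.term for s in accepted], [s.term for s in review])  # ['Expense Reports'] ['Travel']
```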

Page 77

Automatic categorization vendors | Analyst viewpoint

Chart axes: Accuracy Level (low to high); Content Volumes (low to high)

Page 78

Considerations in automatic classifier performance

• Classification performance is measured by “inter-cataloger agreement”
  – Trained librarians agree less than 80% of the time
  – Errors are subtle differences in judgment, or big goofs
• Automatic classification struggles to match human performance
  – Exception: Entity recognition can exceed human performance
• Classifier performance is limited by the algorithms available, which are limited by development effort
  – Very wide variance in one vendor’s performance depending on who does the implementation, and how much time they have to do it

1) 80/20 tradeoff, where 20% of the effort gives 80% of the performance.

2) Smart implementation of inexpensive tools will outperform naive implementations of world-class tools.

Chart: Accuracy vs. Development Effort / Licensing Expense, with points labeled Regexps and Trained Librarians, and the potential performance gain marked.
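Inter-cataloger agreement is straightforward to compute once two catalogers (or a cataloger and a classifier) have tagged the same items. The sketch below shows raw percent agreement, plus Cohen's kappa as a commonly used chance-corrected variant. Kappa is our addition, not a measure named on the slide, and the sample tags are made up.

```python
from collections import Counter

def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of items to which two catalogers assigned the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty tag lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both used a single identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)

librarian_1 = ["Policy", "Policy", "Form", "Manual", "Form"]
librarian_2 = ["Policy", "Form", "Form", "Manual", "Form"]
print(percent_agreement(librarian_1, librarian_2))  # 0.8
print(round(cohens_kappa(librarian_1, librarian_2), 4))
```

Running the same comparison between a classifier's output and a librarian's tags gives a like-for-like way to judge whether an implementation is approaching human agreement levels.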

Page 79

Tagging tool example: Interwoven MetaTagger

Manual form fill-in w/ check boxes, pull-down lists, etc.

Auto keyword & summarization

Page 80

Tagging tool example: Interwoven MetaTagger

Auto-categorization

Parse & lookup (recognize names)

Rules & pattern matching
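"Parse & lookup" and "rules & pattern matching" are general techniques. The sketch below illustrates both with Python's standard `re` module; it does not reproduce MetaTagger's own configuration or API. The name list and the regex rules are hypothetical. Also note that plain substring lookup can match inside longer strings, so a production version would tokenize first.

```python
import re

# Hypothetical name list (e.g. exported from an LDAP directory or ERP system).
KNOWN_NAMES = {"Jane Doe": "person", "Acme Corp": "organization"}

# Hypothetical rules: if the regex matches, assign the category.
RULES = [
    (re.compile(r"\b(invoice|purchase order|PO\s*#?\d+)\b", re.IGNORECASE), "Finance"),
    (re.compile(r"\b(SOX|Sarbanes[- ]Oxley)\b", re.IGNORECASE), "Compliance"),
]

def tag(text: str) -> dict[str, list[str]]:
    """Return recognized entities (lookup) and rule-based categories for one document."""
    entities = [f"{name} ({kind})" for name, kind in KNOWN_NAMES.items() if name in text]
    categories = [category for pattern, category in RULES if pattern.search(text)]
    return {"entities": entities, "categories": categories}

print(tag("Jane Doe reviewed the SOX controls for PO #1234."))
```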

Page 81

Metadata tagging workflows

Even ‘purely’ automatic meta-tagging systems need a manual error correction procedure, and should add a QA sampling mechanism.

Tagging models:
• Author-generated
• Central librarians
• Hybrid – central auto-tagging service, distributed manual review and correction

Flowchart: sample of ‘author-generated’ metadata workflow.
Steps: Compose in Template; Submit to CMS; Automatically fill-in metadata; Approve/Edit metadata; Review content (Analyst Editor); Copy Edit content (Copywriter).
Decision points: two “Problem?” (Y/N) checks.
Outputs: Hard Copy, Web site.
Also shown: Tagging Tool Sys Admin.
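A QA sampling mechanism can be as simple as drawing a random sample of tagged records for manual review and estimating the error rate from it. The sketch below uses Python's standard `random.Random.sample`. The record IDs are placeholders, and the review outcomes are simulated; in practice a reviewer supplies them.

```python
import random

def qa_sample(record_ids, sample_size, seed=None):
    """Draw a simple random sample of tagged records for manual review."""
    rng = random.Random(seed)
    ids = list(record_ids)
    return rng.sample(ids, min(sample_size, len(ids)))

def estimated_error_rate(reviewed: dict[str, bool]) -> float:
    """Fraction of sampled records a reviewer marked as incorrectly tagged."""
    return sum(not ok for ok in reviewed.values()) / len(reviewed)

ids = [f"doc-{i}" for i in range(1000)]
sample = qa_sample(ids, 50, seed=7)
# Simulated review outcome: pretend every 10th sampled record was tagged wrongly.
review = {doc: (i % 10 != 0) for i, doc in enumerate(sample)}
print(len(sample), estimated_error_rate(review))  # 50 0.1
```

Tracking this estimate over time shows whether the correction procedure is keeping automatic tagging quality at an acceptable level.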

Page 82

Automatic categorization vendors | Pragmatic viewpoint

Chart axes: Accuracy Level (low to high); Content Volumes (low to high)

Page 83

Seven practical rules for taxonomies

1. Incremental, extensible process that identifies and enables users, and engages stakeholders.

2. Quick implementation that provides measurable results as quickly as possible.

3. Not monolithic: has separately maintainable facets.

4. Re-uses existing IP as much as possible.

5. A means to an end, and not the end in itself.

6. Not perfect, but it does the job it is supposed to do, such as improving search and navigation.

7. Improved over time, and maintained.

Page 84

Agenda

3:30 Introductions: Us and you
3:45 Background: Metadata & controlled vocabularies
4:00 Dublin Core: Elements, issues, and recommendations
4:30 Dublin Core in the wild: CEN study and remarks
4:45 Enterprise-wide metadata ROI questions
5:00 Break
5:15 ROI (Cont.)
5:30 Business processes
6:15 Tools & technologies
6:30 Summary, Q&A
6:45 Adjourn

Page 85

Summary: Categorize with a purpose

What is the problem you are trying to solve?
• Improve search
• Browse for content on an enterprise-wide portal
• Enable business users to syndicate content
• Otherwise provide the basis for content re-use

How will you control the cost of creating and maintaining the metadata needed to solve these problems?
• CMS with metadata tagging products
• Semi-automated classification
• Taxonomy editing tools
• Guided navigation tools

Page 86

Contact Info

Ron Daniel

925-368-8371

[email protected]

Joseph Busch

415-377-7912

[email protected]