Quality Taxonomies Dr. Claude Vogel Founder & CTO KM World 2000

Upload: wells

Post on 10-Feb-2016


TRANSCRIPT

Page 1: Quality Taxonomies

Quality Taxonomies

Dr. Claude Vogel, Founder & CTO

KM World 2000

Page 2: Quality Taxonomies

Ontology / Taxonomy

Root Ontology

Taxonomy Generation

Static Discovery

Dynamic Discovery

Page 3: Quality Taxonomies

What is Quality?

“Best value for the money”

According to this definition, you are entitled to high performance from a costly product; likewise, a low-cost product or service is expected to deliver less. For example, a loose demo delivery is both predictable and acceptable, since its quality is: low conformance / low cost.

Page 4: Quality Taxonomies

What is Quality?

“Good Quality is Nominal Conformance”

Taxonomy Quality is defined as Taxonomy Conformance to:
– Valid requirements;
– Explicitly documented development standards; and,
– Implicit characteristics that are expected of all professionally developed taxonomies, such as the desire for good maintainability.

Page 5: Quality Taxonomies

Standards

ISO 2788-1986
– International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute)

ISO 5964-1985
– International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute)

ANSI/NISO Z39.19-1993
– National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993)

SEMIO Quality Plan v1 2000

ISO/IEC 13250 Topic Maps

RDF
– Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML

Page 6: Quality Taxonomies

Project Plan

1. Kick-off
2. Requirements Review
3. Lexicon Review
4. Taxonomy Review
5. Tags Review
6. Final Review

Page 7: Quality Taxonomies

1. Kick-off

Objectives
– Purpose
– Scope
– Scale
– Users
– Conditions of receipt

Roles
– Supplier
– Customer
  • Admin
  • KE
  • Experts
  • Users

Planning

Training and Transfer

Page 8: Quality Taxonomies

2. Requirements Review

Sources
Lexicon
Ontology
Install

Page 9: Quality Taxonomies

Sources

Dispersion (Multiplicity, Size, Homogeneity)
Refresh
Access

Features                 | Internet, News, E-Mail | Reports, Patents | E-Trade, Logs
Informative content      | -                      | +                | +
Number of topics covered | +                      | +                | -
Structured information   | -                      | +                | +
Size of records          | -                      | +                | -
Number of records        | +                      | -                | +

Page 10: Quality Taxonomies

Typical Patterns

Disparity
– Adjust sources
– Adjust crawl strategy
– Isolate communities / taxonomies

Page 11: Quality Taxonomies

Lexicon

Vocabularies, etc.
Substitutions: Acronyms, Synonyms, etc.
Preferred Keywords: Brand Names, etc.
Banned Keywords

Page 12: Quality Taxonomies

Typical Patterns

Lack of requirements
– Use Librarian Resources

Page 13: Quality Taxonomies

Ontology

Thesaurus?
Is the information domain analysis complete, consistent, and accurate?
Is the partitioning of the problem complete?

Page 14: Quality Taxonomies

Typical Patterns

Directory versus Taxonomy
– Isolate “directory” branches

Thesaurus versus Taxonomy
– Put an ontology on top of thesaurus
– Check ASAP match of thesaurus generics with extracted lexicon

Very high level design for top categories requirements
– Plan to work bottom-up
– See also Taxonomy (functions, combinations, etc.)

Page 15: Quality Taxonomies

Install

Implementation / Integration:
– Are external and internal interfaces properly defined?
– Are all requirements traceable to the system level?
– Has prototyping been conducted for the user/customer?
– Is performance achievable within the constraints imposed by other system elements?
– Are requirements consistent with schedule, resources, and budget?

Page 16: Quality Taxonomies

Typical Patterns

Scale
Security
Missing Documents

Page 17: Quality Taxonomies

3. Lexicon Review

Coverage
– Extracted words / Words
– (Extracted Index / Index)

Sources benchmarking
– Coverage
– Extraction quality
– Topic distribution

Structure
– Most Frequent Phrases
– Most Productive Generics

Substitutions

Exceptions
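The coverage ratio "Extracted words / Words" from the Lexicon Review can be sketched as a simple set computation. This is a minimal illustration only: the function name `lexicon_coverage`, the sample data, and the choice to count distinct lowercase words (rather than raw token occurrences) are assumptions, not something the slides specify.

```python
def lexicon_coverage(extracted_terms, corpus_words):
    """Share of distinct corpus words that also appear in the extracted lexicon.

    Assumption: coverage is measured over distinct, lowercased words.
    """
    corpus = {w.lower() for w in corpus_words}
    if not corpus:
        return 0.0
    lexicon = {t.lower() for t in extracted_terms}
    return len(corpus & lexicon) / len(corpus)


# Hypothetical sample data for illustration only.
corpus_words = "the term loan is a liability and the tax is due".split()
extracted_terms = ["loan", "liability", "tax"]

coverage = lexicon_coverage(extracted_terms, corpus_words)
print(f"coverage = {coverage:.2f}")
```

The same function could be applied to index entries to approximate the "(Extracted Index / Index)" variant.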

Page 18: Quality Taxonomies

Typical Patterns

Low level of frequency / quality for the most meaningful content
– Increase size of value corpus
– Filter and re-import lexicon

Page 19: Quality Taxonomies

4. Taxonomy Review

Taxonomy Operation
– Correctness
– Reliability
– Usability
– Integrity
– Efficiency

Taxonomy Revision
– Maintainability
– Flexibility
– Testability

Taxonomy Transition
– Portability
– Reusability
– Interoperability

Page 20: Quality Taxonomies

Folk Taxonomies Design

The Berlin and Kay model: Taxonomy = Nomenclature + Terminology

Rank            | Example
Unique Beginner | Tax
Life Form       | Liability
Generic         | Loan
Specific        | Term loan
Varietal        | Short-term loan

Page 21: Quality Taxonomies

Correctness
– Accuracy
– Completeness
– Consistency

Page 22: Quality Taxonomies

Accuracy

Precision
Recall
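As a rough illustration of how precision and recall could be measured for one taxonomy category, the sketch below compares the documents assigned to a category against a hand-judged reference set. The function name `precision_recall` and the document identifiers are hypothetical; the slides name the two measures but do not prescribe a procedure.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for one category.

    assigned: documents the taxonomy placed in the category.
    relevant: documents a reviewer judged to belong there (assumed reference set).
    """
    assigned, relevant = set(assigned), set(relevant)
    hits = len(assigned & relevant)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document identifiers for illustration only.
assigned = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5"}

p, r = precision_recall(assigned, relevant)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```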

Page 23: Quality Taxonomies

Completeness

Taxonomy Maps Lexicon Collection

Page 24: Quality Taxonomies

Concentration Works Against Quality

[Diagram: Lexicon, Document Collection, Maps, Taxonomy, Tagging]

Tagging Coverage
Ontology Coverage
Hook Coverage
Map Coverage
Lexical Coverage
Collection Coverage

Page 25: Quality Taxonomies

Consistency: Typical Patterns

Objectivization
Hyperonymy
Speciation
Necessity

Page 26: Quality Taxonomies

Objectivization

Employment
– Firing
– Hiring
– Salaries

Avoid functional categories
Don’t mix functions / objects
Exhaust scripts
Match idiomatic phrases

Page 27: Quality Taxonomies

Genericity

Parts
– Air Conditioning
– Belts and Hoses
– Body
– Brake System
– Chassis
– Engine
– Exhaust System
– Fuel System
– Glass
– Ignition

Avoid meronymy
Don’t mix meronymy / hyperonymy
Exhaust prototypes

Page 28: Quality Taxonomies

Speciation

Person
Unwelcome person
Unpleasant person
Selfish person
Opportunist
Backscratcher
(WordNet)

Avoid “strings” of categories
Avoid (non-idioms) properties for categories

Page 29: Quality Taxonomies

Necessity

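[Diagram: a tree with nodes labeled B, C, D, E, F, G, H, I, K]

[Diagram: Tax, split into Individuals and Corporations, with Assets and Liability repeated under each]

[Diagram: Tax, with Individuals / Corporations and Assets / Liability shown as separate branches]

Avoid non-productive categories

Avoid combinations of categories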

Page 30: Quality Taxonomies

Nomenclature (Design Structure) Quality Index

[Diagram: tree descending from UB at Level 0 through life-form (lf), generic (g), specific (s), and varietal (v) nodes, Level 0 to Level 4]

UB = unique beginner, lf = life-form, g = generic, s = specific, v = varietal

Width
Depth
Balance
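The three structural measures named on this slide (width, depth, balance) can be sketched for a small tree as follows. The slide gives no formulas, so the definitions used here are assumptions: width is the size of the largest level, depth is the number of levels below the root, and balance is the ratio of the shallowest to the deepest leaf. The sample tree reuses category names from the deck; the helper names are hypothetical.

```python
# Taxonomy as a parent -> children mapping (sample built from names in the deck).
taxonomy = {
    "Tax": ["Liability", "Assets"],
    "Liability": ["Loan"],
    "Loan": ["Term loan"],
    "Term loan": ["Short-term loan"],
    "Assets": [],
    "Short-term loan": [],
}


def level_sizes(tree, root):
    """Number of categories at each level, starting with the root at level 0."""
    sizes, frontier = [], [root]
    while frontier:
        sizes.append(len(frontier))
        frontier = [child for node in frontier for child in tree.get(node, [])]
    return sizes


def leaf_depths(tree, node, depth=0):
    """Depth of every leaf category below (and including) node."""
    children = tree.get(node, [])
    if not children:
        return [depth]
    return [d for child in children for d in leaf_depths(tree, child, depth + 1)]


sizes = level_sizes(taxonomy, "Tax")
leaves = leaf_depths(taxonomy, "Tax")

width = max(sizes)                     # assumed: widest level
depth = len(sizes) - 1                 # assumed: levels below the root
balance = min(leaves) / max(leaves) if max(leaves) else 1.0  # assumed ratio

print(width, depth, balance)
```

Under these assumed definitions, a balance close to 1 means all branches reach a similar depth, while a low value flags one branch that is much deeper than the others.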

Page 31: Quality Taxonomies

Complexity Index

Cyclomatic complexity increases with the number of Cross References within the Taxonomy, giving an indication of complexity and difficulty of testing.

Taxonomy Complexity Index combines:
– autonomy
– closure
– similarity
– typicality
– commonality
– redundancy
– stability
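To make the link between cross references and cyclomatic complexity concrete, the sketch below applies McCabe's graph formula M = E - N + 2P, treating categories as nodes and both hierarchy links and cross references as edges. Applying the formula to a taxonomy in this way is an assumption made for illustration; the function name and sample links are hypothetical. For a pure tree the result is 1, and each cross reference adds 1.

```python
def cyclomatic_complexity(nodes, hierarchy_edges, cross_references, components=1):
    """McCabe's M = E - N + 2P over the taxonomy graph.

    Assumption: cross references count as ordinary edges, and the taxonomy
    forms `components` connected pieces (1 for a single rooted taxonomy).
    """
    edges = len(hierarchy_edges) + len(cross_references)
    return edges - len(nodes) + 2 * components


# Hypothetical sample taxonomy for illustration only.
nodes = ["Tax", "Liability", "Assets", "Loan", "Term loan"]
hierarchy_edges = [
    ("Tax", "Liability"),
    ("Tax", "Assets"),
    ("Liability", "Loan"),
    ("Loan", "Term loan"),
]
cross_references = [("Assets", "Loan"), ("Term loan", "Assets")]

tree_only = cyclomatic_complexity(nodes, hierarchy_edges, [])
with_xrefs = cyclomatic_complexity(nodes, hierarchy_edges, cross_references)
print(tree_only, with_xrefs)
```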

Page 32: Quality Taxonomies

Maturity index

The IEEE standard 982.1-1988 suggests a taxonomy maturity index to provide an indication of the stability of the taxonomy.

Maturity Index combines:
– number of modules in current ontology / taxonomy.
– number of modules in current ontology / taxonomy that have been changed.
– number of modules added to current ontology / taxonomy.
– number of modules deleted from the previous version of the ontology / taxonomy.
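The four counts listed above combine as in the software maturity index of IEEE 982.1-1988: (current - (added + changed + deleted)) / current. The sketch below applies that formula to taxonomy modules; the function name and the sample counts are hypothetical, and what counts as a "module" of a taxonomy (a branch, a category, a file) is left as an assumption.

```python
def maturity_index(current, changed, added, deleted):
    """Maturity index: (current - (added + changed + deleted)) / current.

    current: modules in the current ontology / taxonomy
    changed: current modules that have been changed
    added:   modules added to the current version
    deleted: modules deleted from the previous version
    """
    if current <= 0:
        raise ValueError("current must be a positive module count")
    return (current - (added + changed + deleted)) / current


# Hypothetical counts for illustration only.
smi = maturity_index(current=200, changed=10, added=6, deleted=4)
print(f"maturity index = {smi:.2f}")
```

A value approaching 1 indicates that little changed between versions, which is the stability signal the slide refers to.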

Page 33: Quality Taxonomies

5. Tags Review

Document coverage
Concepts coverage

<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag>
      <tagname>Liability</tagname>
      <weight>1.289</weight>
    </tag>
    <tag>
      <tagname>Federal Funds</tagname>
      <weight>0.746</weight>
    </tag>
  </document>
</tagset>
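A tag set in this shape can be read with a standard XML parser to check concept coverage. The sketch below is illustrative only: it uses Python's `xml.etree.ElementTree`, and the helper names, the list of taxonomy concepts, and the definition of concept coverage (share of taxonomy concepts used as a tag at least once) are assumptions rather than part of the original quality plan.

```python
import xml.etree.ElementTree as ET

# Tag set copied from the slide.
TAGSET = """<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>"""


def read_tags(xml_text):
    """Map each document URL to a {tag name: weight} dictionary."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc in root.iter("document"):
        url = doc.findtext("docurl")
        result[url] = {
            tag.findtext("tagname"): float(tag.findtext("weight"))
            for tag in doc.findall("tag")
        }
    return result


def concept_coverage(tags_by_doc, concepts):
    """Assumed definition: share of concepts used as a tag at least once."""
    if not concepts:
        return 0.0
    used = {name for tags in tags_by_doc.values() for name in tags}
    return len(used & set(concepts)) / len(concepts)


tags = read_tags(TAGSET)
# Hypothetical concept list for illustration only.
concepts = ["Liability", "Federal Funds", "Loan", "Assets"]
coverage = concept_coverage(tags, concepts)
print(tags, coverage)
```

Document coverage could be computed the same way by dividing the number of documents that carry at least one tag by the size of the collection.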

Page 34: Quality Taxonomies

6. Final Review

Receipt
Maintenance

Page 35: Quality Taxonomies

Quality Taxonomies

Claude [email protected]

KM World 2000