
Open Forum 2003 on Metadata Registries, Open Forum 2005 on Metadata Registries

SC 32 Tutorial Session

WG 2 eXtended Metadata Registry (XMDR)

April 19, 2005

Bruce Bargmeyer,

Lawrence Berkeley National Laboratory

University of California


11179 Metadata Registries Extensions

Register (and manage) any semantics that are useful for managing data.

E.g., this may include registering not only permissible values (concepts) and definitions, but may extend to registration of the full concept systems in which the permissible values are found.

E.g., may want to register keywords, thesauri, taxonomies, ontologies, axiomatized ontologies ….

Support traditional data management and data administration

Lay foundation for semantics-based computing: Semantics Service Oriented Architecture, Semantic Grids, Semantics-based workflows, Semantic Web ….


XMDR Project Background

Collaborative, interagency effort: DOD, EPA, USGS, NCI, Mayo Clinic, LBNL … & others

Draws on and contributes to interagency cooperation on ecoinformatics

Involves international, national, state, local government agencies, other organizations

Recognizes great potential of semantics-based computing, management of metadata

Improving collection, maintenance, dissemination, processing of very diverse data structures

Collaboration arises from needs for traditional data administration, for sharing data across multiple organizations, for managing complex semantics associated with data, and for semantics-based computing.

Many Players, Many Interests…Shared Context


Purposes of XMDR Project

1. Propose revisions to 11179 Parts 2 & 3 (3rd Ed.) – to serve as the design for the next generation of metadata registries.

2. Demonstrate Reference Implementation – to validate the proposed revisions

Extend semantics management capabilities
Enable registration of correspondences between multiple concept systems and between concept systems and data
Explore uses of terminologies and ontologies
Systematize representation of concepts and relationships
Enable registration of metadata for knowledge bases
Adapt & test emerging semantic technologies


Movement Toward Semantics Management

Going beyond traditional Data Standards and Data Administration

In addition to anchoring data with definitions, we want to process data and concepts based on context, relationships, and axioms

In addition to natural language, we want to capture semantics with more formal description techniques: FOL, DL, Common Logic, OWL

Going beyond information system interoperability and data interchange to processing based on inferences and probabilistic correspondence between concepts found in natural language (in the wild) and both data in databases and concepts found in concept systems.


11179 Semantics Management

[Diagram. Recoverable labels:
  Stages: Real World; Modeling Methods & Terminology; Model Artifacts, Data Exchange, Concept Systems; Applications
  Methodologies: EDR, NIAM, O-O, RDF, UML, Ontology
  Metamodel constructs: CDIF Core, MOF, XML-Schema, RDFS, ODM, OWL, CL
  Other labels: CDIF, (UML-MDL, XMI), OWL, CL & SCL (KIF); SQL, Object, RDF, Semantic Web
  MDR Semantics Management: Data elements, Domains, Concepts, Terms …]


SC 32 Standards & Projects

[Diagram. Recoverable labels:
  Stages: Real World; Modeling Methods & Terminology; Modeling Tools; Model Artifacts, Data Exchange, Concept Systems; Applications
  Methodologies: EDR, NIAM, O-O, RDF, UML, Ontology
  Metamodel constructs: CDIF Core, MOF, XML-Schema, RDFS, ODM, OWL, CL
  Other labels: CDIF, (UML-MDL, XMI), OWL, CL & SCL (KIF); SQL, Object, RDF, Semantic Web
  MDR Semantics Management: Data elements, Domains, Concepts, Terms …
  Standards and projects shown: WG 3 - SQL; WG 2 - MMF (19763), CL (24707) & MOF PAS submission; WG 2 - MMF (19763), CL (24707) & XMI PAS submission; WG 2 - 11179; WG 2 - 20944; WG 2 - 20943]


XMDR Focus

[Diagram. Same layout as the SC 32 Standards & Projects diagram, with three areas marked "XMDR project". Recoverable labels:
  Stages: Real World; Modeling Methods & Terminology; Model Artifacts, Data Exchange, Concept Systems; Applications
  Methodologies: EDR, NIAM, O-O, RDF, UML, Ontology
  Metamodel constructs: CDIF Core, MOF, XML-Schema, RDFS, ODM, OWL, CL
  Other labels: CDIF, (UML-MDL, XMI), OWL, CL & SCL (KIF); SQL, Object, RDF, Semantic Web
  MDR Semantics Management: Data elements, Domains, Concepts, Terms …
  Standards and projects shown: WG 3 - SQL; WG 2 - MMF (19763), CL (24707) & MOF PAS submission; WG 2 - MMF (19763), CL (24707) & XMI PAS submission; WG 2 - 11179 Parts 2 & 3; WG 2 - 20944; WG 2 - 20943]


Drawing Together

[Diagram. Recoverable labels:
  Metadata Registry; Terminology; Thesaurus; Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users; Metadata Registries
  Meaning triangle: CONCEPT; Referent; "Refers To"; "Symbolizes"; "Stands For"; example symbols “Rose”, “ClipArt”]


Data Element Concept: Country Identifiers

Data Elements:

ISO 3166 English Name    ISO 3166 3-Alpha Code    ISO 3166 3-Numeric Code
Afghanistan              AFG                      004
Belgium                  BEL                      056
China                    CHN                      156
Denmark                  DNK                      208
Egypt                    EGY                      818
France                   FRA                      250
Germany                  DEU                      276
…………                     …………                      …………

Data element records (one per data element; Unique IDs shown: 4572, 3820, 1047):
  Name, Context, Definition, Unique ID, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others

Data element concept record:
  Name: Country Identifiers, Context, Definition, Unique ID: 5769, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
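The relationship between one data element concept and several data elements can be sketched in Python. This is only an illustration, not the 11179 metamodel: the class name `Country`, its field names, and the lookup helper are hypothetical, and the rows are the seven countries listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    """One concept (a country) with three registered representations."""
    english_name: str
    alpha3: str
    numeric3: str  # kept as a string so leading zeros survive


# Rows taken from the ISO 3166 example on the slide.
COUNTRIES = [
    Country("Afghanistan", "AFG", "004"),
    Country("Belgium", "BEL", "056"),
    Country("China", "CHN", "156"),
    Country("Denmark", "DNK", "208"),
    Country("Egypt", "EGY", "818"),
    Country("France", "FRA", "250"),
    Country("Germany", "DEU", "276"),
]

# One index per representation, so any code resolves to the shared concept.
BY_ALPHA3 = {c.alpha3: c for c in COUNTRIES}
BY_NUMERIC3 = {c.numeric3: c for c in COUNTRIES}


def alpha3_to_numeric3(code: str) -> str:
    """Translate a 3-alpha code to the 3-numeric code of the same country."""
    return BY_ALPHA3[code].numeric3
```

Because all three representations hang off the same record, translating between them never needs a pairwise mapping table.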


What is an ontology?

The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. The types in the ontology represent the predicates, word senses, or concept and relation types of the language L when used to discuss topics in the domain D.

Building, Sharing, and Merging Ontologies - John F. Sowa


Terminological & Formal (Axiomatized) Ontologies

The difference between a terminological ontology and a formal ontology is one of degree: as more axioms are added to a terminological ontology, it may evolve into a formal or axiomatized ontology.

Cyc has the most detailed axioms and definitions; it is an example of an axiomatized or formal ontology. EDR and WordNet are usually considered terminological ontologies.

Building, Sharing, and Merging Ontologies - John F. Sowa


An Axiom for an Axiomatized Ontology

Definition: The resource_cost_point predicate, cpr, specifies the cost_value, c, (monetary units) of a resource, r, required by an activity, a, up to a certain time point, t.

If a resource of the terminal use or consume states, s, for an activity, a, are enabled at time point, t, there must exist a cost_value, c, at time point, t, for the activity, a, that uses or consumes the resource, r. The time interval, ti = [ts, te], during which a resource is used or consumed by an activity is specified in the use or consume specifications as use_spec(r, a, ts, te, q) or consume_spec(r, a, ts, te, q), where activity, a, uses or consumes quantity, q, of resource, r, during the time interval [ts, te]. Hence,

Axiom: ∀ a, s, r, q, ts, te, (use_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ∨ (consume_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ≡ ∃ c, cpr(a, c, t, r)

Cost Ontology for TOronto Virtual Enterprise (TOVE)


Samples of Eco & Bio Graph Data

Nutrient cycles in microbial ecologies: These are bipartite graphs, with two sets of nodes, microbes and reactants (nutrients), and directed edges indicating input and output relationships. Such nutrient cycle graphs are used to model the flow of nutrients in microbial ecologies, e.g., subsurface microbial ecologies for bioremediation.

Chemical structure graphs: Here atoms are nodes, and chemical bonds are represented by undirected edges. Multi-electron bonds are often represented by multiple edges between nodes (atoms), hence these are multigraphs. Common queries include subgraph isomorphism. Chemical structure graphs are commonly used in chemoinformatics systems, such as Chem Abstracts, MDL Systems, etc.

Sequence data and multiple sequence alignments: DNA/RNA/Protein sequences can be modeled as linear graphs.

Topological adjacency relationships also arise in anatomy. These relationships differ from partonomies in that adjacency relationships are undirected and not generally transitive.


Eco & Bio Graph Data (Continued)

Taxonomies of proteins, chemical compounds, and organisms, ...: These taxonomies (classification systems) are usually represented as directed acyclic graphs (partial orders or lattices). They are used when querying the pathways databases. Common queries are subsumption testing between two terms/concepts, i.e., is one concept a subset or instance of another. Note that some phylogenetic tree computations generate unrooted, i.e., undirected, trees.

Metabolic pathways: chemical reactions used for energy production, synthesis of proteins, carbohydrates, etc. Note that these graphs are usually cyclic.

Signaling pathways: chemical reactions for information transmission and processing. Often these reactions involve small numbers of molecules. Graph structure is similar to metabolic pathways.

Partonomies are used in biological settings most often to represent common topological relationships of gross anatomy in multi-cellular organisms. They are also useful in sub-cellular anatomy, and possibly in describing protein complexes. They consist of part-of relationships (in contrast to the is-a relationships of taxonomies). Part-of relationships are represented by directed edges and are transitive. Partonomies are directed acyclic graphs.

Data Provenance relationships are used to record the source and derivation of data. Here, some nodes are used to represent either individual "facts" or "datasets" and other nodes represent "data sources" (either labs or individuals). Edges between "datasets" and "data sources" indicate "contributed by". Other edges (between datasets (or facts)) indicate derived from (e.g., via inference or computation). Data provenance graphs are usually directed acyclic graphs.


A graph theoretic characterization

Readily comprehensible characterization of metadata structures

Graph structure has implications for:
  Integrity Constraint Enforcement
  Data structures
  Query languages
  Combining metadata sets
  Algorithms for query processing


Definition of a graph

Graph = vertex (node) set + edge set
  Nodes, edges may be labeled
Edge set = binary relation over nodes
  cf. NIAM
Labeled edge set
  RDF triples (subject, predicate, object)
  predicate = edge label
Typically edges are directed


Example of a graph

[Diagram: two directed edges labeled is-a]
influenza is-a infectious disease
measles is-a infectious disease
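A labeled, directed edge set of this kind can be held as a plain set of (subject, predicate, object) triples. The sketch below is a minimal Python illustration using the two is-a edges from the example; the helper names `nodes` and `edges_by_label` are hypothetical and not part of RDF or of any library.

```python
from collections import defaultdict

# Each triple is one directed, labeled edge: (subject, predicate, object).
TRIPLES = {
    ("influenza", "is-a", "infectious disease"),
    ("measles", "is-a", "infectious disease"),
}


def nodes(triples):
    """Vertex set: every subject or object that appears in some triple."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def edges_by_label(triples):
    """Group the edge set by edge label, giving one binary relation per label."""
    relations = defaultdict(set)
    for subject, predicate, obj in triples:
        relations[predicate].add((subject, obj))
    return dict(relations)
```

Grouping by predicate recovers the "edge set = binary relation over nodes" view: each label names one relation.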


Types of Metadata Graph Structures

Trees
  Partially Ordered Trees
  Ordered Trees
Faceted Classifications
Directed Acyclic Graphs
  Partially Ordered Graphs
  Lattices
Bipartite Graphs
Directed Graphs
Cliques
Compound Graphs


Graph Taxonomy

[Diagram: hierarchy of graph types. Node labels: Graph; Directed Graph; Undirected Graph; Directed Acyclic Graph; Bipartite Graph; Clique; Partial Order Graph; Faceted Classification; Partial Order Tree; Tree; Lattice; Ordered Tree]

Note: not all bipartite graphs are undirected.


Trees

In metadata settings, trees are most often directed
  edges indicate direction
In metadata settings, trees are usually partial orders
  Transitivity is implied (see next slide)
  Not true for some trees with mixed edge types
  Not always true for all partonomies


Example: Tree

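[Diagram: tree with six part-of edges]
California
  Alameda County (part-of California)
    Oakland (part-of Alameda County)
    Berkeley (part-of Alameda County)
  Santa Clara County (part-of California)
    Santa Clara (part-of Santa Clara County)
    San Jose (part-of Santa Clara County)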


Trees - cont.

Uniform vs. non-uniform height subtrees
Uniform height subtrees
  fixed number of levels
  common in dimensions of multi-dimensional data models
Non-uniform height subtrees
  common in terminologies


Partially Ordered Trees

A conventional directed tree
Plus, assumption of transitivity
Usually only show immediate ancestors (transitive reduction)
Edges of transitive closure are implied
Classic Example: Simple Taxonomy, “is-a” relationship


Example: Partial Order Tree

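[Diagram: tree with six explicit is-a edges]
Disease
  Infectious Disease (is-a Disease)
    Polio (is-a Infectious Disease)
    Smallpox (is-a Infectious Disease)
  Chronic Disease (is-a Disease)
    Diabetes (is-a Chronic Disease)
    Heart disease (is-a Chronic Disease)

Legend: a separate line style signifies an inferred is-a relationship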
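The inferred is-a edges are what a transitive closure adds to the edges that are drawn. The following Python sketch computes them for the disease example; it uses a simple fixed-point loop chosen for readability rather than speed, and the function name `transitive_closure` is only illustrative.

```python
def transitive_closure(edges):
    """Return every (descendant, ancestor) pair implied by the given edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a -> b and b -> d together imply a -> d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# The six explicit is-a edges (the transitive reduction).
IS_A = {
    ("Infectious Disease", "Disease"),
    ("Chronic Disease", "Disease"),
    ("Polio", "Infectious Disease"),
    ("Smallpox", "Infectious Disease"),
    ("Diabetes", "Chronic Disease"),
    ("Heart disease", "Chronic Disease"),
}

# Edges that are implied but not drawn.
INFERRED = transitive_closure(IS_A) - IS_A
```

Storing only the reduction and deriving the closure on demand keeps the registered graph small, at the cost of computing the inference at query time.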


Ordered Trees

Order here refers to order among sibling nodes (not related to partial order discussed elsewhere)

XML documents are ordered trees
Ordering of “sub-elements” is to support classic linear encoding of documents


Example: Ordered Tree

[Diagram: tree with three part-of edges]
Paper
  Title page (part-of Paper)
  Section (part-of Paper)
  Bibliography (part-of Paper)

Note: implicit ordering relation among parts of paper.


Faceted Classification

Classification scheme has multiple facets
Each facet = partial order tree
Categories = conjunction of facet values (often written as [facet1, facet2, facet3])
Faceted classification = a simplified partial order graph
Introduced by Ranganathan in the 20th century (1933), as the Colon Classification scheme
Faceted classification can be described with Description Logic, e.g., OWL-DL


Example: Faceted Classification

[Diagram: two facets with is-a edges]
Wheeled Vehicle Facet: 2 wheeled, 3 wheeled, 4 wheeled
Vehicle Propulsion Facet: Internal Combustion, Human Powered
Classified items: Bicycle, Tricycle, Motorcycle, Auto
is-a edges link the facet values to their facet and the classified items to facet values.
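A category as a conjunction of facet values can be sketched as a lookup over per-item facet assignments. In the Python sketch below, the facet values come from the example, but the assignment of each vehicle to its values is an assumption based on everyday meaning, since the edge targets are not recoverable from the transcript; `select` is a hypothetical helper.

```python
FACETS = {
    "wheels": {"2 wheeled", "3 wheeled", "4 wheeled"},
    "propulsion": {"Internal Combustion", "Human Powered"},
}

# Assumed assignments: one value per facet for each classified item.
ITEMS = {
    "Bicycle": {"wheels": "2 wheeled", "propulsion": "Human Powered"},
    "Tricycle": {"wheels": "3 wheeled", "propulsion": "Human Powered"},
    "Motorcycle": {"wheels": "2 wheeled", "propulsion": "Internal Combustion"},
    "Auto": {"wheels": "4 wheeled", "propulsion": "Internal Combustion"},
}


def select(**wanted):
    """Return items matching every requested facet value (a conjunction)."""
    for facet, value in wanted.items():
        if value not in FACETS[facet]:
            raise ValueError(f"{value!r} is not a value of facet {facet!r}")
    return sorted(
        name
        for name, values in ITEMS.items()
        if all(values[facet] == value for facet, value in wanted.items())
    )
```

For example, `select(wheels="2 wheeled", propulsion="Human Powered")` corresponds to the category [2 wheeled, Human Powered].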


Faceted Classifications and Multi-dimensional Data Model

MDM - a.k.a. OLAP data model
  Online Analytical Processing data model
  Star / Snowflake schemas
Fact Tables
  fact = function over Cartesian product of dimensions
  dimensions = facets
    geographic region, product category, year, ...
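The statement that a fact is a function over the Cartesian product of dimensions can be illustrated with a small dictionary-based fact table. Everything concrete in this Python sketch is made up for illustration: the dimension members, the measure values, and the `roll_up` helper are hypothetical.

```python
from itertools import product

# Hypothetical dimensions; each could itself be a facet hierarchy.
DIMENSIONS = {
    "region": ["West", "East"],
    "product_category": ["Bicycle", "Auto"],
    "year": [2004, 2005],
}

# Every possible cell of the cube: the Cartesian product of the dimensions.
CELLS = list(product(*DIMENSIONS.values()))

# The fact table is a (partial) function from cells to a measure.
FACTS = {
    ("West", "Bicycle", 2004): 120,
    ("West", "Auto", 2005): 45,
    ("East", "Bicycle", 2005): 80,
}


def roll_up(facts, keep):
    """Sum the measure over every dimension not named in `keep`."""
    names = list(DIMENSIONS)
    positions = [names.index(name) for name in keep]
    totals = {}
    for cell, value in facts.items():
        key = tuple(cell[i] for i in positions)
        totals[key] = totals.get(key, 0) + value
    return totals
```

Rolling up along a dimension is the OLAP counterpart of moving up one level in a facet's partial order tree.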


Directed Acyclic Graphs

Graph:
  Directed edges
  No cycles
  No assumptions about transitivity (e.g., mixed edge types, some partonomies)
  Nodes may have multiple parents
Examples:
  Partonomies (“part-of”) - transitivity is not always true
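The "no cycles" property is an integrity constraint a registry could check before accepting new edges. One way to sketch that check in Python is with the standard library's `graphlib` module (Python 3.9 or later); the function name `is_acyclic` and the sample edges are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(edges):
    """True if the directed graph given as (child, parent) pairs has no cycle."""
    sorter = TopologicalSorter()
    for child, parent in edges:
        # Register `parent` as a predecessor of `child`.
        sorter.add(child, parent)
    try:
        sorter.prepare()  # raises CycleError when a cycle exists
    except CycleError:
        return False
    return True


# part-of edges borrowed from the county tree example.
PART_OF = [
    ("Oakland", "Alameda County"),
    ("Berkeley", "Alameda County"),
    ("Alameda County", "California"),
]
```

The check says nothing about transitivity; it only guarantees that following edges never returns to the starting node.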


Example: Directed Acyclic Graph

[Diagram: directed acyclic graph of is-a edges in which nodes have multiple parents. Node labels:
  Vehicle
  Wheeled Vehicle; Propelled Vehicle
  2 Wheeled Vehicle; 3 Wheeled Vehicle; 4 Wheeled Vehicle; Internal Combustion Vehicle; Human Powered Vehicle
  Bicycle; Tricycle; Motorcycle; Auto]


Partial Order Graphs

Directed acyclic graphs + inferred transitivity
Nodes may have multiple parents
Most taxonomies drawn as transitive reduction; transitive closure edges are implied.
Examples:
  all taxonomies
  most partonomies
  multiple inheritance
POGs can be described in Description Logic, e.g., OWL-DL


Example: Partial Order Graph

[Diagram: partial order graph of is-a edges in which nodes have multiple parents. Node labels:
  Vehicle
  Wheeled Vehicle; Propelled Vehicle
  2 Wheeled Vehicle; 3 Wheeled Vehicle; 4 Wheeled Vehicle; Internal Combustion Vehicle; Human Powered Vehicle
  Bicycle; Tricycle; Motorcycle; Auto]

Dashed line = inferred is-a (transitive closure)
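Subsumption testing in a partial order graph amounts to asking whether one node is reachable from another along is-a edges, even when nodes have several parents. The Python sketch below covers only the Bicycle branch, and its edges are an assumption read from the node names because the diagram's edge targets did not survive extraction; `ancestors` and `is_a` are hypothetical helpers.

```python
# Assumed is-a edges for one branch of the example (child -> set of parents).
PARENTS = {
    "Bicycle": {"2 Wheeled Vehicle", "Human Powered Vehicle"},
    "2 Wheeled Vehicle": {"Wheeled Vehicle"},
    "Human Powered Vehicle": {"Propelled Vehicle"},
    "Wheeled Vehicle": {"Vehicle"},
    "Propelled Vehicle": {"Vehicle"},
}


def ancestors(node):
    """All nodes reachable from `node` by following is-a edges upward."""
    seen = set()
    stack = [node]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(specific, general):
    """Subsumption test: is `specific` a kind of `general`?"""
    return specific == general or general in ancestors(specific)
```

The `seen` set matters with multiple inheritance: Vehicle is reachable from Bicycle by two paths but is visited only once.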


Directed Graph

Generalization of DAG (directed acyclic graph)

Cycles are allowed
Arises when many edge types allowed
Example: UMLS


Lattices

A partial order
For every pair of elements A and B
  There exists a least upper bound
  There exists a greatest lower bound
Example: The power set (all possible subsets) of a finite set
  LUB(A,B) = union of two sets A, B
  GLB(A,B) = intersection of two sets A, B


Example Lattice: Powerset of 3 element set

[Diagram: lattice drawn in four levels; edges denote subset]
{a,b,c}
{a,b}  {a,c}  {b,c}
{a}  {b}  {c}
Empty Set
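The power set lattice is easy to check by hand or in code. This Python sketch builds the eight subsets of {a, b, c} and defines the least upper bound as union and the greatest lower bound as intersection; the function names are illustrative.

```python
from itertools import combinations


def powerset(base):
    """All subsets of `base`, as frozensets, from smallest to largest."""
    items = sorted(base)
    return [
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


def lub(a, b):
    """Least upper bound in the subset order: the union."""
    return a | b


def glb(a, b):
    """Greatest lower bound in the subset order: the intersection."""
    return a & b


ELEMENTS = powerset({"a", "b", "c"})
```

Because the union and intersection of any two subsets are again subsets of the base set, every pair of elements has both bounds inside the structure, which is what makes it a lattice.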


Lattices - Applications

Formal Concept Analysis
  synthesizing taxonomies
Machine Learning
  concept learning


Bipartite Graphs

Vertices = two disjoint sets, V and W
All edges connect one vertex from V and one vertex from W
Examples:
  mappings among value representations
  mappings among schemas
  (entity/attribute, relationship) nodes in Conceptual Graphs


Example Bipartite Graph

[Diagram: bipartite graph mapping States to Two-letter state codes]
States           Two-letter state codes
California       CA
Massachusetts    MA
Oregon           OR
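The bipartite condition can be checked directly: the two vertex sets must be disjoint, and every edge must join one vertex from each. The Python sketch below applies that check to the state-code mapping; `is_bipartite_mapping` is a hypothetical helper name.

```python
STATES = {"California", "Massachusetts", "Oregon"}
CODES = {"CA", "MA", "OR"}

# Edges of the mapping between the two value representations.
EDGES = {
    ("California", "CA"),
    ("Massachusetts", "MA"),
    ("Oregon", "OR"),
}


def is_bipartite_mapping(edges, left, right):
    """True if `left` and `right` are disjoint and each edge joins one node from each."""
    if left & right:
        return False
    return all(
        (u in left and v in right) or (u in right and v in left)
        for u, v in edges
    )
```

An edge between two states, or between two codes, would violate the constraint and signal a malformed mapping.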


Clique

Clique = complete graph (or subgraph)
  all possible edges are present
Used to represent equivalence classes
Typically, on undirected graphs


Example of Clique

California

CA

Calif.

CAL

Here edges denote synonymy.
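Whether a set of synonyms really forms a clique can be tested by checking every pair of nodes for an edge. This Python sketch does so for the four California labels, treating edges as undirected; the edge list simply enumerates all six pairs, and `is_clique` is an illustrative name.

```python
from itertools import combinations

SYNONYMS = ["California", "CA", "Calif.", "CAL"]

# Undirected synonymy edges: all six pairs among the four labels.
EDGES = [
    ("California", "CA"),
    ("California", "Calif."),
    ("California", "CAL"),
    ("CA", "Calif."),
    ("CA", "CAL"),
    ("Calif.", "CAL"),
]


def is_clique(nodes, edges):
    """True if every pair of distinct nodes is joined by an (undirected) edge."""
    undirected = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in undirected for pair in combinations(nodes, 2))
```

A clique on n nodes needs n(n-1)/2 edges, so registries often store the equivalence class itself and treat the edges as implied.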


Compound Graphs

Edges can point to/from subgraphs, not just simple nodes
Used in conceptual graphs
  CG is isomorphic to First Order Logic
Could be used to specify contexts for subgraphs


Example Compound Graph

Colin Powell claimed

Iraq had WMDs


Conclusions

We can characterize metadata structure in terms of graph structures

Partial Order Graphs are the most common structure:
  used for taxonomies, partonomies
  support multiple inheritance, faceted classification
  implicit inclusion of inferred transitive closure edges


Axiomatized Ontology

Definition: The resource_cost_point predicate, cpr, specifies the cost_value, c, (monetary units) of a resource, r, required by an activity, a, up to a certain time point, t.

If a resource of the terminal use or consume states, s, for an activity, a, are enabled at time point, t, there must exist a cost_value, c, at time point, t, for the activity, a, that uses or consumes the resource, r. The time interval, ti = [ts, te], during which a resource is used or consumed by an activity is specified in the use or consume specifications as use_spec(r, a, ts, te, q) or consume_spec(r, a, ts, te, q), where activity, a, uses or consumes quantity, q, of resource, r, during the time interval [ts, te]. Hence,

Axiom: ∀ a, s, r, q, ts, te, (use_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ∨ (consume_spec(r, a, ts, te, q) ∧ enabled(s, a, t)) ≡ ∃ c, cpr(a, c, t, r)


Challenges

How to register & manage the various graph structures?
  DBMS, File systems ….
How to query the graph structures?
  XQuery for XML
  Poor to non-existent graph query languages
How to get adequate performance, even in a high performance computing environment
User interface complexity
How to manage semantic drift
Versions
How to interrelate graphs with other graphs and with data
Granularity at which to register metadata (then point to greater detail elsewhere?)


How can Terminologies and Ontologies help Manage Metadata?

At the level of metadata instances in a registry, connect metadata entities via shared terms
  via automatic indexing of metadata words
  via text values from specific metadata elements

At the level of the 11179 (or other) metamodel, ontologies can help specify formal relationships
  is-a and part-of hierarchies, etc.
  Inheritance, aggregation, …
  for automatic searching of sub-classes & inverses
  to specify semantic pathways for indexing


Major Tasks, Deliverables & Milestones

[Table residue; the schedule columns are not recoverable. Rows are listed in the order they appear in the transcript.]

Task/Deliverable
  Initial Architecture Design
  Present Proposed 11179 Part 2 Revisions to SC32 WG2 mtg in DC
  Research and Development
  Test Implementation (External Users)
  Prepare Draft Revision of 11179 Part 2 for SC32 mtg in Berlin
  System Test & Evaluation (Internal Participants)
  Identify, Select Metadata Sources
  Identify, Select Technologies
  Develop Project Plan

Gantt Chart Forthcoming


General Tasks/Intentions

Limited Query Optimization

Brief ISO/IEC L8

Documents will Recommend Choices

Task/Intention

Brief DOD Metadata Working Group

Brief IC Metadata Working Group

Limited Help Functions

Prioritized Content Registration

Limited User Interface - Initially

Will Seek to Promote Awareness


Potential Standards/Technologies

DBMS: Object, XML, Relational, RDF/Graph, Logic, Text, Document, Multimedia

Knowledge Representation: Web Ontology Language (OWL), Common Logic (CL)

Middleware/Messaging: Cocoon 2, Jini, CoABS, JMS, XMLBlaster, SOAP

XML [Semantic] Web Services: Axis, JWSDP

Agent Development: ABLE, JADE

Engines/Servers: OMS (IBM), Federator/OMS (OWI), Jess

Open Source and Risk Tolerant


Architecture Approach

Fully modular approach
Exemplars:
  Apache Web Server
  Eclipse IDE
  Protégé Ontology Editor
Benefits:
  numerous modules are relatively easy to implement
  clean separation of concerns and high reusability and portability
  tooling support required is minimal


XMDR Prototype Architecture: Initial Implemented Modules

[Diagram. Recoverable labels:
  Modules: MetadataValidator; AuthenticationService; MappingEngine; Registry; ExternalInterface; RetrievalIndex; FullTextIndex; LogicBasedIndex; RegistryStore; WritableRegistryStore
  Technology labels: Ontology Editor (Protege); 11179 OWL Ontology; Jena, Xerces; Java; Lucene; Jena, OWI KS Racer; Subversion
  Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership)]


XMDR Content Priority List

Phase 1

(V.A) National Drug File Reference Terminology (?)

DTIC Thesaurus (Defense Technology Info. Center Thesaurus)

NCI Thesaurus (National Cancer Institute Thesaurus)

NCI Data Elements (National Cancer Institute Data Standards Registry)

UMLS (non-proprietary portions)

GEMET (General Multilingual Environmental Thesaurus)

EDR Data Elements (Environmental Data Registry)

ISO 3166 Country Codes – from EPA EDR

USGS Geographic Names Information System (GNIS)


XMDR Content Priority List

Phase 2

LOINC (Logical Observation Identifiers Names and Codes)

ITIS (Integrated Taxonomic Information System)

Getty Thesaurus of Geographic Names (TGN)

SIC (Standard Industrial Classification System)

NAICS (North American Industry Classification System)

NAICS-SIC mappings

UNSPSC (United Nations Standard Products and Services Codes)

EPA Chemical Substance Registry System

EPA Terminology Reference System

ISO Language Identifiers (ISO 639-3)

IETF Language Identifiers (RFC 1766)

Units Ontology


XMDR Content Priority List

Phase 3

HL7 Terminology

HL7 Data Elements

GO (Gene Ontology)

NBII Biocomplexity Thesaurus

EPA Web Registry Controlled Vocabulary

BioPAX Ontology

NASA SWEET Ontologies

NDRTF


Acknowledgements and References

Frank Olken, LBNL
Kevin Keck, LBNL
John McCarthy, LBNL


11179 Metadata Registries Extensions

Register (and manage) any semantics that are useful for managing data.

E.g., add semantic information beyond definitions: link any concept found in concept systems or in the “wild” (or clusters of concepts and relations) to data

This may include all of the permissible values (concepts) and the full concept systems in which the permissible values are found.

E.g., may want to register keywords, thesauri, taxonomies, ontologies, axiomatized ontologies ….

Lay foundation for semantics-based computing: Semantics Service Oriented Architecture, Semantic Grids, Semantics-based workflows, Semantic Web …