Federal Enterprise Architecture Data and Information Reference Model Contribution to The FEA DRM Data Management Strategy Business Driver 4: Resolve Data Semantics Issues That Impede Community of Practice Work Brand Niemann and Ken Gill November 26, 2003 DRAFT


Page 1: Federal Enterprise Architecture Data and Information

Federal Enterprise Architecture

Data and Information Reference Model

Contribution to The FEA DRM Data Management Strategy

Business Driver 4: Resolve Data Semantics Issues That Impede Community of Practice Work

Brand Niemann and Ken Gill

November 26, 2003 DRAFT

Page 2: Federal Enterprise Architecture Data and Information

Office of Management and Budget – Federal Enterprise Architecture


Business Driver 4: Resolve Data Semantics Issues That Impede Community of Practice Work

Introduction to Data Semantics

Information technology and practices have evolved from centralized data management systems to decentralized and distributed computing information exchanges. Increasingly mature and robust infrastructures for distributing information are helping to realize the idea that information can be available to anyone, anytime, anywhere.

The availability of increasing amounts of information presents the challenge of delivering the right information, to the right person, at the right time. The data must be relevant, meaningful and at the appropriate level of detail.

Data Semantics is the discipline that facilitates the delivery of the right information based on the requestor’s requirements. Semantic agreement within Communities of Practice is essential to facilitating meaningful and effective data exchange.

Domain Data Harmonization Strategy

The vast majority of existing information systems have evolved over time with diverse requirements and different data models. As a result, the data stored in these systems have varying levels of meaning, consistency, and quality. Data harmonization is the process by which a Community of Practice agrees to the meaning and format of the data residing in its information systems by applying common definitions, attributes, and values. Example outputs include data dictionaries and data registries.
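As a rough illustration of what a harmonized data dictionary buys a Community of Practice, the sketch below maps records from two systems onto agreed element names and values. It is only a sketch under stated assumptions: the element names, aliases, and code values are hypothetical and are not taken from JXDD or any other dictionary cited in this deck.

```python
# Hypothetical data dictionary: each agreed element lists the local field
# names that map to it and, where a code list applies, the agreed values.
DATA_DICTIONARY = {
    "person_sex": {
        "aliases": {"sex", "gender", "gndr"},
        "values": {"m": "male", "male": "male", "f": "female", "female": "female"},
    },
    "birth_date": {
        "aliases": {"dob", "birthdate", "birth_date"},
        "values": None,  # free-form value, no code list
    },
}


def harmonize(record):
    """Rename fields and normalize coded values using DATA_DICTIONARY."""
    harmonized = {}
    for field, value in record.items():
        for element, spec in DATA_DICTIONARY.items():
            if field.lower() in spec["aliases"]:
                if spec["values"] is not None:
                    # A code outside the agreed list raises KeyError here.
                    value = spec["values"][str(value).lower()]
                harmonized[element] = value
                break
        else:
            # Fields with no agreed definition are kept but flagged for review.
            harmonized.setdefault("_unmapped", {})[field] = value
    return harmonized


system_a = {"Gender": "M", "DOB": "1970-01-31"}
system_b = {"sex": "male", "birthdate": "1970-01-31", "shoe_size": 9}

print(harmonize(system_a))
print(harmonize(system_b))
```

Both records end up with the same element names and the same value for the person's sex, while the field that has no agreed definition is set aside rather than silently dropped.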

There are several examples of current and ongoing Community of Practice Data Harmonization efforts (see use cases). The success of these efforts demonstrates several guiding principles.

Page 3: Federal Enterprise Architecture Data and Information


Data Harmonization Guiding Principles:

1. Data harmonization is a process, not a project, and should begin as early as possible.
2. Identify key internal and external stakeholders.
3. Engage existing and potential partners.
4. Understand and agree on scope of initiative.
5. Define requirements.
6. Review best practices.
7. Select a methodology and appropriate tools.
8. Identify relevant information exchanges and the data systems that support them.
9. Concepts and definitions must be universally accepted within the Community of Practice.
10. Publish work product so it can be consumed by practitioners and technologists.

Global Justice Information Sharing Initiative (Global), includes the:

Global Infrastructure/Standards Working Group (GI/SWG), which created the:

XML Structure Task Force (XSTF), which consists of:

Agencies (practitioners), commercial XML developers (implementers), technical support staff, and administrative support staff, which produced the:

“The Development of JXDD 3.0,” Draft 0.1, July 3, 2003 (Justice XML Data Dictionary Version 3.0).

Page 4: Federal Enterprise Architecture Data and Information


Goals of the JXDD 3.0 Development Effort:

Reference architecture and namespaces for a standard Justice XML Data Dictionary Schema specification.

Object-oriented data model, named types, extensibility.

Maximize use of standards and best practices:
• ISO 11179;
• Draft Federal XML Developers Guide;
• Intelligence Community Metadata Language; and
• etc.

Metadata for content, registry support, and infrastructure support.

Value constraints: codes/enumerations, special semantics.

Fuller representation of relationships.

Incorporate a broader set of user requirements:
• Data exchange requirements from several efforts; and
• Functional requirements.

XML Schema version control.

Migration paths.
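The goals of named types, extensibility, and value constraints through codes/enumerations can be pictured with a small sketch. This is an analogy in Python rather than XML Schema, and every name in it (`SexCode`, `PersonType`, `OfficerType`, `parse_sex`) is hypothetical; none is taken from the JXDD itself.

```python
from dataclasses import dataclass
from enum import Enum


class SexCode(Enum):
    """Hypothetical code list; the members are not taken from JXDD."""
    MALE = "M"
    FEMALE = "F"
    UNKNOWN = "U"


@dataclass(frozen=True)
class PersonType:
    """A named type that other types can reuse or extend."""
    name: str
    sex: SexCode


@dataclass(frozen=True)
class OfficerType(PersonType):
    """Extension by inheritance: adds a field without redefining PersonType."""
    badge_id: str


def parse_sex(code):
    """Accept only values from the enumeration; reject anything else."""
    return SexCode(code)  # raises ValueError for codes outside the list


officer = OfficerType(name="A. Example", sex=parse_sex("F"), badge_id="0001")
print(officer)
```

The design choice mirrored here is that a constraint lives with the data definition (the enumeration), so every application that uses the type gets the same validation.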

Page 5: Federal Enterprise Architecture Data and Information


In this age of eGovernment and Enterprise Architecture, whose objectives are increased collaboration, consolidation, and integration to transform from organization- to citizen/customer-centric, some key questions to ask yourself and others are: How “smart” is your data and information? Who are you collaborating with? Some key goals are then:

• “Smarter data” (put more effort into the data than the applications); and

• More collaboration (on and by means of “smarter data”).

The Semantic Web is a machine-readable web of “smart data” and automated services that amplify the Web far beyond current capabilities.

• Smart data is data that is application-independent, composable, classified, and part of a larger information ecosystem (ontology).

XML provides a simple yet robust mechanism for encoding semantic information, or the meaning of data, and shifts the “power” from the application to the data.

• But simple XML metadata is not enough because it only provides syntactic interoperability.

• Additional XML-based Ontology languages are being developed to encode semantic interoperability.

– In the next ten years, we will see semantics to describe problems and business processes in specialized domains.

Page 6: Federal Enterprise Architecture Data and Information


[Figure. Axis labels: Static Resources, Dynamic Resources, Interoperable Syntax, Interoperable Semantics. Other labels: WWW, Web Services, Semantic Web, Semantic Web Services, Enterprise Ontology and Web Services Registry.]

Source: Derived in part from two separate presentations at the WebServices One Conference 2002 by Dieter Fensel and Dragan Sretenovic.

Page 7: Federal Enterprise Architecture Data and Information


RDF: Dublin Core Metadata and Relationships:

The Resource Description Framework (RDF) is an XML-based language to describe resources and is designed to create metadata about the “resource” as a standalone entity. The RDF model is often called a “triple” because it has three parts: (1) a resource; (2) a resource’s properties; and (3) the property values.

• The knowledge representation community uses the grammatical parts of a sentence: (1) subject; (2) predicate; and (3) object.
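A minimal sketch of the triple model follows, using the sentence "The company sells batteries" that appears elsewhere in this deck. It uses plain Python tuples rather than an RDF library, and the `http://example.org/` URIs and the `match` helper are assumptions made for illustration only.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical URIs; example.org is reserved for illustration.
EX = "http://example.org/"

triples = [
    Triple(EX + "company", EX + "sells", EX + "batteries"),
    Triple(EX + "company", EX + "name", "Example Corp"),  # literal object
    Triple(EX + "batteries", EX + "name", "Batteries"),   # literal object
]


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# Everything that is stated about the company.
for t in match(triples, subject=EX + "company"):
    print(t.predicate, "->", t.object)
```

Because every statement has the same three-part shape, statements from different sources can be pooled in one list and queried with the same pattern.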

RDF Schema is a language layer on top of RDF in what is called the “Semantic Web Stack”. Above RDF Schema are ontologies, and above that is the third and final web in Tim Berners-Lee’s three-part vision (collaborative web, Semantic Web, web of trust).

Ontology involves discovering categories and fitting objects into them in ways that make sense. When we make a list…we are categorizing - we are engaging in rudimentary ontology. By prioritizing items in a list, we are assigning relationships among various things. Ontology can be relatively simple, or it can be quite complex.

XML Topic Maps are popular implementations of taxonomies and have complementary characteristics to RDF.

Page 8: Federal Enterprise Architecture Data and Information


RDF: Dublin Core Metadata and Relationships (continued):

The Dublin Core Metadata Initiative is a cross-disciplinary international effort to develop mechanisms for the discovery-oriented description of diverse resources in an electronic environment. The Dublin Core Element Set is a list of fifteen fixed elements that capture a representation of essential aspects related to the description of resources. A complete list of Dublin Core metadata elements (e.g. author, title, creation date, etc.) can be found at http://dublincore.org/documents/1999/07/02/dces/

Metadata can exist within the resource that it is describing (internal metadata), or it can exist in a separate file (external metadata) that is associated with the content file.
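As a sketch of external metadata, the example below builds a separate RDF/XML record that describes a resource with three Dublin Core elements. It uses only the Python standard library; the `describe` helper and the resource URI are hypothetical, while the two namespace URIs are the published RDF and Dublin Core element namespaces. Note that the Dublin Core element for an author is named `creator`.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(resource_uri, **elements):
    """Build an rdf:RDF element holding one Dublin Core description."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": resource_uri})
    for name, value in elements.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return root


# Hypothetical resource URI; title, creator, and date come from this deck.
record = describe(
    "http://example.org/fea-drm-business-driver-4.ppt",
    title="Business Driver 4: Resolve Data Semantics Issues "
          "That Impede Community of Practice Work",
    creator="Brand Niemann and Ken Gill",
    date="2003-11-26",
)
print(ET.tostring(record, encoding="unicode"))
```

The record lives apart from the file it describes and points to it through `rdf:about`, which is what makes it external rather than internal metadata.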

Three excellent resources are:

• Practical RDF: Solving Problems with the Resource Description Framework, Shelley Powers, O’Reilly, July 2003;

• The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003; and

• XML Topic Maps: Creating and Using Topic Maps for the Web, Addison Wesley, July 2002.

Page 9: Federal Enterprise Architecture Data and Information


Key Ontology Components and RDF Triple Components

[Figure, RDF Triple Components. Labels: Subject*, Predicate**, Object, Literal. Legend entries: = URI; = Literal; = Property or Association. Example sentence: *The company* **sells batteries**.]

[Figure, Key Ontology Components. Classes: Person (birthdate: date, gender: char), Leader, Organization, Image, Resource. Relationship labels: leads, is-A, works for, published, depiction, knows.]

Source: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003.

Page 10: Federal Enterprise Architecture Data and Information


RDF: Semantic links - "Joining the Web"

Source: Standards, Semantics and Survival, by Tim Berners-Lee, Director, World Wide Web Consortium, January 2003.

Page 11: Federal Enterprise Architecture Data and Information


Source: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003.

Page 12: Federal Enterprise Architecture Data and Information


Source: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003.

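The Ontology Spectrum: Weak to strong semantics.

[Figure: spectrum running from weak semantics to strong semantics, with a characteristic relationship at each level.]

• Taxonomy: “is a classification of”
• Thesaurus: “has narrower meaning than”
• Conceptual Model: “is subclass of”
• Local Domain Theory: “is disjoint subclass of”, with transitivity property

Languages and notations placed along the spectrum: Schema, XTM, RDF/S, UML, DAML+OIL, OWL.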
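The stronger end of the spectrum can be made concrete with a short sketch of what "is subclass of" with transitivity and "is disjoint" let software infer. This is plain Python, not OWL or DAML+OIL; the class hierarchy is hypothetical (the names Person, Leader, and Organization echo an earlier figure in this deck, and `Agent` is invented for the example).

```python
# Hypothetical class hierarchy; only the relation types come from the slide.
SUBCLASS_OF = {
    "Leader": "Person",
    "Person": "Agent",
    "Organization": "Agent",
}
DISJOINT = {frozenset({"Person", "Organization"})}


def ancestors(cls):
    """Follow 'is subclass of' links upward (transitivity)."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_subclass(cls, other):
    """True if cls is other or reaches it through subclass links."""
    return cls == other or other in ancestors(cls)


def may_share_instances(a, b):
    """False when a and b fall under classes declared disjoint."""
    for pair in DISJOINT:
        x, y = tuple(pair)
        if (is_subclass(a, x) and is_subclass(b, y)) or \
           (is_subclass(a, y) and is_subclass(b, x)):
            return False
    return True


print(ancestors("Leader"))
print(may_share_instances("Leader", "Organization"))
```

Nothing in the data says directly that a Leader is an Agent or that a Leader cannot be an Organization; both conclusions are derived, which is the kind of inference a taxonomy or thesaurus alone does not support.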

Page 13: Federal Enterprise Architecture Data and Information


Emerging Vendors Landscape: Semantic Integration

[Figure: vendor landscape chart. Axis labels: Expressivity and Semantic Power; Enterprise Support. Other labels: XML, RDF, OWL; Data and Schema Management Validation; Run-time Engine; Integration and Orchestration. Vendors shown: Ontology Works, enLeague, Ontoprise, Network Inference, Unicorn, SchemaLogic, Contivo, Celcorp, Vitria, MetaMatrix, Modulant, IGS, Miosoft. Legend (Current Support / Primary Strength): S = Structured information; U = Unstructured information; S&U = Supports both.]

Source: Irene Polikoff, TopQuadrant, Positioning Semantic Technologies: The Emerging Vendor Landscape, September 8, 2003.

Page 14: Federal Enterprise Architecture Data and Information


Design-Time Integration of Data Via Web Services Architecture Pilot

[Figure: architecture diagram. Components: XML Collaborator Registry, MetaBase MOF Repository, Disparate Data Sources, Web Service.]

(1) Import physical source metadata.
(2) Identify and model XML schema types.
(3) Import modeled XML elements and types into XML Registry.
(4) Define XML Schema using registered elements from MetaBase.
(5) Import XML Schema into MOF repository.
(6) Map virtual XML document to physical sources using Schema in MOF repository.
(7) Create and deploy Web Services for accessing integrated data.
(8) Register WSDL in UDDI Registry.

See Appendix: July 17, 2003.

Page 15: Federal Enterprise Architecture Data and Information


Suggestions provided to the AIC Governance and Components Subcommittees, August 25, 2003:

1. While ISO 11179 is emphasized because of considerable legacy work and MOF is considered more useful currently, the Semantic Web technologies of RDF and OWL have much more mature and capable data models for semantic integration and interoperability and provide a convergence of the four data communities (document, Web, database, and programming).

• See http://www.w3.org/2003/08/owl-pressrelease

2. Data Independence is step one (Michael Daconta's "Declaration of Data Independence" from the September 8th Conference on Semantic Technologies for eGov):

• (a) Data is more important than applications.

• (b) Data value increases with the number of connections it shares.

• (c) Data about data can expand to as many layers as there are meanings.

• (d) Data modeling harmony is the alignment of syntax, semantics, and pragmatics.

• (e) Data and logic are the yin and yang of information processing.

• (f) Data modeling makes the implicit explicit and the transparent apparent.

• (g) Data standardization is not amenable to competition.

• (h) Data modeling must be decentralized.

• (i) Data relations must not be based on probability or luck.

• (j) Data is truly independent when the next generation need not reinvent it.

Page 16: Federal Enterprise Architecture Data and Information


Suggestions provided to the AIC Governance and Components Subcommittees, August 25, 2003 (continued):

3. The Intelligence Community Metadata Working Group (IC MWG) as a DRM Governance Model (http://www.xml.saic.com/icml/)

• (a) Established by IC Chief Information Officer (CIO) Executive Council.

• (b) Promulgates the April 2003 IC Policy requiring IC-wide use of the IC XML standards for metadata and metadata markup. Also identifies and harmonizes enterprise-level metadata and metadata markup standards.

• (c) Developing community-wide standard XML metadata models including security-marking constructs that assist writers with application of Controlled Access Program Coordination Office (CAPCO) marking instructions. Subsequent standards will address digital signatures, encryption, and public key management.

• (d) Developing an IC metadata registry and registry services.

• (e) Currently working on the Terrorist Watchlist Person Data Exchange Standard XML Tags and Schema.

• (f) Work is accomplished in regular Technical Exchange Meetings (TEM) and Team meetings.

4. The Data and Information Reference Model has been and currently is the object of a series of pilot projects with the Communities of Practice. This should continue and is required by H.R. 2458 - the E-Government Act of 2002, SEC. 212. Integrated Reporting Study and Pilot Projects, (d) Pilot Projects To Encourage Integrated Collection And Management Of Data And Interoperability Of Federal Information Systems. Added November 25th.

Page 17: Federal Enterprise Architecture Data and Information


The Data and Information Reference Model has been (see Appendix) and currently is the object of a series of pilot projects with the Communities of Practice:

Open GIS Consortium (OGC).
• Information Communities and Semantics WG (ICS WG)
– http://www.opengis.org/groups/?iid=50

Sustainability of Intergovernmental Exchange Networks (Global-Justice, Environmental Information-EPA, and Health IT Sharing (Health)) (SIEN).
• Government Semantic XML Web Services Community of Practice (SWS-CoP)
– http://web-services.gov/

Intelligence Community Metadata Working Group (IC MWG).
• http://www.xml.saic.com/icml/

Semantic Interoperability Special Interest Group (SI-SIG).
• To be announced.

E-Gov SmartServices
• To join the group send an email to eGov_SmartServices-[email protected] with empty Subject and Body. You will then receive an email with a web link where you can select the subscription option.

Open International Forum on Business Ontology
• ONTOLOG - collaborative work environment
– http://ontolog.cim3.net/

More to come.

Page 18: Federal Enterprise Architecture Data and Information


Appendix – History of DRM Support Work:

January 22, 2003, Suggestions for the Federal Enterprise Architecture (FEA) Data and Information Reference Model (updated January 27, 2003), FEA Data and Information Reference Model (DRM).

• http://web-services.gov/FEA-DRM12203.ppt

January 31, 2003, Topic Map Web Services for the FEA-PMO and FEA-DRM - Cognitive Topic Map Web Sites (CTW): Aggregating Information Across Individual Agencies and E-Gov Initiatives, Michel Biezunski, Coolheads Consulting, Proposed Pilot.

• http://web-services.gov/mbegov213003.ppt

February 10 and 20, 2003, Distributed Components, Metadata Models, and Registries: Input to the Governance and Components Subcommittee Meetings and the FEA Data and Information Reference Model (DRM). See "The Distributed Components, Metadata Models, and Registries" ListServ Discussion Summary-Joe Chiusano, Booz Allen Hamilton, March 4, 2003.

• http://web-services.gov/XML%20Web%20Services%20Working%20Group%2022003.ppt

• http://web-services.gov/Distributed%20Components,%20Metadata%20Models,%20and%20Registries%20Thread%20-%20%2003-04-03.ppt

Page 19: Federal Enterprise Architecture Data and Information


Appendix – History of DRM Support Work (continued):

March 19, 2003, Strengthening the Federal Enterprise Architecture (FEA) Data and Information Reference Model (DRM) and Military Pilot Project (Federated Registries: Concepts and CONOPS) discussed at the XML Registry Pilot Team Meeting.

• http://web-services.gov/FEA-DRM31903.ppt

• http://xml.gov/agenda/rrt20030319.htm

April 4, 2003, Working Group provides Multiple Registries and Repositories to be federated with the XML Working Group's GSA-NIST XML Registry Pilot to support the CIO Council's Architecture and Infrastructure Committee (AIC) and the Data and Information Reference Model (DRM).

• http://web-services.gov/Registries41003.ppt

April 21, 2003, Strengthening the Federal Enterprise Architecture (FEA) Data and Information Reference Model (DRM) for the DRM Offsite May 19, 2003.

• http://web-services.gov/FEA-DRM42103.ppt

Page 20: Federal Enterprise Architecture Data and Information


Appendix – History of DRM Support Work (continued):

May 16, 2003, Extending the FEA DRM to Support the Joint Government Data & Information Reference Model (GDIRM), the Business Compliance One Stop E-Gov Initiative, ITIPS-II, & Component Technology Activities, input for the DRM Offsite May 19, 2003. Also Geospatial Interoperability Reference Model (GIRM).

• http://web-services.gov/FEA-DRM51903.ppt

• http://web-services.gov/EA%20for%20Geospatial3.ppt

July 17, 2003, Report at AIC Task Leaders Meeting on Governance Subcommittee Goal 3 Task Pilots: A Government Enterprise Component Registry and Repository Using Native XML Database Technology (for presentation on July 22 and 23) and Joint Government Data and Information Reference Model (IAC White Paper) for review, which includes the MetaMatrix-XML Collaborator Pilot Project (see pages 26-27).

• http://web-services.gov/Components%20Repository72203.ppt

• http://web-services.gov/030528_IAC_EA_SIG_Information_and_Data_Reference_Model_Body.pdf

September 8, 2003, “Semantic Technologies for eGov” Conference at the White House Conference Center. Proceedings and DVD recording are available.

• http://www.topquadrant.com/conferences/tq_proceedings.htm

Page 21: Federal Enterprise Architecture Data and Information


Appendix – History of DRM Support Work (continued):

October 16, 2003, Founding Meeting of the Semantics SIG to Establish a Community of Purpose, Decides to Develop a Charter, Mission Statement, White Paper, and Foster "Best Practices". More information to follow.

October 20, 2003, Emerging Components Quarterly Conference at the White House Conference Center Featuring: Semantic Mapping Tools (Image Matters: userSmarts and Ontology Manipulation Toolkit).

Also to be presented November 19-20, 2003, at the Geography Awareness Week and GIS Day 2003, Mellon Auditorium, Washington, DC, 14th and Constitution Avenue.

• http://www.componenttechnology.org/Emerging/Oct202003Conference/Agenda/

• http://web-services.gov/GISdayBrand111903.doc

• http://web-services.gov/brief-userSmartsOverview-031020.ppt

• http://www.fgdc.gov/gisday2003/

February 4, 2004, E-Gov Web-Enabled Government 2004 Conference, Session 2-4: Understanding Semantic Web Technology, Brand Niemann and Jim Hendler.

• http://www.e-gov.com

Page 22: Federal Enterprise Architecture Data and Information


Review Comments and Suggestions:

A word of praise: I've had some Unisys folks help to add their ideas and revisions to these paragraphs and several of the Unisys Architects have called out your section as being "right on the money" and "very impressive". Davis Roberts, Unisys.

Just writing to let you know that your slide presentation (and the work you are doing) is great! Thank you. I look forward to meeting you, and collaborating with you in the not-too-distant future. I just came upon a very good presentation by Brand Niemann and Ken Gill that is part of their contribution to the US Federal Enterprise Architecture ("FEA") Data and Information Reference Model ("DRM") data management strategy. The work of some of our community members: Mike Daconta/Leo Obrst/Kevin Smith, Jack Park/Sam Hunting; and even our [ontolog-forum] community of practice, has been referenced in there too. Let's keep up the good work here ... we definitely look forward to closer collaboration with the eGov/FEA folks in the future. Peter Yim [email protected], Organization: CIM Engineering, Inc. To: [email protected], Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/ , Shared Files: http://ontolog.cim3.net/file/, Community Wiki: http://ontolog.cim3.net/wiki/

See “Data Models in a National Infrastructure Handling Reporting Obligations: Norwegian Experience and Opportunities,” Version 1.01, August 2003, by Per Myrseth, IBM Norway, 14 pp.

See “An Overview of SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms),” College of American Pathologists, 2003, 32 slides.