Federal Enterprise Architecture
Data and Information Reference Model
Contribution to The FEA DRM Data Management Strategy
Business Driver 4: Resolve Data Semantics Issues That Impede Community of Practice Work
Brand Niemann and Ken Gill
November 26, 2003 DRAFT
Office of Management and Budget – Federal Enterprise Architecture
Introduction to Data Semantics
Information technology and practices have evolved from centralized data management systems to decentralized and distributed computing information exchanges. Increasingly mature and robust infrastructures for distributing information are helping to realize the idea that information can be available to anyone, anytime, anywhere.
The availability of increasing amounts of information presents the challenge of delivering the right information, to the right person, at the right time. The data must be relevant, meaningful and at the appropriate level of detail.
Data Semantics is the discipline that facilitates the delivery of the right information based on the requester's requirements. Semantic agreement within Communities of Practice is essential to facilitating meaningful and effective data exchange.
Domain Data Harmonization Strategy
The vast majority of existing information systems have evolved over time with diverse requirements and different data models. As a result, the data stored in these systems have varying levels of meaning, consistency, and quality. Data harmonization is the process by which a Community of Practice agrees to the meaning and format of the data residing in its information systems by applying common definitions, attributes, and values. Example outputs include data dictionaries and data registries.
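As a rough illustration of what applying common definitions, attributes, and values can look like, the Python sketch below maps records from two systems onto a shared data dictionary. The system names, field names, and code values are all hypothetical and are not taken from any of the harmonization efforts mentioned here.

```python
# A minimal sketch of data harmonization against a shared data dictionary.
# All system names, field names, and code values below are hypothetical.

# The community's agreed data dictionary: element name -> allowed values
# (None means free text).
DATA_DICTIONARY = {
    "family_name": None,
    "sex": {"F", "M", "U"},
}

# Per-system mappings from local field names and local codes to the agreed ones.
SYSTEM_MAPPINGS = {
    "system_a": {
        "fields": {"LNAME": "family_name", "GENDER": "sex"},
        "values": {"sex": {"female": "F", "male": "M"}},
    },
    "system_b": {
        "fields": {"surname": "family_name", "sex_cd": "sex"},
        "values": {"sex": {"1": "M", "2": "F", "9": "U"}},
    },
}


def harmonize(system, record):
    """Translate one local record into the community's common elements."""
    mapping = SYSTEM_MAPPINGS[system]
    result = {}
    for local_field, local_value in record.items():
        element = mapping["fields"].get(local_field)
        if element is None:
            continue  # this field is outside the agreed scope
        # Translate the local code if a translation exists, else keep the value.
        value = mapping["values"].get(element, {}).get(local_value, local_value)
        allowed = DATA_DICTIONARY[element]
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not an agreed value for {element!r}")
        result[element] = value
    return result


a = harmonize("system_a", {"LNAME": "Smith", "GENDER": "female"})
b = harmonize("system_b", {"surname": "Smith", "sex_cd": "2", "internal_id": "77"})
print(a == b)  # True: both records now use the same elements and values
```

Keeping the dictionary and the per-system mappings as data rather than code is one possible design choice; it lets the agreed definitions be published and reviewed separately from any one system.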
There are several examples of current and ongoing Community of Practice Data Harmonization efforts (see use cases). The success of these efforts demonstrates several guiding principles.
Data Harmonization Guiding Principles:
1. Data harmonization is a process not a project and should begin as early as possible.
2. Identify key internal and external stakeholders.
3. Engage existing and potential partners.
4. Understand and agree on scope of initiative.
5. Define requirements.
6. Review best practices.
7. Select a methodology and appropriate tools.
8. Identify relevant information exchanges and the data systems that support them.
9. Concepts and definitions must be universally accepted within the Community of Practice.
10. Publish work product so it can be consumed by practitioners and technologists.
Global Justice Information Sharing Initiative (Global), includes the:
Global Infrastructure/Standards Working Group (GI/SWG), which created the:
XML Structure Task Force (XSTF), which consists of:
Agencies (practitioners), commercial XML developers (implementers), technical support staff, and administrative support staff, which produced the:
“The Development of JXDD 3.0, Draft 0.1, July 3, 2003” (Justice XML Data Dictionary Version 3.0).
Goals of the JXDD 3.0 Development Effort:
Reference architecture and namespaces for a standard Justice XML Data Dictionary Schema specification.
Object-oriented data model, named types, extensibility.
Maximize use of standards and best practices:
• ISO 11179;
• Draft Federal XML Developers Guide;
• Intelligence Community Metadata Language; and
• etc.
Metadata for content, registry support, and infrastructure support.
Value constraints: codes/enumerations, special semantics.
Fuller representation of relationships.
Incorporate a broader set of user requirements:
• Data exchange requirements from several efforts; and
• Functional requirements.
XML Schema version control.
Migration paths.
In this age of eGovernment and Enterprise Architecture, whose objectives are increased collaboration, consolidation, and integration to transform from organization-centric to citizen/customer-centric, some key questions to ask yourself and others are: How “smart” is your data and information? Who are you collaborating with? Some key goals are then:
• “Smarter data” (put more effort into the data than the applications); and
• More collaboration (on and by means of “smarter data”).
The Semantic Web is a machine-readable web of “smart data” and automated services that amplify the Web far beyond current capabilities.
• Smart data is data that is application-independent, composable, classified, and part of a larger information ecosystem (ontology).
XML provides a simple yet robust mechanism for encoding semantic information, or the meaning of data, and shifts the “power” from the application to the data.
• But simple XML metadata is not enough because it only provides syntactic interoperability.
• Additional XML-based Ontology languages are being developed to encode semantic interoperability.
– In the next ten years, we will see semantics to describe problems and business processes in specialized domains.
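The gap between syntactic and semantic interoperability can be sketched in a few lines of Python. Both documents below are well-formed XML that any parser can read, yet they only compare as equal once their tags are mapped to shared concepts. The tag names and the concept vocabulary are hypothetical, and the small lookup table is only a stand-in for what an ontology language would express.

```python
import xml.etree.ElementTree as ET

# Two hypothetical agencies describe the same person with different tags.
doc_a = "<record><surname>Smith</surname><dob>1970-01-31</dob></record>"
doc_b = "<person><familyName>Smith</familyName><birthDate>1970-01-31</birthDate></person>"

# Hypothetical shared vocabulary: local tag -> agreed concept.
CONCEPT_OF_TAG = {
    "surname": "FamilyName",
    "familyName": "FamilyName",
    "dob": "BirthDate",
    "birthDate": "BirthDate",
}


def local_view(xml_text):
    """Syntactic level: parse the XML and return its tags and text as-is."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def concept_view(xml_text):
    """Semantic level: re-key the parsed data by the agreed concepts."""
    return {CONCEPT_OF_TAG[tag]: text for tag, text in local_view(xml_text).items()}


print(local_view(doc_a) == local_view(doc_b))      # False: same syntax, different tags
print(concept_view(doc_a) == concept_view(doc_b))  # True: same meaning once mapped
```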
[Figure: quadrant diagram with the axes Static Resources / Dynamic Resources and Interoperable Syntax / Interoperable Semantics, positioning the WWW, Web Services, the Semantic Web, and Semantic Web Services; also labeled Enterprise Ontology and Web Services Registry.]
Source: Derived in part from two separate presentations at the WebServices One Conference 2002 by Dieter Fensel and Dragan Sretenovic.
RDF: Dublin Core Metadata and Relationships:
The Resource Description Framework (RDF) is an XML-based language to describe resources and is designed to create metadata about the “resource” as a standalone entity. The RDF model is often called a “triple” because it has three parts: (1) a resource; (2) a resource’s properties; and (3) the property values.
• The knowledge representation community uses the grammatical parts of a sentence: (1) subject; (2) predicate; and (3) object.
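The triple structure can be sketched with plain Python tuples. This is not an RDF library and does not produce RDF/XML; it only shows the three-part shape and a simple pattern match over it. The `http://example.org/` identifiers and the property names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property of that resource
    obj: str        # the property value: another resource or a literal


EX = "http://example.org/"  # hypothetical namespace for illustration

triples = [
    Triple(EX + "company", EX + "sells", EX + "batteries"),
    Triple(EX + "company", EX + "name", "The Company"),
    Triple(EX + "batteries", EX + "label", "batteries"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every part that is not None."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "What does the company sell?" expressed as a pattern with the object left open.
sold = [t.obj for t in match(triples, subject=EX + "company", predicate=EX + "sells")]
print(sold)  # ['http://example.org/batteries']
```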
RDF Schema is a language layer on top of RDF in what is called the “Semantic Web Stack”. Above RDF Schema are ontologies, and above that is the third and final web in Tim Berners-Lee’s three-part vision (collaborative web, Semantic Web, web of trust).
Ontology involves discovering categories and fitting objects into them in ways that make sense. When we make a list…we are categorizing - we are engaging in rudimentary ontology. By prioritizing items in a list, we are assigning relationships among various things. Ontology can be relatively simple, or it can be quite complex.
XML Topic Maps are popular implementations of taxonomies and have complementary characteristics to RDF.
RDF: Dublin Core Metadata and Relationships (continued):
The Dublin Core Metadata Initiative is a cross-disciplinary international effort to develop mechanisms for the discovery-oriented description of diverse resources in an electronic environment. The Dublin Core Element Set is a list of fifteen fixed elements that capture a representation of essential aspects related to the description of resources. A complete list of Dublin Core metadata elements (e.g. creator, title, date, etc.) can be found at http://dublincore.org/documents/1999/07/02/dces/
Metadata can exist within the resource that it is describing (internal metadata), or it can exist in a separate file (external metadata) that is associated with the content file.
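As one possible illustration of external metadata, the Python sketch below writes a small record that describes a content file using three Dublin Core elements (title, creator, and date, where creator is the Dublin Core element for an author). The Dublin Core namespace URI is the published one; the `metadata` wrapper element, its `resource` attribute, and the file name and values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Published namespace for the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def external_metadata(resource_path, title, creator, date):
    """Build a standalone metadata record that points at a separate content file.

    The "metadata" wrapper and "resource" attribute are hypothetical; only the
    dc:title, dc:creator, and dc:date element names come from Dublin Core.
    """
    root = ET.Element("metadata", {"resource": resource_path})
    for name, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = external_metadata("report.pdf", "Example Report", "Example Author", "2003-11-26")

# Reading the record back: look the element up by its namespace-qualified name.
parsed = ET.fromstring(xml_text)
print(parsed.find(f"{{{DC}}}title").text)  # Example Report
```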
Three excellent resources are:
• Practical RDF: Solving Problems with the Resource Description Framework, Shelley Powers, O’Reilly, July 2003;
• The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003; and
• XML Topic Maps: Creating and Using Topic Maps for the Web, Addison Wesley, July 2002.
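[Figure: two diagrams. Key Ontology Components: the classes Person (birthdate: date; gender: char), Leader, Organization, Image, and Resource, connected by the relationships is-A, leads, works for, published, depiction, and knows. RDF Triple Components: a Subject (URI) linked by a Predicate (property or association) to an Object (URI) or a Literal, illustrated by the sentence "The company sells batteries," in which "The company" is marked as the subject and "sells batteries" as the predicate.]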
Source: The Semantic Web: A Guide to the Future of XML, Web Services,and Knowledge Management, Wiley Technology Publishing, June 2003.
RDF: Semantic links - "Joining the Web"
Source: Standards, Semantics and Survival, by Tim Berners-Lee, Director, World Wide Web Consortium, January 2003.
Source: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003.
The Ontology Spectrum: Weak to strong semantics.
[Figure: a spectrum running from weak semantics to strong semantics.]
Taxonomy: Is a classification of
Thesaurus: Has narrower meaning than
Conceptual Model: Is subclass of
Local Domain Theory: Is disjoint subclass of with transitivity property
Languages shown along the spectrum: Schema, XTM, RDF/S, UML, DAML+OIL, OWL
Source: The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management, Wiley Technology Publishing, June 2003.
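What the stronger end of the spectrum adds is the ability to draw inferences from statements such as "is subclass of" (transitive) and "is disjoint with". The Python sketch below is a toy illustration of those two ideas only; it is not an OWL reasoner, and the class names and the disjointness statement are hypothetical.

```python
# Hypothetical direct "is subclass of" statements.
SUBCLASS_OF = {
    "Leader": {"Person"},
    "Person": {"Agent"},
    "Organization": {"Agent"},
}

# Hypothetical disjointness statement: nothing is both a Person and an Organization.
DISJOINT = {frozenset({"Person", "Organization"})}


def superclasses(cls, direct=SUBCLASS_OF):
    """Follow subclass links transitively and return every inferred superclass."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in direct.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_consistent(types):
    """False if the asserted types, plus inferred superclasses, hit a disjoint pair."""
    closure = set(types)
    for t in types:
        closure |= superclasses(t)
    return not any(pair <= closure for pair in DISJOINT)


print(sorted(superclasses("Leader")))             # ['Agent', 'Person']
print(is_consistent({"Leader"}))                  # True
print(is_consistent({"Leader", "Organization"}))  # False: Leader implies Person
```

A plain taxonomy could record that a Leader is a kind of Person, but it is the transitivity and disjointness rules that let a machine reject the last case without anyone stating it directly.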
Emerging Vendors Landscape: Semantic Integration
[Figure: vendors plotted by Expressivity and Semantic Power (XML, RDF, OWL) against Enterprise Support (Data and Schema Management Validation, Run-time Engine, Integration and Orchestration). Vendors shown: Ontology Works, enLeague, Ontoprise, Network Inference, Unicorn, SchemaLogic, Contivo, Celcorp, Vitria, MetaMatrix, Modulant, IGS, and Miosoft. Each vendor is marked by Current Support / Primary Strength: S = structured information, U = unstructured information, S&U = supports both.]
Source: Irene Polikoff, TopQuadrant, Positioning Semantic Technologies: The Emerging Vendor Landscape, September 8, 2003.
Design-Time Integration of Data Via Web Services Architecture Pilot
[Figure: architecture diagram whose components are the XML Collaborator Registry, the MetaBase MOF Repository, Disparate Data Sources, and a Web Service, connected by the following steps.]
(1) Import physical source metadata.
(2) Identify and model XML schema types.
(3) Import modeled XML elements and types into XML Registry.
(4) Define XML Schema using registered elements from MetaBase.
(5) Import XML Schema into MOF repository.
(6) Map virtual XML Document to physical sources using Schema in MOF repository.
(7) Create and deploy Web Services for accessing integrated data.
(8) Register WSDL in UDDI Registry.
See Appendix: July 17, 2003.
Suggestions provided to the AIC Governance and Components Subcommittees, August 25, 2003:
1. While ISO 11179 is emphasized because of considerable legacy work and MOF is considered more useful currently, the Semantic Web technologies of RDF and OWL have much more mature and capable data models for semantic integration and interoperability and provide a convergence of the four data communities (document, Web, database, and programming).
• See http://www.w3.org/2003/08/owl-pressrelease
2. Data Independence is step one (Michael Daconta's "Declaration of Data Independence" from the September 8th Conference on Semantic Technologies for eGov):
• (a) Data is more important than applications.
• (b) Data value increases with the number of connections it shares.
• (c) Data about data can expand to as many layers as there are meanings.
• (d) Data modeling harmony is the alignment of syntax, semantics, and pragmatics.
• (e) Data and logic are the yin and yang of information processing.
• (f) Data modeling makes the implicit explicit and the transparent apparent.
• (g) Data standardization is not amenable to competition.
• (h) Data modeling must be decentralized.
• (i) Data relations must not be based on probability or luck.
• (j) Data is truly independent when the next generation need not reinvent it.
Suggestions provided to the AIC Governance and Components Subcommittees, August 25, 2003 (continued):
3. The Intelligence Community Metadata Working Group (IC MWG) as a DRM Governance Model (http://www.xml.saic.com/icml/)
• (a) Established by the IC Chief Information Officer (CIO) Executive Council.
• (b) Promulgates the April 2003 IC Policy requiring IC-wide use of the IC XML standards for metadata and metadata markup. Also identifies and harmonizes enterprise-level metadata and metadata markup standards.
• (c) Developing community-wide standard XML metadata models including security-marking constructs that assist writers with application of Controlled Access Program Coordination Office (CAPCO) marking instructions. Subsequent standards will address digital signatures, encryption, and public key management.
• (d) Developing an IC metadata registry and registry services.
• (e) Currently working on the Terrorist Watchlist Person Data Exchange Standard XML Tags and Schema.
• (f) Work is accomplished in regular Technical Exchange Meetings (TEM) and Team meetings.
4. The Data and Information Reference Model has been and currently is the object of a series of pilot projects with the Communities of Practice. This should continue and is required by H.R. 2458 - the E-Government Act of 2002, SEC. 212. Integrated Reporting Study and Pilot Projects, (d) Pilot Projects To Encourage Integrated Collection And Management Of Data And Interoperability Of Federal Information Systems. Added November 25th.
The Data and Information Reference Model has been (see Appendix) and currently is the object of a series of pilot projects with the Communities of Practice:
Open GIS Consortium (OGC).
• Information Communities and Semantics WG (ICS WG)
– http://www.opengis.org/groups/?iid=50
Sustainability of Intergovernmental Exchange Networks (SIEN) (Global-Justice, Environmental Information-EPA, and Health IT Sharing-Health).
• Government Semantic XML Web Services Community of Practice (SWS-CoP)
– http://web-services.gov/
Intelligence Community Metadata Working Group (IC MWG).
• http://www.xml.saic.com/icml/
Semantic Interoperability Special Interest Group (SI-SIG).
• To be announced.
E-Gov SmartServices
• To join the group, send an email to eGov_SmartServices-[email protected] with empty Subject and Body. You will then receive an email with a web link where you can select the subscription option.
Open International Forum on Business Ontology
• ONTOLOG - collaborative work environment
– http://ontolog.cim3.net/
More to come.
Appendix – History of DRM Support Work:
January 22, 2003, Suggestions for the Federal Enterprise Architecture (FEA) Data and Information Reference Model (updated January 27, 2003), FEA Data and Information Reference Model (DRM).
• http://web-services.gov/FEA-DRM12203.ppt
January 31, 2003, Topic Map Web Services for the FEA-PMO and FEA-DRM - Cognitive Topic Map Web Sites (CTW): Aggregating Information Across Individual Agencies and E-Gov Initiatives, Michel Biezunski, Coolheads Consulting, Proposed Pilot.
• http://web-services.gov/mbegov213003.ppt
February 10 and 20, 2003, Distributed Components, Metadata Models, and Registries: Input to the Governance and Components Subcommittee Meetings and the FEA Data and Information Reference Model (DRM). See "The Distributed Components, Metadata Models, and Registries" ListServ Discussion Summary-Joe Chiusano, Booz Allen Hamilton, March 4, 2003.
• http://web-services.gov/XML%20Web%20Services%20Working%20Group%2022003.ppt
• http://web-services.gov/Distributed%20Components,%20Metadata%20Models,%20and%20Registries%20Thread%20-%20%2003-04-03.ppt
Appendix – History of DRM Support Work (continued):
March 19, 2003, Strengthening the Federal Enterprise Architecture (FEA) Data and Information Reference Model (DRM) and Military Pilot Project (Federated Registries: Concepts and CONOPS) discussed at the XML Registry Pilot Team Meeting.
• http://web-services.gov/FEA-DRM31903.ppt
• http://xml.gov/agenda/rrt20030319.htm
April 4, 2003, Working Group provides Multiple Registries and Repositories to be federated with the XML Working Group's GSA-NIST XML Registry Pilot to support the CIO Council's Architecture and Infrastructure Committee (AIC) and the Data and Information Reference Model (DRM).
• http://web-services.gov/Registries41003.ppt
April 21, 2003, Strengthening the Federal Enterprise Architecture (FEA) Data and Information Reference Model (DRM) for the DRM Offsite May 19, 2003.
• http://web-services.gov/FEA-DRM42103.ppt
Appendix – History of DRM Support Work (continued):
May 16, 2003, Extending the FEA DRM to Support the Joint Government Data & Information Reference Model (GDIRM), the Business Compliance One Stop E-Gov Initiative, ITIPS-II, & Component Technology Activities, input for the DRM Offsite May 19, 2003. Also Geospatial Interoperability Reference Model (GIRM).
• http://web-services.gov/FEA-DRM51903.ppt
• http://web-services.gov/EA%20for%20Geospatial3.ppt
July 17, 2003, Report at AIC Task Leaders Meeting on Governance Subcommittee Goal 3 Task Pilots: A Government Enterprise Component Registry and Repository Using Native XML Database Technology (for presentation on July 22 and 23) and Joint Government Data and Information Reference Model (IAC White Paper) for review, which includes the MetaMatrix-XML Collaborator Pilot Project (see pages 26-27).
• http://web-services.gov/Components%20Repository72203.ppt
• http://web-services.gov/030528_IAC_EA_SIG_Information_and_Data_Reference_Model_Body.pdf
September 8, 2003, “Semantic Technologies for eGov” Conference at the White House Conference Center. Proceedings and DVD recording are available.
• http://www.topquadrant.com/conferences/tq_proceedings.htm
Appendix – History of DRM Support Work (continued):
October 16, 2003, Founding Meeting of the Semantics SIG to Establish a Community of Purpose, Decides to Develop a Charter, Mission Statement, White Paper, and Foster "Best Practices". More information to follow.
October 20, 2003, Emerging Components Quarterly Conference at the White House Conference Center Featuring: Semantic Mapping Tools (Image Matters: userSmarts and Ontology Manipulation Toolkit).
Also to be presented November 19-20, 2003, at the Geography Awareness Week and GIS Day 2003, Mellon Auditorium, Washington, DC, 14th and Constitution Avenue.
• http://www.componenttechnology.org/Emerging/Oct202003Conference/Agenda/
• http://web-services.gov/GISdayBrand111903.doc
• http://web-services.gov/brief-userSmartsOverview-031020.ppt
• http://www.fgdc.gov/gisday2003/
February 4, 2004, E-Gov Web-Enabled Government 2004 Conference, Session 2-4: Understanding Semantic Web Technology, Brand Niemann and Jim Hendler.
• http://www.e-gov.com
Review Comments and Suggestions:
A word of praise: I've had some Unisys folks help to add their ideas and revisions to these paragraphs and several of the Unisys Architects have called out your section as being "right on the money" and "very impressive". Davis Roberts, Unisys.
Just writing to let you know that your slide presentation (and the work you are doing) is great! Thank you. I look forward to meeting you, and collaborating with you in the not-too-distant future.
I just came upon a very good presentation by Brand Niemann and Ken Gill that is part of their contribution to the US Federal Enterprise Architecture ("FEA") Data and Information Reference Model ("DRM") data management strategy. The work of some of our community members: Mike Daconta/Leo Obrst/Kevin Smith, Jack Park/Sam Hunting; and even our [ontolog-forum] community of practice, has been referenced in there too. Let's keep up the good work here ... we definitely look forward to closer collaboration with the eGov/FEA folks in the future.
Peter Yim [email protected], Organization: CIM Engineering, Inc. To: [email protected], Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/ , Shared Files: http://ontolog.cim3.net/file/, Community Wiki: http://ontolog.cim3.net/wiki/
See “Data Models in a National Infrastructure Handling Reporting Obligations: Norwegian Experience and Opportunities,” Version 1.01, August 2003, by Per Myrseth, IBM Norway, 14 pp.
See “An Overview of SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms),” College of American Pathologists, 2003, 32 slides.