An Introduction to Track 4: SOA and Metadata
(Semantics)
Chuck Mosher
Senior Enterprise Architect
cmosher @ metamatrix.com
2nd SOA for E-Government Conference, 30-31 October 2006
2
Agenda
• The drivers for data (& metadata) integration
• Metadata in an SOA
• Data services: using active metadata to drive data integration
• Beyond metadata: dictionaries, vocabularies, domain models, ontologies (semantics)
• Why ontologies?
• Overview of Track 4 Presentations
• Q & A
3
Acknowledgements
• Dave McComb*, Semantic Arts
• Atif Kureishy*, Booz | Allen | Hamilton
• John Salasin*, NIST
• Jeff Pollock, Oracle
• Brand Niemann, EPA
• Andy Evans, Revelytix
* Track 4 Speaker, 2:45-4:15 pm tomorrow
4
One of the three enablers which drives domain-wide visibility: “… is a standard enterprise data architecture — the foundation for effective and rapid data transfer and the fundamental building block to enable a common logistical picture.”
Army Lt. Gen. Claude Christianson
“If you look at all the trends in the IT arena over the past 30 to 40 years, we’ve moved into an environment where we’ve got faster networks, more powerful processors, but it really comes down to the data”
Michael Todd, DOD CIO office
Data Interoperability Lies At The Very Core of DoD Transformation
5
Dr. Linton Wells, as quoted in September’s NDIA Magazine, “…data compatibility may be an issue. Enabling digital interaction with nontraditional partners may require middleware or other programs that convert data from totally different formats …”
6
Problem Scope
• Incompatible data meanings are the largest, most expensive, and most time-consuming portion of IT visibility and IT interoperability projects:
– Gartner… Forrester… NIST…
– IDC… CIO Magazine…
• The classic “n-squared” problem of interfaces is even more severe at the data layer:
– Data-to-data interfaces outnumber “pipes”
– Tightly-coupled is brittle, and requires code
• Information growth is accelerating – FAST!
– 2002-2005 – more new data than all of history
– 5 exabytes of new digital data created in 2002 – enough for 0.5 million new Libraries of Congress
Jeff Pollock – 2004 White House Conference on Semantic Technology
8
Why Does SOA Need Metadata?
• An architectural style enabling loose-coupling
• Cornerstone of E-Government reengineering
• Web Services and their related standards (SOAP, WSDL, UDDI) provide an implementation framework for several key features of SOA
• BUT: Web Service technologies do not provide all the requirements for Dynamic USE of Discoverable Services
• Discovery – Yes – UDDI/ebXML
• Use – No – requires service consumers and providers to agree on a pre-defined standard interface for the service
9
SOA is Easy, It’s Metadata That’s Hard
• SOA focuses on the interoperability between application interfaces & protocols
• Data (and service) meaning, integrity, and transformation have to be addressed elsewhere
• This information is found in the metadata
• SOA makes getting control over the metadata critical to success
– Or you will end up with SOA silos!
10
Metadata Is Everywhere
• Integration
– Syntactic
– Semantic
– Application
– Process
• Accessibility
• Visibility
• Discoverability
• Management
– Governance
– Auditing
– Lineage
– Quality
– Compliance
– Change Mgmnt
– Impact Analysis
– Performance
Many of the problems & issues around SOA implementations & governance boil down to getting a solid handle on all of the types & forms of metadata involved
11
What Are Semantic Conflicts?
• Data Type: Different primitive or abstract types for same information
• Labeling: Synonyms/antonyms have different text labels
• Aggregation, Structure, Cardinality: Different conceptions about the relationships among concepts in similar data sets. Collections or constraints have been modeled differently for same information
• Generalization: Different abstractions are used to model same domain
• Value Representation: Different choices are made about what concepts are made explicit
• Impedance Mismatch: Fundamentally different data representations are used
• Naming: Synonyms/antonyms exist in same/similar concept instance values
• Scaling and Unit: Different units of measures with incompatible scales
• Confounding: Similar concepts with different definitions
• Domain: Fundamental incompatibilities in underlying domains
• Integrity: Disparity among the integrity constraints
Jeff Pollock – 2004 White House Conference on Semantic Technology
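Three of the conflict types listed above (Data Type, Labeling, and Scaling and Unit) can be illustrated with a minimal Python sketch. The record layouts, the field names `weight_lb` and `mass_kg`, and the helper `to_common` are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
# Hypothetical records from two systems describing the same shipment.
# System A labels the field "weight_lb" (pounds, stored as text);
# system B labels it "mass_kg" (kilograms, stored as a float).
record_a = {"item": "pallet-1", "weight_lb": "220.0"}
record_b = {"item": "pallet-1", "mass_kg": 99.79}

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms


def to_common(record):
    """Mediate either layout into one common form: mass in kilograms."""
    if "weight_lb" in record:
        # Data Type conflict (text vs. number) plus Scaling and Unit conflict.
        kg = float(record["weight_lb"]) * LB_TO_KG
    else:
        kg = float(record["mass_kg"])
    # Labeling conflict: both source labels map to one agreed label.
    return {"item": record["item"], "mass_kg": round(kg, 2)}


common_a = to_common(record_a)
common_b = to_common(record_b)
print(common_a)
print(common_b)
```

Once both records are in the common form they can be compared directly, which is the point of resolving the conflicts in one place rather than in every consumer.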
12
Metadata Management Maturity
• Level 1: Inventory of information assets
– Necessary 1st step – what data do we have
– Typically stored in repositories, registries, spreadsheets, implicit in data itself (relational DBs)
• Level 2: Impact analysis
– Develop domain vocabularies and data models
– Discover or create relationships between system artifacts
• Level 3: Metadata-driven integration
– Design-time metadata repository + run-time integration
– Example of Model-Driven Architecture
• Level 4: Semantic Web
– Dynamic, machine-based inferencing at the concept level
13
Data Evolution Timeline
[Figure: timeline of five ages of data]
• 1945-1970: Age of Programs (program-data)
• 1970-1994: Age of Proprietary Data (text, office docs, databases; proprietary schema)
• 1994-2000: Age of Open Data (HTML, XML; open schema)
• 2000-2003: Age of Open Metadata (namespaces, taxonomies, RDF)
• 2003- : Age of Semantic Models (ontologies & inference)
• Milestones along the timeline: GIGO / minis / micros, www / Netscape, Web services, OWL
• Programming styles along the timeline: procedural, object-oriented, model-driven
• Attitudes along the timeline: “Data is less important than code”, “Data is as important as code”, “Data is more important than code”
Michael Daconta, Creating Relevance and Reuse with Targeted Semantics, XML 2004 Conference Keynote, November 16, 2004.
15
Information Challenges
Program Challenges
• Multiple sources
• Different interfaces/drivers
• Different physical structures
• Different semantics
• Single interface to data desired
• Real-time access to data
• Performance
• Maintainability as data changes
• Maintainability as apps change
Mission Challenges
• Time-to-deploy
• Agility - Responsiveness to change
• Automation – Reduce cost of new development and operations
• ROI of enterprise information
Agency Challenges
• 100’s/1000’s of data sources
• 100’s/1000’s of applications
• Multiple access points/modes for apps
• Understanding relationships/semantics
• Data consistency
• Data reuse – bridging data silos
• Support for Web Services & SQL
• Control & manageability, compliance
• Security & auditing
[Diagram: Information Resources and Communities of Interest, with a “?” between them]
16
Information Virtualization
Information Resources
Communities of Interest
Information Virtualization Layer
17
Information Virtualization
Information Virtualization Layer
• Unified Semantic Layer: unification of different concepts across systems
• Data Federation Layer: single-query access to heterogeneous systems
• Data Access/Connectivity Layer: uniform, standardized access to any system
Enterprise Data Sources
18
Metadata-Based Data Service
[Diagram: Master Data (SQL), Operational Data Store (SQL), and Agency Application (API Call) sit behind a Data Service that is reached through XML/SOAP or SQL. Caption: Bridge the Gap]
• Decouple data sources from application
– Data implementation shielded from application
• Semantic/Format Mediation
– Standard vocabulary
• Single access point
– Web Service/XML
– SQL
• Federation
– Single source or multi-source
• Scalability
– Security, performance
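The decoupling, single access point, and federation ideas on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory sources, the vocabulary terms `Organization_ID` and `Name`, the adapter functions, and the `OrganizationService` class are all hypothetical and do not represent any specific product.

```python
# Hypothetical in-memory stand-ins for two physical sources.
MASTER_DATA = [("ORG-1", "Supply Depot")]                       # tuples, as a SQL query might return
OPERATIONAL_STORE = [{"org": "ORG-2", "title": "Motor Pool"}]   # dicts, as an API call might return


def read_master():
    """Adapter: expose master data rows in the standard vocabulary."""
    return [{"Organization_ID": oid, "Name": name} for oid, name in MASTER_DATA]


def read_operational():
    """Adapter: expose operational store rows in the standard vocabulary."""
    return [{"Organization_ID": r["org"], "Name": r["title"]}
            for r in OPERATIONAL_STORE]


class OrganizationService:
    """Single access point; callers never see the source structures."""

    def __init__(self, adapters):
        self._adapters = adapters

    def find(self, organization_id=None):
        # Federation: gather rows from every source, then apply the filter.
        rows = [row for adapter in self._adapters for row in adapter()]
        if organization_id is None:
            return rows
        return [r for r in rows if r["Organization_ID"] == organization_id]


service = OrganizationService([read_master, read_operational])
everything = service.find()
one = service.find("ORG-2")
print(everything)
print(one)
```

If a source changes its physical layout, only its adapter changes; the application keeps calling `find` with the same vocabulary.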
19
FEA DRM View on Data Services
DRM Version 2 Data Access Services
• Context Awareness Services
• Structural Awareness Services
• Transactional Services
• Data Query Services
• Content Search and Discovery Services
• Retrieval Services
• Subscription Services
• Notification Services
Service Types include:
• Metadata / Data
• Structured / Unstructured
• Read / Write
• Push / Pull
20
Designing data services
Modeling Information Services for SOA
[Diagram: Enterprise Information Sources (EIS), including XML, databases, warehouses, spreadsheets, services, geo-spatial, and rich media, are modeled as Reusable, Integrated Data Objects (for example Logistics and Intelligence) and published as Exposed Data Services, each with a WSDL contract. Access paths shown are ODBC, JDBC, and SOAP. Information Consumers shown are Custom Apps; Web Services, Business Processes; Packaged Apps; Reporting, Analytics; EAI, Data warehouses.]
21
Data Service Abstraction Layers
• Transformations from one or more sources
• Transformations defined with:
– Joins/unions
– Criteria
– Functions
• Elements mapped to dictionary
• Business definitions captured
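A transformation defined with a join, a criterion, and a function might look like the following Python sketch. The two source tables, their columns, and the output names (`Location_ID`, `Location_Name`, `Item`, `Quantity`) are assumptions made up for this example.

```python
# Hypothetical source tables, represented as lists of dicts.
sites = [
    {"SITENUM": 1, "name": "north yard"},
    {"SITENUM": 2, "name": "south yard"},
]
inventory = [
    {"SITENUM": 1, "item": "fuel", "qty": 40},
    {"SITENUM": 2, "item": "fuel", "qty": 0},
    {"SITENUM": 1, "item": "tires", "qty": 12},
]


def stocked_items_view():
    """A virtual view computed on request: join + criteria + function."""
    names = {s["SITENUM"]: s["name"] for s in sites}       # lookup for the join
    view = []
    for row in inventory:
        # Criteria: keep stocked items only; join: site must exist.
        if row["qty"] > 0 and row["SITENUM"] in names:
            view.append({
                "Location_ID": row["SITENUM"],                    # element mapped to a dictionary term
                "Location_Name": names[row["SITENUM"]].title(),   # function applied to a source value
                "Item": row["item"],
                "Quantity": row["qty"],
            })
    return view


view = stocked_items_view()
for r in view:
    print(r)
```

The view stores no data of its own; it is a definition over the sources, which is what lets one layer of abstraction be built on another.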
22
Data Service Layer in SOA
[Diagram: layered stack, top to bottom: Client Process & Applications (Apps); Business Process Services; Business Services; Message Services (ESB); Data Services Layer (individual Data Services); Data Sources]
23
Data Services Approaches
Model-Driven Integration Layer
[Diagram: elements shown are Data/Content Sources, transformations (T), a Logical Data Model (Organization, Person/Customer, Image/Imagery, Location), a Materialized Logical Model, Agile Information Services, and an Enriched Data/Content Store]
Data Services for Multiple Purposes:
• Simplified access to value-added (tagged) data in real-time
• Value-added (tagged) data materialized & staged
• Phased-in migration from legacy to new
• Managed archiving via classification, retention tags
• Enhanced search via consistent content tags
24
Leveraging COI Data Dictionaries
• Authoritative Sources: mapped to logical
• Multiple Internal/External Information Sources
• Logical Data Model: agency or COI-specific; rationalize, harmonize, mediate
• Application views of information: Relational, XML
[Diagram: information sources pass through transformations (T) into the logical data model and on to application views (for example an XML document). Access paths shown are ODBC/JDBC, JDBC, and SOAP; consumers shown are Web Services, Search Applications, and Business Intelligence Applications; COIs shown are C2, Logistics, Intelligence, …]
Field names shown: bldg_id, SITENUM, Facility_ID (with Location_ID); bldg_type, Depot_Number (with Location_Type)
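A data dictionary of this kind can be sketched as a lookup from source field names to shared vocabulary terms. The sketch below assumes, from the grouping of names on the slide, that `bldg_id`, `SITENUM`, and `Facility_ID` correspond to `Location_ID`, and that `bldg_type` and `Depot_Number` correspond to `Location_Type`; the business definitions and the sample row are hypothetical.

```python
# Hypothetical COI data dictionary: each vocabulary term carries a business
# definition and the source field names assumed to mean the same thing.
DATA_DICTIONARY = {
    "Location_ID": {
        "definition": "Unique identifier of a physical location",
        "aliases": ["bldg_id", "SITENUM", "Facility_ID"],
    },
    "Location_Type": {
        "definition": "Category of a physical location",
        "aliases": ["bldg_type", "Depot_Number"],
    },
}

# Reverse index: source field name -> vocabulary term.
ALIAS_TO_TERM = {
    alias: term
    for term, entry in DATA_DICTIONARY.items()
    for alias in entry["aliases"]
}


def to_vocabulary(row):
    """Rename known source fields to vocabulary terms; keep unknown fields."""
    return {ALIAS_TO_TERM.get(field, field): value for field, value in row.items()}


harmonized = to_vocabulary({"SITENUM": 42, "Depot_Number": "D-9", "notes": "n/a"})
print(harmonized)
```

Keeping the mapping as data rather than code means a new source is onboarded by adding aliases, not by writing another translator.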
26
Beyond Mere Metadata
• Vocabularies/lexicons, Domain Models, Taxonomies, Ontologies
• All are means of beginning to define the context and scope of the domain of interest
• All specify artifacts in some way
• “Semantics” often means that the relationships between artifacts are also specified
27
Semantics = Meaning = Relationships
• Humans (and therefore our machines) only ever understand anything in so far as it is related to other things
ID
28
Semantics = Meaning = Relationships
• Humans (and therefore our machines) only ever understand anything in so far as it is related to other things
ID
VA
NY
MD
29
Semantics = Meaning = Relationships
• Humans (and therefore our machines) only ever understand anything in so far as it is related to other things
ID
SUPEREGO
EGO
ANALYSIS
30
Semantics = Meaning = Relationships
• Humans (and therefore our machines) only ever understand anything in so far as it is related to other things
ID
LICENSE
CARD
BADGE
31
Data Dictionary -> Vocabulary
• The data alone does not have sufficient context
• Using metadata is not enough - you must be able to leverage domain concepts and terminologies
• Example problem – potentially similar data elements, but dissimilar constructs/datatypes/descriptions
– How do we relate common constructs with uncommon datatypes?
– Solution requires that vocabulary relate those constructs across models with transformation relationships, logic
• Define business use/semantics of similar information
– Datatypes describe a set of values
– Defines the technical constraints on values
– Enables integrating information, as datatypes can be referenced by any models (relational, XML, object, …)
32
Benefits of Building a Vocabulary
• Develop reusable information models and schemas
• Capture business and technology requirements in a single vocabulary
• Capture institutional knowledge
• Enables semantic mining techniques for deeper data discovery and information sharing
• Accelerate interoperability, web services and SOA development and deployment
• Establish and maintain a common relationship across data sources
• Establish and maintain compliance with industry exchange models
• Reduce IT expenses by leveraging data in its native source
• Reduce IT expenses associated with building and maintaining partner integration
• Improved information sharing directly enhances decision making
33
Example Vocabulary Development Process
[Diagram: process boxes labeled Determine Pilot Demonstration, Develop UML Use-Case, Class Relationship Diagram, Auto Generate XSD - XML, and Vocabulary Handbook]
UNCLASSIFIED
MDA DS COI Pilot - John Shea PEO C4I, PMW180 ISR/IO NMCI
35
“Ideal” Semantics
• Formal definition of meaning
– Unambiguous
– Machine process-able
– Decidable
• Automated classification
– Membership based on properties
• Inference
– Can increase what you know based on classification
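Classification by properties, and the extra knowledge it yields, can be sketched in plain Python. The class names, property names, and the `classify` helper below are hypothetical; a real system would use an ontology language and a reasoner rather than set comparisons.

```python
# Hypothetical class definitions: a thing belongs to a class when it has
# all of that class's defining properties.
CLASS_DEFINITIONS = {
    "Vehicle": {"has_wheels"},
    "Truck": {"has_wheels", "carries_cargo"},
    "FuelTruck": {"has_wheels", "carries_cargo", "carries_fuel"},
}


def classify(properties):
    """Return, in alphabetical order, every class whose defining properties are all present."""
    return sorted(
        name for name, required in CLASS_DEFINITIONS.items()
        if required <= properties          # subset test
    )


classes = classify({"has_wheels", "carries_cargo", "carries_fuel", "painted_green"})
print(classes)
```

Nothing in the input said "truck" or "vehicle"; those memberships were derived from the properties alone, which is the sense in which classification increases what is known.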
36
Ontologies
• Ontology is an explicit formal specification of the terms in a domain and the relationships between them
– Others are special cases
– Formal conceptual model
– W3C standard (OWL/RDF) implementation
• Concepts, definitions, properties, relationships
• Machines can draw inferences from the properties and relationships captured in the model
37
Ontologies
• Ontologies bring rigorous definitions of meaning to (meta)data
• More abstraction from lower levels of detail
• Key to loose-coupling
• With OWL/RDF, part of the W3C Semantic Web vision
38
W3C Semantic Web Stack
39
RDF
• Resource Description Framework
• A mechanism to make assertions about things
• In the form of a triple:
subject -> predicate -> object
Resource (URI) -> Property (URI) -> Resource (URI or literal)
• URIs establish a unique namespace; they do not have to be addressable
40
RDF Examples
Airport123 -> name -> “ORD”
Airport123 -> locatedIn -> “Chicago, IL”
[The diagram also links Airport123 and Business345 with closestTo]
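The triples in this example can be represented and queried with nothing more than Python tuples. In this minimal sketch the `http://example.org/` namespace is a placeholder, the direction of `closestTo` (from Business345 to Airport123) is an assumption, and `match` is a hypothetical helper rather than part of any RDF library.

```python
EX = "http://example.org/"   # hypothetical namespace; it need not be addressable

# Each assertion is a (subject, predicate, object) triple.
triples = [
    (EX + "Airport123", EX + "name", "ORD"),
    (EX + "Airport123", EX + "locatedIn", "Chicago, IL"),
    # The direction of closestTo is assumed for illustration.
    (EX + "Business345", EX + "closestTo", EX + "Airport123"),
]


def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


about_airport = match(subject=EX + "Airport123")
near_airport = match(predicate=EX + "closestTo", obj=EX + "Airport123")
print(about_airport)
print(near_airport)
```

Because every statement has the same three-part shape, new kinds of facts can be added without changing a schema, and the same pattern query works over all of them.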
41
OWL
• OWL extends RDF by allowing us to create and make assertions about classes of things
Feline -> is a -> Mammal
Mammal -> has -> Hair
Feline -> has -> Retractable Claws
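Class-level assertions allow facts about an individual to be derived from facts about its classes. The sketch below mimics that style of reasoning in plain Python; it is not OWL itself. It assumes the reading of the diagram in which Feline is a Mammal, Mammal has Hair, and Feline has Retractable Claws, and the individual `Tom` is hypothetical.

```python
# Class-level assertions, held as plain Python data.
SUBCLASS_OF = {"Feline": "Mammal"}
CLASS_HAS = {"Mammal": {"Hair"}, "Feline": {"Retractable Claws"}}

# A hypothetical individual, not taken from the slide.
INSTANCE_OF = {"Tom": "Feline"}


def ancestors(cls):
    """Yield cls and every superclass reachable through SUBCLASS_OF."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def inferred_features(individual):
    """Collect features asserted on the individual's class and its superclasses."""
    features = set()
    for cls in ancestors(INSTANCE_OF[individual]):
        features |= CLASS_HAS.get(cls, set())
    return features


tom = inferred_features("Tom")
print(sorted(tom))
```

The only fact stated about `Tom` is that he is a Feline; that he has hair follows from the class hierarchy.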
42
Semantic Mapping Challenge
• Authoritative Sources: mapped to logical
• Multiple Internal/External Information Sources
• Logical Data Model: agency or COI-specific; rationalize, harmonize, mediate
• Application views of information: Relational, XML
[Diagram: information sources pass through transformations (T) into the logical data model and on to application views (for example an XML document). Access paths shown are ODBC/JDBC, JDBC, and SOAP; consumers shown are Web Services, Search Applications, and Business Intelligence Applications; COIs shown are C2, Logistics, Intelligence, …]
Field names shown: bldg_id, SITENUM, Facility_ID (with Location_ID); bldg_type, Depot_Number (with Location_Type)
43
Contextualize (Interpret)
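• Automated term tokenization
• Automated semantic linking using the default knowledge-base contained within MatchIT
[Diagram: the term ArticleAmount is tokenized into Article and Amount; the terms Sum, Assets, and Creation are attached through Synonym and Type-of relationships]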
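Term tokenization of the kind described here can be approximated with a regular expression. The following Python sketch is a generic illustration and makes no claim about how MatchIT tokenizes terms; the `tokenize` function and its splitting rules are assumptions.

```python
import re


def tokenize(term):
    """Split an identifier on underscores, hyphens, spaces, and camel-case boundaries."""
    tokens = []
    for part in re.split(r"[_\-\s]+", term):
        # Match, in order: runs of capitals (acronyms), capitalized words,
        # lowercase runs, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


print(tokenize("ArticleAmount"))
print(tokenize("Facility_ID"))
print(tokenize("bldg_type"))
```

Lowercasing the tokens means that differently styled names, such as camel case in one schema and underscores in another, produce comparable tokens for the linking step.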
44
Semantic Matching (Mediate)
• With relationships pre-established within the knowledge-base…
• Identify the Target and the Source(s) and run the match.
ArticleAmount
ProductShares
Automatically linked by a specific % distance
45
Facilitate Decision Making (Mediate)
Helps facilitate rapid decision making
Target element for matching
Automatically calculated semantic distance between terms
Source candidate for matching
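One simple way to compute a semantic distance between terms is to count the links on the shortest path through a knowledge base of related tokens. The sketch below is a generic illustration under that assumption and is not the algorithm used by the tool shown on the slides; the `LINKS` table, `hops`, and `term_distance` are all hypothetical.

```python
from collections import deque

# Hypothetical knowledge base: undirected links between lowercase tokens.
LINKS = [
    ("article", "product", "synonym"),
    ("amount", "sum", "synonym"),
    ("sum", "assets", "type-of"),
    ("shares", "assets", "type-of"),
]

GRAPH = {}
for a, b, _kind in LINKS:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def hops(start, goal):
    """Breadth-first search: number of links on the shortest path, or None."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in GRAPH.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def term_distance(source_tokens, target_tokens):
    """Average, over source tokens, of the hops to the nearest target token."""
    best = []
    for s in source_tokens:
        distances = [hops(s, t) for t in target_tokens]
        reachable = [h for h in distances if h is not None]
        if not reachable:
            return None            # a source token cannot be linked at all
        best.append(min(reachable))
    return sum(best) / len(best)


d = term_distance(["article", "amount"], ["product", "shares"])
print(d)
```

A smaller value means a closer candidate, so a reviewer can be shown the source candidates ranked by distance and confirm or reject each suggested match.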
46
Integration Driven By Semantics
Model & Relate information within any domain
Relate information in different domains/models
Search within and across domains for related information
[Diagram: layers labeled Enterprise Model (UML), Data Models (Relational, XML), Physical Sources, and Ontology Models (e.g. OWL, RDF)]
47
Ontology-Driven Integration Example
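[Diagram: three columns labeled Ontology, Logical Views, and Physical Sources. The ontology is a Transportation hierarchy with the nodes Land, 4 Wheel, 2 Wheel, Truck, Bus, Car, Fuel Truck, and Cargo Truck. Equivalence links and transformations (T) connect the columns.]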
49
Track 4 Talks Tomorrow: 2:45-4:15pm
• Predictive Metrics To Guide SOA-Based System Development
– John Salasin, NIST
• Integrating SOA and Ontologies for Information Sharing
– Atif Kureishy, BAH
• SOA & Semantics
– Dave McComb, Semantic Arts
50
Predictive Metrics To Guide SOA Development
John Salasin, NIST
• Will propose a set of metrics (vocabulary) to characterize SOA-based systems
• These metrics can be assessed at different points in the development lifecycle
– Early stage (concept development)
– Architecture/Construction (system characteristics)
– Operations (robustness, performance, usage, governance)
– Evolution (extensibility, change management)
• Analysis can lead to ongoing refinement at every stage
• Quantitative, incremental Verification & Validation
51
Integrating SOA and Ontologies for Information Sharing
Atif Kureishy, BAH
• Will discuss approaches for dynamic use of discoverable services
• Leverage semantic understanding/ definition of application domain
• Ontology-driven application case study
52
SOA & Semantics – Dave McComb
Dave McComb, Semantic Arts
• How firms are using semantic web standards & technology to assist their SOA efforts
• Semantics for service discovery
• Enterprise message modeling
• Dynamic classification of messages