Leveraging SET, OWL, CAM and dictionary-based tools to enable automated cross-dictionary domain translations
David Webber, OASIS SET TC / CAM TC (with excerpts from iSURF presentation by Prof. Dr. Asuman Dogac, METU-SRDC, Turkey)
OASIS SET TC Automating Intra-domain Mappings
Agenda
Part I: Introduction – Intra-domain example use cases
  Challenges and opportunities
Part II: Roadmap – CAM templates, OWL, XPath, dictionaries, CCTS
  Using a dictionary-based approach and SET tools for aligning structure components across syntax vocabularies within a domain
Part III: Summary – Next Steps
Part I: Intra-domain Example Use Cases
Information Exchange Interoperability
Many common domains are using multiple vocabularies that have arisen historically over time, e.g. banking, healthcare, supply chain (1)
These may be weakly or strongly aligned, depending on the domain and the fragmentation / marketplaces within it
All domains share common components such as organisation, person, customer, vehicle, address
(1) X12/EDI, UN/CEFACT, UBL, GS1, xCBL, cXML, FIX, SWIFT, HL7, and more
Dictionary alignment task challenges
Each domain can be inspected by comparing the vocabulary dictionaries
Creating dictionaries in a common reference format has previously been a complex and manually intensive process
Even within a domain implementation the vocabulary may be fragmented and inconsistent, because information models evolve over time
Opportunities and Potential
Create an agnostic set of methods and tools that allow alignment within a domain to facilitate consistent information definitions
Leverage the approach to also support semi- or fully automated mapping patterns and templates
Use open standards and open source tools
Provide an open public roadmap for tool vendors
Allow standards groups to publish their exchanges in an open, non-proprietary syntax and rule system
Enable SMBs to build once, exchange to many
Part II: Roadmap – CAM templates, OWL, XPath, Dictionaries and CCTS
CAM templates, OWL and dictionaries
Information components derive their meaning and semantics from the context of their use pattern, not the physical name label, e.g. Customer/Account/Number versus Order/Item/Number
CAM templates and OWL terms share the ability to express use patterns that can be inspected, and equivalence deduced, using software agents that traverse the exchange structure components
Matching is based on rules that can be tailored and that reference dictionaries of known properties
Allows automated generation of domain dictionaries
CAM templates, XPath and dictionaries
CAM toolkit contains dictionary analysis tools that can:
  Create a new dictionary from existing domain exchange transactions
  Merge dictionaries together
  Compare exchange transactions to dictionary definitions and produce a spreadsheet of matches and deltas
  Report XPath location usage patterns of all unique items and exchange transactions
  Assign unique UID values to each component
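A minimal sketch of the dictionary-building idea above, written in Python rather than the XSLT the CAM toolkit uses. The sample XML, the function names and the "U0001" UID scheme are all assumptions for illustration; they are not the CAM toolkit's actual format. It records each unique leaf location, counts its usage, and assigns a UID per location, so that two items both named Number get separate entries because their contexts differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical exchange transaction used only for this illustration.
SAMPLE = """<Order>
  <Customer><Account><Number>A-17</Number></Account></Customer>
  <Item><Number>42</Number></Item>
  <Item><Number>43</Number></Item>
</Order>"""


def collect_paths(xml_text):
    """Count how often each leaf element location occurs in one instance."""
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        children = list(node)
        if not children:
            counts[path] += 1
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def build_dictionary(xml_text, uid_prefix="U"):
    """Assign a UID to every unique location (the UID scheme is an assumption)."""
    counts = collect_paths(xml_text)
    return {
        path: {"uid": f"{uid_prefix}{i:04d}", "occurrences": n}
        for i, (path, n) in enumerate(sorted(counts.items()), start=1)
    }


dictionary = build_dictionary(SAMPLE)
for path, entry in dictionary.items():
    print(entry["uid"], path, entry["occurrences"])
```

Keying entries by full location rather than by element name is what lets Customer/Account/Number and Order/Item/Number stay distinct.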
CAM dictionary generation overview
[Figure: elements shown: XSD schemas; CAM templates; XSLT scripts; Compare & Merge; Master Dictionary; UID; steps 1 to 3. Components: name, description, type, restrictions, relationships, usage occurrences.]
Dictionary Tools
Generate a dictionary of core components from a set of exchange templates
Separate dictionary content by namespace
Merge annotations and type definitions from exchange template into dictionary
Compare each exchange template to the master domain dictionary
Produce spreadsheet workbooks
Update spreadsheet and export back to dictionary core components
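The merge and compare steps listed above can be sketched as follows. This is an illustrative Python outline only: the entry layout, the function names and the rule that existing master entries win on conflict are assumptions, and CSV stands in for the spreadsheet workbook.

```python
import csv
import io


def merge_dictionaries(master, other):
    """Return a new dictionary; entries already in master win on conflict."""
    merged = dict(other)
    merged.update(master)
    return merged


def compare_to_dictionary(template_items, master):
    """Split template items into matches (in master) and deltas (not in master)."""
    matches = sorted(item for item in template_items if item in master)
    deltas = sorted(item for item in template_items if item not in master)
    return matches, deltas


# Hypothetical dictionary content for the illustration.
master = {"Party.Name": {"type": "Text"}, "Party.Identifier": {"type": "Identifier"}}
incoming = {"Party.Name": {"type": "Name"}, "Period.EndDate": {"type": "Date"}}

merged = merge_dictionaries(master, incoming)
matches, deltas = compare_to_dictionary(["Party.Name", "Item.Quantity"], merged)

# Write the cross-reference as CSV so it can be opened as a spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["status", "item"])
writer.writerows([("match", m) for m in matches] + [("delta", d) for d in deltas])
report = buffer.getvalue()
print(report)
```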
Create Dictionary – CAM process
Select dictionary: leave empty to create a new one, or pick an existing one to merge into
Output dictionary filename
Select template content namespace to match with
Merge mode; use true to combine content
Compare to Dictionary
Pick dictionary to compare with
Name of result cross-reference file
Open Cross-Reference as Spreadsheet
CAM template to OWL exporter
Currently the CAM toolkit contains a variety of exporter tools: XSD schema, XML dictionary and XML test case example generation
Opportunity to write an exporter that generates OWL terms directly from CAM template patterns in the dictionary
Using XSLT to accomplish this, so it can be easily adapted, extended and tailored
Allows an OWL-based reasoner to act with CAM
Reasoner can also then update the CAM dictionary to complete the semantic mapping
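To make the exporter idea concrete, here is a rough sketch of generating OWL terms from dictionary entries. The slides propose XSLT for this; Python is used here only for illustration. The entry layout, the example.org namespace and the choice to model each item as an owl:Class that is a subclass of its data type are assumptions, not the CAM or iSURF design.

```python
def local_name(text):
    """Make a dictionary name usable as a Turtle local name (simplistic)."""
    return "".join(ch if ch.isalnum() else "_" for ch in text)


def to_owl_turtle(dictionary, base="http://example.org/dictionary#"):
    """Render entries (UID -> {name, type}) as OWL classes in Turtle syntax."""
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        f"@prefix d: <{base}> .",
        "",
    ]
    for uid, entry in sorted(dictionary.items()):
        # Escape backslashes first, then double quotes, for the string literal.
        label = entry["name"].replace("\\", "\\\\").replace('"', '\\"')
        type_class = local_name(entry["type"])
        lines.append(f"d:{uid} a owl:Class ;")
        lines.append(f'    rdfs:label "{label}" ;')
        lines.append(f"    rdfs:subClassOf d:{type_class} .")
    return "\n".join(lines)


# Hypothetical entry; the data type shown is an assumption for the example.
sample = {"C3402": {"name": "PartyIdentification.Identifier", "type": "Identifier.Type"}}
turtle = to_owl_turtle(sample)
print(turtle)
```

Keeping the UID as the class identifier means a reasoner's conclusions can be written back to the dictionary by UID.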
CAM to OWL generation overview
[Figure: elements shown: Master Dictionary; XSLT script; Extract and Generate; OWL terms instances; Reasoner; UID; "Insert UID couplet pairings"; steps 1 to 4. Components: name, description, type, restrictions, relationships.]
Explicate semantics related with the different usages of document data types
Different document standards use CCTS Data Types differently
For example, "Code.Type" in one standard is represented by "Text.Type" in another standard, and by "Identifier.Type" in yet another
This real-world knowledge is expressed through class equivalences, so that not only humans but also the reasoner knows about it:
  Code.Type ≡ Text.Type
  Name.Type ≡ Text.Type
  Identifier.Type ≡ Text.Type
Can cross-reference via UID as well as type
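The three declared equivalences imply others that are not stated, for example that Code.Type and Identifier.Type are interchangeable. A real OWL reasoner does far more than this, but the transitive part can be illustrated with a small union-find structure. The class and method names below are assumptions made for this sketch.

```python
class EquivalenceClasses:
    """Union-find over type names; tracks which types were declared equivalent."""

    def __init__(self):
        self.parent = {}

    def find(self, term):
        """Return the representative of the group that term belongs to."""
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            # Path halving: point term at its grandparent as we walk up.
            self.parent[term] = self.parent[self.parent[term]]
            term = self.parent[term]
        return term

    def declare_equivalent(self, a, b):
        """Merge the groups containing a and b."""
        self.parent[self.find(a)] = self.find(b)

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)


types = EquivalenceClasses()
types.declare_equivalent("Code.Type", "Text.Type")
types.declare_equivalent("Name.Type", "Text.Type")
types.declare_equivalent("Identifier.Type", "Text.Type")

# Never declared directly, but follows from the three statements above.
print(types.equivalent("Code.Type", "Identifier.Type"))
```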
Dictionaries, UIDs, and CAM templates
Within a dictionary each unique context of an item can be assigned a UID label value
These UID label values can then be inserted as references into a CAM template
Each UID couplet across exchange formats within a domain can be marked as equivalent or similar (aliases)
This allows automated mapping across CAM template definitions
For similar items, CAM supports transform rules in standard XPath syntax
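One possible in-memory shape for a UID couplet is sketched below. The field names, the sample UIDs and the sample XPath rule are hypothetical; the slides do not define a concrete couplet format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Couplet:
    """A pairing of two dictionary UIDs (field names are assumptions)."""
    source_uid: str
    target_uid: str
    relation: str                      # "equivalent" or "similar"
    xpath_rule: Optional[str] = None   # optional transform for similar items

    def __post_init__(self):
        if self.relation not in ("equivalent", "similar"):
            raise ValueError(f"unknown relation: {self.relation!r}")


# Hypothetical couplets; the rule is ordinary XPath 1.0 (keep first 10 chars).
couplets = [
    Couplet("S0001", "T0001", "equivalent"),
    Couplet("S0002", "T0002", "similar", xpath_rule="substring(., 1, 10)"),
]

# Index by source UID so a mapping step can look up the pairing directly.
by_source = {c.source_uid: c for c in couplets}
print(by_source["S0002"])
```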
Dictionary Alignment Step – Human / OWL inspectors
Dictionary alignment report produces a listing of known equivalents (confidence 100%), and then lesser equivalence rankings based on matching factors
Component compound relationships resolved using CAM template structure layouts
Human inspection then reviews, resolves and updates the dictionary (using Excel spreadsheet workbook format)
New dictionary produced
Iterative refinement over time can enhance alignment, along with common practices through industry agreements
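The slides do not say which matching factors feed the ranking. As a purely illustrative sketch, the code below gives known equivalents a confidence of 1.0 and ranks everything else by a weighted mix of name similarity and type agreement. The weights, the cap at 0.99 and the function names are assumptions.

```python
from difflib import SequenceMatcher


def confidence(source, target, known_equivalents):
    """Score one candidate pair; only known equivalents reach 1.0."""
    if (source["name"], target["name"]) in known_equivalents:
        return 1.0
    name_score = SequenceMatcher(
        None, source["name"].lower(), target["name"].lower()
    ).ratio()
    type_score = 1.0 if source["type"] == target["type"] else 0.0
    # Hypothetical weighting; heuristic matches are capped below 100%.
    return min(0.99, 0.7 * name_score + 0.3 * type_score)


def rank_candidates(source, targets, known_equivalents=frozenset()):
    """Return (confidence, target name) pairs, best candidate first."""
    scored = [(confidence(source, t, known_equivalents), t["name"]) for t in targets]
    return sorted(scored, reverse=True)


source = {"name": "TimePeriod.Details", "type": "Details"}
targets = [
    {"name": "Dimension.Measure", "type": "Measure"},
    {"name": "Period.Details", "type": "Details"},
]
for score, name in rank_candidates(source, targets):
    print(f"{score:.2f}  {name}")
```

A ranking like this is only a starting point for the human review step; the reviewer's decisions are what get stored back into the dictionary.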
From Dictionary to Runtime Mapping
Once a dictionary is available with UID couplets for domain crosswalks, proceed to align
Take templates of actual exchanges and label these with UID couplets
Look up UID couplets in the dictionary and update the target template with the UID from the couplet
Take completed templates and use them to drive actual mapping processes
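The lookup-and-update step can be pictured with a small sketch. Real CAM templates are XML documents; here a template is reduced to a mapping from item location to UID, and the dictionary to a mapping from target UID to paired source UID. All names, locations and UID values are hypothetical.

```python
def label_target(target_template, couplets):
    """Attach the paired source UID (or None) to each target template item.

    target_template: item location -> target UID
    couplets:        target UID -> paired source UID
    """
    return {
        location: (uid, couplets.get(uid))
        for location, uid in target_template.items()
    }


def unmapped_items(labelled):
    """List target items that have no couplet and so need human review."""
    return sorted(loc for loc, (_, source_uid) in labelled.items() if source_uid is None)


# Hypothetical target template and dictionary couplets.
target_template = {"/Forecast/Period/EndDate": "T0012", "/Forecast/Note": "T0999"}
couplets = {"T0012": "S0412"}

labelled = label_target(target_template, couplets)
print(labelled)
print(unmapped_items(labelled))
```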
Create UID driven mapping template
[Figure: elements shown: CAM template (source); CAM template (target); Domain Master Dictionary; "Lookup UID couplet"; XSLT script; UIDs; Rules; Updated CAM template (target); steps 1 to 3. Couplets are marked Same or Similar, plus an optional XPath mapping rule.]
Automated UID driven mapping
[Figure: elements shown: CAM template (source); CAM template (target); UIDs; Rules; XSLT script; Input XML instance; Output XML instance; steps 1 to 4.]
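A toy version of UID-driven mapping from an input XML instance to an output XML instance is sketched below. The slides envisage an XSLT engine for this; Python's ElementTree is used here only to show the principle. The element names, UID values and table layouts are assumptions, and transform rules for "similar" items are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical UID labels taken from a source and a target template.
SOURCE_PATHS = {"S1": "Buyer/Name", "S2": "Buyer/Id"}
TARGET_PATHS = {"T1": "Customer/PartyName", "T2": "Customer/ID"}
COUPLETS = {"S1": "T1", "S2": "T2"}  # source UID -> target UID


def ensure_path(root, path):
    """Return the element at path under root, creating missing elements."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def map_instance(source_xml, target_root_tag):
    """Copy each coupled source value into its target location."""
    source = ET.fromstring(source_xml)
    target = ET.Element(target_root_tag)
    for source_uid, target_uid in COUPLETS.items():
        node = source.find(SOURCE_PATHS[source_uid])
        if node is not None:
            ensure_path(target, TARGET_PATHS[target_uid]).text = node.text
    return target


result = map_instance(
    "<Order><Buyer><Name>Acme</Name><Id>77</Id></Buyer></Order>", "Invoice"
)
print(ET.tostring(result, encoding="unicode"))
# prints <Invoice><Customer><PartyName>Acme</PartyName><ID>77</ID></Customer></Invoice>
```

Because the mapping is driven entirely by the UID tables, the same function works for any pair of templates once their couplets are in the dictionary.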
Dictionary approach summary
1. If the document components of two different domain standards share the same semantic properties:
  Use this as an indication that they may be similar
2. Some explicitly defined semantic properties may imply further implicit semantic relationships:
  Use a reasoner to obtain implicit relationships
  Align to dictionary definitions allowing crosswalk
  Create harmonized dictionary lookup
  Use abstract UID as common reference (linkage between language-specific named types/objects)
3. Explicate semantics related with the different usages of document data types in different document schemas to obtain some desired interpretations by means of such informal semantics
  Determine similar/match relationships and rules for constraint alignment and compound component relationships (e.g. date-time versus date and time)
4. Provide dictionary structure format for managing relationships
  Leverage existing OASIS CAM and ebXML Registry TC work
Part III: Summary – Next Steps
Summary
Develop crosswalks:
  Convert XSD schema to CAM templates
  Leverage template structure and XPath rules to build dictionaries with UID labels
  Build OWL relationships from dictionaries
  Compare each dictionary to master dictionary and reference OWL and type knowledge bases to align
  Produce spreadsheet for manual review
  Save final results back to master dictionary
Build runtime templates:
  Compare individual CAM templates to master dictionary, generate cross-walk section between components
  Cross-walk can contain alignment rules in XPath for content handling (e.g. code values and re-formatting)
Tools needed
CAM
  Schema ingesting
  Dictionary builder
OWL
  Reasoner
  CAM dictionary to OWL generator
  Extend CAM dictionary format for couplets / rules
  Extend reasoner to update dictionary couplets
Mapping
  XSLT engine to read input, templates and create output
  (Can use existing XSLT CAM validator as basis)
GS1.XML | UID | UBL 2.0
Forecast.Indicator.Indicator | A1034 | Forecast.BasedOnConsensus_Indicator.Indicator
PartyIdentification.Details | C3401 | PartyIdentification.Details
PartyIdentification.Primary_Identification.GLN_Identifier | C3402 | PartyIdentification.Identifier
NonGLN_PartyIdentification.Details | C3451 | PartyIdentification.Details
NonGLN_PartyIdentification.Identification.Text | C3452 | PartyIdentification.Identifier
ElectronicDocument.Status.Identifier | D4310 | Forecast.DocumentStateCode.Code
Abstract_Forecast.Purpose.ForecastPurposeCriteriaType_Code | E0010 | Forecast.PurposeCode.Code
Multi_unitMeasure.Measure.Measure | F0301 | Dimension.Measure
Abstract_Forecast_TimeStampedTradeItemQuantity.Association.Code | E0451 | Forecast.Identifier.Identifer
Date_TimePeriod.EndDate.Date_DateTime | T0012 | Period.EndDate.Date, Period.EndTime.Time
Date_TimePeriod.BeginDate.Date_DateTime | T0013 | Period.StartDate.Date, Period.StartTime.Time
TimePeriod.Details | T0009 | Period.Details
TimePeriod.Length.Duration_Measure | T0008 | Period.Duration.Measure
TimePeriod.Type.Code | T0021 | Period.DescriptionCode.Code
TradeItemIdentification.Details | F0340 | ItemIdentification.Details
TradeItemIdentification.Primary_Identification.GTIN_Identifier | F0341 | ItemIdentification.Identifier
NonGTIN_TradeItemIdentification.Details | F0342 | ItemIdentification.Details
NonGTIN_TradeItemIdentification.Identification.Type_Code | | ItemIdentification.Extended_Identifier.Identifier
The above equivalences are labelled as couplets through the UID dictionary cross-references and can be stored back into the <Extensions> section of CAM templates for runtime crosswalk use.
Runtime crosswalks between template structure member items
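Rows T0012 and T0013 pair a single GS1 date-time item with separate UBL date and time items, which is the kind of compound relationship that needs a content-handling rule. In CAM such a rule would be written in XPath; the Python sketch below only illustrates the transformation itself, and it assumes ISO 8601 values, which the slides do not specify.

```python
from datetime import datetime


def split_date_time(value):
    """Split one ISO 8601 date-time into separate date and time strings."""
    parsed = datetime.fromisoformat(value)
    return parsed.date().isoformat(), parsed.time().isoformat()


def join_date_time(date_value, time_value):
    """Reverse direction: combine separate date and time into one date-time."""
    return datetime.fromisoformat(f"{date_value}T{time_value}").isoformat()


end_date, end_time = split_date_time("2009-06-30T17:45:00")
print(end_date, end_time)                  # prints 2009-06-30 17:45:00
print(join_date_time(end_date, end_time))  # prints 2009-06-30T17:45:00
```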