Leveraging SET, OWL, CAM and Dictionary based tools to enabled automated cross-dictionary domain translations David Webber OASIS SET TC / CAM TC (with excerpts from iSURF presentation by Prof. Dr. Asuman Dogac, METU-SRDC, Turkey) OASIS SET TC Automating Intra-domain Mappings

Post on 13-Dec-2015

Leveraging SET, OWL, CAM and Dictionary based tools to enabled automated cross-dictionary domain translations

David Webber

OASIS SET TC / CAM TC

(with excerpts from iSURF presentation by Prof. Dr. Asuman Dogac, METU-SRDC, Turkey)

OASIS SET TC Automating Intra-domain Mappings

Agenda

Part I: Introduction – Intra-domain example use cases

Challenges and opportunities

Part II: Roadmap – CAM templates, OWL, XPath, dictionaries, CCTS

Using the dictionary-based approach and SET tools for aligning structure components across syntax vocabularies within a domain

Part III: Summary – Next Steps

Part I: Intra-domain Example Use Cases

Information Exchange Interoperability

Many common domains use multiple vocabularies that have arisen historically over time, e.g. banking, healthcare, supply chain [1].

These may be weakly or strongly aligned, depending on the domain and on the fragmentation / marketplaces within it.

All domains share common components such as organisation, person, customer, vehicle, address.

[1] X12/EDI, UN/CEFACT, UBL, GS1, xCBL, cXML, FIX, SWIFT, HL7, more…

Dictionary alignment task challenges

Each domain can be inspected by comparing the vocabulary dictionaries.

Creating dictionaries in a common reference format has previously been a complex and manually intensive process.

Even within a domain implementation the vocabulary may be fragmented and inconsistent, because information models evolve over time.

Opportunities and Potential

Create an agnostic set of methods and tools that allow alignment within a domain, to facilitate consistent information definitions.

Leverage the approach to also support semi- or fully automated mapping patterns and templates.

Use open standards and open source tools.

Provide an open public roadmap for tool vendors.

Allow standards groups to publish their exchanges in an open, non-proprietary syntax and rule system.

Enable SMBs to build once, exchange to many.

Part II: Roadmap – CAM templates, OWL, XPath, Dictionaries and CCTS

CAM templates, OWL and dictionaries

Information components derive their meaning and semantics from the context of their use pattern, not from the physical name label, e.g. Customer/Account/Number versus Order/Item/Number.

CAM templates and OWL terms share the ability to express use patterns that can be inspected, and equivalence deduced, by software agents that traverse the exchange structure components.

Matching is based on rules that can be tailored and that reference dictionaries of known properties.

This allows automated generation of domain dictionaries.
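To make the point about context concrete, the following is a minimal Python sketch, not part of the CAM toolkit, that walks a hypothetical exchange fragment and lists the context path of each leaf item. The element names come from the example above; the wrapping `Exchange` root and the function name `leaf_context_paths` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def leaf_context_paths(xml_text):
    """Return the slash-separated context path of every leaf element."""
    paths = []

    def walk(node, parent_path):
        path = f"{parent_path}/{node.tag}" if parent_path else node.tag
        children = list(node)
        if not children:
            paths.append(path)
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


# Hypothetical fragment: two items share the label "Number"
# but sit in different contexts.
SAMPLE = (
    "<Exchange>"
    "<Customer><Account><Number>12345</Number></Account></Customer>"
    "<Order><Item><Number>7</Number></Item></Order>"
    "</Exchange>"
)

for path in leaf_context_paths(SAMPLE):
    print(path)
```

Both leaves are named `Number`, yet the two paths differ, so a dictionary keyed on the path rather than the label keeps them apart.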

CAM templates, XPath and dictionaries

The CAM toolkit contains dictionary analysis tools that can:

Create a new dictionary from existing domain exchange transactions

Merge dictionaries together

Compare exchange transactions to dictionary definitions and produce a spreadsheet of matches and deltas

Report XPath location usage patterns of all unique items and exchange transactions

Assign UID values to each component
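As a rough illustration of the first and last capabilities in this list, the sketch below builds a small dictionary from sample transactions, counts how often each location is used, and assigns a UID per unique location. It is written in Python rather than the toolkit's XSLT, and the UID format (`U0001`), the entry fields and the function name are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def build_dictionary(xml_documents, uid_prefix="U"):
    """Collect every unique element path, its usage count and a UID."""
    usage = Counter()
    for text in xml_documents:
        root = ET.fromstring(text)
        stack = [(root, "/" + root.tag)]
        while stack:
            node, path = stack.pop()
            usage[path] += 1
            for child in node:
                stack.append((child, f"{path}/{child.tag}"))

    dictionary = {}
    # Sorting makes the UID assignment repeatable for the same input.
    for number, path in enumerate(sorted(usage), start=1):
        dictionary[path] = {
            "uid": f"{uid_prefix}{number:04d}",
            "name": path.rsplit("/", 1)[-1],
            "occurrences": usage[path],
        }
    return dictionary


# Hypothetical sample transactions.
ORDERS = [
    "<Order><Item><Number>1</Number></Item><Item><Number>2</Number></Item></Order>",
    "<Order><Item><Number>3</Number></Item></Order>",
]

for path, entry in build_dictionary(ORDERS).items():
    print(entry["uid"], path, entry["occurrences"])
```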

CAM dictionary generation overview

[Diagram. Elements: XSD schemas; CAM Templates; XSLT scripts; Compare & Merge; Master Dictionary; UID. Components recorded: Name, Description, Type, Restrictions, Relationships, Usage occurrences. Steps numbered 1 to 3.]

Dictionary Tools

Generate a dictionary of core components from a set of exchange templates

Separate dictionary content by namespace

Merge annotations and type definitions from the exchange template into the dictionary

Compare each exchange template to the master domain dictionary

Produce spreadsheet workbooks

Update the spreadsheet and export it back to the dictionary core components

Create Dictionary – CAM process

Select dictionary: leave empty to create a new one, or choose an existing one to merge into

Output dictionary filename

Select the template content namespace to match with

Merge mode: use true to combine content
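The options on this screen suggest behaviour along the lines of the sketch below. It is an interpretation in Python, not the CAM toolkit's implementation: the function name, the entry fields and the UID format are assumptions, and merge mode is taken to mean that entries already in the dictionary are kept.

```python
def create_dictionary(template_items, existing=None, merge=True):
    """Build a dictionary from template items (path -> description).

    With merge=True and an existing dictionary, entries already on file
    are kept and only unseen paths are added. Otherwise a new dictionary
    is started from the template alone.
    """
    if existing and merge:
        dictionary = {path: dict(entry) for path, entry in existing.items()}
    else:
        dictionary = {}
    used_uids = {entry["uid"] for entry in dictionary.values()}

    counter = 1
    for path, description in template_items.items():
        if path in dictionary:
            continue  # keep the definition that is already on file
        while f"U{counter:04d}" in used_uids:
            counter += 1
        uid = f"U{counter:04d}"
        used_uids.add(uid)
        dictionary[path] = {"uid": uid, "description": description}
    return dictionary


# Hypothetical inputs.
EXISTING = {"/Order": {"uid": "U0001", "description": "Order root"}}
ITEMS = {"/Order": "Purchase order", "/Order/Item": "Line item"}

merged = create_dictionary(ITEMS, EXISTING, merge=True)
fresh = create_dictionary(ITEMS, EXISTING, merge=False)
print(merged)
print(fresh)
```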

Compare to Dictionary

Pick dictionary to compare with

Name of result cross-reference file
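A comparison of this kind can be pictured as a set difference between the template's items and the dictionary's entries. The Python sketch below is illustrative only: the status labels, column names and CSV output are assumptions, chosen because CSV opens directly in a spreadsheet.

```python
import csv
import io


def compare_to_dictionary(template_paths, dictionary):
    """Return one cross-reference row per path: match or delta."""
    template_paths = set(template_paths)
    rows = []
    for path in sorted(template_paths | set(dictionary)):
        in_template = path in template_paths
        in_dictionary = path in dictionary
        if in_template and in_dictionary:
            status = "match"
        elif in_template:
            status = "template only"
        else:
            status = "dictionary only"
        rows.append({
            "path": path,
            "uid": dictionary.get(path, {}).get("uid", ""),
            "status": status,
        })
    return rows


def to_csv(rows):
    """Serialise the cross-reference rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["path", "uid", "status"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical template and dictionary.
TEMPLATE_PATHS = ["/Order", "/Order/Item", "/Order/Note"]
DICTIONARY = {
    "/Order": {"uid": "U0001"},
    "/Order/Item": {"uid": "U0002"},
    "/Order/Total": {"uid": "U0003"},
}

print(to_csv(compare_to_dictionary(TEMPLATE_PATHS, DICTIONARY)))
```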

Open Cross-Reference as Spreadsheet

CAM template to OWL exporter

The CAM toolkit currently contains a variety of exporter tools for XSD schema, XML dictionary and XML test case example generation.

There is an opportunity to write an exporter that generates OWL terms directly from the CAM template patterns in a dictionary.

Using XSLT to accomplish this means it can be easily adapted, extended and tailored.

This allows an OWL-based reasoner to act with CAM.

The reasoner can then also update the CAM dictionary to complete the semantic mapping.
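The slides propose XSLT for this exporter. Purely to show the shape of the output, the Python sketch below turns dictionary entries into OWL class declarations in Turtle syntax. Modelling each entry as an `owl:Class`, the `http://example.org/dictionary#` base IRI, the entry fields and the sample description are all assumptions; the UID and name are taken from the crosswalk table later in the deck.

```python
OWL_PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping the basics."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"'


def dictionary_to_turtle(entries, base_iri="http://example.org/dictionary#"):
    """Emit one owl:Class per dictionary entry, identified by its UID."""
    blocks = [OWL_PREFIXES]
    for entry in entries:
        blocks.append(
            f"<{base_iri}{entry['uid']}> a owl:Class ;\n"
            f"    rdfs:label {turtle_literal(entry['name'])} ;\n"
            f"    rdfs:comment {turtle_literal(entry['description'])} .\n"
        )
    return "\n".join(blocks)


# Name and UID from the GS1 / UBL table; the description is hypothetical.
ENTRIES = [
    {
        "uid": "A1034",
        "name": "Forecast.Indicator.Indicator",
        "description": "Illustrative description text",
    }
]

print(dictionary_to_turtle(ENTRIES))
```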

CAM to OWL generation overview

[Diagram. Elements: Master Dictionary; XSLT script (Extract and Generate); OWL terms instances; Reasoner; UID. Components: Name, Description, Type, Restrictions, Relationships. Steps numbered 1 to 4, with the label "Insert UID couplet pairings".]

Explicate semantics related with the different usages of document data types

Different document standards use CCTS Data Types differently.

For example, "Code.Type" in one standard is represented by "Text.Type" in another standard, and by "Identifier.Type" in yet another.

In the real world this knowledge is expressed through class equivalences, so that not only humans but also the reasoner knows about it:

Code.Type ≡ Text.Type

Name.Type ≡ Text.Type

Identifier.Type ≡ Text.Type

Items can be cross-referenced via UID as well as type.
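One consequence of stating these equivalences is that further ones follow implicitly, for example Code.Type ≡ Identifier.Type. A full OWL reasoner does far more than this, but the small Python sketch below, a union-find over the asserted pairs, shows that one inference step. The function name is an assumption; the three pairs are the ones listed above.

```python
def equivalence_classes(pairs):
    """Group names into classes implied by the asserted equivalences."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for left, right in pairs:
        parent[find(left)] = find(right)

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


ASSERTED = [
    ("Code.Type", "Text.Type"),
    ("Name.Type", "Text.Type"),
    ("Identifier.Type", "Text.Type"),
]

for group in equivalence_classes(ASSERTED):
    print(sorted(group))
```

All four types land in a single class, so any pair among them can be treated as equivalent when aligning dictionaries.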

Dictionaries, UIDs, and CAM templates

Within a dictionary, each unique context of an item can be assigned a UID label value.

These UID label values can then be inserted as references into a CAM template.

Each UID couplet across exchange formats within a domain can be marked as equivalent or similar (aliases).

This allows automated mapping across CAM template definitions.

For similar items, CAM supports transform rules in standard XPath syntax.
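One possible way to hold such couplets in memory is sketched below in Python. This is not the CAM dictionary format: the class, its field names, the UID values and the sample XPath rule are hypothetical, and a couplet is assumed to pair one source UID with one target UID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Couplet:
    """A pairing of two UIDs, marked as equivalent or similar."""
    source_uid: str
    target_uid: str
    relation: str                      # "equivalent" or "similar"
    xpath_rule: Optional[str] = None   # transform rule for similar items

    def __post_init__(self):
        if self.relation not in ("equivalent", "similar"):
            raise ValueError(f"unknown relation: {self.relation}")


# Hypothetical couplets; the rule is an ordinary XPath 1.0 expression.
COUPLETS = [
    Couplet("S-0001", "T-0001", "equivalent"),
    Couplet("S-0002", "T-0002", "similar", xpath_rule="substring(., 1, 10)"),
]
BY_SOURCE = {couplet.source_uid: couplet for couplet in COUPLETS}


def lookup(source_uid):
    """Return the couplet for a source UID, or None if it is unmapped."""
    return BY_SOURCE.get(source_uid)


print(lookup("S-0002"))
```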

Dictionary Alignment Step

Human / OWL inspectors

The dictionary alignment report lists known equivalents (confidence 100%), followed by lesser equivalence rankings based on matching factors.

Component compound relationships are resolved using CAM template structure layouts.

Human inspection then reviews, resolves and updates the dictionary (using Excel spreadsheet workbook format).

A new dictionary is produced.

Iterative refinement over time can enhance alignment, along with common practices through industry agreements.
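The slides do not say which matching factors feed the ranking. As one hedged illustration in Python, the sketch below scores known equivalents at 1.0 and ranks the remaining candidate pairs by a weighted mix of name similarity and type agreement. The two factors, the 0.7 / 0.3 weights and the `type` field are assumptions; the component names are taken from the GS1 / UBL table later in the deck.

```python
from difflib import SequenceMatcher


def alignment_report(sources, targets, known_equivalents):
    """Rank every source/target pair by an illustrative confidence score."""
    rows = []
    for source in sources:
        for target in targets:
            if (source["name"], target["name"]) in known_equivalents:
                score = 1.0
            else:
                name_score = SequenceMatcher(
                    None, source["name"].lower(), target["name"].lower()
                ).ratio()
                type_score = 1.0 if source["type"] == target["type"] else 0.0
                score = round(0.7 * name_score + 0.3 * type_score, 2)
            rows.append((score, source["name"], target["name"]))
    return sorted(rows, reverse=True)


# Names from the crosswalk table; the "type" values are assumed.
SOURCES = [
    {"name": "TimePeriod.Details", "type": "Details"},
    {"name": "TimePeriod.Type.Code", "type": "Code"},
]
TARGETS = [
    {"name": "Period.Details", "type": "Details"},
    {"name": "Period.DescriptionCode.Code", "type": "Code"},
]
KNOWN = {("TimePeriod.Details", "Period.Details")}

for score, source_name, target_name in alignment_report(SOURCES, TARGETS, KNOWN):
    print(f"{score:.2f}  {source_name}  ->  {target_name}")
```

The lower-ranked rows are the ones a human reviewer would inspect in the spreadsheet before the dictionary is updated.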

From Dictionary to Runtime Mapping

Once a dictionary is available with UID couplets for domain crosswalks, proceed to align:

Take templates of actual exchanges and label these with UID couplets.

Look up the UID couplets in the dictionary and update the target template with the UID from the couplet.

Take the completed templates and use them to drive actual mapping processes.
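The lookup step can be sketched as follows, in Python rather than XSLT. The data shapes are assumptions: each template is reduced to a mapping of item path to UID, and the dictionary's couplets to a mapping of source UID to target UID. All paths and UIDs shown are hypothetical.

```python
def build_crosswalk(source_template, target_template, couplets):
    """Pair source paths with target paths via the dictionary couplets.

    Returns the crosswalk (source path -> target path) and the list of
    source paths for which no target could be resolved.
    """
    target_path_by_uid = {uid: path for path, uid in target_template.items()}
    crosswalk = {}
    unresolved = []
    for source_path, source_uid in source_template.items():
        target_uid = couplets.get(source_uid)
        target_path = target_path_by_uid.get(target_uid)
        if target_path is None:
            unresolved.append(source_path)
        else:
            crosswalk[source_path] = target_path
    return crosswalk, unresolved


# Hypothetical labelled templates and couplets.
SOURCE_TEMPLATE = {
    "Forecast/Party/GLN": "S-0001",
    "Forecast/Status": "S-0002",
    "Forecast/Note": "S-0003",
}
TARGET_TEMPLATE = {
    "Forecast/PartyIdentification/ID": "T-0001",
    "Forecast/DocumentStateCode": "T-0002",
}
UID_COUPLETS = {"S-0001": "T-0001", "S-0002": "T-0002"}

crosswalk, unresolved = build_crosswalk(
    SOURCE_TEMPLATE, TARGET_TEMPLATE, UID_COUPLETS
)
print(crosswalk)
print("needs review:", unresolved)
```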

Create UID driven mapping template

[Diagram. Elements: CAM template (source) with UIDs; CAM template (target); Domain Master Dictionary (look up UID couplet: same, or similar + optional XPath mapping rule); XSLT script; updated CAM template (target) with UIDs and rules. Steps numbered 1 to 3.]

Automated UID driven mapping

[Diagram. Elements: CAM template (source); CAM template (target); UIDs; rules; XSLT script; input XML instance; output XML instance. Steps numbered 1 to 4.]
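The runtime step in this diagram is driven by an XSLT script. To illustrate the idea only, the Python sketch below copies values from an input XML instance into a new output instance by following a crosswalk of source path to target path. The crosswalk, the element names and the function name are hypothetical, paths are relative to the root element, and only the first occurrence of each source item is handled.

```python
import xml.etree.ElementTree as ET


def apply_crosswalk(input_xml, crosswalk, output_root):
    """Build an output instance by copying values along the crosswalk."""
    source = ET.fromstring(input_xml)
    target = ET.Element(output_root)
    for source_path, target_path in crosswalk.items():
        node = source.find(source_path)
        if node is None or node.text is None:
            continue  # nothing to copy for this item
        current = target
        # Create any missing elements along the target path.
        for tag in target_path.split("/"):
            child = current.find(tag)
            if child is None:
                child = ET.SubElement(current, tag)
            current = child
        current.text = node.text
    return ET.tostring(target, encoding="unicode")


# Hypothetical input instance and crosswalk.
INPUT_XML = (
    "<Forecast>"
    "<Party><GLN>1234567890123</GLN></Party>"
    "<Status>FINAL</Status>"
    "</Forecast>"
)
CROSSWALK = {
    "Party/GLN": "PartyIdentification/ID",
    "Status": "DocumentStateCode",
}

print(apply_crosswalk(INPUT_XML, CROSSWALK, "Forecast"))
```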

Dictionary approach summary

1. If the document components of two different domain standards share the same semantic properties:

Use this as an indication that they may be similar

2. Some explicitly defined semantic properties may imply further implicit semantic relationships:

Use a reasoner to obtain implicit relationships

Align to dictionary definitions, allowing crosswalk

Create harmonized dictionary lookup

Use abstract UID as common reference (linkage between language-specific named types/objects)

3. Explicate semantics related with the different usages of document data types in different document schemas, to obtain some desired interpretations by means of such informal semantics:

Determine similar/match relationships and rules for constraint alignment and compound component relationships (e.g. date-time versus date and time)

4. Provide a dictionary structure format for managing relationships:

Leverage existing OASIS CAM and ebXML Registry TC work
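The date-time versus date and time case in point 3 is a compound relationship: one component on one side corresponds to two on the other. A minimal Python sketch of such a rule pair follows, assuming ISO 8601 values without time zones; the function names and the sample value are hypothetical.

```python
from datetime import datetime


def split_date_time(value):
    """Split one ISO 8601 date-time into separate date and time strings."""
    parsed = datetime.fromisoformat(value)
    return parsed.date().isoformat(), parsed.time().isoformat()


def join_date_time(date_text, time_text):
    """Combine separate date and time strings into one date-time."""
    return datetime.fromisoformat(f"{date_text}T{time_text}").isoformat()


end_date, end_time = split_date_time("2009-06-30T17:45:00")
print(end_date, end_time)
print(join_date_time(end_date, end_time))
```

In a crosswalk, a rule like this would sit alongside a "similar" couplet so that the single value is split on the way out and recombined on the way back.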

Part III: Summary – Next Steps

Summary

Develop crosswalks:

Convert XSD schema to CAM templates

Leverage template structure and XPath rules to build dictionaries with UID labels

Build OWL relationships from dictionaries

Compare each dictionary to the master dictionary, and reference OWL and type knowledge bases to align

Produce spreadsheet for manual review

Save final results back to master dictionary

Build runtime templates:

Compare individual CAM templates to the master dictionary and generate a cross-walk section between components

The cross-walk can contain alignment rules in XPath for content handling (e.g. code values and re-formatting)

Tools needed

CAM:

Schema ingesting

Dictionary builder

OWL:

Reasoner

CAM dictionary to OWL generator

Extend CAM dictionary format for couplets / rules

Extend reasoner to update dictionary couplets

Mapping:

XSLT engine to read input and templates and create output (can use the existing XSLT CAM validator as a basis)

| GS1.XML | UID | UBL 2.0 |
|---|---|---|
| Forecast.Indicator.Indicator | A1034 | Forecast.BasedOnConsensus_Indicator.Indicator |
| PartyIdentification.Details | C3401 | PartyIdentification.Details |
| PartyIdentification.Primary_Identification.GLN_Identifier | C3402 | PartyIdentification.Identifier |
| NonGLN_PartyIdentification.Details | C3451 | PartyIdentification.Details |
| NonGLN_PartyIdentification.Identification.Text | C3452 | PartyIdentification.Identifier |
| ElectronicDocument.Status.Identifier | D4310 | Forecast.DocumentStateCode.Code |
| Abstract_Forecast.Purpose.ForecastPurposeCriteriaType_Code | E0010 | Forecast.PurposeCode.Code |
| Multi_unitMeasure.Measure.Measure | F0301 | Dimension.Measure |
| Abstract_Forecast_TimeStampedTradeItemQuantity.Association.Code | E0451 | Forecast.Identifier.Identifer |
| Date_TimePeriod.EndDate.Date_DateTime | T0012 | Period.EndDate.Date, Period.EndTime.Time |
| Date_TimePeriod.BeginDate.Date_DateTime | T0013 | Period.StartDate.Date, Period.StartTime.Time |
| TimePeriod.Details | T0009 | Period.Details |
| TimePeriod.Length.Duration_Measure | T0008 | Period.Duration.Measure |
| TimePeriod.Type.Code | T0021 | Period.DescriptionCode.Code |
| TradeItemIdentification.Details | F0340 | ItemIdentification.Details |
| TradeItemIdentification.Primary_Identification.GTIN_Identifier | F0341 | ItemIdentification.Identifier |
| NonGTIN_TradeItemIdentification.Details | F0342 | ItemIdentification.Details |
| NonGTIN_TradeItemIdentification.Identification.Type_Code | | ItemIdentification.Extended_Identifier.Identifier |

The above equivalences are labelled as couplets through the UID dictionary cross-references and can be stored back into the <Extensions> section of CAM templates for runtime crosswalk use.

Runtime crosswalks between template structure member items