matching and reuse of xml schemas
![Page 1: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/1.jpg)
1
Matching and Reuse of XML Schemas
![Page 2: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/2.jpg)
2
Sample XML Schema

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="car">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="make" type="xs:string"/>
            <xs:element name="model" type="xs:string"/>
            <xs:element name="year" type="xs:string"/>
            <xs:element name="color" type="xs:string"/>
            <xs:element name="driver">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="first" type="xs:string"/>
                  <xs:element name="last" type="xs:string"/>
                  <xs:element name="license" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
![Page 3: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/3.jpg)
3
What is XML schema matching
Matching: identifying the relations among the corresponding elements of two schemas, e.g.
  customer/firstName <==> client/name/first
  customer/name <==> concatenate(client/name/first, client/name/last)
Calculating the distance between two schemas, e.g., the distance between customer.xsd and client.xsd is 0.67.
![Page 4: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/4.jpg)
4
Why XML Schema matching
From data integration point of view:
  Purpose: automatically identifying corresponding elements between two schemas
  Relevant works:
  Database schema matching/mapping, e.g., A. Doan, et al., Reconciling schemas of disparate data sources: A machine-learning approach. SIGMOD, 2001.
  Generic schema mapping, e.g., J. Madhavan, P. A. Bernstein, E. Rahm. Generic schema matching with Cupid. VLDB, 2001.
  XML Schema matching, e.g., H. Do, E. Rahm. COMA: A system for flexible combination of schema matching approaches. VLDB, 2002.
From web service composition point of view:
  e.g., matching the output type of one service with the input of another in sequential composition
From software reuse point of view:
  Purpose: build XML Schema categories and search engines
  Relevant works:
  Software component search: A. Mili, R. Mili, R. T. Mittermeir, A survey of software reuse libraries, Annals of Software Engineering, 1998.
  Agent and service matching: Katia Sycara, Jianguo Lu, Matthias Klusch, Interoperability among Heterogeneous Software Agents on the Internet, Technical Report CMU-RI-TR-98-22, CMU.
![Page 5: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/5.jpg)
5
What are the problems
Modelling: as graph, as tree matching
Node similarity: name, type, cardinality
Structure similarity: tree edit distance
  K. Zhang, D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 1989.
![Page 6: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/6.jpg)
6
Overview of our system
[Figure: system pipeline with the components XML Schema (two inputs), Modelling, Name similarity (name relations), Node similarity (node relations), Structural similarity (structural relations) and Results retrieval]
![Page 7: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/7.jpg)
7
Three similarities
Name similarity: computed from the node name, using WordNet, string matching and the Hungarian method
Node similarity: adds built-in data type, user-defined data type and cardinality, using compatibility tables
Structural similarity: computed from the hierarchical structure, using a tree matching algorithm
![Page 8: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/8.jpg)
8
Modelling
<xs:element name="driver" type="driverType"/>
<xs:attribute name="license" type="xs:string"/>
Model schemas as trees
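The modelling step can be sketched roughly as follows. This is a minimal illustration that assumes Python's standard `xml.etree.ElementTree` and a simplified rule: every `xs:element` or `xs:attribute` declaration becomes a tree node, while `complexType`, `sequence`, `all` and `choice` wrappers are skipped. The helpers `schema_to_tree` and `show` and the tuple layout are hypothetical names chosen for this example; references, imports and recursion are not handled.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Shortened variant of the sample schema, with "license" modelled as an attribute.
SAMPLE = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="car">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="make" type="xs:string"/>
        <xs:element name="driver">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="first" type="xs:string"/>
              <xs:element name="last" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="license" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def schema_to_tree(node):
    """Return a list of (name, type, children) tuples for declarations under node."""
    result = []
    for child in node:
        if child.tag in (XS + "element", XS + "attribute"):
            # A declaration becomes a tree node; recurse into its inline type.
            result.append((child.get("name"), child.get("type"), schema_to_tree(child)))
        else:
            # Structure-only wrappers are flattened away (an assumption of this sketch).
            result.extend(schema_to_tree(child))
    return result


def show(tree, depth=0):
    """Render the tree as indented lines, one node name per line."""
    lines = []
    for name, _type, children in tree:
        lines.append("  " * depth + name)
        lines.extend(show(children, depth + 1))
    return lines


tree = schema_to_tree(ET.fromstring(SAMPLE))
print("\n".join(show(tree)))
```

Elements and attributes are treated uniformly here, which mirrors the two declarations shown on this slide but is only one possible design choice.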
![Page 9: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/9.jpg)
9
Modelling
Model schemas as trees, covering:
  Reference
  Importing and Inclusion
  Recursion
[Figure: example schema trees. customerOrder: shipping (date, ship2Add), billing (date, bill2Add); address: street, province, postcode. reference: refNo, paper; paper: author, title, contents. Address_ca.xsd: address (street, province, postcode); Address_us.xsd: address (street, state, zip).]
![Page 10: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/10.jpg)
10
Information excluded in Modelling
Related to elements or attributes: default value, value range, unique, nullable…
Related to structure: sequence, all, choice
[Figure: two trees, name (first, last) and name (last, first), differing only in child order]
![Page 11: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/11.jpg)
11
Computing node similarity
Computing name similarity with the help of:
  WordNet and its API
  String matching
  Hungarian method
Add the similarity of other information:
  Data type
  Minimum cardinality
  Maximum cardinality
Node similarity
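One way to combine these ingredients is a weighted sum, sketched below. The weights, the `TYPE_COMPAT` table values and the cardinality scoring rule are all assumptions made for illustration; the slides do not state the actual formula or compatibility tables.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    data_type: str
    min_occurs: int
    max_occurs: float  # float("inf") can stand for "unbounded"


# Hypothetical compatibility table for built-in types (illustrative values only).
TYPE_COMPAT = {
    frozenset({"xs:int", "xs:integer"}): 0.9,
    frozenset({"xs:int", "xs:string"}): 0.3,
}


def type_similarity(a, b):
    """1.0 for identical types, otherwise a table lookup, otherwise 0.0."""
    if a == b:
        return 1.0
    return TYPE_COMPAT.get(frozenset({a, b}), 0.0)


def cardinality_similarity(a, b):
    """Assumed rule: full score when equal, half score otherwise."""
    return 1.0 if a == b else 0.5


def node_similarity(n1, n2, name_sim, weights=(0.6, 0.2, 0.1, 0.1)):
    """Weighted sum of name, type, minimum and maximum cardinality similarity."""
    w_name, w_type, w_min, w_max = weights
    return (
        w_name * name_sim(n1.name, n2.name)
        + w_type * type_similarity(n1.data_type, n2.data_type)
        + w_min * cardinality_similarity(n1.min_occurs, n2.min_occurs)
        + w_max * cardinality_similarity(n1.max_occurs, n2.max_occurs)
    )


def exact_name(a, b):
    """Placeholder name similarity: case-insensitive equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


a = Node("year", "xs:int", 1, 1)
b = Node("year", "xs:integer", 0, 1)
print(round(node_similarity(a, b, exact_name), 2))
```

The name similarity is passed in as a function so that a WordNet-based or token-based measure can replace the placeholder.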
![Page 12: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/12.jpg)
12
Name similarity from token lists
Tokenize names, e.g.
  clientName -> client name
  submittedReports -> submit report
Similarity between two token lists: using the Hungarian method for Weighted Bipartite Graph Matching (WBGM)
Example: customerDeliveryAddress vs. clientRequiredShippingAddress
[Figure: bipartite graph between the token lists (customer, delivery, address) and (client, require, shipping, address), with edge weights sim i,j, e.g. sim 0,0 between customer and client]
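The token-list comparison can be sketched as below. Several parts are stand-ins and should be read as assumptions: `difflib.SequenceMatcher` replaces the WordNet-based token similarity, a brute-force search over permutations replaces the Hungarian method (it finds the same optimal assignment, but only scales to short token lists), no stemming is applied (so "required" is not reduced to "require"), and normalising by the longer list is one possible choice.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations


def tokenize(name):
    """Split a camelCase name into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def token_similarity(a, b):
    """String-matching stand-in for a WordNet-based similarity."""
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(name1, name2):
    """Best one-to-one token assignment, normalised by the longer token list."""
    t1, t2 = tokenize(name1), tokenize(name2)
    if not t1 or not t2:
        return 0.0
    short, long_ = sorted((t1, t2), key=len)
    # Try every way of assigning the shorter list's tokens to distinct
    # tokens of the longer list and keep the highest total weight.
    best = max(
        sum(token_similarity(s, l) for s, l in zip(short, perm))
        for perm in permutations(long_, len(short))
    )
    # Dividing by the longer length penalises unmatched tokens.
    return best / len(long_)


print(tokenize("customerDeliveryAddress"))
print(name_similarity("customerDeliveryAddress", "clientRequiredShippingAddress"))
```

For longer names, an implementation of the Hungarian method would replace the permutation search without changing the result.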
![Page 13: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/13.jpg)
13
Determine the structural relation
Tree 1 Tree 2
Structure similarity
![Page 14: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/14.jpg)
14
Common substructure
[Figure: two car schema trees. One has the nodes car, make, model, year, color, driver, firstName, lastName, license; the other has car, make, model, year, color, driver, first, last, license.]
![Page 15: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/15.jpg)
15
Approximate Common Structure
[Figure: the same two car schema trees. One has the nodes car, make, model, year, color, driver, firstName, lastName, license; the other has car, make, model, year, color, driver, first, last, license.]
![Page 16: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/16.jpg)
16
Mappings in an ACS
[Figure: car tree with the nodes make, model, year, color, driver, first (firstName), last (lastName), license, divided into two approximate common structures, ACS1 and ACS2]
mACS1 = {(s1.car, s2.car), (s1.make, s2.make), (s1.year, s2.year), (s1.color, s2.color)}
mACS2 = {(s1.driver, s2.driver), (s1.first, s2.firstName), (s1.last, s2.lastName), (s1.license, s2.license)}
![Page 17: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/17.jpg)
17
Evaluation Criteria
Matching outcomes:
  Mappings
  Schema similarity
Execution time
Collected four groups of schemas:
  Purchase orders used in COMA (5)
  Large schemas from XML.org (86)
  Schemas on hospitality domain (95)
  Extracted from WSDL (419)
Evaluation
![Page 18: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/18.jpg)
18
Comparison with edit distance algorithm: element mapping on data group 1
[Chart: precision and recall (scale 0.0 to 1.0) by method 1 and by method 2]
Method 1: our algorithm. Method 2: edit distance.
![Page 19: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/19.jpg)
19
Comparison with edit distance: schema similarity on data groups 3 and 4
[Chart: top-k precision (top-3 and top-5, scale 0.0 to 1.0) for method 1 and method 2 on schema groups 3 and 4]
Method 1: our algorithm. Method 2: edit distance.
![Page 20: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/20.jpg)
20
Comparison with edit distance: performance on data group 2
[Chart: average matching time in seconds (scale 0 to 250) against input size (M*N) for method 1 and method 2]
Method 1: our algorithm. Method 2: edit distance.
![Page 21: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/21.jpg)
21
Comparison with COMA (Mapping)
| | COMA 'All' | COMA 'All+SchemaM' | Our algorithm |
|---|---|---|---|
| Precision | about 0.95 | about 0.93 | 0.88 |
| Recall | about 0.78 | about 0.89 | 0.87 |
| Overall | 0.73 | 0.82 | 0.75 |

Overall is a measure that combines precision and recall. It reflects the effort of removing incorrect mappings and adding missing ones.
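The slide does not give the formula for Overall. The sketch below assumes the definition commonly used in the COMA evaluation, Overall = Recall × (2 − 1/Precision); under that assumption the table's Overall row is reproduced to within rounding of the "about" values.

```python
def overall(precision, recall):
    """Overall = Recall * (2 - 1 / Precision); assumed definition, see lead-in."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return recall * (2 - 1 / precision)


# Precision and recall values taken from the comparison table.
rows = [
    ("COMA 'All'", 0.95, 0.78),
    ("COMA 'All+SchemaM'", 0.93, 0.89),
    ("Our algorithm", 0.88, 0.87),
]
for label, p, r in rows:
    print(f"{label}: {overall(p, r):.2f}")
```

Under this definition the measure drops to zero when precision is 0.5 and becomes negative below that, meaning that correcting the result would cost more than matching by hand.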
![Page 22: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/22.jpg)
22
Conclusion
Scalable schema matching
  Wang Lian, David W. Cheung, Nikos Mamoulis, and Siu-Ming Yiu, An Efficient and Scalable Algorithm for Clustering XML Documents by Structure, TKDE, 2005.
Subtyping
Apply to web service matching
![Page 23: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/23.jpg)
23
Web service synthesis
![Page 24: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/24.jpg)
24
Web Service Composition
Composite web service: “service implemented by combining the functionality provided by other web services” –G. Alonso et al.
Web service composition: the process of developing a composite web service
Approaches to web service composition:
  Conventional programming languages, such as Java, C#;
  Web service composition languages, such as BPEL;
  Workflow, pi-calculus, Petri nets, automata…
  Web service synthesis.
composition
![Page 25: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/25.jpg)
25
Web Service Synthesis
BPEL and the like are still programming languages: they describe exactly how to compose the web services.
Web service synthesis:
  We describe what the service is, but not how to implement it;
  We don’t even know which component services are involved;
  The relevant services are discovered and invoked dynamically;
  The implementation is synthesized automatically from the web service specification.
Program synthesis has a long history.
![Page 26: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/26.jpg)
26
Web Service Synthesis
[Figure: a web service WS has a service specification (WSDL/Datalog), made up of a syntactic specification (WSDL) and a semantic specification (Datalog), and a service implementation (BPEL) that uses the component services WS1 and WS2]
![Page 27: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/27.jpg)
27
Synthesis Example
Chapters service specification
  Syntactic specification: …
  Semantic specification: chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
amazon service specification
  Syntactic specification: WSDL file
  Semantic specification: amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
Service implementation: Java code, database
MetaSearchService specification
  Syntactic: interface definition defined by WSDL
  Semantic: Q(ISBN, PRICE, TITLE, RATE) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService implementation: ??
![Page 28: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/28.jpg)
28
Generate the abstract implementation by query rewriting
MetaSearchService abstract implementation:
  Q(ISBN, PRICE, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
![Page 29: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/29.jpg)
29
Generate the Concrete Implementation
MetaSearchService specification
  Syntactic: interface definition defined by WSDL
  Semantic: Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- …
MetaSearchService abstract implementation:
  Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
MetaSearchService concrete implementation:
  Invoke amazon; invoke chapters; combine the output.
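The three steps of the concrete implementation (invoke amazon, invoke chapters, combine the output) amount to a join on ISBN. The sketch below illustrates that for the rule Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR). The two `invoke_*` functions are hypothetical stubs returning made-up rows; a real implementation would call the services through their WSDL interfaces.

```python
def invoke_amazon():
    """Stub for the amazon service: rows of (ISBN, PRICE, RATE, TITLE, AUTHOR)."""
    return [
        ("111", 30.0, 4.5, "Book A", "Author X"),
        ("222", 45.0, 3.9, "Book B", "Author Y"),
    ]


def invoke_chapters():
    """Stub for the chapters service: rows of (ISBN, PRICE, TITLE, AUTHOR)."""
    return [
        ("111", 28.5, "Book A", "Author X"),
        ("333", 12.0, "Book C", "Author Z"),
    ]


def meta_search():
    """Join both outputs on ISBN, yielding (ISBN, PRICE, PRICE0, TITLE, RATE)."""
    # Index the chapters rows by ISBN; PRICE0 and TITLE come from chapters.
    chapters_by_isbn = {}
    for isbn, price0, title, _author in invoke_chapters():
        chapters_by_isbn.setdefault(isbn, []).append((price0, title))

    result = []
    # PRICE and RATE come from amazon; its TITLE' and AUTHOR' are not projected.
    for isbn, price, rate, _title, _author in invoke_amazon():
        for price0, title in chapters_by_isbn.get(isbn, []):
            result.append((isbn, price, price0, title, rate))
    return result


print(meta_search())
```

Only books known to both services appear in the result, which is what the shared ISBN variable in the rule body expresses.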
![Page 30: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/30.jpg)
30
It is a lightweight approach…
Web services are restricted to database queries, or to functions that can be described by database queries or Datalog;
The semantic specification is Datalog instead of a more powerful specification mechanism employing ontologies;
Compositions are restricted to data composition instead of a full-blown process specification such as BPEL.
All those choices are meant for the construction of a practical web service synthesis system…
![Page 31: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/31.jpg)
31
Mapping between Datalog and Web Services
Database vendors also provide wrappers for web services.
  Behind a web service there is a SQL query that corresponds to the web service; SQL defines the semantics of the web service.
  Major database vendors support the mapping between SQL and web services; we experimented with DB2WS.
Malaika, S. et al. DB2 and Web Services. IBM Systems Journal, 41(4), pp. 666-685. 2002.
![Page 32: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/32.jpg)
32
Generate the Abstract Implementation by Query rewriting
Definition: Given a query Q and a set of views V, a rewriting of Q using V is a query Q’ such that Q = Q’ and Q’ refers to one or more views in V.
Query:
  Q <- T1, T2, T3.
Views:
  V1 <- T1, T2.
  V2 <- T2, T3.
Rewriting 1:
  Q <- V1, T3.
Rewriting 2:
  Q <- V1, V2.
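The example above can be checked mechanically by unfolding each view into its definition and comparing the result with the query body. The sketch below is a deliberately simplified, propositional version: it ignores variables and treats bodies as sets of table names, whereas real Datalog rewriting needs variable-aware equivalence testing. The function names are hypothetical.

```python
# Query and views from the example, with bodies as tuples of table names.
QUERY = ("T1", "T2", "T3")
VIEWS = {
    "V1": ("T1", "T2"),
    "V2": ("T2", "T3"),
}


def unfold(body, views):
    """Replace every view in the body by the tables it is defined over."""
    atoms = set()
    for atom in body:
        atoms.update(views.get(atom, (atom,)))
    return atoms


def is_rewriting(body, query, views):
    """True if the body uses at least one view and unfolds to the query body."""
    uses_view = any(atom in views for atom in body)
    return uses_view and unfold(body, views) == set(query)


print(is_rewriting(("V1", "T3"), QUERY, VIEWS))  # Rewriting 1
print(is_rewriting(("V1", "V2"), QUERY, VIEWS))  # Rewriting 2
print(is_rewriting(("V1",), QUERY, VIEWS))       # misses T3
```

Rewriting 2 mentions T2 twice after unfolding; with set semantics the duplicate is harmless, which is why both rewritings are accepted.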
![Page 33: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/33.jpg)
33
Our query rewriting system
![Page 34: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/34.jpg)
34
Limitations of our approach
Focus on database web services; Datalog is not expressive enough.
  Query rewriting in Description Logic, or OWL.
Assume the existence of global database schemas:
  Service providers need to provide the semantic definition of web services in terms of a global database schema;
  New service specifications are also defined using the common schema.
  Schema matching
![Page 35: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/35.jpg)
35
Other threads
Web service collection and clustering
  From UDDI, crawler, search engines such as Google
  Master thesis to be finished this summer
Web service metrics
Schema subtyping
  Based on regular tree grammar
  Master thesis to be finished this summer
Bottom up web service composition
Semantic web service
![Page 36: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/36.jpg)
36
Service Oriented Architecture
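[Figure: triangle of discovery agency, provider and requester. The provider publishes to the discovery agency, the requester finds services there, and requester and provider interact.]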
![Page 37: Matching and Reuse of XML Schemas](https://reader035.vdocuments.mx/reader035/viewer/2022081505/56815f2b550346895dcdf63b/html5/thumbnails/37.jpg)
37
Web service discovery
Keywords search
  Based on IR techniques, such as vector space model
  Fast, but not accurate
Signature matching
  Decide subtype relations between input and output of web services
  Used in service composition, to find composable web services
Relaxed matching
  Approximate matching, allowing small deviations in both structure and words/tags
Semantic matching
  Matching functional requirements of web services
  Used in adaptive, autonomous systems
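As an illustration of the keyword-search option, the sketch below ranks service descriptions against a query with a basic vector space model: raw term-frequency vectors compared by cosine similarity. The service names and descriptions are invented for the example, and weighting schemes such as TF-IDF are left out.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector over whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical service descriptions, e.g. as harvested from a registry.
SERVICES = {
    "BookPrice": "look up book price by isbn",
    "WeatherReport": "weather forecast by city",
    "BookReview": "book rating and review by isbn",
}


def search(query, services, k=3):
    """Return the k best (service name, score) pairs, highest score first."""
    q = vectorize(query)
    scored = [(name, cosine(q, vectorize(desc))) for name, desc in services.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


for name, score in search("book price", SERVICES):
    print(f"{name}: {score:.3f}")
```

The ranking depends only on shared words, which is why this kind of search is fast but can miss services that describe the same function with different vocabulary.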