A Metadata Integration Assistant Generator for Heterogeneous Databases
![Page 1: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/1.jpg)
Young-Kwang Nam
Joseph Goguen
Guilian Wang
A Metadata Integration Assistant Generator for Heterogeneous Databases
![Page 2: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/2.jpg)
Data Integration in Synthetic Scientific Applications
[Diagram: applications send a Query to the Integration System and receive an integrated result without inconsistency, etc.; the system exposes a global unified schema/ontology over data sources 1 through n, each with its own local schema/ontology]
![Page 3: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/3.jpg)
Why Difficult: Data Heterogeneity
• Platform & System Heterogeneity
– OS, hardware
– DBMSs, concurrency control and recovery capabilities
• Syntactic & Structural Heterogeneity
– Machine-readable aspects of representation
– Data models, schemas
• Semantic Heterogeneity
– Naming conflicts: synonyms, homonyms
– Scaling & precision conflicts
– Sampling rates, error distribution, etc.
![Page 4: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/4.jpg)
More Difficult: Flexible Integration
• No all-encompassing system satisfies everyone:
– frequent update of sources
– frequent change of user requirements
– non-published data from one’s own lab
• Simplicity and readability are more desirable than completeness or exhaustiveness to domain scientists
• Domain knowledge is crucial for
– solving heterogeneities
– query optimization
• Desirable to support domain scientists to do data integration on their own
![Page 5: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/5.jpg)
A Common Data Integration Architecture
[Diagram: a Query goes to the Mediator, which offers an integrated view (materialized or virtual) over data sources 1 through n, each behind its own Wrapper, and returns the Result]
![Page 6: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/6.jpg)
Structural vs. Semantic (wrt Mediation Level)
• Structural approach (Mediated schema approach)
– integration by generating a mediated schema that characterizes a set of data sources
• Semantic approach (Ontology-based approach)
– difficult to integrate structural aspects of sources from a semantic perspective, due to semantics embedded within local schemas and implicit assumptions
– integration by sharing a common ontology among the different data sources
![Page 7: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/7.jpg)
Global-as-view vs. Local-as-view (wrt Mapping Direction)
• Global-as-view approach
– each item in the global schema/ontology is a view (query) over the source schemas/ontologies
– $\mathrm{query}(G) = \mathrm{query}(f(S_1, S_2, \ldots, S_n))$
– straightforward query rewriting
• Local-as-view approach
– each source is a view/query over the global schema/ontology
– $\mathrm{query}(G) = \mathrm{query}(f_1^{-1}(S_1), f_2^{-1}(S_2), \ldots, f_n^{-1}(S_n))$
– easy adding or removing of sources
![Page 8: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/8.jpg)
Representative Systems
• TSIMMIS (Stanford & IBM, 1995)
• MedMaker (Stanford, 1996)
• MIX (SDSC&UCSD, 2000)
• IM (AT&T, 1996)
• Clio+Garlic (IBM, 2000)
• DIXSE (UT, 2001)
• XYLEME (2001)
• HERMES (UMD, 1994)
• SIMS (USC, 1996)
• Observer (UG, 1996)
• Infosleuth (MCC, 1997)
• COIN (MIT, 1999)
• Ontobroker (Ger., 2000)
• KIND (SDSC&UCSD, 2001)
![Page 9: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/9.jpg)
Our Approach
• Virtual Integration: retrieve data and resolve conflicts at query time, easy maintenance
• Structural Approach: take users’ knowledge on data semantics hidden in structural information as input to achieve semantic mediation
• Local-as-view: easily adds or removes sources, convenient to fit applications
• GUI for specifying semantic mappings by assigning the same index to nodes (paths) that have the same meaning
• Automatically generate DDXMI for query decomposition
• Semantic functions
![Page 10: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/10.jpg)
Current Prototype Architecture
[Diagram: a user query (XML query) enters the query generator/collector, which looks up the column or path for each DB in the DDXMI and produces query1 … queryn; each query runs on its XML/DB engine against XML/DB1 … XML/DBn, and result1 … resultn come back]
![Page 11: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/11.jpg)
Distributed Database XML Metadata Interface (DDXMI)
• Includes database or XML document name or location information
• Contains table column or XML path information
• Gives function or operation names for resolving semantic issues about table columns or XML elements and attributes
![Page 12: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/12.jpg)
DDXMI DTD
<!ELEMENT DDXMIA (DDXMI.header, DDXMI.isequivalent, documentspec)>
<!ELEMENT DDXMI.header (documentation,version,date,authorization)>
<!ELEMENT documentation (#PCDATA)>
<!ELEMENT version (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT authorization (#PCDATA)>
<!ELEMENT DDXMI.isequivalent (source,destination*)*>
<!ELEMENT source (#PCDATA)>
<!ELEMENT destination (#PCDATA)>
<!ELEMENT documentspec (document, (elementname,operation*)*)>
<!ELEMENT document (#PCDATA)>
<!ELEMENT elementname (#PCDATA)>
<!ELEMENT operation (#PCDATA)>
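To make the DTD above concrete, here is a minimal Python sketch that builds one DDXMI instance with the standard-library `xml.etree.ElementTree` module. The element names come from the DTD; the helper name `build_ddxmi`, the empty header values, and the choice to store a site path in `elementname` are assumptions made only for illustration, not the prototype's actual generator.

```python
import xml.etree.ElementTree as ET


def build_ddxmi(mappings, document, element_ops):
    """Build a DDXMI tree whose child order follows the DTD content models.

    mappings:    list of (source_path, [destination_path, ...]) pairs
    document:    name of the site document (hypothetical usage)
    element_ops: list of (element_name, [operation_name, ...]) pairs
    """
    root = ET.Element("DDXMIA")  # root name exactly as written in the DTD

    header = ET.SubElement(root, "DDXMI.header")
    for tag in ("documentation", "version", "date", "authorization"):
        ET.SubElement(header, tag)  # left empty: header values are not given

    equivalent = ET.SubElement(root, "DDXMI.isequivalent")
    for source, destinations in mappings:
        ET.SubElement(equivalent, "source").text = source
        for destination in destinations:
            ET.SubElement(equivalent, "destination").text = destination

    spec = ET.SubElement(root, "documentspec")
    ET.SubElement(spec, "document").text = document
    for name, operations in element_ops:
        ET.SubElement(spec, "elementname").text = name
        for operation in operations:
            ET.SubElement(spec, "operation").text = operation
    return root


ddxmi = build_ddxmi(
    mappings=[
        ("/book", ["/bookstore/book"]),
        ("/book/price", ["/bookstore/book/price_info/price"]),
    ],
    document="book3.xml",
    element_ops=[("/bookstore/book/author/name", ["fstring", "lstring"])],
)
xml_text = ET.tostring(ddxmi, encoding="unicode")
print(xml_text)
```

Each `source` is followed by zero or more `destination` elements, which is how the `(source,destination*)*` content model leaves room for one master path mapping to several site paths.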
![Page 13: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/13.jpg)
How to generate DDXMI
• Define a Master DTD (global schema) based on application requirements for choosing elements or tables from the distributed systems
• Parse the master DTD and generate a path for each element from root to current element
• Assign the master index number to the site element node that has the same meaning as the master DTD node
• Optionally include a function name for some nodes
• Generate the DDXMI file automatically by collecting the paths that share the same index number
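The path and index generation step can be sketched in a few lines of Python. The numbering rule used here (a child's index is its parent's index followed by the child's 1-based position) is inferred from the Master Index listed under Book1 Path Information, not stated by the authors, and it only works for nodes with at most nine children. The tree is written by hand as nested tuples; parsing a real DTD is left out.

```python
# Master tree as (name, children) tuples, in the order implied by the
# Master Index on the path-information slides.
MASTER_TREE = ("book", [
    ("price", []),
    ("author", [
        ("full_name", [("first_name", []), ("last_name", [])]),
    ]),
    ("title", []),
    ("year", []),
    ("publisher", []),
    ("editor", [("affiliation", []), ("full_name", [])]),
])


def index_paths(node, parent_path="", index="1"):
    """Return (index, path) pairs for the node and all of its descendants."""
    name, children = node
    path = f"{parent_path}/{name}"
    entries = [(index, path)]
    for position, child in enumerate(children, start=1):
        # Assumed rule: append the child's position to the parent's index.
        entries.extend(index_paths(child, path, index + str(position)))
    return entries


master_index = index_paths(MASTER_TREE)
for idx, path in master_index:
    print(idx, path)
```

Because every index encodes the route from the root, two site nodes that receive the same index are declared equivalent to the same master node without any further bookkeeping.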
![Page 14: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/14.jpg)
Generate Master Index
![Page 15: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/15.jpg)
Site1 : Book1 DTD Tree
Index number, function name
![Page 16: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/16.jpg)
Book1 Path Information
Site1 Index
0 book1.xml
1 /bib/book
11 /bib/book/price
12 /bib/book/author
1211 /bib/book/author/first
1212 /bib/book/author/last
13 /bib/book/title
15 /bib/book/publisher
16 /bib/book/editor
161 /bib/book/editor/affiliation
162 /bib/book/editor/last
162 /bib/book/editor/first
Master Index
0 book.xml
1 /book
11 /book/price
12 /book/author
121 /book/author/full_name
1211 /book/author/full_name/first_name
1212 /book/author/full_name/last_name
13 /book/title
14 /book/year
15 /book/publisher
16 /book/editor
161 /book/editor/affiliation
162 /book/editor/full_name
![Page 17: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/17.jpg)
Site 2 : Book2 DTD Tree
![Page 18: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/18.jpg)
Book2 Path Information
Site2 Index
0 book2.xml
1 /arts/book
12 /arts/book/author
1211 /arts/book/author/firstname
1212 /arts/book/author/lastname
13 /arts/book/title
15 /arts/book/publisher
Master Index
0 book.xml
1 /book
11 /book/price
12 /book/author
121 /book/author/full_name
1211 /book/author/full_name/first_name
1212 /book/author/full_name/last_name
13 /book/title
14 /book/year
15 /book/publisher
16 /book/editor
161 /book/editor/affiliation
162 /book/editor/full_name
![Page 19: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/19.jpg)
Site 3 : Book3 DTD Tree
![Page 20: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/20.jpg)
Book3 Path Information
Site3 Index
0 book3.xml
1 /bookstore/book
11 /bookstore/book/price
12 /bookstore/book/author
1211 /bookstore/book/author/name
1212 /bookstore/book/author/name
13 /bookstore/book/title
Master Index
0 book.xml
1 /book
11 /book/price
12 /book/author
121 /book/author/full_name
1211 /book/author/full_name/first_name
1212 /book/author/full_name/last_name
13 /book/title
14 /book/year
15 /book/publisher
16 /book/editor
161 /book/editor/affiliation
162 /book/editor/full_name
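Collecting paths that share an index number can be sketched as a simple grouping step. The index data below is copied from the Site1 and Site3 listings (abridged); the function name `collect_mappings` and the dictionary layout are hypothetical, chosen only to show how 1:N and N:1 mappings fall out of the shared indexes.

```python
from collections import defaultdict

# Abridged index data taken from the path-information slides.
MASTER_INDEX = {
    "1": "/book",
    "1211": "/book/author/full_name/first_name",
    "1212": "/book/author/full_name/last_name",
    "13": "/book/title",
    "162": "/book/editor/full_name",
}
SITE1_INDEX = [
    ("1", "/bib/book"),
    ("13", "/bib/book/title"),
    ("162", "/bib/book/editor/last"),
    ("162", "/bib/book/editor/first"),
]
SITE3_INDEX = [
    ("1", "/bookstore/book"),
    ("1211", "/bookstore/book/author/name"),
    ("1212", "/bookstore/book/author/name"),
    ("13", "/bookstore/book/title"),
]


def collect_mappings(master_index, site_index):
    """Group site paths by index and key them by the master path."""
    by_index = defaultdict(list)
    for index, site_path in site_index:
        by_index[index].append(site_path)
    return {
        master_index[index]: paths
        for index, paths in by_index.items()
        if index in master_index
    }


site1 = collect_mappings(MASTER_INDEX, SITE1_INDEX)
site3 = collect_mappings(MASTER_INDEX, SITE3_INDEX)
for source, destinations in site1.items():
    print(source, "->", ",".join(destinations))
```

In this sketch Site1 yields a 1:N entry (one master `full_name` mapped to `last` and `first`), while Site3 yields an N:1 case (both `first_name` and `last_name` mapped to the single `name` element).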
![Page 21: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/21.jpg)
XML Query Languages
• XQL: takes a document point of view
• XML-QL: takes a database point of view
• Quilt: draws from both areas
– proposed by Don Chamberlin, Jonathan Robie, and Daniela Florescu
– Kweelt (University of Washington), an XML query engine based on Quilt, is used in our prototype
• The XQuery proposal follows Quilt closely
![Page 22: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/22.jpg)
How to generate site queries
• Parse the master query, a query over the global schema
• When a path is encountered, look up the corresponding site path in the DDXMI file, according to the kind of path, and substitute it
• If there is no corresponding path in the DDXMI, treat it as a null value
– no queries are generated for that site
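A rough Python sketch of the substitution step follows. It does not parse Quilt: the master query is represented only by its root path and the fields it returns, and the emitted query text is illustrative rather than verified Kweelt syntax. The mapping entries are the ones shown on the "How to get site element names" slide; the function names and the rule that a missing path suppresses the whole site query are one reading of the bullets above.

```python
# Master path -> site path, as in the DDXMI fragment for the bookstore site.
BOOK3_MAPPING = {
    "/book": "/bookstore/book",
    "/book/price": "/bookstore/book/price_info/price",
}


def cut_prefix(site_root, site_path):
    """Drop the site root so the rest can follow the bound variable."""
    prefix = site_root + "/"
    if not site_path.startswith(prefix):
        raise ValueError(f"{site_path} is not below {site_root}")
    return site_path[len(prefix):]


def site_query(document, mapping, master_root, master_fields):
    """Return a site query string, or None when a path has no mapping."""
    site_root = mapping.get(master_root)
    if site_root is None:
        return None
    relative_paths = []
    for field in master_fields:
        site_path = mapping.get(f"{master_root}/{field}")
        if site_path is None:
            return None  # assumed reading: no query is generated for this site
        relative_paths.append(cut_prefix(site_root, site_path))
    body = ", ".join(f"$b/{path}" for path in relative_paths)
    return f'FOR $b IN document("{document}"){site_root} RETURN <book>{body}</book>'


price_query = site_query("book3.xml", BOOK3_MAPPING, "/book", ["price"])
publisher_query = site_query("book3.xml", BOOK3_MAPPING, "/book", ["publisher"])
print(price_query)
print(publisher_query)
```

The `cut_prefix` helper mirrors the slide's reduction of `bookstore/book/price_info/price` to `price_info/price` once the variable is bound to `bookstore/book`.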
![Page 23: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/23.jpg)
How to get site element names
[Diagram: the master index tree (book with children price, author, publisher, year, title, editor; author/full_name with first_name and last_name; editor with affiliation and full_name) beside a site index tree (bookstore/book/price_info/price), linked through the DDXMI]
[In Quilt Query]
1. book → bookstore/book
2. price → bookstore/book/price_info/price, with the bookstore/book prefix cut to give price_info/price
DDXMI
<source>book</source> <destination>bookstore/book</destination>
<source>book/price</source> <destination>bookstore/book/price_info/price</destination>
![Page 24: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/24.jpg)
1:1 Mapping Example
FOR $book IN document("book.xml")//book[publisher = "Addison-Wesley"]
RETURN <book>$book/title</book>
[Diagram: the master index tree mapped to Book1 (bib/book with publisher and title), Book2 (arts/book with publisher and title), and Book3 (bookstore/book with title only)]
![Page 25: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/25.jpg)
Query Execution Result
![Page 26: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/26.jpg)
1:N Mapping Example
FOR $edi IN document("book.xml")//book/editor
RETURN <editor>$edi/full_name</editor>
[Diagram: the master index tree mapped to Book1 (bib/book/editor with last and first), Book2 (arts/book), and Book3 (bookstore/book)]
DDXMI
<source>/book/editor/full_name</source>
<destination>/bib/book/editor/last,/bib/book/editor/first</destination>
![Page 27: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/27.jpg)
Query Execution Result
![Page 28: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/28.jpg)
N:1 Mapping Example
FOR $a IN document("book.xml")//book//author
RETURN <author> $a/last_name,$a/first_name </author>
[Diagram: the master index tree mapped to Book1 (bib/book/author with last and first), Book2 (arts/book/author with lastname and firstname), and Book3 (bookstore/book/author/name, annotated with the operations below)]
<operation>lstring</operation>
<operation>fstring</operation>
![Page 29: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/29.jpg)
Query Generation Result
import split as UDF_split;
FUNCTION fstring($str){ split(" ",$str)[1]}
FUNCTION lstring($str){ split(" ",$str)[2]}
FOR $a IN document("book3.xml") //book//author
RETURN <author> fstring($a/name), lstring($a/name)</author>
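For readers unfamiliar with Quilt, the two conversion functions can be mirrored in Python as below. This is only an analogue: it assumes the Quilt positions `[1]` and `[2]` are 1-based, so they correspond to Python indexes 0 and 1, and it assumes names of the form "first last" separated by a single space.

```python
def fstring(name):
    """Return the first space-separated token (Quilt: split(" ", $str)[1])."""
    return name.split(" ")[0]


def lstring(name):
    """Return the second space-separated token (Quilt: split(" ", $str)[2])."""
    return name.split(" ")[1]


full_name = "Jane Doe"  # made-up sample value, not from the test data
print(fstring(full_name), lstring(full_name))
```

A name with only one token would raise `IndexError` in `lstring`, and a name with a middle name would return that middle name, so a production conversion function would need a more careful splitting rule.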
![Page 30: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/30.jpg)
Query Execution Result
![Page 31: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/31.jpg)
Semantic Function Involved Example
FOR $book IN document("book.xml")//book
RETURN <book> $book/title,$book/author,$book/price </book>
<operation>div(100)</operation>
[Diagram: the master index tree mapped to Book1 (bib/book/price), Book2 (arts/book), and Book3 (bookstore/book/price); the operation above is attached to a price node]
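One way to picture how an operation string such as `div(100)` might be applied to a retrieved value is the Python sketch below. The reading of `div(100)` as "divide the value by 100" (for example, converting a price stored in hundredths) is an assumption, as are the function name `apply_operation` and the registry of supported operations; the slides do not define the operation's semantics.

```python
import re

# Hypothetical registry; only "div" appears in the slides.
OPERATIONS = {
    "div": lambda value, argument: value / argument,
}

# Matches "name(number)", e.g. "div(100)".
OPERATION_PATTERN = re.compile(r"(\w+)\(([-+]?\d+(?:\.\d+)?)\)")


def apply_operation(operation, value):
    """Parse an operation string and apply it to a numeric value."""
    match = OPERATION_PATTERN.fullmatch(operation.strip())
    if match is None:
        raise ValueError(f"unrecognized operation: {operation!r}")
    name, argument = match.group(1), float(match.group(2))
    if name not in OPERATIONS:
        raise ValueError(f"unsupported operation: {name!r}")
    return OPERATIONS[name](value, argument)


converted = apply_operation("div(100)", 1250)  # 1250 is a made-up sample price
print(converted)
```

Keeping the operation as a name plus argument in the DDXMI means the mapping file stays declarative, while the query generator decides how to emit the conversion in the site query.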
![Page 32: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/32.jpg)
Query Execution Result
![Page 33: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/33.jpg)
Remaining Issues
• Handle attributes: one DTD has an attribute but others don’t, or an attribute in one DTD is an element in others
• Find a more efficient way to generate the DDXMI file automatically when there are many paths in the master DTD
– e.g., tree-to-tree mapping: if two paths are marked as the same and have the same children, the index numbers should be generated automatically
• Migrate to XML schemas instead of DTDs
• Support JOIN and PRODUCT generated by queries
• Move to XQuery and a query engine with distributed query support
• Integrate the individual site query results into one return, as a single data source ready for further analysis
• Provide mechanisms for removing redundancy
• Justify the semantics of the generated query
![Page 34: A Metadata Integration Assistant Generator for Heterogeneous Databases](https://reader036.vdocuments.mx/reader036/viewer/2022062422/56813b46550346895da42557/html5/thumbnails/34.jpg)
Conclusion
• Our prototype uses distributed metadata to generate a GUI tool for describing mappings between master and local databases by assigning index numbers and specifying conversion function names
• Uses Quilt as its XML query language
• A DDXMI file is generated based on the mappings, and is used to translate queries over the virtual master database into sub-queries to local databases
• An experiment testing feasibility is reported, in which three different bibliography databases are integrated
• Implemented with Java Webserver and JavaCC
• Move to real applications, e.g. in the context of the NSF project SEEK (Science Environment for Ecological Knowledge)