benchmarks from usage to evaluationperso.ens-lyon.fr/christian.perez/_media/160614/lsc-j2...mphone...
TRANSCRIPT
Angela Bonifati (Université Lyon 1 & CNRS Liris, BD Team)
2ème Journée LSC du LIP, June 24th 2016
Benchmarks
From Usage To Evaluation
Schema Matching and Mapping Systems
© Journées LSC du LIP 2016, A. Bonifati
Introduction
l Data is inherently heterogeneous n Due to the explosion of online data repositories n Due to the variety of users, who develop a wealth
of applications u At different time u With disparate requirements in their mind
l A fundamental requirement is to translate data across different formats n How data is transformed from one format to
another is done through mappings
© Journées LSC du LIP 2016, A. Bonifati
Mappings Are All Around
l Data integration [Lenzerini 2002] n to specify the relationship between local and
global schemas
S1
S2
S3
Global Schema T
I1
I2
I3 Sources
mappings
© Journées LSC du LIP 2016, A. Bonifati
Mappings Are All Around
l Schema integration [Batini et al. 1986] n to specify the relationship between the input
schemas and the integrated schemas
S1
S2
S3
Integrated Schema
Input Schemas
mappings
© Journées LSC du LIP 2016, A. Bonifati
Mappings Are All Around
l Data exchange [Fagin et al. 2005] n to specify the relationship between source and
target schemas
S T
mappings
Source schema Target Schema
I I J
© Journées LSC du LIP 2016, A. Bonifati
Mappings Are All Around
l Schema evolution [Lerner 2000] n to specify the relationship between the old and
new version of an evolved schema
S1 S1’ S1’’
Evolving Schema S1
mappings
© Journées LSC du LIP 2016, A. Bonifati
How Did It All Start
l One of the first systems to deal with this problem was developed at IBM in 1977: n EXPRESS (EXtraction, Processing and REStructuring
System) [Shu et al. 1977] consists of two languages: u DEFINE that works as a DDL (Data Definition Language) u CONVERT that works as a DTL (Data Translation Language)
and has a total of 9 operators, each of which receives as input a data file, performs the respective transformation and generates an output data file.
l EXPRESS required the users familiarity with the languages and was customized to only one model (hierarchical)
l After that, inter-model transformations were also studied [Tork-Roth et al. 1997] [Atzeni et al. 1997]
© Journées LSC du LIP 2016, A. Bonifati
A Data Transfer Example
Source: Rcd projects: Set of project: Rcd name status
grants: Set of grant: Rcd gid project recipient manager supervisor
contacts: Set of contact: Rcd cid email phone
companies: Set of company: Rcd name official
Target: Rcd projects: Set of project: Rcd code funds: Set of fund: Rcd fid finId
finances: Set of finance: Rcd finId mPhone company
companies: Set of company: Rcd coid name
name status PIX Active E-services Active Clio Inactive
cid email phone Benedikt [email protected] 5827766 Hull [email protected] 5824509 Shrivastava [email protected] 3608776 Belanger [email protected] 3608600 Fernandez [email protected] 3608679
name official AT&T AT&T Research Labs Lucent Lucent Technologies, Bell Labs Innovations
Projects
Grants
Contacts
Companies
gid project recipient manager supervisor g1 PIX AT&T Fernandez Belanger g2 PIX AT&T Shrivastava Belanger g3 E-services Bell-labs Benedikt Hull
Sou
rce
Inst
ance
© Journées LSC du LIP 2016, A. Bonifati
Desired Target Instance
Target: Rcd projects: Set of project: Rcd code funds: Set of fund: Rcd fid finId
finances: Set of finance: Rcd finId mPhone company
companies: Set of company: Rcd coid name
Source: Rcd projects: Set of project: Rcd name status
grants: Set of grant: Rcd gid project recipient manager supervisor
contacts: Set of contact: Rcd cid email phone
companies: Set of company: Rcd name official
Finances
code: E-services Projects
fid finId g3 ???
Funds
code: PIX
fid finId g1 ??? g2 ???
Funds
finId mPhone company ??? 3608679 ??? ??? 3608776 ??? ??? 5827766 ???
coid name Sk2(AT&T) AT&T Sk2(Lucent) Lucent ??? ??? ??? ??? ??? ???
Companies
Target in
stance
© Journées LSC du LIP 2016, A. Bonifati
The Needed Transformation Query LET $doc0 := document("inputXMLfile") RETURN <T> { distinct-values ( FOR $x0 IN $doc0/S/grant, $x1 IN $doc0/S/project, $x2 IN $doc0/S/contact, $x3 IN $doc0/S/contact WHERE $x2/cid/text() = $x0/manager/text() AND $x0/supervisor/text() = $x3/cid/text() AND $x0/project/text() = $x1/name/text() RETURN <project> <code> { $x0/project/text() } </code> { distinct-values ( FOR $x0L1 IN $doc0/S/grant, $x1L1 IN $doc0/S/project, $x2L1 IN $doc0/S/contact, $x3L1 IN $doc0/S/contact WHERE $x2L1/cid/text() = $x0L1/manager/text() AND $x0L1/supervisor/text() = $x3L1/cid/text() AND $x0L1/project/text() = $x1L1/name/text() AND $x0/project/text() = $x0L1/project/text() RETURN <funding> <fid> { $x0L1/gid/text() } </fid> <finId> { "Sk52(", $x0L1/gid/text(), ", ", $x0L1/project/text(), ")" } </finId> </funding> ) } </project> ) } { distinct-values ( FOR $x0 IN $doc0/S/grant, $x1 IN $doc0/S/project, $x2 IN $doc0/S/contact, $x3 IN $doc0/S/contact WHERE $x2/cid/text() = $x0/manager/text() AND $x0/supervisor/text() = $x3/cid/text() AND $x0/project/text() = $x1/name/text() RETURN <finance> <finId> { $x0/gid/text() } </finId> <mPhone> { $x2/phone/text() } </mPhone> <company> { "Sk46(", $x2/phone/text(), ", ", $x0/gid/text(), ")" } </company> </finance> ) } { distinct-values ( FOR $x0 IN $doc0/S/grant, $x1 IN $doc0/S/project, $x2 IN $doc0/S/contact, $x3 IN $doc0/S/contact WHERE $x2/cid/text() = $x0/manager/text() AND $x0/supervisor/text() = $x3/cid/text() AND $x0/project/text() = $x1/name/text() RETURN <company> <coid> { "Sk46(", $x2/phone/text(), ", ", $x0/gid/text(), ")" } </coid> <name> { "Sk49(", $x2/phone/text(), ", ", $x0/gid/text(), ")" } </name> </company> ) } { distinct-values ( FOR $x0 IN $doc0/S/company RETURN <company> <coid> { "Sk93(", $x0/cname/text(), ")" } </coid> <name> { $x0/cname/text() } </name> </company> ) }
© Journées LSC du LIP 2016, A. Bonifati
The Road To Mapping Systems
l The design of data transformations has been a manual task for a long while n Designers had to be familiar with the language n As schemas became larger and more complex, the
task became too laborious, time-consuming and error-prone
l The need of raising the level of abstraction and trying to automate the tasks was soon realized.
l The idea … Mapping Systems
© Journées LSC du LIP 2016, A. Bonifati
Generating Mapping
l Different techniques exist to generate mappings: n Manual, e.g.
u by means of high-level mapping languages, such as [Bernstein et al. 2007]
u by means of sophisticated user interfaces [Altova 2008]
n Semi-automatic, e.g. u By means of designer guidance [Alexe2008] u Via advanced algorithms to do the reasoning instead of
the mapping designer [Madhavan at al. 2001] [Popa et al. 2002][Do et al. 2002][Bonifati et al. 2008]
© Journées LSC du LIP 2016, A. Bonifati
The First Step of a Mapping Task
Source: Rcd projects: Set of project: Rcd name status
grants: Set of grant: Rcd gid project recipient manager supervisor
contacts: Set of contact: Rcd cid email phone
companies: Set of company: Rcd name official
Target: Rcd projects: Set of project: Rcd code funds: Set of fund: Rcd fid finId
finances: Set of finance: Rcd finId mPhone company
companies: Set of company: Rcd coid name
name status PIX Active E-services Active Clio Inactive
cid email phone Benedikt [email protected] 5827766 Hull [email protected] 5824509 Shrivastava [email protected] 3608776 Belanger [email protected] 3608600 Fernandez [email protected] 3608679
name official AT&T AT&T Research Labs Lucent Lucent Technologies, Bell Labs Innovations
Projects
Grants
Contacts
Companies
gid project recipient manager supervisor g1 PIX AT&T Fernandez Belanger g2 PIX AT&T Shrivastava Belanger g3 E-services Bell-labs Benedikt Hull
© Journées LSC du LIP 2016, A. Bonifati
Matching
l Given two schemas as input, the source and the target schema, matching is a process that produces as output a set of matches or correspondences or (simply) lines between the elements of the two schemas
l A match is a triple <Es, Et, e> n Where Es is a set of elements of the source
schema, Et is a set of elements of the target schema, and e specifies a simple relationship (equality or set inclusion) or complex relationship between element in Es and Et
© Journées LSC du LIP 2016, A. Bonifati
The Matching Relationship e
l Depends on the cardinalities of Es and Et l Depends on the semantics:
n Can be a function n Can be an arithmetic operation n Can be a set-theoretic relation (e.g. ≡,overlaps) n Can be a conceptual modeling relationship (e.g.
part-of, subclass-of)
© Journées LSC du LIP 2016, A. Bonifati
Matching: An Alternative Definition
l The matching process [Euzenat et al. 2007] can be seen as a function f from a pair of schemas S and T, an optional input alignment A, a set of matching parameters p and a set of resources r:
A’ = f(S, T, A?, p, r)
l Ultimately, an alignment is a set of correspondences between elements in S and elements in T
© Journées LSC du LIP 2016, A. Bonifati
Matching Examples
l Simple relationship: n Name ≡ Title n Location ≡ Address
l Complex relationship: n speed = velocity x 2.237 n speed x 0.447 = velocity n speed = concat(velocity x 2.237, ‘MPH’) n speed ≥ velocity
Company
Location Name
Source
Address
Organization
Title
Target
© Journées LSC du LIP 2016, A. Bonifati
The matching process
l Can be roughly divided into three steps: n Pre-match: training of classifiers for machine
learning-based matchers, matching parameters (weights, thresholds), adjustments of resources, such as thesauri and constraints
n Match: the actual matching task n Post-match: the user may check and modify the
displayed matches
© Journées LSC du LIP 2016, A. Bonifati
Some Schema Matchers
l Cupid [Madhavan et al. 2001] : based on structural and name similarity
l S-Match [Giunchiglia et al. 2004]: based on semantic closeness
l Coma++ [Aumueller et al. 2005]: based on matching reuse
l LSD [Doan et al. 2001]: based on data value analysis and machine-learning techniques
l iMap [Dhamankar et al. 2004]: suited for complex e expressions
l Similarity Flooding [Melnik et al. 2002]: based on graph similarity
© Journées LSC du LIP 2016, A. Bonifati
Similarity Flooding
© Journées LSC du LIP 2016, A. Bonifati
COMA++
© Journées LSC du LIP 2016, A. Bonifati
Matchings Are Not Enough
Source: Rcd projects: Set of project: Rcd name status
grants: Set of grant: Rcd gid project recipient manager supervisor
contacts: Set of contact: Rcd cid email phone
companies: Set of company: Rcd name official
Target: Rcd projects: Set of project: Rcd code funds: Set of fund: Rcd fid finId
finances: Set of finance: Rcd finId mPhone company
companies: Set of company: Rcd coid name
name status PIX Active E-services Active Clio Inactive
cid email phone Benedikt [email protected] 5827766 Hull [email protected] 5824509 Shrivastava [email protected] 3608776 Belanger [email protected] 3608600 Fernandez [email protected] 3608679
name official AT&T AT&T Research Labs Lucent Lucent Technologies, Bell Labs Innovations
Projects
Grants
Contacts
Companies
gid project recipient manager supervisor g1 PIX AT&T Fernandez Belanger g2 PIX AT&T Shrivastava Belanger g3 E-services Bell-labs Benedikt Hull
Cannot describe the full details of the transformation
© Journées LSC du LIP 2016, A. Bonifati
Source Schema Target Schema
Matchings
Matcher
The Mapping Generation Process
Matching is just the beginning of any mapping generation process
© Journées LSC du LIP 2016, A. Bonifati
Mappings
l Given the source and the target schemas, mapping is a process that takes as input a set of matches between the elements of the two schemas and produces a relationship or constraint e that must hold between their respective instances
l In other words, a mapping is a triple <S, T, e> n Where S is the source schema, T is the target schema,
and e specify a constraint that any instances adhering to S and T must satisfy or an executable statement to transform the instance of S into the instances of T
© Journées LSC du LIP 2016, A. Bonifati
A Mapping Example Source: Rcd
projects: Set of project: Rcd name status
grants: Set of grant: Rcd gid project recipient manager supervisor
contacts: Set of contact: Rcd cid email phone
companies: Set of company: Rcd name official
Target: Rcd projects: Set of project: Rcd code funds: Set of fund: Rcd fid finId
finances: Set of finance: Rcd finId mPhone company
companies: Set of company: Rcd coid name
project(na,st), grant(gid,na,re,ma,su), contact(ma,em,ph) à project(na,FUND), fund(gid,finId), finance (finId,ph,company)
company(company, name)
© Journées LSC du LIP 2016, A. Bonifati
Mappings & Instances
l Mappings are the basic ingredients of many tasks, such as information integration, P2P query answering, data exchange etc.
l In particular, mappings as inter-schema constraints may not be enough to fully specify a unique target instance n There may exist multiple target instances
satisfying the mappings n Finding the best target instance is the goal of the
data exchange problem [Fagin et al. 2005] u The mapping is converted into a executable
transformation script to obtain that particular instance
© Journées LSC du LIP 2016, A. Bonifati
S: Rcd projects: Set of project: Rcd name status
grants: Set of grant: Rcd gid project recipient manager supervisor
contacts: Set of contact: Rcd cid email phone
companies: Set of company: Rcd name official
A Data Exchange Example
T: Rcd projects: Set of project: Rcd code funds: Set of fund: Rcd fid finId
finances: Set of finance: Rcd finId mPhone company
companies: Set of company: Rcd coid name
Target in
stance
Finances
code: E-services Projects
fid finId g3 ???
Funds
code: PIX
fid finId g1 ??? g2 ???
Funds
finId mPhone company ??? 3608679 ??? ??? 3608776 ??? ??? 3608600 ???
coid name ??? AT&T ??? Lucent
Companies
project(na,st), grant(gid,na,re,ma,su), contact(ma,em,ph) à project(na,FUND), fund(gid,finId), finance (finId,ph,company),
company(company, name)
© Journées LSC du LIP 2016, A. Bonifati
Source Schema Target Schema
Matchings User
Transformation Scripts
Matcher
Mapping Generation Engine
Mappings (Dependencies)
Query Engine
The Mapping Generation Process
Data Exchange Engine
Source Instance
Target Instance
© Journées LSC du LIP 2016, A. Bonifati
Research Prototype Systems
l Mapping generation and data exchange are separate tasks n Clio[Popa et al. 2002], HePToX[Bonifati et al. 2010],
Spicy[Bonifati et al. 2008] l Mappings Generation
n the mappings are expressed as high-level assertions in a logical formalism
n A mapping is a source-to-target tuple-generating dependency (or s-t tgd in short)
𝜙(𝑥)→∃ 𝜓(𝑥,𝑦) where n φ(x) (ψ(x,y), resp.) is a conjunction of atoms over the
source (target, resp.) l Data exchange
n The respective module transforms the high-level mappings into transformation scripts (in SQL or XQuery) to generate the target instance.
© Journées LSC du LIP 2016, A. Bonifati
Clio
© Journées LSC du LIP 2016, A. Bonifati
Spicy
© Journées LSC du LIP 2016, A. Bonifati
HepToX
© Journées LSC du LIP 2016, A. Bonifati
Source Schema Target Schema
Matchings User
Transformation Scripts
Matcher
Mapping Generation Engine
Mappings (Dependencies)
Query Engine
Commercial Mapping Systems
Data Exchange Engine
Source Instance
Target Instance
Mapping Engine Mapping generation and Data exchange are merged in one. The system directly creates in some native language the final transformation script
© Journées LSC du LIP 2016, A. Bonifati
Popular Commercial Systems
l Altova Mapforce l Stylus Studio l IBM Rational Data Architect l BizTalk mapper l Adeptia l BEA Aqualogic
© Journées LSC du LIP 2016, A. Bonifati
Stylus Studio
© Journées LSC du LIP 2016, A. Bonifati
Altova MapForce
© Journées LSC du LIP 2016, A. Bonifati
BizTalk Mapper
© Journées LSC du LIP 2016, A. Bonifati
A Mapping Tool Categorization
l All tools provide to the mapping designer: n A graphical representation of the two schemas n A set of graphical transformation constructs
u The granularity and power of these constructs is a main factor of differentiation among the tools
Detailed Specification by the Designer
Intelligence of the mapping tool/ Effort in post-verification
Roughly Commercial
Mapping Tools
Roughly Research
Prototypes
© Journées LSC du LIP 2016, A. Bonifati
Issues in Data Exchange
l When multiple target instances exist, how do we compute the best one?
l Is a given target instance better than another? n Universal solutions
u Introduced in [Fagin et al. 2005] u These are the “most general” target instances, and also
represent the entire space of solutions u Among the universal solutions, the smallest of all and
the most compact one is called the “core”
© Journées LSC du LIP 2016, A. Bonifati
Universal Core Instances T: Rcd
Advised: Rcd sname facid
WorksWith: Rcd
sname facid
S: Rcd PTStud: Rcd
age name
GradStud: Rcd age name
sname facid
Bob N3 Ann N4
WorksWith
PTStud(x,y), à Advised(y,z) GradStud (x,y) à Advised(y,z), WorksWith(y,z)
Source instance PTStud
age name 27 Bob 30 Ann
GradStud
age name 32 John 30 Ann
Advised sname facid Bob N3 Ann N4 John N1
N1 Cathy
A solution:
sname facid Bob N3 Ann N4
WorksWith
Advised sname facid Bob N3 Ann N4 John N1
N2 Ann sname facid Bob N3 Ann N4
WorksWith
Advised sname facid Bob N3 Ann N4 John N1
A universal Solution: The core:
© Journées LSC du LIP 2016, A. Bonifati
Commercial vs. Research Prototype Systems
l Whereas research prototypes (e.g. Clio, Spicy++) are tending to produce target instances that look more and more like the core
l Commercial tools leave the task to the users, who have to manually interact with sophisticated GUIs and write pieces of the transformation manually n No core definition is even considered
Core? No,
thanks
© Journées LSC du LIP 2016, A. Bonifati
Mixing Matching and Mapping
l Matching and Mapping not always by separate tools n Clio has as an add-on a matcher based on
attribute feature analysis [Naumann et al. 2002] n Bernstein’s model management considers the
matcher as a fully integrated and indistinguishable component
n Spicy [Bonifati et al. 2008] has a matcher based on instance-based structural analysis
© Journées LSC du LIP 2016, A. Bonifati
Limitations of Current Systems
l Manual approaches are not applicable to large-scale mapping tasks
l The user/developer has to become familiar with the mapping language and the user interfaces
l The outcome of the mapping process may not respect the user requirements and desired semantics (unsurprisingly!)
l Specifications may be incomplete and dependent of system peculiarities
l Thus, there is a need for a verification and guidance process
© Journées LSC du LIP 2016, A. Bonifati
Source Schema Target Schema
Matchings User
Transformation Scripts
Matcher
Mapping Generation Engine
Mappings (Dependencies)
Query Engine
The Verification Process
Data Exchange Engine
Source Instance
Target Instance
Data Examples
Expected Target
Instance
Verification And
Selection User
© Journées LSC du LIP 2016, A. Bonifati
A-Posteriori Verification
l The main problem with matching and mapping is the dichotomy between the expected results and the generated answers n Some tools allow a post-verification
u by using data examples v Tupelo [Fletcher et al. 2006], Muse[Alexe et al. 2008] Clio
[Alexe et al. 2010] u by using an automatic instance comparison,
v Spicy [Bonifati et al. 2008] u by means of manual user feedback
v unfeasible for large-scale tasks u via debugging techniques
v Routes [Chiticariu et al. 2006]
© Journées LSC du LIP 2016, A. Bonifati
Conclusions and Ongoing work
l Interactive Mapping Refinement with User Data Examples n Independence of the graphical language n Approach driven by data examples that are simple
to build for a generic user (few tuples/no core or universal data examples)
n Obtained mappings have good properties (normalisation, absence of redundancy)
n Joint work with U. Comignani (M2 Lyon1), E. Coquery and R. Thion
l Funding: ANR DataCert (2016-2020) & CNRS Mastodons MedClean 2016 & Palse Impulsion 2016
© Journées LSC du LIP 2016, A. Bonifati
Main Sources
[Euzenat et al. 2007] J. Euzenat and P. Shvaiko, "Ontology matching", Springer-Verlag, 2007
[Bellahsene et al. 2011] Z. Bellahsene, A. Bonifati and E. Rahm, "Schema Matching and Mapping", Springer-Verlag, 2011
© Journées LSC du LIP 2016, A. Bonifati
Thanks and Acknowledgements
Thank you !!
Thanks to my co-author Y. Velegrakis (Univ. of Trento, Italy)
© Journées LSC du LIP 2016, A. Bonifati
List of References (cont’d)
l [Alexe et al. 2008] Alexe B, Chiticariu L, Miller RJ, Tan WC (2008) “Muse: Mapping Understanding and deSign by Example”. In: ICDE, pp 10-19
l [Bonifati et al. 2008] Bonifati A, Mecca G, Pappalardo A, Raunich S, Summa G (2008) “Schema Mapping Verification: The Spicy Way”. In: EDBT, pp 85-96
l [Lenzerini 2002] Lenzerini Maurizio “Data Integration: A Theoretical Perspective”. In: PODS, pp 233-246
l [Miller et al. 2000] Miller RJ, Haas LM, Hernandez MA (2000) “Schema Mapping as Query Discovery”. In: VLDB, pp 77-88
l [Rahm et al. 2001] Rahm E, Bernstein PA (2001) “A survey of approaches to automatic schema matching”. VLDB Journal 10(4):334-35
l [Velegrakis 2005] Velegrakis Y “Managing Schema Mappings in Highly Heterogeneous Environments”. PhD thesis, University of Toronto
© Journées LSC du LIP 2016, A. Bonifati
List of References (cont’d)
l [Altova 2008] Altova (2008) MapForce. Http://www.altova.com l [Batini et al. 1986] Batini C, Lenzerini M, Navathe SB (1986) A
Comparative Analysis of Methodologies for Database Schema Integration. ACM Comp. Surv. 18(4):323-364
l [Bernstein et al. 2007] Bernstein PA, Melnik S (2007) Model management 2.0: manipulating richer mappings. In: SIGMOD, pp 1–12
l [Do et al. 2002] Do HH, Rahm E (2002) COMA - A System for Flexible Combination of Schema Matching Approaches. In: VLDB, pp 610–621
l [Euzenat et al. 2007] Euzenat J, Shvaiko P (2007) Ontology matching. Springer Verlag, Heidelberg
l [Fagin et al. 2005] Fagin R, Kolaitis PG, Miller RJ, Popa L (2005) Data exchange: semantics and query answering. Theoretical Computer Science 336(1):89-124
l [Lerner 2000] Lerner BS (2000) A Model for Compound Type Changes Encountered in Schema Evolution. TPCTC 25(1):83–127
l [Popa et al, 2002] Popa L, Velegrakis Y, Miller RJ, Hernandez MA, Fagin R (2002)Translating Web Data. In: VLDB, pp 598–609
© Journées LSC du LIP 2016, A. Bonifati
List of References (cont’d)
l [Alexe2010] Alexe B, Kolaitis PG, Tan W (2010b) Characterizing Schema Mappings via Data Examples. In: PODS
l [Aumueller2006] Aumueller D, Do HH, Massmann S, Rahm E (2005) Schema and ontology matching with COMA++. In: SIGMOD, pp 906–908
l [Bonifati et al. 2010] Angela Bonifati, Elaine Qing Chang, Terence Ho, Laks V. S. Lakshmanan, Rachel Pottinger, Yongik Chung: Schema mapping and query Translation in heterogeneous P2P XML databases. VLDB J. 19(2): 231-256 (2010)
l [Chiticariu et al. 2006] Chiticariu L, TanWC(2006) Debugging Schema Mappings with Routes. In: VLDB, pp 79–90
l [Dhamankar at al. 2004] Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Y. Halevy, Pedro Domingos: iMAP: Discovering Complex Mappings between Database Schemas. SIGMOD Conference 2004: 383-394
l [Doan et al. 2001] Doan A, Domingos P, Halevy AY (2001) Reconciling schemas of disparate data sources: A machine-learning approach. In: SIGMOD, pp 509–520