QoM: Qualitative and Quantitative Measure of Schema Matching
Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool - presenter)
University of Massachusetts, Lowell
Oct 14th, 2003
22nd International Conference on Conceptual Modeling (ER) 2003
Chicago, Illinois
Introduction
• Integration of information - A big challenge!
• “Data data everywhere and ……
• Problem: Heterogeneous data sources
– Concepts: protein sequence, grams of protein
– Semantics: “protein” for a protein scientist vs “protein” for a nutritionist
– Data Formats: XML, object oriented, relational
– Access Methods: special purpose programs (BLAST), SQL, XQuery
• “need a way to integrate”
Integration of heterogeneous sources
[Figure: Source 1, Source 2, ..., Source n feeding into Integrated Sources]
• Problems:
– Resolve conflicts
– Integrate data
– Interpret results
• Goal:
– Automated / Semi-automated Integration via “Schema Matching”
Schema Matching - The Process
• Schema matching: process of finding “semantic correspondences” between the entities of two or more schemas
– Input: two schemas
– Output: set of matches between the two schemas
• Two entities match if their similarity value is above a threshold
• Similarity values & thresholds are tightly coupled to the algorithm.
– Example:
• CUPID [MBR01] defines the similarity value as the fraction of leaves in the two subtrees that have at least one “strong” link to a leaf in the other subtree.
• A linguistic algorithm’s similarity values are based on the level of matching in a hypernym tree.
– Thresholds are ad hoc
• Problem: A match from one algorithm may not be considered a match by another algorithm!
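The problem stated above can be made concrete with a minimal sketch. The two scoring functions and their thresholds are hypothetical stand-ins chosen for illustration; they are not CUPID or any algorithm cited in the talk. The sketch only shows that the same pair of labels can clear one algorithm's threshold and fail another's.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Hypothetical matcher A: difflib similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def bigram_similarity(a: str, b: str) -> float:
    """Hypothetical matcher B: Jaccard overlap of character bigrams."""
    def bigrams(s: str) -> set:
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if (x | y) else 1.0


# Each matcher comes with its own ad hoc threshold (values are assumptions).
MATCHERS = {
    "A": (char_similarity, 0.6),
    "B": (bigram_similarity, 0.5),
}


def is_match(matcher: str, source: str, target: str) -> bool:
    """Two entities match if their similarity value is above the threshold."""
    score_fn, threshold = MATCHERS[matcher]
    return score_fn(source, target) > threshold


if __name__ == "__main__":
    for name in MATCHERS:
        print(name, is_match(name, "firstName", "name"))
```

Here matcher A scores the pair at 8/13 (about 0.62) and accepts it, while matcher B scores it at 3/8 and rejects it, so the two verdicts cannot be compared without a common quality measure.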
Contributions of Our Work
• Proposal of QoM - Quality of Match metric
– A metric for comparing different matches produced by the match algorithms
• Measurement of QoM
– Qualitative measure: Match Taxonomy
– Quantitative measure: Weight-based Match Model
Outline
• Motivation
• Our Approach
– Unifying Data Model: UML
– Match Taxonomy
– Weight-based Match Model
• Related Work
• Conclusions and Future Work
Unifying Data Model
[Figure: diagram of the recipe source]
Recipe (name, desc)
Ingredient (id, qty, item)
Instruction (id, step)
Cardinality labels shown in the diagram: 1, 1, m, m
VS.
[Figure: XML Schema of the dish source]
<xsd:element name="dish">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="desc" type="xsd:string"/>
      <xsd:element name="item">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="qty" type="xsd:string"/>
            <xsd:element name="item" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="step">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="direction" type="xsd:integer"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
VS.
UML Model (Dish schema)
Dish: -name : String, -desc : String
Item: -qty : String, -item : String
Step: -direction : String
Associations (two): roles “composes of” (1) and “belongs to” (*)
UML Model (Recipe schema)
Recipe: -name : String, -desc : String
Ingredient: -id : String, -qty : String, -item : String
Instruction: -id : String, -step : String
Associations (two): roles “has” (1) and “belongs to” (*)
Definition
a schema S = < C >
a class c = < A, M >
an attribute a = < L, A, T, N, I >
a method m = < A, O, I, pre, post >
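One way to picture these tuples is as plain record types. The sketch below is an illustration only: the attribute fields follow the property names given for micro matches (label, scope, type, atomicity, initializer), while the reading of the method components A, O, I as scope, output and inputs is an assumption, since the slides do not spell them out. All Python names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Attribute:
    """a = < L, A, T, N, I >"""
    label: str                  # L
    scope: str                  # A
    type: str                   # T
    atomicity: str              # N
    initializer: Optional[str]  # I


@dataclass(frozen=True)
class Method:
    """m = < A, O, I, pre, post > (A, O, I meanings are assumed)."""
    scope: str                  # A (assumed: access scope)
    output: str                 # O (assumed: output type)
    inputs: Tuple[str, ...]     # I (assumed: input types)
    pre: str                    # precondition
    post: str                   # postcondition


@dataclass
class SchemaClass:
    """c = < A, M >: a set of attributes and a set of methods."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    methods: List[Method] = field(default_factory=list)


@dataclass
class Schema:
    """S = < C >: a set of classes."""
    classes: List[SchemaClass] = field(default_factory=list)


def private_string(label: str) -> Attribute:
    # Property values as in the slides' example:
    # name = < name, private, String, atomic, null >
    return Attribute(label, "private", "String", "atomic", None)


recipe_schema = Schema([
    SchemaClass("Recipe", [private_string("name"), private_string("desc")]),
    SchemaClass("Ingredient",
                [private_string(x) for x in ("id", "qty", "item")]),
    SchemaClass("Instruction", [private_string(x) for x in ("id", "step")]),
])
```

Frozen dataclasses keep attribute and method tuples immutable and comparable, which is convenient when pairs of them are later compared property by property.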
Qualitative Measure: Taxonomy of Schema Matches
• Goal:
– Describe the “quality” and the “coverage” of match
• Levels:
– Attribute/Method Level … Micro Match
– Class Level … Sub-Macro Match
– Schema Level … Macro Match
Micro Match
• Attributes can be compared based on:
– label (L), scope (A), type (T), atomicity (N), initializer (I)
• Match can be:
– Exact
• Labels: exact string match or synonyms (name vs name)
• Other properties: equivalent values (String vs char[])
– Relaxed
• Labels: “almost same”, same hypernym tree (firstName vs name)
• Other properties: implied values (protected vs private)
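The exact/relaxed distinction can be sketched as a small classifier. The lookup tables below are hypothetical stand-ins for a thesaurus and a hypernym tree, seeded with the pairs named on the slide (the qty/quantity synonym pair is invented for illustration). The rule that an attribute match is exact only when every property matches exactly is also an assumption of this sketch.

```python
from typing import Optional, Tuple

# Hypothetical lookup tables; a real matcher would consult a thesaurus,
# a hypernym tree, and type/scope compatibility rules.
SYNONYMS = {frozenset({"qty", "quantity"})}
SAME_HYPERNYM_TREE = {frozenset({"firstName", "name"})}
EQUIVALENT_VALUES = {frozenset({"String", "char[]"})}
IMPLIED_VALUES = {frozenset({"protected", "private"})}


def match_label(ls: str, lt: str) -> Optional[str]:
    """Classify a label pair as 'exact', 'relaxed', or None (no match)."""
    pair = frozenset({ls, lt})
    if ls == lt or pair in SYNONYMS:
        return "exact"
    if pair in SAME_HYPERNYM_TREE:
        return "relaxed"
    return None


def match_property(ps, pt) -> Optional[str]:
    """Classify a non-label property pair (scope, type, ...)."""
    pair = frozenset({ps, pt})
    if ps == pt or pair in EQUIVALENT_VALUES:
        return "exact"
    if pair in IMPLIED_VALUES:
        return "relaxed"
    return None


def micro_match(a_s: Tuple, a_t: Tuple) -> Optional[str]:
    """a_s, a_t are tuples < L, A, T, N, I >."""
    results = [match_label(a_s[0], a_t[0])]
    results += [match_property(p, q) for p, q in zip(a_s[1:], a_t[1:])]
    if any(r is None for r in results):
        return None
    # Assumption: exact only if every property matched exactly.
    return "exact" if all(r == "exact" for r in results) else "relaxed"
```

For instance, `micro_match(("firstName", "protected", "char[]", "atomic", None), ("name", "private", "String", "atomic", None))` yields `"relaxed"` under these tables, because the label and the scope match only in the relaxed sense.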
Example
[Figure: the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]
• Recipe: name = < name, private, String, atomic, null >
• Dish: name = < name, private, String, atomic, null >
• Result: Exact Match
Example
[Figure: the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]
• name = < name, private, String, atomic, null >
• qty = < qty, private, String, atomic, null >
• Result: Relaxed Match
Sub-Macro Match
• Classes can be compared based on:
– The quality of match of their attributes (micro match)
• Exact vs Relaxed
– The “coverage”: the number of micro matches between the source and the target classes
• Total: all attributes of the source have a match in the target
• Partial: some, but not all, attributes of the source have a match in the target
[Figure: Total Match vs Partial Match]
Sub-Macro Match (Cont’)
[Figure: the four sub-macro match types: Total Exact Match, Partial Exact Match, Total Relaxed Match, Partial Relaxed Match]
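Combining quality and coverage can be sketched as follows. The function name and input format are hypothetical: the input maps each source attribute to the quality of its micro match ("exact", "relaxed", or None when it has no match in the target). Treating a class match as Relaxed as soon as one of its micro matches is relaxed is an assumption of this sketch.

```python
from typing import Dict, Optional


def classify_sub_macro(micro: Dict[str, Optional[str]]) -> Optional[str]:
    """Classify a source class against a target class.

    micro maps every source attribute to 'exact', 'relaxed', or None.
    """
    matched = [quality for quality in micro.values() if quality is not None]
    if not matched:
        return None  # no attribute of the source matched at all
    # Coverage: Total if every source attribute has a match in the target.
    coverage = "Total" if len(matched) == len(micro) else "Partial"
    # Quality: Exact only if all micro matches are exact (assumption).
    quality = "Exact" if all(q == "exact" for q in matched) else "Relaxed"
    return f"{coverage} {quality} Match"


# Ingredient (id, qty, item) as source, Item (qty, item) as target.
ingredient_vs_item = {"id": None, "qty": "exact", "item": "exact"}
# Item as source, Ingredient as target.
item_vs_ingredient = {"qty": "exact", "item": "exact"}

if __name__ == "__main__":
    print(classify_sub_macro(ingredient_vs_item))
    print(classify_sub_macro(item_vs_ingredient))
```

Because coverage is measured from the source side, the classification is not symmetric: swapping source and target can turn a partial match into a total one.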
Example
[Figure: the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]
• Recipe attributes: name, desc
• Dish attributes: name, desc
• Recipe vs Dish: Total Exact Match
Example
[Figure: the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]
• Ingredient attributes: id, qty, item
• Item attributes: qty, item
• Ingredient (source) vs Item (target): Partial Exact Match
• Item (source) vs Ingredient (target): Total Exact Match
Macro Match
• Schemas can be compared in a similar manner to classes
• Schemas can be compared based on:
– The quality of match of their classes (sub-macro match)
• Total Exact, Total Relaxed, Partial Exact and Partial Relaxed
– The “coverage”: the number of sub-macro matches between the source and the target schemas
• Total: all classes of the source have a match in the target
• Partial: some, but not all, classes of the source have a match in the target
Macro Match
[Figure: the four macro match types: Total Exact Match, Partial Exact Match, Total Relaxed Match, Partial Relaxed Match]
Example
[Figure: the Recipe schema and the Dish schema, with class-level match labels TE, PE, PE]
• Recipe schema classes: Recipe, Ingredient, Instruction
• Dish schema classes: Dish, Item, Step
• Recipe schema (source) vs Dish schema (target): Partial Exact Match
• Dish schema (source) vs Recipe schema (target): Total Exact Match
Quantitative Measure: Weight-Based Measure of QoM
• Match Taxonomy:
– Qualitative measure of match between two entities
– Can distinguish between a total exact and a partial exact match, or a total exact and a partial relaxed match
– Cannot decide if one partial exact match is better than another, or if a total relaxed match is better than a partial exact match
• Weight Based Measure:
– Provides a quantitative metric for the QoM
Weight-based Match Model
• Match Value: “weight” of each match operator representing the match between two properties
• Example: Label match
– Name vs Name (exact match operator, =)
– W(Ls, Lt) = 1.0
[Table: Match operator / Weight; only the “=” operator survives in the transcript]
What has been done - Related Work
• Domain specific [BHP94,BCVB01,BM01] and domain independent [HMN+99,MBR01,DR02] algorithms
• Approaches exploit various types of information
– Element names, structural properties, ontologies, characteristics of data instances.
• Example:
– Doan et al. [DDH01]
• Combines match predictions using a set of machine learning techniques
• Match predictions based on element name matching, content matching, text classification and domain knowledge
– Madhavan et al. (Cupid) [MBR01]
• Hybrid algorithm - combines linguistic and structural match algorithm
Conclusions and Future Work
• Contributions:
– Proposed QoM: a quality metric for schema matches
– Two techniques to evaluate the QoM
• Qualitative: Match Taxonomy
• Quantitative: Weight-based Match Model
• Future Work:
– Combining “user input” for desired matches to optimize the schema match process
– Refinement of QoM for XML model
• Accounting for order, and the different levels of nesting
– Development of Match algorithms based on QoM
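Micro Match Model
Attribute match:
$$\mathrm{QoM}(a_s, a_t) = \frac{W(L_s, L_t) + W(A_s, A_t) + W(T_s, T_t) + W(N_s, N_t) + W(I_s, I_t)}{5}$$
Method match:
$$\mathrm{QoM}(m_s, m_t) = \frac{\mathrm{QoM}_{sig}(m_s, m_t) + 2 \cdot \mathrm{QoM}_{spec}(m_s, m_t)}{3}$$
$$\mathrm{QoM}_{sig}(m_s, m_t) = \frac{W(A_s, A_t) + W(O_s, O_t) + W(I_s, I_t)}{3}$$
$$\mathrm{QoM}_{spec}(m_s, m_t) = \frac{W(pre_s, pre_t) + W(post_s, post_t)}{2}$$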
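These formulas translate directly into a few small functions. The sketch below is illustrative and its function names are hypothetical. The slides give a weight of 1.0 for an exact property match and use 0.5 for a relaxed label match in their worked example; the method weights in the usage note are invented purely to exercise the formula.

```python
def qom_attribute(w_label: float, w_scope: float, w_type: float,
                  w_atomicity: float, w_initializer: float) -> float:
    """QoM(a_s, a_t): normalized sum of the five property weights."""
    return (w_label + w_scope + w_type + w_atomicity + w_initializer) / 5


def qom_signature(w_a: float, w_o: float, w_i: float) -> float:
    """QoM_sig(m_s, m_t): normalized sum over the A, O and I components."""
    return (w_a + w_o + w_i) / 3


def qom_specification(w_pre: float, w_post: float) -> float:
    """QoM_spec(m_s, m_t): normalized sum over pre- and postconditions."""
    return (w_pre + w_post) / 2


def qom_method(sig: float, spec: float) -> float:
    """QoM(m_s, m_t): the specification counts twice as much as the signature."""
    return (sig + 2 * spec) / 3


if __name__ == "__main__":
    # All five properties match exactly.
    print(qom_attribute(1.0, 1.0, 1.0, 1.0, 1.0))
    # Relaxed label match (0.5), everything else exact.
    print(qom_attribute(0.5, 1.0, 1.0, 1.0, 1.0))
    # Hypothetical method: exact signature, relaxed postcondition.
    print(qom_method(qom_signature(1.0, 1.0, 1.0),
                     qom_specification(1.0, 0.5)))
```

The first two calls give 1.0 and 0.9. In the third, the signature scores 1.0 and the specification 0.75, so the method QoM is (1.0 + 1.5) / 3, about 0.83: a weakness in the specification lowers the score more than the same weakness in the signature would.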
Example
[Figure: The Recipe Schema (classes Recipe, Ingredient, Instruction) and The Dish Schema (classes Dish, Item, Step)]
Micro Match (Attribute)
as = < Ls, As, Ts, Ns, Is > vs at = < Lt, At, Tt, Nt, It >
• Ls vs Lt
• As vs At
• Ts vs Tt
• Ns vs Nt
• Is vs It
[Figure columns: Exact Match, Relaxed Match]
Micro Match (Method)
ms = < As, Os, Is, Pres, Posts > vs mt = < At, Ot, It, Pret, Postt >
• As vs At
• Os vs Ot
• Is vs It
• Pres vs Pret
• Posts vs Postt
[Figure columns: Exact Match, Relaxed Match]
Weighing the Micro Match
• Match between attributes based on the match of the individual properties
– Exact or relaxed
• QoM(as, at):
– Quantitative measure of the match between attributes as and at.
– The normalized sum of the match values of the individual properties of an attribute.
Example
[Figure: the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]
• Recipe: name = < name, private, String, atomic, null >
• Dish: name = < name, private, String, atomic, null >
QoM(name_Recipe, name_Dish) = (1.0 + 1.0 + 1.0 + 1.0 + 1.0) / 5 = 1.0
Weighing the Sub-Macro Match
• Sub-Macro match:
– Normalized sum of QoM of micro matches:
$$R_W(C_s, C_t) = \frac{\sum \mathrm{QoM}(M_s, M_t)}{|C_s|}$$
– Coverage:
$$R_S(C_s, C_t) = \frac{|C_{ms}|}{|C_s|}, \qquad R_T(C_s, C_t) = \frac{|C_{ms}|}{|C_t|}$$
– Sub-Macro Match:
$$\mathrm{QoM}(C_s, C_t) = \frac{R_W(C_s, C_t) + R_S(C_s, C_t) + R_T(C_s, C_t)}{3}$$
Example
[Figure: the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]
• Instruction attributes: id, step
• Step attributes: direction
• step vs direction: relaxed (R) label match
QoM(step_Instruction, direction_Step) = (0.5 + 1.0 + 1.0 + 1.0 + 1.0) / 5 = 0.9
RW(Instruction, Step) = 0.9 / 2 = 0.45
RS(Instruction, Step) = 1 / 2 = 0.5
RT(Instruction, Step) = 1 / 1 = 1.0
QoM(Instruction, Step) = (0.45 + 0.5 + 1.0) / 3 = 0.65
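The Instruction vs Step computation can be reproduced with a short sketch. The function name and argument layout are hypothetical, and the sketch assumes one-to-one matches, so that the same count of matched elements is used for both the source-side and the target-side coverage ratios.

```python
from typing import Dict, List


def qom_sub_macro(micro_qoms: List[float], n_source: int,
                  n_target: int) -> Dict[str, float]:
    """Class-level QoM from the QoM values of the matched micro pairs.

    micro_qoms: one QoM value per matched attribute/method pair.
    n_source, n_target: number of elements in the source/target class.
    """
    n_matched = len(micro_qoms)           # assumes one-to-one matches
    r_w = sum(micro_qoms) / n_source      # normalized sum of micro QoMs
    r_s = n_matched / n_source            # coverage of the source class
    r_t = n_matched / n_target            # coverage of the target class
    return {"RW": r_w, "RS": r_s, "RT": r_t,
            "QoM": (r_w + r_s + r_t) / 3}


if __name__ == "__main__":
    # Instruction (id, step) vs Step (direction): one relaxed match, QoM 0.9.
    print(qom_sub_macro([0.9], n_source=2, n_target=1))
    # Ingredient (id, qty, item) vs Item (qty, item): two exact matches.
    print(qom_sub_macro([1.0, 1.0], n_source=3, n_target=2))
```

The first call gives RW = 0.45, RS = 0.5, RT = 1.0 and a QoM of 0.65. The second gives a QoM of about 0.78 for Ingredient vs Item, so the weight-based measure ranks two partial matches that the taxonomy alone cannot order.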
Weighing the Macro Match
• Macro match:
– Normalized sum of sub-macro QoMs:
$$R_W(S_s, S_t) = \frac{\sum \mathrm{QoM}(C_s, C_t)}{|S_s|}$$
– Coverage:
$$R_S(S_s, S_t) = \frac{|S_{ms}|}{|S_s|}, \qquad R_T(S_s, S_t) = \frac{|S_{ms}|}{|S_t|}$$
– Macro Match:
$$\mathrm{QoM}(S_s, S_t) = \frac{R_W(S_s, S_t) + R_S(S_s, S_t) + R_T(S_s, S_t)}{3}$$
Example
[Figure: the Recipe schema and the Dish schema, with class-level QoM values]
• Recipe schema classes: Recipe, Ingredient, Instruction
• Dish schema classes: Dish, Item, Step
• Class-level QoM: Recipe vs Dish = 1.00, Ingredient vs Item = 0.78, Instruction vs Step = 0.65
RW(RECIPE, DISH) = (1.00 + 0.78 + 0.65) / 3 = 0.81
RS(RECIPE, DISH) = 3 / 3 = 1.0
RT(RECIPE, DISH) = 3 / 3 = 1.0
QoM(RECIPE, DISH) = (0.81 + 1.0 + 1.0) / 3 = 0.94
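The schema-level figure can be checked with a minimal sketch that applies the same normalization one level up. The function name is hypothetical, the class-level QoM values are the rounded ones quoted on the slide, and matches between classes are assumed to be one-to-one.

```python
from typing import Dict, List


def qom_macro(class_qoms: List[float], n_source_classes: int,
              n_target_classes: int) -> Dict[str, float]:
    """Schema-level QoM from the QoM values of the matched class pairs."""
    n_matched = len(class_qoms)                  # assumes one-to-one matches
    r_w = sum(class_qoms) / n_source_classes     # normalized sum of class QoMs
    r_s = n_matched / n_source_classes           # coverage of the source schema
    r_t = n_matched / n_target_classes           # coverage of the target schema
    return {"RW": r_w, "RS": r_s, "RT": r_t,
            "QoM": (r_w + r_s + r_t) / 3}


if __name__ == "__main__":
    # Recipe vs Dish, Ingredient vs Item, Instruction vs Step (slide values).
    result = qom_macro([1.00, 0.78, 0.65],
                       n_source_classes=3, n_target_classes=3)
    print({key: round(value, 2) for key, value in result.items()})
```

Rounded to two decimals this gives RW = 0.81, RS = 1.0, RT = 1.0 and QoM = 0.94, in line with the values above.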