QoM: Qualitative and Quantitative Measure of Schema Matching

Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool - presenter)

University of Massachusetts, Lowell, Oct 14th, 2003

22nd International Conference on Conceptual Modeling (ER) 2003, Chicago, Illinois



• Integration of information - A big challenge!

• “Data data everywhere and ……

• Problem: Heterogeneous data sources

• Concepts: protein sequence, grams of protein

• Semantics: “protein” for a protein scientist vs “protein” for a nutritionist

• Data Formats: XML, object oriented, relational

• Access Methods: special purpose programs (BLAST), SQL, XQuery

• “need a way to integrate”

Introduction


Integration of heterogeneous sources

[Figure: Source 1, Source 2, …, Source n combined into the Integrated Sources]

Problems:

- Resolve conflicts
- Integrate data
- Interpret results

Goal:

- Automated / Semi-automated Integration via “Schema Matching”


Schema Matching - The Process

• Schema matching: the process of finding “semantic correspondences” between the entities of two or more schemas

– Input: two schemas

– Output: a set of matches between the two schemas

• Two entities match if their similarity value is above a threshold

• Similarity values and thresholds are tightly coupled to the algorithm

– Example:

• CUPID [MBR01] defines the similarity value as the fraction of leaves in the two subtrees that have at least one “strong” link to a leaf in the other subtree.

• A linguistic algorithm’s similarity values are based on the level of matching in a hypernym tree.

– Thresholds are ad hoc

• Problem: a match from one algorithm may not be considered a match by another algorithm!
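To make the problem concrete, here is a minimal Python sketch in which two stand-in similarity measures, each with its own ad hoc threshold, disagree on the same pair of labels. Neither measure is CUPID or the linguistic algorithm cited above; `edit_similarity`, `token_similarity`, and the 0.6 thresholds are hypothetical choices made only for illustration.

```python
import re
from difflib import SequenceMatcher


def edit_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of camelCase tokens, e.g. "firstName" -> {"first", "name"}."""
    def tokens(s: str) -> set:
        return {t.lower() for t in re.findall(r"[A-Z]?[a-z0-9]+", s)}

    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Hypothetical "algorithms": each couples a similarity function to its own threshold.
ALGORITHMS = {
    "edit": (edit_similarity, 0.6),
    "token": (token_similarity, 0.6),
}


def is_match(algorithm: str, a: str, b: str) -> bool:
    """Two entities match if their similarity value is above the threshold."""
    similarity, threshold = ALGORITHMS[algorithm]
    return similarity(a, b) > threshold


if __name__ == "__main__":
    for name in ALGORITHMS:
        print(name, is_match(name, "firstName", "name"))
```

Here `firstName` vs `name` scores 8/13 (about 0.62) under the edit measure but 0.5 under the token measure, so the first stand-in reports a match and the second does not.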


Contributions of Our Work

• Proposal of QoM - Quality of Match metric

– A metric for comparing different matches produced by the match algorithms

• Measurement of QoM

– Qualitative measure: Match Taxonomy

– Quantitative measure: Weight-based Match Model


Outline

• Motivation

• Our Approach

– Unifying Data Model: UML

– Match Taxonomy

– Weight-based Match Model

• Related Work

• Conclusions and Future Work


Unifying Data Model

[Figure: the Recipe schema as an ER diagram: Recipe (name, desc) related 1:m to Ingredient (id, qty, item) and 1:m to Instruction (id, step)]

<xsd:element name="dish">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="desc" type="xsd:string"/>
      <xsd:element name="item">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="qty" type="xsd:string"/>
            <xsd:element name="item" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="step">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="direction" type="xsd:integer"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

VS.

[Figure: both schemas mapped to UML models]

UML Model of the Recipe schema:
Recipe: -name : String, -desc : String
Ingredient: -id : String, -qty : String, -item : String
Instruction: -id : String, -step : String
Recipe -has (1) / -belongs to (*) Ingredient and Instruction

UML Model of the Dish schema:
Dish: -name : String, -desc : String
Item: -qty : String, -item : String
Step: -direction : String
Dish -composes of (1) / -belongs to (*) Item and Step


Definition

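a schema S = < C >, where C is its set of classes

a class c = < A, M >, where A is its set of attributes and M its set of methods

an attribute a = < L, A, T, N, I >: label, scope, type, atomicity, initializer

a method m = < A, O, I, pre, post >, where pre and post are its pre- and post-conditions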

[Figure: UML model of the Recipe schema (Recipe, Ingredient, Instruction)]
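A minimal Python sketch of these tuples as dataclasses, populated with the Recipe schema. The `name` fields on `Schema` and `UmlClass` are added for readability and are not part of the tuples; reading O as the output (return) type and I as the input parameters of a method is an assumption, since the slides do not expand those letters.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Attribute:                        # a = <L, A, T, N, I>
    label: str                          # L
    scope: str                          # A, e.g. "private"
    type_name: str                      # T, e.g. "String"
    atomic: bool                        # N (atomicity)
    initializer: Optional[str] = None   # I (None plays the role of "null")


@dataclass(frozen=True)
class Method:                           # m = <A, O, I, pre, post>
    access: str                         # A
    output: str                         # O (assumed: return type)
    inputs: Tuple[str, ...]             # I (assumed: parameter types)
    pre: str                            # pre-condition
    post: str                           # post-condition


@dataclass
class UmlClass:                         # c = <A, M>
    name: str                           # added for readability
    attributes: List[Attribute] = field(default_factory=list)  # A
    methods: List[Method] = field(default_factory=list)        # M


@dataclass
class Schema:                           # S = <C>
    name: str                           # added for readability
    classes: List[UmlClass] = field(default_factory=list)      # C


def private_string(label: str) -> Attribute:
    """Shorthand for the private, atomic String attributes in the example."""
    return Attribute(label, "private", "String", True)


recipe_schema = Schema("Recipe", [
    UmlClass("Recipe", [private_string("name"), private_string("desc")]),
    UmlClass("Ingredient", [private_string(label) for label in ("id", "qty", "item")]),
    UmlClass("Instruction", [private_string("id"), private_string("step")]),
])
```

Freezing `Attribute` and `Method` makes them hashable and comparable by value, which is convenient when property tuples are compared pairwise.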


Qualitative Measure: Taxonomy of Schema Matches

• Goal:

• Describe the “quality” and the “coverage” of a match

Attribute/Method Level … Micro Match

Class Level … Sub-Macro Match

Schema Level … Macro Match

[Figure: matches between two models Mi and Mj at the attribute/method, class, and schema levels]


Micro Match

• Attributes can be compared based on: label (L), scope (A), type (T), atomicity (N), initializer (I)

• A match can be:

• Exact:

• Labels: exact string match or synonyms (name vs name)

• Other properties: equivalent values (String vs char[])

• Relaxed:

• Labels: “almost the same”, or in the same hypernym tree (firstName vs name)

• Other properties: implied values (protected vs private)
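The exact/relaxed distinction can be sketched in Python as below. The lookup tables are hypothetical and seeded mostly with the slide's own examples (the `qty`/`quantity` synonym is invented for illustration); the slides do not say which thesaurus or hypernym tree is used, so camelCase-token containment stands in for “almost the same” labels.

```python
import re
from typing import Optional

# Hypothetical lookup tables.
SYNONYMS = {frozenset({"qty", "quantity"})}       # label synonyms -> exact
EQUIVALENT = {frozenset({"String", "char[]"})}    # equivalent values -> exact
IMPLIED = {frozenset({"protected", "private"})}   # implied values -> relaxed


def tokens(label: str) -> set:
    """Split a camelCase label into lower-case tokens: "firstName" -> {"first", "name"}."""
    return {t.lower() for t in re.findall(r"[A-Z]?[a-z0-9]+", label)}


def label_match(ls: str, lt: str) -> Optional[str]:
    """Classify a label pair as "exact", "relaxed", or None (no match)."""
    a, b = ls.lower(), lt.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return "exact"
    ts, tt = tokens(ls), tokens(lt)
    # Stand-in for "almost the same": one label's tokens contain the other's.
    if ts and tt and (ts <= tt or tt <= ts):
        return "relaxed"
    return None


def property_match(vs: Optional[str], vt: Optional[str]) -> Optional[str]:
    """Classify any other property pair (scope, type, atomicity, initializer)."""
    if vs == vt or frozenset({vs, vt}) in EQUIVALENT:
        return "exact"
    if frozenset({vs, vt}) in IMPLIED:
        return "relaxed"
    return None
```

For example, `label_match("firstName", "name")` returns `"relaxed"` and `property_match("String", "char[]")` returns `"exact"`.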


Example

[Figure: UML models of the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]

name = < name, private, String, atomic, null >

name = < name, private, String, atomic, null >

Exact Match


Example

[Figure: UML models of the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]

name = < name, private, String, atomic, null >

qty = < qty, private, String, atomic, null >

Relaxed Match


Sub-Macro Match

[Figure: Total Match vs. Partial Match]

• Classes can be compared based on:

– The quality of match of their attributes (micro match)

• Exact vs. Relaxed

– The “coverage”: the number of micro matches between the source and the target classes

• Total: all attributes of the source have a match in the target

• Partial: some, but not all, attributes of the source have a match in the target


Sub-Macro Match (Cont’)

• Total Exact Match

• Partial Exact Match

• Total Relaxed Match

• Partial Relaxed Match


Example

[Figure: UML models of the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]

Recipe (name, desc) vs Dish (name, desc)

Recipe Total Exact Match Dish


Example

[Figure: UML models of the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]

Ingredient (id, qty, item) vs Item (qty, item)

Ingredient Partial Exact Match Item

Item Total Exact Match Ingredient
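A Python sketch of the sub-macro classification, reproducing the two examples. Coverage follows the slides' Total/Partial rule; treating a class match as Exact only when every micro match is exact is an assumption, since the slides do not state how mixed exact and relaxed micro matches combine.

```python
from typing import Dict, Optional


def sub_macro_match(micro: Dict[str, Optional[str]]) -> Optional[str]:
    """Classify a class match from its micro matches.

    `micro` maps each source attribute to "exact", "relaxed", or None
    (no match in the target class).
    """
    found = [kind for kind in micro.values() if kind is not None]
    if not found:
        return None
    coverage = "Total" if len(found) == len(micro) else "Partial"
    # Assumption: a single relaxed micro match makes the class match relaxed.
    quality = "Exact" if all(kind == "exact" for kind in found) else "Relaxed"
    return f"{coverage} {quality} Match"


print(sub_macro_match({"name": "exact", "desc": "exact"}))             # Recipe -> Dish
print(sub_macro_match({"id": None, "qty": "exact", "item": "exact"}))  # Ingredient -> Item
print(sub_macro_match({"qty": "exact", "item": "exact"}))              # Item -> Ingredient
```

Because coverage is counted from the source side, swapping source and target can change the result, which is why Ingredient to Item is Partial Exact while Item to Ingredient is Total Exact.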


Macro Match

• Schemas can be compared in a similar manner to classes

• Schemas can be compared based on:

– The quality of match of their classes (sub-macro match)

• Total Exact, Total Relaxed, Partial Exact, and Partial Relaxed

– The “coverage”: the number of sub-macro matches between the source and the target schemas

• Total: all classes of the source have a match in the target

• Partial: some, but not all, classes of the source have a match in the target


Macro Match

• Total Exact Match

• Partial Exact Match

• Total Relaxed Match

• Partial Relaxed Match


Example

[Figure: UML models of the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]

Recipe schema: Recipe, Ingredient, Instruction

Dish schema: Dish, Item, Step

Sub-macro matches: TE, PE, PE (TE = Total Exact, PE = Partial Exact)

Recipe Partial Exact Match Dish

Dish Total Exact Match Recipe


Quantitative Measure: Weight-Based Measure of QoM

• Match Taxonomy:

– Qualitative measure of the match between two entities

– Can distinguish between a total exact and a partial exact match, or between a total exact and a partial relaxed match

– Cannot decide whether one partial exact match is better than another, or whether a total relaxed match is better than a partial exact match

• Weight-based Measure:

– Provides a quantitative metric for the QoM


Weight-based Match Model

• Match Value: the “weight” of each match operator, representing the match between two properties

[Table: match operators and their weights]

• Example: label match

– Name vs Name: $W(l_s, l_t) = 1.0$


What Has Been Done: Related Work

• Domain-specific [BHP94, BCVB01, BM01] and domain-independent [HMN+99, MBR01, DR02] algorithms

• Approaches exploit various types of information

– Element names, structural properties, ontologies, characteristics of data instances

• Examples:

– Doan et al. [DDH01]

• Combines match predictions using a set of machine learning techniques

• Match predictions are based on element name matching, content matching, text classification, and domain knowledge

– Madhavan et al. (Cupid) [MBR01]

• Hybrid algorithm: combines linguistic and structural match algorithms


Conclusions and Future Work

• Contributions:

– Proposed QoM: a quality metric for schema matches

– Two techniques to evaluate the QoM

• Qualitative: Match Taxonomy

• Quantitative: Weight-based Match Model

• Future Work:

– Combining “user input” for desired matches to optimize the schema match process

– Refinement of QoM for XML model

• Accounting for order, and the different levels of nesting

– Development of Match algorithms based on QoM


More Information: http://www.cs.uml.edu/dsl



Micro Match Model

$$QoM(a_s, a_t) = \frac{W(L_s, L_t) + W(A_s, A_t) + W(T_s, T_t) + W(N_s, N_t) + W(I_s, I_t)}{5}$$

$$QoM(m_s, m_t) = \frac{QoM_{sig}(m_s, m_t) + 2 \cdot QoM_{spec}(m_s, m_t)}{3}$$

$$QoM_{sig}(m_s, m_t) = \frac{W(A_s, A_t) + W(O_s, O_t) + W(I_s, I_t)}{3}$$

$$QoM_{spec}(m_s, m_t) = \frac{W(pre_s, pre_t) + W(post_s, post_t)}{2}$$
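These formulas translate directly into Python. The weights are assumptions read off the examples in this deck: 1.0 for an exact match (the label example) and 0.5 for a relaxed match (the step/direction example); 0.0 for no match is assumed. Each argument is the already-decided match kind of one property pair.

```python
from enum import Enum


class Match(Enum):
    """Match operators with assumed weights W."""
    EXACT = 1.0
    RELAXED = 0.5
    NONE = 0.0   # assumed: properties that do not match contribute nothing


def W(kind: Match) -> float:
    return kind.value


def qom_attribute(label: Match, scope: Match, type_: Match,
                  atomicity: Match, initializer: Match) -> float:
    """QoM(a_s, a_t): mean weight of the five attribute properties."""
    return (W(label) + W(scope) + W(type_) + W(atomicity) + W(initializer)) / 5


def qom_method(access: Match, output: Match, inputs: Match,
               pre: Match, post: Match) -> float:
    """QoM(m_s, m_t): the specification part is weighted twice the signature part."""
    qom_sig = (W(access) + W(output) + W(inputs)) / 3
    qom_spec = (W(pre) + W(post)) / 2
    return (qom_sig + 2 * qom_spec) / 3


E, R = Match.EXACT, Match.RELAXED
print(qom_attribute(E, E, E, E, E))  # name vs name      -> 1.0
print(qom_attribute(R, E, E, E, E))  # step vs direction -> 0.9
```

Keeping the match kind separate from its weight means the weights can be retuned without touching the formulas.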


Example

[Figure: The Recipe Schema and The Dish Schema, as UML models]


Micro Match (Attribute)

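$a_s = \langle L_s, A_s, T_s, N_s, I_s \rangle$ vs $a_t = \langle L_t, A_t, T_t, N_t, I_t \rangle$

$L_s$ vs $L_t$

$A_s$ vs $A_t$

$T_s$ vs $T_t$

$N_s$ vs $N_t$

$I_s$ vs $I_t$

Each property pair is classified as an Exact Match or a Relaxed Match.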


Micro Match (Method)

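$m_s = \langle A_s, O_s, I_s, Pre_s, Post_s \rangle$ vs $m_t = \langle A_t, O_t, I_t, Pre_t, Post_t \rangle$

$A_s$ vs $A_t$

$O_s$ vs $O_t$

$I_s$ vs $I_t$

$Pre_s$ vs $Pre_t$

$Post_s$ vs $Post_t$

Each property pair is classified as an Exact Match or a Relaxed Match.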


Weighing the Micro Match

• The match between attributes is based on the match of the individual properties

– Exact or relaxed

• $QoM(a_s, a_t)$:

– Quantitative measure of the match between attributes $a_s$ and $a_t$

– The normalized sum of the match values of the individual properties of an attribute


Example

[Figure: UML models of the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]

name = < name, private, String, atomic, null >

name = < name, private, String, atomic, null >

$$QoM(name_{Recipe}, name_{Dish}) = \frac{1.0 + 1.0 + 1.0 + 1.0 + 1.0}{5} = 1.0$$


Weighing the Sub-Macro Match

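• Sub-Macro match:

– Normalized sum of the QoM of the micro matches:

$$RW(C_s, C_t) = \frac{\sum QoM(M_s, M_t)}{|C_s|}$$

– Coverage, where $|C_{ms}|$ is the number of micro matches between $C_s$ and $C_t$, and $|C_s|$ and $|C_t|$ are the sizes of the source and target classes (in the examples, their attribute counts):

$$RS(C_s, C_t) = \frac{|C_{ms}|}{|C_s|} \qquad RT(C_s, C_t) = \frac{|C_{ms}|}{|C_t|}$$

– Sub-Macro Match:

$$QoM(C_s, C_t) = \frac{RW(C_s, C_t) + RS(C_s, C_t) + RT(C_s, C_t)}{3}$$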
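A minimal Python sketch of the sub-macro formulas, assuming the micro-match QoM values have already been computed (for instance with the attribute formula from the Micro Match Model). `micro_qoms` holds one value per matched pair, so its length plays the role of $|C_{ms}|$; the function name and signature are illustrative.

```python
from typing import Dict, Sequence


def sub_macro_qom(micro_qoms: Sequence[float], n_source: int, n_target: int) -> Dict[str, float]:
    """Return RW, RS, RT and QoM(C_s, C_t).

    micro_qoms: QoM of each micro match between C_s and C_t
    n_source, n_target: |C_s| and |C_t|
    """
    if n_source == 0 or n_target == 0:
        raise ValueError("classes must be non-empty")
    matched = len(micro_qoms)          # |C_ms|
    rw = sum(micro_qoms) / n_source    # weighted match, normalized by |C_s|
    rs = matched / n_source            # source coverage
    rt = matched / n_target            # target coverage
    return {"RW": rw, "RS": rs, "RT": rt, "QoM": (rw + rs + rt) / 3}


# Instruction (id, step) vs Step (direction): one relaxed micro match worth 0.9.
result = sub_macro_qom([0.9], n_source=2, n_target=1)
print({k: round(v, 2) for k, v in result.items()})
```

Rounded to two places, this gives RW = 0.45, RS = 0.5, RT = 1.0 and QoM = 0.65, the values in the Instruction/Step example.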


Example

[Figure: UML models of the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]

Instruction (id, step) vs Step (direction): step and direction form a relaxed (R) micro match

$$QoM(step_{Instruction}, direction_{Step}) = \frac{0.5 + 1.0 + 1.0 + 1.0 + 1.0}{5} = 0.9$$

$$RW(Instruction, Step) = 0.9 / 2 = 0.45$$

$$RS(Instruction, Step) = 1 / 2 = 0.5$$

$$RT(Instruction, Step) = 1 / 1 = 1.0$$

$$QoM(Instruction, Step) = (0.45 + 0.5 + 1.0) / 3 = 0.65$$


Weighing the Macro Match

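• Macro match:

– Normalized sum of the sub-macro QoMs:

$$RW(S_s, S_t) = \frac{\sum QoM(C_s, C_t)}{|S_s|}$$

– Coverage, where $|S_{ms}|$ is the number of sub-macro matches between $S_s$ and $S_t$, and $|S_s|$ and $|S_t|$ are the numbers of classes in each schema:

$$RS(S_s, S_t) = \frac{|S_{ms}|}{|S_s|} \qquad RT(S_s, S_t) = \frac{|S_{ms}|}{|S_t|}$$

– Macro Match:

$$QoM(S_s, S_t) = \frac{RW(S_s, S_t) + RS(S_s, S_t) + RT(S_s, S_t)}{3}$$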
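Because the macro formulas have the same shape as the sub-macro ones, one helper can serve both levels. The Python sketch below chains them on the Recipe/Dish example. The micro-match values (1.0 for exact pairs, 0.9 for step/direction) come from the earlier examples; treating desc, qty, and item as exact 1.0 matches is an assumption consistent with the Total/Partial Exact classifications, and `weighted_qom` is a hypothetical name.

```python
from typing import Sequence


def weighted_qom(pair_qoms: Sequence[float], n_source: int, n_target: int) -> float:
    """(RW + RS + RT) / 3, shared by the sub-macro and macro levels."""
    matched = len(pair_qoms)             # |C_ms| or |S_ms|
    rw = sum(pair_qoms) / n_source
    rs = matched / n_source
    rt = matched / n_target
    return (rw + rs + rt) / 3


# Sub-macro level: (micro QoMs of matched attributes, |C_s|, |C_t|).
class_matches = {
    ("Recipe", "Dish"): ([1.0, 1.0], 2, 2),         # name, desc
    ("Ingredient", "Item"): ([1.0, 1.0], 3, 2),     # qty, item (id unmatched)
    ("Instruction", "Step"): ([0.9], 2, 1),         # step ~ direction (relaxed)
}
class_qoms = [weighted_qom(*args) for args in class_matches.values()]

# Macro level: every source class has a match, and each schema has 3 classes.
schema_qom = weighted_qom(class_qoms, n_source=3, n_target=3)
print([round(q, 2) for q in class_qoms], round(schema_qom, 2))
```

This prints class QoMs of 1.0, 0.78 and 0.65 and a schema QoM of 0.94, matching the worked Recipe/Dish example.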


Example

[Figure: UML models of the Recipe schema (Recipe, Ingredient, Instruction) and the Dish schema (Dish, Item, Step)]

Recipe schema: Recipe, Ingredient, Instruction

Dish schema: Dish, Item, Step

Sub-macro QoMs: Recipe/Dish = 1.00, Ingredient/Item = 0.78, Instruction/Step = 0.65

$$RW(RECIPE, DISH) = (1.00 + 0.78 + 0.65) / 3 = 0.81$$

$$RS(RECIPE, DISH) = 3 / 3 = 1.0$$

$$RT(RECIPE, DISH) = 3 / 3 = 1.0$$

$$QoM(RECIPE, DISH) = (0.81 + 1.0 + 1.0) / 3 = 0.94$$