QoM: Qualitative and Quantitative Measure of Schema Matching. Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool, presenter), University of Massachusetts, Lowell. Oct 14th, 2003. 22nd International Conference on Conceptual Modeling (ER 2003), Chicago, Illinois.


TRANSCRIPT

  • Slide 1
  • QoM: Qualitative and Quantitative Measure of Schema Matching
      - Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool, presenter)
      - University of Massachusetts, Lowell
      - Oct 14th, 2003
      - 22nd International Conference on Conceptual Modeling (ER 2003), Chicago, Illinois
  • Slide 2
  • Introduction: Integration of information - A big challenge!
      - Data, data everywhere and ...
      - Problem: heterogeneous data sources
          - Concepts: protein sequence, grams of protein
          - Semantics: protein for a protein scientist vs protein for a nutritionist
          - Data formats: XML, object-oriented, relational
          - Access methods: special-purpose programs (BLAST), SQL, XQuery
      - We need a way to integrate them.
  • Slide 3
  • Integration of heterogeneous sources
      - [Figure: Source 1, Source 2, ..., Source n combined into Integrated Sources]
      - Problems:
          - Resolve conflicts
          - Integrate data
          - Interpret results
      - Goal: automated / semi-automated integration via schema matching
  • Slide 4
  • Schema Matching - The Process
      - Schema matching: the process of finding semantic correspondences between the entities of two or more schemas
          - Input: two schemas
          - Output: a set of matches between the two schemas
      - Two entities match if their similarity value is above a threshold.
      - Similarity values and thresholds are tightly coupled to the algorithm. Examples:
          - CUPID [MBR01] defines the similarity value as the fraction of leaves in the two subtrees that have at least one strong link to a leaf in the other subtree.
          - Linguistic algorithms base their similarity values on the level of matching in a hypernym tree.
      - Thresholds are ad hoc.
      - Problem: a match from one algorithm may not be considered a match by another algorithm!
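The threshold problem described on this slide can be illustrated with a minimal Python sketch. The similarity function here (`name_similarity`, a Jaccard overlap of character sets) and the two threshold values are assumptions chosen only for illustration; they are not the measures used by CUPID or any other algorithm cited in the talk.

```python
from itertools import product


def name_similarity(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of the character sets of two names."""
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb)


def match_schemas(source, target, similarity, threshold):
    """Return (s, t, score) for every pair whose similarity is above threshold."""
    matches = []
    for s, t in product(source, target):
        score = similarity(s, t)
        if score > threshold:
            matches.append((s, t, score))
    return matches


source = ["name", "qty"]
target = ["name", "quantity"]

# The same pair can be a match under one threshold and not under another.
strict = match_schemas(source, target, name_similarity, threshold=0.9)
loose = match_schemas(source, target, name_similarity, threshold=0.4)
strict_pairs = [(s, t) for s, t, _ in strict]
loose_pairs = [(s, t) for s, t, _ in loose]
```

With these toy inputs, `qty` and `quantity` are reported as a match only under the looser threshold, which is the kind of disagreement a common quality metric is meant to expose.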
  • Slide 5
  • Contributions of Our Work
      - Proposal of QoM - Quality of Match metric
          - A metric for comparing different matches produced by the match algorithms
      - Measurement of QoM
          - Qualitative measure: Match Taxonomy
          - Quantitative measure: Weight-based Match Model
  • Slide 6
  • Outline
      - Motivation
      - Our Approach
          - Unifying Data Model: UML
          - Match Taxonomy
          - Weight-based Match Model
      - Related Work
      - Conclusions and Future Work
  • Slide 7
  • Unifying Data Model vs. UML Model
      - [Figure not recoverable from the transcript]
  • Slide 8
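  • Definition
      - a schema S = [right-hand side not recoverable from the transcript]
      - a class c = [not recoverable]
      - an attribute a = [not recoverable]
      - a method m = [not recoverable]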
  • Slide 9
  • Definition (cont.)
      - [Same four definitions as Slide 8; right-hand sides not recoverable]
  • Slide 10
  • Definition (cont.)
      - [Same four definitions as Slide 8; right-hand sides not recoverable]
  • Slide 11
  • Definition (cont.)
      - [Same four definitions as Slide 8; right-hand sides not recoverable]
  • Slide 12
  • Qualitative Measure: Taxonomy of Schema Matches
      - Goal: describe the quality and the coverage of a match
      - Attribute/Method level: Micro Match
      - Class level: Sub-Macro Match
      - Schema level: Macro Match
  • Slide 13
  • Micro Match
      - Attributes can be compared based on: label (L), scope (A), type (T), atomicity (N), initializer (I)
      - A match can be:
          - Exact
              - Labels: exact string match or synonyms (name vs name)
              - Other properties: equivalent values (String vs char[])
          - Relaxed
              - Labels: almost the same, or in the same hypernym tree (firstName vs name)
              - Other properties: implied values (protected vs private)
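The exact/relaxed distinction for attributes can be sketched in Python as follows. The lookup tables `EQUIVALENT` and `IMPLIED` are hypothetical stand-ins for a synonym dictionary, type-equivalence rules, and a hypernym tree, populated only with the pairs named on the slide. The rule that one relaxed property makes the whole micro match relaxed is an assumption, since the slide does not spell out how property-level results combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    label: str        # L
    scope: str        # A
    dtype: str        # T
    atomicity: str    # N
    initializer: str  # I


# Hypothetical tables holding only the example pairs from the slide.
EQUIVALENT = {frozenset({"String", "char[]"})}
IMPLIED = {frozenset({"firstName", "name"}), frozenset({"protected", "private"})}

FIELDS = ("label", "scope", "dtype", "atomicity", "initializer")


def property_match(x: str, y: str) -> str:
    """Classify one pair of property values as exact, relaxed, or none."""
    if x == y or frozenset({x, y}) in EQUIVALENT:
        return "exact"
    if frozenset({x, y}) in IMPLIED:
        return "relaxed"
    return "none"


def micro_match(a: Attribute, b: Attribute) -> str:
    """Assumed rule: exact only if every property is exact; none if any fails."""
    results = [property_match(getattr(a, f), getattr(b, f)) for f in FIELDS]
    if "none" in results:
        return "none"
    return "exact" if all(r == "exact" for r in results) else "relaxed"


name_a = Attribute("name", "private", "String", "single", "")
name_b = Attribute("name", "private", "char[]", "single", "")
first_name = Attribute("firstName", "protected", "String", "single", "")
qty = Attribute("qty", "private", "int", "single", "")
```

Here `micro_match(name_a, name_b)` is exact because `String` and `char[]` are treated as equivalent, while `micro_match(first_name, name_a)` is relaxed on both label and scope.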
  • Slide 14
  • Example: Exact Match
      - name = [attribute definitions not recoverable from the transcript]
  • Slide 15
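  • Example: Relaxed Match
      - name = [attribute definition not recoverable from the transcript]
      - qty = [attribute definition not recoverable from the transcript]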
  • Slide 16
  • Sub-Macro Match
      - Classes can be compared based on:
          - The quality of match of their attributes (micro match): Exact vs Relaxed
          - The coverage: the number of micro matches between the source and the target classes
              - Total: all attributes of the source have a match in the target
              - Partial: some, but not all, attributes of the source have a match in the target
      - [Figure panels: Total Match, Partial Match]
  • Slide 17
  • Sub-Macro Match (cont.)
      - [Figure panels: Total Exact Match, Partial Exact Match, Total Relaxed Match, Partial Relaxed Match]
  • Slide 18
  • Example
      - Classes Recipe and Dish, with attributes name and desc
      - Recipe to Dish: Total Exact Match
  • Slide 19
  • Example
      - Ingredient has attributes id, qty, item; Item has attributes qty, item
      - Ingredient to Item: Partial Exact Match
      - Item to Ingredient: Total Exact Match
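The sub-macro taxonomy above can be sketched as a small Python classifier. The function name `sub_macro_match` and the dictionary shape used for micro matches are illustrative assumptions. So is the rule that a class match counts as Exact only when every contributing micro match is exact.

```python
def sub_macro_match(source_attrs, target_attrs, micro_matches):
    """Classify a class-level match by coverage and quality.

    micro_matches maps a source attribute to (target attribute, kind),
    where kind is "exact" or "relaxed".
    """
    matched = {
        s: (t, kind)
        for s, (t, kind) in micro_matches.items()
        if s in source_attrs and t in target_attrs
    }
    if not matched:
        return "No Match"
    # Coverage: Total only if every source attribute has a match in the target.
    coverage = "Total" if len(matched) == len(source_attrs) else "Partial"
    # Quality (assumed rule): Exact only if all micro matches are exact.
    all_exact = all(kind == "exact" for _, kind in matched.values())
    quality = "Exact" if all_exact else "Relaxed"
    return f"{coverage} {quality} Match"


ingredient = ["id", "qty", "item"]
item = ["qty", "item"]
micro = {"qty": ("qty", "exact"), "item": ("item", "exact")}

forward = sub_macro_match(ingredient, item, micro)   # Ingredient as source
backward = sub_macro_match(item, ingredient, micro)  # Item as source
```

Run on the Ingredient and Item classes from the example, the sketch reproduces the asymmetry on the slide: the unmatched `id` attribute makes the match partial only when Ingredient is the source.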
  • Slide 20
  • Macro Match
      - Schemas can be compared in a similar manner to classes, based on:
          - The quality of match of their classes (sub-macro match): Total Exact, Total Relaxed, Partial Exact, or Partial Relaxed
          - The coverage: the number of sub-macro matches between the source and the target schemas
              - Total: all classes of the source have a match in the target
              - Partial: some, but not all, classes of the source have a match in the target
  • Slide 21
  • Macro Match
      - [Figure panels: Total Exact Match, Partial Exact Match, Total Relaxed Match, Partial Relaxed Match]
  • Slide 22
  • Example
      - Recipe schema with classes Recipe, Ingredient, Instruction; Dish schema with classes Dish, Item, Step
      - Recipe to Dish: Partial Exact Match (PE)
      - Dish to Recipe: Total Exact Match (TE)
  • Slide 23
  • Quantitative Measure: Weight-Based Measure of QoM
      - Match Taxonomy: a qualitative measure of the match between two entities
          - Can distinguish between a total exact and a partial exact match, or between a total exact and a partial relaxed match
          - Cannot decide whether one partial exact match is better than another, or whether a total relaxed match is better than a partial exact match
      - Weight-Based Measure: provides a quantitative metric for the QoM
  • Slide 24
  • Weight-based Match Model
      - Match value: the weight of each match operator, representing the match between two properties
      - [Table: match operators (=, ...) and their weights; entries not recoverable from the transcript]
      - Example: label match for name: $W(l_s, l_t) = 1.0$
  • Slide 25
  • What has been done - Related Work
      - Domain-specific [BHP94, BCVB01, BM01] and domain-independent [HMN+99, MBR01, DR02] algorithms
      - Approaches exploit various types of information: element names, structural properties, ontologies, characteristics of data instances
      - Examples:
          - Doan et al. [DDH01]: combines match predictions using a set of machine learning techniques; predictions are based on element name matching, content matching, text classification, and domain knowledge
          - Madhavan et al. (Cupid) [MBR01]: a hybrid algorithm that combines linguistic and structural match algorithms
  • Slide 26
  • Conclusions and Future Work
      - Contributions:
          - Proposed QoM: a quality metric for schema matches
          - Two techniques to evaluate the QoM
              - Qualitative: Match Taxonomy
              - Quantitative: Weight-based Match Model
      - Future Work:
          - Combining user input for desired matches to optimize the schema match process
          - Refinement of QoM for the XML model, accounting for order and the different levels of nesting
          - Development of match algorithms based on QoM
  • Slide 27
  • More Information
      - http://www.cs.uml.edu/dsl
      - email: [email protected], [email protected]
  • Slide 28
  • Micro Match Model
      - Attributes: $$\mathrm{QoM}(a_s, a_t) = \frac{W(L_s, L_t) + W(A_s, A_t) + W(T_s, T_t) + W(N_s, N_t) + W(I_s, I_t)}{5}$$
      - Methods: $$\mathrm{QoM}(m_s, m_t) = \frac{\mathrm{QoM}_{sig}(m_s, m_t) + 2 \cdot \mathrm{QoM}_{spec}(m_s, m_t)}{3}$$
      - Method signature: $$\mathrm{QoM}_{sig}(m_s, m_t) = \frac{W(A_s, A_t) + W(O_s, O_t) + W(I_s, I_t)}{3}$$
      - Method specification: $$\mathrm{QoM}_{spec}(m_s, m_t) = \frac{W(pre_s, pre_t) + W(post_s, post_t)}{2}$$
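The four micro-match formulas can be sketched in Python as below. The weight table is an assumption: 1.0 for an exact operator and 0.5 for a relaxed one are consistent with the values that appear in the talk's worked examples, while 0.0 for "no match" is a placeholder, because the original operator/weight table did not survive in the transcript.

```python
# Assumed operator weights; only 1.0 and 0.5 appear in the transcript's examples.
WEIGHTS = {"exact": 1.0, "relaxed": 0.5, "none": 0.0}


def qom_attribute(ops):
    """QoM(a_s, a_t): mean weight over the five properties L, A, T, N, I."""
    if len(ops) != 5:
        raise ValueError("expected operators for L, A, T, N, I")
    return sum(WEIGHTS[op] for op in ops) / 5


def qom_signature(ops):
    """QoM_sig(m_s, m_t): mean weight over the three properties A, O, I."""
    if len(ops) != 3:
        raise ValueError("expected operators for A, O, I")
    return sum(WEIGHTS[op] for op in ops) / 3


def qom_specification(ops):
    """QoM_spec(m_s, m_t): mean weight over pre- and postcondition."""
    if len(ops) != 2:
        raise ValueError("expected operators for pre, post")
    return sum(WEIGHTS[op] for op in ops) / 2


def qom_method(sig_ops, spec_ops):
    """QoM(m_s, m_t): the specification counts twice as much as the signature."""
    return (qom_signature(sig_ops) + 2 * qom_specification(spec_ops)) / 3


all_exact = qom_attribute(["exact"] * 5)
relaxed_label = qom_attribute(["relaxed", "exact", "exact", "exact", "exact"])
method_score = qom_method(["exact"] * 3, ["exact", "relaxed"])
```

Under these assumed weights, an attribute pair that differs only by a relaxed label scores 0.9, and a method with an exact signature but one relaxed condition scores (1.0 + 2 * 0.75) / 3.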
  • Slide 29
  • Example
      - [Figure: The Recipe Schema and The Dish Schema]
  • Slide 30
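  • Micro Match (Attribute)
      - $a_s$ vs $a_t$ [attribute tuples not recoverable from the transcript]
      - Compared property by property: $L_s$ vs $L_t$, $A_s$ vs $A_t$, $T_s$ vs $T_t$, $N_s$ vs $N_t$, $I_s$ vs $I_t$
      - Result: Exact Match or Relaxed Match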
  • Slide 31
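  • Micro Match (Method)
      - $m_s$ vs $m_t$ [method tuples not recoverable from the transcript]
      - Compared property by property: $A_s$ vs $A_t$, $O_s$ vs $O_t$, $I_s$ vs $I_t$, $Pre_s$ vs $Pre_t$, $Post_s$ vs $Post_t$
      - Result: Exact Match or Relaxed Match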
  • Slide 32
  • Weighing the Micro Match
      - A match between attributes is based on the match of the individual properties (exact or relaxed).
      - $\mathrm{QoM}(a_s, a_t)$: quantitative measure of the match between attributes $a_s$ and $a_t$, computed as the normalized sum of the match values of the individual properties of an attribute.
  • Slide 33
  • Example
      - name = [attribute definitions not recoverable from the transcript]
      - $\mathrm{QoM}(name_{recipe}, name_{dish}) = \frac{1.0 + 1.0 + 1.0 + 1.0 + 1.0}{5} = 1.0$
  • Slide 34
  • Weighing the Sub-Macro Match
      - Normalized sum of the QoM of micro matches: $$R_W(C_s, C_t) = \frac{\sum \mathrm{QoM}(M_s, M_t)}{|C_s|}$$
      - Coverage: $$R_S(C_s, C_t) = \frac{|C^m_s|}{|C_s|}, \qquad R_T(C_s, C_t) = \frac{|C^m_s|}{|C_t|}$$
      - Sub-macro match: $$\mathrm{QoM}(C_s, C_t) = \frac{R_W(C_s, C_t) + R_S(C_s, C_t) + R_T(C_s, C_t)}{3}$$
  • Slide 35
  • Example
      - Instruction has attributes id, step; Step has attribute direction; step vs direction is marked R
      - $\mathrm{QoM}(step_{Instruction}, direction_{Step}) = \frac{0.5 + 1.0 + 1.0 + 1.0 + 1.0}{5} = 0.9$
      - $R_W(Instruction, Step) = 0.9 / 2 = 0.45$
      - $R_S(Instruction, Step) = 1 / 2 = 0.5$
      - $R_T(Instruction, Step) = 1 / 1 = 1.0$
      - $\mathrm{QoM}(Instruction, Step) = (0.45 + 0.5 + 1.0) / 3 = 0.65$
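The sub-macro arithmetic can be checked with a short Python sketch. The function name `qom_class` and its argument layout are illustrative. The choice of denominators (source size for R_W and R_S, target size for R_T) follows the worked Instruction/Step example rather than an authoritative statement of the formulas.

```python
def qom_class(micro_qoms, n_source, n_target):
    """Return (R_W, R_S, R_T, QoM) for a pair of classes.

    micro_qoms: QoM values of the micro matches found between the classes.
    n_source, n_target: number of attributes in the source and target class.
    """
    n_matched = len(micro_qoms)
    r_w = sum(micro_qoms) / n_source  # normalized sum of micro-match QoMs
    r_s = n_matched / n_source        # coverage relative to the source
    r_t = n_matched / n_target        # coverage relative to the target
    return r_w, r_s, r_t, (r_w + r_s + r_t) / 3


# Instruction {id, step} vs Step {direction}: one micro match with QoM 0.9.
r_w, r_s, r_t, qom = qom_class([0.9], n_source=2, n_target=1)

# Ingredient {id, qty, item} vs Item {qty, item}: two exact micro matches.
_, _, _, ingredient_item = qom_class([1.0, 1.0], n_source=3, n_target=2)
```

Rounded to two decimals, the first call gives 0.45, 0.5, 1.0 and 0.65, as on the slide; the second gives 0.78, which agrees with the value used for the second class pair in the macro-level example.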
  • Slide 36
  • Weighing the Macro Match
      - Normalized sum of sub-macro QoMs: $$R_W(S_s, S_t) = \frac{\sum \mathrm{QoM}(C_s, C_t)}{|S_s|}$$
      - Coverage: $$R_S(S_s, S_t) = \frac{|S^m_s|}{|S_s|}, \qquad R_T(S_s, S_t) = \frac{|S^m_s|}{|S_t|}$$
      - Macro match: $$\mathrm{QoM}(S_s, S_t) = \frac{R_W(S_s, S_t) + R_S(S_s, S_t) + R_T(S_s, S_t)}{3}$$
  • Slide 37
  • Example
      - RECIPE schema with classes Recipe, Ingredient, Instruction; DISH schema with classes Dish, Item, Direction
      - Sub-macro QoM values of the three class matches: 1.00, 0.78, 0.65
      - $R_W(RECIPE, DISH) = (1.00 + 0.78 + 0.65) / 3 = 0.81$
      - $R_S(RECIPE, DISH) = 3 / 3 = 1.0$
      - $R_T(RECIPE, DISH) = 3 / 3 = 1.0$
      - $\mathrm{QoM}(RECIPE, DISH) = (0.81 + 1.0 + 1.0) / 3 = 0.94$
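The macro-level computation has the same shape one level up, and the slide's numbers can be reproduced with a minimal Python sketch. The function name `qom_schema` is illustrative, and the denominators mirror the class-level reading (source size for R_W and R_S, target size for R_T), which is an interpretation of the slide rather than a confirmed definition.

```python
def qom_schema(class_qoms, n_source_classes, n_target_classes):
    """Return (R_W, R_S, R_T, QoM) for a pair of schemas.

    class_qoms: sub-macro QoM values of the matched class pairs.
    """
    n_matched = len(class_qoms)
    r_w = sum(class_qoms) / n_source_classes
    r_s = n_matched / n_source_classes
    r_t = n_matched / n_target_classes
    return r_w, r_s, r_t, (r_w + r_s + r_t) / 3


# RECIPE vs DISH: three class matches with the sub-macro QoMs from the slide.
r_w, r_s, r_t, qom = qom_schema([1.00, 0.78, 0.65], 3, 3)
summary = tuple(round(v, 2) for v in (r_w, r_s, r_t, qom))
```

Rounded to two decimals, `summary` is `(0.81, 1.0, 1.0, 0.94)`, matching the slide.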