Entity Resolution with Evolving Rules Youzhong Ma 2010-9-25 Lab of WAMDM


Page 1: Entity Resolution with Evolving Rules

Entity Resolution with Evolving Rules

Youzhong Ma 2010-9-25 Lab of WAMDM

Page 2: Entity Resolution with Evolving Rules

Outline

Motivations
ER Related concepts
ER properties
Conclusions

Page 3: Entity Resolution with Evolving Rules

Entity Resolution background

Page 4: Entity Resolution with Evolving Rules

Entity Resolution background

Page 5: Entity Resolution with Evolving Rules

Naïve ER Approach Vs. New Approach

Page 6: Entity Resolution with Evolving Rules

Outline

Motivations
ER Related concepts
ER properties
Conclusions

Page 7: Entity Resolution with Evolving Rules

ER Related concepts

Suppose market A merges with market B.
They have to combine their customer records.
The same person may appear in both markets' customer databases, but with some attribute values that differ.

How should this be handled?

Page 8: Entity Resolution with Evolving Rules

ER Rule

Boolean functions: determine whether two records represent the same entity (true or false).

Distance functions: measure how different (or similar) two records are.
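As a rough illustration of the two kinds of ER rule, the sketch below defines one Boolean match function and one distance function over customer records. The field names (`name`, `zip`), the sample values, and the choice of a normalized edit distance are assumptions made for this example; the slides do not specify any of them.

```python
def match_name_zip(a, b):
    """Boolean rule: True if the two records agree on both name and zip."""
    return a["name"] == b["name"] and a["zip"] == b["zip"]


def edit_distance(s, t):
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute cs by ct
        prev = cur
    return prev[-1]


def name_distance(a, b):
    """Distance rule: 0.0 for identical names, 1.0 for entirely different ones."""
    longest = max(len(a["name"]), len(b["name"])) or 1
    return edit_distance(a["name"], b["name"]) / longest


# Hypothetical records for the same customer in the two markets.
customer_a = {"name": "John Smith", "zip": "54321"}
customer_b = {"name": "Jon Smith", "zip": "54321"}

is_same = match_name_zip(customer_a, customer_b)
how_far = name_distance(customer_a, customer_b)
```

Here the Boolean rule rejects the pair because the names are not identical, while the distance rule reports a small distance, which is the kind of case a distance-based rule is meant to capture.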

Page 9: Entity Resolution with Evolving Rules

ER Example

Page 10: Entity Resolution with Evolving Rules

ER procedure

Original record set S = {r1, r2, r3, r4}; ER input Pi = {{r1}, {r2}, {r3}, {r4}}
B1: Pname, E1 = {{r1, r2, r3}, {r4}} (6 comparisons)
B2: Pname ∧ Pzip, E2 = {{r1, r2}, {r3}, {r4}}
Naïve approach: 6 comparisons
Evolving rule approach: 3 comparisons

The evolving rule approach only works if the ER algorithm satisfies certain properties and B2 is stricter than B1.

So one contribution of this paper is to explore under what conditions, and for what ER algorithms, incremental approaches are feasible.
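The comparison counts on this slide can be reproduced with a small sketch. The record values below are hypothetical, chosen only so that the partitions come out as on the slide, and `resolve` is an assumed transitive-closure clustering that compares every pair inside each block it is given; it is not necessarily the algorithm used in the paper.

```python
from itertools import combinations

# Hypothetical records, chosen only to reproduce the partitions on the slide.
RECORDS = {
    "r1": {"name": "John", "zip": "54321"},
    "r2": {"name": "John", "zip": "54321"},
    "r3": {"name": "John", "zip": "11111"},
    "r4": {"name": "Bob", "zip": None},
}


def p_name(a, b):
    return a["name"] == b["name"]


def p_zip(a, b):
    # A missing zip never matches anything.
    return a["zip"] is not None and a["zip"] == b["zip"]


def b1(a, b):
    return p_name(a, b)


def b2(a, b):
    return p_name(a, b) and p_zip(a, b)


def resolve(blocks, rule):
    """Cluster each block separately by transitive closure of `rule`.

    Returns (partition as a set of frozensets, number of pairwise comparisons).
    """
    partition, comparisons = set(), 0
    for block in blocks:
        ids = sorted(block)
        cluster_of = {i: frozenset([i]) for i in ids}  # every record starts alone
        for x, y in combinations(ids, 2):
            comparisons += 1
            if rule(RECORDS[x], RECORDS[y]):
                merged = cluster_of[x] | cluster_of[y]
                for i in merged:
                    cluster_of[i] = merged
        partition.update(cluster_of.values())
    return partition, comparisons


all_records = [set(RECORDS)]
e1, cost_b1 = resolve(all_records, b1)           # first run with B1
naive_e2, naive_cost = resolve(all_records, b2)  # rerun from scratch with B2
evolved_e2, evolved_cost = resolve(e1, b2)       # reuse E1: compare only inside its clusters
```

Because B2 is stricter than B1, records that B1 already separated never need to be compared again, so the evolving run only looks inside {r1, r2, r3} and costs 3 comparisons instead of 6.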

Page 11: Entity Resolution with Evolving Rules

Original record set S = {r1, r2, r3, r4}; ER input Pi = {{r1}, {r2}, {r3}, {r4}}
B1: Pname ∧ Pzip, E1 = {{r1, r2}, {r3}, {r4}} (6 comparisons)
B2: Pname ∧ Pphone, E2 = {{r1}, {r2, r3}, {r4}} (3 comparisons)

Materialization!
Pname: Ename = {{r1, r2, r3}, {r4}}
Pzip: Ezip = {{r1, r2}, {r3}, {r4}}
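One way to read the materialization idea is that the partition for each individual predicate is stored, and a later rule starts from the stored partitions of the predicates it shares with earlier rules. The sketch below uses the partitions from the slide; the `meet` helper (the common refinement of two partitions) and `pairs_inside` are assumptions introduced for illustration, since the slide itself only says "Materialization!".

```python
# Materialized per-predicate partitions, copied from the slide.
E_NAME = {frozenset({"r1", "r2", "r3"}), frozenset({"r4"})}
E_ZIP = {frozenset({"r1", "r2"}), frozenset({"r3"}), frozenset({"r4"})}


def meet(p, q):
    """Common refinement: two records stay together only if both p and q keep them together."""
    return {a & b for a in p for b in q if a & b}


def pairs_inside(partition):
    """Number of record pairs that share a cluster, i.e. pairs still worth comparing."""
    return sum(len(c) * (len(c) - 1) // 2 for c in partition)


# Starting point for a rule that uses both the name and the zip predicate.
start_name_zip = meet(E_NAME, E_ZIP)

# A rule that combines the name predicate with a new phone predicate can only
# reuse E_NAME, so the pairs left to compare are those inside its clusters.
cost_with_materialization = pairs_inside(E_NAME)
cost_from_scratch = pairs_inside({frozenset({"r1", "r2", "r3", "r4"})})
```

Under these assumptions the name-plus-phone rule needs 3 comparisons when it starts from Ename, against 6 when it starts from the unpartitioned record set.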

Page 12: Entity Resolution with Evolving Rules

Outline

Motivations
ER Related concepts
ER properties
Conclusions

Page 13: Entity Resolution with Evolving Rules

Two important properties of ER algorithms that enable efficient rule evolution for match-based clustering:

Rule Monotonicity (RM)

Context Free (CF)

Page 14: Entity Resolution with Evolving Rules

Pname ∧ Pzip ≤ Pname
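Reading ≤ as "is stricter than" (every pair that matches under the left-hand rule also matches under the right-hand rule), the relation can be checked on a sample of records. The records and the helper `is_stricter` below are hypothetical, and a check on a sample can only refute strictness, not prove it in general; for a conjunction such as Pname ∧ Pzip it holds by construction.

```python
from itertools import combinations

# Hypothetical sample records.
SAMPLE = [
    {"name": "John", "zip": "54321"},
    {"name": "John", "zip": "54321"},
    {"name": "John", "zip": "11111"},
    {"name": "Bob", "zip": None},
]


def p_name(a, b):
    return a["name"] == b["name"]


def p_zip(a, b):
    return a["zip"] is not None and a["zip"] == b["zip"]


def p_name_and_zip(a, b):
    return p_name(a, b) and p_zip(a, b)


def is_stricter(strict_rule, loose_rule, records):
    """True if, on this sample, every pair matched by strict_rule is also matched by loose_rule."""
    return all(loose_rule(a, b)
               for a, b in combinations(records, 2)
               if strict_rule(a, b))


forward = is_stricter(p_name_and_zip, p_name, SAMPLE)
backward = is_stricter(p_name, p_name_and_zip, SAMPLE)
```

The reverse direction fails on this sample because the first and third records share a name but not a zip.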

Page 15: Entity Resolution with Evolving Rules

Rule Monotonicity(RM)

B2: Pname, E2 = {{r1, r2, r3}, {r4}}

B1: Pname ∧ Pzip, E1 = {{r1, r2}, {r3}, {r4}}
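Taking rule monotonicity to mean that a stricter rule yields a finer result (if B1 ≤ B2, then every cluster of E1 lies inside some cluster of E2), the two partitions on this slide can be checked directly. The helper name `refines` is an assumption for this sketch.

```python
# Partitions copied from the slide.
E1 = [{"r1", "r2"}, {"r3"}, {"r4"}]   # result of B1: Pname AND Pzip
E2 = [{"r1", "r2", "r3"}, {"r4"}]     # result of B2: Pname


def refines(fine, coarse):
    """True if every cluster of `fine` is contained in some cluster of `coarse`."""
    return all(any(c <= d for d in coarse) for c in fine)


e1_refines_e2 = refines(E1, E2)
e2_refines_e1 = refines(E2, E1)
```

This is what lets an evolving-rule run restrict its comparisons to the clusters of the coarser result.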

Page 16: Entity Resolution with Evolving Rules

Context Free (CF)

Page 17: Entity Resolution with Evolving Rules

Existing properties in literature

General Incremental vs. Context Free

Order Independent vs. Rule Monotonicity

An ER algorithm is order independent if the ER result is the same regardless of the order in which the records are processed.
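Order independence can be probed empirically by running an algorithm on every ordering of a small record set and checking that only one result appears. The sketch below does this for an assumed incremental transitive-closure clustering; the records, the rule, and the function names are hypothetical, and passing on one sample is evidence rather than proof.

```python
from itertools import permutations

# Hypothetical records.
RECORDS = {
    "r1": {"name": "John", "zip": "54321"},
    "r2": {"name": "John", "zip": "54321"},
    "r3": {"name": "John", "zip": "11111"},
    "r4": {"name": "Bob", "zip": None},
}


def match(a, b):
    """Boolean rule: same name and same (non-missing) zip."""
    return a["name"] == b["name"] and a["zip"] is not None and a["zip"] == b["zip"]


def cluster_in_order(order, rule):
    """Insert records one at a time, merging every existing cluster the new record matches."""
    clusters = []
    for rid in order:
        merged = {rid}
        untouched = []
        for cluster in clusters:
            if any(rule(RECORDS[rid], RECORDS[other]) for other in cluster):
                merged |= cluster
            else:
                untouched.append(cluster)
        clusters = untouched + [merged]
    return {frozenset(c) for c in clusters}


def is_order_independent(rule):
    """True if every processing order of RECORDS gives the same partition."""
    results = {frozenset(cluster_in_order(order, rule))
               for order in permutations(RECORDS)}
    return len(results) == 1


one_result = cluster_in_order(["r1", "r2", "r3", "r4"], match)
independent = is_order_independent(match)
```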

Page 18: Entity Resolution with Evolving Rules
Page 19: Entity Resolution with Evolving Rules
Page 20: Entity Resolution with Evolving Rules
Page 21: Entity Resolution with Evolving Rules

Experiments

Page 22: Entity Resolution with Evolving Rules

Outline

Motivations
ER Related concepts
ER properties
Conclusions

Page 23: Entity Resolution with Evolving Rules

Conclusions

Proposes a new ER approach with evolving rules

Exploits the properties (RM, CF) of ER algorithms that enable efficient rule evolution

Provides guidance to designers of ER algorithms

Page 24: Entity Resolution with Evolving Rules

Some problems

How are the comparison rules generated?

How can ER algorithms be designed so that the RM and CF properties hold?

How can the ER algorithms be implemented in the MapReduce framework?

Page 25: Entity Resolution with Evolving Rules

Sincere thanks to everyone in the Web Group

Page 26: Entity Resolution with Evolving Rules