Learning Semantic String Transformations from Examples Rishabh Singh and Sumit Gulwani


Page 1: Learning Semantic String Transformations from Examples

Learning Semantic String Transformations from Examples

Rishabh Singh and Sumit Gulwani

Page 2: Learning Semantic String Transformations from Examples

FlashFill

Page 3: Learning Semantic String Transformations from Examples
Page 4: Learning Semantic String Transformations from Examples

Transformations

• Syntactic Transformations
– Concatenation of regular-expression-based substrings
– “VLDB2012” → “VLDB”

• Semantic Transformations
– More than just characters
– “1/5/2010” → “May 1st 2010”

Page 5: Learning Semantic String Transformations from Examples

Semantic Transformations

• Semantic information as relational tables
– 1 → January, 2 → February

• Learn table lookup queries
– VLOOKUP macro 2nd most problematic
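To make the role of the relational table concrete, here is a minimal hand-written sketch of the “1/5/2010” → “May 1st 2010” transformation. The table name `MONTH_TABLE`, the helper names, and the day/month/year reading of the input are assumptions made for illustration; the slides describe learning such a program from examples, not writing it by hand.

```python
# Hypothetical month table: the semantic knowledge "1 -> January, 2 -> February, ..."
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTH_TABLE = [{"MonthNum": str(i + 1), "MonthName": name}
               for i, name in enumerate(MONTH_NAMES)]


def lookup(table, key_col, key_val, out_col):
    """Return out_col of the first row whose key_col equals key_val."""
    for row in table:
        if row[key_col] == key_val:
            return row[out_col]
    raise KeyError(key_val)


def ordinal(day):
    """1 -> '1st', 2 -> '2nd', 3 -> '3rd', 11 -> '11th', ..."""
    if 10 <= day % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def format_date(text):
    # Syntactic part: split the input into substrings (assumed order: day/month/year).
    day, month, year = text.split("/")
    # Semantic part: the month name comes from a table lookup, not from the characters.
    month_name = lookup(MONTH_TABLE, "MonthNum", month, "MonthName")
    return f"{month_name} {ordinal(int(day))} {year}"
```

The split is the syntactic step; the month name can only come from the table, which is why this example counts as a semantic transformation.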

Page 6: Learning Semantic String Transformations from Examples

Outline

• Lookup Transformations

• Lookup + Syntactic Transformations

• Case Studies

Page 7: Learning Semantic String Transformations from Examples

Table Lookup Transformations

Demo

Page 8: Learning Semantic String Transformations from Examples

Learning Framework

[Diagram: Input Strings → F → Output String, with candidate programs F1, …, Fn]

1. Domain-specific Language L

2. Algorithm to learn all Fs from (i,o)

Page 9: Learning Semantic String Transformations from Examples

Lookup Transformation Language

Page 10: Learning Semantic String Transformations from Examples

Example - Lookup

Emp Record

SSN          EmpId  Name
027-36-4557  1254   John Henry
034-83-7683  2412   William Johnson
044-58-3429  1125   Steve Russell
018-45-8949  4257   Ian Jordan
023-34-3254  6418   Mary Dina

Input v1     Output
044-58-3429  Steve Russell

Select(Name, EmpRecord, (SSN = v1))
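The Select expression on this slide can be mimicked in a few lines of Python. This is only a sketch: the `select` helper and the list-of-dicts table encoding are assumptions for illustration, not the implementation behind the slides.

```python
# The Emp Record table from the slide, encoded as a list of rows.
EMP_RECORD = [
    {"SSN": "027-36-4557", "EmpId": "1254", "Name": "John Henry"},
    {"SSN": "034-83-7683", "EmpId": "2412", "Name": "William Johnson"},
    {"SSN": "044-58-3429", "EmpId": "1125", "Name": "Steve Russell"},
    {"SSN": "018-45-8949", "EmpId": "4257", "Name": "Ian Jordan"},
    {"SSN": "023-34-3254", "EmpId": "6418", "Name": "Mary Dina"},
]


def select(column, table, condition):
    """Select(column, table, condition): return `column` of the unique row
    that satisfies every `key_column == value` pair in `condition`."""
    matches = [row for row in table
               if all(row[col] == val for col, val in condition.items())]
    if len(matches) != 1:
        return None  # the condition is not a key for this input
    return matches[0][column]


def name_for_ssn(v1):
    # Select(Name, EmpRecord, (SSN = v1))
    return select("Name", EMP_RECORD, {"SSN": v1})
```

Calling `name_for_ssn("044-58-3429")` reproduces the example row on the slide.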

Page 11: Learning Semantic String Transformations from Examples

Example – Transitive Lookup

ItemRec

ItemId  Item
ST-340  Stroller
BI-567  Bib
DI-328  Diapers
WI-989  Wipes
AS-469  Aspirator

PriceRec

ItemId  Price
ST-340  $145.67
BI-567  $3.56
DI-328  $21.45
WI-989  $5.12
AS-469  $2.56

Input v1  Output
Stroller  $145.67

Select(Price, PriceRec, ItemId = Select(ItemId, ItemRec, Item = v1))
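A transitive lookup is a Select whose condition contains another Select. The sketch below nests two calls to a hypothetical `select` helper (an assumption for illustration) over the two tables from the slide.

```python
ITEM_REC = [
    {"ItemId": "ST-340", "Item": "Stroller"},
    {"ItemId": "BI-567", "Item": "Bib"},
    {"ItemId": "DI-328", "Item": "Diapers"},
    {"ItemId": "WI-989", "Item": "Wipes"},
    {"ItemId": "AS-469", "Item": "Aspirator"},
]
PRICE_REC = [
    {"ItemId": "ST-340", "Price": "$145.67"},
    {"ItemId": "BI-567", "Price": "$3.56"},
    {"ItemId": "DI-328", "Price": "$21.45"},
    {"ItemId": "WI-989", "Price": "$5.12"},
    {"ItemId": "AS-469", "Price": "$2.56"},
]


def select(column, table, key_column, key_value):
    """Return `column` of the first row where key_column == key_value."""
    for row in table:
        if row[key_column] == key_value:
            return row[column]
    return None


def price_of(v1):
    # Inner lookup: Select(ItemId, ItemRec, Item = v1)
    item_id = select("ItemId", ITEM_REC, "Item", v1)
    # Outer lookup: Select(Price, PriceRec, ItemId = <inner result>)
    return select("Price", PRICE_REC, "ItemId", item_id)
```

The input string never appears in PriceRec; the join through ItemId is what makes the lookup transitive.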

Page 12: Learning Semantic String Transformations from Examples

Learn Query

ItemRec

ItemId Item

ST-340 Stroller

BI-567 Bib

DI-328 Diapers

WI-989 Wipes

AS-469 Aspirator

PriceRec

ItemId Price

ST-340 $145.67

BI-567 $3.56

DI-328 $21.45

WI-989 $5.12

AS-469 $2.56

Input v1 Output

Stroller $145.67

Select(Price, PriceRec, ItemId = Select(ItemId, ItemRec, Item = v1))

Page 13: Learning Semantic String Transformations from Examples

Synthesis Algorithm:

• Input: (input state, output string)

• Output: all conforming expressions

• Reachability algorithm from input strings

Page 14: Learning Semantic String Transformations from Examples

GenerateStr_t

Strings reachable from input row: 044-58-3429

Emp Record

SSN          EmpId  Name
027-36-4557  1254   John Henry
034-83-7683  2412   William Johnson
044-58-3429  1125   Steve Russell
018-45-8949  4257   Ian Jordan

Nodes: $\eta_1$, $\eta_2$, $\eta_3$

$\mathit{Progs}[\eta_1] = \{v_1\}$

Page 15: Learning Semantic String Transformations from Examples

GenerateStr_t

Strings in table rows of visited nodes: 044-58-3429, 1125, Steve Russell

$B \equiv \{\, \bigwedge_{i} C_i = \{ \mathit{val}^{-1}(T[C_i, r]) \} \,\}_j$

Page 16: Learning Semantic String Transformations from Examples

GenerateStr_t

…

Repeat until k steps or fixpoint
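The reachability loop can be sketched as follows. This is a simplified illustration, not the algorithm from the talk: it assumes every single column is a candidate key, and it records only the top-level Select choice for each reachable string. The function name `generate_str_t` and the tuple encoding of programs are assumptions.

```python
EMP_RECORD = [
    {"SSN": "027-36-4557", "EmpId": "1254", "Name": "John Henry"},
    {"SSN": "034-83-7683", "EmpId": "2412", "Name": "William Johnson"},
    {"SSN": "044-58-3429", "EmpId": "1125", "Name": "Steve Russell"},
    {"SSN": "018-45-8949", "EmpId": "4257", "Name": "Ian Jordan"},
]


def generate_str_t(inputs, tables, k=3):
    """Map each reachable string to the set of top-level programs producing it.

    inputs: {"v1": "..."}; tables: {"TableName": [row dicts]}.
    A program is ("var", name) or ("select", out_col, table, key_col, key_string),
    where key_string refers to another reachable string (a shared sub-expression).
    """
    progs = {}
    for name, value in inputs.items():
        progs.setdefault(value, set()).add(("var", name))
    frontier = set(progs)
    for _ in range(k):                      # stop after k steps ...
        discovered = set()
        for table_name, rows in tables.items():
            for row in rows:
                for key_col, key_val in row.items():
                    if key_val not in frontier:
                        continue            # this cell is not a newly visited string
                    for col, val in row.items():
                        if col == key_col:
                            continue
                        if val not in progs:
                            discovered.add(val)
                        progs.setdefault(val, set()).add(
                            ("select", col, table_name, key_col, key_val))
        if not discovered:                  # ... or at a fixpoint
            break
        frontier = discovered
    return progs
```

For the input row `044-58-3429`, the first step visits the matching Emp Record row and adds `1125` and `Steve Russell`; the second step finds no new strings, so the loop stops at the fixpoint.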

Page 17: Learning Semantic String Transformations from Examples

GenerateStr_t

…

Steve Russell: node $\eta$ with programs $\mathit{Progs}[\eta]$

Page 18: Learning Semantic String Transformations from Examples

GenerateStr_t

• Sound and k-complete

– t: number of reachable strings
– p: number of candidate keys
– m: maximum size of a candidate key

Page 19: Learning Semantic String Transformations from Examples

Data structure

• Maintains tree structure
– share common sub-expressions

• CNF of Boolean Conditionals
– independent column predicates

Page 20: Learning Semantic String Transformations from Examples

Intersect_t: D_t1 ∧ D_t2

Page 21: Learning Semantic String Transformations from Examples

Synthesize Procedure

Synthesize((i1,o1), …, (in,on))
  P = GenerateStr_t(i1,o1)
  for j = 2 to n:
    P’ = GenerateStr_t(ij,oj)
    P = Intersect_t(P’, P)
  return P
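The Synthesize loop can be illustrated with explicit program sets. This sketch swaps the succinct data structure of the talk for plain Python sets, restricts programs to single lookups encoded as `(output_column, key_column)`, and uses a made-up table (`TABLE`) chosen so that one example is ambiguous; all of these are assumptions for illustration.

```python
# Hypothetical table in which one row has the same value in two columns.
TABLE = [
    {"Code": "FR", "City": "Paris", "Capital": "Paris"},
    {"Code": "AU", "City": "Sydney", "Capital": "Canberra"},
]


def generate_str(inp, out, table):
    """All single-lookup programs (out_col, key_col) consistent with one example."""
    programs = set()
    for row in table:
        for key_col, key_val in row.items():
            if key_val != inp:
                continue
            for out_col, out_val in row.items():
                if out_col != key_col and out_val == out:
                    programs.add((out_col, key_col))
    return programs


def intersect(p1, p2):
    """Programs consistent with both examples (plain set intersection here)."""
    return p1 & p2


def synthesize(examples, table):
    first, *rest = examples
    programs = generate_str(first[0], first[1], table)
    for inp, out in rest:
        programs = intersect(generate_str(inp, out, table), programs)
    return programs
```

With only the example ("FR", "Paris"), both `Select(City, …)` and `Select(Capital, …)` remain; adding ("AU", "Canberra") leaves a single program, which is the disambiguating effect of a second example.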

Page 22: Learning Semantic String Transformations from Examples

Semantic String Transformations

Demo

Page 23: Learning Semantic String Transformations from Examples

Syntactic String Language [GulwaniPOPL11]

Page 24: Learning Semantic String Transformations from Examples

Combined Language

Syntactic manipulations over lookup outputs

Syntactic manipulations before indexing

Page 25: Learning Semantic String Transformations from Examples

Synthesis Algorithm:

– Reachability based on syntactic string matches

– Boolean conditionals

Page 26: Learning Semantic String Transformations from Examples

GenerateStr_u

Input: SSN: 044-58-3429

Emp Record

SSN          EmpId  Name
027-36-4557  1254   John Henry
034-83-7683  2412   William Johnson
044-58-3429  1125   Steve Russell
018-45-8949  4257   Ian Jordan

Output: Mr. Steve Russell

Page 27: Learning Semantic String Transformations from Examples

GenerateStr_u

Input: SSN: 044-58-3429

Emp Record

SSN EmpId Name

027-36-4557 1254 John Henry

034-83-7683 2412 William Johnson

044-58-3429 1125 Steve Russell

018-45-8949 4257 Ian Jordan

GenerateStr'_t

Page 28: Learning Semantic String Transformations from Examples

GenerateStr_u

Input: SSN: 044-58-3429

Emp Record

SSN EmpId Name

027-36-4557 1254 John Henry

034-83-7683 2412 William Johnson

044-58-3429 1125 Steve Russell

018-45-8949 4257 Ian Jordan

GenerateStr'_t

Page 29: Learning Semantic String Transformations from Examples

GenerateStr_u

Set of reachable strings: { “SSN: 044-58-3429”, “044-58-3429”, “1125”, “Steve Russell” }

Page 30: Learning Semantic String Transformations from Examples

GenerateStr_u

GenerateStr_s

{ “SSN: 044-58-3429”, “044-58-3429”, “1125”, “Steve Russell” }

Mr. Steve Russell

and in paper
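A rough sketch of the two phases on this slide: first collect the strings reachable through substring matches, then explain the output as constants around a reachable string. It is a simplification under stated assumptions: a table cell counts as matched when it occurs verbatim inside an already reached string, and the output is explained only as prefix + reachable string + suffix. The function names are hypothetical.

```python
EMP_RECORD = [
    {"SSN": "027-36-4557", "EmpId": "1254", "Name": "John Henry"},
    {"SSN": "034-83-7683", "EmpId": "2412", "Name": "William Johnson"},
    {"SSN": "044-58-3429", "EmpId": "1125", "Name": "Steve Russell"},
    {"SSN": "018-45-8949", "EmpId": "4257", "Name": "Ian Jordan"},
]


def reachable_strings(input_string, table, k=3):
    """Strings reachable when a row is visited as soon as one of its cells
    is a substring of an already reachable string."""
    reached = {input_string}
    for _ in range(k):
        new = set()
        for row in table:
            if any(cell in s for cell in row.values() for s in reached):
                new.update(row.values())
        if new <= reached:          # fixpoint: nothing new was added
            break
        reached |= new
    return reached


def explain_output(output, reached):
    """Ways to write `output` as constant prefix + reachable string + constant suffix."""
    explanations = []
    for s in sorted(reached):
        pos = output.find(s)
        if pos >= 0:
            explanations.append((output[:pos], s, output[pos + len(s):]))
    return explanations
```

On the slide's input, the reachable set matches the four strings listed above, and “Mr. Steve Russell” is explained as the constant “Mr. ” followed by the looked-up name.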

Page 31: Learning Semantic String Transformations from Examples

Experiments

• 50 benchmark problems
– 12, 38

• ~10^20 consistent expressions
– Size of data structure: ~2000

• Performance: 96% take less than 1 second

• Ranking: at most 3 examples (95% 2 examples)

Page 32: Learning Semantic String Transformations from Examples

Related Work

• Matching strings for table joins
– Record Matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
– Schema Matching [Dhamankar et al. SIGMOD04, Warren & Tompa VLDB06]

• Query Synthesis
– from representative view [Das Sarma et al. ICDT10, Tran et al. SIGMOD09]

• Text-editing by example
– QuickCode [Gulwani POPL11]
– SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]

Page 33: Learning Semantic String Transformations from Examples

Thanks!

 

End-Users

Algorithm Designers

Software Developers

Large potential

Page 34: Learning Semantic String Transformations from Examples

Backup slides

Page 35: Learning Semantic String Transformations from Examples

Semantic String Transformations

Time (12 Hr)  Time (24 Hr)
0930          9:30 AM
1520          3:20 PM
1648
0830
1015
2010
1012
1425

=TEXT(C,”00 00”)+0
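For comparison with the spreadsheet formula, the transformation specified by the two filled-in rows can be written by hand as below. This is a sketch of the target behavior only, assuming four-digit inputs; the function name `to_12_hour` is hypothetical and nothing here is learned from examples.

```python
def to_12_hour(hhmm):
    """Convert a four-digit 24-hour time such as '1520' to '3:20 PM'."""
    hour, minute = int(hhmm[:2]), hhmm[2:]
    suffix = "AM" if hour < 12 else "PM"
    hour12 = hour % 12 or 12      # 00 -> 12 (AM), 12 -> 12 (PM), 15 -> 3
    return f"{hour12}:{minute} {suffix}"
```

The hour mapping (15 → 3 PM) is the semantic part; it could equally be stored as a small lookup table, which is the view the talk takes.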

Page 36: Learning Semantic String Transformations from Examples

Semantic String Transformations

Date        Formatted Date
06-03-2008  Jun 3rd, 2008
03-26-2010
08-01-2009
09-24-2007
05-14-2010
07-20-1998
10-24-2004
08-24-1972

Page 37: Learning Semantic String Transformations from Examples

Idea 1: Share sub-expressions

T1
C1  C2  C3
s1  s2  s3

T2
C1  C2  C3
s2  s3  s4

T3
C1  C2  C3
s3  s4  s5

e ≡ Select(C2, T1, C1=v1), which evaluates to s2

Select(C3, T2, C1=e)

Select(C2, T3, C1=Select(C2, T2, C1=e))

Page 38: Learning Semantic String Transformations from Examples

Youtube Videos

French, Polish, Urdu, German, Serbian, Russian

http://bit.ly/flashfill

Page 39: Learning Semantic String Transformations from Examples

Idea 2: CNF conditionals

T
C1  C2  C3  …  Cn  Cn+1
s   s   s   …  s   t

v1  v2  …  vm  Out
s   s   …  s   t

Page 40: Learning Semantic String Transformations from Examples

No. of Consistent Expressions

[Chart: Large number of consistent expressions. x-axis: Benchmarks (1 to 49); y-axis: Number of expressions, log scale from 1 to 1E+036.]

Page 41: Learning Semantic String Transformations from Examples

Succinct Representation

[Chart: Succinct Representation. x-axis: Benchmarks (1 to 49); y-axis: Size of Data Structure, 500 to 2,000.]

Page 42: Learning Semantic String Transformations from Examples

Performance

[Chart: Running Time. x-axis: Benchmarks (1 to 46); y-axis: Running Time (in seconds), 0.00 to 12.00.]

Page 43: Learning Semantic String Transformations from Examples

Ranking

[Chart: Ranking Measure. x-axis: Number of I/O examples (1 to 3); y-axis: Number of Benchmarks, 0 to 40.]

Page 44: Learning Semantic String Transformations from Examples

Idea 2: CNF conditionals

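$\{\{\eta_1, \eta_2\},\ \eta_2,\ \mathit{Progs}\}$

$\mathit{Progs}[\eta_1] \equiv \{v_1, v_2, \dots, v_m\}$

$\mathit{Progs}[\eta_2] = \{\mathrm{Select}(C_{n+1}, T, \bigwedge_i C_i = \{s, \eta_1\})\}$

$m+1$ choices per column predicate, $\Theta((m+1)^n)$ expressions in total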
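The saving from independent column predicates can be checked on a small instance. In this sketch the names `build_cnf`, `expand`, and `stored_size` are hypothetical, and the size measure (number of stored options) is an assumption chosen for illustration.

```python
from itertools import product


def build_cnf(n, m):
    """Shared form: Progs[eta1] = {v1..vm}; each column predicate is Ci = {s, eta1}."""
    progs_eta1 = [f"v{j}" for j in range(1, m + 1)]
    condition = {f"C{i}": ["s", "eta1"] for i in range(1, n + 1)}
    return progs_eta1, condition


def expand(progs_eta1, condition):
    """Enumerate every concrete conjunction the shared form stands for."""
    per_column = []
    for col, options in condition.items():
        concrete = []
        for opt in options:
            # "eta1" is a reference to the shared node; replace it by v1..vm.
            concrete.extend(progs_eta1 if opt == "eta1" else [opt])
        per_column.append([f"{col} = {value}" for value in concrete])
    return [" AND ".join(combo) for combo in product(*per_column)]


def stored_size(progs_eta1, condition):
    """Number of options actually stored: m for eta1 plus 2 per column."""
    return len(progs_eta1) + sum(len(opts) for opts in condition.values())
```

For n = 3 columns and m = 2 input variables, the shared form stores 8 options while standing for 27 = (2+1)^3 conjunctions.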

Page 45: Learning Semantic String Transformations from Examples

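GenerateStr_t

$\mathit{val}(\eta)$: string value of node $\eta$

$\mathit{Progs}[\eta]$: set of lookup programs to generate $\mathit{val}(\eta)$

$\mathit{val}^{-1}(s)$: node $\eta$ such that $\mathit{val}(\eta) = s$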

Page 46: Learning Semantic String Transformations from Examples

Related Work

• Record Matching
– Similarity functions for matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
– Customizable similarity function [Arasu et al. VLDB09]

• Learning Schema Matches
– iMAP [Dhamankar et al. SIGMOD04]: concatenation of column strings using domain-specific knowledge
– [Warren & Tompa VLDB06]: concatenation of column substrings, single table

Page 47: Learning Semantic String Transformations from Examples

Related Work

• Query Synthesis [Das Sarma et al. ICDT10, Tran et al. SIGMOD09]
– Infer relation from large representative example view
– no joins or projections

• Text-editing using examples
– QuickCode [Gulwani POPL11]: string transformations
– SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]: programming by demonstration

Page 48: Learning Semantic String Transformations from Examples

General Framework

• A Domain-specific Transformation Language L
– Expressive and succinct

• Efficient Data structures for set of expressions
– Version-space algebra

• GenerateStr
– All sets of expressions from I-O example

• Intersect
– Intersect two sets of expressions

Page 49: Learning Semantic String Transformations from Examples

Example - Lookup

Emp Record

SSN          EmpId  Name
027-36-4557  1254   John Henry
034-83-7683  2412   William Johnson
044-58-3429  1125   Steve Russell
018-45-8949  4257   Ian Jordan
023-34-3254  6418   Mary Dina

Input v1     Output
044-58-3429  Steve Russell
023-34-3254

Select(Name, EmpRecord, (SSN = v1))

Page 50: Learning Semantic String Transformations from Examples

Example – Transitive Lookups

ItemRec

ItemId  Item
ST-340  Stroller
BI-567  Bib
DI-328  Diapers
WI-989  Wipes
AS-469  Aspirator

PriceRec

ItemId  Price
ST-340  $145.67
BI-567  $3.56
DI-328  $21.45
WI-989  $5.12
AS-469  $2.56

Input v1   Output
Stroller   $145.67
Bib
Aspirator
Wipes

Select(Price, PriceRec, ItemId = Select(ItemId, ItemRec, Item = v1))

Page 51: Learning Semantic String Transformations from Examples

Data Structure

Page 52: Learning Semantic String Transformations from Examples

Data structure for expressions

Page 53: Learning Semantic String Transformations from Examples

Data structure

Page 54: Learning Semantic String Transformations from Examples

Data structure

Page 55: Learning Semantic String Transformations from Examples

Data structure

Page 56: Learning Semantic String Transformations from Examples

Example

T1
C1  C2  C3
s1  s2  s3

T2
C1  C2  C3
s2  s3  s4

Ti
C1  C2    C3
si  si+1  si+2

… Tm

Input v1  Output
s1        sm

Page 57: Learning Semantic String Transformations from Examples

Ti-1

C1 C2 C3

si-1 si si+1

Ti-2

C1 C2 C3

si-2 si-1 si

Sub-expression Sharing

𝑠𝑖

Page 58: Learning Semantic String Transformations from Examples

Sub-expression Sharing

[Diagram: nodes $\eta_{i-2}$, $\eta_{i-1}$, $\eta_i$ with string values $s_{i-2}$, $s_{i-1}$, $s_i$]

Page 59: Learning Semantic String Transformations from Examples

Sub-expression Sharing

$\{\{\eta_1, \eta_2, \dots, \eta_m\},\ \eta_m,\ \mathit{Progs}\}$

$\mathit{Progs}[\eta_1] \equiv \{v_1\}$

$\mathit{Progs}[\eta_2] = \{\mathrm{Select}(C_2, T_1, C_1 = \{s_1, \eta_1\})\}$

Page 60: Learning Semantic String Transformations from Examples

Sub-expression Sharing

$N(i) = N(i-1) + N(i-2)$

$N(i) = \Theta(2^i)$

$\{\{\eta_1, \eta_2, \dots, \eta_m\},\ \eta_m,\ \mathit{Progs}\}$

$\mathit{Progs}[\eta_1] \equiv \{v_1\}$

$\mathit{Progs}[\eta_2] = \{\mathrm{Select}(C_2, T_1, C_1 = \{s_1, \eta_1\})\}$
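The gap between the number of expressions and the size of the shared structure can be reproduced on the chain of tables T1, …, Tm. The sketch below assumes, following the recurrence on the slide, that s_i is produced either from s_{i-1} (column C2 of T_{i-1}) or from s_{i-2} (column C3 of T_{i-2}), with base cases N(1) = N(2) = 1; the function names and tuple encoding are hypothetical.

```python
def build_shared_dag(m):
    """progs[i] lists the top-level choices for node eta_i; each Select refers
    to its key node by index instead of copying that node's expressions."""
    progs = {1: [("var", "v1")]}
    for i in range(2, m + 1):
        choices = [("select", "C2", f"T{i - 1}", i - 1)]          # from s_{i-1}
        if i >= 3:
            choices.append(("select", "C3", f"T{i - 2}", i - 2))  # from s_{i-2}
        progs[i] = choices
    return progs


def count_expanded(progs, i, memo=None):
    """Number of distinct fully expanded expressions for node eta_i."""
    if memo is None:
        memo = {}
    if i not in memo:
        total = 0
        for choice in progs[i]:
            total += 1 if choice[0] == "var" else count_expanded(progs, choice[3], memo)
        memo[i] = total
    return memo[i]


def dag_size(progs):
    """Number of stored choices across all nodes (linear in m)."""
    return sum(len(choices) for choices in progs.values())
```

For m = 10 the structure stores 18 choices while representing 55 expressions for the last node, and the expression count keeps growing exponentially with m while the stored size grows linearly.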

Page 61: Learning Semantic String Transformations from Examples

Intersect_t: D_t1 ∧ D_t2

Page 62: Learning Semantic String Transformations from Examples

Current State of the Art: Help forums

Page 63: Learning Semantic String Transformations from Examples

Observations

• Semantic string transformations

• Input-output examples based interaction
– New disambiguating inputs

• Add-in with the same interface