A graph model for data and workflow provenance
Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska,
Jan van den Bussche, & Stijn Vansummeren
TaPP 2010
Provenance in ...
• Databases
• Mainly for (nested) relational model
• Where-provenance ("source location")
• Lineage, why ("witnesses")
• How/semiring model
• Relatively formal
• Workflows
• Many different systems
• Many different models
• (converging on OPM?)
• Graphs/DAGs
• Relatively informal
????? (How do the two relate?)
This talk
• Relate database & workflow "styles"
• Develop a common graph formalism
• Need a common, expressive language that
• supports many database queries
• describes some (simple) workflows
Previous work
• Dataflow calculus (DFL), based on nested relational calculus (NRC)
• Provenance "run" model by Kwasnikowska & Van den Bussche (DILS 07, IPAW 08)
• "Provenance trace" model for NRC
• (Acar, Ahmed & Cheney '08)
• Open Provenance Model (bipartite graphs)
• (Moreau et al. 2008-9), used in many WF systems
NRC/DFL background
• A very simple, functional language:
• basic functions +, *,... & constants 0,1,2,3...
• variables x,y,z
• pair/record types (A:e,...,B:e), πA (e)
• collection (set) types
• {e,...}, e ∪ e, {e | x in e'}, ∪e
An example
• Suppose R = {(1,2,3), (4,5,6), (9,8,7)}
sum { x * y | (x,y,z) in R, x < y}
= sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}}
= sum {1 * 2, 4 * 5}
= sum {2,20}
= 22
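The evaluation above can be checked mechanically. The following is a minimal Python sketch, not DFL itself: it assumes R can be modelled as a Python set of triples and that the filter x < y corresponds to an `if` clause in a set comprehension. The names `products` and `total` are illustrative only.

```python
# R from the slide, modelled as a Python set of triples (an assumption;
# DFL/NRC sets are not Python sets).
R = {(1, 2, 3), (4, 5, 6), (9, 8, 7)}

# { x * y | (x,y,z) in R, x < y }
# Only (1,2,3) and (4,5,6) pass the filter; (9,8,7) is dropped.
products = {x * y for (x, y, z) in R if x < y}

# sum over the resulting set
total = sum(products)

print(sorted(products))  # [2, 20]
print(total)             # 22
```

Because the comprehension produces a set, equal products would be collapsed before summing; that does not arise for this R.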
Another example
• In DFL, built-in functions / constants can be whole programs & files,
• as in Provenance Challenge 1 workflow:
let WarpParams := {align_warp(img,hdr)
| (img,hdr) in Inputs} in
let Reslices := {reslice(wp)
| wp in WarpParams} in
softmean(Reslices)
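To make the data flow of this workflow concrete, here is a rough Python sketch. The real align_warp, reslice, and softmean are external image-processing programs; the stubs below are hypothetical stand-ins that only build labelled strings, and the input names are invented for illustration.

```python
# Hypothetical stand-ins for the external programs named on the slide.
# They only build labelled strings so that the data flow is visible.
def align_warp(img, hdr):
    return f"warp({img},{hdr})"

def reslice(wp):
    return f"reslice({wp})"

def softmean(reslices):
    # sorted() makes the result independent of set iteration order
    return "softmean(" + ";".join(sorted(reslices)) + ")"

def workflow(inputs):
    # let WarpParams := {align_warp(img,hdr) | (img,hdr) in Inputs} in
    warp_params = {align_warp(img, hdr) for (img, hdr) in inputs}
    # let Reslices := {reslice(wp) | wp in WarpParams} in
    reslices = {reslice(wp) for wp in warp_params}
    # softmean(Reslices)
    return softmean(reslices)

# Invented input names, for illustration only.
inputs = {("img1", "hdr1"), ("img2", "hdr2")}
print(workflow(inputs))
```

Each `let` in the DFL program becomes one intermediate set, which is the structure a provenance graph for this run would need to record.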
Goal: Define "provenance graphs" for DFL
First step: values
[Figure: a value node is a constant c, or a record <> with field edges A1 ... An to values v, or a set {} with elem edges to values v, or a copy of a value v]
Example value
[Figure: an example value graph with constants 1, 2, 3, two record nodes <> with field edges A and B, and a set node {} with elem edges]
Next step: evaluation nodes ("process")
[Figure: evaluation nodes for constants and primitive functions (c; f with argument edges 1 ... n to subexpressions e) and for variables and temporary bindings (x; let x with head and body edges)]
Pairing
[Figure: evaluation nodes for record building (<> with field edges A1 ... An to subexpressions e) and field lookup (πA applied to e)]
Conditionals
[Figure: two if nodes, one with test and then edges, one with test and else edges, each pointing to subexpressions e]
Note: Only taken branch is recorded
Sets: basic operations
[Figure: evaluation nodes for the empty set (∅), singleton ({} applied to e), and union (∪ with argument edges 1 and 2 to subexpressions e)]
Sets: complex operations
[Figure: evaluation nodes for flattening (∪ applied to e) and iteration (for x with a head edge and one body edge per element)]
Provenance graphs
• are graphs with "both value and evaluation structure"
[Figure: a small provenance graph combining value nodes and evaluation nodes; the node labels were garbled in extraction]
A bigger example
[Figure: provenance graph for a larger example; the node labels were garbled in extraction. Successive slides highlight four parts of the same graph:]
• Value structure
• Input values
• Return value
• Expression structure (legible node labels include let R, let S, for x, for y, if, =, fst, snd, empty, {}, +)
Building provenance graphs
• is complicated
• Here we'll use high-level "graph rewrite rule" formalism
• Mostly because it is nicer to look at than the formal version
[Figures: graph rewrite rules, one slide per group; only node and edge labels survived extraction:]
• constants c, primitive functions f applied to v1, ..., vn, and let x (head and body edges, result marked copy)
• record construction <> with fields A1 ... An, and field lookup πAi (result is a copy of vi)
• if with test True (copy of the then branch) and with test False (copy of the else branch)
• empty? applied to a set {} (False when the set has elem edges, True when it has none)
• ∅, singleton {}, and union ∪ (the result set {} has elem edges to the elements of both arguments)
OK, take a deep breath!
[Figure: rewrite rule for for x: the body e is instantiated once per element v1 ... vn of the head set, x is linked to each element by a copy edge, and the resulting sets are combined with ∪]
An example
[Figure: step-by-step rewriting of a for x node whose head is the set {1, 2} and whose body is x + 1. The body is instantiated once per element, each x is a copy (C) of one element, and the two + nodes evaluate to 2 and 3, which become the elem values of the result set]
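The rewriting steps of this example can be imitated in a few lines of Python. This is only a sketch under stated assumptions: the `Graph` class, the edge directions, and the `result` edge label are invented here for illustration, and the code hard-wires the body x + 1 rather than interpreting arbitrary DFL expressions.

```python
import itertools

class Graph:
    """A tiny labelled graph: nodes carry a label, edges carry a label."""
    def __init__(self):
        self.nodes = {}   # node id -> label
        self.edges = []   # (source id, edge label, target id)
        self._ids = itertools.count()

    def node(self, label):
        nid = next(self._ids)
        self.nodes[nid] = label
        return nid

    def edge(self, src, label, dst):
        self.edges.append((src, label, dst))

def eval_for_plus_one(g, elements):
    """Record the evaluation of `for x in elements` with body `x + 1`."""
    head = g.node("{}")                 # input set value
    for_node = g.node("for x")          # evaluation node
    g.edge(for_node, "head", head)
    result = g.node("{}")               # output set value
    g.edge(for_node, "result", result)  # "result" label is an assumption
    outputs = []
    for v in elements:
        elem = g.node(str(v))
        g.edge(head, "elem", elem)
        x = g.node("x")                 # x is bound to a copy of the element
        g.edge(x, "copy", elem)
        one = g.node("1")
        plus = g.node("+")              # one instance of the body per element
        g.edge(for_node, "body", plus)
        g.edge(plus, "1", x)
        g.edge(plus, "2", one)
        out = g.node(str(v + 1))        # value computed by this body instance
        g.edge(plus, "result", out)
        g.edge(result, "elem", out)
        outputs.append(v + 1)
    return result, outputs

g = Graph()
result, outputs = eval_for_plus_one(g, [1, 2])
print(outputs)  # [2, 3]
```

The graph ends up with two body edges leaving the for x node, one per element, which mirrors the duplicated + nodes in the slides.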
Graphs can "lie" (inconsistency)
[Figure: three example graphs: a + node with inputs 2 and 2 but recorded result 5; an if node whose test is True but whose result is copied from the else branch; and a for x node over a set, with recorded values 1, 2, 3, 4]
"Locally" but not "globally" consistent
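A local consistency check for the first two kinds of node can be sketched as follows. This is an illustrative Python sketch, not the definition used in the paper: the function names and the way a node's inputs and result are passed in are assumptions made for the example.

```python
def plus_is_consistent(inputs, recorded_result):
    """A + node is locally consistent if its recorded result
    equals the sum of its recorded inputs."""
    return sum(inputs) == recorded_result

def if_is_consistent(test_value, copied_from):
    """An if node is locally consistent if the branch it copied
    its result from ("then" or "else") matches its test value."""
    return copied_from == ("then" if test_value else "else")

print(plus_is_consistent([2, 2], 5))   # False: the graph "lies"
print(plus_is_consistent([2, 2], 4))   # True
print(if_is_consistent(True, "else"))  # False: wrong branch recorded
```

Checks like these look at one node at a time, so a graph can pass all of them and still fail to be consistent as a whole.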
Graph queries
• Many possible approaches
• In paper: some Datalog
• Maybe overkill, seems fragile
• In code: some "annotation propagation" traversals
• Seems to handle where, "explanations", "summaries"
Explaining
[Figure: the larger example graph with the part that explains a chosen value highlighted; node labels were garbled in extraction]
Note: Smallest consistent subgraph (NOT transitive closure!)
Summarizing
[Figure: the same graph in summarized form; node labels were garbled in extraction]
Graphs are partially "replayable"
• If we change a value node, we can try to "readjust" the graph to recover consistency
• Formalized in (Acar, Ahmed, Cheney 08)
[Figure: a + node with inputs 2 and 2 and result 4; one input is changed to 17 and the result is readjusted to 19. An if node with test False copies 2 from its else branch; when the test changes to True, the then branch was never recorded, so readjustment is stuck]
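The two situations in this example can be mimicked with a small Python sketch. It is not the readjustment procedure of Acar, Ahmed and Cheney; the function names, the dictionary of recorded branches, and the use of `None` to signal "stuck" are assumptions made for illustration.

```python
def readjust_plus(inputs):
    """Recompute the result of a + node from its (possibly changed) inputs."""
    return sum(inputs)

def readjust_if(test_value, recorded_branches):
    """Recompute the result of an if node after its test value changed.

    recorded_branches maps "then"/"else" to the value recorded for that
    branch. Only the branch taken in the original run is present, so the
    other branch cannot be replayed: return None to signal "stuck".
    """
    branch = "then" if test_value else "else"
    if branch not in recorded_branches:
        return None
    return recorded_branches[branch]

print(readjust_plus([2, 17]))            # 19
print(readjust_if(False, {"else": 2}))   # 2
print(readjust_if(True, {"else": 2}))    # None (stuck)
```

The stuck case follows from recording only the taken branch of a conditional: the graph holds no trace of what the other branch would have computed.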
Implementation in Haskell
• Summarized in paper, full code on request
• roughly 250 LOC for basic evaluator
• another 300 for graphviz translation, basic queries, examples
• Point?
• No claim of efficiency/scalability, but easy to understand and experiment with
• Elucidates some tricky details that pictures hide
• Similar "lightweight modeling" might be valuable for understanding/relating other WF/DB models
Related work
• This work synthesizes/rearranges ideas from several previous works & "folklore"
• traces (Acar, Ahmed, Cheney 2008)
• runs (Kwasnikowska, van den Bussche, DILS 2007, IPAW 2008)
• OPM graphs (Moreau et al. IPAW 2008 etc.)
• and many workflow systems
• More can be done to relate DB & workflow models
Future work
• This is work in progress
• Next steps:
• Extending to understand/model other workflow features
• Better grasp of "real" queries and features needed
• Implementa(tion|ability)?
• Optimization?
Conclusions
• DB & WF provenance have much in common
• We develop a common graph model
• with both intuitive & precise presentations
• Still much to do to relate and integrate DB & WF models
• let alone integrate models at scale in real systems