trio-boeing.ppt

67
Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University

Upload: hondafanatics

Post on 20-Jan-2015

235 views

Category:

Documents


2 download

DESCRIPTION

 

TRANSCRIPT

Page 1: trio-boeing.ppt

Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS

Martin Theobald

Jennifer Widom

Stanford University

Page 2: trio-boeing.ppt

2

Outline

• Motivation and Applications

• Working Model for ULDBs

• Trio-One System Architecture

• Efficient Confidence Computations

Page 3: trio-boeing.ppt

3

Motivation

• Many applications involve data that is uncertain

(approximate, probabilistic, inexact, incomplete, imprecise, fuzzy, inaccurate, ...)

• Many of the same applications need to track the lineage of their data

Neither uncertainty nor lineage are supported by conventional Database Management Systems (DBMSs)

Coincidence or Fate?Coincidence or Fate?

Page 4: trio-boeing.ppt

4

Sample Applications

Deduplication• Uncertainty: Match and merge• Lineage: Source records

Information extraction• Uncertainty: Extracted labels and values• Lineage: Original context

Information integration• Uncertainty: Inconsistent information• Lineage: Original sources

Page 5: trio-boeing.ppt

5

Sample Applications

Scientific experiments• Uncertainty: Captured (and derived) data

• Lineage: Layers of views

Sensor data• Uncertainty: Sensor values, aggregations,

missing readings

• Lineage: Original readings, views

Page 6: trio-boeing.ppt

6

Claim

The connection between uncertainty and lineage goes deeper than just a shared need by several applications

Coincidence or Fate?Coincidence or Fate?

Page 7: trio-boeing.ppt

7

Lineage and Uncertainty

Lineage...• Enables simple and consistent representation of

uncertain data

• Allows for intuitive and complete working models

• Correlates uncertainty in query results with uncertainty in the input data

• Can make computation over uncertain data more efficient

Applications use lineage to reduce or resolve uncertainty

Page 8: trio-boeing.ppt

8

Goal

A new kind of DBMS in which:

1. Data2. Uncertainty3. Lineage

are all first-class interrelated concepts

Trio Trio }

Page 9: trio-boeing.ppt

9

The Trio Trio

1. Data Model Simplest extension to relational model that’s

sufficiently expressive

2. Query Language Simple extension to SQL with well-defined

semantics and intuitive behavior

3. System A complete open-source DBMS that people

want to use

Page 10: trio-boeing.ppt

10

The Present

1. Data Model Uncertainty-Lineage Databases (ULDBs)

2. Query Language TriQL

3. System Trio-One: Layered prototype deployable

on top of any standard relational DBMS

Page 11: trio-boeing.ppt

11

Running Example: Crime-Solving

Saw(witness,car) // may be uncertain

Drives(person,car) // may be uncertain

Suspects(person) = πperson(Saw ⋈car Drives)

Page 12: trio-boeing.ppt

12

Data Model: Uncertainty

An uncertain database represents a set ofpossible instances of data values

• Amy saw either a Honda or a Toyota

• Jimmy drives a Toyota, a Mazda, or both

• Betty saw an Acura with confidence 0.5 or a Toyota with confidence 0.3

• Hank is a suspect with confidence 0.7

Page 13: trio-boeing.ppt

13

Our Model for Uncertainty

1. Alternatives (mutually exclusive)

2. ‘?’ (Maybe) Annotations

3. Confidences

Page 14: trio-boeing.ppt

14

Our Model for Uncertainty

1. Alternatives: uncertainty about value

2. ‘?’ (Maybe) Annotations

3. Confidences

Saw (witness,car)

(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

witness car

Amy { Honda, Toyota, Mazda }

=

Three possibleinstances

Page 15: trio-boeing.ppt

15

Six possibleinstances

Our Model for Uncertainty

1. Alternatives

2. ‘?’ (Maybe): uncertainty about presence

3. Confidences

Saw (witness,car)

(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

(Betty, Acura)?

Page 16: trio-boeing.ppt

16

Our Model for Uncertainty

1. Alternatives

2. ‘?’ (Maybe) Annotations

3. Confidences: weighted uncertainty

Saw (witness,car)

(Amy, Honda): 0.5 ∥ (Amy,Toyota): 0.3 ∥ (Amy, Mazda): 0.2

(Betty, Acura): 0.6?

Six possible instances, each with a probability

Page 17: trio-boeing.ppt

17

Models for Uncertainty

• Our model (so far) is not especially new

• We spent some time exploring the space of models for uncertainty

• Tension between understandability and expressiveness– Our model is understandable

– But it is not complete, or even closed under common operations

Page 18: trio-boeing.ppt

18

Closure and Completeness

Completeness Can represent any finite set of possible

instances

Closure Can represent any result of relational

operators

Note: Completeness Closure

Page 19: trio-boeing.ppt

19

Models for Uncertainty

• A-Tuples – Uncertainty on attribute values• (Amy, {Toyota, Mazda})• Not closed

• X-Tuples – Uncertainty on entire tuples• { (Amy, Toyota), (Amy, Mazda) }

• Still not closed

• C-Tables – Tuples + Boolean constraints• { (Amy, Toyota, X=1),

(Amy, Mazda, X<>1),

(Cathy, Honda, X<>1) }• Closed and complete!• Understandable?

Page 20: trio-boeing.ppt

20

Our Model is Not Closed

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jimmy, Toyota) ∥ (Jimmy, Mazda)

(Billy, Honda) ∥ (Frank, Honda)

(Hank, Honda)

Suspects

Jimmy

Billy ∥ Frank

Hank

Suspects = πperson(Saw ⋈car Drives)

???

Does not correctlycapture possibleinstances in theresult

CANNOT

Page 21: trio-boeing.ppt

21

to the Rescue

Lineage (provenance): “where data came from”• Internal lineage

• External lineage

In Trio: A function λ from alternatives to other

alternatives (or external sources)

Lineage

Page 22: trio-boeing.ppt

22

Example with Lineage

ID Saw (witness,car)

11

(Cathy, Honda) ∥ (Cathy, Mazda)

ID Drives (person,car)

21

(Jimmy, Toyota) ∥ (Jimmy, Mazda)

22

(Billy, Honda) ∥ (Frank, Honda)

23

(Hank, Honda)

ID Suspects

31

Jimmy

32

Billy ∥ Frank

33

Hank

???

Suspects = πperson(Saw ⋈car Drives)

λ(31) = (11,2),(21,2)λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)

λ(33) = (11,1), 23

Correctly captures possible instances inthe result

Page 23: trio-boeing.ppt

23

Trio Data Model

1. Alternatives (mutually exclusive)

2. ‘?’ (Maybe) Annotations

3. Confidences

4. Lineage

ULDBs are closed and complete

Uncertainty-Lineage Databases (ULDBs)Uncertainty-Lineage Databases (ULDBs)

Page 24: trio-boeing.ppt

24

ULDBs: Lineage

Conjunctive lineage sufficient for most operations

• Disjunctive lineage for duplicate-elimination

• Negative lineage for difference

Page 25: trio-boeing.ppt

25

ULDBs: Minimality

A ULDB relation R represents a set of possible

instancesDoes every tuple in R appear in some

possible instance? (no extraneous tuples)

Does every maybe-tuple in R not appear in some possible instance? (no extraneous ‘?’s)

Also

Data-minimalityData-minimality

Lineage-minimalityLineage-minimality

Page 26: trio-boeing.ppt

26

Data Minimality Examples

Extraneous ‘?’

. . .

10

(Billy, Honda) ∥ (Frank, Honda)

. . .

. . .

40

Billy ∥ Frank

. . .

?λ(40,1)=(10,1); λ(40,2)=(10,2)

extraneous

. . .

20

(Billy, Honda)

. . .

?. . .

30

(Frank, Honda)

. . .

?

Page 27: trio-boeing.ppt

27

Data Minimality Examples

Extraneous tuple

(Diane, Mazda) ∥ (Diane, Acura)

Dianeextraneous

(Diane, Mazda)

(Diane, Acura)

?

??

Page 28: trio-boeing.ppt

28

ULDBs: Membership Questions

Does a given tuple t appear in some (all) possible instance(s) of R ?

Is a given table T one of (all of) the possible instances of R ?

Polynomial algorithms based on data-minimizationPolynomial algorithms based on data-minimization

NP-HardNP-Hard

Page 29: trio-boeing.ppt

29

Non-Theorists...

Wake Back Up!

Page 30: trio-boeing.ppt

30

Querying ULDBs

• Simple extension to SQL

• Formal semantics, intuitive meaning

• Query uncertainty, confidences, and lineage

TriQLTriQL

Page 31: trio-boeing.ppt

31

Initial TriQL Example

ID Saw (witness,car)

11

(Cathy, Honda) ∥ (Cathy, Mazda)

ID Drives (person,car)

21

(Jimmy, Toyota) ∥ (Jimmy, Mazda)

22

(Billy, Honda) ∥ (Frank, Honda)

23

(Hank, Honda)SELECT Drives.person INTO SuspectsFROM Saw, DrivesWHERE Saw.car = Drives.car

ID Suspects

31

Jimmy

32

Billy ∥ Frank

33

Hank

???

λ(31) = (11,2),(21,2)λ(32,1)=(11,1),(22,1); λ(32,2)=(11,1),(22,2)

λ(33) = (11,1), 23

Page 32: trio-boeing.ppt

32

Formal Semantics

Query Q on ULDB D

DD

D1, D2, …, DnD1, D2, …, Dn

possibleinstances

Q on eachinstance

representationof instances

Q(D1), Q(D2), …, Q(Dn)Q(D1), Q(D2), …, Q(Dn)

D’D’implementation of Q

operational semanticsD + ResultD + Result

Page 33: trio-boeing.ppt

33

Operational Semantics

Over conventional relational database:

For each tuple in cross-product of X1, X2, ..., Xn

1. Evaluate the predicate

2. If true, project attr-list to create result tuple

3. If INTO clause, insert into table

SELECT attr-list [ INTO table ]FROM X1, X2, ..., Xn

WHERE predicate

Page 34: trio-boeing.ppt

34

Operational Semantics

Over ULDB:For each tuple in cross-product of X1, X2, ..., Xn

1. Create “super tuple” T from all combinations of alternatives

2. Evaluate predicate on each alternative in T ; keep only the true ones

3. Project attr-list on each alternative to create result tuple

4. Details: ‘?’, lineage, confidences

SELECT attr-list [ INTO table ]FROM X1, X2, ..., Xn

WHERE predicate

Page 35: trio-boeing.ppt

35

Operational Semantics: Example

SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

Page 36: trio-boeing.ppt

36

Operational Semantics: Example

SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)

Page 37: trio-boeing.ppt

37

Operational Semantics: Example

SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)

Page 38: trio-boeing.ppt

38

Operational Semantics: Example

SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)

Page 39: trio-boeing.ppt

39

Operational Semantics: Example

(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)

SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)

Page 40: trio-boeing.ppt

40

Operational Semantics: Example

(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)

SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)

Page 41: trio-boeing.ppt

41

Operational Semantics: Example

(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)

SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)

Page 42: trio-boeing.ppt

42

Operational Semantics: Example

SELECT Drives.person INTO SuspectsFROM Saw, DrivesWHERE Saw.car = Drives.car

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

Suspects

Jim ∥ Bill

Hank

??

λ( ) = ...λ( ) = ...

Page 43: trio-boeing.ppt

43

Confidences

Confidences supplied with base data

Trio computes confidences on query results• Default probabilistic interpretation• Can choose to plug in different arithmetic

Saw (witness,car)

(Cathy, Honda): 0.6 ∥ (Cathy, Mazda): 0.4

Drives (person,car)

(Jim, Mazda): 0.3 ∥ (Bill, Mazda): 0.6

(Hank, Honda) ): 1.0

Suspects

Jim: 0.12 ∥ Bill: 0.24

Hank: 0.6

?

?

?

0.3 0.4

0.6

ProbabilisticProbabilistic Min

Page 44: trio-boeing.ppt

44

TriQL: Querying Confidences

Built-in function: Conf()

SELECT Drives.person INTO Suspects FROM Saw, Drives WHERE Saw.car = Drives.car AND Conf(Saw) > 0.5 AND Conf(Drives) >

0.8

SELECT Drives.person INTO Suspects FROM Saw, Drives WHERE Saw.car = Drives.car AND Conf(*) > 0.4

Page 45: trio-boeing.ppt

45

TriQL: Querying Lineage

Built-in join predicate: Lineage()

SELECT Saw.witness INTO AccusesHank FROM Suspects, Saw WHERE Lineage(Suspects,Saw) AND Suspects.person = ‘Hank’

Also works for transitive lineage:

SELECT witness FROM AccusesHank WHERE Lineage(Saw,AccusesHank)

Page 46: trio-boeing.ppt

46

Additional Query Constructs

• “Horizontal subqueries”Refer to tuple alternatives as a relation

• Unmerged (horizontal duplicates)

• Flatten, GroupAlts

• NoLineage, NoConf, NoMaybe

• Query-computed confidences

• Data modification statements

Page 47: trio-boeing.ppt

47

Final Example Query

PrimeSuspect (crime#, accuser, suspect)

(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy, Hank)

(2, Cathy, Frank) ∥ (2, Betty, Freddy)

person score

Amy 10

Betty 15

Cathy 5

Suspects

Jimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166

Frank: 0.25 ∥ Freddy: 0.75

Credibility

List suspects with conf values based on accuser credibility

Page 48: trio-boeing.ppt

48

Final Example Query

PrimeSuspect (crime#, accuser, suspect)

(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy, Hank)

(2, Cathy, Frank) ∥ (2, Betty, Freddy)

person score

Amy 10

Betty 15

Cathy 5

Suspects

Jimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166

Frank: 0.25 ∥ Freddy: 0.75

Credibility

SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)

Page 49: trio-boeing.ppt

49

Final Example Query

PrimeSuspect (crime#, accuser, suspect)

(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy, Hank)

(2, Cathy, Frank) ∥ (2, Betty, Freddy)

person score

Amy 10

Betty 15

Cathy 5

Suspects

Jimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166

Frank: 0.25 ∥ Freddy: 0.75

Credibility

SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)

Page 50: trio-boeing.ppt

50

Final Example Query

PrimeSuspect (crime#, accuser, suspect)

(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy, Hank)

(2, Cathy, Frank) ∥ (2, Betty, Freddy)

person score

Amy 10

Betty 15

Cathy 5

Suspects

Jimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166

Frank: 0.25 ∥ Freddy: 0.75

Credibility

SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)

“Scaled as conf”

“Uniform as conf”

“Scaled as conf”

“Uniform as conf”

Page 51: trio-boeing.ppt

51

The Trio System

Version 1Entirely on top of conventional DBMS

Surprisingly easy and complete, reasonably efficient

Page 52: trio-boeing.ppt

52

The Trio System: Version 1

Lin:RaidF

r tableaidT

o

R aid xid C

Relational DBMS

create trio table T(A,B)

select C into R ...

TrioMetadat

a

Trio APITrio API

SQL commands

• Result cursors• Traverse lineage

T aid xid A B

Page 53: trio-boeing.ppt

53

Relation Encoding

aid xid

conf

person

car

21 1 2 Jim Mazda

22 1 2 Bill Mazda

23 2 1 Hank Honda

Suspects

Jim ∥ Bill

Hank

aid xid conf person

31 1 3 Jim

32 1 3 Bill

33 2 2 Hank

Saw (witness,car)

(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)

(Jim, Mazda) ∥ (Bill, Mazda)

(Hank, Honda)

aid xid conf witness

car

11 1 2 Cathy Honda

12 1 2 Cathy Mazda

??

aidFr

table aidTo

31 Saw 12

31 Drives 21

32 Saw 12

32 Drives 22

33 Saw 11

33 Drives 23

SawDrives

SuspectsLin:Suspects

Page 54: trio-boeing.ppt

54

Relation Encoding with Confidences

aid xid

conf

person car

21 1 0.3 Jim Mazda

22 1 0.6 Bill Mazda

23 2 1.0 Hank Honda

aid xid conf person

31 1 NULL Jim

32 1 NULL Bill

33 2 NULL Hank

Saw (witness,car)

(Cathy, Honda): 0.6 ∥ (Cathy, Mazda): 0.4

Drives (person,car)

(Jim, Mazda): 0.3 ∥ (Bill, Mazda): 0.6

(Hank, Honda)

aid xid conf witness

car

11 1 0.6 Cathy Honda

12 1 0.4 Cathy Mazda

SawDrives

Suspects

0.12

0.24

0.6

Page 55: trio-boeing.ppt

55

Query Translation

Persistent results: Query Q into result table R

1. Run query Q’ to produce “super-result” R Q’ ≈ Q but adds aid/xid’s of source tuples, joins lineage

tables for lineage() predicates

2. Group R into alternatives, generate new xid’s3. Materialze lineage data into lin:R

4. Compute confidences? (optional)5. Add/update metadata: schemas, confidence info,

lineage structure

Transient results: stop at 2, return cursor over new xid’s

Page 56: trio-boeing.ppt

56

The Trio System: Version 1

Lin:R aidtabl

e aid

R aid xid C

Relational DBMS

create trio table T(A,B)

select C into R ...

TrioMetadat

a

Trio APITrio API

SQL commands

• Result cursors• Traverse lineage

T aid xid A B

Page 57: trio-boeing.ppt

57

The Trio System: Version 1

Trio API and translator(TriQL parser in Python)

Trio API and translator(TriQL parser in Python)

TrioMetadat

a

TrioExplorer(GUI client)

TrioExplorer(GUI client)

Trio Stored

Procedures

EncodedData

TablesLineageTables

Standard SQL

• Data & system columns• A-IDs for alternatives• “Verticalized” through shared X-IDs• Confidences

• One lineage table per derived table• Connecting A-IDs

• Table types• Schema-level lineage structure

• conf()• lineage()

• TriQL queries• DDL commands• Schema browsing• Table browsing• Explore lineage• On-demand confidence computation

Command-lineclient

Command-lineclient

Relational DBMS

SPI-Plugin for

PostgreSQLReplacing

“Fetch-All”

SPI-Plugin for

PostgreSQLReplacing

“Fetch-All”

Page 58: trio-boeing.ppt

58

The Trio System: Version 2

Trio API and translatorTrio API and translator

TrioExplorer(GUI client)

TrioExplorer(GUI client)Command-line

client

Command-lineclient

Trio DBMS

SpecializedTrio Data Structures

SpecializedTrio Processing & Optimizing

Standard SQL Direct function calls

Page 59: trio-boeing.ppt

59

Efficient Confidence Computation

Previous approach (probabilistic databases)• Each operator computes confidences during

query execution• Only certain (“safe”) query plans allowed

In ULDBs Confidence of alternative A is function of

confidences in λ(A)

Our approach• Decoupled data and confidence computation

• Allow any query plan for data computation

• Compute confidences on-demand based on lineage

Page 60: trio-boeing.ppt

60

CIDs & Confidence Memoization

Find set of Closest Independent Descendants (CIDs) for each relation in the schema-level lineage DAG (relations with no shared ancestors)

Batched confidence computations in SQL• Select unions of distinct base alternatives from

all root-to-leaf lineage paths• With CIDs and memoization enabled

– if confidences are memoized then stop at CIDs – else traverse down to the leafs (base data) and memoize at CIDs

(ongoing work)

Page 61: trio-boeing.ppt

61

“Probabilistic TPC-H” Setup

• Horizontal split of TPC-H tables

• Uniform-random number of alternatives [1-10]

• Uniform confidences

PartsuppsPartsuppsLineitemsLineitemsOrdersOrders

O2O2O1

O1

OO LL PP

OLOL LPLP

OLPOLP1.5M

1.0M 0.1M

1.0M 0.5M 0.7M

O3O3 L2

L2L1L1 L3

L3 L4L4 P2

P2P1P1 P3

P3

OO LL PP

CID(OLP) = {O, L, P}

0.8M6M1.5M

Page 62: trio-boeing.ppt

62

Confidence Computation Times

0

500

1,000

1,500

2,000

2,500

3,000

3,500

4,000

4,500

5,000

Exe

cuti

on

Tim

e (s

eco

nd

s)

O L P OL LP OLP Total

No CIDs

CIDs

Page 63: trio-boeing.ppt

63

Per-Tuple vs. Batching (no CIDs)

Page 64: trio-boeing.ppt

64

Memoization & Join Multiplicity

Page 65: trio-boeing.ppt

65

Current Topics

System• Full query language• Nice interface• Performance experiments• Demo applications

Algorithms: confidence & lineage computation,

extraneous data, membership questions• Minimize lineage traversal

• Confidence memoization

• Efficient batch computations

Page 66: trio-boeing.ppt

66

Future Directions

Theory, Model, Algorithms• Unlimited opportunities

System• Storage, indexing, partitioning• Statistics and query optimization

More features• Top-K by confidence • Incomplete relations; continuous uncertainty;

correlated uncertainty• External lineage; update lineage; versioning

Page 67: trio-boeing.ppt

Search “stanford trio”

Current Trio team:Parag Agrawal, Anish Das Sarma, Raghotham

Murthy, Michi Mutsuzaki, Shubha Nabar, Tomoe Sugihara,

Martin Theobald, Jennifer Widom

Special thanks to:Ashok Chandra, Alon Halevy, Jeff Ullman,

Omar Benjelloun

but don’t forgetthe lineage…