mazda trio meeting

Post on 13-Sep-2014



TRANSCRIPT

Trio: A System for Data, Uncertainty, and Lineage

Search “stanford trio”
http://i.stanford.edu/trio

2

People

Current
• Jennifer Widom (faculty)
• Omar Benjelloun (post-doc)
• Parag Agrawal, Anish Das Sarma, Shubha Nabar (PhD)
• Michi Mutsuzaki (MS)
• Tomoe Sugihara (visitor)

Incoming
• Martin Theobald (post-doc)
• Raghu Murthy (MS)
• Ander de Keijzer (visitor)

Alums
• Alon Halevy, Ashok Chandra (visitors)
• Chris Hayworth (MS)

3

Why Uncertainty + Lineage?

Many applications seem to need both

From a technical standpoint, it turns out that lineage...

1. Enables simple and consistent representation of uncertain data

2. Correlates uncertainty in query results with uncertainty in the input data

3. Can make computation over uncertain data more efficient

4

Trio Components

1. Data Model: ULDBs (Uncertainty-Lineage Databases). Simple extension to relational model

2. Query Language: TriQL. Simple extension to SQL, well-defined semantics and intuitive behavior

3. System: Version 1. Complete system and GUI built on top of conventional DBMS

5

Running Example: Crime-Solving

Saw(witness,car) // may be uncertain
Drives(person,car) // may be uncertain

Suspects(person) = πperson(Saw ⋈ Drives)

6

Our Model for Uncertainty

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences

7

Our Model for Uncertainty

1. Alternatives: uncertainty about value
2. ‘?’ (Maybe) Annotations
3. Confidences

Saw (witness,car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

=

witness  car
Amy      { Honda, Toyota, Mazda }

Three possible instances

8

Our Model for Uncertainty

1. Alternatives
2. ‘?’ (Maybe): uncertainty about presence
3. Confidences

Saw (witness,car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
(Betty, Acura) ?

Six possible instances

9

Our Model for Uncertainty

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty

Saw (witness,car)
(Amy, Honda): 0.5 ∥ (Amy, Toyota): 0.3 ∥ (Amy, Mazda): 0.2
(Betty, Acura): 0.6 ?

Six possible instances, each with a probability
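The six instances and their probabilities can be enumerated directly. The following is a minimal sketch, not Trio code: it assumes that x-tuples are independent and that the probability of a ‘?’ tuple being absent is one minus the sum of its alternatives' confidences. The function name `possible_instances` and the data layout are illustrative choices.

```python
from itertools import product

# Each x-tuple is (list of (alternative, confidence), maybe_flag).
# Values are taken from the Saw table on this slide.
saw = [
    ([(("Amy", "Honda"), 0.5), (("Amy", "Toyota"), 0.3), (("Amy", "Mazda"), 0.2)], False),
    ([(("Betty", "Acura"), 0.6)], True),
]

def possible_instances(xtuples):
    """Return (instance, probability) pairs, assuming independent x-tuples."""
    per_xtuple = []
    for alternatives, maybe in xtuples:
        choices = list(alternatives)
        if maybe:
            # Assumption: the leftover probability mass is the chance of absence.
            absent = 1.0 - sum(conf for _, conf in alternatives)
            choices.append((None, absent))
        per_xtuple.append(choices)

    instances = []
    for combo in product(*per_xtuple):
        rows = frozenset(t for t, _ in combo if t is not None)
        prob = 1.0
        for _, conf in combo:
            prob *= conf
        instances.append((rows, prob))
    return instances

instances = possible_instances(saw)
for rows, prob in instances:
    print(sorted(rows), round(prob, 2))
```

Amy contributes three choices and Betty two (present or absent), which gives the six instances; their probabilities sum to 1.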

10

Models for Uncertainty

• Our model (so far) is not especially new
• We spent some time exploring the space of models for uncertainty [ICDE 06, journal]
• Tension between understandability and expressiveness
  – Our model is understandable
  – But it is not complete, or even closed under common operations

11

Our Model is Not Closed

Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
(Billy, Honda) ∥ (Frank, Honda)
(Hank, Honda)

Suspects = πperson(Saw ⋈ Drives)

Suspects
Jimmy ?
Billy ∥ Frank ?
Hank ?

Does not (CANNOT) correctly capture possible instances in the result
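A brute-force check makes the gap concrete. This sketch (illustrative only, not part of Trio) evaluates the join on every possible instance of Saw and Drives, then compares the outcome with the instances described by a Suspects table without lineage. It assumes the three question marks on the slide mean that each Suspects tuple carries a ‘?’; the helper names `instances` and `suspects` are hypothetical.

```python
from itertools import product

saw = [[("Cathy", "Honda"), ("Cathy", "Mazda")]]
drives = [
    [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    [("Billy", "Honda"), ("Frank", "Honda")],
    [("Hank", "Honda")],
]

def instances(xtuples, maybe=False):
    """All possible instances: one alternative per x-tuple, or none if maybe."""
    options = [alts + ([None] if maybe else []) for alts in xtuples]
    return {frozenset(t for t in combo if t is not None)
            for combo in product(*options)}

def suspects(saw_inst, drives_inst):
    """Project person from the join of Saw and Drives on car."""
    return frozenset(p for (_, c1) in saw_inst
                     for (p, c2) in drives_inst if c1 == c2)

# Results obtained by running the query on each possible instance.
true_results = {suspects(s, d)
                for s in instances(saw) for d in instances(drives)}

# Instances described by the Suspects table alone (assumed '?' on every tuple).
naive = [["Jimmy"], ["Billy", "Frank"], ["Hank"]]
naive_results = instances(naive, maybe=True)

spurious = naive_results - true_results
print(len(true_results), len(naive_results), len(spurious))
```

The query has four possible results, while the table without lineage admits twelve instances, including impossible ones such as Jimmy and Hank together (Jimmy requires the Mazda sighting, Hank the Honda sighting).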

12

Lineage to the Rescue

Lineage
• Captures “where data came from”
• In Trio: A function λ from alternatives to other alternatives (or external sources)

13

Example with Lineage

ID  Saw (witness,car)
11  (Cathy, Honda) ∥ (Cathy, Mazda)

ID  Drives (person,car)
21  (Jimmy, Toyota) ∥ (Jimmy, Mazda)
22  (Billy, Honda) ∥ (Frank, Honda)
23  (Hank, Honda)

Suspects = πperson(Saw ⋈ Drives)

ID  Suspects
31  Jimmy ?
32  Billy ∥ Frank ?
33  Hank ?

λ(31) = (11,2),(21,2)
λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)
λ(33) = (11,1), 23

Correctly captures possible instances in the result
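One way to see why lineage fixes the problem: once a choice of base alternatives is made, a result alternative is present exactly when every alternative in its lineage was chosen. The sketch below encodes λ from this slide as a Python dict and derives the result instances that way. It is an illustration under simplifying assumptions (conjunctive lineage only, no ‘?’ on base tuples), and it writes the single alternatives of 23, 31, and 33 with index 1.

```python
from itertools import product

# Base x-tuple ID -> number of alternatives (Saw: 11; Drives: 21, 22, 23).
base = {11: 2, 21: 2, 22: 2, 23: 1}

# λ from the slide, as (x-tuple ID, alternative index) pairs.
lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (32, 1): {(11, 1), (22, 1)},
    (32, 2): {(11, 1), (22, 2)},
    (33, 1): {(11, 1), (23, 1)},
}
names = {(31, 1): "Jimmy", (32, 1): "Billy", (32, 2): "Frank", (33, 1): "Hank"}

def result_instances(base, lineage, names):
    """Derive Suspects instances from every choice of base alternatives."""
    ids = sorted(base)
    out = set()
    for picks in product(*(range(1, base[i] + 1) for i in ids)):
        chosen = set(zip(ids, picks))
        # A result alternative exists iff all of its lineage was chosen.
        out.add(frozenset(names[alt] for alt, src in lineage.items()
                          if src <= chosen))
    return out

results = result_instances(base, lineage, names)
for inst in sorted(results, key=sorted):
    print(sorted(inst))
```

Only four instances come out: the empty set, {Jimmy}, {Billy, Hank}, and {Frank, Hank}. Combinations such as Jimmy with Hank never appear because their lineage requires two different alternatives of x-tuple 11.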

14

Uncertainty-Lineage Databases (ULDBs)

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage

ULDBs are closed and complete [VLDB 06]

15

ULDBs: Lineage

• Conjunctive lineage sufficient for most operations
• Duplicate-elimination: Disjunctive lineage
• Difference: Negative lineage
• General case after multiple operations/queries: Boolean formula

16

ULDBs: Interesting Questions

• Data-minimality: extraneous alternatives, extraneous “?”
• Lineage-minimality: harder
• Membership: tuple and table, some-instance and all-instances
• Coexistence: multiple tuples
• Extraction: remove tables, retain possible-instances

17

Example: Extraneous Data

[Figure: lineage diagram. Base x-tuple (Diane, Mazda) ∥ (Diane, Acura); derived tuples (Diane, Mazda) ?, (Diane, Acura) ?, and Diane ?; the label “extraneous” appears next to Diane]

18

Example: Coexistence

[Figure: lineage diagram. Base x-tuple (Diane, Mazda) ∥ (Diane, Acura); derived tuples (Diane, Mazda) ?, (Diane, Acura) ?, Mazda ?, and Acura ?; annotation: Can’t coexist]

19

Querying ULDBs: Semantics

Query Q on ULDB D

D → possible instances → D1, D2, …, Dn
D1, D2, …, Dn → Q on each instance → Q(D1), Q(D2), …, Q(Dn)
D → implementation of Q (operational semantics) → D’ (D + Result)
D’ → representation of instances → Q(D1), Q(D2), …, Q(Dn)

20

Querying ULDBs: TriQL

Basic TriQL: SQL with new semantics
• Obeys commutative diagram for uncertain data
• Tracks lineage
• Query results: new table or on-the-fly

Implemented TriQL: also built-in predicates conf(), lineage(), lineage*()

21

Additional TriQL Constructs [Language manual on web site]

• “Horizontal subqueries”: refer to tuple alternatives as a relation
• Unmerged (horizontal duplicates)
• Flatten, GroupAlts
• NoLineage, NoConf, NoMaybe
• Query-specified confidences [done]
• Data modification statements

22

Confidence Computation

• Confidences computed on-demand based on lineage
  – Confidence of alternative A is function of confidences in λ*(A)
  – Permits any query plan for data computation
• Default probabilistic interpretation, but queries can override

SELECT person, min(conf(Saw),conf(Drives)) AS conf
FROM Saw, Drives
WHERE Saw.car = Drives.car
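To make the two interpretations concrete, the sketch below computes a confidence from λ*(A) in both ways. It is a simplified illustration, not Trio's algorithm: it assumes purely conjunctive lineage over independent base x-tuples, in which case the probabilistic confidence is the product of the base confidences in λ*(A). The confidence values and the derived alternative (41, 1) are hypothetical; the lineage entry for (31, 1) follows the earlier Suspects example.

```python
from math import prod

# Hypothetical confidences for base alternatives (x-tuple ID, alternative index).
conf = {(11, 1): 0.6, (11, 2): 0.4, (21, 1): 0.7, (21, 2): 0.3}

# Direct lineage λ. (41, 1) is a made-up second-level result derived from (31, 1).
lineage = {
    (31, 1): {(11, 2), (21, 2)},
    (41, 1): {(31, 1)},
}

def base_lineage(alt, lineage):
    """λ*(alt): follow lineage transitively down to base alternatives."""
    if alt not in lineage:
        return {alt}
    result = set()
    for src in lineage[alt]:
        result |= base_lineage(src, lineage)
    return result

def default_conf(alt):
    """Probabilistic interpretation, assuming conjunctive lineage and independence."""
    return prod(conf[b] for b in base_lineage(alt, lineage))

def min_conf(alt):
    """Query-specified override in the spirit of min(conf(Saw), conf(Drives))."""
    return min(conf[b] for b in base_lineage(alt, lineage))

print(round(default_conf((31, 1)), 2), min_conf((31, 1)))
print(round(default_conf((41, 1)), 2), min_conf((41, 1)))
```

Because the computation depends only on λ*(A), it can run after the data has been produced, which is why the query plan for the data itself is unconstrained. Disjunctive or negative lineage would need a more general formula than the product used here.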

23

Trio System: Version 1

Command-line client, TrioExplorer (GUI client)
• DDL commands
• TriQL queries
• Schema browsing
• Table browsing
• Explore lineage
• On-demand confidence computation

Trio API and translator (Python)
• Standard SQL

Standard relational DBMS

Encoded Data Tables
• “Verticalize”
• Shared IDs for alternatives
• Columns for confidence, “?”

Lineage Tables
• One per result table
• Uses unique IDs

Trio Metadata
• Table types
• Schema-level lineage structure

Trio Stored Procedures
• conf()
• lineage() “==>”
• lineage*() “==>>”
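As a rough illustration of the encoding idea (one row per alternative, a shared ID per x-tuple, columns for confidence and “?”, and a separate lineage table per result table), here is a sketch using Python's sqlite3 module. The table names, column names, and ID values are all hypothetical and are not Trio's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE saw_enc (
    aid     INTEGER PRIMARY KEY,  -- unique ID per alternative (hypothetical)
    xid     INTEGER NOT NULL,     -- shared by all alternatives of one x-tuple
    witness TEXT,
    car     TEXT,
    conf    REAL,                 -- confidence column, NULL if none
    maybe   INTEGER NOT NULL      -- 1 if the x-tuple carries '?'
);
CREATE TABLE suspects_lin (       -- one lineage table per result table
    aid       INTEGER NOT NULL,   -- alternative in the result table
    src_table TEXT NOT NULL,
    src_aid   INTEGER NOT NULL    -- alternative it was derived from
);
""")

# "Verticalized" x-tuple 11: (Cathy, Honda) || (Cathy, Mazda), one row each.
con.executemany(
    "INSERT INTO saw_enc VALUES (?, ?, ?, ?, ?, ?)",
    [(111, 11, "Cathy", "Honda", None, 0),
     (112, 11, "Cathy", "Mazda", None, 0)],
)
# Lineage of a Suspects alternative (hypothetical aid 311, Jimmy).
con.executemany(
    "INSERT INTO suspects_lin VALUES (?, ?, ?)",
    [(311, "saw_enc", 112), (311, "drives_enc", 212)],
)

# Regroup rows into x-tuples with ordinary SQL.
xtuples = con.execute(
    "SELECT xid, COUNT(*) FROM saw_enc GROUP BY xid ORDER BY xid"
).fetchall()
sources = con.execute(
    "SELECT src_table, src_aid FROM suspects_lin WHERE aid = 311 ORDER BY src_aid"
).fetchall()
print(xtuples, sources)
```

Keeping everything in plain relational tables is what lets a translator layer issue standard SQL against a conventional DBMS.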

24

Current & Future Topics

Algorithms: confidence computation, coexistence, extraneous data
• Minimize lineage traversal
• Memoization
• Batch operations

System
• Full query language
• More internal processing?
  – Storage and indexing
  – Statistics and query optimization

25

Current & Future Topics

• Top-K by confidence
• Extend basic uncertainty model
  – Incomplete relations
  – Continuous uncertainty
  – Correlated uncertainty?
• External lineage, update lineage, versioning
