wolfgang gatterbauer

65
Wolfgang Gatterbauer http://queryviz.com abases will visuali queries too University of Washington Database Group VLDB'11

Upload: tao

Post on 24-Feb-2016

46 views

Category:

Documents


0 download

DESCRIPTION

VLDB'11. Databases will visualize queries too. Wolfgang Gatterbauer . http://queryviz.com. University of Washington Database Group. Two Interactions between Users and Queries. Intent: Find. essential for Query Browse and Re-use. hard. SQL. Query Interpretation. SELECT A FROM R - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Wolfgang Gatterbauer

Wolfgang Gatterbauer

http://queryviz.com

Databases will visualize queries too

University of WashingtonDatabase Group

VLDB'11

Page 2: Wolfgang Gatterbauer

2http://queryviz.com

Two Interactions between Users and Queries

Query Composition

Query Interpretation SELECT AFROM RWHERE B not in

(SELECT DFROM S)

Intent: Find...

SQL

Problem:Query Interpretation is hard too!

even used for testing purposes, e.g., on www.gradiance.com

Khoussainova et al. [CIDR’09]

Li et al. [CIDR’11]

Chatzopoulou et al. [SSDBM’09] Howe, Cole [MS eSc WS’10]

Recent work on Query Management:Idea: Re-use and adapt existing queriesCQMSSQL QuerIESQLshareDBease

hardessential for Query Browse and Re-use

Page 3: Wolfgang Gatterbauer

3http://queryviz.com

select distinct a3.fname, a3.lname from Actor a0, Casts c0, Casts c1, Casts c2, Casts c3,

Actor a3where a0.fname = 'Kevin' and a0.lname = 'Bacon' and c0.pid = a0.id and c0.mid = c1.mid and c1.pid = c2.pid and c2.mid = c3.mid and c3.pid = a3.id and not exists (select xc1.pid

from Actor xa0, Casts xc0, Casts xc1where xa0.fname = 'Kevin' and xa0.lname = 'Bacon' and xa0.id = xc0.pid and xc0.mid = xc1.mid and xc1.pid = a3.id)

and not exists (select ya0.idfrom Actor ya0where ya0.fname = 'Kevin' and ya0.lname = 'Bacon’)

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

Browsing and Understanding existing Queries

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Page 4: Wolfgang Gatterbauer

4http://queryviz.com

select distinct a3.fname, a3.lname from Actor a0, Casts c0, Casts c1, Casts c2, Casts c3,

Actor a3where a0.fname = 'Kevin' and a0.lname = 'Bacon' and c0.pid = a0.id and c0.mid = c1.mid and c1.pid = c2.pid and c2.mid = c3.mid and c3.pid = a3.id and not exists (select xc1.pid

from Actor xa0, Casts xc0, Casts xc1where xa0.fname = 'Kevin' and xa0.lname = 'Bacon' and xa0.id = xc0.pid and xc0.mid = xc1.mid and xc1.pid = a3.id)

and not exists (select ya0.idfrom Actor ya0where ya0.fname = 'Kevin' and ya0.lname = 'Bacon’)

Castspidmid

Castspidmid

Castspidmid

Castspidmid

Actor id

fname='Kevin'lname='Bacon'

Castspidmid

Castspidmid

Actorid

fname='Kevin' lname='Bacon'

Actorid

fname='Kevin'lname='Bacon'

Actorid

fnamelname

SELECTfnamelname

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

Scores Team Day Runs

select Team Day

Scores Team Day Runs

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

W wid

select wid

W wid

W wid tid

W wid tid

>

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

Query Visualization can help

Page 5: Wolfgang Gatterbauer

http://queryviz.com

Four principal ways for Query Interpretation

5

with SQL

How to facilitateSQL queryinterpretation?

w/o SQL

1 Manipulate SQL text

2 Show query results

3 Translate into NL text

4 Visualize Query

represent otherthan query

representquery

as visual

in NL

in a different query language ?

as combination of input / output / query?

e.g., syntactic highlightinge.g., aligning query blocks

Olston et al. [Sigmod’09]

Ioannidis et al. [NLDB’08] [CIDR'00, ICDE'10]

e.g., example data results,related to

http://queryviz.com

into music ?as ???

Page 6: Wolfgang Gatterbauer

http://queryviz.com

"One picture > 1000 words"

6

Erickson [lecture notes’09]"...what we think the world looks like" according to

Text Visual"... P is the set of problems that can be solved quickly... NP is the set of decision problems where we can verify a YES answer quickly if we have the solution in front of us... A problem is NP-hard if a polynomial-time algorithm for would imply a polynomial-time algorithm for every problem in NP... a problem is NP-complete if it is both NP-hard and an element of NP."

Page 7: Wolfgang Gatterbauer

http://queryviz.com

Query Visualization vs. Visual Query Languages

7

easyhard

Lot of past work, see e.g. surveyRecent focus in DB

Data Queries

_______________

InformationVisualization

Visual QueryLanguages

QueryVisualization

Target to Visualize

Interpret(Read)

Compose(Write)

User Action

Catarci et al. [J. Vis. Lang. Comput.’97]

Page 8: Wolfgang Gatterbauer

http://queryviz.com

The Challenge

8

Find the appropriate visual alphabet which(i) allows users to quickly understand a query's intent, (ii) can be easily learned by users, and (iii) can express a large fraction of SQL.

Additionally, find(iv) automatic translations from SQL to the visualization.

goal

Page 9: Wolfgang Gatterbauer

http://queryviz.com 9

... logical correspondence

... digrammatic reasoning, reading order, inside/outside

... start from existing known stuff

... ambiguity example

... online test & grammar

... grouping example / disjunctions

Page 10: Wolfgang Gatterbauer

Incremental Complexity

10

select F.personfrom Frequents F, Likes L, Serves Swhere F.person = L.personand F.bar = S.barand L.drink = S.drink

Likes(person, drink)Frequents(person, bar)Serves(bar, drink, price)

Q: Find persons that frequent some bar that serves only drinks they like.

select F.personfrom Frequents Fwhere not exists

(select S.drinkfrom Serves Swhere S.bar = F.barand not exists

(select L.drinkfrom Likes Lwhere L.person = F.personand S.drink = L.drink))

Q(x) :- Frequents(x,y), Serves (y,z,_), Likes (x,z)

: dashed line around relation

Design decision: start from known visual metaphors for CQs; gradually generalize

Unlike SQL: no aliases needed; schema implicit

Unlike Datalog: no anony-mous variables shown

Q: Find persons that frequent some bar that serves some drink they like.

+167% more SQL text

Design decision: allow an implicit reading order to the arrow

+13% more visual elements

Page 11: Wolfgang Gatterbauer

Logical transformations

11

Likes(person, drink)Frequents(person, bar)Serves(bar, drink, price)

Q: Find persons that frequent some bar that serves only drinks they like.

select F.personfrom Frequents Fwhere not exists

(select S.drinkfrom Serves Swhere S.bar = F.barand not exists

(select L.drinkfrom Likes Lwhere L.person = F.personand S.drink = L.drink))

: dashed line around relation

: double line around relation

Q: Find persons that frequent a bar so that they like all drinks served.

Q: Find persons that frequent some bar so that there is no drink served that the person does not like.

Design decision: limited logical transforma-tion can further simplify representation

Page 12: Wolfgang Gatterbauer

http://queryviz.com

QueryViz for Query Intent, not Debugging

12

Discontinuity with NULL valuesselect R.Afrom Rwhere not exists

(select *from Swhere S.B = R.B)

select R.Afrom Rwhere R.B not IN

(select S.Bfrom S)

Discontinuity with empty tablesselect R.afrom R, Swhere R.a=S.aor exists

(select *from Twhere R.a=T.a)

select R.afrom R, S, Twhere R.a=S.aor R.a=T.a

S

ASELECT

AT

A

R

A

Empty result if S.B contains NULL

Empty result if T is empty

Design decision: minimum visual complexity possible overloading and ambiguity like in NL

Page 13: Wolfgang Gatterbauer

http://queryviz.com

Arrangement of Tables and Arrows in the Graph

13

Q: Find worlds for which there exists another earlier world that contains all its tuples.

Hollow arrow for comparison within the same component (CQ block)

Arrangement currently via Graphviz; place for improvement

SELECT W1.widFROM Worlds W1, Worlds W2WHERE W1.wid > W2.widAND not exists

(SELECT *FROM Worlds W3WHERE W3.wid = W1.widAND not exists

(SELECT *FROM Worlds W4WHERE W4.wid = W2.widAND W4.tid = W3.tid))

Design decision: overloading of meaning to the arrow symbol

Page 14: Wolfgang Gatterbauer

http://queryviz.com

Input: Schema

Output: visualization

14

Iinput Query

Danaparamita & G [EDBT'11]

http://queryviz.com

Page 15: Wolfgang Gatterbauer

http://queryviz.com

Wide Open Questions

15

2. What is the appropriate level of abstraction? (intent vs. debugging)

3. What are the appropriate basic visual metaphors?

4. Can we visualize at different granularities? ("zooming in")

7. How to optimally place the visual elements?

1. How to visualize outer joins, sorting, arithmetic expressions, etc.?

8. How to standardize evaluation of alternative approaches? ("TPC-H for speed of Query Interpretation" via user studies)

5. How can we visualize query fragments?

6. How to adapt visualizations to audiences? ("one size fit all")

Page 16: Wolfgang Gatterbauer

http://queryviz.com

Wide Open Questions

16

2. What is the appropriate level of abstraction? (intent vs. debugging)

3. What are the appropriate logical symbols?

4. Can we visualize at different granularities? ("zooming in")

5. How to adapt visualizations to audiences? ("one size fit all")

6. How can we visualize query fragments?

7. How to optimally place the visual elements?

1. How to visualize outer joins, sorting, arithmetic expressions, etc.?

8. How to standardize evaluation of alternative approaches? ("TPC-H for speed of Query Interpretation" via user studies)Query Plan SQL Query Query Intent

more abstract

Pirahesh et al. [Sigmod’92] Jaakkola & Thalheim. [ER WS’03]Most VQL* such as Visual SQL

Correlated nesting is preserved

Scores Team Day Runs

select Team Day

Scores Team Day Runs

http://queryviz.com

* Note that VQL (Visual Query Languages) do not provide the reverse functionality of query visualization

Starbust

Page 17: Wolfgang Gatterbauer

http://queryviz.com

Wide Open Questions

17

2. What is the appropriate level of abstraction? (intent vs. debugging)

3. What are the appropriate basic visual metaphors?

1. How to visualize outer joins, sorting, arithmetic expressions, etc.?

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

QueryViz: default reading orderand logical equivalences

retain originalnesting

Frequents(,)

select(person)

Frequents(,)

Likes(,) Serves(,)

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

Arrows encoding logicalrelations instead of boxes

Somethingcompletelydifferent

Page 18: Wolfgang Gatterbauer

http://queryviz.com

Wide Open Questions

18

2. What is the appropriate level of abstraction? (intent vs. debugging)

3. What are the appropriate basic visual metaphors?

4. Can we visualize at different granularities? ("zooming in")

5. How can we visualize query fragments?

6. How to adapt visualizations to audiences? ("one size fit all")

7. How to optimally place the visual elements?

1. How to visualize outer joins, sorting, arithmetic expressions, etc.?

8. How to standardize evaluation of alternative approaches? ("TPC-H for speed of Query Interpretation" via user studies)

Page 19: Wolfgang Gatterbauer

19http://queryviz.com

The Vision in a Nutshell

Query Composition

Query Interpretation

SELECT AFROM RWHERE B not in

(SELECT DFROM S)

SD

RAB

selA

Query Vi-sualization

Query Re-finement

Q Visualization can facilitate Q Composition through(i) faster Q Interpretation and thus Q Re-use, and (ii) a visual understanding of SQL design patterns.

Thus "Databases will visualize queries too"

easyhard

Page 20: Wolfgang Gatterbauer

20

BACKUP

Page 21: Wolfgang Gatterbauer

http://queryviz.com

Query: Aggregates / Group by

21

Course (course-no, title ) Transcript (student-id, course-no, grade)

select t.student-id from Transcript tgroup BY t.student-idhaving COUNT(t.course-no) =

(select COUNT(course-no) from Course).

Query from: G. Graefe, R. Cole. Fast Algorithms for Universal Quantification in Large Databases (TODS 1995)

Q: "Find the students who have taken as many (different) courses as there are courses offered by the university (tuples in the courses relation).”(“…assuming that there are no duplicates in either relation, that all Transcript tuples refer to valid courseno’s,and that there are no “NULL” values…”)

Transcript student-id COUNT(course-no)

select student-id Course

COUNT(course-no)

Page 22: Wolfgang Gatterbauer

http://queryviz.com

Simple disjunctions

22

select R.Afrom R, S, Twhere R.A = S.A or R.A = T.A

SQL1 a1: a2.a3. R(a1) S(a2) T(a2) [ a2=a1 a3=a1 ]Graph a1: R(a1) (a2. [ S(a2) a2=a1 T(a2) a2=a1 ]Graph a1: R(a1) (a2. [ S(a2) a2=a1 ] a3. [ T(a3) a3=a1 ] )

R(A)S(A)T(A)

Query from: H. Garcia-Molina et al. Database systems: the complete book. 2002. p.260

S A

select A

T A

R A

Note: graph does not explain why with empty S relation, the result is empty (unintuitive conceptual SQL evaluation strategy …)

Page 23: Wolfgang Gatterbauer

http://queryviz.com

Human-Computer Interaction

23

Text Visual (graphics)

Sequential

Sequential

Sequential

ParallelInterpret(Read)

Compose(Write)

User ActionCommunication Medium

easyhard

Page 24: Wolfgang Gatterbauer

24

Barriers to Adoption

100%

+12%Kinesis Typing speed*

Time

-58%

* Self-test and test with first-time user: 3 repetitions, 2-minute typing test from http://hi-games.net/typing-test/

(1) Transition with lower productivity (2) Price

Kinesis: ~ 250 $Standard: ~ 50 $

???

Page 25: Wolfgang Gatterbauer

http://queryviz.com

No Barriers to Adoption

25

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

Scores Team Day Runs

select Team Day

Scores Team Day Runs

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

W wid

select wid

W wid

W wid tid

W wid tid

>

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

select distinct a3.fname, a3.lname from Actor a0, Casts c0, Casts c1, Casts c2, Casts c3,

Actor a3where a0.fname = 'Kevin' and a0.lname = 'Bacon' and c0.pid = a0.id and c0.mid = c1.mid and c1.pid = c2.pid and c2.mid = c3.mid and c3.pid = a3.id and not exists (select xc1.pid

from Actor xa0, Casts xc0, Casts xc1where xa0.fname = 'Kevin' and xa0.lname = 'Bacon' and xa0.id = xc0.pid and xc0.mid = xc1.mid and xc1.pid = a3.id)

and not exists (select ya0.idfrom Actor ya0where ya0.fname = 'Kevin' and ya0.lname = 'Bacon’)

Castspidmid

Castspidmid

Castspidmid

Castspidmid

Actor id

fname='Kevin'lname='Bacon'

Castspidmid

Castspidmid

Actorid

fname='Kevin' lname='Bacon'

Actorid

fname='Kevin'lname='Bacon'

Actorid

fnamelname

SELECTfnamelname

(1) Q Visualization does not replace the existingmodel of interaction for Q Composition

(2) free: only enhances the existing way

Page 26: Wolfgang Gatterbauer

http://queryviz.com

Comparison: QGM (Query Graph Model)

26

Inventory(partno, descr)Quotations(partno, suppno, price)

Query: Find suppliers and parts for which the supplier price is less than that of all other suppliers.

Note that automatic attribute node placement can be improved

SchemaQGM Pirahesh et al. [Sigmod’92]

QueryViz

Page 27: Wolfgang Gatterbauer

http://queryviz.com

Comparison: Visual SQL

27

Query: Which students have not yet successfully taken any lecture?

Note that automatic node placement can be improved

SchemaVisual SQLThalheim. [Visual SQL: eine ER-basierte Einfuehrung in die Datenbankprogrammierung Teil I, p. 44, 2003]

QueryViz

Correlated nesting is preserved and needs to be detected by user

Student (MatrNr, Name, Gebdatum)hoert (MatrNr, Semester, KursNr, Note)

select S.Name, S.Gebdatumfrom Student Swhere not exists

(select *from hoert Hwhere S.MatrNr = H.MatrNrand H.Note is not null)

Page 28: Wolfgang Gatterbauer

http://queryviz.com

Find the title of courses, the name of instructors, the gpa and name of students, and the description of comments for courses that are taught by instructors, are taken by students that gave comments, and are offered by departments. Return results only for courses whose term is spring, students whose class is 2011, comments whose rating is greater than 3, and departments whose name is CS.

Comparison: DB Graph

28

Departments(DepID, DepCode, Name) Courses(CourseID, DepID, Title)Instructors(InstrID, Name) Students(SuID, Name, Class, GPA)CourseSched(CourseID, Year, Term, InstrID, TimeSlot)StudentHistory(SuID, CourseID, Year, Term, Grade)Comments(SuID, CourseID, Year, Term, Text, Rating, Date)

select s.Name, s.GPA, c.Title, i.Name, co.Textfrom Students s, Comments co,StudentHistory h, Courses c, Departments d,CourseSched cs, Instructors iwhere s.SuID = co.SuID ands.SuID = h.SuID and h.CourseID = c.CourseID andc.DepID = d.DepID andc.CourseID = cs.CourseID and cs.InstrID = i.InstrID ands.Class = 2011 and co.Rating > 3 andcs.Term = 'spring' and d.Name = 'CS'

Koutrika et al. [ICDE'10]Intermediate Database graph for transforming into NL

QueryViz

Query:

Page 29: Wolfgang Gatterbauer

29

OUT

Page 30: Wolfgang Gatterbauer

http://queryviz.com

Combining succinctness ideas from DRC and TRC

30

select distinct F1.personfrom Frequents F1where not exists (select *from Frequents F2where F2.person = F1.personand not exists (select *from Serves S3, Likes L4where S3.drink = L4.drinkand S3.bar = F2.barand L4.person = F2.person))

Q: Find persons that frequent only bars that serve some drink they like.

Like Datalog (DRC): no aliases needed: Frequents appears twice

Like SQL (TRC): only relevant variables are shown: Price is missing

Natural reading order that corres-ponds to the intended meaning

Likes(person, drink)Frequents(person, bar)Serves(bar, drink, price)

Connected components can represent a nested subquery

Page 31: Wolfgang Gatterbauer

http://queryviz.com

Two bounding box types: for all and not exists

31

select W1. tid, W1.widfrom Worlds W1where W1.wid >= all

(select W2.widfrom Worlds W2where W2.tid = W1.tid)

Worlds(wid, tid)

Q: Worlds and tuples, where tuples do not appear in a later world.

Find worlds and tuples, so that for all worlds that contain the same tid, their wid is smaller or equal to this world.

wid: world IDtid: tuple ID

Note the comparison operator is read: The wid at the beginning of the arrow (on the right) <= wid at the end (on the left)

For all: : double line around relationNot exists: : dashed line around relation

Page 32: Wolfgang Gatterbauer

http://queryviz.com

Alternatives

32

5-22-2009

1. One category can have many products2. One product has only one category.

Source: ?

Page 33: Wolfgang Gatterbauer

http://queryviz.com

Familiar visual constructsatives

33

5-22-2009

Source: ?

Page 34: Wolfgang Gatterbauer

http://queryviz.com

Familiar visual constructs

34

Page 35: Wolfgang Gatterbauer

http://queryviz.com

Alternatives

35

5-22-2009

Source: http://techmania.wordpress.com/2008/06/09/creating-er-diagrams-from-sql/

Page 36: Wolfgang Gatterbauer

http://queryviz.com

Alternatives

36

5-22-2009

Source: http://schemaspy.sourceforge.net/

Page 37: Wolfgang Gatterbauer

http://queryviz.com

Why Query Visualization is different

37

Compare to Browsing through a log of walking directions to various sights in Seattle

Page 38: Wolfgang Gatterbauer

http://queryviz.com

Query Visualization vs. Visual Query Languages

38

easyhard

Lot of past work, see surveyRecent focus in DB

Data Queries

_______________

InformationVisualization

Visual QueryLanguages

QueryVisualization

Target to Visualize

Interpret(Read)

Compose(Write)

User Action

Catarci et al. [J. Vis. Lang. Comput.’97]

Page 39: Wolfgang Gatterbauer

39http://queryviz.com

Two Interactions between Users and Queries

Query Composition

Query Interpretation SELECT AFROM RWHERE B not in

(SELECT DFROM S)

Intent: Find...

SQL

Khoussainova et al. [CIDR’09]

Li et al. [CIDR’11]

Chatzopoulou et al. [SSDBM’09] Howe, Cole [MS eSc WS’10]

Recent work on Query Management:Idea: Re-use and adapt existing queries

Problem:Query Interpretation is hard too!

even used for testing purposes, e.g., on www.gradiance.comCQMS

SQL QuerIESQLshareDBease

Motivation: How can we best facilitate Query Interpretation and thus Query-Reuse?

hardessential for Query Browse and Re-use

Page 40: Wolfgang Gatterbauer

http://queryviz.com

Question: right level of abstraction?

40

Query Plan SQL Query Query Intent

more abstract

Pirahesh et al. [Sigmod’92]Starbust

Jaakkola and B. Thalheim. [ER WS’03]Most VQL such as Visual SQL

Correlated nesting is preserved

Scores Team Day Runs

select Team Day

Scores Team Day Runs

Danaparamita, G [EDBT’11]QueryViz

Note that these approaches don't provde the reverse functionality of query visualization.

Page 41: Wolfgang Gatterbauer

41http://queryviz.com

Summary: The Argument for Query Visualization

Text Visual

Sequential

Sequential

Sequential

ParallelInterpret

Compose

Interpret

Compose

Data Queries

__________

InfoVis

Visual QueryLanguages

QueryVisualization

(1) Existing work on Q Management suggests Q-Browse and Q-Reuse to facilitate Q Composition.

(2) Q-Browse requires fast Q Interpretation by users.

(3) Thesis: Q Visualization can help.(4) Suggestion: QueryViz as one system

(5) Different systems can easily be evaluated and compared.(6) Important: Like InfoVis and unlike Visual

Q Languages, Q Visualization enhances the user experience without replacing the current mode for Q Composition.

Page 42: Wolfgang Gatterbauer

Wolfgang Gatterbauer

http://queryviz.comDatabase groupUniversity of Washington

VLDB'11

Databases will visualize queries too

Page 43: Wolfgang Gatterbauer

http://queryviz.com

Query Visualization vs. Visual Query Languages

Data Queries

_______________

InformationVisualization

Visual QueryLanguages

QueryVisualization

Target to Visualize

43

easyhard

Lot of past work

Recent focus in DB

Page 44: Wolfgang Gatterbauer

http://queryviz.com

Query Visualization vs. Visual Query Languages

Text Visual (graphics)

Sequential

Sequential

Sequential

ParallelInterpret(Read)

Compose(Write)

User Action

Communication Medium

44

easyhard

Page 45: Wolfgang Gatterbauer

45http://queryviz.com

Why users need to interpret queries?

Query Composition

Query InterpretationSELECT AFROM RWHERE B not in

(SELECT DFROM S)

Query Composition is hard

Find...How can we facilitate Query Interpretation?

Query Evaluation

Aabc

SQL Data

CQMS: Khoussainova et al. [CIDR’09]

DBease: Li et al. [CIDR’11]

SQL QuerIE: Chatzopoulou et al. [SSDBM’09]

SQLshare: Howe, Cole [MS eScience WS’10]

Hence recent work on Query ManagementIdea: Re-use and adapt existing queries

Problem:Query Interpretation is hard too

e.g., used for testing purposes on www.gradiance.com

Page 46: Wolfgang Gatterbauer

http://queryviz.com

Query Visualization vs. Visual Query Languages

Text Visual (graphics)

Sequential

Sequential

Sequential

ParallelInterpret(Read)

Compose(Write)

User Action

Communication MediumData Queries

_______________

InformationVisualization

Visual QueryLanguages

QueryVisualization

Target to Visualize

46

easyhard

Page 47: Wolfgang Gatterbauer

47http://queryviz.com

Summary: The Argument for Query Visualization

Text Visual

Sequential

Sequential

Sequential

ParallelInterpret

Compose

Interpret

Compose

Data Queries

__________

InfoVis

Visual QueryLanguages

QueryVisualization

(1) Existing work on Q. Management suggests Q.-Browse and Q.-Reuse to facilitate Q. Composition.

(2) Q.-Browse requires fast Q. Interpretation by users.

(3) Thesis: Q. Visualization can help.(4) Suggestion: QueryViz as one system

(5) Different systems can easily be evaluated and compared.(6) Important: Like InfoVis and unlike Visual

Q. Languages, Q. Visualization enhances the user experience without replacing the current mode for Q. Composition.

Page 48: Wolfgang Gatterbauer

http://queryviz.com

Colors

Text Visual (graphics)

Sequential

Sequential

Sequential

ParallelInterpret(Read)

Compose(Write)

User Action

Communication MediumData Queries

_______________

InformationVisualization

Visual QueryLanguages

QueryVisualization

Target to Visualize

48

LightGreen RGB: 144 238 144LightCoral RGB: 240 128 128

Page 49: Wolfgang Gatterbauer

http://queryviz.com

Query Visualization vs. Visual Query Languages

Text Visual (graphics)

Sequential

Sequential

Sequential

ParallelInterpret(Read)

Compose(Write)

User Action

Communication MediumData Queries

_______________

InformationVisualization

Visual QueryLanguages

QueryVisualization

Target to Visualize

49

Page 50: Wolfgang Gatterbauer

http://queryviz.com

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

select distinct a3.fname, a3.lname from Actor a0, Casts c0, Casts c1, Casts c2, Casts c3,

Actor a3where a0.fname = 'Kevin' and a0.lname = 'Bacon' and c0.pid = a0.id and c0.mid = c1.mid and c1.pid = c2.pid and c2.mid = c3.mid and c3.pid = a3.id and not exists (select xc1.pid

from Actor xa0, Casts xc0, Casts xc1where xa0.fname = 'Kevin' and xa0.lname = 'Bacon' and xa0.id = xc0.pid and xc0.mid = xc1.mid and xc1.pid = a3.id)

and not exists (select ya0.idfrom Actor ya0where ya0.fname = 'Kevin' and ya0.lname = 'Bacon’)

Query Browse with Query Visualization

Query Browse without Query Visualization

50

Page 51: Wolfgang Gatterbauer

51http://queryviz.com

Queries and Users

Query Composition

Query InterpretationSELECT AFROM RWHERE B not in

(SELECT DFROM S)

Argument for default-all: If annotations are on domain values, then retrievingall annotations are relevant.

Counter-Argument: But then these anno-tations can be modeled in a separate table as normalized tables.

Minimal propagation (αpm)Default-all propagation (αp

d)

Note minimal propagation is not equivalent to just evaluating the where-provenance for the query: Q7(x,y):-Ra(x,y),Ra(y,_). E.g. αp(t5,B,Q7) = {e,f,g}

Find...

Motivation of this talk:How can we facilitate Query Interpretation?

Query Evaluation

Aabc

SQL Data

Page 52: Wolfgang Gatterbauer

52http://queryviz.com

Queries and Users

Query Composition

Query Interpretation SELECT AFROM RWHERE B not in

(SELECT DFROM S)

Argument for default-all: If annotations are on domain values, then retrievingall annotations are relevant.

Counter-Argument: But then these anno-tations can be modeled in a separate table as normalized tables.

Minimal propagation (αpm)Default-all propagation (αp

d)

Note minimal propagation is not equivalent to just evaluating the where-provenance for the query: Q7(x,y):-Ra(x,y),Ra(y,_). E.g. αp(t5,B,Q7) = {e,f,g}

Find...

Page 53: Wolfgang Gatterbauer

http://queryviz.com

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

Query Browse with Query Visualization

Query Browse with Query Visualization

53

Page 54: Wolfgang Gatterbauer

54http://queryviz.com

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

Scores Team Day Runs

select Team Day

Scores Team Day Runs

Query Browse with Query Visualization

Query Browse with Query Visualization

Page 55: Wolfgang Gatterbauer

55http://queryviz.com

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

Scores Team Day Runs

select Team Day

Scores Team Day Runs

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

Query Browse with Query Visualization

Query Browse with Query Visualization

Page 56: Wolfgang Gatterbauer

56http://queryviz.com

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

Scores Team Day Runs

select Team Day

Scores Team Day Runs

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

Query Browse with Query Visualization

W wid

select wid

W wid

W wid tid

W wid tid

>

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

Query Browse with Query Visualization

Page 57: Wolfgang Gatterbauer

57http://queryviz.com

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

Scores Team Day Runs

select Team Day

Scores Team Day Runs

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

Query Browse with Query Visualization

W wid

select wid

W wid

W wid tid

W wid tid

>

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

select distinct a3.fname, a3.lname from Actor a0, Casts c0, Casts c1, Casts c2, Casts c3,

Actor a3where a0.fname = 'Kevin' and a0.lname = 'Bacon' and c0.pid = a0.id and c0.mid = c1.mid and c1.pid = c2.pid and c2.mid = c3.mid and c3.pid = a3.id and not exists (select xc1.pid

from Actor xa0, Casts xc0, Casts xc1where xa0.fname = 'Kevin' and xa0.lname = 'Bacon' and xa0.id = xc0.pid and xc0.mid = xc1.mid and xc1.pid = a3.id)

and not exists (select ya0.idfrom Actor ya0where ya0.fname = 'Kevin' and ya0.lname = 'Bacon’)

Castspidmid

Castspidmid

Castspidmid

Castspidmid

Actor id

fname='Kevin'lname='Bacon'

Castspidmid

Castspidmid

Actorid

fname='Kevin' lname='Bacon'

Actorid

fname='Kevin'lname='Bacon'

Actorid

fnamelname

SELECTfnamelname

Query Browse with Query Visualization

Page 58: Wolfgang Gatterbauer

Databases will visualize

queries tooWolfgang Gatterbauer

http://queryviz.comDatabase groupUniversity of Washington

Version Aug 27, 2011

(VLDB'11)

Page 59: Wolfgang Gatterbauer

59

Increased workload

3 Increased workload**

3

CONTROLLING NUMBER OF STUDENTS IN CLASS*

Active selection by lecturer

Self-selection by students

Other processes(non-deterministic)

Previousachievements

1

Qualification exam or interview

2

Enforced grading

4

First come, first serve

5

Small visibility of announcement

6

• Numerus clausus

Examples

• Bucerius Law School in Hamburg• US universities (SAT tests)

• Proseminar Web Information Extraction WS 2005/06

• WU-Wien; Same lecture, 2 Professors

• Kärntner Approch in Alpbach

• Medizin Wien WS 05/06• Some courses at TU-Wien, WU-Wien• ? USI Graz (Hanggliding course) ?

* Assuming capacity constraints that cannot be removed** Mainly content related workload, but also increased administrative efforts, such as inconvenient lecture timesSource: Wolfgang

Actual optionsWho decides about people taking the class?

Control

Choice ProseminarWIE WS 2005/06

Auctions or similar processes

7 • “3er Vorschlag” WU-Wien

Page 60: Wolfgang Gatterbauer

http://queryviz.com

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

Query Browse with Query Visualization

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

select distinct a3.fname, a3.lname from Actor a0, Casts c0, Casts c1, Casts c2, Casts c3,

Actor a3where a0.fname = 'Kevin' and a0.lname = 'Bacon' and c0.pid = a0.id and c0.mid = c1.mid and c1.pid = c2.pid and c2.mid = c3.mid and c3.pid = a3.id and not exists (select xc1.pid

from Actor xa0, Casts xc0, Casts xc1where xa0.fname = 'Kevin' and xa0.lname = 'Bacon' and xa0.id = xc0.pid and xc0.mid = xc1.mid and xc1.pid = a3.id)

and not exists (select ya0.idfrom Actor ya0where ya0.fname = 'Kevin' and ya0.lname = 'Bacon’)

Page 61: Wolfgang Gatterbauer

http://queryviz.com

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

Scores Team Day Runs

select Team Day

Scores Team Day Runs

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person = F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

Query Browse with Query Visualization

W wid

select wid

W wid

W wid tid

W wid tid

>

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

select distinct a3.fname, a3.lname from Actor a0, Casts c0, Casts c1, Casts c2, Casts c3,

Actor a3where a0.fname = 'Kevin' and a0.lname = 'Bacon' and c0.pid = a0.id and c0.mid = c1.mid and c1.pid = c2.pid and c2.mid = c3.mid and c3.pid = a3.id and not exists (select xc1.pid

from Actor xa0, Casts xc0, Casts xc1where xa0.fname = 'Kevin' and xa0.lname = 'Bacon' and xa0.id = xc0.pid and xc0.mid = xc1.mid and xc1.pid = a3.id)

and not exists (select ya0.idfrom Actor ya0where ya0.fname = 'Kevin' and ya0.lname = 'Bacon’)

Castspidmid

Castspidmid

Castspidmid

Castspidmid

Actor id

fname='Kevin'lname='Bacon'

Castspidmid

Castspidmid

Actorid

fname='Kevin' lname='Bacon'

Actorid

fname='Kevin'lname='Bacon'

Actorid

fnamelname

SELECTfnamelname

Page 62: Wolfgang Gatterbauer

http://queryviz.com

QueryViz: helping users understand SQL queries• Focus: Usability of Databases

– How to help users re-using existing queries

• Query Visualization: QueryViz– minimal yet expressive visual vocabulary– combines succinctness ideas from SQL and Datalog– Online interface at http://QueryViz.com

DG [EDBT’11 (demo)] Khoussainova et al. [CIDR’09]

Jagadish et al. [Sigmod’07]

select distinct a3.fname, a3.lname from Actor a0, Casts c0, Casts c1, Casts c2, Casts c3, Actor a3where a0.fname = 'Kevin' and a0.lname = 'Bacon' and c0.pid = a0.id and c0.mid = c1.mid and c1.pid = c2.pid and c2.mid = c3.mid and c3.pid = a3.id and not exists

(select xc1.pid from Actor xa0, Casts xc0, Casts xc1where xa0.fname = 'Kevin' and xa0.lname = 'Bacon' and xa0.id = xc0.pid and xc0.mid = xc1.mid and xc1.pid = a3.id)

and not exists(select ya0.idfrom Actor ya0where ya0.fname = 'Kevin' and ya0.lname = 'Bacon’)

Castspidmid

Castspidmid

Castspidmid

Castspidmid

Actor id

fname='Kevin'lname='Bacon'

Castspidmid

Castspidmid

Actorid

fname='Kevin' lname='Bacon'

Actorid

fname='Kevin'lname='Bacon'

Actorid

fnamelname

SELECT

fnamelname

select W1.widfrom Worlds W1where not exists

(select *from Worlds W2where W2.wid < W1.widand not exists

(select *from Worlds W3where W3.wid = W1.widand not exists

(select *from Worlds W4where W4.wid = W2.widand W4.tid = W3.tid)))

W wid

select wid

W wid

W wid tid

W wid tid

>

select F1.personfrom Frequents F1where not exists

(select F2.barfrom Frequents F2where F2.person = F1.personand not exists

(select S3.drinkfrom Serves S3, Likes L4where L4.person =

F1.personand L4.drink = S3.drinkand S3.bar = F2.bar))

Frequents person

select person

Frequents person bar

Serves bar drink

Likes person drink

select Team, Dayfrom Scores S1where not exists (select *

from Scores S2where S1.Runs = S2.Runsand (S1.Team <> S2.Team or S1.Day <> S2.Day))

Scores

Team

Day

Runs

select

Team

Day

Scores

Team

Day

Runs

select S.snamefrom Sailors Swhere not exists

(select B.bidfrom Boats Bwhere not exists

(select R.bidfrom Reserves Rwhere R.bid = B.bidand R.sid = S.sid))

Sailors name sid

select name

Boats bid

Reserves bid sid

• Proposed a Collaborative Query Management System

Page 63: Wolfgang Gatterbauer

http://queryviz.com

Query Visualization vs. Visual Query Languages

63

Text Visual (graphics)

Sequential

Sequential

Sequential

ParallelInterpret(Read)

Compose(Write)

User Action

Communication MediumData Queries

_______________

InformationVisualization

Visual QueryLanguages

QueryVisualization

Target to Visualize

Interpret(Read)

Compose(Write)

User Action

Page 64: Wolfgang Gatterbauer

http://queryviz.com 64

Fig_MatrixDataQuery

Data Queries

_______________

InformationVisualization

Visual QueryLanguages

QueryVisualization

Target to Visualize

4-1-2011

Interpret(Read)

Compose(Write)

User Action

Page 65: Wolfgang Gatterbauer

http://queryviz.com

Query Visualization vs. Visual Query Languages

65

Text Visual (graphics)

Sequential

Sequential

Sequential

ParallelInterpret(Read)

Compose(Write)

User Action

Communication Medium