overview’of’data’ exploration’techniques’olga/publications/tutorial-sigmod15... ·...

86
OVERVIEW OF DATA EXPLORATION TECHNIQUES Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri SIGMOD 2015, Melbourne

Upload: others

Post on 22-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

OVERVIEW  OF  DATA  EXPLORATION  TECHNIQUES  

Stratos  Idreos,  Olga  Papaemmanouil,  Surajit  Chaudhuri  SIGMOD  2015,  Melbourne  

Page 2: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

USER  INTERACTION  

Page 3: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

DB  

User  Interface  Layer  

visualize    results  

express    interests  

assisted  query  formulaSon    

collaborate  

annotate  query/results  recommendaSons  

Page 4: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

DB  

User  Interface  Layer  

Page 5: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

DB  

User  Interface  Layer  

Data  Visualiza=on  

Page 6: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

DB  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

Page 7: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    visualizaSon  

visualiza=on  tools  

automa=c  visualiza=on  

visual  op=miza=ons  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

DB  

Page 8: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    visualizaSon  

visualiza=on  tools  

automa=c  visualiza=on  

visual  op=miza=ons  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

DB  

Page 9: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Back  in  1982…  

window-­‐based  “sophis=cated”    browser  for  rela=onal  DBs  

TIMBER  

(1,1)   (M,1)  

(1,i)  ith  tuple  goes  here  

DB  

browser  for  mulSple  relaSons/tuples    rich  query  language  for  icon-­‐oriented  DBs    visual  editor  of  text  objects    browser  for  geographical  data  

TIMBER,  VLDB’82    

Page 10: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

user-­‐driven    visualizaSons  

data    cubes  

Polaris  

DB  

back-­‐end    queries  

visual    specs  

Polaris,  INFOVIS  ‘02    

Page 11: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

user-­‐driven    visualizaSons  

data    cubes  

Polaris  

back-­‐end  queries:  data  selecSon,  parSSon  into  panes  

visual    specificaSons  

back-­‐end    queries  

specificaSons  (abributes)  

DB  

Polaris,  INFOVIS  2002    

Page 12: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

user-­‐driven    visualizaSons  

data    cubes  

Polaris  

back-­‐end  queries:  data  transformaSons    (group,  sort,  aggregate  within  each  pane)    

visual    specificaSons  

back-­‐end    queries  

transformaSons  (group  by,  sort)  

DB  

Polaris,  INFOVIS  2002    

Page 13: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    cubes  

Polaris  

visual    specificaSons  

back-­‐end    queries  

mappings  (shape,  size,  color)  

back-­‐end  queries:  graphical  transformaSons  (renter  and  visualize)  

DB  

user-­‐driven    visualizaSons  

Polaris,  INFOVIS  2002    

Page 14: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

collaboraSve    exploraSon  

Sky  View    

live  annotaSons  

exploraSon  for    sky  objects/paberns  

AstroShelf,  SIGMOD  ‘12  

Page 15: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Sky  View    

Live  Annota=ons  

subscripSons  to    interesSng  objects  

exploraSon  for    sky  objects/paberns  

AstroShelf,  SIGMOD  ‘12  

collaboraSve    exploraSon  

Page 16: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Sky  View    

Live  Annota=ons  

stream  based    noSficaSons  

collaboraSve    exploraSon  

exploraSon  for    sky  objects/paberns  

AstroShelf,  SIGMOD  ‘12  

Page 17: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    visualizaSon  

visualiza=on  tools  

automa=c  visualiza=on  

visual  op=miza=ons  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

DB  

Page 18: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

automaSc  visualizaSon  

review  views    

request  views    

manual,  repeSSve  exploraSon  for  best  visualizaSon(s)    

 

interesSng?    insigheul?  

User  Interface  Layer  

Data  Visualiza=on  

DB  

Page 19: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

filter  across  charts,  recommend,  rank  

search,  select,  promote,  discard,  

save,  share  VizDeck  

DB  

model  “good”    charts  

saved  decks/replay  logs  

auto-­‐ranked    visualizaSons  

VizDeck,  SIGMOD  ‘12  

Page 20: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

automaSc    visualizaSons  

aggregaSons/  single-­‐abribute    

group-­‐by    

sales  over  Sme  %  sales/  region    user    query  

Q1    Q2  …  Qn  

uSlity   DB  

informa=ve  queries   visualizaSon    

engine  high  deviaSon  

from  overall  dataset  

SeeDB,    PVLDB‘13  

Page 21: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    visualizaSon  

visualiza=on  tools  

automa=c  visualiza=on  

visual  op=miza=ons  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

DB  

Page 22: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

resoluSon    reducSon  

SciDB  

user    query  

query    results    

expensive,  ineffecSve  on  big  data  sets  

Visualiza=on  

Scalar,    Big  Data  Vis  ‘13  

Page 23: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

resoluSon    reducSon  

SciDB  

user    query  

query    results    

Visualiza=on  

user    query  

reduced  results    

modified  query  plans  filter/aggregate/sample    

at  given  resoluSon  

Visualiza=on  

Data    Reduc=on  

SciDB  

Scalar,    Big  Data  Vis  ‘13  

Page 24: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

approximate    visualizaSons  

DB  

Visualiza=on  

SELECT  X,  AVG(Y)  FROM  R(X,Y)  GROUP  BY  X    

reduced  results    

original    chart  

approximate    chart  

same  group    ordering  

user    query  

Sampling  

Blais  et  al,  PVLDB  ‘15  

Page 25: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

approximate    visualizaSons  

DB  

Visualiza=on  

SELECT  X,  AVG(Y)  FROM  R(X,Y)  GROUP  BY  X    

reduced  results    

approximate    chart  

user    query  

Sampling  

clear  ordering  less  samples  

Blais  et  al,  PVLDB  ‘15  

Page 26: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

approximate    visualizaSons  

DB  

Visualiza=on  

SELECT  X,  AVG(Y)  FROM  R(X,Y)  GROUP  BY  X    

reduced  results    

approximate    chart  

user    query  

Sampling  

correct  order?  sample  more    

min  #  samples    for  correct  order?  

Blais  et  al,  PVLDB  ‘15  

Page 27: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

approximate    visualizaSons  

DB  

Visualiza=on  

SELECT  X,  AVG(Y)  FROM  R(X,Y)  GROUP  BY  X    

reduced  results    

approximate    chart  

user    query  

#samples   Group  1   Group  2   Group  3   Group  4  

1   [60,90]   [20,50]   [10,40]   [40,70]  

20   [64,84]   [30,48]   [15,35]   [45,65]  

21   [66,84],  I   [30,48]   [17,35]   [46,64]  

70   [66,84],  I   [40,47]   [17,32],  I   [46,53]  

sampling    phases/    confidence  intervals  

Sampling  

Blais  et  al,  PVLDB  ‘15  

Page 28: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

visualizaSon    management    

DB  

user    query  

query    results    

Visualiza=on  

overlapping  user  queries  

replicated  db  opera=ons  

memory  opera=ons  on  big  data  

Ermac,  PVLDB  ‘14  

Page 29: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

visualizaSon    management    

DB  

user    query  

query    results    

Visualiza=on  

DVMS  

visual    specifica=ons    

logical  visual  plans  è  physical  query  plans  

transforma=ons  to  pixel  space  visual  op=miza=ons   reduced    

rendering  =me  

Ermac,  PVLDB  ‘14  

Page 30: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

exploraSon  interfaces  

automa=c  explora=on  

novel  query  interfaces  

assisted  query  formula=on  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

DB  

Page 31: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

exploraSon  interfaces  

automa=c  explora=on  

novel  query  interfaces  

assisted  query  formula=on  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

DB  

Page 32: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

manual  vs  automaSc    data  exploraSon  

SQL  query    formulaSon  

result    review  

predicate  adjustment  

query    execuSon  

long,  imprecise,    labor-­‐intensive  process  

DB  manual  

Page 33: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

manual  vs  automaSc    data  exploraSon  

SQL  query    formulaSon  

result    review  

predicate  adjustment  

query    execuSon  

long,  imprecise,    labor-­‐intensive  process  

DB  

capture  user  interests  

op=mize  query  execu=on  

reduce    user  effort  

recommend  data/queries  

manual  

auto  

Page 34: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

manual  vs  automaSc    data  exploraSon  

SQL  query    formulaSon  

result    review  

predicate  adjustment  

query    execuSon  

long,  imprecise,    labor-­‐intensive  process  

DB  

capture  user  interests  

op=mize  query  execu=on  

reduce    user  effort  

recommend  data/queries  

manual  

auto  

Page 35: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

explore    by  example  

effecSveness  vs  efficiency  sampling  areas?  sampling  size?  

DB  

User  Model   Samples  Space    

Explora=on  decision    

tree  classifier  

relevant  

irrelevant  

sample  extrac=on  

AIDE,  SIGMOD’14/  VLDB’15  

Page 36: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

explore    by  example  

Abrib

ute  B  

Abribute  A  

relevant  areas  to  predict  

AIDE,  SIGMOD’14/  VLDB’15  

Page 37: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Abrib

ute  B  

Abribute  A  

x  

x   x  

x  x  

x  x  

x  x  

√  

√  

explore    by  example  

√  

uniform  sampling  across  domain  

AIDE,  SIGMOD’14/  VLDB’15  

Page 38: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

√  

√  

discover    relevant  area  

√  √  

x  √  

Abrib

ute  B  

Abribute  A  

√  

explore    by  example  

predicted    relevant  area  

sampling  around  relevant  objects  

AIDE,  SIGMOD’14/  VLDB’15  

Page 39: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

√  √  √  

x  √  

Abrib

ute  B  

Abribute  A  

√  √  

refined  predicted    relevant  areas  

explore    by  example   sampling  around  boundaries  

AIDE,  SIGMOD’14/  VLDB’15  

Page 40: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

result    recommendaSons  

query  

DB   addi=onal    results  

interes=ng  queries  

DB  

results  YMALDB  

YMALDB,  VLDBJ  ’13  

Page 41: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

result    recommendaSons  

query  

DB   addi=onal    results  

interes=ng  queries  

DB  

results  YMALDB  

rank  faSets  

freq(result)/  freq(DB)  

query   extract    query  faSets  

selecSon  predicates  based  on  original  query  

expand    adributes  

add  abributes  from  table  schema  

top-­‐k  queries  

YMALDB,  VLDBJ  ’13  

Page 42: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

result    recommendaSons  

query  

DB   addi=onal    results  

interes=ng  queries  

DB  

results  

rank  faSets  

freq(result)/  freq(DB)  

YMALDB  

query   extract    query  faSets  

selecSon  predicates  based  on  original  query  

expand    adributes  

add  abributes  from  table  schema  

top-­‐k  queries  

!tle,  year,  genre  of  Scorsese  movies  

!tle,  year,  genre,  country    of  Scorsese  movies  

many  Scorsese  movies  are  related  to  Italy  

+   =  

YMALDB,  VLDBJ  ’13  

Page 43: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

exploraSon  interfaces  

automa=c  explora=on  

novel  query  interfaces  

assisted  query  formula=on  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

DB  

Page 44: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

keyword-­‐based    query  suggesSons  

SQL  query  (tedious)    

keywords  (intuiSve)  DB  

relevant  &  irrelevant  data  

relevant  data  

keyword  search  DB  

relevant  data  

how  we  can  discover  relevant  queries?    

SQLSUGG,  ICDE’11  

Page 45: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Template    Repository   “database  gray”  

template  on  Stle/authors?  template  on  Stle?  

suggested      queries  

Template  Matcher  

SQL    Query  

Generator  

keywords   Sample  Results/  

Visualiza=on  

ranked  

templates  

keyword-­‐based  query  suggesSons  

SQLSUGG,  ICDE’11  

Page 46: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

suggested      queries  

Template  Matcher  

SQL    Query  

Generator  

keywords   Sample  Results/  

Visualiza=on  

ranked  

templates  

Paper  

=tle   year  

Paper  

=tle   year  

id=p_id  

Template  1   Template  2  

Author  Template    Repository  

template  generaSon  

keyword-­‐based    query  suggesSons  

SQLSUGG,  ICDE’11  

Page 47: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

 template  relevance  =  f  (en=ty  relevance  &  importance)  

suggested      queries  

Template  Matcher  

SQL    Query  

Generator  

keywords   Sample  Results/  

Visualiza=on  

ranked  

templates  

Template    Repository  

enSty  relevance      èkeyword  frequency  in  enSty    enSty  importanceè  importance  of    data  nodes                                                                

relevant    template?    

keyword-­‐based    query  suggesSons  

SQLSUGG,  ICDE’11  

Page 48: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

equi-­‐join    inference  

inference  algorithm  

informa=ve  

tuple  

Cartesian  

product  

goal  join  

predicate  

minimize  user  effort  discover  all  posi=ves  eliminate  all  nega=ves  

goal  predicate:    

table  A  A1   A2   B1   B2  

table  B  sample  

BonifaS  et  al,  EDBT’14  

Page 49: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

equi-­‐join    inference  

inference  algorithm  

informa=ve  

tuple  

Cartesian  

product  

table  A   table  B  goal  join  

predicate  

A1   A2   B1   B2  

(A1,  B1)   (A1,  B2)   (A2,  B1)  

(A1,  B1)  (A1,  B2)  

(A1,  B1)  (A2,  B1)  

(A1,  B1)  (A2,  B2)  

prune  predicates  with    uninformaSve  tuples    label  tuple  that  prunes  as    many  predicates  as  possible        

sample  

candidate  predicates  

BonifaS  et  al,  EDBT’14  

Page 50: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

graphical  query    specificaSon  

answers  

non-­‐answers  

result  visualizaSon  

DataPlay,  PVLDB  ’13  

Page 51: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

graphical  query    specificaSon  

answers  

non-­‐answers  

pivot    relaSon  

query  /visualizaSon  recommendaSons  

add,  remove  query  constraints  

result  visualizaSon  

seman=c  query  tuning    by  local  syntac=c  modifica=ons  

DataPlay,  PVLDB  ’13  

Page 52: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

graphical  query    specificaSon  

answers  

non-­‐answers  

result  visualizaSon  

search  limited  to    local  modifica=ons  

pivot    relaSon  

query    correcSons  

add,  remove  results  

DataPlay,  PVLDB  ’13  

Page 53: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

query    recommendaSons  

query  Charles  DB  

results   queries    selected  query  

Charles,  CIDR  ’13  

Page 54: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

query    recommendaSons  

query  Charles  DB  

results   queries    selected  query  

<5                >5  

 <5                >5  <20        <30  <5              >5  >20        >30    

weight   weight,  height  different  data  parSSons  

quality:  simplicity,  breadth,  balance  

Charles,  CIDR  ’13  

Page 55: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

query    refinement  

Merlin  DB  condi=onal  

 query  

select  species  from  birds  where  color=  {red:  80%,  blue:  20%}  

ranked  results  by    match  probability  

sensi=vity  of  user  predicates  

query  refinements  w/    quality  improvement  

rank   species  

1   Bluebird  

2   Blue  Jay  

adr   sensi=vity  

color   18.6  

adr   quality  score  

size   83.3  

legcolor   57.1  

remaining    adributes  

result  quality  if    added  in  the  query  

impact  on  ranking  

Merlin,  ICDE  ’14  

Page 56: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

exploraSon  interfaces  

automa=c  explora=on  

novel  query  interfaces  

assisted  query  formula=on  

User  Interface  Layer  

Data  Visualiza=on  

Explora=on    Interface  

DB  

Page 57: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

no-­‐keyboard  interfaces  

gesture  recogniSon  

query    context  

query  intend  

 query  space    

search    pabern    

 

query  template  

GestureDB,  CIDR  ’13  

Page 58: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

no-­‐keyboard  interfaces  

DB  

touch  recogniSon  

gesture    recogniSon  

map  touch  to  operators      

novel    database  kernel  

touch  input  

quick  response  

dbTouch,  CIDR  ’13  

Page 59: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

MIDDLEWARE  TECHNIQUES    

InteracSve  ExploraSon  through    Data  Prefetching  &  Query  ApproximaSon    

Page 60: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

SQL  query    formulaSon  

result    review  

predicate  adjustment  

query    execuSon  

interacSve  data  exploraSon  

ad-­‐hoc,  non-­‐op=mized,  labor-­‐intensive  process    

interac=ve:  small  latency  bounds  on  user  wait  =me  

DB  

Page 61: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

middleware    opSmizaSons  

query  

results  middleware  

prefetching  

query  approxima=on  

DB  

specula=ve  query  execu=on  

result    reuse  

structure-­‐aware  prefetching  

online  processing  

sample-­‐based    processing  

Page 62: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

sample-­‐based  processing  

approximate    results    

query     sampling  

•  accuracy  vs  response  Smes  •  sample  construcSon  &  selecSon  •  error  approximaSon  

Samples   DB  

Page 63: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

off-­‐line  data  synopses  

approximate  results  +  confidence  bounds      

query  Synopses  

DB  

Aqua    

transformed    query  

join  synopses:  sample  disSnguished  joins  

congressional  samples:  biased  sampling  for  group-­‐by  queries  

incremental  maintenance:  equi-­‐depth  &  compressed  histograms  

samples  histograms  

Aqua,  SIGMOD  ’99  

Page 64: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

offline  sampling  

 on  frequent  columns  sets  

DB  

disk   in-­‐memory  

samples  across    1000s  machines  

select  avg(sessionTime)  FROM  table  WHERE  city=“SF”  WITHIN  1  SEC  

online  sample  selecSon  

Results  190+/-­‐  5.89  (95%  confidence)  

parallel  query    execu=on  on  mul=ple  samples  across    

mul=ple  machines  

BlinkDB,  EuroSys  ’13  

online  sampling    selecSon  

Page 65: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data  impressions  

approximate    results  

DB  Level  1  

query  &  =me/error      bounds    

adapSve  sampling  to    exploraSon  focus  

impressions    during  data  loading  

 

mulS  layer  sampling  and    processing  to    meet  user  bounds  

Level  2  

Level  3  

SciBORG,  CIDR  ’11  

Page 66: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

middleware    opSmizaSons  

query  

results  middleware  

prefetching  

query  approxima=on  

DB  

specula=ve  query  execu=on  

result    reuse  

structure-­‐aware  prefetching  

online  processing  

sample-­‐based    processing  

Page 67: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

speculaSve    query  execuSon  

1.   predict  follow-­‐up  queries  2.   execute  queries  3.   cache  results  

DB  

Query  Formula=on  user  wait  =me  

Query  Execu=on  

Result  Review  

=me  

Page 68: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

speculaSve    query  execuSon  

1.   predict  follow-­‐up  queries  2.   execute  queries  3.   cache  results  

DB  

Query  Formula=on  user  wait  =me  

Query  Execu=on  

Result  Review  

=me  

Page 69: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

speculaSve    query  execuSon  

1.   predict  follow-­‐up  queries  2.   execute  queries  3.   cache  results  

DB  

Query  Formula=on  user  wait  =me  

Query  Execu=on  

Result  Review  

=me  

explora=on  space  reduc=on  

query  enumera=on  

query    ranking  

Page 70: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

cube  exploraSon  

SELECT  AVG  (iops)  FROM  events  WHERE  month=“m1”  AND  week=“w1”      GROUP  BY  zone  

location

itme

month

zone center rack

week

hour

user  query  

explora=on  space  reduc=on  

DICE,  ICDE  ’14  

Page 71: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

cube  exploraSon  

SELECT  AVG  (iops)  FROM  events  WHERE  month=“m1”  AND  week=“w1”      GROUP  BY  zone  

WHERE  month=“m1”        

WHERE  month=“m1”  AND  week=“w1    AND  hour=“h1”        

WHERE  month=“m1”    AND  week=“w2”    

parent  

child  

sibling  

user  query  

explora=on  space  reduc=on  

 cube  explora=on  operators  

location

itme

month

zone center rack

week

hour

DICE,  ICDE  ’14  

Page 72: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Q(month=“m1”)  

Q(month  =  “m12”)  

Q(hour  =“h1”)  

Q(hour  =“  h24”)  

Q(week=“w2”)  

Q(week=“w3”)  

specula=ve  queries    

…  

…  

…  

cube  exploraSon  explora=on  

space  reduc=on  query  

enumera=on  

SELECT  AVG  (iops)  FROM  events  WHERE  month=“m1”  AND  week=“w1”      GROUP  BY  zone  

user  query  

location

itme

month

zone center rack

week

hour

DICE,  ICDE  ’14  

Page 73: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Q(month=“m1”)  

Q(month  =  “m12”)  

Q(hour  =“h1”)  

Q(hour  =“  h24”)  

Q(week=“w2”)  

Q(week=“w3”)  

specula=ve  queries    

…  

…  

…  

cube  exploraSon  explora=on  

space  reduc=on  query  

enumera=on  

SELECT  AVG  (iops)  FROM  events  WHERE  month=“m1”  AND  week=“w1”      GROUP  BY  zone  

user  query  

query    ranking  

location

itme

month

zone center rack

week

hour

DICE,  ICDE  ’14  

Page 74: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Query  Formula=on  user  wait  =me,  t  

Result  Review  

DB  

Specula=ve  Execu=on  

Query  Execu=on  

74  

=me  

cube  exploraSon  

DICE,  ICDE  ’14  

Page 75: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

Query  Formula=on  user  wait  =me,  t  

Result  Review  

DB  

Specula=ve  Execu=on  

Query  Execu=on  

75  

QUERY   Probability   Exec  Time  Q1   0.3   22  

Q2   0.25   20  

Q3   0.25   35  

Q4   0.15   70  

Q5   0.05   35  

maximize  query  probability  total  speculaSon  Sme  <  t  

=me  

cube  exploraSon  

DICE,  ICDE  ’14  

Page 76: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

result  reuse  

DB  

Query  1  

Execu=on  

Query  2  

•  idenSfy  (likely)  overlapping  results  •  cache  them  •  reduce  query  execuSon  Sme  (user  wait  Sme)  

Query  3  

Execu=on  

prefetching    window   prefetching    window  

=me  

Execu=on  

Page 77: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

semanSc  windows  user-­‐defined    

window  properSes    

SW2  

SW3  

SW4  

SW1  

2D  exploraSon  space  

overlapping    results/windows  

window  prefetching    

which  order?  

Kalinin  et  al,  SIGMOD  ’14  

Page 78: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

semanSc  windows  

 uSlity-­‐based  result  ranking  

&    result  prefetching  

   

SW2  

SW3  

SW4  

SW1  

2D  exploraSon  space  

user-­‐defined    window  properSes    

overlapping    results/windows  

Kalinin  et  al,  SIGMOD  ’14  

Page 79: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

semanSc  windows  

 online  performance  vs    

query  compleSon  Sme  

SW1  

extend  &  prefetch  

adjust  prefetching  size  to  output  progress  

 

SW2  

Kalinin  et  al,  SIGMOD  ’14  

Page 80: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    diversificaSon  

query  

DB  

diversified  results     k  representa=ve  tuples  with  

max  total    pairwise  distance  

Page 81: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    diversificaSon  

d(T1,  T3)  T1  T2  T3  T4  T5  

random    tuple  

d  (T2,  T3)  

d(T4,  T3)  

d(T5,  T3)  

Query    Output  

Diversified  Output  k=  3  

Max  Diversified    Set  Search    

query  

DB  

diversified  results     k  representa=ve  tuples  with  

max  total    pairwise  distance  

Page 82: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    diversificaSon  

d(T1,  T3)  T1  T2  T3  T4  T5  

random    tuple  

d  (T2,  T3)  

d(T4,  T3)  

d(T5,  T3)  

T1  T2  T3  T4  T5  

d  (T2,  T1)+  d(T2,T3)  

d  (T4,  T1)+d(T4,  T3)  

d  (T5,  T1)+d(T5,  T3)  

Query    Output  

Diversified  Output  k=  3  

Max  Diversified    Set  Search    

query  

DB  

diversified  results     k  representa=ve  tuples  with  

max  total    pairwise  distance  

Page 83: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

data    diversificaSon  

d(T1,  T3)  T1  T2  T3  T4  T5  

random    tuple  

d  (T2,  T3)  

d(T4,  T3)  

d(T5,  T3)  

T1  T2  T3  T4  T5  

d  (T2,  T1)+  d(T2,T1)  

d  (T4,  T1)+d(T4,  T3)  

d  (T5,  T1)+d(T5,  T3)  

T1  T2  T3  T4  T5  

Query    Output  

Diversified  Output  k=  3  

Max  Diversified    Set  Search    

query  

DB  

diversified  results     k  representa=ve  tuples  with  

max  total    pairwise  distance  

Page 84: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

interacSve  data    diversificaSon  

Q1  

Q2  

Q3  w  

w  

w   w  ¢  

¢  ¢  

¢  ¢  

¢  

¢  

w  

w  

¢  

   overlapping        diversified  results  

     long  Time-­‐To-­‐Insight  

¢   ¢   ¢  

¢  

¢  ¢  ¢  

¢   ¢  ¢  

¢  

¢  

¢  

w  

w  

w  

w  

w  

w  w  

•  cache  diversified  results  and  use  most  promising  •  regression  model  predicts  max  diversificaSon  of  a  set    

DivIDE,  SSDBM’14  

Page 85: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

DB  query    

Cached  Diversified  Results  

reusable  results  

query  results  

divide    search  space  

reusable    diversified    results  

new    query    results  

model    based  output    selec=on  

diversified  results    

•  search  space  pruning  through  regression  model  •  best/first  fit  search  for  max  total  diversificaSon  

among  cached  and  new  results  

DivIDE,  SSDBM’14  

interacSve  data    diversificaSon  

Page 86: OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’olga/publications/tutorial-sigmod15... · OVERVIEW’OF’DATA’ EXPLORATION’TECHNIQUES’ Stratos’Idreos,’OlgaPapaemmanouil,’

structure-­‐aware  prefetching  •  prefetching  for  interacSve  spaSal  query  sequences  •  model  structures  of  past  spaSal  queries  in  graph  •  idenSfy  guiding  structure  in  past  two  queries  :  iteraSve  pruning  •  cache  the  predicted  next  locaSon  

SCOUT,  VLDB’12