DBease: Making Databases User-Friendly and Easily Accessible Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng Database Group, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China


Post on 12-Jan-2016




How to Access Databases?

• Traditional database-access methods:
  – SQL
      Select title, author, booktitle, year From dblp
      Where title Contains “search” And booktitle Contains “cidr”
  – Query-by-example (Form)
  – Keyword Search: “search cidr”

CIDR'11 - DBease


Comparison of Different Methods

[Figure: comparison of the access methods; axis label: Usability]

Keyword Search

• Is traditional keyword search good enough?

[Screenshot callouts: “Too many results!” and “No result!”]

Form-based Search

• Form-based Search has the same problem.

[Screenshot callout: “Complicated and still no result!”]

Our Solution

• Type-Ahead Search
• Type-Ahead Search in Forms
• SQL Suggestion

[Figure axis label: Usability]

What is Type-Ahead Search?


Type-Ahead Search

• Advantages
  – Giving users instant feedback on the fly
  – Helping users navigate the underlying data
  – Tolerating inconsistencies between query and data
  – Supporting synonyms
  – Supporting XML data
  – Supporting multiple tables


Problem Formulation

• Data: A set of records
• Query
  – Q = {p1, p2, …, pl}: a set of prefixes
  – δ: Edit-distance threshold
• Result
  – A set of records containing all query prefixes or their similar forms (conjunctive)

Edit Distance: The minimum number of edit operations (insertion, deletion, substitution) needed to transform one string into another.

ed(string, stang) = 2
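The edit-distance definition above can be checked with a short dynamic-programming sketch. The function names are illustrative, and `prefix_edit_distance` is only one possible reading of "similar forms" of a prefix (an assumption, not something the slides define).

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    # prev[j] is the distance between the already processed part of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


def prefix_edit_distance(query: str, word: str) -> int:
    """Assumed helper: smallest edit distance between query and any prefix of word."""
    return min(edit_distance(query, word[:k]) for k in range(len(word) + 1))


print(edit_distance("string", "stang"))
```

Under this reading, a word is a similar form of a typed prefix when `prefix_edit_distance(prefix, word)` is at most δ.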


Indexing

• Trie Index
• Words: root-to-leaf paths
• Inverted lists on leaves
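A minimal sketch of such an index, assuming records are plain strings split on whitespace. The class and function names are hypothetical. One deviation from the slide: a word that is itself a prefix of another word ends at an internal node, so this sketch keeps its inverted list there instead of on a leaf.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)   # character -> TrieNode
    record_ids: set = field(default_factory=set)   # inverted list, filled where a word ends


def build_trie(records: dict) -> TrieNode:
    """Index every word of every record; records maps record id -> text."""
    root = TrieNode()
    for rid, text in records.items():
        for word in text.lower().split():
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.record_ids.add(rid)
    return root


def records_with_prefix(root: TrieNode, prefix: str) -> set:
    """Union of the inverted lists of all words that start with prefix."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return set()
    found, stack = set(), [node]
    while stack:                      # walk the subtree below the prefix node
        current = stack.pop()
        found |= current.record_ids
        stack.extend(current.children.values())
    return found


records = {1: "xml database", 2: "xml search", 3: "rdbms"}
root = build_trie(records)
print(records_with_prefix(root, "x"))
```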


Algorithm
• Step 1: Find similar prefixes incrementally
• Step 2: Retrieve the leaf nodes of similar prefixes
• Step 3: Compute the union of the inverted lists of those leaf nodes
• Step 4: Intersect the union lists of the query keywords
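The four steps can be sketched as follows. This is a brute-force illustration, not the incremental trie-based method of the slides: it scans the whole vocabulary for each keyword and recomputes everything per query. All function names and the sample records are assumptions.

```python
from collections import defaultdict


def edit_distance(s, t):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def build_inverted_lists(records):
    """word -> set of ids of the records containing it."""
    lists = defaultdict(set)
    for rid, text in records.items():
        for word in text.lower().split():
            lists[word].add(rid)
    return lists


def similar_words(prefix, vocabulary, delta):
    """Steps 1-2 (brute force): words having some prefix within edit distance delta."""
    return {
        w for w in vocabulary
        if any(edit_distance(prefix, w[:k]) <= delta for k in range(len(w) + 1))
    }


def type_ahead_search(query, records, delta=1):
    lists = build_inverted_lists(records)
    result = None
    for prefix in query.lower().split():
        union = set()
        for word in similar_words(prefix, lists.keys(), delta):
            union |= lists[word]                                 # Step 3: union per keyword
        result = union if result is None else result & union    # Step 4: intersect
    return result or set()


records = {
    1: "type ahead search cidr",
    2: "keyword search in databases",
    3: "cidr overview",
}
print(type_ahead_search("serch cid", records, delta=1))
```

With δ = 1 a one-character prefix matches every word, because the empty prefix is already within distance 1. A practical system would probably scale the threshold with the prefix length.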


Type-Ahead Search in Forms

[Figure: Type-Ahead Search and Type-Ahead Search in Forms; axis label: Usability]

What is Type-Ahead Search in Forms?


Type-Ahead Search in Forms

• Problem Formulation
  – Data: A relation with multiple attributes
  – Query: A set of prefixes on attributes in a form interface
  – Answers:
    • Local results of the focused attribute
    • Global results of the relation
• Advantages
  – On-the-fly faceted search
  – Supporting aggregation


Data Partition

• Global Table → Local Tables

Global table:
ID  Title         Conf.   Author
1   xml database  VLDB    albert
2   xml database  SIGMOD  bob
3   xml search    VLDB    albert
4   xml security  VLDB    alice
5   rdbms         SIGMOD  charlie

Local table (Conf.):
ID  Conf.
C1  VLDB
C2  SIGMOD

Local table (Author):
ID  Author
A1  albert
A2  bob
A3  alice
A4  charlie

Indexing

• Each attribute
  – Trie
  – Mapping Tables
    • Local → Global
    • Global → Local

[Figure: trie over the Title values; root Φ, leaves T1: xml database, T2: xml search, T3: xml security]

L-G Mapping Table:
T1  1, 2
T2  3
T3  4
T4  5

G-L Mapping Table:
1  T1
2  T1
3  T2
4  T3
5  T4
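The partition into local tables and the two mapping tables can be derived from the global table with a single pass per attribute. The sketch below uses the rows from the Data Partition slide; the function name `partition` and the tuple layout are assumptions made for illustration.

```python
GLOBAL_TABLE = [
    # (id, title, conf, author)
    (1, "xml database", "VLDB", "albert"),
    (2, "xml database", "SIGMOD", "bob"),
    (3, "xml search", "VLDB", "albert"),
    (4, "xml security", "VLDB", "alice"),
    (5, "rdbms", "SIGMOD", "charlie"),
]


def partition(rows, column, id_prefix):
    """Build one local table and its two mapping tables for a column index."""
    local_table = {}      # local id -> distinct attribute value
    value_to_local = {}   # helper: attribute value -> local id
    l_to_g = {}           # L-G mapping: local id -> global record ids
    g_to_l = {}           # G-L mapping: global record id -> local id
    for row in rows:
        gid, value = row[0], row[column]
        if value not in value_to_local:
            lid = f"{id_prefix}{len(local_table) + 1}"
            value_to_local[value] = lid
            local_table[lid] = value
            l_to_g[lid] = []
        lid = value_to_local[value]
        l_to_g[lid].append(gid)
        g_to_l[gid] = lid
    return local_table, l_to_g, g_to_l


titles, title_l_to_g, title_g_to_l = partition(GLOBAL_TABLE, 1, "T")
authors, author_l_to_g, author_g_to_l = partition(GLOBAL_TABLE, 3, "A")
print(title_l_to_g)
print(title_g_to_l)
```

For the Title attribute this reproduces the mapping tables on the slide: T1 maps to records 1 and 2, and record 5 maps back to T4.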


Our Solution

Form input:
  Author:
  Title:

Local results (Title): xml database; xml search; xml security

Global results: xml database (albert); xml database (bob); xml search (albert); xml security (alice)


Our Solution

Form input:
  Author: al
  Title: xml

Local results (Author): albert; alice

Global results: xml database, albert; xml search, albert; xml security, alice

[Figure: trie over the Author values, with leaves labeled “4: albert” and “5: alice”, shown with the same L-G and G-L mapping tables as on the previous slides]
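The local and global answers in the form examples can be reproduced with a brute-force scan over the global table. This sketch ignores the tries and mapping tables and uses exact prefix matching only. It also assumes that local results are the distinct values of the focused attribute among the global results, which is consistent with the examples but not stated on the slides. All names are hypothetical.

```python
ROWS = [
    # (id, title, conf, author): the global table from the Data Partition slide
    (1, "xml database", "VLDB", "albert"),
    (2, "xml database", "SIGMOD", "bob"),
    (3, "xml search", "VLDB", "albert"),
    (4, "xml security", "VLDB", "alice"),
    (5, "rdbms", "SIGMOD", "charlie"),
]
COLUMNS = {"title": 1, "conf": 2, "author": 3}


def field_matches(value, typed):
    """True if every typed prefix starts some word of the attribute value."""
    words = value.lower().split()
    return all(any(w.startswith(p) for w in words) for p in typed.lower().split())


def form_search(rows, form, focused):
    """Return (local results of the focused attribute, global results)."""
    global_results = [
        row for row in rows
        if all(field_matches(row[COLUMNS[name]], typed) for name, typed in form.items())
    ]
    local_results = []
    for row in global_results:
        value = row[COLUMNS[focused]]
        if value not in local_results:      # keep distinct values, in table order
            local_results.append(value)
    return local_results, global_results


local, global_ = form_search(ROWS, {"title": "xml", "author": "al"}, focused="author")
print(local)
print([(title, author) for _, title, _, author in global_])
```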


SQL Suggestion

[Figure: Type-Ahead Search, Type-Ahead Search in Forms, and SQL Suggestion; axis label: Usability]

What is SQL Suggestion?


SQL Suggestion

• Problem Formulation
  – Data: A database with multiple tables
  – Query: A set of keywords
  – Answers: Relevant SQL queries
• Advantages
  – Suggest SQL queries based on keywords
  – Help users formulate SQL queries to find accurate results
  – Designed for both SQL programmers and Internet users
  – Group answers based on SQL structures
  – Support aggregation
  – Support range queries


Our Solution

• Suggest Templates from Keywords
  – A template is a structure in the database
  – Modeled as a graph
    • Nodes: entities (table names or attribute names)
    • Edges: foreign keys or membership
• Suggest SQL queries from Templates
  – Mapping between keywords and templates
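To make the template idea concrete, here is a small sketch that represents a template as table nodes joined by foreign-key edges and renders one SQL suggestion from a keyword-to-attribute mapping. The schema (paper, write, author), the `Template` class, and the use of LIKE predicates are all assumptions for illustration; the slides do not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    tables: tuple   # table-name nodes
    joins: tuple    # foreign-key edges as (left_column, right_column) pairs


def to_sql(template, keyword_to_attribute):
    """Render one SQL suggestion from a template and a keyword-to-attribute mapping."""
    conditions = [f"{left} = {right}" for left, right in template.joins]
    # Illustration only: real code should bind keywords as query parameters.
    conditions += [
        f"{attribute} LIKE '%{keyword}%'"
        for keyword, attribute in keyword_to_attribute.items()
    ]
    sql = "SELECT * FROM " + ", ".join(template.tables)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Hypothetical schema: paper(id, title), author(id, name), write(paper_id, author_id).
paper_author = Template(
    tables=("paper", "write", "author"),
    joins=(("paper.id", "write.paper_id"), ("write.author_id", "author.id")),
)
suggestion = to_sql(paper_author, {"ir": "paper.title", "li": "author.name"})
print(suggestion)
```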

[Figure: (a) Query (“keyword paper ir”); (b) Template; (c) SQL]

Template Suggestion

• Template Generation
  – Extension from basic entities (tables)
• Template Ranking
  – Template weight
    • PageRank
  – Relevancy between a keyword and an entity
    • TF*IDF
• Algorithms
  – Fagin's algorithms
  – Threshold-based pruning techniques
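As an illustration of threshold-based top-k retrieval, the sketch below implements the threshold algorithm (TA), one of the algorithms of the Fagin family. It assumes that a template's score is the sum of non-negative per-list scores (for example a weight and a keyword relevancy). That scoring function, the function name, and the sample numbers are assumptions, not the ranking used by DBease.

```python
import heapq


def threshold_algorithm(score_lists, k):
    """Top-k objects by summed score; score_lists is a list of dicts object -> score."""
    # Sorted access: every list is ordered by descending score.
    sorted_lists = [sorted(s.items(), key=lambda kv: -kv[1]) for s in score_lists]
    seen = {}   # object -> total score, computed by random access to every list
    max_depth = max((len(lst) for lst in sorted_lists), default=0)
    for depth in range(max_depth):
        threshold = 0.0   # best total any still unseen object could reach
        for lst in sorted_lists:
            if depth < len(lst):
                obj, score = lst[depth]
                threshold += score
                if obj not in seen:
                    seen[obj] = sum(s.get(obj, 0.0) for s in score_lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top    # no unseen object can beat the current top-k
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Made-up scores for three candidate templates.
weights = {"T1": 0.9, "T2": 0.5, "T3": 0.1}
relevancy = {"T1": 0.3, "T2": 0.8, "T3": 0.2}
best = threshold_algorithm([weights, relevancy], k=1)
print([name for name, _ in best])
```

In this example the scan stops after two rounds of sorted access, so T3 is never scored. That early stop is the pruning effect.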


SQL Suggestion

• SQL suggestion model
  – Mapping from keywords to templates
  – A matching is a set of mappings that covers all keywords
  – Weighted set-covering problem (NP-hard)
• SQL ranking
  – Relevancy between keywords and attributes
  – Attribute weight
• Algorithms
  – Greedy algorithms
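The classic greedy heuristic for weighted set cover gives a feel for how a matching could be assembled. This sketch assumes each candidate mapping covers a set of keywords at a positive cost, where lower cost means a better mapping. The cost model, candidate names, and numbers are assumptions; the slides rank by relevancy and attribute weight without giving formulas.

```python
def greedy_set_cover(universe, candidates):
    """Pick candidates until every element is covered.

    candidates maps a name to (set of covered elements, positive cost).
    Each round takes the candidate with the lowest cost per newly covered element.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, None
        for name, (covered, cost) in candidates.items():
            gain = len(covered & uncovered)
            if gain == 0:
                continue
            ratio = cost / gain
            if best_ratio is None or ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("some elements cannot be covered")
        chosen.append(best_name)
        uncovered -= candidates[best_name][0]
    return chosen


# Hypothetical mappings from query keywords to attributes, with made-up costs.
keywords = {"paper", "ir", "li"}
mappings = {
    "paper.title": ({"paper", "ir"}, 0.6),
    "paper.abstract": ({"ir"}, 0.4),
    "author.name": ({"li"}, 0.5),
    "paper": ({"paper"}, 0.6),
}
matching = greedy_set_cover(keywords, mappings)
print(matching)
```

The greedy choice is a heuristic: it returns a valid cover quickly but not necessarily the cheapest one, which fits the NP-hardness noted above.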


Search: dbease

http://dbease.cs.tsinghua.edu.cn


Differences to Google Instant Search
• Fuzzy prefix matching
• Google first predicts queries and then uses the top predicted queries to search the documents. Google may therefore miss relevant answers (false negatives), while we can find the accurate top-k answers.


Differences to Complete Search
• Fuzzy prefix matching
• Different index structures
• More efficient


Differences to Keyword Search
• Effectiveness
  – SQL Suggestion supports range queries and aggregation functions.
  – SQL Suggestion can group answers.
  – SQL Suggestion can help users express their query intent more accurately.
• Efficiency
  – Faster
