EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar

Post on 22-Dec-2015


Page 1

EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS

Presenting: Karina Koifman Course : DB Seminar

Page 2

Example

Page 3

Example

Yahoo! Autos

Page 4

Maybe a better retrieval

Page 5

Introduction

The article addresses the problem of efficiently computing diverse query results in online shopping applications.

Page 6

The Goal

The goal of diverse query answering is to return a representative set of top-k answers from all the tuples that satisfy the user's selection condition.

Page 7

The Problem

A user issues a query for a product.

Only the most relevant answers are shown.

The answers contain many duplicates.

Page 8

Agenda

Existing Solutions

Definition of diversity

Impossibility results of diversity.

Query processing technique.

Page 9

Existing Solutions

Existing solutions are inefficient or do not work in all situations.

Example:

Obtaining all the query results and then picking a diverse subset from them doesn't scale to large data sets.

Page 10

Existing Solutions

Web search engines:

First retrieve c × k results and then pick a diverse subset from these.

This is more efficient than the previous method.

With many duplicate product listings, however, it is still inefficient and doesn't guarantee diversity.

Page 11

Existing Solutions

Issuing multiple queries to obtain diverse results:

Page 12

Pros/Cons

The good:

Diversity

The bad:

Hurts performance

Empty results (e.g., there are no Honda Accord convertibles)

Page 13

Agenda

Existing Solutions

Definition of diversity

Impossibility results of diversity.

Query processing technique.

Page 14

Diversity Ordering

A diversity ordering of a relation R with attributes A, denoted by ≺_R, is a total ordering of the attributes in A.

Example: Make ≺ Model ≺ Color ≺ Year ≺ Description ≺ Id

Page 15

The DB example

Id  Make    Model    Color   Year  Description
 1  Honda   Civic    Green   2007  Low miles
 2  Honda   Civic    Blue    2007  Low miles
 3  Honda   Civic    Red     2007  Low miles
 4  Honda   Civic    Black   2007  Low miles
 5  Honda   Civic    Black   2006  Low miles
 6  Honda   Accord   Blue    2007  Best Price
 7  Honda   Accord   Red     2006  Good miles
 8  Honda   Odyssey  Green   2007  Rare
 9  Honda   Odyssey  Green   2006  Good miles
10  Honda   CRV      Red     2007  Fun Car
11  Honda   CRV      Orange  2006  Good miles
12  Toyota  Prius    Tan     2007  Low miles
13  Toyota  Corolla  Black   2007  Low miles
14  Toyota  Tercel   Blue    2007  Low miles
15  Toyota  Camry    Blue    2007  Low miles

Page 16

Similarity – SIM(x, y)

 1  Honda   Civic  Green  2007  Low miles
 2  Honda   Civic  Blue   2007  Low miles

$\mathrm{SIM}(x, y) = 1$

12  Toyota  Prius  Tan    2007  Low miles
 1  Honda   Civic  Green  2007  Low miles

$\mathrm{SIM}(x, y) = 0$

Find a result set S that minimizes $\sum_{x, y \in S} \mathrm{SIM}(x, y)$.

Page 17

Example - Similarity

Id  Make    Model    Color  Year  Description
 1  Honda   Civic    Green  2007  Low miles
 6  Honda   Accord   Blue   2007  Best Price
 8  Honda   Odyssey  Green  2007  Rare

Id  Make    Model    Color  Year  Description
 1  Honda   Civic    Green  2007  Low miles
 2  Honda   Civic    Blue   2007  Low miles
12  Toyota  Prius    Tan    2007  Low miles
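
The objective can be checked by hand on the two candidate sets above. The sketch below is only an illustration under two assumptions that the slides do not spell out: SIM(x, y) is taken to be 1 when both tuples agree on the first attribute of the diversity ordering (Make), which matches the two SIM examples shown earlier, and the sum runs over unordered pairs of distinct tuples. The names `sim` and `total_similarity` are hypothetical.

```python
from itertools import combinations

# Attribute values in diversity order: Make, Model, Color, Year, Description.
CARS = {
    1: ("Honda", "Civic", "Green", 2007, "Low miles"),
    2: ("Honda", "Civic", "Blue", 2007, "Low miles"),
    6: ("Honda", "Accord", "Blue", 2007, "Best Price"),
    8: ("Honda", "Odyssey", "Green", 2007, "Rare"),
    12: ("Toyota", "Prius", "Tan", 2007, "Low miles"),
}


def sim(x, y, level=0):
    """Assumed similarity: 1 if x and y agree on the attribute at `level`, else 0."""
    return 1 if x[level] == y[level] else 0


def total_similarity(ids, level=0):
    """Sum of SIM over all unordered pairs of distinct tuples in the set."""
    return sum(sim(CARS[a], CARS[b], level) for a, b in combinations(ids, 2))


all_hondas = total_similarity([1, 6, 8])     # three Hondas
mixed_makes = total_similarity([1, 2, 12])   # two Hondas and one Toyota
print(all_hondas, mixed_makes)
```

Under these assumptions the set that mixes makes has the lower total, so it is the more diverse of the two at the Make level.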

Page 18

Prefix

Id  Make   Model    Color  Year  Description
 1  Honda  Civic    Green  2007  Low miles

Id  Make   Model    Color  Year  Description
 2  Honda  Civic    Blue   2007  Low miles

Id  Make   Model    Color  Year  Description
 8  Honda  Odyssey  Green  2007  Rare
 9  Honda  Odyssey  Green  2006  Good miles

Page 19

Few more definitions

Let $K_{R,Q}$ denote the subsets of RES(R,Q) of size k.

Given relation R and query Q, let

$maxval = \max_{T \in K_{R,Q}} Score(T)$,

where $Score(T)$ is the sum of the scores of the tuples in T.
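
Because the score of a set is a plain sum, the maximum over all result subsets of size k is reached by the k highest-scoring tuples, so maxval never requires enumerating subsets. The sketch below illustrates this; the score values are made up, and `maxval` and `maxval_brute_force` are hypothetical helper names. It assumes the result has at least k tuples.

```python
import heapq
from itertools import combinations


def maxval(scores, k):
    """Largest total score of any size-k subset: the sum of the k highest scores."""
    return sum(heapq.nlargest(k, scores))


def maxval_brute_force(scores, k):
    """Same quantity by enumerating every size-k subset (for checking only)."""
    return max(sum(subset) for subset in combinations(scores, k))


scores = [3, 9, 4, 7, 1]   # made-up scores of the tuples in the query result
best = maxval(scores, 3)
print(best, maxval_brute_force(scores, 3))
```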

Page 20

Agenda

Existing Solutions

Definition of diversity

Impossibility results of diversity.

Query processing technique.

Page 21

Impossibility Results

Intuition: the IR score of an item depends only on the item and possibly on statistics from the entire corpus, but diversity depends on the other items in the query result set.

Page 22

Inverted Lists

Figure: inverted lists for the terms "Honda" and "Car" of the query "Honda cars", and the merged inverted list.
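
The merge step can be sketched as follows. This is a minimal illustration that assumes the merge is a conjunctive (AND) intersection and that each inverted list holds item IDs in ascending order; the slide does not state either point, and the posting lists and the `intersect` name are made up.

```python
def intersect(list_a, list_b):
    """Two-pointer intersection of two posting lists sorted by ascending ID."""
    merged, i, j = [], 0, 0
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            merged.append(list_a[i])   # ID present in both lists
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1                     # advance the list that is behind
        else:
            j += 1
    return merged


# Hypothetical posting lists: IDs of the listings that contain each keyword.
honda = [1, 2, 3, 6, 8, 10]
car = [2, 3, 5, 8, 10, 13]
merged = intersect(honda, car)
print(merged)
```

Each list is scanned once, so the cost is linear in the combined length of the two lists.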

Page 23

Impossibility Results

An item in an inverted list has a score, which can be either a global score (e.g., PageRank) or a value/keyword-dependent score (e.g., TF-IDF).

The items in each list are usually ordered by their score, so that top-k queries can be handled.

If we assume a monotonic scoring function f(), which is a normal assumption for traditional IR systems, then the article proves that the result is either not diverse or too inefficient/infeasible to compute.

Page 24

Agenda

Existing Solutions

Definition of diversity

Impossibility results of diversity.

Query processing technique.

Page 25

The DB example

Id  Make    Model    Color   Year  Description
 1  Honda   Civic    Green   2007  Low miles
 2  Honda   Civic    Blue    2007  Low miles
 3  Honda   Civic    Red     2007  Low miles
 4  Honda   Civic    Black   2007  Low miles
 5  Honda   Civic    Black   2006  Low miles
 6  Honda   Accord   Blue    2007  Best Price
 7  Honda   Accord   Red     2006  Good miles
 8  Honda   Odyssey  Green   2007  Rare
 9  Honda   Odyssey  Green   2006  Good miles
10  Honda   CRV      Red     2007  Fun Car
11  Honda   CRV      Orange  2006  Good miles
12  Toyota  Prius    Tan     2007  Low miles
13  Toyota  Corolla  Black   2007  Low miles
14  Toyota  Tercel   Blue    2007  Low miles
15  Toyota  Camry    Blue    2007  Low miles

Page 26

The car indexing example
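
One way to picture the car index is as a tree that follows the diversity ordering, with every tuple addressed by the position of each of its values among the sibling values at that level (a Dewey-style path). The sketch below assigns such paths to the example relation. The numbering scheme (children numbered from 0 in order of first appearance), the `path_ids` name, and the choice to leave Id out of the path are assumptions made for illustration, not details taken from the slides.

```python
CARS = [
    (1, "Honda", "Civic", "Green", 2007, "Low miles"),
    (2, "Honda", "Civic", "Blue", 2007, "Low miles"),
    (3, "Honda", "Civic", "Red", 2007, "Low miles"),
    (4, "Honda", "Civic", "Black", 2007, "Low miles"),
    (5, "Honda", "Civic", "Black", 2006, "Low miles"),
    (6, "Honda", "Accord", "Blue", 2007, "Best Price"),
    (7, "Honda", "Accord", "Red", 2006, "Good miles"),
    (8, "Honda", "Odyssey", "Green", 2007, "Rare"),
    (9, "Honda", "Odyssey", "Green", 2006, "Good miles"),
    (10, "Honda", "CRV", "Red", 2007, "Fun Car"),
    (11, "Honda", "CRV", "Orange", 2006, "Good miles"),
    (12, "Toyota", "Prius", "Tan", 2007, "Low miles"),
    (13, "Toyota", "Corolla", "Black", 2007, "Low miles"),
    (14, "Toyota", "Tercel", "Blue", 2007, "Low miles"),
    (15, "Toyota", "Camry", "Blue", 2007, "Low miles"),
]


def path_ids(rows):
    """Map each row Id to a tuple of sibling positions, one per attribute level."""
    children = {}   # prefix of attribute values -> {child value: position}
    ids = {}
    for row_id, *attrs in rows:
        path = []
        for depth, value in enumerate(attrs):
            siblings = children.setdefault(tuple(attrs[:depth]), {})
            # A value seen for the first time gets the next free position.
            path.append(siblings.setdefault(value, len(siblings)))
        ids[row_id] = tuple(path)
    return ids


ids = path_ids(CARS)
for row_id in (1, 5, 12):
    print(row_id, ".".join(map(str, ids[row_id])))
```

Sorting by these paths keeps tuples that share a prefix (same make, then same model, and so on) next to each other, which is what lets an index scan skip a whole subtree at once.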

Page 27

One-pass Algorithm

Let's say Q looks for descriptions containing 'Low', with k = 3.

Honda.Civic.Green.2007.'Low miles'

Page 28

One-pass Algorithm

We start with two Civics; we then know that we need only one more result, so we pick the next Civic.

Page 29

One-pass Algorithm

Then we look for another match at the next level (Accord). There is none, because no Accord has 'Low' in its description (and neither does any other model at that level).

Page 30

One-pass Algorithm

Then we look for another match at the next level (Make) and prune.

The result is now maximally diverse, so we stop here.
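
For comparison, a maximally diverse answer to this query can be computed with a naive reference that reads every matching tuple first, which is the work OnePass is designed to avoid. The sketch below is not the OnePass algorithm. It assumes that "diverse" means spreading the k slots as evenly as possible over the values at each level of the diversity ordering, top level first, and it breaks ties by table order; `diverse_subset` is a hypothetical name.

```python
CARS = [
    (1, "Honda", "Civic", "Green", 2007, "Low miles"),
    (2, "Honda", "Civic", "Blue", 2007, "Low miles"),
    (3, "Honda", "Civic", "Red", 2007, "Low miles"),
    (4, "Honda", "Civic", "Black", 2007, "Low miles"),
    (5, "Honda", "Civic", "Black", 2006, "Low miles"),
    (6, "Honda", "Accord", "Blue", 2007, "Best Price"),
    (7, "Honda", "Accord", "Red", 2006, "Good miles"),
    (8, "Honda", "Odyssey", "Green", 2007, "Rare"),
    (9, "Honda", "Odyssey", "Green", 2006, "Good miles"),
    (10, "Honda", "CRV", "Red", 2007, "Fun Car"),
    (11, "Honda", "CRV", "Orange", 2006, "Good miles"),
    (12, "Toyota", "Prius", "Tan", 2007, "Low miles"),
    (13, "Toyota", "Corolla", "Black", 2007, "Low miles"),
    (14, "Toyota", "Tercel", "Blue", 2007, "Low miles"),
    (15, "Toyota", "Camry", "Blue", 2007, "Low miles"),
]


def diverse_subset(rows, k, depth=1):
    """Pick up to k rows, balancing the counts per attribute value level by level."""
    if k <= 0:
        return []
    if k >= len(rows) or depth >= len(rows[0]):
        return rows[:k]
    groups = {}                                  # value at this level -> its rows
    for row in rows:
        groups.setdefault(row[depth], []).append(row)
    quota = dict.fromkeys(groups, 0)
    remaining = k
    while remaining > 0:                         # hand out slots round-robin
        for value, members in groups.items():
            if remaining > 0 and quota[value] < len(members):
                quota[value] += 1
                remaining -= 1
    picked = []
    for value, members in groups.items():        # recurse one level down
        picked.extend(diverse_subset(members, quota[value], depth + 1))
    return picked


matches = [row for row in CARS if "Low" in row[5]]   # Q: description contains 'Low'
answer_ids = [row[0] for row in diverse_subset(matches, 3)]
print(answer_ids)
```

With k = 3 this returns two Hondas and one Toyota; one Honda and two Toyotas would be equally balanced under the same assumption.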

Page 31

One-pass Algorithm

If we had a Ford, we would continue.

Figure: an additional index branch Ford, Focus (0), Black (0), 07 (0), Low miles (0).

Page 32

Scored One-pass Algorithm

Give each car a score. The query then takes a parameter, minScore, which is the smallest score in the result set.

Choose the next ID as follows:

The smallest ID such that score(id) >= root.minScore.

The algorithm then proceeds as before.
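
The selection rule can be sketched as a scan over the matching IDs in ascending order. This is an illustration only: plain integers stand in for the index's IDs, the scores are made up, and the `start` parameter (the position from which the scan resumes) and the `next_id` name are assumptions rather than part of the slides.

```python
from bisect import bisect_left


def next_id(ids, scores, start, min_score):
    """Smallest id >= start whose score is at least min_score, or None."""
    for pos in range(bisect_left(ids, start), len(ids)):
        if scores[ids[pos]] >= min_score:
            return ids[pos]
    return None


ids = [1, 2, 3, 4, 5, 12, 13, 14, 15]   # matching rows in ascending ID order
scores = {1: 0.2, 2: 0.9, 3: 0.4, 4: 0.7, 5: 0.1,
          12: 0.8, 13: 0.3, 14: 0.6, 15: 0.5}   # made-up scores

print(next_id(ids, scores, 1, 0.5))    # skips id 1, whose score is too low
print(next_id(ids, scores, 13, 0.7))   # nothing qualifies from id 13 onward
```

Raising minScore as better results are found lets the scan skip more and more low-scoring tuples.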

Page 33

Probing Algorithm

Main idea: go over all the cars as if they were laid out on an axis.

Figure panels: K=1, K=2, K=3.

Page 34

Advantage of bidirectional exploring

“Honda” has only one child (Civic), so we find it quickly without exploring every option.

A node that we add to the diverse solution never has to be pruned later, unlike in the OnePass algorithm.

Page 35: EFFICIENT COMPUTATION OF DIVERSE QUERY RESULTS Presenting: Karina Koifman Course : DB Seminar

WAND algorithm

WAND is an efficient method of obtaining top-k lists of scored results without explicitly merging the full inverted lists.

AND(X1, X2, ..., Xk) ≡ WAND(X1, 1, X2, 1, ..., Xk, 1, k)

OR(X1, X2, ..., Xk) ≡ WAND(X1, 1, X2, 1, ..., Xk, 1, 1)

To obtain the k best results, the operator uses upper bounds on each term's maximum contribution (UB1, ..., UBk) and a temporary threshold θ:

WAND(X1, UB1, X2, UB2, ..., Xk, UBk, θ)
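
The two equivalences can be checked directly. The sketch below assumes the usual weighted-threshold reading of the operator, which the slide does not state explicitly: WAND is true when the weights of the variables that are true sum to at least θ. The function names are hypothetical.

```python
from itertools import product


def wand(pairs, theta):
    """pairs is a list of (x_i, w_i); true iff the weights of the true x_i sum to >= theta."""
    return sum(weight for x, weight in pairs if x) >= theta


def and_via_wand(xs):
    """AND(X1..Xk) as WAND with unit weights and threshold k."""
    return wand([(x, 1) for x in xs], len(xs))


def or_via_wand(xs):
    """OR(X1..Xk) as WAND with unit weights and threshold 1."""
    return wand([(x, 1) for x in xs], 1)


# Exhaustive check of both identities for k = 3.
for xs in product([False, True], repeat=3):
    assert and_via_wand(xs) == all(xs)
    assert or_via_wand(xs) == any(xs)
print("AND and OR identities hold for k = 3")
```

With upper bounds as weights, a threshold between these two extremes lets the operator skip items that cannot reach the current top-k score.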

Page 36

Scored Probing Algorithm

We use the WAND algorithm to obtain the top-k list.

The next step is to mark every node that could still be added as MIDDLE.

We also maintain a heap to find the node with the minimum child.

At each step we move nodes from tentative to useful.

Page 37

Experiments

MultQ: rewrites the query as multiple queries and merges their results.

Naïve: retrieves all the results of the query.

Basic: returns just the first k answers, without diversity.

OnePass, Probe: the algorithms proposed in the article.

U = unscored

S = scored

Page 38

Experiments

Page 39

Experiments

Page 40

Conclusions

The article formalized diversity in structured search and proposed inverted-list algorithms.

The experiments showed that the algorithms are scalable and efficient.

In particular, diversity can be implemented with little additional overhead compared to traditional approaches.

Page 41

Extension of the algorithm

Assign higher weights to Hondas and Toyotas than to Teslas, so that the diverse results contain more Hondas and Toyotas.

Page 42

Questions?

Thank You!