Multimedia DBs

Upload: dreama

Post on 05-Feb-2016

TRANSCRIPT

Page 1: Multimedia DBs

Multimedia DBs

Page 2: Multimedia DBs

PAA and APCA

• Another approach: segment the time series into equal parts, store the average value for each part.

• Use an index to store the averages and the segment end points

Page 3: Multimedia DBs

Feature Spaces

[Figure: a time series X and its approximation X' in three feature spaces, each plotted over time 0 to 140. DFT: Agrawal, Faloutsos, Swami 1993. DWT: basis functions Haar 0 to Haar 7, Chan & Fu 1999. SVD: basis functions eigenwave 0 to eigenwave 7, Korn, Jagadish, Faloutsos 1997.]

Page 4: Multimedia DBs

Piecewise Aggregate Approximation (PAA)

Original time series (n-dimensional vector): S = {s1, s2, …, sn}

n’-segment PAA representation (n’-d vector): S = {sv1, sv2, …, svn’}

[Figure: a time series plotted as value against time, overlaid with its 8-segment PAA approximation sv1, …, sv8]

The PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos 2000)
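
A minimal sketch of PAA and its lower-bounding distance may make the slide concrete. It assumes the series length is a multiple of the number of segments, and the function names (`paa`, `paa_lower_bound`) and sample values are illustrative only. The bound used is the usual one from the cited PAA papers: the Euclidean distance between the two PAA vectors, scaled by sqrt(n / n’).

```python
import math

def paa(series, num_segments):
    """Mean of each of num_segments equal-length segments."""
    n = len(series)
    if n % num_segments != 0:
        raise ValueError("series length must be a multiple of num_segments")
    width = n // num_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]

def euclidean(a, b):
    """Exact Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa_lower_bound(q_paa, s_paa, n):
    """sqrt(n / n') times the distance between the two n'-dimensional PAA vectors."""
    return math.sqrt(n / len(q_paa)) * euclidean(q_paa, s_paa)

# Illustrative data: n = 8 points reduced to n' = 4 segment averages.
s = [1.0, 3.0, 2.0, 2.0, 7.0, 9.0, 4.0, 6.0]
q = [2.0, 2.0, 1.0, 5.0, 6.0, 8.0, 5.0, 5.0]
s_paa = paa(s, 4)
q_paa = paa(q, 4)
bound = paa_lower_bound(q_paa, s_paa, len(s))
exact = euclidean(q, s)
print(s_paa, q_paa, bound, exact)
```

Because the bound never exceeds the exact distance, a search that prunes on the bound cannot discard a true answer.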

Page 5: Multimedia DBs

Can we improve upon PAA?

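n’-segment PAA representation (n’-d vector): S = {sv1, sv2, …, svn’}

[Figure: the time series with its 8-segment PAA approximation sv1, …, sv8]

Adaptive Piecewise Constant Approximation (APCA)

n’/2-segment APCA representation (n’-d vector): S = {sv1, sr1, sv2, sr2, …, svM, srM}

(M is the number of segments = n’/2; svi is the mean value of the ith segment and sri is its right endpoint)

[Figure: the same time series with its 4-segment APCA approximation, values sv1, …, sv4 and right endpoints sr1, …, sr4]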
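
To show what the (value, right endpoint) pairs look like, here is a small sketch that builds an APCA representation from segment endpoints chosen by hand and then expands it back into a series. The helper names and the sample series are hypothetical; choosing good endpoints automatically is a separate problem.

```python
def apca_from_endpoints(series, right_endpoints):
    """Return [(sv_1, sr_1), ..., (sv_M, sr_M)] for the given 1-based right endpoints."""
    segments = []
    start = 0  # 0-based index of the first point of the current segment
    for sr in right_endpoints:
        values = series[start:sr]
        segments.append((sum(values) / len(values), sr))
        start = sr
    return segments

def reconstruct(segments):
    """Expand (value, right endpoint) pairs back into a piecewise-constant series."""
    out = []
    prev = 0
    for sv, sr in segments:
        out.extend([sv] * (sr - prev))
        prev = sr
    return out

# Illustrative series with a long flat stretch and a short burst.
series = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 2.0]
apca = apca_from_endpoints(series, [5, 7, 8])
print(apca)
print(reconstruct(apca))
```

Segments of unequal length let the representation spend its few coefficients where the signal changes, which is the motivation for APCA over equal-length PAA.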

Page 6: Multimedia DBs

APCA approximates the original signal better than PAA

Improvement factor = (Reconstruction error of PAA) / (Reconstruction error of APCA)

[Figure: six example signals with improvement factors 1.69, 3.02, 1.21, 1.75, 3.77, 1.03]

Page 7: Multimedia DBs

APCA Representation can be computed efficiently

• Near-optimal representation can be computed in O(n log n) time

• Optimal representation can be computed in O(n^2 M) time (Koudas et al.)
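
The O(n^2 M) cost can be illustrated with a textbook dynamic program over segment boundaries. This is a sketch under the assumption that "optimal" means minimum sum of squared reconstruction errors; it is not claimed to be the algorithm of Koudas et al., and the name `optimal_segmentation` is illustrative.

```python
def optimal_segmentation(series, num_segments):
    """Return (1-based right endpoints, total squared error) of the best M-segment fit."""
    n = len(series)
    # Prefix sums of values and squared values give any segment's error in O(1).
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(series):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def sse(a, b):
        """Squared error of approximating series[a:b] by its mean."""
        total = ps[b] - ps[a]
        return (ps2[b] - ps2[a]) - total * total / (b - a)

    inf = float("inf")
    # best[m][j]: minimum error covering the first j points with m segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    cut = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for m in range(1, num_segments + 1):      # M levels
        for j in range(m, n + 1):             # n prefixes
            for a in range(m - 1, j):         # n candidate start points
                cost = best[m - 1][a] + sse(a, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    cut[m][j] = a
    # Walk the cut table backwards to recover the right endpoints.
    endpoints = []
    j = n
    for m in range(num_segments, 0, -1):
        endpoints.append(j)
        j = cut[m][j]
    endpoints.reverse()
    return endpoints, best[num_segments][n]

series = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 2.0]
endpoints, error = optimal_segmentation(series, 3)
print(endpoints, error)
```

The three nested loops are where the n^2 M term comes from; the prefix sums keep each inner step constant time.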

Page 8: Multimedia DBs

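Distance Measure

Exact (Euclidean) distance D(Q,S):

$$D(Q,S) = \sqrt{\sum_{i=1}^{n} (q_i - s_i)^2}$$

Lower bounding distance DLB(Q’,S):

$$D_{LB}(Q',S) = \sqrt{\sum_{i=1}^{M} (sr_i - sr_{i-1})\,(qv_i - sv_i)^2}$$

where Q’ is the query averaged over the segments of S, qvi is the mean of Q over the ith segment of S, and sr0 = 0

[Figure: query Q against series S for D(Q,S), and projected query Q’ against the APCA approximation of S for DLB(Q’,S)]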
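
A short sketch of the lower-bounding distance: the query is first averaged over the segments of the stored series, then each squared difference is weighted by the segment length. The function names and sample values are illustrative, and the APCA representation is assumed to be a list of (value, 1-based right endpoint) pairs.

```python
import math

def project_query(query, apca):
    """qv_i: mean of the query over each segment of the stored APCA representation."""
    qv = []
    prev = 0
    for _, sr in apca:
        qv.append(sum(query[prev:sr]) / (sr - prev))
        prev = sr
    return qv

def d_lb(query, apca):
    """sqrt( sum_i (sr_i - sr_{i-1}) * (qv_i - sv_i)^2 ), with sr_0 = 0."""
    total = 0.0
    prev = 0
    for (sv, sr), q_mean in zip(apca, project_query(query, apca)):
        total += (sr - prev) * (q_mean - sv) ** 2
        prev = sr
    return math.sqrt(total)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative stored series and its 2-segment APCA representation.
s = [1.0, 3.0, 2.0, 2.0, 7.0, 9.0]
apca_s = [(2.0, 4), (8.0, 6)]
q = [2.0, 2.0, 4.0, 4.0, 6.0, 8.0]

print(project_query(q, apca_s), d_lb(q, apca_s), euclidean(q, s))
```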

Page 9: Multimedia DBs

Index on 2M-dimensional APCA space

Any feature-based index structure can be used (e.g., R-tree, X-tree, Hybrid Tree)

[Figure: points S1 to S9 in 2M-dimensional APCA space grouped into MBRs R1 to R4, shown next to the corresponding index tree whose leaves hold S1 to S9]

Page 10: Multimedia DBs

k-nearest neighbor Algorithm

[Figure: query point Q and MBRs R1 to R4 containing points S1 to S9, with MINDIST(Q,R2), MINDIST(Q,R3) and MINDIST(Q,R4) drawn from Q to each rectangle]

For any node U of the index structure with MBR R, MINDIST(Q,R) ≤ D(Q,S) for any data item S under U

Page 11: Multimedia DBs

Index Modification for MINDIST Computation

APCA point S = {sv1, sr1, sv2, sr2, …, svM, srM}

APCA rectangle S = (L,H) where

L = {smin1, sr1, smin2, sr2, …, sminM, srM} and

H = {smax1, sr1, smax2, sr2, …, smaxM, srM}

[Figure: a 4-segment APCA approximation (sv1, …, sv4; sr1, …, sr4) with the minimum and maximum values smin1, smax1, …, smin4, smax4 of the original series marked in each segment, next to the rectangles S1 to S9 grouped into MBRs R1 to R4]

Page 12: Multimedia DBs

MBR Representation in time-value space

We can view the MBR R=(L,H) of any node U as two APCA representations

L = {l1, l2, …, l(N-1), lN} and H = {h1, h2, …, h(N-1), hN}

[Figure: value axis against time axis for N = 6, showing L = {l1, l2, l3, l4, l5, l6}, H = {h1, h2, h3, h4, h5, h6} and the three resulting rectangles REGION 1, REGION 2 and REGION 3]

Page 13: Multimedia DBs

Regions

M regions associated with each MBR; boundaries of the ith region:

• value axis: from l(2i-1) to h(2i-1)

• time axis: from l(2i-2)+1 to h(2i), taking l0 = 0

[Figure: value axis against time axis, showing REGION 1, REGION 2 and REGION 3 bounded by l1 … l6 and h1 … h6]

Page 14: Multimedia DBs

Regions

[Figure: value axis against time axis, showing REGION 1, REGION 2 and REGION 3 bounded by l1 … l6 and h1 … h6, with two time instants t1 and t2 marked]

• The ith region is active at time instant t if it spans across t

• The value st of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2)

Page 15: Multimedia DBs

MINDIST Computation

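For time instant t,

$$\mathrm{MINDIST}(Q,R,t) = \min_{\text{region } G \text{ active at } t} \mathrm{MINDIST}(Q,G,t)$$

[Figure: REGION 1, REGION 2 and REGION 3 bounded by l1 … l6 and h1 … h6, with time instant t1 falling inside regions 1 and 2]

$$\mathrm{MINDIST}(Q,R,t_1) = \min\big(\mathrm{MINDIST}(Q,\text{Region 1},t_1),\ \mathrm{MINDIST}(Q,\text{Region 2},t_1)\big) = \min\big((q_{t_1} - h_1)^2,\ (q_{t_1} - h_3)^2\big) = (q_{t_1} - h_1)^2$$

$$\mathrm{MINDIST}(Q,R) = \sqrt{\sum_{t=1}^{n} \mathrm{MINDIST}(Q,R,t)}$$

Lemma 3: MINDIST(Q,R) ≤ D(Q,C) for any time series C under node U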
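
The MINDIST computation can be sketched as follows. The sketch assumes the MBR is given as two lists L and H of length 2M that alternate (value bound, time bound), that a query value inside a region's value range contributes zero, and that the regions together cover every time instant. All names and sample numbers are illustrative.

```python
import math

def regions_from_mbr(low, high):
    """Region i (1-based): values l(2i-1)..h(2i-1), times l(2i-2)+1..h(2i), with l0 = 0."""
    regions = []
    for k in range(len(low) // 2):  # k = i - 1
        prev_end_low = low[2 * k - 1] if k > 0 else 0
        regions.append({
            "v_lo": low[2 * k], "v_hi": high[2 * k],
            "t_lo": prev_end_low + 1, "t_hi": high[2 * k + 1],
        })
    return regions

def mindist_region(q_t, region):
    """Squared gap between the query value and the region's value range."""
    if q_t < region["v_lo"]:
        return (region["v_lo"] - q_t) ** 2
    if q_t > region["v_hi"]:
        return (q_t - region["v_hi"]) ** 2
    return 0.0

def mindist(query, low, high):
    regions = regions_from_mbr(low, high)
    total = 0.0
    for t, q_t in enumerate(query, start=1):  # time instants are 1-based
        active = [g for g in regions if g["t_lo"] <= t <= g["t_hi"]]
        total += min(mindist_region(q_t, g) for g in active)
    return math.sqrt(total)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two illustrative series under one node (n = 6, M = 2) and the MBR that bounds them.
s1 = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]   # segments end at t = 3 and t = 6
s2 = [5.0, 4.0, 3.0, 4.0, 5.0, 7.0]   # segments end at t = 4 and t = 6
low = [1.0, 3, 5.0, 6]
high = [5.0, 4, 9.0, 6]
q = [6.0, 6.0, 6.0, 6.0, 10.0, 10.0]

print(mindist(q, low, high), euclidean(q, s1), euclidean(q, s2))
```

At t = 4 both regions are active and the query value lies inside region 2, so that instant contributes nothing; the remaining five instants each contribute 1.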

Page 16: Multimedia DBs

Approximate Search

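• A simpler definition of the distance in the feature space is the following:

$$D_{AE}(Q,C) = \sqrt{\sum_{i=1}^{M} \sum_{k=1}^{cr_i - cr_{i-1}} \big(cv_i - q_{k + cr_{i-1}}\big)^2}$$

where cvi and cri are the value and right endpoint of the ith segment of the series C, and cr0 = 0

• But there is one problem… what?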
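
One way to explore the question on this slide is to compute the simpler distance directly: each raw query value is compared with the value of the stored segment it falls in. The sketch below uses illustrative names and a deliberately small example in which the query equals the stored series.

```python
import math

def d_ae(query, apca):
    """Distance between the raw query and the piecewise-constant reconstruction."""
    total = 0.0
    prev = 0
    for cv, cr in apca:
        for k in range(prev, cr):
            total += (cv - query[k]) ** 2
        prev = cr
    return math.sqrt(total)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

c = [0.0, 2.0, 0.0, 2.0]
apca_c = [(1.0, 4)]          # one segment with mean 1.0 ending at t = 4
q = [0.0, 2.0, 0.0, 2.0]     # identical to c

approx = d_ae(q, apca_c)
exact = euclidean(q, c)
print(approx, exact)
```

In this example the feature-space value is larger than the exact distance, so this definition cannot be relied on as a lower bound; pruning with it can dismiss true answers.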

Page 17: Multimedia DBs

Multimedia dbs

• A multimedia database also stores images

• Again similarity queries (content based retrieval)

• Extract features, index in feature space, answer similarity queries using GEMINI

• Again, average values help!

Page 18: Multimedia DBs

Images - color

Q: What is an image? A: a 2-d array

Page 19: Multimedia DBs

Images - color

Color histograms, and distance function

Page 20: Multimedia DBs

Images - color

Mathematically, the distance function is:

Page 21: Multimedia DBs

Images - color

Problem: ‘cross-talk’:

• Features are not orthogonal ->

• SAMs (spatial access methods) will not work properly

• Q: what to do?

• A: feature-extraction question

Page 22: Multimedia DBs

Images - color

possible answers:

• avg red, avg green, avg blue

it turns out that this lower-bounds the histogram distance ->

• no cross-talk

• SAMs are applicable
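
As an illustration of the average-color feature, the sketch below reduces an image to a 3-d vector (avg red, avg green, avg blue) and compares two images in that space. It assumes an image is a list of rows of (r, g, b) tuples; the pixel values are made up, and the scaling that turns this distance into a lower bound of the histogram distance is not given on the slides, so it is not attempted here.

```python
import math

def average_rgb(image):
    """Return (avg red, avg green, avg blue) over all pixels of the image."""
    totals = [0.0, 0.0, 0.0]
    count = 0
    for row in image:
        for pixel in row:
            for c in range(3):
                totals[c] += pixel[c]
            count += 1
    return tuple(t / count for t in totals)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two tiny hypothetical 2x2 images.
reddish = [[(200, 10, 10), (220, 30, 10)], [(180, 20, 40), (200, 20, 20)]]
bluish = [[(10, 20, 200), (30, 20, 220)], [(20, 20, 180), (20, 20, 200)]]

f_red = average_rgb(reddish)
f_blue = average_rgb(bluish)
print(f_red, f_blue, euclidean(f_red, f_blue))
```

Three orthogonal features per image are few enough for a spatial access method to index directly; the full histogram distance is then only computed for the candidates that survive this filter.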

Page 23: Multimedia DBs

Images - color

performance:

[Figure: response time against selectivity, comparing a sequential scan with filtering w/ avg RGB]

Page 25: Multimedia DBs

Images - shapes

• distance function: Euclidean, on the area, perimeter, and 20 ‘moments’

• (Q: how to normalize them?

• A: divide by standard deviation)
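
A brief sketch of the normalization step: each feature column is divided by its standard deviation so that features on large scales (such as area) do not dominate the Euclidean distance. The population form of the standard deviation and the sample (area, perimeter) values are assumptions made for illustration.

```python
import math

def normalize_features(rows):
    """Divide each feature column by its (population) standard deviation."""
    n = len(rows)
    dims = len(rows[0])
    stds = []
    for d in range(dims):
        col = [row[d] for row in rows]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        stds.append(math.sqrt(var))
    # A constant column has zero deviation; leave it unscaled to avoid dividing by zero.
    return [[x / s if s > 0 else x for x, s in zip(row, stds)] for row in rows]

# Hypothetical (area, perimeter) features for three shapes.
shapes = [[100.0, 40.0], [300.0, 60.0], [200.0, 50.0]]
normalized = normalize_features(shapes)
print(normalized)
```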

Page 27: Multimedia DBs

Images - shapes

• distance function: Euclidean, on the area, perimeter, and 20 ‘moments’

• (Q: other ‘features’ / distance functions?

• A1: turning angle

• A2: dilations/erosions

• A3: ... )

Page 29: Multimedia DBs

Images - shapes

• distance function: Euclidean, on the area, perimeter, and 20 ‘moments’

• Q: how to do dim. reduction?

• A: Karhunen-Loeve (= centered PCA/SVD)
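
The idea behind the Karhunen-Loeve transform can be sketched without any library: center the feature vectors, form their covariance matrix, find the direction of largest variance, and project onto it. The sketch below keeps a single component and finds it by power iteration, which is a simplification chosen for brevity; a real system would more likely call an SVD routine and keep several components. Names and data are illustrative.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    return [[r[d] - means[d] for d in range(dims)] for r in rows], means

def covariance(centered):
    n, dims = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(dims)]
            for a in range(dims)]

def top_component(cov, iterations=200):
    """Power iteration: converges to the eigenvector with the largest eigenvalue,
    provided the starting vector is not orthogonal to it."""
    dims = len(cov)
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Illustrative 2-d features that all lie on the line y = 2x.
features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered, means = center(features)
axis = top_component(covariance(centered))
reduced = [sum(x * a for x, a in zip(row, axis)) for row in centered]
print(axis, reduced)
```

Here one coordinate along the principal axis captures all the variance of the two original features, which is the effect dimensionality reduction aims for on the 22 shape features.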

Page 30: Multimedia DBs

Images - shapes

• Performance: ~10x faster

[Figure: log(# of I/Os) against # of features kept, compared with keeping all features]