Transcript
Page 1: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Big Data Management and AnalyticsAssignment 7

1

Page 2: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-1

(a) Explain each of the terms by providing a short definition

Aggregation: Matching of similar objects to groups and aggregation of theentire group

Compression: Compress received data

Data Reduction: reduce the size of received data

Histograms: Describes a method for approximating frequencydistributions of elements in streams

2

Page 3: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-1

(a) Explain each of the terms by providing a short definition

Load Shedding: Given the case where the data in the stream arrives withsuch a high velocity that it could overburden the system, some part of thedata will be discarded.

Microclusters: Describes a group of similar objects

Sampling: Selecting a subset of the data

Wavelets: Deconstruct a signal in several coefficients

3

Page 4: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-1

(b) Illustrate how the terms are related to each other.

4

Data Reduction

SamplingLoad

Shedding

Aggregation

Histograms

Microclusters

Compression Wavelets

Page 5: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

Given the following input sequence S = (4,1,2,3,6,1,7,6)

(a) Perform a Haar Wavelet Transformation on S and determine theWavelet coefficients

Transform S into a sequence of two-component vectors 𝑠1, 𝑑1 … 𝑠𝑛, 𝑑𝑛

where 𝑠𝑖𝑑𝑖=1

2

1 11 βˆ’1

π‘₯𝑖π‘₯𝑖+1

5

Vector withcurrent elementof the sequenceand its sucessorelement

Haar matrix

Page 6: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

Given the following input sequence S = (4,1,2,3,6,1,7,6)

(a) Perform a Haar Wavelet Transformation on S and determine theWavelet coefficients

Step1:

𝑠1 =4+1

2,2+3

2,6+1

2,7+6

2= 2.5, 2.5, 3.5, 6.5

𝑑1 =4βˆ’1

2,2βˆ’3

2,6βˆ’1

2,7βˆ’6

2= (1.5, βˆ’0.5, 2.5, 0.5)

6

Page 7: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

Given the following input sequence S = (4,1,2,3,6,1,7,6)

Step1:

𝑠1 =4+1

2,2+3

2,6+1

2,7+6

2= 2.5, 2.5, 3.5, 6.5

𝑑1 =4βˆ’1

2,2βˆ’3

2,6βˆ’1

2,7βˆ’6

2= (1.5, βˆ’0.5, 2.5, 0.5)

Step2:

𝑠2 =2.5+2.5

2,3.5+6.5

2= 2.5, 5

𝑑2 =2.5βˆ’2.5

2,3.5βˆ’6.5

2= (0,βˆ’1.5)

7

Page 8: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

Given the following input sequence S = (4,1,2,3,6,1,7,6)

Step2:

𝑠2 =2.5+2.5

2,3.5+6.5

2= 2.5, 5

𝑑2 =2.5βˆ’2.5

2,3.5βˆ’6.5

2= (0,βˆ’1.5)

Step3:

𝑠3 =2.5+5

2= 3.75

𝑑3 =2.5βˆ’5

2= (βˆ’1.25)

8

Page 9: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

Given the following input sequence S = (4,1,2,3,6,1,7,6)

9

Mean Coefficients

(4,1,2,3,6,1,7,6) (-)

(2.5, 2.5, 3.5, 6.5) (1.5, -0.5, 2.5, 0.5)

(2.5, 5) (0, -1.5)

(3.75) (-1.25)

Page 10: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

Given the following input sequence S = (4,1,2,3,6,1,7,6)

(b) Reconstruct the original sequence S using the Wavelet coefficients

For the reconstruction of a sequence S we use:

1 11 βˆ’1

βˆ™ 𝑠𝑖𝑑𝑖

= π‘₯′𝑖π‘₯′𝑖+1

10

Page 11: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

The wavelet coefficients obtained from (a):

DWT(S) = (3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5)

Loss-free reconstruction:

11

8

7

6

5

4

3

2

1

0

-1

-21 2 3 4 5 6 7 8

Page 12: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

The wavelet coefficients obtained from (a):

DWT(S) = (3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5)

Loss-free reconstruction:

12

8

7

6

5

4

3

2

1

0

-1

-2

1 11 βˆ’1

βˆ™ 3.75βˆ’1.25

= 2.55

3.75

5

2.5

βˆ†= 1.25

Intuition:

negative sign: start below the base linepositive sign: start above the base line

1 2 3 4 5 6 7 8

Take the middle of each (sub)-plateau!

Page 13: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

The wavelet coefficients obtained from (a):

DWT(S) = (3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5)

Loss-free reconstruction:

13

8

7

6

5

4

3

2

1

0

-1

-2

1 11 βˆ’1

βˆ™ 2.50

= 2.52.5

3.5

5

2.5

βˆ†= 1.5

1 2 3 4 5 6 7 8

1 11 βˆ’1

βˆ™ 5βˆ’1.5

= 3.56.5

6.5

βˆ†= 0

Page 14: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

The wavelet coefficients obtained from (a):

DWT(S) = (3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5)

Loss-free reconstruction:

14

8

7

6

5

4

3

2

1

0

-1

-2

1 11 βˆ’1

βˆ™ 2.51.5

= 41

4

1 2 3 4 5 6 7 8

1 11 βˆ’1

βˆ™ 2.5βˆ’0.5

= 23

6

1 11 βˆ’1

βˆ™ 3.52.5

= 61

1 11 βˆ’1

βˆ™ 6.50.5

= 761

2

3

6

1

7

The original sequence is

S = (4,1,2,3,6,1,7,6)

Page 15: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

(c) We assume that all coefficients of value [-0.5,0.5] are close to zero

DWT(S) = (3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5)

Changes to:

DWTβ€˜(S) = (3.75, -1.25, 0, -1.5, 1.5, 0, 2.5, 0)

Reconstructing S with DWTβ€˜(S):

15

1 11 βˆ’1

𝑇

βˆ™ 3.75βˆ’1.25

= 2.55

1 11 βˆ’1

𝑇

βˆ™ 2.50

= 2.52.5

1 11 βˆ’1

𝑇

βˆ™ 5βˆ’1.5

= 3.56.5

1 11 βˆ’1

βˆ™ 2.51.5

= 41

1 11 βˆ’1

βˆ™ 2.50

= 2.52.5

1 11 βˆ’1

βˆ™ 3.52.5

= 61

1 11 βˆ’1

βˆ™ 6.50

= 6.56.5

Page 16: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

(c) We assume that all coefficients of value [-0.5,0.5] are close to zero

DWT(S) = (3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5)

Changes to:

DWTβ€˜(S) = (3.75, -1.25, 0, -1.5, 1.5, 0, 2.5, 0)

Using DWTβ€˜(S) leads to a loss afflicted reconstruction with:

S = (4,1,2,3,6,1,7,6)

Sβ€˜ = (4,1,2.5,2.5,6,1,6.5,6.5)

Now take each residue from S,Sβ€˜ and compute their difference and sum upthe differences to the total approximation error:

πœ€π‘’π‘Ÿπ‘Ÿ(𝑆, 𝑆′) =

𝑖=0

𝑆 βˆ’1

𝑆 𝑖 βˆ’ 𝑆′(𝑖)

16

Page 17: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-2

(c) We assume that all coefficients of value [-0.5,0.5] are close to zero

S = (4,1,2,3,6,1,7,6)

Sβ€˜ = (4,1,2.5,2.5,6,1,6.5,6.5)

Now take each residue from S,Sβ€˜ and compute their difference and sum upthe differences to the total approximation error:

πœ€π‘’π‘Ÿπ‘Ÿ(𝑆, 𝑆′) =

𝑖=0

𝑆 βˆ’1

𝑆 𝑖 βˆ’ 𝑆′(𝑖)

πœ€π‘’π‘Ÿπ‘Ÿ 𝑆, 𝑆′ = 4 βˆ’ 4 + 1 βˆ’ 1 + 2 βˆ’ 2.5 + 3 βˆ’ 2.5 + 6 βˆ’ 6 + 1 βˆ’ 1 +

7 βˆ’ 6.5 + 6 βˆ’ 6.5 = 2

17

Page 18: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-3

(a) Compute the reduced representation of S using PAA (box size M=4)

Hint: A PAA approximates a time series 𝑋 of length 𝑁 with a vector

𝑋 = ( π‘₯1, … , π‘₯𝑀) of arbitrary length 𝑀 ≀ 𝑁, where for each π‘₯𝑖 holds:

π‘₯𝑖 =𝑀

𝑁

𝑗=𝑁𝑀 π‘–βˆ’1 +1

𝑁𝑀𝑖

π‘₯𝑗

18

Page 19: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-3

(a) Compute the reduced representation of S using PAA (box size M=4)

position p = (1 ,2 ,3 ,4 ,5 ,6 ,7 ,8)

Initial sequence S = (4 ,1 ,2 ,3 ,6 ,1 ,7 ,6)

19

π‘₯1 =4

8

𝑗=84 1βˆ’1 +1=1

841=2

π‘₯𝑗 =(4 + 1)

2= 2.5

π‘₯2 =4

8

𝑗=84 2βˆ’1 +1=3

842=4

π‘₯𝑗 =(2 + 3)

2= 2.5

Page 20: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-3

(a) Compute the reduced representation of S using PAA (box size M=4)

position p = (1 ,2 ,3 ,4 ,5 ,6 ,7 ,8)

Initial sequence S = (4 ,1 ,2 ,3 ,6 ,1 ,7 ,6)

20

π‘₯3 =4

8

𝑗=84 3βˆ’1 +1=5

843=6

π‘₯𝑗 =(6 + 1)

2= 3.5

π‘₯4 =4

8

𝑗=84 4βˆ’1 +1=7

842=8

π‘₯𝑗 =(7 + 6)

2= 6.5

Page 21: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-3

(a) Compute the reduced representation of S using PAA (box size M=4)

PAA(S) = (2.5, 2.5, 3.5, 6.5)

21

8

7

6

5

4

3

2

1

0

-1

-2

4

1 2 3 4 5 6 7 8

6

1

2

3

6

1

7

The red linelooks somehowfamiliar…

Page 22: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-3

(b) Convince yourself that PAA and DWT (using Haar Wavelets as basisfunction!) are equivalent!

22

8

7

6

5

4

3

2

1

0

-1

-2

4

1 2 3 4 5 6 7 8

8

7

6

5

4

3

2

1

0

-1

-21 2 3 4 5 6 7 8

PAA DWT

Page 23: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-3

(b) Convince yourself that PAA and DWT (using Haar Wavelets as basisfunction!) are equivalent!

23

Given the case that the number of coefficients of the DWT is a power of twoit is always possible to convert between the representations (PAA and Haar)

Source: Keogh E. et. Al. - Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases; KAIS Long paper (2000)

Page 24: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-3

(b) Convince yourself that PAA and DWT (using Haar Wavelets as basisfunction!) are equivalent!

24

Mean Coefficients

(4,1,2,3,6,1,7,6) (-)

(2.5, 2.5, 3.5, 6.5) (1.5, -0.5, 2.5, 0.5)

(2.5, 5) (0, -1.5)

(3.75) (-1.25)

Page 25: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-4

Given a data stream of size 𝑁. Randomly select π‘˜ ≀ 𝑁 elements from thestream. Here π‘˜ represents the size of the reservoir.

(a) Setting π‘˜ = 1,𝑁 = 2. The first element is in the reservoir, the second isnot. What is the probability of both elements to be in the reservoir?

25

A, B

π‘˜ = 1 𝑁 = 2Keep firstitem in memory

BA

𝑝𝑛𝑒𝑀 =1

𝑖=1

2

To keep the newitem and removethe old one

π‘π‘œπ‘™π‘‘ = 1 βˆ’1

𝑖=1

2To keep the olditem A and ignorethe new one

Page 26: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-4

Given a data stream of size 𝑁. Randomly select π‘˜ ≀ 𝑁 elements from thestream. Here π‘˜ represents the size of the reservoir.

(b) Setting π‘˜ = 1,𝑁 = 3. What is now the probability for each of theelements to be in the reservoir?

26

A, B, C

π‘˜ = 1 𝑁 = 3Keep firstitem in memory

B, CA

π‘π‘œπ‘™π‘‘ =1

2(1 βˆ’

1

3) =1

2βˆ™2

3=1

3

𝑝𝑛𝑒𝑀 =1

𝑖=1

3

Page 27: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-4

Given a data stream of size 𝑁. Randomly select π‘˜ ≀ 𝑁 elements from thestream. Here π‘˜ represents the size of the reservoir.

(c) Setting π‘˜ = 1. What is the probability for any given 𝑁?

27

A, B, C, …

π‘˜ = 1 𝑁 = 𝑛Keep firstitem in memory

B, C, …A

π‘π‘œπ‘™π‘‘ =1

2βˆ™2

3βˆ™3

4… βˆ™π‘–βˆ’2

π‘–βˆ’1βˆ™π‘–βˆ’1

𝑖=1

𝑖

Page 28: Big Data Management and Analytics Assignment 7

DATABASESYSTEMSGROUP

Assignment 7-4

Given a data stream of size 𝑁. Randomly select π‘˜ ≀ 𝑁 elements from thestream. Here π‘˜ represents the size of the reservoir.

(d) What is the probability for an arbitrary reservoir size π‘˜ and an abitrarystream size 𝑁?

28

A, B, C, …

π‘˜ = ? 𝑁 = 𝑛Keep firstk items in memory

…

π‘π‘œπ‘™π‘‘ =π‘˜

π‘˜+1βˆ™π‘˜+1

π‘˜+2βˆ™π‘˜+2

π‘˜+3… βˆ™π‘˜+π‘–βˆ’π‘˜βˆ’1

𝑖=π‘˜

𝑖

A, … 𝑝𝑛𝑒𝑀 =π‘˜

𝑖


Top Related