big data management and analytics assignment 8

40
DATABASE SYSTEMS GROUP Big Data Management and Analytics Assignment 8 1

Upload: others

Post on 10-May-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Big Data Management and AnalyticsAssignment 8

1

Page 2: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

Finding similar items

Suppose that the universal set is given by {1,…,10}. Construct minhashsignatures for the following sets:

(a) 𝑆1 = {3,6,9}

(b) 𝑆2 = {2,4,6,8}

(c) 𝑆3 = {2,3,4}

1. Construct the signatures for the sets using the following list ofpermutations:

β€’ (1,2,3,4,5,6,7,8,9,10)

β€’ (10,8,6,4,2,9,7,5,3,1)

β€’ (4,7,2,9,1,5,3,10,6,8)

2

Page 3: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

i. Create first a characteristic matrix foreach permutation

(a) Set one column with the shingles

(b) Set columns for each document

(c) Fill the document columns by setting a 1 where there is an occurence for each shingle, and 0 else

3

Element 𝑆1 𝑆2 𝑆3

1 0 0 0

2 0 1 1

3 1 0 1

4 0 1 1

5 0 0 0

6 1 1 0

7 0 0 0

8 0 1 0

9 1 0 0

10 0 0 0

Page 4: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

i. Create first a characteristic matrix for each permutation

4

Element 𝑆1 𝑆2 𝑆3

1 0 0 0

2 0 1 1

3 1 0 1

4 0 1 1

5 0 0 0

6 1 1 0

7 0 0 0

8 0 1 0

9 1 0 0

10 0 0 0

Element 𝑆1 𝑆2 𝑆3

10 0 0 0

8 0 1 0

6 1 1 0

4 0 1 1

2 0 1 1

9 1 0 0

7 0 0 0

5 0 0 0

3 1 0 1

1 0 0 0

Element 𝑆1 𝑆2 𝑆3

4 0 1 1

7 0 0 0

2 0 1 1

9 1 0 0

1 0 0 0

5 0 1 0

3 1 0 1

10 0 1 0

6 1 1 0

8 0 1 0

1st permutation 2nd permutation 3rd permutation

Page 5: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

ii. Compute the minhash for each permutation

5

Element 𝑆1 𝑆2 𝑆3

1 0 0 0

2 0 1 1

3 1 0 1

4 0 1 1

5 0 0 0

6 1 1 0

7 0 0 0

8 0 1 0

9 1 0 0

10 0 0 0

1st permutation

Select the first occurence of 1 per setand get the elements at which they canbe found

Which leads to the following minhash:β„Ž 𝑆1 = 3β„Ž 𝑆2 = 2β„Ž 𝑆3 = 2

Page 6: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

ii. Compute the minhash for each permutation

6

The same procedure for the 2nd permutation…

β„Ž 𝑆1 = 6β„Ž 𝑆2 = 8β„Ž 𝑆3 = 4

Element 𝑆1 𝑆2 𝑆3

10 0 0 0

8 0 1 0

6 1 1 0

4 0 1 1

2 0 1 1

9 1 0 0

7 0 0 0

5 0 0 0

3 1 0 1

1 0 0 0

2nd permutation

Page 7: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

ii. Compute the minhash for each permutation

7

…and for the 3rd permutation

Which leads to the following minhash:β„Ž 𝑆1 = 9β„Ž 𝑆2 = 4β„Ž 𝑆3 = 4

Element 𝑆1 𝑆2 𝑆3

4 0 1 1

7 0 0 0

2 0 1 1

9 1 0 0

1 0 0 0

5 0 1 0

3 1 0 1

10 0 1 0

6 1 1 0

8 0 1 0

3rd permutation

This yields the following signatures:𝑆𝐼𝐺 𝑆1 = {3,6,9}𝑆𝐼𝐺 𝑆2 = {2,8,4}𝑆𝐼𝐺 𝑆3 = {2,4,4}

Page 8: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

2. Instead of using the previously given permutations use hash functions:

β„Ž1 π‘₯ = π‘₯ π‘šπ‘œπ‘‘ 10β„Ž2 π‘₯ = (2π‘₯ + 1) π‘šπ‘œπ‘‘ 10β„Ž3 π‘₯ = (3π‘₯ + 2) π‘šπ‘œπ‘‘ 10

8

Page 9: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

2. Instead of using the previously given permutations use hash functions:

i. Set up a table:

9

Element π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0

2 0 1 1

3 1 0 1

4 0 1 1

5 0 0 0

6 1 1 0

7 0 0 0

8 0 1 0

9 1 0 0

10 0 0 0

Page 10: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

ii. Compute the hash values (except for zero-rows):

10

Element π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0 - - -

2 0 1 1 2 5 8

3 1 0 1 3 7 1

4 0 1 1 4 9 4

5 0 0 0 - - -

6 1 1 0 6 3 0

7 0 0 0 - - -

8 0 1 0 8 7 6

9 1 0 0 9 9 9

10 0 0 0 - - -

Page 11: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Create a table for all hash functions and sets and initialize them withinfinite distance:

11

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π’‰πŸ ∞ ∞ ∞

π’‰πŸ ∞ ∞ ∞

π’‰πŸ‘ ∞ ∞ ∞

e π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0 - - -

2 0 1 1 2 5 8

3 1 0 1 3 7 1

4 0 1 1 4 9 4

5 0 0 0 - - -

6 1 1 0 6 3 0

7 0 0 0 - - -

8 0 1 0 8 7 6

9 1 0 0 9 9 9

10 0 0 0 - - -

Page 12: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Create a table for all hash functions and sets and initialize them withinfinite distance:

12

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π’‰πŸ ∞ 2 2

π’‰πŸ ∞ 5 5

π’‰πŸ‘ ∞ 8 8

e π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0 - - -

2 0 1 1 2 5 8

3 1 0 1 3 7 1

4 0 1 1 4 9 4

5 0 0 0 - - -

6 1 1 0 6 3 0

7 0 0 0 - - -

8 0 1 0 8 7 6

9 1 0 0 9 9 9

10 0 0 0 - - -

Update of 2nd row:

Update only 𝑆2, 𝑆3!

𝑆2:min ∞, 2 = 2min ∞, 5 = 5min ∞, 8 = 8

𝑆3:min ∞, 2 = 2min ∞, 5 = 5min ∞, 8 = 8

Page 13: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Create a table for all hash functions and sets and initialize them withinfinite distance:

13

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π’‰πŸ 3 2 2

π’‰πŸ 7 5 5

π’‰πŸ‘ 1 8 1

e π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0 - - -

2 0 1 1 2 5 8

3 1 0 1 3 7 1

4 0 1 1 4 9 4

5 0 0 0 - - -

6 1 1 0 6 3 0

7 0 0 0 - - -

8 0 1 0 8 7 6

9 1 0 0 9 9 9

Update of 3rd row:

Update only 𝑆1, 𝑆3!

𝑆1:min ∞, 3 = 3min ∞, 7 = 7min ∞, 1 = 1

𝑆3:min 2,3 = 2min 5,7 = 5min 8,1 = 1

Page 14: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Create a table for all hash functions and sets and initialize them withinfinite distance:

14

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π’‰πŸ 3 2 2

π’‰πŸ 7 5 5

π’‰πŸ‘ 1 4 1

e π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0 - - -

2 0 1 1 2 5 8

3 1 0 1 3 7 1

4 0 1 1 4 9 4

5 0 0 0 - - -

6 1 1 0 6 3 0

7 0 0 0 - - -

8 0 1 0 8 7 6

9 1 0 0 9 9 9

10 0 0 0 - - -

Update of 4th row:

Update only 𝑆2, 𝑆3!

𝑆2:min 2,4 = 2min 5,9 = 5min 8,4 = 4

𝑆3:min 2,4 = 2min 5,9 = 5min 1,4 = 1

Page 15: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Create a table for all hash functions and sets and initialize them withinfinite distance:

15

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π’‰πŸ 3 2 2

π’‰πŸ 3 3 5

π’‰πŸ‘ 0 0 1

e π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0 - - -

2 0 1 1 2 5 8

3 1 0 1 3 7 1

4 0 1 1 4 9 4

5 0 0 0 - - -

6 1 1 0 6 3 0

7 0 0 0 - - -

8 0 1 0 8 7 6

9 1 0 0 9 9 9

10 0 0 0 - - -

Update of 6th row:

Update only 𝑆1, 𝑆2!

𝑆1:min 3,6 = 3min 7,3 = 3min 1,0 = 0

𝑆2:min 2,6 = 2min 5,3 = 3min 4,0 = 0

Page 16: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Create a table for all hash functions and sets and initialize them withinfinite distance:

16

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π’‰πŸ 3 2 2

π’‰πŸ 3 3 5

π’‰πŸ‘ 0 0 1

e π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0 - - -

2 0 1 1 2 5 8

3 1 0 1 3 7 1

4 0 1 1 4 9 4

5 0 0 0 - - -

6 1 1 0 6 3 0

7 0 0 0 - - -

8 0 1 0 8 7 6

9 1 0 0 9 9 9

10 0 0 0 - - -

Update of 8th row:

Update only 𝑆2!

𝑆2:min 2,8 = 2min 3,7 = 3min 0,6 = 0

Page 17: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Create a table for all hash functions and sets and initialize them withinfinite distance:

17

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π’‰πŸ 3 2 2

π’‰πŸ 3 3 5

π’‰πŸ‘ 0 0 1

e π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘ π’‰πŸ(𝒙) π’‰πŸ(𝒙) π’‰πŸ‘(𝒙)

1 0 0 0 - - -

2 0 1 1 2 5 8

3 1 0 1 3 7 1

4 0 1 1 4 9 4

5 0 0 0 - - -

6 1 1 0 6 3 0

7 0 0 0 - - -

8 0 1 0 8 7 6

9 1 0 0 9 9 9

10 0 0 0 - - -

Update of 8th row:

Update only 𝑆1!

𝑆1:min 3,9 = 3min 3,9 = 3min 0,9 = 0

Page 18: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Create a table for all hash functions and sets and initialize them withinfinite distance:

This yields the following signatures:𝑆𝐼𝐺 𝑆1 = 3,3,0𝑆𝐼𝐺 𝑆2 = 2,3,0𝑆𝐼𝐺 𝑆3 = 2,5,1

18

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π’‰πŸ 3 2 2

π’‰πŸ 3 3 5

π’‰πŸ‘ 0 0 1

Page 19: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

3. How does the estimated Jaccard similarity from (1.) and (2.) comparewith the true Jaccard similarity of the original data? How to reducedeviations in the approximated Jaccard similarities?

19

RECAP: Jaccard similarity:

π‘‘π½π‘Žπ‘π‘π‘Žπ‘Ÿπ‘‘ 𝐷1, 𝐷2 = 𝐷1∩𝐷2𝐷1βˆͺ𝐷2

Page 20: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

i. Actual Jaccard similarity:

20

RECAP: Jaccard similarity:

π‘‘π½π‘Žπ‘π‘π‘Žπ‘Ÿπ‘‘ 𝐷1, 𝐷2 = 𝐷1∩𝐷2𝐷1βˆͺ𝐷2

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π‘ΊπŸ - 1 6 1 5

π‘ΊπŸ - - 2 5

π‘ΊπŸ‘ - - -

(a) 𝑆1 = {3,6,9}(b) 𝑆2 = {2,4,6,8}(c) 𝑆3 = {2,3,4}

Page 21: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

ii. Permutation estimated Jaccard similarity:

21

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π‘ΊπŸ - 0 6 0 5

π‘ΊπŸ - - 2 3

π‘ΊπŸ‘ - - -

(a) 𝑆1 = {3,6,9}(b) 𝑆2 = {2,8,4}(c) 𝑆3 = {2,4,4}

Page 22: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iii. Hash estimated Jaccard similarity:

22

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π‘ΊπŸ - 2 3 0 5

π‘ΊπŸ - - 1 5

π‘ΊπŸ‘ - - -

(a) 𝑆1 = {3,3,0}(b) 𝑆2 = {2,3,0}(c) 𝑆3 = {2,5,1}

Page 23: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-1

iv. Comparison:

23

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π‘ΊπŸ - 2 3 0 3

π‘ΊπŸ - - 1 3

π‘ΊπŸ‘ - - -

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π‘ΊπŸ - 0 3 0 3

π‘ΊπŸ - - 2 3

π‘ΊπŸ‘ - - -

π‘ΊπŸ π‘ΊπŸ π‘ΊπŸ‘

π‘ΊπŸ - 1 6 1 5

π‘ΊπŸ - - 2 5

π‘ΊπŸ‘ - - -

Actual J. similarity Perm. est. J similarity Hash. est. J similarity

Estimation of the actual J. similarity is rather poor, why?

Too small minhash vectors. Get more permutations of the universal set ormore hash functions to extend the minhash vectors!

Page 24: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

24

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Wait for the first 6 points to arrive

Page 25: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

25

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Wait for the first 6 points to arrive

Page 26: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

26

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Wait for the first 6 points to arrive

Page 27: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

27

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Wait for the first 6 points to arrive

Page 28: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

28

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Wait for the first 6 points to arrive

Page 29: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

29

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Wait for the first 6 points to arrive

Page 30: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

30

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Apply standard k-Means to create q clusters.

Page 31: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

31

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.

Page 32: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

32

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.

𝐢𝐹𝑇1 = (𝐢𝐹2𝑋, 𝐢𝐹1𝑋, 𝐢𝐹2𝑑, 𝐢𝐹1𝑑, 𝑛)

12 + 22

12 + 12=5

2

1 + 2

1 + 1=3

2

12 + 22 = 5

1 + 2 = 3

2

Page 33: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

33

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.

𝐢𝐹𝑇1 = (5

2,3

2, 5,3,2)

cπ‘’π‘›π‘‘π‘’π‘Ÿ = 322 =

1.51

Page 34: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

34

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

𝐢𝐹𝑇1 = (5

2,3

2, 5,3,2)

π‘Ÿπ‘Žπ‘‘π‘–π‘’π‘  = 𝐢𝐹2𝑋𝑛 βˆ’ 𝐢𝐹1𝑋

𝑛2

= 522 βˆ’

322

2

=2.51

βˆ’1.51

2

=2.51

βˆ’2.251

= 2.5 βˆ’ 2.25 + (1 βˆ’ 1) = 0.5

Page 35: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

35

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.

π’Šπ’… π‘ͺπ‘­πŸπ‘Ώ π‘ͺπ‘­πŸπ‘Ώ π‘ͺπ‘­πŸπ’• π‘ͺπ‘­πŸπ’• 𝒏 𝒄𝒆𝒏 𝒓𝒂𝒅

1 5

2

3

2

5 3 2 1.5

1

0.5

Page 36: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

36

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.

π’Šπ’… π‘ͺπ‘­πŸπ‘Ώ π‘ͺπ‘­πŸπ‘Ώ π‘ͺπ‘­πŸπ’• π‘ͺπ‘­πŸπ’• 𝒏 𝒄𝒆𝒏 𝒓𝒂𝒅

1 5

2

3

2

5 3 2 1.5

1

0.5

2 32

145

8

17

25 7 2 4

8.5

0.5

Page 37: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

37

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

For each discovered cluster, assign a uniqueID and create ist micro-cluster summary.

π’Šπ’… π‘ͺπ‘­πŸπ‘Ώ π‘ͺπ‘­πŸπ‘Ώ π‘ͺπ‘­πŸπ’• π‘ͺπ‘­πŸπ’• 𝒏 𝒄𝒆𝒏 𝒓𝒂𝒅

1 5

2

3

2

5 3 2 1.5

1

0.5

2 32

145

8

17

25 7 2 4

8.5

0.5

3 181

25

19

7

61 11 2 9.5

3.5

0.7 := centroids

Page 38: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

38

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Compute distance between point p and eachof the q maintained micro-cluster centroids.

𝑝𝑑𝐿2(𝐢1, 𝑝) = 2.06

𝐢1

𝐢2

𝐢3𝑑𝐿2(𝐢2, 𝑝) = 5.85

𝑑𝐿2(𝐢3, 𝑝) = 7.51

Page 39: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

39

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Select a cluster clu as the closestmicro-cluster to p.

𝑝𝑑𝐿2(𝐢1, 𝑝) = 2.06

𝐢1

𝐢2

𝐢3𝑑𝐿2(𝐢2, 𝑝) = 5.85

𝑑𝐿2(𝐢3, 𝑝) = 7.51

Page 40: Big Data Management and Analytics Assignment 8

DATABASESYSTEMSGROUP

Assignment 8-2 - CluStream

40

β€’ initPoints = 6

β€’ q = 3

β€’ Factor of clu radius t = 5

t 1 2 3 4 5 6 7 8 9 10 11 12

p 1

1

2

1

4

9

4

8

10

4

9

3

2

3

11

3

12

12

12

11

11

12

4

2

0 1 2 3 4 5 6 7 8 9101112

123456789

101112

Find the maximum boundary of clu:= factor of t of clu radius= 5 βˆ™ 0.5 = 2.5

𝑝

𝑑𝐿2(𝐢1, 𝑝) = 2.06

𝐢1

𝐢2

𝐢3is p within the maximum cluster boundary of 𝐢1?

(π‘₯ βˆ’ π‘₯𝑐1π‘π‘’π‘›π‘‘π‘Ÿπ‘œπ‘–π‘‘)2+(𝑦 βˆ’ 𝑦𝑐1π‘π‘’π‘›π‘‘π‘Ÿπ‘œπ‘–π‘‘)

2≀ (𝑑 βˆ™ π‘Ÿπ‘Žπ‘‘πΆ1)2?

(2 βˆ’ 1.5)2+(3 βˆ’ 1)2≀ (5 βˆ™ 0.5)2?