cs435 introduction to big data - colorado state universitycs435/slides/week7-a-6.pdf · cs435...

9
CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi Lee Pallickara 1 10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.0 CS435 Introduction to Big Data PART 1. LARGE SCALE DATA ANALYTICS WEB-SCALE LINK ANALYSIS Sangmi Lee Pallickara Computer Science, Colorado State University http://www.cs.colostate.edu/~cs435 10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.1 FAQs Matrix vs. Vector multiplication Please see the updated slides Marked with: Version 1: fundamental Matrix – vector multiplication Version 2: handling large vector Version 3: handling large vector v and column size in M Quiz 4 Sorting large amount of numbers 10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.2 PageRank Map function The Map function is written to apply to one element of M Each Map task will operate on a chunk of the matrix M From each matrix element mij, it produces the key-value pair (i, mijvj ) All terms of the sum that make up the component xi of the matrix-vector product will get the same key, i Reduce function Sums all the values associated with a given key i The result will be a pair (i, xi) VERSION 1: FUNDAMENTA MATRIX vs. VECTOR MULTIPLICATION 10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.3 Matrix M Initial Vector v Division of a matrix and vector into five stripes The ith stripe of the matrix multiplies only components from the ith stripe of the initial vector 0.002 0.017 0.003 0.010 0.000 0.000 0.001 0.000 0.012 0.000 0.001 0.000 0.001 0.001 0.120 0.000 .. .. 1/n 1/n 1/n 1/n Results: 0.002 x 1/n +0.017 x 1/n + 0.003 x 1/n +0.010 x 1/n… = (M00 x v0) + (M01 x v1) + (M020 x v2)… VERSION 2: WITH VERY LARGE v 10/7/2019 CS435 Introduction to Big Data – Fall 2019 Matrix M Initial Vector v Division of a matrix and vector into five stripes The ith stripe of the matrix multiplies only components from the ith stripe of the initial vector Mapper 1 Mapper 2 Mapper 3 Mapper 4 Mapper 5 n splits of v (n x l) blocks of M l blocks k stripes k splits Reducer: Add all of the local sums of the row k, and store it in the kth element of v Page 0: 1/n Page 1: 1/n Page 2: 1/n Page 3: 1/n Page: 0 1 2 3 ====================== 0.002 0.017 0.003 0.010 0.000 0.000 0.003 0.000 0.002 0.000 0.003 0.000 0.002 0.017 0.000 0.00 .. .. VERSION 3: WITH VERY LARGE v and COLUME in M 10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.5 1. Analysis phase (first MapReduce) Major functionality of mapper: performing a simple random sampling Input: each record Output: if the record is selected: <amountOfTransaction, null> If the record is not selected: no output Major functionality of reducer: there will be a single reducer No particular functionality. Return the keys only Input <amountOfTransaction, null> Output <amountOfTransaction, null>

Upload: others

Post on 30-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

1

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.0

CS435 Introduction to Big Data

PART 1. LARGE SCALE DATA ANALYTICSWEB-SCALE LINK ANALYSISSangmi Lee Pallickara

Computer Science, Colorado State Universityhttp://www.cs.colostate.edu/~cs435

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.1

FAQs• Matrix vs. Vector multiplication• Please see the updated slides• Marked with:• Version 1: fundamental Matrix – vector multiplication• Version 2: handling large vector• Version 3: handling large vector v and column size in M

• Quiz 4• Sorting large amount of numbers

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.2

PageRankMap function

• The Map function is written to apply to one element of M• Each Map task will operate on a chunk of the matrix M

• From each matrix element mij, it produces the key-value pair (i, mijvj )• All terms of the sum that make up the component xi of the matrix-vector

product will get the same key, i

Reduce function

• Sums all the values associated with a given key i

• The result will be a pair (i, xi)

VERSION 1: FUNDAMENTA MATRIX vs. VECTOR MULTIPLICATION 10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.3

Matrix M Initial Vector v

Division of a matrix and vector into five stripes

The ith stripe of the matrix multiplies only components from the ith stripe of the initial vector

0.002 0.017 0.003 0.010

0.000 0.000 0.001 0.0000.012 0.000 0.001 0.0000.001 0.001 0.120 0.000

.. .. … …

1/n

1/n1/n1/n

Results:

0.002 x 1/n +0.017 x 1/n + 0.003 x 1/n +0.010 x 1/n… = (M00 x v0) + (M01 x v1) + (M020 x v2) …

VERSION 2: WITH VERY LARGE v

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.4

Matrix M

Initial Vector v

Division of a matrix and vector into five stripes

The ith stripe of the matrix multiplies only components from the ith stripe of the initial vector

Mapper 1

Mapper 2

Mapper 3

Mapper 4

Mapper 5

n splits of v

(n x l) blocks of M

l blocks

k stripes

k splits

Reducer:Add all of the local sumsof the row k, and store itin the kth element of v

Page 0: 1/n

Page 1: 1/nPage 2: 1/nPage 3: 1/n

Page: 0 1 2 3

======================0.002 0.017 0.003 0.0100.000 0.000 0.003 0.000

0.002 0.000 0.003 0.0000.002 0.017 0.000 0.00

.. .. … …

VERSION 3: WITH VERY LARGE v and COLUME in M 10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.5

1. Analysis phase (first MapReduce)

Major functionality of mapper: performing a simple random sampling

Input: each record

Output: if the record is selected: <amountOfTransaction, null>

If the record is not selected: no output

Major functionality of reducer: there will be a single reducer

No particular functionality. Return the keys only

Input <amountOfTransaction, null>

Output <amountOfTransaction, null>

Page 2: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

2

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.6

1.5 Separate task (non—parallel)

• Generate .seq file to specify ranges for partitioning using output from the “Analysis phase” job.

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.7

2. Sorting phase (second MapReduce)

Uses TotalOrderPartitioner (or write your own customized partitioner) (Set

this in driver)

- This will be an identity Mapper

Input: each record, Output: <amountOfTransaction, same as the input>

Major functionality of reducer: writing the sorted values

No particular functionality.

Input <amountOfTransaction, [a list of records]>

Output <amountOfTransaction, [a list of records]>

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.8

Topics

• Large-scale Analytics 1. Web-Scale Link and Social Network Analysis

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.9

Part 1. Large Scale Data Analytics1. Web-Scale Link Analysis and Social Network Analysis

Challenges in PageRank Algorithm for the real Web: Examples

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.10

Example 1 • Compute the PageRank of each page assuming β=0.8

1/3 1/2 1/2M=1/3 0 1/2

1/3 1/2 0

1/3v0= 1/3

1/3

β M v0 +(1-β)e/n⅓ ½ ½ ⅓

= 0.8 × ⅓ 0 ½ × ⅓ + (1-0.8)e/3 ⅓ ½ 0 ⅓

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.11

Example 2• clique• Set of nodes with all possible arcs from one to another

• Suppose the Web consists of a clique of n nodes and a single additional node that is the successor of each of the n nodes in the clique

A B

C D

E

Page 3: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

3

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.12

Example 2

• Determine the PageRank of each page assuming β=0.8• Is there a dead end?

• Is there a spider trap?

A B

C D

E

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.13

Example 2

• Determine the PageRank of each page assuming β=0.8• Is there a dead end? • Yes. E

• Is there a spider trap? • No. Having clique does not mean that there is a spider trap

A B

C D

E

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.14

Example 2A B

C D

E

0 ¼ ¼ ¼ 0M= ¼ 0 ¼ ¼ 0

¼ ¼ 0 ¼ 0¼ ¼ ¼ 0 0¼ ¼ ¼ ¼ 0⅕

v0= ⅕⅕⅕⅕

β M v0 +(1-β)e/n0 ¼ ¼ ¼ 0 ⅕ 1¼ 0 ¼ ¼ 0 ⅕ 1

= 0.8 × ¼ ¼ 0 ¼ 0 ⅕ +(1-0.8) 1 /5¼ ¼ ¼ 0 0 ⅕ 1¼ ¼ ¼ ¼ 0 ⅕ 1

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.15

Example 3

• Suppose that we recursively eliminate dead ends from the Web graph to solve the remaining graph• Suppose that the graph is a chain of dead ends, headed by a node with a

self-loop• What would be the PageRank assigned to each of the nodes?

A B C D Z

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.16

Example 3• Remove all of the dead ends recursively

A B C D Z

A B C D

A B C D

A B C

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.17

Example 3• Remove all of the dead ends recursively

• What is v0 and M?

A B

A

Page 4: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

4

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.18

Example 3• What is v0 and M?

• v0 =[1], M = [1], PageRank of A = 1• PageRank of B = ½×1• PageRank of C = PageRank of B = ½• PageRank of D = PageRank of C = ½• …

A

A B C D Z

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.19

Part 1. Large Scale Data Analytics1. Web-Scale Link Analysis and Social Network Analysis

Using PageRank in a search engine

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.20

Searching pages

• Each search engine has a secret formula that decides the order in which to show pages to the user in response to a search query consisting of one or more search terms

• Google uses more than 250 different properties of pages

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.21

Generating the final lists

• Selecting candidate pages• A page has to have at least one of the search terms in the query• Applying weight• Presence or absence of search terms in prominent places

• e.g. headers or the links to the page itself

• Among the qualified pages, a score is computed for each• PageRank score

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.22

Part 1. Large Scale Data Analytics1. Web-Scale Link Analysis and Social Network Analysis

Efficient Computation of PageRank

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.23

Problems in performing PageRank

• To compute the PageRank for a Web graph• We should perform a matrix-vector multiplication of the order of 50 times• Until the vector is close to unchanged at one iteration

• The transition matrix of the Web M is very sparse• Representing it by all its elements is highly inefficient• We want to represent the matrix by only its nonzero elements

• We want to reduce the amount of data that must be passed from the Map tasks to Reduce tasks

Page 5: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

5

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.24

Representing Transition Matrices (1/2)

• The average Web page has about 10 out-links• We are analyzing a graph of 1.4 billion pages• Only one in 0.14 billion (140 million) entries is not 0

• Can we list the location of the nonzero entries and their values?

• If we use two 4-byte integers for coordinates (row#, col#) of an element and an 8-byte double-precision number for the probability value • 16-bytes per nonzero entry• The space needed is linear of nonzero entries

VERSION 4: SPARSE MATRICES10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.25

Representing Transition Matrices (2/2)

• For the Web graph• The value will be 1 divided by the out-degree of the page

0 1/2 0 01/3 0 0 1/2

M= 1/3 0 1 1/21/3 1/2 0 0

Source (PR) Degree Destinations

A (l) 3 B, C, D

B (m) 2 A, D

C (n) 1 C

D (o) 2 B, C

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.26

Mapper generates (key, value) = (destinations, current PR/degree)e.g. for the source A, (B, l/3), (C, l/3), (D, l/3)For the source B, (A, m/2), (D, m/2)

For the source C, (C, n)For the source D, (B, o/2), (C, o/2)

Reducer calculatesAdd values per nodee.g. For B: (B, l/3), (B, o/2)For C: (C, l/3), (C, n), (C, o/2)vcurrent= < m/2 , (l/3 +o/2), (l/3+n+o/2), (l/3+m/2)>

Source Degree Destinations

A (l) 3 B, C, D

B (m) 2 A, D

C (n) 1 C

D (o) 2 B, C

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.27

PageRank Iteration Using MapReduce

• One iteration of the PageRank algorithm involves,

• First round of MapReduce• Calculate Mv and store the result to v’

• Second round of MapReduce• For each component, multiply β and add (1-β)/n

v ' = βMv+ (1−β)e / n

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.28

PageRank Iteration Using MapReduce

• If n is small enough that each Map task can store the full vector v in main memory• And v’

• For the Web, v is much too large to fit in main memory• We need striping

• M into vertical stripes and break v into corresponding horizontal strips

v ' = βMv+ (1−β)e / n

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.29

Part 1. Large Scale Data Analytics1. Web-Scale Link Analysis and Social Network Analysis

Link spam

Page 6: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

6

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.30

Architecture of a Spam Farm

• Spam Farm• A collection of pages whose purpose is to increase the PageRank of a certain page

or pages

• From the point of view of the spammer, the Web is divided into two parts• Inaccessible pages

• The pages that the spammer cannot affect• Most of the Web

• Accessible pages• Those pages that, while they are not controlled by the spammer, can be affected by the

spammer

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.31

The Web from the point of view of the link spammer

InaccessiblePages

AccessiblePages

OwnPages

TargetPage

SupportingPages

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.32

Understanding Spam Farm (1/2)

• Setting the links to the target page• Without link from outside, the spam farm is not useful• e.g. Blogs or news papers

• Comments like “I agree. Please see my article at www.mySpamFarm.com”

A

B

C

D0.70.35

0.35

0.7

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.33

Understanding Spam Farm (2/2)

• There is one page t, the target page• Spammer attempts to place as much PageRank as possible

• There are a large number of m supporting pages• Accumulate the portion of the PageRank that is distributed equally to all pages• The fraction 1-β of the PageRank that represents surfers going to a random page• Prevent the PageRank of t from being lost

• Note that all of the supporting pages links only to t

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.34

Analysis of a Spam Farm (1/6)

• A taxation parameter β• The fraction of a page’s PageRank that

gets distributed to its successors at the next round

• Let there be,• n pages on the Web in total• A target page t• m supporting pages

Accessible

Pages

Own

Pages

TargetPage

SupportingPages

m

t

n pages on the web

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.35

Analysis of a Spam Farm (2/6)

• Let x be the amount of PageRank contributed by the accessible pages• x is the sum over all accessible

page p with a link to t, of the PageRank of p times β divided by the number of successors of p

• Finally, let y be the unknown PageRank of t

Accessible

Pages

Own

Pages

TargetPage

SupportingPagesm

tx

Page 7: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

7

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.36

Analysis of a Spam Farm (3/6)

• The PageRank of each supporting page• βy/m+(1-β)/n

• First term represents the contribution from t• βy is distributed to t’s successors

• Second term is the supporting page’s share of the fraction 1-β of the PageRank that is divided equally among all pages on the Web

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.37

Analysis of a Spam Farm (4/6)

• PageRank of y of target page t is (1)+(2)+(3)

1. Contribution x from outside

• x is the sum over all accessible page p with a link to t, of the PageRank of p times βdivided by the number of successors of p

2. β times the PageRank of every supporting page

βm(βy/m+(1-β)/n)

3. (1-β)/n, the share of the fraction 1-β of the PageRank that belongs to tThis amount is negligible

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.38

Analysis of a Spam Farm (5/6)

• From (1) and (2),

y = x +βm(βym+1−βn) = x +β 2y+β(1−β)m

n

y = x1−β 2

+ c mn

Where

c = β(1−β) / (1−β 2 ) = β / (1+β)

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.39

Analysis of a Spam Farm (6/6)

• If we choose β=0.85, then 1/(1-β2)=3.6• c= β/(1+β)=0.46

• The structure has amplified the external PageRank contribution by 360%

• Also, it obtained an amount of PageRank that is 46% of the fraction of the Web, m/n, that is in the spam farm

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.40

Part 1. Large Scale Data Analytics1. Web-Scale Link Analysis and Social Network Analysis

Combatting Link Spam

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.41

Combatting Link Spam

• Detecting and eliminating link spam have been critical for search engines• Just as it was critical to eliminate term spam in the previous decade

• Detecting particular structures• Spam farm

• One page links to a very large number of pages• Each of which links back to it

Page 8: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

8

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.42

Combatting Link Spam

• Modifying PageRank to lower the rank of link-spam pages automatically• TrustRank• Spam mass

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.43

TrustRank

• TrustRank is a topic-sensitive PageRank• “topic” is a set of pages believed to be trustworthy (not spam)

• Develop a suitable teleport set of trustworthy pages• Let humans examine a set of pages and decide which of them are trustworthy

• Pick a domain whose membership is controlled • University pages• .mil, or .gov

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.44

Calculating TrustRank (1/2)

• Then the topic-sensitive PageRank for S is the limit of the iteration,

• M is the transition matrix of the Web, and |S| is the size of set S

v ' = βMv+ (1−β)eS / | S |

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.45

Calculating TrustRank (2/2)

• Suppose we use β=0.8, and our trust rank is represented by the teleport set (trustworthy pages) S={B,D}

0 2/5 5/4 0βM= 4/15 0 0 2/5

4/15 0 0 2/54/15 2/5 0 0

0 2/5 5/4 0 0v’ = 4/15 0 0 2/5 v + 1/10

4/15 0 0 2/5 04/15 2/5 0 0 1/10

0/2 2/10 42/150 62/250 54/2101/2 3/10 41/150 71/250 59/2100/2 2/10 26/150 46/250 … 38/2101/2 3/10 41/150 71/250 59/210

B and D get a higher PageRank than before

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.46

Spam Mass• Measures the fraction of its PageRank that comes from spam for each

page

• For an arbitrary page p,• PageRank r

• Computing ordinary PageRank

• TrustRank t• Computing the TrustRank based on some teleport set of trustworthy pages

• The spam mass • (r - t)/r

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.47

• A negative or small positive spam mass• p is probably not a spam page

• Page with high spam mass score• Should be eliminated

Page 9: CS435 Introduction to Big Data - Colorado State Universitycs435/slides/week7-A-6.pdf · CS435 Introduction to Big Data Fall 2019 Colorado State University 10/7/2019 Week 7-A Sangmi

CS435 Introduction to Big DataFall 2019 Colorado State University

10/7/2019 Week 7-ASangmi Lee Pallickara

9

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.48

Example

• Suppose that both the PageRank and TrustRank were computed• Teleport set was page B and D• Which nodes are not the link spams?• Is there any link spam?

Web Page PageRank TrustRank SpamMass

A 3/9 54/210 0.229

B 2/9 59/210 -0.264

C 2/9 38/210 0.186

D 2/9 59/210 -0.264

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.49

Example

• Suppose that both the PageRank and TrustRank were computed• Teleport set was page B and D• Which nodes are not the link spams?

• B and D• C has lower chance to be the link spam compared to A

Web Page PageRank TrustRank SpamMass

A 3/9 54/210 0.229

B 2/9 59/210 -0.264

C 2/9 38/210 0.186

D 2/9 59/210 -0.264

10/7/2019 CS435 Introduction to Big Data – Fall 2019 W7.A.50

Questions?