Upload: amber-baldwin

Post on 13-Dec-2015

MapReduce Programming Model

• Data type: key-value records

• Map function: (K_in, V_in) → list(K_inter, V_inter)

• Reduce function: (K_inter, list(V_inter)) → list(K_out, V_out)

Example: Word Count

def mapper(line):
    # emit (word, 1) for every word in the line
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # emit the total count for one word
    yield (key, sum(values))

Word Count Execution

Input → Map → Shuffle & Sort → Reduce → Output

Input:
• the quick brown fox
• the fox ate the mouse
• how now brown cow

Map output:
• Map 1: (the, 1), (quick, 1), (brown, 1), (fox, 1)
• Map 2: (the, 1), (fox, 1), (ate, 1), (the, 1), (mouse, 1)
• Map 3: (how, 1), (now, 1), (brown, 1), (cow, 1)

Reduce output:
• Reduce 1: (brown, 2), (fox, 2), (how, 1), (now, 1), (the, 3)
• Reduce 2: (ate, 1), (cow, 1), (mouse, 1), (quick, 1)
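The execution above can be imitated in a single process. The following is a minimal sketch, not a real framework: `run_mapreduce` is a hypothetical helper introduced here for illustration. It groups the map output by key in memory (standing in for Shuffle & Sort) and calls the reducer once per key.

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield word, 1

def reducer(key, values):
    # Sum the counts collected for one word.
    yield key, sum(values)

def run_mapreduce(records, mapper, reducer):
    # Hypothetical single-process driver (an assumption, not a Hadoop API).
    # Map phase plus shuffle: group every emitted value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort by key, then call the reducer once per key.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

lines = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
counts = dict(run_mapreduce(lines, mapper, reducer))
print(counts)
```

A real cluster would split the keys across several reducers. Here a single loop plays both reducer roles.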

• Relational algebra operators in MapReduce

– Projection (π)
– Selection (σ)
– Intersect (∩)
– Cartesian product (×)
– Set union (∪)
– Set difference (−)
– Rename (ρ)
– Join (⋈)
– Group by… aggregation
– …

Relational Selection

[Figure: selection applied to relation R with tuples T1, T2, T3, T4, T5; the result shown keeps T1 and T3.]

Selection in MapReduce

• MAP: t → (t, t) if the condition is true; no output if the condition is not true

• REDUCE: (t, [t_1, t_2, ..., t_n]) → (t, t)

Example: σ_{A=1}(R)

R:
A   B   C
1   X
3   Y
1   Z

MAP:
(1, X, ) → ((1), (X, ))
(3, Y, ) → no output
(1, Z, ) → ((1), (Z, ))

REDUCE output:
A   B   C
1   X
1   Z
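A minimal sketch of selection with the t → (t, t) mapping follows. It assumes a hypothetical single-process driver `run_mapreduce`. The C values `c1`, `c2`, `c3` are made-up placeholders, since the slide's C column is not legible. Because the reducer emits each key once, the result follows set semantics.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    # Hypothetical in-memory driver: map, group by key, reduce in key order.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def make_selection_mapper(condition):
    # MAP: emit (t, t) only when the tuple satisfies the condition.
    def mapper(t):
        if condition(t):
            yield t, t
    return mapper

def selection_reducer(t, values):
    # REDUCE: (t, [t, ..., t]) -> t, emitted once per distinct tuple.
    yield t

# Tuples are (A, B, C); the C values are placeholders (assumption).
R = [(1, "X", "c1"), (3, "Y", "c2"), (1, "Z", "c3")]
selected = run_mapreduce(R, make_selection_mapper(lambda t: t[0] == 1),
                         selection_reducer)
print(selected)
```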

Relational Projection

[Figure: projection applied to relation R; all tuples R1, R2, R3, R4, R5 are kept, each reduced to the projected attributes.]

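Projection in MapReduce

• MAP: t → (t', t'), where t' is t restricted to the projected attributes

• REDUCE: (t', [t'_1, t'_2, ..., t'_n]) → (t', t')

Example: π_{A,B}(R)

R:
A   B   C
1   X
3   Y
1   X

MAP:
(1, X, ) → ((1, X), (1, X))
(3, Y, ) → ((3, Y), (3, Y))
(1, X, ) → ((1, X), (1, X))

REDUCE output:
A   B
1   X
3   Y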
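The projection example can be sketched as below, again with a hypothetical in-memory driver `run_mapreduce` and made-up C values. The reducer is what removes the duplicate (1, X) produced by the two input tuples that agree on A and B.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    # Hypothetical in-memory driver: map, group by key, reduce in key order.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def make_projection_mapper(indices):
    # MAP: build t' from the chosen attribute positions and emit (t', t').
    def mapper(t):
        t_prime = tuple(t[i] for i in indices)
        yield t_prime, t_prime
    return mapper

def projection_reducer(t_prime, values):
    # REDUCE: (t', [t', ..., t']) -> t', which eliminates duplicates.
    yield t_prime

# Tuples are (A, B, C); the C values are placeholders (assumption).
R = [(1, "X", "c1"), (3, "Y", "c2"), (1, "X", "c3")]
projected = run_mapreduce(R, make_projection_mapper([0, 1]), projection_reducer)
print(projected)
```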

Relational Union

[Figure: union of relations R and S, combining the tuples of both into one relation.]

Union in MapReduce

• MAP: t → (t, t)

• REDUCE: (t, [t_1, t_2, ..., t_n]) → (t, t)

Example: R ∪ S

R:
A   B
a   2
b   1
c   3

S:
A   B
b   1
d   4

MAP on R:
(a, 2) → ((a, 2), (a, 2))
(b, 1) → ((b, 1), (b, 1))
(c, 3) → ((c, 3), (c, 3))

MAP on S:
(b, 1) → ((b, 1), (b, 1))
(d, 4) → ((d, 4), (d, 4))

REDUCE output:
A   B
a   2
b   1
c   3
d   4
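As a rough illustration of the union example, the sketch below feeds the tuples of both relations through the same identity-style mapper. The reducer emits each distinct tuple once, so (b, 1) appears a single time. The driver `run_mapreduce` is a hypothetical in-memory stand-in for the framework.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    # Hypothetical in-memory driver: map, group by key, reduce in key order.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def union_mapper(t):
    # MAP: t -> (t, t), regardless of which relation t came from.
    yield t, t

def union_reducer(t, values):
    # REDUCE: one or two copies of t arrive; emit t once either way.
    yield t

R = [("a", 2), ("b", 1), ("c", 3)]
S = [("b", 1), ("d", 4)]
union = run_mapreduce(R + S, union_mapper, union_reducer)
print(union)
```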

Relational Intersect

[Figure: intersection of relations R and S, keeping only the tuples that appear in both.]

Intersect in MapReduce

• MAP:
  t ∈ R → (t, (t, R))
  t ∈ S → (t, (t, S))

• REDUCE:
  (t, [(t, R), (t, S)]) → (t, t)
  (t, [(t, R)]) → no output
  (t, [(t, S)]) → no output

Example: R ∩ S

R:
A   B
a   2
b   1
c   3

S:
A   B
b   1
d   4

MAP on R:
(a, 2) → ((a, 2), ((a, 2), (R)))
(b, 1) → ((b, 1), ((b, 1), (R)))
(c, 3) → ((c, 3), ((c, 3), (R)))

MAP on S:
(b, 1) → ((b, 1), ((b, 1), (S)))
(d, 4) → ((d, 4), ((d, 4), (S)))

REDUCE output:
A   B
b   1
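The intersect rules can be sketched as follows. Each input record carries the name of its relation, the mapper copies that name into the value, and the reducer emits a tuple only when both names are present. The driver `run_mapreduce` and the `(name, tuple)` record layout are assumptions made for this illustration.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    # Hypothetical in-memory driver: map, group by key, reduce in key order.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def intersect_mapper(record):
    # MAP: a tuple t from relation `source` becomes (t, (t, source)).
    source, t = record
    yield t, (t, source)

def intersect_reducer(t, values):
    # REDUCE: emit t only if it was seen in both R and S.
    sources = {source for _, source in values}
    if sources == {"R", "S"}:
        yield t

R = [("a", 2), ("b", 1), ("c", 3)]
S = [("b", 1), ("d", 4)]
tagged = [("R", t) for t in R] + [("S", t) for t in S]
intersection = run_mapreduce(tagged, intersect_mapper, intersect_reducer)
print(intersection)
```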

Relational Joins

[Figure: tuples R1, R2, R3, R4 of R and S1, S2, S3, S4 of S are matched into the pairs (R1, S2), (R2, S4), (R3, S1), (R4, S3).]

Natural Join Operation – Example

• Relations r, s:

r has attributes (A, B, C, D); columns B and D:
B   D
1   a
2   a
4   b
1   a
2   b

s has attributes (B, D, E); columns B and D:
B   D
1   a
3   a
1   a
2   b
3   b

r ⋈ s has attributes (A, B, C, D, E) and is built up one row per slide; columns B and D:
B   D
1   a
1   a
1   a
1   a
2   b

Natural Join Example

R1 ⋈ S1 =

ID   sname    rating   age    bid   day
22   dustin   7        45.0   101   10/10/96
58   rusty    10       35.0   103   11/12/96

Input relations:

ID   sname    rating   age
22   dustin   7        45.0
31   lubber   8        55.5
58   rusty    10       35.0

ID   bid   day
22   101   10/10/96
58   103   11/12/96
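To make the natural join concrete, here is a small single-machine sketch using a plain nested loop, with no MapReduce involved. Rows are dictionaries, and two rows join when they agree on every attribute name they share. The names `natural_join`, `sailors` and `reservations` are chosen here for illustration only.

```python
def natural_join(left, right):
    # Pair every left row with every right row and keep the pairs that
    # agree on all attributes the two rows have in common.
    joined = []
    for l_row in left:
        for r_row in right:
            common = l_row.keys() & r_row.keys()
            if all(l_row[c] == r_row[c] for c in common):
                joined.append({**l_row, **r_row})
    return joined

sailors = [
    {"ID": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"ID": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"ID": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]
reservations = [
    {"ID": 22, "bid": 101, "day": "10/10/96"},
    {"ID": 58, "bid": 103, "day": "11/12/96"},
]

result = natural_join(sailors, reservations)
for row in result:
    print(row)
```

The row for lubber (ID 31) has no partner and drops out, as in the table above.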

Types of Relationships

• One-to-One
• One-to-Many
• Many-to-Many

Reduce-side Join: 1-to-1

[Figure: Map emits tuples R1, R4, S2, S3 as values under their join keys; after the shuffle, Reduce receives the same tuples grouped by key.]

Reduce-side Join: 1-to-many

[Figure: Map emits tuples R1, S2, S3, S9 as values under their join keys; Reduce receives, for one key, R1 together with S2, S3, S9, …]

Reduce-side Join: many-to-many

In reducer…

[Figure: one key whose values are R1, R5, R8 and S2, S3, S9.]

• Hold in memory
• Cross with records from other set
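The reduce-side join, including the many-to-many case, can be sketched as below. The mapper tags each tuple with its relation and keys it by the join key. The reducer holds the R records for a key in memory and crosses them with the S records. The driver `run_mapreduce`, the join keys 1 and 2, and the record layout are assumptions made for illustration; the slides do not give key values.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    # Hypothetical in-memory driver: map, group by key, reduce in key order.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

def join_mapper(record):
    # MAP: (source, (join_key, payload)) -> (join_key, (source, payload)).
    source, (join_key, payload) = record
    yield join_key, (source, payload)

def join_reducer(join_key, values):
    # Hold the R side in memory, then cross it with every S record.
    r_side = [payload for source, payload in values if source == "R"]
    for source, payload in values:
        if source == "S":
            for r_payload in r_side:
                yield join_key, r_payload, payload

# Made-up join keys: R1 and R5 share key 1 with S2 and S3; R8 shares key 2 with S9.
R = [(1, "R1"), (1, "R5"), (2, "R8")]
S = [(1, "S2"), (1, "S3"), (2, "S9")]
tagged = [("R", t) for t in R] + [("S", t) for t in S]
joined = run_mapreduce(tagged, join_mapper, join_reducer)
print(joined)
```

The reducer must buffer one side per key, so memory use grows with the number of R records that share a key.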

Map-side Join: Basic Idea

Assume two datasets are sorted by the join key:

[Figure: R1, R2, R3, R4 and S1, S2, S3, S4, each in join-key order.]

A sequential scan through both datasets to join (called a “merge join” in database terminology)
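The sequential scan can be sketched as a merge join over two lists that are already sorted by join key. This is a single-process illustration. The function name `merge_join` and the key values are assumptions, and a real map-side join also requires both datasets to be partitioned the same way.

```python
def merge_join(r, s):
    # r and s are lists of (join_key, payload), each sorted by join_key.
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1          # r is behind: advance r
        elif r[i][0] > s[j][0]:
            j += 1          # s is behind: advance s
        else:
            key = r[i][0]
            # Find the run of records with this key on each side.
            i_end = i
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            # Emit every pairing within the two runs.
            for _, r_payload in r[i:i_end]:
                for _, s_payload in s[j:j_end]:
                    out.append((key, r_payload, s_payload))
            i, j = i_end, j_end
    return out

# Made-up join keys for the tuples named in the figure.
r = [(1, "R1"), (2, "R2"), (4, "R3"), (5, "R4")]
s = [(1, "S1"), (3, "S2"), (4, "S3"), (4, "S4")]
merged = merge_join(r, s)
print(merged)
```

Each list is read once from front to back, which is why no shuffle or reduce phase is needed.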

References:

• Cloud Computing with MapReduce and Hadoop. Matei Zaharia, Electrical Engineering and Computer Sciences, University of California, Berkeley.

• Database and Map Reduce. Based on Jimmy Lin’s lecture slides (http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/index.html), licensed under the Creative Commons Attribution 3.0 License.

• Mining of Massive Datasets. Anand Rajaraman (Kosmix, Inc.) and Jeffrey D. Ullman (Stanford Univ.).