
MapReduce

• Web data sets can be very large
  – Tens to hundreds of terabytes

• Cannot be mined on a single server

• Standard architecture emerging:
  – Cluster of commodity Linux nodes
  – Gigabit Ethernet interconnect

• How to organize computations on commodity clusters?

MapReduce

• MapReduce is a high-level programming system that allows such distributed computations to be written simply.

• The user writes code for two functions, map and reduce.

• A master controller divides the input data into chunks, and assigns different processors to execute the map function on each chunk.

• Other processors, perhaps the same ones, are then assigned to perform the reduce function on pieces of the output from the map function.
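As a rough, single-process illustration of the flow just described (not the actual distributed implementation), the sketch below plays the master's role in plain Python: it splits the input into chunks, applies a user-supplied map function to each record of each chunk, groups the intermediate pairs by key, and applies the reduce function to each group. The names mapreduce, map_fn, and reduce_fn are invented for this sketch.

    from collections import defaultdict

    def mapreduce(records, map_fn, reduce_fn, num_chunks=4):
        """Single-process simulation of the map/reduce flow described above."""
        # "Master" step 1: divide the input data into chunks.
        chunks = [records[i::num_chunks] for i in range(num_chunks)]

        # Step 2: a "map process" applies the user's map function to each
        # record of its chunk, producing intermediate key-value pairs.
        intermediate = []
        for chunk in chunks:
            for key, value in chunk:
                intermediate.extend(map_fn(key, value))

        # Step 3: group intermediate pairs by key (done by the system).
        groups = defaultdict(list)
        for key, value in intermediate:
            groups[key].append(value)

        # Step 4: a "reduce process" combines the list of values for each key.
        return {key: reduce_fn(key, values) for key, values in groups.items()}

For example, with a map_fn that returns [(word, 1) for word in value.split()] and a reduce_fn that sums its values, this driver reproduces the word-count example that appears later.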

Data Organization

• Data is assumed to be stored in files.
  – Typically, the files are very large compared with the files found in conventional systems.
    • For example, one file might be all the tuples of a very large relation.
    • Or, the file might be a terabyte of "market baskets."
    • Or, the file might be the "transition matrix of the Web," which is a representation of the graph with all Web pages as nodes and hyperlinks as edges.

• Files are divided into chunks, which might be complete cylinders of a disk, and are typically many megabytes.
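As a toy illustration of chunking (real chunks are many megabytes; the sizes here are invented), a file's records might be cut into fixed-size pieces that individual map tasks can work on independently:

    def split_into_chunks(records, chunk_size):
        """Yield consecutive fixed-size chunks of the input records."""
        for start in range(0, len(records), chunk_size):
            yield records[start:start + chunk_size]

    # Toy usage: 10 records, chunks of 4 records each.
    for chunk in split_into_chunks(list(range(10)), chunk_size=4):
        print(chunk)          # [0, 1, 2, 3]  [4, 5, 6, 7]  [8, 9]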

The Map Function

• Input is a set of key-value records.

• Executed by one or more processes, located at any number of processors.
  – Each map process is given a chunk of the entire input data on which to work.

• Output is a list of key-value pairs.
  – The types of keys and values for the output of the map function need not be the same as the types of the input keys and values.
  – The "keys" output by the map function are not true keys in the database sense.
    • That is, there can be many pairs with the same key value.

The Reduce Function

• The second user-defined function, reduce, is also executed by one or more processes, located at any number of processors.

• Input is a key value from the intermediate result, together with the list of all values that appear with that key in the intermediate result.

• The reduce function itself combines the list of values associated with a given key k.
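To make this concrete, here is a hedged sketch of a reduce function that combines its value list by summing (the word count on the next slide uses exactly this combination); the framework calls it once per distinct intermediate key. The name reduce_fn is invented.

    def reduce_fn(key, values):
        # Combine the list of values associated with the given key; here, by summing.
        return key, sum(values)

    # One reduce invocation per distinct intermediate key, e.g.:
    print(reduce_fn("is", [1, 1, 1]))        # ('is', 3)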

A Simple Example

• Counting words in a large set of documents

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
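A minimal runnable Python rendering of the same word count (names such as word_count_map and run_word_count are invented for this sketch; the in-memory grouping step stands in for the framework's shuffle between map and reduce):

    from collections import defaultdict

    def word_count_map(doc_name, contents):
        """map: emit (word, 1) for every word occurrence in the document."""
        return [(word, 1) for word in contents.split()]

    def word_count_reduce(word, counts):
        """reduce: sum the partial counts for one word."""
        return sum(counts)

    def run_word_count(documents):
        # Map phase: apply the map function to every (name, text) pair.
        intermediate = []
        for name, text in documents.items():
            intermediate.extend(word_count_map(name, text))
        # Shuffle: group intermediate values by key (word).
        groups = defaultdict(list)
        for word, count in intermediate:
            groups[word].append(count)
        # Reduce phase: one call per distinct word.
        return {word: word_count_reduce(word, counts)
                for word, counts in groups.items()}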

Example for MapReduce

• Page 1: the weather is good
• Page 2: today is good
• Page 3: good weather is good

Map output

• Worker 1: – (the 1), (weather 1), (is 1), (good 1).

• Worker 2: – (today 1), (is 1), (good 1).

• Worker 3: – (good 1), (weather 1), (is 1), (good 1).

Reduce Input

• Worker 1: – (the 1)

• Worker 2: – (is 1), (is 1), (is 1)

• Worker 3: – (weather 1), (weather 1)

• Worker 4: – (today 1)

• Worker 5: – (good 1), (good 1), (good 1), (good 1)

Reduce Output

• Worker 1: – (the 1)

• Worker 2: – (is 3)

• Worker 3: – (weather 2)

• Worker 4: – (today 1)

• Worker 5: – (good 4)
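Running the hypothetical run_word_count sketch from the earlier word-count example over these three pages reproduces the reduce output above (page texts taken from the slides; punctuation ignored here):

    pages = {
        "page1": "the weather is good",
        "page2": "today is good",
        "page3": "good weather is good",
    }
    print(run_word_count(pages))
    # {'the': 1, 'weather': 2, 'is': 3, 'good': 4, 'today': 1}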

MapReduce Architecture

Parallel Execution

MapReduce Example: Constructing an Inverted Index

• Input is a collection of documents.
• Final output (not the output of map) is, for each word, a list of the documents that contain that word at least once.

Map Function

• Input is a set of (i, d) pairs
  – i is a document ID
  – d is the corresponding document.

• The map function scans d and, for each word w it finds, emits the pair (w, i).
  – Notice that in the output, the word is the key and the document ID is the associated value.

• Output of map is a list of word-ID pairs (see the sketch below).
  – It is not necessary to catch duplicate words in the document; the elimination of duplicates can be done later, at the reduce phase.
  – The intermediate result is the collection of all word-ID pairs created from all the documents in the input database.

Note: The output of a map-reduce algorithm is always a set of key-value pairs, which makes it possible, and useful in some applications, to compose two or more map-reduce operations.
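A hedged sketch of such a map function in Python (the name inverted_index_map is invented; duplicate words are deliberately left in, as noted above):

    def inverted_index_map(doc_id, contents):
        """Emit (word, doc_id) for every word occurrence in document doc_id."""
        return [(word, doc_id) for word in contents.split()]

    # e.g. inverted_index_map(3, "good weather is good")
    #      -> [('good', 3), ('weather', 3), ('is', 3), ('good', 3)]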

MapReduce Example: Constructing an Inverted Index (continued)

Reduce Function

• The intermediate result consists of pairs of the form (w, [i1, i2, …, in])
  – where the i's are a list of document IDs, one for each occurrence of word w.

• The reduce function takes a list of IDs, eliminates duplicates, and sorts the list of unique IDs (see the sketch below).
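A matching hedged sketch of the reduce function (the name inverted_index_reduce is invented):

    def inverted_index_reduce(word, doc_ids):
        """Eliminate duplicate document IDs and return them sorted."""
        return word, sorted(set(doc_ids))

    # e.g. inverted_index_reduce("good", [3, 1, 3, 2]) -> ('good', [1, 2, 3])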

Parallelism

• This organization of the computation makes excellent use of whatever parallelism is available.

• The map function works on a single document, so we could have as many processes and processors as there are documents in the database.

• The reduce function works on a single word, so we could have as many processes and processors as there are words in the database.

• Of course, it is unlikely that we would use so many processors in practice.

Some Applications

• Distributed grep
  – Map: emits a line if it matches the supplied pattern
  – Reduce: copies the intermediate data to the output

• Count of URL access frequency
  – Map: processes a web request log and outputs <URL, 1>
  – Reduce: emits <URL, total count>

• Reverse web-link graph (see the sketch below)
  – Map: processes web pages and outputs <target, source> for each link
  – Reduce: emits <target, list(source)>
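As one concrete example, here is a hedged sketch of the reverse web-link graph computation (all names and the toy link data are invented; the grouping is done inline in place of the framework's shuffle):

    from collections import defaultdict

    def reverse_links_map(source, outgoing_links):
        """Emit (target, source) for every link found on page `source`."""
        return [(target, source) for target in outgoing_links]

    def reverse_links_reduce(target, sources):
        """Collect the list of pages that link to `target`."""
        return target, sorted(set(sources))

    # Toy run over two pages' outgoing links.
    pages = {"a.html": ["b.html", "c.html"], "b.html": ["c.html"]}
    groups = defaultdict(list)
    for source, links in pages.items():
        for target, src in reverse_links_map(source, links):
            groups[target].append(src)
    for target, sources in groups.items():
        print(reverse_links_reduce(target, sources))
    # ('b.html', ['a.html'])
    # ('c.html', ['a.html', 'b.html'])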

Refinement

• Fault tolerance
• Different partitioning functions
• Combiner function
• Different input/output types
• Skipping bad records
• Local execution
• Status info
• Counters

Fault Tolerance

• Network failure:
  – Detect failure via periodic heartbeats
  – Re-execute completed and in-progress map tasks
  – Re-execute in-progress reduce tasks
  – Task completion committed through master

• Master failure:
  – Could handle, but don't yet (master failure unlikely)

Fault Tolerance

• Reactive way
  – Worker failure
    • Heartbeat: workers are periodically pinged by the master; no response = failed worker (see the sketch below)
    • If the processor of a worker fails, the tasks of that worker are reassigned to another worker.
  – Master failure
    • The master writes periodic checkpoints
    • Another master can be started from the last checkpointed state
    • If the master eventually dies, the job is aborted
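A hedged sketch of heartbeat-style failure detection (the class, method names, and timeout are invented; the real master/worker protocol is more involved): the master records when each worker last answered a ping, and any worker that stays silent too long is treated as failed and has its tasks reassigned.

    import time

    HEARTBEAT_TIMEOUT = 10.0   # seconds of silence before a worker is presumed failed

    class Master:
        def __init__(self, workers):
            now = time.time()
            self.last_heartbeat = {w: now for w in workers}   # last successful ping per worker
            self.tasks = {w: [] for w in workers}             # tasks currently assigned per worker

        def record_heartbeat(self, worker):
            """Called whenever a worker responds to a periodic ping."""
            self.last_heartbeat[worker] = time.time()

        def detect_failures(self):
            """Return the workers that have not responded within the timeout."""
            now = time.time()
            return [w for w, t in self.last_heartbeat.items()
                    if now - t > HEARTBEAT_TIMEOUT]

        def reassign(self, failed_worker, healthy_worker):
            """Hand a failed worker's tasks to a healthy worker for re-execution."""
            self.tasks[healthy_worker].extend(self.tasks.pop(failed_worker))
            self.last_heartbeat.pop(failed_worker, None)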

Fault Tolerance

• Proactive way (redundant execution)
  – The problem of "stragglers" (slow workers)
    • Other jobs consuming resources on the machine
    • Bad disks with soft errors transfer data very slowly
    • Weird things: processor caches disabled (!!)
  – When the computation is almost done, reschedule in-progress tasks
  – Whenever either the primary or the backup execution finishes, mark it as completed


Fault Tolerance

• Input error: bad records
  – Map/Reduce functions sometimes fail for particular inputs
  – Best solution is to debug & fix, but not always possible
  – On segmentation fault:
    • Send a UDP packet to the master from the signal handler
    • Include the sequence number of the record being processed
  – Skip bad records
    • If the master sees two failures for the same record, the next worker is told to skip the record (see the sketch below)
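A simplified sketch of the skip-bad-records bookkeeping on the master's side (names are invented, and the UDP signal-handler path is abstracted into a plain function call):

    from collections import Counter

    MAX_FAILURES = 2                 # two failures on the same record => skip it

    failure_counts = Counter()       # record sequence number -> failures seen so far

    def report_failure(record_seq_no):
        """Master-side handler for a worker's crash report on this record."""
        failure_counts[record_seq_no] += 1

    def should_skip(record_seq_no):
        """Asked by the next worker before (re)processing the record."""
        return failure_counts[record_seq_no] >= MAX_FAILURES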

Combiner
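A combiner is typically a local, map-side application of an associative and commutative reduce that pre-aggregates intermediate pairs before they are shuffled, cutting network traffic. For the word count, a hedged sketch (the name word_count_combiner is invented):

    from collections import defaultdict

    def word_count_combiner(pairs):
        """Pre-sum the (word, count) pairs produced by one map task before the shuffle."""
        partial = defaultdict(int)
        for word, count in pairs:
            partial[word] += count
        return list(partial.items())

    # e.g. word_count_combiner([('good', 1), ('is', 1), ('good', 1)])
    #      -> [('good', 2), ('is', 1)]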

Locality issue

• Master scheduling policy
  – Asks GFS for the locations of replicas of the input file blocks
  – Map tasks are typically split into 64 MB pieces (== GFS block size)
  – Map tasks are scheduled so that a GFS input block replica is on the same machine or the same rack

• Effect
  – Thousands of machines read input at local disk speed
  – Without this, rack switches limit the read rate

Status monitor

Refinements

• Task granularity
  – Minimizes time for fault recovery
  – Load balancing

• Local execution for debugging/testing
• Compression of intermediate data
• Better shuffle-sort

Points to Be Emphasized

• No reduce can begin until the map phase is complete
• The master must communicate the locations of intermediate files
• Tasks are scheduled based on the location of data
• If a map worker fails any time before reduce finishes, its tasks must be completely rerun
• The MapReduce library does most of the hard work for us!