MapReduce: Simplified Data Processing on Large Clusters
Papers We Love, Bucharest Chapter, October 12, 2015
What is MapReduce?
◉ MapReduce is a programming model and implementation for processing large data sets
◉ Users specify a Map function that processes a key-value pair to generate a set of intermediate key-value pairs, and a Reduce function that aggregates all intermediate values sharing the same intermediate key, combining the derived data appropriately
map: (k1, v1) -> [(k2, v2)]
reduce: (k2, [v2]) -> [(k3, v3)]
Where else have we seen this?
Map
◉ function (mathematics)
◉ map (Java)
◉ Select (.NET)
Reduce
◉ fold (functional programming)
◉ reduce (Java)
◉ Aggregate (.NET)
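Both halves of the analogy can be sketched in a few lines; Python is used here purely for brevity (the deck's own examples name the Java and .NET equivalents):

```python
from functools import reduce

words = ["foo", "bar", "foo"]

# map: apply a function to every element (cf. map in Java, Select in .NET)
lengths = list(map(len, words))

# reduce/fold: collapse a sequence into one value
# (cf. reduce in Java, Aggregate in .NET)
total = reduce(lambda acc, n: acc + n, lengths, 0)
```

Here `lengths` is `[3, 3, 3]` and `total` is `9`: map transforms elements independently, while fold threads an accumulator through all of them.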
Other analogies
"To draw an analogy to SQL, map is like the group-by clause of an aggregate query. Reduce is analogous to the aggregate function that is computed over all the rows with the same group-by attribute"
D.J. DeWitt & M. Stonebraker
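The SQL analogy can be made concrete with a small grouping sketch (the department/salary rows are hypothetical data, chosen only for illustration):

```python
from itertools import groupby

# Hypothetical rows of (department, salary)
rows = [("eng", 100), ("ops", 80), ("eng", 120)]

# The "group-by attribute": sort, then group rows sharing the same key
rows.sort(key=lambda r: r[0])

# The "aggregate function": computed over all rows with the same key
totals = {dept: sum(salary for _, salary in group)
          for dept, group in groupby(rows, key=lambda r: r[0])}
```

The sort-then-group step plays the role of MapReduce's shuffle, and the `sum` plays the role of the reduce; `totals` ends up as `{"eng": 220, "ops": 80}`.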
Divide-and-conquer algorithms
"recursively breaking down a problem into two or more sub-problems of the same (or related) type (divide), until these become simple enough to be solved directly (conquer). The solutions to the sub-problems are then combined to give a solution to the original problem."
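A minimal sketch of that definition, using a recursive sum (a deliberately trivial problem):

```python
def dc_sum(xs):
    # Divide-and-conquer sum: split the input (divide), recurse until
    # trivially small (conquer), then combine the partial results.
    if len(xs) <= 1:
        return xs[0] if xs else 0
    mid = len(xs) // 2
    return dc_sum(xs[:mid]) + dc_sum(xs[mid:])
```

The two recursive calls are independent of each other, which is exactly the property MapReduce exploits to run sub-problems on different machines.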
History
◉ April 1960: John McCarthy introduced the concept of “maplist”
◉ September 4, 1998: Google founded
◉ 1998-2003: hundreds of special-purpose large-data computation programs at Google
◉ February 2003: 1st version of MapReduce
◉ August 2003: significant MapReduce enhancements
◉ June 18, 2004: Patent US7650331 B1 filed
◉ December 6, 2004: 1st public MapReduce presentation
◉ 2005: Hadoop implementation started in Java (Douglass R. Cutting & Michael J. Cafarella)
◉ September 4, 2007: Hadoop 0.14.1
◉ January 19, 2010: Patent US7650331 B1 published
◉ July 6, 2015: Hadoop 2.7.1
Distribution issues
◉ Communication and routing
  which nodes should be involved?
  what transport protocol should be used?
  threads/events/connections management
  remote execution of your processing code?
◉ Fault tolerance and fault detection
◉ Load balancing / partitioning of data
  heterogeneity of nodes
  skew in data
  network topology
◉ Parallelization strategy
  algorithmic issues of work splitting
"without having to deal with failures, the rest of the support code just isn't so complicated"
S. Ghemawat
MapReduce system
◉ application-independent Map Module
◉ application-independent Reduce Module
◉ application-specific Map Operation
◉ application-specific Reduce Operation
Input Data -> Intermediate Data -> Output Data
MapReduce model in practice
map(String key, String value):
  // key: document name; value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word; values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(key, AsString(result));
map: (k1, v1) -> [(k2, v2)]
reduce: (k2, [v2]) -> [(k3, v3)]
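The word-count pseudocode above can be run end to end with a toy single-machine driver. This is only a sketch of the model (the in-memory dictionary stands in for the distributed shuffle; names like `map_fn` and `mapreduce` are ours, not the paper's):

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name; value: document contents
    for w in value.split():
        yield (w, 1)

def reduce_fn(key, values):
    # key: a word; values: all counts emitted for that word
    yield (key, sum(values))

def mapreduce(documents, map_fn, reduce_fn):
    # Map phase: emit intermediate (k2, v2) pairs
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for k2, v2 in map_fn(name, contents):
            intermediate[k2].append(v2)  # "shuffle": group values by k2
    # Reduce phase: aggregate each key's values
    output = {}
    for k2, values in sorted(intermediate.items()):
        for k3, v3 in reduce_fn(k2, values):
            output[k3] = v3
    return output

counts = mapreduce({"doc1": "to be or not to be"}, map_fn, reduce_fn)
# counts == {"be": 2, "not": 1, "or": 1, "to": 2}
```

The grouping step between the two phases is the `(k2, [v2])` in the type signature: all values for one intermediate key are collected before a single reduce call sees them.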
How long does it take to go through 1 TB?
Sequentially: 3 hours
MapReduce: 70 seconds startup overhead + 80 seconds computation
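The sequential figure is roughly what single-disk throughput predicts. A back-of-envelope check, assuming about 100 MB/s streaming throughput (our assumption for hardware of that era, not a number from the slides):

```python
# Back-of-envelope: how long does one disk take to stream 1 TB?
terabyte = 10**12          # bytes
throughput = 100 * 10**6   # bytes/second, assumed ~100 MB/s
seconds = terabyte / throughput   # 10,000 seconds
hours = seconds / 3600            # ~2.8 hours, i.e. about 3 hours
```

Spreading the same scan over ~1800 machines is what collapses the computation to under two minutes, at the cost of the fixed startup overhead.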
Environment
◉ 1800 Linux dual-processor x86 machines, 2-4 GB memory
◉ Fast Ethernet / Gigabit Ethernet
◉ Inexpensive IDE disks and the distributed Google File System
Tianhe-2, the #1 supercomputer: 3,120,000 cores, 1.5 PB total memory
Execution diagram
◉ The master process is a task itself, initiated by the WQM, and is responsible for assigning all other tasks to worker processes
◉ Each worker invokes at least one map thread and one reduce thread
◉ If a worker fails, its tasks are reassigned to another worker process
◉ When the WQM receives a job, it allocates the job to the master, which calculates and requests M+R+1 processes to be allocated to the job
◉ The WQM responds with the process allocation info (possibly fewer processes than requested) to the master, which then manages the execution of the job
◉ Reduce tasks begin work when the master informs them that intermediate files are ready
◉ Input data (files/DB/memory) are split into data blocks (16-64 MB), either automatically or as configured
◉ The worker to which a map task has been assigned applies the map() operator to its input data block
◉ When the worker completes the task, it informs the master of the status
◉ The master informs workers where to find intermediate data and schedules their reading
◉ Workers (3 & 4) sort the intermediate key-value pairs, then merge them (by applying reduce()) and write the output
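How an intermediate pair finds its reduce worker: the original paper's default partitioning function is hash(key) mod R, which guarantees all values for one key reach the same reduce task. A sketch (R = 4 is an arbitrary choice):

```python
import zlib

R = 4  # number of reduce tasks (arbitrary for this sketch)

def partition(key, R):
    # Route an intermediate key to one of R reduce tasks.
    # Python's built-in hash() is randomized per process for strings,
    # so use a stable checksum to keep routing deterministic.
    return zlib.crc32(key.encode()) % R

# Every occurrence of a key lands in the same partition,
# and every partition index is in range.
assert partition("the", R) == partition("the", R)
assert 0 <= partition("quick", R) < R
```

Because the routing depends only on the key, map workers can write their R partition files independently, with no coordination beyond agreeing on R and the hash.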
Workflow diagram
◉ When a process completes a task, it informs the WQM, which updates the status tables
◉ When the WQM discovers that a process has failed, it assigns the process's tasks to a new process and updates the status tables

Task Status Table
◉ TaskID
◉ Status (InProgress, Waiting, Completed, Failed)
◉ ProcessID
◉ InputFiles (Input, Intermediate)
◉ OutputFiles

Process Status Table
◉ ProcessID
◉ Status (Idle, Busy, Failed)
◉ Location (CPU ID, etc.)
◉ Current (TaskID, WQM)
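A minimal sketch of the two tables and the failure-handling rule above (field subset and dictionary layout are our simplification, not the patent's schema):

```python
# Task Status Table: TaskID -> status + assigned process
tasks = {1: {"status": "InProgress", "process": 10},
         2: {"status": "Completed", "process": 10},
         3: {"status": "InProgress", "process": 11}}

# Process Status Table: ProcessID -> status
processes = {10: "Busy", 11: "Busy", 12: "Idle"}

def handle_failure(failed, spare):
    # Mark the process failed and hand its in-flight tasks to a spare,
    # updating both status tables, as the workflow slide describes.
    processes[failed] = "Failed"
    for t in tasks.values():
        if t["process"] == failed and t["status"] == "InProgress":
            t["process"] = spare
            t["status"] = "Waiting"  # re-queued for the new process
    processes[spare] = "Busy"

handle_failure(10, 12)
# Task 1 is re-queued on process 12; task 2 stays Completed.
```

This sketch only reassigns in-flight work; a full system must also decide when completed work (e.g. map output stored on the failed machine's local disk) needs re-execution.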
Questions from the audience @ original paper presentation
◉ Q: Is there any task that could not be handled using MapReduce?
  A: Join operations could not be performed with the current model
◉ Q: How does MapReduce differ from parallel databases?
  A: MapReduce data is stored across a large number of machines compared to parallel databases, the abstractions are fairly simple to use, and MapReduce also benefits greatly from locality optimizations
Bibliography
◉ J. Dean, S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", OSDI'04, Dec. 6, 2004
◉ J. Dean, S. Ghemawat, "System and method for efficient large-scale data processing", Patent US 7650331 B1, Jan. 19, 2010
◉ S. Ghemawat, J. Dean, J. Zhao, M. Austern, A. Spector, "Google Technology RoundTable: Map Reduce", Aug. 21, 2008, YouTube
◉ P. Mahadevan, "OSDI'04 Conference Reports", ;LOGIN: Vol. 30, No. 2, Apr. 2005, p. 61
◉ R. Jacotin, "Lecture: The Google MapReduce", SlideShare, Oct. 3, 2014