MapReduce: Simplified Data Processing on Large Clusters

MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. To appear in OSDI 2004 (Operating Systems Design and Implementation).


Post on 24-Feb-2016



TRANSCRIPT

Page 1: MapReduce: Simplified Data Processing on Large Clusters

MapReduce: Simplified Data Processing on Large Clusters

Jeffrey Dean and Sanjay Ghemawat

To appear in OSDI 2004 (Operating Systems Design and Implementation)

Page 2: MapReduce: Simplified Data Processing on Large Clusters

Jeff Dean, Sanjay Ghemawat

Page 3: MapReduce: Simplified Data Processing on Large Clusters

Introduction

An important programming model for large-scale data-parallel applications

Page 4: MapReduce: Simplified Data Processing on Large Clusters

Motivation

- Parallel applications
  - Widely used
  - Special-purpose applications

- Common functionality
  - Parallelize computation
  - Distribute data
  - Handle failures

- Large-scale (big data) data processing

Page 5: MapReduce: Simplified Data Processing on Large Clusters

MapReduce?

- Programming model
  - Parallel
  - Generic
  - Scalable

- Data
  - Map of (key, value) pairs

- Implementation
  - Commodity clusters
  - Commodity PCs

Page 6: MapReduce: Simplified Data Processing on Large Clusters

MapReduce?

# User-defined functions

# map(key, val) is run on each item in the input set and emits new-key / new-val pairs

# reduce(key, vals) is run for each unique key emitted by map() and emits the final output
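As a minimal single-process sketch of the two user-defined functions, word counting could be written as follows. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical helpers chosen for illustration; they are not the API of the MapReduce library, which the slides describe as a C++ library.

```python
from collections import defaultdict

def map_fn(key, val):
    """Run on each input item: key is a document name, val is its text."""
    for word in val.split():
        yield word, 1

def reduce_fn(key, vals):
    """Run once per unique key emitted by map_fn."""
    yield key, sum(vals)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory driver: group map output by key, then reduce."""
    groups = defaultdict(list)
    for key, val in inputs:
        for new_key, new_val in map_fn(key, val):
            groups[new_key].append(new_val)
    output = {}
    for key in sorted(groups):
        for out_key, out_val in reduce_fn(key, groups[key]):
            output[out_key] = out_val
    return output

# Made-up input documents.
docs = [("doc1", "the quick fox"), ("doc2", "the lazy dog")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

Here the driver does the grouping in memory; in the real system that grouping is what the library distributes across machines.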

Page 7: MapReduce: Simplified Data Processing on Large Clusters

Example

# Distributed Grep (Global / Regular Expression / Print)

# Count of URL Access Frequency (input: logs of web page requests)
  - map emits <URL, 1>
  - reduce emits <URL, total count>

# Reverse Web-Link Graph
  - map emits <target (linked URL), source (web page)>
  - reduce emits <target, list(source)>
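The URL access count and the reverse web-link graph can be sketched in a few lines each. This is an in-memory illustration only: `group_by_key` is a hypothetical stand-in for the grouping the library performs between map and reduce, and the log and link data are made up.

```python
from collections import defaultdict

def group_by_key(pairs):
    """Collect all values emitted for the same key (the grouping step)."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups

# URL access frequency: map emits <URL, 1>, reduce sums the counts.
request_log = ["/index", "/about", "/index", "/index"]  # made-up log
url_counts = {
    url: sum(ones)
    for url, ones in group_by_key((url, 1) for url in request_log).items()
}

# Reverse web-link graph: map emits <target, source>, reduce lists the sources.
links = {"a.html": ["b.html", "c.html"], "b.html": ["c.html"]}  # made-up pages
pairs = [(target, source)
         for source, targets in links.items()
         for target in targets]
reverse_graph = {
    target: sorted(sources)
    for target, sources in group_by_key(pairs).items()
}
```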

Page 8: MapReduce: Simplified Data Processing on Large Clusters

Example

# Inverted Index
  - map emits <word, document ID>
  - reduce emits <word, list(document ID)>

# Distributed Sort
  - map emits <key, record>
  - reduce emits all <key, record> pairs unchanged

# Term-Vector per Host (a term vector is a list of <word, frequency> pairs)
  - map emits <hostname, term vector>
  - reduce adds the term vectors for a host together, throwing away infrequent terms, and emits a final <hostname, term vector> pair
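The inverted index example can be sketched the same way. The function names and the three tiny documents are assumptions made for illustration, and the grouping is done in memory rather than across machines.

```python
from collections import defaultdict

def map_inverted_index(doc_id, text):
    """Emit <word, document ID> once per distinct word in the document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def reduce_inverted_index(word, doc_ids):
    """Emit <word, list(document ID)> with the IDs sorted."""
    return word, sorted(doc_ids)

documents = {1: "map reduce", 2: "reduce sort", 3: "map sort reduce"}  # made up

intermediate = defaultdict(list)
for doc_id, text in documents.items():
    for word, emitted_id in map_inverted_index(doc_id, text):
        intermediate[word].append(emitted_id)

index = dict(reduce_inverted_index(w, ids) for w, ids in intermediate.items())
```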

Page 9: MapReduce: Simplified Data Processing on Large Clusters

Execution overview

Page 10: MapReduce: Simplified Data Processing on Large Clusters

Typical cluster

# Machines: typically 100s or 1000s of dual-processor x86 machines running Linux, with 2-4 GB of memory

# Network: 100 megabits/second or 1 gigabit/second

# Storage: local IDE disks

# GFS: a distributed file system that manages the data

# Job scheduling system
  - jobs are made up of tasks
  - the scheduler assigns tasks to machines

# Language: C++ library linked into user programs

Page 11: MapReduce: Simplified Data Processing on Large Clusters

Distributed Execution (1)

#1
  - The input files are split into M pieces, typically 16 MB to 64 MB each (controllable by the user via an optional parameter)
  - Many copies of the program are started on a cluster of machines

#2
  - Master (1): one of the copies of the program is special
  - Workers (n): assigned work by the master
  - There are M map tasks and R reduce tasks to assign

#3
  - A worker assigned a map task reads the contents of the corresponding input split
  - It parses key/value pairs out of the input and passes each pair to the user-defined map function
  - The intermediate key/value pairs are buffered in memory

#4 Map workers
  - Periodically, the buffered pairs are written to local disk
  - The locations of these pairs on the local disk are passed back to the master
  - The master is responsible for forwarding these locations to the reduce workers

#5 Reduce workers
  - A reduce worker uses remote procedure calls to read the buffered data from the local disks of the map workers

Page 12: MapReduce: Simplified Data Processing on Large Clusters

Distributed Execution (2)

#6
  - The reduce worker iterates over the intermediate data; for each unique intermediate key encountered, it passes the key and the corresponding values to the user's reduce function
  - The output of the reduce function is appended to a final output file for this reduce partition

#7
  - When all map tasks and reduce tasks have been completed, the master wakes up the user program
  - The MapReduce call in the user program returns back to the user code

#8
  - After successful completion, the output is available in R output files (one per reduce task, with file names as specified by the user)
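The numbered steps can be imitated in a single process to see how M input splits become R output files. This is a sketch under stated assumptions, not the library's implementation: everything runs sequentially, lists stand in for files, and `partition` uses a CRC32 hash modulo R as an assumed partitioning function (the paper describes hash(key) mod R as the default, but the exact hash here is a choice made for this example).

```python
from collections import defaultdict
import zlib

def word_count_map(_, text):
    for word in text.split():
        yield word, 1

def word_count_reduce(word, counts):
    return word, sum(counts)

def partition(key, R):
    """Assumed partitioning function: stable hash of the key, modulo R."""
    return zlib.crc32(key.encode()) % R

def execute(splits, map_fn, reduce_fn, R):
    # Steps 3-4: each map task's output is buffered, split into R regions.
    regions = [defaultdict(list) for _ in range(R)]
    for split_id, split in enumerate(splits):
        for key, val in map_fn(split_id, split):
            regions[partition(key, R)][key].append(val)
    # Steps 5-6: each reduce task reads its region, visits keys in sorted
    # order, and appends reduce output to its own output "file" (a list).
    outputs = []
    for region in regions:
        outputs.append([reduce_fn(key, region[key]) for key in sorted(region)])
    return outputs  # Step 8: R output files

splits = ["a b a", "b c", "a c c"]  # M = 3 made-up input splits
output_files = execute(splits, word_count_map, word_count_reduce, R=2)
```

Every key lands in exactly one of the R outputs, so the final result is the union of the output files.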

Page 13: MapReduce: Simplified Data Processing on Large Clusters

Master Data Structures

# Task status: idle, in-progress, completed
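One way to picture the per-task bookkeeping is a small record holding the task's state and, for non-idle tasks, the worker it was given to. The class and field names below are hypothetical; only the three states come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TaskState(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class Task:
    kind: str                       # "map" or "reduce"
    task_id: int
    state: TaskState = TaskState.IDLE
    worker: Optional[str] = None    # worker identity, set for non-idle tasks

    def assign(self, worker: str) -> None:
        self.state = TaskState.IN_PROGRESS
        self.worker = worker

    def complete(self) -> None:
        self.state = TaskState.COMPLETED

task = Task(kind="map", task_id=0)
task.assign("worker-7")             # hypothetical worker name
state_after_assign = task.state
task.complete()
```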

Page 14: MapReduce: Simplified Data Processing on Large Clusters

Fault Tolerance

# Worker failure
  - The master pings every worker periodically
  - MapReduce is resilient to large-scale worker failures

# Master failure
  - It is easy to make the master write periodic checkpoints of the master data structures described above
  - If the master task dies, a new copy can be started from the last checkpointed state
  - In the current implementation, the MapReduce computation stops if the master fails
  - Clients can check for this condition and retry the MapReduce operation if they desire

# Semantics in the presence of failures
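The worker-failure handling can be sketched as a function the master would run after each round of pings. The rescheduling rule follows the paper's description (tasks in progress on a failed worker are reset to idle, and so are its completed map tasks, because their output sits on that machine's local disk); the function name, the dictionary layout, and the timeout values are assumptions made for this example.

```python
def handle_worker_timeouts(tasks, last_pong, now, timeout):
    """Reset tasks of workers that have not answered a ping within `timeout`.

    tasks: list of dicts with keys "kind", "state", "worker".
    last_pong: worker name -> time of its last ping response.
    """
    failed = {w for w, t in last_pong.items() if now - t > timeout}
    for task in tasks:
        if task["worker"] not in failed:
            continue
        # In-progress tasks are re-run; completed map tasks are re-run too,
        # because their output lived on the failed machine's local disk.
        redo = task["state"] == "in-progress" or (
            task["state"] == "completed" and task["kind"] == "map")
        if redo:
            task["state"], task["worker"] = "idle", None
    return failed

tasks = [
    {"kind": "map", "state": "completed", "worker": "w1"},
    {"kind": "reduce", "state": "completed", "worker": "w1"},
    {"kind": "reduce", "state": "in-progress", "worker": "w1"},
    {"kind": "map", "state": "in-progress", "worker": "w2"},
]
failed = handle_worker_timeouts(tasks, {"w1": 0.0, "w2": 9.0}, now=10.0, timeout=5.0)
```

Completed reduce tasks are left alone in this sketch because, per the paper, their output is stored in the global file system rather than on the worker.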

Page 15: MapReduce: Simplified Data Processing on Large Clusters

Locality

# Storing input in GFS conserves network bandwidth: GFS divides each file into 64 MB blocks and stores several copies of each block (typically 3 copies) on different machines.

# When running large MapReduce operations on a significant fraction of the workers in a cluster, most input data is read locally and consumes no network bandwidth.
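A locality-aware choice of worker might look like the sketch below: prefer an idle machine that already holds a replica of the input block, and fall back to any idle machine otherwise. The function `pick_worker` and the machine names are hypothetical, and the paper's intermediate option (a machine near a replica, for example on the same network switch) is left out for brevity.

```python
def pick_worker(block_replicas, idle_workers):
    """Prefer an idle worker that already stores a replica of the input block.

    Returns (worker, is_local). Falls back to the first idle worker, in
    which case the input has to be read over the network.
    """
    for worker in idle_workers:
        if worker in block_replicas:
            return worker, True
    return idle_workers[0], False

# Made-up replica placement: each block is stored on three machines.
replicas = {"block-0": {"m1", "m2", "m3"}, "block-1": {"m4", "m5", "m6"}}
idle = ["m7", "m2", "m4"]

choice_0 = pick_worker(replicas["block-0"], idle)   # a replica holder is idle
choice_1 = pick_worker({"m8", "m9"}, idle)          # no replica holder is idle
```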

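Page 16: MapReduce: Simplified Data Processing on Large Clusters

Task Granularity

# Ideally, the number of map tasks (M) and reduce tasks (R) is much larger than the number of machines
  - improves dynamic load balancing
  - speeds up recovery when a worker fails

# Master
  - makes O(M + R) scheduling decisions
  - keeps O(M × R) state in memory, so there are practical bounds on M and R
  - this state amounts to approximately one byte per map task / reduce task pair

# R is often constrained by users, because the output of each reduce task ends up in a separate output file

# MapReduce computations are often run with M = 200,000 and R = 5,000 on 2,000 worker machines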
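Plugging the quoted figures into a few lines of arithmetic shows why this granularity helps: each machine sees many small map tasks, so work can be rebalanced or redone in small units. The variable names are chosen for this example only.

```python
M = 200_000        # map tasks
R = 5_000          # reduce tasks
workers = 2_000    # worker machines

scheduling_decisions = M + R               # the master decides once per task
map_tasks_per_worker = M / workers         # average map tasks per machine
reduce_tasks_per_worker = R / workers      # average reduce tasks per machine
```

With about 100 map tasks per machine on average, the map work of a failed worker can be spread across many other machines.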

Page 17: MapReduce: Simplified Data Processing on Large Clusters

Backup Tasks

# "Straggler": a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation

# When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks.

# The task is marked as completed whenever either the primary or the backup execution completes.
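The effect of backup executions on total running time can be illustrated with a toy model in which all tasks start together and a task finishes as soon as either of its copies finishes. The durations, the backup start time, and the function `job_time` are all made up for this sketch; it models timing only, not the scheduler.

```python
def job_time(durations, backup_start=None, backup_duration=1.0):
    """Completion time of a job whose tasks all start at t=0 in parallel.

    If backup_start is given, every task still running at that time gets a
    backup execution; the task finishes when either copy finishes.
    """
    finish_times = []
    for primary in durations:
        finish = primary
        if backup_start is not None and primary > backup_start:
            finish = min(primary, backup_start + backup_duration)
        finish_times.append(finish)
    return max(finish_times)

durations = [1.0, 1.1, 0.9, 8.0]   # made-up task times; the last is a straggler
without_backup = job_time(durations)
with_backup = job_time(durations, backup_start=1.2)
```

In this toy run the straggler dominates the job without backups, while with a backup started near the end the job finishes shortly after the other tasks.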

Page 18: MapReduce: Simplified Data Processing on Large Clusters
Page 19: MapReduce: Simplified Data Processing on Large Clusters

Combiner Function

[Diagram: a master coordinating three map tasks and three reduce tasks across nodes N1, N2, and N3; labels: network traffic, CPU performance]
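In the paper, a combiner partially merges map output on the map machine before it is sent over the network, which is the traffic saving this slide points at. The sketch below counts how many pairs would cross the network with and without a combiner for word counting; the function names and input splits are assumptions made for illustration.

```python
from collections import Counter

def map_words(text):
    """Map: emit <word, 1> for every word."""
    return [(word, 1) for word in text.split()]

def combine(pairs):
    """Combiner: partially sum counts on the map machine before sending."""
    combined = Counter()
    for word, count in pairs:
        combined[word] += count
    return list(combined.items())

def reduce_counts(all_pairs):
    """Reduce: sum the (possibly pre-combined) counts per word."""
    totals = Counter()
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)

splits = ["the the the cat", "the dog the dog"]    # made-up input splits
raw = [map_words(s) for s in splits]
sent_without = sum(len(p) for p in raw)            # pairs sent, no combiner
combined = [combine(p) for p in raw]
sent_with = sum(len(p) for p in combined)          # pairs sent, with combiner
result = reduce_counts(pair for p in combined for pair in p)
```

This works because addition is commutative and associative, so summing partial counts first gives the same totals as summing the raw pairs.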

Page 20: MapReduce: Simplified Data Processing on Large Clusters

Status Information

# The master runs an internal HTTP server and exports a set of status pages for human consumption

# The status pages show the progress of the computation
  - how many tasks have been completed
  - how many are in progress
  - bytes of input, bytes of intermediate data, bytes of output
  - processing rates

# The user can use this data to predict how long the computation will take
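A simple way to turn the reported byte counts into a prediction is to assume the observed processing rate stays constant. The function below is a hypothetical helper, not something the status pages provide, and the numbers are made up.

```python
def estimate_remaining_seconds(bytes_done, bytes_total, elapsed_seconds):
    """Naive estimate: assume the observed processing rate stays constant."""
    if bytes_done <= 0:
        return None  # no rate information yet
    rate = bytes_done / elapsed_seconds          # bytes per second so far
    return (bytes_total - bytes_done) / rate

# Made-up reading: 250 GB of 1 TB processed after 300 seconds.
eta = estimate_remaining_seconds(bytes_done=250e9, bytes_total=1e12,
                                 elapsed_seconds=300)
```

A constant rate is a rough assumption; stragglers near the end of a job can make the real finish time later than such an estimate.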

Page 21: MapReduce: Simplified Data Processing on Large Clusters

Conclusions

Reasons for the success of MapReduce:

# First, the model is easy to use, even for programmers without experience with parallel and distributed systems.

# Second, a large variety of problems are easily expressible as MapReduce computations.

# Third, we have developed an implementation of MapReduce that scales to large clusters comprising thousands of machines.

Lessons learned:

# First, restricting the programming model makes it easy to parallelize and distribute computations and to make such computations fault-tolerant.

# Second, network bandwidth is a scarce resource.

# Third, redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss.