MapReduce: simplified data processing on large clusters





Page 1: MapReduce: simplified data processing on large clusters

MapReduce: simplified data processing on large clusters
Jeffrey Dean and Sanjay Ghemawat
Presented by: Venkataramana Chunduru

Page 2: MapReduce: simplified data processing on large clusters

AGENDA
- GFS
- MapReduce
- Hadoop

Page 3: MapReduce: simplified data processing on large clusters

Motivation
- Input data is large: the whole Web, billions of pages.
- Lots of machines; use them efficiently.
- Google needed a good distributed file system.
- Why not use existing file systems? Google's problems are different from anyone else's: GFS is designed for Google apps and workloads, and Google apps are designed for GFS.

Page 4: MapReduce: simplified data processing on large clusters

NFS Disadvantages
- Network congestion and heavy disk activity on the NFS server adversely affect performance.
- When a client attempts to mount an unavailable export, the client system can hang, although this can be mitigated with specific mount options.
- If the server hosting the exported file system becomes unavailable for any reason, no one can access the resource.
- NFS has security problems because its design assumes a trusted network.

Page 5: MapReduce: simplified data processing on large clusters

GFS Assumptions
- High component failure rates: inexpensive commodity components fail all the time.
- Modest number of huge files: just a few million, each 100 MB or larger (multi-GB files typically).
- Files are write-once, mostly appended to (perhaps concurrently).
- Large streaming reads.

Page 6: MapReduce: simplified data processing on large clusters

GFS Design Decisions
- Files are stored as chunks: fixed size (64 MB), so clients can locate data by simple arithmetic (sketched below).
- Reliability through replication: each chunk is replicated across 3+ chunkservers.
- Single master to coordinate access and keep metadata: simple, centralized management.
- No data caching: little benefit due to large data sets and streaming reads.
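Because chunks have a fixed size, a client can work out which chunk holds any byte of a file with plain integer arithmetic, then ask the master only for that chunk's location. A minimal sketch of the idea (the class and method names are illustrative, not GFS's actual interface):

public class ChunkMath {
    // GFS uses a fixed chunk size of 64 MB.
    static final long CHUNK_SIZE = 64L * 1024 * 1024;

    // Which chunk of the file holds the given byte offset?
    static long chunkIndex(long fileOffset) {
        return fileOffset / CHUNK_SIZE;
    }

    // Where within that chunk does the byte sit?
    static long chunkOffset(long fileOffset) {
        return fileOffset % CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long offset = 200L * 1024 * 1024; // byte 200 MB into a file
        System.out.println("chunk index:  " + chunkIndex(offset));  // 3
        System.out.println("chunk offset: " + chunkOffset(offset)); // 8 MB, in bytes
    }
}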

Page 7: MapReduce: simplified data processing on large clusters

GFS Architecture

Page 8: MapReduce: simplified data processing on large clusters

Single Master
From distributed systems we know it is a:
- Single point of failure.
- Scalability bottleneck.
GFS solutions:
- Shadow masters.
- Minimize master involvement.
Simple, and good enough.

Page 9: MapReduce: simplified data processing on large clusters

Metadata (1/2)
Global metadata is stored on the master:
- File and chunk namespaces.
- Mapping from files to chunks.
- Locations of each chunk's replicas.
All in memory (64 bytes/chunk; see the estimate below):
- Fast.
- Easily accessible.
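At roughly 64 bytes of metadata per 64 MB chunk, even petabyte-scale deployments fit comfortably in the master's memory. A back-of-the-envelope check (the 1 PiB figure is illustrative):

public class MetadataEstimate {
    public static void main(String[] args) {
        long chunkSize = 64L * 1024 * 1024; // 64 MB per chunk
        long bytesPerChunk = 64;            // ~64 bytes of metadata per chunk
        long dataStored = 1L << 50;         // 1 PiB of file data

        long chunks = dataStored / chunkSize;   // 16,777,216 chunks
        long metadata = chunks * bytesPerChunk; // 1 GiB of metadata

        System.out.println("chunks: " + chunks);
        System.out.println("metadata: " + metadata / (1L << 30) + " GiB");
    }
}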

Page 10: MapReduce: simplified data processing on large clusters

Metadata (2/2)
Master has an operation log for persistent logging of critical metadata updates:
- Persistent on local disk.
- Replicated.
- Checkpoints for faster recovery.

Page 11: MapReduce: simplified data processing on large clusters

Deployment in Google
- 50+ GFS clusters.
- Each with thousands of storage nodes.
- Managing petabytes of data.
- GFS sits underneath Bigtable.

Page 12: MapReduce: simplified data processing on large clusters

Conclusion of GFS
GFS demonstrates how to support large-scale processing workloads on commodity hardware:
- Designed to tolerate frequent component failures.
- Optimized for huge files that are mostly appended to and read.
- Go for simple solutions.
GFS has met Google's storage needs... it must be good!

Page 13: MapReduce: simplified data processing on large clusters

Example for MapReduce
Page 1: the weather is good
Page 2: today is good
Page 3: good weather is good.

Page 14: MapReduce: simplified data processing on large clusters

Map output
Worker 1: (the 1), (weather 1), (is 1), (good 1)
Worker 2: (today 1), (is 1), (good 1)
Worker 3: (good 1), (weather 1), (is 1), (good 1)

Page 15: MapReduce: simplified data processing on large clusters

Reduce Input
Worker 1: (the 1)
Worker 2: (is 1), (is 1), (is 1)
Worker 3: (weather 1), (weather 1)
Worker 4: (today 1)
Worker 5: (good 1), (good 1), (good 1), (good 1)

Page 16: MapReduce: simplified data processing on large clusters

Reduce Output
Worker 1: (the 1)
Worker 2: (is 3)
Worker 3: (weather 2)
Worker 4: (today 1)
Worker 5: (good 4)
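The whole three-page example can be reproduced in a few lines of plain Java. The following is a minimal in-memory simulation of the map, shuffle, and reduce phases (a model of the idea, not the Hadoop API):

import java.util.*;

public class WordCountSim {
    public static void main(String[] args) {
        List<String> pages = List.of(
            "the weather is good",
            "today is good",
            "good weather is good");

        // Map: each page emits (word, 1) pairs.
        // Shuffle: group the pairs by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String page : pages)
            for (String word : page.split(" "))
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);

        // Reduce: sum the values for each key.
        grouped.forEach((word, ones) ->
            System.out.println("(" + word + " " +
                ones.stream().mapToInt(Integer::intValue).sum() + ")"));
        // Prints: (good 4) (is 3) (the 1) (today 1) (weather 2)
    }
}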

Page 17: MapReduce: simplified data processing on large clusters

MapReduce Architecture

Page 18: MapReduce: simplified data processing on large clusters

Parallel Execution

Page 19: MapReduce: simplified data processing on large clusters

Fault Tolerance
Worker failure:
- Detect failure via periodic heartbeats.
- Re-execute completed and in-progress map tasks.
- Re-execute in-progress reduce tasks.
- Task completion committed through the master.
Master failure:
- Could handle it, but don't yet (master failure is unlikely).
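A minimal sketch of heartbeat-based failure detection, assuming the master records each worker's last check-in time (the class, the timeout, and the reschedule hook are illustrative, not the actual Google or Hadoop code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatMonitor {
    private static final long TIMEOUT_MS = 10_000; // illustrative timeout
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a worker checks in.
    public void onHeartbeat(String workerId) {
        lastHeartbeat.put(workerId, System.currentTimeMillis());
    }

    // Called periodically by the master: a worker silent for too long is
    // presumed dead, and its map tasks are re-executed elsewhere because
    // their output lives on the failed machine's local disk.
    public void sweep() {
        long now = System.currentTimeMillis();
        lastHeartbeat.forEach((worker, last) -> {
            if (now - last > TIMEOUT_MS) {
                lastHeartbeat.remove(worker);
                System.out.println(worker + " presumed dead; rescheduling its tasks");
                // reschedule(worker); // hand its tasks back to the scheduler (omitted)
            }
        });
    }
}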

Page 20: MapReduce: simplified data processing on large clusters

Refinement
- Different partitioning functions.
- Combiner function (see the sketch below).
- Different input/output types.
- Skipping bad records.
- Local execution.
- Status info.
- Counters.
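The combiner deserves a quick illustration: it runs the reduce function over each map worker's local output before anything crosses the network, shrinking the intermediate data. A plain-Java sketch of the effect (illustrative, not the Hadoop API; Hadoop enables it with job.setCombinerClass):

import java.util.*;

public class CombinerEffect {
    public static void main(String[] args) {
        // One map worker's raw output for "good weather is good":
        String[] emitted = {"good", "weather", "is", "good"};
        System.out.println("pairs shuffled without combiner: " + emitted.length); // 4

        // With a combiner, equal keys are pre-summed on the map node.
        Map<String, Integer> combined = new HashMap<>();
        for (String w : emitted) combined.merge(w, 1, Integer::sum);
        System.out.println("pairs shuffled with combiner: " + combined.size());   // 3
    }
}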

Page 21: MapReduce: simplified data processing on large clusters

What's Hadoop?
- Framework for running applications on large clusters of commodity hardware.
- Scale: petabytes of data on thousands of nodes.
- Includes storage (HDFS) and processing (MapReduce).
- Supports the Map/Reduce programming model.
Requirements:
- Economy: use clusters of commodity computers.
- Easy to use: users need not deal with the complexity of distributed computing.
- Reliable: can handle node failures automatically.

Page 22: MapReduce: simplified data processing on large clusters

What's Hadoop? (contd.)
Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.
Here's what makes Hadoop especially useful:
- Scalable
- Economical
- Efficient
- Reliable

Page 23: MapReduce: simplified data processing on large clusters

HDFS
Hadoop implements MapReduce using the Hadoop Distributed File System (HDFS); see the architecture figure on the next slide.
MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located (see the sketch below).
Hadoop has been demonstrated on clusters with 2,000 nodes. The current design target is 10,000-node clusters.
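Processing data "where it is located" boils down to a scheduling preference: given the replica locations of a block, assign the task to an idle node that already stores the block, and fall back to a remote read only when no such node is free. A minimal sketch with made-up block and node names:

import java.util.*;

public class LocalityScheduler {
    // Block id -> nodes holding a replica (illustrative data).
    static Map<String, List<String>> replicas = Map.of(
        "block-1", List.of("node-A", "node-C"),
        "block-2", List.of("node-B", "node-C"));

    // Prefer a node that already stores the block; otherwise pick any idle node.
    static String schedule(String block, Set<String> idleNodes) {
        for (String node : replicas.getOrDefault(block, List.of()))
            if (idleNodes.contains(node)) return node; // data-local task
        return idleNodes.iterator().next();            // remote read (assumes one is idle)
    }

    public static void main(String[] args) {
        Set<String> idle = Set.of("node-A", "node-B");
        System.out.println("block-1 -> " + schedule("block-1", idle)); // node-A
        System.out.println("block-2 -> " + schedule("block-2", idle)); // node-B
    }
}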

Page 24: MapReduce: simplified data processing on large clusters

Hadoop Architecture

[Diagram: input data is split into DFS blocks, each replicated three times across the nodes of the Hadoop cluster; Map tasks run on the nodes holding the blocks, and a Reduce task combines their output into the results.]

Page 25: MapReduce: simplified data processing on large clusters

Sample Hadoop Code
Sample text files as input:
$ bin/hadoop dfs -ls /usr/joe/wordcount/input/
/usr/joe/wordcount/input/file01
/usr/joe/wordcount/input/file02
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file01
Hello World, Bye World!
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file02
Hello Hadoop, Goodbye to hadoop.
Run the application:
$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output
Output:
$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
Bye 1
Goodbye 1
Hadoop, 1
Hello 2
World! 1
World, 1
hadoop. 1
to 1
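The slides invoke org.myorg.WordCount but never show its source. For reference, here is a minimal sketch of such a class written against Hadoop's newer org.apache.hadoop.mapreduce API (the tutorial these slides draw on used the older org.apache.hadoop.mapred API, so details differ):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: emit (token, 1) for every whitespace-delimited token in a line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce: sum the counts for each token.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // the combiner refinement from earlier
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}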

Page 26: MapReduce: simplified data processing on large clusters

Contd…
Notice that the inputs differ from the first version we looked at, and how they affect the outputs.
Now, let's plug in a pattern file that lists the word patterns to be ignored, via the DistributedCache:
$ hadoop dfs -cat /user/joe/wordcount/patterns.txt
\.
\,
\!
to
Run it again, this time with more options:
$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount -Dwordcount.case.sensitive=true /usr/joe/wordcount/input /usr/joe/wordcount/output -skip /user/joe/wordcount/patterns.txt
As expected, the output:
$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
Bye 1
Goodbye 1
Hadoop 1
Hello 2
World 2
hadoop 1

Page 27: MapReduce: simplified data processing on large clusters

Contd…
Run it once more, this time switching off case-sensitivity:
$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount -Dwordcount.case.sensitive=false /usr/joe/wordcount/input /usr/joe/wordcount/output -skip /user/joe/wordcount/patterns.txt
Sure enough, the output:
$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
bye 1
goodbye 1
hadoop 2
hello 2
world 2
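The -skip and case-sensitivity options come from extending the mapper. The sketch below captures the idea, assuming the patterns file has already been copied into the task's working directory; the configuration property matches the command lines above, but the file-loading detail is illustrative (the original tutorial distributes the file through DistributedCache):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SkipPatternsMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Set<String> patterns = new HashSet<>();
    private boolean caseSensitive;

    @Override
    protected void setup(Context context) throws IOException {
        caseSensitive = context.getConfiguration()
            .getBoolean("wordcount.case.sensitive", true);
        // Illustrative: read the skip patterns from a local copy of patterns.txt.
        try (BufferedReader in = new BufferedReader(new FileReader("patterns.txt"))) {
            String line;
            while ((line = in.readLine()) != null) patterns.add(line);
        }
    }

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = caseSensitive ? value.toString()
                                    : value.toString().toLowerCase();
        for (String pattern : patterns)
            line = line.replaceAll(pattern, ""); // drop ignored patterns
        for (String token : line.split("\\s+"))
            if (!token.isEmpty()) context.write(new Text(token), one);
    }
}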

Page 28: MapReduce: simplified data processing on large clusters

Hadoop
HDFS assumes that hardware is unreliable and will eventually fail.
Similar to RAID, except that HDFS can replicate data across several machines, which provides:
- Fault tolerance
- Extremely high-capacity storage

Page 29: MapReduce: simplified data processing on large clusters

Hadoop
"Moving computation is cheaper than moving data."
HDFS is said to be rack-aware: it knows which rack each node sits on and uses that network topology when placing block replicas.

Page 30: MapReduce: simplified data processing on large clusters

Who uses Hadoop?
Facebook uses Hadoop to analyze user behavior and the effectiveness of ads on the site.
The tech team at The New York Times rented computing power on Amazon's cloud and used Hadoop to convert 11 million archived articles, dating back to 1851, into digital and searchable documents. They turned around in a single day a job that otherwise would have taken months.

Page 31: MapReduce: simplified data processing on large clusters

Who uses Hadoop?
Besides Yahoo!, many other organizations are using Hadoop to run large distributed computations. Some of them include:
- A9.com
- Facebook
- Fox Interactive Media
- IBM
- ImageShack
- ISI
- Joost
- Last.fm
- Powerset
- The New York Times
- Rackspace
- Veoh

Page 32: MapReduce: simplified data processing on large clusters

Yahoo! Launches World's Largest Hadoop Production Application
Yahoo! recently launched what we believe is the world's largest Apache Hadoop production application. The Yahoo! Search Webmap is a Hadoop application that runs on a more-than-10,000-core Linux cluster and produces data that is now used in every Yahoo! Web search query.
The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the Internet, along with a vast array of data about every page and site. This derived data feeds the machine-learned ranking algorithms at the heart of Yahoo! Search.

Page 33: MapReduce: simplified data processing on large clusters

Yahoo's Hadoop
One of Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds, beating the previous record of 297 seconds in the annual general-purpose (Daytona) terabyte sort benchmark. The sort benchmark, created in 1998 by Jim Gray, specifies the input data (10 billion 100-byte records), which must be completely sorted and written to disk. This is the first time that either a Java or an open-source program has won. Yahoo is both the largest user of Hadoop, with 13,000+ nodes running hundreds of thousands of jobs a month, and the largest contributor, although non-Yahoo usage and contributions are increasing rapidly.
The cluster statistics were:
- 910 nodes, 2 quad-core Xeons @ 2.0 GHz per node
- 4 SATA disks per node, 8 GB RAM per node
- 1 gigabit Ethernet on each node, 40 nodes per rack
- 8 gigabit Ethernet uplinks from each rack to the core
- Red Hat Enterprise Linux Server release 5.1 (kernel 2.6.18)
- Sun Java JDK 1.6.0_05-b13

Page 34: MapReduce: simplified data processing on large clusters

Process Diagram

Page 35: MapReduce: simplified data processing on large clusters

Map/Reduce Processes
- Launching application
  - User application code
  - Submits a specific kind of Map/Reduce job
- JobTracker
  - Handles all jobs
  - Makes all scheduling decisions
- TaskTracker
  - Manager for all tasks on a given node
- Task
  - Runs an individual map or reduce fragment for a given job
  - Forks from the TaskTracker

Page 36: MapReduce: simplified data processing on large clusters

Hadoop Map-Reduce Architecture
Master-slave architecture.
Map-Reduce master "JobTracker":
- Accepts MR jobs submitted by users
- Assigns Map and Reduce tasks to TaskTrackers
- Monitors task and TaskTracker status; re-executes tasks upon failure
Map-Reduce slaves "TaskTrackers":
- Run Map and Reduce tasks upon instruction from the JobTracker
- Manage storage and transmission of intermediate output

Page 37: MapReduce: simplified data processing on large clusters

Imp Links
http://public.yahoo.com/gogate/hadoop-tutorial/start-tutorial.html
http://www.youtube.com/watch?v=5Eib_H_zCEY&feature=related
http://www.youtube.com/watch?v=yjPBkvYh-ss&feature=related
http://labs.google.com/papers/gfs-sosp2003.pdf

Page 38: MapReduce: simplified data processing on large clusters

Thank you!!!!!