CS 525 Advanced Distributed Systems, Spring 09
Indranil Gupta (Indy)
Lecture 4
The Grid. Clouds.
January 29, 2009
CS 525 Advanced Distributed Systems
Spring 09
Two Questions We’ll Try to Answer
• What is the Grid? Basics, no hype.
• What is its relation to p2p?
Example: Regional Atmospheric Modeling System, ColoState U
• Hurricane Georges, 17 days in Sept 1998
– “RAMS modeled the mesoscale convective complex that dropped so much rain, in good agreement with recorded data”
– Used 5 km spacing instead of the usual 10 km
– Ran on 256+ processors
• Can one run such a program without access to a supercomputer?
Distributed Computing Resources
• Wisconsin
• MIT
• NCSA
An Application Coded by a Physicist
• Four jobs: Job 0, Job 1, Job 2, Job 3
• Output files of Job 0 are input to Job 2
• Output files of Job 2 are input to Job 3
• Jobs 1 and 2 can be concurrent
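The job dependencies above form a small DAG. A minimal Python sketch of how a scheduler might reason about it follows. Only the two file dependencies stated on the slide are encoded; Job 1's inputs are not given, so it is assumed here to have no prerequisites, and the names `DEPENDS_ON`, `can_run_concurrently`, and `execution_order` are illustrative, not part of Condor or Globus.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# job -> set of jobs whose output files it needs (from the slide).
# Assumption: Job 1 has no stated inputs, so it has no prerequisites here.
DEPENDS_ON = {0: set(), 1: set(), 2: {0}, 3: {2}}

def ancestors(job, deps):
    """Return every job that must finish before `job` can start."""
    seen = set()
    stack = list(deps[job])
    while stack:
        j = stack.pop()
        if j not in seen:
            seen.add(j)
            stack.extend(deps[j])
    return seen

def can_run_concurrently(a, b, deps=DEPENDS_ON):
    """Two jobs may overlap if neither depends (transitively) on the other."""
    return a != b and a not in ancestors(b, deps) and b not in ancestors(a, deps)

def execution_order(deps=DEPENDS_ON):
    """One valid sequential order that respects all dependencies."""
    return list(TopologicalSorter(deps).static_order())
```

Under these assumptions `can_run_concurrently(1, 2)` holds, while Job 3 can never overlap with Job 0 or Job 2.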
An Application Coded by a Physicist
Job 2
• Output files of Job 0 are input to Job 2
• Output files of Job 2 are input to Job 3
• Files: several GBs
• May take several hours/days
• 4 stages of a job: Init, Stage in, Execute, Stage out, Publish
• Computation intensive, so massively parallel
[Figure: Jobs 0, 1, 2, and 3 distributed across the Wisconsin, MIT, and NCSA sites]
[Figure: Jobs 0, 1, 2, and 3 across the Wisconsin, MIT, and NCSA sites, with the Condor protocol and the Globus protocol labeled]
Globus Protocol
[Figure: Jobs 0, 1, 2, and 3 across the Wisconsin, MIT, and NCSA sites]
• Internal structure of different sites invisible to Globus
• External allocation & scheduling
• Stage in & stage out of files
Condor Protocol
[Figure: Jobs 0 and 3 at the Wisconsin site]
• Internal allocation & scheduling
• Monitoring
• Distribution and publishing of files
Tiered Architecture (OSI 7 layer-like)
Layers, top to bottom:
• High energy Physics apps
• Resource discovery, replication, brokering
• Globus, Condor
• Workstations, LANs
Opportunity for crossover ideas from p2p systems
The Grid Today
• Some are 40 Gbps links! (The TeraGrid links)
• “A parallel Internet”
Globus Alliance
• Alliance involves U. Chicago, Argonne National Laboratory, USC-ISI, U. Edinburgh, Swedish Center for Parallel Computers
• Activities: research, testbeds, software tools, applications
• Globus Toolkit (latest ver - GT3): “The Globus Toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. Its latest version, GT3, is the first full-scale implementation of new Open Grid Services Architecture (OGSA).”
More
• Entire community, with multiple conferences, get-togethers (GGF), and projects
• Grid Projects: http://www-fp.mcs.anl.gov/~foster/grid-projects/
• Grid Users:
– Today: Core is the physics community (since the Grid originates from the GriPhyN project)
– Tomorrow: biologists, large-scale computations (nug30 already)?
Some Things Grid Researchers Consider Important
• Single sign-on: collective job set should require once-only user authentication
• Mapping to local security mechanisms: some sites use Kerberos, others use Unix
• Delegation: credentials to access resources inherited by subcomputations, e.g., job 0 to job 1
• Community authorization: e.g., third-party authentication
Grid History – 1990’s
• CASA network: linked 4 labs in California and New Mexico
– Paul Messina: Massively parallel and vector supercomputers for computational chemistry, climate modeling, etc.
• Blanca: linked sites in the Midwest
– Charlie Catlett, NCSA: multimedia digital libraries and remote visualization
• More testbeds in Germany & Europe than in the US
• I-way experiment: linked 11 experimental networks
– Tom DeFanti, U. Illinois at Chicago and Rick Stevens, ANL: for a week in Nov 1995, a national high-speed network infrastructure. 60 application demonstrations, from distributed computing to virtual reality collaboration.
• I-Soft: secure sign-on, etc.
Trends: Technology
• Doubling Periods – storage: 12 mos, bandwidth: 9 mos, and (what law is this?) cpu speed: 18 mos
• Then and Now
Bandwidth
– 1985: mostly 56 Kbps links nationwide
– 2004: 155 Mbps links widespread
Disk capacity
– Today’s PCs have 100 GBs, same as a 1990 supercomputer
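To make the doubling periods concrete, here is a small Python sketch of the arithmetic. It assumes steady exponential growth, which the slide implies but does not state; the helper names are illustrative. It also works backward from the two bandwidth data points (56 Kbps in 1985, 155 Mbps in 2004) to the doubling period they imply for widely deployed links.

```python
import math

# Doubling periods quoted on the slide, in months.
DOUBLING_MONTHS = {"storage": 12, "bandwidth": 9, "cpu": 18}

def growth_factor(months_elapsed, doubling_period_months):
    """Capacity multiplier after `months_elapsed` at a fixed doubling period."""
    return 2 ** (months_elapsed / doubling_period_months)

def implied_doubling_period(old, new, months_elapsed):
    """Doubling period (months) implied by growing from `old` to `new`."""
    return months_elapsed / math.log2(new / old)

# Growth over ten years for each resource.
ten_years = {k: growth_factor(120, p) for k, p in DOUBLING_MONTHS.items()}

# 56 Kbps (1985) to 155 Mbps (2004), both taken from the slide.
link_period = implied_doubling_period(56e3, 155e6, (2004 - 1985) * 12)
```

Over ten years storage grows by 2^10 = 1024x, while cpu speed grows by only about 100x. The deployed-link data points imply a doubling period of roughly 20 months, slower than the 9-month figure, which presumably tracks best available rather than typical bandwidth.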
Trends: Users
• Then and Now
Biologists:
– 1990: were running small single-molecule simulations
– 2004: want to calculate structures of complex macromolecules, want to screen thousands of drug candidates
Physicists:
– 2006: CERN’s Large Hadron Collider produced 10^15 B/year
• Trends in Technology and User Requirements: Independent or Symbiotic?
Prophecies
In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating “like a power company or water company”.
Plug your thin client into the computing Utility and Play your favorite Intensive Compute & Communicate Application
– [Will this be a reality with the Grid?]
Grid: “We must address scale & failure”
P2P: “We need infrastructure”
Definitions
Grid
• “Infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities” (1998)
• “A system that coordinates resources not subject to centralized control, using open, general-purpose protocols to deliver nontrivial QoS” (2002)
P2P
• “Applications that takes advantage of resources at the edges of the Internet” (2000)
• “Decentralized, self-organizing distributed systems, in which all or most communication is symmetric” (2002)
CS 525’s take on the Grid: good, legal applications without intellectual fodder
CS 525’s take on P2P: clever designs without good, legal applications
Grid versus P2P - Pick your favorite
Applications
Grid
• Often complex & involving various combinations of
– Data manipulation
– Computation
– Tele-instrumentation
• Wide range of computational models, e.g.
– Embarrassingly parallel
– Tightly coupled
– Workflow
• Consequence
– Complexity often inherent in the application itself
P2P
• Some
– File sharing
– Number crunching
– Content distribution
– Measurements
• Legal Applications?
• Consequence
– Low Complexity
Scale and Failure
P2P
• V. large numbers of entities
• Moderate activity
– E.g., 1-2 TB in Gnutella (’01)
• Diverse approaches to failure
– Centralized (SETI)
– Decentralized and Self-Stabilizing
FastTrack 4,277,745
iMesh 1,398,532
eDonkey 500,289
DirectConnect 111,454
Blubster 100,266
FileNavigator 14,400
Ares 7,731
(www.slyck.com, 2/19/’03)
Grid
• Moderate number of entities
– 10s institutions, 1000s users
• Large amounts of activity
– 4.5 TB/day (D0 experiment)
• Approaches to failure reflect assumptions
– E.g., centralized components
Services and Infrastructure
Grid
• Standard protocols (Global Grid Forum, etc.)
• De facto standard software (open source Globus Toolkit)
• Shared infrastructure (authentication, discovery, resource access, etc.)
Consequences
• Reusable services
• Large developer & user communities
• Interoperability & code reuse
P2P
• Each application defines & deploys completely independent “infrastructure”
• JXTA, BOINC, XtremWeb?
• Efforts started to define common APIs, albeit with limited scope to date
Consequences
• New (albeit simple) install per application
• Interoperability & code reuse not achieved
Coolness Factor
Grid P2P
Summary: Grid and P2P
1) Both are concerned with the same general problem
– Resource sharing within virtual communities
2) Both take the same general approach
– Creation of overlays that need not correspond in structure to underlying organizational structures
3) Each has made genuine technical advances, but in complementary directions
– “Grid addresses infrastructure but not yet scale and failure”
– “P2P addresses scale and failure but not yet infrastructure”
4) Complementary strengths and weaknesses => room for collaboration (Ian Foster at UChicago)
Crossover Ideas
Some P2P ideas useful in the Grid
– Resource discovery (DHTs), e.g., how do you make “filenames” more expressive, i.e., a computer cluster resource?
– Replication models, for fault-tolerance, security, reliability
– Membership, i.e., which workstations are currently available?
– Churn-Resistance, i.e., users log in and out; problem difficult since a free host gets an entire computation, not just small files
• All above are open research directions, waiting to be explored!
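As a starting point for the resource-discovery question, here is a toy sketch of DHT-style lookup using consistent hashing. Everything in it is an assumption for illustration: the SHA-1 ring, the `Ring` class, and the `resource_key` string format are hypothetical and not taken from any particular DHT or Grid toolkit.

```python
import bisect
import hashlib

def ring_position(name, bits=32):
    """Map a string to a point on a 2**bits identifier ring (SHA-1 based)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

class Ring:
    """Toy consistent-hashing ring: a key is stored at its clockwise successor."""
    def __init__(self, nodes):
        self._points = sorted((ring_position(n), n) for n in nodes)

    def lookup(self, key):
        pos = ring_position(key)
        idx = bisect.bisect_left(self._points, (pos, ""))
        # Wrap around to the first node when the key is past the last point.
        return self._points[idx % len(self._points)][1]

def resource_key(attrs):
    """Canonical 'filename' for a resource description (hypothetical format)."""
    return ";".join(f"{k}={attrs[k]}" for k in sorted(attrs))
```

The sketch also shows why the slide's question is hard: a lookup succeeds only for an exact attribute match such as `cores=256;os=linux`. A query like "at least 128 cores" hashes to an unrelated point on the ring, so expressive resource queries need more than a plain DHT.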
Cloud Computing
What’s it all about?
A First Step
Life of Ra (a Research Area)
[Figure: popularity of area vs. time. Hype (“Wow!”) rises to a first peak, the end of hype (“This is a hot area!”), followed by a first trough (“I told you so!”).]
Phases over time:
• Young (low-hanging fruits)
• Adolescent (interesting problems)
• Middle Age (solid base, hybrid algorithms)
• Old Age (incremental solutions)
Where is Grid? Where is cloud computing?
How do I identify what stage a research area is in?
1. If there have been no publications in the research area more than 1-2 years old, it is in the “Young” phase.
2. Pick a paper in the last 1 year published in the research area. Read it. If you think that you could have come up with the core idea in that paper (given all the background etc.), then the research area is in its “Young” phase.
3. Find the latest published paper that you think you could have come up with the idea for. If this paper has been cited by one round of papers (but these citing papers themselves have not been cited), then the research area is in the “Adolescent” phase.
4. Do Step 3 above, and if you find that the citing papers themselves have been cited, and so on, then the research area is at least in the “Middle Age” phase.
5. Pick a paper in the last 1-2 years. If you find that there are only incremental developments in these latest published papers, and the ideas may be innovative but are not yielding large enough performance benefits, then the area is mature.
6. If no one works in the research area, or everyone you talk to thinks negatively about the area (except perhaps the inventors of the area), then the area is dead.
What is a cloud?
• It’s a cluster! It’s a supercomputer! It’s a datastore!
• It’s superman!
• None of the above
• Cloud = Lots of storage + compute cycles nearby
Data-intensive Computing
• Computation-Intensive Computing
– Example areas: MPI-based, High-performance computing, Grids
– Typically run on supercomputers (e.g., NCSA Blue Waters)
• Data-Intensive
– Typically store data at datacenters
– Use compute nodes nearby
– Compute nodes run computation services
• In data-intensive computing, the focus shifts from computation to the data: problem areas include
– Storage
– Communication bottleneck
– Moving tasks to data (rather than vice-versa)
– Security
– Availability of Data
– Scalability
Distributed Clouds
• A single-site cloud consists of
– Compute nodes (split into racks)
– Switches, connecting the racks
– Storage (backend) nodes connected to the network
– Front-end for submitting jobs
– Services: physical resource set, software services
• A geographically distributed cloud consists of
– Multiple such sites
– Each site perhaps with a different structure and services
Cirrus Cloud at University of Illinois
[Figure: network diagram with four internal switches, each serving 32 DL160 nodes; two Procurve switches; four storage nodes; and a head node. Link labels: 8 ports, 8 ports, 2 ports, 2 ports. Only the internal switches used for data transfers are shown (1GbE with 48 ports).]
Note: System management, monitoring, and operator console will use a different set of switches not pictured here.
Example: Cirrus Cloud at U. Illinois
• 128 servers. Each has
– 8 cores (total 1024 cores)
– 16 GB RAM
– 2 TB disk
• Backing store of about 250 TB
• Total storage: 0.5 PB
• Gigabit Networking
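A quick check of the totals, as a Python sketch. The slide does not say how the 0.5 PB figure is derived; the assumption here is that it is the sum of per-server disks and the backing store, in decimal units.

```python
SERVERS = 128
CORES_PER_SERVER = 8
DISK_TB_PER_SERVER = 2
BACKING_STORE_TB = 250   # "about 250 TB" on the slide

total_cores = SERVERS * CORES_PER_SERVER
local_disk_tb = SERVERS * DISK_TB_PER_SERVER
# Assumption: total storage = local disks + backing store, 1 PB = 1000 TB.
total_storage_pb = (local_disk_tb + BACKING_STORE_TB) / 1000
```

Under that assumption the local disks contribute 256 TB, and the total comes to about 0.506 PB, consistent with the 0.5 PB on the slide.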
6 Diverse Sites within Cirrus
I. UIUC – Systems Research for Cloud Computing + Cloud Computing Applications
II. Karlsruhe Institute of Tech (KIT, Germany): Grid-style jobs
III. IDA, Singapore
IV. Intel
V. HP
VI. Yahoo!: CMU’s M45 cluster
All will be networked together: see http://www.cloudtestbed.org
What “Services”?
Different Clouds Export different services
• Industrial Clouds
– Amazon S3 (Simple Storage Service): store arbitrary datasets
– Amazon EC2 (Elastic Compute Cloud): upload and run arbitrary images
– Google AppEngine: develop applications within their appengine framework, upload data that will be imported into their format, and run
• Academic Clouds
– Google-IBM Cloud (U. Washington): run apps programmed atop Hadoop
– Cirrus cloud: run (i) apps programmed atop Hadoop and Pig, and (ii) systems-level research on this first generation of cloud computing models
Software “Services”
• Computational
– MapReduce (Hadoop)
– Pig Latin
• Naming and Management
– Zookeeper
– Tivoli, OpenView
• Storage
– HDFS
– PNUTS
Sample Service: MapReduce
• Google uses MapReduce to run 100K jobs per day, processing up to 20 PB of data
• Yahoo! has released open-source software Hadoop that implements MapReduce
• Other companies that have used MapReduce to process their data: A9.com, AOL, Facebook, The New York Times
• Highly-Parallel Data-Processing
What is MapReduce?
• Terms are borrowed from Functional Language (e.g., Lisp)
Sum of squares:
• (map square '(1 2 3 4))
– Output: (1 4 9 16)
[processes each record sequentially and independently]
• (reduce + '(1 4 9 16))
– (+ 16 (+ 9 (+ 4 1) ) )
– Output: 30
[processes set of all records in a batch]
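The same sum of squares can be written with Python's built-in `map` and `functools.reduce`, shown here as a sketch of the functional idea only, not of the distributed framework.

```python
from functools import reduce
from operator import add

def square(x):
    return x * x

# map: each record is processed independently of the others.
squares = list(map(square, [1, 2, 3, 4]))

# reduce: folds the whole batch into one value, left to right:
# ((1 + 4) + 9) + 16, the same nesting as (+ 16 (+ 9 (+ 4 1))).
total = reduce(add, squares)
```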
Map
• Process individual key/value pair to generate intermediate key/value pairs.
Input <filename, file text>:
Welcome Everyone
Hello Everyone
Map output:
Welcome 1
Everyone 1
Hello 1
Everyone 1
Reduce
• Processes and merges all intermediate values associated with each given key assigned to it
Reduce input:
Welcome 1
Everyone 1
Hello 1
Everyone 1
Reduce output:
Everyone 2
Hello 1
Welcome 1
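The word-count example above can be sketched in a few lines of Python. The `run_wordcount` driver is a sequential stand-in for the framework, written only for illustration; the file name is made up, and a real MapReduce or Hadoop job would use that system's own API.

```python
from collections import defaultdict

def map_fn(filename, text):
    """Emit (word, 1) for every word in one <filename, file text> record."""
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    """Merge all intermediate values for one key."""
    return (word, sum(counts))

def run_wordcount(files):
    """Sequential stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for filename, text in files.items():
        for key, value in map_fn(filename, text):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

# Hypothetical input file holding the two lines from the slide.
counts = run_wordcount({"greeting.txt": "Welcome Everyone\nHello Everyone"})
```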
Some Applications
• Distributed Grep:
– Map – Emits a line if it matches the supplied pattern
– Reduce – Copies the intermediate data to output
• Count of URL access frequency
– Map – Processes web log and outputs <URL, 1>
– Reduce – Emits <URL, total count>
• Reverse Web-Link Graph
– Map – Processes web log and outputs <target, source>
– Reduce – Emits <target, list(source)>
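The three applications above differ only in their Map and Reduce functions. The Python sketch below runs each on a tiny sequential driver. The driver, the input formats (numbered lines, log lines whose first field is the URL, and (source, target) link pairs), and all function names are assumptions made for illustration.

```python
import re
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Sequential stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return [reduce_fn(key, values) for key, values in sorted(groups.items())]

# Distributed grep: map emits matching lines, reduce copies them through.
def grep_map(pattern):
    regex = re.compile(pattern)
    # Records are (line_number, line) so duplicate lines stay distinct.
    return lambda rec: [(rec, None)] if regex.search(rec[1]) else []

def identity_reduce(rec, _values):
    return rec[1]

# URL access frequency: map emits <URL, 1>, reduce sums the ones.
def url_map(log_line):
    return [(log_line.split()[0], 1)]   # assumes the URL is the first field

def count_reduce(url, ones):
    return (url, sum(ones))

# Reverse web-link graph: map emits <target, source>, reduce lists sources.
def link_map(link):
    source, target = link               # assumes (source, target) pairs
    return [(target, source)]

def list_reduce(target, sources):
    return (target, sorted(sources))
```

For example, `run_mapreduce(["/a 200", "/b 200", "/a 404"], url_map, count_reduce)` yields one `(URL, count)` pair per distinct URL.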
Programming MapReduce
• Externally: For user
1. Write a Map program (short), write a Reduce program (short)
2. Submit job; wait for result
3. Need to know nothing about parallel/distributed programming!
• Internally: For the cloud (and for us distributed systems researchers)
1. Parallelize Map
2. Transfer data from Map to Reduce
3. Parallelize Reduce
4. Implement Storage for Map input, Map output, Reduce input, and Reduce output
Inside MapReduce
• For the cloud (and for us distributed systems researchers)
1. Parallelize Map: easy! Each map job is independent of the others!
2. Transfer data from Map to Reduce:
• All Map output records with same key assigned to same Reduce task
• Use partitioning function (more soon)
3. Parallelize Reduce: easy! Each reduce job is independent of the others!
4. Implement Storage for Map input, Map output, Reduce input, and Reduce output
• Map input: from distributed file system
• Map output: to local disk (at Map node); uses local file system
• Reduce input: from (multiple) remote disks; uses local file systems
• Reduce output: to distributed file system
local file system = Linux FS, etc.
distributed file system = GFS (Google File System), HDFS (Hadoop Distributed File System)
Internal Workings of MapReduce
Flow of Data
• Input slices are typically 16MB to 64MB.
• Map workers use a partitioning function to store intermediate key/value pairs to the local disk.
– e.g., Hash (key) mod R
[Figure: map workers, partitioning, reduce workers, output files]
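A minimal Python sketch of the Hash(key) mod R partitioning step follows. The choice of CRC32 as the hash and the in-memory buckets (standing in for R local files) are assumptions for illustration; the slide does not name a hash function.

```python
import zlib

def partition(key, num_reduce_tasks):
    """Pick the reduce task for a key: hash(key) mod R."""
    # zlib.crc32 is used because Python's built-in hash() for strings is
    # randomized per process, so different map workers would disagree.
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks

def write_map_output(pairs, num_reduce_tasks):
    """Split one map worker's output into R buckets (stand-ins for local files)."""
    buckets = [[] for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        buckets[partition(key, num_reduce_tasks)].append((key, value))
    return buckets
```

Because the function is deterministic, every map worker sends all records with a given key to the same bucket index, so a single reduce task sees every value for that key.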
Fault Tolerance
• Worker Failure
– Master keeps 3 states for each worker task
• (idle, in-progress, completed)
– Master sends periodic pings to each worker to keep track of it (central failure detector)
• If a worker fails while in-progress, mark the task as idle
• If a map worker fails after completed, mark the task as idle (its output is on the failed node’s local disk)
• Notify the reduce task about the map worker failure
• Master Failure
– Checkpoint
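The worker-failure rules above can be sketched as a small piece of master-side bookkeeping in Python. The `Task` record and `handle_worker_failure` function are hypothetical names for illustration. Leaving completed reduce tasks untouched is an assumption based on reduce output going to the distributed file system.

```python
from dataclasses import dataclass
from typing import Optional

IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

@dataclass
class Task:
    kind: str                      # "map" or "reduce"
    state: str = IDLE
    worker: Optional[str] = None   # worker the task is (or was) assigned to

def handle_worker_failure(tasks, failed_worker):
    """Reset the tasks the master must reschedule after a missed ping.

    Returns the ids of completed map tasks that were reset, so that
    reduce tasks can be notified to read the re-executed output.
    """
    lost_map_outputs = []
    for task_id, task in tasks.items():
        if task.worker != failed_worker:
            continue
        if task.state == IN_PROGRESS:
            task.state, task.worker = IDLE, None
        elif task.state == COMPLETED and task.kind == "map":
            # Map output lives on the failed worker's local disk.
            task.state, task.worker = IDLE, None
            lost_map_outputs.append(task_id)
        # Assumption: completed reduce tasks are left alone, since their
        # output is already in the distributed file system.
    return lost_map_outputs
```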
Locality and Backup tasks
• Locality
– Since cloud has hierarchical topology
– GFS stores 3 replicas of each of 64MB chunks
• Maybe on different racks
– Attempt to schedule a map task on a machine that contains a replica of corresponding input data: why?
• Stragglers (slow nodes)
– Due to Bad Disk, Network Bandwidth, CPU, or Memory.
– Perform backup (replicated) execution of straggler task: task done when first replica complete
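A minimal Python sketch of locality-aware placement for one map task follows. The slide states only the first preference (a machine that holds a replica). The same-rack fallback and the function and parameter names are assumptions added for illustration.

```python
def pick_worker(replica_hosts, idle_workers, rack_of):
    """Choose an idle worker for a map task, preferring data locality.

    replica_hosts: set of hosts holding a replica of the task's input chunk
    idle_workers:  list of hosts currently free, in arrival order
    rack_of:       dict mapping host -> rack id
    """
    if not idle_workers:
        return None
    # 1) A worker that already stores a replica: no network read needed.
    for w in idle_workers:
        if w in replica_hosts:
            return w
    # 2) Assumed fallback: a worker on the same rack as a replica, so the
    #    read stays below the rack switch.
    replica_racks = {rack_of[h] for h in replica_hosts}
    for w in idle_workers:
        if rack_of[w] in replica_racks:
            return w
    # 3) Otherwise any free worker.
    return idle_workers[0]
```

This also suggests an answer to the slide's "why?": reading input from a local disk avoids sending 64MB chunks across rack switches, which are the shared bottleneck in a hierarchical topology.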
Grep
Testbed: 1800 servers, each with 4 GB RAM, dual 2 GHz Xeon, dual 160 GB IDE disks, and Gigabit Ethernet per machine; 100 Gbps aggregate bandwidth
Workload: 10^10 100-byte records, scanned to extract records matching a rare pattern (92K matching records)
Locality optimization helps:
• 1800 machines read 1 TB at peak ~31 GB/s
• W/out this, rack switches would limit to 10 GB/s
Startup overhead is significant for short jobs
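The figures on this slide can be cross-checked with a little arithmetic, sketched here in Python. The workload is read as 10^10 records of 100 bytes, which is the reading consistent with the 1 TB input. Decimal units (1 GB = 10^9 bytes) are assumed.

```python
RECORDS = 10 ** 10
RECORD_BYTES = 100
MACHINES = 1800
PEAK_BYTES_PER_S = 31e9        # ~31 GB/s with the locality optimization
RACK_LIMIT_BYTES_PER_S = 10e9  # limit imposed by rack switches without it

input_bytes = RECORDS * RECORD_BYTES                     # total input size
per_machine_mb_s = PEAK_BYTES_PER_S / MACHINES / 1e6     # average per machine
scan_seconds_at_peak = input_bytes / PEAK_BYTES_PER_S    # lower bound on scan time
locality_gain = PEAK_BYTES_PER_S / RACK_LIMIT_BYTES_PER_S
```

The input is 10^12 bytes (1 TB). At peak each machine reads about 17 MB/s on average, a full scan at peak rate would take about 32 seconds, and locality gives roughly a 3x gain over the rack-switch limit.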
Sort
Workload: 10^10 100-byte records (modeled after TeraSort benchmark)
M = 15000, R = 4000
[Figure: three panels: Normal, No backup tasks, 200 processes killed]
• Backup tasks reduce job completion time a lot!
• System deals well with failures
Discussion Points
• Storage: Is the local write-remote read model good for Map output/Reduce input?
– What happens on node failure?
• Entire Reduce phase needs to wait for all Map tasks to finish
– Why? What is the disadvantage?
• What are the other issues related to our challenges:
– Storage
– Communication bottleneck
– Moving tasks to data (rather than vice-versa)
– Security
– Availability of Data
– Scalability
– Locality: within clouds, or across them
– Inter-cloud/multi-cloud computations
– Other Programming Models?
• Based on MapReduce
• Beyond MapReduce-based ones
• Concern: Do clouds run the risk of going the Grid way?
P2P and Clouds/Grid
• Opportunity to use p2p design techniques, principles, and algorithms in cloud computing
• Cloud computing vs. Grid computing: what are the differences?
Prophecies
In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating “like a power company or water company”.
Plug your thin client into the computing Utility and Play your favorite Intensive Compute & Storage & Communicate Application
– [Will this be a reality with the Grid and Clouds?]
Are we there yet?
???
Are we going towards it?
Administrative Announcements
Student-led paper presentations (see instructions on website)
• Start from February 12th
• Groups of up to 2 students each class, responsible for a set of 3 “Main Papers” on a topic
– 45 minute presentations (total) followed by discussion
– Set up appointment with me to show slides by 5 pm day prior to presentation
• List of papers is up on the website
• Each of the other students (non-presenters) is expected to read the papers before class and turn in a one to two page review of any two of the main set of papers (summary, comments, criticisms and possible future directions)
Announcements (contd.)
• Presentation Deadline: form groups by midnight of January 31 by dropping by my office hours (10.45 am – 12 pm, Tu, Th in 3112 SC)
– Hurry! Some interesting topics are already taken!
– I can help you find partners
• Use course newsgroup for forming groups and discussion: class.cs525
Announcements (contd.)
Projects
• Groups of 2 (need not be same as presentation groups)
• We’ll start detailed discussions “soon” (a few classes into the student-led presentations)
• Please turn in filled-out “Student Infosheets” today or next lecture.
Next week
• No lecture Tuesday February 3 (no office hours either)
• Thursday (February 5) lecture: read Basic Distributed Computing Concepts papers
Backup Slides
Example: Regional Atmospheric Modeling System, ColoState U
• Weather Prediction is inaccurate
• Hurricane Georges, 17 days in Sept 1998
Next Week Onwards
• Student led presentations start
– Organization of presentation is up to you
– Suggested: describe background and motivation for the session topic, present an example or two, then get into the paper topics
• Reviews: You have to submit both an email copy (which will appear on the course website) and a hardcopy (on which I will give you feedback). See website for detailed instructions.
– 1-2 pages only, 2 papers only
Refinements and Extensions
• Local Execution
– For debugging purposes
– Users have control over specific Map tasks
• Status Information
– Master runs an HTTP server
– Status page shows the status of computation
– Link to output file
– Standard Error list
Refinements and Extensions
• Combiner Function
– User defined
– Done within map task.
– Saves network bandwidth.
• Skipping Bad records
– Best solution is to debug & fix
• Not always possible, e.g., with third-party source libraries
– On segmentation fault:
• Send UDP packet to master from signal handler
• Include sequence number of record being processed
– If master sees two failures for same record:
• Next worker is told to skip the record
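The bad-record protocol above can be simulated in Python. This is a sketch under stated assumptions: a Python exception stands in for the segmentation fault, a method call stands in for the UDP packet sent from the signal handler, and the `Master` class and `run_map_task` function are hypothetical names.

```python
from collections import Counter

class Master:
    """Tracks per-record failure reports and decides which records to skip."""
    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self.failures = Counter()

    def report_failure(self, seq_no):
        # Stand-in for the UDP packet carrying the record's sequence number.
        self.failures[seq_no] += 1

    def should_skip(self, seq_no):
        return self.failures[seq_no] >= self.max_failures

def run_map_task(records, map_fn, master):
    """One attempt at a map task; returns its output, or None if it crashed."""
    out = []
    for seq_no, record in enumerate(records):
        if master.should_skip(seq_no):
            continue  # the master has seen two failures on this record
        try:
            out.extend(map_fn(record))
        except Exception:
            master.report_failure(seq_no)
            return None  # the whole attempt is lost and must be re-executed
    return out
```

With a map function that always fails on one record, the first two attempts return `None`, and the third attempt skips that record and completes.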