![Page 1: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/1.jpg)
MapReduce and Dryad
CS227
Li Jin, Jayme DeDona
![Page 2: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/2.jpg)
Outline
• Map Reduce
• Dryad
– Computational Model
– Architecture
– Use cases
– DryadLINQ
![Page 3: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/3.jpg)
Outline
• Map Reduce
• Dryad
– Computational Model
– Architecture
– Use cases
– DryadLINQ
![Page 4: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/4.jpg)
Map/Reduce function
• Map
– For each pair in a set of key/value pairs, produce a new key/value pair.
• Reduce
– For each key
• Look at all the values associated with that key and compute a new value.
![Page 5: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/5.jpg)
Map/Reduce Function Example
![Page 6: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/6.jpg)
Implementation Sketch
• Map’s input pairs divided into M splits
– stored in DFS
• Output of Map divided into R pieces
• One master process is in charge: farms out work to W worker processes.
– each process on a separate computer
![Page 7: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/7.jpg)
Implementation Sketch
• Master partitions splits among some of the workers– Each worker passes pairs to map function
– Results stored in local files• Partitioned into R pieces
– Remaining works perform reduce tasks• The R pieces are partitioned among them
• Place remote procedure calls to map workers to get data
• Put output to DFS
![Page 8: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/8.jpg)
Implementation Sketch
![Page 9: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/9.jpg)
Implementation Sketch
![Page 10: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/10.jpg)
More Details
• Input files split into M pieces, 16MB-64MB each.
• A number of worker machines are started
– Master schedules M map tasks and R reduce tasks to workers, one task at a time
– Typical values:
• M = 200,000
• R = 5000
• 2000 worker machines.
![Page 11: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/11.jpg)
More Details
• Worker assigned a map task processes the corresponding split, calling the map function repeatedly; output buffered in memory
• Buffered output written periodically to local files, partitioned into R regions.
– Locations sent back to master
![Page 12: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/12.jpg)
More Details
• Reduce tasks
– Each handles one partition
– Access data from map workers via RPC
– Data is sorted by key
– All values associated with each key are passed to the reduce function
– Result appended to DFS output file
![Page 13: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/13.jpg)
Coping with Failure
• Master maintains state of each task
– Idle (not started)
– In progress
– Completed
• Master pings workers periodically to determine if they’re up
![Page 14: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/14.jpg)
Coping with Failure
• Worker crashes
– In-progress tasks have state set back to idle
• All output is lost
• Restarted from beginning on another worker
– Completed map tasks
• All output is lost
• Restarted from beginning on another worker
• Reduce tasks using output are notified of new worker
![Page 15: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/15.jpg)
Coping with Failure
• Worker crashes(continued)
– Completed reduce tasks
• Output already on DFS
• No restart necessary
• Master crashes
– Could be recovered from checkpoint
– In practice
• Master crashes are rare
• Entire application is restarted
![Page 16: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/16.jpg)
Counterpoint
• MapReduce: A major step backwards
– http://databasecolumn.vertica.com/database-innovation/mapreduce-a-major-step-backwards/
• A giant step backward in the programming paradigm for large-scale data intensive applications
• Sub optimal. Use brute force instead of indexing
• Not novel at all – it represents a specific implementation of well known techniques nearly 25 years ago
• …
![Page 17: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/17.jpg)
Countercounterpoint
• Mapreduce is not a database system, so don’t judge it as one
• Mapreduce has excellent scalability; the proof of Google’s use
• Mapreduce is cheap and databases are expensive. (As a countercountercounterpointto this, a Vertica guy told me they ran 3000 times faster than a hadoop job in one of their client’s cases)
![Page 18: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/18.jpg)
Outline
• Map Reduce
• Dryad
– Computational Model
– Architecture
– Use cases
– DryadLINQ
![Page 19: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/19.jpg)
Dryad goals
• General-purpose execution environment for distributed, data-parallel applications
– Concentrates on throughput not latency
– Assumes private data center
• Automatic management of scheduling, distribution, fault tolerance, etc.
![Page 20: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/20.jpg)
Outline
• Map Reduce
• Dryad
– Computational Model
– Architecture
– Use cases
– DryadLINQ
![Page 21: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/21.jpg)
Where does Dryad fit in the stack?
• Many programs can be represented as a distributed execution graph
• Dryad is middleware abstraction that runs them for you
– Dryad sees arbitrary graphs
• Simple, regular scheduler, fault-tolerance, etc.
• Independent of programming model
– Above Dryad is graph manipulation
![Page 22: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/22.jpg)
Job = Directed Acyclic Graph
Processing
verticesChannels
(file, pipe,
shared
memory)
Inputs
Outputs
![Page 23: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/23.jpg)
Inputs and Outputs
• “Virtual” graph vertices
• Extensible abstraction
• Partitioned distributed files
– Input file expands to set of vertices
• Each partition is one virtual vertex
– Output vertices write to individual partitions
• Partitions concatenated when outputs completes
![Page 24: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/24.jpg)
Channel Abstraction
• Sequence of structured (typed) items
• Implementation– Temporary disk file
• Items are serialized in buffers
– TCP pipe• Items are serialized in buffers
– Shared-memory FIFO• Pass pointers to items directly
• Simple, general data model
![Page 25: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/25.jpg)
Why a Directed Acyclic Graph?
• Natural “most general” design point
• Allowing cycles causes trouble
• Mistake to be simpler
– Supports full relational algebra and more
• Multiple vertex inputs or outputs of different types
– Layered design
• Generic scheduler, no hard-wired special cases
• Front ends only need to manipulate graphs
![Page 26: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/26.jpg)
Why a general DAG?
• “Uniform” stages aren’t really uniform
![Page 27: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/27.jpg)
Why a general DAG?
• “Uniform” stages aren’t really uniform
![Page 28: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/28.jpg)
Graph complexity composes
• Non-trees common
• E.g. data-dependent re-partitioning
– Combine this with merge trees etc.
Distribute to equal-sized ranges
Sample to estimate histogram
Randomly partitioned inputs
![Page 29: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/29.jpg)
Why no cycles?
• Scheduling is easy
– Vertex can run anywhere once all its inputs are ready.
– Directed-acyclic means there is no deadlock
– Finite-length channels means vertices finish.
![Page 30: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/30.jpg)
Why no cycles?
• Scheduling is easy
– Vertex can run anywhere once all its inputs are ready.
– Directed-acyclic means there is no deadlock
– Finite-length channels means vertices finish.
![Page 31: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/31.jpg)
Why no cycles?
• Scheduling is easy
– Vertex can run anywhere once all its inputs are ready.
– Directed-acyclic means there is no deadlock
– Finite-length channels means vertices finish.
![Page 32: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/32.jpg)
Why no cycles?
• Scheduling is easy
– Vertex can run anywhere once all its inputs are ready.
– Directed-acyclic means there is no deadlock
– Finite-length channels means vertices finish.
![Page 33: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/33.jpg)
Why no cycles?
• Scheduling is easy
– Vertex can run anywhere once all its inputs are ready.
– Directed-acyclic means there is no deadlock
– Finite-length channels means vertices finish.
![Page 34: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/34.jpg)
Why no cycles?
• Scheduling is easy
– Vertex can run anywhere once all its inputs are ready.
– Directed-acyclic means there is no deadlock
– Finite-length channels means vertices finish.
• Fault tolerance is easy (with deterministic code)
![Page 35: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/35.jpg)
Optimizing Dryad applications
• General-purpose refinement rules
• Processes formed from subgraphs
– Re-arrange computations, change I/O type
• Application code not modified
– System at liberty to make optimization choices
• High-level front ends hide this from user
– SQL query planner, etc.
![Page 36: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/36.jpg)
Outline
• Map Reduce
• Dryad
– Computational Model
– Architecture
– Use cases
– DryadLINQ
![Page 37: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/37.jpg)
Runtime
• Services– Name server– Daemon
• Job Manager– Centralized coordinating process– User application to construct graph– Linked with Dryad libraries for scheduling vertices
• Vertex executable– Dryad libraries to communicate with JM– User application sees channels in/out– Arbitrary application code, can use local FS
V V V
![Page 38: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/38.jpg)
Scheduler state machine
• Scheduling is independent of semantics
– Vertex can run anywhere once all its inputs are ready
• Constraints/hints place it near its inputs
– Fault tolerance
• If A fails, run it again
• If A’s inputs are gone, run upstream vertices again (recursively)
• If A is slow, run another copy elsewhere and use output from whichever finishes first
![Page 39: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/39.jpg)
Outline
• Map Reduce
• Dryad
– Computational Model
– Architecture
– Use cases
– DryadLINQ
![Page 40: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/40.jpg)
SkyServer DB Query
• 3-way join to find gravitational lens effect
• Table U: (objId, color) 11.8GB
• Table N: (objId, neighborId) 41.8GB
• Find neighboring stars with similar colors:– Join U+N to find
T = U.color,N.neighborId where U.objId = N.objId
– Join U+T to findU.objId where U.objId = T.neighborID
and U.color ≈ T.color
![Page 41: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/41.jpg)
D D
MM 4n
SS 4n
YY
H
n
n
X Xn
U UN N
U U
• Took SQL plan
• Manually coded in Dryad
• Manually partitioned data
SkyServer DB query
u: objid, colorn: objid, neighborobjid[partition by objid]
selectu.color,n.neighborobjid
from u join nwhereu.objid = n.objid
(u.color,n.neighborobjid)[re-partition by n.neighborobjid][order by n.neighborobjid]
[distinct][merge outputs]
selectu.objid
from u join <temp>whereu.objid = <temp>.neighborobjid and|u.color - <temp>.color| < d
![Page 42: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/42.jpg)
SkyServer DB query
• M-S-Y : SHM
– “in-memory” : D-M is TCP and SHM
– “2-pass” : D-M is Temp Files.
• Other Edges:
– Temp Files
D D
MM 4n
SS 4n
YY
H
n
n
X Xn
U UN N
U U
![Page 43: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/43.jpg)
0.0
2.0
4.0
6.0
8.0
10.0
12.0
14.0
16.0
0 2 4 6 8 10
Number of Computers
Speed-u
p
Dryad In-Memory
Dryad Two-pass
SQLServer 2005
![Page 44: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/44.jpg)
Outline
• Map Reduce
• Dryad
– Computational Model
– Architecture
– Use cases
– DryadLINQ
![Page 45: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/45.jpg)
Dryad Software Stack
![Page 46: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/46.jpg)
DryadLINQ
• LINQ: Relational queries integrated in C#
• More general than distributed SQL
– Inherits flexible C# type system and libraries
– Data-clustering, EM, …
![Page 47: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/47.jpg)
LINQ
Collection<T> collection;
bool IsLegal(Key);
string Hash(Key);
var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};
![Page 48: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/48.jpg)
Collection<T> collection;bool IsLegal(Key k);string Hash(Key);
var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};
DryadLINQ = LINQ + Dryad
C#
collection
results
C# C# C#
Vertexcode
Queryplan(Dryad job)
Data
![Page 49: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/49.jpg)
Performance
• 10% code.(In comparison to programming directly on the Dryad middleware)
• 30% slower than “expert code”.
![Page 50: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/50.jpg)
Summary
• General-purpose platform for scalable distributed data-processing of all sorts
• Very flexible
– Optimizations can get more sophisticated
• Designed to be used as middleware
– Slot different programming models on top
– LINQ is very powerful
![Page 51: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/51.jpg)
Yahoo! Cloud Serving Benchmark
Xiaowei
![Page 52: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/52.jpg)
Motivation
PNUTS
![Page 53: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/53.jpg)
Benchmark tiers
• Tier 1 – Performance– A system with better performance will achieve the desired
latency and throughput with fewer servers
• Tier 2 – Scalability– Latency as database, system size increases– “Scaleup”
– Latency as we elastically add servers– “Elastic speedup”
![Page 54: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/54.jpg)
Benchmark tiers
• Tier 3 – Availability– Measure the Impact of failures on the system
• Tier 4 – Replication– Measure the effects of Replication Strategy on the system’s
performance
![Page 55: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/55.jpg)
Architecture
Workload parameter file• R/W mix• Record size• Data set• …
Command-line parameters• DB to use• Target throughput• Number of threads• …
YCSB client
DB
clie
nt
Client threads
Stats
Workload executor
Clo
ud
DB
Extensible: plug in new clientsExtensible: define new workloads
![Page 56: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/56.jpg)
DB interface
• read()
• insert()
• update()
• delete()
• scan()
– Execute range scan, reading specified number of records starting at a given record key
![Page 57: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/57.jpg)
Test
• Setup– Six server-class machines
• 8 cores (2 x quadcore) 2.5 GHz CPUs, 8 GB RAM, 6 x 146GB 15K RPM SAS drives in RAID 1+0, Gigabit ethernet, RHEL 4
– Plus extra machines for clients, routers, controllers, etc.– Cassandra 0.5.0 (0.6.0-beta2 for range queries) – HBase 0.20.3– MySQL 5.1.32 organized into a sharded configuration – PNUTS/Sherpa 1.8 with MySQL 5.1.24– No replication; force updates to disk (except HBase, which primarily commits
to memory)
• Workloads– 120 million 1 KB records = 20 GB per server
• Caveat– We tuned each system as well as we knew how, with assistance from the
teams of developershttps://github.com/brianfrankcooper/YCSB/tree/master/workloads
![Page 58: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/58.jpg)
Elasticity• Run a read-heavy workload
0
100
200
300
400
500
600
700
800
0 50 100 150 200 250 300 350
Read
late
ncy (
ms)
Duration of test (min)
Cassandra Elasticity – 5th to 6th server
![Page 59: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/59.jpg)
Running a workload
• Set up the database system to test
• Choose the appropriate DB interface layer
• Choose the appropriate workload
• Choose the appropriate runtime parameters (number of client threads, target throughput, etc.)
• Load the data
• Execute the workload
![Page 60: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/60.jpg)
Tips
• Only one Tip!
![Page 61: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/61.jpg)
Conclusions
• YCSB is an opensource benchmark for cloud serving systems
• Experimental results show tradeoffs between systems
• https://github.com/brianfrankcooper/YCSB/wiki/
• http://arunxjacob.blogspot.com/2011/03/setting-up-ycsb-for-low-latency-data.html
![Page 62: MapReduce and Dryad - Brown University · •Mapreduce is not a database system, so don’t judge it as one •Mapreduce has excellent scalability; the proof of Google’s use •Mapreduce](https://reader036.vdocuments.mx/reader036/viewer/2022070916/5fb6659703b271366615c7eb/html5/thumbnails/62.jpg)