dryad and dryalinq
DESCRIPTION
Dryad and DryaLINQ. Dryad and DryadLINQ. Dryad provides automatic distributed execution DryadLINQ provides automatic query plan generation. Dryad. General-purpose execution environment for distributed, data-parallel applications - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/1.jpg)
Dryad and DryaLINQ
![Page 2: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/2.jpg)
Dryad and DryadLINQ
Dryad provides automatic distributed executionDryadLINQ provides automatic query plan generation
![Page 3: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/3.jpg)
Dryad
• General-purpose execution environment for distributed, data-parallel applications
• Focus on simplicity, reliability, scalability, efficiency and not latency, unreliable networks
• Automatic management of scheduling, distribution, fault tolerance
• Exploits Data Parallelism
![Page 4: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/4.jpg)
Dryad
• Computations expressed as a Directed Acyclic Graph– Jobs executed on vertices– Edges are communication channels– Each vertex has several input and output edges– Data transport mechanisms: Files, TCP pipes,
shared memory FIFOs
![Page 5: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/5.jpg)
Job = Directed Acyclic Graph
Processingvertices Channels
(file, pipe, shared memory)
Inputs
Outputs
![Page 6: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/6.jpg)
Dryad vs. MapReduce, Parallel DB
• More control to developer than MapReduce• MapReduce aims at simplicity at the expense
of generality and performance• Computation Graph is implicit in Parallel DB
![Page 7: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/7.jpg)
Dryad System Architecture
• Job manager – coordinates jobs, constructs graph• Name server – exposes computers with network
topology• Daemons run on each computer in the cluster
![Page 8: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/8.jpg)
Communication
![Page 9: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/9.jpg)
Job (Graph) Construction
• Using graph operators implemented in C++ to describe the graph (from simpler sub graphs).
![Page 10: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/10.jpg)
Job Execution• Job manager not currently fault tolerant• Vertices may be scheduled multiple times due to
failures – Each execution versioned– Execution record kept- including versions of incoming vertices– Outputs are uniquely named (versioned)– Final outputs selected if job completes– Non-file communication (TCP pipe, Shared Memory FIFO)
may cascade failures• Vertices specify hard constraints or preferences for set
of computers required• Scheduling is greedy assuming only one job
![Page 11: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/11.jpg)
11
Policy Managers
R R
X X X X
Stage RR R
Stage X
Job Manager
R managerX ManagerR-X
Manager
Connection R-X
![Page 12: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/12.jpg)
Cluster network topology
rack
top-of-rack switch
top-level switch
![Page 13: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/13.jpg)
Run-time Graph Refinement
![Page 14: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/14.jpg)
14
S S S S
A A A
S S
T
S S S S S S
T
# 1 # 2 # 1 # 3 # 3 # 2
# 3# 2# 1
static
dynamic
rack #
Dynamic Aggregation
![Page 15: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/15.jpg)
Fault Tolerance
![Page 16: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/16.jpg)
SkyServer DB Query
• 3-way join to find gravitational lens effect• Table U: (objId, color) 11.8GB• Table N: (objId, neighborId) 41.8GB• Find neighboring stars with similar colors:
– Join U+N to findT = U.color,N.neighborId where U.objId = N.objId
– Join U+T to findU.objId where U.objId = T.neighborID
and U.color ≈ T.color
![Page 17: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/17.jpg)
D D
MM 4n
SS 4n
YY
H
n
n
X Xn
U UN N
U U
• Took SQL plan• Manually coded in Dryad• Manually partitioned data
SkyServer DB query
u: objid, colorn: objid, neighborobjid[partition by objid]
select u.color,n.neighborobjidfrom u join nwhere u.objid = n.objid
(u.color,n.neighborobjid)[re-partition by n.neighborobjid][order by n.neighborobjid]
[distinct][merge outputs]
select u.objidfrom u join <temp>where u.objid = <temp>.neighborobjid and |u.color - <temp>.color| < d
![Page 18: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/18.jpg)
Optimization
D
M
S
Y
X
M
S
M
S
M
S
U N
U
D D
MM 4n
SS 4n
YY
H
n
n
X Xn
U UN N
U U
![Page 19: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/19.jpg)
Optimization
D
M
S
Y
X
M
S
M
S
M
S
U N
U
D D
MM 4n
SS 4n
YY
H
n
n
X Xn
U UN N
U U
![Page 20: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/20.jpg)
0.0
2.0
4.0
6.0
8.0
10.0
12.0
14.0
16.0
0 2 4 6 8 10
Number of Computers
Spe
ed-u
pDryad In-Memory
Dryad Two-pass
SQLServer 2005
![Page 21: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/21.jpg)
High level Programming Languages
• Nebula – limited to existing binaries• SSIS – SQLServer workflow engine, distributed• DryadLINQ – Supports both imperative and
declarative operations on datasets
![Page 22: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/22.jpg)
Dryad/DryadLINQ
• Decoupling of Dryad and DryadLINQ– Dryad: execution engine (given DAG, do scheduling and
fault tolerance)– DryadLINQ: programming model (given query, generate
DAG)
![Page 23: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/23.jpg)
DryadLINQ• Exploits LINQ (Relational queries integrated in C#) to provide
a hybrid of imperative and declarative programming• LINQ has a design choice that is easy to express computations
also giving runtime leeway implementing them.• Sequential program composed of LINQ expressions• Performs side-effect free transformations on datasets• Written and Debugged using .NET development tools• More general than distributed SQL• Programs can be automatically optimized and efficiently
executed on large cluster
![Page 24: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/24.jpg)
DryadLINQ
• Serialization for dryad are provided by High level software layers like DrayLINQ
• DrayLINQ preserves the LINQ programming model and defines new operators and datatypes for data parallel programming
![Page 25: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/25.jpg)
DryadLINQ Architecture
![Page 26: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/26.jpg)
DryadLINQ Data Model
Partition
Partitioned Table
.Net objects
Data Model is distributed implementation of LINQ CollectionsEach Dataset is distributed (disjoint) across the clusterPartitioned table exposes metadata information
– type, partition, compression scheme, serialization, etc.
![Page 27: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/27.jpg)
DrayLINQ Constructs
• Expressions must be side-effect free• Allows programmer to specify annotations
(hints) to guide optimization• Operators
– Hash Partition– Range Partition– Apply: Allows arbitrary streaming computations– Fork: Takes single input and generates multiple
output datasets
![Page 28: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/28.jpg)
System Implementation
• Execution Plan Graph: Starts by converting raw LINQ expressions into EPG
• DryadLINQ Optimizations– Static Optimizations– Dynamic Optimizations
• Code Generation: Uses dynamic code generation to automatically synthesize LINQ code to be run at the Drayad vertex
![Page 29: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/29.jpg)
Conclusions• Goal: Use a compute cluster as if it is a single
computer– Dryad/DryadLINQ represent a significant step
• Requires close collaborations across many fields of computing, including– Distributed systems– Distributed and parallel databases– Programming language design and analysis
![Page 30: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/30.jpg)
References
• Dryad: Distributed Data-parallel Programs from Sequential Building Blocks (Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly March 2007)
• DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language (Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon CurreyDecember 2008)
![Page 31: Dryad and DryaLINQ](https://reader036.vdocuments.mx/reader036/viewer/2022081507/568162b6550346895dd33ed8/html5/thumbnails/31.jpg)
Thank you