A Survey of Programming Frameworks for Dynamic Grid Workflow Applications. November 2nd, 2007. Taura Lab, Ken Hironaka

Page 1: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

November 2nd, 2007
Taura Lab
Ken Hironaka

Page 2: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Background

• Attempts to analyze databases of enormous size
  – Genetic sequence databases
    • BLAST (Basic Local Alignment Search Tool) library
  – MEDLINE journal abstract database
    • Enju (a syntactic parser for English)
• Improvements in algorithms are not enough to handle the overwhelming amount of data
  – Computation needs to be parallelized

Page 3: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Basic Demands

• Express the workload with ease
  – No need to think about complex configuration files
• Parallel computation
  – No distributed computing experts required!

Page 4: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Well-known frameworks

• Batch Schedulers
  – Solution for cluster computers
  – Submit each task as a “Job” in input file(s)
  – The job is scheduled to an idle node
  – Good for embarrassingly parallel tasks
    • Tasks with no inter-task dependencies
  – Data sharing by NFS
    • Easy data collection

[Figure: a Central Manager accepts submitted jobs and assigns them to nodes in the Cluster; some nodes are marked busy]
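The submit/assign cycle of a batch scheduler can be sketched in a few lines of Python. This is an illustrative toy, not the logic of Condor or any other real scheduler; the `CentralManager` class and its method names are hypothetical.

```python
from collections import deque


class CentralManager:
    """Toy central manager: queues submitted jobs and hands them to idle nodes."""

    def __init__(self, nodes):
        self.idle = deque(nodes)   # nodes with nothing to run
        self.busy = {}             # node -> job currently assigned to it
        self.queue = deque()       # submitted jobs waiting for a node

    def submit(self, job):
        self.queue.append(job)
        self._assign()

    def finish(self, node):
        """Called when a node completes its job; the node becomes idle again."""
        job = self.busy.pop(node)
        self.idle.append(node)
        self._assign()
        return job

    def _assign(self):
        # Match waiting jobs to idle nodes in FIFO order.
        while self.queue and self.idle:
            self.busy[self.idle.popleft()] = self.queue.popleft()


manager = CentralManager(["node0", "node1"])
for job in ["job-a", "job-b", "job-c"]:
    manager.submit(job)
# job-c waits in the queue until a node reports completion.
manager.finish("node0")
```

Because the jobs never talk to each other, the manager needs no knowledge of what they compute, which is why this model suits embarrassingly parallel workloads.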

Page 5: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Arising Problems

• Handling Workflows

• Coping with Grid (multi-cluster) environments

• Creation of tasks/aggregation of results

Page 6: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Handling Workflows

• Most tasks are not embarrassingly parallel
  – Blindly scheduling jobs is not good enough
• Workflows: dependencies between tasks
  – Output files of one task are passed as input files to the next
• E.g.: Natural Language Processing

[Figure: pipeline of tasks linked by files: Phonological Analysis → Morphological Analysis → Syntactic Analysis → Semantic Analysis (legend: Task, File)]

Page 7: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Coping with Grid environments

• Multiple clusters
  – One huge cluster is rare
• Connectivity in WANs
  – Firewalls, NATs
• File sharing problems
  – Independent file systems
• Dynamics
  – Nodes joining and leaving (failure)

[Figure: clusters separated by a firewall, with nodes joining and leaving]

Page 8: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Task creation/data collection

• “Task” in the conventional sense
  – Simply computes on given data
• Manual task creation
  – Splitting the problem into sub-problems
  – Tedious manual work for large inputs/databases
• Manual data collection
  – Collecting results afterwards
  – What if they are dispersed all over the Grid?
• Not so trivial in modern settings
  – Such tasks need to be built into the framework

Page 9: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Detailed summary of the demands

• Modern parallelization frameworks
  ⇒ Frameworks that facilitate workflow applications on dynamic Grid environments
  – Handle workflows with grace
  – Cope with WAN connectivity problems
  – Handle data transfers
  – Cope with dynamic changes in resources
  – Automatically create tasks/collect results
  – EASY TO USE!

Page 10: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

The Organization of this Presentation

• Background/Introduction
• Existing Programming Frameworks
• Conclusion/Future Work

Page 11: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Condor

• One of many available batch schedulers
  – Maintains a pool of idle nodes in a cluster
• Goes beyond a regular scheduler
  – Allows workflow expression for submitted tasks
  – File transfer extension to handle data-driven jobs
• Grid-enabled (Condor-G)
  – Uses the Globus Toolkit to run on multiple clusters
    • Allows jobs to be scheduled on different pools on different clusters

Page 12: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

DAGMan: Expressing Workflows

• Extension to Condor
  – Executes a set of tasks with DAG dependencies
• DAG (Directed Acyclic Graph)
  – An expression of workflows
  – A→B: A must finish before B starts
  – E.g.: A→C, B→C, C→D, C→E
  – Can express general workflows
• Fault tolerance
  – In case of failure, restarts from the job that failed
    • E.g. if Task C fails, Tasks A and B are not redone

[Figure: DAG with nodes A, B, C, D, E and edges A→C, B→C, C→D, C→E]
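The dependency ordering and the restart-from-failure behaviour can be illustrated with a small Python sketch. It is not DAGMan's implementation: `run_dag` and the `flaky` task runner are hypothetical names, the example stops at the first failure for simplicity, and it relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Dependencies from the slide: A→C, B→C, C→D, C→E.
# Each key maps a task to the set of tasks that must finish before it.
DAG = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": {"C"}}


def run_dag(dag, run_task, done=None):
    """Run tasks in dependency order; return the set of completed tasks.

    `done` holds tasks completed by an earlier attempt, so a rerun
    skips them instead of redoing finished work.
    """
    done = set(done or ())
    for task in TopologicalSorter(dag).static_order():
        if task in done:
            continue
        if not run_task(task):
            break  # simplification: stop the whole run at the first failure
        done.add(task)
    return done


executed = []


def flaky(task):
    """Hypothetical task runner: C fails on its first attempt only."""
    executed.append(task)
    return not (task == "C" and executed.count("C") == 1)


first = run_dag(DAG, flaky)               # A and B succeed, C fails
second = run_dag(DAG, flaky, done=first)  # resumes at C; A and B are skipped
```

Passing the completed set back into the second call plays the role of a record of finished jobs: only C, D, and E run on the retry.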

Page 13: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

DAGMan: Howto

• Create a script defining jobs and dependencies
• Submit the DAG
  – It is automatically translated into Condor jobs, which are then scheduled by Condor

### sample.dag ###
# define jobs
# (simplified: in actual DAGMan, each JOB line names a Condor submit description file)
JOB A A.sh
JOB B B.py
JOB C C.pl
JOB D D.sh
JOB E E.sh

# define dependencies
PARENT A B CHILD C
PARENT C CHILD D E

$ condor_submit_dag sample.dag

Page 14: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Stork

• Data Placement Scheduler for Condor
• File transfer across file systems
  – ftp, scp, http, etc.
• A transfer is treated as a DAGMan Job
  – Allows Jobs to pass files without a shared FS
• Inter-task data passing is not possible
  – Must use a third-party server to pass data

### trans.stork ###
[
  dest_url = "file:/tmp/hoge.tar.gz";
  src_url = "ftp://www.foo.com/hog.tar.gz";
  dap_type = transfer;
]

### sample2.dag ###
DATA INPUT0 trans.stork
JOB A A.sh
PARENT INPUT0 CHILD A

Page 15: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Review of Condor

• Strong in topics related to classical batch queuing systems
• Pros
  – Handles workflows and fault tolerance
  – Possible to deploy on multiple clusters
• Cons
  – Condor and its extensions must be installed by the system administrator on all nodes of each cluster
    • Big initial overhead
    • Cannot add more nodes dynamically
  – Limited file transfer options
    • Inter-task data passing is not possible
  – Task creation and result collection are done manually

Page 16: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Ibis(Satin)

• Java-based parallel computation library
• Distributed object-oriented
  – Transparent location of distributed objects
  – Offers RMI (Remote Method Invocation)
    • Transparent delegation of computation
• Divide-and-conquer type applications
  – Satin

[Figure: a call to foo.doJob(args) is sent via RMI to the node holding object foo, which performs the computation]

Page 17: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Divide-and-Conquer

• One large problem may be recursively split into numerous smaller sub-problems
  – E.g. Quick Sort, Fibonacci
• SPAWN
  – Create sub-problems (“children”)
• SYNC
  – By parent
  – Wait for sub-problem results
• Can express DAG workflows

[Figure: Fib(20) = Fib(19) + Fib(18), drawn as a tree of parent-child relationships]

Page 18: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Divide-and-Conquer: HowTo

• Import the library
• Define a user class extending the SatinObject library class
• Define computation methods
  – Use recursion
    • Allows creation of sub-problems
  – sync()
    • Implicit definition of dependencies

### fib.java ###
import ibis.satin.SatinObject;

class Fib extends SatinObject implements … {
    public int fib(int n) {
        if (n < 2) return n;

        int x = fib(n - 1);  // implicit spawn
        int y = fib(n - 2);  // implicit spawn

        sync();              // wait for results
        return x + y;
    }
}
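The spawn/sync structure can also be mimicked in plain Python to make the parent-child dependency explicit. This is a sketch of the programming model only, not of Satin: the `Spawn` helper and the `max_spawn_depth` cutoff are assumptions introduced here, each spawn is simply a local thread, and nothing is distributed across machines.

```python
import threading


class Spawn:
    """Hypothetical stand-in for a spawn: runs fn(*args) in its own thread."""

    def __init__(self, fn, *args):
        self.result = None
        self._thread = threading.Thread(target=self._run, args=(fn, args))
        self._thread.start()

    def _run(self, fn, args):
        self.result = fn(*args)

    def sync(self):
        """Block until the child finishes, then return its result."""
        self._thread.join()
        return self.result


def fib(n, depth=0, max_spawn_depth=3):
    if n < 2:
        return n
    if depth >= max_spawn_depth:
        # Below the cutoff, recurse sequentially to keep the thread count small.
        return (fib(n - 1, depth, max_spawn_depth)
                + fib(n - 2, depth, max_spawn_depth))
    x = Spawn(fib, n - 1, depth + 1, max_spawn_depth)  # spawn a child
    y = Spawn(fib, n - 2, depth + 1, max_spawn_depth)  # spawn a child
    return x.sync() + y.sync()                         # parent waits for both


print(fib(20))  # prints 6765
```

The only dependencies are those between a parent and the children it waits on, which is what lets a runtime move any spawned sub-problem to another node.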

Page 19: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Random Work Stealing

• Strategy to load-balance among participating nodes
  – An idle node steals an unfinished sub-problem from a random node
  – The result is returned to the victim node
• Adapts to joining nodes
  – New nodes automatically acquire tasks

[Figure: Nodes 0, 1, and 2, with STEAL arrows between nodes]
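A round-based toy simulation shows how random stealing spreads work that starts on a single node. It is a sketch under simplifying assumptions, not Satin's scheduler: tasks are opaque and take one round each, the `simulate` function is hypothetical, and returning results to the victim is not modelled.

```python
import random
from collections import deque


def simulate(num_nodes, tasks, seed=0):
    """Toy simulation of random work stealing.

    All tasks start on node 0. In each round, every node that has work
    executes one task, then every idle node picks one random victim and
    steals a task from it if the victim has any.
    Returns how many tasks each node executed.
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_nodes)]
    queues[0].extend(tasks)
    executed = [0] * num_nodes

    while any(queues):
        # Phase 1: nodes with work run their newest task.
        for node, queue in enumerate(queues):
            if queue:
                queue.pop()
                executed[node] += 1
        # Phase 2: idle nodes try to steal the oldest task of a random victim.
        for node, queue in enumerate(queues):
            if not queue:
                victim = rng.randrange(num_nodes)
                if victim != node and queues[victim]:
                    queue.append(queues[victim].popleft())
    return executed


counts = simulate(num_nodes=3, tasks=range(30))
print(counts)
```

A node that joins later would simply start with an empty queue and begin stealing, which is why no explicit rebalancing step is needed.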

Page 20: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Dealing with Failures

• When a node fails, its sub-problems need to be restarted
• Orphan tasks
  – Sub-problems that lose the parent by which their results are used
  – Orphan tasks' results are circulated among nodes

[Figure: Nodes 0, 1, and 2 with orphaned sub-problems; results cached & circulated]
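The benefit of keeping orphan results can be seen in a minimal sketch. It assumes, purely for illustration, that surviving results are available in a dictionary keyed by sub-problem; how Satin actually stores and circulates them is not shown on the slide, and the names below are hypothetical.

```python
def fib(n, orphan_results, recomputed):
    """Recompute fib(n) after a failure, reusing surviving orphan results."""
    if n in orphan_results:
        return orphan_results[n]  # finished before its parent was lost
    recomputed.append(n)          # record which sub-problems are redone
    if n < 2:
        return n
    return (fib(n - 1, orphan_results, recomputed)
            + fib(n - 2, orphan_results, recomputed))


# Suppose the node computing fib(10) failed, but its children fib(9) and
# fib(8) had already finished elsewhere and their results were kept.
orphans = {9: 34, 8: 21}
recomputed = []
value = fib(10, orphans, recomputed)
```

Here only the lost parent itself is recomputed; without the cached results the whole subtree below fib(10) would have to be redone.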

Page 21: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Review of Ibis(Satin)

• Pros
  – Benefits from targeting divide-and-conquer applications
    • Able to handle workflows by using spawn and sync
    • Automatically creates tasks and collects results
  – Handles dynamic joining/leaving of nodes
    • Random work stealing
    • Recycling orphan sub-problem results
• Cons
  – Currently supports only direct communication among nodes (does not work across firewalls or NATs)
  – Targeted at CPU-intensive applications
    • No primitives for file transfer over the network

Page 22: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Map-Reduce

• Framework for processing large homogeneous data on clusters
  – Handling large databases at Google
• The user defines 2 functions
  – Map, Reduce

[Figure: input data flows through several Map() tasks into Reduce() tasks, producing one output per reducer]

Page 23: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Map-Reduce: Howto

• Abstraction
  – Data file → set of key/values
• Map
  – (k1, v1) → (k2, v2)
  – Values with the same key are combined
• Reduce
  – (k2, list of v2) → list of v2

### word count ###
# emit() is assumed to be provided by the framework

# key: document name
# value: contents
def Map(key, value):
    # emit 1 for each word
    for w in value.split():
        emit(w, 1)

# key: word
# values: list of 1s
def Reduce(key, values):
    result = 0
    # add up the 1s
    for i in values:
        result += i
    emit(key, result)
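To show the grouping step that sits between Map and Reduce, here is a self-contained, sequential Python sketch of the word count. It is not a real Map-Reduce framework: the `map_reduce` driver and the sample documents are made up for illustration, and the functions use `yield` where a framework would supply `emit`.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Map: (document name, contents) -> one (word, 1) pair per word."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: (word, [1, 1, ...]) -> (word, total)."""
    yield word, sum(counts)


def map_reduce(inputs, mapper, reducer):
    """Sequential driver: map, group values by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)  # values with the same key are combined
    results = {}
    for k2, values in groups.items():
        for k3, v3 in reducer(k2, values):
            results[k3] = v3
    return results


docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog the end")]
counts = map_reduce(docs, map_fn, reduce_fn)
print(counts["the"])  # prints 3
```

In a distributed setting the two loops of the driver would run on separate Map and Reduce workers, with the grouping done while data moves between them.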

Page 24: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Implementation

• Master-Worker model
  – Worker:
    • Map workers and Reduce workers
    • Data is transferred directly from Map → Reduce
  – Master: coordinates the flow of data between workers
  – Work on failed workers is restarted
• Distributed file system
  – Collection of results is made simple

Page 25: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Review of Map-Reduce

• Abstracts data files to key/value sets
  – Computes on them using user-defined functions
• Pros
  – Automatic task creation/result collection
  – Automatic file transfers between Map/Reduce
  – Fault tolerant
• Cons
  – The Map-Reduce model is still restrictive for many real-life applications
  – Not for WAN
  – Cannot add nodes dynamically

Page 26: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Comparison of Frameworks

                                Condor   Ibis(Satin)   Map-Reduce
Grid connectivity                 ○          ×             ×
Workflow                          ○          ○             △
File Transfer                     △          ×             ○
Task creation/data collection     ×          ○             ○
Join/Leave of nodes               △          ○             △
Deployment Ease                   △          ○             ○

- Each has its own strengths and weaknesses
- It is not trivial to make Grid workflow applications easy for scientific computing users

Page 27: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Conclusion

• We have presented a series of viable choices for running parallel workflow applications in a Grid environment
• File transfer, task creation/data collection
  – Tasks need to be able to interact with external entities
    • Ibis: parent-child relationship
    • Map-Reduce: master-worker, worker-worker

Page 28: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Future Works

• Workflow tasks cannot be isolated entities
  – Need means of interaction among them
    • Are raw sockets enough?
    • WAN, dynamic resource compatible
• Grid-enabled workflow framework with the following properties
  – RMI, file transfer primitives between tasks

Page 29: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

[Figure: MapReduce. Input in splits feeds three Map Workers; two Reduce Workers each produce one output per reducer; labels: Master, Notify, Go Fetch]

Page 30: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

[Figure: Nodes 0, 1, and 2 with orphaned sub-problems]

Page 31: A Survey of Programming Frameworks for Dynamic Grid Workflow Applications

Adding Nodes

• Possible to add more nodes at runtime
• Uses a global server that is accessible from everywhere
  – A new node uses this server to bootstrap itself and join the already participating nodes
• Random work stealing
  – Automatically load-balances in the face of new nodes

[Figure: a new node contacts the Bootstrap Server, then joins the Satin system and steals work]