Work Queue: A Scalable Master/Worker Framework
Peter Bui, June 29, 2010
Master/Worker Model
• Central Master application
  o Divides work into tasks
  o Sends tasks to Workers
  o Gathers results
• Distributed collection of Workers
  o Receives input and executable files
  o Runs executable files
  o Returns output files
Work Queue versus MPI
Work Queue
– Number of workers dynamic
– Scale up to large number of workers (100s - 1000s)
– Reliable and fault tolerant at the task level
– Allows for heterogeneous deployment environments
– Workers communicate only with Master
MPI
– Number of workers static
– Scale up to limited number of workers (16, 32, 64)
– Reliable at application level but no fault tolerance
– Requires homogeneous deployment environment
– Workers can communicate with anyone
Success Stories
Makeflow
SAND
Wavefront
All-Pairs
Architecture (Overview)
Architecture (Master)
• Uses Work Queue library
  o Creates a Queue
  o Submits Tasks
      - Command
      - Input files
      - Output files
  o Library keeps track of Tasks
      - When a Worker is available, the library sends Tasks
  o When Tasks complete
      - Retrieve output files
Architecture (Workers)
• Users start workers on any machine
• Workers contact the Master and request work
• When a Task is received, perform computation and return results
• After a set idle timeout, quit and clean up
API Overview (Work Queue)
Simple C API
• Work Queue
  o work_queue_create(int port)
      Create a new work queue.
  o work_queue_delete(struct work_queue *q)
      Delete a work queue.
  o work_queue_empty(struct work_queue *q)
      Determine whether there are any known tasks queued, running, or waiting to be collected.
API Overview (Task)
Simple C API
• Task
  o work_queue_task_create(const char *command)
      Create a new task specification.
  o work_queue_task_delete(struct work_queue_task *t)
      Delete a task specification.
  o work_queue_task_specify_input_file(struct work_queue_task *t, const char *fname, const char *rname);
      Add input file specification.
  o work_queue_task_specify_output_file(struct work_queue_task *t, const char *rname, const char *fname);
      Add output file specification.
API Overview (Execution)
Simple C API
• Execution
  o work_queue_submit(struct work_queue *q, struct work_queue_task *t)
      Submit a job to a work queue.
  o work_queue_wait(struct work_queue *q, int timeout)
      Wait for tasks to complete.
Software Configuration
Web Information
http://cse.nd.edu/~ccl/software/installed.shtml
AFS
  $ setenv PATH ~ccl/software/cctools/bin:$PATH
  $ setenv PATH ~condor/software/bin:$PATH
CRC
  $ module use /afs/nd.edu/user37/ccl/software/modulefiles
  $ module load cctools
  $ module load condor
Example 1: DConvert
• Goal: convert a set of input images to a specified format in parallel
  o Input: <format> <input_image1> <input_image2> ...
  o Output: converted images in specified format
• Skeleton:
  o ~pbui/www/scratch/workqueue-tutorial.tar.gz
DConvert (Preparation)
Set up scratch workspace
  $ mkdir /tmp/$USER-scratch
  $ cd /tmp/$USER-scratch
  $ pwd
Copy source tarball and extract it
  $ cp ~pbui/www/scratch/workqueue-tutorial.tar.gz .
  $ tar xzvf workqueue-tutorial.tar.gz
  $ cd workqueue-tutorial
  $ ls
Open dconvert.c source file for editing
  $ gedit dconvert.c &
DConvert (TODO 1, 2, and 3)
// TODO 1: include work queue header file
#include "work_queue.h"
// TODO 2: declare work queue and task structs
struct work_queue *q;
struct work_queue_task *t;
// TODO 3: create work queue using default port
q = work_queue_create(0);
DConvert (TODO 4, 5, 6)
// TODO 4: create task, specify input and output file, submit task
t = work_queue_task_create(command);
work_queue_task_specify_input_file(t, input_file, input_file);
work_queue_task_specify_output_file(t, output_file, output_file);
work_queue_submit(q, t);
// TODO 5: while work queue is not empty, wait for a task, then delete the returned task
while (!work_queue_empty(q)) {
    t = work_queue_wait(q, 10);
    if (t) work_queue_task_delete(t);
}
// TODO 6: delete work queue
work_queue_delete(q);
DConvert (Demonstration)
Build and prepare application
  $ make
  $ cp /usr/share/pixmaps/*.png .
Start batch of workers
  $ condor_submit_workers `hostname` 9123 5
Start application
  $ ./dconvert jpg *.png
Tips and Tricks (Debugging)
Debugging
• Enable cctools debugging system
  o In master application:
      debug_flags_set("wq");
      debug_flags_set("debug");
  o In workers:
      work_queue_worker -d debug -d wq <hostname> <port>
• Incrementally test number of workers
Failed Execution
• Include executable and dependencies as input files
• Make sure the executable matches the target platform (32-bit vs 64-bit, OS, etc.)
Tips and Tricks (Tasks)
Tag Tasks
• Give a task an identifying tag so Master can keep track of it
Use input and output buffers
• work_queue_task_specify_input_buf
  o Contents of buffer will be materialized as a file at worker
• task->output
  o Buffer that contains standard output of task
Check task results
• task->result: result of task
• task->return_status: exit code of command line
Tips and Tricks (Batch)
Custom Worker Environment
• Modify batch system specific submit scripts
  o condor_submit_workers
      - Set requirements
  o sge_submit_workers
      - Set environment
      - Set modules
Tips and Tricks (CRC)
Submit master, find host, submit workers
• qsub myscript.sh
    #!/bin/csh
    master
• qstat -u <afsid> | grep myscript.sh
• sge_submit_workers <hostname> <port>
Example 2: Mandelbrot Generator
• Goal: generate Mandelbrot image
  o Input: <width> <height> <xmin> <xmax> <ymin> <ymax> <max_iterations>
  o Output: Mandelbrot image in PPM format
• Skeleton:
  o ~pbui/www/scratch/workqueue-tutorial.tar.gz
Mandelbrot (Overview)
z(n+1) = z(n)^2 + c, with z(0) = 0
Escape Time Algorithm
• For each pixel (r, c) in image calculate if corresponding point (x, y) escapes boundary
• Iterative algorithm where each pixel computation is independent
Application design
• Master partitions image into tasks
• Workers compute Escape Time Algorithm on partitions
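The Escape Time Algorithm for a single point can be sketched as below. This is a minimal illustration rather than the tutorial's worker source: the function name escape_time is hypothetical, and the escape test |z| > 2 (written as x*x + y*y > 4) is the usual textbook bound, assumed here.

```c
/* Hypothetical helper: iterate z(n+1) = z(n)^2 + c from z(0) = 0, where
 * c = x0 + i*y0, and return how many iterations ran before |z| exceeded 2
 * (or max_iterations if the point never escaped). */
int escape_time(double x0, double y0, int max_iterations)
{
    double x = 0.0, y = 0.0;   /* real and imaginary parts of z */
    int iteration = 0;

    while (x * x + y * y <= 4.0 && iteration < max_iterations) {
        double xtemp = x * x - y * y + x0;  /* real part of z^2 + c */
        y = 2.0 * x * y + y0;               /* imaginary part of z^2 + c */
        x = xtemp;
        iteration++;
    }
    return iteration;
}
```

Points that return max_iterations are treated as inside the set. Each call depends only on its own arguments, which is why pixels can be computed independently.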
Mandelbrot (Naive Approach)
Master
• For each pixel (r, c) in image (width x height)
  o Compute corresponding x, y
  o Submit task for pixel with x, y
      - Pass x, y parameters as input buffer
      - Tag task with r, c values
• Wait for each task to complete:
  o Retrieve output of worker from task->output
  o Retrieve r, c from task->tag
  o Store pixel[r, c] = output
• Output pixels in PPM format
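The master-side bookkeeping above can be sketched without the Work Queue library. The helpers below are hypothetical (pixel_to_coord, make_tag, and parse_tag are not part of the tutorial skeleton), and the "r c" tag layout is an assumed encoding. They show one way to map a pixel index to a coordinate and to round-trip the row and column through a string tag.

```c
#include <stdio.h>

/* Hypothetical helper: map index in [0, count) linearly onto [min, max). */
double pixel_to_coord(int index, int count, double min, double max)
{
    return min + (max - min) * (double)index / (double)count;
}

/* Hypothetical helper: encode the pixel position as the string "r c".
 * Returns the number of characters snprintf wanted to write. */
int make_tag(char *buf, size_t size, int r, int c)
{
    return snprintf(buf, size, "%d %d", r, c);
}

/* Hypothetical helper: recover r and c from a tag made by make_tag.
 * Returns nonzero on success. */
int parse_tag(const char *tag, int *r, int *c)
{
    return sscanf(tag, "%d %d", r, c) == 2;
}
```

With the demonstration's arguments, x for column c would be pixel_to_coord(c, 512, -2, 1) and y for row r would be pixel_to_coord(r, 512, -1.5, 1.5).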
Mandelbrot (Naive Approach)
Worker
• Read in parameters from input file:
  o x0, y0, max_iterations, black_value
• Perform Mandelbrot computation as described on Wikipedia:
  o http://en.wikipedia.org/wiki/Mandelbrot_set#For_programmers
• Output result (iterations) to standard out
Mandelbrot (Analysis)
Problem
• Processing each pixel as a single task is inefficient
  o Too fine-grained
  o Overhead of sending parameters, running tasks, and retrieving results is greater than the computation time
Work Queue Golden Rule:
Computation Time > Data Transfer Time + Task setup overhead
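The Golden Rule can be turned into a quick sizing check. The sketch below is illustrative only: both function names are hypothetical, times are in seconds, and the per-task transfer and setup overhead is assumed to stay fixed no matter how many work units are batched into a task.

```c
/* Hypothetical helper: nonzero when a task satisfies the Golden Rule,
 * i.e. computation time > data transfer time + task setup overhead. */
int task_is_worthwhile(double compute_seconds, double transfer_seconds,
                       double setup_seconds)
{
    return compute_seconds > transfer_seconds + setup_seconds;
}

/* Hypothetical helper: smallest number of work units (e.g. pixels) to batch
 * into one task so that the Golden Rule holds, assuming a fixed per-task
 * overhead. Returns -1 if seconds_per_unit is not positive. */
long min_units_per_task(double seconds_per_unit, double transfer_seconds,
                        double setup_seconds)
{
    double overhead = transfer_seconds + setup_seconds;

    if (seconds_per_unit <= 0.0)
        return -1;
    /* Truncation equals floor for non-negative values; +1 makes the
     * comparison strict. */
    return (long)(overhead / seconds_per_unit) + 1;
}
```

For example, if one unit takes 0.25 s and the overhead is 1 s per task, at least 5 units per task are needed before computation outweighs overhead.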
Mandelbrot (Better Approach)
Send Rows
• Process groups of pixels rather than individual ones:
  o Send a row and have the worker return a series of results
  o Perhaps send multiple rows?
• Should take execution time from minutes to seconds
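A row-at-a-time worker computation might look like the sketch below. It is an assumption-laden illustration, not the tutorial's solution: compute_row and iterations_at are hypothetical names, and the linear pixel-to-coordinate mapping is assumed. For a 512 x 512 image this turns 262,144 single-pixel tasks into 512 row tasks.

```c
/* Hypothetical helper: escape-time iteration count for c = x0 + i*y0. */
static int iterations_at(double x0, double y0, int max_iterations)
{
    double x = 0.0, y = 0.0;
    int iteration = 0;

    while (x * x + y * y <= 4.0 && iteration < max_iterations) {
        double xtemp = x * x - y * y + x0;
        y = 2.0 * x * y + y0;
        x = xtemp;
        iteration++;
    }
    return iteration;
}

/* Hypothetical helper: compute every pixel of row r in one task.
 * out must have room for width results. */
void compute_row(int r, int width, int height,
                 double xmin, double xmax, double ymin, double ymax,
                 int max_iterations, int *out)
{
    double y0 = ymin + (ymax - ymin) * r / height;

    for (int c = 0; c < width; c++) {
        double x0 = xmin + (xmax - xmin) * c / width;
        out[c] = iterations_at(x0, y0, max_iterations);
    }
}
```

The worker would then print the whole row to standard output, so the master reads one buffer per row instead of one per pixel.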
Mandelbrot (Demonstration)
Build application
  $ make
Start batch of workers
  $ condor_submit_workers `hostname` 9123 10
Start application
  $ ./mandelbrot_master 512 512 -2 1 -1.5 1.5 250 > output.ppm
  $ display output.ppm
Advanced Features
Fast Abort
• Allow Work Queue to pre-emptively kill slow tasks
• work_queue_activate_fast_abort(q, X)
  o X is the fast abort multiplier
  o if (runtime >= average_runtime * X) fast_abort
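The fast abort condition above can be written as a small predicate. This is a sketch of the rule as stated on the slide, not the library's internal code: should_fast_abort is a hypothetical name, and treating a non-positive multiplier or average as "never abort" is an assumption added here.

```c
/* Hypothetical helper: nonzero when a task has run at least `multiplier`
 * times longer than the average task and should be aborted.
 * Assumption: a non-positive multiplier or average disables the check. */
int should_fast_abort(double runtime, double average_runtime, double multiplier)
{
    if (multiplier <= 0.0 || average_runtime <= 0.0)
        return 0;
    return runtime >= average_runtime * multiplier;
}
```

With an average runtime of 10 s and X = 3, a task is aborted once it has run for 30 s.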
Scheduling
• Change how workers are selected
  o FCFS: first come, first served
  o FILES: has the most cached files
  o TIME: fastest average turnaround time
• Can be set for queue or for task
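The three policies differ only in how two candidate workers are compared. The sketch below illustrates that idea with hypothetical types and names (schedule_policy, worker_info, select_worker); it does not reproduce the library's own constants or data structures.

```c
#include <stddef.h>

/* Hypothetical policy names mirroring the three modes on the slide. */
enum schedule_policy { SCHEDULE_FCFS, SCHEDULE_FILES, SCHEDULE_TIME };

/* Hypothetical per-worker bookkeeping kept by a master. */
struct worker_info {
    int arrival_order;      /* lower value means the worker connected earlier */
    int cached_files;       /* number of the task's input files already cached */
    double avg_turnaround;  /* average seconds per completed task */
};

/* Nonzero if worker a is preferred over worker b under the policy. */
static int is_better(const struct worker_info *a, const struct worker_info *b,
                     enum schedule_policy policy)
{
    switch (policy) {
    case SCHEDULE_FILES: return a->cached_files > b->cached_files;
    case SCHEDULE_TIME:  return a->avg_turnaround < b->avg_turnaround;
    case SCHEDULE_FCFS:
    default:             return a->arrival_order < b->arrival_order;
    }
}

/* Return the index of the preferred worker, or -1 if there are none. */
int select_worker(const struct worker_info *workers, size_t count,
                  enum schedule_policy policy)
{
    size_t best = 0;

    if (count == 0)
        return -1;
    for (size_t i = 1; i < count; i++)
        if (is_better(&workers[i], &workers[best], policy))
            best = i;
    return (int)best;
}
```

FILES tends to suit tasks that share large inputs, since it avoids resending cached files, while TIME favors workers that have historically finished fastest.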
Advanced Features (More)
Automatic Master Detection
• Start master with a project name:
  o setenv WORK_QUEUE_NAME "project_name"
• Enable master auto selection mode with workers
  o work_queue_worker -a -N "project_name"
  o work_queue_pool -T condor -a -N "project_name"
• Check out master at http://chirp.cse.nd.edu
Shut down workers
• work_queue_shut_down_workers
Web Resources
Website
http://www.nd.edu/~ccl/software/workqueue/
• User manual and C API documentation
Bug Reports and Suggestions
http://www.cse.nd.edu/~ccl/software/help.shtml
Python-API
http://bitbucket.org/pbui/python-workqueue/
• Experimental Python binding