Introduction to Matlab Distributed Computing Server (MDCS)

Dan Mazur and Pier-Luc St-Onge
guillimin@calculquebec.ca

December 1st, 2015


Partners and sponsors

Exercise 0: Login and Setup

Example hand-out slip: 07:k41a0?wy#

● Ubuntu login:
  – Username: csuser07
  – Password: ___@[S07
● Guillimin login:
  – ssh [email protected]
  – Password: k41a0?wy#

Outline

● Introduction and Overview
● Configuring MDCS for Guillimin
● Submitting and monitoring jobs on Guillimin
  – batch command
● Parallel toolbox
  – parfor loops (parallel for loops)
  – spmd sections (single program multiple data)
  – distributed arrays (large memory problems)
  – GPUs and Xeon Phis

Parallel Computing Toolbox (PCT)

● High-level constructs for parallel programming
  – parallel for loops
  – distributed arrays
  – data parallel (spmd) sections
● Implicit (automatic) parallelism
● Implemented with MPI (MPICH2)
● Restricted to 12 cores on a single node
  – Multi-node scalability built into MPICH2
  – Scalability intentionally limited through technological effort


MDCS Overview

● MDCS allows parallel toolbox users access to a number of workers (set by the license terms) on any number of nodes

MDCS vs. PCT differences

● MDCS jobs are submitted to the batch system on a cluster, not run locally
  – Client-server model
● In PCT, one explicitly starts a parpool environment
  – In MDCS, this environment is requested in the batch() command

MDCS Overview

[Diagram, built up over four slides. Labels: Your PC (Matlab), .m script + attached files, MDCS, Guillimin (Scheduler, Job, Worker Nodes), Monitoring information.]

Important: Do not attach large data files. Data transfer to and from Guillimin is best accomplished with scp or sftp. See http://www.hpc.mcgill.ca for large file transfers.

MDCS Licensing

One N-worker MDCS Job

● Provided by user (often via institution)
  – Desktop Matlab license
  – Parallel computing toolbox license
  – Additional toolbox licenses
● Provided by McGill HPC (pool of 64 MDCS licenses)
  – N x MDCS worker licenses
  – 1 master process worker license


MDCS Scenario

● Researchers begin using desktop Matlab using institutional licenses

● Eventually, researchers and research programs depend on the resulting software

● Problem sizes increase with time, eventually necessitating parallel computing

● No problem: Mathworks uses an implementation of MPI with good scaling behaviour provided by the free software community to implement their parallel computing toolbox functionality

– But, place restrictions on number of nodes and cores

– Require additional licenses to remove these restrictions

● Because of decisions they made years ago, researchers find themselves facing either

– Potentially expensive license fees to unlock their software's capabilities, or

– Financial and time barriers to switching vendors (i.e. porting code)

MDCS Alternatives

● Compile MPI functions with mex
  – Difficult to maintain, cannot use PCT functions, cannot use Matlab debugger, must have access to many individual Matlab licenses (e.g. TAH license)
● Use Matlab MPI - Use global file system for MPI-like communication
  – Low performance for tightly-coupled problems
● Use GNU Octave
  – Reduces the switching costs by re-implementing the Matlab programming language
  – Parallel capabilities are less mature than Matlab
● Porting code to another language (Python, R, Fortran, etc.)
  – Significant effort and time
● Contact us for help and advice
  – [email protected]


MDCS Desktop Configuration

1) Install scripts used for communicating with scheduler

2) Configure the cluster profile

3) Verify your setup

Exercise 1: Install Scripts

● Download and unpack .tar.gz configuration file on your local machine
  – E.g. Linux:

    cd <workdir>
    wget \
      http://www.hpc.mcgill.ca/downloads/mdcs_config/guillimin_mdcs_config_v2.3.tar.gz
    tar -xvf guillimin_mdcs_config_v2.3.tar.gz

● Copy all "config/toolbox-local/*" files to the "<your_matlab_install>/toolbox/local" folder on your local machine
● Start or restart Matlab. Then test your installation:

    >> glmnVersion

Permissions

● What if you don't have write access to the toolbox/local folder?
● Create a new folder in your home directory for Matlab scripts
● Add the new path to your Matlab path

    path('newpath', path);

● Set new path in a startup.m file
● Use MATLABPATH environment variable in Mac and Linux OSs
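As a minimal sketch of the startup.m approach: the file below prepends a personal script folder to the Matlab path each time Matlab starts. The folder name ~/matlab/glmn is an assumption made for illustration; use whichever folder holds your copies of the config/toolbox-local files. On Windows, HOME may be unset, so a different base folder would be needed.

```matlab
% startup.m: Matlab runs this file at launch when it is on the search path.
% ASSUMPTION: the integration scripts were copied to ~/matlab/glmn;
% this folder name is only an example.
scriptDir = fullfile(getenv('HOME'), 'matlab', 'glmn');

if exist(scriptDir, 'dir')
    % Same effect as path(scriptDir, path): put the folder first
    addpath(scriptDir, '-begin');
else
    warning('startup:missingDir', 'Folder not found: %s', scriptDir);
end
```

After restarting Matlab, running glmnVersion is a quick check that the scripts are found.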

MDCS Integration Scripts

● glmnCommSubFcn.m, glmnIndSubFcn.m: Main drivers for submitting jobs
● glmnGetRemoteConn.m: Establishes connection to cluster with ssh
● glmnDeleteJobFcn.m: Cancel job on cluster through Matlab
● glmnGetJobStateFcn.m: Get the job status from the cluster
● glmnPBS.m: Specifies the submission parameters
● glmnCreateSubScript.m: Creates a script which will run on cluster to submit the job
● glmnGenSubmitString.m: Generates the qsub command
● glmnExtractJobId.m: Gets the PBS jobID from the cluster
● glmnCommJobWrapper.sh, glmnIndJobWrapper.sh: The script that is submitted to the worker nodes by qsub

Avoiding Metadata Corruption

● Each pair (Server, Matlab installation) requires a pair of metadata folders, one on the submitting computer and one on Guillimin
● E.g. installing a new version of Matlab and re-using the same metadata folders will result in corruption
● E.g. Submitting to a new MDCS server and re-using the same metadata folders will result in corruption
● E.g. Multiple users from the same client will require a shared metadata folder (read and write) or separate profiles
  – Important: You cannot re-use your class account configuration for other Guillimin accounts

How many metadata folders?

Servers: guillimin, orcinus
Clients: Lab computer, Home computer
Matlab installations on the clients: R2013a, R2014a, R2013a

Answer: 12
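The count can be reproduced with a little arithmetic. This sketch assumes the slide shows three Matlab installations across the two client computers and two servers, and applies the rule that each (server, installation) pair needs one folder on the client and one on the server.

```matlab
% Servers and client-side Matlab installations as listed on the slide
servers  = {'guillimin', 'orcinus'};
installs = {'R2013a', 'R2014a', 'R2013a'};   % spread over the two clients

% One (server, installation) combination per cluster profile
nCombinations = numel(servers) * numel(installs);   % 2 * 3

% Each combination needs a local folder and a remote folder
nFolders = 2 * nCombinations;

fprintf('%d combinations, %d metadata folders\n', nCombinations, nFolders);
```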

Exercise 2: Configure your computer

● We have made a script, glmnConfigCluster.m, to make configuration easier
● Warning: glmnConfigCluster will overwrite any profiles called 'guillimin'

    >> glmnConfigCluster
    Enter a unique name for your local computer (e.g. the hostname): workshop
    Home directory on local computer (e.g. /home/alex, /Users/alex, or C:\\Users\\alex): /Users/dmazur
    Home directory on guillimin (e.g. /home/alex): /home/dmazur
    One last step: please connect to guillimin, and create your Matlab job directory:

    mkdir -p /home/dmazur/.matlab/jobs/workshop/guillimin/R2014a

    Once done, your local computer will be configured to submit jobs to guillimin.

Exercise 3: Validation

● You will want to test your new cluster with simple tests before trying more complicated codes
● Clicking the validation button in Matlab can take a long time and the final test is expected to fail
● Perform the validation procedure from the McGill HPC documentation
  – Must be performed in the TestParfor directory

    cd examples/TestParfor

  – In glmnPBS.m, set procsPerNode to 3

A simple batch job

● myCluster = parcluster('guillimin')
  – Selects a cluster profile
● j = batch(myCluster, ...)
  – Submits jobs to a cluster
  – Prompted for username
  – Select 'no' when asked to use identity file
  – Prompted for password
● wait(j): Waits for job to finish

Exercise 4: Simple Batch Job

    >> myCluster = parcluster('guillimin')
    >> j = batch(myCluster, @rand, 1, {10, 10}, 'CurrentDirectory', '.');
    >> wait(j)
    >> r = fetchOutputs(j)
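To see what the batch() arguments mean without touching the cluster, the sketch below makes the same function call locally. It is a stand-in, not the cluster path: it only mirrors the argument layout (function handle, number of outputs, cell array of inputs) and the fact that fetchOutputs returns a cell array with one cell per requested output.

```matlab
% Local stand-in for: j = batch(myCluster, @rand, 1, {10, 10}, ...)
fcn    = @rand;      % function to evaluate
nOut   = 1;          % number of outputs requested
argsIn = {10, 10};   % input arguments, passed as fcn(10, 10)

% Collect the outputs in a cell array, as fetchOutputs(j) does
r = cell(1, nOut);
[r{1:nOut}] = fcn(argsIn{:});

A = r{1};            % the 10-by-10 random matrix itself
disp(size(A))
```

In the exercise, the matrix is therefore reached as r{1} rather than r.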

glmnPBS.m

● For parallel jobs, we have a script (glmnPBS.m) to make job submission easier
● Place this script in your working directory
● Before submission, check that you have a valid glmnPBS.m file, and that your submission parameters are correct

    >> test = glmnPBS();
    >> test.getSubmitArgs()

    classdef glmnPBS
        %Guillimin PBS submission arguments
        properties
            % Local script, remote working directory (home, by default)
            localScript = 'TestParfor';
            workingDirectory = '.';

            % nodes, ppn, gpus, phis and other attributes
            numberOfNodes = 1;
            procsPerNode = 3;
            gpus = 0;
            phis = 0;
            attributes = '';

            % Specify the memory per process required
            pmem = '1700m'

            % Requested walltime
            walltime = '00:30:00'

            % Please use metaq unless you require a specific node type
            queue = 'metaq'

            % All jobs should specify an account or RAPid:
            % e.g.
            % account = 'xyz-123-aa'
            account = '';

            % You may use otherOptions to append a string to the qsub command
            % e.g.
            % otherOptions = '-M email[at]address.com -m bae'
            otherOptions = ''
        end

Submitting with glmnPBS.m

    >> cluster = parcluster('guillimin');
    >> glmnPBS.submitTo(cluster);

● Note that glmnPBS.m must be present for all job submissions, even with batch()
  – Called by glmnCommSubFcn.m

        methods(Static)
            function job = submitTo(cluster)
                opt = glmnPBS();
                job = batch(cluster, opt.localScript, ...
                    'matlabpool', opt.getNbWorkers(), ...
                    'CurrentDirectory', opt.workingDirectory ...
                    );
            end
        end

Matlab Job Monitor

● Parallel > Monitor Jobs
● Select Profile: guillimin
● Enter username
● Select 'no'
● Enter password
● Tip: Set autoupdate to 'never', or use an identity file. Otherwise, Matlab interrupts your work with password requests.

Matlab Job Monitor

Job Monitor can report the state, and more details such as output and errors (right click).

Monitoring Jobs on Guillimin

● Show running and queued jobs

    qstat -u class01

  – qstat shows both MDCS and other Guillimin jobs
● Detailed scheduler information for job w/ jobID=########

    qstat -f ########

● Meta-data is stored in job-specific folders

    /home/username/.matlab/jobs/workshop/guillimin/R2014a/Job1

  – The .log files contain output and error from Matlab itself
  – The .txt files contain output from disp() and fprintf()
● You should create output and save matlab (.mat) files within your Guillimin storage (scratch, home, or project spaces)
  – fprintf()
  – save()

Exercise 5: Submit Parallel Job

● Change the working directory to the examples/TestParfor folder you copied from the .tar.gz configuration file
● Launch TestParfor.m using glmnPBS.m

    >> cluster = parcluster('guillimin')
    >> job = glmnPBS.submitTo(cluster)

● Make sure you are in the correct directory
● This script runs for ~15 minutes. You may use showq or the Job Monitor to monitor its progress.

Exercise Codes

● While your job is waiting/running...
● Please download and extract the exercise codes from our website
● http://www.hpc.mcgill.ca/downloads/intro_mdcs/dec2015.tar.gz

Parallel Matlab

● Benefits of parallelism
  – Computations complete faster
  – Scale to larger data sets in the same amount of time
  – Work with larger data sets using distributed memory

Parallel Matlab

● Implicit (automatic) parallelism
  – Bioinformatics toolbox
  – Image processing toolbox
  – Optimization toolbox
  – Signal processing toolbox
  – Statistics toolbox
  – etc...
● Explicit parallelism
  – Parallel toolbox
    ● parfor
    ● spmd
    ● distributed()

TestParfor.m

    function TestParfor;
    clear all;
    N=4000;
    filename='~/output_test_parfor.txt';
    outfile = fopen(filename,'w');
    fprintf(outfile, 'CALCULATION LOG: \n\n');

    tic;
    for k=1:10
        Ham(:,:,k)=rand(N)+i*rand(N);
        fprintf(outfile,'Serial: Doing K-point : %3i\n', k);
        inv(Ham(:,:,k));
    end
    t2=toc;
    fprintf(outfile, 'Time serial = %12f\n', t2);
    fclose(outfile);

    tic;
    parfor k=1:10
        Ham(:,:,k)=rand(N)+i*rand(N);
        outfile = fopen(filename,'a');
        fprintf(outfile,'Parallel: Doing K-point : %3i\n', k);
        fclose(outfile);
        inv(Ham(:,:,k));
    end
    t2=toc;
    outfile = fopen(filename,'a');
    fprintf(outfile, 'Time parallel = %12f\n', t2);
    fprintf(outfile, 'CALCULATIONS DONE ... \n\n');
    fclose(outfile);

● filename: Location of output file on Guillimin
● for loop: Serial 'for' loop executed on head processor
● parfor loop: Parallel 'parfor' loop executed on 2 worker nodes

Parfor

[Figure: timelines for a serial for loop (iterations i = 1, i = 2, i = 3, i = 4 along the Time axis) and a parallel parfor loop with 4 workers (iterations i = 1 to i = 4, each on its own row of the Time axis).]

~/output_test_parfor.txt

    CALCULATION LOG:

    Serial: Doing K-point : 1
    Serial: Doing K-point : 2
    Serial: Doing K-point : 3
    Serial: Doing K-point : 4
    Serial: Doing K-point : 5
    Serial: Doing K-point : 6
    Serial: Doing K-point : 7
    Serial: Doing K-point : 8
    Serial: Doing K-point : 9
    Serial: Doing K-point : 10
    Time serial = 553.056296
    Parallel: Doing K-point : 7
    Parallel: Doing K-point : 4
    Parallel: Doing K-point : 6
    Parallel: Doing K-point : 3
    Parallel: Doing K-point : 5
    Parallel: Doing K-point : 2
    Parallel: Doing K-point : 1
    Parallel: Doing K-point : 9
    Parallel: Doing K-point : 8
    Parallel: Doing K-point : 10
    Time parallel = 291.879429
    CALCULATIONS DONE ...

● Serial 'for' loop executed on head processor
● Parallel 'parfor' loop executed on 2 worker nodes
● Ideal speedup = 2.00X
● Actual speedup = 1.90X
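The speedup figure comes straight from the two timings in the log. A short sketch of the arithmetic, using the logged times and the 2 workers stated on the slide; the parallel efficiency line is an extra quantity added here for illustration.

```matlab
tSerial   = 553.056296;   % 'Time serial' from the log
tParallel = 291.879429;   % 'Time parallel' from the log
nWorkers  = 2;            % workers used by the parfor loop

speedup    = tSerial / tParallel;   % about 1.89, reported as 1.90X on the slide
efficiency = speedup / nWorkers;    % fraction of the ideal 2.00X achieved

fprintf('Speedup %.2fX, efficiency %.0f%%\n', speedup, 100 * efficiency);
```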

Parfor loops

● Loop index must be consecutive integers
  – Cannot be altered in the loop
● Iterations must be independent from one another
  – Local or temporary variables modified inside the parfor loop can't be used after the parfor loop
● Cannot nest parfor loops
  – A parfor loop doesn't need to be the outermost for loop
● Matlab editor will automatically warn about problems

Load Balancing

● Each iteration of the for loop should do an equal amount of work

Good load balancing:

    parfor i = 1:40
        x = rand(1000, 1000);
        inv(x);
    end

Bad load balancing:

    parfor i = 1:40
        x = rand(100*i, 100*i);
        inv(x);
    end

40th iteration has much more work than 1st iteration
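A rough estimate of how uneven the second loop is. This sketch rests on two assumptions that are not in the slides: inverting an n-by-n matrix costs about n^3 operations, and 4 workers each receive one contiguous block of 10 iterations. The real parfor scheduler hands out iterations dynamically, so treat the numbers as an illustration only.

```matlab
% ASSUMPTION: inverting an n-by-n matrix costs about n^3 operations,
% so iteration i of the 'bad' loop (n = 100*i) costs about i^3 units.
iters = 1:40;
cost  = iters .^ 3;

% ASSUMPTION (hypothetical schedule): 4 workers, each given one
% contiguous block of 10 iterations.
nWorkers = 4;
blocks = reshape(cost, [], nWorkers);   % column w = iterations of worker w
workPerWorker = sum(blocks, 1);

imbalance = max(workPerWorker) / mean(workPerWorker);
disp(workPerWorker)
fprintf('Busiest worker does %.2f times the average work\n', imbalance);
```

For the well-balanced loop every iteration has the same cost, so the same calculation gives a ratio of 1.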

Parallel Reduction

    >> s = 0;
    >> parfor i = 1:40
    >>     s = s + i;
    >> end
    >> disp(s)
       820

● Operation will be done 'atomically'
● Operation must be associative
  – e.g. addition or multiplication
  – not subtraction or division
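Why associativity matters can be checked serially. In this sketch a reduction that adds the loop index over 1:40 is split between two hypothetical workers; the grouping does not change the sum, whereas regrouping a subtraction does change the result.

```matlab
vals = 1:40;

% Serial reference: adding the loop index over 1:40
sSerial = sum(vals);

% Two hypothetical workers each reduce half of the iterations,
% then the partial results are combined.
partial = [sum(vals(1:20)), sum(vals(21:40))];
sSplit  = partial(1) + partial(2);    % same value: addition is associative

% Subtraction is not associative, so regrouping changes the answer
left  = (10 - 4) - 3;
right = 10 - (4 - 3);

fprintf('serial %d, split %d, (10-4)-3 = %d, 10-(4-3) = %d\n', ...
        sSerial, sSplit, left, right);
```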

Aside: Atomic Operations

Two workers each add 1 to s, starting from s = 0. Each addition takes three steps:

Step 1: Read s from memory
Step 2: Add 1
Step 3: Store result in s

Non-atomic addition:
● Worker 1 reads s = 0 and computes 0+1
● Worker 2 reads s = 0 and computes 0+1
● Both workers store 1, so s = 1 at the end

Atomic addition:
● Worker 1 reads s = 0, computes 0+1 and stores s = 1
● Worker 2 reads s = 1, computes 1+1 and stores s = 2

Matlab calls 's' a 'reduction variable' and these operations are automatically atomic.

http://www.mathworks.com/help/distcomp/reduction-variables.html
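The two interleavings can be replayed by hand in ordinary serial code. This is only an emulation of the read, add and store steps with two hypothetical workers (the variables r1 and r2 stand for each worker's private copy); it does not use any parallel features.

```matlab
% Non-atomic: both workers read s before either one stores its result
s  = 0;
r1 = s;          % worker 1, step 1: read
r2 = s;          % worker 2, step 1: read
r1 = r1 + 1;     % worker 1, step 2: add 1
r2 = r2 + 1;     % worker 2, step 2: add 1
s  = r1;         % worker 1, step 3: store
s  = r2;         % worker 2, step 3: store (overwrites worker 1)
sNonAtomic = s;  % one update was lost

% Atomic: worker 2 starts only after worker 1 has stored its result
s  = 0;
r1 = s;  r1 = r1 + 1;  s = r1;   % worker 1: read, add, store
r2 = s;  r2 = r2 + 1;  s = r2;   % worker 2: read, add, store
sAtomic = s;

fprintf('non-atomic: s = %d, atomic: s = %d\n', sNonAtomic, sAtomic);
```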

Parallel Concatenation

    >> y = [];
    >> parfor i = 1:10
    >>     y = [y, i];
    >> end
    >> disp(y)
         1 2 3 4 5 6 7 8 9 10

● Matrix is stored in 'correct' order according to index i

Parameter Sweep

● Damped harmonic oscillator
● Give initial velocity for a variety of k's and b's and watch maximum response amplitude
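A minimal sketch of such a sweep, not the workshop's paramsweep.m (which is not shown here). It assumes the equation m*x'' + b*x' + k*x = 0 with an illustrative mass m = 5, made-up ranges for k and b, zero initial displacement and unit initial velocity. Without an open pool the parfor loop simply runs serially.

```matlab
% ASSUMPTIONS: m, the k and b ranges, and the time span are example values.
m     = 5;                        % mass
kVals = linspace(1.5, 5, 10);     % stiffness values to sweep
bVals = linspace(0.1, 5, 8);      % damping values to sweep

% Flatten the (k, b) grid so that each loop iteration is one ODE solve
[kGrid, bGrid] = meshgrid(kVals, bVals);
kList = kGrid(:);
bList = bGrid(:);
peak  = zeros(numel(kList), 1);

parfor idx = 1:numel(kList)
    k = kList(idx);
    b = bList(idx);
    % State y = [x; x'], so y' = [x'; -(b*x' + k*x)/m]
    rhs = @(t, y) [y(2); -(b * y(2) + k * y(1)) / m];
    [~, y] = ode45(rhs, [0 25], [0; 1]);   % x(0) = 0, x'(0) = 1
    peak(idx) = max(abs(y(:, 1)));         % maximum response amplitude
end

peak = reshape(peak, size(kGrid));         % rows: b values, columns: k values
```

Each iteration is independent of the others, which is what makes the loop a candidate for parfor.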

Exercise 6: Parameter Sweep

● paramsweep.m solves a second-order ordinary differential equation (ODE) for varying parameter values
● Modify this code to run in parallel on 2 workers
● Submit your modified code to the MDCS
● Retrieve the resulting plot from Guillimin using scp and view it on your laptop

    [laptop]$ scp \
        [email protected]:~/paramsweep.png ./

Single Program Multiple Data

● spmd command allows each worker to execute the same program on different data
● Variables labindex and numlabs are (for example) used to index the data
  – Automatically defined inside SPMD sections
● Functions labSend() and labReceive() are used to send and receive data between the workers

    >> matlabpool(3) % Or parpool(3) on newer versions of Matlab
    Starting matlabpool using the 'local' profile ... connected to 3 workers.
    >> spmd
    labindex
    end
    Lab 1:
      ans = 1
    Lab 2:
      ans = 2
    Lab 3:
      ans = 3
    >> spmd
    q = magic(labindex + 2);
    end
    >> q
    q =
       Lab 1: class = double, size = [3 3]
       Lab 2: class = double, size = [4 4]
       Lab 3: class = double, size = [5 5]
    >> q{1}
    ans =
         8     1     6
         3     5     7
         4     9     2
    >> q{2}
    ans =
        16     2     3    13
         5    11    10     8
         9     7     6    12
         4    14    15     1

SPMD data load example

● SPMD can be used to have each worker process data from separate files
● Example, process data stored in files datafile1.mat, datafile2.mat, etc...

    spmd
        infile = load(['datafile' num2str(labindex) '.mat']);
        result = myfunc(infile)
    end

Serial numerical integration

    m = 10;
    b = pi/2;
    dx = b/m;
    x = dx/2:dx:b-dx/2;
    int = sum(cos(x)*dx)

SPMD Integral

● We would like to parallelize this integral using spmd
● In terms of m, b, numlabs and labindex:
  – How many increments per lab?
    ● n = m / numlabs
  – Integration length per lab?
    ● Delta = dx * n = (b / m) * (m / numlabs) = b / numlabs
  – Local integration range?
    ● ai = (labindex - 1) * Delta
    ● bi = labindex * Delta
● We can use gplus(int, 1) to perform a global sum over int from each worker

SPMD Integral

e.g.) m = 10, numlabs = 5

    n = 10/5 = 2 increments per lab
    Delta = (pi/2)/5 = pi/10
    ai = (labindex-1)*pi/10
    bi = labindex*pi/10

Sum over increments for a worker:

    int = sum(cos(x)*dx);

Global sum over all workers:

    int = gplus(int, 1);
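The decomposition can be checked without a pool by emulating the labs with an ordinary loop. In this sketch nLabs and lab are stand-ins for numlabs and labindex, and the final sum plays the role that gplus(int, 1) has inside an spmd block; it assumes m is divisible by the number of labs.

```matlab
m  = 10;
b  = pi/2;
dx = b / m;

% Serial reference (midpoint rule, as in the serial code)
x = dx * ((1:m) - 0.5);
intSerial = sum(cos(x) * dx);

% Emulate the labs with a loop; lab stands in for labindex
nLabs = 5;                  % ASSUMPTION: m is divisible by nLabs
n     = m / nLabs;          % increments per lab
Delta = b / nLabs;          % integration length per lab
partial = zeros(1, nLabs);
for lab = 1:nLabs
    ai = (lab - 1) * Delta;             % local lower limit
    xi = ai + dx * ((1:n) - 0.5);       % local midpoints between ai and bi
    partial(lab) = sum(cos(xi) * dx);   % this lab's share of the integral
end
intParallel = sum(partial);  % the role of gplus(int, 1) inside spmd

fprintf('serial %.6f, split over %d labs %.6f\n', intSerial, nLabs, intParallel);
```

Both values agree to rounding error and are close to the exact integral of cos(x) from 0 to pi/2, which is 1.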

Exercise 7: Numerical Integration

● integration.m is a serial numerical integration program
● Modify this code to run in parallel using the spmd command
● Submit your modified code to the MDCS using 2 workers

Distributed Arrays

    matlabpool(4)
    A = distributed([ a b c d; e f g h; i j k l; m n o p]);

[Diagram: the columns of A are spread over MDCS workers 1 to 4: [a; e; i; m], [b; f; j; n], [c; g; k; o] and [d; h; l; p].]

Distributed Arrays

● Allow large data sets to be distributed over multiple nodes
● Distributed by columns
● Can be constructed by
  – partitioning a large array already in memory
  – combining smaller arrays into one large array
  – using distributed matrix constructor functions (distributed.rand(), distributed.zeros(), etc.)
● Operations on distributed arrays are automatically parallelized
● Arrays do not persist if the matlabpool is closed
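To picture what "distributed by columns" means when the column count does not divide evenly, here is a small serial sketch. It assumes the columns are split as evenly as possible, with the earlier workers taking any extra column; this is an illustration of the idea, not a statement about Matlab's internals, and the sizes are example values.

```matlab
nCols    = 10;   % columns in the array (example value)
nWorkers = 4;    % size of the pool (example value)

% ASSUMPTION: as-even-as-possible split, extras go to the first workers
base  = floor(nCols / nWorkers);
extra = mod(nCols, nWorkers);
colsPerWorker = base + ((1:nWorkers) <= extra);

lastCol  = cumsum(colsPerWorker);
firstCol = lastCol - colsPerWorker + 1;
for w = 1:nWorkers
    fprintf('worker %d: columns %d to %d\n', w, firstCol(w), lastCol(w));
end
```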

Codistributed Arrays

● Codistributed arrays provide much more control over how arrays are distributed
  – Can be distributed by any dimension
  – Can distribute different amounts of data to different workers
● Codistributed arrays can be declared inside spmd sections

Exercise 8: Matrix Multiplication

● matrixmul.m is a serial matrix multiplication
● Modify this file to use distributed arrays
  – create distributed random arrays a, b
  – time a matrix multiplication: tic; c = a*b; toc
● Submit the job for 1 worker and then for 4 workers
  – What is the speedup (serial time / parallel time)?

Using GPUs with Matlab

● The Parallel Computing Toolbox can utilize CUDA-capable GPUs on the system (e.g. the K20s on Guillimin)
● GPU-enabled functions
  – fft, filter
  – toolbox functions
● Linear-algebra operations
● Custom CUDA kernels
  – .cu or .ptx formats

GPU Arrays

● Matlab can copy arrays to the GPU
● Perform matrix operations on the GPU to speed them up
● e.g.
  – x = rand(1000, 'single', 'gpuArray');
  – x2 = x.*x; %performed on GPU
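The copy-to-GPU route mentioned in the first bullet can be sketched as follows. It uses gpuArray to copy a host array to the GPU and gather to bring the result back, guarded by a check so that the same arithmetic runs on the CPU when no GPU support is available. The guard itself is an addition made for this sketch.

```matlab
x = rand(1000, 'single');          % array in host (CPU) memory

% ASSUMPTION: gpuDeviceCount, gpuArray and gather come from the
% Parallel Computing Toolbox; fall back to the CPU when they are missing.
haveGpu = exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0;

if haveGpu
    xg = gpuArray(x);      % copy host -> GPU
    yg = xg .* xg;         % element-wise product runs on the GPU
    y  = gather(yg);       % copy result GPU -> host
else
    y  = x .* x;           % same arithmetic on the CPU
end
```

When timing GPU code with tic and toc, keep in mind that the transfers to and from the GPU are part of the cost.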

Exercise 9: GPU Job

● fourier.m is a serial fast Fourier transform (FFT) code
● Modify this file to perform the same calculation using normal and GPU Arrays
● Use tic and toc to time both operations and output the results
● Submit this job to a Guillimin GPU node
  – Hint: Simply request in glmnPBS.m

    numberOfNodes = 1;
    procsPerNode = 1;
    gpus = 1;

● What is the speedup from the GPU?

Summary

● Today we learned:
  – How to configure a desktop installation of Matlab to submit jobs to a cluster computer using MDCS
  – How to submit jobs to a cluster and monitor their output
  – How to write parallel Matlab applications using parfor, spmd, and distributed arrays
● Many Matlab programs can be parallelized with a very small change
● Note that parallel programming is a huge topic and we have only scratched the surface!


Questions

What questions do you have?

Using Xeon Phi with Matlab

● Matlab uses the Intel MKL math library
● Version >= 11.0 of MKL has automatic offloading to Xeon Phi
  – Included in Matlab R2014a and newer
● On Guillimin:

    module add ifort_icc
    export MKL_MIC_MAX_MEMORY=16G
    export MKL_MIC_ENABLE=1
    matlab &