Parallel Programming in MATLAB on BioHPC

[web] portal.biohpc.swmed.edu
[email] [email protected]

Updated for 2018-03-21


Post on 23-Jun-2020



What is MATLAB

• High-level language and development environment for:
  - Algorithm and application development
  - Data analysis
  - Mathematical modeling

• Extensive math, engineering, and plotting functionality

• Add-on products for image and video processing, communications, signal processing, and more

Powerful and easy to learn!

Ways of MATLAB Parallel Computing

• Built-in Multithreading (Implicit Parallelism)
  * Automatically handled by MATLAB
  - Use vector operations in place of for-loops: +, -, *, /, SORT, CEIL, LOG, FFT, IFFT, SVD, …

Example: element-wise product

    for i = 1 : size(A, 1)
        for j = 1 : size(A, 2)
            C(i, j) = A(i, j) * B(i, j);
        end
    end

vs.

    C = A .* B;

If both A and B are 1000-by-1000 matrices, the built-in element-wise product is 50 to 100 times faster than the for-loop.*

* Testing environment: MATLAB R2015a on a BioHPC compute node
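The comparison above can be reproduced with a small timing script. This is a minimal sketch, not the benchmark used for the slide: the matrix contents are random, the variable names `C_loop`, `C_vec`, `tLoop`, and `tVec` are illustrative, and the measured ratio will depend on the MATLAB release and hardware.

```matlab
% Time an explicit double loop against the vectorized element-wise product.
n = 1000;
A = rand(n);
B = rand(n);

C_loop = zeros(n);              % preallocate so the loop does not pay for array growth
tic;
for i = 1 : size(A, 1)
    for j = 1 : size(A, 2)
        C_loop(i, j) = A(i, j) * B(i, j);
    end
end
tLoop = toc;

tic;
C_vec = A .* B;                 % vectorized, implicitly multithreaded
tVec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
```

Both versions compute the same matrix; only the elapsed time differs.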

Ways of MATLAB Parallel Computing

• Parallel Computing Toolbox
  * Accelerate MATLAB code with very few code changes
  * Jobs run on your workstation, thin client, or a reserved compute node
  - Parallel for-loop (parfor)
  - spmd (Single Program Multiple Data)
  - pmode (interactive environment for spmd development)
  - Parallel computing with GPU
  - Benchmark

• MATLAB Distributed Computing Server (MDCS)
  * Submit MATLAB jobs directly to the BioHPC cluster
  - MATLAB job scheduler integrated with SLURM
  - A single job can use multiple nodes

Parallel Computing Toolbox – Parallel for-Loops (parfor)

• Allows several MATLAB workers to execute individual loop iterations simultaneously.

• The only syntactic difference is the keyword parfor in place of for. When the loop begins, MATLAB opens a parallel pool (if one is not already open) of MATLAB sessions, called workers, to execute the iterations in parallel.

Serial loop:

    for …
        <statements>
    end

Parallel loop:

    <open matlab pool>
    parfor …
        <statements>
    end
    <close matlab pool>

Parallel Computing Toolbox – Parallel for-Loops (parfor)

Example 1: Estimating an Integration

$F(x) = \int_a^b 0.2\,\sin(2x)\,dx$

[Figure: curve with the integration limits a and b marked on the x-axis]

The integral is estimated by calculating the total area under the curve:

$F(x) \approx \sum_{i=1}^{n} \text{height}_i \times \text{width}$

We can calculate the subareas in any order!

Parallel Computing Toolbox – Parallel for-Loops (parfor)

Example 1: Estimating an Integration

quad_fun.m (serial version)

    function q = quad_fun(n, a, b)
        q = 0.0;
        w = (b - a) / n;
        for i = 1 : n
            x = ((n - i) * a + (i - 1) * b) / (n - 1);
            fx = 0.2 * sin(2 * x);
            q = q + w * fx;
        end
        return
    end

quad_fun_parfor.m (parfor version)

    function q = quad_fun_parfor(n, a, b)
        q = 0.0;
        w = (b - a) / n;
        parfor i = 1 : n
            x = ((n - i) * a + (i - 1) * b) / (n - 1);
            fx = 0.2 * sin(2 * x);
            q = q + w * fx;
        end
        return
    end

12 workers: speedup = t1/t2 = 6.14x
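As a sanity check on the parfor version, the estimate can be compared with the closed-form value of the integral. The sketch below inlines the loop from quad_fun_parfor so that it runs as a plain script; the smaller `n` and the names `qExact` and `err` are assumptions made here for illustration. Without an open pool (or without the toolbox) the loop simply runs serially.

```matlab
% Estimate the integral of 0.2*sin(2x) on [a, b] with a parfor reduction
% and compare it with the analytic value.
a = 0.13;
b = 1.53;
n = 1e6;                         % smaller than the slide's n, to keep the check quick
w = (b - a) / n;

q = 0.0;
parfor i = 1 : n
    x = ((n - i) * a + (i - 1) * b) / (n - 1);
    q = q + w * 0.2 * sin(2 * x);    % q is a reduction variable
end

% The antiderivative of 0.2*sin(2x) is -0.1*cos(2x).
qExact = 0.1 * (cos(2 * a) - cos(2 * b));
err = abs(q - qExact);
fprintf('estimate = %.8f, exact = %.8f, error = %.2e\n', q, qExact, err);
```

Because `q` is only ever updated with `q = q + ...`, MATLAB treats it as a reduction variable, so the iterations can be summed in any order.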

Parallel Computing Toolbox – Parallel for-Loops (parfor)

Q: How many workers can I use for a MATLAB parallel pool?

    Node type                   Default   Maximum
    128GB, 384GB, GPU, 32GB     12        16*
    256GB                       12        24*
    256GBv1, GPUv1              14        28*
    Workstation                 6         6
    Thin client                 2         2

** Use parpool('local', numOfWorkers) to specify how many workers you want (or matlabpool('local', numOfWorkers) if you are using matlab_r2013a).

2013a/2013b
• max number of workers = 12

2014a/2014b/2015a/2015b
• max number of workers = number of physical cores

Parallel Computing Toolbox – Parallel for-Loops (parfor)

Q: How many workers can I use for a MATLAB parallel pool?

2016a/2016b/2017a/2017b
• max number of workers = 512
• default number of workers = number of physical cores
• choose wisely

    % load local profile configuration
    myCluster = parcluster('local');
    % increase number of workers
    myCluster.NumWorkers = 32;

    % open matlab pool
    parpool(myCluster);

    % parfor loop
    parfor …
        <statements>
    end

    % close matlab pool
    delete(gcp('nocreate'));

    Node type                   Number of workers
    128GB, 384GB, GPU, 32GB     <= 32
    256GB                       <= 48
    256GBv1, GPUv1              <= 56
    Workstation                 <= 12
    Thin client                 <= 4

Limitations of parfor

No nested parfor loops*

Not allowed:

    parfor i = 1 : 10
        parfor j = 1 : 10
            <statement>
        end
    end

Allowed:

    parfor i = 1 : 10
        for j = 1 : 10
            <statement>
        end
    end

* Because a worker cannot open a parallel pool

Loop iterations must be independent

Not allowed:

    parfor i = 2 : 10
        A(i) = 0.5 * (A(i-1) + A(i+1));
    end

Loop variable must take increasing integer values

Not allowed:

    parfor x = 0 : 0.1 : 1
        <statement>
    end

Allowed:

    xAll = 0 : 0.1 : 1;
    parfor ii = 1 : length(xAll)
        x = xAll(ii);
        <statement>
    end
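The integer-index workaround can be made concrete as follows. This is a minimal sketch in which `sin(x)` stands in for the slide's `<statement>` and the output vector `y` is a name introduced here; each iteration writes only its own element, so the iterations stay independent.

```matlab
% Loop over a non-integer range by indexing into a precomputed vector.
xAll = 0 : 0.1 : 1;
y = zeros(size(xAll));           % sliced output variable

parfor ii = 1 : length(xAll)
    x = xAll(ii);                % recover the real-valued loop quantity
    y(ii) = sin(x);              % placeholder for <statement>
end

disp(y);
```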

Parallel Computing Toolbox – spmd and Distributed Array

spmd: Single Program Multiple Data

• single program -- the identical code runs on multiple workers.
• multiple data -- each worker can have different, unique data for that code.

• numlabs -- number of workers.
• labindex -- each worker has a unique identifier between 1 and numlabs.

• point-to-point communication between workers: labSend, labReceive, labSendReceive

• Ideal for:
  1) programs that take a long time to execute.
  2) programs operating on large data sets.

    spmd
        labdata = load(['datafile_' num2str(labindex) '.ascii']);   % multiple data
        result = MyFunction(labdata);                               % single program
    end

General form:

    <open matlab pool>
    spmd
        <statements>
    end
    <close matlab pool>

Parallel Computing Toolbox – spmd and Distributed Array

Example 1: Estimating an Integration

quad_fun.m

    function q = quad_fun(n, a, b)
        q = 0.0;
        w = (b - a) / n;
        for i = 1 : n
            x = ((n - i) * a + (i - 1) * b) / (n - 1);
            fx = 0.2 * sin(2 * x);
            q = q + w * fx;
        end
        return
    end

quad_fun_spmd.m (spmd version)

    a = 0.13;
    b = 1.53;
    n = 120000000;
    spmd
        aa = (labindex - 1) / numlabs * (b - a) + a;
        bb = labindex / numlabs * (b - a) + a;
        nn = round(n / numlabs);
        q_part = quad_fun(nn, aa, bb);
    end
    q = sum([q_part{:}]);

12 workers: speedup = t1/t3 = 8.26x

Only 12 partial results are sent from the workers to the client to form q, whereas parfor splits the 120,000,000 iterations into many smaller chunks and incurs communication overhead for each of them.
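The split performed inside the spmd block can be written out explicitly. In the sketch below, $k$ stands for labindex and $p$ for numlabs (symbols introduced here for illustration), and $q_k$ is the value of q_part on worker $k$:

```latex
\begin{aligned}
a_k &= a + \frac{k-1}{p}\,(b-a), \qquad
b_k = a + \frac{k}{p}\,(b-a), \qquad k = 1,\dots,p, \\
\int_a^b 0.2\,\sin(2x)\,dx
  &= \sum_{k=1}^{p} \int_{a_k}^{b_k} 0.2\,\sin(2x)\,dx
  \;\approx\; \sum_{k=1}^{p} q_k .
\end{aligned}
```

Since $b_k = a_{k+1}$, the subintervals tile $[a, b]$ without gaps or overlap, which is why summing the per-worker estimates recovers the estimate for the whole interval.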

Parallel Computing Toolbox – spmd and Distributed Array

distributed array – Data is distributed from the client and is readily accessible on the client. Data is always distributed along the last dimension, as evenly as possible among the workers.

codistributed array – Data is distributed within spmd; the user defines the dimension along which to distribute the data. The array is created directly (locally) on the workers.

Matrix Multiply A*B

distributed array:

    parpool('local');
    A = rand(3000);
    B = rand(3000);
    a = distributed(A);
    b = distributed(B);
    c = a * b;   % run on workers, c is distributed
    delete(gcp);

codistributed array:

    parpool('local');
    A = rand(3000);
    B = rand(3000);
    spmd
        u = codistributed(A, codistributor1d(1));   % by row, 1st dim
        v = codistributed(B, codistributor1d(2));   % by column, 2nd dim
        w = u * v;
    end
    delete(gcp);

Parallel Computing Toolbox – pmode: learn parallel programming interactively

[Figure: pmode interactive window; callout: "Enter command here"]

Parallel Computing Toolbox – spmd and Distributed Array

Example 2: Ax = b

linearSolver.m (serial version)

    n = 10000;
    M = rand(n);
    X = ones(n, 1);
    A = M + M';
    b = A * X;
    u = A \ b;

linearSolverSpmd.m (spmd version)

    n = 10000;
    M = rand(n);
    X = ones(n, 1);
    spmd
        m = codistributed(M, codistributor('1d', 2));
        x = codistributed(X, codistributor('1d', 1));
        A = m + m';
        b = A * x;
        utmp = A \ b;
    end
    u1 = gather(utmp);
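Either version can be validated by checking the residual of the computed solution. The sketch below uses the serial solver with a smaller `n` so that it finishes quickly; `relResidual` is a name introduced here. The same check applies to `u1` from the spmd version once it has been gathered.

```matlab
% Solve Ax = b for a random symmetric A and verify the solution.
n = 500;                         % reduced from the slide's 10000 for a quick check
M = rand(n);
X = ones(n, 1);                  % known solution used to build b
A = M + M';                      % symmetric matrix
b = A * X;
u = A \ b;

% Relative residual: small when u solves the system to working precision.
relResidual = norm(A * u - b) / norm(b);
fprintf('relative residual = %.2e\n', relResidual);
```

The residual is a more robust check than comparing `u` with `X` directly, because the error in `u` also depends on how well conditioned the random matrix happens to be.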

Parallel Computing Toolbox – spmd and Distributed Array

Factors that reduce the speedup of parfor and spmd

• The computation inside the loop is too simple.
• Memory limitations (more data is created than in the serial code).
• Transferring and converting data is time consuming.
• Unbalanced computational load.
• Synchronization.

Parallel Computing Toolbox – spmd and Distributed Array

Example 3: Contrast Enhancement
(example provided by John Burkardt @ FSU)

contrast_enhance.m

    function y = contrast_enhance(x)
        x = double(x);
        n = size(x, 1);
        x_average = sum(sum(x(:, :))) / n / n;
        s = 3.0;   % the contrast s should be greater than 1
        y = (1.0 - s) * x_average + s * x((n + 1) / 2, (n + 1) / 2);
        return
    end

contrast_serial.m

    x = imread('surfsup.tif');
    y = nlfilter(x, [3, 3], @contrast_enhance);
    y = uint8(y);

Image size: 480 * 640, uint8

Running time is 14.46 s

Parallel Computing Toolbox – spmd and Distributed Array

Example 3: Contrast Enhancement

contrast_spmd.m (spmd version)

    x = imread('surfsup.tif');
    xd = distributed(x);
    spmd
        xl = getLocalPart(xd);
        xl = nlfilter(xl, [3, 3], @contrast_enhance);
    end
    y = [xl{:}];

12 workers: running time is 2.01 s, speedup = 7.19x

When the image is divided by columns among the workers, artificial internal boundaries are created!

[Figure: general sliding-neighborhood operation, with zero padding at the edges]

Solution: set up communication between workers.

Parallel Computing Toolbox – spmd and Distributed Array

Example 3: Contrast Enhancement

    % send own first column to previous, receive a ghost column from next
    column = labSendReceive(previous, next, xl(:, 1));

    % send own last column to next, receive a ghost column from previous
    column = labSendReceive(next, previous, xl(:, end));

[Figure: local part of the image with a ghost boundary column on each side, exchanged with the previous and next workers]

For a 3*3 filtering window, you need 1 column of ghost boundary on each side.

Parallel Computing Toolbox – spmd and Distributed Array

Example 3: Contrast Enhancement

contrast_MPI.m (mpi version)

    x = imread('surfsup.tif');
    xd = distributed(x);

    spmd
        xl = getLocalPart(xd);

        % find out previous & next by labindex
        if (labindex ~= 1)
            previous = labindex - 1;
        else
            previous = numlabs;
        end

        if (labindex ~= numlabs)
            next = labindex + 1;
        else
            next = 1;
        end

        % attach ghost boundaries
        lastColumn = xl(:, end);   % keep own last column before a ghost column is appended
        column = labSendReceive(previous, next, xl(:, 1));
        if (labindex < numlabs)
            xl = [xl, column];
        end

        column = labSendReceive(next, previous, lastColumn);
        if (1 < labindex)
            xl = [column, xl];
        end

        xl = nlfilter(xl, [3, 3], @contrast_enhance);

        % remove ghost boundaries
        if (1 < labindex)
            xl = xl(:, 2:end);
        end

        if (labindex < numlabs)
            xl = xl(:, 1:end-1);
        end

        xl = uint8(xl);
    end
    y = [xl{:}];

12 workers: running time is 2.00 s, speedup = 7.23x
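The effect of the ghost columns can be checked without a parallel pool by emulating the workers with a serial loop. The sketch below is an illustration under stated assumptions, not the slide's method: it swaps nlfilter and contrast_enhance for a 3-by-3 mean filter computed with conv2 (which also zero-pads at the edges), uses magic(8) as a stand-in image, and introduces the names `nParts`, `edges`, and `parts`.

```matlab
% Emulate column-wise partitioning with one ghost column per side.
x = magic(8);                    % stand-in for the image
k = ones(3) / 9;                 % 3x3 mean filter (stand-in for contrast_enhance)
yRef = conv2(x, k, 'same');      % filter applied to the whole image

nParts = 4;                      % plays the role of numlabs
nCols = size(x, 2);
edges = round(linspace(0, nCols, nParts + 1));
parts = cell(1, nParts);

for p = 1 : nParts               % p plays the role of labindex
    first = edges(p) + 1;        % first own column
    last = edges(p + 1);         % last own column
    lo = max(first - 1, 1);      % left ghost column, if a neighbor exists
    hi = min(last + 1, nCols);   % right ghost column, if a neighbor exists

    block = conv2(x(:, lo:hi), k, 'same');

    % remove ghost boundaries, keeping only the own columns
    parts{p} = block(:, (first - lo + 1) : (end - (hi - last)));
end

y = [parts{:}];
fprintf('max difference from whole-image filter: %.2e\n', max(abs(y(:) - yRef(:))));
```

With the ghost columns attached, every own column sees the same neighbors as in the undivided image, so the stitched result matches the whole-image filter.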

Parallel Computing Toolbox – GPU

Parallel Computing Toolbox enables you to program MATLAB to use your computer's graphics processing unit (GPU) for matrix operations. For some problems, execution on the GPU is faster than on the CPU.

How to enable GPU computing in MATLAB on the BioHPC cluster

Step 1: Reserve a GPU node.
Step 2: Inside a terminal, type
    export CUDA_VISIBLE_DEVICES="0"
to enable the hardware rendering.

• You can launch MATLAB directly if you have a GPU card on your workstation.
• Currently supported versions on BioHPC: 2013a/2013b/2014a/2014b/2015a/2015b

Parallel Computing Toolbox – GPU

Example 2: Ax = b

linearSolverGPU.m (GPU version)

    n = 10000;
    M = rand(n);
    X = ones(n, 1);
    Mgpu = gpuArray(M);      % copy data from CPU to GPU (~0.3 s)
    Xgpu = gpuArray(X);
    Agpu = Mgpu + Mgpu';
    bgpu = Agpu * Xgpu;
    ugpu = Agpu \ bgpu;
    u = gather(ugpu);        % copy data from GPU to CPU (~0.0002 s)

Speedup = t1/t3 = 2.54x

Parallel Computing Toolbox – GPU

Check GPU information (inside MATLAB)

Q: Is a GPU available?

    if (gpuDeviceCount > 0)

Q: How to check memory usage?

    currentGPU = gpuDevice;
    currentGPU.AvailableMemory;

* TotalMemory refers to the GPU accelerator memory, e.g. K40m ~12 GB; K20m ~5 GB
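The two checks above can be combined into a guarded snippet that also runs on machines without a GPU. This is a minimal sketch; `hasGPU` and `g` are names introduced here, and the memory figures are reported in units of 1e9 bytes.

```matlab
% Report GPU availability and memory, falling back gracefully when no GPU exists.
hasGPU = gpuDeviceCount > 0;

if hasGPU
    g = gpuDevice;               % the currently selected device
    fprintf('%s: %.2f GB available of %.2f GB\n', ...
        g.Name, g.AvailableMemory / 1e9, g.TotalMemory / 1e9);
else
    disp('No supported GPU detected; keep the data in ordinary CPU arrays.');
end
```

Checking AvailableMemory before calling gpuArray on large inputs helps avoid out-of-memory errors on the device.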

Parallel Computing Toolbox – GPU

How to use multiple GPU devices? E.g. GPUv1

    nGPU = gpuDeviceCount;       % number of GPU devices on the node
    parpool('local', nGPU);      % one worker per GPU

    spmd
        gpuDevice(labindex);     % each worker selects its own GPU
        …
    end

Parallel Computing Toolbox – memory consumption

[Figure: "Out of memory" error]

* The actual memory consumption is much higher than the input data size.

Parallel Computing Toolbox – benchmark

Wiki: In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it.

Example: Benchmarking A\b on the GPU
https://www.mathworks.com/help/distcomp/examples/benchmarking-a-b-on-the-gpu.html

We want to benchmark matrix left division (\), and not the cost of transferring data between the CPU and GPU, the time it takes to create a matrix, or other parameters.

    openExample('distcomp/paralleldemo_gpu_backslash')

    function time = timeSolve(A, b, waitFcn)
        tic;
        x = A\b;     % We don't need the value of x.
        waitFcn();   % Wait for the operation to complete before stopping the timer.
        time = toc;
    end

Parallel Computing Toolbox – proxy settings

Parallel Computing Toolbox – benchmark A\b on the GPU

@nucleus171, GPUv1

    >> gpuDevice(1)

    ans =

      CUDADevice with properties:

                          Name: 'Tesla P100-PCIE-16GB'
                         Index: 1
             ComputeCapability: '6.0'
                SupportsDouble: 1
                 DriverVersion: 9
                ToolkitVersion: 8
            MaxThreadsPerBlock: 1024
              MaxShmemPerBlock: 49152
            MaxThreadBlockSize: [1024 1024 64]
                   MaxGridSize: [2.1475e+09 65535 65535]
                     SIMDWidth: 32
                   TotalMemory: 1.7067e+10
               AvailableMemory: 1.6578e+10
           MultiprocessorCount: 56
                  ClockRateKHz: 1328500
                   ComputeMode: 'Default'
          GPUOverlapsTransfers: 1
        KernelExecutionTimeout: 1
              CanMapHostMemory: 1
               DeviceSupported: 1
                DeviceSelected: 1

* Use the number of floating-point operations per second as the measure of performance.

* Use nvidia-smi to check which GPU device is used.
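To turn a measured solve time into floating-point operations per second, an operation count is needed. The sketch below runs on the CPU and assumes the approximation 2/3*n^3 + 3/2*n^2 for the cost of solving a dense n-by-n system; that formula, the test matrix, and the names `flop` and `gflops` are assumptions made here for illustration.

```matlab
% Measure A\b on the CPU and express the result in GFLOPS.
n = 1000;
A = rand(n) + n * eye(n);        % diagonally dominant, so the system is well conditioned
b = rand(n, 1);

tic;
x = A \ b;
t = toc;

flop = 2/3 * n^3 + 3/2 * n^2;    % assumed operation count for a dense solve
gflops = flop / t / 1e9;
fprintf('n = %d: %.4f s, %.2f GFLOPS\n', n, t, gflops);
```

A single tic/toc measurement is noisy; repeating the solve and taking the best time gives a steadier figure.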

Matlab Distributed Computing Server

Matlab Distributed Computing Server: Setup and Test your MATLAB environment

Home -> ENVIRONMENT -> Parallel -> Manage Cluster Profiles

Add -> Import Cluster Profiles from file

Then select the profile file that matches your MATLAB version:

    /project/apps/apps/MATLAB/profile/nucleus_r2017a.settings
    /project/apps/apps/MATLAB/profile/nucleus_r2016b.settings
    /project/apps/apps/MATLAB/profile/nucleus_r2016a.settings
    /project/apps/apps/MATLAB/profile/nucleus_r2015b.settings
    /project/apps/apps/MATLAB/profile/nucleus_r2015a.settings
    /project/apps/apps/MATLAB/profile/nucleus_r2014b.settings
    /project/apps/apps/MATLAB/profile/nucleus_r2014a.settings
    /project/apps/apps/MATLAB/profile/nucleus_r2013b.settings
    /project/apps/apps/MATLAB/profile/nucleus_r2013a.settings

Matlab Distributed Computing Server: Setup and Test your MATLAB environment

You should have two Cluster Profiles ready for MATLAB parallel computing:

• "local" – for running jobs on your workstation, thin client, or any single compute node of the BioHPC cluster (uses Parallel Computing Toolbox)
• "nucleus_r<version No.>" – for running jobs on the BioHPC cluster with multiple nodes (uses MDCS)

* Ignore the "MATLAB Parallel Cloud" profile if you are using 2016a/2016b/2017a

Test your parallel profile

Matlab Distributed Computing Server: Setup SLURM environment

From the MATLAB command window:

    ClusterInfo.setQueueName('128GB');                  % use 128 GB partition
    ClusterInfo.setNNode(2);                            % request 2 nodes
    ClusterInfo.setWallTime('1:00:00');                 % set time limit to 1 hour
    ClusterInfo.setEmailAddress('[email protected]');   % email notification

You can check your settings with:

    ClusterInfo.getQueueName();
    ClusterInfo.getNNode();
    ClusterInfo.getWallTime();
    ClusterInfo.getEmailAddress();

Matlab Distributed Computing Server: batch processing

Job submission

For scripts:

    job = batch('ascript', 'Pool', 15, 'profile', 'nucleus_r2016a');

For functions:

    job = batch(@sfun, 1, {50, 4000}, 'Pool', 15, 'profile', 'nucleus_r2016a');

Arguments:
- 'ascript': script name
- @sfun: function name
- 1: number of variables to be returned by the function
- {50, 4000}: cell array containing the input arguments
- 'Pool', 15: number of workers - 1 (one additional worker runs the batch job itself)
- 'profile', 'nucleus_r2016a': configuration file (cluster profile)

Matlab Distributed Computing Server: batch processing

Wait for the batch job to finish

• Block the session until the job has finished: wait(job, 'finished');

• Occasionally check the state of the job: get(job, 'state');

• Load results after the job has finished: results = fetchOutputs(job) -or- load(job);

• Delete the job: delete(job);

You can also open the job monitor from Home -> Parallel -> Monitor Jobs
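The submit, wait, fetch, and delete steps can be tried end to end on a single machine before moving to the cluster. The sketch below is an illustration that assumes the 'local' profile named earlier in this deck is available (newer MATLAB releases may name it differently) and uses the built-in `sum` as a stand-in for a user function.

```matlab
% Submit a function as a batch job to the local profile and collect its output.
job = batch(@sum, 1, {[1 2 3 4]}, 'Profile', 'local');   % 1 output, one input argument

wait(job, 'finished');           % block until the job has finished
state = job.State;               % job state as text, 'finished' on success

results = fetchOutputs(job);     % cell array with one cell per output
total = results{1};

delete(job);                     % remove the job and its stored data
fprintf('state = %s, result = %d\n', state, total);
```

The same four steps apply unchanged when the profile is replaced by a nucleus_r<version No.> cluster profile; only the queueing time differs.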

Matlab Distributed Computing Server

Example 1: Estimating an Integration

quad_submission.m

    ClusterInfo.setQueueName('super');
    ClusterInfo.setNNode(2);
    ClusterInfo.setWallTime('00:30:00');

    job = batch(@quad_fun, 1, {120000000, 0.13, 1.53}, 'Pool', 63, 'profile', 'nucleus_r2016a');
    wait(job, 'finished');
    Results = fetchOutputs(job);

Matlab Distributed Computing Server

* Recall that the serial job needs only 5.7881 s; MDCS is not designed for small jobs.

If we submit the spmd version of the code with MDCS

Running time is 15 s

    job = batch('quad_fun_spmd', 'Pool', 63, 'profile', 'nucleus_r2016a');

Running time is 24 s

Matlab Distributed Computing Server

Example 4: Vessel Extraction

1. Microscopy image (size: 2126*5517) -> binary image: local normalization, salt-and-pepper noise removal, thresholding
2. Binary image -> 243 subsets of the binary image: separate vessel groups based on connectivity
3. Find the central line of each vessel (fast marching method)

Timings noted on the slide: ~6.5 hours, ~21.3 seconds, ~6.6 seconds

Matlab Distributed Computing Server

Example 4: Vessel Extraction (step 3)

vesselExtraction_submission.m

    % get file list
    subImages = dir(['BWimages', '/*.jpg']);

    parfor i = 1 : length(subImages)
        filename = subImages(i).name;
        fastMatching(filename);
    end

The time needed to process each sub-image varies.

The job running on 4 nodes requires the minimum computational time: t = 784 s

Speedup = 28.3x

License limitation of Matlab Distributed Computing Server

Maximum concurrent licenses = 128 workers, shared among all BioHPC users

[Figure: job pending on MDCS licenses]

Solution:
• Single-node jobs – use the local profile and increase the number of workers (2016a)
• Multiple-node jobs – SLURM + GNU parallel
  - ex_04: example_parallel_sbatch.sh

4 nodes with a pool size of 128 threads, t = 699 s

SLURM + GNU parallel

GNU parallel: a shell tool for executing jobs in parallel using one or more computers.

http://www.gnu.org/software/parallel/

A simple way to parallelize is to run 8 jobs on each CPU.

GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.

* Image retrieved from https://www.biostars.org/p/63816/

Usage: sbatch -> GNU parallel -> srun -> base scripts

Submit multiple jobs to multiple nodes with srun & GNU parallel

    #!/bin/bash

    #SBATCH --job-name=parallelExample
    #SBATCH --partition=super
    #SBATCH --nodes=4
    #SBATCH --ntasks=128
    #SBATCH --time=1-00:00:00

    CORES_PER_TASK=1
    INPUTS_COMMAND="ls BWimages"
    TASK_SCRIPT="single.sh"

    module add parallel

    SRUN="srun --exclusive -n1 -N1 -c $CORES_PER_TASK"
    PARALLEL="parallel --delay .2 -j $SLURM_NTASKS --joblog task.log"

    eval $INPUTS_COMMAND | $PARALLEL $SRUN sh $TASK_SCRIPT {}

• INPUTS_COMMAND: generates the input file list
• TASK_SCRIPT: base script

single.sh

    #!/bin/bash

    module add matlab/2016a

    matlab -nodisplay -nodesktop -singleCompThread -r "fastMatching('$1'), exit"

Questions and Comments

For assistance with MATLAB, please contact the BioHPC Group:

Email: [email protected]

Submit a help ticket at http://portal.biohpc.swmed.edu

When did it happen?
What time? Cluster or client? What job ID? How did you run it?

Which MATLAB version?
Is the code compatible with different versions?

Can we look at your scripts and data?
Tell us if you are happy for us to access your scripts/data to help troubleshoot.