cuda matrix-matrix multiplication - sdsu librarymthomas/sp17.605/lectures/cuda...matrix-matrix...

33
COMP 605: Introduction to Parallel Computing Lecture : CUDA Matrix-Matrix Multiplication Mary Thomas Department of Computer Science Computational Science Research Center (CSRC) San Diego State University (SDSU) Posted: 04/25/17 Last Update: 04/25/17

Upload: others

Post on 08-Mar-2021

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Introduction to Parallel ComputingLecture : CUDA Matrix-Matrix Multiplication

Mary Thomas

Department of Computer ScienceComputational Science Research Center (CSRC)

San Diego State University (SDSU)

Posted: 04/25/17Last Update: 04/25/17

Page 2: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 2/33 Mary Thomas

Table of Contents

1 CUDA Matrix MultiplicationMatrix-Matrix Multiplication - CUDA ApproachCUDA MatMul Code

Page 3: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 3/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication (Mat-Mat-Mult)

Page 4: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 4/33 Mary Thomas

CUDA Matrix Multiplication

2D Matrix-Matrix Multiplication (Mat-Mat-Mult)

/* Serial_matrix_mult */for (i = 0; i < n; i++)

for (j = 0; j < n; j++) {C[i][j] = 0.0;for (k = 0; k < n; k++)

C[i][j] = C[i][j] + A[i][k]*B[k][j];printf(... )

}

Where:A is an [mx k] matrixB is a [k x n]C is a matrix with the dimensions [mx n]

Page 5: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 5/33 Mary Thomas

CUDA Matrix Multiplication

2D Matrix-Matrix Multiplication (Mat-Mat-Mult)

Definition: Let A be an [mx k] matrix, and B be a be an [k x n],then C will be a matrix with the dimensions [mx n].

Then AB = bcijc, andcij =

∑kt=1 aitbtj = ai1b1j + ai2b2j + · · ·+ ak1bkj

=

a00 ... a0j ... a0,k−1

...ai0 ... aij ... ai,k−1

...am−1,0 ... am−1,j ... am−1,k−1

b00 ... b0j ... b0,n−1

...bi0 ... bij ... bi,n−1

...bk−1,1 ... bkj ... bn−1,p−1

=

c00 ... c1j ... c1,n−1

...ci0 ... cij ... ci,n−1

...cm−1,0 ... cmj ... cm−1,n−1

c12 = a11b12 + a12b22 + a13b32

Page 6: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 6/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Recall: Defining GPU Threads and Blocks

Looking at Device: Nvidia Tesla C1060

Kernels run on GPU threads

Grid: organized as 2D array of blocks:

Maximum sizes of each dimension:[gridDim.x x gridDim.y x gridDim.z]

= (65, 536 x 65, 536 x 1) blocks

Block: 3D collection of threads

Max threads per block: 512Max thread dimensions: (512, 512, 64)[blockDim.x ∗blockDim.y ∗blockDim.z]MaxThds/Block ≤ 1024

threads composing a thread block must:

execute the same kernelshare data: issued to the same coreWarp: group of 32 threads; min size ofdata processed in SIMD fashion byCUDA multiprocessor.

Source: http://hothardware.com/Articles/NVIDIA-GF100-Architecture-and-Feature-Preview

Page 7: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 7/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Matrix Mult: linear mapping of a 2D matrix in C.

CUDA does not allowrun-time allocation of a 2Dmatrix

C memory mapping isrow −major order.

Index for accessing matrix Min the inner loop:M[ i × width + k ]

Need to linearize the array inrow −major order, into avector which can bedynamic.

1 D array, where Element[row][col] is element [row*width+col]

Thread mapping: intx = threadIdx .x + blockIdx .x ∗ blockDim.x ;

Page 8: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 8/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Mapping Serial Code to Threads

Source: http://www.hpcwire.com/features/Compilers_and_More_

Optimizing_GPU_Kernels.html

Page 9: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 9/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Programming Model

P = M × N is a dot productEach dot product is independent of all the others.

Page 10: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 10/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Matrix Mult: Serial C code (K&H)

Calculating P[i ][j ] = P[i ][j ] +M[i ][k]× N[k][j ]

Page 11: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 11/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Matrix-Multiplication Algorithm for GPU/CUDA Host.

Host uses CudaMalloc to allocate memory on the device globalmemoryspace.

Page 12: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 12/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Cuda Device Memory Model

Host and device have separatememories.

Host can only copy to/fromglobal memory andconstant memory

cudaMalloc()cudaFree()cudaMemcpy()

Page 13: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 13/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Matrix-Multiplication Algorithm for GPU/CUDA Host.

Host code showing cudaMalloc calls.

Page 14: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 14/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 15: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 15/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Current code only uses threadIdnx, so can only use 1 block.

Page 16: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 16/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 17: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 17/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Handling Arbitrary Sized Square Matrices

Page 18: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 18/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 19: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 19/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 20: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 20/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 21: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 21/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 22: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 22/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Matrix-Multiplication Using 2x2 Block Grid

Page 23: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 23/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Matrix-Multiplication - Thread Actions

Page 24: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 24/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Matrix-Multiplication using multiple thread blocks

Page 25: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 25/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Matrix-Multiplication using multiple thread blocks

Page 26: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 26/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 27: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 27/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 28: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 28/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 29: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 29/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 30: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 30/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 31: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 31/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 32: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 32/33 Mary Thomas

CUDA Matrix Multiplication

Matrix-Matrix Multiplication - CUDA Approach

Page 33: CUDA Matrix-Matrix Multiplication - SDSU Librarymthomas/sp17.605/lectures/CUDA...Matrix-Matrix Multiplication - CUDA Approach Current code only uses threadIdnx, so can only use 1 block

COMP 605: Topic Posted: 04/25/17 Last Update: 04/25/17 33/33 Mary Thomas

CUDA Matrix Multiplication

CUDA MatMul Code

CONTENT