Fortran & Link with Library & Brief Explanation of MKL BLAS

Some things you need to know Jongsu Kim

Upload: liam-jongsu-kim

Post on 17-Aug-2015


Page 1: Fortran & Link with Library & Brief Explanation of MKL BLAS

Some things you need to know

Jongsu Kim

Page 2: Fortran & Link with Library & Brief Explanation of MKL BLAS

Fortran

Page 3: Fortran & Link with Library & Brief Explanation of MKL BLAS

Fortran….

• Still using Fortran 77, 90, or 95?
• Fortran 2003 and 2008 are already here, and Fortran 2015 is on the way.
• Some features have been deleted or declared obsolescent.
• We are using Fortran the wrong way.

Page 4: Fortran & Link with Library & Brief Explanation of MKL BLAS

What you shouldn’t use

Labeled DO loops

do 100 ii = istart, ilast, istep
  isum = isum + ii
100 continue

EQUIVALENCE
Specifies that two or more objects in a scoping unit share the same storage units.

character (len=3) :: C(2)
character (len=4) :: A, B
equivalence (A, C(1)), (B, C(2))

[Figure: storage units 1 to 7. A occupies units 1-4, B units 4-7, C(1) units 1-3, C(2) units 4-6.]

COMMON
Blocks of physical storage that can be accessed by any of the scoping units in a program.

COMMON /BLOCKA/ A, B, C(10,30)
COMMON I, J, K

ENTRY
Additional subroutine-like entry points inside a subroutine.

FIXED FORM SOURCE
Fortran 77 style (statements restricted to columns 7 to 72 of an 80-column line).

CHARACTER* form
Replaced with CHARACTER(LEN=?).

NON-BLOCK DO CONSTRUCT
A DO loop whose range doesn't end in a CONTINUE or END DO.

Page 5: Fortran & Link with Library & Brief Explanation of MKL BLAS

What you shouldn’t use

Labeled DO loops
Labels are unnecessary, and it is hard to remember what each number means. Moreover, we have the END DO and CYCLE statements.

EQUIVALENCE
EQUIVALENCE is also error-prone. It is hard to keep track of every storage location that a variable shares with others.

Since COMMON and EQUIVALENCE are discouraged, BLOCK DATA (used to initialize COMMON blocks) should be avoided as well.

COMMON
Sharing lots of variables across the whole program is dangerous and error-prone.

ENTRY
It complicates the program, and modules and subroutines already cover its uses.

NON-BLOCK DO CONSTRUCT
It is hard to see where the DO loop ends, which makes the code hard to maintain.

Page 6: Fortran & Link with Library & Brief Explanation of MKL BLAS

What you might want to use – CYCLE, EXIT

• Avoid the GOTO statement
• Use the CYCLE or EXIT statement
• CYCLE : skip the rest of the current iteration and continue with the next one
• EXIT : leave the loop

do i=1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) exit
  z = cos(x)
enddo

The first 19 iterations run completely. In the 20th iteration, y = sin(x) is executed and then the loop exits.

do i=1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) cycle
  z = cos(x)
enddo

All 100 iterations run, but at i=20, z = cos(x) is not executed.
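To check the iteration counts described above, here is a minimal sketch that counts how often z = cos(x) actually runs in each loop. The program name and the counters n_exit and n_cycle are illustrative additions, not part of the slides.

```fortran
program exit_vs_cycle
  implicit none
  integer :: i, n_exit, n_cycle
  real :: x, y, z

  ! Loop with EXIT: the loop is abandoned at i = 20.
  n_exit = 0
  do i = 1, 100
    x = real(i)
    y = sin(x)
    if (i == 20) exit
    z = cos(x)
    n_exit = n_exit + 1      ! counts executions of z = cos(x)
  end do

  ! Loop with CYCLE: only the rest of iteration i = 20 is skipped.
  n_cycle = 0
  do i = 1, 100
    x = real(i)
    y = sin(x)
    if (i == 20) cycle
    z = cos(x)
    n_cycle = n_cycle + 1
  end do

  print *, 'z computed with EXIT :', n_exit    ! 19
  print *, 'z computed with CYCLE:', n_cycle   ! 99
end program exit_vs_cycle
```

With EXIT the counter stops at 19, while with CYCLE it reaches 99 because only one iteration is cut short.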

Page 7: Fortran & Link with Library & Brief Explanation of MKL BLAS

What you might want to use – CYCLE, EXIT

• Avoid the GOTO statement
• Use the CYCLE or EXIT statement with nested loops
• Constructs (DO, IF, CASE, etc.) may have names

outer: do j=1, 100
  inner: do i=1, 100
    x = real(i)
    y = sin(x)
    if (i > 20) exit outer
    z = cos(x)
  enddo inner
enddo outer

Both loops are exited at i=21 (while j=1).

outer: do j=1, 100
  inner: do i=1, 100
    x = real(i)
    y = sin(x)
    if (i > 20) cycle outer
    z = cos(x)
  enddo inner
enddo outer

At i=21, z = cos(x) and the rest of the inner loop are skipped, and execution continues with the next j.
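The difference between exit outer and cycle outer is easiest to see by counting. The sketch below repeats both nested loops and counts how many times z = cos(x) is reached; the loop names outer1/inner1 and outer2/inner2 and the counters are chosen here for illustration only.

```fortran
program named_loops
  implicit none
  integer :: i, j, n_exit, n_cycle
  real :: x, y, z

  ! EXIT outer1 leaves both loops the first time i exceeds 20.
  n_exit = 0
  outer1: do j = 1, 100
    inner1: do i = 1, 100
      x = real(i)
      y = sin(x)
      if (i > 20) exit outer1
      z = cos(x)
      n_exit = n_exit + 1
    end do inner1
  end do outer1

  ! CYCLE outer2 abandons the inner loop and moves on to the next j.
  n_cycle = 0
  outer2: do j = 1, 100
    inner2: do i = 1, 100
      x = real(i)
      y = sin(x)
      if (i > 20) cycle outer2
      z = cos(x)
      n_cycle = n_cycle + 1
    end do inner2
  end do outer2

  print *, 'z computed with EXIT outer :', n_exit    ! 20
  print *, 'z computed with CYCLE outer:', n_cycle   ! 2000 (100 values of j, 20 each)
end program named_loops
```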

Page 8: Fortran & Link with Library & Brief Explanation of MKL BLAS

What you might want to use – WHERE

real, dimension(4) :: &
  x = [ -1, 0, 1, 2 ], &
  a = [ 5, 6, 7, 8 ]
...
where (x < 0)
  a = -1.
end where
! a : {-1.0, 6.0, 7.0, 8.0}

where (x /= 0)
  a = 1. / a
elsewhere
  a = 0.
end where
! a : {-1.0, 0.0, 1.0/7.0, 1.0/8.0}

Page 9: Fortran & Link with Library & Brief Explanation of MKL BLAS

What you might want to use – ANY

integer, parameter :: n = 100
real, dimension(n,n) :: a, b, c1, c2

c1 = my_matmul(a, b) ! home-grown function
c2 = matmul(a, b)    ! built-in function

if (any(abs(c1 - c2) > 1.e-4)) then
  print *, 'There are significant differences'
endif

• ANY and WHERE remove redundant DO loops
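The slide does not show my_matmul, so the following is only a sketch of what such a home-grown function might look like: a plain triple loop, filled with random test data, and compared against the built-in matmul with ANY. The loop ordering and the random inputs are assumptions made for this example.

```fortran
program compare_matmul
  implicit none
  integer, parameter :: n = 100
  real, dimension(n,n) :: a, b, c1, c2

  ! Random test data (an assumption; the slides do not say how a and b are set).
  call random_number(a)
  call random_number(b)

  c1 = my_matmul(a, b)   ! home-grown function (hypothetical implementation below)
  c2 = matmul(a, b)      ! built-in function

  ! ANY reduces the whole logical array to a single .true./.false. value.
  if (any(abs(c1 - c2) > 1.e-4)) then
    print *, 'There are significant differences'
  else
    print *, 'Results agree within the tolerance'
  end if

contains

  ! Naive triple-loop matrix product, written for illustration only.
  function my_matmul(p, q) result(r)
    real, intent(in) :: p(:,:), q(:,:)
    real :: r(size(p,1), size(q,2))
    integer :: i, j, k
    r = 0.
    do j = 1, size(q,2)
      do k = 1, size(p,2)
        do i = 1, size(p,1)
          r(i,j) = r(i,j) + p(i,k) * q(k,j)
        end do
      end do
    end do
  end function my_matmul

end program compare_matmul
```

Which message is printed depends on single-precision rounding, because the two routines may add the terms in a different order.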

Page 10: Fortran & Link with Library & Brief Explanation of MKL BLAS

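What you might want to use – DO CONCURRENT

• Vectorization
• A simple example of auto-parallelization
• Definition of vectorization : processes one operation on multiple pairs of operands at once

do concurrent (i=1:m)
  call dosomething()   ! must be a PURE procedure
end do

Scalar loop:

DO i=1,1024
  C(i) = A(i) * B(i)
END DO

Vectorized loop (four elements per operation):

DO i=1,1024,4
  C(i:i+3) = A(i:i+3) * B(i:i+3)
END DO

• DO CONCURRENT ALLOWS/REQUESTS vectorization or parallelization; it does not force it. If you want ifort to auto-parallelize the loop, enable the -parallel option.
• No data dependencies between iterations, no EXIT out of the loop, no CYCLE to an outer loop, no RETURN statement.
• Can be used together with OpenMP.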
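As a small self-contained illustration, the element-wise product from the slide can be written with DO CONCURRENT as sketched below. The random test data and the final check against the array expression a*b are additions for this example; whether the loop is actually vectorized or parallelized depends on the compiler and its options.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Each iteration writes only c(i) and reads only a(i) and b(i),
  ! so the iterations are independent and may run in any order.
  do concurrent (i = 1:n)
    c(i) = a(i) * b(i)
  end do

  ! Compare with the equivalent whole-array expression.
  print *, 'max deviation from a*b:', maxval(abs(c - a*b))
end program do_concurrent_demo
```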

Page 11: Fortran & Link with Library & Brief Explanation of MKL BLAS

For More..

• Read the Fortran 2008 standard
  • http://www.j3-fortran.org/doc/year/10/10-007.pdf
• A more recent working draft for Fortran 2015 (still in progress)
  • http://j3-fortran.org/doc/year/15/15-007.pdf
• Easy-to-read documents
  • The new features of Fortran 2008 : ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1828.pdf
  • Modern Programming Languages: Fortran90/95/2003/2008 : https://www.tacc.utexas.edu/documents/13601/162125/fortran_class.pdf

Page 12: Fortran & Link with Library & Brief Explanation of MKL BLAS

Build System (Makefile)

Page 13: Fortran & Link with Library & Brief Explanation of MKL BLAS

Build?

• The process that turns source code into executable files is called the build.
• Compiler : the tool that compiles. Linker : the tool that links.
• ifort, gcc, gfortran, and so on are combined tools that both compile and link.

[Figure: Source Code1.f, Source Code2.f, Source Code3.f (readable) -> Compile -> Source Code1.o, Source Code2.o, Source Code3.o (unreadable) -> Link, together with Libraries (FFTW..) -> a.out]

Page 14: Fortran & Link with Library & Brief Explanation of MKL BLAS

Makefile?

• make does all of the compile and link jobs automatically. A Makefile is a build script.
• make (actually gmake) is one of many such tools, which are called build systems.
• Visual Studio has its own build system, so it doesn’t use a Makefile.

1. Command-line

$ gcc -o hellomake hellomake.c hellofunc.c -I.

2. Simple Makefile (1)

Makefile:

hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.

Command-line:

$ make
or
$ make hellomake

• “hellomake:” : rule name (target)
• “hellomake.c hellofunc.c” : dependencies
• “gcc …” : the actual command
• A plain “make” executes the first rule defined in the Makefile

Page 15: Fortran & Link with Library & Brief Explanation of MKL BLAS

Makefile?

3. Simple Makefile (3)

CC=gcc
CFLAGS=-I.

hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.

Add constants (variables)
• “CC=gcc” : the C compiler
• “CFLAGS” : list of flags to pass to the compilation command
• For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”
• The command line (“$(CC) ...”) must be indented with a tab. This is important!

$ make
or
$ make hellomake

Page 16: Fortran & Link with Library & Brief Explanation of MKL BLAS

Makefile?

4. Simple Makefile (4)

CC=gcc
CFLAGS=-I.
DEPS = hellomake.h

hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.

%.o: %.c $(DEPS)
	$(CC) -c $< $(CFLAGS)

The pattern rule “%.o: %.c” tells make how to compile any .c file into the matching .o file. $@ and $< are special macros (automatic variables) in a Makefile.
• Rule %.o : rule for compilation. Rule hellomake : rule for linking.
• $@ : the name of the file to be made (e.g. hellomake for rule hellomake)
• $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
• $^ : the names of all the prerequisites, with spaces between them
• $* : the stem matched by % in a pattern rule (e.g. hellomake when hellomake.o is built from hellomake.c)

$ make
or
$ make hellomake
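Since the slides say that Fortran uses FC and FFLAGS in place of CC and CFLAGS, the same pattern might look as follows for a Fortran project. This is only a sketch: the file names main.f90 and solver.f90, the program name myprog, and the choice of gfortran are placeholders, not taken from the slides. Recipe lines must start with a tab.

```makefile
# Hypothetical project: main.f90 and solver.f90 are placeholder file names.
FC     = gfortran
FFLAGS = -O2
OBJS   = main.o solver.o

# Link rule: $@ is the target (myprog), $^ is the list of all object files.
myprog: $(OBJS)
	$(FC) -o $@ $^

# Compile rule: build each .o from the .f90 file with the same stem.
%.o: %.f90
	$(FC) $(FFLAGS) -c $< -o $@

# If main.f90 uses a module defined in solver.f90, solver.o must be built first.
main.o: solver.o

# Remove build products.
.PHONY: clean
clean:
	rm -f myprog $(OBJS) *.mod
```

Running make builds myprog, and make clean removes the generated files.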

Page 17: Fortran & Link with Library & Brief Explanation of MKL BLAS

Compiler & Linker Options

FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm

Compiler Options and Linker Options

• -O3 : optimization level (-O1 : code size optimization, -O2 : general optimization (default), -O3 : aggressive optimization)
• -r8 : the default real type becomes double precision (8 bytes = 64 bits per real)
• -openmp : enable OpenMP directives
• -I : specify an include directory. Include files : .h files (declarations)
• -L : specify a library directory. Library files : .so or .a
• -lfftw3 : link with the fftw3 library
• -lm : link with the math library (to use several math intrinsic functions)

Page 18: Fortran & Link with Library & Brief Explanation of MKL BLAS

Compiler & Linker Options

Recommended options

• -heap-arrays [size] : puts automatic arrays and temporary arrays larger than [size] KB on the heap instead of the stack. Same effect as using the ALLOCATE statement.
• -ax[code] : specify the CPU architecture. DGIST, Boolt : AVX; CSE Server (OMP) : SSE4.1; CSE Server (SMP) : SSE4.2
• -O2 : before enabling -O3, compare the results obtained with -O2 and -O3. “Sometimes” -O3 causes different results.
• -parallel : enable auto-parallelized code. Turn it on if you use DO CONCURRENT.
• -free : free-form source (f90 style). ifort automatically compiles .f files as fixed-form Fortran 77 source. If you want to compile a file with the .f suffix as Fortran 90 or higher, enable this option.
• $ man ifort gives us a lot of additional information.

Debug vs Release
• The -g (to use a debugger) and -check (check array bounds and so on) options help reduce errors. However, they add extra code, so they slow the program down, and optimization is turned off automatically.
• If you are sure that you don’t have errors and just want to get results, enable optimization and remove the -g and -check options.

Page 19: Fortran & Link with Library & Brief Explanation of MKL BLAS

MKL BLAS & CG Method

Page 20: Fortran & Link with Library & Brief Explanation of MKL BLAS

Intel MKL (Math Kernel Library) and BLAS

Intel MKL
• A library of optimized math routines for science, engineering, and financial applications.
• Includes basic functions for matrices and vectors.
• You don’t need any installation; just link the library.

BLAS
• Basic Linear Algebra Subprograms
• A set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
• It has one common interface but various implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
• I will use MKL BLAS because it is easy to compile and well documented.
• It is already parallelized. Hence, just turning on an option gives you parallelism without using OpenMP yourself. (MPI parallelism is not implemented.)

I will show how to build the CG method using MKL BLAS, line by line.

Page 30: Fortran & Link with Library & Brief Explanation of MKL BLAS

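Sparse Matrix Format

• Before starting with the BLAS library functions, we need to consider how to construct the matrix in CSR (compressed sparse row) format.

Matrix A:

1 7 0 0
0 2 8 0
5 0 3 9
0 6 0 4

row offsets    : 1 3 5 8 10   (the last entry, 10, indicates the end)
column indices : 1 2 2 3 1 3 4 2 4
values         : 1 7 2 8 5 3 9 6 4

9 entries (nonzero entries)

The arrays are filled row by row: the nonzero values of each row and their column indices are appended in order, and each row offset records the position in the values array where that row starts.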

Page 31: Fortran & Link with Library & Brief Explanation of MKL BLAS

Sparse matrix

• If the A matrix is stored with its zeros, 16 * 8 bytes are required.
• The sparse (CSR) matrix requires 23 * 8 bytes (5 row offsets + 9 column indices + 9 values).
• Inefficient? No. If you have a large A matrix, CSR is SOOOO efficient.

Matrix A:

1 7 0 0
0 2 8 0
5 0 3 9
0 6 0 4

row offsets    : 1 3 5 8 10
column indices : 1 2 2 3 1 3 4 2 4
values         : 1 7 2 8 5 3 9 6 4
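The three CSR arrays can be generated from the dense 4 x 4 example by scanning it row by row. The sketch below does exactly that with one-based indices; the program and variable names (dense, values, col_idx, row_off) are chosen here for illustration.

```fortran
program dense_to_csr
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(dp) :: dense(n,n)
  real(dp), allocatable :: values(:)
  integer, allocatable :: col_idx(:)
  integer :: row_off(n+1)
  integer :: i, j, k, nnz

  ! reshape fills column by column, so list the rows and transpose.
  dense = transpose(reshape([ 1d0, 7d0, 0d0, 0d0, &
                              0d0, 2d0, 8d0, 0d0, &
                              5d0, 0d0, 3d0, 9d0, &
                              0d0, 6d0, 0d0, 4d0 ], [n, n]))

  nnz = count(dense /= 0d0)
  allocate(values(nnz), col_idx(nnz))

  k = 0
  do i = 1, n
    row_off(i) = k + 1            ! position where row i starts in values
    do j = 1, n
      if (dense(i,j) /= 0d0) then
        k = k + 1
        values(k)  = dense(i,j)
        col_idx(k) = j
      end if
    end do
  end do
  row_off(n+1) = nnz + 1          ! one past the last entry: marks the end

  print '(a,*(i3))',   'row offsets    :', row_off   ! 1 3 5 8 10
  print '(a,*(i3))',   'column indices :', col_idx   ! 1 2 2 3 1 3 4 2 4
  print '(a,*(f4.0))', 'values         :', values    ! 1 7 2 8 5 3 9 6 4
end program dense_to_csr
```

For an n x n matrix with nnz nonzeros, the dense form stores n*n numbers while CSR stores 2*nnz + n + 1, which is why CSR only pays off once the matrix is large and mostly zero.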

Page 32: Fortran & Link with Library & Brief Explanation of MKL BLAS

Which BLAS Library Functions Are Required?

• mkl_dcsrgemv : Computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used for the matrix-vector product in the CG method.
  • call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
  • transa : selects y := A*x (transa=‘N’ or ‘n’) or y := A^T*x (transa=‘T’ or ‘t’ or ‘C’ or ‘c’)
  • m : number of rows of A
  • a : values array of A in CSR format
  • ia : row offset array of A in CSR format
  • ja : column indices array of A in CSR format
  • x : input vector x
  • y : output vector (y = A*x)

• dcopy : Copies a vector: copies the array x to the array y.
  • call dcopy(n, x, incx, y, incy)
  • n : number of elements in vectors x and y
  • x : input vector
  • y : output vector
  • incx, incy : strides of x and y (1 for contiguous arrays)
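To make clear what the CSR matrix-vector product computes, here is a hand-written stand-in with the same argument order as the call listed above. It is not the MKL routine: csr_gemv is a hypothetical name, it only handles the non-transposed case, and it uses one-based row offsets and column indices.

```fortran
program csr_matvec_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  ! CSR arrays of the 4 x 4 example matrix (one-based).
  real(dp) :: vals(9)   = [1, 7, 2, 8, 5, 3, 9, 6, 4]
  integer  :: rowoff(5) = [1, 3, 5, 8, 10]
  integer  :: colidx(9) = [1, 2, 2, 3, 1, 3, 4, 2, 4]
  real(dp) :: xin(n) = [1, 1, 1, 1]
  real(dp) :: yout(n)

  call csr_gemv(n, vals, rowoff, colidx, xin, yout)
  print '(a,4f6.1)', 'y = A*x :', yout

contains

  ! y := A*x for a CSR matrix; illustrative stand-in, not the MKL routine.
  subroutine csr_gemv(m, a, ia, ja, x, y)
    integer,  intent(in)  :: m
    real(dp), intent(in)  :: a(:)
    integer,  intent(in)  :: ia(:), ja(:)
    real(dp), intent(in)  :: x(:)
    real(dp), intent(out) :: y(:)
    integer :: i, k
    do i = 1, m
      y(i) = 0.0_dp
      ! Entries ia(i) .. ia(i+1)-1 of a and ja belong to row i.
      do k = ia(i), ia(i+1) - 1
        y(i) = y(i) + a(k) * x(ja(k))
      end do
    end do
  end subroutine csr_gemv

end program csr_matvec_demo
```

With x = (1, 1, 1, 1) the result is the vector of row sums of the example matrix: 8, 10, 17, 10.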

Page 33: Fortran & Link with Library & Brief Explanation of MKL BLAS

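Which BLAS Library Functions Are Required?

• ddot : Computes a vector-vector dot product.
  • It is a function, not a subroutine.
  • res = ddot(n, x, incx, y, incy), or dot(x, y) with the Fortran 95 interface
  • n : number of elements in vectors x and y
  • x, y : input vectors

• daxpy : Computes a vector-scalar product and adds the result to a vector (y := a*x + y). SAXPY stands for Single-precision A·X Plus Y; DAXPY is the double-precision version.
  • call daxpy(n, a, x, incx, y, incy)
  • n : number of elements in vectors x and y
  • a : scalar a
  • x : input vector
  • y : input and output vector

• dnrm2 : Computes the Euclidean norm of a vector.
  • It is a function, not a subroutine.
  • res = dnrm2(n, x, incx), or nrm2(x) with the Fortran 95 interface
  • n : number of elements in vector x
  • x : input vector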
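Putting these routines together, a conjugate gradient loop could be sketched as follows. This is an illustration under several assumptions, not the author's code: the test matrix is a small symmetric positive definite tridiagonal matrix chosen here (the 4 x 4 example from the earlier slides is not symmetric, so CG does not apply to it), the sparse product is a hand-written csr_gemv stand-in rather than mkl_dcsrgemv, and dscal is one more standard BLAS routine that the slides do not list. The program must be linked against a BLAS library (for example -lblas with gfortran, or MKL).

```fortran
program cg_blas_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4, max_iter = 100
  real(dp), parameter :: tol = 1.0e-10_dp
  ! Assumed test matrix (not from the slides): tridiagonal [-1 2 -1],
  ! symmetric positive definite, stored in one-based CSR.
  real(dp) :: a(10)  = [2, -1, -1, 2, -1, -1, 2, -1, -1, 2]
  integer  :: ia(5)  = [1, 3, 6, 9, 11]
  integer  :: ja(10) = [1, 2, 1, 2, 3, 2, 3, 4, 3, 4]
  real(dp) :: b(n) = [1, 0, 0, 1]       ! b = A * (1,1,1,1)
  real(dp) :: x(n), r(n), p(n), ap(n)
  real(dp) :: alpha, beta, rs_old, rs_new
  real(dp), external :: ddot, dnrm2     ! BLAS functions
  integer :: iter

  x = 0.0_dp                            ! initial guess
  call dcopy(n, b, 1, r, 1)             ! r = b - A*0 = b
  call dcopy(n, r, 1, p, 1)             ! p = r
  rs_old = ddot(n, r, 1, r, 1)

  do iter = 1, max_iter
    call csr_gemv(n, a, ia, ja, p, ap)  ! ap = A*p (stand-in for mkl_dcsrgemv)
    alpha = rs_old / ddot(n, p, 1, ap, 1)
    call daxpy(n,  alpha, p,  1, x, 1)  ! x = x + alpha*p
    call daxpy(n, -alpha, ap, 1, r, 1)  ! r = r - alpha*ap
    if (dnrm2(n, r, 1) < tol) exit      ! converged
    rs_new = ddot(n, r, 1, r, 1)
    beta = rs_new / rs_old
    call dscal(n, beta, p, 1)           ! p = beta*p
    call daxpy(n, 1.0_dp, r, 1, p, 1)   ! p = r + beta*p
    rs_old = rs_new
  end do

  print '(a,i0)', 'loop index on exit: ', iter
  print '(a,4f10.6)', 'x = ', x         ! should be close to (1, 1, 1, 1)

contains

  ! y := A*x for a one-based CSR matrix; illustrative stand-in only.
  subroutine csr_gemv(m, vals, rowoff, colidx, v, av)
    integer,  intent(in)  :: m
    real(dp), intent(in)  :: vals(:), v(:)
    integer,  intent(in)  :: rowoff(:), colidx(:)
    real(dp), intent(out) :: av(:)
    integer :: i, k
    do i = 1, m
      av(i) = 0.0_dp
      do k = rowoff(i), rowoff(i+1) - 1
        av(i) = av(i) + vals(k) * v(colidx(k))
      end do
    end do
  end subroutine csr_gemv

end program cg_blas_sketch
```

When linking against MKL, the csr_gemv call could be replaced by call mkl_dcsrgemv('N', n, a, ia, ja, p, ap), following the argument list shown in the slides.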