Fortran, linking with libraries, and a brief explanation of MKL BLAS

Some things you need to know

Jongsu Kim

Fortran

Fortran….

• Still Fortran 77, 90, or 95?
• Fortran 2003 and 2008 are already here, and Fortran 2015 is coming.
• Some features will be deleted or declared obsolescent.
• We are using Fortran the wrong way.

What you shouldn’t use

Labeled Do Loops

    do 100 ii=istart,ilast,istep
        isum = isum + ii
    100 continue

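EQUIVALENCE
Specifies the sharing of storage units by two or more objects in a scoping unit.

    character (len=3) :: C(2)
    character (len=4) :: A,B
    equivalence (A,C(1)), (B,C(2))

[Figure: character storage units 1 to 7. C(1) occupies units 1-3 and C(2) units 4-6; A occupies units 1-4 and B units 4-7.]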

COMMON
Blocks of physical storage that can be accessed by any of the scoping units in a program.

    COMMON /BLOCKA/ A,B,C(10,30)
    COMMON I, J, K

ENTRY
Subroutine-like entry points inside a subroutine.

FIXED FORM SOURCE
Fortran 77 style (80-column restriction).

CHARACTER* form
Replaced with CHARACTER(LEN=?).

NON-BLOCK DO CONSTRUCT
The DO range doesn't end in a CONTINUE or END DO.

What you shouldn’t use

Labeled Do Loops
The label is unnecessary, and it is hard to remember what each number means. Moreover, we have the END DO and CYCLE statements.

EQUIVALENCE
EQUIVALENCE is also error-prone. It is hard to keep track of every storage location a variable shares.

Since COMMON and EQUIVALENCE are discouraged, BLOCK DATA should be avoided as well.

COMMON
Sharing lots of variables across a program is dangerous and error-prone.

ENTRY
It complicates the program, and modules and subroutines already cover its use.

NON-BLOCK DO CONSTRUCT
It is hard to see where the DO loop ends, which makes the code hard to maintain.
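As a rough illustration of the alternatives, the labeled loop from the earlier slide can be written with END DO, and shared variables can live in a module instead of a COMMON block. This is only a sketch: the module name shared_data and the loop bounds are made up for the example.

```fortran
module shared_data                 ! hypothetical module; takes the place of a COMMON block
  implicit none
  integer :: istart = 1, ilast = 9, istep = 2   ! illustrative values
end module shared_data

program modern_loop
  use shared_data
  implicit none
  integer :: ii, isum

  isum = 0
  do ii = istart, ilast, istep     ! no statement label needed
     isum = isum + ii
  end do                           ! the loop end is explicit

  print *, 'isum =', isum          ! 1 + 3 + 5 + 7 + 9 = 25
  if (isum /= 25) error stop 'unexpected sum'
end program modern_loop
```

Any program unit that needs istart, ilast, or istep says so with a USE statement, so the sharing is visible at the point of use.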

What you might want to use – CYCLE , EXIT

• Avoid the GOTO statement.
• Use the CYCLE or EXIT statement instead.
• CYCLE : skip to the end of the current loop iteration.
• EXIT : leave the loop.

    do i=1, 100
        x = real(i)
        y = sin(x)
        if (i == 20) exit
        z = cos(x)
    enddo

The first 19 iterations complete. At the 20th iteration, y = sin(x) is executed and then the loop exits.

    do i=1, 100
        x = real(i)
        y = sin(x)
        if (i == 20) cycle
        z = cos(x)
    enddo

All 100 iterations run, but at i=20, z = cos(x) is not executed.
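The two loops above can be checked with a small counting program. This is a sketch for illustration: the counters n_exit and n_cycle are not part of the slides; they record how many times the statement after the IF is reached.

```fortran
program cycle_exit_demo
  implicit none
  integer :: i, n_exit, n_cycle
  real    :: x, y, z

  n_exit = 0
  do i = 1, 100
     x = real(i)
     y = sin(x)
     if (i == 20) exit          ! leave the loop entirely
     z = cos(x)
     n_exit = n_exit + 1        ! reached only for i = 1..19
  end do

  n_cycle = 0
  do i = 1, 100
     x = real(i)
     y = sin(x)
     if (i == 20) cycle         ! skip the rest of this iteration only
     z = cos(x)
     n_cycle = n_cycle + 1      ! reached for every i except 20
  end do

  print *, 'z computed with EXIT :', n_exit    ! 19
  print *, 'z computed with CYCLE:', n_cycle   ! 99
  if (n_exit /= 19 .or. n_cycle /= 99) error stop 'unexpected counts'
end program cycle_exit_demo
```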

What you might want to use – CYCLE , EXIT

• Avoid the GOTO statement.
• Use the CYCLE or EXIT statement with nested loops.
• Constructs (DO, IF, CASE, etc.) may have names.

    outer: do j=1, 100
        inner: do i=1, 100
            x = real(i)
            y = sin(x)
            if (i > 20) exit outer
            z = cos(x)
        enddo inner
    enddo outer

Both loops are exited at i=21.

    outer: do j=1, 100
        inner: do i=1, 100
            x = real(i)
            y = sin(x)
            if (i > 20) cycle outer
            z = cos(x)
        enddo inner
    enddo outer

At i=21, z = cos(x) is skipped and the outer loop moves on to the next j.
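A counting version of the two nested loops makes the difference concrete. This is a sketch: the counters and the construct names outer1/inner1 and outer2/inner2 are chosen for the example, and the sin/cos work is left out so that only the control flow remains.

```fortran
program named_loops_demo
  implicit none
  integer :: i, j, n_exit, n_cycle

  n_exit = 0
  outer1: do j = 1, 100
     inner1: do i = 1, 100
        if (i > 20) exit outer1     ! leaves both loops at i = 21, j = 1
        n_exit = n_exit + 1
     end do inner1
  end do outer1

  n_cycle = 0
  outer2: do j = 1, 100
     inner2: do i = 1, 100
        if (i > 20) cycle outer2    ! abandons the inner loop, continues with the next j
        n_cycle = n_cycle + 1
     end do inner2
  end do outer2

  print *, 'EXIT outer :', n_exit    ! 20
  print *, 'CYCLE outer:', n_cycle   ! 100 * 20 = 2000
  if (n_exit /= 20 .or. n_cycle /= 2000) error stop 'unexpected counts'
end program named_loops_demo
```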

What you might want to use – WHERE

    real, dimension(4) :: &
        x = [ -1, 0, 1, 2 ], &
        a = [ 5, 6, 7, 8 ]
    ...
    where (x < 0)
        a = -1.
    end where

a : {-1.0, 6.0, 7.0, 8.0}

    where (x /= 0)
        a = 1. / a
    elsewhere
        a = 0.
    end where

a : {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
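The WHERE example can be run as a complete program. The sketch below uses the same x and a as the slide and only adds the program wrapper, the prints, and a check of the final values.

```fortran
program where_demo
  implicit none
  real, dimension(4) :: x = [-1., 0., 1., 2.]
  real, dimension(4) :: a = [ 5., 6., 7., 8.]

  where (x < 0)
     a = -1.                    ! only the element where x < 0 changes
  end where
  print *, a                    ! -1, 6, 7, 8

  where (x /= 0)
     a = 1. / a                 ! applied where x is nonzero
  elsewhere
     a = 0.                     ! applied where x is zero
  end where
  print *, a                    ! -1, 0, 1/7, 1/8

  if (any(abs(a - [-1., 0., 1./7., 1./8.]) > 1.e-6)) error stop 'unexpected WHERE result'
end program where_demo
```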

What you might want to use – ANY

    integer, parameter :: n = 100
    real, dimension(n,n) :: a, b, c1, c2

    c1 = my_matmul(a, b) ! home-grown function
    c2 = matmul(a, b)    ! built-in function

    if (any(abs(c1 - c2) > 1.e-4)) then
        print *, 'There are significant differences'
    endif

• ANY and WHERE remove redundant DO loops.
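A self-contained version of the ANY comparison might look like the following. The slide does not show my_matmul, so the triple loop here is only one possible home-grown implementation, and double precision is an assumption made to keep round-off well below the 1.e-4 tolerance.

```fortran
program any_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumption: double precision
  integer, parameter :: n = 100
  real(dp), dimension(n,n) :: a, b, c1, c2

  call random_number(a)
  call random_number(b)

  c1 = my_matmul(a, b)          ! home-grown function (hypothetical implementation below)
  c2 = matmul(a, b)             ! built-in function

  ! One ANY call replaces a double loop over all elements.
  if (any(abs(c1 - c2) > 1.e-4_dp)) then
     print *, 'There are significant differences'
     error stop
  else
     print *, 'Results agree'
  end if

contains

  function my_matmul(p, q) result(r)
    real(dp), intent(in) :: p(:,:), q(:,:)
    real(dp) :: r(size(p,1), size(q,2))
    integer :: i, j, k
    r = 0.0_dp
    do j = 1, size(q,2)
       do k = 1, size(p,2)
          do i = 1, size(p,1)   ! innermost index runs down a column (contiguous in memory)
             r(i,j) = r(i,j) + p(i,k) * q(k,j)
          end do
       end do
    end do
  end function my_matmul

end program any_demo
```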

What you might want to use – DO CONCURRENT

• Vectorization
• A simple example of auto-parallelization
• Definition of vectorization : process one operation on multiple pairs of operands at once

    do concurrent (i=1:m)
        call dosomething()
    end do

(A procedure called inside DO CONCURRENT must be PURE.)

Scalar loop:

    DO i=1,1024
        C(i) = A(i) * B(i)
    END DO

Vectorized equivalent, four elements per step:

    DO i=1,1024,4
        C(i:i+3) = A(i:i+3) * B(i:i+3)
    END DO

• DO CONCURRENT allows (requests) vectorization; it does not force it. If you need it, enable the -parallel option.
• No data dependencies, no EXIT statement, no CYCLE to an outer loop, no RETURN statement.
• It can be used together with OpenMP.
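The element-wise product from the slide can be written with DO CONCURRENT as in the sketch below. The random test data and the final comparison against the array expression a * b are additions for the example; whether the loop is actually vectorized or parallelized depends on the compiler and its options.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real    :: a(n), b(n), c(n)
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Each iteration touches only its own element, so there are no data
  ! dependencies and the iterations may run in any order.
  do concurrent (i = 1:n)
     c(i) = a(i) * b(i)
  end do

  if (any(abs(c - a * b) > 1.e-6)) error stop 'DO CONCURRENT result differs'
  print *, 'ok'
end program do_concurrent_demo
```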

For More..

• Read the Fortran 2008 standard
  • http://www.j3-fortran.org/doc/year/10/10-007.pdf
• A more recent document for Fortran 2015 (work in progress)
  • http://j3-fortran.org/doc/year/15/15-007.pdf
• Easy-to-read documents
  • The new features of Fortran 2008 : ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1828.pdf
  • Modern Programming Languages: Fortran90/95/2003/2008 : https://www.tacc.utexas.edu/documents/13601/162125/fortran_class.pdf


Build System (MakeFile)

Build?
• The process of turning source code into an executable file is called a build.
• Compiler : the tool that compiles. Linker : the tool that links.
• ifort, gcc, gfortran, and so on are combined tools that both compile and link.

[Figure: Source Code1.f, Source Code2.f, and Source Code3.f (readable) are compiled into Source Code1.o, Source Code2.o, and Source Code3.o (unreadable), which are then linked with libraries (FFTW..) into a.out.]

Makefile?

• make does all of the compile and link jobs automatically. A Makefile is a build script.
• make (actually gmake) is one of many such tools, which are called build systems.
• Visual Studio has its own build system, so it does not use a Makefile.

1. Command-line

$ gcc -o hellomake hellomake.c hellofunc.c -I.

2. Simple Makefile (1)

hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.

• “hellomake:” : rule name (the target)
• “hellomake.c hellofunc.c” : dependencies
• “gcc …” : the actual command
• Running plain “make” executes the first rule defined in the Makefile.

Command-line with the Makefile:

$ make
or
$ make hellomake

Makefile?

CC=gcc
CFLAGS=-I.

hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.

Add constants
• “CC=gcc” : the C compiler
• “CFLAGS” : list of flags to pass to the compilation command
• For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”.
• The command line (“$(CC) …”) must be indented with a tab!


Makefile?

CC=gcc
CFLAGS=-I.
DEPS = hellomake.h

hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.

%.o: %.c $(DEPS)
	$(CC) -c $< $(CFLAGS)

The pattern rule %.o builds each .o file from the .c file of the same name. $@, $<, $^, and $* are special macros in a Makefile.
• Rule %.o : rule for compilation. Rule hellomake : rule for linking.
• $@ : the name of the file to be made (e.g. hellomake for rule hellomake)
• $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
• $^ : the names of all the prerequisites, with spaces between them
• $* : the stem that % matched in a pattern rule (hellomake when hellomake.o is built from hellomake.c)


Compiler & Linker Options

FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm

Compiler Options and Linker Options

• -O3 : optimization level (O1 : code size optimization, O2 : general optimization (default), O3 : aggressive optimization)
• -r8 : the default real type is double precision (8 bytes = 64 bits)
• -openmp : enable OpenMP directives
• -I : specify an include directory. Include files : .h files (declarations)
• -L : specify a library directory. Library files : .so or .a
• -lfftw3 : link with the fftw3 library
• -lm : link with the math library (needed for several math intrinsic functions)

Compiler & Linker Options

Recommended options

• -heap-arrays [numbers] : puts automatic arrays and temporary arrays larger than [numbers] KB on the heap instead of the stack. The effect is similar to using the ALLOCATE statement.
• -ax[code] : specify the CPU instruction set to target. DGIST, Boolt : AVX, CSE Server(OMP) : SSE4.1, CSE Server(SMP) : SSE4.2
• -O2 : before enabling -O3, compare the results obtained with -O2 and -O3. “Sometimes” -O3 gives different results.
• -parallel : enable auto-parallelization. Turn it on if you use DO CONCURRENT.
• -free : free-form source (f90 style). ifort automatically compiles .f files as Fortran 77 (fixed form). If you want to compile a .f file as Fortran 90 or higher, enable this option.
• $ man ifort gives a lot of additional information.

Debug vs Release
• The -g (to use a debugger) and -check (check array bounds and so on) options help reduce errors. However, they add extra code, so the program runs slower, and optimization is turned off automatically.
• If you are sure that you have no errors and just want results, enable optimization and remove the -g and -check options.

MKL BLAS & CG Method

Intel MKL(Math Kernel Library) and BLAS

Intel MKL

• A library of optimized math routines for science, engineering, and financial applications.
• Basic matrix and vector functions are included.
• You don’t need any installation; just add the library.

BLAS
• Basic Linear Algebra Subprograms
• A set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
• It has one interface but various implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
• I will use MKL BLAS because it is easy to compile and well documented.
• It is already parallelized. Hence, just turning on an option gives parallelism without using OpenMP. (MPI parallelism is not implemented.)

I will show how to build the CG method using MKL BLAS, line by line.

Sparse Matrix Format

• Before starting with the BLAS library functions, we need to consider how to store the matrix.

    1 7 0 0
    0 2 8 0
    5 0 3 9
    0 6 0 4

9 entries (nonzero entries)

Walking through the matrix row by row fills three arrays (one-based indexing):

    row offsets    : 1 3 5 8 10
    column indices : 1 2 2 3 1 3 4 2 4
    values         : 1 7 2 8 5 3 9 6 4

• values : the nonzero entries, row by row
• column indices : the column of each entry in values
• row offsets : the position in values where each row starts. The last entry (10) indicates the end.

Sparse matrix

• If the A matrix is stored with its zeros, 16 * 8 bytes are required.
• The sparse matrix in CSR format requires 23 * 8 bytes (9 values + 9 column indices + 5 row offsets).
• Inefficient? No. If you have a large A matrix, CSR is far more efficient.

[Figure: the same 4x4 matrix with its CSR arrays: row offsets (1 3 5 8 10), column indices (1 2 2 3 1 3 4 2 4), values (1 7 2 8 5 3 9 6 4).]
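To see how the three arrays are used, the sketch below stores the 4x4 example in CSR form and multiplies it by a vector of ones with a hand-written loop. The array names row_offsets, col_indices, and values are chosen for the example, and the loop is a plain Fortran stand-in, not the MKL routine. The result is checked against a dense matmul.

```fortran
program csr_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4, nnz = 9

  ! CSR arrays of the example matrix (one-based indexing)
  integer  :: row_offsets(n+1) = [1, 3, 5, 8, 10]
  integer  :: col_indices(nnz) = [1, 2, 2, 3, 1, 3, 4, 2, 4]
  real(dp) :: values(nnz)      = real([1, 7, 2, 8, 5, 3, 9, 6, 4], dp)

  real(dp) :: dense(n,n), x(n), y(n)
  integer  :: i, k

  ! Dense copy for comparison; the list is written row by row, hence the transpose.
  dense = transpose(reshape(real([1,7,0,0, 0,2,8,0, 5,0,3,9, 0,6,0,4], dp), [n, n]))

  x = 1.0_dp
  y = 0.0_dp
  do i = 1, n
     ! Entries of row i sit at positions row_offsets(i) .. row_offsets(i+1)-1.
     do k = row_offsets(i), row_offsets(i+1) - 1
        y(i) = y(i) + values(k) * x(col_indices(k))
     end do
  end do

  if (any(abs(y - matmul(dense, x)) > 1.0e-12_dp)) error stop 'CSR product differs from dense product'
  print *, y     ! the row sums: 8, 10, 17, 10
end program csr_demo
```

Only the 9 stored entries are visited, which is why the format pays off once most of the matrix is zero.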

Which BLAS library functions are required?

• mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used for the matrix-vector product in the CG iteration.
  • call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
  • transa : selects y = A*x (transa = 'N' or 'n') or y = A^T*x (transa = 'T', 't', 'C', or 'c')
  • m : # of rows of A
  • a : values array of A in CSR format
  • ia : row offset array of A in CSR format
  • ja : column indices array of A in CSR format
  • x : x vector
  • y : output (y = A*x or y = A^T*x)

• dcopy : copies vector x to vector y.
  • call dcopy(n, x, incx, y, incy)
  • n : # of elements in vectors x and y
  • x : input vector
  • y : output vector
  • incx, incy : strides between elements of x and y (1 for contiguous vectors)

• ddot : computes a vector-vector dot product.
  • It is a function, not a subroutine.
  • res = ddot(n, x, incx, y, incy) (Fortran 95 interface: dot(x, y))
  • x, y : vectors

• daxpy : computes a vector-scalar product and adds the result to a vector (y = a*x + y). The name follows SAXPY, Single-precision A·X Plus Y; DAXPY is the double-precision version.
  • call daxpy(n, a, x, incx, y, incy)
  • n : # of elements in vectors x and y
  • a : scalar a
  • x : input vector
  • y : input and output vector

• dnrm2 : computes the Euclidean norm of a vector.
  • It is a function, not a subroutine.
  • res = dnrm2(n, x, incx) (Fortran 95 interface: nrm2(x))
  • n : # of elements in vector x
  • x : input vector
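To show where each routine fits, here is a minimal CG sketch. It is not the MKL version: to stay runnable without linking any library, it uses Fortran intrinsics and a hand-written CSR product, and the comments name the BLAS routine that would take each line's place. The 3x3 symmetric positive definite matrix, the right-hand side, and the tolerance are made up for the example.

```fortran
program cg_sketch
  implicit none
  integer, parameter  :: dp = kind(1.0d0)
  integer, parameter  :: n = 3, nnz = 7, max_iter = 100
  real(dp), parameter :: tol = 1.0e-10_dp

  ! Hypothetical SPD test matrix in one-based CSR:
  !   4 1 0
  !   1 3 1
  !   0 1 2
  integer  :: ia(n+1) = [1, 3, 6, 8]
  integer  :: ja(nnz) = [1, 2, 1, 2, 3, 2, 3]
  real(dp) :: a(nnz)  = real([4, 1, 1, 3, 1, 1, 2], dp)
  real(dp) :: b(n)    = real([5, 5, 3], dp)   ! chosen so that the solution is (1, 1, 1)

  real(dp) :: x(n), r(n), p(n), ap(n)
  real(dp) :: alpha, rs_old, rs_new
  integer  :: k

  x = 0.0_dp
  r = b                                   ! r = b - A*x with x = 0    (dcopy)
  p = r                                   !                           (dcopy)
  rs_old = dot_product(r, r)              !                           (ddot)

  do k = 1, max_iter
     call csr_matvec(a, ia, ja, p, ap)    ! ap = A*p                  (mkl_dcsrgemv)
     alpha = rs_old / dot_product(p, ap)  !                           (ddot)
     x = x + alpha * p                    !                           (daxpy)
     r = r - alpha * ap                   !                           (daxpy)
     rs_new = dot_product(r, r)           !                           (ddot)
     if (sqrt(rs_new) < tol) exit         ! norm of the residual      (dnrm2)
     p = r + (rs_new / rs_old) * p        ! new search direction
     rs_old = rs_new
  end do

  if (any(abs(x - 1.0_dp) > 1.0e-8_dp)) error stop 'CG did not converge'
  print *, 'x =', x

contains

  ! Plain Fortran stand-in for the sparse matrix-vector product y = A*v.
  subroutine csr_matvec(val, row_ptr, col_idx, v, y)
    real(dp), intent(in)  :: val(:), v(:)
    integer,  intent(in)  :: row_ptr(:), col_idx(:)
    real(dp), intent(out) :: y(:)
    integer :: i, j
    y = 0.0_dp
    do i = 1, size(y)
       do j = row_ptr(i), row_ptr(i+1) - 1
          y(i) = y(i) + val(j) * v(col_idx(j))
       end do
    end do
  end subroutine csr_matvec

end program cg_sketch
```

CG assumes a symmetric positive definite matrix, which is why the sketch does not reuse the 4x4 example from the CSR slides.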
