
Page 1 (source: nkl.cc.u-tokyo.ac.jp/13e/03-MPI/intro-pFEM-C.pdf)

Introduction to Parallel FEM in C
Parallel Data Structure

Kengo Nakajima
Information Technology Center

Programming for Parallel Computing (616-2057) Seminar on Advanced Computing (616-4009)

Page 2

Parallel Computing

• Faster, Larger & More Complicated
• Scalability
  – Solving an Nx-scale problem with Nx computational resources in the same computation time
    • for large-scale problems: Weak Scaling
    • e.g., a CG solver needs more iterations for larger problems
  – Solving a fixed-size problem with Nx computational resources in 1/N of the computation time
    • for faster computation: Strong Scaling

Page 3

What is Parallel Computing? (1/2)

• to solve larger problems faster

[Figure: homogeneous vs. heterogeneous porous media (Lawrence Livermore National Laboratory); very fine meshes are required for simulations of heterogeneous fields.]

Page 4

What is Parallel Computing? (2/2)

• On a PC with 1 GB of memory, about 10^6 meshes are the limit for FEM
  – Southwest Japan (1,000 km x 1,000 km x 100 km) at 1 km mesh resolution -> 10^8 meshes
• Large Data -> Domain Decomposition -> Local Operations
• Inter-Domain Communication for Global Operations

[Figure: large-scale data is partitioned into local data sets that exchange information through communication.]

Page 5

What is Communication?

• Parallel Computing -> Local Operations
• Communications are required in Global Operations for Consistency

Page 6

Operations in Parallel FEM
SPMD: Single-Program Multiple-Data

Large-scale data is partitioned into distributed local data sets. The FEM code assembles the coefficient matrix for each local data set; this part can be completely local, the same as serial operations. Global operations and communications happen only in the linear solvers: dot products, matrix-vector multiplication, preconditioning.

[Figure: each process runs Local Data -> FEM Matrix -> Linear Solver, with MPI communication between the solvers.]

Page 7

Parallel FEM Procedures

• Design of the “Local Data Structure” is important
  – for the SPMD-type operations on the previous page
    • Matrix Generation
    • Preconditioned Iterative Solvers for Linear Equations

Page 8

Bi-Linear Square Elements
Values are defined at each node.

[Figure: a mesh of bi-linear square elements is divided into two domains in a “node-based” manner, so that the numbers of nodes (vertices) are balanced.]

Local information is not enough for matrix assembly: information on the overlapped elements and their connected nodes is required to assemble the matrix at boundary nodes.

Page 9

Local Data of Parallel FEM

• Node-based partitioning for IC/ILU-type preconditioning methods
• Local data includes information on:
  – Nodes originally assigned to the partition/PE
  – Elements which include those nodes: element-based operations (matrix assembly) are then possible, as in fluid/structure subsystems
  – All nodes which form those elements but lie outside the partition
• Nodes are classified into 3 categories from the viewpoint of message passing:
  – Internal nodes: nodes originally assigned to the partition
  – External nodes: nodes in the overlapped elements but outside the partition
  – Boundary nodes: internal nodes that are external nodes of other partitions
• Communication tables between partitions
• NO global information is required except partition-to-partition connectivity

Page 10

Node-based Partitioning
internal nodes - elements - external nodes

[Figure: a 5x5 mesh (global nodes 1-25) partitioned into four domains PE#0-PE#3; each PE stores its internal nodes plus the external nodes taken from neighboring partitions, in local numbering.]

Page 11

Node-based Partitioning
internal nodes - elements - external nodes

[Figure: for one partition, three layers are shown: the partitioned nodes themselves (internal nodes, 内点), the elements which include internal nodes (内点を含む要素), and the external nodes (外点) included in those elements, lying in the overlapped region among partitions.]

Information on the external nodes is required for completely local element-based operations on each processor.

Page 12

Node-based Partitioning
internal nodes - elements - external nodes

[Figure: the same three layers as on the previous page: internal nodes, the elements which include them, and the external nodes in the overlapped region.]

Information on the external nodes is required for completely local element-based operations on each processor. We do not need communication during matrix assembly!

Page 13

Parallel Computing in FEM
SPMD: Single-Program Multiple-Data

[Figure: four processes, each running Local Data -> FEM code -> Linear Solvers, connected by MPI.]

Page 14

Parallel Computing in FEM
SPMD: Single-Program Multiple-Data

[Figure: the same SPMD diagram as on the previous page.]

Page 15

Parallel Computing in FEM
SPMD: Single-Program Multiple-Data

[Figure: the SPMD diagram again, with each process's Local Data shown as the local node set of its own partition (PE#0-PE#3).]

Page 16

Parallel Computing in FEM
SPMD: Single-Program Multiple-Data

[Figure: the same diagram as on the previous page.]

Page 17

Parallel Computing in FEM
SPMD: Single-Program Multiple-Data

[Figure: the same SPMD diagram as on Page 13.]

Page 18

What is Communication?

• to get the information of “external nodes” from other partitions (their local data)
• “Communication tables” contain this information

Page 19

1D FEM: 12 nodes / 11 elements / 3 domains

[Figure: a 1D mesh with global node numbers 0-11 and element numbers 0-10.]

Page 20

1D FEM: 12 nodes / 11 elements / 3 domains

[Figure: the same 1D mesh (nodes 0-11, elements 0-10), drawn before partitioning.]

Page 21

The number of “internal nodes” should be balanced.

[Figure: the 1D mesh divided into three domains #0, #1, #2, each with four internal nodes.]

Page 22

Matrices are incomplete!

[Figure: with only internal nodes (#0: 0-3, #1: 4-7, #2: 8-11), each domain can assemble only part of its rows of the matrix.]

Page 23

Connected Elements + External Nodes

[Figure: each domain is extended by the connected elements and the external nodes: #0 holds nodes 0-4, #1 holds nodes 3-8, #2 holds nodes 7-11.]

Page 24

1D FEM: 12 nodes / 11 elements / 3 domains

[Figure: the extended domains against the global mesh: #0 nodes 0-4 / elements 0-3, #1 nodes 3-8 / elements 3-7, #2 nodes 7-11 / elements 7-10.]

Page 25

1D FEM: 12 nodes / 11 elements / 3 domains

[Figure: global numbering (nodes 0-11, elements 0-10) together with the three domains' node and element sets: #0 nodes 0-4 / elements 0-3, #1 nodes 3-8 / elements 3-7, #2 nodes 7-11 / elements 7-10.]

Page 26

Local Numbering for SPMD

If the internal nodes are numbered 1-N (0-(N-1)), the same operations as in a serial program can be applied. How should the external nodes be numbered?

[Figure: each domain's internal nodes renumbered locally as 0 1 2 3, with the external nodes still marked “?”.]

Page 27: Introduction to Parallel FEM in C Parallel Data …nkl.cc.u-tokyo.ac.jp/13e/03-MPI/intro-pFEM-C.pdfIntroduction to Parallel FEM in C Parallel Data Structure Kengo Nakajima Information

SPMD

PE #0

Program

Data #0

PE #1

Program

Data #1

PE #2

Program

Data #2

PE #M-1

Program

Data #M-1

mpirun -np M <Program>

PE: Processing ElementProcessor, Domain, Process

Each process does same operation for different dataLarge-scale data is decomposed, and each part is computed by each processIt is ideal that parallel program is not different from serial one except communication.

Intro pFEM 27

Page 28

Local Numbering for SPMD

Numbering of external nodes: N+1, N+2, ... (N, N+1, ... in 0-based numbering):

  0 1 2 3 4
  4 0 1 2 3
  4 0 1 2 3 5

(internal nodes 0-3 in each domain; external nodes become 4 and, for the middle domain, 5)

Page 29: Introduction to Parallel FEM in C Parallel Data …nkl.cc.u-tokyo.ac.jp/13e/03-MPI/intro-pFEM-C.pdfIntroduction to Parallel FEM in C Parallel Data Structure Kengo Nakajima Information

29

Finite Element Procedures• Initialization

– Control Data– Node, Connectivity of Elements (N: Node#, NE: Elem#)– Initialization of Arrays (Global/Element Matrices)– Element-Global Matrix Mapping (Index, Item)

• Generation of Matrix– Element-by-Element Operations (do icel= 1, NE)

• Element matrices• Accumulation to global matrix

– Boundary Conditions• Linear Solver

– Conjugate Gradient Method

Intro-pFEM

Page 30

Preconditioned CG Solver

Compute r(0)= b - [A]x(0)
for i= 1, 2, ...
  solve [M]z(i-1)= r(i-1)
  ρ(i-1)= r(i-1) z(i-1)
  if i=1
    p(1)= z(0)
  else
    β(i-1)= ρ(i-1)/ρ(i-2)
    p(i)= z(i-1) + β(i-1) p(i-1)
  endif
  q(i)= [A]p(i)
  α(i)= ρ(i-1)/p(i)q(i)
  x(i)= x(i-1) + α(i)p(i)
  r(i)= r(i-1) - α(i)q(i)
  check convergence |r|
end

[Figure: the N x N preconditioning matrix [M] is diagonal, [M] = diag(D1, D2, ..., DN).]

Page 31

Preconditioning, DAXPY
Local operations on internal points only: parallel processing is possible.

/*
//-- {x}= {x} + ALPHA*{p}    DAXPY: double a{x} plus {y}
//   {r}= {r} - ALPHA*{q}
*/
for(i=0;i<N;i++){
  U[i]    += Alpha * W[P][i];
  W[R][i] -= Alpha * W[Q][i];
}

/*
//-- {z}= [Minv]{r}
*/
for(i=0;i<N;i++){
  W[Z][i] = W[DD][i] * W[R][i];
}

Page 32

Dot Products
Global summation is needed: communication?

/*
//-- ALPHA= RHO / {p}{q}
*/
C1 = 0.0;
for(i=0;i<N;i++){
  C1 += W[P][i] * W[Q][i];
}
Alpha = Rho / C1;

Page 33

Matrix-Vector Products
Values at external points: P-to-P communication.

/*
//-- {q}= [A]{p}
*/
for(i=0;i<N;i++){
  W[Q][i] = Diag[i] * W[P][i];
  for(j=Index[i];j<Index[i+1];j++){
    W[Q][i] += AMat[j]*W[P][Item[j]];
  }
}

[Figure: local numbering 4 0 1 2 3 5 (internal nodes 0-3 plus external nodes 4 and 5).]

Page 34

Mat-Vec Products: Local Operations Possible

[Figure: the global 12x12 matrix-vector product, with rows grouped by domain.]

Page 35

Mat-Vec Products: Local Operations Possible

[Figure: the same product as on the previous page.]

Page 36

Mat-Vec Products: Local Operations Possible

[Figure: the product split into the three domains: each domain's local block multiplied by its local vector (local nodes 0-3).]

Page 37

Mat-Vec Products: Local Operation on #0

[Figure: domain #0 computes its local product using internal nodes 0-3 and external node 4 (local numbering 0 1 2 3 4).]

Page 38

Mat-Vec Products: Local Operation on #1

[Figure: domain #1 computes its local product using internal nodes 0-3 and external nodes 4 and 5 (local numbering 4 0 1 2 3 5).]

Page 39

Mat-Vec Products: Local Operation on #2

[Figure: domain #2 computes its local product using internal nodes 0-3 and external node 4 (local numbering 4 0 1 2 3).]