
ELSEVIER

Computer Methods in Applied Mechanics and Engineering

Comput. Methods Appl. Mech. Engrg. 129 (1996) 393-409

Sensitivity analysis for large-deflection and postbuckling responses on distributed-memory computers

Brian C. Watson, Ahmed K. Noor*

Center for Computational Structures Technology, University of Virginia, NASA Langley Research Center, Hampton, VA 23681, USA

Received 24 February 1996

Abstract

A computational strategy is presented for calculating sensitivity coefficients for the non-linear large-deflection and postbuckling responses of laminated composite structures on distributed-memory parallel computers. The strategy is applicable to any message-passing distributed computational environment. The key elements of the proposed strategy are: (a) a multiple-parameter reduced basis technique; (b) a parallel sparse equation solver based on a nested dissection (or multilevel substructuring) node ordering scheme; and (c) a multilevel parallel procedure for evaluating hierarchical sensitivity coefficients. The hierarchical sensitivity coefficients measure the sensitivity of the composite structure response to variations in three sets of interrelated parameters; namely, laminate, layer and micromechanical (fiber, matrix and interface/interphase) parameters. The effectiveness of the strategy is assessed by performing hierarchical sensitivity analysis for the large-deflection and postbuckling responses of stiffened composite panels with cutouts on three distributed-memory computers. The panels are subjected to combined mechanical and thermal loads. The numerical studies presented demonstrate the advantages of the reduced basis technique for hierarchical sensitivity analysis on distributed-memory machines.

1. Introduction

Non-linear large-deflection and postbuckling analyses of large-scale structures can require immense computational resources. Distributed-memory parallel computers, such as the Intel Paragon, the Cray T3D and the IBM SP2, have the potential to provide the increased speed and capacity necessary to perform such analyses. To achieve this potential, the computational strategies used in such analyses must take advantage of the unique characteristics of these computers. In recent years, intense efforts have been devoted to the development of parallel computational strategies and numerical algorithms for large-scale finite element response computations (see e.g. [1-5]). Much of this work has focused on implementing efficient linear equation solvers on distributed-memory computers. This has led to the development of a number of direct and iterative numerical algorithms for the solution of large sparse linear systems of equations (see [6-12]).

While efficient methods to compute response quantities are important, automated design procedures for large complex structures require the calculation of the derivatives of response quantities with respect to design variables. Relatively few studies on parallel strategies for computing sensitivity coefficients have been reported in the literature (see e.g. [13-15]). In the cited references, only small to moderate size problems were considered, and to the authors' knowledge no studies have been reported on the calculation of sensitivity coefficients for the non-linear response of large complex structures on parallel computers. The present study attempts to fill this void.

* Corresponding author.

Elsevier Science S.A. SSDI 0045-7825(95)00862-4

Parallel strategies typically attempt to distribute computational work by breaking a large problem into smaller subproblems, which are then solved separately on individual processors. The degree of independence of the subproblems is a measure of the effectiveness of the algorithm since it determines the amount and frequency of communication and synchronization. Parallel decomposition strategies for structural response analysis problems can be divided into three categories, namely: nodal, elemental, and domain-based.

Nodal parallel decomposition strategies include node-by-node iterative solvers as well as column-oriented direct solvers. Elemental parallel decomposition strategies include element-by-element equation solvers and parallel frontal equation solvers. Domain-based algorithms include nested dissection-based (substructuring) techniques and domain decomposition methods. The first two categories of numerical algorithms allow only small granularity of the parallel tasks, and require frequent communications among the processors. By contrast, the third category allows a larger granularity, which can result in improved performance for the algorithm.

Sensitivity analysis suggests a fourth category for parallel strategies; namely, decomposition based on design variables. Strategies in this category allow the largest granularity because of the complete independence of the parallel tasks.

The overall goal of the present study is to develop fully-parallel, scalable computational strategies for the sensitivity analysis of large complex non-linear structures on distributed-memory parallel computers. Specifically, the objectives of the present paper are to: (a) present an effective computational strategy for the sensitivity analysis of non-linear static and postbuckling responses for large-scale structures on distributed-memory computers; and (b) assess the performance of the proposed strategy and demonstrate potential benefits resulting from its use on modern distributed-memory computers.

The key elements of the proposed strategy are: (a) a multiple-parameter reduced basis technique; (b) a parallel sparse equation solver based on a nested dissection node ordering scheme; and (c) a multilevel parallel procedure for evaluating hierarchical sensitivity coefficients. An assessment is made of the effectiveness of the strategy for the sensitivity analysis for large-deflection and postbuckling response of stiffened composite panels on the Intel Paragon, the Cray T3D and the IBM SP2.

2. Mathematical formulation

2.1. Governing finite element equations

The analytical formulation is based on a moderate-rotation, geometrically non-linear, shallow shell theory, with the effects of transverse shear deformation and anisotropic material behavior included. A mixed formulation is used in which the fundamental unknowns consist of internal forces (or stress resultants) and generalized displacements. A total Lagrangian description of the deformation of the structure is used in which the deformed configurations of the structure are referred to the initial coordinate system of the undeformed structure. Stress resultants are allowed to be discontinuous at interelement boundaries. The external loading consists of a combination of transverse loading, $p$, temperature change, $T$, and in-plane edge shear loading, $N_{12}$.

The governing finite element equations for the non-linear and postbuckling response vectors, and the sensitivity coefficients can be written in the following compact form:

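$$f(Z) = KZ + G(Z) - q_1 Q_1 - q_2 Q_2 = 0 \tag{1}$$

$$\left[K + \frac{\partial G}{\partial Z}\right]\frac{\partial Z}{\partial \lambda} = -\frac{\partial K}{\partial \lambda}\,Z - \frac{\partial G}{\partial \lambda} + q_1 \frac{\partial Q_1}{\partial \lambda} + q_2 \frac{\partial Q_2}{\partial \lambda} \tag{2}$$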

where $Z$ is the response vector, which includes both nodal displacements and stress-resultant parameters; $K$ is the global linear structure matrix, which includes the flexibility and the linear strain-displacement matrices; $G(Z)$ is the vector of non-linear terms; $q_1$ and $q_2$ are thermal and mechanical load parameters; $Q_1$ is the vector of normalized thermal strains; $Q_2$ is the vector of normalized mechanical loads; and $\lambda$ is a typical lamination, material or geometric parameter of the structure. The forms of the arrays $K$, $G(Z)$, $Q_1$ and $Q_2$ are given in [16]. Note that Eqs. (1) are non-linear in $Z$, while Eqs. (2) are linear in $\partial Z/\partial \lambda$.
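The relationship between the non-linear response equations and the linear sensitivity equations can be illustrated on a toy problem. The following is a minimal sketch, assuming a hypothetical single-degree-of-freedom model $f(z) = kz + gz^3 - q = 0$ in which the stiffness $k$ plays the role of the parameter $\lambda$; the model, the function names and the numerical values are illustrative assumptions and are not taken from the formulation in [16].

```python
# Toy one-degree-of-freedom analogue of Eqs. (1) and (2).
# Hypothetical model (an assumption): f(z) = k*z + g*z**3 - q = 0,
# with the design parameter lambda taken to be the stiffness k.

def solve_response(k, g, q, tol=1e-12, max_iter=50):
    """Newton iteration on the non-linear equation k*z + g*z**3 - q = 0."""
    z = 0.0
    for _ in range(max_iter):
        residual = k * z + g * z**3 - q
        tangent = k + 3.0 * g * z**2      # scalar analogue of K + dG/dZ
        step = residual / tangent
        z -= step
        if abs(step) < tol:
            return z
    raise RuntimeError("Newton iteration did not converge")


def sensitivity_wrt_k(k, g, z):
    """One linear solve: (k + 3*g*z**2) * dz/dk = -(dk/dk) * z = -z."""
    tangent = k + 3.0 * g * z**2
    return -z / tangent


k, g, q = 2.0, 0.5, 3.0
z = solve_response(k, g, q)
dz_dk = sensitivity_wrt_k(k, g, z)

# Central finite-difference check of the analytical sensitivity.
h = 1e-6
fd = (solve_response(k + h, g, q) - solve_response(k - h, g, q)) / (2.0 * h)
print(z, dz_dk, fd)
```

The response requires an iterative solution, whereas the sensitivity coefficient follows from a single linear solve with the same tangent quantity evaluated at the converged response.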

2.2. Hierarchical sensitivity coefficients

The non-linear and postbuckling responses of laminated composite structures are dependent on a hierarchy of parameters including laminate, layer and micromechanical (fiber, matrix and interface/interphase) parameters (see Fig. 1). A study of the sensitivity of the response to variations in each of these parameters provides insight into the importance of the parameter and helps in the development of materials to meet certain performance requirements.

Three sets of composite parameters are considered, namely: panel, layer and micromechanical. The panel parameters include the extensional stiffnesses, $A_{ij}$, bending stiffnesses, $D_{ij}$, bending-extensional coupling stiffnesses, $B_{ij}$ $(i, j = 1, 2, 6)$, transverse shear stiffnesses, $A_{i'j'}$ $(i', j' = 4, 5)$, and the thermal effects appearing in the laminate constitutive relations (see [17]). The layer parameters include the individual layer properties: Young's moduli $E_L$, $E_T$; shear moduli $G_{LT}$, $G_{TT}$; major Poisson's ratio $\nu_{LT}$; thermal expansion coefficients $\alpha_L$, $\alpha_T$; fiber orientation angles $\theta$ and layer thicknesses $h$; where subscripts $L$ and $T$ refer to the longitudinal (fiber) and transverse directions, respectively. The micromechanical parameters refer to the fiber, matrix and interface/interphase moduli $E_f$, $E_m$, $E_p$, $G_f$, $G_m$, $G_p$ and the fiber and matrix volume fractions $v_f$ and $v_m$. The subscripts $f$, $m$ and $p$ denote the fiber, matrix and interface/interphase property, respectively. The three sets of parameters will henceforth be referred to as $\lambda_i^{(p)}$, $\lambda_j^{(l)}$, $\lambda_k^{(m)}$, where superscripts $p$, $l$ and $m$ refer to the panel, layer and micromechanical parameters, respectively; and the indices $i$, $j$ and $k$ range from 1 to the number of parameters in each category.

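The computational procedure consists of evaluating the sensitivity coefficients with respect to each of the panel (or laminate) stiffnesses, $\partial Z/\partial \lambda_i^{(p)}$, using Eqs. (2). The sensitivity coefficients with respect to the layer and micromechanical parameters are then obtained by applying the chain rule of partial differentiation, which results in the following linear combinations:

$$\frac{\partial Z}{\partial \lambda_j^{(l)}} = \sum_i a_{ij}\,\frac{\partial Z}{\partial \lambda_i^{(p)}} \tag{3}$$

$$\frac{\partial Z}{\partial \lambda_k^{(m)}} = \sum_i c_{ik}\,\frac{\partial Z}{\partial \lambda_i^{(p)}} \tag{4}$$

where

$$a_{ij} = \frac{\partial \lambda_i^{(p)}}{\partial \lambda_j^{(l)}} \tag{5}$$

$$b_{jk} = \frac{\partial \lambda_j^{(l)}}{\partial \lambda_k^{(m)}} \tag{6}$$

$$c_{ik} = \frac{\partial \lambda_i^{(p)}}{\partial \lambda_k^{(m)}} = \sum_j a_{ij}\, b_{jk} \tag{7}$$

[Figure: three-level hierarchy of parameters: panel (laminate); layer; micromechanical (constituent materials).]

Fig. 1. Hierarchy of parameters for laminated composite panels.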

The coefficients defined by Eqs. (5), $a_{ij}$, relate the panel (laminate) stiffnesses to the individual layer properties, and are obtained from the lamination theory. The coefficients defined by Eqs. (6), $b_{jk}$, relate the layer properties to the constituent properties, and are obtained from the micromechanical model. The coefficients defined by Eqs. (7), $c_{ik}$, relate the laminate stiffnesses to the micromechanical properties. If the laminate stiffnesses are uniform, and the constitutive relations of the laminate, layer and constituents are independent of the response, then the $a_{ij}$, $b_{jk}$ and $c_{ik}$ coefficients are constants, and need to be generated only once for each panel, even when the response is non-linear.
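The chain-rule combination described above amounts to a few small matrix products once the panel-level sensitivity coefficients are available. The following is a minimal sketch with made-up numbers; the array names `dZ_dpanel`, `a`, `b` and `c`, the helper `matmul` and all values are hypothetical and serve only to show the index pattern.

```python
def matmul(x, y):
    """Plain-Python matrix product for small nested-list matrices."""
    return [[sum(x[i][j] * y[j][k] for j in range(len(y)))
             for k in range(len(y[0]))]
            for i in range(len(x))]


# dZ_dpanel[r][i]: derivative of response component r with respect to
# panel parameter i (illustrative values, two components, three parameters).
dZ_dpanel = [[1.0, 2.0, 0.5],
             [0.0, -1.0, 4.0]]

# a[i][j]: derivative of panel parameter i with respect to layer parameter j.
a = [[2.0, 0.0],
     [1.0, 1.0],
     [0.0, 3.0]]

# b[j][k]: derivative of layer parameter j with respect to
# micromechanical parameter k.
b = [[1.0, 0.5],
     [0.0, 2.0]]

c = matmul(a, b)                     # panel vs. micromechanical coefficients
dZ_dlayer = matmul(dZ_dpanel, a)     # layer-level sensitivities
dZ_dmicro = matmul(dZ_dpanel, c)     # micromechanical-level sensitivities

print(dZ_dlayer)
print(dZ_dmicro)
```

Because `a`, `b` and `c` are constant under the conditions stated above, only `dZ_dpanel` changes from one load state to the next.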

2.3. Basis reduction and reduced system of equations

The response vector $Z$ is expressed as a linear combination of a few preselected global approximation vectors. The approximation can be expressed by the following transformation:

$$Z = \Gamma \psi \tag{8}$$

The columns of the matrix $\Gamma$ in Eqs. (8) are the global approximation vectors, and the elements of the vector $\psi$ are the unknown amplitudes of the approximation vectors. Note that the number of basis vectors in Eqs. (8) is considerably smaller than the total number of degrees of freedom.

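A Bubnov-Galerkin technique is used to replace the governing equations of the panel response, Eqs. (1), by the following non-linear reduced equations in the unknown $\psi$:

$$\tilde{K}\psi + \tilde{G}(\psi) - q_1 \tilde{Q}_1 - q_2 \tilde{Q}_2 = 0 \tag{9}$$

where

$$\tilde{K} = \Gamma^T K\, \Gamma \tag{10}$$

$$\tilde{G}(\psi) = \Gamma^T G(\Gamma \psi) \tag{11}$$

$$\tilde{Q}_1 = \Gamma^T Q_1 \tag{12}$$

$$\tilde{Q}_2 = \Gamma^T Q_2 \tag{13}$$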
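The projection in Eqs. (9)-(13) can be sketched numerically. The example below is illustrative only: the full-system arrays are random stand-ins, the non-linear term is a hypothetical cubic, and the basis is a random orthonormal matrix rather than the path derivatives used in the present study. It assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 4                          # full and reduced sizes (illustrative)

# Hypothetical stand-ins for K, Q1 and Q2 (not the arrays of [16]).
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)           # symmetric positive definite stand-in
Q1 = rng.standard_normal(n)
Q2 = rng.standard_normal(n)


def G(Z):
    """Toy non-linear term (an assumption made for this sketch)."""
    return 0.1 * Z**3


def dG_dZ(Z):
    """Jacobian of the toy non-linear term."""
    return np.diag(0.3 * Z**2)


# Basis matrix Gamma (n x r): random orthonormal columns, for illustration.
Gamma, _ = np.linalg.qr(rng.standard_normal((n, r)))

K_red = Gamma.T @ K @ Gamma           # Eq. (10)
Q1_red = Gamma.T @ Q1                 # Eq. (12)
Q2_red = Gamma.T @ Q2                 # Eq. (13)


def reduced_residual(psi, q1, q2):
    """Left-hand side of Eq. (9), with G projected as in Eq. (11)."""
    return K_red @ psi + Gamma.T @ G(Gamma @ psi) - q1 * Q1_red - q2 * Q2_red


def solve_reduced(q1, q2, tol=1e-12, max_iter=50):
    """Newton iteration on the r-by-r reduced system."""
    psi = np.zeros(r)
    for _ in range(max_iter):
        tangent = K_red + Gamma.T @ dG_dZ(Gamma @ psi) @ Gamma
        dpsi = np.linalg.solve(tangent, reduced_residual(psi, q1, q2))
        psi -= dpsi
        if np.linalg.norm(dpsi) < tol:
            return psi
    raise RuntimeError("Newton iteration did not converge")


psi = solve_reduced(1.0, 0.5)
Z_approx = Gamma @ psi                # Eq. (8)
print(np.linalg.norm(reduced_residual(psi, 1.0, 0.5)))
```

Each Newton step here involves only an `r`-by-`r` system, which is the source of the savings in the solution phase; the accuracy of `Z_approx` depends entirely on the choice of basis.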

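The derivative of the response vector with respect to $\lambda$ is formed by differentiating Eqs. (8) as follows:

$$\frac{\partial Z}{\partial \lambda} = \frac{\partial \Gamma}{\partial \lambda}\,\psi + \Gamma\,\frac{\partial \psi}{\partial \lambda} \tag{14}$$

where $\lambda$ is a typical laminate, layer, or micromechanical parameter from the three sets, $\lambda_i^{(p)}$, $\lambda_j^{(l)}$, $\lambda_k^{(m)}$. The derivative of the vector $\psi$ with respect to the parameter $\lambda$ is obtained by differentiating the reduced equations, Eqs. (9), as follows:

$$\left[\tilde{K} + \frac{\partial \tilde{G}}{\partial \psi}\right]\frac{\partial \psi}{\partial \lambda} = -\frac{\partial \tilde{K}}{\partial \lambda}\,\psi - \frac{\partial \tilde{G}}{\partial \lambda} + q_1 \frac{\partial \tilde{Q}_1}{\partial \lambda} + q_2 \frac{\partial \tilde{Q}_2}{\partial \lambda} \tag{15}$$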

The effectiveness of the proposed technique for calculating the sensitivity coefficients depends, to a great extent, on the proper choice of the basis vectors (the columns of the matrix $\Gamma$). An effective choice was found to be the various-order path derivatives of the response with respect to the load parameters. That is, the columns of the matrix $\Gamma$ used in approximating the response vector $Z$ over a range of values of the load parameters $q_1$ and $q_2$ consist of the response vector associated with a preselected pair of values, $(q_1^0, q_2^0)$, and its derivatives with respect to $q_1$ and $q_2$, evaluated at the same preselected values:

$$\Gamma = \left[\, Z \quad \frac{\partial Z}{\partial q_1} \quad \frac{\partial Z}{\partial q_2} \quad \frac{\partial^2 Z}{\partial q_1^2} \quad \frac{\partial^2 Z}{\partial q_1 \partial q_2} \quad \frac{\partial^2 Z}{\partial q_2^2} \quad \cdots \,\right]_{q_1^0,\, q_2^0} \tag{16}$$

The path derivatives (columns of the matrix $\Gamma$) are obtained by successive differentiation of the governing finite element equations of the panel, Eqs. (1), with respect to the parameters $q_1$ and $q_2$ (see [18, 19]). Note that only one matrix factorization is needed to generate all the path derivatives.

The derivatives of $\Gamma$ with respect to $\lambda$ are obtained by differentiating each of the recursion relations for evaluating the path derivatives. The resulting equations have the same left-hand side matrix as that of the original recursion relations. Therefore, no additional matrix factorizations are needed to generate $\partial \Gamma/\partial \lambda$.
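As a worked illustration of why a single factorization suffices, the first few recursion relations can be written out. This sketch assumes (the assumption is ours, made for illustration) that $Q_1$ and $Q_2$ depend neither on $Z$ nor on the load parameters; the complete recursion relations are given in [18, 19].

```latex
\begin{align*}
\left[K + \frac{\partial G}{\partial Z}\right]\frac{\partial Z}{\partial q_1} &= Q_1, \\
\left[K + \frac{\partial G}{\partial Z}\right]\frac{\partial Z}{\partial q_2} &= Q_2, \\
\left[K + \frac{\partial G}{\partial Z}\right]\frac{\partial^2 Z}{\partial q_1^2}
  &= -\left(\frac{\partial^2 G}{\partial Z^2}\,\frac{\partial Z}{\partial q_1}\right)\frac{\partial Z}{\partial q_1}.
\end{align*}
```

Every relation has the same left-hand side matrix, evaluated at the preselected load state, so each additional path derivative costs one right-hand side evaluation and one forward/back substitution.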

3. Computational procedure

3.1. Phases of the computational procedure

The computational procedure for generating the non-linear response vector, $Z$, and its sensitivity coefficients, $\partial Z/\partial \lambda$, can be conveniently divided into two distinct phases: the reduction phase, and the solution phase.

The reduction phase consists of the evaluation of the basis vectors at a particular load state, $(q_1^0, q_2^0)$; evaluation of the derivatives of the basis vectors with respect to the panel parameters; generation of the reduced matrices; and generation of the derivatives of the reduced matrices. These tasks involve generating and assembling the elemental arrays, factoring the global structure matrix, evaluating right-hand side vectors and performing forward/back substitutions. An initial set of basis vectors, associated with zero values of the load parameters, is generated. For postbuckling problems another set of basis vectors and reduced equations are generated in the vicinity of the stability boundary (see [18]).

The solution phase consists of solving the reduced equations for both the response and the sensitivity coefficients for different combinations of load parameters. For each load state, $(q_1, q_2)$, the vector of reduced unknowns, $\psi$, is obtained by solving the reduced governing equations, Eqs. (9). Then, the derivatives of the reduced unknowns are obtained by solving the reduced equations, Eqs. (15). The response vector and its sensitivity coefficients, $Z$ and $\partial Z/\partial \lambda$, are obtained by using Eqs. (8) and (14).

3.2. Multilevel parallelism

Within the reduction phase, the calculations involved in generating the sensitivity coefficients can be divided into two general stages. The first stage involves assembling and factoring the global stiffness matrix, forming right-hand side vectors and performing forward/back substitutions to generate the basis vectors. The second stage involves forming right-hand side vectors and performing forward/back substitutions to generate the derivatives of the basis vectors with respect to the panel parameters, and forming the associated reduced matrices. The calculations associated with the first stage can be considered as an overhead because they need to be performed once, regardless of the number of panel parameters.


[Figure: timelines of processor activity against time, t. Single level parallel strategies: (a) and (b) eight groups of 1 processor each. Multiple level parallel strategies: (c) two groups of 4 processors each; (d) four groups of 2 processors each.]

Fig. 2. Single level and multiple level parallel strategies for evaluating sensitivity coefficients with respect to eight panel parameters using eight processors.

A traditional approach to parallelizing this process is to use all available processors in both stages, with the different panel parameters processed serially (see Fig. 2a). An alternative approach is to have the processors perform the calculations for the different sensitivity coefficients in parallel (see Fig. 2b). Note that this approach requires redundant calculations for the overhead (all of the processors perform these identical calculations). Because of the larger granularity of the parallel tasks, the second approach often outperforms the first. However, for large-scale problems, its effectiveness may be limited by available memory on individual processors.

A third approach combines the advantages of the first two approaches with a multilevel scheme (see Figs. 2c and 2d). The processors are divided into groups, and each group is used to evaluate a subset of the sensitivity coefficients. All of the processors within each group cooperate to reduce the computational workload on any one processor. The groups work independently to achieve a large granularity of task.
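The grouping options of Fig. 2 can be sketched as follows. This is a minimal illustration, assuming equal-sized groups of consecutively numbered processors and a round-robin assignment of panel parameters to groups; both choices, and the function names `make_groups` and `assign_parameters`, are hypothetical and are not taken from the implementation described later.

```python
def make_groups(num_procs, num_groups):
    """Split processors 0..num_procs-1 into equal groups of consecutive ranks."""
    if num_groups < 1 or num_procs % num_groups != 0:
        raise ValueError("num_groups must divide num_procs evenly")
    size = num_procs // num_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(num_groups)]


def assign_parameters(num_params, num_groups):
    """Round-robin assignment of panel parameter indices to groups (assumed)."""
    return [list(range(g, num_params, num_groups)) for g in range(num_groups)]


# Eight processors and eight panel parameters, as in Fig. 2:
# 1 group of 8, 8 groups of 1, 2 groups of 4, and 4 groups of 2.
for num_groups in (1, 8, 2, 4):
    groups = make_groups(8, num_groups)
    params = assign_parameters(8, num_groups)
    for procs, subset in zip(groups, params):
        print(num_groups, "group(s): processors", procs, "-> parameters", subset)
```

With one group, all processors cooperate on every parameter in turn; with eight groups, each processor repeats the overhead stage alone and then handles one parameter; the intermediate settings trade redundant overhead against communication within each group.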

3.3. Numerical algorithms

The most time-consuming steps within the computational procedure are those associated with operating on the original, full system of equations, namely, evaluation of the basis vectors, and the derivatives of the basis vectors with respect to each panel parameter. The numerical calculations in these steps involve assembling and factoring the global structure matrix, forming right-hand side vectors, and performing forward and back substitutions.

The element generation and assembly of the global structure matrix is performed using the node-by-node procedure described in [5, 20]. The global structure matrix is factored using the nested dissection-based sparse parallel linear solver described in [5]. This algorithm exploits the relatively large-grain parallelism available in a sparse factorization. Additionally, this algorithm enables the forward and back substitution to exploit this same relatively large-grain parallelism.


4. Implementation on distributed-memory computers

The aforementioned computational procedure has been implemented on three distributed-memory computer systems, namely: the Intel Paragon XP/S computer at Sandia National Laboratories; the Cray T3D computer at the Jet Propulsion Laboratory; and IBM SP2 computer systems at NASA Langley Research Center and the Maui High Performance Computing Center. The major characteristics of these computer systems are described in [5]. The program was coded in Fortran. The Paragon at Sandia used the SUNMOS operating system to obtain improved communication rates.

4.1. Multilevel communications

A library of communication routines, including send and receive, was written in support of the multilevel parallel procedure. These routines allow the use of multiple levels of static groupings of the processors. At the lowest level, all processors form a single group. At higher levels, processors are assigned to one of a user-specified number of static groups. Library routines allow the processors to change their group level when necessary. Communication routines are sensitive to the current group level and map the current number of a processor to the processor number assigned to it by the operating system.

A machine-independent global operations library was written to provide reduction and synchronization routines (e.g. global sum). These routines are sensitive to the current group level. At the lowest level, the global operations connect all of the processors. At higher levels, the global operations connect only the processors within a specific group. This allows processors to operate within the group as if they formed a complete parallel job, separate from the other groups.
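The behavior of group-sensitive routines can be mimicked in a serial simulation. The sketch below is not the library described above, whose interface is not given here; the class `GroupLayout`, its methods and the two-level layout of equal consecutive groups are assumptions made purely for illustration.

```python
class GroupLayout:
    """Serial simulation of two static group levels (hypothetical interface).

    Level 0: all processors form one group.
    Level 1: processors are split into num_groups equal consecutive groups.
    """

    def __init__(self, num_procs, num_groups):
        if num_groups < 1 or num_procs % num_groups != 0:
            raise ValueError("num_groups must divide num_procs evenly")
        self.num_procs = num_procs
        self.group_size = {0: num_procs, 1: num_procs // num_groups}

    def group_of(self, level, system_rank):
        """Group index of a processor at the given level."""
        return system_rank // self.group_size[level]

    def local_rank(self, level, system_rank):
        """Processor number within its current group."""
        return system_rank % self.group_size[level]

    def system_rank(self, level, group, local_rank):
        """Map a group-local processor number back to the system number."""
        return group * self.group_size[level] + local_rank

    def global_sum(self, level, values):
        """Sum values within each group; every member receives its group total."""
        totals = {}
        for rank, value in enumerate(values):
            g = self.group_of(level, rank)
            totals[g] = totals.get(g, 0.0) + value
        return [totals[self.group_of(level, rank)]
                for rank in range(self.num_procs)]


layout = GroupLayout(num_procs=8, num_groups=2)
values = [float(v) for v in range(1, 9)]      # one value per processor
print(layout.global_sum(0, values))           # sum over all processors
print(layout.global_sum(1, values))           # sums within each group
```

On a present-day message-passing system, the same effect would typically be obtained with sub-communicators; that is an observation about current practice rather than a description of the routines used in this study.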

4.2. Reduction phase

The reduction phase consists of forming the basis vectors, derivatives of the basis vectors, reduced system matrices and the derivatives of the reduced system matrices. This phase comprises the bulk of the total computational effort required for a given problem. Thus, efficient parallelization of this phase is very important.

The implementation follows the multilevel parallel approach. The processors are divided into a number of groups and each group performs the calculations for a subset of the sensitivity coefficients. The number of groups is set at runtime. Each group needs the basis vectors and the factored left-hand side matrix to perform the calculations associated with its own subset of panel parameters. Therefore, each group redundantly completes the overhead of factoring the left-hand side matrix and forming the basis vectors. These computations are performed in parallel using all of the processors within the group. The groups themselves operate in parallel to compute the derivatives of the basis vectors with respect to different panel parameters.

Because each right-hand side vector is built up from the elemental matrices, formation of these vectors is parallelized on the element level. Each processor loops over its own elements and assembles the contributions in local storage for the right-hand side vector. Then, a global sum operation is used to add all of the processors’ contributions into the final right-hand side vector. Note that this global sum only applies to the processors in the current group, and all of these processors receive a copy of the updated vector.

The procedure used to form the reduced system matrices and their derivatives is similar to that used to form the right-hand side vectors. Each processor assembles the contributions from its own elements into its own local storage for the reduced system matrices. Then, a global sum operation is used to add all of the processors’ copies to produce the final matrices.
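The assemble-then-sum pattern used for the right-hand side vectors and the reduced matrices can be sketched in a serial simulation. The element data, the round-robin ownership of elements and the function name `assemble_rhs` are hypothetical; only the pattern (local accumulation followed by a global sum whose result every processor in the group receives) follows the description above.

```python
def assemble_rhs(num_dofs, elements, num_procs):
    """Simulate element-level parallel assembly followed by a global sum.

    elements: list of (dof_indices, element_vector) pairs.
    Returns one copy of the summed vector per processor in the group.
    """
    # Each processor accumulates contributions from its own elements
    # into local storage (round-robin ownership is an assumption).
    local = [[0.0] * num_dofs for _ in range(num_procs)]
    for e, (dofs, fe) in enumerate(elements):
        owner = e % num_procs
        for dof, value in zip(dofs, fe):
            local[owner][dof] += value

    # Global sum over the processors of the current group; afterwards
    # every processor holds a copy of the complete vector.
    total = [sum(local[p][d] for p in range(num_procs))
             for d in range(num_dofs)]
    return [list(total) for _ in range(num_procs)]


# Three two-node "elements" on a four-degree-of-freedom chain (made-up data).
elements = [((0, 1), (1.0, 1.0)),
            ((1, 2), (2.0, 2.0)),
            ((2, 3), (3.0, 3.0))]
copies = assemble_rhs(4, elements, num_procs=2)
print(copies[0])
```

The same pattern applies to the reduced matrices, with the local storage holding a small dense matrix instead of a vector.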

4.3. Solution phase

The solution phase consists of solving the reduced systems of equations for the reduced unknowns and their derivatives. Because reduced systems associated with different panel parameters are independent, these are solved on different processors. The basis vectors and their derivatives are used with the reduced coefficients to recover the solution and its derivatives for displacement and stress resultant components of interest.

Once all of the sensitivity coefficients for the panel level parameters have been computed, they are passed to a single processor. That processor then computes the linear combinations of these sensitivity coefficients to form the sensitivity coefficients for the layer- and micromechanical-level parameters using Eqs. (3) and (4), respectively.

5. Numerical studies and performance evaluation of the proposed strategy

To assess the effectiveness of the proposed strategy and the foregoing computational procedure, a number of sensitivity analyses for postbuckling and non-linear responses of composite panels have been performed using this strategy on the Intel Paragon XP/S, the Cray T3D and the IBM SP2 computers. Compiler and operating system information for the systems used in the numerical studies is given in Appendix A.

Typical results are presented herein for two postbuckling problem sets and a large-deflection problem set. The postbuckling problems are: a square stiffened composite panel with a cutout (Fig. 3a); and two discretizations of a stiffened rectangular composite panel with a cutout (Fig. 3b). The loading consists of combined in-plane edge shear and uniform temperature change. The large-deflection problem consists of a shallow cylindrical composite panel subjected to thermal loading (Fig. 3c). Table 1 lists problem size data for the finite element models used.

Fig. 4 shows the wall clock times expended in the sensitivity analysis of the shallow cylindrical panel on the IBM SP2 for varying numbers of processors and groups. In the sensitivity analysis only one global structure matrix factorization is performed, and most of the computational time is expended in the formation of the derivatives of the basis vectors and the associated reduced equations. The relatively low overhead makes the application of the multilevel parallel strategy very effective. The effects of dividing the processors into different numbers of groups can be seen in Fig. 4. As more groups are used, the overhead time increases somewhat. This occurs because fewer processors cooperate to perform this work. However, the overhead time was small to begin with, and remains small even when several groups are used. The time for the formation of the derivatives of the basis vectors is significantly reduced when multiple groups are used, because of the increase in the granularity of the parallel tasks.

[Figure: finite element models. (a) Square panel; (b) rectangular panel; (c) shallow cylindrical panel (32 880 DOF).]

Fig. 3. Stiffened composite panel models used in the present study.

Table 1
Problem size data for finite element models used in this study

Description                      Number of   Number of   Number of displacement   Number of non-zero coefficients in upper
                                 elements    nodes       degrees of freedom       triangular portion of stiffness matrix
Square panel                     354         1516        9096                     410 460
Rectangular panel, coarse mesh   1320        5480        32 880                   1 518 600
Rectangular panel, fine mesh     2394        9844        59 064                   2 747 652
Curved panel                     1320        5480        32 880                   1 518 600

[Figure: horizontal bar chart of wall clock time; bars labeled by total number of processors and number of groups; horizontal axis: time (s).]

Fig. 4. Wall clock times expended in evaluating the sensitivity coefficients for the large-deflection response of the cylindrical panel using the reduced basis strategy on the IBM SP2.

Fig. 5 shows the timing results for a sensitivity analysis using the full system of finite element equations, Eqs. (1) and (2), for the shallow cylindrical panel on the IBM SP2. The analysis was performed with varying numbers of processors and groups. In contrast to the sensitivity analysis based on the reduced basis technique, the full-system method requires many global structure matrix factorizations to generate the response and sensitivity coefficients at each load step. As can be seen from Fig. 5, the large overhead associated with global structure matrix factorization renders the multiple group strategy ineffective. Although the time expended in the evaluation of the sensitivity coefficients decreases as the number of groups increases, this time represents only a small portion of the total analysis time. Additionally, the overhead time increases and offsets the benefit of using multiple groups. Comparison of the times shown in Fig. 5 with those shown in Fig. 4 reveals that the advantage of using the reduced basis technique over using a full system analysis increases with the increase in the number of processors used.

Fig. 6 shows the wall clock times for the sensitivity analysis of the postbuckling response for each of the three stiffened composite models (Fig. 3a-b) on the IBM SP2. Fig. 6 also shows the effects on the total analysis time of using increasing numbers of processors and increasing numbers of groups. When all of the processors are used in a single group, the benefit of using additional processors diminishes as the problem size increases. This is explained by the fact that the larger size problems require a greater number of processors to overcome local memory limitations. The added interprocessor communication caused by doubling the number of processors grows considerably as the number of processors increases. For example, the increase in the interprocessor communication resulting from increasing the number of processors from 16 to 32 is significantly higher than that resulting from increasing the number of processors from 2 to 4.

[Figure: horizontal bar chart of wall clock time; bars labeled by total number of processors and number of groups; horizontal axis: time (s). Legend: matrix factorizations and other overhead; form right-hand sides for sensitivity analysis; forward/back substitutions for sensitivity analysis.]

Fig. 5. Wall clock times expended in evaluating the sensitivity coefficients for the large-deflection response of the cylindrical panel using the full system of equations on the IBM SP2.

[Figure: wall clock time plotted against number of groups and total number of processors. (a) 9096 DOF model; (b) 32 880 DOF model; (c) 59 064 DOF model.]

Fig. 6. Wall clock times expended in evaluating the sensitivity coefficients for the postbuckling response for each of three stiffened composite models on the IBM SP2.

Fig. 6 shows that the use of multiple groups provides a way to increase the effectiveness of additional processors. The use of multiple groups increases the granularity of the parallel tasks and reduces the total interprocessor communication requirements. The reduction becomes more pronounced for larger numbers of processors.

Figs. 7-9 show the computational times for evaluating the sensitivity coefficients for the postbuckling response for each of the three composite panels on the Intel Paragon, the Cray T3D and the IBM SP2, respectively. These figures show the breakdown of the total wall clock times according to the different steps of the computational procedure, and the changes in these times as the number of processors and the number of groups change. As can be seen from Figs. 7-9, for a single group, the percentage of the total time expended in the second stage (i.e. the formation of right-hand sides and forward/back substitutions for evaluating sensitivity coefficients) grows as the problem size increases. Because of this, and the fact that the second stage calculations benefit most from the use of multiple groups, the larger size problems are able to use additional processors more efficiently. On the other hand, when the time expended in the sensitivity-related stage does not dominate the total time, there is little to be gained from the use of additional processors.

[Figure: stacked horizontal bar charts of wall clock time; bars labeled by total number of processors and number of groups; horizontal axis: time (s). Panels: (a) 9096 DOF; (b) 32 880 DOF; (c) 59 064 DOF. Legend: determine stability boundary and initial postbuckled response; assemble and factor stiffness matrix; form right-hand sides for sensitivity analysis; forward/back substitutions for sensitivity analysis; form reduced matrices for sensitivity analysis.]

Fig. 7. Breakdown of the wall clock times expended in evaluating the sensitivity coefficients for the postbuckling response on the Intel Paragon.

[Figure: stacked horizontal bar charts of wall clock time; bars labeled by total number of processors and number of groups; horizontal axis: time (s). Panels: (a) 9096 DOF; (b) 32 880 DOF; (c) 59 064 DOF. Legend: determine stability boundary and initial postbuckled response; assemble and factor stiffness matrix; form right-hand sides for sensitivity analysis; forward/back substitutions for sensitivity analysis; form reduced matrices for sensitivity analysis.]

Fig. 8. Breakdown of the wall clock times expended in evaluating the sensitivity coefficients for the postbuckling response on the Cray T3D.


[Figure: stacked horizontal bar charts of wall clock time; bars labeled by total number of processors and number of groups; horizontal axis: time (s). Panels: (a) 9096 DOF; (b) 32 880 DOF; (c) 59 064 DOF. Legend: assemble and factor stiffness matrix; form right-hand sides; forward/back substitutions for sensitivity analysis; form reduced matrices for sensitivity analysis.]

Fig. 9. Breakdown of the wall clock times expended in evaluating the sensitivity coefficients for the postbuckling response on the IBM SP2.

Figs. 10 and 11 demonstrate the large quantity of sensitivity information that can be generated by the aforementioned parallel computational procedure in a relatively short wall clock time (about 200 s for the 59 064 DOF model on the IBM SP2, see Fig. 9). Figs. 10 and 11 show typical results for selected sensitivity coefficients for the postbuckling response of the stiffened composite panel with 32 880 degrees of freedom for different combinations of loadings in the postbuckled region. Note that all of the sensitivity coefficients shown in the figures were generated with a single set of basis vectors. Fig. 10 shows the value of the transverse displacement of one node and the sensitivity coefficients with respect to $E_L$, $E_T$, $\nu_{LT}$, $\alpha_T$ and $\theta$. Fig. 11 shows similar results for the total strain energy and its sensitivity coefficients with respect to the same set of panel parameters.


Fig. 10. Variation of the transverse displacement at node 'A' and its sensitivity coefficients with respect to $E_L$, $E_T$, $\nu_{LT}$, $\alpha_L$ and $\theta$ with mechanical and thermal loading in the postbuckling region for the composite panel with 32880 DOF.

6. Comments on the computational procedure

The following comments can be made concerning the potential of the foregoing strategy and computational procedure for the hierarchical sensitivity analysis of the non-linear response of large complex structures on distributed-memory parallel computers:

(1) The application of the reduced basis technique shifts the computational effort from the overhead stage (primarily matrix factorization) to the stage of sensitivity calculations (primarily formation of right-hand sides and forward/back substitutions). This shift allows a larger granularity of parallel tasks. Large granularity of the parallel tasks is the key to efficient use of distributed-memory computers.

(2) The heavy dependence of the procedure on forward/back substitution suggests the use of the most efficient algorithm available for this step. Previous studies (see e.g. [5]) have shown that a nested dissection-based sparse parallel linear solver is very efficient for forward/back substitution. Other sparse parallel solvers, such as a skyline-based method, simply cannot match the forward/back substitution performance of the nested dissection-based solver.


Fig. 11. Variation of the total strain energy and its sensitivity coefficients with respect to $E_L$, $E_T$, $\nu_{LT}$, $\alpha_L$ and $\theta$ with mechanical and thermal loading in the postbuckling region for the composite panel with 32880 DOF.

(3) The hierarchical sensitivity analysis requires the evaluation of a relatively large number of sensitivity coefficients at the first level. This requirement increases the amount of work to be done at the largest granularity, and thereby increases the efficiency of the multilevel parallel procedure. The efficiency gained by the use of a multilevel parallel procedure increases as the problem size increases.

(4) The multiple group strategy allows a larger granularity of tasks, while keeping the redundant calculations at a minimum. This procedure allows for larger numbers of processors to be efficiently used for hierarchical sensitivity analysis of large complex structures.

(5) When using a strategy such as the aforementioned multilevel parallel sensitivity analysis procedure, a large amount of data can be generated in a relatively short time. To effectively visualize this data, advanced postprocessing and visualization tools are needed.
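The comments above rest on factoring the stiffness matrix once and then reusing the factors for many right-hand sides, with those right-hand sides shared among processor groups. The following is a minimal serial sketch of that pattern, not the implementation used in the present study. It assumes a small dense symmetric positive definite matrix in place of the sparse stiffness matrix, a plain Cholesky factorization in place of the nested dissection-based solver, and a round-robin assignment of right-hand sides to groups. The function names and the example data are hypothetical.

```python
import math


def cholesky(K):
    """Return lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_back(L, b):
    """Solve (L L^T) x = b by forward and then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def solve_by_groups(K, rhs_list, n_groups):
    """Factor once, then let each (simulated) group solve its share of the RHS.

    In a distributed setting every group would hold its own copy of the
    factors; here the loop over groups only mimics that division of work.
    """
    L = cholesky(K)
    solutions = [None] * len(rhs_list)
    for group in range(n_groups):
        for idx in range(group, len(rhs_list), n_groups):  # round-robin share
            solutions[idx] = forward_back(L, rhs_list[idx])
    return solutions


# Hypothetical 3x3 example with four right-hand sides and two groups.
K = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
rhs_list = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 2.0, 3.0]]
solutions = solve_by_groups(K, rhs_list, n_groups=2)
```

The factorization cost is paid once, while the number of forward/back substitutions grows with the number of sensitivity coefficients. In this sketch that is the part of the work that can be divided among groups.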

7. Concluding remarks

A computational strategy is presented for calculating sensitivity coefficients for the non-linear and postbuckling responses of structures on distributed-memory parallel computers. The strategy is applicable to any message-passing distributed computational environment. The key elements of the proposed strategy are: (a) a multiple-parameter reduced basis technique; (b) a parallel sparse equation solver based on a nested dissection (or multilevel substructuring) node ordering scheme; and (c) a multilevel parallel procedure for evaluating hierarchical sensitivity coefficients. The implementation of the proposed strategy is described for three distributed-memory computers, namely, the Intel Paragon XP/S, the Cray T3D and the IBM SP2. The performance of the strategy is evaluated by performing sensitivity analyses of the postbuckling and non-linear responses of several laminated composite panels. These numerical results demonstrate the effectiveness of the multilevel parallel procedure based on the reduced basis technique for sensitivity analysis. The results demonstrate how the use of multiple groups facilitates the efficient use of large numbers of processors for the sensitivity analysis of the non-linear response of large complex structures.

Acknowledgments

The present work is supported by NASA Cooperative Agreement NCCW-0011. The authors appreciate useful discussions with David Womble of Sandia, Majdi Baddourah of Lockheed, and Olaf Storaasli of NASA Langley.

Work on the IBM SP2 at the Maui High Performance Computing Center is sponsored in part by the Phillips Laboratory, Air Force Materiel Command, USAF, under Cooperative Agreement Number F29601-93-2-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Phillips Laboratory or the U.S. Government.

Appendix A. Operating system and compiler information for the systems used in the present study

At the time of this study the Intel Paragon XP/S at Sandia National Laboratories had 1840 compute nodes, each running SUNMOS version 1.6.1. Of these compute nodes, 512 had 32MB of RAM, while the remainder had 16MB of RAM. The service partition nodes were running OSF/1 version R1.2. The code was compiled with Intel's f77 compiler version R4.5 with level 3 optimization.

The Cray T3D at the Jet Propulsion Laboratory had 256 processors, each running UNICOS-MAX version 1.1.0.2. Each processor had 64MB of RAM. The code was compiled with Cray’s cf77 compiler (cf77_M) version 6.1.

At the Maui High Performance Computing Center, the IBM SP2 had 320 ‘thin’ nodes and 80 ‘wide’ nodes, each running AIX version 3.2. All ‘wide’ nodes were used in the present study. Each processor had at least 64MB of RAM. The IBM SP2 at NASA Langley Research Center had 40 ‘wide’ nodes, each running AIX version 3.2. Each processor had at least 128MB of RAM. The code was compiled with IBM’s XL Fortran compiler (mpxlf) with level 3 optimization.

References

[1] O.O. Storaasli and E.A. Carmona, Guest eds., High performance computing for flight vehicles (Special Issue) Comput. Syst. Engrg. 2(2/3) (1992).

[2] A.K. Noor and S.L. Venneri, Guest eds., High performance computing for flight vehicles (Special Issue) Comput. Syst. Engrg. 3(1-4) (1992).

[3] O.O. Storaasli, J.M. Housner and D.T. Nguyen, Guest eds., Parallel computational methods for large-scale structural analysis and design (Special Issue) Comput. Syst. Engrg. 4(4-6) (1993).

[4] O.O. Storaasli, D.T. Nguyen, M. Baddourah and J. Qin, Computational mechanics analysis tools for parallel-vector supercomputers, Proc. of the 34th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, La Jolla, CA (April 15-22, 1993) 772-778.

[5] B.C. Watson and A.K. Noor, Postbuckling and large-deflection nonlinear analyses on distributed-memory computers, Comput. Syst. Engrg. 5(4-6) (1994) 389-405.

[6] K.H. Law and D.R. Mackay, A parallel row-oriented sparse solution method for finite element structural analysis, Int. J. Numer. Methods Engrg. 36 (1993) 2896-2919.

[7] J. Qin and D.T. Nguyen, A new parallel-vector finite element analysis software on distributed memory computers, Proc. of the 34th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, La Jolla, CA (April 15-22, 1993) 98-102.

[8] M.T. Heath and P. Raghavan, A distributed solution of sparse linear systems, University of Illinois Report UIUCDCS-R-93-1793, Feb. 1993.

[9] K.P. Wang and J.C. Bruch, A highly efficient iterative parallel computational method for finite element systems, Engrg. Comput. 10 (1993) 196-204.

[10] A. Gupta and V. Kumar, A scalable parallel algorithm for sparse matrix factorization, Technical Report 94-19, Dept. of Computer Science, University of Minnesota, April 1994.

[11] G. Karypis and V. Kumar, A high performance sparse Cholesky factorization algorithm for scalable parallel computers, Technical Report 94-41, Dept. of Computer Science, University of Minnesota, 1994.

[12] C.T. Vaughan, Structural analysis on massively parallel computers, Comput. Syst. Engrg. 2(2/3) (1991) 261-267.

[13] E.S. Sikiotis and V.E. Saouma, Parallel structural optimization on a network of computer workstations, Comput. Struct. 29(1) (1988) 141-150.

[14] M.E.M. El-Sayed and C.K. Hsiung, Design optimization with parallel sensitivity analysis on the CRAY X-MP, Struct. Optimiz. 3 (1991) 247-251.

[15] M.A. Baddourah and D.T. Nguyen, Geometrically nonlinear design sensitivity analysis on parallel-vector high-performance computers, Proc. of the 34th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, La Jolla, CA (April 15-22, 1993) 1901-1905.

[16] A.K. Noor and C.M. Andersen, Mixed models and reduced/selective integration displacement models for nonlinear shell analysis, Int. J. Numer. Methods Engrg. 18 (1982) 1429-1454.

[17] A.K. Noor, Recent advances in the sensitivity analysis for thermomechanical postbuckling of composite panels, ASCE J. Engrg. Mech. 122(4) (1996) in press.

[18] A.K. Noor and J.M. Peters, Multiple-parameter reduced basis technique for bifurcation and post-buckling analyses of composite plates, Int. J. Numer. Methods Engrg. 19 (1983) 1783-1803.

[19] A.K. Noor, Recent advances and applications of reduction methods, Appl. Mech. Rev. 47(5) (1994) 125-145.

[20] M.A. Baddourah, O.O. Storaasli and E.A. Carmona, A parallel algorithm for generation and assembly of finite element stiffness and mass matrices, Proc. of the 32nd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Baltimore, MD (1991) 1547-1553.
