msc software: partner showcase - nvidia gpu computing ... · msc nastran is the world’s most...

The power wall (the rise in power consumption and heat dissipation that comes with higher processor speeds) has introduced radical changes in computer architectures. Increasing core counts, and hence increasing parallelism, have replaced increasing clock speeds as the primary way of delivering greater hardware performance. A modern GPU (Graphics Processing Unit) consists of hundreds of simple processing cores; this degree of parallelism on a single processor is typically referred to as ‘many-core’, as opposed to ‘multi-core’, which refers to processors with at most a few dozen cores.

Many-core GPUs often demand a high degree of fine-grained parallelism: the application program should create many threads so that while some threads are waiting for data to return from memory, other threads can be executing. Because GPUs are specialized for inherently parallel problems, this gives them a different approach to hiding memory latency.

With the ever-increasing demand for more computing performance, the HPC industry is moving towards a hybrid computing model, where GPUs and CPUs work together to perform general purpose computing tasks. In this hybrid computing model, the GPU serves as a co-processor to the CPU. Co-processing refers to the use of an accelerator, such as a GPU, to offload work from the CPU and to increase computational efficiency. To exploit this hybrid computing model and the massively parallel GPU architecture, application software will need to be redesigned. MSC Software and NVIDIA engineers have been working together over the last year on the use of GPUs to accelerate the sparse direct solver in MSC Nastran.

PARTNER SHOWCASE

Key Highlights:

Industry: High-Performance Computing

Challenge: Increase computing performance by developing a hybrid computing model

MSC Software Solutions: MSC Nastran 2012 to support GPU computing capability, including multiple GPU computing capability for DMP runs

Benefits:

• Vastly reduce use of pinned host memory

• Handle arbitrarily large fronts, for very large models

NVIDIA is an MSC Software Performance partner with Quadro® and Professional Solution product lines that provide excellent performance for Patran and MSC Nastran on Windows® and Linux® systems.

MSC Software: Partner Showcase - NVIDIA

GPU Computing Accelerates Simulation Performance for MSC Nastran Users


Solver Acceleration in MSC Nastran 2012: A sparse direct solver is possibly the most important component in a finite element structural analysis program. Typically, such a solver implements a multi-frontal algorithm with out-of-core capability for solving extremely large problems and BLAS level 3 kernels for the highest compute efficiency. Parallelism at the elimination tree level and at the compute kernel level, combined with dynamic scheduling, is used to ensure the best scalability. The BLAS level 3 compute kernels in a sparse direct solver are the prime candidates for GPU computing due to their high floating-point density and favorable compute-to-communication ratio.

The proprietary symmetric MSCLDL and asymmetric MSCLU sparse direct solvers in MSC Nastran employ a super-element analysis concept instead of dynamic tree level parallelism. In this super-element analysis, the structure/matrix is first decomposed into large sub-structures/sub-domains according to user input and load balance heuristics. The out-of-core multi-frontal algorithm is then used to compute the boundary stiffness, or the Schur complement, followed by the transformation of the load vector, or the right-hand side, to the boundary. The global solution is found after the boundary stiffness matrices are assembled into the residual structure and the residual structure is factorized and solved. The GPU is a natural fit for each sub-structure boundary stiffness/Schur complement calculation.
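In standard substructuring notation, the boundary stiffness and the transformed load vector can be written compactly. The symbols below are illustrative assumptions rather than MSC Nastran notation: the interior unknowns of a sub-structure are labeled i and its boundary unknowns b.

```latex
\[
\begin{bmatrix} K_{ii} & K_{ib} \\ K_{bi} & K_{bb} \end{bmatrix}
\begin{bmatrix} u_i \\ u_b \end{bmatrix}
=
\begin{bmatrix} f_i \\ f_b \end{bmatrix},
\qquad
S = K_{bb} - K_{bi} K_{ii}^{-1} K_{ib},
\qquad
g = f_b - K_{bi} K_{ii}^{-1} f_i .
\]
```

Here S is the Schur complement (boundary stiffness) and g is the load vector transformed to the boundary. Solving S u_b = g on the assembled boundary, and then recovering u_i from the first block row, gives the global solution.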

Today’s GPUs can provide memory bandwidth and floating-point performance that are several times higher than those of the latest CPUs. In MSC Nastran, the most time-consuming part is the BLAS level 3 operations in the multi-frontal factorization process. To date, only the trailing matrix updates of the front factorization are implemented as CUDA kernels, and these update kernels are the subject of collaborative work between NVIDIA and MSC engineers.
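As a rough illustration of what a trailing matrix update computes, the sketch below applies a symmetric rank-k update in plain Python. It assumes an LDL^T-style factorization in which the trailing block is reduced by L D L^T. The function name `trailing_update`, the argument layout, and the tiny matrices are hypothetical; the actual MSC Nastran update kernels are CUDA code whose internals are not described here.

```python
def trailing_update(trailing, panel, diag):
    """Apply trailing -= panel * diag(diag) * panel^T in place.

    trailing: n x n list of lists (the not-yet-eliminated part of the front)
    panel:    n x k list of lists (factor columns below the pivot block)
    diag:     length-k list (pivot values of an assumed LDL^T factorization)
    """
    n, k = len(trailing), len(diag)
    for i in range(n):
        for j in range(n):
            # Sum over the k eliminated columns: this is the BLAS level 3 work.
            trailing[i][j] -= sum(
                panel[i][p] * diag[p] * panel[j][p] for p in range(k)
            )
    return trailing


# Hypothetical 2 x 2 trailing block updated by one eliminated column (k = 1).
front = [[4.0, 2.0], [2.0, 3.0]]
updated = trailing_update(front, panel=[[1.0], [0.5]], diag=[2.0])
print(updated)
```

The work in this update grows with n squared times k, which is why large fronts carry most of the factorization cost and are the natural ones to offload.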

GPU Computing Implementation and Target Analysis (Solution Sequences): NVIDIA’s CUDA parallel programming architecture is used to implement the update kernels. CUDA is the hardware and software architecture that enables NVIDIA GPUs to execute programs written with C, C++, FORTRAN, OpenCL, and other languages.

Vastly reduced use of pinned host memory and the ability to handle arbitrarily large fronts, for very large models (greater than 15M DOF) on a single Tesla C2050 GPU, are some strengths of the GPU implementation in MSC Nastran 2012. ‘Staging’ is the term used to describe how very large fronts are handled. If the trailing submatrix is too large to fit in the GPU device memory, it is broken up into approximately equal-sized ‘stages’ and the stages are completed in order. Multiple streams are used within a stage. For example, an arbitrarily large submatrix of, say, 40GB would be solved in, say, 10 stages of 4GB each. The actual sizes of the stages can be varied for performance tuning.
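The staging arithmetic described above can be sketched as follows. This is a minimal illustration, not the MSC Nastran implementation: the function name `plan_stages` and the idea of a fixed per-stage byte budget are assumptions, and the real solver presumably splits by matrix blocks rather than by raw bytes.

```python
def plan_stages(submatrix_bytes, stage_budget_bytes):
    """Split a trailing submatrix into approximately equal-sized stages.

    Returns a list of stage sizes in bytes, each no larger than the budget.
    """
    if submatrix_bytes <= 0 or stage_budget_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: the fewest stages that respect the budget.
    n_stages = (submatrix_bytes + stage_budget_bytes - 1) // stage_budget_bytes
    base, extra = divmod(submatrix_bytes, n_stages)
    # The first `extra` stages take one more byte so the sizes sum exactly.
    return [base + 1 if i < extra else base for i in range(n_stages)]


GB = 1024 ** 3
stages = plan_stages(40 * GB, 4 * GB)  # the 40GB example from the text
print(len(stages), stages[0] // GB)
```

Lowering the hypothetical budget produces more, smaller stages, which corresponds to the tuning knob mentioned in the text.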

In addition, the MSC Nastran implementation supports multiple GPU computing capability for DMP (Distributed Memory Parallel) runs. In such cases of DMP>1, multiple fronts are factorized concurrently on multiple GPUs. The matrix is decomposed into two domains, and each domain is computed by an MPI process.

A typical MSC Nastran job submission command with multiple GPUs is shown below:

nastran2012 jid=myinput mem=48gb buffsize=65537 dmp=2 gpuid=0:1 gputhresh=12000 sys205=192 sys151=1 mode=i8 sdir=/local/skodiyal/tmp bat=no scr=yes

gpuid is the ID of a licensed GPU device to be used in the analysis. Multiple IDs may be assigned to MSC Nastran DMP runs. gputhresh is the minimum threshold for GPU computing in the multi-frontal sparse factorization. If the product of the rank size and the front size of a front is smaller than this value, the rank update of the front is processed on the CPU. Otherwise, the GPU device is used for the rank update of the front.
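The two keywords can be illustrated with a small sketch. The helper names `parse_gpuid` and `rank_update_device` are hypothetical, as are the example rank and front sizes; only the colon-separated gpuid form and the product-versus-threshold rule come from the text above.

```python
def parse_gpuid(value):
    """Parse a colon-separated gpuid value such as '0:1' into device IDs."""
    return [int(part) for part in value.split(":")]


def rank_update_device(rank_size, front_size, gputhresh):
    """Choose where a front's rank update runs.

    The update stays on the CPU when rank_size * front_size is smaller
    than gputhresh; otherwise it goes to the GPU.
    """
    return "cpu" if rank_size * front_size < gputhresh else "gpu"


gpu_ids = parse_gpuid("0:1")                      # from gpuid=0:1
small = rank_update_device(100, 100, 12000)       # 10,000 is below 12,000
large = rank_update_device(200, 500, 12000)       # 100,000 is above 12,000
print(gpu_ids, small, large)
```

A threshold of this kind keeps small fronts on the CPU, where the cost of moving data to the device would likely outweigh the faster arithmetic.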

The GPUs supported with this implementation are the NVIDIA Tesla 20-series (shown in Figure 1) and Quadro GPUs based on the Fermi architecture (compute capability 2.0). Linux and Windows 64-bit platforms are supported.

Any ‘fat’ BLAS3 code path is a potential candidate for GPU computing. The sparse direct solver intensive solution sequences SOL101 (linear statics), SOL108 (direct frequency) and SOL400 (nonlinear) fall into this category.

Figure 1: NVIDIA Tesla 20-series GPUs (workstation & server form factors)


SOL108 would need a complex sparse direct solver, which the MSC Nastran 2012 implementation does not support; however, this feature is currently under development and testing for an upcoming point release. Likewise, conventional SOL111 (modal frequency) with large MPYAD (multiply-add) operations should also benefit from GPU computing in a later release.

Performance analysis with GPU Computing: Linear and nonlinear structural stress analysis are the target applications with this first implementation of GPU computing in MSC Nastran 2012. Structural finite element models dominated by solid elements provide for more concentrated computational work in the sparse matrix factorization, which is highly desirable for the GPU. A range of models with varying fidelity, from around 1M degrees of freedom (DOF) to 15M DOF, is considered (Figure 2). Performance is compared against a serial Nastran run, which is still widely used within the customer community, as well as against runs on multi-core (2x quad-core Nehalem) CPUs.

The hardware configurations used with these benchmark runs consisted of:

(1) AMAX server, Linux, 2x hex-core Westmere, 2.67GHz, 32GB memory, 2x Tesla C2050 GPU for the 945K and 1.3M DOF models.

(2) Super Micro server, Linux, 2x quad-core Nehalem 2.27GHz, 96GB memory, 2.2 TB SATA 5-way striped RAID and 2x Tesla C2050 GPU for all other models.

Figure 3 shows the end-to-end (total) speed-up for single and multiple GPU runs. In general, based on the benchmark models, we see speed-ups in the range of 4-6X with a single GPU over a serial run and in the range of 1.4-2X with 2 GPUs over an 8-core DMP run.

Summary: GPU computing is implemented in MSC Nastran 2012 to significantly lower the simulation times for industry standard analysis models. Vastly reduced use of pinned memory and the ability to handle arbitrarily large front sizes for very large models are some of the strengths of this implementation. Further, multiple GPUs can be used with Nastran DMP analysis. The performance speed-ups enabled by GPU computing will allow MSC Nastran users to add more realism to their models, thus improving the quality of the simulations. A rapid CAE simulation capability from GPUs has the potential to transform current practices in engineering analysis and design optimization procedures.

This initial GPU computing implementation also identified certain issues – for one, the larger the model, the higher the DMP overhead in MSC Nastran. This increased CPU side overhead reduces the overall speed-up resulting from GPU computing. Future releases of MSC Nastran will address such issues as well as expand the GPU computing capability to include complex solver kernels for the NVH and dynamics markets.

Figure 2: Automotive crank shaft (945K DOF) and engine (15.2M DOF) models

Figure 3: Performance speed-ups with Single and Multiple GPUs using MSC Nastran 2012 models

Europe, Middle East, Africa
MSC Software GmbH
Am Moosfeld 13
81829 Munich, Germany
Telephone 49.89.431.98.70

Asia-Pacific
MSC Software Japan LTD.
Shinjuku First West 8F
23-7 Nishi Shinjuku 1-Chome, Shinjuku-Ku
Tokyo, Japan 160-0023
Telephone 81.3.6911.1200

Asia-Pacific
MSC Software (S) Pte. Ltd.
100 Beach Road
#16-05 Shaw Tower
Singapore 189702
Telephone 65.6272.0082

Corporate
MSC Software Corporation
2 MacArthur Place
Santa Ana, California 92707
Telephone 714.540.8900
www.mscsoftware.com

The MSC Software corporate logo, MSC, and the names of the MSC Software products and services referenced herein are trademarks or registered trademarks of the MSC Software Corporation in the United States and/or other countries. All other trademarks belong to their respective owners. © 2012 MSC Software Corporation. All rights reserved.

NVIDIA*2012MAY*PS

About MSC Nastran

MSC Nastran Structural & Multidiscipline FEA

MSC Nastran is the world’s most widely used Finite Element Analysis (FEA) solver that helped MSC Software become recognized in 2011 as one of the “10 Original Software Companies”. When it comes to solving for stress/strain behavior, dynamic and vibration response and thermal gradients in real-world systems, MSC Nastran is recognized as the most trusted multidiscipline solver in the world.

MSC Nastran is built on work done by NASA scientists and researchers, and is trusted for the design of mission critical systems in every industry. Nearly every spacecraft, aircraft, and vehicle designed in the last 40 years has been analyzed using MSC Nastran.

In recent years, several extensions to its capabilities have resulted in a single multidisciplinary solver providing users with a trusted solution to simulate everything from a single component to complex assemblies under diverse conditions.

MSC Nastran offers a complete set of linear static and dynamic analysis capabilities along with unparalleled support for superelements enabling users to solve large, complex assemblies more efficiently. MSC Nastran also offers a complete set of implicit and explicit nonlinear analysis capabilities, thermal and interior/exterior acoustics, and coupling between various disciplines such as thermal, structural, and fluid interaction. New modular packaging that enables you to get only what you need makes it more affordable to own MSC Nastran than ever before.

Please visit www.mscsoftware.com for more partner showcases

About MSC Software

MSC Software is one of the ten original software companies and the worldwide leader in multidiscipline simulation. As a trusted partner, MSC Software helps companies improve quality, save time and reduce costs associated with design and test of manufactured products. Academic institutions, researchers, and students employ MSC technology to expand individual knowledge as well as expand the horizon of simulation. MSC Software employs 1,000 professionals in 20 countries. For additional information about MSC Software’s products and services, please visit www.mscsoftware.com.
