performance study: fluent 12 and panfs · 2016. 1. 28. · since 2006, ansys and panasas have...
Performance Study:
FLUENT 12 and PanFS
Stan Posey, Industry and Applications Market Development, Panasas, Fremont, CA, USA
Bill Loewe, Technical Staff Member, Applications Engineering, Panasas, Fremont, CA, USA
Slide 2 Panasas, Inc.
Panasas Company Overview
Founded: 1999 by Prof. Garth Gibson, co-inventor of RAID
Technology: Parallel file system and parallel storage appliance
Locations: US: HQ in Fremont, CA, USA; R&D centers in Pittsburgh & Minneapolis
EMEA: UK, DE, FR, IT, ES, BE, Russia
APAC: China, Japan, Korea, India, Australia
Customers: FCS October 2003, deployed at 200+ customers
Market Focus: Energy, Academia, Government, Life Sciences, Manufacturing, Finance
Alliances: ISVs, Resellers
Primary Investors
Slide 3 Panasas, Inc.
Background on Parallel FLUENT Study
Motivation
- Since 2006, ANSYS and Panasas have jointly invested in the development of parallel I/O for release in FLUENT 12
- This study demonstrates the benefits of the Panasas parallel file system and parallel storage for FLUENT 12 vs. FLUENT 6.3, with tests for both capability and capacity computing
- Collaborators include ANSYS and the University of Cambridge
Considerations
- FLUENT is an application from ANSYS -- not a benchmark kernel
- The CFD models and tests are relevant to customer practice
- The study was run on a production system at the customer site, U of Cambridge
- The results were validated by U of Cambridge and ANSYS
Slide 4 Panasas, Inc.
Description of Darwin at U of Cambridge
University of Cambridge HPC Service, Darwin Supercomputer
Darwin Supercomputer Computational Units
- Nine repeating units, each consisting of 64 nodes (2 racks) and providing 256 cores; 2340 cores total
- All nodes within a CU are connected to full-bisection-bandwidth InfiniBand: 900 MB/s, MPI latency of 2 µs
Source: http://www.hpc.cam.ac.uk
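The core counts quoted for Darwin can be cross-checked with a few lines of arithmetic. This is a minimal sketch: the figure of 4 cores per node is inferred from 256 cores spread over 64 nodes, and all variable names are illustrative.

```python
# Cross-check of the Darwin core counts quoted on this slide.
NUM_UNITS = 9              # computational units (CUs)
NODES_PER_UNIT = 64        # nodes per CU (2 racks)
CORES_PER_UNIT = 256       # cores per CU
TOTAL_CORES_QUOTED = 2340  # total quoted on the slide

# Inferred, not stated on the slide: cores per node.
cores_per_node = CORES_PER_UNIT // NODES_PER_UNIT

# Cores accounted for by the nine repeating units alone.
cores_from_units = NUM_UNITS * CORES_PER_UNIT

# Node count implied by the quoted total at that cores-per-node figure.
nodes_implied = TOTAL_CORES_QUOTED // cores_per_node

print(f"cores per node: {cores_per_node}")
print(f"cores in 9 CUs: {cores_from_units}")
print(f"nodes implied by {TOTAL_CORES_QUOTED} cores: {nodes_implied}")
print(f"cores beyond the 9 CUs: {TOTAL_CORES_QUOTED - cores_from_units}")
```

Nine units of 256 cores account for 2304 cores. The quoted total of 2340 corresponds to 585 nodes at 4 cores each, which is the node count given in the cluster description elsewhere in the deck.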
Slide 5 Panasas, Inc.
Details of the FLUENT 111M Cell Model
Unsteady external aero for 111 MM cell truck; 5 time steps with 100 iterations, and a single .dat file write
Number of cells: 111,091,452
Solver: PBNS, DES, Unsteady
Iterations: 5 time steps, 100 total iters; data save after last iteration
Output size:
- FLUENT v6.3 (serial I/O; size of .dat file): 14,808 MB
- FLUENT v12 (serial I/O; size of .dat file): 16,145 MB
- FLUENT v12 (parallel I/O; size of .pdat file): 19,683 MB
Univ of Cambridge DARWIN Cluster
- Location: University of Cambridge http://www.hpc.cam.ac.uk
- Vendor: Dell; 585 nodes; 2340 cores; 8 GB per node; 4.6 TB total mem
- CPU: Intel Xeon (Woodcrest) DC, 3.0 GHz / 4MB L2 cache
- Interconnect: InfiniPath QLE7140 SDR HCAs; Silverstorm 9080 and 9240 switches
- File System: Panasas PanFS, 4 shelves, 20 TB capacity
- Operating System: Scientific Linux CERN SLC release 4.6
[Diagram: DARWIN (585 nodes; 2340 cores) attached to Panasas (4 shelves, 20 TB); Truck, 111M Cells]
Slide 6 Panasas, Inc.
This Study is a Partial CFD Simulation
Unsteady CFD Simulation Schematic and Typical I/O Profile
[Schematic: input (mesh, conditions) read at start; time-history writes at time steps 5, 10, 15, 20, 25, ... 500; results (pressures, ...) written at completion, iter ~10,000]
The focus of the FLUENT study is only a subset of a full unsteady CFD simulation:
- Read once
- Compute 5 time steps (100 iters)
- Write once (but full simulation has multiple writes)
Slide 7 Panasas, Inc.
FLUENT 12 and New Parallel I/O Scheme
Panasas and ANSYS Alliance Has Produced Parallel I/O for FLUENT 12
FLUENT 12: Offers support for PanFS and most other parallel file systems
Beta available today
[Figure: Serial I/O Scheme vs. Parallel I/O Scheme, labeled "Improvement"]
Source: Barb Hutchings Presentation at SC07, Nov 2007, Reno, NV
Slide 8 Panasas, Inc.
Parallel FLUENT 12 and 2x Improvement
FLUENT Comparison of PanFS vs. NFS on University of Cambridge Cluster
Truck Aero, 111M Cells
[Chart: Time (Seconds) of Solver + Data File Write vs. Number of Cores; lower is better. Series: PanFS -- Solver Only, PanFS -- FLUENT 12, NFS -- Solver Only, NFS -- FLUENT 6.3]
Number of Cores       64      128
PanFS -- FLUENT 12    2541    1318
NFS -- FLUENT 6.3     4568    2680
Chart annotations: 1.9x, 1.8x, 2.0x
NOTE: Solver times same for PanFS and NFS
NOTE: Solver scales linearly, as expected
NOTE: Read times are not included in these results
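The speedup labels on this chart can be reproduced from the four bar values. The sketch below assumes the values pair with series and core counts as 2541 s and 1318 s for PanFS with FLUENT 12 at 64 and 128 cores, and 4568 s and 2680 s for NFS with FLUENT 6.3. That pairing is inferred from the chart layout, and the helper name `speedup` is illustrative.

```python
# Solver + data-file-write times in seconds (assumed pairing, see lead-in).
times = {
    "PanFS -- FLUENT 12": {64: 2541, 128: 1318},
    "NFS -- FLUENT 6.3": {64: 4568, 128: 2680},
}


def speedup(slow: float, fast: float) -> float:
    """Ratio of elapsed times; above 1.0 means `fast` finished sooner."""
    return slow / fast


panfs = times["PanFS -- FLUENT 12"]
nfs = times["NFS -- FLUENT 6.3"]

# PanFS/FLUENT 12 vs. NFS/FLUENT 6.3 at the same core count.
for cores in (64, 128):
    ratio = speedup(nfs[cores], panfs[cores])
    print(f"{cores} cores: {ratio:.1f}x faster on PanFS with FLUENT 12")

# Gain from doubling the core count on PanFS.
scaling = speedup(panfs[64], panfs[128])
print(f"PanFS, 64 -> 128 cores: {scaling:.1f}x")
```

Under that pairing the three ratios round to 1.8x, 2.0x and 1.9x, the same three labels that appear on the chart.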
Slide 9 Panasas, Inc.
Scalability of Solver + Data File Write
FLUENT Comparison of PanFS vs. NFS on University of Cambridge Cluster
Truck Aero, 111M Cells
[Chart: Time (Seconds) of Solver + Data File Write vs. Number of Cores; lower is better]
Number of Cores       64      128     256
PanFS -- FLUENT 12    2541    1318    779
NFS -- FLUENT 6.3     4568    2680    1790
Chart annotations: 1.7x, 1.5x, 1.9x, 1.7x; 1.8x, 2.0x, 2.3x
NOTE: Read times are not included in these results
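One way to read the scalability numbers is as the speedup gained at each doubling of the core count, plus the parallel efficiency relative to the 64-core run. The sketch below assumes the six times pair with the series as PanFS with FLUENT 12 = 2541, 1318, 779 s and NFS with FLUENT 6.3 = 4568, 2680, 1790 s at 64, 128 and 256 cores. The function names are illustrative, and the efficiency metric is an addition that does not appear on the slide.

```python
# Solver + data-file-write times in seconds at 64, 128 and 256 cores
# (assumed pairing of chart values to series, see lead-in).
CORES = [64, 128, 256]
TIMES = {
    "PanFS -- FLUENT 12": [2541, 1318, 779],
    "NFS -- FLUENT 6.3": [4568, 2680, 1790],
}


def doubling_speedups(times):
    """Speedup gained at each doubling of the core count."""
    return [times[i] / times[i + 1] for i in range(len(times) - 1)]


def efficiency_vs_baseline(cores, times):
    """Parallel efficiency relative to the smallest run (1.0 = linear)."""
    base_cores, base_time = cores[0], times[0]
    return [(base_time / t) / (c / base_cores) for c, t in zip(cores, times)]


for label, series in TIMES.items():
    steps = ", ".join(f"{s:.1f}x" for s in doubling_speedups(series))
    effs = ", ".join(f"{e:.0%}" for e in efficiency_vs_baseline(CORES, series))
    print(f"{label}: doubling speedups {steps}; efficiency {effs}")
```

Under these assumptions the doubling speedups round to 1.9x and 1.7x for PanFS and 1.7x and 1.5x for NFS, which are the four scaling labels on the chart. Relative to the 64-core run, efficiency at 256 cores comes out near 82% for PanFS with FLUENT 12 and near 64% for NFS with FLUENT 6.3.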
Slide 10 Panasas, Inc.
Performance of Data File Write in MB/s
FLUENT Comparison of PanFS vs. NFS on University of Cambridge Cluster
Truck Aero, 111M Cells
[Chart: Effective Rates of I/O (MB/s) for Data File Write vs. Number of Cores; higher is better. Series: PanFS -- FLUENT 12, NFS -- FLUENT 6.3]
Number of Cores       64     128    256
PanFS -- FLUENT 12    273    345    334
Chart annotations: 39x, 31x, 20x
NOTE: Data File Write Only
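The effective rates can be turned back into approximate write times. This is a rough sketch that assumes the effective rate is simply file size divided by elapsed write time, that the file written is the 19,683 MB FLUENT 12 .pdat file listed in the model details, and that the PanFS rates pair with core counts as 273, 345 and 334 MB/s at 64, 128 and 256 cores. The function name is illustrative.

```python
# Size of the FLUENT 12 parallel-I/O .pdat file for the 111M cell model.
PDAT_SIZE_MB = 19_683

# Effective PanFS write rates in MB/s by core count (assumed pairing).
PANFS_RATE_MB_S = {64: 273, 128: 345, 256: 334}


def write_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Elapsed write time implied by a file size and an effective rate."""
    return size_mb / rate_mb_s


for cores, rate in PANFS_RATE_MB_S.items():
    seconds = write_seconds(PDAT_SIZE_MB, rate)
    print(f"{cores} cores: {rate} MB/s -> about {seconds:.0f} s per data write")
```

Under these assumptions each data-file write on PanFS takes on the order of one minute, against solver-plus-write totals of 779 to 2541 seconds.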
Slide 11 Panasas, Inc.
Observations From FLUENT 12 Study
FLUENT 6.3 End-User Challenges
- Large production cases may not scale effectively and efficiently on a large cluster (>64 cores) for read and write operations, owing to serial I/O
- The use of frequent checkpoints for very large steady-state cases, and/or large unsteady simulations (multiple writes), is impractical with serial I/O
FLUENT 12 and Panasas Solution
- The Panasas parallel file system and storage, combined with the parallel I/O of FLUENT 12, scale I/O and therefore the overall FLUENT simulation
- Use of PanFS for the 111M cell case at 64-way provides a nearly 2x increase in FLUENT utilization for the same software license $'s spent
- Such capability enables FLUENT users to develop more advanced CFD models (more transient vs. steady, LES, etc.) with confidence in scalability
Slide 12 Panasas, Inc.
Two Measures of FLUENT Performance
A single wide-parallel job vs. multiples of less-parallel jobs
- Often referenced in the HPC industry as capability vs. capacity computing
- Both are important, but capacity computing is more common in practice
- Example: design optimization based on capacity, impractical with capability
Panasas scales I/O for the large single CFD job, and provides parallel data access (vs. serial NFS) for multi-job scenarios
A multi-job test was developed with the Truck model at 14M cells:
- The same Truck model with the mesh coarsened from 111M to 14M cells
- Launched 8 times (8 copies), each using 16 cores, for a total of 128 cores
- PanFS is parallel; NFS has a single data path during the 8 solution writes
[Figure: Single Large Job vs. Multiple Job Instances; compute nodes, PanFS and storage; Truck Aero, 14M Cells]
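The sizing of the multi-job test can be compared with the single large job using the cell counts given in the two model-detail slides (111,091,452 and 13,839,118 cells). The sketch below is simple arithmetic; the variable names are illustrative and the per-core comparison is an observation added here, not a claim made in the deck.

```python
# Cell counts from the model-detail slides.
CELLS_111M = 111_091_452
CELLS_14M = 13_839_118

# Multi-job (capacity) test layout: 8 copies, 16 cores each.
JOBS = 8
CORES_PER_JOB = 16
total_cores = JOBS * CORES_PER_JOB

# Work per core in each scenario at the same total core count.
cells_per_core_multi = CELLS_14M / CORES_PER_JOB
cells_per_core_single = CELLS_111M / total_cores

# Total cells across all 8 copies of the coarsened model.
total_cells_multi = JOBS * CELLS_14M

print(f"total cores: {total_cores}")
print(f"cells per core, 8 x 16-way 14M jobs: {cells_per_core_multi:,.0f}")
print(f"cells per core, one 128-way 111M job: {cells_per_core_single:,.0f}")
print(f"cells across all 8 copies: {total_cells_multi:,}")
```

On these numbers the two scenarios carry nearly the same load per core (roughly 865,000 vs. 868,000 cells) and nearly the same total cell count at 128 cores.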
Slide 13 Panasas, Inc.
Details of the FLUENT 14M Cell Model
Unsteady external aero for 14 MM cell truck; 5 time steps with 100 iterations, and a single .dat file write
Number of cells: 13,839,118
Solver: PBNS, DES, Unsteady
Iterations: 5 time steps, 100 total iters; data save after last iteration
Output size:
- FLUENT v6.3 (serial I/O; size of .dat file): 2,466 MB
- FLUENT v12 (serial I/O; size of .dat file): 2,467 MB
- FLUENT v12 (parallel I/O; size of .pdat file): 2,898 MB
Univ of Cambridge DARWIN Cluster
- Location: University of Cambridge http://www.hpc.cam.ac.uk
- Vendor: Dell; 585 nodes; 2340 cores; 8 GB per node; 4.6 TB total mem
- CPU: Intel Xeon (Woodcrest) DC, 3.0 GHz / 4MB L2 cache
- Interconnect: InfiniPath QLE7140 SDR HCAs; Silverstorm 9080 and 9240 switches
- File System: Panasas PanFS, 4 shelves, 20 TB capacity
- Operating System: Scientific Linux CERN SLC release 4.6
[Diagram: DARWIN (585 nodes; 2340 cores) attached to Panasas (4 shelves, 20 TB); Truck, 14M Cells]
Slide 14 Panasas, Inc.
Panasas Benefit with Multi-Job FLUENT
FLUENT Comparison of PanFS vs. NFS on University of Cambridge Cluster
Truck Aero, 14M Cells
[Chart: Data File Write Times in seconds; lower is better]
                      Single-Job 16-way    Multi-Job 8 x 16-way (average)
PanFS -- FLUENT 12    30                   36
NFS -- FLUENT 6.3     168                  279
Chart annotations: 5x, 8x, 66%
NOTE: NFS degrades 66% on average for the multi-job scenario test
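The 66% degradation figure can be checked against the write times on this chart. The sketch below assumes the values pair up as 30 s (PanFS) and 168 s (NFS) for the single 16-way job, and averages of 36 s (PanFS) and 279 s (NFS) for the 8 x 16-way multi-job runs. The pairing is inferred from the chart labels, and the function name is illustrative.

```python
# Data-file write times in seconds (assumed pairing, see lead-in).
WRITE_SECONDS = {
    "PanFS -- FLUENT 12": {"single": 30, "multi_avg": 36},
    "NFS -- FLUENT 6.3": {"single": 168, "multi_avg": 279},
}


def degradation(single: float, multi_avg: float) -> float:
    """Fractional slowdown of the average multi-job write vs. a single job."""
    return (multi_avg - single) / single


for label, t in WRITE_SECONDS.items():
    slowdown = degradation(t["single"], t["multi_avg"])
    print(f"{label}: {t['single']} s -> {t['multi_avg']} s ({slowdown:.0%} slower)")

panfs = WRITE_SECONDS["PanFS -- FLUENT 12"]
nfs = WRITE_SECONDS["NFS -- FLUENT 6.3"]
single_ratio = nfs["single"] / panfs["single"]
multi_ratio = nfs["multi_avg"] / panfs["multi_avg"]
print(f"NFS vs. PanFS write time, single job: {single_ratio:.1f}x")
print(f"NFS vs. PanFS write time, multi-job average: {multi_ratio:.1f}x")
```

With this pairing NFS slows by 66% going from one job to eight concurrent jobs, while PanFS slows by 20%. The NFS-to-PanFS ratios compute to 5.6 and 7.8, which the chart labels as 5x and 8x.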
Slide 15 Panasas, Inc.
Performance of Data File Write in MB/s
FLUENT Comparison of PanFS vs. NFS on University of Cambridge Cluster
Truck Aero, 14M and 111M Cells
[Chart: Effective Rates of I/O (MB/s) for Data File Write vs. Number of Cores; series: PanFS -- FLUENT 12]
- 128 cores, 20 GB write to single file: 345 MB/s
- 128 cores, 23 GB write to 8 files: 644 MB/s aggregate
NOTE: Data File Write Only
NOTE: Comparison of capability vs. capacity I/O rates at 128 cores
Capacity aggregate performance nearly 2x faster vs. capability
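The 644 MB/s aggregate figure is consistent with numbers given elsewhere in the deck: eight 2,898 MB .pdat files written in an average of 36 seconds. The sketch below assumes that is how the aggregate was derived, which the slides do not state explicitly; variable names are illustrative.

```python
# Capacity (multi-job) scenario: 8 concurrent 14M-cell jobs on PanFS.
PDAT_14M_MB = 2_898     # FLUENT 12 .pdat size for the 14M cell model
JOBS = 8
AVG_WRITE_S = 36        # average multi-job write time on PanFS

# Capability (single-job) scenario: one 111M-cell job at 128 cores.
SINGLE_JOB_RATE_MB_S = 345

total_mb = JOBS * PDAT_14M_MB
aggregate_rate = total_mb / AVG_WRITE_S
ratio = aggregate_rate / SINGLE_JOB_RATE_MB_S

print(f"total written: {total_mb:,} MB (about {total_mb / 1000:.0f} GB)")
print(f"aggregate rate: {aggregate_rate:.0f} MB/s")
print(f"capacity vs. capability: {ratio:.2f}x")
```

The total comes to 23,184 MB, matching the "23 GB write to 8 files" label, and the ratio of about 1.87 is the "nearly 2x" stated on the slide.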
Slide 16 Panasas, Inc.
Contributors to the Study
University of Cambridge
Dr. Paul Calleja, Director, HPCS
Dr. Stuart Rankin, Lead System Manager, HPCS
ANSYS
Dr. Prasad Alavilli, FLUENT and CFX Development
Ms. Barbara Hutchings, Director of Technology Alliances
Panasas
Mr. Derek Burke, Director of Marketing, Panasas EMEA
Ms. Michelle Cheng, Director of Global Alliances
Slide 17 Panasas, Inc.
Stan Posey
Thank You for This Opportunity
RESOURCES:
Questions can be directed to the Panasas email addresses below
The 111M cell truck model is public and available from Ansys
http://www.fluent.com/software/fluent/fl6bench/fl6bench_6.3/problems/truck_111m.htm
FLUENT log files of all jobs are available upon request to Panasas
Bill Loewe