TRANSCRIPT
Lecture 27
November 26, 2003
AA220/CS238 - Parallel Methods in Numerical Analysis
Parallel Visualization in the ASCI Program
Overview
• Visualization of large-scale datasets generated with massively
parallel machines is a very compute-intensive task:
– Large datasets
– Usually time-dependent
– Complex solution features yield large I/O requirements
– Floating point operations needed to render the image
• Advancements are required in several areas
– Basic improvements in visualization algorithms
– Parallel implementation of visualization algorithms
– Parallel visualization hardware (scalable and cost-effective)
Overview - Cont’d
• Examples in this lecture drawn from:
– Stanford ASCI work in unsteady turbomachinery flow simulations
– University of Utah Scientific Computing and Imaging Institute
– Collaboration with MIT on parallel pV3
• A number of research groups are working on parallel
visualization techniques (both hardware and software):
– Stanford University
– U. of Utah
– DoE National Laboratories
– Etc…
Large-Scale Scientific Visualization
Scientific Computing and Imaging Institute
University of Utah
Chris Johnson
Interactive Large-Scale Visualization
• Medical
• Scientific Computing
• GeoScience
The Visualization Pipeline
[Diagram: visualization process: Pre-process → Generate → Render; isosurface pipeline: Pre-process (offline), then Search → Construct → Render (online) for a chosen isovalue]
• Dynamic extraction of isosurfaces
• Rapid extractions
Isosurface Extraction
• Marching Cubes
• Octree
• Extrema Graphs
• Sweeping Simplices
• The Span Space
• Livnat, Shen, Johnson
– NOISE: O(√n + k), where n is the number of cells and k the number of cells intersected by the isosurface (see the sketch below)
[Span-space diagram: each cell plotted by its (Minimum, Maximum) vertex values; a cell can intersect the chosen isovalue only if Minimum ≤ isovalue ≤ Maximum, i.e. it lies above the min = max diagonal]
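The span-space test itself is simple: treat each cell as the point (min, max) of its vertex values; the cell can intersect the isosurface only if min ≤ isovalue ≤ max. The brute-force sketch below (NumPy, with made-up array names) illustrates just that test, not the kd-tree/lattice search that gives NOISE its O(√n + k) bound.

```python
import numpy as np

def span_space_candidates(cell_values, isovalue):
    """Return indices of cells whose value span brackets the isovalue.

    cell_values: (n_cells, verts_per_cell) array of scalar values at the
    vertices of each cell.  In span space a cell is the point (min, max)
    of its vertex values; it can intersect the isosurface only if
    min <= isovalue <= max.
    """
    cmin = cell_values.min(axis=1)
    cmax = cell_values.max(axis=1)
    return np.nonzero((cmin <= isovalue) & (isovalue <= cmax))[0]

# Example: 1000 hexahedral cells with random vertex values.
rng = np.random.default_rng(0)
cells = rng.random((1000, 8))
active = span_space_candidates(cells, isovalue=0.5)
print(f"{active.size} of {cells.shape[0]} cells may intersect the isosurface")
```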
The Visualization Pipeline
[Diagram: Pre-process (offline), then Search → Construct → Render (online), driven by the isovalue and the view point; stage outputs shrink from O(k) to O(V(k)) when only the visible portion is processed]
• Reduce the amount of data
– Reduce during the search...
A View-dependent Approach
• Attractive for:
– Large datasets
– High depth complexity
– Remote visualization
A View-dependent Approach
• Three-step method (sketched below):
1) Traverse the data front to back
2) Project onto a virtual screen
3) Render triangles on graphics hardware
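A minimal sketch of the three steps, under strong simplifying assumptions: the "virtual screen" is a 1D array of coverage flags, each cell is reduced to a depth and a pixel-range footprint, and surface-containing cells are treated as opaque. The names and data layout are invented for illustration; this is not the actual Livnat-Hansen implementation.

```python
import numpy as np

def view_dependent_traverse(cell_depth, cell_footprint, cell_has_surface, n_pixels):
    """Sketch of front-to-back traversal with virtual-screen pruning.

    cell_depth:       (n,) distance of each cell from the viewpoint
    cell_footprint:   list of (lo, hi) pixel ranges each cell projects to
    cell_has_surface: (n,) bool, True if the cell contains isosurface geometry
    Returns indices of visible surface-containing cells, in draw order.
    """
    covered = np.zeros(n_pixels, dtype=bool)    # the "virtual screen"
    visible = []
    for i in np.argsort(cell_depth):            # 1) traverse front to back
        lo, hi = cell_footprint[i]              # 2) project onto the virtual screen
        if covered[lo:hi].all():
            continue                            #    fully occluded -> prune
        if cell_has_surface[i]:
            visible.append(i)                   # 3) its triangles go to the GPU
            covered[lo:hi] = True               #    and it occludes what lies behind
    return visible
```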
A View-dependent Approach
• Flow chart (software stages, then hardware):
– Value space: prune non-intersecting cells
– Object space: front-to-back traversal, prune non-visible cells (visibility part I)
– Image space: visibility part II
– Graphics engine (hardware): z-buffer, rendering, final image
Visible Woman
            Full View    View-dependent
Polys       2,246,000    246,000
Create      177 sec      72 sec
Render      2.32 sec     0.25 sec
Why Not Always Use Polygons?
• Marching cubes and similar algorithms can
generate millions of polygons for large data sets
– Reduce by decimation (e.g. Shekhar et al. '96)
– View-dependent approach (e.g. Livnat and Hansen '98)
Real-Time Ray Tracer
Real-Time Ray Tracer (RTRT)
• Implemented on SGI Origin 3000 ccNUMA
architecture - up to 512 processors (now
working on a distributed version)
• Approximately linear speedup
• Load balancing and memory coherence are
key to performance
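The load-balancing point is usually addressed with a dynamic work queue over image tiles; the sketch below shows only that scheduling idea (Python threads and a hypothetical render_tile function supplied by the caller), not the RTRT code itself.

```python
import queue
import threading

def render_frame(width, height, render_tile, tile=32, n_workers=8):
    """Dynamic load balancing: workers pull fixed-size image tiles from a
    shared queue, so expensive tiles (many isosurface hits) don't stall
    processors that finished cheap ones."""
    tiles = queue.Queue()
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            tiles.put((x, y, min(tile, width - x), min(tile, height - y)))

    def worker():
        while True:
            try:
                job = tiles.get_nowait()
            except queue.Empty:
                return
            render_tile(*job)          # trace all rays in this tile
            tiles.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In a real ray tracer the workers are native threads or processes pinned to processors; the essential point is that tiles are handed out on demand, so processors tracing cheap tiles keep pulling work while others finish expensive ones.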
Algorithm - 3 Phases
• Traversing a ray through cells that do not contain an isosurface
• Analytically computing the isosurface when the intersecting volume contains an isosurface
• Shading the resulting intersection point
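In the actual algorithm the second phase solves, per cell, a cubic in the ray parameter for the trilinearly interpolated field; the sketch below keeps the same three-phase structure but substitutes a generic field sampler and bisection for the analytic solve. The sample and grad functions are assumed to be supplied by the caller.

```python
import numpy as np

def trace_isosurface(ray_o, ray_d, sample, grad, isovalue, t_max, dt=0.5):
    """Sketch of the three phases along one ray.

    sample(p): scalar field value at point p; grad(p): its gradient.
    Phase 1: step through the volume, skipping regions that do not
             cross the isovalue.
    Phase 2: locate the crossing (bisection here, instead of the
             analytic per-cell cubic solve).
    Phase 3: shade the hit point using the gradient as the normal.
    """
    t, f_prev = 0.0, sample(ray_o) - isovalue
    while t < t_max:
        t_next = t + dt
        f_next = sample(ray_o + t_next * ray_d) - isovalue
        if f_prev * f_next < 0.0:                      # crossing in [t, t_next]
            lo, hi = t, t_next
            for _ in range(20):                        # bisection refinement
                mid = 0.5 * (lo + hi)
                if (sample(ray_o + mid * ray_d) - isovalue) * f_prev < 0.0:
                    hi = mid
                else:
                    lo = mid
            p = ray_o + 0.5 * (lo + hi) * ray_d
            n = grad(p)
            n = n / (np.linalg.norm(n) + 1e-12)        # surface normal from gradient
            light = np.array([0.0, 0.0, 1.0])
            return max(float(n @ light), 0.0)          # simple diffuse shading
        t, f_prev = t_next, f_next
    return 0.0                                         # ray missed the surface
```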
Real-Time Ray Tracer - Scalability
[Plot: frames per second (32 processors) vs. frame number (time)]
RTRT Time Varying Visualization
Real-time Volume Rendering
Volume Rendering
[Figure: tooth dataset material boundaries: enamel/background, dentin/background, dentin/enamel, dentin/pulp]
• 1D transfer function: not possible
• 2D transfer function: specificity not as good (see the sketch below)
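The usual reason a 1D transfer function fails on data like this is that distinct material boundaries can overlap in scalar value alone; indexing the transfer function by value and gradient magnitude (2D), and by a second-derivative measure as well (3D), separates them better. A minimal sketch of a 2D lookup is below; the table and value/gradient ranges are made up for the example.

```python
import numpy as np

def classify(volume, tf2d, v_range, g_range):
    """Look up RGBA for every voxel from a 2D transfer function.

    tf2d: (n_v, n_g, 4) table indexed by (scalar value, gradient magnitude).
    Boundaries that share the same scalar value but differ in gradient
    magnitude map to different table entries, which a 1D table cannot do.
    """
    gx, gy, gz = np.gradient(volume)
    gmag = np.sqrt(gx**2 + gy**2 + gz**2)
    n_v, n_g, _ = tf2d.shape
    vi = np.clip(((volume - v_range[0]) / (v_range[1] - v_range[0]) * (n_v - 1)).astype(int), 0, n_v - 1)
    gi = np.clip(((gmag - g_range[0]) / (g_range[1] - g_range[0]) * (n_g - 1)).astype(int), 0, n_g - 1)
    return tf2d[vi, gi]          # (X, Y, Z, 4) RGBA volume for compositing
```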
Volume Rendering - 3D Transfer Function
Vector Fields
(images © ZIB, © UofU)
LIC Flow (Banks and Interrante)
Illuminated Lines - C. Hege, ZIB
Tensor Visualization - Hesselink
Brush Strokes (Laidlaw '98)
Large-Scale Visualization of
Turbomachinery Flows Using pV3
Objectives
• Utilize existing software and hardware
technologies to visualize large datasets with
proper scalability in both
– Display size / resolution
– Rendering speed
• Interactive visualization of large-scale
datasets for useful investigation of simulation
results
• Understand what can be done with the kind of
visualization systems that will be available on
the desktop in 2-3 years
Motivation
• At Stanford, in the DoE ASCI (Accelerated Strategic
Computing Initiative) we are trying to simulate very large
scale flows in turbomachinery. The visualization of these
flows is rather difficult and time consuming.
• Our CS group has a lot of expertise in software and hardware
for parallel rendering.
• Can we leverage these tools in the context of an engineering-
usable visualization package?
Objective - Demonstrate Potential of
Hi-Fi Gas Turbine Engine Simulation
• Integrated fan/compressor/combustor/turbine/secondaries unsteady flow and turbulent combustion simulation
– RANS Turbomachinery
– Combustor
• RANS (NASA-NCC)
• LES (CITS)
– Multi-Code Interface
• Complex code coupling
• Will require 100 TFLOPS
• Have industry and NASA participation and interest
[Figures: P&W 6000 Engine; flamelet/progress-variable model for combustion LES (mixture fraction, product mass fraction); P&W combustor 2.5D grid 1]
Stanford-ASCI TFLO Project Goals
• To develop a scalable code (TFLO) that is capable of:
– tackling large-scale unsteady flow simulations of multistage turbomachinery, as well as interactions between compressor, combustor, and turbine
– rapid and cost-effective steady and unsteady analyses required in a design environment (single blade passages, multiple stage simulation with low blade counts) comparable to existing industrial practice
– incorporating advanced turbulence models with corrections to account for effects typical in turbomachinery (streamline curvature, rotation, etc.)
• To contribute to the development of numerical simulation techniques that make this type of calculation computationally affordable
• To demonstrate integrated calculations simulating the interaction between the compressor, combustor, and HP/LP turbine
Gas-Turbine Components
TFLO performance on P&W 6000 turbine
Unsteady Simulation of
Aachen Turbine Rig (TFLO)
[Figures: entropy contours; Aachen blade unsteady pressure envelopes, p/pref vs. x/C, for passage counts 1-1-1 and 6-7-6]
• Simulation Completed, AIAA Paper Presented
• 13.5 M Points
• 374 Blocks
• 187 Processors
• 2,800 Time-Steps (w/ 30 inner iterations per time-step) Required
• 1,985 Hours (clock time), 371,000 Hours (cpu time) Required
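(Note that the cpu-time figure is simply clock time times processor count: 1,985 hours × 187 processors ≈ 371,000 cpu-hours; the same relation holds for the runs reported below.)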
[Figures: predicted vs. measured frequency spectrum of pressure amplitude (Pa) vs. frequency/BPF, with the estimated trailing-edge vortex shedding frequency at 10 BPF; amplitude of the harmonics over the blade surface (leading/trailing edge, suction/pressure side); secondary velocity field, plane no. 00, time index 1]
Unsteady Flow Simulation of
P&W Turbine Rig (TFLO)
[Figures: pressure and entropy fields showing blade trailing edge shocks, shock/blade interaction (reflected waves from the vane), viscous wake/blade interaction, and vane/blade potential interaction]
• One Global Cycle (1/6 Circumference) (33% of Total) Completed
• 31.2 M Points
• 652 Blocks
• 196 Processors
• 4,200 Time-Steps (w/ 30 inner iterations per time-step) Required
• 4,125 Hours (clock time), 808,500 Hours (cpu time) Required
• Pressure Loading Compares Well with Experiment and PW Prediction
• Predicted Aerodynamic Losses Compare Favorably with PW Prediction
Unsteady Flow Simulation of PW6000 Turbine
(TFLO)
[Figure: entropy contours, HPT (1,2,3) and LPT (5,6,7)]
• 63% of One Global Cycle (1/6 Circumference) (21% of Total) Completed
• 93.8 M Points
• 2192 Blocks
• 512 - 1024 Processors
• 5,700 Time-Steps (w/ 30 inner iterations per time-step) Required
• 5,970 Hours (clock time), 3,060,000 Hours (cpu time) Required
Main/Secondary Flow Path Integration
Direct Coupling (SPMD)
[Figures: temperature and streamlines (projected in constant ); temperature and 3D blade-relative streamlines; pressure and streamlines (projected in constant )]
• Simulation Complete
• 9.4 M Points, 238 Blocks, 144 Processors
• 1-200,000 Time-Steps Required
• 3,700 Hours (clock time), 532,800 Hours (cpu time) Required
Key Technologies
• Hardware
– High resolution displays
• Powerwall
• Super-high resolution displays (on the order of 5000x3000 pixels)
– High speed network interconnects for commodity clusters
• Software
– Support for tiled / high resolution displays with wireGL (sketched below)
– Parallel software implementation for scalable rendering using pV3
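wireGL drives a tiled display with a sort-first scheme: each primitive is routed to the tile server(s) whose screen region its bounding box overlaps. The sketch below shows only that bucketing decision; the tile layout and triangle format are invented for the example.

```python
def tiles_for_triangle(tri, tile_w, tile_h, cols, rows):
    """Return (col, row) indices of display tiles a screen-space triangle
    overlaps, based on its bounding box: the sort-first routing decision
    a tiled-display renderer makes for every primitive."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    c0 = max(int(min(xs) // tile_w), 0)
    c1 = min(int(max(xs) // tile_w), cols - 1)
    r0 = max(int(min(ys) // tile_h), 0)
    r1 = min(int(max(ys) // tile_h), rows - 1)
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]

# Example: a 2x2 powerwall of 1600x1200 tiles.
print(tiles_for_triangle([(100, 50), (1700, 60), (900, 1300)], 1600, 1200, 2, 2))
```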
Why pV3?
• pV3 is already set up for
– Parallel feature extraction
– Concurrent visualization
– Distributed visualization
– Computational steering
• Work to be done
– Use of wireGL for tiled displays (completed)
– Parallelization of renderer (almost completed)
Current Large-Scale Visualization Setup
[Diagram: 16 compute CPUs running the pV3 clients (parallel feature extraction), connected over a WAN to a 4-CPU pV3 server with four graphics pipes (GR 1-4) that performs the rendering through wiregl]
• Bottlenecks: WAN (avoidable), single renderer (in progress), internal network
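The division of labor in this setup can be sketched generically: each client rank extracts features (e.g. isosurfaces of its own blocks) every time step and ships only the resulting geometry to the rank playing the server role, which renders it. The sketch below uses mpi4py purely for illustration; pV3 itself is a PVM-based library with its own API, which is not reproduced here.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
SERVER = 0                       # rank playing the pV3-server role

def extract_features(step):
    """Stand-in for per-block feature extraction (isosurfaces, cut planes, ...).
    Would return vertex/triangle arrays for this rank's blocks."""
    return {"rank": rank, "step": step, "triangles": []}

for step in range(10):
    if rank == SERVER:
        pieces = comm.gather(None, root=SERVER)
        # Server side: hand the collected geometry to the renderer / tiled display.
        received = [p for p in pieces if p is not None]
        print(f"step {step}: geometry from {len(received)} clients")
    else:
        # Client side: advance the solver, then extract and ship geometry only.
        comm.gather(extract_features(step), root=SERVER)
```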
Future Large-Scale Visualization Setup
[Diagram: same 16 pV3 client CPUs (feature extraction) and WAN link, but with rendering parallelized across the server's four CPUs and graphics pipes (GR 1-4) through wiregl, removing the single-renderer bottleneck]
Advantages / Expected Outcome
• Rendering speed up to 12x on the current display
(best case scenario)
• High resolution images for flow details
• Large degree of interactivity for
turbomachinery flow visualizations
• Parallel I/O will be necessary for unsteady
flow visualizations