Experimental Use of a Large GPGPU-Enhanced Linux Cluster
Computational Sciences
16 June 2009
Robert F. Lucas
rflucas@isi.edu
(310)448-9449
Approved for public release; distribution is unlimited.
Joshua DHPI
20 of 28 racks
Overview
• Background and DHPI Proposal
• Initial Configuration and Results
• GPU Training and Early Research
• Transition to Production Use
• Summary
JFCOM Background
• U.S. Joint Forces Command (JFCOM)
  • One of DoD’s combatant commands
  • Key role in transforming defense capabilities
• Two directorates using agent-based simulations
  • J7 Training
  • J9 Experimentation
• Simulations are characterized by
  • Interactive use by hundreds of personnel
  • Trans-continental distribution
  • Typical users are uniformed warfighters
Simulation Federates
• Rule-based agent models of entity behavior
  • Individual entities run sequentially
  • Can be interactive and run in real time
  • Large compute clusters required to run big ensembles
• HLA RTI communication (IEEE 1516)
  • Fault-tolerant
  • Publish/subscribe model
• Relevant examples:
  • OneSAF
  • Joint Semi-Automated Forces (JSAF)
  • “Culture”, a simplified civilian derivative
  • Simulation of the Location and Attack of Mobile Enemy Missiles (SLAMEM)
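In the HLA publish/subscribe model, a federate publishes updates for a class of objects and the RTI delivers them to every federate that subscribed to that class, without the publisher naming any receiver. The C sketch below illustrates only that anonymous, interest-driven routing. It is not the IEEE 1516 RTI API: the class bits, `Bus`, `subscribe`, and `publish_update` are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical object classes, one bit each (not HLA identifiers). */
enum {
    CLASS_VEHICLE  = 1 << 0,
    CLASS_CIVILIAN = 1 << 1,
    CLASS_SENSOR   = 1 << 2
};

#define MAX_SUBSCRIBERS 16

typedef struct {
    const char *name;       /* federate name, for diagnostics only */
    unsigned    interests;  /* bitmask of subscribed object classes */
    int         received;   /* number of updates delivered so far */
} Subscriber;

typedef struct {
    Subscriber subs[MAX_SUBSCRIBERS];
    int        count;
} Bus;

/* Register a federate's interests; returns its index on the bus. */
int subscribe(Bus *bus, const char *name, unsigned interests)
{
    assert(bus->count < MAX_SUBSCRIBERS);
    Subscriber *s = &bus->subs[bus->count];
    s->name = name;
    s->interests = interests;
    s->received = 0;
    return bus->count++;
}

/* Publish one update of class `cls`. The publisher never names a receiver:
 * every subscriber whose mask includes the class gets a copy. Returns the
 * number of deliveries made. */
int publish_update(Bus *bus, unsigned cls)
{
    int delivered = 0;
    for (int i = 0; i < bus->count; i++) {
        if (bus->subs[i].interests & cls) {
            bus->subs[i].received++;
            delivered++;
        }
    }
    return delivered;
}
```

For example, if a JSAF-like federate subscribes to `CLASS_VEHICLE | CLASS_SENSOR` and a Culture-like federate to `CLASS_CIVILIAN`, a vehicle update reaches only the first federate, and neither side needs to know the other exists.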
Heterogeneous Ensemble
[Diagram. Components shown: Front End; Event Control; Pucker; federates OOS, SOAR, SLAMEM, JSAF, and Culture; and seven repeated groups of Logger, MARCI, SQL DB, and Log DB.]
Model of a City
Experimental Sensor Architecture
DHPI Proposal Basics
NEED - 24x7x365 enhanced, distributed and scalable compute resources to enable joint warfighters at JFCOM … to develop, explore, test, and validate 21st century battlespace concepts … to enhance global-scale, computer-generated military experimentation by sustaining more than 2,000,000 entities on appropriate terrain with valid phenomenology.
APPROACH – Enable further growth in entity count, entity complexity, and environmental/infrastructure settings by employing a large Linux cluster with a general-purpose GPU (GPGPU) on each node to aid in line-of-sight, route planning, and plume representation, all capable of running faster than real time.
CHALLENGES – Implement a hardware configuration that provides a stable and useful platform, motivate and train operators to use GPGPUs, and program simulations to take advantage of GPGPUs.
Cluster Configuration as Originally Specified
• 256 nodes
  • Node: (2) AMD Santa Rosa 2220 2.8 GHz dual-core processors
  • GPU: (1) NVIDIA 7950 video card
  • Node chassis: 4U chassis
  • Memory: 16 GB DIMM DDR2 667 per node
• GigE internode communications
• Delivery to: Joint Advanced Tactics and Training Laboratory, Suffolk, VA
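The specification above fixes the cluster's aggregate size. As a quick check (simple arithmetic on the listed configuration, not a figure given on this slide), two dual-core processors per node give:

```latex
\begin{aligned}
N_{\text{cores}} &= 256\ \text{nodes} \times 2\ \tfrac{\text{processors}}{\text{node}} \times 2\ \tfrac{\text{cores}}{\text{processor}} = 1024\ \text{cores},\\
M_{\text{total}} &= 256\ \text{nodes} \times 16\ \tfrac{\text{GB}}{\text{node}} = 4096\ \text{GB} \approx 4\ \text{TB}.
\end{aligned}
```

The 1024-core total matches the core count cited in the Computational Merit summary.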
Modification Requested
• Shortly after award, NVIDIA announced CUDA
• CUDA was only available on 8800-model cards
• Programming with CUDA seen as:
  • Easier
  • More extensible
  • More accessible to C programmers
• Requested and received an upgrade to 8800 cards
Management Rack
Cluster Row Installed in Suffolk, Virginia
Exceeding the Goal: 10 Million SAF Entities
Perspective: Entity Growth vs. Time
[Figure: number and complexity of JSAF entities by experiment, from SAF Express (1997), UE 98-1 (1997), J9901 (1999), and AO-00 (2000) through JSAF/SPP Tests (2004), JSAF/SPP Urban Resolve (2004), JSAF/SPP Capability (2006), and JSAF/SPP Joshua (2008). Plotted entity counts range from 3,600 up to 10,000,000 (Joshua, 2008). Computing platforms marked: SPP Proof of Principle (DARPA / Caltech), DC Clusters at MHPCC & ASC MSRC, and the DHPI GPU-Enhanced Cluster. Annotation: SCALE and FIDELITY.]
Experiments continue to require orders of magnitude larger and more complex battlespaces.
Why GPUs?
• GPU performance can be 100X that of the host
  • This differential is expected to grow
• Early OneSAF work (UNC & SAIC)
  • Line of sight
  • Route finding
  • Collision detection
• ISI verified these are also bottlenecks in JSAF
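Line-of-sight suits a GPU because every observer/target query is independent. The C sketch below is a CPU version of one query, assuming terrain is a regular heightmap and that sampling the sightline once per grid cell is accurate enough; both are illustrative assumptions, since the slides do not describe JSAF's terrain representation. The function names are hypothetical. On a GPU, each thread would evaluate one such query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Terrain elevation at grid point (x, y) of a row-major w-wide heightmap. */
static double elev(const double *h, int w, int x, int y)
{
    return h[y * w + x];
}

/* Returns true if an eye at (x0, y0), eye0 above ground, can see a point at
 * (x1, y1), eye1 above ground. The straight sightline is sampled once per
 * grid cell of travel, and the terrain under each sample is taken from the
 * nearest grid point. */
bool line_of_sight(const double *h, int w, int ht,
                   int x0, int y0, double eye0,
                   int x1, int y1, double eye1)
{
    assert(x0 >= 0 && x0 < w && x1 >= 0 && x1 < w);
    assert(y0 >= 0 && y0 < ht && y1 >= 0 && y1 < ht);
    double z0 = elev(h, w, x0, y0) + eye0;
    double z1 = elev(h, w, x1, y1) + eye1;
    int dx = abs(x1 - x0), dy = abs(y1 - y0);
    int steps = dx > dy ? dx : dy;
    for (int s = 1; s < steps; s++) {
        double t = (double)s / steps;
        /* Coordinates are non-negative, so +0.5 then truncation rounds. */
        int x = (int)(x0 + t * (x1 - x0) + 0.5);
        int y = (int)(y0 + t * (y1 - y0) + 0.5);
        double ray_z = z0 + t * (z1 - z0);
        if (elev(h, w, x, y) > ray_z)
            return false;   /* terrain rises above the sightline */
    }
    return true;
}
```

Because the loop body touches only read-only terrain and per-query values, thousands of such queries can run concurrently without synchronization.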
Route Planning Performance Impact
Time Spent in Route Planning is Critical Bottleneck
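Why a single bottleneck matters can be made concrete with Amdahl's law, a standard model rather than one stated on the slides. If a fraction p of a federate's run time is route planning and that part is accelerated by a factor s, the overall speedup is bounded as below. The values p = 0.5 and s = 100 are hypothetical, chosen only to illustrate, and are not measurements from Joshua.

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
p = 0.5,\ s = 100 \;\Rightarrow\;
S_{\text{overall}} = \frac{1}{0.5 + 0.005} = \frac{1}{0.505} \approx 1.98
```

Even a 100X kernel speedup therefore yields only about 2X overall unless the accelerated work dominates the run time; as p approaches 1, the bound approaches s.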
Fortran vs CUDA
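```fortran
do j = jl, jr
   do i = jr + 1, ld
      x = 0.0
      do k = jl, j - 1
         x = x + s(i, k) * s(k, j)
      end do
      s(i, j) = s(i, j) - x
   end do
end do
```

```cuda
// Fragment of a kernel body. IDXS, gpulskj, gpuls, tid, ltid, i, j, k, ip,
// x, and GPUL_THREAD_COUNT are declared elsewhere in the kernel (not shown).

// Stage the panel entries s(k, j), jl <= k < j, for each panel column j
// into the packed shared-memory array gpulskj; thread ltid loads row jl+ltid.
ip = 0;
for (j = jl; j <= jr; j++) {
    if (ltid <= (j - 1) - jl) {
        gpulskj(ip + ltid) = s[IDXS(jl + ltid, j)];
    }
    ip = ip + (j - 1) - jl + 1;
}
__syncthreads();

// Each thread updates rows jr+1+tid, jr+1+tid+GPUL_THREAD_COUNT, ...
for (i = jr + 1 + tid; i <= ld; i += GPUL_THREAD_COUNT) {
    // Copy row i of the panel into this thread's shared-memory column.
    for (j = jl; j <= jr; j++) {
        gpuls(j - jl, ltid) = s[IDXS(i, j)];
    }
    ip = 0;
    for (j = jl; j <= jr; j++) {
        x = 0.0f;
        for (k = jl; k <= (j - 1); k++) {
            x = x + gpuls(k - jl, ltid) * gpulskj(ip);
            ip = ip + 1;
        }
        gpuls(j - jl, ltid) -= x;
    }
    // Write the updated row back to global memory.
    for (j = jl; j <= jr; j++) {
        s[IDXS(i, j)] = gpuls(j - jl, ltid);
    }
}
```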
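In the Fortran loop, row i reads only its own entries and the panel rows jl..jr, none of which the loop writes, so the rows below jr can be processed independently; the CUDA fragment exploits this by giving each thread its own rows after staging the shared s(k, j) values in gpulskj. A serial C transcription of the Fortran, useful as a CPU reference when checking a GPU port, might look like the sketch below. The name `panel_update` and the square, column-major ld-by-ld layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* 1-based, column-major access mirroring the Fortran s(i, j).
 * Assumes (hypothetically) a square ld-by-ld array with leading dimension ld. */
#define S(i, j) s[((size_t)(j) - 1) * (size_t)ld + ((size_t)(i) - 1)]

/* For each panel column j in [jl, jr], subtract from rows jr+1..ld the
 * contributions of the panel columns jl..j-1, which were updated earlier. */
void panel_update(double *s, int ld, int jl, int jr)
{
    assert(1 <= jl && jl <= jr && jr <= ld);
    for (int j = jl; j <= jr; j++) {
        for (int i = jr + 1; i <= ld; i++) {     /* rows are independent */
            double x = 0.0;
            for (int k = jl; k <= j - 1; k++)
                x += S(i, k) * S(k, j);          /* S(k, j) is never written */
            S(i, j) -= x;
        }
    }
}
```

Running this and a GPU version on the same input and comparing results element by element is a simple way to validate a port.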
Early GPU Programming
• Trained ISI staff with Sparse Matrix Solver
• Then examined JSAF kernels
  • Line-of-sight
  • Illumination
  • Route planning
• Route planning appeared easiest to integrate
• Route planning work published at I/ITSEC
Tran, J.J., Lucas, R.F., Yao, K-T., Davis, D.M., Wagenbreth, G., & Bakeman, D.J. (2008). A High Performance Route-Planning Technique for Dense Urban Simulations. WSC05-2008, I/ITSEC Conference, Orlando, FL.
Timing Comparison CPU v GPU
Single Source Shortest Path
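The slides do not say which shortest-path algorithm the GPU route planner uses. As one common GPU-friendly option, shown here under that assumption, the sketch below solves single-source shortest paths by repeated edge relaxation (Bellman-Ford). Every relaxation within a sweep is independent, so a GPU version can assign one thread per edge and use an atomic minimum on `dist`; this C version runs the sweep serially. `Edge`, `sssp_relax`, and `INF` are hypothetical names, and costs are assumed to be small non-negative integers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define INF INT_MAX   /* marks vertices not (yet) reached */

typedef struct { int from, to, cost; } Edge;

/* Fill dist[0..nv-1] with shortest-path costs from src over the ne edges.
 * Sweeps relax every edge until nothing changes (at most nv-1 sweeps). */
void sssp_relax(int nv, int ne, const Edge *edges, int src, int *dist)
{
    assert(src >= 0 && src < nv);
    for (int v = 0; v < nv; v++)
        dist[v] = INF;
    dist[src] = 0;
    bool changed = true;
    for (int sweep = 0; sweep < nv - 1 && changed; sweep++) {
        changed = false;
        for (int e = 0; e < ne; e++) {          /* independent relaxations */
            int u = edges[e].from, v = edges[e].to;
            if (dist[u] != INF && dist[u] + edges[e].cost < dist[v]) {
                dist[v] = dist[u] + edges[e].cost;
                changed = true;
            }
        }
    }
}
```

Bellman-Ford does more total work than Dijkstra's algorithm on a CPU, but it avoids the priority queue that serializes Dijkstra, which is the usual reason it is preferred on data-parallel hardware.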
CUDA Training: PET Courses
• Conceived and run by Dr. David Pratt
  • HPCMP FAPOC for FMS
• Locations & dates:
  • SAIC facility, Suffolk, VA, 23–25 October 2007
  • ISI, Marina del Rey, 21–23 October 2008
  • UCSD, San Diego, 5–6 March 2009
• Attendees: ~60 HPCMP users in total
• Web resource and trial test beds
  • Under construction
NVIDIA Instructors
Paulius Micikevicius, Patrick LeGresley
Views of CUDA Classes
Practicum and Programming Sessions
Gene Wagenbreth gives programming exercises
Patrick LeGresley helps with implementation
Transition to Production Use
• JFCOM “customers” often run classified simulations
• Most software development is unclassified
• Issue arose last fall
• JFCOM decided to split Joshua in two
  • Secret (GenSer)
  • Unclassified
• The partition is now complete
Partitioning of Joshua
Joint Integrated Persistent Surveillance (JIPS) Project
There is an identified military problem that is currently the focus of the requirement for Joshua: the Joint Force Commander (JFC) requires adequate capability to rapidly integrate and focus collection assets to achieve persistent surveillance.
A major goal is to improve and integrate JIPS by developing Tactics, Techniques and Procedures (TTPs), Concept of Operations (CONOPS), and architectures that maximize tipping, cueing and communications.
Another goal is the efficient and effective use of sensors to achieve optimum persistence in restricted and denied areas.
A third goal is to improve doctrine, organization, and TTPs to enable the JFC to better command and support Joint Intelligence, Surveillance and Reconnaissance (JISR) operations by:
(1) effective capability apportionment and management,
(2) timely and responsive analytic support, and
(3) fast, reliable tactical Command, Control, and Communications (C3).
The last goal is to enhance employment, coordination and optimization of ISR assets by improving management processes, tools, and CONOPS.
JIPS User Interface
Experimental Schedule
Summary: Return on Investment
JFCOM needs to realistically simulate urban settings with thousands of combatants and millions of civilians.
Some benefits are not easily quantified. For example, finding out which tactics or sensors would succeed in different cities could not practically be done in those cities, because of security concerns, public resistance to combat troops in their city, and the diplomatic issues raised by practicing in cities of potential conflict.
Other costs can be estimated; for example, keeping an Army division (around 10K soldiers) in the field costs about $20M per day.
Using the DHPI cluster, JFCOM can run such an exercise with around 100 technicians (one one-hundredth of the personnel, with lower support, equipment, and energy costs, less loss of life, …); without the cluster, this could not be done. The cost saving may be ~$19.5M each day.
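The figures above fit together if the simulated exercise itself costs about $0.5M per day; that daily cost is implied by the slide's arithmetic rather than stated on it:

```latex
\begin{aligned}
\frac{100\ \text{technicians}}{10{,}000\ \text{soldiers}} &= \frac{1}{100},\\
\text{savings per day} &\approx \$20\text{M} - \$0.5\text{M} = \$19.5\text{M}.
\end{aligned}
```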
At one recent “exercise,” the Lieutenant General in charge pointed out that it was probably the only time in his career he would have an opportunity to command so large a unit, yet he could be called on to do just that at any time.
The cluster is incorporated into the DREN, allowing large numbers of soldiers to participate, 1,500 in one recent Urban Resolve experiment.
Summary: Technical Merit
• Challenges laid out in the proposal have all been met
• Joshua deployed and in service
  • Creating data for the JIPS experiment
• Two-million-entity goal exceeded
  • Capability of GPU demonstrated
• Developers trained to use GPUs
  • Route planning kernel implemented
• Joshua has changed the J9 culture
  • New code being developed using a client/server model
Summary: Computational Merit
• Joshua dedicated to a uniquely military problem
  • Modeling of operations in urban terrain
  • Users are often uniformed warfighters
• Hosts a large, heterogeneous ensemble of SAFs
  • 1024 cores needed to model urban population
• HLA RTI fault-tolerant communication model
  • Anonymous publish/subscribe
• Novel research challenges
  • Scalable interest management to bound messages
  • Scaling individual behavior models
  • Mining distributed data logs to analyze results
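Interest management bounds message traffic by delivering an update only to subscribers whose declared area of interest covers the publishing entity. The C sketch below uses a uniform grid of cells, a common approach in distributed simulation; JSAF's actual scheme is not described on the slides, and the cell size, the `Cell` and `Region` types, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define CELL_SIZE 1000.0   /* hypothetical cell edge length, in meters */

typedef struct { int cx, cy; } Cell;

/* A subscriber's area of interest: an inclusive rectangle of cells. */
typedef struct { int cx_min, cy_min, cx_max, cy_max; } Region;

/* Map a non-negative world position to its grid cell. */
Cell cell_of(double x, double y)
{
    assert(x >= 0.0 && y >= 0.0);
    Cell c = { (int)(x / CELL_SIZE), (int)(y / CELL_SIZE) };
    return c;
}

bool region_contains(Region r, Cell c)
{
    return c.cx >= r.cx_min && c.cx <= r.cx_max &&
           c.cy >= r.cy_min && c.cy <= r.cy_max;
}

/* Number of the n subscribers that receive an update from an entity at
 * (x, y). Without interest management, all n would receive it. */
int deliveries(const Region *subs, int n, double x, double y)
{
    Cell c = cell_of(x, y);
    int count = 0;
    for (int i = 0; i < n; i++)
        if (region_contains(subs[i], c))
            count++;
    return count;
}
```

The scaling problem the slide names shows up directly here: message volume grows with how many regions overlap each cell rather than with the total number of federates, so keeping regions tight keeps traffic bounded as the entity count grows.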
Summary: Current Progress
• Successfully deployed and accepted at JFCOM
• Exceeded technical goal of hosting 2M entities
• Classification issues led to partitioning
• Joshua is now fully engaged in day-to-day simulation experiments at JFCOM
  • Running ensembles of SLAMEM simulations
• Ops-tempo is expected to continue and increase
  • Human-in-the-loop experiments in FY10
Summary: Appropriateness
• Dedicated system
  • Required by classified, interactive use
• Linux cluster
  • Easy migration path for users
  • Low risk given prior experience with DC Linux clusters
  • Low-cost GigE network adequate for current SAFs
• GPUs were an inexpensive upgrade ($50K)
• Joshua has met JFCOM’s requirements
  • In service creating data for JIPS
This material is based on research sponsored by the Air Force Research Laboratory under agreement numbers F30602-02-C-0213 and FA8750-05-2-0204. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes, notwithstanding any copyright notation appearing thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory or the U.S. Government.
Research Funded by JFCOM and AFRL