High Performance Computing: an Introduction for the Society of Actuaries

DESCRIPTION

Invited talk. An introduction to high-performance computing, given at the Society of Actuaries Life & Annuity Symposium in 2011.

TRANSCRIPT
High Performance Computing
Adam DeConinck, R Systems NA, Inc.
Development of models begins at small scale.
Working on your laptop is convenient and simple.
Actual analysis, however, is slow.
“Scaling up” typically means a small server or a fast multi-core desktop.
This gives some speedup, but for very large models it is not significant.
Single machines don't scale up forever.
For the largest models, a different approach is required.
High-Performance Computing involves many distinct computer processors working together on the same calculation.
Large problems are divided into smaller parts and distributed among the many computers.
Usually this means a cluster of quasi-independent computers coordinated by a central scheduler.
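The split-and-distribute idea above can be sketched in a few lines of Python. The model, chunk sizes, and worker count here are all hypothetical, and local threads stand in for a cluster's compute nodes:

```python
from concurrent.futures import ThreadPoolExecutor
import random

def run_chunk(args):
    """Run one chunk of Monte Carlo scenarios (hypothetical model)."""
    seed, n_scenarios = args
    rng = random.Random(seed)
    # Stand-in for an expensive per-scenario valuation.
    return sum(rng.random() for _ in range(n_scenarios))

total_scenarios = 100_000
n_workers = 4                       # on a cluster: the number of compute nodes
chunk = total_scenarios // n_workers
work = [(seed, chunk) for seed in range(n_workers)]   # scatter: split the job

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    partials = list(pool.map(run_chunk, work))        # run chunks concurrently

estimate = sum(partials) / total_scenarios            # gather: merge results
```

On a real cluster the scatter and gather steps are handled by the scheduler and a job-submission or message-passing layer rather than a local thread pool, but the decomposition is the same.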
Typical HPC cluster: a login node holding the external connection, a scheduler, a file server, and the compute nodes, linked by an Ethernet network and a high-speed network (10GigE / InfiniBand).
Performance test: a stochastic finance model on an R Systems cluster.
High-end workstation: 8 cores. Maximum speedup on the cluster: 20x (4.5 hrs → 14 minutes).
Scale-up is heavily model-dependent: 5x–100x in our tests, and can be faster.
No more performance gain after ~500 cores. Why? Some operations can't be parallelized.
Additional cores? Run multiple models simultaneously.
[Chart: performance gains; job duration in seconds vs. number of cores, with the high-end workstation shown for comparison.]
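The flattening of the curve past a few hundred cores is what Amdahl's law predicts: if any fraction of the work is inherently serial, that fraction caps the achievable speedup. A small sketch (the 2% serial fraction is an illustrative assumption, not a measured figure from the talk):

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Upper bound on speedup when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With even 2% serial work the ceiling is 1/0.02 = 50x, so the curve
# flattens quickly: going from 500 to 5000 cores buys almost nothing.
for n in (8, 100, 500, 5000):
    print(f"{n:>5} cores -> {amdahl_speedup(0.02, n):.1f}x")
```

This is why, past the flattening point, the extra cores are better spent running several models simultaneously than pushing one model further.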
Performance comes at a price: complexity.
New paradigm: real-time analysis vs. batch jobs.
Applications must be written specifically to take advantage of distributed computing.
Performance characteristics of applications change.
Debugging becomes more of a challenge.
New paradigm: real-time analysis vs batch jobs.
Most small analyses are done in real time:
“At-your-desk” analysis
Small models only
Fast iterations
No waiting for resources
Large jobs are typically done in batch mode:
Submit job to a queue
Much larger models
Slow iterations
May need to wait
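The batch workflow above can be sketched with Python's standard library; a thread pool stands in for the cluster scheduler, and `big_model` is a hypothetical placeholder for a long-running model:

```python
from concurrent.futures import ThreadPoolExecutor

def big_model(run_id):
    # Placeholder for a long-running model run.
    return f"results-for-run-{run_id}"

scheduler = ThreadPoolExecutor(max_workers=2)   # only 2 "nodes": excess jobs wait in the queue
jobs = [scheduler.submit(big_model, i) for i in range(5)]   # submit and walk away
results = [job.result() for job in jobs]        # come back later and collect
scheduler.shutdown()
```

The key contrast with at-your-desk analysis is the indirection: you get a job handle back immediately, but the results only arrive when the scheduler finds free resources.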
Applications must be written specifically to take advantage of distributed computing.
Explicitly split your problem into smaller “chunks”.
“Message passing” between processes.
The entire computation can be slowed by one or two slow chunks.
Exception: “embarrassingly parallel” problems, i.e. easy-to-split, independent chunks of computation.
Thankfully, many useful models fall under this heading (e.g. stochastic models).
“Embarrassingly parallel” = no inter-process communication.
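A minimal sketch of why stochastic models fit this pattern: give each scenario its own random stream, and every scenario becomes an independent unit that can run anywhere, in any order, with no communication. (The one-line "model" is a hypothetical stand-in for a full projection.)

```python
import random

def run_scenario(scenario_id):
    rng = random.Random(scenario_id)   # independent random stream per scenario
    return rng.gauss(0.0, 1.0)         # stand-in for a full scenario projection

ids = list(range(1000))
in_order = [run_scenario(i) for i in ids]

shuffled = ids[:]
random.shuffle(shuffled)
out_of_order = {i: run_scenario(i) for i in shuffled}

# Any execution order gives identical per-scenario results --
# exactly the property a scheduler exploits to farm scenarios out.
assert all(out_of_order[i] == in_order[i] for i in ids)
```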
Performance characteristics of applications change.
On a single machine: CPU speed (compute), cache, memory, disk.
On a cluster: all the single-machine metrics, plus the network, the file server, scheduler contention, and results from other nodes.
Debugging becomes more of a challenge.
More complexity = more pieces that can fail.
Race conditions: the sequence of events is no longer deterministic.
Single nodes can “stall” and slow the entire computation.
The scheduler, file server, and login server all have their own challenges.
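A race condition in miniature, using Python threads: several workers update shared state, and only the lock keeps the outcome deterministic. This is a toy illustration, not an example from the talk:

```python
import threading

counter = 0
lock = threading.Lock()

def work(n_updates):
    global counter
    for _ in range(n_updates):
        with lock:          # remove this and increments can be silently lost
            counter += 1

threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, the total is always 4 * 10_000 = 40_000; without it,
# the result varies from run to run -- the hallmark of a race condition.
```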
External resources
One solution to handling complexity: outsource it!
Historical HPC facilities: universities, national labs.
These often have the most absolute compute capacity, and will sell excess capacity.
But you compete with academic projects, and they typically do not include an SLA or high-level support.
An alternative: dedicated commercial HPC facilities providing “on-demand” compute power.
| | Pros | Cons |
|---|---|---|
| External HPC | Outsource HPC sysadmin; no hardware investment; pay-as-you-go; easy to migrate to new tech | No guaranteed access; complex security arrangements; limited control of configuration; some licensing complex |
| Internal HPC | No external contention; all internal, so easy security; full control over configuration; simpler licensing control | Requires in-house expertise; major investment in hardware; possible idle time; upgrades require new hardware |
“The Cloud”
“Cloud computing”: virtual machines and dynamic allocation of resources on an external provider's infrastructure.
Lower performance (virtualization), but higher flexibility.
Usually no contracts necessary: pay with your credit card, get 16 nodes.
You often have to do all your own sysadmin.
Low support, high control.
CASE STUDY: Windows cluster for an actuarial application
Global insurance company
Needed 500–1000 cores on a temporary basis.
Preferred a utility, “pay-as-you-go” model.
Experimenting with external resources for “burst” capacity during high-activity periods.
Commercially licensed and supported application.
Requested a proof of concept.
Cluster configuration
The application is embarrassingly parallel, uses small-to-medium data files, and is computationally and memory-intensive.
So: prioritize computation (processors) and access to the fileserver over inter-node communication and large storage.
Upgraded memory in compute nodes to 2 GB/core.
128-node cluster: 3.0 GHz Intel Xeon processors, 8 cores per node, for 1024 cores total.
Windows HPC Server 2008 R2 operating system.
Application and fileserver on the login node.
Stumbling blocks
Application optimization: The customer had a wide variety of models which generated different usage patterns (IO-, compute-, and memory-intensive jobs). Required dynamic reconfiguration for different conditions.
Technical issues: Iterative testing process. The application turned out to be generating massive fileserver contention; we had to make changes to both software and hardware.
Human processes: Users were accustomed to the internal access model. Required changes both for providers (increase ease-of-use) and users (change workflow).
Security: The customer had never worked with an external provider before. A complex internal security policy had to be reconciled with remote access.
Lessons learned:
Security was the biggest delaying factor. The initial security setup took over 3 months from the first expression of interest, even though cluster setup was done in less than a week.
This only mattered the first time, though: subsequent runs started much more smoothly.
A low-cost proof-of-concept run was important to demonstrate feasibility and to work out the bugs.
A good relationship with the application vendor was extremely important to solving problems and properly optimizing the model for performance.
Recent developments: GPUs
Graphics processing units
CPU: a complex, general-purpose processor.
GPU: a highly specialized parallel processor, optimized for the operations common in graphics routines.
Highly specialized → many more “cores” for the same cost and space:
Intel Core i7: 4 cores @ 3.4 GHz for $300 = $75/core.
NVIDIA Tesla M2070: 448 cores @ 575 MHz for $4,500 = $10/core.
Also higher memory bandwidth: 100+ GB/s for a GPU vs. 10–30 GB/s for a CPU.
The same operations can be adapted for non-graphics applications: “GPGPU”.
Image from http://blogs.nvidia.com/2009/12/whats-the-difference-between-a-cpu-and-a-gpu/
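The GPGPU programming model can be sketched without a GPU: write a small "kernel" that handles one data element, then launch it across a whole array. Here a plain `map` stands in for the kernel launch, and the payoff-style transform is a hypothetical example, not code from the talk:

```python
import math

def kernel(spot):
    """Per-element work: a stand-in, discounted-payoff transform (hypothetical)."""
    return max(spot - 100.0, 0.0) * math.exp(-0.05)

spots = [90.0 + i * 0.5 for i in range(64)]   # one element per "core"
payoffs = list(map(kernel, spots))            # the "kernel launch": same op, many elements
```

On a GPU, thousands of cores each run the kernel on their own element simultaneously; code with this data-parallel shape (no branching between elements, no shared state) is what maps well to GPGPU frameworks.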
HPC/actuarial applications using GPUs:
Random-number generation
Finite-difference modeling
Image processing
Numerical Algorithms Group: GPU random-number generator
MATLAB: operations on large arrays/matrices
Wolfram Mathematica: symbolic math analysis
Data from http://www.nvidia.com/object/computational_finance.html