![Page 1: Numerically Intensive Computing in Finance Lecture 1 ...people.maths.ox.ac.uk/gilesm/talks/nicf06.pdfMoney and economics are what drive computing, not technology. Money: if there’s](https://reader035.vdocuments.mx/reader035/viewer/2022081403/608997b8834d537ddf15b897/html5/thumbnails/1.jpg)
Numerically Intensive Computing in Finance
Lecture 1: Introduction and Prototype Applications
Mike [email protected]
Objectives
This course is motivated by the fact that large-scale computations are becoming a standard part of mathematical finance.
By the end of this course you should have:
• some understanding of computer hardware, and the trends for the future;
• a good understanding of the different kinds of parallel computing;
• an understanding of the different kinds of parallelism inherent in financial applications;
• some practical experience with parallel codes!
Course Structure
Week 8: MM&SC and ACM MSc’s, and OSC users and others within Oxford University
Lectures in the mornings, Monday-Thursday:
• 9:30 – 10:30
• 10:45 – 11:45
• 12:00 – 13:00
Practicals in the afternoons, Monday-Friday, 2-6. There is no need to be present all the time, just work at your own pace to complete assignments.
For those doing the course as a Special Topic there will be additional projects afterwards.
Course Structure
Week 9: MSc in Mathematical Finance (module 10)
Lectures in the mornings, Tuesday-Friday:
• 9:00 – 10:00
• 10:15 – 11:15
• 11:30 – 12:30
Practicals in the afternoons, Tuesday-Thursday, 1:30-6. There is no need to be present all the time, just work at your own pace to complete practicals 1-4.
Lecture Outline
Day 1: Introduction
1 Two prototype problems: Monte-Carlo and Black-Scholes financial applications
2 The “big picture” overview of high performance computing
3 Distributed resource management and web services
Day 2: Shared-memory Parallelism
4 Processor and memory technology
5 Shared-memory multiprocessors
6 OpenMP multi-threaded computing
Lecture Outline
Day 3: Distributed-memory parallelism
7 Distributed-memory systems
8 BSP model of distributed computing, and parallelisation of explicit approximations
9 Introduction to MPI message passing
Day 4: Distributed-memory applications
10 Parallelisation of explicit approximations
11 Parallelisation of implicit approximations
12 More on MPI
Practicals
The practicals are a very important part of the course. Anyone taking the course for credit as part of one of the MSc’s must complete practicals 1-4 and hand in a write-up showing that they have gone through all of the exercises.
• Using NAG libraries, Grid Engine and web services for parallel Monte-Carlo calculations
• Using OpenMP multithreading for an explicit finite difference Black-Scholes discretisation
• An introduction to MPI message passing, including for Monte-Carlo calculations
• Using MPI for an explicit B-S FD method
• Using MPI for an implicit B-S FD method
Reading Material
Hardware
• Web references for current hardware (see links from course webpage)
• John L. Hennessy and David A. Patterson, Computer Architecture: a Quantitative Approach, 3rd edition, Morgan Kaufmann, 2003.
Reading Material
Software
• R. Chandra et al, Parallel Programming in OpenMP, Morgan Kaufmann, 2001.
• W. Gropp, E. Lusk and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface (second edition), MIT Press, 2000.
Reading Material
Mathematical Finance
• Lecture notes for MSc courses
• P. Wilmott, S. D. Howison and J. Dewynne, Mathematics of Financial Derivatives, CUP, 1995.
• D. Duffy, Finite Difference Methods in Financial Engineering: A Partial Differential Equation Approach, John Wiley and Sons, 2006.
• P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer, 2004.
Monte Carlo Model Problem
Stochastic differential models in mathematical finance have the form
$$dS = a(S, t)\,dt + b(S, t)\,dW$$
where $S$, $a$ are vectors, $b$ is a matrix, and $dW$ is an increment of a vector Wiener path with correlation $\Sigma(S, t)$.
These are to be solved subject to some initial conditions at time $t = 0$, and the aim is to determine the expected (discounted) value of a payoff function of the state at final time $t = T$.
Monte Carlo Model Problem
In Monte-Carlo simulations, the expected value is estimated by averaging the values obtained by doing lots of different path calculations with different random inputs.
Using Forward Euler time discretisation, each path is calculated from
$$S_{n+1} = S_n + a(S_n, t_n)\,\Delta t + b(S_n, t_n)\,\Delta W_n$$
where $\Delta W_n$ is a vector of normally distributed random variables with zero mean, variance $\Delta t$ and correlation $\Sigma(S_n, t_n)$.
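The Forward Euler update above can be sketched in a few lines of Python with NumPy. This is only an illustration: the function name `euler_path`, its argument names, and the example values of `r`, `sigma` and `rho` are assumptions, not taken from the course material.

```python
import numpy as np

def euler_path(a, b, corr, s0, T, n_steps, rng):
    """Advance one path of dS = a dt + b dW with Forward Euler (illustrative)."""
    s = np.array(s0, dtype=float)
    dt = T / n_steps
    t = 0.0
    for _ in range(n_steps):
        # Correlated increment: L L^T equals the correlation matrix,
        # and the sqrt(dt) factor gives each component variance dt.
        L = np.linalg.cholesky(corr(s, t))
        dW = np.sqrt(dt) * (L @ rng.standard_normal(s.size))
        s = s + a(s, t) * dt + b(s, t) @ dW
        t += dt
    return s

# Example with two correlated assets (parameter values are assumed)
r, sigma, rho = 0.05, 0.2, 0.5
drift = lambda s, t: r * s
vol = lambda s, t: sigma * np.diag(s)
corr = lambda s, t: np.array([[1.0, rho], [rho, 1.0]])
s_T = euler_path(drift, vol, corr, [1.0, 1.0], T=1.0, n_steps=100,
                 rng=np.random.default_rng(0))
```

Averaging a payoff over many such paths gives the Monte-Carlo estimate; in practice the loop over paths would be vectorised rather than calling this function once per path.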
Monte Carlo Model Problem
Because of the Central Limit Theorem, the error in the estimated value is proportional to $N^{-1/2}$, where $N$ is the number of paths calculated.
There are various techniques for reducing the constant of proportionality (co-variate variables, variance reduction) and improving the exponent (quasi-random sequences), but for the purposes of this course these are not important.
Monte Carlo Model Problem
What is important?
• Each path calculation is entirely independent
  – “trivially parallel”, just run a number of paths on each machine in a “cluster” or “farm” and average the output
• Need to generate lots of random numbers
• Each path needs a completely independent set of random numbers
Monte Carlo Model Problem
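Our two-asset model problem is
$$dS_1 = r S_1\,dt + \sigma S_1\,dW_1$$
$$dS_2 = r S_2\,dt + \sigma S_2\,dW_2$$
with correlation matrix
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
The initial conditions at $t = 0$ are $S_1 = S_2 = 1$ and the discounted payoff at $t = 1$ is
$$P(S_1, S_2) = \begin{cases} e^{-r}, & \max(|S_1 - 1|, |S_2 - 1|) < 0.1 \\ 0, & \text{otherwise} \end{cases}$$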
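As a rough illustration of the model problem, the sketch below estimates the expected discounted payoff with a vectorised Forward Euler simulation. The slides do not give values for $r$, $\sigma$ or $\rho$, so the defaults here are assumed, as are the function name `mc_model_problem` and the number of time steps.

```python
import numpy as np

def mc_model_problem(n_paths, n_steps, r=0.05, sigma=0.2, rho=0.5, seed=0):
    """Monte-Carlo estimate for the two-asset model problem (assumed parameters)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps                       # final time is t = 1
    # Lower-triangular factor with L L^T = [[1, rho], [rho, 1]]
    L = np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho**2)]])
    S = np.ones((n_paths, 2))                # S1 = S2 = 1 at t = 0
    for _ in range(n_steps):
        # Each row of dW is a correlated Wiener increment with variance dt
        dW = np.sqrt(dt) * rng.standard_normal((n_paths, 2)) @ L.T
        S = S + r * S * dt + sigma * S * dW
    inside = np.max(np.abs(S - 1.0), axis=1) < 0.1
    payoff = np.exp(-r) * inside             # discounted payoff per path
    return payoff.mean(), payoff.std(ddof=1) / np.sqrt(n_paths)

estimate, std_err = mc_model_problem(n_paths=20_000, n_steps=50)
```

The second return value is the sample standard error, which shrinks like $N^{-1/2}$ as the number of paths grows.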
Random Number Generation
• use standard numerical libraries, so don’t need to know how they’re generated
• uniformly distributed random numbers on [0,1] are generated by a recurrence relation
• converted into Normally distributed variables with zero mean and unit variance through:
  – Box-Muller method
  – Marsaglia-Bray method
  – inverting cumulative probability distribution
See Glasserman’s book for more details.
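For concreteness, here is a minimal sketch of the Box-Muller method, the first of the conversions listed above. It uses NumPy's own uniform generator as the source of uniforms; the function name `box_muller` and the sample size are assumptions made for illustration, and a production code would normally just call a library routine.

```python
import numpy as np

def box_muller(u1, u2):
    """Map two independent uniforms (u1 in (0, 1]) to two independent N(0, 1) samples."""
    radius = np.sqrt(-2.0 * np.log(u1))
    theta = 2.0 * np.pi * u2
    return radius * np.cos(theta), radius * np.sin(theta)

rng = np.random.default_rng(42)
u1 = 1.0 - rng.random(100_000)   # shift [0, 1) to (0, 1] so log(u1) is finite
u2 = rng.random(100_000)
z1, z2 = box_muller(u1, u2)
```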
Random Number Generation
If $X$ is a vector of independent normally distributed random variables with zero mean and unit variance, then the vector $Y$ defined by
$$Y = L X$$
is a vector of normally distributed variables with zero mean and covariance matrix
$$\Sigma = L L^T$$
Random Number Generation
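Given a particular desired $\Sigma$, the simplest choice for $L$ is a Cholesky factorisation in which $L$ is lower-triangular.
For our model problem
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
this gives
$$L = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{pmatrix}$$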
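The closed-form factor can be checked numerically. The short sketch below compares it with `numpy.linalg.cholesky` and confirms that $Y = LX$ has the desired correlation; the value of `rho` and the sample size are assumed for illustration.

```python
import numpy as np

rho = 0.5                                        # assumed correlation
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# Closed-form lower-triangular factor for the 2x2 model problem
L_closed = np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho**2)]])
L_numpy = np.linalg.cholesky(Sigma)              # library Cholesky factor

# Correlate independent N(0, 1) samples: each column of Y is L @ X
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200_000))
Y = L_closed @ X
sample_cov = np.cov(Y)                           # should be close to Sigma
```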
Finite Difference Model Problem
The Black-Scholes equation for our two-asset model problem can be written in the form
$$V_t + r S_1 V_{S_1} + r S_2 V_{S_2} + \sigma^2\left(\tfrac{1}{2} S_1^2 V_{S_1 S_1} + \rho S_1 S_2 V_{S_1 S_2} + \tfrac{1}{2} S_2^2 V_{S_2 S_2}\right) = rV$$
This is solved backwards in time from the final value equal to the payoff function, to get the value at the initial time $t = 0$.
Finite Difference Model Problem
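Switching to new variables $\eta = \log S$, $\tau = 1 - t$, and defining
$$r^* = r - \tfrac{1}{2}\sigma^2,$$
the equation becomes
$$V_\tau = r^*\left(V_{\eta_1} + V_{\eta_2}\right) + \sigma^2\left(\tfrac{1}{2} V_{\eta_1\eta_1} + \rho V_{\eta_1\eta_2} + \tfrac{1}{2} V_{\eta_2\eta_2}\right) - rV$$
which is to be solved forward in time from $\tau = 0$ to $\tau = 1$.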
Finite Difference Model Problem
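A simple Explicit Euler central space discretisation on a uniform Cartesian grid is
$$V^{n+1} = (1 - r\Delta t)\,V^n + \frac{r^*\Delta t}{2\Delta\eta}\left(\delta_{2\eta_1} + \delta_{2\eta_2}\right)V^n + \frac{\sigma^2\Delta t}{2\Delta\eta^2}\left((1-\rho)\,\delta^2_{\eta_1} + \rho\,\delta^2_{\eta_1\eta_2} + (1-\rho)\,\delta^2_{\eta_2}\right)V^n$$
where
$$\delta_{2\eta_1} V_{i,j} \equiv V_{i+1,j} - V_{i-1,j}$$
$$\delta_{2\eta_2} V_{i,j} \equiv V_{i,j+1} - V_{i,j-1}$$
and . . .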
Finite Difference Model Problem
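$$\delta^2_{\eta_1} V_{i,j} \equiv V_{i+1,j} - 2V_{i,j} + V_{i-1,j}$$
$$\delta^2_{\eta_1\eta_2} V_{i,j} \equiv V_{i+1,j+1} - 2V_{i,j} + V_{i-1,j-1}$$
$$\delta^2_{\eta_2} V_{i,j} \equiv V_{i,j+1} - 2V_{i,j} + V_{i,j-1}$$
making it a 7-point stencil:
[Figure: the 7-point stencil, drawn as rows of 2, 3 and 2 grid points]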
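One explicit time step with this 7-point stencil can be sketched with NumPy array slicing. The function name `explicit_step` is an assumption, and so is the boundary treatment: the slides do not state boundary conditions at this point, so the sketch simply leaves the boundary values unchanged.

```python
import numpy as np

def explicit_step(V, dt, deta, r, sigma, rho):
    """One Explicit Euler step on the interior of a uniform grid (illustrative).

    V[i, j] holds the value at (eta1_i, eta2_j); boundary entries are copied
    through unchanged, which is an assumption of this sketch.
    """
    r_star = r - 0.5 * sigma**2
    C = V[1:-1, 1:-1]                       # V[i, j]
    E, W = V[2:, 1:-1], V[:-2, 1:-1]        # V[i+1, j], V[i-1, j]
    N, S = V[1:-1, 2:], V[1:-1, :-2]        # V[i, j+1], V[i, j-1]
    NE, SW = V[2:, 2:], V[:-2, :-2]         # V[i+1, j+1], V[i-1, j-1]

    first = (E - W) + (N - S)               # central first differences
    second = ((1 - rho) * (E - 2 * C + W)
              + rho * (NE - 2 * C + SW)
              + (1 - rho) * (N - 2 * C + S))

    V_new = V.copy()
    V_new[1:-1, 1:-1] = ((1 - r * dt) * C
                         + r_star * dt / (2 * deta) * first
                         + sigma**2 * dt / (2 * deta**2) * second)
    return V_new
```

Each interior point depends only on its seven neighbours from the previous time level, so all interior points can be updated independently of one another within a step.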
Finite Difference Model Problem
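If we instead use Backward Euler time differencing, giving
$$(1 + r\Delta t)\,V^{n+1} - \frac{r^*\Delta t}{2\Delta\eta}\left(\delta_{2\eta_1} + \delta_{2\eta_2}\right)V^{n+1} - \frac{\sigma^2\Delta t}{2\Delta\eta^2}\left((1-\rho)\,\delta^2_{\eta_1} + \rho\,\delta^2_{\eta_1\eta_2} + (1-\rho)\,\delta^2_{\eta_2}\right)V^{n+1} = V^n$$
then the question is how to solve the system of simultaneous equations for $V^{n+1}$.
Jacobi, Gauss-Seidel and CG-like iterative solution methods will be considered later.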
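As a preview of the simplest of these iterative methods, the sketch below applies Jacobi iteration to the Backward Euler system. It is not the course's implementation: the function name `implicit_step_jacobi`, the stopping tolerance, and the choice to hold boundary values fixed at their old values are all assumptions made for illustration.

```python
import numpy as np

def implicit_step_jacobi(V_old, dt, deta, r, sigma, rho,
                         tol=1e-12, max_iter=10_000):
    """One Backward Euler step solved by Jacobi iteration (illustrative).

    Boundary values are held fixed at their old values (an assumption).
    """
    r_star = r - 0.5 * sigma**2
    c1 = r_star * dt / (2 * deta)
    c2 = sigma**2 * dt / (2 * deta**2)
    # Coefficient of V[i, j] in the implicit equation
    diag = 1 + r * dt + c2 * (4 - 2 * rho)

    V = V_old.copy()
    for _ in range(max_iter):
        E, W = V[2:, 1:-1], V[:-2, 1:-1]
        N, S = V[1:-1, 2:], V[1:-1, :-2]
        NE, SW = V[2:, 2:], V[:-2, :-2]
        # Neighbour terms moved to the right-hand side, using the previous iterate
        off = (c1 * ((E - W) + (N - S))
               + c2 * ((1 - rho) * (E + W + N + S) + rho * (NE + SW)))
        new_interior = (V_old[1:-1, 1:-1] + off) / diag
        change = np.max(np.abs(new_interior - V[1:-1, 1:-1]))
        V[1:-1, 1:-1] = new_interior
        if change < tol:
            break
    return V
```

Every point is updated from the previous iterate only, so, like the explicit scheme, each Jacobi sweep updates all interior points independently.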
Numerically Intensive Computing in Finance
Lecture 2: Computing – the “Big Picture”
Mike [email protected]
The Driving Forces
Money and economics are what drive computing, not technology.
Money: if there’s a big enough market, someone will develop the product.
Economics: cost per unit item is minimised by producing huge numbers of the same item – particularly important in computing where the costs of development and fabrication plant are huge (measured in $bn’s).
Technological Trends
Moore’s Law (from Gordon Moore of Intel, 30 years ago): CPU speed doubles every 18-24 months
There is similar growth in all other hardware aspects: memory size, memory bandwidth, disk size, network speed, . . .
Safe to assume that this will continue for at least the next 10 years, driven by:
• multimedia applications
• anti-virus/firewall/anti-spam software
• image processing
• “intelligent” software
The Hardware Pyramid
[Figure: the hardware pyramid, with levels labelled embedded systems, laptops, PC’s, servers and supercomputers]
Hardware
• almost all computing is now done on systems built of commodity components, benefitting from economies of scale
  – the days of highly-specialised “vector” supercomputers are over
• roughly 4 : 2 : 1 ratio in performance for CPUs in servers : PCs : embedded systems
• Intel is the dominant force in CPUs; only AMD, IBM, Sun are left in competition
Multi-level Parallelism
• instruction parallelism (e.g. addition)
• pipeline parallelism, overlapping different instructions
• multiple pipelines, each with own capabilities
• multiple CPU’s within a single “multicore” chip
• multiple chips within a single shared-memory computer
• multiple computers within a distributed-memory system
• multiple systems within an organisation
Hardware
Lecture 4 will look at CPUs to understand the lower levels of parallelism, and at how data is moved between the CPU and the main memory using caches.
An understanding of both is required to get the best execution speed from sequential processes, and the memory hierarchy also has major consequences for parallel computing.
Hardware for high-end computing
1) shared-memory multiprocessor
• the modern mainframe – top products from Sun and IBM are used widely in banks, especially for database applications
• single very large memory (up to 250GB?) accessed by multiple processors (up to 72 dual-core chips)
• hardware challenge is high bandwidth memory access – costly
• often has high-reliability features such as hot-swap disks, redundant power supplies – adds to cost
Hardware for high-end computing
Oxford Supercomputing Centre plans:
• spend 20% of budget on shared-memory systems for specific applications (e.g. Gaussian, molecular modelling package)
• each with probably 8 dual-core processors, and maybe 32GB memory
• probably no high-reliability features to minimise cost
Hardware for high-end computing
2) tightly-coupled distributed-memory system
• multiple nodes (with 1 – 4 processors) each with own memory
• high-bandwidth low-latency network connection (Gigabit Ethernet, Myrinet, Infiniband)
• in academia, used to be collections of PCs on a shelf, but now there are tailored packages from the leading vendors
• a key issue is system management; you lose all of the price/performance benefits if you have to employ lots of system managers
Hardware for high-end computing
Oxford Supercomputing Centre plans:
• spend 80% of budget on several large clusters
• each with probably 128 “nodes” containing 2 dual-proc chips, so a total of 512 cores per cluster
• probably Gigabit Ethernet with custom drivers for networking, except for one cluster with higher-spec custom networking
Hardware for high-end computing
Another example is the OCCF cluster for the Oxford Centre for Computational Finance and the Computing Laboratory
• 24 Sun Ultra-80 nodes each with 4 UltraSPARC processors and 2GB memory
• connected by Myrinet for parallel computing, and 100Mb/s Ethernet for file i/o and external network access
• very old now – about to be shut down
Hardware for high-end computing
3) loosely-coupled PC/workstation “farms”
• similar to 2) but with relatively low-speed interconnect (100Mb/s or Gigabit Ethernet with TCP/IP software)
• ideally suited for “trivially-parallel” applications like Monte-Carlo
• system management and resource management are again the key issues
Hardware for high-end computing
Dedicated farms:
• racks of up to 4000 “pizza box” servers at bio-informatics companies
Collection of “idle” resources:
• traders’ workstations/PCs which are idle overnight and at weekends
• computer teaching labs in the university, unused most of the time!
Hardware for high-end computing
Comparison:
• 8 : 2 : 1 cost/performance ratio for three categories
• shared-memory and distributed-memory systems built of high-end processors because of cost of interconnect
• PC/workstation farms built of low-end processors for lowest cost/performance ratio
Hardware for high-end computing
Trends:
• business/finance is replacing science as main user of “supercomputers”
• big shared-memory systems (> 64 procs) are in decline because database software has been re-written for distributed-memory systems
• concerns over power consumption caused move to lower-frequency multicore chips – increasingly the aim is to maximise CPU performance per watt!
Electrical Power
• new OSC computer room will have 600kW supply for computers (plus an additional 400kW to keep them cool)
• total power consumption: 1MW
  total electricity bill: £400k/yr
• averaged over a 3-year lifetime, electricity cost is roughly 40% of the purchase price

• Intel Pentium 4 Extreme Edition had clock frequency up to 3.8GHz and used up to 130W
• new Intel multicore chips run at up to 3GHz and use 65-75W
Final “Big Picture” Considerations
• driving force is vast market for PCs and servers with a price tag of £500 – £1500
• consequence is that a compute cluster costing £1M may have up to 1000 chips (2000 cores) if the interconnect is not too expensive
• move to clusters with multi-core chips means we may have to exploit both shared-memory and distributed-memory parallel computing in high-end applications.
Hardware vs. Software
Hardware:
• phenomenal technological advances, driven by user needs
• new products every year, new architectures every 10 years

Software:
• disappointingly slow progress, limited by “people” issues
• new languages and standards every 10 years or so
Software “People” Issues
• need global standards, agreed by committee which takes time – Fortran 90 was motivated by vector computing, which was on the way out by the time the standard was agreed!
• need (re)training of staff – in the worst case have to wait for existing staff to retire!
• staff can be reluctant to learn new skills which might not be transferable to a new employer – another reason for standards
• companies have been happier investing in hardware than software – changing these days?
Languages
Fortran and C:
• for those who want the highest performance
• closest to the level of the operating system (written in C)

C++, Java, C#:
• object-oriented computing for better software design and re-use of code (in principle)

Visual Basic, Matlab:
• “niche” languages with very strong following
• emphasis on ease of use
Parallel Computing Standards
• OpenMP for multithreaded computing on shared-memory systems
• MPI for message-passing on distributed-memory systems
• both support Fortran, C and C++ and provide portability across all major vendors

However, both are rather low-level and MPI involves tedious programming; I’d like to see more research on developing parallel libraries to handle parallelism automatically
Numerically Intensive Computing in Finance
Lecture 3: Distributed resource management and web services
Mike [email protected]
“Trivial” Parallelism
Monte-Carlo applications are a good example of trivial parallelism
• $10^6$ independent random paths can be grouped into 100 jobs, each with $10^4$ paths
• Each job is independent and has very few inputs and outputs
• Given lots of machines, want “something” to decide where the jobs should be run to give the fastest turnaround time.

Only tricky bit for user is making sure each job uses independent random number generation – see practical 1.
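One way to give each job its own independent random stream is sketched below with NumPy's `SeedSequence.spawn`. This is a stand-in for illustration only: practical 1 uses the NAG libraries, which are not shown here, and the payoff (an indicator that a standard normal sample lies within one standard deviation of zero) is a hypothetical placeholder for a real path calculation. The jobs are run in a plain loop, where a real farm would dispatch them to different machines.

```python
import numpy as np
from numpy.random import SeedSequence, default_rng

N_JOBS, PATHS_PER_JOB = 100, 10_000       # 10^6 paths split as on the slide

def run_job(child_seed, n_paths):
    """One independent job: returns the sum of payoffs over its own paths."""
    rng = default_rng(child_seed)          # generator private to this job
    z = rng.standard_normal(n_paths)
    return float(np.sum(np.abs(z) < 1.0))  # placeholder payoff, not the model problem

# Spawn one statistically independent child seed per job from a single root seed
children = SeedSequence(12345).spawn(N_JOBS)
partial_sums = [run_job(child, PATHS_PER_JOB) for child in children]

# Combining the results needs only one number back from each job
estimate = sum(partial_sums) / (N_JOBS * PATHS_PER_JOB)
```

Each job takes a seed and a path count as input and returns a single number, which matches the "very few inputs and outputs" property noted above.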
Distributed Resource Management
Using loosely-coupled PC/workstation farms for Monte Carlo calculations needs distributed resource management:
• which machines are available?
• how heavily are they being used?
• do they have the necessary software/licenses?
• what rights do I have to use them?
Distributed Resource Management
Grid Engine (Sun), LSF (Platform Computing) and Condor (Univ. of Wisconsin) deal with this through users submitting tasks to a unified queue which dispatches jobs based on:
• matching job requirements to machine properties
• taking account of current interactive/batch usage of machine
• taking account of different priorities of different user groups
• doing charging if necessary
Maybe sounds simple – but very important
Distributed Resource Management
Some DRM software can also work in a hierarchy:
• each department has Grid Engine to manage its own cluster
• the overall organisation has Grid Engine queues which can feed into the departmental queues
• users normally use departmental resources, but can go to higher level queues for extra resources
This is a more robust solution than having a single control point for the entire organisation.
Web Services
Distributed resource management is one aspect of Grid Computing.
Another is the use of Web Services to link separate applications running on different machines, possibly under different operating systems, even within different organisations.
Because of the requirements of eCommerce, there is a huge development effort with well established standards supported by all of the major companies (Microsoft, IBM, SUN).
Web Services
At their simplest, web services follow an RPC (Remote Procedure Call) approach:
• a client process sends a request to a server
• the server process returns a response
• the client and server processes usually “belong” to different users (different userid)
• the server process is usually a persistent service, running indefinitely waiting for client requests
Web Services
Within the basic client/server arrangement, there are a number of subtle distinctions.
A standard web server can offer web services through CGI executables: it listens to port 80, and if a request asks for a particular CGI to be executed to generate a response then it does it.
Alternatively, can have a standalone web service which listens to a particular port and deals with requests.
Note on Ports
When an application “talks” to an application on another machine, it does so through numbered “ports”.
There is at most one application listening to each port, with reserved port numbers for particular services (/etc/services on a Unix system):
• 21 ftp
• 22 ssh
• 80 http
Firewalls restrict which ports are left open, and hence control external communication.
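The reserved port numbers can also be looked up from a program rather than read from /etc/services by hand. The sketch below uses the standard getservbyname() call; the wrapper name reserved_tcp_port is an assumption for illustration, and the result depends on what the local services database contains.

```c
#include <netdb.h>       /* getservbyname, struct servent */
#include <arpa/inet.h>   /* ntohs */
#include <stddef.h>      /* NULL */

/* Return the reserved TCP port for a named service, or -1 if the
   system's services database has no entry for it. */
int reserved_tcp_port(const char *service)
{
    struct servent *entry = getservbyname(service, "tcp");

    if (entry == NULL)
        return -1;

    /* s_port is stored in network byte order */
    return (int) ntohs((unsigned short) entry->s_port);
}
```

On a typical Unix system reserved_tcp_port("http") would be expected to give 80 and reserved_tcp_port("ssh") to give 22, matching the list above.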
Web Services
What about handling multiple requests from different clients?
• could queue them up and process them one at a time
• could spin off a separate thread (or fork a separate process) to deal with each one
Web Services
What about handling multiple requests from the same client?
If the history of the interaction needs to be maintained (persistence), this can be done by opening a communication channel and maintaining it (keepalive) until the client closes it, or there’s a timeout.
(In this case, should use a separate thread or process for each client.)
Web Services
Standards are crucial for interoperability of web services.
SOAP (Simple Object Access Protocol) defines the RPC interaction:
• XML for the main content (request and response)
• optional MIME attachment (just like email)
• http/https to send the SOAP messages
There is no restriction on the choice of language for implementing the server or client application.
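To make the "XML for the main content" point concrete, the sketch below formats a SOAP 1.1 request envelope by hand. The operation name add, the namespace urn:calc and the element names a and b are assumptions chosen to match the calculator example in these slides; a real toolkit would normally add further namespace and encoding declarations, so this is only an illustration of the message shape.

```c
#include <stdio.h>

/* Write a SOAP 1.1 request envelope for a hypothetical "add" operation
   in namespace urn:calc into buf. Returns the number of characters
   written (excluding the terminator), or -1 if buf is too small. */
int build_add_request(char *buf, size_t size, double a, double b)
{
    int n = snprintf(buf, size,
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        "<SOAP-ENV:Envelope"
        " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
        " xmlns:ns=\"urn:calc\">\n"
        " <SOAP-ENV:Body>\n"
        "  <ns:add><a>%g</a><b>%g</b></ns:add>\n"
        " </SOAP-ENV:Body>\n"
        "</SOAP-ENV:Envelope>\n", a, b);

    return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

The resulting text would be sent as the body of an http/https request; the response comes back as a similar envelope containing the result.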
Web Services
Language-specific support for creating web services includes:
• Java: IBM Websphere, Sun ONE, Borland JBuilder, lots of others
• C#: Microsoft .NET
• Python: ZSI (Zolera Soap Infrastructure)
• C/C++: gSOAP
gSOAP
gSOAP is a package for generating web service servers and clients in C/C++:
• a pre-processor generates additional C/C++ files given a header file specification of the RPC routines
• there are also some gSOAP files which contain the code to do all the conversion of data to/from XML
• the distribution includes 150 pages of documentation and lots of example applications
gSOAP
The example is a web service calculator which takes two numbers and adds or subtracts them.
For this application, the user writes 3 files:
• calc.h: a header file defining the RPC routines
• calcserver.c: the server code
• calcclient.c: the client code
calc.h
//gsoap ns service name: calc
//gsoap ns schema namespace: urn:calc
int ns__add(double a, double b, double *result);
int ns__sub(double a, double b, double *result);
The ns__ prefix and the gSOAP declarations avoid ambiguities if an application needs to use two services with the same RPC names.
calcserver.c
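#include <math.h>
#include "soapH.h"
#include "calc.nsmap"

int main(int argc, char **argv)
{ int m, s;   /* master and slave sockets */
  struct soap soap;
  soap_init(&soap);

  /* listen on port 80 with a backlog of 100 pending connections */
  m = soap_bind(&soap, NULL, 80, 100);

  for ( ; ; )
  { s = soap_accept(&soap);   /* wait for the next client request */
    soap_serve(&soap);        /* dispatch to ns__add or ns__sub */
    soap_end(&soap);          /* clean up after the request */
  }

  return 0;
}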
calcserver.c
int ns__add(struct soap *soap, double a, double b, double *result)
{ *result = a + b;
  return SOAP_OK;
}

int ns__sub(struct soap *soap, double a, double b, double *result)
{ *result = a - b;
  return SOAP_OK;
}
calcclient.c
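#include <stdio.h>
#include <stdlib.h>
#include "soapH.h"
#include "calc.nsmap"

const char server[] = "http://booth10.ecs.ox.ac.uk:80";

int main(int argc, char **argv)
{ struct soap soap;
  double a, b, result;

  soap_init(&soap);

  /* argv[1] selects the operation; argv[2] and argv[3] are the operands */
  a = strtod(argv[2], NULL);
  b = strtod(argv[3], NULL);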
calcclient.c
  switch (*argv[1])
  { case 'a':
      soap_call_ns__add(&soap, server, "", a, b, &result);
      break;
    case 's':
      soap_call_ns__sub(&soap, server, "", a, b, &result);
      break;
  }

  if (soap.error)
    soap_print_fault(&soap, stderr);
  else
    printf("result = %g\n", result);

  return 0;
}
gSOAP
Additional features:
• multiple results handled by a result structure
• dynamic arrays handled by a structure with size and pointer
• keepalive for services needing persistence
• https and SSL for security
• zlib and gzip compression
• MIME attachments
Final comments
Web services are likely to become very useful in linking Windows PCs on the desktop to Unix servers in the back-office:
• web clients on Windows PCs written in Java/C#/Visual Basic using Microsoft’s
• web services on Unix servers written in C/C++ using gSOAP
• much more dynamic/responsive than using software like Grid Engine.
Numerically Intensive Computing in Finance
Lecture 4: Processor and Memory Technology
Mike [email protected]
Processor Technology
Why discuss processor technology?
• interesting to learn how Moore’s Law is being upheld
• interesting, because there’s lots of parallelism in the CPU hidden from the programmer/user
• important, because a better understanding enables an expert programmer to get better performance
• important, because it affects whether higher-level parallelism involves 10’s of processors, or 1000’s
Ideal Von Neumann Processor
• each cycle, CPU takes data from registers, does an operation, and puts the result back
• load/store operations (memory ←→ registers) also take one cycle
• CPU can do different operations each cycle
• output of one operation can be input to next
[Timing diagram: op1, op2 and op3 shown one after another along the time axis]
CPU’s haven’t been this simple for a long time!
Pipelining
Pipelining is a technique in which multiple instructions are overlapped in execution.
[Timing diagram: three overlapping instructions along the time axis, each passing through pipeline stages 1 to 5]
• 1 result per cycle after pipeline fills up
• improved utilisation of hardware
• major complication – an output can only be used as input for an operation starting later
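The "1 result per cycle after pipeline fills up" point can be turned into a simple cycle count. The sketch below assumes an idealised pipeline that starts one independent operation every cycle with no stalls; the function names are illustrative.

```c
/* Cycles to complete n independent operations on a pipeline with
   `stages` stages, starting one new operation per cycle: the first
   result appears after `stages` cycles, then one more every cycle. */
long pipelined_cycles(long n, long stages)
{
    return (n > 0) ? stages + (n - 1) : 0;
}

/* Same work with no overlap: each operation occupies all stages
   before the next one can start. */
long unpipelined_cycles(long n, long stages)
{
    return n * stages;
}
```

For the 5-stage pipeline in the diagram, 1000 independent operations take 1004 cycles instead of 5000, so the speedup approaches the number of stages as n grows. Dependent operations lose this benefit, which is the complication noted in the last bullet.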
Superscalar Processors
Most processors have multiple pipelines for different tasks, and can start a number of different operations each cycle.
Example: Sun Microsystems UltraSPARC III
• 2 integer pipes
• 1 floating-point (FP) multiply pipe
• 1 FP addition/subtraction pipe
• in principle, capable of producing 2 integer and 2 FP results per cycle
• FP division uses both FP pipes and is very slow (29 cycles)
Technical Challenges
• compiler to extract best performance, reordering instructions if necessary
• controller to handle multiple pipelines (sometimes with out-of-order execution)
• memory hierarchy to deliver data to registers fast enough to feed the processor
• tricks to avoid delays waiting for data (pipeline stall)
• tricks to avoid delays due to conditional branching (loops, logical tests)
These all limit the number of pipelines that can be used effectively.
Programmer Assistance
The programmer can help the compiler by providing more scope for re-ordering operations – common trick is loop unrolling with added benefit of less branching.
for (i=0; i<1000; i++) {
  x += sqdt*rand[i];
}
Problem: each multiply must complete before addition, and looping probably forces addition to complete before next multiply.
Programmer Assistance
for (i=0; i<1000; i+=4) {
x += sqdt*rand[i];
x += sqdt*rand[i+1];
x += sqdt*rand[i+2];
x += sqdt*rand[i+3];
}
Each addition must complete before next addition, but multiplies are now almost fully overlapped.
Note: need a “remainder” loop when loop range is not perfectly divisible by unrolling factor.
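A minimal sketch of the remainder loop mentioned in the note, for an unrolling factor of 4 and an arbitrary loop range n. The function name and the array name z are illustrative (the slide uses rand[] and a fixed range of 1000).

```c
/* Accumulate x += sqdt*z[i] for i = 0..n-1, unrolled by 4,
   with a remainder loop for the last n % 4 elements. */
double accumulate_unrolled(double x, double sqdt, const double *z, int n)
{
    int i;
    int n4 = n - (n % 4);          /* largest multiple of 4 not exceeding n */

    for (i = 0; i < n4; i += 4) {  /* main unrolled loop */
        x += sqdt * z[i];
        x += sqdt * z[i + 1];
        x += sqdt * z[i + 2];
        x += sqdt * z[i + 3];
    }
    for ( ; i < n; i++) {          /* remainder loop: at most 3 iterations */
        x += sqdt * z[i];
    }
    return x;
}
```

Because the additions are still performed in the original order, the result is the same as for the simple loop; only the number of branches and the scope for overlapping the multiplies change.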
Programmer Assistance
To get more speedup, do 2 Monte-Carlo paths at same time:
for (i=0; i<1000; i+=2) {
  x1 += sqdt*rand[i];
  x2 += sqdt*rand[i+1000];
  x1 += sqdt*rand[i+1];
  x2 += sqdt*rand[i+1001];
}
Now enough scope for overlap to get almost full utilisation of a processor with a single 3-stage pipeline.
Compiler Optimisation
Need even more unrolling for multiple pipelines. Fortunately, the compiler will perform innermost loop unrolling, but sometimes needs to be told to do so – compiler directive.
Sun’s cc compiler also has different optimisation levels, giving a trade-off between compiler and code speed.
-fast does a variety of optimisations including multiplying by a reciprocal instead of dividing repeatedly by the same number, and optimisation for native hardware.
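The reciprocal optimisation can be written out by hand to see what the compiler is doing. This is an illustrative sketch with made-up function names, not the compiler's actual output.

```c
/* Divide every element by d: one (slow) division per element. */
void scale_by_division(double *v, int n, double d)
{
    for (int i = 0; i < n; i++)
        v[i] = v[i] / d;
}

/* Same intent with a single division: the reciprocal is computed
   once and each element is multiplied by it. */
void scale_by_reciprocal(double *v, int n, double d)
{
    double rd = 1.0 / d;
    for (int i = 0; i < n; i++)
        v[i] = v[i] * rd;
}
```

In floating-point arithmetic the two versions can differ in the last bit, because 1.0/d is itself rounded, which is why compilers generally apply this transformation only under aggressive optimisation flags.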
Current Trends
• clock cycle no longer reducing, due to problems with power consumption (up to 130W per chip)
• gates/chip still doubling every 24 months
  ⇒ more on-chip memory and MMU (memory management units)
  ⇒ specialised hardware (e.g. multimedia, encryption)
  ⇒ multi-core (multiple CPU’s on one chip)
• peak performance of chip still doubling every 12-18 months
Intel chips
“Conroe” desktop chip:
• dual-core chip running at 2.66GHz
• 14-stage pipelines capable of 4 operations per cycle
Others:
• dual-core Core Duo already out in laptops
• dual-core “Woodcrest” for servers soon
• 90% of all sales dual-core by end of 2006
• quad-core chips by early 2007
AMD chips
Athlon X2 desktop chip:
• dual-core chip running at up to 2.4GHz, using 90-110W
• quad-core in early 2007
Dual-core Opterons:
• dual-core chip running at up to 2.6GHz, using 55-95W
• up to 8-way (16-core) SMP systems
• quad-core due in early 2007
IBM chips
Power 5:
• 2 cores on a single chip
• each core runs two threads simultaneously, overlapping on different pipelines
IBM/Sony/Toshiba Cell chip:
• originally designed for new Sony Playstation
• has one Power 4 core plus 8 graphics cores
• now to be used as multi-core chip in new IBM blade system
SUN chips
Sparc VI:
• 2 cores, running at 2.4GHz using 120W
• Fujitsu developing quad-core variant for 2008?
UltraSparc T1 (“Niagara”) chip:
• 8 cores, running at up to 1.2GHz
• extra bits for encryption and data compression
• limited floating point performance
• intended for file servers / web servers
ClearSpeed
• startup company
• PCI-Express board with 2 compute chips
• each has 96 cores, running at 133MHz(?), using 10W
• ideally suited for Monte Carlo applications
• best performance/watt in marketplace?
Memory Hierarchy
Why discuss memory?
• more and more, it is the bottleneck in modern computer systems
• in some cases, it is possible to get much greater performance through minor changes to a code
• understanding how caches work is vital to understanding the operation and programming of shared-memory parallel computers
Memory Hierarchy
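[Diagram: memory hierarchy from main memory down to the registers; each level closer to the registers is faster, more expensive and smaller]
• Main memory: 1 – 8 GB, 400MHz DDR2 (100+ cycle access, 5GB/s)
• L2 Cache: 1 – 4 MB, 1GHz SRAM (12 cycle access, 20GB/s)
• L1 Cache: 64KB, 2GHz SRAM (2 cycle access)
• registers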
Memory Hierarchy
Execution speed relies on exploiting data locality:
• temporal locality: a data item just accessed is likely to be used again in the near future, so keep it in the cache
• spatial locality: neighbouring data is also likely to be used soon, so load them into the cache at the same time using a ‘wide’ bus (like a multi-lane motorway)
Caches
The cache line is the basic unit of data transfer; typical size is 128 bytes ≡ 16 × 8-byte items.
In a single cache system, when the CPU loads data into a register:
• looks for line in cache
• if there (hit), get data
• if not (miss), get entire line from main memory, displacing an existing line in cache (usually least recently used)
When the CPU stores data from a register:
• same procedure
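The hit/miss procedure above can be mimicked with a toy simulation. The sketch below models a single cache in which any line can go in any slot, with least-recently-used replacement; the capacity of 4 lines and all names are assumptions chosen to keep the example small, and real caches are organised differently in detail.

```c
#include <stdint.h>

#define LINE_BYTES 128   /* cache line size quoted on the slide */
#define NUM_LINES  4     /* deliberately tiny, illustrative capacity */

typedef struct {
    uint64_t tag[NUM_LINES];        /* which memory line each slot holds */
    uint64_t last_used[NUM_LINES];  /* time of most recent access */
    int      valid[NUM_LINES];
    uint64_t clock, hits, misses;
} cache_t;

void cache_reset(cache_t *c)
{
    for (int i = 0; i < NUM_LINES; i++) {
        c->tag[i] = 0;
        c->last_used[i] = 0;
        c->valid[i] = 0;
    }
    c->clock = c->hits = c->misses = 0;
}

/* Access one byte address: a hit if its line is already cached,
   otherwise load the line, displacing the least recently used one. */
void cache_access(cache_t *c, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    int victim = 0;

    c->clock++;
    for (int i = 0; i < NUM_LINES; i++) {
        if (c->valid[i] && c->tag[i] == line) {   /* hit */
            c->last_used[i] = c->clock;
            c->hits++;
            return;
        }
    }
    /* miss: use an empty slot if there is one, else the LRU slot */
    for (int i = 0; i < NUM_LINES; i++) {
        if (!c->valid[i]) { victim = i; break; }
        if (c->last_used[i] < c->last_used[victim]) victim = i;
    }
    c->valid[victim] = 1;
    c->tag[victim] = line;
    c->last_used[victim] = c->clock;
    c->misses++;
}
```

Reading 64 consecutive 8-byte items touches only 4 lines, so there are 4 misses and 60 hits; reading 64 items spaced 128 bytes apart misses every time.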
Caches
What happens when a cache line is modified?
Write-through cache:
• modified line is immediately written to the main memory
• main memory stays up-to-date
• generates lots of memory traffic
Write-back cache:
• modified line is only written to main memory when it gets displaced from the cache
• much less memory traffic
• main memory may not have latest values – potential problem for parallel computing
Caches
Multi-level caches
All major processors use at least two levels of cache:
• primary cache is small (e.g. 64KB), on-chip and write-through
• secondary cache is larger (e.g. 2MB), usually on-chip and write-back
• if there is a third level cache, then it is even larger, off-chip and write-back
Importance of Locality
Typical workstation:
  2 Gflops CPU
  5 GB/s memory ←→ L2 cache bandwidth
  128 bytes/line
5GB/s ≡ 40M line/s ≡ 600M reals/s
At worst, each flop requires 2 inputs and has 1 output, forcing loading of 3 lines ⇒ 13 Mflops
If all 16 variables/line are used, then this increases to 200 Mflops.
To get up to 2Gflops needs temporal locality, re-using data already in the cache.
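The arithmetic behind these figures, written out step by step. The slide rounds freely, so the intermediate values below are shown before and after rounding; "reals" are taken to be 8-byte items, as in the cache line description.

```latex
\begin{align*}
\frac{5\times 10^{9}\ \text{bytes/s}}{128\ \text{bytes/line}}
  &\approx 3.9\times 10^{7}\ \text{lines/s} \approx 40\text{M lines/s} \\[4pt]
3.9\times 10^{7}\ \text{lines/s} \times 16\ \text{reals/line}
  &\approx 6.25\times 10^{8}\ \text{reals/s} \approx 600\text{M reals/s} \\[4pt]
\text{worst case:}\quad \frac{40\text{M lines/s}}{3\ \text{lines/flop}}
  &\approx 13\ \text{Mflops} \\[4pt]
\text{all 16 items per line used:}\quad \frac{600\text{M reals/s}}{3\ \text{reals/flop}}
  &= 200\ \text{Mflops}
\end{align*}
```

Even the better figure is only a tenth of the 2 Gflops peak, which is why re-use of data already in the cache is needed to close the gap.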
Loop Ordering
A 2D finite difference code typically has loops of the form
for (i=0; i<1000; i++) {
  for (j=0; j<1000; j++) {
    u[id(i,j)] = ...
  }
}
where id(i,j) maps the indices (i,j) to a unique element of u.
Question: would it be more efficient to re-order the loops?
Loop Ordering
The answer depends on the function id(i,j).
If we use
id(i,j) = i + j*imax
then id(3,7) is next to id(4,7), but not id(3,8).
Multiple dimensions are handled similarly, with the lower dimensions varying most rapidly.
Loop Ordering
Consequently, in the FD example, it is best to have the i loop innermost, to access the elements of u[id(i,j)] sequentially.
If the j loop is innermost, then the cache line with element u[id(i,j)] may have been displaced by the time that u[id(i+1,j)] is to be computed.
This can have very dramatic consequences!
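The two loop orders can be written side by side as follows. This is a sketch: id takes imax as an extra argument so that it is self-contained, and the right-hand side (i + j) is a placeholder for whatever the finite difference update actually computes.

```c
#include <stddef.h>

/* Index map from the slides: i varies fastest in memory. */
static size_t id(size_t i, size_t j, size_t imax)
{
    return i + j * imax;
}

/* Cache-friendly order: j outer, i inner, so consecutive iterations
   touch consecutive elements of u. */
void fill_i_innermost(double *u, size_t imax, size_t jmax)
{
    for (size_t j = 0; j < jmax; j++)
        for (size_t i = 0; i < imax; i++)
            u[id(i, j, imax)] = (double) (i + j);   /* placeholder update */
}

/* Same result, but consecutive iterations are imax elements apart,
   so each access may land on a different cache line. */
void fill_j_innermost(double *u, size_t imax, size_t jmax)
{
    for (size_t i = 0; i < imax; i++)
        for (size_t j = 0; j < jmax; j++)
            u[id(i, j, imax)] = (double) (i + j);   /* placeholder update */
}
```

Both functions produce identical arrays; only the order in which memory is visited differs, and timing the two on a large grid is a simple way to see the effect of spatial locality.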
Current Trends
Memory hierarchy seems likely to remain – very high speed memory is too expensive
Importance of cache lines and data locality is likely to remain – transferring multiple bits of data in parallel is only way to get high throughput
Best we can hope for is that compilers will handle code optimisation, but remember, high-end numerical computing is not a big driver.
Numerically Intensive Computing in Finance
Lecture 5: Shared-memory Multiprocessors
Mike [email protected]
Shared-memory Multiprocessors
CPU CPU CPU CPU CPU
cache cache cache cache cache
Main Memory
Conceptual arrangement:
• multiple CPU’s, each with own cache
• all linked to a unified main memory by a very high bandwidth interconnect
Shared-memory Multiprocessors
For historical reasons, they are also referred to as SMP systems – Symmetric Multi-Processors
“Symmetric” refers to the fact that all processors are equal
An asymmetric system is one in which there is a master processor, and a number of slaves – like the ClearSpeed card
Interconnect
One challenge in building shared-memory systems is achieving sufficient bandwidth between all of the processors and multiple “memory ports” (points of entry into the main memory)
• traditional PC bus is not scalable – fixed bandwidth shared between more and more processors
• scalable performance is achieved using commodity crossbar (full interconnect) chips originally developed for network switches
Cache Coherency
The other challenge in shared-memory multiprocessors is maintaining coherency with write-back caches
CPU1 CPU2 CPU3 CPU4 CPU5
cache cache cache cache cache
Main Memory
Suppose CPU2 loads and modifies variable X, and then CPU4 needs to load X – what happens?
Cache Coherency
The solution is a “snoopy bus” linking the caches; CPU2 spots the request from CPU4 and supplies the newer value for X.
CPU1 CPU2 CPU3 CPU4 CPU5
cache cache cache cache cache
Main Memory
Cache Coherency
In the MESI cache coherency protocol, a cache line can be in one of 4 states:
• Modified: sole owner of modified line
• Exclusive: sole owner, not modified
• Shared: shared ownership, not modified
• Invalid: incorrect data
[Figure: state transition diagram linking M, E, S and I, with arrows labelled “write”, “write by other” and “read by other”]
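The transitions in this kind of protocol can be written down as a small state machine. The sketch below is a textbook-style reading of MESI, not a description of any particular processor (real hardware often uses variants such as MOESI or MESIF). The names mesi_state, mesi_event and mesi_next, and the others_have_copy flag, are made up for illustration.

```c
#include <stdbool.h>

/* The four MESI states of one cache line, as seen by one cache. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_state;

/* Events seen by that cache: accesses by its own CPU, and accesses by
   other CPUs observed on the snoopy bus. */
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } mesi_event;

/* Next state of the line. others_have_copy only matters for a local
   read of an Invalid line: it selects Shared versus Exclusive. */
mesi_state mesi_next(mesi_state s, mesi_event e, bool others_have_copy)
{
    switch (e) {
    case LOCAL_WRITE:
        return MODIFIED;            /* writer becomes sole owner of a modified line */
    case REMOTE_WRITE:
        return INVALID;             /* our copy is now out of date */
    case REMOTE_READ:
        /* M and E lose sole ownership (M supplies its newer data first) */
        return (s == INVALID) ? INVALID : SHARED;
    case LOCAL_READ:
        if (s != INVALID)
            return s;               /* cache hit: state unchanged */
        return others_have_copy ? SHARED : EXCLUSIVE;
    }
    return s;                       /* not reached for valid events */
}
```

For the earlier CPU2/CPU4 example: CPU2's line goes Invalid, then Exclusive on the load, then Modified on the write, and drops to Shared when CPU4's read is seen on the bus.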
Cache Coherency
Note: don’t want different processors “fighting” for ownership of the same cache line – can give very bad performance
As with the main system bus, the snoopy bus has problems scaling to large numbers of processors.
There have been alternative methods used in large shared-memory NUMA (Non-Uniform Memory Access) machines, but they were expensive.
Shared-memory Computing
A key distinction: processors and processes
A processor is a piece of hardware which can execute instructions
A process is a program consisting of a set of instructions
At any instant, there is precisely one process executing on each processor, but the same process may be executing on more than one processor
Shared-memory Computing
In a shared-memory system, a user application is a single Unix process, with a number of virtual memory pages holding the user’s data, and a number of “threads” working on it.
Very like having a project being carried out by a pool of “workers”:
• some tasks can only be done by a single worker, while the rest wait around
• other tasks can be carried out in parallel by many workers
• key is deciding what can be done in parallel, avoiding conflicts between workers
Shared-memory Computing
The operating system is itself a multithreaded application, with perhaps one thread handling disk i/o, one network i/o, one task scheduling, etc.
Task scheduling for multiple users is particularly important:
• system maintains a list of active processes
• each process gets given its turn for execution for a few milliseconds, and then is put to the back of the queue to wait for its next turn
• multithreaded processes are usually executed on a corresponding number of processors
Static and Stack Memory Management
To understand some aspects of shared-memory programming, need to know how compilers handle data within programs.
Static allocation means that the compiler decides at compilation time where the data will sit within the user’s virtual memory.
Stack allocation means it’s handled on-the-fly during execution, as needed.
Static and Stack Memory Management
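In C, static allocation is specified through the use of the static keyword

    #include <stdio.h>

    /* n==0: reset the counter; n==1: increment it; otherwise print it */
    void counter(int n)
    {
      static int kount;   /* persists between calls */

      if (n==0)
        kount = 0;
      else if (n==1)
        kount = kount + 1;
      else
        printf("%d", kount);
    }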
Static and Stack Memory Management
In C, stack allocation is the default, enabling routines to be used recursively

    int factorial(int n)
    {
      int fact, nm;   /* fresh copies on the stack for every call */

      if (n<=1)       /* also covers n<1, which would leave fact unset */
        fact = 1;
      else {
        nm = n-1;
        fact = n*factorial(nm);
      }

      return fact;
    }
Static and Stack Memory Management
These two examples show the key aspects of each approach.
Static allocation is persistent, continuing after a routine finishes – may be more efficient because no run-time allocation is needed.
Stack allocation is transient, with fresh allocation each time a routine starts, disappearing when it finishes.
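The difference can be seen directly by putting the same counter in both kinds of storage. This is a minimal sketch; the function names calls_static and calls_stack are made up for illustration.

```c
/* Returns how many times it has been called: the static counter
   persists between calls. */
int calls_static(void)
{
    static int kount = 0;   /* allocated once, initialised before first use */
    kount = kount + 1;
    return kount;
}

/* The automatic (stack) variable is created afresh on every call,
   so this always returns 1. */
int calls_stack(void)
{
    int kount = 0;          /* new copy on the stack each call */
    kount = kount + 1;
    return kount;
}
```

Three successive calls to calls_static return 1, 2, 3, whereas calls_stack returns 1 every time.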
Static and Stack Memory Management
So far, have considered only sequential processes – what about multi-threading?
Simple example – suppose two threads want to print something out at the same time.
The libraries that handle printing have internal data. To avoid conflict, each call needs stack allocation giving independent private data – “thread-safe” libraries (often not the default because they’re less efficient).
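The same issue shows up in any routine that keeps its working data in static storage. The sketch below contrasts such a routine with a re-entrant version in which the caller supplies the buffer; label_static and label_r are hypothetical names, not functions from any real library.

```c
#include <stdio.h>

/* NOT thread-safe: every call writes into the same static buffer, so a
   second call (from this or another thread) overwrites the first result. */
const char *label_static(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "path-%d", id);
    return buf;
}

/* Re-entrant version: the caller supplies the buffer, typically a local
   (stack) array, so each thread works on its own private data. */
const char *label_r(int id, char *buf, size_t len)
{
    snprintf(buf, len, "path-%d", id);
    return buf;
}
```

Calling label_static(1) and then label_static(2) returns the same pointer twice, and both results now read "path-2". With label_r and two separate local buffers, both strings survive.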
Static and Stack Memory Management
More generally, in multithreaded applications there is the important concept of shared and private data:
Private data belongs to a particular thread
• it is allocated on its own private stack
• it can be seen and changed only by that thread
Shared data is visible to all threads
• it is either statically allocated, or allocated on a master stack
• any of the threads can change its value
Shared-memory Programming
In general terms, there are two levels of shared-memory programming.
At a low-level, one can start several threads and then explicitly tell each what to do – in this case, the code will have instructions such as “if this is thread 3 then do the following ... ”.
Shared-memory Programming
This is very flexible, but involves tedious programming.
In general I would recommend it only to experienced programmers wanting to do an application in which different threads are doing entirely different things – e.g. one thread is managing network i/o, one is running an experiment, one is handling terminal i/o.
For C programs, POSIX pthreads is the standard, but I have very little experience of using it.
Shared-memory Programming
The higher-level approach is to tell the compiler what can be done in parallel, and let it automatically generate the code to handle the multiple threads.
In this case, typically there is a master thread which is always active, and a bunch of other threads which spring into action for parallel loops, and hibernate in between.
OpenMP is the standard for this higher-level approach, superseding the many vendor-specific versions that used to exist.
Numerically Intensive Computing in Finance
Lecture 6: OpenMP Programming
Mike Giles
Overview
The code is executed sequentially by a master thread except for regions (e.g. loops) which are explicitly declared to be done in parallel.
The extra threads hibernate during the sequential sections, are activated during the parallel sections, then get suspended again. This is all handled by the compiler and the run-time execution environment.
The programmer is responsible for saying what is to be done in parallel. If the programmer makes a mistake, execution may be slow and/or incorrect.
parallel for
The parallel for directive says the next loop is to be executed in parallel:

    #pragma omp parallel for \
            private(i,du) shared(u,v)
    for (i=0; i<imax; i++) {
      du = v[i]*v[i];
      u[i] += du;
    }

Note the specification of private and shared variables. The default is that loop indices are private, and everything else is shared.
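For reference, here is the same loop wrapped in a complete function that can be compiled and called. The function name add_squares and its argument list are assumptions for illustration. With gcc, OpenMP is enabled by -fopenmp; without it the pragma is ignored and the loop simply runs sequentially.

```c
/* u[i] += v[i]^2 for i = 0..imax-1, with the iterations divided
   between the threads. */
void add_squares(double *u, const double *v, int imax)
{
    int i;
    double du;

#pragma omp parallel for private(i, du) shared(u, v)
    for (i = 0; i < imax; i++) {
        du = v[i] * v[i];   /* du is private: one copy per thread */
        u[i] += du;         /* each i touches a different element of u */
    }
}
```

The loop is safe to parallelise because no iteration reads or writes data touched by another iteration.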
parallel for
Private variables are defined to exist transiently within the loop:
• uninitialised on entry to the loop
• undefined on exit from the loop
If there is a pre-existing global variable with the same name, it is undefined what happens to this – avoid this!
Conceptually, the du variable in the previous example becomes du_n where n is the thread number, making these variables different from any global variable du.
parallel for
There is control over how the loop iterations are divided between the threads, through an optional schedule argument.
schedule(static) splits the loop range into (almost) equal chunks, one for each thread. This is usually the default (strictly, the default is left to the implementation), and the best for simple loops with equal work per iteration
schedule(static,n) uses chunks of size n, assigned to threads in simple rotation.
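The "simple rotation" can be made concrete with a one-line formula for which thread gets which iteration. This is only a model of the assignment rule for schedule(static,n), written as ordinary C; static_chunk_owner is a made-up helper, not part of OpenMP.

```c
/* Thread that executes iteration i under schedule(static, chunk)
   with nthreads threads: chunks are dealt out in rotation,
   starting with thread 0. */
int static_chunk_owner(int i, int chunk, int nthreads)
{
    return (i / chunk) % nthreads;
}
```

With chunks of size 2 and 3 threads, iterations 0-1 go to thread 0, 2-3 to thread 1, 4-5 to thread 2, and 6-7 back to thread 0.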
parallel for
schedule(dynamic,n) uses chunks of size n assigned to threads when they complete the previous chunk. This is the best choice when the work per loop iteration varies considerably.

    #pragma omp parallel for \
            private(i) shared(u) schedule(dynamic,n)
    for (i=0; i<imax; i++) {
      if (u[i] < 0) {
        u[i] = small_work(u[i]);
      }
      else {
        u[i] = big_work(u[i]);
      }
    }
parallel for
With nested loops, remember it is the loop immediately after the directive that is parallelised.

    #pragma omp parallel for \
            private(i,j) shared(u)
    for (j=0; j<jmax; j++) {
      for (i=0; i<imax; i++) {
        u[id(i,j)] = ...
      }
    }

Here the j loop is parallelised.
parallel for
    for (j=0; j<jmax; j++) {
    #pragma omp parallel for \
            private(i) shared(j,u)
      for (i=0; i<imax; i++) {
        u[id(i,j)] = ...
      }
    }

Here the i loop is parallelised.
In general, parallelising the outer loop is best (less starting and suspending of threads) except when the outer loop is over a small range (poor load balancing – e.g. when there are 4 threads and jmax = 5).
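The load-balancing remark can be quantified with a little arithmetic. The sketch below assumes every iteration costs the same and that a static split gives the busiest thread ceil(jmax/nthreads) iterations; static_efficiency is a made-up helper name.

```c
/* Parallel efficiency of a static split of jmax equal-cost iterations
   over nthreads threads: ideal time divided by actual time. */
double static_efficiency(int jmax, int nthreads)
{
    int busiest = (jmax + nthreads - 1) / nthreads;   /* ceil(jmax/nthreads) */
    return (double)jmax / ((double)nthreads * busiest);
}
```

For 4 threads and jmax = 5, one thread has to do 2 iterations while the others do 1, so the loop takes 2 time units instead of the ideal 1.25, an efficiency of 0.625. With jmax = 8 the efficiency is 1.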
parallel for
What can go wrong? What about

    sum = 0;

    #pragma omp parallel for \
            private(i,ds) shared(u,sum)
    for (i=0; i<imax; i++) {
      ds = u[i]*u[i];
      sum += ds;
    }

This is likely to give incorrect results because of the accumulation into sum.
parallel for
What’s the problem? Consider two threads.

    time   Thread 1      Thread 2
      |    load sum
      |                  load sum
      |    add ds
      |                  add ds
      |    store sum
      v                  store sum

The overlapped additions to sum mean that the first thread’s contribution gets lost.
parallel for
First solution uses critical directive to say only one thread at a time can work with sum

    sum = 0;

    #pragma omp parallel for \
            private(i,ds) shared(u,sum)
    for (i=0; i<imax; i++) {
      ds = u[i]*u[i];
    #pragma omp critical
      {sum += ds;}
    }

This will give valid results.
parallel for
Second solution uses atomic update, making the load, add, store sequence act as a single instruction – only possible for single operations.

    sum = 0;

    #pragma omp parallel for \
            private(i,ds) shared(u,sum)
    for (i=0; i<imax; i++) {
      ds = u[i]*u[i];
    #pragma omp atomic
      sum += ds;
    }

This will give valid results.
parallel for
Although both of these solutions will give valid results, the performance will be appalling because the different threads will fight over access to the cache line holding the shared sum variable.
Instead, use the special reduction clause

    sum = 0;

    #pragma omp parallel for \
            private(i,ds) shared(u) reduction(+:sum)
    for (i=0; i<imax; i++) {
      ds = u[i]*u[i];
      sum += ds;
    }
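As a complete function, the reduction version might look like the sketch below. The name sum_of_squares and its arguments are assumptions for illustration. If the compiler is run without OpenMP enabled, the pragma is ignored and the loop gives the same answer sequentially.

```c
/* Sum of u[i]^2 for i = 0..imax-1, accumulated with an OpenMP reduction. */
double sum_of_squares(const double *u, int imax)
{
    int i;
    double ds, sum = 0.0;

#pragma omp parallel for private(i, ds) shared(u) reduction(+:sum)
    for (i = 0; i < imax; i++) {
        ds = u[i] * u[i];
        sum += ds;          /* each thread adds into its own copy of sum */
    }
    return sum;             /* the per-thread copies have been combined here */
}
```

Because the threads add their partial sums in a different order from the sequential loop, the floating-point result can differ in the last few bits from run to run.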
parallel for
How does the compiler get good performance?
It creates temporary private variables sum_local to accumulate the partial sums for each thread, then at the end combines them with the shared variable sum.
Works with other reduction operators such as min, max, -, *.
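Written out by hand, the idea looks roughly like this. It is a sketch of the principle only, not what any particular compiler actually generates; the function name sum_of_squares_manual is made up, and sum_local follows the naming in the text.

```c
/* Sum of u[i]^2 using explicit per-thread partial sums. */
double sum_of_squares_manual(const double *u, int imax)
{
    double sum = 0.0;

#pragma omp parallel shared(u, imax, sum)
    {
        double sum_local = 0.0;   /* declared inside the region, so private */
        int i;

#pragma omp for
        for (i = 0; i < imax; i++)
            sum_local += u[i] * u[i];

#pragma omp critical
        sum += sum_local;         /* one update per thread, not per iteration */
    }
    return sum;
}
```

The shared variable is now touched once per thread rather than once per iteration, so there is very little fighting over its cache line.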
parallel for
Another example of data dependencies is Gauss-Seidel iteration.

    #pragma omp parallel for private(i,j) shared(u)
    for (j=0; j<jmax; j++) {
      for (i=0; i<imax; i++) {
        u[id(i,j)]=0.25*(u[id(i-1,j)]+u[id(i+1,j)]
                        +u[id(i,j-1)]+u[id(i,j+1)]);
      }
    }

This will produce incorrect results because it does not respect the fact that u[id(10,10)] should be updated after u[id(9,9)]
parallel for
To parallelise Gauss-Seidel correctly, first need to identify inherent parallelism – all entries along i + j = const can be updated in parallel.
[Figure: 9×9 grid of points, with diagonal lines joining the points along i + j = const]
parallel for
Hence parallelise first half of loop as

    for (k=0; k<imax; k++) {
    #pragma omp parallel for private(i,j) shared(k,u)
      for (i=0; i<=k; i++) {
        j = k - i;
        u[id(i,j)]=0.25*(u[id(i-1,j)]+u[id(i+1,j)]
                        +u[id(i,j-1)]+u[id(i,j+1)]);
      }
    }

and do the second half similarly.
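Both halves can be handled by one loop over the diagonals if the range of i is clipped to the grid. The sketch below makes several assumptions not in the slides: a square n-by-n grid stored row by row, fixed boundary values so that only interior points 1..n-2 are updated, and a three-argument id helper. A sequential sweep is included for comparison.

```c
#include <stddef.h>

/* Index of grid point (i,j) in an n-by-n array stored row by row. */
static size_t id(int i, int j, int n) { return (size_t)j * n + i; }

/* Update one interior point from its four neighbours. */
static void update(double *u, int i, int j, int n)
{
    u[id(i, j, n)] = 0.25 * (u[id(i - 1, j, n)] + u[id(i + 1, j, n)]
                           + u[id(i, j - 1, n)] + u[id(i, j + 1, n)]);
}

/* Reference: ordinary sequential Gauss-Seidel sweep. */
void gs_sequential(double *u, int n)
{
    for (int j = 1; j < n - 1; j++)
        for (int i = 1; i < n - 1; i++)
            update(u, i, j, n);
}

/* Wavefront sweep: diagonals k = i + j are visited in order, and the
   points on one diagonal are updated in parallel. */
void gs_wavefront(double *u, int n)
{
    for (int k = 2; k <= 2 * (n - 2); k++) {
        int ilo = (k - (n - 2) > 1) ? k - (n - 2) : 1;   /* keeps j <= n-2 */
        int ihi = (k - 1 < n - 2) ? k - 1 : n - 2;       /* keeps j >= 1   */
        int i;

#pragma omp parallel for private(i) shared(u, n, k, ilo, ihi)
        for (i = ilo; i <= ihi; i++)
            update(u, i, k - i, n);
    }
}
```

A point on diagonal k only reads neighbours on diagonals k-1 (already updated) and k+1 (not yet updated), so the wavefront sweep gives the same values as the sequential sweep. The price is one parallel loop start-up per diagonal, and short diagonals near the corners.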
Other OpenMP Directives
• parallel sections and section
  Defines a number of sections of code to be handled by multiple threads, one per section.
• parallel
  Most general parallel construct, defining code to be executed by multiple threads, often with low-level control over what each thread does, based on its thread number.
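As a small illustration of parallel sections, the sketch below gives two independent scans of an array to two sections. The function min_and_max and its arguments are made up for this example. Without OpenMP enabled the two sections simply run one after the other.

```c
/* Find the minimum and maximum of u[0..n-1] (n >= 1), with the two
   independent scans placed in separate sections. */
void min_and_max(const double *u, int n, double *umin, double *umax)
{
    double lo = u[0], hi = u[0];

#pragma omp parallel sections shared(u, n, lo, hi)
    {
#pragma omp section
        {
            double m = u[0];            /* local to this section */
            for (int i = 1; i < n; i++)
                if (u[i] < m) m = u[i];
            lo = m;                     /* only this section writes lo */
        }
#pragma omp section
        {
            double m = u[0];
            for (int i = 1; i < n; i++)
                if (u[i] > m) m = u[i];
            hi = m;                     /* only this section writes hi */
        }
    }
    *umin = lo;
    *umax = hi;
}
```

Each section accumulates into its own local variable and writes the shared result once, which avoids the two threads repeatedly touching neighbouring shared variables.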
Financial Applications
For financial applications (and most others too) parallel for with shared, private and reduction clauses should be all that is needed.
Monte Carlo
• use parallel for for parallel execution of paths, with reduction to combine the results to get average value
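A minimal sketch of this Monte Carlo pattern is given below. Everything specific in it is an assumption made for illustration: the toy price model (each step multiplies the price by up or 1/up with equal probability, which avoids needing the maths library), the call payoff, the function names, and the small splitmix64-style generator, which is not a recommendation for production random number generation.

```c
#include <stdint.h>

/* splitmix64-style generator: advances *state and returns 64 random bits. */
static uint64_t next_u64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Average call payoff max(S_T - K, 0) over npaths random paths. */
double mc_call_payoff(double s0, double strike, double up,
                      int nsteps, long npaths, uint64_t seed)
{
    double sum = 0.0;
    long p;

#pragma omp parallel for private(p) reduction(+:sum)
    for (p = 0; p < npaths; p++) {
        /* Each path gets its own generator state derived from the path
           number, so the result does not depend on which thread runs it. */
        uint64_t key = seed + (uint64_t)p;
        uint64_t state = next_u64(&key);
        double s = s0;
        for (int n = 0; n < nsteps; n++)
            s = (next_u64(&state) >> 63) ? s * up : s / up;
        double payoff = (s > strike) ? s - strike : 0.0;
        sum += payoff;
    }
    return sum / (double)npaths;
}
```

The paths are completely independent, so the only shared quantity is the running total, and the reduction clause takes care of that.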
Financial Applications
Multi-dimensional Black-Scholes solution
• for explicit FD methods, use parallel for for outermost grid dimension
• for implicit FD methods, details depend on the iterative solver
  – methods like GMRES and BiCGstab will need reduction for vector dot products
  – Gauss-Seidel and ILU preconditioners will require careful re-writing to expose inherent parallelism
Numerically Intensive Computing in Finance
Lecture 7: Distributed Memory Multiprocessors
Mike Giles
Idealisation
BSP hardware model (Valiant, McColl)
[Figure: six processor (P) / memory (M) nodes connected by a network]
• a number of processor/memory nodes connected by a ‘network’
• each processor has fast access to local memory and slow access to remote memory
• real hardware differs in having usual memory/cache/register hierarchy
Network
IBM’s Blue Gene uses a 3D torus, a higher-dimensional generalisation of a 2D network array
Network
Clusters use a commodity switch (Gigabit Ethernet, Myrinet, Infiniband)
Key performance measures are:
• latency – minimum time to communicate between two processors
• bandwidth per processor
Network
Gigabit Ethernet
• latency: 1–2 ms if using TCP/IP; 50 µs if using custom drivers
• bandwidth per processor: 1Gb/s ≈ 100MB/s
• now standard for PCs/servers
10Gig Ethernet
• same latency as Gigabit Ethernet
• bandwidth per processor: 10Gb/s ≈ 1GB/s
• starting to be used for servers
Network
Myrinet (from Myricom)
• latency: 10 µs
• bandwidth per processor: 2–10Gb/s ≈ 250MB–1GB/s
• the current proprietary market leader for distributed-memory systems
Infiniband
• latency: 10 µs
• bandwidth per processor: 10–40Gb/s ≈ 1–5GB/s
• a new standard being adopted by major manufacturers, including IBM and SUN
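To see how latency and bandwidth figures like these combine, a common rough model takes the time for one message to be the latency plus the message size divided by the bandwidth. That linear model is an assumption here, not something stated in the slides, and the function names are made up.

```c
/* Estimated time in seconds to send one message of `bytes` bytes:
   a fixed start-up latency plus the time to push the data through. */
double transfer_time(double latency_s, double bandwidth_bytes_per_s, double bytes)
{
    return latency_s + bytes / bandwidth_bytes_per_s;
}

/* Message size in bytes at which the two terms are equal: shorter
   messages are dominated by latency, longer ones by bandwidth. */
double break_even_bytes(double latency_s, double bandwidth_bytes_per_s)
{
    return latency_s * bandwidth_bytes_per_s;
}
```

With a latency of 1 ms and 100MB/s, the break-even size is about 100kB, so small messages are almost pure latency. With 10 µs and 1GB/s it is about 10kB. Either way it pays to send a few large messages rather than many small ones.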
Distributed-memory Computing
On commodity clusters, each node has its own independent Unix operating system kernel
• completely independent computers, connected by a network
• each handles its own file i/o, network i/o, process scheduling
• if one machine “dies”, the rest carry on regardless
Distributed-memory Computing
Slightly different on the IBM Blue Gene
• micro-kernel on each node
• specialised functions such as file i/o only performed on certain nodes
• not clear what happens when one node “dies”
Distributed-memory Computing
User applications involve coordination between multiple processes
• problem data is split up between multiple processes
• typically, each process has unique use of one node, to avoid scheduling difficulties
• during program development, can run multiple processes on one node to test code
• processes communicate by sending messages to each other
Distributed-memory Computing
Basic process loop (the two steps repeat):
• do some work using local data
• communicate between processes
Message Passing
• standard software (MPI, PVM) for all systems
• simple, crude, effective
• requires action by both processes
• sending:
  – write message (put data into an array)
  – send to other process
• receiving:
  – receive message
  – read message (copy into another array)
Message Passing
MPI (Message Passing Interface) is the standard:
• FORTRAN, C and C++ implementations available on all major platforms
• designed by committee, so lots of options, but in practice few are needed
• highly optimised
• safe for use in parallel libraries
PVM (Parallel Virtual Machine) is an older library, now obsolete.
Message Passing Concepts
Two key concepts: buffering and blocking

Message passing is buffered if the message is transferred via a message buffer, and not directly from/to the process memory

Buffering is less efficient because of copying the data, but it is usually simpler and safer (less scope for the programmer to make mistakes)
Message Passing
Message passing is blocking if the process waits to complete the send or receive operation before continuing.

Non-blocking operations can be more efficient (allowing possible overlap of computation and communication) but can be more confusing and error-prone.
Message Passing
When using buffering, it is simplest to use non-blocking send (like sending a letter) and blocking receive (wait for the postman to deliver the post)

Without buffering, it is simplest to use blocking send/receive leading to a synchronous transfer (like sending a fax)
Message Passing
• buffered, non-blocking send
– task A continues after sending message
• buffered, blocking receive
– task B waits until it gets message

task A:  send(B,msg)
task B:  recv(A,msg)
Message Passing
The big problem to be avoided is deadlock, in which all processors are waiting for someone else to send a message – a common error for beginners

task A:  recv(B,msg1)  send(B,msg2)
task B:  recv(A,msg2)  send(A,msg1)

Each task blocks in its recv, so neither ever reaches its send.
Message Passing
For buffered transfers with a non-blocking send, the following works correctly,

task A:  send(B,msg1)  recv(B,msg2)
task B:  send(A,msg2)  recv(A,msg1)

but it still leads to deadlock for sends which are blocking/synchronous.
Message Passing
For synchronous transfers, must use the following:

task A:  send(B,msg1)  recv(B,msg2)
task B:  recv(A,msg1)  send(A,msg2)
Lecture 8: BSP Model of Distributed Computing
BSP Hardware Model
[Figure: six processor (P) / memory (M) nodes attached to a network]

• a number of processor/memory nodes connected by a ‘network’
• each processor has fast access to local memory and slow access to remote memory

Aim is to predict likely performance on real hardware, and make choices about alternative implementation strategies
BSP Parameters
$p$ = number of processors

$s$ = processor speed (Mflops)

$$l = \frac{\text{latency/synchronisation time}}{\text{time for 1 floating point op}}$$

$$g = \frac{\text{time to get/send 1 f.p. variable}}{\text{time to do 1 floating point op}}$$

Note:
• $p$, $l$, $g$ are non-dimensional
• estimated execution time will be $s^{-1} f(p, l, g)$
• local memory access times are neglected – no modelling of cache performance
BSP Parameters
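For a cluster with Intel/AMD processors and Myrinet networking:

$$l \approx \frac{10\,\mu\text{s}}{0.5\,\text{ns}} = 2\times 10^4, \qquad g \approx \frac{50\,\text{ns}}{0.5\,\text{ns}} = 100$$

For an IBM Blue Gene system with slower processors and faster networking:

$$l \approx \frac{10\,\mu\text{s}}{1\,\text{ns}} = 10^4, \qquad g \approx \frac{25\,\text{ns}}{1\,\text{ns}} = 25$$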
BSP Computation Model
Execution proceeds in supersteps separated by synchronisations:

superstep → synch → superstep → synch

Each superstep consists of each process doing some calculations using local data, then communicating some data to other processors
BSP Cost Modelling
The cost of a single superstep is

$$s^{-1}\left(n_o + l + n_c\, g\right)$$

where
$n_o$ = max number of f.p. operations
$n_c$ = max number of real variables communicated by one process

For a given application and problem size, $n_o$ and $n_c$ will depend on $p$.

The BSP cost of the whole task is just the sum of the costs of the individual supersteps.
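The cost formula is simple enough to code directly, which is handy when comparing implementation strategies. The following is only a minimal sketch in C: the type and function names (`bsp_machine`, `bsp_superstep_cost`, `bsp_total_cost`) are made up for illustration and do not come from any library. With $s$ given in Mflops, the returned time is in microseconds.

```c
#include <stddef.h>

/* BSP machine parameters as defined in the text (names are illustrative). */
typedef struct {
    double s;  /* processor speed in Mflops */
    double l;  /* latency/synchronisation time, in units of one f.p. operation */
    double g;  /* time to get/send one f.p. variable, in the same units */
} bsp_machine;

/* Cost of one superstep, s^{-1} (n_o + l + n_c g).
   The result is in microseconds when s is in Mflops. */
double bsp_superstep_cost(bsp_machine mc, double n_o, double n_c)
{
    return (n_o + mc.l + n_c * mc.g) / mc.s;
}

/* Cost of a whole task: the sum of the costs of its supersteps. */
double bsp_total_cost(bsp_machine mc, const double *n_o, const double *n_c,
                      size_t nsteps)
{
    double total = 0.0;
    for (size_t k = 0; k < nsteps; k++)
        total += bsp_superstep_cost(mc, n_o[k], n_c[k]);
    return total;
}
```

For example, a machine with 0.5ns per operation has s = 2000, and with l = 2×10^4 and g = 100 a superstep with 10^6 operations and 400 communicated values costs (10^6 + 2×10^4 + 4×10^4)/2000 = 530 microseconds.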
Explicit FD Calculation
Suppose we want to perform a 2D explicit FD calculation on a grid which is $N_1 \times N_2$.

To do this, we will partition the grid using a “processor grid” which is $p_1 \times p_2$ (with $p = p_1 p_2$)
Explicit FD Calculation
[Figure: the grid points divided into rectangular partitions]
Explicit FD Calculation
[Figure: one partition together with its halo nodes]

To minimise memory requirements, each processor works with just its part of the overall grid, of size $\frac{N_1}{p_1} \times \frac{N_2}{p_2}$, plus a copy of the neighbouring nodes from adjacent partitions – often known as “halo nodes”
Explicit FD Calculation
If each timestep requires $m$ operations per grid point, the total number of operations per superstep is

$$n_o = m\,\frac{N_1 N_2}{p_1 p_2}$$

The new values of halo nodes then have to be communicated to the neighbours on all four sides, so

$$n_c = 2\left(\frac{N_1}{p_1} + \frac{N_2}{p_2}\right)$$

and the total BSP cost is

$$T = s^{-1}\left(m\,\frac{N_1 N_2}{p_1 p_2} + l + 2g\left(\frac{N_1}{p_1} + \frac{N_2}{p_2}\right)\right)$$
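To get a feel for the numbers, the cost per timestep can be evaluated for different processor grids. This is a small sketch in C; the function name `fd_timestep_cost` is my own, and the arguments are taken as doubles purely for convenience.

```c
/* BSP cost of one explicit FD timestep on an N1 x N2 grid split over a
   p1 x p2 processor grid, with m operations per grid point.
   The result is in microseconds when s is in Mflops. */
double fd_timestep_cost(double s, double l, double g, double m,
                        double N1, double N2, double p1, double p2)
{
    double n_o = m * N1 * N2 / (p1 * p2);       /* operations per process */
    double n_c = 2.0 * (N1 / p1 + N2 / p2);     /* halo values sent on four sides */
    return (n_o + l + n_c * g) / s;
}
```

As an illustration with assumed values m = 20, s = 2000, l = 2×10^4, g = 100 and a 1024×1024 grid on 16 processors, a 4×4 processor grid gives about 716.6 microseconds per timestep, while a 2×8 grid gives about 729.4 microseconds because more halo data has to be sent.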
Explicit FD Calculation
Re-writing it as

$$T = s^{-1}\left(m\,\frac{N_1 N_2}{p} + l + 2g\left(\frac{N_1}{p_1} + \frac{p_1 N_2}{p}\right)\right)$$

and treating $p_1$ as continuous, with $p$ fixed, we find this is minimised when

$$\frac{N_1}{p_1} = \frac{N_2}{p_2}$$

This gives us our first result using BSP modelling: time is minimised by using square partitions (minimum ratio of surface to volume)
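For completeness, the minimisation can be written out. With $p$ fixed, only the communication term depends on $p_1$, so setting the derivative to zero and using $p_2 = p/p_1$ gives

$$\frac{\partial T}{\partial p_1} = 2 g\, s^{-1}\left(-\frac{N_1}{p_1^2} + \frac{N_2}{p}\right) = 0 \quad\Longrightarrow\quad \frac{N_1}{p_1} = \frac{p_1 N_2}{p} = \frac{N_2}{p_2}$$

The second derivative, $4 g\, s^{-1} N_1 / p_1^3$, is positive, so this stationary point is a minimum.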
Explicit FD Calculation
If we now define

$$N_{\text{local}} = \frac{N_1}{p_1} = \frac{N_2}{p_2}$$

then the total cost is

$$T = s^{-1}\left(m\, N_{\text{local}}^2 + l + 4g\, N_{\text{local}}\right)$$

For good efficiency, want communication and latency costs to be small compared to computation, so require

$$N_{\text{local}} \gg \max\left(\sqrt{l/m},\; 4g/m\right)$$

This is our second BSP result: the minimum problem size for effective parallelisation
Explicit FD Calculation
Suppose we now consider a $d$-dimensional problem, with each partition of size $N_{\text{local}}^d$.

In this case, the total BSP cost per timestep is

$$T = s^{-1}\left(m\, N_{\text{local}}^d + l + 2\, d\, g\, N_{\text{local}}^{d-1}\right)$$

For good efficiency require

$$N_{\text{local}} \gg \max\left(\left(\frac{l}{m}\right)^{1/d},\; \frac{2\, d\, g}{m}\right)$$

– probably best satisfied for $d=3$.
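The condition above is equivalent to asking that the computation term be much larger than both the latency term and the communication term. A rough way to check a proposed partition size is to compute the two ratios and look at the smaller one. The sketch below does this in C; the function name `bsp_work_to_overhead` is hypothetical, and the value m = 20 used in the example is an assumption of mine, not a figure from the course.

```c
/* Smaller of the two ratios (computation / latency) and
   (computation / communication) for a partition of N_local^d points.
   Both should be much larger than 1 for good efficiency. */
double bsp_work_to_overhead(double l, double g, double m,
                            double n_local, int d)
{
    double surface = 1.0;                   /* N_local^(d-1) */
    for (int k = 1; k < d; k++)
        surface *= n_local;
    double work = m * surface * n_local;    /* m N_local^d */
    double comm = 2.0 * d * g * surface;    /* 2 d g N_local^(d-1) */
    double r_latency = work / l;
    double r_comm = work / comm;
    return r_latency < r_comm ? r_latency : r_comm;
}
```

With the Myrinet cluster figures (l = 2×10^4, g = 100) and m = 20, a 2D partition with N_local = 1000 gives a ratio of 50, limited by communication. A 3D partition with N_local = 100 has the same number of points but a ratio of only about 3.3.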
Explicit FD Calculation
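In general, we define parallel efficiency as

$$\text{Parallel efficiency} = \frac{\text{sequential time}}{p \times \text{parallel time}}$$

In the 2D explicit FD case,

$$\text{sequential time} = m\, s^{-1} N_1 N_2 = m\, s^{-1} p\, N_{\text{local}}^2$$

so we get

$$\text{Parallel efficiency} = \left(1 + \frac{l}{m\, N_{\text{local}}^2} + \frac{4g}{m\, N_{\text{local}}}\right)^{-1}$$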
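The efficiency formula is easy to tabulate for different partition sizes. Here is a minimal C sketch; `fd2d_parallel_efficiency` is just an illustrative name, and the value m = 20 in the example is assumed.

```c
/* Parallel efficiency of the 2D explicit FD calculation,
   (1 + l/(m N^2) + 4g/(m N))^{-1}, with N = N_local. */
double fd2d_parallel_efficiency(double l, double g, double m, double n_local)
{
    double latency_term = l / (m * n_local * n_local);
    double comm_term = 4.0 * g / (m * n_local);
    return 1.0 / (1.0 + latency_term + comm_term);
}
```

Taking l = 2×10^4, g = 100 and m = 20, a partition with N_local = 1000 gives an efficiency of about 0.979, whereas N_local = 100 gives only about 0.769.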
Parallel Efficiency and Scalability
Scalability concerns what happens as you increase the number of processors. However, one has to be careful with how it is defined:
• Fixed overall problem size: as $p$ increases, $N_{\text{local}}$ decreases so the parallel efficiency decreases.
• Fixed problem size per processor: as $p$ increases, $N_{\text{local}}$ remains fixed and so does the parallel efficiency.

Personally, I think the second definition is more appropriate – the point of using lots of processors is to be able to tackle really big problems.
Halos for FD Calculations
[Figure: 7-point stencil at an interior grid point]

The FD approximation to the 2D Black-Scholes equation uses a 7-point stencil because of the cross-derivative.
Halos for FD Calculations
At first sight, it looks as if this will require halo transfers from 2 of the diagonal neighbours, as well as the four immediate neighbours.

However, with a little care, extra transfers can be avoided.

The key is to complete halo exchange in the x-direction before starting halo exchange in the y-direction.
Halos for FD Calculations
[Figure: partition and halo, with dots marking the nodes that are up to date]

After the exchange in the x-direction, with the immediate neighbours on either side, the nodes with dots have up-to-date values.
Halos for FD Calculations
[Figure: partition and halo, with dots on all nodes]

After the exchange in the y-direction, with the immediate neighbours on either side, all nodes have up-to-date values. The corner values come from copying the neighbours’ halos.
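The two-step exchange can be checked with a small serial simulation. The sketch below is plain C with no MPI: the copies between arrays stand in for the messages, and in a real code each copy would be a message to a neighbouring process. The 2×2 processor grid, the 3×3 partitions, the halo width of one and all the names are assumptions made for the demonstration.

```c
#define P 2   /* partitions per direction (assumed for this demo) */
#define N 3   /* interior points per partition per direction (assumed) */

/* u[py][px][j][i] is the local array of partition (px,py);
   indices 0 and N+1 in each direction hold the halo. */
static double u[P][P][N + 2][N + 2];

/* Test field with a distinct value at every global grid point. */
static double global_value(int gi, int gj) { return 100.0 * gj + gi; }

/* Fill interiors from the global field and mark all halo nodes stale (-1). */
static void init_partitions(void)
{
    for (int py = 0; py < P; py++)
        for (int px = 0; px < P; px++)
            for (int j = 0; j < N + 2; j++)
                for (int i = 0; i < N + 2; i++) {
                    int interior = i >= 1 && i <= N && j >= 1 && j <= N;
                    u[py][px][j][i] = interior
                        ? global_value(px * N + i - 1, py * N + j - 1)
                        : -1.0;
                }
}

/* Step 1: exchange with left/right neighbours, interior rows only. */
static void exchange_x(void)
{
    for (int py = 0; py < P; py++)
        for (int px = 0; px < P; px++)
            for (int j = 1; j <= N; j++) {
                if (px > 0)     u[py][px][j][0]     = u[py][px - 1][j][N];
                if (px < P - 1) u[py][px][j][N + 1] = u[py][px + 1][j][1];
            }
}

/* Step 2: exchange with lower/upper neighbours, sending full rows that
   include the x-halo entries received in step 1; this fills the corners. */
static void exchange_y(void)
{
    for (int py = 0; py < P; py++)
        for (int px = 0; px < P; px++)
            for (int i = 0; i < N + 2; i++) {
                if (py > 0)     u[py][px][0][i]     = u[py - 1][px][N][i];
                if (py < P - 1) u[py][px][N + 1][i] = u[py + 1][px][1][i];
            }
}

/* Run both steps, then count nodes inside the global grid whose local
   copy differs from the global field. */
int halo_mismatches(void)
{
    int bad = 0;
    init_partitions();
    exchange_x();
    exchange_y();
    for (int py = 0; py < P; py++)
        for (int px = 0; px < P; px++)
            for (int j = 0; j < N + 2; j++)
                for (int i = 0; i < N + 2; i++) {
                    int gi = px * N + i - 1, gj = py * N + j - 1;
                    if (gi < 0 || gi >= P * N || gj < 0 || gj >= P * N)
                        continue;   /* outside the grid: nobody owns it */
                    if (u[py][px][j][i] != global_value(gi, gj))
                        bad++;
                }
    return bad;
}
```

A return value of zero from `halo_mismatches()` means every halo node, corners included, holds the correct value even though no partition ever communicated with a diagonal neighbour.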
Lecture 9: An Introduction to MPI Message-Passing
Key Reference
Using MPI: portable parallel programming with the message-passing interface (second edition) by Gropp, Lusk and Skjellum is excellent!

• starts with basics and adds to them slowly
• emphasises that most people need only a limited subset of MPI
• lots of examples of direct relevance
• I suggest you stick to Chapters 1–4:
– Background
– Introduction
– Using MPI in Simple Programs
– Intermediate MPI.
Some Basics
A program using MPI must be compiled with a special command, usually mpicc or mpcc.

One thing this does is to provide a link to a header file mpi.h which must be included in each C file using the line

#include "mpi.h"

When run interactively, it is executed by a special command of the form

mprun -np n program

where n is the number of processes to be used.
Some Basics
An MPI program usually starts with the lines

MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &myid);

• MPI_Init initialises things
• MPI_Comm_size gives the number of processes
• MPI_Comm_rank gives the “rank” within the group (0 ≤ myid < nprocs)
Some Basics
MPI_COMM_WORLD is a communicator which in this case is a constant defined in mpi.h to denote the entire set of processes.

It is possible to construct other communicators, e.g. for communication between a subset of processes, or to protect/isolate communication within a library.
Some Basics
The first routines to learn about are:
• MPI_Bcast to broadcast data from one process to the others;
• MPI_Reduce to reduce data from all processes to one;
• MPI_Send, MPI_Recv, MPI_Sendrecv to send messages between processes
• MPI_Finalize terminates all MPI communication

You can go a long way using just these routines.
MPI_Bcast

The syntax of the broadcast subroutine is

MPI_Bcast(*data,size,type,origin,
          communicator)

• data is the data to be sent
• size is the number of pieces of data
• type is its type (e.g. MPI_INT or MPI_DOUBLE)
• origin is the rank of the process doing the broadcast
MPI_Reduce

Similarly, the syntax of the reduction subroutine is

MPI_Reduce(*input,*output,size,type,
           operation,destination,
           communicator)

• input is the data to be reduced
• output is where the result is put on the process given by destination; use MPI_Allreduce instead to send the output to all processes
• operation is the reduction operation to be performed (e.g. MPI_SUM or MPI_MAX)
• the others are the same as for MPI_Bcast
MPI_Send

MPI_Send(*data,size,type,
         destination,tag,
         communicator)

• destination is where the message is to be sent, and tag is a user-chosen integer label
• to be safe, think of this as a blocking synchronous send; for small messages it may have a non-blocking implementation using a system buffer
MPI_Recv

MPI_Recv(*data,size,type,
         origin,tag,
         communicator,*status)

• a blocking receive, will wait for a message with correct origin and tag, but these can be set to MPI_ANY_SOURCE and MPI_ANY_TAG
• status is a variable of special type MPI_Status with additional information
• note that incoming messages do not have to be read in the order in which they arrive
MPI_Sendrecv

MPI_Sendrecv(*data1,size1,type1,dest,tag1,
             *data2,size2,type2,orig,tag2,
             communicator,*status)

• a combined blocking send and receive; my personal favourite when most processes need to both send and receive
• can use MPI_PROC_NULL as destination (or origin) if there is no message to be sent (or received)
• combining operations enables the MPI implementation to be more efficient
Vector datatypes
So far, all of the send/receive routines have dealt with contiguous blocks of data. However, in practice the data to be communicated is often not contiguous (e.g. 2D halo exchange).

What to do?
• Option 1: copy everything into a contiguous array, then send
• Option 2: use MPI’s capability to define new vector datatypes
Vector datatypes
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35
Simplest to show an example from “Using MPI”:

MPI_Type_vector(5,1,7,MPI_DOUBLE,
                &newtype)
Vector datatypes
After the new data type has been defined it has to be “committed” using the command

MPI_Type_commit(&newtype)

and then it can be used, as in

MPI_Send(&data[3],1,newtype,
         destination,tag,
         communicator)

Note this specifies just one item, of type newtype, with data[3] being the start of the item
Vector datatypes
The general syntax of MPI_Type_vector is

MPI_Type_vector(count,size,stride,
                oldtype,*newtype)

• count is the number of blocks
• size is the size of each block (often 1) composed of type oldtype
• stride is the offset between each block (≥ size)
• newtype is the label for the new datatype, of type MPI_Datatype

Note: oldtype can itself be a derived datatype, so you can build up very complex datatypes.
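To see which elements a (count, size, stride) layout picks out, it can help to do the equivalent copy by hand. The following C sketch does not call MPI at all; `gather_strided` is a made-up helper that copies the selected elements into a contiguous array, which is also what Option 1 (copy, then send) would need.

```c
/* Copy the elements selected by a vector layout with `count` blocks of
   `blocklen` elements, whose starts are `stride` elements apart,
   beginning at `base`, into the contiguous array `out`.
   Returns the number of elements copied. */
int gather_strided(const double *base, int count, int blocklen, int stride,
                   double *out)
{
    int n = 0;
    for (int b = 0; b < count; b++)          /* loop over blocks */
        for (int k = 0; k < blocklen; k++)   /* elements within a block */
            out[n++] = base[b * stride + k];
    return n;
}
```

For the 5×7 array numbered 1 to 35 row by row, with zero-based indexing in C, starting at data[3] with count 5, size 1 and stride 7 selects the entries 4, 11, 18, 25 and 32, i.e. the fourth column.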
Lecture 10: Explicit and Implicit FD Methods
Explicit FD Calculation
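To recap, the explicit B-S discretisation is

$$V^{n+1} = (1 - r\Delta t)\, V^n + \frac{r^*\Delta t}{2\Delta\eta}\left(\delta_{2\eta_1} + \delta_{2\eta_2}\right) V^n + \frac{\sigma^2\Delta t}{2\Delta\eta^2}\left((1-\rho)\,\delta^2_{\eta_1} + \rho\,\delta^2_{\eta_1\eta_2} + (1-\rho)\,\delta^2_{\eta_2}\right) V^n$$

giving a 7-point stencil

[Figure: 7-point stencil, consisting of the centre point, its four immediate neighbours and two diagonal neighbours]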
Explicit FD Calculation
The computational grid is broken into partitions ...
Explicit FD Calculation
... and each timestep involves calculations on each partition followed by an updating of the halo data – a single BSP superstep

[Figure: one partition together with its halo nodes]
Explicit FD Calculation
One practical point to note: each MPI task should only allocate memory for the partition and its halo, not the entire grid.

There are two options on handling indices and arrays within each partition:
• use usual “global” indices with an adjustment to the definition of id(i,j) so that id(i,j) = offset+i+j*imax_local
• use “local” indices with standard arrays without offsets – this is my personal preference.
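The two indexing options can be compared side by side. The sketch below is one possible way to set this up in C, under assumptions that are mine rather than the course's: a halo one point wide on every side, row-major storage with i running fastest, and the `partition` struct and the way the offset is chosen.

```c
/* Description of one partition (all names are illustrative assumptions). */
typedef struct {
    int i0, j0;   /* global indices of the first interior point */
    int ni, nj;   /* number of interior points in each direction */
} partition;

/* Row length of the local array, including one halo point on each side. */
int imax_local(partition q) { return q.ni + 2; }

/* Option 1: global indices plus an offset. The offset is chosen so that
   the lower-left halo point (i0-1, j0-1) maps to element 0. */
int id_global(partition q, int i, int j)
{
    int offset = -(q.i0 - 1) - (q.j0 - 1) * imax_local(q);
    return offset + i + j * imax_local(q);
}

/* Option 2: local indices 0..ni+1 and 0..nj+1, halo at 0 and ni+1 (nj+1). */
int id_local(partition q, int i, int j)
{
    return i + j * imax_local(q);
}
```

For a partition whose first interior point is at global (4,8), `id_global(q,4,8)` and `id_local(q,1,1)` refer to the same array element, so the loops over the interior look the same apart from their index ranges.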
Implicit FD Calculation
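The implicit discretisation is

$$(1 + r\Delta t)\, V^{n+1} - \frac{r^*\Delta t}{2\Delta\eta}\left(\delta_{2\eta_1} + \delta_{2\eta_2}\right) V^{n+1} - \frac{\sigma^2\Delta t}{2\Delta\eta^2}\left((1-\rho)\,\delta^2_{\eta_1} + \rho\,\delta^2_{\eta_1\eta_2} + (1-\rho)\,\delta^2_{\eta_2}\right) V^{n+1} = V^n$$

which may be written collectively as

$$A\, V^{n+1} = b$$

giving a system of simultaneous equations to be solved iteratively.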
Implicit FD Calculation
Note that the operations necessary to evaluate the matrix-vector product $AV$ are essentially the same as for an explicit timestep.

Assuming the halo data is up-to-date, one can compute on each partition the elements of the product $AV$ which correspond to grid points in that partition.
Jacobi iteration
The difficulties involved in parallelising an implicit FD calculation depend on how the implicit equations are solved.

The simplest approach would be to use Jacobi iteration in which each point is updated using old values of its neighbours.

$$A_{k,k}\, V^{(m+1)}_k = b_k - \sum_{l \neq k} A_{k,l}\, V^{(m)}_l$$

Each Jacobi iteration step requires just one superstep to update the interior points and exchange halo data with neighbouring partitions.
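As a serial illustration of the update formula, here is a minimal Jacobi iteration in C for a small dense system. It is only a sketch: a real FD code would store just the stencil coefficients and work partition by partition, and the function name `jacobi` and the dense row-major storage are assumptions of mine.

```c
#include <stdlib.h>

/* Perform `iters` Jacobi sweeps on the dense n x n system A V = b.
   A is stored row-major; V holds the initial guess on entry and the
   latest iterate on exit. Returns 0 on success, -1 if allocation fails. */
int jacobi(int n, const double *A, const double *b, double *V, int iters)
{
    double *Vnew = malloc((size_t)n * sizeof *Vnew);
    if (Vnew == NULL)
        return -1;
    for (int it = 0; it < iters; it++) {
        for (int k = 0; k < n; k++) {
            double sum = b[k];
            for (int l = 0; l < n; l++)
                if (l != k)
                    sum -= A[k * n + l] * V[l];   /* old values only */
            Vnew[k] = sum / A[k * n + k];
        }
        for (int k = 0; k < n; k++)   /* all points switch to new values together */
            V[k] = Vnew[k];
    }
    free(Vnew);
    return 0;
}
```

Because every point uses only old values, the updates within a sweep are independent of each other, which is why one halo exchange per sweep is enough in the parallel version.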
CG Iteration
If a Krylov iterative solver with a simple diagonal preconditioner is used, then it is also straightforward.

To see this, we will consider the use of CG to solve

$$Ax = b$$

with $A$ being symmetric and positive definite, with a sparse 5-point stencil in 2D.
CG Algorithm
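$x_0 = 0$; $k = 0$; $r_0 = b - A x_0$
while $|r_k| >$ tolerance
  $k = k + 1$
  if $k = 1$
    $p_1 = r_0$
  else
    $\beta_k = r_{k-1}^T r_{k-1} / r_{k-2}^T r_{k-2}$
    $p_k = r_{k-1} + \beta_k p_{k-1}$
  end
  $\alpha_k = p_k^T r_{k-1} / p_k^T A p_k$
  $x_k = x_{k-1} + \alpha_k p_k$
  $r_k = r_{k-1} - \alpha_k A p_k$
end
$x = x_k$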
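The algorithm translates almost line by line into C. The version below is a serial sketch for a small dense symmetric positive definite matrix. The dense row-major storage, the function names and the use of squared norms (to avoid a square root) are my choices for illustration, not part of the lecture's code.

```c
#include <stdlib.h>

static double dot(int n, const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* y = A x for a dense row-major n x n matrix. */
static void matvec(int n, const double *A, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];
    }
}

/* Solve A x = b by CG starting from x = 0, stopping when |r| <= tol or
   after max_iter iterations. Returns the number of iterations used,
   or -1 if allocation fails. */
int cg_solve(int n, const double *A, const double *b, double *x,
             double tol, int max_iter)
{
    double *r = malloc(3 * (size_t)n * sizeof *r);
    if (r == NULL)
        return -1;
    double *p = r + n, *Ap = r + 2 * n;

    for (int i = 0; i < n; i++) {      /* x_0 = 0, so r_0 = b */
        x[i] = 0.0;
        r[i] = b[i];
    }
    double rr = dot(n, r, r), rr_old = 0.0;
    int k = 0;
    while (rr > tol * tol && k < max_iter) {
        k++;
        if (k == 1) {
            for (int i = 0; i < n; i++) p[i] = r[i];
        } else {
            double beta = rr / rr_old;             /* beta_k */
            for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        }
        matvec(n, A, p, Ap);
        double alpha = dot(n, p, r) / dot(n, p, Ap);   /* alpha_k */
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        rr_old = rr;
        rr = dot(n, r, r);
    }
    free(r);
    return k;
}
```

In the parallel version, the calls to `dot` are the points at which local contributions must be summed across all processes, and `matvec` is the point at which the halo values of p are needed.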
CG Algorithm
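The core part of the algorithm, after some re-arrangement, is
$$\begin{aligned}
\alpha_k &= p_k^T r_{k-1} / p_k^T A p_k \\
x_k &= x_{k-1} + \alpha_k p_k \\
r_k &= r_{k-1} - \alpha_k A p_k \\
\beta_{k+1} &= r_k^T r_k / r_{k-1}^T r_{k-1} \\
p_{k+1} &= r_k + \beta_{k+1} p_k
\end{aligned}$$
which can be calculated in three supersteps as follows: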
Superstep 1:
• compute $A p_k$ on local partition
• compute local contributions to $p_k^T r_{k-1}$ and $p_k^T A p_k$ and send to others
Superstep 2:
• compute $\alpha_k$ and update $x_k$ and $r_k$
• compute local contribution to $r_k^T r_k$ and send to others
Superstep 3:
• compute $\beta_{k+1}$ and update $p_{k+1}$
• exchange $p_{k+1}$ halos
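Alternatively, if we write it as
$$\begin{aligned}
p_k &= r_{k-1} + \beta_k p_{k-1} \\
\alpha_k &= p_k^T r_{k-1} / p_k^T A p_k \\
x_k &= x_{k-1} + \alpha_k p_k \\
r_k &= r_{k-1} - \alpha_k A p_k \\
\beta_{k+1} &= r_k^T r_k / r_{k-1}^T r_{k-1}
\end{aligned}$$
then it can be done in two supersteps as follows: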
Superstep 1:
• finish computing $\beta_k$ and update $p_k$ (including halo copies)
• compute $A p_k$ on local partition
• compute local contributions to $p_k^T r_{k-1}$ and $p_k^T A p_k$ and send to others with $A p_k$ halo
Superstep 2:
• compute $\alpha_k$ and update $x_k$ and $r_k$ (including halo copies)
• compute local contribution to $r_k^T r_k$ and send to others
This second approach is unusual:
• usually don’t modify halo values – they are just “read-only” copies from the neighbouring partition
• works in this case because they are updated in exactly the same way as on the “master” partition
Shows that a little creativity can reduce the execution time.
Lecture 11: More on Implicit Methods
Gauss-Seidel
If Gauss-Seidel is used to solve the equations, or as a preconditioner, it is harder to parallelise.
• start by partitioning the grid into strips
Gauss-Seidel: first approach
• first superstep: start with the first row of the first strip and work across to the first partition boundary, then send the ‘halo’ point to the neighbour
• next superstep: do the second row of the first strip, and the first row of the second strip
• additional supersteps: continue the process until the grid is completed
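If the grid size is $N \times N$, then
$$\#\,\text{supersteps} = N + p - 1 \approx N, \quad \text{assuming } p \ll N$$
$$\text{cost of single step} = s^{-1}\left(\frac{14N}{p} + l + 2g\right)$$
$$\Longrightarrow \quad \text{Total cost} = s^{-1}\left(\frac{14N^2}{p} + Nl + 2Ng\right) \approx s^{-1}\left(\frac{14N^2}{p} + Nl\right)$$
since $l \gg g$.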
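The superstep count can be checked with a small simulation. The sketch below is an assumption-laden toy, not part of the practicals: it only tracks which row each strip may process next, using the rule that a strip can do a row once the strip to its left finished that row in an earlier superstep.

```python
def pipelined_supersteps(N, p):
    """Count supersteps for the row-by-row pipelined sweep over p strips."""
    next_row = [0] * p               # next row each strip has to process
    steps = 0
    while next_row[-1] < N:
        at_start = list(next_row)    # progress known at start of superstep
        for q in range(p):
            # strip q needs the halo value of its current row from strip q-1
            ready = q == 0 or at_start[q - 1] > next_row[q]
            if next_row[q] < N and ready:
                next_row[q] += 1
        steps += 1
    return steps


small = pipelined_supersteps(4, 3)
large = pipelined_supersteps(100, 8)
```

In both cases the count agrees with N + p − 1; the extra p − 1 supersteps are the time taken to fill the pipeline.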
For good parallel efficiency, need
$$\frac{N}{p} \gg \frac{l}{14}$$
For real hardware this implies huge problems are needed to make good use of parallelism.
What to do?
1) Use Jacobi iteration instead – tempting but lazy. Personal view: start with best numerical algorithm and then worry about how to parallelise it.
2) Reduce number of supersteps
Gauss-Seidel: second approach
Same as before, except do m rows before transferring boundary data to the neighbouring partition.
[Figure: staggered blocks of m rows in each strip]
$$\#\,\text{supersteps} = \frac{N}{m} + p - 1 \approx \frac{N}{m} + p$$
$$\text{superstep cost} = s^{-1}\left(\frac{14mN}{p} + l + 2mg\right)$$
$$\text{Total time } T \approx s^{-1}\left(\frac{N}{m} + p\right)\left(\frac{14mN}{p} + l + 2mg\right)$$
Note that $N \gg pg$ is necessary for communication time to be negligible compared to computation. This condition is satisfied for large problems on hardware with high bandwidth.
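If it is satisfied, then
$$T \approx s^{-1}\left(14mN + pl + \frac{14N^2}{p} + \frac{Nl}{m}\right)$$
For fixed values of $N, p, s, g, l$ the total time is a minimum when
$$\frac{dT}{dm} = 0 \quad \Longrightarrow \quad 14N - \frac{Nl}{m^2} = 0 \quad \Longrightarrow \quad m = \sqrt{l/14}$$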
For this optimum value for m we get
$$T \approx s^{-1}\left(\frac{14N^2}{p} + 2N\sqrt{14\,l} + pl\right) = s^{-1}\,\frac{14N^2}{p}\left(1 + \frac{p\sqrt{l/14}}{N}\right)^2$$
and so for good parallel efficiency we require $N \gg p\sqrt{l}$ in addition to $N \gg pg$.
These restrictions are now achievable with reasonably large problem sizes on real hardware.
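To see the tradeoff numerically, the cost model for the second approach can be evaluated directly. The sketch below uses the full expression for T (including the 2mg term) and searches over integer m; the machine parameters are hypothetical values chosen only for illustration, and the function names are assumptions.

```python
import math


def total_time(N, p, m, s, l, g):
    """BSP estimate: (N/m + p) supersteps, each costing (14mN/p + l + 2mg)/s."""
    supersteps = N / m + p
    superstep_cost = (14.0 * m * N / p + l + 2.0 * m * g) / s
    return supersteps * superstep_cost


def optimal_rows(l):
    """Analytic optimum m = sqrt(l/14), valid when N >> p*g."""
    return math.sqrt(l / 14.0)


# Hypothetical problem size and machine parameters (assumptions)
N, p, s, l, g = 4096, 16, 1.0e9, 50_000.0, 10.0

m_star = optimal_rows(l)
best_m = min(range(1, N + 1), key=lambda m: total_time(N, p, m, s, l, g))
speedup_vs_single_row = (total_time(N, p, 1, s, l, g)
                         / total_time(N, p, best_m, s, l, g))
```

For these values the brute-force minimiser lands within one row of the analytic estimate, and m = 1 (the first approach) is noticeably slower because the latency term Nl dominates.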
Gauss-Seidel
The lessons to be learned from this are:
• don’t settle for the most obvious solution; if it doesn’t give good performance, work out why and try to find a solution
• the optimal parallel algorithm may depend on hardware BSP parameters; the attraction of BSP cost modelling is that it allows you to model the tradeoffs
ILU
ILU (incomplete LU factorisation) is sometimes used as a preconditioner for iterative solvers such as GMRES.
It involves solving two systems of equations with triangular matrices
$$LUx = b \quad \Longrightarrow \quad Ly = b, \quad Ux = y$$
The L solution is like the forward sweep in G-S; the U solution is like the reverse sweep.
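The two triangular solves can be sketched as below. This is a dense, serial illustration only; the factors L and U are made-up examples rather than the result of an actual incomplete factorisation, and the function names are assumptions. The point to notice is the data dependency: each unknown needs the ones computed before it, just as in a Gauss-Seidel sweep.

```python
def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L, sweeping from the first row down."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


def backward_substitution(U, y):
    """Solve U x = y for upper-triangular U, sweeping from the last row up."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Illustrative bidiagonal factors (assumptions, not a computed ILU)
L = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.25, 1.0]]
U = [[2.0, 1.0, 0.0], [0.0, 4.0, 2.0], [0.0, 0.0, 8.0]]
b = [4.0, 16.0, 27.5]

y = forward_substitution(L, b)
x = backward_substitution(U, y)
```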
ADI
Parallelisation of ADI preconditioners is complicated because of the tri-diagonal equations to be solved.
Start by dividing the $N \times N$ grid into $\sqrt{p} \times \sqrt{p}$ partitions to minimise communication costs.
Using the Thomas algorithm to solve the equations, in the first superstep begin m columns and communicate appropriate data to neighbours.
In the second superstep, do the next m columns.
Repeat until the forward sweep is complete.
Use a similar procedure for the reverse sweep.
Optimum value for m can be deduced from BSP cost analysis.
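For reference, a serial sketch of the Thomas algorithm for a single tridiagonal system is given below. The argument layout (separate lists for the three diagonals) is an assumption made for illustration. The forward sweep at point i needs the result at point i − 1, which is why a line crossing several partitions has to be processed in a pipelined fashion.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                 # forward sweep
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # reverse sweep
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Illustrative system (an assumption) with exact solution [1, 2, 3]
x = thomas(a=[0.0, -1.0, -1.0],
           b=[4.0, 4.0, 4.0],
           c=[-1.0, -1.0, 0.0],
           d=[2.0, 4.0, 10.0])
```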
Lecture 12: More on MPI
Cartesian Grids
Most finite difference methods use structured grids with i,j,k indexing (as opposed to finite element methods that often use unstructured grids composed of triangles/tetrahedra with a very general connectivity).
MPI calls them Cartesian grids, and provides a number of special routines to make it easy to work with them.
MPI_Dims_create(nprocs,ndim,*pdims)
This routine creates a process grid to partition a multi-dimensional Cartesian grid
• nprocs is the number of processes (input)
• ndim is the number of dimensions (input)
• pdims is an array containing the dimensions of the process grid (output) with the product being equal to nprocs; entries set to 0 on input are chosen by MPI, while non-zero entries are left unchanged
MPI_Cart_create(oldcomm, ndim, *pdims, *periodic, reorder, *newcomm)
This routine assigns processes to the process grid and creates a new communicator
• oldcomm is the old communicator (usually MPI_COMM_WORLD); newcomm is the new one
• ndim and pdims are same as before
• periodic is an array defining whether the grid is to be periodic in each direction
• reorder is a flag specifying whether to give MPI full freedom in how to assign processors
MPI_Cart_coords(newcomm,myid,ndim,*coords)
This routine gives the coordinates of the process within the Cartesian process grid
• myid is the rank obtained by calling MPI_Comm_rank(newcomm,*myid)
• coords is an integer array of size ndim giving the coordinates
MPI_Cart_shift(newcomm,dir,shift,*src,*dest)
Often want to shift data from a processor to its neighbour in a particular direction. This routine gives the IDs of the two neighbouring processes:
• 0 ≤ dir < ndim is the direction
• shift is the size of shift (usually 1)
• src is the ID of the process below (the source of shifted messages)
• dest is the ID of the process above (the destination of shifted messages)
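To make the coordinate and shift logic concrete, here is a plain-Python emulation of what these routines compute. It is not MPI code: the functions `cart_coords`, `cart_rank` and `cart_shift` are hypothetical stand-ins, written on the assumption of the row-major rank ordering that MPI uses for Cartesian topologies, and `None` stands in for MPI_PROC_NULL at a non-periodic boundary.

```python
def cart_coords(rank, pdims):
    """Row-major coordinates of `rank` in a process grid of shape `pdims`."""
    coords = []
    for extent in reversed(pdims):
        coords.append(rank % extent)
        rank //= extent
    return list(reversed(coords))


def cart_rank(coords, pdims):
    """Inverse of cart_coords."""
    rank = 0
    for c, extent in zip(coords, pdims):
        rank = rank * extent + c
    return rank


def cart_shift(rank, pdims, periodic, direction, shift=1):
    """Return (src, dest) for a shift along `direction`."""
    def neighbour(offset):
        coords = cart_coords(rank, pdims)
        c = coords[direction] + offset
        if periodic[direction]:
            c %= pdims[direction]          # wrap around
        elif not 0 <= c < pdims[direction]:
            return None                    # no neighbour past the edge
        coords[direction] = c
        return cart_rank(coords, pdims)
    return neighbour(-shift), neighbour(+shift)


pdims = [2, 3]                 # 2 x 3 process grid (illustrative)
periodic = [False, True]       # periodic only in the second direction
my_coords = cart_coords(4, pdims)
along_dir0 = cart_shift(4, pdims, periodic, direction=0)
along_dir1 = cart_shift(5, pdims, periodic, direction=1)
```

Rank 4 sits at coordinates (1, 1); it has no destination in direction 0 because the grid is not periodic there, whereas rank 5 wraps round to rank 3 in the periodic direction.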
These routines provide all the key capabilities for working with Cartesian grids – all are used in Practical 5.
The one thing not provided is a simple routine to exchange halos – this you have to program yourself using a vector datatype.
[Figure: 2D local grid with dimensions nx and ny, showing the halo points]
In 2D, exchange in x-direction uses a stride of nx, and exchange in y-direction is a simple contiguous transfer.
In 3D, it is hard to visualise, but
• in x-direction, halo has ny*nz elements with a stride of nx
• in z-direction, halo is a single contiguous block of size nx*ny
• in y-direction, halo has nz blocks of size nx with stride nx*ny
Practical 5 generalises this to an arbitrary number of dimensions.
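The three cases can be checked by listing the array offsets each one selects. The sketch below assumes the array is stored with x varying fastest, i.e. element (i,j,k) at offset i + nx*(j + ny*k); the helper names are hypothetical. The (count, blocklength, stride) triple mirrors the first three arguments of MPI_Type_vector, which is the usual way to build such a vector datatype.

```python
def flat_index(i, j, k, nx, ny):
    """Offset of element (i, j, k) with x fastest, then y, then z."""
    return i + nx * (j + ny * k)


def halo_as_blocks(count, blocklength, stride, start=0):
    """Offsets selected by `count` blocks of `blocklength`, `stride` apart."""
    return [start + blk * stride + e
            for blk in range(count) for e in range(blocklength)]


nx, ny, nz = 4, 3, 2   # small illustrative local grid

x_face = halo_as_blocks(count=ny * nz, blocklength=1, stride=nx)        # i = 0
y_face = halo_as_blocks(count=nz, blocklength=nx, stride=nx * ny)       # j = 0
z_face = halo_as_blocks(count=1, blocklength=nx * ny, stride=nx * ny)   # k = 0

# Reference lists built directly from the (i, j, k) indexing
x_ref = [flat_index(0, j, k, nx, ny) for k in range(nz) for j in range(ny)]
y_ref = [flat_index(i, 0, k, nx, ny) for k in range(nz) for i in range(nx)]
z_ref = [flat_index(i, j, 0, nx, ny) for j in range(ny) for i in range(nx)]
```

The other face in each direction uses the same pattern with a different `start` offset.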
Blocking and Buffering
In lecture 7, we discussed the concepts of blocking and buffering
• blocking means the program waits until the operation has completed before continuing
• buffering means the data is copied to a temporary buffer during transmission
MPI provides 5 different combinations of send and receive, enough to confuse anyone!
• MPI_Send, MPI_Ssend, MPI_Bsend all pair with MPI_Recv
• MPI_Sendrecv works on its own
• MPI_Isend pairs with MPI_Irecv
MPI_Recv is a blocking receive
• program can only continue once the message has arrived
MPI_Ssend is a blocking send
• synchronous transfer like sending a fax
• simple, but generally not efficient due to unnecessary waiting
MPI_Bsend is a non-blocking buffered send
• copies the data into a buffer before continuing
• user must supply the buffer – see documentation
• generally good for efficiency (less waiting) but copying data costs time, and supplying the buffer is tedious and error-prone
MPI_Send is a cross between MPI_Ssend and MPI_Bsend, a reasonable compromise
• uses internal buffer for small messages
• uses synchronous transfer for large ones
MPI_Sendrecv is a blocking send/recv pair
• very well suited to halo exchange
• the MPI system decides the order of sending and receiving
• no buffering so no time wasted on copying
• very easy to use
MPI_Isend and MPI_Irecv are quite different, non-blocking operations producing an asynchronous transfer
Continuing the letter/fax analogy, these are like shipping a piano using a courier company:
• sender says “here’s where the piano is”
• receiver says “here’s where I want it to go when it arrives”
• the courier ships directly, when both are ready
• sender and receiver continue as usual, occasionally checking to see if the piano has gone/arrived
The syntax for MPI_Isend is:
MPI_Isend(*data,size,type,destination,tag,communicator,*request)
The one extra argument compared to MPI_Send is request. This is a handle which can be used later to check if the send operation has been completed, or to wait for it to complete.
Similarly, the syntax for MPI_Irecv is:
MPI_Irecv(*data,size,type,origin,tag,communicator,*request)
Compared to MPI_Recv the argument status has been replaced by the handle request.
The status of a request can be tested with the command
MPI_Test(*request,*flag,*status)
with flag being true if it has been completed.
Alternatively, can wait for it to be completed using
MPI_Wait(*request,*status)
There are also MPI_Waitall and MPI_Waitany variants for handling multiple requests; they do what their names suggest.
When using MPI_Isend and MPI_Irecv the important thing is not to touch the data after the start of the transfer and before its completion.
The whole point of using MPI_Test and MPI_Wait is to know when it is safe to start using the data on the receiving side, and to re-use the storage on the sending side.
The book “Using MPI” advocates this form of sending messages; I agree in principle, but I think MPI_Sendrecv is simpler / more intuitive.
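The request/test/wait pattern can be illustrated by analogy using Python's standard thread pool. This is emphatically not MPI: the `transfer` function is a hypothetical stand-in for the courier, the future object plays the role of the request handle, `done()` loosely corresponds to MPI_Test and `result()` to MPI_Wait.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def transfer(send_buf, recv_buf):
    """Stand-in for the courier: copies the data some time after being asked."""
    time.sleep(0.05)
    recv_buf[:] = send_buf


send_buf = [1.0, 2.0, 3.0]
recv_buf = [0.0, 0.0, 0.0]

with ThreadPoolExecutor(max_workers=1) as pool:
    request = pool.submit(transfer, send_buf, recv_buf)  # start the transfer
    unrelated = sum(range(1000))      # useful work that touches neither buffer
    finished_early = request.done()   # poll for completion, like MPI_Test
    request.result()                  # block until complete, like MPI_Wait
    safe_copy = list(recv_buf)        # only now is it safe to read recv_buf
```

The key discipline is the same as in the MPI case: between starting the transfer and the wait, neither `send_buf` nor `recv_buf` is read or modified.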
Other MPI Capabilities
• more general datatypes, useful for handling structures or a mix of real and integer variables
• support for parallel libraries, to make sure message-passing within the library does not conflict with the user’s own message-passing
• MPI error handling
• various scatter/gather operations in addition to broadcast
• routines for constructing new communicators, e.g. to enable different groups of processes to do different tasks
MPI-2 (new standard, not yet fully implemented):
• dynamic spawning of new processes, and their inclusion into new communicators
• parallel file I/O – better performance than all file I/O being done by one process
• remote memory operations put/get, directly accessing remote memory without any action by remote process
Final Advice
• get a solid understanding of the basics
• read the early chapters of “Using MPI” carefully, maybe skim through the rest
• if using MPI_Send, check it works if you use MPI_Ssend instead
• only use advanced capabilities if you’re sure they will greatly simplify the programming (e.g. the Cartesian utilities) or greatly improve performance
• keep the MPI code as isolated as possible from the main application code
• if it’s an important application, discuss it with others with more experience
Practical 4
Global view of data partitioning:
[Figure: grid rows labelled j = 0, j = 1, ..., j = m − 2, j = m − 1]
Local view of data partitioning:
• j = jlower − 1 corresponds to jlocal = 0 (halo)
• j = jlower corresponds to jlocal = 1
• j = jupper corresponds to jlocal = jmax − 2
• j = jupper + 1 corresponds to jlocal = jmax − 1 (halo)
jlower = ((m−2) ∗ myid)/nprocs + 1
jupper = ((m−2) ∗ (myid+1))/nprocs
jmax = jupper − jlower + 3
joff = jlower − 1
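These index formulas can be tried out with a short sketch. It assumes the divisions are integer divisions that round down (as in C for non-negative operands), which is what makes the ranges tile the interior rows 1 to m − 2 without gaps; the function name `local_range` and the values m = 10, nprocs = 3 are illustrative only.

```python
def local_range(m, nprocs, myid):
    """Return (jlower, jupper, jmax, joff) for process `myid`."""
    jlower = ((m - 2) * myid) // nprocs + 1
    jupper = ((m - 2) * (myid + 1)) // nprocs
    jmax = jupper - jlower + 3      # owned rows plus two halo rows
    joff = jlower - 1               # global j = jlocal + joff
    return jlower, jupper, jmax, joff


m, nprocs = 10, 3
ranges = [local_range(m, nprocs, myid) for myid in range(nprocs)]

# Rows owned by each process, to confirm the interior is covered exactly once
owned = [j for (jlower, jupper, _, _) in ranges
         for j in range(jlower, jupper + 1)]
```

Here the 8 interior rows are split 2, 3, 3 across the three processes, and each local array has two extra rows for the halos.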