
A Practical Guide to Quasi-Monte Carlo Methods

Frances Y. Kuo and Dirk Nuyens

Abstract These notes are prepared for the short course on "High-dimensional Integration: the Quasi-Monte Carlo Way", to be held at National Chiao Tung University and National Taiwan University in November 2016. We will cover basic theory and practical usage of quasi-Monte Carlo methods, with a demo on the software packages. Our aim is to make these notes easily accessible to non-experts, including students, practitioners, and potential new collaborators. We discuss only the essential concepts and hide away most of the technical details. We do not cite references in the text, but references for further reading are provided in the final section.

The sections marked with * contain more theoretical background and are targeted at potential collaborators who wish to gain a deeper understanding. These sections are not necessary for students and practitioners who just want to try out quasi-Monte Carlo methods for the first time.

Date: 7 November 2016

Frances Y. Kuo
School of Mathematics and Statistics, University of New South Wales, Sydney NSW 2052, Australia
e-mail: [email protected]

Dirk Nuyens
Department of Computer Science, KU Leuven, Celestijnenlaan 200A, 3001 Leuven, Belgium
e-mail: [email protected]


Contents

1 Introduction
    1.1 High dimensional integration
    1.2 Monte Carlo method
    1.3 Quasi-Monte Carlo methods

2 Lattice points
    2.1 Generating vector
    2.2 Random shifting and practical error estimation
    2.3 Fast component-by-component construction
    2.4 Lattice sequences
    2.5 A taste of the theoretical error analysis*

3 Digital nets
    3.1 Digital net property
    3.2 Digital construction
    3.3 Sobol′ sequences
    3.4 Polynomial lattice rules
    3.5 Random digital shifting and scrambling
    3.6 Higher order nets by interlacing

4 Toy applications
    4.1 Transformation to the unit cube
    4.2 Option pricing
    4.3 Maximum likelihood
    4.4 PDE with a random coefficient

5 Software demo
    5.1 A simple test function
    5.2 The difficulty of our test function
    5.3 Some technical details*
    5.4 Usage of random number generators
    5.5 Monte Carlo approximation
    5.6 Quasi-Monte Carlo approximation
    5.7 Using standard lattice point generators
    5.8 Applying the theory*
    5.9 Constructing point sets
    5.10 Sobol′ sequences, digital sequences, and interlacing

6 Small project

7 Further reading


1 Introduction

High dimensional problems are coming to play an ever more important role in applications. They pose immense challenges for practical computation, because of a nearly inevitable tendency for the cost of computation to increase exponentially with dimension. Effective and efficient methods that do not suffer from this "curse of dimensionality" are in great demand. Quasi-Monte Carlo (QMC) methods can lift this curse and we will show you how.

1.1 High dimensional integration

We begin with an integral formulated over the s-dimensional unit cube [0,1]^s,

I(f) = \int_0^1 \cdots \int_0^1 f(x_1, \ldots, x_s) \,dx_1 \cdots dx_s = \int_{[0,1]^s} f(x) \,dx,

where the number of integration variables – the dimensionality – s is large, e.g., hundreds or thousands or more.

(Note that an expectation can be written as an integral. Later we will discuss the important question of how to transform an integral from practical applications into this form.)

One approach that comes to mind is to approximate this integral by a product rule, i.e., each one-dimensional integral is approximated by your favorite one-dimensional quadrature rule, e.g., rectangle rule, Simpson rule, Gauss rule, etc. But this would not work: with 100 integration variables, even if you have just 2 quadrature points in each coordinate direction, then you would require 2^100 evaluations of the integrand f and your computation would never finish in your life time! So, forget about product rules in high dimensions!

(There is a class of methods called sparse grids which cleverly leaves out some product points; that's a story for another day.)

1.2 Monte Carlo method

The Monte Carlo method, or MC method in short, approximates the integral by averaging random samples of the function,

Q_n(f) = \frac{1}{n} \sum_{k=0}^{n-1} f(t_k),   (1)


where the sample points t_0, \ldots, t_{n-1} are independent and uniformly distributed over the unit cube. This is a very simple and widely used method. It can be deployed as long as the integrand is square integrable.

Apart from the ease of use, the Monte Carlo method has the advantage of producing an unbiased estimate of the integral, i.e., E[Q_n(f)] = I(f). It can be easily shown that the root-mean-square error of the Monte Carlo method satisfies

\sqrt{E|I(f) - Q_n(f)|^2} = \frac{σ(f)}{\sqrt{n}},

where σ^2(f) := I(f^2) - (I(f))^2 is the variance of f. So we say that the Monte Carlo method "converges like order 1/\sqrt{n}", and we write O(1/\sqrt{n}). In concrete terms, this means that if you want to reduce your error in half, then you need to use 4 times as many sample points. This convergence rate is often too slow for practical applications.

The variance of f is generally not explicitly known, but in practice we can estimate the root-mean-square error by

\sqrt{E|I(f) - Q_n(f)|^2} ≈ \sqrt{\frac{1}{n(n-1)} \sum_{k=0}^{n-1} (f(t_k) - Q_n(f))^2}.

1.3 Quasi-Monte Carlo methods

Quasi-Monte Carlo methods, or QMC methods in short, take the same form (1) as the Monte Carlo method in the unit cube, but instead of generating the sample points t_k randomly, we choose them deterministically in a clever way to be more uniformly distributed than random points, so that they have a faster rate of convergence.

All QMC theoretical error bounds take the common form of a product

|I(f) - Q_n(f)| ≤ D(t_0, \ldots, t_{n-1}) \, V(f),   (2)

with one factor depending only on the points and the other depending only on the integrand. In the classical theory these two factors are called the discrepancy of the points and the variation of f, respectively. If the integrand f has sufficient smoothness, e.g., can be differentiated once with respect to each variable, then classical theory tells us that certain QMC methods can converge like O((log n)^s / n); they are referred to as low-discrepancy sequences. The convergence rates can be even higher for periodic integrands.

The drawback of the classical QMC theory is that the error bound and implied constant grow exponentially with dimension s, so the theory is not useful when s is very large. A remedy is provided in modern QMC theory by working with weighted function spaces: the error bound can be independent of s as long as the integrand f has the appropriate property that there is some varying degree of importance between the variables. A taste of this modern theory is given in §2.5. We then have a very similar, but modern, interpretation of (2) in the form

|I(f) - Q_n(f)| ≤ e_γ(t_0, \ldots, t_{n-1}) \, ‖f‖_γ,

where the first factor is now called the worst case error of the QMC method in a weighted function space with weights γ, and the second factor is the norm of f in that same weighted space.

There are two main families of QMC methods: lattice rules and digital nets. They represent different approaches to achieving uniformity of the points. Here we will introduce these methods, providing a bit more detail on lattice rules while touching only on some basic principles of digital nets.


2 Lattice points

Lattice rules have been around since the 1950s and they are very easy to specify and use: all you need is one integer vector with s components.

2.1 Generating vector

Given an integer vector z = (z_1, \ldots, z_s) known as the generating vector, a (rank-1) lattice rule with n points takes the form

Q_n(f) = \frac{1}{n} \sum_{k=0}^{n-1} f\left(\left\{\frac{k z}{n}\right\}\right) = \frac{1}{n} \sum_{k=0}^{n-1} f\left(\frac{k z \bmod n}{n}\right),   (3)

where the braces around a vector indicate that we take the fractional parts of each component in the vector, e.g., {(1.8, 2.3)} = (0.8, 0.3), which is clearly equivalent to carrying out the modulo n operation in the numerator as indicated in (3). Figure 1 (left) illustrates a 64-point lattice rule in 2D.
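In Matlab the whole point set can be generated in one line. Here is a minimal sketch; the two-dimensional generating vector below is a made-up example for illustration, not a recommended choice:

% Generate the n points of a rank-1 lattice rule, one point per column.
n = 64;
z = [1; 19];                   % [s-by-1] generating vector (hypothetical example)
P = mod(z * (0:n-1), n) / n;   % [s-by-n] array, column k holds {k*z/n}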

The quality of the lattice rule depends on the choice of the generating vector. Due to the modulo operation, it suffices to consider the values from 1 up to n - 1, leaving out 0 which is clearly a bad choice. Furthermore, we restrict the values to those relatively prime to n, to ensure that every one-dimensional projection of the n points yields n distinct values. Thus we write z ∈ U_n^s, with U_n := {z ∈ Z : 1 ≤ z ≤ n - 1 and gcd(z, n) = 1}.

For theoretical analysis we often assume that n is prime to simplify some number theory arguments. For practical application we often take n to be a power of 2. The total number of possible choices for the generating vector is then (n-1)^s and (n/2)^s, respectively. Even if we have a criterion to assess the quality of the generating vectors, there are simply too many choices to carry out an exhaustive search when n and s are large. Later we will return to this issue of constructing a good generating vector.

2.2 Random shifting and practical error estimation

We can shift the points of a lattice rule by any vector of real numbers Δ = (Δ_1, \ldots, Δ_s), to obtain a shifted lattice rule

Q_n(f) = \frac{1}{n} \sum_{k=0}^{n-1} f\left(\left\{\frac{k z}{n} + Δ\right\}\right).


Fig. 1 Applying a (0.1, 0.3)-shift to a 64-point lattice rule in two dimensions: left – original lattice rule, middle – moving all points by (0.1, 0.3), right – wrapping the points back inside the unit cube.

Due to the fractional part function, we may restrict the shift to Δ ∈ [0,1)^s. Figure 1 (right) illustrates the result of shifting a 64-point lattice rule in 2D by the vector (0.1, 0.3). Clearly we see that the regular structure of the lattice points is preserved.

A randomly shifted lattice rule provides an unbiased approximation of the integral (this applies to all QMC methods), while using multiple shifts allows us to obtain a practical error estimate in the same way as the Monte Carlo method. It works as follows. We generate q independent random shifts Δ^{(i)} for i = 0, \ldots, q-1 from the uniform distribution on [0,1]^s. For the same fixed lattice generating vector z, we compute the q different shifted lattice rule approximations and denote them by Q_n^{(i)}(f) for i = 0, \ldots, q-1. We take the average

Q_{n,q}(f) = \frac{1}{q} \sum_{i=0}^{q-1} Q_n^{(i)}(f) = \frac{1}{q} \sum_{i=0}^{q-1} \left( \frac{1}{n} \sum_{k=0}^{n-1} f\left(\left\{\frac{k z}{n} + Δ^{(i)}\right\}\right) \right)

as our final approximation to the integral. Then an estimate for the root-mean-square error of Q_{n,q}(f) is given by

\sqrt{E|I(f) - Q_{n,q}(f)|^2} ≈ \sqrt{\frac{1}{q(q-1)} \sum_{i=0}^{q-1} \left(Q_n^{(i)}(f) - Q_{n,q}(f)\right)^2}.

Here the expectation is taken with respect to the random shifts.

The total number of function evaluations in Q_{n,q}(f) is q·n. Typically, we take q to be small, e.g., q = 16 or 32. For a fair comparison with the Monte Carlo method, we should therefore take n_MC = q·n_QMC samples in the Monte Carlo method.
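A minimal Matlab sketch of this procedure, assuming the point set P (an [s-by-n] array) from the sketch in §2.1, a vectorized integrand f mapping an [s-by-n] array to a [1-by-n] array, and a chosen number of shifts q:

% Randomly shifted lattice rule with q shifts and a practical error estimate.
s = size(P, 1);
Q = zeros(q, 1);
for i = 1:q
    shift = rand(s, 1);                              % random shift in [0,1)^s
    Q(i) = mean(f(mod(bsxfun(@plus, P, shift), 1))); % Q_n^(i)(f)
end
Qnq = mean(Q);              % final approximation Q_{n,q}(f)
stderr = std(Q) / sqrt(q);  % estimated root-mean-square error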


2.3 Fast component-by-component construction

Recall that the components of the generating vector can be restricted to the set U_n, e.g., U_n = {1, 2, 3, \ldots, n-1} when n is prime, and U_n = {1, 3, 5, \ldots, n-1} when n is a power of 2. There are far too many choices in high dimensions.

Suppose that we have a computable criterion for assessing the quality of a generating vector in dimension s, which we denote by E_s(z_1, \ldots, z_s), and for which smaller values are better. Then we can use the component-by-component construction to find a generating vector:

1. Set z_1 = 1 (because in one dimension all choices are the same).
2. Choose z_2 from the set U_n so that E_2(z_1, z_2) is minimized.
3. Choose z_3 from the set U_n so that E_3(z_1, z_2, z_3) is minimized.
4. Choose z_4 from the set U_n so that E_4(z_1, z_2, z_3, z_4) is minimized.
5. …

The fact that such a greedy algorithm can produce good generating vectors is justified by theory, and we will say more about this in §2.5. The computational cost of the algorithm depends on the form of this criterion E_s(z_1, \ldots, z_s). We have the fast component-by-component construction: in some favourable situations the cost is O(s n log n) operations, i.e., linearly in dimension s and almost linearly in the number of points n. This means that we can really construct generating vectors in tens of thousands of dimensions and millions of points!

The magic behind the fast component-by-component construction is that in many cases the algorithm requires the evaluation of a matrix-vector multiplication with a matrix of the form

\left[ ω\left( \frac{k z \bmod n}{n} \right) \right]_{z ∈ U_n, \, 1 ≤ k ≤ n-1}

for some function ω. When n is prime, we can permute the rows and columns of this matrix to obtain a circulant matrix, so that the matrix-vector multiplication, which typically requires O(n^2) operations, can be done in O(n log n) operations using Fast Fourier Transforms. When n is not prime it gets more complicated, but similar cost savings can be made.

Figure 2 illustrates the structure of such a matrix (left) for n = 53 and the corresponding matrix after permutation (right).

2.4 Lattice sequences

Recall that the formula for obtaining the kth point of an n-point lattice rule with generating vector z is

t_k = \left\{ \frac{k}{n} z \right\}.   (4)


Fig. 2 Circulant permutation for n = 53 (prime) for fast component-by-component construction.

This gives rise to a so-called "closed" QMC method: the generation of the points depends on knowing n in advance. This is inconvenient in practice, because if we want to change the number of points we would need to generate all of the points from scratch. An "open" QMC method, on the contrary, allows you to keep adding points as you wish while keeping all existing points; such methods are therefore referred to as "sequences" and are also said to be "extensible".

In a lattice sequence in base 2, the formula is changed to

t_k = \{ φ_2(k) \, z \},   (5)

where φ_2(·) is the radical inverse function in base 2: loosely speaking, if we have the index k = (⋯ k_2 k_1 k_0)_2 in binary representation, then φ_2(k) = (0.k_0 k_1 k_2 ⋯)_2 is obtained by mirroring the bits of k around the binary point. For example, if k = 6 = (110)_2 then φ_2(k) = (0.011)_2 = 0.375.
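The radical inverse is just a bit reversal. A minimal Matlab sketch, assuming k is a scalar index and m the number of bits to reverse:

% Radical inverse in base 2 by bit reversal.
phi2 = @(k, m) sum(bitget(k, 1:m) .* 2.^(-(1:m)));
phi2(6, 3)   % reverses (110)_2 to (0.011)_2 = 0.375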

The formula (5) does not require you to know n in advance, and so in practice you can add more points to your lattice rule approximation until you are satisfied with the error. When n = 2^m for any m ≥ 1, the formulas (4) and (5) produce the same set of points, only the ordering of the points is different. Therefore, if you want the points of an extensible lattice rule only at exact powers of 2, you can avoid the radical inverse function and still use the formula (4) to get your points. For example, if you already have n = 2^m points for some m, then to double the number of points all you need to do is use the formula (4) with n replaced by 2n, and then consider only those points generated by the odd indices k, as in the sketch below. All of the above extends trivially to base b ≥ 2.
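A minimal Matlab sketch of this doubling trick, reusing z, n and the point set P from the earlier sketches:

% Double a 2^m-point lattice rule: the new points are those of (4)
% with n replaced by 2n at the odd indices k.
Pnew = mod(z * (1:2:2*n-1), 2*n) / (2*n);   % [s-by-n] new points
P = [P, Pnew];                              % all 2n points
n = 2*n;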

We know how to construct a good generating vector for a lattice sequence using the fast component-by-component construction, i.e., this generating vector can be used for many different values of n. Figure 3 illustrates the nested structure in the matrices when we work with powers of 2 and how we exploit the nested circulant structure.


(Panels of Figure 3: ① natural ordering of the indices; ② grouping on divisors; ③ generator ordering of the indices; ④ symmetric reduction after application of the B_2 kernel function.)

Fig. 3 Number theoretic permutations on a matrix with n = 128 (power of 2) for fast component-by-component construction.

2.5 A taste of the theoretical error analysis*

Here we discuss some key elements of the theory and construction for lattice rules. It is not necessary to understand all of these to be able to use lattice rules. We therefore mark this subsection as optional material (*). The reader may skip to the next section.

Weighted Sobolev space: what kind of integrands can we handle?

In the modern analysis of randomly shifted lattice rules, we assume that the integrand f belongs to a weighted Sobolev space of functions whose mixed first derivatives are square-integrable, with the norm given by

‖f‖_γ^2 = \sum_{u ⊆ \{1,\ldots,s\}} \frac{1}{γ_u} \int_{[0,1]^{|u|}} \left( \int_{[0,1]^{s-|u|}} \frac{∂^{|u|} f}{∂x_u}(x) \, dx_{-u} \right)^2 dx_u.   (6)

There are different variants of the norm, but ultimately it is a way to measure the regularity and variability of the function. Okay, this is a hell of a formula to take in. Let us explain what it means step by step.

There are 2^s possible subsets u of the coordinate indices {1, \ldots, s}. Let us pick a simple example first, say, s = 5 and u = {1,3,4}. Then we separate the "active" variables x_u = (x_1, x_3, x_4) from the "inactive" variables x_{-u} = (x_2, x_5), and consider

\frac{∂^{|u|} f}{∂x_u} = \frac{∂^3 f}{∂x_1 ∂x_3 ∂x_4}.

This is called a "mixed first derivative" because we never differentiate more than once with respect to each variable, even though it looks like a 3rd order derivative in the regular sense. According to the norm, we should integrate out the inactive variables, square the result, and then integrate out the active variables:

\int_0^1 \int_0^1 \int_0^1 \left( \int_0^1 \int_0^1 \frac{∂^3 f}{∂x_1 ∂x_3 ∂x_4}(x_1, x_3, x_4; x_2, x_5) \, dx_2 \, dx_5 \right)^2 dx_1 \, dx_3 \, dx_4.   (7)

We do this for each of the 2^s subsets of {1, \ldots, s} and then sum up the results, but with weights γ_u > 0 acting as relative scaling. A large value for (7) means that f is more variable in the projection onto (x_1, x_3, x_4), and we need a larger weight γ_{{1,3,4}} to compensate for it in the norm if we want f to have norm 1.

We denote the norm with a subscript γ to emphasize the important role played by the weights. We will see below that under appropriate conditions on the weights we can obtain error bounds that are independent of the dimension s. In practice we would choose the weights to match the characteristics of a given integrand.

The simplest form of weights are the so-called product weights: we assume that there is one weight γ_j > 0 associated with each variable x_j so that

γ_u = \prod_{j ∈ u} γ_j,

e.g., γ_{{1,3,4}} = γ_1 γ_3 γ_4. Typically we also assume that γ_1 ≥ γ_2 ≥ ⋯ > 0, indicating that the variables are labeled in the order of decreasing importance.

Another form of weights that has become popular in recent times is POD weights, or product and order dependent weights. There is an additional sequence of numbers Γ_ℓ such that γ_u = Γ_{|u|} \prod_{j ∈ u} γ_j, i.e., the weights have an extra multiplying factor which depends on the number of elements in the set u, hence the name "order dependent". POD weights arise from some PDE applications and often some factorials |u|! appear; we won't discuss them further here.

Worst case error: how do we assess the quality of a lattice rule?

The worst case error for a shifted lattice rule in our weighted Sobolev space is defined to be the largest possible error for any function with norm at most 1, i.e.,

e_γ(z, Δ) := \sup_{‖f‖_γ ≤ 1} |I(f) - Q_n(f)|.

This means that for any given f in our weighted Sobolev space, we have the lattice rule error bound

|I(f) - Q_n(f)| ≤ e_γ(z, Δ) ‖f‖_γ.


For a randomly shifted lattice rule, we have the root-mean-square error bound

\sqrt{E|I(f) - Q_n(f)|^2} ≤ e_γ^{sh}(z) ‖f‖_γ,   (8)

where the expectation is with respect to the random shift Δ, and where

e_γ^{sh}(z) := \sqrt{ \int_{[0,1]^s} e_γ^2(z, Δ) \, dΔ }

is called the shift-averaged worst case error. Notice the separation of the dependence of the error bounds on the points from the dependence on the integrand, similarly to (2), but here the weights enter both factors. There is a trade-off: large weights lead to a small norm but a large worst case error, and vice versa.

Our weighted Sobolev space happens to be a reproducing kernel Hilbert space. We will not go into any details here, other than saying that this provides a very powerful set of tools for analysis and we have explicit computable formulas for e_γ(z, Δ) and e_γ^{sh}(z). In particular, with product weights we know that

[e_γ^{sh}(z)]^2 = -1 + \frac{1}{n} \sum_{k=0}^{n-1} \prod_{j=1}^{s} \left( 1 + γ_j B_2\left(\left\{\frac{k z_j}{n}\right\}\right) \right),   (9)

where B_2(x) = x^2 - x + 1/6 for x ∈ [0,1] is the Bernoulli polynomial of degree 2.
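The formula (9) can be evaluated directly. A minimal Matlab sketch, assuming n is given and z and gamma are [s-by-1] vectors holding the generating vector and the product weights γ_j:

% Direct O(n*s) evaluation of the squared shift-averaged worst case error (9).
B2 = @(x) x.^2 - x + 1/6;   % Bernoulli polynomial of degree 2
e2 = -1;
for k = 0:n-1
    e2 = e2 + prod(1 + gamma .* B2(mod(k * z, n) / n)) / n;
end
% e2 now holds [e_gamma^sh(z)]^2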

Component-by-component construction: how do we find a good lattice generating vector?

Given weights γ_j and a generating vector z, we can evaluate (9) in O(ns) operations. Theoretically we could do this for each of the (n-1)^s choices of generating vectors when n is prime and then pick the vector with the smallest worst case error. This is however not practically possible when s is large.

We will choose the generating vector by the component-by-component construction: given n, s_max, and weights γ_u,

1. Set z_1 = 1.
2. For s = 2, 3, \ldots, s_max, choose z_s in U_n to minimize [e_γ^{sh}(z_1, \ldots, z_{s-1}, z_s)]^2.
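To make the algorithm concrete, here is a deliberately naive Matlab sketch for product weights; it reuses the product over the earlier dimensions but still costs O(n) per candidate, i.e., O(n^2 s_max) in total rather than the O(s n log n) of the fast construction. It assumes n and an [smax-by-1] vector gamma of product weights γ_j are given:

% Naive component-by-component construction (not the fast FFT version).
B2 = @(x) x.^2 - x + 1/6;
Un = find(gcd(1:n-1, n) == 1);  % candidate values, relatively prime to n
prodterm = ones(1, n);          % product over chosen dims, one entry per k
z = zeros(smax, 1);
for s = 1:smax
    % (for s = 1 all candidates give the same error, so z(1) = 1 is selected)
    best = Inf;
    for zs = Un
        term = prodterm .* (1 + gamma(s) * B2(mod((0:n-1) * zs, n) / n));
        e2 = -1 + mean(term);   % squared worst case error (9)
        if e2 < best, best = e2; z(s) = zs; end
    end
    prodterm = prodterm .* (1 + gamma(s) * B2(mod((0:n-1) * z(s), n) / n));
end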

To prove that this gives a good lattice rule we use mathematical induction, combined with the so-called "averaging argument". First we present a simple result to illustrate the proof technique.

Theorem 1. Let n ≥ γ_1/6 be prime. A lattice rule can be constructed by the component-by-component algorithm such that

[e_γ^{sh}(z_1, \ldots, z_{s-1}, z_s)]^2 ≤ \frac{1}{n} \prod_{j=1}^{s} \left(1 + \frac{γ_j}{6}\right).


By taking the square root on both sides, we see that this simple result gives us only the O(1/\sqrt{n}) convergence rate, the same as the Monte Carlo method. On the other hand, since

\prod_{j=1}^{s} \left(1 + \frac{γ_j}{6}\right) = \exp\left( \sum_{j=1}^{s} \log\left(1 + \frac{γ_j}{6}\right) \right) ≤ \exp\left( \frac{1}{6} \sum_{j=1}^{s} γ_j \right),

we see that our error bound can be independent of the dimension s provided that the sum of the infinite sequence of weights is finite, i.e.,

\sum_{j=1}^{∞} γ_j < ∞.

Here is a brief synopsis of the proof. From (9) we can write the error expression in s dimensions in terms of the error expression in s - 1 dimensions as follows:

[e_γ^{sh}(z_1, \ldots, z_{s-1}, z_s)]^2 = [e_γ^{sh}(z_1, \ldots, z_{s-1})]^2 + \frac{γ_s}{n} \sum_{k=0}^{n-1} \left[ B_2\left(\left\{\frac{k z_s}{n}\right\}\right) \prod_{j=1}^{s-1} \left(1 + γ_j B_2\left(\left\{\frac{k z_j}{n}\right\}\right)\right) \right].

Then we take the average of this expression over all choices of z_s from U_n. When n is prime this means that we take

A(z_1, \ldots, z_{s-1}) = \frac{1}{n-1} \sum_{z_s=1}^{n-1} [e_γ^{sh}(z_1, \ldots, z_{s-1}, z_s)]^2.

Since the only dependence on z_s is in the first B_2 factor, we end up having to compute

\frac{1}{n-1} \sum_{z_s=1}^{n-1} B_2\left(\left\{\frac{k z_s}{n}\right\}\right),

which equals 1/6 if k = 0 and -1/(6n) otherwise. We combine this with the induction hypothesis on [e_γ^{sh}(z_1, \ldots, z_{s-1})]^2 to show that

A(z_1, \ldots, z_{s-1}) ≤ \frac{1}{n} \prod_{j=1}^{s} \left(1 + \frac{γ_j}{6}\right).

Now since we take z_s to be the value that minimizes [e_γ^{sh}(z_1, \ldots, z_{s-1}, z_s)]^2, it must be bounded by the average A(z_1, \ldots, z_{s-1}) and in turn bounded by the required upper bound.

A more sophisticated averaging argument can be used to prove that we can get close to O(1/n) convergence. We state the result below for general weights γ_u and general n.

Theorem 2. A lattice rule can be constructed by the component-by-component algorithm such that

e_γ^{sh}(z) ≤ \left( \frac{1}{|U_n|} \sum_{∅ ≠ u ⊆ \{1,\ldots,s\}} γ_u^λ \left( \frac{2ζ(2λ)}{(2π^2)^λ} \right)^{|u|} \right)^{1/(2λ)}

for all λ ∈ (1/2, 1], where ζ(x) = \sum_{k=1}^{∞} k^{-x} is the Riemann zeta function.

We have |U_n| = n - 1 when n is prime, |U_n| = n/2 when n is a power of 2, and more generally |U_n| ≤ n/2 when n is a power of a prime. The convergence rate close to O(1/n) is obtained by taking the parameter λ arbitrarily close to 1/2. This imposes stronger decay requirements on the weights γ_u if we want to end up with a bound that is independent of s. In particular, if we have product weights, then to have a convergence rate close to O(1/n) we need

\sum_{j=1}^{∞} \sqrt{γ_j} < ∞.

We concede that this theorem is way too technical for the purpose of this introduction, but we just want to provide a taste of the analysis, as the heading of this subsection foreshadowed.


3 Digital nets

Here we take a very informal approach to introduce the family of digital nets.

3.1 Digital net property

Loosely speaking, the general principle of digital nets is all about getting the same number of points in various allowable sub-divisions of the unit cube. This is similar in spirit to the Sudoku game!

Figure 4 illustrates the digital net property in 2D with 16 points. We can partition the unit square into 16 rectangles of the same shape and size. There is exactly one point in each rectangle (points on the top and right boundaries count toward the next rectangle), and this must hold for all of the 5 possible ways to sub-divide the unit square.

Fig. 4 Illustration of a (0,4,2)-net in base 2: every elementary interval of volume 1/16 contains exactly one of the 16 points. A point that lies on the dividing line counts toward the interval above or to the right.


This property generalizes to base b ≥ 2: instead of halving each time, we sub-divide into b equal partitions. The property also generalizes to include a quality parameter called the "t-value": each allowable sub-division (formally called elementary intervals) contains exactly b^t points. The smaller t is, the finer we can sub-divide the unit cube, and the more uniformly distributed the points are. Such a point set with n = b^m points in s dimensions is called a (t, m, s)-net. Figure 4 is an example of a (0,4,2)-net in base 2.

A (t, s)-sequence is a sequence of points in s dimensions such that if we chop the sequence into consecutive blocks of b^m points then every block is a (t, m, s)-net.

3.2 Digital construction

Needless to say, we cannot design digital nets in high dimensions by hand drawing rectangles or boxes. We construct digital nets by a digital construction scheme.

Recall that to construct lattice rules we need a generating vector of integers – one integer per dimension. To construct a digital net we need a vector of generating matrices C_1, \ldots, C_s – one generating matrix per dimension.

Here is how it works in base 2 (it easily generalizes to base b ≥ 2). Suppose we want n = 2^m points. To get the jth component of the kth point, we write k = (k_{m-1} ⋯ k_1 k_0)_2 in binary representation, take the m × m binary matrix C_j for dimension j, and compute

(y_1, y_2, \ldots, y_m)^T = C_j \, (k_0, k_1, \ldots, k_{m-1})^T,   (10)

where all additions and multiplications are carried out modulo 2. Then the jth component of the kth point is (0.y_1 y_2 ⋯ y_m)_2.
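A minimal Matlab sketch of (10), assuming a given [m-by-m] binary generating matrix Cj and an index k in 0, \ldots, 2^m - 1:

% jth component of the kth digital net point in base 2.
m = size(Cj, 1);
kbits = bitget(k, 1:m)';         % binary digits (k0, k1, ..., k_{m-1})'
ybits = mod(Cj * kbits, 2);      % matrix-vector product modulo 2
x = sum(ybits .* 2.^(-(1:m))');  % assemble (0.y1 y2 ... ym)_2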

Just as the choice of the generating vector determines the quality of a lattice rule, here the choice of the generating matrices determines the quality of a digital net – the corresponding t-value of the net can be small (good) or large (bad). In the case of a lattice rule we need only one integer value z_j in dimension j, but here we need to specify m^2 entries for the binary matrix C_j.

Finding good generating matrices can be a difficult task. Below we discuss two special cases of digital net construction.

Before we proceed, note that we can think of C_j in (10) as the top left-hand corner of some bigger matrix, and it does not even have to be a square matrix.


3.3 Sobol′ sequences

Sobol′ points are a popular example of digital nets in base 2, and they had been around long before the general concept of digital nets took shape. (Sobol′ is a Russian name – the ′ is not a typo! It denotes a soft pronunciation of the letter l.) To generate Sobol′ points, we need one primitive polynomial and some initial direction numbers for every dimension.

Primitive polynomials have specific properties which we will not go into here, but it is well-known how many there are of a given degree and also what they are. Since we need a different primitive polynomial for each dimension, and since the quality of the Sobol′ points deteriorates when the degree of the polynomial increases, we arrange all the primitive polynomials in order of increasing degree so that we use up all the lower degree polynomials first.

The initial direction numbers are used to kick start some recurrence relation involving the coefficients of the primitive polynomial in each dimension. These eventually lead to the entries of the generating matrix C_j. Again we will not go into the technical details here. Many software packages include implementations of Sobol′ generators, e.g., Matlab, NAG, QuantLib. We also provide our own Sobol′ generators for more than 20,000 dimensions.

3.4 Polynomial lattice rules

Another good way to get digital nets is by the polynomial lattice rule construction – they are actually digital nets rather than lattice rules, but in some formulation they mimic lattice rules and hence the name. Instead of having one generating vector of integers z_1, \ldots, z_s, we need a generating vector of polynomials q_1(χ), \ldots, q_s(χ) – one polynomial per dimension.

Let p(χ) be a polynomial of degree m with binary coefficients. In dimension j, we have a polynomial q_j(χ) of degree at most m - 1 with binary coefficients. We find the binary digits u_1, u_2, \ldots in

\frac{q_j(χ)}{p(χ)} = \frac{u_1}{χ} + \frac{u_2}{χ^2} + \frac{u_3}{χ^3} + ⋯

by equating coefficients in q_j(χ) = (u_1/χ + u_2/χ^2 + u_3/χ^3 + ⋯) p(χ), noting that all additions and multiplications are to be done modulo 2. Then we set

C_j = \begin{pmatrix}
u_1 & u_2 & u_3 & \cdots & u_m \\
u_2 & u_3 & & \iddots & u_{m+1} \\
u_3 & & \iddots & & u_{m+2} \\
\vdots & \iddots & & & \vdots \\
u_m & u_{m+1} & u_{m+2} & \cdots & u_{2m-1}
\end{pmatrix}.


The polynomial p(χ) is called the modulus, and it does not play a crucial role. The quality of a polynomial lattice rule is determined by the choice of the generating polynomials q_1(χ), \ldots, q_s(χ). Nowadays we have the theory and algorithms to find good polynomials using the fast component-by-component construction, analogously to lattice rules. All of this generalizes to base b ≥ 2.

3.5 Random digital shifting and scrambling

To preserve the digital net property, we need a different kind of randomization strategy than shifting, which preserves the lattice structure. One simple strategy is digital shifting. Instead of taking {t_k + Δ} for the kth point, we compute

t_k ⊕ Δ,

which means we carry out the exclusive-or operation on the binary bits of the vector components. For example, if x = 0.625 = (0.101)_2 and y = 0.125 = (0.001)_2, then x ⊕ y = (0.100)_2 = 0.5.
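In Matlab the digital shift can be applied with bitxor on fixed-point representations of the points. A minimal sketch, assuming r bits of precision and x and Delta arrays of the same size with entries in [0,1):

% Digital shift in base 2 via r-bit fixed-point exclusive-or.
r = 32;
xor01 = @(x, Delta) bitxor(floor(x * 2^r), floor(Delta * 2^r)) / 2^r;
xor01(0.625, 0.125)   % returns 0.5, as in the example above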

Scrambling is a more sophisticated randomization technique which can improve on the convergence rate of digital nets by an extra factor of 1/\sqrt{n} in some circumstances. Figure 5 illustrates the concept of scrambling as a sequence of pictures in 2D where slices are randomly swapped following some allowable conditions that preserve the digital net property.

As for lattice rules, randomization of digital nets provides an unbiased estimate of the integral as well as a practical error estimate.

3.6 Higher order nets by interlacing

There is a strategy called interlacing which can turn a regular digital net into a higher order digital net. Higher order digital nets can achieve O(1/n^α) convergence if the integrand is roughly α times differentiable in each variable. We need a different function space setting from the one in §2.5, and the theory is quite challenging.

Conceptually, to get a higher order digital net in s dimensions with interlacing factor α, we take a regular digital net in αs dimensions, and then "interlace" every block of α dimensions. Interlacing works as follows: if we have x = (0.x_1 x_2 x_3 ⋯)_2, y = (0.y_1 y_2 y_3 ⋯)_2 and z = (0.z_1 z_2 z_3 ⋯)_2, then the result of interlacing these three numbers is

(0.x_1 y_1 z_1 x_2 y_2 z_2 x_3 y_3 z_3 ⋯)_2.

This corresponds to an interlacing factor of 3, and we end up with a number that has three times as many bits as the original numbers.
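A minimal Matlab sketch of this digit interlacing, assuming xs is a column vector of α numbers in [0,1), each carrying r significant bits:

% Interlace the base-2 digits of alpha numbers into a single number.
r = 16; alpha = numel(xs);
bits = bsxfun(@bitget, floor(xs * 2^r), r:-1:1); % [alpha-by-r], MSB first
x = sum(bits(:)' .* 2.^(-(1:alpha*r)));          % read digits column by column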


Fig. 5 Owen's scrambling in base 2: (a) original; (b) swap left and right halves; (c) swap 3rd and 4th vertical quarters; (d) swap 3rd and 4th, 7th and last vertical eighths; (e) swap 3rd and 4th, 7th and 8th, 9th and 10th, 15th and last sixteenths; (f) swap 1st and 2nd horizontal quarters; (g) swap 1st and 2nd, 5th and 6th, 7th and last horizontal eighths; (h) swap 3rd and 4th, 7th and 8th, 9th and 10th, 15th and last horizontal sixteenths.

In practice an efficient way to implement higher order digital nets is by interlacing the rows of the generating matrices of the regular digital net, and then generating the points from these expanded matrices by allowing non-square matrices in (10).

Note that precision is a practical issue for higher order digital nets: if we want n = 2^m points with interlacing factor α, then under standard double precision machine numbers we can only manage when αm ≤ 53. For example, we can only get up to order α = 3 with 2^16 points.


4 Toy applications

In this section we outline three integrals which arise from grossly simplified models of practical applications. We begin with a discussion on the transformation needed to bring the integral into the unit cube.

4.1 Transformation to the unit cube

Question 1. Given an integral

\int_{-∞}^{∞} g(y) \, φ(y) \, dy,

where φ : R → R is some univariate probability density function, i.e., φ(y) ≥ 0 for all y ∈ R and \int_{-∞}^{∞} φ(y) \, dy = 1, how do we transform the integral into [0,1]?

Answer. Let Φ : R → [0,1] denote the cumulative distribution function of φ, i.e., Φ(y) = \int_{-∞}^{y} φ(t) \, dt, and let Φ^{-1} : [0,1] → R denote its inverse. Then we use the substitution (or change of variables)

x = Φ(y) ⟺ y = Φ^{-1}(x),

to obtain

\int_{-∞}^{∞} g(y) \, φ(y) \, dy = \int_0^1 g(Φ^{-1}(x)) \, dx = \int_0^1 f(x) \, dx,

with the transformed integrand f := g ∘ Φ^{-1}.

Question 2. Is this the only way?

Answer. No. We can divide and multiply by any other probability density function φ̃, and then map to [0,1] using its inverse cumulative distribution function Φ̃^{-1}:

\int_{-∞}^{∞} g(y) \, φ(y) \, dy = \int_{-∞}^{∞} \frac{g(y) \, φ(y)}{φ̃(y)} \, φ̃(y) \, dy
  = \int_{-∞}^{∞} g̃(y) \, φ̃(y) \, dy,   with g̃(y) := \frac{g(y) \, φ(y)}{φ̃(y)},
  = \int_0^1 g̃(Φ̃^{-1}(x)) \, dx
  = \int_0^1 f̃(x) \, dx,

giving a different transformed integrand f̃ := g̃ ∘ Φ̃^{-1}.

Ideally we would like to use a density function which leads to an easy integrand in the unit cube. This is related to the concept of importance sampling for the Monte Carlo method.

in the unit cube. This is related to the concept of importance sampling for theMonte Carlo method.

Question 3. How does this transformation generalize to s dimensions?

Answer. If we have a product of univariate densities, then we can apply the mapping Φ^{-1} componentwise,

y = Φ^{-1}(x) = (Φ^{-1}(x_1), \ldots, Φ^{-1}(x_s)),

to obtain

\int_{R^s} g(y) \prod_{j=1}^{s} φ(y_j) \, dy = \int_{[0,1]^s} g(Φ^{-1}(x)) \, dx = \int_{[0,1]^s} f(x) \, dx.

Remember that we can always divide and multiply to get such a product:

\int_{R^s} (\text{some ugly function of } y) \, dy = \int_{R^s} \frac{(\text{some ugly function of } y)}{\prod_{j=1}^{s} φ(y_j)} \prod_{j=1}^{s} φ(y_j) \, dy.

Question 4. How do we tackle the multivariate normal density which occurs in many integrals from practical models?

Answer. If the multivariate normal density is the dominating part of the entire integrand, then factorize the covariance matrix Σ, i.e., find an s × s matrix A such that

Σ = A A^T,   (11)

and then use the substitution (treating all vectors as column vectors)

y = A z   followed by   z = Φ^{-1}(x),

to obtain

\int_{R^s} g(y) \frac{\exp(-\frac{1}{2} y^T Σ^{-1} y)}{\sqrt{(2π)^s \det(Σ)}} \, dy   (12)
  = \int_{R^s} g(A z) \frac{\exp(-\frac{1}{2} z^T z)}{\sqrt{(2π)^s}} \, dz
  = \int_{R^s} g(A z) \prod_{j=1}^{s} \frac{\exp(-\frac{1}{2} z_j^2)}{\sqrt{2π}} \, dz = \int_{[0,1]^s} g(A Φ^{-1}(x)) \, dx = \int_{[0,1]^s} f(x) \, dx.

The factorization (11) is not unique. Two obvious choices are

1. the Cholesky factorization with lower triangular matrix A, or
2. the principal components factorization, which is given by

A = \left[ \sqrt{λ_1} \, η_1; \, ⋯; \, \sqrt{λ_s} \, η_s \right],

where (λ_j, η_j)_{j=1}^{s} denotes the eigenpairs of Σ, with ordered eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_s and unit-length column eigenvectors η_1, \ldots, η_s.

Other choices are possible.
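As an illustration, here is a minimal Matlab sketch of the principal components mapping. It assumes Sigma is given, the points x (an [s-by-n] array) lie strictly inside (0,1)^s (randomly shifted QMC points do), and norminv, the inverse of the standard normal cumulative distribution function, is available from the Statistics Toolbox:

% Map x in (0,1)^s, one point per column, to normal samples y with
% covariance Sigma via the principal components factorization.
[V, D] = eig(Sigma, 'vector');                % eigenpairs of Sigma
[lambda, idx] = sort(D, 'descend');           % order eigenvalues decreasingly
A = bsxfun(@times, V(:, idx), sqrt(lambda)'); % columns sqrt(l_j)*eta_j
y = A * norminv(x);                           % y is [s-by-n]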

Question 5. What if the multivariate normal density is not the dominating part of the entire integrand?

Answer. In that case, other transformation steps would be required to capture the main feature of the entire integrand.

4.2 Option pricing

Following the Black-Scholes model, the integral arising from pricing an arithmetic average Asian call option takes the general form of (12), with

g(y) = e^{-rT} \max\left( \frac{1}{s} \sum_{j=1}^{s} S_{t_j}(y_j) - K, \, 0 \right)

and

S_{t_j}(y_j) = S_0 \exp\left( \left(r - \frac{σ^2}{2}\right) \frac{jT}{s} + σ y_j \right),

where r is the risk-free interest rate, σ is the volatility, and S_0 is the initial asset price. The variables y = (y_1, \ldots, y_s)^T correspond to a discretization of the underlying Brownian motion over a time interval [0, T], and the covariance matrix has entries Σ_{ij} = (T/s) \min(i, j). The payoff function g(y) compares the average of the asset prices S_{t_j} at the discrete times with the strike price K, and takes their difference if it is positive, or the value 0 if the difference is negative.

It is widely accepted that QMC methods work especially well for such problems if we take the principal components approach to factorize the covariance matrix Σ. The success of QMC for option pricing problems cannot be explained by the standard theory due to the "kink" in the integrand. Lots of new QMC theory has been developed with this problem in mind.

Parameters for numerical experiment: S_0 = 100 (dollars), K = 100 (dollars), T = 1 (year), r = 0.1, σ = 0.2, and s = 256.
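With these parameters the payoff can be evaluated in vectorized form, ready to be combined with the normal mapping of §4.1. A minimal sketch (y is an [s-by-n] array, one discretized Brownian path per column):

% Vectorized Asian call option payoff g(y).
S0 = 100; K = 100; T = 1; r = 0.1; sigma = 0.2; s = 256;
t = (1:s)' * T / s;   % discrete times t_j = j*T/s
g = @(y) exp(-r*T) * ...
    max(mean(S0 * exp(bsxfun(@plus, (r - sigma^2/2) * t, sigma * y))) - K, 0);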


4.3 Maximum likelihood

One example of a time series Poisson likelihood model involves an integral of the form (12), with

g(y) = \prod_{j=1}^{s} \frac{\exp(τ_j (β + y_j) - e^{β + y_j})}{τ_j!}.

Here β ∈ R is a model parameter, τ_1, \ldots, τ_s ∈ {0, 1, \ldots} are the count data, and Σ is a Toeplitz covariance matrix with Σ_{ij} = σ^2 κ^{|i-j|}/(1 - κ^2), where σ^2 is the variance and κ ∈ (-1, 1) is the autoregression coefficient.

The obvious way to transform this integral into the unit cube by factorizing Σ would yield a very spiky function f. Instead, it is better to consider g(y) and the multivariate normal density together and then perform some change of variables with the effect of recentering and rescaling the whole integrand, before mapping to the unit cube. We have some new QMC theory that can explain the success of this approach.

Parameters for numerical experiment: β = 0.7, σ^2 = 0.3, κ = 0.5, and τ = (2,0,3,2,0,2,1,4,2,1,8,5,2,3,6,2,2,0,0,1,0,7,2,5,1) for s = 25 (we have more data beyond 25 dimensions). This example came from Kuo, Dunsmuir, Sloan, Wand & Womersley (2008).
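For reference, the integrand itself can be evaluated in vectorized form by moving the product into the exponent (tau is [s-by-1], beta a scalar, y an [s-by-n] array); as noted above, using it directly after factorizing Σ tends to give a spiky integrand:

% Vectorized Poisson likelihood integrand g(y).
g = @(y) exp(sum(bsxfun(@times, tau, beta + y) - exp(beta + y), 1)) ...
         / prod(factorial(tau));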

4.4 PDE with a random coefficient

We consider the following model of a parametric elliptic Dirichlet problem

-∇_x · (a(x, y) ∇_x u(x, y)) = f(x)   for x ∈ D and y ∈ [-1/2, 1/2]^s,   (13)
u(x, y) = 0   for x ∈ ∂D,

where D ⊂ R^d, d ∈ {1, 2, 3}, is a bounded spatial domain with a Lipschitz boundary ∂D, and where the parametric diffusion coefficient is (after truncating an infinite series to s terms)

a(x, y) = a(x) + \sum_{j=1}^{s} y_j \, ψ_j(x).   (14)

The parameter vector y is distributed on [-1/2, 1/2]^s with the uniform probability measure. We assume that a ∈ L^∞(D) and \sum_{j=1}^{∞} ‖ψ_j‖_{L^∞(D)} < ∞, and that 0 < a_min ≤ a(x, y) ≤ a_max < ∞ for all x and y. We refer to this as the uniform case.

We are interested in the expected value of some bounded linear functional G of the solution u, which is an integral of the form

\int_{[-1/2, 1/2]^s} G(u(·, y)) \, dy.


Note that in this problem we are integrating with respect to the parameter vector y, and not the spatial variables x.

A QMC Finite Element approximation to this integral is

\frac{1}{n} \sum_{k=0}^{n-1} G\left( u_h\left(·, \, t_k - \frac{1}{2}\right) \right),

where u_h denotes the Finite Element weak solution. Essentially, we generate a number of QMC points (either deterministic or randomized) and translate them to the cube [-1/2, 1/2]^s. Each such translated QMC point gives a different value of y and we solve the corresponding PDE and then apply the functional G. We finally take the average of all solutions.

By now there is a large body of literature on applying QMC methods to these and related problems, including the so-called lognormal case which gives rise to an integral over R^s with a normal density. QMC methods are relatively new to these applications and they have proven to be very competitive with other well established methods.

Parameters for numerical experiment: f(x) = 100 x_1, G(u(·, y)) = \int_D u(x, y) \, dx, d = 2, a(x) = 1, s = 100 (or any other number), and

ψ_j(x) = λ_j \sin(k_{1,j} π x_1) \sin(k_{2,j} π x_2),

where the sequence of pairs ((k_{1,j}, k_{2,j}))_{j≥1} is an ordering of the elements of Z_+ × Z_+ such that λ_j = (k_{1,j}^2 + k_{2,j}^2)^{-2} is a non-increasing sequence. (In other words, we form the pairs of positive integers, order them according to the reciprocal of the sum of the squared components, and then keep the first s pairs.) This example came from Dick, Kuo, Le Gia & Schwab (2016).


5 Software demo

We will demonstrate how to apply QMC methods to your favorite integrals/expectations. We will consider a simple test function throughout this section, instead of taking more complicated examples as in the previous section (e.g., where one function evaluation could involve solving a PDE).

Matlab will be used as the lingua franca in the examples, but further down you can find Python and C++ code. If you know at least one of these languages (or any similar language) then you should be able to understand what is going on, especially after comparing the three different implementations in §5.7. From a computational point of view it is important that we vectorize our function evaluations, see §5.3 for an explanation.

The exposition is such that the better code comes at the end.

All numerical tests in this chapter were run on the same old laptop which has a 1.8 GHz Intel Core i7 (2 cores) and 4 GB of memory, under Mac OS Sierra 10.12.1, with Matlab R2016a, Python 2.7.1 with NumPy 1.5.1, and clang++ 8.0.0.

5.1 A simple test function

We consider the following example function taken from Gantner & Schwab (2016):

g(x) := \exp\left( c \sum_{j=1}^{s} x_j \, j^{-b} \right) = \prod_{j=1}^{s} \exp(c \, x_j \, j^{-b}).   (15)

For testing purposes it is nice to know the exact value of the integral. Since g is a product of one-dimensional functions and we can write down the solution of the one-dimensional integrals, we find

I(g) := \int_{[0,1]^s} g(x) \, dx = \prod_{j=1}^{s} \frac{\exp(c \, j^{-b}) - 1}{c \, j^{-b}},   c ≠ 0.

Let us define g in Matlab as a vectorized inline function taking multiple vectors at once as an s × n array and returning a 1 × n array of results:

% x is an [s-by-n] matrix, n vectors of size s; vectorized g(x):
g = @(x, c, b) exp(c * (1:size(x,1)).^(-b) * x);
% note that vectors are considered to be columns and so the
% product above is an inner product, summing over the dimensions

In fact we will define g slightly differently as we will not just pass in multiple vectors at once, but also different shifted versions of these vectors as an s × n × m array:


g = @(x, c, b) reshape( ...
    exp(c * (1:size(x,1)).^(-b) * x(:,:)), ... % 'as above'
    1, size(x,2), size(x,3) ...                % 1-by-n-by-[whatever left]
);

Of course more complicated functions would better be defined in a separate file. E.g., we could define g in a separate file with the same function signature (we call this file gfun.m, and thus this function's name is gfun, to distinguish it from the inline definition above):

function y = gfun(x, c, b)
% function y = gfun(x, c, b)
%
% Vectorized evaluation of the example function
% \[ g(x) := \exp( c \sum_{j=1}^s x_j j^{-b} ) \]
%
% Inputs:
%   x   array of s-dimensional vectors, [s-by-n] array
%       or [s-by-n-by-m] array (or even deeper nesting,
%       but the first dimension should be the dimension s)
%   c   scaling parameter, scalar
%   b   dimension decay parameter to diminish influence of
%       higher dimensions, scalar, b >= 0 (b = 0 is no decay)
% Outputs:
%   y   function value for each input vector, [1-by-n] array
%       or [1-by-n-by-m] array (or deeper, but same as x)
%
% Note: the array x (and also the resulting array y) can have more
% than two dimensions, e.g., x could be [s-by-n-by-m] and then the
% resulting y will be [1-by-n-by-m]. This is to accommodate for
% multiple versions of a point set (e.g., for shifted point sets).
y = reshape( ...
    exp(c * (1:size(x,1)).^(-b) * x(:,:)), ...
    max((1:ndims(x) ~= 1) .* size(x), 1) ... % first dim to 1
);

This version is even more general as it allows x to have any shape as long as the leading dimension is s (the dimensionality of the points), and it will return an array of the same shape but with the first dimension set to 1 (mapping vectors into function values).

We are now ready to fix some parameters:

% parameters of the g-function
s = 100; % number of dimensions
c = 1;   % c-parameter
b = 2;   % decay-parameter

Then we can calculate its exact integral value:

a = c * (1:s).^(-b); exact = prod(expm1(a)./a);
% or as a function (repeating the 'a' twice):
gexact = @(s, c, b) prod(expm1(c*(1:s).^(-b))./(c*(1:s).^(-b)));
% notice we use expm1(a) and not exp(a) - 1


5.2 The difficulty of our test function

Fig. 6 Interpretation of the combined parameters c and b for the function g in a single coordinate, showing the curves exp(x), exp(x/5), exp(-x/5), exp(-x) and exp(-5x). The effect is "multiplied" in multiple dimensions.

The parameters c and b specify how difficult the function g will be. We can use Figure 6 as a guideline. It is clear we have a product of such one-dimensional exponential functions.

Looking at the figure, we see that if the argument to the exponential function is a small number then the function is nearly linear and approaching a constant function. Note that constant functions are ridiculously easy to integrate. The extreme case would be c = 0; in that case both MC and QMC methods will give the exact value of the integral already with one function value.

The larger the value of c (positive or negative) the more we deviate from a linear function and we need more and more samples to determine its integral. For negative c with very large magnitude, the function is essentially 0 except for g(0) = 1, so it is a "peaky function" and rather hard to integrate.

In the multivariate case, the parameter b, with b ≥ 0, modifies how quickly we converge to a constant function as the dimension j increases. When the number of dimensions increases, the deviation from the constant function is "multiplied".

5.3 Some technical details*

Floating point precision

Note the usage of the function expm1 in calculating the exact integral of g. This is useful for small arguments when exp(x) ≈ 1, since then it is more accurate to compute the right-most expression instead of the middle expression of

\exp(x) - 1 = \left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + ⋯\right) - 1 = x + \frac{x^2}{2!} + \frac{x^3}{3!} + ⋯,

because of floating point arithmetic, where 1 + ε is rounded to 1 for ε smaller than the floating point precision.

Vectorization of (interpreted) code

It is often a good idea to vectorize code in interpreted languages (e.g., Matlab and Python). Vectorized code often runs much faster than for-loops. Once you get used to vectorized code, in terms of matrices and vectors, it is less error prone and easier to read. In Matlab we vectorize by using matrix and vector operations, as well as use array operations on each element by putting a dot in front of the operator: ".*", "./", ".^". You can look up "vectorization" in the Matlab documentation.

A very useful function is reshape, which does not cost any computational or memory effort. It just takes the same block of data but interprets the data as if the elements had a different shape. Matlab uses column-major format, which means that for x an s × n × m array, the first s consecutive numbers in the data block are the part x(1:s,1,1) = x(:,1,1). This is the same as in Fortran. C and C++ however use row-major format, in which the last dimension iterates over consecutive elements. In Python, using NumPy, the default is row-major but it can be chosen on an array-by-array basis.

The way we have chosen to lay out our collections of s-dimensional vectors in Matlab is such that the vectors are stored consecutively in memory. In this way the data needed to evaluate one function value is localized in memory. If we were to implement such a vectorization in Python with row-major format, then we would formulate x as an m × n × s array instead. That is what we will choose in §5.7.2 when making a Python implementation.

Parallelization

We note that MC and QMC methods are “embarrassingly parallel”. This is a technical term meaning that all computations are independent, and so they can be distributed straightforwardly (preferably in blocks) over multiple cores. The accumulation of the results can then be done with a simple “reduce” operation, as sketched below.
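A minimal Python sketch of block-parallel MC for our test function, under our own choice of block size and the hypothetical helper mc_block: each worker sums g over an independent block of random points, and the partial sums are combined in one reduce step at the end.

from concurrent.futures import ProcessPoolExecutor
import numpy as np

def mc_block(seed, n, s, c, b):
    # each worker evaluates g at its own independent block of n random points
    rng = np.random.RandomState(seed)
    x = rng.rand(n, s)
    w = c * np.arange(1, s + 1, dtype=float) ** -b
    return np.exp(np.dot(x, w)).sum()

if __name__ == '__main__':
    s, c, b = 100, 1.0, 2.0
    n_blocks, n = 8, 12500            # 8 blocks of 12500 points: N = 10^5
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(mc_block, seed, n, s, c, b)
                   for seed in range(n_blocks)]
        total = sum(f.result() for f in futures)
    print(total / (n_blocks * n))     # the "reduce" step: one global average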

5.4 Usage of random number generators

It is good practice to have reproducible results (for debugging, when testing optimizations, or for checking the results in this text). We will use the Mersenne Twister as the random number generator for our MC simulations and we will set


its initial state to a fixed value such that we can repeat our experiment and get exactly the same random numbers.

Similarly, we will use the combined multiple recursive generator MRG32k3a from L’Ecuyer for the random shifting in the QMC approximations.

rng_MC = RandStream('mt19937ar', 'Seed', 1234);
rng_shifts = RandStream('mrg32k3a', 'Seed', 1234);

In Matlab you can now draw random numbers from the Mersenne Twister by doing x = rand(rng_MC, s, n) to obtain an s×n array.
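As an aside, a comparable reproducible setup in Python/NumPy might look as follows (a sketch: NumPy's legacy RandomState is always a Mersenne Twister, so unlike Matlab we simply use two differently seeded streams):

import numpy as np

rng_MC = np.random.RandomState(1234)      # Mersenne Twister stream for MC
rng_shifts = np.random.RandomState(4321)  # separate stream for random shifts

s, n = 100, 1000
x = rng_MC.rand(s, n)  # an s-by-n array of uniform random numbers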

5.5 Monte Carlo approximation

We are now ready to do a first approximation of the integral using the MC method. We will use a hundred thousand samples (N = 10⁵) to get a MC approximation. (Of course, in our test case we do know the exact value of the integral.)

tic;
N = 1e5;                          % number of samples
G = g(rand(rng_MC, s, N), c, b);  % evaluate all at once: memory hungry but easy
MC_Q = mean(G);
MC_std = std(G)/sqrt(N);
t = toc;
fprintf('MC_Q = %g (error=%g, std=%g, N=%d) in %f sec\n', ...
        MC_Q, abs(MC_Q - gexact(s, c, b)), MC_std, N, t);

This gives us

MC_Q = 2.3679 (error=0.000573539, std=0.00223394, N=100000) in 0.271163 sec

Re-running the above code 9 times, without resetting the seed, gives the following output:

MC_Q = 2.36737 (error=0.00110043, std=0.00223924, N=100000) in 0.209265 sec
MC_Q = 2.37118 (error=0.00270455, std=0.00224065, N=100000) in 0.237171 sec
MC_Q = 2.36787 (error=0.000600521, std=0.00224386, N=100000) in 0.233068 sec
MC_Q = 2.36933 (error=0.000861304, std=0.00223436, N=100000) in 0.227344 sec
MC_Q = 2.37172 (error=0.00324287, std=0.00223811, N=100000) in 0.223135 sec
MC_Q = 2.37063 (error=0.00216023, std=0.00223951, N=100000) in 0.219199 sec
MC_Q = 2.36794 (error=0.000537361, std=0.00224293, N=100000) in 0.221401 sec
MC_Q = 2.36981 (error=0.00133505, std=0.00223974, N=100000) in 0.293782 sec
MC_Q = 2.36995 (error=0.00147575, std=0.00223747, N=100000) in 0.272278 sec

In Figure 7 we plot the results of these 10 approximations to obtain estimates of I(g) as well as σ²(g). In Figure 8 the standard error is plotted against the number of samples used. We can clearly see from the figure that the convergence is 1/√N, as expected. For our test function we can actually calculate σ²(g), and so we also plotted σ(g)/√N as a dashed reference line. Using 10⁵ random samples we observe an estimated standard error of ∼ 10⁻³. If we wanted to divide this error by 10, we would need to take 100 times more samples.
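In formula form: requiring a standard error ten times smaller means
\[
  \frac{\sigma(g)}{\sqrt{N'}} \;=\; \frac{1}{10}\,\frac{\sigma(g)}{\sqrt{N}}
  \quad\Longleftrightarrow\quad N' = 100\,N\,.
\]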


Fig. 7 Monte Carlo integral approximations (left) and variance estimations (right) for 10 runs for the function g with c = 1, b = 2 and s = 100. The dashed lines are the actual integral and variance.

Fig. 8 Standard error of 10 Monte Carlo integral approximations for g with c = 1, b = 2 and s = 100. The dashed line is the actual σ(g)/√N.

5.6 Quasi-Monte Carlo approximation

Having just observed the MC rate of 1/√N, we are ready to try a QMC rule and see if we get the promised 1/N convergence. We take the simplest of all QMC rules: a lattice rule (or actually a lattice sequence). We will use a “multi-purpose” lattice sequence to calculate our integral. We use the generating vector from the file exod2_base2_m20_CKN on the webpage https://people.cs.kuleuven.be/~dirk.nuyens/qmc-generators/. The maximum number of dimensions provided is 250 and the maximum number of points is 2²⁰ (about a million). This generating vector


was optimized for the weighted Sobolev space with “order 2” weights. This means the rule was optimized to pay particular attention to all 2-dimensional projections. The rule was constructed to be good for all powers of 2 and can thus be used to obtain a sequence of approximations.

As a first try we can just generate the first 2¹⁶ points in 100 dimensions and take the average of our function at these points (our QMC estimate):

z = load('exod2_base2_m20_CKN.txt'); % should be column vector!
z = z(:);                            % will force column vector
tic;
N = pow2(16);
x = mod(z(1:s) * (0:N-1), N) / N;    % all 2^16 points
QMC_Q = mean(g(x, c, b));
t = toc;
fprintf('QMC_Q = %g (error=%g, N=%d) in %g sec\n', ...
        QMC_Q, abs(QMC_Q - gexact(s, c, b)), N, t);

This gives us

QMC_Q = 2.36845 (error=2.10821e-05, N=65536) in 0.588378 sec

As this is a deterministic result, there is no standard error output on the estimator. In this case of c = 1 we get a nice error of ∼ 10⁻⁵, while MC only gets ∼ 10⁻³ using 10⁵ evaluations, i.e., more than 50% more evaluations. Although the complexity of generating random numbers and generating lattice points is technically almost the same, there is a larger difference in execution time in Matlab due to the fact that the Matlab random number generator is compiled code; we already saw, however, that the error of QMC is much smaller than that of MC for the same number of points. Of course, the time difference goes away by using compiled code for QMC as well. We will show some timings in Figure 11.
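For the next experiment, recall the mechanics of random shifting: a randomly shifted lattice rule takes M independent shifts Δ⁽¹⁾, …, Δ⁽ᴹ⁾ ∈ [0,1)ˢ and, for each shift, averages g over the shifted lattice points
\[
  x_k^{(r)} \;=\; \Bigl\{\frac{k\,z \bmod n}{n} + \Delta^{(r)}\Bigr\},
  \qquad k = 0,\dots,n-1,
\]
where {·} denotes the componentwise fractional part. The M resulting estimates are independent and identically distributed, so their sample standard deviation divided by √M gives a practical error estimate.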

To obtain a practical error estimate we will now use several randomly shifted lattice rules and look at successive approximations for an increasing number of points. We do that in the following snippet:

M = pow2(3); % number of random shifts
shifts = rand(rng_shifts, s, M);

Ns = pow2(4:16);           % calculate approximations for different N's
QMC_Q = zeros(size(Ns));   % approximations for increasing N
QMC_std = zeros(size(Ns)); % standard error for each N
QMC_VG = zeros(size(Ns));  % estimate of variance of g for each N

for i=1:numel(Ns)
  N = Ns(i); % total number of function evaluations
  n = N / M; % use QMC rule with n = (N/M) points and
             % M independent realisations of this rule
  tic; % we generate all points and all shifts s-by-n-by-M:
  G = g(mod(bsxfun(@plus, reshape(z(1:s)*(0:n-1), [s n 1]), ...
                   reshape(n*shifts, [s 1 M])), n)/n, ...
        c, b);                       % G is 1-by-n-by-M (all points, all shifts)
  QMC_R = mean(G);                   % mean(G) is 1-by-1-by-M
  QMC_Q(i) = mean(QMC_R);            % average over all M shifts
  QMC_std(i) = std(QMC_R)/sqrt(M);   % stderr over M shifts
  t = toc;
  % ad hoc calculation of var(g)
  QMC_VG(i) = sum(G(:).^2)/N - mean(G(:)).^2;
  fprintf('QMC_Q = %g (error=%g, std=%g, N=%d) in %f sec\n', ...
          QMC_Q(i), abs(QMC_Q(i) - gexact(s, c, b)), ...
          QMC_std(i), N, t);
end

The output is

QMC_Q = 2.26254 (error=0.105936, std=0.116497, N=16) in 0.001603 sec
QMC_Q = 2.29778 (error=0.0706969, std=0.0386404, N=32) in 0.000966 sec
QMC_Q = 2.36985 (error=0.00137447, std=0.0291711, N=64) in 0.001031 sec
QMC_Q = 2.37964 (error=0.0111641, std=0.0120569, N=128) in 0.003045 sec
QMC_Q = 2.36813 (error=0.000341202, std=0.00856618, N=256) in 0.003622 sec
QMC_Q = 2.36805 (error=0.000427477, std=0.00671663, N=512) in 0.004757 sec
QMC_Q = 2.36629 (error=0.00218738, std=0.00265859, N=1024) in 0.008950 sec
QMC_Q = 2.36886 (error=0.000388421, std=0.000855061, N=2048) in 0.022411 sec
QMC_Q = 2.36879 (error=0.000317932, std=0.000581296, N=4096) in 0.038437 sec
QMC_Q = 2.3687 (error=0.000229074, std=0.000316656, N=8192) in 0.101937 sec
QMC_Q = 2.36844 (error=2.99859e-05, std=0.000141558, N=16384) in 0.169285 sec
QMC_Q = 2.36844 (error=3.17432e-05, std=7.53801e-05, N=32768) in 0.269543 sec
QMC_Q = 2.3685 (error=3.04367e-05, std=3.9735e-05, N=65536) in 0.531608 sec

Fig. 9 Quasi-Monte Carlo integral approximations (left) and variance estimations (right) for randomly shifted lattice rules (with 8 shifts) for the function g with c = 1, b = 2 and s = 100. We plot 10 results (for 10 times 8 random shifts) to get an idea of how the result varies with the randomness of the shifts. The dashed lines are the actual integral and variance. The Monte Carlo approximations are the greyed out lines for reference.

The results of these 10 QMC simulations with different shifts are given in Figure 9 and Figure 10. The difference with the MC simulations (greyed out on the plots) is enormous, which is most obvious on the log-log plot of the standard error. The convergence rate here is near 1/N. If we want one more digit of accuracy, then we need to take 10 times more samples (instead of 100 times as in the case of MC). This is the power of plain straightforward QMC.

Exercise 5.1 In the above code the complete point set is constructed over and over again when we increase the number of points. This is not needed; we can just calculate the additional points. Write your own code to only evaluate the additional points and add the sum of your new evaluations (for each shift) to a variable QMC_acc of size 1×M, such that QMC_R = QMC_acc / n. (For the impatient: we will do this as well in the next section.)


Fig. 10 Standard error of 10 quasi-Monte Carlo integral approximations for g with c = 1, b = 2 and s = 100. For reference we also plotted the standard error of the Monte Carlo approximations as greyed out lines.


5.7 Using standard lattice point generators

In the previous section we wrote the lattice point generation from scratch. Here we give information on three pre-made point generators which can be downloaded from https://people.cs.kuleuven.be/~dirk.nuyens/qmc-generators/:

• Matlab: latticeseq_b2.m,
• Python: latticeseq_b2.py (which contains a command line point generator),
• C++: latticeseq_b2.hpp and latticeseq_b2.cpp (which is an example command line point generator).

All of these point generators actually generate the points in radical inverse ordering. This means the points can be used as a lattice sequence in the order they are generated. There is no need to work in blocks of powers of 2, although we will still do that in the examples below for convenience.
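For intuition: the base-2 radical inverse φ₂(k) mirrors the binary digits of k about the binary point, and (as we understand the generators) the k-th point of the lattice sequence is then {φ₂(k)·z}, which matches the point ordering printed by the Python generator below. A minimal Python sketch of φ₂, ours for illustration:

def radical_inverse_base2(k):
    """Mirror the binary digits of the non-negative integer k about the
    binary point: 1 -> 0.5, 2 = 10_2 -> 0.25, 3 = 11_2 -> 0.75, ..."""
    phi, f = 0.0, 0.5
    while k > 0:
        phi += f * (k & 1)  # take the lowest binary digit of k
        k >>= 1             # drop it
        f *= 0.5            # the next digit is worth half as much
    return phi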


5.7.1 Using “latticeseq_b2.m” in Matlab

The Matlab point generator is very straightforward to use. We need to initialize it with a generating vector:

z = load('exod2_base2_m20_CKN.txt');
latticeseq_b2('init0', z);

Now you keep on asking for more points as if it were a random number generator in Matlab:

for i=1:4
  x = latticeseq_b2(s, n); % draw n vectors of length s
  % now do something with this [s-by-n] matrix
end

Here is how we use our Matlab point generator to approximate the integral of the test function:

%% our function g
g = @(x, c, b) exp(c * (1:size(x,1)).^(-b) * x);
gexact = @(s, c, b) prod(expm1(c*(1:s).^(-b))./(c*(1:s).^(-b)));

%% parameters of the g-function
s = 100; % number of dimensions
c = 1;   % c-parameter
b = 2;   % decay-parameter
exact = gexact(s, c, b);

%% QMC generator
z = load('exod2_base2_m20_CKN.txt');
latticeseq_b2('init0', z);

%% reset RNG for shifts
rng_shifts = RandStream('mrg32k3a', 'Seed', 1234);

%% random shifts
M = pow2(3);
shifts = rand(rng_shifts, s, M);

%% QMC approximation
acc = zeros(1, M);
nprev = 0;
mmin = 4; mmax = 16;
QQ=zeros(1, mmax-mmin+1); SS=zeros(size(QQ)); tt=zeros(size(QQ));
for m=mmin:mmax
  N = pow2(m);
  n = N / M;
  nnew = n - nprev;
  tic;
  x = latticeseq_b2(s, nnew); % generate the new points
  for r=1:M
    acc(r) = acc(r) + ...
        sum( g(mod(bsxfun(@plus, x, shifts(:,r)), 1), c, b) );
  end
  nprev = n;
  Q = mean(acc/n);
  stdQ = std(acc/n) / sqrt(M);
  t = toc;
  fprintf('QMC_Q = %g (error=%g, std=%g, N=%d) in %f sec\n', ...
          Q, abs(Q - exact), stdQ, N, t);
  QQ(m-mmin+1) = Q; SS(m-mmin+1) = stdQ; tt(m-mmin+1) = t;
end

loglog(pow2(mmin:mmax), SS, '-o');

Exercise 5.2 Please note that we used less vectorization here compared to our Matlab code in the previous subsection: this time we have an extra for-loop running over the different shifts. (We did, however, make the change not to evaluate the function over and over again at the initial points of the lattice sequence.) Try to change the code such that this extra for-loop is eliminated, using techniques similar to those in our previous Matlab code. (Note that you will then have to plug in the version of g which accepts s×n×m arrays again.)

The Matlab code results in the following output:

QMC_Q = 2.26254 (error=0.105936, std=0.116497, N=16) in 0.002076 sec
QMC_Q = 2.29778 (error=0.0706969, std=0.0386404, N=32) in 0.001282 sec
QMC_Q = 2.36985 (error=0.00137447, std=0.0291711, N=64) in 0.001494 sec
QMC_Q = 2.37964 (error=0.0111641, std=0.0120569, N=128) in 0.004586 sec
QMC_Q = 2.36813 (error=0.000341202, std=0.00856618, N=256) in 0.003935 sec
QMC_Q = 2.36805 (error=0.000427477, std=0.00671663, N=512) in 0.004966 sec
QMC_Q = 2.36629 (error=0.00218738, std=0.00265859, N=1024) in 0.007660 sec
QMC_Q = 2.36886 (error=0.000388421, std=0.000855061, N=2048) in 0.012066 sec
QMC_Q = 2.36879 (error=0.000317932, std=0.000581296, N=4096) in 0.025548 sec
QMC_Q = 2.3687 (error=0.000229074, std=0.000316656, N=8192) in 0.043272 sec
QMC_Q = 2.36844 (error=2.99859e-05, std=0.000141558, N=16384) in 0.125853 sec
QMC_Q = 2.36844 (error=3.17432e-05, std=7.53801e-05, N=32768) in 0.141943 sec
QMC_Q = 2.3685 (error=3.04367e-05, std=3.9735e-05, N=65536) in 0.256846 sec

5.7.2 Using “latticeseq_b2.py” in Python

We will now do the same in Python. The Python code presented here uses “row-major” ordering for the function g: we keep consecutive dimensions of a vector as the last dimension in an array such that they are consecutive in memory.

The Python point generator can be used as an iterator, e.g.:

from latticeseq_b2 import latticeseq_b2

latgen = latticeseq_b2('exod2_base2_m20_CKN.txt', s=4, m=3)
for x in latgen:
    print x

This will print:

[0.0, 0.0, 0.0, 0.0]
[0.5, 0.5, 0.5, 0.5]
[0.25, 0.75, 0.75, 0.25]
[0.75, 0.25, 0.25, 0.75]
[0.125, 0.375, 0.375, 0.125]
[0.625, 0.875, 0.875, 0.625]
[0.375, 0.125, 0.125, 0.375]
[0.875, 0.625, 0.625, 0.875]

Our Python implementation now goes as follows:

#!/usr/bin/env python

from numpy import *
from timeit import default_timer as timer
import sys

from latticeseq_b2 import latticeseq_b2 # Python point generator

def g(x, c, b):
    """Our function g, which accepts x as a [sz1-by-...-by-s]
    array with s the number of dimensions in a single vector."""
    s = size(x, -1)
    return exp(c * inner(arange(1, s+1, dtype='d')**-b, x))

def gexact(s, c, b):
    a = c*arange(1, s+1, dtype='d')**-b
    return prod(expm1(a) / a)

# parameters for our function
s = 100
c = 1
b = 2
exact = gexact(s, c, b)

# random shifting
M = 2**3
random.seed(1) # Mersenne Twister
shifts = random.rand(M, s)

# QMC generator (truncated to correct number of dimensions)
latgen = latticeseq_b2('exod2_base2_m20_CKN.txt', s=s)

acc = zeros((M,)) # accumulator for each shift
nprev = 0
mmin = 4; mmax = 16
for m in range(mmin, mmax+1):
    N = 2**m
    n = N / M
    start = timer()
    for k in range(nprev, n):
        x = latgen.next() # next point, evaluate in all shifts:
        acc += [ g((x+shift) % 1, c, b) for shift in shifts ]
    nprev = n
    Q = mean(acc/n)
    stdQ = std(acc/n) / sqrt(M)
    end = timer()
    print "QMC_Q = %g (error=%g, std=%g, N=%d) in %f sec" % \
        (Q, abs(Q-exact), stdQ, N, end-start)
    sys.stderr.write("%d\t%g\t%g\t%g\t%g\n" % \
        (N, Q, stdQ, abs(Q-exact), end-start))

We print both on stdout and stderr. By using output redirection we can look at both outputs separately, or save them to file. If you run the program like

./qmc.py 2> /dev/null

then you only see the info printed on stdout. If, on the other hand, you are interested in making a plot in Matlab with the output on stderr, you can go ahead like this:

./qmc.py 2> qmcpy_stderr.txt

and then make a plot in Matlab doing:

R = load('qmcpy_stderr.txt');  % column 1 is total function evals
loglog(R(:,1), R(:,3), '*-');  % column 3 is stderr

This Python code results in the following output on stdout:

QMC_Q = 2.47133 (error=0.102853, std=0.109378, N=16) in 0.002346 sec
QMC_Q = 2.39414 (error=0.0256693, std=0.0448069, N=32) in 0.001667 sec
QMC_Q = 2.36028 (error=0.00819468, std=0.0223617, N=64) in 0.003084 sec
QMC_Q = 2.37069 (error=0.00222088, std=0.0146228, N=128) in 0.006265 sec
QMC_Q = 2.37581 (error=0.00734012, std=0.00635773, N=256) in 0.012161 sec
QMC_Q = 2.37829 (error=0.00981743, std=0.00549536, N=512) in 0.025390 sec
QMC_Q = 2.37399 (error=0.00551578, std=0.00233053, N=1024) in 0.057405 sec
QMC_Q = 2.36986 (error=0.00138407, std=0.00107834, N=2048) in 0.100266 sec
QMC_Q = 2.36914 (error=0.00066687, std=0.000603703, N=4096) in 0.204837 sec
QMC_Q = 2.36866 (error=0.000190412, std=0.000238276, N=8192) in 0.390456 sec
QMC_Q = 2.36858 (error=0.000109506, std=0.000156642, N=16384) in 0.773687 sec
QMC_Q = 2.36851 (error=4.12105e-05, std=7.73331e-05, N=32768) in 1.583910 sec
QMC_Q = 2.3685 (error=2.46144e-05, std=2.11916e-05, N=65536) in 3.423699 sec

Exercise 5.3 As can be seen from this output, the execution time of the Python code is dramatically slower. This is because the Python code has not been vectorized (we did provide a vectorized function g however). Try to change the Python code to make use of vectorized evaluation of the function g. In Python you can use (x + shifts[r,:]) % 1, for x an n×s array, to shift all n points at once by the r-th shift of size 1×s. In NumPy this is called “broadcasting” and is similar to the effect of the Matlab singleton expansion bsxfun. Also use the calc_block method of the point generator to get a full power-of-2 block of points as a NumPy array (but then no longer in radical inverse ordering). If you make this change, the above output should show performance enhancements similar to the following:

QMC_Q = 2.47133 (error=0.102853, std=0.109378, N=16) in 0.000493 sec
QMC_Q = 2.39414 (error=0.0256693, std=0.0448069, N=32) in 0.000674 sec
QMC_Q = 2.36028 (error=0.00819468, std=0.0223617, N=64) in 0.000606 sec
QMC_Q = 2.37069 (error=0.00222088, std=0.0146228, N=128) in 0.000713 sec
QMC_Q = 2.37581 (error=0.00734012, std=0.00635773, N=256) in 0.000784 sec
QMC_Q = 2.37829 (error=0.00981743, std=0.00549536, N=512) in 0.001038 sec
QMC_Q = 2.37399 (error=0.00551578, std=0.00233053, N=1024) in 0.002384 sec
QMC_Q = 2.36986 (error=0.00138407, std=0.00107834, N=2048) in 0.003819 sec
QMC_Q = 2.36914 (error=0.00066687, std=0.000603703, N=4096) in 0.007984 sec
QMC_Q = 2.36866 (error=0.000190412, std=0.000238276, N=8192) in 0.009895 sec
QMC_Q = 2.36858 (error=0.000109506, std=0.000156642, N=16384) in 0.019604 sec
QMC_Q = 2.36851 (error=4.12105e-05, std=7.73331e-05, N=32768) in 0.043807 sec
QMC_Q = 2.3685 (error=2.46144e-05, std=2.11916e-05, N=65536) in 0.093990 sec


5.7.3 Using “latticeseq_b2.hpp” in C++

Once we move into a compiled language, we do not really need vectorization any more to get fast code. (Of course, you would still use highly optimized/vectorized code when you are in need of BLAS and LAPACK operations; but this is of no concern to us right now. Similarly, you could make use of advanced SIMD instructions to speed up your function evaluation as well.)

The C++ code makes use of the C++11 standard. The generator works like a C++ iterator: obtaining its value by operator* (as a pointer to an array) and advancing to the next point by operator++. We will copy the point into a std::vector object for simplicity of the code. The following is our implementation in C++ (using -std=c++11 and -Wall -Ofast to compile).

#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#include <cmath>
#include <random>
#include <chrono>

#include "latticeseq_b2.hpp" // the C++ lattice point generator

double mod1(double x)
{
  return x - (long long)(x); // assuming x is positive
}

// our function g
double g(const std::vector<double>& x, double c, double b)
{
  using std::pow; using std::exp;
  double s = 0;
  for(int j = 1; j <= (int)x.size(); ++j) s += x[j-1] * pow(j, -b);
  return exp(c * s);
}

double gexact(int s, double c, double b)
{
  using std::pow; using std::expm1;
  double y = 1;
  for(int j = 1; j <= s; ++j)
    y *= expm1(c * pow(j, -b)) / (c * pow(j, -b));
  return y;
}

int main(int argc, char* argv[])
{
  using std::sqrt; using std::abs; using namespace std::chrono;

  // parameters of our function g
  int s = 100;
  double c = 1;
  double b = 2;
  double exact = gexact(s, c, b);

  // random shifting: shift r occupies shifts[r*s] up to shifts[r*s+s-1]
  int M = 1 << 3; // 2^3 = 8 random shifts
  std::mt19937 rng(1); // Mersenne Twister
  std::uniform_real_distribution<> dis(0, 1);
  std::vector<double> shifts(s*M);
  for(int i = 0; i < s*M; ++i) shifts[i] = dis(rng);

  // QMC generator
  int mmin = 4;
  int mmax = 16;
  // read generating vector from file:
  std::ifstream is("exod2_base2_m20_CKN.txt");
  std::vector<int> z(std::istream_iterator<int>(is), {});
  qmc::latticeseq_b2<double, int> latgen(s, mmax, z.begin());

  std::vector<double> acc(M, 0); // accumulator for each shift
  int nprev = 0;
  for(int m = mmin; m <= mmax; ++m) {
    int N = 1 << m; // estimates with 2^m function values
    int n = N / M;  // divide out random shifts
    auto t1 = high_resolution_clock::now();
    for(int k = nprev; k < n; ++k, ++latgen) {
      std::vector<double> x(*latgen); // get QMC point
      std::vector<double> y(s);       // shifted copy, so x stays unshifted
      for(int r = 0; r < M; ++r) {    // for all shifts:
        for(int j = 0; j < s; ++j)    // apply random shift r
          y[j] = mod1(x[j] + shifts[r*s + j]);
        acc[r] += g(y, c, b);         // eval function
      }
    }
    nprev = n;
    double acc_tot = 0; // calculate average over all shifts:
    for(int r = 0; r < M; ++r) acc_tot += acc[r];
    double Q = acc_tot / N;
    double varQ = 0; // calculate variance of estimator Q(g)
    for(int r = 0; r < M; ++r)
      varQ += ((acc[r]/n) - Q) * ((acc[r]/n) - Q);
    varQ /= M*(M-1);
    auto t2 = high_resolution_clock::now();
    auto t = duration_cast<nanoseconds>(t2-t1).count() / 1e9;
    // print info on stdout
    std::cout << "QMC_Q = " << Q
              << " (error=" << abs(Q-exact)
              << ", std=" << sqrt(varQ)
              << ", N=" << N
              << ") in " << t << " sec" << std::endl;
    // also print machine parseable info on stderr
    std::cerr << N << "\t"
              << Q << "\t"
              << sqrt(varQ) << "\t"
              << abs(Q-exact) << "\t"
              << t << "\t"
              << std::endl;
  }
  return 0;
}

We print both on stdout and stderr. By using output redirection we can look at both outputs separately, or save them to file. If you run the program like

./qmc 2> /dev/null

then you only see the info printed on stdout. If, on the other hand, you are interested in making a plot in Matlab with the output on stderr, you can go ahead like this:

./qmc 2> qmccpp_stderr.txt

and then make a plot in Matlab doing:

R = load('qmccpp_stderr.txt');  % column 1 is total function evals
loglog(R(:,1), R(:,3), '*-');   % column 3 is stderr

This C++ code results in the following output on stdout:

QMC_Q = 2.51913 (error=0.150654, std=0.171512, N=16) in 3.95e-05 sec
QMC_Q = 2.38777 (error=0.0193001, std=0.0919632, N=32) in 2.0462e-05 sec
QMC_Q = 2.34195 (error=0.0265183, std=0.0356583, N=64) in 6.315e-05 sec
QMC_Q = 2.36126 (error=0.00721449, std=0.0229612, N=128) in 0.00010186 sec
QMC_Q = 2.37241 (error=0.00393292, std=0.00883655, N=256) in 0.000156834 sec
QMC_Q = 2.37777 (error=0.00929647, std=0.0056284, N=512) in 0.000323498 sec
QMC_Q = 2.37204 (error=0.00356295, std=0.00325614, N=1024) in 0.000638391 sec
QMC_Q = 2.36928 (error=0.000810464, std=0.00116796, N=2048) in 0.00119738 sec
QMC_Q = 2.36874 (error=0.000261964, std=0.000408671, N=4096) in 0.00227906 sec
QMC_Q = 2.36815 (error=0.000326629, std=0.000247666, N=8192) in 0.00444471 sec
QMC_Q = 2.36818 (error=0.000290633, std=0.00010247, N=16384) in 0.0113687 sec
QMC_Q = 2.36843 (error=4.36275e-05, std=7.94542e-05, N=32768) in 0.0226769 sec
QMC_Q = 2.3685 (error=2.57808e-05, std=4.57268e-05, N=65536) in 0.0395034 sec

The execution times for increasing block sizes are plotted in Figure 11. The C++ version is strikingly faster than the Matlab and Python codes, not unexpectedly of course. (Do not forget to turn on optimization with C++ code, otherwise the usage of templates will slow you down rather than speed you up. In fact, the Python code was faster than the C++ code when the latter was compiled with -O3 instead of -Ofast on our testing machine.) It can also be seen from the graph that for increasing block size (i.e., when m increases) the interpreted codes are catching up. This is why vectorization is very important! If the computational effort in an interpreted language is put into big compiled parts of code (like vectorized evaluation), then the difference with compiled code becomes very minimal. In Figure 12 we finally see total execution time versus standard error for the different methods presented.

5.8 Applying the theory*

We now look at our example function g in light of the theory from §2.5. For our function g it is easy to take mixed partial derivatives:


Fig. 11 Raw timings in seconds (not averaged over multiple runs) of the different code snippets in the text for evaluating blocks of points, plotted against block size: Matlab radical inverse vectorized, Python radical inverse not vectorized, Python not radical inverse vectorized, C++ radical inverse not vectorized, and Matlab Monte Carlo. Ran on a 1.8 GHz Intel Core i7 (2 cores) 4 GB under Mac OS Sierra 10.12.1 with Matlab R2016a, Python 2.7.1 with NumPy 1.5.1 and clang++ 8.0.0.

\[
  \frac{\partial^{|u|} g}{\partial x_u}(x)
  \;=\; \Bigl(\prod_{j\in u} c\,j^{-b}\Bigr)\Bigl(\prod_{j=1}^{s} \exp(c\,x_j\,j^{-b})\Bigr)
  \;=\; \Bigl(\prod_{j\in u} c\,j^{-b}\Bigr)\, g(x)\,.
\]

Thus it is possible to find an explicit expression for the norm of g in the weighted Sobolev space, see (6). For most practical integrands from real applications it is rarely the case that we can find the norm explicitly, and so we would need to estimate the norm.

With this in mind, we now consider a rough estimate of the norm of g. Let us assume for the moment that ‖g‖∞ ≤ C_{c,b,s}. Then we have
\[
  \Bigl\|\frac{\partial^{|u|} g}{\partial x_u}\Bigr\|_\infty
  \;\le\; \Bigl(\prod_{j\in u} |c|\,j^{-b}\Bigr)\,\|g\|_\infty
\]

and so the norm of g in the weighted Sobolev space, see (6), satisfies
\[
  \|g\|_\gamma^2
  \;=\; \sum_{u\subseteq\{1,\dots,s\}} \frac{1}{\gamma_u}
        \int_{[0,1]^{|u|}} \Bigl(\int_{[0,1]^{s-|u|}}
        \frac{\partial^{|u|} g}{\partial x_u}(x)\,\mathrm{d}x_{-u}\Bigr)^2 \mathrm{d}x_u
  \;\le\; C_{c,b,s}^2 \sum_{u\subseteq\{1,\dots,s\}} \frac{1}{\gamma_u}
          \prod_{j\in u} (|c|\,j^{-b})^2\,.
\]


Fig. 12 Convergence of the standard error in terms of raw total timings in seconds (not averaged over multiple runs), for the same five methods as in Figure 11. Ran on a 1.8 GHz Intel Core i7 (2 cores) 4 GB under Mac OS Sierra 10.12.1 with Matlab R2016a, Python 2.7.1 with NumPy 1.5.1 and clang++ 8.0.0.

Furthermore, the constant C_{c,b,s} can be bounded in the following way. For b > 1 (such that the Riemann zeta function ζ(b) < ∞) and c ≥ 0, we have for all x ∈ [0,1]^s and all s ≥ 1 that
\[
  g(x) \;=\; \exp\Bigl(c \sum_{j=1}^{s} x_j\, j^{-b}\Bigr)
  \;\le\; \exp\Bigl(c \sum_{j=1}^{\infty} j^{-b}\Bigr)
  \;=\; \exp(c\,\zeta(b)) \;<\; \infty\,.
\]
For c ≤ 0 we obviously always have g(x) ≤ 1 for all x ∈ [0,1]^s. We thus conclude that C_{c,b,s} < ∞ independently of s when b > 1.

We can now apply our estimate for the norm in our error bound for randomly shifted lattice rules, see (8), to obtain
\[
  \mathbb{E}\,|I(g) - Q_n(g)|^2 \;\le\; [e^{\mathrm{sh}}_\gamma(z)]^2\, \|g\|_\gamma^2\,,
\]

where we recall from Theorem 2 that if n is a power of a prime then
\[
  e^{\mathrm{sh}}_\gamma(z) \;\le\;
  \Biggl(\frac{2}{n} \sum_{\emptyset \ne u \subseteq \{1,\dots,s\}}
  \gamma_u^{\lambda}\, \Bigl(\frac{2\,\zeta(2\lambda)}{(2\pi^2)^{\lambda}}\Bigr)^{|u|}\Biggr)^{1/(2\lambda)}
\]

holds for all λ ∈ (1/2,1]. This means we can write


\[
  \mathbb{E}\,|I(g) - Q_n(g)|^2 \;\le\;
  \frac{C_{c,b,s}^2}{n^{1/\lambda}}
  \Biggl(2 \sum_{\emptyset \ne u \subseteq \{1,\dots,s\}}
  \gamma_u^{\lambda}\, \Bigl(\frac{2\,\zeta(2\lambda)}{(2\pi^2)^{\lambda}}\Bigr)^{|u|}\Biggr)^{1/\lambda}
  \Biggl(\sum_{u \subseteq \{1,\dots,s\}} \frac{1}{\gamma_u}
  \prod_{j\in u} c^2\, j^{-2b}\Biggr)\,.
\]

The upper bound is minimized by choosing

\[
  \gamma_u \;=\; \prod_{j\in u}
  \Bigl(\frac{(2\pi^2)^{\lambda}}{2\,\zeta(2\lambda)}\, c^2\, j^{-2b}\Bigr)^{1/(1+\lambda)}\,.
\]

These are the so-called “product weights”. The closer we take λ to 1/2, the closer our root-mean-square error bound comes to 1/n convergence. However, if we take λ = 1/2 then ζ(2λ) = ∞ and the bound blows up.

In light of the above analysis, we consider product weights of the form
\[
  \gamma_j \;=\; \Bigl(\frac{c\,j^{-b}}{\sqrt{2}}\Bigr)^{1.25}, \tag{16}
\]
which corresponds roughly to taking λ = 0.6 in the formula above. The theory then indicates that we get around O(n^{−0.83}) convergence.

5.9 Constructing point sets

If we know how to choose the weights γ_u to describe our integrand, then we can actually fire up a fast component-by-component construction to find a good generating vector.

We illustrate here how to construct a good randomly shifted extensible lattice rule for the product weights (16) in the weighted Sobolev space. We use the Matlab CBC scripts from https://people.cs.kuleuven.be/~dirk.nuyens/fast-cbc/:

c = 1; b = 2; s = 100;
mmin = 1; mmax = 16;
[z, e2] = fastrank1expt(2, mmin, mmax, s, ...
    @(t) t.^2-t+1/6, ...            % shift-invar unanchored Sobolev
    @(j) (c*j.^-b/sqrt(2)).^1.25 ); % our product weights

% now save the vector to a file
fid = fopen('myrule.txt', 'w');
fprintf(fid, '%d\n', z);
fclose(fid);

The vector e2 contains the squared worst-case error for an increasing number of dimensions, from 1 up to 100. One could plot e2 on a log-log plot (or with just a log scale in the y-direction). If this plot seems to approach a finite limit, then we can quite safely assume the space is tractable and the convergence can be obtained independently of the number of dimensions.


We can now fire up our previous code with this new vector. We do not, however, expect to see much noticeable difference. Experience tells us that if we pick a good “off-the-shelf” lattice rule, then it mostly performs quite well in general. Of course, this robustness is not guaranteed. Therefore, if we can obtain good information on the particular type of integrand, it is better to construct the QMC rule specifically for this type of integrand. This is the case for integrands from PDEs with random coefficients, and we have a software package for this purpose, see https://people.cs.kuleuven.be/~dirk.nuyens/qmc4pde/.

Exercise 5.4 Try the lattice rule constructed in this section and compare convergence graphs with the off-the-shelf rule that we used in the previous section.

5.10 Sobol′ sequences, digital sequences, and interlacing

In the same spirit as the lattice sequence generators, we also provide generators for digital nets:

• Matlab: digitalseq_b2g.m,
• Python: digitalseq_b2g.py (which contains a command line point generator),
• C++: digitalseq_b2g.hpp and digitalseq_b2g.cpp (which is an example command line point generator).

Note that the g at the end of these file names is not a typo. It indicates that the points are generated in “gray coded radical inverse” ordering, which allows us to obtain the next point in the sequence very efficiently from the current point.
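For intuition: the Gray code of k is one line of code, and since graycode(k) and graycode(k+1) differ in exactly one bit, the generator can (as we understand it) update the current point by XORing in a single column of each generating matrix. A small Python sketch, ours for illustration:

def graycode(k):
    """Binary reflected Gray code of k; graycode(k) and graycode(k+1)
    differ in exactly one bit."""
    return k ^ (k >> 1)

# graycode(0), ..., graycode(7) = 0, 1, 3, 2, 6, 7, 5, 4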

The easiest way to generate (interlaced) digital nets is through their generating matrices. Several sets of generating matrices are downloadable from our websites.

You might remember that interlacing digital nets allows us to get higher order convergence. We show an example using an order 2 interlaced Sobol′ sequence. In Matlab we can do as follows:

load sobol_alpha2_Bs.col
digitalseq_b2g('init0', sobol_alpha2_Bs)

% and now you can use it as if it was a random number generator
for i=1:4
  x = digitalseq_b2g(s, n); % draw n vectors of length s
  % now do something with this [s-by-n] matrix
end

Randomization now has to be done by digital shifting. A file digitalshift.m can be found on the website with the point generator. We show a convergence plot for s = 100, c = 0.5 and b = 2 for the standard Sobol′ sequence and for the interlaced Sobol′ sequence of order 2 in Figure 13. Note that we changed the c parameter to make the function slightly easier, in order to clearly see the order 2 convergence.
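For intuition: in base 2 a digital shift XORs the binary digit expansion of each coordinate with that of a fixed random shift, rather than adding modulo 1. A minimal Python/NumPy sketch of ours (not the digitalshift.m from the website):

import numpy as np

def digital_shift_b2(x, shift, prec=53):
    """Digitally shift points x (an n-by-s array in [0,1)) by a single
    shift (a length-s vector in [0,1)): XOR the binary digit expansions."""
    scale = 2.0 ** prec
    xi = (x * scale).astype(np.uint64)      # first prec digits of the points
    si = (shift * scale).astype(np.uint64)  # first prec digits of the shift
    return np.bitwise_xor(xi, si) / scale   # XOR digit-wise, back to [0,1)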


Fig. 13 Higher order convergence using interlacing with the Sobol′ sequence: standard Sobol′, interlaced Sobol′ of order 2, and 1/N and 1/N² reference lines.


6 Small project

Implement QMC for one of the toy applications.


7 Further reading

Here we provide a biased selection of survey articles and books for further reading, including some elementary reviews as well as comprehensive technical surveys. We rank the level of difficulty of each reference on a scale from 1 (easy) to 5 (hard).

Contemporary references to QMC methods:

• I. H. Sloan, What’s new in high-dimensional integration? – Designing quasi-Monte Carlo for applications, In: Proceedings of the ICIAM, Beijing, China (L. Guo and Z. Ma, eds.), Higher Education Press, Beijing, 2015, pp. 365–386. [Level 2]

• D. Nuyens, The construction of good lattice rules and polynomial lattice rules, In: Uniform Distribution and Quasi-Monte Carlo Methods (P. Kritzer, H. Niederreiter, F. Pillichshammer, A. Winterhof, eds.), Radon Series on Computational and Applied Mathematics Vol. 15, De Gruyter, 2014, pp. 223–256. http://dx.doi.org/10.1515/9783110317930.223. [Level 3]

• J. Dick, F. Y. Kuo, and I. H. Sloan, High-dimensional integration – the quasi-Monte Carlo way, Acta Numerica, 22, 133–288 (2013). http://dx.doi.org/10.1017/S0962492913000044. [Levels 2–5]

• J. Dick and F. Pillichshammer, Digital Nets and Sequences, Cambridge University Press, 2010. [Levels 2–5]

• R. Cools and D. Nuyens, A Belgian view on lattice rules, In: Monte Carlo and Quasi-Monte Carlo Methods 2006 (S. Heinrich, A. Keller, and H. Niederreiter, eds.), Springer-Verlag, 2008, pp. 3–21. [Level 3]

• F. Y. Kuo and I. H. Sloan, Lifting the curse of dimensionality, Notices of the American Mathematical Society, 52, 1320–1328 (2005). http://www.ams.org/notices/200511/index.html. [Level 1]

Classical references to QMC methods:

• I. H. Sloan and S. Joe, Lattice Methods for Multiple Integration, Oxford University Press, Oxford, 1994. [Levels 2–3]

• H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, Philadelphia, 1992. [Level 5]

Application of QMC methods to PDEs with random coefficients:

• F. Y. Kuo and D. Nuyens, Application of quasi-Monte Carlo methods to elliptic PDEs with random diffusion coefficients – A survey of analysis and implementation, Foundations of Computational Mathematics, appeared online (2016), 66 pages. See also the accompanying software at https://people.cs.kuleuven.be/~dirk.nuyens/qmc4pde/. [Level 4]


Application of QMC methods to generalised linear mixed models in statistics:

• F. Y. Kuo, W. T. M. Dunsmuir, I. H. Sloan, M. P. Wand, and R. S. Womersley, Quasi-Monte Carlo for highly structured generalized response models, Methodology and Computing in Applied Probability, 10, 239–275 (2008). http://dx.doi.org/10.1007/s11009-007-9045-3. [Levels 2–3]

Application of QMC methods to mathematical finance and simulation:

• G. Leobacher and F. Pillichshammer, Introduction to Quasi-Monte Carlo Integration and Applications, Springer, 2014. [Levels 2–4]

• C. Lemieux, Monte Carlo and Quasi-Monte Carlo Sampling, Springer, 2009. [Levels 2–3]

• M. B. Giles, F. Y. Kuo, I. H. Sloan, and B. J. Waterhouse, Quasi-Monte Carlo for finance applications, ANZIAM Journal, 50 (CTAC2008), C308–C323 (2008). http://anziamj.austms.org.au/ojs/index.php/ANZIAMJ/article/view/1440. [Level 1]