

Likelihood Approximation With Hierarchical Matrices For Large Spatial Datasets

A. Litvinenko, Y. Sun, M. Genton, D. Keyes, CEMSE, KAUST

HIERARCHICAL LIKELIHOOD APPROXIMATION

Suppose we observe a mean-zero, stationary and isotropic Gaussian process $Z$ with a Matérn covariance at $n$ irregularly spaced locations. Let $\mathbf{Z} = (Z(\mathbf{s}_1), \ldots, Z(\mathbf{s}_n))^T$; then $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{C}(\boldsymbol{\theta}))$, $\boldsymbol{\theta} \in \mathbb{R}^q$ is an unknown parameter vector of interest, where

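$$C_{ij}(\boldsymbol{\theta}) = \operatorname{cov}(Z(\mathbf{s}_i), Z(\mathbf{s}_j)) = C(\|\mathbf{s}_i - \mathbf{s}_j\|, \boldsymbol{\theta}),$$

and

$$C(r) := C_{\boldsymbol{\theta}}(r) = \frac{2\sigma^2}{\Gamma(\nu)} \left(\frac{r}{2\ell}\right)^{\nu} K_{\nu}\!\left(\frac{r}{\ell}\right), \qquad \boldsymbol{\theta} = (\sigma^2, \nu, \ell)^T,$$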

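is the Matérn covariance function. The MLE of $\boldsymbol{\theta}$ is obtained by maximizing the Gaussian log-likelihood function:

$$\mathcal{L}(\boldsymbol{\theta}) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\mathbf{C}(\boldsymbol{\theta})| - \frac{1}{2}\mathbf{Z}^T \mathbf{C}(\boldsymbol{\theta})^{-1}\mathbf{Z}.$$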
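As a point of reference, the two formulas above can be evaluated directly with dense linear algebra. The following is a minimal sketch, not the authors' implementation: the function names `matern_cov` and `gaussian_loglik`, the random locations, and the parameter values are all assumptions made for illustration. It relies on SciPy's modified Bessel function `kv` and a dense Cholesky factorization.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist
from scipy.special import gamma, kv


def matern_cov(locs, sigma2, nu, ell):
    """Dense Matern matrix C_ij = 2*sigma2/Gamma(nu) * (r/(2*ell))**nu * K_nu(r/ell)."""
    r = cdist(locs, locs)                  # pairwise Euclidean distances
    C = np.full_like(r, sigma2)            # C(r) -> sigma2 as r -> 0 (diagonal)
    pos = r > 0
    x = r[pos] / ell
    C[pos] = 2.0 * sigma2 / gamma(nu) * (x / 2.0) ** nu * kv(nu, x)
    return C


def gaussian_loglik(Z, C):
    """L = -n/2 log(2 pi) - 1/2 log|C| - 1/2 Z^T C^{-1} Z via dense Cholesky."""
    n = Z.shape[0]
    factor = cho_factor(C, lower=True)
    logdet = 2.0 * np.sum(np.log(np.diag(factor[0])))   # log|C| from the Cholesky diagonal
    quad = Z @ cho_solve(factor, Z)                      # Z^T C^{-1} Z
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad


rng = np.random.default_rng(0)
locs = rng.random((200, 2))                # hypothetical irregular locations in [0, 1]^2
sigma2, nu, ell = 1.0, 0.5, 0.1            # illustrative parameter values
C = matern_cov(locs, sigma2, nu, ell)
Z = np.linalg.cholesky(C) @ rng.standard_normal(200)    # one sample Z ~ N(0, C)
print(gaussian_loglik(Z, C))
```

For $\nu = 0.5$ the formula reduces to the exponential covariance $\sigma^2 \exp(-r/\ell)$, which gives a convenient sanity check for the assembly routine.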

On each iteration of a maximization algorithm we have a new matrix $\mathbf{C}$. For a given $\boldsymbol{\theta}$ the Cholesky factorization requires $\mathcal{O}(n^3)$ FLOPS. We approximate $\mathbf{C} \approx \tilde{\mathbf{C}}$ in the H-matrix format with a log-linear computational cost and storage $\mathcal{O}(kn \log n)$, where the rank $k \ll n$ is a small integer.
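The reason a small rank $k$ can suffice is that covariance blocks coupling well-separated groups of locations are numerically low-rank. The sketch below illustrates this on a single block only; it uses a truncated SVD as a stand-in for whatever compression the H-matrix library applies, and the cluster geometry, the value `ell=0.5`, and the helper names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist


def exp_cov_block(rows, cols, ell):
    """Block of the Matern covariance with nu = 0.5, sigma^2 = 1: exp(-r / ell)."""
    return np.exp(-cdist(rows, cols) / ell)


def truncate(block, k):
    """Best rank-k factors A (m x k) and B (k x n), plus all singular values."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k], s


rng = np.random.default_rng(0)
# Two well-separated clusters in the unit square (hypothetical test geometry).
left = 0.25 * rng.random((300, 2))             # points in [0, 0.25]^2
right = 0.75 + 0.25 * rng.random((300, 2))     # points in [0.75, 1]^2

block = exp_cov_block(left, right, ell=0.5)
A, B, s = truncate(block, k=8)

rel_err = np.linalg.norm(block - A @ B, 2) / s[0]
dense_entries = block.size                     # m * n numbers
lowrank_entries = A.size + B.size              # k * (m + n) numbers
print(f"relative spectral error: {rel_err:.2e}")
print(f"storage: {lowrank_entries} vs {dense_entries} entries")
```

An H-matrix applies this idea recursively over a hierarchical block partition and keeps only the small near-diagonal blocks dense, which is where the $\mathcal{O}(kn \log n)$ storage comes from.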

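Theorem 1. Let $\rho(\tilde{\mathbf{C}}^{-1}\mathbf{C} - \mathbf{I}) < \varepsilon < 1$. Then

$$\bigl|\log|\mathbf{C}| - \log|\tilde{\mathbf{C}}|\bigr| \le -n \log(1 - \varepsilon).$$

Let, in addition, $\|\mathbf{C}^{-1}\| \le c_1$ and $\|\mathbf{Z}\| \le c_0$. Then

$$|\tilde{\mathcal{L}}(\boldsymbol{\theta}; k) - \mathcal{L}(\boldsymbol{\theta})| = \left|\frac{1}{2}\log\frac{|\mathbf{C}|}{|\tilde{\mathbf{C}}|} + \frac{1}{2}\mathbf{Z}^T\bigl(\mathbf{C}^{-1} - \tilde{\mathbf{C}}^{-1}\bigr)\mathbf{Z}\right| \le c_0^2 \cdot c_1 \cdot \varepsilon - n \log(1 - \varepsilon).$$

Both terms on the right-hand side are nonnegative, since $\log(1 - \varepsilon) < 0$ for $0 < \varepsilon < 1$.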
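The two terms of the bound can be traced back to the two $\boldsymbol{\theta}$-dependent terms of the log-likelihood. The following is a sketch of one possible argument, not the authors' proof. It assumes that $\mathbf{C}$ and $\tilde{\mathbf{C}}$ are symmetric positive definite, that $\|\mathbf{Z}\| \le c_0$, and that the norm bound $\|\tilde{\mathbf{C}}^{-1}\mathbf{C} - \mathbf{I}\| \le \varepsilon$ holds in addition to the spectral-radius condition.

```latex
\begin{align*}
% Determinant term: the eigenvalues of \tilde{C}^{-1} C are 1 + \mu_i with |\mu_i| < \varepsilon.
\bigl|\log|\mathbf{C}| - \log|\tilde{\mathbf{C}}|\bigr|
  &= \Bigl|\sum_{i=1}^{n} \log(1 + \mu_i)\Bigr|
   \le \sum_{i=1}^{n} \bigl|\log(1 + \mu_i)\bigr|
   \le -n \log(1 - \varepsilon), \\
% Quadratic term: C^{-1} - \tilde{C}^{-1} = (I - \tilde{C}^{-1} C) C^{-1}.
\bigl|\mathbf{Z}^T(\mathbf{C}^{-1} - \tilde{\mathbf{C}}^{-1})\mathbf{Z}\bigr|
  &= \bigl|\mathbf{Z}^T(\mathbf{I} - \tilde{\mathbf{C}}^{-1}\mathbf{C})\,\mathbf{C}^{-1}\mathbf{Z}\bigr|
   \le \|\mathbf{Z}\|^2 \,\|\mathbf{I} - \tilde{\mathbf{C}}^{-1}\mathbf{C}\|\,\|\mathbf{C}^{-1}\|
   \le c_0^2 \cdot c_1 \cdot \varepsilon, \\
% Combine with the triangle inequality; the factors 1/2 only make the bound smaller.
|\tilde{\mathcal{L}}(\boldsymbol{\theta}; k) - \mathcal{L}(\boldsymbol{\theta})|
  &\le \tfrac{1}{2}\bigl(-n \log(1 - \varepsilon)\bigr) + \tfrac{1}{2}\, c_0^2 \cdot c_1 \cdot \varepsilon
   \le c_0^2 \cdot c_1 \cdot \varepsilon - n \log(1 - \varepsilon).
\end{align*}
```

The first line uses $|\log(1 + \mu)| \le -\log(1 - \varepsilon)$ for real $|\mu| < \varepsilon < 1$; the eigenvalues are real because $\tilde{\mathbf{C}}^{-1}\mathbf{C}$ is similar to the symmetric matrix $\tilde{\mathbf{C}}^{-1/2}\mathbf{C}\,\tilde{\mathbf{C}}^{-1/2}$.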

[Figure] Box-plots for different H-matrix ranks $k = \{3, 7, 9\}$, $\ell = 0.0334$. Horizontal axis: H-matrix rank; vertical axis: covariance length (0.02 to 0.06).

$\nu = 0.5$, $n = 66049$, rank $k = 16$, $\sigma^2 = 1$.

HIERARCHICAL MATRICES (HACKBUSCH '99)

Advantages of approximating $\mathbf{C}$ by $\tilde{\mathbf{C}}$: the H-approximation is cheap; storage and the matrix-vector product cost $\mathcal{O}(kn \log n)$; LU factorization and inversion cost $\mathcal{O}(k^2 n \log^2 n)$; efficient parallel implementations exist.

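[Figure] (left) H-matrix approximation in $\mathbb{R}^{n \times n}$, $n = 16641$, of the discretised Matérn covariance function on the unit square. The biggest dense (dark) block is in $\mathbb{R}^{32 \times 32}$, maximal rank $k = 13$, $\nu = 0.5$, $\rho = 0.1$, $\sigma = 1$; (middle) H-Cholesky factor $\tilde{\mathbf{L}}$, $\tilde{\mathbf{C}} = \tilde{\mathbf{L}}\tilde{\mathbf{L}}^T$; (right) precision matrix $\tilde{\mathbf{C}}^{-1}$.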

NUMERICAL EXAMPLES

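H-matrix approximation, $\nu = 0.5$, domain $G = [0, 1]^2$, $\|\tilde{\mathbf{C}}\|_2 = \{212, 568\}$ for $\ell = \{0.25, 0.75\}$, $n = 16049$.

| k  | KLD, ℓ = 0.25 | KLD, ℓ = 0.75 | ‖C − C̃‖₂, ℓ = 0.25 | ‖C − C̃‖₂, ℓ = 0.75 | ‖CC̃⁻¹ − I‖₂, ℓ = 0.25 | ‖CC̃⁻¹ − I‖₂, ℓ = 0.75 |
|----|---------------|---------------|---------------------|---------------------|------------------------|------------------------|
| 10 | 2.6e-3        | 0.2           | 7.7e-4              | 7.0e-4              | 6.0e-2                 | 3.1                    |
| 50 | 3.4e-13       | 5e-12         | 2.0e-13             | 2.4e-13             | 4e-11                  | 2.7e-9                 |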

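Computing time and number of iterations for maximization of the log-likelihood $\tilde{\mathcal{L}}(\boldsymbol{\theta}; k)$, $n = 66049$.

| k     | size, GB | C̃ set-up time, s | compute L̃, s | maximizing, s | # iters |
|-------|----------|-------------------|---------------|---------------|---------|
| 10    | 1        | 7                 | 115           | 1994          | 13      |
| 20    | 1.7      | 11                | 370           | 5445          | 9       |
| dense | 38       | 42                | 657           | ∞             | -       |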

Moisture data. We used adaptive-rank arithmetic with $\varepsilon = 10^{-4}$ for each block of $\tilde{\mathbf{C}}$ and $\varepsilon = 10^{-8}$ for each block of $\tilde{\mathbf{C}}^{-1}$. The number of processing cores is 40.

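| n     | C̃: compr. rate, % | C̃: time, sec. | C̃: size, MB | L̃L̃ᵀ: time, sec. | L̃L̃ᵀ: size, MB | ‖I − (L̃L̃ᵀ)⁻¹C‖₂ | inverse: time, sec. | inverse: size, MB | ‖I − C̃⁻¹C‖₂ |
|-------|--------------------|----------------|--------------|-------------------|-----------------|--------------------|---------------------|-------------------|---------------|
| 10000 | 14%                | 0.9            | 106          | 4.1               | 109             | 7.7e-6             | 44                  | 230               | 7.8e-5        |
| 30000 | 7.5%               | 4.3            | 515          | 25                | 557             | 1.1e-3             | 316                 | 1168              | 1.1e-1        |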

For $n = 512\mathrm{K}$ with accuracy $10^{-8}$ inside each block, the matrix setup takes 261 sec. and the compression rate is 0.02% (0.4 GB against 2006 GB). The H-LU factorization is done in 843 sec. and requires 5.8 GB RAM; the inversion LU error is $2 \cdot 10^{-3}$.

[Figure] (left) with nuggets $\{0.01, 0.005, 0.001\}$ for Gaussian covariance, $n = 2000$, $k = 14$, $\sigma^2 = 1$; (center) zoom of the middle figure; (right) box-plots for $\nu$ vs number of locations $n$. Horizontal axis: number of measurements (1000 to 32000); vertical axis: $\nu$ (0.2 to 1).

ACKNOWLEDGEMENTS

Work supported by SRI-UQ and ECRC, KAUST.