Matrices H

From Wikipedia, the free encyclopedia

Contents

1 Hadamard matrix
  1.1 Properties
  1.2 Sylvester's construction
  1.3 Alternative construction
  1.4 Hadamard conjecture
  1.5 Equivalence of Hadamard matrices
  1.6 Skew Hadamard matrices
  1.7 Generalizations and special cases
  1.8 Practical applications
  1.9 See also
  1.10 Notes
  1.11 Further reading
  1.12 External links

2 Hadamard's maximal determinant problem
  2.1 Hadamard matrices
  2.2 Equivalence and normalization of {−1, 1} matrices
  2.3 Connection of the maximal determinant problems for {−1, 1} and {0, 1} matrices
  2.4 Upper bounds on the maximal determinant
    2.4.1 Gram matrix
    2.4.2 Hadamard's bound (for all n)
    2.4.3 Barba's bound for n odd
    2.4.4 The Ehlich–Wojtas bound for n ≡ 2 (mod 4)
    2.4.5 Ehlich's bound for n ≡ 3 (mod 4)
  2.5 Maximal determinants up to size 21
  2.6 References

3 Hamiltonian matrix
  3.1 Properties
  3.2 Extension to complex matrices
  3.3 Hamiltonian operators
  3.4 References

4 Hankel matrix
  4.1 Hankel transform
  4.2 Hankel matrices for system identification
  4.3 Orthogonal polynomials on the real line
    4.3.1 Positive Hankel matrices and the Hamburger moment problems
    4.3.2 Orthogonal polynomials on the real line
    4.3.3 Tridiagonal model of positive Hankel operators
    4.3.4 Relation between Hankel and Toeplitz matrices
    4.3.5 Relations between structured matrices
  4.4 See also
  4.5 Notes
  4.6 References

5 Hasse–Witt matrix
  5.1 Approach to the definition
  5.2 Cohomology
  5.3 Abelian varieties and their p-rank
  5.4 Case of genus 1
  5.5 Notes
  5.6 References

6 Hat matrix
  6.1 Linear model
    6.1.1 Solution with unit weights and uncorrelated errors
  6.2 More generally
    6.2.1 Non-identical weights and/or correlated errors
  6.3 Blockwise formula
  6.4 See also
  6.5 References

7 Hermitian matrix
  7.1 Examples
  7.2 Properties
  7.3 Further properties
  7.4 Rayleigh quotient
  7.5 See also
  7.6 References
  7.7 External links

8 Hessenberg matrix
  8.1 Computer programming
  8.2 Properties
  8.3 See also
  8.4 Notes
  8.5 References
  8.6 External links

9 Hessian automatic differentiation
  9.1 Reverse Hessian-vector products
  9.2 Reverse Hessian: Edge_Pushing
  9.3 Graph colouring techniques for Hessians
  9.4 See also
  9.5 References
  9.6 External links

10 Hessian matrix
  10.1 Mixed derivatives and symmetry of the Hessian
  10.2 Critical points
  10.3 Second derivative test
  10.4 Bordered Hessian
  10.5 Vector-valued functions
  10.6 Generalizations to Riemannian manifolds
  10.7 Use in optimization
  10.8 See also
  10.9 Notes
  10.10 External links

11 Hierarchical matrix
  11.1 Basic idea
  11.2 Application to integral operators
  11.3 Application to elliptic partial differential equations
  11.4 Arithmetic operations
  11.5 H2-matrices
  11.6 Literature
  11.7 Software

12 Higher spin alternating sign matrix
  12.1 References

13 Higher-dimensional gamma matrices
  13.1 Charge conjugation
  13.2 Symmetry properties
  13.3 Example of an explicit construction in the chiral basis
    13.3.1 d = 2
    13.3.2 Generic even d = 2k
    13.3.3 Generic odd d = 2k + 1
  13.4 See also
  13.5 References

14 Hilbert matrix
  14.1 Historical note
  14.2 Properties
  14.3 References

15 Hollow matrix
  15.1 Sparse
  15.2 Diagonal entries all zero
    15.2.1 Properties
  15.3 Block of zeroes
  15.4 References

16 Householder transformation
  16.1 Definition and properties
  16.2 Applications
    16.2.1 Tridiagonalization
    16.2.2 Examples
  16.3 Computational and Theoretical Relationship to other Unitary Transformations
  16.4 References
  16.5 External links

17 Hurwitz matrix
  17.1 Hurwitz matrix and the Hurwitz stability criterion
  17.2 Hurwitz stable matrices
  17.3 See also
  17.4 References
  17.5 External links
  17.6 Text and image sources, contributors, and licenses
    17.6.1 Text
    17.6.2 Images
    17.6.3 Content license

Chapter 1

    Hadamard matrix

In mathematics, a Hadamard matrix, named after the French mathematician Jacques Hadamard, is a square matrix whose entries are either +1 or −1 and whose rows are mutually orthogonal. In geometric terms, this means that every two different rows in a Hadamard matrix represent two perpendicular vectors, while in combinatorial terms, it means that every two different rows have matching entries in exactly half of their columns and mismatched entries in the remaining columns. It is a consequence of this definition that the corresponding properties hold for columns as well as rows. The n-dimensional parallelotope spanned by the rows of an n×n Hadamard matrix has the maximum possible n-dimensional volume among parallelotopes spanned by vectors whose entries are bounded in absolute value by 1. Equivalently, a Hadamard matrix has maximal determinant among matrices with entries of absolute value less than or equal to 1 and so is an extremal solution of Hadamard's maximal determinant problem.

Certain Hadamard matrices can almost directly be used as an error-correcting code using a Hadamard code (generalized in Reed–Muller codes), and are also used in balanced repeated replication (BRR), used by statisticians to estimate the variance of a parameter estimator.

    1.1 Properties

Let H be a Hadamard matrix of order n. The transpose of H is closely related to its inverse. The correct formula is:

HH^T = nI_n

where I_n is the n × n identity matrix and H^T is the transpose of H. To see that this is true, notice that the rows of H are all orthogonal vectors over the field of real numbers and each have length √n. Dividing H through by this length gives an orthogonal matrix whose transpose is thus its inverse. Multiplying by the length again gives the equality above. As a result,

det(H) = ±n^{n/2},

where det(H) is the determinant of H.

Suppose that M is a complex matrix of order n, whose entries are bounded by |M_ij| ≤ 1, for each i, j between 1 and n. Then Hadamard's determinant bound states that

|det(M)| ≤ n^{n/2}.

Equality in this bound is attained for a real matrix M if and only if M is a Hadamard matrix.

The order of a Hadamard matrix must be 1, 2, or a multiple of 4.
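These identities are easy to check numerically. The following sketch (assuming Python with NumPy, which the text itself does not prescribe) verifies HH^T = nI_n and |det H| = n^{n/2} for a Hadamard matrix of order 4:

    import numpy as np

    # a Hadamard matrix of order 4: entries +/-1, mutually orthogonal rows
    H = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1,  1, -1, -1],
                  [1, -1, -1,  1]])
    n = H.shape[0]

    print(np.array_equal(H @ H.T, n * np.eye(n)))           # True: HH^T = nI_n
    print(np.isclose(abs(np.linalg.det(H)), n ** (n / 2)))  # True: |det H| = n^(n/2)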



1.2 Sylvester's construction

Examples of Hadamard matrices were actually first constructed by James Joseph Sylvester in 1867. Let H be a Hadamard matrix of order n. Then the partitioned matrix

\begin{bmatrix} H & H \\ H & -H \end{bmatrix}

is a Hadamard matrix of order 2n. This observation can be applied repeatedly and leads to the following sequence of matrices, also called Walsh matrices.

H_1 = \begin{bmatrix} 1 \end{bmatrix},

H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},

and

H_{2^k} = \begin{bmatrix} H_{2^{k-1}} & H_{2^{k-1}} \\ H_{2^{k-1}} & -H_{2^{k-1}} \end{bmatrix} = H_2 \otimes H_{2^{k-1}},

for 2 ≤ k ∈ N, where ⊗ denotes the Kronecker product. In this manner, Sylvester constructed Hadamard matrices of order 2^k for every non-negative integer k.[1]

Sylvester's matrices have a number of special properties. They are symmetric and, when k ≥ 1, have trace zero. The elements in the first column and the first row are all positive. The elements in all the other rows and columns are evenly divided between positive and negative. Sylvester matrices are closely connected with Walsh functions.
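The doubling step is exactly a Kronecker product with H_2, so the whole construction fits in a few lines. A minimal sketch (Python/NumPy assumed; not part of the original text):

    import numpy as np

    def sylvester(k):
        """Sylvester-Hadamard (Walsh) matrix of order 2**k."""
        H = np.array([[1]])
        H2 = np.array([[1, 1], [1, -1]])
        for _ in range(k):
            H = np.kron(H2, H)          # H_{2^k} = H_2 (x) H_{2^(k-1)}
        return H

    H8 = sylvester(3)
    n = H8.shape[0]
    print(np.array_equal(H8 @ H8.T, n * np.eye(n)))   # True: still a Hadamard matrix
    print(np.trace(H8), H8[0].min())                  # 0 and 1: trace zero, first row all +1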

1.3 Alternative construction

If we map the elements of the Hadamard matrix using the group homomorphism {1, −1, ×} → {0, 1, +}, we can describe an alternative construction of Sylvester's Hadamard matrix. First consider the matrix F_n, the n × 2^n matrix whose columns consist of all n-bit numbers arranged in ascending counting order. We may define F_n recursively by

F_1 = \begin{bmatrix} 0 & 1 \end{bmatrix},

F_n = \begin{bmatrix} 0_{1 \times 2^{n-1}} & 1_{1 \times 2^{n-1}} \\ F_{n-1} & F_{n-1} \end{bmatrix}.

    It can be shown by induction that the image of the Hadamard matrix under the above homomorphism is given by

H_{2^n} = F_n^T F_n,

where the product is taken over the two-element field, consistent with the homomorphism above. This construction demonstrates that the rows of the Hadamard matrix H_{2^n} can be viewed as a length-2^n linear error-correcting code of rank n and minimum distance 2^{n−1} with generating matrix F_n. This code is also referred to as a Walsh code. The Hadamard code, by contrast, is constructed from the Hadamard matrix H_{2^n} by a slightly different procedure.
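A compact numerical sketch of this construction (Python/NumPy assumed; the bit ordering inside F_n is an illustrative choice, since the pairing depends only on the dot products of the bit patterns):

    import numpy as np

    n = 3
    cols = np.arange(2 ** n)
    # F_n: columns are the n-bit binary expansions of 0 .. 2^n - 1
    F = np.array([(cols >> (n - 1 - i)) & 1 for i in range(n)])

    B = (F.T @ F) % 2        # image of the Hadamard matrix under {1,-1} -> {0,1}
    H = (-1) ** B            # pull the entries back to {+1,-1}

    print(np.array_equal(H @ H.T, 2 ** n * np.eye(2 ** n)))   # True: Hadamard of order 2^n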

1.4 Hadamard conjecture

The most important open question in the theory of Hadamard matrices is that of existence. The Hadamard conjecture proposes that a Hadamard matrix of order 4k exists for every positive integer k. The Hadamard conjecture has also been attributed to Paley, although it was considered implicitly by others prior to Paley's work.[2]


A generalization of Sylvester's construction proves that if H_n and H_m are Hadamard matrices of orders n and m respectively, then H_n ⊗ H_m is a Hadamard matrix of order nm. This result is used to produce Hadamard matrices of higher order once those of smaller orders are known.

Sylvester's 1867 construction yields Hadamard matrices of order 1, 2, 4, 8, 16, 32, etc. Hadamard matrices of orders 12 and 20 were subsequently constructed by Hadamard (in 1893).[3] In 1933, Raymond Paley discovered the Paley construction, which produces a Hadamard matrix of order q+1 when q is any prime power that is congruent to 3 modulo 4, and produces a Hadamard matrix of order 2(q+1) when q is a prime power that is congruent to 1 modulo 4.[4] His method uses finite fields.

The smallest order that cannot be constructed by a combination of Sylvester's and Paley's methods is 92. A Hadamard matrix of this order was found using a computer by Baumert, Golomb, and Hall in 1962 at JPL.[5] They used a construction, due to Williamson,[6] that has yielded many additional orders. Many other methods for constructing Hadamard matrices are now known.

In 2005, Hadi Kharaghani and Behruz Tayfeh-Rezaie published their construction of a Hadamard matrix of order 428.[7] As a result, the smallest order for which no Hadamard matrix is presently known is 668.

As of 2008, there are 13 multiples of 4 less than or equal to 2000 for which no Hadamard matrix of that order is known.[8] They are: 668, 716, 892, 1004, 1132, 1244, 1388, 1436, 1676, 1772, 1916, 1948, and 1964.

    1.5 Equivalence of Hadamard matrices

Two Hadamard matrices are considered equivalent if one can be obtained from the other by negating rows or columns, or by interchanging rows or columns. Up to equivalence, there is a unique Hadamard matrix of orders 1, 2, 4, 8, and 12. There are 5 inequivalent matrices of order 16, 3 of order 20, 60 of order 24, and 487 of order 28. Millions of inequivalent matrices are known for orders 32, 36, and 40. Using a coarser notion of equivalence that also allows transposition, there are 4 inequivalent matrices of order 16, 3 of order 20, 36 of order 24, and 294 of order 28.[9]

    1.6 Skew Hadamard matrices

A Hadamard matrix H is skew if H^T + H = 2I. Reid and Brown in 1972 showed that there exists a doubly regular tournament of order n if and only if there exists a skew Hadamard matrix of order n + 1.

    1.7 Generalizations and special cases

Many generalizations and special cases of Hadamard matrices have been investigated in the mathematical literature. One basic generalization is the weighing matrix, a square matrix in which entries may also be zero and which satisfies WW^T = wI for some w, its weight. A weighing matrix with its weight equal to its order is a Hadamard matrix.

Another generalization defines a complex Hadamard matrix to be a matrix in which the entries are complex numbers of unit modulus and which satisfies HH* = nI_n, where H* is the conjugate transpose of H. Complex Hadamard matrices arise in the study of operator algebras and the theory of quantum computation. Butson-type Hadamard matrices are complex Hadamard matrices in which the entries are taken to be qth roots of unity. The term complex Hadamard matrix has been used by some authors to refer specifically to the case q = 4.

Regular Hadamard matrices are real Hadamard matrices whose row and column sums are all equal. A necessary condition on the existence of a regular n×n Hadamard matrix is that n be a perfect square. A circulant matrix is manifestly regular, and therefore a circulant Hadamard matrix would have to be of perfect square order. Moreover, if an n×n circulant Hadamard matrix existed with n > 1, then n would necessarily have to be of the form 4u^2 with u odd.[10][11]

The circulant Hadamard matrix conjecture, however, asserts that, apart from the known 1×1 and 4×4 examples, no such matrices exist. This was verified for all but 26 values of u less than 10^4.[12]


1.8 Practical applications

• Olivia MFSK, an amateur-radio digital protocol designed to work in difficult (low signal-to-noise ratio plus multipath propagation) conditions on shortwave bands.

• Balanced Repeated Replication (BRR), a technique used by statisticians to estimate the variance of a statistical estimator.

• Coded aperture spectrometry, an instrument for measuring the spectrum of light. The mask element used in coded aperture spectrometers is often a variant of a Hadamard matrix.

• Feedback Delay Networks, digital reverberation devices which use Hadamard matrices to blend sample values.

• Plackett–Burman design of experiments for investigating the dependence of some measured quantity on a number of independent variables.

• Robust parameter designs for investigating noise factor impacts on responses.

• Compressed sensing for signal processing and undetermined linear systems (inverse problems).

1.9 See also

• Hadamard transform
• Combinatorial design
• Quincunx matrix

1.10 Notes

[1] J. J. Sylvester. Thoughts on inverse orthogonal matrices, simultaneous sign successions, and tessellated pavements in two or more colours, with applications to Newton's rule, ornamental tile-work, and the theory of numbers. Philosophical Magazine, 34:461–475, 1867.

[2] Hedayat, A.; Wallis, W. D. (1978). "Hadamard matrices and their applications". Annals of Statistics 6 (6): 1184–1238. doi:10.1214/aos/1176344370. JSTOR 2958712. MR 523759.

[3] Hadamard, J. (1893). "Résolution d'une question relative aux déterminants". Bulletin des Sciences Mathématiques 17: 240–246.

[4] Paley, R. E. A. C. (1933). "On orthogonal matrices". Journal of Mathematics and Physics 12: 311–320.

[5] Baumert, L.; Golomb, S. W.; Hall, M., Jr. (1962). "Discovery of an Hadamard Matrix of Order 92". Bulletin of the American Mathematical Society 68 (3): 237–238. doi:10.1090/S0002-9904-1962-10761-7. MR 0148686.

[6] Williamson, J. (1944). "Hadamard's determinant theorem and the sum of four squares". Duke Mathematical Journal 11 (1): 65–81. doi:10.1215/S0012-7094-44-01108-7. MR 0009590.

[7] Kharaghani, H.; Tayfeh-Rezaie, B. (2005). "A Hadamard matrix of order 428". Journal of Combinatorial Designs 13 (6): 435–440. doi:10.1002/jcd.20043.

[8] Đoković, Dragomir (2008). "Hadamard matrices of order 764 exist". Combinatorica 28 (4): 487–489. doi:10.1007/s00493-008-2384-z.

[9] Wanless, I. M. (2005). "Permanents of matrices of signed ones". Linear and Multilinear Algebra 53: 427–433. doi:10.1080/03081080500093990.

[10] Turyn, R. J. (1965). "Character sums and difference sets". Pacific Journal of Mathematics 15 (1): 319–346. doi:10.2140/pjm.1965.15.319. MR 0179098.

[11] Turyn, R. J. (1969). "Sequences with small correlation". In Mann, H. B. Error Correcting Codes. New York: Wiley. pp. 195–228.

[12] Schmidt, B. (1999). "Cyclotomic integers and finite geometry". Journal of the American Mathematical Society 12 (4): 929–952. doi:10.1090/S0894-0347-99-00298-2. JSTOR 2646093.


1.11 Further reading

• Baumert, L. D.; Hall, Marshall (1965). "Hadamard matrices of the Williamson type". Math. Comp. 19 (91): 442–447. doi:10.1090/S0025-5718-1965-0179093-2. MR 0179093.

• Georgiou, S.; Koukouvinos, C.; Seberry, J. (2003). "Hadamard matrices, orthogonal designs and construction algorithms". Designs 2002: Further computational and constructive design theory. Boston: Kluwer. pp. 133–205. ISBN 1-4020-7599-5.

• Goethals, J. M.; Seidel, J. J. (1970). "A skew Hadamard matrix of order 36". J. Austral. Math. Soc. 11 (3): 343–344. doi:10.1017/S144678870000673X.

• Kimura, Hiroshi (1989). "New Hadamard matrix of order 24". Graphs and Combinatorics 5 (1): 235–242. doi:10.1007/BF01788676.

• Mood, Alexander M. (1964). "On Hotelling's Weighing Problem". Annals of Mathematical Statistics 17 (4): 432–446. doi:10.1214/aoms/1177730883.

• Reid, K. B.; Brown, E. (1972). "Doubly regular tournaments are equivalent to skew Hadamard matrices". J. Combin. Theory Ser. A 12 (3): 332–338. doi:10.1016/0097-3165(72)90098-2.

• Seberry Wallis, Jennifer (1976). "On the existence of Hadamard matrices". J. Combinat. Theory A 21 (2): 188–195. doi:10.1016/0097-3165(76)90062-5.

• Seberry, Jennifer (1980). "A construction for generalized Hadamard matrices". J. Statist. Plann. Infer. 4 (4): 365–368. doi:10.1016/0378-3758(80)90021-X.

• Seberry, J.; Wysocki, B.; Wysocki, T. (2005). "On some applications of Hadamard matrices". Metrika 62 (2–3): 221–239. doi:10.1007/s00184-005-0415-y.

• Spence, Edward (1995). "Classification of Hadamard matrices of order 24 and 28". Discr. Math. 140 (1–3): 185–242. doi:10.1016/0012-365X(93)E0169-5.

• Yarlagadda, R. K.; Hershey, J. E. (1997). Hadamard Matrix Analysis and Synthesis. Boston: Kluwer. ISBN 0-7923-9826-2.

1.12 External links

• Skew Hadamard matrices of all orders up to 100, including every type with order up to 28.
• Hadamard Matrix in OEIS.
• N. J. A. Sloane. Library of Hadamard Matrices.
• On-line utility to obtain all orders up to 1000, except 668, 716, 876 & 892.
• JPL: In 1961, mathematicians from NASA's Jet Propulsion Laboratory and Caltech worked together to construct a Hadamard Matrix containing 92 rows and columns.

Chapter 2

Hadamard's maximal determinant problem

Hadamard's maximal determinant problem, named after Jacques Hadamard, asks for the largest determinant of a matrix with elements equal to 1 or −1. The analogous question for matrices with elements equal to 0 or 1 is equivalent since, as will be shown below, the maximal determinant of a {−1, 1} matrix of size n is 2^{n−1} times the maximal determinant of a {0, 1} matrix of size n−1. The problem was posed by Hadamard in the 1893 paper[1] in which he presented his famous determinant bound, and it remains unsolved for matrices of general size. Hadamard's bound implies that {−1, 1} matrices of size n have determinant at most n^{n/2}. Hadamard observed that a construction of Sylvester[2] produces examples of matrices that attain the bound when n is a power of 2, and produced examples of his own of sizes 12 and 20. He also showed that the bound is only attainable when n is equal to 1, 2, or a multiple of 4. Additional examples were later constructed by Scarpis and Paley and subsequently by many other authors. Such matrices are now known as Hadamard matrices. They have received intensive study.

Matrix sizes n for which n ≡ 1, 2, or 3 (mod 4) have received less attention. The earliest results are due to Barba, who tightened Hadamard's bound for n odd, and Williamson, who found the largest determinants for n = 3, 5, 6, and 7. Some important results include

• tighter bounds, due to Barba, Ehlich, and Wojtas, for n ≡ 1, 2, or 3 (mod 4), which, however, are known not to be always attainable,

• a few infinite sequences of matrices attaining the bounds for n ≡ 1 or 2 (mod 4),

• a number of matrices attaining the bounds for specific n ≡ 1 or 2 (mod 4),

• a number of matrices not attaining the bounds for specific n ≡ 1 or 3 (mod 4), but that have been proved by exhaustive computation to have maximal determinant.

The design of experiments in statistics makes use of {−1, 1} matrices X (not necessarily square) for which the information matrix X^T X has maximal determinant. (The notation X^T denotes the transpose of X.) Such matrices are known as D-optimal designs.[3] If X is a square matrix, it is known as a saturated D-optimal design.

    2.1 Hadamard matrices

Any two rows of an n×n Hadamard matrix are orthogonal, which is impossible for a {−1, 1} matrix when n is an odd number. When n ≡ 2 (mod 4), two rows that are both orthogonal to a third row cannot be orthogonal to each other. Together, these statements imply that an n×n Hadamard matrix can exist only if n = 1, 2, or a multiple of 4. Hadamard matrices have been well studied, but it is not known whether a Hadamard matrix of size 4k exists for every k ≥ 1. The smallest k for which a 4k×4k Hadamard matrix is not known to exist is 167.



2.2 Equivalence and normalization of {−1, 1} matrices

Any of the following operations, when performed on a {−1, 1} matrix R, changes the determinant of R only by a minus sign:

• Negation of a row.
• Negation of a column.
• Interchange of two rows.
• Interchange of two columns.

Two {−1, 1} matrices, R1 and R2, are considered equivalent if R1 can be converted to R2 by some sequence of the above operations. The determinants of equivalent matrices are equal, except possibly for a sign change, and it is often convenient to standardize R by means of negations and permutations of rows and columns. A {−1, 1} matrix is normalized if all elements in its first row and column equal 1. When the size of a matrix is odd, it is sometimes useful to use a different normalization in which every row and column contains an even number of elements 1 and an odd number of elements −1. Either of these normalizations can be accomplished using the first two operations.

2.3 Connection of the maximal determinant problems for {−1, 1} and {0, 1} matrices

There is a one-to-one map from the set of normalized n×n {−1, 1} matrices to the set of (n−1)×(n−1) {0, 1} matrices under which the magnitude of the determinant is reduced by a factor of 2^{1−n}. This map consists of the following steps.

1. Subtract row 1 of the {−1, 1} matrix from rows 2 through n. (This does not change the determinant.)

2. Extract the (n−1)×(n−1) submatrix consisting of rows 2 through n and columns 2 through n. This matrix has elements 0 and −2. (The determinant of this submatrix is the same as that of the original matrix, as can be seen by performing a cofactor expansion on column 1 of the matrix obtained in Step 1.)

3. Divide the submatrix by −2 to obtain a {0, 1} matrix. (This multiplies the determinant by (−2)^{1−n}.)

    Example:

\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -2 & -2 & 0 \\ 0 & 0 & -2 & -2 \\ 0 & -2 & 0 & -2 \end{bmatrix}
\rightarrow
\begin{bmatrix} -2 & -2 & 0 \\ 0 & -2 & -2 \\ -2 & 0 & -2 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}

In this example, the original matrix has determinant −16 and its image has determinant 2 = −16·(−2)^{−3}.

Since the determinant of a {0, 1} matrix is an integer, the determinant of an n×n {−1, 1} matrix is an integer multiple of 2^{n−1}.
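The map is mechanical enough to script. A minimal sketch (Python/NumPy assumed), applied to the matrix of the example above:

    import numpy as np

    def to_binary(R):
        """Map a normalized {-1,1} matrix to the corresponding {0,1} matrix of size n-1."""
        S = R.astype(float).copy()
        S[1:] -= S[0]              # step 1: subtract row 1 from rows 2..n
        M = S[1:, 1:]              # step 2: extract the (n-1)x(n-1) submatrix (entries 0 and -2)
        return M / -2              # step 3: divide by -2 to get a {0,1} matrix

    R = np.array([[1,  1,  1,  1],
                  [1, -1, -1,  1],
                  [1,  1, -1, -1],
                  [1, -1,  1, -1]])
    Z = to_binary(R)
    n = R.shape[0]
    print(round(np.linalg.det(R)), round(np.linalg.det(Z)))                    # -16  2
    print(np.isclose(np.linalg.det(Z), np.linalg.det(R) * (-2.0) ** (1 - n)))  # True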

    2.4 Upper bounds on the maximal determinant

2.4.1 Gram matrix

Let R be an n by n {−1, 1} matrix. The Gram matrix of R is defined to be the matrix G = RR^T. From this definition it follows that G

1. is an integer matrix,


    2. is symmetric,

3. is positive-semidefinite,

    4. has constant diagonal whose value equals n.

Negating rows of R or applying a permutation to them results in the same negations and permutation being applied both to the rows, and to the corresponding columns, of G. We may also define the matrix G′ = R^T R. The matrix G is the usual Gram matrix of a set of vectors, derived from the set of rows of R, while G′ is the Gram matrix derived from the set of columns of R. A matrix R for which G = G′ is a normal matrix. Every known maximal-determinant matrix is equivalent to a normal matrix, but it is not known whether this is always the case.

2.4.2 Hadamard's bound (for all n)

Hadamard's bound can be derived by noting that |det R| = (det G)^{1/2} ≤ (det nI)^{1/2} = n^{n/2}, which is a consequence of the observation that nI, where I is the n by n identity matrix, is the unique matrix of maximal determinant among matrices satisfying properties 1–4. That det R must be an integer multiple of 2^{n−1} can be used to provide another demonstration that Hadamard's bound is not always attainable. When n is odd, the bound n^{n/2} is either non-integer or odd, and is therefore unattainable except when n = 1. When n = 2k with k odd, the highest power of 2 dividing Hadamard's bound is 2^k, which is less than 2^{n−1} unless n = 2. Therefore Hadamard's bound is unattainable unless n = 1, 2, or a multiple of 4.

2.4.3 Barba's bound for n odd

When n is odd, property 1 for Gram matrices can be strengthened to

1. G is an odd-integer matrix.

This allows a sharper upper bound[4] to be derived: |det R| = (det G)^{1/2} ≤ (det ((n−1)I + J))^{1/2} = (2n−1)^{1/2}(n−1)^{(n−1)/2}, where J is the all-one matrix. Here (n−1)I + J is the maximal-determinant matrix satisfying the modified property 1 and properties 2–4. It is unique up to multiplication of any set of rows and the corresponding set of columns by −1. The bound is not attainable unless 2n−1 is a perfect square, and is therefore never attainable when n ≡ 3 (mod 4).

2.4.4 The Ehlich–Wojtas bound for n ≡ 2 (mod 4)

When n is even, the set of rows of R can be partitioned into two subsets.

• Rows of even type contain an even number of elements 1 and an even number of elements −1.

• Rows of odd type contain an odd number of elements 1 and an odd number of elements −1.

The dot product of two rows of the same type is congruent to n (mod 4); the dot product of two rows of opposite type is congruent to n+2 (mod 4). When n ≡ 2 (mod 4), this implies that, by permuting rows of R, we may assume the standard form,

G = \begin{bmatrix} A & B \\ B^T & D \end{bmatrix},

where A and D are symmetric integer matrices whose elements are congruent to 2 (mod 4) and B is a matrix whose elements are congruent to 0 (mod 4). In 1964, Ehlich[5] and Wojtas[6] independently showed that in the maximal determinant matrix of this form, A and D are both of size n/2 and equal to (n−2)I + 2J while B is the zero matrix. This optimal form is unique up to multiplication of any set of rows and the corresponding set of columns by −1 and to simultaneous application of a permutation to rows and columns. This implies the bound det R ≤ (2n−2)(n−2)^{(n−2)/2}. Ehlich showed that if R attains the bound, and if the rows and columns of R are permuted so that both G = RR^T and G′ = R^T R have the standard form and are suitably normalized, then we may write


R = \begin{bmatrix} W & X \\ Y & Z \end{bmatrix},

where W, X, Y, and Z are (n/2)×(n/2) matrices with constant row and column sums w, x, y, and z that satisfy z = −w, y = x, and w^2 + x^2 = 2n−2. Hence the Ehlich–Wojtas bound is not attainable unless 2n−2 is expressible as the sum of two squares.

2.4.5 Ehlich's bound for n ≡ 3 (mod 4)

When n is odd, then by using the freedom to multiply rows by −1, one may impose the condition that each row of R contain an even number of elements 1 and an odd number of elements −1. It can be shown that, if this normalization is assumed, then property 1 of G may be strengthened to

1. G is a matrix with integer elements congruent to n (mod 4).

When n ≡ 1 (mod 4), the optimal form of Barba satisfies this stronger property, but when n ≡ 3 (mod 4), it does not. This means that the bound can be sharpened in the latter case. Ehlich[7] showed that when n ≡ 3 (mod 4), the strengthened property 1 implies that the maximal-determinant form of G can be written as B − J, where J is the all-one matrix and B is a block-diagonal matrix whose diagonal blocks are of the form (n−3)I + 4J. Moreover, he showed that in the optimal form, the number of blocks, s, depends on n as shown in the table below, and that each block either has size r or size r+1 where r = ⌊n/s⌋. Except for n = 11, where there are two possibilities, the optimal form is unique up to multiplication of any set of rows and the corresponding set of columns by −1 and to simultaneous application of a permutation to rows and columns. This optimal form leads to the bound

\det R \le (n-3)^{(n-s)/2} (n-3+4r)^{u/2} (n+1+4r)^{v/2} \left[ 1 - \frac{ur}{n-3+4r} - \frac{v(r+1)}{n+1+4r} \right]^{1/2},

where v = n − rs is the number of blocks of size r+1 and u = s − v is the number of blocks of size r. Cohn[8] analyzed the bound and determined that, apart from n = 3, it is an integer only for n = 112t^2 ± 28t + 7 for some positive integer t. Tamura[9] derived additional restrictions on the attainability of the bound using the Hasse–Minkowski theorem on the rational equivalence of quadratic forms, and showed that the smallest n > 3 for which Ehlich's bound is conceivably attainable is 511.

2.5 Maximal determinants up to size 21

The maximal determinants of {−1, 1} matrices up to size n = 21 are given in the following table.[10] Size 22 is the smallest open case. In the table, D(n) represents the maximal determinant divided by 2^{n−1}. Equivalently, D(n) represents the maximal determinant of a {0, 1} matrix of size n−1.
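The first few entries can be recomputed by brute force. A sketch (Python/NumPy assumed; only practical for very small n) that enumerates normalized {−1, 1} matrices, which preserves the maximum:

    import itertools
    import numpy as np

    def max_det(n):
        """Maximal |det| over n x n {-1,1} matrices, via normalized matrices
        (first row and column fixed at +1)."""
        best = 0.0
        m = n - 1
        for bits in itertools.product((1, -1), repeat=m * m):
            R = np.ones((n, n), dtype=int)
            R[1:, 1:] = np.array(bits).reshape(m, m)
            best = max(best, abs(np.linalg.det(R)))
        return round(best)

    for n in range(1, 6):
        d = max_det(n)
        print(n, d, d // 2 ** (n - 1))   # n, maximal determinant, D(n): (1,1,1) ... (5,48,3)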

2.6 References

[1] Hadamard, J. (1893), "Résolution d'une question relative aux déterminants", Bulletin des Sciences Mathématiques 17: 240–246.

[2] Sylvester, J. J. (1867), "Thoughts on inverse orthogonal matrices, simultaneous sign successions, and tesselated pavements in two or more colours, with applications to Newton's rule, ornamental tile-work, and the theory of numbers", London Edinburgh and Dublin Philos. Mag. and J. Sci. 34: 461–475.

[3] Galil, Z.; Kiefer, J. (1980), "D-optimum weighing designs", Ann. Statist. 8: 1293–1306, doi:10.1214/aos/1176345202.

[4] Barba, Guido (1933), "Intorno al teorema di Hadamard sui determinanti a valore massimo", Giorn. Mat. Battaglini 71: 70–86.

[5] Ehlich, Hartmut (1964), "Determinantenabschätzungen für binäre Matrizen", Math. Zeitschr. 83: 123–132, doi:10.1007/BF01111249.

[6] Wojtas, M. (1964), "On Hadamard's inequality for the determinants of order non-divisible by 4", Colloq. Math. 12: 73–83.

[7] Ehlich, Hartmut (1964), "Determinantenabschätzungen für binäre Matrizen mit n ≡ 3 mod 4", Math. Zeitschr. 84: 438–447, doi:10.1007/BF01109911.

[8] Cohn, J. H. E. (2000), "Almost D-optimal designs", Utilitas Math. 57: 121–128.

[9] Tamura, Hiroki (2006), "D-optimal designs and group divisible designs", Journal of Combinatorial Designs 14: 451–462, doi:10.1002/jcd.20103.

[10] "Sloane's A003432: Hadamard maximal determinant problem: largest determinant of a (real) {0,1}-matrix of order n", The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.

Chapter 3

    Hamiltonian matrix

In mathematics, a Hamiltonian matrix is a 2n-by-2n matrix A such that JA is symmetric, where J is the skew-symmetric matrix

J = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix}

and I_n is the n-by-n identity matrix. In other words, A is Hamiltonian if and only if (JA)^T = JA, where (·)^T denotes the transpose.[1]

3.1 Properties

Suppose that the 2n-by-2n matrix A is written as the block matrix

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

where a, b, c, and d are n-by-n matrices. Then the condition that A be Hamiltonian is equivalent to requiring that the matrices b and c are symmetric, and that a + d^T = 0.[1][2] Another equivalent condition is that A is of the form A = JS with S symmetric.[2]:34

It follows easily from the definition that the transpose of a Hamiltonian matrix is Hamiltonian. Furthermore, the sum (and any linear combination) of two Hamiltonian matrices is again Hamiltonian, as is their commutator. It follows that the space of all Hamiltonian matrices is a Lie algebra, denoted sp(2n). The dimension of sp(2n) is 2n^2 + n. The corresponding Lie group is the symplectic group Sp(2n). This group consists of the symplectic matrices, those matrices A which satisfy A^T J A = J. Thus, the matrix exponential of a Hamiltonian matrix is symplectic. However, the logarithm of a symplectic matrix is not necessarily Hamiltonian because the exponential map from the Lie algebra to the group is not surjective.[2]:34–36[3]

The characteristic polynomial of a real Hamiltonian matrix is even. Thus, if a Hamiltonian matrix has λ as an eigenvalue, then −λ, λ* and −λ* are also eigenvalues.[2]:45 It follows that the trace of a Hamiltonian matrix is zero. The square of a Hamiltonian matrix is skew-Hamiltonian (a matrix A is skew-Hamiltonian if (JA)^T = −JA). Conversely, every skew-Hamiltonian matrix arises as the square of a Hamiltonian matrix.[4]
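A numerical sketch of these properties (Python/NumPy assumed; A = JS with S symmetric is a generic example, not a specific one from the text):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    Z, I = np.zeros((n, n)), np.eye(n)
    J = np.block([[Z, I], [-I, Z]])

    S = rng.standard_normal((2 * n, 2 * n))
    S = S + S.T                                    # symmetric S
    A = J @ S                                      # A = JS is Hamiltonian

    print(np.allclose((J @ A).T, J @ A))           # True: (JA)^T = JA

    a, b, c, d = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    print(np.allclose(b, b.T), np.allclose(c, c.T), np.allclose(a + d.T, 0))  # all True

    coeffs = np.poly(A)                            # characteristic polynomial, highest degree first
    print(np.allclose(coeffs[1::2], 0))            # True: the characteristic polynomial is even
    print(np.isclose(np.trace(A), 0))              # True: trace is zero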

3.2 Extension to complex matrices

The definition for Hamiltonian matrices can be extended to complex matrices in two ways. One possibility is to say that a matrix A is Hamiltonian if (JA)^T = JA, as above.[1][4] Another possibility is to use the condition (JA)* = JA, where (·)* denotes the conjugate transpose.[5]



3.3 Hamiltonian operators

Let V be a vector space, equipped with a symplectic form Ω. A linear map A : V → V is called a Hamiltonian operator with respect to Ω if the form x, y ↦ Ω(A(x), y) is symmetric. Equivalently, it should satisfy

Ω(A(x), y) = −Ω(x, A(y)).

Choose a basis e_1, …, e_{2n} in V, such that Ω is written as Σ_i e_i ∧ e_{n+i}. A linear operator is Hamiltonian with respect to Ω if and only if its matrix in this basis is Hamiltonian.[4]

3.4 References

[1] Ikramov, Khakim D. (2001), "Hamiltonian square roots of skew-Hamiltonian matrices revisited", Linear Algebra and its Applications 325: 101–107, doi:10.1016/S0024-3795(00)00304-9.

[2] Meyer, K. R.; Hall, G. R. (1991), Introduction to Hamiltonian dynamical systems and the N-body problem, Springer, ISBN 0-387-97637-X.

[3] Dragt, Alex J. (2005), "The symplectic group and classical mechanics", Annals of the New York Academy of Sciences 1045 (1): 291–307, doi:10.1196/annals.1350.025.

[4] Waterhouse, William C. (2005), "The structure of alternating-Hamiltonian matrices", Linear Algebra and its Applications 396: 385–390, doi:10.1016/j.laa.2004.10.003.

[5] Paige, Chris; Van Loan, Charles (1981), "A Schur decomposition for Hamiltonian matrices", Linear Algebra and its Applications 41: 11–32, doi:10.1016/0024-3795(81)90086-0.

Chapter 4

    Hankel matrix

In linear algebra, a Hankel matrix (or catalecticant matrix), named after Hermann Hankel, is a square matrix with constant skew-diagonals (positive sloping diagonals), e.g.:

\begin{bmatrix} a & b & c & d & e \\ b & c & d & e & f \\ c & d & e & f & g \\ d & e & f & g & h \\ e & f & g & h & i \end{bmatrix}.

If the i,j element of A is denoted A_{i,j}, then we have

A_{i,j} = A_{i−1,j+1}.

The Hankel matrix is closely related to the Toeplitz matrix (a Hankel matrix is an upside-down Toeplitz matrix). For a special case of this matrix see Hilbert matrix.

A Hankel operator on a Hilbert space is one whose matrix with respect to an orthonormal basis is a (possibly infinite) Hankel matrix (A_{i,j})_{i,j≥1}, where A_{i,j} depends only on i + j.

The determinant of a Hankel matrix is called a catalecticant.
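For instance (Python with NumPy and SciPy assumed; scipy.linalg.hankel builds the matrix from its first column and last row):

    import numpy as np
    from scipy.linalg import hankel

    A = hankel([1, 2, 3, 4, 5], [5, 6, 7, 8, 9])   # first column, last row
    print(A)
    # every skew-diagonal is constant: A[i, j] == A[i-1, j+1]
    print(all(A[i, j] == A[i - 1, j + 1]
              for i in range(1, 5) for j in range(4)))   # True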

4.1 Hankel transform

The Hankel transform is the name sometimes given to the transformation of a sequence, where the transformed sequence corresponds to the determinant of the Hankel matrix. That is, the sequence {h_n}_{n≥0} is the Hankel transform of the sequence {b_n}_{n≥0} when

h_n = \det(b_{i+j-2})_{1 \le i,j \le n+1}.

Here, a_{i,j} = b_{i+j−2} is the Hankel matrix of the sequence {b_n}. The Hankel transform is invariant under the binomial transform of a sequence. That is, if one writes

c_n = \sum_{k=0}^{n} \binom{n}{k} b_k

as the binomial transform of the sequence {b_n}, then one has

\det(b_{i+j-2})_{1 \le i,j \le n+1} = \det(c_{i+j-2})_{1 \le i,j \le n+1}.
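A sketch of both statements (Python with SymPy assumed for exact determinants; the Catalan numbers are used as an illustrative input, their Hankel transform being the all-ones sequence):

    from math import comb
    from sympy import Matrix

    def hankel_transform(b, terms):
        """h_n = det(b_{i+j})_{0 <= i, j <= n}, computed exactly."""
        return [Matrix(n + 1, n + 1, lambda i, j: b[i + j]).det() for n in range(terms)]

    # Catalan numbers C_0, C_1, ... and their binomial transform c_n = sum_k C(n, k) b_k
    catalan = [comb(2 * k, k) // (k + 1) for k in range(12)]
    binom = [sum(comb(n, k) * catalan[k] for k in range(n + 1)) for n in range(12)]

    print(hankel_transform(catalan, 6))   # [1, 1, 1, 1, 1, 1]
    print(hankel_transform(binom, 6))     # the same list: invariance under the binomial transform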



4.2 Hankel matrices for system identification

Hankel matrices are formed when, given a sequence of output data, a realization of an underlying state-space or hidden Markov model is desired. The singular value decomposition of the Hankel matrix provides a means of computing the A, B, and C matrices which define the state-space realization.
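The text does not spell out the algorithm; the following is a minimal Ho–Kalman-style sketch (Python/NumPy assumed; the example system, Hankel size, and rank tolerance are illustrative assumptions) showing how an SVD of a Hankel matrix of impulse-response (Markov) parameters yields A, B, and C:

    import numpy as np

    # an assumed "true" SISO system used only to generate Markov parameters h_k = C A^k B
    A_true = np.array([[0.9, 0.2], [0.0, 0.5]])
    B_true = np.array([[1.0], [1.0]])
    C_true = np.array([[1.0, 0.0]])
    N = 20
    h = [(C_true @ np.linalg.matrix_power(A_true, k) @ B_true).item() for k in range(N)]

    m = 8
    H0 = np.array([[h[i + j] for j in range(m)] for i in range(m)])       # Hankel matrix
    H1 = np.array([[h[i + j + 1] for j in range(m)] for i in range(m)])   # shifted Hankel matrix

    U, s, Vt = np.linalg.svd(H0)
    r = int(np.sum(s > 1e-8))                       # numerical rank = state dimension
    Un, Sh, Vtn = U[:, :r], np.diag(np.sqrt(s[:r])), Vt[:r, :]

    A_hat = np.linalg.inv(Sh) @ Un.T @ H1 @ Vtn.T @ np.linalg.inv(Sh)
    B_hat = (Sh @ Vtn)[:, :1]
    C_hat = (Un @ Sh)[:1, :]

    # the identified realization reproduces the Markov parameters
    h_hat = [(C_hat @ np.linalg.matrix_power(A_hat, k) @ B_hat).item() for k in range(N)]
    print(np.allclose(h, h_hat))   # True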

    4.3 Orthogonal polynomials on the real line

4.3.1 Positive Hankel matrices and the Hamburger moment problems

Further information: Hamburger moment problem

    4.3.2 Orthogonal polynomials on the real line

    4.3.3 Tridiagonal model of positive Hankel operators

4.3.4 Relation between Hankel and Toeplitz matrices

Let J_n be the reflection matrix of order n. For example, the reflection matrix of order 5 is as follows:

J_5 = \begin{bmatrix} & & & & 1 \\ & & & 1 & \\ & & 1 & & \\ & 1 & & & \\ 1 & & & & \end{bmatrix}.

If H(m, n) is an m × n Hankel matrix, then H(m, n) = T(m, n) J_n, where T(m, n) is an m × n Toeplitz matrix.
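A quick numerical check of this identity (Python with NumPy/SciPy assumed): reversing the columns of a Toeplitz matrix with the reflection matrix yields constant skew-diagonals.

    import numpy as np
    from scipy.linalg import toeplitz

    n = 5
    T = toeplitz(np.arange(1, n + 1), np.arange(1, n + 1) * 10)   # an n x n Toeplitz matrix
    J = np.fliplr(np.eye(n))                                      # reflection (exchange) matrix
    H = T @ J                                                     # reverses the columns of T

    print(all(H[i, j] == H[i - 1, j + 1]
              for i in range(1, n) for j in range(n - 1)))        # True: H is a Hankel matrix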

    4.3.5 Relations between structured matrices

4.4 See also

• Cauchy matrix
• Vandermonde matrix
• Displacement rank

    4.5 Notes

4.6 References

• Brent, R. P. (1999), "Stability of fast algorithms for structured linear systems", Fast Reliable Algorithms for Matrices with Structure (editors T. Kailath, A. H. Sayed), ch. 4 (SIAM).

• Victor Y. Pan (2001). Structured matrices and polynomials: unified superfast algorithms. Birkhäuser. ISBN 0817642404.

• J. R. Partington (1988). An introduction to Hankel operators. LMS Student Texts 13. Cambridge University Press. ISBN 0-521-36791-3.

Chapter 5

Hasse–Witt matrix

In mathematics, the Hasse–Witt matrix H of a non-singular algebraic curve C over a finite field F is the matrix of the Frobenius mapping (p-th power mapping where F has q elements, q a power of the prime number p) with respect to a basis for the differentials of the first kind. It is a g × g matrix where C has genus g. The rank of the Hasse–Witt matrix is the Hasse or Hasse–Witt invariant.

5.1 Approach to the definition

This definition, as given in the introduction, is natural in classical terms, and is due to Helmut Hasse and Ernst Witt (1936). It provides a solution to the question of the p-rank of the Jacobian variety J of C; the p-rank is bounded by the rank of H, specifically it is the rank of the Frobenius mapping composed with itself g times. It is also a definition that is in principle algorithmic. There has been substantial recent interest in this because of its practical application to cryptography, in the case of C a hyperelliptic curve. The curve C is superspecial if H = 0.

That definition needs a couple of caveats, at least. Firstly, there is a convention about Frobenius mappings, and under the modern understanding what is required for H is the transpose of Frobenius (see arithmetic and geometric Frobenius for more discussion). Secondly, the Frobenius mapping is not F-linear; it is linear over the prime field Z/pZ in F. Therefore the matrix can be written down but does not represent a linear mapping in the straightforward sense.

5.2 Cohomology

The interpretation for sheaf cohomology is this: the p-power map acts on

H^1(C, O_C),

or in other words the first cohomology of C with coefficients in its structure sheaf. This is now called the Cartier–Manin operator (sometimes just Cartier operator), for Pierre Cartier and Yuri Manin. The connection with the Hasse–Witt definition is by means of Serre duality, which for a curve relates that group to

H^0(C, Ω_C),

where Ω_C = Ω^1_C is the sheaf of Kähler differentials on C.

5.3 Abelian varieties and their p-rank

The p-rank of an abelian variety A over a field K of characteristic p is the integer k for which the kernel A[p] of multiplication by p has p^k points. It may take any value from 0 to d, the dimension of A; by contrast for any other prime number l there are l^{2d} points in A[l]. The reason that the p-rank is lower is that multiplication by p on A is an inseparable isogeny: the differential is p, which is 0 in K. By looking at the kernel as a group scheme one can get the more complete structure (reference David Mumford, Abelian Varieties, pp. 146–7); but if for example one looks at reduction mod p of a division equation, the number of solutions must drop.

The rank of the Cartier–Manin operator, or Hasse–Witt matrix, therefore gives an upper bound for the p-rank. The p-rank is the rank of the Frobenius operator composed with itself g times. In the original paper of Hasse and Witt the problem is phrased in terms intrinsic to C, not relying on J. It is there a question of classifying the possible Artin–Schreier extensions of the function field F(C) (the analogue in this case of Kummer theory).

5.4 Case of genus 1

The case of elliptic curves was worked out by Hasse in 1934. Since the genus is 1, the only possibilities for the matrix H are: H is zero, Hasse invariant 0, p-rank 0, the supersingular case; or H non-zero, Hasse invariant 1, p-rank 1, the ordinary case.[1] Here there is a congruence formula saying that H is congruent modulo p to the number N of points on C over F, at least when q = p. Because of Hasse's theorem on elliptic curves, knowing N modulo p determines N for p ≥ 5. This connection with local zeta-functions has been investigated in depth.

For a plane curve defined by a cubic f(X, Y, Z) = 0, the Hasse invariant is zero if and only if the coefficient of (XYZ)^{p−1} in f^{p−1} is zero.[1]

5.5 Notes

[1] Hartshorne, Robin (1977). Algebraic Geometry. Graduate Texts in Mathematics 52. Springer-Verlag. p. 332. ISBN 0-387-90244-9. MR 0463157. Zbl 0367.14001.

5.6 References

• Hasse, Helmut (1934). "Existenz separabler zyklischer unverzweigter Erweiterungskörper vom Primzahlgrade p über elliptischen Funktionenkörpern der Charakteristik p". Journal f. d. reine u. angew. Math. 172: 77–85. doi:10.1515/crll.1935.172.77. JFM 60.0910.02. Zbl 0010.14803.

• Hasse, Helmut; Witt, Ernst (1936). "Zyklische unverzweigte Erweiterungskörper vom Primzahlgrade p über einem algebraischen Funktionenkörper der Charakteristik p". Monatshefte f. Math. und Phys. 43: 477–492. doi:10.1515/9783110835007.202. JFM 62.0112.01. Zbl 0013.34102.

• Manin, Ju. I. (1965). "The Hasse–Witt matrix of an algebraic curve". Transl., Ser. 2, Am. Math. Soc. 45: 245–246. ISSN 0065-9290. Zbl 0148.28002. (English translation of a Russian original)

Chapter 6

    Hat matrix

In statistics, the hat matrix, H, sometimes also called influence matrix[1] and projection matrix, maps the vector of response values to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value.[2][3] The diagonal elements of the hat matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.

If the vector of response values is denoted by y and the vector of fitted values by ŷ,

ŷ = Hy.

As ŷ is usually pronounced "y-hat", the hat matrix is so named as it puts a hat on y. The formula for the vector of residuals r can also be expressed compactly using the hat matrix:

r = y − ŷ = y − Hy = (I − H)y.

Moreover, the element in the ith row and jth column of H is equal to the covariance between the jth response value and the ith fitted value, divided by the variance of the former:

h_{ij} = cov[ŷ_i, y_j] / var[y_j].

The covariance matrix of the residuals is therefore, by error propagation, equal to (I − H)^T Σ (I − H), where Σ is the covariance matrix of the error vector (and by extension, the response vector as well). For the case of linear models with independent and identically distributed errors in which Σ = σ²I, this reduces to (I − H)σ².[2]

Many types of models and techniques are subject to this formulation. A few examples are:

• Linear model / linear least squares
• Smoothing splines
• Regression splines
• Local regression
• Kernel regression
• Linear filtering

6.1 Linear model

Suppose that we wish to estimate a linear model using linear least squares. The model can be written as



    y = X + ";

    whereX is a matrix of explanatory variables (the design matrix), is a vector of unknown parameters to be estimated,and is the error vector.

6.1.1 Solution with unit weights and uncorrelated errors

When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are

β̂ = (X^T X)^{−1} X^T y,

so the fitted values are

ŷ = Xβ̂ = X(X^T X)^{−1} X^T y.

Therefore the hat matrix is given by

H = X(X^T X)^{−1} X^T.

In the language of linear algebra, the hat matrix is the orthogonal projection onto the column space of the design matrix X.[3] (Note that (X^T X)^{−1} X^T is the pseudoinverse of X.)

Some facts of the hat matrix in this setting are summarized as follows:[3]

• r = (I − H)y, and r = y − Hy ⊥ X.
• H is symmetric, and so is I − H.
• H is idempotent: H² = H, and so is I − H.
• X is invariant under H: HX = X; hence (I − H)X = 0.
• (I − H)H = H(I − H) = 0.

The hat matrix corresponding to a linear model is symmetric and idempotent, that is, H² = H. However, this is not always the case; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.

For linear models, the trace of the hat matrix is equal to the rank of X, which is the number of independent parameters of the linear model. For other models such as LOESS that are still linear in the observations y, the hat matrix can be used to define the effective degrees of freedom of the model.

The hat matrix has a number of useful algebraic properties.[4][5] Practical applications of the hat matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying observations which have a large effect on the results of a regression.
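These facts are easy to confirm numerically. A short sketch (Python/NumPy assumed; the 8×3 design matrix is an arbitrary illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(8), rng.standard_normal((8, 2))])   # 8 observations, 3 parameters
    H = X @ np.linalg.inv(X.T @ X) @ X.T

    print(np.allclose(H, H.T))                                 # True: H is symmetric
    print(np.allclose(H @ H, H))                               # True: H is idempotent
    print(np.allclose(H @ X, X))                               # True: X is invariant under H
    print(np.isclose(np.trace(H), np.linalg.matrix_rank(X)))   # True: trace(H) = rank(X) = 3

    y = rng.standard_normal(8)
    r = (np.eye(8) - H) @ y
    print(np.allclose(X.T @ r, 0))                             # True: residuals orthogonal to X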

    6.2 More generally

6.2.1 Non-identical weights and/or correlated errors

The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is Σ. Then since

β̂ = (X^T Σ^{−1} X)^{−1} X^T Σ^{−1} y,

the hat matrix is thus

H = X(X^T Σ^{−1} X)^{−1} X^T Σ^{−1},

and again it may be seen that H² = H, though now it is no longer symmetric.

6.3 Blockwise formula

Suppose the design matrix C can be decomposed by columns as C = [A, B]. Define the hat operator as H(X) = X(X^T X)^{−1} X^T. Similarly, define the residual operator as M(X) = I − H(X). Then the hat matrix of C can be decomposed as follows:

H(C) = H(A) + H(M(A)B).[6]

There are a number of applications of such a partitioning. The classical application has A a column of all ones, which allows one to analyze the effects of adding an intercept term to a regression. Another use is in the fixed effects model, where A is a large sparse matrix of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of C without explicitly forming the matrix C, which might be too large to fit into computer memory.
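A quick numerical check of the decomposition (Python/NumPy assumed; A is taken to be an intercept column, as in the classical application):

    import numpy as np

    def hat(X):
        return X @ np.linalg.inv(X.T @ X) @ X.T

    def res(X):
        return np.eye(X.shape[0]) - hat(X)

    rng = np.random.default_rng(2)
    A = np.ones((10, 1))                       # intercept column
    B = rng.standard_normal((10, 3))
    C = np.hstack([A, B])

    print(np.allclose(hat(C), hat(A) + hat(res(A) @ B)))   # True: H(C) = H(A) + H(M(A)B)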

6.4 See also

• Moore–Penrose pseudoinverse
• Studentized residuals
• Effective degrees of freedom
• Idempotent matrix
• Mean and predicted response

6.5 References

[1] "Data Assimilation: Observation influence diagnostic of a data assimilation system".

[2] Hoaglin, David C.; Welsch, Roy E. (February 1978), "The Hat Matrix in Regression and ANOVA", The American Statistician 32 (1): 17–22, doi:10.2307/2683469, JSTOR 2683469.

[3] David A. Freedman (2009). Statistical Models: Theory and Practice. Cambridge University Press.

[4] Gans, P. (1992), Data Fitting in the Chemical Sciences, Wiley. ISBN 978-0-471-93412-7.

[5] Draper, N. R.; Smith, H. (1998), Applied Regression Analysis, Wiley. ISBN 0-471-17082-8.

[6] Rao, C. Radhakrishna; Toutenburg; Shalabh; Heumann (2008). Linear Models and Generalizations (3rd ed.). Berlin: Springer. p. 323. ISBN 978-3-540-74226-5.

Chapter 7

    Hermitian matrix

In mathematics, a Hermitian matrix (or self-adjoint matrix) is a square matrix with complex entries that is equal to its own conjugate transpose; that is, the element in the i-th row and j-th column is equal to the complex conjugate of the element in the j-th row and i-th column, for all indices i and j:

$a_{ij} = \overline{a_{ji}}$ or $A = \overline{A^\top}$, in matrix form.

Hermitian matrices can be understood as the complex extension of real symmetric matrices.

If the conjugate transpose of a matrix A is denoted by $A^\dagger$, then the Hermitian property can be written concisely as

$A = A^\dagger.$

Hermitian matrices are named after Charles Hermite, who demonstrated in 1855 that matrices of this form share a property with real symmetric matrices of always having real eigenvalues.

7.1 Examples

See the following example:

$\begin{bmatrix} 2 & 2+i & 4 \\ 2-i & 3 & i \\ 4 & -i & 1 \end{bmatrix}$

The diagonal elements must be real, as they must be their own complex conjugates.

Well-known families of Pauli matrices, Gell-Mann matrices and their generalizations are Hermitian. In theoretical physics such Hermitian matrices are often multiplied by imaginary coefficients,[1][2] which results in skew-Hermitian matrices (see below).

Here we offer another useful Hermitian matrix using an abstract example. If a square matrix A equals the product of a matrix and its conjugate transpose, that is, $A = BB^\dagger$, then A is a Hermitian positive semi-definite matrix. Furthermore, if B is row full-rank, then A is positive definite.
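A quick numerical sketch of the construction $A = BB^\dagger$ (NumPy, with an arbitrary complex matrix B, not part of the original article):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 5)) + 1j * rng.normal(size=(3, 5))  # 3x5, row full-rank with probability 1
A = B @ B.conj().T                                          # A = B B^dagger

print(np.allclose(A, A.conj().T))    # Hermitian: True
eigvals = np.linalg.eigvalsh(A)      # eigvalsh assumes a Hermitian input
print(eigvals)                       # all real, and positive here
print(np.all(eigvals > 0))           # positive definite since B has full row rank
```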

7.2 Properties

- The entries on the main diagonal (top left to bottom right) of any Hermitian matrix are necessarily real, because they have to be equal to their complex conjugates.

- Because of conjugation, for complex-valued entries the off-diagonal entries cannot be symmetric (i.e., equal). Hence, a matrix that has only real entries is Hermitian if and only if it is a symmetric matrix, i.e., symmetric with respect to the main diagonal. A real symmetric matrix is simply a special case of a Hermitian matrix.

- Every Hermitian matrix is a normal matrix.

- The finite-dimensional spectral theorem says that any Hermitian matrix can be diagonalized by a unitary matrix, and that the resulting diagonal matrix has only real entries. This implies that all eigenvalues of a Hermitian matrix A with dimension n are real, and that A has n linearly independent eigenvectors. Moreover, a Hermitian matrix has orthogonal eigenvectors for distinct eigenvalues. Even if there are degenerate eigenvalues, it is always possible to find an orthogonal basis of $\mathbb{C}^n$ consisting of n eigenvectors of A.

- The sum of any two Hermitian matrices is Hermitian, and the inverse of an invertible Hermitian matrix is Hermitian as well. However, the product of two Hermitian matrices A and B is Hermitian if and only if AB = BA. Thus $A^n$ is Hermitian if A is Hermitian and n is an integer.

- For an arbitrary complex-valued vector v, the product $v^\dagger A v$ is real because $v^\dagger A v = (v^\dagger A v)^\dagger$. This is especially important in quantum physics, where Hermitian matrices are operators that measure properties of a system, e.g. total spin, which have to be real.

- The Hermitian complex n-by-n matrices do not form a vector space over the complex numbers, since the identity matrix $I_n$ is Hermitian, but $iI_n$ is not. However, the complex Hermitian matrices do form a vector space over the real numbers $\mathbb{R}$. In the $2n^2$-dimensional vector space of complex $n \times n$ matrices over $\mathbb{R}$, the complex Hermitian matrices form a subspace of dimension $n^2$. If $E_{jk}$ denotes the n-by-n matrix with a 1 in the j,k position and zeros elsewhere, a basis can be described as follows:

$E_{jj}$ for $1 \le j \le n$ (n matrices)

together with the set of matrices of the form

$E_{jk} + E_{kj}$ for $1 \le j < k \le n$ ($(n^2 - n)/2$ matrices)

and the matrices

$i(E_{jk} - E_{kj})$ for $1 \le j < k \le n$ ($(n^2 - n)/2$ matrices)

where i denotes the complex number $\sqrt{-1}$, known as the imaginary unit.

- If n orthonormal eigenvectors $u_1, \dots, u_n$ of a Hermitian matrix are chosen and written as the columns of the matrix U, then one eigendecomposition of A is $A = U \Lambda U^\dagger$ where $U U^\dagger = I = U^\dagger U$ and therefore

$A = \sum_j \lambda_j\, u_j u_j^\dagger,$

where $\lambda_j$ are the eigenvalues on the diagonal of the diagonal matrix $\Lambda$.
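This decomposition can be reproduced numerically with NumPy's eigh routine (a sketch on the example matrix given above):

```python
import numpy as np

A = np.array([[2, 2 + 1j, 4],
              [2 - 1j, 3, 1j],
              [4, -1j, 1]])

# eigh is specialized to Hermitian matrices: real eigenvalues, orthonormal eigenvectors
eigenvalues, U = np.linalg.eigh(A)

Lambda = np.diag(eigenvalues)
print(np.allclose(A, U @ Lambda @ U.conj().T))   # A = U Lambda U^dagger
print(np.allclose(U.conj().T @ U, np.eye(3)))    # columns of U are orthonormal
print(np.isrealobj(eigenvalues))                 # the eigenvalues are real
```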

7.3 Further properties

Additional facts related to Hermitian matrices include:

- The sum of a square matrix and its conjugate transpose $(C + C^\dagger)$ is Hermitian.

- The difference of a square matrix and its conjugate transpose $(C - C^\dagger)$ is skew-Hermitian (also called antihermitian). This implies that the commutator of two Hermitian matrices is skew-Hermitian.

- An arbitrary square matrix C can be written as the sum of a Hermitian matrix A and a skew-Hermitian matrix B:


$C = A + B$ with $A = \tfrac{1}{2}(C + C^\dagger)$ and $B = \tfrac{1}{2}(C - C^\dagger).$

- The determinant of a Hermitian matrix is real:

$\det(A) = \det(A^\top) \Rightarrow \det(A^\dagger) = \overline{\det(A)}$

$A = A^\dagger \Rightarrow \det(A) = \overline{\det(A)}.$

(Alternatively, the determinant is the product of the matrix's eigenvalues, and as mentioned before, the eigenvalues of a Hermitian matrix are real.)

7.4 Rayleigh quotient

Main article: Rayleigh quotient

7.5 See also

- Skew-Hermitian matrix (anti-Hermitian matrix)
- Haynsworth inertia additivity formula
- Hermitian form
- Self-adjoint operator
- Unitary matrix

7.6 References

[1] Frankel, Theodore (2004). The Geometry of Physics: An Introduction. Cambridge University Press. p. 652. ISBN 0-521-53927-7.

[2] Physics 125 Course Notes at California Institute of Technology

7.7 External links

- Hazewinkel, Michiel, ed. (2001), "Hermitian matrix", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4

- "Visualizing Hermitian Matrix as An Ellipse with Dr. Geo", by Chao-Kuei Hung from Shu-Te University, gives a more geometric explanation.

- "Hermitian Matrices" at MathPages.com.

Chapter 8

    Hessenberg matrix

In linear algebra, a Hessenberg matrix is a special kind of square matrix, one that is "almost" triangular. To be exact, an upper Hessenberg matrix has zero entries below the first subdiagonal, and a lower Hessenberg matrix has zero entries above the first superdiagonal.[1] They are named after Karl Hessenberg.[2]

For example:

$\begin{bmatrix} 1 & 4 & 2 & 3 \\ 3 & 4 & 1 & 7 \\ 0 & 2 & 3 & 4 \\ 0 & 0 & 1 & 3 \end{bmatrix}$

is upper Hessenberg and

$\begin{bmatrix} 1 & 2 & 0 & 0 \\ 5 & 2 & 3 & 0 \\ 3 & 4 & 3 & 7 \\ 5 & 6 & 1 & 1 \end{bmatrix}$

is lower Hessenberg.

8.1 Computer programming

Many linear algebra algorithms require significantly less computational effort when applied to triangular matrices, and this improvement often carries over to Hessenberg matrices as well. If the constraints of a linear algebra problem do not allow a general matrix to be conveniently reduced to a triangular one, reduction to Hessenberg form is often the next best thing. In fact, reduction of any matrix to Hessenberg form can be achieved in a finite number of steps (for example, through Householder's algorithm of unitary similarity transforms). Subsequent reduction of a Hessenberg matrix to a triangular matrix can be achieved through iterative procedures, such as shifted QR factorization. In eigenvalue algorithms, the Hessenberg matrix can be further reduced to a triangular matrix through shifted QR factorization combined with deflation steps. Reducing a general matrix to a Hessenberg matrix and then reducing further to a triangular matrix, instead of directly reducing a general matrix to a triangular matrix, often economizes the arithmetic involved in the QR algorithm for eigenvalue problems.
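As a sketch of this reduction in practice, SciPy's Householder-based hessenberg routine computes an upper Hessenberg form H together with the unitary similarity Q such that $A = QHQ^\dagger$ (applied here to an arbitrary random matrix):

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))

H, Q = hessenberg(A, calc_q=True)          # Householder reduction to upper Hessenberg form

print(np.allclose(A, Q @ H @ Q.conj().T))  # A = Q H Q^dagger
print(np.allclose(np.tril(H, -2), 0))      # zero below the first subdiagonal
```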

8.2 Properties

The product of a Hessenberg matrix with a triangular matrix is again Hessenberg. More precisely, if A is upper Hessenberg and T is upper triangular, then AT and TA are upper Hessenberg.

A matrix that is both upper Hessenberg and lower Hessenberg is a tridiagonal matrix.
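A quick numerical check of this closure property (a sketch with random upper Hessenberg and upper triangular matrices in NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = np.triu(rng.normal(size=(n, n)), k=-1)   # upper Hessenberg: zeros below the first subdiagonal
T = np.triu(rng.normal(size=(n, n)))         # upper triangular

def is_upper_hessenberg(M):
    return np.allclose(np.tril(M, -2), 0)

print(is_upper_hessenberg(A @ T))   # True
print(is_upper_hessenberg(T @ A))   # True
```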



8.3 See also

- Hessenberg variety

8.4 Notes

[1] Horn & Johnson (1985), page 28; Stoer & Bulirsch (2002), page 251

[2] Biswa Nath Datta (2010) Numerical Linear Algebra and Applications, 2nd Ed., Society for Industrial and Applied Mathematics (SIAM) ISBN 978-0-89871-685-6, p. 307

8.5 References

- Horn, Roger A.; Johnson, Charles R. (1985), Matrix Analysis, Cambridge University Press, ISBN 978-0-521-38632-6.

- Stoer, Josef; Bulirsch, Roland (2002), Introduction to Numerical Analysis (3rd ed.), Berlin, New York: Springer-Verlag, ISBN 978-0-387-95452-3.

- Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007), "Section 11.6.2. Reduction to Hessenberg Form", Numerical Recipes: The Art of Scientific Computing (3rd ed.), New York: Cambridge University Press, ISBN 978-0-521-88068-8

8.6 External links

- Hessenberg matrix at MathWorld.
- Hessenberg matrix at PlanetMath.
- High performance algorithms for reduction to condensed (Hessenberg, tridiagonal, bidiagonal) form

Chapter 9

Hessian automatic differentiation

In applied mathematics, Hessian automatic differentiation refers to techniques based on automatic differentiation (AD) that calculate the second derivative of an n-dimensional function, known as the Hessian matrix.

When examining a function in a neighborhood of a point, one can discard many complicated global aspects of the function and accurately approximate it with simpler functions. The quadratic approximation is the best-fitting quadratic in the neighborhood of a point, and is frequently used in engineering and science. To calculate the quadratic approximation, one must first calculate its gradient and Hessian matrix.

Let $f : \mathbb{R}^n \to \mathbb{R}$; for each $x \in \mathbb{R}^n$ the Hessian matrix $H(x) \in \mathbb{R}^{n \times n}$ is the second-order derivative and is a symmetric matrix. See the article on Hessian matrices for more on the definition.

    9.1 Reverse Hessian-vector products

For a given $u \in \mathbb{R}^n$, this method efficiently calculates the Hessian-vector product $H(x)u$. Thus it can be used to calculate the entire Hessian by calculating $H(x)e_i$ for $i = 1, \dots, n$.[1]

The method works by first using forward AD to perform $f(x) \to u^\top \nabla f(x)$; subsequently it calculates the gradient of $u^\top \nabla f(x)$ using reverse AD to yield $\nabla\left(u \cdot \nabla f(x)\right) = u^\top H(x) = (H(x)u)^\top$. Both of these steps come at a time cost proportional to evaluating the function, thus the entire Hessian can be evaluated at a cost proportional to n evaluations of the function.
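One way to realize this idea in a modern AD framework is sketched below in JAX (an arbitrary test function, composing a forward-mode directional derivative with a reverse-mode gradient; this is an illustration of the Hessian-vector product, not the Edge_Pushing algorithm of the next subsection):

```python
import jax
import jax.numpy as jnp

# Hypothetical test function f : R^n -> R (not from the article)
def f(x):
    return jnp.sum(jnp.sin(x) * x**2)

def hvp(f, x, u):
    """Hessian-vector product H(x) u: forward AD gives the directional
    derivative u . grad f(x); reverse AD (grad) of that scalar gives H(x) u."""
    return jax.grad(lambda y: jax.jvp(f, (y,), (u,))[1])(x)

x = jnp.array([0.5, -1.0, 2.0])
u = jnp.array([1.0, 0.0, 0.0])

print(hvp(f, x, u))              # first column of the Hessian, H(x) e_1
print(jax.hessian(f)(x) @ u)     # same result via the full Hessian, for comparison
```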

    9.2 Reverse Hessian: Edge_Pushing

An algorithm that calculates the entire Hessian with one forward and one reverse sweep of the computational graph is Edge_Pushing. Edge_Pushing is the result of applying the reverse gradient method to the computational graph of the gradient. Naturally, this graph has n output nodes, thus in a sense one has to apply the reverse gradient method to each outgoing node. Edge_Pushing does this by taking into account overlapping calculations.[2]

The algorithm's input is the computational graph of the function. After a preceding forward sweep where all intermediate values in the computational graph are calculated, the algorithm initiates a reverse sweep of the graph. Upon encountering a node that has a corresponding nonlinear elemental function, a new nonlinear edge is created between the node's predecessors, indicating there is nonlinear interaction between them. See the example figure below. Appended to this nonlinear edge is an edge weight that is the second-order partial derivative of the nonlinear node with respect to its predecessors. This nonlinear edge is subsequently pushed down to further predecessors in such a way that, when it reaches the independent nodes, its edge weight is the second-order partial derivative of the two independent nodes it connects.[2]


[Figure: example execution of Edge_Pushing, showing the reverse sweep of nodes 3, 2 and 1 of a small computational graph and the nonlinear edges pushed toward the independent nodes.]

9.3 Graph colouring techniques for Hessians

The graph colouring techniques exploit sparsity patterns of the Hessian matrix and cheap Hessian-vector products to obtain the entire matrix. Thus these techniques are suited for large, sparse matrices. The general strategy of any such colouring technique is as follows.

1. Obtain the global sparsity pattern of H.

2. Apply a graph colouring algorithm that allows us to compact the sparsity structure.

3. For each desired point $x \in \mathbb{R}^n$ calculate numeric entries of the compact matrix.

4. Recover the Hessian matrix from the compact matrix.

Steps one and two need only be carried out once, and tend to be costly. When one wants to calculate the Hessian at numerous points (such as in an optimization routine), steps 3 and 4 are repeated.

As an example, the accompanying figure shows the sparsity pattern of a Hessian matrix whose columns have been coloured in such a way that columns of the same colour can be merged without incurring a collision between elements.

There are a number of colouring techniques, each with a specific recovery technique. For a comprehensive survey, see.[3] There have been successful numerical results of such methods.[4]

9.4 See also

- Hessian matrix
- Jacobian matrix and determinant
- Automatic differentiation


9.5 References

[1] Bruce Christianson. "Automatic Hessians by Reverse Accumulation", http://imajna.oxfordjournals.org/content/12/2/135.abstract.

[2] R. Gower, M. Mello. "A new framework for the computation of Hessians". In: Optimization Methods and Software. doi: http://www.tandfonline.com/doi/full/10.1080/10556788.2011.580098.

[3] A. H. Gebremedhin, A. Tarafdar, A. Pothen, and A. Walther. "Efficient Computation of Sparse Hessians Using Coloring and Automatic Differentiation". In: INFORMS J. on Computing 21.2 (2009), pp. 209-223. doi: http://dx.doi.org/10.1287/ijoc.1080.0286.

[4] A. Walther. "Computing sparse Hessians with automatic differentiation". In: ACM Trans. Math. Softw. 34.1 (2008), pp. 1-15. issn: 0098-3500. doi: http://doi.acm.org/10.1145/1322436.1322439.

9.6 External links

- What color is your Jacobian? Graph coloring for computing derivatives

Chapter 10

    Hessian matrix

In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term "functional determinants".

Specifically, suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a function taking as input a vector $x \in \mathbb{R}^n$ and outputting a scalar $f(x)$; if all second partial derivatives of f exist and are continuous over the domain of the function, then the Hessian matrix H of f is a square $n \times n$ matrix, usually defined and arranged as follows:

$H = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\[1ex]
\dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$

or, component-wise:

$H_{i,j} = \dfrac{\partial^2 f}{\partial x_i\,\partial x_j}.$

    The determinant of the above matrix is also sometimes referred to as the Hessian.[1]

The Hessian matrix can be related to the Jacobian matrix by $H(f)(x) = J(\nabla f)(x)$.

10.1 Mixed derivatives and symmetry of the Hessian

The mixed derivatives of f are the entries off the main diagonal in the Hessian. Assuming that they are continuous, the order of differentiation does not matter (Clairaut's theorem). For example,

$\dfrac{\partial}{\partial x_i}\left(\dfrac{\partial f}{\partial x_j}\right) = \dfrac{\partial}{\partial x_j}\left(\dfrac{\partial f}{\partial x_i}\right).$

In a formal statement: if the second derivatives of f are all continuous in a neighborhood D, then the Hessian of f is a symmetric matrix throughout D; see symmetry of second derivatives.



10.2 Critical points

If the gradient (the vector of the partial derivatives) of a function f is zero at some point x, then f has a critical point (or stationary point) at x. The determinant of the Hessian at x is then called the discriminant. If this determinant is zero then x is called a degenerate critical point of f, or a non-Morse critical point of f. Otherwise it is non-degenerate, and called a Morse critical point of f.

The Hessian matrix plays an important role in Morse theory, because its kernel and eigenvalues allow classification of the critical points.

10.3 Second derivative test

Main article: Second partial derivative test

The Hessian matrix of a convex function is positive semi-definite. Refining this property allows us to test whether a critical point x is a local maximum or a local minimum, as follows.

If the Hessian is positive definite at x, then f attains a local minimum at x. If the Hessian is negative definite at x, then f attains a local maximum at x. If the Hessian has both positive and negative eigenvalues then x is a saddle point for f. Otherwise the test is inconclusive. This implies that, at a local minimum (resp. a local maximum), the Hessian is positive semi-definite (resp. negative semi-definite).

Note that for positive semidefinite and negative semidefinite Hessians the test is inconclusive (yet a conclusion can be made that f is locally convex or concave respectively). However, more can be said from the point of view of Morse theory.

The second derivative test for functions of one and two variables is simple. In one variable, the Hessian contains just one second derivative; if it is positive then x is a local minimum, and if it is negative then x is a local maximum; if it is zero then the test is inconclusive. In two variables, the determinant can be used, because the determinant is the product of the eigenvalues. If it is positive then the eigenvalues are both positive, or both negative. If it is negative then the two eigenvalues have different signs. If it is zero, then the second derivative test is inconclusive.

More generally, the second-order conditions that are sufficient for a local minimum or maximum can be expressed in terms of the sequence of principal (upper-leftmost) minors (determinants of sub-matrices) of the Hessian; these conditions are a special case of those given in the next section for bordered Hessians for constrained optimization, the case in which the number of constraints is zero.
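A small numerical sketch of the test (arbitrary example functions whose Hessians at the critical point are written down analytically; the classification is read off the eigenvalues):

```python
import numpy as np

def classify(hessian, tol=1e-12):
    """Classify a non-degenerate critical point from the eigenvalues of its Hessian."""
    w = np.linalg.eigvalsh(hessian)
    if np.all(w > tol):
        return "local minimum"
    if np.all(w < -tol):
        return "local maximum"
    if np.any(w > tol) and np.any(w < -tol):
        return "saddle point"
    return "inconclusive"

# f(x, y) = x^2 + 3y^2 has a critical point at the origin with Hessian diag(2, 6)
print(classify(np.diag([2.0, 6.0])))    # local minimum

# g(x, y) = x^2 - y^2 has a critical point at the origin with Hessian diag(2, -2)
print(classify(np.diag([2.0, -2.0])))   # saddle point
```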

10.4 Bordered Hessian

A bordered Hessian is used for the second-derivative test in certain constrained optimization problems. Given the function f considered previously, but adding a constraint function g such that g(x) = c, the bordered Hessian appears as

$H(f,g) = \begin{bmatrix}
0 & \dfrac{\partial g}{\partial x_1} & \dfrac{\partial g}{\partial x_2} & \cdots & \dfrac{\partial g}{\partial x_n} \\[1ex]
\dfrac{\partial g}{\partial x_1} & \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\[1ex]
\dfrac{\partial g}{\partial x_2} & \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial g}{\partial x_n} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$

If there are, say, m constraints then the zero in the upper-left corner is an $m \times m$ block of zeroes, and there are m border rows at the top and m border columns at the left.


The above rules stating that extrema are characterized (among critical points with a nonsingular Hessian) by a positive-definite or negative-definite Hessian cannot apply here, since a bordered Hessian can be neither negative-definite nor positive-definite, as $z^\top H z = 0$ if z is any vector whose sole non-zero entry is its first.

The second derivative test consists here of sign restrictions on the determinants of a certain set of n − m submatrices of the bordered Hessian.[2] Intuitively, one can think of the m constraints as reducing the problem to one with n − m free variables. (For example, the maximization of f(x1, x2, x3) subject to the constraint x1 + x2 + x3 = 1 can be reduced to the maximization of f(x1, x2, 1 − x1 − x2) without constraint.)

Specifically, sign conditions are imposed on the sequence of principal minors (determinants of upper-left-justified sub-matrices) of the bordered Hessian, the smallest minor consisting of the truncated first 2m + 1 rows and columns, the next consisting of the truncated first 2m + 2 rows and columns, and so on, with the last being the entire bordered Hessian.[3] There are thus n − m minors to consider. A sufficient condition for a local maximum is that these minors alternate in sign with the smallest one having the sign of $(-1)^{m+1}$. A sufficient condition for a local minimum is that all of these minors have the sign of $(-1)^m$ (in the unconstrained case of m = 0 these conditions coincide with the conditions for the unbordered Hessian to be negative definite or positive definite respectively).

10.5 Vector-valued functions

If f is instead a vector field $f : \mathbb{R}^n \to \mathbb{R}^m$, i.e.

$f(x) = (f_1(x), f_2(x), \dots, f_m(x)),$

then the collection of second partial derivatives is not an $n \times n$ matrix, but rather a third-order tensor. This can be thought of as an array of m Hessian matrices, one for each component of f:

$H(f) = (H(f_1), H(f_2), \dots, H(f_m)).$

    This tensor promptly degenerates to the usual Hessian matrix when m = 1.

10.6 Generalizations to Riemannian manifolds

Let $(M, g)$ be a Riemannian manifold and $\nabla$ its Levi-Civita connection. Let $f : M \to \mathbb{R}$ be a smooth function. We may define the Hessian tensor

$\operatorname{Hess}(f) \in \Gamma(T^*M \otimes T^*M)$ by $\operatorname{Hess}(f) := \nabla\nabla f = \nabla df,$

where we have taken advantage of the first covariant derivative of a function being the same as its ordinary derivative. Choosing local coordinates $\{x^i\}$ we obtain the local expression for the Hessian as

$\operatorname{Hess}(f) = \nabla_i\,\partial_j f \; dx^i \otimes dx^j = \left(\dfrac{\partial^2 f}{\partial x^i\,\partial x^j} - \Gamma^k_{ij}\dfrac{\partial f}{\partial x^k}\right) dx^i \otimes dx^j,$

where $\Gamma^k_{ij}$ are the Christoffel symbols of the connection. Other equivalent forms for the Hessian are given by

$\operatorname{Hess}(f)(X, Y) = \langle \nabla_X \operatorname{grad} f,\, Y\rangle$ and $\operatorname{Hess}(f)(X, Y) = X(Yf) - df(\nabla_X Y).$

10.7 Use in optimization

Hessian matrices are used in large-scale optimization problems within Newton-type methods because they are the coefficient of the quadratic term of a local Taylor expansion of a function. That is,


$y = f(x + \Delta x) \approx f(x) + \nabla f(x)^\top \Delta x + \tfrac{1}{2}\,\Delta x^\top H(x)\,\Delta x,$

where $\nabla f$ is the gradient $(\partial f/\partial x_1, \dots, \partial f/\partial x_n)$. Computing and storing the full Hessian matrix takes $\Theta(n^2)$ memory, which is infeasible for high-dimensional functions such as the loss functions of neural nets, conditional random fields, and other statistical models with large numbers of parameters. In such situations, quasi-Newton algorithms have been developed that use approximations to the Hessian. One of the most popular quasi-Newton algorithms is BFGS.[4]

Such approximations may use the fact that an optimization algorithm uses the Hessian only as a linear operator $H(v)$, and proceed by first noticing that the Hessian also appears in the local expansion of the gradient:

$\nabla f(x + \Delta x) = \nabla f(x) + H(x)\,\Delta x + O(\|\Delta x\|^2).$

Letting $\Delta x = rv$ for some scalar r, this gives

$H(\Delta x) = H(rv) = rH(v) = \nabla f(x + rv) - \nabla f(x) + O(r^2),$

i.e.,

$H(v) = \frac{1}{r}\left[\nabla f(x + rv) - \nabla f(x)\right] + O(r),$

so if the gradient is already computed, the approximate Hessian can be computed by a linear (in the size of the gradient) number of scalar operations. (While simple to program, this approximation scheme is not numerically stable, since r has to be made small to prevent error due to the O(r) term, while making it too small loses precision in the first term through cancellation.[5])
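A sketch of this finite-difference Hessian-vector approximation (an arbitrary quadratic test function with a known Hessian; the gradient routine below is a stand-in for an exact gradient):

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Central-difference gradient, standing in for an exact gradient routine."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def approx_hvp(f, x, v, r=1e-4):
    """H(x) v  ~  (grad f(x + r v) - grad f(x)) / r."""
    return (grad(f, x + r * v) - grad(f, x)) / r

# Hypothetical quadratic test function f(x) = x^T A x, whose Hessian is A + A^T
A = np.array([[3.0, 1.0], [0.0, 2.0]])
f = lambda x: x @ A @ x
x = np.array([0.7, -0.3])
v = np.array([1.0, 2.0])

print(approx_hvp(f, x, v))   # approximately (A + A^T) v
print((A + A.T) @ v)         # exact Hessian-vector product for comparison
```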

10.8 See also

- The determinant of the Hessian matrix is a covariant; see Invariant of a binary form
- Polarization identity, useful for rapid calculations involving Hessians.
- Jacobian matrix
- Hessian equations
- The Hessian matrix is commonly used for expressing image processing operators in image processing and computer vision (see the Laplacian of Gaussian (LoG) blob detector, the determinant of Hessian (DoH) blob detector and scale space).

10.9 Notes

[1] Binmore, Ken; Davies, Joan (2007). Calculus Concepts and Methods. Cambridge University Press. p. 190. ISBN 9780521775410. OCLC 717598615.

[2] Neudecker, Heinz; Magnus, Jan R. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: John Wiley & Sons. p. 136. ISBN 0-471-91516-5.

[3] Chiang, Alpha C. (1984). Fundamental Methods of Mathematical Economics (Third ed.). McGraw-Hill. p. 386.

[4] Nocedal, Jorge; Wright, Stephen (2000). Numerical Optimization. Springer Verlag. ISBN 978-0387987934.

[5] Pearlmutter, Barak A. "Fast exact multiplication by the Hessian" (PDF). Neural Computation 6 (1): 147–160.

10.10 External links

- Weisstein, Eric W., "Hessian", MathWorld.

Chapter 11

    Hierarchical matrix

In numerical mathematics, hierarchical matrices (H-matrices) [1] [2] [3] are used as data-sparse approximations of non-sparse matrices. While a sparse matrix of dimension n can be represented efficiently in O(n) units of storage by storing only its non-zero entries, a non-sparse matrix would require $O(n^2)$ units of storage, and using this type of matrix for large problems would therefore be prohibitively expensive in terms of storage and computing time. Hierarchical matrices provide an approximation requiring only $O(nk\log(n))$ units of storage, where k is a parameter controlling the accuracy of the approximation. In typical applications, e.g., when discretizing integral equations [4] [5] [6] [7] or solving elliptic partial differential equations, [8] [9] [10] a rank proportional to $\log(1/\epsilon)$ with a small constant is sufficient to ensure an accuracy of $\epsilon$. Compared to many other data-sparse representations of non-sparse matrices, hierarchical matrices offer a major advantage: the results of matrix arithmetic operations like matrix multiplication, factorization or inversion can be approximated in $O(nk^\alpha \log(n)^\beta)$ operations, where $\alpha, \beta \in \{1, 2, 3\}$.[11]

11.1 Basic idea

Hierarchical matrices rely on local low-rank approximations: let $I, J$ be index sets, and let $G \in \mathbb{R}^{I \times J}$ denote the matrix we have to approximate. In many applications (see above), we can find subsets $t \subseteq I$, $s \subseteq J$ such that $G|_{t \times s}$ can be approximated by a rank-k matrix. This approximation can be represented in factorized form $G|_{t \times s} \approx AB^\top$ with factors $A \in \mathbb{R}^{t \times k}$, $B \in \mathbb{R}^{s \times k}$. While the standard representation of the matrix $G|_{t \times s}$ requires $O((\#t)(\#s))$ units of storage, the factorized representation requires only $O(k(\#t + \#s))$ units. If k is not too large, the storage requirements are reduced significantly.

In order to approximate the entire matrix G, it is split into a family of submatrices. Large submatrices are stored in factorized representation, while small submatrices are stored in standard representation in order to improve the efficiency.

Low-rank matrices are closely related to degenerate expansions used in panel clustering and the fast multipole method to approximate integral operators. In this sense, hierarchical matrices can be considered the algebraic counterparts of these techniques.
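As a sketch of the local low-rank idea (not an H-matrix implementation; a single block of a smooth made-up kernel is compressed with a truncated SVD):

```python
import numpy as np

# Hypothetical smooth kernel g(x, y) = 1 / (1 + |x - y|) evaluated on two
# well-separated point sets; such blocks typically have rapidly decaying singular values.
x = np.linspace(0.0, 1.0, 200)
y = np.linspace(3.0, 4.0, 150)
G_block = 1.0 / (1.0 + np.abs(x[:, None] - y[None, :]))

k = 5                                    # prescribed rank
U, s, Vt = np.linalg.svd(G_block, full_matrices=False)
A = U[:, :k] * s[:k]                     # factor A, shape (#t, k)
B = Vt[:k, :].T                          # factor B, shape (#s, k)

approx = A @ B.T
print(np.linalg.norm(G_block - approx) / np.linalg.norm(G_block))  # small relative error
print(G_block.size, A.size + B.size)     # 30000 stored entries vs. 1750 in factorized form
```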

11.2 Application to integral operators

Hierarchical matrices are successfully used to treat integral equations, e.g., the single and double layer potential operators appearing in the boundary element method. A typical operator has the form

$\mathcal{G}[u](x) = \int_\Omega \kappa(x, y)\,u(y)\,dy.$

    The Galerkin method leads to matrix entries of the form

$g_{ij} = \int_\Omega \int_\Omega \kappa(x, y)\,\varphi_i(x)\,\psi_j(y)\,dy\,dx,$



where $(\varphi_i)_{i \in I}$ and $(\psi_j)_{j \in J}$ are families of finite element basis functions. If the kernel function $\kappa$ is sufficiently smooth, we can approximate it by polynomial interpolation to obtain

$\tilde\kappa(x, y) = \sum_{\nu=1}^{k} \kappa(x, \xi_\nu)\,\ell_\nu(y),$

where $(\xi_\nu)_{\nu=1}^{k}$ is the family of interpolation points and $(\ell_\nu)_{\nu=1}^{k}$ is the corresponding family of Lagrange polynomials. Replacing $\kappa$ by $\tilde\kappa$ yields an approximation

$\tilde g_{ij} = \int_\Omega \int_\Omega \tilde\kappa(x, y)\,\varphi_i(x)\,\psi_j(y)\,dy\,dx = \sum_{\nu=1}^{k} \left(\int_\Omega \kappa(x, \xi_\nu)\,\varphi_i(x)\,dx\right) \left(\int_\Omega \ell_\nu(y)\,\psi_j(y)\,dy\right) = \sum_{\nu=1}^{k} a_{i\nu}\, b_{j\nu}$

with the coefficients

$a_{i\nu} = \int_\Omega \kappa(x, \xi_\nu)\,\varphi_i(x)\,dx,$

$b_{j\nu} = \int_\Omega \ell_\nu(y)\,\psi_j(y)\,dy.$

If we choose $t \subseteq I$, $s \subseteq J$ and use the same interpolation points for all $i \in t$, $j \in s$, we obtain $G|_{t \times s} \approx AB^\top$.

Obviously, any other approximation separating the variables x and y, e.g., the multipole expansion, would also allow us to split the double integral into two single integrals and thus arrive at a similar factorized low-rank matrix.

Of particular interest are cross approximation techniques [5] [6] [12] that use only the entries of the original matrix G to construct a low-rank approximation.

11.3 Application to elliptic partial differential equations

Since the solution operator of an elliptic partial differential equation can be expressed as an integral operator involving Green's function, it is not surprising that the inverse of the stiffness matrix arising from the finite element method can be approximated by a hierarchical matrix.

Green's function depends on the shape of the computational domain, therefore it is usually not known. Nevertheless, approximate arithmetic operations can be employed to compute an approximate inverse without knowing the function explicitly.

Surprisingly, it is possible to prove[8][9] that the inverse can be approximated even if the differential operator involves non-smooth coefficients and Green's function is therefore not smooth.

11.4 Arithmetic operations

The most important innovation of the hierarchical matrix method is the development of efficient algorithms for performing (approximate) matrix arithmetic operations on non-sparse matrices, e.g., to compute approximate inverses, LU decompositions and solutions to matrix equations.

The central algorithm is the efficient matrix-matrix multiplication, i.e., the computation of $Z = Z + \alpha XY$ for hierarchical matrices X, Y, Z and a scalar factor $\alpha$. The algorithm requires the submatrices of the hierarchical matrices to be organized in a block tree structure and takes advantage of the properties of factorized low-rank matrices to compute the updated Z in $O(nk^2\log(n)^2)$ operations.

Taking advantage of the block structure, the inverse can be computed by using recursion to compute inverses and Schur complements of diagonal blocks and combining both using the matrix-matrix multiplication. In a similar way, the LU decomposition [13] [14] can be constructed using only recursion and multiplication. Both operations also require $O(nk^2\log(n)^2)$ operations.


11.5 H2-matrices

In order to treat very large problems, the structure of hierarchical matrices can be improved: H2-matrices [15] [16] replace the general low-rank structure of the blocks by a hierarchical representation closely related to the fast multipole method, in order to reduce the storage complexity to $O(nk)$.

In the context of boundary integral operators, replacing the fixed rank k by block-dependent ranks leads to approximations that preserve the rate of convergence of the underlying boundary element method at a complexity of $O(n)$.[17][18]

11.6 Literature

[1] W. Hackbusch, A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices, Computing (1999), 62:89–108

[2] M. Bebendorf, Hierarchical matrices: A means to efficiently solve elliptic boundary value problems, Springer (2008)

[3] W. Hackbusch, Hierarchische Matrizen. Algorithmen und Analysis, Springer (2009)

[4] W. Hackbusch and B. N. Khoromskij, A sparse H-Matrix Arithmetic. Part II: Application to Multi-Dimensional Problems, Computing (2000), 64:21–47

[5] M. Bebendorf, Approximation of boundary element matrices, Num. Math. (2000), 86:565–589

[6] M. Bebendorf and S. Rjasanow, Adaptive low-rank approximation of collocation matrices, Computing (2003), 70:1–24

[7] S. Börm and L. Grasedyck, Hybrid cross approximation of integral operators, Num. Math. (2005), 101:221–249

[8] M. Bebendorf and W. Hackbusch, Existence of H-matrix approximants to the inverse FE-matrix of elliptic operators with L∞-coefficients, Num. Math. (2003), 95:1–28

[9] S. Börm, Approximation of solution operators of elliptic partial differential equations by H- and H2