winitzki. advanced general relativity

193
Advanced General Relativity Lecture notes by Sergei Winitzki DRAFT September 28, 2007, version 1.1

Upload: eddiejam1642

Post on 18-Jan-2016

196 views

Category:

Documents


25 download

TRANSCRIPT

Page 1: Winitzki. Advanced General Relativity

Advanced General Relativity

Lecture notes by Sergei Winitzki

DRAFT September 28, 2007, version 1.1

Page 2: Winitzki. Advanced General Relativity

Lecture notes on topics in advanced General Relativity, including differential geometry, singularity theorems, variationalprinciples, and an introduction to spinors. This text is not yet in a final form.

Copyright c© 2005-2007 by SERGEI WINITZKI. Permission is granted to copy, distribute and/or modify this document un-der the terms of the GNU Free Documentation License (version 1.2 or any later version published by the Free SoftwareFoundation) with an Invariant Section being chapter E, with no Front-Cover Texts and no Back-Cover Texts (see Sec. E.2 forthe conditions). The GFDL permits, among other things, unrestricted verbatim copying of the text. The source files used toprepare a printable version of this text, as well as updates, will be found at the author’s home page.

Page 3: Winitzki. Advanced General Relativity

Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vSuggested literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

1 Calculus in curved space 11.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Index-free notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1.2 Sample practice problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Basic notions: Manifolds and vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2.2 Manifolds and coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2.3 Manifolds: intrinsic picture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2.4 Tangent spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.5 Tangent vectors as short curve segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.2.6 *Tangent space as space of derivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.2.7 Vector fields and flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.2.8 *Tangent bundle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.2.9 Tensor fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.2.10 Commutator of vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.2.11 Connecting vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.3 Lie derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131.3.1 Commutator as Lie derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131.3.2 Lie derivative of tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141.3.3 Geometric interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.4 Calculus of differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151.4.1 Volume as antisymmetric tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151.4.2 Motivation for differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171.4.3 Antisymmetric tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191.4.4 *Oriented volume and n-vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191.4.5 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201.4.6 Differential forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211.4.7 *Canonical decomposition of 1-forms and 2-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221.4.8 The Poincaré lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271.4.9 Integration of forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

1.5 Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271.5.1 Motivation: metric on surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271.5.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281.5.3 Examples of metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291.5.4 Orthonormal frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291.5.5 Correspondence of vectors and covectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301.5.6 The Levi-Civita tensor ε . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

1.6 Affine connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311.6.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311.6.2 General properties of connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321.6.3 The “coordinate derivative” connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321.6.4 Compatibility with the metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331.6.5 Torsion and torsion-freeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331.6.6 Levi-Civita connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341.6.7 Killing vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351.6.8 *Koszul formula and the Lie derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361.6.9 Divergence of a vector field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

1.7 Calculations in index-free notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381.7.1 Abstract index notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381.7.2 Converting expressions into index-free notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391.7.3 Index-free computations of trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401.7.4 Summary of calculation rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

1.8 Curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

i

Page 4: Winitzki. Advanced General Relativity

Contents

1.8.1 Curvature of a connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

1.8.2 Bianchi identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441.8.3 Ricci tensor and scalar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451.8.4 Calculations with the curvature tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

1.9 Geodesic curves, geodesic vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471.9.1 Parallel transport of vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471.9.2 Geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

1.9.3 Geodesics extremize proper length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491.9.4 *Motion under external forces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501.9.5 Deviation of geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

1.10 Example: hypersurface of constant curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511.10.1 Tangent bundle and induced metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511.10.2 Induced connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

1.10.3 Riemann tensor within the hypersurface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2 Geometry of null surfaces 552.1 Null vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

2.1.1 Orthogonal complement spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552.1.2 Divergence of a null vector field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

2.2 Null surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572.2.1 Three-dimensional hypersurfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572.2.2 Integrable vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

2.2.3 Frobenius theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582.2.4 Null surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602.2.5 Examples of null surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

2.2.6 Lightcones are null surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602.2.7 Null functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612.2.8 Null functions generate null geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

2.2.9 Every lightray comes from null functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612.2.10 Conformal invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

2.3 Raychaudhuri equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

2.3.1 Distortion tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622.3.2 Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632.3.3 Introducing Raychaudhuri equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

2.3.4 Shear for timelike congruences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632.3.5 Shear for null congruences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

2.4 Applications of Raychaudhuri equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

2.4.1 Energy conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652.4.2 Focusing of timelike geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662.4.3 Repulsive gravity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

2.4.4 Focusing of null geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672.5 Null tetrad formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

3 Asymptotically flat spacetimes 713.1 Stationary spacetimes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.1.1 Newtonian limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.1.2 Redshift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733.1.3 Conformal Killing vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733.1.4 Gravitational potential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

3.1.5 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 753.2 Conformal infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.2.1 Conformal infinity for Minkowski spacetime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.2.2 Conformal diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783.2.3 How to draw conformal diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

3.3 Asymptotic flatness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

3.4 Conformal radiation fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853.4.1 Scalar field in 1+1 dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863.4.2 Scalar field in 3+1 dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

3.4.3 Electromagnetic field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863.4.4 Gravitational radiation field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 863.4.5 Asymptotic behavior of radiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

ii

Page 5: Winitzki. Advanced General Relativity

Contents

4 Global techniques 894.1 Singularity theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

4.1.1 Singularities and geodesic incompleteness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894.1.2 Past-incompleteness of inflation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 904.1.3 Conjugate points on geodesics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 914.1.4 Second variation of proper length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 924.1.5 Singularity in collapsing or expanding universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 954.1.6 Singularity in a closed universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 964.1.7 Singularity in gravitational collapse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

4.2 Hawking’s area theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 984.3 Holographic principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

5 Variational principle 1015.1 Lagrangian formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.1.1 Classical field theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1015.1.2 Einstein-Hilbert action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025.1.3 Nonlinear f(R) gravity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1045.1.4 Energy-momentum tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1065.1.5 General covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075.1.6 Symmetries and Noether theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

5.2 Hamiltonian formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1115.2.1 Electrodynamics in Hamiltonian formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1125.2.2 Hamiltonian mechanics of constrained systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1135.2.3 Gauss-Codazzi equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1145.2.4 Boundary term in Einstein-Hilbert action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1155.2.5 The Hamiltonian for pure gravity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1175.2.6 Constraints in General Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.3 Quantum cosmology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1205.3.1 Wave function of the universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1205.3.2 Wheeler-DeWitt equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1205.3.3 Interpretation of the wave function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1215.3.4 “Minisuperspace” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

6 Tetrad methods 1256.1 Tetrad formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

6.1.1 Tetrads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1256.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1276.1.3 Hodge duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1276.1.4 Levi-Civita connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1296.1.5 Connection as a set of 1-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1306.1.6 *Solving equations for n-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

6.2 Applications of tetrad formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336.2.1 Computing geodesic equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336.2.2 Determining Killing vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1346.2.3 Curvature as a set of 2-forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1356.2.4 Ricci tensor and Ricci scalar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1366.2.5 Einstein-Hilbert action in tetrads . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

6.3 Connections on vector bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1386.3.1 Vector bundles as generalization of tangent bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1386.3.2 Examples of bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1386.3.3 Covariant derivatives on vector bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1396.3.4 Gauge theories and associated bundles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1396.3.5 Tangent bundle as associated bundle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

7 Spinors 1417.1 Introducing spinors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

7.1.1 Quaternions and rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1417.1.2 The Lorentz group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1437.1.3 Lorentz transformations of spinors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

7.2 Spinor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1457.2.1 The fundamental 2-form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1467.2.2 Relationship of spinors and vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1467.2.3 Simplification of spinorial tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

iii

Page 6: Winitzki. Advanced General Relativity

Contents

7.3 Equations for spinor fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1487.3.1 Spinors in curved spacetime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1487.3.2 Covariant derivative on spinors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1497.3.3 Maxwell equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1497.3.4 Dirac equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

A Elements of Special and General Relativity 151A.1 Special Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

A.1.1 Spacetime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151A.1.2 Motion of bodies in SR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

A.2 Index notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152A.3 Transition to General Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153A.4 Covariant derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

A.4.1 Curved coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154A.4.2 Curved space and induced metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155A.4.3 Covariant derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156A.4.4 Properties of covariant derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157A.4.5 ∗Choice of connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

A.5 Curvature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158A.5.1 Parallel transport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158A.5.2 Riemann tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158A.5.3 ∗Expressing Riemann tensor through Γλαβ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

A.6 Covariant integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160A.6.1 Determinant of the metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160A.6.2 Covariant volume element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160A.6.3 Derivative of the determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160A.6.4 Covariant divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160A.6.5 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

A.7 Einstein’s equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

B How not to learn tensor calculus 163B.1 Tensor algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163B.2 Tensor calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163B.3 Hints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

C Calculations and proofs 167C.1 For Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167C.2 For Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171C.3 For Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

D Comments on literature 175D.1 Comments on Ludvigsen’s General Relativity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

E License for this text 177E.1 Author’s position on commercial publishing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177E.2 GNU Free Documentation License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

E.2.0 Applicability and definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177E.2.1 Verbatim copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178E.2.2 Copying in quantity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178E.2.3 Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

Bibliography 181

Index 183

iv

Page 7: Winitzki. Advanced General Relativity

Contents

Preface

This book is a revised version of lecture notes for the course“Topics in AdvancedGeneral Relativity” taught by the authorin the fall semester of 2005/06 at Ludwig-Maximilians Uni-versity, Munich. The audience included advanced undergrad-uates and beginning graduate students. The choice of topicsis intended to complement an introductory course in generalrelativity, which is assumed as a prerequisite. The present textcovers differential geometry on manifolds, asymptotically flatspacetimes, singularity theorems, the use of variational prin-ciples in GR, tetrad formalism, and spinor calculus. My goalis to provide a readable introduction to concepts that are cov-ered in existing advanced texts, such as the classic books [12]and [36], and also explain a small number of newer resultsthat are sufficiently tightly connected with the main topics ofthe lectures.I cannot give an overview of the history of the subject, andthis book is intended a textbook rather than a research mono-graph. Correspondingly, there are few references to originalwork. I feel that this approach is justified because all the ma-terial (with a small number of exceptions) is well-established.Instead, the bibliography contains1 all the sources I consultedwhile preparing this book. After digesting the main ideas ex-plained here, readerswill be able to understand research-levelpapers and monographs listed in the bibliography.I would like to make some comments regarding the organi-zation of the text. I emphasize visual and conceptual explana-tions; however, I derive all the principal results in full detail.Definitions of useful mathematical notions are motivated andillustrated by examples. New terminology is shown in bold-face within or near a definition; the italics type is reserved foremphasis. The symbol marks the end of a derivation, a re-mark, or an example. This mark is intended to aid the readerin skipping unwanted material.The exposition in many contemporary textbooks is inter-spersed with unproved statements that are labeled “exer-cises” but actually provide important additional informationor even constitute an integral part of the development of thematerial. I call such statements pseudo-exercises since theyare rarely as straightforward as genuine practice problemswould be. It seems to me that the main reason for the exis-tence of pseudo-exercises is the authors’ desire to avoid clut-tering the text with derivations that may be omitted during afirst reading. I feel that readers at an advanced level are in-sufficiently motivated and/or lack the time needed to solvepseudo-exercises. So I decided that this book will not containany pseudo-exercises; every relevant statement2 comes with aderivation.3 However, many derivations and calculations arerelegated to Appendix C. I also included a small number ofshort Practice problems intended as straightforward exercisesfor the readers. (Solutions to those are not given.)A comment on the status of this text is in order. The book

1Will contain when I finish the current revision of the book.2An irrelevant statement is something that seems generally interesting buthas no direct relevance to the main subjects of the book. For example,the statement “Bianchi identities can be generalized to connections withnonzero torsion” is irrelevant since I never consider such connections inthis book. Irrelevant statements, sometimes accompanied by referencesto additional literature, are confined to footnotes. The reader may safelyignore all the footnotes.

3At present, some pseudo-exercises still remain, but almost all of them(except a very few) have full solutions. Pseudo-exercises are graduallyconverted to statements as the current major revision of the book pro-gresses.

is “complete” in the sense that there are no gaps in the cal-culations. However, the text is presently being heavily re-vised and should be in a more satisfactory form by the end of2007. This preliminary version is now made freely availablebecause I feel that it is useful even in the present, unfinishedstate. (Warning: the present text is a draft and may containmistakes! If you must have a set of lecture notes in a finishedform, please do not read this text until it is marked as a “re-lease” version. For example, it is presently not clear whichsign convention for the Ricci tensor Rµν I will finally adopt;because of this, signs might be inconsistent in some equationsinvolving Rµν . Some comments regarding possible improve-ments are scattered in boldface throughout the text.)This text may be freely distributed according to theGNU Free Documentation License (see Sec. E.2 orwww.gnu.org/copyleft/fdl.html). Comments and sugges-tions are welcome. The entire text was prepared by the authoron computers running GNU/Linux, using the free documentpreparation systems LYX and LATEX.

Sergei Winitzki, 2007

Suggested literature

There exist many textbooks on General Relativity written atevery level. In the following table, I list some of the moreadvanced textbooks covering the main topics of the presenttext. For an introduction to GR, see e.g. the books [29, 10].

[12] [21] [33] [36] [32] [19] [28]

Difficulty* (1-5): 5 2 4 4 3 2 1Diff. geometry + + + + + + +Index-free +*

Action principle + + + +Asympt. flatness + + + +* +*Spinors in GR +* + +Singularity thms. + +* +* +*Hamiltonian GR + + +Tetrad formalism + + +

*Please note:A book’s level of “difficulty” is my subjective estimate ofhow hard it would be for a well-prepared student to learn thecontents of the book. (Higher level is harder.)The book [21] discusses only some of the more elementaryaspects of spinor calculus and singularity theorems.The book [36] refers the reader to [12] for derivations ofsome key technical statements relevant to singularity theo-rems.The book [32] treats asymptotic flatness exclusively in thespinor language.The book [19] uses exclusively the abstract index notation;other books use non-abstract index notation (as evidenced bythe presence of the non-tensorial Christoffel symbols Γλµν ). Italso discusses only the basic facts relevant to asymptotic flat-ness.The book [28] discusses only some basic aspects of singu-larity theorems relevant to black holes.The present text aims to have the difficulty level 2. Rightnow the exposition is somewhat lacking in the coverage ofasymptotic flatness, and there are very few applications ofspinors and of the tetrad formalism.

v

Page 8: Winitzki. Advanced General Relativity
Page 9: Winitzki. Advanced General Relativity

1 Calculus in curved space

In this chapter I attempt to motivate and explain the mainideas and suggest images that illustrate the mathematical ap-paratus of differential geometry. The actual explanations startin Sec. 1.2.2. Additional literature on differential geometryfrom a mathematical viewpoint is [22, 11]. A more modernmathematical textbook (which I did not read) is [17].

1.1 Summary

General Relativity (GR) is a currently well-established phys-ical theory of gravitation whose mathematical apparatus isbased on differential geometry (more specifically, tensor cal-culus in curved spaces). In this chapter, I introduce the mainconcepts of this calculus: manifolds, tangent spaces, covariantderivatives, and curvature. I will explain and use the index-free notation. You (the reader) will perhaps appreciate thismaterial more fully if you are already somewhat familiar withtensor calculus and GR, at least to the extent covered in Ap-pendix A.The pragmatic side of this chapter is to develop sufficientlypowerful formalism for index-free calculations, which will beheavily used throughout this book. You may skip this chapteron first reading iff1 you are familiar with the following con-cepts and notations.

M smooth manifoldRn n-dimensional Euclidean spaceTpM, T ∗

pM (co)tangent spaces toM at point pTM tangent bundle of a manifoldMv ∈ TpM tangent vector at point pv f derivative of function f w.r.t. vector field v

∂t vector field along a coordinate axis t[a,b] ≡ Lab commutator of two vector fieldsLvT Lie derivative of tensor T w.r.t. vector vu⊗ v tensor product of vectors

ω v 1-form ω applied to vector v∫ω, dω integral / exterior differential of n-form ω

ω1 ∧ ω2 exterior product of n-forms ω1 and ω2

ιvω insertion of vector v into n-form ω

g(a,b) scalar product of vectors using a metric ggv 1-form corresp. to vector v through metric gg−1ω vector corresp. to 1-form ω through metric gε volume n-form, Levi-Civita tensor∇ab covariant derivative of vector b w.r.t. vector adivu divergence of vector field u

R(a,b)c curvature tensor (Riemann tensor)R(a,b, c,d) “fully covariant” Riemann tensorRic(a,b) Ricci tensor

Concepts covered in this chapter also include: connectingvector fields; definition of the metric-compatible (Levi-Civita)connection on a manifold with a metric; torsion and torsion-freeness; properties of the Riemann tensor; geodesics andgeodesic deviation; Riemann tensor for manifolds of constantcurvature.

1If and only if.

1.1.1 Index-free notation

One approach to differential geometry is to treat vectors andtensors as multi-indexed arrays of numbers (called compo-nents), such as Tµν or R

λµαβ , which depend on the choice

of the coordinate system xµ. One manipulates expressionswith many indices, such as gµαgλµ,νg

νβ , and checks at the endof the calculations that the results are covariant (i.e. transformcorrectly under changes of coordinates). This approach waspioneered by Einstein and is still the way tensor calculus ispresented in the current physical literature. I assume that youwere already exposed to an introductory course of GR formu-lated using coordinates and components. If you are not sure,please scan Sec. A.2–A.7 in Appendix A for unfamiliar equa-tions.

In the mathematical literature, the coordinate-free approachand the index-free notation are predominant. For example,the scalar product of vectors is denoted by g(a,b) or just 〈a,b〉instead of gµνa

µbν or aµbµ. The preference for the coordinate-

free approach is, in my opinion, due to the different characterof the tasks typically undertaken by mathematicians and byphysicists. Mathematicians study abstract relationships be-tween objects and are more interested in finding objects thathave rich properties and whose interconnections explain ear-lier results on a more general level and yield new concepts.On the other hand, physicists are mostly interested in makingspecific computations and deriving equations that will even-tually give numerical values for quantities, even if the meth-ods of computation are not particularly general, elegant, orilluminating.

In the coordinate-free approach, one avoids introducinga particular coordinate system xµ, and one does not talkabout components of tensors. Instead, one uses geometricallydefined operations (such as, the covariant derivative or the Liederivative) and the algebraic properties of tensors. In a calcu-lation where a particular tensor is sought, especially when nosymmetries are present, it may be helpful to introduce a suit-able coordinate system and to determine the components ofthat tensor. However, often one needs to perform a “general”calculation that does not depend on a particular tensor (e.g.,an investigation of a general property of the curvature ten-sor). In most cases, such “general” calculations are easier toperform in the coordinate-free approach because coordinatesystems and components offer no computational or concep-tual advantages. The experience of writing this book showsthat the coordinate-free approach is well suited for applica-tions such as singularity theorems or the Hamiltonian formu-lation of GR.

It is natural2 to use the index-free notation when one adoptsthe coordinate-free approach. Here is an example of a cal-culation in the index-free notation. Statement: If a vector u

is geodesic then g(u,u) is constant along the flow lines of u.

2Although, strictly speaking, not necessary. Instead, one could use Penrose’s“abstract index notation,” where aµ means a vector with an abstract labelµ, rather than an array of components. See Sec. 1.7.1 for more details. Onebook that consistently uses the coordinate-free approach together with theabstract index notation is [19].

1

Page 10: Winitzki. Advanced General Relativity

1 Calculus in curved space

Proof : The derivative of g(u,u) along the flow of u is

Lug(u,u) = ∇ug(u,u) = 2g(∇uu,u) = 0

since ∇uu = 0 for geodesic vector fields. For comparison, thesame calculation in the index notation looks like this:

d

dτ(gµνu

µuν) = uλ (gµνuµuν);λ = 2gµνu

λuµ;λuν = 0

since uλuµ;λ = 0.The index notation is frequently very useful and even un-avoidable in some calculations, especially when one needs tomanipulate traces of high-rank tensors, as if often the case inGR. Nevertheless, the index-free notation has, in my view,significant advantages in the study of more mathematicallyadvanced topics of field theory and GR. The index-free nota-tion is concise, emphasizes the geometrical meaning of everyquantity, and completely excludes non-tensorial quantitiesfrom consideration; for instance, the non-tensorial Christoffelsymbols cannot even be defined. The index-free approach iswell-suited for studying the general properties of various ge-ometrical objects used in GR. Many general derivations (suchas computing the second variation of the geodesic length) canbe performed faster and more transparently in the index-freenotation.In this text, I adopt the coordinate-free approach; this leavesthe choice of an index-free notation or the abstract index no-tation. I employ index-free calculations when it is practicallymore convenient than the abstract index notation. The index-free approach turns out to be almost always more convenient,except for certain cases, such as the variation of the Einstein-Hilbert action with respect to the metric, where index-free cal-culations become cumbersome (although still possible). Inthose cases I use the abstract index notation. Section 1.7presents more discussion and shows how to convert expres-sions between the index-free and the index notations. In thisbook, I also introduce an admittedly nonstandard but unam-biguous and adequate index-free notation for the trace of atensor (Sec. 1.7.3).To make the transition to the index-free approach easier forreaders accustomed to the index notation, I will sometimesmention the index representation of tensor quantities usedthroughout the text. The formal correspondence between theindex-free and the index notations is summarized in the fol-lowing table.

v vα∂α ≡ vα ∂∂xα

v f vα∂αf ≡ vαf,α[a,b] aµ∂µb

ν − bµ∂µaν

∇ab aν∇νbµ ≡ aνbµ;ν

div u ∇αuα ≡ uα;α

u⊗ v uαvβ

ω ωαdxα

ω v ωαvα

dω (ωα,β − ωβ,α)dxαdxβ

g(a,b) gµνaµbν ≡ aµb

µ

gv gαβvβ ≡ vα

g−1ω gαβωβ ≡ ωα

Remark on the roman “d”: Recently, certain science pub-lishers started enforcing the rule of using a roman “d” (for

instance, writing d2x(τ)/dτ2 or∫d3x) rather than an italic

“d,” as was the common practice during the previous two cen-turies. I decided that in the present text, the roman symbol“d” will be used to denote the exterior differential in the cal-culus of differential forms, while the italic “d” will appear inother situations when the traditional usage involves an “in-finitesimally small” quantity. My intention is to use a roman“d” for rigorously defined objects, such as differential formsdx, and to contrast them with “infinitesimals” dx, which are

only heuristic tools that help the intuition. I write integrals,

e.g.∫d3x, with a roman “d” when the integration of a differ-

ential form is implied (note that d3x is a shorthand notationfor a 3-form dx∧dy∧dz). However, I write an italic “d” in in-tegrals where a simple one-dimensional integration

∫f(x)dx

is actually used. I also write the “metric interval,”

ds2 = dt2 − dx2 − dy2 − dz2,

with an italic “d” because this notation is essentially a jar-gon that bears only marginally on the calculus of differentialforms; for instance, “ds” cannot be understood as a 1-form inthis notation.

When using ordinary derivatives, I write d/dτ and d2/dτ2

using the italic d, for the following reasons. The operatorsd/dτ and d2/dτ2 are not essentially different from the partialderivative operators ∂/∂x or ∂2/∂x2. Traditionally, one writes∂f/∂x when f depends not only on x but also on other vari-ables y, z, ... that are held constant while f(x, y, z, ...) is dif-ferentiated with respect to x. On the other hand, one writesdf/dx to emphasize that f(x) is a function of only one variablex, or that f is an expression that must be reduced, throughsuitable substitutions, to a function of x only. This notationis widely used to avoid cluttering the equations with explicitsubstitutions. For instance, the Euler-Lagrange equation ofmechanics is written as

∂L

∂x− d

dt

∂L

∂x= 0,

implying that ∂L/∂xmust be expressed as a function of t be-fore evaluating d/dt. However, the differential operators ∂/∂xand d/dx are not related to the exterior differential d. For thisreason, I find it confusing to write “d/dx” with a roman “d”when “d” means the exterior differential. So I use an italic “d”in “d/dx.”

A related recent trend is to write a roman “e” for the base ofthe natural logarithm and a roman “i” for the imaginary unit(i2 = −1). The reason for switching to roman“d”s, “e”s, and“i”s is to avoid confusion when other quantities are denotedby the letters d, e, i. In this book, my intention is to make thenotation unambiguous. I use a roman “i” for the imaginaryunit because the letter i is frequently used as an index, e.g. xi.However, I use an italic e = 2.718... to avoid confusion with aboldface roman e frequently used for basis vectors ea.

1.1.2 Sample practice problems

You should be able to solve these problems if you know thematerial of Chapter 1.

Null vectors. Prove that two non-parallel null vectors l,ncannot be orthogonal. In other words, g(l,n) = 0 is equivalentto l = λnwhen g(l, l) = g(n,n) = 0. Here g is a metric in four-dimensional space with the Lorentzian signature (+,−,−,−).

Extremum of a functional on curves. Consider a functional

F [γ] =

∫ x2

x1

f(g(γ, γ))dτ,

where f(x) is a given smooth, nonconstant function and γ

is the tangent vector to the curve γ(τ). Derive the Euler-Lagrange equations describing the extremum of the func-tional F [γ].

2

Page 11: Winitzki. Advanced General Relativity

1.2 Basic notions: Manifolds and vector fields

Symmetric geodesics. Consider a local coordinate systemx, y, z in a three-dimensional space. Suppose that the metricg has the form

g = φ(x)dx2 +A(y, z)dy2 + C(y, z)dz2,

where φ(x) > 0, A(y, z) > 0, C(y, z) > 0 are smooth func-tions that depend on x, y, z as indicated. Show that lines y =const, z = const (i.e. the coordinate lines of x) are geodesics.Obtain an explicit formula for a Killing vector k that the met-ric g admits. Is the vector field k geodesic?

Extrinsic curvature. A hypersurface with normal vectorfield n is embedded in a curved space with metric g. The ex-trinsic curvature K(x,y) of the hypersurface is defined as abilinear form

K(x,y) = h(∇xn,y),

where h is the inducedmetric on the hypersurface and x,y arearbitrary vectors tangent to the hypersurface. Starting fromthe definition and using g(n,n) = 1, show thatK(x,y) is sym-metric, K(x,y) = K(y,x). Compute the extrinsic curvaturetensor explicitly for the surface x2 + y2 − z2 = −1 definedin 3-dimensional Euclidean space with Cartesian coordinatesx, y, z at the point 0, 0, 1, using x, y as the local coordi-nates on the surface.

1.2 Basic notions: Manifolds andvector fields

In nongravitational physics, one describes events in four-dimensional Minkowski spacetime R4 with familiar coordi-nates t, x, y, z. Physical laws are expressed as partial differ-ential equations for various fields in spacetime. General Rel-ativity replaces Minkowski spacetime by a curved spacetime,and partial derivatives with respect to t, x, y, z are replacedby more complicated differential operations. The mathemat-ical concept of a “manifold” and accompanying notions areused to formalize the idea of a “curved space” and the waysof writing differential equations for fields in such spaces.In the following subsection 1.2.1, I will list concise defini-

tions of manifolds, tangent spaces, vector fields, etc. If youare satisfied by or already familiar with these definitions, youmay skip the entire Sec. 1.2 and continue reading from Sec. 1.3.Otherwise, you may find it helpful to read the following sec-tions (from Sec. 1.2.2 onwards) where I explain those defini-tions more visually.

1.2.1 Definitions

This section concisely summarizes the definitions and prop-erties of manifolds and tangent vectors. If you prefer visualexplanations to formal definitions, you may ignore this sec-tion and start reading from Sec. 1.2.2.A manifold M is a set of points (which I informally call a“space”) that locally looks like a subset of Rn. In other words,every point p ∈ M has a small neighborhood around it, whichis one-to-one mapped into a subset of R

n. This mapping iscalled a chart and is equivalent to specifying a set of scalarfunctions, xµ(p), defined in an open neighborhood U(p0) ofsome point p0 and giving the point x

µ ∈ Rn corresponding topoints p ∈ M. These functions must uniquely identify eachpoint p, so that for any two points p, p′ from the neighbor-hood U(p0)we have x

µ(p) = xµ(p′) iff p = p′. Also, the image

of the neighborhood U(p0) under the map p 7→ xµ(p) ∈ Rn

must be an open set in Rn. Under these conditions, the func-tions xµ(p) constitute a local coordinate system defined onan open subdomain U(p0) ⊂ M. Of course, different subdo-mains U(p0), U(p1), etc., may have different local coordinatesystems. Whenever there is an intersection of two domains ofMwhere local coordinate systems are defined, we get a coor-dinate changemap R

n → M → Rn defined in a subset of Rn.

A manifold is smooth (differentiable) if all these coordinatechange maps are smooth (differentiable) in the ordinary sense(as functions R

n → Rn).

A particular manifold may be specified either by an explicitembedding into a larger space, or by giving a complete set ofcharts and re-charting maps, specifying how the charts are tobe glued together.

A typical example of a manifold is the surface of a spherein 3-dimensional space, usually called the two-dimensionalsphere (or 2-sphere) and denoted S2. The 2-sphere S2 is lo-cally like R2, but globally is not (see Statement 1.2.2.1). It isuseful to visualize a manifold as a “surface” embedded in alarger space, and to think that “we” are “flat” observers liv-ing entirely within the surface. So “we” cannot see the largerspace and must describe the “surface” exclusively through itsintrinsic properties.

As an example of a manifold described in a different way,consider the set of all 2×2 unitary complex matrices with unitdeterminant. These matrices form a group denoted SU(2). (AmatrixA is unitary ifA†A = 1, whereA† is the Hermitian con-jugate, i.e. the transposed and complex conjugate matrix.) It isperhaps not immediately clear how to introduce coordinatesin this set of matrices. It can be shown (see Statement 7.1.1.2on page 143) that all matrices A ∈ SU(2) can be describedexplicitly as

A =

(a0 − ia3 −ia1 − a2

−ia1 + a2 a0 + ia3

)

,

where a0, ..., a3 are real numbers satisfying∑3

j=0 a2j = 1.

Therefore, SU(2) is equivalent (as a manifold) to a three-dimensional sphere S3.

A smooth, one-to-one map from one manifold to anotheris called a diffeomorphism. Two manifolds are called dif-feomorphic if there exists a diffeomorphism between them.Diffeomorphic manifolds are “topologically equivalent.” Forexample, a sphere can be diffeomorphically mapped onto thesurface of a cube, but not onto a torus. Another example wehave just seen is an explicit diffeomorphism (specified usingcoordinates) between SU(2) and S3.

We will always consider smooth finite-dimensional mani-foldsM and smooth functions f(p), p ∈ M, on these mani-folds; so we omit the word “smooth” everywhere.

In our notation, p ∈ M is a point on a manifold, scalar func-tions (scalar fields) are functions f : M → R, so that f(p) isa number, and boldface letters v,w, ... are vectors and vectorfields (defined below). We use Greek letters α, β, ..., λ, µ, ... fortensor indices and also for numbers (scalars).

A curve on a manifold is a set of points γ(τ) parametrizedby a real number τ , i.e. a smooth map γ : R → M, sometimesdefined only on a subset of R.

A derivationD at a point p ∈ M is a map from scalar func-

3

Page 12: Winitzki. Advanced General Relativity

1 Calculus in curved space

tions to numbers, satisfying the “derivative-like” properties

D(f + g) = Df +Dg, [linearity]

D(fg) = D(f)g + fD(g), [Leibnitz rule]

Df = 0 if f = const around p.

All the derivations at a point p form a vector space called thetangent space TpM at the point p. Vectors from the tangentspace are denoted by boldface letters, e.g. u ∈ TpM, and theiraction on functions is denoted by u f . The number u fis interpreted as the derivative of the function f at the pointp in the direction u. Using a local coordinate system xα,α = 1, ..., n, one can prove that the tangent space is an n-dimensional vector space. An explicit basis in this space isgiven by the n coordinate derivatives eα ≡ ∂/∂xα. The com-ponents of a vector v in this basis can be found as vα = v xα,and then the vector v is decomposed as

v =

n∑

α=1

vαeα =

n∑

α=1

vα∂

∂xα.

The union of all tangent spaces TpM for every p ∈ M iscalled the tangent bundle ofM and is denoted TM. The tan-gent bundle is itself a manifold with the same charts asM butwith twice the dimension ofM. This is so because every chartwill map a patch ofM into a patch of Rn and tangent vectorsmap into vectors in Rn, thus the patch of the tangent bundlewill map into a patch of Rn × Rn.A vector field v(p) on a manifold M is a tangent vector

v ∈ TpM chosen at every point p ∈ M, i.e. a map v : M →TM such that the image v(p) of every point p belongs to thetangent space TpM at the same point p.A covector is a linear map ω from vectors v into numbers.The space of covectors to tangent vectors at a point p is de-noted by T ∗

pM and is called the cotangent space at point p. Acovector ω(p) ∈ T ∗

pM chosen at every point p ∈ M is calleda 1-form. The pointwise action of a 1-form ω on a vector fieldv is denoted by ω v and is a scalar function onM. A tensorfield on a manifold is defined similarly: it is a tensor based onthe tangent space at a point p.The commutator of vector fields u and v is the vector field

[u,v] defined by its action on functions f by

[u,v] f ≡ u (v f) − v (u f) .

(It can be shown that [u,v] also satisfies the properties of avector field; see Statement 1.2.10.1.) One says that the vec-tor fields commute if their commutator is equal to zero. Forexample, coordinate basis vector fields ∂x and ∂y commutebecause ∂x∂yf = ∂y∂xf for smooth functions f(x, y). On theother hand, vector fields x∂y and ∂x do not commute because

x∂y (∂xf) − ∂x (x∂yf) = −∂yf 6= 0.

A vector field c is called a connecting vector for the vectorfield v if [c,v] = 0. (Since [c,v] = −[v, c], it follows that v isalso a connecting vector for c.) All the coordinate basis vectors∂/∂xα are connecting vectors for each other. Conversely, ifone is given a set of n linearly independent vectors e1, ..., enall of which are connecting for each other within a domain,then there exists a local coordinate system xα in that domainsuch that eα is equal to the α-th coordinate basis vector ∂/∂x

α.The flow lines of e1, ..., en are the “coordinate grid” of thatcoordinate system.

xy

z

Figure 1.1: A visual example of a manifold is a curved surfaceembedded in the flat 3-dimensional space.

1.2.2 Manifolds and coordinates

If you prefer formal definitions to visual explanations, or ifthe definitions in Sec. 1.2.1 are entirely familiar, you may skipSec. 1.2.2 to 1.2.11.

One of the main assumptions underlying GR is that the ge-ometry of our universe is curved rather than flat. In a curvedgeometry, initially parallel lines may move closer or furtherapart, and the sum of the angles of some triangles may belarger or smaller than π, depending on the triangle and its lo-cation in space. Let us examine the notion of “curved space,”starting from an intuitive picture.

A simple visual image of a curved space is a 2-dimensionalsurface embedded in the 3-dimensional Euclidean (flat) space,as shown in Fig. 1.1. Since the surface is smooth, every suffi-ciently small patch of the surface looks like a small piece of aflat 2-dimensional space R2. However, different such patchesare glued together in a nontrivial way, so that the entire sur-face has a shape we perceive as “curved.”

According to GR, the geometry of the universe is similar tothat of a curved surface. In a sufficiently small neighborhoodof each point, the universe looks like the flat four-dimensionalMinkowski spacetime R4, studied in Special Relativity. How-ever, the global geometry of the universe (i.e. the geometry ob-served at sufficiently large distances and time intervals) is dif-ferent from the Minkowski geometry.

This motivates us to consider the idea of a “general kind ofcurved space,” that is, a space which is only locally similar toR

4 but globally very different from R4. The mathematical ob-

ject that formalizes this intuitive idea is called a manifold. Astandard, formal definition of a manifold is given in Sec. 1.2.1.I will now motivate and explain that definition, using a two-dimensional surface as a main example.

To formalize the idea that a two-dimensional surfaceM “lo-cally looks like” R2, one says that for each point p inM thereis a small patch ofM around p which is mapped bijectively(one-to-one) into some patch of R2. A patch of R2 can be de-scribed by coordinates x1, x2 that vary in some finite ranges.Therefore, there is a one-to-one map assigning values x1 andx2 to points ofM in a small patch (see Fig. 1.2). This map iscalled a local coordinate system or a chart. We can now for-mulate a first rough definition: A manifold is a setM suchthat there is a local coordinate system around every point ofM. A manifold can be visualized as a surface that is smoothnear each point and does not have “corners” or ”spikes.” Anexample of a non-manifold is a sphere with a line connectedto it. ()

A typical example of a smooth manifold is the unit 2-sphere

4

Page 13: Winitzki. Advanced General Relativity

1.2 Basic notions: Manifolds and vector fields

xy

z

x1

x2

Figure 1.2: A local coordinate system x1, x2 covers a smallpatch of a manifold.

specified by the equation

x2 + y2 + z2 = 1,

where x, y, z are the ordinary Cartesian coordinates in a three-dimensional Euclidean space. The 2-sphere is usually de-noted S2. To establish that S2 is a manifold, it is sufficientto find a local coordinate system near each point. This isstraightforward; for instance, it is easy to see that the firsttwo Cartesian coordinates, x, y, can be used as local coor-dinates x1, x2 in a neighborhood of the north pole 0, 0, 1,even in the entire northern hemisphere (but not beyond theequator). The coordinates x, y cannot be used near the pointx, y, z = 0, 1, 0 because, e.g., a part of the straight linesegment −1 < x < 1, y = 0.99 in R2 that maps onto S2

becomes a small circle in S2. So the coordinates x, y do notprovide a one-to-one map from any subset of R2 into S2 nearthe point 0, 1, 0. However, x, z can be used as a local co-ordinate system near 0, 1, 0.As I attempted to illustrate in Fig. 1.2, the relationship be-tween the local coordinates x1, x2 and the Cartesian coor-dinates x, y, zmay be complicated. However, in some casesone can simply select a subset of the Cartesian coordinates (forinstance, x, y) and obtain a local coordinate system coveringsome part of M. The only requirement on the local coordi-nates is that they should provide a one-to-one correspondencebetween a patch U ⊂ M and a patch V ⊂ R2. For instance,any straight line segment situated within V must correspondto a non-self-intersecting line segment inM.It is important to note that there may be no global coordi-nate system covering the entire manifoldM. This is the case,for instance, if the manifoldM is a closed surface, such as asphere (see Statement 1.2.2.1 below), or a donut-shaped sur-face (a torus). If a global coordinate system exists, M mustbe an infinitely extended (perhaps curved) plane-like surfacethat never “closes on itself,” that is, a surface which is topolog-ically the same as R2.It is easy to see that the global geometry of the sphere S2

is different from that of R2, even though small patches of S2

look like small patches of R2.

Statement 1.2.2.1: There exists no global coordinate system(chart) in S2 that would map S2 smoothly and one-to-one intoR2. (Proof on page 167.)

The often used spherical coordinates θ, φ, defined by

x = r cos θ cosφ, y = r cos θ sinφ, z = r sin θ,

do not constitute a global coordinate system on S2 becausethe mapping S2 → θ, φ is not one-to-one at the poles of

the sphere, θ = ±π/2. Nevertheless, this flaw is relatively in-significant since only two points (the poles) are problematic.Such points where the chart map is not one-to-one are calledcoordinate singularities of a local coordinate system. Despitethe presence of coordinate singularities, the spherical coordi-nates constitute an “almost global” coordinate system, whichis very convenient for many calculations. The singular pointsθ = ±π/2 sometimes need to be handled specially.

Practice problem: Consider the surface in R3 specified bythe equation x2 + y2 − z2 = 1, where x, y, z are the stan-dard coordinates. Prove that this surface is a two-dimensionalmanifold by showing how to find local coordinates x1, x2 inthe neighborhood of an arbitrary point x, y, z on the surface.Is there a global coordinate system for this manifold? Answer:No.

Although our considerations so far concerned two-dimensional surfaces embedded in a three-dimensional Eu-clidean space, it is clear that completely analogous argumentsapply to higher-dimensional “hypersurfaces” embedded intoan even higher-dimensional Euclidean space. This pictureleads to the general definition of an n-dimensionalmanifold:a set of pointsM such that each point p ∈ M belongs to achart that maps it into a subset of Rn.

1.2.3 Manifolds: intrinsic picture

A very important idea is to imagine that there are “flat” ob-servers living entirely within the surfaceM. These observerscannot see the larger space and describe the surface M ex-clusively through the relationships between the points ofM,without referring to any embedding into a larger space. Sucha description is called intrinsic. The main fact is that the in-trinsic description is sufficient for all purposes relevant to theinternal geometry of the manifold. One may certainly intro-duce an embedding ofM into a larger space3 — either as away to describe a particular manifold M more concisely oras a guide for the intuition,— but the final results must be in-dependent of any such embedding. In any case, it turns outthat the intrinsic description of the geometry ofM is more ele-gant than a description in terms of an embedding into a largerspace. Another argument in favor of the intrinsic descriptionis that according to General Relativity, we are observers livingin a four-dimensional curved manifold (the spacetime), andwe do not directly see any higher-dimensional space in whichour spacetime is embedded as a “hypersurface.” Therefore,we need a way to describe the properties of our spacetimein terms of intrinsic measurements, i.e. measurements per-formed entirely within the spacetime.An example of an intrinsic property is smoothness. A func-tion is smooth (of class C∞) if it is infinitely many times dif-ferentiable. In the embedding picture, one can visualize asmooth n-dimensional manifoldM as an n-dimensional hy-persurface smoothly embedded into an N -dimensional Eu-clidean space with coordinates Xa, a = 1, ..., N . Using afixed smooth embedding of a manifoldM into RN , one couldeasily define smoothness of functions and curves onM. Thesedefinitions may look like this. A smooth scalar function isa map f : M → R which is a smooth function of the coor-dinates Xa. A smooth curve γ(τ) on a manifold M is a

3It is shown in differential geometry (Whitney’s embedding theorem [37])that any manifold can be embedded in a sufficiently high-dimensionalEuclidean space.

5

Page 14: Winitzki. Advanced General Relativity

1 Calculus in curved space

map γ : R → M such that the curve coordinates Xa(τ) in theembedding space are smooth functions of τ . However, thesedefinitions depend on the embedding ofM into RN . This de-pendence can be avoided and equivalent definitions can begiven intrinsically, using local charts onM that map patchesofM into patches of Rn. Here is how one can define a smoothscalar function. Within one chart, a function f : M → R be-comes a function R

n → R, defined on a patch of Rn. Thus,

smoothness of f can be defined as smoothness of each restric-tion of f to patches of Rn. In brief, a function f is smooth inM if it is smooth in every chart. Similarly, a curve γ : R → Mis smooth if its restriction γ : R → Rn to each chart is smooth.If we define smoothness of functions through their restric-tions to charts, we need to consider what happens within anintersection of two different charts. Could it happen that afunction f is smooth according to one chart but not smoothaccording to the other chart? Let U1,U2 ⊂ M be two differ-ent chart domains and φ1 : U1 → Rn, φ2 : U2 → Rn thecorresponding local coordinate maps. Since φ1 and φ2 areone-to-one maps, we may define a map φ1φ

−12 : Rn → Rn.

(This map is defined in a small patch of Rn which is the imageof the intersection of U1 and U2 under φ2.) We may call thismap a coordinate change map. If every coordinate changemap (for every pair of intersecting charts) is a smooth mapRn → Rn in the usual sense, i.e. defined by smooth func-tions in the standard coordinates x1, ..., xn, then all chartswill agree about the smoothness or otherwise of a functionf : M → R or a curve γ : R → M. Therefore, one adds tothe definition of a smooth manifold the requirement that ev-ery coordinate change map φ1φ

−12 is smooth, for every pair

of intersecting charts. In this way, smoothness of the mani-fold itself is adequately described by the requirement that thecoordinate change maps be smooth.If there exists a smooth one-to-one map between two entiremanifolds, these manifolds are called diffeomorphic. For ex-ample, the surface z = x2 + y2 is diffeomorphic to R2 becausethere is a one-to-one map between points x, y, z on the sur-face and the plane x, y. Thus, just one chart (with x, y asthe coordinates) is sufficient to map this manifold into R2. Onthe other hand, a 2-sphere is not diffeomorphic to R2 (State-ment 1.2.2.1).

1.2.4 Tangent spaces

Vectors are used in ordinary physics to specify “directedmag-nitudes,” such as velocities and forces. Therefore, it is impor-tant to develop the concept of a “directed magnitude” in acurved space.To aid the intuition, let us visualize a manifoldM as a sur-face embedded in a larger space R3. An immediate exampleof a “directed magnitude” comes from considering the ve-locity of a moving point. Hence, let us imagine a point thatmoves along a curve γ(τ)within a manifold, where γ is a mapR → M. It is clear the velocity of the moving point is a vec-tor (in the larger space) that is always tangent to the surfaceM.Due to this picture, “directed magnitudes” defined within amanifold are sometimes called tangent vectors. (Most of thetime, we will call them simply vectors.)At every point p ∈ M, there is a tangent plane to the sur-face. This tangent plane is a vector subspace of R3. Consid-ered as a two-dimensional vector space, it is denoted TpMand is called the tangent space to the manifoldM at point p(see Fig. 1.3). If the manifoldM is an (n− 1)-dimensional hy-

xy

z

p1

p2

Figure 1.3: Tangent spaces at points p1 and p2 of a manifold,visualized as vector subspaces of R3.

persurface embedded into an n-dimensional Euclidean spaceRn then each tangent plane is a vector space of dimensionn− 1.For instance, consider a hypersurface defined by the equa-tion f(x) = 0, where x ∈ R3. The normal vector n at a pointx0 has the following components,

nj(x0) =∂f

∂xj

∣∣∣∣x=x0

.

Any vector t orthogonal to n is tangent to the hypersurface atthe point x0. Hence, the tangent plane at the point x0 is theset of points x in R3 specified by

n · (x − x0) ≡n∑

j=1

(xj − xj0)nj(x0) = 0.

The tangent space is made of vectors with components xj −xj0 satisfying this equation. This is a two-dimensional vectorspace that depends on the point x0 (i.e. it is a different spacefor each x0).It is important to realize that tangent spaces at differentpoints are not naturally related to each other, even thoughevery tangent space is two-dimensional and looks like R2.For instance„ there is no natural way to add a tangent vec-tor at a point p1 to a tangent vector at another point p2, or tosubtract these vectors. This should be especially clear fromFig. 1.3, which shows different tangent planes. There is noobvious way to map vectors from one tangent plane into vec-tors from the other. We will see such a map later when wedefine an additional structure on the manifold called a “con-nection.” This structure literally provides a connection (i.e. aone-to-one map) between tangent spaces that are infinitesi-mally close (and thus, in principle, between any two tangentspaces). However, there are infinitely many possible “connec-tions” on a manifold. If we select a single “preferred” con-nection, in effect we introduce an additional structure on themanifold.

Remark: If an embedding of M into a Euclidean space isgiven, there is a way to construct a geometrically preferredconnection. One imagines that the manifoldM and the tan-gent planes are solid surfaces, and that one can “roll” the firsttangent planewithout sliding andwithout turning around thesurfaceM along a certain path inM until the location of theother plane is reached. This “mechanically” motivated pro-cedure yields a one-to-one map from one tangent plane intoanother one. The mapping between tangent planes depends

6

Page 15: Winitzki. Advanced General Relativity

1.2 Basic notions: Manifolds and vector fields

on the path along which we “roll” them, but is unambigu-ous for infinitesimally close points. (It is only the mapping be-tween infinitesimally close tangent spaces that is necessary todetermine a connection.) The resulting connection is calledthe Levi-Civita connection; it is defined in Sec. 1.6.6 with-out referring to the mechanical construction just described. Ofcourse, this connection depends on the embedding ofM intoa Euclidean space.

So far we considered tangent vectors using the embeddingpicture, i.e. by imagining that the M manifold is a hyper-surface embedded into a larger space. However, it is muchmore useful to describe tangent vectors intrinsically, i.e. with-out such an embedding. We can try to define a tangent vec-tor as the “directed velocity” dγ/dτ of some curve γ(τ) go-ing through p. However, we cannot directly differentiate γ(τ)with respect to τ without an embedding because the expres-sion γ(τ + δ) − γ(τ) is undefined: we cannot “subtract” onepoint from another. To circumvent this problemwithout intro-ducing coordinates, we focus not on points but on functions ofpoints. (Generally, shifting the attention from an abstract setto functions defined on the set is a powerful idea that has provenuseful in many areas of mathematics.)Let f : M → R be a scalar function defined on the mani-foldM. (A scalar function means a function with a numericvalue, as opposed to a matrix-valued or a vector-valued func-tion.) Suppose that a curve γ(τ) goes through the point p0 atτ = 0, i.e. that γ(τ = 0) = p0, and consider the directionalderivative of the function f(p) along the curve γ at the pointp0. This directional derivative is defined as the derivative ofthe function f(γ(τ)) with respect to τ ,

Dγf(p0) ≡d

∣∣∣∣τ=0

f(γ(τ)).

This derivative is interpreted as the instantaneous rate ofchange of the function f at point p0, as one moves along thecurve γ passing through p0.Tomake the discussion specific, consider the manifoldM =

Rn. Since points ofM are arrays of n coordinatesx1, ..., xn

,

a curve γ(τ) is specified by n functions γj(τ). Then the direc-tional derivative Dγf of a function f(x1, ..., xn) at a point p0 isexpressed by the formula

Dγf(p0) =n∑

k=1

dγk

∂f

∂xk

∣∣∣∣xj=p0

.

For our present purposes, it is important to view the direc-tional derivative Dγ as a map from functions to numbers. Thedirectional derivative Dγ depends not only on the direction ofthe curve γ, but also on the “speed” at which the points γ(τ)are traversed as τ changes. For instance, the value of Dγf willbe multiplied by 2 if we keep the shape of the curve γ(τ) un-changed but replace the parameter τ by 2τ . Also, we note thatDγf depends on the behavior of the curve γ in an infinitesi-mal neighborhood of p0 but does not depend on the behaviorof γ at other points; there are many curves γ that yield thesame derivative operation Dγ at the point p0. Therefore, theoperation Dγ carries the information only about the magni-tude and the direction of the “intantaneous velocity” alongthe curve at p0. This is precisely the information we expect tobe represented by a tangent vector at the point p0. Thus weare motivated to view the map Dγ itself as the tangent vector.In other words, we are going to define the space of the tangentvectors at p0, i.e. the tangent space Tp0M, as the vector space

consisting of all the possible directional derivatives Dγ at p0.This will be an intrinsic definition of the tangent space becausethis definition does not make use of any embedding ofM intoa larger space. This definition also does not rely on a choiceof a local coordinate system because only the curve γ(τ) — aset of points in the manifold — that is used to construct thedirectional derivative.It might be difficult to visualize the “space of all possibledirectional derivatives.” In order to help the intuition andto describe this “space of derivatives” more explicitly, let ustemporarily introduce a local coordinate system xα aroundthe point p0. If the curve γ(τ) is represented in the coordi-nates xα by functions γα(τ), we can express the directionalderivative Dγf of a function f as follows,

Dγf(p0) =∑

α

∂f

∂xα

∣∣∣∣p0

dγα

∣∣∣∣τ=0

=

[∑

α

dγα

∣∣∣∣τ=0

∂xα

∣∣∣∣p0

]

f.

It is clear from this formula that the derivative operation Dγis represented by the differential operator

Dγ ≡∑

α

dγα

∣∣∣∣τ=0

∂xα

∣∣∣∣p0

acting on functions f , where ∂/∂xα is the partial derivativewith respect to one coordinate xα. Moreover, we see that allthe possible directional derivatives Dγ are described by theset of coefficients dγα/dτ multiplying a fixed set of differ-ential operators ∂/∂xα. The coefficients dγα/dτ depend onthe curve γ, while the operators ∂/∂xα do not. Therefore, itis natural to regard ∂/∂xα as the basis in the vector space ofdirectional derivatives and dγα/dτ as the components of Dγin that basis.In this way, we have expressed directional derivatives Dγas differential operators in a local coordinate system. An ar-bitrary directional derivative can be written as

α uα∂/∂xα,

where uα are some coefficients. Now it is clear that all direc-tional derivatives form a vector space spanned by the basis∂/∂xα. This vector space is the same as the tangent spaceTp0M at p0 defined above using the embedding picture. Thisis so because the tangent space Tp0M in the embedding pic-ture consists of all the possible vectors tangent to M at p0,which is the same as the set of velocity vectors of all possi-ble curves γ(τ) at p0, i.e. the set of all possible values of thecomponents dγα/dτ . So we may identify the space of tangentvectors to curves at p0 with the space of directional deriva-tives at p0. These are two equivalent descriptions of the tan-gent space Tp0M. A tangent vector can be understood both asa “direction within the manifold” and, at the same time, as adifferential operator acting on scalar functions.The natural basis ∂/∂xα in the tangent space is called thecoordinate basis corresponding to a chosen coordinate sys-tem xα.I will denote tangent vectors by boldface letters such as v

rather than by Dγ , since the curve γ in Dγ plays only an aux-iliary role. The action of a tangent vector v on a function fwill be denoted by v f rather than by Dγf . In local coordi-nates xα, one can express the value of a directional deriva-tive (tangent vector) v at point p0 applied to a function f as

v f =∑

α

vα∂f

∂xα

∣∣∣∣p0

, (1.1)

where vα are the components of the vector v in the basis∂/∂xα. Hence, it is convenient to visualize the vector v as

7

Page 16: Winitzki. Advanced General Relativity

1 Calculus in curved space

the derivative operator

v ≡∑

α

vα∂

∂xα.

The derivative of a function f at a point p0 along a curve γ(τ)can be written as

Dγf(p0) ≡ γ|p0 f,

where the notation γ|p0 means the tangent vector to the curveγ(τ) at point p0.

Remark on notation: The symbol “” is used in general todenote the application of an operation to its argument(s). Forinstance, a tangent vector v is by definition a derivative oper-ation, which can be applied to a function f and yields anotherfunction, v f . In this notation, “X Y ” reads “X is appliedto Y .” Below, I also occasionally use the symbol “” to denotethe application of a 1-form to a vector and ofmultilinear formsto sets of vectors. For instance, a bilinear form B applied totwo vectors x and y yields a number denoted by B (x,y).Tangent vectors to curves γ(τ) are denoted γ where thisdoes not cause confusion. We will not use the notation Dγany more. Coordinate basis vectors such as ∂/∂x or ∂/∂y aredenoted ∂x and ∂y for brevity; in that case, the subscript “x” in∂x is not an index but a local coordinate, and we will make itclear in the text. Attention will be called to cases when the tra-ditional index notation and the Einstein summation conven-tion, such as vα∂α, are used. These cases will not be commonin the text. Usually, summation over indices will be writtenout explicitly.

If a vector v is given, the components vα in any local coor-dinate system xα can be found by substituting the n coordi-nate functions f = x1, f = x2, ..., f = xn into Eq. (1.1). Thus,the components vα can be found as

vα = v xα.

This is a convenient way to compute the components of avector in a coordinate system, if the vector is already knownthrough another coordinate system or expressed in anotherway.

Example: Consider a two-dimensional manifold with localcoordinates x, y. Suppose that a curve γ(τ) is specified as

γ(τ) = x0(τ), y0(τ) = cos τ, 2 sin τ

in these coordinates. Let us compute the tangent vector to thecurve, v(τ) ≡ γ(τ). By definition, the tangent vector v acts onfunctions f(x, y) as

v f ≡ d

dτf(x0(τ), y0(τ)).

The tangent vectors ∂x and ∂y are a basis in the tangent spaceat any point. The components of v in the basis ∂x, ∂y arefound by using the coordinates x and y instead of f in theabove equation. We find

vx = v x =d

dτx0(τ) = − sin τ,

vy = v y =d

dτy0(τ) = 2 cos τ.

Thus, v = − (sin τ) ∂x + 2 (cos τ) ∂y . By inspection, we canexpress v directly through x and y as

v = −1

2y∂x + 2x∂y.

Using this formula, we can define v as a vector field every-where (not only on the curve γ). However, there are infinitelymany different vector fields v that coincide with γ on thecurve γ; the vector field v shown above is just one such fieldthat was, by a coincidence, easy to write down in the presentcase.Let us also compute the components of the vector field v ina different coordinate system. For example, consider a newlocal coordinate system a, b defined by

a = x+ y, b = ex−y.

The inverse functions are

x =1

2(a+ ln b) , y =

1

2(a− ln b) .

The components of v are found as

va = v a =

(

−1

2y∂x + 2x∂y

)

(x+ y) = −1

2y + 2x,

vb = v b =

(

−1

2y∂x + 2x∂y

)

ex−y =

(

−1

2y − 2x

)

ex−y.

Substituting x, y through a, b, we can express the vector v inthe new coordinate system as

v =

(3

4a+

5

4ln b

)

∂a +

(

−5

4a− 3

4ln b

)

b∂b.

Practice problem: (a) Determine the components of the vec-tor v = x∂x + y∂y in the coordinate system a, b, wherea = x+ y, b = 3xy.(b) Given the vector u = ∂x + x∂y , find a coordinate sys-tem X,Y (where X and Y are functions of x, y) such thatu = ∂X .Answers: (a) v = a∂a + 2b∂b. (b) One suitable coordinatesystem is X = x, Y = x2 − 2y, and there are infinitely manyothers.

In most calculations involving directional derivatives, wewill not actually need to introduce local coordinates xα andthe coordinate basis ∂/∂xα. Instead, we will use the follow-ing properties of vectors, which follow immediately from thefact that vectors are directional derivatives at a point:

v (f + g) = v f + v g, [linearity] (1.2)

v (fg) = (v f) g + fv g, [Leibnitz rule] (1.3)

v f = 0 if f = const around p0. (1.4)

For example, one can prove that the chain rule follows fromthe properties (1.2)-(1.4) .

Statement 1.2.4.1: If f is an analytic function f : R → R; g isa smooth functionM → R; and v is a derivation, then

v (f(g)) =df(g)

dgv g.

(Proof on page 167).

8

Page 17: Winitzki. Advanced General Relativity

1.2 Basic notions: Manifolds and vector fields

v p

p′

Figure 1.4: A short curve segment between points p and p′ canbe approximated by a tangent vector v.

1.2.5 Tangent vectors as short curve segments

An intuitive picture of an ordinary vector is that of a piece ofa straight line with an arrow at one end. In a curved space,there are in general no naturally defined “straight lines,” buta sufficiently short line segment can be considered “almoststraight.” So one can visualize a tangent vector v ∈ TpMheuristically as a short line segment going from a point p to aneighbor point p′. Let us now explore this heuristic picture,since it provides an important intuitive link between the or-dinary geometric notions in flat space and the correspondingnotions in the curved space calculus.To build a correspondence between tangent vectors andline segments, consider a curve γ(τ) that starts at p and goesthrough p′. Let us assume that p corresponds to τ = τ0, that is,p = γ(τ0), and likewise p

′ = γ(τ0 +σ), where σ is heuristicallya “very small” number. (In other words, we will take the limitσ → 0 at the end). Now we would like to define a tangentvector v that corresponds to the short segment of the curve γbetween the points p and p′.A tangent vector is, by definition, a derivative along a curve(the derivative is applied to scalar functions). So it is naturalto consider the derivative of an arbitrary function f along thecurve γ at point p,

γ f ≡ limδτ→0

f(γ(τ0 + δτ)) − f(γ(τ0))

δτ.

While finding the exact value of γ f requires taking the limitδτ → 0, we will compute γ f sufficiently precisely if we usethe finite value δτ = σ, provided that σ is sufficiently small.Then we obtain the approximate expression

σγ f ≈ f(p′) − f(p).

The error of this approximation is of order σ2.Thus it is natural to identify the tangent vector σγ as a rep-resentation of the short curve segment between the points pand p′ (see Fig. 1.4). Let us temporarily denote this vector byv ≡ σγ. The vector v does not depend (up to terms of orderσ) on the choice of the parameter τ or the curve γ.Given two “sufficiently close” points p and p′, how can wedetermine the tangent vector v that represents the “almoststraight” line between p and p′? We can choose some curveγ(τ) passing through these points, such that γ(τ0) = p andγ(τ1) = p′. We can compute the vector γ at the point p, andthen the formula for v is

v = (τ1 − τ0) γ(τ0) ∈ TpM.

The tangent vector v represents the “arrow between thepoints” p and p′ in the sense of the formula

f(p′) − f(p) ≈ v f, (1.5)

which is approximately valid for any smooth function f up tosecond-order corrections.

1.2.6 *Tangent space as space of derivations

In this section we will see that the properties (1.2)-(1.4) are aminimal necessary set of properties describing all the possibledirectional derivatives. Therefore, one may forget that Dγ wasdefined using some curve γ(τ), drop the subscript γ, call anymap D from functions to numbers satisfying the above prop-erties a derivation at p0, and define the tangent space Tp0M asthe set of all possible derivations. This somewhat abstract def-inition of TpM is more elegant from the purely mathematicalpoint of view but entirely equivalent to the earlier definitions,because every derivation D satisfying the properties (1.2)-(1.4)is a directional derivative Dγ along some curve γ. In otherwords, the only way to differentiate a function is to take aderivative along a curve in some direction.To prove these statements, let us perform explicit calcula-tions in a local coordinate system xα. If D is a derivationsatisfying Eqs. (1.2)-(1.4) and f(p) is a smooth function, thenwe can apply the chain rule (see Statement 1.2.4.1) and com-pute

D f(p) =∑

α

∂f

∂xαD xα(p),

where Dxα is the application of the derivation D to the scalarcoordinate function xα(p). Hence,

D f(p) =

[∑

α

(D xα)∂

∂xα

]

f(p) ≡∑

α

uα∂

∂xα,

where uα are coefficients defined by uα ≡ D xα. It is alwayspossible to find a curve γ(τ)with the derivatives dγα/dτ = uα

at the point p. Thus any derivationD is equivalent to a tangentvector in the previous sense.

1.2.7 Vector fields and flows

A vector field on a manifoldM is obtained by choosing a tan-gent vector v|p at each point p ∈ M. Most of the time, we willonly need a vector field defined in a neighborhood of somepoint, rather than on the entire manifoldM.Another way to visualize a vector field is to imagine that themanifold is filled by nonintersecting curves going through allits points (or at least through every point in some neighbor-hood); see Fig. 1.5. Such a collection of curves is called a con-gruence of curves or a flow of a vector field. If a congruenceof curves is given, derivatives along curves at various pointspwill determine tangent vectors v|p at all points and thus de-fine a vector field v. The curves in the congruence are alsocalled the flow lines of the vector field v. In our notation, acurve γ is the flow line of a vector field v iff γ(τ) = v|γ(τ) at

every point p ≡ γ(τ) along the curve.Flow lines of a nonzero, smooth vector field always existand are unique, due to a well-known theorem of calculus. Tosee this, consider a local coordinate system xαwhere a vec-tor field v is specified by its components vα as v =

α vαeα

(the components vα can be found as vα = v xα). A curveγ(τ) is specified by n coordinate functions γα(τ). The con-dition γ(τ) = v|γ(τ) translates into a system of differentialequations,

dγα(τ)

dτ= vα|p=γ(τ) .

We may choose the initial condition as γα(0) = xα(0), where

xα(0) are the local coordinates of an initial point of the curve.

9

Page 18: Winitzki. Advanced General Relativity

1 Calculus in curved space

xy

z

pv(p)

Figure 1.5: A vector field v tangent to a congruence of curves,which are the flow lines of v. For each point p, thevector v(p) belongs to the tangent space TpM at p.

Since the vector field v is smooth, such a system always hasa unique solution γα(τ), which represents the required curveγ(τ) in the local coordinate system. (Of course, this systemof equations may be complicated and so it is not always pos-sible to obtain an explicit formula for the solution γα(τ). Inany case, an approximate solution can be found numerically.)Therefore, the flow lines of a nonzero, smooth vector field al-ways form a congruence that (at least locally) fills space.

Example: Given a vector field v, consider the flow lines of vgoing through each point of some neighborhood. Let τ be theparameter along each of the flow lines γ(τ), such that γ = v

everywhere. This defines τ as a scalar function on the mani-fold. We can compute the scalar function v τ . Along everyflow line, we have

v τ =d

dττ = 1,

thus v τ = 1 everywhere.

If xµ are local coordinates around apoint p0, we can con-sider a congruence of curves of varying x0 and constant x1, x2,... The tangent vector field to this congruence is ∂/∂x0. Thiscongruence consists of “curved coordinate axes” of the coor-dinate x0. Similarly, the other coordinate basis vectors ∂/∂x1,∂/∂x2, etc., can be seen as tangent vectors to a congruence ofcurves made by other coordinate axes.

1.2.8 *Tangent bundle

The set of all the tangent spaces (one tangent space TpM foreach point p ∈ M) is denoted TM and called the tangent bun-dle of the manifoldM. The tangent bundle is itself a manifoldwith dimension 2n ifM has dimension n. This manifold isperhaps not easy to visualize as a whole, but we will not needto do anything complicated with the tangent bundle TM asa manifold. Let us only verify that TM has the structure of asmooth manifold.

By assumption,M is a smooth manifold and, as such, canbe covered by a set of charts. Consider one such chart thatmaps a subset U ofM into a subset V ofRn. Tangent vectors atpoints p ∈ U are mapped into tangent vectors in V , which aresimply vectors in Rn since V is a portion of the flat Euclideanspace Rn. Therefore, the part of TM consisting of tangentplanes to points p ∈ U is one-to-one mapped into V × Rn.Since the set V × Rn is a subset of Rn × Rn = R2n, we haveestablished that TM can be covered by patches, and one can

find charts that map these patches into R2n. In other words,the tangent bundle TM is a smooth 2n-dimensional manifold.

Remark: It is important to realize that the tangent bundle,as a whole manifold, is not necessarily equivalent (i.e. diffeo-morphic) to the direct productM × R

n, even though everysmall patch of TM is diffeomorphic to a patch of M × Rn.The reason is the same as that for the manifold M not be-ing diffeomorphic to Rn, even though small patches ofM arediffeomorphic to patches of Rn. Namely, the patches may beglued together in a nontrivial way such that the global ge-ometry ofM is different from that of Rn. A detailed consid-eration would lead us too far into differential topology, so Iwill give only a qualitative example. Suppose a diffeomor-phism φ : M×Rn → TM exists, such that every vector spacep × Rn, where p ∈ M, is mapped one-to-one onto the vectorspace TpM via an invertible linear transformation. Then forevery p ∈ M one may apply φ to p× 1, 0, 0, ..., 0 ∈ M× Rn

and obtain a nonzero vector e1(p) ∈ TpM at every point p.However, such a vector field e1 sometimes cannot exist; atypical example is the 2-sphere S2. It is known that there ex-ists no everywhere nonvanishing and smooth tangent vectorfield on a 2-sphere. This property is called the “impossibil-ity of combing a sphere” (see Statement 6.3.2.1). So it is im-possible to choose even a single nonzero vector field e1 thatsmoothly varies throughout the sphere. Thus, a diffeomor-phism φ : S2 × R2 → TS2 does not exist. A tangent bundlewhich is diffeomorphic toM× Rn is called trivial. It followsthat the 2-sphere S2 has a nontrivial tangent bundle.

1.2.9 Tensor fields

A covector is a linear map ω from vectors v into numbers. Iassume that you know this standard construction from linearalgebra, and I will only summarize the concepts we will need.In the index notation, covectors are denoted by letters witha subscript index, ωα. In our notation, a covector ω appliedto a vector v gives a number ω v (read “ω applied to v”);sometimes we alsowrite ω(v) for the same thing. (In the indexnotation, this is ωαv

α.)Covectors themselves naturally form a vector space calledthe dual space. The dual space to the tangent space TpM isdenoted by T ∗

pM and consists of covectors acting on tangentvectors at point p. The space T ∗

pM is also called the cotangentspace at point p.A covector field, also called a 1-form, is a choice of a cov-ector ω|p from the dual space (cotangent space) T ∗

pM at eachpoint p. A 1-form ω can be seen as a linear map from vectorfields to functions onM, by acting pointwise on a vector fieldv and producing a scalar function f ,

ω : v 7→ f ; ω|p v|p ≡ f(p).

An example of a 1-form is the mapping of a vector v intothe derivative of a fixed function f in the direction v, that is,the map v 7→ v f for a fixed f . Clearly, this is a linear mapand hence, by definition, it is a 1-form. This 1-form is calledthe gradient of the function f and is denoted df . Thus, fora fixed function f we define the 1-form df by its action on avector field v as follows,

(df) v ≡ v f.

Let us recall that tangent vectors can be understood asan approximate representation of short curve segments (see

10

Page 19: Winitzki. Advanced General Relativity

1.2 Basic notions: Manifolds and vector fields

Sec. 1.2.5). If a vector v represents a short curve segment be-tween points p and p′ then the 1-form df acting on v yieldsapproximately f(p′) − f(p), according to Eq. (1.5).

Example: Consider a two-dimensional manifold with a lo-cal coordinate system x, y. The gradient of the coordinatefunction x is the 1-form dx. By definition, this 1-form acts onvector fields u as (dx) u = u x. For instance, if u = ∂x thenobviously u x = 1. In this notation, we obtain the equation

(dx) u = (dx) ∂x = 1,

which may look unusual or even confusing at first glance.However, it is only the notation that is unusual; the resultsare always consistent.Consider another vector field v = y2∂x and a function

f(x, y) = x2y3. The field v acts on f and produces the func-tion

v f = y2 ∂

∂xf(x, y) = 2xy5.

The 1-form df can be computed using standard rules of cal-culus as

df = 2xy3dx+ 3x2y2dy.

However, note that now df , dx, and dy are well-defined ob-jects (1-forms) rather than heuristic “infinitesimals.”The action of the 1-form df on the vector v is, by definition,

(df) v ≡ v f = 2xy5.

Let us see how this result can be obtained using the explicitexpressions for df and v. We have

(df) v =(2xy3dx+ 3x2y2dy

) y2∂x

=(2xy5

)dx ∂x +

(3x2y4

)dy ∂x.

By definition, dx∂x = 1 and dx∂y = 0, therefore (df)v =2xy5 as before.

Practice problem: (a)Given the vector v = ∂x+x∂y , find all1-forms ω = a(x, y)dx+ b(x, y)dy such that ω v = 0.(b) Given the 1-form ω = xdy + ydx, find all vectors v =a(x, y)∂x + b(x, y)∂y such that ω v = 0.

Similarly to vector fields, one can define tensor fields of ar-bitrary rank. For example, if a linear transformation L|p isdefined in each tangent space TpM, such that L|p v|p is an-other vector from TpM, then we have a tensor field L on themanifold M. Also, one defines n-forms as totally antisym-metric tensor fields of rank (0,n). (The calculus of n-forms isexplained in more detail in Sec. 1.4.) The operations of ten-sor product L1 ⊗ L2, the exterior product ω1 ∧ ω2 of n-forms,and tensor contraction (for example, the contraction ω v ofa 1-form and a vector) are naturally defined on tensor fields.These algebraic operations are performed on tensors in eachtangent space TpM separately.

1.2.10 Commutator of vector fields

Since a vector field v can be viewed as a (first-order) differen-tial operator acting on functions as v : f 7→ v f , one mayconsider the commutator of two such operators,

[u,v] f ≡ u (v f) − v (u f) . (1.6)

It turns out (see Statement 1.2.10.1) that the result is again aderivation, i.e. a first-order differential operator acting on func-tions, even though it may appear at first glance that [u,v] f

involves second-order derivatives of f . For this reason, theoperation [u,v] f is equivalent to some vector field actingon f . This vector field is denoted by [u,v] and is called thecommutator of the fields u and v.

Statement 1.2.10.1: (a) It follows from the definition (1.6)and Eqs. (1.2)-(1.3) that the commutator [u,v] satisfies thedefining properties (1.2)–(1.4) of a derivation. Hence, [u,v]is itself a vector field. (b) The components cα of the commu-tator [u,v] in terms of the components uα and vα in a localcoordinate system xαwith a coordinate basis eα,

u =∑

α

uαeα, v =∑

α

vαeα, [u,v] ≡∑

α

cαeα,

are given by the expression

([u,v])β ≡ cβ = uα

∂vβ

∂xα− vα

∂uβ

∂xα. (1.7)

(c) If eα ≡ ∂/∂xα are the coordinate basis vector fields definedthrough a local coordinate system xα then [eα, eβ] = 0.Proof of Statement 1.2.10.1: (a) The properties (1.2) and(1.4) are obvious; the nontrivial one is the Leibnitz rule (1.3).To verify the Leibnitz rule, we compute [u,v] (fg), where fand g are smooth functions:

[u,v] (fg) = u (gv f + fv g) − v (gu f + fu g)= gu (v f) + fu (v g) − gv (u f) − fv (u g)

+ (u g) (v f) + (u f) (v g)− (v g) (u f) − (v f) (u g)

= ([u,v] f) g + f [u,v] g.

(b) A straightforward calculation using the representationof the basis fields eα as differential operators yields

[u,v] =∑

α,β

(

uα∂

∂xα

)(

vβ∂

∂xβ

)

−∑

α,β

(

vα∂

∂xα

)(

uβ∂

∂xβ

)

=∑

α,β

uαvβ∂

∂xα∂

∂xβ+∑

α,β

(

uα∂vβ

∂xα

)∂

∂xβ

−∑

α,β

vαuβ∂

∂xα∂

∂xβ−∑

α,β

(

vα∂uβ

∂xα

)∂

∂xβ

=∑

α,β

(

uα∂vβ

∂xα− vα

∂uβ

∂xα

)∂

∂xβ≡∑

β

cβ∂

∂xβ.

This shows that [u,v] is indeed a first-order differential oper-ator, and also gives an explicit formula for cβ .(c) For a smooth function f(x1, ..., xn), we have

∂xα∂

∂xβf =

∂xβ∂

∂xαf

according to a well-known theorem of calculus. Therefore thebasis vector fields eα commute with each other.

Remark: Statement 1.2.10.1 presents two ways of provingthe fact that [a,b] is a first-order differential operation, inparts (a) and (b). The first proof (a) is performed purely al-gebraically, using the abstract definition of tangent vectors as(unspecified) operations v (...) with certain properties. Theother proof (b) uses a specific representation of tangent vec-tors as differential operators

α vα∂/∂xα in a local coordinate

system xα.

11

Page 20: Winitzki. Advanced General Relativity

1 Calculus in curved space

Both the proofs have certain advantages and disadvan-tages. The proof (a) is very general because it uses only someproperties of v (...). So this proof applies not only to di-rectional derivatives but to any operation with the proper-ties (1.2)-(1.3), such as the Lie derivative and the covariantderivative that we will use later. The proof (a) applies equallywell to derivatives and to finite difference operators, to opera-tors in infinite-dimensional spaces, and so on. The proof (a) isalso more “elegant” because it is index-free, does not involveunnecessary structures (such as a local coordinate system anda basis), and does not depend on the explicit representation ofvectors in coordinates. This is the kind of proof a mathemati-cian would be looking for.On the other hand, the proof (b) is conceptually easier tounderstand (especially for beginners) because it consists ofa specific computation in a familiar and elementary context,namely, manipulations with partial derivatives of functions.Also, the result of the proof (b) is a directly usable formula forthe components of the vector [u,v], while the proof (a) merelyshows that [u,v] is a well-defined vector. In the text that fol-lows, I will usually not need explicit formulas for componentsof vectors, and therefore index-free calculations will be prefer-able. However, in every case one can translate the final resultsinto components in a local coordinate system.

It is important to understand the geometric interpretationof the commutator. Imagine drawing the flow lines of twovector fields a and b (Fig. 1.6). Let us start from a point p0,follow the flow line of a for a small interval δτ of the param-eter τ , and then follow the flow line of b for another inter-val δτ ; we will arrive at a point p. If we again start from p0

but first follow the lines of b and then the lines of a for thesame parameter distances δτ , we will arrive, in general, at adifferent point p′. In the limit δτ → 0, the points p and p′

will become infinitesimally close to each other and to p0. Asshown in Statement 1.2.10.2, the line between p and p′ spec-ifies a well-defined vector in the limit δτ → 0, namely thevector [a,b] δτ2.

Statement 1.2.10.2: Consider an arbitrary smooth functionf , vector fields a,b, and the points p0, p, p

′ in the notation ofFig. 1.6. The commutator [a,b] describes the difference be-tween f(p) and f(p′) in the following way,

limδτ→0

f(p) − f(p′)

δτ2= ([a,b] f)|p0 .

(Proof on page 167.)

Practice problem: Assuming a local coordinate systemx, y, z, determine the commutators [a,b], [a, c], and [b, c],where a = x∂x + y∂y + z∂z , b = x∂y − y∂x, c = ∂z .Answers: [a,b] = 0, [a, c] = −∂z , [b, c] = 0.

1.2.11 Connecting vectors

If two vector fields a and b are such that [a,b] = 0, the vectorfield a is called a connecting vector for the field b. (Since[a,b] = −[b,a], the field b is then also a connecting vectorfor the field a. One also says that the vector fields a and b

commute with eath other.) The notion of a connecting vectorturns out to be very useful in index-free calculations becauseof the following geometric interpretation.Consider a vector field v and a congruence γ of its flow lines(see Fig. 1.7). Suppose that a subset of the flow lines is labeledby an additional real-valued parameter s, so that γ(τ ; s) is a

a

b

p′

p0

p

p1

p2

Figure 1.6: The commutator [a,b] of two vector fields mea-sures the discrepancy between the points p and p′

obtained by following the flow lines of the vec-tor fields in different order, starting at a point p0,for small intervals δτ of the parameter. The blueand the black arrows are the vector fields a and b,while the dotted lines are the flow lines of thesevector fields. The point p is obtained by followingthe line p0 → p1 and then the line p1 → p, while thepoint p′ is obtained by following the line p0 → p2

and then p2 → p′. The small green vector betweenthe points p and p′ is equal to [a,b] δτ2 in the limitof small distances.

τ = 3

τ = 2

τ = 1

τ = 0

v

c

Figure 1.7: Connecting vectors c (thick arrows) point in the di-rection of equal parameter τ along a congruence offlow lines of v (thin lines). Dotted lines are the flowlines of c, which are also the lines of equal τ .

12

Page 21: Winitzki. Advanced General Relativity

1.3 Lie derivative

smooth function R × R → M and the tangent vector v is ex-pressed as v = “∂γ/∂τ”. Using the function γ(τ ; s), we mayalso consider the curves γ(s) ≡ γ(τ0; s) for a fixed value ofτ = τ0. The curves γ(s) go “straight across” the curves γ(τ),intersecting the latter at points of equal τ . We can then de-fine the tangent vector field c to the curves γ in the usual way,c = “∂γ/∂s”. The geometric interpretation of the vector c isthat it points “sideways” with respect to the curves γ(τ), con-necting a point p on a curve γ(τ ; s) to a nearby point p′ ona neighbor curve, γ(τ ; s′), where s and s′ are “infinitesimallyclose” while τ stays constant. In other words, the connectingvector c “connects” neighboring flow lines γ(τ) of the field v

at “corresponding” points, i.e. at points with the same valueof the parameter τ . Thus, the flow lines of c are also lines ofconstant τ . Now it is easy to check that c is a connecting vectorfor v according to the above definition.

Calculation 1.2.11.1: For the vector fields v and c definedin the preceding paragraph through the function γ(τ ; s), wehave [c,v] = 0. (Details on page 167.)

Since the parameter τ for different flow lines of a given vec-tor field v can be started with different values at differentpoints, there are infinitely many connecting vector fields fora given field v. The freedom of selecting a connecting field forv is the same as the freedom of choosing the initial values ofthe parameter τ on each flow line of v.If a vector c is fixed at one point p0, say c(p0) = c0, one canalways select c(p) in the neighborhood of p0 in such a waythat the field c commutes with v everywhere. To find suchc(p), one needs to solve the equation [c,v] = 0. It is easy tosee from the coordinate representation

[c,v] =∑

β

(∑

α

cα∂vβ

∂xα−∑

α

vα∂cβ

∂xα

)

that the equation [c,v] = 0 is a linear differential equation forthe unknown vector field c that involves only the derivativeof c along the flow lines of v. Therefore, it is always possibleto solve that equation with any given initial condition c0.

Remark: Since [v,v] = 0, every field v can be formally con-sidered as a connecting field for itself, although there seemsto be no useful geometric interpretation of that statement.

Vector fields ∂xµ defined through a coordinate system xµalways commute. In general, any vector field v can be con-sidered as a coordinate derivative ∂x in some local coordinatesystem that includes x as one of the coordinates. To show this,it is sufficient to demonstrate that there exists a basis of con-necting fields that includes v; then the coordinate system canbe constructed by using the flow lines of these vector fields.

Statement 1.2.11.2: For any vector field v 6= 0 given onan N -dimensional manifoldM, there exist connecting fieldsc1, c2, ..., cN−1 such that v, c1, c2, ...cN−1 is a basis at anypoint p in a neighborhood of an initial point p0. (Proof onpage 167.)

The geometric interpretation of a connecting basis can beunderstood from Fig. 1.6. If every pair of basis vector fieldsea, eb are connecting, there is no discrepancy between pointsobtained by following the coordinate lines in different order.The absence of this discrepancy is necessary for the existenceof a coordinate system xµ such that ea = ∂/∂xa.

Practice problem: (a) In a local coordinate system x, y, de-termine all connecting vector fields c for the vector u = x∂y ,

i.e. all c = a(x, y)∂x + b(x, y)∂y such that [c,u] = 0 every-where.(b) In a local coordinate system x, y, z, find an explicit ba-sis of connecting vector fields cj , j = 1, 2, 3, for the vectorv = x∂y + y∂z .Answers: (a) c = f(x)u + g(x) (x∂x + y∂y), where f and gare arbitrary functions of x. (b) One such basis is c1 = x∂y +y∂z , c2 = ∂z , c3 = x2∂x + xy∂y +

(y2 − xz

)∂z .

1.3 Lie derivative

A vector field v acts as a directional derivative on scalar func-tions. The Lie derivative Lv with respect to a vector field v

is an important differential operation that can be applied notonly to scalars but also to all tensor fields. We now deduce theformulas for Lv in an axiomatic way, i.e. by assuming somedesirable properties of this operation. Later we will give ageometric interpretation of Lv.We would like Lv to act as the usual directional derivativeon scalar functions f ,

Lvf = v f. (1.8)

Additionally, Lv should have the following properties, fullyanalogous to Eqs. (1.2)-(1.3), but applicable to arbitrary ten-sors A,B:

Lv(A+B) = Lv(A) + Lv(B), (1.9)

Lv(A⊗B) = Lv(A) ⊗B +A⊗ Lv(B), (1.10)

Lv(A B) = Lv(A) B +A Lv(B), (1.11)

where A ⊗ B is the tensor product and A B here denotesany “pairing” of the tensors A and B (e.g. ω v and also v fare “pairings” in this sense). As we will see, these propertiesuniquely define the action of Lv on any tensor.

1.3.1 Commutator as Lie derivative

For instance, if f is a scalar and u,v are vectors, we expect tohave

Lv(u f) = [Lv(u)] f + u Lv(f).

Using Eq. (1.8), this is rewritten as

v (u f) = [Lv(u)] f + u (v f).

Thus,

[Lv(u)] f = v (u f) − u (v f) = [v,u] f.

In other words, the Lie derivative Lv of a vector field u coin-cides with the commutator of v and u,

Lvu = [v,u] = −Luv.

In the index notation, we have, according to Eq. (1.7),

(Lvu)α

=∑

vβ∂u

∂xβ

α

− uβ∂vα

∂xβ. (1.12)

It is important to realize that the Lie derivative Lvu de-pends on the derivatives of v (not only on the value of v andthe derivatives of u).

13

Page 22: Winitzki. Advanced General Relativity

1 Calculus in curved space

Calculation 1.3.1.1: The Lie derivative Lvu involves deriva-tives of v. In particular, if u,v are vector fields and λ is a scalarfunction then

Lλvu = λLvu− (u λ)v.

To derive this expression, we use the antisymmetry of thecommutator and the Leibnitz rule for Lv:

Lλvu = [λv,u] = − [u, λv] = −Lu (λv)

= −Lu (λ)v − λLu (v)

= − (u λ)v + λLvu.

1.3.2 Lie derivative of tensors

Let us continue to investigate the properties of the operationLv. Consider a 1-form ω and a vector field u. According toEq. (1.11), we have

Lv(ω u) = Lv(ω) u + ω Lv(u).

On the other hand, ω u is a scalar function, so Eq. (1.8) gives

Lv(ω u) = v (ω u).

Hence, the Lie derivative of a 1-form ω with respect to v is the1-form Lvω that acts on an arbitrary vector u as

(Lvω) u = v (ω u) − ω [v,u] .

In the index notation,

[Lv(ω)]µ = vα∂αωµ + ωα∂µvα.

The action of the Lie derivative on an arbitrary tensor canbe derived similarly, using the properties (1.8)–(1.11). Forexample, suppose A is a bilinear form with vector values,i.e. A(u,v) is a vector. (The index notation for A would beAλαβ .) Then the Lie derivative LwA is defined as

[LwA] (u,v) = [w, A(u,v)] −A([w,u] ,v) − A(u, [w,v]).

In the index notation,

(LvA)λαβ = vµ∂µA

λαβ +Aλαµ∂βv

µ +Aλµβ∂αvµ −Aµαβ∂µv

λ.

The Lie derivative with respect to a coordinate basis vec-tor of a given coordinate system has special properties withrespect to that coordinate system.

Statement 1.3.2: Let xµ be local coordinates in a manifold,and consider the standard coordinate bases ∂/∂xµ ≡ ∂µand dxµ in the tangent and the cotangent spaces respec-tively.(a) The Lie derivatives with respect to a coordinate basisvector of another basis vector or of a basis 1-form are zero,

L∂µ(∂ν) = 0, L∂µ

(dxν) = 0.

(b) Let us represent an arbitrary tensor of rank (m,n) in thecoordinate basis by an array of components. For example, atensor T of rank (1, 2) is represented by an array Tαβγ as

T =∑

α,β,γ

Tαβγ∂

∂xα⊗ dxβ ⊗ dxγ .

If it is known that L∂/∂x1T = 0, it means that the componentfunctions Tαβγ are independent of the coordinate x

1.

Proof: (a) The coordinate basis vectors commute since

L∂µ(∂ν) f =

∂xµ∂

∂xνf − ∂

∂xν∂

∂xµf = 0

for a smooth function f .The Lie derivative of a 1-form is defined through the actionof the 1-form on an arbitrary vector field v,

(L∂µdxν

) v = L∂µ

(dxν v) − dxν (L∂µ

v).

By definition of the 1-form dxν we have

dxν v ≡ v xν , dxν (L∂µ

v)≡(L∂µ

v) xν ,

where xν is understood as a scalar function representing theν-th coordinate of a point. Hence

(L∂µdxν

) v = ∂µ (v xν) − [∂µ,v] xν

= v (∂µ xν) = v δνµ = 0.

(b) It is sufficient to show the proof on the given exampleinvolving a tensor T of rank (2, 1). Using the Leibnitz rule,we compute

0 = L∂1T =∑

α,β,γ

(L∂1Tαβγ) ∂α ⊗ dxβ ⊗ dxγ

=∑

α,β,γ

(∂

∂x1Tαβγ

)

∂α ⊗ dxβ ⊗ dxγ ,

since L∂1∂α = 0 and L∂1dxβ = 0 for every α, β. Since theset of basic tensors of the form ∂α ⊗ dxβ ⊗ dxγ is linearly in-dependent, it follows that every derivative ∂1T

αβγ vanishes.

1.3.3 Geometric interpretation

To illustrate the geometric significance of the Lie derivative,we first note that the Lie derivative of a connecting vectorfield is zero. A connecting vector, such as c in Fig. 1.7, is trans-ported from the initial point by the flow of the field v. We canimagine that the entire neighborhood of points around the ini-tial point p is transported along the flow lines of v to anothernearby point p′; this neighborhood is deformed in a certainway by the transport. The character of this deformation isdetermined by the vector field v in the neighborhood of theinitial point p and along the way to p′. Tangent vectors at pcan be visualized as directions towards points near p. So alltangent vectors a,b, ... ∈ TpM are also transported to partic-ular tangent vectors a′,b′, ... ∈ Tp′M (see Fig. 1.8). Thus theflow of v determines a map between tangent spaces TpM andTp′M, as long as the points p and p′ lie on a single flow line ofv.We have already seen (Sec. 1.2.11) that a vector field u ob-tained by transporting an initial vector u|p along the flow of vsatisfies Lvu = 0. (The transport from an initial point definesthe vector field u along a single flow line of v, so we need touse many flow lines and many initial values to define the vec-tor field u everywhere.) In general, for two vector fields u andv we will have Lvu 6= 0. So we might say (heuristically) thatthe Lie derivative Lvu measures the extent to which a vectorfield u fails to be a connecting field for v.To make these considerations somewhat more precise, letus temporarily denote by ϕτ : TpM → Tp′M the map thattransports tangent vectors along the flow lines of v from point

14

Page 23: Winitzki. Advanced General Relativity

1.4 Calculus of differential forms

v

a pb

p′

a′

b′

Figure 1.8: A neighborhood of points around p is transportedalong the flow lines of a vector field v to a neigh-borhood around p′. Initial tangent vectors a andb from TpM are visualized as directions betweennearby points. These vectors are transported intovectors a′ and b′ from Tp′M.

p to point p′, where τ is the parameter distance between p andp′. (In other words, we introduce a curve γ parameterized byτ such that γ(0) = p and γ(τ) = p′, while γ = v everywherealong γ.) In this notation, the vectors in Fig. 1.8 satisfy

a′ = ϕτ (a) , b′ = ϕτ (b) .

Nowwewould like to compute the Lie derivative Lvu, whereu is some vector field. Let us compute Lvu at point p; let usassume that τ is very small and so v is a vector pointing fromp to p′. We might try to write the derivative of u along v atpoint p as

limτ→0

u|p′ − u|pτ

, ???

but this expression is meaningless since we cannot subtractvectors u at different points: these vectors belong to differenttangent spaces. However, we can transport u|p′ ∈ Tp′M back

to the tangent space TpM using the inverse map ϕ−1τ . So we

write the derivative of u along v at point p as

Lvu = limτ→0

ϕ−1τ

(

u|p′)

− u|pτ

. (1.13)

The geometric interpretation of this expression is clear: Lvu

is the extent to which the field u fails to be transported alongv.4

Similar considerations apply to the Lie derivative of a ten-sor A with respect to a vector field v. Any tensor A can bedefined as a multilinear function of tangent vectors and cov-ectors, while vectors and covectors are defined through thepoints in an infinitesimal neighborhood of p. Thus, any ten-sor A is ultimately an object defined through combinations ofpoints near p. When these points are transported by the flowof a vector field v, the corresponding transport ϕτ (A) of the

4Although in this section I do not actually prove that Eq. (1.13) is equivalentto the previous definition of Lvu, a proof can be straightforwardly filledin or found in differential geometry textbooks, albeit perhaps in a moreformal or abstract language.

tensor A is also well-defined. Then the Lie derivative LvA isdefined by a formula analogous to Eq. (1.13).It follows from expressions derived above that the Liederivative LvA|p of a tensor A at a point p contains not onlyderivatives ofA but also derivatives of v. So the tensor LvA|pdepends not only on the value v|p of the vector field v atthe point p but also on the values of v in an entire neighbor-hood of p. Thus the Lie derivative LvA|p is not a true direc-tional derivative of A in the direction of v at the point p. Atrue directional derivative along a curve γ(τ), i.e. a deriva-tive that depends only on the value of the vector γ ≡ v(p)at the point p, must be independent of the derivatives of v atp. Such a true directional derivative cannot be defined with-out introducing some additional structures on the manifoldM. This is so because the tensors A(p) and A(p′) at neigh-boring points belong to different tensor spaces, and thus thereis no definition for a quantity such as A(p′) − A(p). Similarly,there is no possibility to integrate vectors or tensors over a do-main. We might say, in figurative language, that the tangentspaces “are disconnected” and a directional derivative cannotbe computed without a “connection” between these spaces.(The structure called the “connection” will be introduced inSec. 1.6 below.) When we evaluate the Lie derivative Lv, theinformation about how to connect the tangent spaces comesfrom the flow of the vector field v. This allows us to relatedirections around p to directions around p′, i.e. to “connect”the tangent spaces TpM and Tp′M. However, this procedureuses information about the vector field v in the entire neigh-borhood of p, not just at one point p.

1.4 Calculus of differential forms

The calculus of differential forms is a basic tool of differen-tial geometry. Differential forms are not widely used in stan-dard texts on GR because they do not provide a decisive com-putational advantage as long as one remains within the stan-dard introductory material. A substantial computational ad-vantage of differential forms is first seen when consideringmore advanced material, such as the Frobenius theorem orthe tetrad formulation of GR. However, the calculus of formsis relatively simple, and in my view the learning effort in-volved is amply justified by the conceptual advantages andwide-ranging applicability of differential forms in theoreticalphysics and mathematics. In the following subsections I ex-plain the motivation for introducing differential forms. Then(Sec. 1.4.3 and below) I give an overview of the basic proper-ties of differential forms, mostly without proof. Since the cal-culus of forms is standardmaterial and since it will be used inthis book only as a computational tool, this brief introductionwill suffice.5

1.4.1 Volume as antisymmetric tensor

The familiar notion of volume can be given a natural interpre-tation in terms of antisymmetric multilinear functions of vec-tors, i.e. antisymmetric tensors. This interpretation is a greathelp in calculations because antisymmetric tensors have richproperties.We begin with the two-dimensional Euclidean plane R

2. Intwo dimensions, the area plays the role of volume, so let us

5The calculus of differential forms is explained, for example, in chapter 4of [33] or in chapter 7 of [1].

15

Page 24: Winitzki. Advanced General Relativity

1 Calculus in curved space

consider the area of a parallelogram spanned by two vectorsu,v. According to standard formula of elementary geometry,the area of the parallelogram is

A = |u| |v| sinα,

where α is the angle between the vectors, which is a numberbetween 0 and π determined by

cosα ≡ u · v|u| |v| .

Here |u| ≡ √u · u and u · v is the standard Euclidean scalar

product. Thus the area of a parallelogram spanned by u andv is a nonnegative number defined by

A(u,v) ≡√

|u|2 |v|2 − (u · v)2.

This number appears to be a complicated nonlinear functionof the vectors u and v. It is more convenient to work withthe oriented area A, which is defined as A = +A if the pair

u,v has positive orientation and A = −A otherwise. (Wecan fix the “positive orientation” to be that of the coordinateaxes x, y in the plane, taken in this order.) The decisiveadvantage of the oriented area A over the conventional (un-

signed) area A is that A(u,v) turns out to be an antisymmetricbilinear function of u and v. In mathematics, a number-valuedbilinear function of two vectors is usually called a bilinearform, while an antisymmetric bilinear form is called a 2-form.The oriented area A(u,v) is by definition antisymmetric in

(u,v), since the pair v,u has opposite orientation to u,v.The fact that A(u,v) is an bilinear form can be demonstratedquite straightforwardly. When one of the vectors u or v ismultiplied by a positive number, the area of a parallelogramis multiplied by the same number. If one of the vectors ismultiplied by a negative number, the orientation of the pair

u,v is reversed and thus A(u,v) changes sign. We also have

A(0,v) = 0. So it is easy to see that A(λu,v) = λA(u,v),where λ is an arbitrary real number (positive, zero, or nega-

tive). Since A(u,v) = −A(v,u), the same property holds forthe argument v. Further, one can show by an elementary geo-metric construction (see Fig. 1.9) that the area does not changewhen a multiple of v is added to u, namely

A(u + λv,v) = A(u,v).

In two dimensions, two vectors u,v are a basis if A(u,v) 6=0. It follows that A is linear in each argument: we can compute

A(u1+u2,v) by decomposing u1 = λ1u+µ1v, u2 = λ2u+µ2v,and then it is easy to see that

A(u1 + u2,v) = A(u1,v) + A(u2,v).

The same property holds for the argument v. Thus, A(u,v)is an antisymmetric bilinear form that we may call the area2-form.To make the above discussion less abstract, let us express allquantities in Cartesian coordinates x, y. The vectors u andv have components u1, u2 and v1, v2; the unoriented areais

A(u,v) =

(u21 + u2

2) (v21 + v2

2) − (u1v1 + u2v2)2.

After some algebraic simplification, this becomes

A(u,v) =

(u1v2 − u2v1)2 = |u1v2 − u2v1| . (1.14)

u

v

u + λv

Figure 1.9: The area of the parallelogram spanned by a pairof vectors u,v is equal to the area spanned byu + λv,v.

Now it is clear that A(u,v) is the absolute value of a 2-form Adefined by

A(u,v) = u1v2 − u2v1 =

∣∣∣∣

u1 v1u2 v2

∣∣∣∣.

The sign of A is chosen such that the area of the unit squarespanned by the unit basis vectors 1, 0 and 0, 1 (in this or-der) is equal to 1. We note that the oriented area A is moreclosely related to linear algebra than the unoriented area A.A similar argument (with a suitable generalization ofFig. 1.9 to three dimensions) shows that the oriented 3-volume

V (a,b, c) of a parallelepiped spanned by three vectors a,b, cin a three-dimensional Euclidean space is a totally antisym-metric trilinear form (3-form). The same conclusion holds inany dimensions. The n-dimensional oriented volume of a par-allelepiped spanned by vectors a1, ..., an in Euclidean space isgiven by

V (a1, ...,an) =

∣∣∣∣∣∣∣

a11 · · · a1

n.... . .

...an1 · · · ann

∣∣∣∣∣∣∣

,

where aij is the i-th component of the vector aj .

Remark: determinants. The last formula suggests a closeconnection between antisymmetric forms and the calculus ofdeterminants. Indeed, such a close connection exists. Forinstance, one can show that the oriented volume of an n-dimensional parallelepiped transformed by a linear map Twill change by a numerical factor, which depends on T butdoes not depend on the parallelepiped. This numerical fac-tor is equal to the determinant of T . In Sec. 1.4.5, we use thisproperty as a convenient definition of the determinant of a lin-ear transformation T .

A somewhat more complicated concept is the area of aparallelogram spanned by two vectors u,v in a higher-dimensional Euclidean space, for example, in R3. The ordi-nary (unoriented) area A(u,v) is a nonlinear function of u,vgiven by Eq. (1.14). In three (or more) dimensions, one can-

not define an “oriented” area A(u,v) linear in u,v. Instead,it turns out that one can define a vector-valued antisymmetric,bilinear functionA(u,v) whose absolute value is equal to thearea A(u,v). This function is the familiar vector product ofthe vectors u and v,

A(u,v) = u× v.

Indeed, it is well known that

|u × v| = |u| |v| sinα = A(u,v).

16

Page 25: Winitzki. Advanced General Relativity

1.4 Calculus of differential forms

Thus, one could say that in three dimensions the area has anorientation vector. Thus, it is natural to regard the fundamen-tal area as vector-valued; the ordinary area is then computedas the absolute value of the “area vector.”

Remark: The same idea can be generalized to n-dimensionalparallelepipeds embedded in N -dimensional space (whereN > n). This time, it turns out that the “orientation” of then-dimensional volume is not a vector but an antisymmetrictensor; so the “oriented” n-dimensional volume is naturallytensor-valued. The ordinary volume is a suitable “absolutevalue” of that tensor. More explanations can be found inSec. 1.4.4. One may even change the point of view and regardthe oriented volume as a basic quantity — which is a certaintotally antisymmetric tensor — and the ordinary volume as aquantity defined as the absolute value of the oriented volume.

1.4.2 Motivation for differential forms

In standard courses of multivariate calculus, one studies inte-grals such as

∫∫∫ +∞

−∞

e−x2−y2−z2dxdydz = π3/2.

Expressions such as “dxdydz” occurring under the sign ofmultiple integration are usually explained as a purely sym-bolic notation. It is perhaps not obvious to a student of calcu-lus that “dxdydz” above can be interpreted, quite rigorously,as an antisymmetric product of “dx,” “dy,” and “dz,” ratherthan an ordinary, symmetric product. In fact, the antisymmet-ric interpretation of “dxdydz” is necessary if one wishes to ob-tain a consistent calculuswhere objects such as “dx” or “dxdy”are well-defined and transform correctly under a change ofvariables. Differential forms can be understood as the rigor-ous mathematical objects that make precise sense out of ex-pressions such as “dxdydz.” In this section we explore thesemotivational ideas.Let us begin by considering an example familiar from me-chanics. Suppose a particle moves in three-dimensional spacealong a curve x(τ) ≡ X(τ), Y (τ), Z(τ) and experiences aspace-dependent external force F ≡ F1, F2, F3. The workdone by the force can be computed as the line integral,

W =

x(τ)

(F1dx+ F2dy + F3dz) . (1.15)

This expression is computed as the ordinary integral

W =

∫ τ2

τ1

(

F1X + F2Y + F3Z)

dτ.

However, this form of the expression requires a parameteri-zation x(τ), while the workW is actually independent of thechoice of the parameter τ along the curve: it depends only onthe force F and on the layout of the given curve in space. Soit is more natural to work with the line integral (1.15). We caninterpret the integrand in Eq. (1.15) directly as a 1-form

ω ≡ F1dx+ F2dy + F3dz.

Note that the tangent vector xdτ is an approximate represen-tation of a short curve segment (see Sec. 1.2.5) between thepoints x(τ) and x(τ + dτ). The expression

(

F1X + F2Y + F3Z)

dτ = ω xdτ

is interpreted as the 1-form ω evaluated on the tangent vec-tor xdτ . This expression represents a small amount of workdone along a short curve segment. So the line integral (1.15)is rewritten as

W =

∫ τ2

τ1

(ω x) dτ ≡∫

γ

ω,

where γ denotes the path along which the particle is movingin space (regardless of the choice of the parameter τ ).

Remark: Note that we do not write any “d”s after the inte-gral sign in the last expression; all the “d”s are contained inthe 1-form ω. We write the roman “d” in 1-forms to empha-size the fact that “dx” is a separate, rigorously defined object,while “dx” stands for a heuristic “infinitesimal change of x.”

In general, one defines the integral of a 1-form ω over acurve γ as follows. One splits the curve into a large num-ber N of short segments. Each segment is almost straight andthus can be approximately represented by a tangent vector(see Sec. 1.2.5). Thus we represent the N segments by N tan-gent vectors v1, ..., vN . Each segment contributes to the inte-gral a small amount computed as ω(vj) ≡ ω vj . The integral∫

γ ω is then defined as the limit of the sum of ω vj ,

γ

ω = limN→∞

N∑

j=1

ω vj .

The line integral∫

γ ω can be reduced to an ordinary inte-

gral by choosing a parameterization γ(τ) for the curve γ andwriting

γ

ω =

∫ τ2

τ1

(ω γ) dτ. (1.16)

But the result does not depend on the choice of the parame-terization. A line integral

γω is defined purely through split-

ting of the curve γ into small segments, and so is a geomet-rically defined quantity independent of the coordinate repre-sentation of γ.The calculus of 1-forms is compatible with the change ofvariables under the integral. For instance, consider a one-dimensional space and an ordinary integral

∫f(x)dx. After

a change of variable x = X(φ), the integral becomes

f(x)dx =

f(X(φ))dX

dφdφ.

This corresponds precisely to the chain rule for the differentialoperator d,

f(x)dx = f(x)dX(φ) = f(x)X ′(φ)dφ.

So it is consistent to interpret the expression f(x)dx within anordinary integral as a 1-form f(x)dx.Let us now consider two-dimensional integrals (surface in-tegrals). An integral over a surface with local coordinatesx, y is computed as a repeated integral,

∫∫

F (x, y)dxdy =

∫ x2

x1

dx

∫ y2

y1

dy F (x, y). (1.17)

Now we would like to interpret the expression F (x, y)dxdy asan analog and a generalization of the 1-form f(x)dx seen inthe previous example.

17

Page 26: Winitzki. Advanced General Relativity

1 Calculus in curved space

As in the previous example, we can divide the surface of in-tegration into a large numberN of small rectangles with sidesδx and δy. The sides of each rectangle are apprxoimately rep-resented by a pair of tangent vectors uj ,vj, j = 1, ..., N . Arectangle number j located at the point xj , yj gives a smallcontribution F (xj , yj)δxδy to the integral. It is clear that thenumber F (xj , yj)δxδy can be understood as the evaluation ofa bilinear form ω acting on tangent vectors uj ,vj, i.e.

F (xj , yj)δxδy ≡ ω (uj ,vj) .

The bilinear form ω is called a 2-form since it has two vectorarguments. Then the surface integral is defined as the limit

∫∫

Fdxdy = limN→∞

N∑

j=1

ω (uj ,vj) .

The 2-form ω can be temporarily written as ω = Fdxdy. Letus now explore its properties under the change of variables inthe integral.We first perform a simple change of variables, x, y →

x, φ, where we replace y by y = Y (x, φ) and Y is a fixedfunction. To transform the integral, it is convenient to use theformula (1.17). Since the variable x is held fixed within theintegration over y, we find

∫ x2

x1

dx

∫ y2

y1

dy F (x, y) =

∫ x2

x1

dx

∫ Φ2(x)

Φ1(x)

dφF (x, Y )∂Y (x, φ)

∂φ.

Hence, if we wish to interpret the expression Fdxdy consis-tently as a bilinear form, we must adopt the rule

dxdY (x, φ) =∂Y (x, φ)

∂φdxdφ.

However, according to the calculus of 1-forms, we have

dY (x, φ) =∂Y

∂xdx+

∂Y

∂φdφ

and (assuming that the product dxdy is bilinear)

dxdY (x, φ) = dx

(∂Y

∂xdx+

∂Y

∂φdφ

)

.

Therefore, this calculus is consistent with the properties oftwo-dimensional integration only if we assume that

dxdx = 0.

Now it is clear that the 2-formdxdymust be a rather specialkind of product of 1-forms dx and dy. This product is calledthe exterior product (also wedge product) and is denoted bythe symbol ∧; so one writes dx ∧ dy rather than dxdy. Theproperty

dx ∧ dx = 0

together with bilinearity means that the exterior product isantisymmetric. To see this, consider

d (x+ y) ∧ d (x+ y) = 0 ⇒ dx ∧ dy + dy ∧ dx = 0.

Let us now check the consistency of a more general changeof variables, x, y → φ, ψ where x = X(φ, ψ) and y =

Y (φ, ψ). Calculations are straightforward because the onlynew rule is the antisymmetry of the exterior product. We find

dx ∧ dy = dX(φ, ψ) ∧ dY (φ, ψ)

=

(∂X

∂φdφ+

∂X

∂ψdψ

)

∧(∂Y

∂φdφ+

∂Y

∂ψdψ

)

=

(∂X

∂φ

∂Y

∂ψ− ∂X

∂ψ

∂Y

∂φ

)

dφ ∧ dψ, (1.18)

where we used the antisymmetry properties

dφ ∧ dφ = dψ ∧ dψ = 0, dφ ∧ dψ + dψ ∧ dφ = 0,

to get the last line. Now we may recognize Eq. (1.18) asthe standard formula for the change of variables, involvingthe two-dimensional Jacobian of the transformation x, y →φ, ψ. In this way, we can see that the unusual rule of theexterior product is a natural consequence of the known prop-erties of multiple integration.6

The geometric formulation of the surface integral is there-fore the integral of a 2-form over a surface, written as

A

ω,

where A is a surface (or part of a surface) and ω is a 2-form.The same formulation applies to integrals over surfaces em-bedded in a higher-dimensional space. Consider another ex-ample familiar from physics: the flux of a magnetic field B

through a surface A is computed as a surface integral

Φ =

A

B · dS,

where dS is understood as a vector-valued “infinitesimal sur-face element.” The flux integral is usually written in compo-nents, B ≡ Bx, By, Bz, as follows:

A

(Bxdydz +Bydxdz +Bzdxdy) .

However, this notation needs to be supplemented by a com-plicated prescription for the orientation of the surface ele-ments. Integrals of this kind become more transparent in thenotation of 2-forms. One introduces the 2-form

β ≡ Bxdy ∧ dz +Bydz ∧ dx+Bzdx ∧ dy

and expresses the flux Φ as

Φ =

A

β.

The (somewhat complicated) rules for changing variables insuch integrals, as well as the orientation prescriptions, are au-tomatically reproduced by the calculus of differential forms.The rules of this calculus are simple: one needs to use thechain rule for d and the antisymmetry of ∧.Similarly, one can treat a three-dimensional integral,

∫∫∫

F (x, y, z)dxdydz,

6However, note that we obtained the Jacobian itself rather than its absolutevalue, which one has in formulas involving the area of domain. This isconsistent because integrals of 2-forms are defined on oriented surfacesand are not merely integrals of the ordinary, unsigned area. Reversingthe orientation of the surface will reverse the sign of the integral.

18

Page 27: Winitzki. Advanced General Relativity

1.4 Calculus of differential forms

as an integral of the 3-form F (x, y, z)dx∧dy∧dz over a three-dimensional manifold (or part of a manifold). The same ruleswill provide the correct formulas under any changes of vari-ables in the integral. Collectively, n-forms (n = 1, 2, 3, ...) arecalled differential forms since they represent expressions in-volving the differentials d.

After this motivation, we proceed to summarize the maindefinitions and properties of differential forms. We begin bystudying totally antisymmetric tensors.

1.4.3 Antisymmetric tensors

An n-form ω is a totally antisymmetric map from sets of nvector fields to numbers, ω(v1, ...,vn) ∈ R, which is linear ineach of its vector arguments vj . In other words, an n-form isa totally antisymmetric tensor of rank (0, n). The equivalentnotations ω(v1, ...vn) ≡ ω (v1, ...,vn)will be used for conve-nience or clarity.

When considering vector fields on manifolds, one dealswith vectors v|p defined in each tangent space TpM. So itis natural to consider n-forms ω(v1, ...,vn; p) defined in eachtangent space TpM, i.e. “n-formfields.” These “n-formfields”are called differential forms or simply n-forms, just as wesometimes call vector fields simply vectors.In a local coordinate system xα, tangent vectors are rep-resented by components vα and n-forms are represented bymulti-indexed arrays of components, ωαβ...γ, such that

ω(v1, ...,vn) = ωαβ...γvα1 v

β2 ...v

γn.

The array of components ωαβ...γ is totally antisymmetric inall indices. We use the index-free notation in which n-formsare represented explicitly through basis 1-forms such as dx.To express the antisymmetry of an n-form, one writes dx ∧dy, where ∧ denotes the special antisymmetric product to beintroduced now.The exterior product (also called thewedge product) of twoforms, say anm-form ω1 and an n-form ω2, is an (m+n)-formω1 ∧ ω2, defined by a total antisymmetrization of products ofω1 and ω2,

(ω1 ∧ ω2) (v1, ...,vm+n) ≡1

m!n!

σ

(−1)|σ|

×ω1

(vσ(1), ...,vσ(m)

)ω2

(vσ(m+1), ...,vσ(m+n)

),

(1.19)

where the sum goes over all the permutations σ of the set1, 2, ...,m+ n, and |σ| = 0, 1 is a function showing whetherthe permutation σ is even or odd. Note that the factorsm!n! inEq. (1.19) will cancel due to the total antisymmetry of ω1 andω2. For instance, if θ is a 1-form and ω is a 2-form then

(θ ∧ ω) (x,y, z) = θ(x)ω(y, z) + θ(y)ω(z,x)

+ θ(z)ω(x,y).

Similarly, if η, θ, and ω are 1-forms then

(η ∧ θ) (x,y) = η(x)θ(y) − η(y)θ(x) = det

∣∣∣∣

η(x) η(y)θ(x) θ(y)

∣∣∣∣,

(η ∧ θ ∧ ω) (x,y, z) = det

∣∣∣∣∣∣

η(x) η(y) η(z)θ(x) θ(y) θ(z)ω(x) ω(y) ω(z)

∣∣∣∣∣∣

.

The components of η ∧ θ in a local coordinate system xαare

(η ∧ θ)αβ = ηαθβ − θαηβ ,

where ηα and θα are the components of the 1-forms η and θ.

Example: Consider a local coordinate system x, y, z andthe 1-forms η = ydx − 2dz, θ = xdy + ydz. The exteriorproduct of η and θ is computed as follows,

η ∧ θ = (ydx− 2dz) ∧ (xdy + ydz)

= xydx ∧ dy + y2dx ∧ dz + 2xdy ∧ dz

(we used dy∧dz = −dz∧dy). Let us now compute how the 2-form η∧θ acts on a pair of vectorsu = 5∂x−y∂z, v = ∂x+3z∂y.We have dy u = 0 and dz v = 0, which simplifies thecalculations:

(dx ∧ dy) (u,v) = (dx u) (dy v) = 15z,

(dx ∧ dz) (u,v) = − (dz u) (dx v) = y,

(dy ∧ dz) (u,v) = − (dz u) (dy v) = 3yz.

Finally, we compute

(η ∧ θ) (u,v) = xy · 15z + y2 · y + 2x · 3yz = 21xyz + y3.

A useful property is that the exterior product of linearly de-pendent 1-forms vanishes.

Statement 1.4.3.1: If ω1, ..., ωk are some 1-forms, then ω1∧...∧ωk 6= 0 iff the set of the 1-forms ωj is linearly independent.

Proof of Statement 1.4.3.1: If the set ωj is linearly depen-dent then we can express e.g. ω1 through other ωj , and theexterior product vanishes due to antisymmetry (ωj ∧ ωj = 0).If the set ωj is linearly independent then it is either alreadya basis or can be completed to a basis ω1, ..., ωk, ..., ωn in then-dimensional space of 1-forms. So the exterior product is (bydefinition) an alternating linear combination of tensor prod-ucts of basis 1-forms,

ω1 ∧ ... ∧ ωk =∑

σ

(−1)|σ|ωσ(1) ⊗ ...⊗ ωσ(k),

where the sum is over all permutations σ of the set 1, ..., k.Since the set of all basis tensors ωj1 ⊗ ...⊗ ωjk is by defini-tion linearly independent, the linear combination cannot van-ish.

1.4.4 *Oriented volume and n-vectors

In Sec. 1.4.1, we have introduced the notion of orientedarea and oriented volume. Now we develop a more gen-eral picture in which certain antisymmetric tensors representn-dimensional volumes embedded in N -dimensional space,where dimensions are arbitrary and N ≥ n.By analogy with the exterior product of forms, one can de-fine the exterior product of vectors. For instance, the exteriorproduct of two vectors is the antisymmetric tensor defined by

x ∧ y ≡ x ⊗ y − y ⊗ x; (x ∧ y)αβ = xαyβ − xβyα.

One can consider the vector space of linear combinations ofsuch exterior products; this is the space of antisymmetric ten-sors of rank (2,0). Similarly, the exterior product of n vectors

19

Page 28: Winitzki. Advanced General Relativity

1 Calculus in curved space

is an antisymmetric tensor of rank (n,0). Such totally antisym-metric tensors are sometimes called n-vectors ormultivectors;for instance, x ∧ y is called a bivector.

Using this construction, one can reinterpret an n-form as alinear map from n-vectors into numbers. For example, a 2-form ω acts on a bivector x ∧ y as follows,

ω(x ∧ y) = ω(x ⊗ y − y ⊗ x) = ω(x,y) − ω(y,x) = 2ω(x,y).

Notice the appearance of an extra factor 2. In the case of n-forms, this extra factor will be n! since we will need to sumover n! possible permutations of n vectors.

Remark: Some textbooks use extra factors of n!when defin-ing the exterior product of n 1-forms; for instance, one coulddefine (η ∧ θ) (x,y) = 1

2 (η(x)θ(y) − η(y)θ(x)). Then extrafactors of n!will appear in some equations but disappear fromsome other equations. These factors play no essential role,i.e. they are cosmetic (they merely make some equations bet-ter looking). Of course, one must keep track of the extra fac-tors when one uses formulas from different books.

An n-vector a1 ∧ ...∧ an can serve as a representation of then-dimensional volume of a parallelepiped spanned by a set ofn given vectors a1, ...,an in anN -dimensional space (whereN ≥ n). Statement 1.4.4.1 shows that the (oriented) volumesof two such parallelepipeds spanned by the sets a1, ...,anand b1, ...,bn are equal iff the two corresponding n-vectors,a1 ∧ ... ∧ an and b1 ∧ ... ∧ bn, are equal. This is proved by ge-ometric arguments similar to those in Sec. 1.4.1, and by usingan n-dimensional generalization of Fig. 1.9. Note that an anal-ogon of Statement 1.4.3.1 holds also for vectors: an n-vector isnonzero iff the set of its constituent vectors is linearly inde-pendent.

Statement 1.4.4.1: We consider an n-dimensional space Rn.(a) Every n-vector a1 ∧ ... ∧ an is proportional to every othern-vector. (b) The ratio of the oriented volumes of two n-dimensional parallelepipeds spanned by the sets a1, ...,anand b1, ...,bn are related by the same proportionality factoras the two corresponding n-vectors, a1∧...∧an and b1∧...∧bn.(Proof on page 168.)

The ordinary (scalar) volume of the parallelepiped spannedby a1, ...,an can be calculated as a suitably defined “abso-lute value” or the norm of the n-vector a1 ∧ ... ∧ an,

Vol (a1, ...,an) =

|a1 ∧ ... ∧ an|2.

The norm |a1 ∧ ... ∧ an| is defined in the usual mannerthrough the scalar product on the space of n-vectors, which isin turn defined through the scalar product in the vector space.For instance, the area of a parallelogram spanned by two vec-tors a,b in an N -dimensional Euclidean space RN is calcu-lated as follows. The “oriented area” of the parallelogram isrepresented by the bivector a ∧ b. Using the standard Carte-sian basis e1, ..., eN in RN , one can decompose this bivectorinto basis bivectors ei ∧ ej, where 1 ≤ i < j ≤ N , withsome coefficients Aij ,

a ∧ b =∑

1≤i<j≤N

Aijei ∧ ej .

The scalar product is then defined in the space of bivectorsby postulating that the bivectors ei ∧ ej constitute an or-thonormal basis. The scalar product of two arbitrary bivectors

is then computed as

i,j

Aijei ∧ ej

·

i,j

Aijei ∧ ej

=∑

i,j

AijBij

(all the sums are taken over 1 ≤ i < j ≤ N ). Hence, theordinary (scalar) area of the parallelogram is

Area (a,b) =

√∑

1≤i<j≤N

|Aij |2.

1.4.5 Determinants

A standard definition of the determinant of a square n × nmatrix Aij is a formula involving the matrix elements:

det (Aij) ≡∑

σ

(−1)|σ|A1σ(1)...Anσ(n),

where the sum is performed over all transpositions σ. A“transposition” is a one-to-one map from the set 1, 2, ..., nto the same set. The quantity |σ| is defined as 0 when σ is aneven transposition (equivalent to an even number of pair in-terchanges) and 1 if σ is an odd transposition. This formulais explicit but complicated. The geometric meaning and theproperties of the determinant are much more apparent if oneadopts a different (but equivalent) definition.

We define the determinant det T of a linear transforma-tion T in an n-dimensional Euclidean space Rn as the vol-ume of the image of the unit cube under the transformation

T . In other words, we select an orthonormal basis e1, ..., enand transform it using T , obtaining the vectors Te1, ..., Ten.By definition, det T is equal to the oriented volume of the n-dimensional parallelepiped spanned by these n vectors.A little work using the concept of multivectors shows thatthe the initial set of vectors e1, ..., en does not actually needto be an orthonormal basis; this is desirable, since we can thendefine the determinant of a transformation T without the ne-cessity to have a scalar product in the vector space.

Statement 1.4.5.1: Let v1, ...,vn be n arbitrary vectors;denote by Vol (v1, ...,vn) the oriented volume of the paral-lelepiped spanned by these vectors. Consider also the paral-

lelepiped spanned by the transformed vectors Tv1, ..., Tvn.Then the transformed volume is proportional to the original

volume with the factor det T ,

Vol(Tv1, ..., Tvn) = Vol (v1, ...,vn) · det T .

This factor is independent of the choice of the vectors vj. Asimilar relationship holds for the corresponding multivectors,

Tv1 ∧ Tv2 ∧ ... ∧ Tvn = (v1 ∧ v2 ∧ ... ∧ vn) det T .

(Proof on page 168.)

Due to Statement 1.4.5.1, we can reformulate the defini-tion of the determinant as follows: The determinant of T isequal to the factor that multiplies the volume of an arbitrary

n-dimensional parallelepiped after the transformation T .Using this definition, it is easy to derive the fundamentalproperty of determinants: the determinant of a product of twotransformations is equal to the product of the two determi-nants,

det(AB) = (det A)(det B).

20

Page 29: Winitzki. Advanced General Relativity

1.4 Calculus of differential forms

This property is a simple consequence of the fact that the vol-

ume of every parallelepiped transformed by A is multiplied

by det A.Finally, let us prove an important relationship involvingvolume in Euclidean space.

Statement 1.4.5.2: The volume of a parallelepiped spannedby vectors v1, ...,vn in a Euclidean space Rn is

Vol (v1, ...,vn) =√

det gij , gij ≡ vi · vj ,

where vi · vj is the scalar product of two vectors.

Proof of Statement 1.4.5.2: Let us consider an orthonor-mal basis ej in this space. The volume of a parallelepipedspanned by ej is, by definition, equal to 1. Since ej is a ba-sis, there exists a linear transformation T that brings ej intothe given set of vectors vj. By Statement 1.4.5.1, the volumeof the parallelepiped spanned by vj is equal to det T , so itremains to compute that determinant. Let us introduce the

components of the transformation T in the basis ej,

vj ≡ Tej =∑

k

Tjkek.

Since ek · el = δkl, we can express the matrix gij as

gij = vi · vj =

(∑

k

Tikek

)

·(∑

l

Tjlel

)

=∑

k

TikTjk.

It follows that the matrix gij is equal to the product of twomatrices Tij . Therefore, the determinant of gij is found as

det gij = (detTij) (detTji) = (detTij)2,

where we used the fact that the determinant of the transposedmatrix is the same as the determinant of the original matrix.Hence,

Vol (v1, ...,vn) = detTij =√

det gij ,

which is the desired formula.

1.4.6 Differential forms

Let us now consider differential forms, i.e. n-forms definedin the tangent space TpM at every point p of a manifoldM.An example of a differential form written in local coordinatesx, y, z is

ω = x2dy ∧ dz − 3y3dx ∧ dy(in this example, ω is a 2-form).For scalar functions (“0-forms”) f , the 1-form df is definedby

(df) v ≡ v f.So the notation “dx” itself can be understood as the differ-ential operator d acting on the function x. It follows thatdf(x) = f ′(x)dx, which is a familiar rule of calculus (the chainrule). For this reason, the notation dx for 1-forms is conve-nient.The operator d can be generalized to a linear operator actingon n-forms and producing (n+1)-forms; this operator is thencalled the exterior differential. So the exterior differential ofan n-form ω is an (n+ 1)-form written as dω.The exterior differential d can be defined either explicitly(see Eq. (1.20) below), or through the properties it satisfies.

These properties are the following. For any forms ω1 and ω2

and scalar functions f we have the Leibnitz-like rule

d (fω1) = df ∧ ω1 + fdω1,

d(ω1 ∧ ω2) = (dω1) ∧ ω2 + (−1)n1ω1 ∧ dω2,

where ω1 is an n1-form. Heuristically, one gets n sign changeswhen one “pulls d through an n-form.”The second fundamental property of the exterior differen-tial is d (dω) = 0 for any n-form ω. This property can beunderstood heuristically as follows: “d d” is symmetric withrespect to the exchange of d with d; each “d” adds a vectorargument into the form, but the result must be a totally anti-symmetric form. So we must have d d = 0.Due to the simplicity of these rules, calculations with differ-ential forms are straightforward.

Example: Consider 1-forms ω1 = x2dy and ω2 = xydx+2dy.The exterior differential of ω1 is

dω1 = d(x2dy

)= 2xdx ∧ dy + x2d (dy) = 2xdx ∧ dy,

because d (dy) = 0. Let us compute dω2:

dω2 = d(xy) ∧ dx+ xyd (dx) + 2d (dy)

= ydx ∧ dx+ xdy ∧ dx= −xdx ∧ dy,

because dx∧dx = 0 and dy ∧ dx = −dx∧dy due to antisym-metry. Consider now the differential of dω2,

d (dω2) = d (−xdx ∧ dy) = −dx ∧ dx ∧ dy = 0.

Another example calculation is

d (xydx ∧ dz) = d (xy) ∧ dx ∧ dz = −xdx ∧ dy ∧ dz.

A rather cumbersome but explicit formula can be derivedfor the differential dω of an arbitrary n-form ω. The (n + 1)-form dω acts on vectors v1, ..., vn+1 as follows,

(dω) (v1, ...,vn+1) ≡n+1∑

s=1

(−1)s−1vs ω(v1, ..., vs, ...,vn+1)

+∑

1≤r<s≤n+1

(−1)r+s−1ω([vr ,vs],v1, ..., vr, ..., vs, ...,vn+1),

(1.20)

where the hat over a vector, vs, indicates the absence of thevector vs among the listed arguments of ω. As an example ofusing this formula, consider a 1-form ω, then dω is the 2-formdefined by

(dω) (x,y) ≡ x (ω(y)) − y (ω(x)) − ω([x,y]). (1.21)

It is straightforward to check that this formula actually definesa bilinear map dω that does not contain derivatives of x or y.

Statement 1.4.6.1: Equation (1.21) defines a bilinear form dωdespite the apparent presence of derivatives of x and y.

Proof of Statement 1.4.6.1: Obviously the formula (1.21) de-fines an antisymmetric function of (x,y). So it is sufficient toshow that

(dω) (λx,y) = λ(dω) (x,y)

21

Page 30: Winitzki. Advanced General Relativity

1 Calculus in curved space

when λ is a scalar function of a point. Once this is proved, itwill follow that dω does not contain derivatives of x or y.The remaining part of the proof is thus a straightforwardcalculation:

(dω) (λx,y) = λx ω(y) − y (λω(x)) − ω([λx,y])

= λ (x ω(y) − y ω(x) − ω([x,y]))

− (y λ)ω(x) + ω ((y λ)x)

= λ(dω) (x,y).

Here we used Statement 1.3.1.1 to express [λx,y].

There is a useful relationship between the Lie derivativeand the exterior differential. For convenience of notation, oneintroduces the insertion operation ιv (also called the interiorproduct) that “inserts” the vector v into n-forms as the firstargument,

(ιvω) (v1, ...,vn−1) ≡ ω (v,v1, ...,vn−1) .

The insertion operator produces (n− 1)-forms out of n-forms.The Lie derivative can now be expressed through ι and d. Tosee how, let us first compute Lvω for some 1-form ω and vec-tor field v. The 1-form Lvω acts on an arbitrary vector u as

(Lvω) u = Lv (ω u) − ω (Lvu) = v (ω(u)) − ω([v,u]).

Let us compare this with the formula (1.21) for dω; by inspec-tion, we notice that

(Lvω) u = (dω) (v,u) + u (ω(v)) .

Using the insertion operator ιv, we can rewrite this as

(Lvω) u = (ιv (dω)) u + u (ιvω) .

Finally, note that ιvω is a scalar function, so

u (ιvω) ≡ (d (ιvω)) u.

Thus(Lvω) u = (ιv (dω) + d (ιvω)) u.

Omitting the arbitrary vector u, we obtain the Cartan homo-topy formula

Lvω = d (ιvω) + ιv (dω) ,

written also more concisely as

Lv = dιv + ιvd. (1.22)

This formula holds not only for 1-forms, but also for arbitraryn-forms ω; note that the exterior differential d brings n-formsinto (n+ 1)-forms, while Lv does not change the number ofarguments of n-forms. The Cartan homotopy formula for n-forms can be derived directly from the explicit definitions ofdω and ιv and the Leibnitz property of Lv, for instance byshowing that ιvdω = Lvω − dιvω. See also e.g. the book [33],§4.5.The following properties also hold for arbitrary n-form ω,

n′-form ω′, and vector fields v,x:

ιx (ω ∧ ω′) = (ιxω) ∧ ω′ + (−1)nω ∧ ιxω′,

(Lvd)ω = (dLv)ω,

Lv (ω ∧ ω′) = (Lvω) ∧ ω′ + ω ∧ Lvω′,

(Lvιx)ω = ι[v,x]ω + ιxLvω.

These properties can be verified by straightforward computa-tions which we omit.

1.4.7 *Canonical decomposition of 1-forms and2-forms

The material is covered in [2], volume 2, chapter 22, Ap-pendix; see also [16], chapter 5.

In this section I review some classical results obtained inthe theory of differential forms. The main statements are thefollowing.With a suitable choice of local coordinates x1, ..., xn in asuitable domain of an n-dimensional manifold:

• A given 1-form ω can be expressed in one of two ways:either as

ω = x1dx2 + ...+ x2k−1dx2k,

or as

ω = x1dx2 + ...+ x2k−1dx2k + dx2k+1.

• (The Darboux theorem) A given closed 2-form Ω can beexpressed as

Ω = dx1 ∧ dx2 + ...+ dx2k−1 ∧ dx2k.

• (The Poincaré lemma) Any closed n-form ω is locally ex-act: if dω = 0, there exists an (n− 1)-form θ such thatω = dθ in some domain (but perhaps not in the entiremanifold).

* Canonical decomposition of closed 2-forms (Darboux the-orem)We consider an n-dimensional smooth manifold. In a localcoordinate system x1, ..., xn, an arbitrary 2-form Ω can beexpressed as

Ω =1

2

n∑

i,j=1

Ωij(x1, ..., xn)dxi ∧ dxj , (1.23)

where Ωij = −Ωji is an x-dependent matrix. However, thecoordinate system may be chosen so that a particular 2-formis written in a simpler way. For example, a closed 2-form

A = dx1 ∧ dx2 + x3dx1 ∧ dx3

can be rewritten as

A = dx1 ∧ d(

x2 +1

2x2

3

)

= dx1 ∧ dy2, y2 ≡ x2 +1

2x2

3.

In this example, a particular choice of the new coordinatesx1, y2, x3, ..., xn reduces the 2-form A to a simpler expres-sion involving just two of the coordinates.It is useful to know how many different coordinates areneeded to represent a given 2-form Ω in the simplest possi-ble way. Of course, it is also useful to be able to determine thenew coordinates if a closed 2-form Ω is specified in a givencoordinate system.The Darboux theorem (proved below) says that for anyclosed 2-form Ω a suitable local coordinate system x1, ..., xncan be found such that Ω is locally (i.e. within a certain do-main) expressed as

Ω = dx1 ∧ dx2 + dx3 ∧ dx4 + ...+ dx2k−1 ∧ dx2k, (1.24)

where k ≤ n/2 is some number.

22

Page 31: Winitzki. Advanced General Relativity

1.4 Calculus of differential forms

It is advantageous to find such canonical local coordinatesbecause then calculations with Ω are much easier. The re-quired number 2k ≤ n of different coordinates in the decom-position (1.24) is called the rank of Ω.If a closed 2-form Ω is given in a certain coordinate system,can one determine the rank of Ωwithout knowing the canoni-cal coordinates (but knowing that they exist)? In the examplewith the 2-formA above, it is easy to guess the correct canon-ical coordinates. Consider another example,

B = dy1 ∧ dy2 + y1dy1 ∧ dy3.

In this case, it is not obvious how to find suitable canoni-cal coordinates x1, x2, x3, ... such that B is expressed as inEq. (1.24).To determine the rank of a given closed 2-form Ω, the fol-lowing trick can be used. One considers exterior powers of Ω,denoted

Ω∧k = Ω ∧ ... ∧ Ω︸ ︷︷ ︸

k times

.

Note that for a 2-formΩ the productΩ∧Ω does not necessarilyvanish. A sufficiently high power of Ω will, of course, vanish.So there will be some number k ≤ n/2 such that

Ω∧k 6= 0, Ω∧(k+1) = 0. (1.25)

From the Darboux theorem we know that the decomposi-tion (1.24) is possible in some coordinates xj. In these co-ordinates, we can compute

Ω∧k = k!dx1 ∧ ... ∧ dx2k 6= 0,

since by assumption all xj are independent coordinates in alocal coordinate system. However,Ω∧(k+1) = 0. Thus the rankof Ω is 2k, where k is the largest integer such that Ω∧k 6= 0 butΩ∧(k+1) = 0. If Ω = 0, we say that its rank is zero.For example, we can immediately see that the 2-form B

specified above has rank 2 because B ∧ B = 0. Thus, weshould expect to find suitable coordinates x1, x2, x3, ... suchthat B = dx1 ∧ dx2. Determining these coordinates in prac-tice may be far from easy; a possible method of finding xjis given in the proof of the Darboux theorem below.

Remark: The rank of a 2-form may be different at differentpoints of the manifold. Presently, we assume that we will bedealing only with such 2-forms whose rank remains constantlocally, i.e. throughout some neighborhood of some point.

The maximal rank of a 2-form on an n-dimensional mani-fold is n (for even n) or n−1 (for odd n). For a 2k-dimensionalmanifold, a 2-form Ω can be nondegenerate if the matrix Ωijis invertible (has nonzero determinant). This condition canbe expressed in the following way: for any nonzero vector xthere exists at least one vector y such thatΩ(x,y) 6= 0; in otherwords, the 1-form ιxΩ is a nonvanishing 1-form. Equivalently:if ιxΩ = 0 for some vector x then x = 0.

Statement 1.4.7.1: For a 2k-dimensional manifold, a 2-formΩ of maximal rank 2k is nondegenerate.

Proof: Suppose a vector x is such that the 1-form ιxΩ van-ishes, ιxΩ = 0. We would like to prove that x = 0. It followsfrom ιxΩ = 0 that

ιx(Ω∧k

)= k (ιxΩ) ∧ Ω∧(k−1) = 0.

The 2k-form Ω∧k is nonzero since Ω has rank 2k, thereforeΩ∧k is proportional to the volume form dx1 ∧ ...∧dx2k with anonzero coefficient. It follows that

ιx (dx1 ∧ ... ∧ dx2k) = 0.

Writing x =∑nj=1 aj∂/∂xj , it is straightforward to see that all

the coefficients aj must vanish. Therefore, x = 0.

Assuming that the Darboux theorem is true, it is easy to seethat a nondegenerate 2-form always has maximal rank. If Ωdoes not have maximal rank, the canonical decomposition ofa 2-form Ω is free of some of the coordinates, say of xn. Itfollows that the vector v ≡ ∂/∂xn satisfies ιvΩ = 0. Hence, Ωis degenerate.An algebraic version of the Darboux theorem is the follow-ing.

Statement 1.4.7.1: An arbitrary antisymmetric bilinear form(2-form) B can be expressed as

B = θ1 ∧ θ2 + ...+ θ2k−1 ∧ θ2k, (1.26)

where θj are some suitable 1-forms. The set θj of these 1-forms is linearly independent. The smallest required number(2k) of these n-forms is the rank of B, which is equal to therank of the matrix Bij in any basis.

Proof: The choice of a cnonical basis for an antisymmetricbilinear form B is a standard task of finite-dimensional lin-ear algebra. We consider an n-dimensional vector space. IfB = 0, the rank is zero and the statement is proved. If B 6= 0,there exists a vector e1 such that the 1-form ιe1B is nonzero.Thus there exists another vector e2 such that B(e1, e2) = 1.Since B is antisymmetric, the vectors e1 and e2 cannot beparallel to each other, thus they can be completed to a basise1, e2, e3, ..., en. We can choose the other basis vectors e3,..., en such that they are orthogonal to e1 and e2 with respectto B, i.e.

B(e1, ej) = B(e2, ej) = 0, j = 3, ..., n.

This choice is always possible because we may add to each ej ,j = 3, ..., n a suitable linear combination of e1 and e2,

ej = ej −B(e1, ej)e2 +B(e2, ej)e1,

so that now B(e1, ej) = B(e2, ej) = 0 for every j = 3, ..., n.For brevity, let us denote the resulting basis again by ej. Theresult of this construction is that the two subspaces spannedby e1, e2 and by e3, ..., en are orthogonal to each otherwith respect to the bilinear form B.After choosing the basis e1, ..., en in this way, we computethe dual basis 1-forms θ1, ..., θn for the basis e1, ..., en andrepresent the 2-form B as

B = θ1 ∧ θ2 +B(1),

where the new 2-form B(1) is equal to zero on e1, e2. Hence,it is sufficient to consider the 2-form B(1) within the (n− 2)-dimensional subspace spanned by e3, ..., en. We can nowapply the same construction to B(1) in a smaller number ofdimensions; either B(1) = 0, or we can find θ3 and θ4 suchthat B(1) = θ3 ∧ θ4 + B(2), etc. The construction eventuallystops because at every step the dimensionality of the space isreduced by two. At that point, we will obtain a decomposition

23

Page 32: Winitzki. Advanced General Relativity

1 Calculus in curved space

of the form (1.26), where all 1-forms θj are (by construction) alinearly independent set.In the basis ej, the 2-form B is represented by the n × nmatrix

0 1−1 0 0 0

. . .

0 0 1−1 0

0

0. . .

0

,

in which the lower right (n− 2k) × (n− 2k) block is zero.It is straightforward to verify that the rank of the 2-form

B is indeed 2k according to the definition (1.25) of the rank.The 2k-form B∧k is nonzero due to the linear independenceof θj,

B∧k = k! θ1 ∧ ... ∧ θ2k,while B∧(k+1) = 0.

The difference between the algebraic and the differential-geometric versions of the Darboux theorem is the following.According to the algebraic theorem just proved, one may re-duce any 2-form Ω to a canonical decomposition at any onepoint p. This decomposition will involve a certain choice ofthe 1-forms θj |p at each p. However, this procedure willgenerally generate different sets θj|p in tangent spaces atdifferent points p, so there will be no local coordinate sys-tem xj in which the 1-forms θj are equal to the coordi-nate 1-forms dxj at every point p. Using the algebraic ver-sion of the Darboux theorem, one could obtain the decompo-sition (1.24) at any one point p but not at neighboring points.The differential-geometric Darboux theorem says that one canactually choose a local coordinate system in which the 2-formΩ has a canonical decomposition (1.24) at every point withinsome domain.

* Proof of the Darboux theoremThe Darboux theorem states that a closed 2-form Ω of localrank 2k in a n-dimensional manifold can be written as

Ω = dx1 ∧ dx2 + ...+ dx2k−1 ∧ dx2k

in a suitable local coordinate system. This theorem is provedin three steps.The first step is to show that there exists a local coordinatesystem x1, ..., xn such that Ω depends only on the first 2kcoordinates x1, ..., x2k, i.e.

Ω =1

2

2k∑

i,j=1

Ωij(x1, ..., x2k)dxi ∧ dxj .

Then it is sufficient to consider the 2-form Ω on a 2k-dimensional submanifold where the remaining coordinatesx2k+1, ..., xn have fixed values.The first step can be proved as follows. If a 2-form Ω 6= 0has rank 2k < n then, by Statement 1.4.7.1, at every point thereexists a decomposition of the form (1.26). It follows that in thetangent space at every point p, there will exist a nonemptysubspace of vectors v such that ιvΩ = 0. Since the 2-formΩ is smooth, this subspace will also vary smoothly betweenpoints. So at least one smooth vector field v 6= 0 can be chosensuch that ιvΩ = 0 at every point within some domain. A local

coordinate system x1, ..., xn can be chosen such that v =∂/∂xn. Then the property ιvΩ = 0 means that the expressionfor Ωmay contain dx1, ..., dxn−1 but no dxn. Further, we havedΩ = 0 and thus (by the Cartan homotopy formula)

LvΩ = (dιv + ιvd)Ω = 0.

The property LvΩ = 0 means that the coefficientsΩij(x1, ..., xn) in the decomposition (1.23) do not actually de-pend on the coordinate xn (see Statement 1.3.2). Thus, it issufficient to study the restriction of the 2-formΩ to an (n− 1)-dimensional submanifold of constant xn. The same construc-tion can be applied to the reduced 2-form Ω. If the rank ofΩ is smaller than n − 1, anther coordinate xn−1 can be foundsuch that Ω does not indolve dxn−1 and the coefficients Ωijdo not depend on xn−1. Thus we will reduce Ω to an (n− 2)-dimensional manifold, etc. Each time, the dimension of themanifold is reduced by one, until we obtain a 2-form Ω ofrank 2k on a 2k-dimensional manifold, i.e. a 2-form of max-imal rank. A 2-form of maximal rank is nondegenerate (State-ment 1.4.7.1). Thus it is sufficient to prove the Darboux theo-rem only for nondegenerate 2-forms.The second step is to prove that there exists a local coordi-nate system x1, ..., xn such that a given closed, nondegen-erate 2-form Ω has constant coefficients Ωij in the coordinatesxj,

Ω =1

2

2n∑

i,j=1

Ω(0)ij dxi ∧ dxj , Ω

(0)ij = −Ω

(0)ij = const.

This involves a differential-geometric argument detailed be-low. The third step is to reduce the nondegenerate, antisym-

metric 2n× 2nmatrix Ω(0)ij to the canonical form

0 1−1 0 0

. . .

0 0 1−1 0

.

The reduction is performed through a change of basis. Ac-cording to Statement 1.4.7.1, a basis e1, ..., e2n can be chosensuch that the matrix Ω

(0)ij assumes the canonical form shown

above (there is no zero block since Ω(0)ij has maximal rank).

It remains to perform the second step. We fix a point p atwhich

Ω|p =1

2

2n∑

i,j=1

Ω(0)ij dxi ∧ dxj ,

and define a new 2-form Ω0 at all other points by the formula

Ω0 =1

2

2n∑

i,j=1

Ω(0)ij dxi ∧ dxj .

By construction, Ω = Ω0 at p, while Ω0 has “constant coeffi-cients” in the given coordinate system. Now we would liketo find a change of coordinates in a neighborhood of p suchthatΩ has constant coefficients in the new coordinates. This isequivalent to finding a diffeomorphism f : M → M such thatf(p) = p and f(Ω) = Ω0. Instead of finding f directly, we willfind a one-parametric flow of diffeomorphisms ft, t ∈ [0, 1],such that f0 = id (the identity map) and f1 = f is the map weneed.

24

Page 33: Winitzki. Advanced General Relativity

1.4 Calculus of differential forms

We would like to find a map ft that transforms Ω into a 2-form Ωt which is “between” Ω and Ω0. The first trick is towrite such an interpolation explicitly, i.e. we define

Ωt ≡ Ω + t (Ω0 − Ω) .

The 2-form Ω0 −Ω is closed and thus, by the Poincaré lemma,there exists a 1-form θ such that

Ω0 − Ω = dθ

(perhaps, the above will hold only in a smaller star-shapedneighborhood of the initial point p). The second trick is todescribe the flow ft through a tangent vector field vt. Thevector field vt will be obtained by the duality map from the1-form θ, using the 2-form Ωt as the “metric.” Converting a 1-form into a vector field is possible only if Ωt is nondegenerateat every point and for every t ∈ [0, 1]. We postpone the proofof the nondegeneracy of Ωt, and presently we assume that wehave a vector field vt satisfying

ιvtΩt = θ, t ∈ [0, 1] .

We now define the flow ft as the flow generated by the (time-dependent!) vector field vt. By construction, the flow ft trans-formsΩ into a t-dependent 2-formΩ(t)which satisfies the dif-ferential equation

d

dtΩ(t) = Lvt

Ω(t).

It is then straightforward to verify that the interpolation Ωtsatisfies this equation:

d

dtΩt = Ω0 − Ω = dθ,

LvtΩt = (dιvt

+ ιd)Ωt = dιvtΩt = dθ.

Thus Ωt = Ω(t) and is indeed the result of the transformationof the initial 2-form Ω by the flow ft. The diffeomorphism f1corresponding to t = 1 is the one that transforms Ω into Ω0.It remains to demonstrate that the 2-form Ωt has maximalrank for every t ∈ [0, 1] and at every point within some do-main. We note that Ωt interpolates between Ω and Ω0, whilethe 2-forms Ω and Ω0 coincide at the chosen point p. SinceΩt = Ω at p, it is clear that Ωt at p remains nondegenerate forall t; the present task is to show that Ωt has maximal rank ev-erywhere in some open domain around p. The 2-form Ωt hasmaximal rank 2n iffΩ∧n

t 6= 0. The 2n-formΩ∧nt is proportional

to the volume form dx1 ∧ ... ∧ dx2n, namely

Ω∧nt = ft(x)dx1 ∧ ... ∧ dx2n,

where the coefficient, temporarily denoted by ft, is a functionof the coordinates xj. The rank ofΩt is maximal if ft(x) 6= 0.We know that ft(p) 6= 0 for every t ∈ [0, 1]. Since the functionft is smooth, the domain where ft 6= 0 is a t-dependent openset containing the point p. The intersection of all such do-mains for every t ∈ [0, 1] is again a nonempty open set (pos-sibly smaller than the set where Ω has maximal rank). Thus,we have found an open domain containing the initial point pwhere Ωt is nondegenerate at every t.

* Canonical decomposition of 1-formsThe concepts of rank and canonical coordinates can be de-fined also for 1-forms. Heuristically, the rank of a 1-form isthe smallest number of independent coordinates required to

express this 1-form. This time, we do not limit ourselves toconsidering only closed 1-forms; but let us briefly examinethe case of a closed 1-form. By the Poincaré lemma, a closed1-form ω is locally represented as ω = df using some functionf . The function f will be nonconstant if ω 6= 0. Thus, f canbe used as one of the coordinates in a local coordinate system;so ω can be expressed using just one coordinate. We concludethat a closed, nonzero 1-form has rank 1.Below we will prove that any 1-form ω can be expressed(using suitable local coordinates) in one of two canonicalways: either

ω = x1dx2 + ...+ x2k−1dx2k (1.27)

(then we say that ω has rank 2k), or

ω = x1dx2 + ...+ x2k−1dx2k + dx2k+1 (1.28)

(then ω has rank 2k + 1). We may call the coordinates usedfor such decompositions canonical coordinates of a given 1-form.Knowing that this statement is true, how could we deter-mine the rank of a given 1-form ω, for example,

ω = dx+ xydz,

without knowing the canonical coordinates? We use the fol-lowing trick. Consider the 2-form dω and compute its exterior

powers, (dω)∧k, k = 1, 2, ... Write the following sequence of

the differential forms,

ω, dω, ω ∧ dω, dω ∧ dω, ω ∧ dω ∧ dω, ... (1.29)

The k-th element of this sequence (k = 1, 2, ...) is a k-form.Eventually some element of this sequence will be zero, andthen all the subsequent elements will also be zero. By sub-stituting the decompositions (1.27) and (1.28) into the se-quence (1.29), one finds that the rank of ω is equal to the num-ber of initial nonzero elements in the sequence. The rank ofω is zero if ω = 0; the rank of ω is one if ω 6= 0 but dω = 0;the rank of ω is two if dω 6= 0 but ω ∧ dω = 0; etc. In thisway it is straightforward to compute the rank of a 1-formgiven in some local coordinates. For example, the rank ofω = dx+ xydz is 3 (at points where xy 6= 0) because

ω ∧ dω = xdx ∧ dy ∧ dz 6= 0, dω ∧ dω = 0.

Remark: The Carathéodory theorem is a form of Frobeniustheorem. It states that the set of null curves of a 1-form ω issurface-forming iff the form ω has rank 1 or 2, so that ω ∧dω = 0. (Null curves of ω are curves γ such that ω γ = 0.)This is proved by an explicit construction in local coordinates,showing that a 1-form of rank 3 or higher has non-surface-forming null curves. (A non-surface-forming set of curves isa set such that curves from the set can reach any point in aneighborhood of an initial point.)

* Proof of the canonical decomposition theorem for 1-formsThe canonical decomposition theorem for 1-forms statesthat for any 1-form ω, one can choose a local coordinate sys-tem xj such that ω is locally represented in one of the twoways, (1.27) or (1.28), as long as ω has a locally constant rank.We prove this theorem by reducing it to the Darboux theo-rem. First, we determine the rank of a given 1-form ω by usingthe sequence (1.29). If ω has an odd rank 2k+1, the 2-form dωhas rank 2k (according to the definition of rank for closed 2-

forms) because (dω)∧k 6= 0 while (dω)∧(k+1) = 0. According

25

Page 34: Winitzki. Advanced General Relativity

1 Calculus in curved space

to the Darboux theorem, we can then choose local coordinagesxj such that

dω = dx1 ∧ dx2 + ...+ dx2k−1 ∧ dx2k.

Applying the Poincaré lemma to the closed 1-form

ω − x1dx2 + ...+ x2k−1dx2k,

we show that there exists a function f such that

ω = df + x1dx2 + ...+ x2k−1dx2k.

Since the 1-form ω has rank 2k + 1, we obtain

0 6= ω ∧ (dω)∧k = df ∧ k!dx1 ∧ ... ∧ dx2k.

Hence, the function f can be used as a local coordinate that isindependent of x1, ..., x2k. Denote this function f by x2k+1,we obtain the decomposition (1.28).In the second case, the 1-form ω has an even rank 2k. Thenthe 2-form dω again has rank 2k, and so we could again arriveat Eq. (1.28), but we would like to eliminate the last term inthat equation. So we need to work a little harder.The idea is to find a function f such that the 1-form fω hasrank 2k− 1. If such f is found, then by the previously provedcase it will follow that

fω = x1 ∧ dx2 + ...+ x2k−3 ∧ dx2k−2 + dx2k−1,

so we will obtain

ω =1

f(x1dx2 + ...+ x2k−3dx2k−2 + dx2k−1)

= y1dy2 + ...+ y2k−3dy2k−2 + y2k−1dy2k, (1.30)

where we simply relabeled the coordinates as

y1 ≡ 1

fx1, y2 ≡ x2, ..., y2k−1 ≡ 1

f, y2k ≡ x2k−1. (1.31)

By assumption, the 1-form ω has rank 2k and thus

(dω)∧k 6= 0, ω ∧ (dω)

∧k= 0.

Since (dω)∧k 6= 0, it will follow from Eq. (1.30) that all

y1, ..., y2k are independent local coordinates, and thus a de-composition of the form (1.27) will be found.It remains to determine f such that the 1-form fω has rank

2k − 1. It is sufficient to find some function f such that

0 = (d(fω))∧k

= (df ∧ ω + fdω)∧k

= kfk−1df ∧ ω ∧ (dω)∧(k−1)

+ fk (dω)∧k.

Thus the function f must satisfy the differential equation

(d ln f) ∧ ω ∧ (dω)∧(k−1) = −1

k(dω)∧k . (1.32)

To solve this equation for f (or rather, to show that a so-lution exists), we need to use some additional informationabout ω. We know that the 2-form dω has rank 2k. There-fore, by the Darboux theorem we may choose local coordi-nates x1, ..., xn such that

dω = dx1 ∧ dx2 + ...+ dx2k−1 ∧ dx2k.

Applying the Poincaré lemma to the closed 1-form

ω − x1dx2 + ...+ x2k−1dx2k,

we show that there exists a function h such that

ω = dh+ x1dx2 + ...+ x2k−1dx2k.

Since dω has rank 2k, we have

0 = (dω)∧(k+1)

= (k + 1)!dh ∧ dx1 ∧ ... ∧ dx2k,

so the function h can depend only on the subset of coordinatesconsisting of the first 2k coordinates x1, ..., x2k. Hence the1-form ω depends also only on these coordinates. Let us write

ω =

2k∑

j=1

ajdxj ,

where aj are coefficients that depend only on x1, ..., x2k.Since both parts of Eq. (1.32) depend only on these coordi-nates, the function f is a function only of x1, ..., x2k.We can now return to the task of determining a suitable

function f . First we express the (2k − 1)-form ω ∧ (dω)∧(k−1)

needed for Eq. (1.32) as

ω ∧ (dω)∧(k−1)

= (k − 1)! (a1dx1 + a2dx2) ∧ dx3 ∧ ... ∧ dx2k + ...

+ (k − 1)! (a2k−1dx2k−1 + a2kdx2k) ∧ dx1 ∧ ... ∧ dx2k−2.

The unknown 1-form d ln f can then be written as

d ln f ≡2k∑

j=1

ljdxj ,

where lj are unknown coefficients depending on x1, ..., x2k.Using this explicit representation, we compute

(d ln f) ∧ ω ∧ (dω)∧(k−1)

= (k − 1)! (a1l2 − a2l1 + ...

+a2k−1l2k − a2kl2k−1)dx1 ∧ ...dx2k

and hence is proportional to the 2k-dimensional volume form

(dω)∧k

= k!dx1 ∧ ...dx2k.

It is convenient to introduce an auxiliary vector field

v ≡ −a2∂

∂x1+ a1

∂x2− ...− a2k

∂x2k−1+ a2k−1

∂x2k

and write

(d ln f) ∧ ω ∧ (dω)∧(k−1)

= (k − 1)! (ιvd ln f)dx1 ∧ ...dx2k.(1.33)

Since both sides of Eq. (1.32) are proportional to the volumeform, the equation is simplified to

ιvd ln f = v ln f = −1.

Note that the field v can be also defined without coordinatesby the requirement

ιv

[

ω ∧ (dω)∧(k−1)

]

= 0.

In other words, v is a vector field that annihilates the nonzero(2k − 1)-form ω ∧ (dω)∧(k−1). (In general, a k-form vanishes

26

Page 35: Winitzki. Advanced General Relativity

1.5 Metric

on an (n− k)-dimensional hypersurface, so a (2k − 1)-formvanishes on a 1-dimensional subspace. The vector v is a basisvector in that subspace.)We found that only restriction on the function f is that itslogarithmic derivative in the direction of v must equal −1.Now it is clear that a solution f exists (at least locally). Aparticular solution f can be determined by integration alongthe flow lines of v from arbitrary initial conditions. This com-pletes the proof.

1.4.8 The Poincaré lemma

The Poincaré lemma states that a closed n-form ω is locallyexact: there exists an (n− 1)-form θ such that ω = dθ in astar-shaped neighborhood of some point.A domain D is a star-shaped neighborhood of a point p ifthere exists a diffeomorphism flow ft that contracts the do-main D into the point p. In other words, ft is a continuousset of diffeomorphisms, defined for t ∈ [0, 1] and such thatf0 = id and f1 maps the entire neighborhood D into the sin-gle point p.Suppose that D is a star-shaped neighborhood of p and asuitable flow ft is given. Let vt be the tangent vector field ofthe flow ft. Suppose that ω is an exact n-form defined in D.The flow ft transforms ω into a t-dependent n-form ωt thatsatisfies ω0 = ω, ω1 = 0, and

d

dtωt = Lvt

ωt.

Since dω = 0, we also have dωt = 0 since d commutes withdiffeomorphisms. Using the Cartan homotopy formula, wefind

d

dtωt = (dιvt

+ ιvtd)ω = dιvt

ω.

Now we integrate this relationship over t,

−ω = ω1 − ω0 =

∫ 1

0

(d

dtωt

)

dt = d

∫ 1

0

(ιvtω) dt.

Defining

θ ≡ −∫ 1

0

(ιvtω) dt,

we find the required relationship, ω = dθ.

1.4.9 Integration of forms

A differential n-forms can be integrated over a manifold ofdimension n: namely, 1-forms can be integrated over curves,2-forms over surfaces, etc. The integral of a given n-form ωover a given manifold A of dimension n is denoted by

A

ω.

Note that the differential “d” is not written after the integralsign because the n-formω already contains the correct numberof d’s.The integral

Aω is defined by the following procedure.

One starts by splitting the manifold A into a large numberN of small n-dimensional paralellepipeds. If all the par-allelepipeds are sufficiently small, one can approximate thesides of parallelepipeds by tangent vectors (see Sec. 1.2.5).Thus, to each parallelepiped numbered j (where j = 1, ..., N )

there corresponds a set of n tangent vectors v(j)1 , ...,v

(j)n .

The parallelepiped numbered j contributes a small amount tothe integral; this amount is equal to some number (the valueof the function being integrated) times the n-dimensional vol-ume of the parallelepiped. It is natural to regard the n-volumeof a parallelepiped spanned by v1, ...,vn as an n-form eval-uated on the vectors vi (see Sec. 1.4.1). So the contribu-tion of the j-th parallelepiped to the integral can be naturallydescribed as an application of some n-form to the n vectors

v(j)1 , ...,v

(j)n . In the integral we are evaluating, this n-form is

the given n-form ω. Hence, to define the integral∫

Aωwe eval-

uate the contribution ω (v(j)1 , ...,v

(j)n ) of each parallelepiped

j and then compute the sum of the contributions over all theparallelepipeds. Thus, the integral

A ω is defined as the limitN → ∞ of that sum,

A

ω ≡ limN→∞

N∑

j=1

ω (v(j)1 , ...,v(j)

n ).

This limit is quite similar to the limit involved in the ordinarydefinition of the Riemann integral. So integrals of n-forms areequivalent to n-fold integrals in the usual sense. However, animportant caveat is that integrals of n-forms are always per-formed over oriented manifolds. Since n-forms are antisym-metric, an integral will change sign when the orientation ofthe manifold is reversed.The fundamental relationship between integration and theexterior differential is

A

dω =

∂A

ω,

whereA is an n-dimensional manifold (or part of a manifold),∂A is its (n − 1)-dimensional boundary, and ω is an arbitrary(n − 1)-form. This theorem is a generalization of the Stokesand the Gauss laws, as well as the fundamental theorem of

calculus∫ b

a f′(x)dx = f(b) − f(a). Since this is a standard

result, we refer the reader to textbooks (such as [1]) for moredetailed explanations of this theorem.

1.5 Metric

1.5.1 Motivation: metric on surfaces

I will introduce the concept of a metric by starting from thepicture of a manifold as a surface embedded in a Euclideanspace Rn. The natural Pythagorean notion of distance in Rn,namely the Euclidean metric, defines the distance betweentwo points a,b ∈ Rn as

Distance (a,b) =

(b1 − a1)2 + ...+ (bn − an)

2, (1.34)

where an and bn are standard coordinates of the points a,b.Let us consider a surfaceM embedded into Rn. We wouldlike to compute the distance between two points a,b ∈ M, asmeasured along some path within the surface. However, if thesurface has a curved shape then one does not have a simpleand general formula analogous to Eq. (1.34) for distancesmea-sured within the surface. Nevertheless, if a pair of points a,bon the surfaceM are very close to each other, we may con-sider a short, almost straight curve segment within M con-necting these points. The length of this curve segment canbe accurately estimated using the Euclidean formula (1.34).

27

Page 36: Winitzki. Advanced General Relativity

1 Calculus in curved space

Then the length of any curve within the surface can be deter-mined by splitting the curve into sufficiently small segments.More precisely, one can integrate the infinitesimal lengths ofinfinitesimal segments along the curve.Let us explore this idea in more detail. To be specific,let us consider a two-dimensional manifold M embeddedin R3 as a surface z = F (x, y) in standard coordinatesx, y, z. We are interested in computing the length of acurve γ(τ) ⊂ M. The curve is specified by three functionsx(τ), y(τ), z(τ). Consider a very short curve segment be-tween τ = τ0 and τ = τ0 + δτ , where δτ is very small.This segment is an almost straight line between the close-bypoints x1, y1, z1 ≡ x(τ0), y(τ0), z(τ0) and x2, y2, z2 ≡x(τ0 + δτ), y(τ0 + δτ), z(τ0 + δτ) inR3. So wemay computethe length δL of the segment approximately as the Euclideandistance between these points,

δL ≈√

(x2 − x1)2

+ (y2 − y1)2+ (z2 − z1)

2.

Further, we may approximate

x2 − x1 = x(τ0 + δτ) − x(τ0) ≈ δτdx

dτ(τ0), etc.,

and hence

δL ≈ δτ

√(dx

dτ(τ0)

)2

+

(dy

dτ(τ0)

)2

+

(dz

dτ(τ0)

)2

.

In the limit δτ → 0, this expression becomes precise since theerror is of order δτ2. So the length of the curve γ(τ) betweenτ = τ0 and τ = τ1 can be found as an integral over the in-finitesimal curve segments,

L [γ] =

∫ τ1

τ0

√(dx

dτ(τ)

)2

+

(dy

dτ(τ)

)2

+

(dz

dτ(τ)

)2

.

So far we have used all three coordinates x, y, z to describethe curve. However, the pair x, y can be used as local coor-dinates onM, and so we would like to be able to computelengths of curves without using the coordinate z. In otherwords, we would like to use only an intrinsic description ofthe manifoldM. The “metric” represents the information oneneeds to be able to compute lengths in the intrinsic descrip-tion. Let us now see what information that must be in ourexample.Given the equation z = F (x, y), we may express

dz(τ)

dτ=∂F (x, y)

∂x

dx(τ)

dτ+∂F (x, y)

∂y

dy(τ)

dτ,

where it is implied that the derivatives of F are evaluated onthe curve γ(τ). Hence, the length δL of an infinitesimal curvesegment can be written as

δL = δτ

√(dx

)2

+

(dy

)2

+

(∂F

∂x

dx

dτ+∂F

∂y

dy

)2

= δτ

A

(dx

)2

+ 2Bdx

dy

dτ+ C

(dy

)2

, (1.35)

where we introduced auxiliary functions A,B,C expresseddirectly through F as

A(x, y) ≡ 1 +

(∂F

∂x

)2

, B ≡ ∂F

∂x

∂F

∂y, C ≡ 1 +

(∂F

∂y

)2

.

Now we note that Eq. (1.35) looks like a quadratic form ap-plied to the two-dimensional vector

γ ≡dx

dτ,dy

∈ Tγ(τ)M.

Recalling the correspondence between tangent vectors fromTpM and short curve segments (see Sec. 1.2.5), we can verifythat the tangent vector δv defined by

δv ≡ (δτ) γ = (δτ)

[dx

dτ∂x +

dy

dτ∂y

]

∈ Tγ(τ0)M

is the vector representing the short curve segment betweenγ(τ0) and γ(τ0 + δτ). Then δL can be expressed as

δL = δτ√

g(γ, γ) =√

g(δv, δv),

where g is the following bilinear form,

g ≡ Adx⊗ dx+B(dx⊗ dy + dy ⊗ dx) + Cdy ⊗ dy.

Thus δL (the length of a short curve segment) can be ex-pressed through the bilinear form g alone, without using anyother information (such as the function F (x, y) or the embed-ding ofM in the Euclidean space R3). The length of the curvebetween τ0 and τ1 can be expressed as

L[γ] =

∫ τ1

τ0

dτ√

g(γ, γ).

We conclude that the only information necessary to com-pute lengths along curves within a surfaceM is a symmet-ric bilinear form g defined in every tangent space TpM. Thisbilinear form g is called the metric on the manifold M. Inour example, once we computed the metric g (which, in a lo-cal coordinate system, is equivalent to determining the threefunctions A,B,C), we may stop using the embedding ofMin R3 because it is sufficient to use the local coordinate sys-tem x, y. This simplifies the calculations since one needs towork with fewer coordinates, and also allows one to concen-trate on the intrinsic properties of the manifoldM. Below wewill not use the embedding picture to the metric; instead, wewill work directly with the metric g, assuming that somehowg is given. It will be unimportant whether or not the metric gcomes from an embedding ofM in a larger Euclidean space.An embedding of M into Rn can be regarded as merely anauxiliary construction that helps visualize the concept of met-ric.

1.5.2 Definition

Ametric on a manifoldM is a nondegenerate, symmetric bi-linear form g(u,v) defined in the tangent space TpM at eachpoint p ∈ M. A symmetric bilinear form g is called nonde-negerate if for any vector u 6= 0 there exists a vector v suchthat g(u,v) 6= 0; in other words, no vector can be orthogonalto every other vector.The nondegeneracy condition prohibits “uninteresting”metrics, such as g = 0. Also, it is important to note that thenondegeneracy condition permits a vector x to be specifieduniquely through its scalar products g(v,x) with other vec-tors v. (If there were two vectors x 6= x′ such that g(v,x) =g(v,x′) for all v, then the vector x−x′ 6= 0would be orthogo-nal to every vector, but this is excluded by the assumption ofnondegeneracy of g.)

28

Page 37: Winitzki. Advanced General Relativity

1.5 Metric

It is known from standard linear algebra that symmetricbilinear forms have signature, which is a set of signs in-dependent of the choice of a basis. In GR we shall usu-ally consider four-dimensional metrics with the signature(+ −−−) as appropriate for a locally Lorentzian physical the-ory, but many results will be the same for arbitrary dimen-sion and signatures of the metric. Manifolds with a metricwith signature (+ + ...+) are calledRiemannian, and pseudo-Riemannian if the metric is not sign-definite. The familiarmetric with a Lorentzian signature is theMinkowski metric,ηµν = diag(1,−1,−1,−1).

Self-test question: The Minkowski metric η admits nullvectors n such that η(n,n) = 0; for example, n = 1, 1, 0, 0 isa null vector. Does this mean that the metric η is degenerate?Answer: No.

In GR, the metric describes physically measured lengthsand time intervals. We focus on the mathematical propertiesof the metric for now.

1.5.3 Examples of metrics

As a first example, consider a flat Minkowski spaceM ≡ R4

with standard coordinates t, x, y, z. The Minkowski metricg can be specified by the following index-free expression,

g(u,v) = (u t) (v t) − (u x) (v x)− (u y) (v y) − (u z) (v z) .

In this expression, the coordinates t, x, y, z are interpretedas scalar functions on the manifoldM, and the vector fieldsu,v are interpreted as derivative operations applied to thesescalar functions. We can then directly compute

g(∂

∂t,∂

∂t) = 1, g(

∂t,∂

∂x) = 0, g(

∂t+

∂x,∂

∂t+

∂x) = 0,

etc. This metric can be also written as

g = dt⊗ dt− dx⊗ dx− dy ⊗ dy − dz ⊗ dz,

where dt,dx,dy,dz are the 1-forms defined through the coor-dinate functions t, x, y, z.

Remark on notation: In the physics literature, one usuallyfinds the Minkowski metric written as follows,

ds2 = dt2 − dx2 − dy2 − dz2.

Strictly speaking, this notation is inconsistent; for instance, dt2

stands for the bilinear form dt ⊗ dt, and yet “ds2” cannot beinterpreted as ds⊗ ds, where ds is a 1-form. Equations of thiskind should be understood as being nomore than a traditional(or “jargon”) notation for the metric, in which “ds2” standsfor the bilinear form g. Also, physicists write “dt dx” for thesymmetric tensor product 1

2 (dt⊗ dx+ dx⊗ dt). We will fre-quently use the “physicists’s” metric notation for brevity (butwe will denote the metric by g instead of the inconsistent“ds2”). Thus, we will write dx2 and dt dx when the rigorousbut cumbersome notation dx⊗dx, 1

2 (dt⊗ dx+ dx⊗ dt) doesnot yield any computational advantages. In those cases, wewrite “dx” with an italic “d” since we are not actually invok-ing the exterior differential d.

More complicated metrics may involve coefficients at thecoordinate 1-forms, or non-diagonal terms, e.g.

g = f1(t, x, y, z)dt2 + f2(t, x, y, z)dt dx+ ...

Here are some further examples of metrics describing space-times important in physics.A Schwarzschild spacetime in the coordinates t, r, θ, φ isdescribed by the metric

g =

(

1 − 2M

r

)

dt2−(

1 − 2M

r

)−1

dr2−r2(dθ2 + (sin2 θ)dφ2

).

(1.36)This spacetime is generated by a nonrotating black hole ofmassM centered at r = 0.A de Sitter spacetime in spatially flat coordinates t, x, y, zhas the metric

g = dt2 − e2Ht(dx2 + dy2 + dz2). (1.37)

This spacetime is used to describe (approximately) theuniverse that is undergoing an accelerated expansion(cosmological inflation).

Remark: In GR, a physical spacetime M is a four-dimensional manifold with a metric g; the metric should havethe Lorentzian signature (+ −−−). A simple-minded viewof a curved spacetimeM is that M = R4 with coordinatesxµ ≡ t, x, y, z, and ametric g is specified through the com-ponents gαβ(t, x, y, z). However, this picture is insufficientlygeneral; for instance, spacetimes containing black holes do nothave this simple structure. In some cases, it turns out that thefull manifoldM is not covered by the coordinate system xµin which a metric gαβ was originally specified.

1.5.4 Orthonormal frames

This section is a very brief introduction to the notion of or-thonormal frames. This subject is more extensively developedin Sec. 6.1.1.In each tangent space TpM, one can choose an orthonormalbasis ea, where a is a label enumerating the basis vectors.We thus obtain a set of vector fields.7 Such a basis of vectorfields is called an orthonormal frame (in four dimensions, atetrad or a vierbein). Orthonormality means that

g(ea, eb) = ηab,

where ηab is a diagonal matrix having diagonal elementsequal to ±1, depending on the signature of the metric g. Forinstance, in GR one uses metrics with Lorentzian signature(+ −−−), so ηab is the matrix

ηab = diag(1,−1,−1,−1) =

1−1

−1−1

.

The dual basis consists of 1-forms θa that can be ex-pressed as follows,

θa u ≡ ηaag(ea,u).

The 1-forms θa are a basis in the cotangent space T ∗pM. The

metric g, as a rank (0,2) tensor, can be recovered from the basisθa as

g =∑

a,b

ηabθa ⊗ θb; g−1 =

a,b

ηabea ⊗ eb. (1.38)

A derivation of Eq. (1.38) can be found in Sec. 6.1.1.

7We assume that the vector fields ea are smooth at least within some patchof M, if it turns out to be impossible to choose a smooth orthonormalframe throughout the entire manifoldM.

29

Page 38: Winitzki. Advanced General Relativity

1 Calculus in curved space

1.5.5 Correspondence of vectors andcovectors

Ametric g determines a one-to-one map between vectors andcovectors (and thus between vector fields and 1-forms). I de-note this map by g, so gv is the 1-form corresponding to avector field v. By definition, the 1-form gv acts on vectors xas follows,

(gv) x ≡ g(v,x).

Then the dual basis 1-forms θa are expressed as θa =ηaagea.Since the correspondence between vectors and 1-forms isone-to-one, there exists the inverse map (from 1-forms to vec-tors), denoted g−1. This map converts a 1-form ω into the vec-tor v = g−1ω such that

g(v,u) ≡ g(g−1ω,u) ≡ ω u

for any vector u. Then the scalar product is also defined on1-forms and is denoted by g−1:

g−1(ω1, ω2) ≡ g(g−1ω1, g−1ω2).

Note that the dual basis θa is orthonormal with respect tothe scalar product g−1.

Remark: In the index notation, the scalar product form g−1

is specified by the matrix gαβ , which is inverse to the matrixgαβ representing the components of the metric g.

For a scalar function f , the 1-form df called the gradient off acts on vectors x as

(df) x ≡ x f,

and the corresponding vector field g−1(df)may be called thecontravariant gradient of f .

Example: For a vector field x and a scalar function f , wehave

x f = g(x, g−1df) = g−1(gx,df).

The derivative of a scalar function φ in the direction of thevector g−1df , i.e. the scalar quantity

(g−1df

) φ, can be also

expressed as

(g−1df

) φ = g−1(df,dφ) =

(g−1dφ

) f.

Practice problem: Suppose S(v) is a transformation-valued1-form such that for any vectors u,v,w we have

S(v)w = −S(w)v,

g(S(v)w,u) = g(w, S(v)u).

Show that these conditions are satisfied only if S(v)w ≡ 0 forall v,w.Hint: Define the auxiliary trilinear form S(v,w,u) ≡

g(S(v)w,u) and investigate its symmetries.

1.5.6 The Levi-Civita tensor ε

Ordinary vector algebra in three-dimensional space makesuse of the Levi-Civita symbol εijk , which is defined as thetotally antisymmetric array of numbers,

ε123 = 1, εijk = −εikj = −εjik, i, j, k = 1, 2, 3.

In curved space, the role of this symbol is played by a totallyantisymmetric tensor field, called the Levi-Civita tensor.Consider a four-dimensional manifold (to be specific and toavoid unnecessary complications in the notation). In four di-mensions, the Levi-Civita tensor has rank four and is definedas the 4-form

ε = θ0 ∧ θ1 ∧ θ2 ∧ θ3,

where θa is an orthonormal basis of 1-forms (see Sec. 1.5.4).We now review the motivation for this definition and theproperties of the tensor ε.In a three-dimensional Euclidean space, the Levi-Civitasymbol εijk is closely related to volume. It is known thatone can calculate the (oriented) volume of a parallelepipedspanned by three vectors a,b, c by using the following ex-plicit formula,

Vol (a,b, c) = a · (b × c) =∑

i,j,k

εijkaibjck,

where ai, bj , ck are the Euclidean components of the three vec-tors (i.e. components in an orthogonal basis). We have seen inSec. 1.4.4 that the oriented volume of such a parallelepiped isequal to the value of a 3-form evaluated on the vectors a,b, c.Thus, the array of numbers εijk can be interpreted as the arrayof components of the 3-form ε in the orthogonal basis. Thenwe have

Vol (a,b, c) = ε (a,b, c) .

Let us generalize this construction to a curved, four-dimensional manifoldM. In a tangent space at a fixed pointp ∈ M, we look for a totally antisymmetric form ε such thatε(v1,v2,v3,v4) is equal to the oriented 4-volume of the paral-lelepiped spanned by the tangent vectors v1, v2, v3, v4. If eais an orthonormal basis with a “positive” orientation then the4-volume of the parallelepiped spanned by ea is equal to 1.Thus, we are looking for a 4-form ε such that

ε(e0, e1, e2, e3) = 1.

Since θa eb = δab , it is clear that ε = θ0 ∧ θ1 ∧ θ2 ∧ θ3 is onesuch 4-form.We defined ε through a particular basis θa, so we needto investigate whether and to what extent ε depends on thechoice of the basis. Of course, ε changes sign when the ori-entation of the basis is reversed. It turns out that ε does notactually depend on the choice of the basis, as long as the ori-entation of the basis is fixed (see Statement 1.5.6.1 below). Inparticular, the orthonormal frame eα may be defined onlylocally, i.e. different frames eα need to be used in differentparts of the manifoldM. Nevertheless, the tensor ε is defineduniquely and globally, i.e. throughout the entire manifold, aslong as the manifold is globally orientable. A manifold isglobally orientable if a choice of orthonormal frame can bemade in every chart such that the orientation of the orthonor-mal frames is the same in every overlapping region betweentwo charts. A Möbius strip is an example of a manifold thatis not globally orientable. It seems that in General Relativityone never needs to consider nonorientable manifolds.

Statement 1.5.6.1: The Levi-Civita tensor

ε = θ0 ∧ θ1 ∧ θ2 ∧ θ3

is invariant under changes of orthonormal basis θa, exceptthat it changes sign when the orientation of the basis is re-versed.

30

Page 39: Winitzki. Advanced General Relativity

1.6 Affine connection

Proof: Let us consider a linear transformation T that bringsthe basis ea into another basis ea,

ea = Tea, θa

= T−1θa.

By definition of the determinant (see Sec. 1.4.5), the orientedvolume of the parallelepiped spanned by ea is equal todet T . Thus, if the new basis ea is orthonormal and has thesame orientation as the old orthonormal basis, the orientedvolume of the parallelepiped spanned by ea is also equal to1. Hence, we must have det T = 1. On the other hand,

ε ≡ θ0 ∧ θ

1 ∧ θ2 ∧ θ

3= (det T−1)θ0 ∧ θ1 ∧ θ2 ∧ θ3 = ε,

by the same logic as

e1 ∧ e2 ∧ e3 ∧ e4 = (det T )e1 ∧ e2 ∧ e3 ∧ e4.

Therefore the new 4-form ε defined through the new basis isequal to the old 4-form ε.

The Levi-Civita tensor is independent of the choice of an or-thonormal basis θa, but it does depend on the metric g. Sinceall 4-forms in four-dimensional space are proportional to eachother, the Levi-Civita tensors determined by different metricswill be proportional to each other. So the dependence of ε onthe metric can be described by a single scalar function. Thisdescription is usually given in terms of the matrix gµν thatrepresents the metric g in a local coordinate system xµ. Bydefinition, the matrix elements gµν are equal to scalar prod-ucts g(∂µ, ∂ν), where ∂µ ≡ ∂/∂xµ is a coordinate basis.Let us now determine the explicit formula for the Levi-Civitatensor ε as a function of the metric g.We used an orthonormal basis θa of 1-forms to write the4-form ε, but actually we can use any other basis: since all 4-forms are proportional to each other, we only need to includea correct scalar factor. So let us use the basis dxµ. Using thisbasis, the 4-form ε can be written as

ε = fdx0 ∧ dx1 ∧ dx2 ∧ dx3 ≡ fd4x,

where f is some function and d4x is a shorthand notation forthe 4-formmade of the basis 1-forms. It remains to determinethe unknown coefficient f . To do that, we can use a suit-able modification of Statement 1.4.5.2, which shows that the4-volume of the parallelepiped spanned by the tangent vec-tors ∂µ is equal to ±

√|det gµν |. The factor “±” reflects the

fact that the volume is oriented and may be negative if thecoordinate basis ∂µ has a negative orientation. The abso-lute value |det gµν | is needed to compensate for the signatureof the metric g, which might make the determinant negative.Thus

f ≡ ε(∂0, ∂1, ∂2, ∂3) = Vol(∂0, ∂1, ∂2, ∂3) = ±√

|det gµν |,

hence the final formula for ε in local coordinates (assuming apositive orientation of the coordinate basis) is

ε =√

|det gµν |dx0 ∧ dx1 ∧ dx2 ∧ dx3.

In this way, the functional dependence of ε on the metric g ismade explicit.It is useful to note that dε = 0. This is so because dε is a5-form, and there are no nonzero 5-forms in four-dimensionalspace.

The main use of the Levi-Civita tensor in GR is to compute4-dimensional volume in curved space.8 Since the tangentvectors are an approximate representation of short curve seg-ments (Sec. 1.2.5), the 4-form ε yields a good approximation tothe 4-dimensional volume for very small volumes. Therefore,one can compute the 4-dimensional volume of a spacetime do-main by integrating the 4-form ε over that domain, and onecan also integrate scalar functions over a domain. The integra-tion is defined purely geometrically, regardless of the choice ofthe coordinate system, once the metric g is fixed. For this rea-son (independence of coordinates), physicists know the Levi-Civita tensor ε by the name covariant volume element.Sometimes the tensor ε is simply called the volume 4-form since the tensor ε|p evaluates the volume in each tan-gent space TpM. It is clear that the construction of the Levi-Civita tensor can be straightforwardly generalized to an N -dimensional manifold with a given metric and orientation. Bythis generalization one obtains anN -form, which is frequentlydenoted by the symbol “Vol” as I have done above. Below Iwill also sometimes denote the Levi-Civita tensor by Vol, par-ticularly when it is used for integration over a manifold. I willuse the notation ε when I need to use its properties as a 4-form, e.g. when I need to evaluate it on some tangent vectors.

1.6 Affine connection

1.6.1 Motivation

To formulate the laws of physics, we need to write differentialequations for some vector and tensor fields on the spacetimemanifold. These differential equations involve derivatives ofthese vector or tensor fields in various directions. Thus weneed to have a directional derivative operation that acts onarbitrary tensors.One could think at first that the Lie derivative Lv is to beused as a directional derivative operation. However, Lv isnot a true directional derivative because it depends on the be-havior of the vector field v in the neighborhood of a point pand not only on the value v|p. A true directional derivativeof a tensor A in a direction v at a point p must depend onlyon the value v|p at the given point. Defining such a true di-rectional derivative requires some information about how to“connect” neighbor tangent spaces TpM and Tp′M, where p′and p are “near-by” points inM. If such a “connection” be-tween the neighbor tangent spaces were defined, tensors atdifferent points could be subtracted. For instance, A|p′ − A|pwould be defined as a tensor in TpM, and the derivative of atensor A along a curve γ at a point p ≡ γ(τ0) could be definedas the limit

limδτ→0

A(γ(τ0 + δτ)) −A(γ(τ0))

δτ.

Such a true directional derivative operation can be defined.It is usually denoted ∇ (pronounced “nabla” or “del”) andcalled a covariant derivative, an affine connection, or simplya connection. Thus, ∇uA|p is the derivative of a tensor A inthe direction of the vector u at a point p. The quantity ∇uA|pis a tensor and does not depend on derivatives of u itself.It turns out that a connection ∇ is not unique becausethere are infinitely many ways to “connect” the neighbor

8There are other, more advanced applications of the Levi-Civita tensor, suchas the Hodge star operation, which I will consider below.

31

Page 40: Winitzki. Advanced General Relativity

1 Calculus in curved space

tangent spaces, i.e. there are infinitely many possible mapsTp′M → TpM. Such a “connection map” can be described bya transformation-valued 1-form

C(u) : Tp′M → TpM,

where u is the tangent vector at point p in the direction ofpoint p′. Thus, there are as many connections∇ as mapsC. Ina local coordinate system, themapC is represented by a quan-tity with three indices, Cλαβ , but it is not a tensor in a singletangent space at p: by definition, it is a map connecting differ-ent tangent spaces. (The map C is related to the non-tensorialChristoffel symbol; see Sec. 1.6.3 below.)In General Relativity, one formulates the laws of physics us-ing a particular choice of the connection ∇ (called the “Levi-Civita connection,” see Sec. 1.6.6 below). This connectionhas several physically motivated and technically convenientproperties with regard to the given metric g on the spacetimemanifold. So the Levi-Civita connection ∇ directly involvesthe metric g and cannot be defined unless a metric is given.A visual motivation for the Levi-Civita connection, usingthe idea of embedding a curved manifold in a flat space, isgiven in Sec. A.4.3 (Appendix A). Presently, I approach thedefinition of the Levi-Civita connection in a somewhat moreabstract way: I first formulate the desirable properties of con-nections in general and then impose suitable conditions to de-duce the Levi-Civita connection.Since we are studying the general properties of tensors, wewill avoid working in a local coordinate system; instead, wewill perform calculations in a coordinate-free manner.

1.6.2 General properties of connections

Wewould like to define a derivative operator∇v which mapsscalars to scalars, vectors to vectors, etc., and depends only onthe direction of the vector v at a point p. The condition that∇vA should not contain any derivatives of v can be written as

∇(λv)A = λ∇vA,

where λ is an arbitrary scalar function. If we had such an op-eration∇v, we could define the “gradient” operator∇ whichmaps scalars to 1-forms, vectors to (1,1)-tensors, and gener-ally tensors of rank (m,n) to tensors of rank (m,n+ 1). In theindex notation, the symbol ∇will have its own index; e.g. the“gradient” of a tensor Aαβ is a tensor of rank (0,3) denoted by∇µAαβ ≡ Aαβ;µ.

Remark: Vectors are always boldface in our index-free con-vention, so there should be no confusion between expressionssuch as ∇νa

µ and ∇va. The former is the index representa-tion of the rank-two tensor ∇a, while the latter is the covari-ant derivative of the vector field a in the given direction v.The index representation of ∇va is v

αaµ;α. The derivative of ascalar function f in the direction of a vector v is equivalentlywritten as v f = Lvf = ∇vf = (df) v, according to theconvenience of the moment. The index representation of v fis vαf,α ≡ vαf;α.

Of course, ∇v should also act on scalar functions as theusual directional derivative, i.e.

∇vf = v f.

In addition, the operator ∇v should have the properties anal-ogous to Eqs. (1.2)-(1.4), (1.10)-(1.11), except that the contrac-

tion “” now means only a contraction of tensors (e.g., appli-cation of a 1-form to a vector) and not the derivative operationv f .

Remark: Unlike the Lie derivative, the connection ∇ is notexpected to satisfy the property

∇u (v f) 6= (∇uv) f + v (∇uf) .

If we assumed that this contraction property holds, we wouldbe forced to have∇u ≡ Lu.

1.6.3 The “coordinate derivative” connection

The above properties of the operation ∇ do not yet uniquelyspecify that operation. There are infinitely many possibleaffine connections. A simple way to define a connection isto fix a local coordinate system xµ and set

∇λ ≡ ∂λ ≡ ∂

∂xλ; (∂vA)

αβγ... ≡ vλ∂

∂xλAαβγ... ≡ vλAαβγ...,λ

for any tensor A. In other words, the component Aαβγ...;λof the covariant derivative of A is calculated as the partialderivative with respect to xλ of the component Aαβγ... of thetensor A in the fixed coordinate system xµ. It is customaryto use the semicolon in the expression Aαβγ ;λ before the extraindex introduced by the covariant derivative.In the index-free notation, we denote the derivative withrespect to the coordinates in a fixed coordinate system by ∂.Thus, the vector ∂uv is defined through the components inthe fixed coordinate system xµ as

(∂uv)α = uλvα,λ.

In the coordinate system xµ, the derivative ∂ is just theusual derivative, so it is easy to see that ∂ satisfies all the prop-erties of a connection. Thus, ∂ is a well-defined affine connec-tion which we call the coordinate derivative connection de-termined by the coordinate system xµ. We stress that theconnection ∂ is tied to a particular coordinate system xµ. In adifferent coordinate system yµ, the components of the vec-tor ∂uv do not have the same simple form uλ∂vα/∂yλ. Thus,for each coordinate system xµwe have a different coordinateconnection ∂.

Remark: The notation “∂” does not indicate explicitly thecoordinate system xµ through which ∂ is defined. So it isimportant to identify that coordinate system every time oneuses ∂. We will rarely make use of ∂, but when we do, we willmake it clear which coordinate system xµ is implied in thedefinition of ∂.

Let us also investigate how an arbitrary connection ∇ is re-lated to the coordinate derivative ∂. It is easy to verify that∇−∂ acts on vectors v linearly (without involving derivativesof v),

∇u (λv) − ∂u (λv) = λ (∇uv − ∂uv) .

Hence, for a fixed vector u the operator ∇u − ∂u is a lin-ear transformation. Equivalently, we can say that ∇ − ∂ isa transformation-valued 1-form and write

∇− ∂ ≡ Γ; (∇u − ∂u)v = Γ (u)v.

The tensor Γ is called the Christoffel tensor of the connection∇with respect to the coordinate system xµ in which ∂ is thecoordinate derivative.

32

Page 41: Winitzki. Advanced General Relativity

1.6 Affine connection

For a fixed coordinate derivative ∂, various choices of Γwillgive different connections ∇. Thus, the Christoffel tensor Γparametrizes all the possible covariant derivatives∇.

Practice problem: Suppose that two different connections∇and ∇ are given (not necessarily coordinate derivative con-nections in any coordinate system). Using the defining prop-

erties of a connection, show that the difference ∇uv − ∇uv

does not depend on derivatives of v and can thus be describedas a linear transformation of v (for a fixed u).

1.6.4 Compatibility with the metric

To formulate the laws of physics, we need to write differentialequations for tensors on the spacetime manifold, and thus weneed to use a directional derivative∇. One possible choice of∇ is the coordinate derivative ∂ in a fixed coordinate systemxµ. But the connection ∇ ≡ ∂ is inconvenient not only be-cause it depends on a fixed coordinate system (while the lawsof physics are expected to be independent of coordinates), butalso because it is in general incompatible with the metric in thefollowing sense.Consider a vector vwhich is “locally constant” in the direc-tion of some other vector u, i.e. ∇uv = 0. Then it is naturalto expect that the length of the vector v is also constant in thedirection of u, i.e. that ∇ug(v,v) = 0. However, this propertydoes not necessarily hold, since we will generally have

∇ug(v,v) = (∇ug) (v,v) + g(∇uv,v) + g(v,∇uv)

= (∇ug) (v,v) 6= 0.

This inconvenience will be avoided only if ∇ug = 0 for anyvector u. The property ∇g = 0 is called compatibility withthe metric (or metricity of the connection). An equivalentway to write this property is

u g(v,w) ≡ ∇ug(v,w) = g(∇uv,w) + g(v,∇uw). (1.39)

In other words, g behaves as a constant under a covariantderivative ∇u, and only v and w are differentiated when onecomputes ∇ug(v,w).

1.6.5 Torsion and torsion-freeness

Usual partial derivatives of scalar functions with respect to co-ordinates xµ (in any local coordinate system) commute,

∂µ∂νf = ∂ν∂µf.

We may ask whether the same property holds for covariantderivatives,

∇µ∇νf = ∇ν∇µf. (1.40)

At the moment this property is written using the index no-tation. To reformulate the property (1.40) in a geometricway (without introducing coordinates explicitly), we contractEq. (1.40) with two arbitrary vectors uµvν and rewrite the re-sult using covariant derivatives,

uµvν∇µ∇νf = uµ∇µ (uν∇νf) − (uµ∇µuν) (∇νf)

= u (v f) − (∇uv) f.

Thus, the property (1.40) is equivalent to

(∇uv −∇vu) f = u (v f) − v (u f) = [u,v] f.

It is not necessarily true that an arbitrary connection ∇ satis-fies this property. To describe the extent of the deviation fromthis property, one defines the torsion tensor,

T (u,v) ≡ ∇uv −∇vu− [u,v] , (1.41)

which is a vector-valued 2-form (see Statement 1.6.5.1 below).The property (1.40) is then equivalent to the requirement thatT (u,v) = 0 for all u,v; this is called the torsion-freeness ofthe connection ∇. Thus, if a connection ∇ is torsion-free, wehave the relation

Luv ≡ [u,v] = ∇uv −∇vu. (1.42)

This relation is similar to Eq. (1.12), except that now covariantderivatives ∇ are used. Note that the coordinate derivative ∂(defined with respect to a coordinate system xµ) is alwaysa torsion-free connection because coordinate derivatives com-mute.

Statement 1.6.5.1: The function T (u,v) as defined byEq. (1.41) does not actually involve derivatives of the vectorsu and v. Namely, for an arbitrary scalar function f , we haveT (fu,v) = fT (u,v) and T (u, fv) = fT (u,v).

Proof of Statement ??: For arbitrary scalars f, φ and vectorsu,v we have

∇v(fu) = (∇vf)u + f∇vu = (v f)u + f∇vu,

∇fuv = f∇uv,

[fu,v] φ = fu (v φ) − v ((fu) φ)

= f [u,v] φ− (v f) (u φ) ,

and it follows that

T (fu,v) φ = fT (u,v) φ− ((v f)u) φ+ (v f) (u φ)

= fT (u,v).

The analogous property for v will follow from the antisym-metry of T .

We have seen that all possible connections∇ are parameter-ized as ∇ = ∂ + Γ, where Γ is a rank 3 tensor (the Christoffeltensor) with respect to a fixed coordinate system xµwhere ∂is the coordinate derivative. We can now compute the torsiontensor T through Γ. Since ∂ is torsion-free, it follows immedi-ately from Eq. (1.41) that

T (u,v) = Γ(u)v − Γ(v)u.

To visualize this property, we may interpret Γ as a vector-valued bilinear form Γ(u,v) ≡ Γ(u)v, and then the propertyT (u,v) = 0 becomes

Γ(u,v) = Γ(v,u).

Thus a torsion-free connection ∇ is characterized by a sym-metric Christoffel tensor.

Practice problem: For a given affine connection ∇, doesthere always exist a coordinate system xµ where ∇ is equalto the coordinate derivative?

Hint: Consider the torsion of the connection ∇. Answer:No.

33

Page 42: Winitzki. Advanced General Relativity

1 Calculus in curved space

1.6.6 Levi-Civita connection

A covariant derivative ∇ which is torsion-free and compat-ible with the metric is called the Levi-Civita connection orthe metric connection. In General Relativity, this is the con-nection used to formulate differential equations for physicalquantities. The properties of metricity and torsion-freenessare imposed for several reasons, including mathematical sim-plicity. Ultimately, these properties are physical assumptions,that is, hypotheses verified by experiments. See Sec. A.4.5 inAppendix A for more discussion and motivation.We now derive an explicit formula for the Levi-Civita con-nection, starting from the properties (1.39) and (1.40). It willfollow that such a connection is unique. Later we shall seehow to describe other connections.

First examples

Before presenting a general derivation, let us make the dis-cussion less abstract by considering examples. First we ver-ify that the Levi-Civita connection in the Euclidean space isjust the ordinary, familiar directional derivative. Consider thethree-dimensional space R

3with coordinates x, y, z and themetric

g = dx2 + dy2 + dz2,

and let us evaluate some covariant derivatives. We will be us-ing only the assumed properties of the Levi-Civita connectionand see where this will lead us.Suppose we need to compute the vector ∇∂x

∂x. The wayto compute it is to evaluate scalar products of this vector withother vectors. For instance, using the compatibility of ∇ withthe metric, and the properties g(∂x, ∂x) = 1, g(∂x, ∂y) = 0 etc.,we find

g(∇∂x∂x, ∂x) =

1

2g(∇∂x

∂x, ∂x) +1

2g(∂x,∇∂x

∂x)

=1

2∇∂x

g(∂x, ∂x) =1

2∂x (1) = 0.

Using the torsion-freeness and the fact that [∂x, ∂y] = 0, wealso find

g(∇∂x∂x, ∂y) = ∇∂x

g(∂x, ∂y) − g(∂x,∇∂x∂y)

= 0 − g(∂x, [∂x, ∂y] + ∇∂y∂x)

= −1

2∇∂y

g(∂x, ∂x) = 0.

In this way, it is easy to see that∇∂x∂x has zero scalar products

with every basis vector; thus it is itself zero,∇∂x∂x = 0.

Let us now consider ∇∂x∂y . We have already seen that

g(∇∂x∂y, ∂x) = 0, and further we obtain

g(∇∂x∂y, ∂y) =

1

2∇∂x

g(∂y, ∂y) = 0.

The computation of the scalar product g(∇∂x∂y, ∂z) is a bit

longer. First, we find

g(∇∂x∂y, ∂z) = g([∂x, ∂y] + ∇∂y

∂x, ∂z) = −g(∂x,∇∂y∂z)

= −g(∇∂y∂z, ∂x).

The trick is to notice that we now have a similar structure withthe coordinates x, y, z cyclically transposed. Repeating thetransposition twice more, we find

g(∇∂x∂y, ∂z) = −g(∇∂x

∂y, ∂z),

so it vanishes. Thus, ∇∂x∂y = 0. Similarly, one can show

that ∇∂x∂z = 0, etc. In other words, the covariant derivative

of every basis vector in any direction equals zero! Now, weconsider an arbitrary vector field

v = A∂x +B∂y + C∂z,

where A,B,C are functions of x, y, z. We compute

∇∂xv = (∂xA) ∂x + (∂xB) ∂y + (∂xC) ∂z ,

since the derivatives ∇∂xof the vectors ∂x, ∂y , ∂z vanish. In

other words, the Levi-Civita connection ∇∂xsimply differen-

tiates the components (A,B,C) of the vector v in the directionx. This is the behavior familiar from standard vector analysis.Thus, the Levi-Civita connection ∇∂x

in flat space coincideswith the familiar directional derivative ∂/∂x.As a less trivial example, let us consider a two-dimensionalspace with a local coordinate system (x, y) and the metric

g = dx2 + f(x)dy2,

where f(x) 6= 0 is some (known) function that depends onlyon x. Let us compute ∇∂x

∂y using the same approach asabove. We find

g(∇∂x∂y, ∂x) = g([∂x, ∂y] + ∇∂y

∂x, ∂x) =1

2∂yg(∂x, ∂x) = 0;

g(∇∂x∂y, ∂y) =

1

2∇∂x

g(∂y, ∂y) =1

2f ′(x).

In this way, the vector ∇∂x∂y is completely determined

through its scalar products with basis vectors ∂x and ∂y :

∇∂x∂y =

1

2

f ′(x)

f(x)∂y.

Similarly, one may evaluate other covariant derivatives, suchas

g(∇∂y∂y, ∂y) = 0,

g(∇∂y∂y, ∂x) = −g(∂y,∇∂y

∂x) = −g(∂y,∇∂x∂y) = −1

2f ′(x).

Thus

∇∂y∂y = −1

2f ′(x)∂x.

Derivation of the Levi-Civita connection

In the examples above, we have computed covariant deriva-tives of some vectors by using only the defining proper-ties (1.39) and (1.42) of the Levi-Civita connection, as well assome tricks. We now derive a general formula for∇ using thesame approach and the same tricks.Consider arbitrary vector fields x,y and let us compute thevector∇xy. This vector will be unambiguously determined ifwe compute its scalar product g(∇xy, z)with an arbitrary vec-tor z. So we start with the scalar product expression g(∇xy, z)and rewrite it using the assumed properties (1.39) and (1.42),

g(∇xy, z) = ∇xg(y, z) − g(y,∇xz)

= x g(y, z) − g(y, [x, z]) − g(y,∇zx)

= x g(y, z) − g(y, [x, z]) − g(∇zx,y).

We have related g(∇xy, z) to the same structure with (x,y, z)cyclically transposed. So now we repeat the same procedure

34

Page 43: Winitzki. Advanced General Relativity

1.6 Affine connection

twice more, replacing the term with ∇zx by a term with ∇yz,and finally by a term with ∇xy. We obtain

g(∇xy, z) = x g(y, z) − g(y, [x, z]) − z g(x,y)

+ g(x, [z,y]) + y g(x, z) − g(z, [y,x]) − g(∇xy, z).

Thus the final formula for the Levi-Civita connection is

g(∇xy, z) =1

2

(x g(y, z) + y g(x, z) − z g(x,y)

− g(x, [y, z]) − g(y, [x, z]) + g(z, [x,y])). (1.43)

This is known as the Koszul formula.

Calculation 1.6.6.1: The familiar formula for the covariantderivative in the index notation,

∇µuν = ∂µu

ν + Γναµuα,

Γλαβ ≡ 1

2gλµ (∂αgµβ + ∂βgµα − ∂µgαβ) , (1.44)

can be derived from Eq. (1.43).Details: The Christoffel tensor Γ describes the differencebetween the covariant derivative∇ and the coordinate deriva-tive ∂ in a particular coordinate system. Let us select a coor-dinate system and substitute x = ∂α, y = ∂β , z = ∂µ (whereα, β, µ are fixed values of the indices) into Eq. (1.43). We notethat the commutators of these vectors vanish and so

2g(∇∂α∂β, ∂µ) = ∂αgβµ + ∂βgµα − ∂µgαβ.

Nowwe can decompose an arbitrary vector u as∑

λ uλ∂λ and

compute

∇αuν = gµνg(∇∂α

u, ∂ν) = ∂αuν + Γναµu

µ.

Note that the Levi-Civita connection ∇ is defined throughthe metric g and thus implicitly depends on g. For instance,the vector field ∇uv involves not only first derivatives of v,but also first derivatives of the metric. This is easy to see byinspection of Eq. (1.43), where the metric g clearly stands un-der differentiation in the first three terms. It is instructive toverify this statement more directly, without using Eq. (1.43).Let us change the metric tensor g as follows,

g → g = e2λg,

where λ is a scalar function. (This transformation of the met-ric is called a conformal transformation because it preservesangles between vectors.) Now we can use Eq. (1.43) to de-

termine the new connection ∇. It is convenient to computeg(∇xy, z) using the new metric g, rather than the old metric.We find

g(∇xy, z) = (x λ) g(y, z) + (y λ) g(x, z) − (z λ) g(x,y)

+ g(∇xy, z). (1.45)

This expression contains first derivatives of λ such as x λ.Therefore, the Levi-Civita connection indeed involves firstderivatives of the metric.

Remark: As we have seen in Sec. 1.6.3, all the possibleconnections (covariant derivatives) ∇ can be described bychoosing a coordinate system xµ (in which the coordinatederivative is ∂) and a transformation-valued 1-form Γ. Then∇uv = ∂uv + Γ(u)v. Torsion-freeness implies the symmetryΓ(u)v = Γ(v)u, but Γ is not otherwise constrained. Of all thepossible torsion-free connections, the Levi-Civita connectionis described by a particular unique choice of Γ determined bythe metricity requirement.

Calculation: Recall that the (2,0) tensor g−1 is the inversemetric defined on 1-forms, or alternatively the map from vec-tors to 1-forms to vectors. We now show that∇u(g−1) = 0 forthe Levi-Civita connection ∇.A simple (but less explicit) calculation is as follows. Con-sider the map g from vectors to 1-forms and the map g−1 from1-forms to vectors. We have

gg−1 = 1,

where 1 is the identity map from 1-forms to 1-forms. Then

∇u(gg−1) = g∇ug−1 = ∇u1 = 0,

thus ∇ug−1 = 0.

Here is amore detailed calculation along the same lines. Foran arbitrary vector v and 1-form ω, we have

ω v = g(g−1ω,v).

Now we apply∇u to both sides,

∇u(ω v) = ∇ug(g−1ω,v) = g(∇u(g−1ω),v) + g(g−1ω,∇uv).

Since ∇ satisfies Leibnitz’s rule with respect to tensor prod-ucts and contractions, we find

∇u(ω v) = (∇uω) v + ω (∇uv)

= g(g−1(∇uω),v) + g(g−1ω,∇uv).

The result∇ug−1 = 0 follows.

1.6.7 Killing vectors

The Lie derivative operation Lv can be applied to arbitrarytensors, and in particular to the metric tensor g. Let us com-pute the Lie derivativeLvgwith respect to a given vector fieldv. By definition, Lvg is a bilinear form acting on two vectorsa,b by first letting Lv act on the entire expression g(a,b) andthen subtracting the Lie derivatives Lv of a and b:

(Lvg) (a,b) ≡ Lv (g(a,b)) − g(Lva,b) − g(a,Lvb)

= v g(a,b) − g([v,a] ,b) − g(a, [v,b]).

There exists a more convenient formula for the Lie deriva-tive of the metric. This formula uses the Levi-Civita connec-tion∇.Statement 1.6.7.1: The Lie derivative of the metric g with re-spect to a vector k is a bilinear form that can be computed as

(Lkg) (a,b) = g(∇ak,b) + g(a,∇bk). (1.46)

Proof: Let k be an arbitrary vector field. By definition ofLkg, we have for arbitrary vector fields a,b the expression

Lkg(a,b) = (Lkg) (a,b) + g(Lka,b) + g(a,Lkb)

= (Lkg) (a,b) + g([k,a] ,b) + g(a, [k,b]). (1.47)

On the other hand, g(a,b) is a scalar function, andLkg(a,b) =∇kg(a,b). Then the metricity condition ∇kg = 0 yields

Lkg(a,b) = ∇kg(a,b) = g(∇ka,b) + g(a,∇kb).

Since [k,a] = ∇ka − ∇ak, and similarly for [k,b], we obtainEq. (1.46).

35

Page 44: Winitzki. Advanced General Relativity

1 Calculus in curved space

A vector field v such that Lvg = 0 is called a Killing vectorfor the metric g. The Killing vector describes a special set ofdirections in space, such that the metric “remains constant”in these directions. It is only due to the presence of a certainsymmetry that ametric could have even a single Killing vectorfield. (By contrast, the Levi-Civita connection gives ∇vg = 0for any vector field v, but this is due to a definition of∇ ratherthan a special property of the metric g. Recall that the Liederivative is defined independently of the metric.)

Example 1.6.7.2: Consider an 3-dimensional space with co-ordinates

(x1, ..., x3

)≡ (x, y, z) and the metric

g =N∑

α,β=1

Aαβ(x, z)dxα ⊗ dxβ ,

where Aαβ(x, z) are some complicated functions of x and z(but not of y). Then it is intuitively clear that the metric g is“constant in the direction y.” The mathematical formulationof this property is that v = ∂y is the Killing vector for g.We can verify this property explicitly. The value of (Lvg)

(a,b) depends only on the values of a and b at each point p,but not on derivatives of a and b. To simplify the calculations,let us choose the vector fields a and b such that

[v,a] = 0, [v,b] = 0

at a point p. In terms of the components aµ and bµ, these con-ditions mean that

∂aµ

∂y

∣∣∣∣p

= 0,∂bµ

∂y

∣∣∣∣p

= 0

at a point p. It is always possible to adjust the derivatives ofaµ and bµ at any single point p, without changing the valuesof the vectors at that point. Then we find, at the same point p,

(Lvg) (a,b) = v g(a,b) − g([v,a] ,b) − g(a, [v,b])

= ∂yg(a,b) = ∂y

α,β

Aαβ(x, z)aαbβ

p

= 0.

The same calculation goes through at every point p. Thus ∂yis a Killing vector for g.

Example 1.6.7.3: Consider the Schwarzschild metric

g =

(

1 − 2M

r

)

dt2−(

1 − 2M

r

)−1

dr2−r2(dθ2 + (sin2 θ)dφ2

).

This metric has the Killing vectors ∂t and ∂φ. One describesthe corresponding symmetries of the spacetime by sayingthat the geometry is stationary (independent of time) and az-imuthally symmetric (independent of the azimuthal angle φ).Note that these vectors commute, [∂t, ∂φ] = 0, meaning thatthe symmetries are independent of each other. Of course, anylinear combination of ∂t and ∂φ (with constant coefficients) isalso a Killing vector for g.

Practice problem: Produce an explicit example showingthat in general Lvg 6= 0 for a vector field v and a metric g.

Using Eq. (1.46), the Killing vector property Lkg = 0 can berewritten in a more convenient way, as a differential equationfor the vector k. This equation is called the Killing equation,

g(∇ak,b) + g(a,∇bk) = 0, (1.48)

and is understood as an identity that should hold for all vec-tors a,b.

Remark: In the index notation, the Killing equation (1.48) is

∇µkν + ∇νkµ = 0 (1.49)

and involves the covariant components of the Killing vector k.

For an arbitrary vector field x, we may consider the bilinearform B(x) defined by

B(x)(a,b) ≡ g(∇ax,b).

We call B(x) the distortion tensor corresponding to the vectorfield x. In the index notation, the distortion tensor is writtenas

[B(x)

]

µν= ∇µxν .

The Killing equation (1.48) can be easily restated in terms ofthe distorsion tensor

B(k)(a,b) ≡ g(∇ak,b) (1.50)

of the vector k. Namely, k is a Killing vector iff its distorsiontensor B(k) is antisymmetric, i.e. B(k)(a,b) = −B(k)(b,a). Inother words, B(k)(a,b) is a 2-form if k is a Killing vector. Thefollowing problem derives an explicit representation of this2-form.

Practice problem: Using Eqs. (1.43) and (1.47), show that

B(k)(a,b) ≡ g(∇ak,b) =a g(k,b) − b g(k,a) − g (k, [a,b])

2

if k is a Killing vector. Then prove that the 2-form B(k) isequivalently represented as

B(k) =1

2d (gk) ,

where d is the exterior differential and gk is the 1-form de-fined by (gk) v ≡ g(k,v).Sketch of a solution: Three of the six terms in Eq. (1.43)cancel because of Eq. (1.47). The rest follows by using the for-mula (1.21).

1.6.8 *Koszul formula and the Lie derivative

In Sec. 1.6.6 we have derived the Koszul formula (1.43) by“brute force,” starting from the expression g(∇xy, z) and us-ing the defining properties of the Levi-Civita connection∇. Ashorter derivation and a more elegant-looking formula can befound using the following trick.We will have a formula for ∇ if we can compute the distor-tion tensor B(x) of an arbitrary vector field x. The trick is toconsider the symmetric and the antisymmetric parts of B(x)

separately. Let us first consider the symmetric part,

B(x)(a,b) +B(x)(b,a) = g(∇ax,b) + g(∇bx,a)

= g(∇xa − Lxa,b) + g(a,∇xb − Lxb)

= Lxg(a,b) − g(Lxa,b) − g(a,Lxb)

= (Lxg) (a,b) .

Now consider the antisymmetric part,

B(x)(a,b) −B(x)(b,a) = g(∇ax,b) − g(∇bx,a)

= ∇ag(x,b) − g(x,∇ab) −∇bg(x,a) + g(x,∇ba)

= a g(x,b) − b g(x,a) − g(x, [a,b]).

36

Page 45: Winitzki. Advanced General Relativity

1.6 Affine connection

Comparing with Eq. (1.21) and denoting by gx the 1-form

(gx) a ≡ g(x,a),

we find

B(x)(a,b) −B(x)(b,a) = (dgx) (a,b) .

Finally, we restore B(x) as a half-sum of its symmetric andantisymmetric parts. Thus, we can rewrite the Koszul formulamore concisely as follows,

B(x) =1

2d (gx) +

1

2Lxg. (1.51)

Now it becomes clearwhy the covariant derivative of a Killingvector k yields an antisymmetric 2-form B(k) = 1

2dgk.

Remark: In the index notation, the formula (1.51) is a rein-terpretation of the trivial identity

∇µkν =1

2(∇µkν −∇νkµ) +

1

2(∇µkν + ∇νkµ) .

The first term represents the components of the exterior differ-ential of the 1-form kµ, while the second term resembles theKilling equation (1.49).

Practice problem: Derive a generalization of the for-mula (1.51) in case of a connection∇ that has a given nonzerotorsion tensor T (u,v) but is still compatible with the metric g.Hint: Use Eq. (1.41) and follow the derivation of Eq. (1.51).Answer:

2B(x)(a,b) = [d (gx) + Lxg] (a,b)

+ g(T (x,a),b) + g(T (x,b),a) − g(T (a,b),x).

1.6.9 Divergence of a vector field

The divergence of a vector field u is a number characterizingthe change in the volume of space carried by the flow of u.To be definite, let us consider a four-dimensional manifold

M. Let u be a given vector field onM, and consider the flowlines of u as curves parameterized by a parameter τ . We mayselect a region V ⊂ M at an initial value τ = 0, and let the flowof u transform this region to a different region V(τ). Such a τ -dependent region of space is called comoving with the flowof u.It is more convenient to consider an “infinitesimal” volume,or a volume element. A 4-volume element of the comovingvolume can be thought of as a parallelepiped spanned by thetangent vector u|p and by three vectors connecting the pointp to nearby points on the congruence corresponding to thesame value of the curve parameter τ . Thus we need to choosethree connecting vectors a, b, c for the field u. Then we wouldlike to compute the rate of change of the volume element V ≡ε(u,a,b, c), where ε is the Levi-Civita symbol. The result willbe the derivative of V along the flow, that is, LuV .If the field u is everywhere timelike and normalized by thecondition g(u,u) = 1, wemay interpret τ as the “proper time”measured by observers along the flow lines. Then the con-necting vectors will correspond to “comoving” spatial direc-tions measured at a fixed time τ in the reference frame of anobserver. So the derivative LuV will be interpreted as the lo-cal rate of change of the comoving 3-volume of space. (The

3-volume is completed to a 4-volume by the unit vector u.)However, for the present calculation we do not need to as-sume that u is timelike.Since

Lua = Lub = Luc = Luu = 0,

the quantity LuV can be written as

LuV = Luε(u,a,b, c) = (Luε) (u,a,b, c).

The tensor Luε is a totally antisymmetric tensor (4-form), andso it must be proportional to ε itself. Hence we must haveLuε = fε, where f is some scalar function. This functiondepends on the field u and is (by definition) called the diver-gence of u, denoted divu.

Statement 1.6.9.1: The quantity divu can be expressedthrough the Levi-Civita connection ∇ and an orthonormalframe ea, such that g(ea, eb) = ηab, by the formulas

divu =∑

a

ηaag(ea,∇eau)

=∑

a

ηaag(ea, [ea,u]).

In the index notation, the divergence can be expressed as

divu = ∇µuµ ≡ uµ;µ.

Proof of Statement 1.6.9.1: Denote by θa the orthonormalbasis of 1-forms dual to ea. We can then compute

Luε = Lu

(θ0 ∧ θ1 ∧ θ2 ∧ θ3

)

=(Luθ0

)∧ θ1 ∧ θ2 ∧ θ3 + θ0 ∧

(Luθ1

)∧ θ2 ∧ θ3 + ...,

where we omitted similar terms involving θ2 and θ3. We nowconsider the first term above. The 1-form Luθ0 can be ex-pressed as a linear combination of the basis 1-forms θa,

Luθ0 =∑

a

θa(Luθ0

) ea.

Due to antisymmetry of ∧ within(Luθ0

)∧ θ1 ∧ θ2 ∧ θ3, only

the term proportional to θ0 survives, so

(Luθ0

)∧ θ1 ∧ θ2 ∧ θ3 =

((Luθ0

) e0

)θ0 ∧ θ1 ∧ θ2 ∧ θ3.

Since θ0 e0 = 1, we can simplify(Luθ0

) e0 as follows,

(Luθ0

) e0 = Lu

(θ0 e0

)− θ0 Lue0 = θ0 (∇e0u −∇ue0)

= η00g(e0,∇e0u −∇ue0) = η00g(e0,∇e0u),

where we used the property

g(e0, [e0,u]) = g(e0,∇e0u),

which is due to

g(e0, e0) = const, g(e0,∇ue0) =1

2∇ug(e0, e0) = 0.

Thus(Luθ0

)∧ θ1 ∧ θ2 ∧ θ3 = η00g(e0,∇e0u)ε.

Analogous simplifications are performed for other terms, andso we obtain the first required formula. Note that

g(ea,∇uea) = 0

37

Page 46: Winitzki. Advanced General Relativity

1 Calculus in curved space

due to the normalization of ea. Thus

g(ea,∇eau) = g(ea, [ea,u]).

The formula for divu in the index notation can be now de-rived by expressing the orthonormal basis ea through thecoordinate basis ∂µ. Suppose that eµa are the components ofthe orthonormal frame, so that

ea = eµa∂µ.

Then in the index notation we have

g(ea,∇eau) = gλµe

µaeνau

λ;ν .

Thusdivu =

a

ηaagλµeµaeνau

λ;ν .

On the other hand, Eq. (1.38) gives

a

ηaaeµaeνa = gµν .

Hence, divu = gλµgµνuλ;ν = uµ;µ.

Since Luε = (divu)ε, our final result is

LuV = (divu) ε(u,a,b, c) = (divu)V.

Thus the quantity divu is the relative rate of change of the vol-ume carried by the flow of u.The divergence of a vector field can be also expressed interms of the Hodge star operation.

Statement 1.6.9.2: If ηab is the canonical form of the metric,the divergence of a vector field u is

divu = (det ηab) ∗ d ∗ gu.

Remark: This expression for the divergence will be rarelyuseful in our calculations.

Proof of Statement 1.6.9.2: Since divu is a scalar, we have∗divu = εdivu, so it is sufficient to show that

εdivu = d ∗ gu.

Since gu is a 1-form, we have

∗gu = ιuε.

Using the property dε = 0 and the Cartan homotopy for-mula (1.22), we compute

d ∗ gu = dιuε = (Lu − ιud) ε = Luε ≡ (divu) ε.

1.7 Calculations in index-free notation

The index-free notation has its advantages and disadvantages,in comparison with the index notation.

• Index-free notation emphasizes geometric operations,such as the commutator [a,b] or scalar product g(a,b),that would otherwise remain hidden under a mass of in-dices. For instance, the Bianchi identities are clearly seen

to be consequences of the Jacobi identity and the assump-tions about the Levi-Civita connection. Also, the geomet-ric relations between tensors are shown explicitly, ratherthan encoded in the positions of indices. Thus, calcula-tions in index-free notation help develop geometric intu-ition rather than merely an agility with manipulating in-dices. In the index notation, the same calculations appearto consist of a certain lucky manipulations of indices (theso-called “juggling of indices”). It remains unclear howto guess the correct sequence of index manipulations thatwould be needed for a particular calculation.

• Index-free notation is in many cases more concise thanthe index notation, since one does not need to write in-dices next to each letter.

• In the index-free approach, every object is definedthrough geometric operations, so it is impossible to in-troduce a non-tensorial quantity. For instance, the non-tensorial Christoffel symbol Γλαβ in its usual form (1.44) issimply undefined; instead one might define a Christoffeltensor Γ, which is a transformation-valued 1-form repre-senting the difference between two given affine connec-

tions ∇ and∇,

Γ(u)v ≡ ∇uv −∇uv.

Within the index notation, one frequently uses non-tensorquantities and calculates their components in a conve-nient coordinate system, and then one needs to check thatthe results are tensors. Operations with non-tensor quan-tities obscure the logic of a derivation and may lead tohard-to-spot errors.

• Index-free notation appears more abstract, whereas ex-pressions in the index notation can be easily interpretedas just “arrays of numbers.”

• Index-free notation is unwieldy if complicated contrac-tions or symmetrizations are applied to tensors of high

rank. For instance, an expression such as A[ρµν;ρB

β]µνα is

difficult to manipulate in the index-free notation.

On the whole, it appears that the index-free notation is moresuitable for “generic” or “abstract” calculations, such as defi-nitions of new quantities or derivations of general propertiesof tensors. In calculations involving complicated tensor con-tractions and symmetrizations, the index notation is better.For specific calculations, e.g. when components of a certaintensor are known in particular coordinates, the index notationis unavoidable.Perhaps, the index notation should be studied first, be-cause a certain familiarity with the index notation seems tobe helpful for learning more abstract material. However, Ifeel that learning to think in the index-free notation is alsohelpful, if only for the fact that the entire mathematical litera-ture has been using exclusively index-free notation for severaldecades.

1.7.1 Abstract index notation

It is somewhat cumbersome to use the index-free notationwhen dealing with tensors of higher rank. For example, sup-pose A is a transformation-valued one-form such that A(v)is a linear transformation in the tangent space, i.e. A(v)w is a

38

Page 47: Winitzki. Advanced General Relativity

1.7 Calculations in index-free notation

vector. Then∇A is a transformation-valued bilinear form thatacts as

(∇A)(u,v)w ≡ [∇uA] (v)w.

Note that we have some difficulty in making it clear that(∇A)(u,v) is a derivative in the direction of u rather than v,and that onlyA is to be differentiated but not v orw. We couldwrite

“(∇A)(u,v)w” ≡ ∇u [A(v)w] − A(∇uv)w −A(v)∇uw,

but manipulating such expressions is a rather cumbersome af-fair. In such cases, it is more convenient to write everything inthe index notation, e.g.

“(∇A)(u,v)w” ≡ uµvλwα∇µAβαλ ≡ uµvλwαAβαλ;µ.

This notation clearly shows that the vector u supplies the di-rection for the derivative, and the vectors v and w are con-tracted with the tensor A, but only A is differentiated. How-ever, when using the index notation we do not actually needto talk about components of the vectors in a basis; in fact, wehave not really chosen any explicit basis so far. In most cases,the indices are actually used only as labels indicating that vec-tors and tensors are contracted with each other in a particularorder. This is the idea of the abstract index notation intro-duced by R. Penrose as a re-interpretation of the familiar com-ponent notation.9

The abstract index notation can be summarized as follows:An index, such as α in vα and uα, does not take values0, 1, 2, ..., but is merely a label indicating that v is a vectorwhile u is a 1-form. Repeated indices, e.g. vαuα, do notmean summation but instead indicate that a certain vector isbeing contracted with a certain 1-form. No particular basisor coordinate system is chosen or implied, and one is care-ful to perform only well-defined geometric operations on allquantities. (Geometric operations are the tensor product, thecontraction of tensors, the Lie derivative, and the covariantderivative. The ordinary coordinate derivative ∂/∂xµ is dis-allowed.) The results of such calculations may be interpretedeither as coordinate-free expressions or as component expres-sions “valid in any coordinate system.” Strictly speaking,the abstract index notation disallows non-tensorial quantitiessuch as the Christoffel symbol Γλαβ , operations with individ-ual components of tensors such asRα4β4, or an implicit choiceof a coordinate system (e.g. calculations in locally inertial co-ordinates).The approach taken in this book is coordinate-free and thenotation is almost everywhere index-free. However, we douse the abstract index notation when it is significantly moreconvenient. For instance, it is straightforward to express com-plicated tensor contractions such asAρµν;ρB

µνα in the index no-

tation, but cumbersome without indices.

Remark: I would like to emphasize that the index nota-

tion such as ∇µAβαλ ≡ Aβαλ;µ does not imply that the covari-

ant derivative is applied to each component of A separately.Rather, this notation refers to the components of the higher-rank tensor (∇A). For instance, the component ∇1A

112 is not

9Penrose wrote in [25]: “What I shall present here is an entirely frame-independent algebra which allows one to calculate with indexed quantities exactlyas before (but now with a clear conscience!)...” The “clear conscience” referredto is the absence of a cordinate system. However, this “conscience” is notreally “clear” if one still uses non-tensor quantities such as Γλ

αβ or per-

forms calculations in specially chosen coordinate systems.

equal to an operator “∇1” acting on the component A112 of the

tensor A. Rather, the notation ∇1A112 stands for the compo-

nent (∇A)1112 of the tensor ∇A. If (for example) Aβαλ = −Aβλαthen the “obvious” property Aβαλ;µ = −Aβλα;µ is not automati-

cally satisfied (as it would be if Aβαλ;µ were a component-wise

derivative of Aβαλ). This property must be derived as a con-sequence of the properties of ∇. The symbol ∇µ is not a col-lection of fixed tensors ∇0,∇1,∇2,∇3; rather, ∇µ representsdifferent operations when applied to different tensors. There-

fore, an abstract index expression such as Aβαλ;µ must be inter-preted and used with care.

1.7.2 Converting expressions into index-freenotation

In this sectionwe develop themethods for converting indexedexpressions into an index-free form.The correspondence between the index notation and theindex-free notation is established using the basis vectors eα ≡∂/∂xα, by assuming fixed values of all the indices. For ex-ample, consider a covector (1-form) A. It is represented inthe index notation by the symbol Aµ, while ∇αAβ means the(α, β)-component of the rank (0,2) tensor∇A. For fixed α andβ, the component ∇αAβ is the number defined by

Aβ;α ≡ ∇αAβ ≡ (∇A)αβ = (∇eαA) eβ.

An equation written in the index notation, such as

Aβ;α + gαβ − 2Aα;β = 0, (1.52)

is converted to the index-free notation by first rewriting it as

∇αAβ + gαβ − 2∇βAα,

where we introduced the symbol∇ explicitly, and then by fix-ing α, β and inserting basis vectors eα, thus obtaining

(∇eαA) eβ + g (eα, eβ) − 2

(∇eβ

A) eα = 0.

However, we expect that such tensor relations can be ex-pressed in a geometric way, i.e. as statements about arbitraryvectors, without using a particular basis eα. Indeed, thiscan be done with a little work.The basic rule of converting indexed expressions to theindex-free notation is to contract each free index with an ar-bitrary vector. The resulting expression will then depend on acertain number of arbitrary vectors but will be independent ofthe choice of basis. Consider Eq. (1.52) as an example. SinceEq. (1.52) has two free indices (α and β), let us contract it witharbitrary vectors a and b having index representations aα andbβ :

aαbβAβ;α + aαbβgαβ − 2aαbβAα;β = 0.

Then we rewrite covariant derivatives explicitly through thesymbol ∇:

aαbβ∇αAβ + aαbβgαβ − 2aαbβ∇βAα = 0.

Finally, this equation is rewritten in the index-free notation as

(∇aA) b + g (a,b) − 2 (∇bA) a = 0.

Note that (∇aA) is a 1-form acting on vectors. For the purposeof facilitating further calculations with the above expression,it might be useful to rewrite the 1-form (∇aA) b as

(∇aA) b = a (A b) − A (∇ab)

= a (A b) − A ([a,b] + ∇ba)

39

Page 48: Winitzki. Advanced General Relativity

1 Calculus in curved space

or in some other form. In any case, we now have an equationinvolving arbitrary vectors a,b and the known 1-formA, andwe are not using any basis vectors.A somewhat more complicated example is an indexed ex-pression involving repeated covariant derivatives, such as

vλ;βα − vλ;αβ = ∇α∇βvλ −∇β∇αv

λ = 0. (1.53)

Contracting Eq. (1.53) with arbitrary vectors aα and bβ , wefind

aαbβ∇α∇βvλ − aαbβ∇β∇αv

λ = 0. (1.54)

We would like to rewrite this expression in the index-free no-tation. But now we encounter a difficulty: Nested covari-ant derivatives, such as ∇a∇bv, contain also derivatives ofb, which are not present in the expression (1.54). Thereforewe first rewrite

aαbβ∇α∇βvλ = aα∇α

(bβ∇βv

λ)−(aα∇αb

β) (

∇βvλ)

= ∇a∇bv −∇(∇ab)v,

and similarly for the other term in Eq. (1.54), and then finallyEq. (1.53) becomes

∇a∇bv −∇(∇ab)v −∇b∇av −∇(∇ba)v

= [∇a,∇b]v −∇[a,b]v = 0.

In the last line, the first commutator is simply a shorthandnotation,

[∇a,∇b]v ≡ ∇a∇bv −∇b∇av,

while the second commutator [a,b] ≡ Lab is the commutatorof vector fields.

Practice problem: Rewrite the following indexed expres-sions in an index-free manner:

uαuβh;αβ = Cαβvβgαλf,λ;

uα;β − uβ;α = vαwβ − vβwα;

vαuβ;α − uαvβ;α = wαwα;β ;

where uα, vα, wα are indexed representations of vector fieldsu,v,w; f and h are scalar functions; andCαβ is a bilinear formC(x,y) ≡ Cαβx

αyβ .Answers: ∇u∇uh −∇∇uuh = C(v, g−1df), dgu = (gv) ∧

(gw), and g(x, [v,u]) = g(w,∇xw), where x is arbitrary.

Note that if we used coordinate basis vectors ∂µ and ∂νinstead of arbitrary vector fields a and b, we would have[∂µ, ∂ν ] = 0 and the term with the commutator would van-ish. Thus we would have obtained a simpler formula

∇∂µ∇∂ν

v −∇∂ν∇∂µ

v = 0,

which looks quite similar to the indexed expression (1.53).The simplification is due to the fact that the commutator term∇[a,b]v drops out. But the same simplification can be achievedwithout choosing the coordinate basis vectors. It suffices to as-sume that the vectors a and b commute. Since our expressionis actually independent of the derivatives of a and b, we canalways choose commuting vector fields a and b in a neighbor-hood of any one point. This choice will be possible in everycase when we convert an indexed expression into an index-free form, since any number of simultaneously commutingvector fields can be chosen at a point (see, for example, State-ment 1.2.11.2). We will frequently use this method of simpli-fication.

1.7.3 Index-free computations of trace

Traces of tensors can be computed in the index-free notation,although in many cases calculations are easier in the indexnotation. Nevertheless, it is possible to develop a sufficientlypowerful formalism, so that the index notation is in princi-ple never needed. In this section I explain the index-free ap-proach to calculating the trace. Since the readerswill certainlybe more familiar with the index notation for traces, I will pro-vide extensive comparisons between indexed and index-freeapproaches.I begin by reviewing the index-free definition of the trace ofa linear transformation. A transformation T is expressed in atensor form as

T = a1 ⊗ ω1 + a2 ⊗ ω2 + ..., (1.55)

where aj are some vectors and ωj are some covectors; by def-inition, any (1,1)-tensor can be decomposed in this way. Thenthe trace of T is

Tr T = ω1 a1 + ω2 a2 + ... (1.56)

However, this definition is not convenient in more compli-cated cases. For example, a tensor expressed in the index no-tation as T λαβ may be interpreted as a transformation-valued1-form,

x → T (a)x, (T (a)x)λ ≡ T λαβa

αxβ .

Supposewewould like to compute the trace of T , expressed inthe index notation as Tααβ . According to the definition (1.56),

we first need to decompose T λαβ as a linear combination oftensor products, as in Eq. (1.55). This computation is cumber-some because many tensor products will be needed in a de-composition of T λαβ . Also, such explicit decompositions usu-ally require a choice of basis. For example, the identity opera-tor can be decomposed as

1 =

n∑

j=1

ej ⊗ θj ,

where ej is a basis andθjis the corresponding dual ba-

sis. (The above relation is called a decomposition of iden-tity.) Calculations with such decompositions rapidly becomeunwieldy for higher-rank tensors, and so we will proceed byanother route.To be specific, let us show how one can compute the trace

Tααβ in a basis-free manner. We first “lower the index” to ob-

tain Tαβλ ≡ T µαβgλµ, which is in the index-free notation a tri-linear form T (a,b, c),

T (a,b, c) ≡ g(T (a)b, c).

Then we need to specify two of three arguments of T whichare to be contracted. For instance, the covector ωβ ≡ Tααβ ≡Tαβλg

αλ looks like the contraction of T (a,b, c)with respect tothe vectors a and c. To make this relationship explicit, we usethe following notation,

ω(b) = Tr(a,c)T (a,b, c). (1.57)

This notation is somewhat verbose but explicit and unam-biguous.The vectors a, c do not enter the final object ω(b) andare merely auxiliary; they may be calledmute vectors.Thus we propose the following index-free definition of thetrace of amultilinear tensor-valued function T (a,b, c, ...)with

40

Page 49: Winitzki. Advanced General Relativity

1.7 Calculations in index-free notation

respect to a,b: The tensor T (a,b, c, ...) is first written as a lin-ear function of a ⊗ b, i.e. T (a ⊗ b, c, ...), and then the trace isfound by substituting the inverse metric g−1, which is a (2,0)-tensor, instead of the argument a ⊗ b:

Tr(a,b)T (a,b, c, ...) = Tr(a,b)T (a ⊗ b, c, ...) ≡ T (g−1, c, ...).

The index notation for this trace would be gαβTαβγ.... In thisway, the trace operation can be understood as a simple substi-tution a ⊗ b = g−1. An explicit notation for this substitution,

Tr(a,b)T (a,b, c, ...) ≡ Tra⊗b=g−1T (a,b, c, ...) (1.58)

could be useful when one needs to specify explicitly that themetric g is being used. For instance, one could then intro-duce several metrics g, h, ..., and compute traces with respectto these metrics. However, we will be working most of thetime with one metric g, and so the verbose and explicit nota-tion (1.58) will not be necessary.Using the trace notation, we may now rewrite an arbitraryindexed expression in an index-free form. When an indexedexpression contains a sum over a pair of “dummy” indices,we can first lower both indices (inserting an extra gαβ) andthen replace the gαβ by a pair of mute vectors.

Example: Let us rewrite the following equation,

uαuβuγwγ;αβ = Cλαλβuαuβ,

in the index-free notation. Using the techniques fromSec.1.7.2, we rewrite the left-hand side as

g(u,∇u∇uw −∇∇uuw).

Turning now to the right-hand side, we note that it is onlynecessary to express the trace Cλαλβ , since all other contrac-tions are ordinary. The tensor Cλαµβ can be interpreted as atransformation-valued function of two vectors,

z → C(x,y)z, [C(x,y)z]λ ≡ Cλαµβz

αxµyβ.

(In an actual calculation, such as that in the practice problemon the following page below, we may be given an explicit,index-free formula for the transformation C(x,y)z.) The traceCλαλβ is first converted into a contraction with g

λµ,

Cλαλβ = gλµCλαµβ .

The tensor Cλαµβ is a function of four vectors, expressedthrough the original tensor Cλαµβ as

Cλαµβaλzαxµyβ = gκλa

κCλαµβzαxµyβ

≡ g(a, C(x,y)z) ≡ C(a, z,x,y).

Thus we rewrite

Cλαλβuαuβ = gλµCλαµβu

αuβ .

Finally, we introduce two mute vectors a,b whose tensorproduct aλbµ will replace gλµ in the contraction gλµCλαµβ .The result of the replacement is

gλµCλαµβuαuβ → aλbµCλαµβu

αuβ = C(a,u,b,u)

= g(a, C(b,u)u).

The trace of this expression with respect to the mute vectorsa,b is written as

Tr(a,b)g(a, C(b,u)u) = Cλαλβuαuβ .

In this way, we are able to rewrite the initial indexed expres-sion in an index-free manner.

To determine the trace such as that in Eq. (1.57) in an ac-tual calculation, we need to have a concrete expression for thetensor T in terms of other known vectors or tensors, and aprocedure to implement the trace operation via an index-freecalculation. I now describe such a procedure and then giveexamples.Suppose we need to compute the trace of a tensor

T (x,y, z, ...) with respect to x,y. One possibility to performthis calculation in the index-free notation is to use an explicitdecomposition of the inverse metric g−1 through tensor prod-ucts of orthonormal frame e0, e1, e2, e3,

g−1 = e0 ⊗ e0 − e1 ⊗ e1 − e2 ⊗ e2 − e3 ⊗ e3, (1.59)

which corresponds to the index notation

gαβ = eα0 eβ0 − eα1 e

β1 − eα2 e

β2 − eα3 e

β3 .

Since T (x,y, z, ...) is a bilinear form in x and y, we can com-pute the trace Tr(x,y)T (x,y, z, ...) by substituting the decom-position (1.59) of the tensor g−1 instead of the arguments x,y.Therefore

Tr(x,y)T (x,y, z, ...) = T (e0, e0, z, ...) −3∑

j=1

T (ej, ej , z, ...).

(1.60)Of course, the trace does not depend on the choice of the basis.For instance, we may also use a decomposition of the form

g−1 =

n∑

j=1

uj ⊗ vj , (1.61)

where uj and vj are some suitable vectors that are not neces-sarily linearly independent or orthogonal (and n ≥ 4). If weuse the decomposition (1.61), we will have

Tr(x,y)T (x,y, z, ...) =

n∑

j=1

T (uj,vj , z, ...).

However, it is somewhat inconvenient to have to choose par-ticular vectors for a decomposition (1.59) or (1.61), especiallysince the result does not depend on the choice of these vectors.Inmany cases, a trace Tr(x,y)T (x,y, ...) can be computed with-out selecting an explicit decomposition of the metric, espe-cially when the tensor T is given by an index-free expressioninvolving fixed vectors, the metric g, and covariant deriva-tives. The following statement lists the basic building blocksfor index-free trace calculations.

Statement 1.7.3.1: The trace operation has the followingproperties:Linearity and distributivity:

Tr (A+B) = Tr A+ Tr B,

Tr(a,b)A(...) ⊗B(a,b, ...) = A(...) ⊗ Tr(a,b)B(a,b, ...),

Tr(a,b)∇cA(a,b,x, ...) = ∇cTr(a,b)A(a,b,x, ...).

Here A,B are arbitrary tensors, g is the metric, P1, P2 aretransformations, i.e. (1,1)-tensors, and a,b, ...,x, ... are vec-tors.Symmetry:

Tr(a,b)A(a,b,x, ...) = Tr(a,b)A(b,a,x, ...),

Tr(a,b)g(P1P2a,b) = Tr(a,b)g(P2P1a,b),

Tr(a,b)g(P1a,b) = Tr(a,b)g(PT1 a,b),

41

Page 50: Winitzki. Advanced General Relativity

1 Calculus in curved space

where we denote by PT the adjoint transformation to P : Thetransformation PT acts on x giving a vector PTx that satisfiesg(PTx,y) ≡ g(x, Py) for arbitrary y. (The adjoint transfor-mation is represented in an orthogonal basis by a transposedmatrix, hence the notation PT .)Relations involving the metric:

Tr(a,b)g(∇ax,b) = divx;

Tr(a,b)g(a,b) = 4,

where 4 is the dimension of spacetime. The substitution prop-erty:

Tr(a,x)g(a,b)x = b,

or more generally,

Tr(a,x)g(a,b)A(x,y, z, ...) = A(b,y, z, ...)

for an arbitrary tensor-valued function A(...) which is linearin x.Relations involving the Levi-Citiva tensor ε ≡ θ0∧θ1∧θ2∧θ3:

Tr(a1,b1)(a2,b2)ε(a1,a2,x,y)ε(b1,b2,x′,y′)

= −2g(x,x′)g(y,y′) + 2g(x,y′)g(y,x′);

Tr(a1,b1)(a2,b2)(a3,b3)ε(a1,a2,a3,x)ε(b1,b2,b3,x′) = −6g(x,x′).

Hints: These properties are straightforward to derive us-ing a tensor decomposition of the inverse metric g−1 in anorthonormal basis. As a last resort, one could rewrite all ex-pressions in the index notation.

Partial proof: Let us derive the last two properties. Toshow that Tr(a,x)g(a,b)x = b, we decompose b =

j bjejin an orthonormal basis ej and use Eq. (1.59):

Tr(a,x)g(a,b)x =∑

j

g(ej,b)ej =∑

j

bjej = b.

For an arbitrary tensor-valued multilinear function A(...), wehave g(a,b)A(x,y, ...) = A(g(a,b)x,y, ...). Since we alreadyderived the desired property for g(a,b)x, and since A(...) islinear in its arguments, it follows that the the same trace prop-erty holds also for A(...).

Example: Let us compute the trace of the transformationPx ≡ x−2ng(n,x), where n is a given unit vector, g(n,n) = 1.

First we define the bilinear form P (x,y) ≡ g(Px,y); thenwe compute its trace with respect to x,y:

Tr P ≡ Tr(x,y)P (x,y) = Tr(x,y) [g(x,y) − 2g(n,x)g(n,y)]

= 4 − 2g(n,n) = 2.

Practice problem: For a fixed vector u in four-dimensionalspace, compute the trace

Tr(a,b)g(a, C(b,u)u),

where the transformation-valued 2-form C is defined by

C(x,y)z ≡ g(y, z)x − g(x, z)y.

Answer: 3g(u,u).

The tensor under the trace may contain derivatives of mutevectors, as long as these derivatives can be moved out ofthe expression or when the derivatives ultimately cancel out.In other words, the expression under the trace should bea linear function of the mute vectors, not really involvingtheir derivatives. Otherwise, the trace expression is undefined(meaningless). For instance, the expressions Tr(a,b)∇ab andTr(a,b)g(a, [x,b]) are meaningless because the result of a cal-culation according to Eq. (1.61) depends on the derivativesof the basis vector fields ej. Such meaningless expressionsare never encountered in practical calculations (or throughrewriting a well-defined indexed expression). On the otherhand, the following expressions containing derivatives arewell-defined,

Tr(a,b)∇xg(a,∇by) = ∇xTr(a,b)g(a,∇by) = ∇x(divy);

Tr(a,b) (∇a∇bx −∇∇abx) ≡ x ≡ gµν∇µ∇νx.

The last line is the D’Alembert operator acting on a vectorfield x. The trace is well-defined because the expression un-der the trace is actually independent of the derivatives of aand b.Computations with expressions of this kind can be simpli-fied using the following trick. It is always possible to choosebasis vector fields so that ∇ej

ek = 0 for all j, k at one pointin spacetime. Hence, when computing a well-defined traceexpression containing derivatives, one could simply assumethat all the mute vectors are “constant vectors,” in the sensethat their (covariant) derivatives are all equal to zero. This as-sumption is also consistent with the property ∇g = 0 and adecomposition (1.59) of the metric through a basis that entersthe trace via Eq. (1.60). For instance, the D’Alembert operatorcan then be written more intuitively as follows,

x = Tr(a,b)∇a∇bx.

1.7.4 Summary of calculation rules

The formalism of coordinate-free, index-free calculations issufficiently powerful to handle all situations involving vec-tor fields and 1-forms, the metric, and covariant deriva-tives. (However, the index-free notation becomes cumber-some when one needs to manipulate tensors of higher rank.)In comparison with the index notation, this formalism is moretransparent, emphasizes the geometric content of a calcula-tion, and frequently helps to guess the necessary sequenceof manipulations that lead to a desired result. I summarizethe rules of index-free, coordinate-free calculations for conve-nience.Vector fields are denoted by boldface letters a,v,x etc.,while scalar functions are denoted by ordinary letters a, b, f ,etc. It is sometimes convenient to use Greek letters α, λ, µ forscalar functions. In blackboard calculations, vectors may beunderlined or even simply left unmarked if it does not createconfusion.The basic operations of tensor calculus are tensor product,tensor contraction, and the derivative. Explicit tensor prod-ucts are written as u⊗ v and seem to be seldom needed. Ten-sor contraction is written as an application of a function to avector, for example ωv if ω is a 1-form, g(a,b) if g is a metric,or ιvω if ω is an n-form.The metric g creates a correspondence between vectors and

1-forms; this correspondence and its inverse are denoted by gand g−1. For instance, gv is a 1-form if v is a vector.

42

Page 51: Winitzki. Advanced General Relativity

1.8 Curvature

There are three basic derivative operations: the Lie deriva-tive LvA of a tensor A (the application of a vector field to ascalar field is a particular case, v λ = Lvλ ); the exteriordifferential dω of an n-form ω (where 0-forms are understoodas scalar functions); and the Levi-Civita covariant derivative∇vA of a tensor A in the direction given by a vector v. Thequantities LvA and ∇vA are tensors of the same rank as A.The quantity dω is an (n+ 1)-form if ω is an n-form.In calculations, it is usually more convenient to use∇v withvectors but d, ιv, and Lv with n-forms.The operations ιv and ∇v are linear in v and local (dependonly on the value of v at a point but not on its derivatives):

ιλv (...) = λιv (...) , ∇λv (...) = λ∇v (...) ;

however, Lv is not local in v,

Lλv (...) 6= λLv (...) .

The precise expression for LλvA depends on the rank of thetensor A. The Leibnitz rule with respect to tensor productsholds for the derivative operations, with the exception of dwhich needs an extra sign:

∇a (η ∧ θ) = (∇aη) ∧ θ + η ∧∇aθ,

La (η ∧ θ) = (Laη) ∧ θ + η ∧ Laθ,

d (η ∧ θ) = (dη) ∧ θ + (−1)|η| η ∧ dθ,

where |η| in the expression (−1)|η|means the rank n of an n-

form η. Also

ιa (η ∧ θ) = (ιaη) ∧ θ + (−1)|η|η ∧ ιaθ.

Some further rules for calculations are the following (a,b, care vectors, λ is a scalar, ω is an n-form, θ is a 1-form, g is themetric tensor).

[a,b] λ ≡ a (b λ) − b (a λ) ,Lab = [a,b] = ∇ab −∇ba = −Lba,

Laλ = a λ ≡ (dλ) a ≡ ιadλ = ∇aλ,

∇a∇bλ = ∇b∇aλ,

∇ag = 0,

(dθ) (a,b) = a (θ b) − b (θ a) − θ [a,b],

(ιaω) (b, c, ...) ≡ ω (a,b, c, ...) ;

Laω = dιaω + ιadω,

ddω ≡ 0.

The symbol ≡ refers to properties that hold directly by defini-tion.

Examples: A consequence of ∇ag = 0 and the Leibnitz ruleis the identity

∇ag−1ω = g−1∇aω

for a 1-form ω.The equivalence of∇a andLa on scalars leads to useful sim-plifications in some calculations with the metric. For instance,

a g(b, c) = ∇ag(b, c) = Lag(b, c);

∇ag(b, c) = g(∇ab, c) + g(b,∇ac),

Lag(b, c) = (Lag) (b, c) + g([a,b], c) + g(b, [a, c]).

Depending on the particular calculation, one of these equiva-lent forms may be used to proceed.

The basic rules for index-free calculations with traces arelisted in Statement 1.7.3.1. Here is another example of suchcalculation.

Statement 1.7.4.1: Let g and g = e2λg be two metrics relatedby a conformal transformation. For a vector field u, we can

then compute two different divergences divu and divu withrespect to the metrics g and g. These divergences are relatedby the formula

divu = divu +Nu λ,

where N is the dimension of the manifold.

Proof of Statement 1.7.4.1: The simplest proof is throughthe definition of divu as the coefficient in the equation

Luε = (divu) ε,

where ε is the Levi-Civita tensor corresponding to the metricg. The Levi-Civita tensor is defined as the N -form with theproperty

ε(e1, ..., eN) = 1,

if ej is an orthonormal basis in the metric g. When the met-ric g is multiplied by e2λ, each vector in the orthonormal basisis multiplied by e−λ. It follows that the Levi-Civita tensor ismultiplied by eNλ,

ε = eNλε.

Then we compute

Luε =(Lue

Nλ)ε + eNλLuε = (Nu λ) ε + (divu) ε

=(

divu)

ε.

Let us perform another, slightly longer computation of divu

using the formula involving the trace. We denote by ∇ theLevi-Civita connection for the metric g. We will use the for-mula

divu = Tr(a,b)g(∇au,b) = e−2λTr(a,b)g(∇au,b).

Using Eq. (1.45), we compute

e−2λTr(a,b)g(∇au,b) = divu

+Tr(a,b) [(a λ) g(u,b) + (u λ) g(a,b) − (b λ) g(u,a)]

= divu + (u λ)Tr(a,b)g(a,b)

= divu +Nu λ.

1.8 Curvature

A motivation for introducing the concept of curvature is todescribe in detail the deviation of the geometry of a manifoldnear each point from the flat geometry. A direct description ofthis deviation can be given using the concept of parallel trans-port (see also Sec. 1.9.1). In Appendix A a formula is derivedfor the parallel transport of a vector along an infinitesimallysmall closed curve, and thus the connection to the curvaturetensor is made explicit (see Sec. A.5.3). In this approach, thecurvature tensor is the object describing the transformationof vectors under parallel transport along infinitesimally smallclosed curves. However, calculations in this approach requireusing a local coordinate system because otherwise the paralleltransport operation cannot be described explicitly. Presently, Iintroduce the curvature tensor by a different formula that canbe directly used in index-free and coordinate-free calculations.

43

Page 52: Winitzki. Advanced General Relativity

1 Calculus in curved space

1.8.1 Curvature of a connection

The curvature of a connection ∇ is a transformation-valued2-form R(u,v) defined by10

R(u,v)w = [∇u,∇v]w −∇[u,v]w. (1.62)

At first glance, it may appear that the right-hand side of theexpression (1.62) contains derivatives of the vectors u,v,w.However, this is not so.

Statement 1.8.1.1: The vector field R(u,v)w defined byEq. (1.62) does not depend on the derivatives of the vectorfields w, u, or v. Thus, R(u,v) is indeed a transformation-valued 2-form.

Proof of Statement 1.8.1.1: Let us verify that no deriva-tives of w remain in R(u,v)w; to that end, we show thatR(u,v)λw = λR(u,v)w, where λ is an arbitrary scalar func-tion. This is obtained by a direct computation:

∇u∇vλw = ∇u (λ∇vw + (v λ)w)

= λ∇u∇vw + (u λ)∇vw

+ (u (v λ))w + (v λ)∇uw;

∇[u,v]λw = λ∇[u,v]w + ([u,v] λ)w;

now it is easy to see that

∇u∇vλw −∇v∇uλw = λR(u,v)w + ∇[u,v]λw.

Similarly, we show that R(u,v)w is a linear function alsoof v (i.e. does not involve derivatives of v) by computingR(u, λv)w. We use Calculation 1.3.1.1 to express [u, λv], andobtain

∇u∇λvw = ∇u (λ∇vw) = λ∇u∇vw + (u λ)∇vw;

∇[u,λv]w = ∇(uλ)v+λ[u,v]w = (u λ)∇vw + λ∇[u,v]w;

R(u, λv)w = λ∇u∇vw − λ∇u∇vw − λ∇[u,v]w = λR(u,v)w.

The analogous property for u follows from the antisymmetryof R(u,v)w in (u,v).

The index notation for the curvature tensor, Rαβµν , can be

defined by

(R(x,y)z)ν ≡ xαyβzµRαβµ

ν .

We can also define a tensor R(a,b, c,d) of rank (0,4) by “low-ering the index,” i.e.

R(a,b, c,d) ≡ g(R(a,b)c,d) ≡ Rαβµνaαbβcµdν .

Either of these equivalent tensors R, when evaluated usingthe Levi-Civita connection, is called the Riemann tensor ofthe manifold.11

1.8.2 Bianchi identities

In this section I review and derive the standard properties ofthe Riemann tensor.The antisymmetry property,

R(a,b, c,d) = −R(b,a, c,d), (1.63)

10Equation (1.62) is known as the Ricci identity; it is convenient to regard itas the definition of R(u, v).

11We are using the (− + −) sign convention, according to the classificationin Misner-Thorne-Wheeler [21].

follows immediately from the definition (1.62). Additionally,the curvature tensor corresponding to the Levi-Civita connec-tion has the following symmetry properties:

R(a,b, c,d) = −R(a,b,d, c), (1.64)

R(a,b, c,d) +R(b, c,a,d) +R(c,a,b,d) = 0. (1.65)

R(a,b, c,d) = R(c,d,a,b). (1.66)

The property (1.65) is called the first Bianchi identity. Theseproperties hold only for the Levi-Civita connection; we alwayswork with that connection here.

Remarks: Usually, one simply says “Riemann tensor of themanifold” instead of themore precise but cumbersome phrase“Riemann tensor of the Levi-Civita connection correspondingto the given metric on the manifold.” The Riemann tensor isa function of the metric that involves second derivatives ofthe metric (see Calculation 1.8.4.1 below). It is easier to re-member the symmetries (1.63)-(1.66) of the Riemann tensorR(a,b, c,d) if one thinks of the expression g(a, c)g(b,d) −g(a,d)g(b, c). This expression (as will be shown below) is ac-tually equal the Riemann tensor of a unit sphere in Euclideanspace, if g is the induced metric on the sphere.12

We will now show in a series of statements that: the prop-erty (1.64) is a consequence of torsion-freeness and the com-patibility of ∇ with the metric; the property (1.65) is a con-sequence of torsion-freeness and the Jacobi identity for com-mutators of vectors; and Eq. (1.66) is a purely algebraic conse-quence of Eqs. (1.64)-(1.65).

Statement 1.8.2.1: The Riemann tensor has the propertyR(a,b,x,x) = 0, where a,b,x are arbitrary vectors. The iden-tity (1.64) then follows due to linearity of R(·, ·, ·, ·).Idea of proof: Transform the torsion-freeness condition

0 = [∇a,∇b] g(x,x) −∇[a,b]g(x,x)

using the metricity condition (1.39).

Proof of Statement 1.8.2.1: We have

∇a∇bg(x,x) = 2g(∇a∇bx,x) + 2g(∇ax,∇bx),

hence[∇a,∇b]g(x,x) = 2g([∇a,∇b]x,x);

also,∇[a,b]g(x,x) = 2g(∇[a,b]x,x);

finally,

R(a,b,x,x) = g(R(a,b)x,x)

= g([∇a,∇b]x,x) − g(∇[a,b]x,x)

=1

2

([∇a,∇b] −∇[a,b]

)g(x,x) = 0.

The identity R(a,b, c,d) +R(a,b,d, c) = 0 follows by settingx = c + d in R(a,b,x,x) = 0.

The calculation in the preceding proof can be made shorterby assuming that the vectors a,b commute, [a,b] = 0, andthat the derivatives of x in directions aand b vanish. This canbe assumed without loss of generality because R(a,b,x,x)

12Due to the symmetry properties, the Riemann tensor has 112

n2(n2 − 1) in-dependent scalar components for an n-dimensional manifold. This fact,which does not seem to have direct applications in General Relativity, nev-ertheless enjoys a mention (usually without a clear derivation) in manyGR textbooks. Here I follow the venerable tradition.

44

Page 53: Winitzki. Advanced General Relativity

1.8 Curvature

does not depend on derivatives of a,b,x, but only on valuesof a and b and x at a point, and thus these derivatives (suchas ∇ab or ∇bx) can be set to zero. (We can always find anyfinite number of vector fields a,b, c, ... in the neighborhood ofa point p such that the values a(p),b(p), c(p), ... at the point pare prescribed and all first derivatives, such as ∇ab or ∇bc,vanish at p; but second derivatives, such as ∇a∇bc, will ingeneral not vanish at p.)Here is the abbreviated calculation. Assuming that ∇ax =

∇bx = [a,b] = 0, we have

0 = [a,b] g(x,x) = [∇a,∇b] g(x,x)

= 2g([∇a,∇b]x,x) = 2R(a,b,x,x).

Below we will often use the assumption of vanishing firstderivatives to simplify calculations.

Statement 1.8.2.2: The property (1.65) follows by rewritingthe Jacobi identity

[a, [b, c]] + [b, [c,a]] + [c, [a,b]] = 0

using covariant derivatives and the torsion-free condi-tion (1.42).

Proof of Statement 1.8.2.2: We start with

[a, [b, c]] = ∇a [b, c] −∇[b,c]a

= ∇a∇bc −∇a∇cb −∇[b,c]a,

and similarly for the other terms. Then we write the Jacobiidentity and rearrange the resulting expression:

[a, [b, c]] + [b, [c,a]] + [c, [a,b]] = ∇a∇bc −∇a∇cb + ∇b∇ca

−∇b∇ac + ∇c∇ab−∇c∇ba −∇[b,c]a −∇[c,a]b−∇[a,b]c

= R(a,b)c +R(b, c)a +R(c,a)b = 0.

Statement 1.8.2.3: Any tensor of rank (0,4) satisfying thetransposition properties (1.64) and (1.65) will also satisfyEq. (1.66).

Proof of Statement 1.8.2.3: For brevity, we will write abcdinstead of R(a,b, c,d) within this calculation. First, we usethe transposition properties to move the indices c, d to the left:

0 = abcd+ bcad+ cabd =

= abcd+ bcad+ (−cadb)= abcd+ bcad+ (adcb+ dcab),

which means that

abcd− cdab = −(bcad− adbc).

This last property can be now used repeatedly on differentindices:

−(bcad− adbc) = +(cabd− bdac),

+(cabd− bdac) = −(abcd− cdab).

Therefore abcd− cdab = −(abcd− cdab) = 0.

The Riemann tensor also satisfies the second Bianchi iden-tity, written in index notation as

∇κRλµνρ + ∇λRµκνρ + ∇µRκλνρ = 0. (1.67)

It is somewhat cumbersome to write this identity using theindex-free notation such as R(a,b, c,d) because only R is tobe differentiated and not the vectors a, b, c, d, which is noteasy to indicate without using indices. The property (1.67)has important consequences for GR.13

Let us now derive the Bianchi identity for thetransformation-valued tensor R(u,v) defined by Eq. (1.62).We need to convert the index expression (1.67) into an index-free form. So we introduce arbitrary vector fields a,b, c,vand consider the expression

cκaλbµvν∇κRλµνρ = ∇c

(R(a,b)v

)−R(∇ca,b)v

−R(a,∇cb)v −R(a,b)∇cv.

(The subtractions are needed to make sure that only R is dif-ferentiated.)

Statement 1.8.2.4: The identity (1.67) holds, due to the van-ishing of the sum of the twelve terms obtained via cyclic per-mutations of a,b, c in the above formula.

Proof of Statement 1.8.2.4: We start the derivation by writ-ing the Jacobi identity,

[[A,B] , C] + [[B,C] , A] + [[C,A] , B] = 0.

This is a purely algebraic relation that holds for arbitrary op-erations A,B,C in any context (as long as the composition ofoperations is associative). We now apply the Jacobi identityto the operations ∇a, ∇b, ∇c, and also use the fact that ∇u islinear in u, to derive the following identities,

[[∇a,∇b] ,∇c]v + [[∇b,∇c] ,∇a]v + [[∇c,∇a] ,∇b]v = 0,

∇[[a,b],c]v + ∇[[b,c],a]v + ∇[[c,a],b]v = 0.

The Bianchi identity will follow if we subtract the first of theserelations from the second and express various terms throughR() using Eq. (1.62). For instance,

− [[∇a,∇b] ,∇c]v + ∇[[a,b],c]v = −R(a,b)∇cv + ∇cR(a,b)v

−∇[a,b]∇cv + ∇c∇[a,b]v + ∇[[a,b],c]v

= ∇c

(R(a,b)v

)−R(a,b)∇cv −R([a,b], c)v.

Adding up the cyclic permutations of the last expressionin (a,b, c) and using Eq. (1.42) and the obvious propertyR(a,b)v = −R(b,a)v, we find all the terms required for theBianchi identity.

1.8.3 Ricci tensor and scalar

The Ricci tensor Rµν is the bilinear form Ric(a,b) defined asthe trace of R(a,x,b,y) with respect to x and y,

Ric(a,b) = Tr(x,y)R(a,x,b,y).

The Ricci tensor is a symmetric bilinear form due to the prop-erty (1.66). In the index notation, this contraction of the Rie-mann tensor is written as

Rµν = Rµανβgαβ.

The Ricci scalar is the trace of the Ricci tensor,

R ≡ Tr(a,b)Ric(a,b) ≡ gµνRµν .

13A generalization of the second Bianchi identity also exists for connectionswith nonvanishing torsion; see [33], §5.5.

45

Page 54: Winitzki. Advanced General Relativity

1 Calculus in curved space

It is sometimes more convenient to use the index notation forcalculations with the Riemann and Ricci tensors. The follow-ing examples illustrate the typical calculations. We will useboth the index and the index-free notations.

The Einstein equation relates the Ricci tensor Rµν to theenergy-momentum tensor of matter, Tµν , as follows,

Rµν −1

2gµνR = −8πGTµν , (1.68)

where G is Newton’s constant. According to this equation,curvature of the spacetime is related to the energy density, ve-locity, and pressure of matter.

Statement 1.8.3.1: It follows from the Einstein equa-tion (1.68) and the second Bianchi identity that Tµν is alwayscovariantly conserved,∇µTµν = 0.

Proof of Statement 1.8.3.1: Contracting Eq. (1.67) withgκνgµρ, we can easily show that∇µ(Rµν − 1

2gµνR) = 0.

Example: We consider a four-dimensional spacetime with aknown metric g. Consider the following relationship for asymmetric tensor Aµν , similar to Eq. (1.68):

Aµν + agµνAαα = Sµν ,

where a is a given constant, Sµν is a given symmetric tensor,and the trace Aαα is defined by A

αα ≡ Aµνg

µν . We would liketo obtain an explicit expression for Aµν in terms of Sµν .

In the index-free notation, A is a symmetric bilinear formsatisfying

A+ agTr(a,b)A(a,b) = S,

or more explicitly

A(x,y) + ag(x,y)Tr(a,b)A(a,b) = S(x,y).

The unknown tensor A would be readily found from thisequation if we knew its trace. So let us compute the trace ofboth parts with respect to (x,y); since the trace of g is 4, wefind

Tr(x,y)A(x,y) + 4aTr(a,b)A(a,b) = Tr(x,y)S(x,y).

It follows that

Tr(x,y)A(x,y) =1

1 + 4aTr(x,y)S(x,y),

and therefore

A = S − a

1 + 4ag.

In the index notation, the calculation looks as follows:

Sµνgµν = (Aµν + agµνA

αα) gµν = Aαα + agµνg

µνAαα

= (1 + 4a)Aαα,

therefore

Aµν = Sµν −a

1 + 4agµνS

λλ .

Note that there may be no solutions, or at any rate no uniqueexpression for A in terms of S, when a = − 1

4 .

Practice problem: In the Einstein-Cartan theory (which is amodification of GR), the torsion tensor T λµν is related to the

spin density tensor Sλµν of matter by the equation

T λµν + δλµTρνρ − δλνT

ρµρ = 8πGSλµν .

Consider a more general relationship,

Aλµν + a(δλµA

ρνρ − δλνA

ρµρ

)= Sλµν ,

where a is a given constant and Sλµν is a given tensor (both

Aλµν and Sλµν are antisymmetric in the lower indices). Obtain

an explicit expression for Aλµν in terms of Sλµν .

Hint: Compute a suitable trace of the left-hand side of thegiven equation.Answer: If a 6= − 1

3 , the solution is

Aλµν = Sλµν −a

1 + 3a

(δλµS

ρνρ − δλνS

ρµρ

).

1.8.4 Calculations with the curvature tensor

We will now perform some further calculations involving thecurvature tensor. The index-free notation will be used.

Statement 1.8.4.1: AKilling vector k has the following prop-erty (written in index notation),

∇µ∇νkα = Rναµβkβ .

The index-free version of the given identity is

g(∇a∇bk, c) − g(∇∇abk, c) = R(b, c,a,k). (1.69)

Outline of proof: We use the first Bianchi identity and thedefinition of a Killing vector. To simplify the calculations, weassume that first derivatives of the auxiliary vectors a,b, cvanish.

Proof of Statement 1.8.4.1: To obtain the index-free expres-sion shown above, we contract the given indexed formulawith arbitrary vectors aµ, bν , cα, and for convenience raise theindex on kα; we find

aµbνcα∇µ∇νkα = cβgαβ [aµ∇µ(b

ν∇νkα) − (aµ∇µb

ν)(∇νkα)]

= g(c,∇a∇bk) − g(c,∇∇abk);

aµbνcαRναµβkβ = R(b, c,a,k).

We note that both sides of Eq. (1.69) depend only on the val-ues of the auxiliary vectors a,b, c but not on their derivatives.So we may simplify calculations by choosing these auxiliaryvectors such that all the first derivatives (e.g. ∇ab) vanish ata given point. Then we rewrite Eq. (1.69), which we need toprove for vector fields a,b, cwith vanishing derivatives, as

g(∇a∇bk, c) = R(b, c,a,k). (1.70)

By assumption, the vector k satisfies the Killing equation(for arbitrary x,y),

g(∇xk,y) = −g(∇yk,x).

The plan is to convert ∇bk into ∇ck using the Killing equa-tion, to obtain a term g(∇a∇ck,b), and to express it throughR(a, c,k,b), which essentially differs from what we need on

46

Page 55: Winitzki. Advanced General Relativity

1.9 Geodesic curves, geodesic vector fields

the right side of Eq. (1.70) by a cyclic permutation of a,b, c.To implement this plan, we need to move ∇a temporarily tothe outside of g(...), so that we can use the Killing equation:

g(∇a∇bk, c) = ∇ag(∇bk, c) − g(∇bk,∇ac);

∇ag(∇bk, c) = −∇ag(∇ck,b)

= −g(∇a∇ck,b) − g(∇ck,∇ab).

Since the first derivatives of a,b, c vanish by assumption, wehave simply

g(∇a∇bk, c) = −g(∇a∇ck,b).

We replace∇a∇ckwith the Riemann tensor,

g(∇a∇ck,b) = R(a, c,k,b) + g(∇c∇ak,b) + g(∇[a,c]k,b)

= R(a, c,k,b) + g(∇c∇ak,b),

since [a, c] = 0. Thus we have derived the property

g(∇a∇bk, c) = −R(a, c,k,b) − g(∇c∇ak,b).

Now we note that the structure of the term on the right sideis exactly the same as that of the left side, except for a cyclicpermutation of the vectors a,b, c. Since the first Bianchi iden-tity involves the permutation of the first three arguments ofR(...), it is convenient to replace−R(a, c,k,b) = R(a, c,b,k).Thus we can apply this relation twice more, until the cyclicpermutation is complete, and we find

g(∇a∇bk, c) = R(a, c,b,k) −R(c,b,a,k)

+R(b,a, c,k) − g(∇a∇bk, c).

Using the first Bianchi identity, we find

g(∇a∇bk, c) = −R(c,b,a,k),

which coincides with Eq. (1.70).

As another application of the index-free method of calcu-lation, we will derive the formula for the change in the Rie-mann tensor under a conformal transformation, g → g ≡ e2λg,where λ(x) is a fixed scalar function. The nonzero function e2λ

is called the conformal factor. Since the resulting expressioncontains second derivatives of the conformal factor, it will fol-low that the Riemann tensor depends on second derivativesof the metric.

Calculation 1.8.4.1: Consider a conformal transformation ofthe metric, g → g ≡ e2λg. It can be derived from Eq. (1.45) thatthe Riemann and Ricci tensors of the new metric g are relatedto the old tensors in the following way. The new Riemanntensor is

e−2λR(a,b, c,d) = R(a,b, c,d)

+Hλ(a, c)g(b,d) −Hλ(a,d)g(b, c)

−Hλ(b, c)g(a,d) +Hλ(b,d)g(a, c)

+ det

∣∣∣∣∣∣

g(a, c) g(b, c) g(l, c)g(a,d) g(b,d) g(l,d)g(a, l) g(b, l) g(l, l)

∣∣∣∣∣∣

,

where l ≡ g−1dλ and the symmetric bilinear form

Hλ(a,b) = Hλ(b,a) ≡ g(∇al,b) ≡ ιb∇adλ

is called the Hessian of the function λ. The Hessian is a ten-sor representing all the (covariant) second derivatives of λ de-fined with respect to the old metric g. In the index notation,the Hessian of λ is the tensor λ;αβ .The new Ricci tensor is

Ric (a, c) = Ric (a, c) + g (a, c) λ

+ (N − 2) [Hλ(a, c) + g(l, l)g(a, c) − g(a, l)g(c, l)] ,

whereλ ≡ Tr(a,b)Hλ(a,b)

denotes the covariant D’Alembert operator (theD’Alembertian) defined with respect to the old metricg. Note that

g(l, l) ≡ g−1(dλ,dλ).

The new Ricci scalar is

R = e−2λTr(a,c)Ric (a, c)

= e−2λ[R+ 2 (N − 1)λ+ (N − 1) (N − 2)g−1(dλ,dλ)

].

In the index notation,

λ = gµνλ;µν , g−1(dλ,dλ) = gαβλ,αλ,β .

(Details on page 169.)

1.9 Geodesic curves, geodesic vectorfields

We have seen before that there is no a priori relation betweentangent vectors at different points of the manifoldM. In otherwords, there is no “intrinsic” (i.e. naturally defined) way tocarry a vector or a tensor from one point to another. However,we can use a connection ∇ to define the notion of a direction-preserving, or “parallel,” transport.

1.9.1 Parallel transport of vectors

A vector field v is called parallelly transported along a curveγ(τ) if the covariant derivative of v along the curve vanishes,

∇γv = 0. (1.71)

The interpretation is that we carry a vector v along the curveγ while keeping the magnitude and direction of v constant.Heuristically, the only way to control the constancy of v isthrough the covariant derivative in the direction γ, hence therequirement (1.71).Let us compare this condition with a similar one involvingthe Lie derivative,

Lγv = 0. (1.72)

A vector v satisfying the condition (1.72) is a connecting vec-tor for the congruence of curves to which γ(τ) belongs; itis also called Lie-propagated along the curve. The condi-tion (1.71) depends only on the direction of γ along the curve,while the Lie derivative in Eq. (1.72) requires us to have a vec-tor field γ defined also around the curve γ(τ). In other words,we need a congruence of curves and not just one curve. Differ-ent choices of the congruence around γ will lead to differentconnecting vector fields v. The geometric meaning is that vconnects points on neighboring curves corresponding to thesame value of the parameter τ .

47

Page 56: Winitzki. Advanced General Relativity

1 Calculus in curved space

To summarize: Transporting a vector from one point to an-other along a curve requires to have a relationship betweentangent spaces at different points. One needs additional infor-mation to produce such a relationship. In the case of Eq. (1.71),this information comes from the chosen connection ∇; in thecase of Eq. (1.72), this information comes from the neighbor-ing curves from a congruence containing the given curve γ.Clearly, the construction of a parallel transport closely reflectsthe intuitive notion of a vector carried along a curve “withoutchange.”

Statement 1.9.1.1: The scalar product g(u,v) is constantalong a curve γ if both u and v are parallelly transportedalong the curve.

Proof: A straightforward computation gives

∇γg(u,v) = g(∇γu,v) + g(u,∇γv) = 0.

1.9.2 Geodesics

A natural question to ask about a curve γ(τ) is whether thecurve parallelly transports its own tangent vector γ(τ). Ifthe tangent vector γ(τ) is indeed parallelly transported alongthe curve, heuristically one could say that the curve γ(τ) is“straight” in the sense that it is a line whose direction remainsunchanged along the line. Within a curved (nonflat) manifold,such lines γ(τ) are the closest analog of straight lines.

By definition, a curve γ(τ) is called a geodesic if its tangentvector γ is parallelly transported along the curve.

It is worth emphasizing that the parallel transport in GR isdefined through the Levi-Civita connection determined by thegivenmetric g. Properties of parallel transport, geodesics, andcurvature are very different if a non-Levi-Civita connection isused. In this book we always denote by ∇ the Levi-Civitaconnection and we use no other connections, so the choice ofconnection will not be discussed any more.

If we temporarily assume that the tangent vector γ is a partof a vector field v, the condition for being geodesic can bewritten as a differential equation on v, called the geodesicequation,

∇vv = 0, (1.73)

which however must hold only at points p = γ(τ) along thecurve. Note that the derivative is taken only in the directionof v itself, so it is sufficient to have the vector field v definedonly on the curve γ (where v = γ is the tangent vector to thecurve). The derivative ∇vv does not depend on the valuesof v outside of the curve. Therefore, the assumption that γ isa part of a vector field v is inessential, and Eq. (1.73) makessense also for the tangent vector v = γ defined only along thecurve.

A vector field v that satisfies the geodesic equation (1.73)at every point (not merely along a single flow line) is called ageodesic vector field.

It is perhaps not immediately evident that the choice of theparameter τ is important for a geodesic curve, according tothis definition. Namely, if γ(τ) is a geodesic and we changethe parameter to τ = τ(σ), then the new curve γ(σ) ≡ γ(τ(σ))will not necessarily be a geodesic. If the parameter τ is chosenso that γ(τ) is a geodesic then τ is called an affine parameterand the vector γ is called the affine tangent vector.

Statement 1.9.2.1: If γ(τ) is a geodesic and the parameter ischanged by τ = τ(σ), the new curve γ(σ) is again a geodesiconly if the function τ(σ) is of the form τ(σ) = aσ + b withconstant a, b and a 6= 0. Thus the affine parameter is definedup to an affine transformation.

Proof of Statement 1.9.2.1: We have

˙γ = τ ′(σ)γ; ∇ ˙γ˙γ = τ ′(σ) (γ τ ′(σ)) γ, (1.74)

which vanishes only if τ ′(σ) = const along the curve.

The square of the tangent vector, g(γ, γ), remains constantalong a geodesic due to Statement 1.9.1.1. This means that atimelike geodesic remains timelike at all times, a null geodesicremains null, and a spacelike geodesic remains spacelike. Itfollows from Eq. (1.74) that an affine parameter along a non-null geodesic can always be chosen so that g(γ, γ) = ±1: It

is sufficient to divide τ by the constant number√

g(γ, γ) toachieve this parameterization. But a null geodesic does nothave a naturally fixed affine parameter τ ; the freedom to re-place τ → aτ + β remains for null geodesics.

Physical interpretation of geodesics: According to GR,massive point particles in freefall move along worldlines γ(τ)which are timelike geodesics in spacetime. In that case, it isstandard to choose the affine parameter τ as the proper timemeasured along the particle’s worldline, so that g(γ, γ) = 1.On the other hand, electromagnetic waves (in the geometri-cal optics approximation) can be pictured as rays propagat-ing along null geodesic curves.14 So I will call null geodesiccurves lightrays. In a calculation involving lightrays whosefrequency is being measured, it is possible to choose the affineparameter for a null geodesic γ(τ) such that the frequency νmeasured at a point by an observer with 4-velocity u is nu-merically given by

ν = g(u, γ). (1.75)

In that case, hν (where h is Planck’s constant) is the locallymeasured energy of a single photon propagating along thenull geodesic γ(τ). The 4-vector hγ can be interpreted as the4-momentum of that photon.The statements about the lightrays can be justified by con-sidering a local coordinate system xµ of an observer whohas the 4-velocity u and measures the frequency of an elec-tromagnetic wave near a spacetime point p. In a very smallneighborhood of p, the wave looks like a plane wave and themetric g can be brought into the Minkowski form, g ≈ η, bya choice of local coordinates. Then the electromagnetic po-tential A near the point p can be expressed by the familiarformula

A = A0 exp [iη(k,x)] ,

where k is the wave vector, A0 is the amplitude of the wave,and x is a vector representing the local coordinates. (Thewavevector k is a null geodesic vector field.) In a reference frame ofthe observer, the frequency of the wave is the coefficient at tin the expression exp [2πiνt] in the phase of the wave; in otherwords,

ν =1

2πk0,

where k0 is the time component of the 4-vector k. The 4-velocity u has components 1, 0, 0, 0 in the observer’s refer-ence frame. Therefore, the frequency ν can be computed as

14This follows from the equations of Maxwell electrodynamics in curvedspace. In this book I will not derive this property.

48

Page 57: Winitzki. Advanced General Relativity

1.9 Geodesic curves, geodesic vector fields

ν = 12πη(k,u) in the observer’s reference frame. To obtain a

formula for ν valid in an arbitrary coordinate system, we needto use the metric g instead of η; hence

ν =1

2πg(k,u).

Since the affine parameter may be multiplied by a constantfactor, we can choose the affine parameter τ along the nullgeodesic γ(τ) such that γ = 1

2πk. Then Eq. (1.75) holds.

Remark: Since the geodesic equation ∇γ γ = 0 is a first-order equation with smooth coefficients,15 it has a unique so-lution that depends only on the direction of the vector γ at aninitial point p. Thus we have a map from vectors v ∈ TpM ata point p to geodesic lines starting at p. This map is called theexponentiation map exp:v → γ(τ). The resulting geodesicsare sometimes denoted γ(τ) = exp(vτ), but we will not usethis notation since it does not give a computational advan-tage.

A useful property of null geodesics is formulated in the fol-lowing statement.

Statement 1.9.2.2: A null geodesic γ(τ) remains a geodesicwhen the metric is rescaled by a conformal transformation,g = e2λg, under a suitable change of the affine parameter τ →f(τ), such that the new affine tangent vector is e−2λγ. In otherwords, the shape of a null geodesic is conformally invariant.

Idea of proof: Use Eq. (1.45).

Proof of Statement 1.9.2.2: Suppose that the new affine tan-gent vector is n = φn, where φ is an unknown function andn = γ. It follows from Eq. (1.45) and ∇nn = 0 that, for arbi-trary x,

g(∇φn (φn) ,x) = 2φ2 (∇nλ) g(n,x) + g(∇φn(φn),x)

= g(n,x)φ2 (2∇nλ+ ∇n lnφ) .

This will vanish if φ = e−2λ. The new affine parameter τ isfound by integrating φ along the curve γ.

1.9.3 Geodesics extremize proper length

In Riemannian geometry, when the metric is positive-definite,sufficiently short geodesic curves have an important property:they are curves of shortest length among all curves connectingtwo given points. In GR, the metric is not positive-definite, sogeodesic curves extremize the proper length. In this section wederive the geodesic equation from the corresponding varia-tional principle.The proper length of a curve segment γ(τ) is

L[γ] =

∫ τ2

τ1

g(γ, γ)dτ. (1.76)

The proper length is extremized (but not necessarily maxi-mized or minimized) if the functional L[γ] does not change infirst order in the perturbation when the curve γ is infinites-imally perturbed (while the endpoints γ(τ1) , γ(τ2) remainfixed). We now apply the standard considerations of the vari-ational calculus to this condition.

15This property may not hold at a point where the manifold is not smooth,i.e. at a physical singularity. In this book I do not attempt to consider non-smooth manifolds and do not examine the precise conditions for smooth-ness.

Let v ≡ γ be the tangent vector along the curve. A pertur-bation of the curve γ can be described as the flow of a vectorfield t which, we imagine, is directed “transversely” to γ andleaves the endpoints γ(τ1,2) in place. Let us denote by σ a pa-rameter along the lines of t, starting from the curve γ. Thus,the flow of t “infinitesimally perturbs” γ into a curve γ, whenwe let the lines of t carry the points of γ for an infinitesimalparameter distance δσ. Let v be the “perturbed” vector fieldobtained by transporting the initial curve γ(τ) by various dis-tances along the flow of t. By construction, v is a connectingvector for t, and so ∇tv = ∇vt. The proper length of a per-turbed curve can now be seen as a function of σ,

L[γ] =

∫ τ2

τ1

g(v, v)dτ.

Hence, the curve γ will extremize the proper length ifdL/dσ = 0 at σ = 0. The operation d/dσ corresponds to ap-plying (under the integral sign) a derivative Lt = ∇t alongthe flow of t,

d

dσL[γ] =

∫ τ2

τ1

∇t

g(v, v)dτ =

∫ τ2

τ1

g(∇vt, v)√

g(v, v)dτ.

Here we have used the property ∇tv = ∇vt. The last expres-sion above does not contain derivatives of v and is evaluatedon the initial curve γ where σ = 0 and v = v, hence

d

dσL[γ]

∣∣∣∣σ=0

=

∫ τ2

τ1

g(∇vt,v)√

g(v,v)dτ. (1.77)

The curve γ extremizes the proper length if the above integralvanishes for all t. Since an arbitrary change of the parameterτ ′ = f(τ) does not change the proper length L[γ], we maychoose τ to normalize the vector v so that g(v,v) = constalong the curve γ, where the constant may be either ±1 orzero, depending on the causal character of the curve. At themoment, we only consider segments of the curve that have thesame causal character, and also assume that g(v,v) 6= 0 (thecase of null curves will be considered below). Similarly, theperturbed curve γ can be normalized in the same way, so thatthe flow of the field t does not perturb the normalization and

∇tg(v,v) = 0. We can then drop the constant factor√

g(v,v).Since

g(∇vt,v) = ∇vg(t,v) − g(t,∇vv),

an integration over τ removes the total derivative term

∇vg(t,v) ≡ d

dτg(t,v),

because by assumption the vector field t vanishes at the end-points. The remaining term yields the condition

∫ τ2

τ1

g(t,∇vv)dτ = 0.

Since this condition should be satisfied for any vector fieldt, we find ∇vv = 0, which is precisely the geodesic equa-tion (1.73). Note that we have already fixed the normaliza-tion of the curve, g(v,v) = const, so the resulting geodesicequation is not invariant under arbitrary changes of parame-ter τ → f(τ).

49

Page 58: Winitzki. Advanced General Relativity

1 Calculus in curved space

Remark: For a brief time in the above calculation, weneeded to use expressions such as Ltv and ∇tv, which re-quire that v be a vector field defined also away from the curveγ(τ). This minor inconvenience is easily remedied by suppos-ing that the curve γ(τ) is one of the flow lines of some vectorfield v, so that v is the tangent vector to γ along the curve γ,but v can have arbitrary values away from the curve γ as longas v is smooth. Since the final result (the geodesic equation)contains only the derivative∇vv along the curve γ, the valuesof v away from γ are irrelevant.

Practice problem: (a) By considering the variation ofthe functional (1.76) but without assuming the conditiong(v,v) = const, derive a more general form of the geodesicequation,

∇v

(

1√

g(v,v)v

)

= 0.

(However, do assume that g(v,v) 6= 0.)(b) Suppose that a vector field v satisfies the equation

∇vv = µv for some scalar function µ. Show that v can bemade geodesic by a rescaling v → λv.

Remark: In the derivation above, we assumed that g(v,v) 6=0; let us now consider the case of null curves, i.e. curves γ(τ)such that g(γ, γ) = 0 along the curve. Such null geodesicscannot be obtained by extremizing the functional (1.76), be-cause a variation (even an infinitesimal one) of a null curveγ(τ)will make some portions of the curve spacelike and other

portions timelike, and then√

g(γ, γ) will not remain well-defined. To avoid this difficulty, one can simply postulateEq. (1.73) for null geodesics. Alternatively, a different vari-ational principle can be used to derive the geodesic equa-tion (1.73). Consider the variational problem with the func-tional

L[γ(τ);N(τ)] ≡∫ τ2

τ1

g(γ, γ)N(τ)dτ,

depending on an additional unknown function N(τ). Sincethe functional does not contain derivatives of N , this variableplays the role of a Lagrange multiplier. The variation withrespect to N(τ) yields the constraint g(γ, γ) = 0, while thevariation with respect to γ(τ) yields the equation

N∇vv = − (∇vN)v +1

2g(v,v)g−1dN.

Using the constraint g(v,v) = 0 and choosing a functionN(τ)that is constant along the flow of v, one recovers the geodesicequation (1.73). When the LagrangemultiplierN(τ) is chosenarbitrarily, one still obtains the geodesic curve but in a non-affine parameterization.The variational principle can be generalized to the case oftimelike or spacelike geodesics in the following way. Onewrites the functional

L[γ(τ);N(τ)] ≡∫ τ2

τ1

[

g(γ, γ)N(τ) − K

N(τ)

]

dτ,

where K is a constant. Variation with respect to N(τ) yieldsthe constraint g(γ, γ) = K , while variation with respect toγ(τ) yields the non-affine geodesic equation as before. Thevalue of K can be chosen freely to control the normalizationof the curve; the value K = 0 is perfectly admissible. In thisway, geodesics of every kind are described through a singlevariational principle.

1.9.4 *Motion under external forces

In classical physics, equations of motion for particles andfields are derived from an action principle: The correct tra-jectories must extremize a certain functional called the actionfunctional. As an example, we now derive the equations ofmotion for a charged massive particle in the presence of anexternal electromagnetic field in curved spacetime.16

The motion of a massive particle corresponds to a timelikeworldline γ(τ) in spacetime. The electromagnetic field is de-scribed by a 1-formA, and the gravitational field by themetricg. We will assume that these fields are fixed, and the questionis to determine the trajectory γ(τ) of the particle.

The action functional for the particle is

S[γ; A, g] = −m∫ τ2

τ1

g(γ, γ)dτ + q

∫ τ2

τ1

(A γ) dτ,

wherem is the rest mass and q is the electric charge of the par-ticle. The first term in the action is m times the proper timealong the trajectory, while the second term is simply the inte-gral of the 1-formA along the worldline, which may be writ-ten more concisely as

γA (see Eq. (1.16) and the preceding

explanations).

The equations of motion for the particle are derived by ex-tremizing the functional S with respect to the trajectory γ(τ).Although the actual computation is fairly short, I will gothrough it slowly and show every step.

As in Sec. 1.9.3, we introduce a vector field twhich is trans-verse to the curve γ and vanishes at τ = τ1,2. The vector fieldt perturbs the curve γ by shifting it to the side. Then we con-struct the vector field v by shifting the curve γ along the flowlines of t by different parameter distances σ. The variation ofthe functional S is found by applying the derivative Lt underthe integral,

d

dσS[γ; A, g] =

∫ τ2

τ1

Lt

(

−m√

g(v, v) + qA v)

dτ. (1.78)

The terms under the integral are simplified using Ltv = 0and the Leibnitz property of the Lie derivative. The first termyields

Lt

g(v, v) =(Ltg) (v, v)

2√

g(v, v)

1=g(∇vt, v)√

g(v, v)

2= −g(t,∇v

v√

g(v, v)) + ∇v

g(t, v)√

g(v, v)

(1= is via Eq. (1.46),

2= is to separate a total derivative). The in-

tegral of the total derivative ∇v(...) vanishes since t vanishesat the endpoints τ1,2. Thus the first integral term in the right-hand side of Eq. (1.78) is simplified to

m

∫ τ2

τ1

g(t,∇v

v√

g(v, v))dτ.

The second term in the right-hand side of Eq. (1.78) becomes

Lt (A v) = (LtA) v,

16I called the electromagnetic field external to emphasize that this field isassumed to be given and fixed, rather than determined dynamically.

50

Page 59: Winitzki. Advanced General Relativity

1.10 Example: hypersurface of constant curvature

and then the line integral can be transformed as

∫ τ2

τ1

(LtA) vdτ1=

γ

LtA2=

γ

(dιtA + ιtdA)

3=

γ

ιtdA4=

∫ τ2

τ1

(dA) (t, v) dτ

= −∫ τ2

τ1

ιtιv (dA) dτ

(1= is by definition (1.16) of integrals of 1-forms over curves,

2=

is by the Cartan homotopy formula (1.22),3= is due to

γ df =

0 for a function f that vanishes at the endpoints of the curve,

and4= is by definition of ιt). Putting the two terms together

and evaluating the derivatives on the initial curve γ, we find

d

dσS[γ; A, g]

∣∣∣∣σ=0

=

∫ τ2

τ1

mg(t,∇v

v√

g(v,v))dτ

−∫ τ2

τ1

q ιtιv (dA) dτ.

The integrand above is linear in the vector field t, and thuscan be expressed as an auxiliary 1-form V applied to t,

d

dσS[γ; A, g]

∣∣∣∣σ=0

=

∫ τ2

τ1

(ιtV ) dτ =

∫ τ2

τ1

(V t) dτ,

V ≡ mg∇v

v√

g(v,v)− qιvdA.

The action S[γ; A, g] is extremized if the derivative dS/dσ = 0.This can happen for an arbitrary field t only if the 1-form V

vanishes for all τ . Thus we obtain the equation

V = mg∇v

v√

g(v,v)− qιvdA = 0.

This is the equation of motion for a charged particle in an ex-ternal electromagnetic field.We may rewrite this equation in a more familiar form bychoosing the parameter τ such that g(v,v) = 1 and notingthat the 2-formdA is the electromagnetic field tensor (2-form),usually denoted F :

m∇vv = qg−1ιvF .

This can be recognized as a covariant form of Newton’s law.The left-hand side is the proper acceleration times mass, andthe right-hand side is the Lorentz force. In the index notation,this is written as

mvαvµ;α = qgµνFλνvλ.

Note that the electromagnetic force appears in the right-handside, while the gravitational force is not explicitly introduced(but is implicitly present because of the covariant derivativein the left-hand side). In GR, the effects of gravitation are de-scribed not by forces but by the modification of the equationsof motion due to the presence of covariant derivatives.

1.9.5 Deviation of geodesics

Consider a geodesic congruence with an affine tangent vectorfield v and an affine parameter τ corresponding to the propertime, so that g(v,v) = 1. Let us investigate how neighbor-ing geodesics deviate from each other with τ . It is natural

to measure the deviation between neighbor geodesics using aconnecting vector field c, since it would connect points cor-responding to the same value of τ . In the reference frame ofan observer whose time coordinate is τ , the neighbor geodesicthus has the coordinate cδ, where δ is “very small.” Then theobserver moving along a geodesic worldline γ(τ), such thatγ = v|p=γ(τ), measures the 4-acceleration of the neighbor ob-server as

a = ∇γ∇γc.

Let us compute this quantity using the properties

[v, c] = ∇vc −∇cv = 0, ∇vv = 0, γ ≡ v(p).

We find, evaluating all the quantities at the point p,

a = ∇v∇vc = ∇v∇cv = R(v, c)v + ∇c∇vv

= R(v, c)v.

This is called the equation of geodesic deviation. In compo-nents, this equation can be written as

aµ = Rλνρµvλcνvρ.

The equation of geodesic deviation shows that the Rie-mann tensor R has a direct physical manifestation: Inertialgeodesics will acceleratewith respect to each other (exhibitinga gravitational tidal effect) iff the Riemann tensor is nonzero.Furthermore, the entire Riemann tensor Rλµνρ can be recov-ered if the vector-valued quantity R(v, c)v is known for arbi-trary v and c; this is derived in the following statement. Thus,all the components of the Riemann tensor at a point can be (inprinciple) measured using only (a finite number of) geodesicdeviation experiments within an infinitesimal neighborhoodof that point.

Statement 1.9.5.1: The Riemann tensor at a point can be de-termined from a finite number of measurements of geodesicdeviation using only trajectories of massive particles in a sin-gle reference frame. In other words, R(a,b, c,d) can be ex-pressed through its special values of the form R(v,x,v,y) forsome future-directed, timelike vectors v and some spacelikevectors x,y orthogonal to v. (Proof on page 170.)

1.10 Example: hypersurface ofconstant curvature

Let xµ be the usual rectangular coordinates in a flat space Rm

with metric g, and let us consider the hypersurface xµxµ ≡g(x,x) = A2, where A is a given constant. This hypersurfacewill be a sphere if g is a Euclidean metric, or a hyperboloidif g is pseudo-Euclidean, but for now we assume that g hasa Euclidean signature and A2 > 0. We shall show that thishypersurface is an (m− 1)-dimensional space of constant cur-vature; this will elucidate the concept of constant curvature.This result holds regardless of the signature of the metric andthe sign of A2, as long as A2 6= 0.

1.10.1 Tangent bundle and induced metric

The vector field nµ(x) ≡ 1Ax

µ is, by construction, everywherenormal to the hypersurface and normalized to g(n,n) = 1.(The normalization g(n,n) = 1 holds only on the hypersur-face, but this is sufficient for our calculations.) Vectors t tan-gent to the hypersurface satisfy g(n, t) = 0; let us call such

51

Page 60: Winitzki. Advanced General Relativity

1 Calculus in curved space

vectors simply tangent. The equivalent condition for tangentvectors t is

t (g(x,x) −A2) = t g(x,x) = 0

for x on the hypersurface.

Statement: The commutator of tangent vector fields is againtangent.Hint: Tangent vectors to a hypersurface f(p) = 0 satisfy

x f = 0.Derivation: This follows directly from the definition of thecommutator: The vector field [x,y] is defined intrinsicallythrough curves lying within the hypersurface and derivativesof functions along these curves. So [x,y] is a derivative alongsome curve within the hypersurface, and thus must be a tan-gent vector field. Here is also an explicit calculation. Supposethat the hypersurface is specified by an equation f(p) = 0;then a vector field x is tangent if xf = 0. By definition of thecommutator we have (for tangent x,y)

[x,y] f = x (y f) − y (x f) = 0.

Let us construct a self-adjoint projector onto the tangentbundle of the hypersurface, that is, a self-adjoint linear trans-formation P such that P 2 = P and Pu is everywhere tan-gent for any field u. It is easy to see that the (transformation-valued) tensor field P must be defined as

Pu = u − n g(u,n) (1.79)

and then we have g(Pu,n) = 0.The condition of self-adjointness,

g(Px,y) = g(x, Py), (1.80)

can be motivated by the consideration that the projection of avector x onto the tangent space must preserve the scalar prod-uct of x with vectors that are already tangent,

g(x, Py) = g(Px, Py) for all x,y.

This latter requirement expresses the usual geometric pic-ture of projecting “straight down to the surface”: a vector xis modified by subtracting a multiple of the normal vector,which preserves scalar product. By exchanging x with y, wefind

g(x, Py) = g(Px, Py) = g(Py, Px) = g(y, Px),

which is the condition (1.80). Conversely, Eq. (1.80) with theproperty P 2 = P gives g(x, Py) = g(Px, Py).The inducedmetric on the hypersurface is simply the samemetric as in the larger space R

m but restricted to tangent vec-tors. For convenience of notation, we call the induced metrich and define h(u,v) = g(u,v) for tangent vectors u,v. Thusthe hypersurface has the structure of a (pseudo-) Riemannianmanifold. Let us now compute the curvature tensor of thismanifold.

1.10.2 Induced connection

Since the hypersurface has a metric (the induced metric h), wemay define the corresponding Levi-Civita connection ∇ (alsocalled the induced connection). This connection is defined

only for vector fields tangent to the hypersurface. We note thatthere is a natural flat connection ∂ in Rm, which is acts by thepartial derivatives with respect to the Cartesian coordinates,

(∂uv)α ≡ uµ∂µv

α.

We shall now see how to express the connection ∇ on thehypersurface through the existing connection ∂ in the largerspace.The first motivation for deriving the induced connection isthe following heuristic consideration. If t and u are tangentvector fields, we can compute ∂ut, but the result is not neces-sarily a tangent vector field. Tomake it tangent, we can project∂ut onto the tangent bundle using the projection operator P ,i.e. we define

∇ut ≡ P∂ut. (1.81)

The result is the “induced” covariant derivative ∇ut that hasvalues in the tangent bundle of the hypersurface.However, it is not immediately obvious that this “pro-jected” connection ∇ is indeed the Levi-Civita connectionthat would be defined intrinsically on the hypersurface, with-out using the already existing connection ∂ in a larger space.Therefore we give a second, more rigorous motivation: TheLevi-Civita connection on the hypersurface is defined byEq. (1.43) if we substitute h, the induced metric, instead ofg. Since h = g and the Levi-Civita connection for g is ∂, weobtain for arbitrary tangent vectors x,y, z

h(∇xy, z) = g(∂xy, z). (1.82)

However, Eq. (1.82) defines the vector ∇xy only through itsscalar product with an arbitrary tangent vector z. The vector∂xy is not necessarily tangent, but Eq. (1.82) shows that ∇xy

and ∂xy must have equal scalar products with any tangentvector z (and, of course, ∇xy must be tangent). Therefore,∇xymust be the projection of ∂xy onto the tangent bundle, asshown in Eq. (1.81).Let us note that g(n, t) ≡ 0 for any tangent vector t, andthus

0 = ∂ug(n, t) = g(∂un, t) + g(n, ∂ut)

on the hypersurface. Then we can rewrite the induced con-nection (1.81) using Eq. (1.79) as

∇ut = ∂ut− ng (n, ∂ut) = ∂ut + ng(t, ∂un) ≡ ∂ut + Γ(u)t,(1.83)

where the tensor Γ is the following transformation-valued 1-form,

Γ(u)t ≡ ng(t, ∂un) = Γλαβuαtβ ; Γλαβ ≡ nλ∂αnβ.

The tensor Γ is called the connection 1-form or theChristoffelsymbol. Substituting nµ = 1

Axµ, we get

∂αnβ =

1

Aδβα, ∂un =

1

Au, Γ(u)t =

1

Ang(u, t). (1.84)

Statement: The induced connection ∇ is in the Levi-Civitaconnection on the hypersurface, i.e. for arbitrary tangent vec-tors x,y, z we have

∇xh(y, z) − h(∇xy, z) − h(y,∇xz) = 0,

∇xy −∇yx − [x,y] = 0.

Derivation:

∇xh(y, z) ≡ x g(y, z);h(∇xy, z) = g(∂xy, z);

52

Page 61: Winitzki. Advanced General Relativity

1.10 Example: hypersurface of constant curvature

then we use x g(y, z) − g(∂xy, z) − g(y, ∂xz) = 0 (the flatconnection ∂ is compatible with the flat metric g). Torsion-freeness follows by projecting onto the tangent bundle:

∇xy −∇yx = P (∂xy − ∂yx) = P [x,y] = [x,y] ,

the last equality holds since [x,y] is tangent.

Self-test question: In the above calculation, the connection1-form Γ is obviously a tensor since it is defined in an invari-ant, geometric way by Eq. (1.84). However, Γ plays the roleof the Christoffel symbol because Eq. (1.83) is written in thecomponent notation as ∇αv

λ = ∂αvλ + Γλαβv

β . It is explained

in most GR textbooks that the Christoffel symbol Γλαβ is not atensor. Is there a contradiction?

Hint: In the index-free approach, every quantity is a tensor.

Answer: “Our” tensor Γ is defined using the “coordinate-based” connection ∂ which depends on a particular, fixed co-ordinate system inRm, which is in the present case the naturalCartesian coordinate system xµ. If we pass to a different co-ordinate system yµ, say to the spherical coordinate system,the components of the tensor Γwill have to be transformed ac-cording to the usual tensor law. The formula for Γ will not bethe same in the coordinate system yµ. One could say thatthe connection ∂/∂xµ itself is no longer ∂/∂yµ in a differentcoordinate system yµ and its components also need to betransformed. The standard GR textbooks, on the other hand,define the components Γλαβ by a fixed, non-covariant formulain an arbitrary coordinate system. This procedure does not de-fine a tensor. So then it is no surprise that the components Γλαβdo not transform as components of a tensor.

Practice problem: A transformation-valued tensor field S isdefined in the tangent bundle of the hypersurface by its actionon tangent vector fields t in the following way,

St = h(v, t)v − 1

17t,

where h is the inducedmetric, v is a given tangent vector fieldwhich is normalized, h(v,v) = 1. Compute∇S, which shouldbe a transformation-valued 1-form.

Solution: For arbitrary tangent vector fieldsu and t, we have

(∇uS)t = ∇u(St) − S∇ut

= ∇u

(

h(v, t)v − 1

17t

)

− h(v,∇ut)v +1

17∇ut

= h(∇uv, t)v + h(v, t)∇uv

= g(∂uv, t)v + g(v, t)

(

∂uv − 1

Ang(u,v)

)

.

1.10.3 Riemann tensor within the hypersurface

Finally, let us compute the Riemann tensor of the hypersur-face. By definition, we have

R(x,y)z = [∇x,∇y] z −∇[x,y]z,

and we write out the terms using Eq. (1.83),

∇x∇yz = ∇x (∂yz + Γ(y)z)

= ∂x∂yz + Γ(x)∂yz + ∂xΓ(y)z + Γ(x)Γ(y)z;

∇[x,y]z = ∂[x,y]z + Γ([x,y])z;

[∇x,∇y] z −∇[x,y]z = [∂x, ∂y] z− ∂[x,y]z + [Γ(x),Γ(y)] z

+Γ(∂xy − ∂yx − [x,y])z + (∂xΓ)(y)z − (∂yΓ)(x)z

= [Γ(x),Γ(y)] z + (∂xΓ)(y)z − (∂yΓ)(x)z. (1.85)

In the last line we used the fact that ∂ is a flat and torsion-freeconnection, in order to cancel the underlined terms. (Note thesimilarity of the last formula to the standard expression forthe Riemann tensor in the component notation, R.... = ∂.Γ

.

.. −∂.Γ

.

.. + Γ...Γ... − Γ...Γ

.

..; see Eq. (A.11) in Appendix A.)Now we substitute Eq. (1.84) into Eq. (1.85) and find (fortangent vectors x,y, z, t)

Γ(x)Γ(y)z =1

Ang(x,Γ(y)z) =

1

Ang(x,n(...)) = 0,

(∂xΓ)(y)z =1

A(∂xn)g(y, z) =

1

A2xg(y, z),

R(x,y)z =1

A2(xg(y, z) − yg(x, z)) ,

R(x,y, z, t) =1

A2(g(x, t)g(y, z) − g(x, z)g(y, t))

=1

A2(h(x, t)h(y, z) − h(x, z)h(y, t)) , (1.86)

since g(u,v) = h(u,v) for tangent vectors u,v. In the indexnotation, the above Riemann tensor is

Rαβγδ =1

A2(hαδhβγ − hαγhβδ) .

A manifold with a Riemann tensor of form (1.86) is calleda space of constant curvature. Note also that the secondBianchi identity forces A = const for a Riemann tensor of theform (1.86).Our result is that the hypersurface xµx

µ = A2 has constantcurvature. The actual magnitude of the curvature is A−2 if themetric g is Euclidean and−A−2 if it is pseudo-Euclidean withsignature (+ −−...), because in the latter case the vector n is“timelike” and the metric h is negative-definite. (The casesxµx

µ = −A2 or g(n,n) = −1 can be treated in a very similarway and the result is essentially the same, with appropriatesign changes.)

Remark: The following two arguments help to visualize theconcept of constant curvature.The first argument appeals to the intuitive understandingthat a sphere is a perfectly symmetrically curved surface thathas a constant curvature at all points. The above calculation of

the Riemann tensor obviously holds for a sphere |x|2 = R2 inthe usual Euclidean space. It makes intuitive sense to say thata sphere of radius R has a constant curvature equal to 1/R.Therefore, any space with a Riemann tensor (1.86) is a spaceof constant curvature, and the value of the curvature is equalto 1/A.The second argument is somewhat more abstract. The ten-sor Rαβλµ = Rλµαβ can be viewed as a symmetric bilinear

form in the space of 2-forms, R(ω(1), ω(2)) = Rαβγδω(1)αβω

(2)γδ ,

or as a linear transformation in the space of 2-forms, ωαβ →Rαβ

λµωλµ, where ωαβ = −ωβα is an arbitrary 2-form. Thespace of 2-forms is six-dimensional, and the bilinear form

53

Page 62: Winitzki. Advanced General Relativity

1 Calculus in curved space

R(ω(1), ω(2)) can be represented as a symmetric 6 × 6 matrix.Due to this symmetry, the corresponding transformation is di-agonalizable. Consider the six eigenvalues λ1(p), ..., λ6(p) ofthat transformation as functions of the point p on the mani-fold. These six scalar functions characterize (in some way) themagnitude of the curvature of the manifold at various points.Without a detailed analysis of the significance of the eigenval-ues λj and the corresponding eigenvectors, we may heuristi-cally expect that, for a manifold of constant curvature, all thesix eigenvalues λj should be equal to each other and remainconstant throughout the manifold. Thus the transformationRαβ

λµ must be proportional to the identity map in the spaceof 2-forms,

Rαβλµ =

(

δµαδλβ − δλαδ

µβ

)

K, K = const.

(The coefficientK must be constant also by the second Bianchiidentity, once we assume that Rαβ

λµ has the form givenabove.) Then the comparison with the sphere shows that thenumeric value of the curvature isK .

Practice problem: Consider a surface embedded in Rm,specified by an equation f(p) = 0, where f is a given scalarfunction and p ∈ Rm. Obtain the normal vector field n interms of f and the flat metric g in Rm. Compute the inducedmetric h and the Riemann tensorR(x,y, z, t) in terms of n andg.Solution: The vector n normal to the hypersurface is

n =1

Ng−1(df), N ≡

g−1(df,df).

In other words, g(n,x) = N−1∂xf for any vector x. In com-ponents:

nα =gαβ∂βf

N, N ≡

gαβ (∂αf) (∂βf).

Note that g(n,n) = 1 and so g(∂xn,n) = 0 for any vector x.The projection onto the tangent bundle is

Px = x − ng(n,x).

The induced connection∇ is equal to ∂ projected onto the tan-gent bundle,

∇xy = P∂xy = ∂xy − ng(n, ∂xy) = ∂xy + ng(y, ∂xn),

which can be also written as

∇xy = ∂xy + Γ(x)y, Γ(x)y ≡ ng(∂xn,y).

The connection ∇ is the Levi-Civita connection on the hyper-surface. The curvature of ∇ is

R(x,y)z = [Γ(x),Γ(y)] z + (∂xΓ)(y)z − (∂yΓ)(x)z

= g(∂yn, z)∂xn − g(∂xn, z)∂yn;

R(x,y, z, t) = g(∂yn, z)g(∂xn, t) − g(∂xn, z)g(∂yn, t).

In the index notation, we can write

Rκλµν =((∂λn

α)(∂κnβ) − (∂κn

α)(∂λnβ))gαµgβν

= (∂λnµ) (∂κnν) − (∂κnµ) (∂λnν) ,

since the metric gαβ in the Euclidean space Rm is flat. It isclear that, in general, the hypersurface f(p) = 0 is not a spaceof constant curvature.

54

Page 63: Winitzki. Advanced General Relativity

2 Geometry of null surfaces

Throughout this chapter, we consider a four-dimensionalspacetimewith ametric g having the signature (+ −−−), andthe Levi-Civita connection ∇. Most results can be straightfor-wardly generalized to higher dimensions, but it is importantthat the metric has indefinite signature of the kind (+ −−...)and, as a consequence, that there exist null directions.

2.1 Null vectors

Recall that a vector v is called timelike if g(v,v) > 0, null ifg(v,v) = 0, and spacelike if g(v,v) < 0. A geodesic curveγ(τ) is called a lightray if g(γ, γ) = 0 for all τ . (Lightraysare worldlines of light or other radiation carried by masslessparticles.)Null vectors have somewhat peculiar properties which wenow review.

Statement 2.1.0.1: a) If n is null and v is orthogonal to n,i.e. g(v,n) = 0, then v is either parallel to n, or spacelike. Inother words: the orthogonal complement to a null vector n

is spanned by n and two spacelike vectors. (The orthogonalcomplement n⊥ of a vector n is the subspace consisting ofvectors v such that g(n,v) = 0.)b) If n is null and v is some vector such that g(v,n) 6= 0 thenthere exists a linear combination αn + βv that is timelike, andanother that is spacelike.Hint: These statements concern vectors in a single tangentspace TpM, so it suffices to consider the Minkowski spacewith a standard coordinate basis t, x, y, z.

2.1.1 Orthogonal complement spaces

The preceding statement 2.1.0.1 shows that the orthogonalcomplement n⊥ to a null vector n is a three-dimensional sub-space spanned by n and two other spacelike vectors. For cal-culations, it is convenient to have a projection operator ontothis subspace. We shall now investigate the properties of suchprojections.We begin with an easier case: the projection onto n⊥, where

n is not null. Recall that a self-adjoint projector is an operatorP such that g(Px,n) = 0 for all x (any vector x is projectedonto n⊥), P 2 = P (“the projection remains projected”), andthe bilinear form corresponding to P is symmetric, g(Px,y) =g(x, Py) for all vectors x,y (self-adjointness; see Sec. 1.10.1for a motivation). For a unit vector n, there is a unique self-adjoint projector onto n⊥, given by the standard formula

Px = x − ng(n,x). (2.1)

(See also Eqs. (1.79) and (1.80) in Sec. 1.10.1.) It is easy to com-pute the trace of P (see Sec. 1.7.3 for details),

g(Px,y) = g(x,y) − g(n,x)g(n,y);

Tr P = Tr(x,y)g(Px,y) = 4 − g(n,n) = 4 − 1 = 3.

For calculations in a subspace, it is convenient to define apartial metric h which coincides with the metric g for vectors

in the subspace but is zero on vectors orthogonal to the sub-space. Thus, h is defined in the entire space but “can onlysee the subspace.” The advantage of introducing the partialmetric is that one can work with vectors from the full space,substituting vectors from the subspace only at the end of cal-culations.Given a self-adjoint projector P onto the subspace, the par-tial metric h satisfying the above requirements can be ex-pressed as

h(a,b) = g(Pa, Pb) = g(Pa,b).

(The last equality is due to P 2 = P and self-adjointness.) Fora standard projector (2.1) to the space n⊥ where g(n,n) = 1,the partial metric is

h(a,b) = g(a,b) − g(n,a)g(n,b).

Nowwe turn to the case when the vector n is null. Then theformula (2.1) fails since g(n,n) = 0, so g(Pu,n) 6= 0. Clearly,the transformation Px needs to subtract from x some vectorwhose scalar product with n is nonzero. So let us try Px =x − lg(n,x), where l is a vector such as g(l,n) = 1. This givesg(Px,n) = 0 for any x; however, this operator P is not self-adjoint, PTx = x−ng(l,x) 6= Px, and we need to symmetrizeit. So finally we define the map

Px = x− lg(n,x) − ng(l,x), (2.2)

where again g(l,n) = 1. It is easy to check that P is nowself-adjoint and that g(Px,n) = 0 for any x. The requirementP 2 = P yields the condition

0 = g(l, Px) = g(l,x− lg(n,x) − ng(l,x)) = g(l, l)g(n,x),

which can be satisfied for all x only if l itself is null, g(l, l) = 0.Thus the self-adjoint projector (2.2) is, by necessity, also a pro-jector onto the orthogonal complement of another null vector

l. The image of P is therefore the set (l,n)⊥of vectors orthog-

onal to both n and l, which is not the entire n⊥ but a two-dimensional subspace within n⊥. Accordingly, the trace of P isequal to 2:

Tr P = 4 − g(n, l) − g(n, l) = 4 − 1 − 1 = 2.

Statement: For null vectors n, there exists no self-adjointprojector onto the full subspace n⊥.Derivation: If P were such a projector, we would have Pn =

n. Consider a vector v such that g(n,v) 6= 0. By assumptionthe vector Pvmust be orthogonal to n, and then the conditionof self-adjointness yields a contradiction:

0 6= g(n,v) = g(Pn,v) = g(n, Pv) = 0.

In other words, we cannot project v while keeping the scalarproduct of v and n constant.The orthogonal complement subspace n⊥ has a naturallyinduced metric, which is just the restriction of the full metricg onto n⊥. However, the induced metric is degenerate since n

55

Page 64: Winitzki. Advanced General Relativity

2 Geometry of null surfaces

is orthogonal to every vector in the subspace. It is awkwardto work with a degenerate metric; for instance, tensors speci-fied only through their scalar products with vectors from n⊥

become ambiguous (defined up to multiples of n). To avoidthis inconvenience, we may further restrict the subspace n⊥

to a two-dimensional subspace (l,n)⊥which is the image of

a projector P . The induced metric is nondegenerate on the

subspace (l,n)⊥. However, this subspace is not uniquely se-

lected since it depends on the choice of a second null vector, l.Despite these difficulties, we will obtain unambiguous resultsthat are independent of the freedom in the choice of l.Let us now compare the projectors for the spaces v⊥, n⊥,and s⊥, where the vectors v,n, s are respectively timelike,null, and spacelike:

Pvx = x − g(v,x)v, g(v,v) = 1, Tr Pv = 3;

Pnx = x − lg(n,x) − ng(l,x), g(n, l) = 1, Tr Pn = 2;

Psx = x+ g(s,x)s, g(s, s) = −1, Tr Ps = 3.

We find that the timelike and the spacelike cases are quite sim-ilar but differ from the null case. The partial metrics are

hv(a,b) = g(a,b) − g(v,a)g(v,b); (2.3)

hn(a,b) = g(a,b) − g(n,a)g(l,b) − g(l,a)g(n,b); (2.4)

hs(a,b) = g(a,b) + g(s,a)g(s,b).

We can also write the partial inverse metrics, which are (2,0)-tensors obtained from the above partial metrics by using themap g, according to

h−1(ω1, ω2) ≡ h(gω1, gω2),

where ω1 and ω2 are arbitrary 1-forms. We find

h−1v = g−1 − v ⊗ v,

h−1n = g−1 − n ⊗ l− l⊗ n,

h−1s = g−1 + s⊗ s.

Note that the partial metrics are not invertible, so we do nottalk of “inverse partial metrics.”It is important to investigate the signature the partial met-rics on the subspaces where they are nondegenerate (the par-tial signature). The easiest way to deduce the signature is tochoose a suitable basis containing the vectors v, n, or s. Forthe metric hv we choose an orthogonal basis v, s1, s2, s3,where s1, s2, s3 are all spacelike. Then we find hv(sj , sk) =−δjk, so the partial signature of hv is (−−−). For the met-ric hn, we choose a basis containing the vectors l,n, s1, s2,where s1, s2 are spacelike and orthogonal to each other and tol,n. Then we find hn(..., l) = hn(...,n) = 0, while hn(sj , sk) =−δjk, and therefore the partial signature of hs is (−−). Finally,for the metric hs we choose an orthogonal basis s,v1, s2, s3,where v1 is a timelike vector and s2, s3 are spacelike. Thus thepartial signature of hs is (+ −−).

2.1.2 Divergence of a null vector field

We have seen that there are two spacelike connecting vectorss1, s2 orthogonal to n within the subspace n⊥. Thus we canvisualize how the 2-volume (i.e. area), A(s1, s2), of the paral-lelogram spanned by these two vectors propagates along thenull direction n. From elementary geometric considerations,this area can be expressed as

A(s1, s2) = |g(s1, s1)g(s2, s2) − g(s1, s2)g(s2, s1)|1/2 .

The change of the area along the flow lines of n is describedby the directional derivative∇nA(s1, s2).It is easier to compute∇nA(s1, s2) if we note thatA(s1, s2) isequal to the 4-volume spanned by s1, s2 and two unit vectorsx,y orthogonal to both s1 and s2. For instance, wemay choosex to be spacelike and y to be timelike, g(y,y) = 1 = −g(x,x).The 2-volume A(s1, s2) is then interpreted as the area mea-sured in the reference frame of an observer moving along thetimelike direction y. (Of course, the areaA(s1, s2) is observer-independent since it is defined regardless of the choice of theauxiliary vectors x, y.) The 4-volume spanned by s1, s2,x,yis expressed through the Levi-Civita symbol as

A(s1, s2) = ε(s1, s2,x,y). (2.5)

Since ε is totally antisymmetric, the expression ε(a,b, c,d) canbe interpreted as a linear function of the totally antisymmet-ric (4,0)-tensor a ∧ b ∧ c ∧ d, where x ∧ y is the exterior (an-tisymmetric) product of vectors. Thus, the vectors x and y inEq. (2.5) can be replaced by their linear combinations x′ andy′ as long as x ∧ y = x′ ∧ y′. It turns out to be convenient forthe calculation of ∇nA to choose the vectors x′,y′ as x′ = l

and y′ = n, where l is the null vector orthogonal to both s1

and s2 and related to n by g(l,n) = 1.

Statement: Indeed the choice x′ = l, y′ = n is possible,i.e. x ∧ y = l ∧ n.Derivation: Write the vectors x′ = l,y′ = n as linear combi-nations of x,y with unknown coefficients:

l = a1x + b1y, n = a2x + b2y.

Since g(x,y) = 0, the conditions g(l, l) = g(n,n) = 0 give

a21 = b21, a2

2 = b22,

so we could choose a1 = b1, a2 = −b2 (the signs must differ, orelse lwill come out parallel to n). Then we have 1 = g(l,n) =−2a1a2 and

l ∧ n = −2a1a2x ∧ y = x ∧ y.

Thus the area is expressed as

A = ε(s1, s2, l,n).

Since s1,2 are connecting vectors, the derivative of the area Aalong the lines of n is most conveniently computed by usingthe Lie derivative,

dA

dτ= LnA = Lnε(s1, s2, l,n)

= (divn)ε(s1, s2, l,n) + ε(s1, s2,Lnl,n).

We need to retain the last term above because l is not neces-sarily a connecting vector for n. Let us compute the scalarproduct of Lnl ≡ [n, l]with n,

g(n, [n, l]) = g(n,∇nl) − g(n,∇ln)

= ∇ng(n, l) − g(∇nn, l) −1

2∇lg(n,n) = 0.

Therefore [l,n] is a linear combination of n, s1, and s2. Hence,ε(s1, s2,Lnl,n) = 0 regardless of the choice of l. (This is to beexpected since A(s1, s2) and∇nA(s1, s2) are defined indepen-dently of the choice of l.) Finally, we find

dA

dτ= (divn)A. (2.6)

The divergence divn is thus interpreted as the relative rate ofchange in the area spanned by the vectors s1, s2 as this area isbeing transported by n.

56

Page 65: Winitzki. Advanced General Relativity

2.2 Null surfaces

2.2 Null surfaces

Null surfaces (3-dimensional hypersurfaces orthogonal to anull vector field) play a major role in relativity. Before con-sidering null surfaces, we shall spend some time studying 3-dimensional hypersurfaces is general.

2.2.1 Three-dimensional hypersurfaces

A convenient way to define a 3-dimensional hypersurface isthrough an equation f(p) = 0, where f(p) is an auxiliary func-tion. In this way the hypersurface can be visualized as the lo-cus of constant f . A vector t is tangent to the hypersurface ofconstant f if the function f remains constant in the directionof t, i.e. if t f = 0. Since x f is a linear function of x, thereexists a vector n such that

x f ≡ (df) x = g(n,x), n = g−1 (df) , (2.7)

where g−1 is the map from 1-forms into vectors. The vectorn is called the contravariant gradient of f and is the normalvector to the hypersurface because it is orthogonal to everytangent vector t,

g(n, t) = t f = 0.

Note that the vector field n is well-defined not only on the hy-persurface but in the entire space; away from the hypersurfacef(p) = 0, the vector n is the normal vector to hypersurfacesf(p) = f0 for other values of f0.Belowwe shall almost always work with 3-dimensional hy-persurfaces in 4-dimensional spacetimes, and for brevity weshall call them simply surfaces.

2.2.2 Integrable vector fields

Not every vector field v admits a family of hypersurfaces forwhich v is the normal vector at each point. This property iscalled hypersurface orthogonality. A vector v is hypersur-face orthogonal if there exists a scalar function f such that vis orthogonal to every vector t tangent to the surfaces of con-stant f :

g(t,v) = 0 for any t such that t f = 0. (2.8)

A general condition for hypersurface orthogonality is givenby the Frobenius theorem (see Sec. 2.2.3). We shall nowmotivate and derive a weaker condition called integrability,which is sufficient for many purposes.In the language of classical mechanics, the “force” field v

has a “potential” f if v is the contravariant gradient of f . Inthe language of forms, we say that v is dual to the 1-form dfand write v = g−1(df). In that case, we have t f ≡ g(t,v)for any vector t, and it is clear that v is the normal vector forsurfaces of constant f . A vector field v is integrable if thereexists a scalar function f such that v is dual to df .The analogous property for 1-forms is called exactness. A1-form ω is exact if ω = df for some scalar function f . Thenwemay say that f is the “integral” of ω, and that a vector fieldis integrable when it is dual to an exact 1-form.Qualitatively, the flow lines of an integrable vector field v

are good coordinate lines. An immediate example of an in-tegrable vector field is a coordinate basis vector field ej ≡∂/∂xj , where j = 0, 1, 2, 3. The vector ej is orthogonal to thesurface of constant xj and is dual to the exact 1-form dxj .

Example: Let us produce some explicit examples of non-integrable vector fields.In flat space with polar coordinates, an example of a non-integrable field would be ∂/∂φ. In Cartesian coordinates, thiswould be v = y∂/∂x−x∂/∂ywhich is non-integrable becausegv = ydx − xdy and d(gv) = 2dy ∧ dx 6= 0. The field v is nota potential field since its flow lines are closed circles.A form ω is closed if dω = 0. Since dd = 0, every exactform is closed, and locally (but not necessarily globally!) everyclosed form is exact. So a field v is (locally) integrable if it isdual to a 1-form gv which is closed,

d(gv) = 0.

Let us now motivate this well-known condition by more intu-itive considerations.We are trying to find a function f(p) such that v is the gra-dient of f . Such a function could be found by integrating the“work done by the force field” v from some initial point p0

to the point p. The work corresponding to an “infinitesimal”displacement lδ in the direction of a vector l is

g(v, l)δ = (gv) lδ,

where gv is the 1-form corresponding to the vector v. [Recallthat the 1-form gv acts on vectors a as (gv) a ≡ g(v,a).] Soour candidate function f would be defined by integrating this1-form along a path leading from p0 to p:

f(p) =

∫ p

p0

gv.

If the function f is well-defined then we will have df = gv asrequired. However, the function f is well-defined only if thevalue f(p) is independent of the path of integration, which isthe case only if the integral along any closed loop L vanishes,∮

Lgv = 0. By the integration theorem (a generalization of

Stokes’s law), ∮

L

gv =

A

d(gv),

where A is a surface enclosed by the loop L. Thus the two-form dgv must be everywhere equal to zero (this follows byconsidering infinitesimal loops around each point).Let us now obtain a more convenient form of this condi-tion. The 2-form dgv acts on arbitrary vectors a,b accordingto Eq. (1.21),

(dgv) (a,b) = a gv(b) − b gv(a) − gv([a,b])

= a g(v,b) − b g(v,a) − g(v, [a,b])

≡ ∇ag(v,b) −∇bg(v,a) − g(v, [a,b])

= g(∇av,b) − g(∇bv,a).

The condition we are looking for is therefore

g(∇av,b) − g(∇bv,a) = 0. (2.9)

Written in the index notation using the covariant componentsof v, this condition becomes

∇αvβ −∇βvα = 0.

This condition is reminiscent of the curl (rotation), ∇ × v, inthree-dimensional vector analysis, where a vector field is agradient if its curl vanishes. By analogy, the 2-form dgv iscalled the rotation of the vector field v.

57

Page 66: Winitzki. Advanced General Relativity

2 Geometry of null surfaces

Another way to express the integrability condition is to de-fine the v-dependent bilinear form

B(v)(a,b) ≡ g(∇av,b), (2.10)

and to require that B(v) be a symmetric bilinear form,B(v)(a,b) = B(v)(b,a). Note that in general, the rotation 2-form dgv is twice the antisymmetric part of B(v).

Remark: We have already seen the bilinear form B(v) inEq. (1.50) when we introduced Killing vectors. For a Killingvector k, the form B(k) is antisymmetric and thus equal to12dgk; for an integrable vector field v, the form B(v) is sym-metric. It follows that B(v) = 0 for an integrable Killing vectorfield v; this means ∇av = 0 for all a, i.e. the vector v is “con-stant everywhere.” The existence of such a vector v is a veryspecial property of the spacetime, indicating a high degree ofsymmetry. The flow lines of v are geometrically preferred co-ordinate lines. For instance, any vectors transported by theflow of v are parallelly transported. If a timelike Killing vec-tor is hypersurface orthogonal, which is a weaker propertythan integrability, the spacetime is called static because in thatcase there is a natural “time” coordinate t such that v is par-allel to ∂/∂t and thus orthogonal to surfaces of constant t. Aspacetime is called stationary if it admits a timelike (but notnecessarily hypersurface-orthogonal) Killing vector.

Statement: An integrable vector field v which is normal-ized, g(v,v) = const, is always geodesic,∇vv = 0.Derivation: (See also the proof of the statement inSec. 2.2.8 where forms are not used.) We use Eq. (2.9) to applythe 2-form dω ≡ d(gv) to v and an arbitrary vector a:

(ιvdω) a ≡ (dω) (v,a) = g(∇vv,a) − 1

2∇ag(v,v)

= g(∇vv,a).

For an integrable field v, we have dω = 0 and so g(∇vv,a) =0 for all a; thus ∇vv = 0. Note that it does not follow from∇vv = 0 that dω = 0, but only that ιvdω = 0. Thus a normal-ized geodesic field is not necessarily integrable.

Statement: The acceleration of a vector field v is defined asthe vector field ∇vv; if the field v is timelike and g(v,v) = 1then ∇vv is interpreted as the acceleration of an observermoving along a worldline of v with respect to the tangentgeodesic line. We show that the acceleration of an integrablevector field is again integrable. (In this case, one recovers alimited version of the Newtonian picture, where the acceler-ation is determined as the gradient of a “gravitational poten-tial,” see Sec. 3.1.1.)Derivation: By assumption, v is integrable, so B(v) is sym-metric and for an arbitrary vector xwe have

g(∇vv,x) = B(v)(v,x) = B(v)(x,v)

= g(∇xv,v) =1

2∇xg(v,v).

So the 1-form ω dual to the field∇vv is a gradient of the scalarfunction 1

2g(v,v); this is the definition of integrability.

2.2.3 Frobenius theorem

The Frobenius theorem gives a sufficient and necessary con-dition for a vector field v to be hypersurface orthogonal. The

condition is written in the language of differential forms as

ω ∧ dω = 0, where ω ≡ gv,

and gv is the 1-form dual to the vector v. Let us motivatethe formulation of the Frobenius theorem before we begin toprove it.A hypersurface orthogonal vector field v is everywhere par-allel to the contravariant gradient g−1(df) of some function fdescribing a family of hypersurfaces f(p) = const. The con-dition of being parallel is most concisely expressed in the lan-guage of differential forms: Two 1-forms ω1 and ω2 are “par-allel” if ω1 ∧ ω2 = 0. So we are prompted to reformulate thecondition (2.8) in the following way. Denote by ω ≡ g−1v the1-form dual to v; then the vector v is hypersurface orthogonalif there exists a function f such that ω ∧ df = 0.However, it is inconvenient to involve an unknown func-tion f in the condition ω ∧ df = 0. We can try to eliminate theunknown 1-form df from this condition by computing

d(ω ∧ df) = dω ∧ df = 0,

so dω is also “parallel to df” just as ω is. Heuristically wemay say that both ω and dω are “parallel” to an unknown 1-form df , so the exterior product of ω and dω must vanish,ω ∧ dω = 0. Thus we have eliminated the unknown 1-formdf and obtained a condition, ω ∧ dω = 0, that convenientlyinvolves only the given 1-form ω (equivalently, the given vec-tor field v). Now we shall begin the proof of the Frobeniustheorem.We say that an n-form ω is parallel to a 1-form θ 6= 0 if

θ ∧ ω = 0. The following lemma states a useful property ofparallel forms.

Lemma 1: Suppose ω is an n-form, n ≥ 1, and θ 6= 0 is a1-form. If ω is parallel to θ, there exists an (n− 1)-form ψ suchthat ω = θ ∧ ψ. (For n = 1 this will be a scalar function ψ andwe will have ω = ψθ.)

Proof: The idea is to think that ω is “parallel to θ” and toobtain the “components” of ω that are “not parallel to θ.” Toobtain the components of a form, we can use a basis of vectors.Since θ 6= 0, we can choose a basis v0,v1,v2, ... such thatθ v0 = 1 and θ vj = 0 for j ≥ 1. Then for any basis vectorsvsj

6= v0 we have

(θ ∧ ω) (v0,vs1 ,vs2 , ...) = θ(v0) = ω (vs1 ,vs2 , ...)

= ω (vs1 ,vs2 , ...) = 0.

Thus ω is zero on any sets of basis vectors that does not con-tain v0, but ω(v0,vs1 ,vs2 , ...) may be nonzero. So we definethe (n− 1)-form

ψ ≡ ιv0ω; ψ(x,y, ...) ≡ ω(v0,x,y, ...).

It is easy to show that ω = θ ∧ ψ on any set of basis vectors vjand thus ω = θ ∧ ψ on any vectors. Note that the form ψ isdefined up to a multiple of θ, because a choice of v′

0 instead ofv0 will yield θ (v′

0 − v0) = 0 and thus

ψ′ = ψ + ιv′

0−v0ω = ψ + ιv′

0−v0(θ ∧ ψ) = ψ − θ ∧ ιv′

0−v0ψ.

An immediate corollary is: If two forms ω1, ω2 are both par-allel to the same 1-form θ then ω1 ∧ ω2 = 0.By Lemma 1, it follows from df ∧dω = 0 that ω = θ∧df forsome 1-form θ. Thus the forms ω and dω are both “parallel to”

58

Page 67: Winitzki. Advanced General Relativity

2.2 Null surfaces

some unknown df and then ω ∧ dω = 0. This is one directionof the Frobenius theorem.It remains to prove the converse statement, namely that if

ω ∧ dω = 0 then there exists a surface which is everywhereorthogonal to v.We can try to construct such a surface by choosing somevector fields t orthogonal to v and following the flow linesof all such t. It is sufficient to choose a basis of vector fieldst1, t2, t3 orthogonal to v. However, the flow lines of thesevector fields will not necessarily form a single surface. Forinstance, we may follow a flow line of a vector t1 and thena line of t2, and we shall not necessarily arrive at the samepoint as when we are following first t2 and then t1. For in-finitesimal distances, the discrepancy is in the direction of thecommutator [t1, t2]. This direction must be orthogonal to v

if all the flow lines are to lie within a single surface. There-fore, surfaces orthogonal to v will exist if for any two vectorfields t1, t2 orthogonal to v, the commutator [t1, t2] is againorthogonal to v. We shall now see that this is indeed the caseif ω ∧ dω = 0. This explanation needs to be made more clearand illustrated by a figure.

Note that a tangent vector t is orthogonal to v if ω(t) = 0,and that dω(x,y) involves the application of ω to the com-mutator [x,y], according to Eq. (1.21). This motivates us toformulate and prove the following statement.

Lemma 2: If vector fields x and y are everywhere orthogonalto v, and if ω ∧ dω = 0, where ω is the 1-form dual to v, thenthe vector field [x,y] is also orthogonal to v.

Proof: The conditions on x and y can be written as ω(x) =ω(y) = 0. By Lemma 1, dω = θ ∧ ω for some 1-form θ. Thenwe have

(dω) (x,y) = (θ ∧ ω) (x,y) = θ(x)ω(y) − θ(y)ω(x) = 0.

On the other hand, from Eq. (1.21) we find

(dω) (x,y) = x ω(y) − y ω(x) − ω([x,y]) = −ω([x,y]).

Thus ω([x,y]) = 0.It follows from Lemma 2 that vector fields orthogonal to

v will have surface-forming flow lines if ω ∧ dω = 0. Thisconcludes the proof of the Frobenius theorem.Here is another useful property of hypersurface-orthogonalgeodesic fields.

Statement 1: A hypersurface-orthogonal, normalized,spacelike or timelike, geodesic field v is always integrable.

Proof: For a normalized and geodesic field v, we have∇vv = 0 and g(v,v) = const, so Eq. (2.9) gives

(ιvdω)x ≡ (dω)(v,x) = g(∇vv,x)−g(∇xv,v) = 0. (2.11)

We say that the 2-form dω is transverse to v. The proof cannow proceed in one of two ways. If we consider the 2-formιv(ω ∧ dω)which vanishes since ω ∧ dω = 0, we find

0 = ιv(ω ∧ dω) = (ιvω) dω − ω ∧ (ιvdω) = ω(v)dω.

Since the field v is by assumption not null, ω(v) ≡ g(v,v) 6= 0,and it follows that dω = 0, i.e. the field v is integrable. An-other, more constructive way to prove the statement is thefollowing. Since v is hypersurface-orthogonal, there exists a

family of surfaces whose normal vectors are everywhere par-allel to v. This family of surfaces can be described by a func-tion ν, and the vector field normal to the surfaces is g−1dν.Since this field is parallel to v, there exists a scalar functionλ 6= 0 such that v = λg−1dν or equivalently ω = λdν anddω = dλ ∧ dν. Then Eq. (2.11) gives

0 = ιvdω = dλ(v)dν − dν(v)dλ ≡ (v λ)dν − 1

λω(v)dλ.

Since ω(v) 6= 0 and λ 6= 0, we can express

dλ =(v λ)λω(v)

dν,

and it follows that dω = dλ ∧ dν = 0.It is interesting to seewhere the argument fails for null fields

v: We cannot divide by ω(v) because ω(v) = g(v,v) = 0. In-deed, we can still find λ and ν such that ω = λdν, but thetransversality ιvdω = 0 yields only v λ = 0. This condi-tion can be satisfied by a function λ whose gradients are or-thogonal to the null vector v, in other words, g−1(dλ, ω) = 0.However, the gradient dλ does not need to be parallel to ω,and thus we might have dω = dλ ∧ dν 6= 0. An explicit ex-ample can be easily found in Minkowski spacetime: Chooseν = t−x and λ = y, such that dλ is orthogonal to dν. Then thevector field dual to ω ≡ λdν is v = y(∂t + ∂x), and we haveιvω = 0. So v is geodesic, null, and hypersurface orthogonal(to surfaces of constant x − t), but non-integrable since it hasnonvanishing rotation dω = dy ∧ (dt− dx) 6= 0.

Statement: A hypersurface-orthogonal field (even normal-ized and even timelike) does not necessarily have an inte-grable acceleration∇vv.Hint: It is not necessarily true that dιvdω 6= 0 even if ιvω =

0.Derivation: Explicit counterexamples are the 1-forms ω =

(x + y)dx, ω = x(dt − dx), ω = coshxdt + sinhxdxin the Minkowski spacetime. These 1-forms are dual tohypersurface-orthogonal fields since ω ∧ dω = 0, but they donot have integrable acceleration.

Statement: A geodesic vector field v is hypersurface-orthogonal everywhere if there exists a single 3-surface Σ towhich v is orthogonal. The same statement holds also forKilling fields k instead of geodesic vector fields.Derivation: Recall that ∇vg(v,v) = 0, thus g(v,v) is con-stant along the lines of v, and so the affine tangent vector vcan be rescaled, v → v = λv, where λ 6= 0 is a scalar func-tion such that ∇vλ = 0 and g(v, v) = const everywhere. Thevector field v is normalized, geodesic, and orthogonal to the3-surface Σ. Since v is everywhere parallel to v, it is sufficientto show that v is hypersurface orthogonal everywhere. Letω = gv be the 1-form dual to v; then ω ∧ dω = 0 on Σ, and weneed to prove that ω∧dω remains zero away fromΣ. We shallnow show that Lv(ω∧dω) = 0. This would mean that the Lie-propagated 3-form ω∧dω remains zero along the flow lines ofv when applied to a Lie-propagated basis of connecting vec-tors to v; that property is equivalent to having ω ∧ dω = 0everywhere. We compute

(Lvω) x = Lv (ω x) − ω Lvx =

= ∇vg(v,x) − g(v, [v,x]) = g(v,∇xv)

=1

2∇xg(v, v) = 0,

59

Page 68: Winitzki. Advanced General Relativity

2 Geometry of null surfaces

since g(v, v) = const. Recall that L commutes with d on anyn-forms, hence

Lvdω = dLvω = 0,

and then we have

Lv(ω ∧ dω) = (Lvω) ∧ dω + ω ∧ Lvdω = 0.

Alternatively, here is a geometric argument. We can con-struct a family of 3-surfaces Σ(τ) which are surfaces of con-stant affine parameter τ measured along flow lines of v, wherewe set τ = 0 onΣ. We can choose a basis of connecting vectorscj , j = 1, 2, 3, such that [cj , v] = 0 and cj are tangent to Σ andthus orthogonal to v on Σ. These connecting vectors cj willremain orthogonal to v for all τ (see Lemma 1 in Sec. 2.2.6,where we need to replace n by v). Therefore, the 3-surfacesΣ(τ) are everywhere orthogonal to v and thus also to v. Nowconsider the case of a Killing field k and define ω ≡ gk. It issufficient to prove that Lk(ω ∧ dω) = 0. Since k is a Killingvector, we have Lkg = 0, so

Lkω = Lkgk = gLkk = 0.

We again recall that L commutes with d, thusLk (ω ∧ dω) = ω ∧ Lkdω = ω ∧ dLkω = 0.

In summary, we have the following relationships betweenintegrability, hypersurface orthogonality, and metric proper-ties of a vector field. The condition for hypersurface orthog-onality, ω ∧ dω = 0, is weaker than the condition for inte-grability, dω = 0. An integrable field v = g−1(df) is alwayshypersurface orthogonal (to the hypersurfaces of constant f ).An integrable field v which is normalized, g(v,v) = const,is always geodesic. A hypersurface orthogonal, normalized,geodesic field is integrable unless it is null (a geodesic fieldcan always be rescaled to make it normalized). The accelera-tion ∇vv of an integrable field v is again integrable.

2.2.4 Null surfaces

A null surface is a 3-surface whose normal vector n is every-where null and nonzero, g(n,n) = 0, n 6= 0. We shall nowstudy the geometry of null surfaces since they have certainspecial and useful properties.One unusual fact is that the normal vector is at the sametime tangent to the null surface since g(n,n) = 0. The tangentspace to a null surface is thus spanned by n and two otherbasis vectors. The latter two vectors must be orthogonal ton, and thus both must be spacelike. In fact, all the tangentvectors that are not parallel to n are spacelike (see the State-ment 2.1.0.1). Thus, a null surface is spanned by the null di-rection n and two spacelike directions orthogonal to n.Since the tangent space contains its normal vector, the in-ducedmetric in the tangent space is degenerate since the vectorn is orthogonal to every vector in the tangent space. This leadsto an ambiguity of the induced Levi-Civita connection on thenull surface: The formula (1.43), which is the only constrainton the Levi-Civita connection, does not define the vector∇xy

uniquely through its scalar product with an arbitrary vectorz. Thus there are many possible Levi-Civita connections on anull surface.Another peculiarity is that there is no unique projector ontothe tangent space (see Sec. 2.1.1). A satisfactory projector (2.2)can be found only after choosing another null vector field l

normalized by g(n, l) = 1, and then the image of the projec-tion is a two-dimensional subspace within the tangent space.

Self-test question: Do we need separate considerations forsurfaces whose tangent space is spanned by (a) a null vector,a timelike vector, and a spacelike vector, (b) two null vectorsand a spacelike vector, (c) three null vectors?Answer: No. Each of the described surfaces is in fact time-like, i.e. orthogonal to a timelike vector. Their tangent spacesare also spanned by one timelike and two spacelike vectors.

2.2.5 Examples of null surfaces

The simplest example of a null surface is the null cone of ligh-trays emitted from a point. Let us describe this surface explic-itly in a flat Minkowski spacetime. The lightcone emitted atthe origin is specified by the equation

t− r ≡ t−√

x2 + y2 + z2 = 0. (2.12)

The vector field n normal to this surface is

n =∂

∂t+x

r

∂x+y

r

∂y+z

r

∂z.

It is easy to check that g(n,n) = 0. The tangent space to thelightcone at a point (t, x, y, z) is spanned by the null vector nand the two spacelike vectors ∂/∂φ and ∂/∂θ,

∂φ≡ 1√

x2 + y2

(

y∂

∂x− x

∂y

)

,

∂θ=z

r

(

x∂

∂x+ y

∂y

)

− x2 + y2

r

∂z.

Another example of a null surface is the horizon of anonrotating, spherically symmetric black hole (Schwarzschildspacetime). The metric is given by Eq. (1.36). It is well knownthat the black hole horizon is the surface r − 2m = 0. How-ever, it is not easy to analyze vectors at r = 2m since theSchwarzschild coordinates are singular there. To show thatthe horizon surface is null, we note that its tangent space isspanned by the two spacelike vectors ∂/∂φ, ∂/∂θ, and also bythe vector ∂

∂t which is null at r = 2m (but not at other val-ues of r). Since ∂/∂t is null, it is also the normal vector to thesurface and hence the surface is null.

2.2.6 Lightcones are null surfaces

We can consider the surface formed by all the null geodesicsemitted from a given initial spacetime point p0 in all direc-tions. This surface is called the lightcone emitted at p0.

Statement: The lightcone emitted at a point p0 is always anull surface, in an arbitrary curved spacetime.We begin the proof by a preliminary calculation.

Lemma 1: If n is normalized, g(n,n) = const, and geodesic,∇nn = 0, and if c is a connecting vector for n, then g(n, c) isconstant along the flow lines of n.

Proof: Since∇cn = ∇nc and∇nn = 0, we have

n g(n, c) = g(n,∇nc) = g(n,∇cn) =1

2∇cg(n,n) = 0.

Lemma 2: The tangent space to a lightcone at any point p isspanned by one null vector and two spacelike vectors that areboth orthogonal to the null vector.

60

Page 69: Winitzki. Advanced General Relativity

2.2 Null surfaces

Proof: We first note that in the immediate vicinity of the ini-tial point p0 the lightcone can be viewed, in suitable local coor-dinates, as a portion of the Minkowski lightcone (2.12). Thusthe tangent space in the vicinity of p0 is indeed spanned byone null vector and two spacelike vectors orthogonal to it. Wenow need to extend this property away from the initial pointp0.By assumption, the lightcone surface is built entirely fromthe congruence of lightrays emitted at p0. Translating this as-sumption into the mathematical language, we say that everypoint on the surface lies on some lightray from the congru-ence, and thus the surface can be parametrically described bya function

γ(τ ; s1, s2) : R × R × R →M

such that γ(τ ; s1, s2) is a lightray for fixed s1 and s2. (Notethat there is a two-dimensional sphere of initial directions foremitting the lightrays, hence two parameters s1,2.) In partic-ular, n = ∂γ/∂τ is the tangent vector field which is null, andthe vectorsu1 ≡ ∂τ/∂s1, u2 ≡ ∂τ/∂s2 are everywhere linearlyindependent, connecting vectors for n (see the Statement 13).Thus we have the properties

g(n,n) = 0, ∇nn = 0,

[n,u1] = [n,u2] = 0.

By construction, the connecting vectors u1,2 are always tan-gent to the surface, and so is n. Thus the vectors (n,u1,u2)are always a basis in the tangent space to the surface. Thevector n is everywhere null by assumption, so it remains toshow that u1 and u2 remain spacelike throughout the surface,at any point p.The Minkowski description of the lightcone holds for aninfinitesimal vicinity of p0, and thus we have g(n,u1) =g(n,u2) = 0 near p0. Then it follows from Lemma 1 thatg(n,u1) = g(n,u2) = 0 everywhere along the flow lines ofn. Vectors orthogonal to a null vector and not parallel to itmust be spacelike (see Statement 2.1.0.1). Since each point pon the surface can be reached by some flow line of n startingfrom p0, the vectors u1,2 remain spacelike and orthogonal ton everywhere on the surface.

Corollary: The null tangent vector n at a point p is also thenormal vector to the surface.

Proof: By Lemma 2, the tangent space at p is spanned by n

and two spacelike connecting vectors u1,u2 that remain or-thogonal to n. Thus, n is orthogonal to every vector in theentire 3-dimensional tangent space, which means that n is thenormal vector to the surface.We have proved that a lightcone is a null surface. A light-cone is by construction a surface built from lines (the individ-ual lightrays). These lines are called the null generators of thesurface. Now we shall use the formalism of “null functions”to prove the converse statement: Any null surface is alwaysbuilt from null generators (even though not every null surfaceis a lightcone emitted from one point).

2.2.7 Null functions

A null surface can be defined through an equation ν(p) = 0,and then the function ν(p) is called a null function. Moregenerally, a null function ν(p) on the spacetime is any functionsuch that dν is a null covector, g−1(dν,dν) = 0. Immediate

examples are the functions

ν(t, x, y, z) = t− r ≡ t−√

x2 + y2 + z2,

or ν(t, x, y, z) = t− x,

in theMinkowski spacetime (verify this!). A null function ν(p)defines a null hypersurface ν(p) = ν0 for each value of ν0.A null function can be physically interpreted as a “retardedtime” carried by lightrays emitted by some sources: If a ligh-tray reaches an event p, it was emitted at the time ν(p) at itssource. (Here we do not assume that all the rays are emittedby one and the same source.) To remember this more easily,recall that time stands still for an observer moving at light-speed, so a lightray “preserves” its emission time and showsit at a point p as the value of ν(p).A peculiar property of null functions ν(p) is that the nor-mal vector n = g−1(dν) is tangent to the surface of constantν. Thus a null surface ν(p) = ν0 naturally comes with a con-gruence of null lines, which are the flow lines of n within thesurface and the null generators of the surface.

2.2.8 Null functions generate null geodesics

The usefulness of null functions comes from the fact that anynull surface consists not just of some null lines, but actually ofnull geodesics (see the following statement). These geodesicsare the null generators of the surface. Since any null surfaceis a surface ν(p) = ν0 for some null function ν, it will followthat null functions provide a complete description of any nullsurface consisting of lightrays.

Statement: If ν(p) is a null function with the normal vectorn, the flow lines of n are null geodesics. More generally, ifa null curve γ(τ) is contained within a surface ν(p) = constthen γ(τ) is a null geodesic.

Proof: A stronger statement was proved in Sec. 2.2.2: If n isintegrable and g(n,n) = const, where the constant is not nec-essarily zero, then ∇nn = 0. We shall give a direct derivationthat does not use forms. To show that∇nn = 0, we shall provethat g(x,∇nn) = 0 for an arbitrary vector x. We compute

g(x,∇nn) = ∇ng(x,n) − g(∇nx,n)

= n g(x,n) − g(∇xn + [n,x],n).

(We have used torsion-freeness and compatibility of ∇ withthe metric.) Since by assumption g(n,n) = const and n =g−1dν, we have g(n,∇xn) = 0 and g(x,n) = x ν [see alsoEq. (2.7)]. Thus we find

g(x,∇nn) = n g(x,n) − g([n,x],n)

= n (x ν) − ([n,x]) ν = x (n ν)= x g(n,n) = 0.

It follows that the flow lines of n are geodesics for integrablenull n. A tangent space to a null surface contains only onenull direction, and that direction is parallel to n. So any nullcurve contained within the surface must be a flow line of nand hence a geodesic.

2.2.9 Every lightray comes from null functions

The following statement demonstrates that any null geodesiccan be obtained from a suitable null function.

61

Page 70: Winitzki. Advanced General Relativity

2 Geometry of null surfaces

Statement: For any null geodesic curve γ(τ) there exists anull function ν(p) such that γ = g−1(dν) on the curve, so thatγ(τ) is a null generator of ν(p) = 0.

Proof: There are of course many possible null functions sat-isfying this condition. We shall construct one such null func-tion ν(p) by choosing an arbitrary timelike curve ζ intersect-ing γ at a point p0. This curve will be the worldline of a “lightsource.” Suppose that the source emits lightrays in all direc-tions from each point of the curve ζ, and that γ(τ) is one ofthese lightrays emitted from p0. If σ is a suitable parameteron the curve ζ(σ), such that the tangent vector is normalized,

g(ζ, ζ) = 1, then σ is the observer’s “proper time” measuredalong the curve ζ(σ). We may suppose that the point p0 cor-responds to σ = 0. Then we define the desired null functionν(p) as the “retarded time” carried by the lightrays from thesource. More precisely, we define ν(p) = σ if the point p isintersected by a lightray emitted from the point ζ(σ). For in-stance, we then have ν(p0) = 0. The function ν(p) is well-defined in some neighborhood of the curve ζ; more precisely,in a neighborhood where the lightcones are well-behaved,i.e. do not form caustics and fill the entire space around thecurve ζ, leaving no gaps. By construction, the function ν(p)is constant along γ(τ) and thus the curve γ(τ) lies within thesurface ν(p) = 0, and the null vector γ(τ) is everywhere tan-gent to the surface ν(p) = 0. It remains to show that ν(p) is anull function; this is so because ν(p) = const is by construc-tion a lightcone surface, and we have already proved that anylightcone surface is null.

2.2.10 Conformal invariance

Under a conformal transformation of the metric, g → g =e2λg, null vectors obviously remain null. Therefore null func-tions remain null under a conformal transformation (i.e. theyare conformally invariant).Also, shapes of null geodesics are conformally invariant: anull geodesic line γ(τ) remains a geodesic after a conformaltransformation of the metric. This statement is now very easyto prove: a null geodesic is always a flow line of a gradient ofsome null function ν(p), the gradient operation d is indepen-dent of the metric, and so the property of being a null func-tion is conformally invariant. It follows that the normal vectorn ≡ g−1(dν)will change to n = e−2λn, and thus the flow linesof n are the same as those of n. The affine parameter τ must ofcourse be changed, but the shape of the lines will remain thesame.

2.3 Raychaudhuri equation

We now consider a geodesic congruence with a given tan-gent vector field. The distortion of the congruence along itsown flow lines will be determined by the geometry of thegiven spacetime. The distortion satisfies a purely kinematicalrelation called the Raychaudhuri equation. We shall deriveseparate versions of this equation for timelike and for nullgeodesics. We begin by introducing some geometric quanti-ties characterizing the distortion of geodesic congruences.

2.3.1 Distortion tensor

In this section we consider a normalized, geodesic vector fieldu which may be either timelike, spacelike, or null. We have

seen in Sec. 1.6.9 that the divergence divu of a vector field u

measures the relative rate of change of infinitesimal volumetransported by the flow of u. Consider an small region ofspace around a point p, and imagine that the entire regionmoves with the flow (we call this a comoving region). Thisregion can be visualized as consisting of a number of small,noninteracting particles, each moving along its flow line. Theregion will be generally distorted by the flow, so that not onlyits volume but also its shape will change. The distortion of theshape ismeasured by tensors called the shear and the rotation,which we shall now introduce.

The shape of the comoving region is naturally characterizedby connecting vectors c. For convenience, we can choose c tobe orthogonal to u at the initial point p. It then follows fromLemma 1 in Sec. 2.2.6 that c remains orthogonal to u along theflow lines of u.

There would be no distortion of the region if every con-necting vector cwere parallelly transported by the flow, i.e. if∇uc = 0 for every connecting c. Of course, in general we ex-pect that ∇uc 6= 0 at least for some c. Thus, the distortion ofthe volume around the point p during an infinitesimal timewill be described by the derivative∇uc along the flow. Since

∇uc = ∇cu,

the quantity∇uc is a linear function of c that can be describedby a u-dependent transformation B(u), which is a (1,1) ranktensor defined by

∇uc = ∇cu ≡ B(u)c.

In components,

Bβα = ∇αuβ ≡ uβ;α.

The transformation B(u) can be also written as

B(u) = ∇v − Lv,

where it is implied that both sides act on a vector field. We callB(u) the distortion tensor since it quantifies the deviation ofthe shape of a small comoving region from being parallellytransported.

The divergence of u is equal to the trace of B(u):

divu ≡ ∇αuα = Tr B(u).

Even if the divergence is zero, but B 6= 0, the shape of theregion will be changed by the flow. To analyze the informa-tion provided byB(u), it is convenient to consider the bilinearform

B(u)(x,y) ≡ g(B(u)x,y) = g(∇ux,y).

Wehave seen in Sec. 2.2.2 thatB(u) is a symmetric bilinear formiff u is an integrable vector field.

Practice problem: Compute the divergence of the null vec-tor field corresponding to a lightcone t = r in Minkowskispacetime.

Solution: The vector field is

u = ∂t +1

r(x∂x + y∂y + z∂z) ;

using the flat connection, we find divu = 2r−1.

62

Page 71: Winitzki. Advanced General Relativity

2.3 Raychaudhuri equation

2.3.2 Rotation

We may decompose B(u) into symmetric and asymmetricparts,

B(u)(x,y) = B(u)(x,y) + r(u)(x,y), (2.13)

B(u)(x,y) ≡ 1

2

(B(u)(x,y) +B(u)(y,x)

),

r(u)(x,y) ≡ 1

2

(B(u)(x,y) −B(u)(y,x)

).

The 2-form r(u) is called the rotation (or the twist , or the vor-ticity) of the field u. It is easy to see from Eq. (2.9) that

r(u) =1

2dgu ≡ 1

2dω,

where ω is the 1-form dual to the vector field u. The rotationtensor r(u) is thus the analog of the familiar three-dimensionalcurl (rotation) ∇ × v of a vector field. A nonzero rotationmeans that the vector field is not a potential field; in our termi-nology, a nonzero r(u) means that the field u is not integrable.To visualize the shape distortion represented by a nonzero

r(u), consider a flat (Minkowski) spacetime in which u is time-like and r(u) is a constant tensor, represented by a constant an-tisymmetric matrix rαβ in an orthogonal basis. Denote by r(u)

the transformation corresponding to the 2-form r(u), so thatfor all vectors x,y we have

g(r(u)x,y) = r(u)(x,y).

For simplicity, suppose that the symmetric part of the distor-

tion tensor vanishes, B(u) = 0. Then B(u) = r(u) and theconnecting vector c satisfies the equation

d

dτc = B(u)c = r(u)c,

with the solution

c(τ) = exp(τ r(u)

)c0.

Since r(u) is transverse to a timelike direction, it is represented(in an orthogonal basis) by an antisymmetric 3× 3matrix act-ing in spacelike directions. The exponential of such a matrixis a rotation matrix, thus c(τ) is related to the initial value c0

by a rotation. We find that all the vectors c are rotated by thesame fixed angle, which is proportional to τ . It follows thatthe entire shape of the initial region is rigidly rotated aroundan axis. This motivates the name “rotation” for the tensor r(u).

2.3.3 Introducing Raychaudhuri equation

The Raychaudhuri equation describes the change of the diver-gence of a geodesic vector field u along its own flow lines. Westraightforwardly find

∇u(divu) = Tr(x,y)

(∇uB(u)

) (x,y);

(∇uB(u)

) (x,y)

= ∇u

(B(u)(x,y)

)−B(u)(∇ux,y) −B(u)(x,∇uy)

= ∇ug(∇xu,y) − g(∇∇uxu,y) − g(∇xu,∇uy).

Using the metricity condition and the definition of the Rie-mann tensor, we rewrite the first and the third terms as

∇ug(∇xu,y) − g(∇xu,∇uy) = g(∇u∇xu,y)

= R(u,x,u,y) + g(∇x∇uu,y) + g(∇[u,x]u,y)

= R(u,x,u,y) + g(∇[u,x]u,y);

finally,(∇uB(u)

) (x,y)

= R(u,x,u,y) + g(∇[u,x]u,y) − g(∇∇uxu,y)

= R(u,x,u,y) − g(∇∇xuu,y)

= R(u,x,u,y) − g(B(u)B(u)x,y).

The trace of the last expression with respect to x,y yields

∇u(divu) = Ric(u,u) − Tr (B(u)B(u)) , (2.14)

where the last term is the trace of the square of the linear trans-formation B(u). This is the initial form of the Raychaudhuriequationwhich we shall later analyze in particular cases.At this point we can use the decomposition (2.13) which

separates the symmetric and the antisymmetric parts of thedistortion tensor. Note that Tr (AB) is a nondegenerate scalarproduct in the linear space of square matrices. The subspacesof symmetric and antisymmetric matrices are orthogonal withrespect to this scalar product. (Verify this!) Therefore with the

decomposition B = B + r the trace Tr(B2) is simplified to

Tr (B(u)B(u)) = Tr(

B(u)B(u))

+ Tr(r(u)r(u)

),

where r(u) is the transformation associated with the 2-formr(u). If the vector field u is integrable, its rotation vanishes and

we are left only with the symmetric part B of the distortiontensor.A further simplification is possible if we separate the trace-

free part from B. (This symmetric, trace-free part is calledthe shear of the field u.) However, this operation dependssensitively on whether the field u is null. Therefore we shallconsider the relevant cases separately.

2.3.4 Shear for timelike congruences

While the distortion tensor B(u) is not necessarily symmetric,it is always transverse to the direction of v:

B(u)(u,x) = g(∇uu,x) = 0,

B(u)(x,u) = g(u,∇xu) =1

2∇xg(u,u) = 0,

for all x. This fact allows us to reduce B(u) to a bilinear formin a lower-dimensional space.First we decompose the tangent space TpM into the sub-space u⊥ and its complement, the one-dimensional spacespanned by u. This decomposition is conveniently expressedusing a projector P onto u⊥ and a complementary projectorQonto u, such that

P +Q = 1, x = Px +Qx for all x, P 2 = P, Q2 = Q.

Note that Tr P = 3 and Tr Q = 1. The above decompositionofidentity is equivalent to the decomposition of themetric g intothe partial metric hu, see Eq. (2.3), and a suitable complement,

g(x,y) = hu(x,y) + g(u,x)g(u,y).

The contravariant metric is correspondingly decomposed as

g−1 = h−1u + u⊗ u.

A bilinear form B(x,y) can be decomposed with respect tothe direction u using the projectors P and Q as follows:

B(x,y) = B(Px, Py)+B(Px, Qy)+B(Qx, Py)+B(Qx, Qy).(2.15)

63

Page 72: Winitzki. Advanced General Relativity

2 Geometry of null surfaces

So far we have not obtained any new information about B.However, if the formB is transverse to the direction of u thenB(·, Qy) = 0, and the above formula simplifies to

B(x,y) = B(Px, Py).

Since the structure of the projectors crucially depends onwhether the vector field u is null, we must consider the rel-evant cases separately. In this section we treat the easier casewhen u is timelike. (The spacelike case is very similar but lessimportant in practice.) Assuming that g(u,u) = 1, the projec-tors are

Px = x− ug(x,u), Qx = ug(x,u).

The distortion tensor is effectively three-dimensional since itis nonzero only on the subspace u⊥, where it coincides withits projection, denoted by B⊥

(u),

B⊥(u)(x,y) ≡ B(u)(Px, Py) = B(u)(x,y).

The shear of the vector field u is, by definition, the sym-metric traceless part of the projected distortion tensor. It isconvenient to express the shear using the partial metric forthe subspace u⊥,

hu(a,b) ≡ g(Pa, Pb) = g(a, Pb) = g(a,b) − g(u,a)g(u,b).

Note that the partial signature of this metric is (−−−) andits trace is equal to 3. If a symmetric bilinear form T (x,y) isnonzero only on the subspace u⊥, the traceless part of T isexpressed as

T − 1

3(Tr T )h.

Since Tr B⊥(u) = Tr B(u) = divu, we obtain the decomposition

B(u) =1

3(divu)h+ σ(u) + r(u), (2.16)

σ(u) ≡ B(u) −1

3(divu)h; Tr σ(u) = 0.

The name “shear” for the tensor σ(u) describes a linear defor-mation of the comoving region that changes its shape but doesnot change its volume.The Raychaudhuri equation (2.14) contains the trace of thesquare of B(u), and we can now simplify this term by usingthe decomposition (2.16). Note that every term in that decom-position is transverse to u and thus is nonzero only on thethree-dimensional subspace on which P projects. It is impor-tant that this subspace has a metric h with a Euclidean signa-ture; the overall minus sign of h is unimportant for the presentconsideration. The partial metric h establishes a correspon-dence between bilinear forms and transformations, and weshall denote transformations by a hat. For example, the bi-linear form σ(u) corresponds to the transformation σ(u) such

that1

σ(u)(x,y) = h(σ(u)x,y).

This relation uniquely determines the vector σ(u)x within the

subspace u⊥.In the space of bilinear forms with the scalar productTr (AB), the subspace of symmetric traceless forms is orthogo-nal to the subspace of “pure trace” (i.e. bilinear forms propor-tional to the partial metric h). Both these subspaces are also

1 Why don’t we also denote the transformation B(u) by a hat?

orthogonal to the subspace of antisymmetric forms. Thereforewe can decompose the trace term as

Tr (B(u)B(u)) =1

3(Tr B(u))

2+ Tr

(σ(u)σ(u)

)+ Tr

(r(u)r(u)

).

The Raychaudhuri equation for timelike geodesics is thenwritten as

∇u(divu) = Ric(u,u) − 1

3(divu)2

− Tr(σ(u)σ(u)

)− Tr

(r(u)r(u)

).

Let us now analyze the individual terms in the above equa-tion. Since a trace of a square of a symmetric matrix is non-negative, we have Tr (σ(u)σ(u)) ≥ 0. We can demonstrate thisfact formally by computing

Tr(x,y)h(σσx,y) ≡ Tr(x,y)σ(σx,y) = Tr(x,y)σ(y, σx)

≡ Tr(x,y)h(σy, σx).

We now need to use a metric decomposition (see Sec. 1.7.3 fordetails of trace calculations). If s1, s2, s3 is an orthonormalbasis of (spacelike) vectors, so that h(si, sj) = −δij , then theinverse metric h−1 is decomposed as

h−1 = −s1 ⊗ s1 − s2 ⊗ s2 − s3 ⊗ s3,

and we get

Tr(x,y)h(σy, σx) = −3∑

j=1

h(σsj , σsj) ≥ 0

since h is negative-definite.

Statement: We have Tr (r(u)r(u)) ≤ 0.Derivation: Same calculation as above, except for

Tr(x,y)r(u)(r(u)x,y) = −Tr(x,y)r(u)(y, r(u)x).

2.3.5 Shear for null congruences

We have seen that for null vector fields n, building a self-adjoint projector (2.2) into n⊥ requires a choice of an addi-tional null vector field l. It is convenient to normalize l byg(l,n) = 1. The definition (2.2) of the projector can be in-terpreted as a decomposition of the identity operator into theprojector P and a complementary projector Q,

1 = P +Q, Qx ≡ lg(n,x) + ng(l,x).

Note that Tr P = 2 and Tr Q = 2. The above formula is equiv-alent to a decomposition of the metric g into the partial metrichn, given by Eq. (2.4), and a suitable complement,

g(x,y) = hn(x,y) + g(l,x)g(n,y) + g(n,x)g(l,y).

Note that a tensor decomposition of the inverse metric g−1 is

g−1 = l ⊗ n + n ⊗ l − s1 ⊗ s1 − s2 ⊗ s2, (2.17)

where s1 and s2 are orthonormal, spacelike vectors, orthogo-nal to both n and l, and spanning the image of P .Let us now decompose the distortion tensor B(n) usingthe projectors P and Q, analogously to Eq. (2.15). SinceB(n)(x,n) = 0 by transversality, we can write

B(n)(x,y) = B(n)(Px, Py) +B(n)(Px, l)g(n,y)

+B(n)(l, Py)g(n,x) +B(n)(l, l)g(n,x)g(n,y).

64

Page 73: Winitzki. Advanced General Relativity

2.4 Applications of Raychaudhuri equation

The first term, B(n)(Px, Py) ≡ B⊥(n)(x,y), is effectively a bi-

linear form on a two-dimensional space spanned by s1, s2.Note that the trace of B⊥

(n) is equal to the trace of B(n). This

can be seen formally by using the general property

Tr(x,y)g(x,n)A(y, ...) = A(n, ...);

in other words, a trace with g(n,x) substitutes n into the restof the expression. However, substituting n into any termyields zero since Pn = 0 and g(n,n) = 0. For instance,

Tr(x,y)g(x,n)g(n,y) = g(n,n) = 0,

Tr(x,y)B(n)(Px, l)g(n,y) = B(n)(Pn, l) = 0.

ThereforeTr B⊥

(n) = divn.

When we compute the trace needed for the Raychaudhuriequation,

Tr (B(n)B(n)) ≡ Tr(a,b)(x,y)B(n) (a,x)B(n) (y,b) ,

it turns out that all the terms vanish except the termTr B⊥

(n)B⊥(n) involving only the “projected” or “transverse”

distortion B⊥(n),

Tr (B(n)B(n)) = Tr B⊥(n)B

⊥(n). (2.18)

(Verifying this is left as an easy exercise.) Therefore all theother terms in the decomposition of B(n) are “unphysical,”i.e. they do not enter the physically significant equations.

Remark: For null congruences n, the reduction of the distor-tion tensor B(n) to its transverseB

⊥(n) can be also motivated as

follows. The distortion tensor describes the change of shape inthe comoving 2-volume spanned by two spacelike connectingvectors s1,2. It is clear that these connecting vectors should beorthogonal to n. Also, connecting vectors could be changedby adding a multiple of n, e.g. s1 → s1 + λn, where λ is ascalar function; this will not modify the orthogonality of s1,2

and n. However, connecting vectors that differ only by a mul-tiple of n correspond to a change of the parameter in nearbycurves of the congruence. Therefore, the connecting vectors sand s + λn carry the same information about the geometry ofthe congruence. Thus we should only consider equivalenceclasses of connecting vectors up to a multiple of n. Whilethere is no canonical choice of representatives for these equiv-alence classes, a relatively straightforward way is to chooses1,2 within a 2-plane orthogonal to n and an auxiliary nullvector l. This is the construction used in the present section.We can now separate the symmetric traceless part σ(n) ofthe projected distortion, called the shear of n, from the “puretrace” part proportional to hn, and from the antisymmetricpart, which is the projected rotation r⊥(n),

B⊥(n) =

1

2(divn)hn + σ(n) + r⊥(n),

σ(n)(x,y) ≡ 1

2

(

B⊥(n)(x,y) +B⊥

(n)(y,x))

− 1

2(divn)hn,

r⊥(n)(x,y) ≡ r(n)(Px, Py) =1

2

(

B⊥(n)(x,y) −B⊥

(n)(y,x))

.

Note the factor 12 instead of

13 , which is due to the projector P

being two-dimensional. The Raychaudhuri equation for nullgeodesics is then written as

∇n(divn) = Ric(n,n) − 1

2(divn)2

− Tr(σ(n)σ(n)

)− Tr

(

r⊥(n)r⊥(n)

)

.

2.4 Applications of Raychaudhuriequation

2.4.1 Energy conditions

The Einstein equation (1.68) specifies the matter energy-momentum tensor Tµν that would produce any given space-time. However, not every tensor Tµν corresponds to physi-cally reasonable situations. There are certain conditions thatare convenient to impose on Tµν ; these are summarily calledenergy conditions.It is reasonable to suppose that the energy density mea-sured locally by an observer is everywhere nonnegative.2 Thisgives the weak energy condition

Tµνuµuν ≡ T (u,u) ≥ 0 for all timelike u.

Since null vectors are limits of timelike vectors, a consequenceof this condition is the null energy condition

T (n,n) ≥ 0 for all null n.

In some calculations it is necessary to impose conditions di-rectly on the Ricci tensor. One such condition is the strongenergy condition,

Ric(u,u) ≤ 0 for all timelike u.

By the Einstein equation, this is equivalent to

T (u,u) − g(u,u)

2Tr T ≥ 0 for all timelike u.

Note that (contrary to what the words suggest) the strong en-ergy condition does not imply the weak one.Finally, the dominant energy condition is

jµ ≡ Tµνuµ is future-directed and jµj

µ ≥ 0,

for all timelike uµ. This condition means that the energy-momentum flows along future-directed timelike or null lines.By definition, a perfect fluid with the velocity field v, den-sity ρ, and pressure p is described by the energy-momentumtensor

T = (ρ+ p)v ⊗ v − pg−1;

Tµν = (ρ+ p) vµvν − pgµν .

In many situations, the matter EMT can be represented by theperfect fluid form with a suitably chosen “velocity” vector v.It is assumed that v is timelike and g(v,v) = 1. Note that thetrace of the perfect fluid EMT is

Tr T = ρ− 3p.

Energy conditions can be expressed as inequalities for p andρ as follows: The weak energy condition is

ρ ≥ 0, ρ+ p ≥ 0,

the null energy condition is just

ρ+ p ≥ 0,

the strong energy condition is

ρ+ 3p ≥ 0, ρ+ p ≥ 0,

and the dominant energy condition is

ρ ≥ 0, ρ ≥ |p| .2This is true for every known classical field but is sometimes violated byquantum fields.

65

Page 74: Winitzki. Advanced General Relativity

2 Geometry of null surfaces

Statement: The above formulas represent the energy condi-tions for the perfect fluid EMT.Derivation: Since the energy conditions involve an arbitrarytimelike vector u, it is convenient to decompose that vectorwith respect to the fluid velocity v as

u = av + w, w ∈ v⊥, g(u,u) = a2 − g(w,w) = 1,

where a is a suitable constant. The value of the vector w

will be irrelevant for the calculations since T (v,v) = p + ρ,T (v,w) = 0, and T (w,w) = −p, while the value of a is con-strained only by |a| ≥ 1. For instance, the weak energy condi-tion gives

T (u,u) = a2T (v,v) + T (w,w) = a2(ρ+ p) − p ≥ 0,

which holds for all a ≥ 1 only if ρ ≥ 0 (at a = 1) and ρ + p ≥0 (the limit of large a). Other energy conditions are treatedsimilarly.

Statement: A massless, minimally coupled scalar field φwith the EMT

T = dφ⊗ dφ− g−1(dφ,dφ)

2g,

Tµν ≡ φ,µφ,ν −1

2gµνφ,αφ

,α,

satisfies the null and the strong energy conditions.Derivation: For any null vector n, we have

T (n,n) = (∇nφ)2 − g(n,n)

2(...) = (∇nφ)

2 ≥ 0.

For any timelike vector v, we have

Tr T = −g−1(dφ,dφ),

T (v,v) − 1

2g(v,v)Tr T = (∇vφ)

2 ≥ 0.

2.4.2 Focusing of timelike geodesics

The divergence divu of a timelike vector field u can be inter-preted as the relative rate of change of a comoving 3-volumeorthogonal to u. This is so because the 3-volume spanned byvectors a,b, c is equal to ε(a,b, c,x), where x is any normal-ized vector orthogonal to a,b, c. If a,b, c are connecting vec-tors for uwhich are orthogonal to u initially, they will remainorthogonal to u, so we can use u as the vector x. Thus

dV

dτ= Luε(a,b, c,u) = (divu) ε(a,b, c,u) = (divu) V.

Consider a congruence of timelike geodesics such that thetangent vector field u is hypersurface orthogonal. Then, as wehave seen, the rotation of u vanishes, and the distortion tensoris symmetric. The Raychaudhuri equation is simplified to

d

dτ(divu) = R(u,u) − 1

3(divu)2 − Tr

(σ(u)σ(u)

).

Imposing the strong energy condition, which holds forclassical matter, we find that the right-hand side of theabove equation is nonpositive, indicating that gravity makeshypersurface-orthogonal geodesics diverge less. This meansthat gravity is an attractive force. In fact, we can obtain astronger result,

d

dτ(divu) ≤ −1

3(divu)2.

This inequality can be integrated with an initial valuedivu(τ0) = θ0, which yields

divu ≤ 1

θ−10 + 1

3τ.

If the initial value θ0 is positive, the divergence goes to zero (orbecomes negative), and if it is initially negative, it diverges tonegative infinity within a finite time. A negative infinite di-vergence means that the geodesics cross at a focal point . Theset of all focal points in a given congruence is called a caustic.We conclude that the appearance of a focal point is inevitablewithin a finite proper time once the initial value of the diver-gence is negative, for hypersurface orthogonal geodesic con-gruences. This statement is known as the focusing theorem.Here is an example of a hypersurface orthogonal congru-ence to which the focusing theorem applies.

Statement 1: A congruence of timelike geodesics emittedfrom one point p is hypersurface orthogonal because it is or-thogonal to hypersurfaces of constant proper time τ .

Proof: Let u be the affine tangent vector field for a con-gruence of geodesics emitted from p. We may normalizeg(u,u) = 1. Since there are three linearly independent con-necting vectors for u at any point, and since these connectingvectors form a basis of the tangent space to a hypersurface ofconstant τ , it is sufficient to show that any connecting vectorfield c is everywhere orthogonal to u. It will then follow thatu is orthogonal to the hypersurface of constant τ . Now, if c isa connecting vector for u, it is easy to show that∇ug(c,u) = 0(see Lemma 1 in Sec. 2.2.6). Thus g(c,u) is constant alongthe lines of the congruence, and so it remains to show thatg(c,u) = 0 at any one point along each line. Since obviouslyall connecting vectors c must vanish at the point p, we haveg(c,u) = 0 at p for every line of u. (Alternatively, we mayconsider an infinitesimal neighborhood of p, where the geom-etry of lines is approximately the same as in the Minkowskispace, where geodesics are straight lines and the statementg(c,u) = 0 is obviously true.) Thus we have proved the state-ment.

2.4.3 Repulsive gravity

Suppose that the energy-momentum tensor contains a cosmo-logical constant term

T = T(m) +Λ

8πg−1,

where T(m) is the matter EMT and Λ > 0 is the cosmologi-cal constant. (The cosmological constant is also called darkenergy if it is implied that Λ = Λ(p) is a slowly-varying dy-namical field and its local value near a point p is perceived as acosmological “constant”.) The effect of introducing a cosmo-logical constant is described by the Raychaudhuri equation,which receives an extra term

d

dτ(divu) = − 1

[

T(m)(u,u) − 1

2Tr T(m)

]

− 1

3(divu)2

− Tr(σ(u)σ(u)) + Λ.

The extra term will decrease the rate of focusing, which indi-cates that a cosmological constant corresponds to a repulsivegravitational force.

66

Page 75: Winitzki. Advanced General Relativity

2.5 Null tetrad formalism

2.4.4 Focusing of null geodesics

We now consider a congruence of null geodesics with ahypersurface-orthogonal tangent vector field n, and wewould like to obtain an analogous focusing theorem. Com-pared with the timelike case, we have two differences: Firstly,the 3-volume spanned by vectors orthogonal to n vanishesand so the interpretation of divn is altered; this issue wastreated in Sec. 2.1.2 where we found that divn describes therate of change of comoving area. Secondly, the rotation tensorr(n) generally does not vanish for hypersurface-orthogonalnull geodesics. However, Raychaudhuri equation involvesthe trace term Tr (r(n)r(n)), and we shall now show that thisterm vanishes for a hypersurface-orthogonal field n, eventhough the rotation tensor r(n) itself does not vanish.We have seen in Sec. 2.3.5 that the trace terms involve onlythe projected distortion tensor [Eq. (2.18)]. Therefore the traceterm will remain the same after we project the rotation tensoronto the subspace spanned by s1,2,

Tr(r(n)r(n)

)= Tr

(

r⊥(n)r⊥(n)

)

.

We now use the fact that the rotation 2-form r(n) is equal to dω,where ω ≡ g−1n is the 1-form dual to n. By the assumption ofhypersurface orthogonality and the Frobenius theorem, thereexist scalar functions λ, ν such that ω = eλdν and then

dω = eλdλ ∧ dν = dλ ∧ ω.

The projected rotation tensor acts on vectors x,y as

r⊥(n)(x,y) ≡ (dω) (Px, Py)

= ω(Py)(dλ) Px − ω(Px)(dλ) Py.

Since ω(Px) = 0 for all x, we have r⊥(n) = 0.

Since the projected rotation vanishes, we obtain the Ray-chaudhuri equation for null geodesics in the form

d

dτ(divn) = R(n,n) − 1

2(divn)2 − Tr

(σ(n)σ(n)

).

Imposing the null energy condition, we find R(n,n) ≤ 0.Hence, the right-hand side of the above equation is nonpos-itive and

d

dτ(divn) ≤ −1

2(divn)2.

This is the focusing theorem for the null case. The conclusionis that gravity makes null geodesics focus within a finite inter-val of the affine parameter τ .

2.5 Null tetrad formalism

The null tetrad formalism is due to Newman and Penrose andconsists of writing equations in a basis where all four vectorsare null and “as orthogonal as possible” with respect to eachother. This leads to simplifications when writing tensor equa-tions in components. We shall now introduce the null tetradformalism as an elegant way of expressing the projected dis-tortion tensor for the case of null geodesics.The idea is to change the previously used basis l,n, s1, s2,where the vectors l,n are null and s1,2 are spacelike con-necting vectors for n, into a null basis. Since l must satisfyg(l,n) = 1 and so l is not (and generally cannot be chosen as)

a connecting vector for n, we are motivated to drop the re-quirement that s1,2 be connecting vectors but instead demandorthonormality, g(sj , sk) = −δjk. Then we define the complex-valued null vectors

m ≡ 1√2

(s1 + is2) , m ≡ 1√2

(s1 − is2) . (2.19)

The fact that these vectors are complex-valued is merely amathematical formality that makes equations more compact;we expect that every physically measurable quantity is real-valued. The null basis l,n,m, m is called the Newman-Penrose null tetrad.The null tetrad is “nearly orthogonal” since g(m, m) = −1,

g(l,n) = 1, and all the other scalar products vanish. Hence,the decomposition (2.17) of the contravariant metric g−1 isrewritten through the null tetrad as

g−1 = l ⊗ n + n ⊗ l − m ⊗ m − m ⊗ m.

The partial metric h(n) is similarly decomposed as

h−1(n) = −m⊗ m − m ⊗ m,

and the projector P is

Px = −mg(x, m) − mg(x,m).

Using the null tetrad decomposition above, the divergenceof a hypersurface orthogonal null geodesic field n can be writ-ten as

divn = Tr(x,y)B(n)(x,y)

= B(n)(l,n) +B(n)(n, l) −B(n)(m, m) −B(n)(m,m)

= −2B(n)(m, m) ≡ −2g(∇mn, m),

because the tensor B(n) is transverse to n and its projectiononto the subspace spanned by (m, m) is rotation-free.Note that for a geodesic field n that is not hypersurface or-thogonal, the quantity −2B(n)(m, m) is complex-valued, andonly its real part is actually equal to the divergence of n. How-ever, the “complex-valued divergence”

θ ≡ −2B(n)(m, m)

is still useful for the analysis of such cases.The other ingredient in the Raychaudhuri equation is theprojected shear tensor σ⊥

(n), which can be expressed as

σ⊥(n)(x,y) = B(n)(Px, Py) − 1

2(divn)h(n)(x,y)

= g(m,x)g(m,y)σ + g(m,x)g(m,y)σ (2.20)

through the auxiliary (complex-valued) scalar quantity σ,

σ ≡ B(n)(m,m),

also called the scalar shear. Another expression for the scalarshear is

σ ≡ g(m,∇mn) = g(m, [m,n]) + g(m,∇nm) = g(m, [m,n]).

It follows that m cannot be a connecting vector for n unlessthe shear vanishes.The completion of a given vector n to a null tetrad is ofcourse not unique. Let us now investigate the freedom inthe choice of the vectors l,m and the resulting uncertainties

67

Page 76: Winitzki. Advanced General Relativity

2 Geometry of null surfaces

in the distortion quantities σ, θ. The tetrad vectors are de-fined by the requirements g(l, l) = g(m,m) = g(l,m) = 0,g(n, l) = −g(m, m) = 1. The vectorm is related by Eq. (2.19)

to an orthonormal basis s1,2 in the space (l,n)⊥. Note thatl,n, s1, s2 is a basis; the set of transformations that preservethe orthonormality of the basis is the Lorentz group; so theadmissible transformations of the tetrad vectors are preciselythe subgroup of the Lorentz group that leaves the vector n

unchanged. Let us describe these transformations explicitly.Suppose that the vectorsm and l change to

m → m = αm + βn + γl,

l → l = λl + µm + µm + νn,

where α, β, γ, µ are complex and λ, ν are real. Then itis straightforward to prove that the orthogonality relations

g(m,n) = 0, g(m, ˜m) = −1, g(l,n) = 1, g(l, m) = 0, andg(l, l) = 0 require that γ = 0, αα = 1, λ = 1, β = µα, andν = µµ. Therefore, the general form of an admissible transfor-mation is

m → m = eiφm +An, (2.21)

l → l = l + eiφAm + e−iφAm +AAn, (2.22)

where φ and A are arbitrary scalar functions; note that φ isreal-valued while A is complex-valued. We can easily see thatthe distortion quantities change under this transformation as

σ → σ = e2iφσ, θ → θ = θ. (2.23)

It is useful to obtain closed-form equations for the diver-gence θ and the scalar shear σ of a geodesic congruence. TheRaychaudhuri equation (for a hypersurface orthogonal, null,geodesic field n) acquires a more elegant form,

d

dτθ ≡ ∇nθ = −1

2θ2 + Ric(n,n) − 2σσ.

Since σσ is invariant under the transformation (2.23), the Ray-chaudhuri equation does not depend on the completion of nto a null tetrad.

Statement: It follows from Eq. (2.20) that Tr σ⊥(u)σ

⊥(u) = 2σσ.

Derivation: Calculation (omitted).Under some additional assumptions about the null tetrad,a similar equation holds for the scalar shear σ,

d

dτσ ≡ ∇nσ = −θ + θ

2σ +R(n,m,n,m). (2.24)

We shall now derive this equation and detail the necessaryassumptions.We need to compute the quantity

∇nσ ≡ ∇nB(n)(m,m) ≡ ∇ng(∇mn,m)

= g(∇n∇mn,m) + g(∇mn,∇nm).

The appearance of nested covariant derivatives suggests thatwe introduce the Riemann tensor,

g(∇n∇mn,m) = R(n,m,n,m) + g(∇[n,m]n,m).

(We used∇nn = 0.) Thus

∇nσ = R(n,m,n,m) + g(∇[n,m]n,m) + g(∇mn,∇nm).(2.25)

By virtue of the following obvious properties,

g(∇mn,n) = 0, g(∇mn,m) ≡ σ, g(∇mn, m) ≡ −1

2θ,

g(∇nm,n) = ∇ng(m,n) − g(m,∇nn) = 0, g(∇nm,m) = 0,

we have

∇mn = −σm +1

2θm + αn,

∇nm = ζm + γn,

where we introduced unknown scalar functions

α ≡ g(∇mn, l), ζ ≡ −g(∇nm, m), γ ≡ g(∇nm, l).

These constants depend on the completion of n to a nulltetrad and should not enter the final expression. Now we canstraightforwardly evaluate:

g(∇mn,∇nm) = ζσ,

[n,m] =

(

ζ − 1

)

m + σm + (γ − α)n,

g(∇[n,m]n,m) = B(n)([n,m] ,m)

=

(

ζ − 1

)

B(n)(m,m) + σB(n)(m,m)

=

(

ζ − 1

)

σ − 1

2θσ.

Finally, Eq. (2.25) is rewritten as

∇nσ = R(n,m,n,m)− 1

2(θ + θ)σ + 2ζσ.

This equation still contains the function ζ ≡ −g(∇nm, m) thatdepends on the completion of n to a null tetrad and is thus ar-bitrary. We would like to eliminate this “gauge dependence.”Note that the function ζ always has pure imaginary valuesand changes under the replacement (2.21) as

ζ → ζ = ζ + i∇nφ.

Therefore, for a given null tetrad, there exists a suitable func-

tion φ such that new null tetrad l,n, eiφm, e−iφm has ζ = 0.(Obviously, the function φ can be found by integrating iζalong the flow lines of n.) Once we have ζ = 0, the requiredequation (2.24) follows.Alternatively, we may redefine the scalar shear for a givennull tetrad as

σg ≡ e2iλσ,

where the function λ is such that

i∇nλ = g(∇nm, m).

(There is, of course, a considerable freedom in choosing thefunction λ.) The redefined “gauge-covariant shear” σg stilldepends on the choice of the tetrad and of the function λ,but is “gauge-covariant” in the sense that a tetrad transfor-mation (2.21), (2.23) yields

∇nσg = e2iφ∇nσg.

The quantity σgσg remains invariant as before, and Eq. (2.24)holds for σg with any choice of tetrad.

68

Page 77: Winitzki. Advanced General Relativity

2.5 Null tetrad formalism

Calculation: Verify the above equation, where σ = e2iφσ and

σg ≡ e2iλσ.Derivation: Let λ be the required function for the original

tetrad and λ for the modified tetrad with m = eiφm. Then wecompute

∇nσg = (2i∇nλ)σg + e2iλ∇nσ,

i∇nλ = g(∇neiφm, e−iφm) = i∇nλ− i∇nφ,

λ = λ− φ,

∇nσg = ∇n(e2iλσ) = (2i∇nλ)σg + e2iλ∇nσ

= 2iσg∇n(λ− φ) + (2i∇nφ)σg + e2iλ+2iφ∇nσ

= e2iφ∇nσg.

69

Page 78: Winitzki. Advanced General Relativity
Page 79: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

If a spacetime contains a finite island of matter surroundedby an infinite sea of vacuum, we intuitively expect that thespacetime is “almost flat sufficiently far from the matter.” Forexample, the metric (1.36) for the Schwarzschild spacetimeis approximately equal to the Minkowski metric in sphericalcoordinates in the limit r → ∞. Such spacetimes contain-ing “well-isolated systems” are called asymptotically flat. Inthis chapter we shall formulate and explore the property of“asymptotic flatness” in more detail. We begin by consider-ing an easier case of stationary spacetimes.

3.1 Stationary spacetimes

Suppose a metric g is given in coordinates xµ, and the coef-ficients gµν are independent of one of the coordinates, say x

1.Then ∂x1 is a Killing vector for themetric g; in other words, theflow of the vector field ∂x1 leaves the metric invariant. Thus,Killing vectors provide a formalization of the concept of “co-ordinate symmetry” of the metric. Using Killing vectors, theheuristic concept of a “time-independent metric” is made pre-cise in the following way.A spacetime is called stationary if it possesses a timelikeKilling vector field. The existence of such a vector field leadsto many useful properties that we shall now explore. Astronger property is being static rather than stationary. Aspacetime is static if it is stationary and the timelike Killingvector is hypersurface-orthogonal. For now, it suffices to con-sider stationary spacetimes.In a stationary spacetime, the flow lines of the Killing vec-tor k can be seen as preferred worldlines along which the ge-ometry does not change. It is useful to consider imaginaryobservers moving along these worldlines. These are calledstationary observers and have 4-velocities

u =1

g(k,k)k. (3.1)

Stationary observers find that the geometry of the spacetimearound them is the same at all times. However, these station-ary observers are not necessarily geodesic observers, i.e. theyare not necessarily freely falling. For example, an observer sit-ting at the surface of a nonrotating planet observers that thegeometry is time-independent. This is a stationary observerwho is, obviously, not freely falling.The following statement lists the most important propertiesof Killing vectors with regard to being geodesic.

Statement 3.1.0.1: (a) For a Killing vector k the property∇kg(k,k) = 0 holds. (b) The function g(k,k) is everywhereconstant iff the vector k is geodesic. (c) In the domain whereg(k,k) 6= 0, the normalized vector field u defined by Eq. (3.1)is geodesic iff k is geodesic.Idea of proof: Compute the scalar product of ∇kk with anarbitrary vector and use the Killing equation (1.48).

Proof of Statement 3.1.0.1: (a) The property ∇kg(k,k) = 0follows directly from the Killing equation (1.48). (b) For an

arbitrary x, we have

g(∇kk,x) = −g(k,∇xk) = −1

2∇xg(k,k).

Thus, the scalar product of ∇kk with an arbitrary vector van-ishes iff the derivative of g(k,k) in an arbitrary direction van-ishes. (c) Assuming that g(k,k) 6= 0, we compute

∇uu =1

g(k,k)∇kk +

k√

g(k,k)∇k

1√

g(k,k)

=1

g(k,k)∇kk.

Thus ∇uu = 0 iff∇kk = 0.

In general, a Killing vector is not normalized, so station-ary worldlines generally correspond to accelerated observersrather than to freely falling observers. For example, the sta-tionary worldlines in the Schwarzschild spacetime describeobservers remaining at a constant distance from the center ofmass. These observers move with a constant acceleration andthus feel a constant pull of gravity. Accordingly, the Killingvector ∂t is not normalized in the Schwarzschild spacetime,i.e. g(k,k) 6= const.

3.1.1 Newtonian limit

In the framework of General Relativity, the Newtonian limitis the assumption that gravitation is weak and that there ex-ists a globally inertial reference frame—the “absolute space-time.” In other words, a Newtonian setting considers a setof “absolute” observers who see each other as motionless ormoving with constant velocities at all times. (Of course, thisassumption is only approximately correct in the actual curvedspacetime; also the clocks at different points run at differentrates and cannot be exactly synchronized everywhere at alltimes.) For example, someone sitting on the surface of theEarth could be such an “absolute” observer. Since “absolute”observers do not necessarily move along geodesic lines, theywill perceive the motion of freely falling bodies as an acceler-ated motion occurring in the “absolute” reference frame. TheNewtonian theory explains this apparently accelerated mo-tion as being due to “gravitational forces.” We now formal-ize these assumptions and derive the Newtonian equations ofgravitation.A congruence of the world-lines of the “absolute” ob-servers is described by a timelike vector field v, normalizedas g(v,v) = 1. The observers measure the “absolute time” τalong the flow lines of v. The central assumption of the New-tonian limit is that these observers are “fixed” with respect toeach other, so any neighbor observers appear tomovewithoutrelative acceleration. In other words, any connecting vectorfield c, such that [c,v] = 0, appears (approximately!) unaccel-erated,∇v∇vc ≈ 0.Let us find out how the “stationary” observers perceive themotion of freely falling (geodesic) particles. If a geodesic hasinstantaneously the same tangent vector v as one of the fixedobservers, the observer’s acceleration relative to the geodesic

71

Page 80: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

is∇vv, therefore the observer will measure the acceleration ofthe freely falling particle as a ≡ −∇vv. Our goal is to derivea formula describing the acceleration a.The plan of the derivation is the following. First we usethe properties of connecting vectors to deduce that the vectorfield a is integrable. It will follow that there exists a scalarfunction φ such that a = g−1dφ. This scalar function is calledthe gravitational potential. Then we shall derive an equationfor φ, using the Einstein equation with matter sources consist-ing of “fixed” matter (not moving with respect to the fixedobservers) with mass density ρ(x). This equation will coin-cidewith the (three-dimensional) Poisson equation∆φ = 4πρ,which is equivalent toNewton’s law of gravity. In this waywereproduce the Newtonian theory of gravitation.The acceleration field a is integrable if the bilinear form

B(a)(x,y) defined by Eq. (2.10) is symmetric. The form of theexpression

B(a)(x,y) ≡ −g(∇x∇vv,y)

suggests that we should consider the combination

g(∇v∇xv −∇x∇vv −∇[v,x]v,y) = R(v,x,v,y)

and try to simplify it using the given information about theconnecting vectors. We substitute x = c and obtain

0 = g(∇v∇vc,y) = g(∇v∇cv,y) = R(v, c,v,y)+g(∇c∇vv,y)

and therefore

B(a)(c,y) ≡ −g(∇c∇vv,y) = R(v, c,v,y). (3.2)

Since R(v, c,v,y) = R(v,y,v, c), we find

B(a)(c,y) = B(a)(y, c).

In fact, the above relation holds for arbitrary vector fields x,yat any point, B(a)(x,y) = B(a)(y,x), because it involves onlythe value of the connecting vector c at one point, and not thefact that c is a connecting vector. Hence, the bilinear formB(a)

is symmetric, the vector field a is integrable, and there existsa scalar function φ such that a = g−1dφ.To derive an equation for φ, we use Eq. (3.2) and the defini-tion of divergence,

diva = Tr(x,y)g(∇xa,y).

It follows that

diva = −div (∇vv) = Tr(x,y)R(v,x,v,y) = Ric(v,v).

On the other hand,

diva = divg−1dφ

= Tr(x,y)g(∇xg−1dφ,y)

= Tr(x,y) (∇xdφ) y = φ,

where the D’Alembertian φ is defined as the trace of thebilinear form∇...∇...φ,

φ ≡ Tr(x,y)(∇...∇...φ) (x,y) ≡ Tr(x,y)(∇x∇yφ−∇∇xyφ).

In components, φ = ∇µ∇µφ. Hence, we obtain the follow-ing equation for φ,

φ = Ric(v,v).

To obtain a concrete expression for Ric(v,v), we need torelate the curvature to the matter content of the spacetime. Inthe Newtonian approximation, the matter sources are approx-imately motionless particles, so the energy-momentum tensorof matter is Tµν = ρvµvν , where ρ(x) is the mass density. TheEinstein equation,

Rµν −1

2gµνR

αα = −8πTµν ,

can be rewritten as

Rµν = −8π

(

Tµν −1

2gµνT

αα

)

.

Since vµvµ = 1, we find T µµ = ρ and finally

φ = Rµνvµvν = −4πρ.

This is in fact the familiar Poisson equation relating the New-tonian gravitational potential φ to the density ρ of matter,written in a Lorentz-covariant way. In an almost-flat space-time, we can choose the coordinate t along the flow lines of v,and then we have approximately

φ =(∂2t − ∆

)φ = −4πρ, ∆ ≡ ∂2

x + ∂2y + ∂2

z .

The time derivatives are negligible since the distribution ofmatter is almost stationary (in the SI units, this is made moreexplicit by the factor 1

c2 ∂2t ). Hence, we recover the three-

dimensional Poisson equation

∆φ = 4πρ.

Note that the three-dimensional acceleration ai = −∂iφ,where i = x, y, z, because of the negative sign of the spatialpart of the metric.

Example: Schwarzschild spacetime

The metric is given by Eq. (1.36) and the stationary observersfollow lines of fixed r, φ, θ for r > 2m. We expect the Newto-nian approximation to hold for large r ≫ 2m. The vector v isthus proportional to ∂t, and normalization g(v,v) = 1 yields

v =

(

1 − 2m

r

)−1/2

∂t.

The vectors ∂θ, ∂φ are connecting vectors for v, but ∂r is not,

[v, ∂r ] =m

r(r − 2m)v.

However, the approximation [v, ∂r ] ≈ 0 is good for large r,namely for r ≫ m.A calculation shows that the acceleration field∇vv is exactlyintegrable (not only in the Newtonian approximation). Letus compute the scalar products of ∇vv with the basis vectorsv, ∂r, ∂θ, ∂φ:

g(∇vv,v) =1

2∇vg(v,v) = 0,

g(∇vv, ∂θ) = −g(∇v∂θ,v) = −g(∇∂θv,v) = 0,

g(∇vv, ∂φ) = 0,

g(∇vv, ∂r) = −g(∇v∂r,v) = −g(v, [v, ∂r ]) − g(v,∇∂rv)

= − m

r(r − 2m).

72

Page 81: Winitzki. Advanced General Relativity

3.1 Stationary spacetimes

Therefore

∇vv =g(∇vv, ∂r)

g(∂r, ∂r)∂r =

m

r2∂r.

The potential is easier to find from the 1-form g∇vv, which isexpressed as

g∇vv = − m

r(r − 2m)dr ≡ dΦ(r),

Φ(r) ≡ 1

2ln

(

1 − 2m

r

)

. (3.3)

Thus Φ(r) is the “potential” for the gravitational force mea-sured by the stationary observers in the Schwarzschild space-time. For large r ≫ 2m the potential becomes −m/r, which isthe familiar Newtonian potential due to a point massm, whilefor r → 2m the gravitational force approaches infinity. (Notethat the coordinate r does not coincide with the physical dis-tance.)

Self-test question: Is the motion of test particles in theSchwarzschild spacetime actually the same as the Newtonianmotion in the potential Φ(r) or in some other potential? An-swer: No. (The “gravitational force” depends on velocity aswell.) Not sure how to explain! Need to make this clear.

Remark: The fact that the Schwarzschild spacetime admits a“potential” Φ(r) is a consequence of spherical symmetry andtime-independence, i.e. the metric depends only on the radiusr. (More formally: The presence of an integrable Killing vec-tor ∂t shows that the spacetime is static; additionally, the “an-gular” Killing vectors ∂θ and ∂φ indicate the spherical sym-metry.) Thus the acceleration of an observer at constant r is avector directed along ∂r whose magnitude is a function onlyof r. Therefore this acceleration must be equal to a gradientof some function Φ(r). In general, all stationary spacetimesadmit such a “potential” although it may not be sphericallysymmetric; see Sec. 3.1.4.

Example: de Sitter spacetime

The metric is given by Eq. (1.37) and the stationary observersfollow lines of fixed x, y, z. The vector v is simply equal to ∂tand the basis of connecting vectors is ∂x, ∂y, ∂z . Since the fieldv is integrable and normalized, it is geodesic and ∇vv = 0.Hence, stationary observers are at the same time freely fallingand do not feel any gravity at all. However, the Newtonianapproximation breaks down at sufficiently large distances.This can be seen by computing the relative acceleration of twonearby observers (see the following calculation),

∇v∇v∂x = H2∂x.

At small distances L ≪ H−1, we are allowed to use the ap-proximation that stationary observers are not accelerated rel-ative to each other.

Calculation: For a connecting vector c which is one of∂x, ∂y, ∂z , show that

∇vc = Hc (3.4)

and thus derive the above equation. The relation (3.4) is calledthe Hubble law. (In a uniformly expanding spacetime, thevelocity of a comoving observer is proportional to the distanceto the observer.)

Solution: We can compute ∇vc by taking scalar products;for instance,

g(∇vc, c) =1

2∇vg(c, c) = −He2Ht,

while scalar products of ∇vc with other basis vectors van-ish. Thus we obtain Eq. (3.4) and as a trivial consequence,∇v∇vc = H2c.

3.1.2 Redshift

In a stationary spacetime, the magnitude of the Killing vec-

tor defines a scalar function, z ≡√

g(k,k). The function zis called the redshift factor because, as we shall now show,the local values of z characterize the variation of the photonfrequency measured by different stationary observers.

A null geodesic with a null tangent vector n represents apath of a photon. The vector n may be rescaled in such away that g(n,u) represents the energy E of the photon in therest frame of a timelike observer with 4-velocity u. From thePlanck relation, E = hν, where h is Planck’s constant, we findthat the measured frequency of the photon is ν = h−1g(n,u).In a stationary spacetime, stationary observers have world-lines with 4-velocity

u =1

g(k,k)k = z−1k.

Therefore, a stationary observer at a given location will mea-sure the frequency of a given lightray with a (null) worldlineγ(τ) as

hν = g(z−1k, γ).

By Statement 3.1.2.1 below, g(k, γ) remains constant alonglightrays. Thus, for a given single photon moving along a nullgeodesic, the frequency measured by a stationary observer isproportional to the local value of z−1. This justifies the name“redshift factor” for the function z.

Statement 3.1.2.1: If k is a Killing vector, so that Eq. (1.48)holds, and γ(τ) is a geodesic, then g(k, γ) is constant along γ.

Proof of Statement 3.1.2.1: Let v be a geodesic vector fieldcontaining γ; then∇vv = 0 and we compute

∇vg(k,v) = g(∇vk,v) = −g(v,∇vk),

thus ∇vg(k,v) = 0.

It follows from Statement 3.1.0.1 that ∇kz = 0; in otherwords, the redshift factor is constant for a given stationaryobserver. It also follows that the function z is everywhere con-stant (i.e. there is no redshift) iff the stationary observers aregeodesic.

3.1.3 Conformal Killing vectors

In cosmology, one usually considers spacetimes that are notstationary, so no timelike Killing vectors are present. Never-theless, sometimes there exists a “conformal Killing vector”

and then the redshift function z ≡√

g(k,k) is still useful.By definition, a conformal Killing vector is a vector field k

such that

Lkg = 2λg,

73

Page 82: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

where λ is some scalar function. It follows directly fromEq. (1.46) that a conformal Killing vector k satisfies the equa-tion

g(∇ak,b) + g(a,∇bk) = 2λg(a,b),

where a,b are arbitrary vectors. This is a modified version ofEq. (1.48).When a spacetime admits a conformal Killing vector, theredshift function is again useful, as demonstrated by the fol-lowing statement.

Statement 3.1.3.1: Assume that a spacetime admits a time-like conformal Killing vector k, and call observers movingalong k “conformally stationary.” Then the frequency of aphoton measured by conformally stationary observers is in-

versely proportional to the redshift function z ≡√

g(k,k).

Proof of Statement 3.1.3.1: For an arbitrary geodesic field v

we have∇vg(k,v) = g(∇vk,v) = λg(v,v).

If the field v is null (g(v,v) = 0) then

∇vg(k,v) = 0,

so g(k,v) remains constant along the geodesic lines, just as in

the case of ordinary Killing vectors. Defining z =√

g(k,k)and u ≡ z−1k, we again find

hν ≡ g(u, γ) = z−1g(k, γ) ∝ z−1.

The photon’s frequency is inversely proportional to the red-shift factor.

Note however that ∇kg(k,k) = 2λg(k,k), so the redshiftfactor z is not constant along the worldlines of stationary ob-servers.

Example 3.1.3.2: For a de Sitter spacetime with the met-ric (1.37), it is reasonable to guess that the vector k = f(t)∂t isa conformal Killing vector if the scalar function f(t) is chosencorrectly. Let us determine f(t), the conformal factor λ, andthe redshift z.It suffices to compute g(∇a∂t,b) for a,b either ∂t or ∂x. Wefind that ∇∂x

∂t = H∂x is the only nonzero derivative (simi-larly for ∂y and ∂z). Hence

g(∇∂tf∂t, ∂t) = fg(∂t, ∂t),

and f = λ. Further,

g(∇∂xf∂t, ∂x) = fH g(∂x, ∂x) ≡ λg(∂x, ∂x),

thus λ = fH = f and f(t) = eHt, λ = HeHt, z =√

g(k,k) =eHt. The frequency of photons decays with time t as e−Ht.

Practice problem: For a flat Friedmann-Robertson-Walkermetric,

g = dt2 − a2(t)(dx2 + dy2 + dz2),

where a(t) is a nonzero function, show that k = a(t)∂t is aconformal Killing vector with the factor λ = a(t).

Unlike the case of null geodesics, the behavior of timelikegeodesics in the presence of a conformal Killing vector is notas straightforward to describe without additional assump-tions.Suppose that a metric g has a conformal Killing vector k,and consider particles moving along a timelike geodesic fieldv normalized as g(v,v) = 1. The relative velocity δ~v of the

particles with respect to the conformally stationary observersis characterized by the scalar product g(v,u) according to thestandard special relativistic formula for the “gamma factor,”

γ ≡ 1√

1 − (δ~v)2

= g(v,u).

Wewould like to derive an equation for γ(τ) along a particle’swordline parameterized by the proper time τ . This is possiblein the following special case.

Calculation 3.1.3.3: Consider stationary observers whoseworldlines are parallel to the conformal Killing vector k =eHt∂t for de Sitter spacetime with the metric (1.37). A pointparticle of mass m is moving along a timelike geodesic; themotion is slow relative to the conformally stationary ob-servers. Show that the relative 3-velocity δ~v of the particledecreases exponentially at late times. In the Newtonian ap-proximation, the slowdown can be ascribed to a “gravitational

friction force” ~F which is proportional to the velocity accord-ing to the law

~F = −mHδ~v.(Details on page 171.)

In a more general situation, a closed equation for γ(τ) canbe derived if we assume additionally that k is an integrablevector field.

Statement 3.1.3.4: Suppose k is an integrable, timelike con-formal Killing vector field, Lkg = 2λg; the vector field u

is defined by u = z−1k, where z is the redshift function,

z ≡√

g(k,k); and v is a timelike geodesic normalized byg(v,v) = 1. Then the scalar product γ ≡ g(u,v) satisfies theequation

∇vγ =(1 − γ2

)λz−1. (3.5)

(Proof on page 171.)

For a particle moving with a small relative velocity δ~v withrespect to conformally stationary observers, Eq. (3.5) can be

interpreted in theNewtonian limit. We have γ ≈ 1+ 12 (δ~v)2, so

Eq. (3.5) can be rewritten as the equation for the “local kineticenergy,”

d

dτEkin = δ~v · ~F , ~F ≡ −mλz−1δ~v, Ekin ≡

1

2(δ~v)

2.

According to the Newtonian description from the viewpointof a conformally stationary observer, the kinetic energy of theparticle is decreaseddue to the “force of gravitational friction”~F that acts opposite and proportional to the velocity δ~v of theparticle.

3.1.4 Gravitational potential

In Sec. 3.1.1 we have seen that the Schwarzschild space-time possesses a “gravitational potential” Φ such that the 3-acceleration of freely falling bodies in stationary reference

frames is described by the Newtonian formula, ~a = −~∇Φ.This fact is a manifestation of a more general property: Everystationary spacetime admits a “gravitational potential” in thissense.

Statement:1 If k is a Killing vector, show that ∇kk is inte-grable. If k is timelike (a stationary spacetime), show that the

1After problem 4a in chapter 6 of [36].

74

Page 83: Winitzki. Advanced General Relativity

3.1 Stationary spacetimes

acceleration field ∇uu is integrable for a stationary observer

with the 4-velocity u = z−1k, where z ≡√

g(k,k) is the red-shift factor. Derive the “gravitational potential” for the accel-eration field∇uu.

Derivation: For an arbitrary vector x,

g(∇kk,x) = −g(k,∇xk) = −x 1

2g(k,k) = −1

2∇xz

2,

thus ∇kk = g−1df with f ≡ − 12z

2. Since u = z−1k and∇kz = 0, we find

∇uu =1

z∇k

(k

z

)

=1

z2∇kk +

k

z∇k

1

z

=1

z2∇kk = −1

2g−1 1

z2d(z2)

= −g−1d ln z.

Thus the “gravitational potential” Φ is

Φ = ln z =1

2ln g(k,k), ∇uu = −g−1dΦ.

(Note the minus sign in the above formula: The 4-accelerationof a freely falling body in a stationary frame is −∇uu, and the

3-acceleration, ~a = −~∇Φ, has the opposite sign relative to the4-acceleration. ***Alarm: what about the minus sign we getwhile computing g−1dΦ from the minus sign in the spatialcomponents of g?) This general expression agreeswith the re-sult (3.3) derived above for the Schwarzschild spacetime withthe Killing vector k ≡ ∂t,

Φ =1

2ln g(∂t, ∂t) =

1

2ln

(

1 − 2m

r

)

.

According to our (so far) heuristic picture of an asymp-totically flat spacetime as something generated by a “well-isolated” clump of gravitating matter, we expect that at largedistances the acceleration of stationary observers goes to zero,thus the gravitational potential Φ becomes constant. SinceΦ = ln z, this is equivalent to assuming that the redshift zbecomes constant at infinite distances. Then it is convenientto multiply the Killing vector k by a constant to achieve z → 1and Φ → 0 at infinity. After this rescaling, the redshift func-tion z describes the redshift of lightrays with respect to ob-servers at infinite distances.

Remark: A non-asymptotically flat spacetime may involveunbounded redshifts at infinity. An example is the de Sitterspacetime with the metric (1.37), which possesses a conformalKilling vector e2Ht∂t. The redshift function is z = e2Ht andgrows unboundedly along any lightray.

Note that the potential Φ is stationary:

k Φ = −g(k,∇uu) = 0.

On physical grounds, we expect that the 3-acceleration of afreely falling mass points “towards the center of gravity,” atleast when the test mass is outside of the clump of gravitat-ing matter. Thus, the gravitational potential decreases “awayfrom infinity,” and outside of the matter sources we have

Φ < 0, z < 1.

3.1.5 Energy

In a stationary spacetime, the Killing vector k generates timetranslations along the stationary worldlines. The local geom-etry of the spacetime is invariant under these time transla-tions: for instance, the scalar product of connecting vectorsremains constant. The existence of such time translations, inturn, leads to a conserved quantity, interpreted as the total en-ergy of the spacetime.We know that the distortion B(k) of a Killing vector k is a2-form,

B(k)(x,y) ≡ g(∇xk,y) = −g(∇yk,x).

In the index notation, we have

B(k)αβ = ∇αkβ .

The divergence of this 2-form is a 1-form e,

eβ ≡ ∇αB(k)αβ = ∇α∇αkβ ≡ kβ ; e = k.

Written in the index-free notation, the expression for e is sig-nificantly more cumbersome,

e z ≡ Tr(x,y)

[∇xB(k)

] (y, z)

= Tr(x,y) [∇xg(∇yk, z) − g(∇∇xy, z) − g(∇yk,∇xz)] .

So we shall use the index notation in the following calcula-tions. The 1-form e is divergence-free,2

∇βeβ = ∇β∇αB(k)αβ = 0,

therefore it defines a conserved vector current e ≡ g−1e. Theflux of e through a (spacelike) 3-volume V with a (timelike)normal vector v is constant, up to boundary terms. Assumingthat the 3-surface V “extends to infinity” and that the currente vanishes sufficiently quickly at infinity so that all the bound-ary terms vanish, we find that the quantity

E ≡∫

V

(e v)d3V ≡∫

V

eβvβd3V =

V

(kβ) vβd3V

is independent of the choice of the 3-volume V . The quantity

E is the “conserved charge” that corresponds to the conserved

current e. Alternatively, E can be written as an integral over aclosed spacelike 2-surface Σwhich is the boundary of V ,

E =

Σ

(∇αkβ) vαnβd2Σ, (3.6)

where the vectors v,n are normal to the 2-surface Σ. The 2-surface Σ lies outside of any matter sources and encircles theentire matter distribution.We shall now express E through the matter distribution us-ing the Einstein equation. It is again more convenient to workin the index notation. We have seen in Statement 1.8.4.1 that

∇µ∇νkα = Rναµβkβ .

Therefore

eβ = kβ = gαγRαβγδkδ = Rβδk

δ,

e x = Ric(x,k),

E =

V

Ric(v,k)d3V =

V

Rµνvµkνd3V.

2The fact that e is divergence-free can be also understood as a consequenceof the identity dd ∗ B(k) = 0, where ∗ is the Hodge star operation. Thusthe integral of the 3-form d ∗ B(k) over a 3-volume can be expressedthrough an integral of ∗B(k) over a closed 2-surface.

75

Page 84: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

We shall now show that E is proportional to the total energyE of the gravitating system in the Newtonian limit, wherethe gravitating matter is an isolated, stationary distributionof dust with density ρ. The energy of such a system is

E =

V

ρd3V,

and the energy-momentum tensor is

Tµν = ρvµvν ,

where v ≈ k are the stationary worldlines of matter particles.By the Einstein equation (1.68), we have

Rµν = −8π(Tµν −1

2Tgµν),

therefore

Rµνvµkν = −4πρvµvνv

µkν ≈ −4πρ

and

E =

V

Rµνvµkνd3V ≈ −4π

V

ρd3V = −4πE.

Thus the total energy of the system is, in the Newtonian ap-proximation,

E ≈ − 1

4πE.

We are therefore motivated to define the total energy con-tained in a stationary spacetime by

E = − 1

4πE = − 1

V

(kβ) vβd3V.

Calculation: Compute the total energy of the Schwarzschildspacetime with the metric (1.36). The Killing vector is k ≡ ∂t.Solution: It is convenient to use Eq. (3.6) and integrate overa 2-sphere Σ of a large radiusR. Choosing the normal vectorsv = z−1k and n = z∂r, we have

E = − 1

Σ

g(∇vk,n)d2Σ = − 1

Σ

g(∇kk, ∂r)d2Σ.

Since (see Sec. 3.1.4)

z = z(r) =

1 − 2m

r,

g(∇kk, ∂r) = −1

2∂rz

2 = −mr2,

we find

E =1

Σ

m

r2d2Σ = m.

Thus the total energy of the Schwarzschild spacetime is equalto the massm of the source of gravity.

Remark: The total momentum and the total angular mo-mentum of an isolated system are so far undefined. (Thereis no preferred reference frame with respect to which onewould measure the total momentum or angular momentum.)A satisfactory definition of these quantities requires more in-formation than the existence of a timelike Killing vector. Forinstance, symmetry with respect to rotations around an axisleads to conservation of the corresponding component of the

angular momentum. In a curved spacetime, there are no pre-ferred axes, so instead the rotational symmetry is formulatedas the existence of a spacelike Killing vector with closed orbits(these orbits play the role of “circles of constant r, θ around thez axis”). In a stationary spacetime there exists also a timelikeKilling vector k. If these two Killing vectors commute, thespacetime is called azimuthally symmetric. In such a space-time, the total angular momentum is well-defined and con-served. In the Newtonian limit, this quantity will correspondto the total angular momentum with respect to the z axis.

3.2 Conformal infinity

In the previous section, we considered a stationary spacetimewhere the total energy of an isolated system can be definedthrough an integral over an infinitely remote 2-surface [seeEq. (3.6)]. It appears that such a definition only requires that aspacetime be “asymptotically stationary,” that is, almost sta-tionary sufficiently far from the matter sources. In fact, itwould be convenient if we could evaluate the 2-surface inte-gral “at infinity” where the spacetime is “exactly” stationary(in a heuristic sense). The notion of conformal infinity allowsone to make these ideas precise. We shall approach this notionby considering the elementary example of a flat Minkowskispacetime.

3.2.1 Conformal infinity for Minkowskispacetime

We shall be able to evaluate quantities “at infinity” if we ex-tend the spacetime manifold M by adding “points at infin-ity” to it. A convenient way to produce such an “extended”spacetime manifold is by using a conformal transformationof the metric, g → g = e2λg, such that the conformal factore2λ vanishes at infinity. Then the infinitely remote points canbe located at finite distances according to the newmetric g. Ofcourse, the newmetric g is unphysical since it does not describeactual distances or time intervals in the spacetime. However,the metric g is related to the physical metric g by a conformaltransformation, which preserves important causal propertiesof the spacetime. Here we describe such a conformal transfor-mation for the Minkowski metric; later, we will consider moregeneral spacetimes.Consider the lightcone coordinates u, v, θ, φ in theMinkowski spacetime, which are defined through the stan-dard spherical coordinates t, r, θ, φ via

u ≡ t+ r, v ≡ t− r.

The radial lightrays are then lines of constant u or v. Themetric has the form

g = du dv − (u− v)2

4dS2, dS2 ≡ dθ2 + sin2 θdφ2.

A conformal transformation of the metric,

g → g = e2λ(u,v)g, (3.7)

where λ(u, v) is an arbitrary function of u and v (but notof θ, φ), leaves the radial lightrays invariant (see State-ment 3.2.1.1). The transformed (unphysical) metric can bewritten as

g = du dv − e2λ(u− v)

2

4dS2 ≡ du dv − r2dS2,

76

Page 85: Winitzki. Advanced General Relativity

3.2 Conformal infinity

where u, v are new lightcone coordinates,

u = f1(u), v = f1(v).

For simplicity, let us choose f1(s) = f2(s) = f(s). (Choosingmore complicated functions for u and v is certainly admissiblebut will make our work harder.) Then we have

e2λ = f ′(u)f ′(v); r2 = f ′(u)f ′(v)(u − v)2

4.

It is obvious that the lines of constant u are at the same timelines of constant u. Thus, the radial lines of constant u arenull generators of a 3-surface u = const, which are alwaysgeodesics (see Sec. 2.2.8). In this case, the function λ(u, v) has aparticularly simple form; the following statement shows thatradial lightrays remain geodesics in the metric g = e2λ(u,v)geven with an arbitrary function λ(u, v).

Statement 3.2.1.1: (a) In a two-dimensional spacetime, anynull curve is a geodesic curve. (b) The radial null geodesics(lightrays) are invariant under a conformal transformation ofthe form (3.7), where g is a flat metric. (Proof on page 171.)

The next important step is the following: the function f canbe chosen such that the new lightcone coordinates u, v have afinite range. There aremany possible choices of f ; for example,

f(s) =2

πarctan s; f(s) = tanh s; f(s) = erf s;

etc. With any of the above choices, f(s) tends to ±1 ass → ±∞. So at large values of u, v the unphysical metricg is flat on the sections of constant angles φ, θ and describesthe points u, v = ±1 as being at finite (unphysical) distances,although physically these points correspond to infinitely re-mote events u, v = ±∞. For convenience, one can introducethe coordinates

t, r, θ, φ

by defining

u ≡ t+ r, v ≡ t− r.

Thet, r(time-radial) section of the spacetime (with the an-

gular coordinates omitted) is sketched in Fig. 3.1.

As we described above, the unphysical manifold M is con-structed from the original spacetimemanifoldM by changingthe metric from g to g and adding the “infinitely far away” do-

mains u = ±1, v = ±1. Thus M is a manifold with boundary.The boundary of M consists of null lines (future null infinityand past null infinity), the spacelike infinity domain i0, andtimelike future infinity/past infinity points i±. Note that theconformal factor e2λ vanishes on the boundary; this is whatbrings infinity to finite distances in the unphysical manifold.To see that the choice of the function f(s) is not entirely ar-bitrary, consider the spacelike infinity domain i0 correspond-ing to r → ∞ (u → ∞, v → −∞). This domain is formally a2-sphere of radius

r2(i0) = limr→∞

eλr2 = limv→−∞

u→∞

f ′(u)f ′(v)(u − v)2

4.

Since f(s) → ±1 as s → ∞, the derivative f ′(s) must go tozero at large s. Suppose that f ′(s) tends to zero as ∼ s−n fors → ∞; then the limit r2(i0) depends on the value of n. Ifn ≥ 2 then the limit is zero, meaning that the domain of a“spacelike infinity” is a sphere of zero radius, i.e. a point. Inthe opposite case, n < 2, the limit is infinite, which means thatthe metric g diverges (is not regular) at the point i0. Thus weare motivated to consider only the regular case n ≥ 2.

1

10−1

−1

t

x

Figure 3.1: The time-radial section of the unphysical manifold

M for the Minkowski spacetime. The coordinatest, xhave finite ranges. The boundary of the dia-

gram represents points “at infinity.”

We may restrict the choice of the function f(s) further. Letus consider the limit of the metric g along a fixed lightray, sayu = u0, v → ∞. The coefficient r2 of the metric is then

r2 = limv→∞

f ′(u0)f′(v)

(u0 − v)2

4.

This limit is equal to zerowhen n > 2 but is finite and nonzerowhen n = 2. For f ′(s) ∼ f0s

−2, we find

r2(I +) =1

4f0f

′(u0) > 0.

We expect that the structure of the null infinity should reflectthe fact that null rays can be emitted in all directions spanninga 2-sphere. Thus, the null infinity at a fixed value of u = f(u0)should have a metric structure of a 2-sphere with a nonzeroradius. This is the case only if f ′(s) ∝ s−2 for s → ±∞.Hence, we shall only consider these choices of f(s). Then thefuture null infinity, denoted I + (“scri-plus”), has the topo-logical structure R × S2 (a 2-sphere for each value u = α),while a spacelike infinity is represented by a single point i0.The asymptotic form of the unphysical metric near a point(u = u0, v = ∞) on I + is

g ≈ du dv − f0f′(u0)

4dS2,

while the conformal factor Ω near I + is

Ω ≈ 1

f0(1 − u) (1 − v) .

Therefore, Ω ∼ r−1 at large distances. From the above expres-sion for Ω, it is also easy to see that the null infinity I + is a

null surface: The normal vector n ≡ ˆg−1dΩ is null onI + (butnot away from it!) since

g(n,n) = g−1(dΩ,dΩ) ∝ (1 − u) (1 − v) ∝ Ω = 0 on I+.

Note that there still remains a considerable freedom inchoosing the conformal transformations. However, within the

77

Page 86: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

present scheme every lightray will always be represented in adiagram by a straight line drawn at 45 angles, and the nullfuture boundaryI + will be a null surface.

Calculation: Show that the Riemann tensor has only one in-dependent component in a 2-dimensional spacetime. Thenuse the following expression for the curvature scalar in theunphysical metric,

R = Ω−2R+ 6Ω−3Ω, (3.8)

to show that any 2-dimensional metric can be transformedinto a flat metric by a choice of a conformal factor Ω2. (Theabove equation is derived e.g. in [36] and [9]. See also State-ment 1.8.4.1 on page 47.)

Solution: The Riemann tensor R(a,b, c,d) is a symmetricbilinear function of a ∧ b and c ∧ d; for a 2-dimensional vec-tor space with a basis e1, e2, any exterior product x ∧ y isproportional to the basis product e1 ∧ e2, so the space of ex-terior products x ∧ y is one-dimensional. Thus R(a,b, c,d) iseffectively a symmetric bilinear function in one-dimensionalspace. Such a function is fully specified by one coefficient,say R(e1, e2, e1, e2), or equivalently by the curvature scalarR = 2R(e1, e2, e1, e2). It is clear from the given relation thatthe curvature scalar can be made to vanish by a choice of thefunction Ω. Then the unphysical spacetime will have an iden-tically vanishing Riemann tensor, i.e. it will be flat.

3.2.2 Conformal diagrams

In the previous section we explored the construction of a con-formal infinity for Minkowski spacetime. The result was an

unphysical 4-dimensional manifold M containing all of Mand additionally some “infinitely remote points.” The met-

ric on M was “unphysical” since it did not represent actualdistances. However, this unphysical manifold allows one toevaluate quantities “at infinity” and hence is useful for theanalysis of asymptotic behavior of fields in the real spacetime.Another important application of the unphysical manifold

M is as a tool to help visualize the causal structure of thephysical spacetimeM. Such a visualization is possible if oneconsiders a 1+1-dimensional slice of the physical spacetime.This slice is a 1+1-dimensional spacetime M(1+1) having acertain “reduced”metric. This metric can be transformed intoa flat Minkowski metric by a conformal transformation. Then,a coordinate transformation can be performed such that themanifoldM(1+1) is mapped into into a 1+1-dimensional un-

physical manifold M having a finite extent in all directions.

The resulting manifold M can be simply drawn as a figureon a plane. (Implicitly, the plane carries the unphysical flatMinkowski metric; in particular, straight lines drawn at 45

angles are null geodesics.) An example of such a drawing isFig. 3.1. Figures of this kind are called conformal diagrams(also Carter-Penrose diagrams). We now explore conformaldiagrams in more detail.3

A conformal diagram provides a visualization of the causalstructure of a spacetime because the null geodesics remainstraight lines after the conformal transformation, and becausethe entire spacetime is drawn as a finite figure in a plane. Forinstance, it is easy to see the entire domain that can receivesignals from a given spacetime point.

3This and the following sections are adapted from the paper [38].

Remark: There is no analog of conformal diagrams for gen-eral 3+1-dimensional spacetimes. The reason is that a generalspacetime is not conformally flat and cannot be mapped intoa region of Minkowski spacetime by a conformal transforma-tion. For instance, even if we demand that the spatial sectionsof the spacetime be flat and assume a metric of the form

g = dt⊗ dt− a2(t, x1, x2, x3)

3∑

i=1

dxi ⊗ dxi,

the spacetime will be conformally flat only if a(t, x1, x2, x3)has the special form4

a(t, x1, x2, x3) =1

η(t) + ξ(t)∑3

i=1 (xi − βi(t))2 ,

where βi, η, ξ are arbitrary functions of time. Nevertheless, inmany cases a 3+1-dimensional spacetime can be adequatelyrepresented by a suitable 1+1-dimensional slice, at least forthe purpose of qualitative illustration. For instance, a spheri-cally symmetric spacetime is visualized using the time-radialhalf-plane t, r, r ≥ 0, where each point stands for a 2-sphereof radius r. A conformal diagram is then drawn for the re-duced 1+1-dimensional spacetime.The reduced conformal diagram is meaningful only ifthe null geodesics in the 1+1-dimensional section are alsogeodesics in the physical 3+1-dimensional spacetime. In thiscase, the 1+1-dimensional section can be visualized as the setof events accessible to an observer who sends and receivessignals only along a fixed spatial direction. Then a conformaldiagram provides information about the causal structure ofspacetime along this line of sight.

After recapitulating the standard construction of conformaldiagrams, I develop an easier method.

Standard procedure

A conformal diagram is defined for a 1+1-dimensional space-time with a given line element gabdx

adxb. The coordinatesxa (where a = 0, 1) must cover the entire spacetime, andseveral sets of overlapping coordinate patches may be usedif necessary. The standard construction of the conformal di-agram may be formulated as follows (see e.g. [?], chapter 3).One first finds a change of coordinates x → x such that thenew variables xa have a finite range of variation; the com-ponents of the metric change according to gab(x)dx

adxb =gab(x)dx

adxb. One then chooses a conformal transformationof the metric (in the new coordinates),

gab → γab (x) = Ω2 (x) gab, Ω (x) 6= 0, (3.9)

such that the new metric γab is flat, i.e. has zero curva-ture. A suitable function Ω(x) always exists because all two-dimensional metrics are conformally flat. The new metric γabdescribes an unphysical, auxiliary flat spacetime. Since themetric γab is flat, a further change of coordinates x → ˜x (thenew variables ˜x again having a finite extent) can be found totransform γab explicitly into the Minkowski metric ηab,

γab (x) dxadxb = ηabd˜xad˜xb, ηab ≡ diag (1,−1) . (3.10)

For brevitywe incorporate all the required coordinate changesinto one, x → x(x), and summarize the transformation of themetric as

Ω2 (x) gabdxadxb = ηabdx

adxb. (3.11)

4The author is grateful to Matthew Parry for figuring this out.

78

Page 87: Winitzki. Advanced General Relativity

3.2 Conformal infinity

Thus the new coordinates xa map the initial spacetime ontoa finite domain within a 1+1-dimensional Minkowski plane.This finite domain is a conformal diagram of the initial (phys-ical) 1+1-dimensional spacetime. The diagram is drawn on asheet of paper which implicitly carries the fiducial Minkowskimetric ηab, the vertical axis usually representing the timelikecoordinate x0.The coordinate and conformal transformations severely dis-tort the geometry of the spacetime since they bring infinitespacetime points to finite distances in the diagram plane. Soone cannot expect in general that straight lines in the diagramcorrespond to geodesics in the physical spacetime. However,it is well known that straight lines drawn at 45 angles in aconformal diagram represent null geodesics in the physicalspacetime. This follows from the fact that any null trajectoryxa(τ) in 1+1 dimensions, i.e. any solution of

gabxaxb = 0, xa ≡ dxa

dτ, (3.12)

is necessarily a geodesic (this is not true in higher dimen-sions), and Eq. (3.12) is invariant under conformal transforma-tions of the metric. By drawing lightrays emitted from variouspoints in the diagram, one can illustrate the causal structureof the spacetime.A textbook example is the conformal diagram for the flatMinkowski spacetime with the metric gab ≡ ηab. Calculationsare conveniently done in the lightcone coordinates

u ≡ x0 − x1, v ≡ x0 + x1, ηabdxadxb = du dv, (3.13)

and a suitable coordinate transformation is

u = tanhu, v = tanh v, du dv =du dv

(1 − u2) (1 − v2). (3.14)

The new coordinates u, v extend from−1 to 1. Multiplying themetric by the conformal factor Ω2(u, v) ≡

(1 − u2

) (1 − v2

),

we obtain the fiducial spacetime,

Ω2du dv = du dv = ηabdxadxb, (3.15)

u ≡ x0 − x1, v ≡ x0 + x1. (3.16)

The new coordinates xa have a finite extent, namely∣∣x0 ± x1

∣∣ < 1, and the resulting diagram has a diamond shape

shown in Fig. 3.2. To appreciate the distortion of the space-time geometry, we can draw the worldline of an inertial ob-server moving with a constant velocity. Note that the angleat which this trajectory enters the endpoints depends on thechosen conformal transformation and thus cannot serve as anindication of the observer’s velocity.There is no analog of conformal diagrams for general 3+1-dimensional spacetimes. Nevertheless, in many cases a 3+1-dimensional spacetime can be adequately represented by asuitable 1+1-dimensional slice, at least for the purpose ofqualitative illustration. For instance, a spherically symmetricspacetime is visualized as the (t, r) half-plane (r ≥ 0) whereeach point stands for a 2-sphere of radius r. A conformal di-agram is then drawn for the reduced 1+1-dimensional space-time.The reduced conformal diagram is meaningful only ifthe null geodesics in the 1+1-dimensional section are alsogeodesics in the physical 3+1-dimensional spacetime. In thiscase, the 1+1-dimensional section can be visualized as the setof events accessible to an observer who sends and receivessignals only along a fixed spatial direction. Then a conformaldiagram provides information about the causal structure ofspacetime along this line of sight.

x0

x1

1

1

A

B0−1

−1

Figure 3.2: A conformal diagram of the 1+1-dimensionalMinkowski spacetime. Dashed lines show ligh-trays emitted from points A,B. The curved line isthe trajectory of an inertial observer moving witha constant velocity, x1 = 0.3x0.

Method of lightrays

The standard construction of conformal diagrams involves anexplicit transformation of the metric to new coordinates thathave a finite extent, and usually a further transformation tobring the metric to a manifestly conformally flat form. Find-ing these transformations requires a certain ingenuity. If thespacetime manifold is covered by several coordinate patches,a different transformation must be used in each patch. How-ever, a conformal diagram typically consists of just a few linesand one would expect that the required computations shouldnot be so cumbersome.

I now describe a method of drawing conformal diagramsthat avoids the need for performing explicit transformationsof the metric. The method is based on a qualitative analysis ofintersections of lightrays. This approach is particularly suit-able for the analysis of stochastic spacetimes encountered inmodels of eternal inflation. Such spacetimes have no symme-tries and their metric is not known in closed form, so one can-not apply the standard construction of conformal diagrams.

Another motivation for the new method is the apparent re-dundancy involved in the standard method. It is clear thatthe transformations used in the standard construction are notunique. For instance, one may replace the lightcone coordi-nates in Eq. (3.14) by

u→ f (u) , v → g (v) , (3.17)

where f, g are arbitrary monotonic, bounded, and continuousfunctions. The shape of the diagram will vary with each pos-sible choice of the transformations, but all resulting diagramsare equivalent in the sense that they contain the same infor-mation about the causal structure of the spacetime. One thusexpects to be able to extract this information without involv-ing specific explicit transformations of the coordinates and themetric.

The crucial observation is that this information is unam-biguously represented by the geometry and topology of ligh-trays and their intersections. I shall now develop this idea intoa self-contained approach to drawing conformal diagramsthat does not involve explicit transformations.

79

Page 88: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

New definition of conformal diagrams

A conformal diagram is a figure in the fiducial Minkowskiplane satisfying certain conditions, and I first formulate a def-inition of a conformal diagram in terms of such conditions. Aconstructive procedure for drawing conformal diagrams willbe presented subsequently.A finite open domain of the plane is a conformal diagramof a given 1+1-dimensional spacetime S if there exists a one-to-one correspondence between all maximally extended ligh-trays in S and all straight line segments drawn at 45 an-gles within the domain of the diagram. This correspondencemust be intersection-preserving, i.e. any two lightrays inter-sect in the physical spacetime exactly as many times as thecorresponding lines intersect in the diagram. It is assumedthat all null geodesics in the spacetime S are either infinitelyextendible or end at singularities or at explicitly introducedspacetime boundaries. Similarly, the straight line segmentsdrawn at 45 angles in a conformal diagram must be limitedonly by the boundary of the diagram. Note that there are onlytwo spatial directions in a 1+1-dimensional spacetime S, andthat two lightrays emitted in the same direction cannot inter-sect.For example, the diamond

∣∣x0 ± x1

∣∣ < 1 is a conformal dia-

gram for the flat spacetime due to the intersection-preservingone-to-one correspondence of null lines x0 ± x1 = const in thediagram and lightrays x0 ± x1 = const in the physical space-time.For spacetimes having a nontrivial topology, appropriatetopological features need to be introduced also into the fidu-cial Minkowski plane. At this point I do not consider suchcases.I shall now demonstrate the equivalence of the proposeddefinition to the standard procedure for drawing conformaldiagrams. It suffices to find a conformal transformation of theform (3.11) in some neighborhood of an arbitrary (nonsingu-lar) spacetime point. Given a diagram with an intersection-preserving correspondence of lightrays, we can introduce lo-cal lightcone coordinates u, v in the diagram such that thenull geodesics are locally the lines u = const or v = const.By assumption, each null geodesic uniquely corresponds to alightray in the physical spacetime. Since the correspondenceis intersection-preserving, the local configuration of the nullgeodesics in the physical spacetime can be visualized as inFig. 3.3. Hence the local lightcone coordinates u, v becomewell-defined local coordinates in the physical spacetime, andagain the null geodesics are the lines u = const or v = const.On the other hand, these null geodesics must be solutions ofEq. (3.12), therefore

guuu2 + 2guvuv + gvvv

2 = 0 if u = 0 or v = 0. (3.18)

It follows that guu = gvv = 0. Thus themetric in the local light-cone coordinates is of the form gabdx

adxb = 2guv(u, v)du dvwhich is explicitly conformally flat. This demonstrates the ex-istence of a local conformal transformation bringing the phys-ical metric gab into the fiducial Minkowski metric du dv =ηabdx

adxb in the diagram.Before presenting examples, I comment on the proposeddefinition of conformal diagrams. The definition may appearto be too broad, allowing many geometric shapes to representthe same spacetime. However, the old procedure also doesnot specify a particular conformal transformation of the met-ric and in effect admits precisely as much freedom. According

x0

x1

x0

x1

correspondence

Figure 3.3: The local correspondence between straight linesdrawn at 45 angles in the conformal diagram (left)and lightrays in the physical spacetime (right).

to the new definition, any two different conformal diagramsof the same spacetime are equivalent in the sense that all thelightrays in those two diagrams will be in an intersection-preserving one-to-one correspondence. In the old language,there must exist a conformal transformation bringing one dia-gram into the other. Equivalent diagrams carry identical in-formation about the causal structure of the physical space-time. In the next sections I shall give examples illustratingthe arbitrary and the necessary choices involved in drawingconformal diagrams.A conformal diagram delivers information mainly throughthe shape of its boundary line. The boundary of a confor-mal diagram generally contains points representing a space-like, timelike, or null infinity, and points belonging to ex-plicit boundaries of the spacetime manifold (e.g. singulari-ties) where lightrays end in the physical spacetime. The latterboundaries will be called physical boundaries to distinguishthem from putative infinite boundaries whose points do notcorrespond to any events in the physical spacetime. (The def-inition of conformal diagrams contains the requirement thatthe diagram domain be topologically open, and so the bound-ary points are not supposed to belong to the diagram.) Theinfinite boundary is of course the most interesting feature of adiagram.Lastly, I would like to emphasize that conformal diagramscan be drawn not only for geodesically complete spacetimesbut also for “artificially incomplete” spacetimes, i.e. for se-lected subdomains of larger manifolds. In fact such “artifi-cially incomplete” spacetimes are often needed in cosmolog-ical applications. Examples are a description of a collapsingstar using a subdomain of the Kruskal spacetime and a de-scription of an inflationary universe using a subdomain of thede Sitter spacetime.

Minkowski spacetime

The new definition merely lists the conditions to be satisfiedby a conformal diagram. Based on these conditions, I now de-velop a practical procedure for drawing the diagrams, usingthe Minkowski spacetime as the first example.We begin by choosing a Cauchy surface in the physicalspacetime. In 1+1 dimensions, a Cauchy surface is a line Lsuch that intersections of lightrays emitted from L entirelycover the part of the spacetime to the future of L. In theMinkowski spacetime, we may choose the line x0 = 0 as theCauchy surface L. The image of the Cauchy surface L in the

conformal diagram must be a finite curve L from which ligh-

80

Page 89: Winitzki. Advanced General Relativity

3.2 Conformal infinity

A

B

C

D

E

Cauchy line L

???

Figure 3.4: Construction of the conformal diagram for theMinkowski spacetime. The point E cannot belongto the diagram because there are no lightrays inter-secting at E.

trays can be emitted in both spatial directions. Therefore the

slope of the curve L must not exceed ±45 but otherwise Lmay be drawn arbitrarily, e.g. as the curve AB in Fig. 3.4. Theendpoints A,B represent a spatial infinity in the two direc-tions.In the Minkowski spacetime, two lightrays emitted towardseach other from any two points on Lwill eventually intersect.Therefore the domain of the conformal diagram must containat least the triangular region ABC. On the other hand, anypoint outside ABC, such as the point E in Fig. 3.4, cannot be-long to the diagram domain because the point E cannot bereached by any left-directed lightray emitted from L, and weknow that all points in the Minkowski spacetime are intersec-tion points of some lightrays. (More formally, the existenceof the point E within the diagram would violate the condi-tion that the correspondence between lightrays is intersection-preserving.) Therefore the future-directed part of the confor-mal diagram is bounded by the lines AC and BC.A completely analogous consideration involving past-directed lightrays leads to the conclusion that the past-directed part of the diagram is the region ABD. Thus a pos-sible diagram for the Minkowski spacetime is the interior ofthe rectangle ACBD. This diagram differs from the square-shaped diagram in Fig. 3.2 by a (finite) conformal transforma-tion of the form (3.17).We can also ascertain that the points C andD are the futureand the past timelike infinity points. For instance, the point Cis the intersection of the lines AC and BC; these lines are in-terpreted as putative lightrays emitted from infinitely remotepoints of L. At sufficiently late times, any inertial observerin the Minkowski spacetime will catch lightrays emitted fromarbitrarily far points. The same holds for observers movingnon-inertially as long as their velocity does not approach thatof light. Hence all trajectories of such observers must finish atC.

Cauchy surfaces and artificial boundaries

Drawing Cauchy surfaces is a convenient starting point in theconstruction of conformal diagrams.A Cauchy surface for a domain of spacetime is a 3-surface

boundary

Cauchy line L

Figure 3.5: A Cauchy line must have a slope between −45

and 45 (left). A timelike boundary must be acurve with a slope between 45 and 135 (right).

S such that every timelike curve (without endpoints) withinthe region intersects S exactly once. (It follows that a Cauchysurface must be spacelike.)It is clear that in any 1+1-dimensional spacetime a suffi-ciently small neighborhood of a Cauchy surface has the samecausal properties as the line x0 = 0 in the Minkowski plane:namely, two nearby lightrays emitted toward each other willcross, while rays emitted from a point in opposite directions

will diverge. Therefore the line L representing a Cauchysurface in a conformal diagram must have a slope between−45 and 45 (see Fig. 3.5, left). We shall call such lineshorizontally-directed. Other than this, there are no restric-tions on the shape of the line L and it may be drawn as an ar-bitrary horizontally-directed curve. (In some spacetimes, oneneeds to use several disconnected Cauchy surfaces, but I shallnot consider such cases here.)Another frequently occurring feature in conformal dia-grams is a timelike boundary. For example, a spherically sym-metric 3+1-dimensional spacetime is usually reduced to the1+1-dimensional (r, t) plane, where 0 < r < +∞. From the1+1-dimensional point of view, the line r = 0 is an artificiallyintroduced timelike boundary that can absorb and emit ligh-trays. The local geometry of lightrays near r = 0 is shownin Fig. 3.5 (right). It is clear from the figure that in a confor-mal diagram the timelike boundary must be represented by aline with a slope between 45 and 135. Let us call such linesvertically-directed.As an example of using timelike boundaries, let us considerthe subdomain (t0 < t <∞, x1 < x < x2) of a de Sitter space-time with flat spatial sections, described by the metric

gabdxadxb = dt2 − e2Htdx2. (3.19)

This subdomain can be visualized as the future of a selectedinitial comoving region. The Cauchy line t = t0 is con-nected to the two timelike boundary lines, x = x1,2. In thecoordinate system (3.19), the null geodesics are solutions ofdx/dt = ±e−Ht and it is easy to see that a lightray emittedat x = 0, t = 0 only reaches the values |x| < H−1 (the limitvalue is the de Sitter horizon). Null geodesics emitted fromthe Cauchy surface and from the boundary lines are sketchedin Fig. 3.6 where it is assumed that the comoving domainx1 < x < x2 contains several de Sitter horizons. A lightrayemitted in the positive direction, such as the ray A, intersectsleft-directed lightrays emitted from the pointB or from nearerpoints but does not intersect lightrays emitted further away,such as the ray C. We call the ray B the rightmost ray inter-secting A. It is clear that the intersection of the correspond-ing lines A and B in the conformal diagram must occur at the

81

Page 90: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

x

x1

A

AB

B

C CD

D

x2

t

t0

Figure 3.6: Construction of a conformal diagram for a part ofde Sitter spacetime delimited by thick lines (left).The upper boundary of the diagram (right) is ahorizontal line.

boundary of the diagram, otherwise there would exist furtherlines intersecting A to the right of B. Considering a ray D tothe right of A, we find that the rightmost ray for D is C. Theintersection of C and D is thus another point on the bound-ary of the conformal diagram. It follows that the boundaryline must have a slope between −45 and 45; for simplicity,we draw a straight horizontal line (Fig. 3.6, right). This linerepresents the (timelike and null) infinite future.

A timelike boundary can be interpreted as the trajectory ofan observer who absorbs or emits lightrays and thus partic-ipates in the exploration of the causal structure of the space-time. The lightrays emitted by the boundary, together withthose emitted from the Cauchy surface, form the totality of alllightrays that must be bijectively mapped into straight lines inthe conformal diagram. The role of timelike boundaries andCauchy surfaces is to provide a physically motivated bound-ary for the part of the spacetime we are interested in.

An artificial timelike boundary may also be introduced intothe spacetime with the purpose of simplifying the construc-tion of the diagram. Below we shall show that conformal di-agrams can be simply pasted together along a common time-like boundary.

3.2.3 How to draw conformal diagrams

We can now outline a general procedure for building a confor-mal diagram for a given 1+1-dimensional spacetime using themethod of lightrays. The procedure can be applied not only togeodesically complete spacetimes but also to spacetimes withexplicitly specified boundaries.

One starts by considering the future-directed part of thespacetime and by choosing a suitable Cauchy surface and,possibly, some timelike boundaries. These lines are repre-sented in the diagram by arbitrarily drawn horizontal andvertical curves of finite extent. The endpoints of these curvescorrespond either to the intersection points of Cauchy linesand timelike boundaries, or to imaginary points at spacelikeand timelike infinity.Note that Cauchy surfaces and timelike boundaries are thelines on which boundary conditions for e.g. a wave equationmust be specified to obtain a unique solution within a domain.One expects that any physically relevant spacetime should

(a) (b)XX rXrX X ′

X ′ rX′

Figure 3.7: The local direction of the infinite boundary (thickline) is found by determining the rightmost raysrX , rX′ for nearby rays X,X ′. The direction is at45 angle (a) when rX = rX′ and horizontal (b)when rX 6= rX′ .

contain a suitable set of Cauchy surfaces and timelike bound-aries, if the classical field theory is to have predictive power.Qualitative knowledge of the geometry and topology of theseboundaries is required for building a conformal diagram us-ing the method of lightrays.After drawing the curves for the physical boundaries, it re-mains to determine the shape of the infinite (timelike and null)boundaries of the diagram. To this end, one can first con-sider right-directed lightrays emitted from various points onthe Cauchy line and from the boundaries, including the limitpoints at infinity. For each right-directed lightray X there ex-ists a certain subset SX of (left-directed) lightrays that inter-sect X . Since the subset SX has a finite extent, there existsa rightmost ray rX ∈ SX . (In the de Sitter example above,the rightmost ray rA is the ray B and the rightmost ray rD isC.) By definition of the conformal diagram, the lightrays arestraight lines limited only by the boundary of the conformaldiagram, therefore the intersection point of X and rX mustbelong to the boundary. In this way we have established thelocation of one point of the unknown boundary line, namelythe endpoint of the rayX .To determine the local direction of the boundary line atthat point, we use the following argument. For each right-directed lightray X , we can consider a right-directed ray X ′

infinitesimally close and to the right of X (if no ray X ′ can befound to the right of X , it means that X itself belongs to theboundary of the diagram). Then there are two possibilities(see Fig. 3.7a,b): either the rightmost ray rX is also the right-most ray rX′ forX ′, or the ray rX′ is located to the right of rX .In the first case, the boundary line has a 45 slope and locallycoincides with rX , while in the second case the boundary lineis horizontally-directed. Thus we can draw a right-directedfragment of the infinite boundary that limits the ray X . Wethen continue by moving further to the right and consider theendpoint of the rayX ′, etc.The same procedure is then repeated for left-directed ligh-trays, until one finishes drawing all unknown lines in thefuture-directed part of the diagram. In this way the infiniteboundary of the conformal diagram is constructed as the lo-cus of “last intersections” of lightrays. Finally one applies thesame considerations to past-directed lightrays and so com-pletes the diagram.It follows that the infinite boundary of a conformal diagramcan always be drawn as a sequence of either straight line seg-ments directed at 45 angles, or horizontally- and vertically-directed curves. In our convention, Cauchy surfaces and arti-ficial timelike boundaries are drawn as curved lines and infi-nite boundaries as straight lines (when possible).Let us briefly consider the pasting of conformal diagramsin general. When two spacetime domains are separated by

82

Page 91: Winitzki. Advanced General Relativity

3.2 Conformal infinity

a timelike worldline, the corresponding conformal diagramscan be pasted together along the boundary line. To verify thisalmost obvious statement more formally, we begin by draw-ing the timelike boundary as a vertically-directed line in thediagram. The shape of the conformal diagram to the rightof the boundary is determined solely by the intersections oflightrays within the right half of the spacetime. Hence, theconformal diagram for the right half of the spacetime can bepasted to the right of the timelike boundary line. The sameholds for the left half of the spacetime. This justifies pastingof diagrams along a common timelike boundary.

Further examples

Collapsing star Themethod of lightrays does not require ex-plicit formulae for the spacetime metric if the qualitative be-havior of lightrays is known. As another example of using thelightray method, let us consider an asymptotically flat space-time with a star collapsing to a black hole (BH).To reduce the spacetime to 1+1 dimensions, we assumespherical symmetry and consider only the (r, t) plane, where0 < r < +∞ and the line r = 0, the center of the star, isan artificial boundary. As before, we start with the future-directed part of the diagram. We must first choose a Cauchysurface; a suitable Cauchy surface is the line t = t0 where t0is a time chosen before the collapse of the star. We representthe Cauchy surface by the curve AB in the diagram (Fig. 3.8).The point A corresponds to (r = 0, t = t0), while B is a spa-tial infinity (r = ∞, t = t0). The artificial boundary r = 0 isrepresented by the vertical line AC.To investigate the shape of the future part of the diagram,we need to analyze the intersections of lightrays emitted fromfaraway points of the Cauchy surface. We know from qualita-tive considerations of the black hole formation that lightrayscan escape from the star interior only until the appearance ofthe BH horizon. Shortly thereafter the star center becomes asingularity that cannot emit any lightrays. Hence, among allthe rays emitted from the star center at various times, thereexists a “last ray” not captured by the BH, while rays emit-ted later are captured. We arbitrarily choose a point E on theboundary line r = 0 to represent the emission of this “lastray.” Any lightray emitted from r = 0 before E will prop-agate away from the BH and so will intersect all left-directedlightrays emitted from arbitrarily remote points of the Cauchyline AB. Since all the intersection points are outside of the BHhorizon, it follows that the conformal diagram contains thepolygon AEFB which represents the spacetime outside theBH. The line FB is the infinite null boundary of the diagram,while the line EF is the BH horizon.It is clear that the point C is the last point from which ligh-trays can be emitted from the star center and thus C is thebeginning of the BH singularity. It remains to determine theshape of the diagram between the points C and F . We knowthat a lightray emitted from the center after E will be recap-tured by the BH singularity and thus will not intersect withlightrays entering the BH horizon sufficiently late. For in-stance, the lightray emitted at E′ will intersect with the ligh-tray F ′ but not with a later rayF ′′, as shown in Fig. 3.8. There-fore the diagram boundary line connecting C and F is locallyhorizontally-directed. This line consists of final intersectionpoints of lightrays emitted from the star center and those en-tering the BH horizon from outside. These final intersectionpoints are located at the BH singularity which is therefore rep-

A

B

C

D

E

F

E′

F ′F ′′

Cauchy line

r = 0

r=

0

Figure 3.8: Construction of the conformal diagram for thespacetime of a collapsing star. The thick dotted lineCF is a Schwarzschild singularity.

resented by the entire line CF .

The past-directed part of the conformal diagram is easy toconstruct. Since any two past-directed lightrays intersect, thepast-directed part is similar to that for the Minkowski space-time with a boundary at r = 0, namely the triangular domainABD. Thus the diagram in Fig. 3.8 is complete.

Future part of de Sitter spacetime In Fig. 3.6 we have drawna conformal diagram for the subdomain of a de Sitter space-time delimited by two timelike boundaries. We shall now con-struct the diagram for the future part of a de Sitter spacetimewith spatially unlimited sections. (Note that the past half ofthe de Sitter spacetime is not covered by the flat coordinatesbecause of incompleteness of past-directed geodesics. In thispaper we shall not use the complete de Sitter spacetime butonly the future of an arbitrarily chosen, unbounded, spacelikeCauchy hypersurface.)

The Cauchy surface t = t0, −∞ < x < ∞ is drawn as afinite horizontal curve in the conformal diagram (the curvedlineAB in Fig. 3.9). The pointsA,B in the diagram represent aspacelike infinity in the two directions and do not correspondto any points in the physical spacetime. It remains to establishthe shape of the infinite future boundary which must be a lineconnecting the points A,B to the future of the Cauchy sur-face. We already know from the construction of the diagramin Fig. 3.6 that this future boundary is locally horizontal. Sincethe behavior of lightrays emitted from all points of the Cauchysurface is the same, the future boundary may be representedby a horizontal straight line connecting the points A,B. (Tokeep the convention of having straight infinite boundaries, wehave drawn the Cauchy surface t = t0 as a curve extendingdownward from the straight line AB.)

Note that the same diagram also represents an inhomoge-neous spacetime consisting of de Sitter-like regions with dif-ferent values of the Hubble constant H . This is so because adifference in the local values of H does not change the quali-tative behavior of lightrays at infinity: the rays will intersectonly if emitted from sufficiently near points. The infinite fu-ture boundary remains a horizontally-directed line as long asH 6= 0 everywhere.

The construction of conformal diagrams for the following

83

Page 92: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

A B

t = t0

Figure 3.9: A conformal diagram for the future part of thede Sitter spacetime with flat spatial sections. Thecurved line represents the spatially infinite Cauchysurface t = t0. The local behavior of lightrays is thesame as in Fig. 3.6.

spacetimes is left as an exercise for the reader.

• A subdomain of the Minkowski spacetime between twocausally separated observers moving with a constantproper acceleration in opposite directions (Fig. 3.10, left).

• A flat closed universe: the subdomain (−∞ < t < ∞,x1 < x < x2) of a Minkowski spacetime with the linesx = x1 and x = x2 identified (Fig. 3.10, right).

• An asymptotically flat spacetime with two stars collaps-ing into two black holes; the line of sight crosses the twostar centers (Fig. 3.11).

• The future part of a de Sitter spacetime with a star col-lapsing into a black hole (Fig. 3.12).

• A maximally extended Schwarzschild-de Sitter space-time (Fig. 3.13). It is interesting to note that this diagramis usually drawn with all Schwarzschild and de Sitter re-gions having the same size, which makes the figure un-bounded (Fig. 3.13, top) despite the intention to representthe spacetime by a finite figure in the fiducial Minkowskiplane. To adhere to the definition of a conformal dia-gram as a bounded figure, one can use a suitable confor-mal transformation reducing the diagram to a finite size(e.g. Fig. 3.13, bottom).

3.3 Asymptotic flatness

Additional literature:[23] A concise derivation of the peeling property.[9] Good conceptual explanations of the asymptotic de-scription of spacetimes in GR.[7] A recent review.

The concept of an asymptotically flat spacetime was in-vented to formalize the picture of a spacetime which is “sim-ilar to Minkowski near infinity.” Informally, a spacetime isasymptotically flat if it can be extended via a conformal trans-formation to include the infinity domain, and then if the struc-ture of the infinity is the same as that of the Minkowski space-time. A definition can be given more formally as follows:A spacetime manifoldM is asymptotically flat in the nullfuture direction if:

1. There exists an extension ofM to an (unphysical) mani-

fold Mwith an (unphysical) regular metric g and bound-aryI +, such that g = Ω2g, where Ω is a smooth function

such thatΩ = 0 on the boundary, but ∇Ω 6= 0 everywhere

(we denote by ∇ the covariant derivative with respect tothe unphysical metric).

A B x=x

1

x=x

2

Figure 3.10: Left: The domain of Minkowski spacetime be-tween two causally separated observers A,Bmoving with a constant proper acceleration in op-posite directions. The dashed lines show that thetwo observers cannot see each other. Right: Aclosed Minkowski universe. The thick lines rep-resent the identified boundaries x = x1,2. A ligh-tray (dashed line) crosses the line x = x1,2 in-finitely many times.

A BT

Figure 3.11: A spacetime with two collapsing stars. The pointT is a future timelike infinity reached by an ob-server (thick curve) remaining between the twoblack holes. The lines AT and TB represent BHsingularities and the dashed lines are the BH hori-zons.

A B

Figure 3.12: The future part of a de Sitter spacetime with a col-lapsing star. The line AB represents the BH sin-gularity; the dashed line is the BH horizon.

84

Page 93: Winitzki. Advanced General Relativity

3.4 Conformal radiation fields

Figure 3.13: Diagrams for a maximally extendedSchwarzschild-de Sitter spacetime. The con-ventional, unbounded diagram (top) and anequivalent diagram having a finite extent (bot-tom) are related by a conformal transformation.The thick dotted lines represent Schwarzschildsingularities. Both diagrams contain infinitelymany Schwarzschild and de Sitter regions.

2. The boundary I + is a null surface (in the unphysicalmetric), i.e. Ω is a null function on the boundary.

3. The boundary surface I + has topology R × S2.

4. The vector−∇Ω is future-pointing atI + (as appropriatefor the future boundary).

5. The energy-momentum tensor of matter (equivalently,the Ricci tensor) vanishes in a neighborhood ofI +. (Thiscondition may be relaxed to vanishing only as Ω4 orfaster.)

From these conditions, we shall now derive the existence ofMinkowski-like coordinates “at large distances.” Let n be thenull vector dual to−dΩ. First, we note that g(n,n) = 0 onI +

implies that the auxiliary function f ≡ Ω−1g(n,n) is smooth

on M . Secondly, there is a freedom to rescale the unphysi-cal metric by a nowhere vanishing factor e2λ. Under such arescaling, the function f changes as

f → e−λ(

f + 2∇nλ)

on I+. (3.20)

Therefore, a suitable choice of λ will make f vanish on I +.There will remain the freedom to perform conformal transfor-

mations g → e2λgwith ∇nλ = 0, i.e. with the factor λ constantalong the flow lines of n.

Calculation: Derive Eq. (3.20).Solution: A replacement g → e2λg requires Ω → eλΩ be-cause the physical metric Ω−2g must remain fixed. Sinceg(n,n) = g−1(dΩ,dΩ) and the differential d is metric-independent, we find that the function f transforms underthe above replacements as

f = Ω−1g−1(dΩ,dΩ)

→ e−λΩ−1e−2λg−1(d(eλΩ),d(eλΩ))

= e−λf + 2e−2λg−1(deλ,dΩ) +O(Ω)

= e−λf + 2e−2λ∇neλ +O(Ω).

Thus we obtain the desired relation on I +.Need to check all conformal transformation equationsagainst Appendix D of Wald!

A calculation shows that the unphysical Ricci tensor Rµν isrelated to the physical one, Rµν , by (see [36] Appendix D)

Rµν = Rµν − 2Ω−1∇µ∇νΩ

− gµν

(

Ω−1∇α∇αΩ − 3Ω−2g(n,n))

. (3.21)

We nowmultiply both sides of Eq. (3.21) byΩ. By assumption,

Rµν vanishes in a neighborhood of I +, and Rµν is regular

everywhere on M since it is a nonsingular manifold. UsingΩ−1g(n,n) = O(Ω), we find

2∇µ∇νΩ + gµν∇α∇αΩ = 0 on I+.

Contracting with gµν gives ∇µ∇νΩ = 0 or equivalently ∇n =0, which means that n is a (null) geodesic, shear-free, anddivergence-free vector field onI +. (The entire distortion ten-sor B(n) vanishes on I +.) Note that the condition g(n,n) = 0could be also obtained from Eq. (3.21) if we multiply bothsides by Ω2.SinceI + has the topology R×S2, we can select coordinates

θ, φ on the S2 sections such that the partial metric is that of asphere,

dS2 =(dθ ⊗ dθ + sin2 θdφ⊗ dφ

)V.

Since the vector field n is divergence-free, the cross-sectionarea of the S2 sections remains constant along the flow lines ofn, thus the partial metric will have equal values of V on everyS2 section. So the remaining conformal freedom (g → e2λgwith λ = const along the flow lines of n) can be used to setV = 1. The function Ω is a well-defined coordinate near I +

since n ≡ ˆg−1dΩ 6= 0. Finally, we can select a fourth coordi-nate u parametrizing the R part of I + = R × S2, such thatn u = 1. The coordinate system (u,Ω, θ, φ) is well-defined ina neighborhood of I +, where the metric has the form

g =1

2(du⊗ dΩ + dΩ ⊗ du) − dθ ⊗ dθ − sin2 θdφ⊗ dφ.

A coordinate system in the physical manifoldM away fromI + can be obtained by the replacement r = Ω−1. The functionr will then serve as a “radial coordinate near infinity” sincer → ∞ “at infinity.”The coordinate system (u, r, θ, φ) is extended away from

I + into the interior of M as follows. The coordinate u ischosen as a null function, so that l ≡ g−1du is a null vectorsuch that g(n, l) = 1. Then, r is defined as an affine param-eter along the flow lines of l, such that l r = 1. The an-gular coordinates θ, φ are kept constant along the flow linesof l, so ∇lφ = ∇lθ = 0. The resulting coordinate systemasymptotically corresponds to spherical Minkowski coordi-nates t, r, θ, φ if we define u ≡ t + r.5 This local coordinatesystem is called the Bondi coordinate system.Thus, we conclude that the assumptions of asymptotic flat-ness are a preciseway of formulating the notion of a spacetimethat is “almost Minkowski at large distances.”

3.4 Conformal radiation fields

Very far from an isolated system, the spacetime is almost flatbut there may still remain radiation (gravitational, electro-magnetic, or other). We are interested in studying the struc-ture of the radiation field at large distances. The construction

5This statement is justified in more detail in [36], p. 280.

85

Page 94: Winitzki. Advanced General Relativity

3 Asymptotically flat spacetimes

of conformal infinity is especially helpful for this task, becauseradiation fields are conformally invariant. Let us review someexamples of conformally invariant fields.

3.4.1 Scalar field in 1+1 dimensions

The action for a massless scalar field in a 1+1-dimensionalspacetime is

S[g, φ] =1

2

d2x√−g g−1(dφ,dφ),

where√−g ≡ √− det g is the determinant of the covariant

metric, and so√−gd2x is the invariant 2-volume element.

This action is invariant under the conformal transformationof the metric, g → g = Ω2g.

Calculation: Show that the action is invariant under theabove transformation.Hint: In a 1+1-dimensional spacetime,

√−g = Ω2√−g. Sub-stitute g = Ω−2g into the action and use the independence ofd from the metric.

3.4.2 Scalar field in 3+1 dimensions

The action for a massless, conformally coupled scalar field is

S[g, φ] =1

2

d4x√−g

(

g−1(dφ,dφ) − 1

6Rφ2

)

,

where R is the scalar curvature. Again one can verify that theaction is invariant under a conformal transformation g → g =Ω2g (up to boundary terms), provided that the field is alsotransformed as

φ→ φ = Ω−1φ.

The power of the conformal factor needed for the correcttransformation law is called the conformal weight of the field.Thus a scalar field has conformal weight 0 in 1+1 dimen-sions and weight −1 in 3+1 dimensions. Conformal invari-ance means that a solution of the equation of motion for φ in

the metric g also gives a solution for the field φ in the metricg.

Calculation: Using Eq. (3.8), show that the action is invari-ant under the above transformations (up to boundary terms).Hint: An integration by parts yields an equivalent form ofthe action,

S[g, φ] =1

2

d4x√−g

(

−φφ− 1

6Rφ2

)

, ≡ ∇α∇α.

Substitute g = Ω−2g, φ = Ωφ into this action, and replace Raccording to Eq. (3.8). Note that

(Ωφ) = Ωφ+ φΩ + 2(∇αΩ)(∇αφ).

3.4.3 Electromagnetic field

The electromagnetic field is described by the Maxwell tensorwhich is a 2-form F ,

F = dA, Fµν = ∂µAν − ∂νAµ,

where the 1-form A is the electromagnetic potential. The ac-tion for the Maxwell field is

S[g, F ] =1

16π

∫ √−gd4x gλνgµρFλµFνρ.

It is easy to see that this action is conformally invariant (in3+1-dimensional spacetimes), the Maxwell field F havingconformal weight 0.

3.4.4 Gravitational radiation field

The Einstein equation relates the Ricci tensor to the distribu-tion of matter in an algebraic way (if there is no matter at apoint p, the Ricci tensor is zero at p). The Ricci tensor, how-ever, is only the trace of the full Riemann tensor. Therefore,the curvature of spacetime contains degrees of freedom notalgebraically related to the distribution of matter. These de-grees of freedom are described by the trace-free part of theRiemann tensor called theWeyl tensor,

Cαβγδ = Rαβγδ +1

2δ[α[γS

β]δ],

where

Sαβ ≡ Rαβ − 1

6Rgαβ

is an auxiliary tensor carrying equivalent information to theRicci tensor. Note that Rαβγδ = Cαβγδ in vacuum (whereRαβ = 0). It can be shown that the second Bianchi identityimplies the constraints

∇µCµαβγ + ∇[αSβ]γ = 0,

∇µSαµ −∇αSµµ = 0.

In the absence of matter, Sαβ = 0 and thus we obtain the equa-tion ∇µCµαβγ = 0 which determines the dynamics of puregravitational waves.The Weyl tensor has the simple and useful conformal trans-formation property

Cαβγδ = Ω2Cαβγδ for g = Ω2g.

Another way to express this property is to say that the tensorCαβγδ is conformally invariant.

3.4.5 Asymptotic behavior of radiation

Radiation fields are massless and propagate along nullgeodesics (lightrays). Let us consider a lightray γ(τ) that ap-proaches null infinityI + at a point p. We shall now character-ize a radiation field along the lightray γ, in the asymptoticallyfar region near infinity.Denote by l the affine tangent vector to the lightray γ(τ). Inthe unphysical metric (g = Ω2g), the affine parameter τ shouldbe replaced by Ω2τ , so that the corresponding affine tangent

vector is l = Ω−2l (see Statement 1.9.2.2). Since the tangent

vector l is regular in the unphysical manifold, it should have

a well-defined value l0 on I +. Then the physical-space tan-

gent vector lmust decay at infinity as l ∼ l0Ω2. We shall now

complete l to a null tetrad l,m,n (see Sec. 2.5), obtain sim-ilar asymptotic properties for the tetrad at infinity, and studythe asymptotic behavior of radiation fields using the tetradcomponents.

The vector l0 can be completed to a null tetrad using the

vector n ≡ −ˆg−1dΩ and a complex-valued vector m is cho-

sen in the orthogonal complement n, l0⊥. The (unphysical)null tetrad l0, m, n can be parallelly transported from I

+

along γ using the unphysical metric, which produces null vec-

tor fields l, m, n defined along γ and satisfying g(n, l) = 1

86

Page 95: Winitzki. Advanced General Relativity

3.4 Conformal radiation fields

and g(m, ¯m) = −1 at every point. However, we need to find a“physical” tetrad, l,m,n, normalized and parallelly trans-ported along γ using the physical metric. We have already

seen that l = Ω2l. The vector n of the physical tetrad can be

chosen as n = n since we must have g(n, l) = g(n, l) = 1everywhere along γ. Finally, the complex-valued vectorm isselected from the subspace n, l⊥ to be proportional to m.The normalization g(m, m) = g(m, ¯m) = −1 shows that theunphysical-space vector m must be chosen as m = Ω−1m.Thus, the physical null tetrad l,m,n and the unphysicalone, l, m, n, must be related by

Ω2l,Ωm, n = l,m,neverywhere along the lightray γ.The convenience of the null tetrads is in the concise waythey describe components of tensors. For example, the an-tisymmetric Maxwell tensor F (a,b) for the electromagneticfield is represented by three tetrad components, usually cho-sen as follows,

Φ0 ≡ F (n,m),

Φ1 ≡ 1

2F (n, l) − 1

2F (m, m),

Φ2 ≡ F (m, l).

(The properties of the tetrad components Φj are studied in

the Statement below.) The unphysical components Φj , j =0, 1, 2, are given by the same equations with the unphysicaltetrad. The relation between the physical and the unphysicalcomponents is therefore

Φ0 = ΩΦ0, Φ1 = Ω2Φ1, Φ2 = Ω3Φ2.

Since the Maxwell field is conformally invariant, the unphys-ical components will also satisfy the Maxwell equations andwill have regular solutions on I +. It is now easy to see thatthe component Φ0 will be the slowest-decaying one,

Φ0 ∼ Ω ∼ 1

r,

while other components will decay faster. Note that F andΦj represent the field strength, which should decay as r

−1 fora radiation field. Hence, it is clear that the the intensity ofoutgoing radiation (i.e. energy carried away to infinity) must beindependent of the two other components Φ1,2. In fact, the

radiated power is proportional to |Φ0|2. Since the unphysicalcomponent Φ0 is constant at infinity, the differential intensityof radiation in a given (null) direction can be obtained directly

by evaluating |Φ0|2 at a point of I + that corresponds to therequired asymptotic direction.Other radiation fields exhibit analogous properties. For ex-ample, a scalar field φ has the “radiation” component ∇nφ,while the gravitational radiation is described by the compo-nent

Ψ0 ≡ Cαβγδnαmβnγmδ = R(n,m,n,m);

note that this component of the Weyl tensor coincides withthat of the Riemann tensor. The component Ψ0 decays as ∼Ω = r−1 at infinity, and the intensity of gravitational radiation

is proportional to |Ψ0|2. All the other components of the Weyltensor decay faster at infinity.The splitting of components according to their falloff behav-ior along null geodesics is called the peeling property. Thisproperty holds in general (even-dimensional) asymptoticallyflat spacetimes and for all massless fields (of arbitrary spin).

Statement: Consider a Lorentz transformation of the tetradthat leaves the vector l constant,

m → eiφm +Al,

n → n + eiφAm + e−iφAm +AAl,

where φ is real and A is complex. Under this transformation,the tetrad components Φj of the Maxwell tensor change as

Φ0 → eiφΦ0 + 2AΦ1 +A2e−iφΦ2,

Φ1 → Φ1,

Φ2 → e−iφΦ2.

It follows that the leading falloff behavior of Φ0 ∼ r−1 isindependent of the choice of the null tetrad. The quantitiesF (n, l) and F (m, m), taken separately, do not carry tetrad-independent information (i.e. one of these may be set to zeroby a choice of tetrad).

87

Page 96: Winitzki. Advanced General Relativity
Page 97: Winitzki. Advanced General Relativity

4 Global techniques

Under global techniques we understand considerationsthat involve entire manifolds or large domains, rather thanlocal considerations (e.g. calculations involving properties oftensors and their derivatives at a single point). Global tech-niques can be used to answer questions such as whether thereis a singularity anywhere, or whether a certain interestingsubdomain of the spacetime is finite or infinite. In this chap-ter we shall introduce some of these techniques and considersome applications.

4.1 Singularity theorems

Additional literature: [8], [26]. There are not many textbooksthat explain singularity theorems.[12]: Contains all the derivations but requires you to investa significant time in reading large portions of the book.[21]: Contains some introductory material on singularitytheorems; Chapter 34 contains a sufficiently detailed proof ofHawking’s area theorem.[36]: Devotes two chapters to mathematical developmentsand explains the main ideas behind singularity theorems.However, refers to [12] for proofs of some crucial statements.

4.1.1 Singularities and geodesicincompleteness

It is well known that some spacetimes occurring in gen-eral relativity contain points where the force of gravity be-comes “infinitely large” (more precisely, the geometric de-scription of the spacetime breaks down). Such spacetimes arecalled singular. Two major examples of singular spacetimesare the Schwarzschild spacetime and a generic Friedmann-Robertson-Walker (FRW) spacetime. In the Schwarzschildspacetime, it is relatively easy to understand that the blackhole center r = 0 represents a singularity because the cur-vature becomes infinite as r → 0, indicating the presence ofunboundedly large tidal forces. Also, every timelike or nullgeodesic entering the Schwarzschild radius cannot escape hit-ting the center within a finite time. No lightrays or timelikegeodesics can be continued past that point because the geom-etry is singular and the geodesic equation becomes undefined(i.e. it does not predict how a body will move past the center).Since the tidal forces will tear apart any material objects ap-proaching the center, it is reasonable to conclude that there isno sensible way to describe the evolution of an observer pastthe center of a black hole. Thus the center is a place wherethe geometry of the spacetime cannot be described by generalrelativity. In an FRW spacetime with the scale factor a(t) → 0as t → 0, the curvature is unbounded near t = 0. Similarly tothe Schwarzschild case, geodesics cannot be extended to thepast beyond t = 0, so one cannot describe what happened toan observer before t = 0. In this sense, t = 0 is the “beginningof time” for an FRW spacetime. It is reasonable to say that the3-surface t = 0 in an FRW spacetime is singular.Since the spacetime manifold is not smooth at “singular”points, it is natural to excise such points from the space-

time and to restrict the calculations to regular (nonsingular)points.1 Hence, one finds a curious situation: gravitationalsingularities are not “places” and are not present “anywhere”in the spacetime manifold. Since all the singular points areremoved, it is difficult to formulate a general definition of asingularity in terms of internal properties of the remainingnonsingular manifold. Nevertheless, the “presence” of sin-gularities is usually manifested in several ways. For instance,the curvature may become infinite along a certain geodesicworldline, indicating that a freely falling observer followingthat worldline will experience unbounded tidal forces. Or,certain geodesics may have only a finite range of the affine pa-rameter, because these geodesics “hit a singularity” and can-not be extended any further.An accepted way to define a singular spacetime is throughthe non-extensibility of geodesics. Intuitively, a geodesicshould be able to extend “arbitrarily far” (for an infinite rangeof the affine parameter) unless there is a singularity in theway.A spacetime is called geodesically complete if any geodesicis extendible to arbitrary values of the affine parameter. Ageodesically incomplete spacetime is interpreted as “singu-lar” in some way. Usually, only timelike and null geodesicsare used to determine geodesic completeness. (Note that aspacetime can be intuitively “singular” without containingany incomplete timelike and null geodesics because the sin-gularitymay be “infinitely far away,” approachable after finiteproper time only by a sufficiently highly accelerated observer,or by a spacelike geodesic.)The presently known singularity theorems are mathemati-cal statements about conditions for a spacetime to be geodesi-cally incomplete. Thus, these theorems only indicate the pres-ence of singularities but neither predict their “location” norcharacterize their physical properties.

Statement:2 Let k be a timelike Killing vector and γ(τ) a(non-geodesic) timelike curve with tangent vector v, such thatg(v,v) = 1. Show that

|∇vg(k,v)| ≤ |g(k,v)| |a| ,where the vector a(τ) ≡ ∇vv is the proper acceleration of

the curve and |a| ≡√

−g(a,a). Deduce that g(v,k) canchange only by a finite amount along the curve γ as long as∫∞

τ0|a(τ)| dτ < ∞, which means that the worldline γ(τ) can

be realized by a rocket with a finite amount of fuel. Show thatthe curve γ cannot reach any spacetime regions where g(k,k)becomes unbounded.Solution: By the Killing property, g(∇vk,v) = 0, thus

∇vg(k,v) = g(k,a). It remains to show that |g(k,a)| <|g(k,v)a| for timelike k,v and spacelike a 6= 0 such thatg(v,a) = 0. We may decompose k = αa + βv + s, where

α ≡ g(k,a)

g(a,a), β ≡ g(k,v),

1It is conjectured that a quantum theory of gravity (to be developed) willreplace “singularities” by some well-defined and finite new physics. Atthis point, it is safe to say that the classical theory of general relativitybreaks down at singularities.

2After [36], Problem 9.1.

89

Page 98: Winitzki. Advanced General Relativity

4 Global techniques

and s is a spacelike vector orthogonal to both a and v. Thenthe inequality g(k,k) > 0 yields

g(k,k) = β2 − α2 |a|2 − |s|2 > 0,

hence

|β| = |g(k,v)| > |α| |a| =1

|a| |g(k,a)| ,

which is equivalent to the desired inequality. Rewriting it as

|∇v ln g(k,v)| < |a| ,

and integrating both sides, we have

∣∣∣∣

∫ ∞

τ0

dτ∇v ln g(k,v)

∣∣∣∣≤∫ ∞

τ0

dτ |∇v ln g(k,v)|

≤∫

|a|dτ <∞.

Therefore the total change of ln g(k,v) is bounded. Since forany timelike vectors x,y we have the inequality

g(x,x)g(y,y) ≤ [g(x,y)]2,

it follows that g(k,k) ≤ |g(k,v)| and thus the total change ofg(k,k) along the curve γ(τ) is bounded for all τ .

4.1.2 Past-incompleteness of inflation

To get a taste of singularity theorems, we start with a simplestatement about the past timelike geodesic incompleteness ofan inflationary spacetime.3

Cosmological inflation is a period of accelerated expansionof spacetime. For instance, a spatially flat FRW metric with agiven scale factor a(t), namely

g = dt⊗ dt− a2(t)d2r, (4.1)

d2r ≡ dx⊗ dx+ dy ⊗ dy + dz ⊗ dz,

represents inflation for those times t when a > 0 and a > 0.The Hubble expansion rate is defined as

H(t) =d

dtln a,

so the conditions for the presence of inflation are H +H2 > 0and H > 0. A full discussion of possible varieties of cosmo-logical inflation is beyond the scope of this text. We shall re-strict our attention to a class of spacetimes that are qualita-tively similar to the inflating FRW case, namely, to spacetimeswith a metric of approximately the form (4.1), in suitable localcoordinates. Locally, the parameters a and H will be well-

defined and should satisfy the conditions H > 0, H +H2 > 0.The processes that generate inflation may result in local fluc-tuations of the Hubble rate H , but these fluctuations will oc-cur on distance and time scales typically much larger than thehorizon size H−1.It is clear that inflation can continue forever to the future.Themain statement is that such an inflationary spacetime can-not be also eternal to the past. Theway to prove this is to showthat a past-directed timelike geodesic will have only a finiterange of proper time. Events that preceded that time are not

3This section is based on the paper [3].

described by the inflationary spacetime and so may be inter-preted as the presence of an initial singularity (the “beginningof time”).We first describe the inflationary character of the space-time in a coordinate-free manner. A salient property of thespacetime (4.1) is the existence of a congruence of timelikegeodesics with tangent vectors v ≡ ∂t and connecting vec-tors c1,2,3 = ∂x, ∂y, ∂z that satisfy the Hubble law (3.4). Thesegeodesics have everywhere positive divergence (see the fol-lowing calculation).

Calculation: For the metric (4.1), show that the vectorsc1,2,3 ≡ ∂x, ∂y, ∂z are connecting for v ≡ ∂t and satisfy theHubble law (3.4) with H = a/a. Then show that the diver-gence div v = 3H and hence is everywhere positive duringinflation.Solution: The Hubble law,∇vcj = Hcj , follows from

g(∇vcj , cj) =1

2∇vg(cj , cj) = −1

2∂ta

2 = −a2H.

Then we compute the divergence using the decomposition

g−1 = v ⊗ v − a−23∑

j=1

cj ⊗ cj

that follows from the fact thatv, a−1c1, a

−1c2, a−1c3

is an

orthonormal basis. Hence,

div v = Tr(x,y)g(x,∇yv)

= g(v,∇uv) − a−23∑

j=1

g(cj ,∇cjv)

= −a−23∑

j=1

g(cj ,∇vcj) = 3H > 0.

Thus we reformulate our assumption of the presence of in-flation throughout the entire “past portion” of the spacetime:We require the existence of a timelike geodesic congruencev covering the entire past, with everywhere positive diver-gence divv > 0. Here, the “past portion” of the spacetimeis understood as the past of some Cauchy 3-surface. Notethat the existence of an everywhere diverging congruence isa nontrivial condition. In any spacetime, one can always findgeodesic congruences having arbitrary, positive or negative,divergence at any given point, but not necessarily everywhereto the past of a Cauchy surface.Given an everywhere diverging geodesic congruence, wedefine the local Hubble expansion rate to be H ≡ 1

3divv. Weshall assume, for simplicity, that there exists a constantH0 > 0such that H ≥ H0 everywhere. (This condition can be signifi-cantly relaxed without affecting the conclusions.)Now we imagine that some dust-like particles follow thegeodesic congruence v everywhere, and consider a geodesicobserver who moves through the universe and measures thevelocities of the dust particles. This observer can concludethat the spacetime is inflating and measure the local value ofH in the following way. Let the observer’s worldline be γ(τ)with an affine tangent vector u ≡ γ, such that g(u,u) = 1.In principle, the observer can measure the following two (3-dimensional) quantities: the relative velocity, ~vrel(τ), of theparticle that passes by the observer at a time τ , and the dis-placement vector δ~x(τ ; τ ′), in the observer’s rest frame, be-tween nearby particles that passed the observer at times τ

90

Page 99: Winitzki. Advanced General Relativity

4.1 Singularity theorems

and τ ′. After registering two nearby particles at times τ andτ ′ ≡ τ + δτ , the observer can calculate the separation and therelative velocity of the two particles in the (approximate) lo-cal rest frame of the particles. Let n be a normalized, spacelikevector describing the separation between two particles in theparticle rest frame. Then the observer can compute the quan-tity g(n,∇nv) and thus estimate the local Hubble rate fromthe rate of change of the separation in the direction n,

H ≡ 1

3divv ≈ −g(n,∇nv)

(the minus sign compensates for the spacelike character ofthe vectors n and ∇nv). Note that n must be normalized,g(n,n) = −1, so the observed Hubble rate depends only onthe direction of the observer’s motion but not on its speed.Since the spacetime is approximately isotropic, the estimateof the value of H from a single direction n is expected to bereasonably accurate.The next calculations derive a formula for H in terms ofthe vectors u and v. It will be convenient to denote α(τ) ≡g(u,v). The quantity α is the “γ factor” of special relativity,describing the time dilation between the observer and the par-ticle rest frames. Note that g(u,v) > 1 for two future-directedtimelike vectors u 6= v normalized to g(u,u) = g(v,v) = 1.Since we assume that the observer is not at rest with respectto any of the particles, we will always have α > 1.

Calculation: Using the geodesic property of the fields v andu, show that the vector n and the instantaneous Hubble rateH are expressed by

n =u − αv√α2 − 1

, H = − ∇uα

α2 − 1.

Derive the relationship between the observed 3-dimensionalquantities ~vrel(τ), ~vrel(τ + dτ), δ~x(τ ; τ + dτ), and the 4-dimensional vectors u, v, as well as∇uα, in the limit dτ → 0.Solution: The vector ndt should represent the infinitesimaldisplacement of the observer in the particle rest frame, wherethe time interval is dt. Spatial vectors in the particle rest frameare (spacelike) vectors orthogonal to v. Thus, the vector n isproportional to the projection of u onto the subspace v⊥:

n =u− vg(u,v)

|u− vg(u,v)| =u − αv√α2 − 1

,

where we have used g(u,u) = g(v,v) = 1. Now we can di-rectly compute H ≡ −g(n,∇nv). Since v is normalized andgeodesic, we have g(v,∇xv) = 0 = g(x,∇vv) for all x. So theonly surviving term in g(n,∇nv) is proportional to g(u,∇uv),

H = −g(n,∇nu) = −g(u,∇uv)

α2 − 1= − ∇uα

α2 − 1.

This is the desired formula. In the instantaneous inertialframe where the observer is at rest, we have u = 1, 0, 0, 0and thus v = α, α~vrel, where

α(τ) ≡ 1√

1 − ~v2rel(τ)

and τ is the observer’s proper time. In this way, the observercan compute α(τ) and∇uα ≡ ∂τα.Replacing ∇u by ∂τ , we can simplify the formula forH to

H(τ) =1

2∂τ ln

α(τ) + 1

α(τ) − 1.

Since α > 1, we have

lnα(τ) + 1

α(τ) − 1> 0 for all τ.

Finally, our hypothetical observer can integrate the instanta-neous value ofH(τ) along the worldline γ(τ). By assumption,H ≥ H0, therefore

∫ τ2

τ1

H(τ)dτ ≥ H0(τ2 − τ1)

for all τ1, τ2. On the other hand,

∫ τ2

τ1

H(τ)dτ =1

2lnα(τ2) + 1

α(τ2) − 1− 1

2lnα(τ1) + 1

α(τ1) − 1

<1

2lnα(τ2) + 1

α(τ2) − 1.

Hence, there is a lower bound on the admissible values τ1 ofthe proper time,

τ1 > τ2 −1

2H0lnα(τ2) + 1

α(τ2) − 1≡ τmin(τ2).

This means that no timelike geodesic can be extended to thepast further than to τmin(τ2). Thus, the spacetime is geodesi-cally incomplete to the past. (A similar argument shows thatnull geodesics are also past-incomplete.) The only geodesicworldlines that might be complete to the past are the particleworldlines.What could the observer see at earlier times τ ≤ τmin? Onepossibility is that the observer did not exist at those times be-cause the spacetime is singular in the distant past. In this case,we can say that a past-directed geodesic “hits a singularity” atτ = τmin or sooner, but we cannot deduce any more informa-tion about the singularity. For instance, different observersmight hit different, “separate” singularities. The only alter-native to the presence of singularities is to conclude that thespacetime was not inflating at τ < τmin. In both cases, theinflationary epoch could not be past-eternal.

4.1.3 Conjugate points on geodesics

Proofs of several singularity theorems are based on the prop-erties of geodesic curves. In this and the following sec-tions we shall obtain some necessary geometric results aboutgeodesics.Let u be a tangent vector field to a congruence which con-sists of geodesics emitted from a single point p; thus p is afocal point for the congruence u. There is a possibility thatsome of the lines of u will focus again at another focal pointq. The condition for this possibility is easy to derive using thegeodesic deviation equation (see Sec. 1.9.5).Namely, consider a connecting vector c for u. The vector csatisfies the geodesic deviation equation,

∇u∇uc −R(u, c)u = 0. (4.2)

We now pick a single geodesic γ(τ) such that τ = 0 at p, andsolve Eq. (4.2) only along that geodesic for τ > 0. Since weare solving a second-order equation, we need to supply initialvalues c and ∇uc at τ = 0. At p, we must have c = 0 since pis a focal point for u. We may pick the value∇uc(τ = 0) arbi-trarily, and obtain c(τ) from the deviation equation. Suppose

91

Page 100: Winitzki. Advanced General Relativity

4 Global techniques

that there exists a particular solution c(τ) such that c(τ1) = 0at some other point q ≡ γ(τ1) on the curve. It means that aninfinitesimally close, neighbor geodesic4 from the congruenceu intersects γ at the point q. Moreover, note that the equa-tion (4.2) is linear in c; if c(τ) is a solution, so is λc(τ) forany constant λ. It follows that there exists a one-parametric setof infinitesimally close neighbor geodesics corresponding toconnecting vectors λc, and each of those geodesics intersectsγ at q. It is clear that q is a second focal point for the congru-ence u. In that case, the point q is called conjugate to p alongthe line γ(τ).

Remark: A vector field c is called a Jacobi field for a curveγ if it satisfies Eq. (4.2) along γ with u ≡ γ. It is clear thatJacobi fields are essentially connecting vectors for a familyof geodesics around γ. One uses the concept of Jacobi fieldswhen one would like to examine the focusing properties of asingle curve γ(τ) and to avoid the need to introduce a wholecongruence of geodesics around γ and to talk about intersec-tions of “infinitesimally close” geodesics. For instance, onecan define conjugate points without involving a congruence,in the following way: Two points p = γ(τ1), q = γ(τ2) on ageodesic curve γ(τ) are conjugate if there exists a Jacobi fieldc(τ), not identically zero, which vanishes at τ1 and τ2.It is intuitively obvious that the divergence of the field u

tends to infinity at a focal point. To show this more formally,consider the defining property of the divergence divu,

LuV = ∂τV = (divu)V,

where V is a 3-volume spanned by three connecting vectorstransverse to u. As we cross a focal point, at least one of theconnecting vectors c that span V will vanish. Thus V passesthrough 0 at a focal point; in the generic case where the func-tion V (τ) has only simple zeros, V will also change sign aftera focal point. Note that ∂τV ≡ ∇uV is always finite sinceV = ε(u, c1, c2, c3) and every connecting vector cj satisfies alinear equation and thus cannot become infinite. Therefore,any point where divu becomes infinite must be a focal pointwhere V = 0. Also, we can rewrite the above equation asdivu = ∂τ lnV . If divu remains finite, lnV cannot go to −∞within a finite time τ . It follows that divu → ∞ at a focalpoint. (In the generic case, we will have divu → −∞ beforea focal point and divu → +∞ after a focal point.) Thus wehave proved that the condition divu → ∞ is necessary andsufficient for the existence of a focal point of a field u.Finally, let us develop a qualitative picture of why conju-gate points occur. We have seen that gravity generally actsas an attractive force that makes geodesics converge. There-fore, a bunch of geodesics emitted from a point p will usuallytend to refocus at a later point q. The refocusing may take awhile, but once the divergence divu becomes negative alongone line, refocusing is inevitable within a finite time. (Notethat the focusing theorem applies to hypersurface orthogonalgeodesics, and the congruence emitted from p is hypersurfaceorthogonal by Statement 1 from Sec. 2.4.2. Moreover, this con-gruence is integrable by Statement 1 from Sec. 2.2.3.) Focalpoints can occur along a caustic surface or caustic line, so weselect one geodesic γ on which focusing occurs at one pointq. Then q is the point conjugate to p along γ; other geodesics

4Generally, the focal intersection exists only among infinitesimally closegeodesics! This is so because the connecting vector c will generally failto satisfy the geodesic deviation equation for a neighbor geodesic at a fi-nite distance from γ.

γ′ may possess different conjugate points to p, and also theremay exist several conjugate points q, q′, ..., to p along a sin-gle curve γ. Thus the property of q to be conjugate to p is aproperty of the local geometry around a selected geodesic γ.Note also that the focusing of geodesics does not yet indicateany singularities in the spacetime; after all, we can select con-gruences that focus at any point by simply emitting geodesicsfrom that point.The focusing theorem (see Sec. 2.4.2) shows that divu willreach infinity within a finite proper time if the congruence ini-tially has negative divergence and if the strong energy con-dition holds. Hence, conjugate points must exist along eachcurve of the congruence, under the same assumptions.

4.1.4 Second variation of proper length

In Sec. 1.9.3 we have seen that a geodesic curve extremizes theproper length among all curves connecting two fixed points.For instance, in the Minkowski space, timelike geodesicsmax-imize the proper time; note that spacelike geodesics do notminimize the proper distance, since the metric g has the sig-nature (+ −−−). However, in more general spacetimes theproper length of a geodesic does not always deliver the maxi-mum proper time. We shall now see that the maximization ofproper length is closely related to the existence of conjugatepoints for neighbor geodesics.The infinitesimal change of the proper length of a curveunder an infinitesimal variation is described by Eq. (1.77) ofSec. 1.9.3. For a timelike curve γ(τ) with a tangent vectorv = γ normalized by g(v,v) = 1, perturbed by the flow ofa vector field t with a parameter σ along the flow, we have

d

dσL[γ; t] = −

∫ τ2

τ1

g(t,∇vv)dτ.

The derivative dL[γ; t]/dσ vanishes when γ is a geodesic, butthis is not sufficient to have an actual maximum of the func-tional L. The proper length is maximized if d2L/dσ2 < 0. Toverify this condition, we again consider a vector field v cre-ated by transporting the tangent vector γ along the flow linesof t. Then

d2

dσ2L[γ; t] = −

∫ τ2

τ1

∇tg(t,∇vv)dτ

= −∫ τ2

τ1

g(∇tt,∇vv)dτ −∫ τ2

τ1

g(t,∇t∇vv)dτ.

(Note that ∇vv 6= 0 away from the curve γ.) Since [v, t] = 0,the term∇t∇vv is expressed through the Riemann tensor as

g(t,∇t∇vv) = R(t,v,v, t) + g(t,∇v∇vt).

Finally, on the geodesic curve γ we have∇vv = 0 and thus

d2

dσ2L[γ; t] =

∫ τ2

τ1

(R(v, t,v, t) − g(t,∇v∇vt))dτ.

The integral expression in the right-hand side above mustbe nonpositive for all vector fields t (and for all τ ) if the curveγ maximizes the proper length under infinitesimal variations.We cannot simplify d2L/dσ2 any further, but we note that theexpression in the right-hand side is similar to the geodesic de-

viation equation (4.2). Denoting by G the linear operator in-volved in that equation, we can rewrite d2L/dσ2 as

d2

dσ2L[γ; t] =

∫ τ2

τ1

g(t, Gt − t)dτ,

92

Page 101: Winitzki. Advanced General Relativity

4.1 Singularity theorems

where the precise definition of G is

Gt ≡ R(v, t)v, t ≡ ∂τ∂τ t ≡ ∇v∇vt.

The vector twill be a Jacobi field if t− Gt = 0. Thus, d2L/dσ2

will vanish if t is a Jacobi field. In the present case, the fieldt plays the role of a “perturbation field” and the curve γ(τ)is fixed at endpoints γ(τ1) and γ(τ2). Thus an admissiblefield t must vanish at endpoints. But a Jacobi field that van-ishes at the endpoints can exist only if both endpoints areconjugate points with respect to the curve γ. The functionald2L[γ; t]/dσ2 is a quadratic functional of the vector field t.Heuristically, one expects that a negative-definite quadraticfunctional will not vanish at nonzero values of its argument.Hence, we expect that the proper length cannot be maximizedif there exists a Jacobi field that vanishes at both the endpoints,or (as will be shown below) if one of the endpoints is conju-gate to a point within the curve. Thus we found a crucial linkbetween the existence of conjugate points and the maximiza-tion of proper length. This link is formalized in the followingstatement.

Statement 1: 5A geodesic curve γ maximizes the properlength between points γ(τ1) and γ(τ2) with respect to in-finitesimal variations iff γ has no conjugate points to γ(τ1)within the interval [τ1, τ2].Sketch of proof: If there are no conjugate points then thereis a one-to-onemap between a Jacobi field c at a point γ(τ) andthe initial value of the derivative ∇uc at τ = 0. This map isspecified by a nondegenerate transformation A(τ). The func-tional d2L[γ; t]/dσ2 evaluated on an arbitrary vector t can beexpressed through the auxiliary vector field z ≡ A−1t, andshown to be a negative-definite functional of z by an explicitcalculation that uses the integrability of the geodesic congru-ence emitted at γ(τ1).Conversely, if there exists a focal point γ(τ∗), where τ∗ isbetween τ1 and τ2, then there exists a Jacobi field c, not iden-tically zero, and such that c(τ1) = c(τ∗) = 0. To show that theproper length is not maximized, it is sufficient to find even asingle example of a vector field t such that d2L[γ; t]/dσ2 > 0.If we define the field t0 along the curve γ(τ) by t0(τ) ≡ c(τ)for τ1 < τ < τ∗ and t0(τ) ≡ 0 for τ∗ < τ < τ2, then we willhave d2L[γ; t0]/dσ

2 = 0 except at τ = τ∗ where t0(τ) has a“corner.” Now, one can deform this field t0(τ) by an infinites-imal amount and construct a smooth field t(τ) such that stilld2L[γ; t]/dσ2 > 0. The construction of t(τ) goes heuristicallyas follows: The field t(τ) is almost equal to c(τ) up to τ = τ∗,which produces an almost zero value of the integral, and thent(τ) can be chosen to be almost zero for τ∗ < τ < τ2 and suchthat the resulting integral is positive. (The curve correspond-ing to t0(τ) has a corner that can be “straightened out.”)More detailed proof: First we prove that there is a maxi-mum of L[γ] if no conjugate points are present on γ. Forbrevity, we denote (covariant) derivatives along the curve

γ by an overdot. The time-dependent transformation A(τ)that maps initial values of c(τ1) to final values c(τ) is the(transformation-valued) solution of the equation

¨A− GA = 0, A(τ1) = 0,

˙A(τ1) = 1.

Note that Gv = 0 and g(Gx,v) = 0 for all x, in other words,G is transverse to the vector v. Thus G and A are effectively

5[12], Proposition 4.5.8.

three-dimensional transformations acting in the subspace v⊥.

If there are no conjugate points, the transformation A is non-degenerate for all τ > τ1. In that case, we will now showthat d2L[γ; t]/dσ2 ≤ 0 for all vector functions t(τ) that sat-isfy t(τ1) = t(τ2) = 0. This will be sufficient to show that

L[γ] has a maximum. Since A−1 is well-defined, we can setz(τ) ≡ A−1(τ)t(τ); the vector z(τ) will then belong to thesubspace v⊥. We can express d2L[γ; t]/dσ2 through z(τ) asfollows,

d2

dσ2L[γ; t] =

∫ τ2

τ1

g(Az, GAz− ∂τ∂τ (Az))dτ

=

∫ τ2

τ1

[

−g(Az, 2˙Az) − g(Az, Az)

]

=

∫ τ2

τ1

[

g(˙Az, Az) − g(Az,

˙Az) + g(Az, Az

]

dτ,

(4.3)

where in the last line we have canceled the total derivative∂τg(Az, Az). Now let us rewrite the first two terms in the lastline in Eq. (4.3) through the adjoint operator AT ,

g(˙Az, Az) − g(Az,

˙Az) = g(z, (

˙AT A− AT

˙A)z).

Note that g(Gx,y) = R(v,x,v,y), so the known symmetry of

the Riemann tensor, R(a,b, c,d) = R(c,d,a,b), means that Gis self-adjoint, GT = G. Thus, the operator AT (τ) satisfies theequation

¨AT − AT G = 0, AT (τ1) = 0,

˙AT (τ1) = 1.

The operator˙AT A− AT

˙A is analogous to a Wronskian and is

time-independent because

∂τ

(˙AT A− AT

˙A)

= AT GA− AT GA = 0.

Since the operator˙AT A − AT

˙A is equal to zero at τ = τ1 due

to the initial condition A(τ1) = 0, we have˙AT A − AT

˙A = 0

for all τ . Thus

g(˙Az, Az) − g(Az,

˙Az) ≡ 0

and the expression (4.3) is simplified to

d2

dσ2L[γ; t] =

∫ τ2

τ1

g(Az, Az)dτ.

Since the vector Az belongs to the subspace v⊥, it is spacelike,hence the above expression is always nonpositive.Now we consider the second case where a conjugate pointis present. We will then show that L[γ] does not have a maxi-mum. The presence of a conjugate point means that there ex-

ists a Jacobi field c(τ), not everywhere zero, satisfying c = Gc,c(τ1) = 0, c(τ∗) = 0 for τ1 < τ∗ < τ2. We shall now demon-strate that d2L[γ; t]/dσ2 > 0 with a specific choice of the per-turbation vector t. The vector twill be defined by “smoothingthe corner” of c(τ) near τ = τ∗, with a small additional per-turbation:

t(τ) ≡ s(τ)c(τ) + εq(τ)a,

where s(τ) is a smooth step-like function such that s(τ <τ∗) ≈ 1, while s(τ > τ∗) ≈ 0; ε > 0 is a “small” constant,

93

Page 102: Winitzki. Advanced General Relativity

4 Global techniques

q(τ) is a smooth function that vanishes at τ = τ1,2 and is pos-itive for τ1 < τ < τ2, for example, q(τ) = (τ − τ1)(τ2 − τ);and a is a constant spacelike vector (to be determined later).The function s(τ) is chosen to be nonconstant only within avery narrow interval around τ = τ∗. The idea is to have εvery small and s(τ) very close to the Heaviside step function,so that t(τ) is smooth but t(τ) ≈ c(τ) for τ1 < τ < τ∗ andt(τ) ≈ 0 for τ∗ < τ < τ2. By construction, t exactly vanishesat the endpoints τ = τ1,2. With the above definition of t(τ),we can directly compute the relevant quantities, neglectingunimportant terms of order ε2:

Gt− t = εq(τ)Ga − 2sc − sc,

d2

dσ2L[γ; t] =

∫ τ2

τ1

g(t, Gt − t)dτ = −∫ τ2

τ1

g(sc, 2sc + sc)dτ

− 2ε

∫ τ2

τ1

q(τ)g(a, 2sc + sc)dτ +O(ε2),

where in the last line we have used the self-adjointness prop-erty (derived using integration by parts),

∫ τ2

τ1

g(x, Gy − y)dτ =

∫ τ2

τ1

g(Gx − x,y)dτ,

which holds for vectors x,y that vanish at endpoints τ = τ1,2.Since s and s are nonzero only in a narrow interval aroundτ = τ∗, it is sufficient to approximate

c(τ) ≈ (τ − τ∗)b, b ≡ c(τ∗)

t(τ) ≈ (τ − τ∗) s(τ)b + εq(τ∗)a,

where b is a known spacelike vector. (With a suitable choiceof the function s(τ), the error of this approximation will be oforder ε2.) Therefore,

d2L[t]

dσ2= O(ε2) − g(b,b)

∫ τ2

τ1

(τ − τ∗) (2ss+ (τ − τ∗)ss)dτ

− 2εq(τ∗)g(a,b)

∫ τ2

τ1

(2s+ (τ − τ∗)s)dτ.

The step-like function s can be chosen so that the first inte-gral in the above equation is arbitrarily small, e.g. of order ε2.Indeed, integration by parts yields

∫ τ2

τ1

(τ − τ∗) (2ss+ (τ − τ∗)ss)dτ = −∫ τ2

τ1

(τ − τ∗)2s2dτ ;

heuristically, s(τ) ≈ δ(τ − τ∗) and thus (τ − τ∗)s ≈ 0. Thesestatements can be made mathematically precise, for instance,by using an explicit formula for s(τ). The remaining integralis easily evaluated,

∫ τ2

τ1

(2s+ (τ − τ∗)s)dτ =

∫ τ2

τ1

[s+ ∂τ ((τ − τ∗)s)]dτ = −1.

Finally, we note that b 6= 0, since by assumption c is not ev-erywhere zero and b = c 6= 0 at the point τ = τ∗ where c = 0.Also, q(τ∗) > 0. So we can choose the vector a such that

2q(τ∗)g(a,b) = 1.

With this choice, we obtain

d2

dσ2L[γ; t] = ε+O(ε2),

which will be positive for small enough ε. This concludes theproof of Statement 1.

Thus, in the presence of (at least one) conjugate point on acurve γ, the maximum proper length is not achieved by γ. Forexample, a body on a circular orbit in a central gravitationalfield does not maximize the proper time over one period afterit comes back to an initial point p, since the elapsed propertime is smaller than that of a body at rest at p. In fact, theproper time is maximized by another geodesic, correspondingto a body thrown away from the center of gravity from p suchthat it comes back to p at the right time.

A related situation is a congruence of timelike geodesicsnormal to a given spacelike 3-surface S. Then this congruenceis hypersurface orthogonal everywhere (not just at S), by anargument analogous to that of Statement 1 in Sec. 2.4.2. For apoint p outside of S, there may or may not be a geodesic thatmaximizes the proper distance between p and S. If a maximiz-ing geodesic exists, it must be one of the congruence normalto S. However, maximization is impossible if the congruencehas focal points between p and S. As before, a definition canbe given in terms of Jacobi fields: A point q on a geodesic γnormal to S is called conjugate to S with respect to γ if thereexists a Jacobi field c along γ, such that c = 0 at q but c 6= 0 onS. Similarly to Statement 1 above, one can prove the follow-ing.

Statement 2: A timelike curve γ maximizes the proper dis-tance between a point p and a spacelike 3-surface S if γ is ageodesic normal to S which contains no points conjugate to Sbefore the point p.

The focusing theorem now shows that conjugate pointsmust exist under certain conditions.

Statement 3: Suppose that S is a spacelike 3-surface, u is anormalized timelike vector field orthogonal to S, such thatdivu ≡ −θ0 < 0 at some point p ∈ S, and the strong en-ergy condition holds, Ric(x,x) ≤ 0 for all timelike x. Thenthe geodesic γ normal to S at the point p contains a conjugatepoint to S within proper time 3/θ0 from p.

Under the conditions of Statement 3, the geodesic γ nor-mal to S at p that stretches up to a point q which is furtherthan 3/θ0 from p (as measured along γ), cannot maximize theproper length between q and S. Our conclusion is the follow-ing.

Statement 4: Suppose the conditions of Statement 3 hold,and additionally divu ≤ −θ0 < 0 everywhere on the surfaceS. If a point q can be connected to S by a timelike curve ofproper length ≥ 3/θ0 then there is no timelike curve of maxi-mum length connecting q and S.

Proof: If a timelike curve γ of maximal proper length existsand reaches S at a point p then γ must be a timelike geodesicthat is orthogonal to S at p (or else we can deform γ or shiftthe point p and increase the proper length). However, if theproper length of γ is larger than 3/θ0, there exists a point onγ conjugate to p, and thus γ does not maximize the properlength.

An example of this situation is a half-hyperboloid t2 − r2 =1, t < 0, in the Minkowski spacetime. This spacelike 3-surfaceis orthogonal to every straight line (i.e. to every geodesic)passing through the origin. Thus, the origin is the focal pointfor all the geodesics emitted orthogonally from S. The half-lightcone, t2 − r2 = 0, t < 0, bounds the domain where conju-

94

Page 103: Winitzki. Advanced General Relativity

4.1 Singularity theorems

gate points do not arise. For any point q such that t2 − r2 > 0,t ≤ 0, and for any point in the upper half-spacetime, t > 0,one can find timelike curves of arbitrarily large proper lengthleading from q to some point on S. Thus, the maximal-lengthcurve connecting some point q to S must have proper length< 1 between q and S.

Calculation: Compute the divergence of the future-directednormal vector field u to the half-hyperboloid t2−x2−y2−z2 =1, t < 0, in the Minkowski spacetime, and show that this 3-surface satisfies the conditions of Statement 4 as well as itsconclusion.Solution: The hyperboloid is specified by the equation

g(x,x) = 1, where x ≡ (t, x, y, z) is the position vector. Thefuture-directed normal vector field is

u = − x√

g(x,x),

and the Levi-Civita connection ∇ = ∂, so we have ∇ax = a

and thus (on the surface)

∇au = −∂ax

g(x,x)= −a + xg(a,x).

The divergence is found as

divu = Tr(a,b)g(∇au,b)

= Tr(a,b) [g(a,x)g(b,x) − g(a,b)] = 1 − 4 = −3.

So we may set θ0 = 3 and apply Statement 4. The properlength to the focus is 3/θ0 = 1, in agreement with the geomet-ric result.Finally, we state an analogous theorem for null geodesics.The differences between the timelike and the null case arepurely technical: 1) Jacobi fields for null geodesics n are effec-tively two-dimensional because connecting vectors differingby a multiple of n are equivalent. 2) Null geodesics contain-ing conjugate points do not maximize the proper length be-tween endpoints, thus the maximal proper length is positive,and there is a timelike curve connecting the endpoints. 3) Oneconsiders a spacelike 2-surface to which null geodesics are or-thogonal (note that there are two null directions orthogonalto a spacelike 2-surface, but no null vectors orthogonal to aspacelike 3-surface). 4) One uses the null energy condition in-stead of the strong energy condition. 5) The focusing theoremfor null geodesics predicts focusing within parameter interval2/θ0, instead of 3/θ0 in the timelike case.

Statement 5: Suppose that S is a spacelike 2-surface, n is anull vector field orthogonal to S, such that divn ≡ −θ0 < 0 ata point p ∈ S, and the null energy condition holds, Ric(x,x) ≤0 for all null x. Then the null geodesic γ normal to S at p, withinitial tangent vector n, contains a conjugate point to S withinthe affine parameter interval 2/θ0. If the geodesic γ can beextended beyond the conjugate point to some point q then γdoes not maximize the proper length between p and q, andthus q can be connected to p by a timelike curve.

4.1.5 Singularity in collapsing or expandinguniverse

We shall now derive some basic singularity theorems. Boththeorems below are due to S.W. Hawking. (Stronger theoremsexist but require much more technical work.)

An intuitive picture of a gravitational collapse is that of acloud of particles that keep moving closer to each other. Themathematical expression of this property is that the congru-ence of the particle worldlines has a negative divergence. Ac-cording to the focusing theorem, a consequence of negativedivergence is a focusing of the geodesics. As a result, thegeodesics fail to maximize proper length. This is the technicalbasis of the singularity theorems. Of course, the geodesic fo-cusing by itself does not signal any singularities; we need ad-ditional conditions on the initial velocities of the particles toconclude that singularities will develop. In the first singular-ity theorem, these conditions are formulated as requirementson a spacelike 3-surface to which the particle worldlines areorthogonal.

Recall that a Cauchy surface Σ for a domain is a 3-surfacesuch that any timelike curve within the domain intersectsΣ exactly once. The intuitive meaning of a Cauchy surfaceis that a partial differential equation has a unique solutionthroughout the domain if appropriate initial data are speci-fied on Σ. The theorem states that a Cauchy surface having aneverywhere negative divergence leads to a singularity.

Statement 1: If Σ is a spacelike Cauchy surface for the en-tire spacetime, with a normal vector field uwhose divergenceis everywhere negative-bounded, divu ≤ θ0 < 0, and if thestrong energy condition holds, then every timelike geodesicnormal to Σ has finite proper length no greater than 3/θ0.

Sketch of a proof: Consider a point p to the future of Σ. Ev-ery timelike geodesic leading from p to Σmust intersect Σ ex-actly once. The points of intersection form an open set I(Σ; p)bounded by intersection points with null geodesics from p toΣ. Timelike geodesics intersecting Σ near the boundary ofI(Σ; p) have very small proper length, so we may consider

the subset I(Σ; p;L) ⊂ I(Σ; p) of intersection points for curveswith proper length ≥ L for some lower bound L. We can se-

lect L such that I(Σ; p;L) 6= ∅, and then the subset I(Σ; p;L)is compact. (These intuitively clear topological statementswill be given a formal proof in Lemma 1 below.) Therefore,the proper time is maximized by some geodesic curve leadingfrom p to Σ. However, by Statement 4 in the previous sec-tion, we know that there are no curves maximizing the properlength and stretching further than 3/θ0. Therefore the space-time cannot include points at a proper length greater than 3/θ0from Σ. In other words, every timelike geodesic emitted fromΣmust be future incomplete.

Lemma 1: Let I(Σ; p;L) be the set of intersection points oftimelike geodesics leading from a point p to a Cauchy surfaceΣ, whose proper length is ≥ L, where L > 0 is a given con-

stant. Then I(Σ; p;L) is a compact set.

Proof: Suppose that the set I(Σ; p;L) is not compact. Thenthere exists a sequence γj of geodesics, whose properlengths are ≥ L, such that the corresponding intersectionpoints q1, q2, ..., on Σ constitute a sequence without accumula-tion points within I(Σ; p;L). However, the initial tangent vec-tors γj at p are within the past lightcone within TpM, which isa compact set if we also include the null directions. Hence, thesequence

γjof tangent vectors must have an accumulation

point γ∞ within the past lightcone. We can emit a geodesicγ∞(τ) from p with the initial tangent vector γ∞. If γ∞(τ)intersects Σ at a point q∞ then q∞ must be an accumulationpoint for qj. Hence the proper length of γ∞ cannot be lessthanL and so q∞ ∈ I(Σ; p;L). This yields a contradiction with

95

Page 104: Winitzki. Advanced General Relativity

4 Global techniques

the assumption that qj has no accumulation points withinI(Σ; p;L). Thus γ∞(τ) cannot intersect Σ. Since every time-like curve intersects Σ, we see that the vector γ∞ must be nulland γ∞(τ) is a null geodesic. Since γ∞(τ) does not intersectΣ, for any τ there exists a neighborhood of γ∞(τ) that doesnot intersect Σ. It is then possible to find a timelike curve(which is not necessarily a geodesic, not necessarily startingat p) that runs along γ∞(τ) for all τ and always remains awayfrom Σ. The existence of an inextendible timelike curve thatnever crossesΣ contradicts the definition of a Cauchy surface.This concludes the proof.

An interpretation of the first theorem is the following: Ifwe see a universe that is everywhere contracting (everywherealong a single Cauchy surface), we should expect to hit a sin-gularity within a finite time. By inverting the direction oftime, we also come to the conclusion that a Cauchy surfacewith everywhere positive divergence contains a singularity inits past. Thus, if our universe is everywhere expanding, andif the strong or the null energy condition is satisfied, then theuniverse must have begun at a singularity at a finite time inthe past.

Self-test question: The half-hyperboloid t2−x2−y2−z2 = 1,t < 0, in the Minkowski spacetime, is an infinite spacelike sur-face whose normal vector x has an everywhere negative di-vergence. Does it follow from Statement 1 that the Minkowskispacetime must contain a singularity to the future of the half-hyperboloid?

Answer: No. (The half-hyperboloid is not a Cauchy surface.)

4.1.6 Singularity in a closed universe

The assumption that a given 3-surface is a Cauchy surfacefor the spacetime is quite strong, since it is a global relation-ship between Σ and the entire spacetime, and not merely alocal property of the surface Σ. One can replace this assump-tion by the requirement that Σ be compact (this will apply toclosed universes). The result will be the incompleteness ofsome geodesics emitted from Σ, which may fill only a part ofthe entire spacetime. Here is how this conclusion can be de-rived.

We restrict our attention to the part of the spacetime con-sisting of points p for which there exists a timelike curve γbetween Σ and p. This part of the spacetime is called the do-main of dependenceD(Σ) of the surface Σ. We would like toprove that D(Σ) contains some incomplete geodesics if Σ sat-isfies the conditions of Statement 1 above, except for being aCauchy surface. It follows from the proof of Statement 1 that,within D(Σ), a geodesic γ connecting Σ to a point p at properlength > 3/θ0 does not maximize the proper length. There-fore, any geodesic that doesmaximize the proper length mustbe shorter than 3/θ0.

Let us denote by C(Σ) ≡ ∂D(Σ) the boundary of the do-main of dependence; it is a 3-surface called the Cauchy hori-zon. It can be seen that the Cauchy horizon C(Σ) is locallya null surface (although it may also contain some “corners”).The fact that a boundary ∂D of a domain of dependence D isa null 3-surface will be derived below (see the proof of State-ment 1 in Sec. 4.2). There are now two interesting possibili-ties: either C(Σ) is compact, or it is not compact. Suppose firstthat C(Σ) is a compact set. Note that a null surface is gen-erated by null geodesics (see Sec. 2.2.8). Then the null gen-erators of C(Σ) must wind around it, coming arbitrarily near

to themselves infinitely many times; informally, we may callsuch curves “almost closed” curves. Since timelike geodesicscan be found arbitrarily close to null geodesics, the presenceof “almost closed” null geodesics is an undesirable feature ofthe spacetime, similar to a causality violation or the presenceof a time machine. A spacetime is called strongly causal if itdoes not admit timelike curves that repeatedly come arbitrar-ily close to some points. Therefore, C(Σ) cannot be compact ina strongly causal spacetime. Now suppose C(Σ) is not com-pact; then C(Σ) contains a sequence of points q1, q2, ..., whichdoes not have an accumulation point within C(Σ). We maychoose a sequence of points qj arbitrarily close to qj butwithin D(Σ). Then, every point qj lies on at least one time-like curve intersecting Σ. Suppose that L is the length of thatcurve; then the set of all timelike curves with length ≥ L iscompact, so one of these curves has the largest proper length.So, for each point qj we have a timelike geodesic γj(τ) that hasthe maximal proper length among all the timelike curves con-necting qj with Σ. (We have already shown that each of thesecurves has proper length τmax < 3/θ0, assuming that γj(0) ison Σ.) Let pj be the point where the curve γj intersects Σ, andlet uj ≡ γj(0) be the initial tangent vector. The intersectionpoints p1, p2, ..., and the corresponding initial tangent vectorsu1, u2, ..., are sequences on a compact set Σ and thus mustaccumulate at some point p∞ and vector u∞. Now, we canshow that the geodesic γ∞(τ) emitted from Σ at p∞ with theinitial tangent vector u∞ cannot be extended past the propertime 3/θ0. If γ∞(τ) were well-defined for τ > 3/θ0 then wewould have a narrow tube-shaped neighborhood stretchingalong γ∞(τ) for 0 < τ ≤ 3/θ0, such that infinitely manygeodesics γj are inside that tube. Thus the point sequencesqj and therefore qj would accumulate somewhere withinthe (compact) neighborhood of γ∞. This contradicts the as-sumption that the sequence qj has no accumulation points.Thus we have proved the following theorem.

Statement 2: If a spacetime contains a compact spacelikesurface Σ (without boundary) having a normal vector u suchthat divu < 0 everywhere on Σ (thus there exists θ0 < 0 suchthat divu ≤ θ0), and if the strong energy condition holds, andif the spacetime is strongly causal6 (contains no almost-closednull geodesics), then there is at least one incomplete timelikegeodesic emitted from Σ.

A universe can contain a compact, “edgeless” spacelike 3-surface only if the universe is closed. In open universes, everyspacelike 3-surface without boundary must stretch to infinityand so cannot be compact. The interpretation of the secondsingularity theorem is that a closed universe must contain asingularity if there exists an everywhere contracting spacelikesection.

Note that this theorem demonstrates only the existence ofa single incomplete timelike geodesic, whereas Statement 1of the previous section showed (under stronger assumptions)that every timelike geodesic is incomplete. The presence of in-complete geodesics emitted from Σ means that the spacetimecontains a singularity somewhere to the future of Σ.

Analogous statements can be derived for null geodesics us-ing the null energy condition.

6S. W. Hawking also proved a version of this theorem without the strongcausality condition. Reference?

96

Page 105: Winitzki. Advanced General Relativity

4.1 Singularity theorems

4.1.7 Singularity in gravitational collapse

Qualitatively, gravitational collapse occurs when a mass m isconcentrated within a region smaller than the Schwarzschildradius Rs ≡ 2m. The collapse results in a singularity, regard-less of the composition of the collapsing matter.In this section we shall consider a singularity theorem dueto Penrose, showing that the formation of a singularity is in-evitable once the normally “outgoing” lightrays start to con-verge. Let us investigate this condition in more detail.There exist precisely two null directions orthogonal to a2-dimensional subspace spanned by two spacelike vectors.Thus, a spacelike 2-surface can emit two congruences of ligh-trays orthogonal to it. Normally, one congruence will be con-verging (the “ingoing” lightrays) and the other diverging (the“outgoing” lightrays). A 2-surface T is called a trapped sur-face if both congruences of lightrays emitted orthogonally toT into the future have a negative divergence.Penrose’s argument begins by considering a sphericallysymmetric distribution of matter. Suppose at first that a massm is concentrated within a region of radius r0 < 2m in aspherically symmetric fashion. Then the metric outside of thematter distribution is Schwarzschild, and it is easy to see thatat points r such that r0 < r < 2m, both lightcones emittedinto the future direction are converging (have negative diver-gence). Thus a sphere of radius r1 is a compact, spacelike,trapped 2-surface for r0 < r1 < 2m. We shall shortly provethat the existence of such a trapped 2-surface indicates thepresence of a singularity in the future. Now, if the matterdistribution is slightly nonspherical, the sphere will remaina trapped surface because the divergence of the lightrays willbe only slightly perturbed and will not become positive. Thusa singularity arises even for asymmetric configurations of col-lapsing matter.The precise formulation of the theorem is the following.

Statement 1:7 If a time-orientable spacetime manifold Mhas a non-compact Cauchy surface C, if the null energy con-dition holds to the future of C, and if there exists a compact,spacelike, trapped 2-surface T , then there exists at least oneincomplete null geodesic to the future of T .

Proof: Consider the “future subdomain” F(T ) of the surfaceT ; this subdomain consists of all the points p that can be con-nected to T by some timelike curve. The proof of the theoremis based on studying the properties of the boundary ∂F(T )of the subdomain F(T ). By considering geodesics near T , itis easy to see that the 3-surface ∂F starts from T as a nullsurface made by null geodesics emitted orthogonally from T .Since T is compact, the everywhere negative divergence ofnull geodesics emitted from T has an upper bound −θ0 < 0.Thus, by the focussing theorem, every null geodesic will havea conjugate point to T within a finite interval of the affineparameter (not larger than 2θ−1

0 ). Let us assume that everynull geodesic to the future of T is complete; in particular, ex-tendable beyond the parameter interval 2θ−1

0 . We know thata geodesic does not maximize proper length beyond a conju-gate point; therefore, points p on null geodesics beyond 2θ−1

0

can be connected with T by a timelike geodesic with positiveproper length. Thus, points on null geodesics beyond 2θ−1

0

are insideF(T ), and so the boundary ∂F consists of finite seg-ments of null geodesics, each segment not longer than 2θ−1

0 .Since T is compact, the set ∂F can be viewed as a subset of a7[12], §8.2, Theorem 1. See also the paper [24].

compact set T × [0, 2θ−10 ], therefore ∂F is itself compact. Since

M is time-orientable, there exists a smooth vector field v suchthat g(v,v) = 1. By definition of the Cauchy surface, everyflow line of v intersects C exactly once. Also, a flow line ofv can intersect the 3-surface ∂F at most once; this followsfrom the definition of ∂F as the boundary of the set F(T ).Thus, the flow lines of v define a map from ∂F to C: a pointp ∈ ∂F is mapped into the intersection of the correspondingtimelike curve with C. This map is continuous and one-to-onebetween ∂F and its image; however, ∂F is compact while C isnot, which is an impossible situation. (The image of ∂F mustbe a compact subset of C, so it must have a nonempty bound-ary within C which will be mapped to a boundary of ∂F , andyet the boundary of ∂F is empty since ∂∂ = 0.) Thereforewe must reject the assumption that all null geodesics can beextended beyond 2θ−1

0 ; in other words, there must exist anincomplete null geodesic to the future of T .

Calculation: Using the Schwarzschild metric (1.36), com-pute the affine tangent vector n for radial null geodesics emit-ted “outwards” from a sphere of radius r. Show that the di-vergence of n is ∼ r−1 for r > 2m, zero for r = 2m, and∼ −r−1 for r < 2m.Hint: In the Schwarzschild metric, a radial null geodesichas an affine tangent vector of the form n = α1∂t + α2∂r,where α1,2 are some functions of r only. Lightrays emitted“outwards” at r = 2m have the tangent vector in the direc-tion ∂t. For r < 2m, the radial coordinate has the meaning of“time,” and the “future” is the direction of decreasing values ofr (because, e.g., the proper time of a timelike observer fallinginto the black hole will increase with decreasing r).Solution:We beginwith the case r = 2m. The Schwarzschildmetric is ill-defined at r = 2m, so we shall use a geometric ar-gument instead of explicit calculations: The null vector ∂t isdivergence-free because the cross-section area of the null sur-face r = 2m is 4πr2 and thus remains constant along ∂t (seeSec. 2.1.2 and the example in Sec. 2.2.5). Now consider thecase r 6= 2m. To compute the divergence of n, it is conve-nient to use a basis l,n, eθ, eφ, where l and n are null andg(l,n) = 1, while the spacelike basis vectors are eθ = r−1∂θ ,eφ = (r sin θ)−1∂φ, in spherical coordinates. The vector n

should be an “outward” radial geodesic while l points alongthe “inward” lightray (but is not necessarily an affine tangentvector to it). After we compute the vectors n and l, it willfollow that

divn = Tr(x,y)g(∇xn,y)

= g(∇nn, l) + g(∇ln,n) − g(∇eθn, eθ) − g(∇eφ

n, eφ)

= g([n,eθ], eθ) + g([n,eφ], eφ).

Now let us determine n and l. Denote by

z(r) ≡(

1 − 2m

r

)1/2

the redshift factor for the Schwarzschild spacetime (seeSec. 3.1.2). Assuming that

n = α1∂t + α2∂r, l = β1∂t + β2∂r,

where α1,2 and β1,2 are unknown scalar functions, we use theconditions

g(l, l) = g(n,n) = 0, g(l,n) = 1,

97

Page 106: Winitzki. Advanced General Relativity

4 Global techniques

and find

z2α21 =

1

z2α2

2, , z2β21 =

1

z2β2

2 ,

z2α1β1 −1

z2α2β2 = 1;

thus α1,2 and β1,2 may be chosen as functions only of r. Fur-ther, we have

[l,n] = β2 (α′1∂t + α′

2∂r) − α2 (β′1∂t + β′

2∂r) ,

where the prime ′ denotes derivatives with respect to r. Usingthe condition

0 = g(∇nn, l) = −g(n,∇nl) = g(n, [l,n]),

we find

z2α1 (α′1β2 − α2β

′1) −

1

z2α2 (α′

2β2 − α2β′2) = 0.

Thus the four equations determine the unknown functionsα1,2, β1,2. A solution is (up to a constant)

n =1

z2∂t ± ∂r, l =

1

2

(∂t ∓ z2∂r

),

where ± corresponds to the sign of z2. Finally, we computethe commutators

[n, eθ] = ±[∂r, eθ] = ±[∂r,1

r∂θ] = ∓1

reθ,

[n, eφ] = ±[∂r,1

r sin θ∂φ] = ∓1

reφ,

and therefore

divn = ±2

r.

This means that the divergence of an “outward”-pointing,future-directed null congruence is positive for r > 2m andnegative for r < 2m.

4.2 Hawking’s area theorem

We have seen in Sec. 2.2.5 that the event horizon of aSchwarzschild black hole is a null surface. In more generalsituations, e.g. in the presence of several black holes that maymove with respect to each other or even merge with eachother, the event horizon is still a null surface. In this sectionwe shall study the global properties of event horizons.Consider an asymptotically flat spacetime that containssome points from which signals cannot escape to infinity.8

Heuristically, the set B of all such points is interpreted as a“black hole-like” region. For a point p 6∈ B, there exists atimelike curve connecting p to infinity; this curve may be per-turbed while remaining timelike, thus there exists a neighbor-hood of p which is not in B. Thus the set B is topologicallyclosed. The event horizon H is defined as the boundary ofB; since B is closed, we have H ⊂ B. The horizon H is a 3-surface that separates B from the domain of spacetime fromwhich timelike curves can escape to infinity (and hence, null

8Note that the notion of “escaping to infinity” is well-defined if the space-time is asymptotically flat (a “neighborhood of infinity” is the place wherethe spacetime becomes almost Minkowski, i.e. a neighborhood of I +

on a conformal diagram). Generalizations of this notion exist for non-asymptotically flat cases, such as the de Sitter spacetime.

curves can escape as well). Thus, the horizon H can also bethought of as the boundary of the past domain of dependenceof the future null infinity I +. Let us study the properties ofH in more detail.The first important property of an event horizon is statedby a theorem due to R. Penrose.

Statement 1: An event horizon is a null surface whose gen-erators are null geodesics without future endpoints, except ifa generator hits a singularity. (Generators cannot leaveH andenter the interior of B.)Proof: By construction, there cannot exist any future-directed timelike curves crossing H from its interior B out-wards; also, for any point p 6∈ B there should exist at leastone timelike curve not crossing H. It follows from these re-quirements that the 3-surface H cannot be locally timelike(i.e. spanned by one timelike and two spacelike directions).Also,H cannot be spacelike, because a spacelike 3-surface sep-arates a past subdomain from a future subdomain, so the fu-ture of H would have to be within B, but then some points tothe past ofHwill be unable to emit signals that do not enter B.Therefore, H must be a null surface, except for points whereH has a “corner” so the tangent 3-space to H is undefined. Anull surface locally separates a past subdomain from a futuresubdomain, and it is clear that B must be on the future sidefrom H. We also know that null surface is generated by nullgeodesics (see Sec. 2.2.8). Suppose that a null generator γ hadan endpoint p, and consider the causal structure of the tan-gent space at p. We are assuming that p is not a singularity, sothat γ can be continued past p. It is clear that γ cannot exit B;thus, since γ leaves H, it must enter the interior of B. Thereexists a point p′ 6∈ B infinitesimally displaced to the past ofγ, and there exists at least one null curve γ′ starting from p′

which escapes to infinity (or else we would have no timelikecurves from p′ escaping to infinity, contradicting the assump-tion p′ 6∈ B). Therefore, γ′ can never cross H. However, thecurve γ′ is arbitrarily close to γ, hence it enters B. This contra-diction proves that γ cannot have an endpoint.

The condition that singularities do not occur at the hori-zon is motivated as follows. We expect that the future historyof an asymptotically flat spacetime is completely predictablefrom initial data on a Cauchy 3-surface C preceding the grav-itational collapse. Thus, any null curve escaping to infinitymust intersect C at one point. (This condition is called asymp-totic predictabilitywith respect to C.) If a null generator γ ofa horizon H stops at a singularity p, it means that the curveγ cannot be continued past p. Since there exist null curvesγ′ 6⊂ B infinitely close to γ and reaching infinity, it follows thatthere exists a (null and/or spacelike) curve γ′′ that “starts”at the singularity p and reaches infinity. To an observer atinfinity, this curve appears to be a past-directed curve thatstops at p. Thus, the curve γ′′ does not intersect C and theasymptotic behavior at infinity in the direction of γ′′ cannotbe predicted from initial data on C. Another way to expressthis is to say that the singularity is “directly visible” to an ob-server at infinity. There is a (so far, not proved but stronglysupported) conjecture that any singularities that can occur inreasonable physical processes, such as gravitational collapse,must be “hidden” behind event horizons and are thus invisi-ble to observers at infinity. This statement is called the cosmiccensorship conjecture (“nature abhors naked singularities”).There are some artificial examples of spacetimeswith “naked”singularities, but all such examples are pathological in one or

98

Page 107: Winitzki. Advanced General Relativity

4.3 Holographic principle

another way. Therefore, the assumption of asymptotic pre-dictability appears reasonable.Let us now examine the area of the event horizon. To beprecise, we need to define what it means to consider an eventhorizon H “at a certain time.” Let us select a global space-like Cauchy 3-surface C in the entire spacetime. The surfaceC may be interpreted as the surface of constant “time.” Theintersection B ∩ C of the “black hole domain” B with C willbe the “black hole interior at a given time.” (The intersectionB∩Cmay consist of several disconnected pieces, in which casewe would say that there are several black holes at this time.)The boundary ∂(B ∩ C) will be a spacelike 2-surface H ∩ C;this is the “total event horizon at a given time.” We have seenin Sec. 2.1.2 that the cross-section area A of a null 3-surface isdefined independently of the observer’s reference frame, andsatisfies Eq. (2.6),

∇nA = (divn)A,

if the null 3-surface has generators n. In Sec. 2.2.5 we showedthat the Schwarzschild horizon is a null surface whose gen-erators are n = ∂t. Since the total area of the Schwarzschildhorizon in the stationary reference frame is time-independent,A = 4πr2, the field ∂t has zero divergence. In a general,nonstationary spacetime containing black holes, the area ofa horizon Hmay change with time (i.e. with the choice of theCauchy surface C), and thus the divergence of the null gener-ators n of H may be nonzero. However, it turns out that thedivergence of n cannot be negative, so the area cannot dimin-ish with time. This is the essential content of the area theo-rem due to S. W. Hawking. Let us first derive the propertydivn ≥ 0 and then prove the area theorem.

Statement 2: The divergence of null generators of an eventhorizon H is everywhere nonnegative, if the null energy con-dition is satisfied and the spacetime is asymptotically pre-dictable.Proof: We use the notation from the proof of Statement 1. Ifdivn < 0 at a point p of H then divn < 0 also within a neigh-borhood of p. By the focusing theorem, the null generatorsof H emitted from the neighborhood of p will focus within afinite interval of the affine parameter. After focusing, somepoint q on a null geodesic γ emitted from p will be reachablefrom p by a timelike curve. A timelike curve starting at pmustbe completely within B (or else this curve will escape to infin-ity, but p ∈ B). Thus, the point qwill be inside B and not on thehorizon surface H any more. Thus, a null geodesic γ emittedfrom pwill have to enter B within a finite interval of the affineparameter; the part of γ within H will then have an endpointwhere γ enters the interior of B. This contradicts Statement 1.

Statement 3: The total area of all the event horizons can-not diminish with time, under the conditions of Statements 1and 2. In other words: For a Cauchy surface C1 and anotherCauchy surface C2 to the future of C1, the area of the intersec-tion does not decrease,A[H ∩ C2] ≥ A[H ∩ C1].Proof: Consider first the intersection H ∩ C1. A portionof H between C1 and C2 will be generated by null geodesicsemitted from H ∩ C1 to the future. Since each generator hasno endpoints on H, it must intersect C2 by the Cauchy sur-face property of C2. The intersection of the generators withC2 will have an area not smaller than A[H ∩ C1], since thesegeodesics have everywhere nonnegative divergence. There isalso a possibility that other disconnected pieces ofHwill haveformed between C1 and C2; this would only increase the total

area A[H ∩ C2]. Therefore,A[H ∩ C2] ≥ A[H ∩ C1].The area theorem has had a wide-reaching impact on all offundamental physics. It has lead to the hypothesis of blackhole entropy,

S =1

4A = 4πm2,

and to the holographic principle. Let us consider a simple ap-plication of the area theorem. In an asymptotically flat space-time containing two distant black holes with masses m1 andm2, there may be some process that lets the two black holescome closer together and eventually merge. Initially, the totalarea of the event horizon of this spacetime is 16π(m2

1 + m22).

After merging, the spacetime contains a larger black hole ofmass M , with the area of the event horizon 16πM2. Thearea theorem says that 16πM2 ≥ 16π(m2

1 + m22) regardless

of the details of the merger, and regardless of whether anyextra matter or radiation was absorbed or emitted in the pro-cess of merging. This inequality sets a fundamental bound onthe amount of energy that can be extracted from a black holemerger.

4.3 Holographic principle

See the review [5].

In the lecture, I covered: general ideas behind the General-ized Second Law of thermodynamics, Bekenstein’s black holeentropy, Susskind’s collapse argument for spacelike bound,problemswith spacelike bounds (closed universes, expandinguniverses, almost-null spacelike surfaces), Bousso’s covariantentropy bound formulated using lightsheets, and Bousso’sprojection theorem.A rough map of ideas: Black holes must contain entropyfor the 2nd law to hold. Hawking’s area theorem supportsthe identification of entropy and the horizon area. Assumingthat black hole is the highest-entropy state, we obtain a boundon the entropy of matter within a (spacelike) 2-sphere. Sinceentropy counts the internal states, there is “enough space”on the boundary of a region to encode all the possible stateswithin the region; hence the term “holography.” Bousso’s co-variant entropy bound is a generalization of this bound thatremains valid in more complicated situations, and yields thespacelike bound as a particular case if appropriate conditionshold. Presently, the holographic principle has the status of astrongly supported conjecture.

99

Page 108: Winitzki. Advanced General Relativity
Page 109: Winitzki. Advanced General Relativity

5 Variational principle

Additional literature: [34].

Under variational principles we understand considera-tions based on the variation of functionals. The commonlyused variational principle states that the equation of motionfor the fields are derived by extremizing a certain functionalof the fields called the action or the Lagrangian. The Hamil-tonian formulation of field dynamics is then derived from theLagrangian formulation. In this chapter we shall examine La-grangian and Hamiltonian formulations of general relativityand explore some related issues.

5.1 Lagrangian formulation

5.1.1 Classical field theory

Classical field theory describes fields, i.e. functions on space-timeM or sections of some vector bundle with the spacetimeM as the base. The fields under consideration may be scalarfields φ1, φ2, ...; vector fields, spinor fields, and so on. Let usdenote all the fields collectively by φj , where j is an index enu-merating each individual field and each of their components.Field theory is based on the variational principle (or Hamil-ton’s principle of stationary of action, or action principle forshort): The field equations are the conditions for having a lo-cal extremum of the action functional,

S [φ] =

M

d4xL (φi,∇µφi, ...) (5.1)

where the Lagrangian density L depends on the field φj andits (covariant) derivatives.

The action principle states that a physically realized config-uration φj(x) of a field φ must be an extremum of the actionfunctional. The variation of the action under a small changeδφj(x) of fields φj(x) is

δS[φ] =

M

d4x∑

j

δS

δφj(x)δφj(x) +O

([δφj]2)

.

This yields the Euler-Lagrange equation for the field,

δS

δφj(x)= 0.

This method of deriving field equations is called a Lagrangianformulation of a field theory.

The currently established field theories (electrodynamics,gravitation, weak and strong interactions) are described byLagrangian densities which depend only on the fields andtheir first derivatives, L = L

(φj , φj,µ

). For such Lagrangians,

the first-order variation of the action is given by the formula

δS =

M

d4x∑

j

(

∂L(φi,∇φi

)

∂φj− ∂

∂xµ∂L(φi,∇φi

)

∂φj,µ

)

δφj(x),

where the summation over µ is implied. The boundary termsvanish if

j

∂L

∂φj,µδφj → 0 sufficiently rapidly as |x| → ∞, |t| → ∞,

which is the usual assumption. Thus we obtain the followingEuler-Lagrange equations for the fields φj ,

δS [φ]

δφj(x)≡ ∂L

(φi,∇φi

)

∂φj− ∂

∂xµ∂L(φi,∇φi

)

∂φj,µ= 0. (5.2)

The formula (5.2) holds for all Lagrangians that dependon fields and their first derivatives. If a Lagrangian for afield φ contains second-order derivatives such as φ;µν , the cor-responding Euler-Lagrange equations will generally containderivatives of third and fourth order.If a classical field theory is to be compatiblewith general rel-ativity, the Lagrangian must be generally covariant. In prac-tice, this means that the Lagrangian should be an invariantcombination of covariant derivatives of fields and the metricgµν . The integration must be performed with the volume ele-ment d4x

√−g, where g ≡ det (gµν) is the determinant of thecovariant metric tensor. Normally, this factor is included intothe Lagrangian L.The simplest example of a field is a scalar field. The La-grangian density for a real-valued scalar field φ(x) is

L (φ, ∂µφ) =

[1

2gµνφ,µφ,ν − V (φ)

]√−g, (5.3)

where V (φ) is a potential that describes self-interaction of thefield.

Covariant volume element

The expression d4x does not give the correct volume ele-ment if the coordinates x are not Cartesian or if the space-time is curved. It is not difficult to show that the volume el-ement in any number of dimensions is given by the formula

dV = dnx√

|g(x)|. If u1, ..., un are some vectors in a Euclideanspace, let Gij ≡ ui · uj be the n × n matrix of their pairwisescalar products. Then the volume of the n-dimensional paral-

lelepiped spanned by the vectors ui is V =√

|detG|. We canprove this standard statement by considering the matrix U jiof components of the vectors ui in an orthonormal basis ej,i.e.

ui =∑

j

U ji ej .

The matrix U can be also understood as a linear transforma-tion that maps the unit parallelepiped spanned by ej to aparallelepiped spanned by ui. A standard definition of thedeterminant of a linear transformation is the volume of theimage of a unit parallelepiped after the transformation. There-fore, the volume V is V = detU . Thenwe observe that thema-trix Gij satisfies G = UTU , therefore detG = (detU)2 = V 2

and V =√

|detG|.

101

Page 110: Winitzki. Advanced General Relativity

5 Variational principle

In general relativity, the spacetime has a metric with signa-ture (+,−,−,−) and the determinant det gµν is always neg-ative (except at singular points where it may be zero or infi-nite). Therefore we change the sign of g and write the volumeelement as d4x

√−g.

Minimal coupling to gravity

The action for a scalar field,

S[φ] =

M

d4x√−g

[1

2gµνφ,µφ,ν − V (φ)

]

, (5.4)

explicitly depends on gµν and thus describes a field coupledto gravity. This form of coupling is called the minimal cou-pling to gravity; this is the minimal required interaction of afield with gravitation which necessarily follows from the re-quirement of compatibility with general relativity.The Euler-Lagrange equation for a minimally coupled field

φ follows from the action (5.4),

∂α∂L

∂φ,α− ∂L

∂φ=(√−ggαβφ,β

)

,α+∂V

∂φ= 0.

This equation can be rewritten in a manifestly covariant formas

φ;α;α +

∂V

∂φ= 0.

This is similar to the Klein-Gordon equation, φ,α,α +m2φ = 0,except for the presence of covariant derivatives.A free (i.e. noninteracting) field has the potential

V (φ) =1

2m2φ2.

This is the simplest nontrivial potential; an additional linear

term Aφ can be removed by a field redefinition φ(x) = φ(x) +φ0. The parameterm is the rest mass of the particles describedby the field φ in quantum theory. The equation of motion for afree field φ is linear and thus describes “waves” that can crosswithout distorting each other. In other words, the field φ hasno self-interaction. A field would have self-interaction if thepotential V (φ) were such that V ′′′ 6= 0, so that the equation ofmotion would be nonlinear.

Gauss’s law with covariant derivatives

When computing the variation of a generally covariant actionsuch as the action (5.4), one needs to integrate by parts. Auseful shortcut in such calculations is an analog of Gauss’slaw with covariant derivatives.The covariant divergence of a vector fieldAµ can be writtenas

Aµ;µ =1√−g ∂µ

(√−gAµ).

The formula analogous to the Gauss law is

V

Aµ;µ

√−gdnx =

∂V

√−hAµdn−1Sµ,

where h is the partial metric on the hypersurface ∂V anddn−1Sµ is the oriented element of the (n− 1)-volume on thehypersurface. Using this formula, we may reduce volume in-tegrals of total divergences to boundary integrals. Assuming

that the contribution of the boundary terms vanishes, we ob-tain ∫

dnx√−gAµ;µB = −

dnx√−gAµB,µ. (5.5)

This formula can be used to integrate by parts: We set Aµ ≡φ,αg

αµ, B ≡ ψ, and find

d4x√−gφ,αψ,βgαβ = −

d4x√−g

(φ,αg

αβ)

;βψ.

Note that the covariant derivative of the metric is zero, g ;ααβ =

0, so we may lower or raise the indices under covariantderivatives at will; for example, Aµ;µ = A ;µ

µ .

5.1.2 Einstein-Hilbert action

Given a metric g on a manifoldM, one can compute the Rie-mann tensor Rκλµν and the Ricci tensor Rµν . Einstein postu-lated that the metric in a spacetimemust be such that the Riccitensor satisfies the Einstein equation

Rµν −1

2gµνR = −8πGTµν ,

where R ≡ Rµνgµν and Tµν is the energy-momentum tensor

of matter. This equation can be derived from a variationalprinciple by extremizing an appropriately chosen action func-tional.We shall begin with the vacuum Einstein equation (i.e. theEinstein equation in the absence of matter, Tµν = 0),

Rµν −1

2gµνR = 0. (5.6)

This equation can be derived from the Einstein-Hilbert ac-tion,

SEH [g] ≡ 1

16πG

M

R√−gd4x,

1

16πG(5.7)

where G is Newton’s gravitational constant and√−g ≡

√− det gµν , so that

√−gd4x is the covariant volume element.The integration in Eq. (5.7) is performed over the entire space-timeM. Such an integral might diverge when the manifoldM is noncompact and if the curvatureR does not vanish at in-finity, in which case one should limit the integration to a very

large but finite subdomain M ⊂ M.In the framework of the variational principle, one considersa small change in the metric,

gµν → gµν + λhµν ,

where λ is a “small” number and hµν is an arbitrary symmet-ric tensor field, which is assumed to vanish outside of the sub-domain M. The first-order variation of the EH action SEH [g]is a linear functional of hµν(p) that can be defined as

M

dnp hµν(p)δSEH [g]

δgµν(p)≡ d

∣∣∣∣λ=0

SEH [g + λh].

Note that the integration above is performed over all pointsp ∈ M and involves the coordinate-based volume elementdnp rather than the invariant volume element

√−gdnx. Thepoint p plays the role of a label, much like the indices µ, ν inhµν(p), and thus a simple “summation” over all the points pis needed.

102

Page 111: Winitzki. Advanced General Relativity

5.1 Lagrangian formulation

The expression

d

∣∣∣∣λ=0

SEH [g + λh]

can be interpreted as the “directional derivative” of the actionfunctional SEH [g] in the “direction” of hµν(p). (I put the word“direction” in quotes because hµν(p) is a tensor-valued func-tion on the manifold that describes an arbitrary modificationof the metric gµν(p) at every point.) The action SEH is ex-tremized if its derivative vanishes in every “direction,” i.e. ifδSEH/δgµν(p) = 0. This yields an equation for g. This methodof deriving the Einstein equation is the Lagrangian formula-tion of general relativity.Let us adopt the following, somewhat more abstract ge-ometric picture of the variational principle. The set of allthe possible metrics gµν(p) can be considered as an (infinite-dimensional) manifold G, every individual metric gµν being asingle “point” of G. The Einstein-Hilbert functional SEH [g] isthen a real-valued scalar “function” on this manifold, SEH :G → R. As usual, the tangent space TgG to the manifold G isthe set of all directional derivatives at a “point” gµν . Thus, atangent vector

h ≡∫

M

dnp hµν(p)δ

δgµν(p)

defines the derivative in a particular “direction” hµν . The tan-gent vector h acts on the “function” SEH [g], yielding

h SEH ≡∫

M

dnp hµν(p)δSEH [g]

δgµν(p),

which is the first-order variation of SEH in the direction hµν .This can be also written as

M

dnp hµν(p)δSEH [g]

δgµν(p)≡ (dSEH [g]) h,

where dSEH [g] is the naturally defined 1-form on G, the “gra-dient” of SEH in the space of metrics. The “components” ofthe 1-form dSEH [g] are δSEH/δgµν(p), where we interpret pas an “index” on par with µ, ν. The action SEH has a (local)extremum at a particular “point” gµν if the 1-form dSEH [g] isequal to zero at gµν .In the rest of this section, we shall show, by a direct calcula-tion, that the condition δSEH/δgµν(p) = 0 is equivalent to thevacuum Einstein equation.It is convenient to compute the first-order variation of

SEH [g] with respect to the inverse metric gµν rather than withrespect to gµν . We need to find the variation of

√−g andR ≡ gµνRµν , where Rµν is the Ricci tensor, under an infinites-imal change gµν → gµν + δgµν . The calculation of δ

√−g iseasy if we use the standard formula for the derivative of thedeterminant of a matrix (note that

√−g =√− det(gµν), while

det(gµν) = 1/ det(gµν) = g−1):

∂√−g∂gµν

= −1

2gµν

√−g.

Calculation: Derive the formula

∂AjkdetA = (detA)

(A−1

)

jk,

where A ≡ Ajk is a finite-dimensional square matrix and A−1

is the inverse matrix.

Hint: Show that

det(1 + λB

)= 1 + λTr B +O(λ2),

where 1 is the identity matrix, λ is “small enough,” and B isan arbitrary matrix.The variation of SEH [g] can be split into three terms,

δSEH [g] = δ

M

d4pRµνgµν√−g

=

M

d4p√−g

[

gµνδRµν +Rµνδgµν − 1

2Rgµνδg

µν

]

.

Shortly, we shall show that the first term in parentheses above,∫

M

d4p√−ggµνδRµν , (5.8)

vanishes (under suitable boundary conditions) because it is atotal divergence. Thus, we shall have

δSEH [g] =

M

d4p√−g

(

Rµν −1

2Rgµν

)

δgµν ,

which is equivalent to

δSEH [g]

δgµν(p)= Rµν −

1

2Rgµν . (5.9)

So the condition δSEH/δgµν = 0 is equivalent to the vacuum

Einstein equation (5.6).It remains to analyze the term (5.8). We shall now use theindex-free notation for clarity. The term of interest is

gµνδRµν = Tr(x,y)δRic(x,y). (5.10)

The Ricci tensor is a contraction of the Riemann tensor,

Ric(x,y) = Tr(a,b)R(x,a,y,b).

To determine the variation of the Riemann tensor, we needto consider the change in the Levi-Civita connection, ∇ → ∇,under g → g = g+δg. It is easy to show that the difference ∇−∇ between any two connections does not contain derivatives,

(∇ − ∇) (fx) = f · (∇ − ∇)x,

and hence acts as a transformation-valued 1-form δΓ,

∇ux −∇ux ≡ δΓ(u)x.

An explicit formula for δΓ can be derived (see the Calculationbelow) but is not required for the present derivation.Now we use the definition (1.62) to express the variation of

the Riemann tensor through δΓ,

δR(u,v)w = [δΓ(u),∇v]w + [∇u, δΓ(v)] w − δΓ([u,v])w

= (∇uδΓ)(v)w − (∇vδΓ)(u)w. (5.11)

Calculation: Verify Eq. (5.11). Details: omitted.The variation of the Ricci tensor is found1 as a trace of thevariation of the Riemann tensor,

δRic(x,y) = Tr(a,b)g(δR(x,a)y,b)

= Tr(a,b)g (b, (∇xδΓ)(a)y − (∇aδΓ)(x)y) .

1From this point and until the end of the section, index-free computationsinvolve traces and are more cumbersome than computations in the indexnotation. However, I would like to show how one could manage a com-plete calculation without indices. Below, an equivalent computation usingindices is shown as well.

103

Page 112: Winitzki. Advanced General Relativity

5 Variational principle

We cannot simplify the last expression any further, until weconsider the trace (5.10),

Tr(x,y)(a,b)g (b, (∇xδΓ)(a)y) − Tr(x,y)(a,b)g (b, (∇aδΓ)(x)y) .(5.12)

The tensor δΓ has rank (1, 2) and can be contracted in twodifferent ways, yielding two vector fields q1,2 defined by

q1 ≡ Tr(x,y)δΓ(x)y,

g(q2,y) ≡ Tr(a,b)g(b, δΓ(a)y), for all y.

The vector fields q1,2 involve derivatives of the metric pertur-bation δg. In the index notation, we have

(δΓ (x)y)µ ≡ δΓµαβx

αyβ; qµ1 ≡ δΓµαβgαβ , qµ2 ≡ gµνδΓααν .

Now we use the properties of the trace (see Sec. 1.7.3) andthe fact that∇ can be interchanged with g and with the “mutevectors” a,b,x,y appearing under the trace operation. Thefirst term in Eq. (5.12) is expressed as

Tr(x,y)∇xTr(a,b)g(b, δΓ(a)y) = Tr(x,y)g(∇xq2,y) ≡ div q2,

and the second term as

Tr(x,y)(a,b)g (b, (∇aδΓ) (x)y) = Tr(a,b)g(b,∇aTr(x,y)δΓ(x)y

)

=Tr(a,b)g (b,∇aq1) ≡ div q1.

Hence,gµνδRµν = div (q2 − q1) ≡ div q, (5.13)

where q ≡ q2 −q1 is an auxiliary vector field defined throughδgµν . Therefore, the term (5.8) is an integral of a total diver-gence and thus vanishes,

M

d4p√−ggµνδRµν =

M

d4p√−g div (q) = 0,

as long as q vanishes at the boundary ∂M of the domain of

integration M. (Since q depends on the derivatives of δgµν ,this condition is satisfied if δgµν identically vanishes outside

the domain M, or alternatively if both δgµν and δgµν;α vanish

at the boundary of M. The Einstein-Hilbert action SEH [g] de-pends on second derivatives of the metric. It is a fortunatefact that the equations of motion (the Einstein equations) areonly second-order in g; but, in any case, it is no surprise thatthe variation δSEH [g] contains boundary terms linear in∇δg.)This result completes the derivation of the vacuum Einsteinequation from the Einstein-Hilbert action.

Remark: The index-free computations above are cumber-some, mainly because different traces of a high-rank tensorδΓ are required. The index-free notation is best for low-ranktensor computations that do not involve complicated traces.After arriving at Eq. (5.11), it is definitely easier to proceedusing the index notation.

Calculation: Using the index notation, reproduce thederivation of Eq. (5.13), starting with Eq. (5.11).Solution: The index notation for the Riemann tensor is

aαbbxλyµRαβλµ ≡ R(a,b,x,y) ≡ g(R(a,b)x,y),

hence Eq. (5.11) gives

δRαβλµ = δΓµβλ;α − δΓµαλ;β ,

δRαλ = δRαβλβ = δΓββλ;α − δΓβαλ;β ,

gαλδRαλ = gαλ(

δΓββλ;α − δΓβαλ;β

)

=(

gαλδΓββλ − gβλδΓαβλ

)

;α≡ qα;α. (5.14)

The last term is a total divergence, as required.

Calculation: Consider the first-order variation of the Levi-Civita connection,

Γ(x)y ≡ limλ→0

1

λ

(

∇xy −∇xy)

,

where ∇ and ∇ are Levi-Civita connections corresponding tothemetrics g and g+λh, and h is an arbitrary symmetric tensorof rank (0, 2). An explicit formula for the tensor Γ in terms ofh can be derived,

2g(Γ(x)y, z) = (∇xh) (y, z) + (∇yh) (x, z)− (∇zh) (x,y).

In the index notation,

Γλαβ =1

2gλµ (hαµ;β + hβµ;α − hαβ;µ) . (5.15)

Derivation:We use Eq. (1.43) and choose vectors x,y, zwithall vanishing derivatives, i.e. vector fields x,y, z such that

∇(x or y or z)(x or y or z) = 0.

This choice is certainly possible at one point p and does notinfluence the result since Γ(x)y involves no derivatives of x ory but only values of x and y at p. Note that with this choice,

we still have ∇xy 6= 0 since ∇ is a different connection from∇, however

∇xy −∇xy = O(λ).

Now Eq. (1.43) yields (up to terms of order λ2)

2 (g + λh) (∇xy, z) − 2g(∇xy, z)

= 2λh(

∇xy, z)

+ λ(x h(y, z) + y h(x, z) − z h(x,y)

)

= λ((∇xh) (y, z) + (∇yh) (x, z) − (∇zh) (x,y)

)

because the term

2λh(

∇xy, z)

= O(λ2)

can be disregarded.

5.1.3 Nonlinear f(R) gravity

In the Lagrangian formulation of field theory, every field φ isdescribed by an action functional,

M

√−gdnpL[φ],

whereL is an appropriate Lagrangian. Thus, Einstein’s theoryof gravity can be viewed as a field theory where the “field”is the point-dependent metric gµν(p) and the Lagrangian isthe Einstein-Hilbert one, Eq. (5.7). If we choose a differ-ent Lagrangian, we may obtain a different theory of grav-ity. Many alternative theories of gravity have been consid-ered (and most of them have been rejected). Motivated bythe requirement of general covariance, one usually constrainsthe choice of gravitational Lagrangians to combinations of in-variant quantities, such as R, RµνR

µν , RκλµνRκλµν , R, etc.

These modified theories of gravity give rise to equations withhigh-order derivatives of gµν because the Ricci scalar R[gµν ]contains second derivatives of gµν .

104

Page 113: Winitzki. Advanced General Relativity

5.1 Lagrangian formulation

Wewill consider only a relatively simple modification of theEinstein-Hilbert action consists of replacing the Ricci scalar Rby a nonlinear function of R, so that the action is

SNL[g] =

M

√−gdnp f(R). (5.16)

(The function f(R) is nonlinear iff f ′′(R) 6= 0). Such theo-ries are called theories of nonlinear gravity, f(R) gravity, orscalar-tensor gravity for the reason explained below.2

Let us now derive the equation of motion for gµν fromthe f(R) Lagrangian. We shall follow the derivation inSec. 5.1.2 with appropriate modifications, using the index no-tation where convenient.As before, the variation of SNL[g] can be split into threeterms,

δSNL[g] = δ

M

d4p f (Rµνgµν)

√−g

=

M

d4p√−g

[

f ′(R) (gµνδRµν +Rµνδgµν) − f(R)

2gµνδg

µν

]

.

The term gµνδRµν = qα;α is a total divergence, as shown byEq. (5.14), but this term is now multiplied by f ′(R) which isnot a constant since, by assumption, f(R) is nonlinear. There-fore the integral

M

d4p√−g f ′(R)gµνδRµν =

M

d4p√−gf ′(R)qα;α

does not identically vanish any more, but instead contributesto the equation for gµν . Integrating by parts and omittingboundary terms, we have

M

d4p√−g f ′(R)qα;α = −

M

d4p√−g [f ′(R)];α q

α.

However, qα still contains derivatives of δgµν , and we nowneed to use Eq. (5.15) to find

qβ = δgαβ ;α − gµνδgµν;β . (5.17)

Calculation: Derive Eq. (5.17) from Eqs. (5.14) and (5.15).Hint: Note that δgµν is not equal to δgµν with “indicesraised,” but

δgαβ = −gαλgβµδgµν ,where δgαβ is the perturbation in gµν entering Eq. (5.15). Ex-press qα through δgµν rather than through δgµν .Finally, we obtain

−∫

M

d4p√−g [f ′(R)];α δq

α

=

M

d4p√−g

(

[f ′(R)];α;α gµν − [f ′(R)];µν

)

δgµν .

Therefore, the equation of motion for gµν is

f ′(R)Rµν −1

2f(R)gµν + gµνf

′(R) − [f ′(R)];µν = 0.

This equation has fourth-order derivatives of the metric andis quite complicated. In order to simplify this theory, thestandard practice is to reduce the above equation to a systemof second-order equations by introducing an auxiliary scalar

2See, for instance, G. Magnano and L. M. Sokolowski,arxiv:gr-qc/9312008.

field and changing the metric by a suitable conformal trans-formation.Rather than guessing the necessary change of variables, onecan perform the simplification in a systematic way by usingthe method of Lagrange multipliers. I will first describe theapplication of this method to a trivial example borrowed fromclassical mechanics. Then I will explain the application of thatmethod to the theory (5.16).In classical mechanics, the dynamics of a system may bedescribed by the action

S[q] =

L(q, q)dt,

where q(t) is the trajectory of the system, which we willassume to be a vector-valued function of time. If the La-grangian L[q, q] depends on first derivatives of q(t) nonlin-early (which is usually the case), the equations of motion forq(t) are second-order (they contain q). Supposewewould liketo reduce the Lagrangian to a simpler form that generates onlyfirst-order equations of motion. Oneway is to introduce a newindependent variable v(t) which is equal to the velocity q(t);then one expects to obtain a system of first-order equationsfor q(t),v(t). A straightforward way to replace velocitieswith an independent variable is to add a constraint q − v = 0to the Lagrangian with a Lagrange multiplier. Since we havea vector-valued constraint at every moment of time t, the La-grangemultiplier must also be vector-valued function of time.Let us denote this vector-valued Lagrange multiplier by s(t).Thus the modified action can be written as

S[q,v, s] =

[L(q,v) + s · (q − v)] dt. (5.18)

The variation of the new action S with respect to three in-dependent vector-valued functions q,v, s yields equations ofmotion that are equivalent to the original ones.We note that the variables s and v enter the action (5.18) as

Lagrange multipliers; the action involves derivatives of q butno derivatives of s or v. As we suspect that two Lagrangemultipliers constraining a single function q are too many, wewould like to eliminate one of these Lagrange multipliers. To

this end, let us compute the variation of S with respect to v:

δS[q,v, s]

δv(t)=∂L(q(t),v(t))

∂v− s(t).

This is an algebraic equation that can be solved with respectto v (if L is a nonlinear and nondegenerate function of v). Letus denote by V(q, s) the functions that express v through q

and s,v = V(q, s).

Since the action (5.18) contains v merely as a Lagrange multi-plier (no derivatives of v are involved), we may now substi-tute v = V(q, s) into the action (5.18) and obtain an equivalentaction that depends only on q and s:

S[q, s] =

[L(q,V(q, s)) − s · V(q, s) + s · q] dt.

This action is linear in time derivatives, and hence the equa-tions of motion for q(t), s(t) are first-order in the time deriva-tives. One may recognize this as the Hamiltonian action,where s is the canonical momentum corresponding to q, and

L− s ·V(s,q) ≡ H(s,q)

is the Hamiltonian.

105

Page 114: Winitzki. Advanced General Relativity

5 Variational principle

Remark: As we have seen, the Lagrange multiplier s has thesignificance of the canonical momentum. Usually, the canon-ical momentum corresponding to q would be denoted by theletter p. In the present derivation, I chose a different letter,s, because I wanted to emphasize that we do not know thesignificance of the new variables in advance. It will not be al-ways the case that the new variables introduced through theLagrange multiplier method have the significance of canoni-cal momenta.The method of Lagrange multipliers is more general thanthe familiar passage from the Lagrangian to the Hamiltoniandescription. For instance, one could treat Lagrangians con-taining higher derivatives in the same systematic manner.One could also replace only some of the higher derivatives butnot all, if this proves to be convenient in a particular situation.

Let us now apply the method of Lagrange multipliers tothe action (5.16). At this point, we do not have the purposeof reducing the system to first-order equations, but rather toremove the nonlinearity in the f(R) term. Therefore, we aremotivated to introduce a new field r equal to the Ricci scalarR[gµν ]. Hence, an action equivalent to the action (5.16) is

M

√−gdnp [f(r) + (R[gµν ] − r) λ] , (5.19)

where λ is a Lagrange multiplier field. Variation of the aboveaction with respect to r yields the algebraic equation

f ′(r) = λ,

which can be solved to express r through λ (since by assump-tion f ′′(r) 6= 0). Let us denote by r(λ) the function obtainedby solving the algebraic equation f ′(r) = λ. We can then sub-stitute r into the action (5.19) and obtain an equivalent action,

S[gµν , λ] =

M

√−gdnp [f(r(λ)) − r(λ)λ + λR[gµν ]] , (5.20)

which now depends only on the metric gµν and the new fieldλ. Note that the field λ must be varied independently of gµν ;the constraintR[gµν ] = r(λ)will be a consequence of the equa-tions of motion.At this point, the action (5.20) depends linearly on R, andwe can use another trick to simplify the equations further.Namely, we perform a suitable conformal transformation ofthe metric. According to Eq. (3.21) with N = 4, a conformaltransformation gµν → gµν with a conformal factor e

2Ω 6= 0will change the variables as follows,

gµν = e2Ωgµν ,√−g = e4Ω

−g,R[gµν ] = e−2Ω

(R[gµν ] + 6Ω + 6gµνΩ,µΩ,ν

),

where the D’Alembert operator is defined with respect tothe new metric gµν . We can now express the action (5.20)through the new metric. An appropriate choice of Ω, namely

Ω = −1

2lnλ, e2Ω =

1

λ, (5.21)

cancels the factor λ in the action (5.20). The action in the newvariables is

S[gµν , λ] ≡ S[gµν ,Ω] =

M

−gdnp[f(r(λ)) − r(λ)λ

λ2

+R[gµν] + 6Ω + 6gµνΩ,µΩ,ν]. (5.22)

The term 6Ω is a total covariant derivative and can be omit-ted from the action. Also, we may express λ through Ω usingEq. (5.21) and regard Ω as a new scalar field. Introducing theauxiliary function

V (Ω) ≡ r(λ)λ − f(r(λ))

λ2

∣∣∣∣λ=λ(Ω)

,

we rewrite the action (5.22) as

S[gµν ,Ω] =

M

−gdnp [R[gµν ] + 6gµνΩ,µΩ,ν − V (Ω)] .

This is the action of usual, unmodified Einstein gravity, mini-mally coupled to a scalar field Ωwith a self-interaction poten-tial V (Ω). The form of the potential V (Ω) is ultimately deter-mined by the function f(R) in the original action (5.16).The additional scalar field Ω can be heuristically viewed asa “scalar component” of the original gravitational field gµν .The metric tensor gµν alone is insufficient to describe the ef-fects of gravity in these nonlinear theories. Therefore, thesemodels are also called scalar-tensor theories of gravity. Theoriginal field variable gµν is called the Jordan frame variables,while the transformed variables Ω, gµν are called the Ein-stein frame variables. The name “Einstein frame” means thatin that frame the theory looks like the ordinary Einstein grav-ity coupled to some additional matter fields. Calculations areoften easier in the Einstein frame. However, the metric gµν isan auxiliary variable that does not describe the physically ob-servedmetric (i.e. the action for other matter fields, such as theelectromagnetic field, contains gµν and not gµν). So one needsto perform the conformal transformation back to the Jordanframe to recover the physical metric gµν .After this brief excursion into alternative theories of gravi-tation, I continue to consider only the currently standard the-ory (Einstein’s General Relativity).

5.1.4 Energy-momentum tensor

Consider a field theory that contains some matter fields φjinteracting with gravity. In the Lagrangian formulation, thetotal system is described by an action of the form

S[φj ; gµν ] =

M

(LEH [g] + L[φj ; gµν ])√−gd4x,

where L[φj ; gµν ] is a Lagrangian for the matter fields and theirinteractions among themselves, while

LEH [g] ≡ 1

16πGR

is the Einstein-Hilbert Lagrangian for gravity. The Euler-Lagrange equation for a matter field φj is then

δL[φ; gµν ]

δφj= 0,

while the equation for the gravitational field is

δLEH [g]

δgαβ+δL[φj ; gµν ]

δgαβ= 0.

Wehave already computed δLEH/δgαβ , and sowe can rewrite

this equation as the Einstein equation,

Rαβ − 1

2Rgαβ = 16πG

δL[φj; gµν ]

δgαβ≡ −8πGTαβ,

106

Page 115: Winitzki. Advanced General Relativity

5.1 Lagrangian formulation

where we have defined the tensor Tαβ called the energy-momentum tensor (EMT) of matter,

Tαβ ≡ 2√−gδL[φj ; gµν ]

δgαβ. (5.23)

This tensor plays the role of a “source” for the gravitationalfield. In most field theories, this tensor also describes the dis-tribution of local energy and momentum density of the field,defined in the conventional sense.For example, a minimally coupled scalar field with the La-grangian (5.3) has the EMT

Tαβ = φ,αφ,β − 1

2gαβφ,µφ

,µ + V (φ)gαβ . (5.24)

The electromagnetic field is described by the Maxwell tensorFµν , which satisfies the Maxwell equations. These equationscan be derived from the Lagrangian

LEM [F ; g] = −√−g16π

FαβFαβ = −

√−g16π

FαβFµνgαµgβν , (5.25)

from which also follows the EMT

Tµν =1

(1

4gµνFαβF

αβ − FαµFβνgαβ

)

.

Calculation: Derive the above expressions for the EMT of ascalar field and of the Maxwell field.Solution: If a Lagrangian is of the form L =

√−gL then

2√−gδL

δgαβ= −Lgαβ + 2

δL

δgαβ.

In the scalar field Lagrangian, the only term that depends ongµν is 1

2gµνφ,µφ,ν . Hence,

δL[g, φ]

δgαβ=

1

2φ,αφ,β ,

Tαβ = φ,αφ,β − gαβL

as required. The same procedure for the Lagrangian LEMyields the required answer since

δL

δgµν= − 1

16π

(FαµFβνg

αβ + FµαFνβgαβ)

= −gαβ

8πFαµFβν .

Calculation: Compute the 0-0 component of the electromag-netic EMT and show that it coincides with the familiar expres-sion for the energy density of the electromagnetic field,

T00 =1

(

| ~E|2 + | ~B|2)

.

5.1.5 General covariance

Consider a field theory where every matter field φi, i =1, ..., N , is coupled to gravity through the metric tensor gµνand the covariant derivatives∇. Such a field theory is invari-ant under general coordinate transformations; this propertyis called the general covariance of the theory. General coor-dinate transformations can be parametrized by four functionsx(x). According to the Noether theorem, the invariance withrespect to gauge transformations leads to mathematical iden-tities that hold regardless of the equations of motion. How-ever, we can obtain useful conservation laws out of a gauge

symmetry if we do use the equations of motion. We shall nowshow that the requirement of general covariance for a set offields φi with a Lagrangian L[φi; g], together with the Euler-Lagrange equation of motion,

δL

δφi(x)= 0, (5.26)

yields a conservation law for the field’s energy-momentumtensor Tµν defined by Eq. (5.23).To derive the conservation law, we consider an infinitesimalcoordinate transformation

x → x = x + εf(x),

where f is a vector field, which is the “generator” of the trans-formation (points are shifted along the flow lines of f ). Thematter fields φi are transformed by the flow of f according to

φi(x) → φi(x) = φi(x) + δqi(x),

where δqi(x) is defined appropriately for each field, such thatthe local variation of the field φi is δφi = εLfφi. For instance,the metric gαβ is transformed as

gαβ(x) → gαβ(x) = gαβ(x)+εLfgαβ(x) = gαβ+ε

(fα;β + fβ;α

).

We know that the action is invariant under the transforma-tion, therefore the variation δ

∫Ld4xmust vanish:

0 = δ

Ld4x =

∫ [δL

δgαβ(x)δgαβ(x) +

δL

δφi(x)δφi(x)

]

d4x

=

∫δL

δgαβ(x)ε(fα;β + fβ;α

)d4x+

∫δL

δφi(x)δφi(x)d4x.

(5.27)

Since the fields φi satisfy Eq. (5.26), the second term vanishes.Expressing the first term through the tensor Tαβ , we get

Tαβfβ;α√−gd4x =

∫ [(Tαβf

β);α − T ;α

αβfβ]√−gd4x

= −∫

T ;ααβf

β√−gd4x = 0. (5.28)

Here we assumed that fα vanishes at infinity sufficientlyquickly. Since Eq. (5.28) must be satisfied for arbitrary fα(x),we conclude that the conservation law T ;α

αβ = 0 holds.

Remark: The absence of the covariant volume factor√−g in

Eq. (5.27) is not amistake; the result is nevertheless a covariantquantity. The derivative with respect to fα is calculated usingthe chain rule, e.g.

δ

Ld4x =

∫ [δL

δgαβ(x)δgαβ(x) +

δL

δφiδφi(x)

]

d4x,

and the rule requires a simple integration over d4x. The cor-rect covariant behavior is supplied by the Lagrangian L thatitself contains a factor

√−g.In a flat spacetime, the laws of energy and momentumconservation follow from the invariance of the action underspacetime translations. In the presence of gravitation thespacetime is curved, so in general the spacetime translationsare not a physical symmetry any more. However, the actionis covariant with respect to arbitrary coordinate transforma-tions. The corresponding conservation law is the “covariant

107

Page 116: Winitzki. Advanced General Relativity

5 Variational principle

conservation” of the EMT, T ;ααβ = 0. However, this law does

not actually express the conservation of energy or momentumof the matter field φi because of the presence of the covariantderivative. That equation would be a conservation law if ithad the form ∂µ (

√−gT µν) = 0, but instead it can be shownthat

∂µ(√−gT µν

)= −√−gΓνµλT µλ 6= 0,

where Γνµλ is the Christoffel symbol. The energy of the matterfields alone, described by the energy-momentum tensor Tαβ ,is not necessarily conserved; the gravitational field can changethe energy and the momentum of matter.

Remark: For one scalar field φ with the Lagrangian (5.3),the conservation of energy-momentum tensor, Tµν

;µ = 0,where Tµν is given by Eq. (5.24), and the equation of motion,gµνφ;µν + V ′(φ) = 0, are equivalent:

0 = Tµν;µ = (φ,µφ,ν)

;µ − 1

2(φ,αφ

,α);ν + V (φ)φ,ν

=(φ;µ

;µ + V (φ))φ,ν .

In a theory of a single scalar field φ coupled to gravity, theconservation law Tµν

;µ = 0 is essentially a consequence of theEinstein equation. Therefore, one may say that the Einsteinequation contains the equation of motion for the scalar field.However, the same statement will not hold for theories withmore than one scalar field. A single conservation law can-not yield several independent equations of motion for all thefields.

5.1.6 Symmetries and Noether theorems

The fundamental theorems of E. Noether explain the relation-ship between symmetries and conservation laws. We showsimple examples of theories with symmetries.

Translational symmetry

A field theory is described by a Lagragian, and a symme-try of a field theory means a transformation that leaves theLagrangian invariant. For example, consider the theory ofa scalar field with the Lagrangian L[φ], for instance, that ofEq. (5.3) in a flat, Minkowski spacetime, where the metric isg = η = diag(1,−1,−1,−1). Since this Lagrangian in thisspacetime does not depend explicitly on the coordinates, it isinvariant under a translation of fields, φ(x) → φ(x+a), wherea is an arbitrary constant 4-vector. We shall now study theconsequences of this symmetry for any Lagrangian L[φ] thatdepends only on φ and ∇φ.

Remark: In this section only, we denote spacetime points byboldface letters x ≡ (t, x, y, z) for brevity. This notation is dif-ferent from the notation in the rest of the text, where boldfaceletters are vectors or vector fields. In the Minkowski space-time, one can identify points (t, x, y, z) with vectors from R4,but in a curved spacetime points are not vectors.Let us first examine the transformation φ(x) → φ(x + a) inmore detail. The transformation is understood as the replace-

ment of the function φ(x) by a different function φ(x),

φ(x) → φ(x) = φ(x + a).

Note that the value of the new function φ at a location x is equalto the value of the old function φ at the location x + a; i.e. the

value of the function is not modified beyond the necessarychange due to the replacement of the argument, x → x + a.

The Lagrangian L[φ] is invariant under the replacement

x → x + a in the following sense: the value of L[φ](x), com-puted at a fixed location x, is equal to the value of L[φ](x + a),i.e., to the value of the same function L[φ], computed for theold field φ, at the location x + a. Let us now express this in-variance as a mathematical identity.It is convenient to consider an infinitesimal displacement

δa. At a given location x, the value of any scalar function f(x)changes under x → x + δa, to first order, as

δf(x) ≡ f(x) − f(x) = f(x + δa) − f(x) = ∇δaf +O(δa2).

This is the local variation of the value of the function f . (Fromnow on, we shall drop the termsO(δa2) since we need to con-sider only first-order variations.) Because of the assumptionof translational invariance, the same argument applies also tothe Lagrangian L[φ](x), considered as a function of x,

δL[φ(x)] ≡ L[φ(x)] − L[φ(x)]

= L[φ](x + δa) − L[φ](x) = ∇δaL[φ](x).

The above relation is, essentially, the requirement of trans-lational invariance of the Lagrangian L. On the other hand,the same variation δL can be computed directly, expressing itthrough the local variation of the field,

δφ(x) ≡ φ(x) − φ(x) = ∇δaφ(x).

We have

δL[φ(x)] =∂L

∂φδφ+

∂L

∂φ,µδφ,µ.

Thus the same quantity δL can be expressed in two differentways:

δL[φ(x)] =∂L

∂φδφ+

∂L

∂φ,µδφ,µ = ∇δaL.

This identity expresses the invariance of the LagrangianL un-der translations. A conservation law can be now derived, as-suming that the Euler-Lagrange equation holds for a specificfield configuration φ(x),

∂L[φ]

∂φ−∇µ

∂L[φ]

∂φ,µ= 0.

Namely, since δa is a constant vector, we can express δφ andδL as total divergences,

δφ = ∇δaφ = φ,µδaµ = ∇µ (φδaµ) , ∇δaL = ∇µ (Lδaµ) ,

and then rewrite the above identity as

0 =∂L

∂φδφ+

∂L

∂φ,µδφ,µ −∇δaL

=

(∂L

∂φ−∇µ

∂L

∂φ,µ

)

δφ+ ∇µ

(∂L

∂φ,µφ,ν − Lδµν

)

δaν .

Since the Euler-Lagrange equation is assumed to hold andδa is arbitrary, it follows that the following conservation lawholds,

0 = ∇µ

(∂L

∂φ,µφ,ν − Lgµν

)

≡ ∇µTµν ,

where

T µν =∂L

∂φ,µφ,ν − Lgµν (5.29)

108

Page 117: Winitzki. Advanced General Relativity

5.1 Lagrangian formulation

is the energy-momentum tensor of the field φ. Indeed, thetensor T µν coincides (after lowering the index) with the expres-sion (5.24).A relation of the form∇µj

µ ≡ divj = 0 in flat space is calleda conservation law, and the vector field j is called a conservedcurrent. A simple interpretation can be given in Minkowskicoordinates (t, ~x): The 4-vector jµ is decomposed into the time

component j0 and a spatial part~j. Then j0 is a density of some

“substance” at a point, while~j is the 3-velocity. The conserva-

tion law 0 = jµ,µ = ∂tj0 + div~j shows that the change in the

density j0 is always due to the transport of the “substance”from neighbor points, which means that the amount of the“substance” is conserved. Alternatively, we may consider aspacelike 3-surface t = t1 and compute the total amountQ(t1)of the “substance” on the surface,

Q(t1) ≡∫

t=t1

j0d3~x.

The conservation law then says that Q(t) is constant,

dQ(t)

dt=

∫dj0

dtd3~x = −

∫ (

div~j)

d3~x = 0.

The quantity Q is called the conserved charge correspondingto the current jµ with respect to a 3-surface. The equationsof motion for the field are such that the charge Q is time-independent. This is the meaning of the conservation law.More generally, we may consider a one-parametric groupof coordinate transformations

x → x(x; ε), φ(x) → φ(x) ≡ φ(x),

where ε is a real parameter and ε = 0 corresponds to theidentical transformation. If a Lagrangian L[φ;x], possibly de-pending explicitly on x, is invariant under this transformationgroup, in the sense explained above, then a similar calculationleads to the conservation law jµ,µ = 0 for a certain 4-vectorfield jµ, called theNoether current corresponding to the sym-metry transformation x → x. The Noether current jµ is thendefined by

jµ =∂L

∂φ,µφ,νf

ν − fµL,

where

fµ ≡ ∂x

∂ε

∣∣∣∣ε=0

.

Calculation: Derive the above expression for the Noethercurrent jµ corresponding to the transformation x → x, φ →φ(x) ≡ φ(x), assuming that

d4xL[φ(x);x] =

d4xL[φ(x); x].

Solution: Consider a transformation with an “infinitesimal”value of ε,

x → x = x + f(x)ε, f(x) ≡ ∂x(x; ε)

∂ε

∣∣∣∣ε=0

.

The vector field f describes the “flow” of the infinitesimal co-ordinate transformation. The local variation of the field φ is

δφ ≡ φ(x) − φ(x) = φ(x) − φ(x) = εfαφ,α = εf φ.

The local variation of the derivative φ,µ is slightly more com-plicated due to the dependence of f on x,

δ(φ,µ) =∂φ(x)

∂xµ− ∂φ(x)

∂xµ=∂φ(x + εf)

∂xµ− ∂φ(x)

∂xµ= (εfαφ,α),µ .

The local variation of the Lagrangian (at fixed x) due to thelocal variation of the field is

δL[φ(x);x] =∂L

∂φδφ+

∂L

∂φ,µδ(φ,µ)

=

(∂L

∂φ,µδφ

)

=

(∂L

∂φ,µφ,νf

νε

)

,

where we have used the Euler-Lagrange equation. On theother hand, the invariance of the action under the transfor-mation means that∫

d4x δL[φ(x);x] =

d4xL[φ(x); x] −

d4xL[φ(x);x].

The volume element is transformed as

d4x =

(

d4x)

det∂xµ

∂xν=(

d4x)

det

(

1 + ε∂fµ

∂xν

)

=(

d4x)(

1 + ε∂fµ

∂xµ

)

=(

d4x)

(1 + εdivf) .

(Recall that det (1 + εA) = 1 + εTrA + O(ε2) for a matrix A.)Since

L[φ(x); x] − L[φ;x] = εfµ∂L

∂xµ= εf L,

we obtain the identity (up to terms of order ε2)

0 =

(∂L

∂φ,µφ,νf

νε

)

− εfµ∂L

∂xµ− ε

∂fµ

∂xµL

= ε

(∂L

∂φ,µφ,νf

ν − fµL

)

≡ εjµ,µ.

Since ε is arbitrary, it follows that jµ is a conserved current.If the symmetry group has n > 1 parameters then n con-servation laws can be found. Let us study this case in somemore detail. Suppose we are given an n-parametric group ofsmooth transformations of a spacetimemanifoldM. Continu-ous groups of transformations, i.e. groups that are themselvessmooth multidimensional manifolds, are called Lie groups. Ifthe symmetry group G is an n-dimensional Lie group theneach element γ ∈ G acts as a transformation x → x(x; γ).An infinitesimal transformation x → x + δx corresponds to anelement of the group Gwhich is “infinitesimally close” to theidentity transformation 1 ∈ G. Such elements can be (heuristi-cally) parametrized as γ = exp(εv) or “γ = 1+εv”, where v isa tangent vector from the (n-dimensional) tangent space T1Gand exp(v) is the exponentiation map that produces pointsalong the flow lines of a vector v. By definition, a tangent vec-tor v is a derivation operator acting on functions on G. Thetransformation of points x corresponding to the vector v canbe written as

x → x = x + εf(x,v),

where

f(x,v) ≡ ∂x(x)

∂ε

∣∣∣∣ε=0

= v x(x; γ),

and x(x; γ) is a function on G. It is clear that f(x,v) is linearin v, thus f is a 1-form on T1G with values in TxM. In the

109

Page 118: Winitzki. Advanced General Relativity

5 Variational principle

index notation, f may be written as fµs , where the index s isn-dimensional and labels the tangent space to the Lie group.The calculation leading to the definition of the Noether cur-rent yields

(∂L

∂φ,µφ,νf

νs − fµs L

)

vs ≡ jµs ;µ = 0,

thus the Noether current jµs is also a 1-form on T1G with val-ues in TxM. We have seen an example of such a Noether cur-rent (5.29): the group of transformations is R4 and thus the in-dex s in jµs can be identified as a four-dimensional Minkowskiindex, yielding a (1,1)-tensor T µν . The corresponding Noethercharge

Qν(t0) =

t=t0

T 0νd

3x

represents the total 4-momentum on a spacelike 3-surface t =t0. The total energy and the total momentum are conserved,Qν(t) = const.

Remark: The EMT (5.29) obtained as a Noether currentwith respect to translation symmetry is called the canonicalEMT, while the tensor defined by Eq. (5.23) is called the met-ric EMT. The canonical and the metric EMT coincide for thescalar field φ but not necessarily for other fields; however, thismismatch is merely a technical problem with the calculations.Both these tensors are conserved in flat space, and thereforethe difference between them is a divergence-free tensor. In-deed, we can always add a divergence of an arbitrary anti-symmetric tensor to a conserved current without changing theconservation law,

jµ → jµ = jµ + (Bµν −Bνµ),ν ; jµ,µ = jµ,µ = 0.

Note that the calculation leading to the definition of theNoether current jµ only shows that jµ,µ = 0 and thus does notnecessarily determine the physically correct expression for jµ.It is the metric EMT, not the canonical EMT, that contributesto the Einstein equations and plays the role of the source ofgravity.

Internal symmetry

The essense of Noether’s theorem is that every symmetry ofa Lagrangian leads to a conservation law. Our first examplewas a transformation that did not modify the value of the fieldφ (beyond the necessary change due to the coordinate shift).The second example is a field theory with an internal symme-try, i.e. a symmetry transformation that changes the value ofthe field φ but does not involve the coordinates.Consider a complex-valued scalar field ψ with the La-grangian

L[ψ] =1

2(gµν∂µψ∂νψ

∗ − V (|ψ|))√−g.

This Lagrangian is manifestly invariant under the group of

transformations ψ → ψ = eiαψ, where α is an arbitrary realnumber; this is an example of an internal symmetry of the

theory. The identity L[ψ] = L[ψ] leads to a conservation law.Again, it is convenient to apply an infinitesimal transforma-tion,

ψ → ψ = ψ (1 + iα) +O(α2),

and to suppress terms of order α2 or higher. The local varia-tion of the field ψ is

δψ(x) = iαψ(x), δψ∗ = −iαψ∗,

hence the local variation of the Lagrangian is

0 = δL[ψ(x),x] = L[ψ(x),x] − L[ψ(x),x]

=∂L

∂ψδψ +

∂L

∂ψ∗δψ∗ +

∂L

∂ψ,µδψ,µ +

∂L

∂ψ∗,µ

δψ∗,µ

=

(∂L

∂ψ,µδψ +

∂L

∂ψ∗,µ

δψ∗

)

=

(∂L

∂ψ∗,µ

ψ − ∂L

∂ψ,µψ∗

)

iα,

where we have used the Euler-Lagrange equations. For theabove Lagrangian, the Noether current is

jµ = gµνψ,νψ

∗ − ψ∗,νψ

2i,

and the Noether charge is interpreted as the charge density ofthe field ψ.

Infinite-dimensional (gauge) symmetry

The Noether theorem applies to a more general case: namely,an n-dimensional Lie group of transformations that changesboth the points x and the values of fields φa, a = 1, ..., A, ac-cording to

x → x = x + εf(x), φa(x) → φa(x) = φa(x) + εqa(x),

and adds a total divergence to the Lagrangian L,∫

d4xL[φa(x),x] =

d4xL[φa(x), x] +

d4x (∂µC

µ) ,

where Cµ is a suitable auxiliary vector field. A correspond-ing Noether current jµs exists also in this case, indicating aconserved quantity Q that remains constant by virtue of theequations of motion.

Remark: The conservation of Q can be viewed as simply aconsequence of the equations of motion, and derived by suit-able algebraic manipulations from the Euler-Lagrange equa-tions. However, in that case one may wonder what other con-servation laws may be found and how the necessary manip-ulations are to be guessed. On the other hand, the Noethertheorem explains that the existence of conserved quantities isnecessary, as long as the Lagrangian has a continuous sym-metry. (And, conversely, every conservation law is a conse-quence of a symmetry.) Thus, symmetries and the Noethertheorem provide a more natural way to obtain conservationlaws in a given field theory.A qualitatively different situation arises if the theory is in-variant under symmetry transformations parametrized by anarbitrary function α(x) of the spacetime. (This makes the groupof transformations an infinite-dimensional group.) Such sym-metry transformation is called a gauge symmetry. A familiarexample of a gauge symmetry is the Maxwell theory with theLagrangian

LEM = − 1

16πFµνF

µν , Fµν ≡ ∂µAν − ∂νAµ,

which is invariant under the gauge transformation

Aµ(x) → Aµ(x) = Aµ(x) + ∂µα(x),

110

Page 119: Winitzki. Advanced General Relativity

5.2 Hamiltonian formulation

where α(x) is an arbitrary function.We shall now examine the consequences of a gauge symme-try. Suppose that a field theory is described by a LagrangianL[φ] that is invariant under the (infinitesimal) internal sym-metry transformations

φ→ φ = φ+ δφ(φ;α),

where δφ(φ;α) is linear in α, and α is an arbitrary function ofx. For simplicity, let us assume that δφ(φ, α) depends only onα and the first derivatives α,µ, so that

δφ(φ, α) = p(φ)α + qµ(φ)α,µ,

where p(φ) and qµ(φ) are some known coefficients. Then thelocal variation of the Lagrangian is

0 = δL[φ(x),x] =∂L

∂φδφ+

∂L

∂φ,µδφ,µ

=∂L

∂φpα+

∂L

∂φqµα,µ +

∂L

∂φ,µ(pα),µ +

∂L

∂φ,µ(qνα,ν),µ

=

(∂L

∂φp+

∂L

∂φ,µp,µ

)

α+

(∂L

∂φqµ +

∂L

∂φ,ν(pδµν + qµ,ν)

)

α,µ

+∂L

∂φ,µqνα,µν .

Since α(x) is an arbitrary function, the terms involving α,∇α,∇∇α, etc., must separately vanish. Thus we obtain the fol-lowing three identities,

∂L

∂φp+

∂L

∂φ,µp,µ = 0,

∂L

∂φqµ +

∂L

∂φ,ν(pδµν + qµ,ν) = 0,

∂L

∂φ,µqν +

∂L

∂φ,νqµ = 0.

(The last line follows because α,µν is an arbitrary symmetrictensor.) Note that we did not assume that the field φ(x) satis-fies the Euler-Lagrange equation. Hence, the above identitieshold simply because of the mathematical properties of the La-grangian and the fields.As an illustration, let us evaluate the three identities in theMaxwell theory. The field φ is now the 1-form Aµ, and wehave

∂LEM∂Aµ

= 0,∂LEM∂Aµ,ν

= − 1

4πFµν ,

δAµ = α,µ = δνµα,ν ≡ qνµα,ν ; qνµ;ρ = 0; p = 0.

Of the three identities, the first two yield 0 = 0 and the lastone is

∂L

∂Aλ,µqνλ +

∂L

∂Aλ,νqµλ = F νµ + Fµν = 0.

This is a simple mathematical consequence of the definition ofFµν .The invariance of a field theory under gauge transforma-

tions φ → φ(φ;α) does not lead to useful conservation lawsbut has another important consequence for the theory. Con-sider a Cauchy problem for the field φ, which consists of spec-ifying the initial values of the field φin(x) at an initial space-like 3-surface Σ. Suppose that φ(x) is a solution of the Euler-Lagrange equation for the initial data φin on Σ. The transfor-

mation φ → φ is a symmetry of the equations of motion, so

φ(φ;α) is another solution, for an arbitrary function α(x). Thefunction α(x) can be chosen so that α = 0 on Σ but α 6= 0to the future of Σ. The solution φ(φ(x);α) will then have thesame initial data φin on Σ but will differ from φ(x) to the fu-ture of Σ. Thus, the solution of the Cauchy problem is notunique.In fact, this lack of uniqueness was a problem initially en-countered by A. Einstein who came up with the following“hole argument” when trying to derive the equations for themetric g. Suppose g(x) is a solution for the metric in the space-time, subject to some boundary conditions. If the boundaryconditions are imposed on some 3-surface Σ that encircles adomain of spacetime, we can find a small region (a “hole”)inside the domain. Due to the general covariance of the the-ory, we are free to transform the coordinates, x → x, in sucha way that the coordinates are changed only within the holebut remain the same everywhere else. This would change themetric g(x) inside the hole but not elsewhere. The new metricg still satisfies the same boundary conditions as g. Thus, themetric at any given point x is not uniquely specified by theequations of motion and boundary conditions.The resolution of this paradox is that coordinates xµ are ar-bitrary labels assigned to events in spacetime. A gauge trans-formation x → x merely relabels the events but does not in-fluence the observable effects of gravitation (e.g. the distancebetween some physical events). Thus, solutions g and g arephysically equivalent. In any theory with a gauge symmetry,two solutions that differ by a gauge transformation are phys-ically equivalent.Moreover, the freedom to choose an arbitrary function α(x)in a gauge transformation means that any one of the fields φjmay be set to zero (or to any other function). This will reducethe number of unknown functions to solve for. Such a reduc-tion is called gauge fixing. For example, the gauge symmetryAµ → Aµ + ∂µα in electrodynamics allows one to fix A0 = 0and to consider the remaining three components A1, A2, A3

as the only physically relevant fields. However, the Maxwellequations in terms of A1, A2, A3 appear much less symmetricand more difficult to solve. Alternatively, one may choose theLorentz gauge condition Aµ

;µ = 0, or other suitable condi-tion.To conclude: The presence of a gauge symmetry indicatesthat the description of the theory contains “too many” fields.The freedom can be eliminated by fixing the gauge. However,a description in a fixed gauge may be much less convenientthan the original description.

5.2 Hamiltonian formulation

Literature sources: [28], chapter 4; [36], Appendix.

The purpose of this section is to introduce the Hamiltonianformulation of general relativity.The starting point of a classical theory is the Lagrangian

L(qi, qi), where qi(t), i = 1, ..., Nq, are the generalized coor-dinates, and the action functional is

S[qi(t)] =

L(qi, qi)dt.

A Hamiltonian formulation of the theory is then obtainedthrough the following steps:

• Define the canonical momenta pi ≡ ∂L/∂qi.

111

Page 120: Winitzki. Advanced General Relativity

5 Variational principle

• Express qi through pi from these equations.

• Define the Hamiltonian H(pi, qi) ≡ ∑

i piqi − L(qi, qi),where qi are expressed as functions of pi in the right-handside.

• Use the Hamilton equations of motion,

d

dtqi =

∂H

∂pi,

d

dtpi = −∂H

∂qi.

These equations can be also considered as Euler-Lagrange equations following from the Hamiltonian ac-tion principle,

δ

∫ t2

t1

dt

(∑

i

piqi −H(p, q)

)

= 0, (5.30)

where pi(t) and qi(t) are varied independently, subject onlyto the boundary conditions δqi(t1,2) = 0.

In field theory, the role of the generalized coordinates qi(t)is played by the fields φj(t, x, y, z), j = 1, ..., Nφ. The indexi in qi now becomes a “condensed index” i ≡ j, x, y, z inφj(t, x, y, z). Therefore, in the Hamiltonian formalism the de-pendence on time t must be separated from the dependenceon spatial coordinates (x, y, z) ≡ xa. (We shall use Latin in-dices for three-dimensional components.) In a curved space-time, there is no natural time variable, which means that weneed to arbitrarily select a coordinate t and the correspond-ing family of spacelike 3-surfaces t = const. Such a fam-ily of nonintersecting spacelike 3-surfaces, covering the entirespacetime, is called a time foliation of the spacetime. Thetraditional Hamiltonian approach is noncovariant: it requiresan explicit time foliation of the spacetime and treats the timedependence of every variable differently from the spacial de-pendence.The Lagrangian L(qi, qi) is replaced by a Lagrangian L,which is an integral over a 3-surface,

L[φi, φi] =

t=const

d3xa L(φi, ∂aφi, φi),

where we separated the time derivatives φi from the spatialderivatives ∂aφi. The Lagrangian L is now a functional and isdenoted by a script letter, in distinction from the Lagrangiandensity Lwhich is a function.The canonical momentum pi(t, xa) is defined as the func-tional derivative of Lwith respect to φi(t, xa),

pi(t, xa) =δL[φi, φi]

δφi(t, xa).

Finally, the summation over i in the definition of the Hamilto-nian must be replaced by a summation over j and an integra-tion over the 3-surface,

H[pi, φi] =

t=const

j

pj(t, xa)φj(t, xa)d3xa − L[φi, φi].

We shall now derive Hamiltonian formulations for theMaxwell theory where the “field φ” is the 4-potential Aµ(x),and for Einstein’s General Relativity where the “field” is themetric gµν(x).

5.2.1 Electrodynamics in Hamiltonianformulation

The Lagrangian (5.25) of the Maxwell theory is

−√−g16π

FµνFµν =

√−g8π

(

| ~E|2 − | ~B|2)

,

which (in flat space) is the familiar expression for the pres-sure of the electromagnetic field.3 The spacetime metric gµν isnow considered as a known function, so only the electromag-netic potential Aµ is to be determined. (To express this, onesays that gµν is a background field.) To compute the canon-ical momenta corresponding to Aµ, we need to separate thedependence of the Lagrangian on the time derivatives of thefield Aµ.Suppose a time foliation of the spacetime is fixed and somecoordinates xa, a = 1, 2, 3 are chosen on equal-time surfacest = const. Let n be a normalized, timelike vector field orthog-onal to the equal-time surfaces. For each point p, the subspacen⊥(p) ⊂ TpM is the tangent space to the equal-time surfaceintersecting the point p. Then we may decompose the vectorfield Aµ as

A = nA0 + PA,

where A0 ≡ g(n,A) and Px = x − ng(n,x) is the orthogonalprojector onto n⊥. The 4-vector PA is spacelike and tangentto the equal-time surfaces. Hence, in local coordinates xa onequal-time surfaces, the vector PA is described by a 3-vector

with components A1, A2, A3 ≡ ~A. Similarly, we can define

the 3-vectors ~E and ~B. Time derivatives appear in the electric

field ~E,

~E ≡ Ea = ∂tAa − ∂aA0 ≡ ∂t ~A− grad A0,

while the magnetic field

~B ≡ Ba =

3∑

b,c=1

εabc∂bAc = rot ~A

depends only on spatial derivatives. It is clear that the La-grangian does not contain the time derivative of A0. There-fore, we cannot define a canonical momentum p0 for A0,and there are no Hamilton equations of motion for the pairA0, p0; in this case, one says that A0 is not a dynamical vari-able. This conclusion agreeswith the existence of a gauge sym-metry in electrodynamics. Namely, by performing a gaugetransformation Aµ → Aµ + ∂µα one could set A0 to an ar-bitrary fixed function (this is called “fixing the gauge”). Forsimplicity, let us set A0 = 0. Note that the variation of theLagrangian with respect to A0 gives

3∑

a=1

∂a(√−gEa

)≡ √−g div ~E = 0, (5.31)

which does not contain time derivatives and is therefore aconstraint of the theory.There might be a sign error here: p = −E? See Wald [36],p. 461-462.

3Several field theories, e.g. the scalar field and the electromagnetic field,have the energy-momentum tensor of a perfect fluid form, Tµν =ρuµuν − (p + ρ) gµν . In that case, the Lagrangian density L is equal tothe pressure p and the Hamiltonian densityH to the energy density ρ.

112

Page 121: Winitzki. Advanced General Relativity

5.2 Hamiltonian formulation

The canonical momenta pa (a = 1, 2, 3) for the spatial com-ponents of Aµ are easily found,

pa(x) = −√−g4π

Ea =

√−g4π

Aa.

The time derivatives Aa are expressed through the momentaas

Aa =4π√−g pa,

hence the Hamiltonian is

H[pa, Aa, A0] =

t=const

d3xa

(

2π√−g

3∑

b=1

p2b(x) +

√−g8π

∣∣∣ ~B∣∣∣

2)

=

t=const

d3xa

√−g8π

(

| ~E|2 − | ~B|2)

,

which (in flat space) is the familiar expression for the totalenergy of the electromagnetic field at a fixed time.The Hamilton equations of motion are

∂Aa∂t

=δHδpa

=4π√−g pa,

∂pa∂t

= − δHδAa

=∑

b,c

∂c(√−gεabcBb) ≡ −√−g rot ~B.

It is straightforward to see that the above equations, togetherwith the constraint (5.31), are equivalent to the Maxwell equa-tions in vacuum. The constraint is merely a restriction onpossible initial conditions: Once the constraint holds at an

initial time, it will hold at any future time since ∂tdiv ~E =

−div(

rot ~B)

= 0.

Remark: The gauge fixing condition A0 = 0 does not en-tirely remove the gauge freedom because one can still per-form a transformation Aa → Aa + ∂aα with a function α(xa)depending only on spatial coordinates. An additional gaugefixing condition, such as

a

∂a(√−gAa

)= div ~A = 0,

will remove the remaining gauge freedom. Note that the con-

straint div ~E = 0 will be an automatic consequence of thisgauge condition. The resulting gauge is called the radiation

gauge because div ~E = 0 is the Maxwell equation in vacuum,describing the propagation of pure electromagnetic field, andA0 = 0 means, heuristically, the absence of an electrostaticfield component.Since we have fixed the gauge, we had to add the con-straint (5.31) to the Hamilton equations of motion. It is possi-ble instead to keep the nondynamical field A0 in the Hamilto-nian (without introducing the momentum p0). The constraintwill then be obtained as the equation δH/δA0 = 0. To get abetter idea of the Hamiltonian treatment of constrained sys-tems, in the next subsection we shall consider some simpleexamples from classical mechanics.

5.2.2 Hamiltonian mechanics of constrainedsystems

See also [18, 14] for a more detailed and wide-ranging devel-opments.

The Hamiltonian formalism involves replacing velocities qiby momenta pi through the relations pi = δL/δqi. However,in many cases these relations cannot be solved for some qi,and then the standard rule for finding the Hamiltonian can-not be used. For instance, the Lagrangian for electrodynamics

is independent of A0 and thus the relation p0 = δL/δA0 can-not be solved for A0. Another example is when L is linearin, say, the velocity q1; in this case, δL/δq1 is independent ofq1 and again the same problem arises: namely, we cannot ex-press q1 through the momentum p1. In these cases, one canuse a slightly different but equivalent approach to developa Hamiltonian formulation. This approach is known as theFaddeev-Jackiw formalism.The main idea of the Faddeev-Jackiw approach is to rec-ognize that the Hamilton equations of motion are first-orderin time derivatives and follow, as Euler-Lagrange equations,from the action (5.30) that is linear in all the velocities qi. Giventhe Lagrangian (5.30), we do not attempt to introduce newcanonical momenta for qi, because we recognize that thesemomenta are already present in the Lagrangian (they are thevariables pi). Thus, the transition from a Lagrangian formu-lation to a Hamiltonian formulation can be seen as the re-placement of a Lagrangian action L(q, q), which is generallynonlinear in the velocities qi, by another Lagrangian, namely

L ≡ ∑

i piqi − H(p, q), which is linear in the velocities andthus yields first-order Euler-Lagrange equations of motion.

The replacementL→ L comes at the cost of extending the La-grangian by introducing extramomentum variables pi. There-fore, we need to add only as many momentum variables asneeded for the Lagrangian to become linear in all the veloci-ties. If we notice that the Lagrangian is linear in velocities, weconclude that all the necessary momentum variables are al-ready introduced, and the Hamilton equations will be foundas the ordinary Euler-Lagrange equations for an ordinary (La-grangian) system. All these Euler-Lagrange equations willbe (at most) first-order in time derivatives, as expected forHamilton equations. After deriving these equations, one caneasily determine whether constraints are present in the sys-tem. Namely, the constraints will be any equations not con-taining time derivatives.As a first example, consider the Lagrangian

L0(x, x, y, y, z, z) = xy +1

2mz2 − 1

2my2 − V (x, z).

This Lagrangian is linear in x, independent of y, and quadraticin z. We would like to obtain a Hamiltonian formulation forthis system. Since the Lagrangian is nonlinear in z, we intro-duce the canonical momentum for the variable z,

pz ≡∂L0

∂z= mz,

express z = 1mpz , and build the “temporary Hamiltonian”

H0 ≡ pz z − L0|z=pz/m= −xy +

1

2mp2z +

1

2my2 + V (x, z).

Now we formulate the “temporary Hamiltonian action prin-ciple” as the variation of the new, extended Lagrangian,

L0(x, x, y, z, z, pz) ≡ pz z −H0

= pz z + xy − 1

2mp2z −

1

2my2 − V (x, z).

The Lagrangian L0 is linear in all the velocities. Therefore, wedo not need to introduce any more “momentum” variables;

113

Page 122: Winitzki. Advanced General Relativity

5 Variational principle

instead, we simply need to relabel some of the existing vari-ables as “momenta.” Presently, it is clear that y plays the roleof the “momentum” for x. Therefore, relabeling px ≡ y, weobtain the Hamiltonian action,

L0(x, x, px, z, z, pz) = pxx+ pz z −1

2m

(p2x + p2

z

)− V (x, z),

which describes a particle of mass m in a two-dimensionalpotential V (x, z). The Euler-Lagrange equations of motion are

x =1

mpx, px = −∂V/∂x,

z =1

mpz, pz = −∂V/∂z.

Clearly, there are no constraints in this model. We concludethat the Lagrangian L0 is actually not constrained, despite thefact that we could not introduce the momenta px and py inthe conventional manner. The reason for the difficulty waspurely technical: the initial Lagrangian L0 was already “half-way done” becoming a Hamiltonian. However, the unusualLagrangian L0 is completely equivalent to the more conven-

tional L0.Some systems, however, are constrained. As a further ex-ample, consider the Lagrangian

L1(q1, q1, q2, q2) =1

2(q1 − q2)

2.

This Lagrangian is quadratic in q1, so we introduce the mo-mentum

p1 ≡ ∂L1

∂q1= q1 − q2,

solve for q1 = p1 + q2, and obtain the “temporary Hamilto-nian”

H1(q1, p1, q2, p2) = p1q1 − L1|q1=p1+q2=

1

2p21 + p1q2

and the extended Lagrangian

L1(q1, q1, q2, q2, p1) = p1q1 −H1 = p1q1 −1

2p21 − p1q2.

Since L1 is linear in all the velocities, we now find the Hamil-ton equations as the Euler-Lagrange equations for L1:

q1 − p1 − q2 = 0, p1 = 0, p1 = 0.

Clearly, we have one constraint, p1 = 0, and one dynamicalequation, q1 = q2.

Practice problem: The system with the Lagrangian L1 is in-variant under the gauge symmetry

q1 → q1 + α(t), q2 → q2 + ∂tα(t),

where α(t) is an arbitrary function. Determine the identitythat this symmetry generates via Noether’s theorem.

Practice problem: Derive the Hamilton equations of motionfor the following Lagrangians:

L3 =x2

2+ xyz; L4 =

x2

2+

(x+ y)2

2; L5 = xy +

y2

2+ xy.

Determine which of these Lagrangians are constrained.

Of course, after determining the Hamilton equations andthe constraints, one needs to solve these equations. Some-times it is easy to “solve the constraints,” i.e. to express somecanonical variables as functions of others, and thus to reducethe number of dynamical degrees of freedom. In other cases,solving the constraints is a nontrivial task, and instead oneattempts to solve the dynamical equations and impose theconstraints afterwards. A full consideration of this topic isbeyond the scope of these lectures; there exists an extensiveliterature about constrained systems, both classical and quan-tized.

5.2.3 Gauss-Codazzi equation

In the Hamiltonian formalism, time derivatives play a com-pletely different role from spatial derivatives. To compute thecanonical momenta in general relativity, we will need to sep-arate the spatial and the temporal derivatives in the Einstein-Hilbert Lagrangian

√−gR. Under “spatial derivatives” itis natural to understand the intrinsic covariant derivativeswithin the 3-surfaces t = const. (We know from Sec. 1.10.1-1.10.2 that a 3-surface embedded in a manifold receives an in-duced metric and an induced Levi-Civita connection.) There-fore, the first step toward a Hamiltonian formulation of GR isto express the curvature scalar R as a sum of a term depend-ing only on spatial derivatives of the metric and remainingterms.Consider a 3-surface Σ embedded into a spacetime man-ifold M with a metric g. The surface Σ is spacelike if thenormal vector is timelike. The surface Σ is timelike if thetangent space to Σ at each point, TΣ(p), contains a timelikevector (and thus is spanned by one timelike and two space-like directions). A timelike 3-surface can be considered asa “sub-spacetime” where two-dimensional creatures mightlive. (To emphasize this interpretation, a timelike 3-surfaceis frequently called “2+1-dimensional”.) A surface Σ receivesthe induced metric h = g|TΣ, so the two-dimensional crea-tures may define the induced Levi-Civita connection (3)∇ andcompute the corresponding (3-dimensional) Riemann tensor(3)R(a,b, c,d) for tangent vectors a,b, c,d. There is a rela-tionship between the reduced curvature (3)R and the curva-ture of the full 4-dimensional spacetime, which we denote(4)R for clarity. This relationship is called the Gauss-Codazziequation, which we shall now derive.Let n be a (spacelike) vector field which is everywhere nor-mal to the surface Σ and normalized, g(n,n) = −1. (The caseof a timelike n is fully analogous.) We know from Sec. 1.10.1and 1.10.2 that the induced Levi-Civita connection (3)∇ isfound using the projector P = 1 − n ⊗ gn onto the tangentbundle TΣ,

(3)∇xt = ∇xt − Γ(x)t,

where we have defined Γ as a transformation-valued 1-form,

Γ(x)t ≡ ng(t,∇xn). (5.32)

Although the calculations in Sec. 1.10.1-1.10.3were performedfor the case of flat spacetimeM, the assumption (4)R = 0 canbe easily dropped. In the derivation leading to Eq. (1.85), weonly need to retain the underlined term [∂x, ∂y] z−∂[x,y]z and

to replace it by (4)R(x,y)z. This yields

(3)R(x,y)z = (4)R(x,y)z + [Γ(x),Γ(y)] z

+ (∇xΓ)(y)z − (∇yΓ)(x)z.

114

Page 123: Winitzki. Advanced General Relativity

5.2 Hamiltonian formulation

Substituting the definition (5.32) of Γ, we find (for tangent vec-tors x,y, z, t)

Γ(x)Γ(y)z = ng(Γ(y)z,∇xn) = ng(n(...),∇xn) = 0,

(∇xΓ)(y)z = (∇xn)g(z,∇yn) + n(...),

g((∇xΓ)(y)z, t) = g(t,∇xn)g(z,∇yn),

where we denoted by (...) some uninteresting terms that can-cel under the scalar product. Hence,

(3)R(x,y, z, t) =(4)R(x,y, z, t)

+ g(t,∇xn)g(z,∇yn) − g(t,∇yn)g(z,∇xn).

The above expression contains the bilinear form

B(n)(x,y) ≡ g(∇xn,y),

which is the distortion tensor of the field n (see Sec. 2.3.1).Since the vector field n is, by construction, orthogonal to the3-surfaceΣ, we may extend n to a normalized geodesic vectorfield in a neighborhood of Σ, and thus the tensor B(n) is sym-metric and transverse to n. Effectively,B(n) is a 3-dimensionalsymmetric tensor in the tangent bundle TΣ. Therefore, themetric g in the definition ofB(n) can be replaced by the partialmetric

B(n)(x,y) = h(∇xn,y),

h(x,y) ≡ g(x,y) + g(x,n)g(y,n).

In this context, the distortion tensorB(n) is called the extrinsiccurvature tensor of the surface Σ and denotedKΣ,

KΣ(x,y) = h(∇xn,y); Kµν = hανnα;µ.

The extrinsic curvature KΣ depends only on the derivativesof n in tangential directions and is thus independent of thecompletion of n to a vector field outside Σ.Note that the same tensor KΣ can be defined by using anarbitrary, unnormalized vector n, in the following way:

KΣ(x,y) = h(∇xn,y),

h(x,y) ≡ g(x,y) − 1

g(n, n)g(x,n)g(y,n).

The resulting tensor is identical toB(n) for n = |g(n, n)|−1/2n.

Using the extrinsic curvature KΣ, the 3-dimensional Rie-mann tensor (defined only for tangent vectors x,y, z, t) can berewritten as

(3)R(x,y, z, t) =(4)R(x,y, z, t) +KΣ(x, t)KΣ(y, z)

−KΣ(y, t)KΣ(x, z). (5.33)

This is called the Gauss-Codazzi equation.The induced connection (3)∇ can be expressed through theextrinsic curvatureKΣ as

(3)∇xy = ∇xy − ng(∇xn,y) = ∇xy − nKΣ(x,y).

where x,y are tangent to Σ. In the index notation, the con-ventional way to denote induced derivatives is by a verticalbar,

((3)∇xy

≡ xνyµ|ν .

Thus we can write (for a tangent vector y ≡ yµ)

yµ|ν = yµ;ν − nµyαKαν .

Remark: The reason for calling KΣ the “extrinsic” curva-ture is the following. The “intrinsic” curvature (3)R of the3-surface Σ will differ from the full curvature (4)R of Mif the 3-surface Σ is embedded into the spacetime M in a“curved” way. The tensor KΣ contains the complete infor-mation needed to compute the intrinsic curvature on Σ if thefull curvature (4)R is known. The covariant derivative (3)∇ isalso expressed throughKΣ and

(4)∇. Therefore, the tensorKΣ

completely characterizes the additional curvature on Σ due tothe embedding of Σ intoM. Note, however, that not all thecomponents of (4)R can be restored from the knowledge ofthree-dimensional tensors (3)R and KΣ: one also needs someinformation about the 4-vector field n and its derivatives.

Statement: The relation (5.33) was derived only for tangentvectors; the tensor (3)R vanishes if any of its arguments is par-allel to n. It is possible, however, to express (4)R(x,y, z,n) interms ofKΣ for tangent x,y, z:

(4)R(x,y, z,n) =(

(3)∇yKΣ

)

(x, z) −(

(3)∇xKΣ

)

(y, z).

(5.34)We can derive Eq. (5.34), starting from Eq. (5.32).Derivation: We start from

(3)R(x,y)z = (4)R(x,y)z + [Γ(x),Γ(y)] z

+ (∇xΓ)(y)z − (∇yΓ)(x)z.

We write K instead of KΣ for brevity. Substituting Γ(x)y =nK(x,y) and using the fact that K is symmetric and trans-verse to n, we find

Γ(x)Γ(y)z = nK(Γ(y)z,x) = nK(n(...), ...) = 0,

(∇xΓ)(y)z = (∇xn)K(y, z) + n (∇xK) (y, z),

g((∇xΓ)(y)z,n) = (∇xK) (y, z).

Since (3)R(x,y, z,n) = 0, we have

(4)R(x,y, z,n) = (∇yK) (x, z) − (∇xK) (y, z).

The derivatives ∇K can be replaced by (3)∇K because K istransverse to n:

K(∇xy, z) = K((3)∇xy, z) +K(n(...), ...) = K((3)∇xy, z);

(∇xK) (y, z) ≡ Lx (K(y, z)) −K(∇xy, z) −K(y,∇xz)

= (3)∇x (K(y, z)) −K((3)∇xy, z) −K(y, (3)∇xz)

= (3) (∇xK) (y, z).

Thus we obtain the desired formula.

5.2.4 Boundary term in Einstein-Hilbert action

In Sec. 5.1.2 we derived the Einstein equation from a varia-tional principle by imposing the condition that both δg and

∇δg must vanish on a boundary surface ∂M. This is dif-ferent from the usual condition where only the variation ofa field is constrained to vanish (but not its derivative). Theformal reason for imposing the extra condition was the pres-ence of a total divergence term,

M

√−gd4pdivq, where q isa vector field depending on the derivatives ∇δg, as defined byEq. (5.14). However, one can cancel this divergence term bysubtracting an extra boundary term from the Einstein-Hilbertaction, thus defining the “corrected” action SG[g],

SG ≡ SEH −∮

∂M

2K√hd3p,

115

Page 124: Winitzki. Advanced General Relativity

5 Variational principle

where K is a suitable function (the factor 2 is for later conve-nience), and h is the induced metric on the 3-surface ∂M. Inthis section, we show how to choose the required function Kso that the variation of the action SG[g] yields simply

δSG[g] =

M

√−gd4p

(

Rµν −1

2Rgµν

)

δgµν(p),

without any total divergence terms. The only boundary con-

dition will be that δgµν should vanish on ∂M.An analogy with ordinary mechanics may be helpful. Con-sider the familiar nonrelativistic Lagrangian

L(q, q) =1

2mq2 − V (q),

describing a particle of mass m in a potential V (q). This La-grangian yields the same Euler-Lagrange equations as

L(q, q, q) = −1

2mqq − V (q).

However, the Lagrangian L depends on the second derivative

of q. The variation of L with respect to δq(t) contains bound-ary terms with δq(t) as well as δq(t),

δ

∫ t2

t1

Ldt = − 1

2mqδq

∣∣∣∣

t2

t1

+1

2mqδq

∣∣∣∣

t2

t1

−∫ t2

t1

(mq + V ′(q)) δq(t)dt.

At first glance, we are required to impose the boundary con-ditions δq(t) = δq(t) = 0 for t = t1,2. However, the problem

with the Lagrangian L can be fixed by adding a “boundaryterm,” so that the “corrected” action becomes

Scorr[q] =

∫ t2

t1

L(q, q, q)dt+1

2mqq

∣∣∣∣

t2

t1

.

The variation of the boundary term is

δ1

2mqq

∣∣∣∣

t2

t1

=1

2mqδq

∣∣∣∣

t2

t1

+1

2mqδq

∣∣∣∣

t2

t1

,

which precisely cancels the unwanted termwith δq in the vari-

ation δL.Instead of adding a boundary term, we may add its deriva-

tive to the Lagrangian L, so that the “corrected” action is writ-ten as

Scorr[q] =

∫ t2

t1

[

L(q, q, q) +d

dt

(1

2mqq

)]

dt.

Of course, this is identical to the usual action∫Ldt for the

particle in a potential. The additional term cancels the second

derivative present in L.In General Relativity, it is more convenient to keep theboundary term as such. The Einstein-Hilbert Lagrangian√−gR is equivalent to other Lagrangians depending only onfirst derivatives of the metric, but none of the “corrected” La-grangians cannot be written in a covariant form.Presently, we are looking for a function K such that

δ (2K) = g(n,q), (5.35)

where n is a normalized vector field orthogonal to the 3-

surface ∂M. If we find such K , then the boundary term willbe equal to the volume integral of div (2Kn).

The induced partialmetric h is defined using the orthogonal

projector onto the tangent bundle T∂M,

hµν = gµν − nµnν ; hµν = gµν − nµnν .

(For simplicity, we assume that the 3-surface ∂M is every-where spacelike and g(n,n) = 1. For spacelike n, the inducedmetric would be h−1 = g−1 + n ⊗ n and we would need totake

√h instead of

√−h. Finally, the surface ∂M may con-

sist of timelike and spacelike pieces, as long as it is nowherenull.) The remainder of this section is occupied by a calcula-tion showing thatK ≡ hµνnµ;ν satisfies Eq. (5.35).

Remark: Note that the quantityK is equal to the trace of the

extrinsic curvature of the 3-surface ∂M, namely

K = Tr(x,y)K∂M(x,y) = gµνKµν = hµνKµν .

On the surface ∂M, we have hµνnµ;ν = gµνnµ;ν = divn. How-ever, the function divn contains non-tangential derivatives ofthe metric and thus cannot be used for a variational principlewhere∇δg is not zero.We begin with Eq. (5.17),

g(n,q) = nβ(δgαβ ;α − gµνδg

µν;β)= −nβgµν(δgµβ;ν − δgµν;β) .

(5.36)Since the variation δgµν of the metric vanishes on ∂M, it fol-lows that any tangential derivative of δg vanishes. Since theinverse partial metric hαβ consists of tensor products of tan-gential vectors, we have

hαβδgµν;α = 0. (5.37)

Substituting gµν = hµν + nµnν into Eq. (5.36), we thereforefind

g(n,q) = −nβhµν (δgµβ;ν − δgµν;β) = nβhµνδgµν;β .

On the other hand, we consider the variation δK due to thevariation δg. Since δg = 0 on the surface ∂M, the vector fieldn remains normalized and orthogonal to the surface, and theonly varying quantity is the Levi-Civita connection ∇, thus

δK ≡ hµν(

∇µnν −∇µnν

)

= −δΓαµνhµνnα.

Using Eqs. (5.15) and (5.37), we find

δK = −1

2hαβnµ (δgµα;β + δgµβ;α − δgαβ;µ) =

1

2hαβnµδgaβ;µ.

Therefore,2δK = hαβnµδgaβ;µ = g(n,q)

as required.

Remark: Adding a boundary integral to the Einstein-Hilbertaction is equivalent to adding a total divergence∇µ(2Kn

µ) ≡div(2Kn) to the Lagrangian

√−gR, provided that the vectorfield n is extended from the boundary ∂M to the entire do-

main M (note thatK is a function of n and the metric g). Moregenerally, one can add an arbitrary total divergence divB ofsome vector field B to the Lagrangian, without changing theequations of motion. When the vector field B is chosen ap-propriately, the variation δSG of the “corrected” action

SG[g] ≡ 1

16πG

M

[R− divB]√−gd4x

116

Page 125: Winitzki. Advanced General Relativity

5.2 Hamiltonian formulation

contains only volume and boundary terms with δgµν . Thisindicates that SG[g] is first-order in the derivatives of g, as arethe Lagrangians of most other field theories. However, thevector field B cannot be expressed as a function of the metricg alone.It seems4 that B can be expressed through a choice of four

vector fields, one of which should coincide with n on ∂M.Denote by K[n] the extrinsic curvature function evaluated ona normalized vector n. Since we will need to vary the metricwhile keeping the vector n fixed, we need a formula for K[n]that includes the normalization of n,

K[n] = divn− g(n,∇nn)

g(n,n).

Choose an orthonormal frame e0, e1, e2, e3 in the entire do-main M. The vector fields eµ are orthonormal with respectto the metric g. Then define

B =∑

µ

2K[eµ]g(eµ, eµ)eµ.

This specifies B as a function of g that contains first deriva-tives of g. Then one can show that R − divB depends onlyon first derivatives of g. This can be shown, for instance, byconsidering a conformal transformation g → g ≡ e2λg. One

computes R − divB and checks that the term containing λ

in R cancels with the terms∑

µ eµ (eµ λ) coming from B.This explanation needs some more detail.Thus, the explicit form of SG[g] depends not only on g butalso on an essentially arbitrary choice of the vector field B

in the interior of the domain M. Therefore, it is impossibleto write the first-order action SG[g] as an explicit and covari-ant (i.e. coordinate-independent) function only of g. In a par-ticular coordinate system, the vector field B may be chosenwith particular numerical components so that the first-orderLagrangian appears to be a function only of the components gµνand gµν,α. However, this form of the first-order Lagrangian isnot covariant (it depends on a chosen coordinate system).

5.2.5 The Hamiltonian for pure gravity

Now we shall compute the canonical momenta and theHamiltonian for pure gravity (without matter) in general rel-ativity.For simplicity, we choose a time foliation such that the(timelike) vector field n ≡ g−1dt normal to the surfaces t =const is normalized, g(n,n) = 1. This means that n is ageodesic vector field (why?); also, t is the proper time alongthe flow lines of n. This choice of the time foliation, called thesynchronous gauge, may be impossible to perform globallydue to focusing of the field n. However, this choice is alwayspossible locally, because one can emit timelike geodesics nor-mal to an arbitrary initial spacelike surface and use the propertime along the geodesics as the coordinate t. It is also con-venient to choose the local coordinates xa within a surfacet = const such that the basis vectors ea are connecting vectorsfor n, i.e. Lnea = 0. This completely removes the coordinatefreedom (“fixes the gauge”).The partial metric h induced on the 3-surface t = const is

hµν = gµν − nµnν .

4I need to make this consideration more precise.

Hence, in a basis n,a,b, c, where a,b, c are orthogonal to n,the metric g has the matrix representation

gµν =

1 0 0 00 ∗ ∗ ∗0 ∗ ∗ ∗0 ∗ ∗ ∗

,

where the stars indicate the unknown nonzero componentshab of the partial metric h. Note that

√−g =√−h, so the

covariant 4-volume element is√−gd4x =

√−hdt ∧ d3xa,

while the covariant 3-volume element on a surface t = constis√−hd3xa.It is clear that the Lagrangian depends only on h, thereforewe choose h(t, xa) to be the set of “generalized coordinates”in the sense of the Hamiltonian formalism. The next task isto find the dependence of the Lagrangian,

√−gR, on ∂hab/∂tand to compute the corresponding canonical momenta pab.

The time derivative hab ≡ ∂hab/∂t can be expressed in ageometric way as

h ≡ Lnh,

since the basis vectors in the 3-surface commute with n ≡ ∂t,i.e. Lnea = 0:

∂hab∂t

= ∂t (h(ea, eb)) = Ln (h(ea, eb)) = (Lnh) (ea, eb).

Hence, for tangent vectors x,y we have

(Lnh) (x,y) = Ln (h(x,y)) − h(Lnx,y) − h(x,Lny)

= ∇nh(x,y) − h(∇nx −∇xn,y) − h(x,∇ny −∇yn)

= h(∇xn,y) + h(x,∇ny) = 2K(x,y), (5.38)

where K(·, ·) is the extrinsic curvature of the 3-surface t =const. Therefore, hab = 2Kab, and the dependence of the

Einstein-Hilbert action on hab will be found if we express

SEH [g] =∫d4x

√−gR through the extrinsic curvature Kab ofthe surfaces t = const.We already saw a relation of precisely this sort, namely theGauss-Codazzi equation (5.33) that expresses the 4-curvatureR ≡ (4)R through the 3-curvature (3)R and the extrinsic cur-vatureKab. The only nontrivial calculation in this section willbe to show that∫

d4x√−g(4)R =

d4x√−g

((3)R+KabK

ab −KaaK

bb

)

,

(5.39)up to total derivative terms that we omit. Note that Eq. (5.33)is very similar to Eq. (5.33), except for the different sign at theK terms. We shall derive Eq. (5.39) from Eq. (5.33) at the endof this section, and now let us assume that Eq. (5.39) is validand determine the canonical momenta.Since the Lagrangian density

√−gR depends on ∂h/∂t onlythrough the terms containing Kab, the canonical momentumpab is

pab ≡ 1

16πG

∂ (√−gR)

∂hab=

√−h

32πG

∂Kab

(KabK

ab −KaaK

bb

).

Rewriting for convenience

KabKab −Ka

aKbb = KabKcd

(hachbd − habhcd

),

we find after a simple calculation

pab =

√−h

16πG

(Kab −Kc

chab). (5.40)

117

Page 126: Winitzki. Advanced General Relativity

5 Variational principle

The inverse relation allows us to express the “velocity” hthrough the momentum p:

hab = 2Kab =32πG√−h

(

pab − 1

2pcch

ab

)

.

Now we can write the Hamiltonian for pure gravity:

H[pab, hab]

=

t=const

d3x

[

pabhab −√−h

16πG

((3)R+KabK

ab −KaaK

bb

)]

=

t=const

d3x

√−h

16πG

[

KabKab −KaaK

bb − (3)R

]

=

t=const

d3x

[

−√−h

16πG(3)R+

16πG√−h

(

pabpab −1

2paap

bb

)]

.

(5.41)

The Hamilton equations of motion are cumbersome but fol-low straightforwardly,

∂hab∂t

=∂H

∂pab=

32πG√−h

(

pab −1

2habp

cc

)

,

∂pab∂t

= − δH

δhab

= −√−h

16πG

δ(3)R

δhab+

δ

δhab

(

pabpab −1

2paap

bb

)16πG√−h

= −√−h

16πG

(

(3)Rab −1

2(3)Rhab

)

+8πG√−h

[(

pabpab −1

2paap

bb

)

hab − 4

(

pacpcb −

1

2pccpab

)]

.

Note that we used the three-dimensional Ricci tensor (3)Rabwhen computing the variation of (3)R.The equations of motion in the Hamiltonian form are equiv-alent to Einstein’s equations, but now it is clear that the metrichab(t) and the extrinsic curvature Kab(t) can be found frominitial data hab(t0), Kab(t0) at an initial spacelike 3-surfacet = t0, as a solution of an appropriate Cauchy problem. Thishas applications in “numerical relativity,” i.e. a numerical so-lution of the Einstein equations.

Derivation of Eq. ( 5.39)

Let us now derive Eq. (5.39) from Eq. (5.33).The 3-dimensional curvature (3)R is defined as the3-dimensional trace of the induced Riemann tensor(3)R(a,b, c,d) with respect to (a, c) and (b,d). Since n

is orthogonal to the 3-surfaces, the 3-dimensional trace (3)Trof any tensor A can be expressed through its 4-dimensionaltrace as

(3)Tr(x,y)A(x,y) = (4)Tr(x,y)A(x,y) −A(n,n).

The Gauss-Codazzi equation (5.33) is

(3)R(x,y, z, t) = (3)R(x,y, z, t)+K(x, t)K(y, z)−K(y, t)K(x, z).

We begin by evaluating the 3-dimensional trace of the aboveequation over (x, z):

(3)Tr(x,z)(3)R(x,y, z, t) =(3)Tr(x,z)

(4)R(x,y, z, t)

+(3)Tr(x,z)K(x, t)K(y, z) −K(y, t)Tr K

= (4)Tr(x,z)(4)R(x,y, z, t) − (4)R(n,y,n, t)

+(3)Tr(x,z)K(x, t)K(y, z) −K(y, t)Kaa ,

where we wrote Kaa ≡ (3)Tr(x,y)K(x,y). Then we evalu-

ate the 3-trace of the result over (y, t), using the fact thatR(n,n, ·, ·) = 0 and hence

(3)Tr(x,z)(4)R(n,x,n, z) = (4)Tr(x,z)

(4)R(n,x,n, z).

We find

(3)R ≡ (3)Tr(x,z)(y,t)(3)R(x,y, z, t) =(4)R

−2(4)Tr(x,z)(4)R(n,x,n, z) +KabK

ab −KaaK

bb .

It remains to compute the term (4)Tr(x,z)(4)R(n,x,n, z). I

shall present a computation in the index notation and also anindex-free computation, for comparison. Let us first convertthis term to the index notation:

Tr(x,z)(4)R(n,x,n, z) ≡ Tr(x,z)zδgγδnαxβ [∇α,∇β]n

γ

= gβγgγδnα [∇α,∇β ]n

γ = nα [∇α,∇β ]nβ .

The last expression is simplified as

nαnβ ;βα − nαnβ ;αβ =(

nαnβ;β

)

;α− nα;αn

β;β −

(nαnβ;α

)

;β+ nα;βn

β;α

= div (...) +KαβKαβ −Kα

αKββ ,

where we used the relationKαβ = nα;β and omitted the unin-teresting total divergence terms. Finally, we obtain

(3)R ≡ (4)R−KabKab +Ka

aKbb + div (...) ,

which is equivalent to Eq. (5.39).In the index-free notation, the manipulations with traces aresomewhat less transparent. We need to use the property thatmute vectors have vanishing derivatives under the trace op-eration, as well as

A(x,y, z, ...) = Tr(a,b)g(a,x)A(b,y, z, ...)

(see Sec. 1.7.3). Also, we have ∇nn = 0 and K(x,y) =K(y,x). Thus,

Tr(x,z)(4)R(n,x,n, z) = Tr(x,z)g(z,∇n∇xn + ∇∇xnn)

=Tr(a,b)(x,z) [g(a,n)g(z,∇b∇xn) + g(a,∇xn)g(z,∇bn)] .

The second term in brackets yields directly

Tr(a,b)(x,z)g(a,∇xn)g(z,∇bn) = Tr(a,b)(x,z)K(a,x)K(z,b) = KabK

while the first term in brackets is transformed as

g(b,n)g(z,∇a∇xn) = ∇a [g(b,n)g(z,∇xn)]−g(b,∇an)g(z,∇xn),

which yields a total divergence and the term

−Tr(a,b)(x,z)g(b,∇an)g(z,∇xn) = −[Tr(a,b)K(a,b)

]2= − (Ka

a)2 .

Therefore we again recover Eq. (5.39).

5.2.6 Constraints in General Relativity

The Hamilton equations of motion are incomplete withoutthe constraint equations which we neglected to derive earlier.These constraints are present because the Lagrangian LEH isindependent of the time derivatives of the components g0µ ofthe metric. To see this explicitly, we can repeat the derivation

118

Page 127: Winitzki. Advanced General Relativity

5.2 Hamiltonian formulation

of the Hamiltonian formulation without assuming that t is theproper time along geodesic lines of n.We have computed the Hamiltonian in the syncronousgauge for simplicity, but the calculation can be performed inan arbitrary gauge, i.e. with an arbitrary selection of the timefoliation. Suppose that a function t is given on a spacetime,such that the contravariant gradient vector g−1dt is every-where timelike (but not necessarily normalized). We may usethe function t as the time coordinate and define equal-time3-surfaces t = const. Let us define the normal vector to the3-surfaces, n ≡ Ng−1dt, where N is such that g(n,n) = 1,namely N−2 = g−1(dt,dt). The local coordinates xa on these3-surfaces may be chosen arbitrarily. Given a choice of thelocal coordinates xa, we may define the vector ∂t which actson functions f(t, xa) as f → ∂f/∂t, the partial derivative atfixed xa. The vector ∂t is not necessarily parallel to n, and byconstruction we have

g(n, ∂t) = Ng(g−1dt, ∂t) = N (dt) (∂t) = N,

hence in general we can decompose ∂t as

∂t = Nn + s,

where s is a spacelike vector tangent to the surface t = const,

which is thus equivalent to a 3-vector s =∑3a=1 s

aea in thelocal basis ea. The parametersN and sa are called the lapsefunction and the shift vector for a chosen coordinate sys-tem. The lapse describes how much time “elapses” betweentwo consecutive t = const surfaces along the lines of n, andthe shift vector shows how much the local coordinate systemshifts when we pass from one t = const surface to anotheralong n.Let us now determine the induced metric h within the 3-surfaces. The matrix hab ≡ g(ea, eb) is related to the metricgµν in the basis ∂t, ea by the following matrix representation(verify this!),

gµν =

N2 + h(s, s) s1 s2 s3s1 h11 h12 h13

s2 h21 h22 h23

s3 h31 h32 h33

,

where sa ≡ habsa. It follows from this representation that the

determinants of g and h are related by√−g = N

√−h. (See

Calculation below.)

Calculation: Show that det g = N2 dethwhen gµν is relatedto hab by the above formula.Solution: To compute det g directly, subtract from the firstrow of the matrix gµν the linear combination of the other threerows with coefficients sa. This does not change the determi-nant, but the first row of the matrix simplifies to

(N2, 0, 0, 0

).

Hence

det gµν = det

∣∣∣∣

N2 + sasa sa

sb hab

∣∣∣∣

= det

∣∣∣∣

N2 0sb hab

∣∣∣∣= N2 dethab.

The action (5.39) involves (3)R, which is independent of thechoice of gauge, and the extrinsic curvatureKab. The relation-ship between the extrinsic curvatureK(x,y) = h(x,∇yn) andthe metric derivatives ∂hab/∂t needs to be derived now. Since

∂t is a connecting vector for ea, we have

∂hab∂t

≡ ∂t [h(ea, eb)] = L∂t[h(ea, eb)]

= (L∂th) (ea, eb),

and hence for arbitrary vectors x,y tangent to a 3-surface, wefind, similarly to Eq. (5.38),

(L∂th) (x,y) = h(∇x∂t,y) + h(x,∇y∂t).

Since h(n, ·) = 0, we can simplify the above terms, e.g.

h(∇x∂t,y) = h(∇x(Nn + s),y)

= Nh(∇xn,y) + h(∇xs,y)

= NK(x,y) + h((3)∇xs,y).

In the last line, we replaced∇ by (3)∇ since, by definition,

h((3)∇xy, z) ≡ g(∇xy, z)

for tangent vectors x,y, z. Hence,

∂hab/∂t = 2NKab + sa|b + sb|a,

where we denote (3)∇a by a vertical bar.It now follows from Eq. (5.39) that the Lagrangian LEH =√−gR does not depend on the time derivatives of N and

sa. Therefore, we do not need to introduce the canonical mo-menta for these nondynamical variables. As before, the onlycanonical momenta are pab corresponding to the componentshab of the induced 3-metric, and a simple calculation showsthat pab is still related to Kab by Eq. (5.40). Hence, the Hamil-tonian is

H =

d3xa

√−h

16πG

[(

KabKab −KaaK

bb − (3)R

)

N

−(Kab −Kc

chab)

|asb

]

.

A complete derivation of the Hamiltonian in an arbitrarygauge (including also a detailed treatment of all the bound-ary terms) can be found in the book [Poisson 2004], chapter 4.Here, we shall only determine the constraints of the theory.The variables N and sa enter the Hamiltonian as nondy-namical Lagrange multipliers, therefore the constraints are

δHδN

= 0 ⇒ KKab −KaaK

bb − (3)R = 0,

δHδsa

= 0 ⇒(Kab −Kc

chab)

|a= 0.

The constraint δH/δN = 0 is called the Hamiltonian con-straint or the energy constraint, while δH/δsa = 0 is calledthe momentum constraint. We find (perhaps with surprise)that the constraints make the Hamiltonian itself vanish, H =0.

Remark: A vanishing Hamiltonian is a characteristic featureof any theory which is invariant under arbitrary coordinatetransformations. The constraint means that H[pab, hab] = 0for pab, hab that solve the equations of motion. It is importantto realize that the constraint H = 0 does not make the theorytrivial; the Hamiltonian H[pab, hab] is a nontrivial functional ofthe canonical variables pab, hab and can be used to derive theequations of motion.

119

Page 128: Winitzki. Advanced General Relativity

5 Variational principle

5.3 Quantum cosmology

Additional literature: [35].

The Hamiltonian description of General Relativity was de-veloped to quantize that theory. We shall now describe theapproach to quantization of gravity due to J. Wheeler and B.DeWitt. So far it was impossible to develop a full quantumtheory of gravity based on the Hamiltonian description pre-sented above. Technical difficulties are too great to derive indetail, for example, the quantum state that corresponds to aSchwarzschild spacetime. Therefore this approach to quanti-zation of gravity remains formal.5 However, as we shall nowsee, some interesting qualitative results can be obtained in theWheeler-DeWitt approach.

5.3.1 Wave function of the universe

According to the general scheme of canonical quantization,we need to replace the coordinates q, p by operators p, q satis-fying the standard equal-time commutation relations. In thecase of gravitation, we shall therefore write

[

pmn(t, xa), hcd(t, xb)

]

= δcmδdnδ

(3)(xa − xb).

It is easier to visualize the quantum theory in the Schrödingerpicture. The quantum state in the “coordinate representation”is a wave function ψ(t; qi) of time t and the generalized co-ordinate qi; the operator qi acts as multiplication by qi, andpi = −i ∂∂qi

. In General Relativity, the role of qi is played by the

3-metric hab(xc), stripped of the dependence on time. There-fore, the “wave function” is a functionalΨ [t;hab(xc)]. Heuris-

tically, one might expect that |Ψ[t;hab]|2 is the probability ofobserving the metric hab(xc) at time t. (Below, we shall seethat the interpretation of this wave function is not quite asstraightforward as it may appear.)

We may also consider other matter fields φj(t, xa) coupledto gravity. In the Hamiltonian formalism, these fields willbe described by generalized coordinates φj(xa) and the cor-responding canonical momenta pj(xa). Thus, the wave func-tion of the full theory will be a functional of hab and φj . Such afunctional Ψ[t;hab(xc), φj(xc)] is called the wave function ofthe universe because it describes (in principle) all the possibleprocesses anywhere in the universe.

The space consisting of all the possible gravity and mat-ter field configurations hab(xc), φj(xc) is called the super-space.6 Thus, the wave function of the universe is a (complex-valued) function on superspace.

According to the standard rules of quantum mechanics, thewave function of the universe satisfies the Schrödinger equa-

5Currently, a more promising approach to canonical quantization of GeneralRelativity is a Hamiltonian approach based on a different choice of gen-eralized coordinates, called “Ashtekar variables.” The theory based onthese variables is called “loop quantum gravity” and is beyond the scopeof this book.

6Note that a point of superspace is a configuration hab(xc), φj(xc) whichcontains functions only of 3-dimensional coordinates xc, not of time t.This can be visualized as an “instantaneous” field configuration on a 3-surface of constant time.

tion,

i∂

∂tΨ[t;hab, φj ] = HΨ[t;hab, φj ]

≡ H (pab, hab; pj, φj)Ψ[t;hab, φj ]

= H[

δ

δhab(xc), hab(xc);

δ

δφj(xc), φj(xc)

]

Ψ[t;hab, φj ].

Here, the operator H is the total Hamiltonian of the sys-tem, which is itself a functional of all the canonical variables.For instance, the Hamiltonian for pure gravity is given byEq. (5.41). Note that the operators pab and pj are replaced byfunctional derivatives with respect to hab and φj .

5.3.2 Wheeler-DeWitt equation

Since classical General Relativity is a constrained theory, theconstraints must be passed on to the quantum theory as well.A comprehensive treatment of quantization for constrainedsystems is beyond the scope of these lectures; here we shalladopt the simplistic point of view that the constraint equa-tions, which are of the form C(qi, pi) = 0, should be madeoperator-valued and imposed as operators on physical states:

C(qi, pi)Ψ(qi) = 0.

In other words, quantum states Ψ satisfying the constraintsare the only valid physical states of the system. It follows thatthe Hamiltonian constraint, H = 0, is translated to the restric-tion

HΨ[t;hab, φj ] = 0. (5.42)

Then the Schrödinger equation yields

i∂

∂tΨ[t;hab, φj ] = 0.

Hence, the wave function of the universe is time-independentand satisfies the Hamiltonian constraint (5.42), which is calledin this context theWheeler-DeWitt (WD) equation. The timeindependence of the wave function may appear to be a puz-zling feature of the theory. However, it is a necessary feature:In a generally covariant theory, the time parameter t is anarbitrary label on events in spacetime, and physically mean-ingful probabilities must be defined in terms of coordinate-independent functionals of hab and φj . We shall consider theinterpretation of Ψ[t;hab, φj ] in the next section.Although we wrote the equations for quantized GeneralRelativity, it remains difficult to extract any tangible resultsfrom these equations. Quite apart from the problem of solv-ing the highly complicated equation (5.42), the question of theoperator ordering: The HamiltonianH is a nonlinear functionof hab and pab, containing terms such as habhcdp

acpbd, hence itis unclear how to order these noncommuting operators in the

quantum Hamiltonian H. The quantum Hamiltonian is thusnot a well-defined operator unless we adopt a particular pre-scription for the operator ordering. To determine the “correct”operator ordering, one needs to compute some predictions ofthe quantum theory with one or another operator orderingand to compare these predictions with experiments or withother known results. Such computations are generally too dif-ficult, and thus the Wheeler-DeWitt equation (5.42) remains,in its full generality, a formula without application. (How-ever, belowwe shall see that theWheeler-DeWitt equation canbe transformed into a differential equation and solved, if onerestricts gravity and matter fields to spatially homogeneousconfigurations.)

120

Page 129: Winitzki. Advanced General Relativity

5.3 Quantum cosmology

5.3.3 Interpretation of the wave function

Even if one somehow computes the wave function of the uni-verse Ψ, its interpretation is nontrivial, mainly because thefunctional Ψ[hab, φj ] is independent from the time parametert. This fact, however, does not mean that the theory alwaysdescribes a static universe! A physically meaningful “time”must be defined not as the value of the parameter t, whichcan be changed by a coordinate transformation, but throughsome physical process (a “clock”). Since the wave functionof the universe contains information about all the processesthrough its dependence on hab and φj , the theory can describean evolving universe if an appropriate “clock process” couldbe found.It is important to realize that the clock process must be re-alized by an essentially classical rather than by an essentiallyquantum physical system. In the quantum language, the gen-eralized coordinate describing the clock process must exhibitvery small quantum fluctuations around a large expectationvalue. If the universe were in a quantum state Ψ in whichquantum fluctuations of every variable (including the metric)are significant, one could not expect to observe anything re-sembling a “flow of time.” Therefore, even the possibility of aninterpretation of the wave function of the universe dependson the existence of (nearly) classical systems in the universe.Suppose that a nearly classical subsystem exists and is de-scribed by a variable φc. According to a standard result ofquantum mechanics, the wave function of the subsystem is ofthe semiclassical (WKB) type,

ψ(φc) ∝ exp (−iScl(φc)) ,

where Scl(φc) is the value of the classical action on a classi-cal trajectory φ(t), expressed as a function of the value of thevariable φc at a final time,

Scl(φc) =

∫ t

t0

Lφc(φ, φ)dt, φ(t) = φc.

The total wave function of the universe is a product of thesemiclassical part ψ(φc) and the quantum part Ψq ,

Ψ[hab, φj ] = exp (−iScl(φc))Ψq[φc;hab, φj ].

Let us now see how the variable φc can be used as a clockprocess. The trajectory φc(t) of the classical subsystem is, atleast locally, a monotonic function of t. Therefore, the val-ues of φc can be used as “time” and then the quantum part ofthe wave function, Ψq[φc;hab, φj ], becomes a time-dependentwave function for the quantum variables hab, φj . One canshow (see e.g. [35]) that a Schrödinger-type equation holds forthis wave function, the variable φc playing the role of time, ifthe total wave function Ψ satisfies the WD equation.

5.3.4 “Minisuperspace”

The WD equation is a functional differential equation for afunctional on a “superspace” and, as such, is too complicatedto be solved in general. One can obtain specific results fromthe WD equation if one simplifies the problem sufficientlydrastically.Note that one of the main applications of General Rela-tivity is cosmology where one considers only the gross fea-tures of the universe, that is, fields averaged over astronom-ically large scales. Astronomical observations offer a strong

evidence that the universe around us is extremely homoge-neous on large scales. Therefore, in cosmology one usuallyassumes that the 3-surfaces of constant time are spatially ho-mogeneous. In other words, the 3-metric hab(t) and the fieldsφj(t) depend only on the time t and are independent of thespatial coordinates xc. Hence, we are motivated to considerthe theory of quantized gravity with spatially homogeneousmetric and fields. The corresponding simplification of theWDequation consists of reducing the infinite-dimensional super-space to a finite-dimensional space containing 3-metrics haband field configurations φj that are spatially homogeneous,i.e. independent of the 3-coordinates xc. The reduced super-space is calledminisuperspace. The WD equation in minisu-perspace becomes a (partial) differential equation for a wavefunction Ψ(hab, φj). The analysis of this simplified WD equa-tion is the subject of quantum cosmology.To be specific, let us consider a model of classical space-time whose equal-time surfaces are homogeneous 3-spheresS3 (hence, a closed universe!) with a fixed 3-metric γ. Thespacetime metric is

g = N2(t)dt⊗ dt− a2(t)γ,

whereN(t) is the lapse function and a(t) is an unknown func-tion of time called the scale factor. Since the lapse functionis nondynamical, the scale factor is the only physical vari-able in the gravitational sector of the model. Further, supposethere exists a scalar field φ(t) which is, again, independent ofthe spatial coordinates xc. Thus the minisuperspace is two-dimensional and consists of two variables, a and φ. We shallnow quantize this cosmological model using the WD equa-tion.We need to express the classicalHamiltonian (5.41) for grav-ity through the variable a(t), and add the Hamiltonian for thescalar field φ,

Hφ =

t=const

d3xcN√−h[1

2p2φ + habφ,aφ,b + V (φ)

]

,

where pφ is the canonical momentum for φ, and

V (φ) ≡ Λ +1

2m2φ2 +O(φ3)

is a potential describing the vacuum energy density Λ, themass m, and a possible self-interaction of the field φ. Thepartial metric h = −a2γ, so

√−h = a3√γ. Since the field φ

is spatially homogeneous, we may integrate over 3-spheres,assuming that the metric γ is normalized so that

t=const

d3xc√γ = σ3 = 2π2,

where

σn ≡ 2πn/2

Γ(n/2)

is the n-volume of a unit n-sphere Sn. (This normalization ofthe 3-metric γ means that the radius of the 3-sphere is equalto 1.) Thus we obtain

Hφ = Na3σ3

[1

2p2φ + V (φ)

]

.

Returning to the gravitational sector of the model, we need tocompute the 3-curvature scalar (3)R and the extrinsic curva-ture tensor Kab. The 3-curvature

(3)R depends only on the in-trinsic geometry of the 3-surface t = const, which is a 3-sphere

121

Page 130: Winitzki. Advanced General Relativity

5 Variational principle

S3 with the natural metric and radius a(t), and thus a space ofconstant curvature; the value of the curvature is a−1(t). The 3-dimensional Riemann tensor for a space of constant curvaturewas computed in Sec. 1.10.3, hence

(3)R(a,b, c,d) = a−2 (h(a, c)h(b,d) − h(a,d)h(b, c)) .

Using(3)Tr(a,b)h(a,b) = 3,

we find

(3)R = (3)Tr(a,c)(b,d)(3)R(a,b, c,d) = 6a−2.

It remains to compute the extrinsic curvature. We may intro-duce local coordinates xa on a surface t = const; however,the explicit form of these coordinates will not be necessaryfor the present calculation. It is sufficient to note that the ba-sis 3-vectors ea ≡ ∂xa

commute with each other and with ∂t.Hence, from Eq. (1.43) we get

g(∇eb∂t, ec) =

1

2∂tg(eb, ec) = aaγ(eb, ec) =

a

ahbc.

Since the normal vector n to the 3-surfaces t = const is n =N−1∂t, the extrinsic curvature tensor is

K (x,y) ≡ g(∇xn,y) = N−1g(∇x∂t,y) = N−1 a

ah(x,y).

Then we determine the relevant combination of the traces ofK ,

KbcKbc −Kb

bKcc =

(a

Na

)2(hbch

bc − hbbhcc

)= −6

(a

Na

)2

.

The gravitational Lagrangian becomes

LG =

t=const

d3xN√−h

16πG

((3)R+KabK

ab −KaaK

bb

)

=

t=const

d3xN√−h

16πG

(

6

a2− 6

(a

Na

)2)

.

We can now integrate over the 3-volume of the 3-sphere t =const, ∫

t=const

d3x√−h = a3σ3,

because all the terms are spatially homogeneous. Therefore,

L =6Na3σ3

16πG

(1

a2− a2

N2a2

)

=3π

4GNa

(

1 − 1

N2a2

)

.

To compute the Hamiltonian, it remains to determine thecanonical momentum pa,

pa =∂L∂a

= − 3π

2G

aa

N.

Hence,

a = −2G

N

apa,

and the gravitational Hamiltonian is

HG(pa, a) = paa− LG

=−3π

4GNa+

N

ap2a

(

−2G

3π+

4G

4G2

9π2

)

=−3π

4GNa− G

N

ap2a.

Finally, we can write the total Hamiltonian for the minisu-perspace theory,

H (pa, a, pφ, φ) = HG + Hφ

=N

a

[

− G

3πp2a −

4Ga2 + a4σ3

(1

2p2φ + V (φ)

)]

.

Therefore, the “wave function of the mini-universe,” Ψ(a, φ),satisfies the WD equation

[

− G

3πp2a −

4Ga2 + 2π2a4

(1

2p2φ + V (φ)

)]

Ψ

=

[G

3π~

2∂2a −

4Ga2 + π2a4

(−~

2∂2φ + 2V (φ)

)]

Ψ = 0.

This equation needs to be supplemented by boundary condi-tions. However, the choice of the boundary conditions is asubtle problem and we shall not discuss this issue here. In thepresent case, we shall simply choose the solution of the WDequation that will have a clear physical interpretation.It is reasonable to suppose that the scale factor a has a semi-classical regime where it is the large radius of the 3-spheret = const. (Our universe is large.) Therefore, in the semiclassi-cal regime the value of a can be used as the “clock” variable.We can then express the wave function in the semiclassicalform with respect to the variable a,

Ψ(a, φ) = exp(−i~−1S(a)

)ψ(a, φ),

where the factor ψ(a, φ) is assumed to be a slow-varying func-tion of a but quickly-varying function of the quantum variableφ, namely

∣∣∣∣

∂ψ

∂a

∣∣∣∣∼ ~

∣∣∣∣

∂ψ

∂φ

∣∣∣∣.

Then we can expand the WD equation in powers of ~, substi-tuting the potential V (φ),

− G

(

S′2 +9π2a2

4G2− 6π3

Ga4V0

)

ψ (5.43)

− iG~

3π(S′′ + 2S′∂a)ψ + π2a4

(−~

2∂2φ +m2φ2

)ψ = 0. (5.44)

The first line (5.43) above is of zeroth order in ~ and thus mustbe satisfied separately. This is an equation determining thebehavior of the metric variable a. Its solution is

S(a) = ±∫

da

6π3

Ga4V0 −

9π2a2

4G2.

To visualize the physical interpretation of this solution, wenote that Eq. (5.43) is formally analogous to the semiclassi-cal form of the stationary Schrödinger equation for a one-dimensional particle in a potential

U(a) ≡ 9π2a2

8G2− 3π3

Ga4V0.

This “particle” can tunnel from the initial state at a = 0 to thevalue

a = a0 ≡√

3

8πGV0.

The regime a > a0 corresponds to a classical motion of the

particle with the velocity da/dτ =√

2U(a), while for 0 < a <a0 the motion is classically forbidden since U(a) < 0. The tun-neling process is interpreted as the creation of a closed universe

122

Page 131: Winitzki. Advanced General Relativity

5.3 Quantum cosmology

having the initial size a0. Since the assumed initial state a = 0corresponds to a sphere of zero radius or, in other words, tothe absence of space, this process is also called creation of auniverse from nothing.Consider now the second part (5.44) of theWD equation. Inthe semiclassical regime, S(a) is a slow-changing function ofa, so we can disregard |S′′| ≪ S′2. Since a is a well-definedsemiclassical variable in the regime a > a0, we may definethe “time” τ according to the heuristic picture of a moving“particle” with the coordinate a. It is convenient to define

dτ =3π3

2G

a4da√

2U(a)=

3π3

2G

a4da

S′(a),

and express a through τ in the range a > a0. Then Eq. (5.44)can be rewritten as

i~∂ψ

∂τ=(−~

2∂2φ +m2φ2

)ψ ≡ Hψψ.

This is a familiar time-dependent Schrödinger equation for a

quantum system with the Hamiltonian Hψ.In this way, we find that the time-independent “wave func-tion of the universe”Ψ(a, φ), when properly interpreted in theregime when one of the variables is classical, yields a familiarpicture of the evolution of quantum systems with time. Notethat in the classically forbidden regime 0 < a < a0, no “time”can be defined, and a classical interpretation of the wave func-tion Ψ is impossible.

123

Page 132: Winitzki. Advanced General Relativity
Page 133: Winitzki. Advanced General Relativity

6 Tetrad methods

The tetrad formalism is covered in the books [21, 33, 36]with varying degrees of clarity and detail.Note: The material on vector bundles is standard but needsto be expanded. Also, there should be more material ontetrad formalism: derivation of Einstein equation, as well asan explanation of tetrad formalism in terms of vector bun-dles (making geometric sense of tetrad indices, “covariant”exterior differential, etc.).

6.1 Tetrad formalism

In the standard approach, General Relativity is formulated asa “theory of the metric.” In other words, the main variable isthemetric tensor g, while the connection and the curvature areexpressed through the metric. An alternative formulation isthrough orthonormal bases of vector fields (called “tetrads”).This formulation is convenient formany calculations, and alsoserves to connect gravity with spinor field theory. Some ap-proaches to quantum gravity are based on the tetrad formula-tion.For convenience, we will restrict attention to a four-dimensional manifold having a metric with Lorentzian sig-nature (+ −−−). Generalizations to other dimensions andsignatures are straightforward.

6.1.1 Tetrads

If a metric tensor g on a manifoldM is specified, one can al-ways choose an orthonormal basis e0(p), e1(p), e2(p), e3(p)in each tangent space TpM. One of the tetrad vectors mustbe timelike and the other three must be spacelike (because themetric g has a Lorentzian signature). It is a useful conventionto choose the vector e0 timelike and the other three spacelike,so that the scalar product matrix is

g(ea, eb) = ηab ≡ diag (1,−1,−1,−1) ,

where ηab is understood as a fixed matrix (which is numer-ically the same as the standard metric in Minkowski space-time). Note that the Latin indices label the basis vectors andrun (for convenience) from 0 to 3. Since these indices are nottensor indices, we will indicate summations explicitly.We called the basis ea “orthonormal” even though thenorm of some vectors is equal to −1, as it must be since themetric has Lorentzian signature. Essentially we have chosena set of vector fields ea(p), defined at every point p. Such anorthonormal set of vector fields ea(p) is called a tetrad or avierbein in case of four-dimensional manifolds, and a framefield or vielbein in any number of dimensions.1

Remark: We call the basis ea “orthonormal” even thoughthe norm of some vectors is equal to −1, as it must be sincethe metric has Lorentzian signature.

1The German words “vierbein” and “vielbein” are pronounced approxi-mately as the English nonwords “fear-bine” and “feel-bine” respectively.The corresponding French term is “repère,” pronounced close to “rep-pair.”

In a local coordinate system xµ where the metric is spec-ified as a set of components gµν(p), the components e

aµ(p) of

the tetrad may be found by the following procedure. At afixed spacetime point p, consider the matrix gµν(p) as a bilin-ear form. This bilinear form may be diagonalized by a linearcoordinate transformation, according to standard proceduresof linear algebra. Suppose that t, x, y, z are new coordinatesin which the metric is diagonal at the point p, e.g.

gµν(p) = diag (A,−B,−C,−D) ≡

A 0 0 00 −B 0 00 0 −C 00 0 0 −D

with A,B,C,D > 0. Then it is easy to write an orthogonalbasis in the tangent space TpM, for instance

e0(p) =1√A∂t, e1(p) =

1√B∂x,

e2(p) =1√C∂y, e3(p) =

1√D∂z .

In this way, a tetrad may be determined at every point p of themanifold.Conversely, if a tetrad ea(p) is given at every point p ofa manifold M then we may recover the metric tensor g ex-plicitly in the following way. First we need to compute thefour 1-forms θa (a = 0, 1, 2, 3) that comprise the dual basis toea. Here and below we denote vectors and forms by bold-face symbols. (It is convenient to write “a” as an upper indexin θa, even though we presently do not consider it a tensorindex.)

Construction of the dual basis: We work in the tangentspace TpM at a point p of a manifoldM. Since ea(p) is abasis in TpM, any vector v ∈ TpM is uniquely decomposedas a linear combination

v =∑

a

λaea,

where λa are some numbers. When the basis ea is fixed,the numbers λa are linear functions of v, and thus are 1-formswhich we denote θa; in other words, λa ≡ θa(v). These 1-forms θa yield the components of a vector in the basis ea,so that

v =∑

a

θa(v)ea.

This relationship holds for any vector v; sometimes this iswritten as a decomposition of the identity operator,

1 =∑

a

ea ⊗ θa. (6.1)

The set of 1-forms θa is called the dual tetrad and is a basisin the dual tangent space T ∗M(p). Sometimes one calls θa

simply the “tetrad” for brevity.In any case, it is straightforward to pass from ea to θaand back. In a local coordinate system, the dual basis has com-ponents θaµ which comprise a 4 × 4matrix satisfying

a

θaµvµeνa = vν

125

Page 134: Winitzki. Advanced General Relativity

6 Tetrad methods

for any vµ. Therefore,

a

θaµeνa = δνµ,

so the matrix of components θaµ can be computed as the in-verse matrix to eµa .

Once the 1-forms θa, a = 0, 1, 2, 3, are computed, the metrictensor g is expressed as

g(u,v) = g

(∑

a

θa(u)ea,∑

b

θb(u)eb

)

=∑

a,b

ηabθa(u)θb(v).

(6.2)Rewritten in a more condensed notation, the above formulabecomes

g =∑

a,b

ηabθa ⊗ θb.

In components,

gµν =∑

a,b

ηabθaµθbν .

This is an explicit formula that expresses the metric tensor gthrough the dual tetrad. Analogously, the inverse metric g−1

is expressed through the tetrad vectors ea as

g−1 =∑

a,b

ηabea ⊗ eb.

Thus, in order to define a metric structure on a manifold,one may specify a tetrad ea or alternatively a dual tetradθa instead of specifying the metric tensor g.

Remarks:

• The tetrad ea(p) can be viewed as a map from a fixed,four-dimensional spacetime R4 to the tangent spaceTpM. A vector

va ≡v0, v1, v2, v3

∈ R

4

is mapped to the tangent vector

v ≡∑

a

vaea(p) ∈ TpM.

This map va → v is compatible with the scalar productin R

4 defined by the matrix ηab and the scalar product inTpM defined by the metric g, namely

a,b

ηabuavb = g(u,v) if ua → u, va → v.

Therefore we refer to the abstract space R4 and the “met-

ric” ηab as the fiducial Minkowski spacetime and metric.This fiducial flat spacetime can be interpreted as the in-stantaneous reference frame of a certain observer. Thisobserver’s four-velocity instantaneously coincides withthe timelike vector e0(p) and the observer’s spatial axesare chosen along the directions e1, e2, e3.

• There are infinitely many possible tetrads correspondingto one and the same metric g. A given tetrad ea can betransformed via a Lorentz rotation,

ea(p) → ea(p) ≡∑

b

Λba(p)eb(p),

without changing the metric g. Here, Λba(p) is a ma-trix representing a Lorentz transformation in the fiducialMinkowski spacetime. This matrix Λba(p) may be cho-sen arbitrarily at every point p and thus is called a localLorentz transformation. Conversely, any two orthonor-mal bases ea and ea are connected by a Lorentztransformation. Therefore, the local Lorentz transforma-tions Λba(p) represent the entire freedom of choosing thetetrad field for a fixed metric g.

• A tetrad is essentially a set of four vector fields whichmust be linearly independent at every point. Since itis impossible to have globally nonzero vector fields onsome manifolds (e.g. on a 2-sphere), the tetrad fieldsea(p) might have to be defined only locally. In otherwords, different tetrad fields will have to be definedwithin different charts covering the manifold.

• It is clear that the 1-forms θa can be expressed through eaand the metric as follows,

θa(v) ≡ θa v = ηaag(ea,v) =∑

b

ηabg(eb,v),

where v is an arbitrary vector. In the notation of Sec. 1.5.5,we may rewrite this as

θa = ηaagea =∑

b

ηabgeb.

(There is no summation over a in the expressions ηaageaabove.) Often it is convenient to raise and lower the Latin(tetrad) indices using the fiducial Minkowski metric ηab.Thus, one defines

θa ≡ ηaaθa =

b

ηabθb, (6.3)

and writes equations such as

g(ea,v) = θa(v). (6.4)

However, we stress that the dual tetrad θa can be cal-culated from eawithout knowledge of the metric tensorg.

• The dual tetrad θa is orthonormal with respect to theinversemetric,

g−1(θa,θb) = ηab.

Thus θa(p) is an orthonormal basis in the cotangentspace T ∗M(p).

• A note on terminology: Sometimes the frame basis eaor the dual basis θa are called nonholonomic bases,while the coordinate basis ∂/∂xµ is called holonomic.This terminology will not be used in the present text; in-stead we call ea an orthonormal frame basis. The term“holonomic” seems to come from mechanics textbooks.In theoretical mechanics, there is a notion of holonomicand nonholonomic constraint equations. A constraintequation is a relation of the form

j

dqj

dtAj(q) = 0,

where q ≡qj∈ Rn is a time-dependent vector of “gen-

eralized coordinates” and Aj(q) are some coefficients. A

126

Page 135: Winitzki. Advanced General Relativity

6.1 Tetrad formalism

constraint equation is called holonomic if it can be rewrit-ten as a total time derivative of a function, i.e. if there ex-ists a function α(q) such that the constraint equation issimply dα(q)/dt = 0, in other words if

Aj =∂α(q)

∂qj, j = 1, ...n.

We can reformulate these statements more transparentlyby introducing the auxiliary 1-formA ≡∑j Ajdq

j . Thenthe constraint equation is rewritten as A q = 0, whereq is the tangent vector to the trajectory q(t). The con-straint is holonomic if there exists a scalar function α(q)such that A = dα; in the standard terminology, the con-straint is holonomic when the 1-form A is exact (equalto a differential of something). So one might call a ba-sis θa “holonomic” if there exist functions xa such thatθa = dxa, and “nonholonomic” otherwise. An orthonor-mal basis is “holonomic” only if the space is flat and xaare Cartesian coordinates; in all other cases one has dθa 6=0 at least for some a, so the basis is “nonholonomic.” Sim-ilarly, the orthonormal basis vectors ea usually cannot berepresented as partial derivatives with respect to somecoordinates; accordingly, some commutators [ea, eb] arenonzero. Thus one calls such ea a “nonholonomic” ba-sis. However, what is important for our considerationsis the orthonormality of the chosen frame basis ea withrespect to a given metric g. If we called ea is a non-holonomic basis, wemay create an impression that we se-lected some nonholonomic basis, whether orthonormal ornot. However, an arbitrary non-orthonormal, nonholo-nomic basis would be much less useful in calculationsthan an orthonormal basis. For this reason, we call eaan “orhonormal frame basis” (or a “vielbein”) rather thana “nonholonomic basis.”

6.1.2 Examples

Example 1: Consider the Schwarzschild metric (1.36). Thetask is to determine a tetrad basis and a dual tetrad basis.

Since the metric is in a diagonal form, a possible choice ofthe tetrad is

e0 =

(

1 − 2M

r

)−1/2

∂t, e1 =

(

1 − 2M

r

)1/2

∂r,

e2 =1

r∂θ, e3 =

1

r sin θ∂φ.

Note that the tetrad is undefined (singular) if r = 0, r = 2M ,or sin θ = 0 because at these locations the coordinate system(t, r, θ, φ) and/or the metric g are singular. This is an exam-ple of a tetrad defined only locally, that is, only within a cer-tain limited coordinate patch, rather than globally in the entirespacetime. More than one coordinate patch must be used tocover the entire Schwarzschild spacetime.

The dual basis θa corresponding to the tetrad ea is

θ0 =

(

1 − 2M

r

)1/2

dt, θ1 =

(

1 − 2M

r

)−1/2

dr,

θ2 = rdθ, θ3 = r sin θdφ.

Example 2: Suppose now that we are given a two-dimensional spacetime with local coordinates t, x and theframe fields

e0 = ∂t, e1 = e−Ht∂x.

The task is to determine the metric tensor g such that thisframe is orthonormal.We can recover the metric by first determining the dualframe,

θ0 = dt, θ1 = eHtdx,

and then writing

g =∑

a,b

ηabθa ⊗ θb = dt⊗ dt− e2Htdx⊗ dx

≡ dt2 − e2Htdx2.

This metric is the two-dimensional version of de Sitter met-ric (1.37).

6.1.3 Hodge duality

Using the metric g, one determines the Levi-Civita tensor ε,and then one can establish a one-to-one linear map betweenn-forms and (4 − n)-forms. This map is called the Hodge du-ality operation. Since this operation is used relatively little inthe present text (it rarely yields a computational advantage), Ionly sketch the construction on examples and list some basicproperties of the Hodge duality. 2

We work with a four-dimensional manifold where the met-ric g and the Levi-Civita tensor ε are known. First consider a1-form ω. The Hodge duality map yields a 3-form ∗ω calledthe Hodge dual to ω; this operation is also called the Hodgestar. The 3-form ∗ω is defined by its action on arbitrary vec-tors x, y, z:

(∗ω) (x,y, z) ≡ ε(g−1ω,x,y, z).

Somewhat more awkwardly, this definition can be rewrittenas

∗ω = ιg−1ωε.

It is clear that ω 7→ ∗ω is a linear map in ω (if we regard themetric g and the tensor ε as fixed).

Example: Consider a four-dimensional Minkowski spacewith the orthonormal basis of 1-forms dt,dx,dy,dz. TheLevi-Civita tensor is

ε = dt ∧ dx ∧ dy ∧ dz.

Let us compute theHodge star applied to the 1-formdt. Usingthe vector g−1dt = ∂t, we find

∗(dt) = ι∂tε = dx ∧ dy ∧ dz.

Similarly, we have

∗(dx) = ι∂xε = −dt ∧ dy ∧ dz.

Heuristically, ∗ω must contain the “complement” of ω, so thatω ∧ ∗ω is proportional to ε.

2I was motivated to make this excursion into the Hodge duality by read-ing the lecture notes [13], chapter 6, where I learned about Statement6.1.3.2(a).

127

Page 136: Winitzki. Advanced General Relativity

6 Tetrad methods

Another convenient way to express ∗ω is to use an orthonor-mal basis ea. Since

g−1ω =∑

a,b

ηab (ιebω) ea,

we can write∗ω =

a

ηaa (ιeaω) (ιea

ε) .

This expression can be also used as a definition of ∗ω. By con-struction, this definition is independent of the choice of thebasis ea, as long as this basis is orthonormal with respectto the fixed metric g and positively oriented with respect toε. (We merely rewrote a basis-free definition in terms of anarbitrary orthonormal basis.) The basis independence can bemade manifest by expressing ∗ω through the trace operationas follows,

∗ω = Tr(a,b) (ιaω) (ιbε) .

It is clear that this can be used as a definition of ∗ω equivalentto the definitions above.Similarly, a 2-form ω(a,b) is converted to its Hodge dual

∗ω, which is also a 2-form. This 2-form can be defined usingan orthonormal basis ea as follows,

∗ ω ≡ 1

2!

a,b

ηaaηbb (ιebιeaω) (ιeb

ιeaε) . (6.5)

As before, this definition is independent of the choice of theorthonormal basis ea. This can also be seen with the fol-lowing trick. Any 2-form ω can be decomposed into a linearcombination of exterior products of 1-forms. Since the Hodgeduality is a linear map, we only need to consider a single exte-rior product ω = φ∧χ, where φ and χ are 1-forms. The 2-form∗(φ ∧ χ) is then expressed as

(∗(φ ∧ χ)) (x,y) = ε(g−1φ, g−1χ,x,y).

It is straightforward to verify that this expression is equivalentto Eq. (6.5). When the definition of ∗ω is written in this basis-free manner, it becomes manifest that ∗ω depends only on themetric g but not on the choice of the basis ea. However, cal-culations using an explicit basis ea are much more conve-nient since one does not need to decompose all n-forms intolinear combinations of exterior products of 1-forms. Some-times also the index-free trace notation is convenient,

∗ω ≡ 1

2!Tr(a,a′)(b,b′) (ιb′ιa′ω) (ιbιaε) .

So far we have seen the definitions of ∗ω for 1-forms and2-forms. To express the Hodge dual of an n-form Ω, one canbuild an expression analogous to Eq. (6.5) but with n sets ofvectors and the factor 1/n! in front:

∗Ω ≡ 1

n!

a1,...,an

ηa1a1 ...ηanan(ιean

...ιea1Ω) (ιean

...ιea1ε)

=1

n!Tr(a1,b1)...(an,bn) (ιan

...ιa1Ω) (ιbn...ιb1ε) .

Here we wrote the arguments eajin the reverse order for con-

venience; recall that

ιean...ιea1

Ω ≡ Ω(ea1 , ..., ean).

As before, this definition of ∗Ω depends only on the metric gand on ε but not on the choice of the orthonormal basis ea.

In four dimensions, the Hodge star establishes a linear mapfrom 0-forms (scalars) to 4-forms, from 1-forms to 3-forms,etc. In particular, the 4-form ε is mapped into a scalar. Thenumerical factor 1/n! leads to the relation ∗ε = −1:

∗ε =1

4!

a,b,c,d

ηaa...ηdd (ιed...ιea

ε) (ιed...ιea

ε)

= η00η11η22η33 = det ηab = −1.

Example: The Hodge dual to a scalar function (“0-form”) fis a 4-form ∗f = fε; taking the Hodge dual again, one gets∗(fε) = −f . Therefore, ∗ ∗ f = −f for scalar functions f .Let θa (a = 0, 1, 2, 3) be an orthonormal basis of 1-forms;then we have

∗θ0 = ιe0

(θ0 ∧ θ1 ∧ θ2 ∧ θ3

)= θ1 ∧ θ2 ∧ θ3;

∗θ3 = ι−e3

(θ0 ∧ θ1 ∧ θ2 ∧ θ3

)= θ0 ∧ θ1 ∧ θ2;

∗(θ0 ∧ θ1) = ι−e1 ιe0

(θ0 ∧ θ1 ∧ θ2 ∧ θ3

)= −θ2 ∧ θ3;

∗(θ2 ∧ θ3) = θ0 ∧ θ1; ∗(θ1 ∧ θ2 ∧ θ3) = −θ0; etc.

In four dimensions, one can derive a more convenientformula for the Hodge dual to a 4-form Ω by noting thatε(ea, ..., ed) 6= 0 only when all the indices a, b, c, d are differ-ent:

∗Ω =1

4!

a,b,c,d

ηaa...ηdd (ιed...ιea

Ω) (ιed...ιea

ε)

= (det ηab)Ω(e1, e2, e3, e4) = −Ω(e1, e2, e3, e4).

It is clear that this formula extends from four-dimensional toN -dimensional space,

∗ Ω = (det ηab)Ω(e1, ..., eN ), (6.6)

where ηab is the canonical form of the N -dimensional metric(i.e. the form where the metric is diagonal and has only ±1 onthe diagonal).

The action of the Hodge star on basis 1-forms θa, as il-lustrated by the examples above, can be summarized by thefollowing formulas:

∗1 ≡ ε =1

4!

a,b,c,d

εabcdθa ∧ θb ∧ θc ∧ θd,

∗θa =1

3!

b,c,d

εabcdθb ∧ θc ∧ θd,

∗(θa ∧ θb) =1

2!

c,d

εabcdθc ∧ θd,

∗(θa ∧ θb ∧ θc) =∑

d

εabcdθd,

where εabcd is not a tensor but a totally antisymmetric array ofnumbers (the Levi-Civita symbol) defined such that ε0123 = 1.(It is important to note that the 1-forms θa differ from θa bythe sign factor ηaa.)The example above shows that ∗ ∗ f = −f for a scalar f .Since ε = ∗1, we also have ∗ ∗ ε = −ε. A similar propertyholds for any n-form ω.

Statement 6.1.3.1: For any n-form ω in N -dimensional man-ifold with canonical metric ηab, one has

∗ ∗ ω = (det ηab) (−1)(N−n)n ω.

128

Page 137: Winitzki. Advanced General Relativity

6.1 Tetrad formalism

(Proof on page 172.)

It is not particularly easy to perform calculations with theHodge star operation. For instance, there are no general iden-tities relating the Hodge star to other operations, such asthe exterior differential or the Lie derivative; so in general∗(ω∧ψ) 6= ∗ω∧∗ψ, d∗ω 6= ∗dω, Lx ∗ω 6= ∗Lxω, etc. When do-ing calculations with the Hodge star, one needs to use eitheran explicit basis ea, or the index-free trace formalism.It follows from the example above that

θ0 ∧ ∗θ0 = −ε; (θ0 ∧ θ1) ∧ ∗(θ0 ∧ θ1) = −ε; etc.

On the other hand, it is easy to see that

θ0 ∧ ∗θ1 = 0; (θ0 ∧ θ1) ∧ ∗(θ0 ∧ θ2) = 0; etc.

It seems that the combination ω1 ∧ ∗ω2 is nonzero on basisn-forms only when the two forms are equal. This combina-tion is indeed a useful function of two n-forms. The followingstatement lists some properties of this function.

Statement 6.1.3.2: (a) If ω1 and ω2 are two n-forms in an N -dimensional manifold, one can compute theN -form ω1∧(∗ω2)that is symmetric under interchange of ω1 and ω2:

ω1 ∧ ∗ω2 = ω2 ∧ ∗ω1.

Thus the scalar ∗(ω1 ∧ ∗ω2) is a symmetric bilinear function inthe space of n-forms. This bilinear function can be viewed as anatural scalar product in the space of n-forms. In particular, ifθa is an orthonormal basis of 1-forms, then an orthonormalbasis in the space of n-forms is the set of exterior products

θa1 ∧ ... ∧ θan ,

with all possible sets of indices a1, ..., an for which these ex-terior products are nonzero.(b) If ω1, ω2 are two n-forms, the combination ω1 ∧ ∗ω2 isproportional to the volume element with the following coeffi-cient,

ω1 ∧ ∗ω2

=Vol

n!Tr(a1,b1)...(an,bn)ω1(a1, ...,an)ω2(b1, ...,bn). (6.7)

(c) Suppose ω1 and ω2 are 1-forms corresponding to vectorsv1 and v2, i.e. ω1 = gv1, ω2 = gv2.Then the scalar productdefined in part (a) coincides with the natural scalar product of1-forms given by the inverse metric g−1. More precisely,

ω1 ∧ ∗ω2 = g−1(ω1, ω2)Vol = g(v1,v2)Vol.

(d) If Ω is a 2-form and ω1, ω2 are 1-forms corresponding tovectors v1 and v2, one has

Ω ∧ ∗(ω1 ∧ ω2) = Ω(v1,v2)Vol.

(Proof on page 172.)

Remark: It is interesting to note what happens if ω1 is an n1-form and ω2 is an n2-form with n1 6= n2. In that case, ω1 ∧ ∗ω2

is an (n1 +N − n2)-form while ω2 ∧ ∗ω1 is an (n2 +N − n1)-form. If n1 6= n2, either n1 +N − n2 > N or n2 +N − n1 > N .Hence, one of the expressions ω1∧∗ω2 or ω2∧∗ω1 is identicallyzero as an n-form with n > N , while the other expression isnot necessarily zero. It is clear that Statement 6.1.3.2 cannotbe extended to this case.

Statement 6.1.3.2: If Ω is a 2-form and ω is the 1-form corre-sponding to the vector v = g−1ω, then

ω ∧ ∗Ω = − ∗ (ιvΩ), (6.8)

where both sides are 3-forms.Hint: Use Statement 6.1.3.2. (Proof on page 173.)

Practice problem: Verify Eq. (6.8) for Ω = θ0 ∧ θ1, ω = θ1,and for Ω = θ2 ∧ θ3, ω = θ2.

6.1.4 Levi-Civita connection

Given a metric g, one may determine the Levi-Civita connec-tion ∇ explicitly (see Eq. (1.43) in Sec. 1.6.6). It turns out thatthe Levi-Civita connection can be expressed directly throughthe tetrad. We begin by assuming that the metric tensor g andthe Levi-Civita connection ∇ are known, and then derive theformula for the covariant derivative of the tetrad field. Sub-sequently, it will be clear how to recover the connection fromthe tetrad without explicitly involving the metric g.Given a tetrad ea, wewould like to be able to compute thecovariant derivative∇uv for arbitrary vector fields u,v. Sinceea is a basis, every vector field v can be expressed throughea as

v =∑

a

vaea,

where va ≡ va(p) is a set of four scalar functions (the Latinindex is not a tensor index). Therefore it is sufficient to be ableto calculate∇uv for u = ea and v = eb, where a, b = 0, 1, 2, 3.According to Eq. (1.43), we have

(∇xy, z) =1

2

(x g(y, z) + y g(x, z) − z g(x,y)

− g(x, [y, z]) − g(y, [x, z]) + g(z, [x,y])).

This formula enables us to calculate g(∇eaeb, ec) if we can

evaluate terms such as ea g(eb, ec) and g(ea, [eb, ec]). Sinceg(eb, ec) = ηbc is (by construction) a fixed numerical matrix,the directional derivatives of ηab are zero, so ea g(eb, ec) = 0.Thus we have

g(∇eaeb, ec) =

g(ec, [ea, eb]) − g(ea, [eb, ec]) − g(eb, [ea, ec])

2.

We may rewrite this formula using the lower-index version ofthe dual tetrad θc [see Eqs. (6.3) and (6.4)] as

θc ∇eaeb =

θc [ea, eb] − θa [eb, ec] − θb [ea, ec]

2. (6.9)

Since θc v is the c-th component of a vector v in the ba-sis ea, the formula (6.9) completely determines the vector∇ea

eb if the basis vectors ea and their commutators [ea, eb]are known.We point out that the essential information requiredfor computing the covariant derivative is the commutators[ea, eb]. In general, these commutators can be expressed as

[ea, eb] =∑

c

Ccabec,

where Ccab = −Ccba are some scalar coefficients (which are,of course, functions of the spacetime point p). The coefficientsCcab can be explicitly computed from ea without need to

129

Page 138: Winitzki. Advanced General Relativity

6 Tetrad methods

involve the metric g; recall that the commutator [u,v] is a ge-ometric operation defined independently of the metric. Thenone has complete information about the Levi-Civita connec-tion ∇ through Eq. (6.9). The formula (6.9) can be rewrittenas

ηccθc ∇ea

eb ≡ θc ∇eaeb =

1

2(Cc ab − Cb ac − Ca bc) , (6.10)

where Cc ab is obtained from Ccab by lowering the first indexthrough ηac. Alternatively, we may write

∇eaeb =

1

2

c

(Cc ab − Cb ac − Ca bc) ηccec. (6.11)

At this point, one can compute covariant derivatives ∇uv ofarbitrary vector fields u,v by expressing them through the ba-sis ea and using the Leibnitz rule.

Example: We have a two-dimensional spacetime with localcoordinates t, x and the given orthonormal frame

e0 = ∂t, e1 = e−Ht∂x.

The task is to compute ∇uv, where u and v are ∂t or ∂x.Let us first determine ∇ea

eb for all a, b. We begin by calcu-lating the commutators [ea, eb]. The only nontrivial commu-tator is

[e0, e1] = −He1.

It follows that C101 = −C1

10 = −H are the only nonzerocoefficients among Ccab. Lowering the first index of C

cab, we

find C1 01 = −C1 10 = H . Then we use Eq. (6.11) and compute

∇e0e0 = 0, (no nonzero Cc ab)

∇e0e1 =1

2(C1 01 − C1 01) η11e1 = 0,

∇e1e0 =1

2(C1 10 − C1 01) η11e1 = He1,

∇e1e1 =1

2(−C1 10 − C1 10) η00e0 = He0.

Finally, we use the Leibnitz rule:

∇∂t∂x = ∇e0

(eHte1

)=(e0 eHt

)e1 + eHt∇e0e1

=(∂te

Ht)e−Ht∂x = H∂x.

Similarly, we find ∇∂t∂t = 0 and∇∂x

∂x = He2Ht∂t.

Remark: In the Koszul formula (1.43), terms with scalarproducts vanish for an orthonormal basis; however, termswith commutators remain. If we choose the basis vectors ascoordinate derivatives ∂µ (this is the choice usually made incalculations using index notation), the terms with commuta-tors will vanish but termswith scalar products will remain. Atfirst glance, it is not obvious which choice of basis has moreadvantages in calculations. However, it turns out that inmanycases calculations are shorter in an orthonormal basis.

6.1.5 Connection as a set of 1-forms

In the previous section we have seen that the informationabout Levi-Civita connection∇ is carried entirely by the com-mutators [ea, eb] and can thus be recovered without explicitlyinvolving the metric g. A more elegant (and practically moreuseful) description of the Levi-Civita connection is obtainedby considering the dual tetrad θa.

Just as the commutator of vector fields is defined geomet-rically without involving the metric, the exterior differentialdθa is a 2-form defined geometrically by

dθa(u,v) = u θa(v) − v θa(u) − θa([u,v]).

Substituting the tetrad vectors eb and ec instead of u,v, wehave

dθa(eb, ec) = −θa([eb, ec]) ≡ −Cabc.So the 2-forms dθa carry essentially the same information asthe commutators of the tetrad vectors. Hence, we expect to beable to express the Levi-Civita connection through the dualtetrad θa and its differentials dθa. To do this, one uses someclever tricks, which we will now show.Since θa is a basis in the dual space, the 2-form dθa canbe always decomposed as

dθa =1

2

b,c

Xabcθ

b ∧ θc

with some coefficients Xabc = −Xa

cb. It is easy to see thatthese coefficients are directly related toCabc (see the followingCalculation).

Calculation: Show thatXabc = −Cabc.

Solution: Consider the 2-formdθa applied to arbitrary vec-tors u,v. Decompose u and v through the tetrad and compute

(dθa) (u,v) = (dθa) (∑

b

ubeb,∑

c

vcec

)

=∑

b,c

ubvc (dθa) (eb, ec) = −∑

b,c

Cabcubvc.

On the other hand, ub ≡ θb u and vc ≡ θc v, so

b,c

Xabcθ

b ∧ θc

(u,v) =∑

b,c

Xabc

(ubvc − ucvb

)

= 2∑

b,c

Xabcu

bvc.

Since u,v are arbitrary vectors, we obtain Xabc = −Cabc.

Let us now see how to recover the Levi-Civita connection.So far we have the property

dθc = −1

2

a,b

Ccabθa ∧ θb, (6.12)

which shows that the coefficientsCcab can be easily computed

by decomposing dθc into the basis of θa ∧ θb. However, ac-cording to Eq. (6.10), covariant derivatives of vectors involvea certain combination of Ccab,

θc ∇eaeb =

1

2(Cc ab − Cb ac − Ca bc) ,

rather than simply Ccab. (Here and below we freely lowerand raise all Latin indices by implicitly using the fiducialMinkowski metric ηab.) Now we note that the scalar expres-sion θc ∇ueb linearly depends on the vector u and is thusequal to some 1-form applied to u. Let us denote that 1-formby ωc

b; in other words, by definition, we set

ωcb(u) ≡ θc ∇ueb.

130

Page 139: Winitzki. Advanced General Relativity

6.1 Tetrad formalism

Another way to restate the definition is to say that the 1-formωc

b satisfies

∇ueb =∑

c

ωcb(u)ec (6.13)

for any vector u. The 1-forms ωcb are called the connection 1-

forms or the spin connection corresponding to the dual tetradθa.It is clear from Eq. (6.10) that the 1-forms ωc b (that is, ω

cb

with the first index lowered) satisfy

ωc b ≡1

2

a

(Cc ab − Cb ac − Ca bc)θa. (6.14)

Thus, the connection formsωc bmay be computed straightfor-wardly from the commutator coefficients Ccab. After comput-ing ωc b, covariant derivatives of arbitrary vector fields can beexpressed as follows,

∇ueb =∑

c

ωcb(u)ec,

∇uv =∑

b

∇u

(vbeb

)=∑

b

(u vb

)eb +

b,c

vbωcb(u)ec.

In terms of the components vc of a vector field v, we find theformula

(∇uv)c ≡ θc ∇uv = u vc +∑

b

ωcb(u)vb. (6.15)

It is clear that ωcb play the role of the Christoffel symbols in

the formula for the covariant derivative.The formula (6.14) is not necessarily the quickest way tocompute ωc b. Comparing it with Eq. (6.12), we notice that itmight help to compute the following expression,

b

ωc b ∧ θb =1

2

a,b

(Cc ab − Cb ac − Ca bc)θa ∧ θb

=1

2

a,b

Cc abθa ∧ θb.

(The simplification in the second line happens because theterms symmetric in a, b cancel). Then we can raise the index c,use Eq. (6.12), and obtain the following relationship,

dθc = −∑

b

ωcb ∧ θb. (6.16)

The relationship (6.16) is called the first Cartan structureequation.An immediate practical importance of this equation is thatit can be used, instead of Eq. (6.14), to determine the 1-formsωc

b. Of course, Eq. (6.16) alone does not specify ωcb uniquely.

For instance, the 1-forms 12

a Ccabθ

a are also a solution ofEq. (6.16), although these 1-forms do not coincide with ωc

b.More generally, if ωc

b is any solution of Eq. (6.16), one mayadd the following term,

ωcb → ωc

b +∑

a

Bcabθa,

where Bcab = Bcba is an arbitrary array of coefficients sym-metric in (a, b). The new ωc

b will still be a solution ofEq. (6.16). However, in addition to Eq. (6.16) the connection1-forms ωc

b must satisfy the antisymmetry requirement,

ωc b = −ωb c, (6.17)

which can be easily read from Eq. (6.14). It turns out3 thatthe two requirements (6.16), (6.17) are sufficient to specify ωc

b

uniquely, so that one does not need to use Eq. (6.14). In prac-tice, it is often quicker to guess the solution of Eqs. (6.16),(6.17) than to follow the straightforward but longer for-mula (6.14).

Example: Consider a two-dimensional spacetime with localcoordinates t, x and a frame field e0, e1 defined as

e0 = e−f(t)∂t, e1 = e−h(t)∂x,

where f(t), h(t) are scalar functions depending only on t (butnot on x). The task is to compute the connection 1-forms ωc band the covariant derivatives∇∂t

∂x and∇∂x∂x.

Solution: The metric corresponding to the given framefield is

g = e2f(t)dt2 − e2h(t)dx2.

We begin by defining the dual frame,

θ0 = ef(t)dt, θ1 = eh(t)dx,

and the differentials,

dθ0 = 0, dθ1 = heh(t)dt ∧ dx = −heh(t)−f(t)dx ∧ θ0,

where h ≡ ∂th(t). Note that we intentionally wrote dθ1 in

the form (...) ∧ θ0 rather than (...) ∧ θ1, trying to satisfy theantisymmetry property (6.17) which entails ω0

0 = ω11 = 0.

Then we can try to guess a solution ωcb of Eq. (6.16) as follows,

ω00 = ω0

1 = 0, ω10 = heh(t)−f(t)dx, ω1

1 = 0.

Despite our efforts, the 1-forms ωab are not yet the correct con-

nection forms because they do not satisfy the antisymmetryproperty (6.17). To correct this, we may add a multiple of dθ0

to ω10 and a multiple of dθ

1 to ω01. Presently, it is clear that

we only need to add heh(t)−f(t)dx to ω01. So the correct set of

the connection 1-forms is

ω01 = ω0 1 = −ω1 0 = ω1

0 = heh(t)−f(t)dx.

The covariant derivatives are now computed using the for-mula (6.15). Consider the vector v = ∂x = eh(t)e1. In the or-thonormal frame e1 the vector v has a single nonzero com-ponent, v1 = eh(t). Therefore Eq. (6.15) is rewritten as

(∇uv)c

= u vc + ωc1(u)v1.

A direct evaluation of components yields

(∇uv)0

= ω01(u)eh(t) = he2h(t)−f(t)u x,

(∇uv)1 = u eh(t) = heh(t)u t.

Therefore, for arbitrary u,

∇uv = heh(t)

eh(t)−f(t) (u x) e0 + (u t) e1

.

Substituting u = ∂t or u = ∂x, we find

∇∂t∂x = heh(t)e1 = h∂x;

∇∂x∂x = he2h(t)−f(t)e0 = he2h(t)−2f(t)∂t.

3See Sec. 6.1.6 for a general proof of this statement.

131

Page 140: Winitzki. Advanced General Relativity

6 Tetrad methods

6.1.6 *Solving equations for n-forms

In Sec. 6.1.5 we found that the connection 1-forms ωa b canbe found by solving the Cartan structure equation (6.16). Ingeneral, there are several tricks for finding general solutionsof linear equations for n-forms in the frame basis.The following statement shows how to solve the equationssuch as the Cartan structure equation in a more general case.

Statement 6.1.6.1: If any set of 2-forms Ac is given (wherec is a frame index) and θc is a dual frame basis then thereexists a unique set of 1-forms χa b such that

Aa +∑

b

χa b ∧ θb = 0; χa b = −χb a.

The 1-forms χa b can be expressed by the formula

χa b =∑

c

χa bcθc, χa bc ≡

1

2(Aa bc −Ab ac −Ac ab) , (6.18)

where Ac ab are the coefficients in the decomposition

Ac =1

2

a,b

Ac abθa ∧ θb, Ac ab = −Ac ba.

Idea of proof: Guess the formula for a solution χa b andthen show that the difference χa b − χa b of two possible so-lutions is zero. (Proof on page 173.)

A consequence of Statement 6.1.6.1 for Ac = dθc, χa b ≡ωa b is that the system of equations (6.16), (6.17) has the uniquesolution (6.14).The formula (6.18) can be also rewritten in a component-free manner, using the frame bases ea and θa instead:

χa b =1

2

(

ιebAa − ιea

Ab −∑

c

Ac(ea, eb)θc

)

. (6.19)

It is clear that this expression is antisymmetric in (a, b). Thecorresponding solution for the connection 1-forms can bewritten as

ωa b =1

2ιebdθa −

1

2ιeadθb −

1

2

c

(ιebιeadθc) θc. (6.20)

It is instructive to check directly thatAa +∑

bχa b ∧ θb = 0holds with χa b defined by Eq. (6.19). We compute

2∑

b

χa b ∧ θb =∑

b

(ιebAa) ∧ θb −

b

(ιeaAb) ∧ θb

−∑

b,c

Ac(ea, eb)θc ∧ θb. (6.21)

To proceed, we note that expressions such as

b

(ιebAa) ∧ θb

are reminiscent of the decomposition (6.1). The followingstatement shows how such expressions can be simplified.

Statement 6.1.6.2: Suppose Ω is any n-form and ea andθa is a basis and a dual basis for which the decomposi-tion (6.1) holds. (These bases must be dual to each other butdo not actually have to be orthonormal.) Then one has thedecomposition

Ω =1

n

a

θa ∧ (ιeaΩ) . (6.22)

In other words,1

n

a

θa ∧ ιea

is the identity operator on n-forms.

Proof of Statement 6.1.6.2: Let us apply the n-form at theright-hand side of Eq. (6.22) to a set of n arbitrary vectorsv1, ...,vn. By definition (1.19) of the exterior product, wehave

(∑

a

θa ∧ ιeaΩ

)

(v1, ...,vn)

=

n∑

j=1

(−1)j−1

a

θa(vj)Ω (ea,v1, ..., vj , ...,vn) ,

where the hat over vj means that the j-th vector is absent fromthe list of arguments. Since the n-form ω is linear in each ar-gument, we may substitute

a

θa(vj)Ω (ea,v1, ...) =∑

a

Ω (θa(vj)ea,v1, ...)

= Ω (vj ,v1, ...) .

Hence,

(∑

a

θa ∧ ιeaΩ

)

(v1, ...,vn)

=

n∑

j=1

(−1)j−1

Ω (vj ,v1, ..., vj , ...,vn)

=

n∑

j=1

Ω (v1, ...,vn) = nΩ (v1, ...,vn) .

This shows that Eq. (6.22) holds.

Using Statement 6.1.6.2 with n = 2, we can transformEq. (6.21) as follows,

2∑

b

χa b ∧ θb = −2Aa −∑

b

(ιeaAb) ∧ θb

+∑

c

(∑

b

ιeb(ιea

Ac) θb

)

∧ θc

= −2Aa −∑

c

(ιeaAc) ∧ θc +

c

(ιeaAc) ∧ θc

= −2Aa.

ThusAa +∑

bχa b ∧ θb = 0.

A useful alternative way of writing the formula (6.19) isshown in the following calculation.

Calculation 6.1.6.3: The formula (6.19) can be rewrittenequivalently as

χa b = ιebAa − ιea

Ab −1

2ιebιea

c

θc ∧ Ac. (6.23)

Details: We may interchange the order of θc and ιeain

Eq. (6.19) by using the property of ιx,

ιx (θc ∧ Ω) = (ιxθc)Ω− θc ∧ ιxΩ,

132

Page 141: Winitzki. Advanced General Relativity

6.2 Applications of tetrad formalism

which holds for any n-form Ω and for any vector x. Sinceιea

θc = δca, we find

ιea

c

θc ∧ Ac =∑

c

δcaAc −∑

c

θc ∧ ιeaAc

= Aa −∑

c

θc ∧ ιeaAc

and

ιebιea

c

θc ∧ Ac = ιeb

(

Aa −∑

c

θc ∧ ιeaAc

)

= ιebAa − ιea

Ab +∑

c

θc ∧ ιebιea

Ac.

(6.24)

Sinceιebιea

Ac ≡ Ac(ea, eb),

the formula (6.19) can be rewritten as Eq. (6.23).

Remark: The spin connectionωa b can be found from the for-mula (6.23) by substituting Ac ≡ dθc. Due to the Frobeniustheorem, the expression θc ∧ Ac = θc ∧ dθc is equal to zero ifthe 1-forms θc correspond to hypersurface-orthogonal vectorfields ec. Since the orthonormal frame ec frequently con-sists of hypersurface-orthogonal vectors, calculations are sim-plified when the formula (6.23) is used.

As an aside, we note that the formula (6.23) cannot holdfor 1-forms Ac (any 1-form Ac is decomposed uniquely as∑

aAacθa and the coefficientsAac are not necessarily antisym-

metric), but it can be generalized from 2-formsAa to n-formsΩa.

Statement 6.1.6.4: Given a set of n-forms Ac (where c is avielbein index and n ≥ 3), there exists a set of (n− 1)-formsBb a such that

Aa −∑

b

θb ∧ Bb a = 0; Ba b = −Bb a. (6.25)

A suitable set of Bb a is determined by the equivalent expres-sions

Bb a =1

n− 1(ιeb

Aa − ιeaAb) −

1

n (n− 1)ιebιea

c

θc ∧ Ac

(6.26)

=1

n(ιeb

Aa − ιeaAb) −

1

n (n− 1)

c

θc ∧ (ιebιea

Ac) .

(6.27)

Unlike the case n = 2, the set Bb a satisfying Eq. (6.25) is notunique. (Proof on page 173.)

6.2 Applications of tetrad formalism

After developing powerful techniques for manipulating theorthonormal frames, let us see what calculations are possiblein this formalism. To make the notation more concise, I willnow adopt the Einstein summation convention for the frameindices; for instance, I will now write ea θa(u) instead of∑

a eaθa(u). (In Sec. 6.1 I did not use the Einstein summationconvention but wrote all summations explicitly, to make thepresentation more clear.) The frame indices are raised andlowered using the fiducial Minkowski metric ηab.

Examples: Here are some previously derived identitiesrewritten in this notation:

g(u,v) = θa(u)θa(v), g = θa ⊗ θa;

gu = g(u, ea)θa = θa(u)θa,

Tr(a,b)X(a,b) = X(ea, ea),

ω = θaω(ea),

where ω is a 1-form.

6.2.1 Computing geodesic equations

The geodesic equation ∇uu = 0 is used to find trajectoriesof particles in GR. We now show how to use the vielbein for-malism to derive the required differential equations for theparticle worldline xµ(τ) in a local coordinate system.4Assuming that xµ is a given local coordinate system inan n-dimensional manifold, we are interested in determininga trajectory γ ≡ xµ(τ) such that ∇γ γ = 0. Suppose thatan orthonormal frame ea, the dual frame θa, and the cor-responding connection 1-forms ωa b are already known. Wewould like to determine the unknown functions xµ(τ). Thetangent vector γ can be expressed as

γ = uaea, ua ≡ θa γ.

The coefficients ua will be particular expressions involvingxµ(τ) and xµ(τ). Then we use Eq. (6.13) and compute

∇γ γ = ∇γ (uaea)

= (∇γua) ea + ubec (ωc

b γ)

=(∇γu

a + ubωab γ

)ea.

Since ea is a basis, we obtain n equations

∇γua +

b

ubωab γ = 0, a = 1, ..., n.

These equations are second-order in derivatives of xµ(τ), be-cause ua contains xµ, while the derivative along the curve,∇γu

a, produces xµ when it acts on xµ contained in ua.The geodesic equations can be further simplified, so thatprecomputing ωa

b is not required.

Calculation 6.2.1.1: The geodesic equation for a vector fieldu can be written as

u (θa(u)) + θb(u)ιeaιu (dθb) = 0, a = 1, ..., n. (6.28)

One can derive the formula (6.28) by using the explicit rela-tionship between ωa

b and θa, for instance, in the form (6.20).However, it is faster to use the Koszul formula (1.43) for com-puting the scalar product g(∇uu, ea), which must vanish forevery a = 1, ..., n. We find (without assuming g(u,u) = const)

g(∇uu, ea) =1

2

(2u g(u, ea) − ea g(u,u) − 2g(u, [u, ea])

)

1= u θa(u) − ea

g(u,u)

2+ θb(u)θb([ea,u])

2= u θa(u) + θb(u) (dθb(u, ea) + ea θb(u))

− ea g(u,u)

23= u θa(u) + θb(u)dθb(u, ea),

4The idea to explore geodesic equations in tetrad formalism came tome afterreading the section entitled “Computing with adapted frames and exam-ples” of the book [20].

133

Page 142: Winitzki. Advanced General Relativity

6 Tetrad methods

where1= is due to Eqs. (6.2) and (6.4);

2= is due to the defini-

tion (1.21) and the fact that u θb(ea) = u ηab = 0; and3= is

due to

θb(u) (ea θb(u)) =1

2ea

(

θb(u)θb(u))

=1

2ea g(u,u).

Of course, g(u,u) = const for a geodesic field u, but we didnot use that fact in the present derivation. As a result, the left-hand side of Eq. (6.28) is strictly equivalent to g(∇uu, ea) evenfor non-geodesic vector fields u.

In practical calculations, it is frequently convenient to usethe “conservation law” g(γ, γ) = const in place of one of the ngeodesic equations. (The first-order equation g(γ, γ) = constis always a consequence of the second-order geodesic equa-tions.)

Example: Let us determine the geodesic curves in the two-dimensional metric

g = e2f(x)dt2 − e2h(x)dx2.

The obvious orthonormal frame consists of the vectors e0 =e−f∂t, e1 = e−h∂x and the corresponding 1-forms θ0 = efdt,θ1 = ehdx. According to the geodesic equation (6.28), a curveγ(τ) ≡ t(τ), x(τ) with the tangent vector

γ(τ) ≡dt(τ)

dτ,dx(τ)

≡ t(τ)∂t + x(τ)∂x

is geodesic if the following two equations hold,

γ (θa(γ)) + θb(γ) (dθb) (γ, ea) = 0, a = 0, 1.

(The overdot denotes d/dτ everywhere.) Presently, dθ0 =f ′efdx ∧ dt and dθ1 = 0, while

θ0(γ) = θ0(γ) = ef t, θ1(γ) = −θ1(γ) = ehx,

so the two geodesic equations are

γ (ef t)

+ ef t(f ′efdx ∧ dt

)(γ, e−f∂t

)= 0, [a = 0]

γ (−ehx

)+ ef t

(f ′efdx ∧ dt

)(γ, e−h∂x

)= 0. [a = 1]

Simplifying these equations, while keeping in mind that γ

acts on t and x simply as d/dτ , we find

t+ 2f ′xt = 0, [a = 0]

x+ h′x2 + f ′e2f−2ht2 = 0. [a = 1]

These are second-order equations for a general geodesic curvet(τ), x(τ).We now use the property g(γ, γ) = const to simplify furthercalculations. To be definite, let us look for timelike geodesicsnormalized by

g(γ, γ) = e2f t2 − e2hx2 = 1.

Presently it is clear that the equation for x(τ) contains onlyt2 and functions of x and x. Therefore, let us substitute t2

through x2 into that equation,

x+ h′x2 + f ′e−2h(1 + e2hx2

)

= x+ (h′ + f ′) x2 + f ′e−2h = 0.

We obtained a closed second-order equation for x(τ), and wecan proceed to solve this equation by standard methods. Re-ducing to the first-order equation by a substitution x = v(x),we find

vv′ + (h′ + f ′) v2 + f ′e−2h = 0,

where the prime stands for d/dx. The last equation can beintegrated to

1

2v2e2(h+f) =

1

2C2 − 1

2e2f ,

where C2 > 0 is an integration constant. Finally, the solutionof the geodesic equation for x(τ) is expressed in the form ofan indefinite integral,

∫ x(τ) dx

v(x)=

∫ x(τ) eh+fdx√C2 − e2f

= τ − τ0.

Then the solution for t(τ) can be found by integrating theequation

t = e−f√

1 + e2hx2 = Ce−2f(x(τ)).

In some cases (for some functions f and h), these equationscan be integrated analytically and the geodesics obtained inclosed form.

6.2.2 Determining Killing vectors

A Killing vector field k satisfies Lkg = 0 (see Sec. 1.6.7); aconformal Killing vector satisfies Lkg = 2λg (see Sec. 3.1.3).Sometimes one would like to determinewhether a Killing vec-tor or a conformal Killing vector exists for a given metric. Onecan perform the necessary calculations in the vielbein formal-ism.Since themetric g can be expressed through the vielbein (seeSec. 6.1.1) as

g = ηabθa ⊗ θb,

while ηab is a constant numeric matrix, we have

Lkg = ηab

[

(Lkθa) ⊗ θb + θa ⊗ Lkθb]

.

The Lie derivatives Lkθa can be computed easily through theCartan homotopy formula (1.22),

Lkθa = d (θa k) + ιkdθa.

Then one can derive the Killing equation easily.

Example: Consider a metric

g = e2f(t)dt2 − e2h(t)dx2.

Let us determine whether

k ≡ a(t, x)∂t + b(t, x)∂x

can be a Killing vector for g, where a(t, x) and b(t, x) are un-known functions.The dual frame is

θ0 = ef(t)dt, θ1 = eh(t)dx.

The differentials are

dθ0 = 0, dθ1 = ehhdt ∧ dx.

134

Page 143: Winitzki. Advanced General Relativity

6.2 Applications of tetrad formalism

So we compute

Lkθ0 = d(θ0 k

)+ ιkdθ

0 = d(efa),

Lkθ1 = d(θ1 k

)+ ιkdθ

1 = d(ehb)

+ ehh (adx− bdt) .

Note thatd(efa) =

(efa)

,tdt+

(efa)

,xdx,

and similarly for d(ehb). So the Killing equation, Lkg = 0,becomes

Lkθ0 ⊗ θ0 + θ0 ⊗ Lkθ0 − Lkθ1 ⊗ θ1 − θ1 ⊗ Lkθ1 = 0.

This is rewritten (with implied symmetric tensor products) as[(efa)

,tdt+

(efa)

,xdx]

efdt

−[(ehb)

,tdt+

(ehb)

,xdx + ehh (adx− bdt)

]

ehdx

= 0.

Gathering terms with dt2, dt dx, and dx2, we obtain the equa-tions

(efa)

,t= 0,

(efa)

,xef −

(ehb)

,teh + bhe2h = 0,

(ehb)

,x+ aheh = 0.

These equations can be simplified to(efa)

,t= 0,

a,x = b,te2h−2f ,

b,x = −ah.

One solution is a = 0, b = const, which is the obvious Killingvector k = ∂x. If a 6= 0, we may solve the first equation asa = q(x)e−f with q 6= 0, differentiate the second equation byx, the third equation by t, and divide by q to obtain

q,xxq

= −e2h−2f(

h− hf)

.

Since the right-hand side is a function of t while the left-handside is a function of x, solutions exist only if both sides areequal to a constant. Unless the functions f(t), h(t) are such

that e2h−2f(

h− hf)

= K = const, no other Killing vectors

exist besides k = ∂x.

6.2.3 Curvature as a set of 2-forms

The connection forms ωab contain first derivatives of the met-

ric and are analogous to Christoffel symbols. The Riemanntensor also can be conveniently represented as a set of 2-forms.Consider the definition of the Riemann tensor,

R(u,v)w ≡ ∇u∇vw −∇v∇uw −∇[u,v]w,

and recall that this is a transformation-valued antisymmetricbilinear form, if viewed as a function of the vectors u,v. Forfixed u and v, the objectR(u,v) is a linear transformation thatcan be described, as usual, by a matrix of coefficients in thebasis ea. Let us denote this matrix by Rab(u,v); so we set,by definition,

Rab(u,v) ≡ θa R(u,v)eb. (6.29)

Then the transformation R(u,v) acts on the vector eb as

R(u,v)eb = eaRab(u,v).

The 2-formsRab completely describe the curvature tensor and

are called the curvature 2-forms.

It is often convenient to lower the index of the curvature2-forms and to consider the array

Ra b ≡ ηacRcb.

The relationship between the 2-forms Ra b and the covariantRiemann tensor R(a,b, c,d) can then be expressed as

R(x,y, ea, eb) = Rb a(x,y) = ιyιxRb a.

Remark: There is an obvious arbitrariness in the definitionof the 2-forms Ra

b: one could interchange the indices a andb, writing Ra b instead of Rb a above. This is the freedom ofchoosing the overall sign of the Riemann tensor. While notessential, this freedom leads to annoying sign discrepanciesbetween different textbooks.

The curvature 2-forms can be computed directly and ef-ficiently from the connection 1-forms ωc

b using the for-mula (6.30) below. To derive that formula, let us begin byexpressing R(u,v)eb with help of Eq. (6.13). We find

∇u∇veb = ∇u (ωab(v)ea)

= (u ωab(v)) ea + ωa

b(v)∇uea

= (u ωab(v)) ea + ωa

b(v)ωca(u)ec;

∇[u,v]eb = eaωab([u,v]).

We can now compute the coefficient at ea of R(u,v)eb (whiledoing that, we need to relabel the index a→ c):

θa R(u,v)eb = u ωab(v) − v ωa

b(u) − ωab([u,v])

+ ωcb(v)ωa

c(u) − ωcb(v)ωa

c(u)

= (dωab) (u,v) + (ωa

c ∧ ωcb) (u,v),

where we used the standard definitions of dωab and ωa

c ∧ωc

b for 1-forms ωab. The result is called the second Cartan

structure equation,

Rab = dωa

b + ωac ∧ ωc

b. (6.30)

Remarks: The curvature 2-forms Ra b (with the first indexlowered through ηac) are obviously antisymmetric in a, b be-cause ωa b are.

The Cartan structure equations make computations of thecurvature tensor significantly shorter. The simplification isdue to several factors: firstly, the connection forms ωa b, be-ing antisymmetric in a, b, have fewer nonzero coefficientsthan the Christoffel symbols Γλαβ , which are symmetric inα, β. Secondly, many terms are cancelled automatically dur-ing the calculation of exterior differential and exterior productin Eq. (6.30).

Calculation: Compute the connection and the curvatureforms for the two-dimensional metric g = e2fdt2 − e2hdx2,where f(t, x) and h(t, x) are arbitrary functions of t and x.Compute the curvature 2-form of the two-dimensional de Sit-ter metric by setting f(t, x) = 0, h(t, x) = Ht.

135

Page 144: Winitzki. Advanced General Relativity

6 Tetrad methods

Solution: The dual frame is

θ0 = efdt, θ1 = ehdx.

The differentials dθa are

dθ0 = df ∧ efdt = f,xefdx ∧ dt, dθ1 = h,te

hdt ∧ dx.

The only nonzero connection forms are ω01 and ω1

0, so aftersome tries we get

ω01 = ω0 1 = −ω1 0 = ω1

0 = f,xef−hdt+ h,te

h−fdx.

The only nonzero curvature 2-forms are R01 = R1

0 and arefound as

R01 = dω0

1 +∑

c

ω0c ∧ ωc

1;

however, the sum over c falls out because ω00 = ω1

1 = 0.Thus the only nontrivial component of the curvature 2-formsis

R01 = dω0

1 =(f,xe

f−h)

,xdx ∧ dt+

(h,te

h−f)

,tdt ∧ dx

=[(h,te

h−f)

,t−(f,xe

f−h)

,x

]

dt ∧ dx.

Note that the calculation is significantly simplified due to theuse of forms.For the de Sitter case, we find

R01 = H2eHtdt ∧ dx.

6.2.4 Ricci tensor and Ricci scalar

The Ricci tensor Ric is defined as the trace of the Riemanntensor,

Ric (a,b) ≡ Tr(x,y)R(a,x,b,y).

In the tetrad formulation, the first two arguments of R() arethe implicit arguments of the 2-formsRa

b, so by Eq. (6.29) wehave

R(a,x, eb,y) ≡ g(R(a,x)eb,y) = g(ea,y)Rab(a,x).

We can use the basis ea to compute the trace:

Ric (a, eb) = Tr(x,y)R(a,x, eb,y) = R(a, ec, eb, ec)

= Rcb(a, ec).

Therefore, the Ricci tensor is adequately described by the setof 1-forms

Ricb ≡ −ιecRc

b = ιecRb c;

Ric (x,y) = θb(y)ιxRicb = ιxg(eb,y)Ricb.

Here it is important to recall that eb differs from eb and is de-fined by eb ≡ ηabea.Similarly, the Ricci scalar is computed as the trace

R = Tr(x,y)Ric (x,y) = ιebRicb = Ra b(eb, ea

). (6.31)

These equations offer an efficient method for practical calcu-lations with the Ricci tensor and scalar.

Calculation 6.2.4.1: Using the tetrad formalism, wewill nowcompute the change in the Ricci tensor and Ricci scalar due toa conformal transformation of the metric.Let us denote by θa the dual basis in themetric g. The cor-responding dual basis in the metric g ≡ e2λg is

eλθa

. Sup-

pose that ωa b are the connection 1-forms for the basis θa.We now need to compute the modified connection forms ωa b.After that, the modified Riemann and Ricci tensors will be de-termined.The connection forms ωa b are found using the first Cartanstructure equation (6.16):

dθc = −ωcb ∧ θb,

d(eλθc

)= −ωc

b ∧ eλθb.

It follows that

(dλ) ∧ θc = − (ωc b − ωc b) ∧ θb.

We can now use Eq. (6.23) to express the 1-forms

δωc b ≡ ωc b − ωc b

explicitly through θa and dλ:

δωa b = ιeb(dλ ∧ θa) − ιea

(dλ ∧ θb) −1

2ιebιea

(θc ∧ dλ ∧ θc) .

The last term vanishes since θc ∧ θc = 0, and after a simplifi-cation,

ιeb(dλ ∧ θa) = ιeb

(dλ) ∧ θa − (dλ) ηab = (eb λ)θa − ηabdλ,

we findδωa b = (eb λ)θa − (ea λ)θb. (6.32)

The Riemann tensor is expressed through ωa b via the sec-ond Cartan structure equation (6.30). Hence, the curvature

2-forms Ra b for the metric g are related to the 2-forms Ra b

for the metric g by

Ra b = Ra b + dδωa b

+ δωa c ∧ ωcb + ωa c ∧ δωc

b + δωa c ∧ δωcb.

Now we need to substitute δωa b from Eq. (6.32). The individ-ual terms can be rewritten as follows,

dδωa b = (d (eb λ)) ∧ θa + (eb λ)dθa − [a↔b] ,

where I indicated by [a↔b] the repetition of previous termswith the indices a and b interchanged. Further, we use thefirst Cartan structure equation to simplify

δωa c ∧ ωcb = (ec λ) θa ∧ ωc

b − (ea λ) θc ∧ ωcb

= (ec λ) θa ∧ ωcb + (ea λ)dθb.

Using the identities

(ec λ)θc = θcιec(dλ) = dλ,

(ec λ) (ec λ) = Tr(a,b) (ιadλ) (ιbdλ) = g−1 (dλ,dλ) ,

θc ∧ θc = 0,

(the first line above follows from Statement 6.1.6.2), we find

δωa c ∧ δωcb

= ((ec λ) θa − (ea λ) θc) ∧ ((eb λ)θc − (ec λ) θb)

= [(eb λ)θa − (eb λ)θb] ∧ dλ− g−1 (dλ,dλ)θa ∧ θb

= δωa b ∧ dλ− g−1 (dλ,dλ)θa ∧ θb.

136

Page 145: Winitzki. Advanced General Relativity

6.2 Applications of tetrad formalism

Putting the terms together and simplifying, we obtain

Ra b = Ra b + (d (eb λ)) ∧ θa − (d (ea λ)) ∧ θb

+ (ec λ) [θa ∧ ωcb − θb ∧ ωc

a]

+ δωa b ∧ dλ− g−1 (dλ,dλ) θa ∧ θb. (6.33)

At this point we note that the terms d (ec λ) involve sec-ond derivatives of λ. It is useful to introduce explicitly thetensor consisting of the second covariant derivatives of λ. Thistensor is called the Hessian of the function λ and denotedHλ. The Hessian is a symmetric bilinear form defined by theequivalent formulas

Hλ(a,b) = ιa (∇bdλ) = ∇a∇bλ−∇∇abλ.

It follows from the definition that Hλ(a,b) contains onlyderivatives of λ and of the metric, but no derivatives of a orb.Since λ is a given function, wemay assume that the Hessian

Hλ is a known tensor. We can now express the terms d (eb λ)through the Hessian. First, we use the Leibnitz rule for thederivative∇a of a 1-form dλ applied to a vector b,

∇a [(dλ) b] = (∇adλ) b + (dλ) ∇ab

= Hλ (a,b) + (∇ab) λ.Substituting the basis vectors a = ea and b = eb, we find

∇ea[(dλ) eb] = ιea

d (eb λ) = Hλ (ea, eb) + (∇eaeb) λ.

Using Eq. (6.13), we can express∇eaeb through the connection

1-form ωa b and obtain

ιead (eb λ) = ιea

ιebHλ + ιea

c

ωcb (ec λ) .

By dropping the insertion ιeaof the arbitrary basis vector ea,

we can rewrite the above as an equality of 1-forms,

d (eb λ) = ιebHλ + ωc

b (ec λ) .This relationship is now used to simplify Eq. (6.33) further,

Ra b = Ra b + (ιebHλ) ∧ θa − (ιea

Hλ) ∧ θb

+ δωa b ∧ dλ− g−1 (dλ,dλ)θa ∧ θb. (6.34)

Using Eq. (6.34), we can now compute the Ricci tensor,

Ricb = ιeaRb a = e−λιeaRb a.

The following auxiliary calculations are helpful,

ιea (θa ∧ θb) = (N − 1)θb;

ιeaδωa b = ιea [(eb λ) θa − (ea λ) θb]

= (N − 1) (eb λ) ;

(ea λ) δωa b = (ea λ) ((eb λ) θa − (ea λ) θb)

= (eb λ)dλ− g−1(dλ,dλ)θb;

where N is the dimension of the spacetime,

N = ιeaθa.

Then we find

eλRicb = ιeaRb a = Ricb

+ ιea [(ιeaHλ) ∧ θb − (ιeb

Hλ) ∧ θa]

+ ιea

[δωb a ∧ dλ− g−1 (dλ,dλ) θb ∧ θa

]

= Ricb + (N − 2) ιebHλ +Hλ(ea, e

a)θb

− (N − 2) (eb λ)dλ+ (N − 2) g−1(dλ,dλ)θb.

The trace of the Hessian Hλ appearing in the last expressionis traditionally called the (covariant)D’Alembertian of λ anddenoted by λ,

Tr(a,b)Hλ(a,b) = Hλ(ea, ea) ≡ λ.

The Ricci scalar is given by

R = ιeb Ricb = e−λιeb Ricb,

so we finally obtain

e2λR = ιebeλRicb

= R+ 2 (N − 1)λ+ (N − 1) (N − 2) g−1(dλ,dλ).

The formulas for Ric and R coincide with those obtained inCalculation 1.8.4.1. The calculation in the tetrad formalism issomewhat shorter and less tedious than the calculation in vec-tor notation (see details on page 169). The trade-off is that oneneeds to introduce and use new techniques, such as connec-tion 1-forms, the exterior product, and the insertion operator.

6.2.5 Einstein-Hilbert action in tetrads

The dynamics of General Relativity is described by Einsteinequations, which are derived from the action principle. In Sec-tion 5.1.2 the action was defined as a functional of the metrictensor g. The metric was treated as a dynamical variable thatsatisfies certain equations of motion.An alternative approach is to regard the tetrad of 1-forms

θa as the primary dynamical variable. Themetric tensor canbe expressed through θa as g =

a θa ⊗ θa. However, it isalso useful to work directly with the tetrad, without using themetric tensor. In this section I will explain how the Einstein-Hilbert action is expressed and the Einstein equations are de-rived in the tetrad formalism.5

The standard way of expressing the Einstein-Hilbert actionin local coordinates is

SEH =1

16πG

R√−gd4x,

where R is the Ricci scalar and G is Newton’s constant. Thisformula can be rewritten with help of the volume 4-form,

SEH =1

16πG

M

RVol.

Now we would like to express the 4-form RVol in terms ofthe tetrad θa. This can be done using the following trick.We note that R is a double trace of the Riemann tensor, andthat a multiple trace can be seen in Eq. (6.7). Therefore, it willbe possible to express the 4-form RVol directly through thecurvature 2-forms. Using Eq. (6.31) and Statement 6.1.3.2(d),we find

RVol = Ra b(eb, ea)Vol = Ra b ∧ ∗(θb ∧ θa).

Using the relationship (see Sec. 6.1.3)

∗(θa ∧ θb) =1

2!εabcdθc ∧ θd,

5I learned some relevant details regarding this derivation from the coursenotes [15], lecture 15.

137

Page 146: Winitzki. Advanced General Relativity

6 Tetrad methods

we obtain

RVol = −1

2εabcdRa b ∧ θc ∧ θd.

Hence, the Einstein-Hilbert action can be rewritten as

SEH =1

16πG

M

Ra b ∧ ∗(

θb ∧ θa)

= − 1

32πG

M

εabcdRa b ∧ θc ∧ θd.

6.3 Connections on vector bundles

6.3.1 Vector bundles as generalization oftangent bundles

The tangent bundle to a manifold can be viewed as a collec-tion of vector spaces TpM attached to each point p of the man-ifoldM. These vector spaces are attached to points in a spe-cial way, so that the relation between tangent spaces TpM andTp′M for neighbor points p, p′ can be nontrivial.This construction is generalized by replacing the tangentspace TpM by a different, perhaps more complicated, vectorspace V . The result is the concept of a vector bundle, whichis the union of copies of vector spaces V (p) attached to eachpoint p of a manifoldM. All the vector spaces V (p) shouldhave the same dimension. The manifoldM is then called thebase manifold and the vector space V (p) is called the fiber ofthe bundle at point p. A function φ : p → V (p) onM withvalues in the fibers V (p) is called a section of the bundle. Forexample, sections of the tangent bundle TM are vector fieldsv(p) since the value of a vector field at a point p is a vectorfrom the space TpM. Sections of more general vector bundlescan be visualized as V -valued functions f(p) on the manifold,such that the value of f(p) belongs to the fiber space V (p).The reason we need to consider more general vector bun-dles than TM is that it is convenient to represent physicalfields in (classical) field theory as sections of vector bundles.Fields are usually vector-valued (or tensor-valued) functionsof spacetime, e.g. a Dirac spinor field ψ represents particlesof spin 1

2 and has values in a four-dimensional complex vec-tor space. When we represent a field by a section of a vectorbundle, the spacetime is the base manifoldM and the valueof the field at a point belongs to some vector space which isthe fiber at that point. For example, spinor fields ψ(p) canbe thought of as sections of a “spinor bundle” with fibers C4;tensor fields Aµν(p) are sections of a vector bundle with thefibers TpM ⊗ TpM. Vector bundles give us a mathematicalconstruction that describes all the physical fields in a unifiedlanguage.

6.3.2 Examples of bundles

A direct productM× V , whereM is any manifold and V isany vector space, is obviously a vector bundle; such bundlesare called trivial. Many of the vector bundles used in physicsare trivial, but sometimes the bundles turn out to be nontriv-ial, i.e. not isomorphic to a direct product of a base and a fiber.For example, consider the tangent bundle TS2 to a sphere

S2. This bundle has the fiber R2 and the base S2. If the bundleTS2 were trivial, one would be able to find a trivialization ofTS2, i.e. a smooth, one-to-one map µ : S2 × R2 → TS2 whichpreserves the linear structure in the fibers. The existence ofsuch a map would imply that the image of a fixed nonzero

vector fromR2, say the basis vector e1 with components (1, 0),is a smooth and everywhere nonzero vector field µ(p; e1) onS2. However, such a vector field does not exist, as shownby the following statement, which is a standard theorem oftopology (“one cannot comb a sphere”).

Statement 6.3.2.1: There is no smooth, everywhere nonzerotangent vector field to a sphere S2. (Proof on this page.)

Proof of Statement 6.3.2.1: Consider a stereographic pro-jection of the sphere S2 onto a plane. The sphere is realized asthe set

(x, y, z) : x2 + y2 + z2 = R2

inR3, and the projection

is described explicitly by

(x, y, z) → x+ iy

R − z≡ ζ,

where the plane is parametrized by a single complex coordi-nate ζ. It is clear that the north pole (0, 0, R) is mapped ontoinfinity,

limx,y→0

x+ iy

R−√

R2 − x2 − y2= ∞,

while the south pole (0, 0,−R) is mapped onto ζ = 0 (thecenter of the plane). Suppose we have a smooth and every-where nonzero vector field v on the sphere; then the imageof v under the projection is a smooth, everywhere nonzerovector field v on the plane. By the smoothness assumption, vis almost constant and nonzero in a sufficiently small neigh-borhood of the north pole. We may trace the direction of varound a small closed contour γ surrounding the pole, andcompute the accumulated angle swept by v. This accumu-lated angle will be zero since v is almost constant. However,after the stereographic projection the small contour γ will be-come a large closed contour γ1 on the plane. Since the interiorof γ on the sphere will be mapped to the exterior of γ1 on theplane, the direction of v is mirrored and the angle swept byv when traced around γ will be 4π. However, the contourγ1 can be continuously deformed into a small contour γ2 sur-rounding the center ζ = 0 of the plane. The angle swept by v

around γ2 is zero since v must be approximately constant ina sufficiently small neighborhood of ζ = 0. However, the an-gle swept by a vector field around a closed contour is alwaysan integer multiple of 2π and hence must remain constant if γis deformed continuously and if v is smooth and everywherenonzero. Thus the angle swept by v around γ1 is zero andcannot be equal to 4π. This gives a contradiction. Therefore,either the vector field v is not smooth around the poles (in-validating the first step of the proof), or v goes to zero or isnon-smooth at another point, which makes the accumulatedangle jump discontinuously from 4π to 0 during the deforma-tion of the contours γ1 → γ2.

So it follows from Statement 6.3.2.1 that the tangent bundleTS2 is nontrivial. However, the tangent bundle of a 3-sphere,TS3, is trivial. To show this, we first note that the sphere S3 isthe same manifold as the special unitary group SU(2) (see thediscussion in Sec. 1.2.2). Tangent vectors on SU(2) correspondto infinitesimal unitary transformations, which are always ofthe form 1+h, where h is a “small” anti-Hermitianmatrix, h =−h†. The space of all anti-Hermitian matrices is isomorphicto R

3. A trivialization that maps TS3 into S3 × R3 can be

described explicitly as follows.6 A pair (A, h) ∈ SU(2) × R3,

6It was proved in differential topology [4] that the only spheres with triv-ial tangent bundles are S1, S3, and S7. Moreover, no even-dimensionalsphere S2n admits a continuous and everywhere nonzero tangent vectorfield [30]. (This information is not actually used in GR.)

138

Page 147: Winitzki. Advanced General Relativity

6.3 Connections on vector bundles

where both are represented by 2 × 2matrices, is mapped intothe tangent vector

v f(g) ≡ ∂

∂s

∣∣∣∣s=0

f(A(1 + sh)A−1

),

where f(A) is a function onM ≡ SU(2),A ∈ SU(2), h is a 2×2anti-Hermitianmatrix, and 1+sh is a 2×2matrix representingan “infinitesimal” unitary transformation for “small” s.

6.3.3 Covariant derivatives on vector bundles

A covariant derivative on a vector bundle is a generalizationof directional derivative, so that for a section φ and a vectorfield u, the derivative ∇uφ is another section. We have seenthat the covariant derivative ∇ acts on various tensors in dif-ferent ways (the covariant derivatives of a vector and a tensorare given by different formulae). Using the picture of bundles,we say that tensors of different ranks are sections of differentbundles, and it is natural that different bundles have differ-ent covariant derivatives. (Up to now, we have been using thesymbol ∇ to mean different connections, depending on therank of tensor it acts on.)A general way to write the covariant derivative on a vectorbundle is

∇ut = ∂ut + Γ(u)t,

where ∂ is the “partial derivative” connection defined usinga particular local coordinate system on the base manifold,i.e. ∂u ≡ uµ∂µ, and Γ is a transformation-valued 1-form. Thevalue of Γ is a linear transformation within a fiber and Γ iscalled the connection 1-form.

6.3.4 Gauge theories and associated bundles

Gauge field theories (such as electrodynamics, electroweaktheory, and chromodynamics) use vector bundles with an ad-ditional structure. Namely, a certain Lie group G is acting ineach fiber, such that the equations of the theory are invari-ant under local transformations of the group. For example,the gauge group G is U(1) for electromagnetism, SU(2) forthe electroweak theory, and SU(3) for chromodynamics. ThegroupG is then called the gauge group and the vector bundleis called associated to this group (this is not a rigorous defi-nition but is intended only as a qualitative hint). For an asso-ciated bundle, the connection 1-form Γ has values in a repre-sentation of the Lie algebra g of the group G. This of courselimits the possible connections Γ, leaving only the physicallyrelevant ones. Why not give a more rigorous definition?

6.3.5 Tangent bundle as associated bundle

Let us try to see if the tangent bundle TM can be also viewedas a bundle of which a gauge group acts. The gauge group inthat case would be the orthogonal group SO(n) that acts fiber-wise, preserving the metric g in each fiber. (For a metric withthe Lorentzian signature, the group will be SO(3, 1), calledalso the Lorentz group.) The connection 1-form Γ is thenrequired to give a transformation which is one of the trans-formations from the representation of the Lie algebra so(n),i.e. for any vector v the transformation Γ(v) must be an anti-symmetric linear transformation such that

g(Γ(v)x,y) + g(x,Γ(v)y) = 0.

It is easy to see that this condition coincides with the con-dition (1.39) expressing the compatibility of the connection∇ = ∂ + Γ with the metric. Therefore the statement that theconnection ∇ on the tangent bundle comes from the orthog-onal gauge group is the same as the assumption of the com-patibility of ∇ with the metric. This is just one example ofhow physical assumptions can be expressed in the languageof gauge groups. This explanation needs to be made moreclear. What exactly is invariant under a gauge transforma-tion? Is Γ invariant?? (It isn’t.) In fact, the above equationg(Γ(v)x,y) + g(x,Γ(v)y) = 0 seems to be incorrect!!!

139

Page 148: Winitzki. Advanced General Relativity
Page 149: Winitzki. Advanced General Relativity

7 Spinors

Additional literature: [27, 32].

The purpose of this chapter is to explain the definition ofspinors and spinor fields, and to show some basic uses ofspinors in general relativity.

7.1 Introducing spinors

A spinor is a two-dimensional complex-valued vector, α ∈ S,whereS ≡ C2 is an auxiliary space (the space of spinors).1 Thenecessity to introduce spinors became apparent only after thedevelopment of quantum mechanics: it turned out that parti-cles such as electrons or quarks must be described by spinor-valued fields, rather than by vector or tensor fields. Below Ishall explain the motivation for introducing spinors in moredetail, and now let us start with a heuristic picture.A usual vector is a “directed magnitude,” which literallymeans a direction along a curve leading from a point ofthe manifold to a neighbor point. Tensors are defined alge-braically through vectors. Every tensor remains unchangedunder a rotation by 2π because every point returns to its origi-nal place, and tensors are defined through directions betweenpoints. Now, a spinor is a special kind of “directed magni-tude” that remains unchanged only under a 4π rotation butchanges sign under a 2π rotation. Clearly, such an object can-not be a vector or a tensor, or any other object defined throughpoints of amanifold, because every point in the neighborhoodof a given center of rotationwill return to its initial position af-ter a 2π rotation. Thus, a spinor cannot be defined through thetangent space to a point on a manifold. Spinors must belongto some auxiliary vector space in which the rotation groupacts in a special way. To understand how a rotation can act insuch an unusual way, we need to study the rotation group insome more detail.

7.1.1 Quaternions and rotations

We consider rotations in a three-dimensional Euclidean space

R3; the group of all orthogonal transformations (RRT =1) is O(3), and its subgroup SO(3) contains proper rota-tions, i.e. orthogonal transformations preserving the orien-

tation (det R = 1). Quaternions (invented by W. R. Hamil-ton) are a generalization of complex numbers where one in-troduces three “imaginary units” instead of one. Quaternionsare useful in physics primarily because they represent three-dimensional rotations in a concise way. Also, spinors arequaternions, in the sense which will be made precise below.(In brief: The four-dimensional space of quaternions is equiv-alent to the complex, two-dimensional spinor space). Let usnow define and explore the properties of quaternions.The set of quaternions H is a four-dimensional real spacespanned by the basis vectors 1,h1,h2,h3. The special ele-

1These two-dimensional objects are also calledWeyl spinors, in distinctionfrom Dirac spinors that have four complex components and are definedusing Weyl spinors. The connection between the two will be explainedbelow, and we say “spinor” instead of “Weyl spinor.”

ments hj called quaternionic units2 obey the multiplication

rule

h1 ∗ h1 = h2 ∗ h2 = h3 ∗ h3 = h1 ∗ h2 ∗ h3 = −1. (7.1)

It follows from the multiplication rule that

hj ∗ hk = −δjk1 +

3∑

l=1

εjklhl, (7.2)

where δ is the Kronecker symbol and εjkl is the coordinaterepresentation of the three-dimensional Levi-Civita symbol,ε123 = 1. In particular,

h1 ∗ h2 = h3 = −h2 ∗ h1,

so the quaternionic space H forms a non-commutative alge-bra.

Statement: Assuming Hamilton’s quaternionic multiplica-tion rule (7.1), the associativity and distributivity of quater-nionic multiplication, and the property 1 ∗ x = x ∗ 1 = x forany quaternion x, derive Eq. (7.2).Solution: Consider the expression h1 ∗ h2 ∗ h3 ∗ h3 = −h1 ∗

h2 = −h3. Similarly, we find h2 ∗ h3 = h1. Consider theexpression h2 ∗h1 ∗h2 ∗h3 = −h2; after multiplying by h3 ∗h2

on the right, it follows that

h2 ∗ h1 = −h2 ∗ h3 ∗ h2 = −h1 ∗ h2 = −h3.

Consider the expression h3 ∗ h1 ∗ h2 ∗ h3 = −h3; it followsthat h3 ∗ h1 = h2. All other required relations are obtained ina similar way.

A general quaternion is x ≡ x01 +∑3

j=1 xjhj , where xj arereal coefficients. It is sometimes useful to visualize a quater-nion as a “sum” of a scalar x0 and a 3-vector ~x ≡ (x1, x2, x3).The conjugate quaternion is defined by

x ≡ x01 −3∑

j=1

xjhj .

A quaternion x is called purely imaginary if x = −x, whichmeans that the first coefficient vanishes, x0 = 0. The magni-tude |x| of a quaternion x is

|x| ≡(x2

0 + x21 + x2

2 + x23

)1/2.

It is easy to see that x∗x = |x|2 1 and |x ∗ y| = |x| |y|, similarlyto the properties of complex numbers.

Calculation: Verify these properties.Solution: Using the vector notation, the multiplication rulecan be written concisely as

(x01 + ~x) ∗ (y01 + ~y) = (x0y0 − ~x · ~y)1 + x0~y + y0~x+ ~x× ~y.

2A more traditional notation for quaternionic units is i, j,k instead ofh1, h2, h3, but we shall need to mix quaternions with complex numberswith their own imaginary unit i.

141

Page 150: Winitzki. Advanced General Relativity

7 Spinors

Since x = x01− ~x, it follows that x ∗ x = |x|2 1. The property

|x ∗ y|2 = |x|2 |y|2 is straightforward to derive, recalling thatthe vector product ~x × ~y is orthogonal to both ~x and ~y, and

using the formula |~x× ~y|2 = |~x|2 |~y|2 − |~x · ~y|2.An important property is that every nonzero quaternion x

has an inverse x−1 such that x ∗ x−1 = x−1 ∗ x = 1.

Calculation: Find the inverse x−1 to a general quaternion

x = x01 +∑3j=1 xjhj , assuming x 6= 0. Show that the in-

verse would not necessarily exist if we admitted complex co-efficients xj .

Solution: Since x ∗ x = |x|2 1, we have

x−1 =x

|x|2=

1

|x|2(x01− x1h1 − x2h2 − x3h3) .

The inverse exists when |x|2 =∑4

j=0 x2j 6= 0, which is not

guaranteed for quaternions with complex coefficients xj .Another useful property is the exponentiation formula.

Statement: The exponential of a quaternion x is defined bythe Taylor series,

expx = 1 + x +1

2!x ∗ x +

1

3!x ∗ x ∗ x + ...

Show that every nonzero quaternion x is an exponential ofsome other quaternion b ≡ lnx. Show that lnx is purelyimaginary if |x| = 1.Hint: First compute the exponential of a purely imaginaryquaternion b, then use the fact that 1 commutes with purelyimaginary quaternions.

Solution: If b =∑3

j=1 bjhj then b∗b = − |b|2 1, and we find

expb = 1 cos |b| + b

|b| sin |b| ,

exp (b01 + b) = 1eb0 cos |b| + eb0b

|b| sin |b| .

For an arbitrary x = x01+∑3j=1 xjhj , we set x = exp(b01+b)

and find

b0 = ln |x| ,

b =x1h1 + x2h2 + x3h3√

x21 + x2

2 + x23

arccosx0

|x| .

The quaternionic multiplication law admits a representa-tion in terms of complex 2 × 2 matrices. We may guess therequired representation if we note that the multiplication ruleof the Pauli matrices,

σ1 =

(0 11 0

)

, σ2 =

(0 −ii 0

)

, σ3 =

(1 00 −1

)

,

σjσk = δjk + i3∑

l=1

εjklσl,

is very similar to the multiplication rule of the quaternionicunits hj , except for the extra factors of i. Therefore a repre-sentation for hj by 2 × 2 complex matrices may be found as

hj = −iσj , j = 1, 2, 3, assuming (naturally) that the identitymatrix 1 is used for the quaternion 1. The general quaternion

a = a01 +∑3

j=1 ajhj will be represented by the matrix

a =

(a0 − ia3 −ia1 − a2

−ia1 + a2 a0 + ia3

)

. (7.3)

Note that the determinant of this matrix is equal to |a|2, andthat the Hermitian conjugate matrix corresponds to a:

det a = |a|2 = a20 + a2

1 + a22 + a2

3; a† = ˆa.

Remark: As an additional motivation, it is helpful to derivea matrix representation of quaternions without guessing thePauli matrices. The four-dimensional space H of quaternionscan be interpreted as a two-dimensional complex vector spaceif we somehow define what it means to multiply a quaternionby a complex number,

(λ+ iµ)a =???

Clearly, it is sufficient to define the multiplication by i, andwe need to imitate the property i2 = −1. Since any of the unitquaternions hj satisfies h

2j = −1, we may choose for instance

h1 as the representative of i, so then we define

(λ+ iµ)a ≡ λa + µa ∗ h1.

This is clearly a linear map in H (note that we multiply byh1 from the right). We expect that H, considered as a com-plex vector space, will be two-dimensional, so any quaterniona ∈ H can be represented as a linear combination of two ba-sis quaternions with complex coefficients. We may choose, forinstance, 1,h2 as the basis, since h1 is already chosen as therepresentative of i. (Different such choices will yield differentbut algebraically equivalent representations.) Then an arbi-trary quaternion will be decomposed as

a ≡ a01 +

3∑

j=1

ajhj = (a0 + ia1)1 + (a2 − ia3)h2

and so can be represented by an array of two complex num-bers (a0 + ia1, a2 − ia3) ∈ C

2. Since the multiplication bya fixed quaternion x is a linear transformation of the vec-tor space C2, it is natural to represent that transformationby a 2 × 2 complex matrix. For example, the transformationa → h3 ∗ awill be equivalent to

h3 ∗(a0 + ia1

a2 − ia3

)

= h3 ∗ (a01 + a1h1 + a2h2 + a3h3)

= −a31− a2h1 + a1h2 + a0h3

=

(−a3 − ia2

a1 − ia0

)

=

(0 −i−i 0

)(a0 + ia1

a2 − ia3

)

,

therefore h3 is represented by the matrix −iσ1. Similarly, wefind h1 → iσ3 and h2 → −iσ2. In this way we find a possiblerepresentation of quaternions through Pauli matrices. (Thisrepresentation is equivalent to the standard one, hj → −iσj ,after a permutation of the unit quaternions.)The connection between quaternions and rotations is foundby using the following important trick. We consider the lineartransformation

Rc : x → c ∗ x ∗ c−1,

for a fixed quaternion c 6= 0. Note that the transformation Rc

preserves the magnitude of quaternions,

|Rc(x)| = |x| .

142

Page 151: Winitzki. Advanced General Relativity

7.1 Introducing spinors

We may normalize |c| = 1 without changing Rc, then

Rc(x) = c ∗ x ∗ c (7.4)

and so Rc(x) = Rc(x). Thus the three-dimensional sub-space of purely imaginary quaternions is invariant under Rc.A purely imaginary quaternion x = −x, interpreted as a 3-

vector, is transformed by Rc without changing its Euclidean

length; thus Rc acts as an orthogonal transformation in R3,

i.e. Rc ∈ O(3). Since every quaternion c such that |c| = 1is an exponential of a purely imaginary b = ln c, we have

Rc(x) = exp (b) ∗ x ∗ exp (−b) ,

and thus we can smoothly deform Rc into the identity map

by decreasing b to zero. Since the determinant det R = ±1

for R ∈ O(3), we must have det Rc = 1 for all c, andtherefore Rc ∈ SO(3). In this way, normalized quaternionsrepresent three-dimensional rotations, and multiplication of

quaternions corresponds to composition of rotations: RaRb =

Ra∗b. Since Rc is quadratic in c, heuristically onemay say that

the quaternion c is a “square root” of the rotation Rc.

Statement 7.1.1.1: The magnitude of a quaternion b = ln c

corresponds to a half of the angle of the rotation Rc, and thedirection of b is the axis of rotation.Hint: Consider an infinitesimal rotation Rexpb with |b| ≪ 1,and show that it rotates by the infinitesimal angle 2 |b| aroundthe axis parallel to b. Derivation: Needs to be filled in!

Statement 7.1.1.2: Show that a quaternion c such that |c| =1 is represented by a matrix c which is unitary and has unitdeterminant.Hint: Use previous statements and the property c† = ˆc.It is important to note that a quaternion represents only thehalf-angle of rotation, so a rotation by 2π may be representedby a nontrivial quaternion c = −1 as well as by c = 1. Imag-ine that we use a quaternion to keep track of the orientation ofa rigid body; then the quaternion will change sign after a rota-tion by 2π and return to the original value after another rota-tion by 2π. This unusual behavior is the main reason quater-nions are useful in the construction of spinors.

Since Rc = R−c, every two opposite quaternions corre-spond to the same three-dimensional rotation. The set of allnormalized quaternions c : |c| = 1 is a sphere S3 in the four-dimensional space H. Thus, the set SO(3) is equivalent to asphere S3 with identified opposite points. We have thus de-

fined a projection map S3 → SO(3) acting as c → Rc, andthis map is everywhere 2-to-1. Such maps between manifoldsare called coverings, so in particular S3 → SO(3) is a twofoldcovering.Moreover, the sphere S3 is represented as the set of uni-tary complex matrices with unit determinant via Eq. (7.3); thegroup of these matrices is denoted by SU(2). Note that thisgroup is at the same time a manifold (S3), and also the mul-tiplication by a fixed matrix A ∈ SU(2) is a smooth mapS3 → S3. Groups that are smooth manifolds are called Liegroups.Note that the multiplication of quaternions corresponds, onthe one hand, to the multiplication of matrices, and on theother hand, to multiplication of rotations. Thus the coveringSU(2) → SO(3) is a group homomorphism.There is no canonical way to define a smooth inverse map

SO(3) → SU(2) because we would need to choose the sign

of a quaternion according to “how many 2π rotations weremade,” and this information cannot be derived from one el-ement of SO(3). If one were given not only an element

R ∈ SO(3) but also a continuous path leading from the iden-

tity transformation 1 ∈ SO(3) to R, then one would be able to

choose a unique element of SU(2) corresponding to R. (Pathsin SO(3) corresponding to an odd and an even number of 2πrotations are topologically inequivalent and cannot be contin-uously deformed into each other.) On the other hand, wehave seen that infinitesimal transformations can be uniquelymapped 1-to-1 between SO(3) and SU(2).We have seen that a quaternion can be equivalently viewedas a two-dimensional complex vector α ∈ C2 onwhich quater-nions c act as unitary matrices c ∈ SU(2). Although the vec-tor α does not belong to the real three-dimensional space R3,

we can say that α “feels” a rotation X ∈ SO(3) performed in

R3 because we can find a quaternion c such that X = Rcandtransform the vector α with the unitary matrix c. (Note thatunitarymatrices preserve the Hermitian scalar product inC

2.)Transformed in this manner, α represents an unusual kind ofa “directed magnitude” that rotates only “half-way.” Thus,the two-dimensional complex vector α is in fact a spinor andthe auxiliary space S ≡ C2 is the spinor space. To summarize,spinors are quaternions viewed as elements of C2.

7.1.2 The Lorentz group

So far, we worked in the Euclidean space R3 and ignored thetime dimension of the spacetime. We shall now extend theconsiderations to the 3+1-dimensional case. We have seen thatproper rotations R ∈ SO(3) are representedbymatrices actingin a two-dimensional complex space S of spinors, such that aspinor changes sign under a rotation by 2π. We shall nowextend the considerations to the 3+1-dimensional case.Our goal for now is to define spinors at a point. We shallbe working in the Minkowski spacetime, which we interpretas a tangent plane to a point of a 3+1-dimensional spacetimemanifold.Lorentz transformations are linear transformations of theMinkowski spacetime that preserve the metric g.

Statement: Show that any transformation L preserving the

metric, g(Lx, Ly) = g(x,y), must be a linear transformation.Hint: Consider an image of an orthogonal basis under L.Solution: An orthogonal basis ej must be transformedinto another orthogonal basis Lej. Then, we have

g(L(x + λy), Lz) = g(x + λy, z)

= g(Lx, Lz) + λg(Ly, Lz).

Choosing z = ej for j = 0...3, we obtain the required linearityproperty.

All the Lorentz transformations L form a group O(3, 1),with a subgroup SO(3, 1) of orientation-preserving transfor-

mations (det L = 1). Below we shall always consider onlythe group SO(3, 1) which we shall call the Lorentz group.Lorentz transformations include proper rotations, SO(3) ⊂SO(3, 1), and we already know how spinors transform un-der SO(3). Now we need to investigate how spinors maytransform under other elements the Lorentz group that arenot purely rotations. These other elements are the boosts,i.e. a transformation between reference frames moving with

143

Page 152: Winitzki. Advanced General Relativity

7 Spinors

respect to one another. (A a proper rotation transforms be-tween frames that are at rest with respect to one another.)3 Wemay identify a proper rotation in a geometric way as a Lorentztransformation that has an invariant timelike vector v0 and aninvariant spacelike vector s0, such that g(v0, s0) = 0. A trans-

formation L such that Lv0 = v0, Ls0 = s0, and g(v0, s0) = 0,is interpreted as a spatial rotation in the rest frame of an ob-server with 4-velocity v0 around the axis s0. A boost cannothave invariant timelike vectors, but instead it has two invari-ant spacelike vectors s1 and s2, which are orthogonal to thedirection of the boost “velocity.” Let us describe boosts moreexplicitly.

Statement: Show by deriving an explicit formula that there

exists a boost L ∈ SO(3, 1) transforming a timelike vector uinto another timelike vector v 6= u, and leaving invariant twospacelike vectors s1,2, orthogonal to u.

Solution: The required Lorentz transformation L should actin the 2-plane spanned by u,v. Such a transformation willleave invariant every spacelike vector orthogonal to u and v.The subspace (u,v)⊥ is two-dimensional, therefore there ex-ist two spacelike vectors orthogonal to u which will be left

invariant. An explicit formula for L can be derived e.g. by de-termining the coefficients λij in the general expression for alinear transformation in the u,v plane,

Lx = λ11ug(x,u) + λ12ug(x,v) + λ21vg(x,u) + λ22vg(x,v).

Assuming g(u,u) = g(v,v) = 1, the result is

Lx = vg(x,u) +γv − u

γ2 − 1g(x, γu− v),

where γ ≡ g(u,v) > 1 is the “boost factor.” Note that forv → u, γ → 1 the result is well-behaved.The relationship between boosts and rotations is the follow-ing.

Statement 1: Every Lorentz transformation can be repre-sented as a product of a boost and a proper rotation.

Proof: Suppose a Lorentz transformation L ∈ SO(3, 1)brings a timelike vector u into a (timelike) vector v. If v = u

then L is itself a proper rotation, so we can still say that L isa product of a “trivial” boost and a rotation. If v 6= u, we

can find a boost L1 that brings u into v. After performing theboost, the 3-plane orthogonal to u is brought into the 3-planev⊥ orthogonal to v. Now we look for the remaining transfor-

mation L2 ∈ SO(3, 1) such that L = L2L1. Since L2 shouldleave v invariant, we have L2(v

⊥) ⊂ v⊥, and thus L2 is aproper rotation.This statement allows us to understand the topologicalstructure of the group SO(3, 1).

Statement 2: The Lorentz group SO(3, 1) has the topologySO(3) × R3.

Proof: If a boost changes a timelike vector u to v then thespacelike vector

~vrel ≡v − γu

γ,

3In the geometric approach adopted in this text, coordinate systems are notused, so Lorentz transformations change vectors rather than coordinate sys-tems. In the coordinate-based approach, such transformations are calledactive, while transformations that merely change coordinate systems arecalled passive.

such that g(u, ~vrel) = 0, represents the relative velocity of vin the frame u. Note that γ−2 = 1 − |~vrel|2. If we considerthe images of u under every possible boost, we find that each3-vector ~vrel ∈ R3 uniquely corresponds to a boost. Thereforethe space of all boosts has the topology R3. By Statement 1,

every L ∈ SO(3, 1) can be decomposed into a product of aboost and a proper rotation. Thus, the manifold SO(3, 1) canbe mapped one-to-one into the manifold SO(3) × R3.

7.1.3 Lorentz transformations of spinors

To determine how spinors transform under a boost, we needto extend the quaternionic picture of proper rotations toboosts. Once each Lorentz transformation is mapped to aquaternion, the representation (7.3) will yield the correspond-ing spinor transformation.We can arrive at a quaternionic representation of Lorentztransformations by analogy with Eq. (7.4) if we considerquaternions with complex coefficients. Denote by H ⊗ C the

space of linear combinations c = c01 +∑3

j=1 cjhj , where cjare complex coefficients. By analogy with Eq. (7.4), let us con-sider the quaternionic transformation4

Lc : x → c ∗ x ∗ c, (7.5)

where c is a quaternion satisfying

|c|2 ≡ c20 + c21 + c22 + c23 = 1,

and the quaternion c is at once the quaternionic and the com-plex conjugate of c,

c ≡ c01 −3∑

j=1

cjhj .

“Complex-valued quaternions” do not necessarily satisfy

|c|2 ≥ 0, but it is easy to check that the algebraic property

|x ∗ y|2 = |x|2 |y|2 still holds (its derivation is independent ofthe assumption that cj are real-valued). Thus we have

|Lcx|2 = |x|2

as before.It is easy to see that x ∗ y = y ∗ x, and so the transformation

Lc acting in the complex quaternionic space H ⊗ C preservesthe subspace of “self-adjoint” or “Hermitian” quaternions x

such that x = x:

Lcx ≡ c ∗ x ∗ c = c ∗ x ∗ c = c ∗ x ∗ c.

Note that a “Hermitian” quaternion must have the form x =

x01 + i∑3

j=1 xjhj , where xj are real. Hence, x can be viewedas a real 4-vector with components (x0, x1, x2, x3). Then we

immediately find that the quantity |x|2 = x20 − x2

1 − x22 − x2

3

is conserved by transformations Lc. Therefore Lc act asLorentz transformations in the 4-dimensional space of Hermi-tian quaternions x. (The same result holds for anti-Hermitianquaternions x such that x = −x.)Hence, we have found a correspondence between quater-

nions c and Lorentz transformations Lc. The following simpleconsiderations show that the inverse map, SO(3, 1) → H ⊗ C,

4Note that the formula x → c ∗ x ∗ c−1 does not generate Lorentz transfor-mations!

144

Page 153: Winitzki. Advanced General Relativity

7.2 Spinor algebra

also exists. It follows from previous calculations that every

c such that |c|2 = 1 is equal to an exponential of a “purelyimaginary” quaternion,

c = expb, b = b1h1 + b2h2 + b3h3,

where now bj are also complex-valued. As before,only orientation-preserving transformations are generated by

quaternions. Transformations Lexpb with all real bj corre-spond to proper rotations that preserve the 4-vector u ≡(1, 0, 0, 0). These rotations form the subgroup SO(3). Trans-formations with imaginary bj are boosts that preserve twospacelike vectors orthogonal to u (see Calculation below).Since every Lorentz transformation is a product of a properrotation and a boost, and since the product of quaternions cor-responds to the product of transformations, it follows that for

any L ∈ SO(3, 1) there exists a complex-valued quaternion c

such that L = Lc.

Calculation: Derive the explicit formula for the transforma-

tion of the coefficients xj of the 4-vector x under x → Lcx,where c = exp (iαh1), and show that this is a boost in the di-rection x1.Now let us use the matrix representation (7.3) which mapsquaternions c into 2 × 2 complex matrices c acting in S ≡ C2.

The condition |c|2 = 1 is equivalent to det c = 1, therefore theset of all the admissible quaternions is the same as the set ofall 2 × 2 complex matrices with unit determinant. This set isdenoted by SL(2,C) and is called the special linear group intwo complex dimensions. It is clear that we have built a grouphomomorphism SL(2,C) → SO(3, 1). As before, quaternions

c and −c generate the same Lorentz transformation Lc. Thus,the map SL(2,C) → SO(3, 1) is a 2-to-1 covering.It is easy to see that the quaternionic conjugation c → c

corresponds to the Hermitian conjugation of matrices, c →c†. Therefore the matrix formula describing the transforma-tion (7.5) is

Lcx = cxc†,

where c ∈ SL(2,C) and the matrix x describing the 4-vectorx is

x ≡(

x0 + x3 x1 − ix2

x1 + ix2 x0 − x3

)

. (7.6)

Note that this matrix is Hermitian, x† = x; thus the subspaceof “Hermitian” quaternions is equivalent to the subspace ofHermitian 2 × 2matrices. A concise expression for the matrixx is

x =

3∑

j=0

σjxj ,

where σj , j = 1, 2, 3 are the Pauli matrices, and we have setσ0 ≡ 1. As before, we have

det x = g(x,x) = x20 − x2

1 − x22 − x2

3.

In this way, the Pauli matrices σj provide a 1-to-1 map be-tween the Minkowski spacetime R4 and the space of all 2 × 2Hermitian matrices.5 Below we shall denote this map by σ.Thus we have built the spinor representationof the Lorentzgroup in the spinor space S = C2. The procedure can be sum-

marized like this: Every Lorentz transformation L ∈ SO(3, 1)

5This map is called the Infeld-van der Waerden map, after the names of itsdiscoverers.

yields a quaternion c such that L = Lc, and then we com-pute the corresponding matrix c acting in S. The product of

Lorentz transformations L corresponds to the product of ma-trices c.Given a Lorentz transformation L ∈ SO(3, 1), the quater-nion c (and thus the matrix c) is defined only up to a sign.Hence, it is necessary to consider spinors α ∈ C2 as quantitiesthat are defined only up to a sign. (In quantum mechanics,this would be a natural conclusion if the spinor were con-sidered as a “wave function” which is defined only up to aphase.) As before, a spinor changes sign after a rotation by2π. However, now we know how to rotate the spinor in anyreference frame since we derived the action of an arbitraryLorentz transformation on a spinor.

7.2 Spinor algebra

The role of quaternions in the construction of spinors is il-lustrative but purely auxiliary. Quaternions provide a visualderivation of the concept of spinor spaces adapted to 3+1-dimensional spacetimes, as well as a convenient tools for com-putations with rotations and Lorentz transformations.6 Fromnow on, we may forget quaternions and simply assume thatspinors are elements of a complex vector space S = C

2 inwhich two structures are defined:

1. A representation “up to a sign” of the Lorentz groupSO(3, 1) which “rotates by half-angle” (the spinor repre-sentation).

2. A fixed 1-to-1 correspondence σ between 2×2Hermitianmatrices and vectors from the Minkowski spacetime R4.

Using these properties as building blocks, we shall now studythe tensor algebra generated by the space S and its relation tothe usual tensor algebra in the Minkowski spacetime. Sincequaternions will not be used any more, we shall denote the

spinorial form of Lorentz transformations simply by Ls ratherthan by c. Also, we shall not distinguish between SO(3, 1)and SL(2,C) since this distinction will play no role in our cal-culations.In a real vector space V , tensors of rank (p, q) are defined astensor products of p copies of V and q copies of the dual spaceV ∗. To build tensors from the space S, we need to considerthe dual space to S. We define the dual space S∗ as the spaceof linear functions on S, i.e. functions f : S → C such that

f(b + λc) = f(b) + λf(c), b, c ∈ S, λ ∈ C.

Additionally, we can define the complex conjugate dual spaceS∗ as the space of anti-linear functions on S, i.e. functionsf : S → C such that

f(b + λc) = f(b) + λf(c), b, c ∈ S, λ ∈ C.

Finally, we define the space dual to S∗ as the space S of “com-plex conjugate” spinors.All the four spaces S, S, S∗, S∗ are two-dimensional, anda basis a,b in S naturally generates a dual basis a∗,b∗in S∗, a conjugate dual basis

a∗, b∗

in S∗, and a conjugate

basisa, b

in S. There is a natural antilinear map from S∗

6A definition of spinors for higher-dimensional spacetimes can be formu-lated using a suitable generalization of quaternions, called Clifford alge-bras.

145

Page 154: Winitzki. Advanced General Relativity

7 Spinors

to S∗: a linear function f(x) generates an anti-linear functionwhose values are f(x). The corresponding map S → S is alsodenoted by the overbar, so for instance

a + λb = a + λb ∈ S if a,b ∈ S, λ ∈ C.

Lorentz transformations act on dual spinors by inverse ma-

trices L−1s , and on conjugate spinors by Hermitian conjugate

matrices L†s.

Spinorial tensors can be built by taking tensor products ofthe spinor space S and its three conjugates. The rank of aspinorial tensor contains four numbers, for instance, (11, 22),showing how many copies of each of S, S, S∗, S∗ participatein the tensor product. For instance, Hermitian bilinear formsare (0, 11)-tensors on S. Complex conjugation is then a natu-rally defined map from

(jk, lm

)-tensors to

(kj,ml

)-tensors.

Since we have many different types of tensors, it is in-evitable that wemust eventually use the (abstract) index nota-tion. It is traditional to denote indices in spinor spaces by cap-ital letters, and indices in conjugate spaces by a prime, thusTAB′ denotes a rank (0, 11) spinorial tensor. Note that the or-der of primed and unprimed indices is insignificant: TAB′ andTB′A is the same object.

7.2.1 The fundamental 2-form

Since the spinor space S is two-dimensional, every 2-form onS is proportional to an arbitrarily selected one. There is infact a naturally selected 2-form, which we shall call the fun-damental 2-form and denote by ε. We shall now define the2-form ε by using the map σ and the Minkowski metric g.The map σ transforms “Hermitian matrices” into 4-vectors,and we need to be more specific about the space of such “ma-trices.” Suppose h is a Hermitian matrix and σh ∈ R4 isthe corresponding 4-vector. Since the formula for the Lorentztransformation is

L (σh) = σ(

LshL†s

)

,

it is clear that h is a spinorial tensor of rank (11, 0), i.e. h ∈S ⊗ S. In the index notation, we may write the application ofthe map σ to such a tensor as

(σh)µ

= σµAA′hAA′

.

Moreover, the property of being Hermitian means that h = h.Hence, the Minkowski metric g defines a bilinear form in thespace S ⊗ S. We shall temporarily denote this bilinear formby angular brackets,

〈h1, h2〉 ≡ g (σh1, σh2) .

On the other hand, we have (with some choice of basis in S) theexplicit formula (7.6), from which it follows that the determi-nant of a Hermitian matrix h is equal to g(σh, σh). The deter-minant of h is a quadratic form in the space S ⊗ S, and it isclear from the above equation that this quadratic form gener-ates the bilinear form 〈h1, h2〉. However, the determinant canbe expressed through the Levi-Civita symbol ε as

deth =1

2εABεA′B′hAA

hBB′

.

It follows that, in the same basis where Eq. (7.6) holds,

gµνσµAA′σ

νBB′ =

1

2εABεA′B′ . (7.7)

Although the above formula is written in a specific basis, itcan be reinterpreted as the definition of the fundamental 2-form ε. The form ε is chosen as follows: First we take anynonzero 2-form ω ∈ S∗ ∧ S∗; then we multiply ω by a suitableconstant λ so that ε = λω satisfies Eq. (7.7).Once the 2-form ε is defined on S ∧ S, we naturally definethe map ε : S → S∗ and the inverse form ε−1 on S∗ ∧ S∗, aswell as the corresponding conjugate forms. Note, however,that the 2-form ε is antisymmetric and hence there is a choiceof sign in defining the inverse. For instance, we might debatewhether the vector ε−1a∗ should be defined by

ε(ε−1a∗,b) ≡ a∗ b

or by ε(ε−1a∗,b) ≡ a∗ b? The convention is to use the firstargument of ε (as shown above) but the second argument ofε−1. In the index notation, this corresponds to the followingdefinitions,

aAεBA = aB, bBεBA = bA, εABεBC = −δAC ≡ εAC .

In the index notation, additional trouble arises because δAC =−δCA if we raise and lower indices using εAB , so one uses theunambiguous notation εAC = εC

A. The same conventions

apply to the conjugate forms εA′B′ and εA′B′

.

Statement: Show that the fundamental 2-form ε is invariantunder Lorentz transformations.Hint: det Ls = 1.Solution: A Lorentz transformation of ε is the 2-form Lε de-fined by

Lε(a,b) ≡ ε(Lsa, Lsb) = ε(Lsa ∧ Lsb)

= (det Ls) ε(a ∧ b) = ε(a,b).

In the index notation: The transformed εAB is

εAB = LCALDBεCD

which is antisymmetric in A,B and thus should be propor-tional to εAB . Contracting with ε

AB , we find the coefficient of

proportionality to be equal to (detL)2 = 1.

7.2.2 Relationship of spinors and vectors

Given the fundamental 2-form ε, we may choose a preferredbasis adapted to the form. The traditional notation for the ba-sis is o, ι, where o, ι ∈ S and ε(o, ι) = 1. This is as close aspossible to an orthogonal basis, given that ε is antisymmetric.The basis o, ι is called the spin basis or the spin frame. Thebasis vectors o, ι naturally induce to spin frames in the dual,conjugate, and dual conjugate spaces. It follows that the fun-damental 2-form is written as

εAB = oAιB − ιAoB ; ε−1 = ι ∧ o = ι⊗ o− o⊗ ι.

Using the spin frame, we obtain a natural basis in the (four-dimensional) space of (11, 0)-tensors:

lAA′

,mAA′

, mAA′

, nAA′

oAoA′

, oAιA′

, ιAoA′

, ιA ιA′

≡ o⊗ o, o⊗ ι, ι⊗ o, ι⊗ ι .Transforming these basis tensors by the map σ, we obtain the4-vectors l,m, m,n. The vectors l and n are real-valued butm and m are complex-valued, because the corresponding ba-sis elements are not Hermitian. It is easy to see, however, thateach of the vectors l,m, m,n is null. In fact, these vectorsare a Newman-Penrose null tetrad (see Sec. 2.5).

146

Page 155: Winitzki. Advanced General Relativity

7.2 Spinor algebra

Statement: Show that σ(aAbA′

) is a null vector when aA ∈ S,bB′ ∈ S are two arbitrary spinors. Show that the tetradl,m, m,n defined above satisfies the required properties ofa NP null tetrad: g(l,n) = 1, g(m, m) = −1, g(n,n) = 0, etc.Find a linear transformation of the spin basis o, ι that pre-serves ε−1 = ι∧o and induces the tetrad transformation (2.21)-(2.22).

Hint: Use Eq. (7.7).

Answer: The spin basis can be transformed as ι → ιe−iφ/2,o → oeiφ/2 + Aιe−iφ/2, where φ and A are the parameters ofEqs. (2.21)-(2.22).

Since the σ map is fixed, we can identify 4-vectors and(11, 0) spinorial tensors. Let us now consider how we couldconstruct some tensor objects out of spinors and in this wayobtain a geometric interpretation for spinors.

Firstly, a spinor c ∈ S defines a null vector nc = σ(c ⊗ c).

In components, this is written as nµ = σµAA′cAcA′

or more

concisely nAA′

= cAcA′

. (The map σ is usually omitted forbrevity since it is a fixed structure relating the spinor and thevector spaces.) One may say figuratively that “a spinor is asquare root of a null vector.” The null direction nc is the prin-cipal geometric information contained in a spinor c. However,spinors differing by a phase, ceiα, define the same null vectornc. The phase information can be extracted if we define theflag bivector

F(c) ≡ c ⊗ c ⊗ ε+ ε⊗ c ⊗ c,

FAA′BB′ ≡ cAcB εA

′B′

+ εAB cA′

cB′

. (7.8)

The spinorial tensor FAA′BB′

is equivalent to an antisymmet-ric bivector Fµν satisfying

Fµνnν = 0, FµνFµν = 0. (7.9)

It follows that there exists a (c-dependent) vector kc such that

Fc = nc ∧ k, Fµν = nµkν − nνkµ, g(nc,kc) = 0.

The choice of the vector kc is up to a multiple of nc, so the pair(nc, Fc) only defines a null plane containing the null directionnc. This geometric configuration is similar to a “flag” that hasa “flagpole” nc and the “flag plane” kc,nc; so we call a pair(nc, Fc) a flag corresponding to a spinor c. Spinors c and c′

differing by sign give the same flag, and conversely, a pair(n, F ) determines a spinor c up to a sign.7

Calculation: Show that a phase rotation, c → eiαc, rotatesthe flag plane by the angle 2α.

These properties offer a possible geometric interpretation ofspinors and illustrate the fact that spinors are related to nulldirections. Since the flag bivector carries the entire physicalinformation in a spinor (the sign of a spinor is not observ-able), one could in principle forego spinors and reformulateevery spinor equation in terms of unambiguously defined flagbivectors. However, flags are quadratic in spinors, and thuslinear equations of motion for spinor fields will become non-linear tensor equations if written in terms of flags. Spinorsprovide a simpler formulation of many equations, especiallythose with massless fields for which null directions are espe-cially important.

7This can be shown following [36], Eqs. (13.1.47)-(13.1.50), and deriving anexplicit formula for c given a pair (n, F ).

7.2.3 Simplification of spinorial tensors

Since the spinor space S is two-dimensional, its propertiesare significantly simpler than those of a higher-dimensionalspace. For instance, every antisymmetric tensor of rank (0, 2)is proportional to ε. Hence, an arbitrary rank (0, 2) spinorialtensor TAB can be simplified into a symmetric tensor and amultiple of ε,

TAB =1

2(TAB + TBA) +

1

2(TAB − TBA)

= T(AB) +1

2

(TCDε

CD)εAB.

Moreover, every spinorial tensor can be reduced to a combi-nation of products of the 2-form ε and some totally symmetrictensors.8 For example, a rank (0, 22) tensor can be decom-posed as

TABA′B′ = T(AB)A′B′ +1

2TCDA′B′εCDεAB.

The same decomposition can be applied to the primed indices.Now let us consider two special cases. Suppose the tensor

TABA′B′ comes from an antisymmetric 4-tensor Tµν , we haveTABA′B′ = −TBAB′A′ , and then

TABA′B′ = φAB εA′B′ + εABφA′B′ ,

where the symmetric spinorial tensor φAB of rank (0, 2) is de-fined by

φAB ≡ 1

2TABA′B′ εA

′B′

.

If, on the other hand, TABA′B′ comes from a symmetric 4-tensor Tµν , then

TABA′B′ = T(AB)(A′B′) +1

4εAB εA′B′

(

TCDC′D′εCDεC′D′

)

.

(7.10)Thus the only nontrivial spinorial tensors we need to con-sider are totally symmetric ones. For those, the following de-composition property holds.

Theorem: Every totally symmetric spinorial tensor can bedecomposed into a symmetric product of spinors:

T(AB) = a(AbB), T(ABC) = a(AbBcC),

etc. The null flagpoles na, nb, etc., corresponding to thespinors a, b, etc., are called the principal null directions ofthe spinorial tensor T .

Proof: A totally symmetric tensor of rank n is completelydetermined by its values on a single vector; e.g. T (x,y, z) canbe restored if we know T (x,x,x) for every x. Setting x =o + ιz, where o and ι are two spinors building a spin frame,we find that T (x,x,x) is a cubic polynomial in z. A cubicpolynomial can be factorized as

T (x,x,x) = (a0 + a1z) (b0 + b1z) (c0 + c1z) ,

where a0, a1, ..., c1 are suitable complex numbers (notuniquely determined). We can interpret these numbers as thespin frame components aA, bA, cA of three spinors a, b, c.Since TABC is totally symmetric, we have obtained a decom-position T(ABC) = a(AbBcC). The general case of a spinorialtensor of rank n is treated analogously.

8See [27] or [32] for a complete proof.

147

Page 156: Winitzki. Advanced General Relativity

7 Spinors

Remark: Classification of 4-tensors. Since everyMinkowski 4-tensor can be mapped into a spinorial ten-sor, we may reduce every 4-tensor first to totally symmetricspinorial tensors, and then to products of individual spinors.In this way, every 4-tensor gives rise to a set of principalnull directions. This construction provides a classificationof 4-tensors based on how many of their principal nulldirections coincide.

Calculation: Derive the spinorial representation of the Levi-Civita symbol εκλµν by mapping it to spinorial tensors andreducing to products of the fundamental 2-form εAB .Answer:

εAA′BB′CC′DD′ = i(εABεCDεA′C′ εB′D′ − εA′B′ εC′D′εACεBD).(7.11)

Statement: Show that the flag bivector Fµν given byEq. (7.8) satisfies

εκλµνFκλFµν = 0. (7.12)

This is an additional property of flags, besides the propertiesgiven by Eq. (7.9).

7.3 Equations for spinor fields

We have worked in the Minkowski spacetime R4, interpretedas the tangent space at a single point of a spacetime manifold.Now we shall consider a (curved) spacetime where a spinorfield should be defined at every point.

7.3.1 Spinors in curved spacetime

A spinor field is, roughly speaking, a spinor-valued functionon a manifoldM; thus we would like to write ψ(p) ∈ S todenote the value of the spinor field at a point p. However,spinors are tied to the Lorentz transformations in a tangentspace TpM, and tangent spaces differ at different points of themanifold. Therefore, we cannot directly compare the values ofspinors at different points.A precise picture of a spinor field is provided by the con-struction of a vector bundle (see Sec. 6.3). We consider aspinor bundle with the spacetime M as the base and thespinor space S as the fiber. Thus, we would like to have aseparate copy of the spinor space S at each point p ∈ M. Aspinor field is defined9 as a (smooth) section of the spinor bun-dle; thus, the value ψ(p) at a point p is a spinor belonging tothe copy S(p) of the spinor space.In the previous section, we constructed the spinor space

S through the Minkowski metric, η ≡ diag(1,−1,−1,−1).However, we now need to assign a spinor space S(p) to eachpoint p in a curved spacetime where the Minkowski metric isreplaced by a general p-dependent metric tensor g(p). There-fore, we need to modify the construction of the spinor space,so that the map σ and the antisymmetric tensor εAB are com-patible with the metric g(p) at each point. This is achieved bythe following construction.Consider a set of four vectors ej, j = 0, 1, 2, 3 such that

g(ej , ek) ≡ gµνeµj eνk = ηjk, (7.13)

9We ignore possible topological difficulties with the construction of a spinorbundle for a given base manifold M. The spinor bundle exists for allphysically relevant spacetimesM.

where η is defined above as a standard, fixed matrix repre-senting the “fiducial” Minkowski metric. These four vectorsobviously form a basis in the tangent space TpM, and such abasis is called an orthonormal tetrad. We can always choosesuch a tetrad ej(p) in some neighborhood of any point p.The tetrad ej can also be viewed as a map from an auxil-iary Minkowski space R4 to the tangent space TpM, wherebya 4-vector (x0, x1, x2, x3) ∈ R

4 is mapped into the spacetime

vector x ≡ ∑3j=0 ejxj ∈ TpM. To contrast 4-vectors and 4-

tensors in the fiducial Minkowski space R4 with vectors and

tensors defined through tangent spaces TpM in the actualcurved spacetime, the latter are sometimes called world vec-tors and world tensors. Since ej(p) is a basis, we obtain a1-to-1 correspondence between R4 and TpM; it is easy to seethat the inverse map TpM → R4 is defined as x → (xj), wherexj ≡ g(x, ej(p)), j = 0, 1, 2, 3.The construction of spinors at a point p in a curved space-time manifold M uses the fixed, Minkowski-space matrices

σ(M) ≡ σjAB′ , j = 0, 1, 2, 3. We define the modified mapσ(p) : S⊗ S → TpM by using a tetrad ej(p)which maps theresult of σ(M) into TpM:

ˆσ(hAB′

) =

3∑

j=0

σjAB′hAB′

ej ∈ TpM.

This defines the curved-spacetime map ˆσ(p) which can bethought of a world-vector-valued spinorial (0, 11)-tensor,

σAB′(p) ≡3∑

j=0

σjAB′ej(p); σµAB′ ≡3∑

j=0

σjAB′eµj .

Due to the defining property (7.13) of a tetrad, we have

gµν(p)σµAA′ (p)σ

νBB′(p) =

3∑

j,k=0

gµν(p)eµj (p)σ

jAA′e

µk(p)σ

kBB′

=

3∑

j,k=0

ηjkσjAA′σ

kBB′ .

Therefore, the spinorial 2-form εAB automatically satisfies theproperty analogous to Eq. (7.7) at every point p,

gµν(p)σµAA′(p)σ

νBB′ (p) =

1

2εABεA′B′ .

Note that the definition of the spinor spaces S(p) involvesan arbitrary choice of an orthonormal tetrad ej(p). Differ-ent choices of the tetrad are related to teach other with Lorentztransformations. (For any two orthonormal tetrads ej ande′j, there exists a unique Lorentz transformation L such that

Lej = e′j .) The map σ and the 2-form εAB are invariant underLorentz transformations. Thus, any Lorentz-invariant spino-

rial expression, such as φAB εA′B′gAA′BB′

, will be mappedinto a world scalar, independently of the choice of the tetrad.

Remark: Since spinors are not defined directly throughpoints of the manifold, there is no concept of arbitrary point-wise transformations (diffeomorphisms) directly applied tospinor fields! The only transformations that apply to spinors

are Lorentz transformations Ls within the spinor space. Thusone cannot define the change in a spinor field due to the flowof an arbitrary vector field v onM; in other words, the Lie

148

Page 157: Winitzki. Advanced General Relativity

7.3 Equations for spinor fields

derivative “Lvψ” of a spinor with respect to a vector is un-defined.10 However, this does not mean that the presence ofspinors in a theory violates general covariance. General pointtransformations can be applied to the tetrad vectors ej andthus effectively change the map σ, which is the only connec-tion between the spinor space S and the tangent space TpM.Field theories involving spinors can be formulated in terms ofLorentz invariants in the spinor space, and invariants involv-ing the map σ and the world metric g, and then these theoriesare generally covariant.

7.3.2 Covariant derivative on spinors

The Levi-Civita connection ∇ (but not any other connection,see below!) can be uniquely extended to a covariant deriva-tive ∇vψ on spinor fields. In addition to the usual properties(linearity, Leibnitz rule, torsion-freeness), the Levi-Civita con-nection satisfies

∇vψ = ∇vψ,

∇vεAB = ∇vεAB = 0,

∇vσ = 0.

Naturally, the gradient operator ∇µ is also equivalent to the(0, 11)-tensor ∇AA′ in the spinor space,

∇v ≡ vµ∇µ ≡ vµσAA′

µ ∇AA′ .

Perhaps, it appears strange that such “simple” operationsas the coordinate derivative ∂µ cannot be defined on spinorfields. One way to explain the fact that only the Levi-Civitaconnection is defined on spinors is to consider the flag field(nψ, Fψ) corresponding to a spinor field ψ. Suppose we at-tempt to define a parallelly transported spinor ψ using someconnection∇. We say that ψ is “parallelly transported along avector v” if ∇vψ

A = 0. Then we expect that the flag (nψ, Fψ)should also be parallelly transported along v. However, thebasic properties (7.9), (7.12) of a flag (the vector nψ should benull, Fψ should be transverse to nψ, etc.) involve the metricg. Unless the connection ∇ is compatible with the metric g, aparallelly transported flag will fail to remain a valid flag; forinstance, the vector nψ may fail to remain null, Fψ may failto remain transverse to nψ , etc. Only the metric-compatible(Levi-Civita) connection guarantees that the pair (nψ, Fψ)willremain a valid flag after an arbitrary parallel transport.

Another argument is based on the concept of associatedbundles. The spinor bundle is associated to the gauge groupSL(2,C), which is essentially equivalent to the Lorentz groupSO(3, 1). The Levi-Civita connection∇ is the gauge-invariantconnection with respect to the gauge group, because∇ is com-patible with the metric (see Sec. 6.3.5). No other connection isadmissible in the associated bundle. Thus, no other connec-tion can be defined on spinors.

7.3.3 Maxwell equations

The electromagnetic field corresponds to a particle of spin 1and is described by a 2-form F = dA, whereA is a 1-form rep-resenting the electromagnetic potential. In the index notation,

10Unless the vector v is a conformal Killing vector for the metric. See [27],vol. 2, § 6.6.

F is the antisymmetric tensor Fµν . We have seen that such atensor can be reduced to a symmetric spinorial tensor φAB by

φAB =1

2εA′B′FAA

′BB′

,

FAA′BB′

= φAB εA′B′

+ εABφA′B′

.

Nowwe shall show that theMaxwell equations can bewrittenconcisely in terms of φAB , namely

∇AA′φAB =1

2εA′B′jBB

, (7.14)

where jBB′ ≡ jµσBB

µ is the spinorial version of the familiar

charge 4-current jµ ≡(

ρ,~j)

.

It is convenient to define the Hodge dual ∗F which is againa 2-form,

∗Fµν ≡ 1

2εκλµνFκλ,

and to rewrite the Maxwell equations equivalently as

∇µFµν = jν ,

∇µ (∗Fµν) = 0.

(The last equation is equivalent to ∇[λFµν] = 0, i.e. dF = 0,which is a consequence of the identity ddA = 0, which isequivalent to F = dA.) Using the explicit expression (7.11)for the Levi-Civita symbol εκλµν , it is straightforward to ver-ify that

FAA′BB′

+ i (∗F )AA′BB′

= 2φAB εA′B′

.

Then, the Maxwell equations are equivalent to

jν = ∇µ(Fµν + i ∗ Fµν) = 2∇AA′

(

φAB εA′B′

)

= 2(∇AA′φAB

)εA

′B′

,

since εAB is “constant” under ∇. Multiplying both sides byεB′C′ , we find that Eq. (7.14) is equivalent to the Maxwellequations.

Statement: Consider a spinorial tensor φAB = ψAψB , whereψA is some spinor, and show that the corresponding tensorFµν describes an electromagnetic wave in vacuum.Hint: In this case, Fµν is the flag of the spinor ψ, so Fµν

must satisfy the properties (7.9), (7.12) of a flag. For an elec-

tromagnetic field described by 3-dimensional vectors ~E and~B, we have

FµνFµν = 2

(

| ~B|2 − | ~E|2)

,

εκλµνFκλFµν = −2 ~E · ~B.

Calculation: Show that the energy-momentum tensor of theelectromagnetic field is

Tµν =1

(1

4gµνFαβF

αβ − FµαFαβgβν

)

≡ TAA′BB′ =1

2πφAB φA′B′ .

Hint: Use Eq. (7.10) to simplify the spinorial form TABA′B′

of the symmetric tensor Tµν .

149

Page 158: Winitzki. Advanced General Relativity

7 Spinors

The spinorial form of the Maxwell equation has its rootsin group theory.11 In a heuristic language, this is the sim-plest and the most natural relativistic equation for a symmet-ric spinorial tensor of rank (2, 0).

7.3.4 Dirac equation

The Dirac equation describes a spinor field φA correspond-ing to a massive particle of spin 1

2 . We shall first consider theDirac equation in Minkowski spacetime and then generalizeto a curved spacetime.The simplest nontrivial Lorentz-invariant equation for afield φA would be

( +m2

)φA = 0, (7.15)

where the D’Alambert operator is

≡ ∇µ∇µ = ∇AA′∇AA′

.

Dirac’s main motivation was to rewrite this second-orderequation as a system of first-order equations. The standardway to achieve this is to introduce extra fields for first deriva-tives. So let us introduce an auxiliary (complex conjugate anddual) spinor field σA′ , which will be the first derivative of φA,

σA′ ≡ ∇AA′φA.

We expect that the equation for σA′ will be of the form

∇BA′

σA′ = (...), and we shall now derive that equation.We need to simplify the expression

∇BA′

σA′ = ∇BA′∇AA′φA. (7.16)

Let us first consider a more general spinorial tensor of thisform,

XAA′BB′C ≡ ∇BB′∇AA′φC ;

when we are done simplifying XAA′BB′C , the term (7.16) willbe expressed as

∇BB′∇AA′φA = εA′B′

εACεBDXDB′AA′C . (7.17)

Since we are working in flat space where the curvatureidentically vanishes, the covariant derivatives commute (evenwhen acting on arbitrary tensors or spinors), so

XAA′BB′C = XBB′AA′C .

Thus we can apply the decomposition (7.10), suitable for sym-metric 4-tensors, to the first four indices ofXAA′BB′C :

XAA′BB′C = X(AB)(A′B′)C +1

4εAB εA′B′YC , (7.18)

YC ≡ XAA′BB′CεAB εA

′B′

.

It is easy to see that

YC = εAB εA′B′∇AA′∇BB′φC = gαβ∇α∇βφC = φC .

When we substitute Eq. (7.18) into Eq. (7.17), the symmet-ric term X(AB)(A′B′)C will be canceled after contracting with

11It can be shown the homogeneous version of the spinorial Maxwell equa-tion, ∇AA′φAB = 0, determines the irreducible representation of thePoincaré group having mass 0 and spin 1.

εA′B′

in Eq. (7.17). Therefore, only the term containing YC sur-vives. After some simplification, we find the equation for σA′ ,

∇BA′

σA′ = ∇BA′∇AA′φA =1

2φB = −1

2m2φB.

The conventional definition of the auxiliary field is χA ≡√2m−1σA′ , which absorbs the awkward factor 1

2m2. Then the

equations acquire a more symmetric form,

∇AA′φA =m√2χA′ ,

∇BA′

χA′ = − m√2φB .

At this point, we are done rewriting the second-order equa-tion (7.15) as a system of first-order equations. We now intro-duce a four-dimensional, complex-valued vector ψ ∈ S ⊕ S∗

instead of the pair(φB , χA′

), and write the above system of

equations as a first-order equation for ψ,

γµ∇µψ = mψ, (7.19)

where γµ are the required 4 × 4 complex matrices called theDirac matrices,

γµ =√

2

(0 −σµAA′

σµBB′

0

)

.

The vector ψ is called aDirac spinor.The Dirac equation in curved spacetime is Eq. (7.19) ratherthan Eq. (7.15). In the presence of curvature, these equationsare not equivalent since covariant derivatives do not com-mute.

150

Page 159: Winitzki. Advanced General Relativity

A Elements of Special and General Relativity

This is a brief review of basic concepts of relativity and dif-ferential geometry. In this appendix I mostly use the tradi-tional index notation to facilitate comparison with the preva-lent physics literature. The material in this appendix is as-sumed known in the main part of the book; the explanationshere cover only a portion of the material in standard relativitytextbooks. A good introductory textbook is [29].Special Relativity (SR) is a theory describing the motionof light and point masses in cases when gravity can be ne-glected. The name General Relativity (GR) is used to denoteEinstein’s theory of gravitation, which he developed as a gen-eralization of SR. I shall now review the mathematical foun-dations of these theories, omitting most proofs.

A.1 Special Relativity

The theory of Special Relativity is based on two main postu-lates: (i) All the laws of physics are the same in every inertialreference frame.1 (ii) The speed of light, denoted c, is inde-pendent of the speed of the light source. One can derive allthe statements of SR from these two postulates. For instance,it is shown that no massive body can be accelerated from restto a velocity greater than or equal to c. Also, the distance x′

and time t′measured in amoving reference framemust be dif-ferent from the distance x and time tmeasured in a rest frame.Namely, the results of these measurements are related by theLorentz transformation

ct′ = γ (ct− βx) , x′ = γ(x− βct),

γ ≡ 1√

1 − β2, β ≡ v

c,

wherewe assume that themotion is in the positive x direction,and β < 1 is the relative velocity of the two reference frames,measured in the units of c. Since c is an absolute constant, onecan measure velocities in the units of c and distances in theunits of time. This choice of measurement units is mathemati-cally equivalent to setting c ≡ 1 in all equations, andwe adoptthis convention everywhere in this text.The fact that coordinates and times are generally not mea-sured to be the same in different reference frames is at thecore of the theory of relativity; for instance, events seen as si-multaneous for one observer may precede one another for adifferent observer. Thus, simultaneity is not absolute but isonly defined relative to an observer’s reference frame (this isone justification for the name “relativity”).

A.1.1 Spacetime

Rather than use distances and times measured in differentreference frames, one adopts a more convenient way to de-scribe events. Namely, one introduces the spacetime, which

1Of course, it is also postulated that such inertial reference frames exist and,in particular, are approximately realized in a laboratory freely floating inempty space.

is an auxiliary four-dimensional vector space M with co-ordinates t, x, y, z. This space M ≡ R4 is called theMinkowski spacetime. In the spacetime picture, eventsx ∈ M, x ≡ t, x, y, z are points of the spacetime (vec-tors such as x are denoted by boldface letters). Bodies fol-low worldlines x(t), y(t), z(t) which may be drawn as linesinM and parametrized by functions x(t), y(t), z(t). It is usu-ally more convenient to parametrize worldlines by four func-tions t(τ), x(τ), y(τ), z(τ), where τ is a real-valued parameter.Then a worldline can be represented by a point-valued func-tion x(τ).A Lorentz transformation between inertial frames will thenbe seen as a linear transformation of the coordinates in thespacetime, t, x, y, z → t′, x′, y′, z′. However, it is natu-ral to think of the spacetime as an independent arena whereevents happen, regardless of coordinates introduced in it. Inother words, the coordinate values (t, x, y, z) ascribed to anevent x by a particular inertial observer may vary, but theevent x happens by itself at a particular “place-and-time,”whether observed or not. (We may imagine the spacetimeMas a chart containing complete histories of all the bodies anda complete record of all the events that happened or will everhappen.) To differentiate between ordinary three-dimensionalvectors and four-dimensional vectors fromM, the latter arefrequently called 4-vectors.It is easy to see that the Lorentz transformation preservesthe quadratic form,

g(x,x) ≡ t2 − x2 − y2 − z2,

called the relativistic interval or the Minkowski metric. Infact, the metric g has a far greater significance than merelyan invariant of Lorentz transformations. The most importantuse of g is in computing distances and time intervals betweenevents. Given two events, x and y, we may ask whether aninertial observer might pass through both x and y. If this isthe case then these events will have coordinates t1, 0, 0, 0and t2, 0, 0, 0 in this observer’s rest frame. The time interval,t2 − t1, measured according to this observer’s clock, is calledthe observer’s proper time interval between x and y. Thisinterval can be computed as

∆τ(x,y) ≡√

(t2 − t1)2 =

g(y − x,y − x).

The 4-vector y − x connecting the events in the Minkowskispace is then called a timelike vector. The last term in theabove formula is expressed through g and is thus Lorentz-invariant. Hence, any observer can compute the proper time∆τ (x,y) using this formula.Another possibility is the existence of a lightray connectingthe events x and y. Lightrays are worldlines of bodies thatmove with the speed of light; for example, the line x = t, y =z = 0 is a lightray. It is easy to see that the squared interval inthis case is g(y− x,y− x) = 0. The 4-vector y − x connectingthe events is then called a null vector.Finally, there may be an inertial observer for whomthe events x,y are simultaneous and thus have coordinates

151

Page 160: Winitzki. Advanced General Relativity

A Elements of Special and General Relativity

t0, x1, x2, x3 and t0, y1, y2, y3. In this case, g(y−x,y−x) <0, the 4-vector y − x is called a spacelike vector, and the realnumber

∆L(x,y) ≡

√√√√

3∑

j=1

(yj − xj)2 =

−g(y − x,y − x)

is equal to the distance between the points x,y in the referenceframe where both events occur simultaneously. The quantity∆L(x,y) is called the proper distance between the events.Again, this distance can be calculated in any other referenceframe using the above formula involving g.

Conversely, it can be shown that the sign of g(y − x,y − x)unambiguously specifieswhich of the three above cases (time-like, null, or spacelike) takes place. Thus, the metric g pro-vides not only a means to compute the proper time andthe proper distance between events in an arbitrary referenceframe, but also classifies pairs of events according to whethertheir separation is timelike, null, or spacelike. Since no ma-terial bodies or signals can propagate faster than light, onlyevents separated by a timelike or null interval can influenceor cause each other. Hence, the metric g determines whichpoints ofM can causally influence each other, i.e. describesthe causal structure of the spacetime.

We have arrived at a picture of the Minkowski spacetimeM whose points are events and where a metric g is defined.Reference frames corresponding to different inertial observersare merely coordinate systems that we may introduce onM.Using this picture, known physical laws can be reformulatedonly using 4-vectors x and the metric g, without reference to aparticular coordinate system onM. In other words, the lawsof physics have a frame-invariant character compatible withthe Lorentz transformations, which means that these laws arerelativistic (agree with Special Relativity).

Note that, according to a standard result of linear algebra,the quadratic form g(x,x) gives rise to a symmetric bilinearform g(x,y) which we may define by

g(x,y) =1

2(g(x + y,x + y) − g(x,x) − g(y,y)) .

Clearly, g plays the role of the scalar product in the 4-dimensional vector space M. In coordinates: if x ≡x0, x1, x2, x3 and y ≡ y0, y1, y2, y3 then

g(x,y) = x0y0 − x1y1 − x2y2 − x3y3.

This bilinear form is also called theMinkowski metric. Two4-vectors x,y are called orthogonal to each other if g(x,y) =0. Note that the unusual signs in the metric g make the geo-metric interpretation of the scalar product g(x,y) and of theorthogonality somewhat difficult. For instance, a null vectorx, such as 1, 1, 0, 0, is “orthogonal to itself” since g(x,x) = 0.However, in any given reference frame, the time direction isalways orthogonal to every spatial direction, and the threespatial directions are also mutually orthogonal, just like innormal Euclidean geometry. From the point of view of lin-ear algebra, the assumption of an “inertial frame” is equiva-lent to choosing a basis e0, e1, e2, e3 in the vector spaceMsuch that e0 is timelike, e1,2,3 are spacelike, and g(e0, e0) = 1,g(e0, ej) = 0, g(ej, ek) = −δjk for j, k = 1, 2, 3 (the basis vec-tors are orthonormal with respect to the bilinear form g).

A.1.2 Motion of bodies in SR

The motion of a massive body is described by a worldlinex(τ), where τ is an arbitrary parameter. It is convenient tochoose τ to be the proper time in the body’s rest frame. Withthat choice, many equations are simplified; in particular, the4-vector called 4-velocity, defined by

v ≡ x ≡ d

dτx,

satisfies g(v,v) = 1 because g(v,v) is frame-invariant and inthe body’s rest frame we have x = τ, 0, 0, 0 and thus v =1, 0, 0, 0.The 4-acceleration,

a ≡ d

dτv =

d2

dτ2x,

is always orthogonal to v in the sense of the metric g, namelyg(v,a) = 0. When τ is the proper time, a noninteracting bodymoves according to the equation

d2

dτ2x = 0.

Solutions of this equation are straight worldlines, which de-scribe motion with constant velocity.The above equation of motion can be derived from a varia-tional principle,

δ

∫√

g(x, x)dτ = 0,

which means that the inertial trajectory is an extremum of theproper time among all the possible worldlines x(τ). (It can beshown that the extremum in fact a global maximum.) With thenormalization g(x, x) = 1, the variational principle becomessimply δ

∫dτ = 0.

To describe a body of rest mass m that interacts with otherbodies, one writes the action

S = −m∫

dτ +

Lint[x, x]dτ,

where the Lagrangian Lint[x, x] is a function of the worldlinex(τ) that represents interactions with other bodies. The tra-jectory x(τ) of the body should extremize Lwith respect to allpossible worldlines. The resulting equation of motion is the“relativistic Newton’s law,”

md2

dτ2x = f , (A.1)

where the 4-vector f is the 4-force which may in generaldepend on x and x. The 4-vector p ≡ mx is called the4-momentum and carries the information about the energyand the momentum of the body, p = E, p1, p2, p3, wherep1, p2, p3 ≡ ~p are the components of the 3-momentum. Notethe normalization, g(p,p) = m2, which implies the energylaw

E =√

m2 + ~p2.

A.2 Index notation

Let us now introduce the index notation. In a chosen referenceframe, a 4-vector x has four coordinates (also called compo-nents) that are usually denoted x0, x1, x2, x3 with an upper

152

Page 161: Winitzki. Advanced General Relativity

A.3 Transition to General Relativity

(superscript) index; the index value 0 means “time” and thevalues 1, 2, 3 mean spatial coordinates. The scalar product oftwo 4-vectors x,y can then be written as

g(x,y) =

3∑

µ=0

3∑

ν=0

gµνxµxν ,

where gµν is a 4 × 4matrix,

gµν ≡ diag(1,−1,−1,−1) ≡

1 0 0 00 −1 0 00 0 −1 00 0 0 −1

.

The low (subscript) position of indices in the symbol gµν isnot accidental. For brevity, it is customary to use the Einsteinsummation convention, which consists of dropping the sum-mation signs and implying summation every time an identicalindex letter appears once as a subscript and once as a super-script. Then, g(x,y) is written as gµνx

µxν . Also, one does notdistinguish between a vector x and the array xµ of its com-ponents in some basis; one simply says “a vector xµ.” This isthe essence of the index notation: It is always assumed that abasis is chosen and that one is performing calculations withcomponents of vectors and tensors in the chosen basis. All 4-vectors and 4-tensors then appear as tables of numbers, whichare indexed by (Greek) letters, for example: xµ, gαβ , εκλµν .A Lorentz transformation is described by amatrixΛαβ whichacts on 4-vectors xµ as

Λ : x → x′; x′µ = Λµνxν .

It is easy to see that the condition of Lorentz invariance of themetric is written as

ΛµαΛνβgµν = gαβ .

Furthermore, one introduces the Kronecker delta symbolδνµ, which corresponds to an identity matrix, and the inversemetric gµν such that

gαβgαγ = δγβ .

Numerically, gµν = diag(1,−1,−1,−1) is the same matrix asgµν .One also introduces the Levi-Civita symbol εκλµν which isa rank 4 totally antisymmetric tensor, defined by the relationε0123 = 1 in an inertial frame.Finally, it is often convenient to work with dual vectors,also called covectors or covariant vectors. A covectors can bepictured geometrically as the slope of the (multidimensional)graph of a function of xµ. The slope at a chosen point de-scribes the derivative of a function in every direction at once,in the following sense. The derivative of a function f(x) in thedirection given by a vector v is

Dvf ≡ vλ∂f

∂xλ.

Thus the slope of a function f(x) is adequately described bythe collection of components sλ ≡ ∂f/∂xλ. If sµ is the “slopeof a function,” then the derivative of a function in a directionvµ is the number sµv

µ. Thus, covectors can be thought of aslinear functions of vectors.

A linear function s of a vector v must be a linear combi-nation of the components vµ, µ = 0, 1, 2, 3, i.e. the function s

must have the form

s(v) = s0v0 + s1v

1 + s2v2 + s3v

3 ≡ sµvµ.

The numbers sµ are the components of the covector s. Thus,in view of the Einstein summation convention, it is natural todenote covectors by letters with lower indices.A standard example of a covector is the function whichmaps x to the scalar product of xwith a fixed vector b:

s : x → g(b,x).

(This is the slope of the hyperplane z = g(b,x), where thecoordinate z is the “ordinate axis” of the plot.) It is easy tosee that the components of the covector s are sµ = gµνb

ν . It iscustomary to refer to the relation between the vector b and thecovector s as the “lowering of an index” of bµ, and to denotethe two objects b, s by the same letter (rather than by differentletters as I have done here). Thus, one writes bµ instead ofsµ and calls it the “covariant version of the vector b

µ” or the“vector bµ with the index lowered.” In this way, the metricgµν can be seen as the “operator that lowers indices,” whilethe inverse metric gµν “raises indices.” (In a more geometriclanguage, gµν is a map from vectors to covectors and g

µν isthe inverse map.)

A.3 Transition to General Relativity

Einstein’s main motivation for introducing General Relativ-ity was to remove the restriction to inertial frames and to ad-mit arbitrary non-inertial, that is, accelerated reference frames.Since gravitation is locally equivalent to an accelerated refer-ence frame (the equivalence principle), he hoped to achievea description of gravitation in this way.Once we admit arbitrary non-inertial reference frames, wemust assume that the coordinates xµmay be curvilinear, whilethe metric gµν may be non-diagonal and generally coordinate-dependent. Thus, the theory of General Relativity is obtainedfrom Special Relativity by the following steps:(1) We replace the Minkowski spacetimeM = R

4, which isitself a vector space, by an arbitrary four-dimensional spaceM, called the spacetime manifold. Coordinates xµ inM arechosen arbitrarily.(2) The Minkowski metric ηµν = diag(1,−1,−1,−1) is re-placed by a symmetric tensor gµν(x) = gνµ(x) which candepend on the coordinates xµ. Therefore, we imagine thatM is an arbitrarily curved spacetime rather than a “flat”Minkowski spacetime. (A spacetime manifold is flat if thereexists a coordinate system in which the metric tensor isgµν(p) = ηµν at every point p. A manifold is curved if nosuch coordinate system can be found.)(3) The signature of the matrix gµν must remain (+ −−−)everywhere. In particular, for each event p ∈ M theremust exist a local coordinate system where gµν(p) =diag(1,−1,−1,−1). Moreover, there exists a local coordinatesystem (called a locally inertial frame) where the physicallaws at the event p are exactly the same as in Special Relativ-ity in the absence of gravitation. Thus, gravitation is locallyequivalent to a non-inertial reference frame (the equivalenceprinciple).(4) Physical laws, which were previously made relativis-tic (compatible with arbitrary inertial frames and rewritten

153

Page 162: Winitzki. Advanced General Relativity

A Elements of Special and General Relativity

through the Minkowski metric), are now reformulated in away that is independent of the choice of coordinate systemsand admits arbitrary, non-inertial coordinates. In particular,in every equation theMinkowski metric is replaced by the ten-sor gµν(x). Scalar products and the raising/lowering of tensorindices is performed using gµν(x).(5) The theory prescribes equations (the Einstein equations)that can be used to determine the metric gµν(x) from a givendistribution of matter in the entire spacetime. The Einsteinequations involve the curvature of the spacetime and theenergy-momentum tensor of matter (see below).It follows from (1) that the coordinates xµ are not them-selves 4-vectors any more, and for instance the coordinates xµ

and yµ of two events cannot be simply subtracted to obtaina 4-vector yµ − xµ. Moreover, the theory must be invariantunder arbitrary coordinate transformations,

xµ → xµ(x0, x1, x2, x3). (A.2)

However, the derivative xµ ≡ dxµ/dτ actually is a 4-vector be-cause it is the difference of coordinates at infinitesimally closepoints, and such points belong to an almost-Minkowski lo-cal environment which exists according to (3). The fact thatuµ ≡ dxµ/dτ is a 4-vector can be also formally established byconsidering the change of the components uµ under the trans-formation (A.2),

uµ → uµ ≡ d

dτxµ =

∂xµ

∂xνdxν

dτ=∂xµ

∂xνuν, (A.3)

which is the usual linear transformation law for componentsof a vector under a general change of basis.Hence, the metric gµν(x) still determines proper timesand proper distances, but only between infinitesimally closeevents. To make this statement, one frequently writes a some-what strange-looking equation

ds2 = gµνdxµdxν ,

where “dxµ” means an infinitesimal displacement betweentwo nearby points, and ds2 is not really a square of any “ds”since ds2 may also be negative. (One can regard the aboveequation simply as a jargon notation, i.e. a meaningless butconvenient and widely understood shorthand for the words“the infinitesimal interval is equal to gµνdx

µdxν .”) To deter-mine the proper time interval between two widely separatedevents xµ(0) and x

µ(1), we need to select a line in M leading

from xµ(0) and xµ(1), say a curve x

µ(s), where s is some param-

eter such that xµ(0) = xµ(0) and xµ(1) = xµ(1), and integrate

along the curve,

∆τ ≡∫ 1

0

gµν (xµ(s))dxµ

ds

dxν

dsds.

Here we assume that curve xµ(s) is such that dxµ/ds is ev-erywhere timelike, so the expression under the square root ispositive. Such curves are called timelike worldlines. The re-sulting time interval ∆τ will of course depend on the chosencurve xµ(s), but not on the choice of the parameter s along thecurve. In general, it is not easy to determine whether there ex-ists a timelike curve connecting two given points in the space-time. It might happen that only a null or a spacelike curveconnects two given events. (Two events are timelike sepa-rated if they can be connected by a timelike worldline.) With-out a detailed analysis of the behavior of the metric gµν ev-erywhere inM, it is not immediately clear whether two givenevents can causally influence each other.

A.4 Covariant derivative

The requirement (4) above means, in particular, that the rela-tivistic Newton’s law (A.1) should be rewritten so that it holdsin all coordinate systems. An immediate difficulty with thisrequirement is that the laws of physics involve derivatives ofvectors, and taking a derivative of a vector or a tensor in anarbitrary coordinate system in curved space is a somewhatcomplicated operation. For instance, d2xµ/dτ2 is not a cor-rect expression for the µ-th component of the 4-acceleration,unless we are using Cartesian coordinates in flat Minkowskispacetime. Let us examine this problem in more detail; weshall begin by considering derivatives in curved coordinatesin flat space, and then generalize to curved space.

A.4.1 Curved coordinates

Suppose that xµ(τ) is the worldline of a particle in the flatMinkowski spacetime specified in a rectangular coordinatesystem xµ, and suppose that xµ is a curvilinear coordi-nate system related to xµ by four given functions fµ(x),µ = 0, 1, 2, 3, as xµ = fµ(x0, x1, x2, x3). The components of4-velocity in the coordinate system xµ are uµ ≡ dxµ/dτ ,and the components of the 4-acceleration are aµ ≡ d2xµ/dτ2.In the coordinate system xµ, the new components of the 4-velocity, uµ, and the 4-acceleration, aµ, must be calculated ac-cording to Eq. (A.3):

uµ =∂fµ

∂xνuν , aµ =

∂fµ

∂xνaν =

∂fµ

∂xνduν

dτ. (A.4)

The worldline of the particle in the new coordinate system isxµ(τ) ≡ fµ(x(τ)), and so one might try to compute the 4-velocity and the 4-acceleration directly in the coordinate sys-tem xµ by computing derivatives dxµ/dτ and d2xµ/dτ2.Now, the 4-velocity will be computed correctly because, byvirtue of the chain rule, ∂αf(x) = (∂f/∂x) ∂αx, and hence

dxµ

dτ=∂fµ

∂xνdxν

dτ=∂fµ

∂xνuν = uµ.

However, the 4-accelerationwould be found incorrectly if oneuses the formula duµ/dτ :

d

dτuµ =

d

(∂fµ

∂xνuν)

= uνd

(∂fµ

∂xν

)

+∂fµ

∂xνduν

= uνd

(∂fµ

∂xν

)

+ aµ. (A.5)

For a general coordinate transformation (not merely a linearchange of coordinates where ∂fµ/∂xν = const), we have

d

(∂fµ

∂xν

)

6= 0,

therefore aµ 6= duµ/dτ . So the formula duµ/dτ cannot be usedin a general coordinate system xµ to compute the compo-nents of the 4-acceleration aµ. The reason for the problem isthat the coordinate system xµ is curved and so the deriva-tive duµ/dτ reflects not only the change in the 4-velocity, butalso the change in the directions of coordinate axes.A correct expression for aµ is given in Eq. (A.4). Neverthe-less, we would like to have a formula for aµ that can be useddirectly in the coordinate system xµ. The solution of theproblem is evident from Eq. (A.5),

aµ =duµ

dτ− uν

d

(∂fµ

∂xν

)

.

154

Page 163: Winitzki. Advanced General Relativity

A.4 Covariant derivative

Here, uν can be expressed through uµ using Eq. (A.4). There-fore, the 4-acceleration aµ in a curved coordinate system canbe computed from duµ/dτ and uµ, if additionally we knowthe functions fµ(xν), µ = 0, 1, 2, 3, that relate the curved coor-dinates to the flat ones. Note that d/dτ is effectively a deriva-tive in the direction of the 4-velocity uµ and can be applied toarbitrary functions of coordinates as

d

dτ(...) = uλ

∂xλ(...) .

Since fµ are functions of coordinates (not of τ ), we may write

d

(∂fµ

∂xν

)

= uλ∂

∂xλ

(∂fµ

∂xν

)

.

Therefore, we can compute the derivative of a vector fieldAµ(x) in the direction uλ in any coordinate system: in flat co-ordinates xµ as

d

dτAµ ≡ uλ

∂Aµ

∂xλ

and in curved coordinates xµ as

[

∂Aµ

∂xλ−Aν

∂2fµ

∂xλ∂xν

]

= uλ

[

∂Aµ

∂xλ− Aρ

(∂xν

∂fρ

)∂2fµ

∂xλ∂xν

]

.

The expression in brackets above is called the covariantderivative and denoted either by a semicolon or by the sym-bol ∇ (pronounced “nabla” or “del”),

∇λAµ ≡ Aµ;λ ≡ ∂Aµ

∂xλ− Aρ

(∂xν

∂fρ

)∂2fµ

∂xλ∂xν. (A.6)

In this notation, the derivative of a vector Aµ in the directionuµ is written as uλAµ;λ or u

λ∇λAµ.

By construction, the covariant derivative Aµ;λ is found byfirst computing the derivative ∂Aµ/∂xλ in flat coordinates(where we know the correct way to differentiate vectors) andthen recalculating the components to the curved coordinatesxµ by using the transformation law for tensors,

Aµ;λ =

(∂Aν

∂xρ

)∂fµ

∂xν∂xρ

∂fλ.

The formula (A.6) gives these components Aµ;λ directly

through the components Aµ(x), without need to know thecomponentsAµ in flat coordinates. Of course, the components

Aµ;λ transform correctly (“covariantly”) under a change of co-ordinates. The origin of the name “covariant” derivative.

Remark: I would like to emphasize that the covariantderivative ∇λA

µ is not a different kind of derivative. If thenotion of the covariant derivative causes you difficulties, itmight be instructive to focus attention on the case of flat space.In flat space, there is already a well-defined, familiar direc-tional derivative of vectors, namely ∂Aµ/∂xλ, where

is a flat (Cartesian) coordinate system. However, we haveseen that the formula ∂Aµ/∂xλ only holds in flat coordinates,while in curved coordinates the correct expression for the di-rectional derivative is the formula (A.6). Thus, the “covari-ant derivative” (A.6) can be thought of as a covariant formulafor the (already well-defined and familiar) directional deriva-tive of a vector field in flat space. The covariant formula ispreferable because it holds in curved coordinates as well as inCartesian coordinates.

Suppose that xµ were actually the same coordinate sys-tem as xµ; then we would have ∂2fµ/(∂xλ∂xν) = 0 andEq. (A.6) would still define the correct derivative ∂Aµ/∂xλ.Therefore, the covariant derivative reduces to the coordinatederivative ∂/∂xλ in flat coordinates, and so it makes sense touse the covariant derivative always. A tensor expression in-volving covariant derivatives, such as

aµ = uνuµ;ν ,

is valid in every coordinate system.A more concise way to represent the covariant deriva-tive (A.6) is to write

Aµ;λ ≡ ∂Aµ

∂xλ+ AρΓµρλ.

Γµρλ ≡ −(∂xν

∂fρ

)∂2fµ

∂xλ∂xν.

Once the quantities Γµρλ are computed for a given coordinatesystem xµ, it is easy to differentiate vectors directly in thatcoordinate system, without needing to convert tensors backto flat coordinates every time.Note that Γµρλ is an array of numbers that depends on thecoordinate system in a nontrivial way and, despite its appear-ance as an indexed quantity, is not a tensor of rank (1,2). Thetransformation law for Γµρλ is given in Eq. (A.9) below.

A.4.2 Curved space and induced metric

In the previous section, we found how a vector field Aµ(x)should be differentiated in an arbitrary, curved coordinatesystem in flat space. The method was to differentiate Aµ ina flat coordinate system and then recalculate the componentsto the curved coordinates. The result was a covariant formulafor the derivative,∇λA

µ. Let us now consider a more generalcase: a curved space.A general manifold M can be visualized as a “non-straight” (curved) surface in flat Euclidean space Rn. A di-rection within the manifold is represented by a vector tangentto the surface. Since tangent vectors are at the same time vec-tors in R

n, we can compute scalar products of such tangentvectors by using the standard scalar product in Rn. Thus wehave automatically defined a metric on the surface. (Ametricon a manifold is a quadratic form on tangent vectors.) Thisnaturally defined metric on the surface is called the inducedmetric on the surface with respect to the given embedding inRn. The idea is that the known scalar product inRn “induces”the scalar product of tangent vectors.

Example: A unit 2-sphere in three-dimensional Euclideanspace with coordinates x, y, z is defined as the locus ofpoints satisfying

x2 + y2 + z2 = 1.

The 2-sphere a curved two-dimensional manifold denotedS2. A tangent vector to the sphere at a point x, y, z canbe visualized as a three-dimensional vector with componentsv1, v2, v3 such that

xv1 + yv2 + zv3 = 0.

(This equation selects the tangent plane at point x, y, zon the sphere.) The scalar product of two tangent vectorsu1, u2, u3 and v1, v2, v3 is the usual Euclidean product,

u · v = u1v1 + u2v2 + u3v3.

155

Page 164: Winitzki. Advanced General Relativity

A Elements of Special and General Relativity

Points on the sphere may be labeled by intrinsic coordinates,e.g. by the spherical coordinates θ, φ. The point θ, φ hasEuclidean coordinates X,Y, Z, where X,Y, Z are functionsof θ and φ,

X = cos θ cosφ, Y = cos θ sinφ, Z = sin θ.

A tangent vector with three-dimensional componentsv1, v2, v3 is visualized as a “velocity” of a point movingwithin the sphere. If this point moves along a trajectoryγ(τ) ≡ Θ(τ),Φ(τ), where Θ and Φ are some functions, thenthe velocity vector is described by two intrinsic componentst1, t2,

v = t1, t2 =

dτ,dΦ

.

It is straightforward to see that the three-dimensional compo-nents v1, v2, v3 of the same velocity vector v are found as

v1 =∂X

∂θ

dτ+∂X

∂φ

dτ, etc.

Thus the correspondence between the intrinsic componentst1, t2 and the three-dimensional components v1, v2, v3 is alinear transformation,

v1 =∂X

∂θt1 +

∂X

∂φt2, v2 =

∂Y

∂θt1 +

∂Y

∂φt2, v3 =

∂Z

∂θt1 +

∂Z

∂φt2.

The scalar product of two tangent vectors u and v can be writ-ten explicitly through the intrinsic components u ≡ s1, s2and v ≡ t1, t2 as

u · v = u1v1 + u2v2 + u3v3

=

(∂X

∂θs1 +

∂X

∂φs2

)(∂X

∂θs1 +

∂X

∂φt2

)

+ ...

After a short calculation with the functions X,Y, Z givenabove, we find

u · v = s1t1 +(sin2 θ

)s2t2.

Thus the induced metric on the sphere (in coordinates θ, φ)is written as

ds2 = dθ2 + dφ2 sin2 θ.

Note that there is no set of coordinates x, y that wouldmake a sphere a flat manifold, i.e. a space with the Euclideanmetric dx2 + dy2. (If that were possible, there would be aunique shortest path connecting opposite poles of the sphere,which is clearly not the case.)

A.4.3 Covariant derivative

We now need to derive a formula for the directional deriva-tive for vector fields in a curved space, when a flat coordinatesystem is not available. The formula must involve only intrin-sic coordinates on the manifold and require no informationabout an embedding ofM into a larger space.It is a known theorem in differential geometry that anymanifold M with a given metric g can be represented as ahypersurface embedded into a higher-dimensional flat space(the higher-dimensional space is sometimes called the bulk inmodern physics). Then the flat coordinates Xµ and the flatmetric Gµν are available in the bulk, and the metric gµν de-fined within the manifoldM is equal to the induced metric

with respect to the embedding into the bulk. A vector fielddefined within the manifoldM is visualized as a field of vec-tors Aµ(x) everywhere tangent to the surface. We can com-pute the derivative of a tangent vector Aµ with respect to atangent direction uµ as uλ∂Aµ/∂Xλ, but this vector may havea nonzero component normal to the surface. If we simply dis-card this component, i.e. if we project the vector uλ∂Aµ/∂Xλ

orthogonally onto the surface, we obtain a vector tangent tothe surface. Let us denote this vector by uλ∇λA

µ (this nota-tion is justified since the projection of uλ∂Aµ/∂Xλ is linear inuλ). The tensor ∇λA

µ is called the covariant derivative of thevector fieldAµ. Thus we have obtained a derivative operation∇λ defined on tangent vectors within the manifoldM. Simi-larly, the covariant derivative is defined on arbitrary tensors.

The procedurewe employed to define the covariant deriva-tive involves the orthogonal projection of ∂Aµ/∂Xλ onto thetangent space of the hypersurfaceM. It may be unclear whyit is useful to apply such a projection rather than some otheroperation. One can motivate this procedure by the followingconsiderations. Imagine that a massive body is constrainedto move without friction along a hypersurfaceM, while noother forces influence its motion. Further, imagine that themotion of the body is observed by a “flat” observer who livesentirely within the hypersurface and thus cannot see the ex-tra dimensions. To the “flat” observer, such a body appearsto be unforced because the constraining force normal to thesurface is invisible. Therefore, the “flat” observer expects thatthe acceleration of the body, as seen from within the hyper-surface, is equal to zero. Since the force acting on the body isalways normal to the hypersurface, indeed the tangential com-ponents of the acceleration are always zero (but not the nor-mal component). Therefore, the orthogonal projection of theacceleration onto the tangent space is always zero. This con-dition is equivalent to saying that the covariant derivative ofthe velocity vector field uµ in the direction of motion is equalto zero, uλ∇λu

µ = 0. Therefore, the construction of the co-variant derivative through the orthogonal projection indeedexpresses the idea that the velocity of a freely moving bodyremains constant along the trajectory.

Note that the normal projection is constructed through themetric Gµν and depends on the embedding of the hypersur-faceM into the flat space. However, the derivative operationcan be written solely in terms of intrinsic coordinates xµwithin the manifoldM and the metric gµν , without referringto any embedding into a larger space. The derivation is stan-dard and somewhat lengthy, so we omit it.2 We present onlythe resulting final formula for the covariant derivative,

∇νuµ =

∂uµ

∂xν+ Γµανu

α, (A.7)

Γλαβ ≡ 1

2gλµ (gµα,β + gµβ,α − gαβ,µ) . (A.8)

The array of numbers Γµαβ is called the Christoffel symbolcorresponding to the given coordinate system and the met-ric gµν . As before, the numbers Γµαν depend on the coordi-nate system in such a way that the sum ∂uµ/∂xν + Γµανu

ν

transforms correctly as a rank (1,1) tensor under an arbitrarychange of coordinates, even though the two terms ∂νu

µ andΓµανu

ν , taken separately, do not transform correctly. The re-quired transformation law for Γµαν is easy to derive. Let us

2A derivation of an explicit formula the covariant derivative is performed inSec. 1.6.6 using coordinate-free calculations.

156

Page 165: Winitzki. Advanced General Relativity

A.4 Covariant derivative

temporarily write ∇ with a tilde when the covariant deriva-tive is computed in the coordinate system xµ (this is onlyfor clarity; normally we do not use such a notation). We as-sume that ∇νu

µ obeys the transformation law for rank (1,1)tensors,

∇νuµ =

(

∇αuβ) ∂xµ

∂xβ∂xα

∂xν,

and substitute

∇νuµ ≡ ∂

∂xνuµ + Γµανu

α,

∇ν uµ ≡ ∂

∂xνuµ + Γµαν u

α.

The result is

Γλαβ = Γµνρ∂xν

∂xα∂xρ

∂xβ∂xλ

∂xµ+

∂2xλ

∂xν∂xρ∂xν

∂xα∂xρ

∂xβ. (A.9)

The above transformation law contains a term that is not pro-portional to Γλαβ , so the numbers Γ

λαβ may all vanish in one co-

ordinate system but become nonzero in another. Clearly, thenumbers Γλαβ cannot represent the components of any tensor.A quantity with the transformation law (A.9) is called a con-nection.

A.4.4 Properties of covariant derivative

As defined above by Eq. (A.7), the covariant derivative oper-ator ∇ν produces tensors of rank (1,1) out of vectors. Despitean extra term in the formula (A.7), the derivative satisfies theusual properties, such as

∇λ (fAµ) = f∇λAµ +Aµ∇λf,

where f is a scalar function and Aµ is a vector field. This isto be expected since ∇ν can be thought of as merely the stan-dard derivative operator recalculated in a curved coordinatesystem. Note that ∇λf ≡ ∂f/∂xλ is an ordinary derivative,but no harm is done by writing ∇λf instead of ∂λf .Since the derivative operation ∇λ is a projection of the or-dinary directional derivative ∂/∂Xλ in the embedding space,it follows that∇λ satisfies the standard properties of a deriva-tive, such as linearity,

∇λ (A,,,,.... +B,,,,....) = ∇λA,,,,.... + ∇λB

,,,,.... ,

and the Leibnitz rule, for instance

∇λ(aαµbβγ) =

(∇λa

αµ

)bβγ + aαµ

(∇λb

βγ).

Using this property, a generalization of the formula (A.7) canbe easily found for covectors or arbitrary tensors. In particu-lar, we demand that

∇ν

(aαbβ

)= (∇νa

α) bβ + aα(∇νb

β),

∇ν (xαyα) = ∂ν (xαyα) .

From this we can derive an explicit formula for the covari-ant derivative for an arbitrary tensor, in terms of coordinatederivatives ∂λ and the Christoffel symbol. For example,

∇νaµ = ∂νaµ − Γλνµaλ,

∇νTαβ = ∂νT

αβ + ΓαµνT

µβ − ΓλνβT

αλ .

For brevity, partial derivatives with respect to a coordinateare often denoted by an index with a preceding comma,

∂uµ

∂xν≡ uµ,ν,

while covariant derivatives are denoted by an index with apreceding semicolon, ∇νu

µ ≡ uµ;ν .All the physical laws can be formulated in a generally co-variant way (i.e., valid in any coordinate system) if we re-place all coordinate derivatives ∂µ by covariant derivatives∇µ. Thus, the generally covariant analog of Eq. (A.1) is

muν∇νuµ = fµ. (A.10)

Here uµ ≡ dxµ/dτ is the 4-velocity of the particle, and we re-placed d/dτ by the covariant derivative uν∇ν . So far, all wehave achieved is a rewriting of the known physical laws inan arbitrary coordinate system; no new physical informationis introduced or derived. The second step towards GeneralRelativity is to postulate that the correct form of the physi-cal equations (e.g., Newton’s law or the Maxwell equations)is obtained from the known laws in Special Relativity bysubstituting an arbitrary (but nondegenerate and Lorentzian-signature) spacetime-dependent metric gµν(x) instead of theflat Minkowski metric ηµν and by replacing derivatives ∂µ bycovariant derivatives ∇µ, where the Christoffel symbol Γλαβis defined by Eq. (A.8). This step of course leads to certainchanges in the physical laws due to a different gµν and a non-trivial Γλαβ ; these changes are interpreted as the influence ofgravitation. The result is a reformulation of all the laws ofphysics in an arbitrary curved spacetime. This is, of course,a major assumption about the way gravitation affects the be-havior of particles and fields: “gravitation” is not a specialforce but the natural and inavoidable influence of the non-trivial geometry on matter in a curved spacetime. Ultimately,this statement must be tested by experiments. The experimen-tal status of General Relativity is quite satisfactory at present.The widespread acceptance of GR is due to the experimentalconfirmation as well as to the simplicity of this theory com-pared with other theories of gravitation.

A.4.5 ∗Choice of connection

We have shown that the transformation law (A.9) can be de-rived merely from the requirement that ∇νA

µ should trans-form as a tensor of rank (1,1). It follows that any connectionΓλαβ that transforms according to Eq. (A.9) will give rise to atensor

∂Aµ

∂xβ+ ΓµαβA

α.

So onemay consider “alternative” covariant derivatives of theform (A.7), where Γµαβ is not necessarily the Christoffel con-nection. Since the physical influence of gravity is describedby terms containing Γµαβ , a natural question is to decide which

connection Γµαβ is “correct.”In GR, one chooses the Christoffel symbol (A.8) as the con-nection. This can be justified on the basis of experimental dataconfirming GR, and also by considerations of mathematicalsimplicity. In fact, the choice (A.8) is equivalent to the follow-ing two conditions: (i) Γλαβ = Γλβα, and (ii) gαβ;ν = 0. Let us

review various arguments supporting these assumptions.3

3I wish to emphasize that these are physical assumptions, i.e. something

157

Page 166: Winitzki. Advanced General Relativity

A Elements of Special and General Relativity

The condition (i) can be seen as a consequence of the equiv-alence principle, which says that each event p admits a locallyinertial frame where Special Relativity holds in an infinitesi-mal neighborhood of p. In other words, there exists a coor-dinate system xµ in which the physical laws hold with ordi-nary derivatives ∂/∂xµ instead of ∇µ. Hence, at the event pwe have Γλαβ(p) = 0. Recalculating Γλαβ for other reference

frames using Eq. (A.9), we find that Γλαβ(p) = Γλβα(p) in ev-ery reference frame. Since this argument holds for any eventp, we find that the equivalence principle entails the conditionΓλαβ = Γλβα. Alternatively, we may demand that the covariantderivatives commute on functions, i.e. f;µν = f;νµ, where f isany scalar function. It can be easily seen that this conditionalso forces Γλαβ = Γλβα.The condition (ii) is another fundamental assumption thatcan be motivated in many ways. For instance, gµν in theMinkowski spacetime is given by a constant matrix, gµν =ηµν , thus ∂αgµν = 0, while we expect that the same property(but involving the covariant derivative) should hold in curvedspacetimes. Alternatively, we may require that the operationof lowering an index of a vector or a tensor should commutewith the covariant derivative:

gµαAµ

;ν = (Aα);ν , where Aα ≡ Aµgµα.

This property is clearly equivalent to gµα;ν = 0. Alternatively,let us consider the properties of a “locally constant” vectorfield. A vector field uµ is “locally constant” if its covariantderivative vanishes at a point p, i.e. uµ;ν(p) = 0. (In a locallyinertial frame at p, this condition is identical to ∂νu

µ(p) = 0.)If the vector uµ is “constant” in this sense, it is natural to ex-pect that the length of the vector uµ is also locally constant:∇α(gµνu

µuν)|p = 0. However, this condition is equivalent togµν;αu

µuν = 0. Since the direction uµ and the point p are arbi-trary, we must require that gµν;α = 0 everywhere.It is a straightforward exercise to derive an explicit form of

Γλαβ from the conditions (i) and (ii), and we omit the deriva-tion (which can be found in standard textbooks). The result isthe formula (A.8). Thus the intrinsic metric gµν uniquely de-fines the Christoffel symbol and the covariant derivative. InGeneral Relativity, one never uses any other connection thanthe Christoffel symbol given above.

Remark: The constancy of the metric under the covariantderivative, ∇λgµν = 0, is a direct consequence of the con-struction of ∇λ through a projection from an embedding ofthe manifoldM as a hypersurface in a “bulk” space with co-ordinates Xλ. Since the bulk has a flat metric Gµν , we have∂λGµν = 0, where ∂λ ≡ ∂/∂Xλ is the derivative with respectto the flat coodinates in the bulk. Suppose uµ and vµ are twotangent vector fields; then by definition of the induced metric,gµνu

µvν = Gµνuµvν . Hence,

∂λ (Gµνuµvν) = Gµν (∂λu

µ) vν +Gµνuµ (∂λv

ν) .

Since uµ and vµ are tangent, the normal component of ∂λuµ

will disappear from the scalar product Gµν (∂λuµ) vν . Thus

∂λ (Gµνuµvν) = gµν (∇λu

µ) vν + gµνuµ (∇λv

ν) .

to be ultimately tested by experiments. Theories similar to GR but withΓλ

αβ 6= Γλβα or gαβ;ν 6= 0 can be constructed as well. Although such

theories are of course more complicated than GR, physicists do considertheir physical consequences and test them experimentally. So far, no al-ternative theory of gravitation has been shown to surpass GR in correctexperimental predictions.

On the other hand,

∂λ (Gµνuµvν) = ∇λ (gµνu

µvν) ,

and it follows that ∇λgµν = 0.

A.5 Curvature

A.5.1 Parallel transport

The concept of parallel transport of vectors on a curved man-ifoldM can be easily visualized if we think of the manifoldM as a surface embedded in flat space. Suppose a path isgiven within the surface, with a tangent vector vµ. An ar-bitrary tangent vector uµ can be transported along the pathin such a way that the derivative along the path in the bulkspace, vλ∂uµ/∂Xλ, is everywhere normal to the surface. Since(by definition) ∇λ is ∂/∂X

λ projected onto the tangent plane,the condition vλ∂uµ/∂Xλ = 0 is equivalent to the conditionvλ∇λu

µ = 0.This motivates the following definition: A vector uµ is par-allelly transported along a path if vλ∇λu

µ = 0, where vλ is atangent vector to the path.The covariant derivative can then be given a different inter-pretation using the parallel transport operation. Let us denote

by Tεauµ the parallel transport of a vector uµ along a straight

line segment εaµ. Then we can say that the covariant deriva-tive measures the rate of change in the vector uµ comparedwith the change due to the parallel transport:

aλ∇λuµ = lim

ε→0

u(xµ + εaµ) − Tεau(xµ)

ε.

It is easy to see that the scalar product of vectors is constantunder a parallel transport:

vλ∇λ (gµνuµwν) = 0 if vλ∇λu

µ = vλ∇λwµ = 0.

Note that in flat space we may introduce flat coordinateswhere we have ∇λ = ∂λ, and thus a vector u

µ is parallellytransported along any curve iff all the components uµ remainconstant. If we use curved coordinates in flat space, the com-ponents of a vector may change during a parallel transportbecause the directions of the basis vectors change in space.However, if we execute a parallel transport of a vector alonga closed path, the components of the vector will return to theirinitial values.

Statement: Show that the operator aµ∇µ (acting on an arbi-trary vector field uλ) commutes with parallel transport alongaµ. In other words,

aµ∇µ

(Tεau

λ)

= Tεa(aµ∇µu

λ).

A.5.2 Riemann tensor

In flat space, the operation of parallel transport along a closedpath does not change any vectors. However, in a curved man-ifold, this is no longer true; e.g., on a spherical surface, a par-allel transport of a vector along a spherical triangle will gen-erally rotate the vector. The curvature of a manifold measuresthe failure of parallel transport around a closed curve to pre-serve vectors.In general, a vector uµ parallelly transported along a closedpath γ will undergo a linear transformation (a rotation and a

158

Page 167: Winitzki. Advanced General Relativity

A.5 Curvature

dilation). We may describe this transformation by a matrix

T µν (γ), so that the new vector is expressed as uµ = T µν (γ)uν .In flat space, parallel transport along a closed path will notchange any vectors (even in curved coordinates!) and thus

T µν (γ) ≡ δµν . In the general case, the dependence of Tµν (γ) on

the path γ is complicated but can be simplified if we considerinfinitesimally small closed paths. A simple example of such apath is a parallelogram spanned by two vectors aµ and bµwithsides εaµ and εbµ. One can compute the change in the vectoruµ after a parallel transport along the parallelogram (see thecalculation below). The result is

uµ = uµ + ε2Rµλαβuλaαbβ,

where Rµαβλ is called the Riemann tensor (also called the cur-vature tensor), which is given by the formula

Rµλαβ = ∂βΓµνα − ∂αΓµνβ + ΓµβλΓ

λαν − ΓµαλΓ

λνβ (A.11)

Note that the Christoffel symbol involves first derivatives ofthe metric gµν , and thus the Riemann tensor involves secondderivatives of the metric.

One also defines the Ricci tensor, Rµν ≡ Rαµαν , and the cur-vature scalarR ≡ gµνRµν . These quantities are convenient forwriting the equation for the gravitational field (the Einsteinequation).

A.5.3 ∗Expressing Riemann tensor through Γλαβ

In this section we give a detailed derivation of Eq. (A.11).

We will compute the matrix T µν that performs the paralleltransport of a vector uµ along a closed parallelogram withvertices xµ(0), x

µ(1) ≡ xµ(0) + εaµ, xµ(2) ≡ xµ(0) + ε(aµ + bµ), and

xµ(3) ≡ xµ(0) + εbµ, where xµ(0) is an arbitrary point in space-

time and aµ, bµ are arbitrary vectors. The matrix T µν describesthe change in vectors as a result of the parallel transport as

uµ = T µν uν .

The calculations are performed in an arbitrary coordinatesystem covering the entire parallelogram. For brevity, weomit the indices on coordinates of points and write simplyx(0), x(0) + εa, etc., instead of xµ(0), x

µ(0) + εaµ, etc. Let us de-

note by T µν (x; εa) the parallel transport matrix between pointsx and x+εa. Then the total transport matrix will be expressedas the (matrix) product of the transport matrices computed forthe four individual segments:

T µν = T µα (x(3);−εb)Tαβ (x(2);−εa)T βγ (x(1); εb)Tγν (x(0); εa).

Since we are only interested in considering an infinitesimallysmall parallelogram, the required value of ε will be small, sowe need to find T µν only up to a certain order in ε. In thecourse of the calculation it will become clear that only termsup to and including ε2 are relevant.

We begin by considering the parallel transport along thesegment x(0) → x(0) + εa. The parallel-transported vectorT µν (x(0); εa)u

ν can be thought of as a function of ε, which wetemporarily denote by uµ(ε). By construction, the derivativeaµ∂µ is equal to d/dε, and therefore the vector u

µ(ε) satisfiesthe differential equation

0 = aµ∇µuν =

duν(ε)

dε+ aµuα(ε)Γναµ(x(0) + εa).

Here we explicitly indicated the point, x(0) + εa, where theChristoffel symbol Γ is evaluated. Hence, the parallel trans-port operator T µν (x(0); εa) satisfies the equation

d

dεT µν (x(0); εa) + aλΓµαλ(x(0) + εa)Tαν (x(0); εa) = 0, (A.12)

with the initial condition T µν |ε=0 = δµν .It is easy to find an approximate solution of Eq. (A.12) tofirst order in ε,

T µν (x(0); εa) = δµν − εaλΓµνλ(x(0)) +O(ε2), (A.13)

where Γµνλ may be computed at x(0) since the variation of Γacross the parallelogram is of order ε and we are disregardingε2. The other transport operators (T µν (x(1); εb), etc.) are foundby the same method. Combining the four operators, we ob-tain

T µν = δµν − ε(aλ + bλ − aλ − bλ

)Γµνλ(x(0)) +O(ε2)

= δµν +O(ε2).

It is clear that the current precision is insufficient to describethe change in vectors due to parallel transport. Thus, we needto compute second-order terms.Now it is necessary to take into account the variation of

Γ across the parallelogram. To this end, we expand Γ inEq. (A.12) as

Γµαλ(x(0) + εa) = Γµαλ(x(0)) + εaβ∂βΓµαλ(x(0)). (A.14)

Since Γ is alwaysmultiplied by ε, it is sufficient to retain termsof order ε in Eq. (A.14). Let us also compress the notation byintroducing the matrix Γ(a) ≡ Γµνλa

λ and omitting the indicesin matrix multiplications. Then Eq. (A.12) with the substitu-tion (A.14) becomes

d

dεT +

(Γ(a) + εaβ∂βΓ(a)

)∣∣x(0)

T +O(ε2) = 0, (A.15)

while the first-order solution (A.13) is written as

T (x(0); εa) = 1 − εΓ(a) +O(ε2),

where 1 ≡ δµν is the identity matrix. For the present purposes,it suffices to obtain a solution of Eq. (A.15) to second order inε. Therefore we consider the ansatz

T (x(0); εa) = 1 − εΓ(a)|x(0)+ ε2U, (A.16)

where the unknown matrix U ≡ Uµν is to be found, whilethe first-order term is copied from Eq. (A.13). SubstitutingEq. (A.16) into Eq. (A.15), we find

Uµν =1

2

(Γ(a)Γ(a) − aλ∂λΓ(a)

)∣∣x(0)

.

The second-order solution is then

T (x(0); εa) = 1 +

[

−εΓ(a) − ε2

2

(Γ(a)Γ(a) − aλ∂λΓ(a)

)]

x(0)

.

Note that the second-order terms,

Γ(b)Γ(b) − bλ∂λΓ(b),

may be evaluated at any point, say at x(0), because the varia-tion of Γ across the parallelogram is of order ε. Therefore, wemay simplify the above expression to

T (x(0); εa) = 1 − εΓ(a)|x(0)+ε2

2

(Γ(a)Γ(a) − aλ∂λΓ(a)

).

159

Page 168: Winitzki. Advanced General Relativity

A Elements of Special and General Relativity

By the same method, we compute the matrix describingparallel transport from x(1) to x(2),

T (x(1); εb) = 1 − εΓ(b)|x(1)+ε2

2

(Γ(b)Γ(b) − bλ∂λΓ(b)

),

and all the other parallel transport matrices.Finally, we have to multiply the four transport matrices,preserving the order of matrix multiplications and discardingany terms of order ε3 or higher. Within first-order terms, weneed to expand Γ to first order in ε, e.g.

Γ(b)|x(1)= Γ(b)|x(0)

+ εaλ ∂λΓ(b)|x(0),

so that

Γ(a)|x(2)− Γ(a)|x(0)

= ε(aλ + bλ

)∂λΓ(a)|x(0)

+O(ε2),

Γ(b)|x(3)− Γ(b)|x(1)

= ε(−aλ + bλ

)∂λΓ(b)|x(0)

+O(ε2).

Within second-order terms, we may evaluate all Γ’s at x(0).After a straightforward calculation, we obtain

T = T (x(3);−εb)T (x(2);−εa)T (x(1); εb)T (x(0); εa)

= 1 − εΓ(a)|x(0)− εΓ(b)|x(1)

+ εΓ(a)|x(2)+ εΓ(b)|x(3)

+ε2(Γ(b)Γ(a) − Γ(a)Γ(b) − aλ∂λΓ(a) − bλ∂λΓ(b)

)

= 1 + ε2(Γ(b)Γ(a) − Γ(a)Γ(b) + bλ∂λΓ(a) − aλ∂λΓ(b)

).

Restoring the full index notation, we can write the result as

T µν = δµν + ε2Rµναβaαbβ,

Rµναβ ≡ ∂βΓµνα − ∂αΓµνβ + ΓµβλΓ

λαν − ΓµαλΓ

λνβ .

The last line coincides with Eq. (A.11). This calculation showsthat the Riemann tensor Rµναβ describes the effect of an in-finitesimal parallel transport along a closed curve on vectors.

A.6 Covariant integration

A.6.1 Determinant of the metric

An important object in GR is the determinant of the metric,usually denoted by g,

g ≡ det (gµν) .

It follows that the determinant of the contravariant metric isg−1 = det (gµν). Note that the function g is a determinantof a (0,2)-tensor gµν which not a scalar function but dependsnontrivially on the coordinate system. For instance, given twocoordinate systems xµ and xµ, the corresponding compo-nents gµν and gµν of the metric are related by

gµνdxµdxν = gµνdx

µdxν ,

which can be rewritten in the matrix notation (temporarilyusing capital letters to denote the matrices G ≡ gµν andS ≡ ∂xµ/∂xν) as

G = ST GS.

Since detAT = detA for any matrix A, we find

det gµνdet gµν

=

(

det∂xµ

∂xν

)2

. (A.17)

It is clear that the metric determinant is a coordinate system-dependent quantity.Note that a physical metric has the signature (+ −−−), soits determinant is always negative.

A.6.2 Covariant volume element

The determinant of the metric is used most frequently to ex-press integration over the manifold in a covariant way, suchthat functions can be integrated in arbitrary coordinate sys-tems.Suppose we would like to integrate a scalar function f(x)over a region of the spacetime manifold. The ordinary for-mula

∫d4x f(x) is good in flat (Cartesian) coordinates but un-

satisfactory in curved coordinates. Namely, a different choiceof coordinates, xµ, will generally yield a different result,

d4x f(x) =

d4x f(x) det

[∂xµ

∂xν

]

6=∫

d4x f(x),

unless the Jacobian happens to be equal to 1, det [∂xµ/∂xν ] =1, which is certainly a very special case. The problem is that

we have a formula,∫d4x f(x), that works only in flat coor-

dinates. The solution is to write all the integrals using thecovariant volume element,

d4V ≡ d4x√

−g(x),

instead of d4x. Because of the transformation law (A.17), it isclear that the covariant formula for the integral of f ,

d4x√

−g(x)f(x),

yields the same answer in every coordinate system, including

flat coordinate systems where√

−g(x) ≡ 1 (if such coordinatesystems exist).

A.6.3 Derivative of the determinant

It is often necessary to compute derivatives of the metric de-terminant, for instance ∂µ

√−g. Such derivatives can be easilyexpressed through gµν and g.In order to derive this expression, we use a general formulafor the derivative of the determinant of a (nondegenerate) ma-trix:

d

dt[detA(t)] = Tr

(

A−1 d

dtA(t)

)

detA(t). (A.18)

This formula can be found quickly from the matrix identity

det eX = eTrX ,

whereX is an arbitrary matrix, after a substitution A ≡ eX .Using Eq. (A.18), we find, in particular,

∂µg =[gαβ∂µgαβ

]g,

∂µ√−g =

[gαβ∂µgαβ

]√−g

2. (A.19)

A.6.4 Covariant divergence

Divergence of a vector field,

div a ≡ ∇µaµ ≡ aµ;µ,

is an important special case where the covariant formula forthe derivative is simpler than in the general form.Using Eqs. (A.7)-(A.8), we find

∇µaµ = ∂µa

µ + Γµµνaν ;

Γµµν =1

2gµα (gαµ,ν + gαν,µ − gµν,α) =

1

2gµαgµα,ν .

160

Page 169: Winitzki. Advanced General Relativity

A.7 Einstein’s equation

Comparing the right-hand side in the last line with Eq. (A.19),we obtain

Γµµν =1√−g∂ν

√−g

and hence

∇µaµ = ∂µa

µ +1√−ga

ν∂µ√−g =

1√−g∂ν[√−gaν

]. (A.20)

This simple formula is useful because it does not require com-puting the full set of Christoffel symbols.

A.6.5 Integration by parts

In field theory, one often needs to integrate by parts using theGauss theorem for volume integrals, for instance,

V

d4x∂µuµ =

Σ

d3Aµuµ, (A.21)

where V is a 4-dimensional region where a vector field uµ is

defined, Σ is the boundary 3-surface of V , and d3Aµ is a di-rected 3-area element of the hypersurface Σ. However, theidentity (A.21) is valid only if we use the same fixed coordi-nate system at both sides of the equation. We would like tohave a “covariant” formula such that both sides of the equa-tion contain only “covariant” quantities that remain the samein any coordinate system.To derive such a formula, we guess (heuristically) that theordinary derivative ∂µu

µ must be replaced by the covariant

derivative ∇µuµ, and the volume element d4x by the covari-

ant volume element d4x√−g. In flat space, these replace-

ments do not modify the original expressions. Let us nowverify that this replacement indeed yields the correct results.The left-hand side of Eq. (A.21) is replaced by

V

d4x∂µuµ →

V

d4x√−g∇µu

µ.

This expression stays the same in every coordinate system.Using Eq. (A.20), we find

V

d4x√−g∇µu

µ =

V

d4x∂µ[√−guµ

].

The above equation is valid in any coordinate system; choos-ing some coordinate system, wemay use Eq. (A.21) and obtain(still in the same fixed coordinate system)

V

d4x∂µ[√−guµ

]=

Σ

d3Aµ√−guµ.

The expression on the right-hand side is interpreted as the in-

tegral of uµ over the covariant area element√−gd3Aµ. This

expression is the same in every coordinate system. Therefore,the “covariant Gauss theorem” is written as

V

d4x√−g∇µu

µ =

Σ

d3Aµ√−guµ.

A.7 Einstein’s equation

We have seen the equation of motion (A.10) of a point massin a curved spacetime. The influence of gravity on a massivebody is described by the presence of the covariant derivativein the equation of motion, rather than by a “force” of gravity

appearing in the right-hand side. When solving Eq. (A.10),the metric gµν (and thus the Christoffel symbol Γ

λαβ) are con-

sidered known.Einstein’s equation describes the influence of matter on themetric gµν :

Rµν −1

2Rgµν = −8πGTµν ,

where Rµν is the Ricci tensor, R is the curvature scalar,G is Newton’s constant, and Tµν is the combined energy-momentum tensor of all matter.

161

Page 170: Winitzki. Advanced General Relativity
Page 171: Winitzki. Advanced General Relativity

B How not to learn tensor calculus

This brief chapter is intended as a consolation to those stu-dents who have difficulty learning tensor calculus from someGR textbooks. Almost every explanation here (except for triv-ial statements) is flawed in one way or another, even thoughall the equations are formally correct. Because of this, a be-ginning student might become thoroughly confused and frus-trated when trying to understand this material. However, thefault is with the explanations and not with the student! Betterexplanations are given elsewhere in this book.

B.1 Tensor algebra

Letters with Greek indices, such as Aα or Bλµν , denote arrays

of numbers, and indices run over 0, ..., N − 1, where N is thedimension of space. For brevity, we shall always use the Ein-stein summation convention: we omit the sign

∑but always

imply summation when an index is repeated in an expression,which must occur once as an upper index and once as a lowerindex. For example, we write

AαBαCβγD

β ≡N∑

α=1

N∑

β=1

AαBαCβγD

β .

The Levi-Civita symbol is defined as ε123...N = 1 andεα...λµ....ρ = −εα...µλ....ρ, i.e. εα...ρ is totally antisymmetric inall the indices. For example, if N = 3 then ε123 = ε231 =ε321 = 1, ε132 = ε213 = ε312 = −1, and all the other compo-nents of εαβγ are equal to zero.The Kronecker symbol δαβ is defined as δαβ = 1 if α = βand δαβ = 0 if α 6= β. This symbol represents an identity ma-trix of dimensions N ×N . Sometimes, the Kronecker symbolis also written as δαβ .It is easy to see that

εα1...αNεα1...αN = N !,

ελα1...αN−1εµα1...αN−1 = (N − 1)!δµλ .

The determinantdetAαβ of a squarematrixAαβ is the num-ber defined by

detAαβ =1

N !εα1...αN

εβN ...βNAα1β1 ...AαNβN ,

where the right-hand side containsN factors of Aαβ . It is wellknown that the determinant of the product of two matrices isequal to the product of their determinants.An N -dimensional contravariant vector vµ is an array ofquantities

(v1, ..., vN

)called components that transform un-

der a change of basis according to the formula

vµ = vν(S−1

ν,

where Sµν is the matrix describing the change of basis,

eµj → eµj = Skj eµk , (B.1)

and S−1 is the inverse matrix. Here, a basis is a set of N con-travariant vectors eµ1 , ..., eµN such that the determinant of the

N × N matrix eµj is nonzero. Such a matrix is called nonde-generate. Since the matrix eµj is nondegenerate, any vector v

µ

can be uniquely decomposed as a linear combination of basisvectors:

vµ = vjeµj .

This is why the N numbers vµ (µ = 1, ..., N ) are called thecomponents of the vector vµ. By convention, a superscript in-dex is used for contravariant vectors.A metric is a nondegenerate symmetric matrix gµν = gνµ,

det gµν 6= 0. The scalar product of vectors uµ and vµ is equalto gµνu

µvν and can be written for brevity as

gµνuµvν ≡ uµvµ,

where the quantities vµ are called the covariant componentsof the vector vµ. Covariant components transform under achange of basis as

vµ = Sνµvµ;

that is, they transform in the same way as the basis vectorsdo [Eq. (B.1)]. Such vectors vµ are called covariant vectors orcovectors and are written with a subscript index µ. Then, inevery expression there can be only one pair of repeated in-dices, one subscript and one superscript, in agreement withthe Einstein summation convention. An expression violatingthis rule, such as aµbµc

ν , is written incorrectly, while aµbµcν is

correct.It is clear that the metric gµν can be used to transform con-travariant components into covariant ones, vµ = gµνv

ν . Thisoperation is called lowering the index. Likewise, the inversemetric gµν can be used to raise the index: vµ = gµνvν . Sinceraising an index after that index has been lowered should re-sult in the same vector, it follows that

gµνgλν = δλµ.

A tensor is a set of components with several indices, suchas Aαβγ , that transforms under a change of basis (B.1) as a suit-able product of components of covariant and contravariantvectors. For example, a tensor Aαβγ transforms as the prod-

uct uαvβwγ , i.e. according to the formula

Aαβγ = Sνγ(S−1

λ

(S−1

µAλµν .

This tensor is said to have rank (2,1). Tensors of any rank(m,n) are defined in this way; contravariant and covariantvectors are tensors of rank (1,0) and (0,1) respectively. It isclear that only tensors of equal rank can be added; for exam-ple, the expression vα+uα does not transform as a vector andis therefore incorrect.Note that uαvβ is not the same tensor as uβvα, but of course

vαvβ = vβvα and uαvβ = vβuα.

B.2 Tensor calculus

So far we considered vectors and tensors in Euclidean coordi-nates, but nowwe need to consider arbitrary, curvilinear coor-dinates xµ. Note that xµ itself is not a vector anymore because

163

Page 172: Winitzki. Advanced General Relativity

B How not to learn tensor calculus

the coordinates are not rectangular. However, an infinitesimaldisplacement δxµ is a vector. To verify this, we consider an ar-bitrary change of coordinates described by arbitrary functionsxµ(xν),

xµ → xµ = xµ(xν).

Then the displacement δxµ transforms as

δxµ =∂xµ

∂xνδxν ,

which is the correct transformation law for a vector under achange of basis (B.1) with the matrix Sµν = ∂xµ/∂xν . There-fore, δxµ is an example of a contravariant vector field, thatis, a set of components vµ(xν) depending on the coordinatesxν that transform under an arbitrary change of coordinates ascomponents of a contravariant vector,

vµ =∂xµ

∂xνvν .

A covariant vector field transforms as components of a co-variant vector. Analogously, a tensor field of rank (p, q) isdefined as a set of components Tα1...αp

β1...βqthat transform

under a change of coordinates as

Tα1...αpβ1...βq

=∂xα1

∂xλ1...∂xαp

∂xλp

∂xµ1

∂xβ1...∂xµq

∂xβqT λ1...λp

µ1...µq.

An important operation of tensor calculus is the covariantderivative. First, observe that a partial derivative of a con-travariant vector field, vµ,ν ≡ ∂vµ/∂xν does not transform asa tensor of rank (1,1). To get a derivative that transforms cor-rectly, one adds a suitable “correction” term and thus definesthe covariant derivative

vµ;ν ≡ ∂vµ

∂xν+ Γµανv

α, (B.2)

where the set of quantities Γµαβ is called the Christoffel sym-bol. It can be verified by a direct calculation that the covariantderivative vµ;ν defined by Eq. (B.2) transforms as a tensor ofrank (1,1) as long as the Christoffel symbol transforms as

Γλαβ = Γµνρ∂xν

∂xα∂xρ

∂xβ∂xλ

∂xµ+

∂2xλ

∂xν∂xρ∂xν

∂xα∂xρ

∂xβ. (B.3)

Therefore, we require that the transformation law (B.3) holdsfor the Christoffel symbol. Note that Γλαβ is not a tensor,and a suitable change of coordinates xµ → xµ can makeΓλαβ = 0 at any point. Such coordinates are called an inertial

coordinate system at that point. However, Γλαβ −Γλβα is a ten-sor because the second term in Eq. (B.3) drops out when wewrite the transformation law for Γλαβ − Γλβα. Since the tensor

Γλαβ − Γλβα vanishes in a locally inertial coordinate system, it

vanishes in all coordinate systems, hence Γλαβ is symmetric in(α, β).As shown in Eq. (B.2), the covariant derivative is denotedby a semicolon and an index. The covariant derivative of acovariant vector is defined similarly by

vµ;ν =∂vµ∂xν

− Γλµνvλ.

Then it can be shown that the covariant derivative of the scalarproduct aµb

µ coincides with the ordinary derivative, i.e.

(aµbµ);α = aµ;αb

µ + aµbµ;α =

∂xα(aµb

µ) .

This is, of course, to be expected for a scalar quantity aµbµ.

Since tensors are quantities that transform as products ofvectors and covectors, the covariant derivative can be definedon arbitrary tensors in a similar way. For each upper indexthere is a term with +Γλαβ , and for each lower index a term

with −Γλαβ . For example,

Aαβγ ;µ =∂

∂xµAαβγ + ΓαµνA

νβγ + ΓβµνA

ανγ − ΓνγµA

αβν .

Then one can prove that the Leibnitz rule holds for arbitrarytensors,

(Aαβ...Bγδ...

)

;µ= Aαβ...;µB

γδ... +Aαβ...Bγδ...;µ.

In General Relativity, one uses the Christoffel symbol of theform

Γµαν =1

2gµβ (gαβ,ν + gνβ,α − gαν,β) . (B.4)

This expression can be derived from the property Γλαβ = Γλβα.To derive Eq. (B.4), let us first prove that gµν;α = 0. For anyvectorXβ , we have gαβX

β = Xα, hence

gαβ(Aβ)

;λ= (Aα);λ (B.5)

and therefore

(gαβA

β)

;λ= gαβ;λA

β + gαβ(Aβ)

⇒ gαβ;λAβ = 0.

The required property gαβ;λ = 0 follows since Aβ is an arbi-trary vector. Then it is a matter of “juggling the indices” toderive Eq. (B.4). Namely, we write

0 = gαβ;λ = gαβ,λ − Γµαλgβµ − Γµβλgαµ,

then exchange indices α, β, λ to get

0 = gαλ,β − Γµαβgλµ − Γµβλgαµ,

0 = gλβ,α − Γµαλgβµ − Γµβαgλµ.

Adding the above two equations and subtracting the initialone, we obtain Eq. (B.4).The Riemann tensor Rκλρσ is defined as the “covariantderivative” of the Christoffel symbol as follows,

Rκλρσ = ∂ρΓκλσ − ∂σΓ

κρλ + ΓκνρΓ

νσλ − ΓκνσΓ

νλρ.

It is easy to check that this formula defines a tensor (despiteΓλαβ not being a tensor), because one can notice that

Rκλρσuσ = ∇ρ∇λu

κ −∇λ∇ρuκ, (B.6)

which shows that Rκλρσ is manifestly a tensor. In flat space,∇λ ≡ ∂λ and therefore R

κλρσ = 0 because ∂λ∂µ = ∂µ∂λ.

B.3 Hints

In this section, I give you some hints as to why the precedingexplanations are flawed.The Levi-Civita symbol ε is not merely a collection of num-bers 0, 1, and −1, arranged in a special way, but is inter-preted as an antisymmetric tensor or a volume N -form. TheKronecker symbol δαβ is the matrix representing an identity

164

Page 173: Winitzki. Advanced General Relativity

B.3 Hints

transformation (in any basis), while the matrix δαβ represents(in an orthogonal basis) a bilinear form that may represent ascalar product in a Euclidean space. Determinants of matri-ces are not simply some complicated combinations of matrixelements. Determinants have a direct geometric meaning; forinstance, the oriented volume of the image of the unit cubeunder a linear transformation T is equal to detT .Vectors are most clearly seen as geometric quantities — ele-ments of vector spaces, rather than collections of componentsthat mysteriously “transform themselves under a change ofbasis.” Bases are maximal sets of linearly independent vec-tors, and the components of a vector are coefficients relative toa chosen basis. It is easier to regard the transformation formu-las for components as consequences of the geometric picturerather than as definitions.Covariant vectors are better seen as vectors from a differentvector space (the dual space), rather than as a “different kindof components” of the same vector. There is no natural mapbetween contravariant and covariant vectors unless a metricis given; and when several metrics are given, there are severalsuch maps. The idea that there exist “covariant components”of vectors can be justified only if a metric is fixed once and forall.The inverse metric gαβ is either defined as the inverse matrixto gαβ , or is derived through the dual basis (in the dual space).A tensor is an element of a tensor space (the space of ten-sor products and their linear combinations). As in the case ofvectors, the components of a tensor with respect to a basis canbe defined after the tensor space is defined. Then the compli-cated transformation law for the components will be a naturalconsequence of the construction, rather than a definition of atensor.An “infinitesimal displacement” δxµ is a heuristic represen-tation of a tangent vector: one imagines that a point p movesby “infinitesimal” amount along a curve which is a flow lineof a given tangent vector (see Sec. 1.2.5). “Covariant” tangentvectors are elements of the dual tangent space.A covariant derivative is not simply an old derivative witha “correction” added to it, but rather the only type of direc-tional derivative that can be meaningfully considered as a ge-ometric object in a curved space. (The coordinate derivative∂vµ/∂xν ≡ vµ,ν does not exist as a geometric object since itdepends on the coordinate system xµ.) However, there ex-ist infinitely many possible covariant derivatives since thereare infinitely many possible Γλαβ .

The Christoffel symbol Γλαβ cannot be set to zero by a

change of coordinates unless T λαβ ≡ Γλαβ − Γλβα = 0 (torsion-

freeness) already holds. If the torsion tensor T λαβ is nonzero,locally inertial coordinate systems do not exist; it is a physicalassumption that they exist, rather than a mathematical prop-erty.Covariant derivative is defined on arbitrary tensors by therequirements that the Leibnitz rule hold and that ∇ coincidewith ∂ on scalar functions. These properties cannot be derivedfrom the covariant transformation alone. The requirement ofcorrect transformation of components is not sufficient to de-fine covariant derivatives of arbitrary tensors, e.g. Aαβγ

µ, be-cause one could in principle choose a different Γλαβ for tensorsof each different rank. One needs extra information to de-rive the fact that the same Γλαβ is used in covariant derivativesof tensors of every rank. This extra information comes fromassuming the Leibnitz rule and the property ∇µf = ∂µf forscalar functions f .

The assumption gαβ;λ = 0 is a separate physical assumptionthat cannot be actually derived from general properties of vec-tors; the argument given above assumes gαβ;λ = 0 tacitly inEq. (B.5). The formula (B.4) indeed follows from Γλαβ = Γλβαand gαβ;λ = 0.The Riemann tensor cannot be seen as a covariant deriva-tive of Γλαβ because Γλαβ is not a tensor, and in any case, thegiven formula does not represent a covariant derivative of athird-rank tensor. The property (B.6) is ordinarily used as adefinition of Rκλρσ .

165

Page 174: Winitzki. Advanced General Relativity
Page 175: Winitzki. Advanced General Relativity

C Calculations and proofs

C.1 For Chapter 1

Proof of Statement 1.2.2.1 on page 5: Suppose such asmooth map exists and maps the north pole 0, 0, 1 into apoint p0 ∈ R2. Consider a very small circle around the northpole. Since the chart provides a smooth one-to-one map, thiscircle will be mapped into a small “image” circle around p0

in R2. The circle around the north pole can be smoothly con-tracted into a point in two essentially different ways: eitherby contracting it to the north pole, or by shifting it around thesphere towards the south pole and contracting to the southpole, without ever crossing the north pole. These deforma-tions of the circle are, by assumption, smoothly mapped intocorresponding deformations of the image circle in R2. Dur-ing the second deformation, the image circle around p0 in R2

is supposedly contracted into a point without ever crossingp0. But such a deformation is obviously impossible withinR2. This contradicts the assumption that a smooth one-to-onemap S2 → R2 exists.

Proof of Statement 1.2.4.1 on page 8: An analytic functionf(z) is a sum of its Taylor series,

f(z) =

∞∑

m=0

1

m!zm

d(m)f

dzm

∣∣∣∣z=0

.

Since derivations are linear by Eq. (1.2), it is sufficient to provethe statement for f(z) = zm, m = 0, 1, 2, ... By Eq. (1.4), wehave v 1 = 0. It is trivial to check the statement for f(z) = z.Now use induction inm, starting withm = 1, i.e. with f(z) =z. Once the statement is proved for f(z) = zm−1, we have

v (gm−1

)= (m− 1) gm−2v g.

Now we consider f(z) = zm and use Eq. (1.3),

v f(g) = v (ggm−1

)= (v g) gm−1 + gv (gm−1)

= mgm−1v g,thus the statement is proved for f(z) = zm.

Proof of Statement 1.2.10.2 on page 12: It is not necessaryto introduce a local coordinate system around p0. Introducethe points p1 and p2 as in Fig. 1.6. By definition of a tangentvector and its flow lines, we have

(a f)|p0 = limδτ→0

f(p1) − f(p0)

δτ, (C.1)

because the point p1 is obtained by following the flow line ofa for an interval δτ , while the right-hand side of Eq. (C.1) isthe definition of the directional derivative along the flow line.The relation (C.1) holds for every function f , so let us apply itto the function b f instead of f ,

(a (b f))|p0 = limδτ→0

1

δτ(b f(p1) − b f(p0)) . (C.2)

We now need to express the function b f similarly,

(b f)|p1 = limδτ ′→0

f(p) − f(p1)

δτ ′,

(b f)|p0 = limδτ ′→0

f(p2) − f(p0)

δτ ′,

where we introduced a different variable δτ ′ for convenience.Substituting this into Eq. (C.2), we find

(a (b f))|p0 = limδτ → 0δτ ′ → 0

f(p) − f(p1) − f(p2) + f(p0)

δτ δτ ′.

Similarly, using the analogous relation

(a f)|p2 = limδτ→0

f(p′) − f(p2)

δτ,

we find

(b (a f))|p0 = limδτ → 0δτ ′ → 0

f(p′) − f(p2) − f(p1) + f(p0)

δτ δτ ′.

It follows that

([a,b] f)|p0 = limδτ → 0δτ ′ → 0

f(p) − f(p′)

δτ δτ ′,

as required.

Calculation 1.2.11.1 on page 13: Using the family of curvesγ(τ ; s) as a coordinate grid, we may introduce a local coordi-nate system τ, s, t1, ..., tn−2 such that τ and s are the first twocoordinates. Then v = ∂τ and c = ∂s, therefore the vectorsc and v are the first two vectors of the coordinate basis cor-responding to the local coordinate system τ, s, t1, ..., tn−2.Hence, these vectors commute.

Proof of Statement 1.2.11.2 on page 13: We maycomplete the vector v(p0) 6= 0 to an arbitrary basisv(p0), c1(p0), c2(p0), ... at the initial point p0. Then the re-lation [v, cj ] = 0 is a first-order differential equation for cjthat has a unique solution cj(p) along one flow line of v start-ing from the initial point p0, at least in some neighborhood ofp0. To see this explicitly, consider the equation [c,v] = 0 in alocal coordinate system,

α

cα∂vβ

∂xα−∑

α

vα∂cβ

∂xα= 0.

Consider one flow line γ(τ) of v, such that v ≡ γ is the tan-gent vector to the curve γ(τ). For points p on the curve γ, thesecond term above is equal to the derivative of the componentcβ along this flow line,

α

vα∂cβ

∂xα

∣∣∣∣∣p

=d

dτcβ(τ),

assuming that γ(τ) = p. So the equation [c,v] = 0 at point pbecomes

d

dτcβ =

α

cα∂vβ

∂xα,

167

Page 176: Winitzki. Advanced General Relativity

C Calculations and proofs

which is an ordinary differential equation for the unknowncomponents cβ(τ), to be solved with an initial conditioncβ(τ0). This differential equation has a unique solution (byassumption, the vector field v is smooth everywhere). Thuswe can in principle compute the vectors cj(p), where p isany point along one flow line of v. The set of vectorsv(p), c1(p), ..., cn−1(p) is linearly independent at the initialpoint p0 and will remain linearly independent along the flowline at least in some neighborhood of p0. We can prove this bynoting that linear independence of the vectors v, c1, ..., cn−1can be verified by computing the determinant of their compo-nents in a local coordinate system. The vectors form a basis iffthe determinant is nonzero. By assumption, the determinantis nonzero at p0, and the determinant is a smooth functionalong the flow line. Thus the determinant remains nonzeroalong the flow line at least in some neighborhood of p0 (per-haps in a very small neighborhood, but that is all we needpresently).So far we obtained a basis of connecting vectors along oneflow line of v. Since the flow lines of v fill at least a small patchU ⊂ M of the manifoldM around the point p0, the same con-struction gives a basis of connecting vector fields along everyother flow line within U , again perhaps only in a sufficientlysmall neighborhood of p0. Thus we have obtained a basis ofconnecting fields locally, in a neighborhood of p0.

Proof of Statement 1.4.4.1 on page 20: (a) Let e1, ..., en bea basis in Rn; we will show that a1 ∧ ... ∧ an is either equal tozero, or proportional to e1∧ ...∧en with a nonzero coefficient.It will follow that any two n-vectors are proportional.First we show that a1 ∧ ... ∧ an = 0 if the set a1, ...,anis linearly dependent. This follows from the antisymmetry ofthe exterior product: suppose

an =

n−1∑

j=1

λjaj ,

where some λj are nonzero. Since

a1 ∧ ... ∧ an−1 ∧ aj = 0 for j = 1, ..., n− 1,

we have

a1 ∧ ... ∧ an = a1 ∧ ... ∧ an−1 ∧n−1∑

j=1

λjaj = 0.

Now suppose that the set aj is linearly independent. Thenevery basis vector ek can be expressed as a linear combinationof ajwith some coefficients,

ek =∑

j

Akjaj .

Let us now substitute these linear combinations into thenonzero multivector e1 ∧ ... ∧ en, and simplify the resultingexpression using the linearity of ∧. The result will be a sum ofterms of the form aj1 ∧ ... ∧ ajn with some coefficients. Onlythe terms with all different jk will survive. Finally, we canreorder the vectors aj at will (and if necessary change thesign of a1 ∧ ... ∧ an), so eventually we will obtain simplya1 ∧ ... ∧ an with a nonzero coefficient. Therefore, the mul-tivectors a1 ∧ ... ∧ an and e1 ∧ ... ∧ en are proportional.(b) We need to show that the relation between n-dimensional volumes,

λVol (a1, ...,an) = Vol (b1, ...,bn) ,

is equivalent to the relation between multivectors

λa1 ∧ ... ∧ an = b1 ∧ ... ∧ bn.

It is sufficient to consider the case when all the volumes in-volved are nonzero. By part (a), the two multivectors are al-ways proportional with some nonzero coefficient; denote thatcoefficient by λ. It remains to show that the volumes are re-lated by the same factor λ.To show this, we will transform the parallelepiped spannedby aj into the parallelepiped spanned by bj. The trans-formation will be performed through a sequence of stepswhich preserve the relationship between the volume of theparallelepiped and the multivector a1 ∧ ...∧an. Each step willeither multiply both the volume and the multivector by thesame number, or leave them both unchanged. At the end, themultivector a1 ∧ ... ∧ an will be transformed into b1 ∧ ...∧ bn;hence, the volumes are related by the factor λ, which willprove the statement.The allowed steps of the transformation are of three kinds:(i) adding a multiple of aj to ai,

ai → ai + αaj ,

where α 6= 0 is some number; (ii) stretching a vector ai,

ai → αai,

where α 6= 0; (iii) exchanging two vectors, ai with aj . It isclear that the multivector

a1 ∧ ... ∧ an

remains unchanged under (i), is multiplied by α under (ii),and changes sign under (iii). Similarly, it is easy to see thatthe volume of the parallelepiped spanned by aj remainsunchanged under (i) due to the argument of Fig. 1.9, appliedto the two-dimensional plane containing the vectors ai,aj.The volume is multiplied by α under (ii) and changes signunder (iii).By assumption, the initial set aj is linearly independent(otherwise the volume is zero); thus, every vector bk canbe expressed through linear combinations of aj. These lin-ear combinations can be built by a finite sequence of steps (i),(ii), or (iii). In this way, the multivector a1 ∧ ... ∧ an can betransformed into b1 ∧ ... ∧ bn. This concludes the proof.

Proof of Statement 1.4.5.1 on page 20: After State-ment 1.4.4.1, we only need to prove the relationship betweenthe multivectors,

Tv1 ∧ Tv2 ∧ ... ∧ Tvn = (v1 ∧ v2 ∧ ... ∧ vn) det T .

Since any two n-vectors are proportional in an n-dimensional

space, it remains to prove that the proportionality factor det Tis actually independent of the choice of the vectors vj. (Itwill follow that det T coincides with the determinant definedthrough an orthogonal basis.)Let us first give the proof for n = 2 and then generalize toany dimension n. In two dimensions, we have

Tv1 ∧ Tv2 = (v1 ∧ v2) det T . (C.3)

The bivector v1 ∧ v2 is zero if v1 is parallel to v2; in this case,the statement is trivial, so let us consider the case v1 ∧ v2 6= 0.

168

Page 177: Winitzki. Advanced General Relativity

C.1 For Chapter 1

Then the vectors v1,v2 are a basis in the plane. We wouldlike to show that

Tu1 ∧ Tu2 = (u1 ∧ u2) det T (C.4)

for any vectors u1,u2. Since v1,v2 is a basis, we can decom-pose

u1 = U11v1 + U12v2, u2 = U21v1 + U22v2.

Since T is a linear map, we also have a similar decomposition

Tu1 = U11Tv1 + U12Tv2, Tu2 = U21Tv1 + U22Tv2.

Then we compute

u1 ∧ u2 = (U11v1 + U12v2) ∧ (U21v1 + U22v2)

= (U11U22 − U12U21)v1 ∧ v2,

Tu1 ∧ Tu2 =(

U11Tv1 + U12Tv2

)

∧(

U21Tv1 + U22Tv2

)

= (U11U22 − U12U21) Tv1 ∧ Tv2.

Therefore, the relationship (C.4) simply follows from Eq. (C.3)through multiplication by the factor U11U22 − U12U21.In n dimensions, it is sufficient to consider the case when

v1 ∧ ... ∧ vn 6= 0, i.e. the set vj is a basis. We need to showthat

Tu1 ∧ ... ∧ Tun = (u1 ∧ ... ∧ un) det T

for any other n-vector u1 ∧ ... ∧ un. Since every uj can beexpressed as a linear combination of vj,

uj =∑

k

Ujkvk,

we may substitute these linear combinations into u1 ∧ ... ∧ un

and also into Tu1 ∧ ... ∧ Tun. Since T is a linear map,

Tuj =∑

k

UjkTvk,

so the substitution of Tuj =∑

k UjkTvk into Tu1 ∧ ... ∧ Tunyields terms

Tv1 ∧ ...Tvn

with the same coefficients as the substitution of uj =∑

k Ujkvk into u1 ∧ ... ∧ un. Therefore the desired relation-ship for uj follows.

Calculation 1.8.4.1 on page 47: Denote by ∇ and∇ the Levi-Civita connections corresponding to the metrics g and g, and

analogously the Riemann tensors R(...) andR(...). It is conve-nient to consider the fully covariant Riemann tensor,

R(a,b, c,d) = g(

[∇a, ∇b]c − ∇[a,b]c,d)

.

Since the derivatives of the arbitrary vectors a,b, c,d willeventually cancel, we can simplify the calculations if we as-sume that all these derivatives vanish, so ∇ab = 0, [a,b] = 0,etc., at a point pwhere we are computing the Riemann tensor.

(But note that ∇ab 6= 0, etc.) Since [a,b] = 0, we have

e−2λR(a,b, c,d) = g(

∇a∇bc − ∇b∇ac,d)

,

R(a,b, c,d) = g (∇a∇bc −∇b∇ac,d) ,

We would like to transform the expression R(a,b, c,d) tosomething containing R(a,b, c,d) plus extra terms. It is clear

that we need to express ∇ everywhere through ∇. We maywork as follows. The only way to express ∇ through ∇ is byenclosing them in the metric g and using Eq. (1.45). The dif-

ference between the connections ∇ and∇ is a transformation-valued 1-form Γ, which is expressed as

Γ(a)b ≡ ∇ab −∇ab,

g(Γ(a)b, c) = (a λ) g(b, c) + (b λ) g(a, c)− (c λ) g(a,b).

We would like to write Γmore explicitly, without enclosing itin g. To this end, we introduce an auxiliary (but fixed) vectorfield l ≡ g−1dλ. Then x λ ≡ g(x, l) for every vector x. Using

the field l, we can rewrite Γmore concisely,

Γ(a)b = g(a, l)b + g(b, l)a− g(a,b)l.

Now we consider the first term in the new Riemann tensor,

∇a∇bc =(

∇a + Γ(a))(

∇b + Γ(b))

c

= ∇a∇bc + ∇aΓ(b)c + Γ(a)Γ(b)c,

where we omitted terms containing first derivatives of b andc, since by assumption these derivatives vanish. The deriva-

tive∇aΓ can be computed as follows,

∇aΓ(b)c = g(b,∇al)c + g(c,∇al)b− g(b, c)∇al,

again omitting first derivatives of b, c. Finally, we simplifythe last term,

Γ(a)Γ(b)c = g(a, l)Γ(b)c + g(Γ(b)c, l)a − g(a, Γ(b)c)l

= g(a, l) (g(b, l)c + g(c, l)b− g(b, c)l)

+ (g(b, l)g(c, l) + g(c, l)g(b, l) − g(b, c)g(l, l))a

− (g(b, l)g(a, c) + g(c, l)g(a,b) − g(b, c)g(a, l)) l

= (2g(b, l)g(c, l) − g(b, c)g(l, l))a + g(a, l)g(b, l)c

+ g(a, l)g(c, l)b− (g(b, l)g(a, c) + g(c, l)g(a,b)) l.

Now we need to antisymmetrize the resulting expression for

∇a∇bc in a,b. Note that

g(b,∇al) = ∇ag(b, l) = a (b λ) ,

and also [a,b] = 0 by assumption, therefore g(b,∇al) is asymmetric bilinear form in a,b. This bilinear form is calledthe Hessian of the function λ; it is a tensor representing allthe (covariant) second derivatives of λ. Let us denote this ten-sor by

Hλ(a,b) = Hλ(b,a) ≡ g(∇al,b);

in the index notation this would be λ;αβ . Then

∇aΓ(b)c = Hλ(a,b)c +Hλ(a, c)b − g(b, c)∇al,

so the antisymmetrization of∇aΓ(b)c in a,b cancelsHλ(a,b)and yields

∇aΓ(b)c −∇bΓ(a)c = Hλ(a, c)b − g(b, c)∇al

−Hλ(b, c)a + g(a, c)∇bl.

169

Page 178: Winitzki. Advanced General Relativity

C Calculations and proofs

The antisymmetrization of Γ(a)Γ(b)c gives (after some can-cellations)

Γ(a)Γ(b)c − Γ(b)Γ(a)c = g(l, l) (g(a, c)b − g(b, c)a)

+ g(a, l) (g(b, c)l − g(c, l)b) + g(b, l) (g(c, l)a − g(a, c)l) .

Finally, putting the pieces together, we express the new Rie-mann tensor as follows,1

e−2λR(a,b, c,d) = R(a,b, c,d)

+Hλ(a, c)g(b,d) −Hλ(a,d)g(b, c)

−Hλ(b, c)g(a,d) +Hλ(b,d)g(a, c)

+ g(l, l) (g(a, c)g(b,d) − g(b, c)g(a,d))

+ (g(a, l)g(b, c) − g(b, l)g(a, c)) g(d, l)

+ (g(b, l)g(a,d) − g(a, l)g(b,d)) g(c, l).(C.5)

Note (out of curiosity) that the last six terms of the expressionabove can be rewritten more concisely as the determinant of acertain 3 × 3matrix,

e−2λR(a,b, c,d) = R(a,b, c,d)

+Hλ(a, c)g(b,d) −Hλ(a,d)g(b, c)

−Hλ(b, c)g(a,d) +Hλ(b,d)g(a, c)

+ det

∣∣∣∣∣∣

g(a, c) g(b, c) g(l, c)g(a,d) g(b,d) g(l,d)g(a, l) g(b, l) g(l, l)

∣∣∣∣∣∣

.

Now we can compute the Ricci tensor, assuming that thespacetime has N dimensions. We use the properties

Tr(a,b)g(a,b) = N,

Tr(a,b)Hλ(a,b) ≡ Tr(a,b)g(∇al,b) ≡ div l ≡ λ,

Tr(a,b)g(a,x)F (b, c, ...) = F (x, c, ...).

The trace Tr is always performedwith respect to the oldmetricg, so the new Ricci tensor is e−2λ times the trace Tr of the newRiemann tensor. In this way we derive the required expres-

sions for Ric and R. First we use Eq. (C.5) to obtain

Ric (a, c) = e−2λTr(b,d)R(a,b, c,d)

= Ric (a, c)

+ (N − 2)Hλ(a, c) + g(a, c)λ

+ g(l, l) (g(a, c)N − g(a, c))

+ (g(a, l)g(l, c) − g(l, l)g(a, c))

+ (g(a, l) − g(a, l)N) g(c, l)

= Ric (a, c) + (N − 2)Hλ(a, c) + g(a, c)λ

+ (N − 2) [g(l, l)g(a, c) − g(a, l)g(c, l)] .

This is the required formula for the new Ricci tensor; the newRicci scalar is obtained straightforwardly by taking the traceof the new Ricci tensor.

Proof of Statement 1.9.5.1 on page 51: We would like to liftthe restrictions on the vectors v,x,y. For a fixed v, the func-tion R(v,x,v,y) is a symmetric bilinear form of x,y. Thisbilinear form can be determined for spacelike x,y by a finitenumber of values (say, with x and y being vectors of a space-like basis). The requirement that vectors x,y be spacelike and

1Note that the Landau-Lifshitz convention has the opposite sign ofRαβ (butLudvigsen has the same sign).

orthogonal to v is then easily lifted since R(v,x, ...) is linearand antisymmetric in v,x and hence

R(v,x,v,y) = R(v,x + λv,v,y + µv)

for any λ, µ. Thus we can deduce R(v,x,v,y) for arbitraryx,y. Subsequently, we can regard R(v,x,v,y) for fixed x,yas a quadratic form

Qx,y(v,v) ≡ R(v,x,v,y). (C.6)

A standard trick in linear algebra is to recover a symmetricbilinear form A(a,b) if the quadratic form A(v,v) is known:

A(a,b) =A(a + b,a + b) −A(a,a) −A(b,b)

2.

Therefore, we can try to recover a symmetric bilinear formQx,y(a,b) from the quadratic formQx,y(v,v) as follows,

Qx,y(a,b) ≡1

2(R(a + b,x,a + b,y)

−R(a,x,a,y) −R(b,x,b,y)) . (C.7)

To recover Qx,y(a,b) completely, it suffices to know the val-ues of Qx,y(a,b) for arbitrary basis vectors a,b; in total, weneed 10 different values (for fixed x,y). However, we areallowed to measure only Qx,y(v,v) for future-directed andtimelike v. We note that the sum of two such vectors, v1 + v2,is again future-directed and timelike. So we may deduceQx,y(a,b) for any two future-directed, timelike a and b usingEq. (C.7). We also note that a basis in the four-dimensionalspace can be found as a linearly independent set of 4 future-directed, timelike vectors. Such a basis will not be orthogonal,but one does not need an orthogonal basis to determine a bi-linear form. Hence, for fixed x,y the bilinear form Qx,y(a,b)can be determined by finitely many measurements. This pro-cess needs to be repeated for 10 different combinations of x,y.We are thus able to deduce the values Qx,y(a,b) for arbitraryvectors a,b,x,y if we measure R(v,x,v,y) for sufficientlymany future-directed timelike vectors v and spacelike vectorsx,y (each time orthogonal to v).So far we recovered the bilinear form Qx,y(a,b), which issymmetric in a,b and also in x,y. Despite the suggestiveEq. (C.6), we must note that Qx,y(a,b) 6= R(a,x,b,y). Con-sider R(a,x,b,y) for fixed x,y as a bilinear form of a,b. Thisbilinear form is not necessarily symmetric in a,b. In general,this bilinear form has a symmetric part,

1

2(R(a,x,b,y) +R(b,x,a,y)) ,

and an antisymmetric part,

1

2(R(a,x,b,y) −R(b,x,a,y)) .

The quadratic form Qx,y(v,v) is defined through Eq. (C.6)and knows only about the symmetric part of R(...) as shownabove. Therefore, we recover precisely the symmetric partof R(...) when we determine the bilinear form Qx,y(a,b). Inother words,Qx,y(a,b) is related toR(a,x,b,y) through sym-metrization in (a,b),

Qx,y(a,b) =1

2(R(a,x,b,y) +R(b,x,a,y)) . (C.8)

To recover the full Riemann tensor, it remains to extract theantisymmetric part. A property of R not yet used is the first

170

Page 179: Winitzki. Advanced General Relativity

C.2 For Chapter 3

Bianchi identity (1.65), which we apply to the last term inEq. (C.8) and obtain

2Qx,y(a,b) = R(a,x,b,y) −R(x,a,b,y) −R(a,b,x,y)

= −2R(a,x,y,b) +R(a,b,y,x).

We now need to express this somehow through Q. Note thata suitable combination is

2Qb,x(a,y) = R(a,b,y,x) +R(a,x,y,b).

Hence

2Qx,y(a,b) = −3R(a,x,y,b) + 2Qb,x(a,y);

R(a,x,b,y) =2

3(Qx,y(a,b) −Qb,x(a,y)) .

Thus all the values of R(...) can be deduced from the givenexperimental data.

C.2 For Chapter 3

Details for Calculation 3.1.3.3 on page 74: Suppose that thevector field u is the 4-velocity of the stationary observers, so

u = ∂t =1

zk, z = eHt.

Denote by v the 4-velocity of the particle. Then

g(v,v) = g(u,u) = 1.

The relative 3-velocity δ~v is related to the “relativistic gamma-factor” by

γ =1

1 − (δ~v)2≥ 1;

on the other hand, γ = g(u,v). Therefore, we need to computethe change of the “gamma-factor” γ along the worldline of theparticle. We first note that for arbitrary x,y the property

g(∇xk,y) + g(∇yk,x) = (Lkg) (x,y) = 2HeHtg(x,y)

holds. Thus, assuming ∇vv = 0, we have

∇vg(k,v) = g(∇vk,v) = HeHtg(v,v) = HeHt.

However, we need to compute∇vg(u,v), while g(u,v) differsfrom g(k,v) by a function of t only. So we need to computethe derivative

∇vt ≡ v t = g(v, ∂t) = g(v,u) ≡ γ.

Then we can write

∇vγ = ∇vg(u,v) = ∇v

[e−Htg(k,v)

]

=(∇ve

−Ht)eHtγ + e−HtHeHt

=(1 − γ2

)H.

Denote by τ the proper time parameter along the worldline ofthe particle. Then we obtain the differential equation

∇vγ =dγ

dτ=(1 − γ2

)H.

The general solution, γ(τ) = cothH(τ − τ0), exponentiallyapproaches γ = 1 at late times. In the Newtonian limit,

γ ≈ 1 − 1

2(δ~v)2 ,

the equation for γ(τ) becomes

δ~v · ddτδ~v = −H (δ~v)

2.

It is easy to see that a geodesic is a straight line in the three-dimensional space. Thus the “force of friction” acting on theparticle is directed opposite to the velocity δ~v. Then, in theframework of Newton’s second law, the force vector can beexpressed as

~F = md

dτδ~v = −Hδ~v.

Proof of Statement 3.1.3.4 on page 74: We need to compute

dτ≡ ∇vg(u,v) = g(∇v

(z−1k

),v)

= −g(k,v)z−2∇vz + z−1g(∇vk,v)

=g(∇vk,v) − g(u,v)∇vz

z.

Since z ≡√

g(k,k), we have

∇vz =1

2z∇vg(k,k) =

1

zg(∇vk,k).

So it remains to compute the bilinear form g(∇xk,y). If thevector field k is integrable then gk is an exact 1-form and thefirst term the Koszul formula,

g(∇xk,y) =

(1

2dgk +

1

2Lkg

)

(x,y) ,

cancels (here x,y are arbitrary vectors). For an integrable con-formal Killing vector k, we therefore get

g(∇xk,y) =1

2(Lkg) (x,y) = λg(x,y).

Using this property, we compute

dτ=λg(v,v) − γz−1λg(v,k)

z

=(1 − γ2

)λz−1.

Proof of Statement 3.2.1.1 on page 77: (a) Consider a con-gruence of null curves in a two-dimensional spacetime, anddenote by u the corresponding tangent vector field. Sinceg(u,u) = 0, we have

g(u,∇uu) = 0.

In two dimensions, there is only a one-dimensional subspaceof vectors orthogonal to a null vector u, namely the space ofvectors parallel to u itself. Thus ∇uu is parallel to u itself, sothere exists a scalar function µ such that

∇uu = µu.

171

Page 180: Winitzki. Advanced General Relativity

C Calculations and proofs

This means that the flow lines of u are geodesic (but perhapsnot affinely parameterized). A geodesic null vector field canbe found as αu, where α is a scalar function satisfying

∇uα+ µα = 0.

Such a function α always exists since it is specified by a differ-ential equation along the flow lines of u. Since ∇αu(αu) = 0,so the flow lines of αu (which are the same curves as the flowlines of u but perhaps differently parameterized) are geodesiccurves.(b) Radial null geodesics in the flat metric g are lines of ei-ther constant u or constant v (and constant θ, φ). Since the newmetric is

g =1

2e2λ (du⊗ dv + dv ⊗ du) − e2λ

(u− v)2

4dS2,

while dS2 = 0 if θ, φ = const, it is clear that lines of constant uor v (and constant θ, φ) will remain null curves also accordingto the metric g. Since the metric is independent of θ, φ, linesof constant θ, φ will be geodesics if they are geodesics in thetwo-dimensional spacetime with metric

1

2e2λ (du⊗ dv + dv ⊗ du)

and coordinates (u, v) at constant θ, φ. But in a two-dimensional spacetime any null curve is a null geodesic, ac-cording to part (a) of this statement. Thus, we find that theradial null curves are also null geodesics in the metric g.

C.3 For Chapter 6

Proof of Statement 6.1.3.1 on page 128: Let ea be an or-thonormal basis and θa the corresponding dual basis. Sincethe Hodge duality map ω 7→ ∗ω is linear in ω, it suffices toconsider ω of the form ω = χ1 ∧ ... ∧ χn, where χj are some 1-forms. Moreover, it is sufficient to consider the case when allχj are basis 1-forms θ

a. Further, it is sufficient to treat the casewhen χ1 = θ1, ..., χn = θn. Thus we only need to consider the

n-form ω = θ1 ∧ ...∧ θn. Using the definition ε = θ1 ∧ ...∧ θN

and the fact that

ιen...ιe1

(θ1 ∧ ... ∧ θn

)= 1

is (up to permutations) the only nonzero expression of thetype

ιeaιeb...ιec

(θ1 ∧ ... ∧ θn

),

we can compute ∗ ∗ ω explicitly:

∗ω = ∗(θ1 ∧ ... ∧ θn) = η11...ηnn (ιen...ιe1ε)

= η11...ηnnθn+1 ∧ ... ∧ θN ;

∗ ∗ ω = ∗(θn+1 ∧ ... ∧ θN )η11...ηnn

= η11...ηNN(ιeN

...ιen+1ε)

= (det ηab) (−1)(N−n)n ω.

The sign factor (−1)(N−n)n

appears in the identity

ιeN...ιen+1ε = ιeN

...ιen+1

(

θ1 ∧ ...θn ∧ θn+1 ∧ ... ∧ θN)

= (−1)(N−n)n

θ1 ∧ ... ∧ θn

due to the necessity to carry (N − n) vectors eN , ..., en+1through the initial group of n 1-forms

θ1, ...,θn

.

Proof of Statement 6.1.3.2 on page 129: (a)We will use thedefinition of ∗ω through an orthonormal basis ea. We com-pute the N -form

ω1∧∗ω2 =∑

k1,...,kn

ηk1k1 ...ηknkn

n!

(ιekn

...ιek1ω2

)ω1∧

(ιekn

...ιek1ε);

the first task is to show that this expression is symmetric withrespect to the exchange ω1 ↔ ω2. We use Eq. (6.6) to computethe Hodge dual of this N -form,

∗(ω1 ∧ ∗ω2) = (det ηab) (ω1 ∧ ∗ω2) (e1, ..., eN ) .

As an intermediate step, we need to compute

[ω1 ∧

(ιekn

...ιek1ε)]

(e1, ..., eN )

=∑

σ

(−1)|σ|

n! (N − n)!ω1(eσ(1), ..., eσ(n))ε(ek1 , ..., ekn

, eσ(n+1), ..., eσ(N))

(C.9)

In the expression (C.9), the summation is performed over allpermutations σ of the set 1, ..., N. We note that

ε(ek1 , ..., ekn, eσ(n+1), ..., eσ(N)) = 0

unless the set k1, ..., kn, σ(n+ 1), ..., σ(N) is also a permuta-tion of the same kind. It follows that a nonzero contribution ispossible only if the set k1, ..., kn is a permutation of the setσ(1), ..., σ(n). The expression for ∗(ω1∧∗ω2) contains a sumover the indices k1, ..., kn as well as a sum over all permuta-tions σ,

∗ (ω1 ∧ ∗ω2) = (det ηab)∑

σ

k1,...,kn

ηk1k1 ...ηknkn

n!ω2(ek1 , ..., ekn

)

× (−1)|σ|

n! (N − n)!ω1(eσ(1), ..., eσ(n))

× ε(ek1 , ..., ekn, eσ(n+1), ..., eσ(N)). (C.10)

For a given permutation σ, there are n! sets k1, ..., kn thatgive a nonzero contribution because they are permutations ofσ(1), ..., σ(n). The contributions of these n! sets to Eq. (C.10)are all equal, hence the sum over k1, ..., kn can be replacedby an extra factor n!, while the indices k1, ..., kn may be rela-beled as σ(1), ..., σ(n). So we find

∗ (ω1 ∧ ∗ω2) = (det ηab)∑

σ

(−1)|σ|

n! (N − n)!ησ(1)σ(1)...ησ(n)σ(n)

× ω2(eσ(1), ..., eσ(n))ω1(eσ(1), ..., eσ(n))ε(eσ(1), ..., eσ(N)).

The expression above is manifestly symmetric under the ex-change ω1 ↔ ω2. This already proves the first part of thestatement (a), but it is useful to derive a simplified form ofthe expression ∗(ω1 ∧ ∗ω2). We note that

(−1)|σ| ε(eσ(1), ..., eσ(N)) = 1

for any permutation σ, and hence

∗ (ω1 ∧ ∗ω2) = (det ηab)∑

σ

ησ(1)σ(1)...ησ(n)σ(n)

n! (N − n)!

× ω2(eσ(1), ..., eσ(n))ω1(eσ(1), ..., eσ(n)).

172

Page 181: Winitzki. Advanced General Relativity

C.3 For Chapter 6

Since the expression above is independent of the permutationelements σ(n+ 1), ..., σ(N), we may sum over them, intro-ducing an extra factor (N − n)!. Finally, we relabel the indicesσ(1), ..., σ(n) as k1, ..., kn and obtain the formula

∗(ω1 ∧ ∗ω2) =(det ηab)

n!

k1,...,kn

ηk1k1 ...ηknkn

× ω1(ek1 , ..., ekn)ω2(ek1 , ..., ekn

). (C.11)

Using this explicit formula, it is straightforward to show thatthe basis n-forms θa1 ∧ ...∧θan comprise an orthonormal basiswith respect to the scalar product ∗(ω1 ∧ ∗ω2).(b)We rewrite the formula (C.11) without an explicit choiceof basis. We note that a summation over basis vectors ekmultiplied by ηkk is equivalent to taking the trace, e.g.

Tr(a,b)X(a,b) =∑

k

ηkkX(ek, ek).

Hence, Eq. (C.11) becomes

∗ (ω1 ∧ ∗ω2)

=(det ηab)

n!Tr(a1,b1)...(an,bn)ω1(a1, ...,an)ω2(b1, ...,bn).

This is equivalent to Eq. (6.7).(c) Using the explicit formula (6.7), we find

ω1 ∧ ∗ω2 = Vol Tr(a,b)ω1(a)ω2(b).

The trace is computed as

Tr(a,b)ω1(a)ω2(b) = Tr(a,b)g(v1,a)g(v2,b) = g(v1,v2).

(d)We use Eq. (6.7) and obtain

Ω ∧ ∗(ω1 ∧ ω2) =Vol

2!Tr(a1,b1)(a2,b2)Ω(a1,a2) (ω1 ∧ ω2) (b1,b2)

= Vol Tr(a1,b1)(a2,b2)Ω(a1,a2)ω1(b1)ω2(b2)

= Vol Tr(a1,b1)(a2,b2)Ω(a1,a2)g(v1,b1)g(v2,b2)

= VolΩ(v1,v2).

Statement 6.1.3.2 on page 129: Since ω∧∗Ω is a 3-form, thereexists a 1-form φ such that

ω ∧ ∗Ω = ∗φ.

Consider an arbitrary 1-form χ and the expression χ∧ ∗φ. Wewill use Statement 6.1.3.2. On the one hand,

χ ∧ ∗φ = g−1(χ, φ)Vol.

On the other hand,

χ ∧ ∗φ = χ ∧ ω ∧ ∗Ω = Ω ∧ ∗ (χ ∧ ω)

= Ω(g−1χ, g−1ω)Vol = − (ιvιxΩ)Vol,

where x ≡ g−1χ and v ≡ g−1ω. We obtain the relationship

g−1(χ, φ) = Ω(g−1χ, g−1ω)

that holds for all 1-forms χ, or equivalently

ιxφ = −ιxιvΩ

that holds for all vectors x. It follows that φ = −ιvΩ, so ω ∧∗Ω = − ∗ (ιvΩ).

Proof of Statement 6.1.6.1 on page 132: The formula for χa bccan be guessed in the following way. One notes that χa bcshould be a linear combination of Aa bc, with some indicespermuted. Due to the antisymmetry of Aa bc, there are onlythree possible permutations (namely with either a, b, or c asthe first index). Moreover, the antisymmetry of χa bc in (a, b)means that the only possibility is

χa bc = (Aa bc −Ab ac)λ+Ac abµ,

where λ, µ are unknown constants. A direct substitution thenyields λ = −µ = 1

2 .It remains to show that this solution is unique. Supposethere exist two different solutions χa b and χa b. The differenceXa b ≡ χa b − χa b satisfies the homogeneous equations

b

Xa b ∧ θb = 0, Xa b = −Xb a.

The 1-formsXa b can be expanded in the dual frame basis θcas

Xa b =∑

c

Xa bcθc, Xa bc = −Xb ac.

Then we can write

b

Xa b ∧ θb =∑

b,c

Xa bcθb ∧ θc = 0.

It follows that Xa bc = Xa cb, i.e. Xa bc is symmetric in b, c.However, it is impossible that a nonzero quantity Xa bc issymmetric in b, c and at the same time antisymmetric in a, b.Namely,

Xa bc = −Xb ac = −Xb ca = Xc ba = Xc ab = −Xa cb = −Xa bc,

so Xa bc = 0 for any a, b, c. Therefore, the difference betweenχa b and χa b equals zero, i.e. the solution χa b is unique.

Proof of Statement 6.1.6.4 on page 133: The overall shapeof Eq. (6.26) can be guessed up to the values of the numericalcoefficients. First we verify that the given expression for Ba b

solves the required equations, checking the equivalence of thetwo given formulas, and then we investigate the uniquenessof the solution.Let us compute

b θb ∧ Bb a, where Bb a is given byEq. (6.26). Using Statement 6.1.6.2 for n-forms, we find

1

n

b

θb ∧ ιebAa = Aa,

1

n

b

θb ∧ ιeb

(

ιea

c

θc ∧ Ac

)

= ιea

c

θc ∧ Ac

= Aa −∑

c

θc ∧ ιeaAc.

Then it is straightforward to see that Eq. (6.25) holds. Theequivalence with Eq. (6.27) follows by virtue of Eq. (6.24),which also holds for arbitrary n-formsAa.We can demonstrate the non-uniqueness of the solution

Bb a. One may modify a given solution Bb a by adding anarbitrary (n− 1)-formXb a as long as

a

θa ∧ Xb a = 0, Xb a = −Xa b.

173

Page 182: Winitzki. Advanced General Relativity

C Calculations and proofs

Unlike the case n = 2, there exist nontrivial Xb a 6= 0 satisfy-ing these conditions. Note that Xb a is an (n− 1)-form whilen ≥ 3, so we may write for example

Xb a = ξb ∧ ξa ∧ h,

where h is an arbitrary (n− 3)-form and ξb is a 1-form such

that∑

b θb ∧ ξb = 0. An example of such ξb is

ξb ≡∑

c

Fbcθc, Fbc = Fcb,

where Fbc is an arbitrary but symmetric array of coefficients.One can also add several such terms Xb a with different Fbcand h (this is not equivalent to adding one such term). Thusthere is a considerable remaining freedom in selecting the(n− 1)-forms Bb a. Note that this freedom does not exist ifn = 2.

174

Page 183: Winitzki. Advanced General Relativity

D Comments on literature

D.1 Comments on Ludvigsen’s GeneralRelativity

The book is M. Ludvigsen, General relativity: a geometricaproach (Cambridge University Press, 1999). Many explana-tions in that book are outstandingly clear, and I benefitedgreatly by reading it. Nevertheless, there are some minorgaffes:1) On p. 91, eq. 9.27 is supposedly the same as eq. 9.20when“written in full.” However, these equations actually differ bythe choice of the permuted indices. The relation 9.27 can beobtained from 9.20 only if one assumes the identity Rabcd =Rcdab, which Ludvigsen actually never mentions in the book.This well-known standard identity is a consequence of 9.19and 9.20.2) On p. 103, eq. 10.14 contains ∇a∇aφ = R..., whilethe preceding (unnumbered) equation on p.102 contains−∇e∇aφ = R.... A minus sign has materialized fromnowhere! The answer (10.17) is correct, and the extra minussign is actually needed to compensate for an error made ear-lier. In the last paragraph on p. 101, Ludvigsen writes (in3-dimensional notation) a = ∇φ whereas in fact a = −∇φin Newtonian gravity. (The acceleration points down, the po-tential grows upwards.) So the correct calculation starts byintroducing aa = ∇aφ and not −∇aφ.3) On p. 108, top line, “Using the fact that lan

a = 1 and∇alb = ∇bla, we have...” — actually, the same result followswith merely the assumption that la is a null geodesic. It is notnecessary to assume that la is integrable.4) On p. 109, top line, — one cannot actually derive Eq.(11.10) as claimed. By contracting the top equation withlambmc and using Eq. (11.8), which already assumes that l isintegrable, one gets

Dσ = R(l,m, l,m) + 2lamb(∇amb).

Now it is unclear how to show that the last term is equal to2ρσ. There is a significant freedom in ∇m since m is chosensimply to be orthogonal to n, l and this is not sufficientto fix ∇m. In fact, the null tetrad can be changed by thetransformation m → eiλm. Then σ → e2iλσ, as shown at topof page 110. If λ is a function of position then the equationDσ = ... will be changed after the transformation! Thus thisequation really depends on the choice of the tetrad, and somechoices are better than others. The equation (11.10) is perhapsobtained with a suitable choice of the tetrad, but this is notdiscussed in the book.

Extra topics: Derivation of Schwarzschild uniquely fromsymmetry (Ludvigsen?) Definition of mass in spheri-cally symmetric spacetime (see Frolov-Kofman?) ADMand Bondi mass? Positivity of energy theorem? Energy-momentum pseudotensors?

175

Page 184: Winitzki. Advanced General Relativity
Page 185: Winitzki. Advanced General Relativity

E License for this text

E.1 Author’s position on commercialpublishing

Thanks to modern technology, one can prepare an entire bookelectronically on a personal computer, in ready-to-print form.Sending an electronic book across the world takes at most afew minutes and costs about as much as a cup of tea. Theauthor encourages everyone interested in reading the text todownload and/or print it, in whole or in part. The two-column formatting of the text is designed to require the leastpossible amount of paper when printed. Everyone is also en-titled to commission a print shop to produce bound copies ofthe text, in which case the single-column formatting may bepreferred. The cost of one bound copy may be estimated as 10to 20 US dollars.A commercial publisher may want to offer professionallyprinted and bound copies for sale. Since this book is dis-tributed with complete source, it will be a matter of minutesto reformat the book according to the taste or constraints ofa particular publisher even without the author’s assistance.The author welcomes commercial printing of the text, as longas the publisher adheres to the conditions of the license (theGNU FDL). Since the FDL disallows granting exclusive dis-tribution rights, the author cannot sign a standard exclusive-rights contract with a publisher. However, the author willconsider signing any publishing contract that leaves intact theconditions of the FDL.

E.2 GNU Free Documentation License

Version 1.2, November 2002Copyright (c) 2000,2001,2002 Free Software Foundation,Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307, USAEveryone is permitted to copy and distribute verbatimcopies of this license document, but changing it is not allowed.

Preamble

The purpose of this License is to make a manual, textbook,or other functional and useful document free in the sense offreedom: to assure everyone the effective freedom to copyand redistribute it, with or without modifying it, either com-mercially or noncommercially. Secondarily, this License pre-serves for the author and publisher a way to get credit fortheir work, while not being considered responsible for modi-fications made by others.This License is a kind of “copyleft”, which means thatderivative works of the document must themselves be free inthe same sense. It complements the GNU General Public Li-cense, which is a copyleft license designed for free software.We have designed this License in order to use it for manualsfor free software, because free software needs free documen-tation: a free program should come with manuals providingthe same freedoms that the software does. But this License isnot limited to software manuals; it can be used for any textualwork, regardless of subject matter or whether it is published

as a printed book. We recommend this License principally forworks whose purpose is instruction or reference.

E.2.0 Applicability and definitions

This License applies to any manual or other work, in anymedium, that contains a notice placed by the copyright holdersaying it can be distributed under the terms of this License.Such a notice grants a world-wide, royalty-free license, unlim-ited in duration, to use that work under the conditions statedherein. The “Document”, below, refers to any such manualor work. Any member of the public is a licensee, and is ad-dressed as “you”. You accept the license if you copy, modifyor distribute the work in a way requiring permission undercopyright law.A “Modified Version” of the Document means any workcontaining the Document or a portion of it, either copied ver-batim, or with modifications and/or translated into anotherlanguage.A “Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively withthe relationship of the publishers or authors of the Documentto the Document’s overall subject (or to related matters) andcontains nothing that could fall directly within that overallsubject. (Thus, if the Document is in part a textbook of math-ematics, a Secondary Section may not explain any mathemat-ics.) The relationship could be a matter of historical connec-tion with the subject or with related matters, or of legal, com-mercial, philosophical, ethical or political position regardingthem.The “Invariant Sections” are certain Secondary Sectionswhose titles are designated, as being those of Invariant Sec-tions, in the notice that says that the Document is released un-der this License. If a section does not fit the above definitionof Secondary then it is not allowed to be designated as Invari-ant. The Document may contain zero Invariant Sections. If theDocument does not identify any Invariant Sections then thereare none.The “Cover Texts” are certain short passages of text that arelisted, as Front-Cover Texts or Back-Cover Texts, in the noticethat says that the Document is released under this License. AFront-Cover Text may be at most 5 words, and a Back-CoverText may be at most 25 words.A “Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification isavailable to the general public, that is suitable for revising thedocument straightforwardly with generic text editors or (forimages composed of pixels) generic paint programs or (fordrawings) some widely available drawing editor, and that issuitable for input to text formatters or for automatic transla-tion to a variety of formats suitable for input to text format-ters. A copy made in an otherwise Transparent file formatwhose markup, or absence of markup, has been arranged tothwart or discourage subsequent modification by readers isnot Transparent. An image format is not Transparent if usedfor any substantial amount of text. A copy that is not “Trans-parent” is called “Opaque”.

177

Page 186: Winitzki. Advanced General Relativity

E License for this text

Examples of suitable formats for Transparent copies includeplain ASCII without markup, Texinfo input format, LATEX in-put format, SGML or XML using a publicly available DTD,and standard-conforming simple HTML, PostScript or PDFdesigned for human modification. Examples of transparentimage formats include PNG, XCF and JPG. Opaque formatsinclude proprietary formats that can be read and edited onlyby proprietary word processors, SGML or XML for which theDTD and/or processing tools are not generally available, andthe machine-generated HTML, PostScript or PDF producedby some word processors for output purposes only.The “Title Page” means, for a printed book, the title pageitself, plus such following pages as are needed to hold, leg-ibly, the material this License requires to appear in the titlepage. For works in formats which do not have any title pageas such, “Title Page” means the text near the most prominentappearance of the work’s title, preceding the beginning of thebody of the text.A section “Entitled XYZ” means a named subunit of theDocument whose title either is precisely XYZ or contains XYZin parentheses following text that translates XYZ in anotherlanguage. (Here XYZ stands for a specific section name men-tioned below, such as “Acknowledgements”, “Dedications”,“Endorsements”, or “History”.) To “Preserve the Title” ofsuch a section when you modify the Document means that itremains a section “Entitled XYZ” according to this definition.The Document may include Warranty Disclaimers next tothe notice which states that this License applies to the Doc-ument. These Warranty Disclaimers are considered to be in-cluded by reference in this License, but only as regards dis-claiming warranties: any other implication that these War-ranty Disclaimers may have is void and has no effect on themeaning of this License.

E.2.1 Verbatim copying

You may copy and distribute the Document in any medium,either commercially or noncommercially, provided that thisLicense, the copyright notices, and the license notice sayingthis License applies to the Document are reproduced in allcopies, and that you add no other conditions whatsoever tothose of this License. You may not use technical measures toobstruct or control the reading or further copying of the copiesyou make or distribute. However, you may accept compensa-tion in exchange for copies. If you distribute a large enoughnumber of copies you must also follow the conditions in sec-tion E.2.2.You may also lend copies, under the same conditions statedabove, and you may publicly display copies.

E.2.2 Copying in quantity

If you publish printed copies (or copies in media that com-monly have printed covers) of the Document, numberingmore than 100, and the Document’s license notice requiresCover Texts, you must enclose the copies in covers that carry,clearly and legibly, all these Cover Texts: Front-Cover Texts onthe front cover, and Back-Cover Texts on the back cover. Bothcovers must also clearly and legibly identify you as the pub-lisher of these copies. The front cover must present the fulltitle with all words of the title equally prominent and visible.You may add other material on the covers in addition. Copy-ing with changes limited to the covers, as long as they pre-

serve the title of the Document and satisfy these conditions,can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous tofit legibly, you should put the first ones listed (as many asfit reasonably) on the actual cover, and continue the rest ontoadjacent pages.

If you publish or distribute Opaque copies of the Documentnumberingmore than 100, youmust either include amachine-readable Transparent copy along with each Opaque copy, orstate in or with each Opaque copy a computer-network loca-tion from which the general network-using public has accessto download using public-standard network protocols a com-plete Transparent copy of the Document, free of added ma-terial. If you use the latter option, you must take reasonablyprudent steps, when you begin distribution of Opaque copiesin quantity, to ensure that this Transparent copy will remainthus accessible at the stated location until at least one year af-ter the last time you distribute an Opaque copy (directly orthrough your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the au-thors of the Document well before redistributing any largenumber of copies, to give them a chance to provide you withan updated version of the Document.

E.2.3 Modifications

You may copy and distribute a Modified Version of the Doc-ument under the conditions of sections E.2.1 and E.2.2 above,provided that you release the Modified Version under pre-cisely this License, with the Modified Version filling the roleof the Document, thus licensing distribution and modificationof the Modified Version to whoever possesses a copy of it. Inaddition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a titledistinct from that of the Document, and from those of previ-ous versions (which should, if there were any, be listed in theHistory section of the Document). You may use the same titleas a previous version if the original publisher of that versiongives permission.

B. List on the Title Page, as authors, one or more personsor entities responsible for authorship of the modifications inthe Modified Version, together with at least five of the princi-pal authors of the Document (all of its principal authors, if ithas fewer than five), unless they release you from this require-ment.

C. State on the Title page the name of the publisher of theModified Version, as the publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modifica-tions adjacent to the other copyright notices.

F. Include, immediately after the copyright notices, a licensenotice giving the public permission to use the Modified Ver-sion under the terms of this License, in the form shown in theAddendum below.

G. Preserve in that license notice the full lists of InvariantSections and required Cover Texts given in the Document’slicense notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled “History”, Preserve its Ti-tle, and add to it an item stating at least the title, year, newauthors, and publisher of the Modified Version as given onthe Title Page. If there is no section Entitled “History” in the

178

Page 187: Winitzki. Advanced General Relativity

E.2 GNU Free Documentation License

Document, create one stating the title, year, authors, and pub-lisher of the Document as given on its Title Page, then add anitem describing the Modified Version as stated in the previoussentence.J. Preserve the network location, if any, given in the Docu-ment for public access to a Transparent copy of the Document,and likewise the network locations given in the Document forprevious versions it was based on. These may be placed inthe “History” section. You may omit a network location for awork that was published at least four years before the Docu-ment itself, or if the original publisher of the version it refersto gives permission.K. For any section Entitled “Acknowledgements” or “Ded-ications”, Preserve the Title of the section, and preserve in thesection all the substance and tone of each of the contributoracknowledgements and/or dedications given therein.L. Preserve all the Invariant Sections of the Document, un-altered in their text and in their titles. Section numbers or theequivalent are not considered part of the section titles.M. Delete any section Entitled “Endorsements”. Such a sec-tion may not be included in the Modified Version.N. Do not retitle any existing section to be Entitled “En-dorsements” or to conflict in title with any Invariant Section.O. Preserve any Warranty Disclaimers.If the Modified Version includes new front-matter sectionsor appendices that qualify as Secondary Sections and containno material copied from the Document, you may at your op-tion designate some or all of these sections as invariant. Todo this, add their titles to the list of Invariant Sections in theModified Version’s license notice. These titles must be distinctfrom any other section titles.You may add a section Entitled “Endorsements”, providedit contains nothing but endorsements of your Modified Ver-sion by various parties—for example, statements of peer re-view or that the text has been approved by an organization asthe authoritative definition of a standard.Youmay add a passage of up to five words as a Front-CoverText, and a passage of up to 25 words as a Back-Cover Text,to the end of the list of Cover Texts in the Modified Version.Only one passage of Front-Cover Text and one of Back-CoverText may be added by (or through arrangements made by)any one entity. If the Document already includes a cover textfor the same cover, previously added by you or by arrange-ment made by the same entity you are acting on behalf of, youmay not add another; but you may replace the old one, on ex-plicit permission from the previous publisher that added theold one.The author(s) and publisher(s) of the Document do not bythis License give permission to use their names for publicityfor or to assert or imply endorsement of anyModified Version.

Combining documents

You may combine the Document with other documents re-leased under this License, under the terms defined in section4 above for modified versions, provided that you include inthe combination all of the Invariant Sections of all of the origi-nal documents, unmodified, and list them all as Invariant Sec-tions of your combinedwork in its license notice, and that youpreserve all their Warranty Disclaimers.The combined work need only contain one copy of this Li-cense, and multiple identical Invariant Sections may be re-placed with a single copy. If there are multiple Invariant Sec-

tions with the same name but different contents, make the ti-tle of each such section unique by adding at the end of it, inparentheses, the name of the original author or publisher ofthat section if known, or else a unique number. Make the sameadjustment to the section titles in the list of Invariant Sectionsin the license notice of the combined work.In the combination, you must combine any sections Enti-tled “History” in the various original documents, forming onesection Entitled “History”; likewise combine any sections En-titled “Acknowledgements”, and any sections Entitled “Ded-ications”. You must delete all sections Entitled “Endorse-ments.”

Collections of documents

You may make a collection consisting of the Document andother documents released under this License, and replace theindividual copies of this License in the various documentswith a single copy that is included in the collection, providedthat you follow the rules of this License for verbatim copyingof each of the documents in all other respects.You may extract a single document from such a collection,and distribute it individually under this License, providedyou insert a copy of this License into the extracted document,and follow this License in all other respects regarding verba-tim copying of that document.

Aggregation with independent works

A compilation of the Document or its derivatives with otherseparate and independent documents or works, in or on a vol-ume of a storage or distribution medium, is called an “aggre-gate” if the copyright resulting from the compilation is notused to limit the legal rights of the compilation’s users beyondwhat the individual works permit. When the Document is in-cluded an aggregate, this License does not apply to the otherworks in the aggregate which are not themselves derivativeworks of the Document.If the Cover Text requirement of section E.2.2 is applicableto these copies of the Document, then if the Document is lessthan one half of the entire aggregate, the Document’s CoverTexts may be placed on covers that bracket the Documentwithin the aggregate, or the electronic equivalent of coversif the Document is in electronic form. Otherwise they mustappear on printed covers that bracket the whole aggregate.

Translation

Translation is considered a kind of modification, so you maydistribute translations of the Document under the terms ofsection E.2.3. Replacing Invariant Sections with translationsrequires special permission from their copyright holders, butyou may include translations of some or all Invariant Sectionsin addition to the original versions of these Invariant Sections.You may include a translation of this License, and all the li-cense notices in the Document, and anyWarrany Disclaimers,provided that you also include the original English versionof this License and the original versions of those notices anddisclaimers. In case of a disagreement between the transla-tion and the original version of this License or a notice or dis-claimer, the original version will prevail.If a section in the Document is Entitled “Acknowledge-ments”, “Dedications”, or “History”, the requirement (sec-

179

Page 188: Winitzki. Advanced General Relativity

E License for this text

tion E.2.3) to Preserve its Title (section E.2.0) will typically re-quire changing the actual title.

Termination

You may not copy, modify, sublicense, or distribute the Doc-ument except as expressly provided for under this License.Any other attempt to copy, modify, sublicense or distributethe Document is void, and will automatically terminate yourrights under this License. However, parties who have re-ceived copies, or rights, from you under this License will nothave their licenses terminated so long as such parties remainin full compliance.

Future revisions of this license

The Free Software Foundation may publish new, revised ver-sions of the GNU Free Documentation License from time totime. Such new versions will be similar in spirit to the presentversion, but may differ in detail to address new problems orconcerns. See http://www.gnu.org/copyleft/.Each version of the License is given a distinguishing versionnumber. If the Document specifies that a particular numberedversion of this License “or any later version” applies to it, youhave the option of following the terms and conditions eitherof that specified version or of any later version that has beenpublished (not as a draft) by the Free Software Foundation.If the Document does not specify a version number of thisLicense, you may choose any version ever published (not as adraft) by the Free Software Foundation.

ADDENDUM: How to use this License for yourdocuments

To use this License in a document you have written, includea copy of the License in the document and put the followingcopyright and license notices just after the title page:

Copyright (c) <year> <your name>. Permission is grantedto copy, distribute and/or modify this document under theterms of the GNU Free Documentation License, Version 1.2or any later version published by the Free Software Founda-tion; with no Invariant Sections, no Front-Cover Texts, and noBack-Cover Texts. A copy of the license is included in the sec-tion entitled “GNU Free Documentation License”.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the “with...Texts.” line with this:

with the Invariant Sections being <list their titles>, with theFront-Cover Texts being <list>, andwith the Back-Cover Textsbeing <list>.

If you have Invariant Sections without Cover Texts, or someother combination of the three, merge those two alternativesto suit the situation.

If your document contains nontrivial examples of programcode, we recommend releasing these examples in parallel un-der your choice of free software license, such as the GNUGen-eral Public License, to permit their use in free software.

Copyright

Copyright (c) 2000, 2001, 2002 Free Software Foundation, Inc.59 Temple Place, Suite 330, Boston, MA 02111-1307, USA

Everyone is permitted to copy and distribute verbatimcopies of this license document, but changing it is not al-lowed.

180

Page 189: Winitzki. Advanced General Relativity

Bibliography

[1] V. I. Arnold, Mathematical methods of classical mechanics(Springer, NY, 1997). 15, 27

[2] P. Bamberg, S. Sternberg. A course in mathematics forstudents of physics (Cambridge, 1990). 22

[3] A. Borde, A. Vilenkin, and A. H. Guth. Inflationary space-times are not past-complete. Phys. Rev. Lett. 90, 151301(2003); online preprint gr-qc/0110012 (2001). 90

[4] R. Bott and J. Milnor. On the parallelizability of the spheres.Bull. Amer. Math. Soc. 64, 87 (1958). 138

[5] R. Bousso. The Holographic Principle. Rev. Mod. Phys. 75,825 (2002); online preprint hep-th/0203101 (2002). 99

[6] T. Eguchi, P. B. Gilkey, and E. J. Hanson. Gravitation,gauge theories and differential geometry. Phys. Rep. 66, 213(1980).

[7] J. Frauendiener. Conformal infinity. Living Rev. Rel. 7,1 (2004). Online article: cited Nov. 2005,www.livingreviews.org/lrr-2004-1 . 84

[8] R. P. Geroch. Singularities, in: Relativity, edited by M.Carmeli, S. I. Fickler, and L. Witten (Plenum Press, NY,1970), p. 259. 89

[9] R. P. Geroch. Asymptotic structure of spacetime. In:Proc. Symp. Cincinnati, Ohio, June 14-18, 1976, ed. by F.P. Esposito and L. Witten (Plenum Press, NY, 1977), p. 1.78, 84

[10] Ø. Grøn and S. Hervik. Einstein’s theory of relativity (bookdraft, online at www.fys.uio.no/~sigbjorh/GRbook.html,dated December 2004). v

[11] S. Gudmundsson. An introduction to Rie-mannian geometry (book draft, online atwww.matematik.lu.se/matematiklu/personal/sigma/,dated 2006). 1

[12] S. W. Hawking and G. F. R. Ellis. The large-scale structureof space-time (Cambridge, 1973). v, 89, 93, 97

[13] A. Hebecker. Allgemeine Relativität (lecture notesfor the University of Heidelberg, online atres.kaelteschaden.de/ART.pdf, dated July 2007). 127

[14] R. Jackiw. Constrained quantization without tears. Onlinepreprint arxiv:hep-th/9306075 (1993). 113

[15] A. Kempf. General Relativity for cosmology(lecture notes for AMATH875, online atwww.math.uwaterloo.ca/~akempf/amath875.shtml, dated2005). 137

[16] S. Lang. Introduction to differentiable manifolds(Springer, 2002). 22

[17] J. M. Lee. Introduction to smooth manifolds (Springer, 2003).1

[18] J. Lee and R. M. Wald. Local symmetries and constraints. J.Math. Phys. 31 , 725 (1990). 113

[19] M. Ludvigsen. General relativity: a geometric approach(Cambridge, 1999). v, 1

[20] P. W. Michor. Topics in differential geometry (lec-ture notes of a course given in Vienna, online atwww.mat.univie.ac.at/~michor/dgbook.pdf, draft datedApril 2007). 133

[21] C. Misner, K. Thorne, and J. Wheeler. Gravitation (Free-man, 1973). v, 44, 89, 125

[22] B. O’Neill. Semi-Riemannian geometry (Academic Press,1983). 1

[23] R. Penrose. Asymptotic properties of fields and spacetimes.Phys. Rev. Lett. 10, 66 (1963). 84

[24] R. Penrose. Gravitational collapse and space-time singulari-ties. Phys. Rev. Lett. 14, 57 (1965). 97

[25] R. Penrose. Structure of spacetime. In: Battelle Rencontres,ed. by C. M. DeWitt and J. A. Wheeler (Benjamin, NY,1968), p. 121. 39

[26] R. Penrose. Techniques of differential topology in relativity(SIAM, Philadelphia, 1972). 89

[27] R. Penrose and W. Rindler. Spinors and space-time (Cam-bridge, 1988). 141, 147, 149

[28] E. Poisson. A relativist’s toolkit (Cambridge, 2004). v, 111

[29] B. F. Schutz. A first course in general relativity (Cambridge,1985). v, 151

[30] N. Steenrod. Topology of fibre bundles (Princeton, 1951).138

[31] S. Sternberg. Semi-riemannian geometry and GR (bookdraft, online at www.math.harvard.edu/~shlomo, datedSeptember 2003).

[32] J. Stewart. Advanced general relativity (Cambridge, 1991).v, 141, 147

[33] N. Straumann. General relativity and relativistic astro-physics (Springer, 1984). v, 15, 22, 45, 125

[34] A. Trautman. Conservation laws in general relativity. In:Gravitation: an introduction to current research, ed. by L.Witten (Wiley, 1962), p. 169. 101

[35] A. Vilenkin. Interpretation of the wave function of the uni-verse. Phys. Rev. D 39, 1116 (1989). 120, 121

[36] R. M. Wald. General relativity (University of Chicago,1984). v, 74, 78, 85, 89, 111, 112, 125, 147

[37] H. Whitney.Differentiable manifolds.Ann. of Math. 37, 645(1936). 5

181

Page 190: Winitzki. Advanced General Relativity

Bibliography

[38] S. Winitzki. Drawing conformal diagrams for a fractal land-scape. Phys. Rev. D 71, 123523 (2005); online preprintarxiv.org/abs/gr-qc/0503061 (2005). 78

182

Page 191: Winitzki. Advanced General Relativity

Index

f(R) gravity, 105n-form, 11, 19parallel to 1-form, 58

n-vectors, 201-form, 4, 10closed, 57exact, 57

2-form, 162-sphere, 3, 5

abstract index notation, 39action principle, 50, 101active transformation, 144adjoint transformation, 42affine connection, 31affine parameter, 48affine tangent vector, 48affine transformation, 48anti-linear function, 145asymptotic predictability, 98asymptotically flat spacetime, 71, 84azimuthally symmetric spacetime, 36, 76

background field, 112Bianchi identityfirst, 44second, 45

bilinear form, 16bivector, 20boldface, vBondi coordinate system, 85boost, 143bulk, 156

canonical energy-momentum tensor, 110canonical momentum, 111Cartan homotopy formula, 22Cartan structure equation, 131, 135Cauchy horizon, 96Cauchy surface, 81causal structure, 152caustic, 66chain rule, 8Christoffel symbol, 156tensor or non-tensor, 53

Christoffel tensor, 32combing a sphere, 10, 138commutator of vectors, 4comoving region, 62comoving volume, 37components, 1conformal factor, 47conformal invariance, 62of null geodesics, 49

conformal Killing vector, 73

conformal transformation, 35, 47, 76of Ricci tensor, 47of Riemann tensor, 47

conformal weight, 86congruence, 9conjugate point, 92, 94conjugate quaternion, 141connectig vector, 4connecting vector, 12connection, 31, 157connection 1-form, 52, 139connection 1-forms, 131conservation law, 109constraint, 112constraint equation, 126contravariant gradient, 30, 57coordinate basis, 7coordinate singularity, 5cosmetic factors, 20cosmic censorship, 98cosmological constant, 66cosmological inflation, 29cosmology, 121cotangent space, 4, 10covariant, 1covariant derivative, 31, 155, 156covariant volume element, 101covector, 4, 10covering, 143curvature, 44curvature 2-forms, 135curve on a manifold, 3

D’Alembertian, 47, 72, 137Darboux theorem, 22dark energy, 66de Sitter spacetime, 29, 73, 75decomposition of identity, 40, 63derivation, 9diffeomorphism, 3differential forms, 19Dirac spinor, 138, 150Dirac spinors, 141directional derivative, 7distorsion tensor, 36distortion tensor, 62, 115divergence of vector fields, 37domain of dependence, 96dual basis, 29dual space, 10dual tetrad, 125

Einstein equation, 46, 102, 106Einstein frame, 106

183

Page 192: Winitzki. Advanced General Relativity

Index

Einstein-Cartan theory, 46Einstein-Hilbert action, 102energy conditions, 65energy-momentum tensorcovariant conservation, 46

Euclidean metric, 27Euler-Lagrange equation, 101event horizon, 98exponentiation map, 49exterior differential, 21, 130exterior product, 19of vectors, 19

exterior product of vectors, 56extrinsic curvature, 115, 116

Faddeev-Jackiw formalism, 113flag bivector, 147flag of a spinor, 147focal point, 66focusing theorem, 66, 67frame field, 125free field, 102Frobenius theorem, 57

gauge symmetry, 110Gauss-Codazzi equation, 115geodesic curve, 48geodesic deviation, 51geodesic equation, 48geodesic vector field, 48geodesically complete spacetime, 89GNU Free Documentation License, vgradient, 10, 30gravitational potential, 72, 74for Schwarzschild spacetime, 73

Hamilton equations, 112Hamiltonian action principle, 112Hamiltonian constraint, 119Hessian, 47, 137, 169Hodge duality, 38, 127Hodge star, 75, 149hole argument, 111holonomic basis, 126Hubble law, 73Hubble rate, 90

iff, 1index notation, 39index-free approach, 38index-free notation, 1converting into, 39

induced connection, 52induced metric, 52, 155Infeld-van der Waerden map, 145inflation, 90internal symmetry, 110intrinsic description, 5irrelevant statement, v

Jacobi field, 92Jacobi identity, 45Jordan frame, 106

Killing equation, 36

Killing vector, 36, 46, 58, 71Koszul formula, 35

Lagrange multipliers, 105lapse function, 119Levi-Civita connection, 7, 34Koszul formula, 35variation of, 104

Levi-Civita symbol, 30Levi-Civita tensor, 30Lie derivative, 13interpretation, 14

Lie group, 109, 143SL(2,C), 145SU(2), 143

Lie-propagated vector, 47lightcone, 60lightcone coordinates, 76lightrays, 48line integral, 17local coordinates, 3, 4local Lorentz transformation, 126local variation, 108Lorentz group, 143spinor representation, 145

Lorentz transformation, 151

manifold, 4definition, 3diffeomorphic, 6globally orientable, 30pseudo-Riemannian, 29Riemannian, 29smooth, 3

metric, 28Euclidean, 27nondenegeracy of, 28

metric connection, 34metric energy-momentum tensor, 110metricity, 33minimal couplingto gravity, 102

minisuperspace, 121Minkowski metric, 151, 152Minkowski spacetime, 151multivectors, 20mute vectors, 40, 42

Newtonian limit of GR, 71Noether current, 109nonholonomic basis, 126nonlinear gravity, 105null curve, 50null function, 61null generator, 61null surface, 60null tetrad, 67from spin basis, 146

null vector, 29, 55

operator ordering, 120oriented n-volume, 20oriented area, 16orthogonal complement, 55

184

Page 193: Winitzki. Advanced General Relativity

Index

orthonormal frame, 29

parallel transport, 47, 158partial inverse metric, 56partial metric, 55passive transformation, 144Pauli matrices, 142peeling property, 87perfect fluid, 65, 112Poincaré lemma, 27Poisson equation, 72principal null direction, 147projector, 52, 55for null directions, 55self-adjointness, 52self-adjointness of, 55

proper length, 49proper rotation, 141pseudo-exercises, v

quantum cosmology, 121quaternion, 141with complex coefficients, 144

radiation gauge, 113rank of 2-form, 23Raychaudhuri equation, 63in null tetrad formalism, 68

redshift factor, 73Ricci scalar, 45Ricci tensor, 45Riemann tensor, 44

scalar shear, 67scalar-tensor gravity, 105, 106scale factor, 121Schwarzschild spacetime, 29, 60, 72, 127gravitational potential, 73

shift vector, 119smooth function, 5smooth manifold, 3space of constant curvature, 53spherical coordinates, 5spin basis, 146spin connection, 131spinor, 141, 143spinor bundle, 148spinor field, 148spinor space, 143spinorial tensor, 146star-shaped neighborhood, 27static spacetime, 58, 71stationary observer, 71stationary spacetime, 58, 71strongly causal spacetime, 96sunchronous gauge, 117superspace, 120surface integral, 17

tangent bundle, 4, 10, 138tangent space, 4, 6tangent vector, 6tensor field, 11tetrad, 29, 125, 148

tetrad components of Maxwell tensor, 87tidal effect, 51time foliation, 112timelike surface, 60, 114torsion, 46torsion tensor, 33torsion-free, 33trace, 40index-free computation, 41

transverse tensor, 59, 63twist, 63

unitary matrix, 3

vector bundle, 138fiber, 138section of, 138trivial, 138trivialization, 138

vector field, 9acceleration of, 58divergence of, 56flow lines of, 9hypersurface orthogonal, 57, 58integrable, 57rotation of, 57, 63shear of, 64, 65

vielbein, 125vierbein, 29, 125volume 4-form, 31volume element, 31, 37vorticity, 63

wave function of the universe, 120wedge product, 19Weyl spinors, 141Wheeler-DeWitt equation, 120world tensor, 148

185