

Electricity and Magnetism for Mathematicians

This text is an introduction to some of the mathematical wonders of Maxwell’s equations. These equations led to the prediction of radio waves, the realization that light is a type of electromagnetic wave, and the discovery of the special theory of relativity. In fact, almost all current descriptions of the fundamental laws of the universe can be viewed as deep generalizations of Maxwell’s equations. Even more surprising is that these equations and their generalizations have led to some of the most important mathematical discoveries of the past thirty years. It seems that the mathematics behind Maxwell’s equations is endless.

The goal of this book is to explain to mathematicians the underlying physics behind electricity and magnetism and to show their connections to mathematics. Starting with Maxwell’s equations, the reader is led to such topics as the special theory of relativity, differential forms, quantum mechanics, manifolds, tangent bundles, connections, and curvature.

THOMAS A. GARRITY is the William R. Kenan, Jr. Professor of Mathematics at Williams, where he was the director of the Williams Project for Effective Teaching for many years. In addition to a number of research papers, he has authored or coauthored two other books, All the Mathematics You Missed [But Need to Know for Graduate School] and Algebraic Geometry: A Problem Solving Approach. Among his awards and honors is the MAA Deborah and Franklin Tepper Haimo Award for outstanding college or university teaching.


ELECTRICITY AND MAGNETISM FOR MATHEMATICIANS

A Guided Path from Maxwell’s Equations to Yang-Mills

THOMAS A. GARRITY

Williams College, Williamstown, Massachusetts

with illustrations by Nicholas Neumann-Chun


32 Avenue of the Americas, New York, NY 10013-2473, USA

Cambridge University Press is part of the University of Cambridge.

It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107435162

© Thomas A. Garrity 2015

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2015

Printed in the United States of America

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging in Publication Data
Garrity, Thomas A., 1959– author.
Electricity and magnetism for mathematicians : a guided path from Maxwell’s equations to Yang-Mills / Thomas A. Garrity, Williams College, Williamstown, Massachusetts; with illustrations by Nicholas Neumann-Chun.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-07820-8 (hardback) – ISBN 978-1-107-43516-2 (paperback)
1. Electromagnetic theory–Mathematics–Textbooks. I. Title.
QC670.G376 2015
537.01′51–dc23 2014035298

ISBN 978-1-107-07820-8 Hardback
ISBN 978-1-107-43516-2 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet Web sites referred to in this publication and does not guarantee that any content on such Web sites is, or will remain, accurate or appropriate.


Contents

List of Symbols
Acknowledgments

1 A Brief History
1.1 Pre-1820: The Two Subjects of Electricity and Magnetism
1.2 1820–1861: The Experimental Glory Days of Electricity and Magnetism
1.3 Maxwell and His Four Equations
1.4 Einstein and the Special Theory of Relativity
1.5 Quantum Mechanics and Photons
1.6 Gauge Theories for Physicists: The Standard Model
1.7 Four-Manifolds
1.8 This Book
1.9 Some Sources

2 Maxwell’s Equations
2.1 A Statement of Maxwell’s Equations
2.2 Other Versions of Maxwell’s Equations
2.2.1 Some Background in Nabla
2.2.2 Nabla and Maxwell
2.3 Exercises

3 Electromagnetic Waves
3.1 The Wave Equation
3.2 Electromagnetic Waves
3.3 The Speed of Electromagnetic Waves Is Constant
3.3.1 Intuitive Meaning
3.3.2 Changing Coordinates for the Wave Equation
3.4 Exercises

4 Special Relativity
4.1 Special Theory of Relativity
4.2 Clocks and Rulers
4.3 Galilean Transformations
4.4 Lorentz Transformations
4.4.1 A Heuristic Approach
4.4.2 Lorentz Contractions and Time Dilations
4.4.3 Proper Time
4.4.4 The Special Relativity Invariant
4.4.5 Lorentz Transformations, the Minkowski Metric, and Relativistic Displacement
4.5 Velocity and Lorentz Transformations
4.6 Acceleration and Lorentz Transformations
4.7 Relativistic Momentum
4.8 Appendix: Relativistic Mass
4.8.1 Mass and Lorentz Transformations
4.8.2 More General Changes in Mass
4.9 Exercises

5 Mechanics and Maxwell’s Equations
5.1 Newton’s Three Laws
5.2 Forces for Electricity and Magnetism
5.2.1 F = q(E + v × B)
5.2.2 Coulomb’s Law
5.3 Force and Special Relativity
5.3.1 The Special Relativistic Force
5.3.2 Force and Lorentz Transformations
5.4 Coulomb + Special Relativity + Charge Conservation = Magnetism
5.5 Exercises

6 Mechanics, Lagrangians, and the Calculus of Variations
6.1 Overview of Lagrangians and Mechanics
6.2 Calculus of Variations
6.2.1 Basic Framework
6.2.2 Euler-Lagrange Equations
6.2.3 More Generalized Calculus of Variations Problems
6.3 A Lagrangian Approach to Newtonian Mechanics
6.4 Conservation of Energy from Lagrangians
6.5 Noether’s Theorem and Conservation Laws
6.6 Exercises

7 Potentials
7.1 Using Potentials to Create Solutions for Maxwell’s Equations
7.2 Existence of Potentials
7.3 Ambiguity in the Potential
7.4 Appendix: Some Vector Calculus
7.5 Exercises

8 Lagrangians and Electromagnetic Forces
8.1 Desired Properties for the Electromagnetic Lagrangian
8.2 The Electromagnetic Lagrangian
8.3 Exercises

9 Differential Forms
9.1 The Vector Spaces ∧k(Rn)
9.1.1 A First Pass at the Definition
9.1.2 Functions as Coefficients
9.1.3 The Exterior Derivative
9.2 Tools for Measuring
9.2.1 Curves in R3
9.2.2 Surfaces in R3
9.2.3 k-manifolds in Rn
9.3 Exercises

10 The Hodge ⋆ Operator
10.1 The Exterior Algebra and the ⋆ Operator
10.2 Vector Fields and Differential Forms
10.3 The ⋆ Operator and Inner Products
10.4 Inner Products on ∧(Rn)
10.5 The ⋆ Operator with the Minkowski Metric
10.6 Exercises

11 The Electromagnetic Two-Form
11.1 The Electromagnetic Two-Form
11.2 Maxwell’s Equations via Forms
11.3 Potentials
11.4 Maxwell’s Equations via Lagrangians
11.5 Euler-Lagrange Equations for the Electromagnetic Lagrangian
11.6 Exercises

12 Some Mathematics Needed for Quantum Mechanics
12.1 Hilbert Spaces
12.2 Hermitian Operators
12.3 The Schwartz Space
12.3.1 The Definition
12.3.2 The Operators q(f) = xf and p(f) = −i df/dx
12.3.3 S(R) Is Not a Hilbert Space
12.4 Caveats: On Lebesgue Measure, Types of Convergence, and Different Bases
12.5 Exercises

13 Some Quantum Mechanical Thinking
13.1 The Photoelectric Effect: Light as Photons
13.2 Some Rules for Quantum Mechanics
13.3 Quantization
13.4 Warnings of Subtleties
13.5 Exercises

14 Quantum Mechanics of Harmonic Oscillators
14.1 The Classical Harmonic Oscillator
14.2 The Quantum Harmonic Oscillator
14.3 Exercises

15 Quantizing Maxwell’s Equations
15.1 Our Approach
15.2 The Coulomb Gauge
15.3 The “Hidden” Harmonic Oscillator
15.4 Quantization of Maxwell’s Equations
15.5 Exercises

16 Manifolds
16.1 Introduction to Manifolds
16.1.1 Force = Curvature
16.1.2 Intuitions behind Manifolds
16.2 Manifolds Embedded in Rn
16.2.1 Parametric Manifolds
16.2.2 Implicitly Defined Manifolds
16.3 Abstract Manifolds
16.3.1 Definition
16.3.2 Functions on a Manifold
16.4 Exercises

17 Vector Bundles
17.1 Intuitions
17.2 Technical Definitions
17.2.1 The Vector Space Rk
17.2.2 Definition of a Vector Bundle
17.3 Principal Bundles
17.4 Cylinders and Mobius Strips
17.5 Tangent Bundles
17.5.1 Intuitions
17.5.2 Tangent Bundles for Parametrically Defined Manifolds
17.5.3 T(R2) as Partial Derivatives
17.5.4 Tangent Space at a Point of an Abstract Manifold
17.5.5 Tangent Bundles for Abstract Manifolds
17.6 Exercises

18 Connections
18.1 Intuitions
18.2 Technical Definitions
18.2.1 Operator Approach
18.2.2 Connections for Trivial Bundles
18.3 Covariant Derivatives of Sections
18.4 Parallel Transport: Why Connections Are Called Connections
18.5 Appendix: Tensor Products of Vector Spaces
18.5.1 A Concrete Description
18.5.2 Alternating Forms as Tensors
18.5.3 Homogeneous Polynomials as Symmetric Tensors
18.5.4 Tensors as Linearizations of Bilinear Maps
18.6 Exercises

19 Curvature
19.1 Motivation
19.2 Curvature and the Curvature Matrix
19.3 Deriving the Curvature Matrix
19.4 Exercises

20 Maxwell via Connections and Curvature
20.1 Maxwell in Some of Its Guises
20.2 Maxwell for Connections and Curvature
20.3 Exercises

21 The Lagrangian Machine, Yang-Mills, and Other Forces
21.1 The Lagrangian Machine
21.2 U(1) Bundles
21.3 Other Forces
21.4 A Dictionary
21.5 Yang-Mills Equations

Bibliography
Index

Color plates follow page 234


List of Symbols

Symbol   Name
∇   nabla
△   Laplacian
T   transpose
∈   element of
O(3,R)   orthogonal group
R   real numbers
ρ(·, ·)   Minkowski metric
∧k(Rn)   k-forms on Rn
∧   wedge
◦   composed with
⋆   star operator
H   Hilbert space
〈·, ·〉   inner product
C   complex numbers
L2[0,1]   square integrable functions
∗   adjoint
⊂   subset of
S   Schwartz space
h   Planck constant
∩   set intersection
∪   set union
GL(k,R)   general linear group
C∞p   germ of the sheaf of differentiable functions
Γ(E)   space of all sections of E
∇   connection
⊗   tensor product
⊙   symmetric tensor product



Acknowledgments

There are many people who have helped in the preparing of this book. First off, an earlier draft was used as the text for a course at Williams College in the fall of 2009. In this class, Ben Atkinson, Ran Bi, Victoria Borish, Aaron Ford, Sarah Ginsberg, Charlotte Healy, Ana Inoa, Stephanie Jensen, Dan Keneflick, Murat Kologlu, Edgar Kosgey, Jackson Lu, Makisha Maier, Alex Massicotte, Merideth McClatchy, Nicholas Neumann-Chun, Ellen Ramsey, Margaret Robinson, Takuta Sato, Anders Schneider, Meghan Shea, Joshua Solis, Elly Tietsworth, Stephen Webster, and Qiao Zhang provided a lot of feedback. In particular Stephen Webster went through the entire manuscript again over the winter break of 2009–2010. I would like to thank Weng-Him Cheung, who went through the whole manuscript in the fall of 2013. I would also like to thank Julia Cline, Michael Mayer, Cesar Melendez, and Emily Wickstrom, all of whom took a course based on this text at Williams in the fall of 2013, for helpful comments.

Anyone who would like to teach a course based on this text, please let me know ([email protected]). In particular, there are write-ups of the solutions for many of the problems. I have used the text for three classes, so far. The first time the prerequisites were linear algebra and multivariable calculus. For the other classes, the prerequisites included real analysis. The next time I teach this course, I will return to only requiring linear algebra and multivariable calculus. As Williams has fairly short semesters (about twelve to thirteen weeks), we covered only the first fifteen chapters, with a brief, rapid-fire overview of the remaining topics.

In the summer of 2010, Nicholas Neumann-Chun proofread the entire manuscript, created its diagrams, and worked a lot of the homework problems. He gave many excellent suggestions.

My Williams colleague Steven Miller also carefully read a draft, helping tremendously. Also from Williams, Lori Pedersen went through the text a few times and provided a lot of solutions of the homework problems. Both William Wootters and David Tucker-Smith, from the Williams Physics Department, also gave a close reading of the manuscript; both provided key suggestions for improving the physics in the text.

Robert Kotiuga helped with the general exposition and especially in giving advice on the history of the subject.

I would like to thank Gary Knapp, who not only went through the whole text, providing excellent feedback, but who also suggested a version of the title. Both Dakota Garrity and Logan Garrity caught many errors and typos in the final draft. Each also gave excellent suggestions for improving the exposition.

I also would like to thank my editor, Lauren Cowles, who has provided support through this whole project.

The referees also gave much-needed advice.

I am grateful for all of their help.


1

A Brief History

Summary: The unification of electricity, magnetism, and light by James Maxwell in the 1800s was a landmark in human history and has continued even today to influence technology, physics, and mathematics in profound and surprising ways. Its history (of which we give a brief overview in this chapter) has been and continues to be studied by historians of science.

1.1. Pre-1820: The Two Subjects of Electricity and Magnetism

Who knows when our ancestors first became aware of electricity and magnetism? I imagine a primitive cave person, wrapped up in mastodon fur, desperately trying to stay warm in the dead of winter, suddenly seeing a spark of static electricity. Maybe at the same time in our prehistory someone felt a small piece of iron jump out of their hand toward a lodestone. Certainly lightning must have been inspiring and frightening, as it still is.

But only recently (meaning in the last four hundred years) have these phenomena been at all understood. Around 1600, William Gilbert wrote his influential De Magnete, in which he argued that the earth was just one big magnet. In the mid-1700s, Benjamin Franklin showed that lightning was indeed electricity. Also in the 1700s Coulomb’s law was discovered, which states that the force F between two stationary charges is
$$F = \frac{q_1 q_2}{r^2},$$
where q1 and q2 are the charges and r is the distance between the charges (after choosing correct units). Further, in the 1740s, Leyden jars were invented to store electric charge. Finally, still in the 1700s, Galvani and Volta, independently, discovered how to generate electric charges, with the invention of galvanic, or voltaic, cells (batteries).


1.2. 1820–1861: The Experimental Glory Days of Electricity and Magnetism

In 1820, possibly during a lecture, Hans Christian Oersted happened to move a compass near a wire that carried a current. He noticed that the compass’s needle jumped. People knew that compasses worked via magnetism and at the same time realized that current was flowing electricity. Oersted found solid proof that electricity and magnetism were linked.

For the next forty or so years amazing progress was made finding out how these two forces were related. Most of this work was rooted in experiment. While many scientists threw themselves into this hunt, Faraday stands out as a truly profound experimental scientist. By the end of this era, most of the basic empirical connections between electricity and magnetism had been discovered.

1.3. Maxwell and His Four Equations

In the early 1860s, James Clerk Maxwell wrote down his four equations that linked the electric field with the magnetic field. (The real history is quite a bit more complicated.) These equations contain within them the prediction that there are electromagnetic waves, traveling at some speed c. Maxwell observed that this speed c was close to the observed speed of light. This led him to make the spectacular conjecture that light is an electromagnetic wave. Suddenly, light, electricity, and magnetism were all part of the same fundamental phenomenon.

Within twenty years, Hertz had experimentally shown that light was indeed an electromagnetic wave. (As seen in Chapter 6 of [27], the actual history is not quite such a clean story.)

1.4. Einstein and the Special Theory of Relativity

All electromagnetic waves, which after Maxwell were known to include light waves, have a remarkable yet disturbing property: These waves travel at a fixed speed c. This fact was not controversial at all, until it was realized that this speed was independent of any frame of reference.

To make this surprise more concrete, we turn to Einstein’s example of shining lights on trains. (No doubt today the example would be framed in terms of airplanes or rocket ships.) Imagine you are on a train traveling at 60 miles per hour. You turn on a flashlight and point it in the same direction as the train is moving. To you, the light moves at a speed of c (you think your speed is zero miles per hour). To someone on the side of the road, the light should move at a speed of 60 miles per hour + c. But according to Maxwell’s equations, it does not. The observer off the train will actually see the light move at the same speed c, which is no different from your observation on the train. This is wacky and suggests that Maxwell’s equations must be wrong.

In actual experiments, though, it is our common sense (codified in Newtonian mechanics) that is wrong. This led Albert Einstein, in 1905, to propose an entirely new theory of mechanics, the special theory. In large part, Einstein discovered the special theory because he took Maxwell’s equations seriously as a statement about the fundamental nature of reality.

1.5. Quantum Mechanics and Photons

What is light? For many years scientists debated whether light was made up of particles or of waves. After Maxwell (and especially after Hertz’s experiments showing that light is indeed a type of electromagnetic wave), it seemed that the debate had been settled. But in the late nineteenth century, a weird new phenomenon was observed. When light was shone on certain metals, electrons were ejected from the metal. Something in light carried enough energy to forcibly eject electrons from the metal. This phenomenon is called the photoelectric effect. This alone is not shocking, as it was well known that traditional waves carried energy. (Many of us have been knocked over by ocean waves at the beach.) In classical physics, though, the energy carried by a traditional wave is proportional to the wave’s amplitude (how high it gets). But in the photoelectric effect, the energy of the ejected electrons is proportional not to the amplitude of the light wave but instead to the light’s frequency. This is a decidedly non-classical effect, jeopardizing a wave interpretation for light.

In 1905, in the same year that he developed the Special Theory of Relativity, Einstein gave an interpretation to light that seemed to explain the photoelectric effect. Instead of thinking of light as a wave (in which case, the energy would have to be proportional to the light’s amplitude), Einstein assumed that light is made of particles, each of which has energy proportional to the frequency, and showed that this assumption leads to the correct experimental predictions.

In the context of other seemingly strange experimental results, people started to investigate what is now called quantum mechanics, amassing a number of partial explanations. Suddenly, over the course of a few years in the mid-1920s, Born, Dirac, Heisenberg, Jordan, Schrödinger, von Neumann, and others worked out the complete theory, finishing the first quantum revolution. We will see that this theory indeed leads to the prediction that light must have properties of both waves and particles.


1.6. Gauge Theories for Physicists: The Standard Model

At the end of the 1920s, gravity and electromagnetism were the only two known forces. By the end of the 1930s, both the strong force and the weak force had been discovered.

In the nucleus of an atom, protons and neutrons are crammed together. All of the protons have positive charge. The rules of electromagnetism would predict that these protons would want to explode away from each other, but this does not happen. It is the strong force that holds the protons and neutrons together in the nucleus, and it is called such since it must be strong enough to overcome the repelling force of electromagnetism.

The weak force can be seen in the decay of the neutron. If a neutron is just sitting around, after ten or fifteen minutes it will decay into a proton, an electron, and another elementary particle (the electron anti-neutrino, to be precise). This could not be explained by the other forces, leading to the discovery of this new force.

Since both of these forces were basically described in the 1930s, their theories were quantum mechanical. But in the 1960s, a common framework for the weak force and the electromagnetic force was worked out (resulting in Nobel Prizes for Abdus Salam, Sheldon Glashow, and Steven Weinberg in 1979). In fact, this framework can be extended to include the strong force. This common framework goes by the name of the standard model. (It does not include gravity.)

Much earlier, in the 1920s, the mathematician Hermann Weyl attempted to unite gravity and electromagnetism, by developing what he called a gauge theory. While it quickly was shown not to be physically realistic, the underlying idea was sufficiently intriguing that it resurfaced in the early 1950s in the work of Yang and Mills, who were studying the strong force. The underlying mathematics of their work is what led to the unified electro-weak force and the standard model.

Weyl’s gauge theory was motivated by symmetry. He used the word “gauge” to suggest different gauges for railroad tracks. His work was motivated by the desire to shift from global symmetries to local symmetries. We will start with global symmetries. Think of the room you are sitting in. Choose a corner and label this the origin. Assume one of the edges is the x-axis, another the y-axis, and the third the z-axis. Put some unit of length on these edges. You can now uniquely label any point in the room by three coordinate values.


Of course, someone else might have chosen a different corner as the origin, different coordinate axes, or different units of length. In fact, any point in the room (or, for that matter, any point in space) could be used as the origin, and so on. There are an amazing number of different choices.

Now imagine a bird flying in the room. With your coordinate system, you could describe the path of the bird’s flight by a curve (x(t), y(t), z(t)). Someone else, with a different coordinate system, will describe the flight of the bird by three totally different functions. The flight is the same (after all, the bird does not care what coordinate system you are using), but the description is different. By changing coordinates, we can translate from one coordinate system into the other. This is a global change of coordinates. Part of the deep insight of the theory of relativity, as we will see, is that which coordinate changes are allowed has profound effects on the description of reality.

Weyl took this one step further. Instead of choosing one global coordinate system, he proposed that we could choose different coordinate systems at each point of space but that all of these local coordinate systems must be capable of being suitably patched together. Weyl called this patching “choosing a gauge.”

1.7. Four-Manifolds

During the 1950s, 1960s, and early 1970s, when physicists were developing what they called gauge theory, leading to the standard model, mathematicians were developing the foundations of differential geometry. (Actually this work on differential geometry went back quite a bit further than the 1950s.) This mainly involved understanding the correct nature of curvature, which, in turn, as we will see, involves understanding the nature of connections. But sometime in the 1960s or 1970s, people must have begun to notice uncanny similarities between the physicists’ gauges and the mathematicians’ connections. Finally, in 1975, Wu and Yang [69] wrote out the dictionary between the two languages (this is the same Yang who was part of Yang-Mills). This alone was amazing. Here the foundations of much of modern physics were shown to be the same as the foundations of much of differential geometry.

Through most of the twentieth century, when math and physics interacted, overwhelmingly it was the case that math shaped physics:

Mathematics ⇒ Physics

Come the early 1980s, the arrow was reversed. Among all possible gauges, physicists pick out those that are Yang-Mills, which are in turn deep generalizations of Maxwell’s equations. By the preceding dictionary, connections that satisfy Yang-Mills should be special.


This leads us to the revolutionary work of Simon Donaldson. He was interested in four-dimensional manifolds. On a four-manifold, there is the space of all possible connections. (We are ignoring some significant facts.) This space is infinite dimensional and has little structure. But then Donaldson decided to take physicists seriously. He looked at those connections that were Yang-Mills. (Another common term used is “instantons.”) At the time, there was no compelling mathematical reason to do this. Also, his four-manifolds were not physical objects and had no apparent link with physics. Still, he looked at Yang-Mills connections and discovered amazing, deeply surprising structure, such as that these special Yang-Mills connections form a five-dimensional space, which has the original four-manifold as part of its boundary. (Here we are coming close to almost criminal simplification, but the underlying idea that the Yang-Mills connections are linked to a five-manifold that has the four-manifold as part of its boundary is correct.) This work shocked much of the mathematical world and transformed four-manifold theory from a perfectly respectable area of mathematics into one of its hottest branches. In awarding Donaldson a Fields Medal in 1986, Atiyah [1] wrote:

The surprise produced by Donaldson’s result was accentuated by the fact that his methods were completely new and were borrowed from theoretical physics, in the form of Yang-Mills equations. . . . Several mathematicians (including myself) worked on instantons and felt very pleased that they were able to assist physics in this way. Donaldson, on the other hand, conceived the daring idea of reversing this process and of using instantons on a general 4-manifold as a new geometrical tool.

Many of the finest mathematicians of the 1980s started working on developing this theory, people such as Atiyah, Bott, Uhlenbeck, Taubes, Yau, Kobayashi, and others.

Not only did this work produce some beautiful mathematics, it changed how math could be done. Now we have

Physics ⇒ Mathematics

an approach that should be called physical mathematics (a term first coined by Kishore Marathe, according to [70]: This text by Zeidler is an excellent place to begin to see the power behind the idea of physical mathematics).

Physical mathematics involves taking some part of the real world that is physically important (such as Maxwell’s equations), identifying the underlying mathematics, and then taking that mathematics seriously, even in contexts far removed from the natural world. This has been a major theme of mathematics since the 1980s, led primarily by the brilliant work of Edward Witten. When Witten won his Fields Medal in 1990, Atiyah [2] wrote:

Although (Witten) is definitely a physicist his command of mathematics is rivaled by few mathematicians, and his ability to interpret physical ideas in mathematical form is quite unique. Time and again he has surprised the mathematical community by a brilliant application of physical insight leading to new and deep mathematical theorems.

The punchline is that mathematicians should take seriously the underlying mathematical structure of the real world, even in non-real world situations. In essence, nature is a superb mathematician.

1.8. This Book

There is a problem with this revolution of physical mathematics. How can any mere mortal master both physics and mathematics? The answer, of course, is you cannot. This book is a compromise. We concentrate on the key underlying mathematical concepts behind the physics, trying at the same time to explain just enough of the real world to justify the use of the mathematics. By the end of this book, I hope the reader will be able to start understanding the work needed to understand Yang-Mills.

1.9. Some Sources

One way to learn a subject is to study its history. That is not the approach we are taking. There are a number of good, accessible books, though. Stephen J. Blundell’s Magnetism: A Very Short Introduction [4] is excellent for a popular general overview. For more technical histories of the early days of electromagnetism, I would recommend Steinle’s article “Electromagnetism and Field Physics” [62] and Buchwald’s article “Electrodynamics from Thomson and Maxwell to Hertz” [7].

Later in his career, Abraham Pais wrote three excellent books covering much of the history of twentieth century physics. His Subtle Is the Lord: The Science and the Life of Albert Einstein [51] is a beautiful scientific biography of Einstein, which means that it is also a history of much of what was important in physics in the first third of the 1900s. His Niels Bohr’s Times: In Physics, Philosophy, and Polity [52] is a scientific biography of Bohr, and hence a good overview of the history of early quantum mechanics. His Inward Bound [53] is a further good reference for the development of quantum theory and particle physics.

It appears that the ideas of special relativity were “in the air” around 1905. For some of the original papers by Einstein, Lorentz, Minkowski, and Weyl, there is the collection [19]. Poincaré was also actively involved in the early days of special relativity. Recently two biographies of Poincaré have been written: Gray’s Henri Poincaré: A Scientific Biography [27] and Verhulst’s Henri Poincaré: Impatient Genius [67]. There is also the still interesting paper of Poincaré that he gave at the World’s Fair in Saint Louis in 1904, which has recently been reprinted [54].

At the end of this book, we reach the beginnings of gauge theory. In [50], O’Raifeartaigh has collected some of the seminal papers in the development of gauge theory. We encourage the reader to look at the web page of Edward Witten for inspiration. I would also encourage people to look at many of the expository papers on the relationship between mathematics and physics in volume 6 of the collected works of Atiyah [3] and at those in volume 4 of the collected works of Bott [5]. (In fact, perusing all six volumes of Atiyah’s collected works and all four volumes of Bott’s is an excellent way to be exposed to many of the main themes of mathematics of the last half of the twentieth century.) Finally, there is the wonderful best seller The Elegant Universe by Brian Greene [28].


2

Maxwell’s Equations

Summary: The primary goal of this chapter is to state Maxwell’s equations. We will then see some of their implications, which will allow us to give alternative descriptions for Maxwell’s equations, providing us in turn with a review of some of the basic formulas in multivariable calculus.

2.1. A Statement of Maxwell’s Equations

Maxwell’s equations link together three vector fields and a real-valued function. Let
$$E = E(x, y, z, t) = (E_1(x, y, z, t), E_2(x, y, z, t), E_3(x, y, z, t))$$
and
$$B = B(x, y, z, t) = (B_1(x, y, z, t), B_2(x, y, z, t), B_3(x, y, z, t))$$
be two vector fields with spacial coordinates (x, y, z) and time coordinate t. Here E represents the electric field while B represents the magnetic field. The third vector field is
$$j(x, y, z, t) = (j_1(x, y, z, t), j_2(x, y, z, t), j_3(x, y, z, t)),$$
which represents the current (the direction and the magnitude of the flow of electric charge). Finally, let
$$\rho(x, y, z, t)$$
be a function representing the charge density. Let c be a constant. (Here c is the speed of light in a vacuum.) Then these three vector fields and this function


[Figure 2.1: a vector field F on a compact interior region V in space, with boundary surface S and outward unit normal n.]

satisfy Maxwell’s equations if
$$\operatorname{div}(E) = \rho$$
$$\operatorname{curl}(E) = -\frac{\partial B}{\partial t}$$
$$\operatorname{div}(B) = 0$$
$$c^2 \operatorname{curl}(B) = j + \frac{\partial E}{\partial t}.$$

(Review of the curl, the divergence, and other formulas from multivariable calculus is in the next section.)

We can reinterpret these equations in terms of integrals via various Stokes-type theorems. For example, if V is a compact region in space with smooth boundary surface S, as in Figure 2.1, then for any vector field F we know from the Divergence Theorem that
$$\iint_S F \cdot n \, dA = \iiint_V \operatorname{div}(F) \, dx\,dy\,dz,$$

where n is the unit outward normal of the surface S.

In words, this theorem says that the divergence of a vector field measures how much of the field is flowing out of a region.

Then the first of Maxwell’s equations can be restated as
$$\iint_S E \cdot n \, dA = \iiint_V \operatorname{div}(E) \, dx\,dy\,dz = \iiint_V \rho(x, y, z, t) \, dx\,dy\,dz = \text{total charge inside the region } V.$$
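As a concrete illustration of the Divergence Theorem used here, the following sketch (in Python with the sympy library; the radial test field F = (x, y, z) and the unit ball are simply a convenient example, not taken from the text) checks that the flux of F through the unit sphere equals the integral of div(F) over the unit ball, both coming out to 4π.

```python
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', nonnegative=True)

# Test field F = (x, y, z), whose divergence is the constant 3.
divF = 3

# Volume integral of div(F) over the unit ball, in spherical coordinates
# (volume element r**2 * sin(theta) dr dtheta dphi).
volume_side = sp.integrate(divF * r**2 * sp.sin(theta),
                           (r, 0, 1), (theta, 0, sp.pi), (phi, 0, 2*sp.pi))

# Flux of F through the unit sphere: parametrize the sphere by (theta, phi);
# the outward normal times the area element is p_theta x p_phi.
p = sp.Matrix([sp.sin(theta)*sp.cos(phi), sp.sin(theta)*sp.sin(phi), sp.cos(theta)])
n_dA = p.diff(theta).cross(p.diff(phi))
F_on_sphere = p            # F = (x, y, z) restricted to the sphere is the point itself
surface_side = sp.integrate(F_on_sphere.dot(n_dA), (theta, 0, sp.pi), (phi, 0, 2*sp.pi))

print(volume_side, surface_side)   # both print 4*pi
```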


[Figure 2.2: a surface S, with unit normal field n, bounded by a curve C with unit tangent vector T.]

Likewise, the third of Maxwell’s equations is:
$$\iint_S B \cdot n \, dA = \iiint_V \operatorname{div}(B) \, dx\,dy\,dz = \iiint_V 0 \, dx\,dy\,dz = 0 = \text{there is no magnetic charge inside the region } V.$$

This is frequently stated as “There are no magnetic monopoles,” meaning there is no real physical notion of magnetic density.

The second and fourth of Maxwell’s equations have similar integral interpretations. Let C be a smooth curve in space that is the boundary of a smooth surface S, as in Figure 2.2. Let T be a unit tangent vector of C. Choose a normal vector field n for S so that the cross product T × n points into the surface S.

Then the classical Stokes Theorem states that for any vector field F, we have
$$\int_C F \cdot T \, ds = \iint_S \operatorname{curl}(F) \cdot n \, dA.$$

This justifies the intuition that the curl of a vector field measures how much the vector field F wants to twirl.

Then the second of Maxwell’s equations is equivalent to
$$\int_C E \cdot T \, ds = -\iint_S \frac{\partial B}{\partial t} \cdot n \, dA.$$

Thus the magnetic field B is changing in time if and only if the electric field E is curling.


[Figure 2.3: a coil of wire centered along the z-axis, with the x, y, and z coordinate axes shown.]

This is the mathematics underlying how to create current in a wire by moving a magnet. Consider a coil of wire, centered along the z-axis (i.e., along the vector k = (0,0,1)).

The wire is coiled (almost) in the xy-plane. Move a magnet through the middle of this coil. This means that the magnetic field B is changing in time in the direction k. Thanks to Maxwell, this means that the curl of the electric field E will be non-zero and will point in the direction k. But this means that the actual vector field E will be “twirling” in the xy-plane, making the electrons in the coil move, creating a current.

This is in essence how a hydroelectric dam works. Water from a river is used to move a magnet through a coil of wire, creating a current and eventually lighting some light bulb in a city far away.

The fourth Maxwell equation gives
$$c^2 \int_C B \cdot T \, ds = \iint_S \left( j + \frac{\partial E}{\partial t} \right) \cdot n \, dA.$$

Here current and a changing electric field are linked to the curl of the magnetic field.

2.2. Other Versions of Maxwell’s Equations

2.2.1. Some Background in Nabla

This section is meant to be both a review and a listing of some of the standard notations that people use. The symbol ∇ is pronounced “nabla” (sometimes ∇ is called “del”). Let
$$\nabla = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right) = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z},$$
where i = (1,0,0), j = (0,1,0), and k = (0,0,1). Then for any function f(x, y, z), we set the gradient to be
$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right) = \mathbf{i}\frac{\partial f}{\partial x} + \mathbf{j}\frac{\partial f}{\partial y} + \mathbf{k}\frac{\partial f}{\partial z}.$$
For a vector field
$$F = F(x, y, z) = (F_1(x, y, z), F_2(x, y, z), F_3(x, y, z)) = (F_1, F_2, F_3) = F_1 \cdot \mathbf{i} + F_2 \cdot \mathbf{j} + F_3 \cdot \mathbf{k},$$
define the divergence to be:
$$\nabla \cdot F = \operatorname{div}(F) = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}.$$
The curl of a vector field in this notation is
$$\nabla \times F = \operatorname{curl}(F) = \det \begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ F_1 & F_2 & F_3 \end{pmatrix} = \left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\; -\left( \frac{\partial F_3}{\partial x} - \frac{\partial F_1}{\partial z} \right),\; \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right).$$
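These operators are easy to experiment with symbolically. Here is a small sketch (in Python with the sympy library; the helper functions simply transcribe the formulas above, and the test function and test field are my own illustrative choices, not from the text).

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)

def grad(f):
    """Gradient of a function f(x, y, z), as a column vector."""
    return sp.Matrix([f.diff(x), f.diff(y), f.diff(z)])

def div(F):
    """Divergence of a vector field F = (F1, F2, F3)."""
    return F[0].diff(x) + F[1].diff(y) + F[2].diff(z)

def curl(F):
    """Curl of F = (F1, F2, F3), following the determinant formula above."""
    return sp.Matrix([F[2].diff(y) - F[1].diff(z),
                      -(F[2].diff(x) - F[0].diff(z)),
                      F[1].diff(x) - F[0].diff(y)])

# A hypothetical test function and test field.
f = x*y*z
F = sp.Matrix([y*z, x*z, x*y])    # this happens to be grad(f)

print(grad(f).T)     # Matrix([[y*z, x*z, x*y]])
print(div(F))        # 0
print(curl(F).T)     # Matrix([[0, 0, 0]]) -- a gradient field has zero curl
```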


2.2.2. Nabla and Maxwell

Using the nabla notation, Maxwell’s equations have the form
$$\nabla \cdot E = \rho$$
$$\nabla \times E = -\frac{\partial B}{\partial t}$$
$$\nabla \cdot B = 0$$
$$c^2\, \nabla \times B = j + \frac{\partial E}{\partial t}.$$

Though these look like four equations, when written out they actually form eight equations. This is one of the exercises in the following section.

2.3. Exercises

The first few problems are exercises in the nabla machinery and in the basics of vector fields.

Exercise 2.3.1. For the function f(x, y, z) = x² + y³ + xy² + 4z, compute grad(f), which is the same as computing ∇(f).

Exercise 2.3.2. a. Sketch, in the xy-plane, some representative vectors making up the vector field
$$F(x, y, z) = (F_1, F_2, F_3) = (x, y, z),$$
at the points
$$(1,0,0), (1,1,0), (0,1,0), (-1,1,0), (-1,0,0), (-1,-1,0), (0,-1,0), (1,-1,0).$$
b. Find div(F) = ∇ · F.
c. Find curl(F) = ∇ × F.

Comment: Geometrically the vector field F(x, y, z) = (x, y, z) is spreading out but not “twirling” or “curling” at all, as is reflected in the calculations of its divergence and curl.

Exercise 2.3.3. a. Sketch, in the xy-plane, some representative vectors making up the vector field
$$F(x, y, z) = (F_1, F_2, F_3) = (-y, x, 0),$$
at the points
$$(1,0,0), (1,1,0), (0,1,0), (-1,1,0), (-1,0,0), (-1,-1,0), (0,-1,0), (1,-1,0).$$
b. Find div(F) = ∇ · F.
c. Find curl(F) = ∇ × F.

Comment: As compared to the vector field in the previous exercise, this vector field F(x, y, z) = (−y, x, 0) is not spreading out at all but does “twirl” in the xy-plane. Again, this is reflected in the divergence and curl.

Exercise 2.3.4. Write out Maxwell’s equations in local coordinates (meaning not in vector notation). You will get eight equations. For example, one of them will be
$$\frac{\partial E_3}{\partial y} - \frac{\partial E_2}{\partial z} = -\frac{\partial B_1}{\partial t}.$$

Exercise 2.3.5. Let c = 1. Show that
$$E = (y - z,\, -2zt,\, -x - z^2)$$
$$B = (-1 - t^2,\, 0,\, 1 + t)$$
$$\rho = -2z$$
$$j = (0,\, 2z,\, 0)$$
satisfy Maxwell’s equations.
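For readers who want to double-check Exercise 2.3.5 by machine, here is a sketch in Python's sympy (assuming sympy is available; the div and curl helpers simply transcribe the formulas of Section 2.2.1) that plugs the given fields into the four equations with c = 1; every printed expression should reduce to zero.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t', real=True)

def div(F):
    return F[0].diff(x) + F[1].diff(y) + F[2].diff(z)

def curl(F):
    return sp.Matrix([F[2].diff(y) - F[1].diff(z),
                      -(F[2].diff(x) - F[0].diff(z)),
                      F[1].diff(x) - F[0].diff(y)])

c = 1
E = sp.Matrix([y - z, -2*z*t, -x - z**2])
B = sp.Matrix([-1 - t**2, 0, 1 + t])
rho = -2*z
j = sp.Matrix([0, 2*z, 0])

print(sp.simplify(div(E) - rho))                        # 0
print(sp.simplify(curl(E) + B.diff(t)).T)               # Matrix([[0, 0, 0]])
print(sp.simplify(div(B)))                              # 0
print(sp.simplify(c**2 * curl(B) - j - E.diff(t)).T)    # Matrix([[0, 0, 0]])
```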

Comment: In the real world, the function ρ and the vector fields E, B, and j are determined from experiment. That is not how I chose the function and vector fields in problem 5. In Chapter 7, we will see that given any function φ(x, y, z, t) and vector field A(x, y, z, t) = (A₁, A₂, A₃), if we set
$$E = -\nabla(\phi) - \frac{\partial A}{\partial t} = -\left( \frac{\partial \phi}{\partial x}, \frac{\partial \phi}{\partial y}, \frac{\partial \phi}{\partial z} \right) - \left( \frac{\partial A_1}{\partial t}, \frac{\partial A_2}{\partial t}, \frac{\partial A_3}{\partial t} \right)$$
$$B = \nabla \times A$$
and, using these particular E and B, set
$$\rho = \nabla \cdot E$$
$$j = \nabla \times B - \frac{\partial E}{\partial t},$$
we will have that ρ, E, B, and j satisfy Maxwell’s equations. For this last problem, I simply chose, almost at random, φ(x, y, z, t) = xz and
$$A = (-yt + x^2,\, x + zt^2,\, -y + z^2 t).$$

The punchline of Chapter 7 is that the converse holds, meaning that if the function ρ and the vector fields E, B, and j satisfy Maxwell’s equations, then there must be a function φ(x, y, z, t) and a vector field A such that E = −∇(φ) − ∂A/∂t and B = ∇ × A. The φ(x, y, z, t) and A are called the potentials.
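The recipe in the comment above is easy to reproduce symbolically. The sketch below (sympy again, redefining the small grad, div, and curl helpers) starts from the stated choices φ = xz and A = (−yt + x², x + zt², −y + z²t), builds E and B, and then reads off ρ and j with c = 1; the output matches the data of Exercise 2.3.5.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t', real=True)

def grad(f):
    return sp.Matrix([f.diff(x), f.diff(y), f.diff(z)])

def div(F):
    return F[0].diff(x) + F[1].diff(y) + F[2].diff(z)

def curl(F):
    return sp.Matrix([F[2].diff(y) - F[1].diff(z),
                      -(F[2].diff(x) - F[0].diff(z)),
                      F[1].diff(x) - F[0].diff(y)])

# The potentials chosen "almost at random" in the comment above.
phi = x*z
A = sp.Matrix([-y*t + x**2, x + z*t**2, -y + z**2*t])

# Fields from the potentials, then the sources they force (with c = 1).
E = -grad(phi) - A.diff(t)
B = curl(A)
rho = div(E)
j = curl(B) - E.diff(t)

print(E.T)         # Matrix([[y - z, -2*t*z, -x - z**2]])
print(B.T)         # Matrix([[-t**2 - 1, 0, t + 1]])
print(rho, j.T)    # -2*z  Matrix([[0, 2*z, 0]])
```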


3

Electromagnetic Waves

Summary: When the current j and the density ρ are zero, both the electric field and the magnetic field satisfy the wave equation, meaning that both fields can be viewed as waves. In the first section, we will review the wave equation. In the second section, we will see why Maxwell’s equations yield these electromagnetic waves, each having speed c.

3.1. The Wave Equation

Waves permeate the world. Luckily, there is a class of partial differential equations (PDEs) whose solutions describe many actual waves. We will not justify why these PDEs describe waves but instead will just state their form. (There are many places to see heuristically why these PDEs have anything at all to do with waves; for example, see [26].)

The one-dimensional wave equation is
$$\frac{\partial^2 y}{\partial t^2} - v^2 \frac{\partial^2 y}{\partial x^2} = 0.$$

Here the goal is to find a function y = y(x, t), where x is position and t is time, that satisfies the preceding equation. Thus the “unknown” is the function y(x, t). For a fixed t, this can describe a function that looks like Figure 3.1.

From the heuristics of the derivation of this equation, the speed of this wave is v.
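A quick symbolic check (a sketch in Python's sympy, anticipating Exercises 3.4.1 and 3.4.2 below) confirms that any twice differentiable profile of the form f(x − vt), and in particular sin(x − vt), satisfies this one-dimensional wave equation.

```python
import sympy as sp

x, t, v = sp.symbols('x t v', real=True)
f = sp.Function('f')

# A generic right-moving profile f(x - v*t) ...
y = f(x - v*t)
print(sp.simplify(y.diff(t, 2) - v**2 * y.diff(x, 2)))       # 0

# ... and the concrete solution sin(x - v*t).
y2 = sp.sin(x - v*t)
print(sp.simplify(y2.diff(t, 2) - v**2 * y2.diff(x, 2)))     # 0
```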

The two-dimensional wave equation is
$$\frac{\partial^2 z}{\partial t^2} - v^2 \left( \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} \right) = 0.$$


Here, the function z(x, y, t) is the unknown, where x and y describe position and t is again time. This could model the motion of a wave over the (x, y)-plane. This wave also has speed v.

[Figure 3.1: the graph of a one-dimensional wave y(x, t) at a fixed time, plotted over the x-axis.]

[Figure 3.2: the graph of a two-dimensional wave z(x, y, t) at a fixed time, plotted over the (x, y)-plane.]

The three-dimensional wave equation is
$$\frac{\partial^2 f}{\partial t^2} - v^2 \left( \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \right) = 0.$$

Once again, the speed is v. Any function that satisfies such a PDE is said to satisfy the wave equation. We expect such functions to have wave-like properties.

The sum of second derivatives ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z² occurs often enough to justify its own notation and its own name, the Laplacian. We use the notation
$$\triangle(f) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.$$


This has a convenient formulation using the nabla notation ∇ = (∂/∂x, ∂/∂y, ∂/∂z). Thinking of ∇ as a vector, we interpret its dot product with itself as
$$\nabla^2 = \nabla \cdot \nabla = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right) \cdot \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right) = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2},$$
and thus we have
$$\nabla^2 = \triangle.$$

Then we can write the three-dimensional wave equation in the following three ways:
$$0 = \frac{\partial^2 f}{\partial t^2} - v^2 \left( \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \right) = \frac{\partial^2 f}{\partial t^2} - v^2 \triangle(f) = \frac{\partial^2 f}{\partial t^2} - v^2 \nabla^2(f).$$

But this wave equation is for functions f(x, y, z, t). What does it mean for a vector field F = (F₁, F₂, F₃) to satisfy a wave equation? We will say that the vector field F satisfies the wave equation
$$\frac{\partial^2 F}{\partial t^2} - v^2 \triangle(F) = 0,$$
if the following three partial differential equations
$$\frac{\partial^2 F_1}{\partial t^2} - v^2 \left( \frac{\partial^2 F_1}{\partial x^2} + \frac{\partial^2 F_1}{\partial y^2} + \frac{\partial^2 F_1}{\partial z^2} \right) = 0$$
$$\frac{\partial^2 F_2}{\partial t^2} - v^2 \left( \frac{\partial^2 F_2}{\partial x^2} + \frac{\partial^2 F_2}{\partial y^2} + \frac{\partial^2 F_2}{\partial z^2} \right) = 0$$
$$\frac{\partial^2 F_3}{\partial t^2} - v^2 \left( \frac{\partial^2 F_3}{\partial x^2} + \frac{\partial^2 F_3}{\partial y^2} + \frac{\partial^2 F_3}{\partial z^2} \right) = 0$$
hold.


3.2. Electromagnetic Waves

In the half-century before Maxwell wrote down his equations, an amazing amount of experimental work on the links between electricity and magnetism had been completed. To some extent, Maxwell put these empirical observations into a more precise mathematical form. These equations’ strength, though, is reflected in that they allowed Maxwell, for purely theoretical reasons, to make one of the most spectacular intellectual leaps ever: Namely, Maxwell showed that electromagnetic waves that move at the speed c had to exist. Maxwell knew that this speed c was the same as the speed of light, leading him to predict that light was just a special type of electromagnetic wave. No one before Maxwell realized this. In the 1880s, Hertz proved experimentally that light was indeed an electromagnetic wave.

We will first see intuitively why Maxwell’s equations lead to the existence of electromagnetic waves, and then we will rigorously prove this fact. Throughout this section, assume that there is no charge (ρ = 0) and no current (j = 0) (i.e., we are working in a vacuum). Then Maxwell’s equations become
$$\nabla \cdot E = 0$$
$$\nabla \times E = -\frac{\partial B}{\partial t}$$
$$\nabla \cdot B = 0$$
$$c^2\, \nabla \times B = \frac{\partial E}{\partial t}.$$

These vacuum equations are themselves remarkable since they show that the waves of the electromagnetic field move in space without any charge or current.

Suppose we change the magnetic field (possibly by moving a magnet). Then ∂B/∂t ≠ 0 and hence the electric vector field E will have non-zero curl. Thus E will have a change in a direction perpendicular to the change in B. But then ∂E/∂t ≠ 0, creating curl in B, which, in turn, will prevent ∂B/∂t from being zero, starting the whole process over again, never stopping.

[Figure 3.3: the interplay of the fields E and B as t increases.]


This is far from showing that we have an actual wave, though. Now we show that the electric field E satisfies the wave equation
$$\frac{\partial^2 E}{\partial t^2} - c^2 \triangle(E) = 0.$$

We have
$$\begin{aligned}
\frac{\partial^2 E}{\partial t^2} &= \frac{\partial}{\partial t}\left( \frac{\partial E}{\partial t} \right) \\
&= \frac{\partial}{\partial t}\left( c^2\, \nabla \times B \right) \\
&= c^2\, \nabla \times \frac{\partial B}{\partial t} \\
&= -c^2\, \nabla \times \nabla \times E \\
&= c^2\, \left( \triangle(E_1), \triangle(E_2), \triangle(E_3) \right),
\end{aligned}$$
which means that the electric field E satisfies the wave equation. Note that the second and fourth lines result from Maxwell’s equations. The fact that ∇ × ∇ × E = −(△(E₁), △(E₂), △(E₃)) is a calculation coupled with Maxwell’s first equation, which we leave for the exercises. The justification for the third equality, which we also leave for the exercises, stems from the fact that the order of taking partial derivatives is interchangeable. The corresponding proof for the magnetic field is similar and is also left as an exercise.
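The vector identity used in the last step (and requested in Exercise 3.4.6) can also be checked symbolically. The sketch below (sympy, with the helpers from the earlier listings) verifies the general identity ∇ × (∇ × E) = ∇(∇ · E) − (△(E₁), △(E₂), △(E₃)); when ∇ · E = 0 this reduces to the relation ∇ × ∇ × E = −(△(E₁), △(E₂), △(E₃)) used above.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t', real=True)
E1, E2, E3 = [sp.Function(name)(x, y, z, t) for name in ('E1', 'E2', 'E3')]
E = sp.Matrix([E1, E2, E3])

def grad(f):
    return sp.Matrix([f.diff(x), f.diff(y), f.diff(z)])

def div(F):
    return F[0].diff(x) + F[1].diff(y) + F[2].diff(z)

def curl(F):
    return sp.Matrix([F[2].diff(y) - F[1].diff(z),
                      -(F[2].diff(x) - F[0].diff(z)),
                      F[1].diff(x) - F[0].diff(y)])

def laplacian(f):
    return f.diff(x, 2) + f.diff(y, 2) + f.diff(z, 2)

lap_E = sp.Matrix([laplacian(E1), laplacian(E2), laplacian(E3)])

# curl(curl E) - [grad(div E) - componentwise Laplacian] should vanish identically.
identity = curl(curl(E)) - (grad(div(E)) - lap_E)
print(sp.simplify(identity).T)    # Matrix([[0, 0, 0]])
```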

3.3. The Speed of Electromagnetic Waves Is Constant

3.3.1. Intuitive Meaning

We have just seen there are electromagnetic waves moving at speed c, when there is no charge and no current. In this section we want to start seeing that the existence of these waves, moving at that speed c, strikes a blow to our common-sense notions of physics, leading, in the next chapter, to the heart of the Special Theory of Relativity.

Consider a person A. She thinks she is standing still. A train passes by, going at the constant speed of 60 miles per hour. Let person B be on the train. B legitimately can think that he is at the origin of the important coordinate system, thus thinking of himself as standing still. On this train, B rolls a ball forward at, say, 3 miles per hour, with respect to the train. Observer A, though, would say that the ball is moving at 3 + 60 miles per hour. So far, nothing controversial at all.


Let us now replace the ball with an electromagnetic wave. Suppose person B turns it on and observes it moving in the car. If you want, think of B as turning on a flashlight. B will measure its speed as some c miles per hour. Observer A will also see the light. Common sense tells us, if not screams at us, that A will measure the light as traveling at c + 60 miles per hour.

But what do Maxwell’s equations tell us? The speed of an electromagnetic wave is the constant c that appears in Maxwell’s equations. But the value of c does not depend on the initial choice of coordinate system. The (x, y, z, t) for person A and the (x, y, z, t) for person B have the same c in Maxwell. Of course, the number c in the equations is possibly only a “constant” once a coordinate system is chosen. If this were the case, then if person A measured the speed of an electromagnetic wave to be some c, then the corresponding speed for person B would be c − 60, with this number appearing in person B’s version of Maxwell’s equations. This is now an empirical question about the real world. Let A and B each measure the speed of an electromagnetic wave. What physicists find is that for both observers the speed is the same. Thus, in the preceding train, the speed of an electromagnetic wave for both A and B is the same c. This is truly bizarre.

3.3.2. Changing Coordinates for the Wave Equation

Suppose we again have two people, A and B. Let person B be traveling at a constant speed α with respect to A, with A’s and B’s coordinate systems exactly matching up at time t = 0.

To be more precise, we think of person A as standing still, with coordinates x′ for position and t′ for time, and of person B as moving to the right at speed α, with position coordinate x and time coordinate t. If the two coordinate systems line up at time t = t′ = 0, then classically we would expect
$$x' = x + \alpha t, \qquad t' = t,$$
or, equivalently,
$$x = x' - \alpha t, \qquad t = t'.$$


[Figure 3.4: the moving frame, with coordinates (x, y, z), traveling at speed α relative to the lab frame, with coordinates (x′, y′, z′).]

This reflects our belief that the way a stopwatch works should not be influenced by how fast it is moving. (This belief will also be shattered by the Special Theory of Relativity.)

Suppose in the reference frame for B we have a wave y(x, t) satisfying
$$\frac{\partial^2 y}{\partial t^2} - v^2 \frac{\partial^2 y}{\partial x^2} = 0.$$

In B’s reference frame, the speed of the wave is v. From calculus, this speed v must be equal to the rate of change of x with respect to t, or in other words v = dx/dt. This in turn forces
$$\frac{\partial y}{\partial t} = -v\frac{\partial y}{\partial x},$$
as we will see now. Fix a value of y(x, t) = y₀. Then we have some x₀ such that
$$y_0 = y(x_0, 0).$$
This means that for all x and t such that x₀ = x − vt, then y₀ = y(x, t). The speed of the wave is how fast this point with y-coordinate y₀ moves along the x-axis with respect to t. Then

$$0 = \frac{dy}{dt} = \frac{\partial y}{\partial x}\frac{dx}{dt} + \frac{\partial y}{\partial t}\frac{dt}{dt} = \frac{\partial y}{\partial x}\frac{dx}{dt} + \frac{\partial y}{\partial t} = v\frac{\partial y}{\partial x} + \frac{\partial y}{\partial t},$$
giving us that ∂y/∂t = −v ∂y/∂x.

Person A is looking at the same wave but measures the wave as having speed v + α. We want to see explicitly that under the appropriate change of coordinates this indeed happens. This is an exercise in the chain rule, which is critically important in these arguments.


Our wave y(x, t) can be written as a function of x′ and t′, namely, as
$$y(x, t) = y(x' - \alpha t', t').$$
We want to show that this function satisfies
$$\frac{\partial^2 y}{\partial t'^2} - (v + \alpha)^2 \frac{\partial^2 y}{\partial x'^2} = 0.$$

The key will be that
$$\frac{\partial}{\partial x'} = \frac{\partial x}{\partial x'}\frac{\partial}{\partial x} + \frac{\partial t}{\partial x'}\frac{\partial}{\partial t} = \frac{\partial}{\partial x}$$
$$\frac{\partial}{\partial t'} = \frac{\partial x}{\partial t'}\frac{\partial}{\partial x} + \frac{\partial t}{\partial t'}\frac{\partial}{\partial t} = -\alpha\frac{\partial}{\partial x} + \frac{\partial}{\partial t}$$
by the chain rule.

We start by showing that

$$\frac{\partial y}{\partial t'} = -\alpha\frac{\partial y}{\partial x} + \frac{\partial y}{\partial t} \qquad \text{and} \qquad \frac{\partial y}{\partial x'} = \frac{\partial y}{\partial x},$$
whose proofs are left for the exercises.

Turning to second derivatives, we can similarly show that

$$\frac{\partial^2 y}{\partial t'^2} = \alpha^2\frac{\partial^2 y}{\partial x^2} - 2\alpha\frac{\partial^2 y}{\partial x\,\partial t} + \frac{\partial^2 y}{\partial t^2} \qquad \text{and} \qquad \frac{\partial^2 y}{\partial x'^2} = \frac{\partial^2 y}{\partial x^2},$$
whose proofs are also left for the exercises.

Knowing that ∂y/∂t = −v ∂y/∂x, we have

$$\frac{\partial^2 y}{\partial x\,\partial t} = -v\frac{\partial^2 y}{\partial x^2}.$$
This allows us to show (left again for the exercises) that
$$\frac{\partial^2 y}{\partial t'^2} - (v + \alpha)^2\frac{\partial^2 y}{\partial x'^2} = 0,$$
which is precisely what we desired.
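The whole computation can also be confirmed symbolically. The sketch below (sympy; the symbol names xp and tp are stand-ins for x′ and t′) takes a wave of speed v in B's frame, written as a profile f(x − vt) with x = x′ − αt′ and t = t′, and checks that in A's coordinates it satisfies the wave equation with speed v + α.

```python
import sympy as sp

xp, tp, v, alpha = sp.symbols('xp tp v alpha', real=True)   # xp = x', tp = t'
f = sp.Function('f')

# In B's frame the wave is y = f(x - v*t); substitute x = xp - alpha*tp, t = tp.
Y = f((xp - alpha*tp) - v*tp)

# The primed wave equation with speed v + alpha is satisfied identically.
print(sp.simplify(Y.diff(tp, 2) - (v + alpha)**2 * Y.diff(xp, 2)))   # 0
```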


Classically, the speed of a wave depends on the coordinate system that is being used. Maxwell’s equations tell us that this is not true, at least for electromagnetic waves. Either Maxwell or the classical theory must be wrong. In the next chapter, we will explore the new theory of change of coordinates implied by Maxwell’s equations, namely, the Special Theory of Relativity.

3.4. Exercises

Exercise 3.4.1. Show that the functions y₁(x, t) = sin(x − vt) and y₂(x, t) = sin(x + vt) are solutions to the wave equation
$$\frac{\partial^2 y}{\partial t^2} - v^2\frac{\partial^2 y}{\partial x^2} = 0.$$

Exercise 3.4.2. Let f(u) be a twice differentiable function. Show that both f(x − vt) and f(x + vt) are solutions to the wave equation. Then interpret the solution f(x − vt) as the graph y = f(u) moving to the right at a speed v and the solution f(x + vt) as the graph y = f(u) moving to the left at a speed v.

[Figure 3.5: the graph y = f(x − vt) moving to the right at speed v and the graph y = f(x + vt) moving to the left at speed v.]

Exercise 3.4.3. Suppose that f₁(x, t) and f₂(x, t) are solutions to the wave equation ∂²f/∂t² − v²△(f) = 0. Show that λ₁f₁(x, t) + λ₂f₂(x, t) is another solution, where λ₁ and λ₂ are any two real numbers.

Comment: When a differential equation has the property that λ₁f₁(x, t) + λ₂f₂(x, t) is a solution whenever f₁(x, t) and f₂(x, t) are solutions, we say that the differential equation is homogeneous linear.

Exercise 3.4.4. Show that
$$\frac{\partial}{\partial t}(\nabla \times F) = \nabla \times \frac{\partial F}{\partial t}.$$

Exercise 3.4.5. Let B(x, y, z, t) be a magnetic field when there is no charge (ρ = 0) and no current (j = 0). Show that B satisfies the wave equation ∂²B/∂t² − c²△(B) = 0. First prove this following the argument in the text. Then close this book and recreate the argument from memory. Finally, in a few hours, go through the argument again.


Exercise 3.4.6. Suppose that ∇ · E = 0. Show that
$$\nabla \times \nabla \times E = -(\triangle(E_1), \triangle(E_2), \triangle(E_3)).$$
As a word of warning, this is mostly a long calculation, though at a critical point you will need to use that ∇ · E = 0.

Exercise 3.4.7. Let
$$E = (0, \sin(x - ct), 0).$$
Assume that there is no charge (ρ = 0) and no current (j = 0). Find a corresponding magnetic field B and then show that both E and B satisfy the wave equation.

Exercise 3.4.8. Using the notation from Section 3.3, show that
$$\frac{\partial y}{\partial t'} = -\alpha\frac{\partial y}{\partial x} + \frac{\partial y}{\partial t} \qquad \text{and} \qquad \frac{\partial y}{\partial x'} = \frac{\partial y}{\partial x}.$$

Exercise 3.4.9. Using the notation from Section 3.3, show that
$$\frac{\partial^2 y}{\partial t'^2} = \alpha^2\frac{\partial^2 y}{\partial x^2} - 2\alpha\frac{\partial^2 y}{\partial x\,\partial t} + \frac{\partial^2 y}{\partial t^2} \qquad \text{and} \qquad \frac{\partial^2 y}{\partial x'^2} = \frac{\partial^2 y}{\partial x^2}.$$

Exercise 3.4.10. Using the notation from Section 3.3, show that

∂²y/∂t′² − (v + α)² ∂²y/∂x′² = 0.

(Since we know that ∂y/∂t = −v ∂y/∂x, we know that ∂²y/∂x∂t = −v ∂²y/∂x².)


4 Special Relativity

Summary: We develop the basics of special relativity. Key is determining the allowable coordinate changes. This will let us show in the next chapter not just how but why magnetism and electricity must be linked.

4.1. Special Theory of Relativity

The physics and mathematics of Maxwell’s equations in the last chapter were worked out during the 1800s. In these equations, there is no real description for why electricity and magnetism should be related. Instead, the more practical description of the how is treated. Then, in 1905, came Albert Einstein’s “On the Electrodynamics of Moving Bodies” [19], the paper that introduced the world to the Special Theory of Relativity. This paper showed the why of Maxwell’s equations (while doing far more).

The Special Theory of Relativity rests on two assumptions, neither at first glance having much to do with electromagnetism.

Assumption I: Physics must be the same in all frames of reference moving at constant velocities with respect to each other.

Assumption II: The speed of light in a vacuum is the same in all frames of reference that move at constant velocities with respect to each other.

The first assumption is quite believable, saying in essence that how you choose your coordinates should not affect the underlying physics of what you are observing. It leads to an interesting problem, though, of how to translate from one coordinate system to another. It is the second assumption that will drastically restrict how we are allowed to change coordinate systems.

It is also this second assumption that is, at first glance, completely crazy. In the last chapter we saw that this craziness is already hidden in Maxwell’s equations. Assumption II is an empirical statement, one that can be tested. In all experiments ever done (and there have been many), Assumption II holds. The speed of light is a constant.

But Einstein in 1905 had not done any of these experiments. How did he ever come up with such an idea? The answer lies in Maxwell’s equations. As we have seen, Maxwell’s equations give rise to electromagnetic waves that move at c, where c is the speed of light. But this speed is independent of the observer. The electromagnetic waves in Maxwell’s equations satisfy Assumption II. This fact wreaks havoc on our common sense notions of space and time.

4.2. Clocks and Rulers

To measure anything, we must first choose units of measure. For now, we want to describe where something happens (its spatial coordinates) and when it happens (its time). Fixing some point in space as the origin and some time as time zero, choosing units of length and of time, and finally choosing three axes in space, we see that we can describe any event in terms of four numbers (x, y, z, t), with (x, y, z) describing the position in space and t the time.

Suppose we set up two coordinate systems. The first we will call the laboratory frame, denoting its coordinates by

(x1, y1, z1, t1).

Think of this frame as motionless. Now suppose we have another frame of reference, which we will call the moving frame. Its coordinates are denoted by

(x2, y2, z2, t2).

Suppose this new frame is moving along at a constant speed of v units/sec in a direction parallel to the x-axis, with respect to the lab frame. Further we suppose that the two frames agree at some specified event.

You can think of the moving frame as being in a train that is chugging along in the lab.

Let c be the speed of light. Assumption II states that this speed c must be the same in both reference frames. Suppose we put a mirror in the moving frame at the point

x2 = 0
y2 = c
z2 = 0


Figure 4.1 (the lab frame and the moving frame, shown at times t = 0 and t = 1)

At t = 0, shine a flashlight, in the moving frame, from the moving frame’s origin toward the mirror. (The flashlight is stationary in the moving frame but appears to be moving at speed v in the stationary frame.) The light bounces off the mirror, right back to the moving frame’s origin.

Figure 4.2 (the moving frame, with the mirror a distance c above the origin)

It takes one second for the light to go from the origin to the mirror and another second to return. Thus the light will travel a distance of 2c units over a time of 2 seconds.


Thus the velocity of light will be what we expect:

speed of light = distance/time = 2c/2 = c.

What does someone in the lab frame see? In the lab frame, at time t = 0, suppose that the flashlight is turned on at the place

x1 = −v
y1 = 0
z1 = 0

At this time, the mirror is at

x1 = −v
y1 = c
z1 = 0

The light will appear to move along this path

Figure 4.3 (the light’s slanted path as seen in the lab frame, with the mirror a height c up and the frame moving at speed v)

in the lab frame. Common sense tells us that the light should still take 2 seconds to complete its path, a path of length 2√(v² + c²). It appears that the speed of light in the lab frame should be

speed of light = distance/time = 2√(v² + c²)/2 = √(v² + c²).

This cannot happen under Assumption II.

This leads to the heart of the problem. We need to know how to translate from one of our coordinate systems to the other. Assumption II will require us to allow surprising and non-intuitive translations.


4.3. Galilean Transformations

There is no mystery that people use different coordinate systems in different places and times. Special relativity highlights, though, that how one coordinate system can be changed into another is at the heart of mechanics.

Here we will write the classical change of coordinates (called Galilean transformations) via matrix transformations. Later in the chapter, we will describe the analogous transformations for special relativity (the Lorentz transformations), also via matrices.

We start with two reference frames, one moving with speed v with respect to the other. We will just follow our nose and ignore any condition on the speed of light.

Assume, as would seem obvious, that we can synchronize the times in the two frames, meaning that

t1 = t2.

Suppose that at time zero, the two origins are at the same point but that one frame is moving in the x-direction with speed v with respect to the other frame. Then the spatial coordinates are related by:

x2 = x1 + vt
y2 = y1
z2 = z1.

We are in a Newtonian world, where the origin in the moving frame is moving along the x1-axis in the lab frame. (As we saw earlier, this type of change of coordinates violates the assumption that the speed of light must be constant in all frames of reference.) We can write this change of coordinates as the matrix transformation

( x2 )   ( 1 0 0 v ) ( x1 )
( y2 ) = ( 0 1 0 0 ) ( y1 )
( z2 )   ( 0 0 1 0 ) ( z1 )
( t2 )   ( 0 0 0 1 ) ( t1 ).

This format suggests how to write down changes of coordinates in general. We keep with the assumption that the times in both coordinate systems are equal and synchronized (t1 = t2) and that at time zero, the two origins are at the same point. We further assume that a unit of length in one reference frame is the same as a unit of length in the other and that angles are measured the same. Then all allowable changes of coordinates take the form

( x2 )   ( a11 a12 a13 v1 ) ( x1 )
( y2 ) = ( a21 a22 a23 v2 ) ( y1 )
( z2 )   ( a31 a32 a33 v3 ) ( z1 )
( t2 )   (  0   0   0   1 ) ( t1 ).

In order for the lengths and angles to be preserved in the two reference frames, we need the dot products between vectors in the two frames to be preserved when transformed. This means that we need the matrix

    ( a11 a12 a13 )
A = ( a21 a22 a23 )
    ( a31 a32 a33 )

to be in the orthogonal group O(3,R), which is just a fancy way of saying the following:

Recall that the transpose of the matrix A is

     ( a11 a21 a31 )
Aᵀ = ( a12 a22 a32 )
     ( a13 a23 a33 ).

Then A ∈ O(3,R) means that for all vectors (x  y  z) and (u  v  w), we have

(x  y  z) · (u  v  w)ᵀ = (x  y  z) Aᵀ A (u  v  w)ᵀ.

These orthogonal transformations are the only allowable changes of coordinates for transforming from one frame of reference to another frame that is moving with velocity (v1, v2, v3) with respect to the first, under the assumption that the time coordinate is “independent” of the spatial coordinates.
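As a quick numerical illustration (a sketch, not from the text; it uses numpy, and the rotation angle, relative velocity, and test vectors are arbitrary sample values), one can check that a rotation about the z-axis satisfies AᵀA = I and that the resulting 4 × 4 Galilean matrix preserves the length of an equal-time displacement vector.

```python
# Numerical illustration (numpy): a rotation about the z-axis lies in O(3,R), so
# the 4x4 Galilean change of coordinates built from it preserves spatial distances
# between simultaneous events.  All numbers below are arbitrary sample inputs.
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

print(np.allclose(A.T @ A, np.eye(3)))       # True: A is in O(3,R)

v = np.array([2.0, 0.0, 1.0])                 # sample relative velocity
G = np.eye(4)
G[:3, :3] = A
G[:3, 3] = v                                  # Galilean 4x4 matrix acting on (x, y, z, t)

p, q = np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.5, 2.0])
d = np.append(p - q, 0.0)                     # displacement between two equal-time events
print(np.allclose(np.linalg.norm((G @ d)[:3]), np.linalg.norm(p - q)))  # True
```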

4.4. Lorentz Transformations

4.4.1. A Heuristic Approach

(While the derivation that follows is standard, I have followed the presentation in chapter 3 of A. P. French’s Special Relativity [25], a source I highly recommend.)

The Lorentz transformations are the correct type of changes of coordinates that will satisfy the Special Relativity requirement that the speed of light is a constant. H. A. Lorentz wrote down these coordinate changes in his papers “Michelson’s Interference Experiment” and “Electromagnetic Phenomena in a System Moving with any Velocity Less than that of Light,” both reprinted in [19] and both written before Einstein’s Special Theory of Relativity, in an attempt to understand the Michelson-Morley experiments (which can now be interpreted as experiments showing that the speed of light is independent of reference frame).

Suppose, as before, that we have two frames: one we call the lab frame (again with coordinates labeled (x1, y1, z1, t1)) and the other the moving frame (with coordinates (x2, y2, z2, t2)). Suppose this second frame moves at a constant speed of v units/sec in a direction parallel to the x-axis, with respect to the lab frame. Further, we suppose that the two frames coincide at time t1 = t2 = 0. We now assume that the speed of light is constant in every frame of reference and hence is a well-defined number, to be denoted by c. We want to justify that the coordinates of the two systems are related by

x2 = γ x1 − γ vt1
y2 = y1
z2 = z1
t2 = −γ(v/c²) x1 + γ t1,

where

γ = 1/√(1 − (v/c)²).

It is certainly not clear why this is at all reasonable. But if this coordinate change is correct, then a major problem will occur if the velocity is greater than the speed of light, as γ will then be an imaginary number. Hence, if the previous transformations are indeed true, we can conclude that the speed of light must be an upper bound for velocity.

We can rewrite the preceding change of coordinates in the following matrix form:

( x2 )   (    γ     0  0  −vγ ) ( x1 )
( y2 ) = (    0     1  0   0  ) ( y1 )
( z2 )   (    0     0  1   0  ) ( z1 )
( t2 )   ( −γv/c²   0  0   γ  ) ( t1 ).
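Before justifying these formulas, here is a small numerical sanity check (a sketch, not from the text; it uses numpy, with arbitrary sample values c = 1 and v = 0.6): the transformation sends the light ray x1 = ct1 to a path with x2 = ct2, so the light still moves at speed c in the second frame.

```python
# Numerical sanity check (numpy): the boost above sends the light ray x1 = c*t1
# to a path satisfying x2 = c*t2, so the speed of light is the same in both frames.
# Units with c = 1; the frame speed and sample times are arbitrary choices.
import numpy as np

c, v = 1.0, 0.6
gamma = 1.0 / np.sqrt(1.0 - (v / c) ** 2)

t1 = np.linspace(0.0, 5.0, 6)
x1 = c * t1                               # a light ray in frame 1

x2 = gamma * x1 - gamma * v * t1
t2 = -gamma * (v / c ** 2) * x1 + gamma * t1

print(np.allclose(x2, c * t2))            # True: still moving at speed c in frame 2
```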

We now give a heuristic justification for these transformations, critically using both of our assumptions for special relativity.

First, since the two reference frames are only moving with respect to each other along their x-axes, it is reasonable to assume that the two coordinate systems agree in the y and z directions, motivating our letting y2 = y1 and z2 = z1, which in turn allows us to ignore the y and z coordinates. We make the additional assumption that there are constants a and b such that

x2 = ax1 + bt1.

Thus, we are explicitly assuming that any change of coordinates should be linear. Now consider Assumption I. An observer in frame 1 thinks that frame 2 is moving to the right at speed v. But an observer in frame 2 will think of themselves as standing still, with frame 1 whizzing by to the left at speed v. Assumption I states that there is intrinsically no way for anyone to be able to distinguish the two frames, save for the directions they are moving with respect to each other. In particular, neither observer can claim to be the frame that is “really” standing still. This suggests that

x1 = ax2 − bt2

with the minus sign in front of the b coefficient reflecting that the two frames are moving in opposite directions. Here we are using Assumption I, namely, that we cannot distinguish the two frames, save for how they are moving with respect to each other.

Now to start getting a handle on a and b. Consider how frame 2’s origin x2 = 0 is moving in frame 1. To someone in frame 1, this point must be moving at speed v to the right. Since x2 = ax1 + bt1, if x2 = 0, we must have x1 = −(b/a)t1 and hence

v = dx1/dt1 = −b/a.

Thus we have

x2 = ax1 − avt1

x1 = ax2 + avt2.

Assumption II now moves to the fore. The speed of light is the same in all reference frames. Suppose we turn on a flashlight at time t1 = t2 = 0 at the origin of each frame, shining the light along each frame’s x-axis. Then we have, in both frames, that the path of the light is described by

x1 = ct1

x2 = ct2.

We have

ct2 = x2 = ax1 − avt1 = act1 − avt1 = a(c − v)t1

and

ct1 = x1 = ax2 + avt2 = act2 + avt2 = a(c + v)t2.


Dividing through by c in the second equation and then plugging in for t1 in the first equation, we get

ct2 = (a²(c² − v²)/c) t2,

which means that

a² = 1/(1 − (v/c)²),

finally yielding that

a = 1/√(1 − (v/c)²) = γ.

We leave the proof that

t2 = −γ(v/c²) x1 + γ t1

as an exercise at the end of the chapter.
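As a symbolic consistency check of the transformation we have just found (a sketch, not from the text; it uses sympy and is not the derivation asked for in the exercise): with a = γ, the formulas for x2 and t2 do satisfy the reverse relation x1 = γx2 + γvt2 demanded by Assumption I.

```python
# Symbolic consistency check (sympy): with a = gamma, the forward formulas
#   x2 = gamma*x1 - gamma*v*t1,   t2 = -gamma*(v/c**2)*x1 + gamma*t1
# satisfy the "reverse" relation x1 = gamma*x2 + gamma*v*t2.
import sympy as sp

x1, t1, c, v = sp.symbols("x1 t1 c v", positive=True)
gamma = 1 / sp.sqrt(1 - (v / c) ** 2)

x2 = gamma * x1 - gamma * v * t1
t2 = -gamma * (v / c**2) * x1 + gamma * t1

print(sp.simplify(gamma * x2 + gamma * v * t2 - x1))   # prints 0
```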

4.4.2. Lorentz Contractions and Time Dilations

(This section closely follows [25].)

The goal of this section is to show how length and time changes are not invariant but depend on the frame in which they are measured. Suppose you are in a reference frame. Naturally, you think you are standing still. A friend whizzes by, in the positive x direction, at velocity v. Each of you has a meter stick and a stopwatch. As she moves by you, you notice that her meter stick is shorter than yours and her second hand is moving more slowly than your second hand. You call for her to stop and stand next to you and discover that now the two meter sticks exactly line up and that the two stopwatches tick exactly the same. Something strange is going on. In this section, we use the previous Lorentz transformations to explain this weirdness in measurements of length and time.

We start with length. Suppose we measure, in our reference frame, the distance between a point at x1 and a point at x2 with both points on the x-axis. The length is simply

l = x2 − x1,

assuming that x1 < x2. But what is this in the moving frame? We have to make the measurement in the moving frame when the time is constant. So suppose in the moving frame that the point at x1 is at x′1 at time t′ and the point at x2 is at x′2 at time t′. In the moving frame the distance between the points is

l′ = x′2 − x′1.


Via the Lorentz transformations, we can translate between the coordinates in the two frames, using that x = γ x′ + vγ t′, where γ = 1/√(1 − (v/c)²). Then we have

x2 = γ x′2 + vγ t′
x1 = γ x′1 + vγ t′.

Hence

l = x2 − x1 = γ(x′2 − x′1) = γ l′.

Lengths must vary.

Now to see that time must also vary. Here we fix a position in space and make two time measurements. In our frame, where we think we are standing still, suppose at some point x we start a stopwatch at time t1 and stop it at t2. We view the elapsed time as simply

Δt = t2 − t1.

Let the times in the moving frame be denoted by t′1 and t′2, in which case the change in time in this frame is

Δt′ = t′2 − t′1.

Using that t′ = γ(v/c²)x + γ t, we have that

t′1 = γ(v/c²)x + γ t1
t′2 = γ(v/c²)x + γ t2,

which gives us

Δt′ = t′2 − t′1 = γ(t2 − t1) = γ Δt.

Note that since v < c, we have γ > 1. Thus l′ < l while Δt′ > Δt, which is why we use the terms “Lorentz contraction” and “time dilation.”
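For a concrete feel of the sizes involved, here is a small numerical illustration (not from the text; the frame speed v = 0.6c is an arbitrary sample value): γ is about 1.25, so a one-meter stick contracts to about 0.8 meters, while a one-second interval dilates to about 1.25 seconds.

```python
# Numerical illustration of Lorentz contraction and time dilation.
# The frame speed v = 0.6c is an arbitrary sample value.
import math

c = 299_792_458.0            # speed of light in m/s
v = 0.6 * c
gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)

l, dt = 1.0, 1.0             # one meter and one second, measured in our frame
print(gamma)                 # ≈ 1.25
print(l / gamma)             # l'  ≈ 0.8   (Lorentz contraction)
print(gamma * dt)            # Δt' ≈ 1.25  (time dilation)
```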

4.4.3. Proper Time

There is no natural coordinate system. There is no notion of “true” time or “true” position. Still, for a given particle moving at a constant velocity, there is one measure of time that can be distinguished from all others, namely, the time on a coordinate system when the particle’s velocity is simply zero. Thus, for this particular coordinate system, the particle stays at the origin for all time. The measure of time for this coordinate system has its own name: proper time.

From the previous subsection, if we use t, x, y, and z to denote the coordinate system for which a particle remains at the origin, and t′, x′, y′, and z′ to denote a coordinate system moving with speed v with respect to the first coordinate system, we have

Δt′ = t′2 − t′1 = Δt/√(1 − (v/c)²) = γ(t2 − t1) = γ Δt.

The use of proper time will be critical when we define the relativistic version of momentum.

4.4.4. The Special Relativity Invariant

Here is how the world works. Start with two friends, both having identical meter sticks and identical stopwatches. Now suppose each is moving at a speed v with respect to the other. (Thus both think that they are perfectly still and it is their friend who is moving.) Both think that their friend’s meter stick has shrunk while their friend’s stopwatch is running slow. The very units of length and time depend on which reference frame you are in. These units are not invariants but are relative to what reference frame they are measured in. This is why this theory is called the Special Theory of Relativity.

This does not mean that everything is relative. Using Lorentz transformations, we can translate the lengths and times in one reference frame to those in another. But there is a quantity that cannot change, no matter the reference frame. There is a number that is an invariant.

Suppose we have a coordinate system (x1, y1, z1, t1), where (x1, y1, z1) are the three space coordinates and t1 is the time coordinate. Let there be another coordinate system moving to the right of the first, in the x-direction, at speed v, with corresponding coordinates (x2, y2, z2, t2). We know that

x2 = γ x1 − γ vt1
y2 = y1
z2 = z1
t2 = −γ(v/c²) x1 + γ t1,

where

γ = 1/√(1 − (v/c)²).

The number that is invariant is

c²t² − x² − y² − z².


One of the exercises asks you to show explicitly for the previous transformation that

c²t2² − x2² − y2² − z2² = c²t1² − x1² − y1² − z1².

Note that time is playing a different role than the space coordinates.

4.4.5. Lorentz Transformations, the Minkowski Metric, and Relativistic Displacement

We want to systematize the allowable changes of coordinates. In the previous discussion, we have a very special type of change of coordinates. While it is certainly reasonable to assume that we can always choose our space coordinates so that the relative movement between reference frames is in the x-direction, this does involve a choice.

We expect that two coordinate systems moving at a constant velocity with respect to each other are related by a four-by-four matrix A such that

( t2 )     ( t1 )
( x2 ) = A ( x1 )
( y2 )     ( y1 )
( z2 )     ( z1 )

What we need to do is find conditions on A that are compatible with relativity theory.

As seen in the previous subsection, the key is that we want

c²(change in time)² − (change in distance)²

to be a constant.

Definition 4.4.1. The map

ρ : R⁴ × R⁴ → R

defined by setting

ρ((t2, x2, y2, z2), (t1, x1, y1, z1)) = (t2  x2  y2  z2) · diag(c², −1, −1, −1) · (t1  x1  y1  z1)ᵀ
                                     = c²t1t2 − x1x2 − y1y2 − z1z2

is the Minkowski metric. Sometimes this is also called the Lorentz metric.


This suggests that in special relativity the allowable changes of coordinates will be given by 4 × 4 matrices A that satisfy

Aᵀ · diag(c², −1, −1, −1) · A = diag(c², −1, −1, −1),

that is, the matrices that preserve the Minkowski metric.

More succinctly, we say:

Definition 4.4.2. An invertible 4 × 4 matrix A is a Lorentz transformation if for all vectors v1, v2 ∈ R⁴, we have

ρ(v2,v1) = ρ(Av2, Av1).
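As a check that the boost from Section 4.4.1 really is a Lorentz transformation in this sense, the sketch below (not from the text; it uses sympy and writes the boost in the (t, x, y, z) ordering used for the Minkowski metric) verifies that Aᵀ · diag(c², −1, −1, −1) · A = diag(c², −1, −1, −1).

```python
# Symbolic check (sympy) that the x-direction boost preserves the Minkowski metric.
import sympy as sp

c, v = sp.symbols("c v", positive=True)
gamma = 1 / sp.sqrt(1 - (v / c) ** 2)

# Boost in the x-direction, written in the (t, x, y, z) ordering used here.
A = sp.Matrix([[gamma,      -gamma * v / c**2, 0, 0],
               [-gamma * v,  gamma,            0, 0],
               [0,           0,                1, 0],
               [0,           0,                0, 1]])

G = sp.diag(c**2, -1, -1, -1)          # matrix of the Minkowski metric

print(sp.simplify(A.T * G * A - G))    # prints the zero matrix
```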

The function ρ : R⁴ × R⁴ → R is the special relativistic way of measuring “distance” and thus can be viewed as an analog of the Pythagorean theorem, which gives the distance in Euclidean space. Classically, the vector that measures a particle’s movement over time, or its displacement, is given by the changes in the spatial coordinates (Δx, Δy, Δz). Relativistically, this does not seem to be a natural vector at all. Instead, motivated by the Minkowski metric, we make the following definition:

Definition 4.4.3. The relativistic displacement vector is

(cΔt, Δx, Δy, Δz).

Note that the Minkowski length of the displacement vector

ρ((cΔt, Δx, Δy, Δz), (cΔt, Δx, Δy, Δz))

is independent of which coordinate system is chosen.

There is one significant difference between the traditional “Pythagorean” metric and our new Minkowski metric, in that we can have “negative” lengths. Consider

ρ(v, v) = c²t² − x² − y² − z².

This number can be positive, negative, or even zero.

Definition 4.4.4. A vector v = (t, x, y, z) is space-like if

ρ(v, v) < 0,

time-like if

ρ(v, v) > 0,

and light-like if

ρ(v, v) = 0.

Figure 4.4 (three regions in (t, x, y)-space: space-like vectors with ρ(v, v) < 0, light-like vectors with ρ(v, v) = 0, and time-like vectors with ρ(v, v) > 0)

(From Figure 4.4, we can see why people say that light travels on the “light cone.”) A natural question is why different vectors have the names “space-like,” “time-like,” and “light-like.” We will now build the intuitions behind these names.

Suppose in a coordinate frame we are given two points: an event A with coordinates (0,0,0,0) and an event B with coordinates (t,0,0,0) with t > 0. We can reasonably interpret this as both A and B being at the origin of space, but with B occurring at a later time. Note that event B is time-like.

Now suppose that event B is still time-like, but now with coordinates (t, x, 0, 0), with, say, both t and x positive. Thus we know that c²t² − x² > 0 and hence that ct > x. We want to show that there is a Lorentz transformation leaving event A at the origin but changing event B’s coordinates to (t′, 0, 0, 0), meaning that in some frame of reference, moving at some velocity v with respect to the original frame, event B will seem to be standing still. Thus we must find a v such that

x′ = γ x − γ vt = 0,

where γ = 1/√(1 − (v/c)²).

Consider Figure 4.5. Since B is time-like, we have

x = λt

for some constant λ < c. If we set v = λ, we get our result. Thus a point is time-like if we can find a coordinate system such that the event is not moving in space.


Figure 4.5 (the (x, t)-plane with the lines x = ct and x = −ct and the time-like event (t1, x1, 0, 0) inside the light cone)

Space-like events have an analogous property. Start with our event A at the origin (0,0,0,0), but now let event B have coordinates (0, x, y, z). In this coordinate frame, we can interpret this as event A and event B happening at the same time but at different places in space. Note that event B is space-like.

Now let event B still be space-like, but now with coordinates (t, x, 0, 0), with both t and x positive. Thus we know that ct < x. Similarly to before, we want to find a Lorentz transformation leaving the event A at the origin but changing the event B’s coordinates to (0, x, 0, 0), meaning that in some frame of reference, moving at some velocity v with respect to the original frame, the event B will occur at exactly the same time as event A. Thus we want to find a velocity v such that

t′ = −γ(v/c²)x + γ t = 0.

Since B is space-like, we have

x = λt

for some constant λ > c.

To get t′ = 0, we need

(v/c²) x = t

or, plugging in x = λt,

(v/c²) λt = t.


Figure 4.6 (the (x, t)-plane with the lines x = ct and x = −ct and the space-like event (t1, x1, 0, 0) outside the light cone)

Thus we need to have

v = c²/λ.

Here it is critical that event B is space-like, since that forces λ > c, which in turn means that v is indeed less than the speed of light c.
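A small numerical check of this argument (not from the text; it uses numpy, with arbitrary sample values c = 1, λ = 2.5, t = 3): boosting with v = c²/λ does make the space-like event simultaneous with the origin, while its Minkowski invariant is unchanged.

```python
# Numerical check (numpy): for a space-like event with x = lambda*t, lambda > c,
# boosting with v = c**2/lambda makes the event simultaneous with the origin.
# Units with c = 1; lambda and t are arbitrary sample values.
import numpy as np

c, lam, t = 1.0, 2.5, 3.0
x = lam * t                           # a space-like event (t, x, 0, 0) since lam > c
v = c**2 / lam                        # the boost speed found above; note v < c
gamma = 1.0 / np.sqrt(1.0 - (v / c) ** 2)

t_prime = -gamma * (v / c**2) * x + gamma * t
x_prime = gamma * x - gamma * v * t

print(np.isclose(t_prime, 0.0))                              # True: simultaneous with the origin
print(c**2 * t**2 - x**2, c**2 * t_prime**2 - x_prime**2)    # same (negative) invariant in both frames
```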

Consider again

Figure 4.7 (the light cone in (t, x, y)-space, with the forward cone labeled Future and the backward cone labeled Past)

Suppose we start at the origin (0,0,0,0). Is it possible for us to move to some other event with coordinates (t, x, y, z)? Certainly we need t > 0, but we also need (t, x, y, z) to be time-like, as otherwise we would have to move faster than the speed of light. Hence we call this region the future. Similarly, for an event (t, x, y, z) to be able to move to the origin, we need t < 0 for it to be time-like; these points are in the region called the past. Space-like events cannot influence the origin.


4.5. Velocity and Lorentz Transformations

Fix a coordinate frame, which we label as before with coordinates (x1, y1, z1, t1). Suppose a particle moves in this frame, with space coordinates given by functions

(x1(t1), y1(t1), z1(t1)).

Then the velocity vector is

v1(t1) = (dx1/dt1, dy1/dt1, dz1/dt1).

Now consider a different frame, with coordinates (x2, y2, z2, t2), moving with constant velocity with respect to the first frame. We know that there is a Lorentz transformation A taking the coordinates (x1, y1, z1, t1) to the coordinates (x2, y2, z2, t2). Then the moving particle must be describable via a function

(x2(t2), y2(t2), z2(t2)),

with velocity vector

v2(t2) = (dx2/dt2, dy2/dt2, dz2/dt2).

The question for this section is how to relate the vector v1 with the vector v2.

Since we have

( t2     )     ( t1     )
( x2(t2) ) = A ( x1(t1) )
( y2(t2) )     ( y1(t1) )
( z2(t2) )     ( z1(t1) ),

we can compute, for example, that

dx2/dt2 = (d(x2 as a function of (x1, y1, z1, t1))/dt1) · (dt1/dt2),

allowing us to relate dx2/dt2 to dx1/dt1, dy1/dt1 and dz1/dt1. In a similar way, we can find dy2/dt2 and dz2/dt2 in terms of the dx1/dt1, dy1/dt1 and dz1/dt1.

To make this more concrete, let us consider one possible A, namely, the coordinate change

x2 = γ x1 − γ vt1
y2 = y1
z2 = z1
t2 = −γ(v/c²) x1 + γ t1,


where γ = 1/√(1 − (v/c)²).

Since

dt2/dt1 = d(−γ(v/c²)x1 + γ t1)/dt1 = −γ(v/c²) dx1/dt1 + γ

and

dx2/dt1 = d(γ x1 − γ vt1)/dt1 = γ dx1/dt1 − γ v,

we have

dx2/dt2 = (dx2/dt1)(dt1/dt2)
        = (γ dx1/dt1 − γ v)/(−γ(v/c²) dx1/dt1 + γ)
        = (dx1/dt1 − v)/(−(v/c²) dx1/dt1 + 1).

In a similar way, as will be shown in the exercises, we have

dy2/dt2 = (dy1/dt1)/(−γ(v/c²) dx1/dt1 + γ)

dz2/dt2 = (dz1/dt1)/(−γ(v/c²) dx1/dt1 + γ).
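A quick numerical check of the first formula above (a sketch, not from the text; it uses numpy, with units where c = 1 and an arbitrary sample frame speed v = 0.6): a particle moving at dx1/dt1 = c in frame 1 still moves at speed c in frame 2, while a slower particle transforms by the relativistic velocity subtraction rule.

```python
# Numerical check (numpy) of the velocity transformation for an x-direction boost.
# Units with c = 1; the frame speed and the sample particle speed are arbitrary.
import numpy as np

c, v = 1.0, 0.6

def dx2_dt2(dx1_dt1):
    """dx2/dt2 in terms of dx1/dt1, from the formula in the text."""
    return (dx1_dt1 - v) / (-(v / c**2) * dx1_dt1 + 1.0)

print(dx2_dt2(c))      # 1.0: light still travels at speed c in frame 2
print(dx2_dt2(0.8))    # (0.8 - 0.6)/(1 - 0.48) ≈ 0.3846, not the Galilean 0.2
```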


4.6. Acceleration and Lorentz Transformations

Using the notation from the last section, the acceleration vector in frame 1 is

a1(t1) = (d²x1/dt1², d²y1/dt1², d²z1/dt1²),

while the acceleration vector in frame 2 is

a2(t2) = (d²x2/dt2², d²y2/dt2², d²z2/dt2²).

We can relate the vector a2(t2) to a1(t1) in a similar fashion to how we related the velocities in the previous section. For example, we have

d²x2/dt2² = (d/dt1)(dx2/dt2) · (dt1/dt2),

and we know how to write dx2/dt2 in terms of the coordinates of frame 1. To be more specific, we again consider the Lorentz transformation

x2 = γ x1 − γ vt1
y2 = y1
z2 = z1
t2 = −γ(v/c²) x1 + γ t1,

where γ = 1/√(1 − (v/c)²). Then we have

d²x2/dt2² = (d/dt1)(dx2/dt2) · (dt1/dt2)
          = (d/dt1)[(dx1/dt1 − v)/(−(v/c²) dx1/dt1 + 1)] · (dt1/dt2)
          = (d²x1/dt1²) / [γ³ (−(v/c²) dx1/dt1 + 1)³].


Similarly, as will be shown in the exercises, we have

d²y2/dt2² = (1/[γ² (−(v/c²) dx1/dt1 + 1)²]) [ d²y1/dt1² + (v (dy1/dt1)/(c² − v dx1/dt1)) d²x1/dt1² ]

d²z2/dt2² = (1/[γ² (−(v/c²) dx1/dt1 + 1)²]) [ d²z1/dt1² + (v (dz1/dt1)/(c² − v dx1/dt1)) d²x1/dt1² ].

4.7. Relativistic Momentum

(While standard, here we follow Moore’s A Traveler’s Guide to Spacetime: An Introduction to the Special Theory of Relativity [43].) There are a number of measurable quantities that tell us about the world. Velocity (distance over time) and acceleration (velocity over time) are two such quantities, whose importance is readily apparent. The importance of other key quantities, such as momentum and the various types of energy, is more difficult to recognize immediately. In fact, Jennifer Coopersmith’s recent Energy, the Subtle Concept: The Discovery of Feynman’s Blocks from Leibniz to Einstein [11] gives a wonderful account of the slow, if not tortuous, history of the development of the concept of energy and, to a lesser extent, momentum. We will simply take these notions to be important. Here we will give an intuitive development of the relativistic version of momentum. As before, this should not be viewed as any type of mathematical proof. Instead, we will make a claim as to what momentum should be in relativity theory and then see, in the next chapter, that this relativistic momentum will provide us with a deeper understanding of the link between electric fields and magnetic fields.

Classically, momentum is

momentum = mass × velocity.

Thus, classically, for a particle moving at a constant velocity of v we have

momentum = mass × (displacement vector)/(change in time).

In special relativity, though, as we have seen, the displacement vector is not natural at all. This suggests that we replace the preceding numerator with the relativistic displacement.


We need to decide now what to put into the denominator. Among all possible coordinate systems for our particle, there is one coordinate system that gives us proper time. Recall that proper time is the time coordinate in the coordinate system where the particle stays at the origin. This leads to

Definition 4.7.1. The relativistic momentum p is

p = mass × (relativistic displacement vector)/(change in proper time).

We now want formulas, since, if looked at quickly, it might appear that we are fixing a coordinate system where the particle has zero velocity and hence would always have zero momentum, which is silly.

Choose any coordinate system t′, x′, y′, and z′ in which the particle is moving at some constant v = (vx, vy, vz) whose speed we also denote by v. The relativistic displacement vector is

(cΔt′, Δx′, Δy′, Δz′).

In the different coordinate system that gives us the proper time (in which the particle is not moving at all), if Δt measures the change in proper time, we know from three sections ago that

Δt′ = γ Δt.

Then for the momentum we have

p = (pt, px, py, pz)
  = m · (cΔt′/Δt, Δx′/Δt, Δy′/Δt, Δz′/Δt)
  = m · ((cΔt′/Δt′)(Δt′/Δt), (Δx′/Δt′)(Δt′/Δt), (Δy′/Δt′)(Δt′/Δt), (Δz′/Δt′)(Δt′/Δt))
  = (γ mc, γ m Δx′/Δt′, γ m Δy′/Δt′, γ m Δz′/Δt′)
  = (γ mc, γ mvx, γ mvy, γ mvz).

Thus momentum in relativity theory is a four-vector, with spatial components the classical momentum times an extra factor of γ, and is hence

γ mv = γ m(vx, vy, vz).

Our justification for this definition will lie in the work of the next chapter. Also, though we will not justify this, the time component pt = γ mc of the momentum is the classical energy divided by the speed of light c. This is why in relativity theory people usually do not talk about the momentum vector but instead use the term energy-momentum four-vector.
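Here is a small numerical illustration of the definition (not from the text; it uses numpy, with units where c = 1 and an arbitrary sample mass and velocity): the energy-momentum four-vector (γmc, γmvx, γmvy, γmvz) for a particle of mass 2 moving with velocity (0.3, 0, 0.4).

```python
# A minimal numerical illustration (numpy) of the energy-momentum four-vector.
# Units with c = 1; the mass and velocity are arbitrary sample values.
import numpy as np

c = 1.0
m = 2.0
v = np.array([0.3, 0.0, 0.4])

speed = np.linalg.norm(v)                       # |v| = 0.5
gamma = 1.0 / np.sqrt(1.0 - (speed / c) ** 2)   # ≈ 1.1547

p = np.concatenate(([gamma * m * c], gamma * m * v))
print(p)   # (p_t, p_x, p_y, p_z) ≈ (2.309, 0.693, 0.0, 0.924)
```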

4.8. Appendix: Relativistic Mass

4.8.1. Mass and Lorentz Transformations

In the last section, we gave an intuitive argument for why the momentum (in the spatial coordinates) should be γ mv = γ m(vx, vy, vz). Thus, in relativity theory, we want not only lengths and times to depend on reference frames but also momentums. There is another, slightly more old-fashioned approach, though. Here we stick with momentum being mass times velocity, as in classical physics, with no apparent extra factor of γ. What changes is that we no longer have the mass m as a constant but, instead, also consider the mass as depending on the reference frame.

With this approach, we start with an object that is in a reference frame in which it is not moving. Suppose it has a mass m0, which we call the rest mass. In this approach, the rest mass is what, in the last section, we just called the mass. We will show that the same object moving at speed v will have mass

m = γ m0,

where γ = 1/√(1 − (v/c)²). Then, if momentum is mass times velocity, we will have the momentum being γ m0v, agreeing with our work from last section.

This argument is more subtle and demands a few more underlying assumptions than those for velocity and acceleration, but it does have some advantages for justifying last section’s definition for momentum. We will be closely following section 24 of [65].

In frame 1, suppose we have two identical objects moving toward each other along the x-axis, hitting each other at the origin in a head-on collision. Let Object 1 move with speed u in the x-direction and Object 2 move with speed −u, also in the x-direction, each with the same mass. We assume that they bounce off each other, with the first object now having speed −u and the second having speed u. Such collisions are called elastic.

Figure 4.8 (before the collision: Object 1 with speed u and Object 2 with speed −u)


Figure 4.9 (after the collision: Object 1 with speed −u and Object 2 with speed u)

Now assume that reference frame 2 is moving with speed −v in the x-direction with respect to the lab frame. We denote the speed of Object 1 in the x-direction as u1 and its mass as m1, and denote the speed of Object 2 moving in the x-direction as u2 and its mass as m2.

Now to make some assumptions, which are a fleshing out of Assumption I: Physics must be the same in all frames of reference that move at constant velocities with respect to each other.

First, in any given reference frame, we assume that the total mass of a system cannot change (meaning that the total mass must be conserved in a given reference frame). Note that we are not saying that mass stays the same in different reference systems, just that within a fixed reference frame the total mass is a constant. In the second reference frame, let the total mass be m. Then we have

m1 + m2 = m.

We now set momentum to be equal to mass times velocity. (This is only for this Appendix; for most of the rest of the book, momentum will be γ times mass times velocity.) Classical physics tells us that momentum is conserved. Like mass, total momentum will be assumed to be conserved in a fixed reference frame. (Again, this is subtle, as momentum does change if we shift reference frames.)

In the second reference frame, the total momentum will be m1u1 + m2u2

before the collision. At the moment of collision, we can think of the two objects as a single object, with mass m moving with velocity v and having momentum mv. Since we are assuming that the total momentum will be conserved in reference frame 2, we have

m1u1 + m2u2 = mv.

We want to find formulas for the masses m1 and m2 in frame 2. Since we now have two equations and our two unknowns, we can solve to get

m1/m2 = (u2 − v)/(v − u1),

as shown in the exercises. Note that the total mass m has dropped out.


By our work on how velocities change under Lorentz transformations, we know that

u1 = (u + v)/(1 + uv/c²)  and  u2 = (−u + v)/(1 − uv/c²).

This will yield

m1/m2 = (1 + uv/c²)/(1 − uv/c²),

as seen in the exercises.

One final equality is needed before reaching our goal of motivating the

equation m = γ m0. From the exercises, we know that u1 = (u + v)/(1 + uv/c²) implies that

√(1 − u1²/c²) = √(1 − u²/c²) √(1 − v²/c²) / (1 + uv/c²),

with a similar formula for √(1 − u2²/c²). Putting these together yields

m1/m2 = √(1 − u2²/c²) / √(1 − u1²/c²).

By assumption both objects are identical with mass m0 at rest. Choose our second reference frame so that u2 = 0, meaning that the second object is at rest (and hence m2 = m0) and so u = v. Then we get

m1/m0 = 1/√(1 − u1²/c²),

giving us our desired

m = γ m0.

Of course, in the real world we would now have to check experimentally that momentum and mass are indeed conserved in each reference frame. As mathematicians, though, we can be content with just taking these as given. Also, as mentioned before, most people would now consider mass to be the rest mass and hence an invariant and then slightly alter the definition of momentum by multiplying through by the extra factor of γ.


4.8.2. More General Changes in Mass

We now want to compare the measurement of the mass of an object in different reference frames in an even more general situation.

As usual, start with reference frame 1 and assume that reference frame 2 is moving in the positive x-direction with respect to frame 1. We will have three different velocities. Let v be the velocity of frame 2 with respect to frame 1, let v1 be the speed of an object in frame 1 moving in the x-direction, and let v2 be the speed of the same object but now in frame 2, still moving in the x-direction. Let m1 be the object’s mass in frame 1 and m2 be the mass in frame 2.

Our goal is to justify

m2 = ((1 − v1v/c²)/√(1 − v²/c²)) m1 = γ(1 − v1v/c²) m1.

We start with the formulas for rest mass from the last section

m2 = m0/√(1 − v2²/c²)
m1 = m0/√(1 − v1²/c²).

Then we get

m2 = (√(1 − v1²/c²) / √(1 − v2²/c²)) m1.

Now, we know how to relate the velocities v1 and v2, namely,

v2 = (v1 − v)/(1 − v1v/c²).

Plugging in the preceding, after squaring and doing a lot of algebra (which you are asked to do as an exercise), yields our desired

m2 = γ(1 − v1v/c²) m1.
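The “lot of algebra” can also be delegated to a computer algebra system. The sketch below (not from the text; it uses sympy and is only a check, not the hand derivation asked for in the exercise) substitutes the velocity formula into the ratio of the two rest-mass expressions and confirms, after squaring to avoid branch issues with the square roots, that m2/m1 = γ(1 − v1v/c²).

```python
# Symbolic check (sympy) of the algebra referred to above: substituting
#   v2 = (v1 - v)/(1 - v1*v/c**2)
# into (m2/m1)**2 = (1 - v1**2/c**2)/(1 - v2**2/c**2) gives (gamma*(1 - v1*v/c**2))**2.
import sympy as sp

c, v, v1 = sp.symbols("c v v1", positive=True)
gamma = 1 / sp.sqrt(1 - (v / c) ** 2)

v2 = (v1 - v) / (1 - v1 * v / c**2)

ratio_sq = (1 - v1**2 / c**2) / (1 - v2**2 / c**2)        # (m2/m1)**2
target_sq = (gamma * (1 - v1 * v / c**2)) ** 2

print(sp.simplify(ratio_sq - target_sq))   # prints 0
```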


4.9. Exercises

Exercise 4.9.1. Suppose we set up two coordinate systems. The first we will call the laboratory frame, denoting its coordinates by

(x1, y1, z1, t1).

The second frame of reference will have coordinates denoted by

(x2, y2, z2, t2).

Suppose that this second frame is moving along at a constant speed of v in the direction parallel to the y-axis, with respect to the lab frame. Finally, suppose that the origins of both coordinate systems describe the same event. Find the Lorentz transformation that takes the (x1, y1, z1, t1) to the (x2, y2, z2, t2) coordinate system.

Exercise 4.9.2. Suppose we set up two coordinate systems. The first we will call the laboratory frame, denoting its coordinates by

(x1, y1, z1, t1).

Now suppose we have another frame of reference with its coordinates denoted by

(x2, y2, z2, t2).

Suppose that this new frame is moving along at a constant speed of v units/sec in the direction parallel to the vector (1,1,0) with respect to the lab frame. Suppose that the origins of both coordinate systems describe the same event. Find the Lorentz transformation that takes the (x1, y1, z1, t1) to the (x2, y2, z2, t2) coordinate system.

Exercise 4.9.3. Let A ∈ O(3,R). Suppose that v ∈ R³ is an eigenvector of A with eigenvalue λ, meaning that Av = λv. Show that λ = ±1.

Exercise 4.9.4. Using the previous exercise, show that if A ∈ O(3,R) then A is an invertible matrix.

Exercise 4.9.5. Show that A ∈ O(3,R) means that

Aᵀ A = I.

Exercise 4.9.6. For any two 3 × 3 matrices, show that (AB)ᵀ = BᵀAᵀ. Use this to show that if A, B ∈ O(3,R), then AB ∈ O(3,R).

Exercise 4.9.7. Using the notation of Section 4.4.1 and using that x2 = γ x1 − γ vt1 and x1 = γ x2 + γ vt2, show that t2 = −γ(v/c²)x1 + γ t1.


Exercise 4.9.8. With the notation of Section 4.4.4, show that

c²t2² − x2² − y2² − z2² = c²t1² − x1² − y1² − z1².

Exercise 4.9.9. Show that if A and B are Lorentz transformations, then so is AB.

Exercise 4.9.10. Show that if A is a Lorentz transformation, then A must be invertible. Then show that A⁻¹ is also a Lorentz transformation.

These last two exercises show that the Lorentz transformations form a group.

Exercise 4.9.11. For the coordinate change

x2 = γ x1 − γ vt1
y2 = y1
z2 = z1
t2 = −γ(v/c²) x1 + γ t1,

where γ = 1/√(1 − (v/c)²), show

dy2/dt2 = (dy1/dt1)/(−γ(v/c²) dx1/dt1 + γ)

dz2/dt2 = (dz1/dt1)/(−γ(v/c²) dx1/dt1 + γ).


Exercise 4.9.12. Using the notation from the previous exercise, show

d²x2/dt2² = (d²x1/dt1²) / [γ³ (−(v/c²) dx1/dt1 + 1)³]

d²y2/dt2² = (1/[γ² (−(v/c²) dx1/dt1 + 1)²]) [ d²y1/dt1² + (v (dy1/dt1)/(c² − v dx1/dt1)) d²x1/dt1² ]

d²z2/dt2² = (1/[γ² (−(v/c²) dx1/dt1 + 1)²]) [ d²z1/dt1² + (v (dz1/dt1)/(c² − v dx1/dt1)) d²x1/dt1² ].

Exercise 4.9.13. Choose units such that the speed of light c = 10. Suppose you are moving in some coordinate frame with velocity v = (3,5,2). Observe two events, the first occurring at t1 = 8 and the second at t2 = 12. Find the proper time between these two events.

Exercise 4.9.14. Choose units such that the speed of light c = 10. Let a particle have mass m = 3 and velocity v = (4,5,2). Calculate its relativistic momentum.

Exercise 4.9.15. Using the notation from the Appendix, show that if

m1 + m2 = m

and

m1u1 + m2u2 = mv,

then

m1/m2 = (u2 − v)/(v − u1).

Exercise 4.9.16. Using the notation from the Appendix, show that if

m1/m2 = (u2 − v)/(v − u1)

and

u1 = (u + v)/(1 + uv/c²)  and  u2 = (−u + v)/(1 − uv/c²),

then

m1/m2 = (1 + uv/c²)/(1 − uv/c²).

Exercise 4.9.17. Using the notation from the Appendix, show that u1 = (u + v)/(1 + uv/c²) implies that

√(1 − u1²/c²) = √(1 − u²/c²) √(1 − v²/c²) / (1 + uv/c²)

and that u2 = (−u + v)/(1 − uv/c²) implies that

√(1 − u2²/c²) = √(1 − u²/c²) √(1 − v²/c²) / (1 − uv/c²).

Exercise 4.9.18. Using the notation from the Appendix, show that

m2 = ((1 − v1v/c²)/√(1 − v²/c²)) m1 = γ(1 − v1v/c²) m1,

using that

m2 = m0/√(1 − v2²/c²)

m1 = m0/√(1 − v1²/c²)

v2 = (v1 − v)/(1 − v1v/c²).


5 Mechanics and Maxwell’s Equations

Summary: Despite linking the electric and magnetic fields, Maxwell’s equations are worthless for science if they do not lead to experimental predictions. We want to set up a formalism that allows us to make measurements. This chapter will first give an overview of Newtonian mechanics. Then we will see how the electric and magnetic fields fit into Newtonian mechanics. In the final section, we will see that the force from the electric field (Coulomb’s law), together with the Special Theory of Relativity and the assumption of charge conservation, leads to magnetism.

5.1. Newton’s Three Laws

The development of Newtonian mechanics is one of the highlights of humanity. Its importance to science, general culture, and our current technological world cannot be overstated. With three laws, coupled with the calculational power of calculus, much of our day-to-day world can be described. Newton, with his laws, could describe both the motions of the planets and the flight of a ball. The world suddenly became much more manageable. Newton’s approach became the model for all of learning. In the 1700s and 1800s, bright young people, at least in Europe, wanted to become the Newtons of their fields by finding analogs of Newton’s three laws. No one managed to become, though, the Newton of sociology.

We will state Newton’s three laws and then discuss their meaning. We quote the three laws from Halliday and Resnick’s Physics [32].

Newton’s First Law: Every body persists in its state of rest or of uniform motion in a straight line unless it is compelled to change that state by forces impressed on it. (On page 75 in [32], quoting in turn Newton himself.)

Newton’s Second Law:

Force = mass · acceleration.


Newton’s Third Law: To every action there is always an opposed equal reaction; or, the mutual actions of two bodies upon each other are always equal, and directed to contrary parts. (On page 79 in [32], quoting again Newton himself.)

Now we have to interpret what these words mean. To define terms such as “force” rigorously is difficult; we will take a much more pragmatic approach.

We start with the description of a state of an object that we are interested in. All we know is its position. We want to know how its position will change over time. We assume that there is some given Cartesian coordinate system x, y, z describing space and another coordinate t describing time. We want to find three functions x(t), y(t), and z(t) so that the vector-valued function

r(t) = (x(t), y(t), z(t))

describes the position of our object for all time. Once we have r(t), we can define the velocity v(t) and the acceleration a(t) as follows:

v(t) = (d/dt) r(t) = (dx/dt, dy/dt, dz/dt)

a(t) = (d/dt) v(t) = (d²x/dt², d²y/dt², d²z/dt²).

Though few physicists would be nervous at this point, do note that we are making assumptions about the differentiability of the component functions of the position function r(t).

Now to define force. To some extent, Newton’s three laws give a description of the mathematical possibilities for force. Newton’s first law states that if there are no forces present, then the velocity must be a constant. Turning this around, we see that if the velocity is not constant, then a force must be present.

The second law

Force = mass × acceleration

F = m · a(t)

gives an explicit description of how a force affects the velocity. The right-hand side is just the second derivative of the position vector times the object’s mass. It is the left-hand side that can change, depending on the physical situation we are in. For example, the only force that Newton explicitly described was gravity. Let our object have mass m in space at the point (x, y, z) in a gravitational field created by some other object of mass M centered at the origin (0,0,0). Then the gravitational force on our first object is the vector-valued function

F = (GmM/(x² + y² + z²)^(3/2)) (x, y, z)

where G is a constant that can be determined from experiments. Note that the length of this vector, which corresponds to the magnitude of the force of gravity, is

|F| = GmM/(distance)²,

as seen in the exercises. (We needed to put the somewhat funny looking exponent 3/2 into the definition of the gravitational force in order to get that the magnitude of the force is proportional to 1 over the distance squared.)

In order to use the second law, we need to make a guess as to the functional form of the relevant force. Once we have that, we have a differential equation that we can attempt to solve.

The third law states, “To every action there is always an opposed and equal reaction,” leading generations of students to puzzle out why there is any motion at all. (After all, why does everything not cancel out?) What Newton’s third law does is allow us to understand other forces. Let us have two objects A and B interact. We say that object A acts on object B if A exerts some force on B, meaning from the first law that somehow the presence of A causes the velocity of B to change. The third law then states that B must do the same to A, but in the opposite direction. This law places strong restrictions on the nature of forces, allowing us to determine their mathematical form.

We have not explained at all the concept of mass. Also, the whole way we talked about one object exerting a force on another certainly begs a lot of questions, for example, how does one object influence another; how long does it take; or what is the mechanism? Questions such as “action at a distance” and “instantaneous transmission of forces” again become critical in relativity theory.

5.2. Forces for Electricity and Magnetism

5.2.1. F = q(E + v × B)

For science to work, we must be able to make experimental predictions. We want to set up the formalism that allows us to make measurements. For now, we will be in a strictly Newtonian world. Historically, Newton’s mechanics were the given and Maxwell’s equations had to be justified. In fact, it strongly appears that Maxwell’s equations are more basic than Newton’s. But, for now, we will be Newtonian.

Maxwell’s equations link the electric field with the magnetic field. But neither of these vector fields is a force and thus neither can be applied directly in Newton’s second law F = m · d²r/dt². We need to show how the electric field E and the magnetic field B define the force that a particle will feel. We will simply state the relationship. The “proof” lies in doing experiments.

Let a particle be described by r(t) = (x(t), y(t), z(t)), with velocity

v(t) = dr/dt = (dx/dt, dy/dt, dz/dt).

Suppose it has charge q, which is just a number. From experiment, the force from the electric and magnetic fields is

F = q(E + v × B).

Thus if the particle’s charge is zero, then the electric and magnetic fields will have no effect on it. If a particle’s velocity is zero, then the magnetic field has no effect. The presence of the cross-product reflects what must have been initially a deep surprise, namely, that if a charged particle is moving in a given direction in the presence of a magnetic field, then the particle will suddenly accelerate at right angles to both the magnetic field and its initial direction. If the particle is initially moving in the same direction as the magnetic field, then no force is felt. We will see later in this chapter how this perpendicularity stems from the Special Theory of Relativity and the force from the electric field (namely, qE).
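Here is a small numerical illustration of this force law (not from the text; it uses numpy, and the field and velocity values are arbitrary sample inputs): with E = 0 the force q v × B is perpendicular to both v and B, and it vanishes when v is parallel to B.

```python
# Numerical illustration (numpy) of F = q*(E + v x B).
# All field, charge, and velocity values are arbitrary sample inputs.
import numpy as np

q = 1.0
E = np.array([0.0, 0.0, 0.0])
B = np.array([1.0, 0.0, 0.0])           # magnetic field along the x-axis
v = np.array([0.0, 2.0, 0.0])           # particle moving along the y-axis

F = q * (E + np.cross(v, B))
print(F)                                 # [ 0.  0. -2.]: at right angles to v and B
print(np.dot(F, v), np.dot(F, B))        # 0.0 0.0

v_parallel = np.array([3.0, 0.0, 0.0])   # motion parallel to B gives no force
print(q * (E + np.cross(v_parallel, B))) # [0. 0. 0.]
```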

5.2.2. Coulomb’s Law

Suppose that we have two charges, q1 and q2, that are a distance r apart from each other. Coulomb’s law states that the magnitude of the force between them is

F = constant · q1q2/r².

This is not a mathematical theorem but an experimental fact. If r denotes the vector between the two charges, then Coulomb’s law can be expressed as

F = constant · (q1q2/(length of r)³) r,

where F denotes the force vector.

We want to use the description of the force from Coulomb’s law to derive a corresponding electric field E. Fix a charge q1 at a point in space p1 = (x1, y1, z1). This charge will create an electric field, which we will now describe. Given a point p = (x, y, z) ≠ p1, let

r = (x − x1, y − y1, z − z1)

with length

|r| = √((x − x1)² + (y − y1)² + (z − z1)²).

Introduce a new charge q at the point p = (x, y, z). The electric field created by the charge q1 will be

E(x, y, z) = (q1/|r|³) r = (q1(x − x1)/|r|³, q1(y − y1)/|r|³, q1(z − z1)/|r|³),

since this electric field will indeed give us that F = qE.
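As a small computational sketch (not from the text; it uses numpy, with the constant in Coulomb’s law set to 1 and arbitrary sample charges and positions), the function below computes this field of a point charge and checks that the resulting force F = qE has the magnitude predicted by Coulomb’s law.

```python
# A minimal sketch (numpy) of the electric field of a point charge, with the
# Coulomb constant set to 1.  Charges and positions are arbitrary sample values.
import numpy as np

def point_charge_field(q1, p1, p):
    """E at the point p due to a charge q1 sitting at p1 (Coulomb constant = 1)."""
    r = np.asarray(p, float) - np.asarray(p1, float)
    return q1 * r / np.linalg.norm(r) ** 3

q1, p1 = 2.0, np.array([0.0, 0.0, 0.0])
q,  p  = 0.5, np.array([3.0, 4.0, 0.0])      # a distance 5 from the charge

E = point_charge_field(q1, p1, p)
F = q * E
print(np.linalg.norm(F), q * q1 / 5.0**2)    # both magnitudes are 0.04
```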

5.3. Force and Special Relativity

5.3.1. The Special Relativistic Force

In classical mechanics, we have the force as

F = ma = m dv/dt = d(mv)/dt = d(momentum)/dt.

Here we are using the classical mechanics definition for momentum, namely, that momentum equals mass times velocity. Last chapter, though, we gave an intuitive argument for why in relativity theory the momentum vector p = (px, py, pz)¹ should actually be

p = (px, py, pz) = γ m(vx, vy, vz) = γ mv,

where γ = 1/√(1 − (v/c)²) with v being the speed. This leads to the question of the correct definition for force in special relativity: Is it F = ma or F = d(γ mv)/dt? We treat this as an empirical question, allowing nature to dictate that the force must be

F = d(γ mv)/dt

¹ A comment on notation. Earlier we used the notation px to refer to ∂p/∂x, and so forth. In this section, though, px refers not to the partial derivative but to the x-component of the vector p. Earlier we denoted this x-component by p1, which we are not doing here, since in a moment we will need to use the numbered subscript to refer to different frames of reference. We will follow the same notation convention for the velocity v = (vx, vy, vz), acceleration a = (ax, ay, az), and force F = (Fx, Fy, Fz).


in special relativity. (There are other more mathematical approaches.)

Denote the force as

F = (Fx, Fy, Fz),

the velocity as

v = (vx, vy, vz),

and the acceleration as

a = (ax, ay, az).

Then we get

F = (Fx, Fy, Fz)
  = (d/dt)(γ mv)
  = γ m (dv/dt) + mv (dγ/dt)
  = γ m(ax, ay, az) + (dγ/dt)(mvx, mvy, mvz)
  = γ ma + (dγ/dt) mv.

In special relativity, it is no longer the case that the force vector points in the same direction as the acceleration vector.

5.3.2. Force and Lorentz Transformations

We now want to see how forces change under Lorentz transformations. We could derive the equations, using the same techniques as used previously to find how velocity, acceleration, and momentum change under Lorentz transformations. This derivation is, though, quite algebraically complicated. Thus we will just write down the resulting coordinate changes.

Let reference frame 1 denote the lab frame. In this frame of reference, denote the object’s velocity by

v1 = (v1x, v1y, v1z)

and its force by

F1 = (F1x, F1y, F1z).


Let reference frame 2 move at constant speed v in the direction (1,0,0) with respect to frame 1, with Lorentz transform

x2 = γ x1 − γ vt1
y2 = y1
z2 = z1
t2 = −γ(v/c²) x1 + γ t1,

where γ = 1/√(1 − (v/c)²). Let the force in reference frame 2 be denoted by F2 = (F2x, F2y, F2z), and let its velocity be denoted by v2 = (v2x, v2y, v2z).

Then, after a lot of calculations, we get

F1x = F2x + (v/(c² + v2x v))(v2y F2y + v2z F2z)

F1y = F2y / [γ(1 + v2x v/c²)]

F1z = F2z / [γ(1 + v2x v/c²)].

5.4. Coulomb + Special Relativity + Charge Conservation = Magnetism

(While standard, this follows from section 6.1 in Lorrain and Corson’s Electromagnetic Fields and Waves [37].)

Maxwell’s equations use cross-products. This reflects the physical fact that if you have current flowing through a wire in the xy-plane running along the x-axis and then move a charge (in the xy-plane) along the y-axis toward the wire, suddenly the charge will feel a force perpendicular to the xy-plane. The charge wants to jump out of the plane. This is easy to observe experimentally. On a personal note, I still remember my uneasiness upon learning this strange fact. Thus it came as both a relief and a surprise when I learned in college how special relativity gives a compelling reason for the occurrence of this “perpendicularness.”

Here is another way of interpreting the “oomph” behind this section. Maxwell’s equations explain how electricity and magnetism are linked but, in their initial form, give no clue as to why they are related. Special relativity will answer this “why” question by showing that our observation of a magnetic force is actually our observation of the electric force in a moving frame of reference.

We will show in this section how

Coulomb's Law + Special Relativity + Charge Invariance

will yield

Magnetism.

We make the (experimentally verifiable) assumption that the charge of an object is independent of frame of reference. This means that the total charge does not change in a given reference frame (i.e., charge is conserved) but also that the charge is the same in every reference frame (i.e., charge is an invariant). This is in marked contrast to quantities such as length and time.

Now to see how magnetism must occur under the assumptions of Coulomb's law, special relativity, and charge invariance. We start with two charges, q1 and q2, and will consider two reference frames 1 and 2.

In reference frame 2, suppose that the two charges have coordinates q1 = (0,0,0) and q2 = (x2, y2, 0), as in Figure 5.1.

Figure 5.1: the charges q1 = (0,0,0) and q2 = (x2, y2, 0) in the (x, y)-plane.

We assume in frame 2 that the charges are not moving. Then by Coulomb's law, the force between the two charges is

F2 = (F2x, F2y, F2z)
   = ( q1q2 x2/(x2² + y2²)^(3/2), q1q2 y2/(x2² + y2²)^(3/2), 0 ).

Let reference frame 2 move at constant speed v in the direction (1,0,0) with respect to frame 1, with Lorentz transform

x2 = γx1 − γvt1
y2 = y1
z2 = z1
t2 = −γ(v/c²)x1 + γt1,

where γ = 1/√(1 − (v/c)²). Since the charges are not moving in frame 2, we have

v2 = (v2x, v2y, v2z) = (0,0,0).

Then, from the previous section, we have

F1x = F2x + (v/(c² + v2x v))(v2y F2y + v2z F2z)
    = F2x
    = q1q2 x2/(x2² + y2²)^(3/2)

F1y = F2y/(γ(1 + v2x v/c²))
    = (1/γ) F2y
    = q1q2 y2/(γ(x2² + y2²)^(3/2))

F1z = F2z/(γ(1 + v2x v/c²))
    = 0.

We now put F1x and F1y in terms of the coordinates x1 and y1, yielding, at t1 = 0,²

F1x = γ q1q2 x1/(γ²x1² + y1²)^(3/2)

F1y = q1q2 y1/(γ(γ²x1² + y1²)^(3/2))
    = (γ q1q2 y1/(γ²x1² + y1²)^(3/2)) (1 − (v/c)²).

Then the force F1 is

(γ q1q2/(γ²x1² + y1²)^(3/2)) (x1, y1, 0) − (γ q1q2 v² y1/(c²(γ²x1² + y1²)^(3/2))) (0,1,0),

2 Note we are using that:

1/γ = γ/γ² = γ(1 − (v/c)²).


which equals

q2 ( (γ q1/(γ²x1² + y1²)^(3/2)) (x1, y1, 0) + v(1,0,0) × (γ q1 v y1/(c²(γ²x1² + y1²)^(3/2))) (0,0,1) ).

The

q2 (γ q1/(γ²x1² + y1²)^(3/2)) (x1, y1, 0)

part of the force is just the contribution, in frame 1, from Coulomb's law. But there is the additional term, pointing in the (0,0,1) direction. It is this second term that is captured by magnetism.
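As a quick numerical sanity check on this section's computation (my own sketch, not part of the text; units with c = 1 and arbitrary sample numbers), the following code transforms the frame-2 Coulomb force with the formulas of Section 5.3.2, taking v2 = (0,0,0), and compares the result with the Coulomb-plus-magnetic split displayed above at t1 = 0, where x2 = γx1 and y2 = y1.

# Check: transforming the frame-2 Coulomb force gives the frame-1 force
# written above as (Coulomb term) + (term captured by magnetism).
import numpy as np

c, v = 1.0, 0.4            # speed of frame 2 relative to frame 1 (c = 1)
q1, q2 = 1.0, 1.0
x1, y1 = 0.7, 0.5          # position of the second charge in frame 1 at t1 = 0

gamma = 1.0 / np.sqrt(1.0 - (v / c)**2)
x2, y2 = gamma * x1, y1    # at t1 = 0

# Coulomb force in frame 2 (charges at rest there)
r2 = (x2**2 + y2**2)**1.5
F2 = np.array([q1 * q2 * x2 / r2, q1 * q2 * y2 / r2, 0.0])

# Force transformation of Section 5.3.2 with v2 = (0, 0, 0)
F1 = np.array([F2[0], F2[1] / gamma, F2[2] / gamma])

# The split displayed above
r1 = (gamma**2 * x1**2 + y1**2)**1.5
coulomb = gamma * q1 * q2 / r1 * np.array([x1, y1, 0.0])
magnetic = -gamma * q1 * q2 * v**2 * y1 / (c**2 * r1) * np.array([0.0, 1.0, 0.0])

print(F1)
print(coulomb + magnetic)   # agrees with F1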

5.5. Exercises

Exercise 5.5.1. Let

F = (GmM/(x² + y² + z²)^(3/2)) (x, y, z),

where G, m, and M are constants. Show that the magnitude of F is

|F| = GmM/(distance)².

The next few problems discuss how to solve second order ordinary differential equations, which naturally occur when trying to solve

F = ma = m d²r/dt².

Exercise 5.5.2. Suppose we have an object on a spring with mass m = 1 that can only move along the x-axis. We know that the force F on a spring is

F(x, t) = −k²x,

where k is a constant. Let x(t) describe the path of the object. Suppose that we know

x(0) = 1
dx/dt(0) = 1.

Using Newton's second law, show that

x(t) = cos(kt) + (1/k) sin(kt)

is a possible solution.


Exercise 5.5.3. Let y1(t) and y2(t) be solutions to the ordinary differential equation

d²y(t)/dt² + a dy(t)/dt + by = 0,

where a and b are constants. Show that

αy1(t) + βy2(t)

is also a solution, for any constants α, β.

Exercise 5.5.4. Consider the ordinary differential equation

d²y(t)/dt² + a dy(t)/dt + by = 0,

for two constants a and b. Show that if

y(t) = e^(αt)

is a solution, then α must be a root of the second-degree polynomial

x² + ax + b.

If a² − 4b ≠ 0, find two linearly independent solutions to this differential equation. (Recall that functions f(t) and g(t) are linearly independent when the only way for there to be constants λ and µ such that

λf(t) = µg(t)

is if

λ = µ = 0.)

Exercise 5.5.5. Using Taylor series, show that for all t ∈ R,

e^(it) = cos(t) + i sin(t)
e^(−it) = cos(t) − i sin(t).

Exercise 5.5.6. Find two linearly independent solutions, involving exponential functions, to

d²y(t)/dt² + k²y = 0,

via finding the roots to

t² + k² = 0.

Use these solutions to show that

α cos(kt) + β sin(kt)

is also a solution, for any constants α, β.

The next series of exercises discusses a special type of force, called conservative. Recall that a path in R³ is given by

r(t) = (x(t), y(t), z(t)).

A tangent vector to this path at a point r(t0) is

dr(t0)/dt = (dx(t0)/dt, dy(t0)/dt, dz(t0)/dt).

Let

F = (F1(x, y, z), F2(x, y, z), F3(x, y, z)).

Then we have, for a curve σ defined by r(t) = (x(t), y(t), z(t)) with a ≤ t ≤ b,

∫_σ F · dr(t) = ∫_a^b F1(x(t), y(t), z(t)) (dx/dt) dt
             + ∫_a^b F2(x(t), y(t), z(t)) (dy/dt) dt
             + ∫_a^b F3(x(t), y(t), z(t)) (dz/dt) dt.
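As a small illustration of this line-integral formula (my own sketch, not one of the exercises; the field and path are arbitrary choices), the following sympy computation evaluates ∫_σ F · dr for F = (y, x, 1) along the helix r(t) = (cos t, sin t, t) with 0 ≤ t ≤ π.

# Evaluate a line integral by the formula above: substitute the path,
# multiply each component by the corresponding derivative, and integrate in t.
import sympy as sp

t = sp.symbols('t')
x, y, z = sp.cos(t), sp.sin(t), t          # the path r(t), 0 <= t <= pi

F1, F2, F3 = y, x, sp.Integer(1)           # F = (y, x, 1) evaluated on the path

integrand = F1*sp.diff(x, t) + F2*sp.diff(y, t) + F3*sp.diff(z, t)
print(sp.integrate(integrand, (t, 0, sp.pi)))   # prints pi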

Exercise 5.5.7. Let σ be defined by r(t) = (x(t), y(t), z(t)) with a ≤ t ≤ b. Suppose

r(a) = p = (p1, p2, p3)
r(b) = q = (q1, q2, q3).

Finally, for a function f(x, y, z), set

F = ∇(f).

Then show that

∫_σ F · dr(t) = f(q1, q2, q3) − f(p1, p2, p3).

Any vector field that is defined via F = ∇(f), for a function f(x, y, z), is said to be conservative.

Exercise 5.5.8. Let F be a conservative vector field. Let σ1 be defined by r1(t) = (x1(t), y1(t), z1(t)) with a ≤ t ≤ b and let σ2 be defined by r2(t) = (x2(t), y2(t), z2(t)) with a ≤ t ≤ b such that

r1(a) = r2(a), r1(b) = r2(b).

Then show that

∫_σ1 F · dr1(t) = ∫_σ2 F · dr2(t).

Exercise 5.5.9. Let F be a conservative vector field. Let σ be defined by r(t) = (x(t), y(t), z(t)) with a ≤ t ≤ b such that

r(a) = r(b).

Show that

∫_σ F · dr(t) = 0.

Exercise 5.5.10. Let F be conservative. Show that

∇ × F = (0,0,0).

Exercise 5.5.11. By considering the function

f(x, y, z) = −GMm/√(x² + y² + z²),

show that the gravitational force

F(x, y, z) = (GMm/(x² + y² + z²)^(3/2)) (x, y, z)

is conservative.

Exercise 5.5.12. Let reference frame 1 denote the lab frame. In this frame of reference, denote the object's velocity by

v1 = (v1x, v1y, v1z)

and its force by

F1 = (F1x, F1y, F1z).

Let reference frame 2 move at constant speed v in the direction (0,1,0) with respect to frame 1, with Lorentz transformation

x2 = x1
y2 = γy1 − γvt1
z2 = z1
t2 = −γ(v/c²)y1 + γt1,

where γ = 1/√(1 − (v/c)²). Let the force in reference frame 2 be denoted by F2 = (F2x, F2y, F2z) and let its velocity be denoted by v2 = (v2x, v2y, v2z). Write F1 in terms of the components of the force F2 and the velocity v2.

Exercise 5.5.13. Use the notation of Section 5.4, but now assume that the second charge q2 is at the point (x2, 0, z2). Find the force F2.

Exercise 5.5.14. Using the notation of the previous problem, calculate the force F1 and identify what should be called the magnetic component.

Exercise 5.5.15. Use the notation of Section 5.4, but now assume that the second charge q2 is at the point (x2, 0, 0). Find the force F2. Then calculate the force F1 and identify what should be called the magnetic component.

Exercise 5.5.16. Use the notation of Section 5.4, but now assume that the second charge q2 is at the point (0, y2, 0). Find the force F2. Then calculate the force F1 and identify what should be called the magnetic component.


6

Mechanics, Lagrangians, and the Calculus of Variations

Summary: The goal of this chapter is to recast all of mechanics into the problem of finding critical points over a space of functions of an integral (the integral of what will be called the Lagrangian). This process is called the calculus of variations. Using the tools of the calculus of variations, we will show that F = ma is equivalent to finding critical points of the integral of the Lagrangian. This approach is what most naturally generalizes to deal with forces beyond electricity and magnetism and will be the natural approach that we will later take when we quantize Maxwell's equations.

6.1. Overview of Lagrangians and Mechanics

Newton's second law states that

Force = mass × acceleration.

As discussed in the last chapter, this means that if we want to predict the path of a particle (x(t), y(t), z(t)) with mass m, given a force

(F1(x, y, z, t), F2(x, y, z, t), F3(x, y, z, t)),

we have to solve the differential equations

F1(x, y, z, t) = m d²x/dt²
F2(x, y, z, t) = m d²y/dt²
F3(x, y, z, t) = m d²z/dt².

In this chapter, we will see how this system of differential equations can be recast into a system of integrals. The motion of the underlying object will no longer be modeled as a function that solves a system of differential equations but now as a function that is a critical point of a system of integrals. This system of integrals is called the action. The function that is integrated is called the Lagrangian. The process of finding these critical paths is called the calculus of variations.

Newtonian mechanics is only one theory of mechanics, which, while working quite well for most day-to-day calculations, is not truly accurate, as we saw in the need for the Special Theory of Relativity. In fact, as we will see in a few chapters, we will eventually need a completely different theory of mechanics: quantum mechanics. Moving even beyond quantum mechanics, when trying to understand the nature of elementary particles, Newtonian mechanics breaks down even further. For these new theories of mechanics, the Lagrangian approach for describing the time evolution of the state of a particle is what can be most easily generalized. We will see that different theories of mechanics are distinguished by finding critical points of the integrals with different Lagrangians. The physics is captured by the Lagrangian.

In this chapter, we will develop the calculus of variations and then see how Newton's second law can be recast into this new language.

6.2. Calculus of Variations

The differential equations of mechanics have natural analogs in the calculus of variations. The formulations in terms of the calculus of variations are in fact the way to understand the underlying mechanics involving the weak and strong forces. In the next subsection, we will set up the basic framework for the calculus of variations. Then, we will derive the Euler-Lagrange equations, which are the associated system of differential equations.

6.2.1. Basic Framework

One of the basic problems that differential calculus of one variable solves is that of finding maximums and minimums for functions. For a function to have a max or a min at a point p for some reasonable function f(x), it must be the case that its derivative at p must be zero:

Figure 6.1: a graph of y = f(x) with two points p1 and p2 where df/dx(p1) = 0 and df/dx(p2) = 0.

Calculus gives us a procedure for finding the points where a function has an extreme point.

Remember that one subtlety from calculus in finding local minimums and maximums is that there can be points where the derivative is zero that are neither minimums nor maximums. For example, at x = 0, the function f(x) = x³ has derivative zero, despite the fact that the origin is not a minimum or a maximum. Back in calculus, points for which the derivative is zero are called critical or extreme points. All we know is that the local minimums and local maximums are contained in the extreme points.

The calculus of variations provides us with a mechanism not for finding isolated points that are the extreme values of a function but for finding curves along which an integral has a max or a min. As with usual calculus, these are called critical or extreme curves. The curves that minimize or maximize the integrals will be among the extreme curves. For ease of exposition, we will be finding the curves that minimize the integrals.

Consider R² with coordinates (t, x) and some reasonable function f(t, x, y) of three variables. (By reasonable, we mean that the following integrals and derivatives all exist.) Fix two points (t0, x0) and (t1, x1). We want to find a function x(t) that makes the integral

∫_{t0}^{t1} f(t, x(t), x′(t)) dt

as small as possible. (Just to be clear, x′(t) is the derivative of the function x(t).)

Figure 6.2: a curve x(t) joining (t0, x0) to (t1, x1).


For example, let f(t, x, y) = x² + y² and let

(t0, x0, y0) = (1,0,0) and (t1, x1, y1) = (3,0,0).

For any function x(t), we have

∫_1^3 (x(t)² + x′(t)²) dt ≥ 0.

This will be minimal precisely when x(t) = 0. Of course, for most examples we cannot find an answer by just looking at the integral.

6.2.2. Euler-Lagrange Equations

We would like to be able to find some reasonable description for the solution curve x(t) to the calculus of variations problem of minimizing

∫_{t0}^{t1} f(t, x(t), x′(t)) dt.

We reduce this to standard calculus, namely, to taking a derivative of a new one-variable function. At a key stage, we will be using both integration by parts and the multi-variable chain rule.

Start with the assumption that we somehow already know a solution, which we call x(t). In particular we assume we know the initial conditions

x(t0) = x0, x(t1) = x1.

We want to look at paths that are slightly perturbed from the true (though for now unknown) minimal path x(t). We do this by considering a function η(t), subject only to the conditions

η(t0) = 0, η(t1) = 0.

The perturbed function will be

xε(t) = x(t) + εη(t).

Figure 6.3: the path x(t) and a perturbed path xε(t), both joining (t0, x0) to (t1, x1).


Here we of course are thinking of ε as some small number. The boundary conditions on η(t) ensure that xε(t) and x(t) agree at the endpoints t0 and t1. By assumption on the function x(t), we know that for all ε

∫_{t0}^{t1} f(t, x(t), x′(t)) dt ≤ ∫_{t0}^{t1} f(t, xε(t), x′ε(t)) dt.

Define a new function

F(ε) = ∫_{t0}^{t1} f(t, xε(t), x′ε(t)) dt.

This function is a function of the real variable ε, which has a minimum at ε = 0, since x0(t) = x(t). Thus its derivative must be zero at ε = 0:

dF/dε(0) = 0,

no matter what function η we choose. The strategy now is to differentiate F(ε) and to find needed conditions on the function x(t). We have

0 = dF/dε(0)
  = d/dε ∫_{t0}^{t1} f(t, xε(t), x′ε(t)) dt |_{ε=0}
  = ∫_{t0}^{t1} d/dε f(t, xε(t), x′ε(t)) dt |_{ε=0}.

Note that we have not justified our pulling the derivative inside the integral. (This can be made rigorous, as discussed in [44].) Now, by the multi-variable chain rule, we have

d/dε f(t, xε(t), x′ε(t)) = ∂/∂t f(t, xε(t), x′ε(t)) · dt/dε
                        + ∂/∂x f(t, xε(t), x′ε(t)) · dxε/dε
                        + ∂/∂x′ f(t, xε(t), x′ε(t)) · dx′ε/dε
                        = ∂/∂x f(t, xε(t), x′ε(t)) η(t) + ∂/∂x′ f(t, xε(t), x′ε(t)) η′(t).

As is usual with the multi-variable chain rule, the notation begins to become cumbersome. Here the symbols x and x′ are playing the roles of both dependent variables (as functions of t and ε) when we differentiate and independent variables (when we write ∂/∂x f and ∂/∂x′ f). We are also using the fact that t is independent of the variable ε, meaning that dt/dε = 0.

Thus we have

d/dε f(t, xε(t), x′ε(t)) = ∂/∂x f(t, xε(t), x′ε(t)) η(t) + ∂/∂x′ f(t, xε(t), x′ε(t)) η′(t).

At ε = 0, we get

0 = ∫_{t0}^{t1} ∂/∂x f(t, x(t), x′(t)) η(t) dt + ∫_{t0}^{t1} ∂/∂x′ f(t, x(t), x′(t)) η′(t) dt.

We briefly concentrate on the term ∫_{t0}^{t1} ∂/∂x′ f(t, x(t), x′(t)) η′(t) dt. It is here that we use integration by parts (i.e., ∫ u · dv = u · v − ∫ v · du). Let

u = ∂/∂x′ f(t, x(t), x′(t))
dv/dt = η′(t).

Using that η(t0) = 0 and η(t1) = 0, we get that

∫_{t0}^{t1} ∂/∂x′ f(t, x(t), x′(t)) η′(t) dt = −∫_{t0}^{t1} η(t) d/dt (∂/∂x′ f(t, x(t), x′(t))) dt.

Thus we have the function η appearing in both of the integral terms. Combining both into one integral, we get

0 = ∫_{t0}^{t1} η(t) [ ∂/∂x f(t, x(t), x′(t)) − d/dt ∂/∂x′ f(t, x(t), x′(t)) ] dt.

Now to use that the function η(t) can be any perturbation. It can be any reasonable function as long as it satisfies the boundary conditions η(t0) = 0 and η(t1) = 0. But no matter what function η(t) we choose, the preceding integral must be zero. The only way that this can happen is if our solution curve x(t) satisfies

∂/∂x f(t, x(t), x′(t)) − d/dt ∂/∂x′ f(t, x(t), x′(t)) = 0.

This is the Euler-Lagrange equation. Though the preceding was derived for when x(t) was a minimizing curve, the same argument works for when x(t) is maximizing or, in fact, when x(t) is any critical value. Thus we have

Theorem 6.2.1. A function x(t) is a critical solution to the integral equation

∫_{t0}^{t1} f(t, x(t), x′(t)) dt

if and only if it satisfies the Euler-Lagrange equation

∂/∂x f(t, x(t), x′(t)) − d/dt ∂/∂x′ f(t, x(t), x′(t)) = 0.

Of course, we should now check this by doing an example whose answer we already know. In the (t, x)-plane, let us prove that the Euler-Lagrange equation will show us that the shortest distance between two points is a straight line. If the Euler-Lagrange equations did not show this, we would know we had made a mistake in the previous derivation.

Fix two points (t0, x0) and (t1, x1). Given any function x(t) with

x(t0) = x0 and x(t1) = x1,

we know that the arclength is given by the integral

∫_{t0}^{t1} √(1 + (x′)²) dt.

In the language we used earlier, we have

f(t, x, x′) = √(1 + (x′)²).

Note that this function does not depend on the variable x. The Euler-Lagrange equation gives us that

0 = ∂/∂x f(t, x(t), x′(t)) − d/dt ∂/∂x′ f(t, x(t), x′(t))
  = ∂/∂x √(1 + (x′)²) − d/dt ∂/∂x′ √(1 + (x′)²)
  = −d/dt ∂/∂x′ √(1 + (x′)²).

Since the derivative with respect to t of the function ∂/∂x′ √(1 + (x′)²) is zero, it must itself be equal to a constant, say c. Thus

c = ∂/∂x′ √(1 + (x′)²)
  = x′/√(1 + (x′)²).

Then

c √(1 + (x′)²) = x′.


Squaring both sides and then solving gives us

(x′)² = c²/(1 − c²).

Taking square roots, we see that the derivative of the function x(t) must be equal to a constant:

x′(t) = constant.

Thus we indeed get that

x(t) = a + bt,

a straight line. (The constants a and b can be found from the initial conditions.)
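For readers who like to let a computer do the differentiation, here is a short sympy check of this example (my own sketch; sympy's euler_equations utility is used in place of the hand computation): it forms the Euler-Lagrange equation for f = √(1 + (x′)²) and confirms that any straight line x(t) = a + bt satisfies it.

# Euler-Lagrange equation for the arclength integrand, checked on a straight line.
import sympy as sp
from sympy.calculus.euler import euler_equations

t, a, b = sp.symbols('t a b')
x = sp.Function('x')

f = sp.sqrt(1 + sp.Derivative(x(t), t)**2)

# euler_equations returns the Euler-Lagrange equation(s) of the functional
eq = euler_equations(f, x(t), t)[0]
print(eq)

# Substituting x(t) = a + b*t makes the left-hand side vanish.
check = eq.lhs.subs(x(t), a + b*t).doit()
print(sp.simplify(check))   # prints 0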

6.2.3. More Generalized Calculus of Variations Problems

Our attempt to find an extremal function x(t) for an integral

∫ L(t, x(t), x′(t)) dt

is only one type of a calculus of variations style problem. In general, we are trying to find a function or a collection of functions that are critical points for some sort of integral. Also, in general, we can show that these critical points must satisfy some sort of collection of differential equations, all of which are called Euler-Lagrange equations. (Again, though we are calling these "critical points," they are actually functions.) The method for deriving the Euler-Lagrange equations for a given calculus of variations problem, though, is almost always analogous to what we did in the previous section: assume we have a critical point, perturb it a bit, and then reduce the original problem to finding a critical point of a one-variable function.

In this section, we just state various Euler-Lagrange equations, leaving the proofs to the exercises.

Theorem 6.2.2. An extremum (x1(t), ..., xn(t)) to

∫_{t0}^{t1} L(t, x1(t), x′1(t), ..., xn(t), x′n(t)) dt,

subject to the initial conditions

x1(t0) = a1, ..., xn(t0) = an

and

x1(t1) = b1, ..., xn(t1) = bn

will satisfy

∂/∂x1 L(t, x1(t), x′1(t), ..., xn(t), x′n(t)) − d/dt ∂/∂x′1 L(t, x1(t), x′1(t), ..., xn(t), x′n(t)) = 0
⋮
∂/∂xn L(t, x1(t), x′1(t), ..., xn(t), x′n(t)) − d/dt ∂/∂x′n L(t, x1(t), x′1(t), ..., xn(t), x′n(t)) = 0.

Here we are trying to find a curve (x1(t), ..., xn(t)) that is a critical point for some integral.

The next type of calculus of variations problem is finding surfaces x(s, t) that minimize some integral.

Theorem 6.2.3. Let R be a rectangle in the (s, t)-plane, with boundary ∂(R). If a function x(s, t) minimizes the integral

∫_R L(s, t, x(s, t), ∂x/∂s, ∂x/∂t) ds dt,

then x(s, t) will satisfy

∂L/∂x − ∂/∂s ∂L/∂xs − ∂/∂t ∂L/∂xt = 0.

(Here we are using the notation

xs = ∂x/∂s
xt = ∂x/∂t.)

And of course we can create new calculus of variations problems by adding more dependent and independent variables.

6.3. A Lagrangian Approach to Newtonian Mechanics

Newton's second law states that F(x, y, z, t) = m · a(x, y, z, t). We want to replace this differential equation with an integral equation from the calculus of variations. (At this point, it should not be clear that there is any advantage to this approach.) We have a vector-valued function r(t) = (x(t), y(t), z(t)) that describes a particle's position. Our goal is still explicitly to find this function r(t). We still define velocity v(t) as the time derivative of r(t):

v(t) = dr/dt = (dx/dt, dy/dt, dz/dt).


Likewise, the acceleration is still the second derivative of the position function r(t):

a(t) = d²r/dt² = (d²x/dt², d²y/dt², d²z/dt²).

We still want to see how the acceleration is related to the vector field describing force:

F(x, y, z, t) = (F1(x, y, z, t), F2(x, y, z, t), F3(x, y, z, t)).

Now we must make an additional assumption on this vector field F, namely, that F is conservative or path-independent. These terms mean the same thing and agree with the terms from vector calculus. Recall that a vector field is conservative if there exists a function U(x, y, z, t) such that

−∂U/∂x = F1(x, y, z, t)
−∂U/∂y = F2(x, y, z, t)
−∂U/∂z = F3(x, y, z, t).

This can also be written using the gradient operator ∇ = (∂/∂x, ∂/∂y, ∂/∂z) as

−∇U = F.

The function U is usually called the potential or, in physics if F is a force, the potential energy. The existence of such a function U is equivalent to saying that path integrals of F depend only on end points. (This is deep and important, but standard, in vector calculus.) Thus we assume that for any differentiable path

σ(t) = (x(t), y(t), z(t))

with starting point

σ(0) = (x0, y0, z0)

and endpoint

σ(1) = (x1, y1, z1),

the value of the integral

∫_σ F · dσ

depends only on the points (x0, y0, z0) and (x1, y1, z1).


The potential function U and the path integral are linked by the following:

∫_σ F · dσ = U(x0, y0, z0, t) − U(x1, y1, z1, t).

Note that we are leaving the time t fixed throughout these calculations. In many situations, such as gravitational fields, the force F is actually independent of the time t.

We need a few more definitions, whose justification will be that they will yield a calculus of variations approach to Newton's second law. Define kinetic energy to be

T = (1/2) m · (length of v)²
  = (1/2) m ((dx/dt)² + (dy/dt)² + (dz/dt)²).

Note that this is the definition of the kinetic energy. (Using the term "energy" does help physicists' intuition. It appears that physicists have intuitions about these types of "energies." Certainly in popular culture people bandy about the word "energy" all the time. For us, the preceding ones are the definitions of energy, no more and no less.)

Define the Lagrangian L to be

L = Kinetic energy − Potential energy
  = T − U.

We need to spend a bit of time on understanding the Lagrangian. It is here that the notation can become not only cumbersome but actually confusing, as variables will sometimes be independent variables and sometimes dependent variables, depending on context.

The kinetic energy is a function of the velocity while the potential energy is a function of the position (and for now the time). We can write the Lagrangian as

L(t, x, y, z, dx/dt, dy/dt, dz/dt) = T(dx/dt, dy/dt, dz/dt) − U(t, x, y, z).

Frequently people write derivatives with respect to time by putting a dot over the variable. With this notation,

ẋ = dx/dt, ẏ = dy/dt, ż = dz/dt.


This notation goes back to Newton; it is clearly preferred by typesetters in the following computations. With this notation, the Lagrangian is

L(t, x, y, z, ẋ, ẏ, ż).

Now to link this Lagrangian to mechanics. Suppose we are given two points in space:

p0 = (x0, y0, z0)
p1 = (x1, y1, z1).

Our goal is to find the actual path

r(t) = (x(t), y(t), z(t))

that a particle will follow from a time t0 to a later time t1, subject to the initial and final conditions

r(t0) = p0, r(t1) = p1.

Given any possible path r(t), the Lagrangian L is a function on this path, as each point of the path gives us positions and velocities. We still need one more definition. Given any possible path r(t) going from point p0 to p1, let the action S be defined as

S(r(t)) = ∫_{t0}^{t1} L dt
        = ∫_{t0}^{t1} L(t, x, y, z, ẋ, ẏ, ż) dt.

We can finally state our integral version of Newton's second law.

Newton's Second Law (Calculus of Variations Approach): A body moving from an initial point p0 to a final point p1 follows a path r(t) that is a critical point for the action S(r(t)).

Of course we now want to link this statement to the differential equation approach to Newton's second law. We must link the path that is a critical value for

S(r(t)) = ∫_{t0}^{t1} L dt

to the differential equation

F = m · a(t).


Assume we have found our critical path r(t). Then the Lagrangian must satisfy the Euler-Lagrange equations:

∂L/∂x − d/dt (∂L/∂ẋ) = 0
∂L/∂y − d/dt (∂L/∂ẏ) = 0
∂L/∂z − d/dt (∂L/∂ż) = 0.

In taking the partial derivatives, we are treating the Lagrangian L as a function of the seven variables t, x, y, z, ẋ, ẏ, ż, but, in taking the derivative d/dt, we are treating L as the function L(t, x(t), y(t), z(t), ẋ(t), ẏ(t), ż(t)) of the variable t alone.

Since

L = Kinetic energy − Potential energy
  = (1/2) m (ẋ² + ẏ² + ż²) − U(x, y, z),

the Euler-Lagrange equations become

−∂U/∂x − d(mẋ)/dt = 0
−∂U/∂y − d(mẏ)/dt = 0
−∂U/∂z − d(mż)/dt = 0.

Since the ẋ, ẏ, ż are just another notation for a derivative, we have

−∂U/∂x = m d²x/dt²
−∂U/∂y = m d²y/dt²
−∂U/∂z = m d²z/dt².


Note that the right-hand side of the preceding is just the mass (a number) times the individual components of the acceleration vector,

m · a = m (d²x/dt², d²y/dt², d²z/dt²),

making it start to look like the desired Force = mass times acceleration. The final step is to recall that the very definition of the potential energy function U is that it is a function whose negative gradient gives us the force. Thus the force is the vector field:

F = −gradient(U) = −∇U = (−∂U/∂x, −∂U/∂y, −∂U/∂z).

The Euler-Lagrange equations are just F = ma in disguise.
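The following sympy sketch (mine, not the text's) makes the same point symbolically in one dimension: treating x and ẋ as independent variables in L = ½mẋ² − U(x) and then restricting to a path X(t), the Euler-Lagrange expression reduces to an expression equivalent to mẌ + U′(X) = 0, that is, to F = ma.

# For L = (1/2) m xdot^2 - U(x), the Euler-Lagrange equation is m x'' = -U'(x).
import sympy as sp

t, m = sp.symbols('t m')
x, xd = sp.symbols('x xdot')          # position and velocity as independent variables
X = sp.Function('X')                  # the path t -> X(t)
U = sp.Function('U')                  # a generic potential

L = sp.Rational(1, 2)*m*xd**2 - U(x)

on_path = {x: X(t), xd: sp.diff(X(t), t)}

# Euler-Lagrange expression dL/dx - d/dt(dL/dxdot), evaluated along the path
el = sp.diff(L, x).subs(on_path) - sp.diff(sp.diff(L, xd).subs(on_path), t)
print(el)   # an expression equivalent to -U'(X(t)) - m*X''(t), i.e. m*a = -dU/dx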

6.4. Conservation of Energy from Lagrangians

In the popular world there is the idea of conservation of energy, usually expressed in the form that energy can be neither created nor destroyed. I have even heard this phrase used by someone trying to make the existence of spirits and ghosts sound scientifically plausible. While such attempts are silly, it is indeed the case that, in a well-defined sense, in classical Newtonian physics, energy is indeed conserved. Historically, finding the right definitions and concepts was somewhat tortuous [11]. Technically, in classical physics, it came to be believed that in a closed system, energy must be a constant. Richard Feynman, in section 4.1 of The Feynman Lectures on Physics, Volume I [21], gave an excellent intuitive description, comparing energy to children's blocks.

In the context of the last section, we will show that

Kinetic Energy + Potential Energy

is a constant, meaning that we want to show that

(1/2) m (ẋ² + ẏ² + ż²) + U(x, y, z)

is a constant. We will see, in the exercises, that this is a consequence of

Theorem 6.4.1. Suppose that L = L(t, x, y, z, ẋ, ẏ, ż) is independent of t, meaning that

∂L/∂t = 0.


When restricting to a critical path (x(t), y(t), z(t)), the function

(∂L/∂ẋ) ẋ + (∂L/∂ẏ) ẏ + (∂L/∂ż) ż − L

is a constant function.

Proof. To show that (∂L/∂ẋ) ẋ + (∂L/∂ẏ) ẏ + (∂L/∂ż) ż − L is a constant, we will prove that

d/dt ( (∂L/∂ẋ) ẋ + (∂L/∂ẏ) ẏ + (∂L/∂ż) ż − L ) = 0.

This will be an exercise in the multi-variable chain rule, critically using the Euler-Lagrange equations.

We have

dL/dt = (∂L/∂t)(dt/dt)
      + (∂L/∂x)(dx/dt) + (∂L/∂y)(dy/dt) + (∂L/∂z)(dz/dt)
      + (∂L/∂ẋ)(dẋ/dt) + (∂L/∂ẏ)(dẏ/dt) + (∂L/∂ż)(dż/dt).

Using our assumption that ∂L/∂t = 0 and the Euler-Lagrange equations, we get

dL/dt = d/dt(∂L/∂ẋ) ẋ + d/dt(∂L/∂ẏ) ẏ + d/dt(∂L/∂ż) ż
      + (∂L/∂ẋ)(dẋ/dt) + (∂L/∂ẏ)(dẏ/dt) + (∂L/∂ż)(dż/dt)
      = d/dt ( (∂L/∂ẋ) ẋ + (∂L/∂ẏ) ẏ + (∂L/∂ż) ż ),

which gives us our desired d/dt( (∂L/∂ẋ) ẋ + (∂L/∂ẏ) ẏ + (∂L/∂ż) ż − L ) = 0.

To finish this section, we need to link this with energy, as is done in the following corollary, whose proof is left to the exercises:

Corollary 6.4.1. If

L = (m/2)(ẋ² + ẏ² + ż²) − U(x, y, z),

then the energy

(m/2)(ẋ² + ẏ² + ż²) + U(x, y, z)

is a constant.
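As a numerical illustration of this corollary (my own sketch; the potential U(x) = x⁴/4, the mass, and the step size are arbitrary choices), the code below integrates mẍ = −U′(x) with a velocity-Verlet step and watches the energy ½mẋ² + U(x) stay essentially constant along the computed path.

# Integrate m*xddot = -U'(x) for U(x) = x**4/4 and monitor E = m*xdot**2/2 + U(x).
m, dt = 1.0, 1.0e-3
x, v = 1.0, 0.0                      # initial position and velocity

def Uprime(x):
    return x**3                      # U(x) = x**4 / 4

energies = []
for step in range(20000):
    # velocity Verlet step
    a = -Uprime(x) / m
    x += v * dt + 0.5 * a * dt**2
    a_new = -Uprime(x) / m
    v += 0.5 * (a + a_new) * dt
    energies.append(0.5 * m * v**2 + x**4 / 4)

print(min(energies), max(energies))  # both very close to the initial energy 0.25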


6.5. Noether’s Theorem and Conservation Laws

Noether's Theorem is one of the most important results of the twentieth century. It provides a link among mechanics (as described by Lagrangians), symmetries (and hence group theory), and conservation laws. In essence, Noether's Theorem states that any time there is a continuous change of coordinates that leaves the Lagrangian alone, then there must be a quantity that does not change (which, in math language, means that there is an invariant, while in physics language, it means there is a conservation law). For example, in the last section, under the assumption that the Lagrangian does not change under a change of the time coordinate (which is basically what our assumption that ∂L/∂t = 0 means), we derived the conservation of energy. This is an extremely special case of Noether's Theorem.

A general, sweeping statement of Noether's Theorem, which we quote from section 1.1 of Zeidler's Quantum Field Theory I: Basics in Mathematics and Physics [70] is

Conservation laws in physics are caused by symmetries of physical systems.

Hence, in classical mechanics, the conservation of energy is derivable from the symmetry of time change. Similarly, in classical mechanics, whenever the Lagrangian does not change under spatial translations, the corresponding quantity that does not change can be shown to be the momentum, and, whenever the Lagrangian does not change under rotations of the spatial coordinates, the corresponding conserved quantity can be shown to be the angular momentum. In special relativity, time and space are not independent of each other, so it should not be surprising, under the influence of Noether, that energy and momentum might not be conserved anymore. But, instead of energy and momentum being conserved, there is a four-vector (called the energy-momentum four-vector) that is conserved.

Here is the big picture. Different theories of mechanics are described by different Lagrangians. Different Lagrangians are invariant under different changes of coordinates (i.e., different symmetries). Different symmetries give rise to different conservation laws.

Which is more important; which is more basic: the Lagrangian, the symmetries, or the conservation laws? The answer is that all three are equally important, and to a large extent, imply each other. We could have built this book around any of the three.

There are two recent books on Noether's Theorem: Kosmann-Schwarzbach's The Noether Theorems: Invariants and Conservation Laws in the Twentieth Century [35] and Neuenschwander's Emmy Noether's Wonderful Theorem [47].


6.6. Exercises

Exercise 6.6.1. (This is problem 1 in section 48 of [59].) Find the Euler-Lagrange equations for

∫ (√(1 + (x′)²)/x) dt.

Exercise 6.6.2. (This is problem 2 in section 48 of [59].) Find the extremal solutions to

∫_0^4 (tx′ − (x′)²) dt.

Exercise 6.6.3. Show that an extremum (x(t), y(t)) to

∫_{t0}^{t1} f(t, x(t), x′(t), y(t), y′(t)) dt,

subject to the initial conditions

x(t0) = x0, y(t0) = y0, x(t1) = x1, y(t1) = y1,

must satisfy

∂/∂x f(t, x(t), x′(t), y(t), y′(t)) − d/dt ∂/∂x′ f(t, x(t), x′(t), y(t), y′(t)) = 0

and

∂/∂y f(t, x(t), x′(t), y(t), y′(t)) − d/dt ∂/∂y′ f(t, x(t), x′(t), y(t), y′(t)) = 0.

Exercise 6.6.4. Fix two points (t0, x0, y0) and (t1, x1, y1). Given any function (x(t), y(t)) with

x(t0) = x0 and x(t1) = x1
y(t0) = y0 and y(t1) = y1,

we know that the arclength is given by the integral

∫_{t0}^{t1} √(1 + (x′)² + (y′)²) dt.

Show that the shortest distance between these two points is indeed a straight line.

Exercise 6.6.5. Let R be a rectangle in the (s, t)-plane, with boundary ∂(R). Show that if the function x(s, t) minimizes the integral

∫_R f(s, t, x(s, t), ∂x/∂s, ∂x/∂t) ds dt,

then x(s, t) will satisfy

∂f/∂x − ∂/∂s ∂f/∂xs − ∂/∂t ∂f/∂xt = 0.

(Here we are using the notation

xs = ∂x/∂s
xt = ∂x/∂t.)

Exercise 6.6.6. Let R be a rectangle in the (s, t)-plane, with boundary ∂(R). Show that if the functions (x(s, t), y(s, t)) minimize the integral

∫_R f(s, t, x(s, t), ∂x/∂s, ∂x/∂t, y(s, t), ∂y/∂s, ∂y/∂t) ds dt,

then (x(s, t), y(s, t)) will satisfy

∂f/∂x − ∂/∂s ∂f/∂xs − ∂/∂t ∂f/∂xt = 0

and

∂f/∂y − ∂/∂s ∂f/∂ys − ∂/∂t ∂f/∂yt = 0.

(Here we are using the notation

xs = ∂x/∂s, ys = ∂y/∂s
xt = ∂x/∂t, yt = ∂y/∂t.)

Exercise 6.6.7. If the Lagrangian is

L = (m/2)(ẋ² + ẏ² + ż²) − U(x, y, z),

show that the energy

(m/2)(ẋ² + ẏ² + ż²) + U(x, y, z)

is a constant.

Exercise 6.6.8. Suppose that we have a particle whose mass m = 2, for some choice of units. Suppose that the potential energy is U(x) = x². For the Lagrangian L = ẋ² − x², explicitly solve the Euler-Lagrange equation. Use this solution to show explicitly that the energy ẋ² + x² is a constant. (Hint: In finding solutions for the Euler-Lagrange equation, think trig functions.)


7

Potentials

Summary: The different roles of the scalar potential function and the vector potential function will be developed. Initially both potentials seem to be secondary mathematical artifacts, as compared to the seemingly more fundamental electric field E(x, y, z, t) and magnetic field B(x, y, z, t). This is not the case. Both potentials are critical for the Lagrangian description of Maxwell's equations (which is the goal of the next chapter) and, ultimately more importantly, are critical for seeing how to generalize Maxwell's equations to the strong and weak forces.

7.1. Using Potentials to Create Solutions for Maxwell’s Equations

It is not at all clear how easy it is to find vector fields E, B, and j and a function ρ that satisfy the many requirements needed for Maxwell's equations. There is a mathematical technique that almost seems to be a trick that will allow us easily to construct such solutions; we will see, though, that this approach is far more than a mere mathematical artifice. We start with the trick.

Let φ(x, y, z, t) be any function and let A(x, y, z, t) be any vector field (A1(x, y, z, t), A2(x, y, z, t), A3(x, y, z, t)). We now simply set

E(x, y, z, t) = −∇φ − ∂A(x, y, z, t)/∂t
             = −(∂φ/∂x, ∂φ/∂y, ∂φ/∂z) − (∂A1/∂t, ∂A2/∂t, ∂A3/∂t)

B(x, y, z, t) = ∇ × A(x, y, z, t)
             = (∂A3/∂y − ∂A2/∂z, −(∂A3/∂x − ∂A1/∂z), ∂A2/∂x − ∂A1/∂y).


By also setting

ρ = div(E)

and

j = c² curl(B) − ∂E/∂t,

we get solutions to Maxwell's equations. (This is explicitly shown in the exercises.)
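Here is a quick symbolic spot-check of the trick (my own sketch, using the potentials that appear in Exercise 7.5.1 below): with E = −∇φ − ∂A/∂t and B = ∇ × A, the two source-free Maxwell equations ∇ · B = 0 and ∇ × E = −∂B/∂t hold identically, while the remaining two hold because they are used to define ρ and j.

# With E = -grad(phi) - dA/dt and B = curl(A), check div(B) = 0 and
# curl(E) + dB/dt = 0 for a sample scalar and vector potential.
import sympy as sp

x, y, z, t = sp.symbols('x y z t')

def grad(f):
    return [sp.diff(f, x), sp.diff(f, y), sp.diff(f, z)]

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

def curl(F):
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

phi = x*y*z*t                                  # potentials of Exercise 7.5.1
A = [x**2*y + t**3, x*y*z**2*t, t**3*x + y*z*t]

E = [-g - sp.diff(a, t) for g, a in zip(grad(phi), A)]
B = curl(A)

print(sp.simplify(div(B)))                                           # 0
print([sp.simplify(c + sp.diff(b, t)) for c, b in zip(curl(E), B)])  # [0, 0, 0]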

We call φ the scalar potential and A the vector potential. These potentials generate a lot of solutions. Of course, there is no reason to think that if we have a real electric field E and a real magnetic field B then there should be these auxiliary potentials. The goal of the next section is to show that they actually exist.

(First for a few technical caveats. Throughout this book, we are assuming that all functions can be differentiated as many times as needed. We also assume that our functions are such that we can interchange derivatives and integrals when needed. In this section, we are critically using that the domains for our functions are R³, R⁴, or, at worst, contractible spaces, which is a concept from topology. If the domains are not contractible, then the results of this chapter are no longer true. In particular, there is no guarantee for the existence of potentials. The full discussion of these issues requires a more detailed knowledge of differential topology, such as in the classic Differential Topology by Guillemin and Pollack [30].)

7.2. Existence of Potentials

Our goal is to prove

Theorem 7.2.1. Let E and B be vector fields satisfying ∇ × E = −∂B/∂t and ∇ · B = 0. Then there exist a scalar potential function φ and a vector potential field A such that

E(x, y, z, t) = −∇φ − ∂A(x, y, z, t)/∂t
B(x, y, z, t) = ∇ × A(x, y, z, t).

Note that this theorem applies only to vector fields E and B satisfying Maxwell's equations.

To some extent, the existence of these potentials stems from the following theorem from vector calculus:

Theorem 7.2.2. Let F be a vector field.


1. If ∇ × F = 0, then there exists a function φ, the scalar potential, such that

F(x, y, z, t) = ∇φ(x, y, z, t).

2. If ∇ · F = 0, then there exists a vector field A, the vector potential, such that

F(x, y, z, t) = ∇ × A(x, y, z, t).

(We give intuitions and a proof of this theorem in the Appendix at the end of this chapter.) Though this theorem will provide a fairly quick proof that potentials exist, it makes their existence feel more like a mathematical trick. This is one of the reasons why for many years potentials were viewed as not as basic as the fields E and B. As we will see in later chapters, this view is not correct.

Now for the proof of Theorem 7.2.1.

Proof. Since we are assuming ∇ · B = 0, we know from the preceding theorem that there is a vector field A such that

B = ∇ × A.

We must now show that the function φ exists. We know that

∇ × E = −∂B/∂t,

which means that

∇ × E = −∂(∇ × A)/∂t.

As will be shown in the exercises, this can be rewritten as

∇ × E = −∇ × ∂A/∂t,

meaning that

0 = ∇ × E + ∇ × ∂A/∂t = ∇ × (E + ∂A/∂t).

But then there must be a function φ such that

E + ∂A/∂t = −∇φ,

giving us that

E = −∇φ − ∂A/∂t.


7.3. Ambiguity in the Potential

In any given physical situation, there is no ambiguity in the electric field E and the magnetic field B. This is not the case with the potentials. The goal of this section is to start to understand the ambiguity of the potentials, namely,

Theorem 7.3.1. Let E and B be vector fields satisfying Maxwell's equations, with scalar potential function φ and vector potential field A. If f(x, y, z, t) is any function, then the function φ − ∂f/∂t and the vector field A + ∇(f) are also potentials for E and B.

The proof is a straightforward argument from vector calculus and is in the exercises.
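A symbolic confirmation of this ambiguity (my own sketch; it reuses the potentials of Exercise 7.5.1 and the gauge function f of Exercise 7.5.3): replacing φ by φ − ∂f/∂t and A by A + ∇(f) leaves E and B unchanged.

# Gauge ambiguity: (phi, A) and (phi - df/dt, A + grad f) give the same E and B.
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
grad = lambda f: [sp.diff(f, x), sp.diff(f, y), sp.diff(f, z)]
curl = lambda F: [sp.diff(F[2], y) - sp.diff(F[1], z),
                  sp.diff(F[0], z) - sp.diff(F[2], x),
                  sp.diff(F[1], x) - sp.diff(F[0], y)]

def fields(phi, A):
    E = [-g - sp.diff(a, t) for g, a in zip(grad(phi), A)]
    return E, curl(A)

phi = x*y*z*t
A = [x**2*y + t**3, x*y*z**2*t, t**3*x + y*z*t]
f = x**2 + y*z + z*t**2

E1, B1 = fields(phi, A)
E2, B2 = fields(phi - sp.diff(f, t), [a + g for a, g in zip(A, grad(f))])

print([sp.simplify(e1 - e2) for e1, e2 in zip(E1, E2)])   # [0, 0, 0]
print([sp.simplify(b1 - b2) for b1, b2 in zip(B1, B2)])   # [0, 0, 0]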

Note that even using the phrase "The proof is a straightforward argument from vector calculus" makes this result seem not very important, suggesting even further that the potentials are just not as basic as the electric field E and the magnetic field B. Again, this is not the case. In later chapters we will see that potentials have a deep interpretation in modern differential geometry. (Technically, potentials are connections in a certain vector bundle.)

Also, in quantum mechanics, there is the Aharonov-Bohm effect, which proves that potentials have just as much status as the original fields E and B.

7.4. Appendix: Some Vector Calculus

We will now proceed to prove Theorem 7.2.2. Let

F = (F1, F2, F3)

be a vector field. We first assume that its curl is zero: ∇ × F = (0,0,0), which means that

∂F1/∂y = ∂F2/∂x
∂F1/∂z = ∂F3/∂x
∂F3/∂y = ∂F2/∂z.


We want to find a function φ such that ∇φ = F, meaning we want

∂φ/∂x = F1
∂φ/∂y = F2
∂φ/∂z = F3.

We will just state how to construct the function φ, ignoring, unfortunately, any clue as to how anyone could initially think up this function. Given a point (x, y, z) ∈ R³, consider the straight line γ from the origin (0,0,0) to (x, y, z) given by

r(t) = (xt, yt, zt),

for 0 ≤ t ≤ 1, as in Figure 7.1. Then define

φ(x, y, z) = ∫_γ F · dr.

Figure 7.1: the straight-line path γ(t) = (xt, yt, zt) from (0,0,0) to (x, y, z), along which F is integrated.

We have

φ(x, y, z) = ∫_γ F · dr
           = ∫_0^1 (x F1(tx, ty, tz) + y F2(tx, ty, tz) + z F3(tx, ty, tz)) dt.


We will now show that ∂φ/∂x = F1, leaving ∂φ/∂y = F2 for the exercises (showing ∂φ/∂z = F3 is similar). We have

∂φ/∂x = ∂/∂x ( ∫_0^1 (x F1(tx, ty, tz) + y F2(tx, ty, tz) + z F3(tx, ty, tz)) dt )
      = ∫_0^1 ∂/∂x (x F1(tx, ty, tz) + y F2(tx, ty, tz) + z F3(tx, ty, tz)) dt
      = ∫_0^1 ( F1(tx, ty, tz) + tx ∂F1/∂x + ty ∂F2/∂x + tz ∂F3/∂x ) dt
      = ∫_0^1 ( F1(tx, ty, tz) + tx ∂F1/∂x + ty ∂F1/∂y + tz ∂F1/∂z ) dt   (using ∇ × F = 0)
      = ∫_0^1 ( F1(tx, ty, tz) + t d/dt (F1(tx, ty, tz)) ) dt.

Using integration by parts, we have

∫_0^1 t d/dt (F1(tx, ty, tz)) dt = F1(x, y, z) − ∫_0^1 F1(tx, ty, tz) dt.

Then

∂φ/∂x = ∫_0^1 ( F1(tx, ty, tz) + t d/dt (F1(tx, ty, tz)) ) dt = F1(x, y, z),

as desired.

Now for the second part of the theorem. Assume that ∇ · F = 0, meaning that

∂F1/∂x + ∂F2/∂y + ∂F3/∂z = 0.


(Eventually we will use that ∂F1/∂x = −∂F2/∂y − ∂F3/∂z.) We must construct a vector field G = (G1, G2, G3) such that ∇ × G = F, or, in other words, such that

F1 = ∂G3/∂y − ∂G2/∂z
F2 = ∂G1/∂z − ∂G3/∂x
F3 = ∂G2/∂x − ∂G1/∂y.

Set

G1(x, y, z) = ∫_0^1 (zt F2(tx, ty, tz) − yt F3(tx, ty, tz)) dt
G2(x, y, z) = ∫_0^1 (xt F3(tx, ty, tz) − zt F1(tx, ty, tz)) dt
G3(x, y, z) = ∫_0^1 (yt F1(tx, ty, tz) − xt F2(tx, ty, tz)) dt.

(Again, we are giving no motivation for how anyone first thought of this method.) We will show that

F1 = ∂G3/∂y − ∂G2/∂z,

leaving the other parts for the exercises. We have

∂G3/∂y − ∂G2/∂z = ∂/∂y ( ∫_0^1 (yt F1(tx, ty, tz) − xt F2(tx, ty, tz)) dt )
                − ∂/∂z ( ∫_0^1 (xt F3(tx, ty, tz) − zt F1(tx, ty, tz)) dt )
                = ∫_0^1 ∂/∂y (yt F1(tx, ty, tz) − xt F2(tx, ty, tz)) dt
                − ∫_0^1 ∂/∂z (xt F3(tx, ty, tz) − zt F1(tx, ty, tz)) dt
                = ∫_0^1 ( t F1 + yt² ∂F1/∂y − xt² ∂F2/∂y ) dt
                − ∫_0^1 ( xt² ∂F3/∂z − t F1 − zt² ∂F1/∂z ) dt.

Since ∂F1/∂x = −∂F2/∂y − ∂F3/∂z, we have that

−xt² ∂F2/∂y − xt² ∂F3/∂z = xt² ∂F1/∂x.

Then the preceding becomes

∂G3/∂y − ∂G2/∂z = ∫_0^1 ( 2t F1 + t² d/dt F1(tx, ty, tz) ) dt.

Using integration by parts we have

∫_0^1 t² d/dt F1(tx, ty, tz) dt = t² F1(tx, ty, tz) |_0^1 − ∫_0^1 2t F1 dt = F1 − ∫_0^1 2t F1 dt.

Thus

∂G3/∂y − ∂G2/∂z = F1(x, y, z),

as desired.
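To see the two constructions of this appendix in action (my own sketch; the sample fields are arbitrary choices, one curl-free and one divergence-free), the following sympy code builds φ from the line-integral formula and G from the three integrals above, and then verifies ∇φ = F and ∇ × G = F.

# Verify the appendix constructions on sample fields:
#   curl-free  F = (y*z, x*z, x*y)   -> phi with grad(phi) = F
#   div-free   F = (y**2, z**2, x**2) -> G with curl(G) = F
import sympy as sp

x, y, z, s = sp.symbols('x y z s')   # s plays the role of the parameter "t" above

def at(F, s):
    return [f.subs({x: s*x, y: s*y, z: s*z}) for f in F]

# scalar potential for a curl-free field
F = [y*z, x*z, x*y]
Fs = at(F, s)
phi = sp.integrate(x*Fs[0] + y*Fs[1] + z*Fs[2], (s, 0, 1))
print([sp.simplify(sp.diff(phi, v) - f) for v, f in zip((x, y, z), F)])  # [0, 0, 0]

# vector potential for a divergence-free field
F = [y**2, z**2, x**2]
Fs = at(F, s)
G = [sp.integrate(z*s*Fs[1] - y*s*Fs[2], (s, 0, 1)),
     sp.integrate(x*s*Fs[2] - z*s*Fs[0], (s, 0, 1)),
     sp.integrate(y*s*Fs[0] - x*s*Fs[1], (s, 0, 1))]
curlG = [sp.diff(G[2], y) - sp.diff(G[1], z),
         sp.diff(G[0], z) - sp.diff(G[2], x),
         sp.diff(G[1], x) - sp.diff(G[0], y)]
print([sp.simplify(c - f) for c, f in zip(curlG, F)])                    # [0, 0, 0]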

7.5. Exercises

Exercise 7.5.1. Let

φ(x, y, z, t) = xyzt

and

A(x, y, z, t) = (x²y + t³, xyz²t, t³x + yzt).

Compute

E(x, y, z, t) = −∇φ − ∂A(x, y, z, t)/∂t
B(x, y, z, t) = ∇ × A(x, y, z, t).


Then compute

ρ = div(E)
j = c² curl(B) − ∂E/∂t.

Verify that these are solutions to Maxwell's equations.

Exercise 7.5.2. Let φ(x, y, z, t) be any function and A(x, y, z, t) any vector field. Setting

E(x, y, z, t) = −∇φ − ∂A(x, y, z, t)/∂t
B(x, y, z, t) = ∇ × A(x, y, z, t)

and then in turn setting

ρ = div(E)
j = c² curl(B) − ∂E/∂t,

show that we have a solution to Maxwell's equations.

Exercise 7.5.3. Let

f(x, y, z, t) = x² + yz + zt².

Using the notation in the first exercise of this chapter, replace the preceding φ with

φ − ∂f/∂t

and A with

A + ∇(f).

Show that we get the same values as in the first exercise for E, B, j, and ρ.

Exercise 7.5.4. For any vector field F = (F1, F2, F3), show that

∂(∇ × F)/∂t = ∇ × ∂F/∂t.

Exercise 7.5.5. For F = (F1, F2, F3), show that

∇ × F = 0

if and only if

∂F2/∂x = ∂F1/∂y
∂F3/∂x = ∂F1/∂z
∂F2/∂z = ∂F3/∂y.

Exercise 7.5.6. Prove that if E and B are any vector fields satisfying Maxwell's equations, with potential function φ and potential vector field A, then, for any function f(x, y, z, t), the function φ − ∂f/∂t and the vector field A + ∇(f) are also potentials for E and B.

Exercise 7.5.7. Let F be a vector field with ∇ × F = (0,0,0). Set

φ(x, y, z) = ∫_0^1 (x F1(tx, ty, tz) + y F2(tx, ty, tz) + z F3(tx, ty, tz)) dt.

Show that

∂φ/∂y = F2.

Exercise 7.5.8. Let

F = (x, y, z).

Show that ∇ × F = (0,0,0) and then find a function φ(x, y, z) such that

∇φ = F.

Exercise 7.5.9. Let F be a vector field such that ∇ · F = 0. Set

G1(x, y, z) = ∫_0^1 (zt F2(tx, ty, tz) − yt F3(tx, ty, tz)) dt
G3(x, y, z) = ∫_0^1 (yt F1(tx, ty, tz) − xt F2(tx, ty, tz)) dt.

Show that

F2 = ∂G1/∂z − ∂G3/∂x.

Exercise 7.5.10. Let F = (yz, x, y). Show that ∇ · F = 0. Then find a vector field G such that

∇ × G = F.


8

Lagrangians and Electromagnetic Forces

Summary: Using last chapter's scalar and vector potentials, we will construct the Lagrangian whose corresponding Euler-Lagrange equations give us the electromagnetic force

F = q(E + v × B).

8.1. Desired Properties for the Electromagnetic Lagrangian

In Chapter 6, we saw how to recast basic mechanics in terms of a Lagrangian. We want to see how to do this for electromagnetic forces. Thus we want to find a function

L = L(t, x, y, z, ẋ, ẏ, ż)

with the following key property. Suppose we have a particle moving in an electromagnetic field that starts at the point (x0, y0, z0) and ends at the point (x1, y1, z1). We want to be able to predict the path (x(t), y(t), z(t)) for 0 ≤ t ≤ 1 that the particle follows. Thus we want our function L to have the property that, among all possible paths, the actual path is a critical point of the integral

∫_0^1 L(t, x, y, z, ẋ, ẏ, ż) dt.

From Chapter 5, we know that the particle's path (x(t), y(t), z(t)) will satisfy the system of differential equations

F = q(E + v × B).


From Chapter 6, we know for this path that the Euler-Lagrange equations

d/dt (∂L/∂ẋ) − ∂L/∂x = 0
d/dt (∂L/∂ẏ) − ∂L/∂y = 0
d/dt (∂L/∂ż) − ∂L/∂z = 0

must be satisfied. A candidate function L(t, x, y, z, ẋ, ẏ, ż) will be our desired Lagrangian if the Euler-Lagrange equations can be shown to be equivalent to F = q(E + v × B).

One final word about notation, which can quickly become a nightmare. When we write L = L(t, x, y, z, ẋ, ẏ, ż), we are thinking of L as a function of seven independent variables. The reason to denote the last three variables as ẋ, ẏ, ż instead of, say, u, v, w, is that when we find our path (x(t), y(t), z(t)) that minimizes ∫_0^1 L(t, x, y, z, ẋ, ẏ, ż) dt, we will have

ẋ = dx(t)/dt
ẏ = dy(t)/dt
ż = dz(t)/dt.

8.2. The Electromagnetic Lagrangian

The Lagrangian for electromagnetism is

L = (1/2) m v · v − qφ + q A · v.

The vector v denotes the velocity v = (ẋ, ẏ, ż), the function φ(x, y, z, t) is the scalar potential, and the vector field A(x, y, z, t) is the vector potential. Also, m is the mass of the particle and q is its charge, both of which are constants. Thus this L is a function of the variables t, x, y, z, ẋ, ẏ, ż.

We are simply stating the Lagrangian L and giving no clue as to how anyone could have ever first written it down. Our justification will be that the Euler-Lagrange equations for this L will yield the correct force F = q(E + v × B) for electromagnetism. Hence the following:


Theorem 8.2.1. Any path (x(t), y(t), z(t)), for 0 ≤ t ≤ 1, that minimizes

∫_0^1 L(t, x, y, z, ẋ, ẏ, ż) dt

will satisfy the ordinary differential equations given by

F = q(E + v × B).

Proof. The ordinary differential equations given by F = q(E + v × B) are

mẍ = qE1 + q(ẏB3 − żB2)
mÿ = qE2 + q(żB1 − ẋB3)
mz̈ = qE3 + q(ẋB2 − ẏB1).

Similar to ẋ denoting the derivative of x(t) with respect to time t, we are letting ẍ denote the second derivative of x(t) with respect to time t, and similarly for ÿ, z̈, and so on.

We must show that these are implied by the Euler-Lagrange equations:

d/dt (∂L/∂ẋ) − ∂L/∂x = 0
d/dt (∂L/∂ẏ) − ∂L/∂y = 0
d/dt (∂L/∂ż) − ∂L/∂z = 0.

We will show here that d/dt(∂L/∂ẋ) − ∂L/∂x = 0 implies that mẍ = qE1 + q(ẏB3 − żB2), leaving the rest to the exercises.

We are assuming that the Lagrangian is

L = (1/2) m v · v − qφ + q A · v.

The scalar potential φ and the vector potential A = (A1, A2, A3) are functions of x, y, z and do not directly depend on the derivatives ẋ, ẏ, ż. The velocity v, of course, is just v = (ẋ, ẏ, ż). Thus the Lagrangian is

L(t, x, y, z, ẋ, ẏ, ż) = (1/2) m (ẋ² + ẏ² + ż²) − qφ(x, y, z, t) + q(A1ẋ + A2ẏ + A3ż).

Then

∂L/∂ẋ = mẋ + qA1.


Hence,

d/dt (∂L/∂ẋ) = mẍ + q dA1(t, x(t), y(t), z(t))/dt
             = mẍ + q ( ∂A1/∂t · dt/dt + ∂A1/∂x · dx/dt + ∂A1/∂y · dy/dt + ∂A1/∂z · dz/dt )
             = mẍ + q ( ∂A1/∂t + ∂A1/∂x ẋ + ∂A1/∂y ẏ + ∂A1/∂z ż ).

Now,

∂L/∂x = −q ∂φ/∂x + q ( ∂A1/∂x ẋ + ∂A2/∂x ẏ + ∂A3/∂x ż ).

Then the Euler-Lagrange equation

∂L/∂x − d/dt (∂L/∂ẋ) = 0

becomes

mẍ = −q ∂φ/∂x − q ∂A1/∂t + q [ ẏ (∂A2/∂x − ∂A1/∂y) + ż (∂A3/∂x − ∂A1/∂z) ],

which is the desired formula

mẍ = qE1 + q(ẏB3 − żB2).

There is still the question of how anyone ever came up with the Lagrangian L = (1/2) m v · v − qφ + q A · v. In general, given a system of differential equations, one can always ask whether there is a Lagrangian whose minimal paths give solutions to the initial equations. This is called the Inverse Problem for Lagrangians. Of course, for any specific system, such as for our system F = q(E + v × B), the corresponding L can be arrived at via fiddling with the Euler-Lagrange equations, as is indeed what originally happened.
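For readers who want the bookkeeping in the proof done by machine, here is a sympy sketch (mine; the particular polynomial potentials are arbitrary choices, not from the text) that forms L = ½m v·v − qφ + qA·v, evaluates d/dt(∂L/∂ẋ) − ∂L/∂x along a path, and checks that it agrees with mẍ − qE1 − q(ẏB3 − żB2).

# Check Theorem 8.2.1 on sample potentials: the x Euler-Lagrange expression of
# L = m|v|^2/2 - q*phi + q*A.v matches m*x'' - q*E1 - q*(y'*B3 - z'*B2).
import sympy as sp

t, m, q = sp.symbols('t m q')
x, y, z, xd, yd, zd = sp.symbols('x y z xdot ydot zdot')
X, Y, Z = sp.Function('X'), sp.Function('Y'), sp.Function('Z')

phi = x**2*y + z*t                   # arbitrary sample scalar potential
A = [y*z*t, x**2, z**2*y]            # arbitrary sample vector potential

L = sp.Rational(1, 2)*m*(xd**2 + yd**2 + zd**2) - q*phi + q*(A[0]*xd + A[1]*yd + A[2]*zd)

# substitute the path and its velocity into an expression in the seven variables
on_path = {x: X(t), y: Y(t), z: Z(t),
           xd: sp.diff(X(t), t), yd: sp.diff(Y(t), t), zd: sp.diff(Z(t), t)}

el_x = sp.diff(sp.diff(L, xd).subs(on_path), t) - sp.diff(L, x).subs(on_path)

E1 = -sp.diff(phi, x) - sp.diff(A[0], t)
B2 = sp.diff(A[0], z) - sp.diff(A[2], x)
B3 = sp.diff(A[1], x) - sp.diff(A[0], y)
lorentz_x = m*sp.diff(X(t), t, 2) - q*(E1 + yd*B3 - zd*B2).subs(on_path)

print(sp.simplify(el_x - lorentz_x))   # 0: the two expressions agree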

8.3. Exercises

Exercise 8.3.1. Show that

d/dt (∂L/∂ẏ) − ∂L/∂y = 0

implies that

mÿ = qE2 + q(żB1 − ẋB3)

and that

d/dt (∂L/∂ż) − ∂L/∂z = 0

implies that

mz̈ = qE3 + q(ẋB2 − ẏB1).

Exercise 8.3.2. Let φ(x, y, z) = xyz² be a scalar potential and A = (xyz, x + z, y + 3x) be a vector potential. Find the Lagrangian and then write down the corresponding Euler-Lagrange equations.

Exercise 8.3.3. Use the notation of the previous problem. Let f(x, y, z) = x² + y²z. Find the Lagrangian for the scalar potential

φ − ∂f/∂t

and vector potential

A + ∇(f).

Compute the corresponding Euler-Lagrange equations, showing that these are the same as the Euler-Lagrange equations in the previous problem.

Exercise 8.3.4. Let φ be a scalar potential, A be a vector potential, and f be any function. Show that the Euler-Lagrange equations are the same for the Lagrangian using the scalar potential φ and the vector potential A as for the Lagrangian using the scalar potential φ − ∂f/∂t and vector potential A + ∇(f).


9

Differential Forms

Summary: This chapter will develop the basic definitions of differential forms and the exterior algebra. The emphasis will be on how to compute with differential forms. The exterior algebra is both a highly efficient language for the basic terms from vector calculus and a language that can be easily generalized to far broader situations. Our eventual goal is to recast Maxwell's equations in terms of differential forms.

9.1. The Vector Spaces Ω^k(R^n)

9.1.1. A First Pass at the Definition

(In this subsection we give a preliminary definition for k-forms. Here they will be finite-dimensional real vector spaces. In the next subsection we will give the official definition, where we extend the coefficients to allow for functions and not just numbers.)

Let x1, x2, ..., xn be coordinates for R^n. We want to define the vector space of k-forms. For now, we will concentrate on explicit definitions and methods of manipulation. (Underlying intuitions will be developed in the next section.) This division reflects that knowing how to calculate with k-forms is to a large extent independent of what they mean. This is analogous to differentiation in beginning calculus, where we care about derivatives because of their meaning (i.e., we want to know a function's rate of change, or the slope of a tangent line) but use derivatives since they are easy to calculate (using various rules of differentiation, such as the product rule, chain rule, quotient rule, etc.). Thinking about the meaning of the derivative rarely aids in calculating.

For each nonnegative integer k, we will define a vector space of dimension(n

k

)= n!

k!(n − k)!.

103

Page 119: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

104 9 Differential Forms

We write down basis elements first. For each sequence of positive integers1 ≤ i1 < i2 < · · · < ik ≤ n, we write down the symbol

dxi1 ∧ dxi2 ∧ ·· · ∧ dxik

which we call an elementary k-form. For I = {i1, i2, . . . ik}, we write

dxI := dxi1 ∧ dxi2 ∧ ·· · ∧ dxik .

There are(n

k

)such symbols. Then we define, temporarily, �k(Rn) to be the

vector space obtained by taking finite linear combinations, using real numbersas coefficients, of these elementary k-forms. (In the next subsection we willextend the definition of �k(Rn) by allowing functions to be coefficients.)

Let us consider the forms for R3, with coordinates x , y, z. The elementary1-forms are

dx ,dy,dz,

the elementary 2-forms are

dx ∧ dy, dx ∧ dz, dy ∧ dz,

and the elementary 3-form is

dx ∧ dy ∧ dz.

The elementary 0-form is denoted by 1.Then the vector space of 1-forms is

�1(R3) = {adx + bdy + cdz : a,b,c ∈R}.This is a vector space if we define scalar multiplication by

λ(adx + bdy + cdz) = λadx +λbdy +λcdz

and vector addition by

(a1dx + b1dy + c1dz) + (a2dx + b2dy + c2dz)

= (a1 + a2)dx + (b1 + b2)dy + (c1 + c2)dz.

In a similar way, we have

�0(R3) = {a · 1 : a ∈R}�2(R3) = {adx ∧ dy + bdx ∧ dz + cdy ∧ dz : a,b,c ∈R}�3(R3) = {adx ∧ dy ∧ dz : a ∈ R}.

Often we write �0(R3) simply as R.

Page 120: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

9.1 The Vector Spaces �k(Rn) 105

The elementary k-form dxi1 ∧dxi2 ∧·· ·∧dxik depends on the ordering i1 <

i2 < · · · < ik . We want to make sense out of any

dxi1 ∧ dxi2 ∧ ·· · ∧ dxik ,

even if we do not have i1 < i2 < · · · < ik . The idea is that if we interchange anytwo of the terms in dxi1 ∧ dxi2 ∧ ·· · ∧ dxik , we change the sign. For example,we define

dx2 ∧ dx1 = −dx1 ∧ dx2

anddx3 ∧ dx2 ∧ dx1 = −dx1 ∧ dx2 ∧ dx3.

In general, given a collection (i1, i2, . . . , ik), there is a reordering σ (called apermutation ) such that

σ (i1) < · · · < σ (ik).

Each such permutation is the composition of interchangings of two terms. Ifwe need an even number of such permuations we say that the sign of σ is 1,while if we need an odd number, the sign is −1. (The notion of “even/odd”is well-defined, since it can be shown that a given permutation cannot be botheven and odd.) Then we define

dxi1 ∧ dxi2 ∧ ·· · ∧ dxik = sign(σ ) · dxσ (i1) ∧ dxσ (i2) ∧ ·· · ∧ dxσ (ik ).

Let us consider the 3-form dx1 ∧ dx2 ∧ dx3. There are 3! = 6 ways forrearranging (1,2,3), including the “rearrangement that leaves them alone.” Wehave

dx1 ∧ dx2 ∧ dx3 = dx1 ∧ dx2 ∧ dx3

dx2 ∧ dx1 ∧ dx3 = −dx1 ∧ dx2 ∧ dx3

dx3 ∧ dx2 ∧ dx1 = −dx1 ∧ dx2 ∧ dx3

dx1 ∧ dx3 ∧ dx2 = −dx1 ∧ dx2 ∧ dx3

dx3 ∧ dx1 ∧ dx2 = −dx1 ∧ dx3 ∧ dx2

= dx1 ∧ dx2 ∧ dx3

dx2 ∧ dx3 ∧ dx1 = −dx2 ∧ dx1 ∧ dx3

= dx1 ∧ dx2 ∧ dx3.

Finally, we need to give meaning to symbols for dxi1 ∧ dxi2 ∧ ·· · ∧ dxik whenthe i1, i2, . . . , ik are not all distinct. We just declare this type of k-form to bezero. This intuitively agrees with what we just did, since if

dxi ∧ dx j = −dx j ∧ dxi ,

Page 121: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

106 9 Differential Forms

then we should have, for i = j ,

dxi ∧ dxi = −dxi ∧ dxi = 0.

9.1.2. Functions as Coefficients

We have defined each �k(Rn) to be a real vector space. Thus the coefficientsare real numbers. But we want to allow the coefficients to be functions. Forexample, we want to make sense out of terms such as

(x2 + y)dx + xyzdy + yz3dz

andsin (xy2z)dx ∧ dy + eydx ∧ dz + zdy ∧ dz.

Using tools from abstract algebra, the most sophisticated way is to start witha ring of functions on Rn , such as the ring of differentiable functions. Then wenow officially define each �k(Rn) to be the module over the ring of functionswith basis elements the elementary k-forms. This will be the definition that wewill be using for �k(Rn) for the rest of the book. (As is the case throughoutthis book, our functions are infinitely differentiable.)

There is a more down-to-earth approach, though. We want to make senseout of the symbol ∑

I

f I (x1, x2, . . . , xn)dxI ,

where we are summing over all indices I = {i1, i2, . . . ik} for i1 < i2 < · · · < ik

and where the fI are functions. For each point a = (a1,a2, . . . ,an) ∈ Rn , we

interpret∑

I f I (x1, x2, . . . , xn)dxI at a as the k-form∑I

fI (a)dxI =∑

I

f I (a1,a2, . . . ,an)dxI .

Thus∑

I f I (x1, x2, . . . , xn)dxI can be thought of as a way of defining a wholefamily of k-forms, changing at each point a ∈Rn .

9.1.3. The Exterior Derivative

So far, we have treated k-forms∑

I f I (x1, x2, . . . , xn)dxI quite formally, withno interpretation yet given. Still, we have symbols like dx showing up. Thissuggests that underlying our eventual interpretation will be derivatives. Fornow, we simply start with the definition of the exterior derivative:

Definition 9.1.1. The exterior derivative

d : �k(Rn) → �k+1(Rn)

Page 122: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

9.1 The Vector Spaces �k(Rn) 107

is defined by first setting, for any real-valued function f (x1, . . . , xn),

d( f (x1, x2, . . . , xn)dxI ) =n∑

j=1

∂ f

∂x jdx j ∧ dxI

and then, for sums of elementary k-forms, setting

d

(∑I

f I (x1, x2, . . . , xn)dxI

)=∑

I

d( f I (x1, x2, . . . , xn) ∧ dxI .

For example,

d((x2 + y + z3)dx

)= ∂(x2 + y + z3)

∂xdx ∧ dx

+ ∂(x2 + y + z3)

∂ ydy ∧ dx

+ ∂(x2 + y + z3)

∂zdz ∧ dx

= − dx ∧ dy − 3z2dx ∧ dz.

Theorem 9.1.1. For any k-form ω, we have

d(dω) = 0.

The proof is a calculation that we leave for the exercises. At a critical stageyou must use that the order of differentiation does not matter, meaning that

∂2 f

∂xi∂x j= ∂2 f

∂x j∂xi.

Theorem 9.1.2 (Poincaré’s Lemma). In Rn, let ω be a k-form such that

dω = 0.

Then there exists a (k − 1)-form τ such that

ω = dτ .

(For a proof, see theorem 4.11 in [61]. It must be emphasized that we areworking in R

n. Later we will have differential forms on more complicatedspaces. On these, Poincaré’s Lemma need not hold.)

Page 123: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

108 9 Differential Forms

Let us look at an example. In R3, let

ω = f dx + gdy + hdz.

Suppose that

dω = 0.

This means that

0 = dω

= d( f dx + gdy + hdz)

= ∂ f

∂ ydy ∧ dx + ∂ f

∂zdz ∧ dx

+ ∂g

∂xdx ∧ dy + ∂g

∂zdz ∧ dy

+ ∂h

∂xdx ∧ dz + ∂h

∂ ydy ∧ dz

=(

∂g

∂x− ∂ f

∂ y

)dx ∧ dy +

(∂h

∂x− ∂ f

∂z

)dx ∧ dz +

(∂h

∂ y− ∂g

∂z

)dy ∧ dz.

Thus dω = 0 is just a succinct notation for the set of conditions

∂ f

∂ y= ∂g

∂x

∂ f

∂z= ∂h

∂x∂g

∂z= ∂h

∂ y.

The preceding theorem is stating that there is a 0-form τ such that dτ = ω.Now a 0-form is just a function τ (x , y, z), which in turn means

dτ = ∂τ

∂xdx + ∂τ

∂ ydy + ∂τ

∂zdz.

Thus dτ = ω means that there is a function τ such that

f = ∂τ

∂x

g = ∂τ

∂ y

h = ∂τ

∂z.

Page 124: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

9.2 Tools for Measuring 109

We will see in Chapter 11 that this theorem is a general way of stating Theorem7.2.2 from vector calculus, namely, that the scalar and vector potentials alwaysexist.

9.2. Tools for Measuring

We will now see that differential forms are tools for measuring. A 1-form isa measuring tool for curves, a 2-form a measuring tool for surfaces, and, ingeneral, a k-form is a measuring tool for k-dimensional objects. In general, ifM is a k-dimensional subset of Rn and ω is a k-form, we want to interpret∫

as a number. We will start with seeing how to use 1-forms to make thesemeasurements for curves in R

3, then turn to the 2-form case for surfaces inR

3, and finally see the general situation. As mentioned earlier, in manipulatingforms, we rarely need actually to think about them as these “measuring tools.”

9.2.1. Curves in R3

We will consider curves in R3 as images of maps

γ : [a,b] →R3

whereγ (u) = (x(u), y(u), z(u)).

We will frequently denote this curve by γ .For example,

γ (u) = (u,3u,5u)

describes a straight line through the point γ (0) = (0,0,0) and the point γ (1) =(1,3,5). The curve

γ (u) = (cos(u), sin(u),1)

describes a unit circle in the plane z = 1.The tangent vector to a curve γ (u) at a point u is the vector(

dx(u)

du,dy(u)

du,dz(u)

du

).

Now consider a 1-form

ω = f (x , y, z)dx + g(x , y, z)dy + h(x , y, z)dz.

Page 125: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

110 9 Differential Forms

Recalling that γ : [a,b] →R3, we make the following definitions:

∫γ

f (x , y, z)dx =∫ b

af (x(u), y(u), z(u))

(dx

du

)du

∫γ

g(x , y, z)dy =∫ b

ag(x(u), y(u), z(u))

(dy

du

)du

∫γ

h(x , y, z)dz =∫ b

ah(x(u), y(u), z(u))

(dz

du

)du.

Then we define ∫γ

ω =∫

γ

f dx +∫

γ

gdy +∫

γ

hdz.

This is called the path integral. (In many multivariable calculus classes, it iscalled the line integral, even though the integration is not necessarily done overa straight line but over a more general curve.)

The definition for∫

γω explicitly uses the parameterization of the curve

γ (u) = (x(u), y(u), z(u)). Luckily, the actual value of∫

γω depends not on

this parameterization but only on the image curve γ in R3. In fact, all of

the preceding can be done for curves in Rn , leading to:

Theorem 9.2.1. Let ω ∈ �1(Rn) and let

γ : [a,b] →Rn

be any curve

γ (u) = (x1(u), . . . , xn(u)) .

Let

µ : [c,d] → [a,b]

be any map such that µ(c) = a and µ(d) = b. Then

∫γ ◦µ

ω =∫

γ

ω.

This means that the path integral∫

γω is independent of parameterization.

You are asked to prove a slightly special case of this in the exercises.

Page 126: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

9.2 Tools for Measuring 111

9.2.2. Surfaces in R3

For us, surfaces in R3 will be described as the image of a map

γ : [a1,a2] × [b1,b2] →R3,

where

γ (u,v) = (x(u,v), y(u,v), z(u,v)).

v

u

z

xy

Figure 9.1

The Jacobian of the map γ is

D(γ ) =

∂x

∂u

∂x

∂v∂ y

∂u

∂ y

∂v∂z

∂u

∂z

∂v

.

From multivariable calculus, we know that at the point γ (a,b) =(x(a,b), y(a,b), z(a,b)), the tangent plane is spanned by the columns of D(γ ).

Let

ω = f dx ∧ dy + gdx ∧ dz + hdy ∧ dz.

We want to be able to give meaning to the symbol

∫γ

ω =∫

γ

( f dx ∧ dy + gdx ∧ dz + hdy ∧ dz).

Page 127: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

112 9 Differential Forms

Start with the following definitions:

dx ∧ dy (D(γ )) =dx ∧ dy

∂x

∂u

∂x

∂v∂ y

∂u

∂ y

∂v∂z

∂u

∂z

∂v

= det

∂x

∂u

∂x

∂v∂ y

∂u

∂ y

∂v

= ∂x

∂u

∂ y

∂v− ∂ y

∂u

∂x

∂v,

dx ∧ dz (D(γ )) =dx ∧ dz

∂x

∂u

∂x

∂v∂ y

∂u

∂ y

∂v∂z

∂u

∂z

∂v

= det

∂x

∂u

∂x

∂v∂z

∂u

∂z

∂v

= ∂x

∂u

∂z

∂v− ∂z

∂u

∂x

∂v,

dy ∧ dz (D(γ )) =dy ∧ dz

∂x

∂u

∂x

∂v∂ y

∂u

∂ y

∂v∂z

∂u

∂z

∂v

= det

∂ y

∂u

∂ y

∂v∂z

∂u

∂z

∂v

= ∂ y

∂u

∂z

∂v− ∂z

∂u

∂ y

∂v.

Page 128: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

9.2 Tools for Measuring 113

Then define each of the following:∫γ

f dx ∧ dy =∫ b2

b1

∫ a2

a1

f (x(u,v), y(u,v), z(u, z))dx ∧ dy(D(γ )) dudv,

∫γ

gdx ∧ dz =∫ b2

b1

∫ a2

a1

g(x(u,v), y(u,v), z(u, z))dx ∧ dz(D(γ )) dudv,

∫γ

hdy ∧ dz =∫ b2

b1

∫ a2

a1

h(x(u,v), y(u,v), z(u, z))dy ∧ dz(D(γ )) dudv.

Finally define∫γ

ω =∫

γ

f dx ∧ dy +∫

γ

gdx ∧ dz +∫

γ

hdy ∧ dz.

We have reduced∫

γω to the calculation of three double integrals in the (u,v)-

plane.Although we have the domain for γ being a rectangle [a1,a2]× [b1,b2], any

region in the (u,v)-plane can be used.As with curves, while the definition of

∫γω seems to depend on the

given parameterization for our surface, the actual computed number isparameterization independent, as seen in

Theorem 9.2.2. Let ω ∈ �2(Rn). Let

γ : [a1,b1] × [c1,d1] →Rn

be any surfaceγ (u,v) = (x1(u,v), . . . , xn(u,v)) .

Letµ = (µ1(s, t),µ2(s, t)) : [a2,b2] × [c2,d2] → [a1,b1] × [c1,d1]

be any map such that the boundary of [a2,b2] × [c2,d2] maps to the boundaryof [a1,b1] × [c1,d1]. Then ∫

γ ◦µω =

∫γ

ω.

A slightly special case of the proof is one of the exercises.

9.2.3. k-manifolds in Rn

A k-space in Rn will be described as the image of a map

γ : [a1,b1] ×·· ·× [ak ,bk] → Rn,

Page 129: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

114 9 Differential Forms

whereγ (u1, . . . ,uk) = (x1(u1, . . . ,uk), . . . , xn(u1, . . . ,uk)).

The Jacobian of the map γ is the following n × k matrix:

D(γ ) =

∂x1

∂u1· · · ∂x1

∂uk...

∂xn

∂u1· · · ∂xn

∂uk

.

The image of γ is a manifold if the preceding matrix has rank k at all points,which geometrically means that, at each point of the image, the columns of theJacobian are linearly independent vectors in R

n spanning the tangent space.Let each row of D(γ ) be denoted by Ri , such that

D(γ ) =

R1...

Rn

.

If ω is any k-form, we want to make sense out of the symbol∫

γω. As before

with curves and surfaces, we start by looking at an elementary k-form dxi1 ∧·· · ∧ dxik . The key is that at a point on γ , we define

dxi1 ∧ ·· · ∧ dxik (D(γ )) = det

Ri1...

Rik

.

Then we define ∫γ

f dxi1 ∧ ·· · ∧ dxik

to be ∫f (x1(u1, . . . ,uk), . . . , xn(u1, . . . ,uk))(D(γ )) du1 · · ·duk ,

where we are integrating over [a1,b1] ×·· ·× [ak ,bk] in Rk .Then if

ω =∑

f I dxI ,

we define ∫γ

ω =∑∫

γ

f I dxI .

As with the special cases of curves and surfaces, this integral is actuallyindependent of parameterization.

Page 130: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

9.3 Exercises 115

(For more on differential forms, see [26] or Hubbard and Hubbard’s VectorCalculus, Linear Algebra, and Differential Forms: A Unified Approach, [33].)

9.3. Exercises

Exercise 9.3.1. Write down a basis for the vector space �1(R4), the vectorspace �2(R4), the vector space �3(R4), and the vector space �4(R4).

Exercise 9.3.2. Write down a basis for the vector space �1(R5), the vectorspace �2(R5), the vector space �3(R5), and the vector space �4(R5).

Exercise 9.3.3. Let

ω = (x21 + x2x3)dx1 + x2x3

3 dx2 ∈ �1(R3).

Compute dω.

Exercise 9.3.4. Let

ω = (x21 + x2x4)dx1 ∧ dx2 + x2x3

3 dx2 ∧ dx3 + sin (x2)dx3 ∧ dx4 ∈ �2(R4).

Compute dω.

We now begin a series of exercises to show that

d2ω = 0.

Exercise 9.3.5. Let f (x , y, z) be any function on R3. Show that

dd f = d2 f = 0.

Exercise 9.3.6. Let f be any function on Rn. Show that

d2 f = 0.

Exercise 9.3.7. Let f be any function on Rn. Set ω = f dx1. Show that

d2ω = 0.

Exercise 9.3.8. Let ω =∑fi dxi be any 1-form. Show that

d2ω = 0.

Exercise 9.3.9. Let ω = f dx1 ∧ dx2 be any 2-form on Rn. Show that

d2ω = 0.

Page 131: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

116 9 Differential Forms

Exercise 9.3.10. Let ω = f dxI be a k-form on Rn. Show that

d2ω = 0.

Exercise 9.3.11. Let ω be a k-form on Rn. Show that

d2ω = 0.

Exercise 9.3.12. Let ω = (x21 + x2x3)dx1 + x2x3

3dx2 ∈ �1(R3). For the curve

γ : [2,6] →R3,

given byγ (u) = (u,2 + u,5u) ,

compute ∫γ

ω.

Exercise 9.3.13. Let

ω = (x21 + x2x4)dx1 ∧ dx2 + x2x3

3 dx2 ∧ dx3 + x2dx3 ∧ dx4 ∈ �2(R4).

For the surfaceγ : [0,6] × [0,3] →R

4,

given byγ (u,v) = (u + v,2u + 3v,5u + v,u − v) ,

compute ∫γ

ω.

Exercise 9.3.14. Let

ω = x1x2x23 dx1 ∧ dx2 ∧ dx3 ∈ �3(R3).

For the boxγ : [0,1] × [0,2] × [0,3]→ R

3,

defined byγ (u,v,w) = (u +w,v+ 3w,u),

compute ∫γ

ω.

Exercise 9.3.15. Let f (x1, x2, x3) = x31 x2x2

3 . Set

ω = d( f ).

Page 132: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

9.3 Exercises 117

For the curveγ : [0,6] →R

3,

given byγ (u) = (

u2,2 + u3,5u)

,

compute from the definition ∫γ

ω.

Show that this is equal to

f (γ (6)) − f (γ (0)).

Exercise 9.3.16. Let ω = (x21 + x2x3)dx1 + x2x3

3 dx2 ∈ �1(R3). Let

γ : [2,6] → R3

be the curveγ (u) = (

u2,2 + u3,5u)

.

Letµ : [0,4] → [2,6]

be defined byµ(t) = t + 2.

For the curve γ ◦µ : [0,4] →R3, compute∫γ ◦µ

ω.

Show that this integral equals∫

γω.

Exercise 9.3.17. Use the same notation as in the last problem, but now let

µ : [√

2,√

6] → [2,6]

be defined byµ(t) = t2.

Again, for the curve γ ◦µ : [√

2,√

6] → R3, compute∫γ ◦µ

ω.

Show again that this integral equals∫

γω.

Exercise 9.3.18. Let ω ∈ �1(Rn). Let

γ : [a,b] →Rn

Page 133: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

118 9 Differential Forms

be any curveγ (u) = (x1(u), . . . , xn(u)) .

Letµ : [c,d] → [a,b]

be any map such that µ(c) = a and µ(d) = b and, for points in [c,d], dµ/dt >

0. Show that ∫γ ◦µ

ω =∫

γ

ω.

This means that the path integral∫

γω is independent of parameterization.

Exercise 9.3.19. Let ω ∈ �2(Rn). Let

γ : [a1,b1] × [c1,d1] →Rn

be any surfaceγ (u,v) = (x1(u,v), . . . , xn(u,v)) .

Letµ = (µ1(s, t),µ2(s, t)) : [a2,b2] × [c2,d2] → [a1,b1] × [c1,d1]

be any map such that the boundary of [a2,b2] × [c2,d2] maps to the boundaryof [a1,b1] × [c1,d1] and, for all points in [a2,b2] × [c2,d2], the Jacobian ispositive, meaning that

det

∂µ1

∂s

∂µ1

∂ t∂µ2

∂s

∂µ2

∂ t

> 0.

Show that ∫γ ◦µ

ω =∫

γ

ω.

This means that the surface integral∫

γω is independent of parameterization.

Exercise 9.3.20. Generalize the last few problems to the integral of

ω ∈ �k(Rn)

along a k-manifold. Do not do the proof (unless you so desire), but state theappropriate theorem.

Page 134: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

10

The Hodge � Operator

Summary: The goal of this chapter is to define the exterior algebra �(Rn)and then to define and understand, in several different contexts, certain naturallinear maps

� : �k(Rn) → �n−k(Rn).

We will start with the standard Hodge � operator for the exterior algebra.Using this � operator, we will then show how the various operations fromvector calculus, such as the gradient, the curl, and the divergence, have naturalinterpretations in the language of the exterior algebra. Then we will generalizethe Hodge � operator by placing different inner products on the vector spaces�k(Rn), allowing us to interpret the Hodge � operator for the Minkowskimetric, which will be critical for the next chapter, where we will interpretMaxwell’s equations in terms of differential forms.

10.1. The Exterior Algebra and the � Operator

For this section, our goal is to define a natural linear map

� : �k(Rn) → �n−k(Rn),

called the Hodge � operator, or, simply, the � operator. The symbol � ispronounced “star.”

First, there is a natural way to combine a k-form ω with an l-form τ tocreate a (k + l)-form, which we will denote as

ω ∧ τ .

It is best to see this via an example. Suppose we are in R7. Let

ω = dx1 ∧ dx3 + dx2 ∧ dx5

119

Page 135: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

120 10 The Hodge � Operator

be a 2-form and

τ = dx2 ∧ dx4 ∧ dx6 + dx3 ∧ dx4 ∧ dx6

be a 3-form. Then ω ∧ τ will be the 5-form

ω ∧ τ = (dx1 ∧ dx3 + dx2 ∧ dx5) ∧ (dx2 ∧ dx4 ∧ dx6 + dx3 ∧ dx4 ∧ dx6)

= (dx1 ∧ dx3) ∧ (dx2 ∧ dx4 ∧ dx6)

+ (dx1 ∧ dx3) ∧ (dx3 ∧ dx4 ∧ dx6)

+ (dx2 ∧ dx5) ∧ (dx2 ∧ dx4 ∧ dx6)

+ (dx2 ∧ dx5) ∧ (dx3 ∧ dx4 ∧ dx6)

= −dx1 ∧ dx2 ∧ dx3 ∧ dx4 ∧ dx6 + dx2 ∧ dx3 ∧ dx4 ∧ dx5 ∧ dx6.

We denote the space of all differential forms on Rn by �(Rn), which we callthe exterior algebra.

Now we turn to the definition of � : �k(Rn) → �n−k(Rn).The vector space �n(Rn) of n-forms on Rn is one dimensional, with basis

element

dx1 ∧ dx2 ∧ ·· · ∧ dxn.

Any k-form ω is the sum of elementary k-forms:

ω =∑

f I dxI

where, as before, we are summing over all I = (ii , . . . , ik) and where dxI =dxi1 ∧ ·· · ∧ dxik . We will first define � : �k(Rn) → �(Rn−k) as a map on theelementary k-forms and then simply define

�ω =∑

f I � (dxI ),

for a general k−form.So, given an elementary k-form dxI , we define �(dxI ) to be the elementary

(n − k)-form dxJ such that

dxI ∧ dxJ = dx1 ∧ dx2 ∧ ·· · ∧ dxn.

Let us look at some examples. We start with R3, with coordinates (x , y, z).We write a basis for the space of three-forms on R3 as

dx ∧ dy ∧ dz.

Page 136: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

10.2 Vector Fields and Differential Forms 121

Then we have

�(dx) = dy ∧ dz

�(dy) = −dx ∧ dz

�(dz) = dx ∧ dy.

The minus sign in the second equation arises because

dy ∧ ( − dx ∧ dz) = −dy ∧ dx ∧ dz = dx ∧ dy ∧ dz.

10.2. Vector Fields and Differential Forms

Our earlier language of vector fields, including gradients, curls, and diver-gences, can now be recast into the language of differential forms. The key isthat there are four linear maps (using the notation in section 6.3 of [26]):

T0 : functions on R3 → 0-formsT1 : vector fields on R

3 → 1-formsT2 : vector fields on R3 → 2-formsT3 : functions on R3 → 3-forms

with each defined as

T0( f (x , y, z)) = f (x , y, z)

T1(F1(x , y, z), F2(x , y, z), F3(x , y, z)) = F1dx + F2dy + F3dz

T2(F1(x , y, z), F2(x , y, z), F3(x , y, z)) = �(F1dx + F2dy + F3dz)

= F1dy ∧ dz − F2dx ∧ dz + F3dx ∧ dy

T3( f (x , y, z)) = f (x , y, z)dx ∧ dy ∧ dz.

Then we have that ∇( f ) will correspond to the exterior derivative of a function:

∇( f ) =(

∂ f

∂x,∂ f

∂ y,∂ f

∂z

)T1→ ∂ f

∂xdx + ∂ f

∂ ydy + ∂ f

∂zdz

= d( f ).

Page 137: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

122 10 The Hodge � Operator

Similarly we have:

∇ × (F1, F2, F3) =(

∂ F3

∂ y− ∂ F2

∂z,∂ F1

∂z− ∂ F3

∂x,∂ F2

∂x− ∂ F1

∂ y

)T2→(

∂ F3

∂ y− ∂ F2

∂z

)dy ∧ dz −

(∂ F1

∂z− ∂ F3

∂x,

)dx ∧ dz

+(

∂ F2

∂x− ∂ F1

∂ y

)dx ∧ dy

= d(F1dx + F2dy + F3dz)

= d(T1(F1, F2, F3))

∇ · (F1, F2, F3) = ∂ F1

∂x+ ∂ F2

∂ y+ ∂ F3

∂z

T3→(

∂ F1

∂x+ ∂ F2

∂ y+ ∂ F3

∂z

)dx ∧ dy ∧ dz

= d(F1dy ∧ dz − F2dx ∧ dz + F3dx ∧ dy)

= d(T2(F1, F2, F3)).

Of course, this level of abstraction for its own sake is not worthwhile. Whatis important is that we have put basic terms of vector calculus into a muchmore general setting.

10.3. The � Operator and Inner Products

In Section 10.1 we defined the map � : �k(Rn) → �n−k(Rn). Actually there aremany different possible star operators, each depending on a choice of a basiselement for the one-dimensional vector space �n(Rn) and on a choice of aninner product on each vector space �k(Rn).

Let ω be a non-zero element of �n(Rn). We declare this to be our basiselement. Let 〈·, ·〉 be a fixed inner product on �k(Rn).

Definition 10.3.1. Given ω and 〈·, ·〉, for any α ∈ �k(Rn), we define �(α) to bethe (n − k)-form such that, for any β ∈ �k(Rn), we have

β ∧ ( �α) = 〈α,β〉ω.

Let us see that this definition is compatible with the earlier one. For Rn withcoordinates x1, x2, . . . , xn , we let our basis n-form be

ω = dx1 ∧ ·· · ∧ dxn.

Page 138: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

10.4 Inner Products on �(Rn) 123

For indices I = (i1, . . . , ik) with 1 ≤ i1 < i2 < · · · < ik ≤ n, we know that variousdxI form a basis for �k(Rn). We choose our inner product such that the dxI

are orthonormal:

〈dxI ,dxJ 〉 ={

1 if I = J0 if I �= J

.

Let us show that�dx1 = dx2 ∧ ·· · ∧ dxn.

First, we need, for i �= 1,

dx1 ∧ �dx1 = 〈dx1,dx1〉dx1 ∧ ·· · ∧ dxn

= dx1 ∧ ·· · ∧ dxn

dxi ∧ �dx1 = 〈dx1,dxi〉dx1 ∧ ·· · ∧ dxn

= 0 · dx1 ∧ ·· · ∧ dxn

= 0.

There is only one (n − 1)-form that has these properties. Hence

�dx1 = dx2 ∧ ·· · ∧ dxn.

Similarly, we have

�dx2 = −dx1 ∧ dx3 ∧ ·· · ∧ dxn,

since for i �= 1,

dx2 ∧ �dx2 = 〈dx2,dx2〉dx1 ∧ ·· · ∧ dxn

= dx2 ∧ ( − (dx1 ∧ dx3 · · · ∧ dxn))

dxi ∧ �dx1 = 〈dx1,dxi〉dx1 ∧ ·· · ∧ dxn

= 0 · dx1 ∧ ·· · ∧ dxn

= 0.

10.4. Inner Products on �(Rn)

From the last section, given an inner product on �k(Rn) and a basis n-formon �n(Rn), there is a star operator. But, to a large extent, the space of allexterior forms �(Rn) is one object, not a bunch of independent vector spaces�1(Rn),�2(Rn),. . . ,�n(Rn).

In this section we will choose an inner product for �1(Rn) and show howthis inner product induces inner products on each of the other vector spaces

Page 139: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

124 10 The Hodge � Operator

�k(Rn). First to recall some linear algebra facts. Let V be a vector space withbasis v1,v2, . . . ,vn . An inner product 〈·, ·〉 on V is completely determined byknowing the values

〈vi ,v j 〉 = ai j ,

for all 1 ≤ i , j ≤ n. Given two vectors u =∑ni=1 αivi and w =∑n

j=1 β jv j , theinner product is

〈u,w〉 =⟨

n∑i=1

αivi ,n∑

j=1

β jv j

=n∑

i , j=1

ai jαiβ j ,

or, in matrix notation,

〈u,w〉 = (α1, · · · ,αn)

a11 · · · a1n...

an1 · · · ann

β1...

βn

=n∑

i , j=1

ai jαiβ j .

Now we turn to the exterior algebra �(Rn). Our basis for �1(Rn) isdx1,dx2, . . . ,dxn. Suppose we have an inner product on �1(Rn). Thus weknow the values for 〈dxi ,dx j〉. A basis for �k(Rn) is formed from all thedxI = dxi1 ∧·· ·∧dxik . Then we define our inner product on �k(Rn) by setting,for I = (i1, . . . , ik) and J = ( j1, . . . , jk),

〈dxI ,dxJ 〉 =∑σ∈Sk

sign(σ )〈dxi1 ,dx jσ (1)〉〈dxi2 ,dx jσ (2)〉 · · · 〈dxik ,dx jσ (k)〉,

where Sk is the group of all the permutations on k elements.Of course, we need to look at an example. For 2-forms, this definition gives

us

〈dxi1 ∧ dxi2 ,dx j1 ∧ dx j2〉 = 〈dxi1 ,dx j1〉〈dxi2 ,dx j2〉− 〈dxi1 ,dx j2〉〈dxi2 ,dx j1〉

Page 140: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

10.5 The � Operator with the Minkowski Metric 125

and for 3-forms, we have that 〈dxi1 ∧ dxi2 ∧ dxi3 ,dx j1 ∧ dx j2 ∧ dx j3〉 is

〈dxi1 ,dx j1〉〈dxi2 ,dx j2〉〈dxi3 ,dx j3〉− 〈dxi1 ,dx j2〉〈dxi2 ,dx j1〉〈dxi3 ,dx j3〉− 〈dxi1 ,dx j3〉〈dxi2 ,dx j2〉〈dxi3 ,dx j1〉− 〈dxi1 ,dx j1〉〈dxi2 ,dx j3〉〈dxi3 ,dx j2〉+ 〈dxi1 ,dx j2〉〈dxi2 ,dx j3〉〈dxi3 ,dx j1〉+ 〈dxi1 ,dx j3〉〈dxi2 ,dx j1〉〈dxi3 ,dx j2〉.

10.5. The � Operator with the Minkowski Metric

We turn to R4, with space coordinates x , y, and z and time coordinate t . Specialrelativity suggests that the four-dimensional vector space �1(R4) with orderedbasis dt ,dx ,dy,dz should have an inner product given by

1 0 0 00 −1 0 00 0 −1 00 0 0 −1

.

(In our chapter on special relativity, we had a c2 instead of the 1 in the matrix;here we are following the convention of choosing units to make the speed oflight one.) Thus for any 1-forms

α = α1dt +α2dx +α3dy +α4dz

β = β1dt +β2dx +β3dy +β4dz,

the inner product will be

〈α,β〉 = (α1,α2,α3,α4)

1 0 0 00 −1 0 00 0 −1 00 0 0 −1

β1

β2

β3

β4

= α1β1 −α2β2 −α3β3 −α4β4.

The induced inner product on the six-dimensional vector space of 2-forms,with ordered basis

dt ∧ dx , dt ∧ dy, dt ∧ dz, dx ∧ dy, dx ∧ dz, dy ∧ dz

Page 141: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

126 10 The Hodge � Operator

is

−1 0 0 0 0 00 −1 0 0 0 00 0 −1 0 0 00 0 0 1 0 00 0 0 0 1 00 0 0 0 0 1.

.

As an example,

〈dt ∧ dx ,dt ∧ dx〉 = 〈dt ,dt〉〈dx ,dx〉− 〈dt ,dx〉〈dx ,dt〉= 1 · ( − 1) − 0 · 0

= −1,

while

〈dt ∧ dx ,dt ∧ dy〉 = 〈dt ,dt〉〈dx ,dy〉− 〈dt ,dy〉〈dx ,dt〉= 1 · 0 − 0 · 0

= 0.

The basis elements are mutually orthogonal, and we have

〈dt ∧ dx ,dt ∧ dx〉 = 〈dt ∧ dy,dt ∧ dy〉 = 〈dt ∧ dz,dt ∧ dz〉 = −1

and

〈dx ∧ dy,dx ∧ dy〉 = 〈dy ∧ dz,dy ∧ dz〉 = 〈dy ∧ dz,dy ∧ dz〉 = 1.

In a similar manner, we have that the induced inner product on the space of3-forms �3(R4) will have the basis elements

dx ∧ dy ∧ dz,dt ∧ dx ∧ dy,dt ∧ dx ∧ dz,dt ∧ dy ∧ dz

being mutually orthogonal and

〈dx ∧ dy ∧ dz,dx ∧ dy ∧ dz〉 = − 1

〈dt ∧ dx ∧ dy,dt ∧ dx ∧ dy〉 =1

〈dt ∧ dx ∧ dz,dt ∧ dx ∧ dz〉 =1

〈dt ∧ dy ∧ dz,dt ∧ dy ∧ dz〉 =1.

This allows us now to write down the � operator with respect to theMinkowski inner product. We choose our basis for �4(R4) to be dt ∧ dx ∧dy ∧ dz. Then

Page 142: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

10.6 Exercises 127

�dt = dx ∧ dy ∧ dz�dx = dt ∧ dy ∧ dz�dy = −dt ∧ dx ∧ dz�dz = dt ∧ dx ∧ dy

�(dt ∧ dx) = −dy ∧ dz�(dt ∧ dy) = dx ∧ dz�(dt ∧ dz) = −dx ∧ dy�(dx ∧ dy) = dt ∧ dz�(dx ∧ dz) = −dt ∧ dy�(dy ∧ dz) = dt ∧ dx

�(dx ∧ dy ∧ dz) = dt�(dt ∧ dx ∧ dy) = dz�(dt ∧ dx ∧ dz) = −dy�(dt ∧ dy ∧ dz) = dx .

10.6. Exercises

Exercise 10.6.1. Using the Hodge � operator of Section 10.1, find �(dx1) and�(dx2) for dx1,dx2 ∈ �1(R6).

Exercise 10.6.2. Using the Hodge � operator of Section 10.1, find

�(dx2 ∧ dx4)

and�(dx1 ∧ dx4)

for dx2 ∧ dx4,dx1 ∧ dx4 ∈ �2(R6).

Exercise 10.6.3. Using the Hodge � operator of Section 10.1, find

�(dx2 ∧ dx3 ∧ dx4)

and�(dx1 ∧ dx3 ∧ dx4)

for dx2 ∧ dx3 ∧ dx4,dx1 ∧ dx3 ∧ dx4 ∈ �3(R6).

Exercise 10.6.4. Using the Hodge � operator of Section 10.1, find

�(dx2 ∧ dx3 ∧ dx4 ∧ dx5)

and�(dx1 ∧ dx3 ∧ dx4 ∧ dx5)

Page 143: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

128 10 The Hodge � Operator

for dx2 ∧ dx3 ∧ dx4 ∧ dx5,dx1 ∧ dx3 ∧ dx4 ∧ dx5 ∈ �4(R6).

Exercise 10.6.5. Using the Hodge � operator of Section 10.1, find

�(dx2 ∧ dx3 ∧ dx4 ∧ dx5 ∧ dx6)

and�(dx1 ∧ dx3 ∧ dx4 ∧ dx5 ∧ dx6)

for dx2 ∧ dx3 ∧ dx4 ∧ dx5 ∧ dx6,dx1 ∧ dx3 ∧ dx4 ∧ dx5 ∧ dx6 ∈ �5(R6).

Exercise 10.6.6. Define an inner product on �1(R3) by setting

〈dx1,dx1〉 = 2, 〈dx2,dx2〉 = 4, 〈dx3,dx3〉 = 1

and〈dx1,dx2〉 = 1, 〈dx2,dx3〉 = 0, 〈dx1,dx3〉 = 0.

With the ordering dx1,dx2,dx3, write this inner product as a 3 × 3 symmetricmatrix.

Exercise 10.6.7. With the inner product on �1(R3) of the previous problem,find the corresponding inner products on �2(R3) and �3(R3). Using theordering dx1 ∧ dx2,dx1 ∧ dx3,dx2 ∧ dx3, write the inner product on �2(R3)as a 3 × 3 symmetric matrix.

Exercise 10.6.8. Using the Hodge � operator as defined in Section 10.3 andthe inner product in the previous problem, find

�(dx1),�(dx2),�(dx3),

with dx1 ∧ dx2 ∧ dx3 as the basis element of �3(R3).

Exercise 10.6.9. Using the Hodge � operator as defined in Section 10.3 andthe inner product in the previous problem, find

�(dx1 ∧ dx2),�(dx2 ∧ dx3),�(dx1 ∧ dx3),

with dx1 ∧ dx2 ∧ dx3 as the basis element of �3(R3).

Exercise 10.6.10. Define an inner product on �1(R3) by setting

〈dx1,dx1〉 = 3, 〈dx2,dx2〉 = 4, 〈dx3,dx3〉 = 5

and〈dx1,dx2〉 = 1, 〈dx1,dx3〉 = 2, 〈dx2,dx3〉 = 1.

With the ordering dx1,dx2,dx3, write this inner product as a 3 × 3 symmetricmatrix.

Page 144: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

10.6 Exercises 129

Exercise 10.6.11. With the inner product on �1(R3) of the previous problem,find the corresponding inner products on �2(R3) and �3(R3). Using theordering dx1 ∧ dx2,dx1 ∧ dx3,dx2 ∧ dx3, write the inner product on �2(R3)as a 3 × 3 symmetric matrix.

Exercise 10.6.12. Using the Hodge � operator as defined in Sections 10.3 and10.4 and the inner product in the previous problem, find

�(dx1),�(dx2),�(dx3),

with dx1 ∧ dx2 ∧ dx3 the basis element of �3(R3).

Exercise 10.6.13. Using the Hodge � operator as defined in 10.3 and 10.4 andthe inner product in the previous problem, find

�(dx1 ∧ dx2),�(dx2 ∧ dx3),�(dx1 ∧ dx3),

with dx1 ∧ dx2 ∧ dx3 the basis element of �3(R3).

Exercise 10.6.14. Using the Minkowski metric, show

�dt = dx ∧ dy ∧ dz�dx = dt ∧ dy ∧ dz�dy = −dt ∧ dx ∧ dz�dz = dt ∧ dx ∧ dy

�(dt ∧ dx) = −dy ∧ dz�(dt ∧ dy) = dx ∧ dz�(dt ∧ dz) = −dx ∧ dy�(dx ∧ dy) = dt ∧ dz�(dx ∧ dz) = −dt ∧ dy�(dy ∧ dz) = dt ∧ dx

�(dx ∧ dy ∧ dz) = dt�(dt ∧ dx ∧ dy) = dz�(dt ∧ dx ∧ dz) = −dy�(dt ∧ dy ∧ dz) = dx .

Page 145: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

11

The Electromagnetic Two-Form

Summary: We recast Maxwell’s equations into the language of differentialforms. This language not only allows for a deeper understanding of Maxwell’sequations, and for eventual generalizations to more abstract manifolds, but willalso let us recast Maxwell’s equations in terms of the calculus of variations viaLagrangians.

11.1. The Electromagnetic Two-Form

We start with the definitions:

Definition 11.1.1. Let E = (E1, E2, E3) and B = (B1, B2, B3) be two vectorfields. The associated electromagnetic two-form is

F = E1dx ∧ dt + E2dy ∧ dt + E3dz ∧ dt

+ B1dy ∧ dz + B2dz ∧ dx + B3dx ∧ dy.

This two-form is also called the Faraday two-form.

Definition 11.1.2. Let ρ(x , y, z, t) be a function and (J1, J2, J3) be a vectorfield. The associated current one-form is

J = ρdt − J1dx − J2dy − J3dz.

11.2. Maxwell’s Equations via Forms

So far we have just repackaged the vector fields and functions that make upMaxwell’s equations. That this repackaging is at least reasonable can be seenvia

130

Page 146: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

11.3 Potentials 131

Theorem 11.2.1. Vector fields E, B, and J and function ρ satisfy Maxwell’sequations if and only if

dF =0

� d � F = J .

Here the star operator is with respect to the Minkowski metric, withbasis element dt ∧ dx ∧ dy ∧ dz for �(R4). The proof is a long, thoughenjoyable, calculation, which we leave for the exercises. While the proofis not conceptually hard, it should be noted how naturally the language ofdifferential forms can be used to describe Maxwell’s equations. This languagecan be generalized to different, more complicated, areas of both mathematicsand physics.

11.3. Potentials

We have rewritten Maxwell’s equations in the language of differential forms,via the electromagnetic two-form and the current one-form. The questionremains as to how far we can go with this rewriting. The answer is that allof our earlier work can be described via differential forms. In this sectionwe will see how the potential function and the potential vector field can becaptured via a single one-form.

Recall from the chapter on potentials that for any fields E and B satisfyingMaxwell’s equations, there are a function φ and a vector field A such thatE(x , y, z, t) = −∇φ − ∂ A(x,y,z,t)

∂ t and B(x , y, z, t) = ∇ × A(x , y, z, t). We want toencode these potentials into a single one-form.

Definition 11.3.1. Given a function φ and a vector field A, the associatedpotential one-form is

A = −φdt + A1dx + A2dy + A3dz.

By setting A0 = −φ, we write this as

A = A0dt + A1dx + A2dy + A3dz.

Of course, this definition is only worthwhile because the following is true:

Theorem 11.3.1. Let E and B be vector fields and let F be the correspondingelectromagnetic two-form. Then there are a function φ and a vector field A

Page 147: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

132 11 The Electromagnetic Two-Form

such that

E(x , y, z, t) = −∇φ − ∂ A(x , y, z, t)

∂ t

B(x , y, z, t) =∇ × A(x , y, z, t)

if and only if, for the potential one-form A, we have

F = dA.

The proof is another enjoyable calculation, which we will again leave to theexercises.

In our earlier chapter on potentials, we saw that the potentials for the givenvector fields E and B are not unique. This has a natural interpretation fordifferential forms. The key is that for any form ω, we always have

d2ω = 0,

as proven in the exercises in Chapter 9. In particular, for any function f , wealways have

d2 f = 0.

Suppose A is a potential one-form, meaning that F = dA. Then for anyfunction f (x , y, z, t),

A + d f

will also be a potential one-form, since

d(A + d f ) = dA + d2 f = F .

Thus any potential one-form A can be changed to A + d f .

11.4. Maxwell’s Equations via Lagrangians

In Chapter 8 we described how the path of a particle moving in electric andmagnetic fields can be described not only as satisfying the differential equationstemming from F = ma, but also as the path that was a critical value of acertain integral, the Lagrangian, allowing us to use the machinery from thecalculus of variations. We were taking, however, Maxwell’s equations as givenand thinking of the path of the particle as the “thing” to be varied.

Maxwell’s equations are the system of partial differential equations thatdescribe the electric and magnetic fields. Is it possible to reformulateMaxwell’s equations themselves in terms of an extremal value for aLagrangian? The answer to that is the goal of this section. While we are

Page 148: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

11.4 Maxwell’s Equations via Lagrangians 133

motivating this work by trying to answer the intellectual challenge of recastingMaxwell’s equations into the language of the calculus of variations, it is thisvariational approach that most easily generalizes Maxwell’s equations to theweak force and the strong force and to more purely mathematical contexts.

We start with simply stating the Lagrangian L that will yield Maxwell’sequations. It should not be clear how this L was discovered.

Definition 11.4.1. The electromagnetic Lagrangian for an electric field E anda magnetic field B is

L = ( � J ) ∧ A + 1

2( � F) ∧ F .

This is the most efficient way for writing down L but is not directly usefulfor showing any link with Maxwell’s equations.

Now, J is a one-form. Thus ( � J ) is a three-form, meaning that ( � J ) ∧ Amust be a four-form. Similarly, the Faraday form F is a two-form, meaningthat ( � F) is also a two-form. Thus ( � F) ∧ F is also a four-form. Weknow then that L is a four-form on R4. Thus we will want to find the criticalvalues for ∫

L dxdydzdt .

(If you want, you can think that we want to find the values that minimize thepreceding integral. These minimizers will, of course, be among the criticalvalues.)

Proposition 11.4.1. The electromagnetic Lagrangian L equals

L = (A0ρ + A1 J1 + A2 J2 + A3 J3

+ 1

2(E2

1 + E22 + E2

3 − B21 − B2

2 − B23 ))dx ∧ dy ∧ dz ∧ dt .

The proof is another pleasant calculation, which we leave to the exercises.We want somehow to find the critical values of this complicated function L.

But we need to see exactly what we are varying, in the calculus of variations,to get critical values. Here the potentials become as important as, if not moreimportant than, the fields E and B . We know that

E1 = −∂φ

∂x− ∂ A1

∂ t= ∂ A0

∂x− ∂ A1

∂ t

E2 = −∂φ

∂ y− ∂ A2

∂ t= ∂ A0

∂ y− ∂ A2

∂ t

E3 = −∂φ

∂z− ∂ A3

∂ t= ∂ A0

∂z− ∂ A3

∂ t

Page 149: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

134 11 The Electromagnetic Two-Form

B1 = ∂ A3

∂ y− ∂ A2

∂z

B2 = ∂ A1

∂z− ∂ A3

∂x

B3 = ∂ A2

∂x− ∂ A1

∂ y.

Thus we can consider our electromagnetic Lagrangian as a function

L

(A0, A1, A2, A3,

∂ A0

∂ t,∂ A0

∂x,∂ A0

∂ y,∂ A0

∂z, · · · ,

∂ A3

∂ t,∂ A3

∂x,∂ A3

∂ y,∂ A3

∂z

).

It is the functions A0, A1, A2, A3 that are varied in L, while the charge functionρ and the current field J = (J1, J2, J3) are givens and fixed. Then we have

Theorem 11.4.1. Given a charge function ρ and a current field J =(J1, J2, J3), the functions A0, A1, A2, A3 are the potentials for fields E and Bthat satisfy Maxwell’s equations if and only if they are critical points for∫

L dxdydzdt .

Sketch of Proof: We will discuss in the next section that A0, A1, A2, A3 arecritical points for

∫L dxdydzdt if and only if they satisfy the Euler-Lagrange

equations:

∂L

∂ A0= ∂

∂ t

(∂L

∂( ∂ A0∂ t )

)+ ∂

∂x

(∂L

∂( ∂ A0∂x )

)+ ∂

∂ y

(∂L

∂( ∂ A0∂ y )

)+ ∂

∂z

(∂L

∂( ∂ A0∂z )

)

∂L

∂ A1= ∂

∂ t

(∂L

∂( ∂ A1∂ t )

)+ ∂

∂x

(∂L

∂( ∂ A1∂x )

)+ ∂

∂ y

(∂L

∂( ∂ A1∂ y )

)+ ∂

∂z

(∂L

∂( ∂ A1∂z )

)

∂L

∂ A2= ∂

∂ t

(∂L

∂( ∂ A2∂ t )

)+ ∂

∂x

(∂L

∂( ∂ A2∂x )

)+ ∂

∂ y

(∂L

∂( ∂ A2∂ y )

)+ ∂

∂z

(∂L

∂( ∂ A2∂z )

)

∂L

∂ A3= ∂

∂ t

(∂L

∂( ∂ A3∂ t )

)+ ∂

∂x

(∂L

∂( ∂ A3∂x )

)+ ∂

∂ y

(∂L

∂( ∂ A3∂ y )

)+ ∂

∂z

(∂L

∂( ∂ A3∂z )

).

Thus we need the preceding to be equivalent to Maxwell’s equations. We willonly show here that the third equation in the preceding is equivalent to the partof Maxwell’s equations that gives the second coordinate in

∇ × B = ∂ E

∂ t+ J .

Page 150: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

11.4 Maxwell’s Equations via Lagrangians 135

In other words,∂ B1

∂z− ∂ B3

∂x= ∂ E2

∂ t+ J2,

which, in the language of the potentials, is equivalent to

∂2 A3

∂ y∂z− ∂2 A2

∂z2− ∂2 A2

∂x2+ ∂2 A1

∂x∂ y= ∂2 A0

∂ y∂ t− ∂2 A2

∂ t2+ J2.

The rest of the proof is left for the exercises.Start with ∂L/∂ A2. Here we must treat A2 as an independent variable in

the function L. The only term in L that contains an A2 is A2 J2, which meansthat

∂L

∂ A2= J2.

Now to calculate∂

∂ t

(∂L

∂( ∂ A2∂ t )

).

We start with∂L

∂( ∂ A2∂ t )

.

Here we must treat ∂ A2/∂ t as the independent variable. The only term of Lthat contains ∂ A2/∂ t is

1

2E2

2 = 1

2

(∂ A0

∂ y− ∂ A2

∂ t

)2

.

Then∂L

∂( ∂ A2∂ t )

= −∂ A0

∂ y+ ∂ A2

∂ t,

giving us

∂ t

(∂L

∂( ∂ A2∂ t )

)= −∂2 A0

∂ y∂ t+ ∂2 A2

∂ t2.

Next for ∂(∂L/∂(∂ A2/∂x))/∂x , where here ∂ A2/∂x is treated as anindependent variable. The only term of L that contains ∂ A2/∂x is

−1

2B2

3 = −1

2

(∂ A2

∂x− ∂ A1

∂ y

)2

.

Then∂L

∂( ∂ A2∂x )

= −∂ A2

∂x+ ∂ A1

∂ y.

Page 151: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

136 11 The Electromagnetic Two-Form

Thus∂

∂x

(∂L

∂( ∂ A2∂x )

)= −∂2 A2

∂x2+ ∂2 A1

∂x∂ y.

The next term, ∂(∂L/∂(∂ A2/∂ y))/∂ y, is particularly simple, since Lcontains no term ∂ A2/∂ y, giving us that

∂ y

(∂L

∂( ∂ A2∂ y )

)= 0.

The final term can be calculated, in a similar way, to be

∂z

(∂L

∂( ∂ A2∂z )

)= −

(∂2 A3

∂ y∂z− ∂2 A2

∂z2

).

Putting all of this together, we see that the third of the Euler-Lagrangeequations is the same as the second coordinate in ∇ × B = ∂ E

∂ t + J , as desired.The other parts of the Euler-Lagrange equations will imply, by similar

means (given in the exercises), the other two coordinates of ∇ × B = ∂ E∂ t + J

and ∇ · E = ρ. The remaining two parts of Maxwell’s equations (∇ × E =−∂ B/∂ t and ∇ · B = 0) are built into the machinery of the potential one-formand are thus automatically satisfied. Thus we have a Lagrangian approach toMaxwell’s equations.

11.5. Euler-Lagrange Equations for the Electromagnetic Lagrangian

In the last section, in our derivation of Maxwell’s equations via Lagrangians,we simply stated the appropriate Euler-Lagrange equations. We now wantto give an argument that justifies these equations. Starting with a functionL, depending on the four functions A0(x , y, z, t), A1(x , y, z, t), A2(x , y, z, t),A3(x , y, z, t) and all their partial derivatives with respect to x , y, z and t , theEuler-Lagrange equations are a system of partial differential equations whosesolutions are critical points of the integral

∫L dxdydzdt . We will show how

this is done by showing that the critical points satisfy the first Euler-Lagrangeequation

∂L

∂ A0= ∂

∂ t

(∂L

∂( ∂ A0∂ t )

)+ ∂

∂x

(∂L

∂( ∂ A0∂x )

)+ ∂

∂ y

(∂L

∂( ∂ A0∂ y )

)+ ∂

∂z

(∂L

∂( ∂ A0∂z )

).

Here is the approach:We assume that we have functions A0(x , y, z, t), A1(x , y, z, t), A2(x , y, z, t),

and A3(x , y, z, t) that are critical values of∫

L dxdydzdt . Let η(x , y, z, t) be

Page 152: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

11.5 Euler-Lagrange Equations for the Electromagnetic Lagrangian 137

any function that is zero on the boundary of what we are integrating over andlet ε be a number. We perturb the function A0(x , y, z, t) by setting

Aε = A0 + εη.

Set

S(ε) =∫

L

(Aε ,

∂ Aε

∂x,∂ Aε

∂ y,∂ Aε

∂z,∂ Aε

∂ t

)dxdydzdt ,

where we are suppressing the other variables of L. By assumption, the functionS(ε) has a critical value at ε = 0. Then we have

dS

dε= 0

at ε = 0.Now we have

dS

dε= d

(∫L dxdydzdt

)

=∫

d

dε(L)dxdydzdt .

Note that to be absolutely rigorous, we would have to justify the interchangingof the derivative and the integral in the preceding expression. To calculate thederivative dL/dε, we use the chain rule to get

dL

dε= ∂L

∂ A0

dAε

dε+ ∂L

∂(

∂ A0∂ t

)(d

(∂ Aε∂ t

)dε

)+ ∂L

∂(

∂ A0∂x

)(d

(∂ Aε∂x

)dε

)

+ ∂L

∂(

∂ A0∂ y

)d

(∂ Aε∂ y

)dε

+

∂L

∂(

∂ A0∂z

)(

d(

∂ Aε∂z

)dε

)

=(

∂L

∂ A0

)η +

∂L

∂(

∂ A0∂ t

)(∂η

∂ t

)+ ∂L

∂(

∂ A0∂x

)(∂η

∂x

)

+ ∂L

∂(

∂ A0∂ y

)(∂η

∂ y

)+ ∂L

∂(

∂ A0∂z

)(∂η

∂z

).

In order to put the preceding mess into a more manageable form, wewill now use that η is assumed to be zero on the boundary of our region of

Page 153: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

138 11 The Electromagnetic Two-Form

integration and use integration by parts to show, for any function f , that∫f∂η

∂ tdxdydzdt = −

∫η

∂ t( f ) dxdydzdt .

Here, the order of integration does not matter, meaning that∫f∂η

∂ tdxdydzdt =

∫ ∫ ∫ ∫f∂η

∂ tdtdxdydz.

Here we are now writing out all four integral signs, on the right-hand side; thiswill make the next few steps easier to follow.

We concentrate on the first integral. Using integration by parts and the factthat η is zero on the boundary, we get that∫ ∫ ∫ ∫

f∂η

∂ tdtdxdydz = −

∫ ∫ ∫ ∫η

∂ t( f ) dtdxdydz,

giving us what we want.By a similar argument, we have∫

f∂η

∂xdxdydzdt = −

∫η

∂x( f ) dxdydzdt∫

f∂η

∂ ydxdydzdt = −

∫η

∂ y( f ) dxdydzdt

∫f∂η

∂zdxdydzdt = −

∫η

∂z( f ) dxdydzdt .

Thus

dL

dε= η

(∂L∂ A0

− ∂∂ t

(∂L

∂(

∂ A0∂ t

))

− ∂∂x

(∂L

∂(

∂ A0∂x

))

− ∂∂ y

(∂L

∂(

∂ A0∂ y

))

− ∂∂z

(∂L

∂(

∂ A0∂z

)))

.

We now have

0 = dS

=∫ η

∂L

∂ A0− ∂

∂ t

∂L

∂(

∂ A0∂ t

)− ∂

∂x

∂L

∂(

∂ A0∂x

)

− ∂

∂ y

∂L

∂(

∂ A0∂ y

)− ∂

∂z

∂L

∂(

∂ A0∂z

)dxdydzdt .

Page 154: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

11.6 Exercises 139

Since η can be any function that is zero on the boundary, we must have

0 = ∂L

∂ A0− ∂

∂ t

(∂L

∂( ∂ A0∂ t )

)− ∂

∂x

(∂L

∂( ∂ A0∂x )

)− ∂

∂ y

(∂L

∂( ∂ A0∂ y )

)− ∂

∂z

(∂L

∂( ∂ A0∂z )

),

which yields the first Euler-Lagrange equation. The other three are derivedsimilarly.

For more on this style of approach to Maxwell’s equation, I encouragethe reader to look at Gross and Kotiuga’s Electromagnetic Theory andComputation: A Topological Approach [29].

11.6. Exercises

Throughout these exercises, use the star operator with respect to theMinkowski metric, with basis element dt ∧ dx ∧ dy ∧ dz for �(R4).

Exercise 11.6.1. Prove that the vector fields E, B, and J and function ρ satisfyMaxwell’s equations if and only if

dF =0

� d � F = J .

Exercise 11.6.2. Let E and B be vector fields and let F be the correspondingelectromagnetic two-form. Show that there are a function φ and a vector fieldA such that

E(x , y, z, t) = −∇φ − ∂ A(x , y, z, t)

∂ t

B(x , y, z, t) = ∇ × A(x , y, z, t)

if and only if there is a potential one-form A such that

F = dA.

Exercise 11.6.3. If F is the electromagnetic two-form, show that

�F = E1dy ∧ dz − E2dx ∧ dz + E3dx ∧ dy

− B1dx ∧ dt − B2dy ∧ dt − B3dz ∧ dt .

Page 155: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

140 11 The Electromagnetic Two-Form

Exercise 11.6.4. If A is the potential one-form, show that

�A = −φdx ∧ dy ∧ dz + A1dy ∧ dz ∧ dt

− A2dx ∧ dz ∧ dt + A3dx ∧ dy ∧ dt

= A0dx ∧ dy ∧ dz + A1dy ∧ dz ∧ dt

− A2dx ∧ dz ∧ dt + A3dx ∧ dy ∧ dt .

Exercise 11.6.5. Letting

L = ( � J ) ∧ A + 1

2( � F) ∧ F ,

show that

L = (A0ρ + A1 J1 + A2 J2 + A3 J3

+ 1/2(E21 + E2

2 + E23 − B2

1 − B22 − B2

3 ))dx ∧ dy ∧ dz ∧ dt .

Exercise 11.6.6. Show that the first of the Euler-Lagrange equations for theelectromagnetic Lagrangian

∂L

∂ A0= ∂

∂t

⎛⎜⎜⎝ ∂L

(∂ A0

∂t

)⎞⎟⎟⎠+ ∂

∂x

⎛⎜⎜⎝ ∂L

(∂ A0

∂x

)⎞⎟⎟⎠+ ∂

∂ y

⎛⎜⎜⎝ ∂L

(∂ A0

∂ y

)⎞⎟⎟⎠+ ∂

∂z

⎛⎜⎜⎝ ∂L

(∂ A0

∂z

)⎞⎟⎟⎠

is equivalent to

∇ · E = ρ,

the first of Maxwell’s equations.

Exercise 11.6.7. Show that the second of the Euler-Lagrange equations forthe electromagnetic Lagrangian

∂L

∂ A1= ∂

∂t

⎛⎜⎜⎝ ∂L

(∂ A1

∂t

)⎞⎟⎟⎠+ ∂

∂x

⎛⎜⎜⎝ ∂L

(∂ A1

∂x

)⎞⎟⎟⎠+ ∂

∂ y

⎛⎜⎜⎝ ∂L

(∂ A1

∂ y

)⎞⎟⎟⎠+ ∂

∂z

⎛⎜⎜⎝ ∂L

(∂ A1

∂z

)⎞⎟⎟⎠

implies the first coordinate in

∇ × B = ∂ E

∂ t+ J ,

the fourth of Maxwell’s equations.

Page 156: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

11.6 Exercises 141

Exercise 11.6.8. Show that the fourth of the Euler-Lagrange equations for theelectromagnetic Lagrangian:

∂L

∂ A3= ∂

∂t

⎛⎜⎜⎝ ∂L

(∂ A3

∂t

)⎞⎟⎟⎠+ ∂

∂x

⎛⎜⎜⎝ ∂L

(∂ A3

∂x

)⎞⎟⎟⎠+ ∂

∂ y

⎛⎜⎜⎝ ∂L

(∂ A3

∂ y

)⎞⎟⎟⎠+ ∂

∂z

⎛⎜⎜⎝ ∂L

(∂ A3

∂z

)⎞⎟⎟⎠

implies the third coordinate in

∇ × B = ∂ E

∂ t+ J ,

the fourth of Maxwell’s equations.

Exercise 11.6.9. Let L be the electromagnetic Lagrangian. Show that ifA0, A1, A2, and A3 are critical values for

∫L dxdydzdt , then they satisfy the

second Euler-Lagrange equation

∂L

∂ A1= ∂

∂t

⎛⎜⎜⎝ ∂L

(∂ A1

∂t

)⎞⎟⎟⎠+ ∂

∂x

⎛⎜⎜⎝ ∂L

(∂ A1

∂x

)⎞⎟⎟⎠+ ∂

∂ y

⎛⎜⎜⎝ ∂L

(∂ A1

∂ y

)⎞⎟⎟⎠+ ∂

∂z

⎛⎜⎜⎝ ∂L

(∂ A1

∂z

)⎞⎟⎟⎠ .

Page 157: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

12

Some Mathematics Needed forQuantum Mechanics

Summary: The goal for this chapter is to define Hilbert spaces and Hermitianoperators, both of which will be critical when we try to understand thephotoelectric effect of light, which will be explained via the process forquantizing Maxwell’s equations over the next few chapters. Thus this chapteris necessary both as a preliminary for the next three chapters on quantummechanics and as a break from the first part of the book.

12.1. Hilbert Spaces

Hilbert spaces are built into the basic assumptions of quantum mechanics.This section provides a quick overview of Hilbert spaces. The next sectionwill provide a similar overview of Hermitian operators. We will start with thedefinition for a Hilbert space and then spend time unraveling the definition.

Definition 12.1.1. A Hilbert space is a complex vector space H with an innerproduct that is complete.

Now we need to define inner product and completeness.

Definition 12.1.2. An inner product on a complex vector space V is a map

〈·, ·〉 : V × V → C

such that

1. For all vectors v ∈ V , 〈v,v〉 is a nonnegative real number, with 〈v,v〉 = 0only if v = 0.

2. For all vectors v,w ∈ V , we have

〈v,w〉 = 〈w,v〉.3. For all vectors u,v,w ∈ V and complex numbers λ and µ, we have

〈u,λv+µw〉 = λ〈u,v〉+µ〈u,w〉.

142

Page 158: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

12.1 Hilbert Spaces 143

(Recall that a + bi = a − bi is the complex conjugate of the complexnumber a + bi , where both a and b are real numbers.) Inner products aregeneralizations of the dot product on R

2 from multivariable calculus. Similarlyto the dot product, where, recall, for vectors v,w ∈R

2, we have

v ·w = (length of v)(length of w)cos(θ ),

with θ being the angle between v and w, an inner product can be used to definelengths and angles on a complex vector space, as follows.

Definition 12.1.3. Let V be a complex vector space with an inner product〈·, ·〉. The length of a vector v ∈ V is

|v| =√

〈v,v〉.Two vectors v,w are said to be orthogonal if

〈v,w〉 = 0.

We will look at some examples in a moment, but we need first to understandwhat “complete” means. (Our meaning of completeness is the same as theone used in real analysis.) Let V be a complex vector space with an innerproduct 〈·, ·〉. The inner product can be used to measure distance betweenvectors v,w ∈ V by setting

Distance between v and w = |v−w| =√

〈v−w,v−w〉.Definition 12.1.4. A sequence of vectors vk is a Cauchy sequence if, for anyε > 0, there exists a positive integer N such that, for all n,m > N, we have

|vn − vm| < ε.

Thus a sequence is Cauchy if the distances between the vn and vm can bemade arbitrarily small when the n and m are made sufficiently large.

We can now define what it means for a vector space to be complete. (Wewill then look at an example.)

Definition 12.1.5. An inner product space V will be complete if all Cauchysequences converge. Hence, if vk is a Cauchy sequence, there is a vectorv ∈ V such that

limk→∞

vk = v.

The most basic example is the complex vector space C of dimension one,where the vectors are just complex numbers and the inner product is simply〈z,w〉 = zw. If instead of considering complex vector spaces, we looked atreal vector spaces, then the most basic example of a complete space is the real

Page 159: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

144 12 Some Mathematics Needed for Quantum Mechanics

vector space R of dimension one, where the vectors now are real numbers andthe inner product is just multiplication. One of the key properties of the realnumbers is that all Cauchy sequences converge. (This is proven in many realanalysis texts, such as in chapter 1 of [56] or in [24].)

Let us look at a more complicated example. The key behind this examplewill be the fact that the series

∑∞k=1

1k2 converges, and hence that

∞∑k=N

1

k2

can be made arbitrarily small by choosing N large enough. Let V be the vectorspace consisting of infinite sequences of complex numbers, with all but a finitenumber being zero. We denote a sequence (a1,a2,a3, . . . ) by (an). Thus (an) ∈V if there is an N such that an = 0 for all n > N . Define an inner product onV by setting

〈(an), (bm)〉 =∞∑

k=1

akbk .

Though looking like an infinite series, the preceding sum is actually alwaysfinite, since only a finite number of the ak and bk are non-zero. We want toshow that this vector space is not complete. Define the sequence, for all n,

vn =(

1,1

2,1

3, . . . ,

1

n,0, . . .

)

For n < m, we have that

〈vm − vn,vm − vn〉 =m∑

k=n+1

1

k2,

which can be made arbitrarily small by choosing n large enough. Thus thesequence (vk) is Cauchy. But this sequence wants to converge to the “vector”

v = (1,1

2,1

3, . . . ,

1

n, . . . ),

which is not in the vector space V .This does suggest the following for a Hilbert space.

Theorem 12.1.1. Define

l2 ={

(a1,a2,a3, . . . ) : ak ∈C and∞∑

k=1

|ak |2 < ∞}

.

Page 160: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

12.1 Hilbert Spaces 145

For (ak), (bk) ∈ l2, define

〈(ak), (bk)〉 =∞∑

k=1

akbk .

With this inner product, l2 is a Hilbert space.

People call this Hilbert space “little ell 2.” There are, as to be expected,other vector spaces “little ell p,” defined by setting

l p = {(ak) : ak ∈C and∞∑

k=1

|ak |p < ∞}.

These other spaces do not have an inner product and hence cannot be Hilbertspaces.

Proof. We first need to show that 〈(ak), (bk)〉 is a finite number and hence mustshow that

∑∞k=1 akbk converges whenever

∞∑k=1

|ak|2,∞∑

k=1

|bk |2 < ∞.

Now ∣∣∣∣∣∞∑

k=1

akbk

∣∣∣∣∣≤∞∑

k=1

|akbk | =∞∑

k=1

|ak ||bk|.

The key is that for all complex numbers,

|ab| ≤ |a|2 +|b|2.

Then∞∑

k=1

|ak ||bk| ≤∞∑

k=1

(|ak|2 +|bk |2) =∞∑

k=1

|ak |2 +∞∑

k=1

|bk|2 < ∞.

One of the exercises is to show that l2 is a vector space.We now must show that all Cauchy sequences converge. (While this proof

is standard, we are following section 1.8 in [31].) Let vk ∈ l2, with

vk = (a1(k),a2(k),a3(k), . . .),

forming a Cauchy sequence. We must show that the sequence of vectorsv1,v2,v3, . . . converges to a vector v = (a1,a2,a3, . . . ). We know, for n < m,that

limn,m→∞|vm − vn|2 = 0.

Page 161: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

146 12 Some Mathematics Needed for Quantum Mechanics

Thus, for any fixed positive integer N we can make

N∑i=1

|ai (m) − ai(n)|2

arbitrarily small. More precisely, given any ε > 0, we can find a positive integerN such that

∞∑k=1

|ak(m) − ak(n)|2 < ε

for any m,n > N . In particular, for any fixed i , we can make |ai(m) − ai (n)|2arbitrarily small. But this means that the real numbers ai (1),ai(2),ai(3), . . .form a Cauchy sequence, which must in turn converge to a real number,denoted by ai . We claim that (a1,a2,a3, . . . ) is our desired sequence.

We know that for any positive integer L that∑L

k=1 |ak(m)−ak(n)|2 < ε, forall m and n larger than our earlier chosen N . By choosing m large enough, wehave that

L∑k=1

|ak − ak(n)|2 ≤ ε.

This is true for all L, meaning that

∞∑k=1

|ak − ak(n)|2 ≤ ε.

Thus the sequence of vectors $(v_n)$ does converge to $v$. We still have to show that $v$ is in the vector space $l^2$. But, using $|a+b|^2 \leq 2|a|^2 + 2|b|^2$, we have
$$\sum_{k=1}^{\infty} |a_k|^2 = \sum_{k=1}^{\infty} |a_k - a_k(n) + a_k(n)|^2 \leq 2\sum_{k=1}^{\infty} |a_k - a_k(n)|^2 + 2\sum_{k=1}^{\infty} |a_k(n)|^2 < \infty,$$
finishing the proof. $\square$

Another standard complex Hilbert space is $L^2[0,1]$, the square integrable functions on the interval $[0,1]$, which is the vector space
$$L^2[0,1] = \left\{ f : [0,1] \to \mathbb{C} : \int_0^1 |f(x)|^2\, dx < \infty \right\},$$
with inner product given by
$$\langle g(x), f(x) \rangle = \int_0^1 \overline{g(x)}\, f(x)\, dx.$$
(This is actually not quite right, as we have to identify any two functions that are equal off of a set of measure zero; we talk about this subtlety in the last section of this chapter.)

Finally, we need to talk about bases of Hilbert spaces. A Schauder basis for a Hilbert space $\mathcal{H}$ is a countable collection $v_1, v_2, v_3, \ldots$ such that any vector in $\mathcal{H}$ can be written uniquely as a possibly infinite linear combination of $v_1, v_2, v_3, \ldots$. Thus, given any $v \in \mathcal{H}$, there are unique complex numbers $a_1, a_2, a_3, \ldots$ such that
$$v = \sum_{k=1}^{\infty} a_k v_k.$$
Since the preceding is an infinite series, we have to worry a little bit about what convergence means, but this is not too hard. We simply state that $v = \sum_{k=1}^{\infty} a_k v_k$ if
$$\lim_{N \to \infty} \left| v - \sum_{k=1}^{N} a_k v_k \right| = 0.$$
Since $\sum_{k=1}^{N} a_k v_k$ is a finite sum, the preceding is well-defined.

For a given Hilbert space, a basis is far from unique. But some bases are better than others, namely, those that are orthonormal. A basis $v_1, v_2, v_3, \ldots$ is orthonormal if
$$\langle v_i, v_j \rangle = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}$$

For an orthonormal basis, the unique coefficients $a_i$ have a particularly nice form:

Theorem 12.1.2. Let $v_1, v_2, v_3, \ldots$ be an orthonormal basis for a Hilbert space $\mathcal{H}$. Then, for any vector $v \in \mathcal{H}$, we have
$$v = \sum_{k=1}^{\infty} \langle v_k, v \rangle v_k.$$

Sketch of Proof: There are unique numbers $a_k$ such that $v = \sum a_k v_k$. Then we have
$$\langle v_j, v \rangle = \left\langle v_j, \sum_{k=1}^{\infty} a_k v_k \right\rangle = \sum_{k=1}^{\infty} \langle v_j, a_k v_k \rangle = a_j,$$
as desired. (The reason why we called this a "sketch of a proof" is that we did not justify why $\langle v_j, \sum_{k=1}^{\infty} a_k v_k \rangle = \sum_{k=1}^{\infty} \langle v_j, a_k v_k \rangle$. While straightforward, it does take a little work.)

Now to see that any Schauder basis can be transformed into an orthonormal basis, via the Gram-Schmidt process. Start with any basis $v_1, v_2, v_3, \ldots$. We will recursively construct an orthonormal basis $w_1, w_2, w_3, \ldots$. Start by setting
$$w_1 = \frac{1}{|v_1|} v_1,$$
a vector with length one. In an intermediate step, set
$$w_2 = v_2 - \langle w_1, v_2 \rangle w_1.$$
As seen in the exercises, $w_2$ is orthogonal to $w_1$. To get this vector to have length one, set
$$w_2 = \frac{1}{|w_2|} w_2.$$
Now to make the inductive step. Suppose we have created an orthonormal sequence $w_1, \ldots, w_{n-1}$. Set
$$w_n = v_n - \sum_{k=1}^{n-1} \langle w_k, v_n \rangle w_k.$$
As shown in the exercises, the new vector $w_n$ has been created to be orthogonal to the vectors $w_1, \ldots, w_{n-1}$. To get a vector of length one, we set
$$w_n = \frac{1}{|w_n|} w_n.$$
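The recursion above is easy to carry out numerically. Here is a minimal Python sketch of ours (using numpy; the function name gram_schmidt is our own) for vectors in $\mathbb{C}^n$, with the convention of this chapter that the inner product conjugates its first argument. It is run on the two vectors of Exercise 12.5.6.

```python
import numpy as np

def inner(w, v):
    # <w, v> with the first argument conjugated, as in this chapter.
    return np.vdot(w, v)

def gram_schmidt(vectors):
    """Turn a list of linearly independent vectors in C^n into an orthonormal list."""
    ws = []
    for v in vectors:
        w = v.astype(complex)
        for u in ws:                      # subtract the components along earlier w's
            w = w - inner(u, w) * u
        ws.append(w / np.linalg.norm(w))  # normalize to length one
    return ws

v1 = np.array([1.0, 2.0])
v2 = np.array([3.0, -2.0])
w1, w2 = gram_schmidt([v1, v2])
print(np.round(inner(w1, w2), 12))              # ~0: orthogonal
print(np.linalg.norm(w1), np.linalg.norm(w2))   # both ~1
```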


For our Hilbert space $l^2$, there is a quite simple basis, namely,
$$e_1 = (1,0,0,0,\ldots),\quad e_2 = (0,1,0,0,\ldots),\quad e_3 = (0,0,1,0,\ldots),\ \ldots$$
For the Hilbert space $L^2[0,1]$, the sequence $1, e^{2\pi i x}, e^{-2\pi i x}, e^{4\pi i x}, e^{-4\pi i x}, \ldots$ is a natural choice for an orthonormal basis. Writing a function $f(x) \in L^2[0,1]$ in terms of this basis, if
$$f(x) = \sum_{k=-\infty}^{\infty} a_k e^{2\pi i k x},$$
then
$$a_k = \int_0^1 e^{-2\pi i k x} f(x)\, dx,$$
and this is hence just a fancy way of talking about the Fourier series for $f(x)$. This is in part why some introductory books in quantum mechanics will hardly mention Hilbert spaces, concentrating instead on an approach based on Fourier series.
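As a quick illustration (a sketch of ours, not a prescribed method), the coefficients $a_k = \int_0^1 e^{-2\pi i k x} f(x)\,dx$ can be approximated by Riemann sums and the partial Fourier sums compared with $f$; the test function $f(x) = x(1-x)$ is simply a convenient choice.

```python
import numpy as np

# A test function in L^2[0,1].
f = lambda x: x * (1 - x)

# Approximate a_k = integral_0^1 e^{-2 pi i k x} f(x) dx by a Riemann sum.
N = 2000
x = (np.arange(N) + 0.5) / N
def coeff(k):
    return np.mean(np.exp(-2j * np.pi * k * x) * f(x))

# Partial Fourier sum with |k| <= K.
K = 20
partial = sum(coeff(k) * np.exp(2j * np.pi * k * x) for k in range(-K, K + 1))

print(np.max(np.abs(partial.real - f(x))))  # small: the partial sums recover f
```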

12.2. Hermitian Operators

Hilbert spaces are vector spaces. The natural type of map from a vector space to itself is a linear operator:

Definition 12.2.1. A linear operator from a complex vector space $V$ to itself is a map
$$T : V \to V$$
such that for all vectors $v_1, v_2 \in V$ and complex numbers $\lambda_1, \lambda_2$, we have
$$T(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 T(v_1) + \lambda_2 T(v_2).$$

We now look at linear operators on a Hilbert space.

Definition 12.2.2. Let $\mathcal{H}$ be a Hilbert space. A linear operator $T : \mathcal{H} \to \mathcal{H}$ is continuous if whenever a sequence of vectors $v_n$ in $\mathcal{H}$ converges to a vector $v$ in $\mathcal{H}$, then $T(v_n)$ converges to $T(v)$. Thus if $\lim_{n\to\infty} v_n = v$, then
$$\lim_{n\to\infty} T(v_n) = T(v).$$


It can be shown that all linear operators on a finite dimensional Hilbert space are continuous. This is not true for infinite dimensional Hilbert spaces. Also, often people do not say that an operator is continuous but instead say that it is bounded. These mean the same thing, as shown in the following theorem (which we will not prove, even though it is not that hard to show).

Theorem 12.2.1. Let $\mathcal{H}$ be a Hilbert space. A linear operator $T : \mathcal{H} \to \mathcal{H}$ is continuous if and only if there is a constant $B$ such that for all vectors $v \in \mathcal{H}$, we have
$$|T(v)| \leq B|v|.$$

Now to discuss the adjoint of an operator. Its existence is the point of

Theorem 12.2.2. Let $\mathcal{H}$ be a Hilbert space. Given any continuous linear operator $T : \mathcal{H} \to \mathcal{H}$, there exists an adjoint operator
$$T^* : \mathcal{H} \to \mathcal{H}$$
such that for all $v, w \in \mathcal{H}$, we have
$$\langle w, T(v) \rangle = \langle T^*(w), v \rangle.$$

The proof that the adjoint always exists is in most texts on functional analysis, such as in chapter 12 of [58]. Further, if the linear operator is not continuous, there is still a notion of an adjoint, but only after suitably restricting the domain of the operator. This is a nontrivial subtlety that we will ignore here. Thus we will always assume that an adjoint exists. We will also see in a moment, via an example, that adjoints are hardly esoteric but in fact quite common.

We can now define Hermitian operators.

Definition 12.2.3. A linear operator $T : \mathcal{H} \to \mathcal{H}$ is Hermitian if
$$T = T^*.$$

As we will see in the next chapter, a basic assumption in quantum mechanics is that anything that can be measured must correspond to a Hermitian operator and that what can actually be measured (the number you get in a lab) must be an eigenvalue of the operator. (Technically, the numbers measured are in the spectrum of the operator. For us, we will restrict our attention to eigenvalues, leaving a discussion of the spectrum of an operator to the end of this section.) If we want to make measurements that have meaning, then these measurements should yield real numbers. While the eigenvalues for a general linear operator acting on a complex vector space could be complex numbers, the eigenvalues for a Hermitian operator must be real numbers:

Theorem 12.2.3. The eigenvalues of a Hermitian operator are real numbers.

Proof. Let $\lambda$ be an eigenvalue with eigenvector $v$ for a Hermitian operator $T : \mathcal{H} \to \mathcal{H}$. By definition this means $T(v) = \lambda v$. Now
$$\lambda \langle v, v \rangle = \langle v, \lambda v \rangle = \langle v, T(v) \rangle = \langle T^*(v), v \rangle = \langle T(v), v \rangle = \langle \lambda v, v \rangle = \overline{\lambda} \langle v, v \rangle,$$
using the definition of the adjoint in the third equality and the fact that $T$ is Hermitian in the fourth. Since $\langle v, v \rangle \neq 0$, we must have $\lambda = \overline{\lambda}$, meaning that $\lambda \in \mathbb{R}$. $\square$

Theorem 12.2.4. Let $v$ be an eigenvector of a Hermitian operator $T$ with eigenvalue $\lambda$, and let $w$ be another eigenvector with distinct eigenvalue $\mu$. Then $v$ and $w$ are orthogonal.

Proof. We have
$$\lambda \langle w, v \rangle = \langle w, \lambda v \rangle = \langle w, Tv \rangle = \langle T^*w, v \rangle = \langle Tw, v \rangle = \langle \mu w, v \rangle = \overline{\mu} \langle w, v \rangle = \mu \langle w, v \rangle,$$
using the definition of the adjoint, the fact that $T$ is Hermitian, and, in the last step, the fact that the eigenvalue $\mu$ is real. Since $\lambda \neq \mu$, we must have $\langle w, v \rangle = 0$, as desired. $\square$

Let us look at the particularly simple complex Hilbert space $\mathbb{C}^2$, just to see that Hermitian operators are not strange and unnatural. For vectors
$$v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}, \quad w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix},$$
the inner product is simply
$$\langle w, v \rangle = \overline{w}^{\text{transpose}} \cdot v = (\overline{w_1}, \overline{w_2}) \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \overline{w_1} v_1 + \overline{w_2} v_2.$$


An operator $T : \mathbb{C}^2 \to \mathbb{C}^2$ is given by a two-by-two matrix:
$$T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
Then we have the adjoint being
$$T^* = \overline{T}^{\text{transpose}} = \begin{pmatrix} \overline{a} & \overline{c} \\ \overline{b} & \overline{d} \end{pmatrix},$$
by direct calculation. Now $T$ will be Hermitian when $T = T^*$, or when
$$a = \overline{a}, \quad b = \overline{c}, \quad d = \overline{d}.$$
In particular, both elements along the diagonal, $a$ and $d$, must be real numbers. Thus
$$\begin{pmatrix} 3 & 5 + 6i \\ 5 - 6i & 10 \end{pmatrix}$$
is Hermitian, while
$$\begin{pmatrix} 2 & 4 + 8i \\ 3 + 9i & 11 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2 + 3i & 6 \\ 6 & 7 \end{pmatrix}$$
are not.

Finally, we will briefly discuss the spectrum of an operator $T$ on a Hilbert space $\mathcal{H}$.

Definition 12.2.4. Let $T : \mathcal{H} \to \mathcal{H}$ be a linear operator. A complex number $\lambda$ is in the spectrum of $T$ if the operator
$$T - \lambda I$$
is not invertible, where $I$ is the identity map.

If $\lambda$ is an eigenvalue with eigenvector $v$ for an operator $T$, we have that
$$T(v) = \lambda v \iff (T - \lambda I)(v) = 0.$$
Since an eigenvector cannot be the zero vector, this means that $T - \lambda I$ has a non-trivial kernel (non-trivial null-space) and hence cannot be invertible, showing
$$\text{Eigenvalues of } T \subset \text{Spectrum of } T.$$

For infinite-dimensional Hilbert spaces, however, the spectrum can be strictly larger than the set of eigenvalues. In quantum mechanics, the numbers that can be obtained via making a measurement must be in the spectrum of the corresponding operator. These are nontrivial, significant subtleties that we will avoid. Most texts on functional analysis (such as [58]) treat these issues thoroughly.
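In finite dimensions, though, the spectrum is exactly the set of eigenvalues, and Theorems 12.2.3 and 12.2.4 can be checked numerically. Here is a brief sketch of ours, using numpy and the Hermitian matrix displayed earlier in this section.

```python
import numpy as np

# The Hermitian matrix from the text.
T = np.array([[3, 5 + 6j],
              [5 - 6j, 10]])

assert np.allclose(T, T.conj().T)              # T equals its adjoint (conjugate transpose)

eigenvalues, eigenvectors = np.linalg.eigh(T)  # eigh is numpy's routine for Hermitian matrices
print(eigenvalues)                             # real numbers (Theorem 12.2.3)

v, w = eigenvectors[:, 0], eigenvectors[:, 1]
print(abs(np.vdot(w, v)))                      # ~0: eigenvectors for distinct eigenvalues
                                               # are orthogonal (Theorem 12.2.4)
```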

12.3. The Schwartz Space

As we will see, not only are Hilbert spaces the natural spaces for quantummechanics, but also it is often the case that we can do serious real-worldcalculations without ever concretely specifying what Hilbert space we areusing. To some extent, this lack of specificity of the Hilbert space makesquantum mechanics into a quite flexible tool. (For a particularly clearexplanation, see page 47 in section 3.2 of Folland’s Quantum Field Theory: ATourist Guide for Mathematicians [23].) Still, there are times when we actuallyneed to work in a given vector space. This section deals with the Schwartzspace. As a word of warning, the Schwartz space will not be a Hilbert space.

12.3.1. The Definition

Intuitively, the Schwartz space is the vector space of smooth functions from thereal numbers to the complex numbers that approach zero extremely quickly.More precisely,

Definition 12.3.1. The Schwartz space $S(\mathbb{R})$ is the vector space of smooth functions
$$f : \mathbb{R} \to \mathbb{C}$$
such that, for all $n, m \in \mathbb{N}$, we have
$$\lim_{x \to \pm\infty} \left| x^n \frac{d^m f}{dx^m} \right| = 0.$$

For $n = 0$, $m = 0$, we see that if $f \in S(\mathbb{R})$ then
$$\lim_{x \to \infty} |f(x)| = \lim_{x \to -\infty} |f(x)| = 0,$$
and thus $f(x)$ does indeed approach zero for large $|x|$. But, since we also have, for $n = 1$, $m = 0$, that
$$\lim_{x \to \infty} |x f(x)| = \lim_{x \to -\infty} |x f(x)| = 0,$$
$|f|$ must go to zero fairly quickly in order to compensate for $|x|$ going to infinity.
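For a concrete example, the Gaussian $e^{-x^2}$ lies in $S(\mathbb{R})$: every derivative is a polynomial times $e^{-x^2}$, and the exponential decay beats any power of $x$. Here is a short symbolic check of ours (using sympy) of the defining limits for a few values of $n$ and $m$.

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.exp(-x**2)   # a typical Schwartz function

# Check lim_{x -> +oo} x^n d^m f / dx^m = 0 for a few n, m.
for n in range(0, 4):
    for m in range(0, 4):
        expr = x**n * sp.diff(f, x, m)
        assert sp.limit(expr, x, sp.oo) == 0
print("x^n * f^(m)(x) -> 0 as x -> +oo for n, m = 0, ..., 3")
```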


We want to show that $S(\mathbb{R})$ can be made into an inner product space by defining the inner product to be
$$\langle f, g \rangle = \int_{-\infty}^{\infty} \overline{f}\, g\, dx.$$
We will see that while it is an inner product space, $S(\mathbb{R})$ is not complete and hence cannot be a Hilbert space.

But we first have to show, for all $f, g \in S(\mathbb{R})$, that $\langle f, g \rangle = \int_{-\infty}^{\infty} \overline{f}\, g\, dx$ is finite. We first need

Lemma 12.3.1. Let $f \in S(\mathbb{R})$. Then
$$\int_{-\infty}^{\infty} |f(x)|^2\, dx < \infty.$$

This lemma is stating that functions in the Schwartz space approach zero so quickly that the area under $|f(x)|^2$ is finite. The technique behind the proof is one of the standard ways to estimate sizes of integrals.

Proof. We will show that each of the three integrals
$$\int_{-\infty}^{-1} |f(x)|^2\, dx, \quad \int_{-1}^{1} |f(x)|^2\, dx, \quad \int_{1}^{\infty} |f(x)|^2\, dx$$
is finite.

Since $f(x)$ is continuous, from calculus we know that $|f(x)|^2$ is also continuous. Then the area under $|f(x)|^2$ over the bounded closed interval $[-1,1]$ must be finite,

Figure 12.1: the graph of $|f(x)|^2$ over $[-1,1]$, bounding a finite area.

which means that
$$\int_{-1}^{1} |f(x)|^2\, dx < \infty.$$
Now to look at $\int_{1}^{\infty} |f(x)|^2\, dx$. We know that
$$\lim_{x \to \infty} |x f(x)| = 0.$$
This means that there is a positive constant $B$ such that, for all $x \geq 1$, we have
$$|x f(x)| < B,$$
which in turn means that for $|x| \geq 1$,
$$|f(x)| < \left| \frac{B}{x} \right|.$$
Then
$$\int_{1}^{\infty} |f(x)|^2\, dx < \int_{1}^{\infty} \frac{B^2}{x^2}\, dx = B^2.$$
A similar argument, left for the exercises, will show that $\int_{-\infty}^{-1} |f(x)|^2\, dx$ is also finite. $\square$

Proposition 12.3.1. For all $f, g \in S(\mathbb{R})$, we have that
$$\langle f, g \rangle = \int_{-\infty}^{\infty} \overline{f}\, g\, dx$$
exists.

Proof. The key is that for all complex numbers $z, w$ we have
$$|zw| \leq |z|^2 + |w|^2,$$
a straightforward calculation that we leave for the exercises. Then
$$\left| \int_{-\infty}^{\infty} \overline{f}\, g\, dx \right| \leq \int_{-\infty}^{\infty} \left| \overline{f}\, g \right| dx \leq \int_{-\infty}^{\infty} \left( |f|^2 + |g|^2 \right) dx = \int_{-\infty}^{\infty} |f|^2\, dx + \int_{-\infty}^{\infty} |g|^2\, dx < \infty.$$
Thus the integral $\int_{-\infty}^{\infty} \overline{f}\, g\, dx$ exists. $\square$

12.3.2. The Operators $q(f) = xf$ and $p(f) = -i\, df/dx$

There are two Hermitian operators that will later be important for quantum mechanics. They are
$$q : S(\mathbb{R}) \to S(\mathbb{R}), \quad p : S(\mathbb{R}) \to S(\mathbb{R}),$$
where
$$q(f)(x) = x f(x), \quad p(f)(x) = -i\frac{df}{dx}.$$
The key is

Theorem 12.3.1. Both $q(f)(x) = x f(x)$ and $p(f)(x) = -i\frac{df}{dx}$ are Hermitian operators from $S(\mathbb{R})$ to $S(\mathbb{R})$.

The proof is contained in the following set of lemmas, which are left as exercises.

Lemma 12.3.2. For any $f \in S(\mathbb{R})$, we have that $q(f) \in S(\mathbb{R})$.

Lemma 12.3.3. For any $f \in S(\mathbb{R})$, we have that $p(f) \in S(\mathbb{R})$.

Lemma 12.3.4. For all $f, g \in S(\mathbb{R})$, we have that
$$\langle f, q(g) \rangle = \langle q(f), g \rangle.$$

Lemma 12.3.5. For all $f, g \in S(\mathbb{R})$, we have that
$$\langle f, p(g) \rangle = \langle p(f), g \rangle.$$

As we will see, the failure of the two operators to commute will be important in the quantum mechanics of springs, which in turn will be critical for the quantum mechanics of light.

Definition 12.3.2. The commutator of two operators $a$ and $b$ is
$$[a,b] = ab - ba.$$
Thus $[a,b] = 0$ precisely when $a$ and $b$ commute (meaning that $ab = ba$). To see how $q$ and $p$ fail to commute we need to look at $[q,p]$.

Proposition 12.3.2. For all $f \in S(\mathbb{R})$, we have
$$[q,p](f)(x) = i f(x).$$

Proof.
$$[q,p](f)(x) = q(p(f))(x) - p(q(f))(x) = x\left(-i\frac{df}{dx}\right) - \left(-i\frac{d}{dx}(xf)\right) = -ix\frac{df}{dx} - \left(-if(x) - ix\frac{df}{dx}\right) = i f(x),$$
as desired. $\square$
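The computation in this proof can also be repeated symbolically; here is a brief sketch of ours, using sympy, that applies $qp - pq$ to an unspecified smooth function $f$.

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = sp.Function('f')

q = lambda g: x * g                      # q(f) = x f(x)
p = lambda g: -sp.I * sp.diff(g, x)      # p(f) = -i df/dx

commutator = q(p(f(x))) - p(q(f(x)))     # [q, p] applied to f
print(sp.simplify(commutator))           # prints I*f(x), i.e., [q, p](f) = i f
```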


12.3.3. S(R) Is Not a Hilbert Space

While S(R) has an inner product, it is not complete and therefore is not aHilbert space. We will explicitly construct a Cauchy sequence of functions inS(R) that converge to a discontinuous function, and hence to a function that isnot in our Schwartz space S(R).

For all positive integers $n$, set
$$f_n(x) = \frac{1}{e^{nx^2}} = e^{-nx^2}.$$
As shown in the exercises, each $f_n(x) \in S(\mathbb{R})$. We will now show

Lemma 12.3.6.
$$\int_{-\infty}^{\infty} \frac{1}{e^{nx^2}}\, dx = \sqrt{\frac{\pi}{n}}.$$

Proof. This is a standard argument from calculus, using one of the cleverest tricks in mathematics. Set
$$A = \int_{-\infty}^{\infty} \frac{1}{e^{nx^2}}\, dx.$$
As it stands, it is difficult to see how to perform this integral. (In fact, there is no way of evaluating the indefinite integral
$$\int \frac{1}{e^{nx^2}}\, dx$$
in terms of elementary functions.) Then, seemingly arbitrarily, we will find not $A$ but instead its square:
$$A^2 = \left( \int_{-\infty}^{\infty} \frac{1}{e^{nx^2}}\, dx \right)\left( \int_{-\infty}^{\infty} \frac{1}{e^{nx^2}}\, dx \right) = \left( \int_{-\infty}^{\infty} \frac{1}{e^{nx^2}}\, dx \right)\left( \int_{-\infty}^{\infty} \frac{1}{e^{ny^2}}\, dy \right),$$
since it does not matter what we call the symbol we are integrating over. But now the two integrals are integrating over different variables. Hence
$$A^2 = \left( \int_{-\infty}^{\infty} \frac{1}{e^{nx^2}}\, dx \right)\left( \int_{-\infty}^{\infty} \frac{1}{e^{ny^2}}\, dy \right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{e^{n(x^2+y^2)}}\, dx\, dy.$$
We did this so that we can now use polar coordinates, setting
$$x = r\cos(\theta), \quad y = r\sin(\theta),$$
giving us that
$$r^2 = x^2 + y^2, \quad r\, dr\, d\theta = dx\, dy,$$
with the limits of integration being given by $0 \leq r < \infty$ and $0 \leq \theta \leq 2\pi$. Then we have
$$A^2 = \int_0^{2\pi}\int_0^{\infty} e^{-nr^2} r\, dr\, d\theta = \int_0^{2\pi} \left. -\frac{1}{2n}e^{-nr^2}\right|_0^{\infty} d\theta = \int_0^{2\pi} \frac{1}{2n}\, d\theta = \frac{1}{2n}\int_0^{2\pi} d\theta = \frac{\pi}{n},$$
finishing the proof. $\square$
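A quick numerical check of the lemma (ours, using scipy's quad routine) for a few values of $n$:

```python
import numpy as np
from scipy.integrate import quad

for n in [1, 2, 5, 10]:
    value, _ = quad(lambda x, n=n: np.exp(-n * x**2), -np.inf, np.inf)
    print(n, value, np.sqrt(np.pi / n))   # the two numbers agree
```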

We will now show that the sequence of functions
$$f_n(x) = \frac{1}{e^{nx^2}}$$
is a Cauchy sequence in our Schwartz space that does not converge to a function in the Schwartz space. Assume for a moment that this sequence is Cauchy. Note that for a fixed number $x \neq 0$ we have
$$\lim_{n \to \infty} f_n(x) = 0.$$
For $x = 0$, though, we have for all $n$ that
$$f_n(0) = 1.$$
Thus
$$\lim_{n \to \infty} f_n(0) = \lim_{n \to \infty} 1 = 1,$$
giving us that the pointwise limit function $f$ of the sequence $f_n$ is the discontinuous function
$$f(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0. \end{cases}$$
Hence the limit function $f(x)$ is not in $S(\mathbb{R})$, meaning that $S(\mathbb{R})$ cannot be a Hilbert space. Thus we must show

Lemma 12.3.7. The sequence of functions $f_n(x) = e^{-nx^2}$ is Cauchy.

Proof. We need to show
$$\lim_{n,m \to \infty} |f_n - f_m|^2 = 0.$$
Since the functions $f_n$ are all real-valued, we need not worry about using conjugation signs. Then we have
$$|f_n - f_m|^2 = \int_{-\infty}^{\infty} (f_n - f_m)(f_n - f_m)\, dx = \int_{-\infty}^{\infty} (f_n^2 - 2f_n f_m + f_m^2)\, dx$$
$$\leq \int_{-\infty}^{\infty} f_n^2\, dx + 2\int_{-\infty}^{\infty} f_n f_m\, dx + \int_{-\infty}^{\infty} f_m^2\, dx$$
$$= \int_{-\infty}^{\infty} e^{-2nx^2}\, dx + 2\int_{-\infty}^{\infty} e^{-(n+m)x^2}\, dx + \int_{-\infty}^{\infty} e^{-2mx^2}\, dx$$
$$= \sqrt{\frac{\pi}{2n}} + 2\sqrt{\frac{\pi}{n+m}} + \sqrt{\frac{\pi}{2m}} \to 0,$$
as $n, m \to \infty$, as desired. $\square$
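Numerically, the upper bound obtained in this proof does shrink to zero, though only at the slow rate of roughly $1/\sqrt{n}$; a tiny sketch of ours:

```python
import numpy as np

def bound(n, m):
    # The upper bound for ||f_n - f_m||^2 obtained in the proof above.
    return np.sqrt(np.pi / (2 * n)) + 2 * np.sqrt(np.pi / (n + m)) + np.sqrt(np.pi / (2 * m))

for n, m in [(10, 20), (100, 200), (10000, 20000)]:
    print(n, m, bound(n, m))   # shrinks toward 0 as n and m grow
```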

12.4. Caveats: On Lebesgue Measure, Types of Convergence, and Different Bases

To make much of this chapter truly rigorous would require a knowledge of Lebesgue measure, a standard and beautiful topic covered in such classics as those by Royden [56], Rudin [57], and Jones [34], and by many others. For example, while our definition for the Hilbert space
$$l^2 = \left\{ (a_1, a_2, a_3, \ldots) : a_k \in \mathbb{C} \text{ and } \sum_{k=1}^{\infty} |a_k|^2 < \infty \right\}$$
is rigorous, our definition for square integrable functions
$$L^2[0,1] = \left\{ f : [0,1] \to \mathbb{C} : \int_0^1 |f(x)|^2\, dx < \infty \right\}$$
is not. While this is a vector space, it is not actually complete and hence not quite a Hilbert space. To make it into an honest Hilbert space, we have to place an equivalence relation on our set of functions. Specifically, we have to identify two functions if they are equal off of a set of measure zero (where "measure zero" is a technical term from Lebesgue measure theory).


This also leads to different types of convergence. Even in the world of continuous functions, there is a distinction between pointwise convergence and uniform convergence. With the vector space $L^2[0,1]$, we can also talk about $L^2$ convergence. Here we say that a sequence of functions $f_n(x) \in L^2[0,1]$ will $L^2$ converge to a function $f \in L^2[0,1]$ if the numbers
$$\int_0^1 |f_n(x) - f(x)|^2\, dx$$
converge to zero.

Consider our sequence of functions $f_n(x) = e^{-nx^2}$ in the Schwartz space $S(\mathbb{R})$. We showed that this sequence converged to the discontinuous function that is one at $x = 0$ and zero everywhere else. This convergence is both pointwise and in the sense of $L^2$. But this sequence does converge in $L^2$ to a perfectly reasonable function in the Schwartz space, namely, to the zero function. Note that the zero function and the function that is one at $x = 0$ and zero everywhere else, while certainly not equal pointwise, are equal off a set of measure zero. There are, though, examples of sequences in $S(\mathbb{R})$ that converge pointwise and in $L^2$ to functions that are not in $S(\mathbb{R})$.

Finally a word about bases for vector spaces. A “basis” from beginninglinear algebra for a vector space V is a collection of vectors such that everyvector v ∈ V can be written uniquely as a finite linear combination of vectorsfrom the basis. Every vector space has such a basis. This type of basis is calleda Hamel basis. Unfortunately, for infinite-dimensional vector spaces, it can beshown that a Hamel basis has to have an uncountable number of elements. Thisis a deep fact, linked to the axiom of choice. It also means that such a basis isalmost useless for infinite-dimensional spaces. Luckily for Hilbert spaces thereis another type of basis, namely, the Schauder basis defined in this chapter.

12.5. Exercises

Exercise 12.5.1. Find two complex numbers $\alpha$ and $\beta$ such that

1. $|\alpha + \beta|^2 > |\alpha|^2 + |\beta|^2$,
2. $|\alpha + \beta|^2 = |\alpha|^2 + |\beta|^2$,
3. $|\alpha + \beta|^2 < |\alpha|^2 + |\beta|^2$.

Exercise 12.5.2. Show for all complex numbers $z, w \in \mathbb{C}$ that
$$|zw| \leq |z|^2 + |w|^2.$$


Exercise 12.5.3. Let $V$ be the vector space consisting of infinite sequences of complex numbers, with all but a finite number being zero. Show that
$$\langle (a_n), (b_m) \rangle = \sum_{k=1}^{\infty} \overline{a_k}\, b_k$$
defines an inner product on $V$.

Exercise 12.5.4. The goal of this exercise is to show that $l^2$ is a vector space.

1. Let $(a_k), (b_k) \in l^2$. Show that $(a_k + b_k) \in l^2$.
2. Let $(a_k) \in l^2$ and let $\lambda \in \mathbb{C}$. Show that $\lambda(a_k) = (\lambda a_k)$ is still in $l^2$.

Exercise 12.5.5. Let $v_1, v_2, v_3, \ldots$ be a basis for a Hilbert space $\mathcal{H}$.

1. Show that the vector $w_1 = \frac{1}{|v_1|}v_1$ has length one.
2. Show that
$$w_2 = v_2 - \langle w_1, v_2 \rangle w_1$$
is orthogonal to $w_1$.
3. Find a vector $w_2$ that has length one and is orthogonal to $w_1$, such that the vectors $w_1$ and $w_2$ span the same space as $v_1$ and $v_2$.
4. Suppose $w_1, \ldots, w_{n-1}$ is an orthonormal sequence. Setting
$$w_n = v_n - \sum_{k=1}^{n-1} \langle w_k, v_n \rangle w_k,$$
show that $w_n$ is orthogonal to each $w_k$.
5. Set
$$w_n = \frac{w_n}{|w_n|}.$$
Show that $w_1, w_2, \ldots$ is an orthonormal basis for $\mathcal{H}$.

Exercise 12.5.6. For the Hilbert space $\mathbb{C}^2$ with inner product
$$\langle w, v \rangle = \overline{w}^{\text{transpose}} \cdot v,$$
show that
$$v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 3 \\ -2 \end{pmatrix}$$
form a basis. Use the Gram-Schmidt process to construct another basis that is orthonormal.

Exercise 12.5.7. For the Hilbert space $\mathbb{C}^2$ with inner product
$$\langle w, v \rangle = \overline{w}^{\text{transpose}} \cdot v,$$
show that the adjoint of
$$T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
is
$$T^* = \overline{T}^{\text{transpose}} = \begin{pmatrix} \overline{a} & \overline{c} \\ \overline{b} & \overline{d} \end{pmatrix}.$$

Exercise 12.5.8. Consider the Hilbert space $\mathbb{C}^3$ with inner product $\langle w, v \rangle = \overline{w}^{\text{transpose}} \cdot v$. Let
$$T = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
Show that $T$ is Hermitian if and only if
$$a_{ij} = \overline{a_{ji}},$$
for all $i$ and $j$.

Exercise 12.5.9. Generalize the preceding to the vector space $\mathbb{C}^n$.

Exercise 12.5.10. For any $f \in S(\mathbb{R})$, show that
$$\int_{-\infty}^{-1} |f(x)|^2\, dx < \infty.$$

Exercise 12.5.11. For any $f \in S(\mathbb{R})$, show that $q(f) \in S(\mathbb{R})$.

Exercise 12.5.12. For any $f \in S(\mathbb{R})$, show that $p(f) \in S(\mathbb{R})$.

Exercise 12.5.13. For all $f, g \in S(\mathbb{R})$, show that
$$\langle f, q(g) \rangle = \langle q(f), g \rangle.$$

Exercise 12.5.14. For all $f, g \in S(\mathbb{R})$, show that
$$\langle f, p(g) \rangle = \langle p(f), g \rangle.$$

Exercise 12.5.15. For each positive $n$, show that
$$\lim_{x \to \pm\infty} x^k e^{-nx^2} = 0.$$

Exercise 12.5.16. For each positive $n$, show that
$$\lim_{x \to \pm\infty} \left| x^k \frac{d^m e^{-nx^2}}{dx^m} \right| = 0,$$
for all $m$, allowing us to conclude that $e^{-nx^2} \in S(\mathbb{R})$.


13

Some Quantum Mechanical Thinking

Summary: After discussing the photoelectric effect, which gives experimentalevidence that light is not solely a wave but also must have particle properties,we will state the basic assumptions behind quantum mechanics. The lastsection discusses quantization, which is a method to link traditional classicalphysics with the new world of quantum mechanics.

13.1. The Photoelectric Effect: Light as Photons

Shining light on certain metals will cause electrons to fly off.

Figure 13.1: light shining on a metal surface, with electrons flying off.

This is called the photoelectric effect. Something in light provides enough "oomph" to kick the electrons off the metal.

At a qualitative level, this makes sense. We have seen that light is an electromagnetic wave. Since waves transmit energy, surely an energetic enough light wave would be capable of kicking an electron out of its original metal home.

Unfortunately, this model quickly breaks down when viewed quantitatively. The energy of the fleeing electrons can be measured. A classical wave has energy that is proportional to the square of its amplitude (we will give an intuition for this in a moment), but changing the amplitude of the light does not change the energy of the electrons. Instead, changing the frequency of the light wave is what changes the energy of the electrons: Higher frequency means higher electron energies. This is just not how waves should behave.


In 1905, Albert Einstein made the assumption that light is not a wave butinstead is made up of particles. He further assumed that the energy of eachparticle is hν, where h is a number called the Planck constant and, moreimportantly, ν is the frequency of the light. This assumption agreed withexperiment.

Einstein made this claim for the particle nature of light the very same yearthat he developed his Special Theory of Relativity, which is built on the wavenature of light. This photoelectric theory paper, though, is far more radicalthan his special relativity paper. For an accurate historical account, see Pais’sSubtle Is the Lord: The Science and the Life of Albert Einstein [51].

Even so, Einstein is still using, to some extent, a wavelike property of light,in that light’s energy depends on its frequency, which is a wave property. Still,none of this agrees with the world of waves and particles in classical physics,leading by the mid-1920s to the quantum revolution.

Before giving a barebones outline for quantum mechanics in the nextsection, let us first look at why a wave’s energy should be related to itsamplitude, as opposed to its frequency.

Think of a wave as made up of a piece of string. In other words, we havea material wave. Since light is an electromagnetic wave, people thought it hadto be a wave of something. This something was called the “ether.”

Consider a wave moving through time, with fixed endpoints:

Figure 13.2: several snapshots (labeled A, B, C) of a wave with fixed endpoints moving through time.

Now, the energy of the wave is the sum of its potential energy and its kinetic energy. The up-and-down motion of the wave, and hence the wave's kinetic energy, is at its least when the wave is stretched the most. Thus the potential energy (where the kinetic energy is zero) should be proportional to some function of the amplitude. (We will see why it is the square of the amplitude in the next chapter.) The frequency of the wave should not affect its energy, unless the wave-like nature of electromagnetism is a fundamentally different type of wave.

13.2. Some Rules for Quantum Mechanics

Historically, it took only about twenty years for people to go from Einstein's interpretation of the photoelectric effect and other experimental oddities, such as the sharp spectral lines of hydrogen, to a sharp, clean, spectacular new theory of mechanics: quantum mechanics. There are many ways to introduce the beginnings of quantum mechanics, ranging from a mechanical, rule-based scheme for calculations to a presentation based on a highly abstract axiomatics. Our underlying motivation is to identify some of the key mathematical structures behind the physics and then to take these structures seriously, even in non-physical contexts.

Thus we will present here some of the key rules behind quantum mechanics.It will not be, in any strict sense, an axiomatic presentation of quantummechanics. (Our presentation is heavily influenced by chapter 3 in [17] and, toa lesser extent, section 2.2 in [38].)

We want to study some object. For us to be able to apply mathematicaltools, we need to be able to describe this object mathematically, to measureits properties, and to predict how it will change through time. Let us look at aclassical example: Suppose we want to understand a baseball of mass m flyingthrough the air. We need to know its position and its momentum.

Figure 13.3: a baseball at position $(x_1, y_1)$ flying with velocity $(a, b)$.

Fixing some coordinate system, we need to know six numbers: the three spatial coordinates and the three momentum coordinates. The baseball will then be described as a point
$$\left( x, y, z, m\frac{dx}{dt}, m\frac{dy}{dt}, m\frac{dz}{dt} \right) \in \mathbb{R}^6.$$

We say that the state of the baseball is its corresponding point in R6. To

determine where the baseball will move through time, we use Newton’s secondlaw. We need to find all the forces F acting on the baseball and then solve thesystem of differential equations F = ma.

In quantum mechanics, we must be able to describe the state of an objectand to know how to measure its properties and to find the analog of F = ma.None of this will be at all obvious.

The natural (though hardly naive) setting for quantum mechanics is that ofHilbert spaces and Hermitian operators. While technical definitions were givenin the last chapter, you can think of a Hilbert space as a complex vector space


with an inner product (and thus a vector space for which we have a notion ofangle) and Hermitian operators as linear transformations on the Hilbert space.

Assumption 1. The state of an object will be a ray in a Hilbert space H. Thusa state of an object is specified by a non-zero vector v in the Hilbert space, andtwo vectors specify the same state if they are non-zero multiples of each other.

Assumption 2. Any quantity that can be measured will correspond to aHermitian operator

A : H → H.

Thus, if we want to measure, say, the velocity of our object, there mustcorrespond a “velocity” operator.

Assumption 3. Actual measurements will always be in the spectrum of thecorresponding Hermitian operator.

A number λ will be in the spectrum of an operator A if the operator

A −λI

is non-invertible, where I is the identity operator. If λ is an eigenvalue witheigenvector v for an operator A, i.e.,

Av = λv,

we must have that $(A - \lambda I)v = 0$,

which means that A − λI is non-invertible. For most practical purposes, onecan think of the spectrum as the eigenvalues of A. This is why the previousassumption is frequently thought of as saying that actual measurements mustbe eigenvalues of the corresponding Hermitian operator. For the rest of thissection, we will assume that the spectrum for each of our Hermitian operatorsonly consists of eigenvalues.

Recall that an eigenvalue of operator A is simple if its eigenvectors form aone-dimensional subspace of the Hilbert space.

Assumption 4. Suppose our object has state v ∈ H. Let A be a Hermitianoperator with simple eigenvalue λ and corresponding eigenvector w ∈ H.When we make a measurement on our object for the quantity correspondingto A, then the probability that we get the number λ is

$$\frac{|\langle w, v \rangle|^2}{\langle w, w \rangle \langle v, v \rangle}.$$


Now we start to leave the classical, Newtonian world far behind. We willtake a few paragraphs to start discussing what this assumption actually entails.

We are using the Hilbert space structure, more precisely the inner product $\langle \cdot, \cdot \rangle$. First, we want to make sure that $\frac{|\langle w, v \rangle|^2}{\langle w, w \rangle \langle v, v \rangle}$ is a real number between 0 and 1 (since we want to interpret this number as a probability).

Proposition 13.2.1. For any non-zero vector $v \in \mathcal{H}$, we have
$$0 \leq \frac{|\langle w, v \rangle|^2}{\langle v, v \rangle \langle w, w \rangle} \leq 1.$$

Proof. Since $\mathcal{H}$ is a complex vector space, the inner product $\langle w, v \rangle$ could be a complex number, but $|\langle w, v \rangle|^2$ is a real number. Similarly, as we saw in the last chapter, while in general $\langle w, v \rangle$ does not need to be a real number, we always have that $\langle v, v \rangle$ is a positive real number (provided $v \neq 0$). Thus $\frac{|\langle w, v \rangle|^2}{\langle v, v \rangle \langle w, w \rangle}$ is a nonnegative real number.

For this number to have a chance to be a probability, we need it to be less than or equal to 1. This follows, though, from the Cauchy-Schwarz inequality, which states that
$$|\langle w, v \rangle| \leq \sqrt{\langle w, w \rangle}\sqrt{\langle v, v \rangle}. \qquad \square$$

〈w,w〉√

〈v,v〉.�

The squaring of |〈w,v〉| leads to non-linearity, which is built into the heartof quantum mechanics. Suppose we have two states v1 and v2. While

〈w,v1 + v2〉 = 〈w,v1〉+ 〈w,v2〉,it is almost always the case that

|〈w,v1 + v2〉|2 �= |〈w,v1〉|2 +|〈w,v2〉|2.

(This is really a fact about complex number, and is shown in the exercises inchapter 12.) Though we will not be talking about it, this is the mathematicalunderpinning of quantum interference.

We immediately have

Proposition 13.2.2. Suppose that our state is an eigenvector w with eigen-value λ for a Hermitian operator A. Then the probability when we take ameasurement for the value corresponding to A that we measure the number λ

is precisely one.

Finally, we defined our states to be rays in our Hilbert space. But in making measurements, we chose a vector on the ray. What if we had chosen a different vector on the ray? We want to make sure that our probabilities do not change, as the following proves:

Proposition 13.2.3. Let $A$ be a Hermitian operator with simple eigenvalue $\lambda$ with eigenvector $w \in \mathcal{H}$. The probability that we get the value $\lambda$ when we measure the state specified by $v$ is precisely equal to the probability of getting $\lambda$ when we measure the state specified by $\mu v$, for $\mu$ any complex number not equal to 0.

Proof. The probability of measuring $\lambda$ for the state $\mu v$ is
$$\frac{|\langle w, \mu v \rangle|^2}{\langle w, w \rangle \langle \mu v, \mu v \rangle}.$$
We have
$$\frac{|\langle w, \mu v \rangle|^2}{\langle w, w \rangle \langle \mu v, \mu v \rangle} = \frac{|\mu \langle w, v \rangle|^2}{|\mu|^2 \langle w, w \rangle \langle v, v \rangle} = \frac{|\langle w, v \rangle|^2}{\langle w, w \rangle \langle v, v \rangle},$$
which is just the probability that, in measuring $v$, we get the number $\lambda$. $\square$

This proposition allows us to identify the ray that describes a state with anactual vector on the ray, an identification that we will regularly make, startingin the following:

Assumption 5. Let A be a Hermitian operator that has a simple eigenvalueλ with eigenvector w ∈ H. Start with a state v ∈ H. Take a measurement onthe state v for the quantity corresponding to A. If we measure the number λ,then the state of our object after the measurement is now the ray spanned bythe eigenvector w.

By this assumption, the act of measurement will fundamentally change the state of the object. Further, we can only know probabilistically what the new state will be. Let us look at an example. Suppose that our Hilbert space is $\mathcal{H} = \mathbb{C}^2$ with inner product given by
$$\langle (\alpha_1, \alpha_2), (\beta_1, \beta_2) \rangle = \overline{\alpha_1}\beta_1 + \overline{\alpha_2}\beta_2.$$
Let $A$ be a Hermitian operator with eigenvectors $w_1$ and $w_2$, with corresponding eigenvalues $\lambda_1$ and $\lambda_2$, and with $\lambda_1 \neq \lambda_2$. As we saw last chapter, we know that $w_1$ and $w_2$ are orthogonal and form a basis for $\mathbb{C}^2$. Further, as we have seen, we can assume that each has length 1. Let our object initially be a vector $v \in \mathbb{C}^2$. Then there exist complex numbers $\alpha_1$ and $\alpha_2$ such that
$$v = \alpha_1 w_1 + \alpha_2 w_2.$$
Suppose we measure the number $\lambda_1$. This will happen with probability
$$\frac{|\alpha_1|^2}{|\alpha_1|^2 + |\alpha_2|^2},$$
since
$$\frac{|\langle w_1, v \rangle|^2}{\langle v, v \rangle} = \frac{|\langle w_1, \alpha_1 w_1 + \alpha_2 w_2 \rangle|^2}{\langle \alpha_1 w_1 + \alpha_2 w_2, \alpha_1 w_1 + \alpha_2 w_2 \rangle} = \frac{|\alpha_1|^2}{|\alpha_1|^2 + |\alpha_2|^2}.$$

Note that we are not thinking that the object is initially in either the state $w_1$ or the state $w_2$, or that the probabilities are just measuring our uncertainties. No, the assumption is that the initial state is $v = \alpha_1 w_1 + \alpha_2 w_2$. The act of measurement transforms the initial state into the state $w_1$ with probability $\frac{|\alpha_1|^2}{|\alpha_1|^2 + |\alpha_2|^2}$ or into the state $w_2$ with probability $\frac{|\alpha_2|^2}{|\alpha_1|^2 + |\alpha_2|^2}$. Chance lies at the heart of measurement.
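Here is a short numerical sketch of ours (in numpy) that carries out exactly this computation for a concrete Hermitian operator, namely the matrix of Exercise 13.5.2: it expands the state $v$ in the orthonormal eigenvector basis of $A$ and reads off the probabilities of Assumption 4.

```python
import numpy as np

# A Hermitian operator on C^2 and a state v (the matrix from Exercise 13.5.2).
A = np.array([[3, 2 + 1j],
              [2 - 1j, 1]])
v = np.array([1, 2], dtype=complex)

eigenvalues, eigenvectors = np.linalg.eigh(A)   # columns are orthonormal eigenvectors

for lam, w in zip(eigenvalues, eigenvectors.T):
    prob = abs(np.vdot(w, v))**2 / (np.vdot(w, w).real * np.vdot(v, v).real)
    print(f"measure {lam:.4f} with probability {prob:.4f}")

# Since the eigenvectors form an orthonormal basis, the probabilities add up to one.
```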

This assumption also means that the order in which we take measurementsmatters. Suppose we have two Hermitian operators A and B . If we take ameasurement corresponding to A, the state of the system must now becomean eigenvector of A, while if we take a measurement corresponding to B , thestate of the system must be an eigenvector of B . If we first measure A andthen B , we will end up with the state being an eigenvector of B , while if wefirst measure B and then A, our state will end up being an eigenvector of A.Thus the order of measuring matters, since the eigenvectors of A and B maybe quite different. This is far from the classical world.

The previous assumptions are all about how to calculate measurements inquantum mechanics. The next assumption is about how the state of an objectcan evolve over time and is hence the quantum mechanical analog of Newton’ssecond law F = ma.

Assumption 6. Let $v(t) \in \mathcal{H}$ be a function of time $t$ describing the evolution of an object. There is a Hermitian operator $H(t)$ such that $v(t)$ must satisfy the partial differential equation
$$i\hbar \frac{\partial v(t)}{\partial t} = H(t)v(t).$$
This partial differential equation is called Schrödinger's equation.


The constant � is Planck’s constant divided by 2π . More importantly, theHermitian operator H (t) is called the Hamiltonian and corresponds to theclassical energy. Hence its eigenvalues will give the energy of the object.

At this level of abstraction, it is hard to see how one could ever hope to set up an actual partial differential equation (PDE) that could be solved, so let us look at a specific example. (Though standard, we are following section 5.1 of [55].) Suppose we have a particle restricted to moving on a straight line, with coordinate $x$. The Hilbert space will be a suitable space of complex-valued functions on $\mathbb{R}$. Hence our states will be functions $v(x)$. Since our states will be evolving in time, we will actually have two-variable functions $v(x,t)$. We assume that our object has a mass $m$ and let $U(x)$ be some function corresponding to the potential energy. Then the Hamiltonian will be
$$H(x,t)v(x,t) = -\left(\frac{\hbar^2}{2m}\right)\frac{\partial^2 v}{\partial x^2} + U(x)v(x,t),$$
giving us Schrödinger's equation:
$$i\hbar \frac{\partial v(x,t)}{\partial t} = -\left(\frac{\hbar^2}{2m}\right)\frac{\partial^2 v}{\partial x^2} + U(x)v(x,t).$$

The reader should not feel that the preceding is reasonable; we stated it justto show that real physics can turn Assumption 6 into a concrete equation wecan sometimes solve.
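To see that Assumption 6 really can be computed with, here is a rough numerical sketch of ours (the finite-difference discretization is a standard trick, not something prescribed by the text). We replace $-\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + U(x)$ by a matrix on a grid, choose the harmonic potential $U(x) = \frac{1}{2}m\omega^2 x^2$ of the next chapter with $\hbar = m = \omega = 1$, and ask numpy for the lowest eigenvalues; they come out near $0.5, 1.5, 2.5, \ldots$, exactly the energies derived in Chapter 14.

```python
import numpy as np

hbar = m = omega = 1.0
N, L = 1000, 8.0                       # grid points, half-width of the interval
x = np.linspace(-L, L, N)
h = x[1] - x[0]

# Finite-difference second derivative (Dirichlet conditions at the ends).
D2 = (np.diag(np.full(N - 1, 1.0), -1)
      - 2 * np.diag(np.ones(N))
      + np.diag(np.full(N - 1, 1.0), 1)) / h**2

U = 0.5 * m * omega**2 * x**2          # harmonic potential, as in Chapter 14
H = -(hbar**2 / (2 * m)) * D2 + np.diag(U)

energies = np.linalg.eigvalsh(H)[:4]
print(energies)                        # approximately 0.5, 1.5, 2.5, 3.5
```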

Also, we are being deliberately cavalier about the nature of our Hilbertspace of functions. Frequently we will look at the Hilbert space of square-integrable functions with respect to Lebesgue measure. These functions neednot be continuous, much less differentiable. Our goal in this section was notto deal with these (interesting) issues but instead to give an outline of theunderlying mathematical structure for quantum mechanics.

13.3. Quantization

Classical Newtonian mechanics accurately describes much, but not all, ofthe world around us. Quantization is the procedure that allows us to startfrom a Newtonian description of a system and produce a quantum mechanicalone. We will explicitly do this in the next chapter with harmonic oscillators(springs).

In classical physics, we need coordinate systems. If we have $n$ particles, then each particle will be described by its position and momentum. Positions and momentums can be measured in the lab. To move to quantum mechanics, we need to replace the classical positions and momentums with operators. But which operators do we choose? This process of replacing classical variables with appropriate operators is the goal of quantization. The key is that in general operators do not commute. We must choose our operators so that their commutators have certain specified values. (Recall that given operators $a$ and $b$, the commutator is $[a,b] = ab - ba$.) Using deep analogies with classical mechanics (in particular with something called the Poisson bracket), Dirac, in chapter IV of his classic Principles of Quantum Mechanics [18], developed the following method.

Classically, for $n$ particles, the positions describe a point in $\mathbb{R}^{3n}$ (since each particle will have $x$, $y$, and $z$ coordinates), and the corresponding momentums describe a different point in $\mathbb{R}^{3n}$ (since the momentum of each particle will also have $x$, $y$, and $z$ coordinates). Thus a system of $n$ particles is described classically by a point in $\mathbb{R}^{6n}$.

To quantize, we replace each of the position coordinates by operators $q_1, \ldots, q_{3n}$ and each of the momentum coordinates by operators $p_1, \ldots, p_{3n}$, subject to the commutator rules
$$[q_i, q_j] = 0, \quad [p_i, p_j] = 0,$$
and
$$[q_i, p_j] = \begin{cases} i\hbar & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}$$

For Dirac, each of these commutator rules stems from corresponding classical properties of the Poisson bracket. (We are not defining the Poisson bracket but instead are mentioning it only to let the reader know that these commutator rules were not arrived at arbitrarily.) Of course, we have not come close to specifying the underlying Hilbert space. Frequently, quite a lot of information can be gleaned by treating the entire system at this level of abstraction.

Let us make all of this a bit more specific. Consider a one-dimensional system, where the Hilbert space is some suitable space of differentiable functions on the interval $[0,1]$. We will have one coordinate operator $q$ and one momentum operator $p$. Letting $f(x)$ be a function in our Hilbert space, we define
$$q(f) := x f(x),$$
or, in other words, simply multiplication by $x$. In order for $[q,p] = i\hbar$ to hold, we need
$$p(f) := -i\hbar \frac{df}{dx},$$
since
$$\left[ x, -i\hbar\frac{d}{dx} \right] f = i\hbar f.$$

It is not uncommon for the position operators to be multiplications by the traditional coordinates; that, in turn, forces the momentums to be $-i\hbar$ times the corresponding partial derivative.
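The same symbolic check as in Chapter 12 goes through with $\hbar$ in place and with several coordinates; here is a brief sketch of ours in sympy for the two-variable case of Exercise 13.5.10.

```python
import sympy as sp

x1, x2, hbar = sp.symbols('x1 x2 hbar', real=True)
f = sp.Function('f')(x1, x2)

# Position and momentum operators in two coordinates.
q = [lambda g, xi=xi: xi * g for xi in (x1, x2)]
p = [lambda g, xi=xi: -sp.I * hbar * sp.diff(g, xi) for xi in (x1, x2)]

for i in range(2):
    for j in range(2):
        comm = q[i](p[j](f)) - p[j](q[i](f))     # [q_i, p_j] applied to f
        print(i + 1, j + 1, sp.simplify(comm))   # I*hbar*f when i == j, else 0
```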

13.4. Warnings of Subtleties

A rigorous axiomatic development of quantum mechanics is non-trivial. Thisis why we said we were making “assumptions,” as opposed to stating axioms.Our assumptions are closer to “rules of thumb.” For example, frequently theHermitian operator corresponding to an observable will not be defined on theentire Hilbert space but instead on a dense subspace. Often, the operators willbe defined on a Schwartz space, which, as we saw in the last chapter, is notitself a Hilbert space.

13.5. Exercises

Exercise 13.5.1. Consider the Hilbert space $\mathbb{C}^2$ with inner product
$$\langle w, v \rangle = \overline{w}^{\text{transpose}} \cdot v.$$
Show that
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}$$
is a Hermitian operator. Let
$$v = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
be a vector corresponding to the state of some object. Suppose we take a measurement of $v$ for the quantity corresponding to $A$.

1. What is the probability that we get a measurement of 2?
2. What is the probability that we get a measurement of 3?
3. What is the probability that we get a measurement of 4?

Exercise 13.5.2. Consider the Hilbert space $\mathbb{C}^2$ with inner product
$$\langle w, v \rangle = \overline{w}^{\text{transpose}} \cdot v.$$
Show that
$$A = \begin{pmatrix} 3 & 2+i \\ 2-i & 1 \end{pmatrix}$$
is a Hermitian operator. Find the eigenvectors and eigenvalues of $A$. Let
$$v = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
be a vector corresponding to the state of some object. Suppose we make a measurement of $v$ for the quantity corresponding to $A$.

1. What is the probability that we get a measurement of $2 + \sqrt{6}$?
2. What is the probability that we get a measurement of 3?

Exercise 13.5.3.

1. Let
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.$$
Show that
$$P(x) = \det(A - xI)$$
is at most a degree two polynomial.
2. Let
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
Show that
$$P(x) = \det(A - xI)$$
is at most a degree three polynomial.
3. Let $A$ be an $n \times n$ matrix. Show that
$$P(x) = \det(A - xI)$$
is at most a degree $n$ polynomial.

Exercise 13.5.4. Let $A$ be an $n \times n$ matrix. Let $\lambda$ be an eigenvalue of $A$. Show that $\lambda$ must be a root of
$$P(x) = \det(A - xI).$$
Conclude that $A$ can have at most $n$ eigenvalues.

Exercise 13.5.5. Let
$$A = \begin{pmatrix} 3 & 2 \\ 2 & 5 \end{pmatrix}.$$
Find the eigenvectors and eigenvalues of $A$.


Exercise 13.5.6. Let
$$B = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$$

1. Find the eigenvalues and eigenvectors of $B$.
2. Using the matrix $A$ in the previous problem, show that
$$AB \neq BA.$$
3. Show that $AB$ has different eigenvectors than $BA$.
4. Explain why measuring with respect to $B$ first, then with respect to $A$ is fundamentally different from measuring with respect to $A$ first, then with respect to $B$. Hence the order in which we take measurements matters.

Exercise 13.5.7. Let $A$ and $B$ be $n \times n$ matrices such that $A$ has $n$ distinct eigenvalues. Suppose that
$$AB = BA.$$
Show that $B$ has the same eigenvectors as $A$.

Exercise 13.5.8. Let $\mathcal{H}$ be a Hilbert space. Let $A$ be a Hermitian operator acting on $\mathcal{H}$ and let $v \in \mathcal{H}$. Let $\mu$ be any nonzero complex number. Let $\lambda$ be an eigenvalue of $A$ with eigenvector $w$. Show that the probability of measuring the number $\lambda$ for the state corresponding to $v$ is the same as the probability of measuring the number $\lambda$ for the state corresponding to $\mu v$. Explain why there is no way of distinguishing the state $v$ from the state $\mu v$.

Exercise 13.5.9. Let $V$ be a vector space of infinitely differentiable functions on the unit interval. Let $f(x) \in V$.

1. Show that the operator
$$q(f) = x f(x)$$
is a linear operator.
2. Show that the operator
$$p(f) = -i\hbar \frac{d(f)}{dx}$$
is a linear operator.
3. Show that
$$[q,q] = 0, \quad [p,p] = 0, \quad [q,p] = i\hbar.$$


Exercise 13.5.10. Let $V$ be a vector space of infinitely differentiable functions on the unit square. Let $f(x_1, x_2) \in V$.

1. Show that the operators
$$q_1(f) = x_1 f(x_1, x_2) \quad \text{and} \quad q_2(f) = x_2 f(x_1, x_2)$$
are linear operators.
2. Show that the operators
$$p_1(f) = -i\hbar \frac{\partial(f)}{\partial x_1} \quad \text{and} \quad p_2(f) = -i\hbar \frac{\partial(f)}{\partial x_2}$$
are linear operators.
3. Show that
$$[q_i, q_j] = 0, \quad [p_i, p_j] = 0, \quad [q_i, p_i] = i\hbar, \quad [q_i, p_j] = 0 \text{ for } i \neq j.$$

Exercise 13.5.11. Let $V$ be the set of all continuous functions
$$f : [0,1] \to \mathbb{C}.$$
(You may assume that $V$ is a complex vector space.) Define
$$\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$$
by setting
$$\langle f, g \rangle = \int_0^1 \overline{f(x)}\, g(x)\, dx.$$

Show that this is an inner product.

Exercise 13.5.12. Show that the preceding vector space V is not complete.


14

Quantum Mechanics of Harmonic Oscillators

Summary: The goal for this chapter is to quantize harmonic oscillators(e.g., springs). The underlying mathematics of harmonic oscillators is keyto understanding light, as we will see in the next chapter. We will see that thepossible energies for a quantized harmonic oscillator can only occur in discrete(quantized) values.

14.1. The Classical Harmonic Oscillator

It is surprising how much of the world can be modeled by using harmonicoscillators. Thus we want to understand the movement of springs andpendulums, each moving without friction.

Figure 14.1: a pendulum and a spring (with displacement measured along the x-axis), the two standard harmonic oscillators.

We will first give the Newtonian second law approach and then give theLagrangian and Hamiltonian approach for understanding how a spring willmove through time, allowing us to proceed to quantization in the next section.

Consider a spring

Figure 14.2: a spring with a mass m attached at its end.


with a mass $m$ attached at the end. The key assumption (Hooke's law), one that needs to be verified experimentally, is that the force acting on the end of the spring is
$$\text{Force} = -kx,$$
where $k$ is a positive constant depending on the physical make-up of the spring and $x$ is the position of the mass. Newton's second law can now be applied. Let $x(t)$ denote the position of the mass at time $t$. By Newton's second law, we know that $x(t)$ must satisfy the differential equation
$$m\frac{d^2x}{dt^2} = -kx(t),$$
or
$$m\frac{d^2x}{dt^2} + kx(t) = 0.$$
As this is a simple second order ordinary differential equation with constant coefficients, a basis for its solutions can be easily determined. One basis of solutions is
$$\cos\left(\sqrt{\frac{k}{m}}\, t\right), \quad \sin\left(\sqrt{\frac{k}{m}}\, t\right),$$
while another is
$$\exp\left(i\sqrt{\frac{k}{m}}\, t\right), \quad \exp\left(-i\sqrt{\frac{k}{m}}\, t\right).$$
Note that a solution of the form
$$x(t) = c_1\cos\left(\sqrt{\frac{k}{m}}\, t\right) + c_2\sin\left(\sqrt{\frac{k}{m}}\, t\right),$$
for constants $c_1$ and $c_2$, will indeed oscillate back and forth, just as one would expect for a frictionless spring.

Figure 14.3: the graph of such a solution, oscillating with amplitude $\sqrt{c_1^2 + c_2^2}$.

Also, using that $\sin(a+b) = \sin(a)\cos(b) + \sin(b)\cos(a)$, any solution can be written in the form
$$x(t) = A\sin\left(\sqrt{\frac{k}{m}}\, t + \phi\right),$$
for a constant $\phi$. Since sine can only have values between $-1$ and $1$, the largest that $x(t)$ can be is the absolute value of $A$, which is the amplitude.

Now to find the Hamiltonian, which is
$$\text{Hamiltonian} = H = \text{Kinetic Energy} + \text{Potential Energy}.$$
The kinetic energy is the term that captures the energy involving speed. Technically,
$$\text{Kinetic Energy} = \frac{1}{2}mv^2 = \frac{1}{2m}p^2,$$
where $v$ is the velocity and $p = mv$ is the momentum.

The potential energy is the term that gives us the energy depending on position. Technically, its definition is
$$\text{Potential Energy} = -\int_0^x \text{Force}\, dx,$$
assuming that we start with zero potential energy. Since for a harmonic oscillator the force is $-kx$, we have
$$\text{Potential Energy} = -\int_0^x -kt\, dt = \frac{k}{2}x^2.$$
Thus we have that the total energy is
$$H = \frac{1}{2m}p^2 + \frac{k}{2}x^2.$$

Now we can finally show that the energy is proportional to the square of theamplitude.

Theorem 14.1.1. The total energy for a harmonic oscillator with solution
$$x(t) = A\sin\left(\sqrt{\frac{k}{m}}\, t + \phi\right)$$
is $(1/2)kA^2$.


Proof. We have
$$H = \frac{1}{2}mv^2 + \frac{k}{2}x^2 = \frac{1}{2}m\left(\frac{dx}{dt}\right)^2 + \frac{k}{2}x^2$$
$$= \frac{1}{2}m\left(A\sqrt{\frac{k}{m}}\cos\left(\sqrt{\frac{k}{m}}\, t + \phi\right)\right)^2 + \frac{k}{2}\left(A\sin\left(\sqrt{\frac{k}{m}}\, t + \phi\right)\right)^2$$
$$= \frac{kA^2}{2}\left(\cos^2\left(\sqrt{\frac{k}{m}}\, t + \phi\right) + \sin^2\left(\sqrt{\frac{k}{m}}\, t + \phi\right)\right) = \frac{kA^2}{2},$$
as desired. $\square$
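Theorem 14.1.1 is easy to confirm numerically (a sketch of ours, using scipy's ODE solver): integrate $m\ddot{x} = -kx$ and watch the total energy stay at $kA^2/2$.

```python
import numpy as np
from scipy.integrate import solve_ivp

m, k = 2.0, 3.0
A, phi = 1.5, 0.4                       # amplitude and phase of the chosen solution
omega = np.sqrt(k / m)

# First-order system for (x, v) equivalent to m x'' + k x = 0.
def rhs(t, y):
    x, v = y
    return [v, -(k / m) * x]

y0 = [A * np.sin(phi), A * omega * np.cos(phi)]      # x(0), x'(0) for x = A sin(w t + phi)
sol = solve_ivp(rhs, (0, 20), y0, dense_output=True, rtol=1e-10, atol=1e-12)

t = np.linspace(0, 20, 200)
x, v = sol.sol(t)
energy = 0.5 * m * v**2 + 0.5 * k * x**2
print(energy.max(), energy.min(), 0.5 * k * A**2)    # all three essentially equal
```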

Finally, as seen in the earlier chapter on Lagrangians, we have
$$\text{Lagrangian} = \text{Kinetic Energy} - \text{Potential Energy} = \frac{1}{2m}p^2 - \frac{k}{2}x^2.$$

14.2. The Quantum Harmonic Oscillator

(While this section will be covering standard material, we will be closelyfollowing the outline and the notation of chapter 2 in Milonni’s TheQuantum Vacuum: An Introduction to Quantum Electrodynamics [41].) Fromlast section, we have our classical description for a harmonic oscillator’sHamiltonian:

$$H = \frac{1}{2m}p^2 + \frac{k}{2}x^2,$$

where m is the mass, k a positive constant, x the position, and p themomentum. We want to quantize this. In particular, we want to find theallowable energies for a quantum spring, which means we should replace inthe Hamiltonian H the position variable x and the momentum variable p byoperators and then find this new H ’s eigenvalues. As a side benefit, the all-important annihilation and creation operators will be defined, and we willbegin to see the strange quantum properties of the vacuum.

Before finding these operators, we need to specify the Hilbert space, which will be an infinite-dimensional vector space of one-variable functions that converge extremely rapidly to zero for $x \to \pm\infty$. Slightly more specifically, our Hilbert space will contain
$$S(\mathbb{R}) = \left\{ f : \mathbb{R} \to \mathbb{C} : \forall n,m \in \mathbb{N},\ \lim_{x \to \pm\infty} \left| x^n \frac{d^m f}{dx^m} \right| = 0,\ \int_{-\infty}^{\infty} |f|^2\, dx < \infty \right\},$$
the Schwartz space from the chapter on Hilbert spaces. We are deliberately being a bit vague in stating what type of functions $f : \mathbb{R} \to \mathbb{C}$ we are considering, as this is technically difficult. The inner product is
$$\langle f, g \rangle = \int_{-\infty}^{\infty} \overline{f}\, g\, dx.$$
The $x$ will become the operator $q$ that sends each function $f(x)$ to the new function $x f(x)$. Thus
$$q(f) = x f(x).$$
As seen in the Hilbert space chapter, this $q$ is a linear operator. The operator corresponding to the momentum $p$, which we will also denote by $p$, must satisfy the commutation relation
$$[q, p] = i\hbar.$$
Also, as seen in the Hilbert space chapter, this means that as an operator
$$p = \frac{\hbar}{i}\frac{d}{dx} = -i\hbar\frac{d}{dx}.$$
We know from Assumption 3 of the previous chapter that the allowable energies will be the eigenvalues of the operator
$$H = \frac{1}{2m}p^2 + \frac{k}{2}q^2.$$
For notational convenience, which will be clearer in a moment, we set
$$\omega = \sqrt{\frac{k}{m}},$$
so that $k = m\omega^2$. Note that in the classical world, $\omega$ is the (angular) frequency of the solution.

Thus we want to understand
$$H = \frac{1}{2m}p^2 + \frac{m\omega^2}{2}q^2.$$
We are going to show that the eigenvalues and eigenvectors are indexed by the nonnegative integers $n = 0, 1, 2, \ldots$ with eigenvalues
$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega.$$


Further, in showing this, we will need two associated operators: the creationand annihilation operators (also called the raising and lowering operators),which have independent and important meaning.

To start our calculations, set
$$a = \frac{1}{\sqrt{2m\hbar\omega}}(p - im\omega q).$$
We call this operator, for reasons that will be made clear in a moment, the annihilation operator or the lowering operator. Its adjoint is
$$a^* = \frac{1}{\sqrt{2m\hbar\omega}}(p + im\omega q),$$
which is to be shown in the exercises and is called the creation or raising operator. By direct calculation (also to be shown in the exercises), we have
$$[a, a^*] = 1.$$
This allows us to rewrite our Hamiltonian operator as
$$H = \frac{1}{2}\hbar\omega(aa^* + a^*a) = \hbar\omega\left(a^*a + \frac{1}{2}\right).$$

We will concentrate on the operator a∗a. Denote an eigenvector of a∗aby vλ, normalized to have length one (〈vλ,vλ〉 = 1), with eigenvalue λ. Wewill show in the next few paragraphs that these eigenvalues λ are nonnegativeintegers. We first show that these eigenvalues are non-negative real numbers.

Lemma 14.2.1. The eigenvalues of a∗a are non-negative real numbers.

Proof. By the definition of the adjoint a∗ and of vλ and λ, we have

〈avλ,avλ〉 = 〈vλ,a∗avλ〉 = λ〈vλ,vλ〉 = λ.

Since this is the square of the length of the vector $av_\lambda$, we see that $\lambda$ must be a nonnegative real number. $\square$

Lemma 14.2.2. Assuming that each eigenvalue of a∗a has only a one-dimensional eigenspace, then

$$a v_\lambda = \sqrt{\lambda}\, v_{\lambda - 1}$$
and
$$a^* v_\lambda = \sqrt{\lambda + 1}\, v_{\lambda + 1}.$$

(This is why some call a a lowering operator and a∗ a raising operator.)


Proof. As already mentioned, we know that

1 = [a,a∗] = aa∗ − a∗a.

Then

a∗a = aa∗ − 1.

Hence

$$(a^*a)av_\lambda = (aa^* - 1)av_\lambda = aa^*av_\lambda - av_\lambda = (\lambda - 1)av_\lambda.$$

Thus $av_\lambda$ is an eigenvector of $a^*a$ with eigenvalue $\lambda - 1$. By our assumption that each eigenvalue of $a^*a$ has only a one-dimensional eigenspace, we have that $av_\lambda$ must be a multiple of $v_{\lambda-1}$. That is,
$$av_\lambda = Cv_{\lambda-1}.$$
We know, however, that the square of the length of $av_\lambda$ is $\lambda$. Since $v_{\lambda-1}$ has length one, we see that
$$C = \sqrt{\lambda}.$$
(At the end of this section, we will show that each eigenvalue of $a^*a$ actually does have a one-dimensional eigenspace.)

Proposition 14.2.1. The eigenvalues λ for the operator a∗a are non-negativeintegers.

Proof. By induction we have
$$a^k v_\lambda = \sqrt{\lambda}\sqrt{\lambda-1}\cdots\sqrt{\lambda-(k-1)}\; v_{\lambda-k}.$$
But all of the eigenvalues of $a^*a$ are non-negative. If $\lambda$ were not a non-negative integer, then for large enough $k$ the number $\lambda - k$ would be a negative eigenvalue, which is impossible. This forces $\lambda$ to be a non-negative integer. $\square$

This allows us to write the eigenvectors as $v_n$, for non-negative integers $n$. We are not directly interested in the eigenvectors and eigenvalues of the "auxiliary" operator $a^*a$ but instead want to know the possible energies (the eigenvalues of the Hamiltonian) of a harmonic oscillator. Hence the following is key:


Theorem 14.2.1. The eigenvalues for the Hamiltonian for the quantum harmonic oscillator are
$$E_n = (n + 1/2)\hbar\omega,$$
with corresponding eigenvectors $v_n$.

Proof. We have that
$$Hv_n = \hbar\omega\left(a^*a + \frac{1}{2}\right)v_n = (n + 1/2)\hbar\omega\, v_n.$$
Thus the $v_n$ are indeed eigenvectors with the desired eigenvalues.

Now to show that these are the only eigenvectors. Let $w$ be an eigenvector of $H$, with corresponding eigenvalue $\lambda$. We have
$$a^*a = \frac{1}{\hbar\omega}H - \frac{1}{2}.$$
Then
$$a^*aw = \left(\frac{1}{\hbar\omega}H - \frac{1}{2}\right)w = \left(\frac{\lambda}{\hbar\omega} - \frac{1}{2}\right)w.$$
Thus $w$ is also an eigenvector for $a^*a$ and hence must be one of the $v_n$. Thus the eigenvectors for $H$ are precisely the vectors $v_n$. $\square$

This result is remarkable. The energies can only be the numbers
$$(1/2)\hbar\omega,\ (3/2)\hbar\omega,\ (5/2)\hbar\omega,\ \ldots.$$
Further, there is a lowest energy $(1/2)\hbar\omega$. It is reasonable to think of this as the energy when nothing is happening in the system. In the quantum world, a spring that is just sitting there must have a positive energy. We will see in the next chapter that this leads in the quantum world to the vacuum itself being complicated and problematic.
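To make the spectrum concrete, here is a minimal numerical sketch (not from the text) that builds finite matrix truncations of the annihilation operator in the basis $v_0, v_1, \ldots$ and checks that the eigenvalues of $\hbar\omega(a^*a + 1/2)$ are $(n + 1/2)\hbar\omega$. The truncation size N and the choice $\hbar = \omega = 1$ are assumptions made purely for illustration.

```python
import numpy as np

# Truncate to the first N basis vectors v_0, ..., v_{N-1}.
# In that basis the annihilation operator has entries a[n, n+1] = sqrt(n+1).
N = 8
hbar, omega = 1.0, 1.0                         # illustrative units only

a = np.diag(np.sqrt(np.arange(1, N)), k=1)     # lowering operator
a_dag = a.conj().T                             # raising operator (its adjoint)

# Hamiltonian H = hbar*omega*(a^* a + 1/2)
H = hbar * omega * (a_dag @ a + 0.5 * np.eye(N))
print(np.round(np.linalg.eigvalsh(H), 10))     # -> 0.5, 1.5, 2.5, ..., N - 0.5

# The commutator [a, a^*] is the identity except in the last truncated slot.
comm = a @ a_dag - a_dag @ a
print(np.round(np.diag(comm), 10))             # -> 1, 1, ..., 1, -(N-1)
```

The negative entry in the last diagonal slot is purely an artifact of truncating the basis; in the full Hilbert space $[a, a^*] = 1$ exactly.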

We have one final technicality to deal with, namely, showing that the eigenspaces of $a^*a$ are all one-dimensional.

Lemma 14.2.3. The eigenspaces for the operator $a^*a$ have dimension one.

Proof. There are parts of the preceding argument that do not depend on the eigenspaces of $a^*a$ having dimension one. We used the notation that the $v_\lambda$ are eigenvectors of $a^*a$ with eigenvalue $\lambda$, without requiring $\lambda$ to be an integer.


We showed both that $av_\lambda$ is an eigenvector of $a^*a$ with eigenvalue $\lambda - 1$ and that each $av_\lambda$ has length $\sqrt{\lambda}$. This is true for any eigenvector of length one with eigenvalue $\lambda$. Then $av_\lambda$ must be $\sqrt{\lambda}$ times an eigenvector of length one with eigenvalue $\lambda - 1$. Hence $a^k v_\lambda$ must be $\sqrt{\lambda}\sqrt{\lambda-1}\cdots\sqrt{\lambda-(k-1)}$ times an eigenvector of length one with eigenvalue $\lambda - k$. But these can never be negative. Hence each $\lambda$ is a non-negative integer. In particular, 0 is an eigenvalue.

We will show that the eigenspace for the eigenvalue 0 is one-dimensional. Since the $a^*$ and $a$ operators map eigenspaces to each other, this will give us our result.

Here we need actually to look at the relevant Hilbert space of square-integrable one-variable functions, with some fixed interval as domain. Let $v_0$ be an eigenvector of $a^*a$ with eigenvalue 0. Then $av_0 = 0$. We will now write $v_0$ as a function $\psi(x)$ in our Hilbert space. Using that $a$ is a nonzero multiple of
$$p - im\omega x = \frac{\hbar}{i}\frac{d}{dx} - im\omega x,$$
we have $\psi(x)$ satisfying the first-order differential equation
$$\frac{\hbar}{i}\frac{d\psi(x)}{dx} = im\omega x\,\psi(x).$$
But under the normalization that $\psi(x)$ has length one, this first-order ordinary differential equation has the unique solution
$$\psi(x) = \left(\frac{m\omega}{4\pi\hbar}\right)^{1/4} e^{-\frac{m\omega x^2}{2\hbar}},$$
giving us our one-dimensional eigenspace. □
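As a quick sanity check (not part of the text), one can hand this first-order equation to a computer algebra system and confirm that its solutions are exactly constant multiples of the Gaussian above; sympy is used here purely for illustration.

```python
import sympy as sp

x = sp.symbols('x', real=True)
m, w, hbar = sp.symbols('m omega hbar', positive=True)
psi = sp.Function('psi')

# (hbar/i) psi'(x) = i m omega x psi(x)   <=>   psi'(x) = -(m omega / hbar) x psi(x)
ode = sp.Eq(psi(x).diff(x), -(m * w / hbar) * x * psi(x))
print(sp.dsolve(ode, psi(x)))   # psi(x) = C1*exp(-m*omega*x**2/(2*hbar))
```

So the kernel of the annihilation operator is spanned by a single Gaussian, which is exactly the one-dimensionality used in the proof.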

14.3. Exercises

Exercise 14.3.1. Show that
$$x(t) = c_1\cos\left(\sqrt{\tfrac{k}{m}}\,t\right) + c_2\sin\left(\sqrt{\tfrac{k}{m}}\,t\right),$$
for any constants $c_1$ and $c_2$, is a solution to
$$m\frac{d^2x}{dt^2} + kx(t) = 0.$$
Then show that
$$x(t) = c_1\exp\left(i\sqrt{\tfrac{k}{m}}\,t\right) + c_2\exp\left(-i\sqrt{\tfrac{k}{m}}\,t\right),$$
for any constants $c_1$ and $c_2$, are also solutions.

Exercise 14.3.2. Let $a$ be the annihilation operator and $a^*$ the creation operator for the quantum harmonic oscillator. Show that
$$[a, a^*] = 1.$$

Exercise 14.3.3. Let $a$ be the annihilation operator, $a^*$ the creation operator, and $H$ the Hamiltonian operator for the quantum harmonic oscillator. Show that
$$H = \frac{1}{2}\hbar\omega\,(aa^* + a^*a) = \hbar\omega\left(a^*a + \frac{1}{2}\right).$$

Exercise 14.3.4. On the Schwartz space, show that
$$\langle f, g\rangle = \int_{-\infty}^{\infty} f\,g\;dx$$
defines an inner product.

Exercise 14.3.5. Show that $q(f(x)) = xf(x)$ defines a map
$$q : \mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R}).$$

Exercise 14.3.6. Show that $q(f(x)) = xf(x)$ is Hermitian.

Exercise 14.3.7. Show that $p(f) = -i\hbar\frac{df}{dx}$ defines a map
$$p : \mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R}).$$

Exercise 14.3.8. Show that $p(f) = -i\hbar\frac{df}{dx}$ is Hermitian.

Exercise 14.3.9. Show that the creation operator $\frac{1}{\sqrt{2m\hbar\omega}}(p + im\omega q)$ is the adjoint of the annihilation operator $a$, using the Hilbert space of Section 2.

Exercise 14.3.10. Let $\mathcal{H}$ be a Hilbert space with inner product $\langle\cdot,\cdot\rangle$. Let $A, B : \mathcal{H}\to\mathcal{H}$ be Hermitian. Show that $A^2$ and $A + B$ are also Hermitian.

Exercise 14.3.11. Prove that
$$H = \frac{1}{2m}p^2 + \frac{m\omega^2}{2}q^2$$
is Hermitian.


15

Quantizing Maxwell’s Equations

Summary: Our goal is to quantize Maxwell's equations, leading to a natural interpretation of light as being composed of photons. The key is that the quantization of harmonic oscillators (or, more prosaically, the mathematics of springs) is critical to understanding light. In particular, we show that the possible energies of light form a discrete set, linked to the classical frequency, giving us an interpretation for the photoelectric effect.

15.1. Our Approach

From Einstein's explanation of the photoelectric effect, light seems to be made up of photons, which in turn are somewhat particle-like. Can we start with the classical description of light as an electromagnetic wave solution to Maxwell's equations when there are no charges or currents, quantize, and then get something that can be identified with photons? That is the goal of this chapter. (As in the previous chapter, while all of this is standard, we will be following the outline given in chapter 2 of [41].)

Maxwell's equations are linear. We will concentrate on the "monochromatic" solutions, those with fixed frequency and direction. We will see that the Hamiltonians of these monochromatic solutions will have the same mathematical structure as a Hamiltonian for a harmonic oscillator. We know from the last chapter how to quantize the harmonic oscillator. More importantly, we saw that there are discrete eigenvalues and eigenvectors for the corresponding Hamiltonian operator. When we quantize the Hamiltonians for the monochromatic waves in an analogous fashion, we will again have discrete eigenvalues and eigenvectors. It is these eigenvectors that we can interpret as photons.


15.2. The Coulomb Gauge

We know that light is an electromagnetic wave satisfying Maxwell's equations when there are no charges ($\rho = 0$) and zero current ($j = \langle 0,0,0\rangle$):
$$\nabla\cdot E = 0,\qquad \nabla\times E = -\frac{\partial B}{\partial t},\qquad \nabla\cdot B = 0,\qquad \nabla\times B = \frac{\partial E}{\partial t},$$
where we assume that the speed of light is one. In Chapter 7, we saw that there is a vector potential field $A(x,y,z,t)$ with
$$B = \nabla\times A$$
and a scalar potential function $\phi(x,y,z,t)$ with
$$\nabla\phi = -E - \frac{\partial A}{\partial t}.$$
As discussed in Chapter 7, neither potential is unique. In order to make the problem of quantizing electromagnetic waves analogous to quantizing harmonic oscillators, we need to choose particular potentials. We will assume that "at infinity" both potentials are zero.

Theorem 15.2.1. For an electric field E and a magnetic field B when there are no charges and current, we can choose the vector potential A to satisfy
$$\nabla\cdot A = 0$$
(such an A is called a Coulomb potential, or a Coulomb gauge) and a scalar potential to satisfy
$$\phi = 0.$$

We will show that the part analogous to the Hamiltonian of a harmonic oscillator will be in terms of this potential.

Proof. From Chapter 7, we know that there is at least one vector potential field A such that
$$B = \nabla\times A.$$
Fix one of these solutions. Also, from Chapter 7, we know that $A + \nabla(f)$ is also a vector potential, where $f(x,y,z,t)$ is any given function.


To find a Coulomb gauge, we must show that there exists a function $f$ such that
$$\nabla\cdot(A + \nabla(f)) = 0.$$
Thus we must find a function $f$ such that
$$\nabla\cdot\nabla(f) = -\nabla\cdot A.$$
Set $A = (A_1, A_2, A_3)$. Since
$$\nabla\cdot\nabla(f) = \left(\frac{\partial}{\partial x},\frac{\partial}{\partial y},\frac{\partial}{\partial z}\right)\cdot\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}\right) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2},$$
we must solve
$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = -\frac{\partial A_1}{\partial x} - \frac{\partial A_2}{\partial y} - \frac{\partial A_3}{\partial z},$$
where the right-hand side is given. This partial differential equation can always be solved. (For background, see chapter 2, section C of [23].)

Thus we can assume that there is a vector potential A with both $B = \nabla\times A$ and $\nabla\cdot A = 0$. We must now show that we can choose the scalar potential $\phi$ also to be identically zero. Here we will need to use that, at great enough distances (usually referred to as "at infinity"), the various fields and hence the various potentials are zero; this is an assumption that we have not made explicit until now.

From our earlier work on the existence of potential functions, we know that there is a scalar potential function $\phi$ such that
$$E = -\nabla\phi - \frac{\partial A}{\partial t}.$$
But for our E, there are no charges. Thus
$$0 = \nabla\cdot E = -\nabla\cdot\left(\nabla\phi + \frac{\partial A}{\partial t}\right) = -\nabla^2\phi - \frac{\partial(\nabla\cdot A)}{\partial t} = -\nabla^2\phi.$$
Thus we must have that everywhere in space
$$\nabla^2\phi = \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2} = 0.$$
Such functions $\phi$ are called harmonic. Since at the boundary (at "infinity") we are assuming that $\phi = 0$, we get $\phi = 0$ everywhere, as desired. (Again, this results from knowing how to find harmonic functions; for background see chapter 2, section A in Folland's Introduction to Partial Differential Equations [22].) □

In the preceding proof that we can arrange $\nabla\cdot A = 0$, we do not use the assumption that E and B satisfy Maxwell's equations when there are no charges or current, just that they satisfy Maxwell's equations in general. Hence we can always choose the vector potential A to be Coulomb. To get an identically zero scalar potential, though, we needed to use that there is no charge or current.

If the vector potential A is Coulomb, then it will satisfy a wave equation:

Theorem 15.2.2. Let A be a Coulomb vector potential for electric and magnetic fields satisfying Maxwell's equations in a free field. Then A satisfies the wave equation
$$\nabla^2 A - \frac{\partial^2 A}{\partial t^2} = 0.$$

Proof. For ease of notation, we now write $A = (A^1, A^2, A^3)$, instead of our earlier $(A_1, A_2, A_3)$. Recall that $\nabla^2 A$ denotes
$$\left(\frac{\partial^2 A^1}{\partial x^2} + \frac{\partial^2 A^1}{\partial y^2} + \frac{\partial^2 A^1}{\partial z^2},\ \frac{\partial^2 A^2}{\partial x^2} + \frac{\partial^2 A^2}{\partial y^2} + \frac{\partial^2 A^2}{\partial z^2},\ \frac{\partial^2 A^3}{\partial x^2} + \frac{\partial^2 A^3}{\partial y^2} + \frac{\partial^2 A^3}{\partial z^2}\right)$$
for the vector field $A = (A^1, A^2, A^3)$. Thus we want to show for each $i$ that
$$\frac{\partial^2 A^i}{\partial x^2} + \frac{\partial^2 A^i}{\partial y^2} + \frac{\partial^2 A^i}{\partial z^2} = \frac{\partial^2 A^i}{\partial t^2}.$$
Since we are assuming that the scalar potential $\phi$ is identically zero, we have
$$E = -\frac{\partial A}{\partial t}.$$
Then we have
$$\frac{\partial^2 A}{\partial t^2} = -\frac{\partial E}{\partial t} = -\nabla\times B\ \text{(by Maxwell)} = -\nabla\times(\nabla\times A),\ \text{since } B = \nabla\times A.$$


Now to calculate $\nabla\times(\nabla\times A)$. To ease notation, we will write
$$\frac{\partial A^i}{\partial x} = A^i_x,\quad \frac{\partial A^i}{\partial y} = A^i_y,\quad \frac{\partial A^i}{\partial z} = A^i_z,\quad \frac{\partial^2 A^i}{\partial x\,\partial y} = A^i_{xy},\ \text{etc.}$$
We have
$$\nabla\times(\nabla\times A) = \nabla\times\det\begin{pmatrix} i & j & k\\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z}\\ A^1 & A^2 & A^3\end{pmatrix} = \nabla\times\bigl(A^3_y - A^2_z,\ A^1_z - A^3_x,\ A^2_x - A^1_y\bigr),$$
which is
$$\bigl(A^2_{xy} - A^1_{yy} - A^1_{zz} + A^3_{xz},\ \ A^3_{zy} - A^2_{zz} - A^2_{xx} + A^1_{xy},\ \ A^1_{xz} - A^3_{xx} - A^3_{yy} + A^2_{yz}\bigr).$$
Here we will finally use that A is a Coulomb gauge:
$$\nabla\cdot A = A^1_x + A^2_y + A^3_z = 0.$$
Thus
$$A^2_{xy} + A^3_{xz} = \frac{\partial}{\partial x}(A^2_y + A^3_z) = -A^1_{xx},$$
$$A^1_{xy} + A^3_{yz} = \frac{\partial}{\partial y}(A^1_x + A^3_z) = -A^2_{yy},$$
$$A^1_{xz} + A^2_{yz} = \frac{\partial}{\partial z}(A^1_x + A^2_y) = -A^3_{zz},$$
giving us our desired $\nabla^2 A - \frac{\partial^2 A}{\partial t^2} = 0$. □

The wave equation has been extensively studied, and its solutions are well known. The standard approach is via separation of variables. Almost all books on differential equations will discuss separation of variables, ranging from beginning texts, such as the one by Boyce and DiPrima [6], to graduate level texts, such as the one by Evans [20], and all the way to research monographs, such as Cain and Mayer's Separation of Variables for Partial Differential Equations [9]. This allows us just to write down the solutions.

The method of separation of variables starts with the assumption that all solutions to $\nabla^2 A - \frac{\partial^2 A}{\partial t^2} = 0$ are (possibly infinite) linear combinations of solutions of the form
$$f(t)\,G(x,y,z),$$


and hence a combination of a product of a real-valued function $f(t)$ with a vector-valued function $G(x,y,z)$. Let us assume for a moment that our solution to the wave equation is $A(x,y,z,t) = f(t)G(x,y,z)$. Then we have
$$0 = \nabla^2 A - \frac{\partial^2 A}{\partial t^2} = \nabla^2\bigl(f(t)G(x,y,z)\bigr) - \frac{\partial^2}{\partial t^2}\bigl(f(t)G(x,y,z)\bigr) = f(t)\,\nabla^2 G(x,y,z) - \left(\frac{\partial^2 f}{\partial t^2}\right)G(x,y,z).$$
Writing
$$G(x,y,z) = \bigl(G_1(x,y,z),\ G_2(x,y,z),\ G_3(x,y,z)\bigr),$$
we have
$$\frac{\nabla^2 G_i(x,y,z)}{G_i(x,y,z)} = \frac{\partial^2 f/\partial t^2}{f(t)}$$
for each $i$. Here is why this technique is called "separation of variables." The left-hand side of the preceding equation is a function of $x, y, z$ alone, while the right-hand side is a function of just the variable $t$. But this means that each side must be equal to some constant $k$. Thus we must have G and f satisfying
$$\nabla^2 G(x,y,z) = kG(x,y,z),\qquad \frac{\partial^2 f}{\partial t^2} = kf(t).$$
To solve only the wave equation, any constant $k$ will work. As seen in the preceding references, though, for the solutions to be periodic, and thus to be what we would want to consider as waves, we need $k$ to be negative; that is why we can set
$$k = -\omega^2,$$
for a constant $\omega$. Thus we want to find solutions to
$$\nabla^2 G(x,y,z) = -\omega^2 G(x,y,z),\qquad \frac{\partial^2 f}{\partial t^2} = -\omega^2 f(t).$$
The solution $G(x,y,z)$ is a vector field with $x$-coordinate
$$a\sin(k_1\omega x + k_2\omega y + k_3\omega z + \alpha),$$
$y$-coordinate
$$b\sin(k_1\omega x + k_2\omega y + k_3\omega z + \alpha),$$


and $z$-coordinate
$$c\sin(k_1\omega x + k_2\omega y + k_3\omega z + \alpha),$$
where $(k_1,k_2,k_3)$ is a vector of length one, and the phase $\alpha$ and the coefficients $a$, $b$, and $c$ are real constants. The solution $f(t)$ is of the form
$$f(t) = \sin(\omega t + \beta),$$
where the phase $\beta$ is a real constant. To ease notation, we will assume that the phases $\alpha$ and $\beta$ are both zero. Also to ease notation, we will reorient our $x, y, z$ coordinate system so that $(k_1,k_2,k_3) = (0,0,1)$. None of these assumptions fundamentally alters the underlying mathematics. We then have
$$G(x,y,z) = \bigl(a\sin(\omega z),\ b\sin(\omega z),\ c\sin(\omega z)\bigr),\qquad f(t) = \sin(\omega t).$$
We know that
$$\nabla\cdot A = 0.$$
Then
$$0 = \frac{\partial(\sin(\omega t)\,a\sin(\omega z))}{\partial x} + \frac{\partial(\sin(\omega t)\,b\sin(\omega z))}{\partial y} + \frac{\partial(\sin(\omega t)\,c\sin(\omega z))}{\partial z} = c\,\omega\sin(\omega t)\cos(\omega z),$$
forcing $c = 0$, meaning
$$A(x,y,z,t) = \bigl(a\sin(\omega t)\sin(\omega z),\ b\sin(\omega t)\sin(\omega z),\ 0\bigr).$$
Then the electric and magnetic fields are
$$E(z,t) = -\frac{\partial A}{\partial t} = -\frac{\partial f(t)}{\partial t}G(x,y,z) = -\omega\cos(\omega t)\,G(x,y,z) = \bigl(-a\omega\cos(\omega t)\sin(\omega z),\ -b\omega\cos(\omega t)\sin(\omega z),\ 0\bigr),$$
$$B(z,t) = \nabla\times A = f(t)\,\nabla\times G(x,y,z) = f(t)\,\nabla\times\bigl(a\sin(\omega z),\ b\sin(\omega z),\ 0\bigr) = \bigl(-b\omega\sin(\omega t)\cos(\omega z),\ a\omega\sin(\omega t)\cos(\omega z),\ 0\bigr).$$
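As a quick symbolic check (not in the text), one can verify that this A really is a Coulomb potential and that the resulting E and B satisfy the free-field Maxwell equations. The symbols a, b, ω below are the constants from this section, and the use of sympy is purely illustrative.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
a, b, w = sp.symbols('a b omega')

A = sp.Matrix([a*sp.sin(w*t)*sp.sin(w*z), b*sp.sin(w*t)*sp.sin(w*z), 0])

def div(F):
    return F[0].diff(x) + F[1].diff(y) + F[2].diff(z)

def curl(F):
    return sp.Matrix([F[2].diff(y) - F[1].diff(z),
                      F[0].diff(z) - F[2].diff(x),
                      F[1].diff(x) - F[0].diff(y)])

E = -A.diff(t)          # E = -dA/dt, since phi = 0
B = curl(A)             # B = curl A

print(sp.simplify(div(A)),                    # Coulomb gauge: div A = 0
      sp.simplify(div(E)),                    # div E = 0
      sp.simplify(div(B)),                    # div B = 0
      sp.simplify(curl(E) + B.diff(t)),       # curl E = -dB/dt
      sp.simplify(curl(B) - E.diff(t)))       # curl B = dE/dt   (speed of light = 1)
```

All five expressions simplify to zero, confirming that the monochromatic solution above is a genuine free-field electromagnetic wave.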


15.3. The “Hidden” Harmonic Oscillator

We are close to our goal of showing that the Hamiltonian for a monochromatic light wave has the same form as a harmonic oscillator's Hamiltonian.

For a physical wave, we know that the Hamiltonian (the energy) is proportional to the square of the wave's amplitude. We want to make the claim that the energy density for an electromagnetic wave is
$$E\cdot E + B\cdot B,$$
in analogy to the square of the amplitude of a traditional wave. Here is the intuitive justification. (Of course, this formula for the energy of an electromagnetic wave can be checked experimentally.) An electromagnetic wave is made up of two separate waves, namely, the electric wave and the magnetic wave. The energy density of the electric wave will be the square $E\cdot E$ of its amplitude E, and the energy density of the magnetic wave will be the square $B\cdot B$ of its amplitude B. The total energy should simply be the sum of these two energies, giving us our desired energy density of $E\cdot E + B\cdot B$.

To find the Hamiltonian, we need to integrate this energy density over an appropriate volume V. Using the notation for A, E, and B from the previous section, we have the constant $\omega$. We choose for our volume a box with volume V whose sides are parallel to the x-y plane, x-z plane, and y-z plane, with side lengths of $\sqrt{\omega/\pi}$ in the directions of the x- and y-axes and $2\pi/\omega$ in the direction of the z-axis. The choice of $2\pi/\omega$ for the z-axis is natural, reflecting how we set up the electric and magnetic fields in the previous section. The choice of $\sqrt{\omega/\pi}$ for the x- and y-axes is a bit more arbitrary but will make some of the integrals that follow turn out cleanly.

(Figure 15.1: the volume V, a box with side lengths $\sqrt{\omega/\pi}$ along the x- and y-axes and $2\pi/\omega$ along the z-axis.)

We will use that
$$\int_0^{2\pi/\omega}\sin^2(\omega z)\,dz = \int_0^{2\pi/\omega}\cos^2(\omega z)\,dz = \frac{\pi}{\omega}.$$


Then the Hamiltonian over the volume V will be
$$\begin{aligned} H(t) &= \int_V (E\cdot E + B\cdot B)\,dx\,dy\,dz\\ &= \int_V a^2\omega^2\cos^2(\omega t)\sin^2(\omega z)\,dx\,dy\,dz + \int_V b^2\omega^2\cos^2(\omega t)\sin^2(\omega z)\,dx\,dy\,dz\\ &\quad + \int_V b^2\omega^2\sin^2(\omega t)\cos^2(\omega z)\,dx\,dy\,dz + \int_V a^2\omega^2\sin^2(\omega t)\cos^2(\omega z)\,dx\,dy\,dz\\ &= a^2\omega^2\cos^2(\omega t) + b^2\omega^2\cos^2(\omega t) + b^2\omega^2\sin^2(\omega t) + a^2\omega^2\sin^2(\omega t).\end{aligned}$$
Of course, we could rewrite this as simply $\omega^2(a^2 + b^2) = \omega^2$, since $a^2 + b^2$ was assumed to be 1, which reflects that the Hamiltonian for a monochromatic wave is a constant. We will resist this temptation, as our goal is not the cleanest formula for the classical electromagnetic Hamiltonian, but instead a formulation that looks like the Hamiltonian for a harmonic oscillator, allowing us to quantize.
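As a quick check of the integration (not in the text), the following sketch evaluates $\int_V (E\cdot E + B\cdot B)\,dx\,dy\,dz$ over the box described above; the symbols a, b, ω are the constants of this section, and sympy is used purely for illustration.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
a, b, w = sp.symbols('a b omega', positive=True)

E = sp.Matrix([-a*w*sp.cos(w*t)*sp.sin(w*z), -b*w*sp.cos(w*t)*sp.sin(w*z), 0])
B = sp.Matrix([-b*w*sp.sin(w*t)*sp.cos(w*z),  a*w*sp.sin(w*t)*sp.cos(w*z), 0])

density = E.dot(E) + B.dot(B)

# Box with sides sqrt(omega/pi), sqrt(omega/pi), 2*pi/omega.
side = sp.sqrt(w / sp.pi)
H = sp.integrate(density, (x, 0, side), (y, 0, side), (z, 0, 2*sp.pi/w))
print(sp.simplify(H))   # omega**2 * (a**2 + b**2), a constant in t
```

The box dimensions were chosen so that the cross-sectional area $\omega/\pi$ exactly cancels the $\pi/\omega$ coming from the z-integrals, which is why the answer has no stray factors.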

Still following the outline in [41], we set
$$\alpha(t) = e^{-i\omega t},$$
giving us that
$$\sin(\omega t) = \frac{i}{2}\bigl(\alpha(t) - \overline{\alpha(t)}\bigr),\qquad \cos(\omega t) = \frac{1}{2}\bigl(\alpha(t) + \overline{\alpha(t)}\bigr).$$
Then set
$$q(t) = \frac{i}{2}\bigl(\alpha(t) - \overline{\alpha(t)}\bigr),\qquad p(t) = \frac{\omega}{2}\bigl(\alpha(t) + \overline{\alpha(t)}\bigr).$$
Then the Hamiltonian is
$$H(t) = p^2 + \omega^2 q^2.$$


Here is why we made these seemingly random substitutions. Recall that the Hamiltonian for a harmonic oscillator is
$$H_{\text{harmonic oscillator}} = \frac{1}{2m}p^2 + \frac{m\omega^2}{2}x^2,$$
where now $x$ is the position and $p$ is the momentum. For the electromagnetic Hamiltonian, the function $q$ plays the role of position and the function $p$ the role of the momentum. In order for this analogy to hold, note that
$$\frac{dq}{dt} = p,\qquad \frac{dp}{dt} = -\omega^2 q,$$
in direct analog to the role that the variables $x$ and $p$ play in the Hamiltonian for the quantum harmonic oscillator.

Thus, mathematically, the Hamiltonian for a monochromatic electromagnetic wave has the same form as the Hamiltonian for a harmonic oscillator. This result will allow us easily to quantize Maxwell's equations in the next section.
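Here is a small symbolic check (not from the text, with sympy used purely for illustration) that the substitutions above really do satisfy $dq/dt = p$ and $dp/dt = -\omega^2 q$, and that $H = p^2 + \omega^2 q^2$ is the constant $\omega^2$.

```python
import sympy as sp

t, w = sp.symbols('t omega', real=True, positive=True)

alpha = sp.exp(-sp.I * w * t)                  # alpha(t) = e^{-i omega t}
q = sp.I / 2 * (alpha - sp.conjugate(alpha))   # q(t) = sin(omega t)
p = w / 2 * (alpha + sp.conjugate(alpha))      # p(t) = omega cos(omega t)

print(sp.simplify(q.diff(t) - p))              # 0, so dq/dt = p
print(sp.simplify(p.diff(t) + w**2 * q))       # 0, so dp/dt = -omega^2 q
print(sp.simplify(p**2 + w**2 * q**2))         # omega**2, constant in t
```

This is exactly Hamilton's equations for a unit-mass oscillator of frequency ω, which is what lets us reuse the quantization of Chapter 14.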

15.4. Quantization of Maxwell’s Equations

We are finally ready to quantize the electromagnetic fields. We have just seen that the Hamiltonian for the electromagnetic field can be put into the same form as that of the Hamiltonian for a harmonic oscillator. From the last section, the Hamiltonian for a monochromatic electromagnetic wave is
$$H(t) = p^2 + \omega^2 q^2,$$
where $q(t) = \frac{i}{2}(\alpha(t) - \overline{\alpha(t)})$ and $p(t) = \frac{\omega}{2}(\alpha(t) + \overline{\alpha(t)})$. To quantize, motivated by our work for the harmonic oscillator, we replace the continuous variable $\alpha(t)$ with an appropriate operator.

Let $a(t)$ be an operator with adjoint $a^*(t)$ such that
$$[a, a^*] = 1.$$
Replace the continuous variable $\alpha(t)$ by the operator
$$\sqrt{\frac{\hbar}{\omega}}\,a(t),$$
and, hence, replace $\overline{\alpha(t)}$ by the operator
$$\sqrt{\frac{\hbar}{\omega}}\,a^*(t).$$


We will still call $a^*(t)$ and $a(t)$ the creation and annihilation operators. Then we replace our continuous function $q(t)$ by the operator (still denoted as $q(t)$)
$$q(t) = \frac{i}{2}\sqrt{\frac{\hbar}{\omega}}\,\bigl(a(t) - a^*(t)\bigr)$$
and the continuous function $p(t)$ by the operator (still denoted as $p(t)$)
$$p(t) = \frac{\omega}{2}\sqrt{\frac{\hbar}{\omega}}\,\bigl(a(t) + a^*(t)\bigr).$$
Then, using that $[a, a^*] = 1$ implies $aa^* = a^*a + 1$, the quantized version of the Hamiltonian for the electromagnetic field is
$$H = \hbar\omega\left(a^*a + \frac{1}{2}\right).$$
We know that the eigenvalues of the Hamiltonian are of the form
$$E_n = \left(n + \frac{1}{2}\right)\hbar\omega.$$
We can interpret the integer $n$ as the number of photons of frequency $\omega$. Note that, even when $n = 0$, there is still a non-zero energy. The big news is that our formulation has led to the prediction of photons, allowing us at last to start explaining the photoelectric effect.

We now want to quantize the vector potential $A(x,y,z,t)$, the electric field $E(x,y,z,t)$, and the magnetic field $B(x,y,z,t)$. This procedure is straightforward. We just replace each occurrence of $\alpha(t)$ with the operator $\sqrt{\hbar/\omega}\,a(t)$ and each $\overline{\alpha(t)}$ with the operator $\sqrt{\hbar/\omega}\,a^*(t)$. In the classical case we have
$$A(x,y,z,t) = \sin(\omega t)\,G(x,y,z) = \frac{i}{2}\bigl(\alpha(t) - \overline{\alpha(t)}\bigr)G(x,y,z),$$
where G is a vector field. Then the quantized version is
$$A = \frac{i}{2}\left(\sqrt{\frac{\hbar}{\omega}}\,a(t) - \sqrt{\frac{\hbar}{\omega}}\,a^*(t)\right)G(x,y,z),$$
with G still a vector field. Similarly, as we saw in Section 15.2, the classical E and B are
$$E(x,y,z,t) = -\omega\cos(\omega t)\,G(x,y,z) = -\frac{\omega}{2}\bigl(\alpha(t) + \overline{\alpha(t)}\bigr)G(x,y,z)$$
and
$$B(x,y,z,t) = \sin(\omega t)\,\bigl(\nabla\times G(x,y,z)\bigr) = \frac{i}{2}\bigl(\alpha(t) - \overline{\alpha(t)}\bigr)\bigl(\nabla\times G(x,y,z)\bigr).$$


Then the quantized versions must be
$$E = -\frac{\omega}{2}\left(\sqrt{\frac{\hbar}{\omega}}\,a(t) + \sqrt{\frac{\hbar}{\omega}}\,a^*(t)\right)G(x,y,z)$$
and
$$B = \frac{i}{2}\left(\sqrt{\frac{\hbar}{\omega}}\,a(t) - \sqrt{\frac{\hbar}{\omega}}\,a^*(t)\right)\bigl(\nabla\times G(x,y,z)\bigr).$$

15.5. Exercises

Exercise 15.5.1. Let $F(x)$ be a function of $x$ and $G(t)$ be a function of a different variable $t$. Suppose for all $x$ and $t$ that
$$F(x) = G(t).$$
Show that there is a constant $\alpha$ such that for all $x$, $F(x) = \alpha$, and for all $t$, $G(t) = \alpha$.

Exercise 15.5.2. Suppose that a solution $f(x,t)$ to the partial differential equation
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial t^2}$$
can be written as the product of two one-variable functions $X(x)$ and $T(t)$:
$$f(x,t) = X(x)T(t).$$
Using the first problem, show that there is a constant $\alpha$ such that $X(x)$ satisfies the ordinary differential equation
$$\frac{d^2X(x)}{dx^2} = \alpha X(x)$$
and $T(t)$ satisfies the ordinary differential equation
$$\frac{d^2T(t)}{dt^2} = \alpha T(t).$$

Exercise 15.5.3. Show that the product
$$A(r,t) = \alpha(t)A_0(r)$$
satisfies the wave equation
$$\nabla^2\bigl(A(r,t)\bigr) - \frac{\partial^2\bigl(A(r,t)\bigr)}{\partial t^2} = 0,$$
if $A_0(r)$ is a vector-valued function, where $r$ denotes the position variables $(x,y,z)$, satisfying
$$\nabla^2 A_0(r) + k^2 A_0(r) = 0,$$
and $\alpha(t)$ is a real-valued function satisfying
$$\frac{d^2\alpha(t)}{dt^2} = -k^2\alpha(t),$$
and $k$ is any constant.

Exercise 15.5.4. Show that the vector field
$$E(x,y,z,t) = \bigl(E_1(x,y,z,t),\ E_2(x,y,z,t),\ E_3(x,y,z,t)\bigr),$$
where
$$E_i(x,y,z,t) = E_i\sin(k_1\omega x + k_2\omega y + k_3\omega z - \omega t),$$
with each $E_i$ a constant, the vector $(k_1,k_2,k_3)$ having length one and $\omega$ being a constant, satisfies the wave equation
$$\nabla^2 E - \frac{\partial^2 E}{\partial t^2} = 0.$$

Exercise 15.5.5. Let
$$E = \bigl(E_1\sin(\omega z - \omega t),\ E_2\sin(\omega z - \omega t),\ 0\bigr)$$
and
$$B = \bigl(E_2\cos(\omega z - \omega t),\ -E_1\cos(\omega z - \omega t),\ 0\bigr).$$
Show that
$$E\cdot E + B\cdot B = E_1^2 + E_2^2$$
and
$$E\cdot B = 0.$$

Exercise 15.5.6. Let
$$\alpha(t) = \alpha(0)e^{-i\omega t},\qquad \overline{\alpha(t)} = \overline{\alpha(0)}e^{i\omega t}$$
be solutions to $\frac{d^2\alpha(t)}{dt^2} = -\omega^2\alpha(t)$. Letting $q(t) = \frac{i}{2}(\alpha(t) - \overline{\alpha(t)})$ and $p(t) = \frac{\omega}{2}(\alpha(t) + \overline{\alpha(t)})$, show that
$$\frac{dq}{dt} = p,\qquad \frac{dp}{dt} = -\omega^2 q.$$


Exercise 15.5.7. Show that
$$\int_0^{2\pi/\omega}\sin^2(\omega z)\,dz = \int_0^{2\pi/\omega}\cos^2(\omega z)\,dz = \frac{\pi}{\omega}.$$

Exercise 15.5.8. Using the notation from Section 15.3, show that
$$\sin(\omega t) = \frac{i}{2}\bigl(\alpha(t) - \overline{\alpha(t)}\bigr)\quad\text{and}\quad \cos(\omega t) = \frac{1}{2}\bigl(\alpha(t) + \overline{\alpha(t)}\bigr).$$

Exercise 15.5.9. Using the notation of Section 15.3, show that
$$H(t) = p^2 + \omega^2 q^2.$$

Exercise 15.5.10. Using the notation from Section 15.4, show that
$$[a(t), a^*(t)] = 1$$
implies
$$[q(t), p(t)] = i\hbar.$$

Exercise 15.5.11. Let $A : V\to V$ be any linear operator from a Hilbert space $V$ to itself. Show that
$$A + A^*$$
must be Hermitian. (You may assume that $A^{**} = A$, as is indeed always the case.)

Exercise 15.5.12. Let $A : V\to V$ be any linear operator from a Hilbert space $V$ to itself. Show that
$$i(A - A^*)$$
must be Hermitian.

Exercise 15.5.13. Using the notation from Section 15.4, show that $q(t)$, $p(t)$, and $H(t)$ are all Hermitian.

Exercise 15.5.14. Using the notation from Section 15.4, show that the operator corresponding to the classical electric field,
$$E = -\frac{\omega}{2}\left(\sqrt{\frac{\hbar}{\omega}}\,a(t) + \sqrt{\frac{\hbar}{\omega}}\,a^*(t)\right)G(x,y,z),$$
is Hermitian.


Exercise 15.5.15. Using the notation from Section 15.4, show that the operator corresponding to the classical magnetic field,
$$B = \frac{i}{2}\left(\sqrt{\frac{\hbar}{\omega}}\,a(t) - \sqrt{\frac{\hbar}{\omega}}\,a^*(t)\right)\nabla\times G(x,y,z),$$
is Hermitian.


16

Manifolds

Summary: The goal of this chapter is to introduce manifolds, which are key objects for geometry. We will give three different ways for defining manifolds: parametrically, implicitly, and abstractly.

16.1. Introduction to Manifolds

16.1.1. Force = Curvature

The goal for the rest of this book is to understand the idea that

Force = Curvature.

This idea is one of the great insights in recent science. In this chapterwe introduce manifolds, which are basically the way to think about decentgeometric objects. In the next chapter we will introduce vector bundles onmanifolds. In the following chapter, we will introduce connections, whichare fancy ways of taking derivatives. In Chapter 19, we will define differentnotions for curvature. Finally, in Chapter 20, in the context of Maxwell’sequations, we will see how “Force = Curvature.”

16.1.2. Intuitions behind Manifolds

(This section is really the same as the first few paragraphs of section 6.4 in[26].)

While manifolds are, to some extent, some of the most naturally occurringgeometric objects, it takes work and care to create correct definitions. Inessence, a k-dimensional manifold is any space that, in a neighborhood of anypoint, looks like a ball in R

k . We will be concerned at first with manifolds thatlive in some ambient Rn. For this type of manifold, we give two equivalent


definitions: the parametric version and the implicit version. For each of theseversions, we will carefully show that the unit circle S1

(Figure 16.1: the unit circle of radius 1)

in R2 is a one-dimensional manifold. (Of course, if we were just interested in

circles we would not need all of these definitions; we are just using the circleto get a feel for the correctness of the definitions.) Then we will define anabstract manifold, a type of geometric object that need not be defined in termsof some ambient Rn.

Consider the circle S1. Near any point p ∈ S1 the circle looks like an interval(admittedly a bent interval). In a similar fashion, we want our definitions toyield that the unit sphere S2 in R3 is a two-dimensional manifold, since nearany point p ∈ S2,

z

x

y

Figure 16.2

the sphere looks like a disc (though, again, more like a bent disc). We want toexclude from our definition of manifold objects that contain points for whichthere is no well-defined notion of a tangent space, such as the cone:

p

Figure 16.3

which has tangent difficulties at the vertex p.

Page 218: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

16.2 Manifolds Embedded in Rn 203

16.2. Manifolds Embedded in Rn

16.2.1. Parametric Manifolds

(This is an expansion of Section 9.2.)Let M be a set of points in R

n . Intuitively we want to say that M is a k-dimensional manifold, if, for any point p on M , M looks, up close and near p,like Rk .

Before giving the definition, we need some notation. Let x = (x1, x2, . . . , xn)be the coordinates for Rn and let u = (u1, . . . ,uk) be coordinates in R

k . Letk ≤ n and let U be an open ball in R

k centered at the origin.A map

γ : U →Rn

is given by specifying n functions x1(u), . . . , xn(u) such that

γ (u) = γ (u1,u2, . . . ,uk)

= (x1(u1, . . . ,uk), . . . , xn(u1, . . . ,uk))

= (x1(u), . . . , xn(u)).

The map γ is said to be continuous if the functions x1(u), . . . , xn(u) arecontinuous, differentiable if the functions are differentiable, and so on. Wewill assume that our map γ is differentiable. The Jacobian of the map γ is then × k matrix

J (γ ) =

∂x1∂u1

· · · ∂x1∂uk

...∂xn∂u1

· · · ∂xn∂uk

.

We say that the rank of the Jacobian is k if the preceding matrix has a rankk × k submatrix or, in other words, if one of the k × k minors is an invertiblematrix. We can now define (again) a k-dimensional manifold in Rn .

Definition 16.2.1. A set of points M in Rn is a k-dimensional manifold if for

every point p ∈ M, there exist a small open ball V in Rn, centered at p, and adifferentiable map

γ : U →Rn ,

with U an open ball in Rk , centered at the origin, such that

1. γ (0, . . . ,0) = p,2. γ is a one-to-one onto map from U to M ∩ V ,3. The rank of the Jacobian for γ is k at every point.

Page 219: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

204 16 Manifolds

Let us look at an example: We show that the circle S1 is indeed a one-dimensional manifold via the preceding definition, for the point p = (1,0).

(1, 0) = p

Figure 16.4

Our map will be

γ (t) = (√

1 − t2, t).

Note that

γ (0) = (1,0).

Let our open “ball” U be the open interval {t : −1/2 < t < 1/2}. To find thecorresponding open ball V in R2, let the radius of V be

r =

√√√√(√3

2− 1

)2

+ 1

4.

(This r is just the distance from the point γ (1/2) = (√

3/2,1/2) to γ (0) =(1,0). )

V

radius r

Figure 16.5

It is not hard to show that γ is indeed a one-to-one onto map from U to S1 ∩V .We have the Jacobian being

J (γ ) =(

∂√

1 − t2

∂ t,∂ t

∂ t

)=( −t√

1 − t2,1

).

Though the function −t/√

1 − t2 is not defined at t = ±1, this is no problem,as we chose our open interval U to be strictly smaller than [ − 1,1]. Since thesecond coordinate of the Jacobian is the constant function 1, we know that theJacobian has rank one.

Page 220: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

16.2 Manifolds Embedded in Rn 205

To show that the circle is a one-dimensional manifold at the point (0,1), wewould need to use a different parameterizing map, such as

γ (t) = (t ,√

1 − t2).

16.2.2. Implicitly Defined Manifolds

(This is repeating much of section 6.4 in [26].)We could have also described the circle S1 as

S1 = {(x , y) : x2 + y2 − 1 = 0}.

If we set

ρ(x , y) = x2 + y2 − 1,

then

S1 = {(x , y) : ρ(x , y) = 0}.We would then say that the circle S1 is the zero locus of the function ρ(x , y).

This suggests a totally different way for defining manifolds, namely, as zeroloci of a set of functions on R

n.

Definition 16.2.2 (Implicit Manifolds). A set M in Rn is a k-dimensional

manifold if for any point p ∈ M there is an open set U containing p and(n − k) differentiable functions ρ1, . . . ,ρn−k such that

1. M ∩U = (ρ1 = 0) ∩ ·· · ∩ (ρn−k = 0),2. At all points in M ∩U, the gradient vectors

∇ρ1, . . . ,∇ρn−k

are linearly independent.

(From multivariable calculus, the vectors ∇ρ1, . . . ,∇ρn−k are all normalvectors.) Intuitively, each function ρi cuts down the degree of freedom by one.If we have n − k functions on Rn , then we are left with k degrees of freedom,giving some sense as to why this should also be a definition for k-dimensionalmanifolds.

Returning to the circle, for ρ = x2 + y2 − 1 we have

∇(x2 + y2 − 1) = (2x ,2y),

which is never the zero vector at any point on the circle.

Page 221: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

206 16 Manifolds

16.3. Abstract Manifolds

16.3.1. Definition

It took mathematicians many years to develop the definition of, or even torealize the need for, abstract manifolds. Here we want to have a geometricobject that is independent of being defined in an ambient Rn .

We will be following closely the definition of a manifold given in section1.1 of [49].

Definition 16.3.1. A set of points M is a smooth n-dimensional manifoldif there are a countable collection U = {Uα} of subsets of M, called thecoordinate charts; a countable collection V = {Vα} of connected open subsetsof Rn; and one-to-one onto maps χα : Vα → Uα , such that

1. The coordinate charts Uα cover M, meaning that⋃Uα = M .

2. If for two coordinate charts Uα and Uβ we have non-empty intersection

Uα ∩Uβ �= ∅,

then the map

χ−1β ◦χα : χ−1

α (Uα ∩Uβ) → χ−1β (Uα ∩Uβ )

is differentiable.3. Let x ∈ Uα and y ∈ Uβ be two distinct points on M. Then there exist open

subsets W ⊂ Vα and W ′ ⊂ Vβ with

χ−1α (x) ∈ W and χ−1

β (y) ∈ W ′

andχα(W ) ∩χβ(W ′) = ∅.

We say that (U,V,{χα}) defines a manifold structure on M.

By only requiring that the various maps χ−1β ◦ χα be continuous, we have

the definition for a topological manifold. Similarly, if the maps χ−1β ◦ χα are

real-analytic, we have real-analytic manifolds, and so on.We now have to unravel the meaning of this definition. Underlying this

definition is that we know all about maps F : Rn →Rn. Such F are defined by

settingF(x1, . . . , xn) = ( f1(x1, . . . , xn), . . . , fn(x1, . . . , xn)),

where each fi : Rn → R. The map F is differentiable if all first-orderderivatives for each fi exist.


We want to think of each coordinate chart Uα as “carrying” part of Rn .

Mχα

VαUα

Figure 16.6

The condition⋃

Uα = M is simply stating that each point of M is in at leastone of the coordinate charts.

Condition 2 is key. We have

M

°

∩Uβ

Uα Uβ

Vα Vβ

χ− 1α (Uα ∩Uβ )

χα

χ− 1β (Uα ∩Uβ )

χ− 1β χα

χβ

Figure 16.7

Then each χ−1β ◦ χα is a map from Rn to Rn . We can describe these maps’

differentiability properties.The third condition states that we can separate points in M . For those who

know topology, this condition is simply giving us that the set M is Hausdorff.

MUα

Vα Vβ

χα χβ

W W ′

χα (W )

χβ (W ′)

Figure 16.8


Let us look at a simple example, the unit circle S1 = {(x , y) : x2 + y2 = 1}.We set

U1 = {(x , y) ∈ S1 : x > 0}

U1

Figure 16.9

U2 = {(x , y) ∈ S1 : y > 0}

U2

Figure 16.10

U3 = {(x , y) ∈ S1 : x < 0}

U3

Figure 16.11

U4 = {(x , y) ∈ S1 : y < 0}

U4

Figure 16.12

For all four charts, we have

V1 = V2 = V3 = V4 = {t ∈ R : −1 < t < 1}.We set


U1

− 1 0 1

V1

χ1(t) = ( 1 − t2 , t )

Figure 16.13

Then

χ−11 (x , y) = y.

− 1 0 1

V2

χ2(t) = (t, 1 − t2) U2

Figure 16.14

χ−12 (x , y) = x .

− 1 0 1

χ3(t) = (− 1 − t2 , t )V3

U3

Figure 16.15

χ−13 (x , y) = y.

− 1 0 1U4

V4

χ4(t) = (t, − 1 − t2)

Figure 16.16

χ−14 (x , y) = x .

Let us look at the overlaps. We have that

U1 ∩U2 = {(x , y) ∈ S1 : x > 0, y > 0}


U1 U∩ 2

Figure 16.17

with
$$\chi_1^{-1}(U_1\cap U_2) = \{t\in\mathbb{R} : 0 < t < 1\} = \chi_2^{-1}(U_1\cap U_2).$$
Then
$$\chi_2^{-1}\circ\chi_1(t) = \chi_2^{-1}\bigl(\sqrt{1-t^2},\ t\bigr) = \sqrt{1-t^2}.$$
Similarly, we get
$$\chi_3^{-1}\circ\chi_2(t) = \chi_3^{-1}\bigl(t,\ \sqrt{1-t^2}\bigr) = \sqrt{1-t^2},$$
$$\chi_4^{-1}\circ\chi_3(t) = \chi_4^{-1}\bigl(-\sqrt{1-t^2},\ t\bigr) = -\sqrt{1-t^2},$$
$$\chi_1^{-1}\circ\chi_4(t) = \chi_1^{-1}\bigl(t,\ -\sqrt{1-t^2}\bigr) = -\sqrt{1-t^2}.$$
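Before moving on, here is a quick numerical check of one of these overlap maps (a sketch, not from the text; the chart formulas are exactly the χ_i defined above):

```python
import numpy as np

chi1 = lambda t: (np.sqrt(1 - t**2), t)          # right half of the circle
chi2 = lambda t: (t, np.sqrt(1 - t**2))          # top half of the circle
chi1_inv = lambda p: p[1]                        # recover t = y on U1
chi2_inv = lambda p: p[0]                        # recover t = x on U2

for t in np.linspace(0.05, 0.95, 5):             # chi1^{-1}(U1 ∩ U2) = (0, 1)
    lhs = chi2_inv(chi1(t))                      # chi2^{-1} o chi1
    assert abs(lhs - np.sqrt(1 - t**2)) < 1e-12  # equals sqrt(1 - t^2), as computed above
print("chi2^{-1} o chi1 (t) = sqrt(1 - t^2) on the overlap")
```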

All of these maps are infinitely differentiable real-valued maps.

Now for a technicality. The preceding depends on choosing coordinate charts. For example, we could have started with our circle S1 and chosen coordinate charts

U ′1 =

{(x , y) ∈ S2 : −

√3

2< x <

√3

2, y > 0

}

U ′1

Figure 16.18

U ′2 =

{(x , y) ∈ S2 : −

√3

2< y <

√3

2, x < 0

}

U ′2


Figure 16.19

U ′3 =

{(x , y) ∈ S2 : −

√3

2< x <

√3

2, y < 0

}

U ′3

Figure 16.20

U ′4 =

{(x , y) ∈ S2 : −

√3

2< y <

√3

2, x > 0

}.

U ′4

Figure 16.21

Surely we want to say that this is the same manifold. We need to be ablesomehow to compare different coordinate charts for the same set of pointsM . Let U = {Uα} be a collection of coordinate charts on M , with V = {Vα}the corresponding connected open sets in R

n and χα : Vα → Uα the maps thatmake M into a manifold. Now consider another collection of coordinate charts,denoted by U′ = {U ′

µ}, with corresponding connected open sets V= {V ′µ} in Rn

and maps χ ′µ : U ′

µ → U ′µ. We want to know when (U,V, {χα}) and (U′,V′, {χ ′

µ})define the “same” manifold.

Definition 16.3.2. We say that (U,V, {χα}) and (U′,V′, {χ ′µ}) are compatible if

the mapsχ ′−1

µ ◦χα : χ−1α (Uα ∩U ′

µ) → χ ′−1µ (Uα ∩U ′

µ)

are differentiable, in which case we say they define the same manifold structureon M.

Finally, this definition for a manifold allows us to have a natural notion ofopen sets.

Definition 16.3.3. A subset U of a manifold M is open if for all α the sets

χα(U ∩Uα)


are open sets of Rn.

16.3.2. Functions on a Manifold

We can now talk about what it means for a function to be differentiable ona manifold. Again, we will reduce the definition to a statement about thedifferentiability of a function from R

n to R.

Definition 16.3.4. A real-valued function f on a manifold M is differentiableif, for an open cover (Uα) and maps φα : Open ball in Rn → Uα , thecomposition functions

f ◦φα : Open ball in Rn →R

are differentiable, for all α.

16.4. Exercises

Exercise 16.4.1. In R3, the unit sphere S2 is the set of points distance one fromthe origin and thus the points (x , y, z) ∈R

3 such that

x2 + y2 + z2 = 1.

Using the definition of implicitly defined manifold, show that S2 is a manifold.

Exercise 16.4.2. Using the definition for parametrically defined manifolds,show that S2 is a manifold.

Exercise 16.4.3. Using the definition for abstractly defined manifolds, showthat S2 is a manifold.

Exercise 16.4.4. Show that

{(x , y, z) ∈R3 : x2 + y2 + z2 = 1, x + y + z = 0}

is an implicit manifold of dimension one. What type of geometric object is it?

Exercise 16.4.5. Show that {(x , y, z) ∈ R3 : x2 + y2 + z2 = 1, x + y + z = 0}can also be proven to be a parametrically defined dimension one manifold.

Exercise 16.4.6. Show that z2 = x2 + y2 is not a manifold at the origin.

Exercise 16.4.7. Show that in R2, the zero set

x2 − y2 = 0

is not a manifold.


Exercise 16.4.8. Let M be an abstract manifold with covering {Uα}. Let Uand U ′ be two open sets on M. Prove that U ∪U ′ and U ∩U ′ are also open.

Exercise 16.4.9. Show that on the unit circle

S1 = {(x , y) ∈R2 : x2 + y2 = 1},

the functionf (x , y) = x

is differentiable. (Here you need to think in terms of coordinate charts.)

Exercise 16.4.10. Show that on the unit circle S1 the function

f (x , y) = |x |is continuous but not differentiable at two points.


17

Vector Bundles

Summary: The basics of vector bundles are given in this chapter. As we willsee in the next two chapters, vector bundles are needed to understand curvatureof manifolds correctly. Further, we will eventually see that vector bundles areneeded to generalize Maxwell’s equations to other forces.

17.1. Intuitions

Picture a surface in space.

Figure 17.1

In the previous chapter, we saw methods for describing manifolds. But nowconsider the surface in Figure 17.2.

Figure 17.2 (see Plate 1 for color version)

How can we account for the color? One method would be still to describethe surface as a manifold, but add the following type of extra information. At

214


each point of the surface, attach a color to the point. Imagine a color wheel(Figure 17.3).

Figure 17.3 (see Plate 2 for color version)

Then corresponding to each point of the surface is a point on the color wheel.We can now describe our colorful surface by points in the product of

Colorless surface × Color Wheel.

This happens all the time. You have a geometric object that has extrainformation attached to each point. For example, we can imaging a spinningtop moving along a curve in space

Figure 17.4

Here we can attach to each point of the curve the direction about which the topis spinning.

This is the beginnings of the theory of fiber bundles. You start with amanifold and attach to each point extra information. All the possible valuesfor the extra information are called the fiber over the point. For our initialcolor example, the fiber would be the entire color wheel.

For us, the extra information will be a vector space. We want somehow toattach, at each point of our manifold, a vector space, and to have these attachedvector spaces vary continuously as we vary the point on the manifold.


17.2. Technical Definitions

17.2.1. The Vector Space Rk

In this book, the vector spaces making up our vector bundles will all be various real vector spaces $\mathbb{R}^k$. Here we just want to fix some notation. For us, $\mathbb{R}^k$ will be column vectors:
$$\mathbb{R}^k = \left\{\begin{pmatrix} x_1\\ \vdots\\ x_k\end{pmatrix} : x_i\in\mathbb{R}\right\}.$$
The natural maps from the vector space $\mathbb{R}^k$ to itself are given by matrix multiplications of $k\times k$ matrices times the column vectors of $\mathbb{R}^k$. Thus, given a $k\times k$ matrix
$$g = \begin{pmatrix} g_{11} & \cdots & g_{1k}\\ \vdots & & \vdots\\ g_{k1} & \cdots & g_{kk}\end{pmatrix},$$
we have the map
$$g : \mathbb{R}^k\to\mathbb{R}^k$$
given by
$$gx = \begin{pmatrix} g_{11} & \cdots & g_{1k}\\ \vdots & & \vdots\\ g_{k1} & \cdots & g_{kk}\end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_k\end{pmatrix},$$
for $x\in\mathbb{R}^k$. The map given by $g$ will be one-to-one and onto if and only if the

matrix g is invertible. Such matrices make up the general linear group

GL(k,R) = {invertible k × k matrices}.

17.2.2. Definition of a Vector Bundle

We will start with the definition for a vector bundle. We want to capture theidea of attaching at each point of a manifold a vector space. We follow chapter1, section 2 of Wells’s Differential Analysis on Complex Manifolds [68].

Definition 17.2.1. A real topological vector bundle of rank k over a manifoldM is a manifold E and a continuous onto map

π : E → M

such that


1. For each point x ∈ M the inverse image of π is a rank k real vector space(i.e., π−1(x) is a rank k vector space).

2. For each x ∈ M, there is an open neighborhood U of x such that there is ahomeomorphism

h : π−1U → U ×Rk

such that h restricted to the inverse image of x is a vector spaceisomorphism onto Rk .

A vector bundle is an example of a fiber bundle. The inverse image over apoint x ∈ M is called the fiber and is denoted by Ex . By condition (1), the fiberEx is a real vector space of dimension k. The underlying manifold M is calledthe base space.

We want to calculate on vector bundles and thus need to set up a languagefor local coordinates.

Here is another way of thinking of vector bundles. Start with our basemanifold M . Cover M by a collection of open sets Uα. On each open set Uα ,consider the product space

Uα ×Rk .

To form our vector bundle E , we want to glue, or patch, together the variousUα ×R

k on any intersection Uα ∩ Uβ . To fix notation, for each α, label themap h : π−1Uα → Uα ×R

k as hα and the inverse map as

h−1α : Uα ×R

k → π−1Uα .

On the intersection of Uα ∩Uβ , define

gαβ = hβ ◦ h−1α .

Then, on the intersection, we have

gαβ = Identity on Uα ∩Uβ × (invertible k × k matrix)

= Identity on Uα ∩Uβ × (element of GL(k,R)).

The gαβ are called transition functions. Often we will identify eachtransition function gαβ with its matrix. One can show, though we will not,the following:

Theorem 17.2.1. For a vector bundle E, the transition functions satisfy

1. For all (x ,v) ∈ Uα ×Rk ,

gαα(x ,v) = (x ,v).


2. For all x ∈ Uα ∩Uβ ∩Uγ , we have

gαβ · gβγ = gαγ .

(This last condition is known as the co-cycle condition; this and its analogscome up a lot in mathematics.)
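Since these transition matrices will be used constantly later, here is a tiny illustration of the co-cycle condition (not from the text): if the g's come from changes of frame, then composing two of them gives the third automatically. The frame matrices and the index convention below are made-up data chosen only to show the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are frame matrices (columns = frame vectors) of three framings
# over a common point x in U_alpha ∩ U_beta ∩ U_gamma.
F_alpha, F_beta, F_gamma = (rng.normal(size=(3, 3)) for _ in range(3))

# g_ab rewrites alpha-frame coordinates as beta-frame coordinates, and so on.
g_ab = np.linalg.solve(F_beta, F_alpha)
g_bc = np.linalg.solve(F_gamma, F_beta)
g_ac = np.linalg.solve(F_gamma, F_alpha)

print(np.allclose(g_bc @ g_ab, g_ac))   # True: composing two changes of frame gives the third
```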

Suppose we have a vector bundle E . On an open set U of the base manifoldM , suppose we have a map

s : M → E

such that

π ◦ s = Identity on M .

The map s is called a section of E . On an open set U in M , we have aframing of E if we can find sections s1, . . . ,sk such that for all x ∈ U , thevectors s1(x), . . . ,sk(x) are linearly independent. Then the map

h−1 : U ×Rk → π−1U

can be given by

h−1(x , (a1, . . . ,ak)) =∑

ai si (x).

Suppose we have an open set Uα with a framing sα1 , . . . ,sα

k with correspond-ing map

h−1α : Uα ×R

k → π−1Uα

and have an open set Uβ with a framing sβ1 , . . . ,sβ

k such that for all x ∈ Uβ withcorresponding map

h−1β : Uα ×R

k → π−1Uβ .

We want to find the invertible matrix making up the transition function

gαβ = hβ ◦ h−1α : Uα ∩Uβ ×R

k → Uα ∩Uβ ×Rk .

(Keeping track of the various hα versus the inverse map h−1α can be annoying;

it is easy to get the various maps mixed up when doing a calculation.) At eachpoint x in the intersection Uα ∩Uβ , we know that each sα

j (x), as an element ofthe vector space Ex , can be written as a linear combination of the basis vectorssβ

1 (x), . . . ,sβk . Thus there are numbers ai j (depending on the point x) such that

sαj (x) =

k∑i=1

ai j sβi (x).


Consider the matrix
$$g_{\alpha\beta} = \begin{pmatrix} a_{11} & \cdots & a_{1k}\\ \vdots & & \vdots\\ a_{k1} & \cdots & a_{kk}\end{pmatrix}.$$
Then if
$$g_{\alpha\beta}\bigl(x, (a_1,\ldots,a_k)\bigr) = h_\beta\circ h_\alpha^{-1}\bigl(x, (a_1,\ldots,a_k)\bigr) = \bigl(x, (b_1,\ldots,b_k)\bigr),$$
we have
$$\begin{pmatrix} b_1\\ \vdots\\ b_k\end{pmatrix} = g_{\alpha\beta}\begin{pmatrix} a_1\\ \vdots\\ a_k\end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1k}\\ \vdots & & \vdots\\ a_{k1} & \cdots & a_{kk}\end{pmatrix}\begin{pmatrix} a_1\\ \vdots\\ a_k\end{pmatrix}.$$

In working with vector bundles, there is the constant tension of writingeverything out in terms of transition functions (which you must do frequentlyif you want to calculate anything) and thinking of the vector bundles as somesort of abstract manifold E . The reality is that one must be comfortable doingboth.

17.3. Principal Bundles

One of the difficulties in this subject is that people use markedly differentlanguages to describe what is in essence the same thing. In the last section wedefined the notion of a vector bundle. Another approach is to use principalbundles, the topic of this section. We will first motivate the developmentof principal bundles from vector bundles and only then give the technicaldefinition. We could easily have reversed the path, first defining principalbundles and then using this to motivate vector bundles.

Suppose we have a vector bundle E with base manifold M . Then for eachpoint p ∈ M , there is the vector space E p. In studying any type of mathematicalobject, one can either concentrate on the objects themselves or concentrateon the maps between these objects. In linear algebra, the objects are vectorspaces while the maps are linear transformations. For vector bundles, thissuggests that instead of considering the fiber of a point to be the vectorspace E p, possibly we should have the fiber be the linear one-to-one, ontotransformations from the vector space E p to itself. The collection of all linearone-to-one, onto transformations from a vector space to itself forms a groupcalled the automorphism group. Thus the fiber would be a group, not a vectorspace.

First, the groups will be Lie groups. A Lie group is simply a group that isalso a differentiable manifold, with the requirement that the function given by


the group operation must be smooth (i.e., for each fixed g ∈ G, the function thatsends each h ∈ G to gh must be smooth) and the function of taking inverses(taking each g ∈ G to g−1) must be smooth. For example, the group GL(n,R)of invertible matrices is a Lie group.

Definition 17.3.1. For a Lie group G, a principal G-bundle over a manifoldM is a manifold P with a continuous onto map

π : P → M

such that

1. For each point p ∈ M the inverse image of π is a copy of the group G.2. For each x ∈ M, there is an open neighborhood U of x such that there is a

homeomorphismh : π−1U → U × G

such that h restricted to the inverse image of x is a group isomorphismonto G.

In the previous section, from the definition for vector bundles weconstructed the transition functions gαβ , which were elements of GL(n,R) andhence can be used to construct a principal bundle. We could have reversed thedirection and, starting with the transition functions gαβ and principal bundles,constructed vector bundles. In this book, we will overwhelmingly take thevector bundle approach. This is just a matter of choice.

17.4. Cylinders and Möbius Strips

We look at two examples of rank one vector bundles. Start with the cylinder

E = {(x , y, z) ∈R3 : x2 + y2 = 1}.

z

x

y

Figure 17.5

Given any (x , y) with x2 + y2 = 1, we can choose any number z and beguaranteed that (x , y, z) is on the cylinder. We can think of the base manifoldM as the circle S1 where z = 0:


Base Manifold M = {(x , y) : x2 + y2 = 1} = {(cos(θ ), sin(θ )) : θ ∈R},which we can identify with

{(x , y,0) : x2 + y2 = 1} = {(cos(θ ), sin (θ ),0) : θ ∈R}.z

x

y

Base Manifold M

Figure 17.6

The fiber over any point p = ((cos(θ ), sin(θ )) ∈ M is the entire line parallel tothe z-axis.

z

x

y

Fiber over p

p

Figure 17.7

The Möbius strip looks like

Figure 17.8

The base manifold M will again be the circle, which we can identify to

Base Manifold M

Figure 17.9


Consider the circle with the open sets U1,U2 and U3.

U1 ∩ U2

U1 ∩ U3

U2 ∩ U3

U1

U2

U3

Figure 17.10

We will define the Möbius strip via transition functions $g_{12}$, $g_{13}$, and $g_{23}$. We need
$$g_{\alpha\beta} : U_\alpha\cap U_\beta\times\mathbb{R}\to U_\alpha\cap U_\beta\times\mathbb{R}.$$
Set
$$g_{12}(p,v) = (p,v),\qquad g_{13}(p,v) = (p,v),\qquad g_{23}(p,v) = (p,-v).$$
Visually, we have Figure 17.11.

(Figure 17.11: the circle covered by U1, U2, U3, with g12(p,v) = (p,v) and g13(p,v) = (p,v) on their overlaps, while g23(p,v) = (p,−v) flips the fiber coordinate.)

Note that if the last map were g23(p,v) = (p,v), we would have ended upwith the cylinder.
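To see how the three transition functions encode the global twist, here is a small sketch (not from the text) that transports a fiber value once around the circle using g12, g23, g13; flipping the sign in g23 is exactly what distinguishes the Möbius strip from the cylinder. The loop and the use of g13 in place of its (equal) inverse are simplifying assumptions for the illustration.

```python
# Transition functions for the rank-one bundle over the circle (1 x 1 "matrices").
def transitions(mobius=True):
    g12 = lambda v: v                                     # on U1 ∩ U2
    g13 = lambda v: v                                     # on U1 ∩ U3
    g23 = (lambda v: -v) if mobius else (lambda v: v)     # on U2 ∩ U3: the only difference
    return g12, g23, g13

def holonomy(mobius):
    # Carry the fiber value v = 1 once around the loop U1 -> U2 -> U3 -> U1.
    g12, g23, g13 = transitions(mobius)
    v = 1.0
    v = g12(v)    # pass from the U1-trivialization to U2
    v = g23(v)    # pass from U2 to U3
    v = g13(v)    # pass from U3 back to U1 (here g13 equals its own inverse)
    return v

print(holonomy(mobius=True), holonomy(mobius=False))   # -1.0 (Möbius twist)   1.0 (cylinder)
```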

17.5. Tangent Bundles

17.5.1. Intuitions

One of the most important vector bundles is the tangent bundle of amanifold, which is the subject of the rest of this chapter.


Intuitively, the tangent bundle T (M) of a manifold M should be the spacemade up of all tangent spaces for M at each point of M . For example, considerthe circle S1 in the plane. We want the fiber Tp(S1) of any point p ∈ S1

to be the tangent line, thought of as the real line with origin at p. ThusT(1,0)(S1) is a copy of the real line parallel to the y-axis. Similarly, T(0,1)(S1)and T(1/2,

√3/2)(S1) are (Figure 17.12):

tangent line at (1,0) tangent line at (0,1) tangent line at (1 / 2, 3/ 2)

Figure 17.12

In a similar fashion, we want the tangent bundle T (S2) for the unit sphereS2 in R

3 to be all the tangent planes of points of S2. Thus T(1,0,0)(S2) is(Figure 17.13)

z

x

y

Figure 17.13

while T(1/√

3,1/√

3,1/√

3)(S2) is (Figure 17.14)

z

x

y

Figure 17.14

For manifolds defined in some ambient Rn , all of this can be done in a fairlystraightforward way, as we will do in the next section for parametricallydefined manifolds.


But what if we have an abstract manifold, one that does not “live” in someR

n? Here, the answer is to think of tangent spaces not so much as geometricobjects as more like derivatives. Recall that one of the main goals of calculusis to find the slope of the tangent line to a curve y = f (x).

y

x

y − y0 = f ′(x 0)(x − x 0)

y = f (x)

x 0

Figure 17.15

Of course, the slope is just the derivative f ′(x), giving us that the line tangentto the curve at a point (x0, y0) is

y − y0 = f ′(x0)(x − x0).

Similarly, for a surface in R3 given as a zero set of a function f (x , y, z) =

0, we know from multivariable calculus that a normal vector at a point p =(x0, y0, z0) on the surface is

∇ f (p) =(

∂ f

∂x(p),

∂ f

∂ y(p),

∂ f

∂z(p)

),

giving us that the tangent plane at the point is

∂ f

∂x(p)(x − x0) + ∂ f

∂ y(p)(y − y0) + ∂ f

∂z(p)(z − z0) = 0.

This suggests a more intrinsic way for defining the tangent bundle, using somenotion of derivative. This we do in the last section of this chapter.

17.5.2. Tangent Bundles for Parametrically Defined Manifolds

This definition of tangent bundle will not be that hard, as we have basicallyset up the correct notation for the tangent bundle in Section 16.2.1. As before,let x = (x1, x2, . . . , xn) be the coordinates for R

n and let u = (u1, . . . ,uk) becoordinates for Rk . If M is a k-dimensional manifold living in an ambientRn , then we know that for every point p ∈ M there are an open ball V in R

n ,centered at p, and a differentiable map

γ : U →Rn ,

with U an open ball in Rk , centered at the origin, such that


1. γ (0, . . . ,0) = p.2. γ is a one-to-one onto map from U to M ∩ V .3. The rank of the Jacobian for γ is k at every point.

Each row of the Jacobian is a vector in Rn . Further, by the rank condition, weknow that the k rows are linearly independent. The tangent space Tp M is thek-dimensional vector space spanned by the rows of the Jacobian. (All of thisis done in multivariable calculus, though the usual examples are of curves inthe plane, curves in space, or surfaces in space.)

Let us look at the circle again. Unlike in Chapter 16, consider theparameterization

γ : R→ S1

given by

γ (θ ) = (cos(θ ), sin (θ )).

Then the Jacobian is

J (γ ) =(

∂x1

∂θ,∂x2

∂θ

)= ( − sin(θ ), cos(θ )).

Note that at p = (1,0), or when θ = 0, the tangent line is spanned by

( − sin(0),cos(0)) = (0,1),

(− sin(0) , cos(0)) = (0,1)

Figure 17.16

as we want.A given parameterization for a manifold need not be unique. Thus we

should now show that the vector spaces spanned by the Jacobians for twodifferent parameterizations agree at any given point. The proof, which we willnot do, is simply an application of the chain rule. This should be no surprise, asdifferent parameterizations can be thought of as different coordinate systemson the manifold.

17.5.3. T (R2) as Partial Derivatives

Let us start with the plane R2. At any point (x0, y0) ∈ R2, the tangent planeT(x0 ,y0)R

2 is just another copy of R2.


y

x

x 0

y0

Figure 17.17

But we now want to emphasize tangency as derivatives, or as rates of change.The partial derivative ∂/∂x acting on any function f (x , y), evaluated at (x0, y0),measures how fast f is changing at (x0, y0) in the (1,0) direction,

y

x

�f

�x(x0,y0) =

how fast f is changingin the (1,0) direction

x 0

y0

(1,0)

Figure 17.18

while ∂/∂ y acting on f (x , y), evaluated at (x0, y0), measures how fast f ischanging at (x0, y0) in the (0,1) direction. In general, Figure 17.19(

a∂

∂x+ b

∂ y

)f∣∣∣(x0 ,y0)

y

xx 0

y0(a, b)-direction

Figure 17.19

gives the rate of change of $f$ at $(x_0,y_0)$ in the $(a,b)$ direction. We will identify the tangent plane of $(x_0,y_0)\in\mathbb{R}^2$ as the two-dimensional vector space
$$T_{(x_0,y_0)}\mathbb{R}^2 = \left\{ a\frac{\partial}{\partial x} + b\frac{\partial}{\partial y}\Big|_{(x_0,y_0)} : a, b\in\mathbb{R}\right\}.$$

As will be shown in the exercises,

Theorem 17.5.1. For any L ∈ T(x0,y0)R2, for all functions f (x , y) and g(x , y)

and all real constants α and β, we have

L(α f +βg) = αL( f ) +βL(g)

L( f g) = f L(g) + gL( f ).

(This actually follows from some simple properties of differentiation.) Thekey for us is that we can identify the tangent plane at a point in R2 with linear


maps

L : Functions on R2 →R

that satisfy Leibniz’s rule L( f g) = f L(g) + gL( f ).
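A short symbolic check (not from the text; sympy for illustration) that an operator of the form $a\,\partial/\partial x + b\,\partial/\partial y$ is linear and satisfies Leibniz's rule, so that evaluating at a point $(x_0,y_0)$ gives an element of the tangent plane in the sense above:

```python
import sympy as sp

x, y, a, b, alpha, beta = sp.symbols('x y a b alpha beta')
f = sp.Function('f')(x, y)
g = sp.Function('g')(x, y)

L = lambda h: a * sp.diff(h, x) + b * sp.diff(h, y)   # L = a d/dx + b d/dy

linearity = sp.simplify(L(alpha*f + beta*g) - (alpha*L(f) + beta*L(g)))
leibniz   = sp.simplify(L(f*g) - (f*L(g) + g*L(f)))
print(linearity, leibniz)   # 0 0, for arbitrary smooth f and g
```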

17.5.4. Tangent Space at a Point of an Abstract Manifold

Let M be an abstract n-dimensional manifold. We want to develop the idea that the elements of the tangent space at a point p will be ways of measuring the rates of change of a function on M.

We know that M has a cover U = {U_α} with corresponding collection V = {V_α} of connected open subsets of R^n and one-to-one onto maps χ_α : V_α → U_α, such that, for any two open U_α and U_β, the map

χ_β^{-1} ∘ χ_α : χ_α^{-1}(U_α ∩ U_β) → χ_β^{-1}(U_α ∩ U_β)

is differentiable. Recall that this allowed us to define a real-valued function f on a manifold M to be differentiable if the composition function

f ∘ χ_α : Open ball in R^n → R

is differentiable as a real-valued function defined on R^n.

We want to define what it means to differentiate functions on M. (The rest of this paragraph is a slight rephrasing of section 6.5.2 in [26].) Thus we want to measure the rate of change of f at p. This should only involve the values of f near p; the values of f away from p should be irrelevant. This is the motivation behind the following equivalence relation. Let (f_1, U_1) and (f_2, U_2) denote pairs of open sets U_1, U_2 on M containing p together with corresponding differentiable functions f_1 and f_2. We will say that

(f_1, U_1) ∼ (f_2, U_2)

if, on the open set U_1 ∩ U_2, we have f_1 = f_2. This leads us to defining

C^∞_p = {(f, U)}/∼.

We will frequently abuse notation and denote an element of C^∞_p by f. The space C^∞_p is a vector space and captures the properties of functions close to the point p. (For mathematical culture's sake, C^∞_p is an example of a germ of a sheaf, in this case, the sheaf of differentiable functions.)

Definition 17.5.1. The tangent space T_p(M) is the space of all linear maps

v : C^∞_p → C^∞_p

such that

v(f g) = f v(g) + g v(f).

We say that the rate of change of a function f at a point p ∈ M in the "direction" v ∈ T_p(M) is the value of the function v(f) at p.

Showing that T_p(M) is a vector space of dimension n is one of the exercises. Finding a relatively straightforward method for actually constructing elements of T_p(M) is one of the goals of the next section.

17.5.5. Tangent Bundles for Abstract Manifolds

Intuitively, an abstract n-dimensional manifold M can be covered by various open sets U such that for each U there are an open ball V ⊂ R^n and a one-to-one onto map

χ : V → U.

Let x_1, . . . , x_n be local coordinates for R^n. On V there are the partial derivatives ∂/∂x_1, . . . , ∂/∂x_n. People like to think of U as being identified with the ball V, in which case on U there should also be corresponding partial derivatives. We want to write down here the explicit maps that are needed to make this rigorous.

We start with a function

f : U → R.

Figure 17.20

Corresponding to each partial derivative ∂/∂x_i, we want to define an element v_i ∈ T_p(M). Using our map χ : V → U, we know that

f ∘ χ : V → R.

Figure 17.21

Then taking the partial derivative of f ∘ χ with respect to the variable x_i makes perfect sense, giving us a new function

∂(f ∘ χ)/∂x_i : V → R.

Figure 17.22

Then we define

v_i(f) = (∂(f ∘ χ)/∂x_i) ∘ χ^{-1} : U → R.

Showing that this v_i is indeed in T_p(M) is left to the exercises, as is showing that the v_i are linearly independent. Not left to the exercises, but still passed over, is the proof that the various v_i form a basis for T_p(M).

This notation is admittedly a bit complicated. Since χ is a one-to-one onto map, people frequently just identify the open set U with the open set V in R^n, and let x_1, . . . , x_n denote the coordinates on U, without putting in the required maps χ and χ^{-1}. When this is done, people consider the partial derivatives

∂/∂x_i

as forming a basis of the tangent bundle for M over the open set U. In the next chapter, for example, we will write a tangent vector on the open set U of a manifold M as some

a_1 ∂/∂x_1 + ··· + a_n ∂/∂x_n.

Finally, the coordinate maps χ : V → U are hardly unique. Showing that we get the same vector space T_p(M) for different coordinate maps is a non-trivial application of the chain rule.
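As a concrete illustration of the recipe v_i(f) = (∂(f ∘ χ)/∂x_i) ∘ χ^{-1}, here is a Python/sympy sketch (sympy assumed; the chart and the function f are chosen purely for illustration) on the upper semicircle of S^1, where χ(u) = (u, √(1 − u^2)) and χ^{-1}(x, y) = x.

import sympy as sp

u, x, y = sp.symbols('u x y')
chi = (u, sp.sqrt(1 - u**2))          # chart map χ : (-1, 1) → U ⊂ S^1

f = x * y                              # a function on R^2, restricted to U
f_pulled_back = f.subs({x: chi[0], y: chi[1]})   # f ∘ χ, a function of u

v_of_f_on_V = sp.diff(f_pulled_back, u)          # ∂(f ∘ χ)/∂u on V
# Compose with χ^{-1}: at a point p = (x, y) of the upper semicircle, u = x.
v_of_f_on_U = v_of_f_on_V.subs(u, x)

print(sp.simplify(v_of_f_on_U))        # the coordinate vector field applied to f, as a function on U
print(v_of_f_on_U.subs({x: 0}))        # its value at p = (0, 1), namely 1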


17.6. Exercises

Exercise 17.6.1. For the unit circle S^1 in R^2, consider the following three parameterizations:

γ_1(θ) = (cos(θ), sin(θ))
γ_2(s) = (s, √(1 − s^2))
γ_3(t) = (√(1 − t^2), t).

Show that

γ_1(π/4) = γ_2(1/√2) = γ_3(1/√2).

At this common point on S^1, compute the Jacobian for each of these parameterizations. Show that at this point, each Jacobian spans the same vector space.

Exercise 17.6.2. For the unit sphere S^2 in R^3, consider the parameterization

γ(u, v) = (u, v, √(1 − u^2 − v^2))

from the unit disc {(u, v) : u^2 + v^2 < 1} to the hemisphere. Compute the Jacobian.

Exercise 17.6.3. Using the notation from the previous exercise, show that the tangent space of S^2 at (0, 0, 1) is a plane parallel to the xy-plane.

Exercise 17.6.4. Using the previous notation, find the Jacobian at the point γ(1/√3, 1/√3).

Exercise 17.6.5. Consider a different parameterization for S^2:

τ(s, t) = (s, √(1 − s^2 − t^2), t).

Find the Jacobian at τ(1/√3, 1/√3). Show that the rows of this Jacobian (which form a basis for the tangent plane at this point) span the same vector space as the rows of the Jacobian in the previous problem.

Exercise 17.6.6. Let

T_(x_0,y_0)R^2 = { a ∂/∂x + b ∂/∂y |_(x_0,y_0) : a, b ∈ R }.

Show for any L ∈ T_(x_0,y_0)R^2 that for all functions f(x, y) and g(x, y) and all real constants α and β we have

L(α f + β g) = α L(f) + β L(g)
L(f g) = f L(g) + g L(f).

Exercise 17.6.7. Show that T_p(M), the space of all linear maps

v : C^∞_p → C^∞_p

such that

v(f g) = f v(g) + g v(f),

is a vector space.

Exercise 17.6.8. If

v_i(f) = (∂(f ∘ χ)/∂x_i) ∘ χ^{-1} : U → R,

show that v_i ∈ T_p(M).

Exercise 17.6.9. Using the notation from the previous exercise, show that the v_1, . . . , v_n are linearly independent.

18

Connections

Summary: The goal of this chapter is to develop the technical definition for a connection on a vector bundle, which will allow us to differentiate sections of a vector bundle. Physicists usually refer to connections as "gauges." As we will see in later chapters, choosing scalar and vector potentials can be recast as choosing a connection for an appropriate vector bundle.

18.1. Intuitions

Suppose we have a vector bundle E over a base manifold M. Let

s : M → E

Figure 18.1

be a section. We want to determine how fast the section is varying in E. We would like to take the derivative of s. Surprisingly, this is not possible. Naively, the derivative at a point p would be

lim_{q→p} (s(q) − s(p))/(q − p).

Problems immediately arise with making sense out of this formula. First, p and q are points in a manifold; q − p does not make sense. This is not the main problem, as we could instead try to measure the rate of change of s along a parameterized path σ(t) in M with σ(0) = p and then correspondingly alter our initial stab at differentiating s by trying to calculate

lim_{t→0} (s(σ(t)) − s(p))/t.

We must make sense out of the numerator s(σ(t)) − s(p). This is the real difficulty, as s(p) is in the vector space E_p and s(σ(t)) is in the vector space E_σ(t). These are obviously different vector spaces. If we want to measure how fast our section is changing, then we must somehow measure changes in different vector spaces. We must somehow link up or "connect" the different spaces. This will lead to the notion of connection.

Before giving the definition, let us emphasize the problem. In a vector bundle E of rank k over a base space M, it is certainly the case that the fiber over any point in M is a real k-dimensional vector space. Thus the vector spaces E_p and E_q are isomorphic. But this does not allow us to compare vectors in E_p with vectors in E_q, since these vector spaces are not canonically isomorphic (meaning there is no intrinsic one-to-one onto linear transformation between them). There is no intrinsic way, independent of arbitrary choices, of comparing the vectors. There are many different possible isomorphisms between the vector spaces E_p and E_q. None is intrinsically better than the others.

Thus it is not surprising that there will be no single method for measuring the rate of change of a section. It is also the case that there is more than one method for defining connections (though all are equivalent). We will characterize, in an algebraic fashion, what these rates of change can be, in a way that is analogous to defining the derivative not in terms of limits, but instead as a linear map from function spaces to function spaces that maps constants to zero and satisfies Leibniz's rule. Then we will return to a more limit-like approach, where we develop the notion of parallel transport of a vector along a curve in the base manifold.

18.2. Technical Definitions

18.2.1. Operator Approach

(We will be using tensor products in this section; these are defined in the Appendix of this chapter.)

For a vector bundle E with base space M, let

Γ(E) = {s : M → E : π ∘ s = identity map}

denote the space of all differentiable sections from M to E, where π : E → M is the map that sends every point in the fiber E_p to the point p in M. Also, recall that Ω^k(M) denotes the k-forms on M.

Definition 18.2.1. A connection on a vector bundle E with base space M is any linear map

∇ : Ω^k(M) ⊗ Γ(E) → Ω^{k+1}(M) ⊗ Γ(E)

that satisfies, for all k-forms ω on M and sections s ∈ Γ(E),

∇(ω ⊗ s) = ω ∧ ∇(s) + (−1)^k dω ⊗ s.

We will concentrate on the case

∇ : Γ(E) → Ω^1(M) ⊗ Γ(E).

(Recall that Ω^0(M) consists of the functions on M, so that Ω^0(M) ⊗ Γ(E) = Γ(E).) Let f : M → R be any differentiable function on M. Thus for each point p ∈ M, f(p) is a number. Now let s ∈ Γ(E) be a section of E. Then s(p) is a vector in the vector space E_p. We can multiply this vector by any scalar, and in particular multiply the vector s(p) by the number f(p). Hence in a natural way, f s is another section of the bundle E, with

(f s)(p) = f(p) s(p) ∈ E_p.

A connection must have the property that

∇(f s) = f ∇(s) + (d f) ⊗ s,

which is just a version of the single-variable calculus requirement that the derivative of a product satisfy

(uv)′ = u′v + v′u (Leibniz's rule).

We want to understand how to create and how to write down connections. Over an open set U in the base manifold M, let s_1, . . . , s_k be a framing for the vector bundle E. (Thus at each point p ∈ U, the vectors s_1(p), . . . , s_k(p) must form a basis of the vector space E_p.) Let ∇ be a connection. Then for each section s_i, we know that ∇s_i must be an element of Ω^1(M) ⊗ Γ(E). Thus there must be 1-forms ω_ij on U such that

∇s_i = Σ_{j=1}^{k} ω_ij ⊗ s_j.

Our connection may thus be described as a k × k matrix of 1-forms

ω = (ω_ij).

This matrix ω depends on our choice of a framing (or of a basis) for the vector bundle E.

We can describe ∇(s) for any section s ∈ Γ(E) in terms of the connection matrix ω as follows. Since the s_1, . . . , s_k form a basis of sections, there are real-valued functions f_i defined on M such that

s = f_1 s_1 + ··· + f_k s_k.

Then we have

∇(s) = ∇(f_1 s_1 + ··· + f_k s_k)
     = ∇(f_1 s_1) + ··· + ∇(f_k s_k)
     = f_1 ∇(s_1) + s_1 d(f_1) + ··· + f_k ∇(s_k) + s_k d(f_k)
     = f_1 (Σ_{j=1}^{k} ω_1j ⊗ s_j) + s_1 d(f_1) + ··· + f_k (Σ_{j=1}^{k} ω_kj ⊗ s_j) + s_k d(f_k)
     = (Σ_{i=1}^{k} f_i ω_i1 + d f_1) s_1 + ··· + (Σ_{i=1}^{k} f_i ω_ik + d f_k) s_k.

(Note that in the last equation, for notational convenience, we are suppressing the ⊗.)

As can be seen, such direct calculations are not pleasant. Here is a more user-friendly style of notation, though it must be emphasized that to actually calculate, one must usually write everything out as we did in the previous example.

To ease notation, suppose that E is a rank-two vector bundle. Let s_1 and s_2 be two linearly independent sections. Thus every section S can be written as

S = f_1 s_1 + f_2 s_2,

where f_1 and f_2 are real-valued functions on the base manifold M. Set

f = (f_1  f_2),    s = [ s_1 ; s_2 ].

Note that we are now using the symbol s to denote a column vector. That is why we use a capital S to denote the section. We have

S = f · s.

Then the connection can be written as

∇(S) = ∇(f · s)
     = d f · s + f · ∇s
     = d f · s + f · ω · s,

which is just a slightly slicker way of writing

∇(f_1 s_1 + f_2 s_2) = (d f_1  d f_2) [ s_1 ; s_2 ] + (f_1  f_2) [ ω_11  ω_12 ; ω_21  ω_22 ] [ s_1 ; s_2 ].

In general, a section S of a rank k bundle can be written as the product S = f · s, where f denotes a 1 × k row matrix of functions and s denotes a k × 1 column vector.
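Here is a Python/sympy sketch of the formula ∇(S) = d f · s + f · ω · s for a rank-two bundle over R^2 (sympy is assumed, and the entries of ω below are made up purely for illustration). Each 1-form a dx_1 + b dx_2 is stored as its pair of coefficient functions, so the connection matrix ω is stored as two 2 × 2 matrices W1 and W2 with ω = W1 dx_1 + W2 dx_2.

import sympy as sp

x1, x2 = sp.symbols('x1 x2')

W1 = sp.Matrix([[x1, 0], [1, x2]])      # dx_1-parts of the entries of ω
W2 = sp.Matrix([[0, x2], [x1, 1]])      # dx_2-parts of the entries of ω

f = sp.Matrix([[x1*x2, x1 + x2]])       # the row vector (f_1  f_2)

# The dx_1- and dx_2-parts of ∇(S) = df · s + f · ω · s, as row vectors of the
# coefficients attached to s_1 and s_2:
nabla_dx1 = f.diff(x1) + f * W1
nabla_dx2 = f.diff(x2) + f * W2
print(sp.simplify(nabla_dx1))
print(sp.simplify(nabla_dx2))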

All of the preceding calculations depend on choosing a particular basis of sections. If we choose a different basis, then the connection matrix will be different, despite the fact that both are representing the same underlying connection. We now want to see how to compare different connection matrices with respect to different choices of frames.

Let s_1, . . . , s_k be a framing with connection matrix ω. Let t_1, . . . , t_k be a different framing, with connection matrix τ. Let A be the k × k invertible matrix of functions that takes the t frame to the s frame, namely,

s = At

or, equivalently,

s_1 = a_11 t_1 + ··· + a_1k t_k
  ⋮
s_k = a_k1 t_1 + ··· + a_kk t_k.

Our goal is to show

Proposition 18.2.1.

ωA = Aτ + dA,

which is equivalent to showing for each i and j that

ω_i1 a_1j + ··· + ω_ik a_kj

is equal to

a_i1 τ_1j + ··· + a_ik τ_kj + d a_ij.

Proof. Let S be a section. Then we can write S either as a linear combination of the s_i or as a linear combination of the t_i. We write

S = f · s = g · t.

Here f and g are row vectors of functions. Since s = At, we have

f · s = f · A · t = g · t,

which means that

g = f · A.

Now

∇(S) = ∇(f · s)
     = d f · s + f · ω · s
     = d f · A · t + f · ω · A · t
     = (d f · A + f · ω · A) · t.

But we also have

∇(S) = ∇(g · t)
     = d g · t + g · τ · t
     = (d g + g · τ) · t
     = (d(f · A) + f · A · τ) · t
     = (d f · A + f · dA + f · A · τ) · t.

Thus we have

f · dA + f · A · τ = f · ω · A,

which means that ωA = Aτ + dA.  □

We could have written the preceding out in local coordinates and just computed. This would have the advantage of being a bit more concrete but the disadvantage of looking messy.

18.2.2. Connections for Trivial Bundles

In the last thirty or so years, the study of the space of possible connections for a vector bundle has become increasingly central to both modern mathematics and physics. This is not a subject to be entered into lightly. In this section we will look at the simplest class of examples, namely, connections on trivial bundles. Though we are using the term "trivial," these types of connections will lead to a natural bundle interpretation of Maxwell's equations in Chapter 20.

Definition 18.2.2. A rank k vector bundle E over a base manifold M is trivial if

E = M × R^k.

Thus the cylinder from the last chapter is a trivial bundle. Though it is not particularly natural, we can interpret the plane R^2 as a rank-one bundle over R, treating the x-axis as the base manifold and the vertical lines as the fibers.

Figure 18.2

In a similar way, as you are asked to discuss in the exercises, we can think of R^3 as a rank-two bundle over the line or as a rank-one bundle over the plane.

In terms of transition functions, we have

Theorem 18.2.1. A vector bundle E over a base manifold M is trivial if and only if we can find an open covering {U_α} of M such that the transition functions are always the identity map, that is,

g_αβ = Identity.

Though it is not particularly hard, we will not prove this.

For a trivial bundle M × R^k, the sections are quite easy to write down. All we need to do is to choose k real-valued functions

f_i : M → R.

The corresponding section

s : M → M × R^k

is simply

s(x) = (x, f_1(x), . . . , f_k(x)).

Connections are fairly easy to describe for trivial vector bundles. We only need to specify the k × k matrix ω of 1-forms.

Definition 18.2.3. The trivial connection for a trivial vector bundle occurs when

ω = (0).

To be more specific, given a section s : M → M × R^k, the trivial connection is

∇ = d

and thus

∇s = ds = (d f_1(x), . . . , d f_k(x)).

To be even more specific, if it happens that the base manifold M is the real numbers R with coordinate x, then

∇s = ds = (d f_1(x), . . . , d f_k(x)) = (f_1′(x) dx, . . . , f_k′(x) dx).

There are connections on trivial bundles that are not the trivial connection. This will become important when we put Maxwell's equations into the language of vector bundles. For now, suppose E = M × R^k is a trivial bundle. Then we can define a connection by choosing any k × k matrix of 1-forms on M and setting

∇ = d + ω.

For example, let E be the trivial rank-one bundle over the real line R with coordinate x. Choose our matrix of 1-forms to be

ω = (x^2 dx).

Since E has rank one, a section is just a single real-valued function f(x). Then

∇(f) = (d + ω)(f(x)) = (f ′(x) + x^2 f(x)) dx.

For another example, now let E be the trivial rank-two bundle over the real line R with coordinate x. Choose the connection 1-form matrix to be

ω = [ x dx   e^x dx ; dx   x^2 dx ].

A section is now given by two real-valued functions f_1(x) and f_2(x). We have

∇(s) = ∇[ f_1(x) ; f_2(x) ]
     = (d + [ x dx   e^x dx ; dx   x^2 dx ]) [ f_1(x) ; f_2(x) ]
     = [ (f_1′ + x f_1 + e^x f_2) dx ; (f_2′ + f_1 + x^2 f_2) dx ].
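This last computation can be checked mechanically. Here is a Python/sympy sketch (sympy assumed) applying ∇ = d + ω to the column of functions in the rank-two example above; since every 1-form here is a multiple of dx, we only track the coefficient of dx.

import sympy as sp

x = sp.symbols('x')
f1 = sp.Function('f1')(x)
f2 = sp.Function('f2')(x)

omega = sp.Matrix([[x, sp.exp(x)],      # coefficients of dx in the matrix ω
                   [1, x**2]])
f = sp.Matrix([f1, f2])                 # the section, as a column of functions

nabla_f = f.diff(x) + omega * f         # coefficient of dx in ∇(s) = df + ω f
print(sp.simplify(nabla_f))
# The two entries are f1' + x f1 + e^x f2 and f2' + f1 + x^2 f2,
# the coefficients of dx computed above.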

18.3. Covariant Derivatives of Sections

(While all of this section is standard, we are following the ideas in [10].) Our original goal was to differentiate sections. Our claim is that the technical definition given in the last section of a connection as a linear map

∇ : Γ(E) → Ω^1(M) ⊗ Γ(E)

is the correct definition for the derivative of a section.

Let s ∈ Γ(E) be a section. Let

σ : [a, b] → M

be a curve in the base manifold M. Our goal is to fix a connection ∇ and to use ∇ to differentiate the section s along the curve σ. First, to simplify matters, we will assume that the image of σ lands in an open set U of M that we can identify with an open ball in R^n. This allows us to fix local coordinates x_1, . . . , x_n for U, which in turn allows us to describe the curve σ by n functions:

σ(t) = (x_1(t), . . . , x_n(t)).

Then the tangent vector to our curve is

σ′(t) = (dx_1/dt) ∂/∂x_1 + ··· + (dx_n/dt) ∂/∂x_n.

Figure 18.3

Recall that 1-forms act on partial derivatives via the rule

dx_i(∂/∂x_j) = 1 if i = j, 0 if i ≠ j,

and linearity. For example, consider the 1-form

ω = x_1 dx_1 + (x_1 + x_2) dx_2.

At the point (x_1, x_2) = (3, 2), we have

ω = 3 dx_1 + 5 dx_2.

Let

v = x_2 ∂/∂x_1 + ∂/∂x_2,

which at the point (3, 2) becomes

v = 2 ∂/∂x_1 + ∂/∂x_2.

Then at the point (3, 2), we have

ω(v) = (3 dx_1 + 5 dx_2)(2 ∂/∂x_1 + ∂/∂x_2) = 3 · 2 + 5 · 1 = 11.

The key is that 1-forms send tangent vectors to numbers. This suggests

Definition 18.3.1. Let ∇ be a connection on a vector bundle E, let s be a section of E, and let X be a vector field on M. Then the covariant derivative of s with respect to X is

∇_X(s) = ∇(s)(X).

We will first make sense out of the preceding, and then see how actually to compute a covariant derivative. We know that ∇(s) is in Ω^1(M) ⊗ Γ(E) and thus consists of a linear combination of sections of E with 1-forms on M. These 1-forms act on the vector field X.

If we want to differentiate along a curve σ, then we choose our vector field to be the tangent vectors to σ.

Now to see how to compute. As we saw earlier, given a basis s_1, . . . , s_k for our vector bundle, for any section s there are real-valued functions f_i defined on M such that

s = f_1 s_1 + ··· + f_k s_k.

We saw that

∇(s) = (Σ_{i=1}^{k} f_i ω_i1 + d f_1) s_1 + ··· + (Σ_{i=1}^{k} f_i ω_ik + d f_k) s_k.

For any vector field X, we have

∇_X(s) = (Σ_{i=1}^{k} f_i ω_i1(X) + d f_1(X)) s_1 + ··· + (Σ_{i=1}^{k} f_i ω_ik(X) + d f_k(X)) s_k.

(Just to be concrete, both Σ_{i=1}^{k} f_i ω_i1(X) + d f_1(X) and Σ_{i=1}^{k} f_i ω_ik(X) + d f_k(X) are numbers.)

We now look at a concrete example. Let E be a rank-two bundle with basis of sections s_1 and s_2 on a surface M, which has coordinates x_1 and x_2. Let the connection matrix be

ω = [ dx_1   dx_1 + dx_2 ; dx_2   dx_1 ].

Let the coefficients for our section s be

f_1(x_1, x_2) = x_1 + x_2
f_2(x_1, x_2) = x_1 − x_2.

Then

d f_1 = dx_1 + dx_2
d f_2 = dx_1 − dx_2.

Then

∇(s) = ∇(f_1 s_1 + f_2 s_2)
     = ((x_1 + x_2) dx_1 + (x_1 − x_2) dx_2 + (dx_1 + dx_2)) s_1
       + ((x_1 + x_2)(dx_1 + dx_2) + (x_1 − x_2) dx_1 + (dx_1 − dx_2)) s_2.

At the point (x_1, x_2) = (0, 2) this becomes

∇(s) = (2 dx_1 − 2 dx_2 + (dx_1 + dx_2)) s_1 + (2(dx_1 + dx_2) − 2 dx_1 + (dx_1 − dx_2)) s_2
     = (3 dx_1 − dx_2) s_1 + (dx_1 + dx_2) s_2.

For the vector field

X = ∂/∂x_1,

we get that

∇_{∂/∂x_1}(s) = 3 s_1 + s_2.

Finally, if we think of a connection ∇ as giving a way to differentiate a section s, then the covariant derivative with respect to a curve σ is a way of differentiating the section along the curve. This intuition is important for the next section.
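Here is a Python/sympy sketch (sympy assumed) that reproduces the covariant derivative just computed, storing each 1-form as its pair of coefficients of dx_1 and dx_2.

import sympy as sp

x1, x2 = sp.symbols('x1 x2')

omega = [[(1, 0), (1, 1)],      # ω = [dx1, dx1+dx2; dx2, dx1] as coefficient pairs
         [(0, 1), (1, 0)]]
f = [x1 + x2, x1 - x2]
X = (1, 0)                       # the vector field ∂/∂x1

def one_form_on_X(coeffs, X):
    # Evaluate the 1-form a dx1 + b dx2 on the vector field X.
    return coeffs[0]*X[0] + coeffs[1]*X[1]

# The coefficient of s_j in ∇_X(s) is  sum_i f_i ω_ij(X) + d f_j(X).
components = []
for j in range(2):
    df_j_on_X = sp.diff(f[j], x1)*X[0] + sp.diff(f[j], x2)*X[1]
    comp = sum(f[i]*one_form_on_X(omega[i][j], X) for i in range(2)) + df_j_on_X
    components.append(comp.subs({x1: 0, x2: 2}))

print(components)                # [3, 1], i.e. ∇_X(s) = 3 s_1 + s_2 at (0, 2)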

18.4. Parallel Transport: Why Connections Are Called Connections

Let us return to the original problem of trying to understand the meaning of

s(q) − s(p)

(since a derivative should be some type of lim_{q→p} (s(q) − s(p))/(q − p)), where s : M → E is a section from the base manifold M to the vector bundle E. The difficulty is that s(q) is a vector in the vector space E_q, while s(p) is a vector in the completely different vector space E_p. We want somehow to use our connection ∇ to compare vectors in E_q with vectors in E_p. This we will do, provided we fix a curve σ in the base manifold M going from the point p to the point q. Let

v_q ∈ E_q.

We want to be able to "move" this vector v_q to a vector v_p ∈ E_p. The key lies in the concept of parallel transport along the curve σ.

Definition 18.4.1. A section s is parallel with respect to a curve σ if at all points on the curve, we have

∇_{σ′} s = 0.

(Here σ′ denotes the tangent vectors of the curve σ.)

Let us quickly discuss why we are using the word "parallel." In one-variable calculus, a function f(x) is a constant function if and only if its derivative is zero. In the world of vector bundles, after fixing a connection ∇, the analog of a function is a section, and the analog of the derivative of a function is ∇_{σ′}. Thus a section s being parallel with respect to a curve σ is the vector bundle analog of a constant function.

Definition 18.4.2. The parallel transport of a vector v_q ∈ E_q along a curve σ from the point p to q is the value

s(p) ∈ E_p,

where s is a section parallel with respect to σ such that

s(q) = v_q.

This is just applying the idea that a parallel section s is the vector bundle analog of a constant function. This is why the vector s(p) in E_p should be the vector corresponding to the original vector v_q in E_q.

It is not obvious that such parallel transports exist, though. The following theorem states that they do. (The key for the proof is that we reduce the existence of such parallel transports to solving a system of ordinary differential equations, which can always be done.)

Theorem 18.4.1. Let ∇ be a connection for a rank k vector bundle E on a manifold M. Let σ be a curve on M going through a point q ∈ M. Then, given any vector v ∈ E_q, there is a unique section s that is parallel with respect to σ such that s(q) = v.

Proof. Let s_1, . . . , s_k be a basis for our vector bundle. We are given the vector v ∈ E_q. This means that there are numbers v_1, . . . , v_k such that

v = v_1 s_1(q) + ··· + v_k s_k(q).

Let x_1, . . . , x_n be our local coordinates for M. Label q ∈ M as

q = (q_1, . . . , q_n).

Let our curve σ be described via

σ(t) = (x_1(t), . . . , x_n(t)).

We can assume that

σ(0) = (x_1(0), . . . , x_n(0)) = (q_1, . . . , q_n) = q.

We know that the tangent vector to σ is

σ′(t) = (dx_1(t)/dt) ∂/∂x_1 + ··· + (dx_n(t)/dt) ∂/∂x_n.

We know for any section

s = f_1 s_1 + ··· + f_k s_k

that

∇_{σ′}(s) = (Σ_{i=1}^{k} f_i ω_i1(σ′) + d f_1(σ′)) s_1 + ··· + (Σ_{i=1}^{k} f_i ω_ik(σ′) + d f_k(σ′)) s_k.

We must find functions f_1, . . . , f_k such that

f_1(q) = v_1, . . . , f_k(q) = v_k

and such that, for the tangent vectors σ′ along our curve σ, we have

∇_{σ′}(s) = 0.

But this means we must find the functions f_i such that

Σ_{i=1}^{k} f_i ω_i1(σ′) + d f_1(σ′) = 0
  ⋮
Σ_{i=1}^{k} f_i ω_ik(σ′) + d f_k(σ′) = 0.

Now

d f_i(σ′) = (∂f_i/∂x_1 · dx_1 + ··· + ∂f_i/∂x_n · dx_n)(σ′)
          = (∂f_i/∂x_1 · dx_1 + ··· + ∂f_i/∂x_n · dx_n)(Σ_{j=1}^{n} (dx_j(t)/dt) ∂/∂x_j)
          = ∂f_i/∂x_1 · dx_1(t)/dt + ··· + ∂f_i/∂x_n · dx_n(t)/dt
          = d/dt (f_i(x_1(t), . . . , x_n(t))).

The key for us is that d/dt (f_i(x_1(t), . . . , x_n(t))) is an ordinary derivative.

Thus finding the f_i so that ∇_{σ′}(s) = 0 comes down to solving the following system of k ordinary differential equations. We need

d/dt (f_j(x_1(t), . . . , x_n(t)))

to equal

−Σ_{i=1}^{k} f_i ω_ij((dx_1(t)/dt) ∂/∂x_1 + ··· + (dx_n(t)/dt) ∂/∂x_n),

for all j = 1, . . . , k, with initial conditions f_1(q) = v_1, . . . , f_k(q) = v_k. The left-hand side of the preceding consists of ordinary derivatives while the right-hand side is made up of our unknown functions f_i and known functions. Such systems always have unique solutions, as seen in almost all books on ordinary differential equations.  □
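To see the ODE system in action, here is a numerical sketch (assuming numpy and scipy are available) of parallel transport for the earlier rank-one example over R with connection 1-form ω = x^2 dx, along the curve σ(t) = t. There ω(σ′(t)) = t^2, so a parallel section f must solve f′(t) = −t^2 f(t), whose exact solution is f(t) = v e^{−t^3/3}.

import numpy as np
from scipy.integrate import solve_ivp

v = 2.0                                    # the vector in the fiber over q = 0

def rhs(t, f):
    return -t**2 * f                       # f' = -ω(σ') f

sol = solve_ivp(rhs, (0.0, 1.0), [v], t_eval=[1.0], rtol=1e-10, atol=1e-12)
numerical = sol.y[0, -1]
exact = v * np.exp(-1.0**3 / 3.0)
print(numerical, exact)                    # both approximately 1.433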

The uniqueness of parallel transports is important. Start with our fixed curve σ in the base manifold M, starting at the point p and ending at q (Figure 18.4).

Figure 18.4

The vector bundle E over the curve σ is (Figure 18.5)

Figure 18.5

(Here we are drawing E as a dimension-one vector bundle, which is of course not necessarily the case.) Fix a vector v_q ∈ E_q. There are (infinitely) many sections s over σ such that s(q) = v_q.

Figure 18.6

But there is only one section s that is parallel with respect to σ with s(q) = v_q. Thus the parallel transport of v_q, namely the vector s(p), is unique.

Figure 18.7: the unique section s with ∇_{σ′} s = 0

Finally, why are connections called connections? Parallel transports exist, letting us move a vector v ∈ E_q to vectors in other fibers. Thus having a connection ∇ allows us to "connect," or link, vectors in any E_q with vectors in any other E_p.

18.5. Appendix: Tensor Products of Vector Spaces

18.5.1. A Concrete Description

Let V be an n-dimensional vector space and W be an m-dimensional vector space. We want to construct a new vector space, denoted by V ⊗ W, whose dimension will be nm. We call this new vector space V ⊗ W the tensor product of V and W.

Let v_1, . . . , v_n be a basis for V and let w_1, . . . , w_m be a basis for W. Right now, take as our basis for V ⊗ W all possible

v_i ⊗ w_j.

Then define the elements of V ⊗ W to be all possible

Σ_{j=1}^{m} Σ_{i=1}^{n} a_ij v_i ⊗ w_j,

with the a_ij being numbers. For example, if n = 2 and m = 3, then an element of V ⊗ W will be of the form

a_11 v_1 ⊗ w_1 + a_12 v_1 ⊗ w_2 + a_13 v_1 ⊗ w_3 + a_21 v_2 ⊗ w_1 + a_22 v_2 ⊗ w_2 + a_23 v_2 ⊗ w_3.

We now add a few extra requirements onto the algebraic structure of V ⊗ W. We require, for all vectors v, ṽ ∈ V, w, w̃ ∈ W and scalars λ, that

(v + ṽ) ⊗ w = v ⊗ w + ṽ ⊗ w,
v ⊗ (w + w̃) = v ⊗ w + v ⊗ w̃

and

λ(v ⊗ w) = (λv) ⊗ w = v ⊗ (λw).

These rules allow us to interpret any v ⊗ w as a linear combination of various v_i ⊗ w_j, as described in

Theorem 18.5.1. Suppose

v = α_1 v_1 + ··· + α_n v_n ∈ V

and

w = β_1 w_1 + ··· + β_m w_m ∈ W.

Then

v ⊗ w = Σ_{i=1}^{n} Σ_{j=1}^{m} α_i β_j v_i ⊗ w_j.

(The proof is one of the exercises at the end of the chapter.)

For example, we have

(2v_1 + 3v_2) ⊗ (4w_1 + w_2) = (2v_1 + 3v_2) ⊗ (4w_1) + (2v_1 + 3v_2) ⊗ w_2
                             = (2v_1) ⊗ (4w_1) + (2v_1) ⊗ w_2 + (3v_2) ⊗ (4w_1) + (3v_2) ⊗ w_2
                             = 8 v_1 ⊗ w_1 + 2 v_1 ⊗ w_2 + 12 v_2 ⊗ w_1 + 3 v_2 ⊗ w_2.
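In coordinates, the tensor product of two vectors is simply their Kronecker product. Here is a small numpy sketch (numpy assumed) reproducing the example above, with v = 2v_1 + 3v_2 and w = 4w_1 + w_2.

import numpy as np

v = np.array([2, 3])          # coefficients with respect to v_1, v_2
w = np.array([4, 1])          # coefficients with respect to w_1, w_2

print(np.kron(v, w))          # [ 8  2 12  3]: the coefficients of
                              # v_1⊗w_1, v_1⊗w_2, v_2⊗w_1, v_2⊗w_2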

18.5.2. Alternating Forms as Tensors

Let V be a vector space. Earlier we defined the alternating k-forms Λ^k V. These can also be defined in terms of tensors. We start with our reinterpretation of 2-forms, Λ^2 V, which we define to be the subspace of V ⊗ V generated by all elements of the form

v ⊗ w − w ⊗ v,

for any v, w ∈ V. We set

v ∧ w = v ⊗ w − w ⊗ v.

Note that

v ∧ w = v ⊗ w − w ⊗ v = −(w ⊗ v − v ⊗ w) = −w ∧ v.

Thus

v ∧ v = −v ∧ v = 0.

Given a basis for V, we have a natural basis for Λ^2 V:

Theorem 18.5.2. Let v_1, . . . , v_n be a basis for V. Then a basis for Λ^2 V will be all

v_i ∧ v_j,

for 1 ≤ i < j ≤ n.

The proof is left for the exercises.

Let us see how this works. Suppose V is a two-dimensional vector space, with basis v_1 and v_2. Let us write

(2v_1 + 3v_2) ∧ (4v_1 + v_2)

in terms of

v_1 ∧ v_2.

By definition

(2v_1 + 3v_2) ∧ (4v_1 + v_2) = (2v_1 + 3v_2) ⊗ (4v_1 + v_2) − (4v_1 + v_2) ⊗ (2v_1 + 3v_2).

We know from the preceding that

(2v_1 + 3v_2) ⊗ (4v_1 + v_2) = 8 v_1 ⊗ v_1 + 2 v_1 ⊗ v_2 + 12 v_2 ⊗ v_1 + 3 v_2 ⊗ v_2.

Via calculation, we have

(4v_1 + v_2) ⊗ (2v_1 + 3v_2) = 8 v_1 ⊗ v_1 + 12 v_1 ⊗ v_2 + 2 v_2 ⊗ v_1 + 3 v_2 ⊗ v_2.

Then

(2v_1 + 3v_2) ∧ (4v_1 + v_2) = 8 v_1 ⊗ v_1 + 2 v_1 ⊗ v_2 + 12 v_2 ⊗ v_1 + 3 v_2 ⊗ v_2
                               − (8 v_1 ⊗ v_1 + 12 v_1 ⊗ v_2 + 2 v_2 ⊗ v_1 + 3 v_2 ⊗ v_2)
                             = −10 v_1 ⊗ v_2 + 10 v_2 ⊗ v_1
                             = (−10)(v_1 ⊗ v_2 − v_2 ⊗ v_1)
                             = −10 v_1 ∧ v_2.
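The same computation in coordinates: v ⊗ w is the outer product of the coefficient vectors, and v ∧ w is its antisymmetrization. A numpy sketch (numpy assumed):

import numpy as np

v = np.array([2, 3])                  # 2v_1 + 3v_2
w = np.array([4, 1])                  # 4v_1 + v_2

wedge = np.outer(v, w) - np.outer(w, v)
print(wedge)
# [[  0 -10]
#  [ 10   0]]  -> the coefficient of v_1 ⊗ v_2 is -10, so v ∧ w = -10 v_1 ∧ v_2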

The space Λ^3 V will be the subspace of V ⊗ V ⊗ V spanned by all

u ⊗ v ⊗ w − v ⊗ u ⊗ w − w ⊗ v ⊗ u − u ⊗ w ⊗ v + w ⊗ u ⊗ v + v ⊗ w ⊗ u.

Each of these vectors we denote by

u ∧ v ∧ w.

Thus u ∧ v ∧ w is the linear combination of the six possible tensor products of u, v, w ∈ V, with coefficients ±1 depending on whether the permutation is even or odd. We have

Theorem 18.5.3. Let v_1, . . . , v_n be a basis for V. Then a basis for Λ^3 V will be all

v_i ∧ v_j ∧ v_k,

for 1 ≤ i < j < k ≤ n.

The proof is left for the exercises.

In general, the alternating k-forms Λ^k V will be the subspace of V ⊗ ··· ⊗ V (V tensored with itself k times) generated by the sums of all tensor products of k vectors under all possible permutations, with coefficients ±1 depending on whether the permutation is even or odd.

Theorem 18.5.4. Let v_1, . . . , v_n be a basis for V. Then a basis for Λ^k V will be all

v_{i_1} ∧ ··· ∧ v_{i_k},

for 1 ≤ i_1 < ··· < i_k ≤ n.

18.5.3. Homogeneous Polynomials as Symmetric Tensors

Besides alternating tensors, there is another natural subspace of any tensor space V ⊗ ··· ⊗ V, the symmetric tensors. In this section we will define symmetric tensors and then see how they can be easily interpreted as homogeneous polynomials. This suggests how flexible tensor notation can be, as it can be used to capture not only the language of differential forms but also the language of much of algebra.

Let V be a vector space. Define S^2(V) (whose elements are called the symmetric two-tensors) to be the subspace of V ⊗ V generated by all elements of the form

v ⊗ w + w ⊗ v,

for any v, w ∈ V. We set

v ⊙ w = v ⊗ w + w ⊗ v.

Theorem 18.5.5. Let v_1, . . . , v_n be a basis for V. Then a basis for S^2(V) will be all

v_i ⊙ v_j,

for 1 ≤ i ≤ j ≤ n.

The proof is left for the exercises.

In analogy with the alternating k-forms, the symmetric k-forms S^k(V) will be the subspace of V ⊗ ··· ⊗ V (V tensored with itself k times) generated by the sums of all tensor products of k vectors under all possible permutations.

Theorem 18.5.6. Let v_1, . . . , v_n be a basis for V. Then a basis for S^k(V) will be all

v_{i_1} ⊙ ··· ⊙ v_{i_k},

for 1 ≤ i_1 ≤ ··· ≤ i_k ≤ n.

The proof is also left for the exercises. Here we are generalizing the use of ⊙ to mean that v_{i_1} ⊙ ··· ⊙ v_{i_k} should be interpreted to be the sum of all tensor products of the k vectors under all possible permutations.

Now to include homogeneous polynomials. Homogeneous polynomials in the two variables x_1 and x_2 are simply those polynomials that are the sum of monomials of the same degree. Thus

x_1 x_2 + x_2^2

is homogeneous of degree two, since its two monomials x_1 x_2 and x_2^2 each have degree two, while

x_1 x_2 + x_2^3

is not homogeneous, since x_2^3 has degree three. In two variables, all degree two homogeneous polynomials are of the form

a x_1^2 + b x_1 x_2 + c x_2^2,

while all degree three homogeneous polynomials are of the form

a x_1^3 + b x_1^2 x_2 + c x_1 x_2^2 + d x_2^3.

Similar definitions hold for three variables. For example, in three variables, all degree two homogeneous polynomials are of the form

a x_1^2 + b x_1 x_2 + c x_1 x_3 + d x_2^2 + e x_2 x_3 + f x_3^2.

Now to link with symmetric tensors. Let V be a two-dimensional vector space with basis x_1 and x_2. Since S^2(V) has basis x_1 ⊙ x_1, x_1 ⊙ x_2 and x_2 ⊙ x_2, we know that all symmetric two-tensors are of the form

a x_1 ⊙ x_1 + b x_1 ⊙ x_2 + c x_2 ⊙ x_2.

Each such symmetric two-tensor can of course be effortlessly thought of as the polynomial a x_1^2 + b x_1 x_2 + c x_2^2.

In general, if V has basis x_1, . . . , x_n, we associate each

x_{i_1} ⊙ ··· ⊙ x_{i_k}

to the polynomial

x_{i_1} ··· x_{i_k}.

What is important is that a change of basis on V will correspond to a homogeneous linear change of coordinates for the corresponding polynomials.

18.5.4. Tensors as Linearizations of Bilinear Maps

So far in this Appendix we have been emphasizing how to compute and construct tensor spaces. Here we will give a more intrinsic approach. The natural maps between vector spaces are linear transformations. But a number of times in this text we have looked not at linear maps but instead at bilinear maps. Given three vector spaces U, V, and W, recall that a bilinear map is a map

B : U × V → W

such that, for all vectors u_1, u_2 ∈ U, v_1, v_2 ∈ V and scalars α_1, α_2 ∈ R, we have

B(α_1 u_1 + α_2 u_2, v_1) = α_1 B(u_1, v_1) + α_2 B(u_2, v_1)
B(u_1, α_1 v_1 + α_2 v_2) = α_1 B(u_1, v_1) + α_2 B(u_1, v_2).

At almost the level of fantasy math, it would be great if we could somehow translate this bilinear map into a linear map. Somewhat surprisingly this can be done:

Theorem 18.5.7. Given any two vector spaces U and V, there exists a third vector space, denoted by U ⊗ V, and a natural bilinear map

π : U × V → U ⊗ V

such that for any other vector space W and any bilinear map B : U × V → W there exists a unique linear map

b : U ⊗ V → W

such that

B = b ∘ π.

Thus we always have the following commutative diagram:

U × V --B--> W
  π ↓     ↗ b
  U ⊗ V

Here are the actual details of how to link B with b. Let u_1, . . . , u_m be a basis for U and let v_1, . . . , v_n be a basis for V. The map

π : U × V → U ⊗ V

is defined by setting

π(u, v) = u ⊗ v.

If

u = α_1 u_1 + ··· + α_m u_m

and

v = β_1 v_1 + ··· + β_n v_n,

then

π(u, v) = (α_1 u_1 + ··· + α_m u_m) ⊗ (β_1 v_1 + ··· + β_n v_n)
        = Σ α_i β_j u_i ⊗ v_j.

Any bilinear map B : U × V → W is determined by its values

B_ij = B(u_i, v_j).

Define the map b : U ⊗ V → W by setting

B_ij = b(u_i ⊗ v_j).

We want to show that

B(u, v) = b(u ⊗ v).

We have

B(u, v) = B(α_1 u_1 + ··· + α_m u_m, β_1 v_1 + ··· + β_n v_n)
        = Σ α_i B(u_i, β_1 v_1 + ··· + β_n v_n)
        = Σ α_i β_j B(u_i, v_j)
        = Σ α_i β_j b(u_i ⊗ v_j)
        = b((α_1 u_1 + ··· + α_m u_m) ⊗ (β_1 v_1 + ··· + β_n v_n))
        = b(u ⊗ v),

as desired.
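In coordinates this correspondence is concrete: a real-valued bilinear map B(u, v) = uᵀ M v determines the linear map b acting on the Kronecker product of the coefficient vectors. A numpy sketch (numpy assumed; the matrix M and the vectors are made up purely for illustration):

import numpy as np

M = np.array([[1., 2., 0.],
              [0., 3., 1.]])            # B_ij = B(u_i, v_j)

u = np.array([2., -1.])
v = np.array([1., 0., 4.])

B_of_u_v = u @ M @ v                    # the bilinear map applied to (u, v)
b_of_u_tensor_v = M.flatten() @ np.kron(u, v)   # the linear map b applied to u ⊗ v

print(B_of_u_v, b_of_u_tensor_v)        # the two numbers agree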

18.6. Exercises

Exercise 18.6.1. Describe R^3 as a rank-two trivial bundle with base manifold R.

Exercise 18.6.2. Describe R^3 as a rank-one trivial bundle with base manifold R^2.

Exercise 18.6.3. Let E be a rank-two bundle with basis of sections s_1 and s_2 on a surface M, which has coordinates x_1 and x_2. Let the connection matrix be

ω = [ dx_1 + 2dx_2   dx_1 + dx_2 ; dx_1 − dx_2   dx_1 ].

For

s = (x_1^2) s_1 + (x_1 x_2) s_2,

find

∇(s).

Exercise 18.6.4. Let E be a rank-two bundle with basis of sections s_1 and s_2 on a surface M, which has coordinates x_1 and x_2. Let the connection matrix be

ω = [ dx_1   dx_1 + dx_2 ; dx_2   dx_1 ].

Let

s = (x_2 dx_1) ⊗ s_1 + (x_2 dx_2) ⊗ s_2 ∈ Ω^1(M) ⊗ Γ(E).

Compute

∇(s).

Exercise 18.6.5. Let E be the trivial rank-two bundle with base manifold R^2 with trivial connection. Let x and y be the coordinates for R^2. For the section

s(x, y) = [ x + y^2 ; x^3 ]

and curve

σ(t) = (t, t^2),

find

∇_{σ′(0)} s.

Exercise 18.6.6. Let E be the trivial rank-two bundle with base manifold R^2 with trivial connection. Let

q = (2, 4) ∈ R^2

and

v = [ 3 ; 4 ].

Parallel transport v to a vector in E_(0,0) along the curve

σ(t) = (t, t^2).

Exercise 18.6.7. Do the same as in the previous problem, but now parallel transport the vector v along the path

σ(t) = (t, 2t).

Exercise 18.6.8. Show that the zero section of a vector bundle E (i.e., s(p) = 0 for any p ∈ M) is parallel with respect to any curve σ(t) ⊂ M and any connection.

Exercise 18.6.9. Let V have basis vectors v_1 and v_2 and let W have basis vectors w_1, w_2, and w_3. Write

(2v_1 + 3v_2) ⊗ (4w_1 + w_2 + 5w_3)

in terms of the basis formed from the various v_i ⊗ w_j.

Exercise 18.6.10. Let v_1, . . . , v_n be a basis for V and let w_1, . . . , w_m be a basis for W. Suppose

v = α_1 v_1 + ··· + α_n v_n ∈ V

and

w = β_1 w_1 + ··· + β_m w_m ∈ W.

Then show that

v ⊗ w = Σ_{i=1}^{n} Σ_{j=1}^{m} α_i β_j v_i ⊗ w_j.

Exercise 18.6.11. Let v_1, v_2, v_3 be a basis for V. Write

(3v_1 + 2v_2 + 4v_3) ∧ (v_1 − v_2 + v_3)

in terms of the various v_i ∧ v_j, with i < j.

Exercise 18.6.12. Let v_1, . . . , v_n be a basis for V. Show that all

v_i ∧ v_j,

for 1 ≤ i < j ≤ n, form a basis for Λ^2 V.

Exercise 18.6.13. Let v_1, . . . , v_n be a basis for V. Show that all

v_i ∧ v_j ∧ v_k,

for 1 ≤ i < j < k ≤ n, form a basis for Λ^3 V.

Exercise 18.6.14. Let V have basis vectors v_1, v_2, and v_3. Write

(2v_1 + 3v_2) ⊙ (4v_1 + v_2 + 5v_3)

in terms of the basis formed from the various v_i ⊙ v_j.

Exercise 18.6.15. Let v_1, . . . , v_n be a basis for V. Show that all

v_i ⊙ v_j,

for 1 ≤ i ≤ j ≤ n, form a basis for S^2(V).

Exercise 18.6.16. Let v_1, . . . , v_n be a basis for V. Show that all

v_{i_1} ⊙ ··· ⊙ v_{i_k},

for 1 ≤ i_1 ≤ ··· ≤ i_k ≤ n, form a basis for S^k(V).

19

Curvature

Summary: The goal of this chapter is to define the curvature of a vector bundle. Curvature will depend on a choice of a connection for the vector bundle. If a connection provides a method for differentiating sections of a vector bundle, then the curvature can be interpreted as taking the second derivative of a section.

19.1. Motivation

The study of curvature lies at the heart of much of geometry, if not most of mathematics. Originally, curvature captured the rate of change of tangent vectors of a manifold. Since tangent vectors in turn are captured by first-derivative type information, we should expect curvature to be rates of change of derivatives and hence involve second derivatives.

Probably most people's introduction to these ideas is in the study of concavity properties of single-variable functions in beginning calculus.

Figure 19.1: concave up where f′′ > 0, concave down where f′′ < 0

Here (Figure 19.1) a curve y = f(x) is concave up if f′′(x) > 0 and concave down if f′′(x) < 0.

The second standard place to see curvature is in a multivariable calculus course. Starting with a plane curve (x(t), y(t)) (Figure 19.2),

Figure 19.2

the curvature is the rate of change of the unit tangent vector with respect to arclength. The actual curvature is shown to be

|x′ y′′ − y′ x′′| / ((x′)^2 + (y′)^2)^{3/2}.

Note that the formulas already are becoming a bit complicated. This is a recurring theme, namely, that formulas exist but are difficult to truly understand.
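As a quick sanity check of this formula, here is a Python/sympy sketch (sympy assumed) computing the curvature of a circle of radius r, parameterized as (r cos t, r sin t); the answer should be the constant 1/r.

import sympy as sp

t, r = sp.symbols('t r', positive=True)
x = r * sp.cos(t)
y = r * sp.sin(t)

xp, yp = sp.diff(x, t), sp.diff(y, t)
xpp, ypp = sp.diff(xp, t), sp.diff(yp, t)

kappa = sp.Abs(xp*ypp - yp*xpp) / (xp**2 + yp**2)**sp.Rational(3, 2)
print(sp.simplify(sp.trigsimp(kappa)))     # 1/r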

Thinking of curvature as second-derivative type information suggests the following approach for defining curvature for vector bundles. Start by fixing a connection ∇, which as we saw in the last chapter is a method for defining derivatives of sections. Then curvature should be something like

Curvature = ∇ ∘ ∇,

or, in other words, the connection of the connection. Now to make sense of this intuition.

19.2. Curvature and the Curvature Matrix

Let E be a rank k vector bundle with base manifold M. Fix a connection ∇ on E. Motivated by the belief that curvature should be a second derivative, we have

Definition 19.2.1. The curvature for the connection ∇ is the map

∇^2 : Γ(E) → Ω^2(M) ⊗ Γ(E)

defined by setting

∇^2(s) = ∇(∇(s)).

We can capture the curvature by a k × k matrix of 2-forms. Fix a local frame of sections s_1, . . . , s_k for E. Then we have the associated k × k connection matrix of 1-forms ω = (ω_ij).

Definition 19.2.2. The curvature matrix with respect to a connection ∇ and local frame s_1, . . . , s_k is

Ω = ω ∧ ω − dω.

Before showing how to use the curvature matrix Ω to compute curvature, let us look at an example. Suppose that E has rank two, and that the connection matrix is

ω = [ ω_11   ω_12 ; ω_21   ω_22 ].

Then

Ω = ω ∧ ω − dω
  = [ ω_11   ω_12 ; ω_21   ω_22 ] ∧ [ ω_11   ω_12 ; ω_21   ω_22 ] − [ dω_11   dω_12 ; dω_21   dω_22 ]
  = [ ω_11∧ω_11 + ω_12∧ω_21   ω_11∧ω_12 + ω_12∧ω_22 ; ω_21∧ω_11 + ω_22∧ω_21   ω_21∧ω_12 + ω_22∧ω_22 ]
      − [ dω_11   dω_12 ; dω_21   dω_22 ]
  = [ ω_11∧ω_11 + ω_12∧ω_21 − dω_11   ω_11∧ω_12 + ω_12∧ω_22 − dω_12 ;
      ω_21∧ω_11 + ω_22∧ω_21 − dω_21   ω_21∧ω_12 + ω_22∧ω_22 − dω_22 ].

The link between this matrix of 2-forms and curvature lies in the following. A section S of E is any

S = f_1 s_1 + ··· + f_k s_k,

for real-valued functions f_1, . . . , f_k. As in the last chapter, let

f = (f_1  ···  f_k)

and let s denote the corresponding column vector of sections with entries s_1, . . . , s_k. Then

S = f · s.

The key is the following theorem, which we will prove in the next section:

Theorem 19.2.1.

∇^2(S) = f · Ω · s.

Thus for a rank-two bundle, we have

∇^2(S) = f · Ω · s
        = (f_1  f_2) [ Ω_11   Ω_12 ; Ω_21   Ω_22 ] [ s_1 ; s_2 ]
        = (f_1 Ω_11 + f_2 Ω_21) s_1 + (f_1 Ω_12 + f_2 Ω_22) s_2.
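The curvature matrix is easy to compute mechanically on a two-dimensional base. Here is a Python/sympy sketch (sympy assumed) that implements the wedge product and exterior derivative for 1-forms on R^2 and evaluates Ω = ω ∧ ω − dω for the connection matrix ω = [ dx_1  dx_1 + dx_2 ; dx_2  dx_1 ] used in Section 18.3. A 1-form a dx_1 + b dx_2 is stored as the pair (a, b), and a 2-form c dx_1∧dx_2 as the single coefficient c.

import sympy as sp

x1, x2 = sp.symbols('x1 x2')

def wedge(a, b):
    # (a1 dx1 + a2 dx2) ∧ (b1 dx1 + b2 dx2) = (a1 b2 - a2 b1) dx1∧dx2
    return sp.simplify(a[0]*b[1] - a[1]*b[0])

def d(a):
    # d(a1 dx1 + a2 dx2) = (∂a2/∂x1 - ∂a1/∂x2) dx1∧dx2
    return sp.simplify(sp.diff(a[1], x1) - sp.diff(a[0], x2))

omega = [[(1, 0), (1, 1)],
         [(0, 1), (1, 0)]]

# Ω_ij = sum_l ω_il ∧ ω_lj - dω_ij, a multiple of dx1∧dx2.
Omega = [[sum(wedge(omega[i][l], omega[l][j]) for l in range(2)) - d(omega[i][j])
          for j in range(2)] for i in range(2)]
print(Omega)      # [[1, 0], [0, -1]], i.e. Ω = [ dx1∧dx2   0 ; 0   -dx1∧dx2 ]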

19.3. Deriving the Curvature Matrix

Our goal is to prove that

∇^2(S) = f · Ω · s.

We will be using that it is always the case that

d^2 f = 0.

Also, for a section s and an m-form τ, we have

∇(τ ⊗ s) = (−1)^m dτ ⊗ s + τ ∧ ∇(s).

We have

∇^2(S) = ∇(∇(S))
       = ∇(∇(f · s))
       = ∇(d f · s + f · ω · s)
       = ∇(d f · s) + ∇(f · ω · s).

Now, d f is a row matrix of 1-forms. Thus

∇(d f · s) = −d(d f) · s + d f ∧ ∇(s)
           = d f ∧ ω · s,

since d(d f) = d^2 f = 0. We can think of f · ω as a row matrix of 1-forms. Using that

d(f · ω) = d f ∧ ω + f · dω,

we have

∇(f · ω · s) = −d(f · ω) · s + f · ω ∧ ∇(s)
             = −d f ∧ ω · s − f · dω · s + f · ω ∧ ω · s.

Putting all of this together, we have

∇^2(S) = f · (ω ∧ ω − dω) · s = f · Ω · s,

as desired.

Some of the exercises ask you to give more coordinate-dependent proofs.

19.4. Exercises

Exercise 19.4.1. Let E be a rank-two vector bundle with base manifold R^2, with coordinates x_1, x_2. Suppose we have the connection matrix

[ x_2^2 dx_1 + dx_2   x_1 dx_1 + dx_2 ; x_2 dx_1 − dx_2   dx_1 + x_1 dx_2 ],

with respect to a basis of sections s_1 and s_2 for E. Compute the curvature matrix Ω.

Exercise 19.4.2. Using the same notation as in the previous problem, consider a section

S = (x_1 x_2) s_1 + (x_1 + x_2) s_2.

Find the curvature of this section.

Exercise 19.4.3. Let E be a rank-two vector bundle with base manifold R^2, with coordinates x_1, x_2. Suppose we have the connection matrix

[ x_2 dx_1 + (x_1 + x_2) dx_2   x_1 dx_1 + dx_2 ; x_2 dx_1 + dx_2   dx_1 − x_1 dx_2 ],

with respect to a basis of sections s_1 and s_2 for E. Consider the section

S = (x_1 − x_2) s_1 + (x_1^2 + x_2) s_2.

Find the curvature of this section.

Exercise 19.4.4. Let E be a rank k bundle over a one-dimensional base manifold M. Let ω be a connection matrix for E with respect to some basis of sections for E. Compute the curvature of any section. (Hint: Since so little information is given, the actual answer cannot be that hard.)

Exercise 19.4.5. Let E be a trivial bundle on a manifold M, with trivial connection. Find the corresponding curvature matrix.

Exercise 19.4.6. Let

f = (f_1  f_2)

be a row vector of functions and

ω = [ ω_11   ω_12 ; ω_21   ω_22 ]

a matrix of 1-forms. Prove that

d(f · ω) = d f ∧ ω + f · dω.

The next two exercises derive the formula for the curvature matrix via local calculations.

Exercise 19.4.7. Let E be a rank-two vector bundle over a base manifold M. Let s_1 and s_2 be a basis of sections. Suppose that there is a connection matrix

ω = [ ω_11   ω_12 ; ω_21   ω_22 ].

For a section S = f_1 s_1 + f_2 s_2, we know that

∇(S) = (d f_1 + f_1 ω_11 + f_2 ω_21) s_1 + (d f_2 + f_1 ω_12 + f_2 ω_22) s_2.

Explicitly calculate the curvature ∇^2(S) and then show that it agrees with our matrix formulation of f · Ω · s.

Exercise 19.4.8. Let E be a rank-k vector bundle over a base manifold M. Let s_1, . . . , s_k be a basis of sections. Suppose that there is a connection matrix

ω = [ ω_11  ···  ω_1k ; ⋮ ; ω_k1  ···  ω_kk ].

For a section S = f_1 s_1 + ··· + f_k s_k, we know that

∇(S) = Σ_{i=1}^{k} (d f_i + Σ_{j=1}^{k} f_j ω_ji) s_i.

Explicitly calculate the curvature ∇^2(S) and then show that it agrees with our matrix formulation of f · Ω · s.

20

Maxwell via Connections and Curvature

Summary: We finally pull together all the themes of this book, formulating Maxwell's equations in terms of connections and curvature. It is this formulation that will allow deep generalizations in the next chapter.

20.1. Maxwell in Some of Its Guises

At the beginning of this book we wrote down Maxwell's equations:

div(E) = ρ
curl(E) = −∂B/∂t
div(B) = 0
c^2 curl(B) = j + ∂E/∂t.

Here

E = (E_1(x, y, z, t), E_2(x, y, z, t), E_3(x, y, z, t))

is the electric field,

B = (B_1(x, y, z, t), B_2(x, y, z, t), B_3(x, y, z, t))

is the magnetic field, the function ρ is the charge density, and

j = (J_1(x, y, z, t), J_2(x, y, z, t), J_3(x, y, z, t))

is the current. We showed that there always exist a scalar potential function φ(x, y, z, t) and a vector potential A = (A_1(x, y, z, t), A_2(x, y, z, t), A_3(x, y, z, t)) such that

E = −∇φ − ∂A/∂t
B = ∇ × A.

We then recast these equations into the language of differential forms. We defined the electromagnetic 2-form to be

F = E_1 dx ∧ dt + E_2 dy ∧ dt + E_3 dz ∧ dt + B_1 dy ∧ dz + B_2 dz ∧ dx + B_3 dx ∧ dy

and the potential 1-form to be

A = −φ dt + A_1 dx + A_2 dy + A_3 dz,

with the relation

F = dA.

Encoding the charge density ρ and the current j as the 1-form

J = −ρ dt + J_1 dx + J_2 dy + J_3 dz,

we saw that Maxwell's equations could be written as

dF = 0
⋆ d ⋆ F = J.

We further saw that the function

L = (⋆ J) ∧ A + (1/2)(⋆ F) ∧ F

serves as a Lagrangian for Maxwell's equations, meaning that the Euler-Lagrange equations for finding the critical points of

∫_{R^4} L dx dy dz dt

are Maxwell's equations.

20.2. Maxwell for Connections and Curvature

We can now recast Maxwell's equations into the language of connections and curvature. The end result will be that the potential one-form will be a connection matrix and the electromagnetic two-form will be the curvature of this connection. This may seem to be a mere reshuffling of definitions. Its importance is that this will create a natural language for deep generalizations that profoundly impact physics and mathematics.

Let E be a trivial real line bundle over R^4. Choose a section s that is nowhere zero. To define a connection, let ω be a 1-form and define the connection for E by setting, for any function f on R^4,

∇(f s) = f · ω · s + d f · s.

The connection matrix is simply the 1 × 1 matrix (ω). Then the curvature of this connection will be

F = ω ∧ ω − dω
  = −dω,

since the wedge of a 1-form with itself is always zero. For convenience, set A = −ω. Then F = dA.
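In coordinates, F = dA can be computed from the antisymmetrized partial derivatives F_μν = ∂_μ A_ν − ∂_ν A_μ, so that F = Σ_{μ<ν} F_μν dx_μ ∧ dx_ν. Here is a Python/sympy sketch (sympy assumed; the potential below is made up purely for illustration, with the coordinates ordered (t, x, y, z)); the last line checks that dF = 0 holds automatically.

import sympy as sp
import itertools

t, x, y, z = sp.symbols('t x y z')
coords = (t, x, y, z)

A = (-x*y, t*z, x*y, y*z)    # made-up coefficients (A_0, A_1, A_2, A_3)

F = sp.Matrix(4, 4, lambda mu, nu: sp.diff(A[nu], coords[mu]) - sp.diff(A[mu], coords[nu]))
print(F)                     # the antisymmetric 4 x 4 matrix of coefficients of dA

# dF = 0: for all lam < mu < nu, ∂_lam F_mu_nu + ∂_mu F_nu_lam + ∂_nu F_lam_mu = 0.
print(all(sp.simplify(sp.diff(F[m, n], coords[l])
                      + sp.diff(F[n, l], coords[m])
                      + sp.diff(F[l, m], coords[n])) == 0
          for l, m, n in itertools.combinations(range(4), 3)))   # True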

We want to identify the curvature F with the electromagnetic two-form. The curvature F will correspond to the electromagnetic two-form if

dF = 0
⋆ d ⋆ F = J,

where J is the one-form encoding charge density and current. We always have dF = 0 since F = dA and d(d) is always zero. The extra condition can be phrased either as ⋆ d ⋆ F = J or, as discussed in Chapter 11, as finding the critical points for the Lagrangian

L = (⋆ J) ∧ A + (1/2)(⋆ F) ∧ F.

Thus we can indeed describe Maxwell's equations as follows: Given a charge density and a current, we have a fixed current one-form J. Among all possible connections for the trivial line bundle E over the manifold M = R^4,

E
↓
M,

we choose those connections A whose corresponding curvature F = dA satisfies ⋆ d ⋆ F = J, or those that are critical points for the Lagrangian L = (⋆ J) ∧ A + (1/2)(⋆ F) ∧ F.

The key for generalizing is that the field of the force, here the electric and magnetic fields, is described as the curvature of a connection, which we write for emphasis as

Force = Curvature

20.3. Exercises

Exercise 20.3.1. For the trivial line bundle E over the manifold M = R4,

consider the connection 1-form

ω = −xzdt + (x2 − yt)dx + (x + zt2)dy + (z2t − y)dz.

Compute the curvature 2-form F = −dω and show

dF = 0

� d � F = −2zdt − 2zdy.

Compare this to Exercise 2.3.5

Exercise 20.3.2. For the trivial line bundle E over the manifold M = R4,consider the connection 1-form

ω = −dt + (x − yt)dx + (x + zt2)dy + (z2t − y)dz.

Compute the curvature 2-form F = dA = −dω. By computing dF and� d � F, determine whether the connection A is from an electromagnetic fieldwith

E = (tx , xy, z2)

B = (y, xy, zt2)

ρ = t2xyz

j = (x + y, y + z + t , t + x2z).

Page 282: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

21

The Lagrangian Machine, Yang-Mills, andOther Forces

Summary: In this final chapter, we generalize our description of Maxwell’sequations via connections and curvature to motivate a description of the weakforce and the strong force. This, in turn, will motivate Yang-Mills equations,which have had a profound effect not only on physics but also on geometry. Inall of these there is the theme that “Force = Curvature.”

21.1. The Lagrangian Machine

Here we set up a general framework for understanding forces. (This sectionis heavily indebted to section 5.13 of Sternberg’s Group Theory and Physics[63].)

Start with a vector bundle E over a manifold M:

E↓M

The current belief is that all forces can be cast into the language of theLagrangian Machine. Specifically, the Lagrangian Machine is a function

L : Connections(E) ×�(M , E) → C∞(M),

where Connections(E) is the space of connections on the bundle E , �(M , E)is the space of sections from M to E , and C∞(M) are the smooth functions onM .

Here is how the real world enters this picture. The base manifold M is ourworld, the world we see and hence should be four-dimensional space-time R4.The vector bundle E encodes other information. The forces enter the pictureby choosing a connection. In electricity and magnetism, we have seen that theconnection is a potential for the corresponding force; the actual force will bethe curvature of this connection. This will turn out also to hold for the weak

267

Page 283: Electricity and magnetism for mathematicians : a guided path from Maxwell's equations to Yang-Mills

268 21 The Lagrangian Machine, Yang-Mills, and Other Forces

force and the strong force. The path of a particle will be a section. Suppose wewant to know the path of a particle going from a point p ∈ M to a point q ∈ M .We know the various forces that exist, allowing us to choose a connection. Weknow the extra information about the particle at both p and q , which meansthat we are given points in the fiber of E over both p and q . Then using thefunction L, choose the section, whose projection to M is the actual path of theparticle, by requiring that the function L has a critical point for this section.

This is precisely the language used in the last chapter to describe Maxwell’sequations. Further, our earlier work on describing classical mechanics viaLagrangians can easily be recast into the Lagrangian Machine. (To a largeextent, this is precisely what we have done.)

21.2. U(1) Bundles

We first look at a more quantum mechanical approach to the LagrangianMachine as applied to Maxwell.

We stick with our base manifold being M = R4. Quantum mechanics, as

we saw, is fundamentally a theory over the complex numbers. This motivatesreplacing our original trivial real line bundle E with a trivial complex linebundle.

Thus we should consider

R4 ×C

↓R

4.

Finally, in quantum mechanics the state of a particle is defined only up toa phase factor eiα. This suggests that we want to think of our trivial vectorbundle as a U (1) bundle, where U (1) is the unitary group of points on the unitcircle:

U (1) = {eiα : α ∈R},with the group action being multiplication.

Hence, in an almost line-for-line copying from the last section, we have a trivial complex line bundle over R⁴. Choose a section s that is nowhere zero. Let A be a 1-form. Define a connection for E by setting, for any function f on R⁴,

∇(f s) = s ∧ i f A + s ∧ df.

The connection matrix is now iA. (It is traditional for physicists to put in this extra factor of i, possibly so that it is clear we are now dealing with complex line bundles.) Then the curvature of this connection will be

F = iA ∧ iA + i d(A) = i dA,

since the wedge of a 1-form with itself is always zero. As before, the curvature F corresponds to the electromagnetic 2-form if

dF = 0,
⋆ d ⋆ F = J,

where J is the 1-form encoding charge density and current.
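The vanishing of iA ∧ iA used above can be checked directly in coordinates (a standard computation, spelled out here for convenience rather than quoted from the text): for any 1-form A = Σᵢ aᵢ dxⁱ,

\[
A \wedge A \;=\; \sum_{i,j} a_i a_j \, dx^{i} \wedge dx^{j}
\;=\; \sum_{i<j} (a_i a_j - a_j a_i)\, dx^{i} \wedge dx^{j} \;=\; 0,
\]

so iA ∧ iA = −(A ∧ A) = 0 and the curvature does reduce to F = i dA.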

21.3. Other Forces

The modern approach to understanding a force is to attempt to find the appropriate Lagrangian Machine. This involves identifying the correct vector bundle and even more so the correct Lagrangian. Attempts to unify forces become mathematically the attempt to put two seemingly different forces into the same Lagrangian Machine.

There are three known forces besides electricity and magnetism: the weak force, the strong force, and gravity. The weak force, for example, is what causes a lone neutron to split spontaneously into a proton, an electron, and an electron antineutrino. The strong force is what keeps the nucleus of an atom stable, despite being packed with a number of protons, all with positive charge and hence with the desire to be blown apart. And, of course, gravity is what keeps us grounded on Earth.

All these forces have Lagrangian interpretations. In the 1960s and 1970s the weak force was unified with the electromagnetic force. The very language of this unification was in terms of gauge theory, which is the physicists' version of connections on a vector bundle. Soon afterward, a common framework, called the standard model, was given to include also the strong force.

Linking these three forces with gravity is still a mystery. The best current thinking falls under the name of string theory. Unfortunately, any type of experimental evidence for string theory seems a long way off. It has, however, generated some of the most beautiful and important mathematics since the 1980s.


Both the weak force and the strong force are viewed as non-Abelian theories. (Abelian is another term for commutative.) Unlike the Lagrangian Machine for electromagnetism, for each of these forces the corresponding vector bundles are not trivial complex line bundles but instead bundles of higher rank. For the weak force, the bundle E will be of rank two, while for the strong force it will be of rank three. The corresponding transition functions then cannot be made up of invertible one-by-one matrices (complex-valued functions) but instead must be matrices. For the weak force, the transition matrices will be in the special unitary group of two-by-two matrices, SU(2), while for the strong force, the transition matrices will be in the special unitary group of three-by-three matrices, SU(3). The standard model has as its transition matrices elements of U(1) × SU(2) × SU(3).
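Concretely, "special unitary" just means unitary with determinant one. The following minimal numerical sketch (ours, not the book's; NumPy and the helper name is_special_unitary are our own choices for illustration) checks these two conditions for the kind of 2 × 2 matrix that could serve as a weak-force transition matrix:

    import numpy as np

    def is_special_unitary(U, tol=1e-10):
        """Return True when U is (numerically) unitary with determinant 1."""
        U = np.asarray(U, dtype=complex)
        n = U.shape[0]
        unitary = np.allclose(U.conj().T @ U, np.eye(n), atol=tol)
        special = np.isclose(np.linalg.det(U), 1.0, atol=tol)
        return bool(unitary and special)

    # A typical element of SU(2): [[a, -conj(b)], [b, conj(a)]] with |a|^2 + |b|^2 = 1.
    a, b = 0.6 + 0.0j, 0.0 + 0.8j
    U = np.array([[a, -np.conj(b)], [b, np.conj(a)]])

    print(is_special_unitary(U))        # True: unitary and det U = 1
    print(is_special_unitary(1j * U))   # False: still unitary, but det = -1

The same check, run on 3 × 3 matrices, describes the SU(3) transition matrices of the strong force.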

Both the weak force and the strong force are quantum mechanical; that, in the language of the Lagrangian Machine, is why the transition functions are in special unitary groups. Gravity also has a Lagrangian interpretation, but the corresponding transition functions seem to have nothing to do with any type of unitary group; that is one way of saying that the theory of gravity is not yet compatible with quantum mechanics.

But for all of these, it is the case that

Force = Curvature

21.4. A Dictionary

It was a long, slow process for physicists to go from the seemingly nonphysical but mathematically meaningful vector and scalar potentials to the whole machinery of connections and curvature. In fact, at least for the three fields of electromagnetism, the weak force, and the strong force, the rhetoric was in terms of gauges, not connections.

Mathematicians developed the language of connections for vector bundles from questions about curvature. Physicists, on the other hand, developed the language of gauges from questions about forces. It was only in the 1960s and the 1970s that people began to realize that gauges and connections were actually the same. In 1975, Wu and Yang [69] made the dictionary explicit, in the following table, between the physicists' language of gauges and the mathematicians' language of connections, which we reproduce here (since this is lifted from their paper, do not worry very much about what the various symbols mean; we are copying this only to give a flavor as to possible dictionaries):

Gauge field terminology                Bundle terminology
gauge (or global gauge)                principal coordinate bundle
gauge type                             principal fiber bundle
gauge potential b^k_μ                  connection on a principal fiber bundle
S_ba (see Sec. V)                      transition functions
phase factor Φ_QP                      parallel displacement
field strength f^k_μν                  curvature
source J^K_μ                           ?
electromagnetism                       connection on a U(1) bundle
isotopic spin gauge field              connection on a SU2 bundle
Dirac's monopole quantization          classification of U(1) bundle according to first Chern class
electromagnetism without monopole      connection on a trivial U(1) bundle
electromagnetism with monopole         connection on a nontrivial U(1) bundle

Now to break briefly from the standard impersonal writing of math and science. In the early 1980s, as a young graduate student in mathematics, I was concerned with curvature conditions on algebraic vector bundles. Hence I was intensely interested in connections, soon seeing how the abstractions that we have developed here fit naturally into curvature questions. At the same time, it was almost impossible not to hear in the halls of math departments talk about gauges and the links among gauges, bundles, and elementary particles. In fact, I certainly knew that connections were gauges, but for the life of me could not imagine how anyone interested in the real world of physics could have come up with the idea of connections. In the mid-1980s I heard Yang give a talk at Brown University. This is the Yang of the preceding dictionary and, more importantly, one of the prime architects of modern physics, someone who makes up half of the phrase "Yang-Mills," the topic of the next section. In this talk, he spoke about his growing recognition that mathematicians' connections and physicists' gauges were the same. He commented on the fact that it took years for people to realize that these notions were the same, despite the fact that often the mathematicians developing connections and the physicists developing gauge theory worked in the same buildings.¹ Then he said that, for him, it was clear how and why physicists developed the theory of gauges. The surprise for him was that mathematicians could have developed the same idea for purely mathematical reasons. (It must be noted that this is a report on a talk from almost thirty years ago, and hence might be more of a false memory on my part than an accurate reporting.)

21.5. Yang-Mills Equations

We now want to outline Yang-Mills theory. Its influence on modern mathematics can hardly be overestimated. We will use the language of the Lagrangian Machine.

Let E be a vector bundle of rank k on an n-dimensional manifold M. Let ω be a connection matrix for E with respect to some basis of sections for E. There is the corresponding k × k curvature matrix Ω whose entries are 2-forms on the base manifold M. Suppose a metric exists on M. This will allow us to define a Hodge ⋆ operator, which, as we saw earlier, will map 2-forms to (n − 2)-forms. Then we can form a new k × k matrix consisting of (n − 2)-forms, namely ⋆Ω.
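For instance (an illustration of ours, computed for the Euclidean metric on R⁴ with the standard orientation; with the Minkowski metric used earlier in the book some signs change):

\[
\star(dx^{1}\wedge dx^{2}) = dx^{3}\wedge dx^{4}, \qquad
\star(dx^{1}\wedge dx^{3}) = -\,dx^{2}\wedge dx^{4}, \qquad
\star(dx^{1}\wedge dx^{4}) = dx^{2}\wedge dx^{3},
\]

so ⋆ does indeed send 2-forms to (n − 2)-forms when n = 4.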

We are now ready for the key definition.

Definition 21.5.1. The Yang-Mills Lagrangian is the map

YM : Connections on E → R

defined by

YM(ω) = ∫_M Trace(Ω ∧ ⋆Ω).

A connection that is a critical point of this Lagrangian is called a Yang-Mills connection. The Euler-Lagrange equations corresponding to this Lagrangian are the Yang-Mills equations.

The manifolds and bundles that are studied are such that the Yang-Mills Lagrangians are well defined. Note, at the least, that Ω ∧ ⋆Ω is a matrix of n-forms and hence its trace is a single n-form on M, which can indeed be integrated over M to get a number.

¹ I suspect that he was actually thinking of buildings at SUNY at Stony Brook. In the early 1970s, Yang, from the Stony Brook Physics Department, did start to talk to James Simons of the Stony Brook Math Department.


The connections that we study are those that are the critical points of YM. The corresponding Euler-Lagrange equations are the Yang-Mills differential equations.
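To tie this back to electromagnetism (this reduction is a remark of ours, consistent with Section 21.2 but not a display from the text), take the trivial complex line bundle over R⁴ with connection matrix iA, so that Ω is the 1 × 1 matrix (i dA). Then, up to sign and constant conventions,

\[
\mathrm{YM}(\omega) \;=\; \int_{M} \operatorname{Trace}(\Omega \wedge \star\Omega)
\;=\; \int_{M} (i\,dA) \wedge \star (i\,dA)
\;=\; -\int_{M} dA \wedge \star\, dA ,
\]

and the corresponding Euler-Lagrange equation is d ⋆ dA = 0, that is, ⋆ d ⋆ F = 0 for F = dA: the source-free (J = 0) Maxwell equations of Section 21.2. In this sense the Yang-Mills equations are a non-Abelian generalization of Maxwell's equations.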

If we take physics seriously, then we should expect Yang-Mills connections to be important in mathematics, even in non-physical situations. This indeed happens. In particular, Donaldson [14, 15, 16, 36], concentrating on four-dimensional simply connected manifolds with SU(2, C) vector bundles, discovered beautiful mathematical structure that no one had previously suspected.

This process has continued, such as in the equally groundbreaking work of Seiberg-Witten theory [60, 45, 64, 48, 42, 39]. And this revolution will continue.


Bibliography

[1] M. Atiyah, “On the Work of Simon Donaldson,” Proceedings of the International Congress of Mathematicians, Berkeley CA, 1986, American Mathematical Society, pp. 3–6.
[2] M. Atiyah, “On the Work of Edward Witten,” Proceedings of the International Congress of Mathematicians, Kyoto, 1990 (Tokyo, 1991), pp. 31–35.
[3] M. Atiyah, Collected Works Volume 6, Oxford University Press, 2005.
[4] Stephen J. Blundell, Magnetism: A Very Short Introduction, Oxford University Press, 2012.
[5] R. Bott, Collected Papers of Raoul Bott, Volume 4, edited by R. MacPherson, Birkhäuser, 1994.
[6] William Boyce and Richard DiPrima, Elementary Differential Equations and Boundary Value Problems, eighth edition, Wiley, 2004.
[7] J. Buchwald, Electrodynamics from Thomson and Maxwell to Hertz, Chapter 19 in The Oxford Handbook of The History of Physics (edited by J. Buchwald and R. Fox), Oxford University Press, 2013.
[8] J. Buchwald and R. Fox (editors), The Oxford Handbook of the History of Physics, Oxford University Press, 2013.
[9] George Cain and Gunter Mayer, Separation of Variables for Partial Differential Equations: An Eigenfunction Approach (Studies in Advanced Mathematics), Chapman & Hall/CRC, 2005.
[10] S. S. Chern, W. H. Chen and K. S. Lam, Lectures in Differential Geometry, World Scientific, 1999.
[11] J. Coopersmith, Energy, the Subtle Concept: The Discovery of Feynman’s Blocks from Leibniz to Einstein, Oxford University Press, 2010.
[12] H. Corben and P. Stehle, Classical Mechanics, second edition, Dover, 1994.
[13] O. Darrigol, Electrodynamics from Ampère to Einstein, Oxford University Press, 2000.
[14] S. Donaldson, “An application of gauge theory to 4-dimensional topology,” Journal of Differential Topology, Volume 18, Number 2 (1983), pp. 279–315.
[15] S. Donaldson, “Connections, cohomology and the intersection forms of 4-manifolds,” Journal of Differential Geometry, Volume 24, Number 3 (1986), pp. 275–341.
[16] S. Donaldson and P. Kronheimer, The Geometry of Four-Manifolds, Oxford University Press, 1990.
[17] B. d’Espagnat, Conceptual Foundations of Quantum Mechanics, second edition, Westview Press, 1999.
[18] P. Dirac, Principles of Quantum Mechanics, fourth edition, Oxford University Press, 1958 (reprinted 1981).
[19] A. Einstein et al., The Principle of Relativity, Dover, 1952.
[20] Lawrence Evans, Partial Differential Equations, Graduate Studies in Mathematics, Volume 19, American Mathematical Society, 1998.
[21] R. Feynman, R. Leighton and M. Sands, The Feynman Lectures on Physics Volume 1, Addison-Wesley, 1963.
[22] G. Folland, Introduction to Partial Differential Equations, Mathematical Notes, Vol. 17, Princeton University Press, 1976.
[23] G. Folland, Quantum Field Theory: A Tourist Guide for Mathematicians, Mathematical Surveys and Monographs, Volume 149, American Mathematical Society, 2008.
[24] G. Folland, Real Analysis: Modern Techniques and Their Applications, Wiley, 1999.
[25] A. P. French, Special Relativity, Chapman & Hall, 1989.
[26] T. Garrity, All the Mathematics You Missed but Need to Know for Graduate School, Cambridge, 2002.
[27] J. Gray, Henri Poincaré: A Scientific Biography, Princeton University Press, 2012.
[28] B. Greene, The Elegant Universe, Vintage Books, 2000.
[29] P. Gross and P. R. Kotiuga, Electromagnetic Theory and Computation: A Topological Approach, Mathematical Science Research Institute Publication, 48, Cambridge University Press, 2004.
[30] V. Guillemin and A. Pollack, Differential Topology, American Mathematical Society, reprint edition, 2010.
[31] D. M. Ha, Functional Analysis. Volume I: A Gentle Introduction, Matrix Editions, 2006.
[32] D. Halliday and R. Resnick, Physics, third edition, John Wiley and Sons, 1977.
[33] J. H. Hubbard and B. B. Hubbard, Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach, Prentice Hall, 1999.
[34] F. Jones, Lebesgue Integration on Euclidean Space, Jones and Bartlett Learning, revised edition, 2000.
[35] Y. Kosmann-Schwarzbach, The Noether Theorems: Invariance and Conservation Laws in the Twentieth Century, Springer-Verlag, 2011.
[36] H. B. Lawson Jr., The Theory of Gauge Fields in Four Dimensions, American Mathematical Society, 1985.
[37] P. Lorrain and D. Corson, Electromagnetic Fields and Waves, second edition, W. H. Freeman and Company, 1970.
[38] G. Mackey, Mathematical Foundations of Quantum Mechanics, Dover, 2004.
[39] M. Marcolli, Seiberg-Witten Gauge Theory, Hindustan Book Agency, 1999.
[40] J. McCleary, “A topologist’s account of Yang-Mills theory,” Expositiones Mathematicae, Volume 10 (1992), pp. 311–352.
[41] P. Milonni, The Quantum Vacuum: An Introduction to Quantum Electrodynamics, Academic Press, 1994.
[42] J. Moore, Lectures on Seiberg-Witten Invariants, Lecture Notes in Mathematics, Volume 1629, Springer-Verlag, 2001.
[43] T. Moore, A Traveler’s Guide to Spacetime: An Introduction to the Special Theory of Relativity, McGraw-Hill, 1995.
[44] F. Morgan, Real Analysis and Applications: Including Fourier Series and the Calculus of Variations, American Mathematical Society, 2005.
[45] J. Morgan, The Seiberg-Witten Equations and Applications to the Topology of Smooth Four-Manifolds, Princeton University Press, 1995.
[46] A. Moyer, Joseph Henry: The Rise of an American Scientist, Smithsonian Institution Press, 1997.
[47] D. Neuenschwander, Emmy Noether’s Wonderful Theorem, Johns Hopkins University Press, 2011.
[48] L. Nicolaescu, Notes on Seiberg-Witten Theory, Graduate Studies in Mathematics, Vol. 28, American Mathematical Society, 2000.
[49] P. Olver, Applications of Lie Groups to Differential Equations, second edition, Graduate Texts in Mathematics, Vol. 107, Springer-Verlag, 2000.
[50] C. O’Raifeartaigh (editor), The Dawning of Gauge Theory, Princeton University Press, 1997.
[51] A. Pais, Subtle Is the Lord: The Science and the Life of Albert Einstein, Oxford University Press, 1982.
[52] A. Pais, Niels Bohr’s Times: In Physics, Philosophy, and Polity, Oxford University Press, 1991.
[53] A. Pais, Inward Bound, Oxford University Press, 1988.
[54] H. Poincaré, “The Present and the Future of Mathematical Physics,” Bulletin of the American Mathematical Society, 2000, Volume 37, Number 1, pp. 25–38.
[55] J. Powell and B. Crasemann, Quantum Mechanics, Addison-Wesley, 1961.
[56] H. L. Royden, Real Analysis, Prentice Hall, 1988.
[57] W. Rudin, Real and Complex Analysis, McGraw-Hill Science/Engineering/Math, third edition, 1986.
[58] W. Rudin, Functional Analysis, McGraw-Hill Science/Engineering/Math, second edition, 1991.
[59] G. Simmons, Differential Equations with Applications and Historical Notes, McGraw-Hill, 1972.
[60] N. Seiberg and E. Witten, “Monopole Condensation and Confinement in N = 2 Supersymmetric Yang-Mills Theory,” Nuclear Physics, B426 (1994).
[61] M. Spivak, Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus, Westview Press, 1971.
[62] F. Steinle, Electromagnetism and Field Theory, Chapter 18 in The Oxford Handbook of The History of Physics (edited by J. Buchwald and R. Fox), Oxford University Press, 2013.
[63] S. Sternberg, Group Theory and Physics, Cambridge University Press, 1994.
[64] C. Taubes and R. Wentworth, Seiberg-Witten and Gromov Invariants for Symplectic 4-Manifolds, International Press of Boston, 2010.
[65] R. Tolman, Relativity, Thermodynamics and Cosmology, Oxford University Press, 1934.
[66] R. A. R. Tricker, The Contributions of Faraday and Maxwell to Electrical Science, Pergamon Press, 1966.
[67] F. Verhulst, Henri Poincaré: Impatient Genius, Springer-Verlag, 2012.
[68] R. O. Wells Jr., Differential Analysis on Complex Manifolds, third edition, Springer-Verlag, 2010.
[69] T. Wu and C. Yang, “Concept of nonintegrable phase factors and global formulation of gauge fields,” Physical Review D, Volume 12, Number 12 (1975), pp. 3845–3857.
[70] E. Zeidler, Quantum Field Theory. I: Basics in Mathematics and Physics, Springer-Verlag, 2006.
[71] E. Zeidler, Quantum Field Theory. II: Quantum Electrodynamics, Springer-Verlag, 2009.
[72] E. Zeidler, Quantum Field Theory. III: Gauge Theory, Springer-Verlag, 2009.

