
Martin Scholtz

Classical Mechanics and Dynamical Systems

With calculations in Mathematica

December 27, 2012


Department of Applied Mathematics
Faculty of Transportation Sciences

Czech Technical University in Prague

Contents

1 Classical mechanics
  1.1 Newton’s laws of motion
  1.2 Index notation
    1.2.1 Einstein’s convention
    1.2.2 Differentiation
    1.2.3 Examples
  1.3 Kinetic energy
  1.4 Potential energy
  1.5 Conservation of energy
  1.6 Conservation of momentum
  1.7 Conservation of angular momentum
  1.8 Curvilinear coordinates
    1.8.1 Polar coordinates
    1.8.2 Spherical coordinates

2 Lagrange equations
  2.1 Motivation
  2.2 Lagrange equations of the second kind
    2.2.1 Generalized coordinates
    2.2.2 Kinetic energy
    2.2.3 Generalized forces
    2.2.4 Derivation
  2.3 Lagrange equations
  2.4 Particle in homogeneous gravitational field
  2.5 Harmonic oscillator
  2.6 Mathematical pendulum
  2.7 Lagrange equations in Mathematica
  2.8 Solving the equations of motion of pendulum
  2.9 Deriving the Lagrangian in Mathematica
  2.10 Planet in gravitational field

3 Hamilton’s equations
  3.1 Motivation
  3.2 Legendre transformation
  3.3 Hamilton’s equations
  3.4 Particle in homogeneous gravitational field
  3.5 Conservation of energy
    3.5.1 Homogeneous functions and Hamiltonian
    3.5.2 Conservation of energy
  3.6 Phase space
  3.7 Harmonic oscillator

4 Variational principle
  4.1 Fermat’s principle
  4.2 Formulation of variational problem
  4.3 Variation of the functional
  4.4 Euler-Lagrange equations
  4.5 Non-uniqueness of the Lagrangian
  4.6 Variational derivation of Hamilton’s equations
  4.7 Noether’s theorem: motivation
  4.8 Noether’s theorem: proof
  4.9 Basic conservation laws

5 Hamilton-Jacobi equation
  5.1 Canonical transformations
  5.2 Hamilton-Jacobi equation
  5.3 Example: harmonic oscillator
  5.4 Action-angle variables

6 Electromagnetic field
  6.1 Lagrangian and equations of motion
  6.2 Hamilton’s equations
  6.3 Mathematica
  6.4 Homogeneous fields
  6.5 Electromagnetic wave
  6.6 Electrostatic wave

7 Discrete dynamical systems and fractals
  7.1 Complex sequences
  7.2 Mandelbrot set

8 Dynamical systems
  8.1 Definition
  8.2 Example
  8.3 Implementation in Mathematica
  8.4 Chaotic pendulum
  8.5 Critical points of the pendulum
  8.6 Stability of critical points
    8.6.1 Example
  8.7 Classification of critical points
    8.7.1 Stable and unstable nodes, saddle points
    8.7.2 Centres and foci
  8.8 General case
  8.9 Examples
  8.10 Flow of the vector field
  8.11 Lyapunov stability

9 Bifurcations
  9.1 Saddle-node bifurcation
  9.2 Transcritical bifurcations
  9.3 Pitchfork bifurcation
  9.4 Example

A Important commands in Mathematica
  A.1 D-derivative
  A.2 Table

B Some features of Mathematica
  B.1 Rules of replacement
  B.2 Functions
  B.3 Pure functions
  B.4 Expressions
  B.5 Working with heads

C Shortcuts in Mathematica
  C.1 Greek letters

D To do

1 Classical mechanics

Classical mechanics is the most basic part of physics. In fact, physics as an exact science started with the development of mechanics by Sir Isaac Newton. Conventionally we distinguish two parts of mechanics: kinematics and dynamics. In this chapter we introduce the basic notions of dynamics, including the notion of generalized coordinates and several conventions used throughout the textbook.

The word “kinematics” is derived from the Greek word κινειν (kinein), meaning “to move”. Thus, kinematics studies the motion of bodies and point masses. It does not, however, ask why the bodies move in a given way; rather, it provides us with a description of the motion. In kinematics we ask where the bodies are and with what velocities and accelerations they move. We also classify the kinds of motion according to the shapes of the trajectories or according to the velocities. Typical kinematic quantities are position, velocity and acceleration.

In dynamics, on the other hand, we study the causes of motion. The word “dynamics” has an ancient origin as well: δυναμικός means “powerful”. In this branch of mechanics we ask what forces act on the bodies and how these forces influence the motion. This influence depends not only on the forces themselves, but also on the masses of the bodies. Mass, force and momentum are among the basic quantities of dynamics.

1.1 Newton’s laws of motion

In this chapter we start with a reformulation of non-relativistic classical, i.e. Newtonian, dynamics. In Newtonian dynamics, physical bodies or idealized point particles move and interact according to Newton’s laws of motion. Newton himself formulated them in his Mathematical Principles of Natural Philosophy as follows:

• Law of inertia
Every body persists in its state of being at rest or of moving uniformly straight forward, except insofar as it is compelled to change its state by force impressed.

• Law of force
The alteration of motion is ever proportional to the motive force impressed; and is made in the direction of the right line in which that force is impressed.

• Law of action and reaction
To every action there is always an equal and opposite reaction: or the forces of two bodies on each other are always equal and are directed in opposite directions.

These laws involve the important notions of force, momentum and mass, and we assume that the reader is familiar with them. Consider a point particle of mass m. Choosing some fixed point O (the origin) in space, we can describe the motion of the particle by its position vector (radius vector) $\mathbf{r}$. The position vector is time-dependent if the body is moving with respect to the origin O, which is mathematically denoted by $\mathbf{r} = \mathbf{r}(t)$. The trajectory of the point particle is the set of all end-points of the position vector in some time interval (see fig. 1.1). The velocity is defined as the derivative of $\mathbf{r}(t)$ with respect to time:

$$\mathbf{v}(t) = \frac{d\mathbf{r}}{dt}.$$

The total derivative with respect to time will often be denoted by a “dot”, so that the last equation is briefly written as $\mathbf{v} = \dot{\mathbf{r}}(t) = \dot{\mathbf{r}}$. Sometimes it is useful to parametrize the position vector by a parameter other than time, for example by the length of the trajectory.

Similarly, the second derivative with respect to time will be denoted by a “double-dot”. The most important example is the definition of acceleration, which is the second derivative of the position vector with respect to time:

$$\mathbf{a} = \frac{d^2\mathbf{r}}{dt^2} = \frac{d\mathbf{v}}{dt} = \ddot{\mathbf{r}} = \dot{\mathbf{v}}.$$

The quantities $\mathbf{r}$, $\mathbf{v}$ and $\mathbf{a}$ are so-called kinematic quantities. They describe the motion independently of its causes. According to Aristotle, motion is caused by forces, but this is wrong. Aristotle’s opinion was so influential that it stopped progress in physics for the next two thousand years. The experimental research of Galileo Galilei, his discovery of the law of inertia, and finally the grand work of Isaac Newton laid the foundations of modern physics.

Why is Aristotle’s point of view wrong? Well, we have to clarify what we mean by the statement that “motion is caused by forces”. The law of inertia says that if there is no force, the body will move uniformly along a straight line. We need a force to change the motion, not to preserve it. Therefore there is no direct connection between force and velocity, but there must be a relation between force and acceleration. This crucial point was missed by Aristotle.

[Figure 1.1: origin O, point P (position at time t), radius vector r(t), trajectory, and velocity v(t).]

Fig. 1.1. The point mass is moving along the trajectory. Its position at time t is given by the position vector $\mathbf{r}(t)$. The velocity is $\mathbf{v} = \dot{\mathbf{r}}$.

The precise form of the relation between force and acceleration is given by Newton’s second law, the law of force. We expect that the acceleration has the same direction as the force, and that a bigger force will cause a bigger acceleration. Experience teaches us that we need a bigger force to change the motion of heavier bodies, so the acceleration must be inversely proportional to the mass. This simple consideration leads us directly to the suggestion

$$\mathbf{a} = \frac{\mathbf{F}}{m},$$

where m is the mass of the body and $\mathbf{F}$ is the force acting on the body. The last formula is a mathematical expression of Newton’s second law, and experiments show that it agrees very well with reality, although it fails for high velocities, strong gravitational fields and microscopic objects.

We can formulate this law in a slightly different form by defining the (linear) momentum $\mathbf{p}$ of the body:

$$\mathbf{p} = m\mathbf{v}.$$

Momentum incorporates both the measure of inertia (mass) and the “state of motion” (velocity). The force can then be defined as the rate of change of the momentum in time, i.e.

$$\mathbf{F} = \frac{d\mathbf{p}}{dt}.$$

If the mass of the body is constant in time, we have $\dot{\mathbf{p}} = m\dot{\mathbf{v}} = m\mathbf{a}$, which is again Newton’s law

$$\mathbf{F} = m\mathbf{a}.$$

1.2 Index notation

In the previous section we defined the position vector, velocity and the other quantities geometrically. For example, by position vector we mean the oriented line segment connecting the origin with a given point, the velocity was defined as a vector tangent to the trajectory, etc. This geometrical language is very convenient and useful and will be developed in more detail throughout the textbook. However, if we want to describe the position of a body, we have to introduce a coordinate system in which we can specify the coordinates. In what follows the notion of coordinates will be crucial. In this section we therefore briefly review the basics of the so-called index notation.

We suppose that the reader is familiar with the Cartesian coordinate system. Consider figure 1.2. Again, the point O is the origin of the reference frame and we want to specify the position of the point P. The position vector $\mathbf{r}$ is the oriented line segment connecting the points O and P. Now, choose three lines called x, y and z, which are perpendicular to each other and intersect at the origin O. These lines are called axes. Then to each point P we can assign three real numbers called Cartesian coordinates. Symbolically we write

$$\mathbf{r} = (x, y, z)$$

and say that x, y and z are the coordinates (or the components) of the position vector $\mathbf{r}$. In the index notation we define

$$x_1 = x, \qquad x_2 = y, \qquad x_3 = z.$$

Thus, each point P has three coordinates

$$x_i, \qquad i = 1, 2, 3.$$

If the position vector depends on time, i.e. $\mathbf{r} = \mathbf{r}(t)$, so do its coordinates $x_i$:

$$x_i = x_i(t).$$

The components of the velocity $\mathbf{v} = \dot{\mathbf{r}}$ are then

$$v_i = \dot{x}_i,$$

and the components of the acceleration are

$$a_i = \dot{v}_i = \ddot{x}_i.$$

Any vector equation can then be written equivalently in the index form. For example, Newton’s law of force $\mathbf{F} = m\mathbf{a}$ is equivalent to the equation

$$F_i = m a_i.$$

Substituting for $a_i$ we obtain the index form of the law of force:

$$F_i = m \ddot{x}_i.$$

Recall that the momentum of the particle was defined as $\mathbf{p} = m\mathbf{v}$. In the index notation we can write

$$p_i = m v_i = m \dot{x}_i.$$

1.2.1 Einstein’s convention

Let $\mathbf{x}$ and $\mathbf{y}$ be arbitrary vector quantities with corresponding components $x_i$ and $y_i$. In other words,

$$\mathbf{x} = (x_1, x_2, x_3), \qquad \mathbf{y} = (y_1, y_2, y_3). \tag{1.1}$$

The scalar product of these vectors can be defined as the scalar quantity

$$\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + x_3 y_3.$$

Using the summation symbol Σ, the last equality can be rewritten as

$$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{3} x_i y_i.$$

This notation means that we successively substitute the values 1, 2, 3 for the index i and then add all the terms. Notice that under the summation symbol we have two vector quantities and the index i appears there exactly twice. In fact, expressions of this type arise very often in mathematics and physics. Albert Einstein introduced a convention, named after him, in which we do not write the symbol Σ. More precisely, if an index appears in a term exactly twice, summation over this index is automatically assumed. That is, the scalar product can be written simply as

$$\mathbf{x} \cdot \mathbf{y} = x_i y_i.$$
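The summation convention can be spelled out as an explicit loop over the repeated index. The following is a minimal Python sketch, not part of the original text; the function name `contract` and the sample vectors are illustrative assumptions.

```python
def contract(x, y):
    """Return x_i y_i, i.e. the sum over the repeated index i."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    # The repeated index i runs over all components and the terms are added.
    return sum(xi * yi for xi, yi in zip(x, y))


# Illustrative vectors (hypothetical values, chosen only for this sketch).
a = (2.0, -1.0, 4.0)
b = (0.5, 3.0, 1.0)
print(contract(a, b))  # 2*0.5 + (-1)*3 + 4*1 = 2.0
```

Writing the sum as a loop makes it clear that the name of the summed index is irrelevant: only the fact that it is repeated matters.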


1.2.2 Differentiation

Let us see another example. Suppose that f is a physical quantity depending on the position, i.e.

$$f = f(\mathbf{r}) = f(x, y, z) = f(x_1, x_2, x_3).$$

If the coordinates $x_i$ represent the position of a moving body, the functions $x_i$ depend on time, i.e. $x_i = x_i(t)$. Then the quantity f also depends on time, for we can write

$$f(t) = f(\mathbf{x}(t)).$$

The total derivative of f with respect to time reads

$$\dot{f} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \frac{\partial f}{\partial x_3}\frac{dx_3}{dt} = \sum_{i=1}^{3} \frac{\partial f}{\partial x_i}\frac{dx_i}{dt}.$$

We can see that the expression under the sum again has the same structure: the index i appears there exactly twice. According to Einstein’s convention we therefore write

$$\dot{f} = \frac{\partial f}{\partial x_i}\frac{dx_i}{dt}.$$

For convenience we also introduce the notation

$$\frac{\partial}{\partial x_i} = \partial_i,$$

and together with the notation $\dot{x}_i = dx_i/dt$ we can write simply

$$\dot{f} = \dot{x}_i\, \partial_i f.$$

Obviously, index notation is very compact and brief.
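The chain rule in index form can be checked numerically. The sketch below is an illustration only: the function f, the trajectory and the sample time are hypothetical choices, and the time derivative of f along the trajectory is compared with a central finite difference.

```python
import math


def f(x):
    """Hypothetical scalar field f(x1, x2, x3) = x1*x2 + x3^2."""
    return x[0] * x[1] + x[2] ** 2


def grad_f(x):
    """Partial derivatives (d_1 f, d_2 f, d_3 f) of the field above."""
    return (x[1], x[0], 2.0 * x[2])


def position(t):
    """Hypothetical trajectory x_i(t): a helix."""
    return (math.cos(t), math.sin(t), t)


def velocity(t):
    """Time derivatives of the trajectory above."""
    return (-math.sin(t), math.cos(t), 1.0)


def fdot_chain(t):
    """Chain rule: sum over i of (time derivative of x_i) * (d_i f)."""
    return sum(v * g for v, g in zip(velocity(t), grad_f(position(t))))


def fdot_numeric(t, h=1e-6):
    """Central finite difference of f(x(t)) with respect to t."""
    return (f(position(t + h)) - f(position(t - h))) / (2.0 * h)


t0 = 0.7
print(fdot_chain(t0), fdot_numeric(t0))  # the two values should agree closely
```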

1.2.3 Examples

Example 1: scalar product

Let x and y be vectors with the components

x = (1, 2, 3), y = (−5, 1, 1). (1.2)

Find the scalar product x · y.


[Figure 1.2: Cartesian axes x, y, z with origin O, point P(x, y, z) and position vector r.]

Fig. 1.2. Position vector in Cartesian coordinates.

Solution. Both vectors have three components, so that the index i takes the values

$$i = 1, 2, 3.$$

The scalar product is defined as

$$\mathbf{x} \cdot \mathbf{y} = x_i y_i.$$

Since the index i appears twice in the previous expression, according to Einstein’s summation convention we have

$$x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3.$$

Substituting the values (1.2) we find

$$x_i y_i = 1 \cdot (-5) + 2 \cdot 1 + 3 \cdot 1 = 0.$$

Thus, the scalar product of the given vectors vanishes:

$$\mathbf{x} \cdot \mathbf{y} = 0.$$

In such a case the vectors $\mathbf{x}$ and $\mathbf{y}$ are said to be mutually orthogonal.


Example 2: divergence

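Let

$$\mathbf{v} = (v_1, v_2, v_3)$$

be a given vector field in Cartesian coordinates. Write down the expression

$$\partial_i v_i$$

explicitly. The expression $\partial_i v_i$ is called the divergence of the vector field $\mathbf{v}$. Evaluate the resulting expression for the vector field

$$\mathbf{v} = (x - y,\; x^2 + y^2,\; xy). \tag{1.3}$$

Solution. According to Einstein’s convention, the expression $\partial_i v_i$ is a sum over i:

$$\partial_i v_i = \partial_1 v_1 + \partial_2 v_2 + \partial_3 v_3$$

or, equivalently,

$$\partial_i v_i = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y} + \frac{\partial v_3}{\partial z}.$$

Substituting (1.3) we find

$$\partial_1 v_1 = 1, \qquad \partial_2 v_2 = 2y, \qquad \partial_3 v_3 = 0, \tag{1.4}$$

so that the divergence is

$$\partial_i v_i = 1 + 2y.$$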
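The result of Example 2 can be cross-checked numerically. The sketch below approximates each partial derivative by a central difference and sums over the repeated index; the helper name `divergence`, the step size and the sample point are assumptions made for this illustration.

```python
def field(p):
    """Vector field (1.3): v = (x - y, x^2 + y^2, x*y)."""
    x, y, z = p
    return (x - y, x**2 + y**2, x * y)


def divergence(v, p, h=1e-5):
    """Approximate d_i v_i at point p by central differences."""
    total = 0.0
    for i in range(3):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        # d_i v_i: differentiate the i-th component along the i-th axis
        total += (v(plus)[i] - v(minus)[i]) / (2.0 * h)
    return total


point = (0.3, 1.5, -2.0)                 # hypothetical sample point
print(divergence(field, point))          # numerical estimate
print(1.0 + 2.0 * point[1])              # analytic result 1 + 2y
```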

Example 3: The Laplace operator

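The Laplace operator or the Laplacian is an operator ∆ defined as

$$\Delta f = \partial_i \partial_i f,$$

where f is an arbitrary object (a scalar function or a component of a vector). Find the expression for the Laplacian in Cartesian coordinates.

Solution. Again, we use Einstein’s convention in a straightforward way:

$$\Delta f = \partial_i \partial_i f = \partial_1\partial_1 f + \partial_2\partial_2 f + \partial_3\partial_3 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.$$

Thus, the Laplacian reads

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}.$$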

Example 4: radius vector

Let

$$\mathbf{r} = (x, y, z)$$

be a position (radius) vector. Its magnitude is given by

$$r = \sqrt{\mathbf{r} \cdot \mathbf{r}} = \sqrt{x^2 + y^2 + z^2}.$$

Find the derivatives of the magnitude r with respect to the Cartesian coordinates and write down the result in the index notation.

Solution. We need to evaluate the quantities $\partial_i r$. We start with $\partial_1 r$, i.e. with the derivative with respect to the coordinate x. We have

$$\frac{\partial r}{\partial x} = \partial_x (x^2 + y^2 + z^2)^{1/2} = \frac{1}{2}(x^2 + y^2 + z^2)^{-1/2}\, 2x = \frac{x}{\sqrt{x^2 + y^2 + z^2}}.$$

Thus, we arrive at

$$\frac{\partial r}{\partial x} = \frac{x}{r}.$$

Similarly, one can show that for the other coordinates the following holds:

$$\frac{\partial r}{\partial y} = \frac{y}{r}, \qquad \frac{\partial r}{\partial z} = \frac{z}{r}.$$

All these results can be summarized in the index notation as

$$\partial_i r = \frac{x_i}{r}.$$
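As a quick numerical sanity check of this result, the derivatives of the magnitude r can be approximated by central differences and compared with the ratios of the coordinates to r. This is a sketch only; the sample point and the helper names are hypothetical.

```python
import math


def magnitude(p):
    """Length r of the vector p."""
    return math.sqrt(sum(c * c for c in p))


def grad_magnitude_numeric(p, h=1e-6):
    """Approximate (d_1 r, d_2 r, d_3 r) at p by central differences."""
    grads = []
    for i in range(3):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        grads.append((magnitude(plus) - magnitude(minus)) / (2.0 * h))
    return grads


point = (1.0, -2.0, 2.0)                 # hypothetical point with r = 3
r = magnitude(point)
expected = [c / r for c in point]        # x_i / r from the formula above
numeric = grad_magnitude_numeric(point)
print(expected)
print(numeric)
```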


Example 5: time dependence

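Now let the radius vector from the previous example be time-dependent in such a way that

$$x(t) = b \cos\omega t, \qquad y(t) = b \sin\omega t, \qquad z(t) = 0, \tag{1.5}$$

where b and ω are positive constants. Find the velocity $\mathbf{v} = \dot{\mathbf{r}}$, the acceleration $\mathbf{a} = \dot{\mathbf{v}}$ and their magnitudes.

Solution. Since $\mathbf{r} = (x, y, z)$, we can find the velocity by straightforward differentiation:

$$\dot{x} = -b\,\omega \sin\omega t, \qquad \dot{y} = b\,\omega \cos\omega t, \qquad \dot{z} = 0. \tag{1.6}$$

The velocity is then

$$\mathbf{v} = (-b\,\omega \sin\omega t,\; b\,\omega \cos\omega t,\; 0).$$

Its magnitude is

$$v = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{(b\,\omega \sin\omega t)^2 + (b\,\omega \cos\omega t)^2} = \omega b.$$

Differentiating the velocity we find the acceleration,

$$\mathbf{a} = (-b\,\omega^2 \cos\omega t,\; -b\,\omega^2 \sin\omega t,\; 0),$$

and its magnitude:

$$a = \sqrt{\mathbf{a} \cdot \mathbf{a}} = \omega^2 b.$$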
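The magnitudes found in this example can be verified numerically by differentiating the circular trajectory with central differences. In the sketch below, b denotes the amplitude that appears in the solution; the numerical values of b and ω, the step size and the helper names are assumptions chosen for illustration.

```python
import math

b, omega = 2.0, 3.0   # hypothetical radius and angular frequency


def position(t):
    """Circular motion in the xy-plane with radius b."""
    return (b * math.cos(omega * t), b * math.sin(omega * t), 0.0)


def derivative(func, t, h=1e-4):
    """Central-difference time derivative of a vector-valued function."""
    fp, fm = func(t + h), func(t - h)
    return tuple((p - m) / (2.0 * h) for p, m in zip(fp, fm))


def velocity(t):
    return derivative(position, t)


def acceleration(t):
    return derivative(velocity, t)


def norm(u):
    return math.sqrt(sum(c * c for c in u))


t0 = 0.4
print(norm(velocity(t0)), omega * b)          # both close to omega * b
print(norm(acceleration(t0)), omega**2 * b)   # both close to omega^2 * b
```

Both magnitudes come out independent of the chosen time, as expected for uniform circular motion.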

Example 6: The total differential

Prove the relation

$$\frac{d}{dt} v^2 = 2\, \mathbf{v} \cdot \dot{\mathbf{v}}, \tag{1.7}$$

preferably in the index notation.

Standard solution. First we prove (1.7) in the usual way. If the components of $\mathbf{v}$ are

$$\mathbf{v} = (v_1, v_2, v_3),$$

its total derivative with respect to time is

$$\dot{\mathbf{v}} = (\dot{v}_1, \dot{v}_2, \dot{v}_3).$$

The square of the magnitude of $\mathbf{v}$ is

$$v^2 = v_1^2 + v_2^2 + v_3^2.$$

Let us differentiate it with respect to time:

$$\frac{d}{dt} v^2 = 2 v_1 \dot{v}_1 + 2 v_2 \dot{v}_2 + 2 v_3 \dot{v}_3 = 2\, \mathbf{v} \cdot \dot{\mathbf{v}}.$$

Solution in index notation. The proof is essentially identical to the previous one, but more compact:

$$\frac{d}{dt} v^2 = \frac{d}{dt}(v_i v_i) = 2 v_i \dot{v}_i = 2\, \mathbf{v} \cdot \dot{\mathbf{v}}.$$

1.3 Kinetic energy

The notion of energy is somewhat subtle and its full understanding relies on the theorems of Emmy Noether, see sections 4.7 and 4.8 in chapter 4. In mechanics, however, the situation is quite simple. Roughly speaking, a body has energy if it can perform work.

A force can cause the displacement of a body, and the quantity known as “work” is a quantitative characteristic of this process. Suppose that the force $\mathbf{F}$ displaces the body along a given trajectory from point A to point B. The work done by this force is defined by

$$W = \int_A^B \mathbf{F} \cdot d\mathbf{r}.$$

Suppose, in addition, that the body was at rest at point A, while the final velocity of the body was $\mathbf{v}$. Let us evaluate the work done by the force in terms of the final velocity. Using Newton’s law in the form

$$\mathbf{F} = m \dot{\mathbf{v}}$$

we have

$$W = \int_A^B m \frac{d\mathbf{v}}{dt} \cdot d\mathbf{r}.$$

Notice that the infinitesimal displacement $d\mathbf{r}$ is related to the velocity via

$$d\mathbf{r} = \mathbf{v}\, dt,$$

so that

$$W = m \int_A^B \mathbf{v} \cdot d\mathbf{v}.$$

Now we can use relation (1.7) to find

$$W = m \int_A^B \frac{1}{2}\, d(v^2).$$

The last expression integrates to

$$W = \frac{1}{2} m \left(v_B^2 - v_A^2\right) = \frac{1}{2} m v^2,$$

where we have used the assumptions

$$v_A = 0, \qquad v_B = v.$$

Let us analyze the result

$$W = \frac{1}{2} m v^2$$

in some detail. First, we can see that the work W done by the force $\mathbf{F}$ does not depend on the trajectory. It does not matter whether the body was moving along a straight line or along a curved trajectory; the result depends only on the final velocity v. Moreover, the work does not depend on the character of the motion: the body could be accelerated uniformly with constant acceleration, or it could be accelerated with variable acceleration, but the work depends only on the final velocity. And, finally, the work does not depend on the details of the force. The force could be small and act for a long time, or it could be big and act only for a moment, but the work depends only on the final velocity.

Summa summarum, if the body was at rest at the beginning and had velocity v at the end, the work needed to accelerate the body is always the same, regardless of how it was accelerated. This work is called the kinetic energy and is defined by

$$T = \frac{1}{2} m v^2 = \frac{1}{2} m\, \dot{x}_i \dot{x}_i. \tag{1.8}$$

A body has kinetic energy if it is moving, and the kinetic energy is equal to the work which must be done by the force to accelerate the body from rest to velocity v.
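The claim that the work depends only on the final velocity can be illustrated with a small one-dimensional simulation. The sketch below is not from the original text: the mass, the two force profiles and the function name `work_from_rest` are hypothetical. Both forces deliver the same total impulse, so the body ends with the same velocity, and the accumulated work is compared with the kinetic energy.

```python
def work_from_rest(force, mass, t_end, steps=100_000):
    """Accumulate the work F * dx for a body starting at rest (1D motion).

    `force` is a function of time. Returns (work, final_velocity).
    """
    dt = t_end / steps
    v = 0.0
    work = 0.0
    for k in range(steps):
        f = force((k + 0.5) * dt)            # force sampled at mid-step
        v_new = v + f / mass * dt            # Newton's law: dv = (F / m) dt
        work += f * 0.5 * (v + v_new) * dt   # F * dx, dx = average velocity * dt
        v = v_new
    return work, v


def constant_push(t):
    """Steady force acting for the whole interval."""
    return 4.0


def ramp_push(t):
    """Growing force with the same total impulse over 0 <= t <= 3."""
    return 8.0 * t / 3.0


m = 2.0   # hypothetical mass
for name, force in (("constant", constant_push), ("ramp", ramp_push)):
    work, v = work_from_rest(force, m, t_end=3.0)
    print(name, round(work, 6), round(0.5 * m * v**2, 6))
```

In this discretization each work increment equals the change of (1/2) m v² over the step, which is the discrete counterpart of relation (1.7); both profiles should therefore report a work close to 36 for a final velocity close to 6.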

1.4 Potential energy

The notion of potential energy is a subtle one and it cannot be defined for a general system, as we will see later in this textbook. Suppose that there is some force acting on the particle. It can be the gravitational force, the electromagnetic force or the force which a spring exerts on a point mass attached to one of its endpoints. In the cases just enumerated, we can give explicit expressions for the force. The magnitude of the gravitational force is given by Newton’s law of gravitation

$$F_g = G \frac{m_1 m_2}{r^2},$$

where $m_1$ and $m_2$ are the masses of the bodies, r is their distance and G is the gravitational constant. The electromagnetic force acting on a point charge q moving with velocity $\mathbf{v}$ in an electromagnetic field characterized by the electric field $\mathbf{E}$ and the magnetic field $\mathbf{B}$ is the so-called Lorentz force

$$\mathbf{F}_{EM} = q\,(\mathbf{E} + \mathbf{v} \times \mathbf{B}).$$

When a spring is displaced from its equilibrium position by y, it exerts the force

$$F = -k\, y,$$

where k is a constant characterizing the spring. Hence, one way to characterize a force is to give an explicit expression for it. Since the force is a vector quantity, we have to specify its three components $F_x$, $F_y$ and $F_z$.

In the previous section we introduced the kinetic energy, which is a quantity characterizing the state of motion. Recall that it is equal to the work which is necessary to accelerate a body of mass m from rest to motion with velocity v. An important feature of the kinetic energy is that it does not depend on the process by which the body acquired its velocity.

Now, suppose that the body is under the influence of a force so that its velocity is changing. This is connected with a corresponding change of the kinetic energy of the body. For example, a body released from some altitude goes from zero velocity to the accelerated motion called free fall under the influence of the gravitational force. In this case it is the gravitational field which performs work on the body, and this work is equal to the change of the kinetic energy of the body. Thus, another way to characterize the force is to specify how the kinetic energy of the body changes under this force. Potential energy will therefore be defined as the work performed by the force acting on the body.

Suppose that under the influence of the force, the body was displaced from point A to point B along the trajectory γ depicted in figure 1.3. The work performed by the force during this motion is, as usual, defined by

$$W = \int_\gamma \mathbf{F} \cdot d\mathbf{r}, \tag{1.9}$$

where the symbol γ denotes the trajectory along which the body was moving. However, if we choose any other curve γ′ (figure 1.3), the work associated with this curve will be, in general, different:

$$W' = \int_{\gamma'} \mathbf{F} \cdot d\mathbf{r} \neq W. \tag{1.10}$$

In such a case, the notion of potential energy is useless because it depends on the particular trajectory. Hence, in general, potential energy is a meaningless quantity. Surprisingly enough, there are many examples of forces for which the work W in fact does not depend on the choice of the trajectory γ. Such forces are called conservative or potential forces, and in such cases we can define a useful and meaningful potential energy.

[Figure 1.3: two trajectories γ and γ′ connecting the points A and B.]

Fig. 1.3. Under the influence of the force F, the body moves from point A to point B. Potential energy is the work done by the force during this displacement. However, there are infinitely many trajectories connecting these two points and the work is, in general, different for each of them.

Let us find which forces have this property. We demand that the integral (1.9) does not depend on γ but only on the points A and B, and investigate the consequences of this assumption. In that case, the work performed along any closed loop must be equal to zero. Indeed, let γ be an arbitrary closed loop as depicted in figure 1.4. Let us choose two arbitrary points A and B lying on the curve. In this way we obtain two curves: $\gamma_1$, which is the part of γ going from A to B, and $\gamma_2$, which goes from B to A along a different trajectory. The integral over γ can be written as a sum

$$\oint_\gamma \mathbf{F} \cdot d\mathbf{r} = \int_{\gamma_1} \mathbf{F} \cdot d\mathbf{r} + \int_{\gamma_2} \mathbf{F} \cdot d\mathbf{r}. \tag{1.11}$$

We have assumed that for any two points A and B the integral between these two points does not depend on the trajectory but only on the points themselves. Thus, we can write

$$\int_{\gamma_1} \mathbf{F} \cdot d\mathbf{r} = \int_A^B \mathbf{F} \cdot d\mathbf{r},$$

where we do not specify the trajectory as the integral does not depend on it. A similar consideration applies to the integral over $\gamma_2$, but notice that this curve starts at point B and ends at point A. Hence,

$$\int_{\gamma_2} \mathbf{F} \cdot d\mathbf{r} = \int_B^A \mathbf{F} \cdot d\mathbf{r} = -\int_A^B \mathbf{F} \cdot d\mathbf{r}.$$

Therefore, the two integrals on the right-hand side of (1.11) have the same magnitude but opposite signs, and so we arrive at

$$\oint_\gamma \mathbf{F} \cdot d\mathbf{r} = 0 \tag{1.12}$$

as claimed. Conversely, we leave it to the reader to show that if (1.12) holds for an arbitrary closed loop γ, then the integral between any two points necessarily does not depend on the trajectory. Conservative forces are those for which (1.12) holds.
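The closed-loop criterion can be probed numerically. The following sketch approximates the work around a circle for two hypothetical planar force fields: a spring-like force pointing towards the origin, which is conservative, and a "swirl" force circulating around the origin, which is not. The field definitions, the loop and the function names are assumptions chosen for illustration.

```python
import math


def loop_work(force, radius=1.0, steps=10_000):
    """Approximate the closed-loop integral of F . dr around a circle."""
    total = 0.0
    dtheta = 2.0 * math.pi / steps
    for k in range(steps):
        theta = (k + 0.5) * dtheta                 # midpoint of the arc element
        x, y = radius * math.cos(theta), radius * math.sin(theta)
        dx = -radius * math.sin(theta) * dtheta    # components of dr
        dy = radius * math.cos(theta) * dtheta
        fx, fy = force(x, y)
        total += fx * dx + fy * dy
    return total


def spring_force(x, y, k=1.0):
    """Conservative example: F = -k (x, y)."""
    return (-k * x, -k * y)


def swirl_force(x, y):
    """Non-conservative example: F = (-y, x)."""
    return (-y, x)


print(loop_work(spring_force))   # close to zero
print(loop_work(swirl_force))    # close to 2*pi, clearly non-zero
```

For the swirl field the force is tangent to the circle everywhere, so the work accumulates instead of cancelling; such a field cannot be assigned a potential energy.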

There is yet another formulation of the fact that a force is conservative. This last formulation is convenient for practical purposes because it is a differential rather than an integral criterion. If $\gamma$ is an arbitrary closed loop and the force is conservative, i.e. (1.12) holds, we can use the Stokes theorem to convert the line integral into a surface integral:

\[ \oint_{\gamma} \mathbf{F}\cdot d\mathbf{r} = \int_{S(\gamma)} (\nabla\times\mathbf{F})\cdot d\mathbf{S}, \tag{1.13} \]

where $S(\gamma)$ is a surface bounded by the loop $\gamma$. Then, by the conservative character of $\mathbf{F}$, the Stokes theorem implies

\[ \int_{S(\gamma)} (\nabla\times\mathbf{F})\cdot d\mathbf{S} = 0 \]

for an arbitrary loop $\gamma$. Since the choice of $\gamma$ is arbitrary, the last equality can hold for all possible loops only if the integrand vanishes everywhere, i.e.

\[ \nabla\times\mathbf{F} = 0. \tag{1.14} \]

In other words, the curl of a conservative field $\mathbf{F}$ is necessarily zero. Poincaré's lemma then asserts that any vector field with vanishing curl (on a simply connected domain) is the gradient of some scalar field $\phi$,

\[ \mathbf{F} = -\nabla\phi, \tag{1.15} \]

so that the components of the force are given by the partial derivatives of the function $\phi$:

\[ F_i = -\frac{\partial\phi}{\partial x_i} \equiv -\partial_i\phi. \tag{1.16} \]

The minus sign is conventional.

Fig. 1.4. Let $\gamma$ be any closed trajectory (a loop) and $A$ and $B$ any two of its points, which split the curve $\gamma$ into a union of curves $\gamma_1$ and $\gamma_2$, $\gamma = \gamma_1 \cup \gamma_2$.

Let us recapitulate. We have discussed the notion of the work performed on a body by the external force field in which the body is moving. We have argued that this work in general depends not only on the initial and final positions but also on the trajectory. Then we defined a special class of forces for which the work is path-independent, and called such forces conservative or potential. We have found four equivalent criteria for a force to be conservative:

• For arbitrary points $A$ and $B$, the integral

\[ W = \int_A^B \mathbf{F}\cdot d\mathbf{r} \tag{1.17} \]

does not depend on the trajectory between points $A$ and $B$.

• For an arbitrary closed curve $\gamma$, the integral $W$ vanishes,

\[ \oint_{\gamma} \mathbf{F}\cdot d\mathbf{r} = 0. \tag{1.18} \]

• The curl of the force field vanishes,

\[ \nabla\times\mathbf{F} = 0. \tag{1.19} \]

• The force field $\mathbf{F}$ is (minus) the gradient of some scalar function,

\[ \mathbf{F} = -\nabla\phi. \tag{1.20} \]

The function $\phi$, if it exists, is called the potential of the vector field $\mathbf{F}$ or simply the potential energy.
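The differential criterion (1.19) is easy to test on a computer. The following lines are a minimal sketch, assuming a Mathematica version that provides the built-in functions Grad and Curl; the test potential pot (an attractive inverse-distance potential with a constant k) is chosen here only for illustration and does not come from the text.

```mathematica
ClearAll[x, y, z, k, pot, force];

(* Hypothetical test potential: pot = -k/r with r the distance from the origin *)
pot = -k/Sqrt[x^2 + y^2 + z^2];

(* Force as minus the gradient of the potential, eq. (1.20) *)
force = -Grad[pot, {x, y, z}];

(* Criterion (1.19): the curl of a gradient field vanishes *)
Simplify[Curl[force, {x, y, z}]]     (* {0, 0, 0} *)

(* For contrast, a field that is not conservative has non-zero curl *)
Curl[{-y, x, 0}, {x, y, z}]          (* {0, 0, 2} *)
```

The second field cannot be written as a gradient of any scalar function, so no potential energy can be assigned to it.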


Sometimes, especially in the context of Lagrange's and Hamilton's formalism, the potential is denoted by $V$ instead of $\phi$. This convention will be followed later in the textbook.

To conclude this section we repeat what the potential energy is. This term can be defined only for conservative forces, i.e. for forces satisfying one of the equivalent conditions¹ (1.17)–(1.20). Then, by (1.20), there exists a function $\phi$ such that $\mathbf{F} = -\nabla\phi$. This function is, by definition, called the potential energy. Recall that our original motivation was to characterize the field not by the force but by the work which the force performs during the motion of the body. For conservative forces, this work is directly related to the potential:

\[ W = \int_A^B \mathbf{F}\cdot d\mathbf{r} = -\int_A^B (\nabla\phi)\cdot d\mathbf{r} = -\int_A^B d\phi = \phi_A - \phi_B. \tag{1.21} \]

Thus, the work performed by the force is equal to the difference of the values of the potential at the initial and at the final point of the trajectory. In the proof we have used the identity

\[ d\phi = \partial_i\phi \, dx_i = (\nabla\phi)\cdot d\mathbf{r}. \]
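Relation (1.21) can be illustrated by integrating the work along two different paths between the same end points. The sketch below is only an illustration under stated assumptions: the planar oscillator potential pot, the helper function work and the two parametrized paths are hypothetical choices made for this example.

```mathematica
ClearAll[x, y, s, k, pot, force, work];

(* Hypothetical example: potential of a planar harmonic oscillator *)
pot = k (x^2 + y^2)/2;
force = -{D[pot, x], D[pot, y]};      (* F = -grad pot, as in (1.20) *)

(* Work along a path {x(s), y(s)} parametrized by s running from 0 to 1 *)
work[f_, path_] :=
  Integrate[(f /. Thread[{x, y} -> path]) . D[path, s], {s, 0, 1}]

work[force, {s, s}]       (* straight line from (0,0) to (1,1): -k *)
work[force, {s, s^2}]     (* parabola between the same points:  -k *)
(pot /. {x -> 0, y -> 0}) - (pot /. {x -> 1, y -> 1})   (* phi_A - phi_B = -k *)

(* For contrast, a field with non-zero curl gives path-dependent work *)
work[{-y, x}, {s, s}]     (* 0 *)
work[{-y, x}, {s, s^2}]   (* 1/3 *)
```

For the conservative field both paths give the same work, equal to the difference of the potential at the end points; for the second field the result depends on the path.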

1.5 Conservation of energy

The adjective "conservative", introduced to name forces with properties (1.17)–(1.20), reflects the fact that the energy of a body moving in such a force field is constant. We define the total mechanical energy of the body in the conservative field $\mathbf{F} = -\nabla\phi$ by

\[ E = T + \phi. \tag{1.22} \]

This quantity is constant in time. Before we prove this statement, notice that by Newton's second law and the definition of the potential, the acceleration of the body is

\[ \mathbf{a} = \frac{d\mathbf{v}}{dt} = \frac{1}{m}\mathbf{F} = -\frac{1}{m}\nabla\phi. \]

Let us differentiate the total energy with respect to time:

\[ \frac{dE}{dt} = \frac{d}{dt}\left(\frac{1}{2} m v^2 + \phi\right) = m\,\mathbf{v}\cdot\dot{\mathbf{v}} + \dot{\phi} = -\mathbf{v}\cdot\nabla\phi + (\nabla\phi)\cdot\mathbf{v} = 0. \tag{1.23} \]

¹ The equivalence means that if the force satisfies one of these conditions, it automatically satisfies the remaining three conditions.

Thus, quantity E is indeed constant in time, i.e. it is conserved.

Theorem 1. The mechanical energy of a system in which the forces are potential is constant.
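The calculation (1.23) can be repeated symbolically. The following sketch is restricted to one dimension for brevity, which is an assumption made here; pot stands for an arbitrary, unspecified potential and x[t] for the position of the body.

```mathematica
ClearAll[x, t, m, pot, energy];

(* Total energy E = T + phi for one-dimensional motion *)
energy = m x'[t]^2/2 + pot[x[t]];

(* Differentiate with respect to time and insert Newton's law
   m x'' = -pot'(x) for the acceleration *)
D[energy, t] /. x''[t] -> -pot'[x[t]]/m // Simplify    (* 0 *)
```

The result vanishes for any function pot, in agreement with Theorem 1.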

Later we will see that the conservation of energy is in fact a consequence of a deeper principle: the laws of motion cannot depend on time, i.e. the laws are the same at all times. We say that time is homogeneous. The precise meaning of this statement will be clarified in sections 4.7 and 4.8.

1.6 Conservation of momentum

Consider a system of $N$ particles, each with position vector $\mathbf{r}_i$, velocity $\mathbf{v}_i = \dot{\mathbf{r}}_i$ and acceleration $\mathbf{a}_i = \dot{\mathbf{v}}_i = \ddot{\mathbf{r}}_i$, where $i = 1, 2, \dots, N$. The mass of the $i$-th particle will be denoted by $m_i$, and hence the momentum of the $i$-th particle is $\mathbf{p}_i = m_i \mathbf{v}_i$.

These particles interact with each other and, in addition, there can be an external force acting on each particle, e.g. the gravitational force. The internal force exerted on the $i$-th particle by the $j$-th particle will be denoted by $\mathbf{F}_{ij}$. In accordance with the law of action and reaction, internal forces obey the relations

\[ \mathbf{F}_{ij} = -\mathbf{F}_{ji}. \tag{1.24} \]

According to the law of force, the total force exerted on the $i$-th particle is equal to the time derivative of its momentum, i.e.

\[ \frac{d\mathbf{p}_i}{dt} = \mathbf{F}_i + \sum_{j \neq i} \mathbf{F}_{ij}, \tag{1.25} \]

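where the total force on the right hand side is the sum of the external force $\mathbf{F}_i$ and the internal forces exerted by all other particles.

The total momentum of the system is the sum of the momenta of all particles,

\[ \mathbf{P} = \sum_i \mathbf{p}_i, \]

and its time derivative reads

\[ \dot{\mathbf{P}} = \sum_i \dot{\mathbf{p}}_i = \sum_i \mathbf{F}_i + \sum_i \sum_{j \neq i} \mathbf{F}_{ij}. \]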


Now we use the law of action and reaction (1.24). In the expression

\[ \sum_i \sum_{j \neq i} \mathbf{F}_{ij} \]

we sum over all (ordered) pairs of particles. For each pair $(i, j)$ contributing $\mathbf{F}_{ij}$ to the sum, there is a pair $(j, i)$ contributing $\mathbf{F}_{ji} = -\mathbf{F}_{ij}$. Hence, the total sum of all internal forces is necessarily equal to zero and the time derivative of the momentum reads

\[ \dot{\mathbf{P}} = \sum_i \mathbf{F}_i. \tag{1.26} \]

In other words, the total momentum changes only because of the external forces; the internal interaction does not contribute to the overall change of momentum. If there are no external forces, the total momentum is constant,

\[ \dot{\mathbf{P}} = 0. \tag{1.27} \]

Law (1.26) states that the rate of change of the total momentum is equal to the total external force impressed on the system, and that the total momentum is constant if there are no external forces. A system with no external forces is called isolated because it does not interact with surrounding bodies. Thus, law (1.26) can be reformulated as follows.

Theorem 2. The total momentum of an isolated system of interacting particles is constant in time (conserved).
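For the simplest case of two particles the cancellation of internal forces can be written out explicitly. The following lines merely specialize (1.24)–(1.26) to $N = 2$ and introduce nothing new:

```latex
\begin{align*}
\dot{\mathbf{p}}_1 &= \mathbf{F}_1 + \mathbf{F}_{12}, \\
\dot{\mathbf{p}}_2 &= \mathbf{F}_2 + \mathbf{F}_{21} = \mathbf{F}_2 - \mathbf{F}_{12}, \\
\dot{\mathbf{P}} &= \dot{\mathbf{p}}_1 + \dot{\mathbf{p}}_2 = \mathbf{F}_1 + \mathbf{F}_2 .
\end{align*}
```

The internal force drops out of the sum, whatever its magnitude or direction.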

Similarly to the case of energy, conservation of momentum is a consequence of the homogeneity of space. The notion of homogeneity and its relation to the conservation of momentum are discussed in detail in sections 4.7 and 4.8.

1.7 Conservation of angular momentum

For a single particle, as well as for a system of particles, the force impressed will cause a change of the momentum. However, in the case of a system of $N$ particles, it makes sense to distinguish two kinds of motion: translation, when the body changes its position, and rotation.

In introductory courses of elementary physics it is explained that the rotational effect of a force can be quantified by the so-called torque (or moment of force) with respect to a fixed origin, defined by

\[ \mathbf{M} = \mathbf{r} \times \mathbf{F}. \tag{1.28} \]

Recall that the magnitude of the cross product is

\[ M = r\,F \sin\alpha, \]

where $\alpha$ is the angle between the two vectors. Because of the presence of the cross product, the torque vanishes if the force is parallel to the position vector. In such a case we expect that the force will not cause a rotation. On the contrary, the rotational effect of the force will be maximal if the vectors $\mathbf{r}$ and $\mathbf{F}$ are orthogonal, see figure 1.5.

Fig. 1.5. Rotational effect of the force $\mathbf{F}$ on a disk attached to a fixed point in its centre. Any force $\mathbf{F}$ can be decomposed into the normal part $\mathbf{F}_n$ and the tangential part $\mathbf{F}_t$. Clearly, the normal part $\mathbf{F}_n$ does not affect the rotation of the disk and only the tangential part $\mathbf{F}_t$ is responsible for rotation. The magnitude of the tangential part is $F_t = F \sin\alpha$ and hence we define the torque by (1.28).

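By the same argumentation we can arrive at the notion of angular momentum. While the torque characterizes the rotational effect of the force exerted, the angular momentum characterizes the rotational state of motion. The angular momentum of the $i$-th particle with respect to a fixed origin is defined as

\[ \mathbf{l}_i = \mathbf{r}_i \times \mathbf{p}_i. \tag{1.29} \]

The total angular momentum is then naturally

\[ \mathbf{L} = \sum_i \mathbf{l}_i. \tag{1.30} \]

Let us take the time derivative of the total angular momentum:

\[ \dot{\mathbf{L}} = \sum_i \dot{\mathbf{l}}_i = \sum_i \left[ \dot{\mathbf{r}}_i \times \mathbf{p}_i + \mathbf{r}_i \times \dot{\mathbf{p}}_i \right]. \]

Since $\mathbf{p}_i = m_i \dot{\mathbf{r}}_i$, the vectors $\dot{\mathbf{r}}_i$ and $\mathbf{p}_i$ are parallel and hence their cross product vanishes. Thus, we have

\[ \dot{\mathbf{L}} = \sum_i \mathbf{r}_i \times \dot{\mathbf{p}}_i. \]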

Because $\dot{\mathbf{p}}_i$ is the total force acting on the $i$-th particle, we can see that the rate of change of the angular momentum of the $i$-th particle is given by the torque of the total force acting on this particle. However, we can proceed further and decompose $\dot{\mathbf{p}}_i$ into the external force and the sum of internal forces (as in the previous section) to find

\[ \dot{\mathbf{L}} = \sum_i \mathbf{r}_i \times \mathbf{F}_i + \sum_i \sum_{j \neq i} \mathbf{r}_i \times \mathbf{F}_{ij}. \]

Repeating the argument based on the action-reaction law (and assuming, in addition, that the internal forces act along the lines joining the particles, so that the torques of each pair cancel), we conclude that the total change of the angular momentum is

\[ \dot{\mathbf{L}} = \sum_i \mathbf{r}_i \times \mathbf{F}_i = \mathbf{M}, \tag{1.31} \]

where

\[ \mathbf{M} = \sum_i \mathbf{r}_i \times \mathbf{F}_i \]

is the total torque of the external forces impressed on the system.

Hence, internal forces do not contribute to the total change of the rotational state of the system, i.e. internal forces cannot affect the total angular momentum. The only reason why a system of particles can change its angular momentum is the presence of external forces. Again, when no external forces are present and the system is isolated, the total angular momentum is conserved.

Theorem 3. The total angular momentum of an isolated system of interacting particles is constant.
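The cancellation of internal torques deserves one explicit line. The sketch below pairs the contributions of particles $i$ and $j$ and uses (1.24); the assumption that $\mathbf{F}_{ij}$ is parallel to $\mathbf{r}_i - \mathbf{r}_j$ (a central internal force) is an additional condition which the argument needs:

```latex
\begin{align*}
\mathbf{r}_i \times \mathbf{F}_{ij} + \mathbf{r}_j \times \mathbf{F}_{ji}
  &= (\mathbf{r}_i - \mathbf{r}_j) \times \mathbf{F}_{ij}, \\
\mathbf{F}_{ij} \parallel (\mathbf{r}_i - \mathbf{r}_j)
  &\;\Longrightarrow\; (\mathbf{r}_i - \mathbf{r}_j) \times \mathbf{F}_{ij} = 0 .
\end{align*}
```

Gravitational and electrostatic forces between point particles are of this kind, so for them the internal torques cancel pair by pair.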


1.8 Curvilinear coordinates

In this chapter we have introduced the familiar Cartesian coordinate system. In Cartesian coordinates we assign a triple of numbers $(x, y, z)$, or $x_i$ where $i = 1, 2, 3$, to each point of the space. In chapter 2 we will see that it is sometimes useful to use a different coordinate system which is better adapted to the problem to be solved. The motivation will be presented in chapter 2, but let us introduce the most common coordinate systems here.

In geometry, coordinates different from the Cartesian ones are called curvilinear coordinates because the axes associated with non-Cartesian coordinates are usually curves rather than lines, see below. In mechanics we often refer to curvilinear coordinates as generalized coordinates, in the sense that the Cartesian coordinates comprise only a special class of more general coordinate systems. In this book we use the convention that the Cartesian coordinates are always denoted by the symbol $x$ and labelled by the Latin indices

\[ i, j, k, \dots \]

which take values $1, 2, \dots, n$, where $n$ is the dimension of the space. For example, $n = 3$ for ordinary three-dimensional space and $n = 2$ for the plane. Later we will meet abstract spaces with higher dimensions, e.g. the phase space. Hence, for $n = 3$ we have three coordinates

\[ x = (x_1, x_2, x_3), \quad \text{or, for brevity, } x_i. \]

We do not specify the values of the indices $i, j, k, \dots$ if the dimension is clear from the context. Notice that the symbol $x$ without an index stands for the $n$-tuple of coordinates $x_i$, where $i = 1, 2, \dots, n$, in general. Occasionally, we use the standard notation

\[ x_1 = x, \quad x_2 = y, \quad x_3 = z, \]

if no confusion can arise.

Generalized coordinates will be denoted by the symbol $q$ and labelled by the Latin indices

\[ a, b, c, \dots \]

which take values $1, 2, \dots, n$, where $n$ is the dimension of the space. Again, the symbol $q$ stands for the $n$-tuple of coordinates $q_a$,

\[ q = (q_1, q_2, \dots, q_n). \]


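Hence, if we write $f = f(q)$, it means that the function $f$ depends on all generalized coordinates $q_a$. Usually, if the generalized coordinates have a direct geometrical meaning, we use specific symbols for individual coordinates. For example, if $q_1$ has the meaning of a distance, it will be denoted by $q_1 = r$; if $q_2$ is an angle, it will be denoted by $q_2 = \phi$.

When dealing with a coordinate transformation between Cartesian and curvilinear coordinates, we often need the Jacobi matrix of the transformation. The Jacobi matrix $\mathbf{J}$ is the matrix of first derivatives of one set of coordinates with respect to the other; here we differentiate the Cartesian coordinates with respect to the generalized ones. The elements of the Jacobi matrix are therefore defined by

\[ J_{ia} = \frac{\partial x_i}{\partial q_a}, \qquad
\mathbf{J} = \begin{pmatrix}
\dfrac{\partial x_1}{\partial q_1} & \dfrac{\partial x_1}{\partial q_2} & \cdots & \dfrac{\partial x_1}{\partial q_n} \\
\dfrac{\partial x_2}{\partial q_1} & \dfrac{\partial x_2}{\partial q_2} & \cdots & \dfrac{\partial x_2}{\partial q_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial x_n}{\partial q_1} & \dfrac{\partial x_n}{\partial q_2} & \cdots & \dfrac{\partial x_n}{\partial q_n}
\end{pmatrix}. \]

Notice that the matrix itself is denoted by the bold symbol $\mathbf{J}$ while the elements of the matrix are denoted by $J_{ia}$.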

Suppose that we are given the transformation

\[ x_i = x_i(q) \]

from curvilinear coordinates to Cartesian coordinates. Notice that the last equation is in fact an abbreviation for $n$ transformation relations. If we invert these relations we arrive at the inverse coordinate transformation

\[ q_a = q_a(x) \]

with the Jacobi matrix

\[ \tilde{J}_{ai} = \frac{\partial q_a}{\partial x_i}. \tag{1.32} \]

Let us take the matrix product of the Jacobi matrix $\mathbf{J}$ and the matrix $\tilde{\mathbf{J}}$ of the inverse transformation. We find

\[ J_{ia}\,\tilde{J}_{aj} = \frac{\partial x_i}{\partial q_a}\frac{\partial q_a}{\partial x_j} = \delta_{ij}, \]

where we have used the chain rule for partial derivatives in the last step. Since $\delta_{ij}$ are the components of the unit matrix, we have

\[ \mathbf{J}\cdot\tilde{\mathbf{J}} = \mathbf{I}, \]

where $\mathbf{I}$ is the identity matrix. The last equality shows

\[ \mathbf{J}^{-1} = \tilde{\mathbf{J}}, \]

i.e. the Jacobi matrices of the direct and inverse coordinate transformations are mutually inverse.

1.8.1 Polar coordinates

Polar coordinates are defined in the plane rather than in three-dimensional space, see figure 1.6. Let $(x, y)$ be the Cartesian coordinates of a given point with position vector $\mathbf{r}$. The distance of this point from the origin is denoted by $r$ and is related to the Cartesian coordinates by

\[ r = \sqrt{x^2 + y^2}. \]

Now we denote the angle between the position vector and the $x$-axis by $\theta$, see figure 1.6. Then the pair $(r, \theta)$ constitutes the polar coordinates of the point under consideration. Clearly, polar coordinates and Cartesian coordinates are related by the equations

\[ x = r\cos\theta, \qquad y = r\sin\theta. \tag{1.33} \]

The inverse transformation reads

\[ r = \sqrt{x^2 + y^2}, \qquad \theta = \arctan\frac{y}{x}, \tag{1.34} \]

where the branch of the arctangent has to be chosen according to the quadrant in which the point lies.

In the notation introduced above, the Cartesian coordinates for the plane are

\[ x_1 = x \quad \text{and} \quad x_2 = y, \]

while the generalized coordinates $q_a$ (polar coordinates, in this case) are

\[ q_1 = r, \qquad q_2 = \theta. \]

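For the transformation (1.33), the Jacobi matrix is

\[ \mathbf{J} = \begin{pmatrix}
\dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\
\dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta}
\end{pmatrix}
= \begin{pmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{pmatrix}. \tag{1.35} \]

Fig. 1.6. Polar coordinates in the plane. The Cartesian coordinates of the point are $(x, y)$, the polar coordinates are $(r, \theta)$, where $r$ is the distance of the point from the origin and $\theta$ is the angle between the radius vector and the $x$-axis.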

Let us see how this result can be obtained using Mathematica. First we define a function Jacobi which accepts the list of Cartesian coordinates xs, the list of generalized coordinates qs and the list of transformation rules rules. These rules are assumed to be of the form

x1 -> ..., x2 -> ..., etc.

where the dots stand for the expressions of the Cartesian coordinates in terms of the generalized ones. Function Jacobi can be defined, for example, as follows:

In[1]:= (* xi = xi(q) *)
        Jacobi[xs_, qs_, rules_] := D[xs /. rules, #] & /@ qs // Transpose

For our particular example of polar coordinates, this function should be called in the following way:

In[2]:= Jacobi[{x, y}, {r, θ}, {x -> r Cos[θ], y -> r Sin[θ]}] // MatrixForm

Out[2]//MatrixForm=
        ( Cos[θ]   -r Sin[θ] )
        ( Sin[θ]    r Cos[θ] )

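We can see that the result is identical with the previous one. In the rest of this chapter we will use function Jacobi freely without explicitly mentioning it. Moreover, we can call function Inverse to find

\[ \mathbf{J}^{-1} = \begin{pmatrix}
\cos\theta & \sin\theta \\
-\dfrac{\sin\theta}{r} & \dfrac{\cos\theta}{r}
\end{pmatrix}. \]

By (1.32), we can deduce the partial derivatives of the generalized coordinates with respect to the Cartesian ones without actually calculating them:

\[ \frac{\partial r}{\partial x} = \cos\theta, \qquad
\frac{\partial r}{\partial y} = \sin\theta, \qquad
\frac{\partial \theta}{\partial x} = -\frac{\sin\theta}{r}, \qquad
\frac{\partial \theta}{\partial y} = \frac{\cos\theta}{r}. \tag{1.36} \]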
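Relations (1.36) can be cross-checked by differentiating the inverse transformation (1.34) directly and comparing with the inverse of the matrix (1.35). The sketch below is only an illustration: the ASCII name theta, the helper names jac and direct, and the use of the two-argument form ArcTan[x, y] for the polar angle are choices made here.

```mathematica
ClearAll[x, y, r, theta, jac, direct];

(* Jacobi matrix (1.35) of the transformation (1.33) *)
jac = D[{r Cos[theta], r Sin[theta]}, {{r, theta}}];

(* Derivatives of the inverse transformation (1.34),
   rewritten in terms of polar coordinates *)
direct = D[{Sqrt[x^2 + y^2], ArcTan[x, y]}, {{x, y}}] /.
   {x -> r Cos[theta], y -> r Sin[theta]};

(* The difference should reduce to the zero matrix for r > 0 *)
Simplify[direct - Inverse[jac], r > 0]
```

Both routes give the same four derivatives, which is the statement that the Jacobi matrices of the direct and inverse transformations are mutually inverse.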

Now suppose that we want to describe the motion of a particle in polar coordinates. Since the particle is moving, its Cartesian coordinates will depend on time, $x_i = x_i(t)$, or explicitly

\[ x = x(t), \qquad y = y(t). \]

The Cartesian components of the velocity are, as usual, $v_i = \dot{x}_i$, or explicitly

\[ v_x = \frac{dx}{dt}, \qquad v_y = \frac{dy}{dt}. \]

If the Cartesian coordinates depend on time, so do the polar coordinates, i.e. $q_a = q_a(t)$, or explicitly

\[ r = r(t), \qquad \theta = \theta(t). \]

The Cartesian components of the velocity in terms of polar coordinates read

\[ \begin{aligned}
\dot{x} &= \frac{d}{dt}\bigl(r(t)\cos\theta(t)\bigr) = \dot{r}\cos\theta - r\,\dot{\theta}\sin\theta, \\
\dot{y} &= \frac{d}{dt}\bigl(r(t)\sin\theta(t)\bigr) = \dot{r}\sin\theta + r\,\dot{\theta}\cos\theta.
\end{aligned} \tag{1.37} \]

The squared magnitude of the velocity is then

\[ v^2 = \dot{x}_i \dot{x}_i = \dot{x}^2 + \dot{y}^2 = \dot{r}^2 + r^2\dot{\theta}^2. \tag{1.38} \]
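Formulas (1.37) and (1.38) can be reproduced in Mathematica in a few lines. This is a minimal sketch; the names xp and yp for the Cartesian coordinates as functions of time are hypothetical and chosen only to avoid clashes with other symbols.

```mathematica
ClearAll[xp, yp, r, theta, t];

(* Cartesian coordinates expressed through time-dependent polar coordinates *)
xp[t_] = r[t] Cos[theta[t]];
yp[t_] = r[t] Sin[theta[t]];

(* Cartesian velocity components, cf. (1.37) *)
{xp'[t], yp'[t]}

(* Squared magnitude of the velocity, cf. (1.38):
   r'[t]^2 + r[t]^2 theta'[t]^2 *)
Simplify[xp'[t]^2 + yp'[t]^2]
```

The mixed terms proportional to r'[t] theta'[t] cancel, which reflects the fact that the radial and angular directions are orthogonal.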

1.8.2 Spherical coordinates

Spherical coordinates are analogous to polar coordinates but they are defined in three-dimensional space. The geometrical meaning of spherical coordinates is depicted in figure 1.7. Again, $r$ is the distance of the point from the origin and $\theta$ is the angle between the position vector and the $z$-axis. Next we project the position vector onto the $xy$-plane, obtaining a vector $\mathbf{r}'$. The angle between this vector and the $x$-axis is denoted by $\phi$. By simple geometry we find the transformation relations

\[ \begin{aligned}
x &= r\sin\theta\cos\phi, \\
y &= r\sin\theta\sin\phi, \\
z &= r\cos\theta.
\end{aligned} \tag{1.39} \]

The corresponding inverse relations read

\[ \begin{aligned}
r &= \sqrt{x^2 + y^2 + z^2}, \\
\theta &= \arctan\frac{\sqrt{x^2 + y^2}}{z} = \arccos\frac{z}{\sqrt{x^2 + y^2 + z^2}} = \arcsin\frac{\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2 + z^2}}, \\
\phi &= \arctan\frac{y}{x}.
\end{aligned} \tag{1.40} \]

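The Jacobi matrix and its inverse are

\[ \begin{aligned}
\mathbf{J} &= \begin{pmatrix}
\cos\phi\sin\theta & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\
\sin\theta\sin\phi & r\cos\theta\sin\phi & r\cos\phi\sin\theta \\
\cos\theta & -r\sin\theta & 0
\end{pmatrix}, \\[1ex]
\mathbf{J}^{-1} &= \begin{pmatrix}
\cos\phi\sin\theta & \sin\theta\sin\phi & \cos\theta \\
\dfrac{\cos\theta\cos\phi}{r} & \dfrac{\cos\theta\sin\phi}{r} & -\dfrac{\sin\theta}{r} \\
-\dfrac{\csc\theta\sin\phi}{r} & \dfrac{\cos\phi\csc\theta}{r} & 0
\end{pmatrix},
\end{aligned} \tag{1.41} \]

where

\[ \csc x = \frac{1}{\sin x}. \]

Fig. 1.7. Spherical coordinates $(r, \theta, \phi)$ in three-dimensional space. The vector $\mathbf{r}'$ is the projection of the position vector onto the $xy$-plane.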
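The matrices (1.41) can be obtained with the function Jacobi introduced for polar coordinates. The sketch below repeats that definition so that it can be run on its own; the ASCII names theta and phi and the helper names rules and jac are choices made for this example.

```mathematica
ClearAll[Jacobi, x, y, z, r, theta, phi, rules, jac];

(* Same definition as for polar coordinates: J_ia = D[x_i, q_a] *)
Jacobi[xs_, qs_, rules_] := Transpose[D[xs /. rules, #] & /@ qs]

(* Transformation (1.39) from spherical to Cartesian coordinates *)
rules = {x -> r Sin[theta] Cos[phi],
         y -> r Sin[theta] Sin[phi],
         z -> r Cos[theta]};

jac = Jacobi[{x, y, z}, {r, theta, phi}, rules];

jac // MatrixForm                      (* matrix J of (1.41) *)
Simplify[Inverse[jac]] // MatrixForm   (* inverse matrix of (1.41) *)
Simplify[Det[jac]]                     (* r^2 Sin[theta] *)
```

The determinant vanishes for $r = 0$ and on the $z$-axis ($\sin\theta = 0$), which is where the factors $1/r$ and $\csc\theta$ in the inverse matrix become singular.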

Components of the velocity can be calculated in the same way as in the previous subsection. However, we can use Mathematica as in the following example:

In[31]:= x[t_] = r[t] Sin[θ[t]] Cos[φ[t]];
         y[t_] = r[t] Sin[θ[t]] Sin[φ[t]];
         z[t_] = r[t] Cos[θ[t]];
         Print["x' = ", x'[t]]
         Print["y' = ", y'[t]]
         Print["z' = ", z'[t]]
         Print["v^2 = ", Simplify[x'[t]^2 + y'[t]^2 + z'[t]^2]]

x' = Cos[φ[t]] Sin[θ[t]] r'[t] + Cos[θ[t]] Cos[φ[t]] r[t] θ'[t] - r[t] Sin[θ[t]] Sin[φ[t]] φ'[t]

y' = Sin[θ[t]] Sin[φ[t]] r'[t] + Cos[θ[t]] r[t] Sin[φ[t]] θ'[t] + Cos[φ[t]] r[t] Sin[θ[t]] φ'[t]

z' = Cos[θ[t]] r'[t] - r[t] Sin[θ[t]] θ'[t]

v^2 = r'[t]^2 + r[t]^2 (θ'[t]^2 + Sin[θ[t]]^2 φ'[t]^2)

The last line of the output shows that the squared magnitude of the velocity in spherical coordinates is

\[ v^2 = \dot{r}^2 + r^2\left(\dot{\theta}^2 + \sin^2\theta\,\dot{\phi}^2\right). \]

2 Lagrange equations

2.1 Motivation

The basic equation of classical mechanics is Newton's law of force. If the force $\mathbf{F}$ acts on a point mass $m$, this point mass undergoes an acceleration $\mathbf{a}$ according to the formula

\[ \mathbf{a} = \frac{\mathbf{F}}{m}. \]

In the previous chapter we have introduced a Cartesian coordinate system, in which the law of force can be written in the form

\[ F_i = m\,\ddot{x}_i. \tag{2.1} \]

We can see that Newton's law is a differential equation of second order. Solving this equation we find the three coordinates $x_i$ as functions of time,

\[ x_i = x_i(t). \]

However, equation (2.1) holds only in Cartesian coordinates. Since we are interested in the motion of bodies in three-dimensional space $E^3$ or two-dimensional space $E^2$, we can always introduce a Cartesian coordinate system, write down the equations of motion and, in principle, also solve them. However, the Cartesian system is not always the most convenient choice and there can be other coordinate systems which are more appropriate. So, a natural question arises: what are the equations of motion in an arbitrary coordinate system?

To illustrate why we need non-Cartesian coordinates, let us consider the following example. A mathematical pendulum is a point mass $m$ attached to a fixed point called the pivot via a rigid rod of length $r$, see figure 2.1. The Cartesian coordinates of the point mass are $(x, y)$. The pendulum is subject to the gravitational force $\mathbf{F} = m\mathbf{g}$, where $\mathbf{g} = (0, -g)$ is the gravitational acceleration. Thus, in order to find the equation of motion we have to find the Cartesian components of the force $\mathbf{F}$ and insert them into Newton's law (2.1). There is a problem, however: the coordinates $x$ and $y$ are not independent. Since the rod is assumed to be rigid, it has a fixed length and, by the Pythagorean theorem, the coordinates $x$ and $y$ have to satisfy the equation

\[ x^2 + y^2 = r^2, \tag{2.2} \]

where $r$ is the length of the rod. This is not a dynamical equation, because it is not a differential equation which can be solved for given initial conditions. Rather, it is an algebraic equation which must be satisfied by any solution of the equations of motion. Equations of this kind are called constraints and we say that the coordinates $x$ and $y$ are constrained.

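Fig. 2.1. Mathematical pendulum. The point mass at $(x, y)$ hangs on a rod of length $r$ deflected by the angle $\theta$; the gravitational force $m\mathbf{g}$ is decomposed into the tangential part $\mathbf{F}_t$ and the normal part $\mathbf{F}_n$.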

In other words, we have two equations of motion, one for each coordinate, but in addition we have to satisfy the constraint (2.2). Instead of two equations we have to solve three. The reason is that the Cartesian coordinates are not well adapted to the problem at all. If a system is described by two independent coordinates, we say that it has two degrees of freedom. But the constraint reduces the number of degrees of freedom to one! This is natural, because the pendulum can move only along a circle of radius $r$, and a circle is a one-dimensional object. Although we describe the position of the pendulum by two coordinates, it has only one degree of freedom.

Can we describe the motion of the pendulum in such a way that it manifestly has only one degree of freedom? Definitely we can. The position of the pendulum is uniquely determined by the angle of deflection $\theta$, see again figure 2.1. According to that figure, the Cartesian coordinates $(x, y)$ are related to the angle $\theta$ by

\[ x = r\sin\theta, \qquad y = r\cos\theta. \tag{2.3} \]

This is similar to the polar coordinates introduced before; the exchange of sin and cos comes from a different definition of the angle $\theta$. More important is that the quantity $r$ is constant, it is not a variable. We can easily verify that the constraint (2.2) is satisfied for any value of $\theta$:

\[ x^2 + y^2 = r^2\sin^2\theta + r^2\cos^2\theta = r^2(\sin^2\theta + \cos^2\theta) = r^2. \]

We can see that if we describe the pendulum by the angle $\theta$, we do not have to care about the constraint anymore, for it is automatically satisfied. We thus have the single variable $\theta$, which corresponds to the fact that the pendulum has only one degree of freedom.

This is certainly progress! In Cartesian coordinates we had two equations of motion and one constraint. Now we have only one variable and no constraint. What remains is to find the equation of motion. From figure 2.1 it is obvious that the force $\mathbf{F}$ acting on the pendulum can be decomposed into two components $\mathbf{F}_t$ and $\mathbf{F}_n$. The force $\mathbf{F}_n$ is the normal component, parallel to the rod. It causes the tension of the rod, but since the rod is rigid, this has no effect on the motion of the pendulum. On the other hand, the component $\mathbf{F}_t$ tangent to the trajectory causes the acceleration. The magnitude of the tangential force is

\[ F_t = F\sin\theta = mg\sin\theta. \]

By Newton's law, this force causes a tangential acceleration $a_t$ of magnitude

\[ a_t = \frac{F_t}{m} = g\sin\theta. \]

In terms of the angle, the tangential acceleration is

\[ a_t = r\,\ddot{\theta}, \]

where $\ddot{\theta}$ is the angular acceleration. Since the tangential force always acts against the deflection (it pulls the pendulum back towards the equilibrium position), it enters with a minus sign, $r\,\ddot{\theta} = -g\sin\theta$. The equation of motion of the mathematical pendulum is therefore

\[ r\,\ddot{\theta} + g\sin\theta = 0, \]

or in slightly modified form

\[ \ddot{\theta} + \frac{g}{r}\sin\theta = 0. \tag{2.4} \]
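Equation (2.4) is easily integrated numerically. The following lines are a minimal sketch using the built-in function NDSolve; the numerical values of the gravitational acceleration, of the rod length (called len here) and of the initial deflection are assumptions made only for the sake of the example.

```mathematica
ClearAll[theta, t, g, len, sol];

(* Assumed parameters: g in m/s^2, rod length in m *)
g = 9.81; len = 1.0;

(* Pendulum released from rest at an initial deflection of 1 radian *)
sol = NDSolve[{theta''[t] + (g/len) Sin[theta[t]] == 0,
               theta[0] == 1.0, theta'[0] == 0},
              theta, {t, 0, 10}];

(* Deflection angle as a function of time *)
Plot[Evaluate[theta[t] /. sol], {t, 0, 10}, AxesLabel -> {"t", "theta"}]
```

For small initial deflections the plotted curve is close to a cosine with angular frequency $\sqrt{g/r}$, since then $\sin\theta \approx \theta$.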

The point is that the Cartesian coordinates are not always the most convenient. We have seen that if we describe the pendulum by Cartesian coordinates we have to solve two equations of motion and one constraint, i.e. three equations. But the pendulum has only one degree of freedom and its description by two coordinates is redundant. This redundancy is the reason why we have to impose the constraint. The problem can be circumvented by an appropriate choice of coordinates. Choosing the angle $\theta$ as a single coordinate we have eliminated the constraint and found a single equation of motion. So we have one variable $\theta$ and one equation of motion. In this coordinate system the system manifestly has one degree of freedom and we do not have to impose the constraint.

The mathematical pendulum is a very simple system and we will analyze its properties later on. We will see that despite its simplicity it possesses several non-trivial properties and that its equation of motion cannot even be solved in terms of elementary functions. In physics and in the modelling of realistic situations we often meet systems which are much more complicated. The double pendulum, for example, consists of two point masses: one is attached to the pivot, and the second is attached to the first one. Analysis shows that the motion of the double pendulum is chaotic. In the case of the double pendulum it is not clear how to find the equations of motion, and the procedure sketched above becomes more complicated. The Lagrange formalism to be introduced in this chapter provides a systematic way to derive the equations of motion in an arbitrary curvilinear coordinate system.

2.2 Lagrange equations of the second kind

We start with the derivation of the Lagrange equations of the second kind. Lagrange equations of the first kind also exist, but they contain the constraints explicitly. We will not study them in this text. Lagrange equations of the second kind eliminate the constraints by choosing an appropriate coordinate system. For simplicity we consider only one particle of mass $m$. The result is easily generalized to more particles.


2.2.1 Generalized coordinates

In Cartesian coordinates, the law of force has the form

\[ F_i = m\,\ddot{x}_i. \tag{2.5} \]

We want to transform this equation into an arbitrary curvilinear coordinate system. The new coordinates will be denoted by $q$ and labelled by indices $a = 1, 2, \dots, n$, where $n$ is not necessarily equal to 3. For example, as we have seen, the pendulum is described by the single coordinate $\theta$. The variables $q_a$ are called generalized coordinates. Cartesian coordinates are connected to generalized coordinates by relations of the form

\[ x_i = x_i(q), \]

where the symbol $q$ stands for the whole $n$-tuple $(q_1, \dots, q_n)$. If this is too abstract for the reader, equations (2.3) from the previous section can serve as an example of a coordinate transformation. In the case of the pendulum, the Cartesian coordinates $x_i$ are $x$ and $y$, and the only generalized coordinate is $q_1 = \theta$.

Moreover, we assume that the previous relations can be inverted, i.e. we can express the generalized coordinates as functions of the Cartesian ones:

\[ q_a = q_a(x). \]

Thus, the generalized coordinates are functions of the Cartesian coordinates and vice versa. On the other hand, the Cartesian coordinates depend on time (they are solutions of (2.5)), so the generalized coordinates must depend on time, too:

\[ q_a(t) = q_a(x(t)). \]

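The total derivative of $q_a$ with respect to time can be obtained by the chain rule for derivatives:

\[ \dot{q}_a = \frac{\partial q_a}{\partial x_i}\,\dot{x}_i. \]

This relation immediately implies

\[ \frac{\partial \dot{q}_a}{\partial \dot{x}_i} = \frac{\partial q_a}{\partial x_i}. \tag{2.6} \]

The total derivative of $x_i$ expressed in terms of generalized coordinates reads

\[ \dot{x}_i = \frac{\partial x_i}{\partial q_a}\,\dot{q}_a. \tag{2.7} \]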
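In the same way as (2.6) follows from the chain rule for $\dot{q}_a$, relation (2.7) implies $\partial \dot{x}_i / \partial \dot{q}_a = \partial x_i / \partial q_a$. As a quick check, here is this identity written out for the pendulum transformation (2.3), where the only generalized coordinate is $\theta$ and $r$ is constant:

```latex
\begin{align*}
x &= r\sin\theta, & \dot{x} &= r\cos\theta\,\dot{\theta}, &
\frac{\partial \dot{x}}{\partial \dot{\theta}} &= r\cos\theta = \frac{\partial x}{\partial \theta}, \\
y &= r\cos\theta, & \dot{y} &= -r\sin\theta\,\dot{\theta}, &
\frac{\partial \dot{y}}{\partial \dot{\theta}} &= -r\sin\theta = \frac{\partial y}{\partial \theta}.
\end{align*}
```

The velocities are linear in $\dot{\theta}$, so differentiating with respect to $\dot{\theta}$ simply strips off this factor.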

Notice that since $q_a$ depend on $x_i$, the quantity

\[ \frac{\partial q_a}{\partial x_i} \]

also depends on $x_i$. Similarly, $x_i$ depends on $q_a$ and therefore

\[ \frac{\partial x_i}{\partial q_a} \]

depends on $q_a$ as well.

We know that if $x_i$ is a Cartesian coordinate, then $\dot{x}_i$ is the $i$-th component of the velocity, i.e. $v_i = \dot{x}_i$. Analogously, the derivatives of $q_a$ with respect to time are called generalized velocities. In the Lagrangian formalism, coordinates and the corresponding velocities are treated as independent variables. In other words,

\[ \frac{\partial x_i}{\partial \dot{x}_j} = \frac{\partial q_a}{\partial \dot{q}_b} = 0. \]

2.2.2 Kinetic energy

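Kinetic energy expressed in Cartesian coordinates is

\[ T = \frac{1}{2} m v^2 = \frac{1}{2} m\,\dot{x}_i \dot{x}_i. \tag{2.8} \]

Kinetic energy therefore depends on the Cartesian velocities, but it does not depend on the Cartesian coordinates themselves:

\[ \frac{\partial T}{\partial x_i} = 0, \quad \text{but} \quad \frac{\partial T}{\partial \dot{x}_i} = m\,\dot{x}_i. \tag{2.9} \]

Expression (2.8) for the kinetic energy can be rewritten in terms of generalized coordinates using (2.7):

\[ T = \frac{1}{2} m \left(\frac{\partial x_i}{\partial q_a}\,\dot{q}_a\right)\left(\frac{\partial x_i}{\partial q_b}\,\dot{q}_b\right) = \frac{1}{2} m\,\frac{\partial x_i}{\partial q_a}\frac{\partial x_i}{\partial q_b}\,\dot{q}_a \dot{q}_b. \]

Kinetic energy depends on the generalized velocities $\dot{q}_a$, but now it depends also on $q_a$ because of the partial derivatives (recall the remarks below equation (2.7)). In general,

\[ T = T(q, \dot{q}), \qquad \frac{\partial T}{\partial q_a} \neq 0, \qquad \frac{\partial T}{\partial \dot{q}_a} \neq 0. \]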
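Polar coordinates give a simple illustration of this dependence. Inserting the result (1.38) for $v^2$ into the kinetic energy, with $q_1 = r$ and $q_2 = \theta$, one finds:

```latex
\begin{align*}
T &= \frac{1}{2} m \left( \dot{r}^2 + r^2 \dot{\theta}^2 \right), \\
\frac{\partial T}{\partial r} &= m r \dot{\theta}^2, &
\frac{\partial T}{\partial \theta} &= 0, \\
\frac{\partial T}{\partial \dot{r}} &= m \dot{r}, &
\frac{\partial T}{\partial \dot{\theta}} &= m r^2 \dot{\theta}.
\end{align*}
```

The kinetic energy depends on the coordinate $r$ although it does not depend on the Cartesian coordinates; it happens not to depend on $\theta$.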


2.2.3 Generalized forces

The last ingredient necessary for the derivation of the Lagrange equations is the notion of generalized forces. Generalized forces are the components of the force $\mathbf{F}$ in the curvilinear coordinate system. If $F_i$ are the Cartesian components of the force, then the generalized forces are defined by

\[ Q_a = F_i\,\frac{\partial x_i}{\partial q_a}. \tag{2.10} \]
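As an illustration of definition (2.10), consider a planar force with Cartesian components $(F_x, F_y)$ and the polar coordinates (1.33). The derivatives $\partial x_i/\partial q_a$ are the elements of the Jacobi matrix (1.35), so that:

```latex
\begin{align*}
Q_r &= F_x \frac{\partial x}{\partial r} + F_y \frac{\partial y}{\partial r}
     = F_x \cos\theta + F_y \sin\theta, \\
Q_\theta &= F_x \frac{\partial x}{\partial \theta} + F_y \frac{\partial y}{\partial \theta}
     = r \left( -F_x \sin\theta + F_y \cos\theta \right) = x F_y - y F_x .
\end{align*}
```

Here $Q_r$ is the radial component of the force, while $Q_\theta$ is the torque of the force with respect to the origin. A generalized force therefore need not have the dimension of a force.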

2.2.4 Derivation

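Now we are prepared to derive the Lagrange equations of the second kind. Newton's law reads

\[ F_i = m\,\ddot{x}_i. \]

Using relation (2.9), the right hand side can be rewritten as

\[ F_i = \frac{d}{dt}\frac{\partial T}{\partial \dot{x}_i}. \]

Multiply this equation by $\partial x_i/\partial q_a$ (and sum over $i$) to obtain

\[ \frac{\partial x_i}{\partial q_a}\,F_i = \frac{\partial x_i}{\partial q_a}\,\frac{d}{dt}\frac{\partial T}{\partial \dot{x}_i}. \]

On the left hand side we recognize the generalized forces $Q_a$ according to relation (2.10):

\[ Q_a = \frac{\partial x_i}{\partial q_a}\,\frac{d}{dt}\frac{\partial T}{\partial \dot{x}_i}. \tag{2.11} \]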

Now we are going to rearrange the right hand side in order to eliminate the Cartesian coordinates $x_i$.

Using the Leibniz rule³, the right hand side can be rewritten as

\[ Q_a = \frac{d}{dt}\left(\frac{\partial x_i}{\partial q_a}\frac{\partial T}{\partial \dot{x}_i}\right) - \frac{\partial T}{\partial \dot{x}_i}\,\frac{d}{dt}\frac{\partial x_i}{\partial q_a}. \tag{2.12} \]

³ The Leibniz rule is the product rule for differentiation. The derivative of the product $fg$ is $(fg)' = f'g + fg'$. We use this rule in the form $fg' = (fg)' - f'g$.


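By (2.7), the velocities $\dot{x}_i$ are linear in $\dot{q}_a$, so that, in analogy with (2.6), $\partial \dot{x}_i/\partial \dot{q}_a = \partial x_i/\partial q_a$. Using this relation, the first term on the right hand side is equal to

\[ \frac{d}{dt}\left(\frac{\partial x_i}{\partial q_a}\frac{\partial T}{\partial \dot{x}_i}\right) = \frac{d}{dt}\left(\frac{\partial \dot{x}_i}{\partial \dot{q}_a}\frac{\partial T}{\partial \dot{x}_i}\right). \]

Recall that the kinetic energy depends on the Cartesian velocities $\dot{x}_i$ but it does not depend on the coordinates. Then, by the chain rule, we have

\[ \frac{\partial T}{\partial \dot{q}_a} = \frac{\partial T}{\partial \dot{x}_i}\frac{\partial \dot{x}_i}{\partial \dot{q}_a} + \underbrace{\frac{\partial T}{\partial x_i}}_{0}\,\frac{\partial x_i}{\partial \dot{q}_a} = \frac{\partial T}{\partial \dot{x}_i}\frac{\partial \dot{x}_i}{\partial \dot{q}_a}. \]

Thus, equation (2.12) acquires the form

\[ Q_a = \frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial \dot{x}_i}\,\frac{d}{dt}\frac{\partial x_i}{\partial q_a}. \tag{2.13} \]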

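Now we want to eliminate the Cartesian coordinates from the second term of equation (2.13). Consider the following identity:

\[ \frac{d}{dt}\frac{\partial x_i}{\partial q_a} = \frac{\partial}{\partial q_b}\left(\frac{\partial x_i}{\partial q_a}\right)\dot{q}_b + \frac{\partial}{\partial \dot{q}_b}\left(\frac{\partial x_i}{\partial q_a}\right)\ddot{q}_b, \]

in which the second term vanishes because $x_i$ does not depend on the generalized velocities. The order of the partial derivatives can be interchanged (partial derivatives commute):

\[ \frac{d}{dt}\frac{\partial x_i}{\partial q_a} = \frac{\partial}{\partial q_a}\left[\frac{\partial x_i}{\partial q_b}\,\dot{q}_b + \frac{\partial x_i}{\partial \dot{q}_b}\,\ddot{q}_b\right] = \frac{\partial}{\partial q_a}\frac{dx_i}{dt} = \frac{\partial \dot{x}_i}{\partial q_a}. \]

The second term in (2.13) then reads

\[ \frac{\partial T}{\partial \dot{x}_i}\,\frac{d}{dt}\frac{\partial x_i}{\partial q_a} = \frac{\partial T}{\partial \dot{x}_i}\frac{\partial \dot{x}_i}{\partial q_a} = \frac{\partial T}{\partial q_a}. \]

Substituting this equality into (2.13) we arrive at the final form of the equations of motion:

\[ \frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial q_a} = Q_a. \tag{2.14} \]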
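Equations (2.14) are straightforward to evaluate with a computer algebra system. The sketch below treats a particle in the plane described by polar coordinates, with the kinetic energy following from (1.38); the names kin and lhs, as well as the symbols Qr and Qtheta for the unspecified generalized forces, are hypothetical and introduced only for this illustration.

```mathematica
ClearAll[r, theta, t, m, kin, lhs, Qr, Qtheta];

(* Kinetic energy in polar coordinates, T = m (r'^2 + r^2 theta'^2)/2 *)
kin = m (r'[t]^2 + r[t]^2 theta'[t]^2)/2;

(* Left hand side of (2.14) for the generalized coordinate q *)
lhs[q_] := D[D[kin, q'[t]], t] - D[kin, q[t]]

(* Equations of motion for q1 = r and q2 = theta *)
{lhs[r] == Qr, lhs[theta] == Qtheta} // Simplify
```

The radial equation contains the term $-m r \dot{\theta}^2$, which comes entirely from $\partial T/\partial r$, and the angular equation states that the time derivative of $m r^2 \dot{\theta}$ equals the generalized force $Q_\theta$.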


2.3 Lagrange equations

In the previous section we have derived, after some effort, the Lagrange equations of the second kind (2.14). These equations are completely equivalent to Newton's law of motion, but they are written in an arbitrary curvilinear coordinate system, while Newton's law has its simple form in Cartesian coordinates only. If we want to derive the equations of motion for a particular system, we have to write down the expression for the kinetic energy $T$, transform it to an appropriate coordinate system, find the components of the generalized forces $Q_a$ and insert them into equations (2.14).

Note that while it is easy to find the expression for $T$ in generalized coordinates, because it is a scalar function, the generalized forces involve the calculation of the sum

\[ Q_a = \frac{\partial x_i}{\partial q_a}\,F_i. \]

There is, however, a special but very important case, when the forces $F_i$ are conservative. We know from elementary physics that the gravitational force or the electrostatic force can be written as the gradient of a scalar function called the potential. By definition, the force with components $F_i$ is called conservative if there exists a potential $V$ such that

\[ F_i = -\frac{\partial V}{\partial x_i} \equiv -\partial_i V, \tag{2.15} \]

where the minus sign is conventional. What are the components of the generalized forces in such a case? The calculation is straightforward:

\[ Q_a = \frac{\partial x_i}{\partial q_a}\,F_i = -\frac{\partial x_i}{\partial q_a}\frac{\partial V}{\partial x_i} = -\frac{\partial V}{\partial q_a}. \]

Thus, for conservative forces, the components of the generalized forces are simply (minus) the partial derivatives of the potential with respect to the generalized coordinates. The Lagrange equations of the second kind then acquire the form

\[ \frac{d}{dt}\frac{\partial T}{\partial \dot{q}_a} - \frac{\partial T}{\partial q_a} = -\frac{\partial V}{\partial q_a}. \tag{2.16} \]

An important point is that the potential cannot depend on velocities. This is a consequence of the fact that conservative forces do not depend on the motion of the bodies on which they act²; they depend only on the configuration of the system, i.e. on the positions of the individual objects. In other words,

$$\frac{\partial V}{\partial \dot q_a} = 0.$$

² For example, the gravitational force depends only on the distance between the two objects, but it does not depend on the velocities of the bodies. On the other hand, the electromagnetic force does depend on the velocity: the magnetic force is a cross product of the velocity and the magnetic field. Nevertheless, the concept of the Lagrangian is valid also for the electromagnetic force; this issue is explained later.

Now, rewrite equation (2.16) as

$$\frac{d}{dt}\frac{\partial T}{\partial \dot q_a} - \frac{\partial (T - V)}{\partial q_a} = 0.$$

Since the potential does not depend on the velocities $\dot q_a$, we can also write

$$\frac{d}{dt}\frac{\partial (T - V)}{\partial \dot q_a} - \frac{\partial (T - V)}{\partial q_a} = 0,$$

because the term $\partial V/\partial \dot q_a$ which we added vanishes anyway. Obviously, it is useful to introduce a new scalar function called the Lagrangian by

$$L = T - V. \tag{2.17}$$

We arrive at the Lagrange equations in the form

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_a} - \frac{\partial L}{\partial q_a} = 0. \tag{2.18}$$

Notice the terminology used: there exist Lagrange equations of the first kind, but we do not consider them in this text. In the previous section we derived the Lagrange equations of the second kind, which are equivalent to Newton’s law of motion but are written in a generalized coordinate system. Equations (2.18) are called simply Lagrange equations. They are not completely equivalent to Newton’s law, because we assumed that the forces are conservative. Gravitational and electrostatic forces are typical conservative forces. By contrast, friction and general electromagnetic forces are non-conservative, i.e. there is no potential $V$ from which they can be derived. If the system under consideration involves friction, we cannot find its Lagrangian and we cannot use the Lagrange equations, but we can still use the Lagrange equations of the second kind. It is interesting that although the electromagnetic force is not conservative, a Lagrangian for it exists, as we will see later. Friction is not a fundamental force, however: it is the result of a complicated interaction between the molecules forming the surfaces of bodies in contact. The electromagnetic force, on the other hand, is fundamental; it is one of the four basic forces in Nature. In fact, it is the most important force for us. Fortunately, it can be described in the Lagrange formalism, so that the Lagrange equations (2.18) are sufficient for the description of almost all physically relevant situations.


The Lagrange equations have one important advantage compared to the Lagrange equations of the second kind: the equations of motion are derived from the single function $L$ and we do not have to calculate the generalized forces $Q_a$. In the following sections we show a few examples of how the Lagrange formalism works, and then we show how to implement the formalism in Mathematica.

2.4 Particle in homogeneous gravitational field

We start with a very simple example, the homogeneous gravitational field. A gravitational field is never exactly homogeneous, but near the surface of the Earth it is approximately constant. All bodies then move with the same constant gravitational acceleration $\mathbf{g}$. Its Cartesian components are

$$\mathbf{g} = (0, -g),$$

where $g = 9.81\ \mathrm{m\,s^{-2}}$. Thus, the gravitational acceleration always points downwards and has magnitude $g$. On the other hand, the Cartesian components of the acceleration are $(\ddot x, \ddot y)$, so the equations of motion are

$$\ddot x = 0, \qquad \ddot y = -g. \tag{2.19}$$

Let us see how the same result can be derived in the Lagrange formalism. The kinetic energy of the particle is

$$T = \frac{1}{2}\, m\, \dot x_i \dot x_i = \frac{1}{2}\, m \left(\dot x^2 + \dot y^2\right).$$

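The gravitational force is

$$\mathbf{F} = m\,\mathbf{g},$$

so that

$$F_1 = 0, \qquad F_2 = -m g.$$

In order to find the potential we have to solve the equations

$$F_1 = -\frac{\partial V}{\partial x}, \qquad F_2 = -\frac{\partial V}{\partial y}.$$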

The first equation merely states that $V$ does not depend on $x$, i.e.

$$V = V(y).$$

The second equation then reads

$$-\frac{\partial V}{\partial y} = -m g,$$

which integrates to

$$V = \int m g \, dy = m g\, y + \text{const}.$$

The integration constant does not affect the equations of motion (why?), so we can set the constant to zero without loss of generality.

We have found the kinetic energy and the potential, so we can write down the Lagrangian, which is by definition

$$L = T - V = \frac{1}{2}\, m \left(\dot x^2 + \dot y^2\right) - m g\, y. \tag{2.20}$$

Notice that the Lagrange equations (2.18) are written in an arbitrary coordinate system. Our motivation was to introduce curvilinear coordinates, but these equations hold in the Cartesian system as well. Now the generalized coordinates are simply

$$q_1 = x, \qquad q_2 = y,$$

and the Lagrange equations read

$$\frac{d}{dt}\frac{\partial L}{\partial \dot x_i} - \frac{\partial L}{\partial x_i} = 0.$$

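For the Lagrangian (2.20) we have

$$\frac{\partial L}{\partial \dot x} = m \dot x, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot x} = m \ddot x,$$

$$\frac{\partial L}{\partial \dot y} = m \dot y, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot y} = m \ddot y,$$

$$\frac{\partial L}{\partial x} = 0, \qquad \frac{\partial L}{\partial y} = -m g.$$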


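Substituting these expressions into the Lagrange equations (2.18) we arrive at the equations of motion:

$$\frac{d}{dt}\frac{\partial L}{\partial \dot x} - \frac{\partial L}{\partial x} = 0 \quad\rightarrow\quad m \ddot x = 0,$$

$$\frac{d}{dt}\frac{\partial L}{\partial \dot y} - \frac{\partial L}{\partial y} = 0 \quad\rightarrow\quad m \ddot y = -m g.$$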

We can see that the Lagrange equations reproduce the familiar equations of motion (2.19). Of course, for the motion in a homogeneous gravitational field we can find the equations of motion more easily than through the Lagrangian. But before we apply the formalism to more complicated problems, it is useful to see how it works in simple cases where we know the result even without the Lagrange equations.

2.5 Harmonic oscillator

The harmonic oscillator is one of the most important models in physics. In mechanics it corresponds to the motion of an idealized spring, see figure 2.2. A point mass $m$ is connected to a fixed point via a massless spring. If the point mass is displaced from the equilibrium position, the spring exerts the force

$$F = -k\, q,$$

where $q$ is the displacement. The minus sign is due to the fact that the force always acts in the direction opposite to the displacement. The constant $k$ is called the rigidity of the spring. According to Newton’s law of motion, the acceleration is given by

$$a = \frac{F}{m}.$$

Since the motion is one-dimensional, the only component of the previous equation is

$$\ddot q = -\frac{k}{m}\, q.$$

The constant $k/m$ is usually denoted as

$$\omega^2 = \frac{k}{m},$$

so that the equation of motion is

$$\ddot q + \omega^2 q = 0. \tag{2.21}$$

This equation appears in physics very frequently, even in situations that have nothing to do with the motion of a spring, e.g. oscillations in electric circuits or vibrations of atoms in a crystal lattice.

Let us find the Lagrangian for the harmonic oscillator. The kinetic energy is straightforward:

$$T = \frac{1}{2}\, m\, \dot q^2.$$

The potential is defined by the relation

$$F = -\frac{\partial V}{\partial q}$$

and the integration yields

$$V = -\int F \, dq = \int k\, q \, dq = \frac{1}{2}\, k\, q^2.$$

This expression is usually written in terms of the parameter $\omega$:

$$V = \frac{1}{2}\, m\, \omega^2 q^2.$$

Thus, the Lagrangian is

$$L = \frac{1}{2}\, m\, \dot q^2 - \frac{1}{2}\, m\, \omega^2 q^2. \tag{2.22}$$

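The Lagrange equations are obtained in the usual way and we find

$$\frac{\partial L}{\partial \dot q} = m \dot q, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot q} = m \ddot q, \qquad \frac{\partial L}{\partial q} = -m\, \omega^2 q. \tag{2.23}$$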

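Inserting these expressions into (2.18) we find

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0 \quad\rightarrow\quad \ddot q + \omega^2 q = 0. \tag{2.24}$$

Again, we arrived at the expected equation of motion (2.21).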
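The equation of motion (2.21) is linear and can be solved in closed form. As a quick check, assuming a Mathematica session is at hand, one can ask the built-in symbolic solver DSolve for the general solution. The result is expected to be a combination of $\cos\omega t$ and $\sin\omega t$ with two arbitrary constants, which are fixed by the initial position and velocity. This is only a side sketch and is not part of the derivation above.

```mathematica
(* make sure the symbols carry no values from earlier experiments *)
Clear[q, t, \[Omega]];

(* general solution of the harmonic oscillator equation (2.21) *)
DSolve[q''[t] + \[Omega]^2 q[t] == 0, q[t], t]
```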


Fig. 2.2. The equilibrium position of the spring corresponds to $q = 0$. The restoring force $F$ is proportional to the displacement, $F = -kq$.

2.6 Mathematical pendulum

So far we have considered the Lagrange equations in Cartesian coordinates. Now we return to the example from the introduction to this chapter, the mathematical pendulum; recall figure 2.1. As we explained, it is more convenient to introduce polar coordinates via the relations

$$x = r \sin\theta, \qquad y = r \cos\theta.$$

The kinetic energy is again given, in Cartesian coordinates, by

$$T = \frac{1}{2}\, m\, \dot x_i \dot x_i = \frac{1}{2}\, m \left(\dot x^2 + \dot y^2\right).$$

Now we have to rewrite this expression in the polar coordinates $r$ and $\theta$. Since the rod of the pendulum is supposed to be perfectly rigid, the coordinate $r$ remains constant. On the other hand, the coordinate $\theta$ depends on time,

$$\theta = \theta(t).$$

The derivatives of the Cartesian coordinates $x$ and $y$ with respect to time are therefore given by

$$\dot x = r (\cos\theta)\, \dot\theta = r\, \dot\theta \cos\theta, \qquad \dot y = -r (\sin\theta)\, \dot\theta = -r\, \dot\theta \sin\theta. \tag{2.25}$$

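Next we insert these relations into $T$:

$$T = \frac{1}{2}\, m \left(r^2 \dot\theta^2 \cos^2\theta + r^2 \dot\theta^2 \sin^2\theta\right) = \frac{1}{2}\, m r^2 \dot\theta^2 \left(\cos^2\theta + \sin^2\theta\right) = \frac{1}{2}\, m r^2 \dot\theta^2. \tag{2.26}$$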

What about the potential $V$? In our elementary analysis at the beginning of the chapter, we decomposed the force $\mathbf{F}$ into its tangent and normal components and realized that the normal component does not affect the motion, while the tangent component causes the angular acceleration. The decomposition of the force was easy in this case, but sometimes it can be very difficult and one has to find an appropriate way for each problem. In the Lagrange formalism, however, the procedure is always the same (although the calculation itself can be complicated).

The Cartesian components of the force are

$$F_1 = 0, \qquad F_2 = m g.$$

Note that we do not include the minus sign, because the $y$-axis is oriented downwards (see figure 2.1). Now we can compute the generalized forces according to relation (2.10). Since we have only one generalized coordinate $\theta$, there is only one generalized force:

$$Q = \frac{\partial x_i}{\partial \theta}\, F_i = F_1 \frac{\partial x}{\partial \theta} + F_2 \frac{\partial y}{\partial \theta} = -m g\, r \sin\theta.$$

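The potential is then defined by

$$Q = -\frac{\partial V}{\partial \theta},$$

which integrates to

$$V = -\int Q \, d\theta = \int m g\, r \sin\theta \, d\theta = -m g\, r \cos\theta.$$

The Lagrangian of the mathematical pendulum is therefore

$$L = \frac{1}{2}\, m r^2 \dot\theta^2 + m g\, r \cos\theta. \tag{2.27}$$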

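Once we have the Lagrangian, the equations of motion follow immediately from the Lagrange equations (2.18):

$$\frac{\partial L}{\partial \dot\theta} = m r^2 \dot\theta, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} = m r^2 \ddot\theta, \qquad \frac{\partial L}{\partial \theta} = -m g\, r \sin\theta,$$

$$\frac{d}{dt}\frac{\partial L}{\partial \dot\theta} - \frac{\partial L}{\partial \theta} = 0 \quad\rightarrow\quad \ddot\theta + \frac{g}{r} \sin\theta = 0.$$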


If we define

$$\omega_0^2 = \frac{g}{r},$$

the equation of motion acquires the form

$$\ddot\theta + \omega_0^2 \sin\theta = 0. \tag{2.28}$$

We have seen how the Lagrange formalism can be applied to familiar problems to obtain the equations of motion, and we are ready to study a new problem where the equations of motion are not known in advance. We will illustrate the power of the formalism on the example of the double pendulum. Before we analyse the double pendulum, let us see how the Lagrange formalism can be implemented in Mathematica.

2.7 Lagrange equations in Mathematica

In this section we present one possible way to derive the Lagrange equations using Mathematica. The algorithm to be explained takes the Lagrangian $L$ and the list of generalized coordinates and velocities, and differentiates the Lagrangian in order to obtain the Lagrange equations. As our example we take the motion in a homogeneous gravitational field investigated in section 2.4.

The generalized coordinates are now

$$q_1 = x, \qquad q_2 = y,$$

and the Lagrangian is

$$L = \frac{1}{2}\, m \left(\dot x^2 + \dot y^2\right) - m g\, y.$$

Let us explicitly denote which variables depend on time:

$$L = \frac{1}{2}\, m \left(\dot x(t)^2 + \dot y(t)^2\right) - m g\, y(t).$$

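In order to find the Lagrange equations

$$\frac{d}{dt}\frac{\partial L}{\partial \dot q_a} - \frac{\partial L}{\partial q_a} = 0$$

we have to evaluate the partial derivatives

$$\frac{\partial L}{\partial x(t)}, \qquad \frac{\partial L}{\partial y(t)}, \qquad \frac{\partial L}{\partial \dot x(t)}, \qquad \frac{\partial L}{\partial \dot y(t)},$$

and then calculate the total derivatives

$$\frac{d}{dt}\frac{\partial L}{\partial \dot x(t)}, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot y(t)}.$$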

The derivative of the Lagrangian with respect to the coordinate $x$ can be found by the command

    D[ L, x[t] ]

where L is the Lagrangian written in Mathematica. Similarly, the derivative with respect to the velocity is simply

    D[ L, x'[t] ]

An equivalent way to write the last command is

    D[ L, D[x[t], t] ]

Since we need to differentiate this expression with respect to time again, we can write

    D[ L, D[x[t], t], t ]

Hence, the Lagrange equation for the variable $x$ can be written in the form

    D[ L, D[x[t], t], t ] - D[ L, x[t] ] == 0

An analogous command can be constructed for the second variable $y$. In order to make our code universal, we realize that we have to perform the operation

    D[ L, D[#, t], t ] - D[ L, # ] == 0

for each generalized coordinate #, where # must be taken from the list of generalized coordinates. Hence, suppose that we have a list of generalized coordinates called qs, in our case

    qs = {x[t], y[t]}

and the Lagrangian L. Then we can define the function

    In[5]:= LagrangeEqs[qs_, L_] := D[L, D[#, t], t] - D[L, #] == 0 & /@ qs

and use it by invoking


    In[6]:= LagrangeEqs[{x[t], y[t]}, 1/2 m (x'[t]^2 + y'[t]^2) - m g y[t]]

    Out[6]= {m x''[t] == 0, g m + m y''[t] == 0}

Obviously, Mathematica returns the correct set of equations of motion, namely (2.19) multiplied by $m$.
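The same function can be tried on the other Lagrangians derived in this chapter. The following sketch repeats the definition of LagrangeEqs so that it runs on its own (the Lagrangian argument is renamed to lag here, purely to avoid a clash with a global symbol L) and applies it to the harmonic oscillator (2.22) and the mathematical pendulum (2.27). The returned equations are expected to agree with (2.21) and (2.28) after dividing by $m$ and $m r^2$, respectively.

```mathematica
(* same construction as In[5]; "lag" is just a renamed argument *)
LagrangeEqs[qs_, lag_] := D[lag, D[#, t], t] - D[lag, #] == 0 & /@ qs

(* harmonic oscillator, Lagrangian (2.22) *)
LagrangeEqs[{q[t]}, 1/2 m q'[t]^2 - 1/2 m \[Omega]^2 q[t]^2]

(* mathematical pendulum, Lagrangian (2.27); r is treated as a constant *)
LagrangeEqs[{\[Theta][t]},
  1/2 m r^2 \[Theta]'[t]^2 + m g r Cos[\[Theta][t]]]
```

Note that the list of coordinates must be given even when there is only one of them, because the function maps over that list.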

2.8 Solving the equations of motion of pendulum

In the case of the mathematical pendulum we can write

    q = \[Theta];
    v = p;
    L = 1/2 m r^2 p^2 + m g r Cos[\[Theta]];

where we have denoted $p = \dot\theta$. Again, running the rest of our code yields the correct equation of motion

    g m r Sin[\[Theta][t]] + m r^2 \[Theta]''[t] == 0

We can see that the factor $m r^2$ can be divided out and the equation of motion is (2.28):

$$\ddot\theta + \omega_0^2 \sin\theta = 0. \tag{2.29}$$

This equation cannot be solved in closed form in terms of elementary functions, which means that we cannot write down a simple explicit solution. Fortunately, there exist numerical methods which allow us to find an approximate solution. In fact, Mathematica has many built-in methods for constructing numerical solutions of many types of differential equations. They are all encapsulated in the NDSolve function. A numerical solution cannot be obtained, however, if the values of the constants are not specified. Moreover, to find a particular solution we also have to provide the initial conditions. Our task is now to solve equation (2.29) numerically with appropriate initial conditions.

    \[Omega]0 = 1; T0 = 2 \[Pi] / \[Omega]0;

    eqs = {\[Theta]''[t] + \[Omega]0^2 Sin[\[Theta][t]] == 0,
           \[Theta][0] == \[Pi]/4, \[Theta]'[0] == 0};

    sol = NDSolve[ eqs, \[Theta][t], {t, 0, 2 T0} ]


Here we first set the value of $\omega_0$ to 1 for simplicity. Moreover, we define the “period” $T_0 = 2\pi/\omega_0$, because we know that this relation holds for the harmonic oscillator. Next we define the list of three equations,

$$\ddot\theta + \omega_0^2 \sin\theta = 0, \qquad \theta(0) = \frac{\pi}{4}, \qquad \dot\theta(0) = 0.$$

The first of them is the equation of motion, the other two represent the initial conditions. The equation $\theta(0) = \pi/4$ means that the angle of deflection at time $t = 0$ is equal to $\pi/4$ (in what position is the pendulum?). The initial velocity $\dot\theta(0)$ has been set to zero. The solution is found by the function NDSolve, as claimed, where we specify

• the system of equations to solve – eqs;
• the unknown variable – θ[t];
• the interval of integration – {t, 0, 2 T0}.

The result of NDSolve is something of the form

    \[Theta][t] -> InterpolatingFunction[....][t]

We can see that it is a replacement rule. According to this rule, any occurrence of θ[t] will be replaced by an interpolating function. When the function NDSolve constructs the solution, it finds only a finite number of values of the unknown function $\theta$ on the desired interval. Later, however, we want to evaluate the solution at an arbitrary time $t$, and it can happen that this time is different from any time used in the construction of the solution. For this reason, Mathematica has to "guess" the correct value of $\theta$ at that time. By "guessing" we mean interpolation between the closest times at which the value of $\theta$ is known.

However, it is not important for us how the procedure works. What we need to know is that in order to evaluate the solution at an arbitrary time $t$, here $t = 1$, we have to type

    \[Theta][t] /. sol /. t -> 1

The symbol θ[t] has no meaning to Mathematica by itself, but the rule sol replaces it by the function which is the solution of the equation of motion. Then we can replace the argument t by its concrete numerical value using the next rule.
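A small practical point is that NDSolve returns a list of solutions, each of which is itself a list of rules, so the replacement above gives a one-element list rather than a bare number. The sketch below repeats the setup so that it runs on its own and shows one way to extract a plain value with First; the name thetaSol is a hypothetical helper introduced only for this illustration.

```mathematica
\[Omega]0 = 1; T0 = 2 \[Pi]/\[Omega]0;
eqs = {\[Theta]''[t] + \[Omega]0^2 Sin[\[Theta][t]] == 0,
   \[Theta][0] == \[Pi]/4, \[Theta]'[0] == 0};
sol = NDSolve[eqs, \[Theta][t], {t, 0, 2 T0}];

(* sol is a list of lists of rules: this returns a one-element list *)
\[Theta][t] /. sol /. t -> 1

(* First[sol] picks the single solution and gives a bare number *)
\[Theta][t] /. First[sol] /. t -> 1

(* thetaSol is a hypothetical helper, not part of the original code *)
thetaSol[s_] := \[Theta][t] /. First[sol] /. t -> s
thetaSol[0]   (* should be close to the initial angle Pi/4 *)
```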

Finally, we can visualise the solution by the command Plot. The complete code for solving the equation of motion and plotting the solution follows; the resulting picture is in figure 2.3.


    \[Omega]0 = 1; T0 = 2 \[Pi] / \[Omega]0;

    eqs = {\[Theta]''[t] + \[Omega]0^2 Sin[\[Theta][t]] == 0,
           \[Theta][0] == \[Pi]/4, \[Theta]'[0] == 0};

    sol = NDSolve[ eqs, \[Theta][t], {t, 0, 2 T0} ]

    Plot[ \[Theta][t] /. sol, {t, 0, 2 T0} ]

Fig. 2.3. Numerical solution of the equation of the mathematical pendulum for the initial conditions $\theta(0) = \pi/4$, $\dot\theta(0) = 0$ and the value $\omega_0 = 1$.

2.9 Deriving the Lagrangian in Mathematica

The code we have developed in the previous sections is able to find the equations of motion from an arbitrary Lagrangian, provided that the lists of generalized coordinates and velocities are specified. In the case of the mathematical pendulum in section 2.6 we have seen that sometimes the Lagrangian must first be transformed into an appropriate coordinate system. Even this procedure can be automated in Mathematica.

As an example we use the mathematical pendulum again. Since the $y$-axis is oriented downwards (see figure 2.1), the potential is $V = -m g\, y$ and the Lagrangian in Cartesian coordinates reads

$$L = \frac{1}{2}\, m \left(\dot x^2 + \dot y^2\right) + m g\, y.$$

Recall that polar coordinates for the pendulum were introduced by

$$x = r \sin\theta, \qquad y = r \cos\theta. \tag{2.30}$$

Recall, in addition, that only the coordinate $\theta$ depends on time. In Mathematica we type

    x[t_] = r Sin[\[Theta][t]];
    y[t_] = r Cos[\[Theta][t]];
    L = 1/2 m ( x'[t]^2 + y'[t]^2 ) + m g y[t]

which yields

    g m r Cos[\[Theta][t]] +
     1/2 m (r^2 Cos[\[Theta][t]]^2 \[Theta]'[t]^2 +
        r^2 Sin[\[Theta][t]]^2 \[Theta]'[t]^2)

This is correct but can be simplified:

    x[t_] = r Sin[\[Theta][t]];
    y[t_] = r Cos[\[Theta][t]];
    L = Simplify[ 1/2 m ( x'[t]^2 + y'[t]^2 ) ] + m g y[t]

Now the identity $\sin^2\theta + \cos^2\theta = 1$ is applied automatically and the result is

    g m r Cos[\[Theta][t]] + 1/2 m r^2 \[Theta]'[t]^2

The reader can check that this result is identical with the Lagrangian (2.27).
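The two steps, transforming the Lagrangian and deriving the equations of motion, can be chained. The following is a minimal sketch under the assumption that LagrangeEqs is defined as in section 2.7 (it is repeated here, with the argument renamed to lag, so that the block is self-contained); lagPolar is a name introduced only for this illustration. The output is expected to agree with the pendulum equation found in section 2.8.

```mathematica
Clear[x, y, r, \[Theta], t];

(* same construction as in section 2.7 *)
LagrangeEqs[qs_, lag_] := D[lag, D[#, t], t] - D[lag, #] == 0 & /@ qs

(* transformation (2.30); only \[Theta] depends on time *)
x[t_] = r Sin[\[Theta][t]];
y[t_] = r Cos[\[Theta][t]];

(* Lagrangian of the pendulum expressed through \[Theta] and \[Theta]' *)
lagPolar = Simplify[1/2 m (x'[t]^2 + y'[t]^2)] + m g y[t];

(* a single generalized coordinate, hence a single equation *)
LagrangeEqs[{\[Theta][t]}, lagPolar]
```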

2.10 Planet in gravitational field

In this section we study a new problem: the motion of a planet in the gravitational field of a star. First we formulate the problem in physical terms, then we present its solution using Mathematica.

Suppose we have a massive star, e.g. the Sun, which is at rest in a given reference frame. The Sun produces a gravitational field which attracts all bodies towards its center. According to Newton’s law of gravitation, a body of arbitrary mass $m$ moves with the acceleration

$$\mathbf{a} = -G M\, \frac{\mathbf{r}}{r^3}, \tag{2.31}$$

where $G$ is Newton’s gravitational constant, $M$ is the mass of the Sun and $\mathbf{r}$ is the position vector of the planet with respect to the Sun, see figure 2.4. We can choose units in such a way that

$$G M = 4\pi^2.$$

It is convenient to introduce polar coordinates in the standard way as

$$x = r \cos\theta, \qquad y = r \sin\theta,$$

where both coordinates $r$ and $\theta$ depend on time (in general).

Fig. 2.4. Position of the planet (mass $m$) with respect to the Sun (mass $M$); $\mathbf{r}$ is the position vector.

It is easy to show that the potential of the gravitational force is

$$V(r) = -\frac{4\pi^2 m}{r}, \tag{2.32}$$

where $m$ is the mass of the planet. The Lagrangian of the planet is then

$$L = \frac{1}{2}\, m \left(\dot x^2 + \dot y^2\right) - V.$$

Let us find the expression for this Lagrangian in polar coordinates. The corresponding Mathematica code reads

    x[t_] = r[t] Cos[\[Theta][t]];
    y[t_] = r[t] Sin[\[Theta][t]];
    L = Simplify[1/2 m ( x'[t]^2 + y'[t]^2 )] + (4 \[Pi]^2 m)/r[t]

and yields

$$L = \frac{1}{2}\, m \left(\dot r^2 + r^2 \dot\theta^2\right) + \frac{4\pi^2 m}{r}.$$
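With the Lagrangian in polar coordinates at hand, the equations of motion can be obtained with the function from section 2.7. The block below is only a sketch under stated assumptions: LagrangeEqs is repeated (with the argument renamed to lag) so that the code is self-contained, the Lagrangian is $L = T - V$ with $V$ from (2.32), and lagPlanet is a name chosen for this illustration. One expects a radial equation containing the centrifugal term $m r \dot\theta^2$ and the attraction $4\pi^2 m/r^2$, and an angular equation stating that $m r^2 \dot\theta$ is constant in time.

```mathematica
Clear[r, \[Theta], t];

(* same construction as in section 2.7 *)
LagrangeEqs[qs_, lag_] := D[lag, D[#, t], t] - D[lag, #] == 0 & /@ qs

(* L = T - V with V = -4 Pi^2 m / r, see (2.32) *)
lagPlanet = 1/2 m (r'[t]^2 + r[t]^2 \[Theta]'[t]^2) + (4 \[Pi]^2 m)/r[t];

(* two generalized coordinates, hence two equations *)
LagrangeEqs[{r[t], \[Theta][t]}, lagPlanet] // Simplify
```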

3

Hamilton’s equations

3.1 Motivation

Let us recapitulate the advantages of Lagrange’s equations compared to Newton’s law of motion:

• Lagrange’s equations hold in an arbitrary curvilinear coordinate system;
• the number of Lagrange’s equations is equal to the number of degrees of freedom, while in the Cartesian system we always have three equations for each particle and possibly some additional constraints;
• the system is described by a single scalar function called the Lagrangian, which simplifies the transformation to a generalized coordinate system.

In the context of classical mechanics, Lagrange’s equations are equivalent to Newton’s laws, but Newton’s laws turned out to be only approximately valid and have to be replaced by the theory of relativity and quantum mechanics. Nevertheless, the formalism of Lagrange’s equations can be applied even in those theories.

A typical Lagrangian for one particle in Cartesian coordinates has the form

$$L = T - V = \frac{1}{2}\, m\, \dot x_i \dot x_i - V(x). \tag{3.1}$$

Let us examine the structure of Lagrange’s equations

$$\frac{d}{dt}\frac{\partial L}{\partial \dot x_i} - \frac{\partial L}{\partial x_i} = 0$$

compared to Newton’s law of force in the form

$$\frac{dp_i}{dt} = F_i.$$


For the Lagrangian under consideration we have

$$\frac{\partial L}{\partial \dot x_i} = m \dot x_i = p_i.$$

We can see that, in Cartesian coordinates, the derivative of the Lagrangian with respect to the velocity $\dot x_i$ is equal to the ordinary momentum

$$p_i = m \dot x_i.$$

Lagrange’s equations can then be written in the form

$$\frac{dp_i}{dt} = \frac{\partial L}{\partial x_i}.$$

But for the Lagrangian (3.1) we have

$$\frac{\partial L}{\partial x_i} = \frac{\partial}{\partial x_i}\,(T - V) = -\frac{\partial V}{\partial x_i} = F_i,$$

since the kinetic energy $T$ does not depend on $x_i$ and the potential is defined by the relation $F_i = -\partial V/\partial x_i$. With this observation we immediately see that Lagrange’s equations are equivalent to Newton’s law, for they acquire the form

$$\frac{dp_i}{dt} = F_i.$$

Notice that all of this holds in Cartesian coordinates only.

What about a general coordinate system $q_a$? We have seen, see equation (2.27), that the Lagrangian of the pendulum in polar coordinates is

$$L = \frac{1}{2}\, m r^2 \dot\theta^2 + m g\, r \cos\theta,$$

where the single generalized coordinate is $\theta$ and the corresponding generalized velocity is $\dot\theta$. Now

$$\frac{\partial L}{\partial \dot\theta} = m r^2 \dot\theta.$$

This expression is not equal to the momentum of the pendulum; nevertheless, there is a connection. The velocity of the pendulum is

$$v = \omega\, r,$$

where $\omega = \dot\theta$ is the instantaneous angular velocity of the pendulum. Since $p = m v$, the actual momentum of the pendulum is

$$p = m v = m r\, \dot\theta.$$

We can see that the quantities $p$ and $\partial L/\partial \dot\theta$ differ by a factor $r$. But this is a consequence of the choice of the coordinates only! Although $p$ and $\partial L/\partial \dot\theta$ are different, they are obviously related.

Thus, in the two cases we presented, the derivatives of the Lagrangian with respect to generalized velocities are related to the momentum of the particle. As we have seen, in the Cartesian system they coincide, but in curvilinear coordinates they do not. Nevertheless, it seems reasonable to define a notion of momenta derived from the Lagrangian.

Let $L$ be an arbitrary Lagrangian depending on generalized coordinates $q_a$ and velocities $\dot q_a$, i.e. $L = L(q, \dot q)$. We define the generalized momentum $p_a$ conjugate to the coordinate $q_a$:

$$p_a = \frac{\partial L}{\partial \dot q_a}. \tag{3.2}$$

Lagrange’s equations then acquire the form

$$\frac{dp_a}{dt} = \frac{\partial L}{\partial q_a}.$$

If we know the actual position and momentum of the particle, we can calculate how the momentum varies in time. But how does the position change? We can find the answer only by solving Lagrange’s equations to obtain the functions $q_a = q_a(t)$ and then calculating $\dot q_a$. It would be better, however, if we could write equations of the form

$$\dot q_a = \text{something}, \qquad \dot p_a = \text{something else}. \tag{3.3}$$

If we describe the state of the system by the coordinates $q_a$ and momenta $p_a$, Lagrange’s equations give only the second part via

$$\dot p_a = \frac{\partial L}{\partial q_a}. \tag{3.4}$$

But the Lagrangian is a function of $q_a$ and $\dot q_a$, so we can, in principle, use the definition of the generalized momentum

$$p_a = \frac{\partial L}{\partial \dot q_a} \tag{3.5}$$

and invert it to obtain a relation of the form

$$\dot q_a = \dot q_a(q, p).$$

In general, however, we cannot say anything more.

To summarize, Lagrange’s equations are second order differential equations for the unknown functions $q_a$. In the Lagrange formalism the independent variables are the coordinates $q_a$ and velocities $\dot q_a$. We defined the generalized momentum $p_a$ by (3.2). Now we want to rewrite Lagrange’s equations in such a way that the new equations have the form (3.3). We have seen that, using the momenta $p_a$, Lagrange’s equations have the form (3.4) and constitute only half of the equations we want to find. The difficulty essentially is that the Lagrangian itself is a function of $q_a$ and $\dot q_a$, but now we want the independent variables to be $q_a$ and $p_a$. Hamilton’s formalism provides a systematic way to obtain the desired equations. Before we present it, a remark on the Legendre transformation must be made.

3.2 Legendre transformation

In this section we formulate the problem in a more general way; in the following section we apply it to Lagrange’s formalism. Suppose that a function $f$ of variables $(x_1, \dots, x_n)$ is given,

$$f = f(x_1, \dots, x_n) \equiv f(x).$$

Its total differential is

$$df = \frac{\partial f}{\partial x_1}\, dx_1 + \dots + \frac{\partial f}{\partial x_n}\, dx_n,$$

or, using Einstein’s summation convention,

$$df = \frac{\partial f}{\partial x_i}\, dx_i.$$

It is an important point that the converse is also true. If $f$ is a function of some set of variables, and its total differential is found to be

$$df = y_i\, dx_i,$$

we can deduce that $f$ is a function of the variables $x_i$ and that the relation

$$\frac{\partial f}{\partial x_i} = y_i$$

holds.

Let us return to the expression

$$df = \frac{\partial f}{\partial x_i}\, dx_i.$$

Now denote

$$y_i = \frac{\partial f}{\partial x_i}$$

so that the differential $df$ acquires the form

$$df = y_i\, dx_i.$$

Using the Leibniz rule we can write

$$df = d(x_i y_i) - x_i\, dy_i.$$

This is equivalent to

$$d(x_i y_i) - df = x_i\, dy_i$$

or, using the linearity of the differential,

$$d\,(x_i y_i - f) = x_i\, dy_i.$$

Note that on the left hand side we have the total differential of some function, which will be denoted by $g$:

$$dg = x_i\, dy_i,$$

where $g = x_i y_i - f$. And this is what we wanted to achieve. The function $g$ is a function of $y_i$, because its differential contains only the differentials $dy_i$, which means that

$$g = g(y).$$

Moreover, the relation

$$\frac{\partial g}{\partial y_i} = x_i$$

holds.

Let us recapitulate the procedure. We started with a function $f = f(x)$ depending on the variables $x_i$. Then we defined new variables $y_i$ by

$$y_i = \frac{\partial f}{\partial x_i},$$

which means that the differential $df$ became

$$df = y_i\, dx_i.$$

Finally we defined a new function

$$g = x_i y_i - f$$

with the differential

$$dg = x_i\, dy_i,$$

which means that the new function depends on the new variables $y_i$. The function $g$ is called the Legendre transformation of the function $f$. Thus, the Legendre transformation is a procedure which transforms the function $f = f(x)$ into a new function $g = g(y)$, where $y_i = \partial_i f$.
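To make the recipe concrete, here is a small worked example. The function and the constant $a$ are chosen for illustration only and do not appear in the text above. Take a single variable $x$ and $f(x) = \tfrac{1}{2} a x^2$ with $a \neq 0$:

```latex
\begin{align*}
y &= \frac{\partial f}{\partial x} = a x
  \quad\Longrightarrow\quad x = \frac{y}{a}, \\
g(y) &= x\,y - f = \frac{y^2}{a} - \frac{1}{2}\,a\,\frac{y^2}{a^2}
      = \frac{y^2}{2a}, \\
\frac{\partial g}{\partial y} &= \frac{y}{a} = x .
\end{align*}
```

The last line reproduces the relation $\partial g/\partial y_i = x_i$. Note that $g$ must be expressed purely in terms of $y$, which requires inverting $y = \partial f/\partial x$; this is the same step that later replaces velocities by momenta.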

3.3 Hamilton’s equations

Now we are in a position to derive Hamilton’s equations. Suppose that our system is described by the Lagrangian

$$L = L(q, \dot q, t),$$

where $q_a$ are the generalized coordinates and $\dot q_a$ are the generalized velocities. We also allow the Lagrangian to depend on time explicitly, i.e. $\partial_t L \neq 0$. We introduce new variables called generalized momenta by

$$p_a = \frac{\partial L}{\partial \dot q_a}.$$

Thus, the generalized momenta are partial derivatives of the function $L$ with respect to one set of variables, the velocities. We want to find the Legendre transformation of the Lagrangian in order to obtain a function which depends on the coordinates $q_a$ and momenta $p_a$.

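The total differential of the Lagrangian reads

$$dL = \frac{\partial L}{\partial q_a}\, dq_a + \frac{\partial L}{\partial \dot q_a}\, d\dot q_a + \frac{\partial L}{\partial t}\, dt.$$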

Using the definition of the generalized momentum and the Lagrange equations (3.4) we find

$$dL = \dot p_a\, dq_a + p_a\, d\dot q_a + \frac{\partial L}{\partial t}\, dt.$$

Rearrange the terms to get

$$p_a\, d\dot q_a - dL = -\dot p_a\, dq_a - \frac{\partial L}{\partial t}\, dt. \tag{3.6}$$

The left hand side can be rewritten as

$$p_a\, d\dot q_a - dL = d(p_a \dot q_a) - \dot q_a\, dp_a - dL = d\,(p_a \dot q_a - L) - \dot q_a\, dp_a.$$

Plugging this expression back into (3.6) yields

$$d\,(p_a \dot q_a - L) = \dot q_a\, dp_a - \dot p_a\, dq_a - \frac{\partial L}{\partial t}\, dt. \tag{3.7}$$

On the left hand side we again have the total differential of some function. This function is called the Hamiltonian and is defined by

$$H = p_a \dot q_a - L. \tag{3.8}$$

The Hamiltonian $H$ is the Legendre transformation of the Lagrangian and depends on $q_a$ and $p_a$. This fact follows from equation (3.7), according to which the total differential of the Hamiltonian is

$$dH = \dot q_a\, dp_a - \dot p_a\, dq_a - \frac{\partial L}{\partial t}\, dt.$$

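We know that the coefficients multiplying the differentials on the right hand side are in fact partial derivatives of the function on the left hand side, i.e. the partial derivatives of the Hamiltonian:

$$\frac{\partial H}{\partial p_a} = \dot q_a, \qquad \frac{\partial H}{\partial q_a} = -\dot p_a, \qquad \frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}. \tag{3.9}$$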


Notice that the first two equations have exactly the form (3.3)! These are the new equations of motion, called Hamilton’s equations:

$$\dot q_a = \frac{\partial H}{\partial p_a}, \qquad \dot p_a = -\frac{\partial H}{\partial q_a}. \tag{3.10}$$

Hamilton’s equations possess the advantages of Lagrange’s equations, but there are some differences. Let us compare them briefly.

• Both Lagrange’s and Hamilton’s equations hold in an arbitrary curvilinear coordinate system;
• the equations of motion are derived from a single scalar function, $L$ or $H$;
• Hamilton’s equations are of the first order, while Lagrange’s equations are of the second order;
• there is one Lagrange equation for each degree of freedom, so for a system with $n$ degrees of freedom we have $n$ Lagrange’s equations; on the other hand, there are two Hamilton’s equations for each degree of freedom, one for the coordinate and one for the momentum, so there are $2n$ Hamilton’s equations.

Let us illustrate how Hamilton’s equations “work” on familiar examples.

3.4 Particle in homogeneous gravitational field

At this stage, the reader should be very familiar with section 2.4. The Lagrangian of the particle in a homogeneous gravitational field is

$$L = \frac{1}{2}\, m\, \dot x_i \dot x_i - m g\, y = \frac{1}{2}\, m \left(\dot x^2 + \dot y^2\right) - m g\, y.$$

The generalized coordinates in this case are

$$q = (q_1, q_2) = (x, y),$$

i.e. we use ordinary Cartesian coordinates. The generalized momenta are, by definition (3.2),

$$p_1 = \frac{\partial L}{\partial \dot x} = m \dot x, \qquad p_2 = \frac{\partial L}{\partial \dot y} = m \dot y. \tag{3.11}$$

We can see that the generalized momenta are the ordinary momenta, the components of $\mathbf{p} = m \mathbf{v}$ in Cartesian coordinates. The last relations can be inverted to find

$$\dot x = \frac{p_1}{m}, \qquad \dot y = \frac{p_2}{m}. \tag{3.12}$$

In order to find the Hamiltonian we have to perform the Legendre transformation of the Lagrangian using relation (3.8):

$$H = p_a \dot q_a - L = p_1 \dot x + p_2 \dot y - \frac{1}{2}\, m \left(\dot x^2 + \dot y^2\right) + m g\, y.$$

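This is not yet the correct expression, for we have to eliminate the velocities $\dot q_a$ and express them as functions of the momenta $p_a$ using (3.12):

$$H = p_1 \frac{p_1}{m} + p_2 \frac{p_2}{m} - \frac{1}{2}\, m\, \frac{p_1^2}{m^2} - \frac{1}{2}\, m\, \frac{p_2^2}{m^2} + m g\, y.$$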

Collecting similar terms we arrive at the Hamiltonian in the form

$$H = \frac{p_1^2 + p_2^2}{2m} + m g\, y. \tag{3.13}$$

Notice that this is, not accidentally, the expression for the total energy of the particle. Hamilton’s equations then follow straightforwardly from (3.10):

$$\dot x = \frac{p_1}{m}, \qquad \dot y = \frac{p_2}{m}, \qquad \dot p_1 = 0, \qquad \dot p_2 = -m g. \tag{3.14}$$
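The steps of this section can also be carried out in Mathematica. The following is a minimal sketch, not the author's code: the velocities are represented by plain symbols vx, vy and the momenta by p1, p2 (all of these names, as well as lag, velRules and ham, are chosen here for illustration). The result is expected to reproduce the Hamiltonian (3.13) and the right hand sides of (3.14).

```mathematica
(* vx, vy stand for the velocities; p1, p2 for the momenta *)
Clear[x, y, vx, vy, p1, p2, m, g];
lag = 1/2 m (vx^2 + vy^2) - m g y;

(* invert p_a = dL/d(velocity) to express the velocities
   through the momenta, cf. (3.11) and (3.12) *)
velRules = First[Solve[{p1 == D[lag, vx], p2 == D[lag, vy]}, {vx, vy}]];

(* Legendre transformation (3.8) with the velocities eliminated *)
ham = Simplify[(p1 vx + p2 vy - lag) /. velRules]

(* right hand sides of Hamilton's equations (3.10),
   in the order {xdot, ydot, p1dot, p2dot} *)
{D[ham, p1], D[ham, p2], -D[ham, x], -D[ham, y]}
```

Treating the velocities as independent symbols mirrors the role they play in the Legendre transformation, where $q_a$ and $\dot q_a$ are independent variables.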

3.5 Conservation of energy

In mechanics we often analyze systems where the total energy is conserved. All examples we have seen until now belong to this class of systems. The fact that energy is conserved can be made explicit in the Hamiltonian framework.

First we show that the Hamiltonian $H$ is in fact equal to the total energy $E = T + V$, where $T$ is the kinetic energy and $V$ is the potential. Recall that the kinetic energy of one particle is

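$$T = \frac{1}{2}\, m\, \dot x_i \dot x_i = \frac{1}{2}\, m\, \frac{\partial x_i}{\partial q_a} \frac{\partial x_i}{\partial q_b}\, \dot q_a \dot q_b = \frac{1}{2}\, g_{ab}\, \dot q_a \dot q_b,$$

where we defined

$$g_{ab} = m\, \frac{\partial x_i}{\partial q_a} \frac{\partial x_i}{\partial q_b}.$$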

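Direct differentiation gives

$$\frac{\partial T}{\partial \dot q_a} = g_{ab}\, \dot q_b.$$

Since the potential $V$ does not depend on the generalized velocities, we have

$$H = \dot q_a p_a - L = \dot q_a \frac{\partial L}{\partial \dot q_a} - T + V = \dot q_a \frac{\partial T}{\partial \dot q_a} - T + V = g_{ab}\, \dot q_a \dot q_b - T + V$$

and therefore

$$H = 2T - T + V = T + V = E.$$

We have proved that the Hamiltonian is equal to the total energy.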

3.5.1 Homogeneous functions and Hamiltonian

There exists a more general proof of the last statement and we present it briefly. It relies on Euler’s theorem on homogeneous functions. A function $f$ of variables $x = (x_1, \dots, x_n)$ is said to be homogeneous of degree $N$ if the following holds:

$$f(\lambda x) = \lambda^N f(x).$$

Differentiating this equation with respect to $\lambda$ we arrive at

$$\frac{\partial f}{\partial (\lambda x_i)}\, x_i = N \lambda^{N-1} f(x).$$

Setting $\lambda = 1$ gives Euler’s theorem:

$$x_i \frac{\partial f}{\partial x_i} = N f(x). \tag{3.15}$$

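Let us apply this theorem to the Hamiltonian. The kinetic energy $T$ is a function of the coordinates $q_a$ and velocities $\dot q_a$, since

$$T(q, \dot q) = \frac{1}{2}\, g_{ab}(q)\, \dot q_a \dot q_b.$$

We can see that the kinetic energy is a homogeneous function of degree 2 in the velocities, for we have

$$T(q, \lambda \dot q) = \frac{1}{2}\, g_{ab}(q)\, (\lambda \dot q_a)(\lambda \dot q_b) = \lambda^2\, T.$$

Application of (3.15) immediately yields

$$\dot q_a \frac{\partial T}{\partial \dot q_a} = 2T,$$

so that the Hamiltonian is

$$H = \dot q_a p_a - L = \dot q_a \frac{\partial L}{\partial \dot q_a} - T + V = \dot q_a \frac{\partial T}{\partial \dot q_a} - T + V = T + V = E.$$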

This is an alternative proof of the above statement that the Hamiltonian is equal to the total energy.
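As a concrete check of $H = T + V$ in curvilinear coordinates, one can take the mathematical pendulum with the Lagrangian (2.27), for which $T = \tfrac{1}{2} m r^2 \dot\theta^2$ and $V = -m g r \cos\theta$. The symbol $p_\theta$ for the momentum conjugate to $\theta$ is notation introduced here for the illustration:

```latex
\begin{align*}
p_\theta &= \frac{\partial L}{\partial \dot\theta} = m r^2 \dot\theta , \\
H &= p_\theta \dot\theta - L
   = m r^2 \dot\theta^2 - \frac{1}{2}\, m r^2 \dot\theta^2 - m g\, r \cos\theta
   = \frac{1}{2}\, m r^2 \dot\theta^2 - m g\, r \cos\theta = T + V , \\
H(\theta, p_\theta) &= \frac{p_\theta^2}{2 m r^2} - m g\, r \cos\theta .
\end{align*}
```

The last line is the Hamiltonian expressed in its proper variables, obtained by substituting $\dot\theta = p_\theta/(m r^2)$.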

3.5.2 Conservation of energy

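Since we now know that the Hamiltonian is equal to the energy, we can discuss the conservation of energy. Relation (3.9) shows that

$$\frac{\partial H}{\partial t} = -\frac{\partial L}{\partial t}.$$

Let us calculate the overall change of the energy per unit time:

$$\frac{dH}{dt} = \frac{\partial H}{\partial q_a}\, \dot q_a + \frac{\partial H}{\partial p_a}\, \dot p_a + \frac{\partial H}{\partial t}.$$

Using Hamilton’s equations (3.9) we find

$$\frac{dH}{dt} = -\dot p_a \dot q_a + \dot q_a \dot p_a - \frac{\partial L}{\partial t} = -\frac{\partial L}{\partial t}.$$

Thus, we derived the relation

$$\frac{dH}{dt} = -\frac{\partial L}{\partial t}.$$

Energy is conserved if $\dot H = 0$, and this is achieved when

$$\frac{\partial L}{\partial t} = 0.$$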

The question is: can the Lagrangian depend on time explicitly? Notice that


3.6 Phase space

What is the interpretation of Hamilton’s equations? They describe the evolution of the system in time. Let us explain this important point in some detail.

The generalized coordinates $q_a$ describe the position of all parts of the system. If we know the values

$$q = (q_1, \dots, q_n)$$

we know where all the particles are. We say that $q_a$ describe the configuration of the system. The generalized momenta describe the state of motion of the individual particles (recall that the momenta are related to the generalized velocities). Together we have $2n$ quantities describing the actual state of the system, which can be collected in the ordered $2n$-tuple

$$z = (q_1, \dots, q_n, p_1, \dots, p_n) \equiv (q, p).$$

The variables $(q, p)$ define the state of the system. Hamilton’s equations (3.10) then say how, for a given state, these quantities change in time. The set of all possible states of the system is called the phase space. In other words, each state can be identified with one point of the phase space. Let us illustrate the idea on the example of the harmonic oscillator.

3.7 Harmonic oscillator

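The Lagrangian treatment of the harmonic oscillator can be found in section 2.5. The Lagrangian of the harmonic oscillator is

$$L = \frac{1}{2}\, m\, \dot q^2 - \frac{1}{2}\, m\, \omega^2 q^2.$$

The generalized momentum is then

$$p = \frac{\partial L}{\partial \dot q} = m \dot q$$

and the corresponding Hamiltonian is

$$H = p \dot q - L = \frac{p^2}{m} - L = \frac{p^2}{2m} + \frac{1}{2}\, m\, \omega^2 q^2. \tag{3.16}$$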


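Then we can easily derive Hamilton's equations of motion:

\[ \dot q = \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad \dot p = -\frac{\partial H}{\partial q} = -m\,\omega^2 q. \]

Now we set $m = \omega = 1$ in order to simplify the analysis, so that the equations of motion become

\[ \dot q = p, \qquad \dot p = -q. \tag{3.17} \]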
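The Legendre transform leading to (3.16) can also be carried out symbolically. The following is only a sketch: the names lag, ham, vOfP, the symbol v (standing for the velocity $\dot q$) and w (standing for $\omega$) are illustrative choices, not notation used elsewhere in the text.

```mathematica
(* Lagrangian of the harmonic oscillator; v stands for the velocity dq/dt, *)
(* w stands for omega. All names here are illustrative assumptions.        *)
lag = 1/2 m v^2 - 1/2 m w^2 q^2;

(* Generalized momentum p = dL/dv, solved for the velocity v = v(p). *)
vOfP = First[v /. Solve[p == D[lag, v], v]];

(* Legendre transform H = p v - L, expressed in terms of q and p. *)
ham = Simplify[(p v - lag) /. v -> vOfP]

(* Right-hand sides of Hamilton's equations: {dq/dt, dp/dt}. *)
{D[ham, p], -D[ham, q]}
```

The last line should reproduce the right-hand sides $p/m$ and $-m\omega^2 q$ obtained above.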

Let us interpret these equations in the spirit of section 3.6. We have two variables describing the state of the harmonic oscillator, the coordinate $q$ and the momentum $p$. Hence, the phase space is a two-dimensional plane with coordinates $q$ and $p$, see figure 3.1. In this figure, the actual state of the oscillator is depicted as a point with coordinates $(q, p)$. The oscillator will then evolve in accordance with Hamilton's equations (3.17), which determine the derivatives of the coordinates. Thus, the oscillator will move in the phase plane in the direction of the velocity $(\dot q, \dot p)$, which is a vector tangent to the trajectory of the oscillator in the phase plane. This trajectory is called the phase trajectory. By Hamilton's equations (3.17) we have

\[ (\dot q, \dot p) = (p, -q). \]

This vector is depicted as an arrow in figure 3.1.

Since Hamilton's equations are of the first order, the evolution of the system is given uniquely by the initial position in the phase plane. If we draw the velocity at each point of the phase plane, we get a reasonable idea about the behaviour of the oscillator. Simple Mathematica code which can be used to draw the velocity field of the harmonic oscillator follows:

(* velocity field (dq/dt, dp/dt) = (p, -q) of the harmonic oscillator *)
VectorPlot[{p, -q}, {q, -5, 5}, {p, -5, 5},
  Frame -> True,
  FrameLabel -> {"q", "p"},
  BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 13}
]

The result is plotted in figure 3.2. This figure suggests that the phase trajectories of the harmonic oscillator are circles centered at the origin of the phase plane. In the case of the harmonic oscillator we can prove this analytically. Since the Hamiltonian represents the total energy, which was shown to be conserved, we can write

\[ H = E, \]

where $E$ is the total energy of the oscillator, while $H$ is the Hamiltonian (3.16) with the simplification $m = \omega = 1$ employed in this section for brevity:

\[ H = \frac{p^2}{2} + \frac{q^2}{2}. \]

Now, the equation $H = E$ can be rearranged slightly so that it acquires the form of the equation of a circle,

\[ q^2 + p^2 = \left(\sqrt{2E}\right)^2, \]

where the radius of the circle is manifestly $r = \sqrt{2E}$. We can see that the phase trajectory is determined by the single parameter $E$, the total energy.

Fig. 3.1. Geometrical interpretation of Hamilton's equations (3.17) in the phase space $(q, p)$. The state of the oscillator is represented by the position $q$ and momentum $p$, which can be regarded as coordinates in the phase space. The "velocity" is then the vector with coordinates $(\dot q, \dot p) = (p, -q)$, where the derivatives are determined by Hamilton's equations.

Fig. 3.2. Velocity field of the harmonic oscillator (horizontal axis $q$, vertical axis $p$). At each point of the phase plane $(q, p)$ we calculate the velocity $(\dot q, \dot p)$ using Hamilton's equations (3.17) and draw the vector representing the velocity.
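The circular phase trajectories can also be checked directly in Mathematica. The sketch below solves equations (3.17) for general initial data and confirms that the energy does not depend on time; the symbols q0, p0 and the names sol, energy are illustrative assumptions, as is the choice of initial point $(3, 0)$ for the plot.

```mathematica
(* Solve Hamilton's equations (3.17) for general initial data (q0, p0). *)
sol = First[DSolve[{q'[t] == p[t], p'[t] == -q[t], q[0] == q0, p[0] == p0},
    {q[t], p[t]}, t]];

(* Energy H = (p^2 + q^2)/2 evaluated along the solution; t should drop out. *)
energy = Simplify[(p[t]^2 + q[t]^2)/2 /. sol]

(* Overlay one phase trajectory (q0 = 3, p0 = 0) on the velocity field. *)
Show[
 VectorPlot[{p, -q}, {q, -5, 5}, {p, -5, 5},
  Frame -> True, FrameLabel -> {"q", "p"}],
 ParametricPlot[Evaluate[{q[t], p[t]} /. sol /. {q0 -> 3, p0 -> 0}],
  {t, 0, 2 Pi}, PlotStyle -> Red]
]
```

The simplified energy should contain only q0 and p0, in agreement with $q^2 + p^2 = 2E$.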

4 Variational principle

We have seen that both Lagrange's equations and Hamilton's equations are essentially equivalent (at least when the forces involved have a potential) to Newton's law of motion. In this chapter we derive Lagrange's equations in a completely different way, using a variational principle. We will see that with this principle it is possible to derive equations of motion from scratch, with a minimum of initial assumptions. This approach is much more powerful, because it works even outside the realm of classical mechanics. In fact, all laws of modern physics can be formulated in terms of a variational principle.

4.1 Fermat’s principle

Before we formulate the variational principle, or Hamilton's principle, in classical mechanics, we start our discussion with a perhaps more familiar example from optics. It is well known that light propagates at different speeds in different media. If $c$ denotes the speed of light in vacuum, then the refractive index of a given medium is defined as

\[ n = \frac{c}{v}, \]

where $v$ is the speed of light in that medium. For example, the refractive index of water is about $n = 1.33$, which means that light propagates $1.33$ times slower in water than in vacuum. The refractive index of air is approximately $n = 1$, i.e. the speed of light in air is nearly the same as in vacuum.

Now, when a light ray propagates from one medium to another, it changes its direction. Suppose that the light ray crosses the interface between two media with refractive indices $n_1$ and $n_2$, figure 4.1. It is customary to measure the angle of impact (the angle of incidence) with respect to the line perpendicular to the plane of the interface. For example, the angle of impact in figure 4.1 is denoted by $\alpha$. Similarly, the angle of refraction is $\beta$. Can we calculate the angle of refraction provided that the angle of impact is given? Yes, we can. According to the Snell law, these angles must satisfy the equation

\[ \frac{\sin\alpha}{\sin\beta} = \frac{n_2}{n_1}. \tag{4.1} \]

The Snell law (4.1) is a phenomenological law which was discovered before the theory of electromagnetism and of the propagation of electromagnetic waves was found. We can say that the role of the Snell law is similar to that of Newton's law of motion. This analogy goes even further. The three basic laws of geometrical optics are

• In a homogeneous medium, light propagates along straight lines;

• When light propagates from one medium to another, the angle of impact and the angle of refraction are related by the Snell law (4.1); if the light is reflected on the interface, then the angle of impact is equal to the angle of reflection;

• If a light ray can propagate along some trajectory, then it can also propagate in the opposite direction along the same trajectory.

With a little imagination one observes a striking analogy between Newton's laws of motion and these three laws of optics. The first law tells us that if no changes of the refractive index occur, light propagates along a straight line, which can be regarded as an analogy to Newton's first law: if no forces act, the body moves uniformly along a straight line. The second law, on the other hand, tells us how the direction of propagation is influenced by changes of the refractive index. Newton's second law tells us how the velocity is changed by an external force. Finally, the third law of optics ensures that if light can propagate along some trajectory, it can also propagate in the reverse direction. In other words, if Alice can see Bob, then Bob can also see Alice. Newton's third law says that if body A exerts a force on body B, then body B exerts a force of the same magnitude and opposite direction on body A.

However, what we want to emphasize is that both the Snell law and Newton's law of force are empirical laws which are justified by experiment. Is there any deeper law which could explain all three laws of optics? Can we replace the three laws of optics by a single law? Yes, we can, and it is called Fermat's principle.

Let us see how we can arrive at the formulation of the Fermat principle by heuristic arguments. Suppose that the light ray is propagating in a homogeneous medium in which, by definition, the refractive index is constant. Then, by the first law of optics, the ray propagates along a straight line, which is the shortest curve connecting two given points. Hence, in a homogeneous medium, the light ray which travels from point A to point B follows the shortest path from A to B. Does this statement hold in general? Certainly not, as is obvious from figure 4.1: the trajectory of a light ray passing from one medium to another is not a straight line and so it is longer than the shortest possible one. But recall that light propagates at different speeds in the two media. Maybe it is the time, rather than the length, that is minimal!

Fig. 4.1. The light ray changes its direction at the interface between two media with refractive indices $n_1$ and $n_2$; the ray travels from point A in medium $n_1$ to point B in medium $n_2$, with angle of impact $\alpha$ and angle of refraction $\beta$. In this figure we assume $n_1 < n_2$, which means that the light is faster in the first medium.

There is a beautiful argument by Richard Feynman. Suppose that you are standing on the coast and there is a nice girl drowning in the sea. Of course, you want to save her (this statement does not depend on whether the reader is a girl or a boy). You must reach the girl in the shortest possible time, not by the shortest distance! These are not the same, because you run faster than you swim. There are two extreme trajectories along which you can travel, figure 4.2.

If you run straight towards the girl, trajectory i), you have to swim a long distance, which takes a long time. If you choose trajectory ii), you spend the shortest possible time in the water, but you have to run longer before you enter the water. It is obvious that we have to find the point where to enter the water so that the overall time needed to reach the girl is the shortest. This qualitative analysis shows that the trajectory must look something like the one shown in figure 4.1, which indeed suggests that the light ray follows the trajectory with minimal time.

Fig. 4.2. Two extreme trajectories from point A on the coast to point B in the water (the drowning girl). Trajectory i) is the most natural one, but the time you spend in the water is too long. Trajectory ii) is better, but now you spend too much time on the coast.

Thus, we have arrived at the conjecture that the light ray propagates from point A to point B along the trajectory which takes the minimal time. This is the Fermat principle. Let us formulate it in mathematical terms. Suppose that the light ray starts at point A and ends at point B as in figure 4.1. The time which the ray needs to travel the distance $dr$ is

\[ dt = \frac{dr}{v} = \frac{1}{c}\,n\,dr. \]

The speed of light $c$ is irrelevant, because whenever $n\,dr$ is minimal, so is $dt$. Hence, we define the optical path length by

\[ ds = n\,dr. \]

The total optical path length between points A and B is

\[ S = \int_A^B ds = \int_A^B n\,dr. \tag{4.2} \]

This notation is slightly awkward because the value of the integral depends not only on points A and B but on the whole trajectory. In figure 4.3 we depict two different trajectories $\gamma$ and $\gamma'$ connecting points A and B. In general, the optical path length $S$ is different for the two trajectories, and thus instead of writing the integration bounds A and B we write the trajectory explicitly, e.g.

\[ S[\gamma] = \int_\gamma n\,dr. \]

Here we explicitly emphasize that the integral is taken along the trajectory $\gamma$. Notice that $S$ depends on the entire trajectory $\gamma$. In other words, $S$ can be regarded as a mapping which assigns a number, the optical path length, to each trajectory $\gamma$,

\[ S : \gamma \mapsto \mathbb{R}. \]

In general, any mapping from an arbitrary set into the real numbers is called a functional.

Fig. 4.3. There are many trajectories (here $\gamma$ and $\gamma'$) connecting points A and B, and the optical path length $S$ is different for each of them.

What is the law of propagation for the light ray? The Fermat principle states that light propagates along the trajectory for which the optical path length $S[\gamma]$ is minimal. All three optical laws formulated at the beginning of this section can be recovered from this simple statement. We do not show how this can be done in general, but we show how the Snell law (4.1) can be derived from the Fermat principle.

The situation is sketched in figure 4.4. Suppose again that the light ray starts at point A in the medium with refractive index $n_1$, crosses the interface between the two media and finally ends at point B in the medium with refractive index $n_2$. Let $a$ be the distance of point A from the interface, let $b$ be the distance of point B from the interface, and let $x$ be the coordinate of the place where the ray crosses the interface. If points A and B are held fixed, it is the coordinate $x$ which is unknown: we want to find the place where the ray must cross the interface in order that the optical path length be minimal. The remaining horizontal distance will be denoted by $y$. Notice that, for fixed points A and B, the sum of $x$ and $y$ is constant, say $l$ (it is the horizontal distance between points A and B):

\[ x + y = l. \]

This equation immediately implies

\[ \frac{\partial y}{\partial x} = -1. \tag{4.3} \]


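Fig. 4.4. Derivation of the Snell law using the Fermat principle. The ray travels the distance $r_1$ from point A (at distance $a$ from the interface) in medium $n_1$ and the distance $r_2$ to point B (at distance $b$ from the interface) in medium $n_2$; $x$ and $y$ are the horizontal distances of A and B from the crossing point, $\alpha$ and $\beta$ the angles of impact and refraction.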

The distance from point A to the crossing point is

\[ r_1 = \sqrt{a^2 + x^2} \]

and the corresponding optical path length is

\[ s_1 = n_1 r_1 = n_1 \sqrt{a^2 + x^2}. \]

Similarly, the distance from the crossing point to point B and the corresponding optical path length are

\[ r_2 = \sqrt{b^2 + y^2}, \qquad s_2 = n_2 r_2 = n_2 \sqrt{b^2 + y^2}. \]

The total optical path length of the light ray is therefore

\[ S = n_1 \sqrt{a^2 + x^2} + n_2 \sqrt{b^2 + y^2}. \tag{4.4} \]

We want to find the value of $x$ for which the length $S$ is minimal. This is an easy task of elementary calculus: we differentiate $S$ with respect to $x$ and set the derivative equal to zero. Assuming that $n_1$ and $n_2$ are constants and using (4.3) we find

\[ \frac{\partial S}{\partial x} = n_1 \frac{x}{\sqrt{a^2 + x^2}} - n_2 \frac{y}{\sqrt{b^2 + y^2}} = 0. \tag{4.5} \]

From figure 4.4 we can see that $x$ and $y$ are related to the angles $\alpha$ and $\beta$ by

\[ \sin\alpha = \frac{x}{r_1} = \frac{x}{\sqrt{a^2 + x^2}}, \qquad \sin\beta = \frac{y}{r_2} = \frac{y}{\sqrt{b^2 + y^2}}, \]

and therefore equation (4.5) is equivalent to

\[ \frac{\sin\alpha}{\sin\beta} = \frac{n_2}{n_1}, \]

which is the Snell law. This completes the proof.

Let us recapitulate and conclude this section. First we formulated three basic laws of geometrical optics and emphasized the analogy between these laws and Newton's laws of motion. By a heuristic argument we arrived at the conjecture that the laws of optics can be replaced by a single law called Fermat's principle: the light ray propagates in such a way that the optical path length is minimal. From this simple law we were able to derive the Snell law of refraction of light.
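The derivation of the Snell law can be checked numerically by minimizing the optical path length (4.4) directly. The following Mathematica sketch uses assumed sample values ($n_1 = 1$, $n_2 = 1.33$, $a = b = 1$, $l = 2$); the names pathLength, xmin, sinAlpha and sinBeta are illustrative.

```mathematica
(* Assumed sample values: air above water, A and B one unit from the interface. *)
n1 = 1.0; n2 = 1.33; a = 1.0; b = 1.0; l = 2.0;

(* Optical path length (4.4) with y = l - x eliminated. *)
pathLength[x_] := n1 Sqrt[a^2 + x^2] + n2 Sqrt[b^2 + (l - x)^2];

(* Numerical minimization over the crossing point x, starting from x = 1. *)
{smin, rule} = FindMinimum[pathLength[x], {x, 1.0}];
xmin = x /. rule;

(* Sines of the angles of impact and refraction at the minimizing point. *)
sinAlpha = xmin/Sqrt[a^2 + xmin^2];
sinBeta = (l - xmin)/Sqrt[b^2 + (l - xmin)^2];

(* Both entries should agree up to numerical accuracy, as (4.1) demands. *)
{sinAlpha/sinBeta, n2/n1}
```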

Although this textbook is not concerned with optics, our aim was to illustrate the idea of a variational principle on a familiar example. The reason for this digression lies in the fundamental importance of variational principles in theoretical physics. It is possible to show that all phenomena which occur in geometrical optics can be explained on the basis of the Fermat principle. Since the laws of optics are analogous, at least mathematically, to Newton's laws, we can hope that it is possible to formulate Newton's laws in the framework of a variational principle. This will be done in the rest of this chapter.

4.2 Formulation of variational problem

Before we apply the variational principle to Newtonian mechanics, we formulate the problem in precise mathematical terms and solve it. Let us return to integral (4.2), which is only a formal expression of the Fermat principle. When we derived the Snell law from the Fermat principle, we assumed that the refractive index was constant in the first medium and constant but different in the second medium. This allowed us to split the integral into the sum of two terms (4.4). In general, however, the refractive index will be a function of the coordinates. Indeed, the first law of optics tells us that if the refractive index is constant everywhere, the light rays propagate along straight lines. Hence, as the first step, we must admit that $n$ is a function of the spatial coordinates:

\[ S[\gamma] = \int_\gamma n(x, y)\,dr. \]

For simplicity, we restrict ourselves to the two-dimensional case and so we suppress the $z$-coordinate. The line element $dr$ must then be expressed in terms of the Cartesian coordinates as well; using the Pythagorean theorem we have

\[ dr = \sqrt{dx^2 + dy^2}. \]

Now suppose that we parametrize the coordinates $x$ and $y$ by some parameter $t$. Then

\[ dx = \dot x\,dt, \qquad dy = \dot y\,dt, \]

where $\dot x$ and $\dot y$ are the derivatives of the coordinates with respect to the parameter $t$. The integral $S$ acquires its final form

\[ S[\gamma] = \int_\gamma n(x, y)\,\sqrt{\dot x^2 + \dot y^2}\,dt. \tag{4.6} \]

We can see that the optical path length $S[\gamma]$ is a functional of the form

\[ S[\gamma] = \int_\gamma L(x(t), \dot x(t))\,dt, \]

where $L$ is a function of the coordinates and their derivatives. Our task is to find the trajectory $\gamma$ for which the value $S[\gamma]$ is minimal. It is a task similar to finding the minimum of a function, familiar from elementary calculus. Such a problem is solved by taking the derivative with respect to the variable and setting it to zero. The difference in our case is that now $\gamma$ is not a single variable but the entire trajectory, and it is not obvious how to differentiate $S$ with respect to $\gamma$. The required concept is known as a functional derivative or a variation and it can be defined in a very general context. Here we define it in a more pedestrian way sufficient for our purposes.
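To see concretely that $S[\gamma]$ assigns a number to each trajectory, one can evaluate integral (4.6) numerically for two curves with the same endpoints. The Mathematica sketch below makes several assumptions that are not in the text: the refractive index $n(x, y) = 1 + y^2$, the endpoints $(0, 0)$ and $(1, 0)$, the parameter range $0 \le t \le 1$, and the names n and opticalLength.

```mathematica
(* Assumed refractive index, chosen only for illustration. *)
n[x_, y_] := 1 + y^2;

(* Optical path length (4.6) of a curve {x(t), y(t)}, 0 <= t <= 1.            *)
(* The integrand is built symbolically first, then integrated numerically. *)
opticalLength[{x_, y_}] := Module[{integrand},
   integrand = n[x, y] Sqrt[D[x, t]^2 + D[y, t]^2];
   NIntegrate[integrand, {t, 0, 1}]];

(* Two trajectories connecting the points (0, 0) and (1, 0). *)
straight = opticalLength[{t, 0}]
curved = opticalLength[{t, 0.3 Sin[Pi t]}]
```

For this particular index the straight segment gives the smaller value, since both the geometric length and $n$ are larger along the curved path.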

4.3 Variation of the functional

In the previous section we formulated the basic problem of variational calculus in Cartesian coordinates. We have seen in previous chapters that it is often useful to introduce generalized coordinates $q_a$ instead of Cartesian coordinates $x_i$. Hence, we replace $x$ in integral (4.6) by $q$:

\[ S[q] = \int_\gamma L(q(t), \dot q(t))\,dt. \tag{4.7} \]

Here we replaced the argument $\gamma$ of the functional $S$ by the argument $q$, because $q$ is a coordinate expression of the trajectory. How can we differentiate $S$ with respect to $\gamma$ in order to find the $\gamma$ for which $S[\gamma]$ is minimal?


Suppose that $q_a$ is the trajectory which is the solution to our problem, i.e. suppose that for $q_a$ the functional $S[q]$ acquires its minimal value. Let this trajectory pass through point A for $t = t_1$ and through point B for $t = t_2$, see figure 4.5. Since $q_a$ is the minimal trajectory, any trajectory $q'_a$ different from $q_a$ must yield a bigger value of $S$. Notice that we can choose an arbitrary trajectory $q'_a$, but it must satisfy the boundary conditions

\[ q'_a(t_1) = q_a(t_1), \qquad q'_a(t_2) = q_a(t_2), \tag{4.8} \]

because points A and B are held fixed. Let us write the trajectory $q'_a$ in the form

\[ q'_a(t) = q_a(t) + \varepsilon\,\eta_a(t), \tag{4.9} \]

where $\varepsilon$ is an arbitrary constant parameter and $\eta_a(t)$ is an arbitrary function of time, subject to the boundary conditions

\[ \eta_a(t_1) = \eta_a(t_2) = 0 \tag{4.10} \]

in order to satisfy conditions (4.8).

Since $\varepsilon\,\eta_a$ is the difference between the trajectories $q'_a$ and $q_a$, it is called a variation and in physics textbooks it is often denoted by $\delta q_a = \varepsilon\,\eta_a$. The symbol $\delta$ has formally the same properties as the total differential $d$. Notice that (4.9) implies

\[ \dot q'_a = \dot q_a + \varepsilon\,\dot\eta_a, \]

so we can write $\delta \dot q_a = \varepsilon\,\dot\eta_a$. In other words, the variation $\delta$ commutes with differentiation with respect to the parameter $t$.

As we emphasized repeatedly, the functional $S$ depends on the trajectory and for $q_a$ it acquires its minimal value, while for $q'_a \neq q_a$ we have

\[ S[q'] = S[q + \varepsilon\eta]. \]

Notice that now we have parametrized the family of trajectories $q'_a$ by the single parameter $\varepsilon$. Since we want to find the minimum of $S[q']$ (which is $S[q]$), we need to differentiate $S[q']$ somehow. While we do not know how to differentiate $S$ with respect to the entire trajectory, differentiation with respect to $\varepsilon$ is a well-defined operation. Hence, we define the variation or functional derivative of $S$ by

\[ \delta S = \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} S[q + \varepsilon\eta]. \tag{4.11} \]

The notation $d/d\varepsilon|_{\varepsilon=0}$ means that first we differentiate the function with respect to $\varepsilon$ and then set $\varepsilon = 0$. The reason why we substitute zero for $\varepsilon$ will be clear soon. Now, the correct $q_a$ is a solution to the equation

\[ \delta S = 0. \]
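Definition (4.11) can be tried out on a simple case. The Mathematica sketch below assumes a free particle of unit mass, $L = \dot q^2/2$, on the interval $0 \le t \le 1$, with trajectory $q(t) = t$ and variation $\eta(t) = \sin \pi t$, which vanishes at both ends; these choices and the names action, sOfEps and deltaS are illustrative only.

```mathematica
(* Action of a free particle with unit mass on the interval 0 <= t <= 1. *)
action[q_] := Integrate[1/2 D[q, t]^2, {t, 0, 1}]

(* Varied trajectory q + eps eta with q(t) = t and eta(t) = Sin[Pi t]. *)
sOfEps = action[t + eps Sin[Pi t]]

(* Variation (4.11): differentiate with respect to eps, then set eps = 0. *)
deltaS = D[sOfEps, eps] /. eps -> 0
```

Here sOfEps contains no term linear in eps, so deltaS evaluates to 0: the straight trajectory $q(t) = t$ makes the action stationary, and any nonzero eps only increases it.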


Fig. 4.5. Two trajectories $q_a$ and $q'_a$ starting at point A (at time $t_1$) and ending at point B (at time $t_2$); they differ by the variation $\varepsilon\eta$. Only for the trajectory $q_a$ is the functional $S$ minimized.

4.4 Euler-Lagrange equations

Having defined the variation (4.11), we can now easily solve our variational problem. Let us state it again. We want to find the trajectory $q_a(t)$ for which the integral (called the action)

\[ S[q] = \int_{t_1}^{t_2} L(q(t), \dot q(t))\,dt \tag{4.12} \]

is minimal. In order to find this trajectory we replace $q_a$ by the varied trajectory

\[ q_a \mapsto q_a + \varepsilon\,\eta_a \]

and solve the equation $\delta S = 0$, where the variation $\delta$ is defined by (4.11). Since we suppose that the initial and final points A and B are fixed, we are interested only in trajectories for which

\[ \eta_a(t_1) = \eta_a(t_2) = 0. \]

Let us now find the variation $\delta S$ explicitly. We have


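\[
\begin{aligned}
\delta S &= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} S[q + \varepsilon\eta]
 = \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \int_{t_1}^{t_2} L(q(t) + \varepsilon\eta(t),\, \dot q + \varepsilon\dot\eta)\,dt \\
&= \int_{t_1}^{t_2} \left[ \frac{\partial L}{\partial q_a}\,\eta_a + \frac{\partial L}{\partial \dot q_a}\,\dot\eta_a \right] dt.
\end{aligned}
\tag{4.13}
\]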

Note that the function $L$ in the first line is evaluated on the varied trajectory $q'_a = q_a + \varepsilon\eta_a$. Then we differentiate $L$ using the chain rule, first with respect to its first argument and then with respect to its second argument. After the differentiation we put $\varepsilon = 0$, so that $L$ and its derivatives are evaluated on the original trajectory $q_a$. Hence, the result no longer contains the varied trajectory, only the original one.

The next step is to remove the derivative of the variation $\eta_a$ with respect to the parameter $t$. Using integration by parts we find

\[ \int_{t_1}^{t_2} \frac{\partial L}{\partial \dot q_a}\,\frac{d\eta_a}{dt}\,dt = \left[ \frac{\partial L}{\partial \dot q_a}\,\eta_a \right]_{t_1}^{t_2} - \int_{t_1}^{t_2} \left( \frac{d}{dt}\frac{\partial L}{\partial \dot q_a} \right) \eta_a\,dt. \tag{4.14} \]

Now we impose the boundary conditions (4.10) that $\eta_a$ must vanish at the boundary points A and B, which implies that the "boundary" term in square brackets is equal to zero! Thus, after integration by parts, the variation of the action becomes

\[ \delta S = \int_{t_1}^{t_2} \left[ \frac{\partial L}{\partial q_a} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_a} \right] \eta_a\,dt. \tag{4.15} \]

Our variational principle tells us that this variation must be equal to zero. Recall that during the variation we kept the boundary points A and B fixed. However, equation (4.15) must hold for arbitrary points A and B, because we did not say anything specific about these points. We can choose these points arbitrarily and then find $\delta S$, and this variation $\delta S$ must vanish. Moreover, the variation $\eta_a$ was chosen to be arbitrary as well. Then $\delta S$ can vanish for all $\eta_a$ and for all points A and B only if the expression in the square brackets is zero everywhere. In other words, the variational principle implies that the following equations must hold:

\[ \frac{\partial L}{\partial q_a} - \frac{d}{dt}\frac{\partial L}{\partial \dot q_a} = 0. \tag{4.16} \]

These equations are known as the Euler-Lagrange equations of variational calculus.
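The left-hand side of (4.16) is easy to generate symbolically. The following Mathematica sketch handles a single coordinate $q(t)$ and applies it to the harmonic oscillator Lagrangian; the function name eulerLagrange, the names lagHO and eqHO, and the symbol w (standing for $\omega$) are illustrative assumptions.

```mathematica
(* Left-hand side of the Euler-Lagrange equation (4.16) for one coordinate q[t]. *)
eulerLagrange[lag_, q_, t_] := D[lag, q[t]] - D[D[lag, q'[t]], t]

(* Harmonic oscillator Lagrangian written in terms of q[t] and q'[t]. *)
lagHO = 1/2 m q'[t]^2 - 1/2 m w^2 q[t]^2;

(* Setting this expression to zero gives the equation of motion. *)
eqHO = Simplify[eulerLagrange[lagHO, q, t]]
```

Setting eqHO equal to zero gives $m\ddot q = -m\omega^2 q$, the familiar oscillator equation.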


We can see that the Euler-Lagrange equations are nothing else than Lagrange's equations (2.18), if we identify the Lagrangian $L$ with the function $L$ above. This is a surprising result: an actual physical system evolves in time in such a way as to minimize the action (4.12)!

From the other point of view, recall that Lagrange's equations (2.18) have been derived as an equivalent formulation of Newton's laws of motion in an arbitrary coordinate system. Thus, at the beginning, we had the Newton law, which is a physical law. In this chapter, on the other hand, we have not assumed anything about the physics: we merely formulated the rule, the variational principle, that the action must be minimal. Then we performed some calculations and showed that this principle is equivalent to the Euler-Lagrange equations (4.16). Hence, we have derived the same form of the law of motion without using any physics.

Of course, this strong statement is somewhat weakened if we realize that the variational principle does not tell us what the form of the function $L$ is. In order to guess the form of $L$ we have to impose some physical restrictions. First, consider a free particle, i.e. a particle moving in free space where no forces are present. If we describe the particle in Cartesian coordinates, the Lagrangian $L$ can depend on the coordinates $x_i$ and the velocities $\dot x_i$. However, all points of space are equivalent and none is preferred. If there are no forces, the particle must behave in the same way independently of its position. Hence, the Lagrangian cannot depend directly on the coordinates; it can depend only on the velocities. This is a consequence of the homogeneity of space.

The next restriction comes from the isotropy of space. While homogeneity implies that all points are equivalent, isotropy implies that for a given point, all directions in space are equivalent. We can rotate the system containing our particle and the particle will behave in the same way. Thus, the Lagrangian cannot depend on the direction of the velocity $v_i = \dot x_i$ and can depend only on its magnitude, $v^2 = \dot x_i \dot x_i$.

Thus, we have determined the Lagrangian of the free particle up to a multiplicative constant and we can write it in the form

\[ L = \alpha\,v^2, \tag{4.17} \]

where $\alpha$ is a multiplicative constant. This constant cannot be specified further, because it must be a constant characteristic of the particle and its value will depend on the convention we use. We can argue that our Lagrangian is proportional to the kinetic energy and therefore it is plausible to set $\alpha = m/2$, but it is not necessary. We emphasize that it is more or less only a convention that we write the constant $\alpha$ in this form. The reason is that it was the Newton law which was discovered first, and the variational principle was discovered later. From now on we assume $\alpha = m/2$, denote the kinetic energy by

\[ T = \alpha\,v^2 = \frac{1}{2}\,m\,v^2 \]

and investigate what happens in the presence of forces.

We can see the heuristic power of the variational principle: the equations of motion are provided by the Euler-Lagrange equations, which always have the same form regardless of the system we describe and independently of the coordinates used. In order to find the equations of motion we only have to specify the Lagrangian. Usually we do not have too many possibilities for what the Lagrangian can look like. We have seen in the case of the free particle that essentially the only admissible form of the Lagrangian is (4.17). The reason is that the Lagrangian is a scalar, so we must construct a scalar quantity from the quantities describing our system, like velocity and coordinates. Usually there are only a few possibilities.

The situation is similar even in the presence of forces. If the force has a potential and is thus described by a single scalar $V$ such that $F_i = -\partial_i V$, it is natural to set

\[ L = \alpha\,v^2 - V, \]

where the minus sign is again customary and is related to the fact that the force is minus the gradient. This choice is convenient but absolutely not necessary.

Electromagnetic forces, on the other hand, do not derive from a single scalar potential. Thus, the construction of the Lagrangian as in chapter 2 is impossible: we can define the generalized forces $Q_a$ but they are not the gradient of any scalar. In fact, the electromagnetic field is described by one scalar potential $\phi$ and one vector potential $A_i$. These potentials in general depend on time and position. It is not important now what the vector potential is; we just want to illustrate that even in this case the Lagrangian can be constructed. Indeed, the particle moving in the electromagnetic field is described again by the position $x_i$ and by the velocity $v_i$, while the field itself is described by the potentials $\phi$ and $A_i$. Can we combine these quantities to form a scalar Lagrangian? Yes, and the construction is fairly unique. Since $\phi$ is itself a scalar, we can simply add it to the Lagrangian of the free particle (or, more precisely, subtract it from the Lagrangian), so that the first part of the Lagrangian will be

\[ L = T - \beta\,\phi. \]

Here, $\beta$ is again a constant to be specified later. Now we can form three scalar functions from the quantities $x_i$, $v_i$ and $A_i$:

\[ x_i v_i, \qquad x_i A_i, \qquad A_i v_i. \]

The first combination does not contain field quantities and we can exclude it immediately, for it cannot describe the interaction of the particle with a field. The second combination looks better, but recall that space itself is homogeneous. This homogeneity is broken by the presence of the electromagnetic field, but still the Lagrangian should not depend on the coordinates directly, only through the potentials $\phi$ and $A_i$. Hence, the only plausible combination is $A_i v_i$ and we can write

\[ L = T - \beta\,\phi + \gamma\,\mathbf{A}\cdot\mathbf{v}. \]

Now, the constants $\beta$ and $\gamma$ obviously determine the strength of the interaction between the field and the particle. We know from experience that the electromagnetic force is proportional to the charge $e$ of the particle and thus we can write the Lagrangian in the final form

\[ L = T - e\,\phi + e\,\mathbf{A}\cdot\mathbf{v}. \tag{4.18} \]

We can see that our construction is not "bullet-proof", but it is very natural and, moreover, it yields the correct equations of motion. This heuristic approach is even more powerful in relativistic theories, where the action must be a scalar¹ with respect to the so-called Lorentz group, which is a strong restriction. Notice that in classical mechanics we know what the correct equations of motion are: Lagrange's equations must reduce to Newton's law. However, when we are developing a new theory, we do not know what the correct equations are. In such a position we usually assume that the variational principle is correct and guess the form of the action or the Lagrangian. In this way, people constructed the modern quantum field theories of electromagnetic, weak and strong interactions. Hence, the variational principle is a much more fundamental principle than it seems from our discussion.

¹ In classical mechanics it does not matter whether we construct the action or directly the Lagrangian, because they differ only by integration over time. In relativistic theories, time is not invariant and transforms as a component of a (four-)vector. Hence, it is the action which must be a scalar, not the Lagrangian.

4.5 Non-uniqueness of the Lagrangian

Using the variational formulation it is easy to see that the Lagrangian is not unique, i.e. there are many different Lagrangians which yield the same equations of motion. To see this, consider an arbitrary function $F = F(t)$ of time and define

\[ f(t) = \frac{dF}{dt}. \]

Let us modify the action by adding a new term to it:

\[ S' = S + \int_{t_1}^{t_2} f(t)\,dt. \]

The second term can be integrated,

\[ S' = S + \int_{t_1}^{t_2} \frac{dF}{dt}\,dt = S + [F(t)]_{t_1}^{t_2} = S + F(t_2) - F(t_1). \]

Thus, the new action $S'$ differs from $S$ only by boundary terms, namely the values of $F$ at the boundaries of the trajectory. These are fixed under variation and so we have

\[ \delta S = \delta S'. \]

That means that the variational principle $\delta S = 0$ gives the same equations of motion as the principle $\delta S' = 0$. By the definition of the action, we have

\[ S' = \int_{t_1}^{t_2} (L + f(t))\,dt, \]

which can be written as

\[ S' = \int_{t_1}^{t_2} L'\,dt, \]

where

\[ L' = L + f(t) = L + \frac{dF}{dt}. \tag{4.19} \]

In other words, if we change the Lagrangian by adding a function $f$ which is the total derivative of some other function $F$ with respect to time, we do not change the equations of motion. Hence, the Lagrangian is not unique. This is an important observation which will be exploited in connection with canonical transformations, chapter 5.
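The invariance of the equations of motion under (4.19) can be verified symbolically. The Mathematica sketch below goes slightly beyond the text in one respect, which should be read as an assumption: it lets $F$ depend on the coordinate as well as on time, $F = F(q, t)$, since the boundary-term argument above works equally well when the endpoints are held fixed. The names eulerLagrange, lag1, lag2 and the generic potential V are illustrative.

```mathematica
(* Left-hand side of the Euler-Lagrange equation (4.16) for one coordinate q[t]. *)
eulerLagrange[lag_, q_, t_] := D[lag, q[t]] - D[D[lag, q'[t]], t]

(* A generic Lagrangian with an unspecified potential V. *)
lag1 = 1/2 m q'[t]^2 - V[q[t]];

(* The same Lagrangian plus the total time derivative of an arbitrary F(q, t). *)
lag2 = lag1 + D[F[q[t], t], t];

(* Difference of the two Euler-Lagrange expressions; it should simplify to 0. *)
Simplify[eulerLagrange[lag2, q, t] - eulerLagrange[lag1, q, t]]
```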

4.6 Variational derivation of Hamilton’s equations

We have shown that the variational principle reproduces Lagrange's equations. Can we reproduce Hamilton's equations as well? Let us start with the action (4.7) and express the Lagrangian in terms of the Hamiltonian using the Legendre transform (3.8):

\[ S = \int_{t_1}^{t_2} (p_a\,\dot q_a - H)\,dt. \tag{4.20} \]

Recall that the Hamiltonian is a function of coordinates and momenta, $H = H(q, p)$. Let us vary the action, remembering that the $\delta$-symbol behaves like the differential,

\[ \delta S = \int_{t_1}^{t_2} \left( p_a\,\delta\dot q_a + \dot q_a\,\delta p_a - \frac{\partial H}{\partial q_a}\,\delta q_a - \frac{\partial H}{\partial p_a}\,\delta p_a \right) dt. \]

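Now we have three variations $\delta q_a$, $\delta p_a$ and $\delta\dot q_a$. However, they are not independent, because $\delta\dot q_a$ is the time derivative of $\delta q_a$. We can get rid of this term by integrating by parts,

\[ \int_{t_1}^{t_2} p_a\,\delta\dot q_a\,dt = [p_a\,\delta q_a]_{t_1}^{t_2} - \int_{t_1}^{t_2} \dot p_a\,\delta q_a\,dt, \]

where the first term on the right-hand side vanishes by the boundary conditions $\delta q_a = 0$ for $t = t_1$ and $t = t_2$. Then the variation of the action becomes

\[ \delta S = \int_{t_1}^{t_2} \left( -\dot p_a\,\delta q_a + \dot q_a\,\delta p_a - \frac{\partial H}{\partial q_a}\,\delta q_a - \frac{\partial H}{\partial p_a}\,\delta p_a \right) dt. \]

The variation will be zero for an arbitrary choice of $t_1$ and $t_2$ if the integrand vanishes. Comparing the coefficients of the independent variations $\delta q_a$ and $\delta p_a$ we recover Hamilton's equations

\[ \dot q_a = \frac{\partial H}{\partial p_a}, \qquad \dot p_a = -\frac{\partial H}{\partial q_a}. \tag{4.21} \]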

4.7 Noether’s theorem: motivation

In general, during the evolution of a mechanical system, the quantities characterizing the system change. Namely, coordinates and velocities (or momenta in the Hamiltonian formulation) are solutions to the equations of motion and hence they are genuine (non-trivial) functions of time. However, there are other quantities which are functions of $q_a$ and $p_a$ but which remain constant during the real evolution. The most familiar example is energy. We have seen that the Hamiltonian represents the total energy of the system and if it does not depend on time explicitly, it does not depend on time at all. For example, the Hamiltonian of the harmonic oscillator is

\[ H = \frac{p^2}{2m} + \frac{1}{2}\,m\,\omega^2 q^2, \]

where both $p$ and $q$ are functions of time. Nevertheless, for any solution of Hamilton's equations, the particular combination of coordinates and momenta given by $H$ is a constant. In this case we say that the energy is conserved.

Other examples of conserved quantities are momentum and angular momentum. The total momentum and total angular momentum of an isolated system are constant in time.

From the mathematical point of view, the existence of conserved quantities is not surprising; it is a direct consequence of the properties of differential equations. For a system with $n$ degrees of freedom we have $n$ Lagrange's equations of the second order or $2n$ Hamilton's equations of the first order. The solution of a second-order equation contains two arbitrary constants, so the solution of the complete set of Lagrange's equations contains $2n$ constants. Similarly, the solution of a first-order equation contains one integration constant, so the solution of the complete set of Hamilton's equations again contains $2n$ constants.

We have arrived at conclusion that, regardless on the formalism, the solutionof equations of motion depends on the choice of 2n arbitrary constants C1, . . . C2n.Hence, the solution (q, p) of equations of motion can be written in the form

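\[ \begin{aligned} q_1 &= q_1(t, C_1, \dots, C_{2n}), & p_1 &= p_1(t, C_1, \dots, C_{2n}), \\ &\;\;\vdots & &\;\;\vdots \\ q_n &= q_n(t, C_1, \dots, C_{2n}), & p_n &= p_n(t, C_1, \dots, C_{2n}). \end{aligned} \]

This is a system of $2n$ equations for the constants $C_m$ which can be inverted to obtain

\[ \begin{aligned} C_1 &= C_1(q_1, \dots, q_n, p_1, \dots, p_n, t), \\ &\;\;\vdots \\ C_{2n} &= C_{2n}(q_1, \dots, q_n, p_1, \dots, p_n, t). \end{aligned} \]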

In other words, for any solution of Hamilton’s equations there must exist at least $2n$ functions $C_m$ of the coordinates and momenta (and possibly of time) which are in fact constant and hence conserved. In this sense the existence of conserved quantities is a purely mathematical consequence of the fact that solutions of differential equations contain integration constants. Of course, any combination of the constants $C_m$ is again a constant and therefore the set of conserved quantities is not unique.
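The inversion described above can be made concrete for the harmonic oscillator with unit mass, where $n = 1$ and the solution $q = A \sin(\omega t + \varphi)$, $p = A\,\omega \cos(\omega t + \varphi)$ carries the two integration constants $A$ and $\varphi$. The following Python sketch is only an illustration: the numerical values of $\omega$, $A$ and $\varphi$ are arbitrary assumptions, and the names `solution` and `constants` are hypothetical helpers, not taken from the text.

```python
import math

# Harmonic oscillator with unit mass: H = p**2/2 + omega**2 * q**2 / 2.
# General solution: q = A sin(omega t + phi), p = A omega cos(omega t + phi).
# Inverting for the two integration constants (2n = 2 for n = 1):
#   C1 = A   = sqrt(q**2 + (p/omega)**2)
#   C2 = phi = atan2(omega q, p) - omega t   (mod 2 pi)

def solution(t, A, phi, omega):
    """Coordinates and momentum at time t for given integration constants."""
    q = A * math.sin(omega * t + phi)
    p = A * omega * math.cos(omega * t + phi)
    return q, p

def constants(q, p, t, omega):
    """Integration constants expressed as functions of (q, p, t)."""
    C1 = math.hypot(q, p / omega)
    C2 = (math.atan2(omega * q, p) - omega * t) % (2 * math.pi)
    return C1, C2

omega, A, phi = 2.0, 1.5, 0.3   # illustrative values (assumptions)
samples = [constants(*solution(t, A, phi, omega), t, omega)
           for t in (0.0, 0.7, 1.9, 4.2)]
for C1, C2 in samples:
    print(f"C1 = {C1:.12f}, C2 = {C2:.12f}")
```

Although $q$ and $p$ change from one sampled time to the next, the two functions $C_1(q, p, t)$ and $C_2(q, p, t)$ return the same values each time. Note that $C_1$ does not depend on time explicitly (it is a function of the energy), whereas $C_2$ does.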

There is, however, a much deeper physical interpretation of the existence of conserved quantities. Some of these conserved quantities reflect properties of space and time and so they are intimately related to the symmetries of Nature. This relation is the content of the celebrated Noether’s theorem, one of the most fundamental and striking achievements of modern theoretical physics. The most important consequences of Noether’s theorem are found in relativistic quantum field theories, but it has implications even in the context of classical mechanics. In the following we derive and prove Noether’s theorem, and then we show that the conservation of energy, momentum and angular momentum is a consequence of this theorem. The reader will notice that the theorem is genuinely based on the variational principle to which this chapter is devoted.

4.8 Noether’s theorem: proof

When we derived Lagrange’s equations from the action, the idea was to find a trajectory $q_a$ for which the action $S$ acquires its extremal value. The variation of the action was introduced with the help of varied trajectories; recall figure 4.5. The variation of the trajectory was arbitrary, with the only constraint that it must vanish at the boundary points $A$ and $B$. Using this constraint we were able to derive the equations of motion, i.e. the Lagrange equations.

Now we proceed differently. We claimed in the previous section that to each symmetry of the system there corresponds a conserved quantity. What do we mean by a symmetry of the system? The simplest example of a symmetry is invariance with respect to temporal translation. Isolated systems must be invariant under translations in time. In other words, if we perform some experiment at time $t_1$ and then the same experiment at a later time $t_2 > t_1$, both experiments must give the same results, provided all conditions remain unchanged.

For example, suppose we study the collision of two particles with initial velocities $v_1$ and $v_2$ and masses $m_1$ and $m_2$. In addition, we suppose that these particles form an isolated system, not affected by the laboratory. After the collision we measure the velocities and find that the new velocities of the particles are $v'_1$ and $v'_2$. The point is that if the initial velocities and masses do not change, the resulting velocities after the collision do not depend on the time at which the experiment was performed. It does not matter whether we study the collision on Monday or on Friday: the result must be the same, independent of time.


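Fig. 4.6. Translation of the system in time. (Plot of $q$ against time: the original trajectory runs from $A$ at $t_1$ to $B$ at $t_2$, the shifted trajectory from $A'$ at $t'_1$ to $B'$ at $t'_2$.)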

More generally, imagine that $q_a = q_a(t)$ is the real trajectory (i.e. it is a solution of the equations of motion) which passes through point $A$ at time $t_1$ and point $B$ at time $t_2$, see figure 4.6. If we perform the same experiment at a later time, we can imagine it as "shifting" the trajectory to the right (in the time direction), so that the new trajectory starts at point $A'$ at the shifted time $t'_1$ and ends at point $B'$ at time $t'_2$. We say that, mathematically, we have translated the system in time. If all other conditions remain the same, then the shape of the trajectory cannot change: the particle must move along the same trajectory but at a later time.

We say that time is homogeneous, i.e. all instants of time are physically equivalent. Hence, the result of any experiment cannot depend explicitly on the time at which it was performed: an isolated system must be invariant under translation in time.

Notice that this conclusion does not apply to non-isolated systems. For example, suppose that we measure the intensity of sunlight at 8.00 am and at 11.00 pm. The results will, of course, be different! We cannot say that the intensity of sunlight is always the same. However, this is related to the fact that Earth is not an isolated system if one studies sunlight, because for our measurement it is crucial that there is energy coming from the Sun to Earth. The conditions which can affect the experiment are not the same in the morning and before midnight. Hence, the assumption that the system is isolated is important. In fact, the existence of the Sun and the rotation of Earth break the homogeneity of time.

We will not always emphasize it, but in connection with conservation laws we will always assume that the system is isolated.


Homogeneity of time is the simplest of the symmetries to be discussed. The next one is the homogeneity of space. This principle states that the result of an experiment cannot depend on the place where we perform it. Again, we must add the assumption that all external conditions are the same. But if this assumption is satisfied, it does not matter where we perform the experiment. The physics must be invariant with respect to translation in space; this transformation is plotted in figure 4.7.

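Fig. 4.7. Translation of the system in space. (Plot of $q$ against time between $t_1$ and $t_2$: the original trajectory from $A$ to $B$ starts at $q(t_1)$, the translated trajectory from $A'$ to $B'$ starts at $q'(t_1)$.)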

The last of the most important symmetries is the isotropy of space. Isotropy means that at a given point of space all directions are equivalent, so the result of any experiment cannot change if we rotate the system by an arbitrary angle.

If the system is invariant with respect to some transformation (translation in time, translation in space or rotation), the action of this system does not change under this transformation. Noether’s theorem then implies that each of these symmetries is responsible for the conservation of some quantity. Homogeneity of time implies the conservation of energy, homogeneity of space implies the conservation of momentum and isotropy of space implies the conservation of angular momentum.

Notice that in the previous examples we varied either the trajectory or the time. In the case of spatial translation, figure 4.7, we did not transform the time, only the trajectory. However, the boundary points were not fixed because the endpoints of the trajectory are transformed as well. Thus, in general, the boundary conditions

\[ \delta q_a(t_1) = \delta q_a(t_2) = 0 \]

must be relaxed. In the case of time translation we did not change the values of the coordinates $q_a$, but we shifted the trajectory in time, and thus we must consider not only variations of the coordinates $q_a$ but also the variation of time $\delta t$.

All transformations considered above are special cases of the general transformation

\[ q(t) \mapsto q'(t) = q(t) + \delta q(t), \qquad t \mapsto t' = t + \delta t(t). \]

Here we explicitly emphasized that the variations $\delta t$ and $\delta q_a$ can depend on time. The variation $\delta q$ is called the isochronous variation because it is the difference between the varied coordinate $q'(t)$ and the original coordinate $q(t)$ at the same time. Besides $\delta q_a$ we also introduce the non-isochronous variation, or total variation, $\Delta q_a$ defined by

\[ \Delta q_a(t) = q'_a(t') - q_a(t). \]

Using the Taylor expansion to first order in the variations we can write

\[ \Delta q_a = q'_a(t + \delta t) - q_a(t) = q'_a(t) + \dot{q}_a\,\delta t - q_a(t) = \delta q_a + \dot{q}_a\,\delta t. \tag{4.22} \]
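Relation (4.22) holds to first order in the variations, which can be checked numerically. The sketch below assumes a sample trajectory $q(t) = \sin t$, an isochronous variation $\delta q(t) = \varepsilon \cos 3t$ and a time variation $\delta t(t) = \varepsilon\, t$; these choices, and the helper name `total_variation_error`, are illustrative assumptions only. If (4.22) is correct to first order, the difference between the exact total variation and $\delta q + \dot{q}\,\delta t$ should shrink like $\varepsilon^2$.

```python
import math

# Illustrative choices (assumptions, not from the text):
#   original trajectory    q(t)       = sin t
#   isochronous variation  delta q(t) = eps * cos(3 t)
#   time variation         delta t(t) = eps * t

def q(t):
    return math.sin(t)

def q_dot(t):
    return math.cos(t)

def total_variation_error(t, eps):
    """|Delta q - (delta q + q_dot * delta t)| for the sample transformation."""
    dq = lambda s: eps * math.cos(3.0 * s)
    dt = eps * t
    q_varied = lambda s: q(s) + dq(s)        # q'(t) = q(t) + delta q(t)
    Delta_q = q_varied(t + dt) - q(t)        # exact total variation
    first_order = dq(t) + q_dot(t) * dt      # right-hand side of (4.22)
    return abs(Delta_q - first_order)

t = 0.8
errors = [total_variation_error(t, eps) for eps in (1e-2, 1e-3, 1e-4)]
ratios = [errors[i] / errors[i + 1] for i in range(2)]
print(errors)
print(ratios)   # each ratio should be close to 100 if the error is O(eps**2)
```

Reducing $\varepsilon$ by a factor of 10 reduces the discrepancy by roughly a factor of 100, as expected for a relation that is exact up to second-order terms.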

Now we are prepared to prove the Noether theorem.

Theorem 4 (Emmy Noether’s theorem). Let $S$ be the action of the system defined by

\[ S = \int_{t_1}^{t_2} L(q, \dot{q}, t)\, dt. \tag{4.23} \]

Let

\[ q(t) \mapsto q'(t) = q(t) + \delta q(t), \qquad t \mapsto t' = t + \delta t(t) \tag{4.24} \]

be a transformation of the coordinates $q_a$ and time $t$ which leaves $S$ invariant. Then the quantity

\[ Q = p_a\,\Delta q_a - E\,\delta t \tag{4.25} \]

is constant during the evolution of the system whenever $q_a$ is a solution of the equations of motion, where $p_a$ are the generalized momenta and $E$ is the generalized energy of the system, defined by

\[ p_a = \frac{\partial L}{\partial \dot{q}_a}, \qquad E = p_a\,\dot{q}_a - L. \tag{4.26} \]


Proof. By assumption, the action (4.23) is invariant under the transformation (4.24). The action associated with the varied trajectory $q'_a$ and the varied time $t'$ is

\[ S' = \int_{t_1+\delta t_1}^{t_2+\delta t_2} L(q'(t), \dot{q}'(t), t)\, dt, \tag{4.27} \]

where we use the notation $\delta t_1 = \delta t(t_1)$ and $\delta t_2 = \delta t(t_2)$ for brevity. Notice that the time variation $\delta t$ affects only the integration bounds, not the integrand. The total variation of the action is then $\Delta S = S' - S$, which is zero by the assumed invariance of the action:

\[ \Delta S = S' - S = 0. \tag{4.28} \]

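Using the additivity of the integral we can rewrite the varied action as

\[ \begin{aligned} S' &= \int_{t_1+\delta t_1}^{t_2+\delta t_2} = \int_{t_1+\delta t_1}^{t_2} + \int_{t_2}^{t_2+\delta t_2} = -\int_{t_2}^{t_1+\delta t_1} + \int_{t_2}^{t_2+\delta t_2} = -\int_{t_2}^{t_1} - \int_{t_1}^{t_1+\delta t_1} + \int_{t_2}^{t_2+\delta t_2} \\ &= \int_{t_1}^{t_2} L(q', \dot{q}', t)\, dt - \int_{t_1}^{t_1+\delta t_1} L(q', \dot{q}', t)\, dt + \int_{t_2}^{t_2+\delta t_2} L(q', \dot{q}', t)\, dt, \end{aligned} \]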

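where we have omitted the integrand in the intermediate steps. Hence, the total variation of the action reads

\[ \Delta S = \underbrace{\int_{t_1}^{t_2} \left[ L(q', \dot{q}', t) - L(q, \dot{q}, t) \right] dt}_{\Delta S_1} + \underbrace{\int_{t_2}^{t_2+\delta t_2} L(q', \dot{q}', t)\, dt - \int_{t_1}^{t_1+\delta t_1} L(q', \dot{q}', t)\, dt}_{\Delta S_2}. \tag{4.29} \]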

Now we are in a position to expand these integrals in the variations $\delta q_a$ and $\delta t$, assuming they are infinitesimal and hence neglecting higher-order terms. This is legitimate because, in the definition of the variation, it was assumed that after the variation all quantities are evaluated at $\delta q_a = \delta t = 0$, so only the first-order terms enter the result.

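First we express the variation denoted by $\Delta S_1$ in the equation above. The Lagrangian is evaluated on different trajectories but at the same time, and so the expression under the integral is the isochronous variation of the Lagrangian:

\[ \Delta S_1 = \int_{t_1}^{t_2} \delta L\, dt = \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q_a}\,\delta q_a + \frac{\partial L}{\partial \dot{q}_a}\,\delta \dot{q}_a \right) dt. \]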

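The second term can be integrated by parts, using $\delta \dot{q}_a = \frac{d}{dt}\,\delta q_a$, to find

\[ \Delta S_1 = \left[ \frac{\partial L}{\partial \dot{q}_a}\,\delta q_a \right]_{t_1}^{t_2} + \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q_a} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_a} \right) \delta q_a\, dt. \]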

We arrived at the same expression when we derived Lagrange’s equations from the variational principle, but now the interpretation is different. There we assumed that the boundary points of the trajectory are fixed, so that $\delta q_a(t_1) = \delta q_a(t_2) = 0$. By this assumption the first term, in square brackets, vanished and hence we deduced that in order to satisfy $\delta S = 0$, the Lagrange equations must hold. Now the boundary points are not fixed because we consider a transformation of the system. However, we assume that the equations of motion are satisfied and therefore the second term vanishes! Consequently, the only contribution from $\Delta S_1$ to the total variation is

\[ \Delta S_1 = \left[ \frac{\partial L}{\partial \dot{q}_a}\,\delta q_a \right]_{t_1}^{t_2}. \]

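Next we evaluate the variation $\Delta S_2$ in the expression (4.29). Recall that we are expanding all quantities up to the first order in the variations $\delta q_a$ and $\delta t$. Thus, for example, the first integral in $\Delta S_2$ is

\[ \int_{t_2}^{t_2+\delta t_2} L(q', \dot{q}', t)\, dt = \int_{t_2}^{t_2+\delta t_2} L(q, \dot{q}, t)\, dt + O(\delta q\,\delta t), \]

because the two integrands differ by a term of order $\delta q$ and the integration interval has length $\delta t_2$; the correction is therefore of second order in the variations. Now we can expand the Lagrangian, regarded as a function of time along the trajectory, into a series in $t$ as

\[ L(t) = L(t_2) + \dot{L}(t_2)\,(t - t_2) + O\big((t - t_2)^2\big). \]

Since the integral is taken over the interval $(t_2, t_2 + \delta t_2)$, the inequality

\[ |t - t_2| < |\delta t_2| \]

holds and therefore we can write

\[ L(t) = L(t_2) + \dot{L}(t_2)\,(t - t_2) + O\big((\delta t_2)^2\big). \]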


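Then the integral reads

\[ \int_{t_2}^{t_2+\delta t_2} L(t)\, dt = \big[ L(t_2)\, t \big]_{t_2}^{t_2+\delta t_2} + O\big((\delta t_2)^2\big) = L(t_2)\,\delta t_2 + O\big((\delta t_2)^2\big). \]

Neglecting the quadratic terms we arrive at

\[ \int_{t_2}^{t_2+\delta t_2} L(q', \dot{q}', t)\, dt = L(q(t_2), \dot{q}(t_2), t_2)\,\delta t_2. \]

By the same reasoning we can derive

\[ \int_{t_1}^{t_1+\delta t_1} L(q', \dot{q}', t)\, dt = L(q(t_1), \dot{q}(t_1), t_1)\,\delta t_1. \]

We can conclude that the total variation $\Delta S_2$ is equal to

\[ \Delta S_2 = L(q(t_2), \dot{q}(t_2), t_2)\,\delta t_2 - L(q(t_1), \dot{q}(t_1), t_1)\,\delta t_1 = \big[ L\,\delta t \big]_{t_1}^{t_2}. \]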

Summa summarum, the total variation of the action reads

\[ \Delta S = \left[ \frac{\partial L}{\partial \dot{q}_a}\,\delta q_a + L\,\delta t \right]_{t_1}^{t_2}. \tag{4.30} \]

Using the definition of the generalized momentum (3.2) and the relation between the isochronous and total variations (4.22), we find

\[ \Delta S = \big[ p_a\,\Delta q_a - (p_a\,\dot{q}_a - L)\,\delta t \big]_{t_1}^{t_2}. \tag{4.31} \]

The coefficient of the variation $\delta t$ is in fact equal to the Hamiltonian (3.8). The reason why we do not denote it by $H$ is that the Hamiltonian should be expressed as a function of $q_a$ and $p_a$, which is not our case. But we know that the Hamiltonian is equal to the total energy and hence we define the generalized energy by

\[ E = p_a\,\dot{q}_a - L \]

so that the total variation of the action becomes

\[ \Delta S = \big[ p_a\,\Delta q_a - E\,\delta t \big]_{t_1}^{t_2}. \tag{4.32} \]


Finally, let us denote the expression in square brackets by $Q$:

\[ Q = p_a\,\Delta q_a - E\,\delta t. \tag{4.33} \]

The total variation is then

\[ \Delta S = [Q]_{t_1}^{t_2} = Q(t_2) - Q(t_1). \]

Now, by (4.28) we have $\Delta S = 0$ and hence

\[ Q(t_1) = Q(t_2). \tag{4.34} \]

Since the times $t_1$ and $t_2$ can be chosen arbitrarily, the value of $Q$ at an arbitrary time $t_1$ is equal to the value of $Q$ at an arbitrary time $t_2$. In other words, $Q$ acquires the same value at each time and hence $Q$ is a conserved quantity,

\[ Q = \text{constant}. \]

Nevertheless, $Q$ is not our final expression for the conserved quantity, because it contains the variations $\Delta q_a$ and $\delta t$ and hence depends on the particular transformation. We have to clarify the nature of the variations further. If we say that the system is invariant under, for example, translations, we actually mean that it is invariant under an arbitrary translation. The translation in, say, the $x$-direction can be understood as a continuous transformation parametrized by a parameter $a$,

\[ x \mapsto x + a. \]

For $a = 0$ we have the identity transformation $x \mapsto x$. Since $a$ is a continuous parameter, the transformation $x \mapsto x + a$ is also continuous in the variable $a$.

This concludes the proof of Noether’s theorem. □

4.9 Basic conservation laws

In the previous section we proved Noether’s theorem for a general transformation of the system generated by the infinitesimal variations $\Delta q_a$ and $\delta t$. We proved that for such a general transformation, the quantity (4.25) given by

\[ Q = p_a\,\Delta q_a - E\,\delta t \]

is conserved. In this section we investigate the implications of Noether’s theorem for the basic symmetries of space and time discussed above: the homogeneity of space and time and the isotropy of space.
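Two of these implications can be illustrated numerically. For a common spatial translation of all particles by a constant $a$ (so $\Delta q_a = a$, $\delta t = 0$), the conserved quantity is $Q = a \sum p$, i.e. the total momentum; for a pure time translation ($\Delta q_a = 0$, $\delta t$ constant) it is $Q = -E\,\delta t$, i.e. the energy. The Python sketch below is a minimal illustration under stated assumptions: two particles on a line coupled by a spring with potential $V = \tfrac{1}{2} k (x_1 - x_2)^2$, which depends neither on the absolute position nor explicitly on time. The masses, stiffness, initial data, step size and the function names are hypothetical choices, and the velocity-Verlet integrator is a standard scheme chosen here for convenience, not a method from the text.

```python
# Two particles on a line coupled by a spring: V = k/2 * (x1 - x2)**2.
# Masses, stiffness and initial data are illustrative assumptions.
m1, m2, k = 1.0, 3.0, 2.0

def forces(x1, x2):
    f = -k * (x1 - x2)        # force on particle 1; particle 2 feels -f
    return f, -f

def energy(x1, x2, p1, p2):
    """Generalized energy E = kinetic + potential for this system."""
    return p1**2 / (2 * m1) + p2**2 / (2 * m2) + 0.5 * k * (x1 - x2)**2

def evolve(x1, x2, p1, p2, h, steps):
    """Velocity-Verlet integration of the equations of motion."""
    f1, f2 = forces(x1, x2)
    for _ in range(steps):
        p1 += 0.5 * h * f1
        p2 += 0.5 * h * f2
        x1 += h * p1 / m1
        x2 += h * p2 / m2
        f1, f2 = forces(x1, x2)
        p1 += 0.5 * h * f1
        p2 += 0.5 * h * f2
    return x1, x2, p1, p2

start = (0.0, 1.0, 0.5, -0.2)            # (x1, x2, p1, p2)
end = evolve(*start, h=1e-3, steps=20000)

P_start, P_end = start[2] + start[3], end[2] + end[3]
E_start, E_end = energy(*start), energy(*end)
print("change of total momentum:", P_end - P_start)
print("change of energy:        ", E_end - E_start)
```

The total momentum is preserved up to rounding errors, while the energy is preserved up to the small discretization error of the integrator. The individual momenta $p_1$ and $p_2$ are not conserved, because the system is not invariant under a translation of one particle alone.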

5

Hamilton-Jacobi equation

In the previous chapters we found two alternative formulations of Newton’s laws of motion, namely the Lagrangian and the Hamiltonian formulation. Lagrange’s equations are formulated in an arbitrary coordinate system. Their main advantage is that by an appropriate choice of coordinates we can eliminate the constraints which complicate the analysis. Like Newton’s law, Lagrange’s equations are second-order equations. Hamilton’s equations are also coordinate-independent but, in addition, they have the form of first-order differential equations. In general, first-order equations are easier to solve. In the case of Hamilton’s equations this advantage is only formal, because in order to solve the system of Hamilton’s equations we usually have to convert them back to second-order equations. The main advantage of Hamilton’s equations is that we can interpret the motion of particles as motion in the phase space. We have seen that the conservation of energy allows us to find the phase trajectories even without solving the equations of motion.

In this chapter we start with the analysis of those coordinate transformations which leave the form of Hamilton’s equations invariant, the so-called canonical transformations. Then we study the possibility of finding transformations which simplify Hamilton’s equations so that they can be solved easily. We will see that this is indeed possible if we solve the Hamilton-Jacobi equation. In many situations the Hamilton-Jacobi equation can be solved exactly and the solution of Hamilton’s equations simplifies significantly. The analysis of the Hamilton-Jacobi equation will lead us to a new, third formulation of classical mechanics. Finally, we introduce action-angle variables, which will be useful in the analysis of more complicated systems with periodic behaviour.


5.1 Canonical transformations

In Hamilton’s formalism we treat the coordinates $q_a$ and momenta $p_a$ as independent variables. Let us investigate transformations which do not change the form of Hamilton’s equations

\[ \dot{q}_a = \frac{\partial H}{\partial p_a}, \qquad \dot{p}_a = -\frac{\partial H}{\partial q_a}. \tag{5.1} \]

Hence, we are interested in transformations

\[ Q_a = Q_a(q, p), \qquad P_a = P_a(q, p), \tag{5.2} \]

preserving equations (5.1), i.e. such that the new equations of motion will be

\[ \dot{Q}_a = \frac{\partial H'}{\partial P_a}, \qquad \dot{P}_a = -\frac{\partial H'}{\partial Q_a}. \tag{5.3} \]

In chapter 4 we have seen that the Lagrangian is not determined uniquely: we can add an arbitrary function which is a total time derivative to a Lagrangian without affecting the equations of motion; recall equation (4.19).

Suppose that we have the Lagrangian $L = L(q, \dot{q})$ and the corresponding Hamiltonian

\[ H = \dot{q}_a\, p_a - L. \]

Then we perform the transformation (5.2) to new coordinates $Q_a$ and new momenta $P_a$ and obtain a new Lagrangian $L' = L'(Q, \dot{Q})$ with the associated Hamiltonian

\[ H' = \dot{Q}_a\, P_a - L'. \]

We require that both Lagrangians yield the same equations of motion. Then, by (4.19), the two Lagrangians can differ only by a total derivative of some function $F$ with respect to time,

\[ L = L' + \frac{dF}{dt}. \]

In terms of the Hamiltonians this means

\[ \dot{q}_a\, p_a - H = \dot{Q}_a\, P_a - H' + \frac{dF}{dt}. \tag{5.4} \]

In general, the function $F$ depends on the old and new coordinates and momenta and possibly on time,

\[ F = F(q_1, \dots, q_n, p_1, \dots, p_n, Q_1, \dots, Q_n, P_1, \dots, P_n, t) \equiv F(q, p, Q, P, t), \]

i.e. it is a function of $4n + 1$ variables. But these variables are not all independent, as they are constrained by the $2n$ equations (5.2). Hence, $F$ is a function of $2n + 1$ independent variables and we can decide which variables will be independent. Transformations (5.2) are called canonical and the function $F$ is called the generating function of the canonical transformation (5.2).

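Let us choose a generating function $F_1$ which is a function of the old and transformed coordinates (and possibly of time),

\[ F_1 = F_1(q, Q, t). \tag{5.5} \]

Its total derivative with respect to time is

\[ \frac{dF_1}{dt} = \frac{\partial F_1}{\partial q_a}\,\dot{q}_a + \frac{\partial F_1}{\partial Q_a}\,\dot{Q}_a + \frac{\partial F_1}{\partial t}. \tag{5.6} \]

Substituting this expression into (5.4) and comparing the coefficients of the independent derivatives $\dot{q}_a$ and $\dot{Q}_a$, respectively, we find

\[ p_a = \frac{\partial F_1}{\partial q_a}, \qquad P_a = -\frac{\partial F_1}{\partial Q_a}, \qquad H' = H + \frac{\partial F_1}{\partial t}. \tag{5.7} \]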

Hence, we can choose an arbitrary function $F_1$ of $q_a$ and $Q_a$ and, using relations (5.7), find the transformation which the function $F_1$ generates. The equation

\[ p_a = \frac{\partial}{\partial q_a} F_1(q, Q, t) \]

can be used to find the defining relation for $Q_a$, i.e. we can solve this equation to find

\[ Q_a = Q_a(q, p, t). \]

This result can be substituted into the equation

\[ P_a = -\frac{\partial}{\partial Q_a} F_1(q, Q, t), \]

which can then be solved to find the relation

\[ P_a = P_a(q, p, t). \]
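The two-step procedure can be tried on the simplest non-trivial choice $F_1 = q\,Q$ for one degree of freedom. This generating function is an assumed example, not one from the text: it gives $p = \partial F_1/\partial q = Q$ and $P = -\partial F_1/\partial Q = -q$, so the transformation exchanges the roles of coordinate and momentum. The Python sketch below (with hypothetical helper names `partial`, `to_new`, `H_new`) checks, for the oscillator Hamiltonian $H = p^2/2 + q^2/2$, that the time derivatives of $Q$ and $P$ computed from the old equations (5.1) agree with those given by the new equations (5.3) with $H' = H$ expressed in the new variables.

```python
def F1(q, Q):
    return q * Q                 # sample generating function (an assumption)

def partial(f, args, i, h=1e-6):
    """Central finite-difference derivative of f with respect to args[i]."""
    up = list(args); up[i] += h
    dn = list(args); dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)

def to_new(q, p):
    """Apply (5.7): solve p = dF1/dq for Q, then evaluate P = -dF1/dQ."""
    Q = p                                # p = dF1/dq = Q
    P = -partial(F1, (q, Q), 1)          # P = -dF1/dQ = -q
    return Q, P

def H(q, p):
    return 0.5 * (p * p + q * q)

def H_new(Q, P):
    """H' = H (F1 has no explicit time dependence), with q = -P and p = Q."""
    return H(-P, Q)

q0, p0 = 0.4, -1.3
Q0, P0 = to_new(q0, p0)

# Old Hamilton's equations (5.1).
q_dot = partial(H, (q0, p0), 1)
p_dot = -partial(H, (q0, p0), 0)

# Time derivatives of Q = p and P = -q implied by the old flow ...
Q_dot_from_old, P_dot_from_old = p_dot, -q_dot
# ... and the same derivatives from the new equations (5.3).
Q_dot_new = partial(H_new, (Q0, P0), 1)
P_dot_new = -partial(H_new, (Q0, P0), 0)

print(Q_dot_from_old, Q_dot_new)
print(P_dot_from_old, P_dot_new)
```

Both pairs of numbers agree, which is what it means for the transformation to preserve the form of Hamilton’s equations.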


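Sometimes it is useful to define a generating function which depends on the variables $q_a$ and $P_a$. This can be achieved using the familiar Legendre transformation. Let us write the differential of $F_1$ with the help of equations (5.7):

\[ \begin{aligned} dF_1 &= \frac{\partial F_1}{\partial q_a}\,dq_a + \frac{\partial F_1}{\partial Q_a}\,dQ_a + \frac{\partial F_1}{\partial t}\,dt \\ &= p_a\,dq_a - P_a\,dQ_a + \frac{\partial F_1}{\partial t}\,dt \\ &= p_a\,dq_a - d(Q_a P_a) + Q_a\,dP_a + \frac{\partial F_1}{\partial t}\,dt. \end{aligned} \]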

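Let us define the function

\[ F_2 = F_1 + Q_a P_a. \tag{5.8} \]

Its differential reads

\[ dF_2 = p_a\,dq_a + Q_a\,dP_a + \frac{\partial F_1}{\partial t}\,dt, \tag{5.9} \]

which means that $F_2$ is a function of $q_a$ and $P_a$,

\[ F_2 = F_2(q, P, t), \tag{5.10} \]

and, in addition, the transformation generated by the function $F_2$ is

\[ Q_a = \frac{\partial F_2}{\partial P_a}, \qquad p_a = \frac{\partial F_2}{\partial q_a}, \qquad H' = H + \frac{\partial F_2}{\partial t}. \tag{5.11} \]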
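As a quick sanity check of relations (5.11), consider the simplest generating function of this type. The choice $F_2 = q_a P_a$ is an assumed example, not taken from the text; inserting it into (5.11) gives the identity transformation:

```latex
\[
  F_2(q, P) = q_a P_a
  \quad\Longrightarrow\quad
  Q_a = \frac{\partial F_2}{\partial P_a} = q_a, \qquad
  p_a = \frac{\partial F_2}{\partial q_a} = P_a, \qquad
  H' = H + \frac{\partial F_2}{\partial t} = H .
\]
```

The identity transformation is trivially canonical, so the result is consistent with (5.11).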

5.2 Hamilton-Jacobi equation

Canonical transformations preserve the form of the equations of motion. Let us find a canonical transformation for which Hamilton’s equations simplify as much as possible, so that we can solve them explicitly. We introduce a generating function of type (5.10) but we will denote it by $S$:

\[ S = S(q, P, t). \]

From (5.11) we have

\[ Q_a = \frac{\partial S}{\partial P_a}, \qquad p_a = \frac{\partial S}{\partial q_a}, \qquad H' = H + \frac{\partial S}{\partial t}. \tag{5.12} \]


In order to simplify Hamilton’s equations, let us put $H' = 0$, so that $S$ satisfies the equation

\[ H + \frac{\partial S}{\partial t} = 0. \tag{5.13} \]

Hamilton’s equations (5.3) with $H' = 0$ then imply

\[ \dot{Q}_a = 0, \qquad \dot{P}_a = 0. \tag{5.14} \]

In other words, the transformed coordinates and momenta are constant. Equations (5.14) can be solved trivially,

\[ Q_a = \alpha_a, \qquad P_a = \beta_a, \tag{5.15} \]

where $\alpha_a$ and $\beta_a$ are integration constants, equal to the constant values of the transformed coordinates and momenta. Then the generating function can be written as

\[ S = S(q, \beta, t) \tag{5.16} \]

and equations (5.12) acquire the form

\[ \alpha_a = \frac{\partial S}{\partial \beta_a}, \qquad p_a = \frac{\partial S}{\partial q_a}, \qquad H + \frac{\partial S}{\partial t} = 0. \tag{5.17} \]

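Thus, if we want to find a canonical transformation which simplifies Hamilton’s equations, we first solve the equations

\[ p_a = \frac{\partial S}{\partial q_a}, \qquad H(q, p, t) + \frac{\partial S}{\partial t} = 0. \]

Notice that the first equation is merely a definition of $p_a$, so the only equation which must in fact be solved is

\[ H\!\left( q, \frac{\partial S}{\partial q}, t \right) + \frac{\partial S}{\partial t} = 0. \tag{5.18} \]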

This equation for the generating function $S$ is known as the Hamilton-Jacobi equation. The Hamilton-Jacobi equation contains derivatives with respect to the $n + 1$ variables $q_1, \dots, q_n, t$ and therefore its complete solution $S$ contains $n + 1$ constants. One of them is additive, for obviously any function $S' = S + c$, where $c$ is a constant, is also a solution of (5.18). This constant can be set to zero without loss of generality because the Hamilton-Jacobi equation contains only derivatives of $S$. Hence, the solution will contain $n$ constants:

\[ S = S(q_1, \dots, q_n, c_1, \dots, c_n, t). \tag{5.19} \]

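This result should be compared with equation (5.16), where $\beta_a$ are the constant momenta. Our aim was to arrive at Hamilton’s equations in the form (5.14), so in order to identify the constants $c_a$ with the momenta $\beta_a$ we have to show that the coordinates derived from the generating function (5.19) via (5.12) are indeed constant. We have

\[ Q_a = \frac{\partial S}{\partial c_a} \]

and using Hamilton’s equations and the Hamilton-Jacobi equation we find

\[ \begin{aligned} \dot{Q}_a = \frac{d}{dt} \frac{\partial S}{\partial c_a} &= \frac{\partial}{\partial q_b}\!\left( \frac{\partial S}{\partial c_a} \right) \dot{q}_b + \frac{\partial}{\partial t} \frac{\partial S}{\partial c_a} = \frac{\partial}{\partial c_a}\!\left( \frac{\partial S}{\partial q_b} \right) \dot{q}_b + \frac{\partial}{\partial c_a} \frac{\partial S}{\partial t} \\ &= \frac{\partial p_b}{\partial c_a} \frac{\partial H}{\partial p_b} - \frac{\partial H}{\partial c_a}. \end{aligned} \]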

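Now we use the fact that the Hamiltonian $H$ depends on the constants $c_a$ only through the generating function $S$,

\[ \frac{\partial}{\partial c_a} H\Big( q, \underbrace{\frac{\partial S(q, c, t)}{\partial q}}_{p}, t \Big) = \frac{\partial H}{\partial p_b} \frac{\partial p_b}{\partial c_a}, \]

so that the expression for $\dot{Q}_a$ actually reduces to zero:

\[ \dot{Q}_a = 0. \]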

Hence, we have proved that a function $S$ which is a solution of the Hamilton-Jacobi equation generates a canonical transformation after which the coordinates $Q_a$ are constant; we denote them by $\alpha_a = Q_a$ as we did above. Then we can identify the unknown constants $c_a$ in the function $S$ with the constant momenta, $P_a = c_a = \beta_a$. Hamilton’s equations in the transformed coordinates thus read

\[ \dot{Q}_a = 0, \qquad \dot{P}_a = 0, \]

as desired.


5.3 Example: harmonic oscillator

The procedure explained in the previous section may seem somewhat abstract, and it is useful to see how it works on our favorite example, the harmonic oscillator. Let us take, for simplicity, the Hamiltonian in the form

\[ H(q, p) = \frac{p^2}{2} + \frac{q^2}{2}. \]

In order to formulate the Hamilton-Jacobi equation, we replace the momentum $p$ by the derivative of the generating function $S$,

\[ p = \frac{\partial S}{\partial q}, \]

in accordance with (5.12). The Hamilton-Jacobi equation (5.18) then reads

\[ \frac{\partial S}{\partial t} + H\!\left( q, \frac{\partial S}{\partial q} \right) = 0. \]

Let us put

\[ S = A(t) + W(q), \]

where $A$ is a function of time $t$ only and $W$ is time-independent. Then the Hamilton-Jacobi equation acquires the form

\[ \frac{\partial A}{\partial t} = -H\!\left( q, \frac{\partial W}{\partial q} \right). \]

We know that the Hamiltonian $H$ is constant and equal to the total energy, so

\[ \frac{\partial A}{\partial t} = -E, \]

which integrates to $A = -E\,t$, and the generating function can be written in the form

\[ S(q, E, t) = -E\,t + W(q). \]

The energy $E$ is the first integration constant; in the notation of the previous section we write $\beta = E$. The Hamilton-Jacobi equation is now

\[ H\!\left( q, \frac{\partial W}{\partial q} \right) = E. \]


Using the particular form of the Hamiltonian we arrive at the equation

\[ \frac{1}{2} \big( W'(q) \big)^2 + \frac{1}{2}\, q^2 = E, \]

and after rearrangement,

\[ dW = \sqrt{2E - q^2}\; dq. \]

This is an elementary integral which can be evaluated with some work (or using Mathematica). The result is

\[ W(q, E) = E \arcsin \frac{q}{\sqrt{2E}} + \frac{1}{2}\, q \sqrt{2E - q^2}, \]

where the additive integration constant has been set to zero¹.

Since we have identified the integration constant $E$ with the constant momentum $P = \beta$, we can use relation (5.17),

\[ \alpha = \frac{\partial S}{\partial \beta}, \]

to obtain the constant transformed coordinate $Q = \alpha$. By differentiating $S = -E\,t + W$ we find

\[ \alpha = -t + \frac{\partial W}{\partial E} = -t + \arcsin \frac{q}{\sqrt{2E}}. \]

We proved in the previous section that $\alpha$ must be constant, so we can use the last equation to express $q$ as a function of time $t$:

\[ q = \sqrt{2E}\, \sin(\alpha + t), \]

which is the usual solution of the equation of the harmonic oscillator.
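The result can be cross-checked numerically without redoing the integral. The following Python sketch is an illustration under stated assumptions: the values of $E$ and $\alpha$, the sample times and the finite-difference helper `d` are arbitrary choices, and the times are restricted so that $\alpha + t$ stays in $(-\pi/2, \pi/2)$, where $p > 0$ and the principal branch of the arcsine applies. It evaluates $S = -E\,t + W(q, E)$ along the claimed solution and checks that the Hamilton-Jacobi equation is satisfied and that $\partial S / \partial E$ returns the same constant $\alpha$ at every time.

```python
import math

def W(q, E):
    """Hamilton's characteristic function from the text (needs q**2 < 2E)."""
    return E * math.asin(q / math.sqrt(2 * E)) + 0.5 * q * math.sqrt(2 * E - q * q)

def S(q, E, t):
    return -E * t + W(q, E)

def d(f, x, h=1e-6):
    """Central finite-difference derivative of a one-variable function."""
    return (f(x + h) - f(x - h)) / (2 * h)

E, alpha = 0.8, 0.25            # illustrative constants (assumptions)
residuals, alphas = [], []
for t in (0.0, 0.3, 0.6, 0.9):
    q = math.sqrt(2 * E) * math.sin(alpha + t)      # claimed solution q(t)
    p = d(lambda x: S(x, E, t), q)                  # p = dS/dq
    S_t = d(lambda x: S(q, E, x), t)                # dS/dt
    residuals.append(S_t + 0.5 * p * p + 0.5 * q * q)   # left side of (5.18)
    alphas.append(d(lambda x: S(q, x, t), E))       # alpha = dS/dE

print(residuals)   # all close to zero
print(alphas)      # all close to the chosen alpha
```

The residuals of the Hamilton-Jacobi equation vanish up to finite-difference error, and the transformed coordinate $\alpha$ keeps the same value along the trajectory.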

¹ In order to perform the integration, use the substitution $q = \sqrt{2E}\, \sin x$ to obtain $2E \int \cos^2 x \, dx$. Then use the trigonometric formula $\cos^2 x = (1 + \cos 2x)/2$ and perform the trivial integration. Finally, return to the variable $q$ by inverting the relation for $x$ and use the formula $\sin 2x = 2 \sin x \cos x$, where $\sin x = q/\sqrt{2E}$ and $\cos x = \sqrt{1 - \sin^2 x}$.


5.4 Action-angle variables

Let us continue with our analysis of the harmonic oscillator. An important class of systems is described by so-called integrable Hamiltonians, a term to be defined later. Before we discuss the integrability of a system, we need to introduce a new set of canonically conjugate variables known as action-angle variables.

In the previous section we have seen that if the Hamiltonian is time-independent, the action $S$ can be written in the form

\[ S = -E\,t + W(q, E). \]

The function $W$ is called Hamilton’s characteristic function and it depends on the coordinate $q$ and the total energy $E$. Now we are going to use this function as a generating function of a canonical transformation.

We know that the harmonic oscillator moves in a periodic way and its phase trajectories are ellipses (or circles when we use simplified units, as we do in this chapter) in the phase space. In other words, its phase trajectories are always closed curves. Hence, it makes sense to define a new momentum, called the action variable, by

\[ J = \oint p \, dq, \tag{5.20} \]

where the integral is taken along the orbit of the oscillator, i.e. along the circle. We said that $J$ will be treated as a momentum, which means that we identify the transformed momentum $\beta$ with the action variable $J$. Recall that the Hamiltonian is equal to the total energy,

\[ H(q, p) = E, \]

which can be inverted to find

\[ p = p(q, E). \]

Hence, the integrand of (5.20) depends on $q$ and $E$. But since we integrate over the variable $q$, the integral does not depend on $q$ anymore and we have

\[ J = J(E) \quad \text{or} \quad E = E(J). \]

Consequently, we can write Hamilton’s characteristic function as a function of $q$ and $J$:

\[ W = W(q, J). \]

According to (5.11), the coordinate $Q$ conjugate to the momentum $P$ is the derivative of the generating function with respect to the momentum. In our case $W$ is the generating function, $J$ plays the role of the momentum, and the conjugate coordinate, called the angle variable, is defined by

\[ w = \frac{\partial W}{\partial J}. \tag{5.21} \]

Because the generating function $W$ does not depend on time explicitly, by (5.11) we have $H' = H$, and since canonical transformations preserve the form of Hamilton’s equations, the equation of motion in terms of the action-angle variables is simply

\[ \dot{w} = \frac{\partial H}{\partial J}. \tag{5.22} \]

In the case of the harmonic oscillator we have

\[ H = \frac{p^2}{2} + \frac{q^2}{2} = E, \]

so that $p = \pm\sqrt{2E - q^2}$, with the two signs corresponding to the upper and lower halves of the orbit.
6

Electromagnetic field

Lagrange’s equations and Hamilton’s equations have been derived from Newton’s law under the assumption that the force which acts on the particle is conservative, i.e. that it can be written as the gradient of a potential,

\[ \mathbf{F} = -\nabla V. \]

In this case we can define the Lagrangian $L = T - V$ as the difference of the kinetic and potential energy and consequently we can introduce the Hamiltonian. On the other hand, when we derived Lagrange’s equations from the variational principle, we just assumed that the system can be described by some Lagrangian $L$, without assuming the conservative nature of the forces explicitly. We have only argued that if the force is conservative and thus has a potential $V$, then the natural choice is $L = T - V$. This approach, however, does not exclude the possibility that the system can be described by some function $L$ even if the force is not conservative.

A particle in an electromagnetic field is the most important practical example of such a system. In physics we often study the motion of charged particles in external electromagnetic fields without asking how these fields emerged. Hence, we do not study the dynamics of the fields; we merely assume that the fields are given and investigate the motion of particles in regions where electromagnetic fields are present.

In the past people thought that electricity and magnetism were two different phenomena, while today we know that they are just two different aspects of a single entity called the electromagnetic field. The electric part of the field is described by the vector field $\mathbf{E}$ (sometimes called the electric field strength or electric intensity) and the magnetic part of the field is described by the vector field $\mathbf{B}$ (sometimes called the magnetic induction). A fully unified view of these fields as parts of the electromagnetic field is possible only in the framework of the special theory of relativity. Let us elucidate the meaning of the fields $\mathbf{E}$ and $\mathbf{B}$.


Recall that in the classical Newtonian theory of gravitation, the sources of the gravitational force are masses: the gravitational force between two point masses $m$ and $m'$ is proportional to the product $m\,m'$ and is given by

\[ F = G\, \frac{m\, m'}{r^2}, \]

where $r$ is the distance between the points and $G$ is the gravitational constant. The numerical value of the constant $G$ in standard SI units is

\[ G = 6.674 \times 10^{-11}\ \mathrm{m^3\, kg^{-1}\, s^{-2}}. \]

Similarly, the sources of the electromagnetic interaction are charges, i.e. charged particles. Charge is usually denoted by the symbol $q$ or $e$ and it can be either positive or negative. Particles with vanishing charge are called neutral. It is a remarkable fact that for two point charges at rest, the electric force of their interaction is given by the Coulomb law, which is formally identical to Newton’s law of gravitation. Two point charges $q$ and $q'$ at mutual distance $r$ act on each other by an electric force of magnitude

\[ F = k\, \frac{q\, q'}{r^2}, \tag{6.1} \]

where $k$ is the constant characterizing the strength of the electromagnetic interaction; it plays a role similar to that of the gravitational constant $G$ in Newton’s law. The numerical value of the constant $k$ depends on the system of units we use. In standard SI units we write $k$ in the form

\[ k = \frac{1}{4 \pi \varepsilon_0}, \]

where $\varepsilon_0$ is called the permittivity of the vacuum and its value is

\[ \varepsilon_0 = 8.854 \times 10^{-12}\ \mathrm{F\, m^{-1}}, \]

so that the constant $k$ is

\[ k = 8.99 \times 10^{9}\ \mathrm{F^{-1}\, m}. \]

The constants $k$ and $G$ carry different units and cannot be compared directly, but comparing the two forces for a given pair of elementary particles (for example, an electron and a proton) shows that the electric force is stronger than the gravitational force by many orders of magnitude.
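To put a number on this comparison, one can evaluate both force laws for an electron and a proton. The sketch below uses the values of $G$ and $\varepsilon_0$ quoted above; the elementary charge and the particle masses are not given in the text and are inserted here as rounded standard values (an assumption), as is the separation $r$, which cancels in the ratio anyway. The function names `coulomb` and `newton` are hypothetical.

```python
import math

# Constants quoted in the text
G = 6.674e-11                # m^3 kg^-1 s^-2
eps0 = 8.854e-12             # F m^-1
k = 1.0 / (4.0 * math.pi * eps0)

# Particle data: NOT from the text, rounded standard values (assumption)
e = 1.602e-19                # elementary charge, C
m_e = 9.109e-31              # electron mass, kg
m_p = 1.673e-27              # proton mass, kg

def coulomb(q1, q2, r):
    """Magnitude of the electric force (6.1)."""
    return k * abs(q1 * q2) / r**2

def newton(m1, m2, r):
    """Magnitude of the gravitational force."""
    return G * m1 * m2 / r**2

r = 5.3e-11                  # separation in metres; the ratio does not depend on it
ratio = coulomb(e, e, r) / newton(m_e, m_p, r)
print(f"k = {k:.3e} m/F")
print(f"F_electric / F_gravity = {ratio:.2e}")
```

For this pair of particles the electric attraction exceeds the gravitational one by roughly 39 orders of magnitude.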


However, the simple Coulomb law (6.1) holds only for charges at rest. When the charges start to move in an arbitrary way, new effects emerge. First, the electromagnetic field propagates at a finite speed $c$ equal to the speed of light,

\[ c = 299\,792\,458\ \mathrm{m\, s^{-1}}. \]

Notice that in SI units this value is not approximate but exact. It is related to the constant $\varepsilon_0$ by

\[ c = \frac{1}{\sqrt{\varepsilon_0\, \mu_0}}, \]

where $\mu_0$ is called the permeability of the vacuum and its value is, by definition,

\[ \mu_0 = 4\pi \times 10^{-7}\ \mathrm{m\, kg\, s^{-2}\, A^{-2}}. \]

When we say that the speed of propagation of the electromagnetic field is finite and equal to $c$, we mean that if one charge changes its position, the other charges do not feel this change immediately but only after the time

\[ \Delta t = \frac{r}{c}, \]

where $r$ is the distance from the charge which changed its position. From this fact it is immediately obvious that $r$ in the Coulomb law (6.1) is a problematic quantity, because we must take into account that the charge at actual distance $r$ cannot have an immediate effect on some other charge.
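The relation between $c$, $\varepsilon_0$ and $\mu_0$ is easy to check with the numbers quoted in this chapter. The short Python sketch below uses the rounded value of $\varepsilon_0$ given earlier and the defined value of $\mu_0$ used in the text, so only approximate agreement with the exact value of $c$ should be expected; the variable names are arbitrary.

```python
import math

eps0 = 8.854e-12                 # F m^-1, rounded value quoted in the text
mu0 = 4.0 * math.pi * 1e-7       # m kg s^-2 A^-2, value used in the text
c_exact = 299_792_458.0          # m s^-1

c_from_constants = 1.0 / math.sqrt(eps0 * mu0)
rel_error = abs(c_from_constants - c_exact) / c_exact
print(f"c from eps0 and mu0: {c_from_constants:.6e} m/s")
print(f"relative deviation:  {rel_error:.1e}")
```

The small relative deviation comes from the rounding of $\varepsilon_0$ to four significant digits.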

The next problem is that a moving charge produces not only an electric but also a magnetic field. A time-dependent electric field is a source of a magnetic field and vice versa. This is what we mean by the dynamics of the electromagnetic field: the field can propagate through empty spacetime (without charges) at the speed of light. Hence, the notion of force is not appropriate for the description of the dynamics of the electromagnetic interaction and the notion of field must be introduced.

But, as we claimed, we will not discuss the dynamics of the electromagnetic field, which is given by the celebrated Maxwell equations. We simply assume that the electromagnetic field is given and investigate the motion of charged particles in this field. Once again, the electromagnetic field is described by the electric field $\mathbf{E}$ and the magnetic field $\mathbf{B}$.

Consider a particle with charge $q$ which is moving in a region where only the electric field is present, i.e. $\mathbf{B} = 0$. Then the electric field acts on the particle by the force

\[ \mathbf{F} = q\, \mathbf{E}. \tag{6.2} \]


In other words, electric force is proportional to electric field E and the charge ofparticle q, which is an experimental fact. Once we discover this fact, relation (6.2) isa definition of electric vector E. Vector E is such vector that electric force exertedon a point charge q is given by (6.2).

Similarly, consider particle moving in the region where only magnetic field ispresent. Once again we find (experimentally) that the force acting on charge q isproportional to the charge. But, in addition we find that the direction of magneticforce is always orthogonal to the velocity v of the charge. It was discovered thatmagnetic force is given by

F = q v ×B (6.3)

where operation × is standard vector product1(or cross product). Again, relation(6.3) is a definition of magnetic vector B.

When both electric and magnetic fields are present, the force exerted on theparticle is given by the so-called Lorentz force

F = q (E + v ×B) . (6.4)

We emphasize that in this textbook relation (6.4) is taken as an experimental fact, in the same way as the Newton law of force, and we do not derive it from a more basic principle. It is fascinating that relation (6.4) can be derived from more basic principles, but this is completely beyond the scope of this textbook². In the theory of electromagnetism it is shown that instead of the electric field $\mathbf{E}$ and the magnetic field $\mathbf{B}$ we can introduce one scalar function $\phi$ and one vector function $\mathbf{A}$; it is a consequence of Maxwell’s equations. In this textbook we proceed differently and assume that this can be done. From this assumption we will be able to derive the correct equations of motion of a charged particle in an arbitrary electromagnetic field.

¹ Recall that the cross product $\mathbf{a} \times \mathbf{b}$ of vectors $\mathbf{a}$ and $\mathbf{b}$ is a vector orthogonal both to $\mathbf{a}$ and $\mathbf{b}$ and its magnitude is $|\mathbf{a} \times \mathbf{b}| = a\,b \sin\theta$, where $\theta$ is the angle between the two vectors.

² The particular form of the Lorentz force can be obtained from first principles by considering the Poincaré group of isometries of the Minkowski spacetime. The electromagnetic field appears as a massless representation of the Poincaré group with spin 1, which yields the set of Maxwell equations. The Lorentz force can then be derived using the principle of local gauge invariance.

6.1 Lagrangian and equations of motion

In accordance with the last paragraph of the previous section, we assume that the electromagnetic field can be described, in some sense, by one scalar field $\phi$ called the scalar potential and by one vector field $\mathbf{A}$ called the vector potential, so that the Lagrangian of a particle in the electromagnetic field is

$$L = \frac{1}{2} m v^2 - e\,\phi + e\,\mathbf{v} \cdot \mathbf{A} = \frac{1}{2} m\,\dot{x}_i \dot{x}_i - e\,\phi + e\,\dot{x}_i A_i, \tag{6.5}$$

where $e$ is a constant measuring the strength of the interaction between the particle and the electromagnetic field; this constant is called the charge of the particle. We assume that the Lagrangian (6.5) represents a correct description of a particle moving in a given electromagnetic field. This assumption is justified a posteriori by the agreement of the theory with experiment.

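Equations of motion can be derived from the usual Lagrange equations (2.18),

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} - \frac{\partial L}{\partial x_i} = 0.$$

The partial derivatives read

$$\frac{\partial L}{\partial \dot{x}_i} = m\,\dot{x}_i + e A_i, \qquad \frac{\partial L}{\partial x_i} = -e\,\partial_i \phi + e\,\dot{x}_j\,\partial_i A_j.$$

Note that the total derivative of $A_i$ with respect to time is

$$\frac{dA_i}{dt} = \frac{\partial A_i}{\partial x_j}\frac{dx_j}{dt} + \frac{\partial A_i}{\partial t} \equiv \dot{x}_j\,\partial_j A_i + \partial_t A_i$$

and hence

$$\frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i} = m\,\ddot{x}_i + e\,\dot{x}_j\,\partial_j A_i + e\,\partial_t A_i.$$

Collecting these auxiliary expressions we find that the Lagrange equations of motion are

$$m\,\ddot{x}_i = -e\,\partial_i \phi - e\,\partial_t A_i + e\,\dot{x}_j \left(\partial_i A_j - \partial_j A_i\right). \tag{6.6}$$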

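Now, since $\partial_i \dot{x}_j = 0$, the last term on the right hand side can be rewritten as

$$\dot{x}_j \left(\partial_i A_j - \partial_j A_i\right) = \partial_i (A_j \dot{x}_j) - \dot{x}_j\,\partial_j A_i = \left[\nabla(\mathbf{A} \cdot \mathbf{v}) - (\mathbf{v} \cdot \nabla)\mathbf{A}\right]_i.$$

It is straightforward to prove the identity

$$\mathbf{v} \times (\nabla \times \mathbf{A}) = \nabla(\mathbf{A} \cdot \mathbf{v}) - (\mathbf{v} \cdot \nabla)\mathbf{A}$$

(valid when $\mathbf{v}$ is not differentiated by $\nabla$), so that relation (6.6) can be written in the vector form as

$$m\,\frac{d\mathbf{v}}{dt} = -e\,\nabla\phi - e\,\frac{\partial \mathbf{A}}{\partial t} + e\,\mathbf{v} \times (\nabla \times \mathbf{A}). \tag{6.7}$$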

This is the equation of motion of a charged particle. However, we can see that the acceleration is not given directly by the potentials but by their derivatives (that is the reason why they are called potentials). Hence, we can introduce the vectors

$$\mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t}, \qquad \mathbf{B} = \nabla \times \mathbf{A}, \tag{6.8}$$

in which case we can write equation (6.7) in the form

$$m\,\frac{d\mathbf{v}}{dt} = e\,(\mathbf{E} + \mathbf{v} \times \mathbf{B}), \tag{6.9}$$

which is the law for the Lorentz force (6.4). For the sake of completeness we list the Cartesian components of equation (6.9):

$$\begin{aligned}
m\,\frac{dv_x}{dt} &= e E_x + e\,v_y B_z - e\,v_z B_y,\\
m\,\frac{dv_y}{dt} &= e E_y + e\,v_z B_x - e\,v_x B_z,\\
m\,\frac{dv_z}{dt} &= e E_z + e\,v_x B_y - e\,v_y B_x.
\end{aligned} \tag{6.10}$$
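The vector identity used to pass from (6.6) to (6.7) can be checked in index notation. The sketch below assumes the Levi-Civita symbol $\varepsilon_{ijk}$ and the contraction $\varepsilon_{kij}\varepsilon_{klm} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}$; these are standard, but they are assumed here rather than quoted from this chapter. The velocity $v_j$ is treated as independent of the coordinates.

```latex
\left[\mathbf{v} \times (\nabla \times \mathbf{A})\right]_i
  = \varepsilon_{ijk}\, v_j\, \varepsilon_{klm}\, \partial_l A_m
  = \left(\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}\right) v_j\, \partial_l A_m
  = v_j\, \partial_i A_j - v_j\, \partial_j A_i .
```

The right hand side is exactly the combination $\dot{x}_j(\partial_i A_j - \partial_j A_i)$ appearing in (6.6), with $v_j = \dot{x}_j$.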

6.2 Hamilton’s equations

Having derived the Lagrange equations of motion of a charged particle in an external electromagnetic field, we now turn to the Hamiltonian description of the same problem. Proceeding in the standard way we introduce the generalized momentum by

$$p_i = \frac{\partial L}{\partial \dot{x}_i} = m\,\dot{x}_i + e A_i. \tag{6.11}$$

In order to find the Hamiltonian we invert this relation to find

$$\dot{x}_i = \frac{p_i}{m} - \frac{e}{m} A_i. \tag{6.12}$$

Notice that although we are working in Cartesian coordinates, the generalized momentum

$$\mathbf{p} = m\,\mathbf{v} + e\,\mathbf{A}$$

is different from the linear momentum $m\mathbf{v}$. The Hamiltonian is then given by the Legendre transformation of the Lagrangian,

$$H = \dot{x}_i\,p_i - L, \tag{6.13}$$

where we must, however, express the velocities $\dot{x}_i$ in terms of the generalized momenta (6.11). After simple rearrangements we find

$$H = \frac{1}{2m}\left(\mathbf{p} - e\,\mathbf{A}\right)^2 + e\,\phi. \tag{6.14}$$
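For readers who want to see the "simple rearrangements" spelled out, one possible route is sketched below. It only uses (6.5), (6.12) and (6.13); the intermediate grouping of terms is a choice made here for illustration.

```latex
\begin{aligned}
H &= \dot{x}_i\, p_i - L
   = \frac{1}{m}(p_i - e A_i)\, p_i
     - \frac{1}{2m}(p_i - e A_i)(p_i - e A_i)
     + e\,\phi
     - \frac{e}{m}(p_i - e A_i)\, A_i \\
  &= \frac{1}{m}(p_i - e A_i)(p_i - e A_i)
     - \frac{1}{2m}(p_i - e A_i)(p_i - e A_i)
     + e\,\phi
   = \frac{1}{2m}\left(\mathbf{p} - e\,\mathbf{A}\right)^2 + e\,\phi .
\end{aligned}
```

In the first line the velocity has been replaced by $\dot{x}_i = (p_i - e A_i)/m$ everywhere; in the second line the first and last terms have been combined.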

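Let us now differentiate the Hamiltonian with respect to the coordinates and momenta,

$$\frac{\partial H}{\partial x_i} = -\frac{e}{m}\left(p_j - e A_j\right)\partial_i A_j + e\,\partial_i \phi, \qquad \frac{\partial H}{\partial p_i} = \frac{1}{m}\left(p_i - e A_i\right),$$

from which the Hamilton equations follow:

$$\begin{aligned}
\dot{x}_i &= \frac{1}{m}\left(p_i - e A_i\right),\\
\dot{p}_i &= \frac{e}{m}\left(p_j - e A_j\right)\partial_i A_j - e\,\partial_i \phi.
\end{aligned} \tag{6.15}$$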

6.3 Hamilton equations in Mathematica

Hamilton’s equations (6.15) can be easily implemented in Mathematica. Although the following code may look a bit complicated, it is in fact very straightforward. We implement the function

HamiltonEM[φ, A]

where φ and A are functions of the Cartesian coordinates x, y, z representing the scalar and vector potential. This function produces the list of six Hamilton’s equations for the particle in the electromagnetic field.


HamiltonEM[Φ_, A_ /; ListQ[A] && Length[A] == 3] :=
 Module[{xs, ps, eqs1, eqs2, dependencies, DA, DΦ},
  xs = {x, y, z};
  ps = {p1, p2, p3};
  (* rules turning the symbols into functions of time *)
  dependencies = {x -> x[t], y -> y[t], z -> z[t],
    p1 -> p1[t], p2 -> p2[t], p3 -> p3[t]};
  (* first set of equations (6.15): x_i' = (p_i - e A_i)/m *)
  eqs1 = Equal @@@ Transpose[{
      D[xs /. dependencies, t] - (((ps - e A)/m) /. dependencies),
      {0, 0, 0}}];
  (* DΦ[[i]] = ∂_i Φ and DA[[i, j]] = ∂_i A_j *)
  DΦ = D[Φ, #] & /@ xs;
  DA = D[A, #] & /@ xs;
  (* second set of equations (6.15): p_i' = (e/m)(p_j - e A_j) ∂_i A_j - e ∂_i Φ *)
  eqs2 = Equal @@@ Transpose[{
      D[ps /. dependencies, t] -
       ((e/m (DA.ps - e DA.A) - e DΦ) /. dependencies),
      {0, 0, 0}}];
  Flatten[{eqs1, eqs2}]]

6.4 Homogeneous fields

As a first example we consider the motion of a charged particle in a homogeneous magnetic field $\mathbf{B}$ without the presence of an electric field, i.e.

$$\mathbf{E} = 0, \qquad \mathbf{B} = \text{constant}.$$

Let us reformulate these conditions in terms of the potentials $\mathbf{A}$ and $\phi$.

Since the magnetic and electric fields are assumed to be constant (or even vanishing), the potentials can be chosen not to depend on time, so that

$$\mathbf{E} = -\nabla\phi, \qquad \mathbf{B} = \nabla \times \mathbf{A}.$$

Next, the electric field vanishes and so, by the last equations, the potential $\phi$ is a constant which can be set to zero without loss of generality. The remaining equation $\mathbf{B} = \nabla \times \mathbf{A}$ in the component form reads

$$B_x = \partial_y A_z - \partial_z A_y, \qquad B_y = \partial_z A_x - \partial_x A_z, \qquad B_z = \partial_x A_y - \partial_y A_x.$$

It is possible to find the solution for an arbitrary direction of the magnetic field, but for convenience we choose a coordinate system in which $\mathbf{B}$ has the direction of the z-axis,

$$\mathbf{B} = (0, 0, B).$$

This orientation of the Cartesian coordinate system can always be achieved by an appropriate rotation. With this choice we have

$$0 = \partial_y A_z - \partial_z A_y, \qquad 0 = \partial_z A_x - \partial_x A_z, \qquad B = \partial_x A_y - \partial_y A_x.$$

Since the field is homogeneous, we can look for a potential that does not depend on $z$, so that all partial derivatives $\partial_z$ vanish:

$$0 = \partial_y A_z, \qquad 0 = -\partial_x A_z, \qquad B = \partial_x A_y - \partial_y A_x.$$

The first two equations tell us that $A_z$ does not depend on $x$ and $y$ and hence is a constant. However, this constant does not enter the expression for $B$ in the third equation and thus we can set $A_z = 0$. The equation for $B$ can be solved, for example, by setting

$$A_x = 0, \qquad A_y = B\,x.$$

Summa summarum, the potentials $\phi$ and $\mathbf{A}$ representing a homogeneous magnetic field parallel to the z-axis can be written in the form

$$\phi = 0, \qquad \mathbf{A} = (0, B\,x, 0). \tag{6.16}$$

The reader can check that $\nabla \times \mathbf{A} = (0, 0, B)$. Of course, the choice of the potentials is not unique and we have chosen the simplest possibility.
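The check of (6.16) can also be left to Mathematica. The following lines are a small sketch assuming a fresh session of Mathematica 9 or later, where Curl accepts the list of coordinates as its second argument. The second potential, often called the symmetric gauge, is an example added here to illustrate the non-uniqueness and is not used elsewhere in the text.

```mathematica
(* vector potential (6.16); B is kept as a symbol *)
A1 = {0, B x, 0};
Curl[A1, {x, y, z}]   (* gives {0, 0, B} *)

(* a different potential describing the same field (assumed example) *)
A2 = {-B y/2, B x/2, 0};
Curl[A2, {x, y, z}]   (* gives {0, 0, B} as well *)
```

Both potentials give the same magnetic field and therefore the same motion of the particle, although the generalized momenta (6.11) differ.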

In the following code we generate the set of Hamilton’s equations by invoking the function HamiltonEM defined above and setting the initial conditions to

$$x_0 = 1, \quad y_0 = 0, \quad z_0 = 0, \quad p_{10} = 0, \quad p_{20} = 2, \quad p_{30} = 0.$$

Numerical values of the constants are set to

$$m = B = e = 1.$$

eqs = HamiltonEM[0, {0, B x, 0}];
vals = {m -> 1, B -> 1, e -> 1};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0, p1[0] == 0, p2[0] == 2, p3[0] == 0};
tmax = 20;
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];


Now we plot the solution. All plotting options can be ignored; they serve just to improve the quality of the plot.

g1 = ParametricPlot3D[{x[t], y[t], z[t]} /. sol, {t, 0, tmax},
   AxesOrigin -> {0, 0, 0}, Boxed -> False,
   PlotRange -> {{-1, 3.5}, {-1, 2}, {-1, 1}},
   Ticks -> {Range[-1, 3, 1], Range[-1, 2, 1], {-1, 1}},
   BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 15},
   ViewPoint -> {1, 1, 1}];
g2 = Graphics3D[{Text[Style["x", 15], {3.5, 0.2, 0}],
    Text[Style["y", 15], {-0.2, 2, 0}],
    Text[Style["z", 15], {-0.05, 0.1, 1}]}];
Show[g1, g2]

The code above produces the following figure:

[Figure: circular trajectory of the particle in the plane z = 0; axes x, y, z.]

We can see that the trajectory of the particle is a circle of radius 1 centered at the position (2, 0, 0). This is a familiar property of the magnetic field: the field does not perform work on a particle, it only changes the direction of its motion. Since the magnetic force is always orthogonal to the velocity and, in a homogeneous field, has constant magnitude, the resulting trajectory is a circle.
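The radius and the centre of the circle can be cross-checked by hand. The short calculation below is a sketch that uses the initial conditions and the values $m = B = e = 1$ given above, together with the usual balance between the centripetal and the magnetic force.

```latex
\mathbf{v}(0) = \frac{\mathbf{p}(0) - e\,\mathbf{A}(0)}{m} = (0,\ 2 - 1,\ 0) = (0, 1, 0),
\qquad
\mathbf{F}(0) = e\,\mathbf{v}(0) \times \mathbf{B} = (0, 1, 0) \times (0, 0, 1) = (1, 0, 0),
\qquad
\frac{m v^2}{r} = e\,v\,B \ \Longrightarrow\ r = \frac{m v}{e B} = 1 .
```

The force at the initial point $(1, 0, 0)$ points in the positive $x$ direction, so the centre lies at distance $r = 1$ in that direction, i.e. at $(2, 0, 0)$, in agreement with the plot.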

Now suppose that we add an initial velocity in the z-direction, e.g. we set

$$p_{30} = 0.1.$$

That means that the initial velocity is not orthogonal to the magnetic field $\mathbf{B}$ anymore, but the $v_z$-component of the velocity does not affect the magnetic force. Hence, in addition to the circular motion, the charge will move uniformly in the z-direction. The resulting trajectory of the particle is called a helix (in order to obtain this figure in Mathematica, do not forget to adjust the range on the z-axis).

[Figure: helical trajectory of the particle; axes x, y, z.]

Let us consider another example. Suppose that in addition to the magnetic field, there is a homogeneous electric field

$$\mathbf{E} = (0, 0, E)$$

in the direction of the z-axis. This field is time-independent again and thus the equation for the scalar potential reads

$$\mathbf{E} = -\nabla\phi$$

or, in components,

$$\frac{\partial \phi}{\partial x} = 0, \qquad \frac{\partial \phi}{\partial y} = 0, \qquad \frac{\partial \phi}{\partial z} = -E,$$

from which we find

$$\phi = -E\,z.$$

Corresponding code:

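eqs = HamiltonEM[-E0 z, {0, B x, 0}];
vals = {m -> 1, B -> 1, e -> 1, E0 -> 0.01};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0, p1[0] == 0, p2[0] == 2, p3[0] == 0};
tmax = 100;
(* solved in the same way as in the previous example *)
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];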



In this case, the motion of the particle consists of uniform circular motion in the plane z = constant and uniformly accelerated motion in the direction of the z-axis.

[Figure: trajectory of the particle in combined homogeneous electric and magnetic fields; axes x, y, z.]

6.5 Electromagnetic wave

In this section we consider a harmonic electromagnetic plane wave propagating in the direction of the x-axis. The electric field is assumed to have the form

$$\mathbf{E}(t, x) = (0, 0, E_0 \cos(t - x)),$$

i.e. it has only a z-component; $E_0$ is the amplitude of the electric field. The electric field is related to the potentials via

$$\mathbf{E} = -\nabla\phi - \frac{\partial \mathbf{A}}{\partial t}.$$

Let us set $\phi = 0$:

$$\mathbf{E} = -\frac{\partial \mathbf{A}}{\partial t}.$$

This equation can be integrated to find the vector potential in the form

$$\mathbf{A} = -\int \mathbf{E}\,dt = (0, 0, -E_0 \sin(t - x)).$$

The corresponding magnetic field is then

$$\mathbf{B} = \nabla \times \mathbf{A} = (0, -E_0 \cos(t - x), 0).$$

We can see that the magnetic field has the direction of the y-axis and hence it is orthogonal to the electric field, which is a general property of electromagnetic waves. The derivation performed above can be done with Mathematica using the following commands:

Needs["VectorAnalysis`"]
El[t_, x_] = {0, 0, E0 Cos[t - x]};
A = -Integrate[El[t, x], t];
(* the package works with its own coordinates Xx, Yy, Zz; *)
(* in Mathematica 9 and later, Curl[A, {x, y, z}] can be used without the package *)
B = Curl[A /. x -> Xx] /. Xx -> x

The output is

{0, -E0 Cos[t - x], 0}

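The new potential A can be used to derive Hamilton’s equations in the usual manner,

eqs = HamiltonEM[0, A];
vals = {m -> 1, E0 -> 1, e -> 1};
initConds = {x[0] == 1, y[0] == 0, z[0] == 0, p1[0] == 0, p2[0] == 0, p3[0] == 0};
tmax = 100;
(* solved in the same way as in section 6.4 *)
sol = NDSolve[Join[eqs, initConds] /. vals,
   {x[t], y[t], z[t], p1[t], p2[t], p3[t]}, {t, 0, tmax}];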




and plotted by

g1 = ParametricPlot3D[{x[t], y[t], z[t]} /. sol, {t, 0, 80},
   AxesOrigin -> {0, 0, 0}, Boxed -> False, PlotRange -> Full,
   BaseStyle -> {FontFamily -> "Times New Roman", FontSize -> 15},
   ViewPoint -> {1, 1, 1}];
g2 = Graphics3D[{Text[Style["x", 15], {-5, 0.5, 0}],
    Text[Style["y", 15], {0, 1.5, 0}],
    Text[Style["z", 15], {0, 0, -2.2}]}];
g = Show[g1, g2]

which yields the following result.

[Figure: trajectory of the particle in the field of the electromagnetic wave; axes x, y, z.]


6.6 Electrostatic wave

Relations (6.8) hold in general. It can be shown directly from Maxwell’s equations that electric and magnetic fields can always be written in the form (6.8). However, there are situations too complicated to be analyzed in this way. For example, the electromagnetic field in a plasma is a complicated consequence of the interaction of external electromagnetic fields and the fields produced by the particles comprising the plasma. In such situations we usually cannot find the electromagnetic fields as an exact solution to Maxwell’s equations and some simplifications are necessary. One can imagine an external homogeneous magnetic field penetrating the plasma and, in addition, an electrostatic wave propagating in the plasma. We have seen that an electric wave described by a time-dependent vector potential is always accompanied by a magnetic field given by the curl of this potential. Thus, any such electric wave must be accompanied by a magnetic wave, as we have seen in the previous section.

On the other hand, in a plasma it is possible for an electric wave to propagate through the medium without generating an accompanying magnetic wave, which is a consequence of the complicated interactions mentioned above. In this case we can proceed in the following way. We assume the presence of a homogeneous magnetic field $\mathbf{B}$ and assume the presence of an electrostatic wave. For example,

$$\mathbf{E} = (0, 0, E_0 \cos(t - x)), \qquad \mathbf{B} = (B, 0, 0). \tag{6.17}$$

These fields cannot be described by the same vector potential and hence the equations of motion cannot be derived from any potential. Nevertheless, with this prescription, we can write down the usual Newtonian equation of motion

$$m\,\frac{d\mathbf{v}}{dt} = e\,(\mathbf{E} + \mathbf{v} \times \mathbf{B})$$

and solve it numerically. The appropriate Mathematica code reads

El = {0, 0, Cos[t - x[t]]};
B = {1, 0, 0};
r[t_] = {x[t], y[t], z[t]};
(* Newton's equations with m = e = 1, followed by the initial conditions *)
eqs = Flatten[{Equal @@@ Transpose[{r''[t], El + Cross[r'[t], B]}],
    x[0] == 0, y[0] == 0, z[0] == 0,
    x'[0] == 0, y'[0] == 0, z'[0] == 0}];
sol = NDSolve[eqs, r[t], {t, 0, 100}]
ParametricPlot[{y[t], z[t]} /. sol, {t, 0, 100}]


Here we have set the initial velocity to zero. The trajectory is found to be a spiral.

[Figure: spiral trajectory in the y–z plane; both axes range from about −40 to 40.]
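The growth of the spiral can be understood analytically. The following is only a sketch for the special values used in the code ($m = e = B = E_0 = 1$, particle initially at rest at the origin); the complex combination $w = \dot{y} + i\dot{z}$ is an auxiliary quantity introduced here. Since $\ddot{x} = 0$ and $\dot{x}(0) = 0$, the particle stays at $x = 0$ and feels the field $E_z = \cos t$.

```latex
\ddot{y} = \dot{z}, \qquad \ddot{z} = \cos t - \dot{y}
\quad\Longrightarrow\quad
\dot{w} = -\,i\,w + i \cos t, \qquad w(0) = 0,
\qquad
w(t) = \frac{i\,t}{2}\, e^{-i t} + \frac{i}{2} \sin t .
```

The driving field oscillates with the same frequency as the circular motion in the magnetic field, so the speed grows roughly like $t/2$. At $t = 100$ this gives a speed of about 50 and hence a radius of comparable size, which is consistent with the range of the axes in the figure.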

7

Discrete dynamical systems and fractals

This chapter is a digression from the main line but, first, discrete dynamical systems provide a simple model of more complicated continuous dynamical systems which we will study later and, second, we will plot nice pictures called fractals and get some insight into the complicated nature of chaotic systems.

7.1 Complex sequences

We start our discussion with one of the most famous examples of fractals, the Mandelbrot set, which is very easy to plot using Mathematica. Let us choose an arbitrary point $z_0 \in \mathbb{C}$ in the complex plane and let us define a sequence of complex numbers by the recurrence relation

$$z_{n+1} = f(z_n) + z_0, \tag{7.1}$$

where $f(z) = z^2$. Thus, starting from a given $z_0$, the members of this sequence read

$$\begin{aligned}
z_1 &= f(z_0) + z_0 = z_0^2 + z_0,\\
z_2 &= f(z_1) + z_0 = z_0^4 + 2 z_0^3 + z_0^2 + z_0,\\
&\ \ \vdots
\end{aligned}$$

We can use Mathematica to generate members of this sequence using the following command

NestList[#^2 + z0 &, z0, 5] // Expand    (7.2)

which generates the members $z_0, z_1, \dots, z_5$:

$$\begin{aligned}
z_0 &= z_0,\\
z_1 &= z_0^2 + z_0,\\
z_2 &= z_0^4 + 2z_0^3 + z_0^2 + z_0,\\
z_3 &= z_0^8 + 4z_0^7 + 6z_0^6 + 6z_0^5 + 5z_0^4 + 2z_0^3 + z_0^2 + z_0,\\
z_4 &= z_0^{16} + 8z_0^{15} + 28z_0^{14} + 60z_0^{13} + 94z_0^{12} + 116z_0^{11} + 114z_0^{10} + 94z_0^{9} + 69z_0^{8}\\
    &\quad + 44z_0^{7} + 26z_0^{6} + 14z_0^{5} + 5z_0^{4} + 2z_0^{3} + z_0^{2} + z_0,\\
z_5 &= z_0^{32} + 16z_0^{31} + 120z_0^{30} + 568z_0^{29} + 1932z_0^{28} + 5096z_0^{27} + 10948z_0^{26}\\
    &\quad + 19788z_0^{25} + 30782z_0^{24} + 41944z_0^{23} + 50788z_0^{22} + 55308z_0^{21} + 54746z_0^{20}\\
    &\quad + 49700z_0^{19} + 41658z_0^{18} + 32398z_0^{17} + 23461z_0^{16} + 15864z_0^{15} + 10068z_0^{14}\\
    &\quad + 6036z_0^{13} + 3434z_0^{12} + 1860z_0^{11} + 958z_0^{10} + 470z_0^{9} + 221z_0^{8} + 100z_0^{7}\\
    &\quad + 42z_0^{6} + 14z_0^{5} + 5z_0^{4} + 2z_0^{3} + z_0^{2} + z_0.
\end{aligned}$$

Obviously, the complexity of each term $z_n$ grows very quickly with increasing $n$.

It is instructive to see the behaviour of the sequence graphically. Hence, we choose some particular $z_0$ and plot a few terms $z_n$ of the sequence starting from $z_0$. Let us define the following functions:

seq[z0_, n_] := NestList[#^2 + z0 &, z0, n] // Expand
list[z0_, n_] := {Re[#], Im[#]} & /@ seq[z0, n]

The first definition defines a function which generates the list of the $n + 1$ members $z_0, \dots, z_n$ of the sequence. For example, the command seq[I, 10] generates the list of eleven members of the sequence starting at the point $z_0 = i$:

$$i,\ -1+i,\ -i,\ -1+i,\ -i,\ -1+i,\ -i,\ -1+i,\ -i,\ -1+i,\ -i.$$

However, we cannot plot complex numbers directly and so we must convert each complex number $z = x + iy$ into a pair of coordinates $(x, y)$. This is accomplished by the function list. We define a pure function

{Re[#], Im[#]} &

which splits the argument into its real and imaginary parts. Then we apply this pure function to all elements of the list seq[z0, n]. Using the previous example, the command list[I, 10] produces

{{0, 1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}, {−1, 1}, {0, −1}}.

Notice that this sequence is periodic: except for the starting point $i$, the sequence keeps jumping from $-1 + i$ to $-i$ and back, indefinitely.
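The periodicity can be confirmed with exact arithmetic. The one-line sketch below uses only the built-in NestList with the starting point $z_0 = i$ inserted directly; no numerical rounding is involved, so the cycle is exact.

```mathematica
(* first five members of the sequence (7.1) for z0 = I, computed exactly *)
NestList[#^2 + I &, I, 4]   (* gives {I, -1 + I, -I, -1 + I, -I} *)
```

Indeed, $(-1 + i)^2 + i = -i$ and $(-i)^2 + i = -1 + i$, so the two values alternate forever.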

The list produced by list can now be plotted by ListLinePlot. Let us plot the list list[I, 10] by

ListLinePlot[list[I, 10],
  PlotRange -> Full, AxesOrigin -> {0, 0}, AspectRatio -> 1,
  PlotMarkers -> Automatic,
  PlotStyle -> Blue
]

The expected result is plotted in figure 7.1.

Fig. 7.1. Points of the sequence zn starting from point i.

Now, let us choose a different starting point close to the original point $i$, e.g. $z_0 = 0.8i$, and construct the first ten members of the sequence again. We can compare both trajectories using the following code:

ListLinePlot[{list[I, 10], list[0.8 I, 10]},
  PlotRange -> Full, AxesOrigin -> {0, 0}, AspectRatio -> 1,
  PlotMarkers -> Automatic,
  PlotStyle -> {Blue, Orange}
]

We can see in figure 7.2 that the behaviour of the sequence changed significantly. It is no longer periodic and, in addition, it exhibits unpredictable behaviour. We could guess that if we choose the starting point $z_0 = 0.9i$ we obtain a sequence ”somewhere between” the sequences starting from $i$ and $0.8i$. The reader is invited to plot the result for $z_0 = 0.9i$; here we just present the list of points produced by list[0.9 I, 10]:

{{0., 0.9}, {−0.81, 0.9}, {−0.1539, −0.558}, {−0.287679, 1.07175}, {−1.06589, 0.283359}, {1.05584, 0.295938}, {1.02721, 1.52493}, {−1.27023, 4.03285}, {−14.6504, −9.34529}, {127.3, 274.725}, {−59268.4, 69945.6}}

Obviously, this sequence is not bounded and it escapes to infinity very quickly.

What conclusion can be drawn from the examples above? What we actually saw is the most characteristic property of chaotic systems: sensitivity to initial conditions. A particular choice of the starting point $z_0$ corresponds to imposing the initial condition. We have seen three sequences starting from points close to each other, $i$, $0.8i$ and $0.9i$. In non-chaotic systems, if we change the initial position slightly, the solution will also change only slightly. In chaotic systems, the behaviour can differ drastically even for very similar initial conditions. In our examples, the first sequence was periodic, the second was unpredictable and the third one was diverging and tending to infinity.

7.2 Mandelbrot set

In the case of Hamiltonian systems we were able to visualize the possible behaviour of the system by the method of phase portraits. Phase trajectories of the harmonic oscillator were circles, phase trajectories of the pendulum were more complicated and we revealed the existence of two types of periodic motion (open and closed curves) separated by the separatrix. For chaotic systems it is usually impossible to plot a phase portrait because the trajectories are very complicated and irregular. For illustration, figure 7.3 certainly is not very useful.

However, in order to visualize the extreme sensitivity to initial conditions, it is not important to see all kinds of trajectories. The qualitative behaviour of the trajectories is more interesting. Each sequence can either stay in a bounded region or escape to infinity. We cannot inspect the asymptotic behaviour of a particular sequence, but we can choose a fixed radius $R$ and investigate whether the sequence stays inside the region bounded by the circle of radius $R$ or whether it escapes the circle after some number of steps. In such a way we can assign a number to each point of the plane. Let us describe the algorithm more precisely.

Fig. 7.2. Comparison of two sequences with close starting points $i$ and $0.8i$.

Fig. 7.3. Ten sequences starting from initial points of the form $z_0 = x + 0.8i$, $x \in (-0.5, 0.5)$.

The parameters of the algorithm are the radius $R > 0$ and the maximum number of steps $n_{\max}$. We choose a point $z_0 = x_0 + i y_0 \in \mathbb{C}$ and construct the sequence $z_n$ starting from this point. If $|z_n| > R$, the algorithm stops and returns the value $n$. Otherwise we compute $z_{n+1}$ and repeat the procedure. If $n > n_{\max}$, the algorithm stops and returns the value $n_{\max}$. In this way we assign an integer to each point $z_0$ of the complex plane or, equivalently, to each point $(x_0, y_0)$ of the usual Euclidean plane.

Let us see how this algorithm can be implemented in Mathematica. In usual procedural languages we would use some kind of loop like for or while. In Mathematica, these loops can still be implemented but functional methods are more satisfactory; in this case we use the function NestWhileList. The function Mandelbrot implementing the algorithm described above follows:

Mandelbrot[x_, y_,
  OptionsPattern[{MaxRadius -> 100, MaxSteps -> 50}]] :=
 Module[{c, R, n},
  c = x + I y;
  R = OptionValue[MaxRadius];
  n = OptionValue[MaxSteps];
  Length[NestWhileList[N[#^2 + c] &, c, (Abs[#] < R) &, 1, n]]
 ]

The head of the function tells Mathematica that the function has two obligatory parameters x and y; these are the coordinates of the initial point $(x_0, y_0)$ in the plane. Moreover, the function accepts optional arguments specifying its behaviour. In our case, the optional parameters are the maximum radius R with default value 100 and the maximum number of steps with default value 50. If we call the function without specifying the optional parameters, e.g.

Mandelbrot[1, 3],

the default values are used. If we want to change these values, we call the function in the form, e.g.

Mandelbrot[1, 3, MaxRadius -> 20, MaxSteps -> 1000]

The reader should be familiar with this notation as it is used in many predefined functions in Mathematica.

Then we define three local variables c, R and n. The variable c represents the initial point because we set

$$c = x + i\,y.$$

The variables R and n are set to the values of the parameters MaxRadius and MaxSteps and we introduce them only to increase the readability of the code. The core of the function Mandelbrot is in the last command. The function

NestWhileList[N[#^2 + c] &, c, (Abs[#] < R) &, 1, n]

applies the pure function #^2 + c &, which is our function $f(z) + z_0 = z^2 + z_0$, to the initial value c repeatedly. The call of the function N is included in order to obtain just the numerical value of the result instead of the exact value, which would take a long time and occupy a lot of memory (the reader is invited to remove the call of this function to see the difference). The function NestWhileList stops when the condition, specified again as a pure function, is violated. In our case, the condition $|z_n| < R$ is typed as the pure function (Abs[#] < R) &. The next parameter of NestWhileList specifies how many of the most recent results should be supplied to the test. Here we want to test only the last result and hence set this parameter to 1. The last parameter n specifies the maximum number of calls.
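The behaviour of NestWhileList is easiest to see on a small real-valued example. The sketch below is an illustration chosen here (the map $x \mapsto x^2 + 1$ and the bound 100 are arbitrary); it shows that the returned list includes the first element which violates the test, and that the last argument limits the number of applications.

```mathematica
(* iterate x -> x^2 + 1 from 1 while the last result is below 100 *)
NestWhileList[#^2 + 1 &, 1, # < 100 &, 1, 10]   (* gives {1, 2, 5, 26, 677} *)

(* the same iteration limited to at most two applications *)
NestWhileList[#^2 + 1 &, 1, # < 100 &, 1, 2]    (* gives {1, 2, 5} *)
```

In the first call the iteration stops because 677 fails the test; in the second call it stops because the maximum number of applications has been reached.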

The result of NestWhileList is the sequence of numbers $z_n$ which stops if $|z_n| > R$ or if $n > n_{\max}$. The point is that this command returns the list of all members of the given sequence, so taking its length we find how long this sequence is. This number is then the result of the function Mandelbrot.
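For comparison, the same counting can be written with an explicit While loop, as one would do in a procedural language. This is only a sketch: the name escapeSteps and its positional optional arguments are hypothetical and are not part of the code used in this chapter.

```mathematica
(* hypothetical procedural counterpart of Mandelbrot; R and nmax are optional *)
escapeSteps[z0_, R_ : 100, nmax_ : 50] :=
 Module[{z = N[z0], n = 0},
  (* iterate while the current member is inside the circle and steps remain *)
  While[Abs[z] < R && n < nmax,
   z = z^2 + z0;
   n++];
  (* number of members generated, comparable to the Length[...] above *)
  n + 1]

escapeSteps[0]          (* gives 51 *)
escapeSteps[-1.5 + I]   (* gives 5 *)
```

The functional version based on NestWhileList is shorter, but both count the members $z_0, z_1, \dots$ generated before the sequence leaves the circle or the maximum number of steps is reached.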

Finally we can visualize the function Mandelbrot using

DensityPlot[
  Mandelbrot[x, y], {x, -1.5, 0.5}, {y, -1.3, 1.3},
  PlotPoints -> 100]

The function DensityPlot serves to visualize functions of two variables not by plotting a three-dimensional graph but by assigning a color to each point $(x, y)$ depending on the value of the function to be plotted. The result is shown in figure 7.4 and is known as the Mandelbrot set.

The meaning of regions with different colors can be understood easily. For example, if we choose zero to be the initial point, $z_0 = 0$, then all members of the sequence must be zero, for we have $z_n = z_{n-1}^2 + 0 = 0$. In other words, the sequence stays at the point zero for all $n$ and therefore the function Mandelbrot will stop only after the maximum number of steps has been reached. Indeed, typing

Mandelbrot[0, 0]

yields the result 51. That means that after 50 steps the sequence was still in the circle of radius $R$. We can see that the neighbourhood of zero is plotted in white color in figure 7.4. Hence, white regions correspond to high values of the function Mandelbrot. Blue color, on the other hand, represents regions where the values of the function are small and so the sequence escapes the circle of radius $R$ very soon. For example, at the point $(-1.5, 1)$ the value of

Mandelbrot[-1.5, 1]

is equal to 5, which means that the fifth member of the sequence, $z_4$, is the first one lying outside the circle.

Fig. 7.4. The Mandelbrot set.

It is natural to expect that small numbers close to zero yield bounded sequences while numbers distant from zero yield rapidly diverging sequences. An unexpected feature of this construction is the existence of the boundary between the blue and white regions, which exhibits a highly non-trivial structure. This boundary is obviously irregular, but when we zoom into the boundary, we find a kind of self-similarity: at each scale we observe a similar shape of the boundary. In figure 7.5 we plot the boundary of the Mandelbrot set for different zooms.

This complicated structure of the Mandelbrot set corresponds to the behaviour observed in the previous section. Two different but close points give rise to sequences with very different behaviour: one sequence remains bounded while the other one escapes to infinity. Thus, in this sense, the Mandelbrot set visualizes the extreme sensitivity of the sequence $z_n$ to the choice of the initial point.

Fig. 7.5. Mandelbrot set on different scales.

8

Dynamical systems

We explained in previous chapters that both Lagrange’s equations and Hamilton’s equations are equivalent to the original Newton law of force $\mathbf{F} = m\mathbf{a}$ if the force $\mathbf{F}$ can be written as a gradient of the potential, i.e. if the force $\mathbf{F}$ is conservative. On the other hand, we have seen that the electromagnetic field is not conservative but the motion of a charged particle can still be described by the Lagrangian and consequently by the Hamiltonian.

Hamilton’s equations

$$\dot{q}_a = \frac{\partial H}{\partial p_a}, \qquad \dot{p}_a = -\frac{\partial H}{\partial q_a}$$

are first-order ordinary differential equations and we have seen that such a system of equations can be given a geometrical interpretation in the phase space. In fact, using the conservation of energy we were able to plot the phase trajectories even without actually solving the equations of motion. In this chapter we study a more general system of equations, in which the right hand side is not derived from a Hamiltonian but is a general function. We will see that any second-order equation of motion can be written as a system of first-order equations, but we will not be restricted to conservative systems, while the geometrical interpretation of the phase trajectories will be preserved. Dynamical systems provide an appropriate framework for studying all kinds of physical systems, including those with friction or time-dependent external forces.

8.1 Definition

A dynamical system is a set of $n$ first-order ordinary differential equations of the form

$$\begin{aligned}
\dot{x}_1(t) &= f_1(x_1(t), x_2(t), \dots, x_n(t), t),\\
\dot{x}_2(t) &= f_2(x_1(t), x_2(t), \dots, x_n(t), t),\\
&\ \ \vdots\\
\dot{x}_n(t) &= f_n(x_1(t), x_2(t), \dots, x_n(t), t),
\end{aligned} \tag{8.1}$$

where $x_a = x_a(t)$ are unknown functions of time, $a = 1, 2, \dots, n$, and $f_a$ are arbitrary differentiable functions of the variables $x_a$ and possibly of time $t$. If the functions $f_a$ do not depend on time explicitly, the dynamical system is called autonomous, otherwise it is called non-autonomous. Using the index notation, the dynamical system (8.1) can be written briefly in the form

$$\dot{x}_a = f_a(x, t), \tag{8.2}$$

where $x$ stands for the $n$-tuple of variables $x_a$. An autonomous system is then

$$\dot{x}_a = f_a(x).$$

In this notation we suppress the dependence of $x_a$ on time because this dependence is assumed implicitly.

Motivated by Hamilton’s formalism, we intend to interpret the solution $x_a = x_a(t)$ as a motion in the phase space. The phase space is an abstract space¹

$$M = \mathbb{R}^n[x_1, x_2, \dots, x_n]$$

with coordinates $x_a$. An arbitrary point $x \in M$ represents a state of the physical system described by equations (8.1). The solution of a dynamical system is not unique unless we specify the initial conditions, i.e. the values of the coordinates $x_a$ at some given initial time $t_0$,

$$x_{10} = x_1(t_0), \quad \dots, \quad x_{n0} = x_n(t_0).$$

Usually we set $t_0 = 0$. The $n$-tuple of initial coordinates $x_{a0}$ will be denoted simply by $x_0 \in M$.

Suppose we choose a point $x_0 \in M$ at time $t_0 = 0$ as in figure 8.1. A mathematical theorem guarantees that there exists a unique solution $x = x(t)$ satisfying (8.1) such that $x(0) = x_0$. The solution $x = x(t)$ is also called the phase trajectory. Equations (8.1) essentially state that the vector $f(x(t))$ evaluated at an arbitrary point of the trajectory is tangent to the trajectory, see figure 8.1. Hence, we can interpret the vector field $f(x)$ as a velocity. Although it can be very difficult or even impossible to solve the equations of motion (8.1), the velocity gives us a good idea about the behaviour of the system.

¹ Our definition is a simplification. In differential geometry, the phase space is defined as the cotangent bundle of the configuration manifold endowed with the canonical symplectic form $\omega = dq_a \wedge dp_a$.

Fig. 8.1. Two-dimensional dynamical system of the form $\dot{x}_a = f_a$. The initial position is at $x_0 = x(0)$. The “velocity” vector at $x_0$ is $f(x_0) = \dot{x}(0)$ and determines the trajectory of the system in the infinitesimal neighbourhood of the initial point.

8.2 Example

Let us see an illustrative example. We are already familiar with the equation of the harmonic oscillator

$$\ddot{\theta} + \theta = 0.$$

This is a second-order equation but we can bring it into the first-order form by setting

$$x_1 = \theta, \qquad x_2 = \dot{\theta}.$$

Then we have

$$\dot{x}_1 = \dot{\theta} = x_2$$

and

$$\dot{x}_2 = \ddot{\theta} = -\theta = -x_1.$$

Hence, instead of the single equation $\ddot{\theta} + \theta = 0$ of second order we now have two equations of first order,

$$\begin{aligned}
\dot{x}_1 &= x_2,\\
\dot{x}_2 &= -x_1.
\end{aligned} \tag{8.3}$$

Clearly, this is a dynamical system (8.1) if we set $f_1 = x_2$ and $f_2 = -x_1$. Thus, the velocity field is

$$f(x_1, x_2) = (x_2, -x_1). \tag{8.4}$$

This vector field can be visualised in Mathematica by the function VectorPlot:

f[x_, y_] = {y, -x};
VectorPlot[f[x, y], {x, -2, 2}, {y, -2, 2}]

The resulting figure is

[Figure: vector plot of the field f(x1, x2) = (x2, −x1) on the square −2 ≤ x1, x2 ≤ 2.]

This picture agrees with our previous analysis when we used the conservation of energy to show that the phase trajectories of the harmonic oscillator are circles (or ellipses when using SI units). Another possibility is to use the function StreamPlot with the same arguments, which yields

[Figure: stream plot of the same vector field on the square −2 ≤ x1, x2 ≤ 2.]
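That the phase trajectories of (8.3) are circles can also be read off directly from the dynamical system, without solving it. The short computation below is a sketch of this argument; it only uses equations (8.3).

```latex
\frac{d}{dt}\left(x_1^2 + x_2^2\right)
  = 2\,x_1\,\dot{x}_1 + 2\,x_2\,\dot{x}_2
  = 2\,x_1\,x_2 - 2\,x_2\,x_1
  = 0 .
```

Hence $x_1^2 + x_2^2$ is constant along every solution, so each phase trajectory stays on a circle centred at the origin, and the velocity field (8.4) is tangent to these circles.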

8.3 Implementation in Mathematica

In this section we show how to implement a dynamical system in Mathematica in a convenient way.

DynSys[f_, IC_, tmax_] := Module[{vars, lhs, rhs, eqs, inConds},
  vars = Table[x[a][t], {a, 1, Length[IC]}];
  lhs = D[vars, t];
  rhs = f[Sequence @@ vars];
  eqs = Equal @@@ Transpose[{lhs, rhs}];
  inConds = Equal @@@ Transpose[{vars /. t -> 0, IC}];
  NDSolve[Join[eqs, inConds], vars, {t, 0, tmax}]]

This code deserves a brief explanation. The arguments of the function DynSys are

• pure function f – a vector function representing the right hand side of dynamical system (8.1);

• initial conditions IC – a list of the initial values of the variables $x_a$ at time $t = 0$;

• tmax – the upper bound of the interval $t \in (0, t_{\max})$.

Hence, the function DynSys can be called, e.g., with the arguments

In[4]:= sol = DynSys[{#2, -#1} &, {1, 1}, 10]

In this example, the pure function f is

{#2, -#1} &

which is equivalent to

\[ f(x_1, x_2) = (x_2, -x_1). \]

Clearly, this corresponds to the harmonic oscillator (8.4). The initial conditions IC are set to

\[ x_1(0) = 1, \qquad x_2(0) = 1, \]

and we want to find the solution in the time interval $(0, 10)$.

Now suppose that we have called the function DynSys with the arguments above, and let us explain how this function works. Thus, we assume that the arguments are

f = {#2, -#1} &

IC = {1, 1}

tmax = 10.

The first command

vars = Table[x[a][t], {a, 1, Length[IC]}];

creates a list of variables $x_a(t)$ in the form

vars = {x[1][t], x[2][t]}.

The left hand side of the equations $\dot{x}_a(t) = f_a(x(t))$ is generated simply by calling

lhs = D[vars, t]

which yields

lhs = {x[1]'[t], x[2]'[t]}.


Now we form the right hand side of the equations. Recall that vars is the list of variables. We want to evaluate the functions $f_a$ at the point $x_a$, i.e. we need the expression

\[ f_a(x_1, \dots, x_n). \]

However, we cannot simply write f[vars] because this would mean

f[{x[1][t], x[2][t]}]

while what we need is

f[x[1][t], x[2][t]].

Hence, we must turn the list vars into a sequence of arguments by replacing its head. The command

rhs = f[Sequence @@ vars]

leads to the correct application of the function f to the arguments $x_a$:

f[Sequence @@ vars] = f[x[1][t], x[2][t]]

= {x[2][t], -x[1][t]}.

Having defined the left hand side and the right hand side of the dynamical system separately, we join them in the usual way,

eqs = Equal @@@ Transpose[{lhs, rhs}]

Next we define the initial conditions with the values specified in the argument IC = {1, 1}. We need to produce the list

{x[1][0] == 1, x[2][0] == 1}

The left hand side of the initial conditions consists of the elements of vars evaluated at time 0,

vars /. t -> 0,

and the right hand side consists of the elements of IC. We can join them together by

inConds = Equal @@@ Transpose[{vars /. t -> 0, IC}];

Finally, we solve the list of equations of motion and initial conditions by NDSolve:

NDSolve[Join[eqs, inConds], vars, {t, 0, tmax}].
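The pairing of left and right hand sides can also be expressed with Thread, and the conversion of the list into a sequence of arguments with f @@ vars. The following variant is only a sketch of an equivalent formulation, not the code used in the rest of this chapter; the name DynSysThread is hypothetical.

```mathematica
(* same idea as DynSys, written with Thread and Apply (@@) *)
DynSysThread[f_, ic_List, tmax_] := Module[{vars, eqs, inConds},
  vars = Table[x[a][t], {a, 1, Length[ic]}];   (* {x[1][t], ..., x[n][t]} *)
  eqs = Thread[D[vars, t] == f @@ vars];       (* x[a]'[t] == f_a(x) *)
  inConds = Thread[(vars /. t -> 0) == ic];    (* x[a][0] == ic[[a]] *)
  NDSolve[Join[eqs, inConds], vars, {t, 0, tmax}]]

(* usage: the harmonic oscillator (8.3) *)
DynSysThread[{#2, -#1} &, {1, 1}, 10]
```

Thread turns an equation between two lists into a list of equations, which is what Equal @@@ Transpose[...] achieves in DynSys.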

Now the reader should be familiar with the functionality of the function DynSys. We use this function in the following examples.

To conclude this section, we show how to use the function DynSys to solve the motion of the harmonic oscillator.


In[7]:= sol = DynSys[{#2, -#1} &, {1, 1}, 10]
ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, 10}]

Out[7]= {{x[1][t] -> InterpolatingFunction[{{0., 10.}}, <>][t], x[2][t] -> InterpolatingFunction[{{0., 10.}}, <>][t]}}

Out[8]= [Figure: phase trajectory of the harmonic oscillator in the $(x_1, x_2)$ plane.]
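For the initial conditions $x_1(0) = x_2(0) = 1$, system (8.3) has the exact solution $x_1 = \cos t + \sin t$, $x_2 = \cos t - \sin t$, so the numerical result can be compared with it. A minimal sketch, assuming that sol holds the output of the call to DynSys with these initial conditions; the name err is ours.

```mathematica
(* deviation of the numerical solution from the exact one *)
err[t_] = {x[1][t], x[2][t]} - {Cos[t] + Sin[t], Cos[t] - Sin[t]} /. First[sol];

(* both components are expected to stay close to zero on the whole interval *)
Plot[Evaluate[err[t]], {t, 0, 10}, PlotRange -> All]
```

The size of the deviation depends on the default accuracy settings of NDSolve, so no particular value is claimed here.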

8.4 Chaotic pendulum

In the previous chapters we introduced the mathematical pendulum as a simple example of a physical system which has only one degree of freedom (the angle of deflection $\theta$) and is described by the Lagrange equation

\[ \ddot{\theta} + \sin\theta = 0. \]

This equation is non-linear because of the presence of the sine. We have seen that it cannot be solved in terms of elementary functions, but we were able to find the numerical solution. Moreover, using the Hamiltonian formalism we were able to plot the phase trajectories without actually solving the equation of motion.

We can generalize the model of the mathematical pendulum in several ways. First, any realistic system is dissipative, i.e. there are resisting forces acting against the motion of the pendulum. As an approximation, the resisting force is proportional to the velocity and has the opposite direction. In the case of the pendulum, the velocity is proportional to $\dot{\theta}$ and hence the equation of the pendulum with a resisting force has the form

\[ \ddot{\theta} + b\,\dot{\theta} + \sin\theta = 0, \]

where $b$ is a constant characterizing the strength of the resisting force. For example, it can be related to the viscosity of the medium in which the pendulum moves. A pendulum with a resisting force is called a damped pendulum.

Next we can assume that, in addition to the restoring gravitational force, there is an external force acting on the pendulum. Such a force is called a driving force. In the presence of a driving force, even if the initial velocity of the pendulum is zero (and the pendulum is at its equilibrium position), the driving force will make the pendulum move. The resulting motion of the pendulum will be a “mixture” of two motions: periodic motion due to self-oscillations of the pendulum, and motion due to the driving force. The driven pendulum with friction is described by the equation

\[ \ddot{\theta} + b\,\dot{\theta} + \sin\theta = F_0 \sin \Omega t, \tag{8.5} \]

where we assume that the driving force is harmonic with angular frequency $\Omega$ and amplitude $F_0$. In the subsequent analysis we will show that this kind of pendulum exhibits chaotic behaviour, and hence we also call it the chaotic pendulum.

We start by rewriting equation (8.5) in the form of a dynamical system. This is straightforward since we can define

\[ x_1 = \theta, \qquad x_2 \equiv p = \dot{\theta}, \qquad x_3 \equiv \varphi = \Omega t. \]

We will freely pass from the notation $(\theta, p, \varphi)$ to the equivalent notation $(x_1, x_2, x_3)$ according to the context. By definition, the variable $\varphi$ satisfies the equation

\[ \dot{\varphi} = \Omega, \]

while the variable $p$ (which is clearly related to the momentum of the pendulum) was defined by

\[ \dot{\theta} = p, \]

which can consequently be regarded as an equation for $\theta$. The only true dynamical equation is the equation for $p$, which follows from (8.5):

\[ \dot{p} = F_0 \sin\varphi - b\,p - \sin\theta. \]

Thus, the variables $(x_1, x_2, x_3) = (\theta, p, \varphi)$ are determined by the dynamical system

\[ \begin{aligned} \dot{\theta} &= p, \\ \dot{p} &= F_0 \sin\varphi - b\,p - \sin\theta, \\ \dot{\varphi} &= \Omega, \end{aligned} \tag{8.6} \]

supplemented with initial conditions, i.e. the values of $x_a$ at time $t = 0$.

Dynamical system (8.6) can be solved in Mathematica using the function DynSys defined in the previous section. We choose the initial conditions

\[ \theta(0) = \frac{\pi}{4}, \qquad p(0) = 0, \qquad \varphi(0) = 0 \]

and investigate how the values of $b$, $\Omega$ and $F_0$ affect the behaviour of the pendulum. First we set

\[ b = 0, \qquad F_0 = 0, \qquad \Omega = 0. \]

In this way we reduce the driven pendulum to the ordinary mathematical pendulum without friction and driving force (the damping coefficient is $b = 0$ and the amplitude of the force is $F_0 = 0$).

In[37]:= vals = {b -> 0, f0 -> 0, Ω -> 0};

tmax = 10;

sol = DynSys[{#2, f0 Sin[#3] - b #2 - Sin[#1], Ω} & /. vals, {π/4, 0, 0}, tmax]

Out[39]= {{x[1][t] -> InterpolatingFunction[{{0., 10.}}, <>][t], x[2][t] -> InterpolatingFunction[{{0., 10.}}, <>][t], x[3][t] -> InterpolatingFunction[{{0., 10.}}, <>][t]}}

Here we have chosen tmax = 10, but the reader should adjust this parameter in order to reproduce all figures below. Now we can plot the phase trajectory in the usual way.


In[28]:= ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, tmax}, PlotRange -> Full]

Out[28]= [Figure: phase trajectory of the pendulum without friction and driving force in the $(\theta, p)$ plane.]
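The right hand side of (8.6) has to be supplied again whenever the parameters change. One possible convenience, sketched here under the assumption that DynSys is defined as in section 8.3, is a small helper that returns the pure function for given parameter values. The name pendulumRHS is hypothetical and does not appear in the original text.

```mathematica
(* right hand side of (8.6) as a pure function of (θ, p, φ) for given b, F0, Ω *)
pendulumRHS[b_, f0_, Ω_] := {#2, f0 Sin[#3] - b #2 - Sin[#1], Ω} &

(* pendulum without friction and driving force, released from θ = π/4 *)
sol = DynSys[pendulumRHS[0, 0, 0], {Pi/4, 0, 0}, 10];
ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, 10}, PlotRange -> Full]
```

Other parameter values are then obtained by changing only the arguments, e.g. pendulumRHS[0.1, 0, 0] for a damped pendulum.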

Let us now add the friction and set

\[ b = 0.1, \]

see figure 8.2 for the result. We can see that the phase trajectory is a spiral which, in the limit $t_{\max} \to \infty$, ends at the origin of the phase plane. This means that the oscillations are damped until the pendulum stops. A slightly more “fancy” picture can be obtained by

In[111]:= g1 = ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, tmax}, PlotRange -> Full,

   AxesLabel -> {"θ", "p"}, Ticks -> None,

   BaseStyle -> {FontName -> "Times New Roman", FontSize -> 15}];

g2 = Plot[x[1][t] /. sol, {t, 0, tmax},

   AxesLabel -> {"t", "θ"}, Ticks -> None,

   BaseStyle -> {FontName -> "Times New Roman", FontSize -> 15}];

GraphicsRow[{g1, g2}]

which results in figure 8.3.


Fig. 8.2. Parameters $b = 0.1$, $F_0 = 0$. Non-zero friction leads to damped oscillations of the pendulum.

Fig. 8.3. Phase trajectory of the damped pendulum in the $(\theta, p)$ plane together with the time dependence of the deflection $\theta = \theta(t)$.


Let us see how the driving force affects the motion. For this purpose we set

\[ b = 0, \qquad F_0 = 1, \qquad \Omega = 2, \]

and choose the initial conditions

\[ \theta(0) = p(0) = \varphi(0) = 0. \]

In other words, the pendulum is initially at its equilibrium position and hence, without the driving force, it would stay at rest. However, the presence of the driving force leads to the solution plotted in figure 8.4.

Fig. 8.4. Motion of the pendulum without friction under the external driving force with amplitude $F_0 = 1$ and angular frequency $\Omega = 2$: phase trajectory in the $(\theta, p)$ plane and deflection $\theta(t)$. The initial state of the pendulum is $\theta(0) = p(0) = 0$.

In the following example we choose

\[ b = 1, \qquad F_0 = 1, \qquad \Omega = 1, \]

see figure 8.5. An interesting feature of this solution is the presence of a short transient stage during which the phase trajectory follows an outgoing spiral, after which it settles on a circular periodic orbit. Such an orbit is called a limit cycle.

Fig. 8.5. Motion of the pendulum with friction ($b = 1$) under the external driving force with amplitude $F_0 = 1$ and angular frequency $\Omega = 1$: phase trajectory in the $(\theta, p)$ plane and deflection $\theta(t)$. The initial state of the pendulum is $\theta(0) = p(0) = 0$.

The reader is invited to experiment with the values of the parameters $b$, $F_0$ and $\Omega$ and with the initial values $\theta(0)$, $p(0)$ and $\varphi(0)$. We can see that the resulting motion of the pendulum is a consequence of a complicated and delicate interplay between three effects:

• self-oscillations of the pendulum;
• resisting force (friction);
• external driving force.

While for some combinations of parameters the motion is perfectly understandable (as in the absence of both friction and driving force, or in the absence of the driving force but in the presence of friction), for other values the motion can become unpredictable, i.e. chaotic.
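One practical way to probe unpredictability is to integrate the system twice with almost identical initial conditions and watch how the two solutions separate. The sketch below assumes DynSys from section 8.3. The parameter values $b = 0.5$, $F_0 = 1.15$, $\Omega = 2/3$ are an assumption of ours (values of this kind are often quoted for the chaotic regime of the driven pendulum); they are not taken from the text and the reader should experiment with them.

```mathematica
(* parameter values are an assumption, not taken from the text *)
rhs = {#2, f0 Sin[#3] - b #2 - Sin[#1], Ω} & /. {b -> 0.5, f0 -> 1.15, Ω -> 2/3};

(* two runs whose initial deflections differ by 10^-6 *)
solA = DynSys[rhs, {0.2, 0, 0}, 100];
solB = DynSys[rhs, {0.2 + 10^-6, 0, 0}, 100];

(* separation of the two deflections on a logarithmic scale *)
LogPlot[Evaluate[Abs[(x[1][t] /. First[solA]) - (x[1][t] /. First[solB])]],
  {t, 0, 100}, PlotRange -> All, AxesLabel -> {"t", "|Δθ|"}]
```

If the chosen parameters lie in a chaotic regime, the separation is expected to grow by many orders of magnitude; for the regular cases discussed above (e.g. $F_0 = 0$) it should stay small.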

8.5 Critical points of the pendulum

Let us return to the equation of the pure mathematical pendulum

\[ \ddot{\theta} + \sin\theta = 0, \]

or, in the form of a dynamical system,

\[ \dot{\theta} = p, \qquad \dot{p} = -\sin\theta. \tag{8.7} \]

What are the possible equilibrium positions of the pendulum? Clearly, if we set

\[ \theta(0) = 0, \qquad p(0) = 0, \]

the pendulum will not move. These conditions correspond to the situation when the pendulum is hanging freely at the equilibrium position with zero initial velocity. In this case the derivatives of $\theta$ and $p$ take the values

\[ \dot{\theta}(0) = p(0) = 0, \qquad \dot{p}(0) = -\sin\theta(0) = -\sin 0 = 0. \]

In other words, the derivatives of all variables $x_a$, where $x = (\theta, p)$, vanish and therefore the pendulum does not move.

However, there is another possibility. If we set

\[ \theta(0) = \pi, \qquad p(0) = 0, \]

then the derivatives vanish as well:

\[ \dot{\theta}(0) = p(0) = 0, \qquad \dot{p}(0) = -\sin\theta(0) = -\sin\pi = 0. \]

This corresponds to the situation when the pendulum is in the upper position. If we were able to arrange the initial conditions in such a way that the angle of deflection $\theta$ is exactly equal to $\pi$ and the velocity is zero, we would obtain an equilibrium configuration in which the pendulum does not move.

Points with these properties are called critical points or fixed points. In general, a critical point $x_C$ of the dynamical system

\[ \dot{x}_a = f_a(x) \]

is a point for which

\[ \dot{x}_a = f_a(x_C) = 0. \]

If we choose the initial point $x(0) = x_C$, the system will remain at this initial position forever; it will not move. For the mathematical pendulum we have two critical points,

\[ x_{C1} = (0, 0) \quad \text{and} \quad x_{C2} = (\pi, 0). \]

They are sketched in figure 8.6.

On the other hand, we feel that the two critical points of the mathematical pendulum have a different character. The first critical point $x_{C1}$ is stable in the sense that a small perturbation results in periodic oscillations near this critical point. The critical point $x_{C2}$ is unstable in the sense that an arbitrarily small perturbation will cause the pendulum to fall and results in oscillations around the critical point $x_{C1}$!

This observation is based on our physical intuition, but can we predict the stability or instability of critical points directly from the equations? By definition, critical points represent equilibrium configurations of the system. Can we predict the behaviour of the system near a critical point?


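Fig. 8.6. Two critical points of the mathematical pendulum: $x_{C1} = (0, 0)$, i.e. $\theta = 0$, $p = 0$ (pendulum hanging down), and $x_{C2} = (\pi, 0)$, i.e. $\theta = \pi$, $p = 0$ (pendulum in the upper position).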

The idea is that small perturbations of stable critical points will produce small deviations from the equilibrium position, but small perturbations of unstable critical points will result in a motion far from the critical point. We will use a basic fact from mathematical analysis that, under certain assumptions, a function $f(x)$ can be expanded into the Taylor series around an arbitrary point $x_C$ in the following way:

\[ f(x_C + \delta) = f(x_C) + \delta\, f'(x_C) + \frac{1}{2}\,\delta^2 f''(x_C) + O\left(\delta^3\right), \tag{8.8} \]

where $f'(x)$ denotes the value of the derivative of $f$ at the point $x$, $f''(x)$ is the second derivative at $x$, etc.

First we analyse the critical point $x_{C1} = (0, 0)$. Let us denote the critical values of $\theta = x_1$ and $p = x_2$ by

\[ \theta_C = 0, \qquad p_C = 0. \]

Now suppose that the angle $\theta$ is only a small perturbation of $\theta_C$, i.e.

\[ \theta = \theta_C + \delta \quad \text{where} \quad |\delta| \ll 1. \]

Since $\theta_C = 0$ is a constant, we have

\[ \dot{\theta} = \dot{\delta}. \]

Next we simplify the equations of motion (8.7) under this assumption. Let us expand $\sin\theta$ around the critical point $\theta_C = 0$:

\[ \sin\theta = \sin(\theta_C + \delta) \approx \sin\theta_C + \delta \left.\frac{d}{d\theta}\sin\theta\right|_{\theta=\theta_C} = \sin\theta_C + \delta \cos\theta_C = \delta, \]

where we have neglected higher powers of $\delta$, which is assumed to be small. With this assumption, the equations of the pendulum (8.7) simplify to

\[ \dot{\delta} = p, \qquad \dot{p} = -\delta. \]

These are, in fact, the well-known equations of the harmonic oscillator, whose phase trajectory we already know to be a circle. We can plot it by

sol = DynSys[{#2, -#1} &, {0.1, 0}, 10];

ParametricPlot[{x[1][t], x[2][t]} /. sol, {t, 0, 10}]

where we have chosen a small initial deflection $\theta(0) = 0.1$, in accordance with the assumption. Since the solution is a circle, we observe that phase trajectories near the first critical point remain in the vicinity of this critical point, which is an indicator of stability.

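Let us now investigate the second critical point located at

\[ \theta_C = \pi, \qquad p_C = 0. \]

As in the previous case, we assume that deviations from the critical value $\theta_C$ are small and write

\[ \theta = \theta_C + \delta = \pi + \delta, \qquad |\delta| \ll 1. \]

Now we expand $\sin\theta$ into the Taylor series about $\theta = \theta_C$ as follows:

\[ \sin\theta = \sin(\theta_C + \delta) \approx \sin\theta_C + \delta \left.\frac{d}{d\theta}\sin\theta\right|_{\theta=\pi} = \sin\pi + \delta \cos\pi = -\delta. \]

Since $\dot{p} = -\sin\theta$, the equations of motion (8.7) simplify to

\[ \dot{\delta} = p, \qquad \dot{p} = \delta. \]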
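Both first-order expansions of $\sin\theta$ can be checked with the built-in functions Series and Normal. This is only a cross-check of the hand calculation; note that $\sin(\pi + \delta) = -\sin\delta$, so the linear term changes sign at the upper position.

```mathematica
(* first-order Taylor polynomials of sin θ about θC = 0 and θC = π *)
Normal[Series[Sin[0 + δ], {δ, 0, 1}]]    (* gives δ *)
Normal[Series[Sin[Pi + δ], {δ, 0, 1}]]   (* gives -δ *)
```

Inserting these into $\dot{p} = -\sin\theta$ gives $\dot{p} = -\delta$ near $\theta_C = 0$ and $\dot{p} = \delta$ near $\theta_C = \pi$.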

In order to compare the solutions near both critical points we use the following code:

In[154]:= tmax = 2 π;

Needs["PlotLegends`"]

sol1 = DynSys[{#2, -#1} &, {0.1, 0}, tmax];

sol2 = DynSys[{#2, #1} &, {0.1, 0}, tmax];

ParametricPlot[{{x[1][t], x[2][t]} /. sol1, {x[1][t], x[2][t]} /. sol2},

  {t, 0, tmax}, PlotRange -> {{-0.5, 2}, {-0.5, 2}},

  AxesLabel -> {δ, p}, PlotStyle -> {Red, Blue}, BaseStyle -> {FontSize -> 15},

  PlotLegend -> {"Critical point θC = 0", "Critical point θC = π"}, LegendPosition -> {-0.5, 1}]

Both trajectories near the critical points are plotted in figure 8.7. We can see that the trajectory corresponding to the first critical point $\theta_C = 0$ is a circle and thus remains in the vicinity of the critical point. The second trajectory, corresponding to the critical point $\theta_C = \pi$, is on the other hand an open curve (close to a straight line in the plot) which escapes to infinity. Hence, we can see that the second critical point is unstable in the following sense. If we move the pendulum to $\theta = \pi$ and set the initial velocity to zero, the pendulum remains at this equilibrium position. However, an arbitrarily small perturbation (in our case $\delta = 0.1$) will cause the pendulum to escape from the equilibrium position quickly. In our case the trajectory escapes to infinity, but this is an artefact of the linearization: we have assumed that the perturbation $\delta$ is small, but as soon as the pendulum is far enough from the critical point, this assumption is no longer valid.
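The escaping trajectory can be written in closed form. For the initial conditions $\delta(0) = 0.1$, $p(0) = 0$, the linearized system $\dot{\delta} = p$, $\dot{p} = \delta$ is solved by

\[ \delta(t) = 0.1 \cosh t, \qquad p(t) = 0.1 \sinh t, \qquad \text{so that} \qquad \delta^2 - p^2 = 0.01. \]

The trajectory is therefore one branch of a hyperbola. For large $t$ both $\cosh t$ and $\sinh t$ behave like $e^t/2$, so the branch approaches the straight line $p = \delta$, which is why it looks almost like a line in the plot.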

8.6 Stability of critical points

Having illustrated the main idea of stability and instability on the example of the mathematical pendulum, we can proceed to a general theory. For simplicity we restrict ourselves to autonomous planar dynamical systems, i.e. dynamical systems with only two variables $x_1 = x$ and $x_2 = y$ which can be visualised in the plane. Hence, a planar dynamical system is a set of two first-order equations of the form

\[ \dot{x} = f_x(x, y), \qquad \dot{y} = f_y(x, y). \tag{8.9} \]

A critical point of system (8.9) is a point $(x_C, y_C)$ for which

\[ f_x(x_C, y_C) = f_y(x_C, y_C) = 0. \tag{8.10} \]

Then the equations of motion (8.9) reduce to

\[ \dot{x}\big|_{(x_C, y_C)} = 0, \qquad \dot{y}\big|_{(x_C, y_C)} = 0, \]

which means that critical points represent the equilibrium configurations of the system.

Fig. 8.7. Phase trajectories in the $(\delta, p)$ plane near the two critical points $\theta_C = 0$ and $\theta_C = \pi$. In both cases the actual deflection is $\theta = \theta_C + \delta$ but with different $\theta_C$. We can see that the red trajectory ($\theta_C = 0$) is a circle about the origin while the blue trajectory ($\theta_C = \pi$) diverges to infinity rapidly.

Now we want to investigate the stability or instability of critical points. That means we want to find out how the phase trajectories behave in the vicinity of critical points. In the case of the pendulum we have seen that an appropriate way to proceed is to linearize the system of equations near the critical point.

Let us assume that $(x_C, y_C)$ is a critical point of system (8.9). In the neighbourhood of the critical point we can write

\[ \begin{aligned} x &= x_C + \delta, & |\delta| &\ll 1, \\ y &= y_C + \varepsilon, & |\varepsilon| &\ll 1. \end{aligned} \tag{8.11} \]

Since $x_C$ and $y_C$ are constants, for the time derivatives of $x$ and $y$ we have

\[ \dot{x} = \dot{\delta}, \qquad \dot{y} = \dot{\varepsilon}. \]

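The function $f_x(x, y)$ can then be expanded into the Taylor series:

\[ f_x(x, y) = f_x(x_C + \delta, y_C + \varepsilon) \approx f_x(x_C, y_C) + \delta \left.\frac{\partial f_x}{\partial x}\right|_{(x_C, y_C)} + \varepsilon \left.\frac{\partial f_x}{\partial y}\right|_{(x_C, y_C)} = a\,\delta + b\,\varepsilon, \tag{8.12} \]

where we have used definition (8.10) in the last step and denoted the partial derivatives of $f_x$ by

\[ a = \left.\frac{\partial f_x}{\partial x}\right|_{(x_C, y_C)}, \qquad b = \left.\frac{\partial f_x}{\partial y}\right|_{(x_C, y_C)}. \tag{8.13} \]

The vertical line with the subscript indicates that the partial derivatives must be evaluated at the critical point. Similarly, for $f_y$ we find

\[ f_y(x, y) = f_y(x_C + \delta, y_C + \varepsilon) \approx c\,\delta + d\,\varepsilon, \]

where

\[ c = \left.\frac{\partial f_y}{\partial x}\right|_{(x_C, y_C)}, \qquad d = \left.\frac{\partial f_y}{\partial y}\right|_{(x_C, y_C)}. \tag{8.14} \]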

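Thus, near the critical point, the planar dynamical system (8.9) can be replaced by the simpler equations

\[ \dot{\delta} = a\,\delta + b\,\varepsilon, \qquad \dot{\varepsilon} = c\,\delta + d\,\varepsilon. \tag{8.15} \]

The coefficients $a$, $b$, $c$ and $d$ are not functions but constants given by (8.13) and (8.14). It is useful to write equations (8.15) in matrix form. Let us define

\[ \mathbf{x} = \begin{pmatrix} \delta \\ \varepsilon \end{pmatrix}, \qquad J = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]

Then the two equations (8.15) are equivalent to the single matrix equation

\[ \dot{\mathbf{x}} = J \cdot \mathbf{x}, \tag{8.16} \]

where the dot denotes standard matrix multiplication.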
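The matrix $J$ of partial derivatives (8.13) and (8.14) can be computed with the built-in function D. The following is a sketch, assuming that the symbols used as variables have no assigned values; the helper name jacobianAt is hypothetical.

```mathematica
(* matrix J of partial derivatives of {fx, fy}, evaluated at a critical point *)
jacobianAt[f_List, vars_List, point_List] := D[f, {vars}] /. Thread[vars -> point]

(* mathematical pendulum (8.7): f = (p, -sin θ) *)
jacobianAt[{p, -Sin[θ]}, {θ, p}, {0, 0}]    (* {{0, 1}, {-1, 0}} *)
jacobianAt[{p, -Sin[θ]}, {θ, p}, {Pi, 0}]   (* {{0, 1}, {1, 0}} *)
```

The two matrices reproduce the linearized pendulum equations found in section 8.5, $\dot{\delta} = p$, $\dot{p} = -\delta$ near $\theta_C = 0$ and $\dot{\delta} = p$, $\dot{p} = \delta$ near $\theta_C = \pi$.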


8.6.1 Example

Let us consider the non-linear dynamical system

\[ \dot{x} = x(1 + y), \qquad \dot{y} = y(1 - x) \]

with

\[ f_x = x(1 + y), \qquad f_y = y(1 - x). \]

First we find the critical points, i.e. we set

\[ x_C(1 + y_C) = 0, \qquad y_C(1 - x_C) = 0. \]

There are two solutions to these equations,

\[ (x_C, y_C) = (0, 0) \quad \text{and} \quad (x_C, y_C) = (1, -1). \]

We analyse these points separately. The emphasis is on finding the critical points and deriving the linearized equations of motion. The solution is merely stated because we will analyse all cases in detail later.

a) Critical point $(0, 0)$. In this case we write

\[ x = x_C + \delta = \delta, \qquad y = y_C + \varepsilon = \varepsilon. \]

Now we have

\[ f_x = x(1 + y) = \delta(1 + \varepsilon) \approx \delta, \qquad f_y = y(1 - x) = \varepsilon(1 - \delta) \approx \varepsilon, \]

where we have neglected the higher order terms $\varepsilon\delta$ because of the linearization. Hence, in the neighbourhood of the first critical point, the equations of motion are

\[ \dot{\delta} = \delta, \qquad \dot{\varepsilon} = \varepsilon. \]

These equations can be solved trivially to find

\[ \delta = C_1 e^{t}, \qquad \varepsilon = C_2 e^{t}, \]

where $C_1$ and $C_2$ are integration constants. We will discuss this later, but for now it is obvious that the phase trajectory escapes to infinity because

\[ \lim_{t \to \infty} e^{t} = \infty. \]

Hence, this critical point is unstable.

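b) Critical point $(1, -1)$. In this case we write

\[ x = x_C + \delta = 1 + \delta, \qquad y = y_C + \varepsilon = -1 + \varepsilon, \]

so that

\[ f_x = x(1 + y) = (1 + \delta)(1 - 1 + \varepsilon) \approx \varepsilon, \qquad f_y = y(1 - x) = (-1 + \varepsilon)(1 - 1 - \delta) \approx \delta, \]

where we have neglected the products $\varepsilon\delta$ again. Now the linearized equations of motion are

\[ \dot{\delta} = \varepsilon, \qquad \dot{\varepsilon} = \delta, \]

which solve to

\[ \delta = C_1 \cosh t + C_2 \sinh t, \qquad \varepsilon = C_1 \sinh t + C_2 \cosh t. \]

The reader is invited to check that the solutions $(\delta, \varepsilon)$ are hyperbolas escaping to infinity, and hence the second critical point is unstable as well.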
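The critical points and the linearized matrices of this example can be cross-checked with Solve, D and Eigenvalues. The sketch below assumes that x and y have no assigned values; the names fExample, crit and jac are ours.

```mathematica
fExample = {x (1 + y), y (1 - x)};

(* critical points: both components of the right hand side vanish *)
crit = Solve[fExample == {0, 0}, {x, y}]

(* matrix of partial derivatives, evaluated at each critical point *)
jac = D[fExample, {{x, y}}];
jac /. crit

(* eigenvalues of the linearized system at each critical point *)
Eigenvalues /@ (jac /. crit)
```

At $(0, 0)$ the matrix is the unit matrix (both eigenvalues equal to 1), and at $(1, -1)$ it has the entries of $\dot{\delta} = \varepsilon$, $\dot{\varepsilon} = \delta$, with eigenvalues $1$ and $-1$. In both cases there is a positive eigenvalue, in agreement with the instability found above.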

8.7 Classification of critical points

In the previous section we defined the critical points and sketched how these points can be divided into stable and unstable ones. We have seen that the mathematical pendulum has two critical points: one is stable, the other is not. In the next example we have seen a system with two unstable critical points. The classification of critical points, however, is more subtle and we discuss all possibilities in this section.

Let us first recapitulate our goal. We study a planar dynamical system described by the equations

\[ \dot{x} = f_x(x, y), \qquad \dot{y} = f_y(x, y). \]

We assume that we have found a critical point of this system, i.e. a point $(x_C, y_C)$ such that

\[ f_x(x_C, y_C) = f_y(x_C, y_C) = 0, \]

and study the behaviour of the system near this critical point. We linearize the equations in the neighbourhood of the critical point so that we obtain the equations (see footnote 2)

\[ \dot{x} = a\,x + b\,y, \qquad \dot{y} = c\,x + d\,y. \]

2 In the notation of the previous section, our functions $x$ and $y$ are in fact the perturbations $\delta$ and $\varepsilon$. In this section, however, we use $x$ and $y$ as they are more natural.

This system can also be written in the matrix form

\[ \dot{\mathbf{x}} = J \cdot \mathbf{x}, \]

where

\[ J = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]

Now we discuss several forms of the matrix $J$ and classify the critical points. Finally we will show how the analysis can be done for a general matrix $J$.

8.7.1 Stable and unstable nodes, saddle points

Consider a linear planar system of the form

\[ \dot{x} = \lambda_1 x, \qquad \dot{y} = \lambda_2 y, \tag{8.17} \]

which corresponds to the matrix

\[ J = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}. \tag{8.18} \]

System (8.17) can be easily solved. The equations for $x$ and $y$ are independent; we say that these equations are decoupled, which means that the equation for $x$ does not contain $y$ and vice versa.

Let us solve the equation

\[ \dot{x} = \lambda_1 x \]

first. In the usual mathematical notation, this equation reads

\[ \frac{dx}{dt} = \lambda_1 x, \]

which is a separable differential equation. We can rewrite it as

\[ \frac{dx}{x} = \lambda_1\, dt. \]

This form of the equation is called separated because the left hand side contains only $x$ and the right hand side contains only the time $t$. We can integrate the equation,

\[ \int \frac{dx}{x} = \int \lambda_1\, dt, \]

to obtain

\[ \log x = \lambda_1 t + C, \]

where $C$ is an integration constant. It is customary that if a logarithm appears in the solution, we write the constant as a logarithm as well (see footnote 3):

\[ \log x = \lambda_1 t + \log K. \]

Exponentiating the last equation we arrive at

\[ x = K e^{\lambda_1 t}. \]

By the same procedure we solve the equation for $y$ to get

\[ y = L e^{\lambda_2 t}, \]

where $L$ is again an integration constant. Notice that, according to the solution, we have

\[ x(0) = K \quad \text{and} \quad y(0) = L. \]

Hence, $K$ and $L$ are the values of $x$ and $y$ at time $t = 0$, respectively. Therefore, we can write the solution of (8.17) in the form

\[ x(t) = x_0\, e^{\lambda_1 t}, \qquad y(t) = y_0\, e^{\lambda_2 t}. \tag{8.19} \]

Clearly, the only critical point of system (8.17) is $(0, 0)$ (provided that $\lambda_1$ and $\lambda_2$ are non-zero). Having derived the solution of this system, we can analyse its behaviour near the critical point. A useful function for visualising the properties of the system near a critical point is StreamPlot, which takes the vector field and plots its trajectories. In the following example we choose $\lambda_1 = \lambda_2 = 1$.

3 Notice that an arbitrary real number $C$ is the logarithm of some positive real number, i.e. we can write $C = \log K$ for some $K > 0$.


vals = {λ1 -> 1, λ2 -> 1};

StreamPlot[{λ1 x, λ2 y} /. vals, {x, -10, 10}, {y, -10, 10}]

Out[173]= [Figure: stream plot of system (8.17) with $\lambda_1 = \lambda_2 = 1$ on the square $-10 \le x, y \le 10$.]

In this figure we can see the trajectories (8.19) for initial points $(x_0, y_0)$ chosen by Mathematica. Notice that we have inserted the right hand side of (8.17) as an argument of the function StreamPlot. We can see that the trajectories are straight lines emanating from the origin (the critical point) and tending to infinity exponentially.
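The shape of the trajectories follows from (8.19) by eliminating the time. Assuming $x_0 \neq 0$ and $\lambda_1 \neq 0$, the first equation gives $e^{t} = (x/x_0)^{1/\lambda_1}$, and inserting this into the second one yields

\[ y = y_0 \left( \frac{x}{x_0} \right)^{\lambda_2/\lambda_1}. \]

For $\lambda_1 = \lambda_2$ the exponent equals one and the trajectories are the straight lines $y = (y_0/x_0)\, x$ through the origin, as seen in the figure. For $\lambda_1 \neq \lambda_2$ the trajectories are curved.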

What about other choices of $\lambda_{1,2}$? It is clear that the function $e^{\lambda t}$ is increasing for $\lambda > 0$ and decreasing for $\lambda < 0$. We can conclude that the qualitative behaviour of the system depends on the signs of $\lambda_{1,2}$; the four possibilities are shown in figure 8.8. We distinguish three cases.

• $\lambda_1 > 0$ and $\lambda_2 > 0$
In this case the critical point is called an unstable node. Trajectories emanate from the origin and are repelled to infinity.

• $\lambda_1 > 0$, $\lambda_2 < 0$ or $\lambda_1 < 0$, $\lambda_2 > 0$
The critical point is called a saddle point. Trajectories are repelled from the $y$-axis and attracted to the $x$-axis (for $\lambda_1 > 0$, $\lambda_2 < 0$), or repelled from the $x$-axis and attracted to the $y$-axis (for $\lambda_1 < 0$, $\lambda_2 > 0$).

• $\lambda_1 < 0$ and $\lambda_2 < 0$
The critical point is called a stable node. Trajectories are attracted to the origin.

In addition to this classification, critical points with distinct values $\lambda_1 \neq \lambda_2$ are called regular, while critical points with the same values $\lambda_1 = \lambda_2$ are called degenerate. Clearly, saddle points cannot be degenerate.

[Figure panels: a) $\lambda_1 > 0$, $\lambda_2 > 0$; b) $\lambda_1 > 0$, $\lambda_2 < 0$; c) $\lambda_1 < 0$, $\lambda_2 > 0$; d) $\lambda_1 < 0$, $\lambda_2 < 0$; each panel shows the square $-10 \le x, y \le 10$.]

Fig. 8.8. Different behaviour of planar system (8.17) for different choices of $\lambda_{1,2}$. The critical points are a) unstable node, b, c) saddle point, d) stable node.
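A figure of this kind can be produced with StreamPlot and GraphicsGrid. The commands below are only a sketch of one way to do it, with the representative values $\lambda_{1,2} = \pm 1$ chosen by us; they are not necessarily the commands used for figure 8.8, and the helper name nodePlot is hypothetical.

```mathematica
(* stream plot of system (8.17) for given λ1, λ2 *)
nodePlot[λ1_, λ2_, label_] := StreamPlot[{λ1 x, λ2 y}, {x, -10, 10}, {y, -10, 10},
   PlotLabel -> label, BaseStyle -> {FontSize -> 10}];

(* the four sign combinations arranged in a 2 x 2 grid *)
GraphicsGrid[{
  {nodePlot[1, 1, "a) λ1 > 0, λ2 > 0"], nodePlot[1, -1, "b) λ1 > 0, λ2 < 0"]},
  {nodePlot[-1, 1, "c) λ1 < 0, λ2 > 0"], nodePlot[-1, -1, "d) λ1 < 0, λ2 < 0"]}}]
```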


Recall that the planar dynamical system (8.17) can be represented by the matrix (8.18),

\[ J = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}. \]

From elementary linear algebra we know that with the matrix $J$ we can associate a set of eigenvalues $\lambda$ defined by the equation

\[ J \cdot e = \lambda\, e, \]

where $e$ is called an eigenvector. It is easy to show that the eigenvalues of matrix (8.18) are $\lambda_1$ and $\lambda_2$ and the corresponding eigenvectors are

\[ e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

In other words, the vectors $e_1$ and $e_2$ satisfy the equations

\[ J \cdot e_1 = \lambda_1 e_1, \qquad J \cdot e_2 = \lambda_2 e_2. \]

We can see that trajectories starting on the lines determined by the vectors $e_i$, $i = 1, 2$, always remain on these lines. If the trajectory is repelled from the critical point along the direction $e$, the line determined by the vector $e$ is called an unstable manifold. If the trajectory is attracted to the critical point along the vector $e$, the line determined by $e$ is called a stable manifold. For matrix (8.18), the vectors $e_1$ and $e_2$ are always eigenvectors. We can see that $e_1$ lies on the $x$-axis and $e_2$ lies on the $y$-axis. Hence, the axes are stable or unstable manifolds of system (8.17), depending on the signs of $\lambda_{1,2}$.

The classification introduced above can be reformulated in the following way. Let

\[ J = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

be the matrix of a general linear dynamical system

\[ \dot{x} = a\,x + b\,y, \qquad \dot{y} = c\,x + d\,y. \]

If the matrix $J$ has two real non-zero eigenvalues $\lambda_1$ and $\lambda_2$, then the critical point is a stable/unstable node or a saddle point, depending on the signs of these eigenvalues.

We illustrate this classification on an example. Consider the dynamical system

\[ \dot{x} = 2x + y, \qquad \dot{y} = x, \tag{8.20} \]

with the matrix

\[ J = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}. \]

This matrix is not of the form (8.18) but we can apply the second criterion. The eigenvalues and eigenvectors can be found in Mathematica using

In[58]:= J = {{2, 1}, {1, 0}};

Eigensystem[J]

Out[59]= {{1 + Sqrt[2], 1 - Sqrt[2]}, {{1 + Sqrt[2], 1}, {1 - Sqrt[2], 1}}}

which shows that the eigenvalues are

\[ \lambda_1 = 1 + \sqrt{2}, \qquad \lambda_2 = 1 - \sqrt{2}, \]

and the corresponding eigenvectors are

\[ e_1 = \begin{pmatrix} 1 + \sqrt{2} \\ 1 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 1 - \sqrt{2} \\ 1 \end{pmatrix}. \]

Since $\lambda_1 > 0$ and $\lambda_2 < 0$, the vector $e_1$ defines the unstable manifold and $e_2$ defines the stable manifold. Since the eigenvalues have different signs, the critical point is a saddle point and it is regular. The phase trajectories together with the stable and unstable manifolds can be plotted by

In[54]:= g1 = StreamPlot[J.{x, y}, {x, -10, 10}, {y, -10, 10}];

g2 = Graphics[{Thick, Blue, Line[{-10 {1 + Sqrt[2], 1}, 10 {1 + Sqrt[2], 1}}]}];

g3 = Graphics[{Thick, Red, Line[{-10 {1 - Sqrt[2], 1}, 10 {1 - Sqrt[2], 1}}]}];

Show[g1, g2, g3]

The result is plotted in figure 8.9.

8.7.2 Centres and foci

Fig. 8.9. Phase portrait for dynamical system (8.20). The blue line represents the unstable manifold, the red line represents the stable manifold.

The next special case we consider is the dynamical system of the form

\[ \dot{x} = \alpha\,x + \beta\,y, \qquad \dot{y} = -\beta\,x + \alpha\,y. \tag{8.21} \]

The matrix of this system is

\[ J = \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}. \tag{8.22} \]

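System (8.21) is a little trickier to solve. Let us switch to the polar coordinate system by the usual transformation

\[ x = r \cos\theta, \qquad y = r \sin\theta, \]

where $r = r(t)$ and $\theta = \theta(t)$. The inverse transformation reads

\[ r = \sqrt{x^2 + y^2}, \qquad \theta = \arctan\frac{y}{x}. \]

These relations can be used to find

\[ \frac{\partial r}{\partial x} = \frac{x}{r}, \qquad \frac{\partial r}{\partial y} = \frac{y}{r}, \qquad \frac{\partial \theta}{\partial x} = -\frac{y}{r^2}, \qquad \frac{\partial \theta}{\partial y} = \frac{x}{r^2}. \]

Now we use (8.21) to derive the corresponding equations for $r$ and $\theta$:

\[ \dot{r} = \frac{\partial r}{\partial x}\,\dot{x} + \frac{\partial r}{\partial y}\,\dot{y} = \alpha\, r, \]

\[ \dot{\theta} = \frac{\partial \theta}{\partial x}\,\dot{x} + \frac{\partial \theta}{\partial y}\,\dot{y} = -\beta. \]

We can see that dynamical system (8.21) in polar coordinates decouples into two independent equations for the coordinates $r$ and $\theta$,

\[ \dot{r} = \alpha\, r, \qquad \dot{\theta} = -\beta. \tag{8.23} \]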
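For completeness, the intermediate algebra behind (8.23) is as follows. Inserting the partial derivatives and the right hand sides of (8.21), and using $x^2 + y^2 = r^2$, we get

\[ \dot{r} = \frac{x\,\dot{x} + y\,\dot{y}}{r} = \frac{x(\alpha x + \beta y) + y(-\beta x + \alpha y)}{r} = \frac{\alpha\,(x^2 + y^2)}{r} = \alpha\, r, \]

\[ \dot{\theta} = \frac{-y\,\dot{x} + x\,\dot{y}}{r^2} = \frac{-y(\alpha x + \beta y) + x(-\beta x + \alpha y)}{r^2} = \frac{-\beta\,(x^2 + y^2)}{r^2} = -\beta. \]

The mixed terms $\beta x y$ cancel in the first line and the terms $\alpha x y$ cancel in the second one.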

First we solve the equation for $r$. Let us write it in the form

\[ \frac{dr}{r} = \alpha\, dt, \]

which integrates to

\[ \log r = \alpha\, t + \log C, \]

where the integration constant has been written as a logarithm (see footnote 3). Exponentiating the last equation we arrive at

\[ r = C e^{\alpha t}. \]

Obviously, at time $t = 0$ we have $r(0) = C$ and so we write the solution in the form

\[ r = r_0\, e^{\alpha t}. \]

Next we solve the equation for $\theta$. This is trivial since, by (8.23), we have

\[ d\theta = -\beta\, dt, \]

which integrates to

\[ \theta = \theta_0 - \beta\, t, \]

where the integration constant has been denoted by $\theta_0$ and represents the value of $\theta$ at $t = 0$. Summa summarum, the solution of system (8.23) acquires the form

\[ r = r_0\, e^{\alpha t}, \qquad \theta = \theta_0 - \beta\, t. \tag{8.24} \]

Hence, the solution of the original system (8.21) in Cartesian coordinates reads

\[ x = r_0\, e^{\alpha t} \cos(\theta_0 - \beta t), \qquad y = r_0\, e^{\alpha t} \sin(\theta_0 - \beta t). \tag{8.25} \]

Suppose that $\alpha = 0$ so that

\[ x = r_0 \cos(\theta_0 - \beta t), \qquad y = r_0 \sin(\theta_0 - \beta t). \]

Clearly, this represents motion at constant angular velocity $-\beta$ and constant radius $r_0$, and therefore the phase trajectories are circles of radius $r_0$. If $\alpha \neq 0$, the radius of the “circle” will be

\[ r_0\, e^{\alpha t} \]

and hence the trajectory will be a spiral. If $\alpha > 0$, the radius will increase exponentially and the spiral will tend to infinity. If, on the other hand, $\alpha < 0$, the radius will decrease exponentially and the phase trajectories will spiral towards the origin. All cases are plotted in figure 8.10 by the Mathematica commands

In[25]:= J = {{α, β}, {-β, α}};

g1 = StreamPlot[J.{x, y} /. {α -> 0, β -> 1}, {x, -5, 5}, {y, -5, 5},

   PlotLabel -> "α = 0, β > 0", BaseStyle -> {FontSize -> 10}];

g2 = StreamPlot[J.{x, y} /. {α -> 0, β -> -1}, {x, -5, 5}, {y, -5, 5},

   PlotLabel -> "α = 0, β < 0", BaseStyle -> {FontSize -> 10}];

g3 = StreamPlot[J.{x, y} /. {α -> 1, β -> 1}, {x, -5, 5}, {y, -5, 5},

   PlotLabel -> "α > 0, β > 0", BaseStyle -> {FontSize -> 10}];

g4 = StreamPlot[J.{x, y} /. {α -> -1, β -> 1}, {x, -5, 5}, {y, -5, 5},

   PlotLabel -> "α < 0, β > 0", BaseStyle -> {FontSize -> 10}];

g = GraphicsGrid[{{g1, g2}, {g3, g4}}]

and can be classified as follows:

• $\alpha = 0$
The critical point is called a centre. Trajectories are circles centred at the origin.

• $\alpha > 0$
The critical point is called an unstable focus; trajectories are spirals escaping to infinity.

• $\alpha < 0$
The critical point is called a stable focus; trajectories are spirals tending to the origin.

The parameter $\beta$ determines the angular velocity, $\dot{\theta} = -\beta$. If it is zero, the spirals become straight lines and the dynamical system reduces to the previous case (8.17) with $\lambda_1 = \lambda_2 = \alpha$. If it is non-zero, its sign determines the sense of rotation: trajectories orbit the origin in a clockwise sense for $\beta > 0$ and in a counter-clockwise sense for $\beta < 0$.
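The solution of system (8.21) can be checked by direct substitution. Since $\dot{\theta} = -\beta$ by (8.23), the angle evolves as $\theta(t) = \theta_0 - \beta t$, which is the form used in the sketch below. The names xs and ys are ours, and the symbols α, β, r0 and θ0 are assumed to have no assigned values.

```mathematica
(* candidate solution of (8.21), with θ(t) = θ0 - β t *)
xs[t_] := r0 Exp[α t] Cos[θ0 - β t];
ys[t_] := r0 Exp[α t] Sin[θ0 - β t];

(* residuals of the two equations of (8.21); both simplify to 0 *)
Simplify[{D[xs[t], t] - (α xs[t] + β ys[t]),
          D[ys[t], t] - (-β xs[t] + α ys[t])}]
```

The result {0, 0} confirms that the angle decreases for $\beta > 0$, i.e. the rotation is clockwise, in agreement with the classification above.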

Let us now analyse the critical points of system (8.21) in terms of the eigenvalues of matrix (8.22),

\[ J = \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}. \]

We can use Mathematica to find the eigenvalues and eigenvectors of matrix (8.22) by

In[44]:= Eigensystem[{{α, β}, {-β, α}}]

Out[44]= {{α - I β, α + I β}, {{I, 1}, {-I, 1}}}

which shows that this matrix has two eigenvalues

\[ \lambda_1 = \alpha - i\beta \quad \text{and} \quad \lambda_2 = \alpha + i\beta \]

with eigenvectors

\[ e_1 = \begin{pmatrix} i \\ 1 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} -i \\ 1 \end{pmatrix}. \]

In other words, the eigenvalues and eigenvectors of the matrix $J$ satisfy the relations

\[ J \cdot e_1 = \lambda_1 e_1, \qquad J \cdot e_2 = \lambda_2 e_2. \]

The first observation is that the eigenvectors are complex and hence there are neither stable nor unstable manifolds, i.e. there is no real direction which is mapped to the same direction. The only exception is the case $\beta = 0$, since then dynamical system (8.21) reduces to (8.17) and the eigenvectors can be chosen real.

Second, the eigenvalues $\lambda_{1,2}$ are complex conjugates of each other (as are the eigenvectors),

\[ \lambda_1 = \bar{\lambda}_2, \]

where the bar denotes complex conjugation. Hence, even if the dynamical system is not of the form (8.21), we can conclude that if the matrix $J$ has two complex conjugate eigenvalues

\[ \alpha \pm i\beta, \]

the critical point is a stable/unstable focus or a centre, depending on the values of $\alpha$ and $\beta$ as classified above.

[Figure panels: a) $\alpha = 0$, $\beta > 0$; b) $\alpha = 0$, $\beta < 0$; c) $\alpha > 0$, $\beta > 0$; d) $\alpha < 0$, $\beta > 0$.]

Fig. 8.10. Classification of critical points for the system (8.21): a, b) centre, c) unstable focus, d) stable focus.

Example. Consider the dynamical system

\[ \dot{x} = 2x + 4y, \qquad \dot{y} = -3x + 2y. \]

This system is not of the form (8.21) but we can apply the criterion based on the analysis of eigenvalues. In Mathematica we type

In[68]:= J = {{2, 4}, {-3, 2}};
Eigensystem[J] // Expand

Out[69]= {{2 + 2 I Sqrt[3], 2 - 2 I Sqrt[3]}, {{-2 I/Sqrt[3], 1}, {2 I/Sqrt[3], 1}}}

where we have used Expand in order to simplify the expression for the eigenvectors (try this code without Expand). We have found two eigenvalues

\[ \lambda_{1,2} = 2 \pm 2i\sqrt{3} = \alpha \pm i\,\beta, \]

which are complex conjugates of each other. In this case, the parameters α and β are

\[ \alpha = 2, \qquad \beta = 2\sqrt{3}. \]

Parameter α is positive and so the critical point is an unstable focus. The trajectories of this dynamical system are shown in the following plot.

[Figure: stream plot of the trajectories near the origin.]
Another example is the system

\[ \dot{x} = x + 2y, \qquad \dot{y} = -2x - y. \]

Eigenvalues are found by

In[148]:= J = {{1, 2}, {-2, -1}};
Eigensystem[J] // Expand

Out[149]= {{I Sqrt[3], -I Sqrt[3]}, {{-1/2 - I Sqrt[3]/2, 1}, {-1/2 + I Sqrt[3]/2, 1}}}

Hence, now the eigenvalues are

\[ \lambda_{1,2} = \pm i\sqrt{3} = \alpha \pm i\,\beta, \]

which means that

\[ \alpha = 0, \qquad \beta = \sqrt{3}. \]

Since α = 0, the critical point is a centre rather than a focus. The trajectories of this dynamical system are shown in the following plot.

[Figure: stream plot of the trajectories, closed curves around the origin.]

8.8 General case

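In the previous two sections we studied two special cases of planar linear dynamical systems given by the matrices

\[ J = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \quad\text{and}\quad J = \begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}. \]

However, we have seen that the analysis can be performed using the eigenvalues of these matrices. Now we consider a general linear planar dynamical system

\[ \dot{x} = \alpha x + \beta y, \qquad \dot{y} = \gamma x + \delta y. \tag{8.26} \]

Let us find the eigenvalues and eigenvectors of the matrix J of this general system, whose rows are (α, β) and (γ, δ). Recall that the determinant of matrix J is

\[ D = \det J = \alpha\,\delta - \beta\,\gamma. \]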

The trace of the matrix is defined as a sum of its diagonal elements, i.e.

T = Tr J = α + δ.

Eigenvalues λ are defined by equation

J · e = λ e

where e is an eigenvector. The last equation can be rewritten in the form

(J − λ I) · e = 0

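where I is the 2 × 2 unit matrix so that

\[ J - \lambda I = \begin{pmatrix} \alpha - \lambda & \beta \\ \gamma & \delta - \lambda \end{pmatrix}. \]

The eigenvector equation is a homogeneous system of linear equations which has non-trivial solutions only if the determinant of the system is zero:

\[ \det (J - \lambda I) = 0. \]

Written out, this condition reads

\[ (\alpha - \lambda)(\delta - \lambda) - \beta\,\gamma = 0. \]

Expanding the brackets we arrive at

\[ \lambda^2 - (\alpha + \delta)\,\lambda + \alpha\,\delta - \beta\,\gamma = 0, \]

or, equivalently,

\[ \lambda^2 - T\,\lambda + D = 0. \]

This is a quadratic equation for λ and its solutions are

\[ \lambda_{1,2} = \frac{T \pm \sqrt{T^2 - 4D}}{2}. \tag{8.27} \]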
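Formula (8.27) can be checked symbolically. The following is only a minimal sketch: the names jac, tr, det and formula are introduced here for illustration, and det is used instead of the symbol D of the text because D is the built-in differentiation command in Mathematica.

```mathematica
(* Sketch: check formula (8.27) against the characteristic polynomial. *)
(* jac, tr, det and formula are illustrative names, not from the text. *)
jac = {{α, β}, {γ, δ}};
tr = Tr[jac];
det = Det[jac];
formula = {(tr - Sqrt[tr^2 - 4 det])/2, (tr + Sqrt[tr^2 - 4 det])/2};

(* Substitute both candidate roots into det(J - λ I); both entries should vanish. *)
Simplify[CharacteristicPolynomial[jac, λ] /. λ -> #] & /@ formula
(* {0, 0} *)
```

Both entries vanish because the square root in (8.27) squares back to T² − 4D, so each candidate satisfies λ² − Tλ + D = 0 identically.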

Now we can summarize the classification of critical points as follows.


• λ1,2 ∈ R (real eigenvalues)
  – λ1 ≠ λ2: singular node
  – λ1 = λ2: degenerate node
  – λ1 > 0, λ2 > 0: unstable node
  – λ1 λ2 < 0: saddle point
  – λ1 < 0, λ2 < 0: stable node

• λ1,2 = α ± i β, with λ1 the complex conjugate of λ2 (complex conjugate eigenvalues)
  – α = 0: centre
  – α > 0: unstable focus
  – α < 0: stable focus

Moreover, if the real parts of the eigenvalues λ1,2 are non-zero, the critical point is called hyperbolic, otherwise it is called non-hyperbolic.
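The classification above can be turned into a small helper that works directly with the trace and determinant, so that no eigenvalues need to be computed. This is a sketch under stated assumptions: the function name classifyCriticalPoint and its output strings are hypothetical, and the input is assumed to be a 2 × 2 matrix with real numeric entries.

```mathematica
(* Hypothetical helper (not part of the text): classify the origin of a planar  *)
(* linear system from the trace t and determinant det of its numeric matrix.   *)
classifyCriticalPoint[jac_?MatrixQ] := Module[{t = Tr[jac], det = Det[jac], disc},
  disc = t^2 - 4 det;                       (* discriminant of (8.27) *)
  Which[
   disc < 0 && t == 0, "centre",            (* α = 0 *)
   disc < 0 && t > 0, "unstable focus",     (* α > 0 *)
   disc < 0, "stable focus",                (* α < 0 *)
   det < 0, "saddle point",                 (* λ1 λ2 < 0 *)
   det == 0, "non-hyperbolic (zero eigenvalue)",
   t > 0, If[disc == 0, "unstable degenerate node", "unstable node"],
   True, If[disc == 0, "stable degenerate node", "stable node"]]]

classifyCriticalPoint[{{2, 4}, {-3, 2}}]    (* "unstable focus" *)
classifyCriticalPoint[{{1, 2}, {-2, -1}}]   (* "centre" *)
classifyCriticalPoint[{{2, 1}, {1, 2}}]     (* "unstable node" *)
```

The helper relies on T = λ1 + λ2 and D = λ1 λ2: for complex eigenvalues (T² − 4D < 0) the real part is α = T/2, and for real eigenvalues with D > 0 both have the sign of T.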

8.9 Examples

Example 1

Consider the linear dynamical system

\[ \dot{x} = 2x + y, \qquad \dot{y} = x + 2y. \]

There is only one critical point at the origin,

\[ x_C = 0, \qquad y_C = 0. \]

Since the system is linear, we do not have to linearize it and can write the matrix of the linearized system immediately:

\[ J = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}. \]

Its eigenvalues are calculated from (8.27):

\[ \lambda_1 = 1, \qquad \lambda_2 = 3. \]

Eigenvectors can be found easily by hand. Recall that eigenvectors are solutions to the equation

\[ J \cdot e_i = \lambda_i\, e_i, \qquad i = 1, 2. \]

(Notice that the Einstein summation convention is suppressed here.) For i = 1 we obtain the homogeneous system of linear equations

\[ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} a \\ b \end{pmatrix} = 0, \]

where e1 = (a, b) is the unknown eigenvector. Since the rows (or columns) of the matrix above are linearly dependent⁴, this system has infinitely many non-trivial solutions satisfying the condition a = −b. Hence, all eigenvectors corresponding to the eigenvalue λ1 = 1 have the form

\[ \begin{pmatrix} a \\ -a \end{pmatrix}. \]

We choose the eigenvector to be

\[ e_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \]

By a similar consideration we find that the eigenvector associated with the eigenvalue λ2 = 3 is

\[ e_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

To summarize, we have found the eigensystem of matrix J:

\[ \lambda_1 = 1, \quad e_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad \lambda_2 = 3, \quad e_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \tag{8.28} \]

Now we can classify the critical point (0, 0). Since the eigenvalues are real and non-zero, the critical point is hyperbolic. They are both positive and hence the critical point is an unstable node. Finally, the eigenvectors are real and so the system has two unstable manifolds given by e1 and e2. The implementation in Mathematica is shown in figure 8.11.

⁴ This is a consequence of (8.27), because this equation has been derived under the assumption det(J − λ I) = 0.


Dynamical system x' = 2 x + y, y' = x + 2 y with the matrix J = {{2, 1}, {1, 2}}

In[4]:= (* critical points *)
Solve[{2 x + y == 0, x + 2 y == 0}, {x, y}]

Out[4]= {{x -> 0, y -> 0}}

Origin (0, 0) is the only critical point. Eigenvalues and eigenvectors are found by

In[22]:= J = {{2, 1}, {1, 2}};
Eigensystem[J]

Out[23]= {{3, 1}, {{1, 1}, {-1, 1}}}

Eigenvalues are real and positive: hyperbolic point, unstable node.

In[24]:= g1 = StreamPlot[J.{x, y}, {x, -5, 5}, {y, -5, 5}]; (* phase trajectories *)
(* unstable manifold along (1, 1), eigenvalue 3 *)
g2 = Graphics[{Blue, Thick, Line[{-10 {1, 1}, 10 {1, 1}}]}];
(* unstable manifold along (-1, 1), eigenvalue 1 *)
g3 = Graphics[{Red, Thick, Line[{-10 {-1, 1}, 10 {-1, 1}}]}];
Show[g1, g2, g3]

Out[27]= [Figure: stream plot with the two unstable manifolds drawn as thick lines.]

Fig. 8.11. Example 1.

Example 2

The linear dynamical system has the form

\[ \dot{x} = -2x, \qquad \dot{y} = -4x - 2y. \]

The matrix of this system is clearly

\[ J = \begin{pmatrix} -2 & 0 \\ -4 & -2 \end{pmatrix}. \]

Its eigenvalues are

\[ \lambda_1 = \lambda_2 = -2 \]

and so the critical point is a degenerate node. Since both eigenvalues are negative, it is a stable degenerate node. The implementation in Mathematica is shown in figure 8.12.

Example 3. Volterra-Lotka equations

Volterra-Lotka equations belong to the class of predator-prey models which describe the interaction between two populations. The prey population has a tendency to grow and the predator population tends to die out. It is due to their mutual interaction that the predator population can also grow and the prey population can decline; in other words, predators eat prey.

Let x = x(t) be the number of prey, say, rabbits, and let y = y(t) be the number of predators, say, foxes. We can construct a plausible model of the interaction between foxes and rabbits by the following simple considerations. Suppose that y = 0, i.e. there are only rabbits present. As a first approximation we can assume that the population of rabbits will grow: the number of rabbits x increases because of "interaction" between rabbits, and the higher the number of rabbits, the higher the rate of growth. Hence, we can postulate that an isolated population of rabbits is governed by the equation

\[ \dot{x} = \alpha\, x. \]

Roughly speaking, the constant α can be interpreted as the probability of the birth of a new rabbit when there are no foxes. This equation has the solution

\[ x = x_0\, e^{\alpha t}, \]

which means that an isolated population of rabbits will grow exponentially.


Dynamical system x' = -2 x, y' = -4 x - 2 y with the matrix J = {{-2, 0}, {-4, -2}}

In[28]:= (* critical points *)
Solve[{-2 x == 0, -4 x - 2 y == 0}, {x, y}]

Out[28]= {{x -> 0, y -> 0}}

Origin (0, 0) is the only critical point. Eigenvalues and eigenvectors are found by

In[29]:= J = {{-2, 0}, {-4, -2}};
Eigensystem[J]

Out[30]= {{-2, -2}, {{0, 1}, {0, 0}}}

Eigenvalues are real and repeated: hyperbolic point, stable degenerate node. The stable manifold is given by e1 = (0, 1).

In[37]:= g1 = StreamPlot[J.{x, y}, {x, -5, 5}, {y, -5, 5}]; (* phase trajectories *)
(* stable manifold *)
g2 = Graphics[{Red, Thick, Line[{-10 {0, 1}, 10 {0, 1}}]}];
Show[g1, g2]

Out[39]= [Figure: stream plot with the stable manifold drawn as a thick vertical line.]

Fig. 8.12. Example 2.

A similar consideration applies to an isolated population of foxes. If γ is the probability of death of a fox, an isolated population of foxes is governed by the equation

\[ \dot{y} = -\gamma\, y, \]

which has the solution

\[ y = y_0\, e^{-\gamma t}, \]

i.e. the population of foxes will die out exponentially.

Now we add an interaction to our equations. The number of rabbits eaten by foxes is proportional to the number of rabbits and to the number of foxes. Conversely, the number of new-born foxes is proportional to the number of foxes and to the number of rabbits. If we introduce constants β and δ for these two processes, the equations for interacting populations of rabbits and foxes read

\[ \dot{x} = \alpha\, x - \beta\, x\, y, \qquad \dot{y} = -\gamma\, y + \delta\, x\, y. \tag{8.29} \]

These are the Volterra-Lotka equations. Obviously, they are non-linear and the non-linearity represents the interaction between the two populations. All constants are assumed to be positive.

Critical points can be found by

In[4]:= cp = Solve[{α x - β x y == 0, -γ y + δ x y == 0}, {x, y}]

Out[4]= {{x -> γ/δ, y -> α/β}, {x -> 0, y -> 0}}

Hence, the critical points are

\[ x_{C1} = \left( \frac{\gamma}{\delta},\ \frac{\alpha}{\beta} \right), \qquad x_{C2} = (0, 0). \]

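In order to linearize equations (8.29) we introduce the Jacobi matrix J,

\[ J = \begin{pmatrix} \dfrac{\partial \dot{x}}{\partial x} & \dfrac{\partial \dot{x}}{\partial y} \\[2ex] \dfrac{\partial \dot{y}}{\partial x} & \dfrac{\partial \dot{y}}{\partial y} \end{pmatrix}. \]

The Jacobi matrix can be found in Mathematica by

In[16]:= f[x_, y_] = {α x - β x y, -γ y + δ x y};
J = Transpose[D[f[x, y], #] & /@ {x, y}]

Out[17]= {{α - y β, -x β}, {y δ, -γ + x δ}}

which shows

\[ J = \begin{pmatrix} \alpha - y\,\beta & -x\,\beta \\ y\,\delta & x\,\delta - \gamma \end{pmatrix}. \]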

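Next we evaluate the Jacobian at both critical points:

In[24]:= J1 = J /. cp[[1]]
J2 = J /. cp[[2]]

Out[24]= {{0, -β γ/δ}, {α δ/β, 0}}

Out[25]= {{α, 0}, {0, -γ}}

i.e. we have

\[ J_1 = \begin{pmatrix} 0 & -\dfrac{\beta\,\gamma}{\delta} \\[1.5ex] \dfrac{\alpha\,\delta}{\beta} & 0 \end{pmatrix} \quad \text{at critical point } x_{C1}, \]

\[ J_2 = \begin{pmatrix} \alpha & 0 \\ 0 & -\gamma \end{pmatrix} \quad \text{at critical point } x_{C2}. \]

Finally we find the eigenvalues and eigenvectors by

In[27]:= Eigensystem[J1]
Eigensystem[J2]

Out[27]= {{-I Sqrt[α] Sqrt[γ], I Sqrt[α] Sqrt[γ]}, {{-I β Sqrt[γ]/(Sqrt[α] δ), 1}, {I β Sqrt[γ]/(Sqrt[α] δ), 1}}}

Out[28]= {{α, -γ}, {{1, 0}, {0, 1}}}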
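These eigenvalues can be read with the classification of section 8.8. At xC2 = (0, 0) the eigenvalues α > 0 and −γ < 0 have opposite signs, so the origin is a saddle point. At xC1 the eigenvalues ±i√(αγ) are purely imaginary, so the linearized system has a centre; since this point is non-hyperbolic, the linearization alone does not settle the behaviour of the non-linear system. The sketch below is one way to probe it further. The quantity h is a standard conserved quantity for this model that is not derived in the text, and the parameter values and initial condition are assumptions chosen only for illustration.

```mathematica
(* Right-hand sides of (8.29) *)
fx = α x - β x y;
fy = -γ y + δ x y;

(* Candidate conserved quantity (standard for this model; not derived in the text) *)
h = δ x - γ Log[x] + β y - α Log[y];

(* Time derivative of h along trajectories; it simplifies to 0 *)
Simplify[D[h, x] fx + D[h, y] fy]

(* Numerical orbit for illustrative parameter values (assumed, not from the text) *)
params = {α -> 1, β -> 1/2, γ -> 1, δ -> 1/4};
sol = NDSolve[{x'[t] == α x[t] - β x[t] y[t], y'[t] == -γ y[t] + δ x[t] y[t],
     x[0] == 2, y[0] == 1} /. params, {x, y}, {t, 0, 20}];
ParametricPlot[Evaluate[{x[t], y[t]} /. sol], {t, 0, 20},
 AxesLabel -> {"x", "y"}, PlotRange -> All]
```

Because h stays constant along every trajectory, orbits in the quadrant x > 0, y > 0 lie on level curves of h, which is consistent with the closed curves around xC1 = (γ/δ, α/β) that the plot should display.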


8.10 Flow of the vector field

In this section we introduce some useful notions related to the concept of a dynamical system. We consider a general autonomous dynamical system (8.1),

\[ \dot{x}_a = f_a(x), \qquad a = 1, 2, \dots, n. \tag{8.30} \]

We know that the solution exists and is unique if we prescribe initial conditions

\[ x_a(0) = x_{a0}, \tag{8.31} \]

where x_{a0} are constants with the meaning of initial values of the coordinates x_a. The solution of the dynamical system is then a set of functions x_a of time,

\[ x_a(t) = x_a(t, x_0), \tag{8.32} \]

where we have explicitly emphasized that a particular solution depends on the initial values

\[ x_0 = (x_{10}, x_{20}, \dots, x_{n0}). \]

Hence, in the following, by the symbol x(t, x0) we mean the set of functions

\[ x(t, x_0) = (x_1(t), x_2(t), \dots, x_n(t)) \]

such that

\[ x(0, x_0) = x_0 \quad\text{and}\quad \frac{d}{dt}\, x_a(t, x_0) = f_a(x(t, x_0)). \tag{8.33} \]

In other words, x(t, x0) is a solution of dynamical system (8.30) with initial conditions (8.31).

It is useful to introduce a slightly more formal notation for x(t, x0). We defined the phase space M as an abstract space with coordinates x_a. For an n-dimensional dynamical system, the phase space is

\[ M = \mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n}. \]

The flow of dynamical system (8.30) is a mapping

\[ \Phi : \mathbb{R} \times M \to M \]

defined by

\[ \Phi_s(x_0) = x(s, x_0). \]

Geometrically, the flow Φs is a mapping which maps an arbitrary point x0 to the point x(s, x0), i.e. it shifts the point x0 along the phase trajectory by parametric distance s. Hence, the flow satisfies the relations

\[ \Phi_0(x_0) = x_0, \qquad \Phi_{s+t} = \Phi_s \circ \Phi_t, \qquad (\Phi_s)^{-1} = \Phi_{-s}. \]

Obviously,

\[ \left. \frac{d\Phi_s(x_0)}{ds} \right|_{s=0} = \left. \frac{d}{ds} \right|_{s=0} x(s, x_0) = f(x_0), \]

where f(x0) is the vector with components fa(x0). Thus, we can also say that the flow Φs shifts the point x0 along the vector field fa.

Let us illustrate it on the example of the familiar planar dynamical system

\[ \dot{x} = y, \qquad \dot{y} = -x, \]

so that we have

\[ f_1(x, y) = y, \qquad f_2(x, y) = -x. \]

The vector field fa can be plotted by

In[63]:= VectorPlot[{y, -x}, {x, -10, 10}, {y, -10, 10}]

Out[63]= [Figure: vector field (y, −x) on −10 ≤ x, y ≤ 10.]

This dynamical system for initial conditions

\[ x(0) = x_0, \qquad y(0) = y_0, \]

can be solved explicitly by

In[66]:= sol = DSolve[{x'[t] == y[t], y'[t] == -x[t], x[0] == x0, y[0] == y0}, {x[t], y[t]}, t]

Out[66]= {{x[t] -> x0 Cos[t] + y0 Sin[t], y[t] -> y0 Cos[t] - x0 Sin[t]}}

which shows, in the notation introduced above,

\[ x(t, x_0, y_0) = x_0 \cos t + y_0 \sin t, \qquad y(t, x_0, y_0) = -x_0 \sin t + y_0 \cos t. \]

Thus, the flow Φs maps the point (x0, y0) to the point which lies on the solution with initial conditions (x0, y0) at time s:

\[ \Phi_s(x_0, y_0) = (x_0 \cos s + y_0 \sin s,\ -x_0 \sin s + y_0 \cos s). \]

Hence, Φs(x0, y0) is the position of the system at time s for initial conditions (x0, y0). In figure 8.13 we plot the flow for initial conditions

\[ x_0 = 1, \qquad y_0 = 8. \]
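The general properties of the flow can be checked on this explicit example. The following is a minimal sketch; the function name flow is introduced here for illustration only and simply encodes the solution found by DSolve.

```mathematica
(* Hypothetical helper: the flow of x' = y, y' = -x, written as flow[s][point] *)
flow[s_][{x0_, y0_}] := {x0 Cos[s] + y0 Sin[s], -x0 Sin[s] + y0 Cos[s]}

(* Identity at s = 0 *)
flow[0][{x0, y0}]
(* {x0, y0} *)

(* Group property: flow[s + t] minus flow[s] applied after flow[t] should vanish *)
Simplify[TrigExpand[flow[s + t][{x0, y0}] - flow[s][flow[t][{x0, y0}]]]]
(* {0, 0} *)

(* Inverse: flow[-s] undoes flow[s] *)
Simplify[flow[-s][flow[s][{x0, y0}]]]
(* {x0, y0} *)

(* Derivative at s = 0 reproduces the vector field (y, -x) *)
D[flow[s][{x0, y0}], s] /. s -> 0
(* {y0, -x0} *)
```

For this system the flow is a rotation of the phase plane by the angle s, which is why composing two flows adds their parameters.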

We have seen that the curve Φs(x0) for a given x0 is a solution of the dynamical system with initial condition x(0) = x0. This curve is called the orbit of the point x0 and is denoted by

\[ \Lambda(x_0) = \{ \Phi_s(x_0) \mid -\infty < s < \infty \}. \tag{8.34} \]

Similarly, we define the positive semi-orbit and the negative semi-orbit by

\[ \Lambda^+(x_0) = \{ \Phi_s(x_0) \mid s > 0 \}, \qquad \Lambda^-(x_0) = \{ \Phi_s(x_0) \mid s < 0 \}. \tag{8.35} \]


In[187]:= IC = {x0 -> 1, y0 -> 8};
g1 = VectorPlot[{y, -x}, {x, -10, 10}, {y, -10, 10}, VectorStyle -> Orange];
g2 = ParametricPlot[{x[t], y[t]} /. sol /. IC, {t, 0, 5}, PlotStyle -> Black];
g3 = Graphics[{Black, Text[Style["x0 = Φ0(x0)", {Large}], {0.2, 9}]}];
g4 = Graphics[{Black, Text[Style["Φ5(x0)", {Large}], {-6, 4.2}]}];
Show[g1, g2, g3, g4]

Out[192]= [Figure: vector field with the orbit segment from x0 = Φ0(x0) to Φ5(x0).]

Fig. 8.13. Illustration of the flow.

8.11 Lyapunov stability

Recall that we have defined the critical point or fixed point xC of dynamical system (8.30) as a point xC for which

\[ f_a(x_C) = 0 \]

and hence the velocity vanishes there, i.e. the time derivative of each coordinate x_a is zero at xC. A system with initial conditions x0 = xC is in equilibrium in the sense that it remains at the critical point at all times, i.e.

\[ \Lambda(x_C) = \{ x_C \}. \]

In other words, the critical point xC satisfies the relation

\[ \Phi_s(x_C) = x_C \quad \text{for all } s \in \mathbb{R}. \]

We have classified critical points according to the behaviour of the orbits (phase trajectories) in the vicinity of the critical point. If the orbit remained in the vicinity of the critical point, we said that the critical point is stable. If the orbit was attracted to the critical point, it was called a stable node or stable focus, depending on the character of the system. If the orbit was circular, the critical point was called a centre. Finally, if the orbit escaped from the critical point to infinity, we called the critical point an unstable node or unstable focus. However, this analysis was performed for the linearized dynamical system. Now we can formulate stability for a general non-linear system in terms of the flow.

Let ‖·‖ be the standard norm defined on the phase space M, i.e. for any x ∈ M its norm is

\[ \|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}. \]

In general, the norm is a measure of the distance of a point x from the origin. In some situations, it is useful to introduce a different notion of the norm, for example the so-called p-norm (p is a positive integer) defined by

\[ \|x\|_p = \sqrt[p]{|x_1|^p + |x_2|^p + \cdots + |x_n|^p}. \]

In the following we will use the standard norm ‖·‖ = ‖·‖2, which is the standard Euclidean distance, as follows from the Pythagorean theorem. In general, a norm must satisfy three relations.

• Positive definiteness

\[ \|x\| \ge 0 \quad\text{and}\quad \|x\| = 0 \text{ only for } x = 0. \]

• Linearity (absolute homogeneity)

\[ \|\alpha x\| = |\alpha|\, \|x\| \]

for arbitrary real α ∈ R.

• Triangle inequality

\[ \|x + y\| \le \|x\| + \|y\|. \]

In some contexts the first condition is relaxed, i.e. we admit that there are vectors x ≠ 0 for which ‖x‖ = 0. In this case, the operation ‖·‖ is called a semi-norm. In this textbook we consider only positive definite norms satisfying the first property. Notice that positive definiteness implies that whenever

\[ \|x - y\| = 0, \]

the vectors x and y are equal, x = y.

The solution Φs(x0) is called Lyapunov stable if for any ε > 0 there exists δ > 0 such that

\[ \|x_0 - y_0\| < \delta \ \Rightarrow\ \|\Phi_s(x_0) - \Phi_s(y_0)\| < \varepsilon \quad \text{for all } s \ge 0. \]

If the solution Φs(x0) is not Lyapunov stable, it is called unstable. The solution Φs(x0) is called asymptotically stable if it is stable and, in addition, there exists δ > 0 such that

\[ \|x_0 - y_0\| < \delta \ \Rightarrow\ \lim_{s \to \infty} \|\Phi_s(x_0) - \Phi_s(y_0)\| = 0. \]
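As an illustration of the two definitions, consider two simple flows for forward times s ≥ 0. The one-dimensional system with right-hand side −x is chosen here only as an example and does not appear in the text; the second flow is the rotation obtained in section 8.10.

```latex
% Example 1 (assumed for illustration): \dot{x} = -x, with flow \Phi_s(x_0) = x_0 e^{-s}.
\[
  |\Phi_s(x_0) - \Phi_s(y_0)| = |x_0 - y_0|\, e^{-s} \le |x_0 - y_0| \qquad (s \ge 0),
\]
% so the choice \delta = \varepsilon works, and the difference tends to zero:
\[
  \lim_{s \to \infty} |\Phi_s(x_0) - \Phi_s(y_0)| = 0 .
\]
% Every solution is Lyapunov stable and asymptotically stable.

% Example 2: \dot{x} = y, \dot{y} = -x, whose flow is a rotation of the plane.
% The flow is linear and preserves the Euclidean norm, hence
\[
  \|\Phi_s(x_0) - \Phi_s(y_0)\| = \|\Phi_s(x_0 - y_0)\| = \|x_0 - y_0\| \qquad \text{for all } s,
\]
% so \delta = \varepsilon works again, but the distance does not decay:
% the solutions are Lyapunov stable without being asymptotically stable.
```

The second example matches the earlier classification: a centre keeps nearby orbits nearby without attracting them.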

9 Bifurcations

In the previous chapter we defined the concept of a dynamical system and introduced several notions related to dynamical systems. Among others, we investigated the stability of critical points. This discussion was connected with the behaviour of the phase trajectories (or orbits) in the neighbourhood of the critical point. In this chapter we analyse dynamical systems from another point of view. Instead of investigating the orbits (but using the classification introduced in the previous chapter) we investigate the influence of the parameters of the system. We will observe that there are values of the parameters for which the system can exhibit different behaviour. Which behaviour occurs depends on the circumstances, e.g. on the history of the system. Points at which the system must "decide" which behaviour to choose are called bifurcation points. These issues will be clarified and illustrated below. Bifurcation theory is a large subject and in this chapter we merely sketch the main ideas without going into depth.

9.1 Saddle-node bifurcation

The existence and properties of critical points can depend on the parameters of the dynamical system. Consider the one-dimensional dynamical system

\[ \dot{x} = \mu + x^2, \tag{9.1} \]

where µ is a real parameter. If µ > 0, there are no real critical points. For µ = 0, the only critical point is xC = 0, and for µ < 0 there are two critical points at xC = √(−µ) and xC = −√(−µ). Let us examine the character of the critical points briefly.

For µ = 0 and critical point xC = 0, the linearized version of system (9.1) reads

\[ \dot{x} = 0, \]

which shows that xC is a non-hyperbolic critical point (the eigenvalue of the Jacobi matrix has vanishing real part).

For µ < 0, the critical points are xC = ±√(−µ). We expand the function

\[ f(x) = \mu + x^2 \]

into a Taylor series in x about the point xC and, using f(xC) = 0 and f′(xC) = 2 xC, find to first order

\[ f(x) \approx f(x_C) + (x - x_C)\, f'(x_C) = \pm 2\sqrt{-\mu}\, \left( x \mp \sqrt{-\mu} \right). \]

Hence, system (9.1) linearized in the neighbourhood of the point √(−µ) reads

\[ \dot{x} = 2\sqrt{-\mu}\, \left( x - \sqrt{-\mu} \right), \]

which shows that the critical point √(−µ) is an unstable node: the coefficient of x is positive, so the deviation from the critical point grows. In the neighbourhood of the critical point −√(−µ) we have

\[ \dot{x} = -2\sqrt{-\mu}\, \left( x + \sqrt{-\mu} \right), \]

and so this critical point is a stable node. We can plot the critical points corresponding to different values of µ by the code presented in figure 9.1.

Saddle-node bifurcations occur when critical points do not exist for some values of the parameter, then a critical point suddenly appears at some value of the parameter, and this single critical point splits into two critical points for other values of the parameter. In our case, there are no critical points for µ > 0 but a critical point appears at µ = 0. This is a bifurcation point. Finally, for µ < 0 there are two critical points, one of them being stable, the other one being unstable.
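The three regimes can be checked numerically for particular values of the parameter. The following is only a sketch: the names rhs and criticalPoints are hypothetical, and stability is decided from the sign of the derivative of the right-hand side at each real root, as in the linearization above.

```mathematica
(* Right-hand side of (9.1); rhs and criticalPoints are illustrative names *)
rhs[x_, μ_] := μ + x^2

(* Real critical points for a numeric parameter value, each with a stability label *)
criticalPoints[μ0_?NumericQ] := Module[{roots, slope},
  roots = DeleteDuplicates[(x /. #) & /@ Solve[rhs[x, μ0] == 0, x, Reals]];
  Table[
   slope = D[rhs[x, μ0], x] /. x -> r;   (* derivative of the right-hand side at the root *)
   {r, Which[slope < 0, "stable", slope > 0, "unstable", True, "non-hyperbolic"]},
   {r, roots}]]

criticalPoints[-1]   (* {{-1, "stable"}, {1, "unstable"}} *)
criticalPoints[0]    (* {{0, "non-hyperbolic"}} *)
criticalPoints[1]    (* {} *)
```

The symbol x is assumed to have no value assigned when the function is called, since it is used as the unknown in Solve.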

9.2 Transcritical bifurcations

Now consider the dynamical system

\[ \dot{x} = \mu\, x - x^2 = x\, (\mu - x). \tag{9.2} \]

Regardless of the value of µ, there is always one critical point at xC = 0 and one critical point at xC = µ. Hence, unlike the case of saddle-node bifurcations, the number of critical points does not change. However, we will show that the character of these critical points changes at the bifurcation point.

The first critical point is xC = 0. After linearization of system (9.2) we find

\[ \dot{x} = \mu\, x. \]

In[61]:= Plot[{Sqrt[-μ], -Sqrt[-μ]}, {μ, -2, 0.5},
  PlotStyle -> {{Dashed, Thick}, {Thick}}, AspectRatio -> 1, AxesLabel -> {"μ", "xC"},
  BaseStyle -> {FontSize -> 15},
  Epilog -> {Disk[{0, 0}, 0.03],
    Text["unstable node", {-1, 1.3}],
    Text["stable node", {-1, -1.3}],
    Text["bifurcation point", {-0.6, 0.1}]}
 ]

Out[61]= [Figure: critical points xC versus µ; dashed upper branch labelled "unstable node", solid lower branch labelled "stable node", bifurcation point at the origin.]

Fig. 9.1. Saddle-node bifurcation. Diagram for one-dimensional system (9.1).

Obviously, for µ > 0 this critical point is unstable while for µ < 0 it is stable. On the other hand, linearization of system (9.2) about the second critical point xC = µ gives

\[ \dot{x} = \mu^2 - \mu\, x. \tag{9.3} \]

This is an inhomogeneous linear equation with constant coefficients and can be solved by elementary methods. First we write down the corresponding homogeneous equation

\[ \dot{x} = -\mu\, x, \]

which integrates to

\[ x_H = C\, e^{-\mu t}, \]

where the subscript H stands for "homogeneous". Next we need to find any particular solution of the original inhomogeneous equation. This is trivial, however, for obviously the choice x = µ is a solution to equation (9.3). Since the general solution of a linear inhomogeneous equation is the sum of a particular solution and the general solution of the homogeneous equation, the general solution to equation (9.3) is

\[ x = \mu + C\, e^{-\mu t}. \]

The constant µ does not affect the character of the critical point (prove!) and only the exponential term matters. We can see that for µ > 0 the critical point is stable while for µ < 0 it is unstable.

To summarize, we have found two critical points

\[ x_C = 0 \quad\text{and}\quad x_C = \mu, \]

which change their character at µ = 0. The character of both critical points is depicted in figure 9.2.

Transcritical bifurcations occur when there are two critical points for all values of the parameter. However, at the bifurcation point (in our case µ = 0), these critical points interchange their character: the point which was stable becomes unstable and vice versa.
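The two linearizations and the general solution of (9.3) can be verified symbolically. This is a minimal sketch; the names g, xGen and c (standing for the integration constant C) are introduced here for illustration.

```mathematica
(* Right-hand side of (9.2); g, xGen and c are illustrative names *)
g[x_] := μ x - x^2

(* Slopes of the right-hand side at the critical points xC = 0 and xC = μ *)
{g'[0], g'[μ]}
(* {μ, -μ} *)

(* General solution of the linearized equation (9.3) *)
xGen[t_] := μ + c Exp[-μ t]

(* Residual of (9.3); it should vanish identically *)
Simplify[xGen'[t] - (μ^2 - μ xGen[t])]
(* 0 *)
```

The opposite signs of the two slopes show the exchange of stability directly: whenever µ ≠ 0, exactly one of the two critical points is stable.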

9.3 Pitchfork bifurcation

Next we examine the system

\[ \dot{x} = \mu\, x - x^3. \tag{9.4} \]

Notice that this system is invariant under the reflection x ↦ −x, for under this transformation we have

\[ x \mapsto -x, \qquad \dot{x} \mapsto -\dot{x}, \qquad x^3 \mapsto -x^3, \]

and hence

\[ \dot{x} = \mu\, x - x^3 \ \mapsto\ -\dot{x} = -\mu\, x + x^3 \ \Rightarrow\ \dot{x} = \mu\, x - x^3. \]

Thus, equation (9.4) does not change its form under the reflection, i.e. the reflection is a symmetry of equation (9.4). Pitchfork bifurcations occur often in systems possessing some kind of symmetry.

In[66]:= xC1[μ_] = Piecewise[{{0, μ < 0}, {μ, μ > 0}}];
xC2[μ_] = Piecewise[{{μ, μ < 0}, {0, μ > 0}}];

In[83]:= Plot[{xC1[μ], xC2[μ]}, {μ, -2, 2},
  PlotStyle -> {{Blue, Thick}, {Dashed, Red, Thick}}, AspectRatio -> 1, AxesLabel -> {"μ", "xC"},
  Axes -> {False, True}, BaseStyle -> {FontSize -> 15},
  Epilog -> {Disk[{0, 0}, 0.03],
    Text["stable", {-1, 0.1}],
    Text["unstable", {1, 0.1}],
    Text[Style["μ", FontSize -> 15], {1.9, -0.1}]}
 ]

Out[83]= [Figure: critical points xC versus µ for −2 ≤ µ ≤ 2; solid blue branch stable, dashed red branch unstable.]

Fig. 9.2. Transcritical bifurcation diagram for system (9.2).

There is always a critical point xC = 0 regardless of the value of µ. Linearization of system (9.4) yields

\[ \dot{x} = \mu\, x \]

and so this critical point is stable for µ < 0 and unstable for µ > 0.

For µ > 0 there are two other critical points xC = ±√µ. By linearization we find

\[ \dot{x} = -2\mu\, \left( x \mp \sqrt{\mu} \right), \]

which shows that (ignoring the constant term as in the previous section) both critical points ±√µ are stable. Indeed, µ > 0 and hence the coefficient of x is always −2µ < 0. All possibilities are plotted in figure 9.3.

In[99]:= xC1[μ_ /; μ <= 0] = 0;
xC2[μ_ /; μ > 0] = Sqrt[μ];
xC3[μ_ /; μ > 0] = -Sqrt[μ];
xC4[μ_ /; μ > 0] = 0;

In[104]:= Plot[{xC1[μ], xC2[μ], xC3[μ], xC4[μ]}, {μ, -2, 2},
  PlotStyle -> {{Blue}, {Blue}, {Blue}, {Red, Dashed}},
  AspectRatio -> 1, AxesLabel -> {"μ", "xC"},
  Epilog -> {Disk[{0, 0}, 0.03],
    Text["stable", {-1, 0.1}],
    Text["unstable", {1, 0.1}]}
 ]

Out[104]= [Figure: critical points xC versus µ; the origin is stable for µ < 0 and unstable for µ > 0, with two stable branches ±√µ for µ > 0.]

Fig. 9.3. Supercritical pitchfork bifurcation diagram for system (9.4).

Bifurcations of the type discussed are called supercritical pitchfork bifurcations.

The dynamical system

\[ \dot{x} = \mu\, x + x^3 \]

is a typical system showing the so-called subcritical pitchfork bifurcation. Show by standard analysis that the bifurcation diagram for this system is correctly depicted in figure 9.4.

9.4 Example

Now let us see a non-trivial example of a pitchfork bifurcation. Let the system be

\[ \dot{x} = \mu\, x + y + \sin x, \qquad \dot{y} = x - y. \tag{9.5} \]

Our task is to determine the bifurcation point and the type of bifurcation. We will use Mathematica to carry out particular steps.

First we find the critical points by setting both time derivatives in (9.5) to zero. The second equation immediately gives y = x and hence the equation for x reads

\[ \mu\, x + x + \sin x = 0. \tag{9.6} \]

Clearly, a general solution cannot be found analytically but we can see that for arbitrary µ there is always a solution

\[ x_C = y_C = 0. \]

Let us determine the character of this critical point. The Jacobi matrix of system (9.5) evaluated at the origin is

\[ J = \begin{pmatrix} \mu + 1 & 1 \\ 1 & -1 \end{pmatrix} \tag{9.7} \]

and its eigenvalues can be found by Mathematica:

In[30]:= J = {{μ + 1, 1}, {1, -1}};
sys = Eigenvalues[J]

Out[31]= {1/2 (μ - Sqrt[8 + 4 μ + μ^2]), 1/2 (μ + Sqrt[8 + 4 μ + μ^2])}

In[11]:= xC1[μ_ /; μ <= 0] = 0;
xC2[μ_ /; μ < 0] = Sqrt[-μ];
xC3[μ_ /; μ < 0] = -Sqrt[-μ];
xC4[μ_ /; μ > 0] = 0;

In[17]:= Plot[{xC1[μ], xC2[μ], xC3[μ], xC4[μ]}, {μ, -2, 2},
  PlotStyle -> {{Blue}, {Blue}, {Blue}, {Red, Dashed}},
  Epilog -> {{PointSize[Large], Point[{0, 0}]},
    Text["stable", {-1, 0.1}],
    Text["unstable", {1, 0.1}]}
 ]

Out[17]= [Figure: critical points xC versus µ; the origin is labelled "stable" for µ < 0 and "unstable" for µ > 0, with two further branches ±√(−µ) for µ < 0.]

Fig. 9.4. Subcritical pitchfork bifurcation diagram for the system with right-hand side µx + x³.

Although it is straightforward to investigate the behaviour of the eigenvalues as functions of the parameter µ, it is even easier to use Mathematica to plot the dependence of λ1 and λ2 on µ.

In[14]:= λ1[μ_] = sys[[1]]
λ2[μ_] = sys[[2]]
Plot[{λ1[μ], λ2[μ]}, {μ, -10, 10}, PlotStyle -> {Blue, Red}]

Out[14]= 1/2 (μ - Sqrt[8 + 4 μ + μ^2])

Out[15]= 1/2 (μ + Sqrt[8 + 4 μ + μ^2])

Out[16]= [Figure: plot of λ1(µ) (blue) and λ2(µ) (red) for −10 ≤ µ ≤ 10.]

Hence, for all values of µ we have λ1 < 0 while λ2 changes sign at µ = −2. That means that for µ < −2, when both eigenvalues are negative, the critical point is a stable node. For µ > −2, the critical point is a saddle point because the eigenvalues have different signs.

Clearly, the point µ = −2 is a candidate for a bifurcation point. Since we cannot solve equation (9.6) exactly, we restrict our attention to the neighbourhood of the potential bifurcation point µ = −2. Critical points are roots of the function

\[ R_\mu(x) = (\mu + 1)\, x + \sin x, \]

which is the left-hand side of (9.6). In figure 9.5 we plot this function for three values of µ. We can see that critical points different from the origin appear only for µ > −2. The approximate location of these critical points can be found by expanding the function sin x in (9.6) up to third order,

\[ \sin x \approx x - \frac{1}{3!}\, x^3, \]

so that this equation simplifies to

\[ x\, (\mu + 2) - \frac{1}{6}\, x^3 = 0. \]

One solution is, of course, x = 0; the other two are

\[ x = \pm \sqrt{6\, (\mu + 2)}. \tag{9.8} \]
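The third-order approximation can be reproduced and compared with a numerical root. The following is a sketch only: the names approx and μ0 are introduced for illustration, and the value µ = −1.9 is an arbitrary choice close to the bifurcation point.

```mathematica
(* Third-order expansion of the left-hand side of (9.6); approx is an illustrative name *)
approx = Normal[Series[μ x + x + Sin[x], {x, 0, 3}]];

(* Difference from the truncated equation used in the text; it should vanish *)
Simplify[approx - (x (μ + 2) - x^3/6)]
(* 0 *)

(* Estimate (9.8) next to a numerical root of (9.6) for an assumed value of μ *)
With[{μ0 = -1.9},
 {Sqrt[6 (μ0 + 2)],
  x /. FindRoot[μ0 x + x + Sin[x] == 0, {x, Sqrt[6 (μ0 + 2)]}]}]
```

The two numbers returned by the last command should lie close to each other, and the agreement is expected to worsen as µ moves away from −2, where the neglected higher-order terms of sin x become important.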

[Figure: plot of Rµ(x) = (µ + 1) x + sin x for −1 ≤ x ≤ 1 and µ = −1.9, µ = −2, µ = −2.1.]

Fig. 9.5. Plot of the function Rµ(x) = (µ + 1) x + sin x. Its roots are critical points of system (9.5). For µ ≤ −2, the origin x = 0 is the only critical point; for µ > −2 there are two additional critical points symmetric about the origin.

Now we can determine the character of the bifurcation point even without an analysis of the new critical points. Recall that the origin is a critical point, stable for µ < −2 and unstable for µ > −2. New critical points emerge at the bifurcation point and exist for µ > −2. Hence, the bifurcation diagram is similar to that in figure 9.3. We can deduce that the bifurcation is supercritical and the two new critical points are stable.

In Mathematica we can easily find precise locations of the critical points numerically using the function FindRoot. This function needs a starting point and we choose this starting point to be the approximate solution (9.8). The full Mathematica code for plotting the correct bifurcation diagram in the neighbourhood of the bifurcation point µ = −2 is shown in figure 9.6.

In[218]:= cp[μ_ /; μ > -2] := FindRoot[μ x + x + Sin[x] == 0, {x, Sqrt[6 (μ + 2)]}][[1, 2]]

In[229]:= xC1[μ_ /; μ <= -2] = 0;
xC2[μ_ /; μ > -2] = 0;
xC3[μ_ /; μ > -2] = cp[μ];
xC4[μ_ /; μ > -2] = -cp[μ];

In[262]:= g = Plot[{xC1[μ], xC2[μ], xC3[μ], xC4[μ]},
  {μ, -3, -1}, PlotStyle -> {{Blue}, {Red, Dashed}, {Blue}, {Blue}},
  Epilog -> {Disk[{0, 0}, 0.03],
    Text["stable", {-2.5, 0.2}],
    Text["stable", {-1.5, 2.5}],
    Text["unstable", {-1.2, 0.2}],
    Text[Style["μ", FontSize -> 15], {1.9, -0.1}],
    {PointSize[Large], Point[{-2, 0}]},
    Text[Style["μ = -2", FontSize -> 15], {-1.8, 0.2}]}
 ]

Out[262]= [Figure: critical points xC versus µ near µ = −2; the origin is stable for µ < −2 and unstable for µ > −2, with two stable branches emerging at µ = −2.]

Fig. 9.6. Supercritical bifurcation point for dynamical system (9.5).

A Important commands in Mathematica

A.1 D-derivative

Derivatives in Mathematica can be computed in several ways. A command of the form

D[f, x]

differentiates the function f with respect to the variable x. If we need the n-th order derivative of f, we use

D[f, {x, n}]

Similarly, mixed partial derivatives with respect to several variables can be calculated by

D[f, x, y]

which is the equivalent of

\[ \frac{\partial^2 f}{\partial x\, \partial y}. \]

For example, the commands

D[Sin[x^2], x]

D[x^3, {x, 2}]

D[y x^2 + x y^2, x, y]

are equivalents of the mathematical expressions

\[ \frac{d}{dx} \sin x^2, \qquad \frac{d^2}{dx^2}\, x^3, \qquad \frac{\partial^2}{\partial x\, \partial y} \left( y\, x^2 + x\, y^2 \right) \]

and produce the following output:

2 x Cos[x^2]

6 x

2 x + 2 y

A.2 Table

The command Table[...] creates lists of elements with one or more dimensions. A one-dimensional list can be created by

Table[expr, {i, imin, imax}]

where expr is some expression depending on the variable i. The command Table successively substitutes the values of i into the expression expr and produces a list of expressions. For example, the command

squares = Table[i^2, {i, 1, 5}]

produces the list

{1, 4, 9, 16, 25}

which is now stored in the variable squares. In order to access individual elements of the list, use the double square brackets [[ and ]]. For example, the third element of the list squares can be accessed via

squares[[3]]

which returns

9

B Some features of Mathematica

B.1 Rules of replacement

One of the most powerful tools in Mathematica is rule-based replacement. We start with a simple example. Suppose we have the trivial expression

y

and we want to replace the symbol y by some more complicated expression, say y = x². Let us write

y /. y -> x^2

In the previous code, the symbol /. means that we are going to use some rules of replacement. The rule itself is

y -> x^2

and says that any occurrence of the symbol y will be replaced by the expression x². This can be useful when the expression is more complicated. Example:

x + y^2 - 1/y /. y -> x^2

will replace all occurrences of the symbol y in the expression x + y² − 1/y by x², so that the result is

-(1/x^2) + x + x^4

We can define a list of rules as well. Imagine we want to replace x and y simultaneously in some expression; for example, we want to replace x by x − 1 and y by 1 − y² in the expression x² + y²:

x^2 + y^2 /. {x -> x - 1, y -> 1 - y^2}

which yields

(-1 + x)^2 + (1 - y^2)^2

Sometimes it is useful to define the rules separately in order to increase the readability of the code. The previous example is equivalent to the following:

rules = {x -> x - 1, y -> 1 - y^2};

x^2 + y^2 /. rules
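The word "simultaneously" matters: a list of rules is applied in a single pass, whereas chaining two replacements applies them one after another. The following short sketch, with an expression chosen only for illustration, shows the difference.

```mathematica
(* One pass with a list of rules: x and y are swapped *)
x^2 + 2 y /. {x -> y, y -> x}
(* 2 x + y^2 *)

(* Two successive replacements: x becomes y first, then every y becomes x *)
x^2 + 2 y /. x -> y /. y -> x
(* 2 x + x^2 *)
```

In the second form the first replacement produces y² + 2y, and the second one then turns every y, including the newly created ones, into x.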

B.2 Functions

In Mathematica you can define functions of any type and there are many features to be covered. Here we discuss only what is necessary for the purposes of our textbook.

A function of one or more variables is defined according to the scheme

funcName[var1_, var2_, ...] = expr

where funcName is the name of the new function (the name itself must not contain an underscore). In the square brackets you have to enumerate all variables on which the function depends. Notice the underscore symbol after the name of each variable. Assignment is performed via the traditional symbol =. Finally, on the right-hand side there is an expression for the function.

For example, you can define the function

\[ f = 3x\, e^{-x^2} \]

as

f[x_] = 3 x Exp[-x^2];

Now you can evaluate it at some point, say 10, by

f[10]

which yields

\[ \frac{30}{e^{100}}. \]

If you need a numerical value, type

f[10] // N

to find the result 1.11602 × 10⁻⁴².

Let us see an example of a function of more variables.

f[x_, y_, z_] = x^2 + y^2 + z^2

To evaluate this function at some point, say (1, 2, 3), type

f[1, 2, 3]

to get the number 14.

B.3 Pure functions

Pure functions are very useful constructions in Mathematica. In mathematics there is a difference between $f$ and $f(x)$, although these symbols are (in some contexts) used as equivalent. The symbol $f$ is a function of, say, one variable $x$, which means that it maps a real number to a real number; mathematically,

$f : \mathbb{R} \mapsto \mathbb{R}.$

On the other hand, the symbol $f(x)$ is the value of the function $f$ at the point $x$. More precisely, $f$ is a set of ordered pairs $(x, y)$ such that there is only one $y$ for each $x$. If a pair $(x, y)$ is an element of $f$, i.e. $(x, y) \in f$, we usually write

$y = f(x).$

Thus, $f$ is a set of ordered pairs of real numbers, while $f(x)$ is a single real number, namely the value of $f$ at the point $x$.

Let us turn back to Mathematica. When you write, for example,

f[x_] = 1 + x^2

you tell Mathematica that the value of the function f at the point x is $f(x) = 1 + x^2$. But the name of the argument is irrelevant, for if you write

f[q_] = 1 + q^2

you define exactly the same function! The name of the argument is only formal. The alternative is to use a pure function.

Consider the following definition:

f = Function[ 1 + #^2 ]

Here we do not name the arguments. The sharp symbol # stands for the argument of the function regardless of its name. You can verify that the function f defined in this way behaves like the function f[x] or f[q] defined above. Similarly, you can define a function of several variables by

f = Function[ #1^2 + #2^2 ]

where the symbols #1 and #2 stand for the first and the second argument, respectively. Calling

f[x,y]


now yields

x^2 + y^2,

calling

f[1, 3]

yields the number 10.

A pure function can also be defined without the command Function, using the symbol &. The following three lines are equivalent:

f[x_] = 1 + x^2

f = Function[ 1+#^2 ]

f = (1 + #^2)&

The notation with the symbol & is particularly useful if we need the function at only one place and not later. Then it is unnecessary to define the function separately. For example, suppose that you are given a list

list = {1, 2, 3, 4, 5};

and you want to apply the function $f(x) = 1 + x^2$ to each element of the list. We can use the operator /@:

(1+#^2)& /@ list

Here we defined a pure function (1 + #^2)& which, as we have seen, is an abstract way of defining the function $1 + x^2$. The operator /@ now substitutes each element of the list into this pure function and produces the list

{2, 5, 10, 17, 26}.
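The operator /@ is shorthand for the function Map. The following sketch shows the same computation in long form, together with two related variants: Function with a named argument, and a pure function applied directly to its arguments.

```mathematica
list = {1, 2, 3, 4, 5};

Map[(1 + #^2) &, list]              (* same as (1 + #^2)& /@ list: {2, 5, 10, 17, 26} *)
Map[Function[x, 1 + x^2], list]     (* pure function with a named argument: {2, 5, 10, 17, 26} *)
(#1^2 + #2^2) &[1, 3]               (* pure function applied directly: 10 *)
```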

B.4 Expressions

Anything you type in Mathematica is called an expression, and expressions can be divided into two groups, atomic and composed. Atomic expressions are the simplest elements, e.g. numbers and symbols (including names of functions such as Sin). Each expression has a so-called head, which can be found using the function Head. For example, try the following code:

Head[2]

Head[4.5]

Head[2 + 3 I ]

Mathematica returns “values”

Integer, Real, Complex

which means that 2 was recognized as an integer, 4.5 as a real, and 2+3i as a complex number. Mathematica’s power rests in its ability to work with symbolic expressions. If an atomic expression is not identified as a number, its head is Symbol. Verify this fact for the following (assuming that f currently has no value assigned, e.g. after Clear[f]):

Head[x]

Head[Sin]

Head[f]

etc.

The atomic expressions we have seen above can be combined into composed expressions. For example, the symbols x and y are atomic expressions, but their sum x + y is a composed expression. The head of the expression x+y is Plus (check!). If you want to access individual parts of a composed expression, you can use the function Part. For example,

Part[ x+y, 2 ]

returns the second part of the composed expression x + y, which is y. We can also list all parts of a composed expression by Level:

Level[ x+y-z+b, 1 ]

yields {b, x, y, -z}. Try the following:

Head[x + y]

Head[x y]

Head[ x^y ]

You can see that Mathematica returns Plus, Times, Power. If, however, you type

Head[x-y]

Mathematica returns Plus again (did you expect “minus”?). The reason is obvious, for if we type

Level[ x-y, 1 ]

Mathematica returns {x, -y}. Thus, Mathematica treats the expression $x - y$ as a sum of $x$ and $-y$. Typing

Head[-y]

Level[-y,1]

yields


Times

{-1, y}

Therefore, $-y$ is a product of $-1$ and $y$. The reader is invited to experiment with several expressions in order to get a feeling for the structure of Mathematica.
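The internal structure discussed above can also be displayed at once with the function FullForm, and the function AtomQ tests whether an expression is atomic. A short sketch, assuming that x and y have no values assigned:

```mathematica
Clear[x, y];
FullForm[x - y]      (* Plus[x, Times[-1, y]] *)
FullForm[x/y]        (* Times[x, Power[y, -1]] *)
Head[x/y]            (* Times *)
AtomQ[x]             (* True *)
AtomQ[x + y]         (* False *)
```

Just as there is no "minus" head, there is no "divide" head: the quotient x/y is stored as the product of x and y raised to the power -1.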

B.5 Working with heads

Heads of arbitrary expressions can be replaced without changing the structure of the expression. For example, the expressions

x + y + z

x y z

{x, y, z}

all have the same structure and differ only by the head. We can verify that by the functions Head and Level. These functions reveal that the head of the first expression is Plus, the head of the second one is Times and the head of the third expression is List. Nevertheless, calling the function Level shows that the structure of all expressions is

{x, y, z}.

Therefore, by changing the head, we can easily convert those expressions into one another. The head of an expression can be changed by the function Apply, as in the following example:

Apply[ Plus, {x, y, z} ]

turns the list {x, y, z} into the expression x+y+z. The same operation can be written in an abbreviated form as

Plus @@ {x, y, z}

The head Plus is applied by operator @@ to the list on the right hand side.
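The same operator converts between the other heads mentioned above. A brief sketch, assuming that x, y and z have no values assigned:

```mathematica
Clear[x, y, z];
Times @@ {x, y, z}        (* x y z *)
List @@ (x + y + z)       (* {x, y, z} *)
Plus @@ {1, 2, 3, 4, 5}   (* 15 *)
```

The last line shows a common use of this idiom: summing the elements of a list by replacing its head List with Plus.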

C Shortcuts in Mathematica

C.1 Greek letters

Greek letters can be typed in several ways. The most convenient is to use the following table:

α  ESC a ESC    ι  ESC i ESC    σ  ESC s ESC
β  ESC b ESC    κ  ESC k ESC    τ  ESC t ESC
γ  ESC g ESC    λ  ESC l ESC    φ  ESC f ESC
δ  ESC d ESC    μ  ESC m ESC    χ  ESC c ESC
ε  ESC e ESC    ν  ESC n ESC    ψ  ESC y ESC
ζ  ESC z ESC    ξ  ESC x ESC    ω  ESC w ESC
η  ESC h ESC    π  ESC p ESC
θ  ESC q ESC    ρ  ESC r ESC

For example, to type α just press the Escape key, then type a and press Escape again. Mathematica will automatically display the symbol α. Another way is to use the table

α  \[Alpha]      ι  \[Iota]      σ  \[Sigma]
β  \[Beta]       κ  \[Kappa]     τ  \[Tau]
γ  \[Gamma]      λ  \[Lambda]    φ  \[Phi]
δ  \[Delta]      μ  \[Mu]        χ  \[Chi]
ε  \[Epsilon]    ν  \[Nu]        ψ  \[Psi]
ζ  \[Zeta]       ξ  \[Xi]        ω  \[Omega]
η  \[Eta]        π  \[Pi]
θ  \[Theta]      ρ  \[Rho]

D To do

• Rotation matrices
• Full analysis of chaotic pendulum
• Matrix eigenvalues
• Volterra-Lotka equations
• Pictures on Lyapunov stability
• More coordinate systems
