
A Geometric Approach to Linear Ordinary Differential Equations

R.C. Churchill

Hunter College and the Graduate Center of CUNY,

and the University of Calgary

Address for Correspondence

Department of Mathematics

Hunter College

695 Park Avenue, New York, NY 10021, USA

October 19, 2006


Prepared for the

Kolchin Seminar on Differential Algebra

Graduate Center, City University of New York

13 October 2006

Amended and Corrected, 19 October 2006

Abstract

We offer a formulation of linear ordinary differential equations midway between what one encounters in a first undergraduate ODE course and what one encounters in a graduate Differential Geometry course (in the latter instance under the heading of “connections”). Analogies with elementary linear algebra are emphasized; no familiarity with Differential Geometry is assumed.

Contents

Introduction

§1. Preliminaries on Derivations

§2. Differential Modules

§3. nth-Order Linear Differential Equations

§4. Dimension Considerations

§5. Fundamental Matrix Solutions

§6. Dual Structures and Adjoint Equations

§7. Cyclic Vectors

§8. Extensions of Differential Structures

§9. The Differential Galois Group

Bibliography


Introduction

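Let $V$ be a finite-dimensional vector space over a field $K$ and let $T : V \to V$ be a $K$-linear operator. In a first course in linear algebra one learns that $T$ can be studied via matrices as follows.

• Select an ordered basis $e = (e_1, e_2, \dots, e_n)$ of $V$ and let $B = (b_{ij})$ be the $e$-matrix of $T$, i.e., the $n \times n$ matrix with entries $b_{ij} \in K$ defined by

(i)  $Te_j = \sum_{i=1}^{n} b_{ij} e_i, \quad j = 1, 2, \dots, n.$

• Let $\mathrm{col}_n(K) \simeq K^n$ denote the $K$-space of $n \times 1$ column vectors of elements of $K$, i.e., $n \times 1$ matrices, and let $\beta_e : V \to \mathrm{col}_n(K)$ be the isomorphism defined by

(ii)  $v = \sum_j k_j e_j \in V \;\mapsto\; v_e := \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{pmatrix} \in \mathrm{col}_n(K).$

• Let $T_B : \mathrm{col}_n(K) \to \mathrm{col}_n(K)$ denote left multiplication by $B$, i.e., the function $x \in \mathrm{col}_n(K) \mapsto Bx \in \mathrm{col}_n(K)$. Then $T_B$ is $K$-linear and one has a commutative diagram

(iii)  $\begin{array}{ccc} V & \xrightarrow{\;T\;} & V \\ {\scriptstyle \beta_e}\downarrow & & \downarrow{\scriptstyle \beta_e} \\ \mathrm{col}_n(K) & \xrightarrow{\;T_B\;} & \mathrm{col}_n(K) \end{array}$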

The commutativity of the diagram suggests that anything one wants to know about $T$ should somehow be reflected in the $K$-linear mapping $T_B$ or, better yet, in the matrix $B$ alone. The hope is that information desired about $T$ is more easily extracted from $B$.

Example : An ordered pair $(\lambda, v) \in K \times (V \setminus \{0\})$ is an eigenpair for $T$ if

(iv)  $Tv = \lambda v$;

one calls $\lambda$ an eigenvalue of $T$, $v$ an eigenvector, and the two entities are said to be “associated.”


We interpret (iv) geometrically: the effect of $T$ on $v$ is to “scale” this vector by a factor of $\lambda$. The idea is easy enough to comprehend, but suppose one actually needs to compute such pairs?

In that case one switches from $T$ to the matrix $B$: one learns that the eigenvalues of $T$ are the roots of the characteristic polynomial

$\mathrm{char}(B) := \det(xI - B)$

of $B$, and once these have been determined one can begin searching for associated eigenvectors.

Of course the matrix $B$ introduced above is not the unique “matrix representation” of $T$: a different choice of basis results in a different matrix, say $A$. One learns that $A$ and $B$ must be “similar,” i.e., that $A = PBP^{-1}$ for some non-singular $n \times n$ matrix $P$, and that similar matrices have the same characteristic polynomial. This shows that the characteristic polynomial is something intrinsically associated with $T$, not simply with matrix representations of $T$. In particular, one can unambiguously define the characteristic polynomial $\mathrm{char}(T)$ of $T$ by $\mathrm{char}(T) := \mathrm{char}(B)$. Since the determinant (up to sign) and the negative of the trace of $B$ appear as coefficients of this polynomial, these entities are also directly associated with $T$, not simply with matrix representations thereof. One can therefore define $\det(T) := \det(B)$ and $\mathrm{tr}(T) := \mathrm{tr}(B)$, again without ambiguity.

To summarize: there are entities intrinsically associated with $T$ which can easily be computed from matrix representations of this operator.

In these notes we will view linear ordinary differential equations in an analogous manner. Specifically, we will explain how to think of a first order system

(v)  $\begin{aligned} x_1' + b_{11}x_1 + b_{12}x_2 + \cdots + b_{1n}x_n &= 0 \\ x_2' + b_{21}x_1 + b_{22}x_2 + \cdots + b_{2n}x_n &= 0 \\ &\;\;\vdots \\ x_n' + b_{n1}x_1 + b_{n2}x_2 + \cdots + b_{nn}x_n &= 0 \end{aligned}$

as a “basis representation” of a geometric entity which we call a “differential module.” For our purposes the fundamental object intrinsically associated with such an entity is the “differential Galois group,” a group traditionally associated with basis representations as in (v).

We need some preliminaries on derivations.


1. Preliminaries on Derivations

In this section $R$ is a (not necessarily commutative) ring with multiplicative identity 1, with $1 = 0$ allowed (in which case $R = 0$).

An additive group endomorphism $\delta : r \in R \mapsto r' \in R$ is a derivation if the product or Leibniz rule

(1.1)  $(rs)' = rs' + r's$

holds for all $r, s \in R$. One also writes $r'$ as $r^{(1)}$ and defines $r^{(n)} := (r^{(n-1)})'$ for $n \ge 2$. The notation $r^{(0)} := r$ proves convenient.

The usual differentiation operators $\frac{d}{dz}$ on the polynomial ring $\mathbb{C}[z]$ and the quotient field $\mathbb{C}(z)$ are the basic examples of derivations. The second can be viewed as a derivation on the field $\mathcal{M}(\mathbb{P}^1)$ of meromorphic functions on the Riemann sphere by means of the usual identification $\mathbb{C}(z) \simeq \mathcal{M}(\mathbb{P}^1)$. The same derivation on $\mathcal{M}(\mathbb{P}^1)$ is described in terms of the usual coordinate $t = 1/z$ at $\infty$ by $-t^2 \frac{d}{dt}$.

Another example of a derivation is provided by the zero mapping $r \in R \mapsto 0 \in R$; this is the trivial derivation.

For an example involving a non-commutative ring choose an integer $n > 1$, let $R$ be the collection of $n \times n$ matrices with entries in a commutative ring $A$ with a derivation $a \mapsto a'$, and for $r = (a_{ij}) \in R$ define $r' := (a_{ij}')$.

When $r \mapsto r'$ is a derivation on $R$ one sees from (1.1) that $1' = (1 \cdot 1)' = 1 \cdot 1' + 1' \cdot 1 = 1' + 1'$, and as a result that

(1.2)  $1' = 0.$

When $r \in R$ is a unit it then follows from $1 = rr^{-1}$ and (1.1) that

$0 = (rr^{-1})' = r \cdot (r^{-1})' + r' \cdot r^{-1},$

whence

(1.3)  $(r^{-1})' = -r^{-1} \cdot r' \cdot r^{-1}.$

This formula is particularly useful in the matrix example given above. When $R$ is commutative it assumes the more familiar form

(1.4)  $(r^{-1})' = -r' r^{-2}.$
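As a quick sanity check of (1.3) in the matrix example, here is a small worked instance. The particular matrix $r$ is an illustrative choice of ours, not taken from the text; its entries lie in $\mathbb{C}(z)$ and the derivation is entrywise $d/dz$.

```latex
\[
r = \begin{pmatrix} 1 & z \\ 0 & 1 \end{pmatrix}, \quad
r^{-1} = \begin{pmatrix} 1 & -z \\ 0 & 1 \end{pmatrix}, \quad
r' = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
(r^{-1})' = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix},
\]
\[
-r^{-1}\, r'\, r^{-1}
= -\begin{pmatrix} 1 & -z \\ 0 & 1 \end{pmatrix}
   \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
   \begin{pmatrix} 1 & -z \\ 0 & 1 \end{pmatrix}
= -\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
= (r^{-1})' .
\]
```

Here the order of the factors happens not to matter, but for general non-commuting matrices the form (1.3), rather than (1.4), is the one to use.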


The ring assumption suggests the generality of the concept of a derivation, but our main interest will be in derivations on fields. In this regard we note that any derivation on an integral domain extends uniquely, via the quotient rule, to the quotient field.

Henceforth $K$ denotes a differential field (of characteristic 0), i.e., a field $K$ equipped with a non-trivial derivation $k \mapsto k'$. By a constant we mean an element $k \in K$ satisfying $k' = 0$, e.g., we see from (1.2) that $1 \in K$ has this property. Indeed, the collection $K_C \subset K$ of constants is easily seen to be a subfield containing $\mathbb{Q}$; this is the field of constants (of $K = (K, \delta)$).

When $K = \mathbb{C}(z)$ with derivation $\frac{d}{dz}$ we have $K_C = \mathbb{C}$. The constants of the differential field $\mathcal{M}(\mathbb{P}^1)$ defined above are the constant functions $f : \mathbb{P}^1 \to \mathbb{C}$.

The determinant

(1.5)  $W := W(k_1, \dots, k_n) := \det \begin{pmatrix} k_1 & k_2 & \cdots & k_n \\ k_1' & k_2' & \cdots & k_n' \\ k_1^{(2)} & k_2^{(2)} & \cdots & k_n^{(2)} \\ \vdots & \vdots & & \vdots \\ k_1^{(n-1)} & k_2^{(n-1)} & \cdots & k_n^{(n-1)} \end{pmatrix}$

is the Wronskian of the elements $k_1, \dots, k_n \in K$. This entity is useful for determining linear (in)dependence over $K_C$.

Proposition 1.6 : Elements $k_1, \dots, k_n$ of a differential field $K$ are linearly dependent over the field of constants $K_C$ if and only if their Wronskian is 0.

Proof :

⇒ For any $c_1, \dots, c_n \in K_C$ and any $0 \le m \le n$ we have $(\sum_j c_j k_j)^{(m)} = \sum_j c_j k_j^{(m)}$. In particular, when $\sum_j c_j k_j = 0$ the same equality holds when $k_j$ is replaced by the $j$th column of the Wronskian and 0 is replaced by a column of zeros. The forward assertion follows.

⇐ The vanishing of the Wronskian implies a dependence relation (over $K$) among columns, and as a result there must be elements $c_1, \dots, c_n \in K$, not all 0, such that

(i)  $\sum_{j=1}^{n} c_j k_j^{(m)} = 0$ for $m = 0, \dots, n-1$.

What requires proof is that the $c_j$ may be chosen in $K_C$, and this we establish by induction on $n$. As the case $n = 1$ is trivial we assume $n > 1$ and that the result holds for any subset of $K$ with at most $n - 1$ elements.


If there is also a dependence relation (over $K$) among the columns of the Wronskian of $k_2, \dots, k_n$, e.g., if $c_1 = 0$, then by the induction hypothesis the elements $k_2, \dots, k_n \in K$ must be linearly dependent over $K_C$. But the same then holds for $k_1, \dots, k_n$, which is precisely what we want to prove. We therefore assume (w.l.o.g.) that $c_1 = 1$ and that the columns of the Wronskian of $k_2, \dots, k_n$ are linearly independent over $K$. From (i) we then have

$0 = \Bigl(\sum_{j=1}^{n} c_j k_j^{(m)}\Bigr)' = \sum_{j=1}^{n} c_j k_j^{(m+1)} + \sum_{j=2}^{n} c_j' k_j^{(m)} = 0 + \sum_{j=2}^{n} c_j' k_j^{(m)} = \sum_{j=2}^{n} c_j' k_j^{(m)}$

for $m = 0, \dots, n-2$, thereby forcing $c_2' = \cdots = c_n' = 0$. But this means $c_j \in K_C$ for $j = 1, \dots, n$, and the proof is complete. q.e.d.
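Proposition 1.6 can be tried out on polynomials in $\mathbb{C}[z]$ with rational coefficients. The following is a minimal sketch using only the Python standard library; the helper names (`trim`, `add`, `mul`, `deriv`, `wronskian`) and the representation of a polynomial as a list of coefficients in increasing degree are our own assumptions for illustration, not part of the notes.

```python
from fractions import Fraction
from itertools import permutations

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    """Sum of two polynomials (coefficient lists, lowest degree first)."""
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def mul(a, b):
    """Product of two polynomials."""
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def deriv(a):
    """Formal derivative d/dz."""
    return trim([i * a[i] for i in range(1, len(a))])

def sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def wronskian(polys):
    """Wronskian determinant (1.5), expanded by the Leibniz formula."""
    n = len(polys)
    rows = [[trim(p) for p in polys]]
    for _ in range(n - 1):               # row m holds the m-th derivatives
        rows.append([deriv(p) for p in rows[-1]])
    total = []
    for perm in permutations(range(n)):
        term = [Fraction(sign(perm))]
        for r in range(n):
            term = mul(term, rows[r][perm[r]])
        total = add(total, term)
    return total

# 1, z, z^2 are independent over the constants: nonzero Wronskian.
print(wronskian([[1], [0, 1], [0, 0, 1]]))
# z and 2z are dependent over the constants: the Wronskian is the zero polynomial [].
print(wronskian([[0, 1], [0, 2]]))
```

The permutation expansion is chosen only for brevity; it is exponential in $n$ and is meant for small illustrative cases.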


2. Differential Modules

Throughout this section $K$ denotes a differential field with derivation $k \mapsto k'$ and $V$ is a $K$-space (i.e., a vector space over $K$). The collection of $n \times n$ matrices with entries in $K$ is denoted $\mathrm{gl}(n, K)$; the group of invertible matrices in $\mathrm{gl}(n, K)$ is denoted $\mathrm{GL}(n, K)$.

A differential structure on $V$ is an additive group homomorphism $D : V \to V$ satisfying

(2.1)  $D(kv) = k'v + kDv, \quad k \in K,\ v \in V,$

where $Dv$ abbreviates $D(v)$. The Leibniz rule terminology is also used with (2.1). Vectors $v \in V$ satisfying $Dv = 0$ are said to be horizontal¹. The zero vector $0 \in V$ always has this property; other such vectors need not exist.

When $D : V \to V$ is a differential structure the pair $(V, D)$ is called a differential $K$-module, or simply a differential module when $K$ is clear from context. When $\dim_K(V) = n < \infty$ the integer $n$ is the dimension of the differential module.

For an example of a differential structure/module take $V := \mathrm{col}_n(K)$ and define $D : \mathrm{col}_n(K) \to \mathrm{col}_n(K)$ by

(2.2)  $D \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{pmatrix} := \begin{pmatrix} k_1' \\ k_2' \\ \vdots \\ k_n' \end{pmatrix}.$

Further examples will be evident from Proposition 2.12.

Since $K_C$ is a subfield of $K$ we can regard $V$ as a vector space over $K_C$ by restricting scalar multiplication to $K_C \times V$.

Proposition 2.3 :

(a) Any differential structure $D : V \to V$ is $K_C$-linear.

(b) The collection of horizontal vectors of a differential structure $D : V \to V$ coincides with the kernel $\ker D$ of $D$ when $D$ is considered as a $K_C$-linear mapping.

(c) The horizontal vectors of a differential structure $D : V \to V$ constitute a $K_C$-subspace of (the $K_C$-space) $V$.

¹ The terminology is borrowed from Differential Geometry.

Proof :

(a) Immediate from (2.1).

(b) Obvious from the definition of horizontal.

(c) Immediate from (b).

q.e.d.

To obtain a basis description of a differential structure $D : V \to V$ let $e = (e_j)_{j=1}^{n} \subset V^n$ be a(n ordered) basis of $V$ and let $B = (b_{ij}) \in \mathrm{gl}(n, K)$ be defined by

(2.4)  $De_j := \sum_{i=1}^{n} b_{ij} e_i, \quad j = 1, \dots, n.$

(Example: For $D$ as in (2.2) and² $e_j = (0, \dots, 0, 1, 0, \dots, 0)^{\tau}$ [1 in slot $j$] for $j = 1, \dots, n$ we have $B = (0)$ [the zero matrix].) We refer to $B$ as the defining ($e$)-matrix of $D$, or as the defining matrix of $D$ relative to the basis $e$. Note that, for any $v = \sum_{j=1}^{n} v_j e_j \in V$, additivity and the Leibniz rule (2.1) give

(2.5)  $Dv = \sum_{i=1}^{n} \Bigl(v_i' + \sum_{j=1}^{n} b_{ij} v_j\Bigr) e_i.$

This is better expressed in the matrix form

(2.6)  $(Dv)_e = v_e' + Bv_e,$

wherein $v_e$ is as in (ii) of the introduction, i.e., the image of $v$ under the isomorphism

$\beta_e : v = \sum_j v_j e_j \in V \;\mapsto\; \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \in \mathrm{col}_n(K),$

and

$v_e' := \begin{pmatrix} v_1' \\ v_2' \\ \vdots \\ v_n' \end{pmatrix}.$

2The superscript τ (“tau”) denotes transposition.


Where all this is leading should hardly be a surprise: the mapping $D_B : x \in \mathrm{col}_n(K) \mapsto x' + Bx \in \mathrm{col}_n(K)$ defines a differential structure on the $K$-space $\mathrm{col}_n(K)$, and one has a commutative diagram

(2.7)  $\begin{array}{ccc} V & \xrightarrow{\;D\;} & V \\ {\scriptstyle \beta_e}\downarrow & & \downarrow{\scriptstyle \beta_e} \\ \mathrm{col}_n(K) & \xrightarrow{\;D_B\;} & \mathrm{col}_n(K). \end{array}$

An immediate connection with linear ordinary differential equations is seen from (2.6): a vector $v \in V$ is horizontal if and only if $v_e \in \mathrm{col}_n(K)$ is a solution of the first-order linear system

(2.8)  $x' + Bx = 0.$

This is the defining ($e$)-equation of $D$.

Linear systems of ordinary differential equations of the form

(2.9)  $x' + Bx = 0$

are called homogeneous. One can also ask for solutions of inhomogeneous systems, i.e., systems of the form

(2.10)  $x' + Bx = b,$

wherein $0 \ne b \in \mathrm{col}_n(K)$ is given. For $b = w_e$ this is equivalent to the search for a vector $v \in V$ satisfying

(2.11)  $Dv = w.$

Equation (2.9) is the homogeneous equation corresponding to (2.10).

Proposition 2.12 : When $\dim_K V < \infty$ and $e$ is a basis the correspondence between differential structures $D : V \to V$ and $n \times n$ matrices $B$ defined by (2.4) is bijective; the inverse assigns to a matrix $B \in \mathrm{gl}(n, K)$ the differential structure $D : V \to V$ defined by (2.6).

Proof : The proof is by routine verification. q.e.d.


Proposition 2.13 :

(a) The solutions of (2.9) within coln(K) form a vector space over KC.

(b) When $\dim_K V = n < \infty$ and $e$ is a basis of $V$ the $K$-linear isomorphism $v \in V \mapsto v_e \in \mathrm{col}_n(K)$ restricts to a $K_C$-linear isomorphism between the $K_C$-subspace of $V$ consisting of horizontal vectors and the $K_C$-subspace of $\mathrm{col}_n(K)$ consisting of solutions of (2.9).

Proof :

(a) When $y_1, y_2 \in \mathrm{col}_n(K)$ are solutions and $c_1, c_2 \in K_C$ we have

$\begin{aligned} (c_1 y_1 + c_2 y_2)' &= (c_1 y_1)' + (c_2 y_2)' \\ &= c_1' y_1 + c_1 y_1' + c_2' y_2 + c_2 y_2' \\ &= 0 \cdot y_1 + c_1(-By_1) + 0 \cdot y_2 + c_2(-By_2) \\ &= -Bc_1 y_1 - Bc_2 y_2 \\ &= -B(c_1 y_1 + c_2 y_2). \end{aligned}$

(b) That the mapping restricts to a bijection between horizontal vectors and solutions was already noted immediately before (2.8), and since the correspondence $v \mapsto v_e$ is $K$-linear and $K_C$ is a subfield of $K$ any restriction to a $K_C$-subspace must be $K_C$-linear.

q.e.d.

Suppose $\hat{e} = (\hat{e}_j)_{j=1}^{n} \subset V^n$ is a second basis and $P = (p_{ij})$ is the transition matrix, i.e., $e_j = \sum_{i=1}^{n} p_{ij} \hat{e}_i$. Then the defining $e$- and $\hat{e}$-matrices $B$ and $A$ of $D$ are easily seen to be related by

(2.14)  $A := PBP^{-1} - P'P^{-1},$

where $P' := (p_{ij}')$. The transition from $B$ to $A$ is viewed classically as a change of variables: substitute $w = Px$ in (2.9); then note from

$w' = Px' + P'x = P(-Bx) + P'P^{-1}w = -PBP^{-1}w + P'P^{-1}w$

that

$w' + (PBP^{-1} - P'P^{-1})w = 0.$

The modern viewpoint is to regard $(P, B) \mapsto PBP^{-1} - P'P^{-1}$ as defining a left action of $\mathrm{GL}(n, K)$ on $\mathrm{gl}(n, K)$; this is the action by gauge transformations.


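Example 2.15 : Assume $K = \mathbb{C}(z)$ with derivation $\frac{d}{dz}$ and consider the first-order system

(i)  $x' + \begin{pmatrix} \dfrac{10z^4 - (2\nu^2 - 1)z^2 - 2}{z(2z^4 - 1)} & \dfrac{4z^6 - 4\nu^2 z^4 - 4z^2 + 1}{z^4(2z^4 - 1)} \\[2ex] -\dfrac{z^2(z^4 + 3z^2 - \nu^2)}{2z^4 - 1} & \dfrac{(2\nu^2 - 1)z^2 + 1}{z(2z^4 - 1)} \end{pmatrix} x = 0,$

i.e.,

$\begin{aligned} x_1' + \left(\frac{10z^4 - (2\nu^2 - 1)z^2 - 2}{z(2z^4 - 1)}\right) x_1 + \left(\frac{4z^6 - 4\nu^2 z^4 - 4z^2 + 1}{z^4(2z^4 - 1)}\right) x_2 &= 0, \\ x_2' - \left(\frac{z^2(z^4 + 3z^2 - \nu^2)}{2z^4 - 1}\right) x_1 + \left(\frac{(2\nu^2 - 1)z^2 + 1}{z(2z^4 - 1)}\right) x_2 &= 0, \end{aligned}$

wherein $\nu$ is a complex parameter. This has the form (2.9) with

$B := \begin{pmatrix} \dfrac{10z^4 - (2\nu^2 - 1)z^2 - 2}{z(2z^4 - 1)} & \dfrac{4z^6 - 4\nu^2 z^4 - 4z^2 + 1}{z^4(2z^4 - 1)} \\[2ex] -\dfrac{z^2(z^4 + 3z^2 - \nu^2)}{2z^4 - 1} & \dfrac{(2\nu^2 - 1)z^2 + 1}{z(2z^4 - 1)} \end{pmatrix},$

and with the choice³

$P := \begin{pmatrix} z^2 & 2z \\ z^3 & z^{-2} \end{pmatrix}$

one sees that the transformed system is

(ii)  $x' + Ax = 0$, where $A := PBP^{-1} - P'P^{-1} = \begin{pmatrix} 0 & -1 \\ 1 - \dfrac{\nu^2}{z^2} & \dfrac{1}{z} \end{pmatrix}.$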

We regard (i)-(ii) as distinct basis descriptions of the same differential structure.
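The gauge transformation of Example 2.15 can be spot-checked with exact rational arithmetic. The sketch below evaluates $PBP^{-1} - P'P^{-1} - A$ at sample points using the standard-library `Fraction` type; the function names, the hand-computed entrywise derivative `dP`, and the sample values of $z$ and $\nu^2$ are our own illustrative assumptions. A vanishing residual at a few points is evidence for the identity, not a proof of it.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matsub(X, Y):
    """Entrywise difference of two 2x2 matrices."""
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def inverse(X):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def gauge_residual(z, nu2):
    """Return P B P^{-1} - P' P^{-1} - A at the point z, with nu2 = nu^2."""
    d = 2 * z**4 - 1
    B = [[(10 * z**4 - (2 * nu2 - 1) * z**2 - 2) / (z * d),
          (4 * z**6 - 4 * nu2 * z**4 - 4 * z**2 + 1) / (z**4 * d)],
         [-(z**2 * (z**4 + 3 * z**2 - nu2)) / d,
          ((2 * nu2 - 1) * z**2 + 1) / (z * d)]]
    P = [[z**2, 2 * z], [z**3, z**-2]]
    # Entrywise derivative of P with respect to z, computed by hand.
    dP = [[2 * z, Fraction(2)], [3 * z**2, -2 * z**-3]]
    A = [[Fraction(0), Fraction(-1)], [1 - nu2 / z**2, 1 / z]]
    P_inv = inverse(P)
    transformed = matsub(matmul(matmul(P, B), P_inv), matmul(dP, P_inv))
    return matsub(transformed, A)

zero = [[0, 0], [0, 0]]
# Sample points must avoid z = 0 and 2 z^4 = 1, where B or P is undefined.
print(gauge_residual(Fraction(2), Fraction(3)) == zero)
print(gauge_residual(Fraction(-1, 3), Fraction(5, 7)) == zero)
```

Exact fractions are used instead of floating point so that the comparison with the zero matrix is meaningful.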

3At this point readers should not be concerned with how this particular P was constructed.


3. nth-Order Linear Differential Equations

The concept of an $n$th-order linear homogeneous equation in the context of a differential field $K$ is formulated in the obvious way: an element $k \in K$ is a solution of

(3.1)  $y^{(n)} + \ell_1 y^{(n-1)} + \cdots + \ell_{n-1} y' + \ell_n y = 0,$

where $\ell_1, \dots, \ell_n \in K$, if and only if

(3.2)  $k^{(n)} + \ell_1 k^{(n-1)} + \cdots + \ell_{n-1} k' + \ell_n k = 0,$

where $k^{(2)} := k'' := (k')'$ and $k^{(j)} := (k^{(j-1)})'$ for $j > 2$. Using a Wronskian argument one can easily prove that (3.1) has at most $n$ solutions (in $K$) linearly independent over $K_C$.

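As in the classical case $k \in K$ is a solution of (3.1) if and only if the column vector $(k, k', \dots, k^{(n-1)})^{\tau}$ is a solution of

(3.3)  $x' + Bx = 0, \quad B = \begin{pmatrix} 0 & -1 & 0 & \cdots & 0 \\ \vdots & 0 & -1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & -1 \\ \ell_n & \ell_{n-1} & \cdots & \ell_2 & \ell_1 \end{pmatrix}.$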

Indeed, one has the following analogue of Proposition 2.13.

Proposition 3.4 :

(a) The solutions of (3.1) within K form a vector space over KC.

(b) The $K_C$-linear mapping $(y, y', \dots, y^{(n-1)})^{\tau} \in \mathrm{col}_n(K) \mapsto y \in K$ restricts to a $K_C$-linear isomorphism between the $K_C$-subspace of $V$ consisting of horizontal vectors and the $K_C$-subspace of $K$ described in (a).

Proof : The proof is a routine verification. q.e.d.
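For orientation, the passage from (3.1) to (3.3) can be written out in the smallest non-trivial case $n = 2$. This is only a worked instance of the construction above; the component names $x_1, x_2$ are introduced here for illustration.

```latex
\[
y'' + \ell_1 y' + \ell_2 y = 0, \qquad
x := \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} y \\ y' \end{pmatrix}
\;\Longrightarrow\;
x' = \begin{pmatrix} y' \\ y'' \end{pmatrix}
   = \begin{pmatrix} x_2 \\ -\ell_2 x_1 - \ell_1 x_2 \end{pmatrix},
\]
\[
\text{i.e.}\quad
x' + \begin{pmatrix} 0 & -1 \\ \ell_2 & \ell_1 \end{pmatrix} x = 0 .
\]
```

The first row of the system merely records $x_1' = x_2$; all the information of the scalar equation sits in the last row.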


Example 3.5 : Equation (ii) of Example 2.15, i.e.,

(i)  $x' + \begin{pmatrix} 0 & -1 \\ 1 - \dfrac{\nu^2}{z^2} & \dfrac{1}{z} \end{pmatrix} x = 0,$

has the form seen in (3.3). This linear differential equation would commonly be written as

(ii)  $y'' + \dfrac{1}{z}\, y' + \Bigl(1 - \dfrac{\nu^2}{z^2}\Bigr) y = 0,$

or as

(iii)  $z^2 y'' + z y' + (z^2 - \nu^2)\, y = 0.$

Either form can be regarded as a basis description of the differential module of that example. Some readers will immediately recognize (iii): it is Bessel’s equation.

Converting the $n$th-order equation (3.1) to the first-order system (3.3) is standard practice. Less well-known is the fact that any first-order system of $n$ equations can be converted to the form (3.3), and as a consequence can be expressed in $n$th-order form. We will prove this in §7. For many purposes $n$th-order form has distinct advantages: explicit solutions are often easily constructed with series expansions, e.g., equation (iii) of Example 3.5 can be solved explicitly in terms of Bessel functions.


4. Dimension Considerations

In this section $K$ denotes a differential field and $(V, D)$ is a differential $K$-module of dimension $n \ge 1$.

Proposition 4.1 : When $V$ is a $K$-space with differential structure $D : V \to V$ the following assertions hold.

(a) A collection of horizontal vectors within $V$ is linearly independent over $K$ if and only if it is linearly independent over $K_C$.

(b) The collection of horizontal vectors of $V$ is a vector space over $K_C$ of dimension at most $n$.

Proof :

(a) ⇒ Immediate from the inclusion $K_C \subset K$. (In this direction the horizontal assumption is unnecessary.)

⇐ If the implication is false there is a collection of horizontal vectors in $V$ which is $K_C$-(linearly) independent but $K$-dependent, and from this collection we can choose vectors $v_1, \dots, v_m$ which are $K$-dependent with $m > 1$ minimal w.r.t. this property. We can then write $v_m = \sum_{j=1}^{m-1} k_j v_j$, with $k_j \in K$, whereupon applying $D$ and the hypotheses $Dv_j = 0$ results in the identity $0 = \sum_{j=1}^{m-1} k_j' v_j$. By the minimality of $m$ this forces $k_j' = 0$, $j = 1, \dots, m-1$, i.e., $k_j \in K_C$, and this contradicts linear independence over $K_C$.

(b) This is immediate from (a) and the fact that any $K$-linearly independent subset of $V$ can be extended to a basis.

q.e.d.

Suppose $\dim_K V = n < \infty$, $e$ is a basis of $V$, and $x' + Ax = 0$ is the defining $e$-equation of $D$. Then assertion (b) of the preceding result has the following standard formulation.

Corollary 4.2 : For any matrix $B \in \mathrm{gl}(n, K)$ a collection of solutions of

(i)  $x' + Bx = 0$

within $\mathrm{col}_n(K)$ is linearly independent over $K$ if and only if the collection is linearly independent over $K_C$. In particular, the $K_C$-subspace of $\mathrm{col}_n(K)$ consisting of solutions of (i) has dimension at most $n$.


Proof : By Proposition 2.13. q.e.d.

Equation (i) of Corollary 4.2 is always satisfied by the column vector $x = (0, 0, \dots, 0)^{\tau}$; this is the trivial solution, and any other is non-trivial. Unfortunately, non-trivial solutions (with entries in $K$) need not exist. For example, the linear differential equation $y' - y = 0$ admits only the trivial solution in the field $\mathbb{C}(z)$: for non-trivial solutions one must recast the problem so as to include the extension field $(\mathbb{C}(z))(\exp(z))$.
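The assertion that $y' - y = 0$ has no non-trivial solution in $\mathbb{C}(z)$ can be checked by a short logarithmic-derivative argument. The following is a sketch in notation of our own (the symbols $c$, $a_i$, $m_i$, $r$ do not appear in the notes).

```latex
Suppose $y \in \mathbb{C}(z)$, $y \neq 0$, satisfies $y' = y$. Write
\[
y = c \prod_{i=1}^{r} (z - a_i)^{m_i}, \qquad
c \in \mathbb{C}\setminus\{0\},\quad a_i \in \mathbb{C} \text{ distinct},\quad
m_i \in \mathbb{Z}\setminus\{0\},\quad r \ge 0 .
\]
Then
\[
\frac{y'}{y} = \sum_{i=1}^{r} \frac{m_i}{z - a_i} .
\]
If $r = 0$ the right-hand side is $0$; if $r \ge 1$ it has a pole at $a_1$.
In neither case can it equal the constant $1$, contradicting $y'/y = 1$.
```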

Corollary 4.3 : For any elements $\ell_1, \dots, \ell_n \in K$ a collection of solutions $\{y_j\}_{j=1}^{m} \subset K$ of

(i)  $y^{(n)} + \ell_1 y^{(n-1)} + \cdots + \ell_{n-1} y' + \ell_n y = 0$

is linearly independent over $K_C$ if and only if the collection $\{(y_j, y_j', \dots, y_j^{(n-1)})^{\tau}\}_{j=1}^{m}$ is linearly independent over $K$. In particular, the $K_C$-subspace of $K$ consisting of solutions of (i) has dimension at most $n$.

Proof : Use Proposition 3.4(b) and Corollary 4.2. q.e.d.


5. Fundamental Matrix Solutions

Throughout the section $K$ is a differential field and $D : V \to V$ is a differential module of positive dimension $n$.

Choose any basis $e$ of $V$ and let

(5.1)  $x' + Bx = 0$

be the defining $e$-equation of $D$. A non-singular matrix $M \in \mathrm{gl}(n, K)$ is a fundamental matrix solution of (5.1) if $M$ satisfies this equation, i.e., if and only if

(5.2)  $M' + BM = 0.$

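Example: Take $K = \mathbb{C}(z)$ with the usual derivation $\delta = d/dz$; then

$M := \begin{pmatrix} z^2 & 1 + z \\ 0 & z^3 \end{pmatrix}$

is a fundamental matrix solution of the equation

$x' + \begin{pmatrix} -\dfrac{2}{z} & \dfrac{z + 2}{z^4} \\ 0 & -\dfrac{3}{z} \end{pmatrix} x = 0.$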
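The example can be verified directly against (5.2). The computation below reads the coefficient matrix of the example as $B = \begin{pmatrix} -2/z & (z+2)/z^4 \\ 0 & -3/z \end{pmatrix}$; it is a routine check added for illustration.

```latex
\[
M' = \begin{pmatrix} 2z & 1 \\ 0 & 3z^2 \end{pmatrix}, \qquad
BM = \begin{pmatrix} -\frac{2}{z} & \frac{z+2}{z^4} \\ 0 & -\frac{3}{z} \end{pmatrix}
     \begin{pmatrix} z^2 & 1+z \\ 0 & z^3 \end{pmatrix}
   = \begin{pmatrix} -2z & -\frac{2(1+z)}{z} + \frac{z+2}{z} \\ 0 & -3z^2 \end{pmatrix}
   = \begin{pmatrix} -2z & -1 \\ 0 & -3z^2 \end{pmatrix},
\]
\[
M' + BM = 0, \qquad \det M = z^5 \neq 0 .
\]
```

The non-vanishing determinant is what makes $M$ a fundamental matrix solution rather than merely a matrix of solutions.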

Proposition 5.3 : A matrix $M \in \mathrm{gl}(n, K)$ is a fundamental matrix solution of (5.1) if and only if the columns of $M$ constitute $n$ solutions of that equation linearly independent over $K_C$.

Of course linear independence over $K$ is equivalent to the non-vanishing of the Wronskian $W(y_1, \dots, y_n)$.

Proof : First note that (5.2) holds if and only if the columns of $M$ are solutions of (5.1). Next observe that $M$ is non-singular if and only if these columns are linearly independent over $K$. Finally, note from Propositions 2.13(b) and 4.1(a) that this will be the case if and only if these columns are linearly independent over $K_C$. q.e.d.

Corollary 5.4 : Equation (5.1) admits a fundamental matrix solution in $\mathrm{GL}(n, K)$ if and only if $V$ admits a basis consisting of $D$-horizontal vectors.


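Proof : By Proposition 5.3 the existence of a fundamental matrix solution in $\mathrm{GL}(n, K)$ is equivalent to the existence of $n$ $D$-horizontal vectors linearly independent over $K_C$. Now recall Proposition 4.1. q.e.d.

Proposition 5.5 : Suppose $M, N \in \mathrm{gl}(n, K)$ and $M$ is a fundamental matrix solution of (5.1). Then $N$ is a fundamental matrix solution if and only if $N = MC$ for some matrix $C \in \mathrm{GL}(n, K_C)$.

Proof :

⇒ : By (1.3) we have

$\begin{aligned} (M^{-1}N)' &= M^{-1} \cdot N' + (M^{-1})' \cdot N \\ &= M^{-1} \cdot (-BN) + (-M^{-1}M'M^{-1}) \cdot N \\ &= -M^{-1}BN + (-M^{-1})(-BM)(M^{-1})N \\ &= -M^{-1}BN + M^{-1}BN \\ &= 0. \end{aligned}$

Hence $C := M^{-1}N$ has entries in $K_C$; it is invertible because $M$ and $N$ are, and $N = MC$.

⇐ : Since $C' = 0$ we have $N' = (MC)' = M'C = -BM \cdot C = -B \cdot MC = -BN$, and $N = MC$ is non-singular because $M$ and $C$ are.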

q.e.d.

Fundamental matrix solutions have an interesting geometric characterization. To explain that we need to introduce the dual of a differential module.


6. Dual Structures and Adjoint Equations

Differential structures allow for a simple conceptual formulation of the “adjoint equation” of a linear ordinary differential equation.

In this section $K$ is a differential field and $(V, D)$ is a differential $K$-module of dimension $n \ge 1$. Recall that the dual space $V^*$ of $V$ is defined as the $K$-space of linear functionals $v^* : V \to K$, and the dual basis $e^*$ of a basis $e = \{e_\alpha\}$ of $V$ is the basis $\{e_\alpha^*\}$ of $V^*$ satisfying $e_\beta^* e_\alpha = \delta_{\alpha\beta}$ (wherein $\delta_{\alpha\beta}$ is the usual Kronecker delta, i.e., $\delta_{\alpha\beta} := 1$ if and only if $\alpha = \beta$; otherwise $\delta_{\alpha\beta} := 0$).

There is a dual differential structure $D^* : V^* \to V^*$ on the dual space $V^*$ naturally associated with $D$: the definition is

(6.1)  $(D^*u^*)v = \delta(u^*v) - u^*(Dv), \quad u^* \in V^*,\ v \in V.$

The verification that this is a differential structure is straightforward, and is left to the reader. One often sees $u^*v$ written as $\langle v, u^* \rangle$, and when this notation is used (6.1) becomes

(6.2)  $\delta\langle v, u^* \rangle = \langle Dv, u^* \rangle + \langle v, D^*u^* \rangle.$

This is the Lagrange identity; it implies that $u^*v \in K_C$ whenever $v$ and $u^*$ are horizontal.

Proposition 6.3 : Suppose $e \subset V^n$ is a basis of $V$ and $B = (b_{ij}) \in \mathrm{gl}(n, K)$ is the defining $e$-matrix of $D$. Then the defining $e^*$-matrix of $D^*$ is $-B^{\tau}$.

The proof, as are most of the proofs in this section, is a simple application of the Lagrange identity.

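Proof : First note from (6.2) that for any $1 \le i, j \le n$ we have

$\begin{aligned} 0 = \delta_{ij}' &= \delta\langle e_i, e_j^* \rangle \\ &= \langle De_i, e_j^* \rangle + \langle e_i, D^*e_j^* \rangle \\ &= \Bigl\langle \sum_k b_{ki} e_k, e_j^* \Bigr\rangle + \langle e_i, D^*e_j^* \rangle \\ &= \sum_k b_{ki} \langle e_k, e_j^* \rangle + \langle e_i, D^*e_j^* \rangle \\ &= b_{ji} + \langle e_i, D^*e_j^* \rangle, \end{aligned}$

from which we see that

(i)  $\langle e_i, D^*e_j^* \rangle = -b_{ji}.$

However, for the defining $e^*$-matrix $C = (c_{ij})$ of $D^*$ we have

$D^*e_i^* = \sum_k c_{ki} e_k^*,$

and therefore

$\langle e_i, D^*e_j^* \rangle = \Bigl\langle e_i, \sum_k c_{kj} e_k^* \Bigr\rangle = \sum_k c_{kj} \langle e_i, e_k^* \rangle = c_{ij}$

for any $1 \le i, j \le n$. The result now follows from (i). q.e.d.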

As an immediate consequence of Proposition 6.3 we see that the defining $e^*$-equation of $D^*$ is

(6.4)  $y' - B^{\tau} y = 0;$

this is the adjoint equation of

(6.5)  $x' + Bx = 0.$

Note from the usual identification $V^{**} \simeq V$ and $-(-B^{\tau})^{\tau} = B$ that (6.5) can be viewed as the adjoint equation of (6.4). Intrinsically: the identification $V \simeq V^{**}$ induces the additional identification $D^{**} := (D^*)^* \simeq D$.

Equations (6.5) and (6.4) are interchangeable in the sense that information about either one can always be obtained from information about the other. In particular, there is a fundamental relationship between solutions of a linear differential equation and the solutions of the adjoint equation. This is explained in the following result, which also contains the promised geometric characterization of a fundamental matrix solution.

Proposition 6.6 : Suppose $e$ is a basis of $V$ and

(i)  $x' + Bx = 0$

is the defining $e$-equation of $D$. Then for any $M \in \mathrm{GL}(n, K)$ the following statements are equivalent:

(a) $M$ is a fundamental matrix solution of (i);

(b) $(M^{\tau})^{-1}$ is a fundamental matrix solution of the adjoint equation

(ii)  $x' - B^{\tau} x = 0$

of (i); and

(c) $M^{\tau}$ is the transition matrix from the dual basis $e^*$ of $e$ to a basis of $V^*$ consisting of $D^*$-horizontal vectors.

Proof : First note that $(M^{\tau})' = (M')^{\tau}$; hence that

(iii)  $M' + BM = 0 \;\Leftrightarrow\; (M^{\tau})' + M^{\tau}B^{\tau} = 0.$

(a) ⇔ (b) : From (iii), $(M^{\tau})^{-1} = (M^{-1})^{\tau}$, $(M^{\tau})' = (M')^{\tau}$ and (1.3) we have

$\begin{aligned} M' + BM = 0 &\;\Leftrightarrow\; (M')^{\tau} + M^{\tau}B^{\tau} = 0 \\ &\;\Leftrightarrow\; (M^{\tau})^{-1}(M')^{\tau}(M^{\tau})^{-1} + (M^{\tau})^{-1}M^{\tau}B^{\tau}(M^{\tau})^{-1} = 0 \\ &\;\Leftrightarrow\; (M^{-1})^{\tau}(M')^{\tau}(M^{-1})^{\tau} + B^{\tau}(M^{-1})^{\tau} = 0 \\ &\;\Leftrightarrow\; (M^{-1}M'M^{-1})^{\tau} + B^{\tau}(M^{-1})^{\tau} = 0 \\ &\;\Leftrightarrow\; -((M^{-1})')^{\tau} + B^{\tau}(M^{\tau})^{-1} = 0 \\ &\;\Leftrightarrow\; ((M^{\tau})^{-1})' - B^{\tau}(M^{\tau})^{-1} = 0. \end{aligned}$

(a) ⇔ (c) : One has

$\begin{aligned} M' + BM = 0 &\;\Leftrightarrow\; (M^{\tau})' + M^{\tau}B^{\tau} = 0 \\ &\;\Leftrightarrow\; (M^{\tau})'(M^{\tau})^{-1} + M^{\tau}B^{\tau}(M^{\tau})^{-1} = 0 \\ &\;\Leftrightarrow\; (M^{\tau})(-B^{\tau})(M^{\tau})^{-1} - (M^{\tau})'(M^{\tau})^{-1} = 0. \end{aligned}$

This last line is precisely what one obtains when $A$, $B$ and $P$ in (2.14) are replaced by $0$, $-B^{\tau}$ and $M^{\tau}$ respectively, i.e., it asserts that when $M^{\tau}$ is regarded as a transition matrix from $e^*$ to some other basis $\hat{e}^*$ of $V^*$, the defining $\hat{e}^*$-matrix of $D^*$ must vanish. Now simply observe from (2.4) that the defining matrix of a basis vanishes if and only if that basis consists of horizontal vectors.

q.e.d.

Dual differential structures are useful for solving equations of the form $Du = w$, wherein $w \in V$ is given. The fundamental result in this direction is the following.


Proposition 6.7 : Suppose $(v_j^*)_{j=1}^{n}$ is a basis of $V^*$ consisting of horizontal vectors and $(v_j)_{j=1}^{n}$ is the dual basis of $V \simeq V^{**}$. Then the following statements hold.

(a) All $v_j$ are horizontal.

(b) Suppose $w \in V$ and there are elements $k_j \in K$ such that $k_j' = \langle w, v_j^* \rangle$ for $j = 1, \dots, n$. Then the vector

(i)  $u := \sum_j k_j v_j \in V$

satisfies

(ii)  $Du = w.$

The vector $u$ introduced in (i) is not the unique solution of (ii): the sum of $u$ and any horizontal vector will also satisfy that equation.

Proof :

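(a) This can be seen as a corollary of Proposition 6.6, but a direct proof is quite easy: from $\langle v_i, v_j^* \rangle \in \{0, 1\} \subset K_C$, Lagrange’s identity (6.2) and the hypothesis $D^*v_j^* = 0$ we have

$\begin{aligned} 0 &= \langle v_i, v_j^* \rangle' \\ &= \langle Dv_i, v_j^* \rangle + \langle v_i, D^*v_j^* \rangle \\ &= \langle Dv_i, v_j^* \rangle, \end{aligned}$

and since $(v_j^*)$ is a basis this forces $Dv_i = 0$ for $i = 1, \dots, n$.

(b) First note that for any $v \in V$ we have

(iii)  $v = \sum_j \langle v, v_j^* \rangle v_j.$

Indeed, we can always write $v$ in the form $v = \sum_i c_i v_i$, where $c_i \in K$, and applying $v_j^*$ to this identity gives $\langle v, v_j^* \rangle = \sum_i c_i \langle v_i, v_j^* \rangle = c_j$.

From (a) and (iii) we then have

$\begin{aligned} Du &= \sum_j D(k_j v_j) \\ &= \sum_j (k_j' v_j + k_j Dv_j) \\ &= \sum_j (\langle w, v_j^* \rangle v_j + k_j \cdot 0) \\ &= \sum_j \langle w, v_j^* \rangle v_j \\ &= w. \end{aligned}$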


The following corollary explains the relevance of the adjoint equation for solving inhomogeneous systems. In the statement we adopt more classical notation: when $k, \ell \in K$ satisfy $\ell' = k$ we write $\ell$ as $\int k$, omit specific reference to $\ell$, and simply assert that $\int k \in K$. Moreover, we use the usual inner product $\langle y, z \rangle := \sum_j y_j z_j$ to identify $\mathrm{col}_n(K)$ with $(\mathrm{col}_n(K))^*$, i.e., we identify the two spaces by means of the $K$-isomorphism $v \in \mathrm{col}_n(K) \mapsto (w \in \mathrm{col}_n(K) \mapsto \langle w, v \rangle \in K) \in (\mathrm{col}_n(K))^*$.

Corollary 6.8⁴ : Suppose:

(a) $B \in \mathrm{gl}(n, K)$;

(b) $b \in \mathrm{col}_n(K)$;

(c) $(z_j)_{j=1}^{n}$ is a basis of $\mathrm{col}_n(K)$ consisting of solutions of the adjoint equation

(i)  $x' - B^{\tau} x = 0$

of

(ii)  $x' + Bx = 0$;

(d) $\int \langle b, z_j \rangle \in K$ for $j = 1, \dots, n$; and

(e) $(y_i)_{i=1}^{n}$ is a basis of $\mathrm{col}_n(K)$ satisfying

(iii)  $\langle y_i, z_j \rangle = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$

Then $(y_j)_{j=1}^{n}$ is a basis of solutions of the homogeneous equation (ii) and the vector

(iv)  $y := \sum_j \Bigl(\int \langle b, z_j \rangle\Bigr) \cdot y_j$

is a solution of the inhomogeneous system

(v)  $x' + Bx = b.$

⁴ For a classical account of this result see, e.g., [Poole, Chapter III, §10, pp. 36-39]. In fact the treatment in this reference was the inspiration for our formulation of this corollary.


The appearance of the integrals in (iv) explains why solutions of (i) are called integrating factors of (ii) (and vice-versa, since, as already noted, (i) may be regarded as the adjoint equation of (ii)).

The result is immediate from Proposition 6.7. However, it is a simple enough matter to give a direct proof, and we therefore do so.

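Proof : Hypothesis (e) identifies $(z_j)_{j=1}^{n}$ with the dual basis of $(y_i)_{i=1}^{n}$. In particular, it allows us to view $(z_j)_{j=1}^{n}$ as a basis of $(\mathrm{col}_n(K))^*$.

To prove that the $y_j$ satisfy (ii) simply note from (iii) and (i) that

$\begin{aligned} 0 &= \langle y_i, z_j \rangle' \\ &= \langle y_i', z_j \rangle + \langle y_i, z_j' \rangle \\ &= \langle y_i', z_j \rangle + \langle y_i, B^{\tau} z_j \rangle \\ &= \langle y_i', z_j \rangle + \langle By_i, z_j \rangle \\ &= \langle y_i' + By_i, z_j \rangle. \end{aligned}$

Since $(z_j)_{j=1}^{n}$ is a basis (of $(\mathrm{col}_n(K))^*$) it follows that

(vi)  $y_j' + By_j = 0, \quad j = 1, \dots, n.$

Next observe, as in (iii) of the proof of Proposition 6.7, that for any $b \in \mathrm{col}_n(K)$ condition (iii) implies

$b = \sum_j \langle b, z_j \rangle y_j.$

It then follows from (vi) that

$\begin{aligned} y' &= \sum_j \Bigl( \Bigl(\int \langle b, z_j \rangle\Bigr) \cdot y_j' + \langle b, z_j \rangle y_j \Bigr) \\ &= \sum_j \Bigl( -\Bigl(\int \langle b, z_j \rangle\Bigr) \cdot By_j + \langle b, z_j \rangle y_j \Bigr) \\ &= -B \sum_j \Bigl(\int \langle b, z_j \rangle\Bigr) \cdot y_j + \sum_j \langle b, z_j \rangle y_j \\ &= -By + b. \end{aligned}$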

q.e.d.

Corollary 6.8 was formulated so as to make the role of the adjoint equation evident. The following alternate formulation is easier to apply in practice.


Corollary 6.9 : Suppose $B \in \mathrm{gl}(n, K)$ and $M \in \mathrm{GL}(n, K)$ is a fundamental matrix solution of

(i)  $x' + Bx = 0.$

Denote the $j$th columns of $M$ and $(M^{\tau})^{-1}$ by $y_j$ and $z_j$ respectively, and suppose $b \in \mathrm{col}_n(K)$ and $\int \langle b, z_j \rangle \in K$ for $j = 1, \dots, n$. Then

(ii)  $y := \sum_j \Bigl(\int \langle b, z_j \rangle\Bigr) \cdot y_j$

is a solution of the inhomogeneous system

(iii)  $x' + Bx = b.$

Proof : By Proposition 5.3 the $z_j$ form a basis of $\mathrm{col}_n(K)$, and from $(M^{\tau})^{-1} = (M^{-1})^{\tau}$ and $M^{-1}M = I$ we see that

$\langle y_i, z_j \rangle = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$

The result is now evident from Corollary 6.8. q.e.d.
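To see Corollary 6.9 at work, consider the smallest possible case. The data below ($n = 1$, $K = \mathbb{C}(z)$, $B = (-1/z)$, $b = z^2$) are an illustrative choice of ours, not an example from the notes.

```latex
\[
B = \Bigl(-\tfrac{1}{z}\Bigr), \qquad M = (z): \quad M' + BM = 1 - \tfrac{1}{z}\, z = 0, \qquad
y_1 = z, \quad z_1 = (M^{\tau})^{-1} = \tfrac{1}{z},
\]
\[
b = z^2: \quad \langle b, z_1 \rangle = z, \qquad \int \langle b, z_1 \rangle = \tfrac{1}{2} z^2 \in K, \qquad
y = \tfrac{1}{2} z^2 \cdot z = \tfrac{1}{2} z^3,
\]
\[
y' + By = \tfrac{3}{2} z^2 - \tfrac{1}{z} \cdot \tfrac{1}{2} z^3 = z^2 = b .
\]
```

Any other choice of antiderivative changes $y$ by a constant multiple of $y_1$, i.e., by a solution of the homogeneous equation.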

Finally, consider the case of an $n$th-order linear equation

(6.10)  $u^{(n)} + \ell_1 u^{(n-1)} + \cdots + \ell_{n-1} u' + \ell_n u = 0.$

In this instance the adjoint equation generally refers to the $n$th-order linear equation

(6.11)  $(-1)^n v^{(n)} + (-1)^{n-1} (\ell_1 v)^{(n-1)} + \cdots + (-1)(\ell_{n-1} v)' + \ell_n v = 0,$

e.g., the adjoint equation of

(6.12)  $u'' + \ell_1 u' + \ell_2 u = 0$

is

(6.13)  $v'' - \ell_1 v' + (\ell_2 - \ell_1') v = 0.$
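The passage from (6.11) to (6.13) is a one-line application of the Leibniz rule; the computation for $n = 2$ is spelled out here for convenience.

```latex
\[
(-1)^2 v'' + (-1)(\ell_1 v)' + \ell_2 v
= v'' - (\ell_1 v' + \ell_1' v) + \ell_2 v
= v'' - \ell_1 v' + (\ell_2 - \ell_1')\, v .
\]
```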


Examples 6.14 :

(a) The adjoint equation of Bessel’s equation

$y'' + \dfrac{1}{x}\, y' + \Bigl(1 - \dfrac{\nu^2}{x^2}\Bigr) y = 0$

is

$z'' - \dfrac{1}{x}\, z' + \Bigl(1 - \dfrac{\nu^2 - 1}{x^2}\Bigr) z = 0.$

(b) The adjoint equation of any second-order equation of the form

$y'' + \ell_2 y = 0$

is the identical equation (despite the fact that they describe differential structures on spaces dual to one another).
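Both examples follow by substituting into the second-order adjoint formula $v'' - \ell_1 v' + (\ell_2 - \ell_1') v = 0$ of (6.13). The substitution is recorded below as a check; the derivation is with respect to $x$.

```latex
\[
\text{(a)}\quad
\ell_1 = \frac{1}{x}, \quad \ell_2 = 1 - \frac{\nu^2}{x^2}, \quad \ell_1' = -\frac{1}{x^2}
\;\Longrightarrow\;
\ell_2 - \ell_1' = 1 - \frac{\nu^2}{x^2} + \frac{1}{x^2} = 1 - \frac{\nu^2 - 1}{x^2},
\]
\[
\text{(b)}\quad
\ell_1 = 0 \;\Longrightarrow\; v'' - 0 \cdot v' + (\ell_2 - 0)\, v = v'' + \ell_2 v .
\]
```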

To understand why the “adjoint” terminology is used with (6.11) first convert (6.10) to the first order form (3.3) and write the corresponding adjoint equation accordingly, i.e., as

(6.15)  $x' - B^{\tau} x = 0, \quad -B^{\tau} = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & -\ell_n \\ 1 & 0 & 0 & \cdots & 0 & -\ell_{n-1} \\ 0 & 1 & 0 & \cdots & 0 & \vdots \\ \vdots & & \ddots & \ddots & \vdots & \vdots \\ 0 & \cdots & & 1 & 0 & -\ell_2 \\ 0 & \cdots & & 0 & 1 & -\ell_1 \end{pmatrix}.$

The use of the terminology is then explained by the following result.

Proposition 6.16 : A column vector $x = (x_1, \dots, x_n)^{\tau} \in \mathrm{col}_n(K)$ is a solution of (6.15) if and only if $x_n$ is a solution of (6.11) and

(i)  $x_{n-j} = (-1)^j x_n^{(j)} + \sum_{i=0}^{j-1} (-1)^i (\ell_{j-i} x_n)^{(i)}$ for $j = 1, \dots, n-1$.


Proof :

⇒ If $x = (x_1, \dots, x_n)^{\tau}$ satisfies (6.15) then

(ii)  $x_j' = -x_{j-1} + \ell_{n+1-j} x_n$ for $j = 1, \dots, n$,

where $x_0 := 0$. It follows that

$\begin{aligned} x_n' &= -x_{n-1} + \ell_1 x_n, \\ x_n'' &= -x_{n-1}' + (\ell_1 x_n)' \\ &= -(-x_{n-2} + \ell_2 x_n) + (\ell_1 x_n)' \\ &= (-1)^2 x_{n-2} + (-1)\ell_2 x_n + (\ell_1 x_n)', \\ x_n^{(3)} &= (-1)^2 x_{n-2}' + (-1)(\ell_2 x_n)' + (\ell_1 x_n)'' \\ &= (-1)^2 (-x_{n-3} + \ell_3 x_n) + (-1)(\ell_2 x_n)' + (\ell_1 x_n)'' \\ &= (-1)^3 x_{n-3} + (-1)^2 \ell_3 x_n + (-1)(\ell_2 x_n)' + (\ell_1 x_n)'', \end{aligned}$

and by induction (on $j$) that

$x_n^{(j)} = (-1)^j x_{n-j} + \sum_{i=0}^{j-1} (-1)^{j-1-i} (\ell_{j-i} x_n)^{(i)}, \quad j = 1, \dots, n.$

This is equivalent to (i), and equation (6.11) amounts to the case $j = n$.

⇐ Conversely, suppose $x_n$ is a solution of (6.11) and that (i) holds. We must show that (ii) holds or, equivalently, that

$x_{n-j}' = -x_{n-(j+1)} + \ell_{j+1} x_n$ for $j = 1, \dots, n$.

This, however, is immediate from (i). Indeed, we have

$\begin{aligned} x_{n-j}' &= (-1)^j x_n^{(j+1)} + \sum_{i=0}^{j-1} (-1)^i (\ell_{j-i} x_n)^{(i+1)} \\ &= (-1)^j x_n^{(j+1)} + \sum_{i=1}^{j} (-1)^{i+1} (\ell_{j+1-i} x_n)^{(i)} \\ &= (-1)^j x_n^{(j+1)} + \sum_{i=0}^{j} (-1)^{i+1} (\ell_{j+1-i} x_n)^{(i)} + \ell_{j+1} x_n \\ &= -\Bigl( (-1)^{j+1} x_n^{(j+1)} + \sum_{i=0}^{j} (-1)^i (\ell_{j+1-i} x_n)^{(i)} \Bigr) + \ell_{j+1} x_n \\ &= -x_{n-(j+1)} + \ell_{j+1} x_n. \end{aligned}$

q.e.d.
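The bookkeeping in Proposition 6.16 can be tested on polynomials. The sketch below is an illustration only, not part of the original notes: it takes n = 3, sample coefficients l[1], l[2], l[3] and an arbitrary polynomial x_n (all hypothetical choices), builds x_{n-1}, ..., x_1 from formula (i), and then checks that rows 2, ..., n of (ii) hold identically while the residual of row 1 is the negative of the left-hand side of (6.11). Polynomials are stored as dictionaries {exponent: coefficient}.

```python
def clean(p):
    return {e: c for e, c in p.items() if c != 0}

def add(*ps):
    out = {}
    for p in ps:
        for e, c in p.items():
            out[e] = out.get(e, 0) + c
    return clean(out)

def mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return clean(out)

def scale(c, p):
    return clean({e: c * a for e, a in p.items()})

def diff(p, times=1):
    """Iterated derivative d/dx of a polynomial."""
    for _ in range(times):
        p = {e - 1: c * e for e, c in p.items() if e != 0}
    return p

n = 3
# sample coefficients l_1 = x, l_2 = x^2 + 1, l_3 = 2x (assumptions for this demo)
l = {1: {1: 1}, 2: {2: 1, 0: 1}, 3: {1: 2}}
xn = {4: 1, 1: -1}          # x_n = x^4 - x, chosen arbitrarily

# formula (i): x_{n-j} for j = 1, ..., n-1
x = {n: xn}
for j in range(1, n):
    terms = [scale((-1) ** j, diff(xn, j))]
    terms += [scale((-1) ** i, diff(mul(l[j - i], xn), i)) for i in range(j)]
    x[n - j] = add(*terms)

def adjoint(v):
    """Left-hand side of (6.11) applied to v."""
    terms = [scale((-1) ** n, diff(v, n)), mul(l[n], v)]
    terms += [scale((-1) ** m, diff(mul(l[n - m], v), m)) for m in range(1, n)]
    return add(*terms)

# residuals x_j' + x_{j-1} - l_{n+1-j} x_n of (ii) for j = 2, ..., n
rows = [add(diff(x[j]), x[j - 1], scale(-1, mul(l[n + 1 - j], xn)))
        for j in range(2, n + 1)]
# residual of row j = 1, where x_0 = 0: x_1' - l_n x_n
row1 = add(diff(x[1]), scale(-1, mul(l[n], xn)))
```

All residuals in rows vanish for any choice of x_n, and row1 equals the negative of adjoint(xn); so the vector built from (i) solves (6.15) exactly when x_n solves (6.11), which is the content of the proposition.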

For completeness we record the nth-order formulation of Corollary 6.9.

Proposition 6.17 ("Variation of Constants") : Suppose $\ell_1, \dots, \ell_n \in K$ and $\{y_1, \dots, y_n\} \subset K$ is a collection of solutions of the nth-order equation

(i) $u^{(n)} + \ell_1 u^{(n-1)} + \cdots + \ell_{n-1} u' + \ell_n u = 0$

linearly independent over $K_C$. Let

$M := \begin{pmatrix} y_1 & y_2 & \cdots & y_n \\ y_1' & y_2' & \cdots & y_n' \\ y_1^{(2)} & y_2^{(2)} & \cdots & y_n^{(2)} \\ \vdots & \vdots & & \vdots \\ y_1^{(n-1)} & y_2^{(n-1)} & \cdots & y_n^{(n-1)} \end{pmatrix}$

and let $(z_1, \dots, z_n)$ denote the nth row of the matrix $(M^\tau)^{-1}$. Suppose $k \in K$ is such that $\int k z_j \in K$ for $j = 1, \dots, n$. Then

(ii) $y := \sum_j \left(\int k z_j\right) \cdot y_j$

is a solution of the inhomogeneous equation

(iii) $u^{(n)} + \ell_1 u^{(n-1)} + \cdots + \ell_{n-1} u' + \ell_n u = k$.

Proof : Convert (i) to a first-order system as in (3.3) and note from Propositions 1.6 and 3.4 that M is a fundamental matrix solution. Denote the jth column of M by $\mathbf{y}_j$ and apply Corollary 6.9 with $b = (0, \dots, 0, k)^\tau$ and $y_j$ (in that statement) replaced by $\mathbf{y}_j$ so as to achieve

$\mathbf{y}' + B\mathbf{y} = b$.

Now write $\mathbf{y} = (y, \hat y_2, \dots, \hat y_n)^\tau$ and eliminate $\hat y_2, \dots, \hat y_n$ in the final row of this system by expressing these entities in terms of y and derivatives thereof: the result is (iii).

q.e.d.

Since our formulation of Proposition 6.17 is not quite standard⁵, a simple example seems warranted.

⁵ Cf. [C-L, Chapter 3, §6, Theorem 6.4, p. 87].

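Example 6.18 : For $K = \mathbb{C}(x)$ with derivation $\frac{d}{dx}$ we consider the inhomogeneous second-order equation

(i) $u'' + \frac{2}{x} u' - \frac{6}{x^2} u = x^3 + 4x$,

and for the associated homogeneous equation

$u'' + \frac{2}{x} u' - \frac{6}{x^2} u = 0$

take $y_1 = x^2$ and $y_2 = 1/x^3$ so as to satisfy the hypothesis of Proposition 6.17. In the notation of that proposition we have

$M = \begin{pmatrix} x^2 & \frac{1}{x^3} \\ 2x & -\frac{3}{x^4} \end{pmatrix}, \qquad (M^\tau)^{-1} = \begin{pmatrix} \frac{3}{5x^2} & \frac{2x^3}{5} \\ \frac{1}{5x} & -\frac{x^4}{5} \end{pmatrix}$,

and from the second matrix we see that $z_1 = 1/(5x)$, $z_2 = -x^4/5$. A solution to (i) is therefore given by

$y = \left(\int \frac{1}{5x} \cdot (x^3 + 4x)\right) \cdot x^2 + \left((-1)\int \frac{x^4}{5} \cdot (x^3 + 4x)\right) \cdot \frac{1}{x^3} = \frac{x^5}{24} + \frac{2x^3}{3}$,

as is easily checked directly.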
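The direct check can also be delegated to a few lines of exact arithmetic. The following is a sketch under stated assumptions rather than part of the original notes: Laurent polynomials in x are stored as dictionaries {exponent: coefficient}, all integration constants are taken to be zero, and the helper names (lp, add, mul, diff, integrate) are hypothetical.

```python
from fractions import Fraction as F

def lp(*terms):
    """Laurent polynomial {exponent: coefficient} from (coefficient, exponent) pairs."""
    out = {}
    for c, e in terms:
        out[e] = out.get(e, F(0)) + F(c)
    return {e: c for e, c in out.items() if c != 0}

def add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, F(0)) + c
    return {e: c for e, c in out.items() if c != 0}

def mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, F(0)) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def diff(p):
    return {e - 1: c * e for e, c in p.items() if e != 0}

def integrate(p):
    """Antiderivative with zero constant; a 1/x term would require log x, which is not in K."""
    if -1 in p:
        raise ValueError("antiderivative is not a Laurent polynomial")
    return {e + 1: c / (e + 1) for e, c in p.items()}

y1, y2 = lp((1, 2)), lp((1, -3))                    # x^2 and 1/x^3
Mt = [[y1, diff(y1)], [y2, diff(y2)]]               # transpose of M
Mt_inv = [[lp((F(3, 5), -2)), lp((F(2, 5), 3))],    # matrix stated in the example
          [lp((F(1, 5), -1)), lp((F(-1, 5), 4))]]
prod = [[add(mul(Mt_inv[i][0], Mt[0][j]), mul(Mt_inv[i][1], Mt[1][j]))
         for j in range(2)] for i in range(2)]

z1, z2 = Mt_inv[1]                                  # second row: z_1, z_2
k = lp((1, 3), (4, 1))                              # right-hand side x^3 + 4x
y = add(mul(integrate(mul(k, z1)), y1), mul(integrate(mul(k, z2)), y2))

l1, l2 = lp((2, -1)), lp((-6, -2))                  # 2/x and -6/x^2
lhs = add(add(diff(diff(y)), mul(l1, diff(y))), mul(l2, y))
```

Here prod is the identity matrix, y is x^5/24 + 2x^3/3, and lhs reproduces x^3 + 4x, in agreement with the example.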


7. Cyclic Vectors

Throughout the section K is a differential field and (V, D) is a differential K-module of dimension n ≥ 1.

Define $D^0 := \mathrm{id}_V$ (the identity operator on V), $D^1 := D$, and $D^k := D \circ D^{k-1}$ for $k > 1$. Using (1.1) and induction on $n \ge 1$ we see that for any $k \in K$ and $v \in V$ we have the Leibniz rule

(7.1) $D^n(kv) = \sum_{j=0}^{n} \binom{n}{j} k^{(j)} D^{n-j} v$.
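A quick sanity check of (7.1) is possible on a one-dimensional example. The sketch below is not from the original notes: it assumes that (1.1) is the rule D(kv) = k'v + kDv, takes V = K = Q(x) with the hypothetical differential structure Dv = v' + bv for the sample choice b = x, and restricts attention to polynomials stored as dictionaries {exponent: coefficient}.

```python
from math import comb

def clean(p):
    return {e: c for e, c in p.items() if c != 0}

def add(*ps):
    out = {}
    for p in ps:
        for e, c in p.items():
            out[e] = out.get(e, 0) + c
    return clean(out)

def mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return clean(out)

def scale(c, p):
    return clean({e: c * a for e, a in p.items()})

def diff(p, times=1):
    for _ in range(times):
        p = {e - 1: c * e for e, c in p.items() if e != 0}
    return p

b = {1: 1}                      # sample coefficient b = x (assumption)

def D(v, times=1):
    """Iterates of D v = v' + b v, which satisfies D(kv) = k'v + k Dv."""
    for _ in range(times):
        v = add(diff(v), mul(b, v))
    return v

k = {2: 1, 0: 1}                # k = x^2 + 1
v = {5: 1, 1: -3}               # v = x^5 - 3x
n = 3
lhs = D(mul(k, v), n)
rhs = add(*[scale(comb(n, j), mul(diff(k, j), D(v, n - j))) for j in range(n + 1)])
```

Both sides agree; note that the derivatives of the scalar k are ordinary derivatives in K, while the vector v is hit by powers of D.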

A vector $v \in V$ is cyclic (w.r.t. D), or is D-cyclic, if $\{v, Dv, \dots, D^{n-1}v\}$ is a basis of V.

Example 7.2 : Suppose K has characteristic 0 and contains an element k such that $k' = 1$. Moreover, assume V admits a basis $\{e_1, \dots, e_n\}$ of horizontal vectors. Then the vector

$v := \sum_{j=0}^{n-1} \frac{k^j}{j!} e_{j+1}$

is cyclic. Indeed, by induction one sees that

$D^\ell v = \sum_{j=\ell}^{n-1} \frac{k^{j-\ell}}{(j-\ell)!} e_{j+1}$

for $\ell = 1, \dots, n-1$, and the claim follows easily.
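To see the "claim follows easily" step concretely, one can tabulate the coordinates of v, Dv, ..., D^{n-1}v. The sketch below is an illustration under assumptions not made in the original: K = Q(x) with k = x, n = 4, and coordinates taken with respect to the horizontal basis, so that D acts on coordinate vectors by entrywise differentiation.

```python
from fractions import Fraction
from math import factorial

n = 4

def diff(p):
    """d/dx on a polynomial stored as {exponent: coefficient}."""
    return {e - 1: c * e for e, c in p.items() if e != 0}

# coordinates of v in the horizontal basis: the entry at e_{j+1} is x^j / j!
v = [{j: Fraction(1, factorial(j))} for j in range(n)]

# rows of krylov are the coordinate vectors of v, Dv, ..., D^{n-1} v
krylov = [v]
for _ in range(n - 1):
    krylov.append([diff(c) for c in krylov[-1]])

def expected(l):
    """Coordinates of D^l v according to the displayed formula."""
    return [{j - l: Fraction(1, factorial(j - l))} if j >= l else {}
            for j in range(n)]

unit_diagonal = all(krylov[l][l] == {0: 1} for l in range(n))
zero_below = all(krylov[l][j] == {} for l in range(n) for j in range(l))
```

The coordinate matrix is upper triangular with ones on the diagonal, hence has determinant 1, and {v, Dv, ..., D^{n-1}v} is a basis.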

The hypotheses of Example 7.2 are too restrictive for most applications, but we will see that cyclic vectors always exist when K has characteristic 0, the inclusion $K_C \subset K$ is proper (i.e., the derivation on K is nontrivial), and the field of constants $K_C$ is algebraically closed. The goal of the section is a proof of this assertion,⁶ but it seems preferable to first indicate why such vectors might be of interest.

⁶ The proof we offer is due to J. Kovacic (unpublished). However, any errors that appear are the responsibility of R. Churchill. For a constructive proof, and references to alternate proofs, see [C-K].

Proposition 7.3 : The following assertions are equivalent:

(a) D∗ : V ∗ → V ∗ admits a cyclic vector;

(b) there is a basis e of V such that the defining e-equation of D has the form

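(i) $x' + \begin{pmatrix} 0 & -1 & 0 & \cdots & 0 \\ 0 & 0 & -1 & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & 0 & -1 \\ p_0 & p_1 & \cdots & p_{n-2} & p_{n-1} \end{pmatrix} x = 0$,

and

(c) there is a basis representation of D which can be converted to the nth-order form

(ii) $y^{(n)} + p_{n-1} y^{(n-1)} + \cdots + p_1 y' + p_0 y = 0$.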

Less formally: finding a cyclic vector for $D^*$ is equivalent to expressing D in terms of an nth-order homogeneous linear differential equation.

Proof :

(a) ⇔ (b) : From the definition of a cyclic vector one sees that $D^*$ admits such a vector $v^*$ if and only if there is a basis $(v^*, e_2^*, \dots, e_n^*)$ of $V^*$ such that the associated defining matrix has the form

$\begin{pmatrix} 0 & 0 & \cdots & 0 & -p_0 \\ 1 & 0 & \cdots & 0 & -p_1 \\ 0 & 1 & & \vdots & \vdots \\ \vdots & & \ddots & 0 & -p_{n-2} \\ 0 & 0 & \cdots & 1 & -p_{n-1} \end{pmatrix}$.

Since this matrix is the negative transpose of that in (i), the asserted equivalence is evident from Proposition 6.3.

(b) ⇔ (c) : This equivalence has already been discussed: see the paragraph surrounding (3.3).

q.e.d.

We turn our attention to the existence problem. Although Proposition 7.3 is phrased in terms of cyclic vectors for $D^*$, it suffices, since $D^{**} \simeq D$, to concentrate on the existence of cyclic vectors for D.

A D-invariant subspace $W \subset V$ is a differential subspace, or D-subspace, of V. As the reader can easily check, the intersection of any family of such subspaces is again such a subspace. In particular, the intersection $\langle S\rangle$ of those differential subspaces containing a given nonempty subset $S \subset V$ is a D-subspace; this is the subspace differentially generated by S. When $S = \{s_1, \dots, s_r\} \subset V$ is finite the notation $\langle\{s_1, \dots, s_r\}\rangle$ is abbreviated to $\langle s_1, \dots, s_r\rangle$. In particular, $\langle v\rangle$ denotes the subspace differentially generated by a single vector $v \in V$.

Proposition 7.4 : When $1 \le m \le n$ and $W \subset V$ is an m-dimensional D-subspace and $v \in W$ the following statements are equivalent:

(a) $\langle v\rangle = W$;

(b) $\{v, Dv, \dots, D^{m-1}v\}$ is a basis of W.

Proof :

(a) ⇒ (b) : If (b) is false there must be scalars $k_j \in K$, not all zero, such that $\sum_{j=0}^{m-1} k_j D^j v = 0$. Letting p denote the maximal $j \le m-1$ satisfying $k_j \ne 0$ we can, after dividing by $k_p$ and renaming the coefficients, write this dependence relation in the form $D^p v = \sum_{j=0}^{p-1} k_j D^j v$, from which we easily see that $D^{p+s}v$ must be in the span of $\{v, Dv, \dots, D^{p-1}v\}$ for all $s \ge 0$. We conclude that $\dim_K(\langle v\rangle) \le p < m = \dim_K(W)$, hence that $\langle v\rangle \ne W$, and we have a contradiction.

(b) ⇒ (a) : Obvious.

q.e.d.

Corollary 7.5 : A vector v ∈ V is cyclic if and only if V = 〈v〉 .

Our proof of the existence of cyclic vectors requires four lemmas.

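Lemma 7.6 : Suppose $w, v \in V$ are nonzero vectors satisfying both $V = \langle w, v\rangle$ and $\langle v\rangle \ne V$. Then:

(a) $V = \langle w\rangle + \langle v\rangle$;

(b) $\langle w\rangle \cap \langle v\rangle = \{0\}$ if and only if $\dim_K(V) = \dim_K(\langle v\rangle) + \dim_K(\langle w\rangle)$; and

(c) the inclusion $\langle w\rangle \cap \langle v\rangle \subset \langle w\rangle$ is proper.

Proof :

(a) : $\langle w\rangle + \langle v\rangle$ is a differential subspace of V containing the set $\{w, v\}$, and $\langle w, v\rangle = V$ is the intersection of all such subspaces. The equality follows.

(b) and (c) : By (a) (and elementary linear algebra) we have $\dim_K(V) = \dim_K(\langle w\rangle) + \dim_K(\langle v\rangle) - \dim_K(\langle w\rangle \cap \langle v\rangle)$. Assertion (b) follows immediately, and if (c) fails the equality reduces to $\dim_K(V) = \dim_K(\langle v\rangle)$, contradicting $V \ne \langle v\rangle$.

q.e.d.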

Lemma 7.7 : Suppose $K_C$ is algebraically closed, of characteristic 0, and the inclusion $K_C \subset K$ is proper. Let $W \subset V$ be a nontrivial differential subspace of V, let $1 \le r \in \mathbb{Z}$, and let $w_0, w_1, \dots, w_r \in W$ be subject only to the restriction $w_r \ne 0$. Then there is an element $\ell \in K$ such that $\sum_{j=0}^{r} \ell^{(j)} w_j \ne 0$.

Without the characteristic 0 assumption the proper inclusion hypothesis on $K_C \subset K$ can fail, e.g., when K is a finite field one has $K_C = K$.

Proof : Choose any $k \in K \setminus K_C$. It is not difficult to see that when k is algebraic over $K_C$ one has $k \in K_C$, contrary to hypothesis, and we conclude that k must be transcendental over $K_C$. In particular, the collection $\{1, k, \dots, k^r\}$ must be linearly independent over $K_C$.

Fix a basis $(e_1, e_2, \dots, e_n)$ of V, write $w_j = \sum_{i=1}^{n} a_{ij} e_i$, and let $\ell \in K$ be arbitrary. Then from

$\sum_{j=0}^{r} \ell^{(j)} w_j = \sum_j \ell^{(j)} \sum_i a_{ij} e_i = \sum_i \Big(\sum_j a_{ij} \ell^{(j)}\Big) e_i$

we see that

$\sum_{j=0}^{r} \ell^{(j)} w_j = 0 \iff \sum_j a_{ij} \ell^{(j)} = 0$ for $i = 1, \dots, n$.

In other words, $\sum_{j=0}^{r} \ell^{(j)} w_j = 0$ if and only if $\ell$ is a solution of the system

$$\begin{aligned}
a_{1r} y^{(r)} + a_{1,r-1} y^{(r-1)} + \cdots + a_{10} y &= 0 \\
a_{2r} y^{(r)} + a_{2,r-1} y^{(r-1)} + \cdots + a_{20} y &= 0 \\
&\;\;\vdots \\
a_{nr} y^{(r)} + a_{n,r-1} y^{(r-1)} + \cdots + a_{n0} y &= 0 .
\end{aligned}$$

Since $w_r \ne 0$ at least one of these equations has order exactly r, and from Corollary 4.3 we know that the solution space of that equation (and therefore of the whole system) has $K_C$-dimension at most r. From the previous paragraph we conclude that for any $k \in K \setminus K_C$ at least one of the $r + 1$ elements $1, k, \dots, k^r$ cannot be a solution, and the lemma is thereby established.

q.e.d.

Lemma 7.8 : Assume $K_C$ is algebraically closed of characteristic 0 and the inclusion $K_C \subset K$ is proper. Suppose $w, v \in V$ are nonzero and satisfy $\langle w\rangle \cap \langle v\rangle = \{0\}$. Then there is an element $\ell \in K$ with the property that for $\hat v := v + \ell w$ one has

(a) $\langle w, \hat v\rangle = \langle w, v\rangle$ and

(b) $\dim_K(\langle \hat v\rangle) > \dim_K(\langle v\rangle)$.

Proof :

(a) The equality holds for any $\ell \in K$.

(b) Let $r := \dim_K(\langle v\rangle)$. Then $\{v, Dv, \dots, D^{r-1}v\}$ is a basis of this space (by Proposition 7.4(b)) and the collection $\{v, Dv, \dots, D^{r-1}v, D^r v\}$ is therefore linearly dependent over K. In particular we can find scalars $a_0, \dots, a_r \in K$, where w.l.o.g. $a_r = 1$, such that $\sum_{j=0}^{r} a_j D^j v = 0$.

Now set

$w_j := \sum_{i=j}^{r} \binom{i}{j} a_i D^{i-j} w, \qquad j = 0, 1, \dots, r$,

and note that $w_r = w \ne 0$. It follows from Lemma 7.7 that we can find an element $\ell \in K$ such that

(i) $0 \ne \sum_{j=0}^{r} \ell^{(j)} w_j = \sum_{j=0}^{r} \ell^{(j)} \sum_{i=j}^{r} \binom{i}{j} a_i D^{i-j} w$.

Define $\hat v := v + \ell w$.

Suppose (b) is false. Then there is an integer $1 \le s \le r$, and elements $b_i \in K$ for $i = 0, \dots, s$, where w.l.o.g. $b_s = 1$, such that $\sum_{i=0}^{s} b_i D^i \hat v = 0$, i.e., such that

$0 = \sum_{i=0}^{s} b_i D^i \hat v = \sum_{i=0}^{s} b_i D^i v + \sum_{i=0}^{s} b_i \sum_{j=0}^{i} \binom{i}{j} \ell^{(j)} D^{i-j} w$,

where in computing the final term we have used (7.1). The first term on the right is in $\langle v\rangle$ while the second is in $\langle w\rangle$, and since $\langle w\rangle \cap \langle v\rangle = \{0\}$ it follows that both must vanish. Immediate consequences of this vanishing are: $s = r$ (because $\sum_{i=0}^{s} b_i D^i v = 0$ and $b_s = 1$); $a_i = b_i$ for $i = 0, 1, \dots, r$; and

$0 = \sum_{i=0}^{r} a_i \sum_{j=0}^{i} \binom{i}{j} \ell^{(j)} D^{i-j} w = \sum_{j=0}^{r} \ell^{(j)} \sum_{i=j}^{r} \binom{i}{j} a_i D^{i-j} w = \sum_{j=0}^{r} \ell^{(j)} w_j$.

This contradicts (i), and (b) is thereby established.

q.e.d.
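The smallest instance of Lemma 7.8 already shows why the scalar must be chosen with some care. The sketch below is illustrative only and rests on assumptions not in the original: K = Q(x), V two-dimensional with a horizontal basis e1, e2 (so D differentiates coordinates entrywise), w = e1 and v = e2. Then the differentially generated subspaces of w and v are the two coordinate lines, r = 1, a_0 = 0, a_1 = 1, w_0 = 0 and w_1 = w, so condition (i) of the proof reads l' w != 0.

```python
def clean(p):
    return {e: c for e, c in p.items() if c != 0}

def add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return clean(out)

def mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return clean(out)

def scale(c, p):
    return clean({e: c * a for e, a in p.items()})

def diff(p):
    return {e - 1: c * e for e, c in p.items() if e != 0}

def det2(a, b):
    """Determinant of the 2x2 matrix with rows a and b (polynomial entries)."""
    return add(mul(a[0], b[1]), scale(-1, mul(a[1], b[0])))

w = [{0: 1}, {}]      # w = e1
v = [{}, {0: 1}]      # v = e2, and Dv = 0, so v alone generates a line

def krylov_det(l):
    """det of the coordinate matrix of (v_hat, D v_hat), where v_hat = v + l*w."""
    v_hat = [add(v[i], mul(l, w[i])) for i in range(2)]
    return det2(v_hat, [diff(c) for c in v_hat])

good = krylov_det({1: 1})   # l = x: l' = 1, condition (i) holds
bad = krylov_det({0: 5})    # l = 5: l' = 0, condition (i) fails
```

With l = x the determinant is the nonzero constant -1, so the modified vector is cyclic and the dimension has grown from 1 to 2; with the constant l = 5 the determinant vanishes and nothing is gained.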

When $W \subset V$ is a D-subspace the quotient space $V/W$ inherits a differential structure in the expected way, i.e., $[v] := v + W \mapsto [Dv] := Dv + W$. This "quotient differential structure" is useful for induction arguments, such as the proof of the following result.

Lemma 7.9 : Suppose $n = \dim_K(V) > 1$ and every differential K-space of lower positive dimension admits a cyclic vector. Then for any $0 \ne w \in V$ there is a $v \in V$ such that $\langle w, v\rangle = V$.

Proof : If $\langle w\rangle = V$ take $v = 0$; if not give $V/\langle w\rangle$ the quotient differential structure and let $\pi : V \to V/\langle w\rangle$ denote the quotient mapping. By assumption $V/\langle w\rangle$ contains a cyclic vector $[v]$, and for any $v \in \pi^{-1}([v])$ we then have $V = \langle w, v\rangle$. q.e.d.

Theorem 7.10 (The Cyclic Vector Theorem) : Suppose $K_C$ is algebraically closed of characteristic 0 and the inclusion $K_C \subset K$ is proper. Then V contains a cyclic vector.

Proof : We argue by induction on $n = \dim_K(V)$, omitting the trivial case $n = 1$. We assume $n > 1$, and that the result holds for all nontrivial differential K-spaces of dimension strictly less than n.

Choose $0 \ne w_1 \in V$ at random. If $\langle w_1\rangle = V$ we are done; otherwise we invoke Lemma 7.9 (and the induction hypothesis) to guarantee the existence of a vector $v_1 \in V$ such that $\langle w_1, v_1\rangle = V$.

We may assume that

(i) $\dim_K(\langle v_1\rangle) > \dim_K(V) - \dim_K(\langle w_1\rangle)$,

and (as a consequence) that

(ii) $\langle w_1\rangle \cap \langle v_1\rangle \ne \{0\}$.

Indeed, if (i) fails then $\langle w_1\rangle \cap \langle v_1\rangle = \{0\}$ by Lemma 7.6(b), and we can use Lemma 7.8 to replace $v_1$ with a vector $\hat v_1$ such that $\langle w_1, \hat v_1\rangle = \langle w_1, v_1\rangle = V$ and $\dim_K(\langle \hat v_1\rangle) > \dim_K(\langle v_1\rangle)$. Inequality (i) for $\hat v_1$ is then evident from $\dim_K(\langle v_1\rangle) = \dim_K(V) - \dim_K(\langle w_1\rangle)$.

If $\langle v_1\rangle = V$ we are done. Otherwise we use (ii) to find some $0 \ne w_2 \in \langle w_1\rangle \cap \langle v_1\rangle$, noting from Lemma 7.6(c) that $\dim_K(\langle w_2\rangle) < \dim_K(\langle w_1\rangle)$ (which implies that $w_2$ cannot be cyclic). Repeating the argument of the previous two paragraphs with $w_2$ replacing $w_1$ we then produce a vector $v_2 \in V$ such that $V = \langle w_2, v_2\rangle$,

$\dim_K(\langle v_2\rangle) > \dim_K(V) - \dim_K(\langle w_2\rangle)$,

and

$\langle w_2\rangle \cap \langle v_2\rangle \ne \{0\}$.

If $\langle v_2\rangle = V$ we are done; otherwise we repeat the construction once again, etc. The result is a sequence of subspaces $\langle w_j\rangle$ with strictly decreasing dimensions and a sequence of vectors $v_j$ satisfying

$\dim_K(\langle v_j\rangle) > \dim_K(V) - \dim_K(\langle w_j\rangle)$.

But this inequality is impossible if we reach $\dim_K(\langle w_j\rangle) = 0$ (because $\langle v_j\rangle \subset V$), and we conclude that the iteration terminates after finitely many steps. Since the only requirement for continuing is that $v_j$ not be cyclic, this proves the theorem.

q.e.d.

8. Extensions of Differential Structures

Here K is a differential field with derivation $k \mapsto k'$, V is a K-space (i.e., a vector space over K) of dimension n, and $D : V \to V$ is a differential structure. Primes $'$ will also be used to indicate the derivation on any differential extension field $L \supset K$.

Recall⁷ that when $L \supset K$ is an extension field of K (not necessarily differential) the tensor product $L \otimes_K V$ over K admits an L-space structure characterized by $\ell \cdot (m \otimes_K v) = (\ell m) \otimes_K v$. This structure will always be assumed. By means of the K-embedding

(8.1) $v \in V \mapsto 1 \otimes v \in L \otimes_K V$

one views V as a K-subspace of $L \otimes_K V$ when the latter is considered as a K-space. In particular, any (ordered) basis e of V can be regarded as a subset of $L \otimes_K V$.

Proposition 8.2 ("Extension of the Base") : Assuming the notation of the previous paragraph, any basis of the K-space V is also a basis of the L-space $L \otimes_K V$. In particular,

(i) $\dim_K V = \dim_L(L \otimes_K V)$.

Proof : See, e.g., [Lang, Chapter XVI, §4, Proposition 4.1, p. 623]. q.e.d.

Proposition 8.3 : Suppose $V$, $\hat V$, $W$ and $\hat W$ are finite-dimensional⁸ K-spaces and $T : V \to W$ and $\hat T : \hat V \to \hat W$ are K-linear mappings. Then there is a K-linear mapping $T \otimes_K \hat T : V \otimes_K \hat V \to W \otimes_K \hat W$ characterized by

(i) $(T \otimes_K \hat T)(v \otimes_K \hat v) = Tv \otimes_K \hat T \hat v, \qquad v \otimes_K \hat v \in V \otimes_K \hat V$.

Proof⁹ : There is a standard characterization of the tensor product $V \otimes_K \hat V$ in terms of K-bilinear mappings of $V \times \hat V$ into K-spaces Y, e.g., see [Lang, Chapter XVI, §1, p. 602]. The proposition results from considering the K-bilinear mapping $(v, \hat v) \in V \times \hat V \mapsto Tv \otimes_K \hat T \hat v \in W \otimes_K \hat W$. q.e.d.

⁷ As a general reference for the remarks in this paragraph see, e.g., [Lang, Chapter XVI, §4, pp. 623-4, particularly Example 2]. Except for references to bases, most of what we say does not require V to be finite-dimensional.

⁸ The finite-dimensional hypothesis is not needed; it is assumed only because this is a standing hypothesis for V.

⁹ We offer only a quick sketch. The result is more important for our purposes than a formal proof, and filling in all the details would lead us too far afield.

Proposition 8.4 : To any differential field extension $L \supset K$ there corresponds a unique differential structure $D_L : L \otimes_K V \to L \otimes_K V$ extending $D : V \to V$, and this structure is characterized by the property

(i) $D_L(\ell \otimes_K v) = \ell' \otimes_K v + \ell \otimes_K Dv, \qquad \ell \otimes_K v \in L \otimes_K V$.

Recall from (8.1) that we are viewing V as a K-subspace of $L \otimes_K V$ by identifying V with its image under the embedding $v \mapsto 1 \otimes_K v$. Assuming (i) we have $D_L(1 \otimes_K v) = 1 \otimes_K Dv \simeq Dv$ for any $v \in V$, and this is the meaning of $D_L$ "extending" D.

In the proof we denote the derivation $\ell \mapsto \ell'$ by $\delta : L \to L$, and we also write the restriction $\delta|_K$ as $\delta$.

One is tempted to prove the proposition by invoking Proposition 8.3 so as to define mappings $\delta \otimes_K \mathrm{id}_V : L \otimes_K V \to L \otimes_K V$ and $\mathrm{id}_L \otimes_K D : L \otimes_K V \to L \otimes_K V$ and to then set $D_L := \delta \otimes_K \mathrm{id}_V + \mathrm{id}_L \otimes_K D$. Unfortunately, Proposition 8.3 does not apply since D is not K-linear.

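Proof¹⁰ : The way around the problem is to first use the fact that D is $K_C$-linear; one can then conclude from Proposition 8.3 (with K replaced by $K_C$) that a $K_C$-linear mapping $\hat D : L \otimes_{K_C} V \to L \otimes_{K_C} V$ is defined by

(ii) $\hat D := \delta \otimes_{K_C} \mathrm{id}_V + \mathrm{id}_L \otimes_{K_C} D$.

The next step is to define $Y \subset L \otimes_{K_C} V$ to be the $K_C$-subspace generated by all vectors of the form $\ell k \otimes_{K_C} v - \ell \otimes_{K_C} kv$, where $\ell \in L$, $k \in K$ and $v \in V$. Then from the calculation

$$\begin{aligned}
\hat D(\ell k \otimes_{K_C} v - \ell \otimes_{K_C} kv) &= \delta(\ell k) \otimes_{K_C} v + \ell k \otimes_{K_C} Dv - \delta(\ell) \otimes_{K_C} kv - \ell \otimes_{K_C} D(kv) \\
&= \ell k' \otimes_{K_C} v + k\ell' \otimes_{K_C} v + \ell k \otimes_{K_C} Dv - \ell' \otimes_{K_C} kv - \ell \otimes_{K_C} (k'v + kDv) \\
&= (\ell k' \otimes_{K_C} v - \ell \otimes_{K_C} k'v) + (\ell' k \otimes_{K_C} v - \ell' \otimes_{K_C} kv) + (\ell k \otimes_{K_C} Dv - \ell \otimes_{K_C} kDv)
\end{aligned}$$

we see that Y is $\hat D$-invariant, and $\hat D$ therefore induces a $K_C$-linear mapping $\tilde D : (L \otimes_{K_C} V)/Y \to (L \otimes_{K_C} V)/Y$ which by (ii) satisfies

(iii) $\tilde D([\ell \otimes_{K_C} v]) = [\ell' \otimes_{K_C} v] + [\ell \otimes_{K_C} Dv]$,

where the bracket $[\ ]$ denotes the equivalence class (i.e., coset) of the accompanying element.

Now observe that when $L \otimes_{K_C} V$ is viewed as an L-space (resp. K-space), Y becomes an L-subspace (resp. a K-subspace), and it follows from (iii) that $\tilde D$ is a differential structure when the L-space (resp. K-space) structure is assumed.

In view of the K-space structure on $(L \otimes_{K_C} V)/Y$ the K-bilinear mapping¹¹ $(\ell, v) \mapsto [\ell \otimes_{K_C} v]$ induces a K-linear mapping $T : L \otimes_K V \to (L \otimes_{K_C} V)/Y$ which one verifies to be a K-isomorphism. It then follows from (iii) and the definition of T that the mapping $D_L := T^{-1} \circ \tilde D \circ T : L \otimes_K V \to L \otimes_K V$ satisfies (i), and it follows that $D_L$ is a differential structure on the L-space $L \otimes_K V$.

As for uniqueness, suppose $\check D : L \otimes_K V \to L \otimes_K V$ is any differential structure extending D, i.e., having the property

$\check D(1 \otimes_K v) = 1 \otimes_K Dv, \qquad v \in V$.

Then for any $\ell \otimes_K v \in L \otimes_K V$ one has

$$\begin{aligned}
\check D(\ell \otimes_K v) &= \check D(\ell \cdot (1 \otimes_K v)) \\
&= \ell' \cdot (1 \otimes_K v) + \ell \cdot \check D(1 \otimes_K v) \\
&= \ell' \otimes_K v + \ell \cdot (1 \otimes_K Dv) \\
&= \ell' \otimes_K v + \ell \otimes_K Dv \\
&= D_L(\ell \otimes_K v),
\end{aligned}$$

hence $\check D = D_L$. q.e.d.

¹⁰ Footnote 9 applies here also.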

When considered over the differential field $\mathbb{C}(z) = (\mathbb{C}(z), \frac{d}{dz})$ the linear differential equation $y'' = y$ has only the trivial solution, but if we "allow solutions" from the differential field extension $\mathbb{C}(z)(\exp(z)) = (\mathbb{C}(z)(\exp(z)), \frac{d}{dz})$ this is no longer the case. At the conceptual level, "allowing solutions from a differential field extension" simply means considering extensions of the given differential structure. However, at the computational level these extensions play no role, as one sees from the following result.

¹¹ Which we note is not L-bilinear, since for $v \in V$ the product $\ell v$ is only defined when $\ell \in K$.

Proposition 8.5 : Suppose e is a basis of V and

(i) x ′ + Bx = 0

is the defining e-equation for D. Let $L \supset K$ be a differential field extension and consider e as a basis for the L-space $L \otimes_K V$. Then the defining e-equation for the extended differential structure $D_L : L \otimes_K V \to L \otimes_K V$ is also (i).

Proof : Since $D_L$ extends D the e-matrices of these two differential structures are the same. q.e.d.

A differential field extension $L \supset K$ has no new constants if $L_C = K_C$. (Note that $L_C \supset K_C$ is automatic.)

Proposition 8.6 : Suppose e and $\hat e$ are two bases of V and

(i) $x' + Bx = 0$

and

(ii) $x' + Ax = 0$

are the defining e- and $\hat e$-equations of D respectively. Let $L \supset K$ be a no new constant differential field extension in which each equation admits a fundamental matrix solution. Then the field extensions of K generated by the entries of these fundamental matrix solutions are the same.

Proof : In view of the discussion surrounding (2.14) we can assume A and B are related by

$A = PBP^{-1} - P'P^{-1}$,

where $P \in GL(n, K)$.

Let $M, N \in GL(n, L)$ be fundamental matrix solutions of (i) and (ii) respectively and set $\hat M := PM$. Then from

$$\begin{aligned}
\hat M' &= (PM)' \\
&= PM' + P'M \\
&= P(-BM) + P'M \\
&= -PBP^{-1}\hat M + P'P^{-1}\hat M
\end{aligned}$$

we see that

$0 = \hat M' + (PBP^{-1} - P'P^{-1})\hat M = \hat M' + A\hat M$,

and we conclude that $\hat M \in GL(n, L)$ is also a fundamental matrix solution of (ii). By Proposition 5.5 we have $PM = \hat M = NC$ for some $C \in GL(n, L_C)$, and we can therefore write

(iii) $M = P^{-1}NC$.

The entries of $P^{-1}$ are in K, and by the no new constant hypothesis the same holds for the entries of C. Hence $K(M) \subset K(N)$, and the reverse inclusion follows in the same way from $N = PMC^{-1}$. The result follows. q.e.d.
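The change-of-basis formula used in this proof can be exercised on the equation of Example 6.18. The sketch below is not from the original notes and makes several labeled assumptions: B is the matrix of the homogeneous equation u'' + (2/x)u' - (6/x^2)u = 0 written in the shape of Proposition 7.3(i), M is the matrix from Example 6.18, and the transition matrix P is an arbitrary sample with polynomial inverse. Laurent polynomials are stored as dictionaries {exponent: coefficient}.

```python
def clean(p):
    return {e: c for e, c in p.items() if c != 0}

def add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return clean(out)

def mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return clean(out)

def diff(p):
    return {e - 1: c * e for e, c in p.items() if e != 0}

def mat_mul(X, Y):
    size = len(X)
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            entry = {}
            for t in range(size):
                entry = add(entry, mul(X[i][t], Y[t][j]))
            row.append(entry)
        out.append(row)
    return out

def mat_add(X, Y):
    return [[add(a, b) for a, b in zip(r, s)] for r, s in zip(X, Y)]

def mat_neg(X):
    return [[{e: -c for e, c in a.items()} for a in r] for r in X]

def mat_diff(X):
    return [[diff(a) for a in r] for r in X]

zero, one = {}, {0: 1}
B = [[zero, {0: -1}], [{-2: -6}, {-1: 2}]]      # shape of Proposition 7.3(i)
M = [[{2: 1}, {-3: 1}], [{1: 2}, {-4: -3}]]     # columns (y, y') for y = x^2, 1/x^3
P = [[one, {1: 1}], [zero, one]]                # sample transition matrix (assumption)
P_inv = [[one, {1: -1}], [zero, one]]

# A = P B P^{-1} - P' P^{-1} and M_hat = P M
A = mat_add(mat_mul(mat_mul(P, B), P_inv), mat_neg(mat_mul(mat_diff(P), P_inv)))
M_hat = mat_mul(P, M)

res_B = mat_add(mat_diff(M), mat_mul(B, M))           # M' + B M
res_A = mat_add(mat_diff(M_hat), mat_mul(A, M_hat))   # M_hat' + A M_hat
```

Both residuals are the zero matrix. Since the entries of P and its inverse lie in K, the entries of M and of PM generate the same extension of K, which is the mechanism behind the proposition.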


9. The Differential Galois Group

Here K is a differential field and (V, D) is a differential K-module. We assume $\dim_K V = n < \infty$.

A Picard-Vessiot extension for (V, D) is a differential field extension $L \supset K$ satisfying the following conditions:

(a) the extension has no new constants;

(b) the L-space $L \otimes_K V$ admits a basis consisting of horizontal vectors of $D_L$; and

(c) when $M \supset K$ is any other differential extension satisfying (a) and (b) there is a differential field embedding $\varphi : L \to M$ over K, i.e., a field embedding over K satisfying

(9.1) $\varphi \circ \delta_L = \delta_M \circ \varphi$.

The intuitive idea behind (c) is that $L \supset K$ is "minimal" among differential extensions $M \supset K$ satisfying properties (a) and (b): any other such extension is "at least as big" in the sense that it must contain an isomorphic copy of L.

The differential Galois group of (V, D) corresponding to an associated Picard-Vessiot extension $L \supset K$ is the group $G_L$ of automorphisms of L over K which commute with the derivation $\delta_L$ on L. This group obviously depends on L, but, as we will see, only up to isomorphism.

We need a few preliminaries. Define a fundamental matrix solution of (V, D) in $GL(n, L)$ to be any fundamental matrix solution of any defining equation of $D_L$.

Proposition 9.2 : Condition (b) is equivalent to the existence of a fundamental matrix solution for $(L \otimes_K V, D_L)$ in $GL(n, L)$.

Proof : By Corollary 5.4.

I thank J. Kovacic for the next observation.


Proposition 9.3 : When $L \supset K$ is a Picard-Vessiot extension for (V, D) the following equivalent conditions hold.

(c1) If $L \supset M \supset K$ is an intermediate differential field, and if the differential field extension $M \supset K$ also satisfies (a) and (b), then there is a differential field embedding $\phi : L \to M$ over K.

(c2) For any fundamental matrix solution $Z \in GL(n, L)$ for (V, D) one has $L = K(Z)$, i.e., L is generated by the entries of Z.

(c3) If $L \supset M \supset K$ is an intermediate differential field, and if the differential field extension $M \supset K$ also satisfies (a) and (b), then $M = L$.

Proof : Condition (c1) is immediate from (c).

(c1) ⇒ (c2) : Set $M := K(Z) \subset L$. We must prove that $L \subset M$.

Since the extension $L \supset K$ has no new constants the same is true of $M \supset K$, and by construction M contains the entries of the fundamental matrix solution Z of (V, D). It follows from Proposition 9.2 that conditions (a) and (b) in the definition of a Picard-Vessiot extension are satisfied by $M \supset K$. By (c1) there is a differential embedding $\phi : L \to K(Z)$, and from (9.1) one sees that $\phi(Z)$ is also a fundamental matrix solution for (V, D). From Proposition 5.5 we conclude that $ZC = \phi(Z)$ for some $C \in GL(n, L_C) = GL(n, K_C)$, the last equality by (a).

Choose any $\ell \in L$. Then $\phi(\ell) \in M = K(Z)$, and we can therefore write

$\phi(\ell) = \frac{p(Z)}{q(Z)}$, where $p, q \in K[X_{ij}]$ and $q(Z) \ne 0$.

Consider the elements $p(ZC^{-1})$ and $q(ZC^{-1})$ of M. Since $\phi$ is an embedding over K and $C \in GL(n, K_C) \subset GL(n, K)$ we have

$\phi(p(ZC^{-1})) = p(\phi(ZC^{-1})) = p(\phi(Z)\phi(C^{-1})) = p(\phi(Z)C^{-1}) = p(Z)$

and, similarly,

$\phi(q(ZC^{-1})) = q(Z)$.

Note from this last identity and $q(Z) \ne 0$ that $q(ZC^{-1}) \ne 0$. Since $\phi$ is injective we see from

$\phi\left(\ell - \frac{p(ZC^{-1})}{q(ZC^{-1})}\right) = \phi(\ell) - \frac{p(Z)}{q(Z)} = 0$

that $\ell = \frac{p(ZC^{-1})}{q(ZC^{-1})} \in M$, and $L \subset M$ follows.

(c2) ⇒ (c3) : If $W \in GL(n, M)$ is a fundamental matrix solution for (V, D) then W may also be considered as a fundamental matrix solution of (V, D) in $GL(n, L)$, and (c2) therefore gives $L = K(W)$. Since $K(W) \subset M \subset L$, the equality $M = L$ follows.

(c3) ⇒ (c1) : Take $\phi := \mathrm{id}_L$.

q.e.d.

Corollary 9.4 : When $L \supset K$ is a Picard-Vessiot extension for (V, D) any differential field embedding $\phi : L \to L$ over K is an automorphism.

Proof : φ(L) ⊂ L satisfies (a) and (b), hence φ(L) = L by (c3). q.e.d.

Theorem 9.5 : Suppose $L \supset K$ and $M \supset K$ are Picard-Vessiot extensions for (V, D). Then there is a differential field isomorphism $\phi : L \to M$ over K, and the associated differential Galois groups are isomorphic.

The mapping $\phi$ is not unique, e.g., for any $g \in G_M$ the composition $g \circ \phi : L \to M$ is another differential field isomorphism over K. Nevertheless, the result will be used to justify reference to "the" Picard-Vessiot extension of a differential K-module (V, D). A similar convention is used with differential Galois groups: one refers to "the" differential Galois group of (V, D).

Proof : By definition we have differential embeddings $L \rightleftarrows M$ over K which by Corollary 9.4 must be isomorphisms. If we let $\phi : L \to M$ denote the upper arrow we obtain an isomorphism $\sigma : G_L \to G_M$ between the differential Galois groups by assigning $g \in G_L$ to $\phi \circ g \circ \phi^{-1} \in G_M$; the inverse is given by $h \in G_M \mapsto \phi^{-1} \circ h \circ \phi \in G_L$. q.e.d.

Picard-Vessiot extensions and differential Galois groups have traditionally been associated with homogeneous linear ordinary differential equations, i.e., with what we are viewing as basis representations of differential modules. Readers familiar with that approach will see immediately from Proposition 9.3(c2) that the extensions and groups we have defined for a differential module agree with the traditional definitions for any defining equation of that structure.

One question we have not addressed is existence, i.e., does a Picard-Vessiot extension exist for any differential structure? In general the answer is no: the standard existence proof needs the hypotheses that $K_C$ has characteristic zero and is algebraically closed.

Bibliography

[C-K] R.C. Churchill and J. Kovacic, Cyclic Vectors, in Differential Algebra and Related Topics (Li Guo, P. Cassidy, W. Keigher and W. Sit, eds.), World Scientific, Singapore, 2002.

[C-L] E.A. Coddington and N. Levinson, Theory of Ordinary Differential Equations, McGraw-Hill, New York, 1955.

[Lang] S. Lang, Algebra, Revised Third Edition, Springer-Verlag, New York, 2002.

[Poole] E.G.C. Poole, Introduction to the Theory of Linear Differential Equations, Dover Publications, New York, 1960.

[vdP-S] M. van der Put and M.F. Singer, Galois Theory of Linear Differential Equations, Springer-Verlag, Berlin, 2003.

R.C. Churchill
Department of Mathematics
Hunter College and the Graduate Center, CUNY,
and the University of Calgary
