
MATH 537 Class Notes

Ed Belk

Fall, 2014

1 Week One

1.1 Lecture One

Instructor: Greg Martin, Office Math 212

Text: Niven, Zuckerman & Montgomery

Conventions: N will denote the set of positive integers, and N₀ the set of nonnegative integers. Unless otherwise stated, all variables are assumed to be elements of N.

§1.2 – Divisibility

Definition: Let a, b ∈ Z with a ≠ 0. Then a is said to divide b, denoted a|b, if there exists some c ∈ Z such that ac = b. If in addition a ∈ N, then a is called a divisor of b.

Properties of Divisibility: For all a, b, c ∈ Z with a ≠ 0, one has:

• If a|b then ±a | ±b

• 1|b, a|0, and b|b whenever b ≠ 0

• If a|b and b|a then a = ±b

• If a|b and a|c, then a|(bx+ cy) for any x, y ∈ Z

If we assume that a and b are positive, we also have

• If a|b then a ≤ b

The Division Algorithm: Let a, b ∈ N. Then there exist unique integers q, r ∈ N₀ such that:

1. b = aq + r, and

2. 0 ≤ r < a

Proof : We prove existence first; consider the set

R = {b − an : n ∈ N₀} ∩ N₀.

R is nonempty, since b = b − a · 0 ∈ R. By the well-ordering axiom, R has a least element r, and we define q to be the nonnegative integer such that b − aq = r. Then b = aq + r and r ≥ 0; moreover, if r ≥ a then one has

0 ≤ r − a = (b − aq) − a = b − a(q + 1) < b − aq = r,

so that r − a is an element of R smaller than r, contradicting the minimality of r ∈ R. Hence r < a, and we are done.


Now, suppose q′ and r′ are such that we have

b = aq + r = aq′ + r′.

Without loss of generality we may assume that r ≥ r′. Then

r − r′ = (b − aq) − (b − aq′) = a(q′ − q) ⇒ a | (r − r′);

but 0 ≤ r − r′ ≤ r < a, and the only multiple of a in this range is 0. Hence r = r′, and then a(q′ − q) = 0 gives q = q′.

Greatest Common Divisor: Given any two integers a and b, not both equal to zero, we define their greatest common divisor (commonly abbreviated gcd) to be the largest d ∈ N such that d|a and d|b; we write d = (a, b). Note that the gcd is always well-defined: 1 is a common divisor, and whichever of a, b is nonzero has only finitely many divisors.

Theorem 1.1.1 Let a, b ∈ Z, not both equal to zero. Then:

1. (a, b) = min S, where S = {ax + by : x, y ∈ Z} ∩ N, and

2. For any c ∈ Z such that c|a and c|b, we have c|(a, b).

The existence of integers x, y so that ax+ by = (a, b) as in part (1) is known as Bezout’s identity.

Proof : 1. Note that S is nonempty, since it contains a² + b² > 0. Let m = min S, with u and v such that m = au + bv, and let g = (a, b). Since g|a and g|b, we know from the properties of divisibility that g|m, and so g ≤ m. Now, if m ∤ a then by the division algorithm we may write a = mq + r with 0 < r < m, and thus

r = a − mq = a − q(au + bv) = a(1 − qu) + b(−qv) ∈ S,

and we deduce that r ≥ m = min S, a contradiction; thus m|a. In the same fashion we show m|b, and so by definition m ≤ (a, b) = g. Hence m = g, and we are done.

2. If c|a and c|b, then we know c|(ax + by) for every x, y ∈ Z, and in particular for those u, v such that (a, b) = au + bv, whose existence is guaranteed by part 1.


1.2 Lecture Two

Recall: Bezout’s identity states that (a, b) is the smallest positive integer that may be written ax + by, where x, y ∈ Z.

Proposition 1.2.1 For a, b ∈ N, one has (ma,mb) = m(a, b).

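Corollary 1: If d|a and d|b, then $\left(\frac{a}{d}, \frac{b}{d}\right) = \frac{1}{d}(a, b)$; in particular, $\left(\frac{a}{(a,b)}, \frac{b}{(a,b)}\right) = 1$.

Proof : Set g = (a, b), so that we may write

ax + by = g

for some x, y ∈ Z. Then mg = (ma)x + (mb)y, so mg lies in the set S of theorem 1.1.1 formed with ma and mb in place of a and b; thus mg ≥ (ma, mb).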

Furthermore, g|a and so mg|ma; similarly mg|mb, thus mg ≤ (ma,mb), and we are done.

Definition: Two integers a and b are called relatively prime (or coprime) if (a, b) = 1.

nb. We observe that (a, b) = 1 if and only if there exist x, y such that ax + by = 1. The corresponding statement with (a, b) = k > 1 is not true in general (for example, 2 · 1 + 4 · 1 = 6, but (2, 4) = 2); however, it is the case that

ax+ by = k ⇒ (a, b)|k.

Proposition 1.2.2 If (a, n) = (b, n) = 1, then (ab, n) = 1.

Proof : Suppose we have u, v, x, y so that au+ nv = bx+ ny = 1; then we have

1 = 1 · 1 = (au+ nv)(bx+ ny) = ab(ux) + n(auy + bvx+ nvy),

and the result is immediate.

[Aside: Compare with the analogous result in commutative algebra. If R is a commutative, unital ring and I, J, K ⊂ R are ideals such that I + K = J + K = R, then IJ + K = R.]

Proposition 1.2.3 If a|c, b|c, and (a, b) = 1, then ab|c. (Note that this is not, in general, true for (a, b) > 1, e.g. a = b = c = 2.)

Proof : Choose m,n, x, y so that c = am = bn and ax+ by = 1. Then

c = cax+ cby = (bn)ax+ (am)by = ab(nx+my),

and we deduce that ab|c.

Theorem 1.2.4 (Theorem 1.10, Niven) If d|ab and (b, d) = 1, then d|a.

Proof : Exercise.

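nb. If d|a and d|b, then d|(b + ax) for any x ∈ Z. Conversely, if d|a and d|(b + ax), then d|b, as b = (b + ax) − xa. Hence a, b and a, b + ax have exactly the same common divisors, and in particular (a, b) = (a, b + ax).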

The Euclidean Algorithm: How can we find the gcd of two integers, for example 537 and 105?

By the division algorithm, we have 537 = 5 · 105 + 12, and so by the above note we know (537, 105) = (105, 12). Repeating this process, we see

105 = 8 · 12 + 9⇒ (105, 12) = (12, 9);

12 = 1 · 9 + 3⇒ (12, 9) = (9, 3);


9 = 3 · 3 + 0⇒ (9, 3) = (3, 0) = 3.

Thus (537, 105) = 3.
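The repeated division above can be phrased as a short loop. The following is a minimal Python sketch (the name `euclid_gcd` is our own), which replaces (a, b) by (b, r) until the remainder vanishes, exactly as in the computation of (537, 105).

```python
def euclid_gcd(a: int, b: int) -> int:
    """Return (a, b) for integers a, b not both zero, by the Euclidean algorithm."""
    a, b = abs(a), abs(b)
    while b != 0:
        # Write a = qb + r with 0 <= r < b; the pairs (a, b) and (b, r)
        # have the same common divisors, so their gcds agree.
        a, b = b, a % b
    return a


print(euclid_gcd(537, 105))  # 3, via (537, 105) -> (105, 12) -> (12, 9) -> (9, 3) -> (3, 0)
```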

Notation: The least common multiple of a and b is denoted lcm(a, b) or, more commonly, [a, b].

Exercise: Show that (a, b)[a, b] = ab.

§1.3 – Primes

Definition: A natural number n is called prime if it has exactly two divisors. n is called composite if there exists some d with 1 < d < n such that d|n. The integer n = 1 is neither prime nor composite.

Notation: Unless otherwise stated, p will denote a prime number.

Lemma 1.2.5 (Euclid’s lemma) If p|ab, then p|a or p|b.

Proof : Suppose p ∤ b. Then (p, b) = 1, and so by theorem 1.2.4 we know that p|a.

Theorem 1.2.6 (The Fundamental Theorem of Arithmetic) Every n ∈ N with n ≥ 2 may be written as a product of primes; moreover this expression is unique up to reordering of the factors.

Proof : (existence) We use strong induction. The case n = 2 is trivial from the definition of a prime, therefore suppose n > 2. If n is prime we have the trivial factorization n = n; otherwise we may write n = ab, with 1 < a < n and 1 < b < n. By the inductive hypothesis we may write $a = p_1 p_2 \cdots p_k$ and $b = q_1 q_2 \cdots q_l$, with each $p_i, q_j$ prime, and the result is immediate.

(uniqueness) Let n ∈ N and suppose we have

$n = p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_l$, each $p_i, q_j$ prime.

Since $p_1 \mid q_1 q_2 \cdots q_l$ we have by lemma 1.2.5 that $p_1 \mid q_1$ or $p_1 \mid q_2 \cdots q_l$. Repeating this process as many times as necessary, we find $q_t$ such that $p_1 \mid q_t$, and by relabelling the $q_j$ if necessary we will assume t = 1. Since $p_1 \neq 1$ this implies that $p_1 = q_1$, as the prime $q_1$ has no divisor greater than 1 other than itself. We then cancel $p_1 = q_1$ on both sides of the equation and we have

$p_2 p_3 \cdots p_k = q_2 q_3 \cdots q_l$.

We apply the same argument to this expression to obtain $p_2 = q_2$, $p_3 = q_3$, and so on; it follows that k = l, and we are done.


2 Week Two

2.1 Lecture Three

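Doing a linear algebra problem backwards. Consider the augmented matrix

$$\left(\begin{array}{cc|c} 1 & 0 & 537 \\ 0 & 1 & 105 \end{array}\right);$$

this system clearly has solution $\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 537 \\ 105 \end{pmatrix}$. Moreover, from basic linear algebra we know that the application of elementary row operations to this augmented system will not change the solution; therefore, with $R_1, R_2$ respectively denoting the first and second rows of the matrix, we observe that $\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 537 \\ 105 \end{pmatrix}$ is also a solution to the augmented matrices

$$\left(\begin{array}{cc|c} 1 & -5 & 12 \\ 0 & 1 & 105 \end{array}\right) \ (R_1 \to R_1 - 5R_2), \qquad \left(\begin{array}{cc|c} 1 & -5 & 12 \\ -8 & 41 & 9 \end{array}\right) \ (R_2 \to R_2 - 8R_1),$$

$$\left(\begin{array}{cc|c} 9 & -46 & 3 \\ -8 & 41 & 9 \end{array}\right) \ (R_1 \to R_1 - R_2), \qquad \left(\begin{array}{cc|c} 9 & -46 & 3 \\ -35 & 179 & 0 \end{array}\right) \ (R_2 \to R_2 - 3R_1).$$

Thus we have the matrix equation

$$\begin{pmatrix} 9 & -46 \\ -35 & 179 \end{pmatrix} \begin{pmatrix} 537 \\ 105 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix}.$$

The first entry of this equation indicates that 9(537) + (−46)(105) = 3 = (537, 105), while the entries in the second row of the matrix are $-35 = -\frac{105}{(537,105)}$ and $179 = \frac{537}{(537,105)}$. This procedure is known as the extended Euclidean algorithm.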
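The row reduction can be automated by keeping each row as a list [x, y, r] with the invariant xa + yb = r. This is a minimal sketch (the name `extended_euclid` is ours); it always subtracts from the first row and then swaps the rows, which produces the same sequence of rows as the hand computation.

```python
def extended_euclid(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with a*x + b*y = g = (a, b), for a, b >= 0 not both zero.

    Each row [x, y, r] stands for an augmented row (x  y | r) and satisfies
    x*a + y*b = r; elementary row operations preserve this invariant.
    """
    r1, r2 = [1, 0, a], [0, 1, b]
    while r2[2] != 0:
        q = r1[2] // r2[2]
        # R1 -> R1 - q*R2, then swap so the row with the smaller remainder is second
        r1 = [u - q * v for u, v in zip(r1, r2)]
        r1, r2 = r2, r1
    x, y, g = r1
    return g, x, y


print(extended_euclid(537, 105))  # (3, 9, -46)
```

For (537, 105) the rows produced are [1, −5, 12], [−8, 41, 9], [9, −46, 3] and finally [−35, 179, 0], matching the four matrices of the lecture.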

Lemma 2.1.1 Let a, b ∈ N and use the division algorithm to write b = aq + r with 0 ≤ r < a. Then a|b if and only if r = 0.

Proof : If r = 0 then b = aq and we are done. Conversely, if a|b then a|(b − ax) for every x, and in particular a | (b − aq) = r; since 0 ≤ r < a, we must have r = 0.

Theorem 2.1.2 (Euclid’s theorem) There are infinitely many prime numbers.

Proof : It suffices to show that every finite list of primes excludes at least one prime number. Let $p_1, p_2, \ldots, p_k$ be a set of finitely many primes and let

$N = p_1 p_2 \cdots p_k + 1$.

Then N ≥ 2 and so by the fundamental theorem of arithmetic N is the product of primes, so there exists some prime p such that p|N. Applying the division algorithm with N and any $p_j$ yields

$N = p_j (p_1 \cdots p_{j-1} p_{j+1} \cdots p_k) + 1$,

which (since $1 < p_j$) by lemma 2.1.1 implies that $p_j \nmid N$ for any j. Thus we deduce that $p \neq p_j$ for any j = 1, 2, …, k, and therefore that the set of primes $p_1, p_2, \ldots, p_k$ is not exhaustive.
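Euclid's argument is constructive: from any finite list of primes it produces a prime outside the list. A minimal sketch, assuming trial division is acceptable for the small numbers involved (both function names are our own):

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Smallest prime dividing n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def new_prime(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    N = prod(primes) + 1  # N = p_1 p_2 ... p_k + 1 >= 2
    # N leaves remainder 1 on division by every p_j, so no p_j divides N;
    # hence any prime factor of N is missing from the list.
    return smallest_prime_factor(N)


print(new_prime([2, 3, 5, 7, 11, 13]))  # 59, since 30031 = 59 * 509
```

Note that the new prime need not be N itself, nor the next prime after the list: for the list [3, 5] one gets N = 16 and the new prime 2.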


§2.1 – Congruences

Definition: Let m ∈ Z, m ≠ 0. Given a, b ∈ Z, we say that a is congruent to b modulo m, written a ≡ b mod m, if m|(b − a). For example, we have

53 ≡ 7 mod 23, but 5 ≢ 37 mod 23.

Lemma 2.1.3 For fixed m ≠ 0, “congruence modulo m” is an equivalence relation.

Proof : Clearly a ≡ a mod m because m|0 = a − a, which proves reflexivity. Symmetry is an immediate consequence of the fact that m|(b − a) ⇔ m|(a − b), and to prove transitivity we observe that

a ≡ b mod m, b ≡ c mod m⇒ m|(b− a),m|(c− b)⇒ m|(c− b) + (b− a) = (c− a),

and we are done.

Thus in particular, congruence modulo m (as every equivalence relation) partitions Z into equivalence classes, called residue classes modulo m. For example, one residue class modulo 23 is the set

{…, −39, −16, 7, 30, 53, …}.

In general, a residue class modulo m is of the form {a + km : k ∈ Z}. Note in particular that a ≡ b mod m if and only if a and b have the same remainder when divided by m.

Lemma 2.1.4 Suppose a ≡ b mod m, c ≡ d mod m. Then:

1. If d|m then a ≡ b mod d,

2. a+ c ≡ b+ d mod m,

3. ac ≡ bd mod m.

Proof : We prove only (3), as the others are clear from the definitions: since m|(b − a) and m|(d − c), we must have that m divides c(b − a) + b(d − c) = bd − ac, and the result follows.

The last two parts of lemma 2.1.4 imply further that a − c ≡ b − d mod m, and more generally, if f(X) ∈ Z[X], then f(a) ≡ f(b) mod m whenever a ≡ b mod m. In particular, we have that $a^k \equiv b^k \bmod m$ for any k ∈ N.

Question: If j ≡ k mod m, do we have $a^j \equiv a^k \bmod m$?

In general, no: for example, with a = 2 and m = 3 we have 1 ≡ 4 mod 3, but $2^1 = 2 \not\equiv 1 \equiv 2^4 \bmod 3$; the choice a = 2, m = 4 gives similar counterexamples.

We have seen that the operations of addition, subtraction, and multiplication behave well with respect to congruence modulo m; does division? Again, in general the answer is no:

18 ≡ 28 mod 10, but 9 ≢ 14 mod 10,

as we might expect if we were allowed to “divide by 2.”

Theorem 2.1.5 (Theorem 2.3, Niven) We have ax ≡ ay mod m if and only if $x \equiv y \bmod \frac{m}{(a,m)}$. In particular, if (a, m) = 1 then

ax ≡ ay mod m ⇔ x ≡ y mod m.

Proof : Suppose ax ≡ ay mod m, so that m | a(y − x); then we have $\frac{m}{(a,m)} \mid \frac{a}{(a,m)}(y - x)$, and since

$$\left(\frac{m}{(a,m)}, \frac{a}{(a,m)}\right) = 1$$

we know that $\frac{m}{(a,m)} \mid (y - x)$, hence $x \equiv y \bmod \frac{m}{(a,m)}$. Now, suppose $x \equiv y \bmod \frac{m}{(a,m)}$, so that $\frac{m}{(a,m)} \mid (y - x)$. Then we certainly have $a \frac{m}{(a,m)} \mid a(y - x)$, hence $\frac{a}{(a,m)} m \mid a(y - x)$, and so in particular m | a(y − x), and we are done.

Definition: Given m ∈ Z, m ≠ 0, a complete residue system modulo m is a set containing exactly one element from each residue class modulo m. For example, with m = 5 we may take any of the sets

{0, 1, 2, 3, 4}, {1, 2, 3, 4, 5}, {−2, −1, 0, 1, 2}, or {−17, 60, 101, 12, −111}.

A reduced residue system is a set of representatives from all residue classes relatively prime to m; continuing in the same example, we may take

{1, 2, 3, 4} or {537, −7, 1, 99999929}.


2.2 Lecture Four

Recall: A reduced residue system modulo m is a set consisting of exactly one element from each residue class modulo m whose elements are relatively prime to m; these are called reduced residue classes. Equivalently, we may take any complete residue system modulo m, and discard all elements d such that (d, m) > 1.

Example: If m = 10, a complete residue system is given by {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}; by discarding all elements not relatively prime to 10, we obtain the reduced residue system {1, 3, 7, 9}. If m is prime, a reduced residue system is given by {1, 2, …, m − 1}.

Definition: The Euler φ-function (or Euler totient function) is the function which assigns to m ∈ N the cardinality of a reduced residue system modulo m; that is,

φ(m) = #{1 ≤ i ≤ m : (i, m) = 1}.

For example, φ(10) = 4, and φ(p) = p− 1 for any prime p.
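For small m the definition can be evaluated directly by discarding the elements of {1, …, m} that share a factor with m. A minimal sketch (function names are our own; this brute-force count is only sensible for small m):

```python
from math import gcd


def reduced_residue_system(m: int) -> list[int]:
    """The integers 1 <= i <= m with (i, m) = 1: one representative per reduced residue class."""
    return [i for i in range(1, m + 1) if gcd(i, m) == 1]


def phi(m: int) -> int:
    """Euler's phi-function, computed straight from the definition."""
    return len(reduced_residue_system(m))


print(reduced_residue_system(10))  # [1, 3, 7, 9]
print(phi(10))                     # 4
```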

Lemma 2.2.1 Let $r_1, r_2, \ldots, r_{\varphi(m)}$ be a reduced residue system modulo m and let a ∈ Z with (a, m) = 1. Then

$a r_1, a r_2, \ldots, a r_{\varphi(m)}$

is also a reduced residue system modulo m.

For example, with m = 10, a = 13, we see that {13, 39, 91, 117} = {13 · 1, 13 · 3, 13 · 7, 13 · 9} is a reduced residue system modulo 10.

Proof : By assumption a and each $r_j$ are relatively prime to m, and so each $a r_j$ is also relatively prime to m. Moreover, if $a r_i, a r_j$ lie in the same residue class, then one has

ari ≡ arj mod m.

By theorem 2.1.5, we may cancel a (which is relatively prime to the modulus) to yield the congruence

ri ≡ rj mod m,

and hence (since we began with a reduced residue system) we know that i = j, and the result is immediate.

Theorem 2.2.2 (Euler’s theorem) If (a, m) = 1, then $a^{\varphi(m)} \equiv 1 \bmod m$.

Proof : Let $r_1, r_2, \ldots, r_{\varphi(m)}$ be a reduced residue system modulo m. Then by lemma 2.2.1, the elements $a r_1, a r_2, \ldots, a r_{\varphi(m)}$ are congruent (in some order) to the elements $r_1, r_2, \ldots, r_{\varphi(m)}$, and therefore

$$r_1 r_2 \cdots r_{\varphi(m)} \equiv (a r_1)(a r_2) \cdots (a r_{\varphi(m)}) \equiv a^{\varphi(m)} r_1 r_2 \cdots r_{\varphi(m)} \bmod m.$$

Since $(r_1 r_2 \cdots r_{\varphi(m)}, m) = 1$, we may cancel it, and the result follows.

Corollary 1: (Fermat’s little theorem) If p is prime and p ∤ a, then $a^{p-1} \equiv 1 \bmod p$, and for all a ∈ Z one has $a^p \equiv a \bmod p$.

Corollary 2: Let (a, m) = 1. If e, f ∈ N₀ satisfy e ≡ f mod φ(m), then $a^e \equiv a^f \bmod m$.

For example, 537 ≡ 1 mod 4, and since 4 = φ(10) we have that $3^{537} \equiv 3^1 \bmod 10$.

Proof : Suppose without loss of generality that f ≥ e and write f = e + kφ(m). We have

$$a^f = a^{e + k\varphi(m)} = a^e \left(a^{\varphi(m)}\right)^k \equiv a^e (1)^k \equiv a^e \bmod m,$$

as claimed.
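Corollary 2 says the exponent may be reduced modulo φ(m) before exponentiating. A minimal sketch (the helper name `power_mod_reduced` is hypothetical) that does exactly this and can be compared against Python's built-in three-argument `pow`:

```python
from math import gcd


def phi(m: int) -> int:
    # Euler's phi-function from the definition (fine for small m)
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)


def power_mod_reduced(a: int, e: int, m: int) -> int:
    """a^e mod m for e >= 0, after replacing e by e mod phi(m) (Corollary 2)."""
    if gcd(a, m) != 1:
        raise ValueError("the exponent reduction needs (a, m) = 1")
    return pow(a, e % phi(m), m)


print(power_mod_reduced(3, 537, 10))  # 3, because 537 ≡ 1 (mod 4)
print(pow(3, 537, 10))                # 3, computed directly
```

The hypothesis (a, m) = 1 matters: for a = 2, m = 4 one has φ(4) = 2 and 0 ≡ 2 mod 2, yet 2⁰ = 1 ≢ 0 ≡ 2² mod 4.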

Definition: Given a, m ∈ Z with m ≠ 0, we call x ∈ Z a (multiplicative) inverse of a modulo m if ax ≡ 1 mod m.

Theorem 2.2.3 (Theorem 2.9, Niven) If (a, m) > 1, then a has no inverse modulo m. If (a, m) = 1, then there exists a unique reduced residue class modulo m which contains all inverses of a. We denote any such inverse by $\bar{a}$ or $a^{-1}$.

Note that the notation $a^{-1}$ is justified, as for example if we define $a^{-k}$ to be $(a^{-1})^k \bmod m$, then we indeed have $(a^k)^{-1} \equiv (a^{-1})^k$.

Proof : Let g = (a, m); note that if ax ≡ 1 mod m then ax ≡ 1 mod g, and since g|a this congruence becomes 0x ≡ 1 mod g, a contradiction unless g = 1. Thus with the assumption that g = 1, we first prove uniqueness: if

ax ≡ 1 mod m and ay ≡ 1 mod m,

then ax ≡ ay mod m, hence (since (a, m) = 1) x ≡ y mod m, as claimed. To show existence, we give two short proofs:

(1) By Euler’s theorem, we have $1 \equiv a^{\varphi(m)} \equiv a \cdot a^{\varphi(m)-1} \bmod m$, so we may take $a^{-1} = a^{\varphi(m)-1}$.

(2) Since (a, m) = 1, there exist integers u, v such that au + mv = 1. Taking this equation modulo m yields the congruence au ≡ 1 mod m, and so we may take $a^{-1} = u$.
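Both existence proofs translate directly into code. The sketch below is illustrative only (all names are our own): proof (1) raises a to the power φ(m) − 1, while proof (2) finds Bézout coefficients recursively. Python 3.8+ also computes inverses natively via `pow(a, -1, m)`.

```python
from math import gcd


def phi(m: int) -> int:
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)


def inverse_via_euler(a: int, m: int) -> int:
    """Proof (1): a^{-1} = a^{phi(m) - 1} mod m, valid when (a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a has no inverse modulo m")
    return pow(a, phi(m) - 1, m)


def bezout(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, u, v) with a*u + b*v = g = (a, b), for a, b >= 0 not both zero."""
    if b == 0:
        return a, 1, 0
    g, u, v = bezout(b, a % b)
    # b*u + (a - (a // b) * b)*v = g, rearranged as a combination of a and b
    return g, v, u - (a // b) * v


def inverse_via_bezout(a: int, m: int) -> int:
    """Proof (2): from a*u + m*v = 1, reducing modulo m gives a*u ≡ 1."""
    g, u, _ = bezout(a % m, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return u % m


print(inverse_via_euler(9, 20), inverse_via_bezout(9, 20))  # 9 9
print(inverse_via_bezout(20, 9))                            # 5
```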


2.3 Lecture Five

Calculating inverses: Suppose we want to calculate the (multiplicative) inverse of 9 modulo 20; note that this calculation is well-defined, as (9, 20) = 1. We perform the Euclidean algorithm:

20 = 9 · 2 + 2; 9 = 2 · 4 + 1

⇒ 1 = 9 − 4 · 2 = 9 − 4 · (20 − 2 · 9) = 9 · 9 − 4 · 20.

Taking this last equation modulo 20, we see that 9² ≡ 1 mod 20, so $9^{-1} \equiv 9 \bmod 20$. Taking the same equation modulo 9 gives −4 · 20 ≡ 1 mod 9, so $20^{-1} \equiv -4 \equiv 5 \bmod 9$. One clearly has

$20^{-1} \equiv 1 \bmod 19$, $19^{-1} \equiv -1 \bmod 20$,

$19^{-1} \equiv 1 \bmod 9$, $9^{-1} \equiv -2 \bmod 19$.

Definition: A collection of integers $m_1, m_2, \ldots, m_r$ is called pairwise coprime (or pairwise relatively prime) if $(m_i, m_j) = 1$ for all i ≠ j. Note that this is stronger than the statement that $(m_1, m_2, \ldots, m_r) = 1$. For example, (6, 10, 15) = 1, but (6, 10) = 2, (6, 15) = 3, (10, 15) = 5.

Theorem 2.3.1 (Theorem 2.18, Niven; the Chinese remainder theorem) Let $m_1, m_2, \ldots, m_r$ be pairwise coprime, and let $a_1, a_2, \ldots, a_r$ be any set of integers. Then there exists a solution x to the system of congruences

x ≡ a1 mod m1,

x ≡ a2 mod m2,

...

x ≡ ar mod mr,

and moreover the set of all solutions is exactly the residue class of x modulo M = m1m2 · · ·mr.

Proof : For j = 1, 2, …, r, let $N_j = \frac{m_1 m_2 \cdots m_r}{m_j}$, and note that $(m_j, N_j) = 1$. Therefore we may define $b_j$ to be the inverse of $N_j$ modulo $m_j$, so $N_j b_j \equiv 1 \bmod m_j$. Set

$$x_0 = \sum_{j=1}^{r} N_j b_j a_j;$$

we claim that $x_0$ solves our system. Indeed, each $N_i$ with i ≠ j is congruent to 0 modulo $m_j$, and so $x_0 \equiv (N_j b_j) a_j \equiv a_j \bmod m_j$, as claimed. Now, if $x \equiv x_0 \bmod M$, then in particular for each j we have

$x \equiv x_0 \equiv a_j \bmod m_j$,

so x is also a solution. Finally, if y is any solution to our system, then $y \equiv a_j \equiv x_0 \bmod m_j$ for every j, so $m_j \mid (y - x_0)$. Since the $m_i$ are pairwise coprime, we have $m_1 m_2 \mid (y - x_0)$, $m_1 m_2 m_3 \mid (y - x_0)$, and so on, until we obtain $M \mid (y - x_0)$, and we are done.
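The existence proof is an explicit recipe. The following sketch (the name `crt` is ours) builds $x_0 = \sum N_j b_j a_j$ term by term, assuming Python 3.8+ for `math.prod` and for modular inverses via `pow(N_j, -1, m_j)`. Applied to the Timbits system of the example in this lecture it reproduces t = 460.

```python
from itertools import combinations
from math import gcd, prod


def crt(residues: list[int], moduli: list[int]) -> tuple[int, int]:
    """Solve x ≡ a_j (mod m_j) for pairwise coprime moduli; return (x0 mod M, M)."""
    if any(gcd(m_i, m_j) != 1 for m_i, m_j in combinations(moduli, 2)):
        raise ValueError("the moduli must be pairwise coprime")
    M = prod(moduli)
    x0 = 0
    for a_j, m_j in zip(residues, moduli):
        N_j = M // m_j           # product of the other moduli
        b_j = pow(N_j, -1, m_j)  # N_j * b_j ≡ 1 (mod m_j)
        x0 += N_j * b_j * a_j    # ≡ a_j (mod m_j) and ≡ 0 modulo every other m_i
    return x0 % M, M


print(crt([0, 1, 4], [20, 9, 19]))  # (460, 3420)
```

Here `pow` returns the least nonnegative inverse, e.g. b₃ = 17 rather than −2; both lie in the same residue class modulo 19, so the final answer modulo M is unchanged.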

Remark: If $m_1, m_2, \ldots, m_r$ are not pairwise coprime, then there may be no solution, or there may be one residue class of solutions modulo $[m_1, m_2, \ldots, m_r]$. For example, the system

x ≡ 0 mod 6,

x ≡ 1 mod 4,

has no solution, while

x ≡ 0 mod 6,

x ≡ 2 mod 4,

has as its solution the residue class of 6 modulo 12.
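When the moduli are not coprime, the construction of the proof no longer applies, but for small moduli one can simply search one period of length $[m_1, \ldots, m_r]$. A brute-force sketch (the name `solve_by_search` is our own; `math.lcm` with several arguments needs Python 3.9+):

```python
from math import lcm


def solve_by_search(residues: list[int], moduli: list[int]) -> list[int]:
    """All 0 <= x < [m_1, ..., m_r] with x ≡ a_j (mod m_j) for every j (brute force)."""
    L = lcm(*moduli)
    return [x for x in range(L) if all((x - a) % m == 0 for a, m in zip(residues, moduli))]


print(solve_by_search([0, 1], [6, 4]))  # []
print(solve_by_search([0, 2], [6, 4]))  # [6]
```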

Example: Greg steals B boxes of 20 Timbits each. There are an equal number of each of the 9 flavours, and one extra to fill the last box. In class, he divides the Timbits equally among the 19 students, with 4 left over for himself. What is the smallest possible value of B?

Solution: Let t be the total number of Timbits; we have

t ≡ 0 mod 20,

t ≡ 1 mod 9,

t ≡ 4 mod 19.

Set m1 = 20,m2 = 9,m3 = 19; then

N1 = 171, N2 = 380, N3 = 180.

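We need $b_1 \equiv N_1^{-1} \equiv (9 \cdot 19)^{-1} \equiv 9^{-1} \cdot 19^{-1} \equiv 9 \cdot (-1) \equiv 11 \bmod 20$, from our previous work. Similarly, $b_2 \equiv 5 \bmod 9$ and $b_3 \equiv -2 \bmod 19$. Hence

$x_0 = N_1 b_1 a_1 + N_2 b_2 a_2 + N_3 b_3 a_3 = (171)(11)(0) + (380)(5)(1) + (180)(-2)(4) = 460.$

Since M = 20 · 9 · 19 = 3420, the solutions are exactly t ≡ 460 mod 3420; the smallest positive one is t = 460, so the smallest possible value is B = 460/20 = 23.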

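Structural comments: Let $\mathbb{Z}_m = \mathbb{Z}/m\mathbb{Z}$ be the set of residue classes modulo m. If d|m, then there is a well-defined projection map $\pi_d : \mathbb{Z}_m \to \mathbb{Z}_d$ given by

$\pi_d(a \bmod m) = a \bmod d$.

Note that this map is not well-defined if d ∤ m. Now, let $m_1, m_2, \ldots, m_r$ be pairwise coprime. We have a map

$$\pi : \mathbb{Z}_{m_1 m_2 \cdots m_r} \longrightarrow \mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_r},$$

given in each component $\mathbb{Z}_{m_i}$ by $\pi_{m_i}$. The Chinese remainder theorem gives a map

$$\rho : \mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_r} \longrightarrow \mathbb{Z}_{m_1 m_2 \cdots m_r}$$

such that π ∘ ρ = id. Since both sets are finite of the same cardinality $m_1 m_2 \cdots m_r$, it follows that π and ρ are mutually inverse bijections. One can check that:

1. π and ρ respect coprimality, and

2. π and ρ respect multiplication and addition.

Hence, π and ρ are ring isomorphisms. In particular, if $\mathbb{Z}_m^\times$ is the set of reduced residue classes modulo m, then

$$\pi : (\mathbb{Z}_{m_1 m_2 \cdots m_r})^\times \longrightarrow \mathbb{Z}_{m_1}^\times \times \mathbb{Z}_{m_2}^\times \times \cdots \times \mathbb{Z}_{m_r}^\times$$

is an isomorphism of multiplicative groups. It follows from this, and the definition of the Euler φ-function, that

φ(m1m2 · · ·mr) = φ(m1)φ(m2) · · ·φ(mr).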


3 Week Three

3.1 Lecture Six

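Suppose n ∈ N has prime factorization

$$n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_r^{\alpha_r},$$

with $\alpha_i > 0$ and $p_i \neq p_j$ for all i ≠ j, and set $m_i = p_i^{\alpha_i}$, so that the $m_i$ are pairwise coprime. Then as discussed last time, we have maps

$$\pi : \mathbb{Z}_{m_1 m_2 \cdots m_r} \longrightarrow \mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_r},$$

$$\rho : \mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_r} \longrightarrow \mathbb{Z}_{m_1 m_2 \cdots m_r},$$

where $\pi = \pi_{p_1^{\alpha_1}} \times \pi_{p_2^{\alpha_2}} \times \cdots \times \pi_{p_r^{\alpha_r}}$ and ρ is the map given by the Chinese remainder theorem. These maps are mutual inverses, and moreover are ring isomorphisms.

In particular, these maps respect coprimality, and so their restrictions to their respective multiplicative groups of units yield mutually inverse group isomorphisms

$$\pi : (\mathbb{Z}_{m_1 m_2 \cdots m_r})^\times \longrightarrow \mathbb{Z}_{m_1}^\times \times \mathbb{Z}_{m_2}^\times \times \cdots \times \mathbb{Z}_{m_r}^\times,$$

$$\rho : \mathbb{Z}_{m_1}^\times \times \mathbb{Z}_{m_2}^\times \times \cdots \times \mathbb{Z}_{m_r}^\times \longrightarrow (\mathbb{Z}_{m_1 m_2 \cdots m_r})^\times.$$

By definition, $(\mathbb{Z}_n)^\times$ has cardinality φ(n), and so it follows that

$$\varphi(m_1 m_2 \cdots m_r) = \varphi(m_1)\varphi(m_2)\cdots\varphi(m_r).$$

Thus we are led to compute $\varphi(p^\alpha)$ for prime p. Since p is the only prime dividing $p^\alpha$, an integer $1 \le k \le p^\alpha$ satisfies $(p^\alpha, k) > 1$ if and only if p|k; that is, exactly the $p^{\alpha-1}$ multiples of p in this range are not relatively prime to $p^\alpha$. Hence

$$\varphi(p^\alpha) = p^\alpha - p^{\alpha-1} = p^\alpha\left(1 - \frac{1}{p}\right).$$

It follows that

$$\varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right),$$

with the product running over all prime divisors p of n.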
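The product formula only needs the distinct prime divisors of n. A minimal sketch (names are our own, trial division assumed adequate for small n) that evaluates it in exact integer arithmetic and compares it with a direct count:

```python
from math import gcd


def prime_divisors(n: int) -> list[int]:
    """Distinct primes dividing n >= 1, found by trial division."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)  # the leftover cofactor is prime
    return primes


def phi_product(n: int) -> int:
    """phi(n) = n * prod_{p | n} (1 - 1/p), kept in integer arithmetic."""
    result = n
    for p in prime_divisors(n):
        result = result // p * (p - 1)  # exact: p still divides result at this point
    return result


def phi_brute(n: int) -> int:
    # direct count from the definition, for comparison
    return sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)


print(phi_product(3420), phi_brute(3420))  # 864 864
```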

Lemma 3.1.1 Fix m ∈ N, and consider the following statements:

1. x² ≡ 1 mod m

2. $x^{-1} \equiv x \bmod m$

3. x ≡ ±1 mod m

For any m, one has (1) if and only if (2), and that (3) implies (1). If m is prime, then all three are equivalent.

Proof : The equivalence of (1) and (2) is clear, as is the statement that (3) implies (1). Thus we will assume m is prime; then one has (1) if and only if m | x² − 1 = (x + 1)(x − 1). Thus by Euclid’s lemma we have m | x + 1 or m | x − 1, and the result is immediate.

We saw in the last lecture that $9^{-1} \equiv 9 \bmod 20$, but clearly 9 ≢ ±1 mod 20. The same is true for 11 ≡ −9 mod 20.

Theorem 3.1.2 (Wilson’s theorem) If p is prime, then (p− 1)! ≡ −1 mod p.

Proof : The cases p = 2, p = 3 are clear by computation. For p > 3, we pair off the numbers 2, 3, …, p − 2 as $a_1, b_1, a_2, b_2, \ldots, a_k, b_k$, where $k = \frac{p-3}{2}$ and $a_i b_i \equiv 1 \bmod p$. We know that this is well-defined by lemma 3.1.1 (no x in this range satisfies $x^{-1} \equiv x$), and the fact that inverses modulo p are unique. One then has

$$(p-1)! = 1 \cdot 2 \cdots (p-1) = 1 \cdot (p-1) \cdot a_1 b_1 \cdots a_k b_k \equiv 1 \cdot (p-1) \cdot 1 \cdot 1 \cdots 1 \equiv -1 \bmod p,$$

as claimed.
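The pairing in the proof can be made explicit. This sketch (the name `wilson_pairs` is ours; it assumes p is an odd prime and Python 3.8+ for `pow(x, -1, p)`) lists the pairs $(a_i, b_i)$ and checks the resulting congruence with `math.factorial`.

```python
from math import factorial


def wilson_pairs(p: int) -> list[tuple[int, int]]:
    """Pair each x in {2, ..., p-2} with its inverse modulo p (p an odd prime assumed)."""
    pairs, seen = [], set()
    for x in range(2, p - 1):
        if x not in seen:
            y = pow(x, -1, p)  # y != x, since x is not ≡ ±1 (lemma 3.1.1)
            pairs.append((x, y))
            seen.update((x, y))
    return pairs


print(wilson_pairs(11))    # [(2, 6), (3, 4), (5, 9), (7, 8)]
print(factorial(10) % 11)  # 10, i.e. (11 - 1)! ≡ -1 (mod 11)
```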

§2.2 – Solutions of congruences

How many solutions has $X^4 + 2X^3 + X + 1 \equiv 0 \bmod 5$?

As integers, we have solutions

x ∈ · · · ,−14,−13,−9,−8,−4,−3, 1, 2, 6, 7, 11, 12, · · · .

As residue classes modulo 5, we have only

x ≡ 1 mod 5 and x ≡ 2 mod 5;

we say that our congruence has only 2 solutions modulo 5.

Definition: Given a polynomial f(X) ∈ Z[X], the number of solutions of f(X) ≡ 0 mod m, denoted $\sigma_f(m)$, is the number of residue classes modulo m which satisfy the congruence; equivalently,

$\sigma_f(m) = \#\{1 \le x \le m : f(x) \equiv 0 \bmod m\}$.

Example: Let f(X) = X² − 1. We saw that $\sigma_f(20) \ge 4$, while by lemma 3.1.1 we know that if p is an odd prime then $\sigma_f(p) = 2$, while $\sigma_f(2) = 1$.
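Since $\sigma_f(m)$ counts residues in {1, …, m}, it can be computed by exhaustion for small m. A minimal sketch (the helper names are our own) applied to the two polynomials above:

```python
def sigma(f, m: int) -> int:
    """sigma_f(m): the number of residue classes x mod m with f(x) ≡ 0 (mod m)."""
    return sum(1 for x in range(1, m + 1) if f(x) % m == 0)


def f(x: int) -> int:
    return x**4 + 2 * x**3 + x + 1


def x_squared_minus_1(x: int) -> int:
    return x * x - 1


print([x for x in range(1, 6) if f(x) % 5 == 0])  # [1, 2]
print(sigma(f, 5))                                 # 2
print(sigma(x_squared_minus_1, 20))                # 4: the classes 1, 9, 11, 19
```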

We begin our investigation by studying linear congruences of the form ax ≡ b mod m.

Theorem 3.1.3 (Theorem 2.17, Niven) Let m ∈ N and set f(X) = aX − b, a, b ∈ Z. Set g = (a, m). Then $\sigma_f(m) = 0$ unless g|b, in which case $\sigma_f(m) = g$.

Proof : If ax ≡ b mod m, then ax ≡ b mod g, i.e. 0x ≡ b mod g, since g|a, and hence we must have g|b. Now, suppose g|b and write a = αg, b = βg, m = µg. Then

ax ≡ b mod m⇔ αx ≡ β mod µ,

by theorem 2.1.5. But (α, µ) = 1 by construction, so $\alpha^{-1}$ modulo µ exists, and we have the unique solution given by $x \equiv \alpha^{-1}\beta \bmod \mu$. This yields $g = \frac{m}{\mu}$ solutions modulo m, as claimed.

Example: Let m = 100 and g = 5, so that µ = 20. Then x ≡ 14 mod 20 if and only if x ≡ 14, 34, 54, 74, or 94 modulo 100.
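The proof of theorem 3.1.3 is an algorithm: divide through by g, invert α modulo µ, then list the g lifts modulo m. A minimal sketch (the name `solve_linear` is hypothetical; Python 3.8+ assumed for `pow(alpha, -1, mu)`). The coefficients a = 5, b = 70 in the usage line are our own choice, picked so that the reduced congruence is x ≡ 14 mod 20 as in the example.

```python
from math import gcd


def solve_linear(a: int, b: int, m: int) -> list[int]:
    """All x in [0, m) with ax ≡ b (mod m), following the proof of theorem 3.1.3."""
    g = gcd(a, m)
    if b % g != 0:
        return []  # sigma_f(m) = 0 unless g | b
    alpha, beta, mu = a // g, b // g, m // g
    x0 = pow(alpha, -1, mu) * beta % mu     # unique solution modulo mu, as (alpha, mu) = 1
    return [x0 + k * mu for k in range(g)]  # its g = m/mu lifts modulo m


print(solve_linear(5, 70, 100))  # [14, 34, 54, 74, 94]
```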

Let m have prime factorization $m = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$. By the Chinese remainder theorem, the congruence f(x) ≡ 0 mod m is equivalent to the system of congruences

$f(x) \equiv 0 \bmod p_1^{e_1}$,

$f(x) \equiv 0 \bmod p_2^{e_2}$,

⋮

$f(x) \equiv 0 \bmod p_r^{e_r}$.

In particular, this implies that

$$\sigma_f(m) = \prod_{i=1}^{r} \sigma_f\left(p_i^{e_i}\right),$$

and thus it suffices to study polynomial congruences modulo prime powers; this will be the focus of our next lecture.

3.2 Lecture Seven

Exercise: Prove that the product of any k consecutive integers is a multiple of k!.

Solution: The pigeonhole principle implies that among any k consecutive integers there must be a multiple of 1, of 2, and so on up to k, but this is not quite enough, since these numbers need not be pairwise coprime.

Instead, we may prove it one prime at a time, from which the general case follows. On the other hand, we may simply use the identity

$$\frac{j(j-1)\cdots(j-k+1)}{k!} = \frac{j!}{k!\,(j-k)!} = \binom{j}{k} \in \mathbb{Z},$$

from which the fact is apparent; granted, the last method is a Deus ex machina.
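The claim is easy to test numerically, including for ranges of consecutive integers that contain zero or negative numbers, where the binomial identity does not directly apply. A small sketch (the name `consecutive_product` is our own):

```python
from math import comb, factorial, prod


def consecutive_product(j: int, k: int) -> int:
    """j(j - 1)...(j - k + 1), the product of the k consecutive integers ending at j."""
    return prod(range(j - k + 1, j + 1))


# For j >= k >= 0 the quotient by k! is the binomial coefficient C(j, k):
print(consecutive_product(10, 4) // factorial(4), comb(10, 4))  # 210 210
```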

§2.6 – Prime power moduli

Lemma 3.2.1 Let f(X) ∈ C[X] have degree d. Then for any a ∈ C, we have

$$f(a+h) = f(a) + h f'(a) + h^2 \frac{f''(a)}{2!} + \cdots + h^d \frac{f^{(d)}(a)}{d!}.$$

Proof : Fix a; both expressions above are polynomials in h of degree d, and their zeroth derivatives agree at h = 0, as do their first derivatives, second, and so on up to the dth derivatives. Thus their difference, which is a polynomial in h of degree at most d, is divisible by $h^{d+1}$, which implies that they must, in fact, be equal.

nb. Since the analytic notion of a derivative is not available over general rings, we instead use the formal derivative of a polynomial or power series, i.e.

if $f(X) = \sum_{n=0}^{m} a_n X^n$, then $f'(X) = \sum_{n=0}^{m} n a_n X^{n-1}$, where m ∈ N₀ ∪ {∞}.

Lemma 3.2.2 If f(X) ∈ Z[X], then for any a ∈ Z, k ∈ N, we have that $\frac{f^{(k)}(a)}{k!}$ is an integer.

Proof : Write $f(X) = \sum_{n=0}^{d} a_n X^n$, $a_n \in \mathbb{Z}$. Then

$$\frac{f^{(k)}(a)}{k!} = \sum_{n=k}^{d} \frac{n(n-1)\cdots(n-k+1)}{k!}\, a_n a^{n-k},$$

and by the exercise we know that $\frac{n(n-1)\cdots(n-k+1)}{k!} \in \mathbb{Z}$.

Theorem 3.2.3 (Hensel’s lemma) Let f(X) ∈ Z[X] and let $p^j$ be a prime power. Suppose there exists a ∈ Z so that

$f(a) \equiv 0 \bmod p^j$ and $f'(a) \not\equiv 0 \bmod p$.

Then there exists a unique integer t, 0 ≤ t < p, such that $f(a + tp^j) \equiv 0 \bmod p^{j+1}$.

Example: Take f(X) = X² − 2, a = 4, $p^j = 7^1$. Then

f(4) = 16 − 2 ≡ 0 mod 7, f′(4) = 2(4) ≢ 0 mod 7.

It follows that exactly one element of {4, 11, 18, 25, 32, 39, 46} is a root of f(X) modulo 7²; it turns out to be 39.

Note that the residue class of a modulo $p^j$ is the union of the p residue classes $a + tp^j \bmod p^{j+1}$, 0 ≤ t < p. The one which is a root modulo $p^{j+1}$ is called a lift of a.

Proof of Hensel’s lemma: By lemma 3.2.1, we may write

$$f(a + tp^j) = f(a) + tp^j f'(a) + (tp^j)^2 \frac{f''(a)}{2!} + \cdots + (tp^j)^d \frac{f^{(d)}(a)}{d!}.$$

Each $\frac{f^{(k)}(a)}{k!}$ is an integer by lemma 3.2.2, and $p^{j+1} \mid (p^j)^k$ for k ≥ 2, so taking this expression modulo $p^{j+1}$ yields

$$f(a + tp^j) \equiv f(a) + tp^j f'(a) \bmod p^{j+1}.$$

Since $f(a) \equiv 0 \bmod p^j$, we have that $f(a + tp^j) \equiv 0 \bmod p^{j+1}$ if and only if

$$\frac{f(a)}{p^j} \equiv -t f'(a) \bmod p.$$

Since $f'(a) \not\equiv 0 \bmod p$, we have that f′(a) is a unit modulo p, and so we find the unique class t to be given by

$$t \equiv -(f'(a))^{-1} \frac{f(a)}{p^j} \bmod p,$$

as can be easily verified.

Example: Using the same example from before, we calculate $\frac{f(a)}{p^j} = \frac{14}{7} = 2$ and $f'(a) = 8 \equiv 1 \bmod 7$, so we ought to take $t \equiv -(1)^{-1}(2) \equiv 5 \bmod 7$, and indeed

$f(4 + 5 \cdot 7) = f(39) = 1519 \equiv 0 \bmod 7^2$.

Corollary 1: Given f(X) ∈ Z[X], a prime p, and a ∈ Z with f(a) ≡ 0 mod p and f′(a) ≢ 0 mod p, then for every j ≥ 2 there exists a unique lift of a to a root of f modulo $p^j$; that is, a unique residue class $a_j \bmod p^j$ such that

$f(a_j) \equiv 0 \bmod p^j$ and $a_j \equiv a \bmod p$.

Proof : Exercise. (hint: use induction and Hensel’s lemma)

Remark: The $a_j$ of the corollary are given recursively by $a_1 = a$ and, for j ≥ 1,

$$a_{j+1} \equiv a_j - f'(a_j)^{-1} f(a_j) \bmod p^{j+1},$$

where $f'(a_j)^{-1}$ denotes an inverse of $f'(a_j)$ modulo p.
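The recursion in the remark is easy to run. This sketch (the name `hensel_lift` is our own; Python 3.8+ assumed for `pow(x, -1, n)`) inverts f′(a_j) modulo $p^{j+1}$ for convenience; any inverse that is correct modulo p gives the same $a_{j+1}$ modulo $p^{j+1}$, because $p^j$ divides $f(a_j)$.

```python
def hensel_lift(f, df, a: int, p: int, k: int) -> int:
    """Lift a nonsingular root a of f modulo the prime p to the unique root modulo p^k
    congruent to a mod p, using a_{j+1} ≡ a_j - f'(a_j)^{-1} f(a_j) (mod p^{j+1})."""
    if f(a) % p != 0 or df(a) % p == 0:
        raise ValueError("need f(a) ≡ 0 and f'(a) ≢ 0 (mod p)")
    a_j = a % p
    for j in range(1, k):
        modulus = p ** (j + 1)
        # f'(a_j) ≡ f'(a) ≢ 0 (mod p), so f'(a_j) is invertible modulo p^{j+1}
        inv = pow(df(a_j), -1, modulus)
        a_j = (a_j - inv * f(a_j)) % modulus
    return a_j


def f(x: int) -> int:
    return x * x - 2


def df(x: int) -> int:
    return 2 * x


print(hensel_lift(f, df, 4, 7, 2))  # 39
```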

nb. The condition f′(a) ≢ 0 mod p is the condition that a is a nonsingular root of f(X) modulo p. As written, this formula fails for singular roots: consider f(X) = X². Then a = 0 is a root modulo p, and every lift of a is a root of f modulo p². Similarly, for g(X) = X² − p, a = 0 is a root modulo p, but no lifts of a are roots modulo p². There is a more general version of Hensel’s lemma (theorem 2.24 of Niven) which accommodates such roots.

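Fact: There exist polynomials, such as

(X² − 2)(X² − 17)(X² − 34), or 3X³ + 4Y³ + 5Z³,

which have roots modulo m for every m ∈ N (for the cubic form, roots with X, Y, Z not all ≡ 0 mod m), but have no roots over the rationals (apart from the trivial root X = Y = Z = 0 of the cubic form).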

3.3 Lecture Eight

§2.7 – Prime modulus

Definition: Let $f(X) = \sum a_j X^j$, $g(X) = \sum b_j X^j \in \mathbb{Z}[X]$. We will say that f(X) is congruent to g(X) modulo m, written f(X) ≡ g(X) mod m, if $a_j \equiv b_j \bmod m$ for every j. In other words, f(X) ≡ g(X) mod m if and only if f(X) and g(X) have the same image in $\mathbb{Z}[X]/(m) \cong (\mathbb{Z}/m\mathbb{Z})[X]$.

Example: Suppose f(X) = 15X2 + 3X + 8 ∈ Z[X]. We note that deg f = 2 over Z, but deg f = 1 over Z5,and deg f = 0 over Z3.

Lemma 3.3.1 Let p be prime, a an integer, and f(X) ∈ Z[X]. If f(a) ≡ 0 mod p, then there exists g(X) ∈ Z[X] with deg g = deg f − 1 such that

f(X) ≡ (X − a)g(X) mod p.

Proof : We saw in our last lecture that (with d = deg f)

$$f(a+h) = f(a) + h f'(a) + h^2 \frac{f''(a)}{2!} + \cdots + h^d \frac{f^{(d)}(a)}{d!}.$$

We set

$$g(X) = \sum_{j=1}^{d} (X - a)^{j-1} \frac{f^{(j)}(a)}{j!},$$

which has integer coefficients by lemma 3.2.2, and we have that

$$f(X) = f(a) + (X - a)g(X) \equiv (X - a)g(X) \bmod p.$$

Note that the leading coefficient of f(X) is $\frac{f^{(d)}(a)}{d!}$, which is also the leading coefficient of g(X), so that deg g = d − 1.

Observe that the primality condition is necessary; indeed, modulo 8 (for instance), f(X) = X² − 1 has the roots ±1, but we may also factor f(X) ≡ (X − 5)(X + 5) mod 8, since 25 ≡ 1 mod 8.

Theorem 3.3.2 (Theorem 2.26, Niven) Let f(X) ∈ Z[X] have degree d modulo p, with p prime. Then f has at most d roots modulo p.

Proof : We induct on deg f. For deg f = 0 the result is clear, so suppose deg f = d > 0. If f has no roots modulo p we are done; otherwise, write

f(X) ≡ (X − a)g(X) mod p,

where f(a) ≡ 0 mod p and deg g = d − 1, as guaranteed by lemma 3.3.1. Since p is prime, any root of f(X) modulo p is a root of X − a or g(X). By the inductive hypothesis, g has at most d − 1 roots modulo p, and X − a has a single root modulo p, from which we deduce the result.

Example: Consider $f(X) = X^p - X$ with p prime. By Fermat’s little theorem, every residue class modulo p is a root of f, and by lemma 3.3.1 it follows that

$$X^p - X \equiv X(X-1)(X-2)\cdots(X-p+1) \bmod p.$$

Comparing coefficients yields some interesting congruences, among which we have, in the coefficient of $X^{p-1}$,

0 + 1 + 2 + · · ·+ (p− 1) ≡ 0 mod p, p > 2,

and in the coefficient of $X^{p-2}$,

$$\sum_{0 \le j < k \le p-1} jk \equiv 0 \bmod p, \quad p > 3.$$

Finally, from the coefficient of X we may deduce Wilson’s theorem

(p− 1)! ≡ −1 mod p.
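The factorization of $X^p - X$ can be checked by multiplying out X(X − 1)⋯(X − p + 1) and reducing modulo p. A minimal sketch (the name `falling_product_coeffs` is ours) that stores a polynomial as its list of coefficients, constant term first:

```python
def falling_product_coeffs(p: int) -> list[int]:
    """Coefficients (constant term first) of X(X - 1)(X - 2)...(X - p + 1), reduced mod p."""
    coeffs = [1]  # the constant polynomial 1
    for a in range(p):
        # multiply the current polynomial by (X - a)
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i + 1] += c  # contribution of c * X
            new[i] -= a * c  # contribution of c * (-a)
        coeffs = [c % p for c in new]
    return coeffs


print(falling_product_coeffs(5))  # [0, 4, 0, 0, 0, 1], i.e. X^5 - X modulo 5
```

Every coefficient strictly between $X$ and $X^p$ vanishes modulo p, which is the source of the congruences above; the entry p − 1 ≡ −1 in the coefficient of X is what yields Wilson’s theorem.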

Remark: This example implies that if f(X), g(X) ∈ Z[X] are such that f(a) ≡ g(a) mod p for every a ∈ Z, then

$f(X) - g(X) \equiv h(X)(X^p - X) \bmod p$

for some h(X) ∈ Z[X]. In fact, this condition is also sufficient.

Proposition 3.3.3 Let F(X) be any function (i.e. set map) from $\mathbb{Z}_p$ to $\mathbb{Z}_p$. Then there exists a unique polynomial g(X) modulo p of degree at most p − 1 such that

F(a) ≡ g(a) mod p for every a ∈ Z.

Proof : We show uniqueness first. If g(X), h(X) both satisfy the condition, then from our remark above we have that

$g(X) - h(X) \equiv q(X)(X^p - X) \bmod p$, for some q(X) ∈ Z[X].

Comparing degrees (the left side has degree at most p − 1), we see that q(X) ≡ 0, and so we must have g ≡ h mod p. For existence, we give two proofs. First of all, if we set

$$g(X) = \sum_{a=0}^{p-1} \left(1 - (X - a)^{p-1}\right) F(a),$$

then by Fermat’s little theorem, for each a′ ∈ Z the term with a ≡ a′ mod p contributes (1 − 0)F(a′), while every other term contributes (1 − 1)F(a) = 0; hence g(a′) ≡ F(a′) mod p.

Alternatively, we observe that there are exactly $p^p$ functions $\mathbb{Z}_p \to \mathbb{Z}_p$, and there are exactly $p^p$ polynomials over $\mathbb{Z}_p$ of degree at most p − 1. By uniqueness, no two of these polynomials give the same function, and it follows that the two sets must coincide.
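The first existence proof gives an explicit formula, so the interpolating polynomial can be computed by expanding each $(X - a)^{p-1}$ with the binomial theorem. A minimal sketch (function names are our own), representing polynomials as coefficient lists with the constant term first:

```python
from math import comb


def interpolate(F, p: int) -> list[int]:
    """Coefficients (constant term first, length p) of
    g(X) = sum_a (1 - (X - a)^(p-1)) F(a), reduced mod p."""
    g = [0] * p
    for a in range(p):
        Fa = F(a) % p
        g[0] += Fa  # the "1" term
        for k in range(p):  # subtract F(a) * (X - a)^(p-1), expanded binomially
            g[k] -= comb(p - 1, k) * (-a) ** (p - 1 - k) * Fa
    return [c % p for c in g]


def evaluate(coeffs: list[int], x: int, p: int) -> int:
    # value of the polynomial at x, modulo p
    return sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p


print(interpolate(lambda a: a**5, 5))  # [0, 1, 0, 0, 0]: the function a -> a^5 is X mod 5
```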

Corollary 1: (Corollary 2.30, Niven) Let p be prime and suppose that d|(p − 1). Then $X^d - 1$ has exactly d roots modulo p.

Proof : By theorem 3.3.2 there are at most d roots, so we need only show there are at least d roots. Note that

$X^{p-1} - 1 \equiv (X - 1)(X - 2) \cdots (X - p + 1) \bmod p$

has exactly p− 1 roots modulo p. Since d|(p− 1), we have

$$X^{p-1} - 1 = (X^d - 1)(X^{p-1-d} + X^{p-1-2d} + \cdots + X^{2d} + X^d + 1).$$

The second factor has at most p − 1 − d roots modulo p, and since p is prime every root of $X^{p-1} - 1$ is a root of one of the two factors; so by the pigeonhole principle $X^d - 1$ must have at least d roots modulo p, as claimed.
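A quick numerical check of the corollary for a single prime, counting roots of $X^d - 1$ by exhaustion (the name `count_roots` is our own):

```python
def count_roots(d: int, p: int) -> int:
    """Number of residue classes x mod p with x^d ≡ 1 (mod p)."""
    return sum(1 for x in range(p) if pow(x, d, p) == 1)


p = 13
print({d: count_roots(d, p) for d in range(1, p) if (p - 1) % d == 0})
# {1: 1, 2: 2, 3: 3, 4: 4, 6: 6, 12: 12}
```

The hypothesis d | (p − 1) matters: for example, $X^5 - 1$ has only the root 1 modulo 13.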

§2.8 – Primitive roots and power residues

Consider the congruence $X^n \equiv 1 \bmod m$; note that any solution a must satisfy (a, m) = 1.

Definition: Given a with (a, m) = 1, the multiplicative order of a modulo m (often called simply the order of a) is the least positive integer k such that $a^k \equiv 1 \bmod m$. One sometimes says that a belongs to the exponent k modulo m.

Example: Let m = 11, a = 3. We have

$3^1 \equiv 3$, $3^2 \equiv 9$, $3^3 \equiv 5$, $3^4 \equiv 4$, $3^5 \equiv 1 \bmod 11$,

and we see that the order of 3 modulo 11 is 5.

Fact: The order of a modulo m always divides φ(m).
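The order can be found by multiplying by a until the power returns to 1, which terminates when (a, m) = 1 by Euler’s theorem. A minimal sketch (the name `order` is our own):

```python
from math import gcd


def order(a: int, m: int) -> int:
    """Multiplicative order of a modulo m: least k >= 1 with a^k ≡ 1 (mod m)."""
    if gcd(a, m) != 1:
        raise ValueError("the order is only defined when (a, m) = 1")
    k, power = 1, a % m
    while power != 1 % m:  # 1 % m handles m = 1, where every residue is 0
        power = (power * a) % m
        k += 1
    return k


print([pow(3, k, 11) for k in range(1, 6)])  # [3, 9, 5, 4, 1]
print(order(3, 11))                           # 5
```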


4 Week Four

4.1 Lecture Nine

Lemma 4.1.1 (Lemma 2.31, Niven) $a^k \equiv 1 \bmod m$ if and only if the order of a modulo m divides k.

Proof : Let h be the order of a modulo m. If h|k, we have k = hq for some q, hence

$a^k = a^{hq} = (a^h)^q \equiv 1^q \equiv 1 \bmod m$.

Conversely, if $a^k \equiv 1 \bmod m$, we may use the division algorithm to write k = hq + r, 0 ≤ r < h. One then has

$1 \equiv a^k \equiv (a^h)^q a^r \equiv a^r \bmod m$.

Since h is the minimal positive integer such that ah ≡ 1 mod m, it follows that r = 0, and we are done.

If (a,m) = 1, then the order of a modulo m divides φ(m).

Lemma 4.1.2 (Lemma 2.33, Niven) If a has order h modulo m, then $a^k$ has order $\frac{h}{(h,k)}$ modulo m.

For example, the order of a² modulo m is h/2 if h is even, and h if h is odd.

Proof : The following statements about positive integers j are equivalent:

1. (ak)j ≡ 1 mod m

2. h|(kj)

3. h(h,k) |

k(h,k)j

4. h(h,k) |j

It follows that the least positive j satisfying (4), and hence (1), is exactly j = h(h,k) .

Remark: The subgroup of $\mathbb{Z}_m^\times$ generated by a is a cyclic group of order h. The same proof shows that the smallest positive integer y such that $ky \equiv 0 \pmod h$ is $y = h/(h,k)$.

Lemma 4.1.3 Let a have order r modulo m, and let b have order s modulo m. Then the order of ab modulo m divides $\frac{rs}{(r,s)} = [r,s]$, and moreover is a multiple of $\frac{rs}{(r,s)^2} = \frac{[r,s]}{(r,s)}$.

In particular (Lemma 2.34, Niven), if (r, s) = 1, then the order of ab modulo m is exactly rs.

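Proof : Let t be the order of ab modulo m. Then

$$(ab)^{rs/(r,s)} = (a^r)^{s/(r,s)} (b^s)^{r/(r,s)} \equiv 1 \cdot 1 \equiv 1 \pmod m,$$

and it follows that $t \mid \frac{rs}{(r,s)}$. We also have

$$a^{st} \equiv a^{st}(b^s)^t = ((ab)^t)^s \equiv 1 \pmod m,$$

hence $r \mid st$, so $\frac{r}{(r,s)} \mid \frac{s}{(r,s)}\, t$, and therefore $\frac{r}{(r,s)} \mid t$ since $\frac{r}{(r,s)}$ and $\frac{s}{(r,s)}$ are coprime. By a symmetric argument we may show that $\frac{s}{(r,s)} \mid t$, and since

$$\left(\frac{r}{(r,s)}, \frac{s}{(r,s)}\right) = 1,$$

it follows that $\frac{rs}{(r,s)^2} \mid t$.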
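Both bounds in Lemma 4.1.3, and the coprime case (Lemma 2.34 of Niven), can be checked over all pairs of units for small m. This is a brute-force sketch of ours; `check_lemma_4_1_3` is a hypothetical name.

```python
from math import gcd


def order(a: int, m: int) -> int:
    """Least k >= 1 with a^k ≡ 1 (mod m), for m >= 2 and gcd(a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be a unit modulo m")
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def check_lemma_4_1_3(m: int) -> bool:
    """Brute-force check of Lemma 4.1.3 for every pair of units a, b modulo m."""
    units = [a for a in range(1, m) if gcd(a, m) == 1]
    orders = {a: order(a, m) for a in units}
    for a in units:
        for b in units:
            r, s, t = orders[a], orders[b], orders[a * b % m]
            g = gcd(r, s)
            lcm = r * s // g
            if lcm % t != 0:            # t must divide [r, s] ...
                return False
            if t % (lcm // g) != 0:     # ... and be a multiple of [r, s]/(r, s) = rs/(r, s)^2
                return False
            if g == 1 and t != r * s:   # coprime orders multiply exactly
                return False
    return True
```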

Definition: An integer a is called a primitive root modulo m if it has order φ(m) modulo m. In this case, $\mathbb{Z}_m^\times$ is the cyclic group of order φ(m), generated by a.


Proposition 4.1.4 If m has a primitive root, then it has exactly φ(φ(m)) primitive roots.

Proof : Let g be a primitive root modulo m. Then we have a reduced residue system modulo m given by $g, g^2, \ldots, g^{\varphi(m)}$. By lemma 4.1.2, the order of $g^j$ modulo m is exactly $\frac{\varphi(m)}{(j, \varphi(m))}$, which equals φ(m) exactly when (j, φ(m)) = 1. There are exactly φ(φ(m)) such exponents j with 1 ≤ j ≤ φ(m), and we are done.
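A brute-force listing of primitive roots (a sketch of ours; the function names are hypothetical and only practical for small m) lets one check Proposition 4.1.4 directly, and also reproduces the lists for 25, 18 and 9 that appear later in these notes.

```python
from math import gcd


def phi(m: int) -> int:
    """Euler's totient function by brute force."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def order(a: int, m: int) -> int:
    """Least k >= 1 with a^k ≡ 1 (mod m), for m >= 2 and gcd(a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be a unit modulo m")
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def primitive_roots(m: int) -> list:
    """All primitive roots g in [1, m) modulo m >= 2."""
    ph = phi(m)
    return [g for g in range(1, m) if gcd(g, m) == 1 and order(g, m) == ph]


print(primitive_roots(25))  # [2, 3, 8, 12, 13, 17, 22, 23]
```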

Lemma 4.1.5 (Lemma 2.35, Niven) Let p, q be primes and let r ∈ N be such that $q^r \mid (p-1)$. Then there are $q^r - q^{r-1}$ residue classes of order $q^r$ modulo p.

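Proof : The order of a modulo p divides $q^r$ if and only if $a^{q^r} \equiv 1 \pmod p$. This congruence has exactly $q^r$ solutions by corollary 1 of proposition 3.3.3. The order of a modulo p divides $q^{r-1}$ if and only if $a^{q^{r-1}} \equiv 1 \pmod p$, which has exactly $q^{r-1}$ solutions. Since every divisor of $q^r$ other than $q^r$ itself divides $q^{r-1}$, the elements of order exactly $q^r$ are the $q^r - q^{r-1}$ solutions of the first congruence that do not solve the second.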

Theorem 4.1.6 (Theorem 2.36, Niven) Every prime p has a primitive root.

Proof : If p = 2 the result is immediate, so assume p is odd and write p − 1 in its prime factorization

$$p - 1 = q_1^{r_1} q_2^{r_2} \cdots q_k^{r_k}.$$

For each 1 ≤ j ≤ k, let $a_j$ be some integer of order $q_j^{r_j}$ modulo p, whose existence is guaranteed by lemma 4.1.5.

Since $(q_i^{r_i}, q_j^{r_j}) = 1$ for all i ≠ j, we have by lemma 2.34 of Niven that $a_1a_2$ has order $q_1^{r_1}q_2^{r_2}$ modulo p, that $a_1a_2a_3$ has order $q_1^{r_1}q_2^{r_2}q_3^{r_3}$ modulo p, and continuing in this fashion, we eventually see that $a_1a_2\cdots a_k$ has order p − 1 modulo p, as claimed.
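The proof is constructive enough to run. In the sketch below (ours, with hypothetical helper names), we find an element of order exactly $q^r$ for each prime power $q^r \| (p-1)$ by raising candidates x to the power $(p-1)/q^r$; that search strategy is our own choice, since lemma 4.1.5 only guarantees such elements exist. Multiplying the results gives a primitive root, exactly as in the proof.

```python
def factorize(n: int) -> dict:
    """Trial-division factorization of n >= 1, returned as {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def primitive_root_by_product(p: int) -> int:
    """Primitive root mod the prime p built as a product a_1 a_2 ... a_k (Theorem 4.1.6)."""
    if p == 2:
        return 1
    g = 1
    for q, r in factorize(p - 1).items():
        qr = q ** r
        for x in range(2, p):
            a = pow(x, (p - 1) // qr, p)   # a^(q^r) = x^(p-1) ≡ 1, so ord(a) divides q^r
            if pow(a, qr // q, p) != 1:    # ord(a) does not divide q^(r-1): it is exactly q^r
                g = g * a % p
                break
    return g
```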


4.2 Lecture Ten

Example: Modulo 5, the reduced residue classes are 1, 2, 3, and 4, with respective orders 1, 4, 4, and 2; we see that 2 and 3 are the φ(φ(5)) = 2 primitive roots modulo 5. What are the primitive roots modulo 25? Exactly

2, 3, 8, 12, 13, 17, 22, 23.

Note that there are 8 = φ(φ(25)) of them, and that all are also primitive roots modulo 5. In fact, we may lift any primitive root modulo p to p − 1 primitive roots modulo $p^2$, and, for odd p and j ≥ 2, any primitive root modulo $p^j$ lifts to exactly p primitive roots modulo $p^{j+1}$.

Proposition 4.2.1 For n ≥ 1, we have $\sum_{d \mid n} \varphi(d) = n$.

Proof : The fractions $\frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n}$ are not all in lowest terms; when we reduce them, we may consider their denominators. For every divisor d of n, exactly φ(d) of these fractions have denominator d; indeed, these fractions are exactly

$$\left\{ \frac{k(n/d)}{n} : 1 \le k \le d,\ (k, d) = 1 \right\}.$$

Since there are exactly n fractions in our original set, the result follows.
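The counting argument can be replayed with exact rational arithmetic. In this sketch of ours (helper names hypothetical), Python's `fractions.Fraction` reduces each k/n to lowest terms, and we tally the resulting denominators.

```python
from collections import Counter
from fractions import Fraction
from math import gcd


def phi(m: int) -> int:
    """Euler's totient function by brute force (fine for small m)."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def denominator_counts(n: int) -> Counter:
    """Reduce 1/n, 2/n, ..., n/n to lowest terms and count each denominator."""
    # Fraction normalizes automatically, so .denominator is the reduced denominator.
    return Counter(Fraction(k, n).denominator for k in range(1, n + 1))


print(sorted(denominator_counts(12).items()))
# [(1, 1), (2, 1), (3, 2), (4, 2), (6, 2), (12, 4)], i.e. phi(d) for each d | 12
```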

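Alternative proof of the existence of primitive roots modulo p: We use strong induction to find the number of elements of order k modulo p, namely φ(k) if k | (p − 1), and 0 if k ∤ (p − 1) (the latter because every order divides p − 1). The case k = 1 is trivial. For k > 1, k | (p − 1), we first note that

$$\varphi(k) + \sum_{d \mid k,\ d < k} \varphi(d) = \sum_{d \mid k} \varphi(d) = k.$$

Since p is prime, there are exactly k solutions to the congruence $x^k \equiv 1 \pmod p$, which are exactly those x modulo p with order dividing k. This count is, again, exactly the sum

$$\#\{x : \mathrm{ord}_p(x) = k\} + \sum_{d \mid k,\ d < k} \#\{x : \mathrm{ord}_p(x) = d\},$$

where $\mathrm{ord}_p(x)$ denotes the order of x modulo p. By the induction hypothesis each term in the last sum equals φ(d), so comparing the two displays gives $\#\{x : \mathrm{ord}_p(x) = k\} = \varphi(k)$. Taking k = p − 1 shows that there are φ(p − 1) ≥ 1 primitive roots modulo p.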
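The conclusion of the induction, that exactly φ(k) residues have order k for each k | (p − 1), can be tallied directly. A brute-force sketch of ours, with hypothetical helper names:

```python
from collections import Counter
from math import gcd


def phi(m: int) -> int:
    """Euler's totient function by brute force."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def order(a: int, m: int) -> int:
    """Least k >= 1 with a^k ≡ 1 (mod m), for m >= 2 and gcd(a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be a unit modulo m")
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def order_counts(p: int) -> Counter:
    """Tally the multiplicative orders of 1, 2, ..., p - 1 modulo the prime p."""
    return Counter(order(x, p) for x in range(1, p))


print(sorted(order_counts(13).items()))
# [(1, 1), (2, 1), (3, 2), (4, 2), (6, 2), (12, 4)]
```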

Lemma 4.2.2 If d|n, then for any a with (a, n) = 1, the order of a modulo d divides the order of a modulo n.

Proof : If $\mathrm{ord}_n(a) = h$, then $a^h \equiv 1 \pmod n$, so $a^h \equiv 1 \pmod d$, and hence $\mathrm{ord}_d(a) \mid h$ by lemma 4.1.1.

Proposition 4.2.3 If g is a primitive root modulo $p^r$ with r ≥ 2, then

$$g^{p^{r-2}(p-1)} \not\equiv 1 \pmod{p^r}.$$

Moreover, the converse holds if g is a primitive root modulo $p^{r-1}$.

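Proof : If g is a primitive root modulo $p^r$, then

$$\mathrm{ord}_{p^r}(g) = \varphi(p^r) = p^{r-1}(p-1) > p^{r-2}(p-1),$$

so the order of g cannot divide $p^{r-2}(p-1)$, and by lemma 4.1.1 it follows that $g^{p^{r-2}(p-1)} \not\equiv 1 \pmod{p^r}$.

Now, suppose that g is a primitive root modulo $p^{r-1}$ and that

$$g^{p^{r-2}(p-1)} \not\equiv 1 \pmod{p^r}.$$

The order of g modulo $p^r$ divides $\varphi(p^r) = p^{r-1}(p-1)$, and by lemma 4.2.2 must be a multiple of $\mathrm{ord}_{p^{r-1}}(g) = p^{r-2}(p-1)$; the only two such values are $p^{r-2}(p-1)$ and $p^{r-1}(p-1)$. Since $\mathrm{ord}_{p^r}(g) \ne p^{r-2}(p-1)$ by assumption, the order is $p^{r-1}(p-1) = \varphi(p^r)$, and g is a primitive root modulo $p^r$.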

Theorem 4.2.4 Primitive roots exist modulo $p^2$ for any prime p.

Proof : Let g be a primitive root modulo p and consider the lifts g + tp modulo $p^2$, 0 ≤ t ≤ p − 1. We claim that all but one of these lifts are primitive roots modulo $p^2$.

Indeed, each lift is a primitive root modulo p, so by proposition 4.2.3 (with r = 2) it suffices to show that exactly one lift satisfies

$$(g + tp)^{p-1} \equiv 1 \pmod{p^2}.$$

Let $f(X) = X^{p-1} - 1$. Then g is a root of f(X) modulo p, and

$$f'(g) = (p-1)g^{p-2} \not\equiv 0 \pmod p.$$

Thus g is a nonsingular root of f modulo p, and so by Hensel's lemma exactly one lift of g is a root of f modulo $p^2$; every other such lift must then yield a primitive root.
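The dichotomy in this proof is visible numerically: among the p lifts g + tp, exactly one is a root of $X^{p-1} - 1$ modulo $p^2$, and the other p − 1 are primitive roots. A brute-force sketch of ours (`lift_report` is a hypothetical name):

```python
from math import gcd


def order(a: int, m: int) -> int:
    """Least k >= 1 with a^k ≡ 1 (mod m), for m >= 2 and gcd(a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be a unit modulo m")
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def lift_report(g: int, p: int):
    """For a primitive root g mod p, return (t's with g + tp primitive mod p^2,
    t's with g + tp a root of X^(p-1) - 1 mod p^2)."""
    m = p * p
    primitive = [t for t in range(p) if order(g + t * p, m) == p * (p - 1)]
    hensel_roots = [t for t in range(p) if pow(g + t * p, p - 1, m) == 1]
    return primitive, hensel_roots


print(lift_report(2, 5))  # ([0, 2, 3, 4], [1]): the lift 7 fails, since 7^4 ≡ 1 (mod 25)
```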

Lemma 4.2.5 If g is a primitive root modulo $p^2$, then it is also a primitive root modulo p.

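Proof : If $a^k \equiv 1 \pmod p$, then

$$a^{pk} - 1 = (a^k - 1)\left((a^k)^{p-1} + (a^k)^{p-2} + \cdots + a^k + 1\right).$$

The first factor is a multiple of p by assumption, and the second is a sum of p terms each congruent to 1 modulo p, hence also a multiple of p; so $a^{pk} \equiv 1 \pmod{p^2}$. In particular, if g is a primitive root modulo $p^2$, then $g^{pk} \not\equiv 1 \pmod{p^2}$ for k = 1, 2, ..., p − 2, since pk < p(p − 1) = φ(p²). Hence $g^k \not\equiv 1 \pmod p$ for 1 ≤ k ≤ p − 2, and it follows that the order of g modulo p is p − 1.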

Next, we will consider primitive roots modulo $p^r$ for r ≥ 3. No more degenerate cases arise here, except when p = 2: there are no primitive roots modulo $2^r$ for any r ≥ 3.


4.3 Lecture Eleven

Theorem 4.3.1 Let p be an odd prime and let r ≥ 2. Then any primitive root modulo $p^2$ is a primitive root modulo $p^r$.

Proof : We induct on r. The case r = 2 is trivial, so for some r ≥ 2 assume g is a primitive root modulo $p^r$; we will show that g is a primitive root modulo $p^{r+1}$.

Indeed, by proposition 4.2.3 we have that

$$g^{p^{r-2}(p-1)} \not\equiv 1 \pmod{p^r},$$

and so by the same proposition (applied with r + 1 in place of r) it suffices to show that $g^{p^{r-1}(p-1)} \not\equiv 1 \pmod{p^{r+1}}$. By Euler's theorem we have that

$$g^{p^{r-2}(p-1)} \equiv 1 \pmod{p^{r-1}},$$

so we can write $g^{p^{r-2}(p-1)} = 1 + np^{r-1}$ for some $n \not\equiv 0 \pmod p$. By the binomial theorem we have that

$$g^{p^{r-1}(p-1)} = (1 + np^{r-1})^p = \sum_{k=0}^{p} \binom{p}{k} (np^{r-1})^k,$$

and since $p \mid \binom{p}{k}$ for 2 ≤ k ≤ p − 1, each such term is divisible by $p^{1 + k(r-1)}$, and $1 + k(r-1) \ge 2r - 1 \ge r + 1$; thus $p^{r+1} \mid \binom{p}{k}(np^{r-1})^k$. In fact we also have this divisibility when k = p, since p(r − 1) ≥ 3(r − 1) ≥ r + 1, and so

$$g^{p^{r-1}(p-1)} \equiv 1 + np^r \not\equiv 1 \pmod{p^{r+1}},$$

and we are done.

nb. We only use the fact that p is odd in the cancellation of $\binom{p}{2} n^2 p^{2r-2}$.

Lemma 4.3.2 If r ≥ 3, then the order of every odd integer modulo $2^r$ divides $2^{r-2} = \frac{1}{2}\varphi(2^r)$. In particular, there are no primitive roots modulo $2^r$.

Proof : Again we induct on r. We did the case r = 3 in the last lecture (every odd a satisfies $a^2 \equiv 1 \pmod 8$), so assume the claim is true for some r ≥ 3; then

$$a^{2^{r-2}} \equiv 1 \pmod{2^r}$$

for every odd a. Then $2^r \mid (a^{2^{r-2}} - 1)$ and $2 \mid (a^{2^{r-2}} + 1)$ by parity, hence

$$2^{r+1} \mid (a^{2^{r-2}} - 1)(a^{2^{r-2}} + 1) = a^{2^{r-1}} - 1,$$

whence $a^{2^{r-1}} \equiv 1 \pmod{2^{r+1}}$, as claimed.

nb. The same proof shows that if a ≡ 5 mod 8, then $2^{\alpha+2} \,\|\, (a^{2^\alpha} - 1)$ for every α ≥ 0, where $p^k \,\|\, n$ means that $p^k \mid n$ and $p^{k+1} \nmid n$.

Theorem 4.3.3 (Theorem 2.43, Niven) Let r ≥ 3; then the set $\{\pm 5, \pm 5^2, \ldots, \pm 5^{2^{r-2}}\}$ is a reduced residue system modulo $2^r$. In particular, 5 has order $2^{r-2}$ modulo $2^r$, and the abelian group homomorphism

$$f : \mathbb{Z}_{2^{r-2}} \times \mathbb{Z}_2 \longrightarrow \mathbb{Z}_{2^r}^\times$$

given by $f(x, y) = 5^x(-1)^y$ is an isomorphism.


By way of comparison, note that if p is odd, the map

$$f : \mathbb{Z}_{p^{r-1}(p-1)} \longrightarrow \mathbb{Z}_{p^r}^\times$$

given by $f(x) = g^x$ is an isomorphism for any primitive root g modulo $p^r$.

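Proof : The order of 5 modulo $2^r$ divides $2^{r-2}$ by lemma 4.3.2, and so if $2^{r-2}$ is not the order, then the order divides $2^{r-3}$, hence

$$5^{2^{r-3}} \equiv 1 \pmod{2^r}.$$

But then $2^r \mid 5^{2^{r-3}} - 1$, contradicting our previous remark with α = r − 3 (which says that $2^{r-1} \,\|\, 5^{2^{r-3}} - 1$). Thus 5 has order $2^{r-2}$ modulo $2^r$, and so the residue classes

$$5, 5^2, \ldots, 5^{2^{r-2}}$$

are distinct modulo $2^r$, as are the residue classes

$$-5, -5^2, \ldots, -5^{2^{r-2}}.$$

Finally, $5^k \equiv 1 \pmod 4$, while $-5^k \equiv 3 \pmod 4$, so the two sets above are disjoint. This gives $2 \cdot 2^{r-2} = 2^{r-1} = \varphi(2^r)$ distinct odd residue classes, and we are done.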
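Theorem 4.3.3, lemma 4.3.2 and the preceding nb. can all be checked for small r. This sketch is ours (function names hypothetical); `two_adic_valuation` uses the bit trick n & −n to isolate the lowest set bit of a positive integer.

```python
def check_theorem_4_3_3(r: int) -> bool:
    """Check that ±5, ±5^2, ..., ±5^(2^(r-2)) is a reduced residue system modulo 2^r (r >= 3)."""
    m = 2 ** r
    powers = [pow(5, k, m) for k in range(1, 2 ** (r - 2) + 1)]
    classes = set(powers) | {(-x) % m for x in powers}
    # The powers must be distinct, and together with their negatives cover all odd residues.
    return len(set(powers)) == 2 ** (r - 2) and classes == set(range(1, m, 2))


def two_adic_valuation(n: int) -> int:
    """Largest v with 2^v dividing the positive integer n."""
    return (n & -n).bit_length() - 1
```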

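We now know the group structure of $\mathbb{Z}_n^\times$ for every n. If n has prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, then by the Chinese remainder theorem

$$\mathbb{Z}_n^\times \cong \mathbb{Z}_{p_1^{e_1}}^\times \times \mathbb{Z}_{p_2^{e_2}}^\times \times \cdots \times \mathbb{Z}_{p_k^{e_k}}^\times.$$

If $p_i$ is odd, then

$$\mathbb{Z}_{p_i^{e_i}}^\times \cong \mathbb{Z}_{p_i^{e_i - 1}(p_i - 1)},$$

and for p = 2 we have

$$\mathbb{Z}_{2^r}^\times \cong \begin{cases} \mathbb{Z}_1 & \text{if } r = 1, \\ \mathbb{Z}_2 & \text{if } r = 2, \\ \mathbb{Z}_{2^{r-2}} \times \mathbb{Z}_2 & \text{if } r \ge 3. \end{cases}$$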

Primitive roots modulo non-prime powers

Note that φ(n) is even for every n ≥ 3. If we can write n = cd with (c, d) = 1 and c, d ≥ 3, then the order modulo n of any a with (a, n) = 1 must divide $\frac{1}{2}\varphi(n) = \frac{1}{2}\varphi(c)\varphi(d)$, as we have

$$a^{\varphi(n)/2} = (a^{\varphi(c)})^{\varphi(d)/2} \equiv 1^{\varphi(d)/2} \equiv 1 \pmod c,$$

and similarly

$$a^{\varphi(n)/2} = (a^{\varphi(d)})^{\varphi(c)/2} \equiv 1^{\varphi(c)/2} \equiv 1 \pmod d,$$

since by our assumption 2 | φ(c) and 2 | φ(d). Our claim then follows by the Chinese remainder theorem.

The only integers n > 1 which do not have such a factorization are powers of 2, or are of the form $n = p^r$ or $n = 2p^r$, where p is an odd prime and r ≥ 1. Together with lemma 4.3.2 (which rules out $2^r$ for r ≥ 3), this shows that 1, 2, 4, $p^r$ and $2p^r$ are the only moduli which could possibly have primitive roots.

Theorem 4.3.4 (Theorem 2.41, Niven) The moduli that have primitive roots are exactly 1, 2, 4, $p^r$, and $2p^r$, where p is an odd prime and r ≥ 1.

Proof : Next lecture.
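Before the proof, Theorem 4.3.4 can be confirmed by brute force for small moduli. The sketch below is ours (hypothetical helper names, m ≥ 2 only); it searches every unit for one of order φ(m).

```python
from math import gcd


def phi(m: int) -> int:
    """Euler's totient function by brute force."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def order(a: int, m: int) -> int:
    """Least k >= 1 with a^k ≡ 1 (mod m), for m >= 2 and gcd(a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be a unit modulo m")
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def has_primitive_root(m: int) -> bool:
    """True if some unit modulo m (m >= 2) has order phi(m)."""
    ph = phi(m)
    return any(order(g, m) == ph for g in range(1, m) if gcd(g, m) == 1)


print([m for m in range(2, 30) if has_primitive_root(m)])
# [2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 17, 18, 19, 22, 23, 25, 26, 27, 29]
```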


5 Week Five

5.1 Lecture Twelve

Fun fact! If S(x) denotes the set of squarefree numbers s with s ≤ x, then one has

$$\lim_{x \to \infty} \frac{\#S(x)}{x} = \frac{6}{\pi^2}.$$

Recall theorem 4.3.4 from last lecture, and let PR denote the set of moduli which have primitive roots. For example, modulo 18, we have φ(18) = 6, and indeed a reduced residue system is given by 1, 5, 7, 11, 13, 17, which have respective orders 1, 6, 3, 6, 3, and 2. Thus 5 and 11 are primitive roots modulo 18, and as expected we find there are 2 = φ(φ(18)) of them.

Similarly, modulo 9 a reduced residue system is given by 1, 2, 4, 5, 7, 8 with respective orders 1, 6, 3, 6, 3, and 2 (note the similarity with $\mathbb{Z}_{18}^\times$), and we have the same result with the primitive roots 2 and 5.

Proof : (of theorem 4.3.4) We need only check that $m = 2p^r$ has primitive roots, the other claims having already been proven. If $a_1, a_2, \ldots, a_{\varphi(p^r)}$ is a reduced residue system modulo $p^r$, then we claim that

$$\{a_j : 2 \nmid a_j\} \cup \{a_j + p^r : 2 \mid a_j\}$$

is a reduced residue system modulo $2p^r$. Indeed, we see that we have exactly $\varphi(2p^r) = \varphi(2)\varphi(p^r) = \varphi(p^r)$ residue classes, that all are distinct, and since $(a_j, p) = 1$ we have u, v so that $a_j u + pv = 1$; thus writing x = u and $y = v - p^{r-1}u$, we have

$$1 = a_j x + p(y + p^{r-1}x) = (a_j + p^r)x + py \implies (a_j + p^r, p) = 1,$$

and hence (since p is assumed odd, so that $a_j + p^r$ is odd when $a_j$ is even) $a_j + p^r$ is indeed a unit modulo $2p^r$, by the Chinese remainder theorem. Furthermore, the orders of the elements of the latter set (the lifts of the even $a_j$) do not change, as for $0 < k < \mathrm{ord}_{p^r}(a_j)$ we have

$$(a_j + p^r)^k = \sum_{n=0}^{k} \binom{k}{n} a_j^n p^{r(k-n)} \equiv a_j^k \pmod{p^r},$$

which is not congruent to 1 modulo $p^r$ by assumption; thus $(a_j + p^r)^k \not\equiv 1 \pmod{2p^r}$. Conversely, any odd b with $b^k \equiv 1 \pmod{p^r}$ also satisfies $b^k \equiv 1 \pmod{2p^r}$, so the orders modulo $p^r$ and modulo $2p^r$ agree. The same argument holds for the odd $a_j$, and we see that the element of our reduced residue system corresponding to a primitive root modulo $p^r$ must have order $\varphi(p^r) = \varphi(2p^r)$, which completes the proof.

Remark: When m is odd, we have an isomorphism of groups $\pi : \mathbb{Z}_m^\times \xrightarrow{\sim} \mathbb{Z}_{2m}^\times$.

Corollary 1: (Corollary 2.42, Niven) Let m ∈ PR and let (a, m) = 1. The congruence $x^n \equiv a \pmod m$ has d solutions if $a^{\varphi(m)/d} \equiv 1 \pmod m$, where d = (n, φ(m)), and zero solutions otherwise.

Remark: The analogue for m = 2r, r ≥ 3, is corollary 2.44 in Niven.

Proof : Let g be a primitive root modulo m. Choose j, 1 ≤ j ≤ φ(m), so that $g^j \equiv a \pmod m$, and note that if $x^n \equiv a \pmod m$ then one must have (x, m) = 1. For every such x, there exists k so that $g^k \equiv x \pmod m$, and thus it suffices to solve the congruence

$$(g^k)^n \equiv g^j \pmod m$$

for k. Since the order of g is φ(m), this congruence holds if and only if $kn \equiv j \pmod{\varphi(m)}$. For fixed j, theorem 3.1.3 tells us that there are d = (n, φ(m)) solutions k modulo φ(m) if d | j, and none otherwise.

Finally, d | j is equivalent to $a^{\varphi(m)/d} \equiv 1 \pmod m$: indeed $a^{\varphi(m)/d} \equiv g^{j\varphi(m)/d} \pmod m$, and since g has order φ(m) this is congruent to 1 if and only if $\varphi(m) \mid j\varphi(m)/d$, that is, if and only if d | j, and we are done.
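Corollary 1 predicts the number of solutions without searching; the brute-force count below (our own sketch, with hypothetical function names) can be compared against that prediction for several moduli in PR.

```python
from math import gcd


def phi(m: int) -> int:
    """Euler's totient function by brute force."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def count_solutions(n: int, a: int, m: int) -> int:
    """Brute-force count of residues x mod m with x^n ≡ a (mod m)."""
    return sum(1 for x in range(m) if pow(x, n, m) == a % m)


def predicted_solutions(n: int, a: int, m: int) -> int:
    """Corollary 1 (assumes m in PR, m >= 2, gcd(a, m) = 1): d solutions or none."""
    ph = phi(m)
    d = gcd(n, ph)
    return d if pow(a, ph // d, m) == 1 else 0
```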

Corollary 2: (Corollary 2.38, Niven; Euler's criterion): Let p be an odd prime. If p ∤ a, the congruence $X^2 \equiv a \pmod p$ has two solutions if $a^{(p-1)/2} \equiv 1 \pmod p$, and no solutions otherwise. If p | a, there is exactly one solution.

Definition: The Carmichael lambda function, denoted λ(m), is the smallest exponent e ∈ N such that $a^e \equiv 1 \pmod m$ for every a with (a, m) = 1.

Remark: We know λ(m) | φ(m), and λ(m) = φ(m) if and only if m ∈ PR. Moreover, as seen last week, if m ∉ PR then $\lambda(m) \le \frac{\varphi(m)}{2}$. By the Chinese remainder theorem,

$$\lambda(p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}) = [\lambda(p_1^{e_1}), \lambda(p_2^{e_2}), \ldots, \lambda(p_k^{e_k})].$$

For odd primes, we have $\lambda(p^r) = p^{r-1}(p-1)$, which also holds for p = 2 and r ≤ 2. For r ≥ 3, one has instead $\lambda(2^r) = 2^{r-2}$. Group theoretically, λ(m) is the exponent of the group $\mathbb{Z}_m^\times$.
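The formula for λ(m) in this remark can be compared against the definition. In the sketch below (ours; function names hypothetical), `carmichael_lambda` takes a running lcm over prime powers, while `carmichael_lambda_bruteforce` searches for the least universal exponent.

```python
from math import gcd


def factorize(n: int) -> dict:
    """Trial-division factorization of n >= 1, returned as {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def carmichael_lambda(m: int) -> int:
    """lambda(m) as the lcm of lambda(p^e) over the prime powers exactly dividing m."""
    result = 1
    for p, e in factorize(m).items():
        lam = 2 ** (e - 2) if (p == 2 and e >= 3) else p ** (e - 1) * (p - 1)
        result = result * lam // gcd(result, lam)   # running lcm
    return result


def carmichael_lambda_bruteforce(m: int) -> int:
    """Smallest e >= 1 with a^e ≡ 1 (mod m) for every a coprime to m."""
    units = [a for a in range(1, m + 1) if gcd(a, m) == 1]
    e = 1
    while any(pow(a, e, m) != 1 % m for a in units):
        e += 1
    return e


print(carmichael_lambda(561))  # 80 = lcm(2, 10, 16)
```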

Definition: A base-b pseudoprime is a composite number m such that $b^{m-1} \equiv 1 \pmod m$.

For example, we may take b = 2, m = 341 = 11 · 31; then

$$2^{10} = 1024 = 3 \cdot 341 + 1,$$

and so $2^{341-1} = (2^{10})^{34} \equiv 1^{34} \equiv 1 \pmod{341}$. Thus 341 is a base-2 pseudoprime. This notion gives rise to the Fermat test for primality: if $b^{m-1} \not\equiv 1 \pmod m$, then m is composite. For example, with m = 341, b = 3, we have

$$3^{341-1} \equiv 56 \not\equiv 1 \pmod{341},$$

and it follows that 341 is not prime.
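Python's three-argument `pow` performs modular exponentiation efficiently, so the Fermat test is a one-liner. The sketch below is ours (hypothetical names), restricted to small inputs because `is_prime` uses trial division; it reproduces both computations above and lists the base-2 pseudoprimes below 2000.

```python
from math import isqrt


def passes_fermat(m: int, b: int) -> bool:
    """Fermat test to base b, for 1 < b < m: False means m is certainly composite."""
    return pow(b, m - 1, m) == 1


def is_prime(n: int) -> bool:
    """Trial-division primality test, fine for small n."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def pseudoprimes(b: int, limit: int) -> list:
    """Base-b pseudoprimes below limit: composites m > b passing the base-b test."""
    return [m for m in range(b + 1, limit) if not is_prime(m) and passes_fermat(m, b)]


print(pow(2, 340, 341), pow(3, 340, 341))  # 1 56
```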


5.2 Lecture Thirteen

Recall: Fermat’s test for primality.

Definition: Let m be composite. Then m is called a Carmichael number if $b^{m-1} \equiv 1 \pmod m$ for all b with (b, m) = 1.

For example, we might take m = 561 = 3 · 11 · 17. If (b, m) = 1, then we have by Euler's theorem

$$b^{561-1} \equiv \begin{cases} (b^2)^{280} \equiv 1 \pmod 3, \\ (b^{10})^{56} \equiv 1 \pmod{11}, \\ (b^{16})^{35} \equiv 1 \pmod{17}. \end{cases}$$

The Chinese remainder theorem then implies that $b^{560} \equiv 1 \pmod m$.

In 1994, Alford, Granville, and Pomerance showed that there are infinitely many Carmichael numbers, in the paper of the same name.

Moreover, if 6k + 1, 12k + 1, and 18k + 1 are all prime for some k ∈ N, then their product is a Carmichael number. For example, with k = 1 we get that 1729 = 7 · 13 · 19 is a Carmichael number.
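Both the definition and the 6k + 1, 12k + 1, 18k + 1 construction can be checked by brute force for small numbers. This is a sketch of ours; `is_carmichael` and `triple_construction` are hypothetical names, and the exhaustive check over all bases is only practical for small m.

```python
from math import gcd, isqrt


def is_carmichael(m: int) -> bool:
    """Brute force: m is composite and b^(m-1) ≡ 1 (mod m) for every b coprime to m."""
    if m < 4 or all(m % d for d in range(2, isqrt(m) + 1)):
        return False  # m is not composite
    return all(pow(b, m - 1, m) == 1 for b in range(2, m) if gcd(b, m) == 1)


def triple_construction(k: int):
    """(6k+1)(12k+1)(18k+1) when all three factors are prime, else None."""
    factors = (6 * k + 1, 12 * k + 1, 18 * k + 1)
    if all(all(f % d for d in range(2, isqrt(f) + 1)) for f in factors):
        return factors[0] * factors[1] * factors[2]
    return None


print(triple_construction(1), is_carmichael(561), is_carmichael(341))  # 1729 True False
```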

§3.1 – Quadratic residues

Most generally, we will investigate congruences of the form $aX^2 + bX + c \equiv 0 \pmod p$, where p is an odd prime and p ∤ a. Multiplying by 4a and completing the square gives

$$4a^2X^2 + 4abX + 4ac \equiv 0 \pmod p \iff (2aX + b)^2 \equiv b^2 - 4ac \pmod p.$$

Thus we are led to ask when $y^2 \equiv \Delta \pmod p$ (where $\Delta = b^2 - 4ac$ is the discriminant of our polynomial) has a solution. If so, then

$$2aX + b \equiv y \pmod p \iff X \equiv (y - b)(2a)^{-1} \pmod p.$$

We note the obvious analogue of the quadratic formula. Thus it suffices to investigate when $X^2 \equiv a \pmod p$ can be solved. By Euler's criterion, if p ∤ a this occurs exactly when

$$a^{(p-1)/2} \equiv 1 \pmod p.$$

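Example: We investigate such congruences modulo 7, where (p − 1)/2 = 3.

$$\begin{array}{c|c|c|l} a & \mathrm{ord}_7(a) & a^3 \bmod 7 & \text{Solutions of } x^2 \equiv a \pmod 7 \\ \hline 0 & - & 0 & x \equiv 0 \\ 1 & 1 & 1 & x \equiv 1, 6 \\ 2 & 3 & 1 & x \equiv 3, 4 \\ 3 & 6 & -1 & \text{none} \\ 4 & 3 & 1 & x \equiv 2, 5 \\ 5 & 6 & -1 & \text{none} \\ 6 & 2 & -1 & \text{none} \end{array}$$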
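The table can be regenerated mechanically. In this sketch of ours (`qr_table` is a hypothetical name), the middle column is $a^{(p-1)/2} \bmod p$ rewritten as 0, 1 or −1, and the last column is found by squaring every residue.

```python
def qr_table(p: int) -> list:
    """Rows (a, a^((p-1)/2) reduced to 0/1/-1, square roots of a mod p) for 0 <= a < p."""
    rows = []
    for a in range(p):
        e = pow(a, (p - 1) // 2, p)
        symbol = -1 if e == p - 1 else e      # Euler's criterion value as 0, 1 or -1
        roots = [x for x in range(p) if x * x % p == a]
        rows.append((a, symbol, roots))
    return rows


for row in qr_table(7):
    print(row)
```

Each row also illustrates the remark below: the number of roots is always the symbol plus one.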

Definition: If (a, m) = 1, then a is called a quadratic residue modulo m if $X^2 \equiv a \pmod m$ has a solution, and a quadratic nonresidue otherwise.

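Definition: If p is an odd prime, define the Legendre symbol $\left(\frac{a}{p}\right)$ via

$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue modulo } p, \\ -1 & \text{if } a \text{ is a quadratic nonresidue modulo } p, \\ 0 & \text{if } p \mid a. \end{cases}$$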


Remark: If a ≡ b mod p, then $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$. Moreover, the number of solutions of $X^2 \equiv a \pmod p$ is exactly $\left(\frac{a}{p}\right) + 1$.

Theorem 5.2.1 (Theorem 3.1, Niven) If p is an odd prime and (a, p) = 1, then $\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod p$.

Proof : We give two proofs. In the first, we simply use Euler’s criterion (this is left as an exercise).

For the second, we observe that if a is a quadratic residue modulo p, then we can choose some z such that $z^2 \equiv (-z)^2 \equiv a \pmod p$. We then pair the reduced residue classes modulo p apart from ±z as $(x_i, y_i)$, with $x_i y_i \equiv a \pmod p$. There are $\frac{p-3}{2}$ such pairs, and by Wilson's theorem

$$-1 \equiv (p-1)! \equiv z(-z) \prod_{i=1}^{(p-3)/2} x_i y_i \equiv -a \cdot a^{(p-3)/2} \equiv -a^{(p-1)/2} \pmod p,$$

and the result follows. If a is a nonresidue, we repeat the above construction, this time pairing all residue classes as $(x_i, y_i)$ with $x_i y_i \equiv a \pmod p$, $i = 1, 2, \ldots, \frac{p-1}{2}$; Wilson's theorem then gives $-1 \equiv a^{(p-1)/2} \pmod p$, and we are done.

Corollary 1: For any integers a, b, we have $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$; in particular, if (a, p) = 1 we have $\left(\frac{a^2}{p}\right) = 1$.

In other words, the product of two quadratic residues is a quadratic residue, as is the product of two quadratic nonresidues. The product of a residue and a nonresidue is a nonresidue; compare this behaviour with that of the positive and negative integers.


5.3 Lecture Fourteen

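Recall: The Legendre symbol for p ∤ a is defined by

$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } x^2 \equiv a \pmod p \text{ has a solution}, \\ -1 & \text{otherwise}. \end{cases}$$

By Euler's criterion, we showed that $a^{(p-1)/2} \equiv \left(\frac{a}{p}\right) \pmod p$.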

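Example: When a = −1 and p is odd, we have that

$$\left(\frac{-1}{p}\right) \equiv (-1)^{(p-1)/2} \equiv \begin{cases} 1 & \text{if } p \equiv 1 \pmod 4, \\ -1 & \text{if } p \equiv 3 \pmod 4 \end{cases} \pmod p.$$

So $X^2 \equiv -1 \pmod p$ has two solutions if p ≡ 1 mod 4, and no solutions if p ≡ 3 mod 4.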

nb. For odd primes p, we have

$$\prod_{i=(p+1)/2}^{p-1} i \equiv (-1)^{(p-1)/2} \prod_{j=1}^{(p-1)/2} j \equiv (-1)^{(p-1)/2} \left(\frac{p-1}{2}\right)! \pmod p. \tag{1}$$

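In particular, if p ≡ 1 mod 4, then $(-1)^{(p-1)/2} = 1$ and we get

$$\left(\left(\frac{p-1}{2}\right)!\right)^2 \equiv \left(\frac{p-1}{2}\right)! \cdot (-1)^{(p-1)/2} \prod_{i=(p+1)/2}^{p-1} i \equiv (p-1)! \equiv -1 \pmod p,$$

and hence $x = \left(\frac{p-1}{2}\right)!$ solves $x^2 \equiv -1 \pmod p$.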

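Theorem 5.3.1 (The Law of Quadratic Reciprocity) Let p ≠ q be odd primes; then

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

In other words, $\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right)$ if p or q ≡ 1 mod 4, and $\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right)$ if p ≡ q ≡ 3 mod 4. Knowing whether or not $X^2 \equiv p \pmod q$ has solutions is the same as knowing whether or not $X^2 \equiv q \pmod p$ has solutions.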
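Before the proof, the statement can be tested for all pairs of small odd primes, computing each Legendre symbol independently by Euler's criterion. This is our own brute-force sketch; the helper names are hypothetical.

```python
from math import isqrt


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    e = pow(a, (p - 1) // 2, p)
    return -1 if e == p - 1 else e


def odd_primes_below(n: int) -> list:
    """Odd primes less than n, by trial division."""
    return [p for p in range(3, n) if all(p % d for d in range(2, isqrt(p) + 1))]


def reciprocity_holds(p: int, q: int) -> bool:
    """Check (p/q)(q/p) = (-1)^(alpha * beta) for distinct odd primes p, q."""
    alpha, beta = (p - 1) // 2, (q - 1) // 2
    return legendre(p, q) * legendre(q, p) == (-1) ** (alpha * beta)
```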

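Proof : (due to Rousseau, 1991) First, some background. Let $\alpha = \frac{p-1}{2}$, $\beta = \frac{q-1}{2}$. Let

$$F = \left\{1 \le k < \frac{pq}{2} : (k, pq) = 1\right\}$$

be the "first half" of $\mathbb{Z}_{pq}^\times$ and let

$$L = \left\{(i, j) \in \mathbb{Z}_p^\times \times \mathbb{Z}_q^\times : 1 \le i \le p-1,\ 1 \le j < \frac{q}{2}\right\}$$

be the "left half" of $\mathbb{Z}_p^\times \times \mathbb{Z}_q^\times$, and let $\pi : \mathbb{Z}_{pq} \to \mathbb{Z}_p \times \mathbb{Z}_q$ be the map given by the Chinese remainder theorem. One can see that for every $k \in \mathbb{Z}_{pq}^\times$, exactly one of π(k) ∈ L and −π(k) ∈ L holds (in the latter case we write k ∈ −L). For each such k, choose $\varepsilon_k \in \{\pm 1\}$, $i_k \in \{1, 2, \ldots, p-1\}$, $j_k \in \{1, 2, \ldots, \beta\}$ such that

$$\pi(k) = \varepsilon_k (i_k, j_k).$$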


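In particular, if k ≠ k′ ∈ F, then π(k) ≠ π(k′) and π(k) ≠ −π(k′). Thus the ordered pairs $(i_k, j_k)$, k ∈ F, are distinct, and since #F = #L = (p − 1)(q − 1)/2 they run over all of L exactly once. We obtain

$$\prod_{k \in F} (k, k) \equiv \prod_{k \in F} \pi(k) \equiv \prod_{k \in F} \varepsilon_k (i_k, j_k) \equiv \left(\prod_{k \in F} \varepsilon_k\right) \prod_{(i,j) \in L} (i, j), \tag{2}$$

the calculation taking place in $\mathbb{Z}_p^\times \times \mathbb{Z}_q^\times$ and the congruences taken (mod p, mod q).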

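Now, consider the right-hand side of (2): we have (with the same notation convention)

$$\prod_{(i,j) \in L} (i, j) \equiv \prod_{i=1}^{p-1} \prod_{j=1}^{\beta} (i, j) \equiv \left(((p-1)!)^\beta, (\beta!)^{p-1}\right).$$

From (1), we have that

$$\prod_{i=\beta+1}^{q-1} i \equiv (-1)^\beta \beta! \pmod q,$$

hence (mod p, mod q), using p − 1 = 2α, we have

$$\prod_{(i,j) \in L} (i, j) \equiv \left(((p-1)!)^\beta, \left(\beta! \cdot \prod_{i=\beta+1}^{q-1} i \cdot (-1)^\beta\right)^\alpha\right) \equiv \left(((p-1)!)^\beta, (-1)^{\alpha\beta}((q-1)!)^\alpha\right),$$

and finally by Wilson's theorem we obtain

$$\prod_{(i,j) \in L} (i, j) \equiv \left((-1)^\beta, (-1)^{\alpha\beta}(-1)^\alpha\right).$$

Thus with $\varepsilon = \prod_{k \in F} \varepsilon_k$, the right-hand side of (2) becomes

$$\varepsilon\left((-1)^\beta, (-1)^{\alpha\beta}(-1)^\alpha\right).$$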

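Now, on the left-hand side, we look at the first co-ordinate modulo p:

$$\prod_{k \in F} k \equiv \prod_{\substack{1 \le k < pq/2 \\ (pq, k) = 1}} k \equiv \left(\prod_{\substack{1 \le k < pq/2 \\ p \nmid k}} k\right) \left(\prod_{\substack{1 \le k < pq/2 \\ q \mid k}} k\right)^{-1}. \tag{3}$$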

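The first factor in (3) splits into intervals of length p − 1, with one exception, namely the interval ending at ⌊pq/2⌋. Thus modulo p we see

$$\prod_{\substack{1 \le k < pq/2 \\ p \nmid k}} k = \prod_{1 \le k \le p-1} k \prod_{p+1 \le k \le 2p-1} k \cdots \prod_{(\beta-1)p+1 \le k \le \beta p - 1} k \prod_{\beta p + 1 \le k \le \beta p + \alpha} k;$$

but $\beta p + \alpha = \lfloor pq/2 \rfloor$, and each complete interval contributes (p − 1)! modulo p while the last contributes α!, so we see that

$$\prod_{\substack{1 \le k < pq/2 \\ p \nmid k}} k \equiv ((p-1)!)^\beta \alpha! \pmod p.$$

The second factor of (3) is the inverse of

$$\prod_{\substack{1 \le k < pq/2 \\ q \mid k}} k = q \cdot 2q \cdots \alpha q \equiv q^\alpha \alpha! \equiv \left(\frac{q}{p}\right) \alpha! \pmod p,$$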


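with the last congruence following by Euler's criterion. Thus (3) becomes

$$\prod_{k \in F} k \equiv ((p-1)!)^\beta \alpha! \left(\left(\frac{q}{p}\right)\alpha!\right)^{-1} \pmod p,$$

which by Wilson's theorem is congruent modulo p to $(-1)^\beta \left(\frac{q}{p}\right)$. The same proof shows

$$\prod_{k \in F} k \equiv (-1)^\alpha \left(\frac{p}{q}\right) \pmod q,$$

and so (2) becomes

$$\left((-1)^\beta \left(\frac{q}{p}\right), (-1)^\alpha \left(\frac{p}{q}\right)\right) \equiv \left((-1)^\beta \varepsilon, (-1)^{\alpha\beta}(-1)^\alpha \varepsilon\right) \pmod{p, q}.$$

The first co-ordinate tells us that $\left(\frac{q}{p}\right) \equiv \varepsilon \pmod p$, and the second that

$$\left(\frac{p}{q}\right) = (-1)^{\alpha\beta}\varepsilon = (-1)^{\alpha\beta}\left(\frac{q}{p}\right)$$

(where we have equality rather than congruence, as both sides of each congruence lie in {±1} and p, q are odd), hence

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\alpha\beta},$$

as claimed.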


6 Week Six

6.1 Lecture Fifteen

Recall: Last week, we saw that Euler's criterion implies that $\left(\frac{-1}{p}\right) = (-1)^{(p-1)/2}$ for any odd prime p. In other words, $x^2 \equiv -1 \pmod p$ has 2 solutions if p ≡ 1 mod 4, and no solutions if p ≡ 3 mod 4. There is a single solution if p = 2.

Consequently, we see that, for every integer x, all of the prime factors of $x^2 + 1$ (other than 2) must be congruent to 1 modulo 4. Similarly, for any x, k ∈ Z, all prime factors p of $x^2 + k^2$ satisfy

p | 2k or p ≡ 1 mod 4,

since if p ∤ 2k then $x^2 + k^2 \equiv 0 \pmod p$ implies that $x^2 \equiv -k^2 \pmod p$, hence $(xk^{-1})^2 \equiv -1 \pmod p$ and so p ≡ 1 mod 4. Note that if p | k, we must also have p | x, so that (x, k) > 1.

Example: We use quadratic reciprocity to answer the question: Does $x^2 \equiv 55 \pmod{367}$ have a solution? Note that 367 is a prime congruent to 3 modulo 4.

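To answer this question we compute the Legendre symbol $\left(\frac{55}{367}\right)$: by multiplicativity we have

$$\left(\frac{55}{367}\right) = \left(\frac{5}{367}\right)\left(\frac{11}{367}\right).$$

The law of quadratic reciprocity then implies that

$$\left(\frac{5}{367}\right) = \left(\frac{367}{5}\right) = \left(\frac{2}{5}\right) = -1,$$

since the quadratic residues modulo 5 are 1 and 4, and similarly

$$\left(\frac{11}{367}\right) = -\left(\frac{367}{11}\right) = -\left(\frac{4}{11}\right) = -\left(\frac{2}{11}\right)^2 = -1.$$

Thus $\left(\frac{55}{367}\right) = (-1)(-1) = 1$, and we see that 55 is a quadratic residue modulo 367. The theorem is nonconstructive, but one may check that $(\pm 34)^2 \equiv 55 \pmod{367}$.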

We see from this example that one algorithm for calculating $\left(\frac{a}{p}\right)$ is given by:

1. Factor a completely, $a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$.

2. Use multiplicativity and periodicity: $\left(\frac{a}{p}\right) = \left(\frac{p_1^{e_1}}{p}\right)\left(\frac{p_2^{e_2}}{p}\right) \cdots \left(\frac{p_k^{e_k}}{p}\right)$.

3. Use the law of quadratic reciprocity.

4. If not finished, return to 1.

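Theorem 6.1.1 (Theorem 3.3, Niven) If p is an odd prime, then $\left(\frac{2}{p}\right) = (-1)^{(p^2-1)/8}$; that is,

$$\left(\frac{2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv \pm 1 \pmod 8, \\ -1 & \text{if } p \equiv \pm 3 \pmod 8. \end{cases}$$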


The proof is not given here.
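Although the proof is deferred, the statement is easy to test numerically. This sketch of ours computes $\left(\frac{2}{p}\right)$ by Euler's criterion and compares it with the closed form for all odd primes below 1000 (helper names hypothetical).

```python
from math import isqrt


def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    e = pow(a, (p - 1) // 2, p)
    return -1 if e == p - 1 else e


def two_closed_form(p: int) -> int:
    """Theorem 6.1.1's closed form (-1)^((p^2 - 1)/8)."""
    return (-1) ** ((p * p - 1) // 8)


odd_primes = [p for p in range(3, 1000) if all(p % d for d in range(2, isqrt(p) + 1))]
mismatches = [p for p in odd_primes if legendre(2, p) != two_closed_form(p)]
print(mismatches)  # []
```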

§3.3 – The Jacobi symbol

Let $p_1, p_2, \ldots, p_k$ be odd primes (not necessarily distinct), and let Q be their product. The Jacobi symbol $\left(\frac{a}{Q}\right)$ is defined by

$$\left(\frac{a}{Q}\right) = \prod_{j=1}^{k} \left(\frac{a}{p_j}\right),$$

where the symbols on the right are Legendre symbols.

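Example: We compute the Jacobi symbol $\left(\frac{8}{15}\right)$. We have

$$\left(\frac{8}{15}\right) = \left(\frac{8}{3}\right)\left(\frac{8}{5}\right) = \left(\frac{2}{3}\right)\left(\frac{3}{5}\right) = (-1)(-1) = 1.$$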

Note that although the Jacobi symbol $\left(\frac{8}{15}\right)$ is 1, the congruence $x^2 \equiv 8 \pmod{15}$ has no solution, as $x^2 \equiv 2 \pmod 3$ has none. However, we can say that, if $\left(\frac{a}{Q}\right) = -1$, then $x^2 \equiv a \pmod Q$ has no solutions.

Our example shows that the converse is false; why, then, define the Jacobi symbol at all? There are several reasons, chief among which are

1. It agrees with the Legendre symbol when Q is prime, and

2. It is easy to compute without factoring any integers.

The first of these assertions is clear, but the second is not yet.

Properties of the Jacobi symbol

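• It is totally multiplicative in both arguments; that is, if Q and R are odd positive integers, then for any a, b we have

$$\left(\frac{ab}{Q}\right) = \left(\frac{a}{Q}\right)\left(\frac{b}{Q}\right), \qquad \left(\frac{a}{QR}\right) = \left(\frac{a}{Q}\right)\left(\frac{a}{R}\right).$$

• It is periodic in the top argument with period Q, i.e. if a ≡ b mod Q then $\left(\frac{a}{Q}\right) = \left(\frac{b}{Q}\right)$.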

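The second property is immediate from the definition, since a ≡ b mod Q implies a ≡ b mod $p_j$ for every prime $p_j$ dividing Q. Note also that square factors of Q have little effect: if we write Q = Q′S with Q′ squarefree and S a perfect square, then

$$\left(\frac{a}{Q}\right) = \left(\frac{a}{Q'}\right)\left(\frac{a}{S}\right) = \left(\frac{a}{Q'}\right)\left(\frac{a}{\sqrt{S}}\right)^2 = \left(\frac{a}{Q'}\right)$$

whenever (a, S) = 1.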

Before proceeding, we first record the following

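Lemma 6.1.2 If $b_1, b_2, \ldots, b_k$ are odd, then

$$\sum_{j=1}^{k} \frac{b_j - 1}{2} \equiv \frac{b_1 b_2 \cdots b_k - 1}{2} \pmod 2.$$

Proof : If k = 2, then

$$\frac{b_1 b_2 - 1}{2} - \left(\frac{b_1 - 1}{2} + \frac{b_2 - 1}{2}\right) = \frac{(b_1 - 1)(b_2 - 1)}{2} \equiv 0 \pmod 2,$$

since $b_1 - 1$ and $b_2 - 1$ are both even; the general case follows by induction (exercise).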


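Theorem 6.1.3 (Theorem 3.7, Niven) If Q > 0 is odd, then the Jacobi symbol $\left(\frac{-1}{Q}\right)$ equals

$$(-1)^{(Q-1)/2} = \begin{cases} 1 & \text{if } Q \equiv 1 \pmod 4, \\ -1 & \text{if } Q \equiv 3 \pmod 4. \end{cases}$$

Proof : Since square factors of Q do not affect the Jacobi symbol (as illustrated above), we may assume without loss of generality that $Q = p_1 p_2 \cdots p_k$ is squarefree. Then by lemma 6.1.2 we have that

$$\frac{Q-1}{2} \equiv \frac{p_1 - 1}{2} + \frac{p_2 - 1}{2} + \cdots + \frac{p_k - 1}{2} \pmod 2,$$

hence

$$\left(\frac{-1}{Q}\right) = \left(\frac{-1}{p_1}\right)\left(\frac{-1}{p_2}\right)\cdots\left(\frac{-1}{p_k}\right) = (-1)^{\frac{p_1-1}{2}} (-1)^{\frac{p_2-1}{2}} \cdots (-1)^{\frac{p_k-1}{2}} = (-1)^{\frac{Q-1}{2}},$$

as claimed.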


6.2 Lecture Sixteen

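Theorem 6.2.1 (Theorem 3.8, Niven; the law of quadratic reciprocity for Jacobi symbols) Let P, Q ∈ N be odd with (P, Q) = 1. Then

$$\left(\frac{P}{Q}\right)\left(\frac{Q}{P}\right) = (-1)^{\frac{P-1}{2} \cdot \frac{Q-1}{2}} = \begin{cases} -1 & \text{if } P \equiv Q \equiv 3 \pmod 4, \\ 1 & \text{otherwise}. \end{cases}$$

Note that if (P, Q) > 1, we must have $\left(\frac{P}{Q}\right) = 0$.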

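Proof : Write $P = p_1 p_2 \cdots p_k$, $Q = q_1 q_2 \cdots q_l$, where the $p_i$ and $q_j$ are odd (not necessarily distinct) primes. By multiplicativity, we have

$$\left(\frac{P}{Q}\right) = \prod_{i=1}^{k} \left(\frac{p_i}{Q}\right) = \prod_{i=1}^{k} \prod_{j=1}^{l} \left(\frac{p_i}{q_j}\right),$$

where the factors in the last product are Legendre symbols. The law of quadratic reciprocity (for Legendre symbols; note that $p_i \ne q_j$ since (P, Q) = 1) then implies that

$$\left(\frac{P}{Q}\right) = \prod_{i=1}^{k} \prod_{j=1}^{l} \left(\frac{q_j}{p_i}\right)(-1)^{\frac{p_i-1}{2} \cdot \frac{q_j-1}{2}} = \left(\frac{Q}{P}\right)(-1)^{\sum_{i=1}^{k}\sum_{j=1}^{l} \frac{p_i-1}{2} \cdot \frac{q_j-1}{2}}.$$

By lemma 6.1.2 from our last lecture, the exponent of −1 satisfies

$$\sum_{i=1}^{k}\sum_{j=1}^{l} \frac{p_i - 1}{2} \cdot \frac{q_j - 1}{2} = \left(\sum_{i=1}^{k} \frac{p_i - 1}{2}\right)\left(\sum_{j=1}^{l} \frac{q_j - 1}{2}\right) \equiv \frac{P-1}{2} \cdot \frac{Q-1}{2} \pmod 2,$$

hence, since $\left(\frac{Q}{P}\right)^2 = 1$,

$$\left(\frac{P}{Q}\right)\left(\frac{Q}{P}\right) = (-1)^{\frac{P-1}{2} \cdot \frac{Q-1}{2}},$$

as claimed.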

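Application: We calculate the Legendre symbol $\left(\frac{2}{p}\right)$, where p is an odd prime; rather, we will show that the Jacobi symbol $\left(\frac{2}{Q}\right)$ obeys the formula from last lecture, namely

$$\left(\frac{2}{Q}\right) = (-1)^{(Q^2-1)/8} = \begin{cases} 1 & \text{if } Q \equiv \pm 1 \pmod 8, \\ -1 & \text{if } Q \equiv \pm 3 \pmod 8, \end{cases}$$

from which the special case of the Legendre symbol follows. By periodicity in the top argument, we have that

$$\left(\frac{2}{Q}\right) = \left(\frac{2-Q}{Q}\right) = \left(\frac{-1}{Q}\right)\left(\frac{Q-2}{Q}\right) = (-1)^{\frac{Q-1}{2}}\left(\frac{Q-2}{Q}\right).$$

Since Q is odd and positive, we must have that (Q, Q − 2) = 1, and so by quadratic reciprocity we see that

$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}}\left(\frac{Q}{Q-2}\right)(-1)^{\frac{Q-1}{2} \cdot \frac{Q-3}{2}};$$

again, since one of Q − 1 and Q − 3 must be divisible by 4, we cancel the last factor and obtain

$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}}\left(\frac{Q}{Q-2}\right) = (-1)^{\frac{Q-1}{2}}\left(\frac{2}{Q-2}\right).$$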


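By descent, we obtain

$$\left(\frac{2}{Q}\right) = (-1)^{\frac{Q-1}{2}}(-1)^{\frac{Q-3}{2}} \cdots (-1)^3(-1)^2\left(\frac{2}{3}\right),$$

and finally since 2 is a quadratic nonresidue modulo 3 we have

$$\left(\frac{2}{Q}\right) = (-1)^{1 + 2 + \cdots + \frac{Q-1}{2}} = (-1)^{\frac{1}{2} \cdot \frac{Q-1}{2} \cdot \frac{Q+1}{2}} = (-1)^{\frac{Q^2-1}{8}},$$

and we are done.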

We can turn this into a general algorithm for computing the Jacobi symbol. Indeed, to compute $\left(\frac{a}{Q}\right)$, we may apply the following steps:

1. Factor −1 and any powers of 2 from a, evaluating $\left(\frac{-1}{Q}\right)$ and $\left(\frac{2}{Q}\right)$ by the formulas above and leaving $\left(\frac{P}{Q}\right)$ with P an odd positive number.

2. Use quadratic reciprocity and periodicity.

3. If not finished, return to 1.

Note, in particular, that this algorithm doesn't require us to factor any integers.
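Below is a sketch of a common binary variant of this algorithm; it is our illustration, not the notes' exact procedure, and the name `jacobi` is our own. Instead of factoring out −1, it keeps the top argument reduced into [0, Q) by periodicity, strips factors of 2 using the formula for $\left(\frac{2}{Q}\right)$, and swaps the arguments using reciprocity for Jacobi symbols. No factoring is needed.

```python
def jacobi(a: int, Q: int) -> int:
    """Jacobi symbol (a/Q) for odd Q > 0, computed without factoring Q."""
    if Q <= 0 or Q % 2 == 0:
        raise ValueError("Q must be a positive odd integer")
    a %= Q                              # periodicity in the top argument
    result = 1
    while a != 0:
        while a % 2 == 0:               # pull out factors of 2
            a //= 2
            if Q % 8 in (3, 5):         # (2/Q) = -1 exactly when Q ≡ ±3 (mod 8)
                result = -result
        a, Q = Q, a                     # reciprocity: both arguments are now odd and positive
        if a % 4 == 3 and Q % 4 == 3:
            result = -result
        a %= Q
    return result if Q == 1 else 0     # Q > 1 here means gcd(a, Q) > 1


print(jacobi(1311, 53681))  # 1, matching the worked example below
```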

Example: 53681 is prime and congruent to 1 modulo 4. Is 1311 a quadratic residue modulo 53681?

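It suffices to compute the Jacobi symbol, which in the case that Q is an odd prime is exactly the Legendre symbol. Using the algorithm outlined above, we find

$$\left(\frac{1311}{53681}\right) = \left(\frac{53681}{1311}\right) = \left(\frac{-70}{1311}\right) = \left(\frac{-1}{1311}\right)\left(\frac{2}{1311}\right)\left(\frac{35}{1311}\right)$$

$$= (-1)(1)\left(\frac{35}{1311}\right) = -\left(\frac{1311}{35}\right)(-1) = \left(\frac{16}{35}\right) = \left(\frac{4}{35}\right)^2 = 1.$$

So 1311 is indeed a square modulo 53681.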

Here we will give an outline of a more "traditional" proof of the law of quadratic reciprocity, nearer to the proof given in Niven. We start with a preliminary result.

Lemma 6.2.2 (Gauss's lemma) Let p be an odd prime and let

$$F = \left\{1, 2, \ldots, \frac{p-1}{2}\right\}, \qquad -F = \left\{\frac{p+1}{2}, \frac{p+3}{2}, \ldots, p-1\right\}.$$

Given a with (a, p) = 1, let $n = \#\{k \in F : ak \bmod p \in -F\}$. Then $\left(\frac{a}{p}\right) = (-1)^n$.

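Note that from this we can immediately compute $\left(\frac{2}{p}\right)$, since in this case $n = \#\{k : \frac{p}{4} < k < \frac{p}{2}\}$. Next, we show that, for odd a,

$$n \equiv \sum_{j=1}^{(p-1)/2} \left\lfloor \frac{aj}{p} \right\rfloor \pmod 2,$$

and we also use the fact that, for distinct odd primes p and q,

$$\sum_{j=1}^{(p-1)/2} \left\lfloor \frac{jq}{p} \right\rfloor + \sum_{k=1}^{(q-1)/2} \left\lfloor \frac{kp}{q} \right\rfloor = \frac{p-1}{2} \cdot \frac{q-1}{2}.$$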

One proof of this fact counts the lattice points (x, y) with 1 ≤ x ≤ (p − 1)/2 and 1 ≤ y ≤ (q − 1)/2, that is, those inside the rectangle in the first quadrant with vertices at (0, 0), (p/2, 0), (0, q/2) and (p/2, q/2); specifically, it counts separately those lying below and above the line through the origin and (p, q), on which none of these points lie. But this is all the detail we give here.
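Gauss's lemma, the parity claim for odd a, and the floor-sum identity can all be spot-checked for small primes. This is a brute-force sketch of ours (`gauss_count` and `floor_sum` are hypothetical names); the Legendre symbol is computed independently by Euler's criterion.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    e = pow(a, (p - 1) // 2, p)
    return -1 if e == p - 1 else e


def gauss_count(a: int, p: int) -> int:
    """n in Gauss's lemma: how many k in F = {1, ..., (p-1)/2} have ak mod p in -F."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if a * k % p > half)


def floor_sum(q: int, p: int) -> int:
    """Sum of floor(jq/p) for j = 1, ..., (p-1)/2."""
    return sum(j * q // p for j in range(1, (p - 1) // 2 + 1))
```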


With this machinery, we can show that there are infinitely many primes congruent to 1 modulo 4. Indeed, if $p_1, p_2, \ldots, p_k$ is any finite list of such primes, let

$$N = (2p_1 p_2 \cdots p_k)^2 + 1.$$

Then $p_i \nmid N$ for i = 1, 2, ..., k. But since N > 1 is odd and one more than a square, we know that all of its prime factors must be congruent to 1 modulo 4; in particular, there must be such a prime which is not on the list.


6.3 Lecture Seventeen

Final exam date: Friday, December 8, at noon.

Definition: A degree-d form (or homogeneous polynomial) is a polynomial, each of whose monomials has degree d. For example, $X^3 + 2Y^3 + 3Y^2Z - 4XYZ$ is a degree-3 form. A binary form is a form in two variables, and a quadratic form is a degree-2 form. We will focus on binary quadratic forms.

Example: One binary quadratic form is $f(X, Y) = X^2 + Y^2$; another is $g(X, Y) = 53X^2 + 152XY + 109Y^2$.

Among the questions we might ask about binary quadratic forms f(X, Y), two important ones are:

1. Which m ∈ Z are represented by f? That is, for which m ∈ Z do we have x, y ∈ Z with f(x, y) = m?

2. Which m ∈ Z are properly represented by f? That is, when can we write m = f(x, y) with (x, y) = 1?

One motivation for the second question is the observation that for any binary quadratic form f, we have $f(dx, dy) = d^2 f(x, y)$. We first investigate the form $f(X, Y) = X^2 + Y^2$, and ask when f represents a prime p. We observe that $2 = 1^2 + 1^2$, and from now on will restrict our attention to odd primes p.

Lemma 6.3.1 If p ≡ 3 mod 4 and $p \mid (x^2 + y^2)$, then p | x and p | y.

Proof : Since $p \mid (x^2 + y^2)$, we have that $x^2 \equiv -y^2 \pmod p$. If p ∤ y, then y is a unit modulo p and we have the equivalent congruence $(xy^{-1})^2 \equiv -1 \pmod p$, or $p \mid ((xy^{-1})^2 + 1)$, contradicting our earlier result that $X^2 \equiv -1 \pmod p$ has no solutions when p ≡ 3 mod 4. Thus p | y, from which we immediately see p | x.

In particular, if p ≡ 3 mod 4, then there is no way to express p as the sum of two squares.

Proposition 6.3.2 If p ≡ 1 mod 4, then there exist x, y ∈ Z such that $x^2 + y^2 = p$ and (x, y) = 1.

Proof : Fix some z so that $z^2 \equiv -1 \pmod p$, and consider the set

$$S = \{u + zv : 0 \le u < \sqrt{p},\ 0 \le v < \sqrt{p}\},$$

whose elements we index by the pairs (u, v). Since p is not a perfect square, $1 + \lfloor \sqrt{p} \rfloor = \lceil \sqrt{p} \rceil$, so

$$\#S = (1 + \lfloor \sqrt{p} \rfloor)^2 = \lceil \sqrt{p} \rceil^2 > p,$$

where ⌈x⌉ denotes the ceiling function. Thus by the pigeonhole principle there must be two distinct elements u + zv, u′ + zv′ (i.e. with not both u = u′ and v = v′) which are congruent modulo p. Define

$$x = u - u', \qquad y = v' - v.$$

Then since $u - u' \equiv z(v' - v) \pmod p$, we see that $x^2 \equiv z^2y^2 \equiv -y^2 \pmod p$, and so $p \mid (x^2 + y^2)$. Moreover, since |x|, |y| < √p, we see that

$$0 < x^2 + y^2 < 2p,$$

where the left inequality holds because x and y are not both 0, and it follows that $x^2 + y^2 = p$. Furthermore, if d = (x, y), then it follows that $d^2 \mid p$ and hence d = 1.
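The pigeonhole argument is itself an algorithm. In the sketch below (ours; function names hypothetical), a square root z of −1 is obtained as $c^{(p-1)/4}$ for a quadratic nonresidue c, which is our own choice of method; the loop then stores u + zv mod p until two pairs collide, exactly as in the proof.

```python
from math import isqrt


def sqrt_minus_one(p: int) -> int:
    """Some z with z^2 ≡ -1 (mod p), for a prime p ≡ 1 (mod 4)."""
    for c in range(2, p):
        if pow(c, (p - 1) // 2, p) == p - 1:   # c is a quadratic nonresidue
            return pow(c, (p - 1) // 4, p)     # z^2 = c^((p-1)/2) ≡ -1
    raise ValueError("p must be a prime congruent to 1 mod 4")


def two_squares(p: int):
    """(x, y) with x^2 + y^2 = p, via the pigeonhole argument of Proposition 6.3.2."""
    z = sqrt_minus_one(p)
    s = isqrt(p)                # u, v range over 0, ..., floor(sqrt(p)) < sqrt(p)
    seen = {}
    for u in range(s + 1):
        for v in range(s + 1):
            key = (u + z * v) % p
            if key in seen:     # u + zv ≡ u' + zv' (mod p)
                u2, v2 = seen[key]
                return abs(u - u2), abs(v2 - v)
            seen[key] = (u, v)
    raise AssertionError("unreachable: (s + 1)^2 > p guarantees a collision")
```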

Theorem 6.3.3 (due to Fermat) A positive integer n is properly represented by $X^2 + Y^2$ if and only if 4 ∤ n and no prime p ≡ 3 mod 4 has p | n.


Proof : Suppose first that $n = x^2 + y^2$ with (x, y) = 1, and let p ≡ 3 mod 4 be prime. If $p \mid (x^2 + y^2)$, then by lemma 6.3.1 p | x and p | y, thus (x, y) > 1, a contradiction. Moreover, x and y are not both even, so $x^2 + y^2$ is congruent to 1 or 2 modulo 4, and hence 4 ∤ n.

Conversely suppose that 4 ∤ n and that no prime factor p of n has p ≡ 3 mod 4. Since we know each prime factor is properly represented, it suffices to prove that the product mn of any numbers m, n properly represented by $X^2 + Y^2$ is itself properly represented.

Write $m = w^2 + z^2$ and $n = x^2 + y^2$ with (w, z) = (x, y) = 1. Then

$$mn = (wx)^2 + (wy)^2 + (xz)^2 + (yz)^2 = (wx - yz)^2 + (wy + xz)^2,$$

and it suffices to check coprimality.

[Here we encounter an error in the proof, the rest of which has been omitted.]

In the next lecture, we will prove the following, also due to Fermat.

Theorem 6.3.4 Given n ∈ N, write n in its prime factorization as

$$n = 2^\alpha \prod_{i=1}^{k} p_i^{\beta_i} \prod_{j=1}^{l} q_j^{\gamma_j},$$

where every $p_i$ has $p_i \equiv 1 \pmod 4$ and every $q_j$ has $q_j \equiv 3 \pmod 4$. Then n is represented by $X^2 + Y^2$ if and only if every $\gamma_j$ is even; in other words, if and only if we can write $n = ab^2$, where

$$p \mid a \Rightarrow p \not\equiv 3 \pmod 4 \quad\text{and}\quad p \mid b \Rightarrow p \equiv 3 \pmod 4.$$


7 Week Seven

7.1 Lecture Eighteen

Recall: Theorem 6.3.4.

Proof : Lemma 6.3.1 showed that if $q \mid (x^2 + y^2)$ and q ≡ 3 mod 4 is prime, then q | x and q | y, thus $q^2 \mid (x^2 + y^2)$; dividing out by $q^2$ and repeating shows that every $\gamma_j$ must be even. Conversely, proposition 6.3.2 shows that every prime p ≡ 1 mod 4 is represented, while $2 = 1^2 + 1^2$ and $q^2 = q^2 + 0^2$ are represented trivially, and since

$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$$

we see that representability by $X^2 + Y^2$ is multiplicative, which completes the proof.

Fact: A positive integer $n$ can be properly represented by $X^2 + Y^2$ if and only if $\alpha \le 1$ and each $\gamma_j = 0$; that is, if and only if $4 \nmid n$ and no prime congruent to 3 modulo 4 divides $n$. The proof of one implication was attempted at the end of the last lecture; today, we develop machinery to prove more general statements.

[Aside: Lagrange’s Four-Square theorem asserts that any nonnegative integer can be written as the sum of at most four squares. One proves this first for primes, then by showing that the set of integers represented by $W^2 + X^2 + Y^2 + Z^2$ is closed under multiplication. We may draw an analogy between the corresponding observation in the proof of theorem 6.3.4 and the multiplicativity of the complex norm $|a + bi|^2 = a^2 + b^2$, and that of the norm in the ring of quaternions,

$$|a + bi + cj + dk|^2 = a^2 + b^2 + c^2 + d^2.$$

Moreover, let $f(X_1, X_2, \ldots, X_n)$ be a positive definite quadratic form with integer matrix (i.e. with even cross-term coefficients). If $f$ represents every integer in the set $\{1, 2, \ldots, 15\}$, then $f$ represents every positive integer. This is known as the Fifteen Theorem.]

§3.4 – Binary quadratic forms

Notation: For the remainder of this lecture, $f(X,Y) = aX^2 + bXY + cY^2$ will denote an arbitrary binary quadratic form of discriminant $d = b^2 - 4ac$.

When does $f(x, y) = 0$ for $x, y$ not both 0? Suppose $d$ is a perfect square. If $a \neq 0$ then we may factor $f$ over $\mathbb{Q}$ via

$$f(x, y) = a\left(x + \frac{b + \sqrt{d}}{2a}\,y\right)\left(x + \frac{b - \sqrt{d}}{2a}\,y\right),$$

and so by proposition 6.2.2 we see that $f$ also factors over $\mathbb{Z}$. In this case, there are many ways to represent 0, as we need only make one of the factors equal zero. If $a = 0$ then $f(X,Y) = Y(bX + cY)$ and we have the same observation.

In the case $d = 0$, we can write $f(X,Y) = e(gX + hY)^2$ for some integers $e, g, h$. If $e > 0$ then $f$ is positive semidefinite; that is, $f(x, y) \ge 0$ for all $x, y \in \mathbb{Z}$. Similarly if $e < 0$ then $f(x, y) \le 0$ for all $x, y \in \mathbb{Z}$, and $f$ is said to be negative semidefinite. If furthermore $f(x, y) = 0$ implies that $x = y = 0$, then $f$ is said to be positive definite (resp. negative definite).

Now, suppose $d$ is not a perfect square; then $f$ is irreducible over $\mathbb{Q}$. In particular, $ac \neq 0$, else $d = b^2$, which is not the case.

Theorem 7.1.1 (Theorem 3.10, Niven) Suppose that a binary quadratic form $f(X,Y)$ has discriminant $d < 0$; then $f$ is definite (i.e. positive definite or negative definite).

Proof: Suppose $f(m, n) = 0$ and suppose $n \neq 0$. The identity

$$4af(x, y) = (2ax + by)^2 - dy^2$$

implies that

$$(2am + bn)^2 - dn^2 = 0 \iff dn^2 = (2am + bn)^2 \iff d = \left(\frac{2am}{n} + b\right)^2,$$

so $d < 0$ is the square of a rational number, which is a contradiction. A symmetric argument with the assumption $m \neq 0$ completes the proof.

We might ask: when does $f$ take positive values, and when negative ones?

Theorem 7.1.2 (Theorem 3.11, Niven) Let $f$ be a binary quadratic form of discriminant $d$. If $d > 0$ then $f$ is indefinite, that is, $f$ represents both positive and negative values. If $d < 0$ and $a > 0$, then $f$ is positive definite. If $d < 0$ and $a < 0$, then $f$ is negative definite.

Proof: Suppose $d > 0$. Then if $a \neq 0$ we have that $f(1, 0) = a$ and $f(b, -2a) = -ad$, and since $d > 0$ we know that $a$ and $-ad$ have opposite signs, so $f$ is indefinite. The same argument works if we assume $c \neq 0$, using $f(0, 1) = c$ and $f(-2c, b) = -cd$. Finally, if $a = c = 0$ then $f(1, 1) = b$ and $f(-1, 1) = -b$, and since $f \neq 0$ by assumption (so $b \neq 0$) this exhausts all cases.

Suppose now that $d < 0$, so that in particular $d$ is not a perfect square. Then we know $a \neq 0$, and so by our identity we have that

$$4af(x, y) = (2ax + by)^2 + |d|y^2 \ge 0,$$

from which it follows that $f(x, y)$ has the same sign as $a$ whenever $f(x, y) \neq 0$. The same equation shows that if $f(x, y) = 0$ then $y = 0$, hence $2ax = 0$ and so $x = 0$, and we are done.


7.2 Lecture Nineteen

Theorem 7.2.1 (Theorem 3.12, Niven) Let $d \in \mathbb{Z}$; then there exists a binary quadratic form of discriminant $d$ if and only if $d \equiv 0$ or $1 \pmod 4$.

Proof: Suppose $f(X,Y) = aX^2 + bXY + cY^2$ has discriminant $d$; then

$$d = b^2 - 4ac \equiv b^2 \pmod 4,$$

and since the squares modulo 4 are 0 and 1 the result is clear. Conversely, if $d \equiv 0 \pmod 4$ we may take $f(X,Y) = X^2 - \frac{d}{4}Y^2$, which has discriminant $d$, and if $d \equiv 1 \pmod 4$ we instead take $f(X,Y) = X^2 + XY - \frac{d-1}{4}Y^2$, with the same result.

Theorem 7.2.2 (Theorem 3.13, Niven) Let $d, n \in \mathbb{Z}$ with $n \neq 0$. There exists a binary quadratic form of discriminant $d$ that properly represents $n$ if and only if the congruence $x^2 \equiv d \pmod{4n}$ has a solution.

Remark: This theorem guarantees the existence of some binary quadratic form of discriminant $d$ properly representing $n$, but representability by a specific form is a much harder question.

Example: Take $n = -3$. There is a binary quadratic form of discriminant $d$ properly representing $-3$ if and only if $x^2 \equiv d \pmod{12}$ has a solution. The squares modulo 12 are 0, 1, 4, and 9, and so the only discriminants of binary quadratic forms properly representing $-3$ are those with $d \equiv 0, 1, 4$ or $9 \pmod{12}$.

Proof: Suppose $u^2 \equiv d \pmod{4n}$, and write $u^2 - d = 4nv$ for some integer $v$. Then with

$$f(X,Y) = nX^2 + uXY + vY^2,$$

we see that the discriminant of $f$ is $u^2 - 4nv = d$ and that $f(1, 0) = n$. Conversely, suppose that $as^2 + bst + ct^2 = n$ with $(s, t) = 1$ and $b^2 - 4ac = d$. Choose $m_1, m_2 \in \mathbb{Z}$ such that $(m_1, m_2) = 1$, $m_1m_2 = 4n$, and also $(m_1, t) = (m_2, s) = 1$. Note that we can always choose such $m_1, m_2$: for example,

$$m_1 = \prod_{p \mid s} p^{\operatorname{ord}_p(4n)}, \qquad m_2 = \frac{4n}{m_1}.$$

Recalling from last lecture the identity $4af(x, y) = (2ax + by)^2 - dy^2$, and noting that $m_1 \mid 4an = 4af(s, t)$, we obtain

$$(2as + bt)^2 - dt^2 \equiv 0 \pmod{m_1} \iff d \equiv (2ast^{-1} + b)^2 \pmod{m_1},$$

since $(t, m_1) = 1$. A symmetric argument, using $4cf(x, y) = (2cy + bx)^2 - dx^2$, shows that $d \equiv (2cts^{-1} + b)^2 \pmod{m_2}$, and since $(m_1, m_2) = 1$ the Chinese remainder theorem implies that the congruence $x^2 \equiv d$ has a solution modulo $m_1m_2 = 4n$, and we are done.
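The first half of the proof is an explicit construction. A small Python sketch of it, assuming only that we search $u$ over one full residue system modulo $4|n|$ (the helper names are hypothetical): if a square root $u$ of $d$ modulo $4n$ exists, it returns the form $(n, u, v)$ with $u^2 - d = 4nv$, which has discriminant $d$ and satisfies $f(1, 0) = n$.

```python
def root_mod_4n(d: int, n: int):
    """Return some u with u^2 ≡ d (mod 4|n|), or None if there is none."""
    modulus = 4 * abs(n)
    for u in range(modulus):
        if (u * u - d) % modulus == 0:
            return u
    return None


def form_properly_representing(d: int, n: int):
    """Return (a, b, c) = (n, u, v) of discriminant d with f(1, 0) = n, or None."""
    u = root_mod_4n(d, n)
    if u is None:
        return None
    v = (u * u - d) // (4 * n)   # exact division, since 4n divides u^2 - d
    return (n, u, v)


print(form_properly_representing(-7, 2))  # -> (2, 1, 1), i.e. 2X^2 + XY + Y^2
```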

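Corollary 1: Let $d \equiv 0$ or $1 \pmod 4$, and let $p$ be an odd prime. There exists a binary quadratic form of discriminant $d$ representing $p$ if and only if

$$\left(\frac{d}{p}\right) = 0 \text{ or } 1.$$

Proof: Since every representation of a prime is proper, by theorem 7.2.2 it suffices to show that $x^2 \equiv d \pmod{4p}$ has a solution if and only if $\left(\frac{d}{p}\right) = 0$ or $1$.

Suppose $x^2 \equiv d \pmod{4p}$, so that $x^2 \equiv d \pmod p$; it follows that $\left(\frac{d}{p}\right) = 0$ or $1$.

Conversely, if $\left(\frac{d}{p}\right) = 0$ or $1$, then there is some $x$ with $x^2 \equiv d \pmod p$, and since $d$ is a square modulo 4 by assumption there is some $y$ with $y^2 \equiv d \pmod 4$. As $p$ is odd, the Chinese remainder theorem gives $z$ with $z \equiv x \pmod p$ and $z \equiv y \pmod 4$, so that $z^2 \equiv d \pmod{4p}$, which completes the proof.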

Thus we are led to investigate the set of all binary quadratic forms of a given discriminant.


Example: Determine all integers represented by $f(X,Y) = 53X^2 + 152XY + 109Y^2$.

If we set $x = -3u + 10v$, $y = 2u - 7v$, then a calculation shows that $f(x, y) = u^2 + v^2$; thus if $n = u^2 + v^2$ is represented by $X^2 + Y^2$, then $n = f(-3u + 10v, 2u - 7v)$ is also represented by $f$. Conversely, this substitution can be inverted over $\mathbb{Z}$ via $u = -7x - 10y$, $v = -2x - 3y$, so if $n = f(x, y)$ then $n = u^2 + v^2$, and we see that both forms represent exactly the same set of integers.

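We can associate to any binary quadratic form $f(X,Y) = aX^2 + bXY + cY^2$ the $2 \times 2$ symmetric matrix

$$F = \begin{pmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{pmatrix},$$

which has the property that

$$\vec{x}^{\,T} F \vec{x} = f(x, y), \qquad \vec{x} = \begin{pmatrix} x \\ y \end{pmatrix},$$

where $A^T$ denotes the transpose of a matrix $A$. In our above example, $F = \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix}$ is associated to $f(X,Y) = 53X^2 + 152XY + 109Y^2$, and $G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is associated to $g(X,Y) = X^2 + Y^2$.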

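With this in mind, we write our change of variables from our example above as

$$\vec{x} = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix} =: M\vec{u},$$

hence

$$f(x, y) = \vec{x}^{\,T} F \vec{x} = (M\vec{u})^T F (M\vec{u}) = \vec{u}^{\,T}(M^T F M)\vec{u},$$

and indeed, $M^T F M = G$.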
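The matrix identity $M^T F M = G$ can be checked mechanically. A minimal Python sketch using exact rational arithmetic from the standard `fractions` module (so the off-diagonal entries $b/2$ stay exact); the helper names are illustrative:

```python
from fractions import Fraction


def form_matrix(a: int, b: int, c: int):
    """Symmetric matrix F with (x, y) F (x, y)^T = a x^2 + b x y + c y^2."""
    half_b = Fraction(b, 2)
    return [[Fraction(a), half_b], [half_b, Fraction(c)]]


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


F = form_matrix(53, 152, 109)   # entries 53, 76, 76, 109
M = [[-3, 10], [2, -7]]
G = matmul(matmul(transpose(M), F), M)
print(G == [[1, 0], [0, 1]])    # -> True
```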


8 Week Eight

8.1 Lecture Twenty

Recall from last lecture the binary quadratic forms

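$$f(X,Y) = 53X^2 + 152XY + 109Y^2, \qquad g(X,Y) = X^2 + Y^2,$$

with their associated matrices

$$F = \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix} \quad\text{and}\quad G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

respectively. We saw that $M^T F M = G$, where $M = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix}$. Recall that if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is invertible, then

$$A^{-1} = \frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

In our case, $\det M = 1$ and so $M^{-1} = \begin{pmatrix} -7 & -10 \\ -2 & -3 \end{pmatrix}$; we observe that if $M\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$, then

$$\begin{pmatrix} u \\ v \end{pmatrix} = M^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -7x - 10y \\ -2x - 3y \end{pmatrix}.$$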

Since $f(-u, -v) = f(u, v)$ for any binary quadratic form, the negative signs in this matrix are of no concern. Thus we obtain $F = (M^{-1})^T G M^{-1}$, which combined with our previous relation $G = M^T F M$ implies that $f$ and $g$ represent exactly the same integers.

Definition: The modular group $\Gamma$ is the set of all $2 \times 2$ matrices over $\mathbb{Z}$ with determinant 1, with the group operation being matrix multiplication.

Also used to denote $\Gamma$ are $SL_2(\mathbb{Z})$ and $SL(2, \mathbb{Z})$. Since $\Gamma$ is a group we have that $M \in \Gamma \iff M^{-1} \in \Gamma$.

Definition: Two binary quadratic forms $f$ and $g$ are called equivalent, denoted $f \sim g$, if there exists some $M \in \Gamma$ such that $M^T F M = G$, where $F$ and $G$ are the associated matrices of $f$ and $g$, respectively.

It is easy to see that if $f \sim g$ with $M^T F M = G$ and $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then $f(ax + by, cx + dy) = g(x, y)$. In our previous example, we showed that $53X^2 + 152XY + 109Y^2 \sim X^2 + Y^2$.

Remark: If $M^T F M = G$, then $(-M)^T F(-M) = G$. Thus we may take $M$ or $-M$ as we see fit, or equivalently choose a representative from $PSL_2(\mathbb{Z}) = \Gamma/\{\pm I\}$.

Theorem 8.1.1 (Theorem 3.16, Niven) ∼ is an equivalence relation.

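Proof: Reflexivity is clear, as $F = I^T F I$. Symmetry also follows from our remarks above: if $M^T F M = G$ then $M^{-1} \in \Gamma$ and $(M^{-1})^T G M^{-1} = F$. So it suffices to prove transitivity. Suppose $f \sim g$ and $g \sim h$, and let $M, N \in \Gamma$ be such that $M^T F M = G$ and $N^T G N = H$. Then $MN \in \Gamma$ and $(MN)^T F(MN) = N^T(M^T F M)N = H$, so $f \sim h$, and we are done.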

Note that if $f(X,Y) = aX^2 + bXY + cY^2$ has associated matrix $F$, then $\det F = ac - \frac{b^2}{4} = -\frac{d}{4}$, where $d$ is the discriminant of $f$. Since $\det(M^T F M) = (\det M)^2\det F = \det F$ for $M \in \Gamma$, this means in particular that if $f \sim g$ then their discriminants are equal. Indeed, in our perennial example $f(X,Y) = 53X^2 + 152XY + 109Y^2$, one checks that the discriminant of $f$ is $152^2 - 4 \cdot 53 \cdot 109 = 23104 - 23108 = -4$, as is the discriminant of $g(X,Y) = X^2 + Y^2$.

Theorem 8.1.2 (Theorem 3.17, Niven) Let f ∼ g be binary quadratic forms, and let n ∈ Z. Then:


1. The representations of n by f are in one-to-one correspondence with the representations of n by g.

2. The proper representations of $n$ by $f$ are in one-to-one correspondence with the proper representations of $n$ by $g$.

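Proof:

1. Let $M \in \Gamma$ with $M^T F M = G$. If $f(x, y) = n$, then $\vec{x}^{\,T} F \vec{x} = n$, and so $(M^{-1}\vec{x})^T G (M^{-1}\vec{x}) = \vec{x}^{\,T}(M^{-1})^T M^T F M M^{-1}\vec{x} = \vec{x}^{\,T} F \vec{x} = n$. Conversely, if $g(\vec{u}) = n$ then $f(M\vec{u}) = \vec{u}^{\,T} M^T F M\vec{u} = n$. Since $\vec{x} \mapsto M^{-1}\vec{x}$ is a bijection of $\mathbb{Z}^2$ with inverse $\vec{u} \mapsto M\vec{u}$, we deduce the result.

2. In the calculation in the proof of the first statement, since $M$ and $M^{-1}$ both have integer entries, if $m \mid x$ and $m \mid y$ then $m$ divides both entries of $M^{-1}\vec{x}$, and conversely; hence proper representations correspond to proper representations.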

We seek to understand the structure of the equivalence classes into which $\sim$ partitions the set of binary quadratic forms of discriminant $d$. We begin by showing that every equivalence class contains a “nice” form; that is, roughly speaking, one in which $b$ is the smallest coefficient in absolute value and $c$ the largest.

Definition: Let $f(X,Y) = aX^2 + bXY + cY^2$ be a binary quadratic form. Then $f$ is said to be reduced if one of the following conditions holds:

1. $-|a| < b \le |a| < |c|$.

2. $0 \le b \le |a| = |c|$.


8.2 Lecture Twenty-One

Recall from last time the notion of a reduced binary quadratic form; there is an algorithm for converting any given binary quadratic form $f$ of non-square discriminant into an equivalent, reduced binary quadratic form.

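Example: We will reduce $f = f_0(X,Y) = 53X^2 + 152XY + 109Y^2$, which corresponds to the matrix $F_0 = \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix}$. For $n \in \mathbb{Z}$, let

$$T_n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}, \qquad S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

We define $F_1$ via

$$F_1 = T_{-1}^T F_0 T_{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 53 & 76 \\ 76 & 109 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 53 & 23 \\ 23 & 10 \end{pmatrix},$$

which corresponds to the form $f_1(X,Y) = 53X^2 + 46XY + 10Y^2$. Next, we set

$$F_2 = S^T F_1 S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}^T \begin{pmatrix} 53 & 23 \\ 23 & 10 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 10 & -23 \\ -23 & 53 \end{pmatrix},$$

so that $f_2(X,Y) = 10X^2 - 46XY + 53Y^2$. Continuing in this way, we set

$$F_3 = T_2^T F_2 T_2 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 10 & -23 \\ -23 & 53 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 10 & -3 \\ -3 & 1 \end{pmatrix},$$

$$F_4 = S^T F_3 S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}^T \begin{pmatrix} 10 & -3 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 3 & 10 \end{pmatrix},$$

$$F_5 = T_{-3}^T F_4 T_{-3} = \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}^T \begin{pmatrix} 1 & 3 \\ 3 & 10 \end{pmatrix} \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

We see that $f_0 \sim f_5$ and that $f_5(X,Y) = X^2 + Y^2$ is reduced. Thus, if $M = T_{-1}ST_2ST_{-3} = \begin{pmatrix} -3 & 10 \\ 2 & -7 \end{pmatrix}$, then we have that $M^T F_0 M = F_5$.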

Theorem 8.2.1 (Theorem 3.18, Niven) Let $d \equiv 0$ or $1 \pmod 4$, with $d$ not a perfect square. Then every equivalence class of binary quadratic forms of discriminant $d$ contains a reduced form.

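Proof: Let $f_0(X,Y) = a_0X^2 + b_0XY + c_0Y^2$ have discriminant $d$, and for $s \ge 0$ let $F_s = \begin{pmatrix} a_s & \frac{b_s}{2} \\ \frac{b_s}{2} & c_s \end{pmatrix}$, with $T_n$ and $S$ as above. Define an algorithm via:

(A) If $|c_s| < |a_s|$, set $F_{s+1} = S^T F_s S$, so that $a_{s+1} = c_s$, $c_{s+1} = a_s$, $b_{s+1} = -b_s$.

(B) If $|a_s| \le |c_s|$ but $b_s \notin (-|a_s|, |a_s|]$, then choose $n \in \mathbb{Z}$ so that $2a_sn + b_s \in (-|a_s|, |a_s|]$. Indeed, this choice is unique by the division algorithm: writing

$$|a_s| - b_s = 2|a_s|q + r, \qquad 0 \le r < 2|a_s|,$$

set $n = q$ if $a_s > 0$ and $n = -q$ if $a_s < 0$, so that $2a_sn + b_s = |a_s| - r$. Then set $F_{s+1} = T_n^T F_s T_n$, so that

$$a_{s+1} = a_s, \qquad b_{s+1} = 2a_sn + b_s, \qquad c_{s+1} = a_sn^2 + b_sn + c_s = f_s(n, 1).$$

(C) If $|a_s| = |c_s|$ but $b_s < 0$, then set $F_{s+1} = S^T F_s S$.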


We observe that if a binary quadratic form does not satisfy the premises of (A), (B), or (C), then it is reduced; thus it suffices to show that the algorithm terminates.

Since $d$ is assumed not to be a perfect square we know that $a_s \neq 0$ for every $s$. We see that (A) is never followed by (A), nor (B) by (B), nor (C) by (C); moreover, since the output of (C) is reduced by construction, it remains only to show that we cannot have an infinite loop of (A) followed by (B) followed by (A), and so on. But this is clear, since every application of step (A) strictly decreases $|a_s|$ while step (B) leaves it unchanged, and so the well-ordering axiom implies that the algorithm terminates.
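The steps (A), (B), (C) translate directly into code. The following Python sketch works with coefficient triples $(a, b, c)$ rather than matrices, assumes $d$ is not a perfect square (so that $a_s \neq 0$ throughout), and also accumulates the matrix $M \in \Gamma$ with $M^T F_0 M = F_{\text{final}}$. The names `act`, `reduce_form`, `T` and `S` are hypothetical helpers chosen to mirror the notation above.

```python
Matrix = tuple[tuple[int, int], tuple[int, int]]

S: Matrix = ((0, 1), (-1, 0))
IDENTITY: Matrix = ((1, 0), (0, 1))


def T(n: int) -> Matrix:
    return ((1, n), (0, 1))


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))


def act(form, M: Matrix):
    """Coefficients of f(p x + q y, r x + s y) for M = ((p, q), (r, s)), i.e. of M^T F M."""
    a, b, c = form
    (p, q), (r, s) = M
    return (a * p * p + b * p * r + c * r * r,
            2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s,
            a * q * q + b * q * s + c * s * s)


def reduce_form(form):
    """Apply steps (A), (B), (C) until none applies; assumes b^2 - 4ac is not a square."""
    M = IDENTITY
    while True:
        a, b, c = form
        if abs(c) < abs(a):                     # step (A): swap a and c
            step = S
        elif not (-abs(a) < b <= abs(a)):       # step (B): move b into (-|a|, |a|]
            q = (abs(a) - b) // (2 * abs(a))    # |a| - b = 2|a| q + r with 0 <= r < 2|a|
            step = T(q if a > 0 else -q)
        elif abs(a) == abs(c) and b < 0:        # step (C): flip the sign of b
            step = S
        else:                                   # no step applies: the form is reduced
            return form, M
        form = act(form, step)
        M = matmul(M, step)


print(reduce_form((53, 152, 109)))  # -> ((1, 0, 1), ((-3, 10), (2, -7)))
```

On the worked example this reproduces the sequence $f_1, \ldots, f_5$ and the matrix $M = T_{-1}ST_2ST_{-3}$ computed by hand above.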

Note that if $d$ is a perfect square, then the above algorithm may produce a form with $a_s = 0$; the interval $(-|a_s|, |a_s|]$ is then empty, so step (B) cannot be carried out and the reduction breaks down.

Theorem 8.2.2 (Theorem 3.19, Niven) Let $d \in \mathbb{Z}$ with $d$ not a perfect square, and let $f(X,Y) = aX^2 + bXY + cY^2$ be a reduced binary quadratic form of discriminant $d$. Then:

1. If $d > 0$ then $ac < 0$ and $0 < |a| < \frac{\sqrt{d}}{2}$.

2. If $d < 0$ then $ac > 0$ and $0 < |a| \le \sqrt{\frac{|d|}{3}}$.

It is an immediate consequence of this theorem (together with theorem 8.2.1) that there are only finitely many equivalence classes of binary quadratic forms of discriminant $d$, as there are only finitely many such reduced forms: indeed, we must have

$$0 \le |b| \le |a| \le \sqrt{|d|}, \qquad c = \frac{b^2 - d}{4a}.$$

The proof will be given in the next lecture; today, we end with the following definition.

Definition: Let $d \in \mathbb{Z}$ with $d$ not a perfect square. The number of equivalence classes of binary quadratic forms of discriminant $d$ is called the class number of $d$ and is denoted $H(d)$.


8.3 Lecture Twenty-Two

Recall theorem 8.2.2 from last time. Today, we prove the second assertion of the theorem.

Proof: (of Theorem 8.2.2, part (2)) Since $d < 0$ we know that $ac > 0$, as $b^2 - 4ac < 0$, so in particular $|a| > 0$. Then

$$|d| = -d = 4ac - b^2 = 4|ac| - b^2.$$

Since $f$ is reduced, we have that $|b| \le |a| \le |c|$, and so

$$4|ac| - b^2 \ge 4a^2 - a^2 = 3a^2,$$

and we have that $|a| \le \sqrt{\frac{|d|}{3}}$, as claimed.

Recall also the definition of the class number H(d) of d.

Example: We compute $H(-7)$. We proceed by listing all reduced binary quadratic forms of discriminant $-7$ and then checking whether any are equivalent. Theorem 8.2.2 shows that if $f(X,Y) = aX^2 + bXY + cY^2$ is reduced of discriminant $-7$, then $0 < |a| \le \sqrt{7/3} < 2$, hence $a = \pm 1$.

If $|a| < |c|$ then we have $-1 < b \le 1$, and if $|a| = |c| = 1$ we have $0 \le b \le 1$; that is, in both cases $b \in \{0, 1\}$. Calculating the possibilities for $c = \frac{b^2 - d}{4a}$ yields the following table:

| $a$ | $b$ | $c$ | valid? |
|---|---|---|---|
| 1 | 0 | $7/4$ | no |
| 1 | 1 | 2 | yes |
| $-1$ | 0 | $-7/4$ | no |
| $-1$ | 1 | $-2$ | yes |

(where the last column indicates whether $c$ is an integer, i.e. whether or not $aX^2 + bXY + cY^2$ is a valid binary quadratic form). It follows from this that $H(-7) \le 2$. Since the discriminant is negative, both of the binary quadratic forms

$$f(X,Y) = X^2 + XY + 2Y^2, \qquad g(X,Y) = -X^2 + XY - 2Y^2$$

are (positive or negative) definite, and a calculation shows that $f(1, 1) = 4 > 0$ and $g(1, 1) = -2 < 0$. Thus $f$ is positive definite and $g$ is negative definite, so in particular $f \not\sim g$ and we have that $H(-7) = 2$.
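The same search can be automated for any negative discriminant. A Python sketch (the function name is hypothetical) that enumerates reduced forms using the bounds $|b| \le |a| \le \sqrt{|d|/3}$ from theorem 8.2.2 and $c = (b^2 - d)/4a$; assuming theorem 8.3.1 below and the remark following it, the length of the returned list should be $H(d)$.

```python
from math import isqrt


def reduced_forms(d: int):
    """All reduced forms (a, b, c) of discriminant d < 0, where d ≡ 0 or 1 (mod 4)."""
    assert d < 0 and d % 4 in (0, 1)
    bound = isqrt(-d // 3)                  # largest integer A with 3A^2 <= |d|
    forms = []
    for a in range(-bound, bound + 1):
        if a == 0:
            continue
        for b in range(-abs(a) + 1, abs(a) + 1):
            numerator = b * b - d
            if numerator % (4 * a) != 0:    # c must be an integer
                continue
            c = numerator // (4 * a)
            if (-abs(a) < b <= abs(a) < abs(c)) or (0 <= b <= abs(a) == abs(c)):
                forms.append((a, b, c))
    return forms


print(reduced_forms(-7))  # -> [(-1, 1, -2), (1, 1, 2)]
```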

Note that for any binary quadratic form of discriminant $d$, we have that $d = b^2 - 4ac \equiv b^2 \equiv b \pmod 2$, so $b$ must have the same parity as $d$.

Example: Which primes are represented by the reduced form f found in our example above?

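By theorem 7.2.2, $n$ is properly represented by some binary quadratic form of discriminant $-7$ if and only if there exists a solution to the congruence $x^2 \equiv -7 \pmod{4|n|}$. If $n > 0$, then a solution of $x^2 \equiv -7 \pmod{4n}$ implies that $n$ is properly represented by $f$, since every form of discriminant $-7$ is equivalent to $f$ or to $g$, and $g$ is negative definite and so cannot represent $n > 0$. Furthermore, if $n = p$ is prime, then every representation of $p$ is proper.

For $p = 2$, take $(x, y) = (0, 1)$ so that $f(x, y) = 2$. For odd $p$, we see by the Chinese remainder theorem (as $-7 \equiv 1 \pmod 4$ is a square modulo 4) that $f$ represents $p$ if and only if $x^2 \equiv -7 \pmod p$ has a solution. If $p = 7$ this is clear; otherwise, by quadratic reciprocity,

• If $p \equiv 1 \pmod 4$ then $\left(\frac{-7}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{7}{p}\right) = 1 \cdot \left(\frac{p}{7}\right) = \left(\frac{p}{7}\right)$.

• If $p \equiv 3 \pmod 4$ then $\left(\frac{-7}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{7}{p}\right) = (-1)\cdot\left(-\left(\frac{p}{7}\right)\right) = \left(\frac{p}{7}\right)$.

The quadratic residues modulo 7 are 1, 2, and 4; thus $p$ is represented by $f$ if and only if $p \equiv 0, 1, 2$ or $4 \pmod 7$.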
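A quick numerical check of this criterion, as a Python sketch: it searches directly for solutions of $x^2 + xy + 2y^2 = p$, using $4f(x, y) = (2x + y)^2 + 7y^2$ to bound the search box, and compares the outcome with the residue of $p$ modulo 7. The helper names are illustrative.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % k for k in range(2, isqrt(n) + 1))


def represented_by_f(n: int) -> bool:
    """True if n = x^2 + x y + 2 y^2 for some integers x, y (n > 0).

    From 4 f(x, y) = (2x + y)^2 + 7 y^2, any solution has |y| <= sqrt(n)
    and |2x + y| <= 2 sqrt(n), hence |x| <= 2 sqrt(n); B covers both bounds.
    """
    B = 2 * isqrt(n) + 2
    return any(x * x + x * y + 2 * y * y == n
               for x in range(-B, B + 1) for y in range(-B, B + 1))


primes = [p for p in range(2, 400) if is_prime(p)]
print(all(represented_by_f(p) == (p % 7 in (0, 1, 2, 4)) for p in primes))  # -> True
```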


Theorem 8.3.1 (Theorem 3.25, Niven) Let $f(X,Y) = aX^2 + bXY + cY^2$ and $g(X,Y) = a'X^2 + b'XY + c'Y^2$ be reduced, positive definite binary quadratic forms. If $f \sim g$, then $f = g$.

Proof : Exercise.

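Consequently, if $d < 0$ then $H(d)$ equals the number of reduced binary quadratic forms of discriminant $d$, which is twice the number of such positive definite forms (the map $aX^2 + bXY + cY^2 \mapsto -aX^2 + bXY - cY^2$ pairs the positive definite reduced forms with the negative definite ones).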

[Aside: there is also the notion of the class number of a number field; when $d < 0$ is a fundamental discriminant, the class number of $\mathbb{Q}(\sqrt{d})$ equals $\frac{1}{2}H(d)$.]


9 Week Nine

9.1 Lecture Twenty-Three

Recall: Theorem 8.3.1

Can we “compose” two binary quadratic forms? We would like to generalize the multiplication formula

$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2.$$

Note that if $z = a + ib$ and $w = c + id$ are complex numbers, then the above formula states exactly that $|z|^2|w|^2 = |zw|^2$. Thus, the binary quadratic form $f(X,Y) = X^2 + Y^2$ has a “composition law” given by

$$f(a, b)f(c, d) = f(ac - bd, ad + bc);$$

in particular, this implies that the set of numbers represented by $f$ is multiplicatively closed. Can we generalize this idea to arbitrary binary quadratic forms?

Example: Let $d = -7$. We saw last week that the single equivalence class of positive definite binary quadratic forms of discriminant $-7$ is represented by the reduced form $f(X,Y) = X^2 + XY + 2Y^2$. We factor over the complex numbers, using the quadratic formula:

$$f(a, b) = \left(a + \frac{1 + i\sqrt{7}}{2}b\right)\left(a + \frac{1 - i\sqrt{7}}{2}b\right).$$

Thus we are led to compute (using the fact that $\omega = \frac{1 + i\sqrt{7}}{2}$ satisfies $\omega^2 = \omega - 2$)

$$\left(a + \frac{1 + i\sqrt{7}}{2}b\right)\left(c + \frac{1 + i\sqrt{7}}{2}d\right) = (ac - 2bd) + \frac{1 + i\sqrt{7}}{2}(ad + bc + bd),$$

which implies

$$f(a, b)f(c, d) = f(ac - 2bd, ad + bc + bd),$$

and again we see that the set of represented values is multiplicatively closed.
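The composition law can be spot-checked by brute force over a small box of integers. A minimal Python sketch (names illustrative), where `compose` returns the pair $(ac - 2bd,\ ad + bc + bd)$ obtained above:

```python
from itertools import product


def f(x: int, y: int) -> int:
    """The reduced positive definite form X^2 + XY + 2Y^2 of discriminant -7."""
    return x * x + x * y + 2 * y * y


def compose(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Coordinates of (a + w b)(c + w d) in the basis 1, w, where w = (1 + i sqrt 7)/2."""
    return (a * c - 2 * b * d, a * d + b * c + b * d)


ok = all(f(a, b) * f(c, d) == f(*compose(a, b, c, d))
         for a, b, c, d in product(range(-5, 6), repeat=4))
print(ok)  # -> True
```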

Example: Suppose $d = -20$. In assignment 4, we verify that there are exactly two positive definite reduced binary quadratic forms of discriminant $-20$, namely

$$f_+(X,Y) = X^2 + 5Y^2 \quad\text{and}\quad f_-(X,Y) = 2X^2 + 2XY + 3Y^2.$$

Observe that the set of values represented by $f_-$ is not multiplicatively closed, as indeed

$$f_-(1, 0) = 2, \qquad f_-(0, 1) = 3, \qquad\text{but}\quad f_-(x, y) \neq 6 \text{ for all } x, y \in \mathbb{Z}.$$

Indeed, we have the identity $4af_-(x, y) = (2ax + by)^2 - dy^2$, hence

$$8f_-(x, y) = (4x + 2y)^2 + 20y^2 \iff 2f_-(x, y) = (2x + y)^2 + 5y^2,$$

and thus $f_-(x, y) = 6$ implies that $(2x + y)^2 + 5y^2 = 12$. This is never satisfied: we would need $|y| \le 1$, and then $(2x + y)^2 \in \{12, 7\}$, neither of which is a square. In particular, this means that there is no multiplicative formula (or “composition law”) for $f_-$ as there was for our previous examples.

Does such a formula exist for $f_+$? The identity

$$(a + i\sqrt{5}b)(c + i\sqrt{5}d) = (ac - 5bd) + i\sqrt{5}(ad + bc)$$

implies

$$f_+(a, b)f_+(c, d) = f_+(ac - 5bd, ad + bc).$$

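We see that if we factor $f_-$ using the quadratic formula, we obtain

$$f_-(a, b) = 2\left(a + \frac{1 + i\sqrt{5}}{2}b\right)\left(a + \frac{1 - i\sqrt{5}}{2}b\right) = \left(\sqrt{2}a + \frac{1 + i\sqrt{5}}{\sqrt{2}}b\right)\left(\sqrt{2}a + \frac{1 - i\sqrt{5}}{\sqrt{2}}b\right).$$

Calculating as before, we obtain

$$\left(\sqrt{2}a + \frac{1 + i\sqrt{5}}{\sqrt{2}}b\right)\left(\sqrt{2}c + \frac{1 + i\sqrt{5}}{\sqrt{2}}d\right) = (2ac + ad + bc - 2bd) + i\sqrt{5}(ad + bc + bd),$$

which implies

$$f_-(a, b)f_-(c, d) = f_+(2ac + ad + bc - 2bd, ad + bc + bd).$$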

What happens if we consider the product $f_+(a, b)f_-(c, d)$? The relevant calculation is

$$(a + i\sqrt{5}b)\left(\sqrt{2}c + \frac{1 + i\sqrt{5}}{\sqrt{2}}d\right) = \sqrt{2}(ac - bc - 3bd) + \frac{1 + i\sqrt{5}}{\sqrt{2}}(ad + 2bc + bd),$$

hence

$$f_+(a, b)f_-(c, d) = f_-(ac - bc - 3bd, ad + 2bc + bd).$$

Thus we have obtained the following “multiplication table”:

| × | $f_+$ | $f_-$ |
|---|---|---|
| $f_+$ | $f_+$ | $f_-$ |
| $f_-$ | $f_-$ | $f_+$ |

The entries are understood to mean, for example, that the product of two numbers represented by $f_+$ is also represented by $f_+$. In fact, this relation holds on the level of equivalence classes; that is, if $f \sim f_+$ and $g \sim f_-$, then $f(a, b)g(c, d) = h(x, y)$ for some $h \sim f_-$, where $x$ and $y$ are integer linear combinations of $ac, ad, bc, bd$.
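All three product formulas for discriminant $-20$ can be checked the same way. A short Python sketch (helper names illustrative) that encodes $f_\pm$ together with the substitutions $f_+f_+ \to f_+(ac - 5bd,\ ad + bc)$, $f_-f_- \to f_+(2ac + ad + bc - 2bd,\ ad + bc + bd)$ and $f_+f_- \to f_-(ac - bc - 3bd,\ ad + 2bc + bd)$, and tests them on a box of integer inputs:

```python
from itertools import product


def f_plus(x: int, y: int) -> int:
    return x * x + 5 * y * y


def f_minus(x: int, y: int) -> int:
    return 2 * x * x + 2 * x * y + 3 * y * y


def check(a: int, b: int, c: int, d: int) -> bool:
    """Check the three product formulas of the multiplication table at one point."""
    pp = f_plus(a, b) * f_plus(c, d) == f_plus(a * c - 5 * b * d, a * d + b * c)
    mm = f_minus(a, b) * f_minus(c, d) == f_plus(2 * a * c + a * d + b * c - 2 * b * d,
                                                 a * d + b * c + b * d)
    pm = f_plus(a, b) * f_minus(c, d) == f_minus(a * c - b * c - 3 * b * d,
                                                 a * d + 2 * b * c + b * d)
    return pp and mm and pm


print(all(check(*v) for v in product(range(-4, 5), repeat=4)))  # -> True
```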

In general, for a fixed negative discriminant $d$, the set of equivalence classes of primitive (i.e. with $(a, b, c) = 1$) positive definite binary quadratic forms of discriminant $d$ is a group under the operation of “multiplication” alluded to above. This is known as the class group.

This ends our discussion of binary quadratic forms; next, we will discuss arithmetic functions, that is, complex-valued functions whose domain is $\mathbb{N}$.


9.2 Lecture Twenty-Four

§4.2 – Arithmetic functions

Notation: Let τ(n) denote the number of positive divisors of n (also used is the notation d(n)).

Lemma 9.2.1 Let $n$ have prime factorization $n = p_1^{e_1}p_2^{e_2}\cdots p_k^{e_k}$. A positive integer $d$ divides $n$ if and only if $d = p_1^{s_1}p_2^{s_2}\cdots p_k^{s_k}$, with $0 \le s_j \le e_j$ for every $j$.

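Proof: Clearly, with $n$ and $d$ as above we see that $n = d\left(p_1^{e_1 - s_1}p_2^{e_2 - s_2}\cdots p_k^{e_k - s_k}\right)$. Conversely, suppose $d \mid n$. If $p$ is a prime with $p \mid d$, then $p \mid n$, so $p = p_j$ for some $j$; hence $d = p_1^{s_1}\cdots p_k^{s_k}$ with every $s_j \ge 0$. Finally, if $s_j > e_j$ for some $j$, then $p_j^{e_j + 1} \mid d \mid n$, so $p_j \mid \frac{n}{p_j^{e_j}}$; but $p_j \nmid \frac{n}{p_j^{e_j}}$ by uniqueness of prime factorization, a contradiction, and we are done.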

One consequence of this lemma is that if $n = p_1^{e_1}p_2^{e_2}\cdots p_k^{e_k}$, then

$$\tau(n) = \#\{(s_1, s_2, \ldots, s_k) : 0 \le s_j \le e_j\} = (1 + e_1)(1 + e_2)\cdots(1 + e_k),$$

or, written more succinctly,

$$\tau(n) = \prod_{p^\alpha \,\|\, n}(\alpha + 1).$$
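The product formula gives a quick way to compute $\tau(n)$ from a factorization. A Python sketch using naive trial division (adequate only for small $n$; the function names are illustrative):

```python
def factorize(n: int) -> dict[int, int]:
    """Return {p: alpha} with n equal to the product of the p^alpha (trial division)."""
    exponents = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                      # leftover prime factor
        exponents[n] = exponents.get(n, 0) + 1
    return exponents


def tau(n: int) -> int:
    """Number of positive divisors, via tau(n) = product of (alpha + 1) over p^alpha || n."""
    result = 1
    for alpha in factorize(n).values():
        result *= alpha + 1
    return result


print(tau(720))  # 720 = 2^4 * 3^2 * 5, so this prints 5 * 3 * 2 = 30
```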

Proposition 9.2.2 If (m,n) = 1, then τ(mn) = τ(m)τ(n).

This statement is false if $(m, n) > 1$; for example, $\tau(8) = 4 \neq 6 = \tau(2)\tau(4)$.

Proof: We give two sketches, left as exercises.

1. The assertion follows from the product formula for $\tau$ found above.

2. When $(m, n) = 1$, divisors $d$ of $mn$ are in one-to-one correspondence with pairs of integers $(b, c)$ with $b \mid m$, $c \mid n$ and $bc = d$.

Definition: An arithmetic function $f : \mathbb{N} \to \mathbb{C}$ which is not identically zero is called multiplicative if, whenever $(m, n) = 1$, we have $f(mn) = f(m)f(n)$.

Proposition 9.2.2 shows that $\tau(n)$ is multiplicative, and from previous work we know that $\phi(n)$ is also multiplicative. Indeed, we used this property to prove the formula

$$\phi(n) = n\prod_{p \mid n}\left(1 - \frac{1}{p}\right).$$

A similar example is given by the function

$$\sigma_f(n) = \#\{x \bmod n : f(x) \equiv 0 \pmod n\},$$

where $f(X) \in \mathbb{Z}[X]$. The Chinese remainder theorem tells us that $\sigma_f(n)$ is multiplicative, and indeed we observe that

$$\phi(n) = \sigma_{X^{\phi(n)} - 1}(n).$$

Properties of multiplicative functions: Suppose $f$ is a multiplicative function.

• For every $n$, we have the formula

$$f(n) = \prod_{p^\alpha \,\|\, n} f(p^\alpha).$$

In particular, $f$ is determined by its values on prime powers. Conversely, any set map

$$f : \{p^k : p \text{ prime}, k \in \mathbb{N}\} \to \mathbb{C}$$

induces (by setting $f(1) = 1$) a multiplicative function.

• $f(1) = 1$. Indeed, since there must be some $n$ with $f(n) \neq 0$, we have $f(n) = f(1 \cdot n) = f(1)f(n)$, and dividing by $f(n)$ gives $f(1) = 1$.

Definition: If an arithmetic function $f$, not identically zero, satisfies $f(mn) = f(m)f(n)$ for every pair of numbers $m, n$, then $f$ is said to be totally multiplicative (or completely multiplicative).

Clearly, any totally multiplicative function is also multiplicative.

Example: For any $\lambda \in \mathbb{R}$, the function $f_\lambda(n) = n^\lambda$ is totally multiplicative. In particular, when $\lambda = 0$ we have $f_0(n) = 1$ for all $n$, and for $\lambda = 1$ we have $f_1(n) = \operatorname{id}(n) = n$ for every $n$.

Example: The iota function $\iota(n)$, defined by

$$\iota(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n \neq 1, \end{cases}$$

is totally multiplicative.

Example: Let $f(n) = (-1)^{n-1}$, so that $f(n) = 1$ if $n$ is odd and $-1$ if $n$ is even. Then $f$ is not totally multiplicative, as for example

$$f(8) = -1 \neq 1 = f(2)f(4);$$

however, $f(n)$ is multiplicative, and indeed $f$ is induced by the map (for $\alpha \ge 1$)

$$f(p^\alpha) = \begin{cases} 1 & \text{if } p \text{ is odd}, \\ -1 & \text{if } p = 2. \end{cases}$$

Example: The function $f(n) = (-1)^n$ is not multiplicative, since $f(1) = -1 \neq 1$, and so in particular is not totally multiplicative.

Theorem 9.2.3 (Theorem 4.4, Niven) Let $f(n)$ be a multiplicative function and let

$$F(n) = \sum_{d \mid n} f(d).$$

Then $F(n)$ is also multiplicative.

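Proof: Suppose $(m, n) = 1$. As alluded to in the proof of proposition 9.2.2, divisors $d$ of $mn$ are then in one-to-one correspondence with ordered pairs $(b, c)$ with $bc = d$, $b \mid m$, $c \mid n$, and any such $b$ and $c$ are coprime. Thus we have

$$F(mn) = \sum_{d \mid mn} f(d) = \sum_{b \mid m}\sum_{c \mid n} f(bc) = \sum_{b \mid m}\sum_{c \mid n} f(b)f(c) = \left(\sum_{b \mid m} f(b)\right)\left(\sum_{c \mid n} f(c)\right) = F(m)F(n),$$

and we are done.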

Example: Let $f(n) = n^0 = 1$. Then

$$F(n) = \sum_{d \mid n} f(d) = \tau(n),$$

giving another proof of the fact that $\tau$ is multiplicative. Note that $f$ is totally multiplicative, while $F(n)$ is not.


9.3 Lecture Twenty-Five


Motivating questions:

• Is the converse of theorem 9.2.3 true? That is, if $F(n) = \sum_{d \mid n} f(d)$ is multiplicative, must $f(n)$ also be multiplicative?

• Given F (n), how can we get information about f(n)?

Remark: Given any arithmetic function $F$, there is exactly one function $f$ such that $F(n) = \sum_{d \mid n} f(d)$. Indeed, we must have $f(1) = F(1)$, and we recursively define the other values via

$$f(n) = F(n) - \sum_{d \mid n,\ d < n} f(d).$$

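Example: We find the function $f(n)$ satisfying

$$\sum_{d \mid n} f(d) = \iota(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases}$$

We calculate the first couple of values, writing $F = \iota$:

$$f(1) = F(1) = 1, \qquad f(2) = F(2) - f(1) = 0 - 1 = -1.$$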

Similarly, for any prime $p$ we have

$$f(p) = F(p) - f(1) = -1, \qquad f(p^2) = F(p^2) - f(p) - f(1) = 0,$$

and indeed $f(p^k) = 0$ for $k > 1$. For composite numbers of the form $pq$ where $p, q$ are distinct primes, we have

$$f(pq) = F(pq) - f(p) - f(q) - f(1) = 0 - (-1) - (-1) - 1 = 1 = f(p)f(q),$$

while for $n = p^2q$ we have

$$f(p^2q) = F(p^2q) - f(p) - f(p^2) - f(q) - f(pq) - f(1) = 0 = f(p^2)f(q).$$

The above calculations suggest that f is multiplicative, which motivates the following definition.

Definition: The Mobius function $\mu(n)$ is the multiplicative function satisfying, for every prime $p$ and every $\alpha \ge 1$,

$$\mu(p^\alpha) = \begin{cases} -1 & \text{if } \alpha = 1, \\ 0 & \text{if } \alpha > 1. \end{cases}$$

Equivalently: if $n$ is not squarefree, then $\mu(n) = 0$. Otherwise, writing $n = p_1p_2\cdots p_k$ with the $p_j$ distinct primes, one has $\mu(n) = (-1)^k$.

Notation: Denote by $\omega(n)$ the number of distinct prime divisors of $n$, and by $\Omega(n)$ the number of prime factors of $n$ counted with multiplicity.

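For example, with $n = 720 = 2^4 \cdot 3^2 \cdot 5$, we have $\omega(n) = 3$ and $\Omega(n) = 4 + 2 + 1 = 7$. With this notation, we may define

$$\mu(n) = \begin{cases} (-1)^{\omega(n)} & \text{if } n \text{ is squarefree}, \\ 0 & \text{otherwise}. \end{cases}$$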


Theorem 9.3.1 (Theorem 4.7, Niven) One has

$$\sum_{d \mid n}\mu(d) = \iota(n).$$

This theorem is much more widely invoked than is the definition of µ(n).

Proof : We give two proofs.

1. Both sides of the equation are multiplicative by theorem 9.2.3, and we already know that both sides agree when $n$ is a prime power, from which we deduce the result.

2. By definition,

$$\sum_{d \mid n}\mu(d) = \sum_{\substack{d \mid n \\ d \text{ squarefree}}}(-1)^{\omega(d)},$$

and if $\omega(n) = k$ then there are exactly $\binom{k}{j}$ squarefree divisors $d$ of $n$ with $\omega(d) = j$. Thus

$$\sum_{d \mid n}\mu(d) = \sum_{j=0}^{k}\binom{k}{j}(-1)^j = (1 - 1)^k = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1, \end{cases}$$

and we are done.

Theorem 9.3.2 (Theorem 4.8, Niven; the Mobius inversion formula) Let $f(n)$ be an arithmetic function and let $F(n) = \sum_{d \mid n} f(d)$. Then

$$f(n) = \sum_{d \mid n}\mu(d)F\left(\frac{n}{d}\right).$$

For example, for any arithmetic function $f(n)$, we have $f(12) = F(12) - F(6) - F(4) + F(2)$.

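Proof: The right-hand side of the equation is

$$\sum_{d \mid n}\mu(d)F\left(\frac{n}{d}\right) = \sum_{d \mid n}\mu(d)\sum_{\delta \mid \frac{n}{d}} f(\delta) = \sum_{d\delta \mid n}\mu(d)f(\delta) = \sum_{\delta \mid n} f(\delta)\sum_{d \mid \frac{n}{\delta}}\mu(d) = \sum_{\delta \mid n} f(\delta)\,\iota\left(\frac{n}{\delta}\right) = f(n),$$

where we have used theorem 9.3.1, and the result follows.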


10 Week Ten

10.1 Lecture Twenty-Six

Recall: The Mobius inversion formula.

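Example: We have proven the identity

$$n = \operatorname{id}(n) = \sum_{d \mid n}\phi(d),$$

and so Mobius inversion implies that

$$\phi(n) = \sum_{d \mid n}\mu(d)\operatorname{id}\left(\frac{n}{d}\right) = \sum_{d \mid n}\mu(d)\frac{n}{d};$$

that is,

$$\frac{\phi(n)}{n} = \sum_{d \mid n}\frac{\mu(d)}{d}.$$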

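Note that $\frac{\mu(d)}{d}$ is multiplicative, thus by theorem 9.2.3 we know that $\frac{\phi(n)}{n}$ is multiplicative. Indeed, checking on prime powers, we see for $\alpha \ge 1$ that

$$\frac{\phi(p^\alpha)}{p^\alpha} = \frac{p^{\alpha-1}(p-1)}{p^\alpha} = \frac{p-1}{p} = 1 - \frac{1}{p},$$

and similarly

$$\sum_{d \mid p^\alpha}\frac{\mu(d)}{d} = \frac{\mu(1)}{1} + \frac{\mu(p)}{p} + \frac{\mu(p^2)}{p^2} + \cdots + \frac{\mu(p^\alpha)}{p^\alpha} = 1 + \frac{-1}{p} + 0 + \cdots + 0 = 1 - \frac{1}{p}.$$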

Theorem 10.1.1 (Theorem 4.9, Niven) Let $F(n)$ be an arithmetic function and define

$$f(n) = \sum_{d \mid n}\mu(d)F\left(\frac{n}{d}\right).$$

Then

$$F(n) = \sum_{d \mid n} f(d).$$

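Proof: We have

$$\sum_{d \mid n} f(d) = \sum_{d \mid n}\sum_{\delta \mid d}\mu(\delta)F\left(\frac{d}{\delta}\right).$$

With $d$ fixed, as $\delta$ ranges over the divisors of $d$, so does $\frac{d}{\delta}$. Thus, interchanging the order of summation,

$$\sum_{d \mid n} f(d) = \sum_{d \mid n}\sum_{\delta \mid d}\mu\left(\frac{d}{\delta}\right)F(\delta) = \sum_{\delta \mid n}\sum_{\substack{d \mid n \\ \delta \mid d}}\mu\left(\frac{d}{\delta}\right)F(\delta).$$

Writing $d = \delta\left(\frac{d}{\delta}\right)$, we have

$$\sum_{d \mid n} f(d) = \sum_{\delta \mid n}F(\delta)\sum_{\frac{d}{\delta} \mid \frac{n}{\delta}}\mu\left(\frac{d}{\delta}\right) = \sum_{\delta \mid n}F(\delta)\,\iota\left(\frac{n}{\delta}\right) = F(n),$$

and we are done.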

Definition: Let $f(n), g(n)$ be two arithmetic functions. Their Dirichlet convolution, denoted $f * g$, is defined by

$$(f * g)(n) = \sum_{d \mid n} f(d)g\left(\frac{n}{d}\right).$$

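Note that Dirichlet convolution is commutative, as (replacing $d$ by $\frac{n}{d}$)

$$(g * f)(n) = \sum_{d \mid n} g(d)f\left(\frac{n}{d}\right) = \sum_{d \mid n} g\left(\frac{n}{d}\right)f(d) = (f * g)(n).$$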

Example: If $g(n) = 1$ for every $n$, then

$$(f * g)(n) = \sum_{d \mid n} f(d).$$

(The function $g$ is sometimes written $1$.) In particular, this means that $\operatorname{id} = \phi * 1$, $\iota = \mu * 1$, and $\tau = 1 * 1$.

With this notation, we may restate the Mobius inversion formula as: F = f ∗ 1 if and only if f = F ∗ µ.
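A direct implementation of Dirichlet convolution makes these identities easy to test. The Python sketch below (names illustrative) computes $(f * g)(n)$ for all $n \le N$ by looping over pairs $(d, m)$ with $dm \le N$, which avoids factoring each $n$; `mobius` and `phi` are simple brute-force helpers written for this sketch.

```python
from math import gcd


def dirichlet(f, g, N: int) -> list[int]:
    """Return [(f * g)(n) for n = 0..N] (index 0 unused), summing f(d) g(m) over d m = n."""
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):
            h[d * m] += f(d) * g(m)
    return h


def mobius(n: int) -> int:
    """mu(n) by trial division: 0 if a prime square divides n, else (-1)^(number of primes)."""
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign


def phi(n: int) -> int:
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def one(n: int) -> int:
    return 1


N = 60
print(dirichlet(phi, one, N)[1:] == list(range(1, N + 1)))    # id = phi * 1  -> True
print(dirichlet(mobius, one, N)[1:] == [1] + [0] * (N - 1))   # iota = mu * 1 -> True
```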

Theorem 10.1.2 If f and g are multiplicative functions, then f ∗ g is multiplicative.

Note that this theorem is a generalization of theorem 9.2.3.

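Proof: If $(m, n) = 1$, then

$$(f * g)(mn) = \sum_{d \mid mn} f(d)g\left(\frac{mn}{d}\right).$$

For each divisor $d$ of $mn$, we may uniquely factor $d = d_1d_2$ with $d_1 \mid m$ and $d_2 \mid n$; since $(m, n) = 1$ we also have $(d_1, d_2) = 1$ and $\left(\frac{m}{d_1}, \frac{n}{d_2}\right) = 1$. Thus

$$(f * g)(mn) = \sum_{d_1 \mid m}\sum_{d_2 \mid n} f(d_1d_2)\,g\left(\frac{mn}{d_1d_2}\right) = \sum_{d_1 \mid m}\sum_{d_2 \mid n} f(d_1)g\left(\frac{m}{d_1}\right)f(d_2)g\left(\frac{n}{d_2}\right)$$

$$= \left(\sum_{d_1 \mid m} f(d_1)g\left(\frac{m}{d_1}\right)\right)\left(\sum_{d_2 \mid n} f(d_2)g\left(\frac{n}{d_2}\right)\right) = (f * g)(m)(f * g)(n),$$

as claimed.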

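[Structural remarks: Let $\mathcal{A} = \{f : \mathbb{N} \to \mathbb{C}\}$ be the set of arithmetic functions and let $\mathcal{A}^\times = \{f \in \mathcal{A} : f(1) \neq 0\}$; then $(\mathcal{A}^\times, *)$ forms an abelian group. In this group, $\iota$ is the identity and $1^{-1} = \mu$, which yields yet another statement of the Mobius inversion formula:

$$F = f * 1 \iff \mu * F = \mu * (f * 1) = f * (\mu * 1) = f * \iota = f.$$

Moreover, the set of multiplicative functions forms a subgroup: it is closed under $*$ by theorem 10.1.2, and one can check that it is also closed under taking inverses.]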

Example: Let

$$s(n) = \begin{cases} 1 & \text{if } n \text{ is a perfect square}, \\ 0 & \text{otherwise}; \end{cases}$$

we will identify $s * (\mu^2)$.


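Note that $s$ is multiplicative, and is characterized by

$$s(p^\alpha) = \begin{cases} 1 & \text{if } 2 \mid \alpha, \\ 0 & \text{if } 2 \nmid \alpha. \end{cases}$$

Moreover, $\mu^2$ is multiplicative, as the product of two multiplicative functions; hence $f = s * (\mu^2)$ is also multiplicative. We compute:

$$f(p^\alpha) = \sum_{d \mid p^\alpha} s\left(\frac{p^\alpha}{d}\right)\mu^2(d) = s(p^\alpha)\mu^2(1) + s(p^{\alpha-1})\mu^2(p) + \cdots + s(1)\mu^2(p^\alpha) = s(p^\alpha) + s(p^{\alpha-1}) = 1,$$

since exactly one of $\alpha$ and $\alpha - 1$ is even. So $f(p^\alpha) = 1$ for every $\alpha \ge 1$, and it follows that $s * (\mu^2) = 1$.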

Note that $\mu^2$ is the characteristic function of the squarefree numbers, and indeed we see

$$(s * \mu^2)(n) = \sum_{ab = n} s(a)\mu^2(b) = \#\{(a, b) \in \mathbb{N}^2 : ab = n,\ a = m^2 \text{ for some } m,\ b \text{ squarefree}\} = 1.$$

Thus there is a unique way to factor any $n \in \mathbb{N}$ as $n = n'm^2$ where $n'$ is squarefree. For example, if $n = 2 \cdot 3^2 \cdot 5^3 \cdot 7^4$, we have $n = (2 \cdot 5)(3 \cdot 5 \cdot 7^2)^2$.
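The decomposition $n = n'm^2$ can be read off from the prime factorization by splitting each exponent as $\alpha = 2\lfloor\alpha/2\rfloor + (\alpha \bmod 2)$. A small Python sketch (the name is hypothetical), again using trial division:

```python
def squarefree_times_square(n: int) -> tuple[int, int]:
    """Return (n1, m) with n = n1 * m^2 and n1 squarefree."""
    n1, m, p = 1, 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        n1 *= p ** (exponent % 2)     # odd part of the exponent goes to n1
        m *= p ** (exponent // 2)     # even part goes to the square
        p += 1
    return n1 * n, m                  # any leftover n > 1 is a prime to the first power


print(squarefree_times_square(2 * 3**2 * 5**3 * 7**4))  # -> (10, 735), i.e. (2*5, 3*5*7^2)
```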


10.2 Lecture Twenty-Seven

Properties of Mobius inversion:

• We do not assume multiplicativity of the functions; that is, the inversion formula holds for any arithmetic functions.

• If $F(n) = \sum_{d \mid n} f(d)$ and $F(n)$ is multiplicative, then so is $f(n)$, as $f = F * \mu$ (by theorem 10.1.2).

Recall: Dirichlet convolution.

When $n = p^\alpha$ is a prime power, then

$$(f * g)(p^\alpha) = \sum_{d \mid p^\alpha} f(d)g\left(\frac{p^\alpha}{d}\right) = f(1)g(p^\alpha) + f(p)g(p^{\alpha-1}) + \cdots + f(p^\alpha)g(1).$$

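Let us assign names to these values, so that $f(1) = a_0$, $f(p) = a_1$, $f(p^2) = a_2, \ldots$, and similarly $g(1) = b_0$, $g(p) = b_1$, $g(p^2) = b_2, \ldots$. We obtain the following table:

| $\alpha$ | $f(p^\alpha)$ | $g(p^\alpha)$ | $(f * g)(p^\alpha)$ |
|---|---|---|---|
| 0 | $a_0$ | $b_0$ | $a_0b_0$ |
| 1 | $a_1$ | $b_1$ | $a_0b_1 + a_1b_0$ |
| 2 | $a_2$ | $b_2$ | $a_0b_2 + a_1b_1 + a_2b_0$ |
| 3 | $a_3$ | $b_3$ | $a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0$ |

We observe the similarity with the coefficients of the product of power series:

$$\left(\sum_{\alpha=0}^{\infty} f(p^\alpha)X^\alpha\right)\left(\sum_{\alpha=0}^{\infty} g(p^\alpha)X^\alpha\right) = \sum_{\alpha=0}^{\infty}(f * g)(p^\alpha)X^\alpha.$$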

Example: Find an arithmetic function $f$ such that

$$\frac{\phi(n)}{n} = \sum_{d \mid n} f(d),$$

forgetting that we found it in the previous lecture.

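Let $F(n) = \frac{\phi(n)}{n}$, so that $F = f * 1$. By Mobius inversion we know that $f = F * \mu$ and that $f$ is multiplicative, since $F$ is. Thus we have a table as before:

| $\alpha$ | $F(p^\alpha)$ | $\mu(p^\alpha)$ | $f(p^\alpha)$ |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | $1 - \frac{1}{p}$ | $-1$ | $-\frac{1}{p}$ |
| 2 | $1 - \frac{1}{p}$ | 0 | 0 |
| 3 | $1 - \frac{1}{p}$ | 0 | 0 |

We see that $f$ is the multiplicative function generated by

$$f(p^\alpha) = \begin{cases} -\frac{1}{p} & \text{if } \alpha = 1, \\ 0 & \text{if } \alpha > 1. \end{cases}$$

That is, $f(n) = \frac{\mu(n)}{n}$, as before.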


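Example: Define a multiplicative function $r$ via (for $\alpha \ge 1$)

$$r(p^\alpha) = \begin{cases} 2 & \text{if } p \equiv 1 \pmod 4, \\ 0 & \text{if } p \equiv 3 \pmod 4, \\ 1 & \text{if } p = 2 \text{ and } \alpha = 1, \\ 0 & \text{if } p = 2 \text{ and } \alpha > 1. \end{cases}$$

Now, define $R = r * s$, where $s$ is the indicator function of the perfect squares from lecture twenty-six; note that $R$ is multiplicative. Determine the values of $R(p^\alpha)$.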

[Aside: Theorem 3.2.2 of Niven tells us that the number of proper representations of n by the binary quadraticform X2 + Y 2 equals 4r(n). In the statement of theorem 6.3.3 originally given, there was an error, in that weforgot the necessary condition that 4 - n.

Note also that any representation $x^2 + y^2 = n$ corresponds to a proper representation $\left(\frac{x}{d}\right)^2 + \left(\frac{y}{d}\right)^2 = \frac{n}{d^2}$, where $d = (x, y)$. Thus if $S_n$ denotes the set of representations of $n$ by $X^2 + Y^2$, and $S^p_n \subset S_n$ denotes the subset of proper representations, then

$$\#S_n = \sum_{g^2|n} \#S^p_{n/g^2} = \sum_{g^2|n} 4r\!\left(\frac{n}{g^2}\right) = 4\sum_{d|n} r\!\left(\frac{n}{d}\right)s(d) = 4(r*s)(n) = 4R(n).$$

Note in particular that Niven’s functions R and r correspond to our 4R and 4r, respectively.]

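First, we assume that $p \equiv 1 \bmod 4$. We get the table

$$\begin{array}{c|c|c|c}
\alpha & r(p^\alpha) & s(p^\alpha) & R(p^\alpha) \\ \hline
0 & 1 & 1 & 1 \\
1 & 2 & 0 & 2 \\
2 & 2 & 1 & 3 \\
3 & 2 & 0 & 4 \\
4 & 2 & 1 & 5 \\
5 & 2 & 0 & 6
\end{array}$$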

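In fact, we can prove that $R(p^\alpha) = \alpha + 1$ for any $p \equiv 1 \bmod 4$: if $\alpha$ is even then

$$R(p^\alpha) = \sum_{j=0}^{\alpha} r(p^j)s(p^{\alpha-j}) = r(1)s(p^\alpha) + \sum_{\substack{1 \le j \le \alpha \\ j \text{ even}}} r(p^j) = 1 + \sum_{\substack{1 \le j \le \alpha \\ j \text{ even}}} 2 = 1 + 2\left(\frac{\alpha}{2}\right) = \alpha + 1,$$

since $s(p^{\alpha-j}) = 1$ exactly when $\alpha - j$ is even, i.e. when $j$ is even.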

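A similar proof works for $\alpha$ odd, and is left as an exercise. Now, suppose $p \equiv 3 \bmod 4$; we obtain

$$\begin{array}{c|c|c|c}
\alpha & r(p^\alpha) & s(p^\alpha) & R(p^\alpha) \\ \hline
0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 \\
2 & 0 & 1 & 1 \\
3 & 0 & 0 & 0 \\
4 & 0 & 1 & 1 \\
5 & 0 & 0 & 0
\end{array}$$

On these prime powers, $r$ acts like the identity for Dirichlet convolution (the function that is $1$ at $1$ and $0$ elsewhere), so on powers of primes congruent to $3$ modulo $4$, $R = r * s$ simply agrees with $s$. Finally, suppose $p = 2$; the table this time is

$$\begin{array}{c|c|c|c}
\alpha & r(2^\alpha) & s(2^\alpha) & R(2^\alpha) \\ \hline
0 & 1 & 1 & 1 \\
1 & 1 & 0 & 1 \\
2 & 0 & 1 & 1 \\
3 & 0 & 0 & 1 \\
4 & 0 & 1 & 1 \\
5 & 0 & 0 & 1
\end{array}$$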

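On these prime powers, $r$ acts like $\mu^2$, so $R$ acts like $\mu^2 * s = 1$ (the constant function). Thus we conclude that $R$ is the multiplicative function generated by

$$R(p^\alpha) = \begin{cases} \alpha + 1 & \text{if } p \equiv 1 \bmod 4, \\ 1 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is even}, \\ 0 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is odd}, \\ 1 & \text{if } p = 2. \end{cases}$$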

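One consequence of this fact is that either $R(n) = 0$, or

$$R(n) = \#\{d : d \mid n \text{ and } (p \mid d \Rightarrow p \equiv 1 \bmod 4)\},$$

the number of divisors of $n$ all of whose prime factors are congruent to $1$ modulo $4$.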
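The aside's identity $\#S_n = 4R(n)$ and the prime-power tables can be cross-checked by brute force. This Python sketch assumes $S_n$ counts ordered pairs $(x, y) \in \mathbb{Z}^2$, signs included; it builds $r$ from its prime-power values and forms $R = r * s$ as a sum over square divisors. The function names are hypothetical.

```python
from math import isqrt

def factorize(n):
    """Prime factorization of n as {p: alpha}, by trial division."""
    fac, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            fac[p] = fac.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac

def r(n):
    """The multiplicative function r of the example, from its values on prime powers."""
    value = 1
    for p, alpha in factorize(n).items():
        if p == 2:
            value *= 1 if alpha == 1 else 0
        elif p % 4 == 1:
            value *= 2
        else:                           # p = 3 mod 4
            value *= 0
    return value

def R(n):
    """R = r * s, i.e. the sum of r(n / m^2) over all m with m^2 | n."""
    return sum(r(n // (m * m)) for m in range(1, isqrt(n) + 1) if n % (m * m) == 0)

def count_representations(n):
    """#S_n: the number of (x, y) in Z^2 with x^2 + y^2 = n, by brute force."""
    b = isqrt(n)
    return sum(1 for x in range(-b, b + 1) for y in range(-b, b + 1) if x * x + y * y == n)

assert all(count_representations(n) == 4 * R(n) for n in range(1, 400))
print([R(5 ** a) for a in range(5)], [R(3 ** a) for a in range(5)], [R(2 ** a) for a in range(5)])
# prints [1, 2, 3, 4, 5] [1, 0, 1, 0, 1] [1, 1, 1, 1, 1]
```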


10.3 Lecture Twenty-Eight

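Example: Let $R(n)$ be the multiplicative function from the last lecture, generated by

$$R(p^\alpha) = \begin{cases} \alpha + 1 & \text{if } p \equiv 1 \bmod 4, \\ 1 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is even}, \\ 0 & \text{if } p \equiv 3 \bmod 4 \text{ and } \alpha \text{ is odd}, \\ 1 & \text{if } p = 2. \end{cases}$$

Find a function $g$ such that $R(n) = \sum_{d|n} g(d)$.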

nb. We defined

$$R(n) = \sum_{m^2|n} r\!\left(\frac{n}{m^2}\right) = \sum_{d|n} r\!\left(\frac{n}{d}\right)s(d).$$

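Note that, since $R = g * 1$, the Möbius inversion formula implies that $g = R * \mu$, and since $R$ and $\mu$ are both multiplicative, we know that $g$ is as well. We observe that, for $\alpha \ge 1$,

$$g(p^\alpha) = \sum_{d|p^\alpha} R\!\left(\frac{p^\alpha}{d}\right)\mu(d) = R(p^\alpha)\mu(1) + R(p^{\alpha-1})\mu(p) + \cdots + R(1)\mu(p^\alpha) = R(p^\alpha) - R(p^{\alpha-1}).$$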

Thus:

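• If $p \equiv 1 \bmod 4$ then $g(p^\alpha) = (\alpha + 1) - \alpha = 1$.

• If $p \equiv 3 \bmod 4$ then $g(p^\alpha) = \begin{cases} 1 - 0 = 1 & \text{if } \alpha \text{ is even}, \\ 0 - 1 = -1 & \text{if } \alpha \text{ is odd}. \end{cases}$

• If $p = 2$ then $g(p^\alpha) = 1 - 1 = 0$.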

Remarks:

• Since $g(p^\alpha) = g(p)^\alpha$ for every prime $p$ and positive integer $\alpha$, it follows that $g$ is totally multiplicative.

• On odd primes, $g(p)$ equals the Legendre symbol $\left(\frac{-1}{p}\right)$, and hence on odd $n$, $g(n)$ equals the Jacobi symbol $\left(\frac{-1}{n}\right)$. Thus, for odd $n$, $g(n) = (-1)^{(n-1)/2}$, while $g(n) = 0$ for even $n$.

Consequently,

$$R(n) = \sum_{d|n} g(d) = \#\{d \mid n : d \equiv 1 \bmod 4\} - \#\{d \mid n : d \equiv 3 \bmod 4\}.$$
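As a sanity check of the last formula, the sketch below compares three descriptions of $R(n)$: the prime-power rule, $\sum_{d|n} g(d)$, and the difference of divisor counts. It also tests total multiplicativity of $g$. This is a brute-force illustration only; the names are hypothetical.

```python
def factorize(n):
    """Prime factorization {p: alpha} by trial division."""
    fac, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            fac[p] = fac.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac

def R_prime_powers(n):
    """R(n) from R(p^a) = a + 1 (p = 1 mod 4), [a even] (p = 3 mod 4), 1 (p = 2)."""
    out = 1
    for p, a in factorize(n).items():
        if p % 4 == 1:
            out *= a + 1
        elif p % 4 == 3:
            out *= 1 if a % 2 == 0 else 0
    return out

def g(n):
    """g(n) = 0 for even n, and (-1)^((n-1)/2) for odd n."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

def R_divisor_sum(n):
    return sum(g(d) for d in range(1, n + 1) if n % d == 0)

def R_divisor_count(n):
    divs = [d for d in range(1, n + 1) if n % d == 0]
    return sum(d % 4 == 1 for d in divs) - sum(d % 4 == 3 for d in divs)

for n in range(1, 500):
    assert R_prime_powers(n) == R_divisor_sum(n) == R_divisor_count(n)
```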

Some miscellany: Recall that $\sigma(n) = \sum_{d|n} d = 1 * \mathrm{id}$. The Greeks defined a perfect number to be a number $n$ whose proper divisors sum to $n$ itself; that is, a number satisfying

$$n = \sigma(n) - n \iff \sigma(n) = 2n.$$

For example, 6 is perfect, as $6 = 1 + 2 + 3$, as is $28 = 1 + 2 + 4 + 7 + 14$. The next perfect number is 496, then 8128. Note that $\sigma(n)$ is multiplicative, and that

$$\sigma(p^\alpha) = 1 + p + p^2 + \cdots + p^\alpha = \frac{p^{\alpha+1} - 1}{p - 1}.$$

We see equivalently that $n$ is a perfect number if and only if

$$2 = \frac{\sigma(n)}{n} = \prod_{p^\alpha \| n} \frac{p^{\alpha+1} - 1}{p^\alpha(p - 1)}.$$

Let us factor the first three perfect numbers:

$$6 = 2 \cdot 3 = 2^1(2^2 - 1), \qquad 28 = 2^2 \cdot 7 = 2^2(2^3 - 1), \qquad 496 = 2^4 \cdot 31 = 2^4(2^5 - 1).$$

This motivates our next result.

Theorem 10.3.1 If $q = 2^p - 1$ is prime, then $n = 2^{p-1}q$ is a perfect number.

Recall from a homework problem that if $2^k - 1$ is prime, then $k$ must be prime, although this is not a sufficient condition, as e.g. $2^{11} - 1 = 2047 = 23 \cdot 89$.

Proof : We give two.

(1) By multiplicativity,

$$\sigma(2^{p-1}q) = \sigma(2^{p-1})\sigma(q) = (2^p - 1)(q + 1) = 2^p(2^p - 1) = 2(2^{p-1})(2^p - 1) = 2(2^{p-1}q),$$

and we are done.

(2) We simply verify that the divisors of $2^{p-1}q$, namely

$$1, 2, 2^2, \ldots, 2^{p-1}, q, 2q, 2^2q, \ldots, 2^{p-1}q,$$

sum to $2(2^{p-1}q)$.

We know exactly 48 numbers of this form (as of 2014), and note that all such numbers are even by construction. The following theorem gives the converse statement.

Theorem 10.3.2 If $n$ is an even perfect number, then $n = 2^{p-1}(2^p - 1)$, where both $p$ and $2^p - 1$ are prime.

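Proof : Write $n = 2^{k-1}m$ where $k \ge 2$ and $m$ is odd. If $n$ is perfect, then

$$2^k m = 2n = \sigma(n) = \sigma(2^{k-1})\sigma(m) = (2^k - 1)\sigma(m).$$

Hence $(2^k - 1) \mid 2^k m$, and since $2^k - 1$ is odd, Euclid's lemma gives $(2^k - 1) \mid m$. Writing $m = (2^k - 1)l$, we have $2^k l = \sigma(m)$; but $l$ and $m$ are both divisors of $m$, so

$$\sigma(m) \ge m + l = (2^k - 1)l + l = 2^k l.$$

Thus we have the equality

$$\sigma(m) = \frac{2^k m}{2^k - 1} = 2^k l = (2^k - 1)l + l = m + l,$$

so $m$ has exactly two divisors $m$ and $l$, which are distinct because $k \ge 2$, and we must have $l = 1$. It follows that $m = 2^k - 1$ is prime; by the homework problem recalled above, $k$ is then prime as well, and taking $p = k$ gives the claim.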
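Theorems 10.3.1 and 10.3.2 say that the even perfect numbers are exactly the numbers $2^{p-1}(2^p - 1)$ with $2^p - 1$ prime. The following sketch generates these for small $p$ (trial division only, so it is not meant for large exponents) and checks $\sigma(n) = 2n$ with the multiplicative formula for $\sigma$; the names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test (adequate for the small Mersenne numbers used here)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def sigma(n):
    """Sum of divisors, using multiplicativity and sigma(p^a) = (p^(a+1) - 1)/(p - 1)."""
    total, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            a = 0
            while n % p == 0:
                n //= p
                a += 1
            total *= (p ** (a + 1) - 1) // (p - 1)
        p += 1
    if n > 1:                       # one prime factor larger than sqrt(n) remains
        total *= n + 1
    return total

def euclid_perfect_numbers(max_p):
    """2^(p-1)(2^p - 1) for every p <= max_p with 2^p - 1 prime (theorem 10.3.1)."""
    return [2 ** (p - 1) * (2 ** p - 1) for p in range(2, max_p + 1) if is_prime(2 ** p - 1)]

perfect = euclid_perfect_numbers(20)
print(perfect)   # [6, 28, 496, 8128, 33550336, 8589869056, 137438691328]
assert all(sigma(n) == 2 * n for n in perfect)
# Theorem 10.3.2 in miniature: no other even perfect numbers below 10^4.
assert [n for n in range(2, 10 ** 4, 2) if sigma(n) == 2 * n] == perfect[:4]
```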

Some open conjectures:

1. There are infinitely many Mersenne primes (that is, primes of the form $2^p - 1$ with $p$ prime), and hence infinitely many even perfect numbers.

2. There are no odd perfect numbers.


11 Week Eleven

11.1 Lecture Twenty-Nine

Diophantine approximation is the technique of finding rational numbers near given real numbers. One fundamental fact of Diophantine approximation that we will use frequently is that, if $n \in \mathbb{Z}$ and $n \ne 0$, then $|n| \ge 1$.

Example: Define

$$e = \sum_{n=0}^{\infty} \frac{1}{n!};$$

we will prove that $e$ is irrational. Indeed, assume not, and choose $a, b \in \mathbb{Z}$, $b > 0$ such that $e = \frac{a}{b}$. Then $be \in \mathbb{Z}$ and so in particular $b!e \in \mathbb{Z}$. Thus we define

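$$m = b!e - \sum_{n=0}^{b} \frac{b!}{n!} = b!\sum_{n=b+1}^{\infty} \frac{1}{n!} \in \mathbb{Z}.$$

Clearly $m > 0$, and moreover in the last sum we see that every term is at most half the previous term, thus

$$m = b!\sum_{n=b+1}^{\infty} \frac{1}{n!} < b!\sum_{n=b+1}^{\infty} \frac{1}{(b+1)!}\cdot\frac{1}{2^{n-(b+1)}} = \frac{2\,b!}{(b+1)!} = \frac{2}{b+1} \le 1.$$

That is, $m \in \mathbb{Z}$ and $0 < m < 1$, which is a contradiction. Thus $e \notin \mathbb{Q}$.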

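Lemma 11.1.1 If $\frac{a}{b}, \frac{c}{d}$ are distinct rational numbers, then $\left|\frac{a}{b} - \frac{c}{d}\right| \ge \frac{1}{|bd|}$.

Proof : This follows from the basic rules of arithmetic, since $ad - bc$ is a nonzero integer:

$$\left|\frac{a}{b} - \frac{c}{d}\right| = \left|\frac{ad - bc}{bd}\right| \ge \frac{1}{|bd|}.$$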

Theorem 11.1.2 (Theorem 6.8, Niven; Dirichlet's theorem on Diophantine approximation) Let $x \in \mathbb{R}$, $n \in \mathbb{N}$. Then there exists $\frac{a}{b} \in \mathbb{Q}$ with $1 \le b \le n$ and $\left|x - \frac{a}{b}\right| \le \frac{1}{b(n+1)}$.

nb. It is slightly easier to prove the weaker bounds $\left|x - \frac{a}{b}\right| < \frac{1}{bn}$ or $\frac{1}{b(n-1)}$, but the inequality in the theorem statement is the best possible result; indeed, we attain equality with $x = \frac{c}{n+1}$, $(c, n+1) = 1$.

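Proof : Define the fractional part of $y$ to be $\{y\} = y - \lfloor y \rfloor \in [0, 1)$. Consider the $n$ real numbers $\{x\}, \{2x\}, \ldots, \{nx\}$ and the $n + 1$ subintervals

$$\left[0, \frac{1}{n+1}\right), \left[\frac{1}{n+1}, \frac{2}{n+1}\right), \ldots, \left[\frac{n}{n+1}, 1\right),$$

whose disjoint union is $[0, 1)$. If some $\{jx\} \in \left[0, \frac{1}{n+1}\right)$, then let $\frac{a}{b} = \frac{\lfloor jx \rfloor}{j}$; we have

$$\left|x - \frac{a}{b}\right| = \left|\frac{jx}{j} - \frac{\lfloor jx \rfloor}{j}\right| = \left|\frac{\{jx\}}{j}\right| < \frac{1}{j(n+1)} = \frac{1}{b(n+1)}.$$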


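Similarly, if some $\{jx\} \in \left[\frac{n}{n+1}, 1\right)$ then we may take $\frac{a}{b} = \frac{\lfloor jx \rfloor + 1}{j}$, and we have

$$\left|\frac{a}{b} - x\right| = \left|\frac{\lfloor jx \rfloor + 1}{j} - \frac{jx}{j}\right| = \frac{1 - \{jx\}}{j} < \frac{\left(\frac{1}{n+1}\right)}{j} = \frac{1}{b(n+1)}.$$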

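Finally, if neither of these cases occurs, then the $n$ numbers $\{x\}, \ldots, \{nx\}$ lie in the remaining $n - 1$ subintervals, so by the pigeonhole principle there exists some subinterval containing $\{jx\}$ and $\{kx\}$ with $j < k$ (say), so that $|\{kx\} - \{jx\}| < \frac{1}{n+1}$. Then, with $a = \lfloor kx \rfloor - \lfloor jx \rfloor$, $b = k - j$, we have

$$\left|x - \frac{a}{b}\right| = \left|\frac{(k - j)x}{b} - \frac{\lfloor kx \rfloor - \lfloor jx \rfloor}{b}\right| = \frac{|\{kx\} - \{jx\}|}{b} < \frac{\left(\frac{1}{n+1}\right)}{b},$$

and we are done.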
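The pigeonhole proof is constructive, so it can be turned directly into a procedure. The sketch below follows its three cases; it works in floating point, an assumption that limits it to modest $n$ and ordinary double-precision $x$, and the function name is hypothetical.

```python
import math

def dirichlet_approximation(x, n):
    """Return (a, b) with 1 <= b <= n and |x - a/b| <= 1/(b(n+1)),
    following the three cases of the pigeonhole proof of theorem 11.1.2."""
    first_seen = {}                          # subinterval index -> multiplier j
    for j in range(1, n + 1):
        floor_jx = math.floor(j * x)
        frac = j * x - floor_jx              # {jx}, in [0, 1)
        box = min(int(frac * (n + 1)), n)    # {jx} lies in [box/(n+1), (box+1)/(n+1))
        if box == 0:                         # first case: a/b = floor(jx)/j
            return floor_jx, j
        if box == n:                         # second case: a/b = (floor(jx) + 1)/j
            return floor_jx + 1, j
        if box in first_seen:                # pigeonhole case: b = j - i
            i = first_seen[box]
            return floor_jx - math.floor(i * x), j - i
        first_seen[box] = j
    raise AssertionError("unreachable: n numbers in n - 1 boxes must collide")

a, b = dirichlet_approximation(math.pi, 10)
print(a, b, abs(math.pi - a / b) <= 1 / (b * 11))   # 22 7 True
```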

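Corollary 1: If $x \in \mathbb{R} \setminus \mathbb{Q}$, then there exist infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that $\left|x - \frac{a}{b}\right| < \frac{1}{b^2}$.

Proof : Theorem 11.1.2 gives, for every $n \in \mathbb{N}$, a rational number $\frac{a_n}{b_n}$ with $1 \le b_n \le n$ and

$$0 < \left|x - \frac{a_n}{b_n}\right| \le \frac{1}{b_n(n+1)} < \frac{1}{b_n^2}.$$

Since $x \notin \mathbb{Q}$, we know that $\left|x - \frac{a}{b}\right| \ne 0$ for any fixed $\frac{a}{b}$, so any given $\frac{a}{b}$ can equal only finitely many of the terms $\frac{a_n}{b_n}$, since

$$\lim_{n\to\infty} \left|x - \frac{a_n}{b_n}\right| = 0$$

(indeed $\left|x - \frac{a_n}{b_n}\right| \le \frac{1}{n+1}$). Hence infinitely many distinct fractions occur among the $\frac{a_n}{b_n}$.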

We may generalize lemma 11.1.1 as follows:

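Lemma 11.1.3 Let $p(X) \in \mathbb{Z}[X]$ have degree $d$ and let $\frac{a}{b} \in \mathbb{Q}$. If $p\left(\frac{a}{b}\right) \ne 0$, then $\left|p\left(\frac{a}{b}\right)\right| \ge \frac{1}{b^d}$.

Proof : If $p(X) = c_dX^d + c_{d-1}X^{d-1} + \cdots + c_1X + c_0$, where $c_i \in \mathbb{Z}$, $c_d \ne 0$, then

$$b^d p\left(\frac{a}{b}\right) = c_da^d + c_{d-1}a^{d-1}b + \cdots + c_1ab^{d-1} + c_0b^d \in \mathbb{Z}.$$

Hence if $p\left(\frac{a}{b}\right) \ne 0$, then $\left|b^dp\left(\frac{a}{b}\right)\right| \ge 1$, and the result is immediate.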

Definition: Let $\alpha \in \mathbb{R}$. We say that $\alpha$ is algebraic of degree $d$ if there exists an irreducible polynomial $p(X) \in \mathbb{Z}[X]$ of degree $d$ such that $p(\alpha) = 0$. If $\alpha$ is not algebraic, then $\alpha$ is said to be transcendental.

For example, $\sqrt{2}$ is algebraic of degree 2, as it is a root of $X^2 - 2$. Furthermore, $\alpha$ is algebraic of degree 1 if and only if $\alpha \in \mathbb{Q}$.

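Theorem 11.1.4 (Liouville's theorem on Diophantine approximation) Let $\alpha$ be algebraic of degree $d$. Then there exists some constant $C = C(\alpha) > 0$ such that, for any $\frac{a}{b} \in \mathbb{Q}$, $\frac{a}{b} \ne \alpha$, we have

$$\left|\alpha - \frac{a}{b}\right| \ge \frac{C(\alpha)}{b^d}.$$

Proof : By taking $C(\alpha) \le 1$ we may assume that $\frac{a}{b}$ satisfies $\left|\alpha - \frac{a}{b}\right| \le 1$ (otherwise the inequality is automatic). Choose $p(X) \in \mathbb{Z}[X]$ to be irreducible of degree $d$ and such that $p(\alpha) = 0$. Then we must have $p\left(\frac{a}{b}\right) \ne 0$ (for $d \ge 2$ an irreducible $p$ has no rational roots, and for $d = 1$ its only root is $\alpha \ne \frac{a}{b}$), and so by lemma 11.1.3, $\left|p\left(\frac{a}{b}\right)\right| \ge \frac{1}{b^d}$.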

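But

$$\left|p\left(\frac{a}{b}\right)\right| = \left|p\left(\frac{a}{b}\right) - p(\alpha)\right| = \left|\frac{a}{b} - \alpha\right| |p'(t)|,$$

for some $t$ between $\alpha$ and $\frac{a}{b}$, by the mean value theorem. Thus, taking

$$C(\alpha) = \min\left\{1, \frac{1}{\max\{|p'(t)| : t \in [\alpha - 1, \alpha + 1]\}}\right\},$$

we obtain

$$\frac{1}{b^d} \le \left|p\left(\frac{a}{b}\right)\right| = \left|\frac{a}{b} - \alpha\right| |p'(t)| \le \left|\frac{a}{b} - \alpha\right| \cdot \frac{1}{C(\alpha)},$$

and we are done.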

It was using this theorem that Liouville first demonstrated (1844) the existence of transcendental numbers. This work preceded by several decades Cantor's investigation of uncountable sets, which yields a simpler albeit non-constructive proof of the existence of transcendental numbers.


11.2 Lecture Thirty


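It is a trivial consequence of Liouville's theorem (theorem 11.1.4) that the number

$$\alpha = \sum_{n=1}^{\infty} 10^{-n!} = 0.11000100\ldots$$

is transcendental. Indeed, define

$$\frac{a_k}{b_k} = \sum_{n=1}^{k} 10^{-n!},$$

so that $b_k = 10^{k!}$ and thus

$$\left|\alpha - \frac{a_k}{b_k}\right| = \sum_{n=k+1}^{\infty} 10^{-n!}.$$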

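We note that each summand is at most half the previous one, thus

$$\left|\alpha - \frac{a_k}{b_k}\right| = \sum_{n=k+1}^{\infty} 10^{-n!} \le \sum_{n=k+1}^{\infty} 10^{-(k+1)!}\,\frac{1}{2^{n-(k+1)}} = \frac{2}{10^{(k+1)!}}.$$

If $\alpha$ were algebraic of degree $d$, then for some constant $C(\alpha) > 0$ we would have

$$\frac{C(\alpha)}{b_k^d} \le \left|\alpha - \frac{a_k}{b_k}\right| \le \frac{2}{b_k^{k+1}},$$

and thus $b_k^{k+1-d} \le \frac{2}{C(\alpha)}$. Taking $k \to \infty$ yields a contradiction, and so we see that $\alpha$ cannot be algebraic.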

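Recall: Last lecture we showed that for all $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ there are infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that $\left|\alpha - \frac{a}{b}\right| < \frac{1}{b^2}$.

Theorem 11.2.1 (Roth's theorem) If $\alpha$ is algebraic, then for any $\varepsilon > 0$ there exists some constant $C = C(\alpha, \varepsilon) > 0$ such that

$$\left|\alpha - \frac{a}{b}\right| \ge \frac{C(\alpha, \varepsilon)}{b^{2+\varepsilon}}, \quad \text{for all } \frac{a}{b} \in \mathbb{Q},\ \frac{a}{b} \ne \alpha.$$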

§6.1 – Farey sequences

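Given $n \in \mathbb{N}$, the Farey fractions of order $n$ are those $\frac{a}{b} \in \mathbb{Q}$ such that $1 \le b \le n$ and $0 \le a \le b$; that is,

$$F_n = \left\{\frac{a}{b} : 1 \le b \le n,\ 0 \le a \le b\right\} \subset \mathbb{Q} \cap [0, 1].$$

Usually the set is thought of as being totally ordered. For example,

$$F_5 = \left\{\frac01, \frac15, \frac14, \frac13, \frac25, \frac12, \frac35, \frac23, \frac34, \frac45, \frac11\right\}.$$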

If we know the first few elements of Fn, how can we compute the next?

Proposition 11.2.2 Let $\frac{a}{b} \in F_n$, in lowest terms, with $a \ne b$. The next element of $F_n$ after $\frac{a}{b}$ is $\frac{x}{y}$, where $y \equiv -a^{-1} \bmod b$, $n - b < y \le n$, and $x = \frac{ay + 1}{b}$.


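Proof : Since $ay + 1 \equiv a(-a^{-1}) + 1 \equiv 0 \bmod b$, we know that $x \in \mathbb{Z}$. Moreover since $y \le n$ and $1 \le y(b - a)$, we know

$$\frac{x}{y} = \frac{ay + 1}{by} \le \frac{by}{by} = 1,$$

and thus $\frac{x}{y} \in F_n$. Now, suppose $\frac{c}{d} \in F_n$ with $\frac{a}{b} < \frac{c}{d} < \frac{x}{y}$. Then

$$\left(\frac{x}{y} - \frac{c}{d}\right) + \left(\frac{c}{d} - \frac{a}{b}\right) = \frac{bx - ay}{yb} = \frac{1}{yb}.$$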

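But by lemma 11.1.1, and since $y + b \ge n + 1$ (as $y > n - b$) and $d \le n$, we know that

$$\left(\frac{x}{y} - \frac{c}{d}\right) + \left(\frac{c}{d} - \frac{a}{b}\right) \ge \frac{1}{yd} + \frac{1}{db} = \frac{y + b}{ybd} \ge \frac{n + 1}{ybd} \ge \frac{1}{yb}\cdot\frac{n + 1}{n} > \frac{1}{yb},$$

which is a contradiction, and we are done.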

Corollary 1: If $\frac{a}{b} < \frac{x}{y}$ are consecutive Farey fractions (for any fixed $n$), then $xb - ay = 1$.

Corollary 2: If $\frac{a}{b} < \frac{c}{d} < \frac{x}{y}$ are consecutive Farey fractions, then $\frac{c}{d} = \frac{a + x}{b + y}$.
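Proposition 11.2.2 gives a way to list $F_n$ one term at a time. A minimal Python sketch, assuming Python 3.8 or later for the modular inverse `pow(a, -1, b)`; the function names are hypothetical:

```python
from fractions import Fraction

def next_farey(a, b, n):
    """Successor of a/b (lowest terms, a < b) in F_n, following proposition 11.2.2."""
    inv = pow(a, -1, b) if b > 1 else 0   # a^{-1} mod b; any residue works when b = 1
    y = n - (n + inv) % b                 # y = -a^{-1} mod b, with n - b < y <= n
    x = (a * y + 1) // b                  # exact, since b divides ay + 1
    return x, y

def farey(n):
    """All of F_n in increasing order, as (numerator, denominator) pairs."""
    terms = [(0, 1)]
    while terms[-1] != (1, 1):
        terms.append(next_farey(*terms[-1], n))
    return terms

print(farey(5))
# [(0, 1), (1, 5), (1, 4), (1, 3), (2, 5), (1, 2), (3, 5), (2, 3), (3, 4), (4, 5), (1, 1)]
```

Corollaries 1 and 2 can be checked on the output: consecutive pairs $(a, b), (x, y)$ satisfy `x*b - a*y == 1`, and each middle term of three consecutive ones is the mediant of its neighbours.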

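For example,

$$F_4 = \left\{\frac01, \frac14, \frac13, \frac12, \frac23, \frac34, \frac11\right\}.$$

The fractions of $F_5 \setminus F_4$ are exactly

$$\frac15 = \frac{0 + 1}{1 + 4}, \qquad \frac25 = \frac{1 + 1}{3 + 2}, \qquad \frac35 = \frac{1 + 2}{2 + 3}, \qquad \frac45 = \frac{3 + 1}{4 + 1},$$

which are seen to lie in the respective intervals $\left(\frac01, \frac14\right)$, $\left(\frac13, \frac12\right)$, $\left(\frac12, \frac23\right)$, $\left(\frac34, \frac11\right)$.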

Next lecture, we will use the Farey fractions to give an alternate proof of Dirichlet’s theorem.


11.3 Lecture Thirty-One

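In the Farey fractions $F_n$ of order $n$, we have that if $\frac{b}{r} < \frac{c}{s}$ are consecutive, then

$$rc - sb = 1 \quad\text{and}\quad \frac{b}{r} < \frac{b + c}{r + s} < \frac{c}{s} \quad\text{with } r + s \ge n + 1.$$

Indeed, the condition $r + s \ge n + 1$ must hold, since otherwise the mediant $\frac{b + c}{r + s}$ would itself be a Farey fraction of order $n$ lying strictly between two consecutive ones, a contradiction.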

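Recall: Dirichlet's theorem on Diophantine approximation (theorem 11.1.2), which states that if $\alpha \in \mathbb{R}$, $n \in \mathbb{N}$, then there exists $\frac{a}{q} \in \mathbb{Q}$ with $1 \le q \le n$ and $\left|\alpha - \frac{a}{q}\right| \le \frac{1}{q(n+1)}$.

Proof : Replacing $\alpha$ by its fractional part $\{\alpha\}$ if necessary (which only shifts any approximation by an integer), we may assume $\alpha \in [0, 1)$. If $\alpha \in F_n$, then take $\frac{a}{q} = \alpha$. Otherwise, choose $\frac{b}{r} < \frac{c}{s}$ to be consecutive in $F_n$ such that

$$\frac{b}{r} < \alpha < \frac{c}{s}.$$

We now have two cases.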

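1. Suppose

$$\frac{b}{r} < \alpha \le \frac{b + c}{r + s},$$

and take $\frac{a}{q} = \frac{b}{r}$. We have

$$\left|\alpha - \frac{b}{r}\right| \le \frac{b + c}{r + s} - \frac{b}{r} = \frac{cr - bs}{r(r + s)} = \frac{1}{r(r + s)} \le \frac{1}{r(n + 1)},$$

and by assumption $1 \le r \le n$.

2. If instead we have

$$\frac{b + c}{r + s} \le \alpha < \frac{c}{s},$$

we instead take $\frac{a}{q} = \frac{c}{s}$, and the proof unfolds in the same way.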

§7.1 – The Euclidean algorithm

We can think of continued fractions as a consequence of the Euclidean algorithm.

Example: We find $(73, 26)$. Simple calculation shows

$$\begin{aligned} 73 &= 2 \cdot 26 + 21, \\ 26 &= 1 \cdot 21 + 5, \\ 21 &= 4 \cdot 5 + 1, \\ 5 &= 5 \cdot 1 + 0, \end{aligned}$$

so $(73, 26) = 1$.

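Note also that

$$\frac{73}{26} = 2 + \frac{21}{26} = 2 + \frac{1}{(26/21)} = 2 + \cfrac{1}{1 + \frac{5}{21}}.$$

Continuing in this fashion, we have

$$\frac{73}{26} = 2 + \cfrac{1}{1 + \cfrac{5}{21}} = 2 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{5}}}.$$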
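The link between the Euclidean algorithm and continued fractions can be made concrete: the quotients produced by repeated `divmod` are the partial quotients, and folding them back up from the innermost term recovers the fraction. A minimal sketch with hypothetical names:

```python
from fractions import Fraction

def partial_quotients(u0, u1):
    """Quotients a_0, a_1, ... of the Euclidean algorithm on (u0, u1), u1 > 0,
    which are the partial quotients of u0/u1."""
    quotients = []
    while u1 != 0:
        a, remainder = divmod(u0, u1)
        quotients.append(a)
        u0, u1 = u1, remainder
    return quotients

def evaluate(quotients):
    """Evaluate <a_0; a_1, ..., a_k> exactly, working from the innermost term outward."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value

print(partial_quotients(73, 26), evaluate([2, 1, 4, 5]))   # [2, 1, 4, 5] 73/26
```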


This is an example of the type of expression we will now study.

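Definition: A continued fraction is an expression of the form

$$x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{\ddots + \cfrac{1}{x_j}}}},$$

where $x_i \in \mathbb{R}$ and $x_1, \ldots, x_j > 0$; we will mostly be interested in the situation when $x_i \in \mathbb{Z}$ for every $i$. We have the shorthand notation $\langle x_0; x_1, x_2, \ldots, x_j\rangle$. For example,

$$\frac{73}{26} = \left\langle 2; \frac{26}{21}\right\rangle = \left\langle 2; 1, \frac{21}{5}\right\rangle = \langle 2; 1, 4, 5\rangle.$$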

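Example: Find a simple expression for $\langle 1; 3, 1, 5, x\rangle$ as a function of $x > 0$. We have

$$\langle 1; 3, 1, 5, x\rangle = 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{1}{5 + \cfrac{1}{x}}}} = 1 + \cfrac{1}{3 + \cfrac{1}{1 + \cfrac{x}{5x + 1}}} = 1 + \cfrac{1}{3 + \cfrac{5x + 1}{6x + 1}} = 1 + \frac{6x + 1}{23x + 4} = \frac{29x + 5}{23x + 4}.$$

We may write the above calculation more compactly as

$$\langle 1; 3, 1, 5, x\rangle = \left\langle 1; 3, 1, \frac{5x + 1}{x}\right\rangle = \left\langle 1; 3, \frac{6x + 1}{5x + 1}\right\rangle = \left\langle 1; \frac{23x + 4}{6x + 1}\right\rangle = \left\langle \frac{29x + 5}{23x + 4}\right\rangle.$$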

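Some useful identities:

• $\langle x_0; x_1, x_2, \ldots, x_j\rangle = x_0 + \dfrac{1}{\langle x_1; x_2, x_3, \ldots, x_j\rangle}$.

• $\langle x_0; x_1, x_2, \ldots, x_j\rangle = \left\langle x_0; x_1, x_2, \ldots, x_{j-2}, x_{j-1} + \dfrac{1}{x_j}\right\rangle$.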

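Example: We find a fraction between $\frac{14}{5} = 2.8$ and $\frac{73}{26} = 2.8076923\ldots$, with minimal denominator. Note that $\frac{14}{5} = \langle 2; 1, 4\rangle$ and $\frac{73}{26} = \langle 2; 1, 4, 5\rangle$. The function $f(x) = \langle 2; 1, 4, x\rangle$ for $x > 0$ is a decreasing function of $x$ and satisfies

$$f(5) = \frac{73}{26}, \qquad \lim_{x\to\infty} f(x) = \frac{14}{5}.$$

Thus taking $x = 6$ we have

$$f(6) = \langle 2; 1, 4, 6\rangle = \frac{87}{31} = 2.8064\ldots$$

It is no coincidence that this is the Farey mediant $\frac{14 + 73}{5 + 26}$ of $\frac{14}{5}$ and $\frac{73}{26}$: their fractional parts $\frac45$ and $\frac{21}{26}$ are consecutive in $F_{30}$, and their mediant $\frac{25}{31}$ first appears between them in $F_{31}$.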
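A brute-force search over denominators confirms that $\frac{87}{31}$ has the smallest denominator strictly between $\frac{14}{5}$ and $\frac{73}{26}$. This is only a check of the example, not the continued-fraction method described above; the function name is hypothetical.

```python
import math
from fractions import Fraction

def smallest_denominator_between(lo, hi):
    """Fraction p/q with lo < p/q < hi and q as small as possible (brute force over q)."""
    q = 1
    while True:
        p = math.floor(lo * q) + 1          # least numerator with p/q > lo
        if Fraction(p, q) < hi:
            return Fraction(p, q)
        q += 1

lo, hi = Fraction(14, 5), Fraction(73, 26)
best = smallest_denominator_between(lo, hi)
print(best, best == Fraction(14 + 73, 5 + 26))   # 87/31 True
```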

It is not difficult to see that

$$f(x_0, x_1, \ldots, x_k) = \langle x_0; x_1, x_2, \ldots, x_k\rangle$$

is an increasing function of $x_j$ for every even $j$ and a decreasing function of $x_j$ for every odd $j$. Thus if $a_i, b_i \in \mathbb{Z}$ (with $a_i, b_i \ge 1$ for $i \ge 1$), we have that

$$\langle a_0; a_1, a_2, \ldots, a_k\rangle < \langle b_0; b_1, b_2, \ldots, b_k\rangle$$

if and only if

• $a_0 < b_0$, or

• $a_0 = b_0$ and $a_1 > b_1$, or

• $a_0 = b_0$ and $a_1 = b_1$ and $a_2 < b_2$, or $\ldots$

Thus we have an alternating lexicographic ordering on the integral continued fractions. To compare $\langle a_0; a_1, a_2, \ldots, a_k\rangle$ to $\langle b_0; b_1, b_2, \ldots, b_l\rangle$ with $k < l$, we write, formally,

$$\langle a_0; a_1, a_2, \ldots, a_k\rangle = \langle a_0; a_1, a_2, \ldots, a_k, \infty\rangle.$$

Finally, since we may always write, for example,

$$4 = 3 + \frac11 \implies \langle 2; 1, 4\rangle = \langle 2; 1, 3, 1\rangle,$$

we remark on the special case

$$\langle a_0; a_1, a_2, \ldots, a_k\rangle = \langle a_0; a_1, a_2, \ldots, a_k - 1, 1\rangle.$$

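Notation: For the Euclidean algorithm applied to the pair $(u_0, u_1)$, we write

$$\begin{aligned}
u_0 &= u_1a_0 + u_2, & 0 &< u_2 < u_1, \\
u_1 &= u_2a_1 + u_3, & 0 &< u_3 < u_2, \\
&\;\;\vdots \\
u_{k-1} &= u_ka_{k-1} + u_{k+1}, & 0 &< u_{k+1} < u_k, \\
u_k &= u_{k+1}a_k + u_{k+2}, & 0 &= u_{k+2} < u_{k+1}.
\end{aligned}$$

We call the coefficients $a_i$ partial quotients. We have equivalently

$$\begin{aligned}
\frac{u_0}{u_1} &= a_0 + \frac{1}{u_1/u_2}, & a_0 &= \left\lfloor \frac{u_0}{u_1} \right\rfloor, \\
\frac{u_1}{u_2} &= a_1 + \frac{1}{u_2/u_3}, & a_1 &= \left\lfloor \frac{u_1}{u_2} \right\rfloor, \\
&\;\;\vdots \\
\frac{u_k}{u_{k+1}} &= a_k, & a_k &= \left\lfloor \frac{u_k}{u_{k+1}} \right\rfloor = \frac{u_k}{u_{k+1}}.
\end{aligned}$$

Similarly, we have for example

$$\frac{u_1}{u_2} = \frac{1}{\left\{\frac{u_0}{u_1}\right\}} = \frac{1}{\frac{u_0}{u_1} - a_0}.$$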


12 Week Twelve

12.1 Lecture Thirty-Two

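The Process: Given $\xi \in \mathbb{R}$, define $\xi_0 = \xi$ and set

$$\begin{aligned}
a_0 &= \lfloor \xi_0 \rfloor, & \xi_1 &= \frac{1}{\xi_0 - a_0} = \frac{1}{\{\xi_0\}}, \\
a_1 &= \lfloor \xi_1 \rfloor, & \xi_2 &= \frac{1}{\xi_1 - a_1} = \frac{1}{\{\xi_1\}},
\end{aligned}$$

and so on. We saw in our last lecture that if $\xi = \frac{m}{n}$, then The Process is exactly the Euclidean algorithm applied to find $(m, n)$; in particular, The Process eventually terminates. Conversely, if $\xi \in \mathbb{R} \setminus \mathbb{Q}$, then The Process never terminates. Furthermore, we see that

$$\xi = \langle \xi_0\rangle = \langle a_0; \xi_1\rangle = \langle a_0; a_1, \xi_2\rangle = \cdots$$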

The numbers aj are called the partial quotients of ξ.

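Example: Let $\xi = \sqrt[3]{2} = 1.25992\ldots$ We have $\xi_0 = \xi$, and

$$\begin{aligned}
a_0 &= \lfloor \sqrt[3]{2} \rfloor = 1, & \xi_1 &= \frac{1}{\xi_0 - 1} = 3.84732\ldots \\
a_1 &= \lfloor \xi_1 \rfloor = 3, & \xi_2 &= \frac{1}{\xi_1 - 3} = 1.18019\ldots \\
a_2 &= \lfloor \xi_2 \rfloor = 1, & \xi_3 &= \frac{1}{\xi_2 - 1} = 5.54974\ldots \\
a_3 &= \lfloor \xi_3 \rfloor = 5, & \xi_4 &= \frac{1}{\xi_3 - 5} = 1.81905\ldots
\end{aligned}$$

We have that

$$\sqrt[3]{2} = \langle 1; 3, 1, 5, \xi_4\rangle = \frac{29\xi_4 + 5}{23\xi_4 + 4};$$

solving this expression for $\xi_4$, we obtain

$$\xi_4 = \frac{4\sqrt[3]{2} - 5}{-23\sqrt[3]{2} + 29}.$$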

Definition: Given $a_0 \in \mathbb{Z}$, $a_1, a_2, \ldots \in \mathbb{N}$, define recursively the sequences

$$\begin{aligned}
h_{-2} &= 0, & h_{-1} &= 1, & h_j &= a_jh_{j-1} + h_{j-2} \text{ for } j \ge 0, \\
k_{-2} &= 1, & k_{-1} &= 0, & k_j &= a_jk_{j-1} + k_{j-2} \text{ for } j \ge 0.
\end{aligned}$$

Furthermore for $j \ge 0$ define $r_j = \frac{h_j}{k_j}$; if the coefficients $a_j$ are those found in The Process applied to $\xi \in \mathbb{R}$, then $r_j$ is called the $j$th convergent to $\xi$. Continuing from our last example, the partial quotients of $\sqrt[3]{2}$ are $1, 3, 1, 5, \ldots$ We have the following table:

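$$\begin{array}{c|c|c|c|c}
j & a_j & h_j & k_j & r_j \\ \hline
-2 & & 0 & 1 & \\
-1 & & 1 & 0 & \infty \\
0 & 1 & 1 & 1 & 1 \\
1 & 3 & 4 & 3 & \frac43 \\
2 & 1 & 5 & 4 & \frac54 \\
3 & 5 & 29 & 23 & \frac{29}{23}
\end{array}$$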


Note that $r_0 = 1$, $r_1 = 1.3333\ldots$, $r_2 = 1.25$, $r_3 = 1.26087\ldots$, so that the convergents are indeed good rational approximations to $\sqrt[3]{2} = 1.25992\ldots$
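The Process and the recurrences for $h_j, k_j$ are easy to run by machine. This sketch uses floating point for The Process (an assumption that makes only the first several partial quotients trustworthy) and exact integers for the convergents; the names are hypothetical.

```python
import math

def the_process(xi, count):
    """First `count` partial quotients of xi: a_j = floor(xi_j), xi_{j+1} = 1/(xi_j - a_j).
    Uses floating point, so only the first several terms can be trusted."""
    quotients = []
    for _ in range(count):
        a = math.floor(xi)
        quotients.append(a)
        if xi == a:                  # xi_j is an integer: The Process terminates
            break
        xi = 1 / (xi - a)
    return quotients

def convergents(quotients):
    """(h_j, k_j) for j = 0, 1, ..., from h_j = a_j h_{j-1} + h_{j-2} and likewise for k_j."""
    h_prev, h = 0, 1                 # h_{-2}, h_{-1}
    k_prev, k = 1, 0                 # k_{-2}, k_{-1}
    table = []
    for a in quotients:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        table.append((h, k))
    return table

quotients = the_process(2 ** (1 / 3), 4)
print(quotients, convergents(quotients))   # [1, 3, 1, 5] [(1, 1), (4, 3), (5, 4), (29, 23)]
```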

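Theorem 12.1.1 (Theorem 7.3, Niven) For any $x > 0$, we have that

$$\langle a_0; a_1, a_2, \ldots, a_{j-1}, x\rangle = \frac{xh_{j-1} + h_{j-2}}{xk_{j-1} + k_{j-2}}.$$

In particular,

$$\langle a_0; a_1, a_2, \ldots, a_{j-1}, a_j\rangle = \frac{a_jh_{j-1} + h_{j-2}}{a_jk_{j-1} + k_{j-2}} = \frac{h_j}{k_j}.$$

Proof : We use induction. In the $j = 0$ case we have that $\langle x\rangle = \frac{x\cdot 1 + 0}{x\cdot 0 + 1}$, which is clearly so, and thus we may assume the claim holds up to $j$. We have

$$\langle a_0; a_1, \ldots, a_j, x\rangle = \left\langle a_0; a_1, \ldots, a_{j-1}, a_j + \frac1x\right\rangle = \frac{\left(a_j + \frac1x\right)h_{j-1} + h_{j-2}}{\left(a_j + \frac1x\right)k_{j-1} + k_{j-2}} = \frac{(a_jh_{j-1} + h_{j-2})x + h_{j-1}}{(a_jk_{j-1} + k_{j-2})x + k_{j-1}} = \frac{xh_j + h_{j-1}}{xk_j + k_{j-1}}.$$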

Example: Suppose $a_j = 1$ for all $j \ge 0$. Then $h_j = F_{j+2}$, $k_j = F_{j+1}$, where $F_n$ are the Fibonacci numbers $F_n = F_{n-1} + F_{n-2}$ normalized so that $F_0 = 0$, $F_1 = 1$. In particular,

$$\langle \underbrace{1; 1, 1, \ldots, 1}_{j \text{ copies}}\rangle = \frac{F_{j+1}}{F_j} \xrightarrow{j\to\infty} \phi,$$

where $\phi = \frac{1 + \sqrt5}{2} = 1.618033\ldots$ is the golden ratio.

Theorem 12.1.2 (Theorem 7.5, Niven) For $j \ge -1$ one has $h_jk_{j-1} - k_jh_{j-1} = (-1)^{j-1}$. In particular, this means that $(h_j, k_j) = 1$ for every $j$ and that

$$r_j - r_{j-1} = \frac{(-1)^{j-1}}{k_jk_{j-1}}.$$

Proof : Exercise. (hint: use induction)

From the last equation, we know that $r_j > r_{j-1}$ if and only if $j$ is odd.
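Theorem 12.1.2 is left as an exercise; a quick randomized check can at least guard against sign slips. The sketch below verifies the determinant identity, the coprimality of $h_j$ and $k_j$, and the alternation of the convergents for random partial quotients with $a_0 \in \mathbb{Z}$ and $a_j \ge 1$. The names are hypothetical.

```python
import random
from fractions import Fraction
from math import gcd

def convergent_table(quotients):
    """Lists h, k with h[j + 2] = h_j and k[j + 2] = k_j for j = -2, -1, 0, 1, ..."""
    h, k = [0, 1], [1, 0]
    for a in quotients:
        h.append(a * h[-1] + h[-2])
        k.append(a * k[-1] + k[-2])
    return h, k

def check_theorem_12_1_2(quotients):
    """Check h_j k_{j-1} - k_j h_{j-1} = (-1)^(j-1), (h_j, k_j) = 1,
    and r_j > r_{j-1} exactly when j is odd."""
    h, k = convergent_table(quotients)
    for j in range(-1, len(quotients)):
        i = j + 2
        sign = 1 if (j - 1) % 2 == 0 else -1
        if h[i] * k[i - 1] - k[i] * h[i - 1] != sign:
            return False
        if j >= 0 and gcd(h[i], k[i]) != 1:
            return False
        if j >= 1 and (Fraction(h[i], k[i]) > Fraction(h[i - 1], k[i - 1])) != (j % 2 == 1):
            return False
    return True

rng = random.Random(2014)
for _ in range(200):
    qs = [rng.randint(-5, 5)] + [rng.randint(1, 20) for _ in range(rng.randint(0, 12))]
    assert check_theorem_12_1_2(qs)
```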

Theorem 12.1.3 (Convergence of convergents) Let $\xi \in \mathbb{R}$ and let $a_0, a_1, a_2, \ldots$ be its partial quotients, with $\xi_j, h_j, k_j, r_j$ defined as above. Then

$$\xi - r_j = \frac{(-1)^j}{k_j(\xi_{j+1}k_j + k_{j-1})},$$

and in particular $\lim_{j\to\infty} r_j = \xi$.

Proof : We apply theorems 12.1.1 and 12.1.2 to obtain

$$\xi - r_j = \langle a_0; a_1, a_2, \ldots, a_j, \xi_{j+1}\rangle - r_j = \frac{\xi_{j+1}h_j + h_{j-1}}{\xi_{j+1}k_j + k_{j-1}} - \frac{h_j}{k_j} = \frac{h_{j-1}k_j - h_jk_{j-1}}{k_j(\xi_{j+1}k_j + k_{j-1})} = \frac{(-1)^j}{k_j(\xi_{j+1}k_j + k_{j-1})},$$

and we are done.

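Note that $a_{j+1} \le \xi_{j+1} < a_{j+1} + 1$. Given $n \in \mathbb{N}$, choose $j$ so that $k_j \le n < k_{j+1}$; since $\xi_{j+1}k_j + k_{j-1} \ge a_{j+1}k_j + k_{j-1} = k_{j+1} \ge n + 1$, we obtain

$$\left|\xi - \frac{h_j}{k_j}\right| \le \frac{1}{k_j(n+1)}.$$

Thus every convergent $r_j$ confirms Dirichlet's theorem on Diophantine approximation. We may also restate the theorem thus:

$$\left|\xi - \frac{h_j}{k_j}\right| = \frac{1}{k_j^2}\cdot\frac{1}{\xi_{j+1} + k_{j-1}/k_j}, \quad\text{where } a_{j+1} \le \xi_{j+1} + \frac{k_{j-1}}{k_j} \le a_{j+1} + 2.$$

Hence, the greater $a_{j+1}$, the better the approximation $r_j = \langle a_0; a_1, a_2, \ldots, a_j\rangle$ is to $\xi$.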


12.2 Lecture Thirty-Three

Recall: Theorem 12.1.1 tells us that

$$\xi = \frac{\xi_jh_{j-1} + h_{j-2}}{\xi_jk_{j-1} + k_{j-2}},$$

from which it follows that

$$\xi_j = \frac{\xi k_{j-2} - h_{j-2}}{-\xi k_{j-1} + h_{j-1}}.$$

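Example: Let $\xi = \xi_0 = \sqrt{41} = 6.4031\ldots$ We see that

$$\begin{aligned}
a_0 &= \lfloor \sqrt{41} \rfloor = 6, & \xi_1 &= \frac{1}{\xi_0 - 6} = 2.48062\ldots \\
a_1 &= \lfloor \xi_1 \rfloor = 2, & \xi_2 &= \frac{1}{\xi_1 - 2} = 2.08062\ldots \\
a_2 &= \lfloor \xi_2 \rfloor = 2, & \xi_3 &= \frac{1}{\xi_2 - 2} = 12.40312\ldots
\end{aligned}$$

We have the table:

$$\begin{array}{c|c|c|c}
j & a_j & h_j & k_j \\ \hline
-2 & & 0 & 1 \\
-1 & & 1 & 0 \\
0 & 6 & 6 & 1 \\
1 & 2 & 13 & 2 \\
2 & 2 & 32 & 5
\end{array}$$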

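Thus

$$\begin{aligned}
\xi_1 &= \frac{\xi k_{-1} - h_{-1}}{-\xi k_0 + h_0} = \frac{1}{\sqrt{41} - 6}, \\
\xi_2 &= \frac{\xi k_0 - h_0}{-\xi k_1 + h_1} = \frac{\sqrt{41} - 6}{-2\sqrt{41} + 13}, \\
\xi_3 &= \frac{\xi k_1 - h_1}{-\xi k_2 + h_2} = \frac{2\sqrt{41} - 13}{-5\sqrt{41} + 32}.
\end{aligned}$$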

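Rationalizing denominators, we obtain

$$\begin{aligned}
\xi_1 &= \frac{1}{\sqrt{41} - 6}\cdot\frac{\sqrt{41} + 6}{\sqrt{41} + 6} = \frac{\sqrt{41} + 6}{5}, \\
\xi_2 &= \frac{\sqrt{41} - 6}{-2\sqrt{41} + 13}\cdot\frac{2\sqrt{41} + 13}{2\sqrt{41} + 13} = \frac{4 + \sqrt{41}}{5}, \\
\xi_3 &= \frac{2\sqrt{41} - 13}{-5\sqrt{41} + 32}\cdot\frac{5\sqrt{41} + 32}{5\sqrt{41} + 32} = 6 + \sqrt{41}.
\end{aligned}$$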

We see that $\sqrt{41} = \langle 6; 2, 2, 6 + \sqrt{41}\rangle$, hence

$$6 + \sqrt{41} = \langle 12; 2, 2, 6 + \sqrt{41}\rangle = \langle 12; 2, 2, 12, 2, 2, 6 + \sqrt{41}\rangle = \cdots$$

Thus $\sqrt{41} = \langle 6; \overline{2, 2, 12}\rangle$; that is, $\sqrt{41}$ has a periodic continued fraction.

Lemma 12.2.1 If the continued fraction of $\xi \in \mathbb{R}$ is eventually periodic, then $\xi$ is a quadratic irrational, i.e. it is the root of some quadratic polynomial with integer coefficients.


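Proof : For simplicity we will assume that the continued fraction is purely periodic, although the stronger claim is true; that is, assume

$$\xi = \langle \overline{a_0; a_1, a_2, \ldots, a_{j-1}}\rangle.$$

Then

$$\xi = \langle a_0; a_1, a_2, \ldots, a_{j-1}, \xi\rangle = \frac{\xi h_{j-1} + h_{j-2}}{\xi k_{j-1} + k_{j-2}},$$

hence $\xi(\xi k_{j-1} + k_{j-2}) = \xi h_{j-1} + h_{j-2}$, and so

$$k_{j-1}\xi^2 + (k_{j-2} - h_{j-1})\xi - h_{j-2} = 0.$$

Since $k_{j-1} \ge 1$, this is a genuine quadratic equation, and $\xi$ is irrational because its continued fraction does not terminate.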

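Lemma 12.2.2 Every real quadratic irrational $r + s\sqrt{c}$, where $r, s \in \mathbb{Q}$, $s \ne 0$, and $c \in \mathbb{N}$ is not a perfect square (written $c \in \mathbb{N} \setminus \mathbb{N}^2$), can be written $\frac{m + \sqrt{d}}{q}$, where $m, q \in \mathbb{Z}$, $d \in \mathbb{N} \setminus \mathbb{N}^2$, and $q \mid (d - m^2)$.

Proof : Taking a common denominator for $r$ and $s$, with its sign chosen so that $b > 0$, we may write

$$r + s\sqrt{c} = \frac{a + b\sqrt{c}}{e} = \frac{a + \sqrt{cb^2}}{e} = \frac{a|e| + \sqrt{cb^2e^2}}{e|e|},$$

and the claim is now immediate with $m = a|e|$, $d = cb^2e^2$, $q = e|e| = \pm e^2$, since $d - m^2 = e^2(cb^2 - a^2)$.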

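The Quadratic Irrational Process: Let $\xi = \xi_0 = \frac{m_0 + \sqrt{d}}{q_0}$, where $d$, $m_0$, and $q_0$ satisfy the conditions of lemma 12.2.2. For $j \ge 0$, define

$$a_j = \lfloor \xi_j \rfloor, \qquad m_{j+1} = a_jq_j - m_j, \qquad q_{j+1} = \frac{d - m_{j+1}^2}{q_j}, \qquad \xi_{j+1} = \frac{m_{j+1} + \sqrt{d}}{q_{j+1}}.$$

The $a_j$ and $\xi_j$ so produced are the same as those produced in The Process.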

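Example: $\xi = \xi_0 = \sqrt{41}$, so that $m_0 = 0$, $d = 41$, $q_0 = 1$.

$$\begin{aligned}
j = 0:&\quad a_0 = \lfloor \sqrt{41} \rfloor = 6, & m_1 &= 6\cdot 1 - 0 = 6, & q_1 &= \frac{41 - 6^2}{1} = 5, & \xi_1 &= \frac{6 + \sqrt{41}}{5}. \\
j = 1:&\quad a_1 = \left\lfloor \frac{6 + \sqrt{41}}{5} \right\rfloor = 2, & m_2 &= 2\cdot 5 - 6 = 4, & q_2 &= \frac{41 - 4^2}{5} = 5, & \xi_2 &= \frac{4 + \sqrt{41}}{5}. \\
j = 2:&\quad a_2 = \left\lfloor \frac{4 + \sqrt{41}}{5} \right\rfloor = 2, & m_3 &= 2\cdot 5 - 4 = 6, & q_3 &= \frac{41 - 6^2}{5} = 1, & \xi_3 &= \frac{6 + \sqrt{41}}{1}.
\end{aligned}$$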
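For $\xi = \sqrt{d}$ the Quadratic Irrational Process runs entirely in integer arithmetic. In the sketch below, $\lfloor (m + \sqrt d)/q \rfloor$ is computed as $\lfloor (m + \lfloor\sqrt d\rfloor)/q \rfloor$, which is valid when $q > 0$ (as it is for $\sqrt d$), and the loop stops at the first $a_j = 2a_0$, relying on theorem 12.2.4. The names are hypothetical.

```python
from math import isqrt

def sqrt_continued_fraction(d):
    """Quadratic Irrational Process for sqrt(d) (m_0 = 0, q_0 = 1), run for one full period.
    Returns (quotients, qs): quotients = [a_0, ..., a_r] with a_r = 2 a_0,
    and qs = [q_0, ..., q_r]."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    quotients, qs = [a0], [1]
    while a != 2 * a0:
        m = a * q - m                 # m_{j+1} = a_j q_j - m_j
        q = (d - m * m) // q          # q_{j+1} = (d - m_{j+1}^2) / q_j, an exact division
        a = (m + a0) // q             # floor((m + sqrt d) / q), valid because q > 0 here
        quotients.append(a)
        qs.append(q)
    return quotients, qs

print(sqrt_continued_fraction(41))   # ([6, 2, 2, 12], [1, 5, 5, 1])
```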

Theorem 12.2.3 (Theorem 7.19, Niven) Given a quadratic irrational $\xi_0$, we have:

1. The $q_j$ from The Quadratic Irrational Process are integers which are eventually positive.

2. The $q_j$ and the $m_j$ are bounded.

3. The continued fraction for $\xi_0$ is eventually periodic.

Example: The quadratic irrational $-\frac12 - \frac34\sqrt5$ has continued fraction $\langle -3; 1, 4, \overline{1, 1, 1, 5, 3, 5}\rangle$.

Proof : (sketch) (1) $\Rightarrow$ (2): Since $q_j > 0$ for all $j$ sufficiently large, and $q_jq_{j+1} + m_{j+1}^2 = d$, we see that there are only finitely many choices for the $q_j, m_j$.

(2) $\Rightarrow$ (3): There are only finitely many pairs $(m_j, q_j)$, and so by the pigeonhole principle some pair must eventually occur twice. Since the pair $(m_j, q_j)$ determines the values for the next step of The Quadratic Irrational Process, the partial quotients repeat from that point on.


(3) ⇒ (1) Highly nontrivial, and omitted.

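Theorem 12.2.4 (Theorem 7.21, Niven) Let $d \in \mathbb{N} \setminus \mathbb{N}^2$ and set $c = \sqrt{d}$. Then $\lfloor c\rfloor + c$ has a purely periodic continued fraction $\langle \overline{a_0; a_1, a_2, \ldots, a_{r-1}}\rangle$ with $a_0 = 2\lfloor c\rfloor$. Hence $c = \langle \lfloor c\rfloor; \overline{a_1, a_2, \ldots, a_r}\rangle$ where $a_r = 2\lfloor c\rfloor$.

We refer to our earlier example, where we found that $6 + \sqrt{41}$ has a purely periodic continued fraction.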

Proof : (omitted)

Facts: If $\xi = \sqrt{d}$ and the $q_j$ are defined as above, then:

• For every $j$ we have $q_j \ne -1$.

• If $r$ is the period of the continued fraction of $\xi$, then $q_j = 1$ if and only if $r \mid j$.


12.3 Lecture Thirty-Four

Notation: Throughout this lecture, $d$ denotes a positive integer that is not a perfect square. The symbols $a_j, h_j, k_j$ denote the terms from The Process applied to $\sqrt{d}$, and similarly for $m_j, q_j$.

Pell's equation: We are interested in integer solutions to the equation $x^2 - dy^2 = N$ for some fixed $N \in \mathbb{Z}$; in particular, we seek solutions where both $x$ and $y$ are positive.

Theorem 12.3.1 (Theorem 7.24, Niven) If $|N| < \sqrt{d}$, then for any positive solution $(x, y)$ to Pell's equation we must have that $\frac{x}{y}$ is a convergent to $\sqrt{d}$. In particular, if $(x, y) = 1$ then we must have that $x = h_j$ and $y = k_j$ for some $j$.

Proof : (omitted)

Example: Every solution of $x^2 - 41y^2 = -1$ must come from a convergent of $\sqrt{41}$. We saw in our last lecture that in this case $h_2 = 32$, $k_2 = 5$, and indeed

$$(32)^2 - 41(5)^2 = 1024 - 1025 = -1.$$

Theorem 7.22 of Niven gives us the following key identity: for $j \ge -1$, one has $h_j^2 - dk_j^2 = (-1)^{j+1}q_{j+1}$. At the end of our last lecture we saw that $q_j = 1$ if and only if $r \mid j$, where $r$ is the period of the continued fraction of $\sqrt{d}$. It is a corollary (Corollary 7.23) that, for every $l \ge 0$, we have

$$h_{lr-1}^2 - dk_{lr-1}^2 = (-1)^{lr}.$$

Example: We solve Pell's equation for $d = 45$. We have

$$\sqrt{45} = \langle 6; \overline{1, 2, 2, 2, 1, 12}\rangle,$$

so $r = 6$. Then with $l = 1$, we have by corollary 7.23 that

$$h_5 = 161, \quad k_5 = 24, \quad\text{hence}\quad 161^2 - 45(24)^2 = q_6 = 1.$$

So a solution to $x^2 - 45y^2 = 1$ is $x = 161$, $y = 24$. Note that

$$\frac{h_5}{k_5} = r_5 = \langle 6; 1, 2, 2, 2, 1\rangle.$$

Another solution is given by $l = 2$; we have

$$\frac{h_{11}}{k_{11}} = r_{11} = \langle 6; 1, 2, 2, 2, 1, 12, 1, 2, 2, 2, 1\rangle = \frac{51841}{7728},$$

and indeed we have that $51841^2 - 45(7728)^2 = 1$.

Theorem 12.3.2 (Theorem 7.25, Niven) All solutions in nonnegative integers to $x^2 - dy^2 = \pm1$ are of the form $x = h_{lr-1}$, $y = k_{lr-1}$, where $l \ge 0$ and $r$ is the period of the continued fraction of $\sqrt{d}$. Furthermore, if $r$ is even then there are no positive solutions to $x^2 - dy^2 = -1$, and the positive solutions to $x^2 - dy^2 = 1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$ with $l \ge 1$; if $r$ is odd, then the positive solutions to $x^2 - dy^2 = -1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$ where $l$ is odd and positive, and the positive solutions to $x^2 - dy^2 = 1$ are exactly $x = h_{lr-1}$, $y = k_{lr-1}$ where $l$ is even and positive. In every case, $y = y(l)$ is a strictly increasing function of $l$.

This is the main result of our foregoing work.

Remark: Suppose $s^2 - dt^2 = A$, $u^2 - dv^2 = B$. Factoring over the reals gives

$$A = (s - t\sqrt{d})(s + t\sqrt{d}), \qquad B = (u - v\sqrt{d})(u + v\sqrt{d}),$$

from which it follows that

$$AB = \left((su + dtv) - \sqrt{d}(sv + tu)\right)\left((su + dtv) + \sqrt{d}(sv + tu)\right) = (su + dtv)^2 - d(sv + tu)^2.$$

In particular, if $A = 1$, then we get new solutions to the equation $x^2 - dy^2 = A$ by considering $(s + t\sqrt{d})^l$ with $l \ge 2$.

Example: Suppose $d = 45$. Set $s = 161$, $t = 24$ so that $s^2 - dt^2 = 1$. We have

$$(161 + 24\sqrt{45})^2 = 51841 + 7728\sqrt{45}, \qquad (161 + 24\sqrt{45})^3 = 16{,}692{,}641 + 2{,}488{,}392\sqrt{45},$$

and indeed $16{,}692{,}641^2 - 45\cdot 2{,}488{,}392^2 = 1$, with $h_{17} = 16{,}692{,}641$, $k_{17} = 2{,}488{,}392$.

Proof : (omitted)

Theorem 12.3.3 (Theorem 7.26, Niven) Set $x_1 = h_{r-1}$, $y_1 = k_{r-1}$, where $r$ is the period of the continued fraction of $\sqrt{d}$. Define $x_l, y_l$ via

$$x_l + y_l\sqrt{d} = (x_1 + y_1\sqrt{d})^l.$$

Then $x_l = h_{lr-1}$ and $y_l = k_{lr-1}$.

Proof : (omitted)

Theorems 12.3.2 and 12.3.3 together tell us that the smallest (in terms of $y$) positive solution to $x^2 - dy^2 = \pm1$ is given by $x_1 = h_{r-1}$, $y_1 = k_{r-1}$, and moreover that all positive solutions may be found by taking powers of $x_1 + y_1\sqrt{d}$.

Example: Suppose $d = 41$; then the smallest positive solution to $x^2 - 41y^2 = -1$ is $x_1 = h_2 = 32$, $y_1 = k_2 = 5$. Thus

$$x_2 + y_2\sqrt{41} = (32 + 5\sqrt{41})^2 = 2049 + 320\sqrt{41}.$$

By theorem 12.3.3, $(2049, 320)$ is the smallest positive solution to $x^2 - 41y^2 = 1$.
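Theorems 12.3.2 and 12.3.3 suggest a simple procedure: expand $\sqrt d$ for one period, read off $(x_1, y_1) = (h_{r-1}, k_{r-1})$, and take powers of $x_1 + y_1\sqrt d$ using the product rule from the remark. A self-contained sketch (hypothetical names), which repeats the integer form of the Quadratic Irrational Process so that it runs on its own:

```python
from math import isqrt

def sqrt_cf_period(d):
    """[a_0, a_1, ..., a_r] for sqrt(d) via the Quadratic Irrational Process (stops at a_r = 2 a_0)."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a, out = 0, 1, a0, [a0]
    while a != 2 * a0:
        m = a * q - m
        q = (d - m * m) // q
        a = (m + a0) // q
        out.append(a)
    return out

def convergent(d, j):
    """(h_j, k_j) for sqrt(d), expanding the periodic continued fraction up to index j."""
    cf = sqrt_cf_period(d)
    a0, period = cf[0], cf[1:]
    h_prev, h, k_prev, k = 0, 1, 1, 0          # h_{-2}, h_{-1}, k_{-2}, k_{-1}
    for i in range(j + 1):
        a = a0 if i == 0 else period[(i - 1) % len(period)]
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k

def fundamental_solution(d):
    """(x_1, y_1, r) = (h_{r-1}, k_{r-1}, period length), as in theorem 12.3.3."""
    r = len(sqrt_cf_period(d)) - 1
    x1, y1 = convergent(d, r - 1)
    return x1, y1, r

def power(x1, y1, d, l):
    """(x_l, y_l) with x_l + y_l sqrt(d) = (x_1 + y_1 sqrt(d))^l, using the product rule of the remark."""
    x, y = 1, 0
    for _ in range(l):
        x, y = x * x1 + d * y * y1, x * y1 + y * x1
    return x, y

x1, y1, r = fundamental_solution(45)
print(x1, y1, r, x1 * x1 - 45 * y1 * y1)   # 161 24 6 1
```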


13 Week Thirteen

13.1 Lecture Thirty-Five

Miscellany about continued fractions: Given an arbitrary continued fraction, must it correspond to a real number? Let $a_0 \in \mathbb{Z}$, $a_1, a_2, \ldots \in \mathbb{N}$, and define

$$L = \langle a_0; a_1, a_2, \ldots\rangle = \lim_{n\to\infty} \langle a_0; a_1, a_2, \ldots, a_n\rangle.$$

Theorem 13.1.1 The limit $L$ always exists and is irrational. Moreover, the partial quotients of $L$ are exactly $a_0, a_1, a_2, \ldots$

Recall: If $r_n$ denotes the $n$th convergent of $L$, we have $r_n = \langle a_0; a_1, \ldots, a_n\rangle$ and moreover

$$r_n - r_{n-1} = \frac{(-1)^{n-1}}{k_nk_{n-1}}.$$

This implies that the convergents oscillate around $L$. Indeed, define $\alpha_n = \frac{1}{k_nk_{n-1}}$, so that

$$r_n = a_0 + \sum_{j=1}^{n} (-1)^{j-1}\alpha_j;$$

since $k_{n+1} > k_{n-1}$ and $k_n \to \infty$, the $\alpha_n$ decrease to $0$, so as a decreasing, alternating series this converges, and thus the convergents also converge.

Example: Define $x = \langle 1; 1, 1, \ldots\rangle$, so that $x = 1 + \frac{1}{x}$. This yields the quadratic equation $x^2 - x - 1 = 0$, and since $x > 0$ we deduce that $x = \frac{1 + \sqrt5}{2} = \phi$, as introduced in lecture thirty-two. With the Fibonacci numbers as defined there, we have

$$F_n = \frac{1}{\sqrt5}\left(\phi^n - (-\phi)^{-n}\right), \qquad\text{and}\qquad m \mid n \Rightarrow F_m \mid F_n.$$

Definition: A real number is called simply normal in base-10 if, for every $i \in \{0, 1, \ldots, 9\}$, the asymptotic frequency of the digit $i$ in its decimal expansion is $0.1$.

There is an analogous definition for simple normality in base-$b$. A real number is normal base-$b$ if it is simply normal base-$b$, base-$b^2$, base-$b^3$, and so on. For example, $0.\overline{0123456789}$ is simply normal base-10, but not normal.

Theorem 13.1.2 Almost all real numbers are normal base-10.

Champernowne's number: Let $c = 0.12345678910111213\ldots$ D.G. Champernowne showed in 1933 that $c$ is normal base-10.

It is conjectured that the following numbers are normal: $\pi$, $e$, $\log 2$, and any algebraic number $q \in \overline{\mathbb{Q}}$ of degree at least 3.

It is a straightforward consequence of theorem 13.1.2 (together with its analogue in each base $b \ge 2$, and the fact that a countable union of null sets is null) that almost all real numbers are normal in every base simultaneously.

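Back to continued fractions: given $\xi \in \mathbb{R}$, define

$$\delta_k(\xi) = \lim_{x\to\infty} \frac{\#\{n \le x : a_n = k\}}{x}.$$

Aleksandr Khinchin showed that, for almost all $\xi \in \mathbb{R}$, $\delta_k(\xi)$ exists and equals $\log_2\left(1 + \frac{1}{k(k+2)}\right)$, thus

$$\delta_1 \approx 0.415, \qquad \delta_2 \approx 0.170, \qquad \delta_3 \approx 0.093, \ldots$$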
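Khinchin's frequencies can be observed empirically. The sketch below rests on an assumption: that a random 2000-bit binary fraction is a reasonable stand-in for a typical real number, as far as its first 200 partial quotients are concerned. It compares observed frequencies of $1, 2, 3$ with $\log_2\left(1 + \frac{1}{k(k+2)}\right)$; expect agreement to roughly two decimal places, not an exact match. The names are hypothetical.

```python
import math
import random

def gauss_kuzmin(k):
    """Limiting frequency log2(1 + 1/(k(k+2))) of the partial quotient k."""
    return math.log2(1 + 1 / (k * (k + 2)))

def partial_quotients(num, den):
    """Partial quotients of num/den via the Euclidean algorithm."""
    out = []
    while den:
        a, rem = divmod(num, den)
        out.append(a)
        num, den = den, rem
    return out

def empirical_frequencies(samples=100, bits=2000, keep=200, seed=1):
    """Frequencies of 1, 2, 3 among a_1, ..., a_keep of random bits-bit binary fractions in (0, 1)."""
    rng = random.Random(seed)
    counts, total = {1: 0, 2: 0, 3: 0}, 0
    for _ in range(samples):
        num = rng.getrandbits(bits) | 1                      # odd, so num / 2^bits is in lowest terms
        qs = partial_quotients(num, 1 << bits)[1:keep + 1]   # skip a_0 = 0
        total += len(qs)
        for a in qs:
            if a in counts:
                counts[a] += 1
    return {k: counts[k] / total for k in counts}

freq = empirical_frequencies()
for k in (1, 2, 3):
    print(k, round(gauss_kuzmin(k), 3), round(freq[k], 3))
```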


One number which fails this test is

$$e = \langle 2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, \ldots\rangle$$

Furthermore, any number of the form $\frac{me + n}{re + s}$ also fails Khinchin's theorem. It is conjectured that the following numbers satisfy Khinchin's theorem: $\pi$, $\log 2$, and any algebraic number $q \in \overline{\mathbb{Q}}$ of degree at least 3. Khinchin also proved (1934) that, for almost all $\xi \in \mathbb{R}$, one has

$$\lim_{n\to\infty} (a_1a_2\cdots a_n)^{1/n} = \prod_{k=1}^{\infty} \left(1 + \frac{1}{k(k+2)}\right)^{\log_2 k} = 2.6854520010\ldots$$

Theorem 13.1.3 (Theorem 7.17, Niven) For all $\xi \in \mathbb{R} \setminus \mathbb{Q}$, there exist infinitely many $\frac{a}{b} \in \mathbb{Q}$ such that $\left|\xi - \frac{a}{b}\right| < \frac{1}{\sqrt5\,b^2}$, and moreover $\sqrt5$ is the best possible such bound.

By discarding the (countable) set of real numbers $\xi$ for which the bound $\sqrt5$ is necessary, we may improve the bound to $\sqrt8$; repeating this process we obtain bounds of $\frac{\sqrt{221}}{5}, \frac{\sqrt{1517}}{13}, \ldots$ These numbers arise naturally in the study of the Markov spectrum.

Theorem 13.1.4 (Theorem 7.14, Niven) If $\left|\xi - \frac{a}{b}\right| < \frac{1}{2b^2}$, then $\frac{a}{b}$ is a convergent to $\xi$.


13.2 Lecture Thirty-Six

Numerical examples of continued fractions

Let $y = 365.242199\ldots$ be the number of solar days in a year; it has been a challenge for centuries to construct a calendar which takes into account this lack of integrality. Numa Pompilius devised a calendar (ca. 713 BCE) in which occasional and irregular leap months would be added into the middle of February. Julius Caesar (46 BCE) devised the Julian calendar, in which every year has 365 days, except for every fourth year which has 366.

While divergence from the true count is slow in the Julian calendar (amounting to about 11 minutes per year, or about 14 days over 1800 years), it is noticeable; in 1582, Pope Gregory XIII introduced the Gregorian calendar as a replacement. In this calendar, every year divisible by 4 is a leap year, except years divisible by 100 and not 400. This is the most widely-used calendar in contemporary Western society; it averages 365.2425 days per year, and so diverges by about 3 days every 10,000 years.

The continued fraction of $y$ is $\langle 365; 4, 7, 1, 3, 5, 20, \ldots\rangle$, and the convergents to $y - 365$ are

$$\frac14, \quad \frac{7}{29}, \quad \frac{8}{33}, \quad \frac{31}{128}, \quad \frac{163}{673}, \ldots$$

To get a good rational approximation, we need to truncate before a large partial quotient. Using the convergent $\frac{31}{128}$, we might say that we have a leap year every year which is divisible by 4, except years that are divisible by 128. In hexadecimal: a year is a leap year if it ends in 0, 4, 8, or C, unless it ends in 00 or 80. This diverges by about one day every 87,000 years, and we have

$$365\tfrac{31}{128} = 365.2421875.$$

Now, let $m = 29.53059\ldots$ be the number of days in a lunar month (that is, from one new moon to the next), so that we have $\frac{y}{m} = 12.3683\ldots$ Taking the continued fraction,

$$\frac{y}{m} = \langle 12; 2, 1, 2, 1, 1, 17, \ldots\rangle,$$

and the convergents of $\frac{y}{m} - 12$ are

$$\frac12, \quad \frac13, \quad \frac38, \quad \frac{4}{11}, \quad \frac{7}{19}, \ldots$$

Modern lunisolar calendars have 7 leap months every 19 years, diverging by one month every 6800 years.

In modern western music, the A above middle C is assigned the frequency 440 Hz. By doubling this frequency, we obtain a note one octave higher; tripling it gives 1320 Hz, a perfect fifth above 880 Hz. Unfortunately, much like the alignment of months and years, the alignment of octaves and fifths is out of sync; indeed,

$$\frac{(3/2)^{12}}{2^7} \approx 1.0136.$$

However, an equally-tempered tuning divides each octave into 12 equal segments, so each semitone is an increase by a factor of $2^{1/12}$; in this case we see $2^{7/12} \approx 1.498$. We take the continued fraction:

$$\log_2(3/2) = \frac{\log(3/2)}{\log 2} = 0.58496\ldots = \langle 0; 1, 1, 2, 2, 3, 1, 5, \ldots\rangle,$$

with convergents

$$1, \quad \frac12, \quad \frac35, \quad \frac{7}{12}, \quad \frac{24}{41}, \ldots$$


So if we wanted to divide the octave into x equal notes so that an interval of y of them makes a perfect fifth, x = 41, y = 24 would be a better choice than x = 12, y = 7.
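The comparison can be made concrete in cents (1/1200 of an octave). This sketch computes the Pythagorean comma exactly and, for a few equal divisions of the octave, the closest step count to a just fifth and its error. The choice of 12, 41 and 53 divisions is an illustrative assumption: 53 comes from the next convergent 31/53, obtained from the partial quotients above.

```python
import math
from fractions import Fraction

JUST_FIFTH = math.log2(3 / 2)  # 0.58496..., a perfect fifth as a fraction of an octave

# Twelve just fifths overshoot seven octaves by this ratio (the Pythagorean comma).
comma = Fraction(3, 2) ** 12 / 2 ** 7
print(comma, float(comma))  # 531441/524288, about 1.0136

def fifth_error_cents(steps: int, divisions: int) -> float:
    """Error in cents of `steps` out of `divisions` equal steps versus a just fifth."""
    return 1200 * (steps / divisions - JUST_FIFTH)

for divisions in (12, 41, 53):
    steps = round(divisions * JUST_FIFTH)  # step count closest to a just fifth
    print(f"{divisions} divisions: fifth = {steps} steps, "
          f"error {fifth_error_cents(steps, divisions):+.3f} cents")
```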

Pythagorean triplets: What are all positive integer solutions to the equation $x^2 + y^2 = z^2$? A primitive triplet is a solution to this equation in which (x, y) = 1.

Theorem 13.2.1 (Theorem 5.5, Niven) The positive, primitive Pythagorean triplets (with y even) are parameterized by:

$$x = r^2 - s^2, \quad y = 2rs, \quad z = r^2 + s^2,$$

where r > s > 0, (r, s) = 1, and r and s have opposite parity.

nb. For any primitive (x, y, z), exactly one of x and y is even.
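The parameterization is easy to check numerically. The sketch below (the generator name is hypothetical) enumerates the triplets given by the theorem up to a bound on z; comparing the output with a brute-force search confirms that each primitive triplet with y even appears exactly once.

```python
from math import gcd

def primitive_triples(limit: int):
    """Yield primitive (x, y, z) with y even and z <= limit, using
    x = r^2 - s^2, y = 2rs, z = r^2 + s^2 with r > s > 0, gcd(r, s) = 1,
    and r, s of opposite parity."""
    r = 2
    while r * r + 1 <= limit:  # s = 1 gives the smallest z for this r
        for s in range(1, r):
            if (r - s) % 2 == 1 and gcd(r, s) == 1 and r * r + s * s <= limit:
                yield (r * r - s * s, 2 * r * s, r * r + s * s)
        r += 1

print(sorted(primitive_triples(30), key=lambda t: t[2]))
# [(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25), (21, 20, 29)]
```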

Proof : We give two sketches.

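1. Since (x, y) = 1 and y is even, x is odd, and hence so is z; thus $z \pm x$ are both even. We may factor $y^2 = (z - x)(z + x)$, hence

$$\left(\frac{y}{2}\right)^2 = \frac{z + x}{2} \cdot \frac{z - x}{2}, \qquad \left(\frac{z + x}{2}, \frac{z - x}{2}\right) = 1,$$

since any common divisor of the two factors divides both their sum z and their difference x. Two coprime positive integers whose product is a perfect square must each be squares (by Euclid’s lemma and unique factorization), so $\frac{z + x}{2} = r^2$ and $\frac{z - x}{2} = s^2$. Then $z = r^2 + s^2$, $x = r^2 - s^2$, and $y = 2rs$.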

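2. We have $\left(\frac{x}{z}\right)^2 + \left(\frac{y}{z}\right)^2 = 1$, and so we seek the rational points q of the unit circle. The line joining any rational point q to (−1, 0) has rational slope; conversely, any line through (−1, 0) with rational slope m meets the circle in a second rational point:

$$y = m(x + 1),\ m \in \mathbb{Q} \implies x^2 + \big(m(x + 1)\big)^2 = 1 \iff (x + 1)\big((m^2 + 1)x + (m^2 - 1)\big) = 0.$$

So all rational points on the circle other than (−1, 0) have the form

$$\left(\frac{1 - m^2}{1 + m^2}, \frac{2m}{1 + m^2}\right), \qquad m \in \mathbb{Q}.$$

Writing m = s/r and clearing denominators recovers the parameterization in the theorem.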

The approach of proof (2) generalizes to any conic section with rational coefficients, provided one rational point on it is known.


13.3 Lecture Thirty-Seven

Final exam review

At least half of the problems on the final will be taken from homework problems. No calculators are permitted. Below is a brief overview of the important topics covered.

Chapter One – Divisibility

• The Euclidean algorithm: calculating the gcd, Bézout’s identity, calculating inverses modulo m.

• The Fundamental theorem of arithmetic.

• Euclid’s theorem (there are infinitely many primes).
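For the Euclidean algorithm topic, here is a minimal sketch of the extended Euclidean algorithm, which produces the Bézout coefficients and hence inverses modulo m. The function names are hypothetical.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = gcd(a, b) = a*x + b*y (Bezout's identity)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder is kept as an explicit combination a*x + b*y.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def inverse_mod(a: int, m: int) -> int:
    """Inverse of a modulo m; it exists exactly when gcd(a, m) = 1."""
    d, x, _ = extended_gcd(a, m)
    if d != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return x % m

print(extended_gcd(240, 46))  # (2, -9, 47): 240*(-9) + 46*47 = 2
print(inverse_mod(17, 3120))  # 2753, since 17 * 2753 = 15 * 3120 + 1
```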

Chapter Two – Congruences

• The Chinese remainder theorem.

• Euler’s theorem; Fermat’s little theorem.

• The Euler φ-function.

• Primitive roots; the structure of $\mathbb{Z}_n^\times$.

• Hensel’s lemma.

• Solving linear congruences ax ≡ b mod m.

• The number of solutions of $x^n \equiv a \pmod p$.

Example problems: Find all n ∈ Z such that $3^n \equiv n \pmod 7$. Show that $a^{\varphi(n)} \equiv a^{2\varphi(n)} \pmod n$ for all a ∈ Z, n ∈ N. Prove that a squarefree composite integer n is a Carmichael number if and only if (p − 1) | (n − 1) for every prime p | n.
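The Carmichael criterion can be sanity-checked numerically before proving it. This sketch compares the definition (n composite with $a^{n-1} \equiv 1 \pmod n$ for every a coprime to n) against the divisibility criterion for small n. Compositeness is required on both sides, since every prime satisfies the divisibility condition trivially. The helper names are hypothetical and the trial-division factorization is only meant for small inputs.

```python
from math import gcd

def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def is_composite(n: int) -> bool:
    return n > 1 and factorize(n) != {n: 1}

def is_carmichael(n: int) -> bool:
    """Definition: n composite and a^(n-1) = 1 (mod n) for every a coprime to n."""
    return is_composite(n) and all(
        pow(a, n - 1, n) == 1 for a in range(2, n) if gcd(a, n) == 1
    )

def divisibility_criterion(n: int) -> bool:
    """n composite, squarefree, and (p - 1) | (n - 1) for every prime p | n."""
    f = factorize(n)
    return (is_composite(n)
            and all(e == 1 for e in f.values())
            and all((n - 1) % (p - 1) == 0 for p in f))

print([n for n in range(2, 3000) if is_carmichael(n)])  # [561, 1105, 1729, 2465, 2821]
```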

Chapter Three – Quadratic Reciprocity and Quadratic Forms

• Sums of two squares.

• The law of quadratic reciprocity.

• Legendre and Jacobi symbols; their known special values.

• Quadratic residues and nonresidues.

• Euler’s criterion.

• Binary quadratic forms

Example problem: In $\mathbb{Z}_n^\times$ with n > 2, prove that at most half of the elements are quadratic residues, and that exactly half of them are quadratic residues if and only if n has a primitive root.
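A brute-force check of this example problem for small moduli, as a sketch only (it is evidence, not a proof). It assumes n > 2, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def units(n: int) -> list[int]:
    """Elements of Z_n^x: residues 1 <= a < n with gcd(a, n) = 1."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def order(g: int, n: int) -> int:
    """Multiplicative order of a unit g modulo n."""
    k, x = 1, g % n
    while x != 1:
        x = x * g % n
        k += 1
    return k

def has_primitive_root(n: int) -> bool:
    u = units(n)
    return any(order(g, n) == len(u) for g in u)

def residue_share(n: int) -> Fraction:
    """Fraction of Z_n^x consisting of squares (quadratic residues)."""
    u = units(n)
    return Fraction(len({a * a % n for a in u}), len(u))

for n in (5, 8, 9, 15):
    print(n, residue_share(n), has_primitive_root(n))
```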

Chapter Four – Some Functions of Number Theory

• Multiplicative functions, totally multiplicative functions.

• Dirichlet convolution.

• Möbius inversion.
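As a small illustration of these topics, the sketch below implements Dirichlet convolution naively and uses Möbius inversion to recover φ from the identity $\sum_{d \mid n} \varphi(d) = n$: if F = φ ∗ 1, then φ = μ ∗ F. Function names are hypothetical and the implementations favor clarity over speed.

```python
from math import gcd

def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n: int) -> int:
    """mu(n): 0 if a prime squared divides n, else (-1)^(number of prime factors)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def convolve(f, g):
    """Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) g(n/d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def one(n: int) -> int:
    return 1

def phi(n: int) -> int:
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

F = convolve(phi, one)          # sum of phi(d) over d | n, which equals n
phi_again = convolve(mobius, F)  # Mobius inversion recovers phi

print([F(n) for n in range(1, 11)])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```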

Chapters Six and Seven – Farey Fractions and Irrational Numbers; Simple Continued Fractions

• Dirichlet’s theorem on Diophantine approximation.


• Farey fractions.

• Diophantine approximations to rational and algebraic numbers.

• Continued fractions.

• Pell’s equation.
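To connect the last two topics, a sketch that finds the fundamental solution of $x^2 - Dy^2 = 1$ by scanning the convergents of $\sqrt{D}$. It assumes D is a positive non-square and relies on the standard fact that the fundamental solution is a convergent lying within two periods of the expansion. The function names are hypothetical.

```python
from math import isqrt

def sqrt_cf(D: int) -> tuple[int, list[int]]:
    """Return a0 and one period of the continued fraction of sqrt(D)."""
    a0 = isqrt(D)
    if a0 * a0 == D:
        raise ValueError("D must not be a perfect square")
    m, d, a = 0, 1, a0
    period = []
    while a != 2 * a0:  # the period ends with the term 2*a0
        m = d * a - m
        d = (D - m * m) // d
        a = (a0 + m) // d
        period.append(a)
    return a0, period

def pell_fundamental(D: int) -> tuple[int, int]:
    """Smallest positive (x, y) with x^2 - D*y^2 = 1, found among the convergents."""
    a0, period = sqrt_cf(D)
    p_prev, p = 1, a0
    q_prev, q = 0, 1
    # If the period length k is even the answer is p_{k-1}/q_{k-1};
    # if k is odd it is p_{2k-1}/q_{2k-1}. Two periods therefore suffice.
    for a in period * 2:
        if p * p - D * q * q == 1:
            return p, q
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    raise RuntimeError("no solution within two periods")  # not expected to happen

print(sqrt_cf(7))              # (2, [1, 1, 1, 4])
print(pell_fundamental(61))    # (1766319049, 226153980)
```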
