
THE BOMBIERI–VINOGRADOV THEOREM

PETER S. PARK

Contents

1. Introduction
2. Dirichlet Characters
3. The Large Sieve
4. The Main Proof
Appendix A. Dirichlet L-Functions
Acknowledgments
References

1. Introduction

A central motivating question in analytic number theory is the following:

For a given x > 0, how many prime numbers are there less than x? More generally, how many prime numbers less than x satisfy a given condition, e.g., are contained in a given arithmetic progression ≡ a (mod q)?

The last question is trivial when a and q are not relatively prime, but when (a, q) = 1 does hold, one can seek an asymptotic formula for the counting function

\[ \pi(x; q, a) := \sum_{\substack{p \le x \\ p \equiv a \ (\mathrm{mod}\ q)}} 1. \]

In fact, we can reduce the problem of estimating π(x; q, a) to that of estimating Chebyshev's ϑ-function for the arithmetic progression ≡ a (mod q), defined by

\[ \vartheta(x; q, a) := \sum_{\substack{p \le x \\ p \equiv a \ (\mathrm{mod}\ q)}} \log p. \]

Indeed, we can use a Riemann-Stieltjes integral and integrate by parts to get

\[ \pi(x; q, a) = \int_{2^-}^{x} \frac{1}{\log t}\, d\vartheta(t; q, a) = \frac{\vartheta(x; q, a)}{\log x} + \int_{2}^{x} \frac{\vartheta(t; q, a)}{t \log^2 t}\, dt. \]

We can further reduce to estimating a sum that is close to ϑ(x; q, a) for all intents and purposes: Chebyshev's ψ-function for the arithmetic progression ≡ a (mod q), defined by

\[ \psi(x; q, a) = \sum_{\substack{n \le x \\ n \equiv a \ (\mathrm{mod}\ q)}} \Lambda(n), \tag{1.1} \]

Date: July 29, 2016.


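where Λ : Z>0 → R is the von Mangoldt function defined by

\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^r \text{ for some prime } p \text{ and some integer } r \ge 1, \\ 0 & \text{otherwise}. \end{cases} \]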
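To make the definitions concrete, the following is a minimal Python sketch that evaluates Λ(n) by trial division and ψ(x; q, a) by direct summation. The function names `von_mangoldt` and `psi` are illustrative choices, not notation from the text, and the approach is only sensible for small x.

```python
from math import log

def von_mangoldt(n):
    """Return log p if n = p^r for a prime p and r >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor; n is a prime power iff
            # dividing out every factor of p leaves 1.
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # no factor up to sqrt(n), so n itself is prime

def psi(x, q, a):
    """Chebyshev's psi-function for the progression n = a (mod q), summed directly."""
    return sum(von_mangoldt(n) for n in range(1, int(x) + 1) if (n - a) % q == 0)
```

For example, `psi(10, 4, 1)` sums Λ over n ∈ {1, 5, 9} and returns log 5 + log 3.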

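It is more natural to deal with ψ(x; q, a) than it is to deal with ϑ(x; q, a) or π(x; q, a). To see this, we note that the orthogonality of characters (Proposition 2.3 in the next section) allows us to write

\[ \psi(x; q, a) = \sum_{n \le x} \Lambda(n)\, 1_{q,a}(n) = \sum_{n \le x} \Lambda(n)\, \frac{1}{\varphi(q)} \sum_{\chi \in X_q} \overline{\chi}(a) \chi(n) = \frac{1}{\varphi(q)} \sum_{\chi \in X_q} \overline{\chi}(a)\, \psi(x, \chi), \tag{1.2} \]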

where Xq denotes the set of Dirichlet characters with period q, the indicator function $1_{q,a}(n)$ for the arithmetic progression ≡ a (mod q) is defined by

\[ 1_{q,a}(n) := \begin{cases} 1 & \text{if } n \equiv a \ (\mathrm{mod}\ q), \\ 0 & \text{otherwise}, \end{cases} \tag{1.3} \]

and ψ(x, χ) is defined by

\[ \psi(x, \chi) := \sum_{n \le x} \Lambda(n) \chi(n). \]

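We can then exploit the appearance of Λ(n) in the logarithmic derivative of L(s, χ) (see Definition A.1 in the appendix) to approximate ψ(x, χ) by the integral

\[ \begin{aligned} \frac{1}{2\pi i} \int_{c-iT}^{c+iT} \left( -\frac{L'(s, \chi)}{L(s, \chi)} \right) \frac{x^s}{s}\, ds &= \frac{1}{2\pi i} \int_{c-iT}^{c+iT} \sum_{n=1}^{\infty} \frac{\chi(n)\Lambda(n)}{n^s} \cdot \frac{x^s}{s}\, ds \\ &= \sum_{n=1}^{\infty} \chi(n)\Lambda(n)\, \frac{1}{2\pi i} \int_{c-iT}^{c+iT} \frac{1}{s} \left( \frac{x}{n} \right)^s ds, \end{aligned} \]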

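since setting c > 0 makes the inner integral on the right-hand side,

\[ \frac{1}{2\pi i} \int_{c-iT}^{c+iT} \frac{1}{s} \left( \frac{x}{n} \right)^s ds, \]

approximate the indicator function¹ $1_{>1}(x/n)$, where

\[ 1_{>1}(y) := \begin{cases} 0 & \text{if } 0 < y < 1, \\ \tfrac{1}{2} & \text{if } y = 1, \\ 1 & \text{if } y > 1. \end{cases} \tag{1.4} \]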

Thus, having the von Mangoldt function in the expression for ψ(x; q, a) allows us to use the analytic properties of L(s, χ) to asymptotically study ψ(x; q, a) and by extension, π(x; q, a).

The best currently known estimate for ψ(x; q, a) that is uniform across a range of values for q is the following, proven by Walfisz [23] as an application of a theorem of Siegel [14] regarding the hypothetical exceptional zero of L(s, χ) for real nonprincipal primitive χ.

Theorem 1.1 (Siegel-Walfisz). For any fixed A > 0, we have

\[ \psi(x; q, a) = \frac{x}{\varphi(q)} + O\left( x e^{-c\sqrt{\log x}} \right) \]

for all x ≥ 2 and $1 \le q \le (\log x)^A$, where c > 0 and the absolute constant depends only on A.

There is also a variant of the above result for the more fundamental sum ψ(x, χ).

Theorem 1.2. For any fixed A > 0, we have

\[ \psi(x, \chi) = 1_{\chi_0}(\chi)\, x + O\left( x e^{-c\sqrt{\log x}} \right) \]

for all x ≥ 2, $1 \le q \le (\log x)^A$, and χ ∈ Xq, where c > 0,

\[ 1_{\chi_0}(\chi) := \begin{cases} 1 & \text{if } \chi \text{ is principal}, \\ 0 & \text{otherwise}, \end{cases} \]

and the absolute constant depends only on A.

¹ The proof of this fact is similar to that of (4.3). For a reference, see [3, pp. 105–106].

For a detailed discussion of these results and their proofs, we direct the reader to [3, §22].

The unconditional error term $O(x e^{-c\sqrt{\log x}})$ is quite bad; it is worse than $x^{\varepsilon}$ for every ε < 1. This is especially unsettling, since we expect the error term to actually behave as $O(x^{1/2} (\log x)^2)$, the error term conditional under the Generalized Riemann Hypothesis (stated in Conjecture A.7 and henceforth denoted as GRH). Indeed, Titchmarsh [17] has shown that if GRH holds, then

\[ \psi(x, \chi) = 1_{\chi_0}(\chi)\, x + O\left( x^{1/2} (\log qx)^2 \right), \]

for all χ ∈ Xq, from which one gets

\[ \psi(x; q, a) = \frac{x}{\varphi(q)} + O\left( x^{1/2} (\log x)^2 \right) \tag{1.5} \]

uniformly for all q.

However, one can get an improved error term for ψ(x; q, a) by averaging over a range of q. More precisely, denote the error term for ψ(x; q, a) by

\[ E(x; q, a) := \psi(x; q, a) - \frac{x}{\varphi(q)}. \]

Furthermore, let

\[ E(x; q) := \max\{ |E(x; q, a)| : (a, q) = 1 \} \]

and

\[ E^*(x; q) := \max_{y \le x} E(y; q). \]

A result proven independently by Bombieri [2] and A. I. Vinogradov [20] states that averaging E∗(x; q) over a range of q gives an asymptotic order of growth that is more in line with the error term (1.5) that we expect from GRH.

Theorem 1.3 (Bombieri–Vinogradov). Fix A > 0. For all x ≥ 2 and all $Q \in [x^{1/2} (\log x)^{-A}, x^{1/2}]$, we have

\[ \sum_{q \le Q} E^*(x; q) \ll x^{1/2} Q (\log x)^5, \]

where the absolute constant depends only on A.

We note that this bound is trivial for Q sufficiently large. Indeed, for y ≥ 1 and q ∈ Z>0, there are at most $\frac{y}{q} + 1$ positive integers n such that n ≤ y and n ≡ a (mod q). Thus, applying the triangle inequality to (1.1) yields $\psi(y; q, a) \ll \frac{x \log x}{q}$ for y ≤ x and $q \le x^{1/2}$. Furthermore, since $\varphi(q) \gg \frac{q}{\log(q+1)}$ (for a reference, see [7]), we have $\frac{y}{\varphi(q)} \ll \frac{x \log(q+1)}{q} \ll \frac{x \log x}{q}$ for y ≤ x and $q \le x^{1/2}$. This overall shows that in the statement of Theorem 1.3, each term E∗(x; q) occurring in the sum is $\ll \frac{x \log x}{q}$ in magnitude. This gives us a trivial bound under the hypotheses of the theorem:

\[ \sum_{q \le Q} E^*(x; q) \ll \sum_{q \le Q} \frac{x \log x}{q} \ll x (\log x)(\log Q) \ll x (\log x)^2. \tag{1.6} \]

Thus, the statement of Theorem 1.3 is trivial for $Q \gg x^{1/2} (\log x)^{-3}$. Also, given the lower bound $x^{1/2} (\log x)^{-A} \le Q$, we observe that the bound given by Theorem 1.3 is $\gg x (\log x)^{5-A}$. It follows that Theorem 1.3 is nothing more than a reduction in the exponent of log x in the trivial bound (1.6). However, this reduction turns out to be essentially of the same strength as the estimate from assuming GRH and thus critical in applications, such as the work of Zhang [25] and Goldston et al. [6] on bounded prime gaps.

We first build up the basic theory of Dirichlet characters in Section 2, which will be necessary for using ψ(x, χ) to investigate ψ(x; q, a) as in (1.2), and ultimately in our proof of Theorem 1.3. A natural follow-up topic is a discussion of the associated Dirichlet L-functions, which are essential for motivating our line of inquiry (in particular, they comprise the central methodology used in the proof of the Siegel-Walfisz theorem, and of any modern prime-number-theorem-type result) but are not needed for our immediate purposes and thus included as an appendix. We then proceed to introduce in Section 3 the large sieve, which will be the key tool that we use for our main proof of Theorem 1.3 in Section 4.

2. Dirichlet Characters

Definition 2.1. Let q ∈ Z>0. A Dirichlet character of period q is a function χ : Z → C induced by a homomorphism (Z/qZ)× → C× (i.e., an irreducible character of (Z/qZ)×) in the following way:

[Commutative diagram: χ is the composite of the reduction map Z ↠ Z/qZ with a map Z/qZ → C that restricts to the given homomorphism (Z/qZ)× → C× along the inclusions (Z/qZ)× ↪ Z/qZ and C× ↪ C.]

Equivalently [3, p.2], it is a function χ : Z → C which

(1) is periodic modulo q, i.e., χ(n + q) = χ(n) for all n ∈ Z,
(2) is totally multiplicative, i.e., χ(mn) = χ(m)χ(n) for all m, n ∈ Z,
(3) satisfies χ(1) = 1, and
(4) satisfies χ(n) = 0 for all n ∈ Z such that (n, q) > 1.

As stated before, we denote the set of Dirichlet characters of period q (corresponding to the set of irreducible characters of (Z/qZ)×) by Xq. Since (Z/qZ)× is a finite abelian group, the irreducible characters of (Z/qZ)× are all one-dimensional and form a group (under multiplication) that is isomorphic to (Z/qZ)×. It immediately follows that Xq is a group in the same way. Note that the complex conjugate of a Dirichlet character of period q is a Dirichlet character of period q, and also that |Xq| = |(Z/qZ)×| = φ(q).

Example 2.2. The principal character of period q, denoted by

\[ \chi_0(n) := \begin{cases} 1 & \text{if } (n, q) = 1, \\ 0 & \text{otherwise}, \end{cases} \]

is a Dirichlet character of period q. For q = 1 or 2, the principal character is clearly the only Dirichlet character. However, this is not true for q > 2. For q = 3, there are precisely φ(3) = 2 distinct Dirichlet characters.

n (mod 3)    0    1    2
χ0(n)        0    1    1
χ1(n)        0    1   −1

For q = 4, there are also precisely two distinct Dirichlet characters, since φ(4) = 2.

n (mod 4)    0    1    2    3
χ0(n)        0    1    0    1
χ1(n)        0    1    0   −1

For q = 5, there are precisely φ(5) = 4 distinct Dirichlet characters. Note that unlike in the previous cases, we have complex (i.e., non-real) characters, χ1 and χ3, which are complex conjugates.

n (mod 5)    0    1    2    3    4
χ0(n)        0    1    1    1    1
χ1(n)        0    1    i   −i   −1
χ2(n)        0    1   −1   −1    1
χ3(n)        0    1   −i    i   −1

In the above examples, one can observe that for any a ∈ {0, 1, . . . , q − 1} coprime to q, the function $1_{q,a}(n)$ (defined in the introduction) can be written as a linear combination of the Dirichlet characters of period q. For example,

\[ 1_{5,2}(n) = \frac{1}{4}\chi_0(n) - \frac{i}{4}\chi_1(n) - \frac{1}{4}\chi_2(n) + \frac{i}{4}\chi_3(n). \]

In fact, by the orthogonality of characters, this property holds for any period q.

Proposition 2.3 (Orthogonality). For any q ∈ Z>0 and a ∈ {0, . . . , q − 1} (mod q) with (a, q) = 1, we have

\[ 1_{q,a}(n) = \frac{1}{\varphi(q)} \sum_{\chi \in X_q} \overline{\chi}(a) \chi(n). \]

Proof. Since (Z/qZ)× is an abelian group, every element comprises its own conjugacy class. Thus, the claim follows immediately from column orthogonality. □
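The orthogonality relation can be checked numerically for q = 5 using the character table from Example 2.2. The following Python sketch is only an illustration; the names `CHARACTERS_MOD_5` and `indicator_via_characters` are hypothetical, and the check assumes (a, q) = 1.

```python
from math import gcd

# Character table modulo 5 from Example 2.2; column n holds chi(n) for n = 0, ..., 4.
CHARACTERS_MOD_5 = [
    [0, 1, 1, 1, 1],       # chi_0
    [0, 1, 1j, -1j, -1],   # chi_1
    [0, 1, -1, -1, 1],     # chi_2
    [0, 1, -1j, 1j, -1],   # chi_3
]

def indicator_via_characters(a, n, table=CHARACTERS_MOD_5):
    """Evaluate (1/phi(q)) * sum over chi of conj(chi(a)) * chi(n)."""
    q = len(table[0])
    phi_q = sum(1 for m in range(q) if gcd(m, q) == 1)
    total = sum(complex(chi[a % q]).conjugate() * chi[n % q] for chi in table)
    return total / phi_q
```

With a = 2, the four coefficients conj(χ(2))/4 are 1/4, −i/4, −1/4, and i/4, which are the coefficients in the expansion of the indicator of the progression ≡ 2 (mod 5).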

It is possible for a Dirichlet character to be induced by another Dirichlet character with a smaller period. We formalize this notion as follows.

Definition 2.4. Let χ ∈ Xq. For q1 ∈ Z>0, we say that χ(n) restricted by (n, q) = 1 has period q1 if it has the property that χ(n + q1) = χ(n) for all n such that (n, q) = (n + q1, q) = 1. Furthermore, let c(χ) denote the conductor of χ, defined to be the least q1 ∈ Z>0 such that χ(n) restricted by (n, q) = 1 has period q1. We say that χ is primitive if c(χ) = q, and imprimitive precisely if c(χ) < q.

Example 2.5. The principal character of period 1 is trivially primitive. On the other hand, any principal character of period q > 1 is imprimitive, since its conductor is clearly 1.

For another example, let χ ∈ X15 be the Dirichlet character given by

n       0   1   2   3   4   5   6   7   8   9   10   11   12   13   14
χ(n)    0   1   i   0  −1   0   0   i  −i   0    0    1    0   −i   −1

Since χ(1) = χ(11), χ(2) = χ(7), χ(4) = χ(14), and χ(8) = χ(13), we have that χ(n) restricted by (n, 15) = 1 has period 5. It is immediate to check that this is the least such period, i.e., c(χ) = 5. Thus, χ is imprimitive.
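The conductor in this example can also be found by brute force. The sketch below is a hypothetical helper, not part of the text: it stores χ as a list of values on 0, ..., q − 1, tests a candidate period q1 by comparing χ on all pairs of units of Z/qZ that agree modulo q1, and searches only over divisors of q (this assumes the fact that c(χ) divides every restricted period, in particular q).

```python
from math import gcd

# Values chi(0), ..., chi(14) of the character of period 15 tabulated above.
CHI_15 = [0, 1, 1j, 0, -1, 0, 0, 1j, -1j, 0, 0, 1, 0, -1j, -1]

def has_restricted_period(chi, q1):
    """Check that chi(m) = chi(n) whenever m = n (mod q1) and both are coprime to q."""
    q = len(chi)
    units = [n for n in range(q) if gcd(n, q) == 1]
    return all(chi[m] == chi[n] for m in units for n in units if (m - n) % q1 == 0)

def conductor(chi):
    """Least divisor d of q such that chi restricted by (n, q) = 1 has period d."""
    q = len(chi)
    return min(d for d in range(1, q + 1)
               if q % d == 0 and has_restricted_period(chi, d))
```

Here `conductor(CHI_15)` evaluates to 5: the candidates 1 and 3 fail because χ(1) ≠ χ(4), while 5 passes.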

For q1 ∈ Z>0 such that χ(n) restricted by (n, q) = 1 has period q1, we have c(χ) | q1. Indeed, q2 := (c(χ), q1) also satisfies the property that χ(n) restricted by (n, q) = 1 has period q2, but since c(χ) is the least period for which this is true, we necessarily have q2 = c(χ), i.e., c(χ) | q1.

As we originally set out to do, we will now show that every imprimitive character is induced by a primitive character with a smaller period.

Proposition 2.6. Let χ ∈ Xq. There exists a unique Dirichlet character $\chi_1 \in X_{c(\chi)}$ that induces χ, i.e., such that

\[ \chi(n) = \begin{cases} \chi_1(n) & \text{if } (n, q) = 1, \\ 0 & \text{otherwise}. \end{cases} \]

Furthermore, χ1 is primitive.

Proof. Such a Dirichlet character χ1 satisfies χ1(n) = χ1(n + tc(χ)) = χ(n + tc(χ)) for all n, t ∈ Z such that (n, c(χ)) = (n + tc(χ), q) = 1, since χ1 has period c(χ). This proves uniqueness.

We prove existence by construction. For each n ∈ Z satisfying (n, c(χ)) = 1, we choose t ∈ Z such that (n + tc(χ), q) = 1, which allows us to set χ1(n) := χ(n + tc(χ)). This process completely defines $\chi_1 \in X_{c(\chi)}$, provided that such an integer t exists for each n with (n, c(χ)) = 1. It suffices to show that given such an integer n, there exists t such that p ∤ n + tc(χ) for every p ∈ P, where P is the set of primes that divide q but not c(χ). But P is finite, and for each p ∈ P, the congruence n + tc(χ) ≡ 1 (mod p) has a (unique) solution t (mod p). Thus, the Chinese remainder theorem states that there exists an integer t such that n + tc(χ) ≡ 1 (mod p) for every p ∈ P, which ensures that p ∤ n + tc(χ) for every p ∈ P, as needed.

It is standard to check that χ1 is a Dirichlet character of period c(χ). It remains to show that χ1 is primitive. Suppose that q2 ∈ Z>0 satisfies the property that χ1 restricted by (n, c(χ)) = 1 has period q2. Since (n, q) = 1 ⟹ (n, c(χ)) = 1, it follows that χ(n) restricted by (n, q) = 1 has period q2, which must be greater than or equal to c(χ). Thus, q2 ≥ c(χ), which shows that χ1 is primitive. □

Example 2.7. It is easy to check that the Dirichlet character χ ∈ X15 from Example 2.5 (with conductor c(χ) = 5) is induced by the primitive character χ1 ∈ X5 defined in Example 2.2.

The following is a useful criterion for ascertaining that a given positive integer q1 is a period of χ ∈ Xq restricted by (n, q) = 1.

Lemma 2.8. Let χ ∈ Xq and q1 ∈ Z>0. Then χ(n) restricted by (n, q) = 1 has period q1 if and only if χ(n) = 1 for all n ∈ Z such that n ≡ 1 (mod q1) and (n, q) = 1.

Proof. The forward direction is immediate, so we now prove the backward direction. Let q2 = (q, q1). There exist integers x, y such that xq + yq1 = q2. For any integer n such that n ≡ 1 (mod q2) and (n, q) = 1, we can write n = 1 + aq2 for some integer a, which yields n = 1 + a(xq + yq1) ≡ 1 + ayq1 (mod q). Thus, χ(1 + ayq1) = χ(n) ≠ 0, which gives us that (1 + ayq1, q) = 1. Furthermore, by our hypothesis, χ(1 + ayq1) = 1. We have shown that χ(n) = 1 for all n ∈ Z such that n ≡ 1 (mod q2) and (n, q) = 1, i.e., that the hypothesis holds even when we replace q1 with q2.

Let h, k be integers such that (h, q) = (k, q) = 1 and h ≡ k (mod q2). Viewing h, k as elements of (Z/qZ)×, there exists a unique element m ∈ (Z/qZ)× such that h ≡ mk (mod q), which implies that h ≡ mk (mod q2) since q2 | q. Since (k, q) = 1, it follows that m ≡ 1 (mod q2), and thus χ(m) = 1. This shows that χ(h) = χ(mk) = χ(m)χ(k) = χ(k). Thus, χ(n) restricted by (n, q) = 1 has period q2, and since q2 | q1, χ(n) restricted by (n, q) = 1 has period q1. □

So far, we have discussed the properties of Dirichlet characters, which correspond precisely to the characters of the multiplicative group (Z/qZ)×. In a similar vein, we can also consider the characters of the additive group Z/qZ. For x ∈ R, let e(x) := exp(2πix). Note then that the characters of the additive group Z/qZ are precisely given by

\[ n \mapsto e\left( \frac{mn}{q} \right), \qquad m \in \{0, 1, \ldots, q - 1\}. \]

It will be useful to write the multiplicative characters in terms of the additive characters. To this end, we introduce the following.


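Definition 2.9. For χ ∈ Xq, define the Gauss sum of χ by

\[ \tau(\chi) := \sum_{m \in \mathbb{Z}/q\mathbb{Z}} \chi(m)\, e\left( \frac{m}{q} \right). \]

Suppose that (n, q) = 1, i.e., that n is invertible in Z/qZ. Then,

\[ \overline{\chi}(n)\tau(\chi) = \overline{\chi}(n) \sum_{m \in \mathbb{Z}/q\mathbb{Z}} \chi(m)\, e\left( \frac{m}{q} \right) = \sum_{m \in \mathbb{Z}/q\mathbb{Z}} \chi(n^{-1}m)\, e\left( \frac{m}{q} \right) = \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \chi(h)\, e\left( \frac{hn}{q} \right). \tag{2.1} \]

This gives us our desired expression for a character in terms of exponentials, so long as τ(χ) ≠ 0. In fact, if χ is primitive, we can show that the equality

\[ \overline{\chi}(n)\tau(\chi) = \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \chi(h)\, e\left( \frac{hn}{q} \right) \tag{2.2} \]

holds for any n.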

Lemma 2.10. Let χ ∈ Xq be primitive. For any n ∈ Z/qZ such that (n, q) > 1, we have

\[ \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \chi(h)\, e\left( \frac{hn}{q} \right) = 0. \]

In particular, the equality (2.2) holds for all n.

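Proof. Let $a = \frac{q}{(n,q)}$. We can write the elements h ∈ Z/qZ as h = ab + r for b ∈ {0, 1, . . . , (n, q) − 1} and r ∈ {0, 1, . . . , a − 1}, which yields

\[ \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \chi(h)\, e\left( \frac{hn}{q} \right) = \sum_{r=0}^{a-1} \sum_{b=0}^{(n,q)-1} \chi(ab + r)\, e\left( \frac{(ab + r)n}{q} \right) = \sum_{r=0}^{a-1} e\left( \frac{rn}{q} \right) \sum_{b=0}^{(n,q)-1} \chi(ab + r). \]

Note that χ(ab + r) is periodic with period (n, q), which allows us to rewrite the inner sum as

\[ \sum_{b=0}^{(n,q)-1} \chi(ab + r) = \frac{(n, q)}{q} \sum_{b=0}^{q-1} \chi(ab + r) = \frac{(n, q)}{q} \sum_{b \in \mathbb{Z}/q\mathbb{Z}} \chi(ab + r). \]

Thus, it suffices to show that $S := \sum_{b \in \mathbb{Z}/q\mathbb{Z}} \chi(ab + r)$ vanishes.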

For c ∈ Z/qZ, the congruence ax + r ≡ c (mod q) is solvable if and only if (a, q) | c − r, and if this is true, the congruence is equivalent to

\[ \frac{a}{(a, q)}\, x \equiv \frac{c - r}{(a, q)} \pmod{\frac{q}{(a, q)}}, \]

which has a unique solution modulo $\frac{q}{(a,q)}$, since $\left( \frac{a}{(a,q)}, \frac{q}{(a,q)} \right) = 1$. Thus, as x ranges through Z/qZ, ax + r ranges through precisely the congruence classes modulo q that are congruent to r (mod (a, q)), and each such congruence class occurs precisely (a, q) times. Thus,

\[ S = \sum_{b \in \mathbb{Z}/q\mathbb{Z}} \chi(ab + r) = (a, q) \sum_{\substack{d \in \mathbb{Z}/q\mathbb{Z} \\ d \equiv r \ (\mathrm{mod}\ (a,q))}} \chi(d). \]

Let m ∈ (Z/qZ)× such that m ≡ 1 (mod (a, q)). Then,

\[ \chi(m) S = (a, q) \sum_{\substack{d \in \mathbb{Z}/q\mathbb{Z} \\ d \equiv r \ (\mathrm{mod}\ (a,q))}} \chi(m)\chi(d) = (a, q) \sum_{\substack{d \in \mathbb{Z}/q\mathbb{Z} \\ d \equiv r \ (\mathrm{mod}\ (a,q))}} \chi(md) = (a, q) \sum_{\substack{c \in \mathbb{Z}/q\mathbb{Z} \\ c \equiv r \ (\mathrm{mod}\ (a,q))}} \chi(c) = S. \]

Suppose for the sake of contradiction that S ≠ 0. Then, the above shows that χ(m) = 1 for all m ∈ (Z/qZ)× such that m ≡ 1 (mod (a, q)). Since χ is primitive, it follows from Lemma 2.8 that q = (a, q), i.e., q | a. But this is impossible, since (n, q) > 1. Thus, S = 0, as needed. □

For primitive χ, we can also show that τ(χ) ≠ 0, which means we can divide both sides of the equality (2.2) by τ(χ) to get our desired identity expressing a character in terms of exponentials.

Lemma 2.11. Let χ ∈ Xq be primitive. Then

\[ |\tau(\chi)| = q^{1/2}. \]

Proof. Multiplying the equality (2.2) with its complex conjugate counterpart, we get

\[ |\chi(n)|^2 |\tau(\chi)|^2 = \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \sum_{k \in \mathbb{Z}/q\mathbb{Z}} \chi(h)\overline{\chi}(k)\, e\left( \frac{n(h - k)}{q} \right). \]

Sum both sides of this equality over n ∈ Z/qZ. By row orthogonality of characters, the resulting left-hand side is $\varphi(q) |\tau(\chi)|^2$. For each (h, k) ∈ (Z/qZ)², the sum over n ∈ Z/qZ of the summand corresponding to (h, k), i.e.,

\[ \sum_{n \in \mathbb{Z}/q\mathbb{Z}} \chi(h)\overline{\chi}(k)\, e\left( \frac{n(h - k)}{q} \right) = \chi(h)\overline{\chi}(k) \sum_{n \in \mathbb{Z}/q\mathbb{Z}} e\left( \frac{n(h - k)}{q} \right), \]

is equal to 0 if h − k ≢ 0 (mod q) and equal to $\chi(h)\overline{\chi}(k)\, q = |\chi(h)|^2 q$ if h − k ≡ 0 (mod q); this is again a consequence of row orthogonality. Thus, summing over all (h, k) yields the equality

\[ \varphi(q) |\tau(\chi)|^2 = q \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \chi(h)\overline{\chi}(h) = q \sum_{h \in (\mathbb{Z}/q\mathbb{Z})^\times} 1 = q\varphi(q). \]

This proves our claim. □
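As a numerical sanity check on the Gauss sum, the following Python sketch computes τ(χ) for the characters of period 5 tabulated in Example 2.2 and compares |τ(χ)| with √5. It also evaluates the exponential sum Σ χ(h) e(hn/q) so that its relation to τ(χ) can be tested for every residue n. The names `gauss_sum` and `twisted_sum` are illustrative, and characters are represented as lists of values on 0, ..., q − 1.

```python
import cmath

# Characters of period 5 from Example 2.2 (values at n = 0, ..., 4).
CHARACTERS_MOD_5 = [
    [0, 1, 1, 1, 1],       # chi_0 (principal, hence imprimitive)
    [0, 1, 1j, -1j, -1],   # chi_1
    [0, 1, -1, -1, 1],     # chi_2
    [0, 1, -1j, 1j, -1],   # chi_3
]

def e(x):
    """The additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def twisted_sum(chi, n):
    """Sum over h in Z/qZ of chi(h) * e(h*n/q)."""
    q = len(chi)
    return sum(chi[h] * e(h * n / q) for h in range(q))

def gauss_sum(chi):
    """tau(chi), i.e. the twisted sum at n = 1."""
    return twisted_sum(chi, 1)
```

For the three nonprincipal (primitive) characters, `abs(gauss_sum(chi))` agrees with 5 ** 0.5 up to rounding, whereas for the principal character the sum equals −1, so primitivity is genuinely needed.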

Suppose that χ ∈ Xq is nonprincipal. Then, since $\sum_{n=0}^{q-1} \chi(n) = 0$ by row orthogonality, it immediately follows that $\left| \sum_{n=M+1}^{M+N} \chi(n) \right| < q$ for any M ∈ Z and N ∈ Z>0. However, it is possible to obtain a tighter bound on the character sum. In 1918, Pólya [10] and I. M. Vinogradov [21] (not to be confused with A. I. Vinogradov, who is the “Vinogradov” in the subject of this exposition) independently proved the better upper bound of $O(q^{1/2} \log q)$, which will be useful for our main proof.

Theorem 2.12 (Pólya–Vinogradov). Let χ ∈ Xq be nonprincipal. For any M ∈ Z and N ∈ Z>0,

\[ \left| \sum_{n=M+1}^{M+N} \chi(n) \right| < 2 q^{1/2} \log q. \]

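Proof. We follow the elementary argument of Schur [13].

Suppose that χ is primitive. Since it is also nonprincipal, we necessarily have q ≥ 3 and χ(q) = 0. By Lemma 2.10, applied to the (also primitive) character $\overline{\chi}$, we have

\[ \chi(n) = \frac{1}{\tau(\overline{\chi})} \sum_{a=0}^{q-1} \overline{\chi}(a)\, e\left( \frac{an}{q} \right) = \frac{1}{\tau(\overline{\chi})} \sum_{a=1}^{q-1} \overline{\chi}(a)\, e\left( \frac{an}{q} \right), \]

which implies that

\[ \sum_{n=M+1}^{M+N} \chi(n) = \frac{1}{\tau(\overline{\chi})} \sum_{a=1}^{q-1} \overline{\chi}(a) \sum_{n=M+1}^{M+N} e\left( \frac{an}{q} \right) = \frac{1}{\tau(\overline{\chi})} \sum_{a=1}^{q-1} \overline{\chi}(a) \cdot \frac{e\left( \frac{a(M+N+1)}{q} \right) - e\left( \frac{a(M+1)}{q} \right)}{e\left( \frac{a}{q} \right) - 1}. \]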

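Recall from Lemma 2.11 that $|\tau(\overline{\chi})| = q^{1/2}$. Thus, taking absolute values and applying the triangle inequality, we have

\[ \begin{aligned} \left| \sum_{n=M+1}^{M+N} \chi(n) \right| &\le q^{-1/2} \sum_{a=1}^{q-1} \frac{2}{\left| e\left( \frac{a}{q} \right) - 1 \right|} = q^{-1/2} \sum_{a=1}^{q-1} \frac{1}{\left| \frac{1}{2}\left( e\left( \frac{a}{2q} \right) - e\left( -\frac{a}{2q} \right) \right) \right| \left| e\left( \frac{a}{2q} \right) \right|} \\ &= q^{-1/2} \sum_{a=1}^{q-1} \frac{1}{\left| \sin\left( \frac{\pi a}{q} \right) \right|} = q^{-1/2} \sum_{a=1}^{q-1} \frac{1}{\sin\left( \frac{\pi a}{q} \right)}. \end{aligned} \]

Since $\frac{1}{\sin(\pi x)}$ is convex on (0, 1), its definite integral is lower-bounded by any midpoint Riemann sum approximation. So, since $\frac{1}{q} \sum_{a=1}^{q-1} \frac{1}{\sin(\pi a / q)}$ is a midpoint Riemann sum approximation of $\int_{1/(2q)}^{1 - 1/(2q)} \frac{dx}{\sin(\pi x)}$, we can make the upper bound

\[ \left| \sum_{n=M+1}^{M+N} \chi(n) \right| \le q^{-1/2} \cdot q \cdot \frac{1}{q} \sum_{a=1}^{q-1} \frac{1}{\sin\left( \frac{\pi a}{q} \right)} \le q^{1/2} \int_{1/(2q)}^{1 - 1/(2q)} \frac{dx}{\sin(\pi x)} = 2 q^{1/2} \int_{1/(2q)}^{1/2} \frac{dx}{\sin(\pi x)}. \]

Note that sin(πx) > 2x for 0 < x < 1/2, so the above is

\[ < 2 q^{1/2} \int_{1/(2q)}^{1/2} \frac{dx}{2x} = q^{1/2} \log q. \]

This proves the claim for χ primitive. In fact, we have the tighter bound

\[ \left| \sum_{n=M+1}^{M+N} \chi(n) \right| < q^{1/2} \log q. \tag{2.3} \]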
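The bound for primitive characters can be tested numerically. The sketch below uses the Legendre symbol modulo an odd prime p, which is a standard example of a primitive nonprincipal character of period p (this choice of test character is an assumption of the example, not something taken from the text). Because the sum over a full period vanishes, the largest incomplete sum over any interval equals the spread between the maximum and minimum prefix sums over one period.

```python
from math import log, sqrt

def legendre_symbol(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)   # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r

def max_incomplete_sum(p):
    """Maximum of |sum_{n=M+1}^{M+N} (n/p)| over all integers M and N > 0."""
    prefix, lowest, highest = 0, 0, 0
    for n in range(1, p):
        prefix += legendre_symbol(n, p)
        lowest, highest = min(lowest, prefix), max(highest, prefix)
    # Any interval sum is a difference of two prefix sums (taken modulo the period).
    return highest - lowest
```

One can then compare `max_incomplete_sum(p)` against `sqrt(p) * log(p)` for a range of odd primes p.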

Now, suppose that χ is not primitive. Let $\chi_1 \in X_{q_1}$ be the primitive character that induces χ, and write q = q1r. Then,

\[ \sum_{n=M+1}^{M+N} \chi(n) = \sum_{\substack{n=M+1 \\ (n,r)=1}}^{M+N} \chi_1(n). \tag{2.4} \]

We recall the following identity², valid when m is a positive integer:

\[ \sum_{d \mid m} \mu(d) = \begin{cases} 1 & \text{if } m = 1, \\ 0 & \text{if } m > 1, \end{cases} \]

where the Möbius function μ : Z>0 → {−1, 0, 1} is defined by

\[ \mu(n) := \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n \text{ has repeated prime factors}, \\ (-1)^k & \text{if } n \text{ is the product of } k \text{ distinct prime factors}. \end{cases} \]
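The Möbius function and the divisor-sum identity are easy to check by machine. The following Python sketch (with the illustrative names `mobius` and `divisor_sum_of_mobius`) computes μ(n) by trial division and sums it over the divisors of m.

```python
def mobius(n):
    """Moebius function: 0 if n has a repeated prime factor, else (-1)^(number of primes)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def divisor_sum_of_mobius(m):
    """Sum of mu(d) over all positive divisors d of m."""
    return sum(mobius(d) for d in range(1, m + 1) if m % d == 0)
```

For instance, the divisors of 6 contribute μ(1) + μ(2) + μ(3) + μ(6) = 1 − 1 − 1 + 1 = 0.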

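Using the above, we see that (2.4) is equal to

\[ \sum_{n=M+1}^{M+N} \chi_1(n) \sum_{d \mid (n,r)} \mu(d) = \sum_{d \mid r} \mu(d) \sum_{\substack{n=M+1 \\ d \mid n}}^{M+N} \chi_1(n) = \sum_{d \mid r} \mu(d)\chi_1(d) \sum_{\frac{M+1}{d} \le m \le \frac{M+N}{d}} \chi_1(m). \]

Taking absolute values and applying (2.3) to the inner sum of the right-hand side above, we obtain

\[ \left| \sum_{n=M+1}^{M+N} \chi(n) \right| \le \sum_{d \mid r} |\mu(d)\chi_1(d)| \cdot \left| \sum_{\frac{M+1}{d} \le m \le \frac{M+N}{d}} \chi_1(m) \right| < q_1^{1/2} \log q_1 \sum_{d \mid r} 1 \le q_1^{1/2} \log q_1 \cdot 2 \sum_{\substack{d \mid r \\ d \le \sqrt{r}}} 1, \]

where we use the bijection between the divisors of r that are less than √r and the divisors that are greater than √r, given by $d \mapsto \frac{r}{d}$. We can use the trivial bound on the inner sum of the right-hand side above to get

\[ \left| \sum_{n=M+1}^{M+N} \chi(n) \right| < q_1^{1/2} \log q_1 \cdot 2 \sum_{\substack{d \mid r \\ d \le \sqrt{r}}} 1 \le q_1^{1/2} \log q_1 \cdot 2 r^{1/2} = 2 q^{1/2} \log q_1 \le 2 q^{1/2} \log q. \]

□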

3. The Large Sieve

Recall Bessel's inequality, the statement that given orthonormal vectors φ1, . . . , φR of an inner product space V over C, the inequality

\[ \sum_{r=1}^{R} |(\xi, \phi_r)|^2 \le \|\xi\|^2 \]

holds for any ξ ∈ V. However, in number theory, one often deals with vectors that are not necessarily orthonormal. Thus, it makes sense to seek a general bound analogous to Bessel's inequality, i.e., an inequality of the form

\[ \sum_{r=1}^{R} |(\xi, \phi_r)|^2 \le A \|\xi\|^2 \]

for some A that depends on φ1, . . . , φR. Ideally, we would like the constant A to be close to 1 when the vectors φr are close to being orthonormal in some sense. To this end, we prove the following.

² To prove this, note that for $p_1^{e_1} \cdots p_j^{e_j} \ne 1$ (i.e., j ≥ 1), we have

\[ \begin{aligned} \sum_{d \mid p_1^{e_1} \cdots p_j^{e_j}} \mu(d) &= \mu(1) + \mu(p_1) + \ldots + \mu(p_j) + \mu(p_1 p_2) + \ldots + \mu(p_{j-1} p_j) + \ldots + \mu(p_1 \cdots p_j) \\ &= \binom{j}{0} + \binom{j}{1}(-1) + \binom{j}{2}(-1)^2 + \ldots + \binom{j}{j}(-1)^j = (1 - 1)^j = 0. \end{aligned} \]

Proposition 3.1. Let φ1, . . . , φR be arbitrary vectors of an inner product space V over C. Then, for

\[ A := \max_{r \in \{1, \ldots, R\}} \sum_{s=1}^{R} |(\phi_r, \phi_s)|, \]

the inequality

\[ \sum_{r=1}^{R} |(\xi, \phi_r)|^2 \le A \|\xi\|^2 \]

holds for all ξ ∈ V.

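Proof. Let u1, . . . , uR ∈ C. Since $0 \le (|u_r| - |u_s|)^2 = |u_r|^2 - |2 u_r \overline{u_s}| + |u_s|^2$, we have

\[ \left| \sum_{r=1}^{R} \sum_{s=1}^{R} u_r \overline{u_s} (\phi_r, \phi_s) \right| \le \sum_{r=1}^{R} \sum_{s=1}^{R} \frac{1}{2} \left( |u_r|^2 + |u_s|^2 \right) |(\phi_r, \phi_s)|. \]

The summands such that r = s are $|u_r|^2 |(\phi_r, \phi_r)|$, and the summands such that r ≠ s can be symmetrically paired off so that the sum of each pair's summands is

\[ \frac{1}{2} \left( |u_a|^2 + |u_b|^2 \right) |(\phi_a, \phi_b)| + \frac{1}{2} \left( |u_b|^2 + |u_a|^2 \right) |(\phi_b, \phi_a)| = |u_a|^2 |(\phi_a, \phi_b)| + |u_b|^2 |(\phi_b, \phi_a)|. \]

Thus, we can rearrange the double sum as

\[ \sum_{r=1}^{R} \sum_{s=1}^{R} \frac{1}{2} \left( |u_r|^2 + |u_s|^2 \right) |(\phi_r, \phi_s)| = \sum_{r=1}^{R} |u_r|^2 \sum_{s=1}^{R} |(\phi_r, \phi_s)|, \]

which overall gives us

\[ \left| \sum_{r=1}^{R} \sum_{s=1}^{R} u_r \overline{u_s} (\phi_r, \phi_s) \right| \le \sum_{r=1}^{R} |u_r|^2 \sum_{s=1}^{R} |(\phi_r, \phi_s)| \le \left( \max_{r \in \{1, \ldots, R\}} \sum_{s=1}^{R} |(\phi_r, \phi_s)| \right) \sum_{r=1}^{R} |u_r|^2 = A \sum_{r=1}^{R} |u_r|^2. \]

But by Boas' criterion [1], the above implies that the inequality

\[ \sum_{r=1}^{R} |(\xi, \phi_r)|^2 \le A \|\xi\|^2 \]

holds for all ξ ∈ V. Indeed, following Boas' argument, we have

\[ \begin{aligned} 0 \le \left\| \xi - \sum_{r=1}^{R} u_r \phi_r \right\|^2 &= \left( \xi - \sum_{r=1}^{R} u_r \phi_r,\ \xi - \sum_{r=1}^{R} u_r \phi_r \right) \\ &= \|\xi\|^2 - \sum_{r=1}^{R} \overline{u_r} (\xi, \phi_r) - \sum_{r=1}^{R} u_r \overline{(\xi, \phi_r)} + \sum_{r=1}^{R} \sum_{s=1}^{R} u_r \overline{u_s} (\phi_r, \phi_s) \\ &\le \|\xi\|^2 - 2 \operatorname{Re} \left( \sum_{r=1}^{R} \overline{u_r} (\xi, \phi_r) \right) + A \sum_{r=1}^{R} |u_r|^2, \end{aligned} \]

upon which we can take $u_r = \frac{1}{A} (\xi, \phi_r)$ to see that

\[ 0 \le \|\xi\|^2 - \frac{2}{A} \sum_{r=1}^{R} |(\xi, \phi_r)|^2 + \frac{1}{A} \sum_{r=1}^{R} |(\xi, \phi_r)|^2 = \|\xi\|^2 - \frac{1}{A} \sum_{r=1}^{R} |(\xi, \phi_r)|^2, \]

which is equivalent to our desired statement. □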
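The generalized Bessel inequality can be spot-checked on random vectors in C^d. The Python sketch below is only an illustration under stated assumptions: vectors are plain lists of complex numbers, the inner product is taken to be linear in the first argument and conjugate-linear in the second, and the helper names (`inner`, `bessel_constant`, `bessel_sides`, `random_vector`) are hypothetical.

```python
import random

def inner(x, y):
    """Inner product on C^d: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def bessel_constant(phis):
    """A = max over r of sum over s of |(phi_r, phi_s)|."""
    return max(sum(abs(inner(pr, ps)) for ps in phis) for pr in phis)

def bessel_sides(xi, phis):
    """Return (sum_r |(xi, phi_r)|^2, A * ||xi||^2)."""
    lhs = sum(abs(inner(xi, p)) ** 2 for p in phis)
    rhs = bessel_constant(phis) * inner(xi, xi).real
    return lhs, rhs

def random_vector(dim, rng):
    """A vector with independent Gaussian real and imaginary parts."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
```

When the φr are orthonormal, `bessel_constant` returns 1 and the inequality reduces to the classical Bessel inequality.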

We now introduce a notion that is closely related to the above discussion: the large sieve. This idea was first proposed by Linnik in [8], and soon after, he used the large sieve to investigate the distribution of quadratic nonresidues. However, it was Rényi who methodically developed the large sieve, using a probabilistic point of view. The large sieve eventually became a standard tool for number theorists, owing largely to Roth's fundamental paper [12] that obtained essentially optimal estimates for the large sieve, and to Bombieri's application of a further refined large sieve to the distribution of primes in arithmetic progressions [2], which is the subject of this exposition. Davenport and Halberstam [4] have since formalized the definition of the large sieve in the following analytic language.

Definition 3.2. Let N ∈ Z>0 and δ > 0. For a real-valued function ∆ := ∆(N, δ) > 0, we say that the large sieve inequality holds with ∆ precisely if the following holds for any integer M and sequence of complex numbers $\{a_n\}_{n \in \mathbb{Z}_{\ge 0}}$:

Denoting

\[ S(\alpha) := \sum_{n=M+1}^{M+N} a_n e(n\alpha), \]

for any R ∈ Z>0 and any α1, . . . , αR ∈ R such that³ r ≠ s ⟹ ‖αr − αs‖ ≥ δ, we have

\[ \sum_{r=1}^{R} |S(\alpha_r)|^2 \le \Delta \sum_{n=M+1}^{M+N} |a_n|^2. \]

In fact, to show that the large sieve inequality holds with ∆, it suffices to check that the above condition holds for one choice of M. This can be seen by defining

\[ T_K(\alpha) := \sum_{n=K+1}^{K+N} a_{M-K+n} \cdot e(n\alpha) = e((K - M)\alpha) S(\alpha) \]

for each K ∈ Z and noting that |S(α)| = |TK(α)|. Thus, showing that the aforementioned condition holds for M is equivalent to showing it for all K ∈ Z.

A quick argument shows that a large sieve ∆ must necessarily be $\gg N + \frac{1}{\delta}$.

Proposition 3.3. Suppose that ∆ satisfies the large sieve inequality for N ∈ Z>0 and δ > 0. Then,

\[ \Delta \ge \max\left\{ N, \frac{1}{\delta} - 1 \right\}. \]

Proof. Setting M = 0, an = 1 for all n ∈ Z, R = 1, and α1 = 0 in Definition 3.2, ∆ necessarily satisfies $|S(0)|^2 \le \Delta \cdot N$. Since S(0) = N, we have N ≤ ∆, as needed.

To prove the other required bound $\Delta \ge \frac{1}{\delta} - 1$, we can suppose that δ < 1, since otherwise we are done. For any R ∈ Z>0 and any α1, . . . , αR ∈ R, we have

\[ \int_0^1 \sum_{r=1}^{R} |S(\alpha_r + \beta)|^2\, d\beta = R \int_0^1 |S(\beta)|^2\, d\beta = R \sum_{n=M+1}^{M+N} |a_n|^2. \]

Furthermore, by the mean value theorem for integrals, there exists γ ∈ [0, 1] such that

\[ \sum_{r=1}^{R} |S(\alpha_r + \gamma)|^2 = \int_0^1 \sum_{r=1}^{R} |S(\alpha_r + \beta)|^2\, d\beta = R \sum_{n=M+1}^{M+N} |a_n|^2. \]

Note that ‖αr + γ − (αs + γ)‖ = ‖αr − αs‖. Thus, the above implies that R ≤ ∆ as long as α1, . . . , αR satisfy r ≠ s ⟹ ‖αr − αs‖ ≥ δ. In particular, note that $R = \lfloor \frac{1}{\delta} \rfloor$ allows for such a choice of α1, . . . , αR; for example, one can define αr = rδ. Thus, $\Delta \ge \lfloor \frac{1}{\delta} \rfloor \ge \frac{1}{\delta} - 1$. □

³ In a departure from prior notation, ‖·‖ will henceforth denote the distance to the closest integer.

The above proposition shows that the following large sieve has the optimal order of magnitude.

Theorem 3.4. The large sieve inequality holds with $\Delta = N + \frac{3}{\delta}$.

Proof. If R = 1, then by the Cauchy-Schwarz inequality,

\[ |S(\alpha_1)|^2 = \left| \sum_{n=M+1}^{M+N} a_n e(n\alpha_1) \right|^2 \le N \sum_{n=M+1}^{M+N} |a_n|^2, \]

from which the claim is trivial. So, we can suppose that R ≥ 2. Note that ‖αr − αs‖ ≥ δ for r ≠ s, and also that the large sieve inequality only depends on αr (mod 1). So, without loss of generality, we can suppose 0 ≤ α1 < α2 < · · · < αR < 1 ≤ α1 + 1, where the difference between any two adjacent terms in this inequality is greater than or equal to δ. It follows that $\delta \le \frac{1}{R} \le \frac{1}{2}$.

As remarked earlier, it suffices to show the claim for one value of M, so we have reduced to showing

\[ \sum_{r=1}^{R} \left| \sum_{k=-K}^{K} a_k e(k\alpha_r) \right|^2 \le \left( 2K + \frac{3}{\delta} \right) \sum_{k=-K}^{K} |a_k|^2 \tag{3.1} \]

for arbitrary K ∈ Z≥0. Indeed, this proves the claim for when N is odd (by setting N = 2K + 1) as well as for when N is even (by setting N = 2K and $a_{-K} = 0$).

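We proceed by applying Proposition 3.1 to

\[ \ell^2 = \left\{ v = (v_k)_{k \in \mathbb{Z}} : v_k \in \mathbb{C} \text{ such that } \sum_{k \in \mathbb{Z}} |v_k|^2 < \infty \right\} \]

with inner product

\[ (x, y) = \sum_{k \in \mathbb{Z}} x_k \overline{y_k}. \]

We will define ξ and the vectors φr in terms of nonnegative real numbers $\{b_k\}_{k \in \mathbb{Z}}$ such that $\sum_{k \in \mathbb{Z}} b_k < \infty$ and $b_k > 0$ for k ∈ {−K, . . . , K}. Specifically, we define $\xi = (\xi_k)_{k \in \mathbb{Z}} \in \ell^2$ by

\[ \xi_k = \begin{cases} \dfrac{a_k}{\sqrt{b_k}} & \text{if } k \in \{-K, \ldots, K\}, \\ 0 & \text{otherwise}, \end{cases} \]

and $\phi_r = (\phi_{r,k})_{k \in \mathbb{Z}} \in \ell^2$ by $\phi_{r,k} = \sqrt{b_k}\, e(-k\alpha_r)$, for r ∈ {1, . . . , R}. Then, by Proposition 3.1, we have

\[ \sum_{r=1}^{R} \left| \sum_{k=-K}^{K} a_k e(k\alpha_r) \right|^2 \le A \sum_{k=-K}^{K} \frac{|a_k|^2}{b_k}, \]

where

\[ A = \max_{r \in \{1, \ldots, R\}} \sum_{s=1}^{R} |(\phi_r, \phi_s)| = \max_{r \in \{1, \ldots, R\}} \sum_{s=1}^{R} \left| \sum_{k \in \mathbb{Z}} b_k e(k(\alpha_s - \alpha_r)) \right|. \]

By writing $B(\alpha) := \sum_{k \in \mathbb{Z}} b_k e(k\alpha)$, we can write the above as $A = \max_{r \in \{1, \ldots, R\}} \sum_{s=1}^{R} |B(\alpha_s - \alpha_r)|$.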

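To prove (3.1), it suffices to show that there exists a choice of $\{b_k\}_{k \in \mathbb{Z}}$ such that $A \le 2K + \frac{3}{\delta}$, i.e., that for all r ∈ {1, . . . , R}, we have

\[ \sum_{s=1}^{R} |B(\alpha_s - \alpha_r)| \le 2K + \frac{3}{\delta}. \tag{3.2} \]

With this goal in mind, we set

\[ b_k = \begin{cases} 1 & \text{if } |k| \le K, \\ 1 - \dfrac{|k| - K}{L} & \text{if } K \le |k| \le K + L, \\ 0 & \text{otherwise}. \end{cases} \]

Note that $B(0) = \sum_{k \in \mathbb{Z}} b_k = 2K + L$.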

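It will be useful to obtain a general expression for B(α). To this end, we introduce the identity

\[ \begin{aligned} \sum_{|j| \le J} (J - |j|) e(j\alpha) &= \sum_{j_1=1}^{J} \sum_{j_2=1}^{J} e((j_1 - j_2)\alpha) = \sum_{j_1=1}^{J} \sum_{j_2=1}^{J} e(j_1\alpha) \overline{e(j_2\alpha)} = \left| \sum_{j=1}^{J} e(j\alpha) \right|^2 = \left| \frac{e(J\alpha) - 1}{1 - e(-\alpha)} \right|^2 \\ &= \left| \frac{e\left( \frac{J}{2}\alpha \right) \left( e\left( \frac{J}{2}\alpha \right) - e\left( -\frac{J}{2}\alpha \right) \right)}{e\left( -\frac{1}{2}\alpha \right) \left( e\left( \frac{1}{2}\alpha \right) - e\left( -\frac{1}{2}\alpha \right) \right)} \right|^2 = \left| \frac{e\left( \frac{J}{2}\alpha \right) - e\left( -\frac{J}{2}\alpha \right)}{e\left( \frac{1}{2}\alpha \right) - e\left( -\frac{1}{2}\alpha \right)} \right|^2 = \left( \frac{\sin \pi J \alpha}{\sin \pi \alpha} \right)^2, \end{aligned} \]

which holds for J ∈ Z≥0 and α ∉ Z. We apply the identity first for J = K + L and then for J = K. By subtracting the latter from the former, we obtain

\[ B(\alpha) = \frac{\sin^2(\pi(K + L)\alpha) - \sin^2(\pi K \alpha)}{L \sin^2(\pi\alpha)} \qquad (\alpha \notin \mathbb{Z}), \]

from which we observe that

\[ |B(\alpha)| \le \frac{1}{L \sin^2(\pi\alpha)} \le \frac{1}{4L \|\alpha\|^2} \qquad (\alpha \notin \mathbb{Z}), \tag{3.3} \]

since sin(πx) ≥ 2x for x ∈ [0, 1/2].

Recall that ‖αs − αr‖ ≥ δ for r ≠ s. Thus, it follows from (3.3) that

\[ \sum_{s=1}^{R} |B(\alpha_s - \alpha_r)| \le B(0) + 2 \sum_{j=1}^{\infty} \frac{1}{4L j^2 \delta^2} = 2K + L + \frac{\pi^2}{12 L \delta^2} \le 2K + L + \frac{1}{L\delta^2}. \]

Set $L = \lceil \frac{1}{\delta} \rceil$ to obtain

\[ \sum_{s=1}^{R} |B(\alpha_s - \alpha_r)| \le 2K + \frac{1}{\delta} + 1 + \frac{1}{\delta} \le 2K + \frac{3}{\delta}, \]

where we have used the fact that δ ≤ 1/2. Thus, (3.1) holds, which proves our claim. □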
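A small numerical experiment can illustrate the inequality with ∆ = N + 3/δ. The Python sketch below is a hypothetical illustration rather than part of the argument: it takes the sample points to be the reduced fractions h/q with q ≤ Q (an assumption of the example), measures their minimum spacing δ in the distance-to-nearest-integer norm, and compares both sides of the large sieve inequality for a given coefficient list.

```python
import cmath
from math import gcd

def e(x):
    """The additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def dist_to_nearest_integer(x):
    return abs(x - round(x))

def farey_points(Q):
    """Reduced fractions h/q in [0, 1) with 1 <= q <= Q."""
    return [h / q for q in range(1, Q + 1) for h in range(q) if gcd(h, q) == 1]

def large_sieve_sides(a, M, alphas):
    """Return (sum_r |S(alpha_r)|^2, (N + 3/delta) * sum |a_n|^2, delta).

    a[i] plays the role of a_{M+1+i}; delta is the minimum pairwise spacing
    of the points, so at least two points are required.
    """
    N = len(a)
    delta = min(dist_to_nearest_integer(x - y)
                for i, x in enumerate(alphas) for y in alphas[:i])
    def S(alpha):
        return sum(a[i] * e((M + 1 + i) * alpha) for i in range(N))
    lhs = sum(abs(S(alpha)) ** 2 for alpha in alphas)
    rhs = (N + 3 / delta) * sum(abs(x) ** 2 for x in a)
    return lhs, rhs, delta
```

For Q = 6 there are 12 such points, and their minimum spacing is 1/30 (attained by 1/6 and 1/5), which is at least 1/Q².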

While Theorem 3.4 has the optimal order of magnitude, we remark that constants can be improved. To illustrate, Selberg has shown that the large sieve inequality holds with $\Delta = N + \frac{1}{\delta} - 1$; for a detailed discussion and proof, see the survey article [9].

We conclude our discussion of the large sieve with an important application to estimating averages of character sums (due to Gallagher in [5]), which will be the cornerstone of our main proof.


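Theorem 3.5 (Gallagher). For M ∈ Z, Q, N ∈ Z>0, and complex numbers $\{a_n\}_{n \in \mathbb{Z}_{\ge 0}}$, we have⁴

\[ \sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \left| \sum_{n=M+1}^{M+N} a_n \chi(n) \right|^2 \le (N + 3Q^2) \sum_{n=M+1}^{M+N} |a_n|^2. \]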

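Proof. Recall that we used Lemma 2.10 and Lemma 2.11 to show that for $\chi \in X_q^*$,

\[ \chi(n) = \frac{1}{\tau(\overline{\chi})} \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \overline{\chi}(h)\, e\left( \frac{hn}{q} \right) \]

for any integer n. Multiplying both sides by an and summing over all n, we have

\[ \sum_{n=M+1}^{M+N} a_n \chi(n) = \frac{1}{\tau(\overline{\chi})} \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \overline{\chi}(h) \sum_{n=M+1}^{M+N} a_n \cdot e\left( \frac{hn}{q} \right) = \frac{1}{\tau(\overline{\chi})} \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \overline{\chi}(h)\, S\left( \frac{h}{q} \right), \]

where $S(\alpha) := \sum_{n=M+1}^{M+N} a_n e(n\alpha)$ as before.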

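Recall from Lemma 2.11 that $|\tau(\overline{\chi})| = q^{1/2}$. Thus, taking the squared absolute value of the above equality and summing over all $\chi \in X_q^*$, we see that

\[ \begin{aligned} \sum_{\chi \in X_q^*} \left| \sum_{n=M+1}^{M+N} a_n \chi(n) \right|^2 &= \frac{1}{q} \sum_{\chi \in X_q^*} \left| \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \overline{\chi}(h)\, S\left( \frac{h}{q} \right) \right|^2 \le \frac{1}{q} \sum_{\chi \in X_q} \left| \sum_{h \in \mathbb{Z}/q\mathbb{Z}} \overline{\chi}(h)\, S\left( \frac{h}{q} \right) \right|^2 \\ &= \frac{1}{q} \sum_{\chi \in X_q} \sum_{h=0}^{q-1} \sum_{k=0}^{q-1} \overline{\chi}(h)\chi(k)\, S\left( \frac{h}{q} \right) \overline{S\left( \frac{k}{q} \right)} \\ &= \frac{1}{q} \sum_{h=0}^{q-1} \sum_{k=0}^{q-1} S\left( \frac{h}{q} \right) \overline{S\left( \frac{k}{q} \right)} \sum_{\chi \in X_q} \overline{\chi}(h)\chi(k). \end{aligned} \]

By orthogonality (Proposition 2.3), $\sum_{\chi \in X_q} \overline{\chi}(h)\chi(k)$ equals φ(q) if h ≡ k (mod q) and (h, q) = 1, and vanishes otherwise. Thus,

\[ \sum_{\chi \in X_q^*} \left| \sum_{n=M+1}^{M+N} a_n \chi(n) \right|^2 \le \frac{1}{q} \sum_{h=0}^{q-1} \sum_{k=0}^{q-1} S\left( \frac{h}{q} \right) \overline{S\left( \frac{k}{q} \right)} \sum_{\chi \in X_q} \overline{\chi}(h)\chi(k) = \frac{\varphi(q)}{q} \sum_{\substack{h=0 \\ (h,q)=1}}^{q-1} \left| S\left( \frac{h}{q} \right) \right|^2. \]

It follows that

\[ \sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \left| \sum_{n=M+1}^{M+N} a_n \chi(n) \right|^2 \le \sum_{q=1}^{Q} \sum_{\substack{h=0 \\ (h,q)=1}}^{q-1} \left| S\left( \frac{h}{q} \right) \right|^2. \]

Note that for q1, q2 ∈ {1, . . . , Q}, h1 ∈ {0, . . . , q1 − 1}, and h2 ∈ {0, . . . , q2 − 1} such that (h1, q1) = (h2, q2) = 1 and $\frac{h_1}{q_1} \ne \frac{h_2}{q_2}$,

\[ \left\| \frac{h_1}{q_1} - \frac{h_2}{q_2} \right\| = \left\| \frac{h_1 q_2 - h_2 q_1}{q_1 q_2} \right\| \ge \frac{1}{q_1 q_2} \ge \frac{1}{Q^2}. \]

Thus, we can apply the large sieve of Theorem 3.4 with $\delta = \frac{1}{Q^2}$ to conclude that

\[ \sum_{q=1}^{Q} \sum_{\substack{h=0 \\ (h,q)=1}}^{q-1} \left| S\left( \frac{h}{q} \right) \right|^2 \le (N + 3Q^2) \sum_{n=M+1}^{M+N} |a_n|^2, \]

which proves our claim. □

⁴ We define $X_q^*$ to be the set of primitive Dirichlet characters of period q.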

4. The Main Proof

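We now set out to prove our main result, Theorem 1.3. Recall from (1.2) that for a, q such that (a, q) = 1, we have

\[ \psi(y; q, a) = \frac{1}{\varphi(q)} \sum_{\chi \in X_q} \overline{\chi}(a)\, \psi(y, \chi). \]

Let $\psi'(y, \chi) := \psi(y, \chi) - 1_{\chi_0}(\chi)\, y$. Then,

\[ \psi(y; q, a) - \frac{y}{\varphi(q)} = \frac{1}{\varphi(q)} \sum_{\chi \in X_q} \overline{\chi}(a)\, \psi'(y, \chi), \]

to which we can apply the triangle inequality to obtain

\[ |E(y; q, a)| \le \frac{1}{\varphi(q)} \sum_{\chi \in X_q} |\psi'(y, \chi)|. \]

The right-hand side is independent of a, so in fact, we have

\[ E(y; q) \le \frac{1}{\varphi(q)} \sum_{\chi \in X_q} |\psi'(y, \chi)|. \]

For χ ∈ Xq, let χ1 denote the primitive character inducing χ, and let q1 denote the period of χ1. Then, we see that

\[ \psi'(y, \chi_1) - \psi'(y, \chi) = \psi(y, \chi_1) - \psi(y, \chi) = \sum_{\substack{p \mid q \\ p \nmid q_1}} \sum_{k \le \log_p y} \chi_1(p^k) \log p = O\left( \log y \sum_{p \mid q} \log p \right) = O((\log y)(\log q)) = O\left( (\log qy)^2 \right). \]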

Therefore,

E(y; q)� (log qy)2 +1

φ(q)

∑χ∈Xq

∣∣ψ′(y, χ1)∣∣ ,

from which it follows that∑q≤Q

E∗(x; q)�∑q≤Q

(log qx)2 +1

φ(q)

∑χ∈Xq

maxy≤x

∣∣ψ′(y, χ1)∣∣

≤ Q(logQx)2 +∑q≤Q

∑χ∈Xq

1

φ(q)maxy≤x

∣∣ψ′(y, χ1)∣∣ .

We can count the Dirichlet characters of period $q \le Q$ by considering all characters induced by primitive characters of period $q_1 \le Q$. Counting in this way, we can rewrite the above as
\[
\sum_{q \le Q} E^*(x; q) \ll Q (\log Qx)^2 + \sum_{q_1 \le Q} \sum_{\chi_1 \in X_{q_1}^*} \max_{y \le x} \left| \psi'(y, \chi_1) \right| \sum_{1 \le m \le Q/q_1} \frac{1}{\varphi(m q_1)} . \tag{4.1}
\]


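Since $\varphi(m)\varphi(q_1) \le \varphi(m q_1)$, we have for any $r \in \mathbb{R}_{>1}$ that
\[
\sum_{1 \le m \le r} \frac{1}{\varphi(m q_1)} \le \frac{1}{\varphi(q_1)} \sum_{1 \le m \le r} \frac{1}{\varphi(m)} .
\]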

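To bound the sum on the right-hand side, we use the fact that $\varphi$ is multiplicative (note, however, that it is not totally multiplicative) to observe that
\[
\sum_{1 \le m \le r} \frac{1}{\varphi(m)} \le \prod_{p \le r} \Big( \sum_{k=0}^{\infty} \frac{1}{\varphi(p^k)} \Big) = \prod_{p \le r} \Big( 1 + \sum_{h=0}^{\infty} \frac{1}{p^h (p-1)} \Big) = \prod_{p \le r} \Big( 1 + \frac{1}{(p-1)(1 - p^{-1})} \Big) = \prod_{p \le r} \Big( 1 + \frac{p}{(p-1)^2} \Big),
\]
and taking the logarithm, we have
\[
\log \sum_{1 \le m \le r} \frac{1}{\varphi(m)} \le \sum_{p \le r} \log \Big( 1 + \frac{p}{(p-1)^2} \Big) \le \sum_{p \le r} \frac{p}{(p-1)^2} = \sum_{p \le r} \Big( \frac{1}{p} + O(p^{-2}) \Big) = \log\log(r+1) + O(1),
\]
where we have applied Mertens' estimate $\sum_{p \le r} \frac{1}{p} = \log\log(r+1) + O(1)$ (for a proof, see [3, p.56–57]), as well as the fact that $\sum_{p \le r} \frac{1}{p^2} = O(1)$. Taking the exponential of the above, we arrive at
\[
\sum_{1 \le m \le r} \frac{1}{\varphi(m)} \ll \log(r+1).
\]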
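The Euler product step above is an exact inequality, so it can be checked directly for small $r$. The sketch below assumes nothing beyond the definitions; the helper names (`totients`, `reciprocal_totient_sum`, `euler_product_bound`) are hypothetical and chosen only for illustration.

```python
import math


def totients(limit):
    """Euler's phi for 0..limit via a sieve (phi[0] is unused)."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:  # p has not been touched yet, so p is prime
            for multiple in range(p, limit + 1, p):
                phi[multiple] -= phi[multiple] // p
    return phi


def reciprocal_totient_sum(r):
    """Sum of 1/phi(m) over 1 <= m <= r."""
    phi = totients(r)
    return sum(1.0 / phi[m] for m in range(1, r + 1))


def euler_product_bound(r):
    """Product over primes p <= r of (1 + p / (p - 1)^2)."""
    phi = totients(r)
    product = 1.0
    for p in range(2, r + 1):
        if phi[p] == p - 1:  # phi(p) = p - 1 exactly when p is prime
            product *= 1.0 + p / (p - 1) ** 2
    return product


for r in (10, 100, 1000):
    print(r, reciprocal_totient_sum(r), euler_product_bound(r), math.log(r + 1))
```

The product is a generous upper bound; the printed ratio of the sum to $\log(r+1)$ stays bounded in this range, which is all that the estimate $\ll \log(r+1)$ asserts.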

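Thus, under the hypothesis $Q \le x^{1/2}$ of Theorem 1.3, we have for all $q_1 \le Q$ that
\[
\sum_{1 \le m \le Q/q_1} \frac{1}{\varphi(m q_1)} \ll \frac{1}{\varphi(q_1)} \log\Big( \frac{Q}{q_1} + 1 \Big) \ll \frac{\log x}{\varphi(q_1)} .
\]
Applying this bound within (4.1) and replacing $q_1$ and $\chi_1$ respectively with $q$ and $\chi$ for notational convenience, we obtain
\[
\sum_{q \le Q} E^*(x; q) \ll Q (\log Qx)^2 + (\log x) \sum_{q \le Q} \frac{1}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \left| \psi'(y, \chi) \right| .
\]
We have therefore reduced to showing that for all $x \ge 2$ and $Q \in [x^{1/2} (\log x)^{-A}, x^{1/2}]$, we have
\[
\sum_{q \le Q} \frac{1}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \left| \psi'(y, \chi) \right| \ll x^{1/2} Q (\log x)^4 .
\]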

To this end, assume for now that the following bound holds.

Proposition 4.1. For all $x \ge 2$ and $Q \ge 1$, we have
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |\psi(y, \chi)| \ll (x + x^{5/6} Q + x^{1/2} Q^2)(\log Qx)^4 .
\]

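Then, for any $U \ge 1$, we can apply the above proposition with $Q = 2U$ to show that
\[
\sum_{U < q \le 2U} \frac{1}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |\psi(y, \chi)| \le \sum_{U < q \le 2U} \frac{q/U}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |\psi(y, \chi)| \ll \Big( \frac{x}{U} + x^{5/6} + x^{1/2} U \Big) (\log Ux)^4 .
\]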


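We now apply dyadic decomposition. For any $Q_1 \in [1, Q]$, we can sum the above inequality over all $U = 2^k$ for integers $k$ such that $\frac{1}{2} Q_1 < 2^k \le 2Q$ to obtain
\begin{align*}
\sum_{Q_1 < q \le Q} \frac{1}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \left| \psi'(y, \chi) \right|
&\le \sum_{\log_2(\frac{1}{2} Q_1) < k \le \log_2(2Q)} \ \sum_{2^k < q \le 2^{k+1}} \frac{1}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |\psi(y, \chi)| \\
&\ll \sum_{\log_2(\frac{1}{2} Q_1) < k \le \log_2(2Q)} \Big( \frac{x}{2^k} + x^{5/6} + x^{1/2} 2^k \Big) (\log 2Qx)^4 \\
&\ll \Big( \frac{x}{Q_1} + x^{5/6} \log Q + x^{1/2} Q \Big) (\log Qx)^4 ,
\end{align*}
where we have used the fact that $\psi(y, \chi) = \psi'(y, \chi)$ for nonprincipal primitive characters $\chi$. Setting $Q_1 = (\log x)^A$, we have that
\[
\frac{x}{Q_1} + x^{5/6} \log Q + x^{1/2} Q \ll x^{1/2} Q
\]
for $Q \in [x^{1/2} (\log x)^{-A}, x^{1/2}]$, which overall gives
\[
\sum_{(\log x)^A < q \le Q} \frac{1}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \left| \psi'(y, \chi) \right| \ll x^{1/2} Q (\log x)^4 .
\]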

To bound the remaining sum over $q \le (\log x)^A$, we can apply the variant of Siegel–Walfisz for $\psi(x, \chi)$ (Theorem 1.2) to see that $\max_{y \le x} |\psi(y, \chi)| \ll x e^{-c\sqrt{\log x}}$, and we can sum this over all $q \le (\log x)^A$, using that there are at most $\varphi(q)$ primitive characters of period $q$, to obtain
\[
\sum_{1 \le q \le (\log x)^A} \frac{1}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \left| \psi'(y, \chi) \right| \ll (\log x)^A \cdot x e^{-c\sqrt{\log x}} \ll x (\log x)^{-A} \ll x^{1/2} Q (\log x)^4 .
\]
Thus, we have shown that Theorem 1.3 holds, conditional on the truth of Proposition 4.1.

A key ingredient for proving Proposition 4.1 is applying the large sieve to estimate averages of sums of Dirichlet characters with weighted coefficients, as in the following.

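Proposition 4.2. For real numbers $Q, M, N \ge 1$ and complex numbers $\{a_m\}_{m \in \mathbb{Z}_{\ge 0}}$ and $\{b_n\}_{n \in \mathbb{Z}_{\ge 0}}$, we have
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{u \in [1, MN]} \Bigg| \sum_{m \le M} \sum_{\substack{n \le N \\ mn \le u}} a_m b_n \chi(mn) \Bigg| \ll (M + Q^2)^{1/2} (N + Q^2)^{1/2} \Big( \sum_{m \le M} |a_m|^2 \Big)^{1/2} \Big( \sum_{n \le N} |b_n|^2 \Big)^{1/2} \log(2MN).
\]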

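Proof. Recall from Theorem 3.5 that
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \Big| \sum_{m \le M} a_m \chi(m) \Big|^2 \ll (M + Q^2) \sum_{m \le M} |a_m|^2 ,
\]
and the analogous inequality holds for $\{b_n\}_{n \in \mathbb{Z}_{\ge 0}}$. Then, we can apply the Cauchy–Schwarz inequality to obtain
\begin{align*}
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \Big| \sum_{m \le M} \sum_{n \le N} a_m b_n \chi(mn) \Big|
&\le \Big( \sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \Big| \sum_{m \le M} a_m \chi(m) \Big|^2 \Big)^{1/2} \Big( \sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \Big| \sum_{n \le N} b_n \chi(n) \Big|^2 \Big)^{1/2} \\
&\ll (M + Q^2)^{1/2} (N + Q^2)^{1/2} \Big( \sum_{m \le M} |a_m|^2 \Big)^{1/2} \Big( \sum_{n \le N} |b_n|^2 \Big)^{1/2} . \tag{4.2}
\end{align*}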

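We wish to insert the condition $mn \le u$ into the inner sum. To do so, we construct a well-behaved indicator function similar to (1.4). For $\alpha > 0$, define
\[
\delta(x) := \begin{cases} 1 & \text{if } |x| < \alpha, \\ 0 & \text{if } |x| > \alpha, \end{cases}
\]
for $x \in \mathbb{R} \setminus \{-\alpha, \alpha\}$, as well as
\[
C = \int_{-\infty}^{\infty} \frac{\sin y}{y}\, dy,
\]
which is convergent by the alternating series test. Then, note that
\begin{align*}
\int_{-\infty}^{\infty} e^{ixy} \frac{\sin \alpha y}{Cy}\, dy
&= \int_{-\infty}^{\infty} (\cos xy + i \sin xy) \frac{\sin \alpha y}{Cy}\, dy = \int_{-\infty}^{\infty} \frac{\cos xy \sin \alpha y}{Cy}\, dy \\
&= \int_{-\infty}^{\infty} \frac{1}{2Cy} \big( \sin((\alpha + x) y) + \sin((\alpha - x) y) \big)\, dy \\
&= \frac{1}{2C} \Big( \operatorname{sgn}(\alpha + x) \int_{-\infty}^{\infty} \frac{\sin y_1}{y_1}\, dy_1 + \operatorname{sgn}(\alpha - x) \int_{-\infty}^{\infty} \frac{\sin y_2}{y_2}\, dy_2 \Big) = \delta(x),
\end{align*}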

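where we have used the fact that the integral of an odd function over $\mathbb{R}$ vanishes. Furthermore, it follows from integration by parts that for $\lambda > 0$,
\[
\int_T^{\infty} \frac{\sin \lambda x}{x}\, dx = \Big[ -\frac{\cos \lambda x}{\lambda x} \Big]_{x=T}^{\infty} - \int_T^{\infty} \frac{\cos \lambda x}{\lambda x^2}\, dx \ll \frac{1}{\lambda T},
\]
which gives us the useful estimate
\[
\delta(x) = \int_{-T}^{T} e^{ixy} \frac{\sin \alpha y}{Cy}\, dy + O\Big( \frac{1}{T\, |\alpha - |x||} \Big). \tag{4.3}
\]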

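Setting $\alpha = \log u$ (where without loss of generality, we can suppose that $u$ is of the form $u = k + \frac{1}{2}$ for an integer $k \in [1, MN]$, since the inner sum
\[
\sum_{m \le M} \sum_{\substack{n \le N \\ mn \le u}} a_m b_n \chi(mn)
\]
depends only on the integer part of $u$) and $x = -\log mn$, we have
\[
\begin{cases} 1 & \text{if } mn < u, \\ 0 & \text{if } mn > u \end{cases} \;=\; \int_{-T}^{T} m^{-iy} n^{-iy} \frac{\sin(y \log u)}{Cy}\, dy + O\Big( T^{-1} \Big| \log \frac{mn}{u} \Big|^{-1} \Big).
\]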


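Multiplying each term $a_m b_n \chi(mn)$ by this estimate and summing over $m \le M$ and $n \le N$ yields
\[
\sum_{m \le M} \sum_{\substack{n \le N \\ mn \le u}} a_m b_n \chi(mn) = \int_{-T}^{T} A(y, \chi) B(y, \chi) \frac{\sin(y \log u)}{Cy}\, dy + O\Big( T^{-1} \sum_{m \le M} \sum_{n \le N} |a_m b_n| \Big| \log \frac{mn}{u} \Big|^{-1} \Big),
\]
where
\[
A(y, \chi) = \sum_{m \le M} a_m \chi(m) m^{-iy} \quad \text{and} \quad B(y, \chi) = \sum_{n \le N} b_n \chi(n) n^{-iy}.
\]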

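Since
\[
\Big| \log \frac{mn}{u} \Big| \gg \min\Big\{ 1, \Big| \frac{mn}{u} - 1 \Big| \Big\} \ge \min\Big\{ 1, \Big| \frac{u + \frac{1}{2}}{u} - 1 \Big|, \Big| \frac{u - \frac{1}{2}}{u} - 1 \Big| \Big\} \gg \frac{1}{u} \ge \frac{1}{MN}
\]
and
\[
|\sin(y \log u)| \le \min\{1, |y \log u|\} \le \min\{1, |y| \log(2MN)\},
\]
it follows that
\begin{align*}
\max_{u \in [1, MN]} \Bigg| \sum_{m \le M} \sum_{\substack{n \le N \\ mn \le u}} a_m b_n \chi(mn) \Bigg|
&\ll \int_{-T}^{T} |A(y, \chi) B(y, \chi)| \min\Big\{ \frac{1}{|y|}, \log(2MN) \Big\}\, dy + \frac{MN}{T} \sum_{m \le M} \sum_{n \le N} |a_m b_n| \\
&\le \int_{-T}^{T} |A(y, \chi) B(y, \chi)| \min\Big\{ \frac{1}{|y|}, \log(2MN) \Big\}\, dy + \frac{MN}{T} \Big( \sum_{m \le M} |a_m|^2 \Big)^{1/2} \Big( \sum_{n \le N} |b_n|^2 \Big)^{1/2} ,
\end{align*}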

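where we have applied the Cauchy–Schwarz inequality. Multiplying the above by $\frac{q}{\varphi(q)}$ and summing over $q \le Q$ and $\chi \in X_q^*$, we get
\begin{align*}
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{u \in [1, MN]} \Bigg| \sum_{m \le M} \sum_{\substack{n \le N \\ mn \le u}} a_m b_n \chi(mn) \Bigg|
&\ll \int_{-T}^{T} \sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} |A(y, \chi) B(y, \chi)| \min\Big\{ \frac{1}{|y|}, \log(2MN) \Big\}\, dy \\
&\quad + \frac{MN}{T} \Big( \sum_{m \le M} |a_m|^2 \Big)^{1/2} \Big( \sum_{n \le N} |b_n|^2 \Big)^{1/2} .
\end{align*}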


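Setting $T = MN$, we can use (4.2), applied for each fixed $y$ to the coefficients $a_m m^{-iy}$ and $b_n n^{-iy}$ (which have the same absolute values as $a_m$ and $b_n$), and $\sum_{q=1}^{Q} \frac{q}{\varphi(q)} \le \sum_{q=1}^{Q} q \ll Q^2$ to bound the first summand by
\begin{align*}
&\ll (M + Q^2)^{1/2} (N + Q^2)^{1/2} \Big( \sum_{m \le M} |a_m|^2 \Big)^{1/2} \Big( \sum_{n \le N} |b_n|^2 \Big)^{1/2} \int_{-T}^{T} \min\Big\{ \frac{1}{|y|}, \log(2MN) \Big\}\, dy \\
&\ll (M + Q^2)^{1/2} (N + Q^2)^{1/2} \Big( \sum_{m \le M} |a_m|^2 \Big)^{1/2} \Big( \sum_{n \le N} |b_n|^2 \Big)^{1/2} \log(2MN).
\end{align*}
Moreover, setting $T = MN$ makes the second summand bounded by the above expression as well. □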

Proof of Proposition 4.1. If $Q^2 > x$, then we are done by applying Proposition 4.2 with $M = 1$, $a_1 = 1$, $b_n = \Lambda(n)$, and $N = x$. So, suppose $Q^2 \le x$. For our starting point, we follow Vaughan's method [19] of estimating sums of the form $\sum_{n \le N} f(n) \Lambda(n)$; inspired by I. M. Vinogradov's original method [22] for the same purpose, Vaughan developed an analogue in which the details are significantly simpler.

We begin with the following identity for the logarithmic derivative of the Riemann zeta function $\zeta(s)$ (see Example A.2): for
\[
F(s) = \sum_{m \le U} \Lambda(m) m^{-s} \quad \text{and} \quad G(s) = \sum_{d \le V} \mu(d) d^{-s},
\]
we have
\[
-\frac{\zeta'(s)}{\zeta(s)} = F(s) - \zeta(s) F(s) G(s) - \zeta'(s) G(s) + \Big( -\frac{\zeta'(s)}{\zeta(s)} - F(s) \Big) \cdot \big( 1 - \zeta(s) G(s) \big).
\]

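Comparing Dirichlet series coefficients, we discern that
\[
\Lambda(n) = a_1(n) + a_2(n) + a_3(n) + a_4(n),
\]
where $a_1(n)$ is the Dirichlet series coefficient of $F(s)$,
\[
a_1(n) = \begin{cases} \Lambda(n) & \text{if } n \le U, \\ 0 & \text{if } n > U, \end{cases}
\]
$a_2(n)$ is the Dirichlet series coefficient of $-\zeta(s) F(s) G(s)$,
\[
a_2(n) = -\sum_{\substack{md \mid n \\ m \le U \\ d \le V}} \Lambda(m) \mu(d),
\]
$a_3(n)$ is the Dirichlet series coefficient of $-\zeta'(s) G(s)$,
\[
a_3(n) = \sum_{\substack{hd = n \\ d \le V}} \mu(d) \log h,
\]
and $a_4(n)$ is the Dirichlet series coefficient of $\big( -\frac{\zeta'(s)}{\zeta(s)} - F(s) \big) \cdot (1 - \zeta(s) G(s))$,
\[
a_4(n) = -\sum_{\substack{mk = n \\ m > U \\ k > 1}} \Lambda(m) \Big( \sum_{\substack{d \mid k \\ d \le V}} \mu(d) \Big).
\]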
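Since the four coefficient formulas are easy to mistype, it may help to confirm the decomposition $\Lambda = a_1 + a_2 + a_3 + a_4$ numerically. The following is a minimal sketch using naive trial division; the function names (`mangoldt`, `mobius`, `vaughan_pieces`) are our own, and $U$, $V$ are taken to be positive integers for simplicity.

```python
import math


def mangoldt(n):
    """Lambda(n): log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    for p in range(2, n + 1):
        if n % p == 0:  # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0


def mobius(n):
    """mu(n) by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divided the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def truncated_mobius_sum(k, V):
    """Sum of mu(d) over divisors d of k with d <= V."""
    return sum(mobius(d) for d in range(1, min(k, V) + 1) if k % d == 0)


def vaughan_pieces(n, U, V):
    """Return (a1(n), a2(n), a3(n), a4(n)) as defined in the text."""
    a1 = mangoldt(n) if n <= U else 0.0
    a2 = -sum(mangoldt(m) * mobius(d)
              for m in range(1, U + 1) for d in range(1, V + 1)
              if n % (m * d) == 0)
    a3 = sum(mobius(d) * math.log(n // d)
             for d in range(1, V + 1) if n % d == 0)
    a4 = -sum(mangoldt(m) * truncated_mobius_sum(n // m, V)
              for m in range(U + 1, n + 1)
              if n % m == 0 and n // m > 1)
    return a1, a2, a3, a4


for n in (1, 8, 30, 49):
    print(n, mangoldt(n), sum(vaughan_pieces(n, 4, 5)))
```

The pieces sum to $\Lambda(n)$ up to floating-point error for every choice of $U, V \ge 1$; the freedom in $U$ and $V$ is what the remainder of the proof exploits.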


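Thus, for any $y \ge 1$ and $\chi \in X_q^*$, we have
\[
\psi(y, \chi) = \sum_{j=1}^{4} S_j(y, \chi, U, V)
\]
for
\[
S_j(y, \chi, U, V) := \sum_{n \le y} a_j(n) \chi(n).
\]
It follows that
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |\psi(y, \chi)| \le \sum_{j=1}^{4} \sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |S_j(y, \chi, U, V)| . \tag{4.4}
\]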

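Hence, it suffices to bound each sum
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |S_j(y, \chi, U, V)|
\]
under the hypotheses that $x \ge 2$, $Q \le x^{1/2}$, $U, V \ge 1$, and $UV \le x$. The contribution from $S_1(y, \chi, U, V)$ is $\ll Q^2 U$, since clearly $S_1(y, \chi, U, V) = O(U)$.

We will first bound the contribution from
\[
S_4(y, \chi, U, V) = -\sum_{n \le y} \chi(n) \sum_{\substack{mk = n \\ m > U \\ k > 1}} \Lambda(m) \Big( \sum_{\substack{d \mid k \\ d \le V}} \mu(d) \Big) = -\sum_{U < m \le y/V} \Lambda(m) \sum_{V < k \le y/m} \Big( \sum_{\substack{d \mid k \\ d \le V}} \mu(d) \Big) \chi(mk),
\]
where we have used the fact that $\sum_{d \mid k,\, d \le V} \mu(d) = 0$ for $1 < k \le V$, since $\sum_{d \mid n} \mu(d) = 0$ for $n > 1$. To this end, we will apply dyadic decomposition to $m$. Note that for any $M \in [2, x]$,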

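\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \Bigg| \sum_{\substack{U < m \le y/V \\ M < m \le 2M}} \Lambda(m) \sum_{V < k \le y/m} \Big( \sum_{\substack{d \mid k \\ d \le V}} \mu(d) \Big) \chi(mk) \Bigg| = \sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \Bigg| \sum_{1 \le m \le 2M} \sum_{\substack{1 \le k \le x/M \\ mk \le y}} a_m b_k \chi(mk) \Bigg| \tag{4.5}
\]
for
\[
a_m = \begin{cases} \Lambda(m) & \text{if } \max\{U, M\} < m \le \min\{x/V, 2M\}, \\ 0 & \text{otherwise,} \end{cases}
\quad \text{and} \quad
b_k = \begin{cases} \sum_{d \mid k,\, d \le V} \mu(d) & \text{if } k > V, \\ 0 & \text{otherwise.} \end{cases}
\]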


This formulation allows us to apply Proposition 4.2, from which one deduces that (4.5) is
\[
\ll (Q^2 + M)^{1/2} \Big( Q^2 + \frac{x}{M} \Big)^{1/2} \Big( \sum_{M < m \le 2M} \Lambda(m)^2 \Big)^{1/2} \Big( \sum_{V < k \le x/M} d(k)^2 \Big)^{1/2} \log x.
\]
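Here the divisor function $d(k)$ enters because $b_k$ is a sum of at most $d(k)$ terms, each of absolute value at most 1, so that $|b_k| \le d(k)$. The estimate for $\sum_{k \le z} d(k)^2$ used next rests on writing $d(k)^2$ as a divisor sum of the multiplicative function $f$ with $f(p^a) = 2a + 1$. The following minimal sketch (with hypothetical helper names `divisors`, `f`, `divisor_square_sum`, and `weighted_f_sum`) checks that identity and the resulting inequality $\sum_{k \le z} d(k)^2 \le z \sum_{d \le z} f(d)/d$ for small $z$.

```python
def divisors(k):
    """All positive divisors of k, by brute force."""
    return [d for d in range(1, k + 1) if k % d == 0]


def f(n):
    """Multiplicative function with f(p^a) = 2a + 1 on prime powers."""
    value = 1
    p = 2
    while n > 1:
        a = 0
        while n % p == 0:
            n //= p
            a += 1
        value *= 2 * a + 1  # a = 0 contributes a factor of 1
        p += 1
    return value


def divisor_square_sum(z):
    """Sum of d(k)^2 over 1 <= k <= z."""
    return sum(len(divisors(k)) ** 2 for k in range(1, z + 1))


def weighted_f_sum(z):
    """z times the sum of f(d)/d over 1 <= d <= z."""
    return z * sum(f(d) / d for d in range(1, z + 1))


print(len(divisors(12)) ** 2, sum(f(d) for d in divisors(12)))  # both are 36
```

The prime-power case $(a+1)^2 = \sum_{j=0}^{a} (2j+1)$ is the whole content of the identity; multiplicativity does the rest.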

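Note that $\sum_{M < m \le 2M} \Lambda(m)^2 \ll \log M \sum_{M < m \le 2M} \Lambda(m) \ll M \log M$. Furthermore, it is true that $\sum_{k \le z} d(k)^2 \ll z (\log 2z)^3$. To show this, observe that $d(k)^2 = \sum_{d \mid k} f(d)$, where $f$ is the multiplicative function defined by $f(p^a) = 2a + 1$. Then,
\[
\sum_{k \le z} d(k)^2 = \sum_{k \le z} \sum_{d \mid k} f(d) = \sum_{d \le z} f(d) \Big\lfloor \frac{z}{d} \Big\rfloor \le z \sum_{d \le z} \frac{f(d)}{d} \le z \prod_{p \le z} \Big( 1 + \frac{3}{p} + \frac{5}{p^2} + \cdots \Big) \le z \prod_{p \le z} \Big( 1 - \frac{1}{p} \Big)^{-3} \ll z (\log 2z)^3,
\]
as claimed. Thus, we use these two bounds to conclude that (4.5) is
\begin{align*}
&\ll (Q + M^{1/2}) (Q + x^{1/2} M^{-1/2}) (M \log M)^{1/2} \Big( \frac{x}{M} \log^3 x \Big)^{1/2} \log x \\
&\ll (Q^2 x^{1/2} + Q x M^{-1/2} + Q x^{1/2} M^{1/2} + x)(\log x)^3,
\end{align*}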

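where we have applied Cauchy–Schwarz.

Finally, we apply dyadic decomposition. Setting $M = 2^{\ell} U$ for all nonnegative integers $\ell$ such that $2^{\ell} U \le \frac{x}{V}$, we have that
\begin{align*}
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |S_4(y, \chi, U, V)|
&\ll \sum_{0 \le \ell \le \log_2(\frac{x}{UV})} (Q^2 x^{1/2} + Q x M^{-1/2} + Q x^{1/2} M^{1/2} + x)(\log x)^3 \\
&\ll (Q^2 x^{1/2} + Q x U^{-1/2} + Q x V^{-1/2} + x)(\log x)^4, \tag{4.6}
\end{align*}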

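since there are $\ll \log x$ such integers $\ell$. This is our desired bound for the contribution from $S_4$.

Next, we seek to bound the contribution from
\begin{align*}
S_2(y, \chi, U, V) &= -\sum_{n \le y} \sum_{\substack{md \mid n \\ m \le U \\ d \le V}} \chi(n) \Lambda(m) \mu(d) = -\sum_{\substack{m \le U \\ d \le V}} \Lambda(m) \mu(d) \sum_{r \le y/(md)} \chi(rmd) \\
&= -\sum_{t \le UV} \Big( \sum_{\substack{md = t \\ m \le U \\ d \le V}} \Lambda(m) \mu(d) \Big) \sum_{r \le y/t} \chi(rt).
\end{align*}
To obtain tighter bounds, we split the sum as $S_2(y, \chi, U, V) = S_2'(y, \chi, U, V) + S_2''(y, \chi, U, V)$, where
\[
S_2'(y, \chi, U, V) = -\sum_{t \le U} \Big( \sum_{\substack{md = t \\ m \le U \\ d \le V}} \Lambda(m) \mu(d) \Big) \sum_{r \le y/t} \chi(rt)
\]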


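and
\[
S_2''(y, \chi, U, V) = -\sum_{U < t \le UV} \Big( \sum_{\substack{md = t \\ m \le U \\ d \le V}} \Lambda(m) \mu(d) \Big) \sum_{r \le y/t} \chi(rt).
\]
Just as before, we can deal with $S_2''$ by applying dyadic decomposition, to the variable $t$ in this case. Analogously to before, we note that for $T \in [2, x]$, we have
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \Bigg| \sum_{\substack{U < t \le UV \\ T < t \le 2T}} \Big( \sum_{\substack{md = t \\ m \le U \\ d \le V}} \Lambda(m) \mu(d) \Big) \sum_{r \le y/t} \chi(rt) \Bigg| = \sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \Bigg| \sum_{1 \le t \le 2T} \sum_{\substack{1 \le r \le x/T \\ rt \le y}} a_t b_r \chi(rt) \Bigg| \tag{4.7}
\]
for
\[
a_t = \begin{cases} \sum_{md = t,\, m \le U,\, d \le V} \Lambda(m) \mu(d) & \text{if } \max\{U, T\} < t \le \min\{UV, 2T\}, \\ 0 & \text{otherwise,} \end{cases}
\]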

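and $b_r = 1$ for all $r$. Applying Proposition 4.2, we deduce that (4.7) is
\[
\ll (Q^2 + T)^{1/2} \Big( Q^2 + \frac{x}{T} \Big)^{1/2} \Bigg( \sum_{T < t \le 2T} \Big| \sum_{\substack{md = t \\ m \le U \\ d \le V}} \Lambda(m) \mu(d) \Big|^2 \Bigg)^{1/2} \Big( \frac{x}{T} \Big)^{1/2} \log x. \tag{4.8}
\]
For any positive integer $t$, we have
\[
\Big| \sum_{\substack{md = t \\ m \le U \\ d \le V}} \Lambda(m) \mu(d) \Big| \le \sum_{m \mid t} \Lambda(m) = \log t. \tag{4.9}
\]
Thus, since $\sum_{T < t \le 2T} (\log t)^2 \ll T (\log T)^2$, it follows that (4.8) is
\begin{align*}
&\ll (Q + T^{1/2}) (Q + x^{1/2} T^{-1/2}) \big( T (\log T)^2 \big)^{1/2} \big( x^{1/2} T^{-1/2} \big) \log x \\
&\ll (Q^2 x^{1/2} + Q x T^{-1/2} + Q x^{1/2} T^{1/2} + x)(\log x)^2 .
\end{align*}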

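Summing the above bound over $T = 2^{\ell} U$ for all nonnegative integers $\ell$ such that $2^{\ell} U \le UV$, we see that
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \left| S_2''(y, \chi, U, V) \right| \ll (Q^2 x^{1/2} + Q x U^{-1/2} + Q x^{1/2} U^{1/2} V^{1/2} + x)(\log x)^3 . \tag{4.10}
\]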


We now move on to bounding the contribution from $S_2'$. Applying (4.9), we have
\[
\left| S_2'(y, \chi, U, V) \right| \ll \sum_{t \le U} (\log t) \Big| \sum_{r \le y/t} \chi(t) \chi(r) \Big| \le (\log U) \sum_{t \le U} \Big| \sum_{r \le y/t} \chi(r) \Big| .
\]
By the Pólya–Vinogradov inequality (Theorem 2.12), we have that the inner sum is $\ll q^{1/2} \log q$ for $\chi$ nonprincipal. Thus, for $q > 1$, we have
\[
\left| S_2'(y, \chi, U, V) \right| \ll (\log U)\, U q^{1/2} \log q \ll q^{1/2} U (\log qU)^2 .
\]
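To get a feel for the size of the character sums being bounded here, one can compute them for a concrete family of primitive characters. The sketch below uses the Legendre symbol modulo an odd prime $p$, and compares the largest partial sum against $\sqrt{p} \log p$. This explicit form of the Pólya–Vinogradov bound (with implied constant 1) is the classical textbook version and is an assumption on our part; the text above only uses the bound up to an unspecified constant. The function names are hypothetical.

```python
import math


def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def max_partial_sum(p):
    """Largest |sum of (n/p) over n <= w| for 1 <= w < p.

    The sum over a full period vanishes, so this is the maximum over all w.
    """
    best = 0
    running = 0
    for n in range(1, p):
        running += legendre(n, p)
        best = max(best, abs(running))
    return best


for p in (7, 101, 1009):
    print(p, max_partial_sum(p), math.sqrt(p) * math.log(p))
```

The point of the inequality is that the bound depends only on $q$ and not on the length of the sum, which is what makes the trivial summation over $t \le U$ affordable.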

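As for $\chi$ principal (i.e., $q = 1$), we can use the bound
\[
\left| S_2'(y, \chi, U, V) \right| \ll (\log U) \sum_{t \le U} \frac{y}{t} \ll y (\log U)^2 .
\]
Thus, since $Q \le x^{1/2}$ and $U \le UV \le x$, we overall have
\begin{align*}
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} \left| S_2'(y, \chi, U, V) \right|
&\ll x (\log U)^2 + \sum_{1 < q \le Q} q^{3/2} U (\log qU)^2 \\
&\ll x (\log U)^2 + Q^{5/2} U (\log QU)^2 \ll (x + Q^{5/2} U)(\log x)^2 . \tag{4.11}
\end{align*}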

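Finally, we bound the contribution from
\begin{align*}
S_3(y, \chi, U, V) &= \sum_{n \le y} \chi(n) \sum_{\substack{hd = n \\ d \le V}} \mu(d) \log h = \sum_{d \le V} \mu(d) \sum_{h \le y/d} \chi(hd) \log h \\
&\ll (\log y) \sum_{d \le V} \max_{1 \le w \le y/d} \Big| \sum_{h \le w} \chi(h) \Big| .
\end{align*}
Again, we apply the Pólya–Vinogradov inequality for $q > 1$ to get
\[
|S_3(y, \chi, U, V)| \ll (\log y) \sum_{d \le V} q^{1/2} \log q \le q^{1/2} V (\log y)(\log q) \le q^{1/2} V (\log qy)^2,
\]
whereas if $q = 1$, we have
\[
|S_3(y, \chi, U, V)| \ll (\log y) \sum_{d \le V} \frac{y}{d} \ll y (\log y)(\log V) \le y (\log yV)^2 .
\]
Thus, we overall have
\begin{align*}
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |S_3(y, \chi, U, V)|
&\ll x (\log Vx)^2 + \sum_{1 < q \le Q} q^{3/2} V (\log qx)^2 \\
&\ll (x + Q^{5/2} V)(\log x)^2 . \tag{4.12}
\end{align*}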

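By applying the bounds (4.6), (4.10), (4.11), and (4.12) to (4.4), we obtain
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |\psi(y, \chi)| \ll (Q^2 x^{1/2} + x + Q x U^{-1/2} + Q x V^{-1/2} + Q x^{1/2} U^{1/2} V^{1/2} + Q^{5/2} U + Q^{5/2} V)(\log x)^4 .
\]
If $x^{1/3} \le Q \le x^{1/2}$, then setting $U = V = x^{2/3} Q^{-1}$ yields
\[
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |\psi(y, \chi)| \ll (Q^2 x^{1/2} + x + Q^{3/2} x^{2/3} + x^{7/6})(\log x)^4 \ll Q^2 x^{1/2} (\log x)^4,
\]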


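whereas if $Q < x^{1/3}$, then setting $U = V = x^{1/3}$ yields
\begin{align*}
\sum_{q \le Q} \frac{q}{\varphi(q)} \sum_{\chi \in X_q^*} \max_{y \le x} |\psi(y, \chi)|
&\ll (Q^2 x^{1/2} + x + Q x^{5/6} + Q^{5/2} x^{1/3})(\log x)^4 \\
&\ll (Q^2 x^{1/2} + x + Q x^{5/6})(\log x)^4 .
\end{align*}
Thus, our claimed bound holds in both cases, and we are done. □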
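The two choices of $U = V$ are pure exponent bookkeeping, which can be double-checked mechanically. Writing $Q = x^{\theta}$ with $0 \le \theta \le \frac{1}{2}$, each term $Q^a x^b U^c V^d$ becomes a power of $x$, and the claim is that the largest exponent never exceeds that of $x + x^{5/6} Q + x^{1/2} Q^2$. The sketch below does this with exact rational arithmetic; the names `TERMS`, `largest_exponent`, and `target_exponent` are our own, and logarithmic factors are ignored.

```python
from fractions import Fraction

# Each term Q^a x^b U^c V^d of the combined bound, stored as (a, b, c, d).
TERMS = [
    (2, Fraction(1, 2), 0, 0),                            # Q^2 x^{1/2}
    (0, 1, 0, 0),                                         # x
    (1, 1, Fraction(-1, 2), 0),                           # Q x U^{-1/2}
    (1, 1, 0, Fraction(-1, 2)),                           # Q x V^{-1/2}
    (1, Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)),  # Q x^{1/2} (UV)^{1/2}
    (Fraction(5, 2), 0, 1, 0),                            # Q^{5/2} U
    (Fraction(5, 2), 0, 0, 1),                            # Q^{5/2} V
]


def largest_exponent(theta):
    """Largest power of x among TERMS when Q = x^theta and U = V as in the text."""
    theta = Fraction(theta)
    if theta >= Fraction(1, 3):
        u = Fraction(2, 3) - theta  # U = V = x^{2/3} / Q
    else:
        u = Fraction(1, 3)          # U = V = x^{1/3}
    return max(a * theta + b + (c + d) * u for a, b, c, d in TERMS)


def target_exponent(theta):
    """Exponent of x in x + x^{5/6} Q + x^{1/2} Q^2 when Q = x^theta."""
    theta = Fraction(theta)
    return max(Fraction(1), Fraction(5, 6) + theta, Fraction(1, 2) + 2 * theta)


for theta in (Fraction(0), Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)):
    print(theta, largest_exponent(theta), target_exponent(theta))
```

At $\theta = \frac{1}{3}$ both choices of $U = V$ coincide, which is why the case split happens exactly there.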

This concludes our proof of the Bombieri–Vinogradov theorem.

Appendix A. Dirichlet L-Functions

Definition A.1. Let $\chi$ be a Dirichlet character of period $q$. The corresponding Dirichlet L-function is defined by
\[
L(s, \chi) := \sum_{n=1}^{\infty} \chi(n) n^{-s},
\]
for $s \in \mathbb{C}$ such that⁵ $\sigma > 1$.

Example A.2. Let $\chi_0$ be the principal character of period 1. Then, $L(s, \chi_0)$ is given by the well-known Riemann zeta function,
\[
\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s}.
\]

Since $|\chi(n)| \le 1$ for all $n \in \mathbb{Z}_{>0}$, it follows from the triangle inequality and the Weierstrass M-test that $L(s, \chi)$ is uniformly convergent in any compact subset of $\{\sigma > 1\}$, which shows that $L(s, \chi)$ is analytic in its domain $\{\sigma > 1\}$. Furthermore, in this domain, one can show that the identity
\[
L(s, \chi) = \prod_p \big( 1 - \chi(p) p^{-s} \big)^{-1}
\]
holds, by appealing to the fundamental theorem of arithmetic (unique prime factorization); we call this expression the Euler product of the L-function. In particular, if $\chi$ is a nonprimitive character of period $q$ induced by a primitive character $\chi_1$, comparing their Euler products gives us
\[
L(s, \chi) = L(s, \chi_1) \prod_{p \mid q} \big( 1 - \chi_1(p) p^{-s} \big).
\]
Thus, to understand the behavior of an arbitrary L-function, it suffices to understand that of L-functions corresponding to primitive characters.

While the domain of Dirichlet L-functions as defined above is restricted to $\sigma > 1$, we can in fact analytically continue such L-functions in the following manner.

Theorem A.3. Let $\chi$ be a primitive character of period $q$. Then, the completed L-function corresponding to $\chi$, defined by⁶
\[
\xi(s, \chi) := (\pi/q)^{-\frac{1}{2}(s + a(\chi))}\, \Gamma\Big( \frac{1}{2}(s + a(\chi)) \Big) L(s, \chi),
\]

⁵ Henceforth, we use the notation $\sigma = \operatorname{Re} s$ and $t = \operatorname{Im} s$.

⁶ In his foundational memoir [11], Riemann denoted the xi function $\xi(s)$ associated to $\zeta(s)$ as $\frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(\frac{1}{2} s) \zeta(s)$. The additional factor of $\frac{1}{2} s(s-1)$ makes the xi function entire. To remedy this deviation from historical tradition, we note the (unfortunately confusing) relation
\[
\xi(s) := \frac{1}{2} s(s-1)\, \xi(s, \chi_0),
\]
for the principal character $\chi_0 \in X_1$.


satisfies the functional equation
\[
\xi(1 - s, \overline{\chi}) = \frac{i^{a(\chi)} q^{1/2}}{\tau(\chi)}\, \xi(s, \chi),
\]
where $a(\chi) = 0$ if $\chi(-1) = 1$ and $a(\chi) = 1$ if $\chi(-1) = -1$. If $\chi$ is principal, $L(s, \chi)$ can be analytically continued to $\mathbb{C} \setminus \{1\}$ with a simple pole of residue 1 at $s = 1$. If $\chi$ is nonprincipal, $\xi(s, \chi)$ is entire, and so $L(s, \chi)$ can be analytically continued to $\mathbb{C}$.

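Proof. The definition of the gamma function gives us
\[
\Gamma(\tfrac{1}{2} s) = \int_0^{\infty} e^{-t} t^{\frac{1}{2} s - 1}\, dt, \quad (\sigma > 0),
\]
and by substituting $t = \frac{n^2 \pi x}{q}$, we obtain
\[
\pi^{-\frac{1}{2} s} q^{\frac{1}{2} s} \Gamma(\tfrac{1}{2} s)\, n^{-s} = \int_0^{\infty} x^{\frac{1}{2} s - 1} e^{-\frac{n^2 \pi x}{q}}\, dx, \quad (\sigma > 0). \tag{A.1}
\]
This gives us the identity
\begin{align*}
\pi^{-\frac{1}{2} s} q^{\frac{1}{2} s} \Gamma(\tfrac{1}{2} s) L(s, \chi) &= \sum_{n=1}^{\infty} \pi^{-\frac{1}{2} s} q^{\frac{1}{2} s} \Gamma(\tfrac{1}{2} s) \chi(n) n^{-s} = \sum_{n=1}^{\infty} \chi(n) \int_0^{\infty} x^{\frac{1}{2} s - 1} e^{-\frac{n^2 \pi x}{q}}\, dx \\
&= \int_0^{\infty} x^{\frac{1}{2} s - 1} \Big( \sum_{n=1}^{\infty} \chi(n) e^{-\frac{n^2 \pi x}{q}} \Big)\, dx, \quad (\sigma > 0),
\end{align*}
where absolute convergence justifies switching the summation with the integral, as checked by
\[
\sum_{n=1}^{\infty} \int_0^{\infty} \Big| x^{\frac{1}{2} s - 1} e^{-\frac{n^2 \pi x}{q}} \Big|\, dx = \sum_{n=1}^{\infty} \int_0^{\infty} x^{\frac{1}{2} \sigma - 1} e^{-\frac{n^2 \pi x}{q}}\, dx = \sum_{n=1}^{\infty} \pi^{-\frac{1}{2} \sigma} q^{\frac{1}{2} \sigma} \Gamma(\tfrac{1}{2} \sigma)\, n^{-\sigma} < \infty.
\]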

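We now consider the following two cases.

Case 1: $\chi(-1) = 1$. Note that for $\sigma > 0$, we have the equality
\[
\xi(s, \chi) = \pi^{-\frac{1}{2} s} q^{\frac{1}{2} s} \Gamma(\tfrac{1}{2} s) L(s, \chi) = \int_0^{\infty} x^{\frac{1}{2} s - 1} \omega(x)\, dx = \int_1^{\infty} x^{\frac{1}{2} s - 1} \omega(x)\, dx + \int_1^{\infty} x^{-\frac{1}{2} s - 1} \omega(x^{-1})\, dx, \tag{A.2}
\]
where
\[
\omega(x) = \omega(x, \chi) := \sum_{n=1}^{\infty} \chi(n) e^{-\frac{n^2 \pi x}{q}}.
\]
First, suppose that $q = 1$, which means that $\chi$ is the principal character and $L(s, \chi) = \zeta(s)$. We have the equality $2\omega(x) = \theta(x) - 1$, where the series
\[
\theta(x) := \sum_{n=-\infty}^{\infty} e^{-n^2 \pi x}
\]
is known to satisfy the functional equation $\theta(x^{-1}) = x^{1/2} \theta(x)$. Indeed, this is a special case of the following fact.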

Lemma A.4. For any $a \in \mathbb{C}$ and $t > 0$, we have
\[
\sum_{n=-\infty}^{\infty} e^{-\pi t (n + a)^2} = \sum_{n=-\infty}^{\infty} t^{-1/2} e^{-\frac{\pi n^2}{t}} e^{2\pi i n a}.
\]

Proof. By Poisson summation. For details, see [15, p.120]. □
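The special case $a = 0$, $t = x$ of the lemma is the functional equation $\theta(x^{-1}) = x^{1/2} \theta(x)$, which is easy to confirm numerically because the series converges extremely fast. The following is a minimal sketch; the truncation parameter `terms` and the function names are our own choices, and the check is only accurate up to floating-point error.

```python
import math


def theta(x, terms=60):
    """theta(x) = sum over all integers n of exp(-pi n^2 x), truncated at |n| <= terms."""
    return 1.0 + 2.0 * sum(math.exp(-math.pi * n * n * x)
                           for n in range(1, terms + 1))


def omega(x, terms=60):
    """omega(x) = sum over n >= 1 of exp(-pi n^2 x), so that 2*omega = theta - 1."""
    return sum(math.exp(-math.pi * n * n * x) for n in range(1, terms + 1))


for x in (0.3, 1.0, 2.5):
    print(x, theta(1.0 / x), math.sqrt(x) * theta(x))
```

The same two functions also satisfy $\omega(x^{-1}) = -\frac{1}{2} + \frac{1}{2} x^{1/2} + x^{1/2} \omega(x)$, which is the form in which the functional equation is used for $\zeta(s)$.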


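Using the functional equation of $\theta(x)$, we obtain
\[
\omega(x^{-1}) = -\frac{1}{2} + \frac{1}{2} \theta(x^{-1}) = -\frac{1}{2} + \frac{1}{2} x^{1/2} \theta(x) = -\frac{1}{2} + \frac{1}{2} x^{1/2} + x^{1/2} \omega(x),
\]
from which we see that
\begin{align*}
\int_1^{\infty} x^{-\frac{1}{2} s - 1} \omega(x^{-1})\, dx &= \int_1^{\infty} x^{-\frac{1}{2} s - 1} \Big( -\frac{1}{2} + \frac{1}{2} x^{1/2} + x^{1/2} \omega(x) \Big)\, dx \\
&= \int_1^{\infty} \Big( -\frac{1}{2} x^{-\frac{1}{2} s - 1} + \frac{1}{2} x^{-\frac{1}{2} s - \frac{1}{2}} + x^{-\frac{1}{2} s - \frac{1}{2}} \omega(x) \Big)\, dx = -\frac{1}{s} + \frac{1}{s - 1} + \int_1^{\infty} x^{-\frac{1}{2} s - \frac{1}{2}} \omega(x)\, dx.
\end{align*}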

Thus,
\[
\xi(s, \chi) = \pi^{-\frac{1}{2} s} \Gamma(\tfrac{1}{2} s) \zeta(s) = -\frac{1}{s} + \frac{1}{s - 1} + \int_1^{\infty} \Big( x^{\frac{1}{2} s - 1} + x^{-\frac{1}{2} s - \frac{1}{2}} \Big) \omega(x)\, dx. \tag{A.3}
\]
Note that the integral in the above expression is uniformly convergent in every compact subset of $\mathbb{C}$ by the Weierstrass M-test, since $\omega(x)$ decays at least exponentially, as one can see from
\[
\omega(x) \le \sum_{n=1}^{\infty} e^{-n \pi x} = \frac{e^{-\pi x}}{1 - e^{-\pi x}} = O(e^{-\pi x}), \quad x > 1,\ x \to \infty.
\]
It follows that (A.3) gives a meromorphic continuation of $L(s, \chi) = \zeta(s)$ to all of $\mathbb{C}$. To identify the pole(s) of $\zeta(s)$, note that $\pi^{\frac{1}{2} s} \Gamma(\frac{1}{2} s)^{-1}$ is entire, which means that the only candidates for poles of $\zeta(s)$ are 0 and 1. However, $\pi^{\frac{1}{2} s} \Gamma(\frac{1}{2} s)^{-1} = 0$ at $s = 0$, which means 0 cannot be a pole of $\zeta(s)$. Since $\pi^{\frac{1}{2} s} \Gamma(\frac{1}{2} s)^{-1} = 1$ at $s = 1$, it follows that $\zeta(s)$ has a simple pole of residue 1 at $s = 1$. Thus, (A.3) gives an analytic continuation of $L(s, \chi) = \zeta(s)$ to $\mathbb{C} \setminus \{1\}$, and substituting $s$ with $1 - s$ preserves the right-hand side of (A.3), showing that our proposed functional equation holds.

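Next, suppose that $q > 1$. Then, since $\chi(n) = \chi(-n)$ for all $n \in \mathbb{Z}$ and $\chi(0) = 0$, we have
\[
\omega(x, \chi) = \frac{1}{2} \sum_{n=-\infty}^{\infty} \chi(n) e^{-\frac{n^2 \pi x}{q}}.
\]
In fact, by applying (2.2) and Lemma A.4 with $t = \frac{q}{x}$, we have
\begin{align*}
\tau(\overline{\chi})\, \omega(x, \chi) &= \frac{1}{2} \sum_{n=-\infty}^{\infty} \Big( \sum_{m=0}^{q-1} \overline{\chi}(m) e\Big( \frac{mn}{q} \Big) \Big) e^{-\frac{n^2 \pi x}{q}} = \frac{1}{2} \sum_{m=0}^{q-1} \overline{\chi}(m) \sum_{n=-\infty}^{\infty} e^{-\frac{n^2 \pi x}{q} + \frac{2\pi i m n}{q}} \\
&= \frac{1}{2} \sum_{m=0}^{q-1} \overline{\chi}(m) \Big( \frac{q}{x} \Big)^{1/2} \sum_{n=-\infty}^{\infty} e^{-(n + \frac{m}{q})^2 \cdot \frac{\pi q}{x}} = \frac{1}{2} \Big( \frac{q}{x} \Big)^{1/2} \sum_{m=0}^{q-1} \overline{\chi}(m) \sum_{n=-\infty}^{\infty} e^{-(qn + m)^2 \cdot \frac{\pi}{qx}} \\
&= \frac{1}{2} \Big( \frac{q}{x} \Big)^{1/2} \sum_{k=-\infty}^{\infty} \overline{\chi}(k) e^{-\frac{k^2 \pi}{qx}} = \Big( \frac{q}{x} \Big)^{1/2} \omega(x^{-1}, \overline{\chi}).
\end{align*}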

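We apply this functional equation to split the integral as we have in (A.2), which yields
\begin{align*}
\xi(s, \chi) = \pi^{-\frac{1}{2} s} q^{\frac{1}{2} s} \Gamma(\tfrac{1}{2} s) L(s, \chi) &= \int_0^{\infty} x^{\frac{1}{2} s - 1} \omega(x, \chi)\, dx \\
&= \int_1^{\infty} x^{\frac{1}{2} s - 1} \omega(x, \chi)\, dx + \int_1^{\infty} x^{-\frac{1}{2} s - 1} \omega(x^{-1}, \chi)\, dx \\
&= \int_1^{\infty} x^{\frac{1}{2} s - 1} \omega(x, \chi)\, dx + \frac{q^{1/2}}{\tau(\overline{\chi})} \int_1^{\infty} x^{-\frac{1}{2} s - \frac{1}{2}} \omega(x, \overline{\chi})\, dx.
\end{align*}
Similarly as before, the two integrals on the right-hand side above are both uniformly convergent in every compact subset of $\mathbb{C}$ by the Weierstrass M-test, so $\xi(s, \chi)$ is entire. Since $\pi^{\frac{1}{2} s} q^{-\frac{1}{2} s} \Gamma(\frac{1}{2} s)^{-1}$ is entire, we have that $L(s, \chi)$ is entire. Furthermore, we can replace $s$ with $1 - s$ and $\chi$ with $\overline{\chi}$ to get
\[
\xi(1 - s, \overline{\chi}) = \int_1^{\infty} x^{-\frac{1}{2} s - \frac{1}{2}} \omega(x, \overline{\chi})\, dx + \frac{q^{1/2}}{\tau(\chi)} \int_1^{\infty} x^{\frac{1}{2} s - 1} \omega(x, \chi)\, dx.
\]
Note that for any primitive $\chi$ with $\chi(-1) = 1$, since
\[
\overline{\tau(\chi)} = \overline{\sum_{m=0}^{q-1} \chi(m) e\Big( \frac{m}{q} \Big)} = \sum_{m=0}^{q-1} \overline{\chi}(m) e\Big( \frac{-m}{q} \Big) = \sum_{m=0}^{q-1} \overline{\chi}(-m) e\Big( \frac{-m}{q} \Big) = \tau(\overline{\chi}),
\]
it follows from Lemma 2.11 that $\tau(\chi) \tau(\overline{\chi}) = |\tau(\chi)|^2 = q$. Thus, we overall have
\[
\xi(1 - s, \overline{\chi}) = \frac{q^{1/2}}{\tau(\chi)}\, \xi(s, \chi),
\]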

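as needed.

Case 2: $\chi(-1) = -1$. Replace $s$ with $s + 1$ in (A.1) to get
\[
\pi^{-\frac{1}{2}(s+1)} q^{\frac{1}{2}(s+1)} \Gamma(\tfrac{1}{2}(s+1))\, n^{-s} = \int_0^{\infty} n x^{\frac{1}{2} s - \frac{1}{2}} e^{-\frac{n^2 \pi x}{q}}\, dx, \quad (\sigma > -1).
\]
This gives us the identity
\[
\xi(s, \chi) = \pi^{-\frac{1}{2}(s+1)} q^{\frac{1}{2}(s+1)} \Gamma(\tfrac{1}{2}(s+1)) L(s, \chi) = \int_0^{\infty} x^{\frac{1}{2} s - \frac{1}{2}} \omega_1(x, \chi)\, dx, \quad (\sigma > -1),
\]
where
\[
\omega_1(x, \chi) := \sum_{n=1}^{\infty} n \chi(n) e^{-\frac{n^2 \pi x}{q}}, \quad (x > 0).
\]
As before, we can split the integral to obtain
\[
\xi(s, \chi) = \int_1^{\infty} x^{\frac{1}{2} s - \frac{1}{2}} \omega_1(x, \chi)\, dx + \int_1^{\infty} x^{-\frac{1}{2} s - \frac{3}{2}} \omega_1(x^{-1}, \chi)\, dx. \tag{A.4}
\]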

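We wish to prove a functional equation for $\omega_1(x, \chi)$. To do so, we take $\frac{\partial}{\partial a}$ of the identity given in Lemma A.4, which yields
\[
-2\pi t \sum_{n=-\infty}^{\infty} (n + a) e^{-(n + a)^2 \pi t} = \frac{2\pi i}{t^{1/2}} \sum_{n=-\infty}^{\infty} n e^{-\frac{n^2 \pi}{t} + 2\pi i n a}.
\]
Set $t = \frac{q}{x}$ and $a = \frac{m}{q}$ to obtain
\[
\sum_{n=-\infty}^{\infty} n e^{-\frac{n^2 \pi x}{q} + \frac{2\pi i m n}{q}} = i \Big( \frac{q}{x} \Big)^{3/2} \sum_{n=-\infty}^{\infty} \Big( n + \frac{m}{q} \Big) e^{-(n + \frac{m}{q})^2 \cdot \frac{\pi q}{x}}.
\]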


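This allows us to derive a functional equation for $\omega_1(x, \chi)$:
\begin{align*}
\tau(\overline{\chi})\, \omega_1(x, \chi) &= \frac{1}{2} \sum_{n=-\infty}^{\infty} n \Big( \sum_{m=0}^{q-1} \overline{\chi}(m) e\Big( \frac{mn}{q} \Big) \Big) e^{-\frac{n^2 \pi x}{q}} = \frac{1}{2} \sum_{m=0}^{q-1} \overline{\chi}(m) \sum_{n=-\infty}^{\infty} n e^{-\frac{n^2 \pi x}{q} + \frac{2\pi i m n}{q}} \\
&= \frac{1}{2} i \Big( \frac{q}{x} \Big)^{3/2} \sum_{m=0}^{q-1} \overline{\chi}(m) \sum_{n=-\infty}^{\infty} \Big( n + \frac{m}{q} \Big) e^{-(n + \frac{m}{q})^2 \cdot \frac{\pi q}{x}} \\
&= \frac{1}{2} i q^{1/2} x^{-3/2} \sum_{m=0}^{q-1} \overline{\chi}(m) \sum_{n=-\infty}^{\infty} (qn + m) e^{-(qn + m)^2 \cdot \frac{\pi}{qx}} \\
&= i q^{1/2} x^{-3/2} \Big( \frac{1}{2} \sum_{k=-\infty}^{\infty} k \overline{\chi}(k) e^{-\frac{\pi k^2}{qx}} \Big) = i q^{1/2} x^{-3/2} \omega_1(x^{-1}, \overline{\chi}).
\end{align*}
Applying the above to (A.4), we see that
\begin{align*}
\xi(s, \chi) &= \pi^{-\frac{1}{2}(s+1)} q^{\frac{1}{2}(s+1)} \Gamma(\tfrac{1}{2}(s+1)) L(s, \chi) \\
&= \int_1^{\infty} x^{\frac{1}{2} s - \frac{1}{2}} \omega_1(x, \chi)\, dx + \int_1^{\infty} x^{-\frac{1}{2} s - \frac{3}{2}} \omega_1(x^{-1}, \chi)\, dx \\
&= \int_1^{\infty} x^{\frac{1}{2} s - \frac{1}{2}} \omega_1(x, \chi)\, dx + \frac{i q^{1/2}}{\tau(\overline{\chi})} \int_1^{\infty} x^{-\frac{1}{2} s} \omega_1(x, \overline{\chi})\, dx.
\end{align*}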

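Note that $\omega_1(x, \chi)$ decays at least exponentially, since for sufficiently large $x$, we have
\[
|\omega_1(x, \chi)| \le \sum_{n=1}^{\infty} n e^{-\frac{n^2 \pi x}{q}} \le e^{-\frac{\pi x}{q}} + \int_1^{\infty} t e^{-\frac{t^2 \pi x}{q}}\, dt = O\big( e^{-\frac{\pi x}{q}} \big).
\]
Here, we have used that $\sum_{n=2}^{\infty} n e^{-\frac{n^2 \pi x}{q}}$ is a right-hand Riemann sum approximation of $\int_1^{\infty} t e^{-\frac{t^2 \pi x}{q}}\, dt$, and thus bounds the integral from below, since $t e^{-\frac{t^2 \pi x}{q}}$ is decreasing in $t \in [1, \infty)$ for sufficiently large $x$. Thus, both integrals in the above expression for $\xi(s, \chi)$ are uniformly convergent in every compact subset of $\mathbb{C}$ by the Weierstrass M-test, proving that $\xi(s, \chi)$ is entire and consequently, that $L(s, \chi)$ is entire. Furthermore, since any primitive $\chi$ with $\chi(-1) = -1$ satisfies
\[
\overline{\tau(\chi)} = \overline{\sum_{m=0}^{q-1} \chi(m) e\Big( \frac{m}{q} \Big)} = \sum_{m=0}^{q-1} \overline{\chi}(m) e\Big( \frac{-m}{q} \Big) = -\sum_{m=0}^{q-1} \overline{\chi}(-m) e\Big( \frac{-m}{q} \Big) = -\tau(\overline{\chi}),
\]
it follows from Lemma 2.11 that $\tau(\chi) \tau(\overline{\chi}) = -|\tau(\chi)|^2 = -q$. Thus, we overall have
\[
\xi(1 - s, \overline{\chi}) = \frac{i q^{1/2}}{\tau(\chi)}\, \xi(s, \chi),
\]
as needed. □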
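Both cases of the proof rely on the relation $\tau(\chi) \tau(\overline{\chi}) = \chi(-1)\, q$ for primitive $\chi$. For a real character $\overline{\chi} = \chi$, so this reads $\tau(\chi)^2 = \chi(-1)\, q$, which can be checked numerically. The sketch below does so for the Legendre symbol modulo an odd prime $p$, chosen here only as a convenient family of primitive characters; the function names are hypothetical.

```python
import cmath
import math


def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def gauss_sum(p):
    """tau(chi) = sum over m mod p of chi(m) e(m/p) for chi the Legendre symbol."""
    return sum(legendre(m, p) * cmath.exp(2j * math.pi * m / p)
               for m in range(p))


for p in (5, 7, 13):
    tau = gauss_sum(p)
    print(p, abs(tau) ** 2, tau ** 2, legendre(-1, p) * p)
```

For $p = 5$ the character is even and $\tau^2 \approx 5$, while for $p = 7$ it is odd and $\tau^2 \approx -7$, matching the sign difference between Case 1 and Case 2.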

The functional equation allows us to classify the zeroes of a Dirichlet L-function.

Corollary A.5. Let $\chi$ be a primitive character of period $q$. Any zero of $\xi(s, \chi)$ is contained in the critical strip $\{0 \le \sigma \le 1\}$ (but neither $s = 0$ nor $s = 1$ is a zero), and such zeroes are placed symmetrically with respect to the critical line $\sigma = \frac{1}{2}$. The zeroes of $L(s, \chi)$ are precisely given by the zeroes of $\xi(s, \chi)$ (denoted as the nontrivial zeroes) and simple zeroes at certain nonpositive integers (denoted as the trivial zeroes). If $\chi$ is principal, the trivial zeroes are at $s = -2, -4, \ldots$, whereas otherwise, they are at $s = -a(\chi), -a(\chi) - 2, -a(\chi) - 4, \ldots$, where $a(\chi) = 0$ if $\chi(-1) = 1$ and $a(\chi) = 1$ if $\chi(-1) = -1$.

Proof. The Euler product shows that $L(s, \chi)$ has no zeroes in $\{\sigma > 1\}$, and $\pi^{-\frac{s + a(\chi)}{2}} q^{\frac{s + a(\chi)}{2}} \Gamma\big( \frac{s + a(\chi)}{2} \big)$ is nonvanishing for all $s \in \mathbb{C}$. Thus, $\xi(s, \chi)$ has no zeroes in $\{\sigma > 1\}$. By the functional equation $\xi(1 - s, \overline{\chi}) = \frac{i^{a(\chi)} q^{1/2}}{\tau(\chi)} \xi(s, \chi)$, it follows that $\xi(s, \chi)$ also has no zeroes in $\{\sigma < 0\}$.

We invoke the important fact that $L(1, \chi)$ does not vanish; one way of discerning this is to see that the Dedekind zeta function of $K = \mathbb{Z}[\zeta_q]$ (where $\zeta_q$ is a $q$th root of unity) satisfies
\[
\zeta_K(s) = \prod_{\chi \in X_q} L(s, \chi),
\]
and has a pole at $s = 1$, which means that none of the $L(1, \chi)$ can vanish, since among $X_q$, only the principal character $\chi_0$ of period $q$ satisfies that $L(s, \chi_0)$ has a pole, which is simple (for a detailed discussion of the Dedekind zeta function and proofs of the aforementioned claims regarding it, we refer the reader to [24, Theorem 4.3]).⁷ Thus, $L(1, \chi)$ and $L(1, \overline{\chi})$ are nonvanishing, which implies that $\xi(1, \chi)$, $\xi(1, \overline{\chi})$, $\xi(0, \chi)$, and $\xi(0, \overline{\chi})$ are nonvanishing as well.

Note that $\overline{\xi(1 - \overline{s}, \chi)} = \xi(1 - s, \overline{\chi}) = \frac{i^{a(\chi)} q^{1/2}}{\tau(\chi)} \xi(s, \chi)$, which shows that $s$ is a zero of $\xi(s, \chi)$ if and only if $1 - \overline{s}$ is also a zero, and these have the same multiplicity. This is precisely the fact that the zeroes of $\xi(s, \chi)$ are placed symmetrically with respect to the critical line $\sigma = \frac{1}{2}$.

Finally, note that $\pi^{-\frac{s + a(\chi)}{2}} q^{\frac{s + a(\chi)}{2}} \Gamma\big( \frac{s + a(\chi)}{2} \big)$ has no zeroes, and this function's poles are given by simple poles at all $s$ such that $\frac{s + a(\chi)}{2} \in \mathbb{Z}_{\le 0}$, i.e., at $s \in \{-a(\chi), -a(\chi) - 2, -a(\chi) - 4, \ldots\}$. If $\chi$ is principal, then $\xi(s, \chi)$ has a simple pole at $s = 0$ and is analytic and nonzero for $\sigma < 0$, whereas if $\chi$ is nonprincipal, then $\xi(s, \chi)$ is analytic and nonzero at $s = 0$ and for $\sigma < 0$. Thus, our final claim in the corollary holds. □

While we know that the zeroes of $\xi(s, \chi)$ (i.e., the nontrivial zeroes of $L(s, \chi)$) are located in the critical strip $\{0 < \sigma < 1\}$, it is also important to know if such zeroes exist, and if so, how many. In fact, we see that there are infinitely many such zeroes.

Theorem A.6. Let $\chi$ be a primitive Dirichlet character. Define $f(s) = \xi(s) = \frac{1}{2} s(s-1) \xi(s, \chi_0)$ if $\chi$ is principal and $f(s) = \xi(s, \chi)$ otherwise. The entire function $f(s)$ has order 1 and has infinitely many zeroes. In particular, $L(s, \chi)$ has infinitely many nontrivial zeroes.

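Proof. First, we show that there exists $C > 0$ such that $|f(s)| = O(e^{C|s| \log|s|})$ for sufficiently large $|s|$. By the functional equation for $f$, it suffices to show this claim for $\sigma > \frac{1}{2}$. The factor $\frac{1}{2} s(s-1)$, if present in the expression for $f(s)$, is polynomial and thus $O(e^{C|s| \log|s|})$ for sufficiently large $|s|$. The factor $(\frac{\pi}{q})^{-\frac{1}{2}(s + a(\chi))}$ is exponential and thus also $O(e^{C|s| \log|s|})$ for sufficiently large $|s|$. Recall Stirling's formula for $\Gamma(s)$ (see [15, p.120]), which states that for a fixed $\delta > 0$, we have
\[
\Gamma(s) = e^{s \log s} e^{-s} \frac{\sqrt{2\pi}}{s^{1/2}} \Big( 1 + O\Big( \frac{1}{|s|^{1/2}} \Big) \Big)
\]
for sufficiently large $|s|$ such that $|\arg s| \le \pi - \delta$. Note that $\sigma > \frac{1}{2}$ implies that $\left| \arg \frac{1}{2} s \right| < \frac{\pi}{2}$. It follows that for sufficiently large $C > 0$, we have
\[
\Gamma\Big( \frac{s + a(\chi)}{2} \Big) = e^{\frac{1}{2}(s + a(\chi)) \left( \log(s + a(\chi)) + \log \frac{1}{2} \right) - \frac{1}{2}(s + a(\chi))} \frac{\sqrt{2\pi}}{\big( \frac{s + a(\chi)}{2} \big)^{1/2}} \Big( 1 + O\Big( \frac{1}{|s|^{1/2}} \Big) \Big) = O(e^{C|s| \log|s|})
\]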

⁷ The classical way to show that $L(1, \chi) \ne 0$ is by appealing to the class number formula, which is essentially equivalent to the above method. See [3, §6] for more details.


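for $\sigma > \frac{1}{2}$ and sufficiently large $|s|$.

We only have the factor $L(s, \chi)$ left to bound. We note that $L(s, \chi)$ is bounded in the half-plane $\{\sigma \ge 2\}$ by $\sum_{n=1}^{\infty} n^{-2} = \frac{\pi^2}{6}$, so it suffices to show that $L(s, \chi)$ is bounded by $e^{C|s| \log|s|}$ for sufficiently large $|s|$ such that $\frac{1}{2} < \sigma < 2$. Indeed, letting $(x) := x - \lfloor x \rfloor$ be the fractional part of $x$, we check for an arbitrary $X \ge 1$ that
\begin{align*}
|L(s, \chi)| &\le \Big| \sum_{1 \le n \le X} n^{-s} \Big| + \Big| \sum_{n > X} n^{-s} \Big| = \Big| \sum_{1 \le n \le X} n^{-s} \Big| + \Big| \int_{X^-}^{\infty} x^{-s}\, d\lfloor x \rfloor \Big| \\
&= \Big| \sum_{1 \le n \le X} n^{-s} \Big| + \Big| \big[ x^{-s} \lfloor x \rfloor \big]_{x=X}^{\infty} + s \int_X^{\infty} x^{-s-1} \lfloor x \rfloor\, dx \Big| \\
&= \Big| \sum_{1 \le n \le X} n^{-s} \Big| + \Big| -\frac{\lfloor X \rfloor}{X^s} + s \int_X^{\infty} \frac{x - (x)}{x^{s+1}}\, dx \Big| \\
&= \Big| \sum_{1 \le n \le X} n^{-s} \Big| + \Big| -X^{1-s} + \frac{(X)}{X^s} + \frac{s}{s-1} X^{1-s} - s \int_X^{\infty} \frac{(x)}{x^{s+1}}\, dx \Big| \\
&= \Big| \sum_{1 \le n \le X} n^{-s} \Big| + \Big| \frac{(X)}{X^s} + \frac{1}{(s-1) X^{s-1}} - s \int_X^{\infty} \frac{(x)}{x^{s+1}}\, dx \Big| .
\end{align*}
Using the fact that $\frac{1}{2} < \sigma < 2$ and imposing the lower bound $t \ge 1$, we further obtain by the triangle inequality
\begin{align*}
|L(s, \chi)| &\le \sum_{1 \le n \le X} n^{-1/2} + \frac{1}{X^{1/2}} + \frac{1}{t X^{\frac{1}{2} - 1}} + (2 + t) \cdot \frac{1}{X^{\frac{1}{2} + 1}} \le \int_0^{\lfloor X \rfloor} \frac{1}{x^{1/2}}\, dx + \frac{1}{X^{1/2}} + \frac{X^{1/2}}{t} + \frac{2 + t}{X^{3/2}} \\
&\le 2 X^{1/2} + \frac{1}{X^{1/2}} + \frac{X^{1/2}}{t} + \frac{2 + t}{X^{3/2}} .
\end{align*}
Setting $X = t$, we get that the above is $O(t^{1/2})$. This clearly implies that $L(s, \chi)$ is $O(|s|^{1/2})$ for $\sigma > \frac{1}{2}$. Thus, we overall have that $f(s) = O(e^{C|s| \log|s|})$.

On the other hand, it follows from our work above that $\frac{f(s)}{\Gamma(\frac{1}{2} s)}$ decays at most exponentially, whereas $\Gamma\big( \frac{s + a(\chi)}{2} \big) \ge \Gamma(\frac{s}{2}) > e^{0.49 |s| \log|s|}$ for sufficiently large $s \in \mathbb{R}_{>0}$, so $f(s)$ cannot be $O(e^{C_1 |s|})$ for any $C_1 > 0$. So, $f$ is an entire function of order 1.

Suppose for the sake of a contradiction that $f$ has finitely many zeroes $z_1, \ldots, z_N$, counted with multiplicity. Then, since $f(0) \ne 0$, Hadamard's factorization theorem (see [15, p.147]) implies
\[
f(z) = e^{g(z)} \prod_{n=1}^{N} \Big( 1 - \frac{z}{z_n} \Big)
\]
for a polynomial $g(z)$ of degree $\le 1$. But this contradicts the fact that $f(s)$ cannot be $O(e^{C_1 |s|})$ for any $C_1 > 0$. Thus, $f$ has infinitely many zeroes, i.e., $L(s, \chi)$ has infinitely many nontrivial zeroes. □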

We fittingly conclude our discussion of Dirichlet L-functions with a famous conjecture regarding the placement of their infinitely many nontrivial zeroes, perhaps the most famous open problem in mathematics.

Conjecture A.7 (Generalized Riemann Hypothesis). The nontrivial zeroes of $L(s, \chi)$ all lie on the critical line $\sigma = \frac{1}{2}$.

Acknowledgments

The author would like to thank Professor Manjul Bhargava for generously agreeing to advise this junior independent work and for being understanding of the author's proposed change in topics from representation theory to analytic number theory halfway into the semester. The author also would like to thank Daniel Li and Roy Zhao for their helpful comments on the manuscript, with particular thanks to Daniel Li for the idea of formulating the definition of a Dirichlet character in terms of a commutative diagram.

This paper represents my own work in accordance with university regulations.

References

[1] R. P. Boas Jr., A general moment problem, Amer. J. Math. 63 (1941), 361–370.
[2] E. Bombieri, On the large sieve, Mathematika 12 (1965), 201–225.
[3] H. Davenport, Multiplicative number theory, Third edition, Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000. Revised and with a preface by Hugh L. Montgomery.
[4] H. Davenport and H. Halberstam, The values of a trigonometrical polynomial at well spaced points, Mathematika 13 (1966), 91–96.
[5] P. X. Gallagher, The large sieve, Mathematika 14 (1967), 14–20.
[6] D. A. Goldston, J. Pintz, and C. Y. Yildirim, Primes in tuples. I, Ann. of Math. (2) 170 (2009), no. 2, 819–862.
[7] H. Hatalova and T. Salat, Remarks on two results in the elementary theory of numbers, Acta Fac. Rerum Natur. Univ. Comenian. Math. 20 (1970), 113–117.
[8] U. V. Linnik, The large sieve, C. R. (Doklady) Acad. Sci. URSS (N.S.) 30 (1941), 292–294.
[9] H. L. Montgomery, A note on the large sieve, J. London Math. Soc. 43 (1968), 93–98.
[10] G. Polya, Uber die verteilung der quadratischen reste und nichtreste, Nachr. Akad. Wiss. Gottingen (1918), 21–29.
[11] B. Riemann, Ueber die anzahl der primzahlen unter einer gegebenen grosse, Monat. der Konigl. Preuss. Akad. der Wissen (1860), 671–680.
[12] K. F. Roth, On the large sieves of Linnik and Renyi, Mathematika 12 (1965), 1–9.
[13] I. Schur, Einige bemerkungen zur vorstehenden arbeit des Herrn G. Polya, Nachrichten Konigl. Ges. Wiss. Gottingen (1918), 30–36.
[14] C. L. Siegel, Uber die classenzahl quadratischer zahlkorper, Acta Arithmetica 1 (1935), no. 1, 83–86.
[15] E. M. Stein and R. Shakarchi, Complex analysis, Princeton Lectures in Analysis, II, Princeton University Press, Princeton, NJ, 2003.
[16] A. Strombergsson, Analytic number theory – lecture notes based on Davenport's book. http://www2.math.uu.se/astrombe/analtalt08/wwwnotes.pdf.
[17] E. C. Titchmarsh, A divisor problem, Rend. Circ. Mat. Palermo 54 (1930), 414–429.
[18] R. C. Vaughan, The Bombieri-Vinogradov theorem. http://www.personal.psu.edu/rcv4/Bombieri.pdf.
[19] R. C. Vaughan, Sommes trigonometriques sur les nombres premiers, C. R. Acad. Sci. Paris, Serie A 285 (1977), 981–983.
[20] A. I. Vinogradov, On the density hypothesis and the quasi-Riemann hypothesis, Dokl. Akad. Nauk SSSR 158 (1964), 1014–1017.
[21] I. M. Vinogradov, Sur la distribution des residus et des nonresidus des puissances, J. Soc. Phys. Math. Univ. Permi (1918), 18–28.
[22] I. M. Vinogradov, Some theorems concerning the theory of primes, Recueil Math. 2 (1937), no. 44, 179–195.
[23] A. Walfisz, Zur additiven Zahlentheorie. II, Math. Z. 40 (1936), no. 1, 592–607.
[24] L. C. Washington, Introduction to cyclotomic fields, Second edition, Graduate Texts in Mathematics, vol. 83, Springer-Verlag, New York, 1997.
[25] Y. Zhang, Bounded gaps between primes, Ann. of Math. (2) 179 (2014), no. 3, 1121–1174.

Department of Mathematics, Princeton University, Princeton, NJ 08544
E-mail address: [email protected]
