mandrewmoshier.github.io · 2019-12-31 · what is this about? discrete mathematics consists of...

M . A N D R E W M O S H I E R

C O N T E M P O R A R YD I S C R E T E

M AT H E M AT I C S

M & H P U B L I S H I N G

Copyright © 2019 M. Andrew Moshier

PUBLISHED BY M&H PUBLISHING

MANDREWMOSHIER .GITHUB . IO/LNDM/

Contents

What is this about? 4

I Natural Numbers and Induction 9

1 The Natural Numbers 11

2 Arithmetic 18

3 Laws of Arithmetic 22

4 Examples of Recursion and Induction 31

5 Ordering and Strong Induction 44

6 Lists 52

II Sets, Functions, Relations and Proofs 63

7 Sets, Functions, Relations 68

3

8 Functions in More Detail 81

9 Basics of Relations and Logic 101

10 The Set of Natural Numbers 118

III Applications 125

11 Minimum and Maximum 126

12 Divisibility: Ordering the Natural Numbers by Multiplication 130

13 Greatest Common Divisors and Least Common Multiples 138

14 Prime Numbers 147

What is this about?

D ISCRETE MATHEMATICS consists of many individual topics that, imprecisely, contrast with continuousmathematics, e.g., topics like calculus, real analysis, differential equations, and so on. But a sharp contrastbetween discrete and continuous mathematics is mainly a convenience. We put some topics in one basket,others in a different basket, but things are more connected than that.

Discrete mathematics texts tend to be disorganized, featuring disparate topics that start looking like anIsland of Misfit Math — stuff that simply does not fit anywhere else. In this text, I take the view that discretemathematics comprises a coherent set of ideas centered around some key themes:

• Discrete mathematics is mainly about structure and information. Numbers play an important role,but mainly because different sorts of numbers convey different sorts of information. For example, thenumber of students in a class conveys some information about the class. Or the area under a certain curveconveys some information about the curve.

• The natural numbers and closely related inductive structures are central because they provide a bridgebetween mathematics and computation.

• Mathematical data is “typed”. Natural numbers constitute a type of data; real numbers constitute adifferent type of data; polynomials with real coefficients constitute yet another type of data, and so on.

• Algebraic (equational and inequational) reasoning provides key insights about how structures can berelated.

• Mathematics itself is computational — at least it is extremely productive to think that way as much aspossible.

• Abstraction and use of analogy is a crucial means of discovering new ideas, and of tying what we knowtogether.

In these lecture notes, we deal head on with mathematics as the study of abstract structure. This can befrustrating at first because the concrete applications are not always obvious. The pay off comes when we seethat generalization from particulars leads to much wider applications than we could have anticipated.

For example, 154133+ 111111

(a number with more that 100 trillion digits) is not divisible by 11. These twonumbers are far too big to calculate explicitly as a numeral. We can not actually hope to perform the expo-nentiations and addition, and check whether the result is divisible by 11. Yet, we are completely confidentthat if we could perform the calculations, the result would be a number that is not divisible by 11.

We should be able to convince ourselves thatmn is always divisible bymwhenever n is a positiveinteger, not just by taking that for granted. Likewise, we can check (by performing a long division) that

5

154133 is not evenly divisible by 11. Finally, we should also be able to convince ourselves that ifm is notdivisible by some number (let’s call it p) and n is divisible by p, thenm+n is not divisible by p. So, we canreason about a particular example indirectly by working about more general principles.

Figuring out how to convince someone that something like this must true is actually the main activity ofmathematics.

Persuasion

Mathematicians spend a lot of time thinking about what is, or isn’t true. But that would be pretty useless,if they did not then communicate the result of their thinking. To actually put mathematics to work, you mustlearn how to pursuade someone else to accept a proposed solution to a problem. It is not enough to knowhow to solve problems.

The standards for persuasion in mathematics are very high. We do not settle for “preponderance ofevidence” or “beyond reasonable doubt”, or “that seems reasonable.” We aim for “beyond any doubt.” Inmost human endeavours (including in the sciences), that standard would be paralyzing, but in mathematics,where abstract structure is the object of investigation, it is not only within reach, but it is the standard thatmakes sense.

In mathematics, a proof is a convincing argument. So it falls into the genre of pursuasive writing. Thispoint of view is critical for understanding how mathematicians work.

The emphasis on writing is important. After all, no one cares what I think, unless I manage to convincehim or her that my thinking is on track.

Most exercises in these notes are not of the sort: “”Solve for x: 4x− 3 = x2". Rather, an exercise might say“Show that the equation 4x− 3 = x2 has two solutions.” The former would tempt you simply to write (witha box around the answer, of course) something like x = 1, 3. The latter asks you to convince the reader thatthese two values are the only solutions.

Particularly in the first part of the text, I will emphasize fairly formulaic writing. This is meant to giveyou practice with precise mathematical writing. Think of it as practicing basic scales and arpeggios.

Roughly, an Outline

These notes start by investigating two ideas that are fundamental to all of mathematics.

• First, we look carefully at the idea of counting things. You might think there is not much to say aboutsomething you have been doing reliably for most of your life. But arithmetic (addition and multiplication,for example) is tightly related to counting. Understanding arithmetic relies on understanding howcounting works.

• Second is the idea of putting things into lists. Again, you have been doing this a long time. You writedown shopping lists, you take two lists and merge them into one list, and so on. Lists are crucial foreverything in mathematics, even how we write a number in base ten. After all, what exactly does 3429mean? A base ten numeral is formed by putting symbols (digits) next to each other into a certain order. So3429 is not the same numeral as 3492. Lists provide a general way to think about things like this.

After looking at the basics of counting and of lists, we turn to the foundational language of contemporary

6

mathematics: sets, functions and relations. Informally, a set is simply a collection of things, such as N — thecollection of natural numbers. A function is a correlation of the things in one set with things in another set.A familiar example is f(x) = x2 that correlates with any real number x, the square x2. A relation is what issays it is: a way to relate things in one set to things in another.

To make sense of sets, functions and relations, we need to consider some basic questions:

• What does it mean say two sets are equal?

• What does it mean to say two functions are equal?

• What does it mean to say two relations are equal?

• How can we specify or “build” particular sets?

• How can we specify or “build” particular functions?

• How can we speficy or “build” particular relations?

The middle part of this text concerns these questions.In the last part of the text, we investigate a variety of useful mathematical ideas using the techniques de-

veloped in the first two parts. The emphasis is mostly on practical ideas that either emerge from computing,or are directly useful there.

Some basics

Before we launch into mathematics proper, we need to sort out some basic notation and discuss how to readthis text. As usual, we typically use single letters, for example, in x2 − 2x+ 1, as placeholders for values.These are frequently called “variables”, but there may or may not be anything “varying” about the idea. Forexample, in the equation 0 = x2 − 2x+ 1, there is exactly one real number (1) that can be the value of x.

There is nothing special about the name ‘x’. I might as well have written y2 − 2y+ 1 or a2 − 2a+ 1. Thedifference is only apparent when we make assumptions about the variables. For example, look at these two:

• “Suppose x is a positive real number. Consider x2 + 3x+ 1.”

• “Suppose x is a positive real number. Consider y2 + 3y+ 1.”

Clearly, they mean different things. In the first one, the polynomial expression is guaranteed to be positive.In the second one, the polynomial has nothing to to with x. Contrast this with these two:

• “Suppose x is a positive real number. Consider x2 + 3x+ 1.”

• “Suppose y is a positive real number. Consider y2 + 3y+ 1.”

Evidently, these two really do mean the same thing. It is important to keep details like this in mind. Tounderstand how a variable should be interpreted look for the context in which it is first mentioned.

Frequently, we use letters in certain parts of the alphabet to refer to specific types on values. For example,I will mostly usem, n and p to refer to natural numbers, and keep using x, y and z for generic values (notfrom a particular type of value). Usually f, g, h stand for functions. Later on, other types of entities willbe needed. I will use Greek letters for some things. So it is in your interest to be familiar with the Greekalphabet — at least, some commonly used letters.

7

Sometimes, a datum should have no meaning on its own. Symbols are just names with no interpretation.I will typically write a symbol in a type face like so: waffles. Thus x is a symbol, whereas x is a variable. Inhandwritten text, a symbol might be indicated by underlining.

Two other type faces will come in handy: san serif and SMALL CAPS. Usage will not make much sensenow. I’ll wait to explain how I intend to use them when the time is right.

Parentheses are used in mathematics is several distinct ways. This, unfortunately, can be a source ofmisunderstanding. Here are some of the ways they are used.

• Function application. We frequently write f(5) to mean “apply the function f to the value 5.

• Pairs, triples and so on: (4, 5) means “the pair consisting of 4 then 5.”

• Ranges. In some situations, the notation (0, 1) means “the collection of real numbers strictly between 0and 1.

• Grouped sub-expressions. In an expression such a+ b · c, you know to evaluate b · c before added a to theresult. But to get the result of multiplying a+ bwith c, you would write (a+ b) · c.

Usually, these different uses of parentheses will not cause any confusion because the context will make itclear what is meant. I just caution you to pay close attention.

Throughout the text, I refer to certain parts as algorithms or definitions. The distinction is not precise. Theidea is that an algorithm is meant to convey a precise way of calculating something; a definition is meantto convey a way to talk about something. For example, addition of two natural numbers is presented asan algorithm because there is a precise sequence of concrete steps to take when adding. But ≤ (the relationof being less than or equal to) is presented as a definition. Even though there may be an calculation todetermine whetherm ≤ n, the calculation is not the main idea. There are no concrete steps in mind fordetermining whetherm ≤ n is true.

This usage of the word algorithm is not meant to be formal. An algorithm in the text is not a program ina particular programming language. Rather, an algorithm is just a precise description of a procedure thatmight be carried out by hand or by computer.

Part I

Natural Numbers and Induction

10 CONTEMPORARY DISCRETE MATHEMATICS

OUR EARLIEST MATHEMATICAL EXPERIENCE is learning to count. Arithmetic on the natural numbers,simple as it seems, exhibits many of the features of higher contemporary mathematics that concern usthroughout this course. By starting with a careful look at arithmetic, you prepare yourself for what comes inthis course and in later mathematics.

An important technique for reasoning about natural numbers and many other mathematical structures iscalled induction. Induction is the main theme of Part I, and is used throughout the rest of the text.

1The Natural Numbers

THE NATURAL NUMBERS have to do with counting. In this chapter,CHAPTER GOALS

Investigate the structure of countingand explain that structure via postulatesfor the natural numbers.

the structure of natural numbers is the topic. The natural numbersconstitute the most important data type of mathematics and com-puting. As the 19th century mathematician Kronecker quipped “Dienatürlichen Zahlen hat der liebe Gott gemacht, alles andere ist Men-schenwerk” (the natural numbers were made by God, the rest is man’shandiwork).

To hone in on that structure, we look at similar structures that failto capture basic aspects of counting. These bogus structures can beruled out by being precise about how counting behaves. The result isa system of easily remembered postulates that distinguish the structureof natural numbers from all others.

1.1 The Basic Picture

A question “how many X’s are there?” is distinct from “How much x isINFORMAL DEFINITION OF NATU-

RAL NUMBERS

A counting or natural number is anumber that can be used to answer aquestion of the form “How many X’s arethere?”

there?”. The latter sort of question could be answered with 12 (as long

as we know how we are measuring). On the other hand, the answer toa “how many” question could be 1, 2, 3 and so on. The answer couldalso be 0 (for example, “how many math professors own unicorns?”).On the other hand, “how many?” can not be answered with 1

2 , π, −5and so on.

Natural numbers may be pictured like stepping stones as in Fig-ure 1.1.

0 1 2 3 . . . Figure 1.1: A picture of the naturalnumbers

Not all “stepping stone” pictures are acceptable as pictures of thenatural numbers. Figures 1.2, 1.3 and 1.4 illustrate three ways not to


picture the natural numbers.

0 . . .

Figure 1.2: Forks in the path

0 Figure 1.3: Nowhere to go

. . . . . . Figure 1.4: Nowhere to start

Examples of how things can go wrongplay an important role in math. They arecalled counter-examples.

These incorrect pictures can be ruled out by explaining the basicvocabulary of counting.

THE NATURAL NUMBERS 13

VOCABULARY 1: Basic Vocabulary of Natural Numbers

The natural numbers have the following features.

• There is a special natural number, zero, denoted by 0.

• For any natural number n, there is a unique next natural num- The notation ny is not standard. I useit to distinguish the successor of n fromn+ 1, because the latter presupposes thatwe already know what + means.

ber, called the successor of n. In these notes, the successor of n isdenoted by ny.

The collection of natural numbers is denoted by N. To indicate The symbol N is very common usage.Any mathematician will know im-mediately that you mean the naturalnumbers when you use this symbol.In Chapter 7, we will start to make thenotation n ∈ N more precise, and willgeneralize to other “data types”.

that a variable n is intended to range over natural numbers, we writen ∈N. So for example, saying “Every n ∈N is either odd or even” isthe same as saying “Every natural number n is either odd or even.”

According to Vocabulary 1, expressions like 0, 0y, 0yy, 0yyy

denote natural numbers. Of course, we usually abbreviate these bywriting 0, 1, 2, 3. But the characters 1, 2, 3, etc., are not related to eachother. Vocabulary 1 makes it clear that 0y is meant to be the numberafter 0; 0yy, the number after that, and so on. We will want to be ableto switch between the familiar “decimal” notation and “successor”notation whenever it is convenient. Assuming the plus sign is part of We will not worry about parentheses

too much in this notation. For example,(0yy)yyy and 0yyyyy mean thesame thing. We will assume that y takeshigh precedence. For example,m · nywill mean “m times the successor ofn.” This is distinct from (m · n)y (“thesuccessor of the productm times n”.

an augmented vocabulary, 2+ 2 = 4 could be written as

0yy + 0yy = 0yyyy.

The next exercises emphasize the switch back and forth.


Exercises

Convert the following expres-sions from decimal notation tosuccessor notation.

1. 9

2. 10

3. 4+ 3

4. n+ 4

Convert the following expres-sions from successor notation todecimal notation.

5. 0yyyy

6. n+ 0y = 0yy +m

7. 5yy

8. 0y + 0yy

1.2 Narrowing to actual counting

F IGURES 1.5 AND 1.6 illustrate correct pictures of Vocabulary 1.Obviously, they are not pictures of the natural numbers. It is notenough just to spell out carefully how to talk about counting. We alsoneed to explain what is not allowed. Postulates play that role.

0

Figure 1.5: A strange way to count

0

Figure 1.6: Another strange way tocount

Figure 1.5 is flawed because 0 has a predecessor: a value n satisfyingny = 0. Figure 1.6 is flawed because an element has two distinctpredecessors: 0y = 0yyyy.

We can rule these problems out by postulating behavior of 0 and ofsuccessors.


POSTULATE 1: Nothing Precedes 0

For every natural number n, it is the case that ny 6= 0.

POSTULATE 2: Predecessors are Unique

For any natural numbersm and n, ifmy = ny thenm = n. In other words, if p is the successor of anatural number, then that predecessor isunique.

Suppose you are told that “m has a predecessor.” Then you knowm = ky for some k. By the previous two postulates, you immediatelyknow two more things. First,m 6= 0. Second, k is unique. That is, wecan speak about the predecessor ofm. This suggests that we can thinkof those natural numbers with predecessor specially. Let us make thatofficial.

DEFINITION 1: Positive Natural Numbers

We say that a natural numberm is positive ifm has a predecessor.In particular, 0 is not positive. Also, following the notation N for thecollection of all natural numbers, we write N+ for the collection of allpositive natural numbers.

Form ∈ N+, the predecessor ofm is the unique k ∈ N satisfyingm = ky. We will write pred(m) for this. In other words, for eachm ∈N+,

m = pred(m)y

and for each n ∈N,n = pred(ny).

The postulates eliminate Figures 1.5, 1.6 and similar bad pictures.But there is still a problem, illustrated in Figure 1.7. The circle labelled? has a predecessor because ?y = ?. But it is not clear that is makessense to call it positive.


0 . . .

?

Figure 1.7: A model of the naturalnumbers?

The picture in Figure 1.7 uses the basic vocabulary correctly, andsatisfies the first two postulates. Yet, it is not a picture of naturalnumbers because it has “extra stuff” in it. In this case, ? is “extrastuff,” but we could have concocted more complicated examples. Thechallenge is to rule out any possible “extra stuff”.

Were we to erase the circle labelled ? and any arrows leading to andfrom it, the remaining part of Figure 1.7 would still live up to the basicvocabulary of natural numbers (Vocabulary 1). This is exactly whatmakes for “extra stuff”: elements that can be removed without doingany harm to our ability to talk about all natural numbers. Clearly,we can not remove 0 because the basic vocabulary requires it. Andwe can not remove 0y because the basic vocabulary requires 0 tohave a successor. Likewise, we cannot remove 0yy because the basicvocbulary requires 0y to have a successor. And so on. So the valuesthat can be reached by a sequence of steps (that is, by counting) from0must stay. Anything else would be extra. This leads to our lastpostulate.

POSTULATE 3: The Axiom of Induction

No natural numbers can be removed without violating Vocabu- This postulate is different from theprevious two. In the Postulates 1 and 2,the axiom refers directly to the things(natural numbers) being talked aboutusing the vocabulary. The Axiom ofInduction refers to the whole collectionof natural numbers at once. Statementsthat refer to individual things are calledfirst-order statements. Those that refer towhole collections are called second-order.

lary 1.

Believe it or not, the vocabulary and three postulates we havestated here completely characterize the standard picture of the naturalnumbers. In other words, any picture that satisfies these will look thesame. A rigorous proof of this is possible, but not necessary for now.

With this axiom, the term positive actually means what we expectit to mean: m is positive if and only ifm is strictly greater than 0. Tomake that claim precise we need to know what strictly greater thanshould mean. This is a topic discussed in Chapter 11.


Exercises

9. Each of the following pictures violates the vocabulary or one ormore of our postulates. For each, explain what is violated.

(a) 0

(b)

(c) 0

(d) 0

10. I have in mind a picture for Vocabulary 1 and Postulates 1 and 2. Exercise 10 shows that in the naturalnumbers, if n 6= 0 then n = ky forsome k. In other words, every n ∈ N

falls into exactly one of two mutuallyexclusive cases: either n = 0 or n has apredecessor.

Furthermore, in the picture, I have in mind an element n for which(a) n 6= 0 and (b) n has no predecessor (that is, n 6= ky for everynatural number k). Convince me that the picture fails to satisfyPostulate 3.

11. Draw three different pictures of situations that satisfy all the pos-tulates except that they fail Postulate 1. So there will be an arrowfrom a bubble into the bubble labelled 0. The result must satisfy allother postulates including the Axiom of Induction.

2Arithmetic

ADDING AND MULTIPLYING arise from counting. In this chapter,CHAPTER GOALS

Define addition and multiplicationas mechanical computations based oncounting.

we explore how to define them computationally, purely in terms ofcounting.

2.1 Basic Arithmetic Operations

Addition works by counting ahead. For example, to add 4+ 5, startwith 4 and then count five more. This is how you first learned to add.Multiplication works by counting a number of additions. To multiply2 · 3, add 2 three times: 2+ 2+ 2. The following capture the idea.

ALGORITHM 1: Addition

The sum ofm ∈N and n ∈N is a natural numberm+n, calculatedby the following:

m+ 0 = m

m+ ky = (m+ k)y for any k

ARITHMETIC 19

ALGORITHM 2: Multiplication

The product ofm ∈ N and n ∈ N is a natural numberm · n, In this text, I will always denote multi-plication with an explicit ·. Though itis conventional to write things like 2minstead of 2 ·m, I keep the multiplicationsign to make things clearer.

calculated by the following:

m · 0 = 0m · ky = m+ (m · k) for any k

An algorithm is a precise specification for how to calculate some-thing. The foregoing are both algorithms in so far as they spell outhow to calculatem+n andm · n for anym and n.

The vocabulary and postulates of natural numbers ensure thatthere are indeed unique operations + and · that satisfy the equations.A detailed proof of this fact is not illuminating right now. We willbe better off proving that recursive algorithms like this always work.Then sum and product will turn out just to be particular instances of ageneral method for specifying algorithms. We will return to this pointin Chapter 10.

In the next chapter, we will establish a proof pattern called “simplearithmetic induction” generalizing this argument. This is one of themainstays of mathematics and computing.

Induction also shows that the equations for sums and productsdefine operations.

EXAMPLE 1:

Calculate 4+ 3:When you need to calculate something“by hand”, use this format. Write thefirst expression followed by =, followedby the first step in the calculation. Alignthe equals signs and write subsequentsteps on new lines. This makes it mucheasier to check your own work. Alsouse another column to write notesexplaining why a step is valid. Theseexplanatory notes are not alwaysneeded when a step is obvious, but fornow get in the habit of writing this way.

3+ 4 = 0yyy + 4 [3 abbreviates 0yyy]

= (0yy + 4)y [ky +n = (k+n)y]

= (0y + 4)yy [Same reason]

= (0+ 4)yyy [Same reason]

= 4yyy [0+n = n]

= (0yyyy)yyy [4 abbreviates 0yyyy]

= 0yyyyyyy [Remove unneeded parentheses]

= 7 [7 abbreviates 0yyyyyyy]


A product can be calculated similarly. Consider 3 · 2.

3 · 2 = 3 · 0yy [2 abbreviates 0yy]

= 3+ (3 · 0y) [m · ky = m+ (m · k)]= 3+ (3+ (3 · 0)) [Same reason]

= 3+ (3+ 0) [m · 0 = 0]= 3+ 3 [m+ 0 = m]

= 3+ 0yyy [3 abbreviates 0yyy]

= (3+ 0yy)y [m+ ky = (m+ k)y]

= (3+ 0y)yy [Same reason]

= (3+ 0)yyy [Same reason]

= 3yyy [m+ 0 = m]

= (0yyy)yyy [3 abbreviates 0yyy]

= 0yyyyyy [Remove unnecessary parentheses]

= 6 [6 abbreviates 0yyyyyy]

We certainly will not want to calculate this way in real life. Even asimple calculation like 3 · 2 = 6 took thirteen steps. But these examples You knew this already. All we have

done is make it explicit.show how addition and multiplication are built mechanistically fromcounting.

Exercises

12. Calculate these sums, following the previous example to writeeach step of your calculation explicitly. Include the reason for eachstep (as in the previous example). Take care to lay out the chain ofequalities correctly, and do not skip any steps.

(a) 2+ 4

(b) 4+ 2

(c) 3+ (3+ 1)

(d) (3+ 3) + 1

(e) 0+ 3

13. Notice that it takes more steps to calculate 2+ 4 than 4+ 2, eventhough they produce the same answer. Explain why.

14. Calculate the following values, writing each step explicity.

ARITHMETIC 21

(a) 2 · 3

(b) 0 · 2

(c) 2 · (2 · 2)

(d) 3 · (2+ 1)

(e) (3 · 2) + (3 · 1)

15. Write a definition of exponentiation via equations form0 and formk

y. Follow the pattern of definition for addition and multiplica-

tion.

3Laws of Arithmetic

YOU DO NOT NEED TO CALCULATE anything to know thatCHAPTER GOALS

Remind ourselves of the basic laws ofaddition and multipication. Establishthe pattern of proof by simple arithmeticinduction. Illustrate how to use thepattern by proving that these laws hold.

335 · (23+ 17) = 335 · 23+ 335 · 17.

This is because you know that multiplication always distributes overaddition. This is a law of arithmetic. You can think of several otherlaws we use all the time, such as commutativity —m+ n = n+m—and associativity —m+ (n+ p) = (m+n) + p.

We can prove that these laws hold by virtue of the structure of We use the names of these laws fre-quently. Many of them show up innon-numerical contexts as well andhave common, familiar names. So youwill do yourself a favor by memorizingthem. A few of them will be more ob-scure to you right now, but they are allimportant for actually using arithmatic.

counting and the definitions of addition and multiplication. In thischapter, we see that the Axiom of Induction gives rise to an importantpattern for proving facts about natural numbers. Later we generalizethe techique to other kinds of data.

The following table summarizes several useful laws of arithmeticon the natural numbers. Most of them will be familiar to you.

LAWS 1: Basic Laws of Arithmetic

For any natural numbers,m, n and p: This table is organized to emphasizesimilarities between addition andmultiplication. Pay attention to that.

LAWS OF ARITHMETIC 23

Associativity m+ (n+ p)= (m+n) + p

m · (n · p)= (m · n) · p

Commutativity m+n= n+m

m · n= n ·m

Identity m+ 0= m = 0+m

m · 1= m = 1 ·m

Positivity ifm+n = 0 then n = 0

Integrality ifm · n = 1 then n = 1

Cancellativity ifm+ p = n+ p thenm = n

ifm · (py) = n · (py) thenm = n

Distributivity m · (n+ p)= (m · n) + (m · p)

Case Distinction ifm 6= 0 thenm ∈N+

No Zero Divisors my · ny 6= 0

The Law of Case Distinction was the subject of Chapter ?? Exer-cise 10. Go back and look at that exercise again.

3.1 Inductive Proofs

Suppose we wish to prove that every natural number has some prop-erty. For example, we wish to prove that 0+m = m for all naturalnumbersm. This is obviously true. But why? Our definition of +says thatm+ 0 = m, not the other way around. So something needsto be explained. We could try saying “addition is commutative, so0+m = m+ 0 and by the definitionm+ 0 = m”. But now we haveanother problem. How do we know that addition is commutative?Again, the defining equations for addition do not explicitly tell us this.

The Axiom of Induction holds the key to proving that some fact istrue for all natural numbers. The idea is simple, but will take somegetting used to. Back to the example of 0+m = m. We could simplytry to check for allmwhether this is true. First, is 0+ 0 = 0 true? Yes.Is 0+ 0y = 0y? Now we need to do a bit of calculating. 0+ 0y =

(0+ 0)y according to the definition of addition. But we just noted that0+ 0 = 0. Now is 0+ 0yy = 0yy? Again, we will need to do somecalculating. Evidently, “just check everything” is not going to workbecause we will never complete the task.


On the other hand, think about the natural numbers that do happento satisfy 0+m = m. If just these natural numbers provide us with amodel of Vocabulary 1, then the Axiom of Induction asserts that wecan not remove any natural numbers from the ones that happen tosatisfy 0+m = m. So all natural numbers must.

Let us see how it works in practice. Say that a natural numberm isnice if it has the property 0+m = m. We want to prove that all naturalnumbers are nice. What we really do is show that the nice naturalnumbers form a picture of Vocabulary 1.

• Evidently, 0 is nice because 0+ 0 = 0. That is part of the algorithmfor addition.

• Suppose k is a nice natural number.

• For the nice natural numbers to form a model of Vocabulary 1, itmust be the case that ky is also nice. So we check:

0+ ky = (0+ k)y [By the addition of algorithm.]

= ky [Because we suppose k is nice]

Putting these together, we see that the nice natural numbers form thecharacteristic “stepping stone” picture: zero is nice, any successor of anice number is nice. So the Axiom of Induction ensures that all naturalnumbers are nice.

A proof employing the Axiom of Induction in this way is called aproof by simple arithmetic induction, or just a proof by induction, forshort. We will see more general forms of induction later.


OUTLINE OF INDUCTIVE PROOFS

To make inductive proofs easier to understand, we often write themin the three-step outline suggested above. To prove that all naturalnumbers are “nice”:

Basis Prove that 0 is nice.

Inductive Hypothesis Assume that k is nice for some (unspecified)k ∈N.

Inductive Step Prove that ky is also nice. You may use the inductive hypothesis(the assumption that k is nice) in theinductive step.From these, the nice numbers constitute a model of Vocabulary 1).So

by the Axiom of Induction, all natural numbers are nice.

The familiar laws of arithmetic are now provable by induction.We do not have to accept them “from authority.” We can check forourselves why they must be true.

PROPOSITION 1: Addition is associative.

For any natural numbersm, n and p,

m+ (n+ p) = (m+n) + p.

PROOF : Suppose thatm and n are fixed natural numbers (not known A common technique is to suppose thatsome of the values in the proof are fixed,but otherwise not special in any way. Aproof that does not use anything specialabout them will be valid for all.

to us). We prove the result by induction on p.

Basis The goal of the basis is the show thatm+ (n+ 0) = (m+n) + 0.Butm+ (n+ 0) = m+n = (m+n) + 0 due to the definition of +.

Inductive Hypothesis Assume thatm+ (n+ k) = (m+ n) + k forsome k ∈N.

Inductive Step The goal is to show thatm+ (n+ ky) = (m+n) + ky.

m+ (n+ ky) = m+ (n+ k)y [Def. of +]

= (m+ (n+ k))y [Same reason]

= ((m+n) + k)y [Inductive Hypothesis]

= (m+n) + ky [Def. of +]

Therefore (by the Axiom of Induction),m+(n+p) = (m+n)+p holdsfor all natural numbers p. Since this argument does not depend on


any extra assumptions aboutm and n, it holds for all natural numbersm and n. �

Mathematicians typically use thesymbol � as a punctuation mark toindicate the end of a proof.

In the remainder of this section, we illustrate the technique ofsimple arithmetic induction by proving other laws of arithmetic.

PROPOSITION 2: 0 is the identity for addition.

For anym, it is the case thatm+ 0 = m andm = 0+m. Because we are currently discussingnatural numbers and nothing else, itis safe to write “For anym” instead of“For any natural numberm”.

PROOF : The first equality is true by the definition of +. The secondequality,m = 0+m, was proved above as an introduction to induction.�

To prove that addition is commutative, we need another fact abouthow successor and addition interact.

LEMMA 1: Successors migrate Mathematicians use the word lemma toindicate that a certain fact is needed tomake other proofs easier. It is sort of likea subroutine.For anym and n,m+ny = my +n.

PROOF : We prove the claim by induction on n. Supposem is somefixed natural number.

Basis The goal is to show thatm+ 0y = my + 0.

m+ 0y = (m+ 0)y [Def. of +]

= my [Def. of +]

= my + 0 [Def. of +]

Inductive Hypothesis Supposem+ ky = my + k for some k ∈N.

Inductive Step The goal is to show thatm+ kyy = my + ky.

m+ kyy = (m+ ky)y Def. of +

= (my + k)y [Inductive Hypothesis]

= my + ky [Def. of +]

So (m+ n)y = my + n for all n. Because the proof does not dependon any assumption aboutm, it is valid for allm. �

Roughly speaking this lemma, and the definition of +, permit us to


movey anywhere within an addition: my+n = (m+n)y = m+ny.So we are free to move a successor “out of the way” whenever weneed to. The next proof illustrates the point.

PROPOSITION 3: Addition is commutative.

For any natural numbersm and n,

m+n = n+m

PROOF : We need to show thatm+ n = n+m for allm and n. Thistime, the proof is by induction onm. Fix a value for n ∈N.

Basis The goal is to show that 0+ n = n+ 0. But 0+ n = n = n+ 0

holds because of Proposition 2 and the definition of +.

Inductive Hypothesis Assume that k+n = n+ k for some k.

Inductive Step The goal is to show that ky +n = n+ ky.

ky +n = k+ny [Lemma 1]

= (k+n)y [Definition of +]

= (n+ k)y [Inductive Hypothesis]

= n+ ky [Def. of +]

Therefore,m+ n = n+m for allm. Since this argument does notdepend on any assumptions about n, it is valid for all n. �

The next law may be less familiar to you. Roughly, it says that wecan “subtract” equals and get equals. But because actual subtractiondoes not generally make sense for natural numbers (5 − 7meansnothing without introducing negative numbers), cancellativity is thebest we can do.


PROPOSITION 4: Addition is cancellative.

For any natural numbersm, n and p, ifm+ p = n+ p thenm = n.

PROOF : The proof is by induction on p. Assume thatm and n are This proof is a little subtler than theprevious ones. But notice that it stillfollows the same form: Basis, InductiveHypothesis, Inductive Step.

some fixed natural numbers.

Basis The goal is to show that ifm+ 0 = n+ 0, thenm = n. Supposem+ 0 = n+ 0. Then immediately by definition of +,m = n.

Inductive Hypothesis Assume that the following statement is true forsome k ∈N: ifm+ k = n+ k thenm = n.

Inductive Step The goal is to show that ifm + ky = n + ky thenm = n. Supposem+ ky = n+ ky [call this supposition (*) forreference]. Then

(m+ k)y = m+ ky [Def. of +]

= n+ ky [By the supposition (*)]

= (n+ k)y [Definition of +]

Thus (m + k)y = (n + k)y. Because predecessors are unique(Postulate 2)m + k = n + k. Thus by the inductive hypothesis,m = n.

Therefore,m+ p = n+ p impliesm = n for all p. Since this argumentdoes not depend on any assumptions regardingm and n, it is valid forallm and all n. �

To prove that multiplication is commutative, we will need thefollowing lemmas (analogous to Proposition 2 and Lemma 1).

LEMMA 2: 0 is an “annihilator”

For any natural number n, it is the case that 0 · n = 0 = n · 0.

PROOF : The algorithm for multiplication directly specifies thatn · 0 = 0.

The proof that 0 · n = n is by induction on n.

Basis The goal is to show that 0 · 0 = 0. But this is directly from thealgorithm for multiplication.

Inductive Hypothesis Assume that 0 · k = 0 for some k ∈N.


Inductive Step The goal is to show that 0 · ky = 0.

0 · ky = 0+ 0 · k [Definition of ·]= 0+ 0 [Inductive Hypothesis]

= 0 [Definition of +]

�

LEMMA 3: Successors migrate in products


my · n = m · n+n.

PROOF : The proof is by induction on n. Assumem is fixed.

Basis The goal is to show thatmy · 0 = m · 0+ 0. Clearlymy · 0 =

0 = 0+ 0 = m · 0+ 0 all follow from the definitions of + and ·.

Inductive Hypothesis Suppose thatmy · k = m · k + k for somek ∈N.

Inductive Step The goal is to show thatmy · ky = m · ky + ky.

my · ky = my +my · k [Exercise]

= my + (m · k+ k) [Exercise]

= (my +m · k) + k [Exercise]

= (m+m · k)y + k [Exercise]

= (m · ky)y + k [Exercise]

= m · ky + ky [Exercise]

Because the proof does on depend on assumptions aboutm, it is truefor allm. �

Some of the other laws are left as exercises.

Exercises

16. Prove that 1 is the identity for multiplication. That is, 1 ·m = m andm = m · 1.


17. Write out the entire proof of Lemma 3 providing the justificationsfor each line of the calculation in the inductive step.

18. Prove that multiplication distributes over addition [m · (n+ p) =

m · n+m · p] by induction on p. You can use any of the lemmas andpropositions we have already proved.

(a) Prove the basis: m · (n+ 0) = m · n+m · 0.

(b) Write the inductive hypothesis.

(c) Prove the inductive step: m · (n+ ky) = m · n+m · ky

19. Prove that multiplication is associative [m · (n · p) = (m · n) · p] byinduction on p.

(a) Prove the basis: m · (n · 0) = (m · n) · 0.

(b) Write the inductive hypothesis.

(c) Prove the inductive step: m · (n · ky) = (m · n) · ky. Hint: Usethe Law of Distribution, which you just proved.

20. Prove that multiplication is commutative. Hint: Use the two Lem-mas we proved right before these exercises.

21. Prove the Law of Positivity: ifm+ n = 0, then n = 0. [Hint: UseCase Distinction.]

22. Prove the Law of Integrality: ifm · n = 1, then n = 1. [Hint: Showthat if n 6= 1, thenm · n 6= 1. By Case Distinction, n 6= 1means thateither n = 0 or n = kyy for some k.]

23. Prove the Law of No Zero Divisors. Hint: my · ny = my +my · n For the record, we don’t prove thecancellation law for multiplication here.We return to it Chapter 11 when wehave other techniques.

by definition of multiplication. No explicit induction is needed forthis proof.

4Examples of Recursion and Induction

ARITHMETIC OPERATIONS ON NATURAL NUMBERS like sum andCHAPTER GOALS

Define several additional operationson natural numbers using recursion.Prove useful facts about the operations,partly to get more practice deployingthe proof pattern of simple arithmeticinduction.

product have been defined by recursion. In this chapter, we look atseveral other operations that arise in applications. Some of these, suchas factorials, are familiar to you. Others are less so.

ALGORITHM 3: Factorial

For natural number n, define the factorial of n, written n!, by thefollowing:

0! = 1

(ky)! = k! · ky

Two closely related computations are know as rising and fallingexponents.

ALGORITHM 4: Rising exponent

For natural numbers n andm, define the rising nth exponent ofm,writtenmn, by the following:

m0 = 1

mky

= m · (my)k

We will also need falling exponents such as 53 = 5 · 4 · 3. Notice that55 should equal 5!, but 510 is not clear. For our present purpose, we


will only definemn when n ≤ m. This avoids the trouble of thinkingabout higher exponents, and is sufficient for our needs.

Notice that n ≤ m really means that n+ d = m for some d. So we We will discuss this in more detail in thenext chapter.can definemn by thinking of it as (n+ d)n.

ALGORITHM 5: Falling exponent

For natural numbers d and n, define the falling exponent (n+ d)n

by recursion:

(0+ d)0 = 1

(ky + d)ky

= (d+ ky) · (k+ d)k

The sense in which rising exponent generalizes factorial is summa-rized in the following lemmas. The first two establish a law similar tothe familiar fact about standard exponents that bx+y = bx · by.

LEMMA 4: Successor in rising exponents


mny

= mn · (m+n).

PROOF : The proof is by induction on n.

Basis The goal is to show thatm0y = m0 · (m+ 0) for allm ∈N. Butm0

y= m · (my)0 = m And likewisem0 · (m+ 0) = m.

Inductive hypothesis Suppose that for some k ∈ N, it is the case thatmk

y= mk · (m+ k) for allm : N

Inductive Step The goal is to show that mkyy = mky · (m+ ky) for I omit the justifications for these calcula-

tions because, by now, you ought to beable to supply them for yourself.

allm ∈N.

mkyy

= m · (my)ky

= m · (my)k · (my + k)

= mky · (m+ ky)

�

EXAMPLES OF RECURSION AND INDUCTION 33

PROPOSITION 5: The law of rising exponents


mn+p = mn · (m+n)p.

PROOF : The proof is by induction on p.

Basis The goal is to show thatmn+0 = mn · (m+n)0. This is clearlytrue for anym and n because n+ 0 = n and (m+n)0 = 1.

Inductive hypothesis Suppose that for some k ∈N,

mn+k = mn · (m+n)k

for all natural numbersm and n.

Inductive step The goal is to show that

mn+ky

= mn · (m+n)ky

for anym and n.

mn+ky

= m(n+k)y

= mn+k · (m+n+ k)

= mn · (m+n)k · (m+n+ k)

= mn · (m+n)ky

[By Lemma 4]

�

LEMMA 5: Factorial is a special case of rising exponent

For any natural number n,

n! = 1n

PROOF : We prove this by induction on n.

Basis The goal is to show that 0! = 10. But 0! = 1 and 10 = 1 bydefinition.

Inductive hypothesis Suppose that k! = 1k for some k,


Inductive step The goal is to show that (ky)! = 1ky .

1ky

= 1k · (1+ k)= k! · ky

= (ky)!

�

A law of falling exponents is analogous to the law of rising expo-nents.

PROPOSITION 6: The law of falling exponents


(m+n+ p)m+n = (m+n+ p)m · (n+ p)n.

PROOF : This is an exercise. �

Finally, rising and falling exponents are related by the following.

LEMMA 6: (my)n = (m+n)n

PROOF : By induction on n, we prove that for allm, the lemma holds.

Basis: Both falling and rising exponents yield 1 for the exponent 0. Sofor allm, (my)0 = (m+ 0)0.

Inductive hypothesis: Suppose that for some k, it is the case that(my)k = (m+ k)k holds for allm.

Inductive step: The goal is to show that (my)ky

= (m+ ky)ky

for allm. Fix an arbitrarym. Then

(my)ky

= (m+ 1) · ((m+ 1)y)k [Definition]

= (m+ 1)1 · (m+ 1+ k)k [Inductive hypothesis]

= (m+ 1+ k)1+k [Law of falling exponents]

= (m+ ky)ky

�


4.1 Summation

You are probably familiar with notation like∑i=2 10i

2, meaning22 + 32 + · · ·+ 102. We devote Chapter ?? to understanding how tocalculate using

∑, but for now just look at basic definitions and a

couple of examples.In∑i=2 10i

2, 2 is called the lower limit of the sum and 10 is theupper limit. We can define

∑recursively.

ALGORITHM 6: Summation

Suppose that ai is a real number for every i betweenm and n,including both. Then

∑ni=m ai = am + am+1 + . . .+ an. That is,

m+0−1∑i=m

ai =0

m+ky−1∑i=m

ai = (∑i=m

m+ k− 1ai) + am+k

The base case may seem strange, but it makes sense if you considerthat

∑mi=m ai ought to be the sum of a single term am. This suggests

that∑m−1i=m ai ought to equal 0.

Consider for an example,∑9i=0 2

i. This is 20 + 21 + 22 + · · · 29. Wecould simply evaluate this directly, but a more informative approachis to try to understand

∑ni=m for arbitrarym ≤ n. The Table ?? shows

the first several values of∑ni=0 2

i for small n, suggesting that the sumis always one less than a power of 2. In fact, 1+

∑ni=0 2

i = 2n+1 for alln.

n∑ni=0 2

n

0 1

1 1+ 2 = 3

2 1+ 2+ 4 = 7

3 1+ 2+ 4+ 8 = 15

4 1+ 2+ 4+ 8+ 16 = 31

Table 4.1: Sums of powers of 2

Exercises

24. Prove that 1+∑ni=0 2

i = 2n+1.

25. Prove that for any natural numbersm ≤ n ≤ p, and any summa-


tion∑p−1i=m ai, it is the case that

p−1∑i=m

=

n−1∑i=m

ai +

p−1∑i=n

ai.

[Hint: This is a generalization of associativity of addition: (a0 +a1) + (a2 + a3) = a0 + a1 + a2 + a3. Your proof will use associativ-ity at a key step.]

26. Prove that for any natural numbersm ≤ n, any constant c and anysummation

∑ni=m ai, it is the case that

n∑i=m

c · ai = cn∑i=m

ai.

[Hint: This is a generalization of distributivity: c · (a0 + a1) =

c · a0 + c · a1. Your proof will use distributivity at a key step.]

27. Prove that for any natural numbersm ≤ n and any summations∑ni=m ai and

∑ni=m bi, it is the case that

n∑i=m

ai + bi =

n∑i=m

ai +

n∑i=m

bi.

[Meta-hint: This is a generalization of another familiar law ofarithmetic.]

Remark: Recall from calculus that∫pmf(x)dx =

∫nmf(x)dx+

∫pnf(x)dx,

∫pmcf(x)dx = c

∫pmf(x)dx,

and ∫pmf(x) + g(x)dx =

∫pmf(x)dx+

∫pmg(x)dx.

These exercises point toward a valuable analogy between integrationand summation. In Chapter ??, we explore this in depth.

Other ways to control indices

It is fairly common to see summations such as∑ni=0 x

n−iyi. The ex-ponent n− i is related to i and to the upper limit n by the obvious factthat i+ (n− i) = n. So we can write

∑i+j=n ai,j as an abbreviation for

the sum of all terms ai,j where i+ j = n. We will look at generalizing


this to other conditions. For example,∑i+j+k=n ai,j,k denotes the

sum of all terms ai,j,k where i+ j+ k = n. For now, though, we onlyneed the case of i+ j = n.

4.2 Binomial coefficients

A binomial is a term of the form (x + y)n for some natural numbern. Evidently (x + y)0 = 1. And (x + y)k

y= (x + y) · (x + y)k =

x(x+ y)k + y(x+ y)k. Evaluating this exponent recursively, one seesthat the result is a sum of terms xiyj where i+ j = n. That is,

(x+ y)n =∑i+j=n

(n

i

)xiyj

where(ni

)is some suitable choice of natural number coefficient. The

goal of this section is to employ inductive reasoning to determine thevalues

(ni

). This shows that the idea of inductive proof can be turned

around to help us discover facts about natural numbers.The basic case (x+ y)0 = 1 can be written as

∑i+j=0 x

iyj. The onlyvalues i and j satisfying i+ j = 0 are i = 0 and j = 0. So apparently(00

)= 1. We do not yet know any other values

(ni

), so we must do a bit

of detective work.

Exercises

28. Clearly, (x+ y)0 = 1, (x+ y)1 = x+ y, and (x+ y)2 = x2 + 2xy+ y2.Write the binomials (x+ y)3 and (x+ y)4 similarly.

Suppose we have already calculated values(ni

)for i ≤ n. So

(x+ y)n =∑i+j=n

(n

i

)xiyj.

We now look for values(ny

i

)for i ≤ ny, so that

(x+ y)ny

=

ny∑i+j=ny

(ny

i

)xiyj.

So start investigating the values(ny

i

), we can use the definition of


exponentiation:

(x+ y)ny

= (x+ y) · (x+ y)n

= x∑i+j=n

(n

i

)xiyj + y

∑i+j=n

(n

i

)xiyj

=∑k+j=n

(n

i

)xk

yyj +

∑i+`=n

(n

k

)xiy`

y,

where I have replaced the index iwith k in the first summation andjwith ` in the second summation in order to keep the four indicesdistinct.

We still do not know much about the numbers(ny

i

), except that if

we know the values(ni

)for i = 0 through i = n, then we should be

able to infer the values of(ny

i

)for i = 0 through i = ny.

The condition k+ j = n falls into two cases. Either j = 0 and k = n,or j has a predecessor ` and and k+ `y = n. Thus

∑k+j=n

(n

k

)xk

yyj =

(n

n

)xn

yy0 +

∑k+`y=n

(n

k

)xk

yy`

y.

Likewise, i+ ` = n falls into two cases. Either i = 0 and ` = n, or i hasa predecessor k, and ky + ` = n. So similarly,

∑i+`=n

(n

i

)xiy`

y=

( ∑ky+`=n

(n

ky

)xk

yy`

y)+

(n

0

)x0yn

y.

Now notice that the conditions k+ `y = n, ky + `y = ny, andky + ` = ny are all the same. So

(x+y)ny

=

(n

n

)xn

yy0+

( ∑ky+`y=ny

[(n

k

)+

(n

ky

)]xk

yy`

y)+

(n

0

)x0yn

y.

Now back to thinking about(ny

i

). The summation that we are

aiming for is∑i+j=ny . . .. The indices i and j fall into three cases.

Either i = 0, or j = 0 or neither. Hence,

(x+y)ny

=

(ny

ny

)xn

yy0+

( ∑ky+`y=ny

(ny

ky

)xk

yy`

y)+

(ny

0

)x0yn

y.

Lining up the terms with like exponents for x and y, we see that the


following must be true:(ny

ny

)=

(n

n

),(

ny

0

)=

(n

0

), and(

ny

ky

)=

(n

k

)+

(n

k

).

We don’t quite have a proof of something yet. But we now havea very good idea of what must be true (and easily proved thanks tothe work we’ve done). We simply need to translate these equationsinto a recursive definition of

(ni

)in general, and then prove that

(x+ y)ny

=∑i+j=n

(ni

)xiyj following the same reasoning as the

preceeding paragraphs.

ALGORITHM 7: Binomial coefficients

For any two natural numbers, i ≤ n, we define(ni

)recursively.

To do that, recall that i ≤ nmeans exactly that i+ j = n for somej. So we have an indirect way to define

(ni

)by recursion on j and i.

That is, we define(i+ji

)explicitly. In the right column below, we also

record the way(ni

)is usually stated (what you will find if look it up

on your favorite search engine).

(0+ 0

0

)= 1

(0

0

)= 1(

0+ `y

0

)=

(0+ `

0

) (ny

0

)= 1(

ky + 0

ky

)=

(k+ 0

k

) (ny

ny

)= 1(

ky + `y

ky

)=

(k+ `y

k

)+

(ky + `

ky

) (ny

ky

)=

(n

k

)+

(n

ky

)for k < n.


EXAMPLE 2:

To compute(53

)by this algorithm, we first must decompose 5 as

2+ 3. Then(3+ 2

3

)=

(3+ 1

3

)+

(2+ 2

2

)=

(3+ 0

3

)+

(2+ 1

2

)+

(2+ 1

2

)+

(1+ 2

1

)= 1+

(2+ 0

2

)+

(1+ 1

1

)+

(2+ 0

2

)+

(1+ 1

1

)+

(1+ 1

1

)+

(0+ 2

0

)= 1+ 1+

(1+ 0

1

)+

(0+ 1

0

)+ 1+

(1+ 0

1

)+

(0+ 1

0

)+

(1+ 0

1

)+

(0+ 1

0

)+ 1

= 1+ 1+ 1+ 1+ 1+ 1+ 1+ 1+ 1+ 1

= 10

LEMMA 7: The binomial coefficents satisfy (x+ yn =∑i+j=n

(ni

)xiyj)

PROOF : Exercise �

Exercises

29. Calculate(42

)by explicit steps as in Example 2.

30. Calculate(52

)by explicit steps as in Example 2.

31. Proof Lemma 7.

32. Prove that for every j and k,(j+kj

)=(k+jk

). Putting it in more

traditional form,(nk

)=(nn−k

)for any k ≤ n.


Clearly, based on your experience with these exercises, calculatingbinomial coefficents for large values by this algorithm is not practical.Another method of calculation is needed.

LEMMA 8: For any natural numbers n and kwhere k ≤ n, it is thecase that

(nk

)= n!k!(n−k)!

.

We prove this in the form: for any natural numbers i and j,

i! · j! ·(i+ j

i

)= (i+ j)!

PROOF : We proceed by induction on j to prove that the equation holdsfor all i.

Basis: For any i,(i+0i

)= 1. So i! · 0! ·

(i+0i

)= (i+ 0)!.

Inductive hypothesis (1): Suppose that for some fixed `, it is the casethat

i! · `! ·(i+ `

i

)= (i+ `)!

holds for all i.

Inductive step: The goal is to show that

i! · `y! ·(i+ `y

i

)= (i+ `y)!

holds for all i. For this, a proof by induction on iwill work.

Basis: 0! · `y! ·(0+`y

0

)= `y! = (0+ `y)!.

Inductive hypothesis (2): Suppose it is the case that

k! · `y! ·(k+ `y

k

)= (k+ `y)!

for some fixed k.

Inductive step: The goal of the inner inductive step is to show that

ky! · `y! ·(ky + `y

ky

)= (ky + `y)!.


By a simple calculation,

ky! · `y! ·(ky + `y

ky

)= ky! · `y! ·

(ky + `

ky

)+ ky! · `y! ·

(k+ `y

k

)= `y · ky! · `! ·

(ky + `

ky

)+ ky · k! · `y! ·

(k+ `y

`y

)= `y · (ky + `)! + ky · k! · `y! ·

(k+ `y

`y

)[I.H. (1)]

= `y · (ky + `)! + ky · (k+ `y)! [I.H. (2)]

= (ky + `y) · (ky + `)!

= (ky + `y)!

Thus the inner inductive step is proved, showing that for all k, itis the case that iy! · k!

(iy+kk

)= (iy + k)!. And since this is the

goal of the outer inductive step, the lemma is proved.

�

The lemma makes calculations easier. For example,(53

)= 5!

(3!·2! =1206·2 = 10— supposing we know how to compute divisions, this isfaster than directly using Algorithm 7.

Many alternative formulations of Lemma 8 can be derived bytranslating factorials as rising or falling exponents. For example, aspecial case of Lemma 5 is j! · (jy)k = (j+ k)!. So

k! ·(j+ k

k

)= (jy)k.

Exercises

33. Prove that∑ni=0

(ni

)= 2n for every n ∈ N. [Hint: Consider what

happens in (x+ y)n when x = y = 1.]

4.3 Fibonacci

The Fibonacci sequence is, historically, one of the first simple sequencesthat is defined by recursion. It shows up in many surprising places,


including analysis of some algorithms. There is even a data struc-ture called a Fibonacci heap. We define the Fibonacci sequence as anoperation on natural numbers, mainly for future use.

ALGORITHM 8:

For a natural number n, define the natural number fib(n) by thefollowing:

fib(0) = 0

fib(1) = 1

fib(kyy) = fib(ky) + fib(k)

The first several values are presented in Table 4.2

n fib(n) n fib(n)

0 0 10 551 1 11 892 1 12 1443 2 13 2334 3 14 3775 5 15 ___6 8 16 ___7 13 17 ___8 21 18 ___9 34 19 ___

Table 4.2: The first values of the Fi-bonacci sequence

Exercises

34. Complete the remaining entries in Table ??.

35. Show that fib(n+ 3) = 2 · fib(n+ 1) + 1 · fib(n) for all n.

36. Show that fib(n+ 4) = 3 · fib(n+ 1) + 2 · fib(n).

37. Show that fib((m+ n)y) = fib(my) · fib(ny) + fib(m) · fib(n) for allm and n.

5Ordering and Strong Induction

THE STANDARD ORDERING of natural numbersm ≤ n does notCHAPTER GOALS

In this chapter, we definem ≤ n

andm < n for natural numbers andderive some useful properties of theserelations. This leads to reformulations ofinduction that will be useful later.

need much explanation. But in fact, it arises in a precise way fromaddition. This fact permits us to reason about order of numbers usingthe arithmetic.

DEFINITION 2: Comparison of Natural Numbers

For natural numbersm and n, say thatm is less than or equal to n(writtenm ≤ n) if and only ifm+ d = n for some d : N. Also say thatm is strictly less than n (m < n) if and only ifm+ dy = n for somenatural number d.

For example, 4 ≤ 9 because 4+ 5 = 9. On the other hand 5 � 3

because there is no natural number d for which 5+ d = 3. Likewise,5 ≮ 5 because there is no d for which 5+ dy = 5.

From this definition, we can immediately infer some useful proper- These three properties of ≤ are impor-tant enough to have names. We saythat ≤ is reflexive, transitive and anti-symmetric. We will study these and otherproperties of relations in Chapter ??.

ties of ≤.

Reflexivity For everym, it is the case thatm ≤ m, This is becausem+ 0 = m. We say that ≤ is reflexive.

Transitivity For everym, n and p, ifm ≤ n and n ≤ p, thenm ≤ p.For ifm ≤ n, thenm + d = n for some d. Similarly, if n ≤ p,then n + e = p for some e. So using associativiy of addition,m+ (d+ e) = (m+ d) + e = n+ e = p. Hencem ≤ p. We say that≤ is transitive.

Anti-symmetry For everym and n, ifm ≤ n and n ≤ m, thenm = n.For ifm ≤ n and n ≤ m, thenm+ d = n and n+ e = m for some dand e. So (using associativity)m+ (d+ e) = m+ 0. But addition is

ORDERING AND STRONG INDUCTION 45

also cancellative. So d+ e = 0. Now the Law of Positivity ( 22) tellsus that e = 0. Hence n+ 0 = m.

Another propery of ≤ that is a bit more subtle to prove is linearity.That is, for any two natural numbersm and n, eitherm ≤ n or n ≤ m.

LEMMA 9: ≤ is linear

For allm ∈N and n ∈N, eitherm ≤ n or n ≤ m. It clearly is possible to have bothm ≤ nand n ≤ m, if the two are equal.

PROOF :We proceed by induction onm.

Basis Clearly, 0 ≤ n for every n because 0+n = n.

Inductive hypothesis Suppose that for some k ∈ N, it is the case thatfor every n ∈N, either k ≤ n or nleqk.

Inductive step The goal is to show that for every n ∈ N, eitherky ≤ n or n ≤ ky. Consider some n. By the inductive hypothesis,either k ≤ n or n < ky. We look at the two cases separately.

Suppose n ≤ k. Then n + d = k for some natural number d. Son+ dy = ky, and hence n ≤ ky.

Suppose k ≤ n. Then k+ d = n for some natural number d. Nowd either is 0 or is some successor. If d = 0, then k = n. Son+ 1 = ny = ky. Hence n ≤ ky. If d is a successor, then d = ey

for some e. So ky + e = k+ ey = n. That is, ky ≤ n.

Both cases have been accounted for, so either ky ≤ n or n ≤ ky.

�

Linearity justifies the intuition that natural numbers are ordered sothey can be arranged in a straight line with smaller numbers to the leftand larger ones to the right. In other words, this jibes with the pictureswe have been drawing all along.

Now that we know the natural numbers are linearly ordered,we can return to the question of cancellativity for multiplication —having punted on a proof of this earlier.


PROPOSITION 7: Multiplication by a non-zero is cancellative.

For all natural numbersm, n and p, ifm · py = n · py thenm = n.

PROOF : Supposem · py = n · py. Because of linear ordering of the The laws of arithmetic used herewere proved by induction earlier. Soimplicitly, this lemma still depends oninduction, even though we do not usethat pattern explicitly.

natural numbers, eitherm ≤ n or n ≤ m. Without loss of generality,assume m ≤ n — that is, if not then swapm and n. Thus m+ d = n forsome d. And so

m · py + 0 = n · py = (m+ d) · py = m · py + d · py.

By cancellativity of addition, 0 = d · py. So by the Law of No ZeroDivisors, d = 0. Hencem = n. �

The ordering on natural numbers permits us to think about a newform of induction, sometimes called strong induction even though it isequivalent to simple arithmetic induction.

Suppose P(k) is a statement about natural numbers. That is P(5)might be true or false, P(2918) must be true or false, and so on. LetP∗(k) be the statement: P(j) is true for all j < k. So P∗(5) is true just incase P(0), P(1), . . . , P(4) are all true.

An inductive proof that P∗(m) holds for allm looks like this:

STRONG INDUCTION , PRELIMINARY VERSION

Suppose P(m) is a property of natural numbers. Let P∗(m) be therelated property so that P∗(m) holds if and only if P(j) holds for eachj < m. Then by induction onm, we prove P∗(m) for allm.

Basis: There is no j < 0. So it is automatically true that P(j) holds forall j < 0. In other words, P∗(0) holds no matter what P is.

Inductive Hypothesis: Suppose P∗(k) holds for some natural number k.That is, P(j) holds for every j < k.

Inductive Step: The goal is to prove that P∗(ky) holds. But by theinductive hypothesis, P(j) holds for every j < k. So if P(k) alsoholds, then P∗(knxt) holds. So the inductive step for P∗(ky) iscomplete just by proving that P(k) holds.

We can simplify this into two parts (because the basis is trivial).


STRONG INDUCTION , USABLE VERSION

Strong inductive hypothesis: Suppose that for some k, it is the case thatP(j) holds for all j < k.

Strong inductive step: Prove that P(k) holds.

Conclude by strong induction that P(m) holds for allm.

EXAMPLE 3:

We claim that the Fibonacci numbers grow exponentially fast. Inparticular, 2n ≤ fib(n+ 2).

The proof is by strong induction.

Strong Inductive Hypothesis Suppose that for some k, it is the case that(32 )

j ≤ f(j+ 2) holds for all j < k.

Inductive Step For k < 2, check that (32 )0 = 1 ≤ fib(2) and (32 )

1 = 32 ≤

fib(3). So the claim is true for k = 0 and k = 1. For values k ≥ 2,write k as jyy for some j. So

(3

2)k =

9

4· (32)j ≤ 10

4· (32)j = (

3

2+ 1) · (3

2)j = (

3

2)jy+3

2.

By the strong inductive hypothesis, (32 )jy ≤ fib(ky) and (32 )

j ≤fib(k). So (32 )

k ≤ fib(k+ 2).

Hence, (frac32)n ≤ fib(n+ 2) for all n.

Minimization

Suppose P(−) is a property of natural numbers. Let’s refer to any n forwhich P(n) holds as a satisfier of P. A least satisfier of P is any n so that(i) n is a satisfier of P and (ii) ifm is a satisfier of P, then n ≤ m.

If P has any satisfier at all, it must have a least satisfier. For supposeP has no least satisfier. Then consider the property ¬P defined by¬P(n) is true if and only if P(n) is false. Evidently, if ¬P(j) is true forevery j < k, then ¬P(k) must also hold. Otherwise, P(k) would hold,so kwould be a satisfier, and since by the strong inductive hypothesis,no natural number less that k is a satisfier, kwould have to be the


least satisfier. We supposed that no such thing exists. So by stronginduction, ¬P(k) holds for every k. In other words, P(k) is false for allk.

We can restate this simply.

THEOREM 1: Minimization Principle for Natural Numbers

Any property of natural numbers that is satisfied by at least onenatural number has a least satisfier.

PROOF : The proof is covered essentially in the foregoing paragraph. �

We will not illustrate this principle here, but you will encounter theidea in other courses. It is sometimes a convenient way to think aboutan inductive proof.

5.1 Laws of ordered arithmetic

The arithmetic laws for addition and multiplication, as stated inChapter ??, are all to do with equality. But now that we have theordering of natural numbers, we ought to spend a bit of time lookingat ≤ interacting with arithmetic.

LEMMA 10: Addition is monotonic

For any natural numbers,m, n and p, ifm ≤ n, thenm+ p ≤ n+ p.

PROOF : This is an exercise. �

LEMMA 11: Addition is order-cancellative

For any natural numbersm, n and p, ifm+ p ≤ n+ p, thenm ≤ n.PROOF : This is an exercise. �


LEMMA 12: Multiplication is monotonic

For any natural numbersm, n and p, ifm ≤ n, thenm · p ≤ n · p.PROOF : This is an exercise. �

LEMMA 13: Multiplication by a positive natural number is order-can-cellative

For any natural numbersm, n and p, ifm · py ≤ n · py, thenm ≤ n.PROOF : This is an exercise. �

Exercises

38. Prove Lemma 10.

39. Prove Lemma 11.

40. Prove Lemma 12.

41. Prove Lemma 13.

5.2 Monus: Subtraction in N

We close this chapter by considering how to make sense of subtrac-tion on natural numbers, where for example, 7− 10 does not work.The idea is pretty simple. The main value in looking at it now is toprovide an example of a general phenomenon that has more useful inapplications later.

If we were looking at integers, subtraction would be the usualinverse of addition. Subtraction of integers is related to addition bythe fact that

a = b+ c if and only if a− b = c

for any integers a, b and c. In fact, this equivalence is precisely whatdefines subtraction for integers. But for natural numbers, for example,13− 7makes sense, but 7− 13 does not. On the other hand, we canrelax the requirement a bit, and ask for an operation − on natural


numbers (known by the obscure term monus) so that for all naturalnumbersm, n and p,

m ≤ n+ p if and only if m −n ≤ p.

Suppose that n ≤ m. Then n+ d = m for some d. Som ≤ n+ p

implies n+ d ≤ n+ p. Since addition is order cancellative, d ≤ p.Conversely, if d ≤ p, thenm ≤ n+ p because addition is monotonic. Soin the case that n ≤ m, the value ofm − n is just the usual difference.For example 13 − 5 = 8.

Supposem < n. Thenm ≤ n+ p is always true, no matter whatp happens to be. And 0 ≤ p no matter what p happens to be. Som − n = 0. Hence − yields the usual subtraction when it can, andyields 0when that is the best it can do.

An explicit algorithm for computing m −n is simple enough, but notvery useful. We write it here just for the record.

ALGORITHM 9: Monus

For natural numbersm and n, definem − n by the followingrecursion for any j and k,

0 − k = 0

jy − 0 = jy

jy − ky = j − k

The main point is thatm −n always yields a natural number that isthe “honest” subtraction when n ≤ m, and is 0 otherwise.

Clearly,m − n ≤ m − n. Som ≤ n + (m − n) holds for anymand n. Similarly,m+ n ≤ m+ n. So (m+ n) −m ≤ n. In fact, sincem ≤ m+n, the result of (m+n) −m is just the standard difference, so(m+n) −m = n.

Exercises

42. Prove that − is monotonic in its first argument. That is, ifm ≤ m ′,thenm −n ≤ m ′ −n.

43. Prove that − is antitonic in its seond argument. That is, if n ≤ n ′,thenm −n ′ ≤ m −n.


44. Determine whether − is order right cancellative, and prove yourresult. That is, is it the case thatm −n ≤ m ′ −n impliesm ≤ m ′?

45. Prove thatm · (n − p) = (m · n) − (m · p).

46. Prove that (m −n) − p = m − (n+ p).

47. Show thatm+ (n − p) is not necessarily equal to (m+n) − p.

6Lists

NATURAL NUMBERS constitute an important example of a generalCHAPTER GOALS

Introduce a formal notion of list.Following the investigation of naturalnumbers, introduce list induction andprove basic facts about lists.

process whereby “bigger” objects are built up from “smaller” ones.The Axiom of Induction captures the idea of building up, providing amethod for proving facts about natural numbers. In this chapter, wedevelop an analogous way to think about lists.

6.1 List Basics

In this section, we concentrate on the fundamental concept of lists.The idea is really meant to be the familiar one, so a list of “to do”items is a list. The alphabetized names on a class roster is a list. Wewill write lists using square brackets, following notation in manycommon programming languages. So for example, [2, 3, 5, 7] is the listof the prime numbers less than 10 in ascending order. For lists, weexpect the order to matter. So [7, 5, 3, 2] is not the same as [2, 3, 5, 7].

Like natural numbers, lists can be built up by starting with anempty list and incrementally adding items. Though we have choicesfor how to formalize this, we will follow a standard that has devel-oped in computer science.

Given the list [2, 5, 1], we will say that 2 is the head and that [5, 1]is the tail. So “head” refers to the initial item and “tail” refers to theshorter list. We may build another list with head 0, and tail [2, 5, 1],resulting in [0, 2, 5, 1].

Following the notation in the language Haskell, we denote theoperation of prepending an item to a list by ::. So 0 ::[2, 5, 1] is the list[0, 2, 5, 1].

An empty list, having no head or tail, is also possible. It is denotedby [ ]. The empty list, together with prepending items, gives us a wayto construct any list we want. The term used in computer science

for “prepend” is cons — short for“construct”.

LISTS 53

EXAMPLE 4:

Here are some examples.

• 5 :: 6 ::[4, 5] is the same as 5 ::[6, 4, 5], which is the same as [5, 6, 4, 5].

• [ ] is the empty list

• 1 ::[ ] is the same as [1]

• 1 :: 2 :: 3 :: 4 ::[ ] is the same as [1, 2, 3, 4].

• 1 :: 2 is nonsense – we can’t prepend an item to something that is nota list.

Now we have a decision to make. Is it reasonable to form a list thatconsists of different kins of data? The examples so far all the items in alist have been natural numbers. But would it make sense to make a list[1, sin]?

VOCABULARY 2: Basic Vocabulary of Lists

• There is a special list, which we call the empty list and denote by[ ].

• For any thing x and any list L, there is another list, having x as thehead and L as the tail. We denote the result as x ::L.

A thing appearing in a list is called an item of the list. Specifically, [ ] For technical reasons that we willdiscuss later, lists do not form a set.So we will never write something listL ∈ List. That notation simple doesn’tmake sense.

has no items. A list x ::L has x as its initial item. Any item of L is anitem of x ::L.

We abbreviate lists using square brackets. So for example, [2, 3] isan abbreviation for 2 :: 3 :: [ ]. When we write a series of :: operationslike this, they associate right to left. So 2 :: 3 :: [ ] means 2 ::(3 :: [ ]).

A list may consist of items that are all similar. For example, the listin the above description is a list of natural numbers. But lists are notrequired to be homogeneous. For example, [4, [3, 4]] is a list consisting oftwo items. The first is the natural number 4. The second is the list [3, 4].So [4, [3, 4]] is an example of a heterogeneous list.

We will need to think separately about homogeneous lists later. Fornow, we do not make any assumptions about items.


As with the natural numbers, we need to think about axioms thatprevent strange behavior. These are closely analogous to the axioms ofnatural numbers.

POSTULATE 4: Empty list has no head or tail.

For any thing x and any list L, [ ] 6= x ::L.

Likewise, a list that is not empty can only be built one way.

POSTULATE 5: Non-empty lists are determined by head and tail

For any things x and y and lists L andM, if x ::L = y ::M, then x = yand L =M.

For example, if I tell you that [2, 3, 4, 5] = x ::L, then you knowimmediately that xmust equal 2 and Lmust equal [3, 4, 5]. No otherinterpretation is possible.

Lists need an induction axiom that ensures that all lists are built upfrom [ ].

POSTULATE 6: The Axiom of List Induction

No lists can be removed without violating Vocabulary 2.

This axiom justifies proofs about lists using a scheme almost identi-cal to simple arithmetic induction.

LISTS 55

L IST INDUCTION

To prove some property is true about all lists,

Basis: Prove that the property is true for [ ].

Inductive hypothesis: Assume that the property is true for some list K.

Inductive step: Prove that for any thing x, the property is true for x ::K.

Conclude that the property is true for all lists.

Operations on lists can also be defined by schemes similar toaddition and multiplication on natural numbers. For example, everylist has a length. Writing len(L) for the length of a list, len([2, 3, 4]) = 3.A precise definition is now easy to formulate.

ALGORITHM 10: Length of a list

For a list L. the length of L, denoted by len(L), is the natural numberdefined by

len([ ]) = 0

len(x ::L) = len(L)y.

EXAMPLE 5:

len([2, 3, 4]) = len(2 ::[3, 4])

= len([3, 4])y

= len(3 ::[4])y

= len([4])yy

= len(4 ::[ ])yy

= len([ ]])yyy

= 0yyy

= 3


Another common operation on lists is concatenation:

[2, 3, 4] ++ [4, 1, 3] = [2, 3, 4, 4, 1, 3]

, whereby the two lists are glued together in their original orders. Thisis defined precisely by the following.

ALGORITHM 11: List concatenation

For lists L andM, their concatenation, denoted by L++M, is a list. Forall listsM, the following are true.

[ ] ++M =M

(x ::K) ++M = x ::(K++M) for any thing x and any list K

Notice that concatenation and addition are defined essentially thesame way: 0 and [ ] play the same roles, ny and x ::L play the sameroles.

EXAMPLE 6: List concatenation

To calculate [4, [5, 2], 1] ++ [3, 4, 1], we follow the algorithm:

[4, [5, 2], 1] ++ [3, 4, 1] = (4 ::[5, 2] :: 1 ::[ ]) ++ [3, 4, 1] [[4, [5, 2], 1] abbreviates 4 ::[5, 2] :: 1 ::[ ]]

= 4 ::(([5, 2] :: 1 ::[ ]) ++ [3, 4, 1]) [Def. of ++]

= 4 ::[5, 2] ::((1 ::[ ]) ++ [3, 4, 1]) [Same]

= 4 ::[5, 2] :: 1 ::([ ] ++ [3, 4, 1]) [Same]

= 4 ::[5, 2] :: 1 ::[3, 4, 1] [Same]

= [4, 5, 2, 1, 3, 4, 1] [Abbreviation]

Now we can prove some useful facts about lists.

LISTS 57

PROPOSITION 8: [ ] is the identity for ++

For list L,[ ] ++ L = L

andL = L++ [ ].

PROOF : By definition [ ] ++ L = L is always true. We proceed byinduction on L to prove the second equality. The proof should lookfamiliar (see the proof of Lemma 2).

Basis: [ ] ++ [ ] = [ ] is true by definition of ++.

Inductive hypothesis: Assume K++ [ ] = K for some list K.

Inductive step: Suppose x is some thing. We need to show that (x ::K) ++[ ] = x ::K.

(x ::K) ++ [ ] = x ::(K++ [ ]) [by definition of ++]

= x ::K [by the Inductive Hypothesis]

Thus (by the Axiom of List Induction), L++ [ ] = L holds for all lists. �

PROPOSITION 9: Concatenation is associative

For all lists L,M and N,

L++ (M++N) = (L++M) ++N.

PROOF : We prove this by induction on L. Again, this should look famil-iar. It is almost identicial to the proofs that addition and multiplicationare associative.

SupposeM and N are fixed lists.

Basis: The goal is to show that [ ] ++ (M++N) = ([ ] ++M) ++N. Thisfollows easily from the definition of ++.

Inductive hypothesis: Suppose K++ (M++N) = (K++M) ++N for somelist K.


Inductive step: The goal is to show that for any x, (x ::K) ++ (M++N) =

((x ::K) ++M) ++N. For any thing x,

(x ::K) ++ (M++N) = x ::(K++ (M++N)) Def. of ++

= x ::((K++M) ++N) Inductive Hypothesis

= (x ::(K++M)) ++N Def. of ++

= ((x ::K) ++M) ++N Def. of ++

So L++ (M++N) = (L++M) ++N is true for all L. Since the proof doesnot depend on any special properties ofM and N (except that they areboth lists), the result is true for all listsM and N. �

Concatenation is analogous to addition in the following way aswell.

PROPOSITION 10: Concatenation is cancellative

For lists L,M and N,

• L++M = L++N impliesM = N; and

• L++N =M++N implies L =M.

PROOF : Exercise. �

Here is another useful fact that we can prove by induction relatinglength to concatenation. We will encounter many operations

like len that translate one operationinto another. In this case, len translates++ into +. Such operations are calledhomomorphisms.

PROPOSITION 11: len is a homomorphism

For any lists L andM, len(L++M) = len(L) + len(M).

PROOF : The proof is by induction on L. SupposeM is some fixed list. This claim is probably obvious to you.Nevertheless, to illustrate the techniqueof list induction again, we prove itexplicitly.

Basis: The goal is to show that len([ ]) + len(M) = len([ ] ++M). Butlen([ ]) + len(M) = 0+ len(M) = len(M) = len([ ] ++M) by definitionof ++ and the fact that 0 is the additive identity.

Inductive Hypothesis: Suppose len(K++M) = len(K) + len(M) holds forsome particular list K.

Inductive Step: The goal is to show that for any x,

len((x ::K) ++M) = len((x ::K)) + len(M).

LISTS 59

Suppose x is some thing.

len((x ::K) ++M) = len(x ::(K++M)) Def. of ++

= len(K++M)y Def. of len

= (len(K) + len(M))y Inductive Hypothesis

= len(K)y + len(M) Lemma 1

= len(x ::K) + len(M) Def. of len

�

Often we use a list somewhat informally without all the punctua-tion. For example, we might say “Consider a list a0, a1, . . . , an−1 ofreal numbers.” If we do not intend to use the list itself for anythingspecial, but only want to think about the numbers a0 through an,then there is no need to be formal about it. Also, there is no harm inwriting something like this: a5, a6, a7, a8, where the indices start at 5.The default is to start at 0, but that is merely a convention.

Sometimes, it is even useful to avoid punctuation completely. Forexample, in text processing words are typically regarded as being listsof letters: the word marmoset is the list [m, a, r, m, o, s, e, t].

Exercises

48. Prove Proposition 10.

49. Show that concatenation is not commutative.

50. Write a recursive algorithm for list reversal. That is, define anoperation rev on lists so that, for example, rev([2, 3, 4]) is [4, 3, 2].[Hint: Just ask yourself what is the reverse of the empty list, andwhat is the reverse of x ::L.]

6.2 List Itemization

In a list L, the items are in order. So we can refer to items by theirposition in the list. There are two standards in mathematics for doingthis. Either we start counting from 1 or from 0. Although starting from0 (meaning that the initial item of a list is item number 0) may takesome getting used to, it actually makes many calculations simpler. Forthat reason, most programming languages use this convention for alists and arrays. I will consistently start with 0.


ALGORITHM 12: Items in a list.

Suppose L is a list and i is a natural number so that i < len(L). Notethat len([ ]) = 0, so there is no i < len([ ]). We define Li as follows.

[ ]i is never defined because i 6< len([ ])

(x ::L)0 = x

(x ::L)ky = Lk provided that Lk is defined

This is a precise way of explaining that in a list, for example L =

[a, b, c, d, e], we can refer to an item by its index, so that L0 = a, L1 = b

and so on, up to L4 = e. Notice that, for this example, L5, L6 and so onare undefined.

EXAMPLE 7:

Suppose L = [a, b, c, d, e]. We can calculate L3 explicitly step by step.

L3 = [a, b, c, d, e]3

= (a :: b :: c :: d :: e ::[ ])0yyy

= (b :: c :: d :: e ::[ ])0yy

= (c :: d :: e ::[ ])0y

= (d :: e ::[ ])0

= d

Of course, this is just a very careful way to find item number 3 inthe list. In every day use, we humans would not do this. We wouldsimply count forward from the beginning of the list. This definitionexplains precisely what “simply count forward” means.

Exercises

51. Suppose L is a list and i < len(L). Then it makes sense to thinkabout the list in which Li is removed. For example, for L = [a,b, c],removing L1 results in the list [a, c]. Let us denote the result ofremoving item i from list L as L \ i. So [a,b, c] \ 1 = [a, c].

LISTS 61

In this exercise, you define L \ i explicitly. Explain your answers foreach point.

(a) Should [ ] \ 0 be defined? If not, why not? If so, what should it be?

(b) What should (x ::L) \ 0 be?

(c) What should (x ::L) \ ky be? When should it be defined andundefined?

(d) Now write your definition for L \ i in a layout similar to Defini-tion 12.

52. Suppose L = [3, 2, 3, 3, 5] andM = [0, 1, 2, 3, 4, 5]. Calculate thefollowing explicitly step by step.

(a) len(L)

(b) L4

(c) (L++M)9

(d) L \ 4.

(e) ((M++ L) \ 3)5

6.3 Slices

Suppose L = [a0, . . . ,an−1] is a list of length n and that i ≤ j ≤ n. Wecan extract the sublist of L consisting of items indexed i, i+ 1, . . . j− 1,writing L[i : j] for the list [ai,ai+1, . . . aj−1]. The reader may thinkthis is strange usage because L[i : j] stops just one item shy of j. Butit is exactly the same notation used, for example, in Python for whatare called “slices” of a list. One reason this notation makes sense isthat it works well with concatenation. Namely, if i ≤ j ≤ k ≤ n, thenL[i, j] ++ L[j,k] = L[i,k].

So here is a formal definition of slicing.


ALGORITHM 13: List slicing

For a list L and natural numbers i ≤ j ≤ len(L), L[i : j] is a listdefined by the following:

L[0 : 0] = [ ]

(x ::L)[0 : jy] = x ::(L[0 : j])

(x ::L)[iy : jy] = L[i, j]

Exercises

53. For L = [4, 3, 7, 9, 10] calculate L[2 : 4] by explicit steps.

54. Show that for any list L and any i ≤ len(L), it is the case thatL[i : i] = [ ].

55. Show that for any list L and any i ≤ j ≤ k ≤ len(L), it is the casethat L[i : j] ++ L[j : k] = L[i : k].

Part II

Sets, Functions, Relations and Proofs


THE MATHEMATICAL UNIVERSE consists of various types of mathematical objects: numbers, functions,graphs, lists, groups, vector spaces, linear operators, and so on. We need ways to talk about these things, inisolation and in relation to one another.

Suppose you are asked to solve the equation

0 = 2x3 − x2 + 6x− 3

for x. You would have a right to ask whether x is meant to be an integer (in which case there are no solu-tions), a real number (in which case there is one solution) or a complex number (in which case there arethree solutions). So “solve for x” does not really mean anything until you know what type of x is beingsought. This is no different than in ordinary conversation. You can not answer “What would you like?”unless you know what the question is about. Are you being asked what to have for dinner, what to get youfor your birthday, what to do on vacation, or something else?

The development of programming languages has made clear the importance of data types. In mostcontemporary programming languages, each datum has an associated type. For example, a program mightinvolve integer data separate from character data and separate from, say, floating point data (roughly,floating point data approximates real number data in a way that is computationally tractible). Other datatypes might include things like matrices, arrays, lists, and much more.

In mathematics, natural numbers, real numbers, integer polynomials, complex matrices, continuous functions onthe reals and so on are essentially the mathematical counterparts of data types. They help us to organize thethings in much the same way that data types do in java or C++.

Though mathematicians tend to use the idea of types more informally than do computer scientists, one ofthe important lessons learned from computer science is that closer attention to type information helps clarifyhow things are related.

A set is, in the formulation of Cantor, jedes Viele, welches sich als Eines denken lasst “any multiplicity whichcan be comprehended as one”. For example, several playing cards taken together form a single deck ofcards; the deck is a multiplicity of cards comprehended as one thing. The several students taking DiscreteMath right now can be comprehended as one class. The infinitely many natural numbers can be regarded assingle thing, the set of natural numbers. Thus a set is essentially a collection of elements.

In a more “typed” way of thinking, the natural numbers constitute a type of mathematical data, distinctfrom, say, the type of real numbers, the type of n× nmatrices, the type of Python programs, etc. The impor-tant point is, for example, that you know what it means to add two natural numbers, or prove somethingabout them by simple arithmetic induction. On the other hand, adding two Python programs makes nosense. Proving something about real numbers by simple arithmetic induction makes no sense (but other“inductive” techniques do). A “typed” way of thinking, the emphasis is put on what you can do to a datum.A type is essentially a specification of how to build data, what can be done to data of that type, and how totell when data are equal. We hinted at this in Part I, when we developed the vocabulary and postulates fornatural numbers. In effect, we presented the natural numbers as a type.

A function is a correlation of the members of one set with members of another set. Functions can bethought of and used in many, many ways. Here are some examples.

• Polynomials such as f(x) = x3 + 2x2 − x+ 1 are functions. They can be used in a wide variety of modelingproblems in their own right, but also as approximations of more complicated phenomena.

65

• The process of definite integration takes a given function f and produces a second function∫x0 f(x)dx.

Integration is itself a “higher-order” function. Because it transforms one function into another.

• Cryptography is based almost entirely on the problem of designing functions with special properties.A crypto-system is based on two functions E (standing for “encrypt”) and D (“decrypt”). The funtionEwill take a messagem and an encryption key ke, and produce an encrypted message E(m,ke). Thefunction Dwill take an encrypted message c and a decryption key keyd and produce a clear messageD(c,kd). The system of these two functions is correct if it is the case that whenever ke and kd are correctlypaired, D(E(m,ke),kd) = m. In words, encrypting a message and then decrypting it with the matchedkey restores the original message. A correct system may still not be safe to use. To be minimally crypto-graphically safe, it must also be difficult computationally to determinem from E(m,ke) and ke, and mostalso be difficult to determine kd from ke.

• In programming, it is possible to implement a given process in many different ways. For example, oneprogrammer may use a while loop, whereas another uses recursion. To understand how to compare twoimplementations, one needs to know that they both may implement the same function. In fact, softwareengineers refer to “functional specifications” when they consider this.

A relation is more or less what is sounds like. For example, ≤ is a relation on natural numbers. It is eithertrue (4 ≤ 7) or not (4 � 2) for any two natural numbers. Relations do not have to be defined on data ofone type. For example, “root of” might be a relation between polynomials and real numbers: 2 is a root ofx2 + x− 6, but 4 is not a root of x2 + x− 6. In everyday life, we commonly think in terms of relations such as“Sam is a friend of Frodo”, “Frodo is shorter than Gandalf”, and so on.

Sets, functions and relations form an informal type-oriented framework in which virtually all of math-ematics can be built. So an understanding of sets, functions and relations is key to a rigorous approach tomost other parts of mathematics as well as of computing.

SETS , FUNCTIONS AND RELATIONS are fundamental to modern mathematics and computation. Setsare comprehensible collections. Functions are a way of thinking about attributes of the things in a set, like“the color of”, “the mass of”, “the location of”, “the velocity of”, “the father of”, “the favorite book of theperson to the left of” and so on. Relations are a formal way of thinking about how things can be related. Forexample, “less than” is a relation between numbers: 5 is less than 7, but 3 is not less than 2.

Set theory is roughly analogous in mathematics to a low level programming language in computer science.It provides a system in which we can describe and reason about a remarkably wide range of complicatedstructures. In programming, one might think about data types, programs and data base queries. In math-ematics, the approximate analogues of these are sets, functions and relations. Though not perfect, thisanalogy is worth keeping in mind as you are learning to think about set theory.

Traditional set theory emphasizes the role that sets play, deriving the concepts of functions and relationsfrom that. In fact, it is possible to develop a version of set theory that emphasizes functions, or one thatemphasizes relations. But in practice, mathematicians usually think more directly in terms of interplaybetween the three. That will be our approach.

Sets, functions and relations are useful partly because they can model non-mathematical, real-world


phenomena. A simple example is a set modelling a standard poker deck. We might denote it by

DECK = {A♣, 2♣, 3♣, 4♣, 5♣, 6♣, 7♣, 8♣, 9♣, 10♣, J♣, Q♣, K♣,

A♦, 2♦, 3♦, 4♦, 5♦, 6♦, 7♦, 8♦, 9♦, 10♦, J♦, Q♦, K♦,

A♠, 2♠, 3♠, 4♠, 5♠, 6♠, 7♠, 8♠, 9♠, 10♠, J♠, Q♠, K♠,

A♥, 2♥, 3♥, 4♥, 5♥, 6♥, 7♥, 8♥, 9♥, 10♥, J♥, Q♥, K♥}

The elements are arranged here conveniently, but we could just as well have listed the cards in any shuffledorder. The set of them (that is, the deck) would be the same. The notation used braces {. . .} is a standard wayto indicate that this is meant to be a set, So order and repetition are not important.

For any card in a deck, “the rank of” or “the suit of” are two attributes of each card, writing rank(A♦) = A

and suit(A♦) = ♦. In general, rank(c) and suit(c) pick out the relevant attributes of any card c. The potentialvalues of these attributes also form sets

RANK = {A, 2, 3, 4, 56, 7, 8, 9, 10, J, Q, K}

andSUIT = {♣,♦,♠,♥}.

Formally, we model rank and suit as functions.In a deck a five card hand is a subset of DECK consisting of five cards. The collection of all possible hands

is sometimes denoted by DECK[5]. We know that a four-of-a-kind hand beats a full-house. In general, thereis a relation beats on hands to that h1 beats h2 holds if h1 is the winning hand between them. The relationbeats can be broken down farther by establishing an order on ranks and suits (for example, ♥ is higher than♣), and an order on kinds of hands (any full house beats any pair, etc.)

The functions rank and suit capture the structure of the elements of DECK. The relation beats capturessomehing about the structure of poker games. A slogan to remember is

Functions and relations explain structure.

To understand sets, functions and relations as they are used in every day mathematics, we need to answersome questions:

• What do we mean by saying that a set is a collection?

• What do we mean by saying that two sets are equal?

• What do we mean by saying that a function is an attribute?

• What do we mean by saying that two functions are equal?

• What do we mean by a relation?

• What do we mean by saying two relations are equal?

• How do we construct sets, functions and relations?

The last of these questions can be very roughly paraphrased to ask, what is the ‘programming language’ forsets? While there is no fixed answer to that (as there is not just one programming language), we will settleon a simple set of principles by which we can build almost everything a mathematician needs.

67

We want maximum flexibility for defining new sets, functions and relations. That suggests we mightpropose a fairly complicated language that supports as wide a variety of constructions as possible. But itturns out that is not necessary. Starting with a rather small menu of constructions, we can combine them tobuild most of the structures one needs for contemporary mathematics.

7Sets, Functions, Relations

THE FUNDAMENTAL BUILDING BLOCKS of contemporary mathe-CHAPTER GOALS

In this chapter, we introduce threeinterrelated foundational concepts: sets,functions and relations.

matics are sets, functions and relations. To understand mathematicalreasoning (and its close cousin, computation), we need to understandhow these interact. We have already encountered examples of all threenotions. The collections N and N+ are sets; operations such as addi-tion and multiplication are functions; and ≤ and < are relations. Hereis a rough idea of what we mean.

• Sets correspond to types of mathematical data For example, naturalnumbers form a set; real numbers form a set; real number matricesform a set; complex polynomials form a set.

• Functions correspond to data transformations. For example, thefunctions sin and cos transform an angle θ into a correspondingvertical and horizontal displacements; the function min transformsa pair of natural numbers into the smaller of the two; the function +

transforms a pair of natural numbers into their sum.

• Relations permit us to speak precisely about how data of one typemay be related to data of another. For example, “less than” is arelation between natural numbers and natural numbers; “solves” isa relation between, say, real numbers and real number polynomialequations (as in “3 solves the equation 0 = x2 − x− 6); “overlaps” isa relation between discs in the plane (some discs overlap each other,others do not).

The development of programming languages has highlighted theimportance of data types. In most contemporary programming lan-guages, each datum has an associated type. For example, a programmight involve integer data separate from character data and separatefrom floating point data. Other computational data types include

SETS , FUNCTIONS , RELATIONS 69

things like matrices, arrays, lists, trees and much more. But the typicalprogramming language notion of data type is too restrictive for allmathematical needs. In computing, we are constrained in what cancount as a data type because a program has to be executable on anactual computer. This is a serious limitation. We would not be able todeal properly with all real numbers, for example.

To avoid confusing computational data types from a more generalconcept, we will use the terms that mathematicians use.

7.1 Sets

A set is, in the formulation of Cantor, jedes Viele, welches sich als Einesdenken lasst “any multiplicity which can be comprehended as one”.Cantor was getting at the idea that, for example, the collection ofnatural numbers N can be understood as a single entity, just as severalplaying cards taken together an be understood to be a single deck ofcards. When a collection of individuals can be understood as a singleentity in its own right, it is a set.

In Chapter 1, we introduced the notation n ∈ N and n ∈ N+

to indicate that n is a natural number, or that n is a positive naturalnumbers. We extend that notation to all sets.

VOCABULARY 3: Basic Vocabulary of Sets

A set is an entity A that we think of as consisting of elements.We write x ∈ A to indicate that x is an element of A, Sometimes, we

may also write x /∈ A to indicate that x is not an element of A. Whenx ∈ A, we say that x is in A.

For variety, all of the following phrases mean the same thing:

• x is in A

• x is an element of A

• x is a member of A

• A contains x

• x belongs to A

This vocabulary comes with some obligations. To describe a set,we have to provide a precise criterion for memebership — that is, anunambiguoua way to determine what is in and what is not in the set.


A collection for which the criterion of membership is vague is not aset. For example, “aging professors” does not constitute a set because“aging” is a vague term. On the other hand, “professors at least 45years old” does constitute a set (assuming that precise age can bepinned down and assuming that professors are mathematical objects).

To describe a small set, we can simply list the elements, usingbraces ({ and }) to indicate that we are describing a set as in the follow-ing example.

EXAMPLE 8:

A standard deck of poker cards can be described by

DECK = {A♣, 2♣, 3♣, 4♣, 5♣, 6♣, 7♣, 8♣, 9♣, 10♣, J♣, Q♣, K♣,

A♦, 2♦, 3♦, 4♦, 5♦, 6♦, 7♦, 8♦, 9♦, 10♦, J♦, Q♦, K♦,

A♠, 2♠, 3♠, 4♠, 5♠, 6♠, 7♠, 8♠, 9♠, 10♠, J♠, Q♠, K♠,

A♥, 2♥, 3♥, 4♥, 5♥, 6♥, 7♥, 8♥, 9♥, 10♥, J♥, Q♥, K♥}

The elements are arranged here conveniently, but we could just aswell have listed the cards in any shuffled order. The set — DECK itself— is the same regardless of how it is shuffled.

We can also describe a set of ranks and a set of suits:

RANK = {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K}

SUIT = {♣,♦,♠,♥}.

As you know, the deck consists of 52 cards: one card for each possiblerank and suit. So DECK is, in a sense, built from RANK and SUIT

by systematically pairing each rank with each suit. We will returnthe general process of building a set by pairing elements of two setsbelow.

Informal set construction

To describe larger sets (including infinite ones) we can not list theelements. The most common way to specify a large set is to give aprecise criterion for membership. The most common notation for thisis illustrated in

{x ∈N | x is a perfect square}.


A perfect square is a natural number that happens to be a square ofa natural number. So 0, 1, 4, and 9 are perfect squares; 2, 3, 5 and6 are not. So the same set could also be described informally as{0, 1, 4, 9, . . .}.

For a set X, generally we can define a subset of X by

{x ∈ X | . . . }.

Whatever we write in place of . . .must be a clear description of someproperty that elements of Xmay have. This method of definitionis called set selection because it defines a subset of X by selectingelements to retain.

Some sets are familiar, and obviously useful to have around. Wewill not try to define them formally for now, but just give them namesand informal descriptions.


SOME IMPORTANT SETS

The following sets are denoted by the special symbols:

N = the set of natural numbers: 0, 1, 2, . . .

N+ = the set of positive natural numbers: 1, 2, 3, . . .

Z = the set of integers: . . . , −2, −1, 0, 1, 2, ldots

Q = the set of rational numbers:1

2,24

23and so on

R = the set of real numbers

C = the set of complex numbers

∅ = the empty set, having no elements

These symbols are very common. You will be understood by any In this text, we will only occasionallyrefer to real numbers and complexnumbers. Nevertheless, you ought toknow the standard notation.

mathematician if you use them.

7.2 Functions

Each card in the standard deck has a rank and a suit: the rank of 5♦ is5, the suit of Q♠ is ♠, and so on. So rank is an attribute of a card, andsuit as another attribute. Put another way, rank and suit transform acard c ∈ DECK into a rank, rank(c) ∈ RANK, or a suit, suit(c) ∈ SUIT.

VOCABULARY 4: Basic Vocabulary of Functions

For a set X and a set Y, a function from X to Y is an entity fwith thefollowing feature.

• Each element a ∈ X determines an element f(a) ∈ Y, spoken as “fof a”, or sometimes as “f evaluated at a.” The parentheses are sometimes omit-

ted, writing fa instead of f(a). Also,for some special purposes, fa is analternative notation. The point is thata function f from X to Y is an entity inits own right, which together with anelement a ∈ X determines an element ofY.

A function from X to X is also called a function on X. We also usethe following notation and terms:

• f : X→ Y indicates that f is a function from X to Y.

• For a function f : X→ Y, the set X is called the domain of f and Y iscalled the codomain of f.


It can be helpful to picture a function as a machine that transformsinput to output. We can depict f : X→ Y as in Figure 7.1.

fX Y

Figure 7.1: A wiring diagram depictingfunction f : X→ Y.

The trapezoid shape is not important. I only use it to suggest adirection from input (the big end) to ouput (the small end). Diagramslike this are common in circuit design, where different shapes arefrequently used to indicate different functions. We will investigatethese sorts of diagrams (called wiring diagrams) in general in Chapter 8,and in the special case of electronic circuits in Chapter ??.

It can also be helpful to picture a function more abstractly as anarrow between sets as in Figure 7.2. A diagram like this is calledan external diagram. We will also investigate external diagrams inChapter 8.

X Yf

Figure 7.2: An external diagram depict-ing the function f : X→ Y.

Wiring diagrams and external diagrams convey information aboutfunctions, but serve different purposes. Wiring diagrams are usefulwhen we need to think about details of how functions are built upfrom other functions. So they are a useful design tool.

External diagrams are useful when we need to think about howfunctions interact with other functions. So they are a useful specifica-tion tool. Also, external diagrams are usually more visually succinctand more easily fit into prose. For example, instead of Figure 7.2, we

could have drawn the same thing in the body of the text: X f−→ Y.A function may sometimes also be called a map, or a transforma-

tion.


EXAMPLE 9:

The trigonometric functions sin : R → R and cos : R → R areindeed functions because, every θ ∈ R, determines sin(θ) ∈ R, andsimilarly for cos. On the other hand, tan is not a function R → R

because, for example, there is no such thing as tan(π2 ). To deal withsuch things, tan is sometimes said to be partial function. We won’t dealwith partial functions as a distinct concept in this text.

The reader will not have any trouble thinking of many other natu-ral examples of functions f : R→ R.

Informal rules

To define a function, we must specify a set X to be the domain, a set Yto be the codomain, and for each x ∈ X, an element f(x) ∈ Y. Thoughthere are many ways to do this, we will use a notational style that iscommon in mathematics. For example, to define the function from N

to N that adds one then squares the input, we may write

s(n) = (n+ 1)2.

This specifies that s(0) = (0+ 1)2 = 1, s(1) = (1+ 1)2 = 4, s(2) =

(2+ 1)2 = 9, and so on.This rule is clear, but it depends on already knowing what +1 and

2 mean. It also depends on knowing how these things build up tothe actual rule. Nevertheless, informally what matter is that in aninformal definition such as “f(x) = . . .”, the expression on the right ofthe equals sign must clearly describe a single element of Y that maydepend on an element x belonging to X.

The variable appearing in a rule is a placeholder. Its name does notmatter. So

s(x) = (x+ 1)2

defines the same function s by the same rule. However, a critical partof the definition of s is that we specified its domain and codomain tobe N. The function t : R→ R defined by the rule

t(n) = (n+ 1)2

is not the same as s because s deals with natural numbers, but t dealswith real numbers.


Sometimes, it is helpful to be able to specify a rule without givingthe function a name. We could do that by writing something likex 7→ x3 : N → N, or just x 7→ x3 if the codomain and codomainare understood. To give a name and define a rule all in one place, wecould also write

c = (x 7→ x3)

but that is not any easier to read than

c(x) = x3.

Consider functions s : N→N and t : N→N defined by the rules

h(n) = (n+ 1)2 and

k(n) = n2 + 2n+ 1.

Are these functions equal? The steps of calculation are quite differ-ent. In the first, one finds the successor of n and squares that. In thesecond, one squares n, doubles n, adds those two results and thenadds one to the sum. So h and k present very different calculations. Asalgorithms, they are not the same. Nevertheless, they always arrive atthe same results: h(n) = k(n) is true for all n.

So the attribute of a natural number “square of the successor” is thesame as the attribute “sum of the square and the double and one”.

Regarding h and k as mathematical attributes, we conclude theyare equal. This leads to a principle that gives a general criterion forequality of functions.

PRINCIPLE 1: Function Extensionality

For functions f : X→ Y and g : X→ Y, Important: Equality of functions onlymakes sense when the two functionsshare the same domain and the samecodomain.

f = g if and only if f(x) = g(x) for all x ∈ X.

7.3 Relations

Binary relations correspond roughly to transitive verbs: “4 precedes 5”,“Jim likes waffles”, “The line y = 3x+ 2 intersects the line y = 2x− 1”,and so on.

Linguistically (at least in English), we also have ditransitive verbsthat need a subject, direct object and indirect object, such as “gives” as


in “Kelly gives Pat a book”. In general, we might have ternary rela-tions (ditransitive verbs), quatenary relations and so on. It also makessense to have unary relations (in English, adjectives and intransitiveverbs play this role), and even nullary relations.

As it happens, most of the time it makes sense to think mainlyabout binary relations because we will be able to deal with otherssystematically via the binary ones.

VOCABULARY 5: Basic Vocabulary for Binary Relations

For sets X and Y, a relation between X and Y is an entity R so that foreach x ∈ X and each y ∈ Y, either the relation “holds” or does not“hold”. We typically write x R y and x 6 R ywhen the relation holds ordoes not hold. A binary relation between X and X is called a relation onX.

For this text, we write R : X # Y to indicate that R is a binaryrelation between X and Y, though this notation is not universallyknown.

As with functions, we say that X is the domain and Y is the codomainof a relation R : X # Y. Sometimes, instead of domain and codomain,these are called the source and target.

Most of the time, we just write “relation” to mean “binary relation.”

For our purposes, when we say that R is a (binary) relation,we must If we were thinking about relationslinguistically, it would be tempted torefer to X as the subject and Y as theobject, but that is not common practice.

have explicitly in mind the domain X and codomain Y.

EXAMPLE 10:

For any two natural numbersm and n, eitherm < n or not. So <(“less than”) is a relation on N. Likewise, for any two real numbers xand y, either x < y or not. So < is a relation on R. But it is importantto bear in mind that, even though we use the same symbol, these aredistinct relations. For example, when we are dealing with naturalnumbers, it makes sense to ask for the smallest n satisfying 5 < n—namely, 6. But for real numbers, there is no smallest y satisfying 5 < y.So the “less than” relation on natural numbers and the “less than”relation on real numbers behave very differently.


EXAMPLE 11:

We can say thatm divides n ifm · d = n for some natural number d.So “divides” is a relation on natural numbers. For example, 5 divides10, but 5 does not divide 12. We will investigate divisibility in detail inChapter 12.

EXAMPLE 12:

Suppose we have defined a set R[x] that consists of all polynomialfunctions having real coefficients. So p(x) = x2 + 3x+ 1 is in R[x], andso on. Then we can define a relation is-root : R# R[x] by

a is-root p if and only if 0 = p(a).

For the example p above, there is no (real number) root. But for q(x) =x2 − 5x + 6, there are two roots: 2 is-root q and 3 is-root q, but forexample 4��is-root q.

This example illustrates that a relation can hold between data of This example also shows that sometimesan artificial name like is-root for arelation is not helpful. We might aswell just write things in plain English.Though sometimes it is helpful topropose a new bit of notation, that is ajudgement call.

one type (real numbers) and another type (real polynomials).

Suppose P is a set corresponding to people. Then we can imaginetwo relations likes and knows on P. For example, John likes Pat andPat knows Kelly. Suppose it is impossible to like someone withoutknowing the person. We can say this by asserting that to likes is asubrelation of knows. This is similar to subset inclusion, leading to adefinition.

DEFINITION 3: Subrelations

Suppose R : X # Y and S : X # Y are relations. We say that R is asubrelation of S to mean that for every a ∈ X and b ∈ Y, it is the casethat a R b implies a S b.

If R is a subrelation of S, we write R v S.Note that “subrelation” only is meaningful for relations that have

the same domain and the same codomain.


For natural numbers x and y, if x < y then x ≤ y. So < is a subrela-tion of ≤. As with functions, this leads to a criterion for equality.

PRINCIPLE 2: Relational extensionality

Suppose R : X # Y and S : X # Y are relations. Then R = S if andonly if R is a subrelation of S and S is a subrelation of R.

Relations also compose very similarly to functions.

DEFINITION 4: Relational products, diagonals and relation compari-son

For any two relations R : X # Y and S : Y # Z, there is a relational The notation for relational product andfor function composition are at odds.We write the composition of functionsf : X→ Y and g : Y → Z as g ◦ f (readingthis as “g following f”). We write theproduct of relations R : X # Y andS : Y # Z as R;S (reading it “R followedby S”).

This mismatch is only a matter ofconventions — a bit like driving onthe right or the left side of the road.Composition of functions usually iswritten in applicative order; composition(that is, product) of relations is usuallywritten in diagramatic order.

product (R;S) : X # Z sometimes pronounced “R then S” or “Rfollowed by S” defined by x R;S z if and only if x R y and y S z forsome y ∈ Y.

For any set X, there is a relation ∆X on X called the diagonal satisfy-ing x0 ∆X x1 if and only if x0 = x1.

EXAMPLE 13:

For example, suppose we have sets STUDENT, COURSE and PROF

representing students, courses and professors, respectively. Then wemight have relations

taking : STUDENT # COURSE,

taught-by : COURSE # PROF,

andknows : STUDENT # PROF,

representing which students are currently taking which courses,which courses are currently being taught by which professors, andwhich students know which professors.

Then taking; taught-by is a relation from students to professors rep-resenting which studens are currently taking a course with whichprofessors.

If we presume that a student currently taking a course with a


professor must know that professor, then taking; taught-by v knows.

Again, we can use diagrams to depict relations vey much as withfunctions. We will make use of that later.

7.4 Chickens and eggs

Though we have discussed the three separate concepts of sets, func-tions and relations in this chapter, you may have noticed that it lookslike we needed sets first, just in order to make sense of domain andcodomain of functions and relations. In fact, with a bit more devel-opment, we can interpret relations as being special kinds of sets andfunctions as being special kinds of relations. Most mathematiciansover more than the last century have been taught to think this way.

If our main concern is to limit the number of kinds of things inmathematics, then a sets-first view seems pretty reasonable.

But it turns out that, with a bit more effort, we could also havetaken relations to be basic, defining functions and sets in terms ofrelations. Or we could have taken functions to be basic, definingrelations and sets in terms of functions.

Historically (for well over the last century), the sets-first view hasbeen the default. But by putting sets, functions and relations on aneven footing, we can see their distinct roles better. In later chapters,we will have chances to concentrate on one or the other.

This chapter provides a big-picture view of these concepts. It isnecessarily a bit informal. For example, we simply announced that R

denotes the set of real numbers. But we made no attempt to say whata real number is, nor to justify that whatever they actual are, the realnumbers constitute a set.

In the next chapters, we address things more formally. The plan isto think about universal existence principles that stipulate that certainsets, functions and relations exist and are interconected to othersets, functions and relations by some universal property. The precisemeaning of “universal” will become clear, but we already encountereda very simple example.

Recall the operation of monus on natural numbers. For two fixednatural numbersm and n, we have a constraint on natural numbers p:m ≤ n+ p— some values of p “solve” this inequation and others not.In particular,m −n solves the inequation universally in the sense that

• m −n solves the inequation.

• if p solves the inequation, thenm −n ≤ p.


So anything we might want to say about solutions of the originalinequation, we might as well say about natural numbers greater thanor equal tom −n. So − supplies us with universal or optimal solutions.

The principles we are going to discuss have a similar purpose. Wepresent some kind of interaction of sets, functions, and/or relations(something like an inequation). Then a universal existence principlewill propose that the interaction has a kind of universal or optimalsolution. These principles are remarkably powerful, in the sense thatvery large swaths of contemporary mathematics can be formulatedentirely with them.

Exercises

Suppose B refers to the set of books in your university library. Eachbook has a year of publication. So we might refer to this as year : B →N. Suppose also that S refers to the set of students at the university.

56. Describe three other functions from B to N that might be part of thelibrary catalogue.

57. Each book includes a bibliography of cited books. Is “cited” afunction on B ? Explain your answer.

58. Describe two different relations from students to books.

8Functions in More Detail

FUNCTIONS AND SETS constitute what is called a category. WeCHAPTER GOALS

In this chapter, we establish basicways that functions behave and howthey can be combined. Wiring diagramsprovide a useful way to reason aboutfunctions.

will encounter other categories later. For now we use the word infor-mally. For sets and functions, the category structure is specified howcomposition and identity functions behave.

LEMMA 14: Sets and functions form a category

Sets and functions form a category via composition and identityfunctions. Specifically,

• For functions f : W → X, g : X→ Y and h : Y → Z,

h ◦ (g ◦ f) = (h ◦ g) ◦ f.

• For function g : X→ Y,

g = g ◦ idX and

g = idY ◦ g.

PROOF : The proof of this is a very simple exercise using FunctionExtensionality. �

Exercises

59. Prove that Lemma 14 is a consequence of Principles 1 and ??.


8.1 Wiring diagrams

Suppose we depict functions f : X→ Y and g : Y → Z like so:

fX Y

gY Z

Figure 8.1: Two wiring diagramsdepicting functions

Diagrams like these are called wiring diagrams or string diagrams.Informally, from an input a ∈ X on the input wire of the left diagram,the box f produces an element f(a) ∈ Y on the output wire.

The identity function idX : X → X is the function that does nothing

to an input. So we will generally draw it as a simple wire: X .That is, we don’t bother to draw a little box labelled idX.

Composition of functions f : X→ Y and g : Y → Z is just a matter ofconnecting wires as in Figure 8.2.

f gX Y Z

Figure 8.2: A string diagram for g ◦ f.

We are permitted to connect an output wire to an input wire aslong as they agree about the label. This enforces the idea that a wirelabelled X carries data of the type X.

You might think of it this way. Some wires are 120 volt electricalsupply cables, some wires are USB cables. You can connect a supplycable output to a supply cable input (think of a wall socket). Or youcan connect a USB output to a USB input (think of connecting a thumbdrive to your laptop). But of course, you can not connect a USB to asupply wall socket. (Well, you can. But it is not a good idea).

The associativity of composition is reflected in diagrams by the factthat we can simply connect additional diagrams. Figure 8.3 could beconstructed in two ways: first connect f to g, then connect that to h(resulting in h ◦ (g ◦ f)); or first connect g to h, then connect f to that(resulting in (h ◦ g) ◦ f). Associativity says the result does not dependon that order.

Feeding an input a ∈ X into Figure 8.3, results if f(a) on the secondwire, and g(f(a)) on the third wire and finally h(g(f(a)) on the output

FUNCTIONS IN MORE DETAIL 83

f g hX Y Z Z

Figure 8.3: Composition is associative.We do not need to distinguish betweenh ◦ (g ◦ f) or (h ◦ g) ◦ f.

wire. So it is a good depiction of the transformation h ◦ g ◦ f.How should we depict addition? Figure 8.4 is an obvious thing to

try.

+

N

N

N

Figure 8.4: A diagram representingaddition.

With diagrams like these, we can draw complicated compositionsof functions, and can even draw pictures of equations. For example,(omitting the label N on all wires) Figure 8.5 asserts that addition isassociative.

++ =

++ .

Figure 8.5: Addition is associative

If we name the three input wires x, y and z, then the left diagramdescribes the output (x+y) + z. The right diagram describes the outputx+ (y+ z). The equation asserts that (x+ y) + z = x+ (y+ z).

In the next section, we address a problem with wiring diagramslike this. Namely, the vocabulary for functions supposes that a func-tion has a domain and a codomain, both of which are sets. Figure 8.4does not fit how we talk about functions because + has two inputs,not one.

8.2 Products

The solution to dealing with multiple argument functions like addi-tion is to think carefully about the wiring of wiring diagrams, and to


allow sets that encode data on bundled wires.

Suppose X and Y are sets. Informally, let us write X× Y to

denote a bundled wire consisting of X and Y . So if we wereto “zoom in” on X× Y, we would see something like this:

X× Y X× YX

Y

The idea is, roughly, that a cable X× Y consists of two separate wiresX and Y.

So the basic addition diagram (Figure 8.4) could be drawn as inFigures 8.7 or 8.6.

+N×N N

Figure 8.6: The addition diagrammagnified.

+N×N N

N

N

Figure 8.7: The diagram for additionusing a bundled input wire.

To bring this into our system, we need to understand X× Y as anactual set.


PRINCIPLE 3: Cartesian Products

For any sets X and Y, there is a set, denoted by X× Y, that consists ofall pairs (x,y) where x ∈ X and y ∈ Y. Moreover, there are functions

XprX,Y←−−− X× Y pr ′X,Y−−−→ Y,

defined by

prX,Y(x,y) = x

pr ′X,Y(x,y) = y

For any two functions X f←− Cg−→ Y, there is a function,

〈f,g〉 : C→ X× Y defined by

〈f,g〉(c) = (f(c),g(c))

We can draw diagrams for pr and pr ′ by capping one or the other oftwo parallel wires.

prX,Y

X

Y

pr ′X,Y

X

Y

Figure 8.8: Projection diagrams takeinput from X× Y and discard one of theinputs.

Consider the two functions in Figure 8.9. According to Principle 3,there is also a single function 〈f,g〉 : C → X× Y that captures theidea of using f and g on the same inputs. This can be depicted as inFigure 8.10.

Composing Figure 8.10 with one of the projection diagrams fromFigure 8.8, yields Figure 8.11.

Figure 8.11 suggests that the result of projection following pairingshould satisfy the equations

prX,Y ◦ 〈f,g〉 = f and

pr ′X,Y ◦ 〈f,g〉 = g.

In fact, easy calculations show that this is the case. For any c ∈ C, we


f

g

C X

C Y

Figure 8.9: Two functions with the samedomain

〈 , 〉

f

g

C

X

Y

Figure 8.10: A single function 〈f,g〉,pairing f and g. An input c is copied asinput to f and to g.

f

g

C

X

Y

Figure 8.11: Composing 〈f,g〉 after prX,Y .An input c produces f(c) and g(c) onthe X and Y wires, and ignores g(c). Theresult is f(c).


have

prX,Y ◦ 〈f,g〉(c) = prX,Y(〈f,g〉(c))= prX,Y(f(c),g(c))

= f(c)

So by function extensionality (Principle 1), prX,Y ◦ 〈f,g〉 = f. Anidentical calculation shows that pr ′X,Y ◦ 〈f,g〉 = g.

Also, consider composing projections with pairing in the otherorder. That is, a function h : C→ X× Y can be drawn as in Figure ref.

hC X

Y

Figure 8.12: A function with codomainX× Y.

Composing hwith the two projections produces the diagrams inFigure 8.13

hX

Y

hX

Y

C

C

Figure 8.13: The diagrams prX,Y ◦ h andpr ′X,Y ◦ h. We have omitted the dashedboxes indicating the two projections.

Finally, pairing the diagrams in Figure 8.13 yields Figure 8.14.

hX

Y

hX

YC

Figure 8.14: The diagram 〈prX,Y ◦h, pr ′X,Y ◦ h〉. Again, we have omitteddashed boxes.


This diagram equals h because for any c ∈ C,

〈prX,Y ◦ h, pr ′X,Y ◦ h〉(c) = (prX,Y(h(c)), pr ′X,Y(h(c))

= h(c)

Now consider two unrelated functions d : W → Y and e : X →Z as in Figure 8.15. These can be assembled into a single function(d× e) : W × X→ Y × Z.

d

e

W Y

X Z

Figure 8.15: Two unrelated functions canbe bundled into a single function d× e.

In a formula, d× e = 〈d ◦ prW,X, e ◦ pr ′W,X〉, so

(d× e)(x,y) = (d ◦ prW,X(x,y), e ◦ pr ′W,X)

= (d(x), e(y))

8.3 Thunks and Empty Diagrams

In computing, a thunk is a function that does not take any input, butthat is neverthless a function. In a wiring diagram, we expect a thunkto look like this:

tY

,

because it has no input. To bring this idea into the mathematics of setsand functions, we need to consider the domain of t. But t is not meantto depend on input. Putting it more positively, t is meant to dependon no input.

Remember that X× Y allowed us to deal with two inputs. Evidently,we need a way to think about an empty product.


PRINCIPLE 4: A terminal set exists.

There is a set 1 that consists of exactly one element. The particularelement is not critical. So for our purposes, we will let 1 = {•}.

The set 1 is a terminal set. That is, for any set C, there is exactly one The symbol ♦ is chosen because it looksa bit like a collapsed version of 〈f,g〉.This is the empty version of “tupling.”Whereas 〈 , 〉 turns two functions intoone ♦ turns 0 functions into one.

function ♦C : C→ 1. It is defined by

♦C(c) = •.

It helps to think of 1 as empty product. As we noted earlier, X× Ycan be sometimes be depicted as two parallel wires and sometimesas a single wire labelled X × Y. Likewise, it is helpful to depict 1sometimes as an empty diagram (no wires), and sometimes as a singlewire labelled 1.

Usually, we will depict a thunk (a function with domain 1) as inthe diagram at the beginning of this section. So 1 is “drawn” in thatsituation as an empty diagram.

A thunk t : 1 → X can only do one thing: t(•) is an element of X.And every element a ∈ X should correspond to a thunk — namely,the function from 1 to X that simply produces a. This leads to anotherprinciple.

PRINCIPLE 5: Elements are thunkable.

For any set X, and any element a ∈ X, there is a function 1aX−→ X When X is clear from context, we may

omit it by writing a instead of aX.defined byaX(•) = a.

In a wiring diagram, we draw a thunk, using a special symbol, as

aY

.

where a is an element of Y.

8.4 Parametric Functions

Suppose we wish to define a function clock : T → string that turnsa computer’s time (here the set T represents the data type of time


stamps on a Unix clock, which are not easily read by us humans) intoa string of human-readable symbols. So for example clock(2340234291)

may produce the string 12:00:42.To make this function really useful, it ought to know what time

zone the user wants to use. So really, we could regard clock as a func-tion that takes two arguments. The first argument is a time zone code,the second is a UNIX time. So we could draw our diagram of the clockas in Figure 8.16.

clockT String

Z

Figure 8.16: A clock function with timezone parameter

Figure 8.16 is not really different that any other diagram withtwo inputs. We have just moved the wire representing time zone(Z) to emphasize that we want to think of clock mainly as a functionT→ String with a parameter that sets the time zone.

There are many situations in which we wish to think about para-metric functions like this. In calculus, for example, indefinite integra-tion takes a real integrable function f and returns a family of functionsF(x) +Cwhere f(x) = dF(x)+C

dx . So∫f(x)dx is not a function, but is a

family of functions parameterized by the constant C. Integration onlydefines an actual function once we fix C. This leads us to introduce aconvenient definition.

DEFINITION 5:

For sets P, X and Y, a P-parametric function from X to Y is a functionf : P× X→ Y.

In other words, f depends on a parameter p ∈ P and an input x ∈ X.There is no mathematical difference between a (simple) function fromP × X to Y and a P-parametric function from X to Y. It is a matter ofwhat we wish to emphasize.


Suppose we are given a Q-parametric function f : Q× X → Y

and a simple function c : P → Q, then we can change parameters byassembling these into a P-parametric function as in Figure 8.17.

c

fX Y

P

Q

Figure 8.17: A change of parametersfrom Q to P. The dashed box is acomposed P-parametric function

This leads to the following definition.

DEFINITION 6: Universal Parameter Set

For sets X and Y, a universal parameter set is a set E equipped witha E-parametric function from X to Y — that is, e : E× X→ Y) — so thatevery P-parametric function h : P× X → Y is obtained uniquely frome by a change of parameters. In more detail, there is a unique functionh† : P → E so that h = e ◦ (h† × idX). Figures 8.18 and 8.19 illustrates In general universal parameter objects

are called exponents. This will makesense later.

how h†, e and h are related.

hX Y

(−)†

P

E

Figure 8.18: Construction of h† corre-sponds to filling a h : P× X→ Y-shapedhole in this diagram.


h†

eX Y

P

E

= hX Y

P

Figure 8.19: An exponential set E forX and Y acts as a universal parameterset. Any other parametric function isobtained by change of parameter.

The definition of exponential sets does not guarantee they exist. Forthat, we either need a princple.

PRINCIPLE 6: Universal parameter sets exist.

For any two sets X and Y, there is a universal parameter set (alsocalled exponent) denoted by YX with YX-parametric function denotedby

evX,Y : YX × X→ Y.

For a P-parametric function h : P× X → Y, the construction fromh to h† : P → YX is called currying, after the 20th century logicianHaskell Curry. In his honor, sometimes it is written curry[h].

Now consider a function f : X → Y. We can, somewhat artificially,think of it as a 1-parametric function, f ◦ pr1,X, as in Figure 8.4.

f

pr ′1,X

1

X

Y

The “curried” version of this is (f ◦ pr1,X)†. This is a function from 1

to YX. As we noted earlier, such functions are “thunks”, correspond-ing exactly to one element of YX. We may denote this element pfq,


and call it the name of f. Thus, for any function f : X → Y, the elementpfq ∈ YX is the unique element for which pfq = (f ◦ pr ′1,X)

†.Conversely, suppose φ ∈ YX. Then φ : 1 → YX can be used as a

change of parameters. So ev ◦ (φ × idX) is a 1-parametric function fromX to Y. And so evX,Y ◦ 〈φ ◦�X, idX〉 is a function from X to Y. Call thisthe function named by φ, and denote it by φ. So each element of YX

determines a function X → Y, and each such function determines anelement of YX. We can gather these ideas into another definition.

DEFINITION 7: Names of functions

For any function f : X→ Y, let pfq ∈ YX denote the unique elementof YX so that pfq = (f ◦ pr ′1,X)

†. We call pfq the name of f.For any φ ∈ YX, let φ : X → Y denote the unique function that is

named by φ.

Now it is an exercise to show that Most of the time, mathematiciansdo not make a distinction between afunction and its name. So they maywrite f : X → Y or f ∈ YX to mean thesame thing. There is now harm in thisbecause, as we have just seen, there isan exact match between elements of YX

and functions X→ Y.

f = φ

if and only ifpfq = φ.

In concrete terms of elements: ev(pfq, x) = f(x) for any functionf : X → Y. And for any P-parametric function h : P× X → Y, anyp ∈ P, the element h†(p) ∈ YX is the name of a function so that thath†(p)(x) = h(p, x). It is sometimes convenient to have an explicitname for the function X → Y that is named by h†(p). The commonnotation is λx,h(p, x). That is, λx,h(p, x) = h†(p).

This means that if we intend to define a function F : P → YX, we canspecify F(p)(x) ∈ Y for each p ∈ P and x ∈ X. Or we can spell out justrequire F(p) = pλx, f(p, x)q for some choice of f : P× X→ Y.

EXAMPLE 14:

A law of exponents for natural numbers says that (mn)p = mnp.Since we are re-using the word “exponent” and the notation YX, itseems reasonable to ask if a similar law holds. That is, it is the casethat (YX)W and YX×W are the same sets?

It turns out that the answer is, strictly, no. But the two sets arecompletely interchangible. Specifically, there are two functions


F : (YX)W → YX×W and G : YX×W → (YX)W that are mutual inversesof each other.

To define F, we must specify F(φ) ∈ YX×W for each φ ∈ (YX)W . Soit suffices to specify F(φ)(x,w) ∈ Y for each x ∈ X and w ∈ W. But φis itself the name of some function. So φ(w) is an element of YX, andφ(w)(x) ∈ Y. So we may indirectly define F by requiring

F(φ)(x,w) = φ(w)(x).

Similarly, we can define G by specifying for each ψ ∈ YX×W the be-havior of G(ψ) ∈ (YX)W . Again, this amounts to defining G(ψ)(w) ∈YX for each w ∈ W, and this amounts to defining G(ψ)(w)(x) ∈ Y foreach x ∈ X. But recalling that ψ is a name for a function from X×W toY, we see immediately that

G(ψ)(w)(x) = ψ(x,w)

is the only sensible definition to try.Now we check whether F and G are inverses of each other.

G(F(φ))(w)(x) = F(φ)(x,w) = φ(w)(x)

Since this is true for all x,

G(F(φ))(w) = φ(w)

So G(F(φ)) = φ. Similarly,

F(G(ψ))(x,w) = G(ψ)(w)(x) = ψ(x,w).

So F(G(ψ)) = ψ.

Exercises

60. Show that for any sets W, X and Y, there are functions F : (W × X)Y →WY ×WX and G : WY ×WX → (W × X)Y that are inverses of eachother.

61. Show that for any set X, there is exactly one element of 1X. [Thisalso shows that 1X is a terminal set.]

62. Show that for any set X, there are functions F : X→ X1 and G : X1 →X that are inverses of each other.


8.5 External Diagrams

Wiring diagrams are useful for getting the idea of function composi-tion, but they are awkward for reasoning about equalities of functionsbecause we need to draw two (possibly complicated) diagrams justto state that they represent equal functions. That’s fine for small ex-amples, but is not helpful for more complicated situations involvingmany different functions.

When we are mainly concerned with how functions interact, an

individual function can be depicted simply as X f−→ Y. A compositionof functions can be depicted as in

X Y

Z

f

gg ◦ f

We do not really need to draw g ◦ f as a separate arrow because the Notice that in this diagram, functionslabel the arrows, and sets label wherearrows point from and to. In wiringdiagrams, sets label the arrows (thewires) and functions label to boxes.

path from X to Y to Z is already an implicit part of the diagram. So thesimpler diagram

X Y

Z

f

g

conveys the same information, namely, that f : X → Y and g : Y → Z

are functions and therefore, g ◦ f : X → Z is too. Notice that we alsodid not bother to draw the identity functions for X, Y and Z. If we diddraw them, they would appear as “loops”.

The diagram

W X

Y Z

f

gh

k

depicts ten functions all together: four identity functions, the namedfunctions f, g, h, and k, and the two composite functions g ◦ f and k ◦ h.We say that such a diagram commutes if g ◦ f = k ◦ h. More generally, toassert that a diagram commutes means all the compositions depictedimplicitly in the diagram that can be equal are equal. So a commuta-tive diagram is a graphical depiction of equations. This takes somegetting used to, but it is worth the practice.

Consider the Law of Associativity for composition. For composible


functions, we can draw the basic situation as

W X

Y Z

f

g

h

Including the compositions g ◦ f and h ◦ g, we get

W X

Y Z

f

g

h

g ◦ fh ◦ g

Associativity says that this diagram commutes. Likewise, the IdentityLaws say that the diagrams

X

Y Y

ff

idY and

X X

Y

f

idX

f

both commute. We will encounter many other examples later.Consider the functions f : N → N, g : N → N, h : N → N and

k : N→N defined by

f(n) = n+ 1

g(n) = n2

h(n) = 2n

k(n) = n3

We can form a diagram depicting g ◦ f and k ◦ h as in

N N

N N

f

gh

k

but this diagram does not commute because, for example, g ◦ f(2) =9 6= 64 = k ◦ h(2). The mere fact that we can draw such a diagram andthat the paths corresponding to g ◦ f and k ◦ h start and end at the sameplaces is not a guarantee the diagram commutes.


Exercises

63. For each of the following pairs of functions N → N, determinewhether they are equal and explain why or why not.

(a) f(n) = 2n+ 3 and g(m) = 2m+ 3

(b) f(n) = 2n+1 − 1 and g(n) =∑ni=0 2

i

(c) f(n) = n2 + 5n+ 6 and g(n) = (n+ 3)(n+ 2)

(d) f(n) = n4 − 10n3 + 35n2 + 50n+ 24 and g(n) = 24

64. Let R denote the set of all real numbers. Explain why tan (thetangent “function”) does not actually define a function from R to R.

65. Suppose the following functions exist: f : W → X, g : X → Y,a : W → Z, b : Y → Z. Draw a commutative diagram asserting thatb ◦ g ◦ f = a.

66. Suppose the following functions exist: f : C → A, g : C → B,h : C → P, p : P → A and q : P → B. Draw a single commutativediagram asserting that f = p ◦ h and g = q ◦ h.

67. Define functions from N to N as follows:

f(n) = n+ 1

g(n) = n2

h(n) = n2 + 2n

Draw a diagram depicting the compositions g ◦ f and f ◦ h. Does thediagram commute?

8.6 Tabulating a Function

Recall the sets DECK, RANK and SUIT from Chapter 7. Given a cardc ∈ DECK, there is evidently an assignment of a rank. For example,the rank of 5♥ is 5. Likewise, there is an assignment of suit of anycard.

The rules defining functions rank and suit are obvious, but to makethem completely explicit, we can build a table:

We do not really need the table in Figure 8.1 because all the valuesare easily predictable. The cards themselves are systematically builtfrom all possible pairings of a rank and a suit. Nevertheless, this


c ∈ DECK rank(c) suit(c) c ∈ DECK rank(c) suit(c)

A♣ A ♣ A♦ A ♦2♣ 2 ♣ 2♦ 2 ♦3♣ 3 ♣ 3♦ 3 ♦4♣ 4 ♣ 4♦ 4 ♦5♣ 5 ♣ 5♦ 5 ♦6♣ 6 ♣ 6♦ 6 ♦7♣ 7 ♣ 7♦ 7 ♦8♣ 8 ♣ 8♦ 8 ♦9♣ 9 ♣ 9♦ 9 ♦10♣ 10 ♣ 10♦ 10 ♦J♣ J ♣ J♦ J ♦Q♣ Q ♣ Q♦ Q ♦K♣ K ♣ K♦ K ♦A♠ A ♣ A♥ A ♥2♠ 2 ♠ 2♥ 2 ♥3♠ 3 ♠ 3♥ 3 ♥4♠ 4 ♠ 4♥ 4 ♥5♠ 5 ♠ 5♥ 5 ♥6♠ 6 ♠ 6♥ 6 ♥7♠ 7 ♠ 7♥ 7 ♥8♠ 8 ♠ 8♥ 8 ♥9♠ 9 ♠ 9♥ 9 ♥10♠ 10 ♠ 10♥ 10 ♥J♠ J ♠ J♥ J ♥Q♠ Q ♠ Q♥ Q ♥K♠ K ♠ K♥ K ♥

Table 8.1: The rank and suit functions

sort of tabulation of functions can be useful in more complicatedsituations.

Suppose we have dealt the cards to four players. Inmany games,the players are named N,W, S, and E. So let PLAYER = {N, W, S, E}.Now the result of dealing the cards is an assignment of a player toeach card (spelling out which card is held by which player).

For example, Table 8.2 shows one way the cards might be dealt.


c ∈ DECK holder(c) c ∈ DECK holder(c)

A♣ N A♦ S

2♣ S 2♦ E

3♣ E 3♦ W

4♣ N 4♦ W

5♣ W 5♦ N

6♣ S 6♦ S

7♣ E 7♦ E

8♣ W 8♦ S

9♣ N 9♦ N

10♣ S 10♦ E

J♣ E J♦ E

Q♣ E Q♦ E

K♣ W K♦ S

A♠ N A♥ N

2♠ S 2♥ E

3♠ W 3♥ W

4♠ N 4♥ N

5♠ W 5♥ E

6♠ W 6♥ W

7♠ W 7♥ N

8♠ N 8♥ S

9♠ S 9♥ S

10♠ E 10♥ E

J♠ W J♥ W

Q♠ N Q♥ S

K♠ S K♥ N

Table 8.2: A deal of four bridge hands


Exercises

68. Confirm or deny that the function specified in Table 8.2 is legal.

For a less uniform example, suppose we have a set corresponing to sixstudents S = {s1, s2, s3, s4, s5, s6}. Each student has a first name and alast name. So we can specify their names as in Table 8.3:

x ∈ S forename(x) family(x)

s1 Carol Binkleys2 Pat Pattersons3 Kelly Treebaums4 Jamie Doodles5 Kelly Greens6 Violet Green

Table 8.3: Names of students in S

Exercises

69. Provide a table defining a function (call it f0) from the set A =

{0, 1, 2, 3} to B = {N,S,E,W} and a table defining a function (call itg0) from B to A so that idA = g0 ◦ f0 and idB = f0 ◦ g0.


{0, 1, 2, 3} to B = {N,S,E,W} and table defining a function (call it g1)from B to A so that idA = g1 ◦ f1 but idB 6= f1 ◦ g1.


{0, 1, 2, 3} to B = {N,S,E,W} and table defining a function (call it g2)from B to A so that idA 6= g2 ◦ f2 and idB 6= f2 ◦ g2.

72. The table in Figure 8.4 does not represent a function f from the setT = {a,b, c,d} to the set W = {0, 1, 2, 3, 4, 5, 6}. Explain why.

x ∈ T f(x)

a 0b 1c 2d 1a 2

Table 8.4: A table that does not define afunction

9Basics of Relations and Logic

RELATIONS MODEL concepts like “less than”, “belongs to”, “similarto”, “divides”, “is to the left of”, “intersects”, “is tangent to” and soon. In English, verbs also typically denote relations. For example insentences such as “Jerome likes cantaloupe” or “Parthiv likes honey-dew” the transitive verb “to like” denotes a binary relation betweenpersons and things. An intransitive verb, such as “to run”, denotes aunary relation: “Fredd-E runs”, “Giuseppe runs”. A ditransitive verb,such as “to give”, denotes a ternary relation: “Axcel gave a book toDov.” Other language constructs (frequently involved prepositions)also denote relations. For example, “Orange lies between Los Ange-les and San Diego” suggests that “lies between” denotes a ternaryrelation among geographical locations. Relations are just as fundamen-tal to mathematics as they are to language because they allow us toreason about how various things interact.

We develop relations in two forms that provide us with differentviews on the same ideas: simple relations and binary relations.

VOCABULARY 6: Simple Relations

A relation type is a list of sets:

• [ ] is a relation type.

• if X is a set, and T is a relation type, then X :: T is a relation type.

For a relation type T , a T -argument is a list of elements that matchthe type as follows.

• [ ] is the only [ ]-argument.

• If x is an element of X and ~x is a T -argument, then x ::~x is a X :: T -argument.


We will write ~a of-type T to indicate that ~a is a T-argument.For a relation type T, a simple T -relation is an entity R so that for

every T-argument ~x, either R holds or R does not hold. We write R~x ifR holds for ~x, and�R~x if not.

Usually, we just speak about relations, rather than simple relations.For example “less than or equal” is a [N, N]-relation because it

relates two natural numbers. According to the foregoing vocabulary,we would write ≤ [3, 4] to indicate that 3 is less than or equal to 4. Ofcourse, we don’t really want to write things this way; the standardnotation 3 ≤ 4 is much more familiar. So we make accomodations.

1. First, suppose T0 and T1 are relation types (so T0 ++ T1 is also arelation type). Then for any T0 argument ~x0 and any T1-argument~x1, we may right ~x0 R ~x1 instead of R ~x0 ++~x1.

In the example of ≤, this means we may write [3] ≤ [4] instead of≤ [3, 4].

2. We may omit brackets and leave ++ implicit on arguments. Thus3 ≤ 4means the same as [3] ≤ [4]. And For example, R ~a~bmeansthe same as R ~a++ ~b.

3. A common special case of this is when we have a relation type X ::T.Then an argument consists of x ∈ X and a T-argument ~x. So we canwrite x R ~x.

DEFINITION 8: Relations on a set

A relation like ≤ is special because both arguments are of thesame kind (both natural numbers). In cases like that, we say this is arelation on X. Specifically, ≤ is a binary relation on N.

To make this formally precise, for a set X and natural number k,define Xk by

X0 = [ ]

Xky

= X ::Xk

Then an n-ary relation on X is a Xn-relation. “Binary” is, of course, better than “2-ary”; “ternary” than 3-ary. One doesoccasionally read, say, “4-ary” insteadof “quaternary.” Constants higherthan 4 are pretty rare. But for a genericterm when n is unspecified, “n-ary” isstandard terminology.

We will also use the word “binary” fora different kind of relation, but thereshould not be any confusion betweenthese uses.

There is a family of useful relations on X called diagonals.

BASICS OF RELATIONS AND LOGIC 103

PRINCIPLE 7: Diagonal relations exist

For each k ∈N, let ∆kX by the k-ary relation on X defined by ∆nX ~a ifand only if all items in ~a are identical.

For each n ∈N and x ∈ X, define the list n ∗ x by

0 ∗ x = [ ]

ky ∗ x = x ::(k ∗ x)

Then ∆nX ~a holds if and only if ~a = n ∗ x for some x ∈ X.

Diagonals will play a role similar for relations as identity functionsplay for functions.

We will develop, in the next few pages, formal ways to defineother relations. Informally, though, any way we can specify a precisecondition on elements of sets will count as a definition. For example,consider the condition on three natural numbers: m ≤ n+ p. This iseither true or not for any particular values. So 4 ≤ 2+ 3 is true. But5 ≤ 2+ 1 is not. So “m ≤ n+ p” defines a ternary relation on N — arelation that holds or does not hold for any three natural numbers.

EXAMPLE 15:

Using the notation we just introduced, if R refers to the relationdefined by m ≤ n+ p relation R, then R is a ternary relation on N. Andevidently, then R 4 2 3 is true, but R 5 1 3 is not.

We may also want to emphasize things differently, writing 4 R 2 3or 4 2 R 3 or 4 2 3 R. All of these notations indicate the same fact:4 ≤ 2+ 3. So where we happen to place the relation symbol R does notchange the meaning.

Recall that when we defined subtraction (actually monus) weemphasized that “m −n ≤ p if and only ifm ≤ n+ p”. With relations,we can say precisely what this means. We have two operations + and−. Using +, we define a a ternary relation R by declaring R mnpmeansm ≤ n+ p. We also define a second relation S by declaringS mnp to meanm − n ≤ p. On the face of things, R and S aredifferent relations. But in fact, they completely agree. So we shouldbe willing to say they are equal. This leads to the following definition


and extensionality principle.

DEFINITION 9: Subrelations

For any two T-relations, R and S, we say that R is a subrelation of Sprovided that R ~x implies S ~x for every T-argument ~x. We write R v Sto indicate that R is a subrelation of S.

Note that R and S can only be compared if they are the same type ofrelation.

A common example is that < is a subrelation of ≤ becausem < n

implies thatm ≤ n. Another example: Ifm | n and 0 < n, thenm ≤ n.So the relation “m divides n and n is positive” is a subrelation of ≤.

PRINCIPLE 8: Relation extensionality

Let T be any relation type. For any T-relations R and S, if R is asubrelation of S and S is a subrelation of R, then R = S.

When we write “... if and only if ...” in this setting it means that thetwo relations implicitly defined on the left and right are equal.

9.1 Relations from relations

Consider our usual favorite example of a relation: ≤. Evidently, ≥provides us with the same information about two numbers. The onlydifference is the order of arguments. This should be a general fact: forany relation R of any type, if we have some shuffling of the type into adifferent order, there ought to be a relation of the shuffled type that isthe same as Rwith arguments shuffled the same way.

There are a few ways to deal with this shuffling. In relationaldatabase languages like SQL, the problem is deferred because the indi-vidual arguments to a relation are presented by named “columns” in atable, not by the order in which table entries happen to be stored. Thisdoes not really eliminate the problem, though, because in a databaserenaming the columns amounts to a reshuffling of the entries.

In our setting (geared toward mathematical usage), we somehowneed to be able to deal with completely general shufflings. On the The technical word for shuffling is

permutation.face of things, this looks pretty daunting. Suppose a relation type T islong (len(T) = 1000). How will we describe all of the ways T could be


shuffled. There are over 102568 different shufflings.Fortunately, we do not have to directly describe all shufflings.

It is enough to give simpler processes by which all shufflings canbe produced. It turns out that one simple kind of shuffling can berepeated to produce all shufflings. Image that you have a deck ofcards. Cut the deck into three piles p0, p1 and p2 of any sizes you like,then rearrange the piles by exchanging p0 and p1. That is, the originaldeck was p0 p1 p2 and the shuffled deck is p1 p0 p2. Notice that p2can be any size. So it might be empty. In that case, we really took adeck p0p1 and produced p1 p0.

PRINCIPLE 9: Shuffling of relations

Suppose T, U and V are relation types. For any (T ++U++V)-relation It will turn out that this principle isredundant in the sense that, once weintroduce how relations are interactwith functions, we will be able provethat it is true. So really is a lemma, nota necessary fundamental principle. Westate it as a principle now because wedo not currently have the necessaryconnections to functions.

R, there is a (U++ T ++V)-relation R ′ so that

~b R ′ ~a~c if and only if ~a R ~b~c

for any ~a of-type T, ~b of-type U and ~c ∈ V.

Although this principle requires a fairly complicated statement, it ismainly needed to explain simple situations, for example, how ≤ and≥ are related. As a general idea, it allows for abrbitrarily complicatedreshufflings, but in practice, it is not so daunting. Basically, it ensuresthat if we have a relation R, but for some reason would prefer to thinkabout its arguments appearing in a different order, we are safe to dothat.

For a set X and a relation type T , the T -relations and the X :: T -relations interact in some simple, but important ways. First, a T -relation R can be extended to X :: T , essentially by adjoining an argu-ment from X that does not actually participate in the relation. Thisextension is called cylindrification.


PRINCIPLE 10: Extending relations

Suppose T is a relation type and X is a set. For any T -relation R, In light of Princple ??, we could adjointhe extra argument anywhere in theargument list, not only as the first item.

there is a (X :: T)-relation X.R satisfying x X.R ~a if and only if R ~a. Theresulting relation is called the cylindrification of R.

This may seem like a useless principle. It basically says that if Ris a relation, then we can always use R as if it has an extra argumentsimply by ignoring that argument. The real value of cylindrificationcomes by considering how to reverse it. That is, from a X :: T -relation S,is there a T -relation whose cylindrification is S? It will turn out that ingeneral the answer is no. But there are two “best” approximations.

PRINCIPLE 11: Adjoints to cylindrification

For every (X ::T)-relation S, there is a T-relation ∃XS so that

• S v X.(∃XS) and

• if S v X.R, then ∃XS v R for any T-relation R.

Similary, there is a T -relation ∀XS so that

• X.(∀XS) v S, and

• X.R v S then R v ∀XS for any T-relation R.

According to this principle, suppose P is some other relation thatsatisfies the same conditions as ∃XS. Then S v X.P. So ∃XS v P.Likewise, S v X.(existsXS), so P v ∃XS. So by relation extensionality(Principle 8) P and ∃XS are the same. In other words, the principle im-plies that ∃XS is uniquely determined by S. Similarly, ∀XS is uniquelydetermined by S.


LEMMA 15: Quantification

Consider any X :: T -relation S.Then ∃XS ~a holds if and only if x S ~a holds for some x ∈ X.Similarly, ∀XS ~a holds if and only if x S ~a for all x ∈ X.

PROOF :Consider a X :: T -relation S.Suppose ∀XS ~a. Then x X.(∀XS) ~a for every x ∈ X. So by Princi-

ple 11, x S ~a for every x ∈ X.Conversely, suppose R is some relation so that R ~a implies x S ~a for

every x ∈ X. Then x X.R ~a if and only if R ~a. Hence R v ∀XS.Suppose x S ~a holds for some x ∈ X. Then x X.∃XS ~a, which means

by definition of cylindrification, ∃XS ~a.Conversely, suppose R is some relation so that if x S ~a holds for

some x, then R ~a holds. Then clearly, S v X.R. So ∃XS v R.�

This lemma justifies the notation ∃XS and ∀XS. That is, we canread ∃XS ~a ha meaning that “there exists some x ∈ X so that x S ~a.”Likewise, we can read ∀XS as meaning “for all x ∈ X, it is the case thatx S ~a.”

The notation is set up for explaining how ∀ and ∃mean relate toexpanding a relation by cylindrification. But in practice, it is a bitcumbersome when there are several variables to track, or when thearguments are not in a convenient order, so that the argument wewant to quantify is not the first on a list of types.

This is not a big problem thanks to Principle ??. We can alwaysrearrange the order of arguments to suit our need. Because this isalways possible (and extremely tedious to work out in great detail),we will not insist on being completely formal about it.

For example, consider the ternary relationM on real numbersdefined byMxyz if and only if x < y < z. Then we might want topoint out that real numbers are “dense”: For every two real numbers xand z, if x < z then there is another real number stricty between them.

We can express that there is a number strictly between x and z bymaking explicit which variable is quantified as in ∃y ∈ R, x < y < z.Using the notation of the previous paragraphs, we would need towrite ∃RS, where S is just the right shuffle of x < y < z so that y ismoved to the initial position. It is much more readable simply to write


the relation in a convenient way (x < y < z) and specify that we arequantifying over y. There is no pressing need to make all this formal,because it should be clear in any particular example how to get thingsright.

So density of the real numbers can be stated explicitly by sayingthat < is a subrelation of ∃y ∈ R, x < y < z. Even more clearly, x < zimplies ∃y ∈ R, x < y < z.

EXAMPLE 16:

For the relation ≥ on the natural numbers, ∀x ∈ N, x ≤ y denotesthe unary relation that holds for a natural number y just in case x ≥ yholds for all natural numbers x. Clearly, the only natural number thatsatisfies this condition is 0. So ∃y ∈ N, ∀x ∈ N, x ≥ y is a nullaryrelation, which is either TRUE or FALSE. It is TRUE because there isindeed some natural number y (0) such that for every natural numberx, x ≥ y. So ∃y ∈ N, ∀x ∈ N, x ≥ y expresses the fact there there is asmallest natural number.

Exercises

73. In plain English, explain what is meant by ∀x ∈N, ∃y ∈N,y > x.

74. Suppose P denotes the unary relation on N so that P n holds if andonly if n is a prime number. Write an expression that asserts thereare infinitely many prime numbers.

Frequently, we will want to nest quantifiers of the same kind.

DEFINITION 10: Iterated cylindrification and quantification

Suppose T and U are relation types, the iterated cylindrification of aU-relation S is a T ++U-relation©TS defined inducively by

©[ ]S = S

©X ::TS =©X(©TS)

Likewise, ∃~x of-type T,~x R and ∀~x of-type T,~x R are defined byinduction.


Iterated cylindrification gives us access to a largest relation for anytype T. Namely, T.TRUE is by definition a T-relation that is holds forall ~a of-type T. So R v T.TRUE for any other R.

Exercises

75. Write a precise definition of ∃~x of-type T,~x R.

76. Let T and U be relation types. Show that for any U-relation R andT ++U-relation S, it is the case that

©TR v S if and only if R v ∃~x of-type T,~x S .

Relational databases are record systems built almost entirely on thesimple relations we have discussed here. For practical reasons, theyusually support more expressive ways to structure and name relations,but the basic ideas are almost all present in this section.

An additional construction that is key to relational databases iscalled meet or interection.

PRINCIPLE 12: Relational meets

Suppose that R and S are T-relations. Then there exists a T-relationRu S satisfying

• Ru S v R and Ru S v S;

• if P v R and P v S then P v Ru S for any T-relation P.

The relation Ru S is called the meet of R and S.

The idea of meeting relations is to build a relation that contains allthe data that is common to R and S.

9.2 Binary relations

The decision that, for example, the notation x R y z and xy R z bothmean the same condition is useful. But we could have decided insteadto emphasize a distinction between arguments on the left and on theright. That leads to binary relations.


VOCABULARY 7: Binary relations

For relation types T and U, a binary relation from T to U is a sim-ple (T ++U)-relation.

We write R : T # U to indicate that R is a binary relation from T toU.

A binary relation must indicate its domain T and codomain U.

A binary relation differs from a simple relation in so far as a bi-nary relation specifies which arguments go on the left and which onthe right. That is, for a binary relation, we must write ~a R ~b for T-argument ~a and U-argument ~b. With this convention, it is possible todo two useful things. First, we can regard binary relations as forminga category, similar to the category of sets and functions. Argumentshuffling becomes a very useful, concrete operation that will permit usto characterize how functions and relations interact.

DEFINITION 11: Relational converse

Suppose R : T # U is a binary relation. Then the binary relation English syntax provides exampleson converse relations. Passive voiceproduces roughly the converse of activevoice. In “Bob ate a peach.”, the word“ate” denotes a relation between, say,animals and food. In “A peach waseaten by Bob”, the term “was eaten by”denotes the converse relation.

R◦ : U# T called the newtermconverse of R is defined by

~b R◦ ~a if and only if ~a R ~b.

In short, R◦ consists of the same data as R, but reverses the order ofarguments. This is a definition because the construction of R◦ dependsonly on being able to shuffle the arguments of a simple relation. ThusPrinciple ?? guarantees that converses always exist.

Recall that sets together with functions constitute a category. Thismeans, essentially, that function composition makes sense when thecodomains and domains match correctly, that each set has an identityfunction, and that composition is associative when it makes sense. Wehave a similar fact (with more structure) for relation types and binaryrelations.


DEFINITION 12: Diagonal binary relations

For a relation type T, define ∆T : T # T by ~a ∆T~b if and only if ~a

and ~b.

THEOREM 2: Relation types and binary relations form an orderedcategory

For any relation type T, let ∆T denote the binary relation on T

defined by ~a ∆T~b if and only if ~a is identical to ~b.

For any two binary relations R : T # U and S : U# V define R;S tobe the binary relation from T to V defined by

~a R;S ~c if and only if ∃~b of-type U, ~aR~bu ~bS~c.

For any relation S : X# Y,

S;∆Y= S =∆X;S.

For any two additional relations R : W # X, and T : Y # Z,

(R;S); T = R; (S; T)

Furthermore, for any two relations R ′ : W # X and T ′ : Y # Z, ifR v R ′, then

R;S v R ′;S,

and if T v T ′, thenS; T v S; T ′.

PROOF : Everything is easily verified from extensionality, the defini-tions of relational product and v. We leave this as an exercise. �

The category of sets and relations is typically referred to as Rel,whereas the category of sets and functions is refered to as Set.

Exercises

77. Prove Theorem 2.


9.3 Depicting relations

A relation from finite set X to finite set Y can be represented as a table.Rows of the table represent elements of X; columns, elements of Y.Then putting a mark in row a, column b indicates a R b. For example,here is a representation of a relation likes between a set FRIENDS = {}

and D ISHES = {} indicating which friend likes which dish.Obviously, the mark (X) could be any symbol. Also, the blank

entries could be some other symbol, so long as we can distinguish thetwo cases. It can be helpful to think of a relation as a table consistingof 1’s and 0’s instead ofX’s and blanks.

Because relations form a category, we can also use wiring diagramsand external diagrams exactly as we did for functions. Naturally, oneneeds to be careful not to confuse functions and relations, becausedifferent facts hold for them. Moreover, since relations may be or-dered by entailment, an external diagram can express more than justequality. Figure 9.1 illustrates this.

v

W X

Y Z

R

ST

U

Figure 9.1: An external diagram show-ing T ;U v R;S

9.4 Subsets and nullary relations

Recall from Principle 5 that a thunk t : 1 → X corresponds exactly toan element of X. A similar idea applied to relations suggests that arelation T : 1# X ought to correspond to a subset of X.


PRINCIPLE 13: Subsets and relations

For any relation R : 1# X, there is a unique subset A ⊆ X so that forany x ∈ X it is the case that x ∈ A if and only if • R x.

Likewise, for any subset A ⊆ X, there is a relation R from 1 to X sothat for all x ∈ X, it is the case that • R x if and only if x ∈ A.

This principle simply tells us that subsets of a set X and relationsfrom 1 to X are interchangible.

9.5 Functions and relations

Every function from X to Y determines a relation from X to Y, calledthe graph of the function and is defined as follows.

PRINCIPLE 14: Graph of a function

For any function f : X → Y, there is a relation Γf : X # Y, called thegraph of f, given by a Γf b if and only if f(a) = b.

The graphs of functions cooperate with identities and with compo-sition in the following way.

LEMMA 16: Γ is a functor

For functions f : X→ Y and g : Y → Z, it is the case that Γg◦f = Γf; Γg. Notice that the “arguments” f and gswap places in this notation. That isonly because relation composition iswritten daigramatically and functioncomposition is written applicatively.Youl will have to get used to that.

Moreover, ΓidX = ∆X holds for any set X.

PROOF : Suppose a Γf; Γg c. By definition there is some b ∈ Y so thatf(a) = b and g(b) = c. So g(f(a)) = c, and hence a Γg◦f c. Conversely,suppose aΓg◦fc. Then g(f(a)) = c. So a Γf f(a) and f(a) Γg c. That is,a Γf; Γg c.

Clearly, ΓidX = ∆X. �

Clearly, relations of the form Γf are special. Most relations that wemight invent are not obtainable as graphs of functions. But we candetect what makes a graph of a function special by a clever use ofcomposition and converse.


Suppose we have a function f : X → Y. We can consider the com-position of Γf with its converse Γf◦. For every x ∈ X, x Γf f(x) andf(x) Γf

◦ a both hold. So a Γf; Γf◦ a. This is summarized by noting that

∆X v Γf; Γf◦.

The only property we needed to show this is that for each a ∈ Xthere is some b ∈ Y so that a Γf b. Let us isolate that property as adefinition.

DEFINITION 13: Total relations

Say that a relation R : X# Y is total if ∆Xv R;R◦.

Looking at the opposite composition, suppose b Γf◦; Γf b ′. Thenthere is some a ∈ X so that b = f(a) and f(a) = b ′. So b = b ′. That is,

Γf◦; Γf v ∆Y .

The only property of we needed here is that for each a ∈ X there is atmost one b ∈ Y so that a Γf b.

DEFINITION 14: Determinate relations

Say that a relation R : X# Y is determinate if R◦;R v∆Y .

Putting these observations together, for any function f : X→ Y, it isthe case that Γf is total and determinate. But suppose we have somerelation R that happens to be total and determinate. Then for everyx in the domain, there is exactly one y in the codomain so that x R y.So R seems to determine a function, even though there is no explicit“rule” for defining it.


PRINCIPLE 15: Implicit function definitions

Suppose R : X # Y is a total determinate relation. Then there is a In traditional formal set theory, afunction is actually defined to be atotal, determinate relation. But thenfunction evaluation has to be definedin an awkward way that is not allthat close to the way we actually usefunctions. In this text, we keep relationsand functions as distinct, but relatedconcepts.

function fR : X → Y so that R = ΓfR . That is, fR(x) = y if and only ifx R y.

The function fR is said to be implicitly defined by R because it maynot be possible to write an explicit rule fR(x) = . . ..

This is an important principle because it provides a way to define afunction by defining a relation, which can be simpler to do, and thenshowing that it is total and determinate.

Exercises

78. Let Sq : N # N be the relation defined bym Sq n if and only ifm = n2. Is this relation total? Is it determinate?

79. Suppose R : X # Y and S : Y # Z are both total, determinaterelations. Prove that R;S is total and determinate.

80. Suppose R : X # Y and S : Y # Z are both total, determinaterelations. Prove that fR;S = fS ◦ fR.

81. Note that for any set X, the diagonal relation ∆X is total and deter-minate. Show that f∆X = idX.

9.6 Powerset and universal relations

In the last section, we saw that functions can be translated into specialrelations, namely the total, determinate ones. In this section, weexplore the other direction.

Recall that for functions, the set YX corresponds to a universal setof parameters for parametric functions X → Y. A similar idea forrelations leads to the following definition.


DEFINITION 15: Universal relations

For a set X, a set Uwith relation R : U # X is called a universalrelation for X if and only if for any relation R : W # X, there is aunique function R∗ : W → U so that R = ΓR∗ ;N.

If X has a universal relation N : U # X, then any relation Rwithcodomain X can be replaced by a function R∗ with codomain U. Theoriginal relation is recovered by ΓR∗ ;N. So if every set has a universalrelation, then all relations can be replaced with functions in this way.

PRINCIPLE 16: Powersets exist

For every set X, there is a set P(X), and a universal relation 3X : P(X)#X. The set P(X) is called the powerset of X.

This, together with Principle 13, means that the elements of P(X)correspond exactly to subsets of X. That is, any A ⊆ X determines arelation RA : 1 # X according to Principle 13. Hence, by Principle 16,R∗A : 1 → P(X) is a function. Thus R∗A(•) is an element of P(X). Thissatisfies x ∈ A if and only if R∗A(•) 3X x.

Conversely, suppose u ∈ P(X). Then the corresponding thunku : 1 → P(X) determines a subset su;3X of X, again by Principle 13.This satisfies u 3X x if and only if x ∈ su;3X . So the two constructions,from A to R∗A(•) and R to su;3X , are inverses.

These fairly technical calculations show that the elements of P(X)correspond to subsets of X. We will follow what most mathematiciansdo by simply taking P(X) to actually be the set of subsets of X. ThusA ⊆ X and A ∈ P(X) will mean exactly the same thing.

9.7 P(X) and 2X

The subsets of 1 are simple enough to calculate. There are exactlytwo: ∅ and 1. So P(1) = {∅, 1}. Because this set has two elements, itmakes sense to denote it by 2. For that matter, if we also write 0 as analternative notation for ∅, then 2 = {0, 1}. This means that 2 can beemployed as the mathematical version of bits. That is, x ∈ 2 meansthat either x is zero or x is one.

And element w ∈ 2X, names a function w : X→ 2. But 2 is P(1). So


w corresponds to a relation from X to 1. The converse of this relationdetermines a subset of X.

Any A ⊆ X is an element of P(X). The corresponding thunkA : 1→ P(X) determines a relation, the converse of which is a relationfrom 1× X to 1. This determines a function from 1× X to 2, and hencean element of 2X.

So elements of 2X and elements of P(X) are also interchangible.For A ⊆ X, the corresponding function from X to 2 is called thecharacteristic of A ⊆ X.

10The Set of Natural Numbers

THE SYMBOL N denotes the set of natural numbers. It is time that weCHAPTER GOALS

In this chapter, we investigate whatdistinguishes N, the set of naturalnumbers, from other sets. The main ideais that 0, successors and the Axiom ofInduction characterize N as a set thatuniversally supports recursion.

make the structure of N explicit within our developing ideas aboutsets and functions..

The structure of N brings set theory into contact with computationvia the crucial idea of recursion.. For example, n! (the factorial of n) isa function from N to N specified recursively by

0! = 1

ky! = ky · k!

The axioms for N are chosen precisely to ensure that a specificationlike this is guaranteed to define a function uniquely.

10.1 Sequences and Simple Recurrences

Let us make the informal word sequence official.

DEFINITION 16: Sequences

For a set X, a sequence in X is a function σ : N→ X.For a sequence σ in X and natural number n, we may write σn

instead of σ(n). In any case, the idea is that σ0,σ1,σ2, . . . are elementsof X.

As we studied in previous lectures, the basic vocabulary of naturalnumbers is that (i) there is a starting natural number, 0, and (ii) foreach natural number k there is a next one, ky.

To discuss zero and successor in the language of sets and functions,we stipulate that 0 ∈ N and that successor is a function suc : N → N

THE SET OF NATURAL NUMBERS 119

(given by the rule k 7→ ky). The notation suc is only needed to bringky into the standard function/application notation. So N is not just aset: it is a set equipped with a special element and a special function.

Suppose X is some other similarly equipped set. For example,b ∈ X is an element, and r : X → X is a function. In other words,(X;b, r) has structure like (N; 0, suc). So Xmay be the natural num-bers, but it certainly does not have to be. It has its own vocabulary b is“a beginning” and for each x ∈ X, the element r(x) comes “after” x.

Then we should be able define a sequence σ in X, so that σ0 = b,σ1 = r(b), σ2 = r(r(b)), σ3 = r(r(r(b))), and so on.

In general, σk should be determined by starting with b and repeat-edly applying r a total of k times. In other words, we can summarize σby two equations:

σ0 = b

σsuc(k) = r(σk)

We can make this precise.

DEFINITION 17: Simple Recurrences and Counting Sets

A simple recurrence is a set equipped with an element b ∈ X andfunction r : X → X. We will say the b and r form a simple recurrenceon X. To make things easy to read, we will write (X;b, r) for a simplerecurrence on X.

A universal recurrence is a simple recurrence (N; z, s) so that forany other simple recurrence (X;b, r) there is exactly one function

Nf−→ X so that

f(z) = b

f(s(n)) = r(f(n)) for all n ∈ N


PRINCIPLE 17: N is a Counting Set

The set N of natural numbers with 0 and suc is a universal recur-rence.

Consider a simple recurrence 1 b−→ Xr←− X. This principle tells us

there is a unique function f : N→ X so that

f(0) = b

f(ky) = r(f(k))

In fact, this is just what Postulate 17 tells us: every simple recurrenceon a set X determines a sequence in X.

On the other hand, it is not the case that every sequence is de-termined by a simple recurrence. Take for example, the sequence0, 1, 0, 2, 0, 3, . . .. This can not be defined (at least not directly) by giv-ing an initial entry and specifying successive entries based only onimmediate predecessors.

We need additonal techniques for defining more general functionsby recursion.

10.2 Primitive Recursion

Evidently, addition, multiplication, factorial, and other familiar func-tions should be definable using Principle 17. But there are problems toovercome: Addition is not a sequence, and factorial is not definable bya simple recurrence. We need a scheme that generalizes simple recur-sion to permit (i) dependence on parameters not directly involved inthe recursion, and (ii) dependence on n at each stage of the recursion.

DEFINITION 18:

A primitive recurrence in X with parameters in P consists of twofunctions P b−→ X

r←− P×N× X.A parametric sequence in X with parameters in P is a function

a : P×N→ X.

This allows for a very general definition of functions by recursion.


POSTULATE 7: Functions defined by primitive recursion

For any primitive recurrence P b−→ Xr←− P ×N× X, there is a Actually, this will be a theorem, once

we establish some additional postulatesfor sets. For now, we will take it to be abasic principle.

Stronger forms of recursive definitionare possible, but these require a moresophisticated understanding of the rolethat N plays.

unique function f : P×N→ X satisfying:

f(p, 0) = b(p)

f(p,ky) = r(p,k, f(p,k))

We say that f is defined by primitive recursion.

If the parameter set is trivial (we don’t actually want extra parame-ters) we can omit them in the notation.

EXAMPLE 17:

The “predecessor” function is defined by the scheme

pred(0) = 0

pred(ny) = n.

Addition add : N×N→N is defined by

add(m, 0) = m

add(m,ky) = add(m,k)y.

Of course, we writem+ n instead of add(m,n). Notice that, formally,we should give a functions b : N→N and r : N×N×N→N

Monus (subtraction in N) can be defined by

m − 0 = my

m −ny = pred(m −n)

Factorial is defined by

0! = 1

(ny)! = ny · n!

Exercises


82. Define addition N×N+−→ N explicitly by primitive recursion.

That is, find functions Nb−→N and N×N×N

r−→N so that

m+ 0 = b(m)

m+ ky = r(m,k,m+ k)

83. Define multiplication N×N·−→N by explicit primitive recursion.

You may use addition in the definition.

84. For a given function f : N → N, find a primitive recurrencethat defines the function Σf : N → N by the informal rule n 7→∑n−1i=0 f(i). That is, the result will be the sequence 0, f(0), f(0) + f(1),

f(0) + f(1) + f(2), . . . .

85. This is a challenging exercise. Suppose f : N→N. Define µf : N→N so that µf(m) is the least value n < m for which f(n) > 0 andism otherwise. In other words, µf(m) will always be a value lessthan or equal tom. If f(n) = 0 for all n < m, then µf(m) = m.Otherwise f(µf(m)) > 0 and if f(n) > 0 for some n < m, thenµf(m) ≤ n.

86. Define by recursion, “halving” as a function from N to N. That is,define h : N→N so that h(2n) = n and h(2n+ 1) = n.

10.3 Mutual Recursion and Other Generalizations

Often, we want to define two or more functions simultaneously interms of each other. For example, consider the following “definitions”of two functions f and g.

f(0) = 0

g(0) = 1

f(ny) = f(n) + g(n)

g(ny) = f(n)

So f(1) = f(0) + g(0) = 1; f(2) = f(1) + g(1) = f(1) + f(0) = 1;f(3) = f(2) + g(2) = f(2) + f(1) = 2. In general, f defines the Fibonaccisequence.

Although the equations defining f and g do not follow the formatof primitive recursion, they seem to be “safe”. The functions f and


g are defined mutually: we can not compute one wihout the other.Nevertheless, cartesian products allow us to “package” f : N → N

and g : N → N into a single function 〈f,g〉 : N → N×N. This latterfunction can be defined by simple recursion. In particular, defineh : N→N×N by

h(0) = (0, 1)

h(ny) = (pr(h(n)) + pr ′(h(n)), pr(h(n)))

Then it is easily checked that f = pr ◦ h and g = pr ′ ◦ h are the functionswe sought. So there was no harm in writing their mutual definition aswe did. Indeed, a general scheme for mutually recursive definitionscould be worked out.

10.4 Why Induction Works

Returning to our original postulates of the natural numbers, let usconsider whymy = ny should implym = n. Recall that Thepredecessor function is defined by primitive recursion:

pred(0) = 0

pred(ky) = k

So ifmy = ny, thenm = pred(my) = pred(ny) = n. In otherwords, our ability to calculate predecessor by recursion forces thepostulate to be true.

Similarly, take a set with two distinct elements. For example,BOOL = {false, true}. The specific elements are not important, butthe fact that false 6= true is. Now define is-zero : N→ BOOL by

is-zero(0) = true

is-zero(ky) = false

Now suppose 0 = ny for some n. Then the recursive definition ofis-zero would mean that true = is-true(0) = is-true(ny) = false. That isnot possible. So 0 can not equal any ny.

The Axiom of Induction also follows from our ability to definefunctions recursively. We can reformulate the Axiom as saying that ifP is a subset of N satisfying

• 0 ∈ P and

• if k ∈ P then ky∈ P holds for all k ∈N,

then P = N. We can show that this is true using recursion. It is easier


to reason a bit more generally. Suppose P is a set, and f : P → N is afunction so that 0 = f(p0) for some p0 ∈ P. Suppose, moreover, thatt : P → P is a function satisfying f(t(p)) = f(p)y for all p ∈ P. Then weclaim that for every n ∈ N, there is some p ∈ P so that f(p) = n. Bysimple recursion, there is a function g : N→ P defined by

g(0) = p0

g(ky) = t(g(k))

So f ◦ g is a function from N to N. It satisfies the following:

(f ◦ g)(0) = f(p0)= 0

(f ◦ g)(ky) = f(t(g(k))

= f(g(k))y

= (f ◦ g)(k)y

That is, f ◦ g satisfies the same recurrence is the identity on N. Since theprimitive recursion postulate requires that any recurrence determinesexactly one function, it must be the case that idN = f ◦ g. So for anyn ∈ N, f(g(n)) = n.

Now take f to be an inclusion map inclP : P →N where P is a subsetof N. The conditions on f translate to saying that 0 ∈ P and that P isclosed under taking successors. So for every n ∈ N, it must be thecase that n ∈ P. Hence the two are equal.

In summary, Postulate 7 says that (primitive) recursion is a permis-sible way to define functions involving natural numbers. This leads toproofs that our original natural number postulates (from Chapter ??)are forced to be true. In effect, they are redundant now. This reinforcesthe idea that computation (in the form of recursion) is foundational formathematics.

Part III

Applications

11Minimum and Maximum

THE ORDERING OF NATURAL NUMBERS gives rise to two otherCHAPTER GOALS

In this chapter, we define min(m,n)and max(m,n) for natural numbers andextend the laws of arithmetic to includethem.

operations: min and max. The minimum of two natural numbersmand n is, of course, the smaller of the two. It makes sense to say thisbecause ≤ is linear. That is, eitherm ≤ n— in which casem is theminimum – or n < m – in which case n is the minimum. We writemin(m,n) for this. The maximum is written as max(m,n). Both ofthese can be specified by algorithms.

ALGORITHM 14: Mimimum

For natural numbersm and n, min(m,n) is calculated by

min(0,k) = 0

min(jy, 0) = 0

min(jy, ky) = min(j, k)y

ALGORITHM 15: Maximum

For natural numbersm and n, max(m,n) is calculated by

max(0,k) = k

max(jy, 0) = jy

max(jy,ky) = max(j,k)y

Exercises

MINIMUM AND MAXIMUM 127

87. Calculate explicitly min(3, 6).

88. Calculate explicitly max(4, 3).

The important property of min and max is summarized by thefollowing lemma.

LEMMA 17: Characterizing min and max

For natural numbersm, n and p,

p ≤ min(m,n) ⇐⇒ p ≤ m and p ≤ nmax(m,n) ≤ p ⇐⇒ m ≤ p and n ≤ p.

PROOF : We look only at min because the proof for max is essentiallyidentical.

By cases, eitherm ≤ n or n < m. In the first case, min(m,n) = m,so min(m,n) ≤ m by reflexivity, and min(m,n) ≤ n becausem ≤ n.In the latter case, min(m,n) = n by reflexivity, and min(m,n) ≤ mbecause we have assumed n < m.

Suppose p ≤ m and p ≤ n. By linearity, eitherm ≤ n or n < m. Ineither case, p ≤ min(m,n). �

It is worth noting that min(m,n) is actually the unique naturalnumber satisfying the two conditions of the lemma. That is, suppose(i) q ≤ m q ≤ n and (ii) for all natural numbers p, if p ≤ m and p ≤ nthen p ≤ q. Then by (i), q ≤ min(m,n). By (ii), min(m,n) ≤ q. Soq = min(m,n). This tells us that min(m,n is, in a sense, optimal forsolving the two inequalities x ≤ m and x ≤ n simultaneously. Wesummarize the lemma by saying that min(m,n) is the greast lowerbound ofm and n. Likewise, max(m,n) is the least upper bound.

Some simple facts about min and max derive directly from thesecharacterizations plus some facts about addition.

In particular, together with addition, min and max make the natu-ral numbers into something called a distributive lattice ordered monoid.Basically, this means that addition together with min and max coop-erate in specific ways that augment the basic laws of arithmetic. Themost useful laws having to do with min and max follow.


BASIC LAWS OF min AND max

For any natural numbers,m, n and p: Like the basic laws of arithmetic pre-sented in Chapter ??, there laws areorganized here to emphasize similaritiesbetween min and max. Pay attention tothat.

Characterization via ≤m ≤ min(n,p) if and only ifm ≤ n andm ≤ pmax(m,n) ≤ p if and only ifm ≤ p and n ≤ p

Associativity min(m, min(n,p))= min(min(m,n),p)max(m, max(n,p))= max(max(m,n),p)

Commutativity min(m,n)= min(n,m)

max(m,n)= max(n,m)

Idempotency min(m,m)= m

max(m,m)= m

Absorptivity m= min(m, max(n,m))

m= max(m, min(n,m))

Distributivity m+ min(n,p)= min(m+n,m+ p)

m+ max(n,p)= max(m+n,m+ p)

max(m, min(n,p))= min(max(m,n), max(m,p))min(m, max(n,p)= max(min(m,n), min(m,p))

m ·min(n,p)= min(m · n,m · p)m ·max(n,p)= max(m · n,m · p)

Modularity m+n= min(m,n) + max(m,n)

We will not prove most of these as they follow easily from arith-metic laws. Nevertheless, a sampling follows.

LEMMA 18: Addition distributes over min

m+ min(n,p) = min(m+n,m+ p) for all natural numbersm,n,p.PROOF : Because min(n,p) ≤ n and addition is monotonic, m + This is a good illustration of why anti-

symmetry of ≤ is useful. We wish toprovem+ min(n,p) = min(m+ n,m+

p). Instead, we provem+ min(n,p) ≤min(m + n,m + p), and separately,min(m+ n,m+ p) ≤ m+ min(n,p).Then we let antisymmetry take us therest of the way.

min(n,p) ≤ m + n. Likewisem + min(n,p) ≤ m + p. Som +

min(n,p) ≤ min(m+n,m+ p).So to complete the proof, we must show that min(m+n,m+ p) ≤

m+ min(n,p). Suppose k ≤ min(m+n,m+ p). Then k ≤ m+n andk ≤ m+ p. If k ≤ m then k ≤ m+ min(n,p) obviously. Otherwise,m < k by Linearity. So m+dy = k for some d. Hence m+dy ≤ m+n

MINIMUM AND MAXIMUM 129

andm+ dy ≤ m+ p. Since ≤ is order reflecting, dy ≤ min(n,p).Consequently, k = m+ dy ≤ m+ min(n,p). We have thus shown thatk ≤ min(m+ n,m+ p) implies k ≤ m+ min(n,p). In particular, thisapplies to min(m+n,m+ p). �

Exercises

89. Calculate the following values. Show work.

(a) min(5, min(4, 6))

(b) min(5, max(4, 6))

(c) min(340, max(234, 340))

(d) min(5, max(3, min(max(1, 2), 7)))

(e) min(5+ max(4+ min(3+ max(7, 8), 3+ min(7, 8)), 6), 7)

90. Prove that min distributes over max.

91. Prove that multiplication distributes over min.

12Divisibility: Ordering the Natural Numbers by Multipli-cation

D IVIS IBILITY of natural numbers is a relation defined by analogy withCHAPTER GOALS

Develop the analogue of ≤ defined bymultiplication instead of by addition.Illustrate the value of the analogy toprove useful properties of divisibility.Introduce general division for naturalnumbers.

the standard order. In the standard order,m ≤ n means thatm+ d = n

for some d. Because multiplication satisfies many of the same laws asaddition (it is commutative, associative, etc.), a similar definition ispossible in terms of multiplication.

DEFINITION 19: Divisibility

For natural numbersm and n, say thatm divides n if and only ifm · q = n for some natural number q. We writem | nwhenm dividesn, and writem - nwhenm does not divide n.

An alternative way to say thatm divides n is n is a multiple ofm.

The distinction betweenm ≤ n andm | n is precisely that theformer is defined by addition and the latter by multiplication. So wecan transfer many facts about ≤ to facts about | simply by noticingthat they depend on analogous laws of arithmetic.

For example, the relation ≤ is reflexive — meaning thatm ≤ m forallm— because 0 is the identity for addition. The relation | is reflexivesimilarly because 1 is the identity for multiplication. That ism | m

becausem · 1 = m. Likewise, ≤ is transitive because addition isassociative; | is transitive because multiplication is associative.

Divisibility is also anti-symmetric — meaning thatm | n and n | m

implies thatm = n. But a proof of this hints at an important differencebetween | and ≤.

Recall that we proved thatm ≤ n and n ≤ m impliesm = n using

DIVISIBILITY: ORDERING THE NATURAL NUMBERS BY MULTIPLICATION 131

cancellativity of addition. But multiplication is only cancellative fornon-zeros. That is,m · p = n · p impliesm = n only when p > 0. So weneed to treat 0 as a special case.

LEMMA 19: Divisibility is anti-symmetric

For any natural numbersm and n ifm | n and n | m, thenm = n.

PROOF : Supposem and n are natural numbers satisfyingm | n andn | m. Thus, there are natural numbers q and r so that

m · q = n

n · r = m

Supposem = 0. Then by the first equation n = 0 as well. Supposem = py for some p. Then

py · q · r = py

So cancelling py yields q · r = 1. But natural numbers also satisfy theLaw of Integrality. So q = 1. �

The divisibility relation begins to be more interesting when werealize that it is not linear. For example, 4 does not divide 13 and 13does not divide 4. Apparently, the structure of the natural numberswith respect to | is much more complicated than with respect to ≤.

Note that 1 | m is true for anym, simply because 1 ·m = m. Andm | 0 is true for anym becausem · 0 = 0. So 1 is “at the bottom” of thedivisibility relation and 0 is “at the top”. It may seem strange to claim that 0 | 0.

Some people find it so irritating thatthey simply rule 0 out of consideration,and declare that 0 | 0 is undefined. Tech-nically, this means they are committedto sayingm | n if and only if there isa unique q so thatm · q = n. Since0 · q = 0 for all q, there is no uniquevalue. This is fine, but I prefer to under-stand 0 | 0 just to mean 0 · q = 0 for someq.

Figure 12.1 shows a fragment of the natural numbers with respectto divisibility.


1

3

9

27

2

6

18

54

4

12

36

108

8

24

72

216

5

15

45

135

10

30

90

270

20

60

180

540

40

120

360

1080

25

75

225

675

50

150

450

1350

100

300

900

2700

200

600

1800

5400

0

...

...

...

...

...

...

...

Figure 12.1: Part of the divisibilityrelation

Exercises

For each of the following pairs (m,n) of natural numbers, determinewhether or notm | n.

92. (4, 202)

93. (7, 49)

94. (11, 1232)

95. (9, 19384394)

96. (n, 6n)

97. (26, 65)

Before closing this section, we note additional facts about howdivisibility interacts with arithmetic.


LEMMA 20: Technical facts about divisibility

1. m | n impliesm | n · p

2. Ifm | n, thenm | p if and only ifm | (n+ p)

3. If n < m andm | (pm+n), then n = 0.

4. m | n and n > 0 impliesm ≤ n.

PROOF : Whenm = 0, all of these are trivial, as 0 | n holds if and onlyif n = 0. So supposem is positive.

1. is due associativity.

2. is due to distributivity.

3. Suppose n < m andmd = mp+ n. If we can show that d = p,then it follows that n = 0. First, note thatmp ≤ md. And sincemis positive, it can be cancelled. So p ≤ d. Second,md = mp+ n <

mp+m = m(py). Again, cancellingm yields d < py. So d ≤ p.

4. follows from (3).

�

Exercises

98. Fill in the details of the proofs of Lemma 20:1,2 and 4.

12.1 Quotients and Remainders

Without rational numbers, division still makes sense, provided weaccount for the fact that sometimes numbers don’t divide evenly.When you first learned about division, you learned about quotients


and remainders. For example, you may have calculated a long division,

925

13)12034

11700

334260

7465

9

r 9

showing that the quotient is 925with remainder 9. In any correctlong divisionm÷ n, the remainder (call it r) must be less then n. Inthe example, 9 < 13. The quotient (call it q) and r together satisfym = n · q+ r. In the example, 12034 = 13 · 925+ 9.

So the long division calculations you performed in elementaryschool were really an implementation of the following fact.

THEOREM 3: Natural number division

For any natural numberm and any positive natural number n,there is a unique pair of natural numbers q and r satisfying

1. m = n · q+ r and

2. r < n.

PROOF : First, we prove that if q and r satisfy the stated conditions,and so do q ′ and r ′, then q = q ′ and r = r ′. This will prove thatthere can be at most one pair of natural numbers satisfying the statedconditions.

Suppose n · q+ r = n · q ′ + r ′ and r < n and r ′ < n. If r < r ′, thenthere is some e so that r+ ey = r ′. Hence q · n = q ′ · n+ ey. But thenn | n · q ′ + ey. So nmust divide ey. But ey is strictly less that n, sothis is impossible. Thus it can not be the case that r < r ′. For the samereason, it can not be the case that r ′ < r. So r = r ′. By cancellativityof addition, q · n = q ′ · n. And since n is positive, cancellativity ofmultiplication yields q = q ′.

Now we prove that there actually is a pair of numbers q and rsatisfying the conditions. For this, we proceed by induction onm. Fixa positive natural number n.

• [Basis] The goal is to find q and r so that 0 = q · n+ r and r < n.Clearly, q = 0 and r = 0 do the job.


• [Inductive hypothesis] Assume that for some k ∈ N, there arenatural numbers p and s satisfying k = p · n+ s and s < n.

• [Inductive Step] The goal is to find q and r so that ky = q · n+ r andr < n. By the inductive hypothesis, ky = p · n+ sy. There are twocases to consider: either sy < n, or sy = n— sy can not be strictlygreater than n because s < n. Suppose sy < n. Then let q = p andlet r = sy. Suppose sy = n. Then ky = p · n+ n = py · n+ 0. Solet q = py and r = 0.

So the claim is true for allm ∈N and all positive n ∈N. �

This theorem indicates that division of natural numbers works, aslong as we account for both quotient and remainder. It is useful tohave notation for these.

DEFINITION 20:

For any natural numberm and positive natural number n, letm//n denote the quotient andm mod n the remainder of dividingmby n. That is,m//n andm mod n are the unique natural numbers sothat

1. m = n · (m//n) + (m mod n) and

2. (m mod n) < n.

The notation // is borrowed from the programming langaugePython. It is intended to avoid confusion with real number divisionx/y. Python, Java, C and many other languages use a percent signfor remainder, but unfortunately, its precise meaning differs fromlangauge to language. The notation mod avoids this ambiguity.

Notice that // can be characterized in relation to ≤. First, evidentlyifm ≤ n and p > 0, thenm// p ≤ n// p. A proof of this is a simpleexercise. So division by a positive p is monotonic: m ≤ n impliesp ·m ≤ p · n.


LEMMA 21: Division is adjoint to multiplication

For natural numbersm, n and p, with p positive, it is the case thatp ·m ≤ n if and only ifm ≤ n// p.

PROOF : Supposem ≤ n// p. Then p ·m ≤ p · (n// p) becausemultiplication by p is monotonic. Moreover, p · (n// p) + (n mod p) =n. So p ·m ≤ n.

Conversely, suppose p ·m ≤ n. Then (p ·m)// p ≤ n// p. Butm = (p ·m)// p. �

The following shows that a dual version of division is also possible.That is, form ∈ N and positive n ∈ N, there is a natural number s sothat s ≤ p is and only ifm ≤ n · p.

COROLLARY 1: Natural number division, dual version

For any natural numberm and any positive natural number p,there is a unique pair of natural numbers s and t satisfying

1. m+ t = p · s and

2. t < p.

Moreover, for any natural number n, s ≤ n if and only ifm ≤ n · p.

PROOF : By division, there are natural numbers q and r so thatm =

n · q+ r and r < n. If r = 0, thenm+ r = n · q. Otherwise, r < n andthere is a “difference” t so that r+ t = n. Clearly, t < n because r > 0.Som+ t = n · q+ r+ t = n · q+n = n · (q+ 1).

Suppose s ≤ p. Thenm ≤ n · s ≤ n · p. Conversely, ifm ≤ n · p,then

n · s = m+ t < n · p+n = n · (p+ 1).

But multiplication by n is order reflecting, so s < p+ 1. That is, s ≤ p.�

There is no agreed-upon notation for the dual version of divisionmainly because it simply is not used much. Though it is tempting towrite p \\m, we do not need to introduce a new bit of notation.

Exercises


99. Calculate the following:

(a) 24// 7

(b) 10000000 mod 10000001

(c) 13 mod 8

(d) 8 mod 5

(e) 5 mod 3

(f) 3 mod 2

(g) 2 mod 1

100. Show that for anym ∈ N, any positive n ∈ N andp ∈ N, it is thecase that

(p ·m)//(p · n) = m//n

and(p ·m) mod (p · n) = p · (m mod n).

13Greatest Common Divisors and Least Common Multi-ples

Recall that the mimimum of two natural numbers — written min(m,n) Write better intro and addgoalsWrite better intro and addgoals— is characterized as the greatest lower bound ofm and nwith re-

spect to the standard order ≤. Likewise, max(m,n) is the least upperbound. That is, for anym, n and p, they satisfy

p ≤ min(m,n) ⇐⇒ p ≤ m and p ≤ nmax(m,n) ≤ p ⇐⇒ m ≤ p and n ≤ p

This chapter concerns analogous operations with respect to divisi-bility. The idea is simple. If any two natural numbers have a greatestlower bound and a least upper bound with respect to the “natural”ordering of ≤, then perhaps they also have a greatest lower boundand a least upper bound with respect to divisibility.

In fact, they do. The greatest lower bound with respect to divisibil-ity is called the greast common divisor and is denoted gcd(m,n).

The least upper bound with respect to divisibility is called the leastcommon multiple and is denoted by lcm(m,n).

Though it may not be immediately obvious that greatest common As you know, greatest common divisorsand least common multiples play usefulroles in manipulating fractions. Forexample, to add 1

6 + 215 , we first find

that 30 is least common multiple of 6and 15. So 1

6 = 530 and 2

15 = 430 . Thus

16 + 2

15 = 930 . The result is not reduced

to simplest terms. Continuing, we notethat 3 is the greatest common divisor of9 and 30. So 9

30 = 3·33·10 = 3

10 .

divisor or least common multiple for any two natural numbers exists,we have a good inkling that they do.

When you simplify a fraction, say 248 , you remove any common

factors. For example, 2418 = 66 ·

43 . So the simplified fraction is 43 . You

could have factored out 22 or 33 , but 66 is in some sense the best youcan do. It divides 24 and 18, and is divisible by any other commondivisor. In short 6 is the greatest common divisor of 24 and 18. Youhave performed the operation of finding greatest common divisorsever since you learned to simplify fractions. So now our task is simplyto put that performance on a solid footing.

GREATEST COMMON DIVISORS AND LEAST COMMON MULTIPLES 139

We do this by showing that for every two natural numbersm andn, natural numbers gcd(m,n) and lcm(m,n) exist so that

p | gcd(m,n) ⇐⇒ p | m and p | n

lcm(m,n) | p ⇐⇒ m | p and n | p

Compare these two conditions to the characterizing conditions formin and max. They are identical, except that ≤ is replaced by |.

A proof that every two natural numbers have a greatest commondivsior is quite old. Euclid included one proof in his geometry text.The proof given here is based on a proof due to the 18th centurymathematician Bezout. This proof gives us more information aboutthe greatest common divisor that will be useful later.

Bezout’s original theorem is proved in terms of integers, not naturalnumbers, using negative numbers and subtraction. We work this outin the terms of natural numbers only. The proof is longer and morecomplicated, essentially because it replaces subtraction with steps thatinvolve cancellation. The advantages are that the proof demonstrateshow to use cancellativity in a proof, and that the result only involvesnatural numbers (integers are not really needed).

Before moving to the proof of the theorem, we establish a simplelemma.

LEMMA 22:

For any natural numbersm, n and p, if 0 < m, then p is a commondivisor ofm and n if and only if p is a common divisor of (n mod m)

andm.

PROOF : This follows immediately from the fact that for anym, n andp, if p | m then it is the case that p | n if and only if p | (m+n). �

To prove the main result of this section, we introduce a new defi-nition (only needed in this section) and prove a useful lemma aboutit.


DEFINITION 21: Bezout Pairs

For a pair of natural numbers (m,n), say that a pair of naturalnumbers (a,b) is a Bezout pair for (m,n) if and only if (b · n) − (a ·m)

is the greatest common divisor ofm and n.

Notice that Bezout pairs depend on order. If (a,b) is a Bezout pairfor (m,n), it is almost never the case that (a,b), or even (b,a) is aBezout pair for (n,m).

LEMMA 23: Condition when Bezout pairs can be swapped

If (m,n) has a Bezout pair andm > 0, then (n,m) also has a Bezoutpair.

PROOF : Supposem > 0 and

g = b · n − a ·m (13.1)

is the greatest common divisor ofm and n. Notice that gmust bepositive, for otherwisem = n = 0. So it is also the case that am < b · n.Hence n > 0 and

g+ a ·m = b · n. (13.2)

Our goal is to find natural numbers a ′ and b ′ so that g+ a ′ · n = b ′ ·m.We consider the two cases: a = 0 and a > 0.Suppose a = 0. Then the greatest common divisor ofm and n is

a positive multiple of n. But then g = n, b = 1 andm is a positivemultiple of n. In that case, we leave it as an exercise to check thata ′ = (m//n) − 1 and b ′ = 1 constitute a Bezout pair for (n,m). That is,g+ a ′ · n = b ′ ·m.

Suppose a > 0. Then a · b ·m ≤ b. And because b · n > 0, it is alsothe case that a · b · n ≤ a. Again, we leave it as an exercise to check thata ′ = b · (a ·m − 1) and b ′ = a · (b · n − 1) constitute a Bezout pair for(n,m). �


THEOREM 4: Bezout’s Theorem

For any natural numbersm and n for whichm ≤ n, there is aBezout pair for (m,n).

PROOF : By strong induction on n, we show that for everym, ifm ≤ nthen there is a Bezout triple for (m,n).

Strong inductive hypothesis Suppose that for some k it is the case thatfor all pairs (i, j) so that i ≤ j < k, there is a Bezout pair for (i, j).

Inductive step: The goal is to find a Bezout pair for any pair (h,k)where h ≤ k.

For k = 0, the only case is h = 0. And clearly, (0, 0) is a Bezout pairfor (0, 0).

Suppose k is positive and h ≤ k. We consider three cases: h = 0,h = k, and 0 < h < k.

Case h = 0: Evidently k is the greatest common divisor of 0 and k.And clearly k = 1 · k − 0 · 0. So (0, 1) is a Bezout pair for (0,k).

Case 0 < h ≤ k: Then k mod h < h. So by the strong inductivehypothesis, there is a Bezout pair (a ′,b ′) for (k mod h,h). Thatis, g = b ′ · h − a ′ · (k mod h) is the greatest common divisor ofk mod h and h, and

g+ a ′ · (k mod h) = b ′ · h. (13.3)

By Euclid’s lemma, g is also the greatest common divisor of hand k. So it remains to find a and b satisfying

g+ a · h = b · k.

From Equation 13.3, by adding equals to each side and using thefact that k = h · (k//h) + k mod h,

g+ a ′ · (h · (k//h) + k mod h) = b ′ · h+ a ′ · h · (k//h) (13.4)

g+ a ′ · k =(b ′ + a ′ · (k//h)

)· h. (13.5)

So (a ′,b ′ + a ′ · (k//h)) is a Bezout pair for (k,h). But k ispositive. So Lemma 23 yeilds the desired pair for (h, k).

�


Bezout’s Theorem establishes that the greatest common divisorexists for any two natural numbersm ≤ n. But because gcd(m,n) =gcd(n,m), it actually shows that the order does not matter, except Actually, the last half of the proof

involved converting a Bezout pair for(n,k) into a Bezout triple for (k,n).So in principle, we could adapt theproof to show directly that the order ofarguments does not matter in any case.

with respect to determining Bezout pairs.An algorithm for calculating Bezout pairs, and therefore gcd(m,n),

can be extracted from the proof. We can also extract a simpler algo-rithm for calculating gcd(m,n) without finding actual Bezout pairs.This is the basis of the algorithm that Euclid provided.

ALGORITHM 16: Euclid’s algorithm

For natural numbersm and n, gcd(m,n) can be computed via thefollowing:

gcd(0,n) = n

gcd(m,n) = gcd(n mod m,m) for anym > 0

The least common multiple ofm and n also exists and relates togcd(m,n) in the following way.

THEOREM 5: Least common multiples exist

Any two natural numbersm and n have a least common multiple.Moreover, if we write lcm(m,n) for the least common multiple, thenm · n = gcd(m,n) · lcm(m,n).

PROOF : If gcd(m,n) = 0, thenm = 0 and n = 0. So 0 is their onlycommon multiple, andm · n = gcd(m,n) · lcm(m,n).

Suppose that gcd(m,n) is positive and thatm ≤ n (or swapm andn if not). So gcd(m,n) · p = m and gcd(m,n) · q = n for some naturalnumbers p and q.

Let s = gcd(m,n) · p · q. Clearly, s is a multiple ofm and is amultiple of n, and gcd(m,n) · s = m · n. So s is a common multiple ofm and n. To prove the result, we must show that s is the least. That is,if r is also a common multiple ofm and n, then r is a multiple of s.

Supposem · p ′ = r and n · q ′ = r. Then r = gcd(m,n) · p · p ′ =gcd(m,n) · q · q ′. So p · p ′ = q · q ′. Let t = p · p ′.

By Bezout’s Theorem, find natural numbers a and b so that

gcd(m,n) + a ·m = b · n.


Sot · gcd(m,n) + t · a ·m = t · b · n. (13.6)

But t · a ·m = s · a · q ′, and t · b · n = s · b · p ′. So Equation 13.6 can bewritten as

r+ s · a · q ′ = s · b · p ′ (13.7)

Since r+ s ·a ·q ′ and s ·b ·p ′ are both multiples of s, r is also a multipleof s. �

We can take the result of this theorem as a definition.

DEFINITION 22: Least common multiple as an operation

For any natural numbersm and n, let lcm(m,n) denote the least The proof of Theorem 5 can be usedto extract an algorith for computinglcm(m,n). The simplest, but not mostefficient, method is to take lcm(m,n) =(m · n)// gcd(m,n).

common multiple ofm and n.

Theorem 5 actually tells us more than mere existence of lcm(m,n).In fact, it shows that gcd(m,n) · lcm(m,n) = m · n. This is exactlyanalogous to one of the laws we established in Chapter ??: min(m,n) +max(m,n) = m+ n. Though the proof of Theorem 5 is much morecomplicated, the two results are clearly similar. They are in fact closelyrelated. Many of the laws we enumerated for min and max carry overto greatest common divisor and least common multiple. Compare thefollowing table of laws to the table on page 128


BASIC LAWS OF gcd AND lcm

For any natural numbersm, n and p:Characterization via |

m|gcd(n,p) if and only ifm | n andm | p

lcm(m,n) | p if and only ifm | p and n | p

Associativity gcd(m, gcd(n,p))= gcd(gcd(m,n),p)lcm(m, lcm(n,p))= lcm(lcm(m,n),p)

Commutativity gcd(m,n)= gcd(n,m)

lcm(m,n)= lcm(n,m)

Idempotency gcd(m,m)= m

lcm(m,m)= m

Absorptivity m= gcd(m, lcm(n,m))

m= lcm(m, gcd(n,m))

Distributivity m · gcd(n,p)= gcd(m · n,m · p)m · lcm(n,p)= lcm(m · n,m · p)

lcm(m, gcd(n,p))= gcd(lcm(m,n), lcm(m,p))gcd(m, lcm(n,p)= lcm(gcd(m,n), gcd(m,p))

Modularity m · n= gcd(m,n) · lcm(m,n)

13.1 Relative Primality

One characterization of prime numbers is that p is prime if and onlyif p is greater than 1 and gcd(p,n) = 1 for all n lying strictly between1 and p. For example, 5 is prime. It is easy to check that gcd(5, 2) =

gcd(5, 3) = gcd(5, 4) = 1. More generally, pairs of numbers forwhich gcd(m,n) = 1 play an important role in many parts of discretemathematics. Hence the following definition.


DEFINITION 23: Relatively Prime Numbers

Natural numbersm and n are said to be relatively prime if and onlyif gcd(m,n) = 1.

When you “reduce” a fraction, such a 2415 , you find the greatestcommon divisor gcd(24, 15) and factor it from the numerator anddenominator. In this case, the result is 3·83·5 = 8

5 . The result has anumerator and denominator that are relatively prime. In fact, this is auseful definition of “reduced fraction”. I won’t pursue this idea herebecause fractions are not our concern for now. Nevertheless, a coupleof useful lemmas regarding relative primality are in order You canwork out how these relate to fractions.

LEMMA 24: Factoring out a greatest common divisor results in rela-tively prime numbers.

Supposem and n are any two natural numbers. Choose p and qso thatm = gcd(m,n) · p and n = gcd(m,n) · q. Then p and q arerelatively prime.

PROOF : Exercise. �

LEMMA 25: Products of relative primes are relatively prime

If gcd(m,p) = 1 and gcd(n,p) = 1, then gcd(m · n,p) = 1.PROOF : In case p = 0, this is trivial since gcd(m, 0) = m andgcd(n, 0) = n. Otherwise, gcd(n,p) = gcd(p mod n,n) andgcd(m,p) = gcd(p mod m,m) Suppose gcd(m · n,p) > 1, gcd(m,p) =1 and gcd(n,p) = 1. So gcd(m · n,p) divides p. Evidently, it can notdividem or n. So there is a non-zero remainder in both cases. Thatis,m = q · gcd(m · n,p) + r and n = q ′ · gcd(m · n,p) + r ′ for someq, r,q ′, r ′ where 0 < r, r ′ < gcd(m · n,p). �


LEMMA 26: A cancellation law for divisibility

Ifm and n are relatively prime andm | (n · p), thenm | p.

PROOF : Suppose gcd(m,n) = 1 andm divides n · p. Eitherm ≤n or n < m. Supposem ≤ n, Bezout’s Theorem yields naturalnumbers s and t so that 1+ s ·m = t · n. Notice that t can not be 0. Sop+ p · s ·m = p · t · n. Sincem divides the right side of this equation, itdivide the left. Sincem also divides p · s ·m, it must divide p. Supposen < m, then again Bezout’s Theorem yields natural numbers s andt so that 1 + s · n = t ·m. In particular, t 6= 0. Som | t · n · p andt · n · p = p+ s ·m · p. Againmmust also divide p. �

Exercises

101. For each of the following pairs of numbers, calculate their greatestcommon divisor and indicate which pairs are relatively prime.

102. Recall the Fibonacci numbers. Show that fib(n) and fib(n+ 1) arealways relativey prime.

103. Show that for any natural numbersm andm · n + 1 are alwaysrelatively prime.

A natural next topic is primality. As you know, every positivenatural number is composed of prime factors in essentially one way.For example, 30 is 2 · 3 · 5, and there is no other decomposition of 30except by re-ordering the factors. To make this general fact, knownas the Fundamental Theorem of Arithmetic, precise, it will be helpful tohave more mathematical equipment at our finger tips. Also, we nowhave some experience with proofs. It will help to study how proofswork.

14Prime Numbers

PRIME NUMBERS ARE BUILDING BLOCKS for the natural numbers.CHAPTER GOALS

Define rigorously the prime numbersand prove the Fundamental Theorem ofArithmetic (each positive natural num-bers has a unique prime factorization).

As such they are fundamental to many branches of discrete mathemat-ics.

We all know what a prime number is: a number that is not 1 andhas no factors smaller than itself. One might ask why 1 is excluded.After all, 1 does not have any factors smaller than itself either. Soat first blush, excluding 1 seems like an arbitrary decision. In fact,historically 1 has sometimes been regarded as a prime number. Butthat actually makes things more difficult. So now, everyone agreesthat 1 should not be considered to be prime. In the next paragraphs,we look at why this makes sense.

Recall that the product of a list of natural numbers is definedinductively by

•∏

[ ] = 1

•∏n : L = n ·

∏L.

DEFINITION 24: Prime Numbers

A prime number is a positive natural number p so that for any listLinList[N] of natural numbers, if p |

∏L, then p | n for some n ε L.

With this definition, 1 is not prime for exactly the same reason that6 is not. In both cases, all we need to show is the number divides someproduct of natural numbers, but does not divide any of the factorsin that product. For 1, just observe that 1 divides

∏[ ], but 1 does not

divide any item on the list [ ] ( because there are no items on the emptylist). Likewise, 6 divides

∏[2, 3] but 6 does not divide any item on the


list [2, 3]. In contrast, 2 is prime because if 2 |∏Lwith L being a list of

natural numbers, at least one of the items on the list must be even.Notice that for both of the counter-examples, 6 and 1, we were able

to choose a product that does not just divide the number, but is equalto the number. This suggests that the definition of primality is relatedto a seemingly more general notion.

DEFINITION 25: Irreducible numbers

An irreducible number is a positive natural number p so that forany list L of natural numbers, if p =

∏L, then p ε L.

This definition is most likely what you were told is the definition ofprimality. That is, a prime number is usually defined to be any naturalnumber greater than 1 that can not be factored into properly smallenumbers. It is not hard to see that this is precisely what irreducibilitymeans.

The distinction between irreducibilty and primality is an importantone in other contexts. We will encounter some of those later whenwe discuss modular arithmetic. As it happens though, in the naturalnumbers primes and irreducibles are the same. So why bother to havetwo definitions now?

First, in situations very closely related to the natural numbers,primes and irreducibles really are not the same. So it is helpful to beaware of the distinction now. Second, the proof that the two conceptsare the same for natural numbers is instructive. To show that allprimes are irreducible is fairly easy. The proof that all irreducibles areprime is more complicated.

LEMMA 27: All primes are irreducible

Any prime number is also an irreducible number.

PROOF : Suppose p is prime. Consider a list L so that p =∏L. Then

p divides∏L. So there is some n ε L so that p divides n. But n also

divides∏L. Because divisibility is anti-symmetric, p = n. �

To prove the converse, we need a fact that (in a different form) wsknown to Euclid.

PRIME NUMBERS 149

LEMMA 28: Euclid’s Lemma

If p is irreducible, then for anym eitherm and p are relatively prime,orm is a multiple of p.

PROOF : Suppose p is irreducible. In particular, it is positive, sogcd(p,m) 6= 0.

Suppose 1 < gcd(p,m) < p. Then for some q, gcd(p,m) · q = p.But then 1 < q < p as well. So p is not irreducible. This contradictsthe assumption, so gcd(p,m) must either equal 1 or be greater than orequal to p. But gcd(p,m) ≤ p always holds when p is positive.�

THEOREM 6: Irreducibles are prime

Every irreducible number is prime.PROOF : Suppose p is irreducible. By induction on lists, we show thatif p |

∏L holds for some list L, then p | n for some n on the list.

Basis The basis holds vacuously because∏

[ ] = 1 6= p.

Inductive Hypothesis Suppose K is a list of natural numbers so that ifp |∏K, then p | n for some n on the list K.

Inductive Step The goal is to show that for any natural numberm, ifp |∏

(m : K), then p | n for some n on the listm : K. That is, eitherp | m or p | n for some n on the list K.

By Euclid’s Lemma, eitherm is a multiple of p orm and p arerelatively prime. If it is the former, then p | m. If it is the latter, thenp |∏K by Lemma 26. So by the inductive hypothesis, p | n for

some n on the list K.

�


LEMMA 29: All irreducibles are prime

Any irredicuble number is a prime number.

PROOF : Suppose r is an irreducible number. We prove by induction onlists that if r |

∏L, then r | n for some n on the list.

Basis If r is irreducible, it is greater than 1. So r can not divide∏

[ ].The basis is true vacuously.

Inductive Hypothesis Suppose that K is a list of positive natural num-bers so that if r |

∏K, then p | n for some n on the list K.

Inductive Step The goal is to show that for any positivem, if r divides∏(m : K), then r divides some n on the listm : K. That is, either

r | m or r | n for some n on the list K.

Suppose that r |∏

(m : K). Consider g = gcd(r,m). Because bothr andm are positive, g is also positive and r = g · q for some q.Because r is irreducible, either g ≥ r or q ≥ r. So one or the otheris equals r. If g = r, then r | m. Otherwise, r andm are relativelyprime. So by Lemma 26,m | m ·

∏K,m |

∏K. So the inductive

hypothesis ensures thatm | n for some n on the list K.

�

Consequently, we can use the terms “irreducible” and “prime”interchangibly for natural numbers (also for integers when they comeup). But bear in mind that the two concepts differ in other contexts.

Our next task is to prove two facts you are been told many times.First, the Fundamental Theorem of Arithmetic says that every positivenatural number can be factored in essentially one way into primenumbers. Second, we show that there are infinitely many primes.

We split the proof of Fundamental Theorem into two parts. First,we prove that every positive natural number factors into irreducibles.Then we show that any factorization into primes is unique except forthe order in which the primes are listed.

PRIME NUMBERS 151

LEMMA 30:

For any positive natural numberm, there is a list P consisting only ofirreducible numbers so thatm =

∏P.

PROOF : Here we use strong induction.

• [Strong Inductive Hypothesis] Assume that for some positive k, itis the case that for every 0 < j < k, there is a list of primes P so thatj =∏P.

• [Strong Inductive Step] There are three cases to consider. Eitherk = 1, or k = i · j for some positive i and j strictly less than k, orneither of these holds.

Suppose k = 1. Then we let P = [ ]. This is a (trivial) list of irre-ducibles whose product is k.

Suppose k = i · j where i and j are both positive and strictly less thank. By the inductive hypothesis, there are lists of primes Q and R sothat i =

∏Q and j =

∏R. Hence k =

∏Q ·∏R =∏

(Q++ R).

Suppose k is neither equal to 1, nor equal to i · j for any two naturalnumbers strictly less than k. Then k itself is irreducible. The proofof this is by induction on lists.

Basis Because k 6=∏

[ ], the basis is vacously true. That is if k =∏[ ], then kwould appear on the list [ ].

Inductive hypothesis Suppose that for a list K, if k =∏K, then k

appears on the list.

Inductive step The goal is to show that for any n, if k =∏

(n : K),then k appears on the list n : K. That is, either k = n or k ε K.

By definition,∏

(n : K) = n ·∏K. But the case of k under

consideration is that k = i · j implies either i ≥ k or j ≥ k. Becausek is positive, either k = n or k =

∏K. In the first case, k ε (n : K).

In the second case, by the inductive hypothesis for K, k ε K, sok ε (n : K).

So k is irreducible. And since k =∏

[k], it is the product of a list ofirreducibles.

These are the only possible cases. So the strong inductive step isproved.

�


The list of irreducibles constructed in Lemma 30 is often called aprime factorization ofm (because we know irreducible and primemean the same thing). Generally, a composite has more than onesuch factorization for trivial reasons. For example, the lists [2, 3]and [3, 2] both factor 6 into irreducibles. But the only way two suchfactorizations of the same number can differ is by the order in whichthe factors are listed. If we insist that our lists are sorted in increasingorder, this ambiguity is avoided.

Recall from Chapter ??.?? that because N is ordered by ≤, we cansay what it means to have a sorted list of natural numbers. Namely, alist L of natural numbers is sorted if Li ≤ jj whenever i ≤ j < len(P).

LEMMA 31:

Sorted lists of primes are determined by their productsFor any sorted lists of primes, P and Q, if

∏P =∏Q then P = Q. In other words, the factorization of a

positive natural number into primes isunique up to the order in which we listthe factors.

PROOF : The proof is by list induction on P.

Basis For any list Q of irreducibles, if∏

[ ] =∏Q, then Qmust be the

empty list as well. I leave the details of checking this to you.

Inductive Hypothesis Suppose that for some sorted list K consistingof primes it is the case that for every sorted list P of primes, if∏K =∏P, then K = P.

Inductive Step Consider some primem so thatm : K is sorted. Thegoal is to show that for any sorted list of primes Q, if

∏m : K =∏

Q, thenm : K = Q. This is by list induction on Q.

Basis As before,∏

[ ] = 1. So the basis holds vacuously.

Inductive Hypothesis Suppose that for some list sorted list J, it is thecase that if

∏m : K =

∏J, thenm : K = J.

Inductive Step Consider prime n so that n : J is sorted. Ifm = n,then

∏K =

∏J. So by the main inductive hypothesis, K = J. So

m : K = n : J.

To complete the inductive step we show thatm < n and n < mare both impossible. Ifm < n, thenm is relatively prime to n.Som |

∏J, but because n : J is a sorted list of primes,m is also

relatively prime to each item in J. This contradicts primality ofm.Likewise, n < m contradicts primality of n for the sae reason byswapping the roles ofm and n, and of K and J.

So we have completed the iductive proof that for any sorted list of

PRIME NUMBERS 153

primes Q, if∏m : K =

∏Q, thenm : K = Q. This is the needed

inductive step.

�

The preceeding lemmas show that every positivem has a uniquesorted prime factorization, typically called the prime factorization.

THEOREM 7: Fundamental Theorem of Arithmetic

Every positive natural number has a unique sorted prime factoriza-tion.

PROOF : All that remains is to the remark that if P is a prime factoriza-tion ofm, then P can be sorted into increasing order. The result has thesame product P because of commutativity. �

For example, 24 is factored as [2, 2, 2, 3] and 800 is factored as[2, 2, 2, 2, 2, 5, 5]. We can get a more efficent representation by listingthe number of times each prime is repeated. So we can represent 800by [5, 0, 2], signifying that 800 = 253052. Notice that in this notation,we need the middle 0 as a “place holder” to indicate that our numberdoes not have any 3 factors. Also “trailing zeros” in this notation donot make a difference. [5, 0, 3, 0] also represents 800. The extra 0 at theend simply tells us that 800 is not divisible by the next prime (7). Wewill investigate this notation informally (without supplying oroofs)after establishing that we have a plentiful supply of primes. The ad-vantage of the notation is that it allows us to see immediately how gcdand lcm are not merely analogous to min and max, but are actuallyclosely related computationally.

THEOREM 8:

There are infinitely many primesFor every natural numberm, there is a prime p greater thanm.

PROOF : We prove this by showing that no finite list of prime exhaustsall of them.

Suppose L is a non-empty list consisting of primes. To show thatL is missing something, consider the numberm = 1 +

∏L. Since∏

L ≥ 1,m > 1. Som has a non-empty irreducible factorization, sayM. ClearlyM does not have any item in common with L (this couldbe proved explicitly by induction on L). So any item on the listM is


missing from L SinceM is not empty, L can not be an exhaustive list ofall primes. �

DEFINITION 26:

We can enumerate the primes: 2, 3, 5, 7, . . . in increasing order. Forevery natural number k, let pk be the kth prime. That is, the numberspk satisfy

p0 = 2

pky) = the smallest prime q so that pk < q

Notice that this is well-defined because for anym there is a primegreater thanm. We would not know this if we did not know there areinfinitely many primes.

Exercises

104. What is p10?

105. What is the prime factorization of 1440?

Using pk, we can succinctly represent any positive natural num-ber by a list of exponents of primes. Namely, for a list R of naturalnumbers, define Rpe (for “prime exponent representation”) to be

Rpe :=∏i<lenR

pRii .

For example, [1, 2, 0, 2]pr = 21325072 = 882.For a positive natural number n, let PE(n) denote the unique list so

that (i) PE(n) 6= = n and (ii) PE(n) does not contain any trailing zeroes.Define P+Q for two lists of natural numbers by adding the items of

the two lists itemwise. That is,

P+[] = P

[]+Q = Q

m : P+n : Q = (m+n) : (P+Q)

Then it is clear (we will not give a proof) that Ppe ·Qpe = (P+Q)pe.

PRIME NUMBERS 155

Define a relation P � Q on lists of natural numbers by

• [] � Q always,

• m : P � [] never, and

• m : P � n : Q if and only ifm ≤ n and P � Q.

Then Ppe | Qpe if and only if P � Q. In other words, if we list theexponents of primes that constitute given numbersm and n, we cancompare them for divisibility simply by comparing the exponents by≤.

This leads to the relation between min and gcd. Namely, define anoperation on lists of natural numbers by taking minima itemwise.

min(P, [ ]) = P

min([ ],Q) = Q

min(m : P,n : Q) = min(m,n) : min(P,Q)

Then it is also easy to check that min(P,Q)pe = gcd(Ppe,Qpe) for anytwo lists of natural numbers.

A similar definition of max will lead to max(P,Q)pe = lcm(Ppe,Qpe)

for any two lists of natural numbers as well.Now the fact thatm · n = gcd(m,n) · lcm(m,n) becomes clearly a

corollary of the fact thatm+n = min(m,n) + max(m,n). Namely, forany two lists of natural numbers, P and Q,

P+Q = min(P,Q)+max(P,Q)

is easily proved by induction on lists. And the preceding paragraphsshow thatm · n = gcd(m,n) · lcm(m,n) follows from this.

In principle, gcd(m,n) could be calculated by first factoringm andn into primes, then using min on the resulting lists, then convertingthat back to a natural number. This method, however, is extremelyinefficient because finding the prime factorization of a large number isdifficult.

mandrewmoshier.github.io · 2019-12-31 · what is this about? discrete mathematics consists of...

Documents