

Undergraduate Topics in Computer Science


Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems. Many include fully worked solutions.

For further volumes: http://www.springer.com/series/7592


David Makinson

Sets, Logic and Maths for Computing

Second edition



David Makinson
London School of Economics
Department of Philosophy
Houghton Street
WC2A 2AE London
United Kingdom
[email protected]

Series editor

Ian Mackie

Advisory board

Samson Abramsky, University of Oxford, Oxford, UK
Karin Breitman, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
Chris Hankin, Imperial College London, London, UK
Dexter Kozen, Cornell University, Ithaca, USA
Andrew Pitts, University of Cambridge, Cambridge, UK
Hanne Riis Nielson, Technical University of Denmark, Kongens Lyngby, Denmark
Steven Skiena, Stony Brook University, Stony Brook, USA
Iain Stewart, University of Durham, Durham, UK

ISSN 1863-7310
ISBN 978-1-4471-2499-3
e-ISBN 978-1-4471-2500-6
DOI 10.1007/978-1-4471-2500-6
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2012930652

© Springer-Verlag London Limited 2008, 2012
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


Preface

The first part of this preface is for the student, the second for the instructor. Some readers may find it helpful to look at both. Whoever you are, welcome!

For the Student

You have finished secondary school and are about to begin at a university or technical college. You want to study computing. The course includes some mathematics – and that was not necessarily your favourite subject. But there is no escape: a certain amount of finite mathematics is a required part of the first-year curriculum, because it is a necessary toolkit for the subject itself.

What Is in This Book?

That is where this book comes in. Its purpose is to provide the basic mathematical language that you need to enter the world of the information and computer sciences.

It does not contain all the mathematics that you will need through the several years of your undergraduate career. There are other very good, quite massive, volumes that do that. At some stage, you will find it useful to get one and keep it on your shelf for reference. But experience has convinced this author that no matter how good the compendia are, beginning students tend to feel intimidated, lost and unclear about what parts to focus on. This short book, in contrast, offers just the basics that you need to know from the beginning, on which you can build further as needed.

It also recognizes that you may not have done much mathematics at school, may not have understood very well what was going on and may even have grown to detest it. No matter: you can learn the essentials here and perhaps even have fun doing so.

So what is the book about? It is about certain tools that we need to apply over and over again when thinking about computations. They include:
1. Collecting things together. In the jargon of mathematics, this is set theory.
2. Comparing things. This is the theory of relations.
3. Associating one item with another. Introduces the key notion of a function.
4. Recycling outputs as inputs. We introduce the ideas of recursion and induction.




5. Counting. The mathematician calls it combinatorics.
6. Weighing the odds. This is done with probability.
7. Squirrel math. Here we make use of trees.
8. Yea and Nay. Just two truth-values underlie propositional logic.
9. Everything and nothing. That is what quantificational logic is about.

10. Just supposing. How complex proofs are built out of simple ones.

Without an understanding of these basic concepts, large portions of computer science remain behind closed doors. As you begin to grasp the ideas and integrate them into your thought, you will also find that their application extends far beyond computing into many other areas.

How Should You Use It?

The good news is that there is not all that much to commit to memory. Your sister studying medicine, or brother going for law, will envy you terribly for this. In our subject, the essential things are to understand and be able to apply.

But that is a much more subtle affair than one might imagine, as the two are interdependent. Application without understanding is blind and quickly leads to errors – often trivial, but sometimes ghastly. On the other hand, comprehension remains poor without the practice given by applications. In particular, you do not fully understand a definition until you have seen how it takes effect in specific situations: positive examples reveal its scope, negative ones show its limits. It also takes some experience to be able to recognize when you have really understood something, compared to having done no more than recite the words or call upon them in hope of blessing.

For this reason, exercises have been included as an indispensable part of the learning process. Skip them at your peril: no matter how simple and straightforward a concept seems, you will not fully assimilate it unless you apply it. That is part of what is meant by the old proverb ‘there is no royal road in mathematics’. Even when a problem is accompanied by a sample solution, you will benefit much more if you first hide the answer and try to work it out for yourself. That requires self-discipline, but it brings real rewards. Moreover, the exercises have been chosen so that in many instances the result is just what is needed to make a step somewhere later in the book. They are thus integral to the development of the general theory.

By the same token, do not get into the habit of skipping proofs when you read the text. Postpone, yes, but omit, no. In mathematics, you have never fully understood a fact unless you have also grasped why it is true, that is, have assimilated at least one proof of it. The well-meaning idea that mathematics can be democratized by teaching the ‘facts’ and forgetting about the proofs wrought disaster in some secondary and university education systems.

In practice, the mathematical tools that we bulleted above are rarely applied in isolation from each other. They gain their real power when used in concert, setting up a crossfire that can bring tough problems to the ground. For example, the concept of a set, once explained in the first chapter, is used everywhere in what follows. Relations reappear in graphs, trees and logic. Functions are omnipresent.

For the Instructor

Any book of this kind needs to find delicate balances between competing virtues and shortcomings of modes of presentation and choices of material. We explain the vision behind the choices made here.

Manner of Presentation

Mathematically, the most elegant and coherent way to proceed is to begin with the most general concepts and gradually unfold them so that the more specific and familiar ones appear as special cases. Pedagogically, this sometimes works, but it can also be disastrous. There are situations where the reverse is often required: begin with some of the more familiar examples and special cases, and then show how they may naturally be broadened.

There is no perfect solution to this problem; we have tried to find a least imperfect one. Insofar as we begin the book with sets, relations and functions in that order, we are following the first path. But, in some chapters, we have followed the second one. For example, when explaining induction and recursion, we begin with the most familiar special case, simple induction/recursion over the positive integers; then pass to their cumulative forms over the same domain; broaden to their qualitatively formulated structural versions; and finally present the most general forms on arbitrary well-founded sets. Again, in the chapter on trees, we have taken the rather unusual step of beginning with rooted trees, where intuition is strongest and applications abound, then abstracting to unrooted trees.

In the chapters on counting and probability, we have had to strike another balance – between traditional terminology and notation antedating the modern era and its translation into the language of sets, relations and functions. Most textbook presentations do it all in the traditional way, which has its drawbacks. It leaves the student in the dark about the relation of this material to what was taught in earlier chapters on sets and functions. And, frankly, it is not always very rigorous or transparent. Our policy is to familiarize the reader with both kinds of presentation – using the language of sets and functions for a clear understanding of the material itself and the traditional languages of combinatorics and probability to permit communication in the local dialect.

In those two chapters, yet another balance had to be found. One can easily supply counting formulae and probability equalities to be committed to memory and applied in drills. It is more difficult to provide reader-friendly explanations and proofs that permit students to understand what they are doing and why. This book tries to do both, with a rather deeper commitment to the latter than is usual. In particular, it is emphasized that whenever we wish to count the number of selections of k items from a pool of n, a definite answer is possible only when it is clearly understood whether the selections admit repetition and whether they discriminate between orders, giving four options and thus four different counting formulae for the toolbox. The student must learn which tool to choose, and why, as much as how to use it.
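As an illustrative sketch (not from the book itself), the four cases just mentioned can be computed in a few lines of Python, assuming Python 3.8 or later for math.comb and math.perm; the pool size and selection size below are arbitrary examples.

    from math import comb, perm

    def selections(n, k):
        """Number of ways to select k items from a pool of n, under the
        four combinations of order-sensitivity and repetition."""
        return {
            "ordered, no repetition": perm(n, k),              # n!/(n-k)!
            "unordered, no repetition": comb(n, k),            # n choose k
            "ordered, repetition allowed": n ** k,             # k independent choices
            "unordered, repetition allowed": comb(n + k - 1, k),
        }

    print(selections(5, 3))
    # {'ordered, no repetition': 60, 'unordered, no repetition': 10,
    #  'ordered, repetition allowed': 125, 'unordered, repetition allowed': 35}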

The place of logic in the story is delicate. We have left its systematic exposition to the end – a decision that may seem rather strange, as one uses logic whenever reasoning mathematically, even about the most elementary things discussed in the first chapters. Do we not need a chapter on logic at the very beginning of the book? The author’s experience in the classroom tells him that in practice that does not work well. Despite its simplicity – perhaps indeed because of it – logic can be elusive for beginning students. It acquires intuitive content only as examples of its employment are revealed. Moreover, it turns out that a really clear explanation of the basic concepts of logic requires some familiarity with the mathematical notions of sets, relations, functions and trees.

For these reasons, the book takes a different tack. In its early chapters, notions of logic are identified briefly as they arise in the discussion of more ‘concrete’ material. This is done in ‘logic boxes’. Each box introduces just enough to get on with the task in hand. Much later, in the last three chapters, all this is brought together and extended. By then, the student should have little trouble appreciating what the subject is all about and how natural it all is, and will be ready to use other basic mathematical tools when studying it.

From time to time, there are boxes of a different nature – ‘Alice boxes’. This little troublemaker comes from the pages of Lewis Carroll to ask embarrassing questions in all innocence. Often they are on points that students find puzzling, but which they have trouble articulating clearly or are too shy to pose. It is hoped that the Mad Hatter’s replies are of assistance to them as well as to Alice.

The house of mathematics can be built in many different ways, and students often have difficulty reconciling the formulations and constructions of one text with those of another. From time to time, we comment on such differences. Two examples, which always give trouble if not dealt with explicitly, arise in quantificational (first-order) logic. They concern different, although ultimately equivalent, ways of reading the quantifiers and different ways of using the terms ‘true’ and ‘false’ for formulae containing free variables.

Choice of Topics

Overall, our choice of topics is fairly standard, as the chapter titles indicate. If strapped for class time, an instructor could omit some of the starred sections of Chaps. 4, 5, 6, 7, 8, 9, 10. But it is urged that Chaps. 1, 2, 3 be taught entire, as everything in them is useful to the computer science student and is also needed in following chapters. When sufficient classroom time is available, some teachers may, on the other hand, wish to deepen topics or add further ones.



We have not included a chapter on the theory of graphs. That was a difficult call to make, and the reasons were as follows. Although trees are a particular kind of graph, there is no difficulty in covering everything we want to say about trees without entering into general graph theory. Moreover, an adequate treatment of graphs, even if squeezed into one chapter of about the same length as the others, takes a good 2 weeks of additional class time to cover properly with enough examples and exercises to make it sink in. The basic theory of graphs is a rather messy topic, with a rather high definition/theorem ratio and multiple options about how wide to cast the net (directed/undirected graphs, with or without loops, multiple edges and so on). The author’s experience is that students gain little from a high-speed run through a series of distinctions and definitions, memorized for the examinations and then promptly forgotten.

On the other hand, recursion and induction are here developed in more detail than is usual in texts of this kind, where it is common to neglect recursive definition in favour of inductive proof and to restrict attention to the natural numbers. Although Chap. 4 begins with the natural numbers, it goes on to explain number-free forms, in particular the often neglected structural ones that are so important for computer science and logic. We also explain the general versions of induction and recursion for arbitrary well-founded relations. Throughout the presentation, the interplay between recursive definition and inductive proof is brought out, with the latter piggybacking on the former. This chapter ends up being the longest in the book.

Finally, a decision had to be made whether to include specific algorithms and, if so, in what form: ordinary English, pseudo-code outline or a real-life programming language in full detail? Most first-year students of computing will be taking, in parallel, courses on principles of programming and some specific programming language; but the languages chosen differ from one institution to another and change over time. The policy in this text is to sketch the essential idea of basic algorithms in plain but carefully formulated English. In a few cases (particularly the chapter on trees), we give optional exercises in expressing them in pseudo-code. Instructors wishing to make more systematic use of pseudo-code, or to link material with specific programming languages, should feel free to do so.

Courses Outside Computer Science

Computer science students are not the only ones who need to learn about these topics. Students of mathematics, philosophy as well as the more theoretical sides of linguistics, economics and political science, all need a course in basic formal methods covering more or less the same territory. This text can be used, with some omissions and additions, for a formal methods course in any of those disciplines.

In the particular case of philosophy, in the second half of the twentieth century, there was an unfortunate tendency to teach only elementary logic, leaving aside instruction on sets, relations, functions, recursion/induction, probability and trees. Even within logic, the focus was rather narrow, often almost entirely on so-called natural deduction. But as already remarked, it is difficult for the student to get a clear idea of what is going on in logic, and difficult for the instructor to say anything interesting about it, without having those other concepts available in the toolkit. The few philosophy students going on to more advanced courses in logic are usually exposed to such tools in bits and pieces but without a systematic grounding. It is the author’s belief that all of the subjects dealt with in this book (with the exception of Chap. 5 on counting and the last two sections of Chap. 7 on trees) are also vital for an adequate course on formal methods for students of philosophy. With some additional practice on the ins and outs of approximating statements of ordinary language in the notation of propositional and predicate logic, the book can also be used as a text for such a course.

The Second Edition

For those already familiar with the first edition, we mention the modifications made for the second one. On a general level, affecting the entire book, the main developments are as follows:
• Formulations have been reviewed and retouched to improve clarity and simplicity. Sometimes the order of development within a chapter has been changed slightly. In the end, hardly a paragraph remains untouched.

• Further exercises have been added at points where classroom experience suggested that they would be useful.

• Additional sample answers have also been provided. But, resisting classroom pressures, not all exercises are supplied with answers! After students have learned to swim in shallow water, they need to acquire confidence and self-reliance by venturing out where their feet no longer touch bottom. Ultimately that leads to the satisfaction of getting things right where no crib is possible. Instructors can also use these exercises in tests and examinations.

• To facilitate cross-reference, the numbering of exercises has been aligned with that of the sections and subsections. Thus, Exercise x.y.z (k) is the kth exercise in Subsection x.y.z of Chapter x, with the (k) omitted when there is only one exercise for the subsection. When a section has no subsections, the numbering is reduced to x.y (k), and end-of-chapter exercises are listed simply as x (k), where x is the chapter number.

On the level of specific chapters, there have been significant revisions in the content and presentation of Chaps. 5, 6, 7:
• In Chap. 5 on counting, the section on rearrangements and configured partitions has been rewritten to make it clearer and more reader-friendly, and some material of lesser interest has been removed.

• In Chap. 6 on probability, the last sections have been refashioned, and some material of little interest in the finite context has been eliminated.

• In Chap. 7 on trees, we have clarified the presence and role of the empty tree, which had been left rather vague in the first edition.

The chapters on propositional and first-order logic have been very much reorganized, emerging as three rather than two chapters:



• Order of presentation. The general notion of a consequence (or closure) relation has been taken from its place at the very beginning of Chap. 8 and relocated in the new Chap. 10. The previous position was appropriate from a theoretical point of view, but from a pedagogical one it just did not work. So the abstract concept has been kept under wraps until readers have become thoroughly familiar with the particular cases of classical propositional and first-order consequence.

• Motivation. Moreover, the concept of a consequence relation is now motivated by showing how it supports the validity of ‘elementary derivations’, understood as finite sequences or trees of propositions made up of individually valid steps. It is shown that the conditions defining consequence relations are just what are needed for elementary proofs to do their job.

• Content. The sections of Chaps. 8 and 9 that in the first edition were devoted to the skill of constructing ‘natural deductions’ in the symbols of propositional and quantificational logic have been suppressed. In their place, the new Chap. 10 includes an informal explanation of the main strategies of mathematical proof, rigorous statements of the higher-level rules on which those strategies are based and an explanation of how the rules support traditional practice. Roughly speaking, a formal drill of dubious value has given way to greater attention to a good understanding of the recursive structure of proofs.

• Other. The notation used in explaining the semantics of quantificational logic in Chap. 9 has been considerably streamlined, to keep it as simple as possible without loss of clarity or rigour.

A section of the author’s webpage http://sites.google.com/site/davidcmakinson is set aside for material relating to this edition: residual errata, comments, further exercises, etc. Readers are invited to send their observations to [email protected].


Acknowledgements

The author would like to thank Anatoli Degtyarev, Franz Dietrich, Valentin Goranko and George Kourousias for helpful discussions when writing the first edition. George also helped prepare the diagrams. Thanks to the London School of Economics (LSE) and successive department heads Colin Howson, Richard Bradley and Luc Bovens for the wonderful working environment that they provided while both editions were being written. Herbert Huber’s emails helped find first-edition typos. For the second edition, Abhaya Nayak and Xavier Parent made valuable comments on a draft of the new tenth chapter.

Saint Martin de Hinx



Contents

1 Collecting Things Together: Sets
  1.1 The Intuitive Concept of a Set
  1.2 Basic Relations Between Sets
    1.2.1 Inclusion
    1.2.2 Identity and Proper Inclusion
    1.2.3 Diagrams
    1.2.4 Ways of Defining a Set
  1.3 The Empty Set
    1.3.1 Emptiness
    1.3.2 Disjoint Sets
  1.4 Boolean Operations on Sets
    1.4.1 Intersection
    1.4.2 Union
    1.4.3 Difference and Complement
  1.5 Generalized Union and Intersection
  1.6 Power Sets
  Selected Reading

2 Comparing Things: Relations
  2.1 Ordered Tuples, Cartesian Products and Relations
    2.1.1 Ordered Tuples
    2.1.2 Cartesian Products
    2.1.3 Relations
  2.2 Tables and Digraphs for Relations
    2.2.1 Tables
    2.2.2 Digraphs
  2.3 Operations on Relations
    2.3.1 Converse
    2.3.2 Join, Projection, Selection
    2.3.3 Composition
    2.3.4 Image
  2.4 Reflexivity and Transitivity
    2.4.1 Reflexivity
    2.4.2 Transitivity
  2.5 Equivalence Relations and Partitions
    2.5.1 Symmetry
    2.5.2 Equivalence Relations
    2.5.3 Partitions
    2.5.4 The Partition/Equivalence Correspondence
  2.6 Relations for Ordering
    2.6.1 Partial Order
    2.6.2 Linear Orderings
    2.6.3 Strict Orderings
  2.7 Closing with Relations
    2.7.1 Transitive Closure of a Relation
    2.7.2 Closure of a Set Under a Relation
  Selected Reading

3 Associating One Item with Another: Functions
  3.1 What Is a Function?
  3.2 Operations on Functions
    3.2.1 Domain and Range
    3.2.2 Restriction, Image, Closure
    3.2.3 Composition
    3.2.4 Inverse
  3.3 Injections, Surjections, Bijections
    3.3.1 Injectivity
    3.3.2 Surjectivity
    3.3.3 Bijective Functions
  3.4 Using Functions to Compare Size
    3.4.1 Equinumerosity
    3.4.2 Cardinal Comparison
    3.4.3 The Pigeonhole Principle
  3.5 Some Handy Functions
    3.5.1 Identity Functions
    3.5.2 Constant Functions
    3.5.3 Projection Functions
    3.5.4 Characteristic Functions
  3.6 Families and Sequences
    3.6.1 Families of Sets
    3.6.2 Sequences and Suchlike
  Selected Reading

4 Recycling Outputs as Inputs: Induction and Recursion
  4.1 What Are Induction and Recursion?
  4.2 Proof by Simple Induction on the Positive Integers
    4.2.1 An Example
    4.2.2 The Principle Behind the Example
  4.3 Definition by Simple Recursion on the Positive Integers
  4.4 Evaluating Functions Defined by Recursion
  4.5 Cumulative Induction and Recursion
    4.5.1 Cumulative Recursive Definitions
    4.5.2 Proof by Cumulative Induction
    4.5.3 Simultaneous Recursion and Induction
  4.6 Structural Recursion and Induction
    4.6.1 Defining Sets by Structural Recursion
    4.6.2 Proof by Structural Induction
    4.6.3 Defining Functions by Structural Recursion on Their Domains
  4.7 Recursion and Induction on Well-Founded Sets*
    4.7.1 Well-Founded Sets
    4.7.2 Proof by Well-Founded Induction
    4.7.3 Defining Functions by Well-Founded Recursion on Their Domains
    4.7.4 Recursive Programs
  Selected Reading

5 Counting Things: Combinatorics
  5.1 Basic Principles for Addition and Multiplication
    5.1.1 Principles Considered Separately
    5.1.2 Using the Two Principles Together
  5.2 Four Ways of Selecting k Items Out of n
    5.2.1 Order and Repetition
    5.2.2 Connections with Functions
  5.3 Counting Formulae: Permutations and Combinations
    5.3.1 The Formula for Permutations
    5.3.2 Counting Combinations
  5.4 Selections Allowing Repetition
    5.4.1 Permutations with Repetition Allowed
    5.4.2 Combinations with Repetition Allowed
  5.5 Rearrangements and Partitions*
    5.5.1 Rearrangements
    5.5.2 Counting Configured Partitions
  Selected Reading

6 Weighing the Odds: Probability
  6.1 Finite Probability Spaces
    6.1.1 Basic Definitions
    6.1.2 Properties of Probability Functions
  6.2 Philosophy and Applications
    6.2.1 Philosophical Interpretations
    6.2.2 The Art of Applying Probability Theory
    6.2.3 Digression: A Glimpse of the Infinite Case*
  6.3 Some Simple Problems
  6.4 Conditional Probability
    6.4.1 The Ratio Definition
    6.4.2 Applying Conditional Probability
  6.5 Interlude: Simpson’s Paradox*
  6.6 Independence
  6.7 Bayes’ Theorem
  6.8 Expectation*
  Selected Reading

7 Squirrel Math: Trees
  7.1 My First Tree
  7.2 Rooted Trees
    7.2.1 Explicit Definition
    7.2.2 Recursive Definition
  7.3 Working with Trees
    7.3.1 Trees Grow Everywhere
    7.3.2 Labelled and Ordered Trees
  7.4 Interlude: Parenthesis-Free Notation
  7.5 Binary Trees
  7.6 Unrooted Trees
    7.6.1 Definition
    7.6.2 Properties
    7.6.3 Spanning Trees
  Selected Reading

8 Yea and Nay: Propositional Logic
  8.1 What Is Logic?
  8.2 Truth-Functional Connectives
  8.3 Tautological Relations
    8.3.1 The Language of Propositional Logic
    8.3.2 Tautological Implication
    8.3.3 Tautological Equivalence
    8.3.4 Tautologies and Contradictions
  8.4 Normal Forms
    8.4.1 Disjunctive Normal Form
    8.4.2 Conjunctive Normal Form*
    8.4.3 Eliminating Redundant Letters
    8.4.4 Most Modular Version
  8.5 Semantic Decomposition Trees
  Selected Reading

9 Something About Everything: Quantificational Logic
  9.1 The Language of Quantifiers
    9.1.1 Some Examples
    9.1.2 Systematic Presentation of the Language
    9.1.3 Freedom and Bondage
  9.2 Some Basic Logical Equivalences
    9.2.1 Quantifier Interchange
    9.2.2 Distribution
    9.2.3 Vacuity and Relettering
  9.3 Two Semantics for Quantificational Logic
    9.3.1 The Common Part of the Two Semantics
    9.3.2 Substitutional Reading
    9.3.3 The x-Variant Reading
  9.4 Semantic Analysis
    9.4.1 Logical Implication
    9.4.2 Clean Substitutions
    9.4.3 Fundamental Rules
    9.4.4 Identity
  Selected Reading

10 Just Supposing: Proof and Consequence
  10.1 Elementary Derivations
    10.1.1 My First Derivation
    10.1.2 The Logic Behind Chaining
  10.2 Consequence Relations
    10.2.1 The Tarski Conditions
    10.2.2 Consequence and Chaining
    10.2.3 Consequence as an Operation*
  10.3 A Higher-Level Proof Strategy
    10.3.1 Informal Conditional Proof
    10.3.2 Conditional Proof as a Formal Rule
    10.3.3 Flattening Split-Level Proofs
  10.4 Other Higher-Level Proof Strategies
    10.4.1 Disjunctive Proof and Proof by Cases
    10.4.2 Proof by Contradiction
    10.4.3 Rules with Quantifiers
  10.5 Proofs as Recursive Structures*
    10.5.1 Second-Level Proofs
    10.5.2 Split-Level Proofs
    10.5.3 Four Views of Mount Fuji
  Selected Reading

Index


List of Figures

Fig. 1.1 Euler diagrams for proper inclusion and identity
Fig. 1.2 Venn diagrams for inclusion and proper inclusion
Fig. 1.3 Venn diagrams for union and intersection
Fig. 1.4 Hasse diagram for P(A) when A = {1, 2, 3}
Fig. 2.1 Diagram for a relation
Fig. 2.2 Diagram for a partition
Fig. 4.1 Recursive structural definition
Fig. 7.1 Example of a rooted tree
Fig. 7.2 Recursive definition of a tree, adding a new root
Fig. 7.3 Tree for syntactic structure of an arithmetic expression
Fig. 7.4 Syntactic structure with economical labels
Fig. 7.5 Which are binary trees?
Fig. 7.6 Which are binary search trees?
Fig. 8.1 Semantic decomposition tree for ¬(p∧q)→(¬p∨¬q)
Fig. 8.2 Semantic decomposition tree for ((p∧q)→r)→(p→r)
Fig. 9.1 Logical relations between alternating quantifiers
Fig. 10.1 A labelled derivation tree



1 Collecting Things Together: Sets

Abstract
In this chapter, we introduce the student to the world of sets. Actually, only a little bit of it, the part that is needed to get going.

After giving a rough intuitive idea of what sets are, we present the basic relations between them: inclusion, identity, proper inclusion and exclusion. We describe two common ways of identifying sets and pause to look more closely at the empty set. We then define some basic operations for forming new sets out of old ones: intersection, union, difference and complement. These are often called Boolean operations, after George Boole, who first studied them systematically in the middle of the nineteenth century.

Up to this point, the material is all ‘flat’ set theory, in the sense that it does not look at what we can do when the elements of sets are themselves sets. However, we need to go a little beyond flatland. In particular, we need to generalize the notions of intersection and union to cover arbitrary collections of sets and introduce the very important concept of the power set of a set, that is, the set of all its subsets.
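As a quick preview in code (an illustrative sketch, not part of the original chapter), Python's built-in set type mirrors the operations named above; the sets A and B below are arbitrary examples, and powerset is our own name for a helper that builds the power set.

    from itertools import combinations

    A = {1, 2, 3}
    B = {3, 4}

    print(A & B)   # intersection: {3}
    print(A | B)   # union: {1, 2, 3, 4}
    print(A - B)   # difference: {1, 2}

    def powerset(s):
        """Return all subsets of s (its power set), as a list of frozensets."""
        items = list(s)
        return [frozenset(c)
                for r in range(len(items) + 1)
                for c in combinations(items, r)]

    print(len(powerset(A)))   # 8 subsets in all, i.e. 2**3, including the empty set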

1.1 The Intuitive Concept of a Set

Every day you need to consider things more than one at a time. As well as thinking about a particular individual, such as the young man or woman sitting on your left in the classroom, you may focus on some collection of people – say, all those students who come from the same school as you do, or all those with red hair. A set is just such a collection, and the individuals that make it up are called its elements. For example, each student with green eyes is an element of the set of all students with green eyes.

What could be simpler? But be careful! There might be many students with green eyes, or none or maybe just one, but there is always exactly one set of them. It is a single item, even when it has many elements. Moreover, whereas the students themselves are flesh-and-blood persons, the set is an abstract object, thus different from those elements. Even when the set has just one element, it is not the same thing as that unique element. For example, even if Achilles is the only person in the class with green eyes, the corresponding set is distinct from Achilles; it is an abstract item and not a person. To anticipate later terminology, the set whose only element is Achilles is often called the singleton for that person, and it is written as {Achilles}.

The elements of a set need not be people. They need not even be physical objects; they may in turn be abstract items. For example, they can be numbers, geometric figures, items of computer code, colours, concepts or whatever you like – and even other sets, which in turn may have sets as elements, and so on. The further reaches of set theory deal with such higher-level sets, but we will not need to go so far. We will only need to consider sets of items that are not themselves sets (flat sets, or sets of degree zero) and sets of them (sets of degree one). To avoid the unsettling repetition that occurs in the term ‘set of sets’, we will also speak synonymously of a collection, or class of sets.

We need a notation to represent the idea of elementhood. We write x ∈ A for x is an element of A, and x ∉ A for x is not an element of A. Here, A is always a set, while x may or may not be a set; in the simple examples, it will not be one. The sign ∈ is derived from one of the forms of the Greek letter epsilon.

1.2 Basic Relations Between Sets

Sets can stand in various relations to each other, notably inclusion, identity, proper inclusion, and these relationships can be made graphic with diagrams, as explained in this section.

1.2.1 Inclusion

One basic relation is that of inclusion. When A, B are sets, we say that A is included in B (or A is a subset of B) and write A ⊆ B iff every element of A is an element of B. In other words, iff for all x, if x ∈ A, then x ∈ B. Put in another way that is sometimes useful: iff there is no element of A that is not an element of B. Looking at the same relation from the other direction, when this holds, we also say that B includes A (B is a superset of A) and write B ⊇ A.

Warning: Avoid saying that A is ‘contained’ in B, as this is rather ambiguous. It can mean that A ⊆ B, but it can also mean that A ∈ B. These are not the same and should never be confused. For example, the integer 2 is an element of the set N⁺ of all positive integers, but it is not a subset of N⁺. Conversely, the set E of all even positive integers is a subset of N⁺, that is, each of its elements 2, 4, ... is an element of N⁺, but E is not an element of N⁺. Neglect of this distinction quickly leads to serious confusion.
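A short illustrative sketch in Python (ours, not the author's) may help to fix the distinction; since Python sets must be finite, a finite stand-in is used for N⁺, and E is built as a frozenset so that it can itself be tested for membership in another set.

    # Finite stand-in for the set N+ of positive integers (Python sets are finite).
    N_plus = set(range(1, 101))
    # The even members of that stand-in, as a frozenset so that it is hashable.
    E = frozenset(n for n in N_plus if n % 2 == 0)

    print(2 in N_plus)      # True:  2 is an element of N+
    print({2} <= N_plus)    # True:  {2} is a subset of N+ (a different claim)
    print(E <= N_plus)      # True:  E is a subset of N+
    print(E in N_plus)      # False: E itself is not an element of N+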


Alice Box: Iff

Alice: Hold on, what’s this ‘iff ’? It’s not in my dictionary.Hatter: Too bad for your dictionary. The expression was introduced around

the middle of the last century by the mathematician Paul Halmos,as a handy shorthand for ‘if and only if’, and soon became standardamong mathematicians.

Alice: OK, but aren’t we doing some logic here? I see words like ‘if’, ‘only if’, ‘every’, ‘not’ and perhaps more. Shouldn’t we begin by explaining what they mean?

Hatter: Life will be easier if for the moment we just use these particles intuitively. We will get back to their exact logical analysis later.

Exercise 1.2.1 (with solution) Which of the following sets are included in which? Use the notation above and express yourself as succinctly and clearly as you can. Recall that a prime number is a positive integer greater than 1 that is not divisible by any positive integer other than itself and 1.
A: the set of all positive integers less than 10
B: the set of all prime numbers less than 11
C: the set of all odd numbers greater than 1 and less than 6
D: the set whose only elements are 1 and 2
E: the set whose only element is 1
F: the set of all prime numbers less than 8

Solution Each of these sets is included in itself, and each of them is included in A. In addition, we have C ⊆ B, E ⊆ D, F ⊆ B, B ⊆ F.

Comments Note that none of the other converses hold. For example, we do not have B ⊆ C, since 7 ∈ B but 7 ∉ C. Note also that we do not have E ⊆ B since we are not taking 1 to be a prime number.

1.2.2 Identity and Proper Inclusion

The notion of inclusion leads us to the concept of identity (alias equality) between sets. Clearly, if both A ⊆ B and B ⊆ A, then A and B have exactly the same elements – by the definition of inclusion, every element of each is an element of the other; in other words, there is no element of one that is not an element of the other. A basic principle of set theory, called the axiom (or postulate) of extensionality, says something more: When both A ⊆ B and B ⊆ A, then the sets A, B are in fact identical. They are one and the same set, and we write A = B.
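
Extensionality is mirrored by equality of Python sets: two sets compare equal exactly when they have the same elements, no matter how they were built. A small sketch, with the sets invented for the example:

    B = {2, 3, 5, 7}                                 # the primes less than 11
    F = {n for n in range(2, 8)                      # the primes less than 8
         if all(n % d != 0 for d in range(2, n))}
    print(B == F)                # True: same elements, hence the same set
    print(B <= F and F <= B)     # mutual inclusion, which is what equality amounts to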


On the other hand, when A ⊆ B but A ≠ B, then we say that A is properly included in B, and write A ⊂ B. Sometimes ⊂ is written with a small ≠ underneath. That should not cause any confusion, but another notational dialect is more dangerous: a few older texts use A ⊂ B for plain inclusion. Be wary when you read.

Exercise 1.2.2 (1) (with solution) In Exercise 1.2.1, which of the sets are identical to which?

Solution B = F (so that also F = B). And of course A = A, B = B, etc. since each of the listed sets is identical to itself.

Comment The fact that we defined B and F in different ways makes no difference: the two sets have exactly the same elements, and so by the axiom of extensionality, they are identical.

Exercise 1.2.2 (2) (with solution) In Exercise 1.2.1, which of the sets are properly included in which? In each case give a ‘witness’ to the proper nature of the inclusion, that is, identify an element of the right one that is not an element of the left one.

Solution C ⊂ B, witnesses 2, 7; E ⊂ D, sole witness 2.

Comment F is not properly included in B, since B and F have exactly the same elements.

Exercise 1.2.2 (3) (with solution) Which of the following claimed identities are correct? Read the curly braces as framing the elements of the set so that, for example, {1, 2, 3} is the set whose elements are just 1, 2, 3. Be careful with the answers.
(a) {1, 2, 3} = {3, 2, 1}, (b) {9, 5} = {9, 5, 9}, (c) {0, 2, 8} = {|√4|, 0/5, 2³}, (d) {7} = 7, (e) {8} = {{8}}, (f) {London, Beijing} = {Londres, Peking}, (g) {+} = {‘+’}.

Solution (a) yes, (b) yes, (c) yes, (d) no, (e) no, (f) yes, (g) no.

Comments
(a) The order of enumeration makes no difference – the sets still have the same elements.
(b) Repeating an element in the enumeration is inelegant, but it makes no substantive difference – the sets still have the same elements.
(c) The elements have been named differently, as well as being written in a different order, but they are the same.
(d) 7 is a number, not a set, while {7} is a set with the number 7 as its only element, that is, its singleton.
(e) Both are sets, and both have just one element, but these elements are not the same. The unique element of the left set is the number 8, while the unique element of the right set is the set {8}. The right set is thus the singleton of the left one.


(f) London is the same city as Londres, although it is named in different languages (English, French). So the sets have exactly the same elements and thus are identical.

(g) The left set has the operation as its sole element, while, under a standard convention for naming by inverted commas, the right set has the name of that operation as its sole element, so the sets are not identical. This distinction may seem quite pedantic, but it turns out to be necessary to avoid confusion when dealing with sets of symbols, as is often the case in computer science, logic and linguistics.

Exercise 1.2.2 (3) (with partial solution) True or false? In each case, use your intuition to make a guess and establish it by either proving the point from the definitions (if you guessed positively) or giving a simple counterexample (if you guessed negatively). Make sure that you don’t confuse ⊆ with ⊂.
(a) Whenever A ⊆ B and B ⊆ C, then A ⊆ C.
(b) Whenever A ⊆ B and C ⊆ B, then A ⊆ C.
(c) Whenever A1 ⊆ A2 ⊆ ... ⊆ An and also An ⊆ A1, then Ai = Aj for all i, j ≤ n.
(d) A ⊂ B iff A ⊆ B and not B ⊆ A.
(e) A ⊆ B iff A ⊂ B or A = B.
(f) A = B iff neither A ⊂ B nor B ⊂ A.
(g) Whenever A ⊆ B and B ⊂ C, then A ⊂ C.
(h) Whenever A ⊂ B and B ⊆ C, then A ⊂ C.

Solutions to (a), (b), (f), (h)
(a) True. Take any sets A, B, C. Suppose A ⊆ B and B ⊆ C; it suffices to show A ⊆ C. Take any x, and suppose x ∈ A; by the definition of inclusion, it is enough to show x ∈ C. But since x ∈ A and A ⊆ B, we have by the definition of inclusion that x ∈ B. So since also B ⊆ C, we have again by the definition of inclusion that x ∈ C, as desired.
(b) False. Counterexample: A = {1}, B = {1,2}, C = {2}.
(f) False. The left to right implication is correct, but the right to left one is false, so the entire co-implication (the ‘iff’) is also false. Counterexample: A = {1}, B = {2}.
(h) True. Take any sets A, B, C. Suppose A ⊂ B and B ⊆ C. From the former, by the definition of proper inclusion, we have A ⊆ B. So by exercise (a), we have A ⊆ C. It remains to show that A ≠ C. Since A ⊂ B, we have by exercise (d) that not B ⊆ A, so by the definition of inclusion, there is an x with x ∈ B but x ∉ A. Thus, since B ⊆ C, we have x ∈ C while x ∉ A, so C is not included in A, and thus A ≠ C, as desired.

Comment The only false ones are (b) and (f). All the others are true.


Logic Box: Proving Conditional Statements

In the preceding exercise, we needed to show several statements of the form ‘if this then that’, and we followed the most straightforward way of doing so: we supposed that the first is true, and on that basis showed that the second is true. This is known as conditional proof.

We carried it out, for example, in the verification of part (a) of the exercise. Actually, we did it there twice: First, we supposed that A ⊆ B and B ⊆ C, and set our goal as showing A ⊆ C. That means that whenever x ∈ A, then x ∈ C, so we then supposed that x ∈ A and aimed to get x ∈ C. There are other lines of attack for establishing ‘if ... then ...’ statements, but we will come to them later.

Alice Box: Whenever

Alice: Aren’t we missing something there? For (a), we were required to show that whenever A ⊆ B and B ⊆ C, then A ⊆ C. To me, that says: for all sets A, B, C, if A ⊆ B and B ⊆ C, then A ⊆ C. You have explained how we tackle the ‘iffy’ part, but say nothing about the ‘every’ part. There must be something underneath that too.

Hatter: One thing at a time! That’s also important, and we will get to it in due course.

Alice: It’s a promise?
Hatter: It’s a promise!

Exercise 1.2.2 (3) is the first in which you are asked to show something, but it is certainly not the last. As well as the logic of such proofs, there are some general points of common sense that you need to bear in mind when trying to prove, or disprove, something.

First, always be clear in your mind (and on paper) what you are trying to show. If you don’t know what you are attempting to prove, it is unlikely that you will succeed in proving it, and if by chance you do, the proof will probably be a real mess.

Following this rule is not as easy as may appear, for as a proof develops, the goal changes! For example, looking again at part (a) of the preceding exercise, we began by trying to show (a) itself. After choosing A, B, C arbitrarily, our goal was the conditional statement: If A ⊆ B and B ⊆ C, then A ⊆ C. Then, after supposing both A ⊆ B and B ⊆ C, we switched our goal to A ⊆ C. We then chose an arbitrary x, supposed x ∈ A, and aimed to show x ∈ C. In half a dozen lines, four different goals! At each stage, we have to be aware of which one we are driving at. When we start using more sophisticated tools for building proofs, notably reductio ad absurdum, the goal-shifts become even more striking.


Fig. 1.1 Euler diagrams for proper inclusion and identity

A second rule of common sense when proving things: As far as possible, be aware of what you are allowed to use and don’t hesitate to use it. If you neglect available information, you may not have enough to do the job.

So what are you allowed to use? To begin with, you may use the definitions of terms in the problem (in our example, the notions of subset and proper subset). Too many students come to mathematics with the idea that a definition is just something for decoration – something you can hang on the wall like a picture or diploma. A definition is for use. Indeed, in a very simple verification, most of the steps can consist of ‘unpacking’ the definitions and then, after just a little reasoning, packing them together again. Next, you may use whatever basic axioms (also known as postulates) that you have been supplied with. In parts (e) and (f) of the last exercise, that was just the principle of extensionality, stated at the beginning of the section. Finally, you can use anything that you have already proven. For example, we did this while proving part (h) of the exercise.

Third point of common sense: Be flexible and ready to go into reverse. If you can’t prove that a statement is true, try looking for a counterexample in order to show that it is false. If you can’t find a counterexample, try to prove that it is true. With some experience, you can often use the failure of your attempted proofs as a guide to finding a suitable counterexample, and the failures of your trial counterexamples to give a clue for constructing a proof. This is all part of the art of proof and refutation.

1.2.3 Diagrams

If we think of a set A as represented by all the points in a circle (or any other closed plane figure), then we can represent the notion of one set A being a proper subset of another B by putting a circle labelled A inside a circle labelled B. We can diagram equality, of course, by drawing just one circle and labelling it both A and B. Thus, we have Euler diagrams (so named after the eighteenth-century mathematician Euler who used them when teaching a princess by correspondence) (Fig. 1.1).


Fig. 1.2 Venn diagrams for inclusion and proper inclusion

How can we diagram inclusion in general? Here we must be careful. There is no single Euler diagram that does the job. When A ⊆ B, then we may have either of the above two configurations: If A ⊂ B, then the left diagram of Fig. 1.1 is appropriate; if on the other hand A = B, then the right diagram is the correct one.

Diagrams are a very valuable aid to intuition, and it would be pedantic and unproductive to try to do without them. But we must also be clearly aware of their limitations. If you want to visualize A ⊆ B using Euler diagrams, and you don’t know whether the inclusion is proper, you will need to consider two diagrams and see what happens in each.

However, there is another kind of diagram that can represent plain inclusion without ambiguity. It is called a Venn diagram (after the nineteenth-century logician John Venn). It consists of drawing two circles, one for A and one for B, always intersecting no matter what the relationship between A and B, and then putting a mark (e.g. ∅) in an area to indicate that it has no elements, and another kind of mark (e.g. a cross) to indicate that it does have at least one element. With these conventions, the left diagram below represents A ⊆ B, while the right diagram represents A ⊂ B (Fig. 1.2).

Note that the disposition of the circles is always the same: what changes are the areas noted as empty or as non-empty. There is considerable variability in the signs used for this – dots, ticks, etc.

A great thing about Venn diagrams is that they can represent common relationships like A ⊆ B by a single diagram, rather than by an alternation of different diagrams. Another advantage is that with some additions they can be used to represent basic operations on sets as well as relations between them. However, they also have their shortcomings. The bad news is that when you have more than two sets to consider, Venn diagrams can become very complicated and lose their intuitive clarity – which was, after all, their principal raison d’être.

Warning: Do not confuse Euler diagrams with Venn diagrams. They are constructed differently and read differently. Unfortunately, textbooks themselves are sometimes rather sloppy about this, using the terms interchangeably or even in reverse.


Exercise 1.2.3 (1) (with hints)
(a) Suppose we want to diagram the double proper inclusion A ⊂ B ⊂ C. We can, of course, do it with two separate Euler (or Venn) diagrams. Do it with a single diagram of each kind, with three circles.
(b) Suppose we want to represent the two proper inclusions A ⊂ B, A ⊂ C in a single diagram with three circles. How many different Euler diagrams are compatible with this configuration? What kind of difficulty do you get into with placing crosses in the Venn diagram, and how might you resolve it?

Hints: For (a), Euler is straightforward. For Venn, put ∅ in the smallest areas that must be empty, then place crosses in the smallest areas that must be non-empty. For (b), when constructing Euler diagrams, think of the various possible relations between B and C, and when building the Venn diagram, try to proceed as you did for (a).

1.2.4 Ways of Defining a Set

The examples we have so far looked at already illustrate two important ways of defining or identifying a set. One way is to enumerate all its elements individually between curly brackets, as we did in Exercise 1.2.2 (3). Evidently, such an enumeration can be completed only when there are finitely many elements, and is convenient only when the set is fairly small. The order of enumeration makes no difference to what items are elements of the set, for example, {1,2,3} = {3,1,2}, but we usually write elements in some conventional order such as increasing size for smooth communication.

Another way of identifying a set is by providing a common property: The elements of the set are understood to be all (and only) the items that have that property. That is what we did for several of the sets in Exercise 1.2.1. There is a notation for this. For example, we write the first set as follows: A = {x ∈ N⁺: x < 10}, where N⁺ stands for the set of all integers greater than zero (the positive integers). Some authors use a vertical bar in place of a colon.
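
Informally, set-builder notation of this kind has a close analogue in Python’s set comprehensions. A sketch, cutting N⁺ down to a finite range since a computer cannot enumerate all positive integers:

    # {x ∈ N⁺ : x < 10}, with N⁺ restricted to a finite range for the demonstration.
    A = {x for x in range(1, 100) if x < 10}
    print(sorted(A))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]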

Exercise 1.2.4 (1) (with solution)
(a) Identify the sets A, B, C, F of Exercise 1.2.1 by enumeration.
(b) Identify the sets D, E of the same exercise by properties, using the notation introduced.

Solution
(a) A = {1,2,3,4,5,6,7,8,9}; B = {2,3,5,7}; C = {3,5}; F = {2,3,5,7}.
(b) There are many ways of doing this; here are some. D = {x ∈ N⁺: x divides all even integers}; E = {x ∈ N⁺: x is less than or equal to every positive integer}.

When a set is infinite, we often use an incomplete ‘suspension points’ notation. Thus, we might write the set of all even positive integers and the set of all primes, respectively, as follows: {2,4,6, ...}, {2,3,5,7,11, ...}. But it should be emphasized that this is an informal way of writing, used when it is well understood between writer and reader what particular continuation is intended. Clearly, there are many ways of continuing each of these partial enumerations. We normally intend that the most familiar or simplest is the one that is meant. Without a specific understanding of this kind, the notation is meaningless.

These two methods of identifying a set – by enumeration and by a common property – are not the only ones. In a later chapter, we will be looking at another very important one, known as recursive definition, where one specifies certain initial elements of the set, and a rule for generating new elements out of old ones. It can be thought of as a mathematically precise way of expressing the idea hinted at by suspension points.

1.3 The Empty Set

Just as zero is a very important natural number, the empty set is basic to set theory. Just as mathematics took a very long time to come up with a clear conceptualization and standard notation for zero, so students can have some initial difficulties with emptiness. By the end of this section, such difficulties should be over.

1.3.1 Emptiness

What do the following two sets have in common?
A = {x ∈ N⁺: x is both even and odd}
B = {x ∈ N⁺: x is prime and 24 ≤ x ≤ 28}

Answer: Neither of them has any elements. From this it follows that they have exactly the same elements – neither has any element that is not in the other. So by the principle of extensionality given in Sect. 1.2.2, they are identical, that is, they are the same set. Thus, A = B, even though they are described differently. This leads to the following definition. The empty set is defined to be the (unique) set that has no elements at all.

This set is written ∅. We already used this notation informally in Venn diagrams when indicating that a certain region of the diagram has no elements. The fact that it has no elements does not mean that it has any less ‘existence’ than other sets, any more than zero has less existence than the positive numbers.

Exercise 1.3.1 (1) (with solution) Show that ∅ ⊆ A for every set A.

Solution We need to show that for all x, if x ∈ ∅, then x ∈ A. In other words, there is no x with x ∈ ∅ but x ∉ A. But by the definition of ∅, there is no x with x ∈ ∅, so we are done.

Page 34: UndergraduateTopicsinComputerScienceperso.ens-lyon.fr/.../Sets_Logic_and_Maths_for_Computing-Makinson… · the basic concepts of logic requires some familiarity with the mathematical

1.4 Boolean Operations on Sets 11

Alice Box: If ... then ...

Alice: That’s a short proof, but a strange one. You say ‘in other words’, but are the two formulations really equivalent?

Hatter: Indeed they are. This is because of the way in which we understand ‘if ... then ...’ statements in mathematics. We could explain that in detail now, but it is probably better to come back to it a bit later.

Alice: Another promise?
Hatter: Indeed!

1.3.2 Disjoint Sets

We say that sets A, B are disjoint (aka mutually exclusive) iff they have no elements in common. That is, iff there is no x such that both x ∈ A and x ∈ B. When they are not disjoint, that is, when they have at least one element in common, we can say that they overlap.

More generally, when A1, ..., An are sets, we say that they are pairwise disjoint iff for any i, j ≤ n, if i ≠ j, then Ai has no elements in common with Aj.
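
As a hedged aside, Python tests disjointness of two sets with ‘isdisjoint’, and a pairwise check over a list of sets can be written by comparing each pair once; the helper name and the example sets below are invented for the sketch.

    from itertools import combinations

    def pairwise_disjoint(sets):
        # True iff no two distinct sets in the list share an element.
        return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

    print(pairwise_disjoint([{1, 2}, {3, 4}, {5}]))   # True
    print(pairwise_disjoint([{1, 2}, {2, 3}, {5}]))   # False: 2 is shared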

Exercise 1.3.2 (1)
(a) Of the sets in Exercise 1.2.1, which are disjoint from which?
(b) Draw Euler and Venn diagrams to express the disjointness of two sets.
(c) Draw a Venn diagram to express the overlapping of two sets.
(d) How many Euler diagrams would be needed to express overlapping?
(e) Give an example of three sets X, Y, Z such that X is disjoint from Y, and Y is disjoint from Z, but X is not disjoint from Z.
(f) Show that the empty set is disjoint from every set, including itself.

1.4 Boolean Operations on Sets

We now define some operations on sets, that is, ways of constructing new sets out of old. We begin with three basic ones: intersection, union and relative complement, and then several others that can be defined in terms of them.

1.4.1 Intersection

If A and B are sets, we define their intersection A ∩ B, also known as their meet, by the following rule: for all x,

x ∈ A ∩ B iff x ∈ A and x ∈ B.
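
As a quick informal illustration (not part of the formal development), Python’s ‘&’ operator computes exactly this intersection; the sets are invented for the example.

    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    print(A & B)                   # {3, 4}: the elements in both A and B
    print(A & B == B & A)          # True: commutation
    print(A & set() == set())      # True: intersecting with the empty set gives the empty set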


Exercise 1.4.1 (1) (with partial solution) Show the following:
(a) A ∩ B ⊆ A and A ∩ B ⊆ B.
(b) Whenever X ⊆ A and X ⊆ B, then X ⊆ A ∩ B.
(c) A ∩ B = B ∩ A (commutation).
(d) A ∩ (B ∩ C) = (A ∩ B) ∩ C (association).
(e) A ∩ A = A (idempotence).
(f) A ∩ ∅ = ∅ (bottom).
(g) Reformulate the definition of disjoint sets using intersection.

Solutions to (b), (f), (g) For (b): Suppose X ⊆ A and X ⊆ B; we want to show X ⊆ A ∩ B. Take any x and suppose x ∈ X; we need to show that x ∈ A ∩ B. Since x ∈ X and X ⊆ A, we have by the definition of inclusion that x ∈ A, and similarly, since x ∈ X and X ⊆ B, we have x ∈ B. So by the definition of intersection, x ∈ A ∩ B, as desired.

For (f): We already have A ∩ ∅ ⊆ ∅ by (a) above. And we also have ∅ ⊆ A ∩ ∅ by Exercise 1.3.1 (1), so we are done.

For (g): Sets A, B are disjoint iff A ∩ B = ∅.

Remark Property (a) may be expressed in words by saying that A ∩ B is a lower bound for A, B. Property (b) tells us that it is a greatest lower bound for A, B.

Intersection has been defined using the word ‘and’. But what does this mean? In mathematics, it is very simple – much simpler than in ordinary life. Consider any two statements α, β. Each can be true, or false, but not both. When is the statement ‘α and β’, called the conjunction of the two parts, true? The answer is intuitively clear: When each of α, β considered separately is true, the conjunction is true, but in all other cases, the conjunction is false. What are the other cases? There are three of them: α true with β false, α false with β true, α false with β false. All this can be put in the form of a table, called the truth table for conjunction.

Logic Box: Conjunction

α   β   α∧β
1   1    1
1   0    0
0   1    0
0   0    0

In this table, each row represents a possible combination of truth-values of the parts α, β. For brevity, we write 1 for ‘true’ and 0 for ‘false’. The rightmost entry in the row gives us the resulting truth-value of the conjunction ‘α and β’, which we write as α∧β. Clearly, the truth-value of the conjunction is fully determined by each combination of truth-values of the parts. For this reason, conjunction is called a truth-functional logical connective.


In a later chapter, we will study the properties of conjunction and other truth-functional connectives. As you may already have guessed, the behaviour of intersection (as an operation on sets) reflects that of conjunction (as a connective between statements). This is because the latter is used in the definition of the former. For example, the commutativity of intersection comes from the fact that ‘α and β’ has exactly the same truth-conditions as ‘β and α’.

For reflection: How do you square the truth table for conjunction with the difference in meaning between ‘they got married and had a baby’ and ‘they had a baby and got married’?

1.4.2 Union

Alongside intersection we have another operation called union. The two operations are known as duals of each other, in the sense that each is like the other ‘upside down’. For any sets A and B, we define their union A ∪ B by the following rule. For all x:

x ∈ A ∪ B iff x ∈ A or x ∈ B,

where this is understood in the sense:

x ∈ A ∪ B iff x ∈ A or x ∈ B (or both),

in other words:

x ∈ A ∪ B iff x is an element of at least one of A, B.
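
Again as an informal aside, Python’s ‘|’ operator behaves exactly like the union just defined; the sets are invented for the example.

    A = {1, 2, 3}
    B = {3, 4}
    print(A | B)               # {1, 2, 3, 4}: elements in at least one of A, B
    print(A | B == B | A)      # True: commutation
    print(A | set() == A)      # True: union with the empty set gives A back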

The contrast with intersection may be illustrated by Venn diagrams, but they differ a little from those used earlier. Now that we are representing operations rather than statements, we no longer pepper the diagram with ∅ and crosses to say that regions are empty or not. Instead, we shade regions to indicate that they are the ones given by the operation. Thus, in Fig. 1.3, two intersecting circles continue to represent the sets A, B, and the shaded area in the left image represents A ∪ B, while in the right one, it represents A ∩ B.

The properties of union are just like those of intersection but ‘upside down’. Evidently, this is a rather vague way of speaking; it can be made precise, but it is better to leave the idea on an intuitive level for the moment.

Fig. 1.3 Venn diagrams for union and intersection

Exercise 1.4.2 (1) Show the following:
(a) A ⊆ A ∪ B and B ⊆ A ∪ B (upper bound)
(b) Whenever A ⊆ X and B ⊆ X, then A ∪ B ⊆ X (least upper bound)
(c) A ∪ B = B ∪ A (commutation)
(d) A ∪ (B ∪ C) = (A ∪ B) ∪ C (association)
(e) A ∪ A = A (idempotence)
(f) A ∪ ∅ = A (bottom)

Exercise 1.4.2 (2) What would you naturally call principles (a) and (b) given the names of their counterparts for intersection?

When ‘or’ is understood in the sense used in the definition of union, it is known as inclusive disjunction, or briefly just disjunction. Statements ‘α or β’ are written as α∨β. Whereas there is just one way (out of four) of making a conjunction true, there is just one way of making a disjunction false. The truth table is as follows:

Logic Box: Disjunction

α   β   α∨β
1   1    1
1   0    1
0   1    1
0   0    0

Clearly, the truth-value of the disjunction is fully determined by each combination of truth-values of the parts. In other words, it is also a truth-functional logical connective. In a later chapter, we will see how the behaviour of union between sets reflects that of disjunction as a connective between statements.

Exercise 1.4.2 (3) In ordinary discourse, we often use ‘α or β’ to mean ‘either α or β, but not both’, in other words, ‘exactly one of α, β is true’. This is called exclusive disjunction. What would its truth table look like?

We have seen some of the basic properties of intersection and of union, taken separately. But how do they relate to each other? The following exercise covers the most important interactions.


Exercise 1.4.2 (4) (with partial solution) Show the following:
(a) A ∩ B ⊆ A ∪ B
(b) A ∩ (A ∪ B) = A = A ∪ (A ∩ B) (absorption)
(c) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) (distribution of intersection over union)
(d) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) (distribution of union over intersection)
(e) A ⊆ B iff A ∪ B = B, and likewise iff A ∩ B = A

Solution to (c) Writing LHS, RHS for the left and right hand sides, respectively, we need to show that LHS ⊆ RHS and conversely RHS ⊆ LHS.

For LHS ⊆ RHS, suppose that x ∈ LHS. Then, x ∈ A and x ∈ B ∪ C. From the latter, we have that either x ∈ B or x ∈ C. Consider the two cases separately. Suppose first that x ∈ B. Since also x ∈ A, we have x ∈ A ∩ B, and so by Exercise 1.4.2 (1) part (a), we get x ∈ (A ∩ B) ∪ (A ∩ C) = RHS, as desired. Suppose second that x ∈ C. Since also x ∈ A, we have x ∈ A ∩ C and so again x ∈ (A ∩ B) ∪ (A ∩ C) = RHS, as desired.

For RHS ⊆ LHS, suppose that x ∈ RHS. Then, x ∈ A ∩ B or x ∈ A ∩ C. Consider the two cases separately. Suppose first that x ∈ A ∩ B. Then, x ∈ A and x ∈ B; from the latter, x ∈ B ∪ C, and so combined with the former, x ∈ A ∩ (B ∪ C) = LHS, as desired. Suppose second that x ∈ A ∩ C. The argument is similar: x ∈ A and x ∈ C; from the latter, x ∈ B ∪ C, and so with the former, x ∈ A ∩ (B ∪ C) = LHS, as desired.
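
A proof is a proof, and no amount of testing replaces the argument above; still, spot-checking an identity such as distribution on concrete sets can be a useful sanity check. A sketch with invented sets:

    A = {1, 2, 3, 4}
    B = {2, 4, 6}
    C = {3, 4, 5}
    lhs = A & (B | C)            # A ∩ (B ∪ C)
    rhs = (A & B) | (A & C)      # (A ∩ B) ∪ (A ∩ C)
    print(lhs == rhs)            # True on this instance, as the proof guarantees in general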

Logic Box: Breaking into Cases

In the solution above, we used a technique known as proof by cases, or disjunctive proof. Suppose we know that either α is true or β is true, but we don’t know which. It can be difficult to proceed with this rather weak information. So we break the argument into two parts. First, we suppose that α is true (one case), and with this stronger assumption, we head for whatever it was that we were trying to establish. Then, we suppose instead that β is true (the other case) and argue using this assumption to the same conclusion. If we succeed in reaching the desired conclusion in each case separately, then we know that it must hold irrespective of which case is true.

Unlike conditional proof, the goal remains unchanged through the proof. But the two cases must be treated quite separately: We cannot use the supposition for one case in the argument for the other case. The arguments carried out in the two separate cases may sometimes resemble each other closely (as in our exercise). But in more challenging problems, they may be very different.

Exercise 1.4.2 (5) For an example of proof by cases in arithmetic, show that every positive integer n has the same parity as n², that is, if n is even or odd then so is, respectively, n². The cases to choose suggest themselves naturally.


Alice Box: Overlapping Cases

Alice: What if both cases are true? For example, in the solution to the exercise, in the part for LHS ⊆ RHS: what if both x ∈ B and x ∈ C?

Hatter: No problem! This just means that we have covered that situation twice. For proof by cases to work, it is not required that the two cases be exclusive. In some examples (as in our exercise), it is easier to work with overlapping cases; sometimes, it is more elegant and economical to work with cases that exclude each other.

1.4.3 Difference and Complement

There is one more Boolean operation on sets that we wish to consider: difference. Let A, B be any sets. We define the difference of B in A, written A\B (alternatively as A − B), to be the set of all elements of A that are not elements of B. That is, A\B = {x: x ∈ A but x ∉ B}.

Exercise 1.4.3 (1) (with partial solution)
(a) Draw a Venn diagram for difference.
(b) Give an example to show that we can have A\B ≠ B\A.
(c) Show (i) A\A = ∅, (ii) A\∅ = A.
(d) Show that (i) if A ⊆ A′, then A\B ⊆ A′\B, and (ii) if B ⊆ B′, then A\B′ ⊆ A\B.
(e) Show that (i) A\(B ∪ C) = (A\B) ∩ (A\C), (ii) A\(B ∩ C) = (A\B) ∪ (A\C).
(f) Show that (A\B)\C = (A\C)\B.
(g) Find a counterexample to A\(B\C) = (A\B)\C.
(h) Show that A\(B\C) ⊆ (A\B) ∪ C.

Sample solution to (g) Take A to be any non-empty set, for example, {1}, and put C = B = A. Then, LHS = A\(A\A) = A\∅ = A, while RHS = (A\A)\A = ∅\A = ∅.

The notion of difference acquires particular importance in a special context. Suppose that we are carrying out an extended investigation into the elements of some fairly large set, such as the set N⁺ of all positive integers, and that for the purposes of the investigation, the only sets that we need to consider are the subsets of this fixed set. Then, it is customary to refer to the large set as a local universe, writing it as U, and consider the differences U\B for subsets B ⊆ U. As the set U is fixed throughout the investigation, we then simplify notation and write U\B alias U − B as −U B, or even simply as −B with U left as understood. This application of the difference operation is called complementation (within the given universe).

Many other notations are also used in the literature for this important operation, for example, B̄, B′, Bᶜ (where the index stands for ‘complement’). To give it a Venn diagram, we simply take the diagram for relative complement A\B and blow the A circle up to coincide with the whole box containing the diagram, so as to serve as the local universe.


Exercise 1.4.3 (2) (with partial solution)
(a) Taking the case that A is a local universe U, rewrite equations (e) (i) and (ii) of the preceding exercise using the simple complementation notation described above.
(b) Show that (i) −(−B) = B, (ii) −U = ∅, (iii) −∅ = U.

Solutions to (a) and (b)(i)
(a) When A = U, then equation (i) becomes U\(B ∪ C) = (U\B) ∩ (U\C), which we can write as −(B ∪ C) = −B ∩ −C; while equation (ii) becomes U\(B ∩ C) = (U\B) ∪ (U\C), which we can write as −(B ∩ C) = −B ∪ −C.
(b) (i) We need to show that −(−B) = B, in other words that U\(U\B) = B whenever B ⊆ U (as assumed when U is taken to be a local universe). We show the two inclusions separately. First, to show U\(U\B) ⊆ B, suppose x ∈ LHS. Then, x ∈ U and x ∉ (U\B). From the latter, either x ∉ U or x ∈ B, so x ∈ B = RHS, as desired. For the converse, suppose x ∈ B. Then, x ∉ (U\B). But by assumption B ⊆ U so x ∈ U, and thus x ∈ U\(U\B) = LHS, as desired.

The identities −(B ∩ C) = −B ∪ −C and −(B ∪ C) = −B ∩ −C are known as de Morgan’s laws, after the nineteenth-century mathematician who drew attention to them. The identity −(−B) = B is known as double complementation. Note how its proof made essential use of the hypothesis that B ⊆ U.
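
The de Morgan laws can be spot-checked informally with Python sets by taking some sufficiently large finite set as the local universe, so that complements are computed as differences. The universe and sets below are invented for the example.

    U = set(range(1, 11))    # a local universe
    B = {1, 2, 3}
    C = {3, 4, 5}

    def comp(X):
        # Complement of X relative to the local universe U.
        return U - X

    print(comp(B & C) == comp(B) | comp(C))   # True: −(B ∩ C) = −B ∪ −C
    print(comp(B | C) == comp(B) & comp(C))   # True: −(B ∪ C) = −B ∩ −C
    print(comp(comp(B)) == B)                 # True: double complementation, since B ⊆ U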

Exercise 1.4.3 (3) Show that whenever B is not a subset of U, then U\(U\B) ≠ B.

Logic Box: Negation

You will have noticed that in our discussion of difference and complementation, there were a lot of nots. In other words, we made free use of the logical connective of negation in our reasoning. What is its logic? Like conjunction and disjunction, it has a truth table, which is a simple flip-flop.

α   ¬α
1    0
0    1

The properties of difference and complementation stem, in effect, from the behaviour of negation used in defining them.

Alice Box: Relative Versus Absolute Complementation

Alice: There is something about this that I don’t quite understand. As you define it, the complement −B of a set B is always taken with respect to a given local universe U; it is U − B. But why not define it in absolute terms? Simply put the absolute complement of B to be the set of all x that are not in B. In other words, take your U to be the set of everything whatsoever.

Hatter: A natural idea indeed, and that is more or less how things were understood in the early days of set theory. Unfortunately, it leads to unsuspected difficulties, indeed to contradiction, as was notoriously shown by Bertrand Russell at the beginning of the twentieth century.

Alice: What then?
Hatter: To avoid such contradictions, the standard approach as we know it today, called Zermelo-Fraenkel set theory (ZF for short), does not admit the existence of a universal set, that is, one containing as elements everything whatsoever. Nor a set of all sets. Nor does it admit the existence of the absolute complement of any set, containing as elements all those things that are not elements of a given set. For if it did, by union, it would also have to admit the universal set.

We should perhaps say a bit more about this than did the Hatter. In fact, there are some versions of set theory that do admit absolute complementation and the universal set. The best known, devised by the logician Quine, is called NF (short for ‘New Foundations’). But to avoid contradiction, such systems must lose power in certain other respects, notably failing a ZF axiom scheme of ‘separation’. This makes them rather annoying to use in practice, and they are not favoured by working mathematicians or computer scientists.

However, for the purposes of this book, none of that matters. For we are concerned with sets of items that need not themselves be regarded as sets, together with collections of those sets. In other words, all of our sets are of level 1 or 2. This may be continued to higher integer levels, which may in turn be made cumulative and even extended to levels represented by transfinite ordinal numbers (sorry, their explanation is beyond this text). It is only when one goes beyond all that, to sets without a specific ordinal level, that the problems of the universal set arise.

Thus, the absence of a universal set and of an operation of absolute complementation is not really troublesome for our work here. Whenever you feel that you need to have some elbow room, just look for a set that is sufficiently large to contain all the items that you are currently working on, and use it as your local universe for relative complementation.

1.5 Generalized Union and Intersection

It is time to go a little beyond the cosy world of ‘flat’ set theory, sets of level 1, and look at some constructions whose elements are themselves such sets, and thus of level 2. We begin with the operations of generalized union and intersection.

We know that when A1, A2 are sets, then we can form their union A1 ∪ A2, whose elements are just those items that are in at least one of A1, A2. Evidently, we can repeat the operation, taking the union of that with another set A3, giving us (A1 ∪ A2) ∪ A3. We know from an exercise that this is independent of the order of assembly, that is, (A1 ∪ A2) ∪ A3 = A1 ∪ (A2 ∪ A3). Thus, its elements are just those items that are elements of at least one of the three, and we might as well write it without brackets.

Clearly, we can do this any finite number of times, and so it is natural to consider doing it infinitely many times. In other words, if we have sets A1, A2, A3, ... we would like to consider a set A1 ∪ A2 ∪ A3 ∪ ... whose elements are just those items in at least one of the Ai for i ∈ N⁺. To make the notation more explicit, we write this set as ∪{Ai: i ∈ N⁺} or more compactly as ∪{Ai}i∈N⁺ or as ∪i∈N⁺{Ai}.

Quite generally, if we have a collection {Ai: i ∈ I} of sets Ai, one for each element i of a fixed set I, we may consider the following two sets:
• ∪i∈I{Ai}, whose elements are just those things that are elements of at least one of the Ai for i ∈ I. It is called the union of the sets Ai for i ∈ I.
• ∩i∈I{Ai}, whose elements are just those things that are elements of all of the Ai for i ∈ I. It is called the meet (or intersection) of the sets Ai for i ∈ I.
When there is any possibility of confusing the signs for generalized union and intersection with those for the two-place versions, they are customarily written larger, as in the exercise below. They are natural generalizations, and their properties are similar. For example, we have de Morgan and distribution principles. These are the subject of the next exercise.
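
For finitely many sets, the generalized operations can be computed by folding the two-place ones, which Python exposes as the set methods ‘union’ and ‘intersection’ taking several arguments. A sketch with an invented indexed collection:

    # An indexed collection {A_i : i ∈ I} with I = {0, 1, 2}.
    A = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

    big_union = set().union(*A)          # elements in at least one of the A_i
    big_meet = set.intersection(*A)      # elements in all of the A_i

    print(big_union)   # {1, 2, 3, 4, 5}
    print(big_meet)    # {3}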

Alice Box: Sets, Collections, Families, Classes

Alice: One moment! Why do you refer to {Ai: i ∈ I} as a collection while its elements Ai, and also its union ∪i∈I{Ai} and its intersection ∩i∈I{Ai}, are called sets?
Hatter: The difference of words does not mark a difference of content. It is merely to make reading easier. The human mind has difficulty in processing phrases like ‘set of sets’, and even more difficulty with ‘set of sets of sets ...’, and the use of a word like ‘collection’ or ‘class’ helps keep us on track.
Alice: I think I have also seen the word ‘family’.
Hatter: Here we should be a little careful. Sometimes the term ‘family’ is used rather loosely to refer to a set of sets. But strictly speaking, it is something different, a certain kind of function. So better not to use that term until it is explained in Chap. 3.

Exercise 1.5 (1) (with partial solution) From the definitions of general union and intersection, prove the following distribution and de Morgan principles. In the last two, complementation is understood to be relative to an arbitrary sufficiently large universe:
(a) A ∩ ∪i∈I{Bi} = ∪i∈I{A ∩ Bi} (distribution of intersection over general union)
(b) A ∪ ∩i∈I{Bi} = ∩i∈I{A ∪ Bi} (distribution of union over general intersection)
(c) −∪i∈I{Ai} = ∩i∈I{−Ai} (de Morgan)
(d) −∩i∈I{Ai} = ∪i∈I{−Ai} (de Morgan)

Solution to LHS ⊆ RHS part of (a) This is a simple ‘unpack, rearrange, repack’ verification. Suppose x ∈ LHS. Then, x ∈ A and x ∈ ∪i∈I{Bi}. From the latter, by the definition of general union, we know that x ∈ Bi for some i ∈ I. So for this i ∈ I, we have x ∈ A ∩ Bi, and thus again by the definition of general union, x ∈ RHS, as desired.

1.6 Power Sets

Our next construction is a little more challenging. Let A be any set. We may form a new set, called the power set of A, written as P(A) or 2ᴬ, consisting of all (and only) the subsets of A. In other words, P(A) = {B: B ⊆ A}. This may seem like a rather exotic construction, but we will need it as early as Chap. 2 when working with relations.

Exercise 1.6 (1) (with solution) Let A = {1,2,3}. List all the elements of P(A). Using the list, define P(A) itself by enumeration. How many elements does P(A) have?

Solution The elements of P(A) are (beginning with the smallest and working our way up): ∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}. Thus, P(A) = {∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}}. Counting, we see that P(A) has 8 elements.

Warning: Do not forget the two ‘extreme’ elements of P(A): the smallest subset of A, namely, ∅, and the largest one, namely, A itself. Also be careful with the curly brackets. Thus, 1, 2, 3 are not elements of P(A), but their singletons {1}, {2}, {3} are. When defining P(A) by enumeration, don’t forget the outer brackets enclosing the entire list of elements. These may seem like pedantic points of punctuation, but if they are missed, then you can easily get into a dreadful mess.

All of the elements of P(A) are subsets of A, but some of them are also subsets of others. For example, the empty set is a subset of all of them. This may be brought out clearly by the following Hasse diagram, called after the mathematician who introduced it (Fig. 1.4).

Exercise 1.6 (2) (with partial solution) Draw Hasse diagrams for the power sets of each of ∅, {1}, {1,2}, {1,2,3,4}. How many elements does each of these power sets have?

Partial solution They have 1, 2, 4 and 16 elements, respectively.


Fig. 1.4 Hasse diagram for P(A) when A = {1,2,3}

There is a pattern here. Quite generally, if A is a finite set with n elements, its power set P(A) has 2ⁿ elements. Here is a rough but simple proof. Let a1, ..., an be the elements of A. Consider any subset B ⊆ A. For each ai, there are two possibilities: either ai ∈ B or ai ∉ B. That gives us 2 × 2 × ... × 2 (n times) = 2ⁿ independent choices to determine which among a1, ..., an are elements of B, thus 2ⁿ possible identities for B.
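
The 2ⁿ count can be seen concretely by generating the power set of a small finite set. One common way in Python uses itertools.combinations; the function name below is invented for the sketch, and frozensets are used so that the subsets can themselves be collected into a set.

    from itertools import combinations

    def power_set(A):
        # All subsets of A, each packaged as a frozenset.
        elems = list(A)
        return {frozenset(c) for r in range(len(elems) + 1)
                             for c in combinations(elems, r)}

    P = power_set({1, 2, 3})
    print(len(P))              # 8, i.e. 2**3
    print(frozenset() in P)    # True: the empty set is always one of the subsets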

This fact is very important for computing. Suppose that we have a way of measuring the cost of a computation in terms of, say, time of calculation as a function of, say, the number of input items. It can happen that this measure increases in proportion to 2ⁿ, that is, is of the form k·2ⁿ for some fixed k. This is known as exponential growth, and it is to be avoided like the plague, as it quickly leads to unfeasibly expensive calculations. For example, suppose that such a process is dealing with an input of 10 items. Now 2¹⁰ = 1,024, which may seem reasonable. But if the input has 100 items to deal with, we have 2¹⁰⁰ steps to be completed, which would take a very long time indeed (try writing it out in decimal notation).

In this chapter, we have frequently made use of if ... then ... (alias conditional) statements and also iff (alias biconditional) ones. It is time to fulfil a promise to Alice, to make their meanings clear. In mathematics, they are used in a very simple way. Like conjunction, disjunction and negation, they are truth-functional. The biconditional is perhaps the easier to grasp.

Logic Box: Truth Table for ‘iff’

The biconditional ‘α if and only if β’, briefly written ‘α iff β’ and in formal notation α ↔ β, is true whenever α and β have the same truth-value, and false whenever they have opposite truth-values. Its table is as follows:

α   β   α↔β
1   1    1
1   0    0
0   1    0
0   0    1

The conditional is rather more difficult for students to assimilate. It differs from the biconditional in its third row.

Logic Box: Truth Table for ‘if’

The conditional ‘if α then β’, in formal notation α → β, is true in all cases except one – the ‘disastrous combination’: α true and β false.

α   β   α→β
1   1    1
1   0    0
0   1    1
0   0    1

From its table, it is clear that a statement α → β is always true except in the second row. Comparing the two tables, it is clear that α ↔ β is true just when α → β and its converse β → α are both true.
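
Since all four connectives met so far are truth-functional, their tables can be generated mechanically. In the sketch below, the conditional is coded as ‘(not a) or b’ and the biconditional as ‘a == b’, which reproduce the tables above; the code itself is only an illustration, not part of the text’s development.

    # Print the truth tables of ∧, ∨, → and ↔, writing 1 for true and 0 for false.
    rows = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print("α β | α∧β α∨β α→β α↔β")
    for a, b in rows:
        conj = int(bool(a and b))
        disj = int(bool(a or b))
        cond = int((not a) or b)    # false only in the row a = 1, b = 0
        bicond = int(a == b)        # true exactly when a and b agree
        print(f"{a} {b} |  {conj}   {disj}   {cond}   {bicond}")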

Alice Box: The Truth Table for the Conditional

Alice: Well, at least you kept your promise! But I am not entirely satisfied. I see why the ‘disastrous combination’ makes if α then β false. But why do all the other combinations make it come out true? The two statements α and β may have nothing to do with each other, like ‘London is in France’ and ‘kangaroos are fish’. These are both false, but it is strange to say that the statement ‘if London is in France then kangaroos are fish’ is true.
Hatter: Indeed, it is rather strange and, to be frank, in everyday life we do use if ... then ... in very subtle ways – much more so than any such table. But in mathematics, we use the conditional in the simplest possible manner, which is that given by the truth table. Moreover, it turns out to underlie all the other ones.


Once again, we should say a bit more than did the Hatter. Here is an example that may not entirely convince you but might at least make the truth table less strange. We know that all positive integers divisible by four are even. This is, in effect, a universally generalized conditional. It says: For every positive integer n, if n is divisible by four, then it is even. So for every particular choice that we make of a positive integer n, the statement, if n is divisible by four then it is even, comes out true. For example, the following three are all true:
If 8 is divisible by 4, then it is even.
If 2 is divisible by 4, then it is even.
If 9 is divisible by 4, then it is even.
But the first of these corresponds to the top row of our truth table (both components true), the second to the third row (α false, β true), and the last to the fourth row (both components false) – while in each of these three cases, the conditional is true. The following exercise, due to Dov Gabbay, will also help bring out the same point.

Exercise 1.6 (2) A shop hangs a sign in the window saying ‘If you buy a computer, then you get a free printer’. The relevant government office suspects the shop of false advertising and sends agents disguised as customers to make purchases. Each agent reports on what happens. How many relevant configurations can emerge in these reports? Which would justify charging the manager with false advertising?

Finally, we recall the names of some familiar sets of numbers. We have already mentioned the set N⁺ = {1,2,3, ...} of all positive integers. Some other number sets that we will need to refer to are the following:
N = N⁺ ∪ {0}, that is, the set consisting of zero and all the positive integers. This is called the set of the natural numbers. Warning: Some authors use this name to refer to the positive integers only. Be wary when you read.
Z = {0, ±1, ±2, ±3, ...}, the set of all integers (positive, negative and zero).
Z⁻ = {..., −3, −2, −1} = Z\N, which is the set of all negative integers.
Q = {p/q: p, q ∈ Z and q ≠ 0}, the set of all rational numbers (positive, negative and zero).
R = the set of all real numbers, also representable as the set of all numbers of the form p + d1d2... where p ∈ Z and d1d2... is an ending or unending decimal (series of digits from 0 to 9).

End-of-Chapter Exercises

Exercise 1.1: Boolean operations on sets We define the operation A + B of symmetric difference (sometimes known as disjoint union) by putting A + B = (A\B) ∪ (B\A). Notations vary: Often ⊕ is used for this operation, sometimes Δ.
(a) Show that for any x, x ∈ A + B iff x is an element of exactly one of A, B.
(b) Draw a Venn diagram for the operation (and use it to help intuition in what follows).
(c) Show that A + B ⊆ A ∪ B.
(d) Show that A + B is disjoint from A ∩ B.
(e) Show that A + B = (A ∪ B)\(A ∩ B).
(f) For each of the following properties of ∪, check out whether or not it also holds for +, giving a proof or a counterexample as appropriate: (i) commutativity, (ii) associativity, (iii) distribution of ∩ over +, (iv) distribution of + over ∩.
(g) Express −(A + B) using union, intersection and complement.
(h) We have seen that each of intersection, union and difference corresponds to a truth-functional logical connective. To what connective mentioned in this chapter does symmetric difference correspond? Draw its truth table.

Exercise 1.2: Counting principles for Boolean operations When A is a finite set, we write #(A) for the number of elements that it contains. Use the definitions of the Boolean operations together with your knowledge of elementary arithmetic to verify the following for finite sets. They will all be used in the chapter on counting:
(a) #(A ∪ B) ≤ #(A) + #(B).
(b) Sometimes #(A ∪ B) < #(A) + #(B) (give an example).
(c) When A, B are disjoint, then #(A ∪ B) = #(A) + #(B). What does this tell you about #(A + B)?
(d) When A, B are any finite sets (not necessarily disjoint), then #(A ∪ B) = #(A) + #(B) − #(A ∩ B). This is known as the rule of inclusion and exclusion.
(e) #(A ∩ B) ≤ min(#(A), #(B)). Here, min(m,n) is whichever is the lesser of the integers m, n.
(f) Sometimes #(A ∩ B) < min(#(A), #(B)) (give an example).
(g) Formulate and verify a necessary and sufficient condition for the equality #(A ∩ B) = min(#(A), #(B)) to hold.
(h) #(∅) = 0.
(i) #(A\B) = #(A) − #(A ∩ B).
(j) #(A + B) = #(A) + #(B) − 2#(A ∩ B).
(k) When B ⊆ A, then #(A\B) = #(A) − #(B).

Exercise 1.3: Generalized union and intersection
(a) Let {Ai}i∈I be any collection of sets. Show that for any set B, we have (i) ∪{Ai}i∈I ⊆ B iff Ai ⊆ B for every i ∈ I, (ii) B ⊆ ∩{Ai}i∈I iff B ⊆ Ai for every i ∈ I.
(b) Find a collection {Ai}i∈N of non-empty sets with each Ai ⊇ Ai+1 but with ∩{Ai}i∈N empty.
(c) Why can’t (b) be done for some finite collection {Ai}i∈I of non-empty sets?

Exercise 1.4: Power sets
(a) Show that whenever A ⊆ B, then P(A) ⊆ P(B).
(b) True or false: P(A ∩ B) = P(A) ∩ P(B)? Give a proof or a counterexample.
(c) True or false: P(A ∪ B) = P(A) ∪ P(B)? Give a proof or counterexample.


Selected Reading

For a very gentle introduction to sets, which nevertheless takes the reader up to an outline of the Zermelo-Fraenkel axioms for set theory, see:

Bloch ED (2011) Proofs and fundamentals: a first course in abstract mathematics, 2nd edn. Springer, New York, chapter 3

A classic of beautiful exposition, but short on exercises (readers should instead verify the claims made in the text):

Halmos PR (2001) Naive set theory, new edn. Springer, New York, chapters 1–5, 9

The present material is covered with lots of exercises in:
Lipschutz S (1998) Set theory and related topics, Schaum’s outline series. McGraw Hill, New York, chapters 1–2 and 5.1–5.3


2 Comparing Things: Relations

Abstract Relations play an important role in computer science, both as tools of analysis and for representing computational structures such as databases. In this chapter, we introduce the basic concepts you need to master in order to work with them.

We begin with the notions of an ordered pair (and more generally, ordered n-tuple) and the Cartesian product of two or more sets. We then consider operations on relations, notably those of forming the converse, join and composition of relations, as well as some other operations that make relations interact with sets, notably the image and the closure of a set under a relation.

We also explore two of the main jobs that relations are asked to carry out: to classify and to order. For the former, we explain the notion of an equivalence relation (reflexive, transitive, symmetric) over a set and how it corresponds to the notion of a partition of the set. For the latter, we look first of all at several kinds of reflexive order, and then at their strict parts.

2.1 Ordered Tuples, Cartesian Products and Relations

What do the following have in common: one car overtaking another, a boy loving a girl, one tree being shadier than another, an integer dividing another, a point lying between two others and a student exchanging one book for another with a friend?

They are all examples of relations involving at least two items – in some instances three (one point between two others), four (the book exchange) or more. Often, they involve actions, intentions, the passage of time and causal connections; but in mathematics and computer science, we abstract from all those features and work with a very basic, stripped-down concept. To explain what it is, we begin with the notions of an ordered tuple and Cartesian product.



2.1.1 Ordered Tuples

Recall from the preceding chapter that when a set has exactly one element, it is called a singleton. When it has exactly two distinct elements, it is called a pair. For example, the set {7,9} is a pair, and it is the same as the pair {9,7}. We have {7,9} = {9,7} because the order is irrelevant: the two sets have exactly the same elements.

An ordered pair is like a (plain, unordered) pair except that order matters. To highlight this, we use a different notation. The ordered pair whose first element is 7 and whose second element is 9 is written as (7,9) or, in older texts, as <7,9>. It is distinct from the ordered pair (9,7) although they have exactly the same elements: (7,9) ≠ (9,7), because the elements are considered in a different order.

Abstracting from this example, the criterion for identity of ordered pairs is as follows: (x1,x2) = (y1,y2) iff both x1 = y1 and x2 = y2. This contrasts with the criterion for identity of plain sets: {x1,x2} = {y1,y2} iff the left and right hand sets have exactly the same elements, which, it is not difficult to show, holds iff either (x1 = y1 and x2 = y2) or (x1 = y2 and x2 = y1).

More generally, the criterion for identity of two ordered n-tuples (x1,x2, … ,xn) and (y1,y2, … ,yn) is as you would expect: (x1,x2, … ,xn) = (y1,y2, … ,yn) iff xi = yi for all i from 1 to n.
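In a programming language with built-in tuples and sets, the contrast between the two identity criteria is easy to see. A brief Python illustration (the values are just those from the discussion above):

# ordered pairs: order matters
print((7, 9) == (9, 7))         # False
# plain sets: order is irrelevant
print({7, 9} == {9, 7})         # True
# n-tuples are identical iff they agree at every position
print((1, 2, 3) == (1, 3, 2))   # False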

Exercise 2.1.1 Check in detail the claim that {x1,x2} = {y1,y2} iff either (x1 = y1 and x2 = y2) or (x1 = y2 and x2 = y1). Hint: It helps to break your argument into cases.

Alice Box: Ordered Pairs

Alice: Isn't there something circular in all this? You promised that relations will be used to build a theory of order, but here you are defining the concept of a relation by using the notion of an ordered pair, which already involves the concept of order!

Hatter: A subtle point, and a good one! But I would call it a spiral rather than a circle. We need just a rock-bottom kind of order – no more than the idea of one thing coming before another – in order to understand what an ordered pair is. From that we can build a very sophisticated theory of the various kinds of order that relations can create.

2.1.2 Cartesian Products

With this in hand, we can introduce the notion of the Cartesian product of two sets. If A, B are sets, then their Cartesian product, written A × B and pronounced 'A cross B' or 'A by B', is defined as follows: A × B = {(a,b): a ∈ A and b ∈ B}.


In English, A × B is the set of all ordered pairs whose first term is in A and whose second term is in B. When B = A, so that A × B = A × A, it is customary to write it as A², calling it 'A squared'. The operation takes its name from René Descartes who, in the seventeenth century, made use of the Cartesian product R² of the set R of all real numbers. His seminal idea was to represent each point of a plane by an ordered pair (x,y) of real numbers and use this representation to solve geometric problems by algebraic methods. The set R² is called the Cartesian plane.

A very simple concept – but be careful, it is also easy to trip up! Take note of the 'and' in the definition, but don't confuse Cartesian products with intersections. For example, if A, B are sets of numbers, then A ∩ B is also a set of numbers; but A × B is a set of ordered pairs of numbers.

Exercise 2.1.2 (1) (with solution) Let A = {John, Mary}, B = {1,2,3} and C = ∅. What are A × B, B × A, A × C, C × A, A², B²? Are any of these identical with others? How many elements in each?

Solution
A × B = {(John,1), (John,2), (John,3), (Mary,1), (Mary,2), (Mary,3)}
B × A = {(1,John), (2,John), (3,John), (1,Mary), (2,Mary), (3,Mary)}
A² = {(John,John), (John,Mary), (Mary,John), (Mary,Mary)}
B² = {(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)}
Of these, A × C = ∅ = C × A, but that is all; in particular A × B ≠ B × A.
Counting: #(A × B) = 6 = #(B × A), #(A × C) = 0 = #(C × A), #(A²) = 4, #(B²) = 9.

Comment Note that also A × B = {(John,1), (Mary,1), (John,2), (Mary,2), (John,3), (Mary,3)}. Within the curly brackets, we are enumerating the elements of a set, so we can write them in any order we like; but within the round brackets, we are enumerating the terms of an ordered pair (or ordered n-tuple), so there the order is vital.

From the exercise, you may already have guessed a general counting principle for the Cartesian products of finite sets: #(A × B) = #(A)·#(B), where the dot stands for ordinary multiplication. Here is a proof. Let #(A) = m and #(B) = n. Fix any element a ∈ A. Then there are n different pairs (a,b) with b ∈ B. And when we fix a different a′ ∈ A, then the n pairs (a′,b) will all be different from the pairs (a,b) since they differ on their first terms. Thus, there are n + n + ⋯ + n (m times), that is, m·n pairs altogether in A × B.

Thus, although the operation of forming the Cartesian product of two sets is not commutative (i.e. we may have A × B ≠ B × A), the operation of counting the elements of the Cartesian product is commutative: always #(A × B) = #(B × A).
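As a computational aside, Cartesian products of finite sets are available in Python through itertools.product, and a few lines confirm the counting principle on the sets A and B of the exercise above (a sketch only):

from itertools import product

A = {"John", "Mary"}
B = {1, 2, 3}

AxB = set(product(A, B))            # the Cartesian product A × B as a set of ordered pairs
BxA = set(product(B, A))

print(AxB == BxA)                   # False: forming the product is not commutative
print(len(AxB) == len(A) * len(B))  # True: but the counts obey #(A × B) = #(A)·#(B)
print(len(AxB) == len(BxA))         # True: and so #(A × B) = #(B × A)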

Exercise 2.1.2 (2) (with partial solution)
(a) Show that when A ⊆ A′ and B ⊆ B′, then A × B ⊆ A′ × B′.
(b) Show that when both A ≠ ∅ and B ≠ ∅, then A × B = B × A iff A = B.


Solution to (b) Suppose A ≠ ∅ and B ≠ ∅. We need to show that A × B = B × A iff A = B. We do this in two parts. First, we show that if A = B, then A × B = B × A. Suppose A = B. By the supposition, A × B = A² and also B × A = A², so that A × B = B × A as desired. Next, we show the converse, that if A × B = B × A, then A = B. The easiest way to do this is by showing the contrapositive: if A ≠ B, then A × B ≠ B × A. Suppose A ≠ B. Then at least one of A ⊆ B, B ⊆ A fails. We consider the former case; the latter is similar. Then there is an a ∈ A with a ∉ B. By supposition, B ≠ ∅, so there is a b ∈ B. Thus, (a,b) ∈ A × B, but since a ∉ B, (a,b) ∉ B × A. Thus, A × B ≠ B × A, as desired.

The solution to this exercise is instructive in two respects. In the first place, it uses a 'divide and rule' strategy, breaking the problem down into two component parts and tackling them one by one. In particular, in order to prove an if and only if statement (also known as a biconditional), it is often convenient to do it in two parts: first prove the if in one direction, and then prove it in the other. As we saw in the preceding chapter, these are not the same; they are called converses of each other.

In the second place, the exercise illustrates the tactic of proving by contraposition. Suppose we want to prove a statement 'if φ then ψ'. As mentioned earlier, the most straightforward way of tackling this is to suppose φ and drive towards ψ. But that does not always give the most transparent proof. Sometimes it is better to suppose not-ψ and head for not-φ. That is what we did in the example: to prove that if A × B = B × A, then A = B for non-empty sets A,B, we supposed A ≠ B and showed A × B ≠ B × A.

Why is proof by contraposition legitimate? It is because the two conditionals φ → ψ and ¬ψ → ¬φ are equivalent, as can be seen from their truth tables in the preceding chapter. How can the tactic help? Often, suppositions like A ≠ B give us something to 'grab hold of'. In our example, it tells us that there is an a with a ∈ A, a ∉ B (or conversely); we can then consider such an a, and start reasoning about it. This pattern occurs quite often.

Exercise 2.1.2 (3)
(a) An example from arithmetic of proving a biconditional via its two parts. Show that for any positive integers m,n, we have n = m iff each of m,n divides the other.
(b) An example from arithmetic of proving via the contrapositive. Show that for any two real numbers x,y, if x·y is irrational, then at least one of x,y is irrational.

2.1.3 Relations

Let A,B be any sets. A binary relation from A to B is defined to be any subset of the Cartesian product A × B. It is thus any set of ordered pairs (a,b) such that a ∈ A and b ∈ B. A binary relation is thus fully determined by the ordered pairs that it covers. It does not matter how these pairs are presented or described. It is customary to use R, S, … as symbols standing for relations.


As well as saying that the relation is from A to B, one also says that it is over A × B.

From the definition, it follows that in the case that A = B, a binary relation from A to A is any set of ordered pairs (a,b) such that both a,b ∈ A. It is thus a relation over A², but informally we often abuse language a little and describe it as a relation over A.

Evidently, the notion may be generalized to any number of places. Let A1, … ,An be sets (n ≥ 1). An n-place relation over A1 × … × An is defined to be any subset of A1 × … × An. In other words, it is any set of n-tuples (a1, … ,an) with each ai ∈ Ai. Binary relations are thus 2-place relations. As a limiting case of the more general notion, when n = 1 we have 1-place relations over A, which may be identified with subsets of A.

In this chapter, we will be concerned mainly with binary relations, and when we speak of a relation without specifying the number of places, we will mean a binary one.

Exercise 2.1.3 (1) (with solution) Let A, B be the sets given in Exercise 2.1.2 (1).
(a) Which of the following are (binary) relations from A to B: (i) {(John,1), (John,2)}, (ii) {(Mary,3), (John,Mary)}, (iii) {(Mary,2), (2,Mary)}, (iv) {(John,3), (Mary,4)}, (v) ({Mary,1}, {John,3})?
(b) What is the largest relation from A to B? What is the smallest?
(c) Identify (by enumeration) three more relations from A to B.
(d) How many relations are there from A to B?

Solution
(a) Only (i) is a relation from A to B. (ii) is not, because Mary is not in B. (iii) is not, because 2 is not in A. (iv) is not, because 4 is not in B. (v) is not, because it is an ordered pair of sets, not a set of ordered pairs.
(b) A × B is the largest, ∅ is the smallest, in the sense that ∅ ⊆ R ⊆ A × B for every relation R from A to B.
(c) For brevity, we can choose three singleton relations: {(John,1)}, {(John,2)}, {(John,3)}.
(d) Since #(A) = 2 and #(B) = 3, #(A × B) = 2·3 = 6. From the definition of a relation from A to B it follows that the set of all relations from A to B is just P(A × B), and by a counting principle in the chapter on sets, #(P(A × B)) = 2^#(A × B) = 2⁶ = 64.
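Since a relation from A to B is just a subset of A × B, the 64 relations of part (d) can be enumerated mechanically. A small Python sketch (illustrative only):

from itertools import combinations, product

A = {"John", "Mary"}
B = {1, 2, 3}
pairs = list(product(A, B))        # the 6 elements of A × B

# every subset of A × B is a relation from A to B
relations = [set(c) for r in range(len(pairs) + 1) for c in combinations(pairs, r)]
print(len(relations))              # 64, that is, 2 to the power #(A × B)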

When R is a relation from A to B, we call the set A a source of the relation, and B a target. Sounds simple enough, but care is advised. As already noted in an exercise, when A ⊆ A′ and B ⊆ B′, then A × B ⊆ A′ × B′, so when R is a relation from A to B, then it is also a relation from A′ to B′. Thus, the source and target of R, in the above sense, are not unique: a single relation will have indefinitely many sources and targets.


For this reason, we also need terms for the least possible source and target of R. We define the domain of R to be the set of all a such that (a,b) ∈ R for some b, writing briefly dom(R) = {a: there is a b with (a,b) ∈ R}. Likewise we define range(R) = {b: there is an a with (a,b) ∈ R}. Clearly, whenever R is a relation from A to B, then dom(R) ⊆ A and range(R) ⊆ B.

Warning: You may occasionally see the term 'codomain' contrasting with 'domain'. But care is needed, as the term is sometimes used broadly for 'target', sometimes more specifically for 'range'. In this book, we will follow a fairly standard terminology, with domain and range defined as above, and source, target for any supersets of them.
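When a relation is stored concretely as a finite set of pairs, its domain and range can be read off directly. A minimal Python sketch with a made-up relation:

R = {(0, 1), (0, 2), (5, 7)}       # a small, arbitrary example relation

dom = {a for (a, b) in R}          # dom(R) = {0, 5}
rng = {b for (a, b) in R}          # range(R) = {1, 2, 7}
print(dom, rng)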

Exercise 2.1.3 (2)
(a) Consider the relation R = {(1,7), (3,3), (13,11)} and the relation S = {(1,1), (3,11), (13,12), (15,1)}. Identify dom(R), range(R), dom(S), range(S).
(b) The identity relation IA over a set A is defined by putting IA = {(a,a): a ∈ A}. Identify dom(IA) and range(IA).
(c) Identify dom(A × B), range(A × B).

Since relations are sets (of ordered pairs or tuples), we can apply to them all the concepts that we developed for sets. In particular, it makes sense to speak of one relation R being included in another relation S: every tuple that is an element of R is an element of S. In this case, we also say that R is a subrelation of S. Likewise, it makes sense to speak of the empty relation: it is the relation that has no elements, and it is unique. It is thus the same as the empty set, and can be written ∅.

Exercise 2.1.3 (3)
(a) Use the definitions to show that the empty relation is a subrelation of every relation.
(b) Identify dom(∅), range(∅).
(c) What would it mean to say that two relations are disjoint? Give an example of two disjoint relations over a small finite set A.

2.2 Tables and Digraphs for Relations

In mathematics, rigour is important, but so is intuition. The two should go hand in hand. One way of strengthening one's intuition is to use graphic representations. This is particularly so in the case of binary relations. For sets in general, we used Euler and Venn diagrams; for the more specific case of relations, tables and arrow diagrams are helpful.


Table 2.1 Table for a relation

R      1   2   3
John   1   0   1
Mary   0   1   1

2.2.1 Tables

Let's go back to the sets A = {John, Mary} and B = {1,2,3} of earlier exercises. Consider the relation R = {(John,1), (John,3), (Mary,2), (Mary,3)}. How might we represent R by a table?

We need two rows and three columns for the elements of A and B, respectively. Each cell in this table is uniquely identified by its coordinate (a,b), where a is the element for the row and b is the element for the column. In the cell write 1 (for 'true') or 0 (for 'false') according as the ordered pair (a,b) is or is not an element of the relation. For the R chosen above, this gives us Table 2.1.

Tabular representations of relations are particularly useful when dealing with databases because good software is available for writing and manipulating them. Moreover, when a table has the same number of columns as rows, geometrical operations such as folding along the diagonal can also reveal interesting features.
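The construction of such a table is easy to mechanize. A short Python sketch (not from the text) that prints a 0/1 table like Table 2.1 for the relation R above:

A = ["John", "Mary"]
B = [1, 2, 3]
R = {("John", 1), ("John", 3), ("Mary", 2), ("Mary", 3)}

# header row, then one row of 0/1 entries per element of A
print("R".ljust(6), *B)
for a in A:
    print(a.ljust(6), *[1 if (a, b) in R else 0 for b in B])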

Exercise 2.2.1
(a) Let A = {1,2,3,4}. Draw tables for each of the following relations over A²: (i) <, (ii) ≤, (iii) =, (iv) ∅, (v) A².
(b) In each of these tables, draw a line from the top left cell (1,1) to the bottom right cell (4,4). This is called the diagonal of a relation over A². Comment on the contents of the diagonal in each of the tables.
(c) Imagine folding the table along the diagonal. Comment on any symmetries that become visible.

2.2.2 Digraphs

Another way of representing binary relations, less suitable for software implementation but more friendly to humans, is by means of arrow diagrams known as directed graphs or more briefly digraphs. The idea is simple, at least in the finite case. Given a relation from A to B, mark a point for each element of A ∪ B, labelling it with a name if desired. Draw an arrow from one point to another just when the first stands in the relation to the second. When the source A and target B of the relation are not the same, it can be useful to add into the diagram circles for the sets A and B.

In the example considered for the table above, the digraph comes out as in Fig. 2.1.

Fig. 2.1 Diagram for a relation

The Hasse diagram between the subsets of {1, 2, 3} that we presented in Chap. 1 can be seen as a digraph for the relation of being immediately included in. This holds from a set B to a set C iff B ⊂ C, but there is no X with B ⊂ X ⊂ C. However, the Hasse diagram uses plain lines rather than arrows, with the convention that these are always read with subset below and superset above.

Given a diagram for a relation, we can use it to read off its reflexive and transitive closure. In particular, given the Hasse diagram for the relation of B being immediately included in C, we can read off the relation of B being included in C: it holds iff there is an ascending path from B to C (including the one-element path). This is much more economical than drawing a digraph for the entire relation of inclusion.

Exercise 2.2.2 Take from Chap. 1 the Hasse diagram for immediate inclusion between subsets of the set A = {1,2,3}, and compare it with the digraph for the entire relation of inclusion. How many links in each?

Diagrams are valuable tools to illustrate situations and stimulate intuitions. They can often help us think up counterexamples to general claims, and they can sometimes be used to illustrate a proof, making it much easier to follow. However, they have their limitations. In particular, it should be remembered that a diagram can never itself constitute a proof of a general claim.

2.3 Operations on Relations

Since relations are sets, we can carry out on them all the Boolean operations for sets that we learned in the preceding chapter, provided we keep track of the sources and targets. Thus, if R and S are relations from the same source A to the same target B, their intersection R ∩ S, being the set of all ordered pairs (x,y) that are simultaneously elements of R and of S, will also be a relation from A to B. Likewise for the union R ∪ S and also for complement −R with respect to A × B. Note that just as for sets, −R is really a difference (A × B) − R and so depends implicitly on the source A and target B as well as on R itself.

As well as the Boolean operations, there are others that arise only for relations. We describe some of the most important ones.


2.3.1 Converse

The simplest is that of forming the converse (alias inverse) R⁻¹ of a relation. Given a relation R, we define R⁻¹ to be the set of all ordered pairs (b,a) such that (a,b) ∈ R.

Warning: Do not confuse this with complementation! There is nothing negative about conversion despite the standard notation R⁻¹: we are simply reversing the direction of the relation. The complement of 'loves' is 'doesn't love', but its converse is 'is loved by'.
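Computationally, conversion just swaps the two components of every pair. A brief Python sketch (the helper name and the toy relation are ours):

def converse(R):
    # the converse of R: all pairs (b, a) such that (a, b) is in R
    return {(b, a) for (a, b) in R}

loves = {("Romeo", "Juliet"), ("Paris", "Juliet")}
print(converse(loves))             # the relation 'is loved by'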

Exercise 2.3.1 (1) (with solutions)
(a) Let A be the set of natural numbers. What are the converses of the following relations over A: (i) less than, (ii) less than or equal to, (iii) equal to.
(b) Let A be a set of people. What are the converses of the following relations over A: (i) being a child of, (ii) being a descendant of, (iii) being a daughter of, (iv) being a brother of, (v) being a sibling of, (vi) being a husband of.

Solutions
(a) (i) greater than, (ii) greater than or equal to, (iii) equal to.
(b) (i) being a parent of, (ii) being an ancestor of, (iii) having as a daughter, (iv) having as a brother, (v) being a sibling of, (vi) being a wife of.

Comments In group (a) we already have examples of how the converse of a relation may be disjoint from it, overlap with it or be identical with it.

Some of the examples in group (b) are a little tricky. Note that the converse of being a brother of is not being a brother of: when a is a brother of b, b may be female and so a sister of a.

Sometimes, ordinary language has a single word for a relation and another single word for its converse. This is the case for (i) (child/parent), (ii) (ancestor/descendant) and (vi) (husband/wife). But it is not always so: witness (iii) and (iv) where we have to use special turns of phrase: daughter/having as daughter, brother/having as brother. The range of words available for family relations differs from one language to another.

Exercise 2.3.1 (2)
(a) What does the converse of a relation look like from the point of view of a digraph for the relation? And from the point of view of a table for it?
(b) Show that dom(R⁻¹) = range(R) and range(R⁻¹) = dom(R).
(c) Show that (i) (R⁻¹)⁻¹ = R, (ii) (R ∩ S)⁻¹ = R⁻¹ ∩ S⁻¹, (iii) (R ∪ S)⁻¹ = R⁻¹ ∪ S⁻¹. Compare these equalities with those for complementation.


Alice Box: Converse of an n-Place Relation

Alice: Does the notion of conversion make sense for relations with more than two places?

Hatter: It does, but there we have not just one operation but several. For simplicity, take the case of a three-place relation R, whose elements are ordered triples (a,b,c). Then, there is an operation that converts the first two, giving triples (b,a,c), and an operation converting the last two, giving triples (a,c,b).

Alice: And one switching the first and the last, giving triples (c,b,a)?
Hatter: Indeed. However, once we have some of these operations we can get others by iterating them. For example, your operation may be obtained by using the first two as follows: from (a,b,c) to (b,a,c) to (b,c,a) to (c,b,a), using the first, second and first operations. We could also get the first operation, say, by iterating the second and third …

Alice: Stop there, I’ve got the idea.

2.3.2 Join, Projection, Selection

Imagine that you are in the personnel division of a firm, and that you are in charge of a database recording the identity numbers and names of employees. Your colleague across the corridor is in charge of another database recording their names and telephone extensions. Each of these databases may be regarded as a binary relation. In effect, we have a large set A of allowable identity numbers (e.g. any six digit figure), a set B of allowable names (e.g. any string of at most twenty letters and spaces, with no space at beginning or end and no space immediately following another one), and a set C of allowable telephone extension numbers (e.g. any figure of exactly four digits). Your database is a subset R of A × B, your colleague's database is a subset S of B × C; they are thus both relations. These relations may have special properties. For example, it may be required that R cannot associate more than one name with any given identity number, although S may legitimately associate more than one telephone extension to a given name. We will not bother with these special properties at the moment, leaving them to the chapter on functions.

The manager may decide to merge these two databases into one, thus economizing on the staff needed to maintain them and liberating one of you for other tasks (or for redundancy). What would the natural merging be? A three-place relation over A × B × C consisting of all those triples (a,b,c) such that (a,b) ∈ R and (b,c) ∈ S. This is clearly an operation on relations, taking two binary relations with a common middle set (target of one, source of the other) to form a three-place relation. The resulting relation/database gives us all the data of the two components, with no loss of information.

Of course, in practice, databases make use of relations with more than just two places, and the operation that we have described may evidently be generalized to cover them. Let R be a relation over A1 × … × Am × B1 × … × Bn and let S be a relation over B1 × … × Bn × C1 × … × Cp. We define the join of R and S, written join(R,S), to be the set of all those (m + n + p)-tuples (a1, … ,am, b1, … ,bn, c1, … ,cp) such that (a1, … ,am, b1, … ,bn) ∈ R and (b1, … ,bn, c1, … ,cp) ∈ S.

Another terminological warning: while database theorists (and this book) call this operation 'join', set theorists and algebraists sometimes use the same word as a synonym for the simple union of sets.
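For the binary case of the employee example (R from identity numbers to names, S from names to extensions), the join can be sketched in a few lines of Python; the data and the helper name are invented for illustration:

# R: identity number -> name; S: name -> telephone extension (made-up entries)
R = {(104523, "Ann Lee"), (208911, "Bob Roy")}
S = {("Ann Lee", 2417), ("Bob Roy", 3108), ("Bob Roy", 3109)}

def join(R, S):
    # triples (a, b, c) with (a, b) in R and (b, c) in S
    return {(a, b, c) for (a, b) in R for (b2, c) in S if b == b2}

print(join(R, S))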

There are several further operations of database theory that cannot be obtained from conversion and join alone. One is projection. Suppose R ⊆ A1 × … × Am × B1 × … × Bn is a database. We may project R onto its first m places, forming a relation whose elements are those m-tuples (a1, … ,am) such that there are b1, … ,bn with (a1, … ,am, b1, … ,bn) ∈ R.

Another database operation is selection (alias restriction in the language of set theorists and algebraists). Again, let R ⊆ A1 × … × Am × B1 × … × Bn be a database and let C1, … ,Cm be subsets of A1, … ,Am, respectively (i.e. Ci ⊆ Ai for each i ≤ m). We may select (or restrict the relation to) the subsets by taking its elements to be just those (m + n)-tuples (c1, … ,cm, b1, … ,bn) such that (c1, … ,cm, b1, … ,bn) ∈ R and each ci ∈ Ci. Here we have selected subsets from the first m arguments. We could equally well have selected from the last n, or from any others. In database contexts, the subsets Ci will often be singletons {ci}.

Notice that conversion and selection both leave the arity (i.e. number of places) of a relation unchanged, while projection diminishes and join increases it. By combining them, one can do a lot of manipulation.
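Continuing the sketch above, projection and selection for a binary relation come out as one-liners in Python (again only an illustration, restricted to the first place):

def project_first(R):
    # project a binary relation onto its first place
    return {a for (a, b) in R}

def select(R, A0):
    # restrict a binary relation to those pairs whose first term lies in A0
    return {(a, b) for (a, b) in R if a in A0}

R = {(104523, "Ann Lee"), (208911, "Bob Roy")}
print(project_first(R))            # {104523, 208911}
print(select(R, {104523}))         # {(104523, 'Ann Lee')}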

Exercise 2.3.2 (with partial solution)
(a) Formulate the definition of join(R,S) for the case that R is a two-place relation over A × B and S is a three-place relation over B × C × D.
(b) Formulate the definition of projection for the case of a two-place relation R ⊆ A × B and that we project to (i) A, (ii) B.
(c) Formulate the definition of selection (alias restriction) in the case that we restrict a two-place relation R ⊆ A × B to (i) a subset A′ of A, (ii) a subset B′ of B, (iii) both A′ and B′.
(d) Let A = {2, 3, 13}, B = {12, 15, 17}, C = {11, 16, 21}. Put R = {(a,b): a ∈ A, b ∈ B, a divides b}, S = {(b,c): b ∈ B, c ∈ C, b has the same parity as c}. (i) Enumerate each of R and S. (ii) Enumerate join(R,S). (iii) Project R onto A, S onto C, and join(R,S) onto A × C. (iv) Restrict R to the odd elements of B, S to the even elements of B, and join(R,S) to the prime elements of B.

Partial Solution
(a) join(R,S) is the set of all 4-tuples (a,b,c,d) such that (a,b) ∈ R and (b,c,d) ∈ S.
(b) In part: the projection of R to A is the set of all a ∈ A such that there is a b ∈ B with (a,b) ∈ R.
(c) In part: the selection of A′ as source for R is the set of all pairs (a,b) ∈ R with a ∈ A′.


2.3.3 Composition

While the join operation is immensely useful for database manipulation, it does not occur very often in everyday language. But there is a rather similar operation that children learn at a very young age – as soon as they can recognize members of their extended family. It was first studied in the nineteenth century by Augustus de Morgan, who called it relative product. Nowadays it is usually called composition. We consider it for the most commonly used case: the composition of two two-place relations.

Suppose we are given relations R ⊆ A × B and S ⊆ B × C, with the target of the first the same as the source of the second. We have already defined their join as the set of all triples (a,b,c) such that (a,b) ∈ R and (b,c) ∈ S. But we can also define their composition S∘R as the set of all ordered pairs (a,c) such that there is some x with both (a,x) ∈ R and (x,c) ∈ S. Notice that the arity of the composition is one less than that of the join.

For example, if F is the relation of 'a father of' and P is the relation of 'a parent of', then P∘F is the relation consisting of all ordered pairs (a,c) such that there is some x with (a,x) ∈ F and (x,c) ∈ P, that is, all ordered pairs (a,c) such that for some x, a is a father of x and x is a parent of c. It is thus the relation of a being a father of a parent of c, that is, of a being a grandfather of c.
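The definition of composition translates directly into code. A small Python sketch (the helper name compose and the toy family data are ours, not the book's):

def compose(S, R):
    # S composed with R: all pairs (a, c) such that (a, x) is in R and (x, c) is in S for some x
    return {(a, c) for (a, x) in R for (x2, c) in S if x == x2}

father_of = {("Carl", "Ann")}                   # Carl is a father of Ann
parent_of = {("Ann", "Eve"), ("Ann", "Tom")}    # Ann is a parent of Eve and Tom

print(compose(parent_of, father_of))            # Carl is a grandfather of Eve and Tom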

Alice Box: Notation for Composition of Relations

Alice: That looks funny, the wrong way round. Wouldn't it be easier to write the composition, as you have defined it, with the letters F and P the other way round as F∘P. Then, they would follow the same order as they occur in the phrase 'a father of a parent of', which is also the order of the predicates when we say '(a,x) ∈ F and (x,c) ∈ P' in the definition?

Hatter: Indeed, it would correspond more directly with the way that ordinary language handles composition, and early authors working in the theory of relations did it that way.

Alice: Why the switch?
Hatter: To bring it into agreement with the standard way of doing things in the theory of functions. As we will see in the next chapter, a function can be defined as a special kind of relation, and notions such as composition for functions turn out to be the same as for relations; so it is best to use the same notation in the two contexts, and the notation for functions won out.

Alice: I'm afraid that I'll always mix them up!
Hatter: The best way to avoid that is to write the definition down again before each exercise involving composition of relations. To help remember the definition correctly, note that the arrangement of R and S is symmetric around the centre of the definition.


Logic Box: Existential Quantifier

Our definition of the composition of two relations made use of the phrase 'there is an x such that …'. This is known as the existential quantifier, and is written ∃x(…). For example, 'There is an x with both (a,x) ∈ R and (x,c) ∈ S' is written as ∃x((a,x) ∈ R ∧ (x,c) ∈ S), or more briefly as ∃x(Rax ∧ Sxc).

The existential quantifier ∃ has its own logic, which we will describe in a later chapter along with the logic of its companion, the universal quantifier ∀. In the meantime, we will simply adopt the notation ∃x(…) as a convenient shorthand for the longer phrase 'there is an x such that …', and likewise use ∀x(…) to abbreviate 'for every x, … holds'.

The existential and universal quantifiers always come with a variable attached. However, ordinary speech does not contain variables; even in written text variables are rare outside mathematical contexts. But colloquial language manages to convey the notion in other ways. For example, when we say 'some composers are poets', we are not using a variable, but we are in effect saying that there is an x such that x is both a composer and a poet, which the set-theorist would write as ∃x(x ∈ C ∧ x ∈ P) and the logician as ∃x(Cx ∧ Px). Pronouns can also be used like variables. For example, when we say 'if there is a free place, I will reserve it', the 'it' is doing the work of a variable. However, once we begin to formulate more complex statements involving several different quantifications, it can be difficult to be precise without using variables explicitly.

Exercise 2.3.3 (1) (with solution) Let A be the set of people, and P, F, M, S, B the relations over A of 'parent of', 'father of', 'mother of', 'sister of' and 'brother of', respectively. Describe exactly the following relative products: (a) P∘P, (b) M∘F, (c) S∘P, (d) B∘B.

Warnings: (1) Be careful about order. (2) In some cases there will be a handy word in English for just the relation, but in others it will have to be described in a more roundabout (but still precise) way.

Solution and Commentary
(a) P∘P = 'grandparent of'. Reason: a is a grandparent of c iff there is an x such that a is a parent of x and x is a parent of c.
(b) M∘F = 'maternal grandfather of'. Reason: a is a maternal grandfather of c iff there is an x such that a is a father of x and x is a mother of c.

Comments on (b)

There are two common errors here: (1) getting the order wrong and (2) rushing to the answer 'grandfather of', since a father of a mother is always a grandfather. But there is another way of being a grandfather, namely, by being a father of a father, so that relation is too broad.


(c) S∘P = 'parent of a sister of'. Reason: a is a parent of a sister of c iff there is an x such that a is a parent of x and x is a sister of c.

Comments on (c)

This one is tricky. When there is an x such that a is a parent of x and x is a sister of c, then a is also a parent of c, which tempts one to rush into the answer 'parent of'. But that relation is also too broad, because a may also be a parent of c when c has no sisters! English has no single word for the relation S∘P; we can do no better than use a circumlocution such as 'parent of a sister of'.

(d) B∘B = 'brother of a brother of'.

Comments on (d)

Again, English has no single word for the relation. One is tempted to say that B∘B = 'brother'. But that answer is both too broad (in one respect) and too narrow (in another respect). Too broad because a may be a brother of c without being a brother of a brother x of c: there may be no third brother to serve as the x. Too narrow because a may be a brother of x and x a brother of c, without a being a brother of c, since a may be the same person as c! In this example, again, we can do little better than use the phrase 'brother of a brother of'.

Different languages categorize family relations in different ways, and translations are often only approximate. There may be a single term for a certain complex relation in one language but none in another. Languages of tribal communities are richest in this respect, and those of modern urban societies are poorest. Even within a single language, there are sometimes ambiguities. In English, for example, let P be the relation of 'parent of' and S the relation of 'sister of'. Then, P∘S is the relation of 'being a sister of a parent of'. This is certainly a subrelation of the relation of being an aunt of, but are the two relations identical? That depends on whether you include aunts by marriage, that is, whether you regard the wives of the brothers of your parents as your aunts. If you do include them under the term, then P∘S is a proper subrelation of the relation of aunt. If you don't include them, then it is the whole relation, that is, P∘S = 'aunt'.

Exercise 2.3.3 (2)
(a) Show that (S ∪ R)∘T = (S∘T) ∪ (R∘T).
(b) Show that (S∘R)⁻¹ = R⁻¹∘S⁻¹.

2.3.4 Image

The last operation that we consider in this section records what happens when we apply a relation to the elements of a set. Let R be any relation from set A to set B, and let a ∈ A. We define the image of a under R, written R(a), to be the set of all b ∈ B such that (a,b) ∈ R.


Exercise 2.3.4 (1) (with partial solution)
(a) Let A = {John, Mary, Peter} and let B = {1,2,3}. Let R be the relation {(John,1), (Mary,2), (Mary,3)}. What are the images of John, Mary and Peter under R?
(b) Represent these three images in a natural way in the digraph for R.
(c) What is the image of 9 under the relation ≤ over the natural numbers?

Solution to (a) and (c)
(a) R(John) = {1}, R(Mary) = {2,3}, R(Peter) = ∅.
(c) ≤(9) = {n ∈ N: 9 ≤ n}.

It is useful to 'lift' the notion of image from elements of A to subsets of A. If X ⊆ A, then we define the image of X under R, written R(X), to be the set of all b ∈ B such that (x,b) ∈ R for some x ∈ X. In shorthand notation, R(X) = {b ∈ B: ∃x ∈ X, (x,b) ∈ R}.
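Both forms of image are easy to compute when the relation is given as a finite set of pairs. A Python sketch (helper names ours, relation as in the exercise just solved):

def image_of_element(R, a):
    # R(a): all b with (a, b) in R
    return {b for (x, b) in R if x == a}

def image_of_set(R, X):
    # R(X): all b with (x, b) in R for some x in X
    return {b for (x, b) in R if x in X}

R = {("John", 1), ("Mary", 2), ("Mary", 3)}
print(image_of_element(R, "Mary"))               # {2, 3}
print(image_of_set(R, {"John", "Peter"}))        # {1}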

Exercise 2.3.4 (2)
(a) Let A, B, R be as in the preceding exercise. Identify R(X) for each one of the eight subsets X of A.
(b) What are the images of the following sets under the relation ≤ over the natural numbers? (i) {3, 12}, (ii) {0}, (iii) ∅, (iv) the set of all even natural numbers, (v) the set of all odds, (vi) N.
(c) Let P (for predecessor) be the relation defined by putting (a,x) ∈ P iff x = a + 1. What are the images of each of the above six sets under P?
(d) Identify the converse of the relations ≤ and P over the natural numbers, and the images of the above six sets under these converse relations.

It can happen that the notation R(X) for the image of X ⊆ A under R is ambiguous. This will be the case when the set A already has among its elements certain subsets X of itself. For this reason, some authors instead write R“(X) for the image of a subset. However, we will rarely if ever be dealing with such sets, and it will usually be safe to stick with the simpler and more common notation.

Exercise 2.3.4 (3)
(a) Construct a set A with just one element, which is also a subset of A.
(b) Do the same but with two elements in A, both of which are subsets of A.
(c) Indicate how this can be done for n elements, where n is any positive integer.

2.4 Reflexivity and Transitivity

In this section and the following two, we will look at special properties that a relation may have. Some of these properties are useful when we ask questions about how similar two items are, and when we want to classify objects. Some are needed when we look for some kind of order among items. Two of the properties, reflexivity and transitivity, are essential to both tasks, and so we begin with them.


2.4.1 Reflexivity

Let R be a (binary) relation. We say that it is reflexive over a set A iff (a,a) ∈ R for all a ∈ A. For example, the relation ≤ is reflexive over the natural numbers, since always n ≤ n; but < is not. Indeed, < has the opposite property of being irreflexive over the natural numbers: never n < n.

Clearly, reflexivity and irreflexivity are not the only possibilities: a relation R over a set may be neither one nor the other. For example, n is a prime divisor of n for some natural numbers (e.g. 2 is a prime divisor of itself) but not for some others (e.g. 4 is not a prime divisor of 4).

If we draw a digraph for a reflexive relation, every point will have an arrow going from it to itself; an irreflexive relation will have no such arrows; a relation that is neither reflexive nor irreflexive will have such arrows for some but not all of its points. However, when a relation is reflexive we sometimes reduce clutter by omitting these arrows from our diagrams and treating them as understood.

Exercise 2.4.1 (1) (with partial solution)
(a) Give other examples of relations over the natural numbers satisfying the three properties reflexive/irreflexive/neither.
(b) Can a relation R ever be both reflexive and irreflexive over a set A? If so, when?
(c) Identify the status of the following relations as reflexive/irreflexive/neither over the set of all people living in the UK: sibling of, shares at least one parent with, ancestor of, lives in the same city as, has listened to music played by.
(d) What does reflexivity mean in terms of a tabular representation of relations?

Solution to (c) The relation 'sibling of' as ordinarily understood is not reflexive since we do not regard a person as a sibling of himself or herself. In fact, it is irreflexive. By contrast, 'shares at least one parent with' is reflexive. It follows that these two relations are not quite the same. 'Ancestor of' is irreflexive. 'Lives in the same city as', under its most natural meaning, is neither reflexive nor irreflexive, since not everyone lives in a city. 'Has listened to music played by' is neither reflexive nor irreflexive for similar reasons.

A subtle point needs attention. Let R be a relation over a set A, that is, R ⊆ A². Then as we observed earlier, R is also a relation over any superset B of A, since then R ⊆ A² ⊆ B². It can happen that R is reflexive over A but not over its superset B. This is because we may have (a,a) ∈ R for all a ∈ A but (b,b) ∉ R for some b ∈ B − A. For this reason, whenever we say that a relation is reflexive or irreflexive, we should always specify what set A we have in mind. Doing this explicitly can be rather laborious, and so in practice, the identification of A is quite often left as understood. But you should always be sure how it is to be understood.

Exercise 2.4.1 (2)
(a) Give a small finite example of the dependence described in the text.
(b) Show that when R is reflexive over a set then it is reflexive over all of its subsets.


(c) Show that when each of two relations is reflexive over a set, so too are their intersection and their union.
(d) Show that the converse of a relation that is reflexive over a set is also reflexive over that set.

2.4.2 Transitivity

Another important property for relations is transitivity. We say that R is transitive iff whenever (a,b) ∈ R and (b,c) ∈ R, then (a,c) ∈ R.

For example, the relation ≤ over the natural numbers is transitive: whenever a ≤ b and b ≤ c, then a ≤ c. Likewise for the relation <. But the relation of having some common prime factor is not transitive: 4 and 6 have a common prime factor (namely, 2), and 6 and 9 have a common prime factor (namely, 3), but 4 and 9 do not have any common prime factor. By the same token, the relation between people of being first cousins (that is, sharing some grandparent, but not sharing any parent) is not transitive.

Another relation over the natural numbers failing transitivity is that of being an immediate predecessor: 1 is an immediate predecessor of 2, which is an immediate predecessor of 3, but 1 does not stand in that relation to 3. In this example, the relation is indeed intransitive, in the sense that whenever (a,b) ∈ R and (b,c) ∈ R, then (a,c) ∉ R. In contrast, the relation of sharing a common prime factor, and that of being a first cousin, are clearly neither transitive nor intransitive.

In a digraph for a transitive relation, whenever there is an arrow from one point to a second, and another arrow from the second point to a third, there should also be an arrow from the first to the third. Evidently, this tends to clutter the picture considerably, even more so than for reflexivity. So, often when a relation is known to be transitive we adopt the convention of omitting the 'third arrows' from the digraph, treating them as understood. However, one must be clear about whether such a convention is being used.
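For a finite relation given as a set of pairs, transitivity can also be tested mechanically, which is handy for experimenting with the exercise below. A Python sketch (helper name ours):

def is_transitive(R):
    # True iff whenever (a, b) and (b, c) are in R, so is (a, c)
    return all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)

print(is_transitive({(1, 2), (2, 3), (1, 3)}))   # True
print(is_transitive({(1, 2), (2, 3)}))           # False: (1, 3) is missing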

Exercise 2.4.2
(a) Can a relation be both transitive and intransitive? If so, when?
(b) For each of the following relations over persons, state whether it is transitive, intransitive or neither: sister of, sibling of, parent of, ancestor of.
(c) Show that the intersection of any two transitive relations is transitive.
(d) Give an example to show that the union of two transitive relations need not be transitive.
(e) Is the composition of two transitive relations always transitive? Give a verification or a counterexample.


2.5 Equivalence Relations and Partitions

We now look at properties that are of particular interest when we want to express a notion of equivalence or similarity between items.

2.5.1 Symmetry

We say that R is symmetric iff whenever (a,b) ∈ R, then (b,a) ∈ R. For example, the relation of identity (alias equality) over the natural numbers is symmetric: whenever a = b then b = a. So is the relation of sharing a common prime factor: if a has some prime factor in common with b, then b has a prime factor (indeed, the same one) in common with a. On the other hand, neither ≤ nor < over the natural numbers is symmetric.

The way in which < fails symmetry is not the same as the way in which ≤ fails it! When n < m then we never have m < n, and the relation is said to be asymmetric. In contrast, when n ≤ m we sometimes have m ≤ n (when m = n) but sometimes m ≰ n (when m ≠ n), and so this relation is neither symmetric nor asymmetric.

In a digraph for a symmetric relation, whenever there is an arrow from one point to a second, then there is an arrow going back again. A natural convention reduces clutter in the diagram, without possible ambiguity: use double-headed arrows, or links without heads.

Exercise 2.5.1
(a) What does symmetry mean in terms of the tabular representation of a relation?
(b) Can a relation R ever be both symmetric and asymmetric? If so, when?
(c) For each of the following relations over persons, state whether it is symmetric, asymmetric or neither: sister of, sibling of, parent of, ancestor of.
(d) Show that the converse of any symmetric relation is symmetric, as are also the intersection and union of any symmetric relations. How does this compare with the situation for transitivity? Why the difference?
(e) Is the composition of two symmetric relations always symmetric? Give a verification or a counterexample.

2.5.2 Equivalence Relations

When a relation is both reflexive and symmetric, it is sometimes called a similarity relation. When it has all three properties – transitivity, symmetry and reflexivity – it is called an equivalence relation.

Equivalence relations are often written using a symbol such as ∼ to bring out the idea that they behave rather like identity. And like identity, they are usually written by infixing, that is, as a ∼ b, rather than in basic set notation as in (a,b) ∈ R or by prefixing as in Rab.
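Putting the three defining properties together, here is a hedged Python sketch of a test for whether a finite relation is an equivalence relation over a given set (the helper name and the sample relation are our own):

def is_equivalence(R, A):
    reflexive  = all((a, a) in R for a in A)
    symmetric  = all((b, a) in R for (a, b) in R)
    transitive = all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)
    return reflexive and symmetric and transitive

A = {0, 1, 2, 3}
same_parity = {(a, b) for a in A for b in A if a % 2 == b % 2}
print(is_equivalence(same_parity, A))            # True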


Exercise 2.5.2 (with partial solution)
(a) Give another example of an equivalence relation over the natural numbers, and one over the set of all polygons in geometry.
(b) (i) Check that the identity relation over a set is an equivalence relation over that set. (ii) Show also that it is the least equivalence relation over that set, in the sense that it is included in every equivalence relation over the set. (iii) What is the largest equivalence relation over a set?
(c) Over a set of people, is the relation of having the same nationality an equivalence relation?

(d) What does the digraph of an equivalence relation look like?

Solutions to (a), (b), (c)
(a) For example, the relation of having the same parity (that is, both even, or else both odd) is an equivalence relation. For polygons, the relation of having the same number of sides is an equivalence relation. Of course, there are many others in both domains.
(b) (i) Identity is reflexive over any set because always a = a. It is symmetric because whenever a = b then b = a. It is transitive because whenever a = b and b = c then a = c. (ii) It is included in every other equivalence relation over the same set. This is because every equivalence relation is reflexive, that is, a ∼ a for every a ∈ A, so that whenever a = b we have a ∼ b. (iii) The largest equivalence relation over A is A². Indeed it is the largest relation over A², and it is easily checked to be an equivalence relation.
(c) In the case that everyone in A has exactly one nationality, then this is an equivalence relation. But if someone in A has no nationality, then reflexivity fails, and if someone has more than one nationality, transitivity may fail. In contrast, the relation of 'having the same set of nationalities' is always an equivalence relation.

Alice Box: Identity or Equality?

Alice: You speak of people being identical to each other, but for numbers and other mathematical objects, I more often hear my instructors talk of equality. Are these the same?

Hatter: Yes, they are. Number theorists talk of equality, while logicians tend to speak of identity. But they are the same: identity is equal to equality, equality is identical with identity. However, you should be careful.

Alice: Why?
Hatter: In informal talk, both terms are sometimes used, rather loosely, to indicate that the items concerned agree on all properties that are considered important in the context under discussion. Strictly speaking, this is not full identity, only a 'tight' equivalence relation, perhaps also satisfying certain 'congruence' conditions. But more on this later.


Fig. 2.2 Diagram for a partition

There is a very important difference between identity and other equivalence relations, which is a consequence of being the smallest. When a = b, the items a and b are elements of exactly the same sets and have exactly the same properties. Any mathematical statement that is true of one is true of the other. So in any mathematical statement we may replace one by the other without loss of truth.

This is one of the fundamental logical properties of identity. To illustrate it, return to the last exercise, part (b) (ii). There we said that if a = b then whenever a ∼ a we have a ∼ b. Here we were taking the statement a ∼ a and replacing the second a in it by b. In school you may have learned this under the rather vague title of 'substitution of equals gives equals'.

2.5.3 Partitions

Your paper correspondence is in a mess – one big heap. You need to classify it, putting items into mutually exclusive but together exhaustive categories, but you don't want to go into subcategories. Mathematically speaking, this means that you want to create a partition of the set of all the items in the heap.

We now define this concept. Let A be any non-empty set, and let {Bi}i∈I be a collection of subsets of A.
• We say that the collection exhausts A iff ∪{Bi}i∈I = A, that is, iff every element of A is in at least one of the Bi. Using the notation that we introduced earlier for the universal and existential quantifiers, iff ∀a ∈ A ∃i ∈ I (a ∈ Bi).
• We say that the collection is pairwise disjoint iff every two distinct sets Bi in the collection are disjoint. That is, for all i,j ∈ I, if Bi ≠ Bj, then Bi ∩ Bj = ∅. Using logical notation, iff ∀i,j ∈ I (Bi ≠ Bj → Bi ∩ Bj = ∅).
A partition of A is defined to be any collection {Bi}i∈I of non-empty subsets of A that are pairwise disjoint and together exhaust A. The sets Bi are called the cells (or sometimes, blocks) of the partition. We can diagram a partition in the manner of Fig. 2.2.
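For finite collections, the two conditions, together with non-emptiness of the cells, can be checked directly. A Python sketch (an illustration only, with the cells given as a list of sets):

def is_partition(cells, A):
    non_empty = all(len(c) > 0 for c in cells)
    exhausts  = set().union(*cells) == set(A)
    disjoint  = all(c.isdisjoint(d) for c in cells for d in cells if c != d)
    return non_empty and exhausts and disjoint

A = {1, 2, 3, 4}
print(is_partition([{1, 4}, {3, 2}], A))         # True
print(is_partition([{1, 2}, {2, 3}, {4}], A))    # False: not pairwise disjoint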


Exercise 2.5.3 (with solution)
(a) Which of the following are partitions of A = {1,2,3,4}? (i) {{1,2}, {3}}, (ii) {{1,2}, {2,3}, {4}}, (iii) {{1,2}, {3,4}, ∅}, (iv) {{1,4}, {3,2}}.
(b) We say that one partition of a set is at least as fine as another iff every cell of the former is a subset of a cell of the latter. What is the finest partition of A = {1,2,3,4}? What is the least fine partition? In general, if A has n ≥ 1 elements, how many cells in its finest and least fine partitions?

Solution
(a) Only (iv) is a partition of A. In (i) the cells do not exhaust A; in (ii) they are not pairwise disjoint; in (iii) one cell is the empty set, which is not allowed in a partition.
(b) The finest partition of A = {1, 2, 3, 4} is {{1},{2},{3},{4}}, and the least fine is {{1,2,3,4}}. In general, if A has n ≥ 1 elements, then its finest partition has n cells, while its least fine partition has only 1 cell.

2.5.4 The Partition/Equivalence Correspondence

It turns out that partitions and equivalence relations are two sides of the same coin. On the one hand, every partition of a set A determines an equivalence relation over A in a natural manner; on the other hand, every equivalence relation over a non-empty set A determines, in an equally natural way, a partition over A. The verification of this is rather abstract, and hence challenging, but you should be able to follow it.

We begin with the left-to-right direction. Let A be any set, and let {Bi}i∈I be a partition of A. We define the relation R associated with this partition by putting (a,b) ∈ R iff a and b are in the same cell, that is, iff ∃i ∈ I with a,b ∈ Bi. We need to show that it is an equivalence relation:
• By the exhaustion condition, R is reflexive over A. Since the partition exhausts A, we have ∀a ∈ A, ∃i ∈ I with a ∈ Bi and so immediately ∃i ∈ I with both of a,a ∈ Bi.
• By the definition, R is symmetric. When (a,b) ∈ R, then ∃i ∈ I with a,b ∈ Bi, so immediately b,a ∈ Bi, and thus (b,a) ∈ R.
• Finally, by the disjointness condition, R is transitive. Suppose (a,b) ∈ R and (b,c) ∈ R; we want to show (a,c) ∈ R. Since (a,b) ∈ R, ∃i ∈ I with both a,b ∈ Bi, and since (b,c) ∈ R, ∃j ∈ I with b,c ∈ Bj. But since the cells of the partition are pairwise disjoint, either Bi = Bj or Bi ∩ Bj = ∅. Since b ∈ Bi ∩ Bj the latter is not possible, so Bi = Bj. Hence both a,c ∈ Bi, which gives us (a,c) ∈ R as desired.

Now for the right-to-left direction, let ∼ be an equivalence relation over a non-empty set A. For each a ∈ A we consider its image ∼(a) = {x ∈ A: a ∼ x} under the relation ∼. This set is usually written as |a|∼ or simply as |a| when the equivalence relation is understood, and that is the notation we will use here. We consider the collection of these sets |a| for a ∈ A and verify that it has the properties needed to be a partition:
• Non-emptiness. Since ∼ is a relation over A, each set |a| is a subset of A, and it is non-empty because ∼ is reflexive over A.


• Exhaustion. The sets |a| together exhaust A, that is, ∪{|a|}a∈A = A, again because ∼ is reflexive over A.
• Pairwise disjointness. We argue contrapositively. Let a,a′ ∈ A and suppose |a| ∩ |a′| ≠ ∅. We need to show that |a| = |a′|. For that, it suffices to show that |a| ⊆ |a′| and conversely |a′| ⊆ |a|. We do the former; the latter is similar. Let x ∈ |a|, that is, a ∼ x; we need to show that x ∈ |a′|, that is, that a′ ∼ x. By the initial supposition, there is a y with y ∈ |a| ∩ |a′|, so y ∈ |a| and y ∈ |a′|, so a ∼ y and a′ ∼ y. Since a ∼ y, symmetry gives us y ∼ a, and so we may apply transitivity twice: first to a′ ∼ y and y ∼ a to get a′ ∼ a, and then to that and a ∼ x to get a′ ∼ x as desired.
Note that in this verification we appealed to all three of the conditions reflexivity, symmetry and transitivity. You can't get away with less.

Alice Box: Similar Verifications

Alice: In the verification, you said 'We do the former; the latter is similar'. But how do I know that it is really similar if I don't spell it out?

Hatter: Strictly speaking, you don't! But when two cases under consideration have the same structure, and the arguments are exactly the same except for, say, a permutation of variables (in this case, we permuted a and a′), then we may take the liberty of omitting the second so as to focus on essentials.

Alice: Isn't it dangerous?
Hatter: Of course, there can be small but crucial differences, so care is needed. Writers should check all cases before omitting any, and should present things in such a way that the reader can always reconstruct missing parts in a routine way.

When ∼ is an equivalence relation over A, the sets |a| = {x ∈ A: a ∼ x} for a ∈ A (i.e. the cells of the corresponding partition) are called the equivalence classes of ∼.
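Both directions of the correspondence can be computed for finite sets. A Python sketch (helper names ours) that builds the equivalence classes of a relation and, conversely, the relation determined by a partition:

def equivalence_classes(A, related):
    # the cells |a| = {x in A : a ~ x}, with ~ given as a two-argument test
    cells = []
    for a in A:
        cell = frozenset(x for x in A if related(a, x))
        if cell not in cells:
            cells.append(cell)
    return cells

def relation_of_partition(cells):
    # (a, b) is in the relation iff a and b lie in the same cell
    return {(a, b) for cell in cells for a in cell for b in cell}

def same_mod3(a, b):
    return a % 3 == b % 3

A = range(6)
cells = equivalence_classes(A, same_mod3)
print(cells)   # the residue classes mod 3: {0,3}, {1,4}, {2,5}
print(relation_of_partition(cells) == {(a, b) for a in A for b in A if same_mod3(a, b)})   # True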

Exercise 2.5.4
(a) Let A be the set of all positive integers from 1 to 10. Consider the partition into evens and odds. Write this partition by enumeration as a collection of sets, then describe the corresponding equivalence relation in words, and write it by enumeration as a set of ordered pairs.
(b) Let A be the set of all positive integers from 2 to 16. Consider the relation of having exactly the same prime factors (so e.g. 6 ∼ 12 since they have the same prime factors 2 and 3). Identify the associated partition by enumerating it as a collection of subsets of A.
(c) How would you describe the equivalence relation associated with the finest (respectively: least fine) partition of A?


2.6 Relations for Ordering

Suppose that we want to use a relation to order things. It is natural to require it to be transitive. We can choose it to be reflexive, in which case we get an inclusive order (like ≤ over the natural numbers or ⊆ between sets); or to be irreflexive, in which case we get what is usually known as a strict order (like < or ⊂). In this section, we examine some special kinds of inclusive order, and then look at strict orders.

2.6.1 Partial Order

Consider a relation that is both reflexive and transitive. What other properties do we want it to have if it is to serve as an ordering?

Not symmetry, for that would make it an equivalence relation. Asymmetry? Not quite, for a reflexive relation over a non-empty set can never be asymmetric. What we need is the closest possible thing to asymmetry: whenever (a,b) ∈ R, then (b,a) ∉ R provided that a ≠ b. This property is known as antisymmetry, and it is usually formulated in a contraposed (and thus equivalent) manner: whenever (a,b) ∈ R and (b,a) ∈ R then a = b.

A relation R over a set A that is reflexive (over A), transitive, and also antisymmetric is called a partial order (or partial ordering) of A, and the pair (A,R) is called for short a poset. It is customary to write a partial ordering as ≤ (or some square or curly variant of the same sign) even though, as we will soon see, the familiar relation 'less than or equal to' over the natural numbers (or any other number system) has a further property that not all partial orderings share. Two examples of partial order:
• For sets, the relation ⊆ of inclusion is a partial order. As we already know, it is reflexive and transitive. We also have antisymmetry since whenever A ⊆ B and B ⊆ A, then A = B.
• In arithmetic, an important example is the relation of being a divisor of, over the positive integers, that is, the relation R over N⁺ defined by (a,b) ∈ R iff b = ka for some k ∈ N⁺.
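For a finite set, all three defining properties can be checked mechanically. Here is a minimal Python sketch (the helper names are ours) that tests reflexivity, antisymmetry and transitivity, applied to the divisibility example just given, restricted to the integers from 1 to 12:

def is_reflexive(A, R):
    return all((a, a) in R for a in A)

def is_antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def is_partial_order(A, R):
    return is_reflexive(A, R) and is_antisymmetric(R) and is_transitive(R)

A = set(range(1, 13))
divides = {(a, b) for a in A for b in A if b % a == 0}   # 'a is a divisor of b'
print(is_partial_order(A, divides))                      # True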

Exercise 2.6.1
(a) Check that the relation of being a divisor of, over the positive integers, is indeed a partial order.
(b) What about the corresponding relation over Z, that is, the relation R over Z defined by (a,b) ∈ R iff b = ka for some k ∈ Z?
(c) And the corresponding relation over Q⁺, that is, the relation R over Q⁺ defined by (a,b) ∈ R iff b = ka for some k ∈ Q⁺?
(d) Consider the relation R over people defined by (a,b) ∈ R iff either b = a or b is descended from a. Is it a poset?
(e) Show that the relation of 'being at least as fine as', between partitions of a given set A, is a partial order.


2.6.2 Linear Orderings

A reflexive relation R over a set A is said to be complete over A iff for all a,b ∈ A, either (a,b) ∈ R or (b,a) ∈ R. A poset that is also complete is often called a linear (or total) ordering.

Clearly the relation ≤ over N is complete and so a linear ordering, since for all m,n ∈ N either m ≤ n or n ≤ m. So is the usual lexicographic ordering of words in a dictionary. On the other hand, whenever a set A has more than one element, then the relation ⊆ over P(A) is not complete. Reason: Take any two distinct a,b ∈ A, and consider the singletons {a},{b}; they are both elements of P(A), but neither {a} ⊆ {b} nor {b} ⊆ {a}, because a ≠ b.

Exercise 2.6.2 (with solution)
(a) Give two more linear orderings of N.
(b) Which of the following are linear orderings over an arbitrary set of people: (i) is at least as old as, (ii) is identical to or a descendant of?

Solution
(a) There are plenty, but here are two: (i) the relation ≥, that is, the converse of ≤, (ii) the relation that puts all odd positive integers first, and then all the even ones, each of these blocks being ordered separately as usual.
(b) (i) Yes, it meets all the requirements, (ii) no, since it is not complete.

2.6.3 Strict Orderings

Whenever we have a reflexive relation ≤, we can always look at what is known as its strict part. It is usually written as < (or a square or curly variant of this) even though it need not have all the properties of 'less than' over the usual number systems. The definition is as follows: a < b iff a ≤ b but a ≠ b. In language that looks like gibberish but makes perfect sense if you read it properly: (<) = (≤ ∩ ≠).

Exercise 2.6.3 (1) (with solution)
(a) Show that every asymmetric relation over a set A is irreflexive.
(b) Show that when ≤ is a partial ordering over a set A, then its strict part < is (i) asymmetric and (ii) transitive.

Solution
(a) We argue contrapositively. Suppose that < is not irreflexive. Then there is an a ∈ A with a < a. Thus both a < a and a < a, so that asymmetry fails.
(b)(i) For asymmetry, let ≤ be a partial ordering and suppose a < b and b < a, where < is the strict part of ≤. We get a contradiction. By the suppositions, a ≤ b, b ≤ a and a ≠ b, which is impossible by the antisymmetry of ≤.
(b)(ii) For transitivity, suppose a < b and b < c; we want to show that a < c. By the suppositions, a ≤ b and b ≤ c but a ≠ b and b ≠ c. Transitivity of ≤ thus gives


a ≤ c; it remains to check that a ≠ c. Suppose a = c; we get a contradiction. Since b ≤ c and a = c we have b ≤ a, so by the antisymmetry of ≤, using also a ≤ b, we have a = b, contradicting a ≠ b.

Alice Box: Proof by Contradiction (Reductio ad Absurdum)

Alice: There is something in the solution to this exercise that worries me. We supposed the opposite of what we wanted to show. For example, in (b)(i), to show that the relation < there is asymmetric, we supposed that both a < b and b < a, which is the opposite of what we are trying to prove.

Hatter: Indeed we did, and the goal of the argument changed when we made the supposition: it became one of deriving a contradiction. In the exercise, we got our contradictions very quickly; sometimes it takes more argument.

Alice: Will any contradiction do?
Hatter: Any contradiction you like. Any pair of propositions α and ¬α. Such pairs are as bad as each other, and thus just as good for the purposes of the proof.

Alice: Is this procedure some new invention of modern logicians?
Hatter: Not at all. It was well known to the ancient Greeks, and can be found in Euclid. In the Middle Ages it was taught under its Latin name reductio ad absurdum, sometimes abbreviated to RAA, and this name is still often used.

Reductio ad absurdum is a universal proof method, in the sense that it is always possible to use it. Some people like to use it whenever they can; others do so only when they get stuck without it. Most mathematicians are somewhere in the middle: they use it when they see that it can make the argument shorter or more transparent.

Exercise 2.6.3 (2) Do part (b) of the preceding exercise again, without using proof by contradiction, for example, by proving the contrapositive. Compare the solutions, and decide which you prefer.

In the preceding chapter we observed that the relation of immediate inclusion between finite sets may be represented by a Hasse diagram. We can do the same for any partial order ≤ over a finite set. The links of the diagram, read from below to above, represent the relation of being an immediate predecessor of. This is the relation that holds from a to b iff a < b but there is no x with a < x < b, where < is the strict part of ≤.

Once we have a Hasse diagram for the immediate predecessor part of a partial ordering, we can read off from it the entire relation: a ≤ b iff there is an ascending path (with at least one element) from a to b. Evidently, this is a much more economical representation than drawing the digraph for the entire partial ordering.


For this reason, we loosely call it the Hasse diagram of the partial ordering itself, and similarly for its strict part.
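For a finite poset the immediate-predecessor relation can be computed directly from the definition; here is a small Python sketch (names ours) that could be used to generate the links of a Hasse diagram:

def hasse_edges(A, leq):
    """Pairs (a, b) with a an immediate predecessor of b: a < b and no x has a < x < b."""
    lt = {(a, b) for (a, b) in leq if a != b}             # the strict part of leq
    return {(a, b) for (a, b) in lt
            if not any((a, x) in lt and (x, b) in lt for x in A)}

A = set(range(1, 13))
leq = {(a, b) for a in A for b in A if b % a == 0}        # divisibility on 1..12
print(sorted(hasse_edges(A, leq)))
# includes covering pairs such as (1, 2), (2, 4), (2, 6), (3, 6), but not (1, 4) or (2, 8)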

Exercise 2.6.3 (3) Draw a Hasse diagram for (the immediate predecessor part of) the relation of being an exact divisor of, over the set of positive integers up to 13.

We have seen in an exercise that when ≤ is a partial ordering over a set A, then its strict part < is asymmetric and transitive. We also have a converse: whenever a relation < is both asymmetric and transitive, then the relation ≤ defined by putting a ≤ b iff either a < b or a = b is a partial order. Given this two-way connection, asymmetric transitive relations are often called strict partial orders.

Exercise 2.6.3 (4)
(a) Show, as claimed in the text, that when < is a transitive asymmetric relation, then the relation ≤ defined by putting a ≤ b iff either a < b or a = b is a partial order.
(b) Suppose we start with a partial order ≤, take its strict part <, and then define from it a partial order ≤′ by the rule in (a). Show that ≤′ is the same relation as ≤.
(c) Suppose we start with a strict partial order <, define from it a partial order ≤ by the rule in (a) and then take its strict part <′. Show that <′ is the same relation as <.

Hint: In part (b), first show that ≤′ is a subrelation of ≤, then show the converse. Use a similar procedure for the strict relations in (c).

2.7 Closing with Relations

In this section, we explain two very useful concepts, the transitive closure of an arbitrary relation, and the closure of a set under a relation.

2.7.1 Transitive Closure of a Relation

Suppose you are given a relation R that is not transitive, but which you want to 'make' transitive. Of course, you cannot change the status of R itself, but you can expand it to a larger relation that satisfies transitivity. In general, there will be many of these. For example, whenever R ⊆ A², then A² itself is a transitive relation that includes R. But there will be much smaller ones, and it is not difficult to see that there must always be a unique smallest transitive relation that includes R; it is called the transitive closure of R and is written as R*.

There are several equivalent ways of defining R*. We give three, which we might call the finite path formulation (using finite sequences), the bottom-up definition (using union of sets), and the top-down one (using intersection of sets).

The finite path definition puts (a,b) ∈ R* iff there is a finite sequence x₁, ..., xₙ (n ≥ 2) with a = x₁, xₙ = b, and xᵢRxᵢ₊₁ for all i with 1 ≤ i < n.


For the 'bottom-up' definition, we begin by defining the relations R₀, R₁, ... recursively as follows:

R₀ = R

Rₙ₊₁ = Rₙ ∪ {(a,c): ∃x with (a,x) ∈ Rₙ and (x,c) ∈ R}.

We then put R* to be their union: R* = ∪{Rₙ: n ∈ N}. In this way, R* is built up by successively adding pairs (a,c) whenever (a,x) is in the relation constructed so far and (x,c) ∈ R. In effect, the definition tells us to keep on doing this until there are no such pairs (a,c) still needing to be put in, or indefinitely if there are always still such pairs.

The 'top-down' account defines R* as the intersection of the collection of all transitive relations that include R. That is: when R is a relation over A² then R* = ∩{S: R ⊆ S ⊆ A² and S is transitive}.

Why bother with three definitions of the same thing? They provide three quite different ways of looking at R*, and three distinct ways of proving facts about it. Sometimes it is more convenient to use one definition, sometimes another, and you are encouraged to have all three in your toolkit.
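For a finite relation, the bottom-up definition is also a recipe for computation: keep adding the missing pairs until nothing new appears. A minimal Python sketch (function names ours):

def transitive_closure(R):
    """Bottom-up computation of R*: add (a, c) whenever (a, x) is present and (x, c) is in R."""
    closure = set(R)
    while True:
        new_pairs = {(a, c) for (a, x) in closure for (y, c) in R if x == y}
        if new_pairs <= closure:          # nothing left to add: this is R*
            return closure
        closure |= new_pairs

parent_of = {("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dan")}
print(sorted(transitive_closure(parent_of)))
# the 'ancestor of' relation: the three given pairs plus
# ("Alice", "Carol"), ("Alice", "Dan") and ("Bob", "Dan")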

Exercise 2.7.1 (1) (with partial solution)
(a) Identify the transitive closures of the following relations: (i) parent of, (ii) mother of, (iii) descendant of.
(b) Show that when R* is defined as the intersection of the collection of all transitive relations that include R, it is indeed the least transitive relation that includes R, where 'least' means least under set inclusion. Hint: It suffices to check that (i) R ⊆ R*, (ii) R* is transitive, and (iii) R* ⊆ S for every transitive relation S with R ⊆ S.
(c) Write the following assertions in the notation for converse (see Sect. 2.3.1) and transitive closure, and determine whether they are true or false: (i) the converse of the transitive closure of a relation equals the transitive closure of the converse of that relation, (ii) the intersection of the transitive closures of two relations equals the transitive closure of their intersection.

Solution to (a) (i) Ancestor of. (ii) Ancestor of in the female line (there is no single word for this in English). (iii) Descendant of (as this relation is already transitive, it coincides with its transitive closure).

Exercise 2.7.1 (2)
(a) Draw an arrow diagram to illustrate the definition of Rₙ₊₁ out of Rₙ and R.
(b) Can you show that the three definitions of transitive closure are equivalent? This is quite a challenging exercise, and if you run into difficulty now you may wish to return after the chapter on recursion and induction.


2.7.2 Closure of a Set Under a Relation

Transitive closure is in fact a particular instance of a more general construction of great importance – the closure of an arbitrary set under an arbitrary relation.

To introduce it, we recall the definition of image from earlier in this chapter (Sect. 2.3.4). The image R(X) of a set X under a relation R is the set of all b such that (x,b) ∈ R for some x ∈ X; briefly R(X) = {b: ∃x ∈ X, (x,b) ∈ R}.

Now, suppose we are given a relation R (not necessarily transitive) over a set B, and a subset A ⊆ B. We define the closure R[A] of A under R to be the least subset of B that includes A and also includes R(X) whenever it includes X.

Again, this is a top-down definition, with 'least' understood as the result of intersection. It can also be expressed bottom-up, as follows:

A₀ = A

Aₙ₊₁ = Aₙ ∪ R(Aₙ) for each natural number n

R[A] = ∪{Aₙ: n ∈ N}.

The closure of A under R is thus constructed by beginning with A itself, and at each stage adding in the elements of the image of the set so far constructed. Closure thus differs from image in two respects: the closure of a set under a relation always includes that set, and we apply the relation not just once but over and over again.
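Once more, for finite relations the bottom-up form translates into a short program; a Python sketch (names ours), using a cut-off so that the example terminates:

def image(R, X):
    return {b for (x, b) in R if x in X}

def closure_of_set(A, R):
    """Bottom-up computation of R[A]: start with A and keep adding the image of what is built so far."""
    closed = set(A)
    while True:
        extra = image(R, closed) - closed
        if not extra:                     # no new elements: this is R[A]
            return closed
        closed |= extra

# The successor relation (m, n) in R iff n = m + 1, restricted to 0..20 so the loop stops.
R = {(m, m + 1) for m in range(20)}
print(sorted(closure_of_set({2, 5}, R)))  # [2, 3, 4, ..., 20]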

Exercise 2.7.2 (with solution) In the set N of natural numbers, what is the closure R[A] of the set A = {2,5} under the following relations R ⊆ N²: (a) (m,n) ∈ R iff n = m + 1, (b) (m,n) ∈ R iff n = m − 1, (c) (m,n) ∈ R iff n = 2m, (d) (m,n) ∈ R iff n = m/2, (e) ≤, (f) <.

Solution
(a) {n ∈ N: n ≥ 2}, (b) {n ∈ N: n ≤ 5} = {0,1,2,3,4,5}, (c) {2,4,8, ...; 5,10,20, ...}, (d) {1,2,5}, (e) {n ∈ N: n ≥ 2}, (f) {n ∈ N: n ≥ 2}.

Comment For part (d), if you wrote {1} you forgot that A ⊆ R[A] for every relation R. Similarly if you wrote {n ∈ N: n > 2} for (f).

Notation and terminology: When the relation R is understood, we sometimes write the closure R[A] more briefly as A⁺, and say that R generates the closure from A.

End-of-Chapter Exercises

Exercise 2.1: Cartesian Products
(a) Show that A × (B∩C) = (A × B)∩(A × C).
(b) Show that A × (B∪C) = (A × B)∪(A × C).


Exercise 2.2: Domain, Range, Join, Composition, Image
(a) Consider the relations R = {(1,7), (3,3), (13,11)} and S = {(1,1), (1,7), (3,11), (13,12), (15,1)} over the positive integers. Identify dom(R∩S), range(R∩S), dom(R∪S), range(R∪S).
(b) In the same example, identify join(R,S), join(S,R), S∘R, R∘S, R∘R, S∘S.
(c) In the same example, identify R(X) and S(X) for X = {1,3,11} and X = ∅.
(d) Explain how to carry out composition by means of join and projection.

Exercise 2.3: Reflexivity and Transitivity
(a) Show that R is reflexive over A iff I_A ⊆ R. Here I_A is the identity relation over A, defined in an exercise in Sect. 2.1.3.
(b) Show that the converse of a relation R that is reflexive over a set A is also reflexive over A.
(c) Show that R is transitive iff R∘R ⊆ R.
(d) In a previous exercise, we showed that the intersection of any two transitive relations is transitive but their union may not be. (i) Show that the converse and composition of transitive relations are also transitive. (ii) Give an example to show that the complement of a transitive relation need not be transitive.
(e) A relation R is said to be acyclic iff there are no a₁, ..., aₙ (n ≥ 2) such that each (aᵢ,aᵢ₊₁) ∈ R and also aₙ = a₁. (i) Show that a transitive irreflexive relation is always acyclic. (ii) Show that every acyclic relation is irreflexive. (iii) Give an example of an acyclic relation that is not transitive.

Exercise 2.4: Symmetry, Equivalence Relations and Partitions
(a) Show that the following three conditions are equivalent: (i) R is symmetric, (ii) R ⊆ R⁻¹, (iii) R = R⁻¹.
(b) Show that a relation that is reflexive over a non-empty set can never be intransitive.
(c) Show that if R is reflexive over A and also transitive, then the relation S defined by (a,b) ∈ S iff both (a,b) ∈ R and (b,a) ∈ R is an equivalence relation.
(d) Show that the intersection of two equivalence relations is an equivalence relation, but that this is not the case for unions. Hint: Apply the results of exercises on reflexivity, transitivity, and symmetry in this chapter.
(e) Enumerate all the partitions of A = {1,2,3} and draw a Hasse diagram for them under fineness.
(f) Show that one partition of a set is at least as fine as another iff the equivalence relation associated with the former is a subrelation of the equivalence relation associated with the latter.

Exercise 2.5: Antisymmetry, Partial Order, Linear Order
(a) Let R be any transitive relation over a set A. Define S over A by putting (a,b) ∈ S iff either a = b or both (a,b) ∈ R and (b,a) ∉ R. Show that S partially orders A.
(b) Show that the converse of a linear order is linear but that the intersection and composition of two linear orders need not be linear.


(c) Show that the identity relation over a set A is the unique partial order of A that is also an equivalence relation.
(d) Let A be a set and ≤ a partial ordering of A. An element a ∈ A is said to be a minimal element of A (under ≤) iff there is no b ∈ A with b < a. On the other hand, an element a ∈ A is said to be a least element of A (under ≤) iff a ≤ b for every b ∈ A. Show the following for sets A and partial orderings ≤: (i) Whenever a is a least element of A then it is a minimal element of A. (ii) The converse can fail (give a simple example). (iii) A can have zero, one or more than one minimal elements (give an example of each). (iv) A can have at most one least element, that is, if a least element exists, then it is unique. (v) For linear orderings, A can have at most one minimal element, and if it exists then it is the least element of A. (vi) If A is finite then it must have at least one minimal element. (vii) There are infinite linearly ordered sets without any minimal element.

Exercise 2.6: Strict Orderings
(a) Give examples of relations that are (i) transitive but not asymmetric, and (ii) asymmetric but not transitive.
(b) Show that a relation is antisymmetric iff its strict part is asymmetric.
(c) Show that every transitive acyclic relation is asymmetric.

Exercise 2.7: Closure
(a) Show that always X ∪ R(X) ⊆ R[X].
(b) Show that R[A] = R(A) if R is both reflexive over A and transitive.

Selected Reading

The three books listed at the end of Chap. 1 for sets also have good chapters on relations. In detail:

Bloch ED (2011) Proofs and fundamentals: a first course in abstract mathematics, 2nd edn. Springer, New York, chapter 5
Halmos PR (2001) Naive set theory, new edn. Springer, New York, chapters 6–7
Lipschutz S (1998) Set theory and related topics, Schaum's outline series. McGraw Hill, New York, chapter 3

A gentle introductory account which, like that of Bloch, pays careful attention to underlying logical and heuristic matters, is:

Velleman DJ (2006) How to prove it: a structured approach, 2nd edn. Cambridge University Press, Cambridge, chapter 4

Finally, we mention a chapter from the following text on discrete mathematics written specifically for computer science students:

Hein JL (2002) Discrete structures, logic and computability, 2nd edn. Jones and Bartlett, Sudbury, chapter 4


3 Associating One Item with Another: Functions

Abstract
Functions occur everywhere in mathematics and computer science. In this chapter, we introduce the basic concepts needed in order to work with them.

We begin with the intuitive idea of a function and its mathematical definition as a special kind of relation. We then see how the general concepts for relations that were studied in the previous chapter play out in this particular case (domain, range, restriction, image, closure, composition, inverse) and distinguish some important kinds of function (injective, surjective, bijective) with special behaviour.

These concepts permit us to link functions with counting, via the equinumerosity, comparison and surprisingly versatile pigeonhole principles. Finally, we identify some very simple kinds of function that appear over and again (identity, constant, projection and characteristic functions) and explain the deployment of functions to represent sequences and families.

3.1 What Is a Function?

Traditionally, a function was seen as a rule, often written as an equation, which associates any number (called the argument of the function) with another number, called the value of the function. The concept has for long been used in physics and other sciences to describe processes whereby one quantity (such as the temperature of a gas, the speed of a car) affects another (its volume, its braking distance). In accord with such applications, the argument of the function was sometimes called the 'independent variable', and the value of the function termed the 'dependent variable'. The idea was that each choice for the argument or independent variable causally determines a value for the dependent variable.

Over the last 200 years, the concept of a function has evolved in the direction of greater abstraction and generality. The argument and value of the function need not be numbers – they can be items of any kind whatsoever. The function need


not represent a causal relationship, nor indeed any physical process, although these remain important applications. The function may not even be expressible by any linguistic rule, although all of the ones that we will encounter in this book are. Taken to the limit of abstraction, a function is no more than a set of ordered pairs, that is, a relation that satisfies a certain condition.

What is that condition? We explain with functions of one argument. A one-place function from a set A into a set B is any binary relation R from A to B satisfying the condition that for all a ∈ A there is exactly one b ∈ B with (a,b) ∈ R. The italicized parts of the definition deserve special attention:
• Exactly one ...: This implies that there is always at least one b ∈ B with (a,b) ∈ R, and never more than one.
• For all a ∈ A ...: This implies that the specified source A of the relation is in fact its domain: there is no element a of A that fails to be the first term in some pair (a,b) ∈ R.

In terms of tables, a function from A to B is a relation whose table has exactly one 1 in each row. In terms of digraphs, every point in the A circle has exactly one arrow going out from it to the B circle.
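For finite relations the two conditions can be checked directly; a minimal Python sketch (helper name ours):

def is_function(A, R):
    """A relation R from A to B is a function from A to B iff each a in A is the first
    term of exactly one pair in R."""
    return all(sum(1 for (x, _) in R if x == a) == 1 for a in A)

A = {"a", "b", "c", "d"}
print(is_function(A, {("a", 1), ("b", 2), ("c", 3)}))             # False: d is left out
print(is_function(A, {("a", 1), ("b", 2), ("c", 2), ("d", 1)}))   # True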

Exercise 3.1 (1) (with solution) For each of the following relations from A = {a,b,c,d} to B = {1,2,3,4,5}, determine whether or not it is a function from A to B. Whenever your answer is negative, give your reason.

(i) {(a,1), (b,2), (c,3)}
(ii) {(a,1), (b,2), (c,3), (d,4), (d,5)}
(iii) {(a,1), (b,2), (c,3), (d,5)}
(iv) {(a,1), (b,2), (c,2), (d,1)}
(v) {(a,5), (b,5), (c,5), (d,5)}

Solution (i) No, since d ∈ A, but there is no pair of the form (d,x) in the relation, so that the 'at least one' condition fails. (ii) No, since both (d,4) and (d,5) are in the relation and evidently 4 ≠ 5 (the 'at most one' condition fails). (iii) Yes, each element of the source is related to exactly one element of the target; it does not matter that the element 4 of the target is 'left out'. (iv) Yes, it does not matter that the element 2 of the target is 'hit twice'. (v) Yes, it does not matter that the only element of the target that is hit is 5. This is called the constant function with value 5 from A to B.

Functions are usually referred to with lower case letters f, g, h, .... Since for all a ∈ A there is a unique b ∈ B with (a,b) ∈ f, we may use some familiar terminology and notation. We call the unique b the value of the function f for argument a and write it as f(a). Computer scientists sometimes say that f(a) is the output of the function f for input a. We also write f: A → B to mean that f is a function from A into B. In the case that B = A, we have a function f: A → A from A into itself, and we usually say briefly that f is a function on A.

Strictly speaking, we have only defined one-place functions, from a single set A into a set B. However, functions can have more than one argument; for example, addition and multiplication in arithmetic each have two. We can generalize


the definition accordingly. If A₁, ..., Aₙ are sets, then an n-place function from A₁, ..., Aₙ into B is an (n + 1)-place relation R such that for all a₁, ..., aₙ with each aᵢ ∈ Aᵢ, there is exactly one b ∈ B with (a₁, ..., aₙ, b) ∈ R. One-place functions are then covered as the case where n = 1.

However, the additional generality in the notion of n-place functions may also be seen as merely apparent. We can treat an n-place function from A₁, ..., Aₙ into B as a one-place function from the Cartesian product A₁ × ... × Aₙ into B. For example, addition and multiplication on the natural numbers are usually thought of as two-place functions f and g with f(x,y) = x + y and g(x,y) = x·y, but they may also be thought of as one-place functions from N × N into N with f((x,y)) = x + y and g((x,y)) = x·y. So there will be no real loss in generality when we formulate principles in terms of one-place functions.

Alice Box: Bracket-Chopping

Alice: Typo! Too many brackets there.
Hatter: No! Strictly speaking, the double parentheses are required. We need the outer brackets as part of the notation for functions in general, and the inner ones because the argument is an ordered pair (x,y).

Alice: But surely we can drop the inner ones if we read addition or multiplication as two-place, rather than one-place functions? We still need the outer brackets as part of the notation for functions, but instead of a single ordered pair (x,y) as argument, we have two arguments x,y which are numbers. So in that respect, using n-place functions can actually simplify notation.

Hatter: Nice point. And in computing, we do have to keep our eyes open for brackets. But let's get back to more substantial things ...

There are many occasions like this in mathematics, where a concept or construction may be seen in either of two ways, each with its costs and benefits. One perspective can be taken as more general or basic than another, but also vice versa. We have just seen this for the specific concepts of one-place and many-place functions, but it also comes up for the wider concepts of relation and function themselves. In this book, we have taken the former as basic, defining the latter as a special case. It is also possible to do the reverse, and some texts do; each procedure has its advantages and disadvantages.

To end the section, we draw attention to an important generalization of the notion of a function, which relaxes the definition in one respect. A partial function from a set A to a set B is a binary relation R from A to B such that for all a ∈ A, there is at most one b ∈ B with (a,b) ∈ R. The difference is that there may be elements of A that do not have the relation R to any element of B. Partial functions are also useful in computer science, and we employ them occasionally later, but in this chapter, we focus most of our attention on functions in the full sense of the term. When we speak of functions without qualification, we mean full ones.


Exercise 3.1 (2) (with solution)
(a) Is either of the non-functions in Exercise 3.1 (1) nevertheless a partial function from A to B?
(b) Which of the following relations from Z into Z (see Chap. 1 Sect. 1.6 to recall its definition) is a function on Z? Which of those that fail to be a function from Z to Z is nevertheless a partial function on Z?

(i) {(a, |a|): a ∈ Z}, (ii) {(|a|, a): a ∈ Z}, (iii) {(a, a²): a ∈ Z}, (iv) {(a², a): a ∈ Z}, (v) {(a, a + 1): a ∈ Z}, (vi) {(a + 1, a): a ∈ Z}, (vii) {(2a, a): a ∈ Z}.

Solution
(a) The relation (i) is a partial function from A to B.
(b) (i) Yes, for every a ∈ Z there is a unique b ∈ Z with b = |a|.
(ii) No, for two reasons. First, dom(R) = N ⊂ Z. Second, even within this domain, the 'at most one' condition is not satisfied, since a can be positive or negative. So this is not even a partial function.
(iii) Yes, for every a ∈ Z there is a unique b ∈ Z with b = a².
(iv) No, not even a partial function: same reasons as for (ii).
(v) Yes, for every a ∈ Z there is a unique b ∈ Z with b = a + 1.
(vi) Yes, every x ∈ Z is of the form a + 1 for a unique a ∈ Z, namely, for a = x − 1.
(vii) No, dom(R) is the set of all even (positive or negative) integers only. But it is a partial function.

3.2 Operations on Functions

As functions are relations, all the operations that we introduced in the theory of relations apply to them. However, in this context, some of those operations are more useful than others, or may be expressed in a simpler way or with different terminology, so we begin by reviewing them.

3.2.1 Domain and Range

These two concepts carry over without change. Recall that when R is a relation from A to B, then dom(R) = {a ∈ A: ∃b ∈ B ((a,b) ∈ R)}. When R is in fact a function f, this is expressed as dom(f) = {a ∈ A: ∃b ∈ B (f(a) = b)}. Thus when f: A → B, then dom(f) = A.

Likewise, range(R) = {b ∈ B: ∃a ∈ A ((a,b) ∈ R)}, which for functions comes to range(f) = {b ∈ B: ∃a ∈ A (f(a) = b)}. When f: A → B then range(f) may be B itself or one of its subsets.

Exercise 3.2.1 (with solution)
(a) Identify the domain and range of the three functions in Exercise 3.1 (1), calling them f, g, h.
(b) Can the domain of a function ever be empty? And the range?


Solutions
(a) dom(f) = dom(g) = dom(h) = A; range(f) = {1,2,3,5}, range(g) = {1,2}, range(h) = {5}.
(b) There is just one function with empty domain, namely, the empty function, which is also the unique function with empty range.

3.2.2 Restriction, Image, Closure

The concept of restriction is quite straightforward. Using the general definition from the theory of relations, the restriction of f: A → B to X ⊆ A is the unique function on X into B that agrees with f over X. Notations vary, but one is f_X. Thus f_X: X → B with f_X(a) = f(a) for all a ∈ X. Because this is such a trivial operation it is rarely at the centre of attention, and it is very common to omit the distracting subscript, leaving it to context to make it clear that the domain has been reduced from A to its subset X.

Warning: When f: A → A and X ⊆ A, then the restriction of f to X will always be a function from X to A, but it will not always be a function from X to X, because there may be an a ∈ X ⊆ A with f(a) ∈ A∖X. We will see a simple example of this in the next exercise.

Let f: A → B, that is, let f be a function from A to B, and let X ⊆ A. The image under f of X ⊆ A is the set {b ∈ B: ∃a ∈ X, b = f(a)}, which can also be written more briefly as {f(a): a ∈ X}. Thus, to take limiting cases as examples, the image f(A) of A itself is range(f), and the image f(∅) of ∅ is ∅, while the image of a singleton subset {a} ⊆ A is the singleton {f(a)}. Thus, image is not quite the same thing as value: the value of a ∈ A under f is f(a). However, some texts also use the term 'image' rather loosely as a synonym of 'value'.

When f: A → B and X ⊆ A, the image of X under f is usually written as f(X), and we will follow this standard notation. Just as for relations, there are contexts in which it can be ambiguous. Suppose that A is of mixed type, in the sense that some element X of A is also a subset of A, that is, both X ∈ A and X ⊆ A. Then the expression f(X) becomes ambiguous: in a basic sense it denotes the value of X under f, while in a derivative sense it stands for the image {f(x): x ∈ X} of X under f. In this book, we will rarely be discussing sets of mixed type, and so do not have to worry much about the problem, but in contexts where mathematicians deal with them, they sometimes avoid ambiguity by using the alternative notation f″(X) for the image.

Image should not be confused with closure, which is a concept arising for functions f on a set into itself. Let f: A → A and let X ⊆ A. Then, in a 'top down' version, the closure f[X] of X under f is the least subset of A that includes X and also includes f(Y) whenever it includes Y. Be careful: the word 'includes' here means


'is a superset of' and not 'contains as an element'. Equivalently, in a 'bottom up' or recursive form, we may define A₀ = X and Aₙ₊₁ = Aₙ ∪ f(Aₙ) for each natural number n, and put f[X] = ∪{Aₙ: n ∈ N}. We will make use of this notion in the chapter on recursion and induction.

Exercise 3.2.2 (1) (with solution) In Exercise 3.1 (2), we saw that the relations (i) {(a, |a|): a ∈ Z}, (iii) {(a, a²): a ∈ Z}, (v) {(a, a + 1): a ∈ Z}, (vi) {(a + 1, a): a ∈ Z} are functions from Z to Z. Call them mod, square, successor and predecessor, respectively. Now recall the set N = {0,1,2,3, ...} of the natural numbers.
(a) Determine the image of N under each of these four functions.
(b) Restrict the four functions to N. Which of them are functions into N?
(c) Determine the images and closures of the set A = {−1,0,1,2} under each of the four functions.

Solution
(a) mod(N) = {|n|: n ∈ N} = {n: n ∈ N} = N; square(N) = {n²: n ∈ N} = {0,1,4,9, ...}; successor(N) = {n + 1: n ∈ N} = N⁺; predecessor(N) = {n − 1: n ∈ N} = {−1} ∪ N.
(b) The only one that is not into N is the predecessor function, since 0 − 1 = −1 ∉ N. It is thus a function on N into {−1} ∪ N.
(c) mod(A) = {0,1,2} but mod[A] = A; square(A) = {0,1,4} but square[A] = {−1,0,1,2,4,16,256, ...}; successor(A) = {0,1,2,3} but successor[A] = {−1,0,1,2, ...} = {−1} ∪ N; predecessor(A) = {−2,−1,0,1} but predecessor[A] = {..., −2,−1,0,1,2} = Z⁻ ∪ {0,1,2}.

Comments (1) The restriction of mod: Z → Z to N is clearly the set of all pairs (n,n) with n ∈ N, that is, the function f: N → N that puts f(n) = n for all n ∈ N. This is known as the identity function over N. We will have more to say about identity functions later in the chapter. (2) In part (c), remember that although A is not always included in its image f(A), it is always included in its closure f[A].

3.2.3 Composition

Perhaps the most basic operation to apply to functions is composition. Suppose we are given two relations R ⊆ A × B and S ⊆ B × C, so that the target of R is the source of S. Recall that their composition S∘R is the set of all ordered pairs (a,c) such that there is some x with both (a,x) ∈ R and (x,c) ∈ S. Now consider the case that f is in fact a function from A to B and g is a function from B to C, briefly f: A → B and g: B → C. Then g∘f is the set of all ordered pairs (a,c) such that there is some x with both x = f(a) and c = g(x); in other words, such that c = g(f(a)). It follows immediately that g∘f is a function from A to C, since for every a ∈ A there is a unique c ∈ C with c = g(f(a)).

To sum up, composition is an operation on relations which, when applied to functions f: A → B and g: B → C where the domain of g is the same as the target of f, gives us the function g∘f: A → C defined by putting g∘f(a) = g(f(a)).
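In programming terms, composing two functions is simply nesting the calls; a minimal Python sketch (ours):

def compose(g, f):
    """The composition g o f, defined by (g o f)(a) = g(f(a))."""
    return lambda a: g(f(a))

successor = lambda a: a + 1
double = lambda a: 2 * a

print(compose(double, successor)(3))   # double(successor(3)) = 8
print(compose(successor, double)(3))   # successor(double(3)) = 7, so the order matters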


Composition is associative. Let f: A → B, g: B → C, h: C → D. Then (h∘(g∘f)) = ((h∘g)∘f), since for all a ∈ A, (h∘(g∘f))(a) = h(g(f(a))) = ((h∘g)∘f)(a). In rather tedious detail: on the one hand, (g∘f): A → C with g∘f(a) = g(f(a)), so h∘(g∘f): A → D with (h∘(g∘f))(a) = h(g(f(a))). On the other hand, (h∘g): B → D with (h∘g)(b) = h(g(b)), so (h∘g)∘f: A → D with also ((h∘g)∘f)(a) = h(g(f(a))).

However, composition of functions is not in general commutative. Let f: A → B, g: B → C. Then the composition g∘f is a function, but f∘g is not a function unless also C = A. Even in the case that A = B = C, so that g∘f and f∘g are both functions, they may not be the same function. For example, suppose that A = B = C = {1,2}, with f(1) = f(2) = 1 while g(1) = 2 and g(2) = 1. Then g∘f(1) = g(f(1)) = g(1) = 2, while f∘g(1) = f(g(1)) = f(2) = 1.

Incidentally, we can now see the answer to a question that was bothering Alice in the preceding chapter: why write the composition of relations R and S as S∘R rather than in the reverse order? When R, S are functions f: A → B, g: B → C then S∘R becomes g∘f, which corresponds nicely to the order on the right hand side in the definition of g∘f(a) = g(f(a)).

Exercise 3.2.3
(a) Draw a diagram to illustrate the above example of non-commutativity of composition.
(b) Give an example of familiar functions f,g: N → N such that f∘g ≠ g∘f.
(c) Show that we can generalize the definition of composition a little, in the sense that the composition g∘f: A → C is a function whenever f: A → B and g: B′ → C and B ⊆ B′. Give an example to show that g∘f may not be a function when, conversely, B′ ⊆ B.

3.2.4 Inverse

We recall the notion of the converse (alias inverse) R⁻¹ of a relation: it is the set of all ordered pairs (b,a) such that (a,b) ∈ R. The converse of a relation R is thus always a relation. But when R is a function f: A → B, its converse is sometimes a function, sometimes not, and when it is a function it need not have the whole of B as its domain.

Exercise 3.2.4 (with partial solution)
(a) Let A = {1,2,3} and B = {a,b,c,d}. Let f = {(1,a), (2,a), (3,b)} and g = {(1,a), (2,b), (3,c)}. (i) Explain why neither f⁻¹ = {(a,1), (a,2), (b,3)} nor g⁻¹ = {(a,1), (b,2), (c,3)} is a function from B to A. (ii) Draw a diagram of the example.
(b) Give examples of (i) a function f: Z → Z whose inverse is not a function, (ii) a function g: N → N whose inverse is not a function.

Solution to (a)(i) f⁻¹ is not a function at all, because we have both (a,1), (a,2) in it. On the other hand, g⁻¹ is a function, but not from B to A: rather, it is a function from the proper subset B′ = {a,b,c} ⊂ B to A. It is thus a partial function from B to A.


Alice Box: Converse or Inverse?

Alice: In the preceding chapter, you talked of converses, now of inverses. Why is that?

Hatter: The general notions of a relation and a function were developed rather separately before being linked up by defining functions as special relations (as here) or relations as functions into the two-element set {0,1} (as in some other presentations). The term 'converse' grew up in the former community, 'inverse' in the latter.

3.3 Injections, Surjections, Bijections

The status of the inverse of a function, as just a relation or itself a function, is closely linked with two further concepts, injectivity and surjectivity, together constituting bijectivity, which we now examine.

3.3.1 Injectivity

Let f: A → B. We say that f is injective (alias one-one) iff whenever a ≠ a′, then f(a) ≠ f(a′). In words, iff it takes distinct arguments to distinct values (distinct inputs to distinct outputs). Contrapositively, iff whenever f(a) = f(a′), then a = a′. In yet other words, iff for each b ∈ B, there is at most one a ∈ A with f(a) = b.

Of the two functions f, g described in Exercise 3.2.4, f is not injective, since f(1) = f(2) although 1 ≠ 2. However, g is injective since it takes distinct arguments to distinct values.

Exercise 3.3.1 (1) (with partial solution)
(a) In Exercise 3.1 (2) we saw that the relations (i) {(a, |a|): a ∈ Z}, (iii) {(a, a²): a ∈ Z}, (v) {(a, a + 1): a ∈ Z}, (vi) {(a + 1, a): a ∈ Z} are functions over Z, that is, from Z to Z, called mod, square, successor and predecessor, respectively. Which of them are injective?
(b) What does injectivity mean in terms of a digraph for the function?
(c) What does injectivity mean in terms of a table for the function?
(d) Show that the composition of two injective functions is injective, and give examples to show that the failure of injectivity in either of the two components can lead to failure of injectivity for the composition.

Solution to (a) Over Z, mod is not injective, since, for example, |1| = |−1|; likewise for square since, for example, 3² = (−3)². On the other hand, successor is injective, since a + 1 = a′ + 1 implies a = a′ for all a,a′ ∈ Z. Similarly, predecessor is injective, since a = a′ implies a + 1 = a′ + 1.


Comment This exercise brings out the importance of always keeping clearly in mind the intended domain of the function when checking whether it is injective. Take for example the function of squaring. As we have just seen, when understood with domain Z, it is not injective. But when understood with domain N (i.e. as the restriction of the former function to N) then it is injective, since for all natural numbers a,b, a² = b² implies a = b.

Note that the injectivity of a function from A to B is not enough to guarantee that its inverse is a function from B to A. For example, as noted in the comments on the last exercise, the function of squaring on N (rather than on Z) is injective. But its inverse is not a function on N, since there are elements of N, for example, 5, which are not in its domain, not being the square of any natural number.

Nevertheless, injectivity gets us part of the way: it suffices to ensure that the inverse relation f⁻¹ of a function from A to B is a function from range(f) ⊆ B to A, and so is a partial function from B to A. Reason: From our work on relations, we know that f⁻¹ must be a relation from range(f) to A, so we need only show that for all b ∈ B there is at most one a ∈ A such that (b,a) ∈ f⁻¹, that is, at most one a ∈ A with (a,b) ∈ f. But this is exactly what is given by the injectivity of f. Indeed, we can also argue in the converse direction, with the result that we have the following equivalence: a function f: A → B is injective iff its inverse relation f⁻¹ is a function from range(f) to A.
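When a finite function is stored as a Python dictionary, both the injectivity test and the inverse-as-partial-function are immediate; a sketch (names ours), using the squaring example of the text on small fragments of Z and N:

def is_injective(f):
    """f (a dict) is injective iff no value is hit by two distinct arguments."""
    return len(set(f.values())) == len(f)

def inverse(f):
    """The inverse relation as a dict; a genuine (partial) function only when f is injective."""
    return {b: a for a, b in f.items()}

square_on_Z = {a: a * a for a in range(-3, 4)}   # squaring on a fragment of Z
square_on_N = {a: a * a for a in range(0, 4)}    # its restriction to a fragment of N
print(is_injective(square_on_Z))   # False: 3 and -3 have the same square
print(is_injective(square_on_N))   # True
print(inverse(square_on_N))        # {0: 0, 1: 1, 4: 2, 9: 3}, defined only on range(f)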

Exercise 3.3.1 (2) What would be the natural way of defining injectivity for a function of two arguments?

Even when a function f: A → B is not injective, so that its inverse relation f⁻¹ is not a function, there is still a closely related function from B into the power set of A. With a little abuse of notation and considerable risk of confusion for novices, it is also often written as f⁻¹, or more completely as f⁻¹: B → P(A), and is defined by putting f⁻¹(b) = {a ∈ A: f(a) = b} for each b ∈ B. Complicated when you spell it out, but the underlying idea is simple and natural: you go back from b to the set of all a with f(a) = b. Perhaps surprisingly, the notion is often useful.

Exercise 3.3.1 (3) Draw an arrow-style diagram on small finite sets A, B (say, four elements in each) to illustrate the notion of the 'abstract inverse' function f⁻¹: B → P(A).

3.3.2 Surjectivity

Let f: A → B. We say that f is onto B or surjective (with respect to B) iff for all b ∈ B there is some a ∈ A with f(a) = b. In other words, iff every element of B is the value of some element of A under f. Equivalently, iff range(f) = B.

For example, if A = {1,2,3} and B = {7,8,9}, then the function f that puts f(1) = 9, f(2) = 8, f(3) = 7 is onto B but is not onto its superset B′ = {6,7,8,9}, since it 'misses out' the element 6 ∈ B′.


Exercise 3.3.2 (1)
(a) What does surjectivity mean in terms of a digraph for the function?
(b) What does it mean in terms of a table for the function?

For some more substantive examples, we look to the number systems. Over Z both of the functions f(a) = a + 1 and g(a) = 2a are injective, but only f is onto Z, since the odd integers are not in the range of g. However, if we restrict these two functions to the set N of natural numbers, then not even f is onto N, since 0 ∈ N is not the successor of any natural number.

Exercise 3.3.2 (2) (with partial solution)
(a) Consider the function f(a) = a² over N, N⁺, Z, Q, R, R⁺, respectively, and determine in each case whether it is surjective.
(b) Show that the composition of two surjective functions is surjective, and give examples to show that the failure of surjectivity in either of the two components can lead to its failure for the composition.
(c) What would be the natural definition of surjectivity for a function of two arguments?

Solution to (a) For N, N⁺, Z and Q, the answer is negative, since, for example, 2 is in each of these sets, but there is no number a in any of these sets with a² = 2. In common parlance, 2 does not have a rational square root. For R, the answer is negative, since, for example, −1 does not have a square root; but for R⁺ the answer is positive: for any b ∈ R⁺ there is an a ∈ R⁺ with a² = b.

Both of the terms 'onto' and 'surjective' are in common use, but the former is more explicit in that it makes it easier for us to say onto what. We can say simply 'onto B', whereas 'surjective with respect to B' is rather more longwinded. Whichever of the two terms one uses, it is important to be clear what set B is intended, since quite trivially every function is onto some set – namely, its range!

Alice Box: What Is f(x)?

Alice: I'm a bit puzzled by your notation in the exercise. When f is a function, then f(a) is supposed to be the value of that function when the argument is a. Is that right?

Hatter: Yes.
Alice: But in part (a) of the exercise, you used f(a) to stand for the function itself. What's going on?
Hatter: To be honest, it is a bit sloppy. When doing the theory, we follow the official notation and use f for the function itself and f(a) for its value when the argument is a, and that is an element of the range of f. But when we talk about specific examples, it can be convenient to use an expression like f(x) or f(a) to stand for the function itself, with the variable just reminding us that it is one-place.

Alice: Shouldn't you get your act together and be consistent?


Hatter: I guess so ... But in mitigation, it's not just this book, and it is not just perversity. It's done everywhere, in all presentations, because it really is so convenient, and everyone gets used to resolving the ambiguity unconsciously. So let's just apologize on behalf of the community, and carry on.

An apology is really needed! Half the trouble that students have with simple mathematics is due to quirks in the language in which it is conveyed to them. Usually, the quirks have a good reason for being there – usually one of convenience – but they do need to be explained. This is just one example; we will see others as we go on.

3.3.3 Bijective Functions

A function f: A → B that is both injective and onto B is said to be a bijection between A and B. An older term sometimes used is one-one correspondence. An important fact about bijectivity is that it is equivalent to the inverse relation being a function from B into A. That is, a function f: A → B is a bijection between A and B iff its inverse f⁻¹ is a function from B to A.

Proof: By definition, f is a bijection between A and B iff f is injective and onto B. As we saw earlier in the chapter, f: A → B is injective iff f⁻¹ is a partial function from B to A; and moreover f is onto B iff range(f) = B, that is, iff dom(f⁻¹) = B. Putting this together, f is a bijection between A and B iff f⁻¹ is a partial function from B to A with dom(f⁻¹) = B, that is, iff f⁻¹ is a function from B to A.

Alice Box: Chains of Iffs

Alice: Why didn't you prove this equivalence in the same way as before, breaking it into two parts, one the converse of the other, and proving each separately by conditional proof?

Hatter: We could perfectly well have done that. But in this particular case, the argument going in one direction would have turned out to be essentially the same as its converse, run backwards. So in this case, we can economize by doing it all as a chain of iffs. However, such an economy is not always available.
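Continuing the dictionary sketch given earlier (again with helper names of our own), bijectivity onto a chosen target B amounts to the inverted dictionary being defined on all of B:

def is_bijection(f, B):
    """f (a dict) is a bijection between dom(f) and B iff it is injective and onto B."""
    return len(set(f.values())) == len(f) and set(f.values()) == set(B)

f = {1: 9, 2: 8, 3: 7}
print(is_bijection(f, {7, 8, 9}))      # True, and {v: k for k, v in f.items()} is its inverse
print(is_bijection(f, {6, 7, 8, 9}))   # False: 6 is never hit, so f is not onto this target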


Exercise 3.3.3
(a) What is a bijection in terms of (i) a digraph, (ii) a table for the function?
(b) Show that the inverse of a bijection from A to B is a bijection from B to A. Hint: you know that it is a function from B to A, so you only need check out injectivity and surjectivity.
(c) Show that any injective function on a finite set into itself is a bijection.

Our explanations of injection, surjection and bijection have been formulated for one-place functions f: A → B. However, as remarked in Sect. 3.1, this already covers functions of two or more places. Thus, a function f: A × B → C is injective iff whenever (a,b) ≠ (a′,b′), then f(a,b) ≠ f(a′,b′). It is surjective iff for all c ∈ C there are a ∈ A, b ∈ B with c = f(a,b).

3.4 Using Functions to Compare Size

One of the many uses of functions is to compare the sizes of sets. In this section, we will consider only finite sets, although a more advanced treatment could also consider infinite ones. Recall from Chap. 1 that when A is a finite set, we write #(A) for the number of elements of A, also known as the cardinality of A. Another common notation for this is |A|. We will look at two principles, one for bijections and one for injections.

3.4.1 Equinumerosity

When do two finite sets A,B have the same number of elements? The equinumerosity principle gives us a necessary and sufficient condition: #(A) = #(B) iff there is some bijection f: A → B.

Proof. Suppose first that #(A) = #(B). Since both sets are finite, there is some natural number n with #(A) = n = #(B). Let a₁, ..., aₙ be the elements of A, and b₁, ..., bₙ those of B. Let f: A → B be the function that puts each f(aᵢ) = bᵢ. Clearly f is injective and onto B. For the converse, let f: A → B be a bijection. Suppose A has n elements a₁, ..., aₙ. Then the list f(a₁), ..., f(aₙ) enumerates all the elements of B, counting none of them twice, so B also has n elements.

In general set theory, developed from its own axioms that do not mention anything from arithmetic, this condition is often used as a definition of when two sets have the same cardinality, covering the infinite case as well as the finite one. But in this book we consider only its application to finite sets.


Alice Box: Equinumerosity for Infinite Sets

Alice: OK, but just one question. Wouldn't such a definition give all infinite sets the same cardinality? Isn't there a bijection between any two infinite sets?

Hatter: At first sight it may seem so; and indeed the sets N, N⁺, Z and Q, and even such sets as Qⁿ (for any positive integer n), are equinumerous in this sense: it is possible to find a bijection between any two of them. Any set that has a bijection with N is said to be countable; these sets are all countable. But one of Cantor's fundamental theorems was to show, surprisingly, that this is not the case for the set R of all real numbers. It is uncountable: there is no bijection between it and N. Further, he showed that there is no bijection between any set A and its power set P(A).

Alice: How can you prove that?
Hatter: The proof is ingenious and short. It uses what is known as the 'diagonal construction'. But it would take us out of our main path. You will find it in any good introduction to set theory for students of mathematics, for example, Paul Halmos' Naive Set Theory.

A typical application of the equinumerosity principle, or more strictly of its right-to-left half, takes the following form. We have two sets A,B and want to show that they have the same cardinality, so we look for some function that is a bijection between the two; if we find one, then we are done. The following exercise gives two examples. In the first, finding a suitable bijection completes the proof; in the second, it serves to reset the problem in a more abstract but more amenable form.

Exercise 3.4.1 (with solution)
(a) Let A be the set of sides of a polygon, and B the set of its vertices. Show #(A) = #(B).
(b) In a reception, everyone shakes hands with everyone else just once. If there are n people in the reception, how many handshakes are there?

Solution
(a) Let f : A → B be defined by associating each side with its right endpoint. Clearly this is both injective and onto B. Hence #(A) = #(B).
(b) Let A be the set of the handshakes, and let B be the set of all unordered pairs {x,y} of distinct people (i.e. x ≠ y) from the reception. We apply the equinumerosity principle by defining the function f : A → B that associates each handshake in A with the two distinct people in B involved in it. Clearly this gives us a bijection between A and B. The rest of the solution works on the set B. Since A and B have the same cardinality, we need only ask how many elements there are in B. It is not difficult to see that there must be n·(n − 1) ordered pairs of distinct people in the reception: we can choose any one of the n to fill the first place, then choose any one of the remaining n − 1 to fill the second place. Thus, there are n·(n − 1)/2 unordered pairs, since each unordered pair {x,y} with x ≠ y corresponds to two distinct ordered pairs (x,y) and (y,x). Thus there are n·(n − 1)/2 handshakes.
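For readers who like to check such counting arguments by machine, here is a small Python sketch of the bijection in part (b); the function and variable names are our own, not from the text. It builds the set of unordered pairs for a reception of 7 people and compares its size with the formula n·(n − 1)/2.

    from itertools import combinations

    def handshake_pairs(people):
        # each handshake corresponds to exactly one unordered pair {x, y} of distinct people
        return {frozenset(pair) for pair in combinations(people, 2)}

    people = ["p" + str(i) for i in range(1, 8)]            # a reception of n = 7 people
    n = len(people)
    print(len(handshake_pairs(people)), n * (n - 1) // 2)   # 21 21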

Don't feel bad if you can't work out the second example for yourself; we will be getting back to this sort of thing in Chap. 5 (on counting), and in Chap. 4 we will explain a quite different way of tackling the problem (by induction). But for the present, we note that the sought-for bijection can surprisingly often be a quite simple and obvious one. Sometimes, as in the second example, it associates the elements of a rather 'everyday' set (like the set of all handshakes) with a rather more 'abstract' one (like the set of all unordered pairs {x,y} of distinct people from the reception), so that we can forget about irrelevant information concerning the former and apply counting rules to the latter.

3.4.2 Cardinal Comparison

When is one finite set at least as large as another? The principle of comparison tells us that for finite sets A,B: #(A) ≤ #(B) iff there is some injective function f : A → B.

Proof. We use the same sort of argument as for the equinumerosity principle. Suppose first that #(A) ≤ #(B). Since both sets are finite, there is some n ≥ 0 with #(A) = n, so that #(B) = n + m for some m ≥ 0. Let a1, …, an be the elements of A, and b1, …, bn, …, bn+m those of B. Let f : A → B be the function that puts each f(ai) = bi. Clearly, f is injective (but not necessarily onto B). For the converse, let f : A → B be injective. Suppose A has n elements a1, …, an. Then the list f(a1), …, f(an) enumerates some (but not necessarily all) of the elements of B, counting none of them twice, so B has at least n elements, that is, #(A) ≤ #(B).

Once again, general set theory turns this principle into a definition of the relation ≤ between cardinalities of sets, both finite and infinite. But that is beyond us here.

3.4.3 The Pigeonhole Principle

In applications of the principle of comparison, we typically use only the right-to-left half of it, formulating it contrapositively as follows. Let A,B be finite sets. If #(A) > #(B), then no function f : A → B is injective. In other words, if #(A) > #(B), then for every function f : A → B there is some b ∈ B such that b = f(a) for two distinct a ∈ A.

This simple rule is known as the pigeonhole principle, from the example in which A is a (large) set of memos to be delivered to people in an office, and B is the (small) set of their pigeonholes. It also has a more general form, as follows. Let A,B be finite sets. If #(A) > k·#(B), then for every function f : A → B there is a b ∈ B such that b = f(a) for at least k + 1 distinct a ∈ A.

The pigeonhole principle is surprisingly useful as a way of showing that, in suitable situations, at least two distinct items must have a certain property. The wealth of possible applications can only be appreciated by looking at examples. We begin with a very straightforward one.

Exercise 3.4.3 (1) (with solution) A village has 400 inhabitants. Show that at least two of them have the same birthday, and that at least 34 are born in the same month of the year.

Solution Let A be the set of people in the village, B the set of the days of the year, C the set of months of the year. Let f : A → B associate with each villager his or her birthday, while g : A → C associates each villager with the month of birth. Since #(A) = 400 > 366 ≥ #(B), the pigeonhole principle tells us that there is a b ∈ B such that b = f(a) for two distinct a ∈ A. This answers the first part of the question. Also, since #(A) = 400 > 396 = 33·#(C), the generalized pigeonhole principle tells us that there is a c ∈ C such that c = g(a) for at least 33 + 1 = 34 distinct a ∈ A, which answers the second part of the question.
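Computationally, the generalized pigeonhole principle says that some element of B receives at least ⌈#(A)/#(B)⌉ arguments. A minimal Python sketch of the two calculations in the solution (the function name is our own):

    import math

    def pigeonhole_bound(n_items, n_holes):
        # smallest k guaranteed by the generalized pigeonhole principle:
        # some hole receives at least ceil(n_items / n_holes) of the items
        return math.ceil(n_items / n_holes)

    print(pigeonhole_bound(400, 366))   # 2: at least two villagers share a birthday
    print(pigeonhole_bound(400, 12))    # 34: at least 34 share a birth month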

In this exercise, it was quite obvious what sets A,B and function f : A → B to choose. In other examples, this may need more reflection, and sometimes considerable ingenuity.

Exercise 3.4.3 (2) (with solution) In a club, everyone has just one given name and just one family name. It is decided to refer to everyone by two initials, the first initial being the first letter of the given name, the second initial being the first letter of the family name. How many members must the club have to make it inevitable that two distinct members are referred to by the same initials?

Solution It is pretty obvious that we should choose A to be the set of members of the club. Let B be the set of all ordered pairs of letters from the alphabet. The function f : A → B associates with each member a pair as specified in the club decision. Assuming that we are dealing with a standard English alphabet of 26 letters, #(B) = 26 × 26 = 676. So if the club has 677 members, the pigeonhole principle guarantees that f(a) = f(a′) for two distinct a, a′ ∈ A.

When solving problems by the pigeonhole principle, always begin by choosing carefully the sets A,B and the function f : A → B. Indeed, this will often be the key step in finding the solution. Sometimes, as in the above example, the elements of B will be rather abstract items, such as ordered n-tuples.

Exercise 3.4.3 (3) A multiple-choice exam has five questions, each with two possible answers. Assuming that each student enters a cross in exactly one box of each question (no unanswered, over-answered or spoiled questions), how many students need to be in the class to guarantee that at least four students submit the same answer paper?


3.5 Some Handy Functions

In this section, we look at some simple functions that appear over and again: the identity, constant, projection and characteristic functions. The student may find it difficult to assimilate all four in one sitting. No matter, they are here to be understood and then recalled whenever needed.

3.5.1 Identity Functions

Let A be any set. The identity function on A is the function f : A → A such that for all a ∈ A, f(a) = a. As simple as that! It is sometimes written as iA, or with the Greek letter iota as ιA. It is very important in abstract algebra, and comes up often in computer science, never at the centre of attention but (like the restriction of a function) as an everyday tool used almost without thinking.

Exercise 3.5.1
(a) Show that the identity function over any set is a bijection, and is its own inverse.
(b) Let f : A → B. Show that f∘iA = f = iB∘f.
(c) Show that if A ≠ B then iA ≠ iB.

Alice Box: One Identity Function or Many?

Alice: So, for every choice of set A you get a different identity function iA?
Hatter: Strictly speaking, yes, and we showed it in the last exercise.
Alice: Why not define a great big identity function, once and for all, by putting i(a) = a for every a whatsoever?
Hatter: Attractive, but there is a technical problem. What would its domain and range be?
Alice: The set of all things whatsoever – call it the universal set.
Hatter: Unfortunately, as we saw when you asked about absolute complementation in Chap. 1, standard set theory does not admit such a set, on pain of contradiction. So we must relativize the concept to whatever 'local universe' set U we are working in. A little bit of a bother, but not too inconvenient in practice.

3.5.2 Constant Functions

Let A,B be non-empty sets. A constant function on A into B is any function f : A → B such that f(a) = f(a′) for all a,a′ ∈ A. Equivalently: iff there is some b ∈ B such that b = f(a) for all a ∈ A.

The order of the quantifiers is vital in the second of these formulations. We require that ∃b ∈ B ∀a ∈ A: b = f(a), in other words, that all the elements of A have the same value under f. This is much stronger than requiring merely that ∀a ∈ A ∃b ∈ B: f(a) = b; in fact, that condition holds for any function f whatsoever! In general, there is an immense difference between statements of the form ∀x∃y(…) and corresponding ones of the form ∃y∀x(…). In a later chapter on the logic of the quantifiers, we will set out systematically the relations between the various propositions QxQ′y(…), where Q and Q′ are quantifiers (universal or existential), x and y are their attached variables, and the expression (…) is kept fixed.

Exercise 3.5.2
(a) Fix non-empty sets A,B with #(A) = m and #(B) = n. How many constant functions f : A → B are there?
(b) Show that when #(A) > 1, no constant function f : A → B is injective, and when #(B) > 1 no constant function f : A → B is onto B.

3.5.3 Projection Functions

Let f : A × B → C be a function of two arguments, and let a ∈ A. By the right projection of f from a we mean the one-argument function f_a : B → C defined by putting f_a(b) = f(a,b) for each b ∈ B.

Likewise, letting b ∈ B, the left projection of f from b is the one-argument function f_b : A → C defined by setting f_b(a) = f(a,b) for each a ∈ A.

In other words, to form the left or right projection of a (two-argument) function, we hold one of the arguments of f fixed at some value and consider the (one-argument) function obtained by allowing the other argument to vary.
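In programming terms, a projection is obtained by partial application: fix one argument and what remains is a one-argument function. A small Python sketch (the function f and the projections chosen here are our own illustrative choices, not those of the exercise below):

    from functools import partial

    def f(a, b):
        # an arbitrary two-argument function f : Z x Z -> Z
        return a - b

    right_projection = partial(f, 2)         # right projection of f from a = 2: b |-> 2 - b
    left_projection = lambda a: f(a, 3)      # left projection of f from b = 3: a |-> a - 3

    print(right_projection(5), left_projection(5))   # -3 2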

Exercise 3.5.3
(a) What is the right projection of the two-argument function f : N × N → N defined by putting f(m,n) = mⁿ, when m is chosen to be 2? And the left projection when n is chosen to be 3?
(b) Describe the right projection of the multiplication function x × y at the value x = 3. Also the left projection at the value y = 3. Are they the same function?

3.5.4 Characteristic Functions

Let U be any set fixed as a local universe. For each subset A ⊆ U we can define a function f_A : U → {1,0} by putting f_A(u) = 1 when u ∈ A and f_A(u) = 0 when u ∉ A. This is known as the characteristic function of A (modulo U). Thus the characteristic function f_A specifies the truth-value of the statement that u ∈ A.

Conversely, when f : U → {1,0}, we can define the associated subset of U by putting A_f = {u ∈ U: f(u) = 1}. Clearly, there is a bijection between the subsets of U and the functions f : U → {1,0}, and in fact, we can make either do the work of the other. In some contexts, it can be conceptually or notationally more convenient to work with characteristic functions than with subsets.
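A brief Python sketch of the passage back and forth between subsets and characteristic functions, relative to a small local universe (the universe and subset chosen here are ours, purely for illustration):

    U = set(range(10))                  # a small local universe
    A = {2, 3, 5, 7}                    # a subset of U

    def characteristic(A):
        # the characteristic function of A (modulo U): U -> {1, 0}
        return lambda u: 1 if u in A else 0

    def associated_subset(f):
        # the subset of U associated with f : U -> {1, 0}
        return {u for u in U if f(u) == 1}

    fA = characteristic(A)
    print([fA(u) for u in sorted(U)])       # [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
    print(associated_subset(fA) == A)       # True: each determines the other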


Exercise 3.5.4
(a) What is the characteristic function of the set of all prime numbers, in the local universe N⁺ of positive integers? Of the composites (non-primes)? Of the empty set, and of N⁺ itself?
(b) Let U be a local universe, and let A,B ⊆ U. Express the characteristic function of A∩B in terms of those for A,B. Then do the same for A∪B and for U\A. Do these remind you of anything from earlier logic boxes?

3.6 Families and Sequences

The uses of functions are endless. In fact, their role is so pervasive that some mathematicians prefer to see them, rather than sets, as providing the bedrock of their discipline. When that is done, sets are defined, roughly speaking, as their characteristic functions. We will not go further into that perspective, which belongs rather to the philosophy of mathematics. Instead, we illustrate the versatility of functions by seeing how they can clarify the notion of a family of sets, and a sequence of arbitrary items.

3.6.1 Families of Sets

In Sect. 1.5, we introduced sets of sets. They are usually written {Ai: i ∈ I} where the Ai are sets and the set I, called an index set, helps us keep track of them. Because the phrase 'set of sets' tends to be difficult for the mind to process, we also speak of {Ai: i ∈ I} as a collection of the sets Ai; but the term 'collection' does not mean anything new – it is merely to facilitate communication.

We now introduce the subtly different concept of a family of sets. This refers to any function on a domain I (called an index set) such that for each i ∈ I, f(i) is a set. Writing f(i) as Ai, it is thus the set of all ordered pairs (i, f(i)) = (i, Ai) with i ∈ I. The range of this function is the collection {Ai: i ∈ I}.

The difference is subtle, and in some contexts sloppiness does not matter. The collection notation {Ai: i ∈ I} is sometimes used also for the family. But in certain situations, notably applications of the pigeonhole principle and other counting rules that we will come to in a later chapter, the difference is very important. Suppose, for example, that the index set I has n elements, say I = {1, …, n}, but the function f is not injective, say f(1) = f(2), that is, A1 = A2. In that case, the family, containing all the pairs (i, Ai) with i ≤ n, has n elements, but the collection, containing all the sets Ai with i ≤ n, has fewer elements since A1 = A2.
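The distinction is easy to model in code: a family is naturally a dictionary indexed by I, keeping every pair (i, Ai), while the corresponding collection is just the set of its values. A minimal Python sketch, with an index set and sets chosen by us purely for illustration:

    family = {1: frozenset({'a'}), 2: frozenset({'a'}), 3: frozenset({'b'})}   # i -> A_i, with A_1 = A_2

    collection = set(family.values())    # the range {A_i : i in I}

    print(len(family))       # 3: the family has as many members as the index set
    print(len(collection))   # 2: the collection is smaller, since A_1 = A_2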

Once more, the substance of mathematics is intricately entwined with the way that it uses language. The convention of subscripting items with a variable is often an implicit way of describing a function. For example, when we say 'let p_i be the ith prime number, for any i ∈ N⁺', then we are implicitly considering the function p : N⁺ → N⁺ such that each p(i) is the ith prime number. However, the conventional notation with subscripts is often easier to read as it gets rid of brackets, and also focuses attention on the numbers themselves with the subscripts merely facilitating cross-reference. Moreover, in some contexts all we really need to know about the family function is its range, so in those contexts we can replace it by a simple collection of sets.

Exercise 3.6.1 Let I be a set with n elements, and f a constant function on I into a collection of sets. (i) How many elements are there in the collection {Ai: i ∈ I} = {f(i): i ∈ I}? How many elements are there in the family {(i, Ai): i ∈ I} = {(i, f(i)): i ∈ I}? (ii) What if f is an injective function?

3.6.2 Sequences and Suchlike

In computer science as in mathematics itself, we often need to consider sequences a1, a2, a3, … of items. The items ai might be numbers, sets or other mathematical objects; in computer science they may be the instructions in a program or steps in its execution. The sequence itself may be finite, with just n terms a1, …, an for some n, or infinite with a term ai for each positive integer i, in which case we usually write it in an informal suspended dots notation, as a1, a2, a3, … . But what is such a sequence?

It is convenient to identify an infinite sequence a1, a2, a3, … with a function f : N⁺ → A for some appropriately chosen set A, with f(i) = ai for each i ∈ N⁺. The ith term in the sequence is thus just the value of the function for argument (input) i. The sequence is thus just a family with index set N⁺.

When the sequence is finite, there are two ways to go. We can continue to identify it with a function f : N⁺ → A with f(i) = ai for each i ≤ n in N⁺ and with f(n + j) = f(n) for all j ∈ N⁺, so that the function becomes constant in its value from f(n) upwards. Or, more intuitively, we can take it to be a function f : {i ∈ N⁺: i ≤ n} → A, that is, as a partial function on N⁺ with domain {i ∈ N⁺: i ≤ n}. It doesn't really matter which way we do it; in either case we have made a rather vague notion sharper by explaining it in terms of functions.

Alice Box: n-Tuples, Sequences, Strings and Lists

Alice: I'm getting confused! So a finite sequence of elements of A is a function f : {i ∈ N⁺: i ≤ n} → A and we write it as a1, a2, …, an. But what's the difference between this and the ordered n-tuple (a1, a2, …, an) that we saw back in Sect. 2.1.1? While we are at it, I have been reading around, and I also find computer scientists talking about finite strings and finite lists. Are they the same, or different?

Hatter: Well, er …

Alice: They come at me with different notations. Sequences are written as a1, a2, …, an and tuples with external brackets as in (a1, a2, …, an), but strings are written just as a1a2 … an with neither commas nor external brackets. I see lists written with angular external brackets and sometimes with further internal brackets. And the symbols used for the empty tuple, sequence, string and list are all different. Help! What is going on?

Let's try to help Alice and get the Hatter off the hook, focussing on the finite case. This sort of thing is rarely explained, and it can be quite confusing to the beginning student.

The most abstract concept is that of a tuple. If we go back to the account in Sect. 2.1.1, we see that we don't care what it is. All we care about is how it behaves, and in particular the criterion for identity that we mentioned there: (x1, x2, …, xn) = (y1, y2, …, yn) iff xi = yi for all i from 1 to n. An n-tuple can be anything that satisfies that condition.

A sequence is a more specific kind of object. In this section, we have defined a finite sequence of n items to be a function f : {i ∈ N⁺: i ≤ n} → A. Sequences thus satisfy the identity criterion just mentioned, and so we may regard them as a particular kind of tuple. Mathematicians like them because they are accustomed to working with functions on N⁺.

That leaves us with strings and lists. They are best thought of as tuples, usually of symbols, that come equipped with a tool for putting them together and taking them apart.

In the case of strings, this tool is the operation of concatenation, which consists of taking any strings s and t and forming a string con(s,t). For symbols, this is the longer symbol formed by writing s and then immediately to its right the other symbol t. When we talk of strings, we are thinking of tuples where concatenation is the only available way, or the only permitted one, of building or dismantling them. Computer scientists like them, as concatenation is an operation that their machines can work with quickly.

Lists may also be thought of as tuples, likewise equipped with a single tool for their construction and decomposition. But this time the tool is different, and more limited in its power. It can be thought of as a restricted form of concatenation. We are allowed to take any list y and put in front of it an element a of the base set A of elementary symbols (often called the alphabet for constructing lists). This forms a slightly longer list ⟨a,y⟩. Here a is called the head, and y is the tail of the list ⟨a,y⟩. If y itself is compound, say y = ⟨b,x⟩ where b ∈ A and x is a shorter list, then ⟨a,y⟩ = ⟨a, ⟨b,x⟩⟩, and so on. Being a restricted form of concatenation, the operation is given the very similar name cons, so that cons(a,y) = ⟨a,y⟩. The important thing about the cons operation is that while it can take any list of any length as its second argument, it can take only elements of the basic alphabet A as its first argument. It is in effect a restriction of the first argument of the concatenation operation to the alphabet set A.

This restriction may at first seem rather odd, but there is a reason for making it. For the restricted kind of concatenation, lists satisfy a mathematical condition of unique decomposability, which concatenation in general does not satisfy. That permits us to carry out definitions by structural recursion, as we will explain in Chap. 4.
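To make this concrete, here is one possible Python encoding (ours, not the book's): a list is either the empty tuple or a pair (head, tail), cons puts a single symbol in front of a list, and unique decomposability is just the fact that head and tail recover exactly the arguments that cons was given.

    EMPTY = ()                       # the empty list

    def cons(a, y):
        # put the single symbol a in front of the list y
        return (a, y)

    def head(lst):
        return lst[0]

    def tail(lst):
        return lst[1]

    xs = cons('a', cons('b', cons('c', EMPTY)))     # the list <a, <b, <c, empty>>>
    print(head(xs), head(tail(xs)))                 # a b
    print(tail(tail(tail(xs))) == EMPTY)            # True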

To sum up: For tuples we don't care what they are, so long as they satisfy the appropriate condition for identity. Sequences may be defined as functions on initial segments of the positive integers (or the natural numbers, if preferred) and are a particular kind of tuple. Strings and lists are tuples accompanied by a tool for their construction and decomposition – concatenation in one case and a restricted form of concatenation in the other.

End-of-Chapter Exercises

Exercise 3.1: Partial functions
(a) Characterize the notion of a partial function from A to B in terms of (i) its table as a relation and (ii) its digraph as a relation.
(b) Let R be a relation from A to B. Show that it is a partial function from A to B iff it is a function from dom(R) to B.

Exercise 3.2: Image, closure
(a) The floor function from R into N is defined by putting ⌊x⌋ to be the largest integer less than or equal to x. What are the images under the floor function of the sets [0,1] = {x ∈ R: 0 ≤ x ≤ 1}, [0,1) = {x ∈ R: 0 ≤ x < 1}, (0,1] = {x ∈ R: 0 < x ≤ 1}, (0,1) = {x ∈ R: 0 < x < 1}?
(b) Let f : A → A be a function from set A into itself. Show that for all X ⊆ A, f(X) ⊆ f[X], and give a simple example of the failure of the converse inclusion.
(c) Show that when f(X) ⊆ X then f[X] = X.
(d) Show that for any partition of A, the function f taking each element a ∈ A to its cell is a function on A into the power set P(A) of A with the partition as its range.
(e) Let f : A → B be a function from set A into set B. Recall the 'abstract inverse' function f⁻¹ : B → P(A) defined at the end of Sect. 3.3.1 by putting f⁻¹(b) = {a ∈ A: f(a) = b} for each b ∈ B. (i) Show that the collection of all sets f⁻¹(b) for b ∈ f(A) ⊆ B is a partition of A in the sense defined in Chap. 2. (ii) Is this still the case if we include in the collection the sets f⁻¹(b) for b ∈ B\f(A)?

Exercise 3.3: Injections, surjections, bijections
(a) (i) Is the floor function from R into N injective? (ii) Is it onto N?
(b) Show that the composition of two bijections is a bijection. You may make use of results of exercises in Sects. 3.3.1 and 3.3.2 on injectivity and surjectivity.
(c) Use the equinumerosity principle to show that there is never any bijection between a finite set and any of its proper subsets.


(d) Give an example to show that there can be a bijection between an infinite set and certain of its proper subsets.
(e) Use the principle of comparison to show that for finite sets A,B, if there are injective functions f : A → B and g : B → A, then there is a bijection from A to B.

Exercise 3.4: Pigeonhole principle
(a) Use the general form of the pigeonhole principle to show that of any seven propositions, there are at least four with the same truth-value.
(b) If a set A is partitioned into n cells, how many distinct elements of A need to be selected to guarantee that at least two of them are in the same cell?
(c) Let K = {1,2,3,4,5,6,7,8}. How many distinct numbers must be selected from K to guarantee that there are two of them that sum to 9? Hint: Let A be the set of all unordered pairs {x,y} with x,y ∈ K and x + y = 9; check that this set forms a partition of K and apply the preceding part of the exercise.

Exercise 3.5: Handy functions
(a) Let f : A → B and g : B → C. (i) Show that if at least one of f,g is a constant function, then g∘f : A → C is a constant function. (ii) If g∘f : A → C is a constant function, does it follow that at least one of f,g is a constant function (give a verification or a counterexample)?
(b) What would be the natural way of defining the projection of an n-place function from its ith argument place, thus generalizing the notions of left and right projection?
(c) Let f : A × B → C be a function of two arguments. Verify or give counterexamples to the following. (i) If f is injective, then every (left and right) projection of f is also injective. (ii) If f is surjective, then every projection of f is also surjective.
(d) What would be the natural way to define the characteristic function of a two-place relation R ⊆ U × U over a universe U?

Selected Reading

The texts listed at the end of Chap. 2 on relations also have useful chapters on functions. In detail:

Bloch ED (2011) Proofs and fundamentals: a first course in abstract mathematics, 2nd edn. Springer, New York, chapter 4
Halmos PR (2001) Naive set theory, new edn. Springer, New York, chapters 8–10
Hein JL (2002) Discrete structures, logic and computability, 2nd edn. Jones and Bartlett, Boston, chapter 2
Lipschutz S (1998) Set theory and related topics, Schaum's outline series. McGraw Hill, New York, chapters 4–5
Velleman DJ (2006) How to prove it: a structured approach, 2nd edn. Cambridge University Press, Cambridge, chapter 5


4 Recycling Outputs as Inputs: Induction and Recursion

Abstract
This chapter introduces induction and recursion, which are omnipresent in computer science. The simplest context in which they arise is in the domain of the positive integers, and that is where we begin. We explain induction as a method for proving facts about the positive integers, and recursion as a method for defining functions on the same domain. We will also describe two different methods for evaluating such functions.

From this familiar terrain, the basic concepts of recursion and induction can be extended to structures, processes and procedures of many kinds, not only numerical ones. Particularly useful for computer scientists are the forms known as structural induction and recursion, and we give them special attention. We will look at structural recursion as a way of defining sets, structural induction as a way of proving things about those sets, and then structural recursion once more as a way of defining functions with recursively defined domains. At this last point, special care is needed, as the definitions of such functions succeed only when a special condition of unique decomposability is satisfied which, happily, is the case in many computer science applications.

The broadest and most powerful kind of induction/recursion may be formulated for sets of any kind, provided they are equipped with a relation that is well founded, in a sense we explain. All other kinds may be seen as special cases of that one. In a final section, we look at the notion of a recursive program and see how the ideas that we have developed in the chapter are manifested there.

4.1 What Are Induction and Recursion?

The two words are used in different contexts. 'Induction' is the term more commonly applied when talking about proofs. 'Recursion' is the one used in connection with constructions and definitions. We will follow this tendency, speaking of inductive proofs and recursive definitions. But it should not be thought that they answer to two fundamentally different concepts: the same basic idea is involved in each.

What is this basic idea? It will help if we look for a moment at the historical context. We are considering an insight that goes back to ancient Greece and India, but whose explicit articulation had difficulty breaking free from a longstanding rigidity. From the time of Aristotle onwards, it was a basic tenet of logic, and of science in general, that nothing should ever be defined in terms of itself on pain of making the definition circular. Nor should any proof assume what it is setting out to prove, for that too would create circularity.

Taken strictly, these precepts remain perfectly true. But it was realized that definitions, proofs and procedures may also 'call upon themselves', in the sense that later steps may systematically appeal to the outcome of earlier steps. The value of a function for a given value of its argument may be defined in terms of its value for smaller arguments; likewise, a proof of a fact about an item may assume that we have already proven the same fact about lesser items; and an instruction telling us how to carry out steps of a procedure or program may specify this in terms of what previous steps have already done. In each case, what we need in order to keep control is a clear stepwise ordering of the domain we are working on, with a clearly specified starting point.

What do these two requirements mean? To clarify them, we begin by looking at the simplest context in which they are satisfied: the positive integers. There we have a definite starting point, 1. We also have a clear stepwise ordering, namely, the passage from any number n to its immediate successor s(n), more commonly written n + 1. This order exhausts the domain in the sense that every positive integer may be obtained by applying the step finitely many times from the starting point.

Not all number systems are of this kind. For example, the set Z of all integers, negative as well as positive, has no starting point under its natural ordering. The set Q of rational numbers not only lacks a starting point, but it also has the wrong kind of order: no rational number has an immediate successor. Likewise for the set R of reals.

Conversely, the use of recursion and induction need not be confined to number systems. They can be carried out in any structure satisfying certain abstract conditions that make precise the italicized requirement above. Such non-numerical applications are very important for computer science but are often neglected in accounts written from a traditional mathematical perspective. We will come to them later in the chapter.

4.2 Proof by Simple Induction on the Positive Integers

We begin with a simple example familiar from high school, and then articulate the principle lying behind it.


4.2.1 An Example

Suppose that we want to find a formula that identifies explicitly the sum of the first n positive integers. We might calculate a few cases, seeing that 1 = 1, 1 + 2 = 3, 1 + 2 + 3 = 6, 1 + 2 + 3 + 4 = 10, etc. In other words, writing Σ{i: 1 ≤ i ≤ n} or Σi{1 ≤ i ≤ n} or just f(n) for the sum of the first n positive integers, we see by calculation that f(1) = 1, f(2) = 3, f(3) = 6, f(4) = 10, etc. After some experimenting, we may hazard the conjecture (or read somewhere) that quite generally f(n) = n·(n + 1)/2. But how can we prove this?

If we continue calculating the sum for specific values of n without ever finding a counterexample to the conjecture, we may become more and more convinced that it is correct, but that will never give us a proof that it is so. For no matter how many specific instances we calculate, there will always be infinitely many still to come. We need another method – and that is supplied by simple induction. Two steps are needed:
• The first step is to note that the conjecture f(n) = n·(n + 1)/2 holds for the initial case that n = 1, in other words, that f(1) = 1·(1 + 1)/2. This is a matter of trivial calculation, since we have already noticed that f(1) = 1 and clearly also 1·(1 + 1)/2 = 1. This step is known as the basis of the inductive proof.

• The second step is to prove a general statement: whenever the conjecture f(n) = n·(n + 1)/2 holds for n = k, then it holds for n = k + 1. In other words, we need to show that for all positive integers k, if f(k) = k·(k + 1)/2, then f(k + 1) = (k + 1)·(k + 2)/2. This general if-then statement is known as the induction step of the proof. It is a universally quantified conditional statement. Note how the equality in its consequent is formulated by substituting k + 1 for k in the antecedent.

Taken together, these two are enough to establish our original conjecture. The first step shows that the conjecture holds for the number 1. The induction step may then be applied to that to conclude that it also holds for 2, but it may also be applied to that to conclude that the conjecture also holds for 3, and so on for any positive integer n. We don't actually have to perform all these applications one by one; indeed, we couldn't do so, for there are infinitely many of them. But we have a guarantee, from the induction step, that each of these applications could be made.

In the example, how do we go about proving the induction step? As it is a universally quantified conditional statement about all positive integers k, we proceed in the standard way described in an earlier logic box. We let k be an arbitrary positive integer, suppose the antecedent to be true, and show that the consequent must also be true.

In detail, let k be an arbitrary positive integer. Suppose f(k) = k·(k + 1)/2. We need to show that f(k + 1) = (k + 1)·(k + 2)/2. Now:


f(k + 1) = 1 + 2 + ⋯ + k + (k + 1)        by the definition of the function f
         = (1 + 2 + ⋯ + k) + (k + 1)      arranging brackets
         = f(k) + (k + 1)                 by the definition of the function f again
         = [k·(k + 1)/2] + (k + 1)        by the supposition f(k) = k·(k + 1)/2
         = [k·(k + 1) + 2(k + 1)]/2       by elementary arithmetic
         = (k² + 3k + 2)/2                ditto
         = (k + 1)·(k + 2)/2              ditto.

This completes the proof of the induction step, and thus the proof as a whole! The key link in the chain of equalities is the fourth one, where we apply the supposition.

4.2.2 The Principle Behind the Example

The rule used in this example is called the simple principle of mathematical induction, and may be stated as follows. Consider any property that is meaningful for positive integers. To prove that every positive integer has the property, it suffices to show:

Basis: The least positive integer 1 has the property.
Induction step: Whenever a positive integer k has the property, then so does k + 1.

The same principle may be stated in terms of sets rather than properties. Consider any set A ⊆ N⁺. To establish that A = N⁺, it suffices to show two things:

Basis: 1 ∈ A.
Induction step: Whenever a positive integer k ∈ A, then also (k + 1) ∈ A.

As in our example, checking the basis is usually a matter of trivial calculation. Establishing the induction step is also carried out in the same way as in the example: we let k be an arbitrary positive integer, suppose that k has the property (that k ∈ A), and show that k + 1 has the property (that k + 1 ∈ A). In our example, that was easily done; tougher problems require more sweat, but still within the same general framework.

Important terminology: Within the induction step, the supposition that k has the property is called the induction hypothesis. What we set out to show from that supposition, that is, that k + 1 has the property, is called the induction goal.

Exercise 4.2.2 (1) (with solution) Use the principle of induction over the positive integers to show that for every positive integer n, the sum of the first n odd integers is n².

Solution Write f(n) for the sum of the first n odd integers, that is, for 1 + 3 + ⋯ + (2n − 1). We need to show that f(n) = n² for every positive integer n.

Basis: We need to show that f(1) = 1². But clearly, f(1) = 1 = 1², and we are done.


Induction step: Let k be any positive integer, and suppose (induction hypothesis) that the property holds when n = k, that is, suppose that f(k) = k². We need to show (induction goal) that it holds when n = k + 1, that is, that f(k + 1) = (k + 1)². Now:

f(k + 1) = 1 + 3 + ⋯ + (2k − 1) + (2(k + 1) − 1)      by definition of f
         = (1 + 3 + ⋯ + (2k − 1)) + (2(k + 1) − 1)    arranging brackets
         = f(k) + 2(k + 1) − 1                        also by definition of f
         = k² + 2(k + 1) − 1                          by the induction hypothesis
         = k² + 2k + 1                                by elementary arithmetic
         = (k + 1)²                                   ditto

What if we wish to prove that every natural number has the property we are considering? We proceed in exactly the same way, except that we start with 0 instead of 1:

Basis: The least natural number 0 has the property.
Induction step: Whenever a natural number k has the property, then so does k + 1.

In the language of sets:

Basis: 0 ∈ A.
Induction step: Whenever a natural number k ∈ A, then also (k + 1) ∈ A.

Sometimes it is more transparent to state the induction step in an equivalent way using subtraction by one rather than addition by one. For natural numbers, in the language of properties:

Induction step: For every natural number k > 0, if k − 1 has the property, then so does k.

In the language of sets:

Induction step: For every natural number k > 0, if k − 1 ∈ A, then also k ∈ A.

Note carefully the proviso k > 0: this is needed to ensure that k − 1 is a natural number when k is. If we are inducing over the positive integers only, then of course the proviso becomes k > 1.

When should you use induction over the positive integers, and when over the natural numbers? Quite simply, when you are trying to prove something for all the natural numbers, induce over them; if you are trying to show something for the positive integers only, normally you will induce over them. In what follows, we will pass from one to the other without further comment.

It is quite common to formulate the induction step in (equivalent) contrapositive form, particularly when it is stated with subtraction rather than addition. Thus, for example, the last version of the induction step becomes:

For every natural number k > 0, if k ∉ A, then also k − 1 ∉ A.

Exercise 4.2.2 (2) Reformulate each of the preceding five displayed formulations of the induction step for principles of induction over the positive integers or the natural numbers, in contrapositive form.


Exercise 4.2.2 (3) (with solution) In Chap. 3, we used the equinumerosity principle to show that in any reception attended by n ≥ 1 people, if everybody shakes hands just once with everyone else, then there are n(n − 1)/2 handshakes. Show this again, by using the simple principle of induction over the positive integers.

Solution

Basis: We show this for the case n = 1. In this case, there are 0 handshakes, and 1·(1 − 1)/2 = 0, so we are done.

Induction step: Let k be any positive integer, and suppose (induction hypothesis) that the property holds when n = k, that is, suppose that when there are k people, there are k·(k − 1)/2 handshakes. We need to show (induction goal) that it holds when n = k + 1, in other words, that when there are just k + 1 people, there are (k + 1)·((k + 1) − 1)/2 = k·(k + 1)/2 handshakes.

Consider any one of these k + 1 people, and call this person a. Then the set of all handshakes is the union of two disjoint sets: the handshakes involving a and those not involving a. Hence (recall from Chap. 1), the total number of handshakes is equal to the number involving a plus the number not involving a. Clearly, there are just k handshakes of the former kind, since a shakes hands just once with everyone else. Since there are just k people in the reception other than a, we also know by the induction hypothesis that there are k·(k − 1)/2 handshakes of the latter kind. Thus, there is a total of k + [k·(k − 1)/2] handshakes. It remains to check by elementary arithmetic that k + [k·(k − 1)/2] = [2k + k·(k − 1)]/2 = (2k + k² − k)/2 = (k² + k)/2 = k·(k + 1)/2, as desired.

This exercise illustrates the fact that a problem can sometimes be tackled in two quite different ways – either by induction or directly. The same phenomenon can be illustrated by the very first example of this chapter, where we showed by simple induction that the sum of the first n positive integers equals n·(n + 1)/2. A short, clever proof of the same fact is attributed to Gauss when still a schoolboy. It could be called a geometric proof or an argument by rearrangement. Let s be the sum of the first n positive integers. Clearly, 2s = (1 + ⋯ + n) + (n + ⋯ + 1) = n·(n + 1) by adding the corresponding terms n times, so s = n·(n + 1)/2.

We end this section with some general advice. When proving something by induction, it is essential to keep clearly in mind what the induction hypothesis and induction goal are. Unless they are made explicit, it is easy to become confused. Classroom experience shows that students rarely have difficulty with the basis, but often get into a mess with the induction step because they have not identified clearly what it is, and what its two components (induction hypothesis, induction goal) are. Always write out the proposition expressing the induction step explicitly before trying to prove it and, until it becomes second nature, label the induction hypothesis and induction goal within it. To keep a clear idea of what is going on, separate in your mind the general strategy of the induction from whatever numerical calculations may come up within its execution.

Students are sometimes puzzled why here (and in most presentations) we use one variable n when stating the proposition to be proven by induction, but another variable k when proving the induction step. Does it make a difference? The short answer is No. The reason for using different variables is purely pedagogical. We could perfectly well use the same one (say n) in both, but experience again shows that doing so confuses irrevocably those who are already struggling. So, to emphasize that the proposition to be proven by induction and the proposition serving as induction step in the proof are two quite different things, we use different variables.

Alice Box: How to Cook Up the Right Property?

Alice: That is all very well, but I am still rather dissatisfied. If you ask me to prove by induction that, say, for every positive integer n, the sum of the first n even integers is n(n + 1), I am sure that I could do it now. But if you were to ask me to show by induction that we can express the sum of the first n even integers in a neat way, I am not sure that I could think up that expression n(n + 1) in order to begin the proof. Induction seems to be good for proving things that you already suspect are true, but not much use for imagining what might be true in the first place!

Hatter: Indeed, it is quite an art to guess the right equality (or more generally, property) to induce on; once you think it up, the induction itself is often plain sailing. Like all art, it needs practice and experience. But that's part of what makes it interesting …

Exercise 4.2.2 (4)
(a) Do what Alice is sure that she can do now.
(b) Do it again, but this time without a fresh induction. Instead, use what had already been shown by induction about the sum of the first n positive integers and the sum of the first n odd ones.

4.3 Definition by Simple Recursion on the Positive Integers

We now dig below the material of the preceding section. Roughly speaking, one can say that underneath every inductive proof lurks a recursive definition. In particular, when f is a function on the positive integers (or natural numbers) and we can prove inductively something about its behaviour, then f itself may be defined (or at least characterized) in a recursive manner.

For an example, consider again the function f used in the first example of this chapter. Informally, f(n) was understood to be the sum 1 + 2 + 3 + ⋯ + n of the first n positive integers. The reason why induction could be applied to prove that f(n) = n·(n + 1)/2 is that the function f can itself be defined recursively.

What would such a definition look like? As one would expect, it consists of a basis giving the value of f for the least positive integer argument 1, and a recursive (or inductive) step giving the value of f for any argument n + 1 in terms of its value for argument n. Thus, in a certain sense, the function is being defined in terms of itself, but in a non-circular manner: the value f(n + 1) is defined in terms of f(n) with the lesser argument n. Specifically, in our example, the basis and recursive step are as follows:

Basis of definition: f(1) = 1.
Recursive step of definition: When n ≥ 1, then f(n + 1) = f(n) + (n + 1).

This way of expressing the recursive step, using addition by 1, is sometimes called recursion by pattern-matching. Another way of writing it uses subtraction by 1, in our example as follows:

Recursive step of definition: When n > 1, then f(n) = f(n − 1) + n.

Other ways of writing recursive definitions are also current among computer scientists. In particular, one can think of the basis and recursion step as being limiting and principal cases, respectively, writing in our example:

If n = 1, then f(n) = 1
If n > 1, then f(n) = f(n − 1) + n.

This can also be expressed in the popular if-then-else form:

If n = 1, then f(n) = 1
Else f(n) = f(n − 1) + n.

Some computer scientists like to abbreviate this further to the ungrammatical declaration:

f(n) = if n = 1 then 1 else f(n − 1) + n,

which can look like mumbo-jumbo to the uninitiated. All of these formulations are equivalent. You will meet each of them in applications, and so should be able to recognize them. The choice is partly a matter of personal preference, partly a question of which one allows you to get on with the problem in hand with the least clutter.
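Indeed, the if-then-else declaration transcribes almost word for word into a programming language. A minimal Python sketch (our transcription, with the function name chosen by us):

    def f(n):
        # the sum 1 + 2 + ... + n, defined by simple recursion on the positive integers
        if n == 1:
            return 1
        else:
            return f(n - 1) + n

    print(f(7))   # 28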

Exercise 4.3 (with partial solution)
(a) Define recursively the following functions f(n) on the positive integers:
    (i) The sum 2 + 4 + ⋯ + 2n of the first n even integers
    (ii) The sum 1 + 3 + ⋯ + (2n − 1) of the first n odd integers
(b) Define recursively the function that takes a natural number n to 2ⁿ. This is called the exponential function.
(c) Define recursively the product of the first n positive integers. This is known as the factorial function, written n!, and is pronounced 'n factorial'.
(d) Use the recursive definitions of the relevant functions to show that for all n ≥ 4, n! > 2ⁿ. Hint: here the basis will concern the case that n = 4.


Solution to (b)–(d)
(b) Basis of definition: 2⁰ = 1. Recursive step of definition: 2ⁿ⁺¹ = 2·2ⁿ.
(c) Basis of definition: 1! = 1. Recursive step of definition: (n + 1)! = (n + 1)·n!
(d) Basis of proof: We need to show that 4! > 2⁴. This is immediate by calculation, with 4! = 1·2·3·4 = 24 > 16 = 2·2·2·2 = 2⁴. Induction step of proof: Let k be any positive integer with k ≥ 4. Suppose that k! > 2ᵏ (induction hypothesis). We need to show that (k + 1)! > 2ᵏ⁺¹ (induction goal). This can be done as follows:

(k + 1)! = (k + 1)·k!      by definition of the factorial function
         > (k + 1)·2ᵏ      by the induction hypothesis
         > 2·2ᵏ            since k ≥ 4
         = 2ᵏ⁺¹            by definition of the exponential function.

The factorial and exponentiation functions are both very important in computer science, as indications of the alarming way in which many processes can grow in size (number of steps required, time taken, memory required, and so on) as inputs increase in size. In Chap. 1, we saw that exponentiation already gives unmanageable rates of growth. The exercise shows that factorial is worse.

4.4 Evaluating Functions Defined by Recursion

Go back yet again to the first function that we defined by recursion in the preceding section: the sum f(n) of the first n positive integers. Suppose that we want to calculate the value of f(7). There are basically two ways of doing it.

One very obvious way is bottom-up or forwards. We first calculate f(1), use it to get f(2), use it for f(3), and so on. Thus, we obtain:

f(1) = 1
f(2) = f(1) + 2 = 1 + 2 = 3
f(3) = f(2) + 3 = 3 + 3 = 6
f(4) = f(3) + 4 = 6 + 4 = 10
f(5) = f(4) + 5 = 10 + 5 = 15
f(6) = f(5) + 6 = 15 + 6 = 21
f(7) = f(6) + 7 = 21 + 7 = 28.

Here we have made one application of the base clause and six of the recursion clause. Each application is followed by arithmetic simplification as needed. Each of the seven steps fully eliminates the function sign f and provides us with a specific numeral as value of f(i) for some i ≤ 7.


The other way of calculating, at first a little less obvious, is top-down or backwards. It is also known as unfolding the recursive definition or tracing the function, and is as follows:

f(7) = f(6) + 7
     = (f(5) + 6) + 7
     = ((f(4) + 5) + 6) + 7
     = (((f(3) + 4) + 5) + 6) + 7
     = ((((f(2) + 3) + 4) + 5) + 6) + 7
     = (((((f(1) + 2) + 3) + 4) + 5) + 6) + 7
     = (((((1 + 2) + 3) + 4) + 5) + 6) + 7
     = ((((3 + 3) + 4) + 5) + 6) + 7
     = (((6 + 4) + 5) + 6) + 7
     = ((10 + 5) + 6) + 7
     = (15 + 6) + 7
     = 21 + 7
     = 28.

In this calculation, we begin by writing the expression f(7), make a substitution for it as authorized by the recursion clause of the definition, then in that expression substitute for f(6), and so on until after six steps we can at last apply the basis of the definition to f(1) to obtain in line 7 a numerical expression, not containing f. From that point, we simplify the numerical expression until, in another six steps, we emerge with a standard numeral.

Which is the better way to calculate? In this example, it does not make a significant difference. Indeed, quite generally, when the function is defined using simple recursion of the kind described in this section, the two modes of calculation will be of essentially the same length. The second one looks longer, but that is because we have left all arithmetic simplifications to the end (as is customary when working top-down) rather than doing them as we go. Humans often prefer the first mode, for the psychological reason that it gives us something 'concrete' at each step, making us feel safer; but a computer would not care.

Nevertheless, when the function is defined by more sophisticated forms of recursion that we will describe in the following sections, the situation may become quite different. It can turn out that one, or the other, of the two modes of evaluation is dramatically more economical in resources of memory or time.

Such economies are of little interest to the traditional mathematician, but they are of great importance for the computer scientist. They may make the difference between a feasible calculation procedure and one that, in a given state of technology, is quite unfeasible.
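The two modes of evaluation correspond to two familiar programming styles. A hedged Python sketch (ours) of both, for the same function f as above: a loop working forwards from f(1), and recursive calls that unfold f(n) top-down.

    def f_bottom_up(n):
        # forwards: compute f(1), f(2), ..., f(n) in turn
        value = 1
        for i in range(2, n + 1):
            value += i
        return value

    def f_top_down(n):
        # backwards: unfold f(n) = f(n - 1) + n until the basis f(1) = 1 is reached
        return 1 if n == 1 else f_top_down(n - 1) + n

    print(f_bottom_up(7), f_top_down(7))   # 28 28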


Exercise 4.4 Evaluate 6! bottom-up and then again top-down (unfolding).

4.5 Cumulative Induction and Recursion

We now turn to some rather more sophisticated forms of recursive definition and inductive proof. Ultimately, they can be derived from the simple form, but we will not do so here. We are interested in them as tools-for-use.

4.5.1 Cumulative Recursive Definitions

In definitions by simple recursion such as that of the factorial function, we reached back only one notch at a time. In other words, the recursion step defined f(n) out of f(n − 1). But sometimes we want to reach back further. A famous example is the Fibonacci function on the natural numbers (i.e. beginning from 0 rather than from 1), named after an Italian mathematician who considered it in the year 1202. Since then, it has found surprisingly many applications in computer science, biology and elsewhere. To define it, we use recursion. The basis now has two components:

F(0) = 0
F(1) = 1.

So far, F behaves just like the identity function. But then, in the recursion step, Fibonacci takes off:

F(n) = F(n − 1) + F(n − 2) whenever n ≥ 2.

The Fibonacci function illustrates the way in which a top-down evaluation by unfolding can lead to great inefficiencies in computation. Beginning a top-down computation of F(8), we have to make many calls, as represented in Table 4.1. To read the table, notice that each cell in a row is split into two in the row below, following the recursive rule for the Fibonacci function, so that the value of each cell equals the sum of the values in the two cells immediately below.

Table 4.1 Calls when computing F(8) top-down

The table is not yet complete – it hasn't even reached the point where all the letters F are eliminated and the arithmetic simplifications begin – but it is already clear that there is a great deal of repetition. The value of F(6) is calculated twice, that of F(5) three times, F(4) five times, F(3) eight times (including the one still to come in the next row). Indeed, when calculating F(n), the number of times each F(k) is calculated itself follows a Fibonacci function run backwards from n + 1. For example, to calculate F(8), each F(k) for k ≤ 8 + 1 = 9 is calculated F(9 − k) times. Unless partial calculations are saved and recycled in some manner, the inefficiency of the procedure is very high.
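The repetition is easy to observe by machine. The following Python sketch (ours) counts the calls made by a naive top-down evaluation of F(8) and contrasts it with a bottom-up evaluation that computes each value once.

    calls = 0

    def F(n):
        # naive top-down Fibonacci, mirroring the recursive definition
        global calls
        calls += 1
        return n if n < 2 else F(n - 1) + F(n - 2)

    def F_bottom_up(n):
        a, b = 0, 1                  # F(0), F(1)
        for _ in range(n):
            a, b = b, a + b
        return a

    print(F(8), calls)       # 21 67: sixty-seven calls to evaluate F(8) top-down
    print(F_bottom_up(8))    # 21, computed from only the nine values F(0), ..., F(8)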

Exercise 4.5
(a) Express the recursion clause of the definition of the Fibonacci function in pattern-matching form, in other words, defining F(n + 2) in terms of F(n + 1) and F(n). Be explicit about the bound of your quantification.
(b) Write the pattern-matching form of the definition in if-then-else language.
(c) Carry out a bottom-up evaluation of F(8) and compare it with the (incomplete) top-down evaluation in the text.

In contrast, there are cases in which evaluation bottom-up can be less efficient. Here is a simple example. Define f as follows:

Basis: f(0) = 2.
Recursion step: f(n) = f(n − 1)^n for odd n > 0, f(n) = f(n/2) for even n > 0.

To calculate f(8) by unfolding is quick and easy: f(8) = f(4) = f(2) = f(1) by three applications of the second case of the recursion step, and finally f(1) = f(0)^1 = 2^1 = 2 by the first case of the recursion step and the basis. On the other hand, if we were to calculate f(8) bottom-up without introducing shortcuts, we would pass through each of f(0) to f(8), doing a lot of unnecessary work on the odd values. We will see a more dramatic example of the same phenomenon shortly.
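Again as an illustrative sketch in Python (not from the text), the same function can be written down directly and evaluated top-down; each call either halves an even argument or steps down by one from an odd argument, so only a handful of values are ever visited.

def f(n):
    if n == 0:
        return 2                  # basis
    if n % 2 == 1:
        return f(n - 1) ** n      # recursion step, odd case
    return f(n // 2)              # recursion step, even case

print(f(8))   # 2, via f(8) = f(4) = f(2) = f(1) = f(0)**1 = 2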

4.5.2 Proof by Cumulative Induction

There is no limit to how far a recursive definition may reach back when defining f(n), and so it is useful to have a form of proof that permits us to do the same. It is called cumulative induction, sometimes also known as course-of-values induction.

Formulated for the natural numbers, in terms of properties, the principle of cumulative induction is as follows: To show that every natural number has a certain property, it suffices to show the basis and induction step:

Basis: 0 has the property.
Induction step: For every natural number k, if every natural number j < k has the property, then k itself also has the property.

Exercise 4.5.2 (1) (with solution)
(a) Formulate the induction step of the principle of cumulative induction contrapositively.
(b) Use cumulative induction to show that every natural number n ≥ 2 is the product of (one or more) prime numbers.


Solution
(a) For every natural number k, if k lacks the property, then there is a natural number j < k that lacks the property.
(b) Basis: We need to show that 2 has the property. Since 2 is itself prime, we are done. Induction step: Let k be any natural number with k ≥ 2. Suppose (induction hypothesis) that every natural number j with 2 ≤ j < k has the property. We need to show (induction goal) that k also has it. There are two cases to consider. If k is prime, then we are done. On the other hand, if k is not prime, then by the definition of prime numbers k = a·b where a, b are positive integers ≥ 2. Hence, a < k and b < k. So we may apply the induction hypothesis to get that each of a, b has the property, that is, is the product of (one or more) prime numbers. Hence, their product is too, and we are done.

Part (b) of this exercise prompts several comments. The first concerns presentation. In the solution, we kept things brief by speaking of ‘the property’ when we mean ‘the property of being the product of (one or more) prime numbers’. The example was simple enough for it to be immediately clear what property we are interested in; in more complex examples, you are advised to take the precaution of stating the property in full at the beginning of the proof.

Second, we began the induction at 2 rather than at 0 or even 1. This is in fact covered by the version that begins from 0, for when we say that every natural number n ≥ 2 has a certain property, this is the same as saying that for every natural number n, if n ≥ 2, then n has the property. But it is also convenient to formulate the principle in a separate form when the induction begins at a specified number a > 0, with the following basis and induction step:

Basis: a has the property.
Induction step: For every natural number k ≥ a, if every natural number j with a ≤ j < k has the property, then k itself also has the property.

Third, for cumulative induction, the basis is, strictly speaking, covered by the induction step, and so is redundant! This contrasts with simple induction, where the basis is quite independent of the induction step and always needs to be established separately. The reason for this redundancy in the cumulative case is that there are no natural numbers less than zero. Hence, vacuously, every natural number j < 0 has whatever property is under consideration, so that applying the induction step, zero itself has the property and the basis holds. Nevertheless, even for cumulative induction, it is quite common to state the basis explicitly and even to check it out separately, to double-check that we did not carelessly overlook some peculiarity of zero when establishing the induction step in its generality. You may do likewise.

Finally, the principle of cumulative induction is in fact derivable from that of simple induction. If this were a text for students of mathematics, the proof would follow, but we omit it here. Although it is derivable from the simple form, it is nevertheless extremely useful to have available as a rule in its own right, ready for application.

Exercise 4.5.2 (2) Use cumulative induction to show that for every positive integer n, F(n) is even iff n is divisible by 3, where F is the Fibonacci function.


4.5.3 Simultaneous Recursion and Induction

When a function has more than one argument, its definition will have to take account of both of them. If the definition is recursive, sometimes we can get away with recursion on just one of the arguments, holding the others as parameters. A simple example is the recursive definition of multiplication over the natural numbers, using addition, which can be expressed as follows:

Basis: m·0 = 0.
Recursion step: m·(n + 1) = (m·n) + m.

The equality in the recursion step is usually taught in school as a fact about multiplication, which is assumed to have been defined or understood in some other way. In axiomatizations of arithmetic in the language of first-order logic, the equality is treated as an axiom of the system. But here we are seeing it as the recursive part of a definition of multiplication, given addition. The same mathematical edifice can be built in many ways!
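As an illustrative sketch in Python (not part of the text), the recursive definition of multiplication can be transcribed directly, holding m as a parameter and recursing on the second argument:

def mult(m, n):
    if n == 0:
        return 0                  # basis: m·0 = 0
    return mult(m, n - 1) + m     # recursion step: m·(n+1) = (m·n) + m

print(mult(7, 6))   # 42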

Exercise 4.5.3 (1)
(a) Give a recursive definition of the power function that takes a pair (m,n) to m^n, using multiplication, and with recursion on the second argument only.
(b) Comment on the pattern shared by this definition of the power function, and the preceding definition of multiplication.

When a two-argument function is defined in this way, then inductive proofs for it will tend to follow the same lines, with induction carried out on the argument that was subject to recursion in the definition. But sometimes we need to define functions of two (or more) arguments with recursions on each of them. This is called definition by simultaneous recursion. A famous example is the Ackermann function. It has two arguments and, in one of its variants (due to Rózsa Péter), is defined as follows:

A(0,n) = n + 1
A(m,0) = A(m − 1, 1) for m > 0
A(m,n) = A(m − 1, A(m, n − 1)) for m,n > 0.

The reason why this function is famous is its spectacular rate of growth. For m < 4, it remains leisurely, but when m ≥ 4, it accelerates dramatically, much more so than either the exponential or the factorial function. Even A(4,2) is about 2·10^19728. This gives it a theoretical interest: although the function is computable, it can be shown that it grows faster than any function in the class of so-called ‘primitive recursive’ functions. Roughly speaking, they are functions that can be defined by recursions of a fairly simple syntactic kind. They are computable, and for a short while after their articulation as a class of functions, they were thought to exhaust all the computable functions, until the Ackermann function provided a counterexample.
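Since the three clauses are so short, they can be transcribed directly; the following Python sketch (not from the text) does just that, and evaluating even small arguments already hints at the growth involved. Raising the recursion limit is a precaution: at m = 4 the call depths explode.

import sys
sys.setrecursionlimit(100_000)   # precaution: call depths grow very fast

def A(m, n):
    if m == 0:
        return n + 1                      # A(0,n) = n + 1
    if n == 0:
        return A(m - 1, 1)                # A(m,0) = A(m-1, 1)
    return A(m - 1, A(m, n - 1))          # A(m,n) = A(m-1, A(m, n-1))

print(A(2, 3))   # 9
print(A(3, 3))   # 61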

But what interests us here about the Ackermann function is the way in which the last clause of the definition makes the value of A(m,n) depend on the value of the same function for the first argument diminished by 1, but paired with a value of the second argument that might be larger than n. As the function picks up speed, to calculate the value of A(m,n) for a given m may require prior calculation of A(m − 1, n′) for an extremely large n′ > n.

Indeed, given the way in which the recursion condition reaches ‘upwards’ on the second variable, it is not immediately obvious that the three clauses taken together really succeed in defining a unique function. It can, however, be shown that they do, by introducing a suitable well-founded ordering on N², and using the principle of well-founded recursion. We will explain the concept of a well-founded ordering later in this chapter.

Alice Box: Basis and Recursion Step

Alice: There is something about the definition of the Ackermann function that bothers me. There are three clauses in it, not two. So which is the basis and which is the recursion step?

Hatter: The first clause gives the basis. Although it covers infinitely many cases, and uses a function on the right hand side, the function A that is being defined does not appear on the right.

Alice: And the recursion step?
Hatter: It is given by the other two clauses together. The second clause covers the case m > 0 while n = 0; the third deals with the case when both m, n > 0. The distinguishing feature that marks them both as parts of the recursion step is the appearance of A on the right.

The Ackermann function illustrates dramatically the difference that can arise between calculating bottom-up or top-down, in the sense described in Sect. 4.4. The best way to appreciate the difference is to do the following exercise. You will see that this time it is the bottom-up approach that comes off worse. The reason is that, as already remarked, to calculate A(m,n) for a given m may require prior calculation of A(m − 1, n′) for a certain n′ > n. We don’t know in advance how large this n′ will be, and it is inefficient to plough through all the values of n′ until we reach the one that is actually needed.

Exercise 4.5.3 (2) Calculate the value of A(2,3) bottom-up. Then calculate it top-down. Compare your calculations.

4.6 Structural Recursion and Induction

We now come to the form of recursion and induction that is perhaps the most frequently used by computer scientists – structural. It can be justified or replaced by recursion and induction on the natural numbers but may also be used, very conveniently, without ever mentioning them. It tends to be ignored in courses for mathematics students, but it is essential that those heading for computer science (or logic) be familiar with it.


We introduce structural recursion/induction in three stages, remembering as we go the rough dictum that behind every inductive proof lurks a recursive definition. First, we look at the business of defining sets by structural recursion. That will not be difficult, because back in Chaps. 2 and 3 on sets and functions we were already doing it without, however, paying attention to the recursive aspect. Then we turn to the task of proving things about these sets by structural induction, which will also be quite straightforward. Finally, we come to the rather delicate part: the task of taking a recursively defined set and using structural recursion to define a function with it as domain. That is where care must be taken, for such definitions are legitimate only when a special constraint is satisfied.

4.6.1 Defining Sets by Structural Recursion

In Chaps. 2 and 3, we introduced the notions of the image and the closure of a set under a relation or function. We now make intensive use of them. We begin by recalling their definitions, generalizing a little from binary (i.e. two-place) relations to relations of any finite number n ≥ 2 of places.

Let A be any set, and let R be any relation (of at least two places) over the local universe within which we are working. Since m-argument functions are (m + 1)-place relations, this covers functions of one or more arguments as well.

The image of A under the (m + 1)-place relation R is defined by putting x ∈ R(A) iff there are a₁, ..., aₘ ∈ A with (a₁, ..., aₘ, x) ∈ R. In the case that R is an m-argument function f, this is equivalent to saying: x ∈ f(A) iff there are a₁, ..., aₘ ∈ A with x = f(a₁, ..., aₘ).

The concept of image is not yet recursive; that comes with the closure R[A], which for brevity we write as A⁺. We saw that it can be defined in either of two ways.

The way of union (bottom-up) is by making a recursion on the natural numbers, then taking the union of all the sets thus produced:

A₀ = A
Aₙ₊₁ = Aₙ ∪ R(Aₙ), for each natural number n
A⁺ = ∪{Aₙ: n ∈ N}.
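For finite cases, the way of union can be run as a loop that keeps applying R until nothing new appears. The Python sketch below is illustrative only (it is not from the text; it assumes a binary relation given as a finite set of pairs, and that the closure is finite, so the loop terminates):

def closure(A, R):
    # Bottom-up: A_0 = A, A_{n+1} = A_n ∪ R(A_n), stop at the first fixpoint.
    closed = set(A)
    while True:
        image = {x for (a, x) in R if a in closed}   # R(A_n)
        if image <= closed:                          # nothing new: fixpoint
            return closed
        closed |= image

# Example: close {0} under the 'add 2' relation restricted to numbers below 10.
R = {(n, n + 2) for n in range(10)}
print(sorted(closure({0}, R)))   # [0, 2, 4, 6, 8, 10]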

The way of intersection (‘top-down’) dispenses with numbers altogether. For simplicity, we assume that we are working within a local universe (see Sect. 1.4.3), which, however, we will usually not bother to mention explicitly. Recall that a set X is closed under a relation R iff R(X) ⊆ X. The closure A⁺ of A under the relation R is defined to be the intersection of all sets X ⊇ A that are closed under R. That is, A⁺ = ∩{X ⊆ U: A ⊆ X ⊇ R(X)} where U is the local universe, assumed to include both A and R. So defined, A⁺ is in fact the least superset of A that is closed under the relation R. On the one hand, since it is the intersection of all the X ⊇ A closed under R, it is clearly included in each such X; on the other hand, it is easy to check that it is itself also closed under the relation R.

Exercise 4.6.1 (1)
(a) Verify the point just made, that the intersection of all sets X ⊇ A that are closed under R is itself closed under R.
(b) For this exercise, to avoid confusion, use A∪ as temporary notation for A⁺ defined bottom-up, and A∩ as temporary notation for A⁺ defined top-down. Show that A∪ = A∩ by establishing the two inclusions separately.

Hints for (b): To show that A∪ ⊆ A∩, use induction on the natural numbers. For the converse, begin by checking that A∪ is closed under R.

The definition can evidently be extended to cover an arbitrary collection of relations, rather than just one. The closure A⁺ of A under a collection {Rᵢ : i ∈ I} of relations is defined to be the intersection of all sets X ⊇ A that are closed under all the relations in the collection.

A set is said to be defined by structural recursion whenever it is introduced as the closure of some set (referred to as the basis, or initial set) under some collection of relations (often called constructors or generators).

In this context, it is intuitively helpful to think of each (m + 1)-place relation R as a rule: if (a₁, ..., aₘ, x) ∈ R, then, when building A⁺, we are authorized to pass from items a₁, ..., aₘ already in A⁺, to put x in A⁺ too. Thus, A⁺ is also described as the closure of A under a collection of rules. This way of speaking often makes formulations more vivid. We illustrate the concept with five examples, two drawn from computer science, two from logic and one from abstract algebra:
• The notion of a string, already mentioned informally in an Alice box of Chap. 3. It is of central importance for formulating, for example, the theory of finite state machines. Let A be any alphabet, consisting of elementary signs. Let λ be an abstract object, distinct from all the letters in A, which is understood to serve as the empty string. The set of all strings over A, conventionally written as A*, is the closure of A ∪ {λ} under the rule of concatenation, that is, the operation of taking two strings s, t and forming their concatenation by writing s immediately followed by t.

• We can define specific sets of strings by structural recursion. For instance, a string over an alphabet A is said to be a palindrome iff it reads the same from each end. Can we give this informal notion, known to grammarians since ancient times, a recursive definition? Very easily! The empty string λ reads the same way from each end, and is the shortest even palindrome. Each individual letter in A reads the same way from left and from right, and so these are the shortest odd palindromes. All other palindromes may be obtained by successive symmetric flanking of given palindromes. So we may take the set P of palindromes to be the closure of the set {λ} ∪ A under the rule permitting passage from a string s to a string xsx for any x ∈ A. In other words, it is the least set S including {λ} ∪ A such that xsx ∈ S for any s ∈ S and x ∈ A.


• Logicians also work with symbols, and constantly define sets by structural recursion. For example, the set of formulae of classical propositional logic (or any other logical system) is defined as the closure of an initial set A under some operations. In this case, A is a set of proposition letters. It is closed under the rules for forming compound formulae by means of the logical connectives allowed, for instance, ¬, ∧, ∨ (with parentheses around each application of a connective, to ensure unambiguous reading). So defined, the set of propositional formulae is a proper subset of the set B* of all strings in the alphabet B = A ∪ {¬,∧,∨} ∪ {(,)}. In case you have trouble reading it, {(,)} is the set consisting of the left and right parentheses.

• The set of theorems (aka theses) of a formal logical system is also defined by structural recursion. It is the closure A⁺ of some initial set A of formulae (known as the axioms of the system) under certain functions or relations between formulae (known as derivation rules of the system). A derivation rule often used in this context is modus ponens (aka detachment, also known as →−), permitting passage from formulae α and α → β to the formula β.

• Algebraists also use this kind of definition, even though they are not, in general, dealing with strings of symbols. For example, if A is a subset of an algebra, then the subalgebra generated by A has as its carrier (i.e. underlying set) the closure A⁺ of A under the operations of the algebra, and as its operations the restriction to that carrier of the given operations over the whole algebra.

In all these cases, it is perfectly possible to use natural numbers as indices for successive sets A₀, A₁, A₂, ... in the generation of the closure by the way of union, and replace the structural definition by one that makes a recursion on the indices. Indeed, this is a fairly common style of presentation, and, in some contexts, has its advantages. But in other cases, the use of indices and appeal to induction on numbers is an unnecessary detour.

Alice Box: Defining the Set of Natural Numbers Recursively

Alice: My friend studying the philosophy of mathematics tells me that even the set of natural numbers may be defined by structural recursion. This, he says, is the justification for induction over the integers. Is that possible?

Hatter: We can define the natural numbers in that way, if we are willing to identify them with sets. There are many ways of doing it. For example, we can take N to be the least set X that contains ∅ and is closed under the operation taking each set S to S ∪ {S}. In this way, arithmetic is reduced to set theory.

Alice: Does that justify induction over the natural numbers?
Hatter: It depends on which way you are doing things. On the one hand, when arithmetic is axiomatized in its own terms, without reducing it to anything else, induction is simply treated as an axiom and so is not given a formal justification. But when arithmetic is reduced to set theory along the lines that I just mentioned, induction is no longer treated as an axiom, since it can be proven.

Alice: But which is the best way of proceeding?
Hatter: That depends on your philosophy of mathematics. In my view, it is not a question of one way being right and the other being wrong, but one of convenience for the tasks in hand. But discussing that any further would take us too far off our track.

Exercise 4.6.1 (2) (with solution) Define by structural recursion the set of even palindromes over an alphabet A, that is, the palindromes with an even number of letters.

Solution The set of even palindromes over an alphabet A is the closure of {λ} under the same rule, that is, passage from a string s to a string xsx for any x ∈ A.
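As an illustrative sketch in Python (not from the text), the recursive definition can be used to generate the even palindromes over a small alphabet up to a given length, starting from the empty string and repeatedly flanking:

def even_palindromes(alphabet, max_len):
    # Start from the basis {λ} (written '') and repeatedly flank s with x ... x.
    result = {''}
    frontier = {''}
    while frontier:
        frontier = {x + s + x
                    for s in frontier
                    for x in alphabet
                    if len(s) + 2 <= max_len}
        result |= frontier
    return result

print(sorted(even_palindromes({'a', 'b'}, 4)))
# ['', 'aa', 'aaaa', 'abba', 'baab', 'bb', 'bbbb']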

4.6.2 Proof by Structural Induction

We have seen that the procedure of defining a set by structural recursion, that is, as the closure of a set under given relations or functions, is pervasive in computer science, logic and abstract algebra. Piggybacking on this mode of definition is a mode of demonstration that we will now examine – proof by structural induction.

Let A be a set of any items whatsoever, and let A⁺ be the closure of A under a collection {Rᵢ : i ∈ I} of relations. Consider any property that we would like to show holds of all elements of A⁺. We say that a relation R preserves the property iff whenever a₁, ..., aₘ have the property and (a₁, ..., aₘ, x) ∈ R, then x also has it. When R is a function f, this amounts to requiring that whenever a₁, ..., aₘ have the property, then f(a₁, ..., aₘ) also has the property.

The principle of proof by structural induction may now be stated as follows. Again, let A be a set, and A⁺ the closure of A under a collection {Rᵢ : i ∈ I} of relations. To show that every element of A⁺ has a certain property, it suffices to show two things:

Basis: Every element of A has the property.
Induction step: Each relation R ∈ {Rᵢ : i ∈ I} preserves the property.

Proof of the principle is almost immediate given the definition of the closure A⁺. Let X be the set of all items that have the property in question. Suppose that both basis and induction step hold. Since the basis holds, A ⊆ X. Since the induction step holds, X is closed under the relations Rᵢ. Hence, by the definition of A⁺ as the least set with those two features, we have A⁺ ⊆ X, that is, every element of A⁺ has the property, as desired.


For an example of the application of this principle, suppose we want to show that every even palindrome has the property that every letter that occurs in it occurs an even number of times. Recall that the set of even palindromes was defined as the closure of {λ} under passage from a string s to a string xsx where x ∈ A. So we need only show two things: that λ has the property in question, and that whenever s has it then so does xsx. The former holds vacuously, since there are no letters in λ (remember, λ is not itself a letter). The latter is trivial, since the passage from s to xsx adds two more occurrences of the letter x without disturbing the other letters. Thus, the proof is complete.

We could have proven the same fact by an induction over the natural numbers, first defining a suitable measure of the length of a palindrome (in this case, just the number of letters from A occurring in it), and then inducing on the length. But the passage through numbers is a quite unnecessary detour – in this example, just a short one, but in others, where the measure of length may be rather complex, it can be quite irksome.

Exercise 4.6.2
(a) Use proof by structural induction to show that in any (unabbreviated) formula of propositional logic, the number of left brackets equals the number of right brackets.
(b) If you were to prove the same by induction over the natural numbers, what would be a suitable measure of length?
(c) Let A be any set of formulae of propositional logic, and let R be the three-place relation of detachment (alias modus ponens) defined by putting (a,b,c) ∈ R iff b is the formula a→c. Let A⁺ be the closure of A under R. Use structural induction to show that every formula in A⁺ is a subformula of some formula in A.

4.6.3 Defining Functions by Structural Recursion on Their Domains

There is an important difference between the two examples in the last exercise. In the first one, we begin with a set A of elementary letters, and the closing functions (forming negations, conjunctions, disjunctions) produce longer and longer formulae, and so always give us something fresh. But in the second example, the closing relation of detachment may sometimes give us a formula that is already in the initial set A or was already available at an earlier stage of the closing process: it is not guaranteed always to give us something fresh.

This difference is of no significance for structural induction as a method of proof, but it is vital if we want to use structural recursion to define a function whose domain is a set that was already defined by structural recursion – a situation that arises quite frequently.

Suppose that we want to give a recursive definition of a function f whose domain is the closure A⁺ of a set A under a function g. For example, we might want to define a function f : A⁺ → N, as some kind of measure of complexity of the elements of the domain, by putting f(a) = 0 for all a ∈ A, and f(g(a)) = f(a) + 1. Is this legitimate?


Unfortunately, not always. If the function g is not injective, then we will have two distinct a, a′ with g(a) = g(a′), and it may very well be that f(a) ≠ f(a′) and thus f(a) + 1 ≠ f(a′) + 1, so that f(g(a)) has not been given a unique value. Moreover, the injectivity of g is not quite enough, by itself, to ensure that f is well defined. It may happen that even when g is injective, g(a) is already an element a′ ∈ A, so that f(g(a)) is defined as 0 by the first part of the definition but as f(a) + 1 > 0 by the second part, again lacking a unique value.

To illustrate this second eventuality, let g: N → N be the function of adding one to a natural number except that g(99) is 0. That is, g(n) = n + 1 for n ≠ 99, while g(99) = 0. Clearly, this function is injective. Put A = {0}. Then the closure A⁺ of A under g is the set {0, ..., 99}. Now suppose that we try to define a function f : A⁺ → N by structural induction putting f(0) = 0 and f(g(n)) = f(n) + 1 for all n ∈ A⁺. The recursion step is fine for values of n < 99; indeed we get f to be the identity function for those values. But it breaks down at n = 99. The recursion step tells us that f(g(99)) = f(99) + 1 = 100, but since g(99) = 0, the basis has already forced us to say that f(g(99)) = f(0) = 0. Although the basis and the recursion step make sense separately, they conflict, and we have not succeeded in defining a function!
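The clash can be made concrete in a small Python sketch (illustrative only, not from the text): g is injective, the closure of {0} under g is {0, 1, ..., 99}, and yet the basis and the recursion step assign different values at f(g(99)) = f(0).

def g(n):
    # Injective: successor, except that g(99) = 0.
    return 0 if n == 99 else n + 1

value_from_basis = 0             # the basis clause forces f(0) = 0

# Following the recursion step f(g(n)) = f(n) + 1 upwards from f(0) = 0 ...
f = {0: 0}
for n in range(99):              # fills in f(1), ..., f(99)
    f[g(n)] = f[n] + 1
value_from_recursion_step = f[99] + 1   # ... demands f(g(99)) = 100

print(g(99), value_from_basis, value_from_recursion_step)   # 0 0 100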

In summary, whereas structural recursion is always legitimate as a method of defining a set as the closure of an initial set under relations or functions, it can fail as a method of defining a function with a recursively defined set as its domain, unless precautions are taken.

What precautions? What condition needs to be satisfied to guarantee that such definitions are legitimate? Fortunately, analysis of examples like the one above suggests an answer. To keep notation simple, we focus on the case that the closure is generated by functions.

Let A be a set and A⁺ its closure under a collection {gᵢ : i ∈ I} of functions. An element x of A⁺ is said to be uniquely decomposable iff either (1) x ∈ A, and is not in the range of any of the functions gᵢ, or (2) x ∉ A, and x = gᵢ(y₁, ..., yₖ) for a unique function gᵢ in the collection and a unique tuple y₁, ..., yₖ ∈ A⁺. Roughly speaking, this guarantees that there is a unique way in which x can have got into A⁺. When the elements of A⁺ are symbolic expressions of some kind, this property is also called unique readability. Unique decomposability/readability suffices to guarantee that a structural recursion on A⁺ succeeds, that is, that it defines a unique function with A⁺ as domain. To be precise, we have the following

Principle of structural recursive definition: Let A be a set, {gᵢ : i ∈ I} a collection of functions, and A⁺ the closure of A under the functions. Suppose that every element of A⁺ is uniquely decomposable. Let X be any set, and let f : A → X be a given function on A into X. Then, for every collection {hᵢ : i ∈ I} of functions hᵢ on powers of X into X with arities corresponding to those of the functions gᵢ, there is a unique function f⁺: A⁺ → X satisfying the following recursively formulated conditions:

Basis: where x ∈ A, f⁺(x) = f(x).
Recursion step: where x = gᵢ(y₁, ..., yₖ) is the unique decomposition of x, f⁺(x) = hᵢ(f⁺(y₁), ..., f⁺(yₖ)).


Fig. 4.1 Recursive structural definition

Another way of putting this principle, which will ring bells for readers who have done some abstract algebra, is as follows: We may legitimately extend a function f : A → X homomorphically to a function f⁺: A⁺ → X if every element of A⁺ is uniquely decomposable.

This is highly abstract, and may at first be rather difficult to follow. It may help to focus on the case that A is a singleton {a} and A⁺ is its closure under just one function g with just one argument, so that there is also just one function h under consideration, which likewise has just one argument. This basic case may also be represented by a diagram, as in Fig. 4.1. In the diagram, the unique decomposition condition becomes the requirement that in the left ellipse, the bottom element is not hit by any arrow, while the other elements in that ellipse are hit by just one arrow.

The good news is that when we look at specific recursions in computer science and logic, particularly for symbolic expressions, the unique decomposition/readability principle is often satisfied. Moreover, the applications can be very natural – so much so that they are sometimes carried out without mention, taking the existence of unique extensions for granted.

For a very simple example of this, suppose we are given an alphabet A, and let A⁺ be the set of all (finite) lists that can be formed from this alphabet by the operation cons, that is, of prefixing a letter of the alphabet to an arbitrary list (see Sect. 3.6.2). We might define the length of a list recursively as follows, where ⟨⟩ is the empty list:

Basis: length(⟨⟩) = 0.
Induction step: If length(s) = n and a ∈ A, then length(as) = n + 1.

The principle tells us that this definition is legitimate, because it is clear that each non-empty list has a unique decomposition into a head and a body. On the other hand, if we had tried to make a similar definition using the con operation, of concatenating any two strings s and t and putting length(ts) = length(t) + length(s), the unique readability condition would not hold: the string ts could have been broken down in other ways. To be sure, in this simple example, it would be possible to get around the failure of unique readability by showing that nevertheless whenever ts = t′s′, then length(t) + length(s) = length(t′) + length(s′). But that is a bit of a bother, and in other examples, such a patch may not be possible.
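As an illustrative sketch in Python (not from the text), the definition of length can be written with lists represented as nested cons pairs; since each non-empty list decomposes uniquely into a head and a body, the recursion goes through.

def cons(a, s):
    # Lists as nested pairs: the empty list is None, cons(a, s) is the pair (a, s).
    return (a, s)

def length(s):
    if s is None:               # basis: length(⟨⟩) = 0
        return 0
    head, body = s              # the unique decomposition of a non-empty list
    return length(body) + 1     # induction step

xs = cons('a', cons('b', cons('c', None)))
print(length(xs))   # 3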

For another example of appeal to the unique readability condition, consider the customary definition of the depth of a formula of propositional logic. Suppose our formulae are built up using just negation, conjunction and disjunction. Usually we say:

Basis: depth(p) = 0 for any elementary letter p.
Induction step:
Case 1. If depth(α) = m, then depth(¬α) = m + 1.
Case 2. If depth(α) = m and depth(β) = n, then depth(α∧β) = depth(α∨β) = max(m,n) + 1,

and take it for granted that the function is well defined. The principle of structural recursive definition tells us that indeed it is so, because the formulae of propositional logic have unique decompositions under the operations of forming negations, conjunctions and disjunctions, provided that they were written with suitable bracketing. Indeed, we can say that the mathematical purpose of bracketing is to ensure unique decomposition.
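As an illustrative sketch in Python (not from the text), the depth function can be written with formulae represented as nested tuples, the tuple structure playing the role of the bracketing that guarantees unique decomposition:

def depth(formula):
    # Formulae as nested tuples, e.g. ('and', ('not', 'p'), 'q') for ¬p ∧ q.
    if isinstance(formula, str):             # basis: an elementary letter
        return 0
    if formula[0] == 'not':                  # case 1
        return depth(formula[1]) + 1
    _, left, right = formula                 # case 2: 'and' or 'or'
    return max(depth(left), depth(right)) + 1

print(depth(('and', ('not', 'p'), ('not', 'q'))))   # 2
print(depth(('not', ('and', 'p', ('not', 'q')))))   # 3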

Exercise 4.6.3 (with partial solution)
(a) Suppose we forget brackets from formulae of propositional logic, and make no bracket-saving conventions about the cohesive powers of connectives. Show that the depth of the expression ¬p∧¬q is not well defined.
(b) Let A be any alphabet, and A* the set of all strings over A. Let a ∈ A and s ∈ A*. Intuitively, the substitution function σ_{a,s} substitutes the string s for the letter a in all strings. Define this function by structural recursion.
(c) In propositional logic, two formulae α, β are said to be tautologically equivalent, and we write α ⊣⊢ β, iff they receive the same truth-value as each other under every possible assignment of truth-values to elementary letters. Three well-known tautological equivalences are double negation α ⊣⊢ ¬¬α and the de Morgan principles ¬(α∧β) ⊣⊢ ¬α∨¬β and ¬(α∨β) ⊣⊢ ¬α∧¬β. With these in mind, define by structural recursion a function that takes every propositional formula (built using the connectives ¬,∧,∨) to a tautologically equivalent one in which negation is applied only to elementary letters. Hint: The recursion step will need to distinguish cases.

Solution to (a) When we omit brackets without making any conventions about the cohesive powers of negation and conjunction, the expression ¬p∧¬q has two decompositions, which we may write as (¬p∧¬q) and ¬(p∧¬q). The depth of the former is 2, but of the latter is 3. However, in a later chapter, we will see a trick for restoring unique readability without needing brackets.


4.7 Recursion and Induction on Well-Founded Sets*

The concept of a well-founded set provides the most general context for recursion and induction. It permits us to apply these procedures to any domain whatsoever, provided we have available a well-founded relation over it. Every other form can in principle be derived from this one. We explain the basics.

4.7.1 Well-Founded Sets

We begin by defining the notion of a well-founded relation over a set. Let W be any set, and < any irreflexive, transitive relation over W (attention: we are not requiring linearity, that is, that the relation is also complete, cf Sect. 2.6.2). We say that W is well founded by < iff every non-empty subset A ⊆ W has at least one minimal element. This definition is rather compact, and needs to be unfolded carefully:
• A minimal element of a set A ⊆ W (under <) is any a ∈ A such that there is no b ∈ A with b < a.
• The definition requires that every non-empty subset A ⊆ W has a minimal element. The word ‘every’ is vital. It is not enough to find some subset A of W with a minimal element, nor is it enough to show that W itself has a minimal element. Thus, the requirement is very demanding.

The definition can also be put in a quite different-looking, but nevertheless equivalent way. Let W be any set, and < any irreflexive, transitive relation over W. Then W is well founded by < iff there is no infinite descending chain ... a₂ < a₁ < a₀ of elements of W. To prove the equivalence, suppose on the one hand that there is such a chain. Then clearly the set A = {aᵢ: i ∈ N} is a subset of W with no minimal elements. For the converse, suppose that A ⊆ W is non-empty and has no minimal elements. Choose a₀ ∈ A, which is possible since A is non-empty; and for each i ∈ N, put aᵢ₊₁ to be some x ∈ A with x < aᵢ, which is always possible since aᵢ is never a minimal element of A. This gives us an infinitely descending chain ... a₂ < a₁ < a₀ of elements of A ⊆ W, and we are done.

This argument is short, but at the same time subtle. The second part of it, which looks so simple, implicitly made use of an axiom of set theory called the axiom of choice, which is not discussed in this chapter. For our purposes, the important point is to see the equivalence in intuitive terms.

To help appreciate the contours of the concept, we give some positive and negative examples. Clearly, the set N of all natural numbers is well founded under the usual ordering, since every non-empty set of natural numbers has a minimal element (in fact a unique least element). This contrasts with the set Z of all integers, which is not well founded under the customary ordering since it has a subset (for instance, that of the negative integers, or indeed the whole of Z) that does not have a minimal element.

Exercise 4.7.1 (1) (with solutions)
(a) Is the set of all non-negative rational numbers well founded under its usual <?
(b) Is the empty set well founded under the empty relation?
(c) Show that every finite set A is well founded under any transitive irreflexive relation over it.
(d) Let A be any finite set. Show that its power set P(A) is well founded under the relation of proper set inclusion.
(e) Can you find a subset of N that is not well founded under the usual relation <?

Solutions
(a) No. Consider the subset consisting of all rationals greater than zero; it has no minimal element.
(b) Yes, vacuously, as it has no non-empty subsets.
(c) Hint: Take any non-empty subset B ⊆ A. Do an induction on the number of elements of B, making it clear where you use the irreflexivity and transitivity of the relation in question.
(d) Since A is finite, so is P(A), so we can apply (c).
(e) No: Every subset A ⊆ N is well founded under <, because all of its non-empty subsets are non-empty subsets of N and so have a minimal element.

Parts (c) and (d) of the exercise above give us examples of sets that are well founded under relations that are not linear, that is, we may have distinct elements a, b with neither a < b nor b < a. For when a set A has more than one element, it will clearly have subsets that are incomparable under inclusion, for example, distinct singletons.

Exercise 4.7.1 (2)
(a) Show that the collection P(N) of all subsets (finite or infinite) of N is not well founded under inclusion. Hint: Use the definition in terms of infinite descending chains.
(b) Let A be any infinite set. Show that the collection Pf(A) of all its finite subsets is infinite, but is well founded under inclusion.

In the special case that a set is well founded by a linear relation, so that for all a, b in the set, either a = b or a < b or b < a, we say that it is well ordered. Thus, unpacking the definition, a set W is well ordered by a relation < iff < is a linear order of W satisfying the condition that every non-empty subset of W has a minimal element.

Thus, N and all of its subsets are well ordered under the usual <. However, a well-ordered set can in a natural sense be ‘longer’ than N. For example, if we take the natural numbers in their standard order, followed by the negative integers in the converse of their usual order, giving us the set {0, 1, 2, ...; −1, −2, −3, ...}, then this is well ordered in the order of listing. Clearly, the operation can be repeated as many times as we like. It opens the door to the theory of transfinite ordinal numbers, but we will stop short of going through it.

Exercise 4.7.1 (3)
(a) Reorder N itself in such a way as to obtain a relation that well orders it in a way that is, in the same natural sense, longer than that given by <. Hint: Don’t do anything complicated.
(b) Describe a relation that well-orders N×N, built out of the usual ordering < of N itself. Hint: Try making an infinite table and zigzag from the top left corner.
(c) Show that if W is well ordered, then every non-empty subset A ⊆ W has a unique minimal element, which is moreover the least element of the subset, in the sense that it is less than every other element of the subset.
(d) Show how, from any well-ordered set, we may form one that is not well ordered but is still well founded, by adding a single element.

4.7.2 Proof by Well-Founded Induction

Roughly speaking, a well-founded relation provides a ladder up which we can climb in a set. This intuition is expressed rigorously by the principle of induction over well-founded sets, as follows. Let W be any well-founded set, and consider any property. To show that every element of W has the property, it suffices to show:

Induction step: The property holds of an arbitrary element a ∈ W whenever it holds of all b ∈ W with b < a.

Note that the principle has no basis. In this, it is like cumulative induction on the positive integers. Indeed, it may be seen as a direct abstraction of the principle of cumulative induction, from the specific order over the natural numbers that we are familiar with, to any well-founded relation over any set whatsoever. Of course, we could write in a basis, which would be: every minimal element of W itself has the property in question. But it would be redundant, and so in this context is customarily omitted.

Exercise 4.7.2 (1) (with solution)
(a) Formulate the principle of induction over well-founded sets in contrapositive form.
(b) Formulate it as a statement about subsets of W rather than about properties of elements of W.
(c) Contrapose the formulation in (b).

Solution
(a) Let W be any well-founded set, and consider any property. If the property does not hold of all elements of W, then there is an a ∈ W that lacks the property, although every b ∈ W with b < a has the property.
(b) Let W be any well-founded set, and let A ⊆ W. If a ∈ A whenever b ∈ A for every b ∈ W with b < a, then A = W.
(c) Let W be any well-founded set, and let A ⊆ W. If A ≠ W, then there is an a ∈ W with a ∉ A, although b ∈ A for every b ∈ W with b < a.

The proof of the principle of induction over well-founded sets is remarkably brief, considering its power. Let W be any well-founded set, and consider any property. Suppose that (1) the property holds of an arbitrary element a ∈ W whenever it holds of all b ∈ W with b < a, but (2) it does not hold of all elements of W. We get a contradiction.

Let A be the set consisting of those elements of W that do not have the property in question. By the second supposition, A is not empty, so, since W is well founded, A has a minimal element a. Thus, on the one hand, a does not have the property. But on the other hand, since a is a minimal element of A, every b ∈ W with b < a is outside A, and so does have the property in question, contradicting supposition (1), and we are done.

Alice Box: Proving the Principle of Induction Over Well-Founded Sets

Alice: That’s certainly brief for such a general principle – although I would not say it is easy. But there is something about the proof that bothers me. We showed that the principle holds for any set that is well founded under a relation.

Hatter: Yes, indeed.
Alice: In the definition of a well-founded set, we required that the relation be irreflexive and transitive, as well as satisfying the ‘minimal element’ condition that every non-empty subset has at least one minimal element. I see how we used the minimal element condition in the proof, but as far as I can see, we didn’t use irreflexivity or transitivity. Does this mean that we can generalize the principle of well-founded induction by dropping those two requirements from the definition of a well-founded set?

Hatter: I guess so.

Alice and the Hatter guessed right, but we should perhaps say a bit more. Dropping the condition of irreflexivity from the definition of a well-founding relation does not really change anything. That is because irreflexivity is implied, anyway, by the minimal-element condition: If a < a, then the non-empty set {a} does not have a minimal element under <. It can also be shown that dropping the condition of transitivity from the definition does not really strengthen the principle of well-founded induction: The ‘strengthened’ version can in fact be derived, in a rather devious manner, from the standard one that we have stated above. Moreover, applications of the principle almost always work with a transitive relation. So we rest content with that formulation.

Exercise 4.7.2 (2) Show that the minimal-element condition implies acyclicity.

When a proof by induction is combined with reductio ad absurdum, it is often set out in a special way, which we might call a wlog presentation. We may be considering a set W that is well founded under a relation <, and want to show that every element of that set has a certain property. We suppose for reductio that there is an a ∈ W that lacks the property, and we let A be the set of all those elements. Since a ∈ A, we know that A is non-empty, so well-foundedness tells us that it has a minimal element a′. We then go on to get a contradiction by showing that there must nevertheless be a b < a′ lacking the property.


When this mode of argument is used, it is customary to abbreviate it. After supposing that there is an a ∈ W that lacks the property, one says simply: ‘we may assume without loss of generality (briefly, wlog) that a is minimal’. Here, ‘minimal’ is understood to mean ‘minimal among the elements of W lacking the property’ (not to be confused with being a minimal element of W itself). This turn of phrase avoids introducing another variable a′ as above, which can be a bit messy when the property under investigation is a complicated one. In the particular case that the well founding is linear and so a well ordering, then we say: ‘we may assume wlog that a is the least such element of W’.

This may all sound rather recherché and even perverse, but in fact it comes quite naturally, and it helps reduce notation. In the chapter on trees (specifically, in Sect. 7.6.2), we will see several examples of its use on the natural numbers under their usual ordering <.

Exercise 4.7.2 (3) Look up a standard textbook proof of the irrationality of √2 and put your finger on where it uses wlog.

Alice Box: Wlog Proofs

Alice: Is that the only situation in which we may assume something without loss of generality?

Hatter: No, but in practice, it is one of the most common.

Alice: When else can one do it?

Hatter: Whenever we are given that there is an a with a certain property P, and we can easily verify that whenever there is such an a, there is also an a′ (not necessarily identical with a) that has both the property P and another property Q. Then, instead of restating and relettering everything for a′, which can be quite a nuisance if the property P is a complex one, we simply say that we may assume wlog that a also has the property Q.

Once again, we can add a bit more to the Hatter’s response. There are still more occasions where an assumption may be made wlog. For example, suppose that we wish to show that all elements of a certain domain have a certain property. We may have at our disposal a representation or normal form theorem to the effect that every element of the domain is equivalent, in a suitable sense, to one of a special kind. Then we may begin our proof by taking an arbitrary element of the domain and assuming, without loss of generality, that the element is of the special kind. But care is needed. The move is legitimate only when such a normal form or representation theorem is available, and provided that the kind of equivalence that it talks about is sufficient to transfer the property that one is trying to establish.


4.7.3 Defining Functions by Well-Founded Recursion on Their Domains

Induction for well-founded sets is a principle of proof. Is there a corresponding principle for definition, guaranteeing the existence of functions that are defined by recursion on a well-founded domain? The answer is positive. However, stating the formal principle in its full generality is quite subtle, and we will not attempt it here. We supply a rather informal formulation that covers most of the cases that a computer scientist is likely to need.

Principle of recursive definition on well-founded sets: Let W be any set that is well founded under a relation <. Then we may safely define a function f : W → X by giving a rule that specifies, for every a ∈ W, the value of f(a) in terms of the values of f(b) for some collection of b < a, using any other functions and sets that are already well defined. ‘Safely’ here means that there exists a unique function satisfying the definition.

For confident and adventurous readers who have managed to follow so far, we illustrate the principle by using it to show that the Ackermann function is well defined. Those not so confident may skip directly to the next section. We recall the recursive definition of the Ackermann function:

A(0,n) = n + 1
A(m,0) = A(m − 1, 1) for m > 0
A(m,n) = A(m − 1, A(m, n − 1)) for m,n > 0.

This function has two arguments, both from N, so we turn it into a one-argument function on N² = N×N by reading the round brackets in the definition as indicating ordered pairs. Given the familiar relation < over N, which we know to be well founded, indeed to be a well ordering of N, we define the lexicographic order over N² by the rule: (m,n) < (m′,n′) iff either m < m′ or m = m′ and n < n′.

This is a very useful way of ordering the Cartesian product of any two well-founded sets, and deserves to be remembered in its own right. The reason for the name ‘lexicographic’ will be clear if you think of the case where instead of N, we consider its initial segment A = {n: 0 < n ≤ 26} and identify these with the letters of the English alphabet. The lexicographic order of A² then corresponds to its dictionary order.

We first check that the lexicographic relation < is irreflexive, transitive, and then that it well-founds N² (as requested in the following exercise). Having done that, we are ready to apply the principle of recursive definition on well-founded sets. All we have to do is show that the two recursion clauses of the candidate definition specify, for every (m,n) ∈ N², the value of A(m,n) in terms of the values of A(p,q) for a collection of pairs (p,q) < (m,n), using any other functions and sets already known to exist.

This amounts to showing for the first recursion clause, that (m − 1, 1) < (m, 0) and, for the second clause of the recursion step, that (m − 1, A(m, n − 1)) < (m, n), for all m,n > 0. But both of these are immediate from the definition of the lexicographic order <. We have (m − 1, 1) < (m, 0) because m − 1 < m, regardless of the fact that 1 > 0. Likewise, we have (m − 1, A(m, n − 1)) < (m, n) no matter how large A(m, n − 1) is, again because m − 1 < m.
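As an illustrative sketch in Python (not from the text), the lexicographic comparison can be written out and the two recursion clauses spot-checked to see that each recursive call is on a strictly smaller pair:

def lex_less(p, q):
    # (m,n) < (m',n') iff m < m' or (m = m' and n < n')
    (m, n), (m2, n2) = p, q
    return m < m2 or (m == m2 and n < n2)

for m in range(1, 5):
    assert lex_less((m - 1, 1), (m, 0))          # first recursion clause
    for n in range(1, 5):
        huge = 10 ** 6                           # stands in for A(m, n-1), however large
        assert lex_less((m - 1, huge), (m, n))   # second recursion clause

print("both clauses descend in the lexicographic order")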

Exercise 4.7.3
(a) Check the claim made in the text, that the lexicographic order of N² is irreflexive, transitive, and that it well-founds N².
(b) Show that in addition it is linear and so a well ordering of N².

4.7.4 Recursive Programs

What is the connection between all this and recursive programs for computers? Does the mathematical work on recursive definitions and inductive proofs do anything to help us understand how programs work and what they can do?

The link lies in the fact that a program can be seen as a finite battery of instructions to perform computations for a potentially infinite range of inputs. Simplifying a great deal, and restricting ourselves to the principal case of deterministic programs (those in which the instructions suffice to fully determine each step in terms of its predecessors), a program may be understood as a recipe for defining a function that takes an input a to a finite or infinite succession a₀, a₁, ... of successively computed items. If the sequence is finite, the last item may be regarded as the output.

Equivalently, we are looking at a recipe for defining a two-place function taking each pair (a,n), where a is a possible input and n is a natural number, to an item aₙ. The recipe is recursive in the sense that the value of aₙ may depend upon the values of any of the earlier aₘ, in whatever way that the program specifies, so long as each step to aₙ from the earlier steps may actually be performed by the computer that is executing the program. Typically, the ‘while’ and ‘until’ clauses setting up loops in the program form part of the recursion step in the definition of the associated function.

The question thus arises: When does the proposed recipe really give us a program, that is, when does it really define a function? If we are writing the program in a well-constructed programming language, then the tight constraints that have been imposed on the grammar of the language may suffice to guarantee that it is well defined. Hopefully, those who designed the language will have proven that by induction before making the language available to the user. If on the other hand we are working in a lax or informal language, the question of whether we have succeeded in specifying a unique function needs to be analyzed by the programmer, by expressing it in an explicitly recursive form and verifying by induction that it always gives a unique value.

When the recipe does give us a function, other questions remain. Is the program guaranteed to terminate, in other words, is the sequence of steps aₙ finite? If it does terminate, does the output give us what we would like it to give? Answers to these two questions also require proof, and the proof will always involve inductive arguments of one kind or other, perhaps of many kinds. In some simple cases, it may be sufficient to induce on the number of loops in the program. For more complex programs, the inductive arguments can be less straightforward. In general, one needs to devise a suitable relation over an appropriate set of items, verify that it is well founded and then use well-founded induction over the relation.

End-of-Chapter Exercises

Exercise 4.1: Proof by simple induction
(a) Use simple induction to show that for every positive integer n, 5^n − 1 is divisible by 4.
(b) Use simple induction to show that for every positive integer n, n^3 − n is divisible by 3. Hint: In the induction step, you will need to make use of the arithmetic fact that (k + 1)^3 = k^3 + 3k^2 + 3k + 1.
(c) Show by simple induction that for every natural number n, Σ{2^i: 0 ≤ i ≤ n} = 2^(n+1) − 1.

Exercise 4.2: Definition by simple recursion
(a) Let f: N → N be the function defined by putting f(0) = 0 and f(n + 1) = n for all n ∈ N.
    (i) Evaluate this function bottom-up for all arguments 0–5.
    (ii) Explain what f does by expressing it in explicit terms (i.e. without a recursion).
(b) Let g: N × N → N be defined by putting g(m,0) = m for all m ∈ N and g(m,n + 1) = f(g(m,n)), where f is the function defined in the first part of this exercise.
    (i) Evaluate g(3,4) top-down.
    (ii) Explain what g does by expressing it in explicit terms (i.e. without a recursion).
(c) Let f: N⁺ → N be the function that takes each positive integer n to the greatest natural number p with 2ᵖ ≤ n. Define this function by a simple recursion. Hint: You will need to divide the recursion step into two cases.

Exercise 4.3: Proof by cumulative induction
(a) Use cumulative induction to show that any postage cost of four or more pence can be covered by two-pence and five-pence stamps.
(b) Use cumulative induction to show that for every natural number n, F(n) ≤ 2ⁿ − 1, where F is the Fibonacci function.
(c) Calculate F(5) top-down, and then again bottom-up, where again F is the Fibonacci function.
(d) Express each of the numbers 14, 15 and 16 as a sum of 3s and/or 8s. Using this fact in your basis, show by cumulative induction that every positive integer n ≥ 14 may be expressed as a sum of 3s and/or 8s.
(e) Show by induction that for every natural number n, A(1,n) = n + 2, where A is the Ackermann function.

Exercise 4.4: Structural recursion and induction


(a) Define by structural recursion the set of all odd palindromes over a given alphabet.
(b) Show by structural induction that every even palindrome is either empty or contains two contiguous occurrences of the same letter.
(c) Define by structural recursion the set of all palindromes (over a given alphabet) that contain no two contiguous occurrences of the same letter. Hint: Use the fact given by (b).
(d) Show by structural induction that in classical propositional logic, no formula built using propositional letters and connectives from the set {∧, ∨, →, ↔} is a contradiction. Hint: Guess an assignment of truth-values that looks like it will make all such formulae true, and show by structural induction that it really does so.
(e) Justify the principle of structural induction by using cumulative induction over the natural numbers. Hint: To keep the exposition simple, consider the closure A⁺ of A under a single relation R, and use the bottom-up definition of A⁺ as the union of sets A₀, A₁, ….

Exercise 4.5: Definition of functions by structural recursion on their domains
(a) Consider any alphabet A and the set A* of all strings made up from A. Intuitively, the reversal of a string s is the string obtained by reading all its letters from right to left instead of left to right. Provide a structural recursive definition for this operation on strings. Hint: the base should concern the empty string, called λ, and the induction step should use the operation cons.
(b) In presentations of propositional logic, it is customary to define an assignment of truth-values to be any function v: A → {1,0}, where A is some stock of elementary letters p, q, r, …. Any such assignment is then extended to a valuation v⁺: A⁺ → {1,0}, where A⁺ is the set of all formulae (i.e. the closure of A under the operations of forming, say, negations, conjunctions and disjunctions) in a way that respects the usual truth tables. It is tacitly assumed that this extension is unique. Analyze what is being done in terms of structural recursion.

Exercise 4.6: Well-founded sets*
(a) Show that every subset of a well-founded set is well founded under the same relation (strictly speaking, under restriction of that relation to the subset, but let's not be too pedantic).
(b) Is the converse of a well-founding relation always a well-founding relation? Give proof or counterexample.
(c) Given well orderings <₁, <₂ of disjoint sets W₁, W₂, define a well ordering of their union W₁ ∪ W₂. How would you modify the definition for the case that the two sets are not disjoint?
(d) Given well orderings <₁, <₂ of disjoint sets W₁, W₂, define a well ordering of their Cartesian product W₁ × W₂.
(e) Let W be a set, R any relation over it, and let R* be the transitive closure of R (Sect. 2.7.1). Let A ⊆ W be non-empty. (i) Explain (in one sentence) why any infinite descending chain … a₂Ra₁Ra₀ of elements of W under R is an infinite descending chain under R*. (ii) Give a simple counterexample to the converse. (iii) Show that, nevertheless, if there is an infinite descending chain under R*, then there is also an infinite descending chain under R. Remark: These facts underlie the 'devious' proof of equivalence mentioned in Sect. 4.7.2.

Selected Reading

Induction and recursion on the positive integers. There are plenty of texts, although most tend to neglect recursive definition in favour of inductive proof. Here are three among them:

Hein JL (2005) Discrete structures, logic and computability, 2nd edn. Jones and Bartlett, Boston, chapter 4.4

Schumacher C (2001) Chapter zero: fundamental notions of abstract mathematics. Pearson, Boston, chapter 3

Velleman DJ (2006) How to prove it: a structured approach, 2nd edn. Cambridge University Press, Cambridge, chapter 6

Well-founded induction and recursion. Introductory accounts tend to be written for students of mathematics rather than computer science and, again, tend to neglect recursive definition. Two texts accessible to computer science students are:

Lipschutz S (1998) Set theory and related topics, Schaum's outline series. McGraw Hill, New York, chapters 8–9

Halmos PR (2001) Naive set theory, New edn. Springer, New York, chapters 17–19

Structural induction and recursion. It is rather difficult to find introductory presentations. One of the purposes of the present chapter has been to fill the gap!


5 Counting Things: Combinatorics

Abstract
Up to now, our work has been almost entirely qualitative. The concepts of a set, relation, function, recursion and induction are non-numerical, although they have important numerical applications as, for example, sets of integers or recursive definitions on the natural numbers. In this chapter, we turn to quantitative matters, and specifically to problems of counting. We will tackle two kinds of question.

First: Are there rules for determining the number of elements of a large set from the number of elements of smaller ones? Here we will learn how to use two very simple rules: addition (for unions of disjoint sets) and multiplication (for Cartesian products of arbitrary sets).

Second: Are there rules for calculating the number of possible selections of k items out of a set with n elements? Here we will see that the question is less straightforward, as there are several different kinds of selection, and they give us very different outcomes. We will untangle four basic modes of selection, give arithmetical formulae for them and practice their application. In the final section, we will turn to the problems of counting rearrangements and configured partitions of a set.

5.1 Basic Principles for Addition and Multiplication

In earlier chapters, we already saw some principles for calculating the number of elements of one set, given the number in another. In particular, in Chap. 3, we noted the equinumerosity principle: two finite sets have the same number of elements iff there is a bijection from one to the other. In the exercises at the end of Chap. 1, we noted an important equality for difference, and another one for disjoint union. They provide our starting point in this chapter, and we begin by recalling them.


5.1.1 Principles Considered Separately

For the difference A∖B of A with respect to B, that is, {a ∈ A: a ∉ B}, we observed in Chap. 1:

Subtraction principle for difference: Let A, B be finite sets. Then #(A∖B) = #(A) − #(A∩B).

For union we saw:

Addition principle for two disjoint sets: Let A, B be finite sets. If they are disjoint, then #(A∪B) = #(A) + #(B).

The condition of disjointedness is essential here. For example, when A = {1,2} and B = {2,3}, then A∪B = {1,2,3} with only three elements, not four.

Clearly, the addition principle can be generalized to n sets, provided they are pairwise disjoint in the sense of Chap. 1, that is, for any i,j ≤ n, if i ≠ j, then Aᵢ ∩ Aⱼ = ∅.

Addition principle for many pairwise disjoint sets: Let A₁, …, Aₙ be pairwise disjoint finite sets. Then, #(∪{Aᵢ: i ≤ n}) = #(A₁) + ⋯ + #(Aₙ).

Exercise 5.1.1 (1) (with solution) Let A, B be finite sets. Formulate and verify a necessary and sufficient condition for #(A∪B) = #(A).

Solution #(A∪B) = #(A) iff B ⊆ A. Reason: If B ⊆ A, then A∪B = A, so #(A∪B) = #(A). Conversely, if the inclusion does not hold, then there is a b ∈ B∖A, so #(A) < #(A) + 1 = #(A∪{b}) ≤ #(A∪B).

Can we say anything for union when the sets A, B are not disjoint? Yes, by breaking them down into disjoint components. For example, we know that A∪B = (A∩B)∪(A∖B)∪(B∖A), and these are disjoint, so #(A∪B) = #(A∩B) + #(A∖B) + #(B∖A). But we can go further. Applying the subtraction principle to this, we have:

#(A ∪ B) = #(A ∩ B) + #(A) − #(A ∩ B) + #(B) − #(A ∩ B)
         = #(A ∩ B) − #(A ∩ B) − #(A ∩ B) + #(A) + #(B)
         = #(A) + #(B) − #(A ∩ B).

We have thus shown the following:

Addition principle for any two finite sets: Let A, B be any finite sets. Then, #(A∪B) = #(A) + #(B) − #(A∩B).
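As a quick machine check of the principle (my own illustration, not part of the text), the following Python sketch compares both sides on a pair of sample sets:

    # Minimal sketch: checking #(A ∪ B) = #(A) + #(B) - #(A ∩ B) on sample sets.
    A = {1, 2, 3, 4}
    B = {3, 4, 5}

    lhs = len(A | B)                      # #(A ∪ B)
    rhs = len(A) + len(B) - len(A & B)    # #(A) + #(B) - #(A ∩ B)
    assert lhs == rhs == 5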

Exercise 5.1.1 (2)
(a) Use a Venn diagram to illustrate the addition principle for two disjoint sets.
(b) Do the same for the non-disjoint version.
(c) Check the claim that A∪B = (A∩B)∪(A∖B)∪(B∖A).


Alice Box: Addition Principle for Many Finite Sets

Alice: We generalized the addition principle for two disjoint sets to n disjoint sets. Can we make a similar generalization when the sets are not assumed to be disjoint?

Hatter: Indeed we can, but its formulation is rather more complex. To understand it properly, we need the notion of an arbitrary combination of n things k at a time, which we will get to later in the chapter. So let's take a rain-check on that one.

Exercise 5.1.1 (3) (with solution)
(a) The classroom contains 17 male students, 18 female students and the professor. How many altogether in the classroom?
(b) The logic class has 20 students who also take calculus, 9 who also take a philosophy unit, 11 who take neither, and 2 who take both calculus and a philosophy unit. How many students in the class?

Solution
(a) 17 + 18 + 1 = 36, using the addition principle for n = 3 disjoint sets.
(b) ((20 + 9) − 2) + 11 = 38, using the addition principle for two arbitrary finite sets to calculate the number taking either calculus or philosophy (in the outer parentheses), and then applying the addition principle for two disjoint sets to cover those who take neither.

Applications of the addition principles taken alone are usually quite trivial. Their power comes from their joint use with other rules, notably the multiplication principle, which tells us the following:

Multiplication rule for two sets. Let A, B be finite sets. Then, #(A×B) = #(A)·#(B).

In words: The number of elements in the Cartesian product of two sets is equal to the product of the numbers in each. Reason: If B has n elements, then for each a ∈ A, #{(a,b): b ∈ B} = n. If A has m elements, then there are m of these sets {(a,b): b ∈ B} for a ∈ A. Moreover, they are disjoint, and their union is A×B. So, #(A×B) = n + ⋯ + n (m times) = m·n = #(A)·#(B). The same reasoning gives us more generally:

Multiplication rule for many sets. #(A₁ × … × Aₙ) = #(A₁)· … ·#(Aₙ) for any finite sets A₁, …, Aₙ.
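Again as an informal check rather than anything in the text, the multiplication rule can be verified for small sets with Python's itertools.product:

    # Minimal sketch: #(A × B) equals #(A)·#(B), checked with itertools.product.
    from itertools import product

    A = {"soup", "salad"}
    B = {"beef", "duck", "vegetarian"}

    assert len(list(product(A, B))) == len(A) * len(B)   # 6 = 2 * 3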

Exercise 5.1.1 (4) (with solution) The menu at our restaurant allows choice: For first course, one can choose either soup or salad, the second course can be either beef, duck or vegetarian, followed by a choice of fruit or cheese, ending with black tea, green tea or coffee. How many selections are possible?


Solution 2·3·2·3 = 36. The selections are independent of each other, and so are in one-one correspondence with the Cartesian product of four sets with 2, 3, 2, 3 elements, respectively.

Alice Box: The Maverick Diner

Alice: Not if I am there!
Hatter: What do you mean?
Alice: Well, I never drink anything with caffeine in it, so I would skip the last choice. And following an old European custom, I prefer to take my fruit before anything else, so I would not follow the standard order. So my selection would be none of your 36. It would be, say, (fruit, salad, duck, nothing).

Alice has put her finger on an important issue. When a real-life counting problem is described, its presentation is frequently underdetermined. In other words, certain aspects are left implicit, and often they can be filled in different ways. For example, in the restaurant problem, it was tacitly assumed that everyone chooses exactly one dish from each category, and that the order of the categories is fixed: any other reordering is either disallowed or regarded as equivalent to the listed order. If we drop these assumptions, we get quite different numbers of selections.

In general, the trickiest part of a real-life counting problem is not to be found in the calculations to be made when applying a standard mathematical formula. It lies in working out which (if any) of the mathematical formulae in one's toolkit is the appropriate one. And that depends on understanding what the problem is about and locating its possible nuances. We need to know what items are being selected, from what categories, in what manner. Especially, we need to know when two items, or two categories or two selections are to be regarded as identical – in other words, are to be counted as one, not as two. Only then can we safely give the problem an abstract representation that justifies the application of one of the formulae in the toolkit, rather than one of the others. We will see more examples of this as we go on.

Exercise 5.1.1 (5)
(a) Telephone numbers in the London area begin with 020 and continue with eight more digits. How many are there?
(b) How many distinct licence plates are there consisting of two letters other than O, I and S (to prevent possible visual confusion with similar-looking digits), followed by four digits?


5.1.2 Using the Two Principles Together

Often the solution of a problem requires the use of both addition and multiplication principles. Here is a simple example. How many four-digit numbers begin with 5 or with 73?

We begin by breaking our set (of all four-digit numbers beginning with 5 or with 73) into two disjoint subsets – those beginning with 5 and those beginning with 73. We determine their numbers separately, and then by the addition principle for disjoint sets, we add them. In the first case, with the digit 5, there are three digits still to be filled in, with 10 choices for each, so we get 10³ = 1,000 possibilities. In the second case, beginning with 73, we have two digits to fill in; hence, 10² = 100 possibilities. So, the total is 1,100 four-digit numbers beginning with 5 or with 73.

This kind of procedure is quite common, and so we look at its general form. We are required to determine #(A) – the number of elements of a set A. We observe that A = A₁ ∪ … ∪ Aₙ where the sets Aᵢ are pairwise disjoint. We then note that each of the sets Aᵢ is (or may be put in one-one correspondence with) a Cartesian product Aᵢ₁ × … × Aᵢₖ (where k may depend on i). So we apply n times the multiplication rule to get the value of each #(Aᵢ), and then apply once the addition rule for the disjoint sets to get the value of #(A) itself.
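For readers who like to confirm such counts by brute force, here is a small Python sketch (my own illustration, not the book's) for the worked example of four-digit numbers beginning with 5 or with 73:

    # Minimal sketch: brute-force check of the worked example.
    count = sum(1 for n in range(1000, 10000)
                if str(n).startswith("5") or str(n).startswith("73"))
    assert count == 1000 + 100   # 10^3 + 10^2 = 1,100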

Exercise 5.1.2 (1) (with partial solution)
(a) You have six shirts, five ties and four pairs of jeans. You must wear a shirt and a pair of trousers, but maybe not a tie. How many outfits are possible?
(b) A tag consists of a sequence of four alphanumeric signs (letters or digits). How many tags with alternating letters and digits begin with either the digit 9 or the letter m? Warning: Pay attention to the requirement of alternation.

Solution to (a) The set of possible outfits is the union of two sets: S×J and S×J×T. Note that these two sets are disjoint (why?). There are 6·4 = 24 elements of S×J, and 6·4·5 = 120 elements of S×J×T. So there are 144 attires in which to sally forth. Another way of getting the same answer: Let T′ be the six-element set consisting of the five ties plus 'no tie'. The set of possible outfits is S×J×T′, which has 6·4·6 = 144 elements.

5.2 Four Ways of Selecting k Items Out of n

A club has 20 members, and volunteers are needed to set up a weekly roster to clean up the coffee room, one for each of the 6 days of the week that the club is open. How many possible ways of doing this?

This question has no answer! Or rather, it has several different answers, according to how we interpret it. The general form is: how many ways of selecting k items (here 6 volunteers) out of a set with n elements (here, the pool of 20 members). There are two basic questions that always need to be asked before the counting can begin: whether order matters, and whether repetition is allowed.


Table 5.1 Cells not to count

        a     b     c     d     e
  a    R⁻
  b    O⁻    R⁻
  c    O⁻    O⁻    R⁻
  d    O⁻    O⁻    O⁻    R⁻
  e    O⁻    O⁻    O⁻    O⁻    R⁻

5.2.1 Order and Repetition

Expressed in more detail, the two preliminary questions that we need to answer are the following:
• Does the order in which the selection is made matter for counting purposes? For example, is the roster with me serving on Monday and you on Tuesday regarded as different from the one that is the same for all other days but with our time-slots swapped? Do we count them as two rosters or as one?
• Is repetition allowed? For example, are you allowed to volunteer for more than one day, say, both Monday and Friday?

These two options evidently give rise to four cases, which we will need to distinguish carefully in our analysis.

Order matters, repetition allowed                 O⁺R⁺
Order matters, repetition not allowed             O⁺R⁻
Order doesn't matter, repetition allowed          O⁻R⁺
Order doesn't matter, repetition not allowed      O⁻R⁻

To illustrate the difference in a two-dimensional table, consider a similar example with just two volunteers, and to keep the diagram small, take the total number of club members to be five, whom we call a,b,c,d,e. Our table will then contain 5·5 = 25 cells for the 25 pairs (x,y) of members. If we select volunteers (b,d) say, then this is represented by the cell for row b and column d (Table 5.1).

The decisions we make concerning order and repetition will affect which cells are considered possible and which are taken as distinct from each other. Thus, in both cases, these decisions determine which cells not to count.

Mode O⁺R⁺: If order matters and repetition is allowed, then each cell represents a possible selection of two volunteers out of five, and all are considered different. There are thus 25 possible selections.

Mode O⁺R⁻: If order matters but repetition is not allowed, then the diagonal cells do not represent allowed selections, but the remaining cells are all distinct. There are thus 25 − 5 = 20 possible selections, subtracting the five diagonal ones (marked R⁻) from the total.

Mode O⁻R⁺: If order does not matter but repetition is permitted, then all cells are allowed, but those to the upper right of the diagonal give the same outcome as their images in the bottom left. For example, (b,d) represents the same selection as (d,b). In this case, we have only 25 − 10 = 15 possible selections, subtracting the 10 to the bottom left of the diagonal (marked O⁻) to avoid counting them twice.

Mode O⁻R⁻: If order does not matter and repetition is not allowed, we have even less. We must subtract from the previous count, the non-permitted cells of the diagonal, leaving only those to the upper right. Thus, we get only 15 − 5 = 10 selections. Equivalently, we could subtract from the count preceding that (the one for O⁺R⁻), the ten duplicating cells to the bottom left of the diagonal, likewise giving 20 − 10 = 10 selections. In other words, we subtract from the total 25 both those marked R⁻ (not permitted) and those marked O⁻ (duplicates).

Thus, according to the way in which we understand the selection, we get four different answers: 25, 20, 15 or 10 possible selections! The moral of the story is that we must consider the four modes of selection separately.
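The four counts 25, 20, 15 and 10 can also be checked mechanically. The sketch below (mine, not from the text) enumerates the four modes for 2 volunteers out of 5 members using Python's itertools; the library functions happen to mirror the four modes exactly.

    # Minimal sketch: brute-force counts of the four modes for k = 2 out of n = 5.
    from itertools import product, permutations, combinations, combinations_with_replacement

    members = ['a', 'b', 'c', 'd', 'e']
    k = 2

    print(len(list(product(members, repeat=k))))                 # O+R+ : 25
    print(len(list(permutations(members, k))))                   # O+R- : 20
    print(len(list(combinations_with_replacement(members, k))))  # O-R+ : 15
    print(len(list(combinations(members, k))))                   # O-R- : 10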

Exercise 5.2.1 Construct a similar table for the four different modes of choosing 2 elements out of 6, marking the respective areas in the same way, and giving the number of selections under each mode.

5.2.2 Connections with Functions

It is helpful to notice connections between the above and what we have learned about functions in Chap. 3. In the first two modes, where order is significant, we are in effect counting functions, as follows:

For mode O⁺R⁺: When order matters and repetition is allowed, then each selection of k items from an n-element set may be represented by an ordered k-tuple (a₁, …, aₖ) of elements of the set. As we already know, such an ordered k-tuple may be understood as a function on the set {1, …, k} into the set {a₁, …, aₙ}. Thus, counting the selections of k items from an n-element set {a₁, …, aₙ}, understood in this mode, is the same as counting all the functions from the k-element set {1, …, k} into {a₁, …, aₙ}.

For mode O⁺R⁻: When order matters but repetition is not allowed, then the number of selections of k items from an n-element set {a₁, …, aₙ} equals the number of injective functions from the k-element set {1, …, k} into {a₁, …, aₙ}. Injective, because when i ≠ j, f(i) is not allowed to be the same as f(j).

In the fourth mode, we are no longer counting functions, but something very close which reduces to something very simple.

For mode O⁻R⁻: When order does not matter and repetition is not allowed, then the number of selections of k items from an n-element set {a₁, …, aₙ} is the same as the number of ranges of injective functions from a k-element set into {a₁, …, aₙ}. If you think about it, this is the number of k-element subsets of {a₁, …, aₙ}. Reason: Every such range is a k-element subset of {a₁, …, aₙ}, and every k-element subset of {a₁, …, aₙ} is the range of some such function; the two collections are thus identical and so have the same number of elements.

This leaves us with the third mode of selection, O⁻R⁺. We can also describe this in terms of functions, but they are no longer functions from {1, …, k} into the selection-pool {a₁, …, aₙ}. They are functions from {a₁, …, aₙ} into {0, …, k} satisfying a certain condition.

Mode O⁻R⁺: When order doesn't matter but repetition is allowed, then the number of selections of k items from an n-element set {a₁, …, aₙ} is the same as the number of functions from {a₁, …, aₙ} into {0, …, k} satisfying the condition that f(a₁) + … + f(aₙ) = k. At first, this may appear rather mysterious, but the reason is quite simple. The function tells us how many times each element of the selection-pool {a₁, …, aₙ} actually appears in the selection. It can be zero, one or more, but the sum of all appearances must come to k, as we are selecting k items.

Alice Box: Multisets

Alice: This is getting a bit tricky.
Hatter: I can tell you about another way of understanding O⁻R⁺ selections of k items from a pool of n, but it uses a language we have not seen before.
Alice: OK, go ahead.
Hatter: We can see them as picking out all the k-element multisets of a set of n elements.
Alice: What is a multiset?
Hatter: Roughly speaking, it is like a set, but allows repetitions. For example, when a and b are distinct items, the set {a,a,b,b} has only two elements and is identical to the set {a,b}; but the multiset [a,a,b,b] has four members, and is distinct from the multiset [a,b]. Multisets, also sometimes called bags, have recently gained some popularity in certain areas of computer science, and also in certain areas of logic dealing with the analysis of proofs.
Alice: Oh no! So now I have to learn a new version of Chap. 1, about the behaviour of multisets? I protest!
Hatter: OK, let's forget about them for this course. If ever you want to know more, you can begin with the Wikipedia article about them.

Exercise 5.2.2 (with solution) When k > n, that is, when the number of items to be selected is greater than the number of items in the selection-pool, which of the four modes of selection become impossible?

Solution The two modes that prohibit repetition (O⁺R⁻ and O⁻R⁻) prevent any possible selection when k > n. For they require that there be at least k distinct items selected, which is not possible when there are less than k waiting to be chosen.


Table 5.2 Names for four modes of selecting k items from an n-element set

Generic term   Mode of selection   Particular modes                       Notation
Selection      O⁺R⁻                Permutations                           P(n,k) and similar
               O⁻R⁻                Combinations                           C(n,k) and similar
               O⁺R⁺                Permutations with repetition allowed   Varies with author
               O⁻R⁺                Combinations with repetition allowed   Varies with author

In contrast, the two modes permitting repetition (O⁺R⁺ and O⁻R⁺) still allow selection when k > n.

So far, we have been calling the four modes by their order/repetition codes; it is time to give them names. Two of the four modes – those where repetition is not allowed – have widely accepted names going back centuries, long before any systematic account along the lines given here:
• When repetition is not allowed but order matters, that is, O⁺R⁻, the mode of selection is traditionally called permutation. The function counting the number of permutations of n elements k at a time is written P(n,k), or in some texts as ₙPₖ or ⁿPᵏ or similar.
• When repetition is not allowed and order does not matter, that is, O⁻R⁻, the mode of selection is traditionally called combination. The function counting the number of combinations of n elements k at a time is written C(n,k), or for some authors, ₙCₖ or ⁿCᵏ or similar. A common two-dimensional notation, very convenient in complex calculations, puts the n above the k within large round brackets.

These two modes are the ones that tend to crop up most frequently in traditional mathematical practice, for example, in the formulation of the binomial theorem going back to the sixteenth century. They are also the most common modes in computer science practice.

For the modes allowing repetition, terminology is distressingly variable from author to author. The simplest names for them piggyback on the standard ones for their counterparts without repetition, as follows:
• When repetition is allowed and order matters, that is, when we are in the mode O⁺R⁺, we will speak of permutations with repetition allowed.
• When repetition is allowed but order does not matter, that is, when we are in the mode O⁻R⁺, we will speak of combinations with repetition allowed.

In Table 5.2, we summarize the nomenclature. Note that it lists the four modes in a sequence different from that which was convenient for our conceptual explanation. It is more convenient for numerical analysis; it corresponds to the order in which we will shortly establish their respective counting formulae. It also corresponds, very roughly, to the relative importance of the four modes in applications. In the table, we use the term selection to cover, quite generally, all four modes. The subject of counting selections is often nicknamed the theory of perms and coms.


5.3 Counting Formulae: Permutations and Combinations

So far, we have been doing essentially conceptual work, sorting out different modes of selection. We now present counting formulae for the four modes of selection that were distinguished. We begin with the two in which repetition is not allowed: permutations (O⁺R⁻) and combinations (O⁻R⁻). Table 5.3 gives the number of permutations and combinations (without repetition) of k items from an n-element set.

Exercise 5.3
(a) Check that the figures obtained in Sect. 5.3 for choosing 2 out of 5 agree with the formulae in the table.
(b) Calculate the figures for choosing 4 out of 9 under the two modes, using the formulae in the table.

At this stage, it is tempting to plunge into a river of examples, to practice applying the counting formulae. Indeed, this is necessary if one is to master the material. But by itself it is not enough. We also need to see why each of the formulae does its job. On that basis, we can also understand when it, rather than its neighbour, should be used in a given problem.

5.3.1 The Formula for Permutations

For intuition, we begin with an example. How many six-digit numerals are there in which no digit appears more than once? The formula in our table says that it is P(n,k) = n!/(n − k)! = 10!/(10 − 6)! = 10!/4! = 10·9·8·7·6·5 = 151,200. How do we get this?

We use the multiplication principle. There are n ways of choosing the first item. As repeated choice of the same element is not allowed, we thus have only n − 1 ways of choosing the second, then only n − 2 ways of choosing the third, and so on. If we do this k times, the number of possible selections is thus: n·(n − 1)·(n − 2)· … ·(n − (k − 1)). Multiplying this by (n − k)!/(n − k)! gives us n!/(n − k)! in agreement with the counting formula.
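The reasoning translates directly into a short program. The sketch below (an illustration of mine, not the book's) computes P(n,k) as the falling product n·(n − 1)·…·(n − (k − 1)) and cross-checks it against n!/(n − k)! for the six-digit example:

    # Minimal sketch: P(n,k) as a falling product, cross-checked against n!/(n-k)!.
    from math import factorial

    def perms(n: int, k: int) -> int:
        """Number of permutations of n elements taken k at a time (k <= n)."""
        result = 1
        for i in range(k):          # n * (n-1) * ... * (n-(k-1))
            result *= n - i
        return result

    assert perms(10, 6) == factorial(10) // factorial(10 - 6) == 151_200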

Table 5.3 Formulae for perms and coms (without repetition)

Mode of selection   Notation   Standard name   Formula         Proviso
O⁺R⁻                P(n,k)     Permutations    n!/(n − k)!     k ≤ n
O⁻R⁻                C(n,k)     Combinations    n!/k!(n − k)!   k ≤ n


Alice Box: When k = n

Alice: I understand why we have the condition k ≤ n for permutations; in Exercise 5.2.2, we showed that when k > n, selection is impossible for the modes that disallow repetition. But what happens in the limiting case that k = n? It worries me, because in that case, n − k is 0, and in Chap. 4, the factorial function was defined only for positive integers.

Hatter: Nice point! To cover that case, it is conventional to extend the definition of factorial by setting 0! = 1. So when k = n, we have P(n,k) = P(n,n) = n!/(n − n)! = n!/0! = n!.

Although the case k = n for permutations is a limiting one, it is very important and often arises in applications. Then, instead of the long-winded phrase 'permutation of n items n at a time', it is customary to speak briefly of a permutation of n items. These permutations correspond to the bijections between an n-element set and itself. Thus, the number P(n,n) of permutations of an n-element set is the same as the number of bijections on that set into itself, and is equal to n!.

Exercise 5.3.1 (1) (with partial solution)
(a) Use the formula to calculate each of P(6,0), …, P(6,6).
(b) You have 15 ties, and you want to wear a different one each day of the working week (5 days). For how long can you go on without ever repeating a weekly sequence?

Solution to (b) Our pool is the set of ties, with n = 15 elements, and we are selecting k = 5 with order significant and repetition not allowed. We can thus apply the formula for permutations. There are thus 15!/(15 − 5)! = 15!/10! = 15·14·13·12·11 = 360,360 possible weekly selections. That means you can go on for nearly 7,000 years without repeating a weekly sequence. Moral: you have far too many ties!

Exercise 5.3.1 (2) (with partial solution)
(a) In words of ordinary English, how would you interpret the meaning of P(n,k) when k = 0? Is the counting formula reasonable in this limiting case?
(b) Compare the values of P(n,n) and P(n,0). Any comments?
(c) Verify the claim in the text that P(n,n) counts the number of bijections between a set of n elements and itself. Hint: Use the connection between permutations in general and injections given in Sect. 5.2.2 and an exercise on bijections in Sect. 3.3.3.

Solutions to (a) and (b)
(a) P(n,0) is the number of ways of selecting nothing from a set of n elements. The counting formula gives the value P(n,0) = n!/n! = 1. That is, there is one way of selecting nothing (namely, doing nothing). That's reasonable enough for a limiting case.


(b) As Alice remarked, P(n,n) = n!/(n − n)! = n!/0! = n!, while as we have seen in part (a), P(n,0) = 1. Comment: clearly, P(n,k) takes its largest possible value when k = n, and its smallest when k = 0.

5.3.2 Counting Combinations

For concreteness, we again begin with an example. How many ways are there of choosing a six-person subcommittee out of a full ten-person committee? It is implicitly understood that the order of the members of the subcommittee is of no consequence, and that all six members of the subcommittee must be different people. That is, we are in mode O⁻R⁻; we are counting combinations. The counting formula in the table gives us C(n,k) = n!/(k!(n − k)!) so that C(10,6) = 10!/(6!4!) = (10·9·8·7)/(4·3·2) = 5·3·2·7 = 210. This is a tiny fraction of the 151,200 for permutations: to be precise, it is one 6!th, that is, one 720th. Order matters!

How can we prove the general formula for combinations? Notice that it says, in effect, that C(n,k)·k! = P(n,k) for k ≤ n, so it suffices to prove that. Now C(n,k) counts the number of k-element subsets of an n-element set. We already know that each such subset can be given k! = P(k,k) orderings. Hence, the total number P(n,k) of ordered subsets is C(n,k)·k!, and we are done.
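The identity C(n,k)·k! = P(n,k) is equally easy to check by machine; the following Python sketch (mine, not the book's) computes C(n,k) by dividing P(n,k) by k! and confirms the subcommittee example:

    # Minimal sketch: C(n,k) = P(n,k)/k!, checked against the worked example C(10,6) = 210.
    from math import factorial

    def combs(n: int, k: int) -> int:
        """Number of combinations of n elements taken k at a time (k <= n)."""
        p_nk = factorial(n) // factorial(n - k)   # P(n,k)
        return p_nk // factorial(k)               # divide out the k! orderings of each subset

    assert combs(10, 6) == 210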

Exercise 5.3.2 (1) (with partial solution)
(a) Use the formula to calculate each of C(6,0), …, C(6,6).
(b) Draw a chart with the seven values of k (0 ≤ k ≤ 6) on the abscissa (horizontal axis) and the values of each of P(6,k) and C(6,k) on the ordinate (vertical axis).
(c) Your investment advisor has given you a list of eight stocks attractive for investment. You decide to invest in three of them. How many different selections are possible?
(d) Same scenario, except that you decide to invest $1,000 in one, double that in another, and double that again in a third. How many different selections are possible?

Solutions to (c) and (d) These two questions illustrate how important it is to understand, before applying a formula, what kind of selection we are supposed to be making.
(c) It is implicit in the formulation that you wish to invest in three different stocks – no repetition. It is also implicit that the order of the selection is to be disregarded – we are interested only in the subset selected. So we are counting combinations (mode O⁻R⁻) and can apply the formula for C(n,k). Once that is clear, the rest is just calculation: C(8,3) = 8!/(3!5!) = (8·7·6)/(3·2) = 4·7·2 = 56.
(d) In this question, it is again assumed that there is no repetition, but the order of the selection is regarded as important – we are interested in which stock is bought in what volume. So we are back with permutations (mode O⁺R⁻), and should apply the counting formula for P(n,k), to get P(8,3) = 8!/5! = 8·7·6 = 336.

It is instructive to consider an example that looks quite like the last two of the exercise, but which reveals greater complexity. Your investment advisor has given you a list of eight stocks attractive for investment, and a linear ordering of these eight by their estimated risk. You decide to invest $1,000 in a comparatively risky one, double that in a less risky one, and double that again in an even less risky one. How many different selections are possible? Now, the first choice must not be the safest stock, nor the second safest, but may be from any of the six riskier ones. So far, so good: there are six ways of choosing the first stock. But the range of options open for the second choice depends on how we made the first one: it must be on a lesser risk level than the first choice, but still not the least risky one. And the third choice depends similarly on the second. We thus have quite an intricate problem – it requires more than a simple application of any one of the counting formulae in our table. We will not attempt to solve it here, but merely emphasize the lesson: real-life counting problems, apparently simple in their verbal presentation, can be quite nasty to solve.

Exercise 5.3.2 (2) (with partial solution)
(a) In words of ordinary English, how would you interpret the meaning of C(n,k) when k = 0?
(b) Explain why C(n,k) ≤ P(n,k) for all n,k with 1 ≤ k ≤ n.
(c) State and prove a necessary and sufficient condition on n,k for C(n,k) < P(n,k).
(d) Compare the values of C(n,n), C(n,1) and C(n,0) with their counterparts P(n,n), P(n,1) and P(n,0) and explain the similarities/differences in intuitive terms.
(e) Show from the counting formula that C(n,k) = C(n,n − k).
(f) Suppose that n is even, that is, n = 2m. Show that for j < k ≤ m we have C(n,j) < C(n,k).
(g) Same question but with n odd, that is, n = 2m + 1.

Solutions to (d), (e)
(d) For the combinations, we have C(n,n) = n!/(n!(n − n)!) = 1/(n − n)! = 1/0! = 1, while C(n,1) = n!/(1!(n − 1)!) = n!/(n − 1)! = n, and C(n,0) = n!/(0!(n − 0)!) = n!/n! = 1. For permutations, as we have already seen, P(n,n) = n!, P(n,1) = n and P(n,0) = 1. Thus, P(n,n) = n! > 1 = C(n,n), as we would expect since there is just one subset of all the n items in our pool, but n! ways of putting that subset in order. On the other hand, C(n,0) = 1 = P(n,0) since there is just one way of selecting nothing, irrespective of order; and C(n,1) = n = P(n,1) because when you are selecting just one thing, the question of order of selection does not arise.

(e) By the counting formula, C(n,n − k) = n!/[(n − k)!·(n − (n − k))!] = n!/[(n − k)!k!] = n!/[k!(n − k)!] = C(n,k) again by the counting formula.

This is a good moment to fulfil a promise that Hatter made to Alice in the first section of this chapter, when discussing the addition principle for two arbitrary sets: #(A∪B) = #(A) + #(B) − #(A∩B). The unanswered question was: How do we generalize this from 2 to n arbitrary sets?

Recall how we arrived at the principle for two sets. We counted the elements of A and B separately and added them; but as the two sets may not be disjoint, that counted the elements in A∩B twice, so we subtract them once. For three sets A,B,C the reasoning is similar. We begin by adding the number of elements in each of the three. But some of these may be in two or more of the sets, so we will have overcounted, by counting those elements at least twice. So in the next stage, we subtract the number of elements that are in the intersection of any two of the sets. But then we may have undercounted, for each item x that is in all three sets was counted three times in the first stage and then subtracted three times in the second stage, and thus needs to be added in again, which is what we do in the last stage. The rule for three sets is thus:

#(A∪B∪C) = #(A) + #(B) + #(C) − #(A∩B) − #(B∩C) − #(A∩C) + #(A∩B∩C).

For the general rule, let A₁, …, Aₙ be any sets with n ≥ 1. Then, #(A₁ ∪ … ∪ Aₙ) equals:

+ (the sum of the cardinalities of the sets taken individually)
− (the sum of the cardinalities of the sets intersected two at a time)
+ (the sum of the cardinalities of the sets intersected three at a time)
− (the sum of the cardinalities of the sets intersected four at a time)
…………………
± (the sum of the cardinalities of the n sets intersected n at a time)

The last term is marked ± because the sign depends on whether n is even or odd. If n is odd, then the sign is +; if n is even, then it is −. Of course, the rule can be written in a much more compact mathematical notation, but for our purposes, we can leave that aside. Traditionally, the rule is called the principle of inclusion and exclusion, because one first includes too much, then excludes too much and so on. It could also be called the principle of overcounting and undercounting.

Application of the rule ties in with the theory of combinations. For example, in the second line, we need to work out the sum of the cardinalities of the sets Aᵢ∩Aⱼ for all distinct i,j ≤ n. The value of each #(Aᵢ∩Aⱼ) of course depends on the specific example. But we also need to keep tabs on how many such terms we need to consider. Since there are C(n,2) two-element subsets {Aᵢ, Aⱼ} of the n-element set {A₁, …, Aₙ}, there are C(n,2) such terms to sum in the second line. Likewise, the third line of the rule asks for the sum of C(n,3) terms, and so on.
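As an informal illustration (not in the text), the three-set version of the rule can be checked directly on concrete sets:

    # Minimal sketch: inclusion-exclusion for three concrete sets,
    # checked against the size of the union computed directly.
    A = {1, 2, 3, 4}
    B = {3, 4, 5, 6}
    C = {1, 4, 6, 7}

    by_rule = (len(A) + len(B) + len(C)
               - len(A & B) - len(B & C) - len(A & C)
               + len(A & B & C))
    assert by_rule == len(A | B | C) == 7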

Exercise 5.3.2 (3)
(a) Draw a Euler diagram representing three intersecting sets, label the relevant areas and paraphrase the rule of inclusion and exclusion in terms of the areas.
(b) Write out the principle of inclusion and exclusion in full for the case n = 4. Then calculate C(4,k) for each k = 1, …, 4 to check that at each stage you have summed the right number of terms.

5.4 Selections Allowing Repetition

We now consider the two modes of selection that allow repetition of the selected elements. We add these two modes as new rows at the bottom of the table that we had for the non-repetitive selections (Table 5.4).


Table 5.4 Formulae for four modes of selecting k items from an n-element set

Mode of selection   Notation   Standard name                  Formula                   Proviso   Example: n = 10, k = 6
O⁺R⁻                P(n,k)     Permutations                   n!/(n − k)!               k ≤ n     151,200
O⁻R⁻                C(n,k)     Combinations                   n!/k!(n − k)!             k ≤ n     210
O⁺R⁺                           Permutations with repetition   n^k                       none      1,000,000
O⁻R⁺                           Combinations with repetition   (n + k − 1)!/k!(n − 1)!   n ≥ 1     5,005

5.4.1 Permutations with Repetition Allowed

We are looking at the number of ways of choosing k elements from an n-element set, this time allowing repetitions and distinguishing orders. To fix ideas, keep in mind a simple example: How many six-digit telephone numbers can be concocted with the usual ten digits 0–9? The formula in the table tells us that there are 10⁶ = 1,000,000. This is far more than the figure of 151,200 for plain permutations where repetition is not allowed, and dwarfing the 210 for combinations. What is the reasoning?

As for plain permutations, we apply the multiplication principle. Clearly, there are n ways of choosing the first item. But, as repetition is allowed, there are again n ways of choosing the second, and so on, thus giving us n· … ·n (k times), that is, n^k possible selections.

In more abstract language: Each selection can be represented by an ordered k-tuple of elements of the n-element pool, so that the set of all the selections corresponds to the Cartesian product of the latter by itself k times which, as we have seen by the principle of multiplication, has cardinality n^k. As we remarked earlier, this is the same as the number of functions from a k-element set such as {1, …, k} into an n-element set {a₁, …, aₙ}.

We recall that this operation makes perfectly good sense even when k > n. For example, if only two digits may be used, we can construct six-digit telephone numbers out of them, and there will be 2⁶ of them.
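A brute-force check of the n^k count for this two-digit example (my own sketch, not the book's):

    # Minimal sketch: permutations with repetition are counted by n**k;
    # checked by brute force for six-digit strings over two digits.
    from itertools import product

    digits = ['0', '1']
    k = 6
    assert len(list(product(digits, repeat=k))) == len(digits) ** k == 64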

Exercise 5.4.1 (with partial solution)
(a) What happens in the limiting case that n = 1? Does the figure provided by the formula square with intuition?
(b) What about the limiting case that k = 1? Does the figure given by the formula make intuitive sense?
(c) The drinks machine has three kinds of coffee, four kinds of soft drink, still water and sparkling water. Every working day I buy a coffee to wake up in the morning, a bottle of water to wash down lunch and a soft drink for energy in the afternoon. Assuming that I start work on a Monday and work 5 days a week, how many weeks before I am obliged to repeat my daily selection?


Solution to (a)
(a) When n = 1, then n^k = 1 for any k. Intuitively, this is what we want, for there is only one way to write down the same thing k times.

The limiting case when n = 0 is a bit odd. On the one hand, when n = 0 but k ≥ 1, we have n^k = 0^k = 0. This is fairly natural: in this case, no selection can be made since there is nothing to select from. On the other hand, the ultra-limiting case where n = 0 and k = 0 comes out differently: here n^k = 0⁰ = 1, not 0. One could accommodate this with intuition by reflecting that in this case, we are asked to select nothing from nothing, and by doing nothing, we achieve just that, so that there is one selection (the empty selection) that does the job. This contrasts with being asked to select something from nothing (the case n = 0, k ≥ 1), for there is no way of doing that – not even by doing nothing! So intuition, carefully cultivated, may be said to accord with the formula after all. Moreover, the formulae for permutations and combinations without repetition also yield value 1 in this case (and are not defined when n = 0 but k ≥ 1).

That said, it should also be remarked that in mathematics, definitions quite often give rather odd-looking outcomes in their limiting cases. Nevertheless, so long as the definition behaves as desired in the principal cases, for the limiting ones the mathematician is pretty much free to stipulate whatever permits the smoothest formulation and proof of general principles. Reflection on the limiting cases sometimes suggests interesting philosophical perspectives, and usually one can find a way of seeing things that brings educated intuition into line with the mathematical convention adopted in the limiting case.

5.4.2 Combinations with Repetition Allowed

As usual, we begin with an example. A ten-person committee needs volunteers to handle six tasks. We are not interested in the order of the tasks, and dedicated members are allowed to volunteer for more than one task. How many different ways may volunteers come forward?

Since we are not interested in the order in which the tasks are performed, but we do allow multiple volunteering, we need to apply the formula for combinations with possible repetition in Table 5.4. This tells us that it is (n + k − 1)!/k!(n − 1)! = (10 + 6 − 1)!/6!(10 − 1)! = 15!/6!9! = (15·14·13·12·11·10)/(6·5·4·3·2) = 7·13·11·5 = 5,005. More than the 210 for plain combinations, but still much less than the 151,200 for plain permutations, not to speak of the 1,000,000 for permutations with repetition allowed.

This mode was the trickiest to interpret in terms of functions in Sect. 5.3, and its counting formula is likewise the trickiest to prove. The basic idea of the argument is to reconceptualize the problem of counting the selections under the O⁻R⁺ mode (combinations with repetition allowed) to one under the O⁻R⁻ mode (plain combinations, in which repetitions are not allowed), from a larger pool of elements.

Recall that, as mentioned earlier, in the O⁻R⁺ mode, we are in effect counting the number of functions f from an n-element set {a₁, …, aₙ} into {0, …, k} such that f(a₁) + … + f(aₙ) = k. Consider any such function f. To reconceptualize it, we think of ourselves as provided with a sufficiently long sequence of slots, which we fill up to a certain point in the following way. We write 1 in each of the first f(a₁) slots, then a label + in the next slot, then 2 in each of the next f(a₂) slots, then a label + in the next one, and so on up to the (n − 1)th label +, and finally, write n in each of the next f(aₙ) slots, and stop. This makes sense provided n ≥ 1. Careful: We do not put a plus sign at the end: we use n − 1 plus signs just as there are in the expression f(a₁) + ⋯ + f(aₙ). The construction will thus need k + (n − 1) slots, since we are requiring that f(a₁) + ⋯ + f(aₙ) = k and we insert n − 1 plus signs.

Now the slot pattern we have created is in fact fully determined by the total number k + (n − 1) of slots and the particular slots where the plus signs go. This is because we can recuperate the whole pattern by writing 1s in the empty slots to the left of the first plus sign, then 2s in the empty slots to the left of the second plus sign, and so on up to (n − 1)s in the empty slots to the left of the (n − 1)th plus sign and, not to be forgotten, writing ns in the remaining empty slots to the right of the last plus sign until all k + (n − 1) slots are occupied. So in effect, we are looking at the number of ways of choosing n − 1 slots for plus signs from a set of k + (n − 1) slots. The n − 1 plus sign slots must be distinct from each other (we can't put two plus signs in the same slot), and we are holding the order of the items fixed.

But for counting purposes, holding the order fixed is the same as taking the order to be of no significance, that is, identifying different orderings of the same items. This is because there is a trivial one-one correspondence between the two. So we are in effect making a selection, where order remains without significance but repetition is not allowed, of n − 1 items from a larger pool of k + (n − 1) elements. We have reduced the counting problem for the O⁻R⁺ mode to one for the O⁻R⁻ mode, that is, plain combinations, with different values for the variables. Specifically, we have shown that the number of selections when we choose k items from a pool of n elements, using the O⁻R⁺ mode of selection, equals C(k + n − 1, n − 1).

The rest is calculation: C(k + n − 1, n − 1) = (k + n − 1)!/(n − 1)!·((k + n − 1) − (n − 1))! = (k + n − 1)!/(n − 1)!·k! = (n + k − 1)!/k!(n − 1)! – which is the desired counting formula for the O⁻R⁺ mode in Table 5.4, so our proof of that formula is complete.
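The formula can also be cross-checked by brute force. In the sketch below (mine, not the book's), combs_with_rep is a name of my own for the O⁻R⁺ count, and Python's combinations_with_replacement enumerates the selections directly:

    # Minimal sketch: the O-R+ count (n + k - 1)!/(k!(n - 1)!),
    # cross-checked against brute-force enumeration with itertools.
    from math import factorial
    from itertools import combinations_with_replacement

    def combs_with_rep(n: int, k: int) -> int:
        """Number of combinations with repetition of n elements taken k at a time."""
        return factorial(n + k - 1) // (factorial(k) * factorial(n - 1))

    n, k = 10, 6
    assert combs_with_rep(n, k) == 5_005
    assert combs_with_rep(n, k) == len(list(combinations_with_replacement(range(n), k)))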

Exercise 5.4.2 (with partial solution)
(a) Use the formula to calculate the number of combinations with repetition of 6 items taken k at a time, for each k = 1, …, 8.
(b) The restaurant has five flavours of ice cream, and you can ask for one ball, two balls or three balls (unfortunately, not the same price). How many different servings are possible?

Solution to (b)
(b) We are looking at a set with 5 elements (the flavours), and at selections of one, two or three items (the balls), allowing repetition. Does order count? Well, the formulation of the problem is rather vague. Two interpretations suggest themselves.


First interpretation. If the balls are laid out in a row on a suitably shaped dish, we might regard the serving strawberry-chocolate-vanilla as different from chocolate-vanilla-strawberry. In that case, we are in the mode O⁺R⁺, and reason as follows. Break the set of selections down into three disjoint subsets – those with one ball, two balls, three balls. Calculate each separately. There are 5¹ = 5 selections with one ball, 5² = 25 with two balls, 5³ = 125 with three balls. So, in total, there are 5 + 25 + 125 = 155 possible selections.

Second interpretation. If we don't care about the order in which the balls are distributed on the plate, then we are in mode O⁻R⁺, and reason as follows. Again, break the class of selections into three disjoint subclasses, and apply the counting formula to each separately. With one ball, there are still 5 selections, since (5 + 1 − 1)!/1!(5 − 1)! = 5!/4! = 5. But with two balls, there are now only (n + k − 1)!/k!(n − 1)! = (5 + 2 − 1)!/2!(5 − 1)! = 6!/2!4! = 3·5 = 15 selections, and with three balls, just (5 + 3 − 1)!/3!(5 − 1)! = 7!/3!4! = 7·5 = 35. We thus get a total of 5 + 15 + 35 = 55 possible selections – only about one-third of the number under the first interpretation.

5.5 Rearrangements and Partitions*

We have covered all the four modes of selection of k items from a pool of n elements, but there are also other things that one can do. For example, one can rearrange the letters in a word, and partition a set into cells of specified sizes. Each of these operations gives rise to a counting problem: how many ways of doing it?

5.5.1 Rearrangements

How many ways of arranging the letters in banana? This problem sounds very simple, but its analysis is quite subtle. At first sight, it looks like a case of permutations allowing repetition: order definitely matters since banana ≠ nabana, and there are repeats since only three letters are filling six places. But there is also a big difference: the amount of repetition is predetermined: there must be exactly three as, two ns and one b. Operations of this kind are usually called rearrangements.

It is important to approach the problem with the right gestalt. In the case of banana, don't think of yourself as selecting from a three-element set consisting of the letters a,n,b. Instead, focus on a six-element set consisting of the six occurrences of letters in the word or, alternatively, six slots ready for you to insert those occurrences.

These lead to two ways of analyzing the problem. They give the same numericalresult but provide an instructive contrast: one uses permutations while the otheruses combinations. We describe them in turn. For each we begin with an analysisof the example, then state the general rule for words with k letters and n letter-occurrences. To distinguish letter-occurrences from letters-as-types, we rewrite theword with subscripts: b1a1n1a2n2a3.


5.5.1.1 First Method
Analysis of the example: We know that there are P(6,6) = 6! = 720 permutations of the set A = {b1,a1,n1,a2,n2,a3}. Clearly, for each such permutation, there are 3! = 6 that differ from it at most in the distribution of subscripts to the letter a. So dropping those subscripts means that we need to divide 720 by 6. But there are also 2! = 2 differing from it at most in the distribution of subscripts to the letter n, so we also need to divide by 2. The letter b occurs just once, so as far as it is concerned, there is no need to divide. Thus, the number of rearrangements of the letters (including, of course, the original arrangement banana) is 720/(6·2) = 60.

General rule: Suppose we are given a word with n letter-occurrences, made up of k letters, with those letters occurring m1, …, mk times. Then there are n!/(m1! … mk!) rearrangements of the letters in the word.
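As an illustrative sketch (not from the text), the rule is easy to compute from the letter counts of a word using only Python's standard library; the function name rearrangements is simply a label chosen here:

    from collections import Counter
    from math import factorial

    def rearrangements(word):
        """Number of distinct rearrangements of the letters: n!/(m1! ... mk!)."""
        counts = Counter(word)                      # multiplicity mi of each letter
        result = factorial(sum(counts.values()))    # n!
        for m in counts.values():
            result //= factorial(m)                 # divide by each mi!
        return result

    print(rearrangements("banana"))    # 60, as computed above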

5.5.1.2 Second Method
For the second approach, we think in terms of six slots s1, …, s6 ready to receive letter-occurrences.

Analysis of the example: The rearrangements of the letters can be thought of as functions f that take elements of the three-element set {a,n,b} to subsets of {s1, …, s6} in such a way that f(a) has three elements, f(n) has two, f(b) is a singleton and the family {f(a), f(n), f(b)} partitions {s1, …, s6}. How can we count these functions?

We know that there are C(6,3) ways of choosing a three-element subset for f(a), leaving 3 slots unfilled. We have C(3,2) ways of choosing a two-element subset of those three slots for f(n), leaving 1 slot to be filled with the letter b. That gives us C(6,3)·C(3,2)·C(1,1) selections. Calculating using the formula already known for combinations:

C(6,3) · C(3,2) · C(1,1) = [6!/3!(6 − 3)!] · [3!/2!(3 − 2)!] · [1!/1!(1 − 1)!]
                         = [6!/3!3!] · [3!/2!]
                         = 20 · 3 = 60.

This is the same figure as reached by the first method.
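The same product of binomial coefficients can be checked in one line with Python's math.comb (available from Python 3.8 onwards); a throwaway sketch:

    from math import comb
    print(comb(6, 3) * comb(3, 2) * comb(1, 1))   # 60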

General rule: We have a set {a1, …, ak} of k letters to be written into a set {s1, …, sn} (k ≤ n) of slots, with each letter ai being written in mi slots. So we are looking at functions f that take elements of the set {a1, …, ak} to subsets of {s1, …, sn} in such a way that each f(ai) has mi elements and the family {f(ai): i ≤ k} partitions {s1, …, sn}. How can we count these functions?

We know that there are C(n, m1) ways of choosing an m1-element subset for f(a1), leaving n − m1 slots unfilled. We have C(n − m1, m2) ways of choosing an m2-element subset of those slots for f(a2), leaving n − m1 − m2 slots to be filled, and so on until all the n slots are used up. Clearly this can be done in the following number of ways:

C(n, m1)·C(n − m1, m2)· … ·C(n − m1 − m2 − … − mk−1, mk).

This formula looks pretty ghastly. But if we write it out using the formula for combinations and then do successive cancellations, it simplifies beautifully, coming down to n!/(m1! … mk!) – which is the same formula as obtained by the other method.

Exercise 5.5.1
(a) Using the counting formula for combinations, write out the formula C(n, m1)·C(n − m1, m2)· … ·C(n − m1 − m2 − … − mk−1, mk) in full for the case that n = 9, k = 4. Perform the cancellations to check that it agrees with 9!/(m1! … m4!).
(b) Apply the counting formula n!/(m1! … mk!) to determine how many ways of rearranging the letters in the word Turramurra.

5.5.2 Counting Configured Partitions

The second method for counting rearrangements can be expressed in an abstract way, without mentioning words and their component letters. Suppose we want to count the number of functions f that take elements of a k-element set {a1, …, ak} to subsets of an n-element set {s1, …, sn} (k ≤ n) such that each set f(ai) has mi elements and the family {f(ai): i ≤ k} partitions {s1, …, sn}. Then by the same reasoning as for the second method, we may use the same formula:

C(n, m1)·C(n − m1, m2)· … ·C(n − m1 − m2 − … − mk−1, mk) = n!/(m1! … mk!)

Let’s see how this plays out in an example. There are 9 students in a class. How many ways of dividing them into tutorial groups of 2, 3 and 4 students for tutorials on Monday, Wednesday and Friday? We are looking at a set A = {a1, …, a9} with 9 elements, and we want to count ways of partitioning it into three cells of specified sizes. But before applying a counting formula, we need to be more specific about what we are trying to count. Do we regard the assignment to days of the week as relevant? For example, suppose we put students a1,a2 in one tutorial group, a3,a4,a5 in another, a6,a7,a8,a9 in the third; does it make a difference for our counting whether a1,a2 are taught on Monday rather than Wednesday?

Suppose that we wish to regard these as different. We know from the abstract version of the second method for counting rearrangements, given above, what formula to use for that. It is n!/(m1! … mk!), giving us 9!/(2!3!4!) = 1,260 possible ways of assigning the students.

However, we may wish to ignore which day of the week a given tutorial group is to be taught. Perhaps we are merely constituting the three groups, leaving their scheduling for later decision. Then we should divide the above counting formula by 3!, giving us 1,260/6 = 210 possible ways of forming three tutorial groups of the specified sizes. With this way of interpreting the problem, we are in effect counting the number of partitions of the pool set (n = 9) into a specified number of cells (k = 3) of specified sizes (2,3,4), and using the formula n!/(m1! … mk!·k!).

Table 5.5 Formulae for counting rearrangements and partitions

  Description                                                  Brief name              Counting formula
  Number of rearrangements of k letters in a word              Rearrangements          n!/(m1! … mk!)
  of n letter-occurrences
  Number of partitions with a given numerical                  Configured partitions   n!/(m1! … mk!·k!)
  configuration (k cells and cell sizes m1, …, mk)
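A quick numerical check of the two readings of the tutorial-group example, as a Python sketch (the cell sizes 2, 3, 4 are hard-coded; since they are all different, dividing by k! is exactly the passage from labelled days to unlabelled groups):

    from math import factorial

    n, sizes = 9, (2, 3, 4)
    labelled = factorial(n)
    for m in sizes:
        labelled //= factorial(m)                   # n!/(m1! m2! m3!) = 1260, days matter
    unlabelled = labelled // factorial(len(sizes))  # divide by k! = 3! to ignore the days
    print(labelled, unlabelled)                     # 1260 210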

The answer we get thus depends on the method we use, which in turn depends on the exact interpretation of the problem. Sometimes there are only hints in its formulation to indicate which interpretation is meant. On other occasions, the formulation may simply be ambiguous, in which case we say that the problem is underdetermined. Some texts take a perverse pleasure in keeping the hints to an almost imperceptible minimum in their problems. Not here.

We summarize the results of the section in Table 5.5, which thus supplements Table 5.4 for the four O±R± modes.

Alice Box: Different but ‘Identical’

Alice:  I have been reading some other textbooks on this material, and there is something that leaves me quite confused.
Hatter: Go ahead.
Alice:  I find problems that begin like this: ‘Let A be a set of six elements, of which three are identical …’. But that doesn’t make any sense!
Hatter: Why not?
Alice:  You can’t have different items that are nevertheless identical! If A = {a,b,c,d,e,f} and a,b,c are identical to each other, then A has at most four elements, not six. These books seem to be speaking a different language!
Hatter: Well, at least a different dialect. It is rather confusing, indeed.

We should say a bit more about this than the Hatter does. The subject that thisbook calls ‘sets, logic and finite mathematics’ is a fairly recent assemblage. It ismade up of different topics developed at quite distinct times in history. Each partevolved using its specific conceptual structures and terminologies, and these do notfit well with each other.

Textbooks like ours try to put it all into one coherent story, based on the very general notion of a set. But several of the topics are much older than the theory of sets. For example, the theory of counting was already highly active in the seventeenth century, and indeed goes back many centuries before – consider Fibonacci in 1202! In contrast, set theory as we know it today appeared only in the late nineteenth century. As one would expect, the conceptual frameworks are quite different.

Consequently, textbooks like this one are torn between the demands of coherenceof the overall structure, on the one side, and faithfulness to the deeply rootedterminologies of specific topics on the other. Usually something is given up onboth sides. In this book, some concessions are made to traditional topic-specificterminologies, but coherence of the broad picture tends to take priority.

So how can you or Alice make sense of problems that speak of selections froma set with several identical elements? Above all, try to interpret what could bemeant by the particular problem, taking into account its content as well as itsstructure and paying attention to little hints. Sometimes when the problem speaksof identical or indistinguishable elements, it may mean that we have no way oftelling the difference between different orders of the elements, so that interest liesin selections where order is immaterial. On other occasions, it may be a way ofreferring obliquely to a partition (in other words, an equivalence relation) betweenelements. Then we may have a problem of counting partitions. There is no formalalgorithm for ascertaining what is meant: it is an exercise in hermeneutics. Tryto find the most natural interpretation of the problem, remembering there is notalways a unique reading, as more than one may be plausible. The following exerciseillustrates the task.

Exercise 5.5.2 (with hints for solution)
(a) How many ways are there to place 12 indistinguishable balls into 10 slots?
(b) In a cellar, there are 12 bottles of wine in a row, of which 5 are the same, another 4 the same and another 3 the same. They are taken down and put into a box. Then they are put back on the shelf. How many different results are possible?

Hints for solution
(a) The balls are described as ‘indistinguishable’. Presumably we are not thinking of the trivial partition with just one cell. We are envisaging the selection, with repetition permitted but order ignored, of 12 slots from the set of 10, that is, 12-combinations with repetition from a 10-element set. Go on from there.
(b) From the phrasing, it appears that the problem setter is imagining a rack with 12 places on it, and functions on the three kinds of wine taking each kind to a subset, with a specified size, of the places. Go on from there.

End-of-Chapter Exercises

Exercise 5.1: Addition and multiplication
(a) A computer represents elements of Z using binary digits 0 and 1. The first digit represents the sign (negative or positive), and the remainder represents the magnitude. How many distinct integers can we represent with n binary digits?


Warning: Be careful with zero.

(b) I want to take two books with me on a trip. I have two logic books, threemathematics books and two novels. I want the two books that I take to be ofdifferent types. How many possible sets of two books can I take with me?

(c) In the USA, radio station identifiers consist of either 3 or 4 letters of thealphabet, with the first letter a K or a W. Determine the number of possibleidentifiers.

(d) You are in Paris, and you want to travel Paris-Biarritz-London-Paris or Paris-London-Biarritz-Paris. There are 5 flights each way per day between Parisand Biarritz, 13 each way between Paris and London, but only 3 per dayfrom London to Biarritz and just 2 in the reverse direction. How many flightsequences are possible?

Exercise 5.2: The principle of inclusion and exclusion
Amal telephones 5 different people, Bertrand telephones 10, Clarice telephones 7. But 3 of the people called by Amal are also called by Bertrand, 2 of those called by Bertrand are rung by Clarice, and 2 of those contacted by Clarice are called by Amal. One person was called by all three. How many people were called?

Exercise 5.3: Four basic modes of selection
(a) In the first round of a quiz show, a contestant has to answer correctly seven questions, each with a yes/no answer. Another contestant has to answer correctly two questions, each of which is multiple-choice with seven possible answers. Assuming that both contestants are totally ignorant and guess blindly, who faces the more daunting task?

(b) Australia currently has six states (not counting its various territories, notablythe Northern Territory). A manager proposes to test a product in four of thosestates. In how many ways may the test states be selected?

(c) Another manager, more cautious, proposes to test the product in one state at atime. In how many ways may a test-run be selected?

(d) Even more cautious, the director decides to test the product in one state at atime, with the rule that the test-run is terminated if a negative result is obtainedat any stage. How many test-runs are possible?

(e) A game-board is a 4 by 4 grid of squares, and you have to move your piece fromthe bottom left square to the top right one. The only moves allowed are ‘onesquare to the right’ and ‘one square up’. How many possible routes?

Warning: This is more subtle than you may at first think. The answer is not 28.


(f) You have five dice of different colours, and throw one after another, keepingtrack of which is which. What is the natural concept of an outcome here? Howmany such outcomes are there?

(g) You throw five dice together, and are not able to distinguish one die fromanother. What is the natural concept of an outcome here, and how many arethere?

Exercise 5.4: Rearrangements and configured partitions*
(a) The candidate wins a free holiday in the place whose name admits the greatest number of arrangements of its letters: Mississippi, Ouagadougou and Woolloomooloo. Make the calculations and give the answer to win the trip.

(b) How many arrangements of the letters in Woolloomooloo make all occurrencesof any given letter contiguous?

(c) How many distributions of 10 problems are possible among Albert, Betty, Carlos and Deborah if Albert is to do 2, Betty 3 and Deborah 5?

(d) How many ways of dividing 10 problems up into three groups of 2, 3 and 5problems?

(e) Which is larger: the number of partitions of a set with 12 elements into threecells each of four elements, or the number of partitions of the same set into fourcells each of three elements?

Selected Reading

Almost every text on discrete mathematics has a chapter on this material somewhere in the middle and, although terminology, notation and order of presentation differ from book to book, the material covered is usually much the same. One clear beginner’s exposition is:

Haggarty R (2002) Discrete mathematics for computing. Pearson, Harlow, chapter 6

More detailed presentations include:
Johnsonbaugh R (2009) Discrete mathematics, 7th edn. Pearson, Upper Saddle River, chapter 6
Lipschutz S (1997) Discrete mathematics, Schaum’s outline series. McGraw Hill, New York, chapter 6

For more advanced texts dedicated to the subject, see, for example:
Andreescu T et al (2004) Counting strategies: a path to combinatorics for undergraduates. Springer/Birkhauser, New York/Boston
Herman J et al (1997) Counting and configurations: problems in combinatorics, arithmetic and geometry, CMS books in mathematics. Springer, New York

As mentioned by the Hatter, a good place to begin reading about multisets is the relevant article inWikipedia.


6 Weighing the Odds: Probability

Abstract
In this chapter, we introduce the elements of probability theory. In the spirit of the book, we confine ourselves to the discrete case, that is, probabilities on finite domains, leaving aside the infinite case.

We begin by defining probability functions on a finite sample space andidentifying some of their basic properties. So much is simple mathematics.This is followed by some words on different philosophies of probability, andwarnings of traps that arise in applications. Then back to the mathematical work,introducing the concept of conditional probability and setting out its properties,its connections with independence and its use in Bayes’ theorem. An interludepresents the curious configuration known as Simpson’s paradox. In the finalsection, we explain the notions of a payoff function and expected value.

6.1 Finite Probability Spaces

In the chapter on rules for counting, we remarked that the area is rather older than themodern theory of sets, and tends to keep some of its traditional ways of speakingeven when its ideas can be expressed in a set-theoretic manner. The same is trueof probability theory, creating problems for both reader and writer of a textbooklike this. If we simply follow the rather loose traditional language, the reader mayfail to see how it rests on fundamental notions. On the other hand, if we translateeverything into set-theoretic terms the reader is ill-prepared for going on to otherexpositions couched in the traditional terminology.

For this reason, we follow a compromise approach. We make use of traditionalprobabilistic terminology, but at each step show how it is serving as shorthand for auniform one in terms of sets and functions. A table will be used to keep a runningtally of the correspondences.


It turns out that applications of probability theory to practical problems oftenneed to deploy the basic counting principles that were developed in the precedingchapter, and so the reader should be ready to flip back and forth to refresh the mindwhenever needed.

6.1.1 Basic Definitions

The first concept that we need in discrete probability theory is that of a samplespace. Mathematically, this is just an arbitrary finite (but non-empty) set S. Theterm merely indicates that we intend to use it in a probabilistic framework.

A probability distribution (or just distribution for short) is an arbitrary function p: S → [0,1] such that Σ{p(s): s ∈ S} = 1, that is, the sum of the values p(s) for s ∈ S equals 1. Recall that [0,1], often called the real interval, is the set of all real numbers from 0 to 1 included, that is, [0,1] = {x ∈ R: 0 ≤ x ≤ 1}. Actually, in the context of discrete probability, that is, where S is a finite set, we could without loss of generality reduce the target of the probability function to the set of all rational numbers from 0 to 1. But we may as well allow the whole of the real interval, which gives us no extra trouble and is needed for the infinite case.
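As a minimal computational sketch (not from the text), a distribution on a small finite sample space can be represented as a Python dict and the defining conditions checked directly; the tolerance argument is only there to absorb floating-point rounding:

    def is_distribution(p, tol=1e-9):
        """True iff all values lie in [0,1] and they sum to 1 (within tol)."""
        return (all(0.0 <= v <= 1.0 for v in p.values())
                and abs(sum(p.values()) - 1.0) <= tol)

    print(is_distribution({'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4, 'e': 0.0}))   # True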

Exercise 6.1.1 (1) (with solution) Let S = {a,b,c,d,e}. Which of the following functions p are probability distributions on S? Give a reason in each case.
(a) p(a) = 0.1, p(b) = 0.2, p(c) = 0.3, p(d) = 0.4, p(e) = 0.5.
(b) p(a) = 0.1, p(b) = 0.2, p(c) = 0.3, p(d) = 0.4, p(e) = 0.
(c) p(a) = 0, p(b) = 0, p(c) = 0, p(d) = 0, p(e) = 1.
(d) p(a) = −1, p(b) = 0, p(c) = 0, p(d) = 1, p(e) = 1.
(e) p(a) = p(b) = p(c) = p(d) = p(e) = 0.2.

Solution
(a) No, the values do not add to one.
(b) Yes, values are in the real interval and add to one.
(c) Yes, same reason.
(d) No, not all values are in the real interval.
(e) Yes, values are in the real interval and add to one.

The last of the functions in the exercise is clearly a very special one: it is a constant function on S, with all the elements of the sample space receiving the same value. This is known as an equiprobable (or uniform) distribution, and is particularly easy to work with. Given a sample space S with n elements, there is evidently just one equiprobable distribution p: S → [0,1], and it puts p(s) = 1/n for all s ∈ S. But it should be understood that this is a special case, not only mathematically but also in practice. Many problems involve unequal distributions, and we cannot confine ourselves to equiprobable ones.

Given a distribution p: S → [0,1], we can extend it to a function p+ on the power set P(S) of S by putting p+(A) = Σ{p(s): s ∈ A} when A is any non-empty subset of S, and p+(A) = 0 in the limiting case that A = ∅. This is the probability function determined by the distribution. Even for a very small set S, there will be infinitely many distributions on S, and each of them determines a unique probability function p+ on P(S) or, as one also says, over S. The pair made up of the sample space S and the probability function p+: P(S) → [0,1] is often called a probability space. Traditionally, the elements of P(S), that is, the subsets of S, are called events.
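In the same sketchy style, the probability function determined by a distribution is just summation over the event; here events are ordinary Python sets, and p_plus is simply an illustrative name:

    def p_plus(p, A):
        """Probability of the event A (a subset of the sample space) under distribution p."""
        return sum(p[s] for s in A)        # the empty event gives the empty sum, 0

    die = {s: 1/6 for s in range(1, 7)}    # equiprobable distribution on {1,...,6}
    print(p_plus(die, {3, 6}))             # 0.333..., i.e. 1/3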

Exercise 6.1.1 (2)
(a) Show that every probability function is indeed into [0,1].
(b) Show that whenever #(S) ≥ 2 then there are infinitely many distributions on S. Hint: No need for an induction; show it in one fell swoop. If you followed the Alice Box in Sect. 3.4.2 of Chap. 3, you can even show that there are uncountably many of them.
(c) Reformulate the above definition of p+ in an explicitly recursive manner, identifying its base and recursion step and why it is well defined (cf. Sect. 4.6.3).
(d) Consider the probability function p+: P(S) → [0,1] determined by the equiprobable distribution p: S → [0,1]. Explain why p+(A) measures the proportion of elements of S that are in A, for every A ⊆ S.
(e) Explain why there are no equiprobable distributions over infinite sample spaces. Hint: Look up Archimedes’ principle in arithmetic and apply it. There is a full explanation in Sect. 6.2.3.

Now that we are clear about the difference between p and p+, we will often ‘abuse notation’ and keep things streamlined by dropping the superscript from p+, calling it just p, in contexts where one can easily see which is meant.

6.1.2 Properties of Probability Functions

It is surprising how many useful properties of probability functions may be extractedfrom the definition. Some of the most important are covered in the followingexercise. Try to do as much as possible before looking at the solution!

Exercise 6.1.2 (1) (with solution) Let p+: P(S) → [0,1], or more briefly p without the superscript, be a probability function over a finite sample space S. Show that p has the following properties (where A,B are arbitrary subsets of S):
(a) p(∅) = 0.
(b) p(S) = 1.
(c) p(A∪B) = p(A) + p(B) − p(A∩B).
(d) When A and B are disjoint then p(A∪B) = p(A) + p(B).
(e) p(S\A) = 1 − p(A).
(f) p(A∩B) = p(A) + p(B) − p(A∪B).
(g) When A∪B = S, then p(A∩B) = p(A) + p(B) − 1 = 1 − [p(S\A) + p(S\B)].


Solution
(a) Given explicitly in the definition of a probability function.
(b) p(S) = Σ{p(s): s ∈ S} = 1 by the definition of a distribution function (for the second equality) and the definition of a probability function (for the first one).
(c) p(A∪B) = Σ{p(s): s ∈ A∪B} = Σ{p(s): s ∈ A} + Σ{p(s): s ∈ B} − Σ{p(s): s ∈ A∩B} = p(A) + p(B) − p(A∩B).
(d) By (c), noting that when A∩B = ∅, then p(A∩B) = p(∅) = 0.
(e) S = A∪(S\A) and the sets A, S\A are disjoint, so by (d) we have p(S) = p(A) + p(S\A) and thus by arithmetic p(S\A) = p(S) − p(A) = 1 − p(A) by (b).
(f) This can be shown from first principles, but it is easier to get it by arithmetic manipulation of (c).
(g) The first equality is immediate from (f), noting that when A∪B = S, then p(A∪B) = p(S) = 1 by (b). For the second equality, p(A) + p(B) − 1 = [1 − p(S\A)] + [1 − p(S\B)] − 1 by (e), which simplifies to 1 − [p(S\A) + p(S\B)].

In the above exercise, we used the notation S\A for the complement of A with respect to the sample space S, in order to avoid any possible confusion with the arithmetic operation of subtraction. Although they are closely related, they are of course different. From now on, however, we will take the liberty of writing more briefly −A.

Here’s a hint for simplifying your scratch-work when doing exercises like the preceding one, or the one to follow. It can be quite tiresome to write out the function sign p over and over again when calculating or proving in probability theory. For problems using only one probability function, we can use a more succinct notation. For each subset A of the sample space, we can write p(A) as A with an underline, the underlining representing the probability function p. For example, we may write the equation p(A∪B) = p(A) + p(B) − p(A∩B) as A∪B = A + B − A∩B, with each set expression understood as underlined. But be careful! When the problem involves more than one probability function (as for example in the iterated conditionalizations defined later in this chapter), this shorthand notation cannot be used, since it cannot keep track of the different functions. To be sure, if there are, say, two probability functions in play, one could in principle use two different kinds of underlining, but such devices quickly become cumbersome. For this reason, your instructor may not like you to employ underlining at all, so check on class policy before using it outside personal rough work. In what follows, we will stick to standard presentation.

Exercise 6.1.2 (2)
(a) Use results of the preceding exercise to show that if A ⊆ B then p(A) ≤ p(B).
(b) Use this to show that p(A∩B) ≤ p(A) and p(A∩B) ≤ p(B).
(c) Let S = {a,b,c,d} and let p be the equiprobable distribution on S. Give examples of events, that is, sets A,B ⊆ S, such that, respectively, p(A∩B) < p(A)·p(B), p(A∩B) = p(A)·p(B) and p(A∩B) > p(A)·p(B).


Table 6.1 Two languages for probability

     Traditional probabilistic      Modern set-theoretic
  1  Sample space S                 Arbitrary set S
  2  Sample point                   Element of S
  3  Probability distribution       Function p: S → [0,1] whose values sum to 1
  4  Event                          Subset of S
  5  Elementary event               Singleton subset of S
  6  Null event                     Empty set
  7  Mutually exclusive events      Disjoint subsets of S
  8  Experiment                     Situation to be modelled probabilistically
  9  P(A|B)                         P(A,B)
 10  Random variable X              (Payoff) function f: S → R
 11  Range space RX                 Range f(S) of the function f: S → R

Table 6.1 keeps track of translations between traditional probabilistic and modernset-theoretic terminology and notation. Rows 1 through 8 are about matters arisingin this and the following two sections; row 9 concerns the notation for conditionalprobability to be discussed in Sect. 6.4; while rows 10 and 11 will come up inSect. 6.8 on expectation.

6.2 Philosophy and Applications

Probability theory differs from any of the topics that we have studied so far in tworespects, one philosophical and the other practical.

6.2.1 Philosophical Interpretations

Philosophically, the intuitive notion of probability is much more slippery than that of a set. It is even more perplexing than the notion of truth, which we used in logic boxes to define connectives such as negation, conjunction, disjunction and material implication. When we declare that an event is, say, highly improbable, we are not committing ourselves to saying that it does not take place. Indeed, we may have a collection of events, each of which is known to be highly improbable, while we also know that at least one must take place. Consider for example a properly conducted lottery with many tickets. For each individual ticket, it is highly improbable that it will win; but it is nevertheless certain that one will do so.

So what is meant by saying that an event is probable, or improbable? There are two main perspectives on the question.
• One of them sees probability as a measure of uncertainty, understood as a state of mind. This is known as the subjectivist conception of probability. The general idea is that a probability function p is a measure of the degree of belief that a fully rational (but not omniscient) agent would have concerning various possible events, given some limited background knowledge that is not in general enough to permit unqualified judgement.

• The other sees probability as a manifestation of something in the world itself, independent of the thinking of any agent. This is known as the objectivist conception of probability. It comes in several flavours. Often on this approach, the probability of an event is taken to be a measure of the long-run tendency for events similar to it to take place in contexts similar to the one under consideration. That particular version of the objectivist approach is called the frequentist conception.

Of course, these thumbnail sketches open more questions than they begin to answer. For example, under the subjectivist approach, what counts is not your degree of belief or mine, nor even that of some reputed expert or guru, but rather the degree that a fully rational agent would have. So how are we to understand rationality in this context? Surely, in the final analysis, rationality is not such a subjective matter. Properly unpacked, the approach may thus end up being not so subjective after all! On the other hand, the frequentist view leaves us with the difficult problem of explaining what is meant by ‘contexts similar to the one under consideration’. It also leaves us with questions such as why the long-run frequency of something should give us any confidence at all in the short term, and how long the long run is supposed to be. As the saying goes, in the long run we are all dead. It is difficult to justify confidence in real-life decisions, which usually need to be taken within a very limited time, in terms of the frequentist approach without making some equiprobability assumptions.

It is often felt by practitioners that some kinds of situation are more easilyconceptualized in a subjectivist manner, while others are better seen in frequentistterms. The subjectivist approach seems to fit better for events that are rare or unique,or about which little data on past occurrences of similar events are available. Forexample, prima facie it may be rather difficult for a frequentist to deal with thequestion of the probability of the earth being rendered uninhabitable by a nuclearwar in the coming century. On the other hand, there are other kinds of questionon which lots of data are available, on which frequencies may be calculated, andfor which the frequentist conception of probability seems appropriate. That is howinsurance companies keep afloat, and part of the process by which new medicinesare cleared for safety.

As this is not a text on the philosophy of science, we will not take such issues further. We simply note that they are difficult to formulate meaningfully, tricky to handle and have not received any consensual resolution despite the centuries over which probability theory has been studied mathematically and applied to everyday affairs. The amazing thing is that we are able to carry on with both the mathematics of probability theory and its application to the real world without resolving the philosophical issues about its interpretation.


6.2.2 The Art of Applying Probability Theory

The application of probability theory is an art as much as a science, and ismore subtle than the mathematics itself. The first point to realize about it is that,unfortunately, you cannot get probabilities out of nothing. You have to makeassumptions about the probabilities of some events in order to deduce probabilitiesof others. Such assumptions should always be made explicit, so that they can berecognized for what they are and then assessed. Whereas in textbook examples, theassumptions are often simply given as part of the problem statement, in real life theycan be difficult to determine or justify.

In practice, this means that when confronted with a problem of probabilities, two steps should be taken before attempting any calculation:
• Identify the sample space of the problem. What is the set S that we take as domain of the probability distribution in the question?
• Specify the values of the distribution. Can it reasonably be assumed to be an equiprobable distribution? If so, and we also know the size n of the sample space, then we already have the value p(s) = 1/n for each point s in the sample space, and we are ready to calculate the values p+(A) for subsets A ⊆ S. If it cannot reasonably be taken to be equiprobable, we must ask: Is there any other distribution that is reasonable to assume?

Many students are prone to premature calculation, and training is needed to overcome it. In this chapter, we will try always to be explicit about the sample space and the values of the distribution.

6.2.3 Digression: A Glimpse of the Infinite Case*

In this text, we study only finite probability spaces, so this subsection is only for thecurious; others may skip it. What differences arise in the infinite case?

Those who have been doing the exercises for this chapter have already seen that when the sample space S is infinite, it cannot have an equiprobable distribution in the sense defined. The reason is that if the distribution is equiprobable, then either all points in S get the value zero, or else they all get the same value greater than zero. In the former case, the values add up to less than 1, for under the usual understanding of infinite sums, the sum of even infinitely many zeros is still zero. And in the latter case, the values sum to more than 1. Indeed, by the principle of Archimedes, for every real r > 0 (no matter how small) there is an integer n (whose size depends on the choice of r) with n·r > 1. Thus, no matter how small we choose r > 0, assigning value r to all the elements of an infinite sample space will make even sufficiently large (finite) subspaces of it sum to more than 1.

The standard response to this predicament is to abandon the idea that in theinfinite case, probability functions may always be generated by summing fromdistributions on the sample space S. Instead, they are treated directly, in a moreabstract manner. They become functions p that take some (but not necessarily all)subsets A of the sample space S as arguments, to give values p(A) between 0 and 1.


The domain of the functions is required to be a field of subsets of S, that is, a non-empty collection C ⊆ P(S) that is closed under complement (with respect to S) and binary union (and so also binary intersection). The functions themselves are required to satisfy certain postulates, notably that p(S) = 1, and that when A,B are disjoint subsets of S then p(A∪B) = p(A) + p(B). But it is no longer required that for infinite A, p(A) is in any sense the sum of the infinitely many values p({a}) of the singleton subsets of A, nor indeed that such singleton subsets need be in the domain of the function p when A is.

As a result, it is not possible to generalize addition principles for finite unions to principles that cover arbitrary infinite unions of subsets of the sample space. However, a restricted form of the generalization, known as σ-additivity, may consistently be accepted, and often is. It requires that the field of subsets serving as domain of the probability function is closed under countable unions and, if the sample set is more than countable, that p satisfies a principle of addition for countable unions of pairwise disjoint sets. But this is going way beyond discrete mathematics.

Another approach to the definition of probability in the infinite case is to widenthe set of the reals to contain also ‘infinitesimals’ as defined and exploited in whatis called non-standard analysis. But that would take us even further afield.

6.3 Some Simple Problems

In this section, we run through a series of problems, to give practice in thetask of applying the first principles of probability theory. The examples becomeprogressively more sophisticated as they bring in additional devices.

Example 1 Throw a fair die. What is the probability that the number (of dots on theuppermost surface) is divisible by 3?

Solution First, we specify the sample space. In this case, it is the set S = {1,2,3,4,5,6}. Next, we specify the distribution. In this context, the term fair means that the questioner is assuming that we have an equiprobable distribution. The distribution thus puts p(n) = 1/6 for all positive n ≤ 6. The event we have to consider is A = {3,6} and so p(A) = 1/6 + 1/6 = 1/3.

Example 2 Select a playing card at random from a standard European pack. Whatis the probability that it is a hearts? That it is a court card (jack, queen, king)? Thatit is both? That it is hearts but not a court card? That it is either?

Solution First we specify the sample space. In this case, it is the set S of 52 cards in the standard European pack. Then we specify the distribution. In this context, the term random again means that we are asked to consider an equiprobable distribution. The distribution thus puts p(c) = 1/52 for each card c. We have five events to consider. The first two we call H, C, and the others are H∩C, H\C and H∪C. In a standard pack, there are 13 hearts and 12 court cards, so p(H) = 13/52 = 1/4 and p(C) = 12/52 = 3/13. There are only 3 cards in H∩C, so p(H∩C) = 3/52, so there are 13 − 3 = 10 in H\C, giving us p(H\C) = 10/52 = 5/26. Finally, #(H∪C) = #(H) + #(C) − #(H∩C) = 13 + 12 − 3 = 22, so p(H∪C) = 22/52 = 11/26.
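The card counts can be verified by brute force; in the sketch below the deck is encoded as rank–suit pairs, with 11, 12, 13 standing for jack, queen, king (an arbitrary encoding, not the book's):

    from fractions import Fraction

    deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]   # 52 cards
    H = {c for c in deck if c[1] == "H"}                 # hearts
    C = {c for c in deck if c[0] in (11, 12, 13)}        # court cards

    def p(E):                                            # equiprobable distribution
        return Fraction(len(E), len(deck))

    print(p(H), p(C), p(H & C), p(H - C), p(H | C))      # 1/4 3/13 3/52 5/26 11/26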

Example 3 An unscrupulous gambler has a loaded die with probability distribution p(1) = 0.1, p(2) = 0.2, p(3) = 0.1, p(4) = 0.2, p(5) = 0.1, p(6) = 0.3. Which is more probable, that the die falls with an even number, or that it falls with a number greater than 3?

Solution As in the first problem, the sample space is S = {1,2,3,4,5,6}. But this time the distribution is not equiprobable. Writing E and G for the two events in question, we have E = {2,4,6} while G = {4,5,6}, so p(E) = p(2) + p(4) + p(6) = 0.2 + 0.2 + 0.3 = 0.7, while p(G) = p(4) + p(5) + p(6) = 0.2 + 0.1 + 0.3 = 0.6. Thus, it is slightly more probable (difference 1/10) that this die falls even than that it falls greater than 3.
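The same event-summing idea works for the loaded die, where the distribution is given explicitly rather than assumed equiprobable; Fractions are used in this sketch only to keep the arithmetic exact:

    from fractions import Fraction as F

    p = {1: F(1,10), 2: F(2,10), 3: F(1,10), 4: F(2,10), 5: F(1,10), 6: F(3,10)}
    even    = sum(p[s] for s in {2, 4, 6})     # E = {2,4,6}
    greater = sum(p[s] for s in {4, 5, 6})     # G = {4,5,6}
    print(even, greater)                       # 7/10 3/5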

Example 4 We toss a fair coin three times. What is the probability that (a) at least two consecutive heads appear, (b) exactly two heads (not necessarily consecutive) appear?

Solution The essential first step is to get clear about the sample space. It is not the pair {heads, tails} or {H,T} for short. It is the set {H,T}³ of all ordered triples with elements from {H,T}. Enumerating it in full, S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. As the coin is supposed to be fair, these are considered equiprobable. Let A be the event of getting at least two consecutive heads, and B that of getting exactly two heads (in any order). Then A = {HHH, HHT, THH}, while B = {HHT, HTH, THH}. Thus, p(A) = 3/8 = p(B).
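Enumerating the eight triples confirms the two counts; this sketch relies on the same fairness-and-independence assumption that the Alice Box below makes explicit:

    from itertools import product
    from fractions import Fraction

    space = list(product("HT", repeat=3))                 # the 8 equiprobable triples
    A = [w for w in space if "HH" in "".join(w)]          # at least two consecutive heads
    B = [w for w in space if w.count("H") == 2]           # exactly two heads
    print(Fraction(len(A), len(space)), Fraction(len(B), len(space)))   # 3/8 3/8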

Alice Box: Repeated Tosses

Alice:  Not so fast! Something is funny here.
Hatter: What’s wrong?
Alice:  You seem to be making a hidden assumption. We are told that the coin is fair, so p(H) = 0.5 = p(T). But how do you know that, say, p(HHH) = p(HTH) = 1/8? Perhaps the fact of landing heads on the first toss makes it less likely to land heads on the second toss. In that case the distribution over this sample space will not be equiprobable.
Hatter: Nice point! In fact, we are implicitly relying on such an assumption. We are assuming that p(H) always has the value 0.5 no matter what happened on earlier tosses. Another way of putting it: the successive tosses have no influence on each other – Hi+1 is independent of Hi, where Hi is the event ‘heads on the ith toss’. We will be saying more about independence later in the chapter.


Example 5 Seven candidates are to be interviewed for a job. Of these, four are menand three are women. The interviews are conducted in random order. What is theprobability that all the women are interviewed before any of the men?

Solution Let M be the four male candidates, and W be the three female ones. Then M∪W is the set of all candidates. We take the sample space to be the set of all permutations (i.e. selections in mode O+R−) of this seven-element set. As we know from the preceding chapter, there are P(7,7) = 7! such permutations, and we assume that they are equiprobable as suggested by the word ‘random’ in the formulation of the problem. An interview in which all the women come before the men will be one in which we have some permutation of the women, followed by some permutation of the men, and there are clearly P(3,3)·P(4,4) of these. So the probability of such an interview pattern is P(3,3)·P(4,4)/P(7,7) = 3!4!/7! = 1/35.
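The ratio of permutation counts is a one-line calculation; a sketch using exact fractions:

    from math import factorial
    from fractions import Fraction

    print(Fraction(factorial(3) * factorial(4), factorial(7)))   # 1/35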

This is the first example in our discussion of probability that makes use of arule from the chapter on counting. The sample space consisted of all permutationsof a certain set and, since the space is assumed equiprobable, we are calculating theproportion of them with a certain property. If, on the other hand, a problem requires asample space consisting of combinations, we need to apply the appropriate formulafor counting them, as in the following example.

Example 6 Of the ten envelopes in my letterbox, two contain bills. If I pick three envelopes at random to open today, leaving the others for tomorrow, what is the probability that none of these three contains a bill?

Solution We are interested in the proportion of 3-element subsets of S that contain no bills. So we take the sample space S to consist of the set of all 3-element subsets of a 10-element set. Thus S has C(10,3) = 10!/(3!7!) = 120 elements. How many of these subsets have no bills? There are 8 envelopes without bills, and the number of 3-element subsets of this 8-element set is C(8,3) = 8!/(3!5!) = 56. So the probability that none of our three selected envelopes contains a bill is 56/120 = 7/15 – just under 0.5, unfortunately.
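Again the solution is a single ratio of binomial coefficients, easily checked (math.comb needs Python 3.8 or later):

    from math import comb
    from fractions import Fraction

    print(Fraction(comb(8, 3), comb(10, 3)))   # 7/15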

Alice Box: Combinations or Permutations?

Alice:  Can’t we do it with permutations too?
Hatter: Let’s see. What’s your sample space?
Alice:  The set of all 3-element permutations (rather than subsets) of a 10-element set. This has P(10,3) = 10!/7! = 10·9·8 = 720 elements. And we are interested in the 3-element permutations (again, rather than subsets) of the 8-element no-bill set, of which there are P(8,3) = 8!/5! = 8·7·6 = 336 elements. As the sample space is equiprobable, the desired probability is given by the ratio 336/720 = 7/15, the same as by the other method.


So there are problems that can be solved by either method. However, the solutionusing combinations has a smaller sample space. In effect, with permutations weare processing redundant information (the different orderings), giving us muchlarger numbers in the numerator and denominator, which we then cancel down.So it involves rather more calculation. Of course, much of the calculation could beavoided by cancelling as soon as possible, at the expression (8!/5!)/(10!/7!) beforerewriting the numerator and denominator in decimal notation. But as a general rule,it is better to keep the sample space down from the beginning.

Exercise 6.3
(a) A pair of fair dice is thrown. What is the probability that the sum of the dots is 8? What is the probability that it is at least 5?
(b) Your debit card has a 4-digit pin number. You lose the card, and it is found by a dishonest person. The cash dispenser allows 3 attempts to enter the pin number. What is the probability that the person accesses your account, assuming that he/she enters candidate pin numbers at random, without repetition (but of course allowing repetition of individual digits)?
(c) There are 25 people in a room. What is the probability that at least two of the people are born on the same day of the year (but not necessarily the same year)? For simplicity, assume that each year has 365 days, and that birthdays are randomly distributed through the year (both assumptions being, of course, rough approximations). Hints: (1) Be careful with your choice of sample space. (2) First find the probability that no two distinct people have the same birthday. (3) You will need a pocket calculator.

6.4 Conditional Probability

In this section, we introduce the general concept of conditional probability, itsdefinition as a ratio, examples of its application and dangers of its misinterpretation.

6.4.1 The Ratio Definition

You throw a pair of fair dice. You already know how to calculate the probability of the event A that at least one of the dice is a 3. Your sample space is the set of all ordered pairs (n,m) where n,m ∈ {1, …, 6} and you assume that the distribution on this space is equiprobable. There are 11 pairs (n,m) with either n = 3 or m = 3 (or both) so, by the assumption of equiprobability, p(A) = 11/36.

But what if we are informed, before looking at the result of the throw, that thesum of the two dice is 5? What is the probability that at least one die is a 3, giventhat the sum of the two dice is 5? This is a question of conditional probability.

In effect, we are changing the sample space. We are restricting it from the set S of all 36 ordered pairs (n,m) where n,m ∈ {1, …, 6}, to the subset B ⊆ S consisting of those ordered pairs whose sum is 5. But it is very inconvenient to have to change the sample space each time we calculate a probability. It is easier to keep the sample space fixed and obtain the same result in another, equivalent, way. So we define the probability of A (e.g. that at least one die is a 3) given B (e.g. that the sum of the two faces is 5) as the ratio of p(A∩B) to p(B). Writing the conditional probability of A given B as p(A|B), we put:

p(A|B) = p(A∩B)/p(B)

In the example given, there are four ordered pairs whose sum is 5, namely, the pairs (1,4), (2,3), (3,2), (4,1), so p(B) = 4/36. Just two of these contain a 3, namely, the pairs (2,3), (3,2), so p(A∩B) = 2/36. Thus, p(A|B) = (2/36)/(4/36) = 1/2 – larger than the unconditional (alias prior) probability p(A) = 11/36.
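The ratio definition is easy to animate on this example; in the sketch below p is the equiprobable probability function over the 36 pairs, and cond returns None when the condition has probability zero, mirroring the 'leave it undefined' policy discussed later in this section:

    from itertools import product
    from fractions import Fraction

    space = list(product(range(1, 7), repeat=2))          # 36 equiprobable ordered pairs
    A = {w for w in space if 3 in w}                      # at least one die shows a 3
    B = {w for w in space if sum(w) == 5}                 # the two dice sum to 5

    def p(E):
        return Fraction(len(E), len(space))

    def cond(X, Y):
        """p(X|Y) by the ratio definition; None when p(Y) = 0."""
        return None if p(Y) == 0 else p(X & Y) / p(Y)

    print(p(A), cond(A, B))                               # 11/36 1/2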

Note that the conditional probability of A given B is not in general the same as the conditional probability of B given A. While p(A|B) = p(A∩B)/p(B), we have p(B|A) = p(B∩A)/p(A). The numerators are the same, since A∩B = B∩A, but the denominators are different: they are p(B) and p(A), respectively.

Exercise 6.4.1 (1) Calculate the value of p(B|A) in the above die-throwing example to determine its relation to p(A|B).

The non-convertibility (or non-commutativity) of conditional probability should not be surprising. In logic we know that the proposition α → β is not the same as β → α, and in set theory, the inclusion X ⊆ Y is not equivalent to Y ⊆ X.

Alice Box: Division by Zero

Alice:  One moment! There is something that worries me in the definition of conditional probability. What happens to p(A|B) when p(B) = 0, for example, when B is the empty set? In this case, we can’t put p(A|B) = p(A∩B)/p(B) = p(A∩B)/0, since division by zero is not possible.
Hatter: You are right! Division by zero is undefined in arithmetic, and so the definition that we gave for conditional probability does not make sense in this limiting case.
Alice:  So what should we do?
Hatter: Just leave it undefined.

Actually, that is the most common of three possible reactions. It treats conditional probability, like division itself, as a partial function of two arguments (cf. Sect. 3.1 of Chap. 3). Its domain is not P(S)², where S is the sample space, but rather P(S) × {X ⊆ S: p(X) ≠ 0}. The second argument ranges over all the subsets of the sample space S whose probability under the function p is greater than zero. That excludes the empty set ∅, and it may also exclude some other subsets since it can happen that p(X) = 0 for a non-empty X ⊆ S. This option gives us the ratio definition of conditional probability: p(A|B) = p(A∩B)/p(B) except when p(B) = 0, in which case p(A|B) is undefined. It is the definition that we follow in this chapter.

An alternative approach is to treat conditional probability as a full function on P(S)², but to give p(A|B) a constant value for all B with p(B) = 0, a ‘dummy value’ if you like to call it that. It is convenient to choose the constant value to be 1. This option gives us the ratio/unit definition: p(A|B) = p(A∩B)/p(B) except when p(B) = 0, in which case p(A|B) = 1. Really, these two definitions are not very different from each other; it is more a difference of style than of content.

There is also a third approach, which is significantly different. Like the first two, it puts p(A|B) = p(A∩B)/p(B) when p(B) ≠ 0. Like the second, it treats p(A|B) as well defined even when p(B) = 0, thus taking the domain to be P(S)². But unlike both, it allows the value of p(A|B) to vary when p(B) = 0, provided it satisfies the product rule p(A∩B|C) = p(A|C)·p(B|A∩C). We will not go any further into this notion of revisionary conditional probability, except to say three things. (1) It has several main variants, corresponding to natural ways in which the details of the definition may be filled out for the ‘critical zone’ where p(B) = 0 but B ≠ ∅; (2) it is quite popular among philosophers of probability; but (3) it is rarely used by statisticians, mathematicians and physicists, although it is employed by some economists in recent work on game theory.

Exercise 6.4.1 (2)
(a) Suppose that we know from records that, in a certain population, the probability p(C) of high cholesterol is 0.3 and the probability p(C∩H) of high cholesterol and heart attack together is 0.1. Find the probability p(H|C) of heart attack given high cholesterol.
(b) Suppose that we know from records from another population that the probability p(H) of heart attack is 0.2 and the probability p(H∩C) of heart attack and high cholesterol together is 0.1. Find the probability p(C|H) of high cholesterol given heart attack.
(c) Check the limiting-case equalities: p(∅|B) = p(−B|B) = 0, p(S|B) = p(B|B) = 1, and p(A|S) = p(A).
(d) Show that (i) p(A|B) = p((A∩B)|B) and (ii) p(A|B) ≤ p(A′|B) whenever A ⊆ A′, both under the assumption that p(B) ≠ 0.
(e) Show that the ratio definition of conditional probability satisfies the product rule p(A∩B|C) = p(A|C)·p(B|A∩C) whenever p(A∩C) ≠ 0.

6.4.2 Applying Conditional Probability

In the empirical sciences, conditional probability is often employed in the investigation of causal relationships. For example, as in the first parts of the last exercise, we may be interested in the extent to which high cholesterol leads to heart attack and use conditional probabilities like those in the exercise above when analysing the data. But great care is needed.


In the first place, conditional probability may make sense even when questions of time and causality do not arise. For example, in Exercise 6.4.1 (1) we calculated the conditional probabilities p(A|B) and p(B|A), where A is the event that at least one die is a 3 and B is the event that the sum of the two faces is 5. Neither of these events can be thought of as causally related to the other, nor does either happen before or after the other. Their conditional probabilities are merely a matter of numbers.

Second, when time and causality are relevant, we are just as entitled to look at theprobability of an earlier event or suspected cause (e.g. high cholesterol) given a laterone or suspected effect (e.g. heart attack), as conversely. Conditional probabilitymakes sense in both directions.

Third, in the presence of time and causality, discovery of a value of p(A|B) (even a value of 1) does not by itself establish any kind of causal link, direct or mediated, although it may lead us to suspect one. The mathematics says nothing about the existence of causal or temporal relationships between A and B in either direction.

Reasoning from high conditional probability to causation is fraught with philosophical difficulties and hedged with practical complications. Even inference in the reverse direction, from a causal relationship to a conditional probability, is not as straightforward as may appear. We will not attempt to unravel these fascinating and important issues, which are properly tackled in courses of statistics, experimental method and the philosophy of science. We remain within the confines of the basic mathematics of probability itself.

Exercise 6.4.2 (1) (with partial solution)
(a) In a writers’ association, 60% of members write novels, 30% write poetry, but only 10% write both novels and poetry. What is (i) the probability that an arbitrarily chosen member writing poetry also writes novels, (ii) the converse conditional probability?
(b) In the same writers’ association, 15% write songs, all of whom also write poetry, and 80% of the songsters write love poetry. But none of the novelists write love poetry. What is the probability that (i) an arbitrary poet writes songs, (ii) an arbitrary songster writes poetry, (iii) an arbitrary novelist writes love poetry, (iv) an arbitrary novelist writing love poetry also writes songs, (v) an arbitrary songster writes novels?
In this problem, understand all percentage figures given as exact (rather than at least, at most or approximate). Use the acronyms N,P,S,L for those members of the association who write novels, poetry, songs and about love.

Solution to (a), (b)(i), (b)(v)
(a) (i) p(N|P) = p(N∩P)/p(P) = 0.1/0.3 = 1/3. (ii) p(P|N) = p(P∩N)/p(N) = 0.1/0.6 = 1/6.
(b) (i) p(S|P) = p(S∩P)/p(P). But by hypothesis, S ⊆ P, so S∩P = S, so p(S|P) = p(S)/p(P) = 0.15/0.3 = 0.5. (v) By the ratio definition, p(N|S) = p(N∩S)/p(S). But the data given in the problem do not suffice to calculate the value of p(N∩S), so we cannot solve the problem.


The last part of the exercise illustrates an important lesson. In real life, the information available is often insufficient to determine probabilities. One can sometimes get one by making rather arbitrary assumptions; for example, in the exercise we might assume that being a songster is independent of being a novelist (in a sense that we will define later) and use a principle for calculating the joint probability of independent events (also given later). But making such assumptions in an ad hoc or gratuitous manner can be dangerous. If introduced, they must be rendered explicit, so that they may be challenged and justified.

However, even when the data are insufficient to determine a precise probability for a property in which we are interested, they may permit us to work out a useful upper or lower bound. For example, in the exercise above, although we cannot calculate the exact value of p(N∩S), we can obtain an upper bound on it. Since exactly 15% of the association are songsters and exactly 80% of songster members write love poetry, we know that exactly 12% of the association are songsters writing love poetry. Since none of the love poets are novelists, at least 12% of the association (but not necessarily exactly 12%) are songster non-novelists. But recalling that exactly 15% of the association are songsters in the first place, we may conclude that at most 3% of the association are songster novelists, that is, p(N∩S) ≤ 0.03. So p(N|S) = p(N∩S)/p(S) ≤ 0.03/0.15 = 0.2. Rather tortuous reasoning, but it got us something.
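The bound can be checked mechanically. The following is a minimal Python sketch of the arithmetic just described; the variable names are ours and the figures are those of the exercise.

# Upper bound on p(N|S) from the exercise data
p_S = 0.15              # proportion of the association who are songsters
p_love_given_S = 0.80   # proportion of songsters who write love poetry
p_songster_love = p_S * p_love_given_S   # = 0.12, all of whom are non-novelists
p_NS_upper = p_S - p_songster_love       # at most 0.03 of the association are songster novelists
print(round(p_NS_upper / p_S, 3))        # upper bound on p(N|S): 0.2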

Exercise 6.4.2 (2) Show that the data in the previous exercise are compatible with N∩S = ∅. What does that tell us about a lower bound on p(N|S)?

When working with conditional probabilities it is easy to forget the fact that they are only partial functions, proceeding as if they are full ones. In other words, it is easy to neglect the fact that p(A|B) is not defined in the limiting case that p(B) = 0. This can sometimes lead to serious errors. On the other hand, it is quite distracting to have to check out these limiting cases while trying to solve the problem; it can make you lose the thread of your thought.

How can these competing demands of rigour and insight be met? The best procedure is to make two runs. When solving a problem involving conditional probability, make a first draft of your calculation or proof without worrying about the limiting cases, that is, as if conditional probability were always defined. When finished, go back and check each step to see that it is correct in those cases. This means verifying that each condition used in a conditional probability really has positive value, or finding an alternative argument for a step if the value of a condition is zero. These checks are usually quite straightforward, but sometimes there can be nasty surprises!

Exercise 6.4.2 (3) (with partial solution) Let p: P(S) → [0,1] be a probability function. For each of the following statements, state the condition needed for all its conditional probabilities to be well defined, and show that it holds whenever well defined:
(a) p(A1∪A2|B) = p(A1|B) + p(A2|B) whenever the sets A1∩B, A2∩B are disjoint.
(b) p(−A|B) = 1 − p(A|B).
(c) If p(A) ≤ p(B) then p(A|B) ≤ p(B|A).
(d) p(A|B)/p(B|A) = p(A)/p(B).

Solution to (a)
(a) Condition needed: p(B) ≠ 0. Proof: Suppose A1∩B, A2∩B are disjoint. Then:

p(A1∪A2|B) = p((A1∪A2)∩B)/p(B)              by definition of conditional probability
           = p((A1∩B)∪(A2∩B))/p(B)           by Boolean distribution in numerator
           = [p(A1∩B) + p(A2∩B)]/p(B)        using disjointedness assumption
           = p(A1∩B)/p(B) + p(A2∩B)/p(B)     by arithmetic
           = p(A1|B) + p(A2|B)               by definition of conditional probability

Alice Box: What Is A|B?

Alice: I understand the definitions and can do the exercises. But there is still a question bothering me. What is A|B? The definition of conditional probability tells me how to calculate p(A|B), but it does not say what A|B itself is. Is it a subset of the sample space S like A, B, or some new kind of 'conditional object'?

Hatter: Neither one nor the other! Don't be misled by notation: A|B is just a traditional shorthand way for writing the ordered pair (A,B). Whereas A, B are subsets of the sample space, A|B is not: it is an ordered pair of such subsets. The vertical stroke is there merely to remind you that there is a division in the definition of p(A,B). Conditional probability is thus a partial two-place function, not a one-place function with a funny argument. A notation uniform with the rest of mathematics would simply be p(A,B).

Alice: Does an expression like p(A|(B|C)) or p((A|B)|C) mean anything?
Hatter: Nothing whatsoever! Try to unpack the second one, say, according to the definition of conditional probability. We should have p((A|B)|C) = p((A|B)∩C)/p(C). The denominator is OK, but the numerator is meaningless, for p(X∩C) is defined only when X is a subset of the sample set S, and A|B doesn't fit the bill.

Alice: So the operation of conditionalization can't be iterated?
Hatter: Not so fast! We can iterate conditionalization, but not like that! Back to the text to explain how.


Exercise 6.4.2 (4) Show that the other expression mentioned by Alice, p(A|(B|C)), is also meaningless.

As we have emphasized, a probability function p: P(S) → [0,1] over a sample space S has only one argument, while its associated conditional probability function has two: it is a partial two-place function with domain P(S) × {X ⊆ S: p(X) ≠ 0}. But like all two-place functions, full or partial, it can be transformed into a family of one-place functions by the operation of projection (Chap. 3, Sect. 3.5.3) from values of the left or right argument, and in this case, the interesting projections hold fixed the right argument. For every probability function p: P(S) → [0,1] and B ⊆ S with p(B) ≠ 0, we have the one-place function pB: P(S) → [0,1] defined by putting pB(A) = p(A|B) = p(A∩B)/p(B) for all A ⊆ S.

Why should these projections be of any interest? The reason is that each such function pB: P(S) → [0,1] is also a probability function over the same sample space S, as is easily verified and is set as an exercise below. It is called the conditionalization of p on B. This reveals a perfectly good sense in which the operation of conditionalization may be iterated. Although, as we have seen, Alice's expressions p(A|(B|C)) and p((A|B)|C) are meaningless, the functions (pB)C and (pC)B are well defined when p(B∩C) ≠ 0. Moreover, and rather surprisingly, it is not difficult to show that the order of conditionalization is immaterial, and that every iterated conditionalization collapses into a suitable one-shot conditionalization. Specifically: (pB)C = pB∩C = pC∩B = (pC)B when all these functions are well defined, that is, when p(B∩C) ≠ 0. Verifying this last point is one of the end-of-chapter exercises.
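The collapse of iterated conditionalization can be seen concretely on a small example. Below is an illustrative Python sketch of our own (not from the text): it represents events as frozensets over a four-point sample space, builds the one-place function pB by projection, and checks one instance of (pB)C = pB∩C. Exact fractions are used to avoid rounding noise.

from fractions import Fraction

# Hypothetical four-point sample space with a (non-equiprobable) distribution
S = frozenset({1, 2, 3, 4})
dist = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def p(A):
    """The probability function determined by the distribution: sum over the event."""
    return sum(dist[s] for s in A)

def conditionalize(q, B):
    """Return the one-place function q_B; defined only when q(B) is non-zero."""
    if q(B) == 0:
        raise ValueError("q_B is undefined when q(B) = 0")
    return lambda A: q(A & B) / q(B)

B, C = frozenset({1, 2, 3}), frozenset({2, 3, 4})
pB = conditionalize(p, B)
pBC = conditionalize(pB, C)                     # (p_B)_C, iterated conditionalization
one_shot = conditionalize(p, B & C)             # p_(B∩C)
print(pBC(frozenset({2})) == one_shot(frozenset({2})))   # True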

Exercise 6.4.2 (5) (with partial solution) Let p: P(S) → [0,1] be a probability function. Show the following:
(a) p = pS.
(b) If p(B) ≠ 0, then pB(A1∪A2) = pB(A1) + pB(A2) whenever the sets A1∩B, A2∩B are disjoint.
(c) If p(B) ≠ 0, then pB(−A) = 1 − pB(A).
(d) For any probability function p: P(S) → [0,1] and B ⊆ S with p(B) ≠ 0, the function pB: P(S) → [0,1] is well defined and is also a probability function over S.

Solution for (a) and Hints for the Others (a) By definition, pS(A) = p(A∩S)/p(S) = p(A)/1 = p(A). Note that p(S) = 1 ≠ 0. For (b), (c) use Exercise 6.4.2 (3). For (d), unpack the definition and make use of any properties already established.

6.5 Interlude: Simpson’s Paradox*

We pause for a moment to note a surprising fact known as Simpson's paradox, named after the person who rediscovered it in 1951, although it had been noticed much earlier in 1903 by Yule, and even earlier in 1899 by Pearson. It shows – just in case you hadn't already noticed – that even elementary discrete probability theory is full of traps for the unwary!


Suppose we have a sample space S, a partition of S into two cells F, −F and another partition of S into two cells C, −C. To fix ideas, take S to be the set of all applicants for vacant jobs in a university in a given period. It is assumed that all applicants are female or male (but not both), and that every application is addressed either to computer science or to mathematics (but not both). We can thus write F, −F for the female and male applicants, respectively, and C, −C for the applicants to the respective departments of computer science and mathematics. The four sets F∩C, F∩−C, −F∩C, −F∩−C form a partition of S (Chap. 2, Sect. 2.5.3). Now let H be the set of applicants actually hired. Suppose that:

p(H|F∩C) > p(H|−F∩C);

that is, the probability that an arbitrarily chosen female candidate to the computer science department is hired, is greater than the probability that an arbitrarily chosen male candidate to the computer science department is hired. Suppose that the same strict inequality holds regarding the mathematics department, that is, that

p(H|F∩−C) > p(H|−F∩−C).

Intuitively, it seems inevitable that the same inequality will hold when we take unions on each side, so that

p(H|F) > p(H|−F);

in other words, that the probability that a female candidate is hired, is greater than that for a male candidate. But in fact, it need not do so! We give a counterexample (based on a lawsuit that was actually brought against a US university). Let S have 26 elements, neatly partitioned into 13 female and 13 male. Eight women apply to computer science, of whom two are hired, while five men apply to the same department, of whom one is hired. At the same time, five women apply to mathematics, with four hired, while eight men apply with six hired. We calculate the probabilities.

p(H|F∩C) = p(H∩F∩C)/p(F∩C) = 2/8 = 0.25, while

p(H|−F∩C) = p(H∩−F∩C)/p(−F∩C) = 1/5 = 0.2,

so that the first inequality holds. Likewise

p(H|F∩−C) = p(H∩F∩−C)/p(F∩−C) = 4/5 = 0.8, while

p(H|−F∩−C) = p(H∩−F∩−C)/p(−F∩−C) = 6/8 = 0.75,


and thus the second inequality also holds. But we also have:

p(H|F) = p(H∩F)/p(F) = (2 + 4)/13 = 6/13 < 0.5

p(H|−F) = p(H∩−F)/p(−F) = (1 + 6)/13 = 7/13 > 0.5,

and so the third inequality fails – giving rise to the lawsuit!

Of course, the 'paradox' is not a paradox in the strict sense of the term. It is not a contradiction arising from apparently unquestionable principles; it does not shake the foundations of probability theory. But it shows that there can be surprises even in elementary discrete probability. Intuition is not always a reliable guide in matters of probability, and even experts (as well as laymen and lawyers) can go wrong.

Simpson’s paradox may also be regarded as a manifestation in a probabilistic set-ting of a simple and perhaps less surprising arithmetical fact: there are positive inte-gers with a1/x1 > a2/x2 and b1/y1 > b2/y2 but (a1 C b1)/(x1 C y1) < (a2 C b2)/(x2 C y2).

6.6 Independence

There are two equivalent ways of defining the independence of two events, using probability. One uses conditional probability, the other is in terms of multiplication of unconditional probabilities. We begin with the definition via conditional probability.

Let p: P(S) → [0,1] be any probability function on a (finite) sample space S. Let A, B be events (i.e. subsets of S). We say that A is independent of B (modulo p) iff p(A) = p(A|B) (in the principal case) or (limiting case) p(B) = 0. In the language of projections, iff p(A) = pB(A) when pB is well defined. In rough terms, iff conditionalizing on B makes no difference to the probability of A.

Note that this is just mathematics; it is not a matter of the presence or absence of a causal connection. Remember too that independence is modulo the probability function chosen: A may be independent of B with respect to one probability function p on P(S), but not so with respect to another one p′.

The other definition, in terms of multiplication, is as follows: A is independent of B (modulo p) iff p(A∩B) = p(A)·p(B). The two definitions are equivalent. Some people find the definition in terms of conditional probabilities more intuitive (except for its limiting case, which is a little annoying). But the account in terms of multiplication is usually handier to work with, and we adopt it as our official definition. The following exercise looks at some basic properties of independence.
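The multiplicative definition is easy to test mechanically. Here is an illustrative Python sketch with an example of our own (not from the text): two fair dice, with A the event that the first die shows an even number and B the event that the sum is 7. Exact fractions keep the comparison clean.

from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))        # all 36 outcomes of two fair dice
p = lambda E: Fraction(len(E), len(S))          # equiprobable distribution on S

A = {(x, y) for (x, y) in S if x % 2 == 0}      # first die even
B = {(x, y) for (x, y) in S if x + y == 7}      # sum of faces is 7

print(p(A & B) == p(A) * p(B))   # True: A is independent of B modulo p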

Exercise 6.6 (1) (with partial solution)
(a) Show the equivalence of the two definitions of independence.
(b) Show that the relation of independence is symmetric.
(c) Show that A is independent of B in each of the four limiting cases that p(B) = 0, p(B) = 1, p(A) = 0, p(A) = 1.
(d) Show that for disjoint A, B, if A is independent of B then p(A) = 0 or p(B) = 0.

Solution to (a)
(a) Suppose first that p(A∩B) = p(A)·p(B) and p(B) ≠ 0. Then we may divide both sides by p(B), giving us p(A) = p(A∩B)/p(B) = p(A|B) as desired. Conversely, suppose either p(B) = 0 or p(A) = p(A|B). In the former case, p(A∩B) = 0 = p(A)·p(B) as needed. In the latter case, p(A) = p(A|B) = p(A∩B)/p(B), so multiplying both sides by p(B) we have p(A∩B) = p(A)·p(B) and we are done.

An advantage of the multiplicative definition is that it generalizes elegantly to the independence of any finite collection of events. Let {A1, …, An} be any such collection with n ≥ 1. We say that the Ai are independent iff for every non-empty subcollection {Ai1, …, Aim} (1 ≤ m ≤ n) we have: p(Ai1∩…∩Aim) = p(Ai1)·…·p(Aim). Equivalently, using a definition that may at first glance seem circular, but is in fact recursive (see Chap. 4): iff every proper non-empty subcollection of {A1, …, An} is independent and p(A1∩…∩An) = p(A1)·…·p(An). It should be emphasized that the independence of n events does not follow from their pairwise independence; one of the end-of-chapter exercises asks you to find a simple counterexample.

When it holds, independence is a powerful tool for calculating probabilities. We illustrate with an example. Five per cent of computers sold by a store are defective. Today the store sold three computers. Assuming that these three form a random sample, with also independence as regards defects, what are: (i) the probability that all of them are defective, (ii) the probability that none of them have defects, (iii) the probability that at least one is defective?

We are told that the probability p(Di) of defect for each computer ci (i ≤ 3) taken individually is 0.05, and that these probabilities are independent. For (i), the assumption of the independence of {D1, D2, D3} tells us that the probability p(D1∩D2∩D3) that all three are defective is (0.05)³ = 0.000125. For (ii), the probability p(−Di) that a given computer is not defective is 0.95, so by independence again, the probability p(−D1∩−D2∩−D3) that none are defective is (0.95)³ = 0.857375, which is significantly less than 0.95. For (iii), p(D1∪D2∪D3) = 1 − p(−D1∩−D2∩−D3) = 1 − 0.857375 = 0.142625.
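A two-line Python check of these three figures, under the stated independence assumption (rounded, since floating-point arithmetic is not exact):

p_defective = 0.05
p_all_three = p_defective ** 3                 # (i)   0.000125
p_none = (1 - p_defective) ** 3                # (ii)  0.857375
p_at_least_one = 1 - p_none                    # (iii) 0.142625
print(round(p_all_three, 6), round(p_none, 6), round(p_at_least_one, 6))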

Alice Box: A Flaw in the Argument?

Alice: Just a minute! I think that there is a flaw in that argument. In the statement of the problem, it was given that the sets Di are independent. But in answering part (ii), we needed to assume that their complements −Di are independent.

Hatter: Gap in the argument, yes, and thanks for pointing it out. But not a flaw. In fact, if the Di are independent, so are the −Di. For the case n = 2, that will be part of the next exercise.


Exercise 6.6 (2) (with solution) Show that the following four conditions are all equivalent (modulo any given probability function): A and B are independent, A and −B are independent, −A and B are independent, −A and −B are independent.

Solution It suffices to show the equivalence of the first to the second. For, given that, we can reapply it to get the equivalence of the first to the third by using symmetry on the first. Then we get the equivalence of the third to the fourth by reapplying it again. So, suppose p(A∩B) = p(A)·p(B); we want to show that p(A∩−B) = p(A)·p(−B). Now p(A) = p(A∩B) + p(A∩−B) = p(A)·p(B) + p(A∩−B). Hence, p(A∩−B) = p(A) − p(A)·p(B) = p(A) − [p(A)·(1 − p(−B))] = p(A) − [p(A) − p(A)·p(−B)] = p(A)·p(−B), as desired.

The usefulness of independence for making calculations should not, however, blind us to the fact that in real life it holds only exceptionally and should not be assumed without good reason. Examples abound. In one from recent UK legal history, an expert witness cited the accepted (low) figure for the probability of an infant cot death in a family, and obtained the probability of two successive such deaths simply by squaring the figure, which of course was very low indeed. This is legitimate only if we can safely assume that the probabilities of the two deaths are independent, which is highly questionable.

Exercise 6.6 (3)
(a) Two archers A1, A2 shoot at a target. The probability that A1 gets a bull's-eye is 0.2, the probability that A2 gets a bull's-eye is 0.3. Assuming independence, what are: (i) the probability that they both get a bull's-eye? (ii) That neither gets a bull's-eye? (iii) Assuming also independence for successive tries, that neither gets a bull's-eye in ten successive attempts each?

(b) Can an event ever be independent of itself?
(c) Is the relation of independence transitive?

6.7 Bayes’ Theorem

We have emphasized that conditional probability does not commute: in general, p(A|B) ≠ p(B|A). But there are situations in which we can calculate the value of one from the other provided we are given some supplementary information. In this section, we see how this is done.

Assume that p(A), p(B) are non-zero. By definition, p(A|B) = p(A∩B)/p(B), so p(A∩B) = p(A|B)·p(B). But likewise p(B∩A) = p(B|A)·p(A). Since A∩B = B∩A this gives us p(A|B)·p(B) = p(B|A)·p(A) and so finally:

p(A|B) = p(B|A)·p(A)/p(B).

Thus, we can calculate the conditional probability in one direction if we know it in the other direction and also know the two unconditional (also known as prior, or base rate) probabilities. Indeed, we can go further. Since p(B) = p(B|A)·p(A) + p(B|−A)·p(−A) when p(A) and p(−A) are non-zero, as shown in the next exercise, we can substitute in the denominator to get the following equation, which is a special case of what is known as Bayes' theorem:

Assuming that p(A), p(−A), p(B) are non-zero, so that p(A|B), p(B|A) and p(B|−A) are well defined, p(A|B) = p(B|A)·p(A)/[p(B|A)·p(A) + p(B|−A)·p(−A)].

So we can calculate the probability p(A|B) provided we know both of the conditional probabilities p(B|A) and p(B|−A) in the reverse direction, and the unconditional probability p(A), from which we can of course get p(−A).
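The two-cell version is short enough to write as a Python function; the following is a minimal sketch, with argument names of our own choosing, and the example figures are those of the HIV test exercise later in this section.

def bayes(p_A, p_B_given_A, p_B_given_notA):
    """Return p(A|B), assuming p(A), p(-A) and p(B) are all non-zero."""
    p_notA = 1 - p_A
    p_B = p_B_given_A * p_A + p_B_given_notA * p_notA   # total probability of B
    return p_B_given_A * p_A / p_B

# Example: p(V) = 0.15, p(P|V) = 0.9, p(P|-V) = 0.03 gives p(V|P) ≈ 0.841
print(round(bayes(0.15, 0.9, 0.03), 3))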

Exercise 6.7 (1)
(a) Verify the claim made above, that p(B) = p(B|A)·p(A) + p(B|−A)·p(−A) when p(A) and p(−A) are non-zero. Hint: Use the fact that B = (B∩A)∪(B∩−A).
(b) When p(B) = 0 then p(A|B) is not defined, but when p(B) ≠ 0 then p(A|B) is defined even if p(A) = 0 or p(−A) = 0. In these two limiting cases, we can still calculate the value of p(A|B), not by Bayes' theorem but quite directly. What is the value in those cases?

Bayes’ theorem is named after the eighteenth-century clergyman-mathematicianwho first noted it, and is surprisingly useful (also, as we will see, abusable). For itcan often happen that we have no information permitting us to determine p(AjB)directly, but do have statistical information, that is, records of frequencies, allowingus to give figures for p(BjA) and for p(Bj � A), together with some kind of estimateof p(A). With that information, Bayes’ theorem tells us how to calculate p(AjB).

However, applications of Bayes' theorem can be controversial. The figures coming out of the calculation are no better than those going into it: as the saying goes, garbage in, garbage out. In practice, it often happens that we have good statistics for estimating p(B|A), perhaps less reliable ones for p(B|−A), and only vague theoretical reasons for guessing approximately at p(A). A good critical sense is needed to avoid being led into fantasy by the opportunity for quick calculation.

Indeed, probability theorists and statisticians tend to be divided into what are called Bayesians and their opponents. The difference is one of attitude or philosophy rather than mathematics. Bayesians tend to be happy with applying Bayes' theorem to get a value for p(A|B) even in contexts where available statistics give us no serious indication of the value of the 'prior' p(A). They are willing to rely on non-statistical reasons for its estimation or are confident that possible errors deriving from this source can be 'washed out' in the course of successive conditionalizations as new information comes in. Their opponents tend to be much more cautious, unwilling to assume a probability that does not have a serious statistical backing. As one would expect, subjectivists tend to be Bayesian, while frequentists do not.

It is beyond the scope of this book to enter into the debate, although the reader can presumably guess where the author's sympathies lie. And it should be remembered that whatever position one takes in the debate on applications, Bayes' theorem as a mathematical result, stated above, remains a provable fact. The disagreement is about the conditions under which it may reasonably be applied.


Exercise 6.7 (2) (with solution) Fifteen percent of patients in a ward suffer from the HIV virus. Further, of those that have the virus, about 90% react positively on a certain test, whereas only 3% of those lacking the virus react positively. (i) Find the probability that an arbitrarily chosen patient has the virus given that the test result is positive. (ii) What would that probability be if 60% of the patients in the ward suffered from the HIV virus, the other data remaining unchanged?

Solution First, we translate the data and goal into mathematical language. The sample set S is the set of all patients in the ward. We write V for the set of those with the HIV virus and P for the set of those who react positively to the test. The phrase 'arbitrarily chosen' indicates that we may use the percentages as probabilities. We are given the probabilities p(V) = 0.15, p(P|V) = 0.9, p(P|−V) = 0.03. For (i), we are asked to find the value of p(V|P). Bayes' theorem tells us that p(V|P) = p(P|V)·p(V)/[p(P|V)·p(V) + p(P|−V)·p(−V)]. The rest is calculation: p(V|P) = 0.9·0.15/(0.9·0.15 + 0.03·0.85) = 0.841 to the first three decimal places. For (ii), we are given p(V) = 0.6 with the other data unchanged, so this time p(V|P) = 0.9·0.6/(0.9·0.6 + 0.03·0.4) = 0.978 to the first three decimal places.

Notice how, in this exercise, the solution depends on the value of the unconditional probability (alias base rate, or prior) p(V). Other things being equal, a higher prior gives a higher posterior probability: as p(V) moved from 0.15 to 0.6, p(V|P) moved from 0.841 to 0.978. This phenomenon holds quite generally.

Bayes’ theorem can be stated in a more general manner. Clearly, the pair A,�Agives us a two-cell partition of the sample space S (with complementation beingtaken as relative to that space), and it is natural to consider more generally anypartition into n cells A1, : : : ,An. Essentially the same argument as in the two-cellcase then gives us the following general version of Bayes’ theorem. It should becommitted to memory. Consider any event B and partition fA1, : : : ,Ang of the samplespace such that B and each Ai has non-zero probability. Then for each i � n we have:

p(Ai|B) = p(B|Ai)·p(Ai) / Σj≤n[p(B|Aj)·p(Aj)].
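The general version is just as mechanical to compute. Below is an illustrative Python sketch of our own: given the priors p(A1), …, p(An) over a partition and the likelihoods p(B|A1), …, p(B|An), it returns all the posteriors p(Ai|B). The toy numbers in the usage line are made up purely for illustration.

def bayes_partition(priors, likelihoods):
    """Posteriors p(Ai|B) from priors p(Ai) and likelihoods p(B|Ai) over a partition."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_B = sum(joint)                 # total probability of B over the partition
    return [j / p_B for j in joint]

print(bayes_partition([0.5, 0.3, 0.2], [0.1, 0.4, 0.5]))   # hypothetical figures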

Exercise 6.7 (3) A telephone helpline has three operators, Alfred, Betty, Clarice. They receive 33%, 37%, 30% of the calls, respectively. They manage to solve the problems of 60%, 70%, 90% of their respective calls. What is the probability that a successfully handled problem was dealt with by Alfred, Betty, Clarice, respectively? Hint: First translate into mathematical language and then use Bayes' theorem with n = 3.

6.8 Expectation*

Let S be any finite set, and let f: S → R be a function associating a magnitude, positive or negative, with each of the elements of S. For example, S may consist of runners in a horse race, and f may give the amount of money that will be won or lost, under some system of bets, when a horse wins. Or S could consist of the outcomes of throwing a die twice, with f specifying, say, the sum of the numbers appearing in the two throws. All we need to assume about f is that it takes elements of S into the reals. It need not be surjective; in applications, the range may often be quite small. When the points in f(S) are thought of as representing the 'worth' of elements of S, in money terms or otherwise, then we call f: S → R a payoff function.

A terminological digression: Payoff functions are often called random variables and written X: S → R, particularly in texts covering the infinite as well as the finite case. The range of the function, which we are writing as f(S) in set-theoretic notation, is then called its range space and written RX. Once again, this is a traditional topic-specific terminology, and the reader should be able to translate it into universal set-theoretic language. From a modern point of view, it can be quite misleading, since a 'random variable' is not a variable in the syntactic sense of a letter used to range over a domain, but rather a function. Nor is it necessarily random in the sense of being associated with an equiprobable distribution. End of digression.

Now suppose that we also have a probability distribution p: S → [0,1] on S (not necessarily equiprobable). Is there any way in which we can bring the payoff function and the probability to work together, to give the 'expected worth' of each element of S? The natural idea is to weigh one by the other. A low payoff for an s ∈ S with a high probability may be as desirable as a large payoff for an s′ with small probability. We may consider the probability-weighted payoff (or payoff-weighted probability) of an element s ∈ S to be the product f(s)·p(s) of its worth by its probability. This is also known as the expected value of s. For example, when S consists of the horses in a race, p(s) is the probability that horse s will win (as determined from, say, its track record) and f(s) is the amount that you stand to gain or lose from a bet if s wins, then f(s)·p(s) is the gain or loss weighted by its probability.

This in turn leads naturally to the concept of expected value or, more briefly, expectation. It is a probability-weighted average for the payoff function itself. We introduce it through the same example of a three-horse race. Let S = {a,b,c} be the runners, with probabilities of winning p(a) = 0.1, p(b) = 0.3, p(c) = 0.6. Suppose the payoff function puts f(a) = 12, f(b) = −1, f(c) = −1. Then the expectation of f given p is the sum f(a)·p(a) + f(b)·p(b) + f(c)·p(c) = 12·(0.1) − 1·(0.3) − 1·(0.6) = 0.3.

In general terms, given a payoff function f: S → R and a probability distribution p: S → [0,1] we define the expectation of f given p, written μ(f, p) or just μ, by the equation μ = μ(f, p) = Σs∈S{f(s)·p(s)}. Equivalently, in a form that can be useful when f(S) is small and the probability function p⁺: P(S) → [0,1] determined by the distribution p: S → [0,1] is independently known, we have μ(f,p) = Σx∈f(S){x·p⁺(f⁻¹(x))}.
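Both forms of the definition are easy to compute on the horse-race example; the following Python sketch does so (results rounded, since floating-point sums are not exact).

S = ['a', 'b', 'c']
p = {'a': 0.1, 'b': 0.3, 'c': 0.6}     # probability distribution on S
f = {'a': 12, 'b': -1, 'c': -1}        # payoff function

mu = sum(f[s] * p[s] for s in S)       # first form of the definition

# Second form: sum over the values x in f(S), each weighted by p+(f^-1(x))
mu2 = sum(x * sum(p[s] for s in S if f[s] == x) for x in set(f.values()))

print(round(mu, 3), round(mu2, 3))     # 0.3 0.3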

Note that to calculate the expectation we need to know both the payoff function and the probability distribution. Neither alone suffices. The pair (f,p) consisting of the two is sometimes referred to as a gamble. We can thus speak of the expectation of a gamble.


Exercise 6.8
(a) In the horse race example of the text, compare the expectation of f given p with the arithmetic mean (alias average) of values of f, calculated (without reference to probability) by the usual formula mean(f) = Σs∈S{f(s)}/#(S). Explain the basic reason for the difference.

(b) Show that in the special case that p: S → [0,1] is an equiprobable distribution, expectation equals mean.

(c) Consider a pair of fair dice about to be thrown. We want to determine the expected value of the sum of the two outcomes. First, specify the sample space, the payoff function and the probability distribution. Use the preceding part of this exercise to calculate the expectation without reference to the probability function.

(d) This part is for the mathematically inclined. Show that the two definitions of the expectation of f given p are indeed equivalent.

When the expectation of a gamble comes out negative, then it is not a good bet to make – one is likely to lose money. When it is positive, the bet is favourable, provided of course the payoff function correctly records all your costs (perhaps including, as well as the actual outlay, taxes on earnings, time spent and opportunities foregone, stress) and benefits (perhaps including, as well as money won, the excitement and enjoyment given by the affair). In our horse race example, if those auxiliary costs and benefits are zero, then μ = 0.3, and it is a good bet to make.

Alice Box: Really a Good Bet?

Alice: Not so fast! If I bet only once, then there are only two possibilities: either I win or I lose. The probability of winning is very low (0.1), while the probability of losing is correspondingly high (0.9). So, even though the benefit of winning is high at $12 and the cost of losing is low at $1 loss, I am much more likely to lose than to win. Not very attractive!

Hatter: If you bet only once, yes. But if you make the same bet many times, then it becomes highly probable that the outcome is close to the expected value. We will not show that here, but it can be proven. In that case, the bet becomes attractive.

Alice: Provided I have enough capital to be able to put up with some probable initial losses without undue hardship …

As usual, Alice has a good point. If there is a great disparity between the probabilities of winning and losing, then we may be likely to bankrupt ourselves while waiting for the lucky day. One might try to factor this into the payoff function, but this would be complex and perhaps rather artificial. Once again, we see that, while the mathematics of probability theory is quite straightforward – at least in the finite case, which we are dealing with here – its application can remain very tricky.

In the alternative characterization of expectation, we made use of the function p⁺∘f⁻¹: f(S) → [0,1], that is, the composition of the inverse of the payoff function and the probability function. We end by observing that this function turns out to be a probability distribution on f(S). Thus, in turn, it determines another probability function (p⁺∘f⁻¹)⁺: P(f(S)) → [0,1].

In this chapter, we have only scratched the surface of even finite probability theory. For example, as well as expectation μ, statisticians need measures of the dispersion of items around the expectation point, understood as the degree to which they bunch up around it or spread far out on each side. The first steps in that direction are the concepts of variance and standard deviation. We will not discuss them in this brief chapter; there are pointers to reading below.

End-of-Chapter Exercises

Exercise 6.1: Finite probability spaces
(a) Consider a loaded die, such that the numbers 1 through 5 are equally likely to appear, but 6 is twice as likely as any of the others. To what probability distribution does this correspond?

(b) Consider a loaded die in which the odd numbers are equally likely, the even numbers are also equally likely, but are three times more likely than the odd ones. What is the probability of getting a prime number?

(c) Consider a sample space S with n elements. How many probability distributions are there on this space in which p(s) ∈ {0,1} for all s ∈ S? For such a probability distribution, formulate a criterion for p(A) = 1, where A ⊆ S.

(d) Show by induction on n that when A1, …, An are pairwise disjoint, then p(A1∪…∪An) = p(A1) + ⋯ + p(An).

Exercise 6.2: Unconditional probabilities
(a) Consider the probability distribution in part (a) of the preceding exercise. Suppose that you roll the die three times. What is the probability that you get at least one 6?

(b) You have a loaded die with probability distribution like that in part (a) of the preceding exercise, and your friend has one with probability distribution like that in part (b). You both roll. What is the probability that you both get the same number?

(c) John has to take a multiple-choice examination. There are ten questions, each to be answered 'yes' or 'no'. Marks are counted simply by the number of correct answers, and the pass mark is 5. Since he has not studied at all, John decides to answer the questions randomly. What is the probability that he passes?


(d) Mary is taking another multiple-choice examination, with a different marking system. There are again ten questions, each to be answered 'yes' or 'no', and the pass mark is again 5. But this time marks are counted by taking the number of correct answers and subtracting the number of incorrect ones. If a question is left unanswered, it is treated as incorrect. If Mary answers at random, what is the probability that she passes?

Exercise 6.3: Conditional probability
(a) In a sports club, 70% of the members play football, 30% swim and 20% do both. What are: (i) the probability that a randomly chosen member plays football, given that he/she swims? (ii) the converse probability?

(b) Two archers Alice and Betty shoot an arrow at a target. From their past records, we know that their probabilities of hitting the target are 1/4 and 1/5, respectively. We assume that their performances are independent of each other. If we learn that exactly one of them hit the target, what is the probability that it was Alice?

(c) Show, as claimed in the text, that for any probability function p: P(S) → [0,1], if p(B∩C) ≠ 0 then (pB)C = pB∩C = pC∩B = (pC)B.

(d) Using the above, apply induction to show that any finite series of conditionalizations can be reduced to a single one.

Exercise 6.4: Independence
(a) A fair coin is tossed three times. Consider the events A, B, C that the first toss gives tails, the second toss gives tails, and we get exactly two tails in a row. Specify the sample space, and identify the three events by enumerating their elements. Show that A and B are independent, and likewise A and C are independent, but B and C are not independent.

(b) Use results established in the text to show that A is independent of B iff p(A|B) = p(A|−B).

(c) Show that if A is independent of each of two disjoint sets B, C, then it is independent of B∪C.

(d) Construct an example of a sample space S and three events A, B, C that are pairwise independent but not jointly so.

Exercise 6.5: Bayes’ theorem(a) Suppose that the probability of a person getting flu is 0.3, that the probability of

a person having been vaccinated against flu is 0.4, and that the probability of aperson getting flu given vaccination is 0.2. What is the probability of a personhaving been vaccinated given that he/she has flu?

(b) At any one time, approximately 3% of drivers have a blood alcohol level over the legal limit. About 98% of those over the limit react positively on a breath test, but 7% of those not over the limit also react positively. Find: (i) the probability that an arbitrarily chosen driver is over the limit given that the breath test is positive; (ii) the probability that a driver is not over the limit given that the breath test is negative; (iii) the corresponding results in a neighbouring country where, unfortunately, 20% of drivers are over the limit.

(c) Suppose that we redefine conditional probability in the alternative manner mentioned in Sect. 6.4.1, putting p(A|B) = 1 whenever p(B) = 0. What happens to Bayes' theorem in the limiting cases? Suggestion: First answer this question for the two-cell version of Bayes' theorem, then for the general version.

Exercise 6.6: Expectation*
(a) Consider a box of 10 gadgets, of which 4 are defective. A sample of three items (order immaterial, without replacement, that is, mode O−R− in the notation of the chapter on counting) is drawn at random. What are: (i) the probability that it has exactly one defective item, (ii) the expected number of defective items?

(b) In a television contest, the guest is presented with four envelopes from which to choose at random. They contain $1, $10, $100, $1,000, respectively. Assuming that there are no costs to be taken into consideration, what is the expected gain?

(c) A loaded coin has p(H) = 1/4, p(T) = 3/4. It is tossed three times. Specify the corresponding sample space S. Let f be the payoff function on this sample space that gives the number of heads that appear. Specify f(S). Calculate the expectation μ = μ(f,p).

(d) For the more mathematically inclined: Let S be a finite sample space, f: S → R a payoff function and p: S → [0,1] a probability distribution on S. Show, as claimed in the last paragraph of the chapter, that the function p⁺∘f⁻¹: f(S) → [0,1] is a probability distribution on f(S).

Selected Reading

Introductory texts of discrete mathematics tend to say rather little on probability. However, the following two cover more or less the same ground as this chapter:

Lipschutz S, Lipson M (1997) Discrete mathematics, 2nd edn, Schaum's outline series. McGraw Hill, New York, chapter 7

Rosen K (2007) Discrete mathematics and its applications, 6th edn. McGraw Hill, New York, chapter 6

For those wishing to go further and include the infinite case, the authors of the first text have also written the following:

Lipschutz S, Lipson M (2000) Probability, 2nd edn, Schaum's outline series. McGraw Hill, New York

For all of the above, you may find it useful to use Table 6.1 to keep track of the more traditional terminology and notation that is often used.

Finally, if you had fun with the interlude on Simpson's paradox, you will enjoy the Monty Hall problem. See the Wikipedia entry on it, and if you can't stop thinking about it, continue with, say:

Rosenhouse J (2009) The Monty Hall problem. Oxford University Press, Oxford


7 Squirrel Math: Trees

Abstract This chapter introduces a kind of structure that turns up everywhere in computer science: trees. We will be learning to speak their language – how to talk about their components, varieties and uses – more than proving things about them. The flavour of the chapter is thus rather different from that of the preceding one on probability: more use of spatial intuition, rather less in the way of demonstration.

We begin by looking at trees in their most intuitive form – rooted (alias directed) trees – first of all quite naked and then clothed with labels and finally ordered. Special attention will be given to the case of binary trees and their use in search procedures. Finally, we turn to unrooted (or undirected) trees and their application to span graphs. As always, we remain in the finite case.

7.1 My First Tree

Before giving any definitions, we look at an example. Consider the structure presented in Fig. 7.1, reading it from top to bottom.

Fig. 7.1 Example of a rooted tree

This structure is a rooted tree. It consists of a set T of 15 elements a, …, o, together with a relation over them. The relation is indicated by the arrows. The elements of the set are called nodes (aka vertices); the arrows representing pairs in the relation are called links. We note some features of the structure:
• The relation is acyclic, and so in particular asymmetric and irreflexive (as defined in the chapter on relations).
• There is a distinguished element a of the set, from which all others may be reached by following paths in the direction of the arrows. In the language of Chap. 2, all other elements are in the transitive closure of {a} under the relation. This is called the root of the tree, and it is unique.
• Paths may fork, continue or stop. Thus, from the node a, we may pass to two nodes b and c. These are called the children of a. While a has just two children, d has three (i, j, k), and c has four (e, f, g, h). But b has only one (d), and m has none. Mixing biological with botanical metaphors, nodes without children are called leaves. Any path from the root to a leaf is called a branch of the tree. Note that a branch is an entire path from root to a leaf; thus, neither (b,d,i) nor (a,c,h) nor (a,b,k,n) is a branch of the tree in the figure.
• Paths never meet once they have diverged: we never have two or more arrows going to a node. In other words, each node has at most one parent. An immediate consequence of this and irreflexivity is that the link relation is intransitive.
Given these features, it is clear that in our example we can leave the arrowheads off the links, with the direction of the relation understood implicitly from the layout.

Exercise 7.1 (with partial solution) This exercise concerns the tree in Fig. 7.1 and is intended to sharpen intuitions before we give a formal definition of what a tree is. Feel free to make use of any of the bulleted properties extracted from the figure.
(a) Identify and count all the leaves of the tree.
(b) Why are none of (b,d,i), (a,c,h), (a,b,k,n) branches of the tree?
(c) Identify all the branches of the tree and count them. Compare the result with that for leaves and comment.
(d) The link-length of a branch is understood to be the number of links making it up. How does this relate to the number of nodes in the branch? Find the link-length of each branch in the tree.
(e) Suppose you delete the node m and the link leading into it. Would the resulting structure be a tree? What if you delete the node but leave the link? And if you delete the link but not the node?
(f) Suppose you add a link from m to n. Would the resulting structure be a tree? And if you add one from c to l?
(g) Suppose you add a node p without any links. Would you have a tree? If you add p with a link from b to p? And if you add p but with a link from p to b? One from p to a?


(h) Given the notions of parent and child in a tree, define those of sibling, descendant and ancestor in the natural way. Identify the siblings, descendants and ancestors of the node d in the tree.

Solutions to (a), (b), (e), (g)
(a) There are eight leaves: m, j, n, o, e, f, g, l.
(b) The first does not begin from the root nor reach a leaf. The second does not reach a leaf. The third omits node d from the path that goes from the root to leaf n.
(e) If you delete node m and the arrow to it, the resulting structure is still a tree. But if you delete either alone, it will not be a tree.
(g) In the first case no, because the new node p is not reachable by a path from the root a; nor can it be considered a new root for the enlarged structure because no other node is reachable from it. But in the second case, we would have a tree. In the third case, we would not, as p would not be reachable from a, nor would p be a new root for the enlarged structure since a is not reachable from it. In the fourth case, we do get a tree.

7.2 Rooted Trees

There are two ways of defining the concept of a rooted tree. One is explicit, giving a necessary and sufficient condition for a structure to be a tree. The other is recursive, defining first the smallest rooted tree and then larger ones out of smaller ones. As usual, the recursive approach tends to be better for the computer, though not always for the human. Each way delivers its particular insights, and problems are sometimes more easily solved using one than another. In particular, the recursive approach provides support for inductive proofs.

Whichever definition we use, a limiting-case question poses itself from the very beginning. Do we wish to allow an empty rooted tree, or require that all rooted trees have at least one element? Intuition suggests the latter; after all, if a tree is empty, then it has no root! But it turns out convenient to allow the empty rooted tree as well, particularly when we come to the theory of binary trees. So that is what we will do.

7.2.1 Explicit Definition

We begin with the explicit approach. To get it going, we need the notion of a directed graph or briefly digraph. This is simply any pair (G,R) where G is a set and R is a two-place relation over G. A rooted tree is a special kind of directed graph, but we can zoom in on it immediately, with no special knowledge of general graph theory. The only notion we need that is not already available from Chap. 2 is that of a path. We used the notion informally in the preceding section, but we need a precise definition. If (G,R) is a directed graph, then a path is defined to be any finite sequence a0, …, an (n ≥ 1) of elements of G (not necessarily distinct from each other) such that each pair (ai, ai+1) ∈ R.


Note that in the definition of a path, we require that n ≥ 1; thus, we do not count empty or singleton sequences as paths. Of course, they could also be counted if we wished, but that would tend to complicate formulations down the line, where we usually want to exclude the empty and singleton ones.

Note also that we do not require that all the ai are distinct; thus, (b, b) is a path when (b, b) ∈ R, and (a, b, a, b) is a path when both of (a, b), (b, a) ∈ R. Nevertheless, it will follow from the definition of a tree that paths like these (called loops and cycles) never occur in trees.

A (finite) rooted tree is defined to be any finite directed graph (G, R) such that either (a) G is empty or (b) there is an element a ∈ G (called the root of the tree) such that (i) for every x ∈ G with a ≠ x there is a unique path from a to x but (ii) there is no path from a to a. When a graph (G, R) is a tree, we usually write its carrier set G as T, so that the tree is called (T, R). In this section, we will usually say, for brevity, just tree instead of rooted tree; but when we get to unrooted trees in the last section of the chapter, we will have to be more explicit.
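For readers who like to experiment, here is a rough Python sketch of our own (not from the text) that tests whether a small digraph is a rooted tree. Rather than counting paths directly, it uses an equivalent formulation that is easier to program and which agrees with the properties listed just below: there is exactly one node with no parent, every other node has exactly one parent, and every node is reachable from that parentless node.

def is_rooted_tree(G, R):
    """Test whether the finite digraph (G, R) is a rooted tree (empty tree allowed)."""
    if not G:
        return True
    parents = {x: [u for (u, v) in R if v == x] for x in G}
    roots = [x for x in G if not parents[x]]
    if len(roots) != 1:
        return False
    if any(len(parents[x]) != 1 for x in G if x != roots[0]):
        return False
    reached, frontier = {roots[0]}, [roots[0]]   # iterative reachability check
    while frontier:
        node = frontier.pop()
        for (u, v) in R:
            if u == node and v not in reached:
                reached.add(v)
                frontier.append(v)
    return reached == set(G)

G = {'a', 'b', 'c', 'd'}
print(is_rooted_tree(G, {('a', 'b'), ('a', 'c'), ('b', 'd')}))               # True
print(is_rooted_tree(G, {('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')}))   # False: d has two parents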

From the definition, it is easy to establish some basic properties of trees. Let (T, R) be any tree. Then:
• R is acyclic.
• If T ≠ ∅, then the tree has a unique root.
• If T ≠ ∅, then its root has no parent.
• Every element of T other than the root has exactly one parent.
• No two diverging paths ever meet.

We verify these properties using proof by contradiction each time. The proofs make frequent use of the condition 'unique' in the definition of a rooted tree:
• For acyclicity, recall from Chap. 2 that a relation R is said to be acyclic iff there is no path from any element x to itself, that is, no path x0, …, xn (n ≥ 1) with x0 = xn. Suppose that a tree fails acyclicity; we get a contradiction. By the supposition, there is a path from some element x to itself. So the tree is not empty. If x is the root a itself, this already contradicts the condition that there is no path from a to a; so we may suppose x ≠ a. But by the definition of a tree, there is also a unique path from the root a to x. Form the composite path made up of the path from the root to x followed by the one from x to itself. This is clearly another path from the root to x, and it is distinct from the first path because it is longer. This contradicts uniqueness of the path from a to x.
• For uniqueness of the root, suppose for reductio ad absurdum that a and a′ are distinct elements of T such that for every x ∈ T, if a ≠ x (resp. a′ ≠ x), there is a path from a to x (resp. from a′ to x), but there is no path from a to a (resp. from a′ to a′). From the first supposition, there is a path from a to a′, and by the second, there is a path from a′ to a. Putting these two paths together gives us one from a to a, giving us a contradiction.
• To show that the root has no parent, suppose a ∈ T is the root of the tree, and that it has a parent x. By the definition, there is a path from a to x, so adding the link from x to a gives us a path from a to a, contradicting the definition.

• Let b be any element of T distinct from the root a. By the definition of a tree, there is a path from the root a to b, and clearly, its last link proceeds from a parent of b, so b has at least one parent. Now suppose that b has two distinct parents x and x′. Then there are paths from the root a to each of x and x′, and compounding them with the additional links from x and x′ to b gives us two distinct paths from a to b, contradicting the definition of a tree.

• Finally, if two diverging paths ever meet, then the node where they meet would have two distinct parents, contradicting the preceding property.

The link-height (alias level) of a node in a tree is defined recursively: that of the root is 0 and that of each of the children of a node is one greater than that of the node. The node-height of a node (alias just its height in many texts) is defined by the same recursion, except that the node-height of the root is set at 1. Thus, for every node x, node-height(x) = link-height(x) + 1. As trees are usually drawn upside down, the term 'depth' is often used instead of 'height'. Depending on the problem under consideration, either one of these two measures may be more convenient to use than the other. The height (whether in terms of nodes or links) of a non-empty tree may be defined to be the greatest height (node/link, respectively) of any of its nodes. The height of the empty tree is zero.

These notions are closely related to that of the length of a path in a tree. Formally, the link-length of a path x0, …, xn (n ≥ 1) is n, its node-length is n + 1. Thus, the height of a node (in terms of nodes or links) equals the length (node/link, respectively) of the unique path from the root to that node – except for the case of the root node since we are not counting one-element sequences as paths.
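The recursive definition of link-height translates directly into code. The following Python sketch uses a small example tree of our own (not the tree of Fig. 7.1), represented as a dictionary from each node to its list of children.

children = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}

def link_heights(node, h=0):
    """Link-height of the given node and of everything below it."""
    heights = {node: h}
    for child in children[node]:
        heights.update(link_heights(child, h + 1))
    return heights

heights = link_heights('a')                               # the root has link-height 0
node_heights = {x: h + 1 for x, h in heights.items()}     # node-height = link-height + 1
print(max(heights.values()), max(node_heights.values()))  # heights of this tree: 2 and 3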

Exercise 7.2.1 (with partial solution)
(a) What is the (link/node)-height of the tree in Fig. 7.1?
(b) Consider a set with n ≥ 2 elements. What is the greatest possible (link/node)-height of any tree over that set? And the least possible?
(c) For n = 4, draw rooted trees with n nodes, one tree for each possible height.
(d) We have shown that the relation R of a rooted tree (T, R) is acyclic. Show that this immediately implies its irreflexivity and asymmetry.
(e) In Chap. 4, Sect. 4.5.1, we gave a table of calls that need to be made when calculating F(8), where F is the Fibonacci function. Rewrite this table as a tree, carrying its construction through to the elimination of all occurrences of F.

Solutions to (a), (b) and (d)
(a) The link-height of the tree in Fig. 7.1 is 4, and its node-height is 5.
(b) Let n ≥ 2. The greatest possible (link/node)-height is obtained when the tree consists of a single branch, that is, is a sequence of nodes. The node-length of such a branch is n, and its link-length is n − 1. The least possible measure is obtained when the tree consists of the root with n − 1 children; the link-length is then 1, and its node-length is 2.
(d) Irreflexivity and asymmetry both follow immediately from acyclicity by taking the cases n = 1 and n = 2.


Fig. 7.2 Recursive definition of a tree, adding a new root

7.2.2 Recursive Definition

We pass now to recursive definitions of a rooted tree. There are two main ways of expressing them: the recursion step can bring in a new root or a new leaf. The most common presentation makes a new root, and we give it first.
• The basis of the definition stipulates that (∅, ∅), consisting of the empty set and the empty relation over it, is a rooted tree (although it has no root).
• The recursion step tells us that whenever we have a finite non-empty collection of disjoint trees and a fresh item a, then we can form a new tree by taking a to be its root, linked to the roots of the given trees when the latter are non-empty. Formally, suppose (Ti, Ri) (1 ≤ i ≤ n) are rooted trees. Let B be the set of their roots. Suppose that the Ti are pairwise disjoint and a ∉ ∪{Ti}i≤n. Then the structure (T, S) with T = ∪{Ti}i≤n ∪ {a} and S = ∪{Ri}i≤n ∪ {(a,b): b ∈ B} is a tree with root a.

recursion step may be illustrated by Fig. 7.2.Here, the Ti are represented schematically by triangles. They are the immediate

(proper) subtrees of the entire tree. They may be empty, singletons, or contain morethan one node.
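The 'new root' recursion step can also be sketched in code. The following Python fragment is our own illustration: a tree is represented as a triple (set of nodes, set of links, root), with the empty tree having root None; disjointness of the subtrees and freshness of the new root are assumed rather than checked.

def new_root(a, subtrees):
    """Join pairwise disjoint trees under a fresh root a, as in the recursion step."""
    nodes, links = {a}, set()
    for (T_i, R_i, root_i) in subtrees:
        nodes |= T_i
        links |= R_i
        if root_i is not None:          # an empty subtree contributes no link
            links.add((a, root_i))
    return nodes, links, a

empty = (set(), set(), None)            # the basis: the empty rooted tree
leaf_b = new_root('b', [empty])         # the one-node tree with root b
leaf_c = new_root('c', [empty])
tree = new_root('a', [leaf_b, leaf_c])
print(tree)   # nodes {'a','b','c'}, links {('a','b'), ('a','c')}, root 'a'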

Exercise 7.2.2 (1) (with partial solution)
(a) Use the above recursive definition of a tree to construct five trees of respective node-heights 0, 1, 2, 3, 4, each one obtained from its predecessor by an application of the recursion step.
(b) How many rooted subtrees does a non-empty rooted tree with n ≥ 1 nodes have?

Solution to (b)(b) Clearly, there is a one-one correspondence between the nodes themselves and

the non-empty subtrees with them as roots. This includes the given tree, whichis a subtree of itself. Thus, there are n non-empty rooted subtrees. Counting alsothe empty subtree, this gives us n C 1 rooted subtrees.

We could also define rooted trees recursively by making fresh leaves. The basis is most conveniently formulated with two parts rather than one: The empty tree is a rooted tree, and so too is any singleton T = {a}, R = ∅ with a as root. The recursion step then says: Whenever we have a non-empty rooted tree and a fresh item x, then we can form a new tree by choosing a node b in it and linking b to x. In this case, the root of the new tree remains that of the old one.

This recursive definition corresponds more closely to most people's thoughts when they draw a tree on the page. One usually starts by placing a root, which retains that role throughout the construction, and then one adds new nodes by lengthening or dividing a branch. But the other recursive definition, by 'new root', lends itself more easily to inductive proofs and computations and so is more popular in computer science. We take it as our official definition.

Exercise 7.2.2 (2)
(a) Take the last tree that you constructed in the preceding exercise, and reconstruct it step by step following a 'new leaf' recursion.
(b) Use the 'new leaf' recursive definition to show that any tree with n ≥ 1 nodes has n − 1 links.
(c) Do the same using the 'new root' recursive definition.

7.3 Working with Trees

Now that we have a definition for the notion of a rooted tree, we look at some situations in which trees can come up, and ways of annotating, or one might say, clothing or decorating them.

7.3.1 Trees Grow Everywhere

What are some typical examples of trees in everyday life? The first that springs to mind may be your family tree – but be careful! On the one hand, people have two parents, not one; and diverging branches of descendants can meet (legally as well as biologically) when cousins or more distant relations have children together. But if we make a family tree consisting of the male line only (or the female line only) of descendants of a given patriarch (resp. matriarch), then we do get a tree in the mathematical sense.

The most familiar example of a tree in today's world is given by the structure of folders and files that are available (or can be created) on your personal computer. Usually, these are constrained so as to form a tree. For those working as employees in an office, another familiar example of a tree is the staff organogram, in which the nodes are staff members (or rather their posts) and the links indicate the immediate subordinates of each node in the hierarchy. In a well-constructed pattern of responsibilities, this will often be a tree.

Trees also arise naturally whenever we investigate grammatical structure. In natural languages such as English, this reveals itself when we parse a sentence. In formal languages of computer science, mathematics and logic, parsing trees arise when we consider the structure of a formal expression. The syntactic analysis of a declaration in Prolog, for example, may take the form of a tree.

Page 195: UndergraduateTopicsinComputerScienceperso.ens-lyon.fr/.../Sets_Logic_and_Maths_for_Computing-Makinson… · the basic concepts of logic requires some familiarity with the mathematical

172 7 Squirrel Math: Trees

In logic, trees also arise in other ways. A proof or derivation can be represented as a tree, with the conclusion as root and the assumptions as leaves. And one popular way of checking whether a formula of propositional logic is a tautology proceeds by constructing what is called its semantic decomposition tree.

We will give some examples of parsing trees shortly. Proof trees and semantic decomposition trees will be described in the following chapters on logic. But first, we should introduce some refinements into our theory: labelled and ordered trees.

Alice Box: Which Way Up?

Alice: Before going on, in Fig. 7.1 why did you draw the tree upside down, with the root at the top?
Hatter: Just a convention.
Alice: Is that all?
Hatter: Well, that's the short answer. Let's get on to labels.

A bit too short! In fact, we sometimes draw trees upside down, and sometimes the right way up. The orientation depends, in part, on how we are constructing the tree. If we are thinking of it as built recursively from the root to the leaves by a 'new leaf' recursion, then it is natural to write the root at the top of the page and work downwards. As that is what humans most commonly do in small examples, diagrams are usually drawn with the root at the top. On the other hand, if we are building the tree by a 'new root' recursion, then it makes sense to put the leaves (which remain leaves throughout the construction) at the top of the page and work down to the root. Both of these modes of presentation show themselves in logic. In the following two chapters, formal derivations will be represented by trees with the leaves (assumptions) at the top and the root (conclusion) below, while semantic decomposition trees for testing validity will be upside down.

7.3.2 Labelled and Ordered Trees

In practice, we rarely work with naked trees. They are almost always clothed or decorated in some way – in the technical jargon, labelled. Indeed, in many applications, the identity of the nodes themselves is of little or no importance; they are thought of as just 'points'; what is important are the labels attached to them.

A labelled tree is a tree (T, R) accompanied by a function λ: T → L (λ and L are for 'label'). L can be any set, and the function λ can also be partial, that is, defined on a subset of T. Given a node x, λ(x) indicates some object or property that is placed at that point in the tree. Thus, heuristically, the nodes are thought of as hooks on which to hang labels. We know that the hooks are there and that they are all distinct from each other, but usually what really interests us are the labels. In a diagram, we write the labels next to the nodes and may even omit to give names to the nodes, leaving them as dots on the page.

It is important to remember that the labelling function need not be injective. For example, in a tree of folders and files in your computer, we may put labels indicating the date of creation or of last modification. Different folders may thus have the same labels. In a graphic representation, dots at different positions in the tree may have the same label associated with them.

A given tree may come with several parallel systems for labelling its nodes. These could, of course, be amalgamated into one complex labelling function whose labels are ordered n-tuples of the component labels, but this is not always of any advantage. It is also sometimes useful to label the links rather than (or in addition to) the nodes. For example, we might want to represent some kind of distance, or cost of passage, between a parent and its children. Sometimes, the decision whether to place the labels on nodes or on links is just a matter of convention and convenience; in other cases, it may make a difference. Remember that trees are tools for use, as much as objects for study, and we have considerable freedom in adapting them to the needs of the task in hand.

An ordered tree is a tree in which the children of each node are put into a linear order and labelled with numbers 1, …, n to record the ordering. In this case, the labelling function is a partial function on the tree (the root is not in its domain) into the positive integers. We give two examples of labelled, ordered trees.

Example 1 Consider the arithmetic expression (5 + (2×x)) − ((y − 3) + (2 + x)). Evidently, it is formed from two subexpressions, 5 + (2×x) and (y − 3) + (2 + x), by inserting a sign for the operation of subtraction (and a pair of brackets). As this operation is not commutative, the order of application is important. We may represent the syntactic structure of the expression by a labelled tree, whose root is the whole expression, which has just two children, labelled by the two immediate subexpressions. Continuing this process until decomposition is no longer possible, we get the following labelled and ordered tree.

Alice Box: Drawing Trees

Alice: Why do we have to regard the expressions written alongside the nodes as labels? Why not just take them to be the names of the nodes? Does it make a difference?
Hatter: It does, because the labelling function need not be injective. In our example, we have two different nodes labelled by the numeral 2. We also have two different nodes labelled by the variable x. They have to be distinguished because they have different roles in the expression and so different places in the tree.


Alice: I see. But that leads me to another question. Where, in the figure, are the names of the nodes?
Hatter: We have not given them. In the diagram they are simply represented by dots. To avoid clutter, we name them only if we really need to.
Alice: One more question. If this is an ordered tree, then the children of each node must be given ordering labels. Where are they?
Hatter: We let the sheet of paper take care of that. In other words, when we draw a diagram for an ordered tree, we follow the convention of writing the children in the required order from left to right on the paper. This is another way of reducing clutter.
Alice: So this tree is different from its mirror image, in which left and right are reversed on the page?
Hatter: Yes, since it is an ordered tree. If it were meant as an unordered tree, they would be the same.

Actually, the tree in Fig. 7.3 can be written more economically. The interior nodes (i.e. the non-leaves) need not be labelled with entire subexpressions; we can just write their principal operations. This gives us the diagram in Fig. 7.4. It does not lose any information, for we know that the operation acts on the two children in the order given. It can be regarded as an alternative presentation of the same tree.

Syntactic decomposition trees such as these are important for evaluating an expression, whether by hand or by computer. Suppose the variables x and y are instantiated to 4 and 7, respectively. The most basic way of evaluating the expression as a whole is by 'climbing' the tree from leaves to root, always evaluating the children of a node before evaluating that node.

Fig. 7.3 Tree for syntactic structure of an arithmetic expression

Fig. 7.4 Syntactic structure with economical labels
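As a small illustration of this bottom-up evaluation, here is a sketch in Python; the nested-tuple representation of the tree with economical labels and the function name are my own assumptions, not the book's notation.

# A sketch of evaluating an expression by 'climbing' its decomposition tree:
# children are always evaluated before their parent.
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

# An interior node is (operation, left subtree, right subtree);
# a leaf is a constant or a variable name.
expr = ('-', ('+', 5, ('*', 2, 'x')),
             ('+', ('-', 'y', 3), ('+', 2, 'x')))

def evaluate(node, env):
    if isinstance(node, tuple):               # interior node: evaluate children first
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env.get(node, node)                # leaf: look up a variable, pass a constant through

print(evaluate(expr, {'x': 4, 'y': 7}))       # (5 + (2*4)) - ((7 - 3) + (2 + 4)) = 3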

Example 2 Syntactic decomposition trees arise not only for arithmetic expressions but for any kind of 'algebraic' expression in mathematics or logic. Take, for example, the formula of propositional logic (p → ¬q) → (¬r ↔ (s ∧ ¬p)). Clearly, the structure of this expression can be represented by a syntactic decomposition tree.

Exercise 7.3.2
(a) Evaluate the expression labelling the root of Fig. 7.3, for the values 4, 7 for x, y, respectively, by writing the values in the diagram alongside the labels.
(b) (i) Draw the syntactic decomposition tree for the formula of Example 2, once with full labelling and again with abbreviated labelling. (ii) What are its node and link heights? (iii) Evaluate the formula for the values p = 1, q = 1, r = 0 and s = 0, writing the values as additional labels next to the nodes on the tree. Reminder: You will need to use the truth tables given in logic boxes in Chap. 1.

7.4 Interlude: Parenthesis-Free Notation

In our arithmetical and logical examples of syntactic decomposition trees, we used brackets. This is necessary to avoid ambiguity. Thus, the expression 5 + 2×x is ambiguous unless brackets are put in (or a convention adopted, as indeed is usually done for multiplication, that it 'binds more strongly than' or 'has priority over' addition). Likewise, the logical expression ¬r ↔ s ∧ ¬p is ambiguous unless parentheses are inserted or analogous conventions employed.

In practice, we frequently adopt priority conventions and also drop brackets whenever different syntactic structures have the same semantic content, as, for example, with x + y + z. Such conventions help prevent visual overload and are part of the endless battle against the tyranny of notation.

For a while in the early twentieth century, a system of dots was used by some logicians (notably Whitehead and Russell in their celebrated Principia Mathematica) to replace brackets. Thus, (p → ¬q) → (¬r ↔ (s ∧ ¬p)) was written as p → ¬q.→:¬r ↔.s ∧ ¬p, where the dot after the q indicates a right bracket (the left one being omitted because it is at the beginning of the formula), the dot before the s marks a left bracket (its right partner omitted because it is at the end) and the two-dot colon marks another left bracket. The number of dots indicates the 'level' of the bracket. However, the system did not catch on and died out.

Is there any systematic way of doing without brackets altogether, without special priority conventions or alternative devices like dots? There is, and it was first noted by the philosopher/logician Łukasiewicz early in the twentieth century. Instead of writing the operation between its arguments, just write it before them! Then, brackets become redundant. For example, the arithmetical expression (5 + (2×x)) − ((y − 3) + (2 + x)) is written as − + 5 × 2 x + − y 3 + 2 x.


To decipher this last sequence, construct its (unique!) decomposition tree from the leaves to the root. The leaves will be labelled by the constants and variables. +2x will label the parent of nodes labelled by 2 and x; −y3 will label the parent of nodes labelled by 3 and y, etc.

This is known as Polish notation, after the nationality of its inventor. There is also reverse Polish notation, where the operation is written after its arguments, and which likewise makes bracketing redundant. The ordinary way of writing expressions is usually called infix notation, as the operation is written in between the arguments. The terms prefix and postfix are often used for Polish and reverse Polish notation, respectively.
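The uniqueness of the decomposition tree is what makes prefix notation easy to process mechanically. The following rough Python sketch (the whitespace-separated token format and the function name are illustrative assumptions of mine) rebuilds the tree for the example above by reading the sequence from left to right:

# A sketch of deciphering prefix (Polish) notation by building the decomposition tree.
BINARY = {'+', '-', '*'}

def parse_prefix(tokens):
    """Consume one expression from the front of the token list and return its tree."""
    tok = tokens.pop(0)
    if tok in BINARY:                         # an operator labels an interior node ...
        left = parse_prefix(tokens)
        right = parse_prefix(tokens)          # ... with exactly two ordered children
        return (tok, left, right)
    return int(tok) if tok.isdigit() else tok   # a constant or variable labels a leaf

# '- + 5 * 2 x + - y 3 + 2 x'  corresponds to  (5 + (2*x)) - ((y - 3) + (2 + x))
tree = parse_prefix('- + 5 * 2 x + - y 3 + 2 x'.split())
print(tree)   # ('-', ('+', 5, ('*', 2, 'x')), ('+', ('-', 'y', 3), ('+', 2, 'x')))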

Exercise 7.4 (with partial solution)
(a) Draw all possible decomposition trees for each of the ambiguous expressions 5 + 2×x and ¬r↔s∧¬p, writing the associated bracketed expression next to each tree.
(b) Draw the whole syntactic decomposition tree for the Polish expression −+5×2x+−y3+2x. Compare it with the tree for (5 + (2×x)) − ((y − 3) + (2 + x)).
(c) Put the propositional formula (p → ¬q) → (¬r ↔ (s ∧ ¬p)) in both Polish (prefix) and reverse Polish (postfix) notation.
(d) Insert a few redundant brackets into the outputs just obtained to make them friendlier to non-Polish readers.

Solutions to (c) and (d)
(c) In prefix notation, it is →→p¬q↔¬r∧s¬p. In postfix notation, it is pq¬→r¬sp¬∧↔→.
(d) →(→p¬q)(↔¬r(∧s¬p)) and (pq¬→)(r¬(sp¬∧)↔)→.

7.5 Binary Trees

Computer science and logic both love binary concepts. In the case of logic, they appear at its very foundations: classical logic is two-valued, with truth tables for the logical connectives defined using functions into the two-element set {0,1}. In computer science, binary concepts ultimately derive from a basic practical fact: a switch can be on or off. This means that an essential role is played by bits, and binary notions are propagated through the whole superstructure. Of all kinds of tree, the most intensively employed in computing is the binary one. So we look at it in more detail.

A binary tree is a rooted tree (possibly empty) in which each node has at most two children, equipped with an ordering that labels each child in the tree with a tag left or right. A peculiarity about the ordering required for binary trees is that even when a node has only one child, that child is also labelled 'left' or 'right'. When drawing the tree on paper, we need not write these two labels explicitly; we can understand them as given by the position of nodes on the page. For this reason, in a binary tree we will never draw an only child vertically below its parent.

Fig. 7.5 Which are binary trees?

To illustrate, of the four trees in Fig. 7.5, the first is not a binary tree, since the child of the root has not been given (explicitly or by orientation on the page) a left or right label. The remaining three are binary trees. The second and third are distinct binary trees, since the second gives a right label to the child of the root, while the third gives it a left label. The third and the fourth are the same when considered simply as binary trees, but they differ in that the fourth has additional (numerical) labels.

What are binary trees used for? They, and more specifically a kind of binary tree known as a binary search tree, are convenient structures on which to record and manipulate data. Surprisingly, they do this more efficiently than linear sequences. There are very efficient algorithms for working with them, in particular, for searching, deleting and inserting items recorded on them. There are also good algorithms for converting lists into binary search trees and back again. Moreover, any finite tree can be reconfigured as a binary search tree in a natural way.

What is a binary search tree? Suppose we have a binary tree and a labelling function λ: T → L into a set L that is linearly ordered by a relation <. The set L may consist of numbers, but not necessarily so; the ordering may, for example, be a lexicographic ordering of expressions of a formal language. For simplicity, we will consider the case that the labelling is a full function (i.e. has the entire set T as domain) and is injective. In this situation, we may speak of a linear ordering < of the nodes themselves, determined by the ordering of the labels. Generalization to non-injective labelling functions is not difficult, but a little more complex.

A binary tree so equipped is called a binary search tree (modulo the labelling function λ and linear relation <) iff each node x is greater than every node in the left subtree of x and is less than every node in the right subtree of x.

Exercise 7.5 (1) (with solution) Which of the four binary trees in Fig. 7.6 are binary search trees, where the order is the usual order between integers?

Solution The four trees differ only in their labelling systems. Tree (a) is not a binary search tree for three distinct reasons: 2 < 4, 10 > 6, 10 > 9. Any one of these would be enough to disqualify it. Tree (b) isn't either, because 5 > 3. Nor is (c) a binary search tree in our sense of the term, since the labelling is not injective. Only (d) is a binary search tree.

Fig. 7.6 Which are binary search trees?

The exercise brings out the fact that in the definition of a binary search tree, we don't merely require that each node is greater than its left child and less than its right child. We require much more: each node must be greater than all nodes in its left subtree and less than all nodes in its right subtree.

Suppose we are given a binary search tree T. How can we search it to find out whether it contains an item x? If the tree is empty, then evidently it does not contain x. If the tree is not empty, the following recursive procedure will do the job:

Compare x with the root a of T.
If x = a, then print yes.
If x < a, then:
    If a has no left child, then print no.
    If a does have a left child, apply the recursion to that child.
If x > a, then:
    If a has no right child, then print no.
    If a does have a right child, apply the recursion to that child.
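The procedure is stated in prose above; one possible direct transcription into Python is sketched below (my own reading, not the author's code). A node is held as a (label, left, right) triple, with None for a missing child or an empty tree; since Fig. 7.6 is not reproduced here, only the part of tree (d) used in the worked solutions below is filled in, and its left subtree is left out as a placeholder.

# A sketch of the search recursion on (label, left, right) triples.
def search(tree, x):
    if tree is None:                  # empty tree, or we fell off a missing child
        print("no")
    else:
        label, left, right = tree
        if x == label:
            print("yes")
        elif x < label:
            search(left, x)
        else:
            search(right, x)

# Root 5 with right child 7, whose children are 6 and 10; 10 has left child 9.
tree_d = (5, None, (7, (6, None, None), (10, (9, None, None), None)))
search(tree_d, 6)    # yes   (5 -> 7 -> 6)
search(tree_d, 8)    # no    (5 -> 7 -> 10 -> 9 -> missing left child)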

Exercise 7.5 (2) (with partial solution)
(a) Execute the algorithm manually to search for 6 in the last tree of Fig. 7.6.
(b) Do the same to search for 8.
(c) For readers familiar with pseudo-code, express the above algorithm using it.


Solutions to (a) and (b)
(a) We compare 6 with the root 5. Since 6 > 5, we compare 6 with the right child 7 of 5. Since 6 < 7, we compare 6 with the left child 6 of 7. Since 6 = 6, we print yes.
(b) We compare 8 with the root 5. Since 8 > 5, we compare 8 with the right child 7 of 5. Since 8 > 7, we compare 8 with the right child 10 of 7. Since 8 < 10, we compare 8 with the left child 9 of 10. Since 8 < 9 and 9 has no left child, we print no.

The interesting thing about this algorithm is its efficiency. The height of the tree will never be more than the length of the corresponding list of items, and usually very much less. At one end of the spectrum, when no node has two children, the node-height of the tree and the length of the list will be the same. At the other end of the spectrum, if every node other than the leaves has two children and all branches are the same length, then the non-empty tree will have Σ{2^i: 0 ≤ i < h} = 2^h − 1 nodes, where h is the node-height of the tree. A search using the algorithm down a branch of length h will thus usually be very much quicker, and never slower, than one plodding through a list of all 2^h − 1 nodes.

The algorithm described above can be refined. There are also algorithms to insert nodes into a binary search tree and to delete nodes from it, in each case ensuring that the resulting structure is still a binary search tree. The one for insertion is quite simple, as it always inserts the new node as a leaf. That for deletion is rather more complex, as it deletes from anywhere in the tree, leaves or interior nodes. We sketch the insertion algorithm, omitting the deletion one.

To insert an item x, we begin by searching for it using the search algorithm. If we find x, we do nothing, since it is already there. If we don't find it, we go to the node y where we learned that x is absent from the tree (it should have been in a left resp. right subtree of y, but y has no left resp. right child). We add x as a child of y with an appropriate left or right label.
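In the same illustrative (label, left, right) representation as before – again a sketch of mine under those assumptions, not the book's code – the insertion step can be written so that it returns the tree with x attached as a new leaf in the right place:

# A sketch of the insertion algorithm; an empty tree is None, and the new item
# always ends up as a leaf, exactly as described above.
def insert(tree, x):
    if tree is None:
        return (x, None, None)                   # attach x where the search failed
    label, left, right = tree
    if x == label:
        return tree                              # already present: do nothing
    if x < label:
        return (label, insert(left, x), right)   # x belongs in the left subtree
    return (label, left, insert(right, x))       # x belongs in the right subtree

# For instance, inserting 8 below a leaf 9 adds it as the left child of 9.
print(insert((9, None, None), 8))                # (9, (8, None, None), None)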

Exercise 7.5 (3) (with solution) Consider again the binary search tree in Fig. 7.6. What is the tree that results from inserting the node 8 by means of the algorithm sketched above?

Solution Searching for 8 in the tree, we reach node 9 and note that 8 should be in its left subtree, but that 9 is a leaf. So we add 8 as left child of 9.

We can also iterate the insertion algorithm to construct a binary search tree from a list. Take, for example, the list l of nine words: this, is, how, to, construct, a, binary, search, tree, and suppose that our ordering relation < for the tree construction is the lexicographic one (dictionary order). We begin with the empty tree and search in it for the first item in the list, the word 'this'. Evidently, it is not there, so we put it as root. We then take the next item in the list, the word 'is', and search for it in the tree so far constructed. We don't find it, but note that 'is' < 'this', and so we add it as a left child of 'this'. We continue in this way until we have completed the tree.
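Iterating the insertion sketch given above reproduces this construction mechanically (again purely illustrative, and reusing the insert function from the previous sketch; Python's built-in string comparison happens to be the lexicographic order assumed here):

# Building the binary search tree for the word list l by repeated insertion.
words = ["this", "is", "how", "to", "construct", "a", "binary", "search", "tree"]
bst = None                    # start from the empty tree
for w in words:
    bst = insert(bst, w)      # 'this' becomes the root, 'is' its left child, and so on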


Exercise 7.5 (4)
(a) Complete the construction of the binary search tree from the word-list l = (this, is, how, to, construct, a, binary, search, tree).
(b) For readers familiar with pseudo-code, express the above algorithm for constructing a binary search tree using it.

Evidently, when constructed by this algorithm, the shape of the tree depends on the order in which the items are listed. For example, if in the above example the same words were presented in the list l′ = (a, binary, construct, how, is, search, this, to, tree), the binary search tree would be a chain going down diagonally to the right, and just as high as the list is long!

Clearly, we would like to keep the height of the tree as low as possible. The optimum is to have a binary tree with two children for every node other than the leaves, and all branches of the same length. This is possible only when the list to be encoded has length 2^h − 1 for some h ≥ 0. Nevertheless, for any value of h ≥ 0, it is possible to construct a complete binary search tree in which no branch is more than one node longer than any other and where the longer branches are all to the left of the shorter ones. Moreover, there are good algorithms for carrying out this construction, although we will not describe them here.

7.6 Unrooted Trees

We now go back to the rooted tree at the beginning of this chapter in Fig. 7.1 and play around with it. Imagine that instead of the figure made of dots and lines on paper, we have a physical model, made of beads connected with pieces of string or wire. We can then pick up the model, push the root a down, and bring up b, say, so that it becomes the root. Or we can rearrange the beads on a table so that the model loses its tree-like shape and looks more like a road map in which none of the nodes seems to have a special place. When we do this, we are treating the model as an unrooted tree.

7.6.1 Definition

Formally, an unrooted tree (alias undirected tree) may be defined as any structure (T, S) that can be formed out of a rooted tree (T, R) by taking S to be the symmetric closure of R.

Recall the definition of the symmetric closure S of R: it is the least symmetric relation (i.e. the intersection of all symmetric relations) that includes R. Equivalently, and more simply, S = R ∪ R⁻¹. Diagrammatically, it is the relation formed by deleting the arrowheads from the diagram of R (if there were any) and omitting any convention for reading a direction into the links.
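The formulation S = R ∪ R⁻¹ is easy to check on a small example; the relation below is an arbitrary illustration of mine, not one taken from the text:

# Symmetric closure as R ∪ R⁻¹ on a sample link relation.
R = {("a", "b"), ("a", "c"), ("b", "d")}
S = R | {(y, x) for (x, y) in R}          # add the converse of every pair
print(sorted(S))
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'd'), ('c', 'a'), ('d', 'b')]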


Exercise 7.6.1 (1)
(a) Take the rooted tree in Fig. 7.1 and redraw it as an unrooted tree with the nodes scattered haphazardly on the page.
(b) Reroot the unrooted tree that you obtained by drawing it with node m as root.
(c) Check the equivalence of the two definitions of symmetric closure.

In the context of unrooted trees, terminology changes (yes, yet again). Nodes in unrooted trees are usually called vertices. More important, as well as speaking of links, which are ordered pairs (x,y), that is, elements of the relation, we also need to speak of edges (in some texts, arcs), identifying them with the unordered pairs {x,y} such that both (x,y) and (y,x) ∈ S.

Exercise 7.6.1 (2) We observed in Exercise 7.2.2 (2) that a rooted tree (T, R) with n ≥ 1 nodes has exactly n − 1 links. Use this to show that an unrooted tree (T, S) with n ≥ 1 vertices has exactly n − 1 edges.

Alice Box: Unrooted Trees

Alice: That’s a nice, simple definition – provided we are coming to thesubject via rooted trees. But what if I wanted to study unrooted treesbefore the rooted ones? Could I define them in a direct way?

Hatter: No problem. In fact, that is the way most textbooks do it. We will see how shortly, after noting some properties emerging from our present definition.

It appears that the mathematical concept of an unrooted tree was first articulated in the middle of the nineteenth century and popularized by the mathematician Arthur Cayley discussing problems of chemistry in 1857. Graphs were already being used to represent the structure of molecules, with vertices (nodes) representing atoms and (undirected) edges representing bonds. Cayley noticed that the saturated hydrocarbons – that is, the isomers of compounds of the form CnH2n+2 – have a special structure: they are all what we now call unrooted trees. For rooted trees, intuitive uses go back much further. For example, in the third century AD, a neoplatonist philosopher commentating on the logic of Aristotle introduced what became known as the 'tree of Porphyry'.

7.6.2 Properties

Let (T, S) be an unrooted tree, with S the symmetric closure of the link relation R of a rooted tree (T, R). Then it is easy to show that S is connected (over T), that is, for any two distinct elements x, y of T, there is an S-path from x to y, that is, a finite sequence a0, …, an (n ≥ 1) of elements of T such that x = a0, y = an, and each pair (ai, a_{i+1}) ∈ S. The proof is very straightforward. Since (T, R) is a rooted tree, it has a root a. We distinguish three cases. In the case that a = x, by the definition of a rooted tree we have an R-path from x to y, and this is an S-path. In the case that a = y, we have an R-path from y to x and thus, running it in reverse (which is legitimate since S is symmetric), we have an S-path from x to y. Finally, if a ≠ x and a ≠ y, we know that there are R-paths from a to each of x, y considered separately. Reversing the path from a to x, we have an S-path from x to a; and continuing it from a to y (i.e. taking the composition of the two paths), we end up with one from x to y.

Exercise 7.6.2 (1) Draw sample tree diagrams to illustrate the three cases of this proof and the constructions made.

Less obvious is the fact that S is a minimal connected relation over T. In other words, no proper subrelation of S is connected over T; that is, for any pair (x, y) ∈ S, the relation S′ = S\{(x,y)} is not connected over T. To show this, note that either (x, y) ∈ R or (y, x) ∈ R. Consider the former case; the latter is similar. We claim that there is no S′-path from a to y, where a is the root of the rooted tree (T, R). Suppose for reductio ad absurdum that there is such an S′-path, that is, a finite sequence a0, …, an (n ≥ 1) of elements of T such that a0 = a, an = y, and each pair (ai, a_{i+1}) ∈ S′. We may assume without loss of generality that this is a shortest such path, so that in particular never ai = a_{i+2}. Since (a_{n−1}, an) ∈ S′, we have either (a_{n−1}, an) ∈ R or conversely (an, a_{n−1}) ∈ R. But the former is impossible. Reason: since an = y and no node can have two R-parents, we must have a_{n−1} = x, so that (x,y) is the last link in the S′-path, contradicting the fact that (x,y) ∉ S′. Thus, the latter alternative (an, a_{n−1}) ∈ R must hold. But then, by induction from n down to 1, we must have each (a_{i+1}, ai) ∈ R, for otherwise we would have an ai with two distinct R-parents. This gives us an R-path from y to the root a, which is impossible by the definition of a rooted tree.

This is a tricky little proof, partly because it uses one proof by contradiction within another, and partly because it also uses induction and wlog. Students are advised to read it carefully, at least twice.

Exercise 7.6.2 (2)
(a) Draw a diagram to illustrate the above argument.
(b) 'We may assume without loss of generality that this is a shortest such path'. What principle lies behind that remark? Hint: Reread the discussion of wlog proofs at the end of Sect. 4.7.2 of Chap. 4.
(c) 'For otherwise, we would have an ai with two distinct R-parents'. Explain in more detail why.

The second key property of unrooted trees is that they have no cycles that are in a certain sense 'simple'. A cycle is a path whose first and last items are the same. So, unpacking the definition of a path, a cycle of an unrooted tree (T, S) is a sequence a0, …, an (n ≥ 1) of elements of T with each (ai, a_{i+1}) ∈ S and an = a0. A simple cycle is one with no repeated edges, that is, for no i < j < n do we have {ai, a_{i+1}} = {aj, a_{j+1}}. Expressed in terms of the relation R of the underlying rooted tree, the sequence a0, …, an = a0 never repeats or reverses an R-link. It may, however, repeat vertices.

For example, the cycles a,b,c,b,a and a,b,c,e,c,b,a are not simple, since each contains both of (b,c) and (c,b). The cycle a,b,c,e,b,c,a is not simple either, as it repeats the link (b,c). On the other hand, the cycle a,b,c,d,e,c,a is simple, despite the repetition of vertex c.

Exercise 7.6.2 (3)
(a) Take the first-mentioned cycle a,b,c,b,a and add c in second place, that is, after the initial a. Is it simple?
(b) Take the last-mentioned cycle a,b,c,d,e,c,a and drop the node b. Is it simple? What if we drop e?

Clearly, any unrooted tree with more than one vertex is full of cycles: whenever (x,y) ∈ S, then by symmetry (y,x) ∈ S, giving us the cycle x,y,x. The interesting point is that it never has any simple cycles. It can be a bit tricky for students to get this point clear in their minds, for its formulation uses two negations: there are no cycles that contain no repeated edges. To get a better handle on it, we can express it positively: every cycle in (T,S) has at least one repeated edge.

Alice Box: No Simple Cycles

Alice: No simple cycles: in other words, acyclic?
Hatter: Not so fast! Acyclicity, as we have defined it in Sect. 7.2, means that there are no cycles at all, that is, no paths a0, …, an (n ≥ 1) with each (ai, a_{i+1}) ∈ S and an = a0. Here we are requiring less; we require only that there are no simple cycles. And in the present context, that is all we can ask for. As remarked in the text, when S is symmetric it will always contain non-simple cycles, by back-tracking.
Alice: But I think I have seen some other books using the term 'acyclic' in this way.
Hatter: Indeed, some do, but then, of course, they need another term for not having any cycles at all! It would be nice to have a special name for 'no simple cycles', say 'nocyclic', but let's not make a messy terminological landscape even more confusing.
Alice: I hope I don't get mixed up …
Hatter: If you do, go to the positive formulation above: 'no simple cycles' means 'every cycle has a repeated edge'.

To prove the claim, suppose for reductio ad absurdum that (T,S) is an unrooted tree containing a simple S-cycle a0, …, an = a0. We may assume without loss of generality that this is a shortest one, so that in particular ai ≠ aj for distinct i, j except for the end points a0 = an. We distinguish three cases and find a contradiction in each.


Case 1. Suppose (a0, a1) ∈ R. Then for all i with 0 ≤ i < n, we have (ai, a_{i+1}) ∈ R, for otherwise some a_{i+1} would have two distinct R-parents ai and a_{i+2}, which is impossible. Thus, the S-cycle a0, …, an = a0 is in fact an R-cycle, which we know from earlier in this chapter is impossible for the link relation R of a rooted tree.

Case 2. Suppose (an, a_{n−1}) ∈ R. A similar argument shows that this case is also impossible.

Case 3. Neither of the first two cases holds. Then (a1, a0) ∈ R and (a_{n−1}, an) ∈ R. Since a0 = an and no node of a rooted tree can have two distinct parents, this implies that a1 = a_{n−1}. But that gives us a shorter S-cycle a1, …, a_{n−1} = a1, which must also be simple, again giving us a contradiction.

Exercise 7.6.2 (4)
(a) Draw circular diagrams to illustrate the three cases of this proof.
(b) In Case 1, check fully the claim 'which is impossible'.

Indeed, we can go further: when (T,S) is an unrooted tree, then S is maximally without simple cycles. What does this mean? Let (T,S) be an unrooted tree, derived from a rooted tree (T,R). Let x, y be distinct elements of T with (x,y) ∉ S. Then the structure (T, S′), where S′ = S ∪ {(x,y), (y,x)}, contains a simple cycle.

We will not give a full proof of this, just sketching the underlying construction. In the principal case that x, y are both distinct from the root a of (T,R), there are unique R-paths from a to each of them. Take the last node b that is common to these two R-paths and form an S-path from x up to b (using R⁻¹) and then down to y (using R). With (y,x) also in S′, we thus get an S′-cycle from x to x, and it is not difficult to check that this cycle must be simple.

Exercise 7.6.2 (5)
(a) Draw a tree diagram to illustrate the above proof sketch.
(b) Fill in the details of the above proof by (i) covering the cases that either x or y is the root and (ii) in the case that neither is the root, showing that the constructed cycle is simple, as claimed in the text.

Summarizing what we have done so far in this section, we see that unrooted trees (T,S) have a symmetric relation S and, when T is non-empty, satisfy the following six conditions:
• S is the symmetric closure of the link relation of some rooted tree (T,R) (by definition).
• S minimally connects T.
• S connects T and has n − 1 edges, where n = #(T).
• S connects T and has no simple cycles.
• S is maximally without simple cycles.
• S has no simple cycles and has n − 1 edges, where n = #(T).

In fact, it turns out (although we do not prove it here) that these six conditions are mutually equivalent for any non-empty set T and symmetric relation S ⊆ T², so that any one of them could serve as a characterization of unrooted trees. The first bulleted condition defined unrooted trees out of rooted ones; each of the remaining ones answers a question that Alice raised: how can we define unrooted trees directly, without reference to rooted ones?


The equivalence of the six conditions should not be thought of as just a bit of technical wizardry; it has a deep conceptual content. In particular, the first three bullet points tell us that if we connect a set T with a symmetric relation in the most economical way possible then, whether we think of economy in terms of sets of edges (ordered by inclusion) or number of edges, we get an unrooted tree; and conversely.

Alice Box: Explicit Versus Recursive Definitions of Unrooted Trees

Alice: Thanks, this does answer my query. But one question leads to another. Could we also define unrooted trees recursively, as we did for rooted ones?

Hatter: Of course. Indeed, we need only take the recursive definition for rooted trees and tweak it a little. You might like to do it as an exercise.

Exercise 7.6.2 (6)
(a) Give a simple example of a structure (T,S) where T is a set with #(T) = n and S is a symmetric relation over T with n − 1 edges but is not an unrooted tree. Give another example in which S is moreover irreflexive.
(b) Without proof, using only your intuition, try making a suitable recursive definition of unrooted trees for Alice.

Solution to (a)
The simplest example has just two vertices, with an edge from one of the vertices to itself. If we also require irreflexivity, the simplest example has four vertices, with three connected in a simple cycle and the fourth not connected to anything.

7.6.3 Spanning Trees

The second of the above bulleted conditions gives rise to an important practical problem. A symmetric relation S that minimally connects a set T (thus satisfying the condition) is known as a spanning tree for T. Such minimality is valuable, as it helps reduce computation; how can we obtain it algorithmically?

Suppose we are given a set A connected by a symmetric relation S. There may be many S-paths between vertices and many simple cycles. In the limit, every element of A may be related to every one including itself, giving n(n+1)/2 edges, where n is the number of elements of A; in the case of an irreflexive relation, every element of A may be related to every other one, flooding us with n(n−1)/2 edges. That is a lot of information, most of which may be redundant for our purposes. So the question arises: is there an algorithm which, given a set A connected by a symmetric relation S over it, finds a minimal symmetric subrelation S′ ⊆ S that still connects A? In other words, is there an algorithm to find a spanning tree for A?


Clearly, there is a 'top-down' procedure that does the job. We take the given relation S connecting A, and take off edges one by one in such a way as to leave A connected. To begin with, we can get rid of all the edges connecting a vertex with itself, and then we start to carefully remove edges between distinct vertices without damaging connectivity. When we get to a point where it is no longer possible to delete any edge without disconnecting A (which will happen when we have got down to n − 1 edges, where n is the number of vertices), we stop. That leaves us with a minimal symmetric relation S′ ⊆ S connecting A – which is what we are looking for.

But there is also a 'bottom-up' procedure that accomplishes the task. We begin with the empty set of edges and add in edges from S one by one in such a way as never to create a simple cycle. When we get to a point where it is no longer possible to add an edge from S without creating a simple cycle (which will happen when we have got up to n − 1 edges, where n is the number of vertices), we stop. That leaves us with a maximal relation S′ ⊆ S without simple cycles. By two of the equivalent conditions bulleted above, S′ will also be a minimal symmetric relation connecting A – which is what we are looking for.

In general, the 'bottom-up' algorithm is much more efficient than the 'top-down' one for finding a spanning tree, since it is less costly computationally to check whether adding a given edge creates a simple cycle than to check whether a given set is connected by a relation.
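As an illustration of the bottom-up procedure, here is a sketch of mine (the book itself gives no code, and the example graph is invented): it keeps track of which vertices are already connected to each other and accepts an edge only when its endpoints lie in different components, since exactly then does adding it create no simple cycle.

# A sketch of the 'bottom-up' procedure, using a simple union-find forest to
# detect whether a candidate edge would close a simple cycle.
def spanning_tree(vertices, edges):
    parent = {v: v for v in vertices}             # each vertex starts in its own component

    def find(v):                                  # representative of v's component
        while parent[v] != v:
            v = parent[v]
        return v

    tree_edges = []
    for (x, y) in edges:
        rx, ry = find(x), find(y)
        if rx != ry:                              # x and y not yet connected, so the
            parent[rx] = ry                       # edge {x,y} creates no simple cycle
            tree_edges.append((x, y))
    return tree_edges                             # n - 1 edges if the graph is connected

edges = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd'), ('d', 'e')]
print(spanning_tree('abcde', edges))   # [('a', 'b'), ('a', 'c'), ('b', 'd'), ('d', 'e')]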

Exercise 7.6.3 For readers familiar with pseudo-code, express each of the algorithms above using it.

In many practical situations, one needs to go further and consider symmetric relations whose edges have numerical weights attached to them. These weights may represent the distance between vertices, or the time or cost involved in passing from one to the other. In this context, we often want to do something more subtle than minimize the set of edges in a relation connecting the domain; we may wish to minimize total cost, that is, minimize the sum of their weights. In the usual terminology, we want to find a minimal spanning tree for A.

The 'bottom-up' algorithm that we have described for finding a spanning tree can be refined to solve this problem. However, doing so would take us beyond the limits of the present book. The references at the end of the chapter follow the subject further.

End-of-Chapter Exercises

Exercise 7.1: Properties of rooted trees
(a) Show that the relation R of a rooted tree (T,R) is always intransitive, in the sense defined in the chapter on relations. Hint: Use the explicit definition of a rooted tree and argue using proof by contradiction.


(b) Consider the 'new root' definition of a rooted tree. Show that the tree constructed by an application of the recursion step has height (whether node-height or link-height) one larger than the maximal height (in the same sense) of its immediate subtrees.
(c) Do the same with the 'new leaf' definition of a rooted tree.
(d) In the text, we defined the notion of a subtree of a tree, but we did not verify that, when so defined, it is always itself a tree. Check it.

Exercise 7.2: Definitions of rooted trees* Show the equivalence of the three definitions of a rooted tree – explicit, recursive by 'new root' and recursive by 'new leaf'. Remark: This is a challenging exercise, and its answer will have to consist of at least three parts, thus taking some time. Perhaps the simplest strategy is to establish a cycle of implications.

Exercise 7.3: Labelled trees
(a) Construct the syntactic decomposition tree of the arithmetic expression (8 − (7 + x)) + (y3 + (x − 5)).
(b) Construct the syntactic decomposition tree of the arithmetic expression −(−(8 − (7 + x))).
(c) What would be special about the shape of the syntactic decomposition tree of an arithmetic expression formed from two-place arithmetic operations alone?
(d) Draw the mirror image of the tree in Fig. 7.3, with its labels. Write down the arithmetic expression to which it corresponds in both standard and Polish notation.

Exercise 7.4: Binary search trees
(a) Using the construction algorithm given in the text, construct a binary search tree for the list of letters (g, c, b, m, i, h, o, a), where the ordering relation < is the usual alphabetical one.
(b) In the tree you have constructed, trace the steps in a search for the letter i using the search algorithm in the text.
(c) To the tree you have constructed, add the letter f using the insertion algorithm in the text.

Exercise 7.5: Unrooted trees*
Consider the complete (irreflexive) graph (G,R) with five vertices, that is, the graph in which there is an edge between each vertex and every other vertex.
(a) Construct a spanning tree for this graph by the top-down method, showing each step by successive diagrams.
(b) Construct a different spanning tree by the bottom-up method, again showing your steps by successive diagrams.


Selected Reading

Almost all introductory texts of discrete mathematics have a chapter on trees, usually preceded by one on the more general theory of graphs. More often than not, they treat unrooted trees before rooted ones. Of the two books below, both go into considerably more detail than we have done. The first starts from unrooted trees, while the approach of the second is closer to that of this chapter.

Johnsonbaugh R (2009) Discrete mathematics, 7th edn. Pearson, Upper Saddle River, chapter 9
Kolman B et al (2006) Discrete mathematical structures, 6th edn. Pearson, Upper Saddle River, chapter 7


8 Yea and Nay: Propositional Logic

Abstract
We have been using logic on every page of this book – in every proof, verification and informal justification. In the first four chapters, we inserted some 'logic boxes'; they gave just enough to be able to follow what was being done. Now we gather the material of these boxes together and develop their principles. Logic thus emerges as both a tool for reasoning and an object for study.

We begin by explaining different ways of approaching the subject and situating the kind of logic that we will be concerned with, then zooming into a detailed account of classical propositional logic. The basic topics there will be the truth-functional connectives, the family of concepts around tautological implication, the availability of normal forms and unique minimalities for formulae and the use of semantic decomposition trees as a shortcut method for testing logical status.

8.1 What Is Logic?

What we will be studying in this chapter is only one part of logic, but a very basic part. In a general sense, logic may be seen as the study of reasoning or, in more fashionable terms, belief management. It concerns ways in which agents (human or other) may develop and shape their beliefs, by inference, organization and change:
• Inference is the process by which a proposition is accepted on the basis of others, in other words, is considered as justified by them.
• Belief organization is the business of getting whatever we accept into an easily stocked, communicable and exploitable pattern. It is important in computer science for database management but also in mathematics, where it traditionally takes the form of axiomatization.

• Belief change takes place when we decide to abandon something that we had hitherto accepted, a process known to logicians as contraction. It can also take the form of revision, where we accept something that we previously ignored or even rejected, at the same time making sufficient contractions to maintain consistency of the whole. A closely related form of belief change, of particular interest to computer science, is update, where we modify our records to keep up with changes that are taking place in the world. Evidently, this is very close to revision, but there are also subtle differences.

Of all these processes of belief management, the most basic is inference. It reappears as an ingredient within all the others, just as sets reappear in relations, functions, counting, probability and trees. For this reason, introductory logic books usually restrict themselves to inference, leaving other concerns for more advanced work. Even within that sphere, it is customary to look at only one kind of inference, admittedly the most fundamental one, underlying others: deductive inference. Reluctantly, we must do the same.

That this is a real limitation becomes apparent if one reflects on the kind of reasoning carried out in daily life, outside the study and without the aid of pen and paper. Often, it is not so much inference as articulating, marshalling and comparing information and points of view. Even when it takes the form of inference, it is seldom fully deductive. The conclusions one reaches are offered as plausible, reasonable, probable or convincing given the assumptions made, but they are rarely if ever absolutely certain relative to them. There is the possibility, perhaps remote, that even if the assumptions are true, the conclusion might still be false.

But within mathematics and most of computer science, the game is quite different. There we use only fully deductive inference, rendering the conclusions certain given the assumptions made. This is not to say that there is any certainty in the assumptions themselves, nor for that matter in the conclusions. It lies in the link between them: it is impossible for the premises to be true without the conclusion also being so.

That is the only kind of reasoning we will be studying in this chapter. It provides a basis that is needed before trying to tackle uncertain inference, whether that is expressed in qualitative terms or probabilistically, and before analyzing other kinds of belief management. Its study goes back 2,000 years; in its modern form, it began to take shape in the middle of the nineteenth century.

8.2 Truth-Functional Connectives

We begin by looking at some of the ways in which statements, often also called propositions, may be combined. The simplest ways of doing this are by means of the truth-functional connectives 'not', 'and', 'or', 'if', 'iff', etc. We have already introduced each of these in a corresponding logic box in the first few chapters, and you are strongly advised to flip back to those boxes to get up to speed, for we will not repeat everything said there. However, for easy reference, we recall the truth tables, using α, β, …, for arbitrary propositions. The one-place connective 'not' has the following Table 8.1:

Table 8.1 Truth table for negation

α    ¬α
1    0
0    1


Table 8.2 Truth table for familiar two-place connectives

α  β   α∧β  α∨β  α→β  α↔β
1  1    1    1    1    1
1  0    0    1    0    0
0  1    0    1    1    0
0  0    0    0    1    1

Here, 1 is for truth, and 0 is for falsehood. The two-place connectives 'and', 'or', 'if' and 'iff' are grouped together in Table 8.2.

A basic assumption made in these tables is the principle of bivalence: every proposition is either true or false, but not both. In other words, the truth-values of compound propositions may be represented by functions with domain taken to be the set of all propositions of the language under consideration (about which more shortly), into the two-element set {1,0}.

Alice Box: The Principle of Bivalence

Alice: What happens if we relax the principle of bivalence?
Hatter: There are two main ways of going about it. One is to allow that there may be truth-values other than truth and falsehood. This gives rise to the study of what is called many-valued logic.
Alice: And the other?
Hatter: We can allow that the values may not be exclusive, so that a proposition may be both true and false. That can also be accommodated within the first approach by using new values to represent subsets of the old values. For example, we might use four values (the two old ones and two new ones) to represent the four subsets of the two-element set consisting of the classical values 1 and 0.
Alice: Will we be looking at any of these?
Hatter: No. They are all non-classical, and beyond our compass. In any case, classical logic remains the standard base.

It is time to look more closely at the concept of a truth function in two-valued logic. A one-place truth function is simply a function on domain {1,0} into {1,0}. A two-place one is a function on {1,0}² into {1,0}, and so on. This immediately suggests all kinds of questions. How many one-place truth functions are there? How many two-place and, generally, n-place ones? Can every truth function be represented by a truth table? Are the specific truth functions given in the tables above sufficient to represent all of them, or do we need further logical connectives to do so? We can answer these questions quite easily.

Clearly, there are just four one-place truth functions. Each can be represented by a truth table. In the leftmost column, we write the two truth-values 1 and 0 that a proposition α can bear. In the remaining columns, we write the possible values of the functions for those two values of their arguments (Table 8.3):

Table 8.3 The one-place truth functions

α   f1(α)  f2(α)  f3(α)  f4(α)
1     1      1      0      0
0     1      0      1      0

Exercise 8.2 (1) (with solution)
(a) Which of these four truth functions fi corresponds to negation?
(b) Can you express the other three truth functions in terms of connectives with which you are already familiar (¬, ∧, ∨, →, ↔)?

Solution
(a) Obviously, f3.
(b) f2 is the identity function, that is, f2(α) = α, so we don't need to represent it by more than α itself. f1 is the constant function with value 1, and so can be represented as any of α∨¬α, α→α, or α↔α. Finally, f4 is the constant function with value 0 and can be represented by α∧¬α or by the negation of any of those for f1.

Going on to the two-place truth functions, the Cartesian product f1,0g2 evidentlyhas 22 D 4 elements, which may be listed in a table with four rows, as in thetables for the familiar two-place connectives. For each pair (a,b) 2 f1,0g2, there areevidently two possible values for the function, which gives us 42 D 16 columns tofill in, that is, 16 truth functions.

We always write the rows of a truth table in standard order. In a case of atwo-place function with arguments ’,“, we begin with the row (1,1) where both’ and “ are true, and end with the row (0,0) where both are false. The principlefor constructing each row from its predecessor for truth functions of any number ofarguments: take the last 1 in the preceding row, change it to 0 and replace all 0 s toits right to 1. The following exercise may look tedious, but you will learn a lot bycarrying it out in full.

Exercise 8.2 (2)
(a) Write out a table grouping together all of the 16 two-place truth functions, calling them f1, ..., f16. Hints: You will need 16 columns for the sixteen functions, plus 2 on the left for their two arguments. Again, these columns should be written in standard order: begin with the column (1,1,1,1) and proceed to the last column (0,0,0,0), in the manner prescribed in the text.
(b) Identify which of the 16 columns represent truth functions with which you are already familiar. Try to express the others using combinations of the familiar ones.
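If you want to check the table you obtain in part (a) mechanically, the following small Python sketch (added here as an illustration; it is not part of the book's own apparatus) prints the sixteen columns in the standard order just described, using the fact that this order is simply binary counting downwards over the values 1 and 0.

    from itertools import product

    # The four argument rows (1,1), (1,0), (0,1), (0,0), in standard order.
    rows = [(a, b) for a in (1, 0) for b in (1, 0)]

    # The sixteen possible columns over those four rows, in standard order
    # from (1,1,1,1) down to (0,0,0,0); column i is the truth function fi.
    columns = list(product((1, 0), repeat=4))

    print("rows:", rows)
    for i, col in enumerate(columns, start=1):
        print(f"f{i}:", col)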

The familiar truth functions ∧, ∨, →, ↔ are all two-place, while ¬ is one-place. Is there a gap in their expressive power, or can every truth function of two places, indeed of arbitrary n places, be captured using them? Fortunately, there is no gap.

Table 8.4 Sample two-place truth table

    α  β  |  f3(α,β)
    1  1  |    1
    1  0  |    1
    0  1  |    0
    0  0  |    1

It is easy to show that every truth function, of any finite number of places, may be represented using at most the three connectives ¬, ∧, ∨.

To see how, consider the case of the two-place functions. Take an arbitrary two-place truth function fi, say f3, with its table as worked out in Exercise 8.2 (2) (Table 8.4).

Take the rows in which fi(α,β) = 1 – there are three of them in this instance. For each such row, form the conjunction ±α∧±β where ± is empty or negation according as the argument has 1 or 0. This gives us the three conjunctions α∧β (for the first row), α∧¬β (for the second row) and ¬α∧¬β (for the fourth row). Form their disjunction: (α∧β)∨(α∧¬β)∨(¬α∧¬β). Then, this will express the same truth function f3.

Why? By the table for disjunction, it will come out true just when at least one of the three disjuncts is true. But the first disjunct is true in just the first row, the second is true in just the second row and the third is true in just the last row. So the constructed expression has exactly the same truth table as f3, that is, it expresses that truth function.
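The construction just described is entirely mechanical, and the following Python sketch (an added illustration, not part of the text) carries it out for any two-place truth function given by its column of values on the rows (1,1), (1,0), (0,1), (0,0). The letter names p, q are just placeholders chosen here for readability.

    ROWS = [(1, 1), (1, 0), (0, 1), (0, 0)]     # standard order

    def formula_for_column(column, letters=("p", "q")):
        """Build, as a string, a disjunction of conjunctions ±p∧±q covering
        exactly the rows where the given truth function takes value 1.
        Returns None in the limiting case of the constant-0 function."""
        disjuncts = []
        for (a, b), value in zip(ROWS, column):
            if value == 1:
                lits = [letters[0] if a == 1 else "¬" + letters[0],
                        letters[1] if b == 1 else "¬" + letters[1]]
                disjuncts.append("(" + "∧".join(lits) + ")")
        return "∨".join(disjuncts) if disjuncts else None

    # f3 of Table 8.4 has the column (1, 1, 0, 1):
    print(formula_for_column((1, 1, 0, 1)))     # (p∧q)∨(p∧¬q)∨(¬p∧¬q)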

Exercise 8.2 (3)
(a) Of the 16 two-place truth functions, there is exactly one that cannot be expressed in this way. Which is it? Show how we can still represent it using ¬, ∧, ∨.
(b) Draw a truth table for some three-place truth function and express it using ¬, ∧, ∨ in the same way.
(c) Would it have made a difference if we had written exclusive disjunction ⊕ instead of inclusive disjunction ∨ in this construction? Why?

Thus, every truth function can be expressed using only the connectives ¬, ∧, ∨. Are all three really needed, or can we do the job with even fewer connectives? We will return to this question in the next section, after clarifying some fundamental concepts.

8.3 Tautological Relations

A special feature of logic, as compared with most other parts of mathematics, is the very careful attention that it gives to the language in which it is formulated. Whereas the theory of trees, say, is about certain kinds of abstract structure – arbitrary sets equipped with a relation satisfying a certain condition – logic is about the interconnections between certain kinds of language and the structures that may be used to interpret them. A good deal of logic thus looks at the language as well as at the structures in which the language is interpreted. This can take quite some time to get used to, since ordinarily in mathematics we look through the language at the structures alone. It is like learning to look at the window pane as well as at the landscape beyond it.

8.3.1 The Language of Propositional Logic

Our language should be able to express all truth functions. Since we know that the trio ¬, ∧, ∨ are together sufficient for the task, we may as well use them. They are the primitive connectives of the language. We take some set of expressions p1, p2, p3, ..., understood intuitively as ranging over propositions, and call them propositional variables (their most common name) or elementary letters (less common, but our preferred one). This set may be finite or infinite; the usual convention is to take it either as finite but of unspecified cardinality or as countably infinite. As subscripts are a pain, we usually write elementary letters as p, q, r, ..., bringing in the subscripts only when we run short of letters or they are convenient for formulations.

The formulae (in some texts, ‘well-formed formulae’) of our language are expressions that can be obtained recursively from elementary letters by applying connectives. If the chapter on recursion and induction has not been forgotten entirely, it should be clear what this means: the set of formulae is the least set L that contains all the elementary letters and is closed under the connectives, that is, the intersection of all such sets. That is, whenever α, β ∈ L, then so are (¬α), (α∧β) and (α∨β), and only expressions that can be formed from elementary letters in a finite number of such steps are counted as formulae.

In the chapter on trees, we have already seen why at least some brackets are needed in propositional formulae. For example, we need to be able to distinguish (p∧q)∨r from p∧(q∨r) and likewise ¬(p∧q) from (¬p)∧q. We also saw how the brackets may in principle be dispensed with if we adopt a prefix (Polish) or postfix (reverse Polish) notation, and we learned how to draw the syntactic decomposition tree of a formula. We agreed to make reading easier by omitting brackets when this can be done without ambiguity, for example, by omitting the outermost brackets, which we have just done and will continue to do. Finally, we reduce brackets a bit further by using standard grouping conventions, notably for negation, for example, always reading ¬p∧q as (¬p)∧q rather than as ¬(p∧q).

Exercise 8.3.1 (1)
(a) For revision, draw the syntactic decomposition trees of the two formulae (p∧q)∨r and p∧(q∨r) and also write each of them in both Polish and in reverse Polish notation. Likewise for the formulae ¬(p∧q) and (¬p)∧q.
(b) If you did not do Exercise 4.6.2 in the chapter on recursion and induction, do it now. If you did answer it, review your solutions.

With this out of the way, we can get down to more serious business, defining the fundamental concepts of classical propositional logic. An assignment is defined to be a function v: E→{1,0} on the set E of elementary letters of the language into the two-element set {1,0}. Roughly speaking, assignments correspond to left-hand parts of a truth table. By structural induction on formulae, we can easily show that for each assignment v: E→{1,0}, there is a unique function v⁺: L→{1,0}, where L is the set of all formulae, that satisfies the following two conditions:
(1) It agrees with v on E, that is, v⁺(p) = v(p) for every elementary letter p.
(2) It satisfies the truth-table conditions; that is, v⁺(¬α) = 1 iff v⁺(α) = 0, v⁺(α∧β) = 1 iff v⁺(α) = v⁺(β) = 1, and v⁺(α∨β) = 0 iff v⁺(α) = v⁺(β) = 0.
Such a v⁺ is called a valuation of formulae – the valuation determined by the assignment v. For the verification of its existence and uniqueness, we needed to keep the difference between v and v⁺ clear. However, now that the verification is done, in accord with the precept of minimizing notational fuss, we will soon be ‘abusing notation’ by dropping the superscript from v⁺ and writing it too as plain v. Only in contexts where confusion could possibly arise will we put the superscript back in.
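The two conditions on v⁺ amount to a definition by structural recursion, and they can be turned directly into a short program. The Python sketch below is an added illustration (not part of the text): it represents an elementary letter as a string and a compound formula as a nested tuple whose first component names the connective, and computes the valuation determined by an assignment given as a dictionary.

    def evaluate(formula, v):
        """The valuation v+ determined by the assignment v, computed by
        structural recursion following conditions (1) and (2)."""
        if isinstance(formula, str):                  # an elementary letter
            return v[formula]
        op = formula[0]
        if op == 'not':
            return 1 - evaluate(formula[1], v)
        if op == 'and':
            return min(evaluate(formula[1], v), evaluate(formula[2], v))
        if op == 'or':
            return max(evaluate(formula[1], v), evaluate(formula[2], v))
        raise ValueError(f"unknown connective {op!r}")

    # Example: under v(p) = 1, v(q) = 0, the formula (p∧q)∨¬q gets the value 1.
    print(evaluate(('or', ('and', 'p', 'q'), ('not', 'q')), {'p': 1, 'q': 0}))

Note that min and max encode the truth-table conditions: a conjunction gets 1 iff both parts do, and a disjunction gets 0 iff both parts do.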

Exercise 8.3.1 (2) Write out the proof by structural induction that shows the ‘unique extension’ result. Keep the notational difference between v and v⁺ clear for the duration of this proof.

We are interested in a group of notions that may be called the tautologicality concepts. There are four basic ones. Two are relations between formulae: tautological implication and tautological equivalence. The other two are properties of formulae: those of being a tautology and of being a contradiction. They are intimately linked to each other.

8.3.2 Tautological Implication

We begin with the relation of tautological implication, also known as tautological consequence. It is a relation between sets of formulae on the left and individual formulae on the right. Let A be a set of formulae, and β an individual formula. We say that A tautologically implies β, and write A ⊢ β, iff there is no valuation v such that v(α) = 1 for all α ∈ A but v(β) = 0. In other words, for every valuation v, if v(α) = 1 for all α ∈ A, then v(β) = 1. When A is a singleton {α}, this comes to requiring that v(β) = 1 whenever v(α) = 1. To reduce notation, in this singleton case, we write α ⊢ β instead of {α} ⊢ β.

The definition can be expressed in terms of truth tables. If we draw up a truth table that covers all the elementary letters that occur in formulae in A∪{β}, then it is required that every row which has a 1 under each of the α ∈ A also has a 1 under β.

Warning: The symbol ⊢ is not one of the connectives of propositional logic like ¬, ∧, ∨, →, ↔. Whereas p→q is a formula of the language L of propositional logic, p ⊢ q is not. Rather, ⊢ is a symbol that we use when talking about formulae of propositional logic. In handy jargon, we say that it belongs to our metalanguage rather than to the object language. This distinction takes a little getting used to, but it is very important. Neglect can lead to inextricable confusion.

Table 8.5 Some important tautological implications

    Name                           LHS           RHS
    Simplification, ∧−             α∧β           α
                                   α∧β           β
    Conjunction, ∧+                α, β          α∧β
    Disjunction, ∨+                α             α∨β
                                   β             α∨β
    Modus ponens, MP, →−           α, α→β        β
    Modus tollens, MT              ¬β, α→β       ¬α
    Disjunctive syllogism, DS      α∨β, ¬α       β
    Transitivity                   α→β, β→γ      α→γ
    Material implication           β             α→β
                                   ¬α            α→β
    Limiting cases                 γ             β∨¬β
                                   α∧¬α          γ

Table 8.6 Verification for modus tollens

    α  β  |  α→β  ¬β  ¬α
    1  1  |   1    0   0
    1  0  |   0    1   0
    0  1  |   1    0   1
    0  0  |   1    1   1

Table 8.5 gives some of the more important tautological implications. The left column gives their more common names, typically a traditional name followed in some cases by its acronym, in some instances also a modern symbolic name. The Greek letters α, β, γ stand for arbitrary formulae of propositional logic. In each row, the formula or set of formulae marked LHS (left-hand side) tautologically implies the formula marked RHS (right-hand side). In most cases, we have a singleton on the left, but in four rows, we have a two-element set whose elements are separated by a comma. Thus, for example, the premise set for modus ponens is the pair {α, α→β}, and the conclusion is β.

We can check that modus tollens, for example, is a tautological implication by building a truth table (Table 8.6). The four possible truth-value combinations for α, β are given in the left part of the table. The resulting values of α→β, ¬β, ¬α are calculated step by step, from the inside to the outside, more precisely from the leaves of the syntactic decomposition trees for the three formulae to their roots (very short trees in this case). We ask: Is there a row in which the two premises α→β, ¬β get 1 while the conclusion ¬α gets 0? No, so the premises tautologically imply the conclusion.
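This kind of check is easy to mechanize by running through all assignments to the letters involved. The sketch below (an added illustration, not the book's own code) extends the evaluate function given earlier in Sect. 8.3.1 with clauses for → and ↔, and tests whether a finite premise set tautologically implies a conclusion; the final line confirms modus tollens.

    from itertools import product

    def evaluate(formula, v):
        """Valuation under assignment v; connectives: 'not', 'and', 'or',
        'imp' (→) and 'iff' (↔); elementary letters are strings."""
        if isinstance(formula, str):
            return v[formula]
        op, vals = formula[0], [evaluate(a, v) for a in formula[1:]]
        if op == 'not':  return 1 - vals[0]
        if op == 'and':  return min(vals)
        if op == 'or':   return max(vals)
        if op == 'imp':  return max(1 - vals[0], vals[1])
        if op == 'iff':  return 1 if vals[0] == vals[1] else 0
        raise ValueError(op)

    def letters(formula, acc=None):
        """The set of elementary letters occurring in a formula."""
        acc = set() if acc is None else acc
        if isinstance(formula, str):
            acc.add(formula)
        else:
            for sub in formula[1:]:
                letters(sub, acc)
        return acc

    def implies(premises, conclusion):
        """True iff the premises tautologically imply the conclusion: no
        assignment makes every premise 1 and the conclusion 0."""
        letts = sorted(set().union(*(letters(f) for f in premises + [conclusion])))
        for row in product((1, 0), repeat=len(letts)):
            v = dict(zip(letts, row))
            if all(evaluate(p, v) == 1 for p in premises) and evaluate(conclusion, v) == 0:
                return False
        return True

    # Modus tollens: {¬q, p→q} ⊢ ¬p
    print(implies([('not', 'q'), ('imp', 'p', 'q')], ('not', 'p')))   # True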

Exercise 8.3.2
(a) Draw truth tables to check out each of disjunctive syllogism, transitivity and the two material implication consequences.
(b) Check the two limiting cases and also explain in general terms why they hold, even when the formula γ shares no elementary letters with α or β.
(c) Show that the relation of tautological implication is reflexive and transitive, but not symmetric.
(d) Show that the relation of tautological implication is neither antisymmetric nor linear.

Once understood, the entries in Table 8.5 should be committed firmly to memory. That is rather a bore, but it is necessary in order to make progress. It is like learning some basic equalities and inequalities in arithmetic; you need to have them at your fingertips so as to apply them without hesitation.

8.3.3 Tautological Equivalence

In a tautological implication such as α ⊢ α∨β, while the left implies the right, the converse does not hold in general. When p, q are distinct elementary letters, p∨q ⊬ p, although we do have α∨β ⊢ α for some special choices of α, β.

Exercise 8.3.3 (1) Show that p∨q ⊬ p and find formulae α, β such that α∨β ⊢ α.

When each of two formulae tautologically implies the other, we say that the two are tautologically equivalent. That is, α is tautologically equivalent to β, and we write α ⊣⊢ β, iff both α ⊢ β and β ⊢ α. Equivalently: α ⊣⊢ β iff v(α) = v(β) for every valuation v. In terms of truth tables: the two formulae are tautologically equivalent iff, when we draw up a truth table that covers all the elementary letters that occur in them, the column for α comes out exactly the same as the column for β. Once again, the symbol ⊣⊢ does not belong to the object language of propositional logic but is part of our metalanguage.

We give two tables listing the most important tautological equivalences that can be expressed in up to three elementary letters. The first, Table 8.7, contains equivalences using at most the connectives ¬, ∧, ∨, while the second table also deals with → and ↔. They are all very important and should also be committed to memory after being understood.

Exercise 8.3.3 (2)
(a) Draw truth tables to verify one of the de Morgan equivalences, one of the distributions, one of the absorptions, one of the expansions and one of the limiting cases.
(b) What striking syntactic feature do the limiting case equivalences have, alone among all the equivalences in the table?
(c) What syntactic feature is shared by the limiting case equivalences, absorption and expansion, but none of the others?
(d) Show from the definition that tautological equivalence is indeed an equivalence relation.

Table 8.7 Some important tautological equivalences for ¬, ∧, ∨

    Name                         LHS          RHS
    Double negation              α            ¬¬α
    Commutation for ∧            α∧β          β∧α
    Association for ∧            α∧(β∧γ)      (α∧β)∧γ
    Commutation for ∨            α∨β          β∨α
    Association for ∨            α∨(β∨γ)      (α∨β)∨γ
    Distribution of ∧ over ∨     α∧(β∨γ)      (α∧β)∨(α∧γ)
    Distribution of ∨ over ∧     α∨(β∧γ)      (α∨β)∧(α∨γ)
    Absorption                   α            α∧(α∨β)
                                 α            α∨(α∧β)
    Expansion                    α            (α∧β)∨(α∧¬β)
                                 α            (α∨β)∧(α∨¬β)
    de Morgan                    ¬(α∧β)       ¬α∨¬β
                                 ¬(α∨β)       ¬α∧¬β
                                 α∧β          ¬(¬α∨¬β)
                                 α∨β          ¬(¬α∧¬β)
    Limiting cases               α∧¬α         β∧¬β
                                 α∨¬α         β∨¬β

We saw analogues of many of these equivalences in Chap. 1 on sets. For example, the de Morgan principles there took the form of identities between sets, two of them being −(A∩B) = −A∪−B and −(A∪B) = −A∩−B, where − is complementation with respect to some local universe. This is not surprising, since intersection, union and complementation of sets are defined using ‘and’, ‘or’ and ‘not’, respectively. There is thus a systematic correspondence between tautological equivalences and Boolean identities between sets. Similarly, there is a correspondence between tautological implications, such as those in Table 8.5, and rules of inclusion between sets.

The de Morgan equivalences answer a question that we posed at the end of Sect. 8.2 about the connectives needed to be able to express all truth functions. They may all be represented using just the two connectives ¬, ∧, since from them we can get ∨ by the last of the four de Morgan equivalences, and we already know that with that trio, we may obtain all the others. Likewise, the pair ¬, ∨ suffices to express all possible truth functions via the third de Morgan equivalence.

Table 8.8 lists important equivalences that involve → or ↔, in some cases plus connectives from ¬, ∧, ∨.

Exercise 8.3.3 (3)
(a) Use a truth table to verify association for ↔.
(b) Use information from the list of equivalences to show that the pair ¬, → is enough to express all truth functions.
(c) Use information from the list of equivalences to show by structural induction that any formula built using at most the connectives ¬, ↔ is equivalent to one in which all occurrences of ¬ act on elementary letters.
(d) Show using the same resources and similar strategy that any formula α built using at most the connectives ¬, ↔ is equivalent to one of the form ±(α′), where ± may be negation or no connective at all and where α′ contains no occurrences of ¬.

Table 8.8 Some important tautological equivalences using →, ↔

    Name                                             LHS          RHS
    Contraposition                                   α→β          ¬β→¬α
                                                     α→¬β         β→¬α
                                                     ¬α→β         ¬β→α
    Import/export                                    α→(β→γ)      (α∧β)→γ
                                                     α→(β→γ)      β→(α→γ)
    Consequentia mirabilis (miraculous consequence)  α→¬α         ¬α
                                                     ¬α→α         α
    Commutation for ↔                                α↔β          β↔α
    Association for ↔                                α↔(β↔γ)      (α↔β)↔γ
    ¬ through ↔                                      ¬(α↔β)       α↔¬β
    Translations between connectives                 α↔β          (α→β)∧(β→α)
                                                     α↔β          (α∧β)∨(¬α∧¬β)
                                                     α→β          ¬(α∧¬β)
                                                     α→β          ¬α∨β
                                                     α∨β          ¬α→β
    Translations of negations of connectives         ¬(α→β)       α∧¬β
                                                     ¬(α∧β)       α→¬β
                                                     ¬(α↔β)       (α∧¬β)∨(β∧¬α)

A set of truth-functional connectives is said to be functionally complete if all truth functions, of any finite number of places, may be expressed using only connectives from that set. From what we have done so far, we know that the sets {¬,∧,∨}, {∧,¬}, {¬,∨} and {¬,→} are all functionally complete.

Exercise 8.3.3 (4) Is there any two-place truth-functional connective that, taken alone, is functionally complete? Hints: Go through the table for all the 16 two-place truth functions that was constructed in an exercise for Sect. 8.2, use your intuition to pick likely candidates and check that they do the job. For the checking part, first try to express ¬, and then try to express either ∧ or ∨. If you succeed, you are done.

We already know that tautological equivalence is indeed an equivalence relation. We end this subsection by noting that it has two further properties. It is also a congruence relation with respect to every truth-functional connective. That is, whenever α ⊣⊢ α′, then ¬α ⊣⊢ ¬α′; whenever α ⊣⊢ α′ and β ⊣⊢ β′, then α∧β ⊣⊢ α′∧β′ and α∨β ⊣⊢ α′∨β′; and similarly for all other truth-functional connectives. Moreover, it has the replacement property. That is, whenever α ⊣⊢ α′, then if we take a formula γ and replace one or more occurrences of α in γ by α′ to get a formula γ′, then γ ⊣⊢ γ′.

Exercise 8.3.3 (5)
(a) Use structural induction to show that tautological equivalence is a congruence relation.
(b) Do the same to show that it has the replacement property. Remark: The proof is not difficult to see intuitively but can be tricky to formulate neatly.

The concept of tautological equivalence may evidently be ‘lifted’ to a relation between sets of formulae. If A, B are sets of formulae, we say that they are tautologically equivalent, and write A ⊣⊢ B, iff A ⊢ β for all β ∈ B and also B ⊢ α for all α ∈ A. Equivalently: For every valuation v, v(α) = 1 for all α ∈ A iff v(β) = 1 for all β ∈ B.

Exercise 8.3.3 (6) Check that tautological equivalence between sets of formulae, so defined, is an equivalence relation.

8.3.4 Tautologies and Contradictions

Now that we have the relations of tautological implication and equivalence under our belt, the properties of being a tautology, contradiction or contingent are child's play. Let α be any formula.
• We say that α is a tautology iff v(α) = 1 for every valuation v. In other words: iff α comes out with value 1 in every row of its truth table.
• We say that α is a contradiction iff v(α) = 0 for every valuation v. In other words: iff α comes out with value 0 in every row of its truth table.
• We say that α is contingent iff it is neither a tautology nor a contradiction. In other words: iff v(α) = 1 for some valuation v and also v(α) = 0 for some valuation v.
Clearly, every formula is either a tautology, or a contradiction, or contingent, and only one of these. In other words, these three sets partition the set of all formulae into three cells.
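These three definitions again lend themselves to a brute-force check over all assignments. The short sketch below is an added illustration; it assumes the evaluate and letters helpers from the implication sketch in Sect. 8.3.2 above.

    from itertools import product

    def classify(formula):
        """Return 'tautology', 'contradiction' or 'contingent' by inspecting
        the value of the formula under every assignment to its letters."""
        letts = sorted(letters(formula))
        values = {evaluate(formula, dict(zip(letts, row)))
                  for row in product((1, 0), repeat=len(letts))}
        if values == {1}:
            return 'tautology'
        if values == {0}:
            return 'contradiction'
        return 'contingent'

    print(classify(('or', 'p', ('not', 'p'))))       # tautology
    print(classify(('and', 'p', ('not', 'p'))))      # contradiction
    print(classify(('or', 'p', ('not', 'q'))))       # contingent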

Exercise 8.3.4 (1) (with solution) Classify the following formulae as tautologies, contradictions or contingent: (i) p∨¬p, (ii) ¬(p∨¬p), (iii) p∨¬q, (iv) ¬(p∨¬q), (v) (p∧(¬p∨q))→q, (vi) ¬(p∨q)↔(¬p∧¬q), (vii) p∧¬p, (viii) p→¬p, (ix) p↔¬p, (x) (r∧s)∨¬(r∧s), (xi) (r→s)↔¬(r→s).

Solution Tautologies: (i), (v), (vi), (x). Contradictions: (ii), (vii), (ix), (xi). Contingent: (iii), (iv), (viii).

If you went through this little exercise conscientiously, you will already have sensed several general lessons that can be extracted from it.
• A formula is a tautology iff its negation is a contradiction. Example: (i) and (ii).
• A formula is contingent iff its negation is contingent. Example: (iii) and (iv).
• A formula α→β is a tautology iff α ⊢ β. Generally, {α1, ..., αn} ⊢ β iff the formula (α1∧ ... ∧αn)→β is a tautology. Example: formula (v) of the exercise and the tautological implication of disjunctive syllogism.
• A formula α↔β is a tautology iff α ⊣⊢ β. Example: formula (vi) of the exercise and one of the de Morgan equivalences.

Exercise 8.3.4 (2) Verify each of the bulleted points from the definitions.

Examples (x) and (xi) of the penultimate exercise are also instructive. The former tells us that (r∧s)∨¬(r∧s) is a tautology. Without making a truth table, we can see that it must be so since it is merely a substitution instance of the simple tautology p∨¬p. Likewise, (r→s)↔¬(r→s) is a contradiction, being a substitution instance of the contradiction p↔¬p. Quite generally, we have the following principle, which we will prove in a moment:
• Every substitution instance of a tautology or a contradiction is a tautology or contradiction, respectively.
On the other hand, not every substitution instance of a contingent formula is contingent. For example, we saw that p∨¬q is contingent, but its substitution instance p∨¬p (formed by substituting p for q) is a tautology, while another of its substitution instances (p∧¬p)∨¬(q∨¬q), formed by substituting p∧¬p for p and q∨¬q for q, is a contradiction.

This notion of a substitution in propositional logic can be given a very precise mathematical content. A substitution is a function σ: L→L, where L is the set of all formulae, satisfying the following homomorphism conditions:

    σ(¬α) = ¬σ(α)
    σ(α∧β) = σ(α)∧σ(β)
    σ(α∨β) = σ(α)∨σ(β)

Note that in this definition, = is not just tautological equivalence; it is full identity between formulae, that is, the left and right sides stand for the very same formula. It is easy to show, by structural induction on the definition of a formula of propositional logic, that a substitution function is uniquely determined by its values for elementary letters.
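Computationally, the homomorphism conditions say that a substitution is carried out by a single structural recursion driven entirely by its values on elementary letters. The Python sketch below (an added illustration, using the tuple representation of formulae introduced earlier) applies a substitution given as a dictionary on letters; because the recursion only ever replaces letters of the original formula, the substitution is automatically simultaneous rather than serial.

    def substitute(formula, sigma):
        """Apply the substitution whose values on elementary letters are given
        by the dict sigma (letters not mentioned are left unchanged): the
        connective at the top is kept, and the substitution is pushed onto the
        immediate subformulae, exactly as the homomorphism conditions demand."""
        if isinstance(formula, str):                  # an elementary letter
            return sigma.get(formula, formula)
        return (formula[0],) + tuple(substitute(sub, sigma) for sub in formula[1:])

    # With σ(p) = q∧¬r and σ(q) = ¬q, we get σ(p∨¬q) = (q∧¬r)∨¬¬q,
    # as in part (a)(ii) of the next exercise.
    sigma = {'p': ('and', 'q', ('not', 'r')), 'q': ('not', 'q')}
    print(substitute(('or', 'p', ('not', 'q')), sigma))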

Exercise 8.3.4 (3) (with partial solution)
(a) Suppose σ(p) = q∧¬r, σ(q) = ¬q, σ(r) = p→s. Identify the formulae (i) σ(¬p), (ii) σ(p∨¬q), (iii) σ(r∨(q→r)).
(b) Treating α→β and α↔β as abbreviations for ¬(α∧¬β) and (α∧β)∨(¬α∧¬β), respectively, show that substitution functions satisfy analogous homomorphism conditions for them too.

Solution to (a)
(i) σ(¬p) = ¬σ(p) = ¬(q∧¬r), (ii) σ(p∨¬q) = σ(p)∨σ(¬q) = σ(p)∨¬σ(q) = (q∧¬r)∨¬¬q. For (iii), omitting the intermediate calculations, σ(r∨(q→r)) = (p→s)∨(¬q→(p→s)).

Note that substitutions are carried out simultaneously, not serially. For example, in part (a) (ii) of the exercise, σ(p∨¬q) is obtained by simultaneously substituting σ(p) for p and σ(q) for q. If we were to replace serially, first substituting σ(p) for p to get (q∧¬r)∨¬q and then substituting σ(q) for q, we would get the result (¬q∧¬r)∨¬¬q, which differs in the sign on the first occurrence of q.

We are now in a position to prove the claim that every substitution instance of a tautology is a tautology or, as we also say, the set of all tautologies is closed under substitution. Let α be any formula, and σ any substitution function. Suppose that σ(α) is not a tautology. Then there is a valuation v such that v(σ(α)) = 0. Let v′ be the valuation defined on the letters in α by putting v′(p) = v(σ(p)). Then it is easy to verify by structural induction that for every formula β, v′(β) = v(σ(β)). In particular, v′(α) = v(σ(α)) = 0, so that α is not a tautology.

Likewise, the set of all contradictions and the relations of tautological equivalence and tautological implication are closed under substitution. That is, for any substitution function σ, we have the following:
• Whenever α is a contradiction, then σ(α) is too.
• Whenever α ⊣⊢ β, then σ(α) ⊣⊢ σ(β).
• Whenever α ⊢ β, then σ(α) ⊢ σ(β).
• Whenever A ⊢ β, then σ(A) ⊢ σ(β).
Here, A is any set of formulae, and σ(A) is defined, as you would expect from the chapter on functions, as {σ(α): α ∈ A}.

Exercise 8.3.4 (4) Complete the inductive details in the verification that the set of all tautologies is closed under substitution.

8.4 Normal Forms

The formulae of propositional logic may be of any length, depth or level of complexity. It is important to be able to express them in the most transparent possible way. In this section, we will look briefly at four kinds of normal form. The first two focus on the positioning of connectives and show that they may be applied in a neat order. The next pair concerns the interplay of elementary letters; one gets rid of all redundant letters, while the other compartmentalizes their interaction as much as possible.

8.4.1 Disjunctive Normal Form

Normal forms are common in logic, mathematics and computer science. A normal form for an expression is another expression, equivalent (in a suitable sense) to the first, but with a nice simple structure. When the normal form is used a lot, the term canonical form is often employed. A normal form theorem is one telling us that every expression (from some broad category) has a normal form (of some specified kind). In propositional logic, the best-known such form is disjunctive normal form, abbreviated dnf. To explain it, we need the concepts of a literal and of a basic conjunction.

A literal is simply an elementary letter or its negation. A basic conjunction is any conjunction of (one or more) literals in which no letter occurs more than once. Thus, we do not allow repetitions of a letter, as in p∧q∧p, nor repetitions of the negation of a letter, as in ¬p∧q∧¬p, nor a letter occurring both negated and unnegated, as in ¬p∧q∧p. We write a basic conjunction as ±p1∧ ... ∧±pn, where the ± indicates the presence or absence of a negation sign. In the limiting case that n = 1, a basic conjunction is of the form ±p1, without any conjunction sign.

A formula is said to be in disjunctive normal form (dnf) iff it is a disjunction of (one or more) basic conjunctions. Again, in the limiting case that n = 1, a dnf will not contain a disjunction sign. A formula is said to be in full dnf iff it is in dnf, and moreover every letter in it occurs in each of the basic conjunctions.

Exercise 8.4.1 (1) (with solution)
(a) Which of the following are in disjunctive normal form? When your answer is negative, explain briefly why. (i) ((p∧q)∨¬r)∧¬s, (ii) (p∨q)∨(q→r), (iii) (p∧q)∨(¬p∧¬q), (iv) (p∧q)∨(¬p∧¬q∧p), (v) (p∧q)∨(¬p∧¬q∧¬p), (vi) p∧q∧r, (vii) p, (viii) ¬p, (ix) p∨q, (x) p∨¬p, (xi) p∧¬p.
(b) Which of the above are in full dnf?

Solution
(a) (i) No: there is a disjunction inside a conjunction. (ii) No: we have not eliminated →. (iii) Yes. (iv) No: ¬p∧¬q∧p contains two occurrences of p and so is not a basic conjunction. (v) No: ¬p∧¬q∧¬p contains two occurrences of ¬p. (vi) Yes. (vii) Yes. (viii) Yes. (ix) Yes. (x) Yes. (xi) No: p∧¬p has two occurrences of p.
(b) (iii), (vi), (vii), (viii), (x).

How can we find a disjunctive normal form for an arbitrarily given formula α? There are two basic algorithms. One is semantic, proceeding via the truth table for α. The other is syntactic, using successive transformations of α, justified by tautological equivalences from our table of basic equivalences.

The semantic construction is the simpler of the two. In fact, we already made use of it in Sect. 8.3 when we showed that the trio {¬,∧,∨} of connectives is functionally complete. We begin by drawing up the truth table for α and checking whether it is a contradiction. Then:
• In the principal case that α is not a contradiction, the dnf of α is the disjunction of the basic conjunctions that correspond to rows of the table in which α receives value 1.
• In the limiting case that α is a contradiction, that is, when there are no such rows, the formula does not have a dnf.
It is clear from the construction that every non-contradictory formula has a disjunctive normal form. It is also clear that the dnf obtained is unique up to the ordering and bracketing of literals and basic conjuncts. Moreover, it is clearly full.

Exercise 8.4.1 (2) Find the full disjunctive normal form (if it exists) for each of the following formulae, using the above truth-table algorithm: (i) p↔q, (ii) p→(q∨r), (iii) ¬[(p∧q)→¬(r∨s)], (iv) p→(q→p), (v) (p∨¬q)→(r∧¬s), (vi) (p∨¬p)→(r∧¬r).

For the syntactic method, we start with the formula α and proceed by a series of transformations that massage it into the desired shape. The basic idea is fairly simple, although the details are rather fussy. The steps of the algorithm should be executed in the order given:
• Translate the connectives → and ↔ (and any others in the formula, such as exclusive disjunction) into ¬, ∧, ∨ using translation equivalences such as those in Table 8.8.
• Use the de Morgan rules ¬(α∧β) ⊣⊢ ¬α∨¬β and ¬(α∨β) ⊣⊢ ¬α∧¬β iteratively to move negation signs inwards until they act directly on elementary letters, eliminating double negations as you go by the rule ¬¬α ⊣⊢ α.
• Use the distribution rule α∧(β∨γ) ⊣⊢ (α∧β)∨(α∧γ) to move all conjunctions inside disjunctions.
• Use idempotence, with help from commutation and association, to eliminate repetitions of letters or of their negations.
• Delete all basic conjunctions containing a letter and its negation.
This will give us as output a formula in disjunctive normal form, except when α is a contradiction, in which case it turns out that the output is empty. However, the dnf obtained in this way is rarely full: there may be basic conjunctions with less than the full baggage of letters occurring in them.
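For readers who like to see such transformations mechanized, here is an added Python sketch of the first three steps (translation, moving negations inwards, distribution) on the tuple representation of formulae used earlier; the clean-up steps using idempotence and the deletion of contradictory conjunctions are left aside to keep it short, so the output is a dnf that may still contain repetitions.

    def no_arrows(f):
        """Step 1: translate → and ↔ into ¬, ∧, ∨."""
        if isinstance(f, str):
            return f
        op, args = f[0], [no_arrows(a) for a in f[1:]]
        if op == 'imp':                          # α→β becomes ¬α∨β
            return ('or', ('not', args[0]), args[1])
        if op == 'iff':                          # α↔β becomes (α∧β)∨(¬α∧¬β)
            a, b = args
            return ('or', ('and', a, b), ('and', ('not', a), ('not', b)))
        return (op,) + tuple(args)

    def nnf(f):
        """Step 2: push negations inwards until they sit on letters,
        removing double negations (assumes arrows already translated)."""
        if isinstance(f, str):
            return f
        if f[0] == 'not':
            g = f[1]
            if isinstance(g, str):
                return f                                         # a literal
            if g[0] == 'not':
                return nnf(g[1])                                 # ¬¬α becomes α
            if g[0] == 'and':                                    # de Morgan
                return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
            if g[0] == 'or':                                     # de Morgan
                return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
        return (f[0], nnf(f[1]), nnf(f[2]))

    def distribute(f):
        """Step 3: move conjunctions inside disjunctions (input in nnf)."""
        if isinstance(f, str) or f[0] == 'not':
            return f
        left, right = distribute(f[1]), distribute(f[2])
        if f[0] == 'or':
            return ('or', left, right)
        if isinstance(left, tuple) and left[0] == 'or':
            return ('or', distribute(('and', left[1], right)),
                          distribute(('and', left[2], right)))
        if isinstance(right, tuple) and right[0] == 'or':
            return ('or', distribute(('and', left, right[1])),
                          distribute(('and', left, right[2])))
        return ('and', left, right)

    # ¬(p∧(q∨r)) is transformed into ¬p∨(¬q∧¬r):
    print(distribute(nnf(no_arrows(('not', ('and', 'p', ('or', 'q', 'r')))))))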

Exercise 8.4.1 (3)
(a) Use the syntactic algorithm to transform the formula (p∨¬q)→(r∧¬s) into disjunctive normal form. Show your transformations step by step and compare the result with that obtained by the semantic method in the preceding exercise.
(b) How might you transform the dnf obtained above into a full one? Hint: Make use of the tautological equivalence of expansion α ⊣⊢ (α∧β)∨(α∧¬β).

8.4.2 Conjunctive Normal Form*

Conjunctive normal form is like disjunctive normal form but ‘upside-down’: the roles of disjunction and conjunction are reversed. Technically, they are called duals of each other. A basic disjunction is defined to be any disjunction of (one or more) literals in which no letter occurs more than once. A formula is said to be in conjunctive normal form (cnf) iff it is a conjunction of (one or more) basic disjunctions, and this is said to be a full conjunctive normal form of α iff every letter of α occurs in each of the basic disjunctions.

Cnfs may also be constructed semantically, syntactically or by piggybacking on dnfs. For the semantic construction, we look at the rows of the truth table that give α the value 0. For each such row, we construct a basic disjunction: this will be the disjunction of those letters with the value 0 and the negations of the letters with value 1. We then conjoin these basic disjunctions. This will give an output in conjunctive normal form, except when the initial formula α is a tautology. For the syntactic construction, we proceed just as we do for dnfs, except that we use the other distribution rule α∨(β∧γ) ⊣⊢ (α∨β)∧(α∨γ) to move disjunctions inside conjunctions.

At first sight, this construction may seem a little mysterious. Why do we construct the cnf from the table in the way described? Is there an underlying idea? There is, indeed. Instead of allowing the valuations that make the formula α true, we are excluding the valuations that make α false. To do that, we take each row that gives α the value 0 and declare it not to hold. That is the same as negating the basic conjunction of that row, and by de Morganizing that negation, we get the basic disjunction described in the construction. In effect, the basic disjunction says ‘not in my row, thank you’.

Alice Box: Conjunctive Normal Form

Alice:  OK, but why should we bother with cnfs when we already have dnfs? They are much less intuitive!
Hatter: Indeed, they are. But they have been found useful in a discipline known as logic programming. Such a program may, in the simplest case, be seen as a set or conjunction of positive clauses (p1∧ ... ∧pn)→q with n ≥ 0 (understood as just the letter q in the limiting case that n = 0, and sometimes known as Horn clauses). Such a formula may be expressed as a basic disjunction with just one unnegated letter, (¬p1∨ ... ∨¬pn)∨q (n ≥ 0). A conjunction of such formulae is evidently in cnf.

We can also construct a cnf for a formula α by piggybacking on a dnf for its negation ¬α. Given α, we construct a dnf of its negation ¬α by whichever method you like (via truth tables or by successive transformations). This will be a disjunction β1∨ ... ∨βn of basic conjunctions βi. Negate it, getting ¬(β1∨ ... ∨βn), which is tautologically equivalent to ¬¬α and thus by double negation to our original α. We can then use de Morgan to push all negations inside, getting ¬β1∧ ... ∧¬βn. Each ¬βi is of the form ¬(±p1∧ ... ∧±pn), so we can apply de Morgan again, with double negation as needed, to express it as a disjunction of literals, as desired.
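To illustrate the indirect method with a tiny example added here: take α to be p→q. Its negation ¬(p→q) has the dnf p∧¬q (a single basic conjunction), so α is equivalent to ¬(p∧¬q); de Morgan turns this into ¬p∨¬¬q, and dropping the double negation gives ¬p∨q, a cnf (indeed a single basic disjunction) for p→q, agreeing with the translation α→β ⊣⊢ ¬α∨β of Table 8.8.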

Exercise 8.4.2
(a) Take again the formula (p∨¬q)→(r∧¬s), and find a cnf for it by all three methods: (i) semantic, (ii) successive syntactic transformations, (iii) the indirect method.
(b) Evidently, in most cases, a formula that is in dnf will not be in cnf. But in some limiting cases, it will be in both forms. When can that happen?
(c) Check that ¬p1∨ ... ∨¬pn∨q ⊣⊢ (p1∧ ... ∧pn)→q.

8.4.3 Eliminating Redundant Letters

A formula of propositional logic may contain redundant letters. We have already seen an example in the table of important equivalences: absorption tells us that p∧(p∨q) ⊣⊢ p ⊣⊢ p∨(p∧q); the letter q is thus redundant in each of the two outlying formulae. The expansion equivalences in the same table give another example. So does the equivalence (p→q)∧(¬p→q) ⊣⊢ q, not in the table.

Suppose we expand our language a little to admit a zero-ary connective ⊥ (called bottom or falsum) so that ⊥ is a formula with no elementary letters. We stipulate that it receives the value 0 under every valuation. We also stipulate that σ(⊥) = ⊥ for any substitution function σ. Then, contradictions like p∧¬p and tautologies like p∨¬p will also have redundant letters, since p∧¬p ⊣⊢ ⊥ and p∨¬p ⊣⊢ ¬⊥. For our discussion of redundancy, it will be convenient to assume that our language contains ⊥ as well as the ordinary connectives, as it will streamline formulations.

It is clear that for every formula α containing elementary letters p1, ..., pn (n ≥ 0), there is a minimal set of letters in terms of which α may equivalently be expressed. That is, there is some minimal set Eα of letters such that α ⊣⊢ α′ for some formula α′ all of whose letters are drawn from Eα. This is because α contains only finitely many letters to begin with, and so as we eliminate redundant letters, we must eventually come to a set (perhaps empty) from which no more can be eliminated.

But is this minimal set unique? In other words, is there a least such set – one that is included in every such set? As noted in an exercise at the end of the chapter on relations, minimality does not in general imply leastness. However, in the present context, intuition suggests that surely there should be a least letter-set. And in this case, intuition is right.

The proof is not really difficult. Let α be a formula containing elementary letters p1, ..., pn and let {p1, ..., pm} (m ≤ n) be a minimal letter-set for α, so that there is a formula α′ ⊣⊢ α containing only the letters p1, ..., pm. We want to show that {p1, ..., pm} is a least letter-set for α. Suppose that it is not. Then there is a formula α″ equivalent to α and a letter pi ∈ {p1, ..., pm} such that pi does not occur in α″. We get a contradiction. Substitute the falsum ⊥ for pi, leaving all other letters unchanged. Calling this substitution σ, we have σ(α′) ⊣⊢ σ(α″) since α′ ⊣⊢ α″. Since pi does not occur in α″, we have σ(α″) = α″ ⊣⊢ α, so σ(α′) ⊣⊢ α. But σ(α′) has one less letter than α′, namely, pi (remember, ⊥ is not an elementary letter), contradicting the assumption that the letters in α′ form a minimal letter-set for α.

Thus, every formula has a least letter-set Eα, and it is trivial that this must be unique. Any formula α′ equivalent to α that is built from those letters is known as a least letter-set version of α. The same considerations apply for arbitrary sets of formulae, even when they are infinite: Each set A of formulae has a unique least letter-set and thus also a least letter-set version A′.

It is possible to construct algorithms to find a least letter-set version of any formula, although they are too complex to be carried out easily by hand. So, for small examples like those of the next exercise, we use our experience with truth-functional formulae to inspire guesses, which we then settle by checking.
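One check that is easy to mechanize on small formulae is whether a particular letter is redundant at all (this is an added brute-force sketch, not one of the algorithms alluded to above): a letter is redundant in a formula precisely when the formula's truth value never depends on the value given to that letter, so it suffices to flip that letter in every assignment to the remaining letters. The sketch assumes the evaluate and letters helpers from the implication sketch in Sect. 8.3.2.

    from itertools import product

    def redundant_letters(formula):
        """The letters of the formula whose value never affects the value of
        the formula; each such letter can be eliminated, since the formula is
        then equivalent to the result of fixing that letter at a constant."""
        letts = sorted(letters(formula))
        result = []
        for p in letts:
            others = [q for q in letts if q != p]
            depends = False
            for row in product((1, 0), repeat=len(others)):
                v = dict(zip(others, row))
                if evaluate(formula, {**v, p: 1}) != evaluate(formula, {**v, p: 0}):
                    depends = True
                    break
            if not depends:
                result.append(p)
        return result

    # q is redundant in p∧(p∨q), as absorption predicts:
    print(redundant_letters(('and', 'p', ('or', 'p', 'q'))))   # ['q']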


Exercise 8.4.3 (with solution)
(a) Find a least letter-set version for each of the following formulae: (i) p∧¬p, (ii) (p∧q∧r)∨(p∧¬q∧r), (iii) (p↔q)∨(p↔r)∨(q↔r).
(b) Do the same for the following sets of formulae: (i) {p∨q∨r, p∨¬q∨r∨s, p∨¬q∨r∨¬s}, (ii) {p→q, p→¬q}.
(c) True or false? ‘The least letter-set of a finite set of formulae is the same as the least letter-set of the conjunction of all of its elements’. Explain.

Solution
(a) (i) ⊥, (ii) p∧r, (iii) ¬⊥. (b) (i) p∨r, (ii) ¬p. (c) True: A finite set of formulae is tautologically equivalent to the conjunction of all of its elements, so the formulae that are equivalent to the set are the same as those equivalent to the conjunction.

8.4.4 Most Modular Version

The next kind of simplification is rather more subtle. It leads us to a representation that does not change the letter-set but makes the role of the letters as ‘compartmentalized’ or ‘modular’ as possible. The definition makes essential use of the notion of a partition, and you are advised to review the basic theory of partitions in Chap. 2 (Sect. 2.5.3) before going further.

Consider the formula set A = {¬p, r→((¬p∧s)∨q), q→p}. This set has three formulae as its elements. Between them, the formulae contain four elementary letters p, q, r, s. None of these letters is redundant – the least letter-set is still {p, q, r, s}. But the way in which the letters occur in formulae in A is unnecessarily ‘mixed up’: they can be separated out rather better from each other. In other words, we can make the presentation of A more ‘modular’, without reducing the set of letters involved.

Observe that A is tautologically equivalent to the set A′ = {¬p, r→s, ¬q}. We have not eliminated any letters, but we have disentangled their role in the set. In effect, we have partitioned the letter-set {p, q, r, s} of A into three cells {p}, {r,s}, {q}, with each formula in A′ drawing all its letters from a single cell of the partition. Thus, the formula ¬p takes its sole letter from the cell {p}, r→s draws its letters from the cell {r,s} and ¬q takes its letter from the cell {q}. We say that the partition {{p}, {r,s}, {q}} of the letter-set {p, q, r, s} is a splitting of A.

In general terms, here is the definition. Let A be any set of formulae, with E the set of its elementary letters. A splitting of A is a partition of E such that A is tautologically equivalent to some set A′ of formulae, such that each formula in A′ takes all of its letters from a single cell.

Now, as we saw in Exercise 2.5.3 of the chapter on relations, partitions of a set can be compared according to their fineness. Consider any two partitions of the same set. One partition is said to be at least as fine as another iff every cell of the former is a subset of some cell of the latter. This relation between partitions is a partial ordering in the sense also defined in the chapter on relations: reflexive, transitive, antisymmetric. As we also saw in the end-of-chapter exercises, one partition is at least as fine as another iff the equivalence relation corresponding to the former is a subrelation of that corresponding to the latter.

Since a splitting of a set of propositional formulae is a special kind of partition of the set of its elementary letters, it makes sense to compare splittings according to their fineness. In our example A = {¬p, r→((¬p∧s)∨q), q→p}, the three-cell splitting {{p}, {r,s}, {q}} mentioned above is finer than the two-cell splitting {{p,q}, {r,s}} that corresponds to the formula set A′ = {¬p∧¬q, r→s} equivalent to A; and this is in turn finer than the trivial one-cell splitting {{p, q, r, s}} that corresponds to A itself.

It turns out that each set A of formulae has a unique finest splitting. In the case of our example, it is the three-cell partition {{p}, {r,s}, {q}} of E. The four-cell partition {{p}, {r}, {s}, {q}} of E is finer, but it is not a splitting of A, since no set A″ of formulae, each of which draws its letters from a single cell of this partition, is equivalent to A. We will not prove this, but intuitively it is to be expected since each formula in A″ contains only a single letter.

Any formula set A′ equivalent to A, using the same letters as A, but with the letters in each formula of A′ taken from a single cell of the finest splitting of A, is called a finest splitting version of A.

Strictly speaking, the definitions above cover only the principal case that there is some elementary letter in some formula of A. For in the limiting case that there are no elementary letters, that is, when E = ∅, the notion of a partition of E is not defined. However, in this case, A must be tautologically equivalent either to ⊥ or to ¬⊥, and it is convenient to take that as the finest splitting version of A in this case.

Thus, a finest splitting version of a set of formulae disentangles as much as possible the roles that are played by the different elementary letters. It makes the presentation of the set as modular as possible: we have reached the finest way of separating the letters such that no formula contains letters from two distinct cells. The finest splitting versions of A may thus also be called the most modular presentations of A. They could also be called the minimal mixing versions of A.

Whereas the idea of the least letter-set of a set of formulae is quite old, perhaps dating back as far as the nineteenth century, that of the finest splitting is surprisingly new. It was first formulated and verified for the finite case by Rohit Parikh in 1999; a proof of uniqueness for the infinite case was given (by the author) only in 2007!

Just as for least letter-set versions, any general algorithm for finding most modular versions is highly exponential and laborious to execute by hand in even the simplest examples. So for such small examples, we again use our experience with truth-functional formulae to inspire our guessing and follow it up by checking, as in the following exercise:

Exercise 8.4.4 (1) (with partial solution) Find the finest splitting and a most modular version for each of the following sets of formulae: (a) A = {p∧q, r}, (b) B = {p→q, q→r, r→¬p}, (c) C = {(¬p∧q∧r)∨(q∧¬s∧¬p)}, (d) D = {p∨q, q∨r, r∨¬p, ¬q∨r}. Remember: You are not eliminating redundant letters (indeed, in these instances, no letters are redundant); you are splitting the existing letter-set.


Solution to (a) and (b) The finest splitting of A partitions its letter-set E = {p, q, r} into singleton cells: {{p}, {q}, {r}}, with a most modular version of A being A′ = {p, q, r}. The finest splitting of B partitions its letter-set E = {p, q, r} into two cells: {{p}, {q, r}}, with a most modular version of B being B′ = {¬p, q→r}.

Finally, we note that there is nothing to stop us combining these two kinds of letter management with each other and with either dnf or cnf. Given a set A of formulae, we can first eliminate all redundant letters, getting it into least letter-set form A′; then work on A′ to put it into a most modular version A″; and then go on, if we wish, to express the separate formulae α ∈ A″ in dnf or in cnf.

Exercise 8.4.4 (2) Find the least letter-set of the following set of formulae and put it in most modular form over that least letter-set: {(p→q)∧(r→s), ¬q∧(p∨u), (u∨r)→(s∨r)}.

8.5 Semantic Decomposition Trees

By now, you are probably sick of drawing up truth tables, even small ones of four or eight rows, to test for the various kinds of tautological relations and properties (tautological consequence and equivalence, tautologies and contradictions). We describe a shortcut method that can be used instead.

The method is semantic, in the sense that it is formulated in terms of truth-values; algorithmic, in the sense that its steps can be carried out by a computer; and two-sided, in the sense that it gives a way of determining, in a finite time, whether or not a formula is, for example, a tautological consequence of another. It thus supplies us with a decision procedure. It is called the method of semantic decomposition trees, also known as semantic tableaux.

We begin with an example. We already know that the de Morgan formula α = ¬(p∧q)→(¬p∨¬q) is a tautology, but let's check it without making a table. Suppose that v(α) = 0 for some valuation v. We show by successive decompositions of the formula that this supposition leads to a violation of bivalence. From the supposition, we have by the table for the arrow that v(¬(p∧q)) = 1 while v(¬p∨¬q) = 0. From the latter, by the table for disjunction, v(¬p) = 0 and v(¬q) = 0, so by the table for negation, v(p) = 1 and v(q) = 1. On the other hand, since v(¬(p∧q)) = 1, we have v(p∧q) = 0, so by the table for conjunction, either v(p) = 0 or v(q) = 0. In the first case, we get a contradiction with v(p) = 1, and in the second case, a contradiction with v(q) = 1. Thus, the initial supposition that v(α) = 0 is impossible, so ¬(p∧q)→(¬p∨¬q) is a tautology.

This reasoning can be set out in the form of a labelled tree, as in Fig. 8.1. It is constructed from the root down and should be read in the same way.

Fig. 8.1 Semantic decomposition tree for ¬(p∧q)→(¬p∨¬q)

To see what is going on, note the following features of the way in which this tree is constructed:
• The root is labelled with the formula that we are testing, together with a truth-value. In our example, we are testing whether α is a tautology, that is, whether it is impossible for it to receive value 0, so the label is 0: α. If we had been testing whether α is a contradiction, the label would have been 1: α.
• At each step, we decompose the current formula, passing from information about its truth-value to resulting information about its immediate subformulae. Never in the opposite direction. The information is supplied by the truth table for the particular connective that is being decomposed at that point.
• When we get disjunctive information about the immediate subformulae of φ, we divide our branch into two subbranches, with one of the alternatives on one branch and the other alternative on the other.
• When we get definite information about the immediate subformulae of φ, we put it on every branch below the node for φ. One way of not forgetting this is by writing it down immediately before any further dividing takes place.
Having constructed the decomposition tree, we need to be able to read our answer from it. We make sure that we have decomposed each node in the tree whenever possible, that is, when it is not an elementary letter. In our example, the ticks next to nodes keep a record that we have actually carried out the decomposition. Then we read the completed tree as follows:
• If every branch contains an explicit contradiction (two nodes labelled by the same formula with opposite signs, 1: φ and 0: φ), then the label of the root is impossible. This is what happens in our example: there are two branches, one containing both v(p) = 1 and v(p) = 0 and the other containing both v(q) = 1 and v(q) = 0. We label these branches dead and conclude that v(α) = 0 is impossible, that is, that α is a tautology.
• On the other hand, if at least one branch is without explicit contradiction, then the label of the root is possible, provided we have really decomposed all decomposable formulae in that branch. We label these branches ok, and read off any one of them a valuation that gives the root formula the value indicated in the label.
We give a second example that illustrates the latter situation. Consider the formula α = ((p∧q)→r)→(p→r) and test to determine whether or not it is a tautology. We get the labelled tree of Fig. 8.2.

Fig. 8.2 Semantic decomposition tree for ((p∧q)→r)→(p→r)

This decomposition tree contains three branches. The leftmost and rightmost ones each contain an explicit contradiction (1: p and 0: p in the left one, 0: r and 1: r in the right one), and so we label them dead. The middle branch does not contain any explicit contradiction. We check carefully that we have completed all decompositions in this branch – that we have ticked every formula in it other than elementary letters. We collect from the branch the assignment v(p) = 1, v(r) = v(q) = 0. The valuation generated by this assignment gives every formula in the branch its labelled value, and in particular gives the root formula α the labelled value 0, so that it is not a tautology.

Evidently, this method is algorithmic and can be programmed. It is usually quicker to calculate (whether by human or by machine) and more fun than a full table, although it must be admitted that in worst-case examples, it turns out to be just as exponential as a truth table. The construction always terminates: as we go down the branches, we deal with shorter and shorter formulae until we reach elementary letters and can decompose no further. This is intuitively clear: a rigorous proof would be by structural induction on formulae.


Table 8.9 Definite decomposition rules

    1: ¬α    0: ¬α    1: α∧β    0: α∨β    0: α→β
    0: α     1: α     1: α      0: α      1: α
                      1: β      0: β      0: β

Table 8.10 Branching decomposition rules

    1: α∨β         0: α∧β         1: α→β         1: α↔β         0: α↔β
    1: α | 1: β    0: α | 0: β    0: α | 1: β    1: α | 0: α    1: α | 0: α
                                                 1: β | 0: β    0: β | 1: β

There are many ways in which the method can be streamlined to maximize its efficiency. Some gains may be obtained by controlling the order in which decompositions are carried out. In our second example, after decomposing the root to introduce the second and third nodes, we had a choice of which of those to decompose first. We opted to decompose the third node before the second, but we could perfectly well have done the reverse. Our choice was motivated by the fact that the third node gives definite information, and we were following a policy of postponing branching for as long as possible. This is a control heuristic that works well, at least for humans in small finite examples, because it can reduce the number of branches and repetitions of labelled nodes on them. Whichever order we follow, we get the same final verdict.

Exercise 8.5 (1) Reconstruct the semantic decomposition tree for ((p∧q)→r)→(p→r), decomposing the second node before the third one.

Also important as a way of reducing unnecessary work is the fact that we can sometimes stop constructing the tree before it is finished. This can be done in two distinct ways:
• If a branch contains an explicit contradiction before having been fully decomposed, we can already declare that branch dead without completing its decomposition and pass on to other branches. This sometimes saves a little work.
• If we have a branch, all of whose nodes have been decomposed, and it is free of explicit contradictions, then we can declare that this branch is ok, so that the label of the root (and of everything on the branch) is possible, without bothering to complete any other branches. This sometimes saves a lot of work.
In addition to these economies, one can develop heuristics to guide the choice, when we branch, of which side to construct first and how far before looking at the other, in other words balancing depth-first and breadth-first strategies. But such heuristics take us beyond the limits of this introduction, and we leave them aside. Tables 8.9 and 8.10 recapitulate the definite and branching decompositions that one may make with the various truth-functional connectives. Their justification is immediate from the truth tables.

We thus have two rules for decomposing each connective, one for sign 1 and theother for sign 0. Note that the rule for : is definite no matter what the sign (1 or 0);the rules for ^, _, ! are definite or branching depending on the sign, and the one for


↔ branches irrespective of sign. The most common student errors are (1) misremembering the rule for →, usually from not having remembered its truth table well enough, and (2) forgetting that for material equivalence we always need to make two entries on each side of the fork.

Exercise 8.5 (2)
(a) What would the decomposition rules for exclusive disjunction look like? Would they be definite or branching? Can you see any connection with the rules for any of the other connectives?

(b) Determine the status of the following formulae by the method of decomposition trees. First, test to see whether it is a tautology. If it is not a tautology, test to see whether it is a contradiction. Your answer will specify one of the three possibilities: tautology, contradiction, contingent.

(i) (p→(q→r))→((p→q)→(p→r))
(ii) (p→q)∨(q→p)
(iii) (p↔q)∨(q↔p)
(iv) (p↔q)∨(p↔r)∨(q↔r)
(v) (p→q)∧(q→r)∧(q→¬p)
(vi) ((p∨q)∧(p∨r))→(p∧(q∨r)).

Clearly, the method may also be applied to determine whether two formulae are tautologically equivalent, whether one formula tautologically implies another, and more generally, whether a finite set of formulae tautologically implies a formula.

• To test whether α ⊣⊢ β, test whether the formula α↔β is a tautology.
• To test whether α ⊢ β, test whether the formula α→β is a tautology.
• To test whether {α1, …, αn} ⊢ β, test whether the formula (α1∧…∧αn)→β is a tautology.
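Continuing the earlier Python sketch (again with made-up helper names), these three reductions can be expressed directly in terms of the is_tautology function:

```python
def taut_equivalent(a, b):
    """α ⊣⊢ β: reduce to the tautologyhood of α↔β."""
    return is_tautology(('iff', a, b))

def taut_implies(premises, b):
    """{α1, ..., αn} ⊢ β: reduce to the tautologyhood of (α1∧...∧αn)→β."""
    if not premises:                     # empty set of premises: just test β itself
        return is_tautology(b)
    conj = premises[0]
    for p in premises[1:]:
        conj = ('and', conj, p)
    return is_tautology(('imp', conj, b))

# For instance, (p∨q)→r is tautologically equivalent to (p→r)∧(q→r)
left = ('imp', ('or', 'p', 'q'), 'r')
right = ('and', ('imp', 'p', 'r'), ('imp', 'q', 'r'))
print(taut_equivalent(left, right))      # True
```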

Exercise 8.5 (3)
(a) Use the method of decomposition trees to determine whether (i) (p∨q)→r ⊣⊢ (p→r)∧(q→r), (ii) (p∧q)→r ⊣⊢ (p→r)∨(q→r), (iii) p∧¬p ⊣⊢ q↔¬q.
(b) Use the method of decomposition trees to determine whether {¬q∨p, ¬r→¬p, s, s→¬r, t→p} ⊢ ¬t∧¬q.

End-of-Chapter Exercises

Exercise 8.1 Truth-functional connectives
(a) Construct the full disjunctive normal form for each of the two-place truth functions f2, f11 and f15 constructed for Exercise 8.2 (2). Can any of them be expressed more simply?
(b) Show that the connective-set {∧, ∨, →, ↔} is not functionally complete. Hint: Think about the top row of the truth table to suggest a truth function that cannot be expressed by formulae built using only these connectives. Then use structural induction on those formulae to check that this is really so.


(c) Show that the pair {¬, ↔} is not functionally complete. This exercise is quite challenging, so we give plenty of hints. Aim at showing that {¬, ↔} do not suffice to express ∧. As a preliminary, first show that if they do suffice to express ∧, then they can do so by a formula with just two elementary letters. The main part will then be to show by structural induction that any formula with just two elementary letters and with at most ¬, ↔ as connectives is true in an even number of the four rows of its truth table. This can be done directly by structural induction on formulae. Alternatively, it can be done via another induction, showing that every formula with at most two letters p, q and at most those connectives is equivalent to one of eight specific formulae.

Exercise 8.2 Tautological relations
(a) In informal mathematics, we often say, when asserting a chain of equivalences, 'α iff β iff γ', just as in arithmetic we say a = b = c. But what does it mean: α↔(β↔γ), (α↔β)↔γ, or (α↔β)∧(β↔γ)? Are they tautologically equivalent?
(b) Explain why the following two properties are equivalent: (i) α is a contradiction, (ii) α ⊢ β for every formula β.
(c) Explain why the following three properties are equivalent: (i) α is a tautology, (ii) β ⊢ α for every formula β, (iii) ∅ ⊢ α.
(d) A set A of formulae is said to be inconsistent iff there is no valuation v such that v(α) = 1 for all α ∈ A. How does this concept, applied to finite sets of formulae, relate to that of a contradiction?
(e) Show, as claimed in the text, that the set of all contradictions and the relations of tautological equivalence and tautological implication are closed under substitution. Hint: You may do this in either of two ways – either directly from their definitions or from their connections with the property of being a tautology and the fact (shown in the text) that the set of all tautologies is closed under substitution.

(f) Show by structural induction that every formula constructed using the zero-ary connective ⊥ and the usual ¬, ∧, ∨ but without any elementary letters is either a tautology or a contradiction.

(g) In Exercise 1.4.3 (1) of Chap. 1, we saw inter alia that (A\B)\C = (A\C)\B and A\(B\C) ⊆ (A\B)∪C for arbitrary sets A, B, C. What would the counterparts of these be in propositional logic, expressed using ∧, ¬, ∨?

Exercise 8.3 Normal forms
(a) Find a dnf for each of the following formulae, (i) r→¬(q∨p), (ii) p∧¬(r∨¬(q→¬p)), using the semantic method for one and the method of successive syntactic transformations for the other.
(b)* Find a cnf for the formula p∧¬(r∨¬(q→¬p)) by each of the three methods: (i) semantic, (ii) successive syntactic transformations, (iii) indirect method (via the dnf of its negation).
(c) For each of the following sets of formulae, find a least letter-set version: A = {p↔q, q↔r, r↔¬p}, B = {(p∧q∧r)∨(s∧r)}, C = {(p∧q∧r)∨(s∧q), ¬p}.


(d) True or false? For each, give a proof or counterexample. (i) The least letter-set of a formula is empty iff the formula is either a tautology or a contradiction. (ii) If a letter is redundant in each of α, β, then it is redundant in α∧β. (iii) The least letter-set of a disjunction is the union of the least letter-sets of its disjuncts.
(e)* Find most modular versions of your answers to (c).

Exercise 8.4 Semantic decomposition trees*
(a) Use a semantic decomposition tree to show that (p∧q)→r ⊬ (p∧(q∨s))→r.
(b) Use semantic decomposition trees to determine whether each of the following holds: (i) (p∨q)→r ⊣⊢ (p→r)∧(q→r), (ii) p→(q∧r) ⊣⊢ (p→q)∧(p→r), (iii) (p∧q)→r ⊣⊢ (p→r)∨(q→r), (iv) p→(q∨r) ⊣⊢ (p→q)∨(p→r).
(c) Use semantic decomposition trees to determine whether p→(q→r) ⊢ (p→q)→r and conversely.

Selected Reading

Introductions to discrete mathematics tend to put their chapters on logic right at the beginning. In the order of nature, this makes good sense, but it also makes it difficult to use tools like set, relation and function in the presentation. One computer science text that introduces logic after having presented those notions is:

Hein J (2002) Discrete structures, logic and computability, 2nd edn. Jones and Bartlett, Boston, chapter 6

Five well-known books dedicated to elementary logic are listed below. The first is written specifically for students of computer science without much mathematics, the second for the same students but with more mathematical baggage, while the last three are aimed respectively at students of philosophy, linguistics and the general reader.

Huth M, Ryan M (2000) Logic in computer science. Cambridge University Press, Cambridge, chapter 1

Ben-Ari M (2001) Mathematical logic for computer science, 2nd edn. Springer, London, chapters 1–4

Howson C (1997) Logic with trees. Routledge, London/New York, chapters 1–4
Gamut LTF (1991) Logic, language, and meaning, vol I, Introduction to logic. University of Chicago Press, Chicago, chapters 1, 2
Hodges W (1977) Logic. Penguin, Harmondsworth, sections 1–25


9 Something About Everything: Quantificational Logic

Abstract
Although fundamental, the logic of truth-functional connectives has very limited expressive power. In this chapter we go further, explaining the basic ideas of quantificational (alias first-order, or again predicate) logic, which is sufficiently expressive to cover most of the deductive reasoning that is carried out in standard mathematics and computer science.

We begin by presenting its language, built around the universal and existential quantifiers, and the ways they can be used to express complex relationships. With no more than an intuitive understanding of the quantifiers, some of the basic logical equivalences involving them can already be appreciated. For a deeper understanding, we then present the semantics of the language, which is still bivalent but goes beyond truth tables. This semantics may be given in two versions, substitutional and x-variant, which however are equivalent under a suitable condition. After explaining the distinction between free and bound occurrences of variables, and the notion of a clean substitution, the chapter ends with a review of some of the most important logical implications with quantifiers and with the identity relation.

9.1 The Language of Quantifiers

We have been using quantifiers informally throughout the book, and have made a few remarks on them in a logic box in Chap. 2. Recall that there are two quantifiers, ∀ and ∃, meaning 'for all' and 'for some'. They are always used with an attached variable.


Table 9.1 Examples of quantified statements

1. All composers are poets: ∀x(Cx → Px)
2. Some composers are poets: ∃x(Cx∧Px)
3. No poets are composers: ∀x(Px → ¬Cx)
4. Everybody loves someone: ∀x∃y(Lxy)
5. There is someone who is loved by everyone: ∃y∀x(Lxy)
6. There is a prime number less than 5: ∃x(Px∧(x < 5))
7. Behind every successful man stands an ambitious woman: ∀x[(Mx∧Sx) → ∃y(Wy∧Ay∧Byx)]
8. No man is older than his father: ¬∃x(Oxf(x))
9. The successors of distinct integers are distinct: ∀x∀y[¬(x ≈ y) → ¬(s(x) ≈ s(y))]
10. The English should be familiar from Chap. 4!  [P0 ∧ ∀x{Px → P(x+1)}] → ∀x(Px)

9.1.1 Some Examples

Before getting systematic, we give some examples of statements of ordinary English and their representation using quantifiers, commenting on their salient features (Table 9.1). Each of these examples raises a host of questions, whose answers will emerge as we continue through the chapter.

1. This symbolization uses the universal quantifier ∀, variable x, two predicate letters, truth-functional →. Could we use a different variable, say, y? Why are we using → here instead of, say, ∧? Can we express this using ∃ instead of ∀? What is its relation to ∀x(Px → Cx)?
2. Why are we using ∧ here instead of →? Can we express this using ∀ instead of ∃? Is it logically implied by statement (1)? What is its relation to ∃x(Cx∧¬Px)?
3. Can we express this using ∃, ∧, ¬? What is its relation to ∀x(Cx → ¬Px)?
4. Why haven't we used a predicate P for 'is a person'? Does the meaning change if we write ∀x∃y(Lyx)?
5. Is this equivalent to 4? Does either logically imply the other?
6. Could we somehow replace the individual constant by a predicate?
7. Can we express this equivalently with both quantifiers up the front?
8. Is it possible to express the statement more 'positively'? Could we express it with a relation symbol F and two variables?
9. Can we simplify this by converting the part inside square brackets? Does the meaning change if we reverse the initial quantifiers? Is the identity relation any different, from the point of view of logic, than other relations? Why aren't we using the usual sign = for identity?
10. Why haven't we used a predicate N for 'is a natural number'? Shouldn't the second universal quantifier take a different variable? Can we move the second quantifier right out the front? What about the first quantifier?

Exercise 9.1.1 On the basis of your informal experience with quantifiers, have a go at answering the questions. If you feel confident about most of your answers, congratulations – you seem to have a head start! But keep a record of your answers to check later! If, on the other hand, the questions leave you puzzled or uncertain, don't worry; by the end of the chapter, all will be clear.

9.1.2 Systematic Presentation of the Language

The basic components of the language of quantificational logic are set out in Table 9.2. The familiar truth-functional connectives are of course there, and the new quantifiers. But to allow the quantifiers to do their work, we also need some further ingredients.

Table 9.2 Ingredients of the language of quantificational logic

Basic terms
  Constants a, b, c, … : name specific objects, for example, 5, Charlie Chaplin, London
  Variables x, y, z, … : range over specific objects, combine with quantifiers to express generality
Function letters
  2-place, …, n-place f, g, h, … : form compound terms out of simpler terms, starting from the basic ones
Predicates
  1-place P, Q, R, … : for example, is prime, is funny, is polluted
  2-place : for example, is smaller than, resembles
  n-place : for example, lies between (3-place)
  Special relation sign ≈ : identity
Quantifiers
  Universal ∀ : for all
  Existential ∃ : there is
Connectives
  'Not' etc. ¬, ∧, ∨, → : usual truth tables
Auxiliary
  Parentheses and commas : to ensure unique decomposition and make formulae easier to read

In each of the categories of basic terms, function letters and predicates, we assume that we have an infinite supply. These can be referred to by using numerical subscripts, for example, f1, f2, …, for the function letters. We could also have numerical superscripts to indicate the arity of the function and predicate letters, so that the two-place predicate letters, say, are listed as R₁², R₂², … But as this is rather cumbersome, we ordinarily use a few chosen letters as indicated in the table, with context or comment indicating the arity.


How are these ingredients put together to build formulae? Recursively, as you would expect, but in two stages. First, we define recursively the notion of a term, and then afterwards we use that to define, again recursively, the notion of a (well-formed) formula. We begin with terms.

Basis: Constants and variables are terms (we call them basic terms; some texts call them 'atomic terms').
Recursion step: If f is an n-place function symbol and t1, …, tn are terms, then f(t1, …, tn) is a term.

Exercise 9.1.2 (1) (with solution) Which of the following are terms, where f is two-place and g is one-place? In the negative cases, give a brief reason. (i) a, (ii) ax, (iii) f(x,b), (iv) f(c,g(y)), (v) g(g(x,y)), (vi) f(f(b,b),f(a,y)), (vii) g(g(a)), (viii) g(g(a), (ix) gg(a).

Solution (i) Yes, it is a basic term. (ii) No, it juxtaposes two basic terms but is not itself basic, nor is it formed using a function symbol. (iii) Yes. (iv) Yes. (v) No, g is one-place. (vi) Yes. (vii) Yes. (viii) Not quite: a right-hand parenthesis has been forgotten. (ix) Strictly speaking no, but this is often used to abbreviate g(g(a)), and we will do the same in what follows.
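For readers who like to see such definitions operationally, here is a small Python sketch of the recursive definition of terms, using a nested-tuple encoding of my own choosing (the arity table simply mirrors the f and g of the exercise above); it is an illustration, not part of the book's formal apparatus.

```python
# A constant is ('const', 'a'), a variable is ('var', 'x'), and a compound term built
# with an n-place function letter is ('fun', 'f', t1, ..., tn).

ARITY = {'f': 2, 'g': 1}        # assumed arities, as in Exercise 9.1.2 (1)

def is_term(t):
    """Check that t is built according to the basis and recursion step for terms."""
    if t[0] in ('const', 'var'):
        return len(t) == 2                       # basis: a basic term
    if t[0] == 'fun':                            # recursion step: f(t1, ..., tn)
        name, args = t[1], t[2:]
        return len(args) == ARITY.get(name, -1) and all(is_term(a) for a in args)
    return False

# f(c, g(y)) is a term; g(g(x, y)) is not, since g is one-place
print(is_term(('fun', 'f', ('const', 'c'), ('fun', 'g', ('var', 'y')))))    # True
print(is_term(('fun', 'g', ('fun', 'g', ('var', 'x'), ('var', 'y')))))      # False
```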

Given the terms, we can now define formulae recursively.

Basis: If R is an n-place predicate and t1, …, tn are terms, then R(t1, …, tn) is a formula (sometimes called an atomic formula), and so is (t1 ≈ t2), where ≈ is the symbol for the identity relation.
Recursion step: If α, β are formulae and x is a variable, then the following are formulae: (¬α), (α∧β), (α∨β), (α → β), ∀x(α), ∃x(α).

As in propositional logic, we drop parentheses whenever context or convention suffices to ensure unique decomposition. For ease of reading, when there are multiple parentheses, we also use different styles, for example, square and curly brackets. The special relation sign ≈ for identity is customarily infixed. All other predicates and also the function symbols are prefixed; for this reason, the parentheses and commas in terms and atomic formulae without identity can in principle be omitted without ambiguity, and they will be when it improves readability.
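Continuing the illustrative tuple encoding, the recursive definition of formulae can be mirrored in the same style (the operator names are my own shorthand, and the sketch does not bother to check predicate arities):

```python
# Atomic formulae are ('pred', 'R', t1, ..., tn) and ('eq', t1, t2); compounds use
# 'not', 'and', 'or', 'imp', 'iff'; quantified formulae are ('forall', 'x', body)
# and ('exists', 'x', body), with the bound variable given as a plain string.

def is_formula(f):
    op = f[0]
    if op == 'pred':                                      # basis: R(t1, ..., tn)
        return all(is_term(t) for t in f[2:])
    if op == 'eq':                                        # basis: (t1 ≈ t2)
        return is_term(f[1]) and is_term(f[2])
    if op == 'not':
        return is_formula(f[1])
    if op in ('and', 'or', 'imp', 'iff'):
        return is_formula(f[1]) and is_formula(f[2])
    if op in ('forall', 'exists'):                        # ∀x(α), ∃x(α)
        return isinstance(f[1], str) and is_formula(f[2])
    return False

# ∀x∃y(Rxy)
print(is_formula(('forall', 'x', ('exists', 'y',
                  ('pred', 'R', ('var', 'x'), ('var', 'y'))))))    # True
```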

Alice Box: Use and Mention

Alice: I notice that when you talk about a specific sign, say, the conjunction sign or the universal quantifier, you do not put it in inverted commas. For example, you say 'The symbol ∀ is the universal quantifier', rather than 'The symbol '∀' is the universal quantifier'. Is this legitimate? Isn't it like saying 'The word London has six letters' when you should be saying 'The word 'London' has six letters'?

Hatter: Yes and no. Strictly speaking, one should distinguish between use and mention: the use of a symbol like the quantifier in our object language and its mention when talking about it in our metalanguage. That could be done by using quotation marks, italics, underlining, a different font, etc. But when carried out systematically, it is very tiresome. As is customary in computer science and mathematics writing, we omit them except when really needed to avoid confusion.

Alice: So presentations of logic are in general rather less rigorous than the logics they present!

Hatter: I wouldn't say less rigorous. Less formal would be more accurate.

The Hatter is right. Another example of informality in our presentation can be seen in the above definition of formulae. In the recursion step, we have used one particular variable, namely, x. But what we really mean is that we can use any of the variables x, y, z, … in this way. So when α is a formula, then ∀y(α), ∀z(α), … are also formulae.

The reason for such informality is that it keeps down the notation. In our metalanguage, we already have meta-variables α, β, … ranging over formulae; we don't want to add special signs (say Gothic letters) to serve as meta-variables that range over, say, the set of variables of the object language, unless we really have to. In the early twentieth century, in obeisance to a misplaced sense of rigour, a few logic texts did attempt to work with multiple arrays of meta-variables ranging over different categories of symbols of the object language, and to use inverted commas whenever mentioning rather than using a symbol. But the result was cumbersome, annoying and almost unreadable. Nobody does it any more, although everybody knows that it can be done, and that for utter strictness it should be done.

Exercise 9.1.2 (2) (with solution) (a) Which of the following are formulae, where R is a two-place predicate, S is a three-place predicate, P is a one-place predicate, and f, g are as in the previous exercise? In the negative cases, give a brief reason. (i) R(a,a), (ii) R(x,b)∧¬P(g(y)), (iii) R(x,P(b)), (iv) P(P(x)), (v) ∀(Rxy), (vi) ∃P(Px), (vii) ∀x∀x(Px), (viii) ∀x(Px → Rxy), (ix) ∃y∃y[R(f(x,y),g(x))], (x) ∀x(∃y(Rxy)), (xi) ∀x∃y(Rxy), (xii) ∃x(Ray), (xiii) ∃x∀x(Ray), (xiv) ∀x∃y∀z(Sxyz).

Solution (i) and (ii) Yes. (iii) No: P(b) is a formula, not a term, and we need a term in this position if the whole expression is to be a formula. Note this carefully, as it is a common student error. (iv) No, same reason. (v) No, the quantifier does not have a variable attached to it. (vi) No, the quantifier has a predicate attached to it, rather than a variable. (vii) Yes, despite the fact that both quantifiers have the same variable attached to them. (viii), (ix) and (x) Yes. (xi) Strictly speaking, there should be another pair of parentheses, as in (x). But when we have a string of quantifiers following each other like this, we can drop such parentheses without ambiguity, and will do so. (xii) and (xiii) Yes. (xiv) Yes.


Examples (vi) and (xiv) in the exercise deserve further comment. Although (vi) is not a formula of quantificational logic, also known as first-order logic, it is admitted in what is called second-order logic, which goes beyond our remit. In (xiv), we have three alternating quantifiers in the order ∀∃∀. You may have come across this level of complexity when defining the notion of a limit in real number theory.

Exercise 9.1.2 (3) (with some solutions, hints, comments) Express the following statements in the language of quantificational logic, using naturally suggestive letters for the predicates. For example, in (a), use L for the predicate 'is a lion', T for the predicate 'is a tiger' and D for the predicate 'is dangerous'.

(a) Lions and tigers are dangerous. (b) If a triangle is right-angled, then it is not equilateral. (c) Anyone who likes Albert likes Betty. (d) Albert doesn't like everybody Betty likes. (e) Albert doesn't like anybody Betty likes. (f) Everyone who loves someone is loved by that person. (g) Everyone who loves someone is loved by someone. (h) There are exactly two prime numbers less than 5. (i) Every integer has exactly one successor. (j) My mother's father is older than my father's mother.

Some solutions, hints and comments

(a) Can be ∀x(Lx → Dx)∧∀x(Tx → Dx) or ∀x((Lx∨Tx) → Dx). It cannot be expressed as ∀x((Lx∧Tx) → Dx) – try to understand why! Note that the universal quantifier is not explicit in the English, but it is evidently part of the meaning.
(b) ∀x((Tx∧Rx) → ¬Ex). Here we are using T for 'is a triangle'. You can also simplify life by declaring the set of all triangles as your domain (also called universe) of discourse and write simply ∀x(Rx → ¬Ex). This kind of simplification, by declaring a domain of discourse, is often useful when symbolizing. But it is legitimate only if the domain is chosen so that the statement says nothing about anything outside the domain.
(c) ∀x(Lxa → Lxb). We declare our domain of discourse to be the set of all people, write Lxy for 'x likes y', and use individual constants for Albert and Betty.
(d) and (e) Try to understand the difference of meaning between the two. The English words 'every' and 'any' tend to do much the same work when they occur positively, but when occurring negatively, they can do quite different jobs. Declare your domain of discourse.
(f) and (g) Try to understand the difference of meaning, and declare your domain of discourse. In (f), you will need two quantifiers, both universal (despite the idiomatic English), while in (g), you will need three quantifiers.
(h) The problem here is how to say that there are exactly two items with a certain property. Try paraphrasing it as: there is an x and there is a y such that (1) they are distinct, (2) they both have the property and (3) everything with the property is identical to at least one of them. This goes into the language of quantificational logic smoothly. Take your domain of discourse to be the set of positive integers.
(i) If you can express 'exactly two', it should be easy to express 'exactly one'.
(j) Declare your domain of discourse. Use one-place function letters for 'father of' and 'mother of'. Use a constant to stand for yourself. Be careful with order.


9.1.3 Freedom and Bondage

To understand the internal structure of a formula of quantificational logic, three notions are essential – the scope of a quantifier, and free versus bound occurrences of a variable. We explain them through some examples.

Consider the formula ∀z[Rxz → ∃y(Rzy)]. It has two quantifiers. The scope of each quantifier is the material in the parentheses immediately following it. Thus, the scope of the first quantifier is the material between the square brackets, and the scope of the second one likewise is given by the round brackets. Note how the scope of one quantifier may lie inside the scope of another, as in this example, or be entirely separate from it; but two scopes can never partially overlap.

In a quantified formula ∀x(α) or ∃x(α), the quantifier is said to bind the occurrence of the variable x that is attached to it and all occurrences of the same variable x occurring in α, unless some other quantifier occurring inside α already binds them. An occurrence of a variable x in a formula α is said to be bound in α iff there is some quantifier in α that binds it. Occurrences of a variable that are not bound in a formula are called free in the formula. Finally, a formula with no free occurrences of any variables is said to be closed, otherwise open. Closed formulae are often called sentences.
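Anticipating Exercise 9.1.3 (g) below, the definition of freedom can be phrased as a recursion on the structure of formulae. Here is one possible rendering in the illustrative Python encoding used earlier, computing the set of variables with at least one free occurrence:

```python
def term_vars(t):
    """All variables occurring in a term."""
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'const':
        return set()
    return {x for arg in t[2:] for x in term_vars(arg)}

def free_vars(f):
    """All variables with at least one free occurrence in the formula f."""
    op = f[0]
    if op == 'pred':
        return {x for t in f[2:] for x in term_vars(t)}
    if op == 'eq':
        return term_vars(f[1]) | term_vars(f[2])
    if op == 'not':
        return free_vars(f[1])
    if op in ('and', 'or', 'imp', 'iff'):
        return free_vars(f[1]) | free_vars(f[2])
    # 'forall' / 'exists': the quantifier binds its attached variable in the body
    return free_vars(f[2]) - {f[1]}

def is_closed(f):
    return not free_vars(f)

# ∀z[Rxz → ∃y(Rzy)]: only x occurs free
example = ('forall', 'z', ('imp', ('pred', 'R', ('var', 'x'), ('var', 'z')),
                           ('exists', 'y', ('pred', 'R', ('var', 'z'), ('var', 'y')))))
print(free_vars(example))    # {'x'}
```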

The translation into symbols of any complete sentence of English should have all its variables bound, that is, it should be closed, since a free variable is not specifying anything in particular, nor yet expressing a universal or existential quantification. Go back to Table 9.1 to check that all its symbolic representations are closed formulae.

Exercise 9.1.3 (with partial solution)
(a) Identify the free and bound occurrences of variables in the formula ∀z[Rxz → ∃y(Rzy)]. Use marks from above to mark the bound occurrences, marks from below for the free ones.
(b) In the formula ∀x∃y(Rxy)∨∃z(Py → ∀x(Rzx∧Ryx)), replace some of the round brackets by square brackets to help make salient the scopes of the quantifiers in the formula. Add further square brackets for those that are implicit under the convention for writing strings of contiguous quantifiers. Identify free and bound occurrences of variables in the formula.
(c) Use the last example to illustrate how a single variable can have one occurrence bound, another free, in the same formula.
(d) Give a simple example to illustrate how an occurrence of a variable may be bound in a formula but free in one of its subformulae.
(e) Consider the formula ∀x∃y∀x∃y(Rxy). Which quantifiers bind which occurrences of x? Which bind which occurrences of y?
(f) Which of the formulae mentioned in this exercise are closed?
(g) Express the definition of an occurrence of a variable being free in a formula more rigorously, as a recursion on the structure of the formula.


Solutions to (b)–(f)
(b) ∀x∃y(Rxy)∨∃z[Py → ∀x(Rzx∧Ryx)] and ∀x[∃y(Rxy)]∨∃z[Py → ∀x(Rzx∧Ryx)] with the added parentheses.
(c) The occurrences of y to the left of ∨ are bound, those on the right are free.
(d) In the last example, x is bound in the whole formula, and also in the subformula ∀x∃y(Rxy), but is free in the subformula ∃y(Rxy).
(e) In the formula ∀x∃y∀x∃y(Rxy), the x in Rxy is bound by the innermost ∀. The outer ∀ binds only its attached occurrence of x. Likewise, the y in Rxy is bound by the innermost ∃. The outer ∃ binds only its attached occurrence of y.
(f) Closed formula: ∀x∃y∀x∃y(Rxy). Open: ∀x∃y(Rxy)∨∃z(Py → ∀x(Rzx∧Ryx)) and ∀z[Rxz → ∃y(Rzy)].

9.2 Some Basic Logical Equivalences

At this point, we could set out the semantics for quantificational formulae, with rigorous definitions of logical relationships such as consequence and equivalence. But we will postpone that until the next section and, in the meantime, cultivate intuitions. There are a number of basic logical equivalences that can already be appreciated when you read ∀x(α) and ∃x(α) intuitively as saying, respectively, 'α holds for every x in our domain of discourse' and 'α holds for at least one x in our domain of discourse'.

9.2.1 Quantifier Interchange

First among these equivalences are the quantifier interchange principles, which show that anything expressed by one of our two quantifiers may equivalently be expressed by the other, with the judicious help of negation. In Table 9.3, the formulae on the left are logically equivalent to those on the right.

Here, α stands for any formula of quantificational logic. We will reuse the sign ⊣⊢, familiar from propositional logic, for the intuitively understood, but not yet formally defined, concept of logical equivalence for quantificational formulae.

Alice Box: ∃x(¬α) or ∃x¬(α)?

Alice: A question about the right column in the table. Should we write ∃x(¬α), ∀x(¬α) or ∃x¬(α), ∀x¬(α)?

Hatter: It doesn't matter. With all brackets present, we would have ∃x(¬(α)), ∀x(¬(α)), and we can leave off one of the two pairs without ambiguity.


Table 9.3 Quantifier interchange equivalences

  ¬∀x(α)     ∃x(¬α)
  ¬∃x(α)     ∀x(¬α)
  ∀x(α)      ¬∃x(¬α)
  ∃x(α)      ¬∀x(¬α)

Actually, any one of the four equivalences can be obtained from any other by means of double negation and the principle of replacement of logically equivalent formulae. For example, suppose we have the first one and want to get the last one. Since α ⊣⊢ ¬¬α, we have ∃x(α) ⊣⊢ ∃x(¬¬α) ⊣⊢ ¬∀x(¬α) by replacement and then the first equivalence, and we are done.

Exercise 9.2.1 (1)
(a) Obtain the second and third quantifier interchange principles from the first one by a similar procedure.
(b) Use quantifier interchange and suitable truth-functional equivalences to show that (i) ¬∀x(α → β) ⊣⊢ ∃x(α∧¬β), (ii) ¬∃x(α∧β) ⊣⊢ ∀x(α → ¬β), (iii) ∀x(α → β) ⊣⊢ ¬∃x(α∧¬β) and (iv) ∃x(α∧β) ⊣⊢ ¬∀x(α → ¬β). Make free use of the principle of replacement in your work.

The equivalences in the exercise are important, because the formulae correspond to familiar kinds of statement in English. For example, in (b)(i), the left side says 'not all αs are βs', while the right one says 'at least one α is not a β'.

Exercise 9.2.1 (2) To what English statements do the equivalences (b)(ii) through (b)(iv) correspond?

9.2.2 Distribution

The next group of equivalences may be described as distribution principles. They show the way in which universal quantification distributes over conjunction, while the existential distributes over disjunction (Table 9.4).

Why does the universal quantifier get on so well with conjunction? Essentially because it is like a long conjunction itself. Suppose our domain of discourse has just n elements, named by n individual constants a1, …, an. Then saying ∀x(Px) amounts to saying, of each element in the domain, that it has the property P, that is, that Pa1∧…∧Pan. This formula is called a finite transform of ∀x(Px).

Of course, if we change the size of the domain, then we change the length of the conjunction; and to make the transform work, we must have enough constants to name all the elements of the domain. But even with these provisos, it is clear that the principles for ∀ reflect those for ∧, while those for ∃ resemble familiar ones for ∨. The idea of working with such finite transforms of quantified formulae can be put to systematic use to obtain negative results. We will return to this later.


Table 9.4 Distribution equivalences

  ∀x(α∧β)     ∀x(α)∧∀x(β)
  ∃x(α∨β)     ∃x(α)∨∃x(β)

Exercise 9.2.2
(a) Write out ∀x(α∧β) as a conjunction in a domain of three elements, with α taken to be the atomic formula Px and β the atomic formula Qx. Then write out ∀x(α)∧∀x(β) in the same way and check that they are equivalent. What truth-functional principles do you appeal to in that check?
(b) What does ∃x(Px) amount to in a domain of n elements?
(c) Write out ∃x(α∨β) as a disjunction in a domain of three elements, with α chosen as Px and β as Qx. Then write out ∃x(α)∨∃x(β) in the same way and check that they are equivalent. What truth-functional principles do you appeal to?
(d) If we think of the universal and existential quantifiers as expressing generalized conjunctions and disjunctions, respectively, to what familiar truth-functional equivalences do the entries in Table 9.3 correspond?
(e) Illustrate your answer to (d) by constructing finite transforms for the third row of Table 9.3 in a domain of three elements with α chosen as Px.

9.2.3 Vacuity and Relettering

The last two intuitive equivalences that we mention are the vacuity and relettering principles. The vacuity equivalences are expressed in Table 9.5, where all formulae in a given row are equivalent.

In other words, quantifying twice on the same variable does no more than doing it once. To say 'for every x it is true that for every x we have Px', as in the top row middle column, is just a long-winded way of saying 'Px for every x'. Note that in these vacuity equivalences, the variables must be the same: ∀x(Rxy) is not logically equivalent to either of ∀y∀x(Rxy) and ∃y∀x(Rxy).

The equivalences of the table are special cases of a general principle of vacuous quantification: when there are no free occurrences of x in φ, then each of ∀x(φ) and ∃x(φ) is logically equivalent to φ. To see how the table fits under this principle, in the top row, take φ to be ∀x(α), and in the bottom row, take φ to be ∃x(α).

Exercise 9.2.3 (1) Use vacuity and distribution equivalences to show that ∀x[Px → ∃y(Qy)] ⊣⊢ ∀x∃y[Px → Qy] and ∃x[Px∧∀y(Qy)] ⊣⊢ ∃x∀y[Px∧Qy]. Do you notice anything interesting about the shapes of the two right-hand formulae? Hint: Begin the first equivalence by getting rid of the conditional in favour of disjunction and negation, using propositional logic and the replacement rule.

For the relettering equivalence, we begin with some examples. Intuitively, it is clear that ∀x(Px) ⊣⊢ ∀y(Py), although ∀x(Px∧Qy) does not seem to be equivalent to ∀y(Py∧Qy). Again, ∀x∃y(Rxy) ⊣⊢ ∀x∃z(Rxz) ⊣⊢ ∀y∃z(Ryz) ⊣⊢ ∀y∃x(Ryx), although ∀x∃y(Rxy) is certainly not equivalent to ∀x∃x(Rxx).


Table 9.5 Vacuity equivalences

  ∀x(α)     ∀x∀x(α)     ∃x∀x(α)
  ∃x(α)     ∃x∃x(α)     ∀x∃x(α)

The useful principle of relettering of bound variables may be put as follows. Let φ be a formula and x any variable. Let y be a fresh variable, in the sense that it does not occur at all (whether free or bound) in φ. Then φ ⊣⊢ φ′, where φ′ is obtained by replacing all bound occurrences of x in φ by y. The proviso that y is fresh for φ is essential!
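As with freedom and bondage, the relettering operation can be rendered as a recursion over the illustrative tuple encoding. The sketch below (with made-up helper names) replaces every occurrence of x that is bound by some ∀x or ∃x with y; it is up to the caller to pick y fresh, exactly as the proviso demands.

```python
def rename_in_term(t, x, y):
    if t[0] == 'var':
        return ('var', y) if t[1] == x else t
    if t[0] == 'const':
        return t
    return t[:2] + tuple(rename_in_term(a, x, y) for a in t[2:])

def rename_free(f, x, y):
    """Replace the free occurrences of the variable x in f by y (helper)."""
    op = f[0]
    if op in ('pred', 'eq'):
        return tuple(rename_in_term(p, x, y) if isinstance(p, tuple) else p for p in f)
    if op == 'not':
        return ('not', rename_free(f[1], x, y))
    if op in ('and', 'or', 'imp', 'iff'):
        return (op, rename_free(f[1], x, y), rename_free(f[2], x, y))
    # quantifiers: stop if this quantifier itself binds x
    return f if f[1] == x else (op, f[1], rename_free(f[2], x, y))

def reletter(f, x, y):
    """Replace all bound occurrences of x in f by the fresh variable y."""
    op = f[0]
    if op in ('pred', 'eq'):
        return f                          # any occurrence of x here is free at this level
    if op == 'not':
        return ('not', reletter(f[1], x, y))
    if op in ('and', 'or', 'imp', 'iff'):
        return (op, reletter(f[1], x, y), reletter(f[2], x, y))
    body = reletter(f[2], x, y)           # reletter any inner ∀x / ∃x first
    if f[1] == x:                         # this quantifier binds x: rename it and its body
        return (op, y, rename_free(body, x, y))
    return (op, f[1], body)

# ∀x∃y(Rxy) with x relettered to z gives ∀z∃y(Rzy)
print(reletter(('forall', 'x', ('exists', 'y',
                ('pred', 'R', ('var', 'x'), ('var', 'y')))), 'x', 'z'))
```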

Exercise 9.2.3 (2) (with solution)
(a) Use iterated relettering to show that ∃x∀y(Rxy) ⊣⊢ ∃y∀x(Ryx).
(b) Can relettering be used to show that ∃x∀y(Rxy) ⊣⊢ ∃y∀x(Rxy)?
(c) Is there an analogous principle of relettering of free variables?

Solution
(a) You need to proceed in several steps, bringing in and eliminating fresh variables z, w: ∃x∀y(Rxy) ⊣⊢ ∃z∀y(Rzy) ⊣⊢ ∃z∀w(Rzw) ⊣⊢ ∃y∀w(Ryw) ⊣⊢ ∃y∀x(Ryx).
(b) No. Roughly speaking, the reason is that no matter how many replacements you make, the variable attached to each quantifier will continue to occupy the same place in the part that is quantified. In fact, in this example, the left- and right-hand sides of (b) are not equivalent.
(c) No. For example, Px is not logically equivalent to Py. However, there is a weaker principle for free variables. Let φ be any formula and x any variable. Let y be a variable fresh to φ, and form φ′ by replacing all free occurrences of x in φ by y. Then φ is logically true iff φ′ is logically true. If the subtle difference between this weaker but valid principle and the stronger but invalid one is not intuitively clear to you, return to it after completing the chapter.

9.3 Two Semantics for Quantificational Logic

It is time to get rigorous. In this section, we present the semantics for quantificational logic. To keep a sense of direction, remember that we are working towards analogues for our enriched language of concepts familiar from propositional logic: a valuation, logical consequence, logical truth, etc.

9.3.1 The Common Part of the Two Semantics

We begin with the notion of an interpretation with universe D, where D is a non-empty set. This is defined to be any function v that assigns a value to each constant, variable, predicate letter and function letter of the language in the following way:

• For each constant a, v(a) is an element of the domain, that is, v(a) ∈ D.
• Likewise, for each variable x, v(x) is an element of the domain, that is, v(x) ∈ D.
• For each n-place function letter f, v(f) is an n-argument function on the domain, that is, v(f): Dⁿ → D.
• For each n-place predicate P other than the identity symbol, v(P) is an n-place relation over the domain, that is, v(P) ⊆ Dⁿ.
• For the identity symbol, we put v(≈) to be the identity relation over the domain.

The last clause of this definition means that we are giving the identity symbol a special treatment. Whereas every other predicate symbol may be interpreted as any relation over the domain of the right arity (i.e. number of places), the identity symbol is always read as genuine identity over the domain. Semantics following this rule are often said to be standard with respect to identity. It is also possible to give a non-standard treatment, in which the identity sign is interpreted as any equivalence relation over the domain that is also congruential with respect to all function letters and predicates. However, we will not go further into the non-standard approach, confining ourselves as is customary to standard interpretations.
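To make the definition concrete, here is one way an interpretation might be packaged as a Python value, following the five clauses just given. The particular domain, names and relations are invented purely for illustration; the identity symbol is deliberately not stored, since it is always read as real identity over the domain.

```python
D = {1, 2}                                    # a two-element universe

interpretation = {
    'domain':     D,
    'constants':  {'a': 1, 'b': 2},           # v(a) ∈ D for each constant a
    'variables':  {'x': 1, 'y': 1},           # v(x) ∈ D for each variable x
    'functions':  {'f': lambda m, n: 1 if m == n else 2},   # v(f): D × D → D
    'predicates': {'P': {(1,)}, 'R': {(1, 2), (2, 2)}},     # v(P) ⊆ Dⁿ, as sets of tuples
}
```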

Given an interpretation with domain D, we can define the value v⁺(t) of a term t recursively, following the definition of the terms themselves. Remember that terms are built from constants and variables by applying function letters. The recursive definition of v⁺ follows suit, with two base clauses and a recursion step.

v⁺(a) = v(a) for every constant a of the language
v⁺(x) = v(x) for every variable x of the language
v⁺(f(t1, …, tn)) = (v(f))(v⁺(t1), …, v⁺(tn)).

The recursion step needs some commentary. We are in effect defining v⁺(f(t1, …, tn)) homomorphically. The value of a term f(t1, …, tn) is defined by taking the interpretation of the function letter f, which will be a function on Dⁿ into D, and applying that function to the interpretations of the terms t1, …, tn, which will all be elements of D, to get a value that will also be an element of D. The notation makes it look complicated at first, but the underlying idea is natural and simple.
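The homomorphic recursion step is perhaps easiest to see in code. Using the interpretation sketched above and the tuple encoding of terms, an evaluator might look like this (again an illustrative sketch, not official notation):

```python
def eval_term(v, t):
    """The value v⁺(t) of a term t under the interpretation v."""
    kind = t[0]
    if kind == 'const':
        return v['constants'][t[1]]            # base clause: v⁺(a) = v(a)
    if kind == 'var':
        return v['variables'][t[1]]            # base clause: v⁺(x) = v(x)
    # recursion step: apply v(f) to the values of the subterms
    f = v['functions'][t[1]]
    return f(*(eval_term(v, arg) for arg in t[2:]))

# f(a, x) under the interpretation above: f applied to (1, 1) gives 1
print(eval_term(interpretation, ('fun', 'f', ('const', 'a'), ('var', 'x'))))   # 1
```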

As v⁺ is a uniquely determined extension of v, we will follow the same 'abuse of notation' as in propositional logic and write it simply as v. There are some contexts in which the notation could be simplified further. When considering only one interpretation function v, we could reduce clutter by writing v(t) simply as the term t underlined. But in contexts where we are considering more than one interpretation (which is most of the time), this simple notation is not open to us, unless of course we use more than one kind of underlining.

Exercise 9.3.1 Rewrite the recursion step of the definition of v⁺ using the underlining notation.

Finally, we may go on to define the notion of the truth or falsehood of a formula α under an interpretation v with domain D. Once again, our definition is recursive, following the recursive construction of the formulae. We are defining a function v: L → {0,1} from the set L of all formulae into the set {0,1}. We will merely specify when v(α) = 1, leaving it understood that otherwise v(α) = 0.


• Basis, first part: For any atomic formula of the form Pt1, …, tn: v(Pt1, …, tn) = 1 iff (v(t1), …, v(tn)) ∈ v(P).
• Basis, second part: For any atomic formula of the form t1 ≈ t2: v(t1 ≈ t2) = 1 iff (v(t1), v(t2)) ∈ v(≈). Since v(≈) is required always to be the identity relation over D, this means that v(t1 ≈ t2) = 1 iff v(t1) = v(t2).

The recursion step also has two parts, one for the truth-functional connectives and the other for the quantifiers. The first part is easy.

• Recursion step for the truth-functional connectives: v(¬α) = 1 iff v(α) = 0, and so on for the other truth-functional connectives, just as in propositional logic.

The recursion step for the quantifiers is the heart of the semantics. But it is subtle and needs careful formulation. In the literature, there are two ways of going about it. They are equivalent under a suitable condition, and from an abstract point of view, one can see them as essentially two ways of doing the same thing. But passions can run high over which is better to use, and it is good policy to understand both of them. They are known as the substitutional and x-variant readings of the quantifiers. We explain each in turn, beginning with the substitutional reading, as students usually find it easier to grasp.

9.3.2 Substitutional Reading

This way of interpreting the quantifiers is not often used by mathematicians, but it is often employed by computer scientists and philosophers. Given a formula α and a variable x, write α[t/x] to stand for the result of substituting the term t for all free occurrences of x in α. For example, if α is the formula ∀z[Rxz → ∃x(Rzx)], then α[a/x] = ∀z[Raz → ∃x(Rzx)], obtained by replacing the unique free occurrence of x in α by the constant a (leaving bound occurrences of x untouched).

Exercise 9.3.2 (1)
(a) Let α be the formula ∀z[Rxz → ∃x(Rzx)]. Write out α[b/x], α[b/y].
(b) Do the same for the formula ∀x∃y(Rxy)∨∃z(Py → ∀x(Rzx∧Ryx)).

We are now ready to give the substitutional reading of the quantifiers under an interpretation v with domain D. We first make sure that the function v, restricted to the set of all constants of the language, is onto D, in other words, that every element of D is the value under v of some constant symbol. If it is not already onto D, we add enough constants to the language to be able to extend the interpretation function to ensure that it is onto D. Then, we evaluate the quantifiers as follows:

v(∀x(α)) = 1 iff v(α[a/x]) = 1 for every constant symbol a of the language (thus expanded).
v(∃x(α)) = 1 iff v(α[a/x]) = 1 for at least one constant symbol a of the language (thus expanded).

The reason for requiring that every element of the domain is the value, under v, of some constant symbol is to guarantee that ∀x really means 'for every element in the domain' and not merely 'for every element of the domain that happens to have a name in our language'.
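Here is how the substitutional clauses might look in the running Python sketch. The interpretation defined earlier already names every element of its domain (a names 1 and b names 2), so no expansion of the language is needed; subst plays the role of α[a/x], restricted to substituting constants for variables, and holds returns the truth-value of a formula. As before, the encoding and function names are illustrative assumptions, not the book's notation.

```python
def subst_term(t, x, c):
    if t[0] == 'var':
        return ('const', c) if t[1] == x else t
    if t[0] == 'const':
        return t
    return t[:2] + tuple(subst_term(a, x, c) for a in t[2:])

def subst(f, x, c):
    """α[c/x]: replace the free occurrences of the variable x in f by the constant c."""
    op = f[0]
    if op in ('pred', 'eq'):
        return tuple(subst_term(p, x, c) if isinstance(p, tuple) else p for p in f)
    if op == 'not':
        return ('not', subst(f[1], x, c))
    if op in ('and', 'or', 'imp', 'iff'):
        return (op, subst(f[1], x, c), subst(f[2], x, c))
    # quantifiers: stop if this quantifier binds x, otherwise substitute below it
    return f if f[1] == x else (op, f[1], subst(f[2], x, c))

def holds(v, f):
    """Truth (True/False) of the formula f under v, with the substitutional reading."""
    op = f[0]
    if op == 'pred':
        return tuple(eval_term(v, t) for t in f[2:]) in v['predicates'][f[1]]
    if op == 'eq':
        return eval_term(v, f[1]) == eval_term(v, f[2])
    if op == 'not':
        return not holds(v, f[1])
    if op == 'and':
        return holds(v, f[1]) and holds(v, f[2])
    if op == 'or':
        return holds(v, f[1]) or holds(v, f[2])
    if op == 'imp':
        return (not holds(v, f[1])) or holds(v, f[2])
    if op == 'iff':
        return holds(v, f[1]) == holds(v, f[2])
    # 'forall' / 'exists': run through the constants, which name every element of D
    instances = [holds(v, subst(f[2], f[1], c)) for c in v['constants']]
    return all(instances) if op == 'forall' else any(instances)

# ∀x(Px ∨ Rxb) is true under the interpretation defined earlier
print(holds(interpretation, ('forall', 'x',
      ('or', ('pred', 'P', ('var', 'x')), ('pred', 'R', ('var', 'x'), ('const', 'b'))))))
```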


Alice Box: Oddities

Alice: One moment, there is something funny here! The requirement that every element of the domain D is the image under v of some constant symbol has a strange consequence. The language itself is not fixed, since the supply of constants depends on the particular domain of discourse under consideration. That's odd!

Hatter: Odd, yes, but not incoherent. It works perfectly well. But it is one reason why some logicians prefer the other reading of the quantifiers, which we will give shortly.

Alice: Another puzzle. Reading the above rules, it seems that we are using quantifiers in our metalanguage when defining the semantics of quantifiers in the object language.

Hatter: Sure, just as we used truth-functional connectives in the metalanguage when defining their semantics in the object language. There's no other way.

Exercise 9.3.2 (2) (with partial solution)
(a) With a domain D consisting of two items 1 and 2, construct an interpretation that makes the formula ∀x(Px∨Qx) true but makes ∀x(Px)∨∀x(Qx) false. Show your calculations of the respective truth-values.
(b) Do the same for the pair ∃x(Px)∧∃x(Qx) and ∃x(Px∧Qx).
(c) Again for the pair ∀x∃y(Rxy) and ∃y∀x(Rxy).

Sample solution to (a) Put v(P) = {1} and v(Q) = {2}, and let a, b be constants with v(a) = 1 and v(b) = 2. We claim that v(∀x(Px∨Qx)) = 1 while v(∀x(Px)∨∀x(Qx)) = 0. For the former, it suffices to show that both v(Pa∨Qa) = 1 and v(Pb∨Qb) = 1. But since v(Pa) = 1, we have v(Pa∨Qa) = 1, and likewise since v(Qb) = 1, we have v(Pb∨Qb) = 1. For the latter, v(Pb) = 0 so that v(∀x(Px)) = 0, and likewise v(Qa) = 0 so that v(∀x(Qx)) = 0, and so finally v(∀x(Px)∨∀x(Qx)) = 0 as desired.

The substitutional account of the quantifiers throws light on the finite transforms of a quantified formula, which we discussed briefly in the preceding section. Let v be any interpretation with domain D. Then, under the substitutional reading, for any universally quantified formula, we have v(∀x(α)) = 1 iff v(α[a/x]) = 1 for every constant symbol a of the language. If D is finite and a1, …, an are constants naming all its elements, then this holds iff v(α[a1/x]∧…∧α[an/x]) = 1. Similarly, for the existential quantifier, v(∃x(α)) = 1 iff v(α[a1/x]∨…∨α[an/x]) = 1.

Thus, the truth-value of a formula under an interpretation v with a finite domain D of n elements can also be calculated by a translation into a quantifier-free formula (i.e. one with only truth-functional connectives). We recursively translate ∀x(α) and ∃x(α) into α[a1/x]∧…∧α[an/x] and α[a1/x]∨…∨α[an/x], respectively, where a1, …, an are constants chosen to name all elements of D.


Exercise 9.3.2 (3) (with partial solution) Do Exercise 9.3.2 (2) again, but this time translating into quantifier-free formulae and assigning truth-values.

Solution Take constants a, b. The translation of ∀x(Px∨Qx) is (Pa∨Qa)∧(Pb∨Qb), while the translation of ∀x(Px)∨∀x(Qx) is (Pa∧Pb)∨(Qa∧Qb). Let v be the propositional assignment that, for example, puts v(Pa) = 1 = v(Qb) and v(Pb) = 0 = v(Qa). Then, by truth tables, v((Pa∨Qa)∧(Pb∨Qb)) = 1 while v((Pa∧Pb)∨(Qa∧Qb)) = 0, as desired.

9.3.3 The x-Variant Reading

We now explain the rival x-variant reading of the quantifiers. Roughly speaking, whereas the substitutional reading of a formula ∀x(α) under a valuation v looked at the values of certain formulae different from α under the same valuation v, the x-variant reading will consider the values of the same formula α under certain valuations, most of which are different from v.

Let v be any interpretation with domain D, and let x be any variable of the language. By an x-variant of v, we mean any interpretation v′ with the same domain D that agrees with v in the values given to all constants, function letters and predicates (i.e. v′(a) = v(a), v′(f) = v(f), v′(P) = v(P) for all letters of the respective kinds), and also agrees on the interpretation given to all variables except possibly the particular variable x (so that v′(y) = v(y) for every variable y of the language other than the variable x). With this notion in hand, we evaluate the quantifiers for a valuation v with domain D as follows:

v(∀x(α)) = 1 iff v′(α) = 1 for every x-variant interpretation v′ of v.
v(∃x(α)) = 1 iff v′(α) = 1 for at least one x-variant interpretation v′ of v.
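In the running Python sketch, the x-variant clauses amount to re-running the evaluator with the value of a single variable respecified; everything else is exactly as in the substitutional version above (and, as always, the names here are illustrative assumptions).

```python
def x_variant(v, x, d):
    """The x-variant of v that assigns the element d to the variable x."""
    return {**v, 'variables': {**v['variables'], x: d}}

def holds_xv(v, f):
    """Truth (True/False) of the formula f under v, with the x-variant reading."""
    op = f[0]
    if op in ('forall', 'exists'):
        instances = [holds_xv(x_variant(v, f[1], d), f[2]) for d in v['domain']]
        return all(instances) if op == 'forall' else any(instances)
    if op == 'pred':
        return tuple(eval_term(v, t) for t in f[2:]) in v['predicates'][f[1]]
    if op == 'eq':
        return eval_term(v, f[1]) == eval_term(v, f[2])
    if op == 'not':
        return not holds_xv(v, f[1])
    if op == 'and':
        return holds_xv(v, f[1]) and holds_xv(v, f[2])
    if op == 'or':
        return holds_xv(v, f[1]) or holds_xv(v, f[2])
    if op == 'imp':
        return (not holds_xv(v, f[1])) or holds_xv(v, f[2])
    return holds_xv(v, f[1]) == holds_xv(v, f[2])        # 'iff'

# As in Exercise 9.3.2 (2)(a): with v(P) = {1}, v(Q) = {2} on the domain {1, 2},
# ∀x(Px∨Qx) comes out true while ∀x(Px)∨∀x(Qx) comes out false.
v = {**interpretation, 'predicates': {'P': {(1,)}, 'Q': {(2,)}}}
f1 = ('forall', 'x', ('or', ('pred', 'P', ('var', 'x')), ('pred', 'Q', ('var', 'x'))))
f2 = ('or', ('forall', 'x', ('pred', 'P', ('var', 'x'))),
            ('forall', 'x', ('pred', 'Q', ('var', 'x'))))
print(holds_xv(v, f1), holds_xv(v, f2))    # True False
```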

It is instructive to re-solve Exercise 9.3.2 (2) using the x-variant reading of the quantifiers and compare the calculations.

Exercise 9.3.3 (with partial solution)
(a) Construct an interpretation v with domain D = {1,2} such that, under the x-variant reading of the quantifiers, v(∀x(Px∨Qx)) = 1 while v(∀x(Px)∨∀x(Qx)) = 0. Show your calculations.
(b) Do the same for the pair ∃x(Px)∧∃x(Qx) and ∃x(Px∧Qx).
(c) Again for the pair ∀x∃y(Rxy) and ∃y∀x(Rxy).

Sample solution to (a) Put v(P) = {1}, v(Q) = {2} and v(x) any element of D (it will not matter how it is chosen in D, since the formulae we are evaluating are closed). We claim that v(∀x(Px∨Qx)) = 1 while v(∀x(Px)∨∀x(Qx)) = 0. For the former, it suffices to show that v′(Px∨Qx) = 1 for every x-variant interpretation v′. But there are only two such x-variants, v_{x=1} and v_{x=2}, defined by putting v_{x=1}(x) = 1 and v_{x=2}(x) = 2. Then we have v_{x=1}(Px) = 1 while v_{x=2}(Qx) = 1, so v_{x=1}(Px∨Qx) = 1 = v_{x=2}(Px∨Qx) and thus v(∀x(Px∨Qx)) = 1. For the latter, we have v_{x=2}(Px) = 0 = v_{x=1}(Qx), so that v(∀x(Px)) = 0 = v(∀x(Qx)), and so finally v(∀x(Px)∨∀x(Qx)) = 0 as desired.


In this exercise, we have used a notation for x-variants (v_{x=1}, v_{x=2}, etc.) that permits us to keep track of what is going on. It also runs parallel to the notation for the substitutional reading (v_{x=1}(α), etc., in place of v(α[1/x]), etc.), making it clear that at some deep level, the two procedures are 'doing the same thing'.

Evidently, detailed truth-verifications like those in the exercises are rather tedious, and having done them, you already begin to 'see' the value of fairly simple formulae under a given interpretation, without writing out all the details. It should also be clear from the exercises that the truth-value of a formula under an interpretation is independent of the value given to the variables that do not occur free in it. For example, in the formula ∀x(Px∨Qx), the variable x does not occur free – all its occurrences are bound – and so its truth-value under an interpretation v is independent of the value of v(x), although, of course, the truth-value of Px∨Qx under an interpretation v does in general depend on the value of v(x).

Alice Box: Values of Formulae That Are Not Closed

Alice: There is something about this that bothers me. I have been browsing in several other books on logic for students of mathematics rather than computer science. They use the x-variant account, but in a rather different way. Under the definition above, every formula comes out as true, or as false, under a given interpretation with specified domain; that is, we always have either v(α) = 1 or v(α) = 0. But the texts that I have been looking at insist that when a formula α has free occurrences of variables in it (in other words, is not closed, is not a sentence), then in general it is neither true nor false under an interpretation! What is going on?

Hatter: That is another way of building the x-variant semantics. It is different in some of the terminology and details, but in the end, it does the same job. The two accounts agree on closed formulae.

As usual, the Hatter has given the short answer. Alice deserves rather more, although students who have not been exposed to the other way of proceeding may happily skip to the next section.

Under the alternative account, terms and formulae are evaluated under an interpretation in exactly the same way as we have done, but with a terminological difference. Rather than speaking of truth and falsity and writing v(α) = 1 and v(α) = 0, the interpretation is said to 'satisfy' or 'not satisfy' the formula, written v ⊨ α or v ⊭ α. A 'model' is understood as an interpretation stripped of the values assigned to variables, and a formula is said to be true in the model itself iff it is satisfied by every interpretation extending that model, that is, every interpretation that agrees with the model but in addition gives values to variables. A formula is said to be false in the model iff it is satisfied by none of those interpretations.

A formula containing free occurrences of variables will in general be neither true nor false in a given model: some assignments of values to its free variables may satisfy it, others not. The simplest example is the basic formula Px. The model with domain D = {1,2} and v(P) = {1} does not make Px true, nor false, since the interpretation with v(x) = 1 satisfies Px while the interpretation with v(x) = 2 does not. But when a formula is closed (i.e. has no free occurrences of variables), then the gap disappears: for any given model, a closed formula is either true in the model or false in the model. Moreover, it turns out that an arbitrary formula α is true in a model iff its universal closure ∀x1…∀xn(α) is true in that same model.

The reason why this text (like quite a few others) has not followed the alternative manner of presentation is that it creates unnecessary difficulties for students. Having become accustomed to bivalent 'truth' and 'falsehood' in propositional logic, when they come to quantificational logic, they are asked to express the same bivalence using the different terms 'satisfaction' and 'non-satisfaction' and to employ the terms 'truth' and 'falsehood' in a non-bivalent way. A sure recipe for classroom confusion!

9.4 Semantic Analysis

We now apply the semantic tools developed in Sect. 9.3 to analyze the notion of logical implication in quantificational languages and establish some fundamental rules – even more fundamental than the equivalences listed in Sect. 9.2.

9.4.1 Logical Implication

We have all the apparatus for extending our definitions of relations such as logical implication from propositional to quantificational logic. We simply take the same formulations as for truth-functional logic and plug in the more elaborate notion of an interpretation:

• A set A of formulae is said to logically imply a formula β iff there is no interpretation v such that v(α) = 1 for all α ∈ A but v(β) = 0. This relation is written with the same 'gate' or 'turnstile' sign as in propositional logic, A ⊢ β, and we also say that β is a logical consequence of A.
• α is logically equivalent to β, and we write α ⊣⊢ β, iff both α ⊢ β and β ⊢ α. Equivalently, α ⊣⊢ β iff v(α) = v(β) for every interpretation v.
• α is logically true iff v(α) = 1 for every interpretation v.
• A formula α is a contradiction iff v(α) = 0 for every interpretation v. More generally, a set A of formulae is inconsistent iff for every interpretation v there is a formula α ∈ A with v(α) = 0.
• α is contingent iff it is neither logically true nor a contradiction.

Evidently, the five bulleted concepts above are interrelated in the same way as their counterparts in propositional logic (see Sect. 8.3.4), with the same verifications. For example, α is logically true iff ∅ ⊢ α.

In the exercises of Sect. 9.3, we checked some negative results. In effect, we saw that ∀x(Px∨Qx) ⊬ ∀x(Px)∨∀x(Qx), ∃x(Px)∧∃x(Qx) ⊬ ∃x(Px∧Qx) and ∀x∃y(Rxy) ⊬ ∃y∀x(Rxy). In each instance, we found an interpretation in a very small domain (two elements) that does the job. For more complex non-implications, one often needs larger domains. Indeed, there are formulae α, β such that α ⊬ β but such that v(β) = 1 for every interpretation v in any finite domain with v(α) = 1. In other words, such non-implications can be witnessed only in infinite domains (which, however, can always be chosen to be countable). But that is beyond our remit, and we remain with more elementary matters.
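Counter-interpretations of this finite kind are easy to search for mechanically. The following sketch is not part of the text (the helpers subsets, premise and conclusion are ours): it runs through all interpretations of two one-place predicates over the two-element domain {1,2} and prints those witnessing the first of the non-implications above.

```python
# Brute-force search for an interpretation making ∀x(Px∨Qx) true
# but ∀x(Px)∨∀x(Qx) false, over the domain {1, 2}.
from itertools import combinations

domain = {1, 2}

def subsets(s):
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def premise(P, Q):            # ∀x(Px ∨ Qx)
    return all(a in P or a in Q for a in domain)

def conclusion(P, Q):         # ∀x(Px) ∨ ∀x(Qx)
    return all(a in P for a in domain) or all(a in Q for a in domain)

witnesses = [(P, Q) for P in subsets(domain) for Q in subsets(domain)
             if premise(P, Q) and not conclusion(P, Q)]
print(witnesses)   # v(P) = {1}, v(Q) = {2} does the job (and its mirror image)
```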

The last of the three non-implications above suggests the general question of which alternating quantifiers imply which. This is answered in Fig. 9.1.

Fig. 9.1 Logical relations between alternating quantifiers

The double-headed arrows in the diagram indicate logical equivalence; single-headed ones are for logical implication. We are looking at formulae with two initial quantifiers with attached variables; the left column changes the quantifiers leaving the attached variables in the fixed order x,y (2² = 4 cases), while the right column does the same with the attached variables in reverse order y,x (another 2² = 4 cases), thus 8 cases in all. In each case, the material quantified remains unchanged. The diagram is complete in the sense that when there is no path from φ to ψ in the figure, then φ ⊬ ψ.

Exercise 9.4.1 (1) Verify the non-implication ∃x∀y(α) ⊬ ∃y∀x(α) by choosing α to be Rxy and considering a suitable interpretation. Use either the x-variant or substitutional account of the quantifier in your verification, as you prefer. Advice: (1) Make life easy by using the smallest domain that you can get away with. (2) If using the x-variant reading, use the same subscript notation as in the solution to Exercise 9.3.3.

So far, we have been using the semantics to get negative results. We now use it to get a positive one by verifying the implication ∀x(α) ⊢ ∃x(α) using the substitutional reading of the quantifiers. Let v be a valuation with domain D, and suppose v(∀x(α)) = 1. Then for every constant a of the language, we have v(α[a/x]) = 1. Since the domain is assumed to be non-empty and every element of the domain is given a constant as name, there is at least one constant in the language, and so we have v(α[a/x]) = 1 for at least one constant a, so v(∃x(α)) = 1 as desired.


Alice Box: That’s a Fraud!

Alice: You make me laugh! You are trying to prove something intuitively obvious, and in proving it, you use the very same principle to prove it (OK, it is in the metalanguage rather than the object language), and the resulting argument is more complicated and less transparent than the original fact. That's a bit of a fraud!

Hatter: A disappointment, yes; and a pain in the neck, I agree. But not a fraud. To establish a principle about the object language, we must reason in our metalanguage, and if we make everything explicit, it can become rather complex.

For our second example, we show that ∃x∀y(α) ⊢ ∀y∃x(α), using this time the x-variant reading of the quantifiers. Let v be a valuation with domain D, and suppose v(∃x∀y(α)) = 1 while v(∀y∃x(α)) = 0; we get a contradiction. By the first supposition, there is an x-variant of v, which we write as v_{x:=a}, with v_{x:=a}(∀y(α)) = 1. By the second supposition, there is a y-variant v_{y:=b} of v with v_{y:=b}(∃x(α)) = 0. Now take the y-variant v_{x:=a,y:=b} of v_{x:=a}. Then by the first remark, v_{x:=a,y:=b}(α) = 1, while by the second remark, v_{y:=b,x:=a}(α) = 0. But clearly, the two valuations v_{x:=a,y:=b} and v_{y:=b,x:=a} are the same, so long as x and y are distinct variables, as presupposed in the formulation of the problem. For the former is obtained by respecifying the value given to x and then respecifying the value given to the different variable y, while the latter respecifies in reverse order. So v_{x:=a,y:=b}(α) = 1 while v_{x:=a,y:=b}(α) = 0, giving us a contradiction.

Exercise 9.4.1 (2)
(a) Do the first example again using the x-variant method, and the second example using the substitutional method. Hint: Follow the same patterns, just changing the notation.
(b) Verify one of the quantifier interchange principles, one of the distribution principles, and one of the vacuous quantification principles of Sect. 9.2. In each case, give a proof using the substitutional method or the x-variant one, as you prefer, but using each method at least once.
(c) Correct or incorrect? (i) ∀x(Px → Qx) ⊢ ∀x(Px) → ∀x(Qx), (ii) the converse. Same instructions as for (b).

Alice Box: This Is Tedious…

Alice: Do I have to do much of this? I've got the idea, but the calculations are tedious, and by now I can guess the results intuitively in simple examples.
Hatter: One example at the end of the chapter for revision.
Alice: Will these calculations be in the exam?
Hatter: You'd better ask your instructor.


9.4.2 Clean Substitutions

All the logical relations between quantifiers rest on four fundamental principles that we have not yet enunciated. To formulate them, we need a further concept – that of a clean substitution of a term t for a variable x.

Recall that a term t may contain variables as well as constants and function letters; indeed, a variable standing alone is a term. So when we substitute t for free occurrences of x in α, it may happen that t contains a variable y that is 'captured' by some quantifier of α. For example, when α is the formula ∃y(Rxy) and we substitute y for the sole free occurrence of x in it, we get α[y/x] = ∃y(Ryy), where the y that is introduced is in the scope of the existential quantifier. Such substitutions are undesirable from the point of view of the principles that we are about to articulate. We say that a substitution α[t/x] is clean iff no free occurrence of x in α falls within the scope of a quantifier binding some variable occurring in t.

Exercise 9.4.2 (with solution) Let α = ∃y{Py ∨ ∀z[Rzx ∧ ∀x(Sax)]}. Which of the following substitutions are clean: (i) α[y/x], (ii) α[a/x], (iii) α[x/x], (iv) α[z/x], (v) α[z/y], (vi) α[y/w]?

Solution (i) Not clean, because the sole free occurrence of x is in the scope of ∃y. (ii), (iii) Both clean. (iv) No, because the free occurrence of x is in the scope of ∀z. For (v), note that there are no free occurrences of y. Hence, vacuously, the substitution α[z/y] is clean. (vi) Again vacuously clean: there are no occurrences of w in α at all.

This exercise illustrates the following general facts, which are immediate from the definition:
• The substitution of a constant for a variable is always clean.
• The identity substitution is always clean.
• A substitution is clean whenever the variable being replaced has no free occurrences in the formula.
• In particular, a substitution is clean whenever the variable being replaced does not occur in the formula at all.
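The definition of cleanness is easy to mechanize, and doing so can help check answers like those of the last exercise. The following is a rough sketch only, not part of the text: formulae are nested tuples, the arguments of an atomic formula are kept as plain variables or constants for simplicity, and the convention that letters from u onwards are variables is our own.

```python
def term_vars(t):
    """Variables occurring in a term; by our convention, letters before 'u' are constants."""
    if isinstance(t, tuple):                      # e.g. ('f', 'x', 'a')
        return set().union(*(term_vars(u) for u in t[1:]))
    return {t} if t >= 'u' else set()

def is_clean(formula, t, x, binders=frozenset()):
    """Is the substitution formula[t/x] clean?  `binders` collects the variables
    bound by the quantifiers whose scope we are currently inside."""
    op = formula[0]
    if op in ('forall', 'exists'):
        return is_clean(formula[2], t, x, binders | {formula[1]})
    if op in ('and', 'or', 'implies'):
        return all(is_clean(sub, t, x, binders) for sub in formula[1:])
    if op == 'not':
        return is_clean(formula[1], t, x, binders)
    # Atomic formula: a free occurrence of x here is captured iff some variable
    # of t is bound by an enclosing quantifier.
    occurs_free = x in formula[1:] and x not in binders
    return not (occurs_free and term_vars(t) & binders)

# α = ∃y(Rxy): substituting y for x is not clean, substituting the constant a is.
alpha = ('exists', 'y', ('R', 'x', 'y'))
print(is_clean(alpha, 'y', 'x'))    # False
print(is_clean(alpha, 'a', 'x'))    # True
```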

9.4.3 Fundamental Rules

We can now formulate the promised fundamental principles about ∀. The first is a logical consequence (and known as a first-level rule); the second is a rule allowing us to get one logical consequence from another (a second-level rule). They are known as ∀− (universal instantiation, UI) and ∀+ (universal generalization, UG):
• ∀−: ∀x(α) ⊢ α[t/x], provided the substitution α[t/x] is clean.
• ∀+: Whenever α₁,…,αₙ ⊢ α, then α₁,…,αₙ ⊢ ∀x(α), provided the variable x has no free occurrences in any of α₁,…,αₙ.


As one would expect, there are two dual principles for ∃. They are dubbed ∃+ (existential generalization, EG) and ∃− (existential instantiation, EI):
• ∃+: α[t/x] ⊢ ∃x(α), provided the substitution α[t/x] is clean.
• ∃−: Whenever α₁,…,αₙ₋₁, α ⊢ αₙ, then α₁,…,αₙ₋₁, ∃x(α) ⊢ αₙ, provided the variable x has no free occurrences in any of α₁,…,αₙ.
In the second-level rule ∀+, the entire consequence α₁,…,αₙ ⊢ ∀x(α) serves as output of the rule. The input of the rule is the consequence α₁,…,αₙ ⊢ α. Similarly, the input and output of ∃− are both entire consequences.

Note carefully that the two first-level implications ∀− and ∃+ make substitutions and require that these substitutions be clean. In contrast, the second-level ones ∀+ and ∃− do not make substitutions and have the different proviso that the quantified variable has no free occurrences in certain formulae. Commit to memory to avoid confusion!

We draw attention to a detail about the first-level rule ∃+ that sometimes throws students. While the implication goes from α[t/x] to ∃x(α), the substitution goes in the reverse direction, from the body α of ∃x(α) to α[t/x]. This contrasts with the situation in the first-level rule ∀−, where the direction of the substitution is the same as that of the inference.

The following exercise is very boring, but you need to do it in order to make sure that you have really understood what these two rules are saying.

Exercise 9.4.3 (1) (with partial solution)
(a) Which of the following are instances of the first-level rule ∀−? (i) ∀x(Px → Qx) ⊢ Px → Qx, (ii) ∀x(Px → Qx) ⊢ Pb → Qb, (iii) ∀x(Px∧Qx) ⊢ Px∧Qy, (iv) ∀x∃y(Rxy) ⊢ ∃y(Rxy), (v) ∀x∃y(Rxy) ⊢ ∃y(Ryy), (vi) ∀x∃x(Rxy) ⊢ ∃x(Ray), (vii) ∀x∃y(Rxy) ⊢ ∃y(Ryx), (viii) ∀x∃y(Rxy) ⊢ ∃y(Rzy), (ix) ∀x∃y(Rxy) ⊢ ∃y(Ray), (x) ∀x∃y(Rxy) ⊢ Rxy.
(b) Which of the following are instances of the first-level rule ∃+? (i) Pa ⊢ ∃x(Px), (ii) Rxy ⊢ ∃x(Rxx), (iii) Rxy ⊢ ∃z(Rzy), (iv) Rxy ⊢ ∃z(Rzx), (v) ∃y(Rxy) ⊢ ∃z∃y(Rzy).
(c) We know that ∀x∃y(Rxyz) ⊢ ∃y(Rzyz). Which of the following may be obtained from it by an application of the second-level rule ∀+? (i) ∀x∃y(Rxyz) ⊢ ∀z∃y(Rzyz), (ii) ∀x∃y(Rxyz) ⊢ ∀w∃y(Rzyz), (iii) ∀x∃y(Rxyz) ⊢ ∀w∃y(Rwyz), (iv) ∀x∃y(Rxyz) ⊢ ∃x∃y(Rzyz), (v) ∀x∃y(Rxyz) ⊢ ∀y∃y(Rzyz).
(d) We know that Px, ∀y(Ryz) ⊢ Px∧∃z(Rxz). Which of the following may be obtained from it by an application of the second-level rule ∃−? (i) Px, ∃z∀y(Ryz) ⊢ Px∧∃z(Rxz), (ii) Px, ∃x∀y(Ryz) ⊢ Px∧∃z(Rxz), (iii) Px, ∃w∀y(Ryw) ⊢ Px∧∃z(Rxz), (iv) Px, ∃w∀y(Ryz) ⊢ Px∧∃z(Rxz), (v) ∃x(Px), ∀y(Ryz) ⊢ Px∧∃z(Rxz).

Solutions to (a) and (c)
(a): (i) and (ii) yes; (iii) no; (iv) yes; (v), (vi) and (vii) no; (viii) and (ix) yes; (x) no.
(c): (i) No; (ii) yes; (iii), (iv) no; (v) yes.


Alice Box: Why the Minus in ∃−?

Alice: I'm a bit bothered by the name of one of these four rules. I can see why you call the first three as you do, but not the fourth. Why the minus sign in ∃−? Surely we are adding the existential quantifier ∃x to α.

Hatter: It is a procedural elimination. Suppose we are given α₁,…,αₙ₋₁, ∃x(α) as premises and we want to infer a conclusion αₙ. The rule tells us that to do that, it suffices to strip off the existential quantifier from ∃x(α) and use α as an additional premise, heading for the same goal αₙ. While we are not inferring α from ∃x(α) (and it would be quite invalid to do so), nevertheless, we are moving procedurally from ∃x(α) to the α in the process of building our deduction.

In this respect, ∃− is similar to the rule ∨− of disjunctive proof in propositional logic. There we are given α₁,…,αₙ, β₁∨β₂ as premises, and we want to infer a conclusion γ. The rule ∨− tells us that to do that, it suffices to strip the disjunction out of β₁∨β₂ and first use β₁ as an additional premise, heading for the same goal γ, then use β₂ as the additional premise, heading again for γ.

9.4.4 Identity

So far we have ignored the identity relation sign. Recall from the preceding section that this symbol is always given a highly constrained interpretation. At the most lax, it is interpreted as a congruence relation over elements of the domain; in the standard semantics used here, it is always taken as the identity relation over the domain. Whichever of these readings one follows, standard or non-standard, some special properties emerge.

As would be expected, these include formulae expressing the reflexivity, symmetry and transitivity of the relation, thus its status as an equivalence relation. To be precise, the following formulae are logically true, in the sense defined in Sect. 9.4.1: ∀x(x ≈ x), ∀x∀y[(x ≈ y) → (y ≈ x)], ∀x∀y∀z[((x ≈ y) ∧ (y ≈ z)) → (x ≈ z)].

A more subtle principle for identity is known as replacement. It reflects the fact that whenever x is identical with y, then whatever is true of x is true of y. Sometimes dubbed the principle of the 'indiscernibility of identicals', its formulation in the language of first-order logic is a little trickier than one might anticipate. To state the rule, let t, t′ be terms and α a formula. Remember that terms may be constants, variables or more complex items made up from constants and/or variables by iterated application of function letters. We say that an occurrence of a term t in a formula α is free iff it does not fall within the scope of a quantifier of α that binds a variable in t. A replacement of t by t′ in α, written α[t′: t], is defined to be the result of taking one or more free occurrences of the term t in α and replacing them by t′. Such a replacement α[t′: t] is said to be clean iff the occurrences of t′ introduced are also free in α[t′: t]. Using these concepts, we can articulate the following principle of replacement:

α, t ≈ t′ ⊢ α[t′: t], provided the replacement is clean.

For example, taking α to be ∃y(Rxy), t to be x and t′ to be z, we have:

∃y(Rxy), x ≈ z ⊢ ∃y(Rzy)

since the introduced variable z is free in ∃y(Rzy). On the other hand, if we take t′ to be y and propose the inference ∃y(Rxy), x ≈ y ⊢ ∃y(Ryy), the introduced occurrence of y is not free in ∃y(Ryy), and so the principle does not authorize the step, which indeed is not valid. Intuitively speaking, the y that is free in the formula x ≈ y has 'nothing to do' with the y that is bound in ∃y(Ryy); that formula is logically equivalent to ∃z(Rzz), which does not contain y at all.

The proof of the principle of replacement is by a rather tedious induction on the complexity of the formula α, which we omit.

Exercise 9.4.4 (1) (with solution)
(a) Put α to be ∃y[R(f(x,a),y)] and t to be f(x,a), which is free in α. Let t₁ be g(x) and t₂ be f(y,z). Write out α[t₁: t] and α[t₂: t], and determine in each case whether the principle authorizes the consequence α, t ≈ tᵢ ⊢ α[tᵢ: t].
(b) Show that the principle of replacement authorizes the consequence x ≈ x, x ≈ y ⊢ y ≈ x.

Solution
(a) In this example, α[t₁: t] is ∃y[R(g(x),y)], in which the introduced g(x) is free, so the replacement is clean. The principle thus authorizes the desired implication. On the other hand, α[t₂: t] is ∃y[R(f(y,z),y)], in which the y of the introduced f(y,z) is bound. So, the principle does not authorize the proposed implication.
(b) Put α to be x ≈ x, t to be x and t′ to be y. Let α[t′: t] replace the first occurrence of t in α by t′. Since there are no quantifiers, the replacement is clean. Thus, we have α, t ≈ t′ ⊢ α[t′: t], that is, x ≈ x, x ≈ y ⊢ y ≈ x.

Actually, symmetry and transitivity for ≈ may be obtained from reflexivity by clever use of replacement. In the case of symmetry, the essential idea behind such a derivation is given in the last exercise. But we will leave the full derivations until we have looked at higher-level proof strategies in the next chapter.

End-of-Chapter Exercises

Exercise 9.1: The language of quantifiers
(a) Return to the ten questions that were raised after Table 9.1 and answer them in the light of what you have learned in this chapter.

(b) Express the following statements in the language of first-order logic. In each case, specify explicitly an appropriate domain of discourse and suitable constants, function letters and/or predicates:
(i) Zero is less than or equal to every natural number.
(ii) If one real number is less than another, there is a third between the two.
(iii) Every computer program has a bug.
(iv) Any student who can solve this problem can solve all problems.
(v) Squaring on the natural numbers is injective but is not onto.
(vi) Nobody loves everybody, but nobody loves nobody.
(c) Let α be the formula ∀x∃y{(y ≈ z) → ∀w∃y[R(f(u,w),y)]}. Identify the free and bound occurrences of variables and terms in it. Identify also the vacuous quantifiers, if any. Which of the substitutions α[x/z], α[y/z], α[w/z], and their three converse substitutions, are clean?

Exercise 9.2: Interpretations
(a) Consider the relation between interpretations of being x-variants (for a fixed variable x). Is it an equivalence relation? And if the variable x is not held fixed? Justify.
(b) Find a formula using the identity predicate, which is true under every interpretation whose domain has less than three elements but is false under every interpretation whose domain has three or more elements.
(c) Sketch an explanation why there is no hope of finding a formula that does not contain the identity predicate, with the same properties.
(d) In the definition of an interpretation, we required that the domain of discourse is non-empty. Would the definition make sense, as it stands, if we allowed the empty domain? How might you reformulate it if you wanted to admit the empty domain? What would the result be for the relationship between ∀ and ∃?

Exercise 9.3: Semantic analysis
(a) Justify the principle of vacuous quantification semantically, first using the substitutional reading of the quantifiers, and then using the x-variant reading. Which, if any, is easier?
(b) Which of the following claims are correct? (i) ∀x(Px → Qx) ⊢ ∃x(Px) → ∃x(Qx), (ii) ∃x(Px → Qx) ⊢ ∃x(Px) → ∃x(Qx), (iii) ∀x(Px → Qx) ⊢ ∃x(Px) → ∀x(Qx), (iv) ∀x(Px → Qx) ⊢ ∀x(Px) → ∃x(Qx) and (v)–(viii) their converses. In the negative cases, give an interpretation in a small finite domain to serve as witness. In the positive cases, sketch a semantic verification using either the x-variant or the substitutional reading of the quantifiers.
(c) Find a simple formula α of quantificational logic such that ∀x(α) is a contradiction but α is not. Use this to get a simple formula β such that ∃x(β) is logically true but β is not. Hint: It is possible to do it without the identity predicate.
(d) Recall from the text that a set A of formulae is said to be consistent iff there is some interpretation that makes all of its elements true. Verify the following: (i) The empty set is consistent. (ii) A singleton set is consistent iff its unique element is not a contradiction. (iii) An arbitrary set A of formulae is consistent iff A ⊬ p∧¬p. (iv) Every subset of a consistent set of formulae is consistent. (v) A finite set of formulae is consistent iff the conjunction of all its elements is consistent.

Selected Reading

The same books as for propositional logic, with different chapters. In detail:
Hein J (2002) Discrete structures, logic and computability, 2nd edn. Jones and Bartlett, Boston, chapter 7
Huth M, Ryan M (2000) Logic in computer science. Cambridge University Press, Cambridge
Ben-Ari M (2001) Mathematical logic for computer science, 2nd edn. Springer, London, chapters 5–7
Gamut LTF (1991) Logic, language, and meaning. Volume I: Introduction to logic. University of Chicago Press, Chicago, chapters 3, 4
Howson C (1997) Logic with trees. Routledge, London/New York, chapters 5–11
Hodges W (1977) Logic. Penguin, Harmondsworth, sections 34–41

A more advanced text, despite its title, is:
Hedman S (2004) A first course in logic. Oxford University Press, Oxford


10 Just Supposing: Proof and Consequence

Abstract
In the last two chapters, we learned quite a lot about propositional and quantificational logic and in particular their relations of logical implication. In this chapter, we look at how simple implications may be put together to make a deductively valid argument or proof. At first glance, this may seem trivial: just string them together! But although it starts like that, it goes well beyond, and is indeed quite subtle.

We begin by looking at the easy process of chaining, which creates elementary derivations, and show how its validity is linked with the Tarski conditions defining consequence relations/operations. We then review several higher-level proof strategies used in everyday mathematics and uncover the logic behind them. These include the strategies traditionally known as conditional proof, disjunctive proof and proof by cases, proof by contradiction and argument to and from an arbitrary instance. Their analysis leads us to distinguish second-level from split-level rules, articulate their recursive structures and explain the informal procedure of flattening a split-level proof into its familiar 'suppositional' form.

10.1 Elementary Derivations

We begin with an example and then see what is required in order to make it work. While reading this section, the reader is advised to have at hand the tables of basic tautological implications and equivalences from Chap. 8.

10.1.1 My First Derivation

Suppose that we are given three propositions as assumptions (usually called premises in logic), and that they are of the form (p∨¬q) → (s → r), ¬s → q and ¬q. We want to infer r. How can we do it?



Fig. 10.1 A labelled derivation tree

This is, in effect, a matter of showing that {(p∨¬q) → (s → r), ¬s → q, ¬q} ⊢ r, so in principle we could use a truth table with 2⁴ = 16 rows or a semantic decomposition tree. But there is another method, which parallels much more closely how we would handle such a problem intuitively. We can apply some basic tautological implications or equivalences with which we are already familiar (e.g. from the tables in Sects. 8.2 and 8.3 of Chap. 8) and chain them together to get the desired conclusion out of the assumptions given.

Specifically, from the third premise, we have p∨¬q by addition (aka ∨+), and from this together with the first premise, we get s → r by modus ponens (→−). But from the third premise again, combined this time with the second one, we have ¬¬s by modus tollens, which evidently gives us s by double negation elimination. Combining this with s → r gives us r by modus ponens, as desired. One way of setting this out is as a labelled derivation tree, as in Fig. 10.1.

Such a tree is conventionally written with leaves at the top and is usually constructed from the leaves to the root. The leaves are labelled by the premises (alias assumptions) of the inference. The parent of a node (or pair of nodes) is labelled by the proposition immediately inferred from it/them. If desired, we may also label each node with the usual name of the inference rule used to get it or, in the case of the leaves, give it the label 'premise'. Note that the labelling function in the example is not quite injective: there are two distinct leaves labelled by the third premise.

Exercise 10.1.1 (1) (with partial solution)
(a) In the tree of Fig. 10.1, label the interior nodes (i.e. the non-leaves) with the rules used.
(b) What happens to the tree in the figure if we merge the two nodes labelled ¬q?

Solution to (b) It would no longer be a tree, only a graph. It turns out to be much more convenient to represent derivations as trees rather than as graphs.

Given the convention of linearity that is inherent in just about all traditional human writing systems, a derivation such as the above would not normally be written out as a tree but as a finite sequence. This is always possible and may be accompanied by annotations to permit unambiguous reconstruction of the tree from the sequence. The process of transforming a derivation from tree to sequence layout may be called squeezing the tree into linear form.


Table 10.1 A derivation as a sequence with annotations

Number  Formula             Obtained from   By rule
1       (p∨¬q) → (s → r)                    Premise
2       ¬s → q                              Premise
3       ¬q                                  Premise
4       p∨¬q                3               Addition
5       s → r               4, 1            Modus ponens
6       ¬¬s                 3, 2            Modus tollens
7       s                   6               Double negation
8       r                   7, 5            Modus ponens

Table 10.1 gives an annotated sequence corresponding to the tree in Fig. 10.1. The sequence itself is in the second column, with its elements numbered in the first column and annotations in the third and fourth columns.
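Such an annotated sequence lends itself to mechanical checking. The following sketch is ours, not the text's: formulae are coded as Boolean functions of a valuation, and the helper entails verifies by brute force over the 2⁴ valuations of p, q, r, s that every non-premise line of Table 10.1 is tautologically implied by the lines cited for it.

```python
from itertools import product

LETTERS = 'pqrs'

def entails(premises, conclusion):
    """Tautological implication, checked over all valuations of the four letters."""
    for bits in product([True, False], repeat=len(LETTERS)):
        v = dict(zip(LETTERS, bits))
        if all(phi(v) for phi in premises) and not conclusion(v):
            return False
    return True

# The eight lines of the derivation, numbered as in Table 10.1.
line = {
    1: lambda v: (v['p'] or not v['q']) <= (v['s'] <= v['r']),  # (p∨¬q)→(s→r); a<=b codes a→b
    2: lambda v: (not v['s']) <= v['q'],                        # ¬s→q
    3: lambda v: not v['q'],                                    # ¬q
    4: lambda v: v['p'] or not v['q'],                          # p∨¬q
    5: lambda v: v['s'] <= v['r'],                              # s→r
    6: lambda v: not (not v['s']),                              # ¬¬s
    7: lambda v: v['s'],                                        # s
    8: lambda v: v['r'],                                        # r
}
cited = {4: [3], 5: [4, 1], 6: [3, 2], 7: [6], 8: [7, 5]}       # the 'obtained from' column

for n, sources in cited.items():
    assert entails([line[i] for i in sources], line[n])
print("every step of Table 10.1 is a tautological implication")
```

Of course, this only checks each individual step semantically; the question of why the steps may then be chained is taken up in Sect. 10.1.2.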

The linear form is the one that the layman first thinks of, with vague memories of proofs done at school. One begins with the premises and works one's way, proposition by proposition, to the conclusion. Building a derivation like this is often called chaining.

Tree displays reveal their structure more readily to the eye, but linearization has practical advantages. It takes up less space on the handwritten page, and it is easier to write down using standard word-processing software, at least with current systems. A further advantage to linearization is that it eliminates the need to repeat propositions that may have occurred at more than one node in the tree. The saving may appear derisory; indeed it is so in our example, which is short and where the repeated formula ¬q is a premise, thus a leaf in the tree. But it is easy to imagine a long derivation in which a proposition is appealed to twice (or more) near the end. In tree form, this will be represented by a large tree with a node repeated near the root, and that forces repetition of the entire subtree above it from the node in question to the leaves from which it was obtained. In such cases, the amount of repetition can become very significant.

Exercise 10.1.1 (2) On the basis of the example considered in the above figure and table, give in words a simple algorithm for squeezing a derivation tree into an annotated derivation sequence that (1) 'preserves the order', in the sense that every node of the tree is preceded, in the sequence, by its children, and (2) does not repeat any propositions.

10.1.2 The Logic Behind Chaining

This is all very easy, but it gives rise to a rather subtle question. What guarantee do we have that when each of the individual steps in a derivation sequence/tree is valid (i.e. corresponds to a tautological implication) then the sequence as a whole is too? In other words, how can we be sure that the last item/root is implied by the premises taken together, no matter how distant? In the example, how can we be sure that the propositions (1) through (3) given as premises really do tautologically imply the end proposition (8)? The first answer that comes to mind is that we are just applying the transitivity of ⊢. But while transitivity is indeed involved, there is more to it than that.

In our example, we want to show that each formula α in the sequence is tautologically implied by the three premises (1) through (3) taken together. We consider them one by one. First, we want to check that formula (1) is implied by the set {1,2,3}. Clearly, this requires a principle even more basic than transitivity, namely, that a proposition is a consequence of any set of propositions of which it is an element. That is, we need the following principle of strong reflexivity:

A ⊢ α whenever α ∈ A.

Propositions (2) and (3) are checked in a similar manner. What about (4)? We saw that it is tautologically implied by (3), but we need to show that it is tautologically implied by the set {1,2,3} of propositions. For this we need a principle called monotony. It tells us that whenever γ is a consequence of a set A, then it remains a consequence of any set obtained by adding any other propositions to A. In other words, increasing the stock of premises never leads us to drop a conclusion:

B ⊢ β whenever both A ⊢ β and A ⊆ B.

In our example, A = {3}, β = (4) and B = {1,2,3}. What about (5)? In the annotations, we noted that it is tautologically implied by {1,4}, so monotony tells us that it is implied by {1,2,3,4}. But again we need to show that it is tautologically implied by {1,2,3}, so we need to 'get rid of' (4). We are in effect using the following principle, called cumulative transitivity, where, in our example, A = {1,2,3}, B = {4} and γ = (5).

A ⊢ γ whenever both A ⊢ β for all propositions β ∈ B and A ∪ B ⊢ γ.

All three of these principles – strong reflexivity, monotony and cumulative transitivity – are needed to ensure that when each individual step in a sequence is justified by an inference relation ⇒ (in this example, the relation ⊢ of tautological implication), then so too is the sequence as a whole. The surprising thing is that the three are not only necessary for the job; together they are sufficient! Inference relations satisfying them are thus very important and are called consequence relations. In the next section, we study them closely and prove that they suffice to guarantee the validity of chaining.

Exercise 10.1.2 (with partial solution)
(a) Complete the enumeration for the sequence in Table 10.1 by considering formulae (6), (7), (8) in turn and noting the application of the principles of strong reflexivity, monotony and/or transitivity.


(b) In Chap. 8, we used the method of decomposition trees to show that {¬q∨p, ¬r → ¬p, s, s → ¬r, t → p} ⊢ ¬t∧¬q. Now do it by constructing an elementary derivation, setting out the result first as a labelled derivation tree, and then as an annotated derivation sequence.

Partial solution to (a) To check that {1,2,3} ⊢ (6), we first note that {2,3} ⊢ (6) (by modus tollens, as recorded in the annotation) and then apply monotony. To check that {1,2,3} ⊢ (7), we first note that (6) ⊢ (7) (by double negation elimination as recorded in the annotation), so {1,2,3,6} ⊢ (7) by monotony. But we already know that {1,2,3} ⊢ (6), so we may apply cumulative transitivity to get {1,2,3} ⊢ (7), as desired. The last check, that {1,2,3} ⊢ (8), is done in similar fashion.

10.2 Consequence Relations

We now abstract on the material of Sect. 10.1 to define the general concept of a consequence relation, establish its relationship to elementary derivations and show how it may conveniently be expressed as an operation.

10.2.1 The Tarski Conditions

Let ⇒ be any relation between sets of formulae on its left (the formulae coming from the language of propositional, quantificational or some other logic) and individual formulae (of the same language) on its right. It is called a consequence relation iff it satisfies the three conditions that we isolated in the previous section:

Strong reflexivity: A ⇒ α whenever α ∈ A.

Monotony: Whenever A ⇒ β and A ⊆ B, then B ⇒ β.

Cumulative transitivity: Whenever A ⇒ β for all β ∈ B and A∪B ⇒ γ, then A ⇒ γ.

A few words on variant terminologies may be useful. Strong reflexivity is sometimes called 'identity', though that could possibly give rise to confusion with principles for the identity relation in first-order logic. It is sometimes also called (plain) reflexivity, although we are already using the term in its standard and weaker sense that β ⇒ β (when the left side is a singleton set, we omit the braces to reduce clutter). 'Monotony' is sometimes written 'monotonicity'. Cumulative transitivity is often called 'cut'.

The three conditions are often called the Tarski conditions, in honour of Alfred Tarski, who first realized their joint importance. Algebraically-minded logicians often call consequence relations logical closure relations, as they are just a particular case of the closure relations that arise in abstract algebra.
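When the stock of formulae is finite, the three conditions can even be checked exhaustively by machine. The sketch below is not from the text: it fixes a toy three-element 'language', defines a little inference relation by hand (the names implies and is_consequence_relation are ours), and confirms by brute force that it satisfies strong reflexivity, monotony and cumulative transitivity.

```python
from itertools import combinations

UNIVERSE = frozenset({'p', 'q', 'p_and_q'})

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(sorted(s), r)]

def implies(A, x):
    """A hand-made toy relation: x follows from A iff x ∈ A, or x is 'p_and_q'
    and both 'p' and 'q' are in A, or x is 'p' or 'q' and 'p_and_q' is in A."""
    if x in A:
        return True
    if x == 'p_and_q':
        return 'p' in A and 'q' in A
    return 'p_and_q' in A and x in ('p', 'q')

def is_consequence_relation(imp):
    sets = subsets(UNIVERSE)
    strong_reflexivity = all(imp(A, x) for A in sets for x in A)
    monotony = all(imp(B, x) for A in sets for B in sets if A <= B
                   for x in UNIVERSE if imp(A, x))
    cut = all(imp(A, y) for A in sets for B in sets
              if all(imp(A, x) for x in B)
              for y in UNIVERSE if imp(A | B, y))
    return strong_reflexivity and monotony and cut

print(is_consequence_relation(implies))    # True: the toy relation passes all three
```

Swapping in the empty relation (lambda A, x: False) makes the strong reflexivity check fail while monotony and cut still pass, in line with the partial solution to Exercise 10.2.1 (1)(c) below.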

Exercise 10.2.1 (1) (with partial solution)
(a) Show that both tautological implication in propositional logic and logical implication in quantificational logic are indeed consequence relations. Hint: You can use a single argument to cover both. Consider any valuation v, without bothering to specify which of the two languages you are considering.

(b) Show that (i) strong reflexivity immediately implies plain reflexivity and (ii) plain reflexivity together with monotony implies strong reflexivity.
(c) Give a simple example of a relation (between sets of formulae on the left and formulae on the right) that satisfies both monotony and cumulative transitivity but does not satisfy reflexivity.
(d) Show that monotony may be expressed equivalently in either of the following two ways: (i) whenever A ⇒ γ, then A∪B ⇒ γ; (ii) whenever A∩B ⇒ γ, then A ⇒ γ.

Partial solutions to (a) and (c)
(a) We check out cumulative transitivity. Suppose A ⇒ β for all β ∈ B and A∪B ⇒ γ; we need to show A ⇒ γ. Let v be any valuation, and suppose v(α) = 1 for all α ∈ A; we need to get v(γ) = 1. By the first supposition (and the definition of tautological/logical implication), we have v(β) = 1 for all β ∈ B. Thus, v(χ) = 1 for all χ ∈ A∪B. Hence, by the second supposition, v(γ) = 1. The other two verifications are even easier.
(c) The empty relation is the simplest.

Although, as verified in the above exercise, classical logical implication satisfies all three of the Tarski conditions, it should be appreciated that neither monotony nor cumulative transitivity is appropriate (in unrestricted form) for relations of uncertain inference. In particular, monotony fails for both qualitative and probabilistic accounts of uncertain reasoning, and while cumulative transitivity is acceptable under some qualitative perspectives, it too fails probabilistically. But that goes beyond our present concerns.

Exercise 10.2.1 (2) (with partial solution)
(a) Sketch an intuitive example of non-deductive reasoning (say, an imaginary one concerning the guilt of a suspect in a criminal case) that illustrates how the principle of monotony may fail in such contexts.
(b) Is plain transitivity always appropriate for uncertain inference? Give an intuitive counterexample or explanation.

Solution to (b) Plain transitivity fails for uncertain inference. One way of illustrating this is in terms of an intuitive notion of risk. When we pass from α to β non-deductively, there is a small risk of losing truth, and passage from β to γ may likewise incur a small risk. Taking both steps compounds the risk, which may thereby exceed a reasonable threshold that each of the separate ones respects. In effect, the strength of the chain may be even less than the strength of its weakest link.


10.2.2 Consequence and Chaining

We are almost ready to articulate in a rigorous manner the fact that chaining is always justified for consequence relations. First we define, more carefully than before, the notion of an elementary derivation.

Let ⇒₀ be any relation between sets of formulae and individual formulae of a formal language. To guide intuition, think of ⇒₀ as (the union of the instances of) some small finite set of rules like ∧+, ∨+, →−, or others that you select from the logical implication tables of Chaps. 8 and 9, or yet others that you find attractive. Let A be any set of formulae, α an individual formula. We say that a finite sequence σ = α₁,…,αₙ of formulae is an elementary derivation of α from A using the relation ⇒₀ iff (1) αₙ = α and (2) for every i ≤ n, either αᵢ ∈ A or B ⇒₀ αᵢ for some B ⊆ {α₁,…,αᵢ₋₁}.

This definition is in terms of sequences. It may equivalently be expressed in terms of trees. We can say that a finite tree τ whose nodes are labelled by formulae is an elementary derivation of α from A using the relation ⇒₀ iff (1) its root is labelled by α, (2) every leaf node in the tree is labelled by a formula in A and (3) each non-leaf node x is labelled by a formula β such that the labels of the children of x form a set B of formulae with B ⇒₀ β. Visually, the tree representation is more graphic, and conceptually it is also clearer; but verbally, it is more complex to formulate and use. We will speak in both ways, according to which is handier on the occasion.
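The sequence version of the definition translates almost word for word into a checking procedure. The following is a sketch of ours, not part of the text: the base relation ⇒₀ is supplied as an explicit finite list of (premise-set, conclusion) instances, here just the five instances used in Table 10.1.

```python
def is_elementary_derivation(sequence, A, alpha, rule_instances):
    """Check clauses (1) and (2): the last formula is alpha, and each formula is
    either in A or related by the base relation to a set of earlier formulas."""
    if not sequence or sequence[-1] != alpha:
        return False                              # clause (1)
    for i, formula in enumerate(sequence):
        earlier = set(sequence[:i])
        ok = formula in A or any(
            conclusion == formula and premises <= earlier
            for premises, conclusion in rule_instances)
        if not ok:
            return False                          # clause (2) fails at position i
    return True

# The derivation of Table 10.1, with the base relation containing just the instances used there.
A = ['(p∨¬q)→(s→r)', '¬s→q', '¬q']
rules = [
    ({'¬q'}, 'p∨¬q'),                             # addition
    ({'p∨¬q', '(p∨¬q)→(s→r)'}, 's→r'),            # modus ponens
    ({'¬q', '¬s→q'}, '¬¬s'),                      # modus tollens
    ({'¬¬s'}, 's'),                               # double negation
    ({'s', 's→r'}, 'r'),                          # modus ponens
]
sigma = A + ['p∨¬q', 's→r', '¬¬s', 's', 'r']
print(is_elementary_derivation(sigma, A, 'r', rules))   # True
```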

The soundness theorem for chaining tells us: For any inference relation ⇒ and subrelation ⇒₀ of ⇒, if (i) there is an elementary derivation of α from A using ⇒₀, and (ii) ⇒ satisfies the three Tarski conditions, then we have A ⇒ α. The proof of this is easy once you see it, but it can be difficult for a student to set it up from scratch, so we give it in full detail. It is most simply expressed in terms of sequences (rather than trees) and proceeds by cumulative induction on the length of the derivation (i.e. the number of terms in the sequence). The reader is advised to refresh memories of cumulative induction from Sect. 4.5.2. In particular, it should be noted that in this form of induction, we do not need a base (although we can put one in if we want), only an induction step.

Let ⇒₀ be any relation between sets of formulae and individual formulae of a formal language. Suppose that ⇒₀ ⊆ ⇒ and that ⇒ is a consequence relation (i.e. satisfies the three Tarski conditions). We want to show that for any elementary derivation σ = α₁,…,αₙ of α from A using the relation ⇒₀, we have A ⇒ α. (Note that we are not trying to show that A ⇒₀ α; in general, that would fail.) Suppose that this holds for all i < n. Now, by the definition of an elementary derivation, either αₙ ∈ A or B ⇒₀ αₙ for some B ⊆ {α₁,…,αₙ₋₁}. In the former case, we have A ⇒ α by strong reflexivity of ⇒. In the latter case, we have B ⇒ αₙ by the supposition that ⇒₀ ⊆ ⇒, and thus in turn (1) A∪B ⇒ αₙ by monotony of ⇒. But each αᵢ ∈ B is obtained by an elementary derivation of length i < n. So by our induction hypothesis, (2) A ⇒ αᵢ for each αᵢ ∈ B. Applying cumulative transitivity to (1) and (2) gives us A ⇒ α, as desired.


Exercise 10.2.2 (1) Uneasy about not having a base in this induction? Add it in. That is, show that the soundness theorem for chaining holds (trivially) for derivations of length 1.

Alice Box: What Did We Use in This Proof?

Alice: This little proof does not seem to use the full power of monotony or cumulative transitivity.
Hatter: How is that?
Alice: When applying cumulative transitivity, we needed only the case that B is finite, although A may be infinite. And similarly, when applying monotony understood in the form 'whenever A ⇒ γ, then A∪B ⇒ γ', only the case that B is finite was needed. I wonder whether there is any significance in that!
Hatter: I have no idea…

Eagle eye! Alice is going beyond the boundaries of the text, but we can say a few words for those who are interested. Indeed, we don't quite need the full force of the Tarski conditions to establish the soundness theorem for chaining. As Alice noticed, we can do it with restricted versions of monotony and cumulative transitivity, in which certain sets are finite. For many inference relations ⇒, including classical consequence in both propositional and quantificational logic, the full and restricted versions of those Tarski conditions are in fact equivalent, because those relations are compact in the sense that whenever A ⇒ α, then there is some finite subset A′ ⊆ A with A′ ⇒ α. But for relations ⇒ that are not compact, the restricted versions of monotony and cumulative transitivity can sometimes be strictly weaker than their full versions.

Another lesson can be drawn from Alice's observation. It concerns a possible converse for the soundness theorem for chaining. That would be a theorem to the effect that if ⇒ fails one of the three Tarski conditions, then chaining with a subrelation ⇒₀ may lead us outside ⇒. It could be called a completeness theorem for chaining. Such a result can very easily be proven, but only for the restricted versions of the Tarski conditions, or for relations ⇒ that are compact in the sense defined above. But this is beyond the syllabus, and if you have trouble understanding it, just skip the next exercise.

Exercise 10.2.2 (2) (optional, for confident students only)*
(a) Explain why the converse of compactness holds trivially for every consequence relation.
(b) Show that if a relation ⇒ is both compact and monotonic, then, whenever A is infinite and A ⇒ β, there are infinitely many finite subsets A′ ⊆ A with A′ ⇒ β.
(c) Formulate and prove a completeness theorem for chaining, using the full versions of the Tarski conditions but restricting the statement of the theorem to relations ⇒ that are compact.
(d) Do the same but with the restricted versions of monotony and cumulative transitivity instead of requiring compactness.

10.2.3 Consequence as an Operation*

One can express the notion of logical consequence equivalently as an operation instead of a relation. This is often convenient, since its behaviour can usually be stated rather more succinctly in operational terms. On the other hand, the operational versions do take a bit of time to get used to.

Let A be any set of propositions, expressed in a formal language like propositional or first-order logic, and let ⇒ be any inference relation for that language. We define Cn quite simply as gathering together the propositions β such that A ⇒ β. That is, Cn(A) = {β: A ⇒ β}. The function Cn thus takes sets of propositions to sets of propositions; it is on P(L) into P(L), where L is the set of all propositions of the language under consideration and P is the power-set operation, familiar from the chapter on sets. The notation for Cn is chosen to recall the word 'consequence'.

When ⇒ is a consequence relation, that is, satisfies the three Tarski conditions, then the operation Cn has the following three properties:

Inclusion: A ⊆ Cn(A)
Idempotence: Cn(A) = Cn(Cn(A))
Monotony: Cn(A) ⊆ Cn(B) whenever A ⊆ B

These properties correspond to those for consequence as a relation and can be derived from them. Again, they are called Tarski conditions, and consequence operations are known to the algebraically or topologically inclined as closure operations. In topology, however, closure operations are sometimes required to satisfy the equality Cn(A)∪Cn(B) = Cn(A∪B), which does not hold for classical consequence and the logical operations in general.

We distinguished notationally between an arbitrary consequence relation ⇒ and the particular relation of classical consequence ⊢. Similarly, while we write Cn for an arbitrary consequence operation, Cn⊢ stands for the specific operation of classical consequence. By the definition, we have Cn⊢(A) = {β: A ⊢ β}.
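Of course, Cn⊢(A) is an infinite set, so it cannot be computed outright; but its behaviour can be observed on a finite scale. The sketch below is not part of the text: it restricts attention to a handful of candidate formulae over the letters p and q (the names CANDIDATES, entails and Cn are ours), giving only a finite approximation of the operation, and then checks inclusion, idempotence and monotony on a small example.

```python
from itertools import product

CANDIDATES = {
    'p':   lambda v: v['p'],
    'q':   lambda v: v['q'],
    'p∧q': lambda v: v['p'] and v['q'],
    'p∨q': lambda v: v['p'] or v['q'],
    'p→q': lambda v: (not v['p']) or v['q'],
}

def entails(A, beta):
    """A ⊢ beta, checked over every valuation of p and q."""
    return all(CANDIDATES[beta](dict(zip('pq', bits)))
               for bits in product([True, False], repeat=2)
               if all(CANDIDATES[a](dict(zip('pq', bits))) for a in A))

def Cn(A):
    """Cn(A) = {beta : A ⊢ beta}, cut down to the finite candidate stock."""
    return {beta for beta in CANDIDATES if entails(A, beta)}

A = {'p'}
print(Cn(A))                       # {'p', 'p∨q'} (in some order)
print(A <= Cn(A))                  # inclusion
print(Cn(Cn(A)) == Cn(A))          # idempotence
print(Cn(A) <= Cn(A | {'q'}))      # monotony
```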

Exercise 10.2.3 (with partial solution)
(a) Give a very simple example to show that (i) classical consequence does not satisfy the equality Cn⊢(A)∪Cn⊢(B) = Cn⊢(A∪B). Show that nevertheless, (ii) the LHS ⊆ RHS half of it holds for every consequence relation, including classical consequence.
(b) Show that the three Tarski conditions for logical consequence as an operation follow from those for consequence relations, via the definition Cn(A) = {β: A ⇒ β} given in the text.


(c) In your answer to (b), did you obtain idempotence from cumulative transitivity alone, or did you need the cooperation of one of the other conditions for consequence relations?
(d) What would be the natural way of defining consequence as a relation from consequence as an operation?
(e) Show that the three Tarski conditions for logical consequence as a relation follow from those for consequence operations, via the definition given in answer to (d).
(f) Express the equality Cn{α}∩Cn{β} = Cn({α∨β}) in terms of consequence relations ⇒. Show that it holds for classical consequence. Remark: You may answer in terms of ⊢ or of Cn⊢.
(g) Express in terms of consequence relations the principle that when A is finite, Cn(A) = Cn(∧A), where ∧A is the conjunction of all the elements of A. Show that it holds for classical consequence. Remark: Same as before.

Solutions to (a) and (d)
(a) (i) Take two distinct elementary letters p, q and put A = {p}, B = {q}. Then p∧q ∈ RHS but p∧q ∉ LHS. (ii) A ⊆ A∪B and also B ⊆ A∪B; so by monotony, Cn(A) ⊆ Cn(A∪B) and also Cn(B) ⊆ Cn(A∪B), so Cn(A)∪Cn(B) ⊆ Cn(A∪B).
(d) Put A ⇒ β iff β ∈ Cn(A).

10.3 A Higher-Level Proof Strategy

In this section and the next, we review some of the proof strategies that are used in traditional mathematical reasoning. These include conditional proof, disjunctive proof and its variant proof by cases, proof by contradiction and proof to and from an arbitrary instance. They are typically marked by terms like 'suppose', 'let x be an arbitrary such-and-such' or 'choose any so-and-so'. From a logical point of view, they make use of higher-level rules as well as the implications deployed in chaining, which, for contrast, we can refer to as first-level rules.

This section focuses on the strategy known as conditional proof. We already said a few words about it in a logic box (Chap. 1, Sect. 1.2.2), but now we put it under the microscope and analyze the logic behind it. We do so slowly, as conditional proof serves as an exemplar. After examining it carefully, we can describe the other higher-level strategies more succinctly in the following section.

10.3.1 Informal Conditional Proof

Consider a situation where we want to prove a conditional proposition β → γ given background propositions α₁,…,αₙ that we are willing to take as known, say as axioms or theorems already established. The standard way of doing this is to make a supposition and change the goal. We suppose the antecedent β of the conditional and seek to establish, on this basis, the consequent γ of that conditional. If we manage to get γ out of {α₁,…,αₙ, β}, then our proof is complete. From that point on, we disregard the supposition; it is like a ladder that we kick away after we have climbed to our destination. This kind of reasoning is known as conditional proof.

In everyday mathematics, it is relatively rare to prove a bare conditional β → γ; usually there is an initial universal quantifier as well. For example, we might want to prove (using facts we already know) that whenever a positive integer n is even, then so is n², which is of the form ∀n(En → En²) taking the set of all positive integers as domain. To prove that, we begin by stripping off the initial quantifier 'whenever': we first let n be an arbitrary positive integer, and only then suppose that n is even and go on to establish that n² is even. However, in our analysis, we want to separate out the moves made for different logical connectives. Proof methods for the quantifiers will be discussed later; for now we consider an example in which conditional proof is the only method going beyond familiar chaining.

Take three premises (p∧q) → r, t → ¬s, r → (t∨u) (these are the α₁,…,αₙ of our schematic description) and try to prove (p∧s) → (q → u) (which is β → γ in the scheme). What do we do? We suppose p∧s (β in the scheme) and reset our goal as q → u (γ in the scheme). That happens still to be a conditional, so we may iterate the procedure by supposing its antecedent q, with the desired conclusion now reset as u. If we succeed in getting u out of the five assumptions α₁,…,α₅ (three initial premises plus two suppositions), then our proof is complete. Filling in the rest is straightforward; we make use of some of the tautological implications in Chap. 8 to construct an elementary derivation that does the job.

Exercise 10.3.1 (1) (with sketch of a solution) Write out the above proof in full, including the steps described as straightforward.

Sketch of a solution Make your suppositions as above, and try applying simplification, conjunction, modus ponens, modus tollens and disjunctive syllogism, some of these perhaps more than once. Some variation in order of application is possible.

Sometimes, when we want to prove a conditional β → γ, it is more convenient to prove its contrapositive ¬γ → ¬β. In that case, we apply the same method to ¬γ → ¬β: we suppose ¬γ and try to get ¬β. If we succeed, we have proven ¬γ → ¬β from whatever the original premises were, and so we can finally apply the first-level implication of contraposition to end with the desired β → γ. This procedure is often known as contraposed or indirect conditional proof. However, it is important to realize that its central feature is still plain conditional proof, and for the purposes of logical analysis, we continue to focus on that. A text more concerned with heuristics than logic would give the contraposed version more attention.

Exercise 10.3.1 (2) Carry out the same proof by first supposing p∧s and then supposing (contrapositively) ¬u, heading for ¬q. The choice and/or order of tautological implications applied may differ slightly.


Alice Box: Why Bother with Suppositions?

Alice: Why bother making suppositions? Can't we do it without them?
Hatter: How?
Alice: Well, by using a truth table or a semantic decomposition tree.
Hatter: The last example has 6 elementary letters, so the table will have 2⁶ = 64 rows. The decomposition tree will also be quite big. You can do it, but it is neither elegant nor convenient.
Alice: Well, by an elementary derivation, without suppositions?
Hatter: Try it. It will be tricky to find, longer and less transparent.
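The Hatter's 64-row table is, of course, no trouble for a machine. The following sketch (ours, not the text's) lets a short loop confirm that the three premises together with the two suppositions p∧s and q tautologically imply u; by two applications of conditional proof, that is exactly what is needed for the target (p∧s) → (q → u).

```python
# Brute-force check of {(p∧q)→r, t→¬s, r→(t∨u), p∧s, q} ⊢ u over all 2^6 valuations.
from itertools import product

LETTERS = 'pqrstu'

def holds(v):
    premises = [
        (not (v['p'] and v['q'])) or v['r'],          # (p∧q)→r
        (not v['t']) or (not v['s']),                 # t→¬s
        (not v['r']) or (v['t'] or v['u']),           # r→(t∨u)
        v['p'] and v['s'],                            # supposition p∧s
        v['q'],                                       # supposition q
    ]
    return (not all(premises)) or v['u']              # if all premises hold, so must u

rows = [dict(zip(LETTERS, bits)) for bits in product([True, False], repeat=6)]
print(len(rows), all(holds(v) for v in rows))          # 64 True
```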

10.3.2 Conditional Proof as a Formal Rule

Let's articulate the logical rule underlying the procedure, expressing it in terms of an arbitrary inference relation ⇒. Quite simply, it is that A ⇒ (β → γ) whenever A∪{β} ⇒ γ. Writing a slash for the central transition, it may be expressed as

[A ∪ {β} ⇒ γ] / [A ⇒ (β → γ)]

Omitting the square brackets as understood and abbreviating A∪{β} as A, β:

A, β ⇒ γ / A ⇒ (β → γ).

In the rule, A ⇒ β → γ is called the principal (or main) inference, while the one from which that is obtained, A, β ⇒ γ, is called the subordinate inference (or subproof). The proposition β is the supposition of the subordinate inference and becomes the antecedent of the conclusion of the principal inference. Like the informal procedure that it underlies, the rule as a whole is known as conditional proof, with acronym CP. It is often called →+ because an arrow is introduced into the conclusion of the principal inference. Sometimes, for display purposes, instead of using a slash, a line is drawn below the subordinate inference and the principal one is written below.

There are several important points that we should immediately note about conditional proof. They concern its shape, status and benefits and apply, mutatis mutandis, to the other higher-level rules that will be explained in the following section:
• In its present formulation, it is a second-level rule, in the sense that it takes us from the validity of an entire inference A, β ⇒ γ to the validity of another entire inference A ⇒ β → γ. In this respect, it contrasts with the tautological implications and equivalences in tables of Chap. 8, all of which take us from various propositions as premises to a proposition as conclusion.


• It is correct when the relation ⇒ is classical consequence, that is, tautological or first-order logical implication ⊢. That is the only specific context that we will be considering in this chapter, but we continue to formulate conditional proof in a general way using the double arrow, and note in passing that the rule is also correct for many other consequence relations in more complex or variant languages.
• It makes life easier when carrying out a deduction, because it gives us more premises to play around with. When the principal inference A ⇒ β → γ has n premises, the subordinate inference A, β ⇒ γ has n + 1. That gives us one more premise to grab. If conditional proof is iterated, as in our example, then each application makes another premise available.
• Finally, the conclusion γ of the subordinate argument has one less occurrence of a connective than the conclusion β → γ of the principal argument. In this way, it reduces the question of obtaining a conditional conclusion to that of getting a logically simpler one.
Not all of the higher-order rules that follow actually increase the number of assumptions in the subordinate inference: one of them (for the universal quantifier) leaves the premise-set unchanged, making no supposition. But in all of them, it is possible to formulate the rule in such a way that the premise-set of the subordinate inference is a superset (usually but not always proper) of the premise-set of the principal one. To mark this property, we say that the rules are incremental.

Finally, we observe that conditional proof may be written in another way, as a split-level rule taking propositions plus an entire inference to a proposition. What is the difference? Whereas the second-level formulation tells us that whenever the inference A, β ⇒ γ is valid, then so is the inference A ⇒ β → γ, the split-level presentation tells us that whenever all propositions in A are true and the inference A, β ⇒ γ is valid, then the proposition β → γ is true. In a brief display, where we have added a semicolon and restored some brackets for readability:

A; (A, β ⇒ γ) / (β → γ).

The difference between the two formulations is mathematically trivial; we have only moved the A around. But it can be helpful, as we will see shortly. It is rather like working with the left or right projection of a two-argument function instead of the function itself. In both cases, we have a notational reformulation based on a trivial equivalence, which is surprisingly useful. All of the second-level rules that will be considered can be rewritten in a split-level version.
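For readers who like to see the semantic content of such a rule in executable form, here is a minimal Python sketch, not from the text, that checks one propositional instance of conditional proof by brute-force enumeration of valuations (the premise set and formulas are chosen purely for illustration): the subordinate inference comes out classically valid, and so does the principal one, exactly as the rule promises.

```python
# A minimal sketch (assumptions: illustrative atoms p, q, r and sample formulas only).
# Formulas are represented as Python functions from a valuation (a dict) to a bool.
from itertools import product

ATOMS = ["p", "q", "r"]

def entails(premises, conclusion):
    """Classical consequence, checked by enumerating all valuations of ATOMS."""
    for values in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(phi(v) for phi in premises) and not conclusion(v):
            return False
    return True

# Sample instance: A = {p -> q}, beta = q -> r, gamma = p -> r.
A = [lambda v: (not v["p"]) or v["q"]]
beta = lambda v: (not v["q"]) or v["r"]
gamma = lambda v: (not v["p"]) or v["r"]
beta_arrow_gamma = lambda v: (not beta(v)) or gamma(v)

assert entails(A + [beta], gamma)        # the subordinate inference A, beta => gamma is valid ...
assert entails(A, beta_arrow_gamma)      # ... and so, as CP predicts, is the principal A => beta -> gamma
```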

Exercise 10.3.2
(a) Show that, as claimed in the text, conditional proof is correct for classical consequence. Hint: You can consider the second-level or the split-level version. You may also use a single argument to cover both propositional and first-order consequence: consider any valuation v, without bothering to specify which of the two languages you are considering.


(b) The converse of the second-level rule of conditional proof reverses its principal and subordinate inferences; that is, its principal inference is the old subordinate one and vice versa. Write this converse explicitly, rewrite it in split-level form and show that it is classically correct.

(c) Explain in informal terms why converse conditional proof is of little interest.

10.3.3 Flattening Split-Level Proofs

It is difficult to combine second-level rules directly with first-level ones, as the premises and conclusion of the latter are all propositions, while the premises and conclusion of the former are all inferences. But split-level rules cooperate easily with first-level ones. Since the output and some of the inputs of the split-level rule are propositions, we can use them as premises or conclusions of first-level rules. A proof using both split-level and first-level rules will have a mix of nodes: some will be propositions and some will be summary records (premises and associated conclusions) of entire inferences.

When all the higher-level rules used are incremental, then we can rewrite a split-level proof as a sequence (or tree, but we will focus on sequences) of propositions only, by a process called flattening. At the nodes that were labelled by summary records of entire inferences, we insert a sequence of propositions beginning with the additional premises (if any) of the subproof, calling them suppositions, and continuing with the propositions of the subproof until its goal is reached, at which point the supposition is cancelled (or as is often said, discharged).

In the particular case of conditional proof, we add to the premises A of the main proof the further premise β of the subproof, calling it a supposition, and make free use of it as well as the original premises in A until we reach the goal γ of the subproof, where we annotate to record the discharge of the supposition, and then conclude β → γ by the rule of conditional proof on the basis of the original premises A alone. Whereas the split-level proof has the form:

Given the premises in A, and wanting to get β → γ,
carry out separately a proof of γ from A∪{β}, and then
conclude β → γ from A alone;

its flattened or suppositional version has the form:

Given the premises in A, and wanting to get β → γ,
add in β as a further premise, called a supposition,
argue to γ,
discharge (i.e. unsuppose) β,
conclude β → γ on the basis of A.

What is the advantage? It cuts down on repetition: we do not have to repeat all the premises in A. That may seem a very meagre gain, but in practice, it makes things much more manageable, especially with multiple subproofs.

All of this applies to the other higher-level strategies that we will shortly be explaining. In each case, the higher-level rule is incremental, so that flattening becomes possible and eliminates the need to repeat premises.


Exercise 10.3.3 (with sketch of solution) In Exercise 10.3.1 (1), we constructed an informal argument using conditional proof. It was expressed in traditional form with suppositions, that is, as the flattened version of a split-level proof. Reconstruct the full split-level proof that lies behind it, and compare the amount of repetition.

Sketch of Solution In the principal proof (call it A), where we supposed p∧s, you need to write a summary record (premises and conclusion) of a subproof and also write out that subproof (call it B) separately. In the subproof, where we supposed q, you should write a summary record (premises and conclusion) of a sub-subproof and give separately the sub-subproof (call it C).

Thus, flattening, like squeezing (Sect. 10.1), reduces repetition in a proof. But the two processes should not be confused:

• Squeezing may be applied to any derivation in tree form, even an elementary one. By squeezing a tree into a sequence, we avoid the need to repeat propositions that are appealed to more than once, which also eliminates repetition of the subtrees above them.

• On the other hand, flattening transforms arguments combining first-level with split-level rules (whether in tree or sequence form) into structures that superficially resemble elementary derivations in that they contain only propositions, with the application of higher-level rules signalled by markers such as 'suppose'.

Flattening split-level proofs into suppositional form has the advantage of streamlining, but it has the conceptual side effect of moving the recursive mechanism out of the spotlight to somewhere at the back of the stage. A good suppositional presentation should retain enough annotative signals to let the reader know where the recursions are and to permit their recovery in full if needed.

10.4 Other Higher-Level Proof Strategies

Now that we have a clear picture of what is going on when one carries out a conditional proof, we can review more easily other higher-level proof strategies, namely, disjunctive proof (and the closely related proof by cases), proof by contradiction and proofs to and from arbitrary instances. We consider each in turn.

10.4.1 Disjunctive Proof and Proof by Cases

We said a little about disjunctive proof in a logic box of Chap. 1; here we dig deeper. Consider a situation where we have among our premises, or have already inferred, a disjunction β1∨β2. We wish to establish a conclusion γ. The disjunction doesn't give us much definite information; how can we make good use of it?

One way is to translate it into a conditional ¬β1 → β2 and try to apply modus ponens or modus tollens. But that will work only when our information is sufficient to logically imply one of ¬β1 and ¬β2, which is not often the case. But we can tackle the problem in another way: divide and rule. We first suppose β1 and try to get the same conclusion γ as before. If we succeed, we forget about that supposition, and suppose instead β2, aiming again at γ. If we succeed again, the proof is complete. This is disjunctive proof, acronym DP, code name ∨−.

Again, it is not usual for disjunctive proof to occur in isolation in a mathematical argument. More commonly, the information available will be of a form like ∀x(β1∨β2) or ∀x[α → (β1∨β2)], and we have to peel off the outer operators before working on the disjunction. For example, we might wish to show that for any sets A, B, P(A)∪P(B) ⊆ P(A∪B). We first let A, B be arbitrary sets, suppose for conditional proof that X ∈ P(A)∪P(B) and reset our goal as X ∈ P(A∪B). From the supposition, we know that either X ∈ P(A) or X ∈ P(B), so we are now in a position to apply disjunctive proof. We first suppose that X ∈ P(A) and argue to our current goal X ∈ P(A∪B); then we suppose that X ∈ P(B) and do the same. If we succeed in both, we are done.
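As a quick sanity check, though of course not a substitute for the proof itself, the following small Python sketch (not from the text; the sample sets are arbitrary) confirms the inclusion P(A)∪P(B) ⊆ P(A∪B) on concrete finite sets, and incidentally shows that the converse inclusion can fail.

```python
# A small empirical check on concrete finite sets (illustrative sets only, not a proof).
from itertools import chain, combinations

def powerset(s):
    """All subsets of s, returned as a set of frozensets."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

A, B = {1, 2}, {2, 3}
assert powerset(A) | powerset(B) <= powerset(A | B)       # the inclusion argued for above
assert not powerset(A | B) <= powerset(A) | powerset(B)   # the converse inclusion fails for these sets
```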

Exercise 10.4.1 (1) (with partial solution)
(a) Fill in the missing parts of the above proof of P(A)∪P(B) ⊆ P(A∪B).
(b) Does it matter which disjunct we handle first?
(c) Do we need to require that the two disjuncts are exclusive, that is, that they cannot both be true?
(d) Construct informal verbal arguments using disjunctive proof to show that (A∪B) − C ⊆ (A − C)∪(B − C) and conversely.

Partial solutions to (b) and (c)
(b) No, because the two subordinate arguments are independent of each other. In practice, it is good style to cover the easier one first.
(c) No. The situation where both disjuncts are true is covered by each of the two subproofs.

For an example in which disjunctive proof is the only device used other than chaining, we derive s from the three premises ¬p∨q, (¬p∨r) → s, ¬s → ¬q. Noting that one of the premises is a disjunction ¬p∨q, we first suppose ¬p and use familiar tautological implications to reach the unchanged goal s. We then suppose q and do the same. With both done, we are finished.

Exercise 10.4.1 (2) Fill in the gaps of the above proof.

What is the underlying logical rule? Expressing it in an incremental form, it says: whenever A, β1∨β2, β1 ⇒ γ and A, β1∨β2, β2 ⇒ γ, then A, β1∨β2 ⇒ γ. Of course, for classical consequence (or anything reasonably like it), we have β1 ⇒ β1∨β2 and β2 ⇒ β1∨β2, so for such consequence relations, the rule may be expressed more succinctly without β1∨β2 appearing as a premise in the two subordinate inferences. That gives us the following more common formulation, with the same bracketing conventions as for the rule for conditional proof:

A, β1 ⇒ γ; A, β2 ⇒ γ / A, β1 ∨ β2 ⇒ γ.
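The semantic content of the rule can again be checked mechanically. The Python sketch below, not from the text, revisits the example above (deriving s from ¬p∨q, (¬p∨r) → s, ¬s → ¬q) and verifies by truth tables that both subordinate inferences are classically valid and that the principal one is too, just as disjunctive proof predicts.

```python
# A brute-force sketch (illustrative encoding; the formulas are those of the example above).
from itertools import product

def valid(premises, conclusion, atoms=("p", "q", "r", "s")):
    """Classical consequence, checked by enumerating all valuations of the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(f(v) for f in premises) and not conclusion(v):
            return False
    return True

imp = lambda a, b: (not a) or b

A = [lambda v: (not v["p"]) or v["q"],               # not-p or q
     lambda v: imp((not v["p"]) or v["r"], v["s"]),  # (not-p or r) -> s
     lambda v: imp(not v["s"], not v["q"])]          # not-s -> not-q
goal = lambda v: v["s"]

assert valid(A + [lambda v: not v["p"]], goal)   # subordinate inference: A, not-p => s
assert valid(A + [lambda v: v["q"]], goal)       # subordinate inference: A, q => s
assert valid(A, goal)                            # principal inference (the disjunction is already in A)
```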


Exercise 10.4.1 (3)
(a) Identify the principal inference and the two subordinate inferences in the above scheme for disjunctive proof.
(b) Write in brackets to make sure that you understand the structure of the rule.
(c) Verify that the rule is correct for classical consequence.

A split-level formulation moves the A across, telling us that when β1∨β2 and all propositions in A are true and the inferences A, β1 ⇒ γ and A, β2 ⇒ γ are both valid, then the proposition γ is true. In a brief display, with brackets for readability:

[(A, β1 ∨ β2); (A, β1 ⇒ γ); (A, β2 ⇒ γ)] / γ.

A flattened suppositional version looks like the following:

Given β1∨β2 and premises in A, and wanting to get γ,
suppose β1,
argue to γ,
discharge β1,
suppose β2,
argue to γ again,
discharge β2,
and finally conclude γ.

In disjunctive proof, flattening reduces repetition even more than it does in conditional proof. A single application of the split-level version would write all the propositions in A three times; a flattened version writes them only once.

In mathematical practice, disjunctive proof most often appears in a variant (though classically equivalent) form known as proof by cases. We have various premises α1, …, αn, and we wish to infer a conclusion γ. We may not be able to find a single form of argument that works in all possible cases. So we divide the possible cases into two. We think up a suitable proposition β and consider separately the cases for β and ¬β. Each of these may open the way to a neat proof of γ. The two proofs may be more or less similar or very different. If both succeed, we are done.

For example, we may want to show in the arithmetic of the natural numbers that for all n, n² + n is even. The natural way to do it is by first considering the case that n is even, then the case that n is not even (so odd). Again, we might want to show that the remainder when a square natural number n² is divided by 4 is either 0 or 1; here too, the natural breakdown is into the cases that n is even and that it is not even. In the chapter on trees, proof by cases was used (inside a proof by contradiction) in Sect. 7.6.2 when showing that every cycle in an unrooted tree contains at least one repeated edge.

Exercise 10.4.1 (4) Complete the two informal arithmetic proofs, with the use of cases marked clearly.

The underlying second-level rule for proof by cases is

[A, β ⇒ γ; A, ¬β ⇒ γ] / A ⇒ γ.


Expressed as a split-level rule this becomes

[A; (A, β ⇒ γ); (A, ¬β ⇒ γ)] / γ.

Exercise 10.4.1 (5)
(a) Verify that the rule of proof by cases is correct for classical consequence. You may focus on the second-level or the split-level formulation.
(b) Write out an informal flattened version of proof by cases, modelling it on the flattened version of disjunctive proof.

In our two arithmetical examples of proof by cases, it was rather obvious how best to choose the proposition β. In other examples, it may not be so clear. In difficult problems, finding a profitable way to split the situation may be the greater part of solving the problem. Experience helps, just as it helps us cut a roast turkey at the joints.

Exercise 10.4.1 (6) Use proof by cases to give a simple proof that there are positive irrational numbers x, y such that x^y is rational.

Sometimes in a proof, one finds oneself iterating the splitting to get subcases, sub-subcases, and so on, ending up with dozens of them. Such arguments are rather inelegant and are disdained by mathematicians when a less fragmented one can be found. However, computers are not so averse to them. A famous example is the solution of the four-colour problem in graph theory, which required a breakdown into a myriad of cases treated separately with the assistance of a computer.

10.4.2 Proof by Contradiction

Our third method of argument has been famous since Greek antiquity. It is known as proof by contradiction or, using its Latin name, reductio ad absurdum, acronym RAA. It was sketched in an Alice box in Chap. 2; here we go further.

Imagine that we are given information α1, …, αn and we want to get a conclusion β. We may even have tried various methods and got stuck, so we try proof by contradiction. We suppose the opposite proposition ¬β and seek to establish, on this basis, a contradiction. If we manage to do so, our proof of β from α1, …, αn is complete.

What is meant by 'a contradiction' here? We mean an explicit contradiction, that is, any statement γ and its negation ¬γ, whether obtained separately or conjoined as γ∧¬γ. As remarked by the Hatter in the Alice box of Chap. 2, it does not matter what contradiction it is, that is, how γ is chosen. When constructing the proof, we usually do not have a clear idea which contradiction we will end up with, but we don't care in the least.

Proof by contradiction is a universal rule: neither the premises nor the conclusion has to be in any particular form, in contrast with conditional proof (for conditional conclusions) and disjunctive proof (for disjunctive premises). Sometimes it will make the argument a great deal easier, due to the availability of the supposition as something more to work with, but sometimes it will not make much difference. Some mathematicians prefer to use reductio only when they have to; others apply it at the slightest pretext.

When RAA is used to establish the existence of an item with a certain property, it will usually be non-constructive. That is, it shows the existence of something with the desired feature by getting a contradiction out of supposing that nothing has that feature, but without thereby specifying a particular object possessing it. Actually, the same can happen in a less radical way with proof by cases; indeed, the argument for Exercise 10.4.1 (6) shows that there exists a pair (x, y) of positive irrational numbers x, y such that x^y is rational, and moreover shows that at least one of two pairs must do the job, but does not specify a single such pair. Intuitionists do not accept such proofs.

The most famous example of proof by contradiction, dating back to Greek antiquity, is the standard proof that √2 is irrational. It begins by supposing that √2 is rational and, with the assistance of some basic arithmetic facts, derives a contradiction.

Exercise 10.4.2 (1) An exercise of Chap. 4 asked you to look up a standard textbook proof of the irrationality of √2 and put your finger on where it uses wlog. Look it up again to appreciate how it uses reductio.

We used proof by contradiction many times in previous chapters. In particular, in Sect. 7.2.1 of the chapter on trees, we used it five times to establish as many bulleted properties from the explicit definition of a rooted tree. In Sect. 7.6.2, when proving that S = R∪R⁻¹ is a minimal connected relation over a rooted tree (T, R), we used two reductios, one inside the other! Readers are advised to review those proofs again now. The first five are very easy; the last is rather tricky, partly because of the iterated RAA.

For a simple example using only propositional moves, consider the two premises s → t, (r∨t) → ¬p and try to get the conclusion ¬(p∧s). As the conclusion is already negative, we can cut a small corner: instead of supposing ¬¬(p∧s) and eliminating the double negation, we suppose p∧s itself. We then use familiar tautological implications to chain through to a contradiction, and we are done.
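Before constructing the derivation asked for in the next exercise, it may be reassuring to check semantically that the reductio must succeed. The Python sketch below, not from the text, verifies by brute force that the two premises together with the supposition p∧s are jointly unsatisfiable, so that ¬(p∧s) does follow classically from the premises.

```python
# A brute-force sketch (illustrative encoding of the two premises and the supposition).
from itertools import product

imp = lambda a, b: (not a) or b

def satisfiable(formulas, atoms=("p", "r", "s", "t")):
    """True iff some valuation of the atoms makes every formula in the list true."""
    return any(all(f(dict(zip(atoms, values))) for f in formulas)
               for values in product([True, False], repeat=len(atoms)))

premises = [lambda v: imp(v["s"], v["t"]),                 # s -> t
            lambda v: imp(v["r"] or v["t"], not v["p"])]   # (r or t) -> not-p
supposition = lambda v: v["p"] and v["s"]                  # p and s

assert not satisfiable(premises + [supposition])           # the supposition is untenable, so RAA applies
```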

Exercise 10.4.2 (2)
(a) Complete the above proof by constructing an elementary derivation of a contradiction, using only familiar tautological implications from the tables in Chap. 8.
(b) What contradiction did you come to? Can you modify the argument to reach a different contradiction just as naturally? Hint: Think of using modus tollens rather than modus ponens or vice versa.

Writing ⊥ to stand for any explicit contradiction, we can state the logical rule underlying the procedure: whenever A, ¬β ⇒ ⊥, then A ⇒ β. Writing a slash for the central transition, it may be expressed as

A, ¬β ⇒ ⊥ / A ⇒ β.


In split-level form, this becomes

[A; (A, ¬β ⇒ ⊥)] / β.

Finally, a flattened suppositional rendering:

Given premises in A,
suppose ¬β,
argue to ⊥,
discharge ¬β,
conclude β on the basis of A.

Exercise 10.4.2 (3) Show that proof by contradiction is correct for classical consequence.

Alice Box: Indirect Inference

Alice: Is proof by contradiction the same as indirect inference? Or does that term cover, more broadly, all higher-level strategies?

Hatter: Neither, so be careful! 'Indirect inference' is more of a stylistic description than a logical one. It is used to mean any way of writing out an argument in which the order of development 'swims against' the stream of implication.

Alice: And so which are deemed direct, and which indirect?
Hatter: Chaining is of course considered direct. But so too is conditional proof (although logically it rests on a higher-level inference), while the variant that we called contraposed or indirect conditional proof is, as its name suggests, indirect. Both disjunctive and case-based proof are normally regarded as direct, while proof by contradiction is seen as the most indirect of all. Not only does it swim against the current of implication but it supposes the very opposite of what we want to prove.

Of course, there is a sense in which, given any one of the three higher-level rules that we have been discussing, the others become redundant. For example, given conditional proof, we can make it do the work of disjunctive proof by supposing β1 to get γ, conditionalizing to have β1 → γ with the supposition discharged, then supposing β2 to get to γ again, conditionalizing to obtain β2 → γ with that supposition likewise discharged, and finally using the tautological implication β1 → γ, β2 → γ, β1∨β2 ⊢ γ, which one could add to the list in the table of Chap. 8, to conclude γ, as desired. But from the standpoint of real-life reasoning, this is an artificial way to proceed, even though it is only a couple of steps longer. That is why, in mathematical practice, all three strategies are considered in their own right.


Exercise 10.4.2 (4)
(a) Show that a similar formal trick can be done, given conditional proof, to carry out proof by contradiction. Identify the tautological implication that would be used.
(b) Similar formal tricks could be used to make disjunctive proof do the work of the other two, likewise to make reductio do it all. Describe those tricks, isolating the relevant tautological implication in each case.

We end with some remarks for the philosophically inclined; others may skip to the next subsection. There are a few mathematicians, and perhaps rather more philosophers of mathematics, who are wary of the use of truth-values in the discipline. For them, the notions of truth and falsehood have no meaning in mathematics beyond 'intuitively provable' and 'intuitively provable that we cannot have an intuitive proof'. For this reason, they feel that the whole of classical logic must be reconstructed. In the resulting intuitionistic logic, some classical principles go out the window.

The most notorious of these is the tautology of excluded middle α∨¬α, which is not intuitionistically acceptable. But some other principles are also affected; in particular, a number of classical equivalences involving negation are accepted in one direction only. For example, classical double negation ¬¬α ⊣⊢ α is reduced to α ⊢ ¬¬α; contraposition α → β ⊣⊢ ¬β → ¬α drops to α → β ⊢ ¬β → ¬α; de Morgan ¬(α∧β) ⊣⊢ ¬α∨¬β is cut to ¬α∨¬β ⊢ ¬(α∧β); quantifier interchange ¬∀x(α) ⊣⊢ ∃x¬(α) becomes only ∃x¬(α) ⊢ ¬∀x(α). There are also losses involving implication without negation: for example, the classical tautology (α → β)∨(β → α) is not intuitionistically acceptable, nor is the following classical tautology using implication alone: ((α → β) → α) → α.

As one would expect, such cutbacks have repercussions for higher-level inference rules. In intuitionistic logic, conditional proof is accepted, but its contraposed version is not; disjunctive proof is retained, but proof by cases in any form that presumes excluded middle is out; proof by contradiction is not accepted in the form that we have given it but only in a form with interchanged negations: A, β ⇒ ⊥ / A ⇒ ¬β. However, all that is beyond our remit.

10.4.3 Rules with Quantifiers

We have already seen the second-level rules ∀+ and ∃− for the two quantifiers in Chap. 9, contrasting them with the first-level logical implications ∀− and ∃+. Their formulations were:

∀+: Whenever α1, …, αn ⊢ α, then α1, …, αn ⊢ ∀x(α), provided the variable x has no free occurrences in any of α1, …, αn.
∃−: Whenever α1, …, αn−1, α ⊢ αn, then α1, …, αn−1, ∃x(α) ⊢ αn, provided the variable x has no free occurrences in any of α1, …, αn.

As we also noted, the usual names and acronyms in English are 'universal generalization' (UG) and 'existential instantiation' (EI), although their higher-order nature is perhaps brought out more clearly under the names proof to and from an arbitrary instance.


There is one respect in which ∀+ differs from both its dual ∃− and all the other second-level rules for propositional connectives that we have considered. It makes no supposition! The premises of the subordinate inference are exactly the same as those of the principal inference. So, we cannot simply identify higher-level inference with the making and discharging of suppositions. Suppositions are a salient but not invariable feature of such inferences. Nevertheless, the rule ∀+ is clearly incremental, since the principal and subordinate inferences have the same premises.

On the other hand, the rule ∃− is not incremental as we have written it: the premise ∃x(α) of the principal inference is not a premise of the subordinate one. However, just as with disjunctive proof, this is not a difficulty. Since α ⊢ ∃x(α), we may equivalently reformulate the rule, a little redundantly, as follows:

∃−: Whenever α1, …, αn−1, ∃x(α), α ⊢ αn, then α1, …, αn−1, ∃x(α) ⊢ αn, provided the variable x has no free occurrences in any of α1, …, αn.

Clearly, in this formulation, ∃− is incremental, like all our other higher-level rules. However, for simplicity, we will usually give it in its first formulation.

Exercise 10.4.3 (1)
(a) Give a simple example of how the rule ∀+ can fail if its proviso is not respected.
(b) Give two simple examples of how ∃− can go awry if its proviso is ignored, one with a free occurrence of x in αn, the other with a free occurrence of x in one of α1, …, αn−1.

How do ∀+ and ∃− look in a more schematic presentation? We first rewrite them as conditions on an arbitrary inference relation ⇒. At the same time, we also use explicit notation for sets of formulae, which forces us to rephrase the formulation of the proviso for ∃−. We also abbreviate the wording of the provisos in a manner that should still be clear.

∀+: Whenever A ⇒ α, then A ⇒ ∀x(α) (x not free in A)
∃−: Whenever A, α ⇒ β, then A, ∃x(α) ⇒ β (x not free in β or in A)

In the schematic slash manner:

∀+: A ⇒ α / A ⇒ ∀x(α) (x not free in A)
∃−: A, α ⇒ β / A, ∃x(α) ⇒ β (x not free in β or in A)

Written as split-level rules:

∀+: A; (A ⇒ α) / ∀x(α) (x not free in A)
∃−: A, ∃x(α); (A, α ⇒ β) / β (x not free in β or in A)

Exercise 10.4.3 (2) Take the incremental formulation of ∃−, and rewrite it as a split-level rule too.

When flattened, using the traditional terminology of 'arbitrary' and 'choose', these split-level rules become:

Given premises in A, consider arbitrary x. Argue to α(x). Conclude ∀x(α).


Given ∃x(α) and premises A, arbitrarily choose x₀ with α(x₀). Argue to β. Conclude β.

Here we face a terminological problem. Strictly speaking, we should not call the flattened presentation of ∀+ a suppositional argument because, as we have observed, it makes no supposition: the subordinate inference of the second-order rule ∀+ has exactly the same premises as the principal one. But it would be pedantic to invent a new terminology to replace one so well entrenched, so we will follow custom by continuing to call such flattened arguments 'suppositional', in a broad sense of that term.

Alice Box: Arbitrary Triangles?

Alice: I have a philosophical worry about this talk of arbitrary items. In the flattened version of ∀+ you speak of an arbitrary x in the domain – say an arbitrary positive integer or triangle. And ∃− is also glossed in terms of choosing x₀ in an arbitrary way. But there is no such thing as an arbitrary triangle or arbitrary number. Every integer is either even or odd, every triangle is either equilateral, isosceles or scalene. Arbitrary triangles do not exist!

Hatter: If I were as mad as hatters are made out to be, I might try to disagree. But of course you are absolutely right.

Alice: So how can we legitimately use the word?
Hatter: Well, it's just a manner of speaking …

Alice was right to be puzzled. Her question was raised by the eighteenth-century philosopher Berkeley in a forceful critique of the mathematics of his time. The Hatter's response is also correct, but we should flesh it out a bit. Let's look at ∀+. Although there is no such thing as an arbitrary element of a domain, we can prove theorems using a variable x, ranging over the domain, which is treated arbitrarily as far as the particular proof is concerned. What does that mean? It means that the premises of the proof in question do not ascribe any particular attributes to x, other than those that they assert for all elements of the domain. But this is still rather vague! We make it perfectly precise in a syntactic manner: the variable x should not occur free in any of the premises – which is just the formal proviso of the rule. Likewise for ∃−: treating the variable x in an arbitrary manner means that it does not occur free in the conclusion or any of the other premises of the application. In this way, a rather perplexing semantic notion is rendered clear by a banal syntactic criterion.

For quick reference, we end this section by putting the main second-level schemas for propositional connectives and quantifiers into a table, omitting their split-level versions, which are easily constructed out of them (Table 10.2).


Table 10.2 Second-level inference rules

                                        Subordinate inference(s)       Principal inference      Provisos
Conditional proof (→+)                  A, β ⇒ γ                       A ⇒ β → γ
Disjunctive proof (∨−)                  A, β1 ⇒ γ;  A, β2 ⇒ γ          A, β1∨β2 ⇒ γ
Proof by contradiction                  A, ¬β ⇒ γ∧¬γ                   A ⇒ β
Proof to an arbitrary instance (∀+)     A ⇒ α                          A ⇒ ∀x(α)                x not free in A
Proof from an arbitrary instance (∃−)   A, α ⇒ β                       A, ∃x(α) ⇒ β             x not free in β or in A

10.5 Proofs as Recursive Structures*

In Sect. 10.2.2, we gave a formal definition of the notion of an elementary derivation in terms of sequences and (equivalently) in terms of trees. In the former mode, a finite sequence σ = α1, …, αn of formulae is an elementary derivation of α from A using the relation ⇒₀ iff (1) αn = α and (2) for every i ≤ n, either αi ∈ A or B ⇒₀ αi for some B ⊆ {α1, …, αi−1}. How can we generalize to define, in an equally rigorous manner, the idea of a higher-level proof? We have already expressed the essential ideas in an informal way; our task is now to do it formally.
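To see the sequence-mode definition in executable form, here is a small Python sketch, not from the text, that checks clause by clause whether a given finite sequence of formulas is an elementary derivation relative to a supplied rule relation; the string syntax for formulas and the toy modus ponens relation are illustrative assumptions only.

```python
# A minimal sketch. Formulas are plain strings; the rule relation is a function from
# (frozenset of formulas, formula) to bool, assumed monotone in its left argument.

def is_elementary_derivation(sequence, A, rule_applies):
    """Each term must be a premise in A or follow from earlier terms by the rule relation."""
    for i, alpha in enumerate(sequence):
        earlier = frozenset(sequence[:i])
        if alpha in A or rule_applies(earlier, alpha):
            continue
        return False
    return bool(sequence)   # a non-empty sequence counts as deriving its last term

# Toy rule relation: modus ponens on strings of the shape "X" and "X->Y".
def mp(premises, conclusion):
    suffix = "->" + conclusion
    return any(p.endswith(suffix) and p[:-len(suffix)] in premises for p in premises)

A = {"p", "p->q", "q->r"}
print(is_elementary_derivation(["p", "p->q", "q", "q->r", "r"], A, mp))   # True
```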

10.5.1 Second-Level Proofs

A second-level proof uses second-level rules alone, without any first-level rules (nor split-level ones). In contrast, a split-level proof employs first-level plus split-level rules (but no second-level ones). We consider second-level proofs in this subsection and split-level ones in the next.

How can one get away with only second-level rules, without any help at all from first-level ones? For example, how can we dispense with the first-level rule of modus ponens (→−): α, α → β ⇒ β? The answer is that it can be reformulated as a second-level rule [A ⇒ α; A ⇒ (α → β)] / [A ⇒ β]. Similarly, the first-level rule α ⇒ α∨β of addition (∨+) may be put into the second-level form [A ⇒ α] / [A ⇒ α∨β]. Other first-level rules are transformed in the same simple way: a first-level rule φ1, …, φn ⊢ ψ becomes the second-level rule [A ⊢ φ1, …, A ⊢ φn] / [A ⊢ ψ].

Exercise 10.5.1 (1) (with partial solution)
(a) Express modus tollens as a second-level rule.
(b) Likewise for the first-level rules α∧β ⇒ α and α∧β ⇒ β.
(c) What would be the second-level version of the first-level rule of reflexivity α ⇒ α?


Solutions to (a) and (c)
(a) A ⇒ (α → β), A ⇒ ¬β / A ⇒ ¬α.
(c) A ⇒ α / A ⇒ α.

The notion of a proof using only second-level rules may be defined in exactly the same way as we defined an elementary derivation. The only difference is that the sequence (or tree) is no longer labelled by propositional or first-order formulae but by sequents, that is, by expressions A ⇒ α with finite sets A (or in some versions, multisets or sequences) of formulae on the left and individual formulae on the right. We use the second-level rules to pass from earlier sequents in the sequence to later ones (from children to parents in the tree), and we allow a special set of sequents as not needing further justification (leaves in the tree). In pure logic, these are usually of the form α ⇒ α; in mathematics, they may also reflect the substantive axioms being employed.
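The following Python sketch, not from the text, shows one natural way to represent sequents as data and to apply an 'up-levelled' first-level rule to them; the string syntax for implications and the function names are illustrative assumptions. Notice how the premise set A is carried along unchanged, which is exactly the source of the repetition discussed next.

```python
# A small sketch: a sequent is a pair (finite premise set A, conclusion formula).
from typing import FrozenSet, Tuple

Sequent = Tuple[FrozenSet[str], str]

def identity_sequent(alpha: str) -> Sequent:
    """A leaf of a second-level proof: alpha => alpha."""
    return (frozenset({alpha}), alpha)

def second_level_mp(s1: Sequent, s2: Sequent) -> Sequent:
    """Up-levelled modus ponens [A => alpha; A => alpha->beta] / [A => beta],
    for implications written as strings 'alpha->beta'."""
    (A1, alpha), (A2, implication) = s1, s2
    assert A1 == A2, "both subordinate sequents must share the premise set A"
    assert implication.startswith(alpha + "->"), "second sequent must conclude alpha->beta"
    return (A1, implication[len(alpha) + 2:])   # the set A is copied into the new sequent yet again

A = frozenset({"p", "p->q"})
print(second_level_mp((A, "p"), (A, "p->q")))   # (frozenset({'p', 'p->q'}), 'q')
```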

In general, such sequent calculi are unsuitable for constructing deductions by hand, due to the immense amount of repetition that they create. The transformation of first-level rules into second-level ones brings with it the obligation to write formulae over and over again. For example, when we rewrite first-level modus ponens α, α → β ⇒ β as a second-level rule [A ⇒ α; A ⇒ (α → β)] / [A ⇒ β], the set A of propositions figures thrice, so that a single application of the rule must write out all the formulae in A three times! Since a typical proof will apply such 'up-levelled' rules many times, this effect is multiplied. Nevertheless, sequent calculi can sometimes be used in automated reasoning. While the copying is enough to drive a human to distraction, it need not bother a computer too much so long as it does not become exponential.

However, the main use of sequent calculi is for certain kinds of theoretical work, notably to show that all proofs from suitable initial sequents have desirable properties. The idea is to choose a well-balanced collection of second-level rules to work with, show that proofs using only those rules may be put into a suitable normal form, define a well-ordering on the normal derivations to serve as a measure of their length or complexity and establish desired properties by induction on the well-ordering. Such investigations make up what is known as proof theory. It forms a world of its own, with a vast literature; we will not discuss it further.

Exercise 10.5.1 (2) (with solution) Write the second-level Tarski conditions of monotony and cumulative transitivity for inference relations in schematic slash manner.

Solution Monotony is most concisely expressed as [A ⇒ β] / [A∪X ⇒ β], although one could of course write: [A ⇒ β] / [B ⇒ β] whenever A ⊆ B. Cumulative transitivity may be written: [A ⇒ β for all β ∈ B, A∪B ⇒ γ] / [A ⇒ γ].


10.5.2 Split-Level Proofs

Next, we look at split-level proofs (aka subproof systems) which, as remarked, use first-level and split-level rules in cooperation with each other. Their definition may be expressed in terms of trees or of sequences; technically the version with sequences is more straightforward, and so we use it here.

Let ⇒₀ be a relation between sets of formulae on the left and individual formulae on the right. To guide intuition, think of it as (the union of the instances of) some small finite set of first-level rules. Similarly, let ⇒₁ be a relation between both formulae and sets of formulae on the left and formulae on the right. This represents (the union of the instances of) a small finite set of split-level rules. These two relations, ⇒₀ and ⇒₁, are what we use to build, recursively, split-level proofs.

For any set A of formulae and individual formula α, we say that a finite sequence σ is a split-level proof of α from A using ⇒₀ and ⇒₁ iff (1) its last term is α, and (2) every point in the sequence is either (a) a formula in A, or (b) a split-level proof (of some formula from some set of formulae), or (c) a formula β such that either B ⇒₀ β or B ⇒₁ β for some set B of earlier points in the sequence.

Note the recursion in clause 2(b): it is this that makes the definition more subtle than that for a purely first-level proof (i.e. an elementary derivation) or even for a second-level one. Some points in the sequence may be split-level proofs, and thus themselves finite sequences, and some of their points may in turn be finite sequences and so on. With trees, the split-level proofs mentioned in clause (2b) will all be (labels of) leaves of the tree, but they will also be trees themselves and so on recursively. In situations like this, it would be nice to have an n-dimensional page on which to display the proof!
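The recursion of clause 2(b) is easy to mirror in a recursive data type. The Python sketch below, not from the text, lets a point of a proof be either a formula (a plain string) or a whole nested proof, and computes the nesting depth; the field names and the sample formulas, taken from the wombat example of Sect. 10.5.3, are illustrative only.

```python
# A sketch of the recursive structure: a point is a formula string or a nested Proof.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Proof:
    premises: List[str]                 # the premise set A of this (sub)proof
    points: List[Union[str, "Proof"]]   # each point is a formula or a whole nested subproof
    conclusion: str                     # the last formula reached

    def depth(self) -> int:
        """1 for an elementary derivation, one more for each level of nesting."""
        nested = [p.depth() for p in self.points if isinstance(p, Proof)]
        return 1 + max(nested, default=0)

# The three-layer shape of the wombat example of Sect. 10.5.3: C sits inside B, which sits inside A.
C = Proof(["∀x(Wx→Mx)", "∃x(Ax∧Gx∧¬Mx)", "Ax∧Gx∧¬Mx"],
          ["¬Mx", "Wx→Mx", "¬Wx", "Ax∧Gx", "Ax∧Gx∧¬Wx", "∃x(Ax∧Gx∧¬Wx)", "¬∀x((Ax∧Gx)→Wx)"],
          "¬∀x((Ax∧Gx)→Wx)")
B = Proof(["∀x(Wx→Mx)", "∃x(Ax∧Gx∧¬Mx)"], [C, "¬∀x((Ax∧Gx)→Wx)"], "¬∀x((Ax∧Gx)→Wx)")
A = Proof(["∀x(Wx→Mx)"], [B, "∃x(Ax∧Gx∧¬Mx) → ¬∀x((Ax∧Gx)→Wx)"], "∃x(Ax∧Gx∧¬Mx) → ¬∀x((Ax∧Gx)→Wx)")
print(A.depth())   # 3
```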

Exercise 10.5.2 (with solution) Write the Tarski conditions of monotony and cumulative transitivity for inference relations as split-level rules.

Solution Monotony is most concisely expressed as [A∪X; A ⇒ β] / β, although one could of course write: [B; A ⇒ β] / β whenever A ⊆ B. Cumulative transitivity may be written: [A; A ⇒ β for all β ∈ B, A∪B ⇒ γ] / γ. When B is a singleton, this reduces to [A; A ⇒ β, A∪{β} ⇒ γ] / γ.

Alice Box: Third-Level Rules?

Alice: We have first- and second-level rules as well as the split-level versions of the latter. Are there such things as third-level rules and so on?

Hatter: Of course, we could define them formally. But neither traditional mathematical reasoning nor current logical analysis has found any serious use for them. Perhaps this will change in the future.

Alice: I hope not. I already have enough on my plate!


10.5.3 Four Views of Mount Fuji

The material of this section has been rather abstract. We now illustrate it by an example. We take a single inference problem and look at it from four different angles: (1) as a short informal proof, entirely in English, (2) a semiformal one that reveals more traces of higher-level inference, (3) a complete split-level proof, and (4) a purely second-level one. Not as many perspectives as in Hokusai's hundred views of Mount Fuji, but they do illustrate the theoretical discussion carried out so far.

10.5.3.1 An Informal Argument in English
The example itself is simple; you don't need to have taken a course in logic to recognize its validity. Our task is to put it under the microscope. Given that all wombats are marsupials, the problem is to show that if there is an animal in the garden that is not a marsupial, then not every animal in the garden is a wombat.

In plain English, with no logical symbols, nor even any variables, the following short argument suffices:

Assume the premise; we want to get the conclusion. Suppose that there is an animal in the garden that is not a marsupial. Since it is not a marsupial, the premise tells us that it is not a wombat. So, not every animal in the garden is a wombat.

For most purposes, this is perfectly adequate. But it leaves the underlying logical mechanisms rather hidden. While the word 'suppose' hints at the presence of a higher-level step, namely, a conditional proof, there is a second one that is almost invisible. As a first move to bring a bit more logic to the surface, we reformulate the problem in the language of first-order logic and set out the argument in a more formal way but still in suppositional form.

10.5.3.2 Semiformal Suppositional Presentation
Expressed in the symbolism of first-order logic, the problem is: given ∀x[Wx → Mx], show ∃x[Ax∧Gx∧¬Mx] → ¬∀x[(Ax∧Gx) → Wx]. In our semiformal but still suppositional proof, we make free use of any basic implications or equivalences from the tables in Chaps. 8 and 9.

We are given (1) ∀x[Wx → Mx]. Suppose (2) ∃x[Ax∧Gx∧¬Mx]; it will suffice to get ¬∀x[(Ax∧Gx) → Wx]. By (2), we may choose an x such that (3) Ax∧Gx∧¬Mx, so that in particular ¬Mx. Applying ∀− to (1), we have Wx → Mx, so by modus tollens, we get ¬Wx. Putting this together with Ax∧Gx, which also comes from (3), we have (4) Ax∧Gx∧¬Wx, so by ∃+, we infer (5) ∃x[Ax∧Gx∧¬Wx], in other words, by a familiar tautological equivalence, ∃x¬[(Ax∧Gx) → Wx], and thus by quantifier interchange (QI), ¬∀x[(Ax∧Gx) → Wx], as desired.

This already renders more of the logic visible, and in particular, the italicized words tell us that it is appealing to two higher-level strategies: conditional proof (CP) and ∃−. This is more or less the way in which the argument would be presented in a logic text centred on what is called 'natural deduction'. However, it is still flat in the sense of Sect. 10.3.3, that is, just an annotated sequence of propositions. We now write it explicitly as a split-level proof, replacing the suppositions by subproofs.


10.5.3.3 As a Split-Level Proof
The split-level version consists of a main proof, a subproof and, within the latter, a sub-subproof that has no further subproofs and is thus an elementary derivation.

Main Proof A
A1. ∀x[Wx → Mx]                                  premise
A2. Insert subproof B
A3. ∃x[Ax∧Gx∧¬Mx] → ¬∀x[(Ax∧Gx) → Wx]            conclusion: from A1, A2 by CP

Subproof B
B1. ∀x[Wx → Mx]                                  premise
B2. ∃x[Ax∧Gx∧¬Mx]                                premise
B3. Insert sub-subproof C
B4. ¬∀x[(Ax∧Gx) → Wx]                            new conclusion: from B1, B2, B3 by ∃−

Sub-subproof C
C1. ∀x[Wx → Mx]                                  premise
C2. ∃x[Ax∧Gx∧¬Mx]                                premise
C3. Ax∧Gx∧¬Mx                                    premise
C4. ¬Mx                                          from C3 by ∧−
C5. Wx → Mx                                      from C1 by ∀−
C6. ¬Wx                                          from C4, C5 by MT
C7. Ax∧Gx                                        from C3 by ∧−
C8. Ax∧Gx∧¬Wx                                    from C6, C7 by ∧+
C9. ¬[(Ax∧Gx) → Wx]                              from C8 by a taut. equivalence
C10. ∃x¬[(Ax∧Gx) → Wx]                           from C9 by ∃+
C11. ¬∀x[(Ax∧Gx) → Wx]                           same conclusion: from C10 by QI

What were suppositions of a single argument now become additional premises of subordinate arguments. Evidently, this is more repetitious and thus longer than the suppositional version. It is also less familiar and thus perhaps more difficult to assimilate on first acquaintance. But it gives a fuller record of what is really going on.
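Independently of how the proof is written out, the inference itself can be checked semantically on small finite models. The Python sketch below, not from the text, enumerates every interpretation of W, M, A, G over a three-element domain and finds no counterexample; of course this covers only small domains, so it is a sanity check rather than a proof.

```python
# A brute-force model check on a small domain (an illustrative sanity check only).
from itertools import product

def subsets(domain):
    for bits in product([False, True], repeat=len(domain)):
        yield {d for d, b in zip(domain, bits) if b}

domain = [0, 1, 2]
for W, M, Anim, G in product(subsets(domain), repeat=4):
    premise = all((x not in W) or (x in M) for x in domain)                      # all wombats are marsupials
    antecedent = any(x in Anim and x in G and x not in M for x in domain)        # some animal in the garden is not a marsupial
    consequent = not all((not (x in Anim and x in G)) or (x in W) for x in domain)  # not every animal in the garden is a wombat
    if premise and antecedent and not consequent:
        raise AssertionError("counterexample found")
print("no counterexample on a 3-element domain")
```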

10.5.3.4 As a Second-Level Proof
Finally, we present essentially the same little argument as a second-level proof, in which the only rules used are second-level ones. 'Essentially', because in order to keep things reasonably brief, we forget about the 'animal in the garden' part and consider the following simpler version: given that all wombats are marsupials, show that if there is something that is not a marsupial, then not everything is a wombat. In other words, we prove the sequent ∀x[Wx → Mx] ⊢ ∃x[¬Mx] → ¬∀x[Wx].

Conditional proof and ∃− are reformulated in their second-level rather than split-level versions. Familiar first-level rules are also recast as second-level ones, in the manner we have explained, with one exception: inferences of the form α ⊢ α (identity inferences) are allowed without needing further justification (in a tree presentation they would occur as leaves).

D1.  Wx → Mx ⊢ Wx → Mx                              identity inference
D2.  Wx ⊢ Wx                                        identity inference
D3.  Wx → Mx, Wx ⊢ Mx                               from D1, D2 by second-level MP
D4.  Wx → Mx, ¬Mx, Wx ⊢ Mx                          from D3 by monotony
D5.  ¬Mx ⊢ ¬Mx                                      identity inference
D6.  Wx → Mx, ¬Mx, Wx ⊢ ¬Mx                         from D5 by monotony
D7.  Wx → Mx, ¬Mx ⊢ ¬Wx                             from D4, D6 by reductio ad absurdum
D8.  ∀x[Wx → Mx], ¬Mx ⊢ ¬Wx                         from D7 by second-level ∀−
D9.  ∀x[Wx → Mx], ¬Mx ⊢ ∃x[¬Wx]                     from D8 by second-level ∃+
D10. ∀x[Wx → Mx], ¬Mx ⊢ ¬∀x[Wx]                     from D9 by second-level quantifier interchange
D11. ∀x[Wx → Mx], ∃x[¬Mx] ⊢ ¬∀x[Wx]                 from D10 by ∃−
D12. ∀x[Wx → Mx] ⊢ ∃x[¬Mx] → ¬∀x[Wx]                from D11 by conditional proof

The second-level versions of first-level rules are obtained by the transformation described earlier. In particular, in line D3, the second-level version of modus ponens is the rule α ⊢ φ, β ⊢ (φ → ψ) / α, β ⊢ ψ. In D8, second-level ∀− is the rule α ⊢ ∀x[φ] / α ⊢ φ[t/x] whenever the substitution is clean. Line D10 uses the second-level version of quantifier interchange: α ⊢ ∃x[¬φ] / α ⊢ ¬∀x[φ]. On the other hand, the special first-level rule α ⊢ α is left unchanged so that the derivation can start with something.

Exercise 10.5.3 (1)
(a) Specify the substitutions that make lines D3, D8 and D10 instances of the second-level rules mentioned.
(b) Elaborate on this second-level proof for the simplified version of the example to construct one for the example itself.

In the example, we have allowed any familiar first-level rule from the tables in Chaps. 8 and 9 to be transformed into a second-level one. It should be noted, however, that most presentations of sequent calculi are much more parsimonious. They allow only a very few rules: one 'introduction' and one 'elimination' rule for almost all connectives (although an exception must be made for negation and for the falsum, which do not have any classically correct introduction rules unless we admit sets of formulae on the right of sequents). Despite their parsimony, these nevertheless suffice to generate all classically valid inferences, and because of the same parsimony, they permit normal forms for second-level derivations, which in turn make inductive arguments about them possible. That, however, would take us beyond the limits of this book.


Exercise 10.5.3 (2) Consider the following simple inference: If one person is a relative of another, who is a relative of a third, then the first is a relative of the third. If one person is a relative of another, then that other is a relative of the first. So if a person is a relative of two others, then they are relatives of each other.
(a) Establish its validity in English (with the help of variables if you wish, but no symbols).
(b) Translate it into the symbolism of first-order logic, and establish validity by a semiformal suppositional argument.
(c) Rewrite the argument as a split-level proof containing subproofs.
(d) Rewrite it as a purely second-level proof.

Exercise 10.5.3 (3) In Chap. 9, we remarked that in first-order logic, the symmetry and transitivity of identity may be derived from its reflexivity and the replacement rule. It is hardly possible to express that derivation in English without any symbols, since the replacement rule itself is difficult to formulate except in a symbolic manner. So:
(a) Construct a semiformal suppositional argument for each of symmetry and transitivity. Hint: The central step in that argument was made in Exercise 9.4.4 (1) (b).
(b) Rewrite the argument as a split-level proof containing subproofs.

End-of-Chapter Exercises

Exercise 10.1: Consequence Relations
(a) Show that cumulative transitivity taken together with monotony implies (plain) transitivity (the principle that whenever α ⇒ β and β ⇒ γ, then α ⇒ γ).
(b) Let ⇒ be the relation defined by putting A ⇒ β iff either (i) A is a singleton {α} and α = β or (ii) A has more than one element. Show that ⇒ satisfies strong reflexivity, monotony and transitivity, but not cumulative transitivity.

Exercise 10.2: Consequence Operations*
(a) In an exercise, we noted that the LHS ⊆ RHS half of the 'distribution' equality Cn⊢(A∪B) = Cn⊢(A)∪Cn⊢(B) fails for classical consequence Cn⊢. (i) Show that the equality formed by 'closing' the RHS, namely, Cn(A∪B) = Cn(Cn(A)∪Cn(B)), holds for all consequence operations, and (ii) express this equality in terms of consequence relations.
(b) (i) Show that for any consequence operation Cn, if A ⊆ Cn(B) and B ⊆ Cn(A), then Cn(A) = Cn(B), and (ii) express the principle in terms of consequence relations.
(c) Show that for classical consequence, we have the equality Cn⊢(A)∩Cn⊢(B) = Cn⊢[Cn⊢(A)∩Cn⊢(B)]. Hint: One half is easy; the other is quite difficult. It needs the compactness of classical consequence and uses disjunction and distribution.


Exercise 10.3: Proof Construction
(a) Given premises p → ((r∨s) → t), q → (¬(s∨u)∨t), s, get the conclusion (p∨q) → t, by a semiformal suppositional proof using CP and DP.
(b) Rewrite the semiformal suppositional proof as an explicit split-level proof.
(c) Construct an informal proof, from no premises at all, that there is no set whose elements are precisely the sets that are not elements of themselves. Remark: Those familiar with the paradoxes of naïve set theory will recognize the Russell paradox here.
(d) Express the last statement in the language of first-order logic, taking the class of all sets as your domain of discourse (thus, no predicate for '… is a set' is needed) and elementhood as the sole relation. Rewrite your informal proof as a semiformal suppositional argument, then as a split-level proof.

Exercise 10.4: Higher-Level Rules
(a) Show that when ⇒ is monotonic, then conditional proof may equivalently be formulated in the following more general way: A′, β ⇒ γ / A ⇒ β → γ whenever A′ ⊆ A.
(b) Explain informally how the rule of proof by cases may be obtained from disjunctive proof. Hint: Make use of a simple tautology.
(c) Explain informally how, conversely, disjunctive proof may be obtained from proof by cases. Hint: Make use of disjunctive syllogism.
(d) Write out the converses of disjunctive proof, reductio, ∀+, ∃− and check whether they are classically correct. Comment on the status of the provisos for ∀+, ∃− in the converse rules. Remark: As disjunctive proof is a two-premise rule, you will need to define its 'converse' a little creatively.

Exercise 10.5: Introduction and Elimination Rules* A second-level inference rule A1 ⇒ α1, …, An ⇒ αn / B ⇒ β for propositional logic is said to be structural iff it contains no connectives at all, and is an introduction (resp. elimination) rule iff it contains only one connective, and that connective occurs just once, with the unique occurrence in β (resp. in some formula in B).
(a) Classify the second-level rules discussed in this chapter (monotony, cumulative transitivity, conditional proof, disjunctive proof, proof by cases, reductio, ∀+, ∃−) as structural, introduction, elimination or 'none of the above'.
(b) Call a truth-functional connective *(p1, …, pn) contrarian iff v(*(p1, …, pn)) = 0 when all v(pi) = 1. (i) Identify five familiar contrarian truth-functional connectives. (ii) Show by a simple argument that no contrarian connective has a classically correct introduction rule. Hint: Read the definition of an introduction rule carefully.

Exercise 10.6: Wild Card Exercises – Uncertain Inference Relations*
(a) Google 'lottery paradox' and 'preface paradox', write up short descriptions of them and articulate what they tell us about the status of the second-level rule for ∧+ in uncertain inference.


(b) In the light of Chap. 6 on probability, what would you expect the status of disjunctive proof to be in probabilistic inference? What about proof by cases?

Selected Reading

The references given for Chaps. 8 and 9 do not contain anything on the material of this chapter. The concept of a consequence operation/relation is explained in the following sources:

Wikipedia entry on consequence operators. http://en.wikipedia.org/wiki/Consequenceoperators

Makinson D (2007) Bridges from classical to nonmonotonic logic. College Publications, London, chapter 1

Wojcicki R (1988) Theory of logical calculi: basic theory of consequence operations. Synthese library, vol 199. Reidel, Dordrecht, chapter 1

The Wikipedia article highlights algebraic and topological connections of the concept, with lots of useful links. The Makinson text also goes on to discuss uncertain inference relations, mainly qualitative but also probabilistic.

The following are two very clear textbook discussions of traditional informal proof strategies in mathematics. Section 2.6 of the Bloch text also contains useful advice on writing proofs in coherent and elegant English:

Bloch ED (2011) Proofs and fundamentals: a first course in abstract mathematics, 2nd edn. Springer, New York, chapter 2

Velleman DJ (2006) How to prove it: a structured approach, 2nd edn. Cambridge University Press, Cambridge, chapter 3

For a formal study of higher-level rules and proof theory, a good entry point would be:

von Plato J. The development of proof theory. In: Stanford encyclopedia of philosophy. http://plato.stanford.edu/entries/proof-theory-development

For an aerial view of the jungle of textbook systems of 'natural deduction', with also a brief introduction to proof theory, see:

Pelletier J, Hazen A (2012) Natural deduction. In: Gabbay D, Woods J (eds) Handbook of the history of logic. Central concepts, vol 11. Elsevier North-Holland, Amsterdam

Index

A
Absorption, 15, 197, 198, 205
Ackermann function, 92, 93, 107, 109
Acyclic, 55, 56, 107, 165, 168, 169, 183
Addition principle, 114, 115, 117, 125, 144
Algorithms, 134, 177–180, 185–187, 202–204, 206, 208, 209, 211, 245
Alice boxes, 3, 6, 11, 16, 18, 19, 22, 28, 36, 38, 45, 48, 51, 59, 64, 66, 67, 69, 72, 75, 85, 93, 95, 96, 105, 106, 115, 116, 120, 123, 133, 139, 145, 146, 148, 152, 156, 161, 172–174, 181, 183, 185, 191, 205, 220–221, 224, 230, 232, 235, 238, 250, 254, 260, 262, 265, 268
Alphabet, 71, 76–77, 95–97, 100, 101, 107, 110, 135, 187
Alternating quantifiers, 222, 234
Andreescu, T., 136
Antisymmetry, 49–51, 55–56, 198, 207
Arbitrary instance, 252, 257, 263, 266
Archimedes' principle, 139, 143
Argument of a function, 57, 58, 65, 66, 73, 75, 78, 92, 100, 107, 148, 192, 227
Aristotle, 80, 181
Arity of a relation, 37
Assignment of truth-values, 101, 110
Association, 12, 13, 24, 36, 47, 48, 55, 57–78, 109, 150, 151, 153, 159, 160, 173, 176, 198, 203, 204, 256
Asymmetric, 44, 49–52, 56, 165, 169
Atomic formula, 220, 226, 228, 229
Axioms, 3, 4, 7, 17, 68, 92, 96–97, 102, 252, 267

B
Backwards evaluation, 67, 88, 90
Bags, 120
Base rate, 157, 159
Basic conjunction, 202–205
Basic disjunction, 204, 205
Basic term, 219, 220
Basis, 6, 81–93, 95, 97, 99–101, 104, 109, 122, 170, 189, 190, 218, 220, 228, 229, 245, 252–253, 256, 260, 262
Bayesians, 158
Bayes' theorem, 157–159, 163–164
Belief organization and change, 189
Ben-Ami, M., 215, 241
Biconditional statements, 21, 22, 30
Bijective functions, 67–68
Binary relation, 30–33, 36, 42, 58, 59
Binary search tree, 177–180, 187
Binary tree, 167, 176–180
Bivalence, 191, 209, 233
Bloch, E.D., 25, 56, 78, 274
Block of a partition, 47
Boolean operations, 11–18, 23–24, 35
Bottom, 12, 14, 33, 100, 118–119, 126, 135, 165, 206, 226
Bottom-up definition, 52, 53, 62, 90, 93, 95, 110
Bottom-up evaluation, 90
Bound occurrence, 223, 224, 227, 229, 240
Bovens, L., xiii
Bradley, R., xiii
Branch of a tree, 166
Breadth-first strategies, 212

C
Cancellation of a supposition, 132
Cantor, G., 69
Cardinality of a set, 68–70, 126
Cartesian plane, 29
Cartesian product, 27–32, 54, 59, 107, 110, 115–117, 127, 192
Causal relationship, 58, 149, 150
Cayley, A., 181
Cell of a partition, 78, 134, 159, 200, 208


Chaining of inferences, 189
Characteristic function, 72–74, 78
Children of a node, 169, 174–176, 179, 180
Class, 1, 2, 19, 71, 84, 92, 115, 130, 132, 140, 233, 273
Clean substitution, 236, 237, 240, 271
Closed formula, 223, 224, 232, 233
Closure, 54, 56, 61–62, 77, 94–100, 110
  operation, 251
  relation, 247
cnf. See Conjunctive normal form (cnf)
Collection, 1, 2, 18, 19, 24, 46–48, 53, 74, 75, 95, 97, 99, 103, 107, 119, 141, 144, 156, 170, 267
Combination, 12, 22, 115, 121–132, 134, 146, 147, 192, 196
Combination with repetition, 121, 127–130, 134
Combinatorics, 113–136
Common property, 9, 10
Commutation, 12, 13, 198, 203, 204
Compactness, 250, 251, 272
Comparison principle, 70, 78
Complementation, 16–19, 35, 159, 197–198
Complete graph, 187
Completeness in logic, 250
Completeness theorem for chaining, 251
Complete relation, 50, 102
Composition of functions, 38–40, 62, 63, 66
Concatenation, 76, 77, 95
Conditionalization, 140, 152, 153, 158
Conditional probability, 141, 147–153, 155, 157, 158, 163, 164
Conditional proof, 6, 15, 67, 252–260, 262, 263, 269, 271, 273
Conditional statements, 6, 21, 81
Configured partition, 132–134, 136
Congruence relation, 199, 238
Conjunction, 12–14, 17, 21, 98, 100, 101, 110, 141, 193, 202–207, 209, 210, 225, 226, 241, 252, 253
Conjunctive normal form (cnf), 204–205, 208, 209, 214
Connected set, 181, 182, 185
Connective, 13, 14, 17, 24, 96, 101, 110, 141, 176, 192–195, 197–199, 202, 203, 206, 209, 213, 271, 273
Consequence operation, 251–252, 272
Consequence relation, 247–252, 255, 258, 272
Consequentia mirabilis, 203
Consistent, 66, 144, 240–241
Cons operation, 76–77, 100, 110
Constant, 72, 75, 96, 149, 176, 218, 220, 222, 225, 227–231, 234, 236, 238, 240
Constant function, 58, 72–73, 75, 78, 138, 192
Constructive proof, 7, 273
Constructor, 8, 11, 18, 20, 41, 53, 54, 69, 76, 77, 79, 95, 127, 129, 163, 169–172, 176, 179, 180, 182, 184, 187, 192, 193, 199, 203–206, 209–214, 226, 228, 230, 231, 244, 247, 253, 257, 258, 260, 261, 266, 267, 271–273
Contingent, 199, 200, 213, 233
Contraction, 189
Contradiction, 17, 18, 50, 51, 72, 104, 105, 110, 155, 168, 182–184, 186, 195, 199–201, 203, 204, 206, 209, 213, 214, 233, 235, 240–241, 252, 257, 259–263
Contrapositive, 30, 48, 50, 51, 64, 70, 83, 104, 253
Converse, 2, 3, 15, 17, 22, 30, 35–36, 41, 43, 44, 48, 50, 52, 53, 55, 56, 63–65, 67, 68, 70, 73, 77, 80, 95, 102, 103, 110, 114, 120, 150, 156, 163, 182, 185, 197, 215, 235, 240, 250, 256, 258, 273
Countable set, 69, 144
Counting formulae, 121–126, 128–130, 132, 133
Course-of-values induction, 90
Cumulative induction, 89–93, 95, 104, 109, 110, 249
Cumulative recursion, 89–94, 96
Cumulative transitivity, 246–250, 252, 267, 268, 272, 273
Cut, 247, 260, 261, 263
Cycle, 168, 182–187, 259

D
Dead branch, 210–212
Decision procedure, 209
Deduction, 238, 255, 267
Definition
  bottom up, 52, 53, 62, 90, 93, 95, 110
  by common property, 9
  by enumeration, 9, 20
  top-down, 54, 95
Degtyarev, A., xiii
de Morgan laws, 19, 20, 197, 198, 200, 204, 209
Depth-first strategy, 212
Depth of a formula, 101

Derivation
  elementary, 243–247, 249, 253, 254, 257, 261, 266–268, 270
  rule, 96
  tree, 244, 245, 247
Descartes, R., 29
Designation function, 108
Detachment, 96, 98
Deterministic program, 108
Diagonal of a relation, 33
Dietrich, F., xiii
Difference, 4, 9, 13, 16–19, 23, 24, 34, 44, 46, 48, 59, 73–75, 84–85, 88, 93, 98, 113, 114, 118, 125, 130, 132, 134, 139, 143, 145, 149, 155, 158, 161, 173, 190, 193, 195, 222, 227, 232, 255, 260–261, 267
Directed graph (Digraph), 33–35, 41–45, 51, 58, 64, 66, 68, 77, 167, 168
Directed tree, 167, 168
Discharge of a supposition, 256, 262
Disjoint sets, 11, 12, 84, 110, 114, 115, 117, 144, 163
Disjoint union, 23, 113, 115
Disjunction, 14, 17, 21, 98, 101, 110, 141, 193, 202–205, 209, 214, 225, 226, 238, 257, 258, 272
Disjunctive normal form (dnf), 202–204, 210
Disjunctive proof, 15, 238, 252, 257–260, 262–264, 273, 274
Disjunctive syllogism (DS), 196, 200, 253, 273
Dispersion, 162
Distribution, 15, 19, 20, 24, 131, 136, 138–141, 143–145, 147, 160–162, 164, 197, 198, 204, 225–226, 235, 272
Division by zero, 148
dnf. See Disjunctive normal form (dnf)
Domain, 32, 45, 55, 60–63, 65, 72, 74, 75, 80, 94, 98–102, 106–108, 110, 143, 144, 148, 149, 153, 160, 173, 177, 186, 191, 222, 224–235, 238, 240, 253, 265, 273
Double negation, 101, 198, 204, 205, 225, 244, 245, 247, 261, 263
DS. See Disjunctive syllogism (DS)
Duals, 13, 204, 237, 264

E
Edge, 181–187, 259
EG. See Existential generalization (EG)
EI. See Existential instantiation (EI)
Elementary derivation, 243–247, 249, 253, 254, 257, 261, 266–268, 270
Elementary event, 141
Elementary letter, 98, 101, 194–198, 201, 202, 204, 206–208, 210, 211, 213–214, 252, 254
Element of a set, 1, 2, 29, 40, 98, 102, 117, 194
Empty relation, 32, 103, 170, 248
Empty set, 10–11, 20, 32, 47, 74, 103, 141, 148–149, 170, 186, 240
Enumeration, 4, 9, 10, 20, 29, 31, 37, 48, 55, 68, 70, 145, 163, 246
Equality, 7, 24, 35, 45, 81, 82, 85, 92, 113, 140, 149, 197, 251, 252, 272
Equinumerosity, 68–70, 77, 113
Equiprobable, 138, 139, 143–147, 160, 161
Equivalence class, 48
Equivalence relation, 44–49, 55, 56, 134, 197, 199, 207, 228, 238, 240
Euler diagram, 7–9, 11, 32, 126
Evaluating a function, 87–88
Evaluating an expression, 174, 175
Event, 139–145, 147, 150, 151, 155–157, 163
Exclusion, 24, 126, 135
Exhaustion, 46–48, 80, 92
Existential generalization (EG), 237
Existential instantiation (EI), 237, 263
Existential quantifier, 39, 46, 223, 226, 230, 236, 238
Expansion, 197, 198, 204, 205
Expectation, 141, 159–162, 164
Expected value, 160, 161
Experiment, 81, 141, 150
Explicit contradiction, 210–212, 260, 261
Exponential function, 86, 87, 92
Extensionality, 3, 4, 7, 10

F
Factorial function, 86, 87, 89, 92, 123
Falsehood, 191, 228, 233, 263
Family of sets, 74
Fibonacci function, 89–91, 109, 169
Field of subsets, 144
Fineness of a partition, 55, 207
Finest splitting, 208
Finite path, 52
Finite transform, 225, 226, 230
First-level rule, 236, 237, 252, 256, 266–268, 270–271
First-order logic, 92, 222, 238, 240, 247, 251, 255, 269, 272, 273
Flat set, 2, 18
Flattened rule, 256, 257, 259, 260, 262, 264, 265
Floor function, 77
Formulae of predicate logic, 220, 221
Formulae of propositional logic, 96, 98, 101, 172, 175, 195, 201, 205
Forwards evaluation, 87
Four-colour problem, 260
Free occurrence, 226, 227, 229, 232, 233, 236–239, 264
Frequentist conception, 142
Function, 19, 36, 57, 80, 119, 137, 169, 190, 219, 244
Functional completeness, 199, 213
Function letter, 219, 222, 227, 228, 231, 238, 240

G
Gabbay, D., 23
Gamble, 145, 160, 161
Gamut, L.T.F., 215, 241
Gate sign, 233
Generalized union and intersection, 18–20, 24
Generation, 96
Generator, 95
Goal changing, 6
Goranko, V., xiii
Grammatical structure, 171
Graph, 167, 168, 187, 244, 260

H
Haggarty, R., 136
Halmos, P., 3, 69
Hasse diagram, 20, 21, 33, 34, 51, 52, 55
Hazen, A., 274
Head of a list, 76, 100
Hedman, S., 136
Hein, J., 56, 78, 111, 215, 241
Herman, J., 136
Hermeneutics, 134
Heuristics, 172, 212, 253
Higher-level proof, 239, 252–266
Hodges, W., 215, 241
Homomorphism, 200, 201
Horn clause, 205
Howson, C., 215, 241
Huber, H., xiii
Huth, M., 215, 241

I
Idempotence, 12, 14, 204, 251, 252
Identity, 3–7, 17, 21, 28, 36, 44–46, 55, 56, 72, 76, 77, 172, 197, 198, 200, 218–220, 228, 229, 236, 238–240, 247, 271, 272
Identity function, 62, 72, 89, 99, 192
iff, 2, 28, 64, 91, 113, 155, 168, 190, 223, 247
if-then, 81
if-then-else, 86, 90
Image, 13, 40–41, 47, 54, 55, 61–62, 77, 94, 118–119, 174, 187, 230
Immediate predecessor, 43, 51, 52
Implication, logical, 233–235, 248, 249, 255
Import/export, 203
Inclusion, 2–9, 12, 17, 24, 34, 49, 51, 53, 77, 95, 103, 114, 126, 135, 148, 185, 198, 251
Inclusion and exclusion rule, 24, 126, 135
Inclusive order, 49
Inconsistent, 214, 233
Incremental rule, 255, 256, 264
Independence, 145, 155–157, 163
Index set, 74, 75
Indirect proof, 253, 262
Indiscernibility, 238
Induction, 53, 62, 70, 79–111, 139, 162, 163, 182, 194, 195, 214, 249, 250
Induction goal, 82–84, 87, 91
Induction hypothesis, 82–84, 87, 91
Induction step, 81–87, 90, 91, 97, 100, 101, 104, 109, 110, 249
Inference, 150, 189, 190, 237, 239, 244, 246, 248–251, 254–256, 258, 259, 262, 264–269, 271–274
Infinite descending chain, 102, 103, 110–111
Infix notation, 176
Injective function, 64, 68, 70, 75, 78, 99, 119
Input, 21, 58, 64, 75, 79–111, 237, 256
Insertion into a tree, 187
Integer, 2, 3, 9, 15, 16, 18, 23, 24, 27, 30, 41, 48–50, 52, 55, 60, 66, 69, 74, 75, 77, 80–87, 91, 96, 102–104, 109, 123, 134, 143, 173, 177, 222, 253, 265
Interior node of a tree, 179, 244
Interpretation, 129, 130, 133, 134, 142, 227–234, 238, 240
Intersection of sets, 11–13, 19, 52, 94, 95, 194
Intransitive relation, 43, 55, 186
Inverse of a function, 63–65
Irreflexive relation, 42, 55, 102, 103, 165, 185

J
Johnsonbaugh, R., 136, 188
Join of relations, 37, 55

K
Kolman, B., 188
Kourousias, G., xiii

L
Labelled tree, 172–174, 187, 209, 211
Leaf of a tree, 166, 171, 187, 245, 249
Least element, 56, 102, 104
Least letter set, 206–209, 214
Left projection, 73
Length of a list, 76–77, 179
Lexicographic order, 50, 107, 108, 177
Limiting case, 31, 61, 123, 127, 128, 139, 148, 149, 151, 155, 156, 164, 196–198, 202, 203, 205, 208
Linearization, 245
Linear order, 50, 55–56, 124–125, 177
Link, 34, 44, 51, 64, 82, 108, 150, 165, 166, 168–171, 173, 175, 180–184, 187, 190, 195, 248
Link-height, 169, 175, 187
Link-length, 166, 169
Lipschutz, S., 25, 56, 78, 111, 136, 164
Lipson, M., 164
List, 4, 20, 68, 70, 75–77, 100, 103, 116, 121, 124–125, 177, 179, 180, 187, 197, 198, 219, 233, 262
Literal, 202–205
Local universe, 16–18, 72–74, 94, 197–198
Logic
  boxes, 6, 12, 14, 15, 17, 21–22, 39, 74, 81, 175, 190, 219, 252, 257
  programming, 205
  propositional, 98, 101, 110, 175, 189–215, 220, 224, 226–229, 233, 238, 247, 273
  quantificational, 217–241, 247, 250
Logical closure relation, 247
Logical equivalence, 224–227, 234
Logical implication (consequence), 233
Logical system, 96
Logical truth, 227
Lower bound, 12, 151
Łukasiewicz, J., 175

M
Makinson, D., 274
Many-valued logic, 191
Meet of sets, 11
Metalanguage, 195, 197, 221, 230, 235
Meta-variable, 221
Minimal element, 56, 102–106
Minimal mixing, 208
Model, 180, 232, 233, 260
Modus ponens, 96, 98, 196, 244, 245, 253, 261, 271
Modus tollens, 196, 244, 245, 247, 253, 257, 261, 266, 269
Monotony (monotonicity), 246–252, 267, 268, 271–273
Monty Hall problem, 164
Most modular presentation, 208
Multiplication principle, 122, 127
Multiset, 120, 267
Mutual exclusion, 11, 46, 141

N
Natural deduction, 269
Natural number, 10, 23, 35, 41–45, 49, 54, 59, 65, 66, 68, 77, 83, 85, 86, 89–94, 96–99, 102–104, 106, 108–110, 218, 240, 259
Nayak, A., xiii
Negation, 17, 21, 98, 101, 110, 141, 183, 190, 192–194, 198, 200, 202–205, 209, 214, 224–226, 244, 245, 247, 260, 261, 263, 271
New Foundations (NF), 17
NF. See New Foundations (NF)
Node height, 169, 170, 179, 187
Node of a tree, 176–177, 245
Non-classical logic, 191
Non-deductive inference, 248
Non-standard analysis, 144
Normal form, 106, 201–209, 213, 214, 271
n-place function, 59, 78, 220, 227
n-place relation, 31, 36
Null event, 141

O
Objectivist conception, 142
Object language, 195, 197, 221, 230, 235
Ok branch, 210, 212
One-one correspondence, 67, 116, 117, 129, 170
One-place function, 58, 59, 68, 152, 153, 222
One-sided procedure, 225
Onto function, 65–67, 73
Open formula, 223
OCRC, 118–121, 127, 130
OCR–, 118–122, 124
O–RC, 118–121, 128–130
O–R–, 118–122, 124, 128
Order
  inclusive, 49
  linear, 50, 55–56, 125, 173, 177
  partial, 49–52, 55–56, 207
  of selection, 118, 124, 125, 129
  strict, 49–52, 56
  total, 50
Ordered n-tuple, 28, 29, 71, 75, 173
Ordered pair, 28, 29, 31, 33, 59, 152
Ordered tree, 173, 174
Organization of beliefs, 189
Output, 58, 64, 79–111, 176, 204, 237, 256
Overcounting and undercounting, 126

P
Pair, 28, 29, 31, 33, 49, 51, 58, 59, 70, 71, 92, 108, 117, 139, 145, 147, 152, 159–161, 167, 173, 181, 182, 192, 196, 198, 202, 213, 221, 230, 231, 244, 261
Pairwise disjoint, 11, 46–48, 114, 117, 144, 162, 170
Palindrome, 95, 97, 98, 110
Parent, 35, 38–40, 42–44, 53, 166–169, 171, 173, 176, 177, 182, 184, 267
Parenthesis-free notation, 175–176
Parent, x, 39
Parikh, R., 208
Partial function, 59, 60, 63, 65, 67, 75, 77, 148, 151, 173
Partial order, 49–52, 55–56
Partition, 44–48, 55, 77, 78, 113, 130–136, 154, 159, 200, 207, 208
Path, 34, 51, 52, 69, 165–169, 181–185, 234
Pattern-matching, 86, 90
Payoff function, 141, 160–162, 164
Pearson, K., 153
Pelletier, J., 274
Permutation, 48, 121–128, 130, 131, 146, 147
Permutation with repetition, 127, 128
Peter, R., 92
Philosophy, 74, 96, 97, 115, 141–144, 150, 158
Pigeonhole principle, 70–71, 74, 78, 84
Polish notation, 176, 194
Porphyry, 181
Poset, 49, 50
Positive clause, 205
Postfix notation, 176, 194
Power set, 20–24, 65, 69, 77, 103, 251
Predicate, 38, 218–222, 227, 228, 231, 240, 273
Predicate logic, 273
Prefix notation, 44, 176
Premise, 190, 196, 238, 243–246, 253–262, 264, 265, 269, 270, 273
Primitive connective, 194
Principal case, 128, 155, 184, 203, 208
Principal inference, 254, 256, 259, 264, 266
Prior probability, 148
Probability, 137–164, 190, 274
  distribution, 138, 140, 141, 143, 145, 160–162, 164
  function, 138–144, 153, 155, 157, 160, 162, 163
  space, 137–141, 143, 162
Product rule, 149
Projection, 36–37, 55, 73, 78, 153, 155, 255
Prolog, 171
Proof, 6, 7, 11, 15–17, 21, 24, 29, 30, 34, 51, 67–70, 79–85, 87, 89–92, 94, 97–98, 104–111, 120, 128, 129, 151, 152, 167, 168, 171, 172, 181–182, 184–186, 195, 199, 206, 208, 211, 214, 235, 238, 239, 243–273
  by cases, 15, 16, 252, 257–261, 263, 273, 274
  construction, 273
  by contradiction, 51, 168, 182, 186, 252, 259–263
  theory, 267
Proposition, 51, 73, 78, 84, 85, 96, 98, 101, 110, 148, 172, 175, 176, 189–215, 220, 224, 226–229, 231, 233, 238, 243–247, 250–252, 254–257, 259–261, 264, 265, 267, 269
Propositional logic, 96, 98, 101, 110, 172, 175, 189–215, 220, 224, 226–229, 233, 247, 273
Propositional variable, 194
Proving a general statement, 81

Q
QI. See Quantifier interchange (QI)
Quantificational logic, 217–241, 247–248, 250
Quantifier
  existential, 39, 46, 73, 219, 226, 232, 236, 238
  universal, 39, 46, 81, 218–220, 222, 225, 230, 253, 255
Quantifier-free formula, 231
Quantifier interchange (QI), 224–225, 235, 263, 269–271
Quine’s set theory (NF), 17

R
RAA. See Reductio ad absurdum (RAA)
Random variable, 141, 160
Range, 32, 35, 55, 60–61, 65–67, 72, 74, 77, 99, 108, 119, 125, 141, 148, 160, 219, 221
Range space, 141, 160
Ratio definition, 147–150
Rational number, 23, 80, 102
Ratio/unit definition, 149
Real interval, 138
Real number, 23, 29, 30, 69, 222, 240
Rearrangement, 84, 130–136
Reasoning, 7, 12, 17, 30, 32, 39–42, 50, 52, 58, 60, 65, 67, 77, 85, 88, 91–93, 107, 114, 115, 119, 120, 125, 130, 137, 138, 140, 143, 150, 151, 153, 157, 158, 161, 177, 182, 189, 190, 209, 220, 221, 227, 229, 230, 233, 235, 248, 252, 253, 262, 263, 267, 268
Recursion, 53, 62, 77, 79–111, 139, 169–172, 178, 187, 194, 220, 221, 223, 228, 229, 257, 268
Recursion step, 89, 90, 92, 93, 99, 101, 107–109, 139, 170, 187, 220, 221, 228, 229
Recursive program, 108–109
Reductio ad absurdum (RAA), 6, 51, 105, 168, 182, 183, 260, 261, 271
Redundant letter, 202, 205, 206, 208–209
Reflexive order, 49
Reflexive relation, 42, 49, 50
Relation, 2, 27, 58, 94, 134, 148, 165, 189, 218, 246
Relative product, 38, 39
Relettering, 106, 226–227
Repetition
  in a proof, 257
  in a selection, 118–121, 124, 126–130
Replacement, 164, 199, 225–227, 238–239, 272
Restriction, 37, 61–62, 65, 72, 77, 96, 110
Reverse Polish notation, 176, 194
Revision, 189, 190, 194, 235
Revisionary conditional probability, 149
Right projection, 73, 78, 255
Risk, 65, 125, 248
Rooted tree, 165–171, 176, 180–182, 184–187, 261
Root of a tree, 165, 168
Rosenhouse, J., 164
Rosen, K., 164
Rules, higher-level, 252, 254, 256, 257, 262, 273
Russell, B., 18
Ryan, M., 215, 241

S
Sample point, 141
Sample space, 138–141, 143–148, 152–155, 159, 161–164
Satisfaction, 233
Schumacher, C., 111
Scope of a quantifier, 223, 236, 238
Second-level inference, 266, 273
Second-level rule, 236, 237, 254–256, 259, 263, 264, 266–268, 271, 273
Second-order logic, 222
Selection, 36–37, 115, 116, 118–131, 134, 135, 146
Semantic decomposition tree, 172, 209–213, 215, 244, 254
Semantics, 172, 175, 202–205, 224, 227–230, 232–240, 265
Separation axiom scheme, 17
Sequence, 52, 74–77, 108, 117, 121, 123, 124, 129, 135, 167–169, 176, 177, 181, 182, 244–246, 249, 256, 257, 266–269
Sequent, 267, 269, 271
Set, 1–24, 54, 69, 74–75, 94–97, 102–105, 107, 115
Sibling, 35, 42–44, 167
Sigma-additivity, 144
Similarity relation, 44
Simple cycle, 182–186
Simple induction, 80–85, 91, 109
Simple recursion, 85–89, 109
Simplification, 16, 59, 87–90, 94, 108, 132, 140, 147, 177, 196, 207, 218, 222, 228, 253, 264, 271
Simpson’s paradox, 153–155
Simultaneous induction, 92–93
Simultaneous recursion, 92–93
Singleton, 2, 4–5, 20, 28, 31, 37, 50, 61, 100, 103, 131, 141, 144, 168, 170, 195, 196, 208, 240–241, 247, 268, 272
Soundness theorem for chaining, 249, 250
Source of a relation, 31
Spanning tree, 185–187
Split-level rules, 255–257, 260, 264, 267, 268
Splitting, 207–208, 260
Squeezing, 245, 257
Standard distribution, 162
Standard interpretation, 228
Strict order, 49–52, 56
Strict partial order, 52
Strict part of a relation, 50, 51, 56
String, 36, 75–77, 95–98, 100–101, 110, 180, 221, 223
Strong reflexivity, 246–249
Structural induction, 94, 97–99, 110, 194–195, 198–201, 211, 213–214
Structural recursion, 77, 93–101, 109–110
Subalgebra, 96
Subjectivist conception, 141
Subordinate inference, 254–256, 258, 259, 264–266
Subproof, 254, 256–258, 267, 269, 270, 272
Subrelation, 32, 40, 52, 55, 182, 185, 207, 249, 250
Subset, 2, 7, 16, 17, 20, 30, 31, 33, 34, 36, 37, 41, 42, 46–48, 54, 60, 61, 63, 73, 77, 78, 96, 102–105, 110, 117, 119, 124–126, 130–132, 134, 138–141, 143, 144, 147, 148, 152, 155, 172, 191, 207, 241, 250
Substitution, 46, 88, 101, 200, 201, 206, 214, 229–232, 234–237, 240, 271
Substitutional reading, 229–232, 240
Subtraction principle, 114
Subtree, 170, 177–179, 187, 245, 257
Superset, 2, 32, 34, 42, 61–62, 65, 94–95, 255
Supposition, 15, 30, 48, 50, 51, 82, 104–105, 168, 209, 235, 248, 249, 252–265, 269, 270, 272, 273
Surjective function, 66, 78, 160
Suspension points, 9, 10
Symmetric closure, 180, 181, 184
Symmetric difference, 23, 24
Symmetry, 23, 24, 33, 38, 44, 45, 47–52, 55, 155, 157, 180–186, 197, 239, 272
Syntactic decomposition tree, 174–176, 187, 194, 196

T
Table for a relation, 33
Tail of a list, 76
Target, 31–34, 36, 38, 58, 62, 138, 157, 163
Tarski conditions, 247–252, 267, 268
Tautological equivalence, 101, 195, 197–200, 202–204, 214, 269
Tautological implication (consequence), 195–197, 199–201, 214, 243–245, 247, 253, 258, 261–263
Tautology, 101, 172, 195, 199–201, 204, 209–211, 213, 214, 263
Term, 2, 19, 29, 32, 40, 58, 59, 61, 64, 67, 74, 75, 79, 121, 126, 138, 142, 144, 155, 169, 178, 183, 202, 220, 221, 228, 229, 236, 238–239, 247, 262, 265, 268
Theorem of a formal system, 96
Third-level rule, 269
Top-down definition, 54
Top-down evaluation, 89, 90
Total order, 50
Tracing a function, 88
Transitive closure, 34, 52–54, 110, 165
Transitivity, 41, 43, 45, 48, 50, 52, 55, 103, 105, 196, 238, 239, 246, 247, 272
Translation, 40, 141, 203, 223, 230, 231
Tree, 27, 106, 165, 190, 244
Truth, 12–14, 17, 22–24, 30, 46, 110, 175, 176, 191, 195, 196, 227, 233, 263
Truth-function, 218, 226
Truth-functional connective, 13, 14, 24, 190–193, 199, 212–214, 219, 225, 229, 230, 233, 273
Truth-tables, 195, 203
Tuple, ordered, 27–32, 119, 127
Turnstile, 233
Two-sided procedure, 47

U
Uncertain inference, 190, 248, 273–274
Undirected tree, 180
Unfolding, 88–90, 102
Uniform distribution, 138
Union of sets, 18, 19, 37, 52, 110, 117, 214
Unique decomposition, 77, 99–101, 176, 220
Unique minimality, 104
Unique readability, 99–101
Universal generalization (UG), 23, 236, 263
Universal instantiation (UI), 236
Universal quantifier, 39, 81, 218, 220, 222, 225, 226, 230, 253, 255
Universe of discourse, 222
Unrooted tree, 168, 180–187, 259
Until clause, 108
Update, 190
Upper bound, 13, 151
Use and mention, 220–221

V
Vacuous quantification, 226, 235, 240
Valuation of a formula, 195
Value of a function, 57, 58, 66, 75, 92
Variable, 39, 48, 57, 66, 73, 74, 84, 85, 93, 106, 121, 129, 160, 173, 174, 176, 217–221, 223, 226–229, 231–240, 263–265, 269, 272
Variance, 162
Velleman, D., 56, 78, 111, 274
Venn diagram, 8–11, 13, 14, 16, 23, 32, 114
Vertex, 183, 186, 187
von Plato, J., 274

W
Way of intersection, 94
Way of union, 94, 96
Well-founded induction, 104–106, 109
Well-founded recursion, 93, 107–108
Well-founded relation, 102, 104, 110
Well-founded set, 102–111
Well-ordered set, 103, 104
Whenever, 5, 6, 12, 13, 17, 18, 21, 24, 32, 42–46, 49–54, 56, 58, 61, 63, 64, 68, 72, 81–83, 89, 95, 97, 98, 101, 104, 106, 138, 139, 149, 151, 153, 164, 170, 171, 175, 183, 194, 195, 199, 201, 210, 220, 221, 236–238, 246–248, 250, 251, 253–255, 258, 261, 263, 264, 267, 268, 271–273
While clauses, 108
Whitehead, 175
Wikipedia, 120
Wlog, 105, 106, 182, 261
Wojcicki, R., 274

X
x-variant reading, 229, 231–235, 240

Y
Yule, G.U., 153

Z
Zermelo-Fraenkel set theory (ZF), 18
Zero-ary connective, 206, 214
ZF. See Zermelo-Fraenkel set theory (ZF)