Introduction to Theoretical Computer Science

Boaz Barak

Textbook in preparation. Available on https://introtcs.org




Text available on https://github.com/boazbk/tcs - please post any issues there - thank you!

This version was compiled on Wednesday 26th August, 2020 18:10

Copyright © 2020 Boaz Barak

This work is licensed under a Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" license.


To Ravit, Alma and Goren.


Contents

Preface 9

Preliminaries 17

0 Introduction 19

1 Mathematical Background 37

2 Computation and Representation 73

I Finite computation 111

3 Defining computation 113

4 Syntactic sugar, and computing every function 149

5 Code as data, data as code 175

II Uniform computation 205

6 Functions with Infinite domains, Automata, and Regular expressions 207

7 Loops and infinity 241

8 Equivalent models of computation 271

9 Universality and uncomputability 315

10 Restricted computational models 347

11 Is every theorem provable? 365


III Efficient algorithms 385

12 Efficient computation: An informal introduction 387

13 Modeling running time 407

14 Polynomial-time reductions 439

15 NP, NP completeness, and the Cook-Levin Theorem 465

16 What if P equals NP? 483

17 Space bounded computation 503

IV Randomized computation 505

18 Probability Theory 101 507

19 Probabilistic computation 527

20 Modeling randomized computation 539

V Advanced topics 561

21 Cryptography 563

22 Proofs and algorithms 591

23 Quantum computing 593

VI Appendices 625


Contents (detailed)

Preface 9
0.1 To the student 10

0.1.1 Is the effort worth it? 11
0.2 To potential instructors 12
0.3 Acknowledgements 14

Preliminaries 17

0 Introduction 19
0.1 Integer multiplication: an example of an algorithm 20
0.2 Extended Example: A faster way to multiply (optional) 22
0.3 Algorithms beyond arithmetic 27
0.4 On the importance of negative results 28
0.5 Roadmap to the rest of this book 29

0.5.1 Dependencies between chapters 30
0.6 Exercises 32
0.7 Bibliographical notes 33

1 Mathematical Background 37
1.1 This chapter: a reader's manual 37
1.2 A quick overview of mathematical prerequisites 38
1.3 Reading mathematical texts 39

1.3.1 Definitions 40
1.3.2 Assertions: Theorems, lemmas, claims 40
1.3.3 Proofs 40

1.4 Basic discrete math objects 41
1.4.1 Sets 41
1.4.2 Special sets 42
1.4.3 Functions 44
1.4.4 Graphs 46
1.4.5 Logic operators and quantifiers 49
1.4.6 Quantifiers for summations and products 50
1.4.7 Parsing formulas: bound and free variables 50
1.4.8 Asymptotics and Big-O notation 52


1.4.9 Some "rules of thumb" for Big-O notation 53
1.5 Proofs 54

1.5.1 Proofs and programs 55
1.5.2 Proof writing style 55
1.5.3 Patterns in proofs 56

1.6 Extended example: Topological Sorting 58
1.6.1 Mathematical induction 60
1.6.2 Proving the result by induction 61
1.6.3 Minimality and uniqueness 63

1.7 This book: notation and conventions 65
1.7.1 Variable name conventions 66
1.7.2 Some idioms 67

1.8 Exercises 69
1.9 Bibliographical notes 71

2 Computation and Representation 73
2.1 Defining representations 75

2.1.1 Representing natural numbers 76
2.1.2 Meaning of representations (discussion) 78

2.2 Representations beyond natural numbers 78
2.2.1 Representing (potentially negative) integers 79
2.2.2 Two's complement representation (optional) 79
2.2.3 Rational numbers, and representing pairs of strings 80
2.3 Representing real numbers 82
2.4 Cantor's Theorem, countable sets, and string representations of the real numbers 83
2.4.1 Corollary: Boolean functions are uncountable 89
2.4.2 Equivalent conditions for countability 89
2.5 Representing objects beyond numbers 90
2.5.1 Finite representations 91
2.5.2 Prefix-free encoding 91
2.5.3 Making representations prefix-free 94
2.5.4 "Proof by Python" (optional) 95
2.5.5 Representing letters and text 97
2.5.6 Representing vectors, matrices, images 99
2.5.7 Representing graphs 99
2.5.8 Representing lists and nested lists 99
2.5.9 Notation 100

2.6 Defining computational tasks as mathematical functions 100
2.6.1 Distinguish functions from programs! 102

2.7 Exercises 104
2.8 Bibliographical notes 108


I Finite computation 111

3 Defining computation 113
3.1 Defining computation 115
3.2 Computing using AND, OR, and NOT 116

3.2.1 Some properties of AND and OR 118
3.2.2 Extended example: Computing XOR from AND, OR, and NOT 119
3.2.3 Informally defining "basic operations" and "algorithms" 121
3.3 Boolean Circuits 123

3.3.1 Boolean circuits: a formal definition 124
3.3.2 Equivalence of circuits and straight-line programs 127

3.4 Physical implementations of computing devices (digression) 130

3.4.1 Transistors 131
3.4.2 Logical gates from transistors 132
3.4.3 Biological computing 132
3.4.4 Cellular automata and the game of life 132
3.4.5 Neural networks 132
3.4.6 A computer made from marbles and pipes 133

3.5 The NAND function 134
3.5.1 NAND Circuits 135
3.5.2 More examples of NAND circuits (optional) 136
3.5.3 The NAND-CIRC Programming language 138

3.6 Equivalence of all these models 140
3.6.1 Circuits with other gate sets 141
3.6.2 Specification vs. implementation (again) 142

3.7 Exercises 143
3.8 Bibliographical notes 146

4 Syntactic sugar, and computing every function 149
4.1 Some examples of syntactic sugar 151

4.1.1 User-defined procedures 151
4.1.2 Proof by Python (optional) 153
4.1.3 Conditional statements 154

4.2 Extended example: Addition and Multiplication (optional) 157

4.3 The LOOKUP function 158
4.3.1 Constructing a NAND-CIRC program for LOOKUP 159
4.4 Computing every function 161

4.4.1 Proof of NAND's Universality 162
4.4.2 Improving by a factor of n (optional) 163


4.5 Computing every function: An alternative proof 165
4.6 The class SIZE(T) 167
4.7 Exercises 170
4.8 Bibliographical notes 174

5 Code as data, data as code 175
5.1 Representing programs as strings 177
5.2 Counting programs, and lower bounds on the size of NAND-CIRC programs 178
5.2.1 Size hierarchy theorem (optional) 180

5.3 The tuples representation 182
5.3.1 From tuples to strings 183

5.4 A NAND-CIRC interpreter in NAND-CIRC 184
5.4.1 Efficient universal programs 185
5.4.2 A NAND-CIRC interpreter in "pseudocode" 186
5.4.3 A NAND interpreter in Python 187
5.4.4 Constructing the NAND-CIRC interpreter in NAND-CIRC 188
5.5 A Python interpreter in NAND-CIRC (discussion) 191
5.6 The physical extended Church-Turing thesis (discussion) 193
5.6.1 Attempts at refuting the PECTT 195

5.7 Recap of Part I: Finite Computation 199
5.8 Exercises 201
5.9 Bibliographical notes 203

II Uniform computation 205

6 Functions with Infinite domains, Automata, and Regular expressions 207
6.1 Functions with inputs of unbounded length 208

6.1.1 Varying inputs and outputs 209
6.1.2 Formal Languages 211
6.1.3 Restrictions of functions 211

6.2 Deterministic finite automata (optional) 212
6.2.1 Anatomy of an automaton (finite vs. unbounded) 215
6.2.2 DFA-computable functions 216

6.3 Regular expressions 217
6.3.1 Algorithms for matching regular expressions 221

6.4 Efficient matching of regular expressions (optional) 223
6.4.1 Matching regular expressions using DFAs 227
6.4.2 Equivalence of regular expressions and automata 228
6.4.3 Closure properties of regular expressions 230


6.5 Limitations of regular expressions and the pumping lemma 231

6.6 Answering semantic questions about regular expressions 236

6.7 Exercises 239
6.8 Bibliographical notes 240

7 Loops and infinity 241
7.1 Turing Machines 242

7.1.1 Extended example: A Turing machine for palindromes 244

7.1.2 Turing machines: a formal definition 245
7.1.3 Computable functions 247
7.1.4 Infinite loops and partial functions 248

7.2 Turing machines as programming languages 249
7.2.1 The NAND-TM Programming language 251
7.2.2 Sneak peek: NAND-TM vs Turing machines 254
7.2.3 Examples 254

7.3 Equivalence of Turing machines and NAND-TM programs 257

7.3.1 Specification vs implementation (again) 260
7.4 NAND-TM syntactic sugar 261

7.4.1 "GOTO" and inner loops 261
7.5 Uniformity, and NAND vs NAND-TM (discussion) 264
7.6 Exercises 265
7.7 Bibliographical notes 268

8 Equivalent models of computation 271
8.1 RAM machines and NAND-RAM 273
8.2 The gory details (optional) 277

8.2.1 Indexed access in NAND-TM 277
8.2.2 Two dimensional arrays in NAND-TM 279
8.2.3 All the rest 279

8.3 Turing equivalence (discussion) 280
8.3.1 The "Best of both worlds" paradigm 281
8.3.2 Let's talk about abstractions 281
8.3.3 Turing completeness and equivalence, a formal definition (optional) 283
8.4 Cellular automata 284

8.4.1 One dimensional cellular automata are Turing complete 286

8.4.2 Configurations of Turing machines and the next-step function 287


8.5 Lambda calculus and functional programming languages 290

8.5.1 Applying functions to functions 290
8.5.2 Obtaining multi-argument functions via Currying 291
8.5.3 Formal description of the λ calculus 292
8.5.4 Infinite loops in the λ calculus 295

8.6 The "Enhanced" λ calculus 295
8.6.1 Computing a function in the enhanced λ calculus 297
8.6.2 Enhanced λ calculus is Turing-complete 298

8.7 From enhanced to pure λ calculus 301
8.7.1 List processing 302
8.7.2 The Y combinator, or recursion without recursion 303

8.8 The Church-Turing Thesis (discussion) 306
8.8.1 Different models of computation 307

8.9 Exercises 308
8.10 Bibliographical notes 312

9 Universality and uncomputability 315
9.1 Universality or a meta-circular evaluator 316

9.1.1 Proving the existence of a universal Turing Machine 318

9.1.2 Implications of universality (discussion) 320
9.2 Is every function computable? 321
9.3 The Halting problem 323

9.3.1 Is the Halting problem really hard? (discussion) 326
9.3.2 A direct proof of the uncomputability of HALT (optional) 327
9.4 Reductions 329

9.4.1 Example: Halting on the zero problem 330
9.5 Rice's Theorem and the impossibility of general software verification 333
9.5.1 Rice's Theorem 335
9.5.2 Halting and Rice's Theorem for other Turing-complete models 339
9.5.3 Is software verification doomed? (discussion) 340

9.6 Exercises 342
9.7 Bibliographical notes 345

10 Restricted computational models 347
10.1 Turing completeness as a bug 347
10.2 Context free grammars 349

10.2.1 Context-free grammars as a computational model 351
10.2.2 The power of context free grammars 353
10.2.3 Limitations of context-free grammars (optional) 355


10.3 Semantic properties of context free languages 357
10.3.1 Uncomputability of context-free grammar equivalence (optional) 357
10.4 Summary of semantic properties for regular expressions and context-free grammars 360
10.5 Exercises 361
10.6 Bibliographical notes 362

11 Is every theorem provable? 365
11.1 Hilbert's Program and Gödel's Incompleteness Theorem 365

11.1.1 Defining "Proof Systems" 367
11.2 Gödel's Incompleteness Theorem: Computational variant 368
11.3 Quantified integer statements 370
11.4 Diophantine equations and the MRDP Theorem 372
11.5 Hardness of quantified integer statements 374

11.5.1 Step 1: Quantified mixed statements and computation histories 375

11.5.2 Step 2: Reducing mixed statements to integer statements 378

11.6 Exercises 380
11.7 Bibliographical notes 382

III Efficient algorithms 385

12 Efficient computation: An informal introduction 387
12.1 Problems on graphs 389

12.1.1 Finding the shortest path in a graph 390
12.1.2 Finding the longest path in a graph 392
12.1.3 Finding the minimum cut in a graph 392
12.1.4 Min-Cut Max-Flow and Linear programming 393
12.1.5 Finding the maximum cut in a graph 395
12.1.6 A note on convexity 395

12.2 Beyond graphs 397
12.2.1 SAT 397
12.2.2 Solving linear equations 398
12.2.3 Solving quadratic equations 399

12.3 More advanced examples 399
12.3.1 Determinant of a matrix 399
12.3.2 Permanent of a matrix 401
12.3.3 Finding a zero-sum equilibrium 401
12.3.4 Finding a Nash equilibrium 402
12.3.5 Primality testing 402
12.3.6 Integer factoring 403


12.4 Our current knowledge 403
12.5 Exercises 404
12.6 Bibliographical notes 404
12.7 Further explorations 405

13 Modeling running time 407
13.1 Formally defining running time 409

13.1.1 Polynomial and Exponential Time 410
13.2 Modeling running time using RAM Machines / NAND-RAM 412
13.3 Extended Church-Turing Thesis (discussion) 417
13.4 Efficient universal machine: a NAND-RAM interpreter in NAND-RAM 418
13.4.1 Timed Universal Turing Machine 420

13.5 The time hierarchy theorem 421
13.6 Non uniform computation 425

13.6.1 Oblivious NAND-TM programs 427
13.6.2 "Unrolling the loop": algorithmic transformation of Turing Machines to circuits 430
13.6.3 Can uniform algorithms simulate non uniform ones? 432
13.6.4 Uniform vs. Nonuniform computation: A recap 434

13.7 Exercises 435
13.8 Bibliographical notes 438

14 Polynomial-time reductions 439
14.1 Formal definitions of problems 440
14.2 Polynomial-time reductions 441

14.2.1 Whistling pigs and flying horses 442
14.3 Reducing 3SAT to zero one and quadratic equations 444

14.3.1 Quadratic equations 447
14.4 The independent set problem 449
14.5 Some exercises and anatomy of a reduction 452

14.5.1 Dominating set 453
14.5.2 Anatomy of a reduction 456

14.6 Reducing Independent Set to Maximum Cut 458
14.7 Reducing 3SAT to Longest Path 459

14.7.1 Summary of relations 462
14.8 Exercises 462
14.9 Bibliographical notes 463

15 NP, NP completeness, and the Cook-Levin Theorem 465
15.1 The class NP 465

15.1.1 Examples of functions in NP 468
15.1.2 Basic facts about NP 470


15.2 From NP to 3SAT: The Cook-Levin Theorem 471
15.2.1 What does this mean? 473
15.2.2 The Cook-Levin Theorem: Proof outline 473

15.3 The NANDSAT Problem, and why it is NP hard 474
15.4 The 3NAND problem 476
15.5 From 3NAND to 3SAT 479
15.6 Wrapping up 480
15.7 Exercises 481
15.8 Bibliographical notes 482

16 What if P equals NP? 483
16.1 Search-to-decision reduction 484
16.2 Optimization 487

16.2.1 Example: Supervised learning 490
16.2.2 Example: Breaking cryptosystems 491

16.3 Finding mathematical proofs 491
16.4 Quantifier elimination (advanced) 492

16.4.1 Application: self improving algorithm for 3SAT 495
16.5 Approximating counting problems and posterior sampling (advanced, optional) 495
16.6 What does all of this imply? 496
16.7 Can P ≠ NP be neither true nor false? 498
16.8 Is P = NP "in practice"? 499
16.9 What if P ≠ NP? 500
16.10 Exercises 502
16.11 Bibliographical notes 502

17 Space bounded computation 503
17.1 Exercises 503
17.2 Bibliographical notes 503

IV Randomized computation 505

18 Probability Theory 101 507
18.1 Random coins 507

18.1.1 Random variables 510
18.1.2 Distributions over strings 512
18.1.3 More general sample spaces 512

18.2 Correlations and independence 513
18.2.1 Independent random variables 514
18.2.2 Collections of independent random variables 516

18.3 Concentration and tail bounds 516
18.3.1 Chebyshev's Inequality 518
18.3.2 The Chernoff bound 519


18.3.3 Application: Supervised learning and empirical risk minimization 520

18.4 Exercises 522
18.5 Bibliographical notes 525

19 Probabilistic computation 527
19.1 Finding approximately good maximum cuts 528

19.1.1 Amplifying the success of randomized algorithms 529
19.1.2 Success amplification 530
19.1.3 Two-sided amplification 531
19.1.4 What does this mean? 531
19.1.5 Solving SAT through randomization 532
19.1.6 Bipartite matching 534

19.2 Exercises 537
19.3 Bibliographical notes 537
19.4 Acknowledgements 537

20 Modeling randomized computation 539
20.1 Modeling randomized computation 540

20.1.1 An alternative view: random coins as an "extra input" 542

20.1.2 Success amplification of two-sided error algorithms 544

20.2 BPP and NP completeness 545
20.3 The power of randomization 547

20.3.1 Solving BPP in exponential time 547
20.3.2 Simulating randomized algorithms by circuits 547

20.4 Derandomization 549
20.4.1 Pseudorandom generators 550
20.4.2 From existence to constructivity 552
20.4.3 Usefulness of pseudorandom generators 553

20.5 P = NP and BPP vs P 554
20.6 Non-constructive existence of pseudorandom generators (advanced, optional) 557
20.7 Exercises 560
20.8 Bibliographical notes 560

V Advanced topics 561

21 Cryptography 563
21.1 Classical cryptosystems 564
21.2 Defining encryption 566
21.3 Defining security of encryption 567
21.4 Perfect secrecy 569


21.4.1 Example: Perfect secrecy in the battlefield 570
21.4.2 Constructing perfectly secret encryption 571

21.5 Necessity of long keys 572
21.6 Computational secrecy 574

21.6.1 Stream ciphers or the "derandomized one-time pad" 576

21.7 Computational secrecy and NP 579
21.8 Public key cryptography 581

21.8.1 Defining public key encryption 583
21.8.2 Diffie-Hellman key exchange 584

21.9 Other security notions 586
21.10 Magic 586

21.10.1 Zero knowledge proofs 586
21.10.2 Fully homomorphic encryption 587
21.10.3 Multiparty secure computation 587

21.11 Exercises 589
21.12 Bibliographical notes 589

22 Proofs and algorithms 591
22.1 Exercises 591
22.2 Bibliographical notes 591

23 Quantum computing 593
23.1 The double slit experiment 594
23.2 Quantum amplitudes 594

23.2.1 Linear algebra quick review 597
23.3 Bell's Inequality 598
23.4 Quantum weirdness 600
23.5 Quantum computing and computation - an executive summary 600
23.6 Quantum systems 602

23.6.1 Quantum amplitudes 604
23.6.2 Quantum systems: an executive summary 605

23.7 Analysis of Bell's Inequality (optional) 605
23.8 Quantum computation 608

23.8.1 Quantum circuits 608
23.8.2 QNAND-CIRC programs (optional) 611
23.8.3 Uniform computation 612

23.9 Physically realizing quantum computation 613
23.10 Shor's Algorithm: Hearing the shape of prime factors 614

23.10.1 Period finding 614
23.10.2 Shor's Algorithm: A bird's eye view 615

23.11 Quantum Fourier Transform (advanced, optional) 617


23.11.1 Quantum Fourier Transform over the Boolean Cube: Simon's Algorithm 619

23.11.2 From Fourier to Period finding: Simon's Algorithm (advanced, optional) 620

23.11.3 From Simon to Shor (advanced, optional) 621
23.12 Exercises 623
23.13 Bibliographical notes 623

VI Appendices 625


Preface

"We make ourselves no promises, but we cherish the hope that the unobstructed pursuit of useless knowledge will prove to have consequences in the future as in the past" … "An institution which sets free successive generations of human souls is amply justified whether or not this graduate or that makes a so-called useful contribution to human knowledge. A poem, a symphony, a painting, a mathematical truth, a new scientific fact, all bear in themselves all the justification that universities, colleges, and institutes of research need or require", Abraham Flexner, The Usefulness of Useless Knowledge, 1939.

"I suggest that you take the hardest courses that you can, because you learn the most when you challenge yourself… CS 121 I found pretty hard.", Mark Zuckerberg, 2005.

This is a textbook for an undergraduate introductory course on Theoretical Computer Science. The educational goals of this book are to convey the following:

• That computation arises in a variety of natural and human-made systems, and not only in modern silicon-based computers.

• Similarly, beyond being an extremely important tool, computation also serves as a useful lens to describe natural, physical, mathematical and even social concepts.

• The notion of universality of many different computational models, and the related notion of the duality between code and data.

• The idea that one can precisely define a mathematical model of computation, and then use that to prove (or sometimes only conjecture) lower bounds and impossibility results.

• Some of the surprising results and discoveries in modern theoretical computer science, including the prevalence of NP-completeness, the power of interaction, the power of randomness on one hand and the possibility of derandomization on the other, the ability to use hardness "for good" in cryptography, and the fascinating possibility of quantum computing.


I hope that following this course, students would be able to recognize computation, with both its power and pitfalls, as it arises in various settings, including seemingly “static” content or “restricted” formalisms such as macros and scripts. They should be able to follow through the logic of proofs about computation, including the central concept of a reduction, as well as understanding “self-referential” proofs (such as diagonalization-based proofs that involve programs given their own code as input). Students should understand that some problems are inherently intractable, and be able to recognize the potential for intractability when they are faced with a new problem. While this book only touches on cryptography, students should understand the basic idea of how we can use computational hardness for cryptographic purposes. However, more than any specific skill, this book aims to introduce students to a new way of thinking of computation as an object in its own right and to illustrate how this new way of thinking leads to far-reaching insights and applications.

My aim in writing this text is to try to convey these concepts in the simplest possible way and try to make sure that the formal notation and model help elucidate, rather than obscure, the main ideas. I also tried to take advantage of modern students’ familiarity (or at least interest!) in programming, and hence use (highly simplified) programming languages to describe our models of computation. That said, this book does not assume fluency with any particular programming language, but rather only some familiarity with the general notion of programming. We will use programming metaphors and idioms, occasionally mentioning specific programming languages such as Python, C, or Lisp, but students should be able to follow these descriptions even if they are not familiar with these languages.

Proofs in this book, including the existence of a universal Turing Machine, the fact that every finite function can be computed by some circuit, the Cook-Levin theorem, and many others, are often constructive and algorithmic, in the sense that they ultimately involve transforming one program to another. While it is possible to follow these proofs without seeing the code, I do think that having access to the code, and the ability to play around with it and see how it acts on various programs, can make these theorems more concrete for the students. To that end, an accompanying website (which is still a work in progress) allows executing programs in the various computational models we define, as well as seeing constructive proofs of some of the theorems.

0.1 TO THE STUDENT

This book can be challenging, mainly because it brings together a variety of ideas and techniques in the study of computation. There


are quite a few technical hurdles to master, whether it is following the diagonalization argument for proving the Halting Problem is undecidable, combinatorial gadgets in NP-completeness reductions, analyzing probabilistic algorithms, or arguing about the adversary to prove the security of cryptographic primitives.

The best way to engage with this material is to read these notes actively, so make sure you have a pen ready. While reading, I encourage you to stop and think about the following:

• When I state a theorem, stop and take a shot at proving it on your own before reading the proof. You will be amazed by how much better you can understand a proof even after only 5 minutes of attempting it on your own.

• When reading a definition, make sure that you understand what the definition means, and what the natural examples are of objects that satisfy it and objects that do not. Try to think of the motivation behind the definition, and whether there are other natural ways to formalize the same concept.

• Actively notice which questions arise in your mind as you read the text, and whether or not they are answered in the text.

As a general rule, it is more important that you understand the definitions than the theorems, and it is more important that you understand a theorem statement than its proof. After all, before you can prove a theorem, you need to understand what it states, and to understand what a theorem is about, you need to know the definitions of the objects involved. Whenever a proof of a theorem is at least somewhat complicated, I provide a “proof idea.” Feel free to skip the actual proof in a first reading, focusing only on the proof idea.

This book contains some code snippets, but this is by no means a programming text. You don’t need to know how to program to follow this material. The reason we use code is that it is a precise way to describe computation. Particular implementation details are not as important to us, and so we will emphasize code readability at the expense of considerations such as error handling, encapsulation, etc. that can be extremely important for real-world programming.

0.1.1 Is the effort worth it?

This is not an easy book, and you might reasonably wonder why you should spend the effort in learning this material. A traditional justification for a “Theory of Computation” course is that you might encounter these concepts later on in your career. Perhaps you will come across a hard problem and realize it is NP complete, or find a need to use what you learned about regular expressions. This might


1 An earlier book that starts with circuits as the initial model is John Savage’s [Sav98].

very well be true, but the main benefit of this book is not in teaching you any practical tool or technique, but instead in giving you a different way of thinking: an ability to recognize computational phenomena even when they occur in non-obvious settings, a way to model computational tasks and questions, and to reason about them.

Regardless of any use you will derive from this book, I believe learning this material is important because it contains concepts that are both beautiful and fundamental. The role that energy and matter played in the 20th century is played in the 21st by computation and information, not just as tools for our technology and economy, but also as the basic building blocks we use to understand the world. This book will give you a taste of some of the theory behind those, and hopefully spark your curiosity to study more.

0.2 TO POTENTIAL INSTRUCTORS

I wrote this book for my Harvard course, but I hope that other lecturers will find it useful as well. To some extent, it is similar in content to “Theory of Computation” or “Great Ideas” courses such as those taught at CMU or MIT.

The most significant difference between our approach and more traditional ones (such as Hopcroft and Ullman’s [HU69; HU79] and Sipser’s [Sip97]) is that we do not start with finite automata as our initial computational model. Instead, our initial computational model is Boolean Circuits.1 We believe that Boolean Circuits are more fundamental to the theory of computing (and even its practice!) than automata. In particular, Boolean Circuits are a prerequisite for many concepts that one would want to teach in a modern course on Theoretical Computer Science, including cryptography, quantum computing, derandomization, attempts at proving P ≠ NP, and more. Even in cases where Boolean Circuits are not strictly required, they can often offer significant simplifications (as in the case of the proof of the Cook-Levin Theorem).

Furthermore, I believe there are pedagogical reasons to start with Boolean circuits as opposed to finite automata. Boolean circuits are a more natural model of computation, and one that corresponds more closely to computing in silicon, making the connection to practice more immediate to the students. Finite functions are arguably easier to grasp than infinite ones, as we can fully write down their truth table. The theorem that every finite function can be computed by some Boolean circuit is both simple enough and important enough to serve as an excellent starting point for this course. Moreover, many of the main conceptual points of the theory of computation, including the notions of the duality between code and data, and the idea of universality, can already be seen in this context.


After Boolean circuits, we move on to Turing machines and prove results such as the existence of a universal Turing machine, the uncomputability of the halting problem, and Rice’s Theorem. Automata are discussed after we see Turing machines and undecidability, as an example of a restricted computational model where problems such as determining halting can be effectively solved.

While this is not our motivation, the order we present circuits, Turing machines, and automata roughly corresponds to the chronological order of their discovery. Boolean algebra goes back to Boole’s and DeMorgan’s works in the 1840s [Boo47; De 47] (though the definition of Boolean circuits and the connection to physical computation was given 90 years later by Shannon [Sha38]). Alan Turing defined what we now call “Turing Machines” in the 1930s [Tur37], while finite automata were introduced in the 1943 work of McCulloch and Pitts [MP43] but only really understood in the seminal 1959 work of Rabin and Scott [RS59].

More importantly, while models such as finite-state machines, regular expressions, and context-free grammars are incredibly important for practice, the main applications for these models (whether it is for parsing, for analyzing properties such as liveness and safety, or even for software-defined routing tables) rely crucially on the fact that these are tractable models for which we can effectively answer semantic questions. This practical motivation can be better appreciated after students see the undecidability of semantic properties of general computing models.

The fact that we start with circuits makes proving the Cook-Levin Theorem much easier. In fact, our proof of this theorem can be (and is) done using a handful of lines of Python. Combining this proof with the standard reductions (which are also implemented in Python) allows students to appreciate visually how a question about computation can be mapped into a question about (for example) the existence of an independent set in a graph.

Some other differences between this book and previous texts are the following:

1. For measuring time complexity, we use the standard RAM machine model used (implicitly) in algorithms courses, rather than Turing machines. While these two models are of course polynomially equivalent, and hence make no difference for the definitions of the classes P, NP, and EXP, our choice makes the distinction between notions such as O(n) or O(n²) time more meaningful. This choice also ensures that these finer-grained time complexity classes correspond to the informal definitions of linear and quadratic time that


students encounter in their algorithms lectures (or their whiteboard coding interviews…).

2. We use the terminology of functions rather than languages. That is, rather than saying that a Turing Machine M decides a language L ⊆ {0,1}*, we say that it computes a function F : {0,1}* → {0,1}. The terminology of “languages” arises from Chomsky’s work [Cho56], but it is often more confusing than illuminating. The language terminology also makes it cumbersome to discuss concepts such as algorithms that compute functions with more than one bit of output (including basic tasks such as addition, multiplication, etc…). The fact that we use functions rather than languages means we have to be extra vigilant about students distinguishing between the specification of a computational task (e.g., the function) and its implementation (e.g., the program). On the other hand, this point is so important that it is worth repeatedly emphasizing and drilling into the students, regardless of the notation used. The book does mention the language terminology and reminds students of it occasionally, to make it easier for them to consult outside resources.

Reducing the time dedicated to finite automata and context-free languages allows instructors to spend more time on topics that a modern course in the theory of computing needs to touch upon. These include randomness and computation, the interactions between proofs and programs (including Gödel’s incompleteness theorem, interactive proof systems, and even a bit on the λ-calculus and the Curry-Howard correspondence), cryptography, and quantum computing.

This book contains sufficient detail to enable its use for self-study. Toward that end, every chapter starts with a list of learning objectives, ends with a recap, and is peppered with “pause boxes” which encourage students to stop and work out an argument or make sure they understand a definition before continuing further.

Section 0.5 contains a “roadmap” for this book, with descriptions of the different chapters, as well as the dependency structure between them. This can help in planning a course based on this book.

0.3 ACKNOWLEDGEMENTS

This text is continually evolving, and I am getting input from many people, for which I am deeply grateful. Salil Vadhan co-taught with me the first iteration of this course and gave me a tremendous amount of useful feedback and insights during this process. Michele Amoretti and Marika Swanberg carefully read several chapters of this text and gave extremely helpful detailed comments. Dave Evans and Richard Xu contributed many pull requests fixing errors and improving phrasing. Thanks to Anil Ada, Venkat Guruswami, and Ryan O’Donnell for helpful tips from their experience in teaching CMU 15-251.

Thanks to everyone that sent me comments, typo reports, or posted issues or pull requests on the GitHub repository https://github.com/boazbk/tcs. In particular I would like to acknowledge helpful feedback from Scott Aaronson, Michele Amoretti, Aadi Bajpai, Marguerite Basta, Anindya Basu, Sam Benkelman, Jarosław Błasiok, Emily Chan, Christy Cheng, Michelle Chiang, Daniel Chiu, Chi-Ning Chou, Michael Colavita, Rodrigo Daboin Sanchez, Robert Darley Waddilove, Anlan Du, Juan Esteller, David Evans, Michael Fine, Simon Fischer, Leor Fishman, Zaymon Foulds-Cook, William Fu, Kent Furuie, Piotr Galuszka, Carolyn Ge, Mark Goldstein, Alexander Golovnev, Sayan Goswami, Michael Haak, Rebecca Hao, Joosep Hook, Thomas HUET, Emily Jia, Chan Kang, Nina Katz-Christy, Vidak Kazic, Eddie Kohler, Estefania Lahera, Allison Lee, Benjamin Lee, Ondřej Lengál, Raymond Lin, Emma Ling, Alex Lombardi, Lisa Lu, Aditya Mahadevan, Christian May, Jacob Meyerson, Leon Mlodzian, George Moe, Glenn Moss, Hamish Nicholson, Owen Niles, Sandip Nirmel, Sebastian Oberhoff, Thomas Orton, Joshua Pan, Pablo Parrilo, Juan Perdomo, Banks Pickett, Aaron Sachs, Abdelrhman Saleh, Brian Sapozhnikov, Anthony Scemama, Peter Schäfer, Josh Seides, Alaisha Sharma, Haneul Shin, Noah Singer, Matthew Smedberg, Miguel Solano, Hikari Sorensen, David Steurer, Alec Sun, Amol Surati, Everett Sussman, Marika Swanberg, Garrett Tanzer, Eric Thomas, Sarah Turnill, Salil Vadhan, Patrick Watts, Jonah Weissman, Ryan Williams, Licheng Xu, Richard Xu, Wanqian Yang, Elizabeth Yeoh-Wang, Josh Zelinsky, Fred Zhang, Grace Zhang, and Jessica Zhu.

I am using many open source software packages in the production of these notes for which I am grateful. In particular, I am thankful to Donald Knuth and Leslie Lamport for LaTeX and to John MacFarlane for Pandoc. David Steurer wrote the original scripts to produce this text. The current version uses Sergio Correia’s panflute. The templates for the LaTeX and HTML versions are derived from Tufte LaTeX, Gitbook and Bookdown. Thanks to Amy Hendrickson for some LaTeX consulting. Juan Esteller and Gabe Montague initially implemented the NAND* programming languages in OCaml and Javascript. I used the Jupyter project to write the supplemental code snippets.

Finally, I would like to thank my family: my wife Ravit, and my children Alma and Goren. Working on this book (and the corresponding course) took so much of my time that Alma wrote an essay for her fifth-grade class saying that “universities should not pressure professors to work too much.” I’m afraid all I have to show for this effort is 600 pages of ultra-boring mathematical text.


PRELIMINARIES


1 This quote is typically read as disparaging the importance of actual physical computers in Computer Science, but note that telescopes are absolutely essential to astronomy as they provide us with the means to connect theoretical predictions with actual experimental observations.

2 To be fair, in the following sentence Graham says “you need to know how to calculate time and space complexity and about Turing completeness”. This book includes these topics, as well as others such as NP-hardness, randomization, cryptography, quantum computing, and more.

0 Introduction

“Computer Science is no more about computers than astronomy is about telescopes”, attributed to Edsger Dijkstra.1

“Hackers need to understand the theory of computation about as much as painters need to understand paint chemistry.”, Paul Graham 2003.2

“The subject of my talk is perhaps most directly indicated by simply asking two questions: first, is it harder to multiply than to add? and second, why? … I (would like to) show that there is no algorithm for multiplication computationally as simple as that for addition, and this proves something of a stumbling block.”, Alan Cobham, 1964

The origin of much of science and medicine can be traced back to the ancient Babylonians. However, the Babylonians’ most significant contribution to humanity was arguably the invention of the place-value number system. The place-value system represents any number using a collection of digits, whereby the position of the digit is used to determine its value, as opposed to a system such as Roman numerals, where every symbol has a fixed numerical value regardless of position. For example, the average distance to the moon is approximately 238,900 of our miles or 259,956 Roman miles. The latter quantity, expressed in standard Roman numerals is
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMDCCCCLVI

Writing the distance to the sun in Roman numerals would require about 100,000 symbols: a 50-page book just containing this single number!


Learning Objectives:

• Introduce and motivate the study of computation for its own sake, irrespective of particular implementations.

• The notion of an algorithm and some of its history.

• Algorithms as not just tools, but also ways of thinking and understanding.

• Taste of Big-O analysis and the surprising creativity in the design of efficient algorithms.


For someone who thinks of numbers in an additive system like Roman numerals, quantities like the distance to the moon or sun are not merely large; they are unspeakable: they cannot be expressed or even grasped. It’s no wonder that Eratosthenes, who was the first person to calculate the earth’s diameter (up to about ten percent error), and Hipparchus, who was the first to calculate the distance to the moon, did not use a Roman-numeral type system but rather the Babylonian sexagesimal (i.e., base 60) place-value system.

0.1 INTEGER MULTIPLICATION: AN EXAMPLE OF AN ALGORITHM

In the language of Computer Science, the place-value system for representing numbers is known as a data structure: a set of instructions, or “recipe”, for representing objects as symbols. An algorithm is a set of instructions, or “recipe”, for performing operations on such representations. Data structures and algorithms have enabled amazing applications that have transformed human society, but their importance goes beyond their practical utility. Structures from computer science, such as bits, strings, graphs, and even the notion of a program itself, as well as concepts such as universality and replication, have not just found (many) practical uses but contributed a new language and a new way to view the world.

In addition to coming up with the place-value system, the Babylonians also invented the “standard algorithms” that we were all taught in elementary school for adding and multiplying numbers. These algorithms have been essential throughout the ages for people using abaci, papyrus, or pencil and paper, but in our computer age, do they still serve any purpose beyond torturing third graders? To see why these algorithms are still very much relevant, let us compare the Babylonian digit-by-digit multiplication algorithm (“grade-school multiplication”) with the naive algorithm that multiplies numbers through repeated addition. We start by formally describing both algorithms, see Algorithm 0.1 and Algorithm 0.2.

Algorithm 0.1 — Multiplication via repeated addition.

Input: Non-negative integers x, y
Output: Product x ⋅ y
1: Let result ← 0.
2: for i = 1, …, y do
3:   result ← result + x
4: end for
5: return result
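The book’s own Python implementations live on its website; as an illustrative sketch (the function name below is my choice, not the book’s), Algorithm 0.1 can be rendered directly as:

```python
def multiply_repeated_addition(x, y):
    """Multiply non-negative integers x and y by adding x to itself y times.

    This takes y iterations, i.e., exponentially many in the number of
    digits of y, which is the point of the comparison in the text.
    """
    result = 0
    for _ in range(y):
        result += x
    return result
```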

Page 31: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

introduction 31

Algorithm 0.2 — Grade-school multiplication.

Input: Non-negative integers x, y
Output: Product x ⋅ y
1: Write x = x_{n−1} x_{n−2} ⋯ x_0 and y = y_{m−1} y_{m−2} ⋯ y_0 in decimal place-value notation. # x_0 is the ones digit of x, x_1 is the tens digit, etc.
2: Let result ← 0
3: for i = 0, …, n − 1 do
4:   for j = 0, …, m − 1 do
5:     result ← result + 10^{i+j} ⋅ x_i ⋅ y_j
6:   end for
7: end for
8: return result
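A direct Python rendering of Algorithm 0.2 (again a sketch of mine, extracting digits via string conversion for readability) looks like:

```python
def multiply_grade_school(x, y):
    """Multiply non-negative integers digit by digit, as in Algorithm 0.2."""
    # x_digits[0] is the ones digit, x_digits[1] the tens digit, etc.
    x_digits = [int(d) for d in reversed(str(x))]
    y_digits = [int(d) for d in reversed(str(y))]
    result = 0
    for i, xi in enumerate(x_digits):
        for j, yj in enumerate(y_digits):
            result += 10 ** (i + j) * xi * yj  # shift and single-digit product
    return result
```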

Both Algorithm 0.1 and Algorithm 0.2 assume that we already know how to add numbers, and Algorithm 0.2 also assumes that we can multiply a number by a power of 10 (which is, after all, a simple shift). Suppose that x and y are two integers of n = 20 decimal digits each. (This roughly corresponds to 64 binary digits, which is a common size in many programming languages.) Computing x ⋅ y using Algorithm 0.1 entails adding x to itself y times which entails (since y is a 20-digit number) at least 10^19 additions. In contrast, the grade-school algorithm (i.e., Algorithm 0.2) involves n² shifts and single-digit products, and so at most 2n² = 800 single-digit operations. To understand the difference, consider that a grade-schooler can perform a single-digit operation in about 2 seconds, and so would require about 1,600 seconds (about half an hour) to compute x ⋅ y using Algorithm 0.2. In contrast, even though it is more than a billion times faster than a human, if we used Algorithm 0.1 to compute x ⋅ y using a modern PC, it would take us 10^20/10^9 = 10^11 seconds (which is more than three millennia!) to compute the same result.

Computers have not made algorithms obsolete. On the contrary, the vast increase in our ability to measure, store, and communicate data has led to much higher demand for developing better and more sophisticated algorithms that empower us to make better decisions based on these data. We also see that to no small extent the notion of an algorithm is independent of the actual computing device that executes it. The digit-by-digit multiplication algorithm is vastly better than iterated addition, regardless of whether the technology we use to implement it is a silicon-based chip, or a third grader with pen and paper.

Theoretical computer science is concerned with the inherent properties of algorithms and computation; namely, those properties that are independent of current technology. We ask some questions that were


already pondered by the Babylonians, such as “what is the best way to multiply two numbers?”, but also questions that rely on cutting-edge science such as “could we use the effects of quantum entanglement to factor numbers faster?”.

Remark 0.3 — Specification, implementation and analysis of algorithms. A full description of an algorithm has three components:

• Specification: What is the task that the algorithm performs (e.g., multiplication in the case of Algorithm 0.1 and Algorithm 0.2).

• Implementation: How is the task accomplished: what is the sequence of instructions to be performed. Even though Algorithm 0.1 and Algorithm 0.2 perform the same computational task (i.e., they have the same specification), they do it in different ways (i.e., they have different implementations).

• Analysis: Why does this sequence of instructions achieve the desired task. A full description of Algorithm 0.1 and Algorithm 0.2 will include a proof for each one of these algorithms that on input x, y, the algorithm does indeed output x ⋅ y.

Often as part of the analysis we show that the algorithm is not only correct but also efficient. That is, we want to show that not only will the algorithm compute the desired task, but will do so in a prescribed number of operations. For example Algorithm 0.2 computes the multiplication function on inputs of n digits using O(n²) operations, while Algorithm 0.4 (described below) computes the same function using O(n^1.6) operations. (We define the O notations used here in Section 1.4.8.)

0.2 EXTENDED EXAMPLE: A FASTER WAY TO MULTIPLY (OPTIONAL)

Once you think of the standard digit-by-digit multiplication algorithm, it seems like the “obviously best” way to multiply numbers. In 1960, the famous mathematician Andrey Kolmogorov organized a seminar at Moscow State University in which he conjectured that every algorithm for multiplying two n digit numbers would require a number of basic operations that is proportional to n² (Ω(n²) operations, using O-notation as defined in Chapter 1). In other words, Kolmogorov conjectured that in any multiplication algorithm, doubling the number of digits would quadruple the number of basic operations required. A young student named Anatoly Karatsuba was in the audience, and within a week he disproved Kolmogorov’s conjecture by discovering an algorithm that requires only about Cn^1.6 operations for some constant C. Such a number becomes much smaller than n² as n grows and so for large n Karatsuba’s algorithm is superior to the grade-school one. (For example, Python’s implementation switches from the grade-school algorithm to Karatsuba’s algorithm for numbers that are 1000 bits or larger.) While the difference between an O(n^1.6) and an O(n²) algorithm can be sometimes crucial in practice (see Section 0.3 below), in this book we will mostly ignore such distinctions. However, we describe Karatsuba’s algorithm below since it is a good example of how algorithms can often be surprising, as well as a demonstration of the analysis of algorithms, which is central to this book and to theoretical computer science at large.

Figure 1: The grade-school multiplication algorithm illustrated for multiplying x = 10x̄ + x̲ and y = 10ȳ + y̲. It uses the formula (10x̄ + x̲) × (10ȳ + y̲) = 100x̄ȳ + 10(x̄y̲ + x̲ȳ) + x̲y̲.

3 If x is a number then ⌊x⌋ is the integer obtained by rounding it down, see Section 1.7.

Karatsuba’s algorithm is based on a faster way to multiply two-digit numbers. Suppose that x, y ∈ [100] = {0, …, 99} are a pair of two-digit numbers. Let’s write x̄ for the “tens” digit of x, and x̲ for the “ones” digit, so that x = 10x̄ + x̲, and write similarly y = 10ȳ + y̲ for x̄, x̲, ȳ, y̲ ∈ [10]. The grade-school algorithm for multiplying x and y is illustrated in Fig. 1.

The grade-school algorithm can be thought of as transforming the task of multiplying a pair of two-digit numbers into four single-digit multiplications via the formula

(10x̄ + x̲) × (10ȳ + y̲) = 100x̄ȳ + 10(x̄y̲ + x̲ȳ) + x̲y̲   (1)

Generally, in the grade-school algorithm doubling the number of digits in the input results in quadrupling the number of operations, leading to an O(n²) time algorithm. In contrast, Karatsuba’s algorithm is based on the observation that we can express Eq. (1) also as

(10x̄ + x̲) × (10ȳ + y̲) = (100 − 10)x̄ȳ + 10[(x̄ + x̲)(ȳ + y̲)] − (10 − 1)x̲y̲   (2)

which reduces multiplying the two-digit numbers x and y to computing the following three simpler products: x̄ȳ, x̲y̲ and (x̄ + x̲)(ȳ + y̲). By repeating the same strategy recursively, we can reduce the task of multiplying two n-digit numbers to the task of multiplying three pairs of ⌊n/2⌋ + 1 digit numbers.3 Since every time we double the number of digits we triple the number of operations, we will be able to multiply numbers of n = 2^ℓ digits using about 3^ℓ = n^(log₂ 3) ∼ n^1.585 operations.
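The identity behind this reduction can be checked mechanically. The following brute-force verification (a sketch I added; it is not from the book) confirms that the right-hand side of Eq. (2) agrees with ordinary multiplication for every pair of two-digit numbers:

```python
def check_karatsuba_identity():
    """Verify Eq. (2) against ordinary multiplication for all two-digit pairs."""
    for x in range(100):
        for y in range(100):
            xbar, xund = divmod(x, 10)  # tens and ones digits of x
            ybar, yund = divmod(y, 10)  # tens and ones digits of y
            rhs = ((100 - 10) * xbar * ybar
                   + 10 * (xbar + xund) * (ybar + yund)
                   - (10 - 1) * xund * yund)
            if rhs != x * y:
                return False
    return True
```

Expanding the middle product shows why this works: 10(x̄ + x̲)(ȳ + y̲) contributes 10x̄ȳ, 10(x̄y̲ + x̲ȳ) and 10x̲y̲, and the first and last terms correct the coefficients of x̄ȳ and x̲y̲ back to 100 and 1.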

The above is the intuitive idea behind Karatsuba’s algorithm, but is not enough to fully specify it. A complete description of an algorithm entails a precise specification of its operations together with its analysis: proof that the algorithm does in fact do what it’s supposed to do. The


Figure 2: Karatsuba’s multiplication algorithm illustrated for multiplying x = 10x̄ + x̲ and y = 10ȳ + y̲. We compute the three orange, green and purple products x̲y̲, x̄ȳ and (x̄ + x̲)(ȳ + y̲) and then add and subtract them to obtain the result.

Figure 3: Running time of Karatsuba’s algorithm vs. the grade-school algorithm. (Python implementation available online.) Note the existence of a “cutoff” length, where for sufficiently large inputs Karatsuba becomes more efficient than the grade-school algorithm. The precise cutoff location varies by implementation and platform details, but will always occur eventually.

operations of Karatsuba’s algorithm are detailed in Algorithm 0.4, while the analysis is given in Lemma 0.5 and Lemma 0.6.

Algorithm 0.4 — Karatsuba multiplication.

Input: nonnegative integers x, y each of at most n digits
Output: x ⋅ y
1: procedure Karatsuba(x, y)
2:   if n ≤ 4 then return x ⋅ y
3:   Let m = ⌊n/2⌋
4:   Write x = 10^m x̄ + x̲ and y = 10^m ȳ + y̲
5:   A ← Karatsuba(x̄, ȳ)
6:   B ← Karatsuba(x̄ + x̲, ȳ + y̲)
7:   C ← Karatsuba(x̲, y̲)
8:   return (10^{2m} − 10^m) ⋅ A + 10^m ⋅ B + (1 − 10^m) ⋅ C
9: end procedure
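The book notes that its Python implementations are available online; the sketch below is my own independent rendering of Algorithm 0.4 (the digit-counting via `str` and the function name are my choices):

```python
def karatsuba(x, y):
    """Karatsuba multiplication of non-negative integers (sketch of Algorithm 0.4)."""
    n = max(len(str(x)), len(str(y)))  # number of decimal digits
    if n <= 4:
        return x * y  # base case: any multiplication method works here
    m = n // 2
    xbar, xund = divmod(x, 10 ** m)  # x = 10^m * xbar + xund
    ybar, yund = divmod(y, 10 ** m)  # y = 10^m * ybar + yund
    A = karatsuba(xbar, ybar)
    B = karatsuba(xbar + xund, ybar + yund)
    C = karatsuba(xund, yund)
    # Three recursive multiplications instead of four, combined as in line 8.
    return (10 ** (2 * m) - 10 ** m) * A + 10 ** m * B + (1 - 10 ** m) * C
```

The combining step works because (10^{2m} − 10^m)A + 10^m B + (1 − 10^m)C = 10^{2m}A + 10^m(B − A − C) + C, and B − A − C is exactly the cross term x̄y̲ + x̲ȳ.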

Algorithm 0.4 is only half of the full description of Karatsuba’s algorithm. The other half is the analysis, which entails proving that (1) Algorithm 0.4 indeed computes the multiplication operation and (2) it does so using O(n^(log₂ 3)) operations. We now turn to showing both facts:

Lemma 0.5 For all nonnegative integers x, y, when given input x, y Algorithm 0.4 will output x ⋅ y.

Proof. Let n be the maximum number of digits of x and y. We prove the lemma by induction on n. The base case is n ≤ 4 where the algorithm returns x ⋅ y by definition. (It does not matter which algorithm we use to multiply four-digit numbers - we can even use repeated addition.) Otherwise, if n > 4, we define m = ⌊n/2⌋, and write x = 10^m x̄ + x̲ and y = 10^m ȳ + y̲.

Plugging this into x · y, we get

x · y = 10^{2m}·x̄ȳ + 10^m·(x̄y̲ + x̲ȳ) + x̲y̲ .   (3)

Rearranging the terms we see that

x · y = 10^{2m}·x̄ȳ + 10^m·[(x̄ + x̲)(ȳ + y̲) − x̄ȳ − x̲y̲] + x̲y̲ .   (4)

Since the numbers x̄, x̲, ȳ, y̲, x̄ + x̲, ȳ + y̲ all have at most m + 1 < n digits, the induction hypothesis implies that the values A, B, C computed by the recursive calls will satisfy A = x̄ȳ, B = (x̄ + x̲)(ȳ + y̲) and C = x̲y̲. Plugging this into (4) we see that x · y equals the value (10^{2m} − 10^m)·A + 10^m·B + (1 − 10^m)·C computed by Algorithm 0.4. ∎


Lemma 0.6 If x, y are integers of at most n digits, Algorithm 0.4 will take O(n^{log₂ 3}) operations on input x, y.

Proof. Fig. 4 illustrates the idea behind the proof, which we only sketch here, leaving filling in the details as Exercise 0.4. The proof is again by induction. We define T(n) to be the maximum number of steps that Algorithm 0.4 takes on inputs of length at most n. Since in the base case n ≤ 4, Algorithm 0.4 performs a constant number of computations, we know that T(4) ≤ c for some constant c, and for n > 4, T satisfies the recursive equation

T(n) ≤ 3T(⌊n/2⌋ + 1) + c′n   (5)

for some constant c′ (using the fact that addition can be done in O(n) operations).

The recursive equation (5) solves to O(n^{log₂ 3}). The intuition behind this is presented in Fig. 4, and it is also a consequence of the so-called "Master Theorem" on recurrence relations. As mentioned above, we leave completing the proof to the reader as Exercise 0.4.

Figure 4: Karatsuba's algorithm reduces an n-bit multiplication to three n/2-bit multiplications, which in turn are reduced to nine n/4-bit multiplications, and so on. We can represent the computational cost of all these multiplications in a 3-ary tree of depth log₂ n, where at the root the extra cost is cn operations, at the first level the extra cost is c(n/2) operations, and at each of the 3^i nodes of level i, the extra cost is c(n/2^i). The total cost is cn·∑_{i=0}^{log₂ n} (3/2)^i ≤ 10c·n^{log₂ 3} by the formula for summing a geometric series.
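For readers who like to see recurrences concretely, the bound in (5) can also be tabulated numerically. The sketch below instantiates the recurrence with the illustrative constants T(n) = n for n ≤ 4 and c′ = 1 (our choice for the sketch, not values from the text) and checks that T(n)/n^{log₂ 3} stays bounded, consistent with the 20C·n^{log₂ 3} bound of Exercise 0.4:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n: int) -> int:
    """Worst-case step count from the recurrence T(n) <= 3*T(n//2 + 1) + c'*n,
    instantiated with T(n) = n for n <= 4 and c' = 1 (illustrative constants)."""
    if n <= 4:
        return n
    return 3 * T(n // 2 + 1) + n

# The ratio T(n) / n^{log2 3} stays bounded as n grows,
# which is exactly the statement T(n) = O(n^{log2 3}).
ratios = [T(2 ** k) / (2 ** k) ** math.log2(3) for k in range(3, 21)]
```

Printing `ratios` shows the sequence settling toward a constant, matching the geometric-series argument in the caption of Fig. 4.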

Karatsuba's algorithm is by no means the end of the line for multiplication algorithms. In the 1960s, Toom and Cook extended Karatsuba's ideas to get an O(n^{log_k(2k−1)}) time multiplication algorithm for every constant k. In 1971, Schönhage and Strassen got even better algorithms using the Fast Fourier Transform; their idea was to somehow treat integers as "signals" and do the multiplication more efficiently by moving to the Fourier domain. (The Fourier transform is a central tool in mathematics and engineering, used in a great many applications; if you have not seen it yet, you are likely to encounter it at some point in your studies.) In the years that followed researchers kept improving the algorithm, and only very recently Harvey and Van Der Hoeven managed to obtain an O(n log n) time algorithm for multiplication (though it only starts beating the Schönhage-Strassen algorithm for truly astronomical numbers). Yet, despite all this progress, we still don't know whether or not there is an O(n) time algorithm for multiplying two n-digit numbers!

Remark 0.7 — Matrix Multiplication (advanced note). (This book contains many "advanced" or "optional" notes and sections. These may assume background that not every student has, and can be safely skipped over, as none of the future parts depends on them.)

Ideas similar to Karatsuba's can be used to speed up matrix multiplications as well. Matrices are a powerful way to represent linear equations and operations, widely used in a great many applications of scientific computing, graphics, machine learning, and many more. One of the basic operations one can do with two matrices is to multiply them. For example,

if x = ( x_{0,0}  x_{0,1} ; x_{1,0}  x_{1,1} ) and y = ( y_{0,0}  y_{0,1} ; y_{1,0}  y_{1,1} ) then the product of x and y is the matrix

( x_{0,0}y_{0,0} + x_{0,1}y_{1,0}   x_{0,0}y_{0,1} + x_{0,1}y_{1,1} ; x_{1,0}y_{0,0} + x_{1,1}y_{1,0}   x_{1,0}y_{0,1} + x_{1,1}y_{1,1} ).

You can see that we can compute this matrix by eight products of numbers.

Now suppose that n is even and x and y are a pair of n × n matrices which we can think of as each composed of four (n/2) × (n/2) blocks x_{0,0}, x_{0,1}, x_{1,0}, x_{1,1} and y_{0,0}, y_{0,1}, y_{1,0}, y_{1,1}. Then the formula for the matrix product of x and y can be expressed in the same way as above, just replacing products x_{a,b}y_{c,d} with matrix products, and addition with matrix addition. This means that we can use the formula above to give an algorithm that doubles the dimension of the matrices at the expense of increasing the number of operations by a factor of 8, which for n = 2^ℓ results in 8^ℓ = n³ operations.

In 1969 Volker Strassen noted that we can compute the product of a pair of two-by-two matrices using only seven products of numbers by observing that each entry of the matrix xy can be computed by adding and subtracting the following seven terms: t₁ = (x_{0,0} + x_{1,1})(y_{0,0} + y_{1,1}), t₂ = (x_{1,0} + x_{1,1})y_{0,0}, t₃ = x_{0,0}(y_{0,1} − y_{1,1}), t₄ = x_{1,1}(y_{1,0} − y_{0,0}), t₅ = (x_{0,0} + x_{0,1})y_{1,1}, t₆ = (x_{1,0} − x_{0,0})(y_{0,0} + y_{0,1}), t₇ = (x_{0,1} − x_{1,1})(y_{1,0} + y_{1,1}). Indeed, one can verify that

xy = ( t₁ + t₄ − t₅ + t₇   t₃ + t₅ ; t₂ + t₄   t₁ + t₃ − t₂ + t₆ ).

Page 37: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

introduction 37

Using this observation, we can obtain an algorithm such that doubling the dimension of the matrices results in increasing the number of operations by a factor of 7, which means that for n = 2^ℓ the cost is 7^ℓ = n^{log₂ 7} ∼ n^{2.807}. A long sequence of work has since improved this algorithm, and the current record has running time about O(n^{2.373}). However, unlike the case of integer multiplication, at the moment we don't know of any algorithm for matrix multiplication that runs in time linear or even close to linear in the size of the input matrices (e.g., an O(n² polylog(n)) time algorithm). People have tried to use group representations, which can be thought of as generalizations of the Fourier transform, to obtain faster algorithms, but this effort has not yet succeeded.
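Strassen's seven-term identity is easy to verify mechanically. The sketch below (our illustration, not code from the text) multiplies 2×2 matrices with the seven products t₁,…,t₇ and compares against the usual eight-product formula; the same identity applies verbatim when the four entries are themselves (n/2)×(n/2) blocks:

```python
def strassen_2x2(x, y):
    """Multiply 2x2 matrices (nested lists) using Strassen's seven products."""
    (x00, x01), (x10, x11) = x
    (y00, y01), (y10, y11) = y
    t1 = (x00 + x11) * (y00 + y11)
    t2 = (x10 + x11) * y00
    t3 = x00 * (y01 - y11)
    t4 = x11 * (y10 - y00)
    t5 = (x00 + x01) * y11
    t6 = (x10 - x00) * (y00 + y01)
    t7 = (x01 - x11) * (y10 + y11)
    # Each entry of xy is a signed sum of the seven products above.
    return [[t1 + t4 - t5 + t7, t3 + t5],
            [t2 + t4, t1 + t3 - t2 + t6]]

def naive_2x2(x, y):
    """The textbook eight-product formula, for comparison."""
    (x00, x01), (x10, x11) = x
    (y00, y01), (y10, y11) = y
    return [[x00 * y00 + x01 * y10, x00 * y01 + x01 * y11],
            [x10 * y00 + x11 * y10, x10 * y01 + x11 * y11]]
```

Applied recursively to blocks, saving one of the eight multiplications at every level of the recursion is what turns the exponent 3 into log₂ 7.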

0.3 ALGORITHMS BEYOND ARITHMETIC

The quest for better algorithms is by no means restricted to arithmetic tasks such as adding, multiplying or solving equations. Many graph algorithms, including algorithms for finding paths, matchings, spanning trees, cuts, and flows, have been discovered in the last several decades, and this is still an intensive area of research. (For example, the last few years saw many advances in algorithms for the maximum flow problem, borne out of unexpected connections with electrical circuits and linear equation solvers.) These algorithms are being used not just for the "natural" applications of routing network traffic or GPS-based navigation, but also for applications as varied as drug discovery through searching for structures in gene-interaction graphs to computing risks from correlations in financial investments.

Google was founded based on the PageRank algorithm, which is an efficient algorithm to approximate the "principal eigenvector" of (a dampened version of) the adjacency matrix of the web graph. The Akamai company was founded based on a new data structure, known as consistent hashing, for a hash table where buckets are stored at different servers. The backpropagation algorithm, which computes partial derivatives of a neural network in O(n) instead of O(n²) time, underlies many of the recent phenomenal successes of learning deep neural networks. Algorithms for solving linear equations under sparsity constraints, a concept known as compressed sensing, have been used to drastically reduce the amount and quality of data needed to analyze MRI images. This made a critical difference for MRI imaging of cancer tumors in children, where previously doctors needed to use anesthesia to suspend breath during the MRI exam, sometimes with dire consequences.


Even for classical questions, studied through the ages, new discoveries are still being made. For example, for the question of determining whether a given integer is prime or composite, which has been studied since the days of Pythagoras, efficient probabilistic algorithms were only discovered in the 1970s, while the first deterministic polynomial-time algorithm was only found in 2002. For the related problem of actually finding the factors of a composite number, new algorithms were found in the 1980s, and (as we'll see later in this course) discoveries in the 1990s raised the tantalizing prospect of obtaining faster algorithms through the use of quantum mechanical effects.

Despite all this progress, there are still many more questions than answers in the world of algorithms. For almost all natural problems, we do not know whether the current algorithm is the "best", or whether a significantly better one is still waiting to be discovered. As alluded to in Cobham's opening quote for this chapter, even for the basic problem of multiplying numbers we have not yet answered the question of whether there is a multiplication algorithm that is as efficient as our algorithms for addition. But at least we now know the right way to ask it.

0.4 ON THE IMPORTANCE OF NEGATIVE RESULTS

Finding better algorithms for problems such as multiplication, solving equations, graph problems, or fitting neural networks to data is undoubtedly a worthwhile endeavor. But why is it important to prove that such algorithms don't exist? One motivation is pure intellectual curiosity. Another reason to study impossibility results is that they correspond to the fundamental limits of our world. In other words, impossibility results are laws of nature.

Here are some examples of impossibility results outside computer science (see Section 0.7 for more about these). In physics, the impossibility of building a perpetual motion machine corresponds to the law of conservation of energy. The impossibility of building a heat engine beating Carnot's bound corresponds to the second law of thermodynamics, while the impossibility of faster-than-light information transmission is a cornerstone of special relativity. In mathematics, while we all learned the formula for solving quadratic equations in high school, the impossibility of generalizing this formula to equations of degree five or more gave birth to group theory. The impossibility of proving Euclid's fifth axiom from the first four gave rise to non-Euclidean geometries, which ended up crucial for the theory of general relativity.

In an analogous way, impossibility results for computation correspond to "computational laws of nature" that tell us about the fundamental limits of any information processing apparatus, whether based on silicon, neurons, or quantum particles. Moreover, computer scientists have found creative approaches to apply computational limitations to achieve certain useful tasks. For example, much of modern Internet traffic is encrypted using the RSA encryption scheme, which relies for its security on the (conjectured) impossibility of efficiently factoring large integers. More recently, the Bitcoin system uses a digital analog of the "gold standard" where, instead of using a precious metal, new currency is obtained by "mining" solutions for computationally difficult problems.

Chapter Recap

• The history of algorithms goes back thousands of years; they have been essential to much of human progress, and these days form the basis of multi-billion dollar industries as well as life-saving technologies.

• There is often more than one algorithm to achieve the same computational task. Finding a faster algorithm can often make a much bigger difference than improving computing hardware.

• Better algorithms and data structures don't just speed up calculations, but can yield new qualitative insights.

• One question we will study is how to find the most efficient algorithm for a given problem.

• To show that an algorithm is the most efficient one for a given problem, we need to be able to prove that it is impossible to solve the problem using a smaller amount of computational resources.

0.5 ROADMAP TO THE REST OF THIS BOOK

Often, when we try to solve a computational problem, whether it is solving a system of linear equations, finding the top eigenvector of a matrix, or trying to rank Internet search results, it is enough to use the "I know it when I see it" standard for describing algorithms. As long as we find some way to solve the problem, we are happy and might not care much about the exact mathematical model for our algorithm. But when we want to answer a question such as "does there exist an algorithm to solve the problem P?" we need to be much more precise.

In particular, we will need to (1) define exactly what it means to solve P, and (2) define exactly what an algorithm is. Even (1) can sometimes be non-trivial, but (2) is particularly challenging; it is not at all clear how (and even whether) we can encompass all potential ways to design algorithms. We will consider several simple models of computation, and argue that, despite their simplicity, they do capture all "reasonable" approaches to achieve computing, including all those that are currently used in modern computing devices.

Once we have these formal models of computation, we can try to obtain impossibility results for computational tasks, showing that some problems can not be solved (or perhaps can not be solved within the resources of our universe). Archimedes once said that given a fulcrum and a long enough lever, he could move the world. We will see how reductions allow us to leverage one hardness result into a slew of others, illuminating the boundaries between the computable and uncomputable (or tractable and intractable) problems.

Later in this book we will go back to examining our models of computation, and see how resources such as randomness or quantum entanglement could potentially change the power of our model. In the context of probabilistic algorithms, we will see a glimpse of how randomness has become an indispensable tool for understanding computation, information, and communication. We will also see how computational difficulty can be an asset rather than a hindrance, and be used for the "derandomization" of probabilistic algorithms. The same ideas also show up in cryptography, which has undergone not just a technological but also an intellectual revolution in the last few decades, much of it building on the foundations that we explore in this course.

Theoretical Computer Science is a vast topic, branching out and touching upon many scientific and engineering disciplines. This book provides a very partial (and biased) sample of this area. More than anything, I hope I will manage to "infect" you with at least some of my love for this field, which is inspired and enriched by the connection to practice, but is also deep and beautiful regardless of applications.

0.5.1 Dependencies between chapters

This book is divided into the following parts; see Fig. 5.

• Preliminaries: Introduction, mathematical background, and representing objects as strings.

• Part I: Finite computation (Boolean circuits): Equivalence of circuits and straight-line programs. Universal gate sets. Existence of a circuit for every function, representing circuits as strings, universal circuit, lower bound on circuit size using the counting argument.

• Part II: Uniform computation (Turing machines): Equivalence of Turing machines and programs with loops. Equivalence of models (including RAM machines, λ calculus, and cellular automata), configurations of Turing machines, existence of a universal Turing machine, uncomputable functions (including the Halting problem and Rice's Theorem), Gödel's incompleteness theorem, restricted computational models (regular and context-free languages).

• Part III: Efficient computation: Definition of running time, time hierarchy theorem, P and NP, P/poly, NP completeness and the Cook-Levin Theorem, space bounded computation.

• Part IV: Randomized computation: Probability, randomized algorithms, BPP, amplification, BPP ⊆ P/poly, pseudorandom generators and derandomization.

• Part V: Advanced topics: Cryptography, proofs and algorithms (interactive and zero knowledge proofs, Curry-Howard correspondence), quantum computing.

Figure 5: The dependency structure of the different parts. Part I introduces the model of Boolean circuits to study finite functions with an emphasis on quantitative questions (how many gates to compute a function). Part II introduces the model of Turing machines to study functions that have unbounded input lengths with an emphasis on qualitative questions (is this function computable or not). Much of Part II does not depend on Part I, as Turing machines can be used as the first computational model. Part III depends on both parts as it introduces a quantitative study of functions with unbounded input length. The more advanced parts IV (randomized computation) and V (advanced topics) rely on the material of Parts I, II and III.

The book largely proceeds in linear order, with each chapter building on the previous ones, with the following exceptions:

• The topics of λ calculus (Section 8.5), Gödel's incompleteness theorem (Chapter 11), automata/regular expressions and context-free grammars (Chapter 10), and space-bounded computation (Chapter 17) are not used in the following chapters. Hence you can choose whether to cover or skip any subset of them.

• Part II (Uniform Computation / Turing Machines) does not have a strong dependency on Part I (Finite computation / Boolean circuits) and it should be possible to teach them in the reverse order with minor modifications. Boolean circuits are used in Part III (efficient computation) for results such as P ⊆ P/poly and the Cook-Levin Theorem, as well as in Part IV (for BPP ⊆ P/poly and derandomization) and Part V (specifically in cryptography and quantum computing).

• All chapters in Part V (Advanced topics) are independent of one another and can be covered in any order.

A course based on this book can use all of Parts I, II, and III (possibly skipping over some or all of the λ calculus, Chapter 11, Chapter 10 or Chapter 17), and then either cover all or some of Part IV (randomized computation), and add a "sprinkling" of advanced topics from Part V based on student or instructor interest.

0.6 EXERCISES

Exercise 0.1 Rank the significance of the following inventions in speeding up multiplication of large (that is, 100-digit or more) numbers. That is, use "back of the envelope" estimates to order them in terms of the speedup factor they offered over the previous state of affairs.

a. Discovery of the grade-school digit by digit algorithm (improving upon repeated addition)

b. Discovery of Karatsuba's algorithm (improving upon the digit by digit algorithm)

c. Invention of modern electronic computers (improving upon calculations with pen and paper).

Exercise 0.2 The 1977 Apple II personal computer had a processor speed of 1.023 MHz, or about 10^6 operations per second. At the time of this writing the world's fastest supercomputer performs 93 "petaflops" (10^15 floating point operations per second) or about 10^18 basic steps per second. For each one of the following running times (as a function of the input length n), compute for both computers how large an input they could handle in a week of computation, if they run an algorithm that has this running time:

a. n operations.

b. n² operations.

c. n log n operations.

d. 2^n operations.


4 As we will see in Chapter 21, almost any company relying on cryptography needs to assume the non-existence of certain algorithms. In particular, RSA Security was founded based on the security of the RSA cryptosystem, which presumes the non-existence of an efficient algorithm to compute the prime factorization of large integers.

5 Hint: Use a proof by induction; suppose that this is true for all n's from 1 to m, and prove that it is true also for m + 1.

e. ๐‘›! operations.

Exercise 0.3 — Usefulness of algorithmic non-existence. In this chapter we mentioned several companies that were founded based on the discovery of new algorithms. Can you give an example of a company that was founded based on the non-existence of an algorithm? See footnote for hint.⁴

Exercise 0.4 — Analysis of Karatsuba's Algorithm.

a. Suppose that T_1, T_2, T_3, … is a sequence of numbers such that T_2 ≤ 10 and for every n, T_n ≤ 3T_{⌊n/2⌋+1} + Cn for some C ≥ 1. Prove that T_n ≤ 20C·n^{log₂ 3} for every n > 2.⁵

b. Prove that the number of single-digit operations that Karatsuba's algorithm takes to multiply two n-digit numbers is at most 1000·n^{log₂ 3}.

Exercise 0.5 Implement in the programming language of your choice functions Gradeschool_multiply(x,y) and Karatsuba_multiply(x,y) that take two arrays of digits x and y and return an array representing the product of x and y (where x is identified with the number x[0]+10*x[1]+100*x[2]+... etc.) using the grade-school algorithm and the Karatsuba algorithm respectively. At what number of digits does the Karatsuba algorithm beat the grade-school one?

Exercise 0.6 — Matrix Multiplication (optional, advanced). In this exercise, we show that if for some ω > 2 we can write the product of two k × k real-valued matrices A, B using at most k^ω multiplications, then we can multiply two n × n matrices in roughly n^ω time for every large enough n.

To make this precise, we need to make some notation that is unfortunately somewhat cumbersome. Assume that there is some k ∈ ℕ and m ≤ k^ω such that for every k × k matrices A, B, C such that C = AB, we can write for every i, j ∈ [k]:

C_{i,j} = ∑_{ℓ=0}^{m−1} α^ℓ_{i,j} f_ℓ(A) g_ℓ(B)   (6)

for some linear functions f_0, …, f_{m−1}, g_0, …, g_{m−1} : ℝ^{k²} → ℝ and coefficients (α^ℓ_{i,j})_{i,j∈[k],ℓ∈[m]}. Prove that under this assumption, for every ε > 0, if n is sufficiently large, then there is an algorithm that computes the product of two n × n matrices using at most O(n^{ω+ε}) arithmetic operations. See footnote for hint.⁶


0.7 BIBLIOGRAPHICAL NOTES

For a brief overview of what we'll see in this book, you could do far worse than read Bernard Chazelle's wonderful essay on the Algorithm as an Idiom of modern science. The book of Moore and Mertens [MM11] gives a wonderful and comprehensive overview of the theory of computation, including much of the content discussed in this chapter and the rest of this book. Aaronson's book [Aar13] is another great read that touches upon many of the same themes.

For more on the algorithms the Babylonians used, see Knuth's paper and Neugebauer's classic book.

Many of the algorithms we mention in this chapter are covered in algorithms textbooks such as those by Cormen, Leiserson, Rivest, and Stein [Cor+09], Kleinberg and Tardos [KT06], and Dasgupta, Papadimitriou and Vazirani [DPV08], as well as Jeff Erickson's textbook. Erickson's book is freely available online and contains a great exposition of recursive algorithms in general and Karatsuba's algorithm in particular.

The story of Karatsuba's discovery of his multiplication algorithm is recounted by him in [Kar95]. As mentioned above, further improvements were made by Toom and Cook [Too63; Coo66], Schönhage and Strassen [SS71], Fürer [Für07], and recently by Harvey and Van Der Hoeven [HV19]; see this article for a nice overview. The last papers crucially rely on the Fast Fourier transform algorithm. The fascinating story of the (re)discovery of this algorithm by John Tukey in the context of the cold war is recounted in [Coo87]. (We say re-discovery because it later turned out that the algorithm dates back to Gauss [HJB85].) The Fast Fourier Transform is covered in some of the books mentioned below, and there are also lectures available online such as Jeff Erickson's. See also this popular article by David Austin. Fast matrix multiplication was discovered by Strassen [Str69], and since then this has been an active area of research. [Blä13] is a recommended self-contained survey of this area.

The Backpropagation algorithm for fast differentiation of neural networks was invented by Werbos [Wer74]. The Pagerank algorithm was invented by Larry Page and Sergey Brin [Pag+99]. It is closely related to the HITS algorithm of Kleinberg [Kle99]. The Akamai company was founded based on the consistent hashing data structure described in [Kar+97]. Compressed sensing has a long history, but two foundational papers are [CRT06; Don06]. [Lus+08] gives a survey of applications of compressed sensing to MRI; see also this popular article by Ellenberg [Ell10]. The deterministic polynomial-time algorithm for testing primality was given by Agrawal, Kayal, and Saxena [AKS04].


We alluded briefly to classical impossibility results in mathematics, including the impossibility of proving Euclid's fifth postulate from the other four, the impossibility of trisecting an angle with a straightedge and compass, and the impossibility of solving a quintic equation via radicals. A geometric proof of the impossibility of angle trisection (one of the three geometric problems of antiquity, going back to the ancient Greeks) is given in this blog post of Tao. The book of Mario Livio [Liv05] covers some of the background and ideas behind these impossibility results. Some exciting recent research is focused on trying to use computational complexity to shed light on fundamental questions in physics, such as understanding black holes and reconciling general relativity with quantum mechanics.


1 Mathematical Background

"I found that every number, which may be expressed from one to ten, surpasses the preceding by one unit: afterwards the ten is doubled or tripled … until a hundred; then the hundred is doubled and tripled in the same manner as the units and the tens … and so forth to the utmost limit of numeration.", Muhammad ibn Mūsā al-Khwārizmī, 820, translation by Fredric Rosen, 1831.

In this chapter we review some of the mathematical concepts that we use in this book. These concepts are typically covered in courses or textbooks on "mathematics for computer science" or "discrete mathematics"; see the "Bibliographical Notes" section (Section 1.9) for several excellent resources on these topics that are freely available online.

A mathematician's apology. Some students might wonder why this book contains so much math. The reason is that mathematics is simply a language for modeling concepts in a precise and unambiguous way. In this book we use math to model the concept of computation. For example, we will consider questions such as "is there an efficient algorithm to find the prime factors of a given integer?". (We will see that this question is particularly interesting, touching on areas as far apart as Internet security and quantum mechanics!) To even phrase such a question, we need to give a precise definition of the notion of an algorithm, and of what it means for an algorithm to be efficient. Also, since there is no empirical experiment to prove the nonexistence of an algorithm, the only way to establish such a result is using a mathematical proof.

1.1 THIS CHAPTER: A READERโ€™S MANUAL

Depending on your background, you can approach this chapter in two different ways:

• If you have already taken a "discrete mathematics", "mathematics for computer science" or similar course, you do not need to read the whole chapter. You can just take a quick look at Section 1.2 to see the main tools we will use, Section 1.7 for our notation and conventions, and then skip ahead to the rest of this book. Alternatively, you can sit back, relax, and read this chapter just to get familiar with our notation, as well as to enjoy (or not) my philosophical musings and attempts at humor.

Learning Objectives for this chapter:
• Recall basic mathematical notions such as sets, functions, numbers, logical operators and quantifiers, strings, and graphs.
• Rigorously define Big-O notation.
• Proofs by induction.
• Practice with reading mathematical definitions, statements, and proofs.
• Transform an intuitive argument into a rigorous proof.

• If your background is less extensive, see Section 1.9 for some resources on these topics. This chapter briefly covers the concepts that we need, but you may find it helpful to see a more in-depth treatment. As usual with math, the best way to get comfortable with this material is to work out exercises on your own.

• You might also want to start brushing up on discrete probability, which we'll use later in this book (see Chapter 18).

1.2 A QUICK OVERVIEW OF MATHEMATICAL PREREQUISITES

The main mathematical concepts we will use are the following. We just list these notions below, deferring their definitions to the rest of this chapter. If you are familiar with all of these, then you might want to just skip to Section 1.7 to see the full list of notation we use.

• Proofs: First and foremost, this book involves a heavy dose of formal mathematical reasoning, which includes mathematical definitions, statements, and proofs.

• Sets and set operations: We will make extensive use of mathematical sets. We use the basic set relations of membership (∈) and containment (⊆), and set operations, principally union (∪), intersection (∩), and set difference (∖).

• Cartesian product and Kleene star operation: We also use the Cartesian product of two sets A and B, denoted as A × B (that is, A × B is the set of pairs (a, b) where a ∈ A and b ∈ B). We denote by A^n the n-fold Cartesian product (e.g., A³ = A × A × A) and by A* (known as the Kleene star) the union of A^n for all n ∈ {0, 1, 2, …}.

• Functions: The domain and codomain of a function, properties such as being one-to-one (also known as injective) or onto (also known as surjective) functions, as well as partial functions (that, unlike standard or "total" functions, are not necessarily defined on all elements of their domain).

• Logical operations: The operations AND (∧), OR (∨), and NOT (¬) and the quantifiers "there exists" (∃) and "for all" (∀).

• Basic combinatorics: Notions such as (n choose k), the number of k-sized subsets of a set of size n.


Figure 1.1: A snippet from the โ€œmethodsโ€ section of the โ€œAlphaGo Zeroโ€ paper by Silver et al, Nature, 2017.

Figure 1.2: A snippet from the โ€œZerocashโ€ paper of Ben-Sasson et al, that forms the basis of the cryptocurrency startup Zcash.

โ€ข Graphs: Undirected and directed graphs, connectivity, paths, and cycles.

โ€ข Big-๐‘‚ notation: ๐‘‚, ๐‘œ, ฮฉ, ๐œ”, ฮ˜ notation for analyzing asymptotic growth of functions.

โ€ข Discrete probability: We will use probability theory, and specifically probability over finite sample spaces such as tossing ๐‘› coins, including notions such as random variables, expectation, and concentration. We will only use probability theory in the second half of this text, and will review it beforehand in Chapter 18. However, probabilistic reasoning is a subtle (and extremely useful!) skill, and itโ€™s always good to start early in acquiring it.

In the rest of this chapter we briefly review the above notions. This is partially to remind the reader and reinforce material that might not be fresh in your mind, and partially to introduce our notation and conventions, which might occasionally differ from those youโ€™ve encountered before.

1.3 READING MATHEMATICAL TEXTS

Mathematicians use jargon for the same reason that it is used in many other professions such as engineering, law, medicine, and others. We want to make terms precise and introduce shorthand for concepts that are frequently reused. Mathematical texts tend to โ€œpack a lot of punchโ€ per sentence, and so the key is to read them slowly and carefully, parsing one symbol at a time.

With time and practice you will see that reading mathematical texts becomes easier and jargon is no longer an issue. Moreover, reading mathematical texts is one of the most transferable skills you could take from this book. Our world is changing rapidly, not just in the realm of technology, but also in many other human endeavors, whether it is medicine, economics, law or even culture. Whatever your future aspirations, it is likely that you will encounter texts that use new concepts that you have not seen before (see Fig. 1.1 and Fig. 1.2 for two recent examples from current โ€œhot areasโ€). Being able to internalize and then apply new definitions can be hugely important. It is a skill thatโ€™s much easier to acquire in the relatively safe and stable context of a mathematical course, where one at least has the guarantee that the concepts are fully specified, and you have access to your teaching staff for questions.

The basic components of a mathematical text are definitions, assertions and proofs.


Figure 1.3: An annotated form of Definition 1.1, marking which part is being defined and how.

1.3.1 Definitions
Mathematicians often define new concepts in terms of old concepts. For example, here is a mathematical definition which you may have encountered in the past (and will see again shortly):

Definition 1.1 โ€” One to one function. Let ๐‘†, ๐‘‡ be sets. We say that a function ๐‘“ โˆถ ๐‘† โ†’ ๐‘‡ is one to one (also known as injective) if for every two elements ๐‘ฅ, ๐‘ฅโ€ฒ โˆˆ ๐‘†, if ๐‘ฅ โ‰  ๐‘ฅโ€ฒ then ๐‘“(๐‘ฅ) โ‰  ๐‘“(๐‘ฅโ€ฒ).

Definition 1.1 captures a simple concept, but even so it uses quite a bit of notation. When reading such a definition, it is often useful to annotate it with a pen as youโ€™re going through it (see Fig. 1.3). For example, when you see an identifier such as ๐‘“, ๐‘† or ๐‘ฅ, make sure that you realize what sort of object it is: is it a set, a function, an element, a number, a gremlin? You might also find it useful to explain the definition in words to a friend (or to yourself).

1.3.2 Assertions: Theorems, lemmas, claims
Theorems, lemmas, claims and the like are true statements about the concepts we defined. Deciding whether to call a particular statement a โ€œTheoremโ€, a โ€œLemmaโ€ or a โ€œClaimโ€ is a judgement call, and does not make a mathematical difference. All three correspond to statements which were proven to be true. The difference is that a Theorem refers to a significant result that we would want to remember and highlight. A Lemma often refers to a technical result that is not necessarily important in its own right, but can often be very useful in proving other theorems. A Claim is a โ€œthrow-awayโ€ statement that we need in order to prove some other bigger result, but do not care so much about for its own sake.

1.3.3 Proofs
Mathematical proofs are the arguments we use to demonstrate that our theorems, lemmas, and claims are indeed true. We discuss proofs in Section 1.5 below, but the main point is that the mathematical standard of proof is very high. Unlike in some other realms, in mathematics a proof is an โ€œairtightโ€ argument that demonstrates that the statement is true beyond a shadow of a doubt. Some examples of mathematical proofs in this section are given in Solved Exercise 1.1 and Section 1.6. As mentioned in the preface, as a general rule, it is more important that you understand the definitions than the theorems, and it is more important that you understand a theorem statement than its proof.


1.4 BASIC DISCRETE MATH OBJECTS

In this section we quickly review some of the mathematical objects (the โ€œbasic data structuresโ€ of mathematics, if you will) that we use in this book.

1.4.1 Sets
A set is an unordered collection of objects. For example, when we write ๐‘† = {2, 4, 7}, we mean that ๐‘† denotes the set that contains the numbers 2, 4, and 7. (We use the notation โ€œ2 โˆˆ ๐‘†โ€ to denote that 2 is an element of ๐‘†.) Note that the sets {2, 4, 7} and {7, 4, 2} are identical, since they contain the same elements. Also, a set either contains an element or does not contain it โ€“ there is no notion of containing it โ€œtwiceโ€ โ€“ and so we could even write the same set ๐‘† as {2, 2, 4, 7} (though that would be a little weird). The cardinality of a finite set ๐‘†, denoted by |๐‘†|, is the number of elements it contains. (Cardinality can be defined for infinite sets as well; see the sources in Section 1.9.) So, in the example above, |๐‘†| = 3. A set ๐‘† is a subset of a set ๐‘‡, denoted by ๐‘† โŠ† ๐‘‡, if every element of ๐‘† is also an element of ๐‘‡. (We can also describe this by saying that ๐‘‡ is a superset of ๐‘†.) For example, {2, 7} โŠ† {2, 4, 7}. The set that contains no elements is known as the empty set and it is denoted by โˆ…. If ๐ด is a subset of ๐ต that is not equal to ๐ต we say that ๐ด is a strict subset of ๐ต, and denote this by ๐ด โŠŠ ๐ต.
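These conventions are mirrored by the built-in set type of many programming languages. Here is a small illustrative sketch in Python (the particular numbers are just the example above):

```python
# A set is an unordered collection without duplicates.
S = {2, 4, 7}

print(S == {7, 4, 2})     # True: order does not matter
print(S == {2, 2, 4, 7})  # True: there is no notion of containing an element "twice"
print(len(S))             # 3: the cardinality |S|
print({2, 7} <= S)        # True: {2, 7} is a subset of S
print({2, 7} < S)         # True: in fact, a strict subset
```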

We can define sets by either listing all their elements or by writing down a rule that they satisfy, such as

EVEN = {๐‘ฅ | ๐‘ฅ = 2๐‘ฆ for some non-negative integer ๐‘ฆ} . (1.1)

Of course there is more than one way to write the same set, and often we will use intuitive notation listing a few examples that illustrate the rule. For example, we can also define EVEN as

EVEN = {0, 2, 4, โ€ฆ} . (1.2)

Note that a set can be either finite (such as the set {2, 4, 7}) or infinite (such as the set EVEN). Also, the elements of a set donโ€™t have to be numbers. We can talk about sets such as the set {๐‘Ž, ๐‘’, ๐‘–, ๐‘œ, ๐‘ข} of all the vowels in the English language, or the set {New York, Los Angeles, Chicago, Houston, Philadelphia, Phoenix, San Antonio, San Diego, Dallas} of all cities in the U.S. with population more than one million per the 2010 census. A set can even have other sets as elements, such as the set {โˆ…, {1, 2}, {2, 3}, {1, 3}} of all even-sized subsets of {1, 2, 3}.

Operations on sets: The union of two sets ๐‘† and ๐‘‡, denoted by ๐‘† โˆช ๐‘‡, is the set that contains all elements that are either in ๐‘† or in ๐‘‡. The intersection of ๐‘† and ๐‘‡, denoted by ๐‘† โˆฉ ๐‘‡, is the set of elements that are


1 The letter Z stands for the German word โ€œZahlenโ€, which means numbers.

both in ๐‘† and in ๐‘‡ . The set difference of ๐‘† and ๐‘‡ , denoted by ๐‘† โงต ๐‘‡ (andin some texts also by ๐‘† โˆ’ ๐‘‡ ), is the set of elements that are in ๐‘† but notin ๐‘‡ .

Tuples, lists, strings, sequences: A tuple is an ordered collection of items. For example, (1, 5, 2, 1) is a tuple with four elements (also known as a 4-tuple or quadruple). Since order matters, this is not the same tuple as the 4-tuple (1, 1, 5, 2) or the 3-tuple (1, 5, 2). A 2-tuple is also known as a pair. We use the terms tuples and lists interchangeably. A tuple where every element comes from some finite set ฮฃ (such as {0, 1}) is also known as a string. Analogously to sets, we denote the length of a tuple ๐‘‡ by |๐‘‡|. Just like sets, we can also think of infinite analogues of tuples, such as the ordered collection (1, 4, 9, โ€ฆ) of all perfect squares. Infinite ordered collections are known as sequences; we might sometimes use the term โ€œinfinite sequenceโ€ to emphasize this, and use โ€œfinite sequenceโ€ as a synonym for a tuple. (We can identify a sequence (๐‘Ž0, ๐‘Ž1, ๐‘Ž2, โ€ฆ) of elements in some set ๐‘† with a function ๐ด โˆถ โ„• โ†’ ๐‘†, where ๐‘Ž๐‘› = ๐ด(๐‘›) for every ๐‘› โˆˆ โ„•. Similarly, we can identify a ๐‘˜-tuple (๐‘Ž0, โ€ฆ, ๐‘Ž๐‘˜โˆ’1) of elements in ๐‘† with a function ๐ด โˆถ [๐‘˜] โ†’ ๐‘†.)

Cartesian product: If ๐‘† and ๐‘‡ are sets, then their Cartesian product, denoted by ๐‘† ร— ๐‘‡, is the set of all ordered pairs (๐‘ , ๐‘ก) where ๐‘  โˆˆ ๐‘† and ๐‘ก โˆˆ ๐‘‡. For example, if ๐‘† = {1, 2, 3} and ๐‘‡ = {10, 12}, then ๐‘† ร— ๐‘‡ contains the 6 elements {(1, 10), (2, 10), (3, 10), (1, 12), (2, 12), (3, 12)}. Similarly if ๐‘†, ๐‘‡, ๐‘ˆ are sets then ๐‘† ร— ๐‘‡ ร— ๐‘ˆ is the set of all ordered triples (๐‘ , ๐‘ก, ๐‘ข) where ๐‘  โˆˆ ๐‘†, ๐‘ก โˆˆ ๐‘‡, and ๐‘ข โˆˆ ๐‘ˆ. More generally, for every positive integer ๐‘› and sets ๐‘†0, โ€ฆ, ๐‘†๐‘›โˆ’1, we denote by ๐‘†0 ร— ๐‘†1 ร— โ‹ฏ ร— ๐‘†๐‘›โˆ’1 the set of ordered ๐‘›-tuples (๐‘ 0, โ€ฆ, ๐‘ ๐‘›โˆ’1) where ๐‘ ๐‘– โˆˆ ๐‘†๐‘– for every ๐‘– โˆˆ {0, โ€ฆ, ๐‘› โˆ’ 1}. For every set ๐‘†, we denote the set ๐‘† ร— ๐‘† by ๐‘†2, ๐‘† ร— ๐‘† ร— ๐‘† by ๐‘†3, ๐‘† ร— ๐‘† ร— ๐‘† ร— ๐‘† by ๐‘†4, and so on and so forth.
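For finite sets the Cartesian product can be enumerated directly, e.g. with Python's itertools.product; this sketch reproduces the example above:

```python
from itertools import product

S = {1, 2, 3}
T = {10, 12}

# S ร— T is the set of all ordered pairs (s, t) with s โˆˆ S and t โˆˆ T.
print(sorted(product(S, T)))
# [(1, 10), (1, 12), (2, 10), (2, 12), (3, 10), (3, 12)]

# S^2 = S ร— S via the "repeat" argument; note |S^n| = |S|**n.
print(len(set(product(S, repeat=2))))  # 9
```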

1.4.2 Special sets
There are several sets that we will use in this book time and again. The set

โ„• = {0, 1, 2, โ€ฆ} (1.3)

contains all natural numbers, i.e., non-negative integers. For any natural number ๐‘› โˆˆ โ„•, we define the set [๐‘›] as {0, โ€ฆ, ๐‘› โˆ’ 1} = {๐‘˜ โˆˆ โ„• โˆถ ๐‘˜ < ๐‘›}. (We start our indexing of both โ„• and [๐‘›] from 0, while many other texts index those sets from 1. Starting from zero or one is simply a convention that doesnโ€™t make much difference, as long as one is consistent about it.)

We will also occasionally use the set โ„ค = {โ€ฆ, โˆ’2, โˆ’1, 0, +1, +2, โ€ฆ} of (negative and non-negative) integers,1 as well as the set โ„ of real


numbers. (This is the set that includes not just the integers, but also fractional and irrational numbers; e.g., โ„ contains numbers such as +0.5, โˆ’๐œ‹, etc.) We denote by โ„+ the set {๐‘ฅ โˆˆ โ„ โˆถ ๐‘ฅ > 0} of positive real numbers. This set is sometimes also denoted as (0, โˆž).

Strings: Another set we will use time and again is

{0, 1}๐‘› = {(๐‘ฅ0, โ€ฆ, ๐‘ฅ๐‘›โˆ’1) โˆถ ๐‘ฅ0, โ€ฆ, ๐‘ฅ๐‘›โˆ’1 โˆˆ {0, 1}} (1.4)

which is the set of all ๐‘›-length binary strings for some natural number ๐‘›. That is, {0, 1}๐‘› is the set of all ๐‘›-tuples of zeroes and ones. This is consistent with our notation above: {0, 1}2 is the Cartesian product {0, 1} ร— {0, 1}, {0, 1}3 is the product {0, 1} ร— {0, 1} ร— {0, 1}, and so on.

We will write the string (๐‘ฅ0, ๐‘ฅ1, โ€ฆ, ๐‘ฅ๐‘›โˆ’1) as simply ๐‘ฅ0๐‘ฅ1 โ‹ฏ ๐‘ฅ๐‘›โˆ’1. For example,

{0, 1}3 = {000, 001, 010, 011, 100, 101, 110, 111} . (1.5)

For every string ๐‘ฅ โˆˆ {0, 1}๐‘› and ๐‘– โˆˆ [๐‘›], we write ๐‘ฅ๐‘– for the ๐‘–๐‘กโ„Ž element of ๐‘ฅ.

We will also often talk about the set of binary strings of all lengths, which is

{0, 1}โˆ— = {(๐‘ฅ0, โ€ฆ, ๐‘ฅ๐‘›โˆ’1) โˆถ ๐‘› โˆˆ โ„• , ๐‘ฅ0, โ€ฆ, ๐‘ฅ๐‘›โˆ’1 โˆˆ {0, 1}} . (1.6)

Another way to write this set is as

{0, 1}โˆ— = {0, 1}0 โˆช {0, 1}1 โˆช {0, 1}2 โˆช โ‹ฏ (1.7)

or more concisely as

{0, 1}โˆ— = โˆช๐‘›โˆˆโ„• {0, 1}๐‘› . (1.8)

The set {0, 1}โˆ— includes the โ€œstring of length 0โ€ or โ€œthe empty stringโ€, which we will denote by "". (In using this notation we follow the convention of many programming languages. Other texts sometimes use ๐œ– or ๐œ† to denote the empty string.)

Generalizing the star operation: For every set ฮฃ, we define

ฮฃโˆ— = โˆช๐‘›โˆˆโ„• ฮฃ๐‘› . (1.9)

For example, if ฮฃ = {๐‘Ž, ๐‘, ๐‘, ๐‘‘, โ€ฆ, ๐‘ง} then ฮฃโˆ— denotes the set of all finite length strings over the alphabet a-z.

Concatenation: The concatenation of two strings ๐‘ฅ โˆˆ ฮฃ๐‘› and ๐‘ฆ โˆˆ ฮฃ๐‘š is the (๐‘› + ๐‘š)-length string ๐‘ฅ๐‘ฆ obtained by writing ๐‘ฆ after ๐‘ฅ. That is, if ๐‘ฅ โˆˆ {0, 1}๐‘› and ๐‘ฆ โˆˆ {0, 1}๐‘š, then ๐‘ฅ๐‘ฆ is equal to the string ๐‘ง โˆˆ {0, 1}๐‘›+๐‘š such that for ๐‘– โˆˆ [๐‘›], ๐‘ง๐‘– = ๐‘ฅ๐‘– and for ๐‘– โˆˆ {๐‘›, โ€ฆ, ๐‘› + ๐‘š โˆ’ 1}, ๐‘ง๐‘– = ๐‘ฆ๐‘–โˆ’๐‘›.
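We can make these definitions concrete by enumerating ฮฃ๐‘› for small ๐‘›; concatenation then corresponds to the + operator on Python strings. A sketch (the helper name `strings` is ours, just for illustration):

```python
from itertools import product

def strings(sigma, n):
    """All length-n strings over the alphabet sigma, i.e. the set sigma^n."""
    return ["".join(t) for t in product(sigma, repeat=n)]

print(strings("01", 3))
# ['000', '001', '010', '011', '100', '101', '110', '111']   (matches (1.5))

# The first few levels of the Kleene star {0,1}*: lengths 0, 1, 2.
print([s for n in range(3) for s in strings("01", n)])
# ['', '0', '1', '00', '01', '10', '11']

# Concatenating x in {0,1}^n and y in {0,1}^m gives xy in {0,1}^(n+m).
x, y = "101", "01"
print(x + y)  # '10101', of length 3 + 2 = 5
```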


2 For two natural numbers ๐‘ฅ and ๐‘Ž, ๐‘ฅ mod ๐‘Ž (shorthand for โ€œmoduloโ€) denotes the remainder of ๐‘ฅ when it is divided by ๐‘Ž. That is, it is the number ๐‘Ÿ in {0, โ€ฆ, ๐‘Ž โˆ’ 1} such that ๐‘ฅ = ๐‘Ž๐‘˜ + ๐‘Ÿ for some integer ๐‘˜. We sometimes also use the notation ๐‘ฅ = ๐‘ฆ (mod ๐‘Ž) to denote the assertion that ๐‘ฅ mod ๐‘Ž is the same as ๐‘ฆ mod ๐‘Ž.

1.4.3 Functions
If ๐‘† and ๐‘‡ are nonempty sets, a function ๐น mapping ๐‘† to ๐‘‡, denoted by ๐น โˆถ ๐‘† โ†’ ๐‘‡, associates with every element ๐‘ฅ โˆˆ ๐‘† an element ๐น(๐‘ฅ) โˆˆ ๐‘‡. The set ๐‘† is known as the domain of ๐น and the set ๐‘‡ is known as the codomain of ๐น. The image of a function ๐น is the set {๐น(๐‘ฅ) | ๐‘ฅ โˆˆ ๐‘†}, which is the subset of ๐นโ€™s codomain consisting of all output elements that are mapped from some input. (Some texts use range to denote the image of a function, while other texts use range to denote the codomain of a function. Hence we will avoid using the term โ€œrangeโ€ altogether.) As in the case of sets, we can describe a function either by listing the table of all the values it gives for elements in ๐‘† or by using a rule. For example, if ๐‘† = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and ๐‘‡ = {0, 1}, then the table below defines a function ๐น โˆถ ๐‘† โ†’ ๐‘‡. Note that this function is the same as the function defined by the rule ๐น(๐‘ฅ) = (๐‘ฅ mod 2).2

Table 1.1: An example of a function.

Input   Output
0       0
1       1
2       0
3       1
4       0
5       1
6       0
7       1
8       0
9       1

If ๐‘“ โˆถ ๐‘† โ†’ ๐‘‡ satisfies that ๐‘“(๐‘ฅ) โ‰  ๐‘“(๐‘ฆ) for all ๐‘ฅ โ‰  ๐‘ฆ then we say that ๐‘“ is one-to-one (Definition 1.1, also known as an injective function or simply an injection). If ๐น satisfies that for every ๐‘ฆ โˆˆ ๐‘‡ there is some ๐‘ฅ โˆˆ ๐‘† such that ๐น(๐‘ฅ) = ๐‘ฆ, then we say that ๐น is onto (also known as a surjective function or simply a surjection). A function that is both one-to-one and onto is known as a bijective function or simply a bijection. A bijection from a set ๐‘† to itself is also known as a permutation of ๐‘†. If ๐น โˆถ ๐‘† โ†’ ๐‘‡ is a bijection then for every ๐‘ฆ โˆˆ ๐‘‡ there is a unique ๐‘ฅ โˆˆ ๐‘† such that ๐น(๐‘ฅ) = ๐‘ฆ. We denote this value ๐‘ฅ by ๐น โˆ’1(๐‘ฆ). Note that ๐น โˆ’1 is itself a bijection from ๐‘‡ to ๐‘† (can you see why?).

Giving a bijection between two sets is often a good way to show they have the same size. In fact, the standard mathematical definition of the notion that โ€œ๐‘† and ๐‘‡ have the same cardinalityโ€ is that there


Figure 1.4: We can represent finite functions as a directed graph where we put an edge from ๐‘ฅ to ๐‘“(๐‘ฅ). The onto condition corresponds to requiring that every vertex in the codomain of the function has in-degree at least one. The one-to-one condition corresponds to requiring that every vertex in the codomain of the function has in-degree at most one. In the examples above ๐น is an onto function, ๐บ is one to one, and ๐ป is neither onto nor one to one.

exists a bijection ๐‘“ โˆถ ๐‘† โ†’ ๐‘‡. Further, the cardinality of a set ๐‘† is defined to be ๐‘› if there is a bijection from ๐‘† to the set {0, โ€ฆ, ๐‘› โˆ’ 1}. As we will see later in this book, this is a definition that generalizes to defining the cardinality of infinite sets.

Partial functions: We will sometimes be interested in partial functions from ๐‘† to ๐‘‡. A partial function is allowed to be undefined on some subset of ๐‘†. That is, if ๐น is a partial function from ๐‘† to ๐‘‡, then for every ๐‘  โˆˆ ๐‘†, either there is (as in the case of standard functions) an element ๐น(๐‘ ) in ๐‘‡, or ๐น(๐‘ ) is undefined. For example, the partial function ๐น(๐‘ฅ) = โˆš๐‘ฅ is only defined on non-negative real numbers. When we want to distinguish between partial functions and standard (i.e., non-partial) functions, we will call the latter total functions. When we say โ€œfunctionโ€ without any qualifier then we mean a total function.

The notion of partial functions is a strict generalization of functions, and so every function is a partial function, but not every partial function is a function. (That is, for every nonempty ๐‘† and ๐‘‡, the set of partial functions from ๐‘† to ๐‘‡ is a proper superset of the set of total functions from ๐‘† to ๐‘‡.) When we want to emphasize that a function ๐‘“ from ๐ด to ๐ต might not be total, we will write ๐‘“ โˆถ ๐ด โ†’๐‘ ๐ต. We can think of a partial function ๐น from ๐‘† to ๐‘‡ also as a total function from ๐‘† to ๐‘‡ โˆช {โŠฅ} where โŠฅ is a special โ€œfailure symbolโ€. So, instead of saying that ๐น is undefined at ๐‘ฅ, we can say that ๐น(๐‘ฅ) = โŠฅ.

Basic facts about functions: Verifying that you can prove the following results is an excellent way to brush up on functions:

โ€ข If ๐น โˆถ ๐‘† โ†’ ๐‘‡ and ๐บ โˆถ ๐‘‡ โ†’ ๐‘ˆ are one-to-one functions, then their composition ๐ป โˆถ ๐‘† โ†’ ๐‘ˆ defined as ๐ป(๐‘ ) = ๐บ(๐น(๐‘ )) is also one to one.

โ€ข If ๐น โˆถ ๐‘† โ†’ ๐‘‡ is one to one, then there exists an onto function๐บ โˆถ ๐‘‡ โ†’ ๐‘† such that ๐บ(๐น(๐‘ )) = ๐‘  for every ๐‘  โˆˆ ๐‘†.

โ€ข If ๐บ โˆถ ๐‘‡ โ†’ ๐‘† is onto then there exists a one-to-one function ๐น โˆถ ๐‘† โ†’๐‘‡ such that ๐บ(๐น(๐‘ )) = ๐‘  for every ๐‘  โˆˆ ๐‘†.

โ€ข If ๐‘† and ๐‘‡ are finite sets then the following conditions are equiva-lent to one another: (a) |๐‘†| โ‰ค |๐‘‡ |, (b) there is a one-to-one function๐น โˆถ ๐‘† โ†’ ๐‘‡ , and (c) there is an onto function ๐บ โˆถ ๐‘‡ โ†’ ๐‘†. (This isactually true even for infinite ๐‘† and ๐‘‡ : in that case (b) (or equiva-lently (c)) is the commonly accepted definition for |๐‘†| โ‰ค |๐‘‡ |.)

You can find the proofs of these results in many discrete math texts, including, for example, Section 4.5 in the Lehman-Leighton-Meyer notes. However, I strongly suggest you try to prove them on your own, or at least convince yourself that they are true by proving special cases of them for small sizes (e.g., |๐‘†| = 3, |๐‘‡| = 4, |๐‘ˆ| = 5).

3 It is possible, and sometimes useful, to think of an undirected graph as the special case of a directed graph that has the special property that for every pair ๐‘ข, ๐‘ฃ either both the edges (๐‘ข, ๐‘ฃ) and (๐‘ฃ, ๐‘ข) are present or neither of them is. However, in many settings there is a significant difference between undirected and directed graphs, and so itโ€™s typically best to think of them as separate categories.

Figure 1.5: An example of an undirected and a directed graph. The undirected graph has vertex set {1, 2, 3, 4} and edge set {{1, 2}, {2, 3}, {2, 4}}. The directed graph has vertex set {๐‘Ž, ๐‘, ๐‘} and edge set {(๐‘Ž, ๐‘), (๐‘, ๐‘), (๐‘, ๐‘Ž), (๐‘Ž, ๐‘)}.

Let us prove one of these facts as an example:

Lemma 1.2 If ๐‘†, ๐‘‡ are non-empty sets and ๐น โˆถ ๐‘† โ†’ ๐‘‡ is one to one, then there exists an onto function ๐บ โˆถ ๐‘‡ โ†’ ๐‘† such that ๐บ(๐น(๐‘ )) = ๐‘  for every ๐‘  โˆˆ ๐‘†.

Proof. Choose some ๐‘ 0 โˆˆ ๐‘†. We will define the function ๐บ โˆถ ๐‘‡ โ†’ ๐‘† as follows: for every ๐‘ก โˆˆ ๐‘‡, if there is some ๐‘  โˆˆ ๐‘† such that ๐น(๐‘ ) = ๐‘ก then set ๐บ(๐‘ก) = ๐‘  (the choice of ๐‘  is well defined since, by the one-to-one property of ๐น, there cannot be two distinct ๐‘ , ๐‘ โ€ฒ that both map to ๐‘ก). Otherwise, set ๐บ(๐‘ก) = ๐‘ 0. Now for every ๐‘  โˆˆ ๐‘†, by the definition of ๐บ, if ๐‘ก = ๐น(๐‘ ) then ๐บ(๐‘ก) = ๐บ(๐น(๐‘ )) = ๐‘ . Moreover, this also shows that ๐บ is onto, since it means that for every ๐‘  โˆˆ ๐‘† there is some ๐‘ก, namely ๐‘ก = ๐น(๐‘ ), such that ๐บ(๐‘ก) = ๐‘ .
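For finite sets this proof is constructive, and we can mirror it in code: build ๐บ as a table that inverts ๐น where possible and falls back to a fixed ๐‘ 0 elsewhere. A sketch, assuming ๐น is given as a Python dict:

```python
def onto_inverse(F, T):
    """Given a one-to-one F: S -> T as a dict, return G: T -> S with
    G(F(s)) = s for every s, following the proof of Lemma 1.2."""
    s0 = next(iter(F))        # an arbitrary fixed element s0 of S
    G = {t: s0 for t in T}    # default case: G(t) = s0
    for s, t in F.items():
        G[t] = s              # invert F (well defined since F is one-to-one)
    return G

F = {"a": 1, "b": 2, "c": 4}  # a one-to-one function from S = {a, b, c} to T
T = {1, 2, 3, 4}
G = onto_inverse(F, T)

print(all(G[F[s]] == s for s in F))  # True: G inverts F on its image
print(set(G.values()) == set(F))     # True: G is onto S
```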

1.4.4 Graphs
Graphs are ubiquitous in Computer Science, and many other fields as well. They are used to model a variety of data types including social networks, scheduling constraints, road networks, deep neural nets, gene interactions, correlations between observations, and a great many more. Formal definitions of several kinds of graphs are given next, but if you have not seen graphs before in a course, I urge you to read up on them in one of the sources mentioned in Section 1.9.

Graphs come in two basic flavors: undirected and directed.3

Definition 1.3 โ€” Undirected graphs. An undirected graph ๐บ = (๐‘‰, ๐ธ) consists of a set ๐‘‰ of vertices and a set ๐ธ of edges. Every edge is a size-two subset of ๐‘‰. We say that two vertices ๐‘ข, ๐‘ฃ โˆˆ ๐‘‰ are neighbors if the edge {๐‘ข, ๐‘ฃ} is in ๐ธ.

Given this definition, we can define several other properties of graphs and their vertices. We define the degree of ๐‘ข to be the number of neighbors ๐‘ข has. A path in the graph is a tuple (๐‘ข0, โ€ฆ, ๐‘ข๐‘˜) โˆˆ ๐‘‰๐‘˜+1, for some ๐‘˜ > 0, such that ๐‘ข๐‘–+1 is a neighbor of ๐‘ข๐‘– for every ๐‘– โˆˆ [๐‘˜]. A simple path is a path (๐‘ข0, โ€ฆ, ๐‘ข๐‘˜โˆ’1) where all the ๐‘ข๐‘–โ€™s are distinct. A cycle is a path (๐‘ข0, โ€ฆ, ๐‘ข๐‘˜) where ๐‘ข0 = ๐‘ข๐‘˜. We say that two vertices ๐‘ข, ๐‘ฃ โˆˆ ๐‘‰ are connected if either ๐‘ข = ๐‘ฃ or there is a path (๐‘ข0, โ€ฆ, ๐‘ข๐‘˜) where ๐‘ข0 = ๐‘ข and ๐‘ข๐‘˜ = ๐‘ฃ. We say that the graph ๐บ is connected if every pair of vertices in it is connected.
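Connectivity can be checked directly from the definition by exploring neighbors, for instance with a breadth-first search. A minimal sketch, using an adjacency-dictionary representation (our choice of representation, not the book's notation):

```python
from collections import deque

def connected(adj, u, v):
    """True iff u and v are connected in the undirected graph adj."""
    seen, queue = {u}, deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for x in adj[w]:
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return False

# The undirected graph of Fig. 1.5, with edges {1,2}, {2,3}, {2,4}:
adj = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
print(connected(adj, 3, 4))  # True: e.g. via the path (3, 2, 4)
print(connected(adj, 1, 1))  # True: every vertex is connected to itself
```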


Here are some basic facts about undirected graphs. We give some informal arguments below, but leave the full proofs as exercises (the proofs can be found in many of the resources listed in Section 1.9).

Lemma 1.4 In any undirected graph ๐บ = (๐‘‰, ๐ธ), the sum of the degrees of all vertices is equal to twice the number of edges.

Lemma 1.4 can be shown by seeing that every edge {๐‘ข, ๐‘ฃ} contributes twice to the sum of the degrees (once for ๐‘ข and the second time for ๐‘ฃ).

Lemma 1.5 The connectivity relation is transitive, in the sense that if ๐‘ข is connected to ๐‘ฃ, and ๐‘ฃ is connected to ๐‘ค, then ๐‘ข is connected to ๐‘ค.

Lemma 1.5 can be shown by simply attaching a path of the form (๐‘ข, ๐‘ข1, ๐‘ข2, โ€ฆ, ๐‘ข๐‘˜โˆ’1, ๐‘ฃ) to a path of the form (๐‘ฃ, ๐‘ขโ€ฒ1, โ€ฆ, ๐‘ขโ€ฒ๐‘˜โ€ฒโˆ’1, ๐‘ค) to obtain the path (๐‘ข, ๐‘ข1, โ€ฆ, ๐‘ข๐‘˜โˆ’1, ๐‘ฃ, ๐‘ขโ€ฒ1, โ€ฆ, ๐‘ขโ€ฒ๐‘˜โ€ฒโˆ’1, ๐‘ค) that connects ๐‘ข to ๐‘ค.

Lemma 1.6 For every undirected graph ๐บ = (๐‘‰, ๐ธ) and connected pair ๐‘ข, ๐‘ฃ, the shortest path from ๐‘ข to ๐‘ฃ is simple. In particular, for every connected pair there exists a simple path that connects them.

Lemma 1.6 can be shown by โ€œshortcuttingโ€: if the same vertex ๐‘ค appears twice in a path from ๐‘ข to ๐‘ฃ, we can remove the loop between the two occurrences of ๐‘ค (see Fig. 1.6). It is a good exercise to transform this intuitive reasoning into a formal proof:

Figure 1.6: If there is a path from ๐‘ข to ๐‘ฃ in a graph that passes twice through a vertex ๐‘ค then we can โ€œshortcutโ€ it by removing the loop from ๐‘ค to itself to find a path from ๐‘ข to ๐‘ฃ that only passes once through ๐‘ค.

Solved Exercise 1.1 โ€” Connected vertices have simple paths. Prove Lemma 1.6.

Solution:

The proof follows the idea illustrated in Fig. 1.6. One complication is that there can be more than one vertex that is visited twice by a path, and so โ€œshortcuttingโ€ might not necessarily result in a simple path; we deal with this by looking at a shortest path between ๐‘ข and ๐‘ฃ. Details follow.

Let ๐บ = (๐‘‰, ๐ธ) be a graph and let ๐‘ข and ๐‘ฃ in ๐‘‰ be two connected vertices in ๐บ. We will prove that there is a simple path between ๐‘ข and ๐‘ฃ. Let ๐‘˜ be the shortest length of a path between ๐‘ข and ๐‘ฃ and let ๐‘ƒ = (๐‘ข0, ๐‘ข1, ๐‘ข2, โ€ฆ, ๐‘ข๐‘˜โˆ’1, ๐‘ข๐‘˜) be a ๐‘˜-length path from ๐‘ข to ๐‘ฃ (there can be more than one such path: if so we just choose one of them). (That is, ๐‘ข0 = ๐‘ข, ๐‘ข๐‘˜ = ๐‘ฃ, and {๐‘ขโ„“, ๐‘ขโ„“+1} โˆˆ ๐ธ for all โ„“ โˆˆ [๐‘˜].) We claim that ๐‘ƒ is simple. Indeed, suppose otherwise that there is some vertex ๐‘ค that occurs twice in the path: ๐‘ค = ๐‘ข๐‘– and ๐‘ค = ๐‘ข๐‘— for some ๐‘– < ๐‘—. Then we can โ€œshortcutโ€ the path ๐‘ƒ by considering the path ๐‘ƒโ€ฒ = (๐‘ข0, ๐‘ข1, โ€ฆ, ๐‘ข๐‘–โˆ’1, ๐‘ค, ๐‘ข๐‘—+1, โ€ฆ, ๐‘ข๐‘˜) obtained by taking the first ๐‘– vertices of ๐‘ƒ (from ๐‘ข0 = ๐‘ข to the first occurrence of ๐‘ค) and the last ๐‘˜ โˆ’ ๐‘— ones (from the vertex ๐‘ข๐‘—+1 following the second occurrence of ๐‘ค to ๐‘ข๐‘˜ = ๐‘ฃ). The path ๐‘ƒโ€ฒ is a valid path between ๐‘ข and ๐‘ฃ since every consecutive pair of vertices in it is connected by an edge (in particular, since ๐‘ค = ๐‘ข๐‘– = ๐‘ข๐‘—, both {๐‘ข๐‘–โˆ’1, ๐‘ค} and {๐‘ค, ๐‘ข๐‘—+1} are edges in ๐ธ), but since the length of ๐‘ƒโ€ฒ is ๐‘˜ โˆ’ (๐‘— โˆ’ ๐‘–) < ๐‘˜, this contradicts the minimality of ๐‘ƒ.

Remark 1.7 โ€” Finding proofs. Solved Exercise 1.1 is a good example of the process of finding a proof. You start by ensuring you understand what the statement means, and then come up with an informal argument why it should be true. You then transform the informal argument into a rigorous proof. This proof need not be very long or overly formal, but should clearly establish why the conclusion of the statement follows from its assumptions.

The concepts of degrees and connectivity extend naturally to directed graphs, defined as follows.

Definition 1.8 โ€” Directed graphs. A directed graph ๐บ = (๐‘‰, ๐ธ) consists of a set ๐‘‰ and a set ๐ธ โŠ† ๐‘‰ ร— ๐‘‰ of ordered pairs of ๐‘‰. We sometimes denote the edge (๐‘ข, ๐‘ฃ) also as ๐‘ข โ†’ ๐‘ฃ. If the edge ๐‘ข โ†’ ๐‘ฃ is present in the graph then we say that ๐‘ฃ is an out-neighbor of ๐‘ข and ๐‘ข is an in-neighbor of ๐‘ฃ.

A directed graph might contain both ๐‘ข โ†’ ๐‘ฃ and ๐‘ฃ โ†’ ๐‘ข, in which case ๐‘ข will be both an in-neighbor and an out-neighbor of ๐‘ฃ and vice versa. The in-degree of ๐‘ข is the number of in-neighbors it has, and the out-degree of ๐‘ข is the number of out-neighbors it has. A path in the


graph is a tuple (๐‘ข0, โ€ฆ, ๐‘ข๐‘˜) โˆˆ ๐‘‰๐‘˜+1, for some ๐‘˜ > 0, such that ๐‘ข๐‘–+1 is an out-neighbor of ๐‘ข๐‘– for every ๐‘– โˆˆ [๐‘˜]. As in the undirected case, a simple path is a path (๐‘ข0, โ€ฆ, ๐‘ข๐‘˜โˆ’1) where all the ๐‘ข๐‘–โ€™s are distinct and a cycle is a path (๐‘ข0, โ€ฆ, ๐‘ข๐‘˜) where ๐‘ข0 = ๐‘ข๐‘˜. One type of directed graph we often care about is the directed acyclic graph or DAG, which, as its name implies, is a directed graph without any cycles:

Definition 1.9 โ€” Directed Acyclic Graphs. We say that ๐บ = (๐‘‰, ๐ธ) is a directed acyclic graph (DAG) if it is a directed graph and there does not exist a list of vertices ๐‘ข0, ๐‘ข1, โ€ฆ, ๐‘ข๐‘˜ โˆˆ ๐‘‰ such that ๐‘ข0 = ๐‘ข๐‘˜ and for every ๐‘– โˆˆ [๐‘˜], the edge ๐‘ข๐‘– โ†’ ๐‘ข๐‘–+1 is in ๐ธ.

The lemmas we mentioned above have analogs for directed graphs. We again leave the proofs (which are essentially identical to their undirected analogs) as exercises.

Lemma 1.10 In any directed graph ๐บ = (๐‘‰, ๐ธ), the sum of the in-degrees is equal to the sum of the out-degrees, which is equal to the number of edges.

Lemma 1.11 In any directed graph ๐บ, if there is a path from ๐‘ข to ๐‘ฃ and a path from ๐‘ฃ to ๐‘ค, then there is a path from ๐‘ข to ๐‘ค.

Lemma 1.12 For every directed graph ๐บ = (๐‘‰, ๐ธ) and a pair ๐‘ข, ๐‘ฃ such that there is a path from ๐‘ข to ๐‘ฃ, the shortest path from ๐‘ข to ๐‘ฃ is simple.
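Definition 1.9 can also be checked algorithmically by searching for a cycle, for example with a depth-first search that tracks which vertices lie on the current path. A sketch (again using an adjacency dict as the representation):

```python
def is_dag(adj):
    """True iff the directed graph adj has no cycle (Definition 1.9)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {v: WHITE for v in adj}

    def visit(v):
        color[v] = GRAY
        for w in adj[v]:
            if color[w] == GRAY:  # back edge: w is already on the current path
                return False
            if color[w] == WHITE and not visit(w):
                return False
        color[v] = BLACK
        return True

    return all(visit(v) for v in adj if color[v] == WHITE)

print(is_dag({"a": ["b"], "b": ["c"], "c": []}))     # True
print(is_dag({"a": ["b"], "b": ["c"], "c": ["a"]}))  # False: a -> b -> c -> a
```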

Remark 1.13 โ€” Labeled graphs. For some applications we will consider labeled graphs, where the vertices or edges have associated labels (which can be numbers, strings, or members of some other set). We can think of such a graph as having an associated (possibly partial) labelling function ๐ฟ โˆถ ๐‘‰ โˆช ๐ธ โ†’ โ„’, where โ„’ is the set of potential labels. However we will typically not refer explicitly to this labeling function and simply say things such as โ€œvertex ๐‘ฃ has the label ๐›ผโ€.

1.4.5 Logic operators and quantifiers
If ๐‘ƒ and ๐‘„ are some statements that can be true or false, then ๐‘ƒ AND ๐‘„ (denoted as ๐‘ƒ โˆง ๐‘„) is a statement that is true if and only if both ๐‘ƒ and ๐‘„ are true, and ๐‘ƒ OR ๐‘„ (denoted as ๐‘ƒ โˆจ ๐‘„) is a statement that is true if and only if either ๐‘ƒ or ๐‘„ is true. The negation of ๐‘ƒ, denoted as ยฌ๐‘ƒ or ๐‘ƒฬ…, is true if and only if ๐‘ƒ is false.

Suppose that ๐‘ƒ(๐‘ฅ) is a statement that depends on some parameter ๐‘ฅ (also sometimes known as an unbound variable) in the sense that for every instantiation of ๐‘ฅ with a value from some set ๐‘†, ๐‘ƒ(๐‘ฅ) is either true or false. For example, ๐‘ฅ > 7 is a statement that is not a priori


4 In this book, we place the variable bound by a quantifier in a subscript and so write โˆ€๐‘ฅโˆˆ๐‘† ๐‘ƒ(๐‘ฅ). Many other texts do not use this subscript notation and so will write the same statement as โˆ€๐‘ฅ โˆˆ ๐‘†, ๐‘ƒ(๐‘ฅ).

true or false, but becomes true or false whenever we instantiate ๐‘ฅ with some real number. We denote by โˆ€๐‘ฅโˆˆ๐‘† ๐‘ƒ(๐‘ฅ) the statement that is true if and only if ๐‘ƒ(๐‘ฅ) is true for every ๐‘ฅ โˆˆ ๐‘†.4 We denote by โˆƒ๐‘ฅโˆˆ๐‘† ๐‘ƒ(๐‘ฅ) the statement that is true if and only if there exists some ๐‘ฅ โˆˆ ๐‘† such that ๐‘ƒ(๐‘ฅ) is true.

For example, the following is a formalization of the true statement that there exists a natural number ๐‘› larger than 100 that is not divisible by 3:

โˆƒ๐‘›โˆˆโ„• (๐‘› > 100) โˆง (โˆ€๐‘˜โˆˆโ„• ๐‘˜ + ๐‘˜ + ๐‘˜ โ‰  ๐‘›) . (1.10)
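Over finite ranges, the quantifiers โˆ€ and โˆƒ correspond directly to Python's all and any. For instance, a finite search suffices to exhibit a witness for statement (1.10):

```python
# There exists n with (n > 100) and (for all k, 3k != n),
# i.e. some n > 100 is not divisible by 3.
# N is infinite, but searching a small finite range already finds a witness.
witness_exists = any(n > 100 and all(3 * k != n for k in range(n + 1))
                     for n in range(101, 110))
print(witness_exists)  # True: e.g. n = 101 works
```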

โ€For sufficiently large n.โ€ One expression that we will see come uptime and again in this book is the claim that some statement ๐‘ƒ(๐‘›) istrue โ€œfor sufficiently large ๐‘›โ€. What this means is that there exists aninteger ๐‘0 such that ๐‘ƒ(๐‘›) is true for every ๐‘› > ๐‘0. We can formalizethis as โˆƒ๐‘0โˆˆโ„•โˆ€๐‘›>๐‘0๐‘ƒ(๐‘›).1.4.6 Quantifiers for summations and productsThe following shorthands for summing up or taking products of sev-eral numbers are often convenient. If ๐‘† = ๐‘ 0, โ€ฆ , ๐‘ ๐‘›โˆ’1 is a finite setand ๐‘“ โˆถ ๐‘† โ†’ โ„ is a function, then we write โˆ‘๐‘ฅโˆˆ๐‘† ๐‘“(๐‘ฅ) as shorthand for๐‘“(๐‘ 0) + ๐‘“(๐‘ 1) + ๐‘“(๐‘ 2) + โ€ฆ + ๐‘“(๐‘ ๐‘›โˆ’1) , (1.11)

and ∏_{x∈S} f(x) as shorthand for

f(s₀) · f(s₁) · f(s₂) · … · f(s_{n−1}) . (1.12)

For example, the sum of the squares of all numbers from 1 to 100 can be written as

∑_{i∈{1,…,100}} i² . (1.13)

Since summing up over intervals of integers is so common, there is a special notation for it. For every two integers a ≤ b, ∑_{i=a}^{b} f(i) denotes ∑_{i∈S} f(i) where S = {x ∈ ℤ : a ≤ x ≤ b}. Hence, we can write the sum (1.13) as

∑_{i=1}^{100} i² . (1.14)
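These shorthands map directly onto Python's `sum` and `math.prod`; a quick sketch checking (1.13)/(1.14) against the closed form n(n+1)(2n+1)/6 (our own check, not from the text):

```python
import math

# ∑_{x∈S} f(x) and ∏_{x∈S} f(x) for S = {2, 3, 5} and f(x) = x².
S = [2, 3, 5]
f = lambda x: x * x
assert sum(f(x) for x in S) == 4 + 9 + 25          # = 38
assert math.prod(f(x) for x in S) == 4 * 9 * 25    # = 900

# The sum (1.14): ∑_{i=1}^{100} i², using the interval notation.
total = sum(i**2 for i in range(1, 101))
assert total == 100 * 101 * 201 // 6  # closed form gives 338350
```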

1.4.7 Parsing formulas: bound and free variables

In mathematics, as in coding, we often have symbolic "variables" or "parameters". It is important to be able to understand, given some formula, whether a given variable is bound or free in this formula. For example, in the following statement n is free but a and b are bound by the ∃ quantifier:

∃_{a,b∈ℕ} (a ≠ 1) ∧ (a ≠ n) ∧ (n = a × b) (1.15)

Since ๐‘› is free, it can be set to any value, and the truth of the state-ment (1.15) depends on the value of ๐‘›. For example, if ๐‘› = 8 then(1.15) is true, but for ๐‘› = 11 it is false. (Can you see why?)

The same issue appears when parsing code. For example, in the following snippet from the C programming language

for (int i=0 ; i<n ; i=i+1) printf("*");

the variable i is bound within the for block but the variable n is free.

The main property of bound variables is that we can rename them (as long as the new name doesn't conflict with another used variable) without changing the meaning of the statement. Thus for example the statement

∃_{x,y∈ℕ} (x ≠ 1) ∧ (x ≠ n) ∧ (n = x × y) (1.16)

is equivalent to (1.15) in the sense that it is true for exactly the sameset of ๐‘›โ€™s.

Similarly, the code

for (int j=0 ; j<n ; j=j+1) printf("*");

produces the same result as the code above that used i instead of j.
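The same renaming works in any language. A Python analogue of the two C loops above (the function names here are ours, for illustration only):

```python
# Renaming a bound variable does not change a program's meaning.
def stars_i(n):
    out = ""
    for i in range(n):  # i is bound inside the loop; n is free
        out += "*"
    return out

def stars_j(n):
    out = ""
    for j in range(n):  # the same program with the bound variable renamed to j
        out += "*"
    return out

assert stars_i(5) == stars_j(5) == "*****"
```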

Remark 1.14 — Aside: mathematical vs programming notation. Mathematical notation has a lot of similarities with programming languages, and for the same reasons. Both are formalisms meant to convey complex concepts in a precise way. However, there are some cultural differences. In programming languages, we often try to use meaningful variable names such as NumberOfVertices, while in math we often use short identifiers such as n. Part of it might have to do with the tradition of mathematical proofs being handwritten and verbally presented, as opposed to typed up and compiled. Another reason is that if the wrong variable name is used in a proof, at worst it causes confusion to readers; when the wrong variable name is used in a program, planes might crash, patients might die, and rockets could explode.

One consequence of that is that in mathematics we often end up reusing identifiers, and also "run out" of letters and hence use Greek letters too, as well as distinguish between small and capital letters and different font faces. Similarly, mathematical notation tends to use quite a lot of "overloading", using operators such as + for a great variety of objects (e.g., real numbers, matrices, finite field elements, etc.), and assuming that the meaning can be inferred from the context.

Both fields have a notion of "types", and in math we often try to reserve certain letters for variables of a particular type. For example, variables such as i, j, k, ℓ, m, n will often denote integers, and ε will often denote a small positive real number (see Section 1.7 for more on these conventions). When reading or writing mathematical texts, we usually don't have the advantage of a "compiler" that will check type safety for us. Hence it is important to keep track of the type of each variable, and see that the operations performed on it "make sense".

Kun's book [Kun18] contains an extensive discussion of the similarities and differences between the cultures of mathematics and programming.

1.4.8 Asymptotics and Big-O notation

"log log log n has been proved to go to infinity, but has never been observed to do so.", Anonymous, quoted by Carl Pomerance (2000)

It is often very cumbersome to describe precisely quantities such as running time, and it is also not needed, since we are typically mostly interested in the "higher order terms". That is, we want to understand the scaling behavior of the quantity as the input variable grows. For example, as far as running time goes, the difference between an n⁵-time algorithm and an n²-time one is much more significant than the difference between a 100n² + 10n time algorithm and a 10n² time algorithm. For this purpose, O-notation is extremely useful as a way to "declutter" our text and focus our attention on what really matters. For example, using O-notation, we can say that both 100n² + 10n and 10n² are simply Θ(n²) (which informally means "the same up to constant factors"), while n² = o(n⁵) (which informally means that n² is "much smaller than" n⁵).

Generally (though still informally), if F, G are two functions mapping natural numbers to non-negative reals, then "F = O(G)" means that F(n) ≤ G(n) if we don't care about constant factors, while "F = o(G)" means that F is much smaller than G, in the sense that no matter by what constant factor we multiply F, if we take n to be large enough then G will be bigger (for this reason, F = o(G) is sometimes written as F ≪ G). We will write F = Θ(G) if F = O(G) and G = O(F), which one can think of as saying that F is the same as G if we don't care about constant factors. More formally, we define Big-O notation as follows:

Figure 1.7: If F(n) = o(G(n)) then for sufficiently large n, F(n) will be smaller than G(n). For example, if Algorithm A runs in time 1000·n + 10⁶ and Algorithm B runs in time 0.01·n², then even though B might be more efficient for smaller inputs, when the inputs get sufficiently large, A will run much faster than B.

Definition 1.15 — Big-O notation. Let ℝ₊ = {x ∈ ℝ | x > 0} be the set of positive real numbers. For two functions F, G : ℕ → ℝ₊, we say that F = O(G) if there exist numbers a, N₀ ∈ ℕ such that F(n) ≤ a·G(n) for every n > N₀. We say that F = Θ(G) if F = O(G) and G = O(F). We say that F = Ω(G) if G = O(F).

We say that F = o(G) if for every ε > 0 there is some N₀ such that F(n) < ε·G(n) for every n > N₀. We say that F = ω(G) if G = o(F).

It's often convenient to use "anonymous functions" in the context of O-notation. For example, when we write a statement such as F(n) = O(n³), we mean that F = O(G) where G is the function defined by G(n) = n³. Chapter 7 in Jim Aspnes' notes on discrete math provides a good summary of O notation; see also this tutorial for a gentler and more programmer-oriented introduction.

O is not equality. Using the equality sign for O-notation is extremely common, but is somewhat of a misnomer, since a statement such as F = O(G) really means that F is in the set {G′ : ∃_{N,c} s.t. ∀_{n>N} G′(n) ≤ c·G(n)}. If anything, it makes more sense to use inequalities and write F ≤ O(G) and F ≥ Ω(G), reserving equality for F = Θ(G), and so we will sometimes use this notation too, but since the equality notation is quite firmly entrenched we often stick to it as well. (Some texts write F ∈ O(G) instead of F = O(G), but we will not use this notation.) Despite the misleading equality sign, you should remember that a statement such as F = O(G) means that F is "at most" G in some rough sense when we ignore constants, and a statement such as F = Ω(G) means that F is "at least" G in the same rough sense.
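Definition 1.15 can be spot-checked numerically by exhibiting concrete constants. The sketch below verifies that F(n) = 100n² + 10n is Θ(n²), using the witnesses a = 110 and N₀ = 0 (these particular constants are our choice; any large enough pair works, and a finite check illustrates rather than proves the claim):

```python
# F(n) = 100n² + 10n is Θ(n²): we exhibit constants witnessing both
# F = O(G) and G = O(F) for G(n) = n², per Definition 1.15.
F = lambda n: 100 * n * n + 10 * n
G = lambda n: n * n

# F(n) ≤ 110·G(n) for every n ≥ 1, i.e. 10n ≤ 10n², so F = O(G):
assert all(F(n) <= 110 * G(n) for n in range(1, 10_000))
# G(n) ≤ 1·F(n) for every n ≥ 1, so G = O(F):
assert all(G(n) <= F(n) for n in range(1, 10_000))
```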

1.4.9 Some "rules of thumb" for Big-O notation

There are some simple heuristics that can help when trying to compare two functions F and G:

• Multiplicative constants don't matter in O-notation, and so if F(n) = O(G(n)) then 100F(n) = O(G(n)).

• When adding two functions, we only care about the larger one. For example, for the purpose of O-notation, n³ + 100n² is the same as n³, and in general in any polynomial, we only care about the larger exponent.


• For every two constants a, b > 0, n^a = O(n^b) if and only if a ≤ b, and n^a = o(n^b) if and only if a < b. For example, combining the two observations above, 100n² + 10n + 100 = o(n³).

• Polynomial is always smaller than exponential: n^a = o(2^(n^ε)) for every two constants a > 0 and ε > 0, even if ε is much smaller than a. For example, 100n^100 = o(2^(√n)).

• Similarly, logarithmic is always smaller than polynomial: (log n)^a (which we write as log^a n) is o(n^ε) for every two constants a, ε > 0. For example, combining the observations above, 100n² log^100 n = o(n³).
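Two of these rules can be sanity-checked numerically (a finite check only illustrates, and of course does not prove, an asymptotic statement):

```python
import math

# log² n = o(n^0.5): the ratio (log n)² / √n shrinks as n grows.
ratios = [(math.log(n) ** 2) / math.sqrt(n) for n in (10**4, 10**8, 10**12)]
assert ratios[0] > ratios[1] > ratios[2]

# n³ + 100n² behaves like n³: the ratio tends to 1 as n grows.
r = lambda n: (n**3 + 100 * n**2) / n**3
assert abs(r(10**6) - 1) < 1e-3
```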

Remark 1.16 — Big-O for other applications (optional). While Big-O notation is often used to analyze the running time of algorithms, this is by no means the only application. We can use O-notation to bound asymptotic relations between any functions mapping integers to positive numbers. It can be used regardless of whether these functions are a measure of running time, memory usage, or any other quantity that may have nothing to do with computation. Here is one example which is unrelated to this book (and hence one that you can feel free to skip): one way to state the Riemann Hypothesis (one of the most famous open questions in mathematics) is that it corresponds to the conjecture that the number of primes between 0 and n is equal to ∫₂ⁿ (1/ln x) dx up to an additive error of magnitude at most O(√n log n).

1.5 PROOFS

Many people think of mathematical proofs as a sequence of logical deductions that starts from some axioms and ultimately arrives at a conclusion. In fact, some dictionaries define proofs that way. This is not entirely wrong, but at its essence a mathematical proof of a statement X is simply an argument that convinces the reader that X is true beyond a shadow of a doubt.

To produce such a proof you need to:

1. Understand precisely what X means.

2. Convince yourself that X is true.

3. Write your reasoning down in plain, precise and concise English (using formulas or notation only when they help clarity).


In many cases, the first part is the most important one. Understanding what a statement means is oftentimes more than halfway towards understanding why it is true. In the third part, to convince the reader beyond a shadow of a doubt, we will often want to break down the reasoning into "basic steps", where each basic step is simple enough to be "self-evident". The combination of all steps yields the desired statement.

1.5.1 Proofs and programs

There is a great deal of similarity between the process of writing proofs and that of writing programs, and both require a similar set of skills. Writing a program involves:

1. Understanding the task we want the program to achieve.

2. Convincing yourself that the task can be achieved by a computer, perhaps by planning on a whiteboard or notepad how you will break it up into simpler tasks.

3. Converting this plan into code that a compiler or interpreter canunderstand, by breaking up each task into a sequence of the basicoperations of some programming language.

In programs as in proofs, step 1 is often the most important one. A key difference is that the reader of a proof is a human being and the reader of a program is a computer. (This difference is eroding with time as more proofs are being written in machine-verifiable form; moreover, to ensure correctness and maintainability of programs, it is important that they can be read and understood by humans.) Thus our emphasis is on readability and having a clear logical flow for our proofs (which is not a bad idea for programs as well). When writing a proof, you should think of your audience as an intelligent but highly skeptical and somewhat petty reader, who will "call foul" at every step that is not well justified.

1.5.2 Proof writing style

A mathematical proof is a piece of writing, but it is a specific genre of writing with certain conventions and preferred styles. As in any writing, practice makes perfect, and it is also important to revise your drafts for clarity.

In a proof of the statement X, all the text between the words "Proof:" and "QED" should be focused on establishing that X is true. Digressions, examples, or ruminations should be kept outside these two words, so they do not confuse the reader. The proof should have a clear logical flow in the sense that every sentence or equation in it should have some purpose and it should be crystal-clear to the reader what this purpose is. When you write a proof, for every equation or sentence you include, ask yourself:

1. Is this sentence or equation stating that some statement is true?

2. If so, does this statement follow from the previous steps, or are wegoing to establish it in the next step?

3. What is the role of this sentence or equation? Is it one step towardsproving the original statement, or is it a step towards proving someintermediate claim that you have stated before?

4. Finally, would the answers to questions 1-3 be clear to the reader?If not, then you should reorder, rephrase or add explanations.

Some helpful resources on mathematical writing include this handout by Lee, this handout by Hutchings, as well as several of the excellent handouts in Stanford's CS 103 class.

1.5.3 Patterns in proofs

"If it was so, it might be; and if it were so, it would be; but as it isn't, it ain't. That's logic.", Lewis Carroll, Through the Looking-Glass.

Just like in programming, there are several common patterns ofproofs that occur time and again. Here are some examples:

Proofs by contradiction: One way to prove that X is true is to show that if X were false it would result in a contradiction. Such proofs often start with a sentence such as "Suppose, towards a contradiction, that X is false" and end with deriving some contradiction (such as a violation of one of the assumptions in the theorem statement). Here is an example:

Lemma 1.17 There are no natural numbers a, b such that √2 = a/b.

Proof. Suppose, towards a contradiction, that this is false, and so let a ∈ ℕ be the smallest number such that there exists some b ∈ ℕ satisfying √2 = a/b. Squaring this equation we get that 2 = a²/b², or a² = 2b² (∗). But this means that a² is even, and since the product of two odd numbers is odd, it means that a is even as well; in other words, a = 2a′ for some a′ ∈ ℕ. Yet plugging this into (∗) shows that 4a′² = 2b², which means b² = 2a′² is an even number as well. By the same considerations as above we get that b is even, and hence a/2 and b/2 are two natural numbers satisfying (a/2)/(b/2) = √2, contradicting the minimality of a.


Proofs of a universal statement: Often we want to prove a statement X of the form "Every object of type O has property P." Such proofs often start with a sentence such as "Let o be an object of type O" and end by showing that o has the property P. Here is a simple example:

Lemma 1.18 For every natural number n ∈ ℕ, either n or n + 1 is even.

Proof. Let n ∈ ℕ be some number. If n/2 is a whole number then we are done, since then n = 2(n/2) and hence it is even. Otherwise, n/2 + 1/2 is a whole number, and hence 2(n/2 + 1/2) = n + 1 is even.

Proofs of an implication: Another common case is that the statement X has the form "A implies B". Such proofs often start with a sentence such as "Assume that A is true" and end with a derivation of B from A. Here is a simple example:

Lemma 1.19 If b² ≥ 4ac then there is a solution to the quadratic equation ax² + bx + c = 0.

Proof. Suppose that b² ≥ 4ac. Then d = b² − 4ac is a non-negative number and hence it has a square root s. Thus x = (−b + s)/(2a) satisfies

ax² + bx + c = a(−b + s)²/(4a²) + b(−b + s)/(2a) + c
             = (b² − 2bs + s²)/(4a) + (−b² + bs)/(2a) + c . (1.17)

Rearranging the terms of (1.17) we get

s²/(4a) + c − b²/(4a) = (b² − 4ac)/(4a) + c − b²/(4a) = 0 (1.18)
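The computation in Lemma 1.19 is easy to verify numerically. A sketch (the function name is ours; note the lemma implicitly assumes a ≠ 0, since the proof divides by 2a):

```python
import math

# Following the proof of Lemma 1.19: when b² ≥ 4ac (and a ≠ 0),
# x = (−b + s)/(2a) with s = √(b² − 4ac) is a root of ax² + bx + c.
def quadratic_root(a, b, c):
    d = b * b - 4 * a * c
    assert d >= 0, "the lemma's hypothesis b² ≥ 4ac must hold"
    s = math.sqrt(d)
    return (-b + s) / (2 * a)

# Example: x² + 5x + 6 = (x + 2)(x + 3); the formula gives x = −2.
x = quadratic_root(1, 5, 6)
assert abs(1 * x * x + 5 * x + 6) < 1e-9
```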

Proofs of equivalence: If a statement has the form โ€œ๐ด if and only if๐ตโ€ (often shortened as โ€œ๐ด iff ๐ตโ€) then we need to prove both that ๐ดimplies ๐ต and that ๐ต implies ๐ด. We call the implication that ๐ด implies๐ต the โ€œonly ifโ€ direction, and the implication that ๐ต implies ๐ด the โ€œifโ€direction.

Proofs by combining intermediate claims: When a proof is more complex,it is often helpful to break it apart into several steps. That is, to provethe statement ๐‘‹, we might first prove statements ๐‘‹1,๐‘‹2, and ๐‘‹3 andthen prove that ๐‘‹1 โˆง ๐‘‹2 โˆง ๐‘‹3 implies ๐‘‹. (Recall that โˆง denotes thelogical AND operator.)

Proofs by case distinction: This is a special case of the above, where to prove a statement X we split into several cases C₁, …, C_k, and prove that (a) the cases are exhaustive, in the sense that one of the cases C_i must happen, and (b) going one by one, each of the cases C_i implies the result X that we are after.


Proofs by induction: We discuss induction and give an example in Section 1.6.1 below. We can think of such proofs as a variant of the above, where we have an unbounded number of intermediate claims X₀, X₁, …, X_k, and we prove that X₀ is true, as well as that X₀ implies X₁, and that X₀ ∧ X₁ implies X₂, and so on and so forth. The website for CMU course 15-251 contains a useful handout on potential pitfalls when making proofs by induction.

โ€Without loss of generality (w.l.o.g)โ€: This term can be initially quite con-fusing. It is essentially a way to simplify proofs by case distinctions.The idea is that if Case 1 is equal to Case 2 up to a change of variablesor a similar transformation, then the proof of Case 1 will also implythe proof of Case 2. It is always a statement that should be viewedwith suspicion. Whenever you see it in a proof, ask yourself if youunderstand why the assumption made is truly without loss of gen-erality, and when you use it, try to see if the use is indeed justified.When writing a proof, sometimes it might be easiest to simply repeatthe proof of the second case (adding a remark that the proof is verysimilar to the first one).

Remark 1.20 — Hierarchical Proofs (optional). Mathematical proofs are ultimately written in English prose. The well-known computer scientist Leslie Lamport argues that this is a problem, and that proofs should be written in a more formal and rigorous way. In his manuscript he proposes an approach for structured hierarchical proofs, which have the following form:

โ€ข A proof for a statement of the form โ€œIf ๐ด then ๐ตโ€is a sequence of numbered claims, starting withthe assumption that ๐ด is true, and ending with theclaim that ๐ต is true.

โ€ข Every claim is followed by a proof showing howit is derived from the previous assumptions orclaims.

โ€ข The proof for each claim is itself a sequence ofsubclaims.

The advantage of Lamportโ€™s format is that the rolethat every sentence in the proof plays is very clear.It is also much easier to transform such proofs intomachine-checkable forms. The disadvantage is thatsuch proofs can be tedious to read and write, withless differentiation between the important parts of thearguments versus the more routine ones.


1.6 EXTENDED EXAMPLE: TOPOLOGICAL SORTING

In this section we will prove the following: every directed acyclic graph (DAG, see Definition 1.9) can be arranged in layers so that for all directed edges u → v, the layer of v is larger than the layer of u. This result is known as topological sorting and is used in many applications, including task scheduling, build systems, software package management, spreadsheet cell calculations, and many others (see Fig. 1.8). In fact, we will also use it ourselves later on in this book.

Figure 1.8: An example of topological sorting. We consider a directed graph corresponding to a prerequisite graph of the courses in some Computer Science program. The edge u → v means that the course u is a prerequisite for the course v. A layering or "topological sorting" of this graph is the same as mapping the courses to semesters so that if we decide to take the course v in semester f(v), then we have already taken all the prerequisites for v (i.e., its in-neighbors) in prior semesters.

We start with the following definition. A layering of a directed graph is a way to assign to every vertex v a natural number (corresponding to its layer), such that v's in-neighbors are in lower-numbered layers than v, and v's out-neighbors are in higher-numbered layers. The formal definition is as follows:

Definition 1.21 — Layering of a DAG. Let G = (V, E) be a directed graph. A layering of G is a function f : V → ℕ such that for every edge u → v of G, f(u) < f(v).

In this section we prove that a directed graph is acyclic if and only if it has a valid layering.

Theorem 1.22 — Topological Sort. Let G be a directed graph. Then G is acyclic if and only if there exists a layering f of G.

To prove such a theorem, we need to first understand what it means. Since it is an "if and only if" statement, Theorem 1.22 corresponds to two statements:

Lemma 1.23 For every directed graph G, if G is acyclic then it has a layering.

Lemma 1.24 For every directed graph G, if G has a layering, then it is acyclic.


Figure 1.9: Some examples of DAGs of one, two and three vertices, and valid ways to assign layers to the vertices.

To prove Theorem 1.22 we need to prove both Lemma 1.23 and Lemma 1.24. Lemma 1.24 is actually not that hard to prove. Intuitively, if G contains a cycle, then it cannot be the case that all edges on the cycle increase in layer number, since if we travel along the cycle, at some point we must come back to the place we started from. The formal proof is as follows:

Proof. Let G = (V, E) be a directed graph and let f : V → ℕ be a layering of G as per Definition 1.21. Suppose, towards a contradiction, that G is not acyclic, and hence there exists some cycle u₀, u₁, …, u_k such that u₀ = u_k and for every i ∈ [k] the edge u_i → u_{i+1} is present in G. Since f is a layering, for every i ∈ [k], f(u_i) < f(u_{i+1}), which means that

f(u₀) < f(u₁) < ⋯ < f(u_k) (1.19)

but this is a contradiction since u₀ = u_k and hence f(u₀) = f(u_k).

Lemma 1.23 corresponds to the more difficult (and useful) direction. To prove it, we need to show how, given an arbitrary DAG G, we can come up with a layering of the vertices of G so that all edges "go up".

P
If you have not seen the proof of this theorem before (or don't remember it), this would be an excellent point to pause and try to prove it yourself. One way to do it would be to describe an algorithm that, given as input a directed acyclic graph G on n vertices and n − 2 or fewer edges, constructs an array F of length n such that for every edge u → v in the graph, F[u] < F[v].

1.6.1 Mathematical induction

There are several ways to prove Lemma 1.23. One approach is to start by proving it for small graphs, such as graphs with 1, 2 or 3 vertices (see Fig. 1.9), for which we can check all the cases, and then try to extend the proof to larger graphs. The technical term for this proof approach is proof by induction.

Induction is simply an application of the self-evident Modus Ponens rule that says that if

(a) P is true, and

(b) P implies Q,

then Q is true.


In the setting of proofs by induction we typically have a statement Q(k) that is parameterized by some integer k, and we prove that (a) Q(0) is true, and (b) for every k > 0, if Q(0), …, Q(k − 1) are all true then Q(k) is true. (Usually proving (b) is the hard part, though there are examples where the "base case" (a) is quite subtle.) By applying Modus Ponens, we can deduce from (a) and (b) that Q(1) is true. Once we have done so, since we now know that both Q(0) and Q(1) are true, we can use this and (b) to deduce (again using Modus Ponens) that Q(2) is true. We can repeat the same reasoning again and again to obtain that Q(k) is true for every k. The statement (a) is called the "base case", while (b) is called the "inductive step". The assumption in (b) that Q(i) holds for i < k is called the "inductive hypothesis". (The form of induction described here is sometimes called "strong induction", as opposed to "weak induction" where we replace (b) by the statement (b′) that if Q(k − 1) is true then Q(k) is true; weak induction can be thought of as the special case of strong induction where we don't use the assumption that Q(0), …, Q(k − 2) are true.)

Remark 1.25 — Induction and recursion. Proofs by induction are closely related to algorithms by recursion. In both cases we reduce solving a larger problem to solving a smaller instance of itself. In a recursive algorithm to solve some problem P on an input of length k, we ask ourselves "what if someone handed me a way to solve P on instances smaller than k?". In an inductive proof of a statement Q parameterized by a number k, we ask ourselves "what if I already knew that Q(k′) is true for k′ < k?". Both induction and recursion are crucial concepts for this course and Computer Science at large (and even other areas of inquiry, including not just mathematics but other sciences as well). Both can be confusing at first, but with time and practice they become clearer. For more on proofs by induction and recursion, you might find the following Stanford CS 103 handout, this MIT 6.00 lecture, or this excerpt of the Lehman-Leighton book useful.

1.6.2 Proving the result by induction

There are several ways to use induction to prove Lemma 1.23. We will use induction on the number n of vertices, and so we will define the statement Q(n) as follows:

๐‘„(๐‘›) is โ€œFor every DAG ๐บ = (๐‘‰ , ๐ธ) with ๐‘› vertices, there is a layering of ๐บ.โ€


5 QED stands for "quod erat demonstrandum", which is Latin for "what was to be demonstrated" or "the very thing it was required to have shown".

6 Using n = 0 as the base case is logically valid, but can be confusing. If you find the trivial n = 0 case to be confusing, you can always directly verify the statement for n = 1 and then use both n = 0 and n = 1 as the base cases.

The statement Q(0) (where the graph contains no vertices) is trivial. Thus it will suffice to prove the following: for every n > 0, if Q(n − 1) is true then Q(n) is true.

To do so, we need to somehow find a way, given a graph G of n vertices, to reduce the task of finding a layering for G to the task of finding a layering for some other graph G′ of n − 1 vertices. The idea is that we will find a source of G: a vertex v that has no in-neighbors. We can then assign v the layer 0, and layer the remaining vertices using the inductive hypothesis in layers 1, 2, ….

The above is the intuition behind the proof of Lemma 1.23, but when writing the formal proof below, we use the benefit of hindsight, and try to streamline what was a messy journey into a linear and easy-to-follow flow of logic that starts with the word "Proof:" and ends with "QED" or the symbol ∎.⁵ Discussions, examples and digressions can be very insightful, but we keep them outside the space delimited between these two words, where (as described by this excellent handout) "every sentence must be load bearing". Just like we do in programming, we can break the proof into little "subroutines" or "functions" (known as lemmas or claims in math language), which will be smaller statements that help us prove the main result. However, the proof should be structured in a way that ensures that it is always crystal-clear to the reader in what stage we are of the proof. The reader should be able to tell what is the role of every sentence in the proof and which part it belongs to. We now present the formal proof of Lemma 1.23.

Proof of Lemma 1.23. Let G = (V, E) be a DAG and n = |V| be the number of its vertices. We prove the lemma by induction on n. The base case is n = 0, where there are no vertices, and so the statement is trivially true.⁶ For the case of n > 0, we make the inductive hypothesis that every DAG G′ of at most n − 1 vertices has a layering.

We make the following claim:

Claim: G must contain a vertex v of in-degree zero.

Proof of Claim: Suppose otherwise that every vertex v ∈ V has an in-neighbor. Let v₀ be some vertex of G, let v₁ be an in-neighbor of v₀, v₂ be an in-neighbor of v₁, and continue in this way for n steps until we construct a list v₀, v₁, …, v_n such that for every i ∈ [n], v_{i+1} is an in-neighbor of v_i, or in other words the edge v_{i+1} → v_i is present in the graph. Since there are only n vertices in this graph, one of the n + 1 vertices in this sequence must repeat itself, and so there exist i < j such that v_i = v_j. But then the sequence v_j → v_{j−1} → ⋯ → v_i is a cycle in G, contradicting our assumption that it is acyclic. (QED Claim)

Given the claim, we can let v₀ be some vertex of in-degree zero in G, and let G′ be the graph obtained by removing v₀ from G. G′ has n − 1 vertices and hence per the inductive hypothesis has a layering f′ : (V ∖ {v₀}) → ℕ. We define f : V → ℕ as follows:

f(v) = f′(v) + 1 if v ≠ v₀, and f(v₀) = 0 . (1.20)

We claim that ๐‘“ is a valid layering, namely that for every edge ๐‘ข โ†’๐‘ฃ, ๐‘“(๐‘ข) < ๐‘“(๐‘ฃ). To prove this, we split into cases:

โ€ข Case 1: ๐‘ข โ‰  ๐‘ฃ0, ๐‘ฃ โ‰  ๐‘ฃ0. In this case the edge ๐‘ข โ†’ ๐‘ฃ exists in thegraph ๐บโ€ฒ and hence by the inductive hypothesis ๐‘“ โ€ฒ(๐‘ข) < ๐‘“ โ€ฒ(๐‘ฃ)which implies that ๐‘“ โ€ฒ(๐‘ข) + 1 < ๐‘“ โ€ฒ(๐‘ฃ) + 1.

โ€ข Case 2: ๐‘ข = ๐‘ฃ0, ๐‘ฃ โ‰  ๐‘ฃ0. In this case ๐‘“(๐‘ข) = 0 and ๐‘“(๐‘ฃ) = ๐‘“ โ€ฒ(๐‘ฃ) + 1 >0.โ€ข Case 3: ๐‘ข โ‰  ๐‘ฃ0, ๐‘ฃ = ๐‘ฃ0. This case canโ€™t happen since ๐‘ฃ0 does not

have in-neighbors.

โ€ข Case 4: ๐‘ข = ๐‘ฃ0, ๐‘ฃ = ๐‘ฃ0. This case again canโ€™t happen since it meansthat ๐‘ฃ0 is its own-neighbor โ€” it is involved in a self loop which is aform cycle that is disallowed in an acyclic graph.

Thus, ๐‘“ is a valid layering for ๐บ which completes the proof.

Reading a proof is no less of an important skill than producing one. In fact, just like understanding code, it is a highly non-trivial skill in itself. Therefore I strongly suggest that you re-read the above proof, asking yourself at every sentence whether the assumption it makes is justified, and whether this sentence truly demonstrates what it purports to achieve. Another good habit is to ask yourself when reading a proof, for every variable you encounter (such as u, i, G', f', etc. in the above proof), the following questions: (1) What type of variable is it? Is it a number? A graph? A vertex? A function? (2) What do we know about it? Is it an arbitrary member of the set? Have we shown some facts about it? (3) What are we trying to show about it?

1.6.3 Minimality and uniqueness

Theorem 1.22 guarantees that for every DAG G = (V, E) there exists some layering f : V → ℕ, but this layering is not necessarily unique. For example, if f : V → ℕ is a valid layering of the graph then so is the function f' defined as f'(v) = 2 · f(v). However, it turns out that


the minimal layering is unique. A minimal layering is one where every vertex is given the smallest layer number possible. We now formally define minimality and state the uniqueness theorem:

Theorem 1.26 — Minimal layering is unique. Let G = (V, E) be a DAG. We say that a layering f : V → ℕ is minimal if for every vertex v ∈ V, if v has no in-neighbors then f(v) = 0, and if v has in-neighbors then there exists an in-neighbor u of v such that f(u) = f(v) − 1.

For every layering f, g : V → ℕ of G, if both f and g are minimal then f = g.

The definition of minimality in Theorem 1.26 implies that for every vertex v ∈ V, we cannot move it to a lower layer without making the layering invalid. If v is a source (i.e., has in-degree zero) then a minimal layering f must put it in layer 0, and for every other v, if f(v) = i, then we cannot modify this to set f(v) ≤ i − 1 since there is an in-neighbor u of v satisfying f(u) = i − 1. What Theorem 1.26 says is that a minimal layering f is unique in the sense that every other minimal layering is equal to f.

Proof Idea:

The idea is to prove the theorem by induction on the layers. If f and g are minimal then they must agree on the source vertices, since both f and g should assign these vertices to layer 0. We can then show that if f and g agree up to layer i − 1, then the minimality property implies that they need to agree in layer i as well. In the actual proof we use a small trick to save on writing. Rather than proving the statement that f = g (or in other words that f(v) = g(v) for every v ∈ V), we prove the weaker statement that f(v) ≤ g(v) for every v ∈ V. (This is a weaker statement since the condition that f(v) is less than or equal to g(v) is implied by the condition that f(v) is equal to g(v).) However, since f and g are just labels we give to two minimal layerings, by simply changing the names "f" and "g" the same proof also shows that g(v) ≤ f(v) for every v ∈ V, and hence that f = g.

⋆

Proof of Theorem 1.26. Let G = (V, E) be a DAG and f, g : V → ℕ be two minimal valid layerings of G. We will prove that for every v ∈ V, f(v) ≤ g(v). Since we didn't assume anything about f, g except their minimality, the same proof will imply that for every v ∈ V, g(v) ≤ f(v), and hence that f(v) = g(v) for every v ∈ V, which is what we needed to show.

We will prove that f(v) ≤ g(v) for every v ∈ V by induction on i = f(v). The case i = 0 is immediate: since in this case f(v) = 0, g(v) must be at least f(v). For the case i > 0, by the minimality of f,


if ๐‘“(๐‘ฃ) = ๐‘– then there must exist some in-neighbor ๐‘ข of ๐‘ฃ such that๐‘“(๐‘ข) = ๐‘– โˆ’ 1. By the induction hypothesis we get that ๐‘”(๐‘ข) โ‰ฅ ๐‘– โˆ’ 1, andsince ๐‘” is a valid layering it must hold that ๐‘”(๐‘ฃ) > ๐‘”(๐‘ข) which meansthat ๐‘”(๐‘ฃ) โ‰ฅ ๐‘– = ๐‘“(๐‘ฃ).

PThe proof of Theorem 1.26 is fully rigorous, but iswritten in a somewhat terse manner. Make sure thatyou read through it and understand why this is indeedan airtight proof of the Theoremโ€™s statement.

1.7 THIS BOOK: NOTATION AND CONVENTIONS

Most of the notation we use in this book is standard and is used in most mathematical texts. The main points where we diverge are:

• We index the natural numbers ℕ starting with 0 (though many other texts, especially in computer science, do the same).

• We also index the set [n] starting with 0, and hence define it as {0, …, n−1}. In other texts it is often defined as {1, …, n}. Similarly, we index our strings starting with 0, and hence a string x ∈ {0,1}^n is written as x_0 x_1 ⋯ x_{n−1}.

• If n is a natural number then 1^n does not equal the number 1; rather, it is the length-n string 11⋯1 (that is, a string of n ones). Similarly, 0^n refers to the length-n string 00⋯0.

• Partial functions are functions that are not necessarily defined on all inputs. When we write f : A → B this means that f is a total function unless we say otherwise. When we want to emphasize that f can be a partial function, we will sometimes write f : A →_p B.

• As we will see later on in the course, we will mostly describe our computational problems in terms of computing a Boolean function f : {0,1}* → {0,1}. In contrast, many other textbooks refer to the same task as deciding a language L ⊆ {0,1}*. These two viewpoints are equivalent, since for every set L ⊆ {0,1}* there is a corresponding function F such that F(x) = 1 if and only if x ∈ L. Computing partial functions corresponds to the task known in the literature as solving a promise problem. Because the language notation is so prevalent in other textbooks, we will occasionally remind the reader of this correspondence.

โ€ข We use โŒˆ๐‘ฅโŒ‰ and โŒŠ๐‘ฅโŒ‹ for the โ€œceilingโ€ and โ€œfloorโ€ operators thatcorrespond to โ€œrounding upโ€ or โ€œrounding downโ€ a number to the


nearest integer. We use (x mod y) to denote the "remainder" of x when divided by y. That is, (x mod y) = x − y⌊x/y⌋. In contexts where an integer is expected we'll typically "silently round" the quantities to an integer. For example, if we say that x is a string of length √n then this means that x is of length ⌈√n⌉. (We round up for the sake of convention, but in most such cases it will not make a difference whether we round up or down.)

• Like most Computer Science texts, we default to the logarithm in base two. Thus, log n is the same as log_2 n.

• We will also use the notation f(n) = poly(n) as a shorthand for f(n) = n^{O(1)} (i.e., as shorthand for saying that there are some constants a, b such that f(n) ≤ a · n^b for every sufficiently large n). Similarly, we will use f(n) = polylog(n) as shorthand for f(n) = poly(log n) (i.e., as shorthand for saying that there are some constants a, b such that f(n) ≤ a · (log n)^b for every sufficiently large n).

• As is often the case in mathematical literature, we use the apostrophe character to enrich our set of identifiers. Typically if x denotes some object, then x', x'', etc. will denote other objects of the same type.

• To save on "cognitive load" we will often use round constants such as 10, 100, 1000 in the statements of both theorems and problem set questions. When you see such a "round" constant, you can typically assume that it has no special significance and was just chosen arbitrarily. For example, if you see a theorem of the form "Algorithm A takes at most 1000 · n² steps to compute function F on inputs of length n" then probably the number 1000 is an arbitrary sufficiently large constant, and one could prove the same theorem with a bound of the form c · n² for a constant c that is smaller than 1000. Similarly, if a problem asks you to prove that some quantity is at least n/100, it is quite possible that in truth the quantity is at least n/d for some constant d that is smaller than 100.
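A few of the conventions above can be checked concretely. In this Python sketch (the specific numbers and the example language are our own arbitrary choices, not from the text), we verify the remainder identity, the "silent rounding" convention, and the correspondence between a language L and its Boolean function F:

```python
import math

# The remainder identity: (x mod y) = x - y * floor(x/y).
x, y = 457, 10
assert x % y == x - y * math.floor(x / y) == 7

# "Silently rounding" up: a string "of length sqrt(10)" has length ceil(sqrt(10)).
assert math.ceil(math.sqrt(10)) == 4

# Deciding a language L vs. computing a Boolean function F.
def in_L(s):
    """Membership in an example language L: strings with an even number of 1s."""
    return s.count("1") % 2 == 0

def F(s):
    """F(s) = 1 if and only if s is in L; computing F is deciding L."""
    return 1 if in_L(s) else 0

assert F("101") == 1 and F("100") == 0
```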

1.7.1 Variable name conventions

Like programming, mathematics is full of variables. Whenever you see a variable, it is always important to keep track of its type (e.g., whether the variable is a number, a string, a function, a graph, etc.). To make this easier, we try to stick to certain conventions and consistently use certain identifiers for variables of the same type. Some of these conventions are listed in Table 1.2 below. These conventions are not immutable laws and we might occasionally deviate from them.


Also, such conventions do not replace the need to explicitly declare for each new variable the type of object that it denotes.

Table 1.2: Conventions for identifiers in this book

Identifier         Often denotes object of type
i, j, k, ℓ, m, n   Natural numbers (i.e., in ℕ = {0, 1, 2, …})
ε, δ               Small positive real numbers (very close to 0)
x, y, z, w         Typically strings in {0,1}*, though sometimes numbers or other objects. We often identify an object with its representation as a string.
G                  A graph. The set of G's vertices is typically denoted by V. Often V = [n]. The set of G's edges is typically denoted by E.
S                  Set
f, g, h            Functions. We often (though not always) use lowercase identifiers for finite functions, which map {0,1}^n to {0,1}^m (often m = 1).
F, G, H            Infinite (unbounded input) functions mapping {0,1}* to {0,1}* or {0,1}* to {0,1}^m for some m. Based on context, the identifiers G, H are sometimes used to denote functions and sometimes graphs.
A, B, C            Boolean circuits
M, N               Turing machines
P, Q               Programs
T                  A function mapping ℕ to ℕ that corresponds to a time bound.
c                  A positive number (often an unspecified constant; e.g., T(n) = O(n) corresponds to the existence of c s.t. T(n) ≤ c · n for every n > 0). We sometimes use a, b in a similar way.
Σ                  Finite set (often used as the alphabet for a set of strings)

1.7.2 Some idioms

Mathematical texts often employ certain conventions or "idioms". Some examples of such idioms that we use in this text include the following:

• "Let X be …", "let X denote …", or "let X = …": These are all different ways for us to say that we are defining the symbol X to stand for whatever expression is in the …. When X is a property of some objects we might define X by writing something along the lines of "We say that … has the property X if …". While we often


try to define terms before they are used, sometimes a mathematical sentence reads easier if we use a term before defining it, in which case we add "where X is …" to explain how X is defined in the preceding expression.

• Quantifiers: Mathematical texts involve many quantifiers such as "for all" and "exists". We sometimes spell these out in words, as in "for all i ∈ ℕ" or "there is x ∈ {0,1}*", and sometimes use the formal symbols ∀ and ∃. It is important to keep track of which variable is quantified in what way, and of the dependencies between the variables. For example, a sentence fragment such as "for every k > 0 there exists n" means that n can be chosen in a way that depends on k. The order of quantifiers is important. For example, the following is a true statement: "for every natural number k > 1 there exists a prime number n such that n divides k." In contrast, the following statement is false: "there exists a prime number n such that for every natural number k > 1, n divides k."

• Numbered equations, theorems, definitions: To keep track of all the terms we define and statements we prove, we often assign them a (typically numeric) label, and then refer back to them in other parts of the text.

• (i.e.,), (e.g.,): Mathematical texts tend to contain quite a few of these expressions. We use X (i.e., Y) in cases where Y is equivalent to X, and X (e.g., Y) in cases where Y is an example of X (e.g., one can use phrases such as "a natural number (i.e., a non-negative integer)" or "a natural number (e.g., 7)").

• "Thus", "Therefore", "We get that": This means that the following sentence is implied by the preceding one, as in "The n-vertex graph G is connected. Therefore it contains at least n − 1 edges." We sometimes use "indeed" to indicate that the following text justifies the claim that was made in the preceding sentence, as in "The n-vertex graph G has at least n − 1 edges. Indeed, this follows since G is connected."

• Constants: In Computer Science, we typically care about how our algorithms' resource consumption (such as running time) scales with certain quantities (such as the length of the input). We refer to quantities that do not depend on the length of the input as constants and so often use statements such as "there exists a constant c > 0 such that for every n ∈ ℕ, Algorithm A runs in at most c · n² steps on inputs of length n." The qualifier "constant" for c is not strictly needed but is added to emphasize that c here is a fixed number independent of n. In fact sometimes, to reduce cognitive load, we will simply replace c


by a sufficiently large round number such as 10, 100, or 1000, or use O-notation and write "Algorithm A runs in O(n²) time."

Chapter Recap

• The basic "mathematical data structures" we'll need are numbers, sets, tuples, strings, graphs and functions.

• We can use basic objects to define more complex notions. For example, graphs can be defined as a list of pairs.

• Given precise definitions of objects, we can state unambiguous and precise statements. We can then use mathematical proofs to determine whether these statements are true or false.

• A mathematical proof is not a formal ritual but rather a clear, precise and "bulletproof" argument certifying the truth of a certain statement.

• Big-O notation is an extremely useful formalism to suppress less significant details and allows us to focus on the high level behavior of quantities of interest.

• The only way to get comfortable with mathematical notions is to apply them in the contexts of solving problems. You should expect to need to go back time and again to the definitions and notation in this chapter as you work through problems in this course.

1.8 EXERCISES

Exercise 1.1 — Logical expressions.

a. Write a logical expression φ(x) involving the variables x_0, x_1, x_2 and the operators ∧ (AND), ∨ (OR), and ¬ (NOT), such that φ(x) is true if the majority of the inputs are True.

b. Write a logical expression φ(x) involving the variables x_0, x_1, x_2 and the operators ∧ (AND), ∨ (OR), and ¬ (NOT), such that φ(x) is true if the sum Σ_{i=0}^{2} x_i (identifying "true" with 1 and "false" with 0) is odd.

Exercise 1.2 — Quantifiers. Use the logical quantifiers ∀ (for all), ∃ (there exists), as well as ∧, ∨, ¬ and the arithmetic operations +, ×, =, >, < to write the following:

a. An expression φ(n, k) such that for all natural numbers n, k, φ(n, k) is true if and only if k divides n.

b. An expression φ(n) such that for every natural number n, φ(n) is true if and only if n is a power of three.


Exercise 1.3 Describe the following statement in English words:

∀n∈ℕ ∃p>n ∀a,b∈ℕ (a × b ≠ p) ∨ (a = 1).

Exercise 1.4 — Set construction notation. Describe in words the following sets:

a. S = { x ∈ {0,1}^100 : ∀ i ∈ {0,…,99}, x_i = x_{99−i} }

b. T = { x ∈ {0,1}* : ∀ i, j ∈ {2,…,|x|−1}, i · j ≠ |x| }

Exercise 1.5 — Existence of one to one mappings. For each one of the following pairs of sets (S, T), prove or disprove the following statement: there is a one-to-one function f mapping S to T.

a. Let n > 10. S = {0,1}^n and T = [n] × [n] × [n].

b. Let n > 10. S is the set of all functions mapping {0,1}^n to {0,1}, and T = {0,1}^{n³}.

c. Let n > 100. S = { k ∈ [n] | k is prime } and T = {0,1}^{⌈log n − 1⌉}.

Exercise 1.6 — Inclusion Exclusion.

a. Let A, B be finite sets. Prove that |A ∪ B| = |A| + |B| − |A ∩ B|.

b. Let A_0, …, A_{k−1} be finite sets. Prove that |A_0 ∪ ⋯ ∪ A_{k−1}| ≥ Σ_{i=0}^{k−1} |A_i| − Σ_{0≤i<j<k} |A_i ∩ A_j|.

c. Let A_0, …, A_{k−1} be finite subsets of {1, …, n}, such that |A_i| = m for every i ∈ [k]. Prove that if k > 100n, then there exist two distinct sets A_i, A_j s.t. |A_i ∩ A_j| ≥ m²/(10n).

Exercise 1.7 Prove that if S, T are finite and F : S → T is one to one then |S| ≤ |T|.

Exercise 1.8 Prove that if S, T are finite and F : S → T is onto then |S| ≥ |T|.

Exercise 1.9 Prove that for every finite S, T, there are (|T| + 1)^{|S|} partial functions from S to T.

Exercise 1.10 Suppose that {S_n}_{n∈ℕ} is a sequence such that S_0 ≤ 10 and for n > 1, S_n ≤ 5 · S_{⌊n/5⌋} + 2n. Prove by induction that S_n ≤ 100 · n · log n for every n.


7 One way to do this is to use Stirling's approximation for the factorial function.

Exercise 1.11 Prove that for every undirected graph G of 100 vertices, if every vertex has degree at most 4, then there exists a subset S of at least 20 vertices such that no two vertices in S are neighbors of one another.

Exercise 1.12 — O-notation. For every pair of functions F, G below, determine which of the following relations holds: F = O(G), F = Ω(G), F = o(G) or F = ω(G).

a. F(n) = n, G(n) = 100n.

b. F(n) = n, G(n) = √n.

c. F(n) = n log n, G(n) = 2^{(log n)²}.

d. F(n) = √n, G(n) = 2^{√(log n)}.

e. F(n) = (n choose ⌈0.2n⌉), G(n) = 2^{0.1n} (where (n choose k) is the number of k-sized subsets of a set of size n). See footnote 7 for a hint.

Exercise 1.13 Give an example of a pair of functions F, G : ℕ → ℕ such that neither F = O(G) nor G = O(F) holds.

Exercise 1.14 Prove that for every undirected graph G on n vertices, if G has at least n edges then G contains a cycle.

Exercise 1.15 Prove that for every undirected graph G of 1000 vertices, if every vertex has degree at most 4, then there exists a subset S of at least 200 vertices such that no two vertices in S are neighbors of one another.

1.9 BIBLIOGRAPHICAL NOTES

The heading "A Mathematician's Apology" refers to Hardy's classic book [Har41]. Even when Hardy is wrong, he is very much worth reading.

There are many online sources for the mathematical background needed for this book. In particular, the lecture notes for MIT 6.042 "Mathematics for Computer Science" [LLM18] are extremely comprehensive, and videos and assignments for this course are available online. Similarly, Berkeley CS 70: "Discrete Mathematics and Probability Theory" has extensive lecture notes online.


Other sources for discrete mathematics are Rosen [Ros19] and Jim Aspnes' online book [Asp18]. Lewis and Zax [LZ19], as well as the online book of Fleck [Fle18], give a more gentle overview of much of the same material. Solow [Sol14] is a good introduction to proof reading and writing. Kun [Kun18] gives an introduction to mathematics aimed at readers with a programming background. Stanford's CS 103 course has a wonderful collection of handouts on mathematical proof techniques and discrete mathematics.

The word graph in the sense of Definition 1.3 was coined by the mathematician Sylvester in 1878 in analogy with the chemical graphs used to visualize molecules. There is an unfortunate confusion between this term and the more common usage of the word "graph" as a way to plot data, and in particular a plot of some function f(x) as a function of x. One way to relate these two notions is to identify every function f : A → B with the directed graph G_f over the vertex set V = A ∪ B such that G_f contains the edge x → f(x) for every x ∈ A. In a graph G_f constructed in this way, every vertex in A has out-degree equal to one. If the function f is one to one then every vertex in B has in-degree at most one. If the function f is onto then every vertex in B has in-degree at least one. If f is a bijection then every vertex in B has in-degree exactly equal to one.
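The correspondence between a function and the graph G_f described above is easy to check in a few lines of Python (the helper name and the example function are our own illustrations):

```python
def function_graph(f, A):
    """The edges of the directed graph G_f: an edge x -> f(x) for each x in A."""
    return [(x, f(x)) for x in A]

A = [0, 1, 2, 3]
B = ["a", "b", "c"]
f = {0: "a", 1: "a", 2: "b", 3: "c"}
edges = function_graph(lambda x: f[x], A)

# Every vertex in A has out-degree exactly one:
assert len(edges) == len(A)

in_deg = {b: sum(1 for (_, w) in edges if w == b) for b in B}
# f is onto B here, so every vertex of B has in-degree at least one:
assert all(d >= 1 for d in in_deg.values())
# f is not one to one (0 and 1 both map to "a"), so "a" has in-degree 2:
assert in_deg["a"] == 2
```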

Carl Pomerance's quote is taken from the home page of Doron Zeilberger.


Figure 2.1: Our basic notion of computation is some process that maps an input to an output.

2 Computation and Representation

"The alphabet (sic) was a great invention, which enabled men (sic) to store and to learn with little effort what others had learned the hard way – that is, to learn from books rather than from direct, possibly painful, contact with the real world.", B.F. Skinner

"The name of the song is called 'HADDOCK'S EYES.'" [said the Knight]
"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.
"No, you don't understand," the Knight said, looking a little vexed. "That's what the name is CALLED. The name really is 'THE AGED AGED MAN.'"
"Then I ought to have said 'That's what the SONG is called'?" Alice corrected herself.
"No, you oughtn't: that's quite another thing! The SONG is called 'WAYS AND MEANS': but that's only what it's CALLED, you know!"
"Well, what IS the song, then?" said Alice, who was by this time completely bewildered.
"I was coming to that," the Knight said. "The song really IS 'A-SITTING ON A GATE': and the tune's my own invention."
Lewis Carroll, Through the Looking-Glass

To a first approximation, computation is a process that maps an input to an output.

When discussing computation, it is essential to separate the question of what is the task we need to perform (i.e., the specification) from the question of how we achieve this task (i.e., the implementation). For example, as we've seen, there is more than one way to achieve the computational task of computing the product of two integers.

In this chapter we focus on the what part, namely defining computational tasks. For starters, we need to define the inputs and outputs. Capturing all the potential inputs and outputs that we might ever want to compute on seems challenging, since computation today is applied to a wide variety of objects. We do not compute merely on numbers, but also on texts, images, videos, connection graphs of social


Learning Objectives:
• Distinguish between specification and implementation, or equivalently between mathematical functions and algorithms/programs.
• Representing an object as a string (often of zeroes and ones).
• Examples of representations for common objects such as numbers, vectors, lists, graphs.
• Prefix-free representations.
• Cantor's Theorem: The real numbers cannot be represented exactly as finite strings.


Figure 2.2: We represent numbers, texts, images, networks and many other objects using strings of zeroes and ones. Writing the zeroes and ones themselves in green font over a black background is optional.

networks, MRI scans, gene data, and even other programs. We will represent all these objects as strings of zeroes and ones, that is, objects such as 0011101 or 1011 or any other finite list of 1's and 0's. (This choice is for convenience: there is nothing "holy" about zeroes and ones, and we could have used any other finite collection of symbols.)

Today, we are so used to the notion of digital representation that we are not surprised by the existence of such an encoding. But it is actually a deep insight with significant implications. Many animals can convey a particular fear or desire, but what is unique about humans is language: we use a finite collection of basic symbols to describe a potentially unlimited range of experiences. Language allows transmission of information over both time and space and enables societies that span a great many people and accumulate a body of shared knowledge over time.

Over the last several decades, we have seen a revolution in what we can represent and convey in digital form. We can capture experiences with almost perfect fidelity, and disseminate them essentially instantaneously to an unlimited audience. Moreover, once information is in digital form, we can compute over it, and gain insights from data that were not accessible in prior times. At the heart of this revolution is the simple but profound observation that we can represent an unbounded variety of objects using a finite set of symbols (and in fact using only the two symbols 0 and 1).

In later chapters, we will typically take such representations for granted, and hence use expressions such as "program P takes x as input" when x might be a number, a vector, a graph, or any other object, when we really mean that P takes as input the representation of x as a binary string. However, in this chapter we will dwell a bit more on how we can construct such representations.

This chapter: A non-mathy overview

The main takeaways from this chapter are:

• We can represent all kinds of objects we want to use as inputs and outputs using binary strings. For example, we can use the binary basis to represent integers and rational numbers as binary strings (see Section 2.1.1 and Section 2.2).

• We can compose the representations of simple objects to represent more complex objects. In this way, we can represent lists of integers or rational numbers, and use that to represent objects such as matrices, images, and graphs. Prefix-free encoding is one way to achieve such a composition (see Section 2.5.2).


• A computational task specifies a map between an input and an output, i.e., a function. It is crucially important to distinguish between the "what" and the "how", or the specification and the implementation (see Section 2.6.1). A function simply defines which output corresponds to which input. It does not specify how to compute the output from the input, and as we've seen in the context of multiplication, there can be more than one way to compute the same function.

• While the set of all possible binary strings is infinite, it still cannot represent everything. In particular, there is no representation of the real numbers (with absolute accuracy) as binary strings. This result is also known as "Cantor's Theorem" (see Section 2.4) and is typically referred to as the result that the "reals are uncountable". It also implies that there are different levels of infinity, though we will not get into this topic in this book (see Remark 2.10).

The two "big ideas" we discuss are Big Idea 1: we can compose representations for simple objects to represent more complex objects, and Big Idea 2: it is crucial to distinguish between functions ("what") and programs ("how"). The latter will be a theme we will come back to time and again in this book.

2.1 DEFINING REPRESENTATIONS

Every time we store numbers, images, sounds, databases, or other objects on a computer, what we actually store in the computer's memory is the representation of these objects. Moreover, the idea of representation is not restricted to digital computers. When we write down text or make a drawing we are representing ideas or experiences as sequences of symbols (which might as well be strings of zeroes and ones). Even our brain does not store the actual sensory inputs we experience, but rather only a representation of them.

To use objects such as numbers, images, graphs, or others as inputs for computation, we need to define precisely how to represent these objects as binary strings. A representation scheme is a way to map an object x to a binary string E(x) ∈ {0,1}*. For example, a representation scheme for natural numbers is a function E : ℕ → {0,1}*. Of course, we cannot merely represent all numbers as the string "0011" (for example). A minimal requirement is that if two numbers x and x' are different then they would be represented by different strings. Another way to say this is that we require the encoding function E to be one to one.


Figure 2.3: Representing each one the digits0, 1, 2, โ€ฆ , 9 as a 12 ร— 8 bitmap image, which can bethought of as a string in 0, 196. Using this schemewe can represent a natural number ๐‘ฅ of ๐‘› decimaldigits as a string in 0, 196๐‘›. Image taken from blogpost of A. C. Andersen.

2.1.1 Representing natural numbersWe now show how we can represent natural numbers as binarystrings. Over the years people have represented numbers in a varietyof ways, including Roman numerals, tally marks, our own Hindu-Arabic decimal system, and many others. We can use any one ofthose as well as many others to represent a number as a string (seeFig. 2.3). However, for the sake of concreteness, we use the binarybasis as our default representation of natural numbers as strings.For example, we represent the number six as the string 110 since1 โ‹… 22 + 1 โ‹… 21 + 0 โ‹… 20 = 6, and similarly we represent the number thirty-five as the string ๐‘ฆ = 100011 which satisfies โˆ‘5๐‘–=0 ๐‘ฆ๐‘– โ‹… 2|๐‘ฆ|โˆ’๐‘–โˆ’1 = 35.Some more examples are given in the table below.

Table 2.1: Representing numbers in the binary basis. The lefthand column contains representations of natural numbers in the decimal basis, while the righthand column contains representations of the same numbers in the binary basis.

Number (decimal representation)    Number (binary representation)
0                                  0
1                                  1
2                                  10
5                                  101
16                                 10000
40                                 101000
53                                 110101
389                                110000101
3750                               111010100110

If ๐‘› is even, then the least significant digit of ๐‘›โ€™s binary representa-tion is 0, while if ๐‘› is odd then this digit equals 1. Just like the numberโŒŠ๐‘›/10โŒ‹ corresponds to โ€œchopping offโ€ the least significant decimaldigit (e.g., โŒŠ457/10โŒ‹ = โŒŠ45.7โŒ‹ = 45), the number โŒŠ๐‘›/2โŒ‹ correspondsto the โ€œchopping offโ€ the least significant binary digit. Hence the bi-nary representation can be formally defined as the following function๐‘๐‘ก๐‘† โˆถ โ„• โ†’ 0, 1โˆ— (๐‘๐‘ก๐‘† stands for โ€œnatural numbers to stringsโ€):

๐‘๐‘ก๐‘†(๐‘›) = โŽงโŽจโŽฉ0 ๐‘› = 01 ๐‘› = 1๐‘๐‘ก๐‘†(โŒŠ๐‘›/2โŒ‹)๐‘๐‘Ž๐‘Ÿ๐‘–๐‘ก๐‘ฆ(๐‘›) ๐‘› > 1 (2.1)

where parity : ℕ → {0, 1} is the function defined as parity(n) = 0 if n is even and parity(n) = 1 if n is odd, and as usual, for strings x, y ∈ {0, 1}∗, xy denotes the concatenation of x and y. The function NtS is defined recursively: for every n > 1 we define NtS(n) in terms of the representation of the smaller number ⌊n/2⌋. It is also possible to define NtS non-recursively, see Exercise 2.2.

Throughout most of this book, the particular choices of representation of numbers as binary strings would not matter much: we just need to know that such a representation exists. In fact, for many of our purposes we can even use the simpler representation of mapping a natural number n to the length-n all-zero string 0^n.

Remark 2.1 — Binary representation in Python (optional). We can implement the binary representation in Python as follows:

    def NtS(n):
        # natural numbers to strings
        if n > 1:
            return NtS(n // 2) + str(n % 2)
        else:
            return str(n % 2)

    print(NtS(236))
    # 11101100

    print(NtS(19))
    # 10011

We can also use Python to implement the inverse transformation, mapping a string back to the natural number it represents.

    def StN(x):
        # string to number
        k = len(x) - 1
        return sum(int(x[i]) * (2 ** (k - i)) for i in range(k + 1))

    print(StN(NtS(236)))
    # 236

Remark 2.2 — Programming examples. In this book, we sometimes use code examples as in Remark 2.1. The point is always to emphasize that certain computations can be achieved concretely, rather than illustrating the features of Python or any other programming language. Indeed, one of the messages of this book is that all programming languages are in a certain precise sense equivalent to one another, and hence we could have just as well used JavaScript, C, COBOL, Visual Basic or even BrainF*ck. This book is not about programming, and it is absolutely OK if you are not familiar with Python or do not follow code examples such as those in Remark 2.1.

1 While the Babylonians already invented a positional system much earlier, the decimal positional system we use today was invented by Indian mathematicians around the third century. Arab mathematicians took it up in the 8th century. It first received significant attention in Europe with the publication of the 1202 book "Liber Abaci" by Leonardo of Pisa, also known as Fibonacci, but it did not displace Roman numerals in common usage until the 15th century.

2.1.2 Meaning of representations (discussion)
It is natural for us to think of 236 as the "actual" number, and of 11101100 as "merely" its representation. However, for most Europeans in the middle ages CCXXXVI would be the "actual" number and 236 (if they had even heard of it) would be the weird Hindu-Arabic positional representation.1 When our AI robot overlords materialize, they will probably think of 11101100 as the "actual" number and of 236 as "merely" a representation that they need to use when they give commands to humans.

So what is the "actual" number? This is a question that philosophers of mathematics have pondered over throughout history. Plato argued that mathematical objects exist in some ideal sphere of existence (that to a certain extent is more "real" than the world we perceive via our senses, as this latter world is merely the shadow of this ideal sphere). In Plato's vision, the symbols 236 are merely notation for some ideal object, that, in homage to the late musician, we can refer to as "the number commonly represented by 236".

The Austrian philosopher Ludwig Wittgenstein, on the other hand, argued that mathematical objects do not exist at all, and the only things that exist are the actual marks on paper that make up 236, 11101100 or CCXXXVI. In Wittgenstein's view, mathematics is merely about formal manipulation of symbols that do not have any inherent meaning. You can think of the "actual" number as (somewhat recursively) "that thing which is common to 236, 11101100 and CCXXXVI and all other past and future representations that are meant to capture the same object".

While reading this book, you are free to choose your own philosophy of mathematics, as long as you maintain the distinction between the mathematical objects themselves and the various particular choices of representing them, whether as splotches of ink, pixels on a screen, zeroes and ones, or any other form.

2.2 REPRESENTATIONS BEYOND NATURAL NUMBERS

We have seen that natural numbers can be represented as binary strings. We now show that the same is true for other types of objects, including (potentially negative) integers, rational numbers, vectors, lists, graphs and many others. In many instances, choosing the "right" string representation for a piece of data is highly nontrivial, and finding the "best" one (e.g., most compact, best fidelity, most efficiently manipulable, robust to errors, most informative features, etc.) is the object of intense research. But for now, we focus on presenting some simple representations for various objects that we would like to use as inputs and outputs for computation.

2.2.1 Representing (potentially negative) integers
Since we can represent natural numbers as strings, we can represent the full set of integers (i.e., members of the set ℤ = {…, −3, −2, −1, 0, +1, +2, +3, …}) by adding one more bit that represents the sign. To represent a (potentially negative) number m, we prepend to the representation of the natural number |m| a bit σ that equals 0 if m ≥ 0 and equals 1 if m < 0. Formally, we define the function ZtS : ℤ → {0, 1}∗ as follows:

ZtS(m) = { 0 NtS(m)    if m ≥ 0
         { 1 NtS(−m)   if m < 0        (2.2)

where NtS is defined as in (2.1).

While the encoding function of a representation needs to be one to one, it does not have to be onto. For example, in the representation above there is no number that is represented by the empty string, but it is still a fine representation, since every integer is represented uniquely by some string.
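In the spirit of Remark 2.1, here is a minimal Python sketch of this signed magnitude encoding of (2.2) (an illustration, not part of the formal development; ZtS is our name for the integers-to-strings map):

```python
def NtS(n):
    # natural numbers to strings: the binary representation of (2.1)
    if n > 1:
        return NtS(n // 2) + str(n % 2)
    return str(n % 2)

def ZtS(m):
    # integers to strings: prepend a sign bit to the representation of |m|
    sign = "0" if m >= 0 else "1"
    return sign + NtS(abs(m))

print(ZtS(6))    # 0110
print(ZtS(-35))  # 1100011
```

Note that, as discussed above, the map is one to one but not onto: for instance, no integer is represented by the empty string.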

Remark 2.3 — Interpretation and context. Given a string y ∈ {0, 1}∗, how do we know if it's "supposed" to represent a (nonnegative) natural number or a (potentially negative) integer? For that matter, even if we know y is "supposed" to be an integer, how do we know what representation scheme it uses? The short answer is that we do not necessarily know this information, unless it is supplied from the context. (In programming languages, the compiler or interpreter determines the representation of the sequence of bits corresponding to a variable based on the variable's type.) We can treat the same string y as representing a natural number, an integer, a piece of text, an image, or a green gremlin. Whenever we say a sentence such as "let n be the number represented by the string y," we will assume that we are fixing some canonical representation scheme such as the ones above. The choice of the particular representation scheme will rarely matter, except that we want to make sure to stick with the same one for consistency.

2.2.2 Two's complement representation (optional)
Section 2.2.1's approach of representing an integer using a specific "sign bit" is known as the Signed Magnitude Representation and was used in some early computers. However, the two's complement representation is much more common in practice. The two's complement representation of an integer k in the set {−2^n, −2^n + 1, …, 2^n − 1} is the string ZtS_n(k) of length n + 1 defined as follows:

ZtS_n(k) = { NtS_{n+1}(k)            if 0 ≤ k ≤ 2^n − 1
           { NtS_{n+1}(2^{n+1} + k)  if −2^n ≤ k ≤ −1        (2.3)

where NtS_ℓ(m) denotes the standard binary representation of a number m ∈ {0, …, 2^ℓ − 1} as a string of length ℓ, padded with leading zeros as needed. For example, if n = 3 then ZtS_3(1) = NtS_4(1) = 0001, ZtS_3(2) = NtS_4(2) = 0010, ZtS_3(−1) = NtS_4(16 − 1) = 1111, and ZtS_3(−8) = NtS_4(16 − 8) = 1000. If k is a negative number larger than or equal to −2^n then 2^{n+1} + k is a number between 2^n and 2^{n+1} − 1. Hence the two's complement representation of such a number k is a string of length n + 1 with its first digit equal to 1.

Figure 2.4: In the two's complement representation we represent a potentially negative integer k ∈ {−2^n, …, 2^n − 1} as a length n + 1 string using the binary representation of the integer k mod 2^{n+1}. On the lefthand side: this representation for n = 3 (the red integers are the numbers being represented by the blue binary strings). If a microprocessor does not check for overflows, adding the two positive numbers 6 and 5 might result in the negative number −5 (since 6 + 5 = 11 and −5 mod 16 = 11). The righthand side is a C program that will on some 32 bit architectures print a negative number after adding two positive numbers. (Integer overflow in C is considered undefined behavior, which means the result of this program, including whether it runs or crashes, could differ depending on the architecture, compiler, and even compiler options and version.)

Another way to say this is that we represent a potentially negative number k ∈ {−2^n, …, 2^n − 1} as the non-negative number k mod 2^{n+1} (see also Fig. 2.4). This means that if two (potentially negative) numbers k and k′ are not too large (i.e., |k| + |k′| < 2^{n+1}), then we can compute the representation of k + k′ by adding modulo 2^{n+1} the representations of k and k′ as if they were non-negative integers. This property of the two's complement representation is its main attraction since, depending on their architectures, microprocessors can often perform arithmetic operations modulo 2^w very efficiently (for certain values of w such as 32 and 64). Many systems leave it to the programmer to check that values are not too large and will carry out this modular arithmetic regardless of the size of the numbers involved. For this reason, in some systems adding two large positive numbers can result in a negative number: e.g., adding 2^n − 100 and 2^n − 200 yields 2^{n+1} − 300, which is represented by the same length n + 1 string as −300 (since −300 mod 2^{n+1} = 2^{n+1} − 300, see also Fig. 2.4).
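The arithmetic above can be checked with a short Python sketch (an illustration; Python's arbitrary-precision integers stand in for a fixed-width machine word, and the helper name twos_complement is ours):

```python
def twos_complement(k, n):
    # represent k in {-2^n, ..., 2^n - 1} as the (n+1)-bit binary string
    # of the non-negative number k mod 2^(n+1)
    assert -2 ** n <= k <= 2 ** n - 1
    return format(k % (2 ** (n + 1)), "b").zfill(n + 1)

print(twos_complement(1, 3))   # 0001
print(twos_complement(-1, 3))  # 1111
print(twos_complement(-8, 3))  # 1000

# "overflow": with n = 3, adding 6 and 5 modulo 16 gives the string 1011,
# which is the two's complement representation of -5 (since -5 mod 16 = 11)
assert format((6 + 5) % 16, "b").zfill(4) == twos_complement(-5, 3)
```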

2.2.3 Rational numbers, and representing pairs of strings
We can represent a rational number of the form a/b by representing the two numbers a and b. However, merely concatenating the representations of a and b will not work. For example, the binary representation of 4 is 100 and the binary representation of 43 is 101011, but the concatenation 100101011 of these strings is also the concatenation of the representation 10010 of 18 and the representation 1011 of 11. Hence, if we used such simple concatenation then we would not be able to tell if the string 100101011 is supposed to represent 4/43 or 18/11.

We tackle this by giving a general representation for pairs of strings. If we were using a pen and paper, we would just use a separator symbol such as ‖ to represent, for example, the pair consisting of the numbers represented by 10 and 110001 as the length-9 string "10‖110001". In other words, there is a one to one map F from pairs of strings x, y ∈ {0, 1}∗ into a single string z over the alphabet Σ = {0, 1, ‖} (in other words, z ∈ Σ∗). Using such separators is similar to the way we use spaces and punctuation to separate words in English. By adding a little redundancy, we achieve the same effect in the digital domain. We can map the three-element set Σ to the three-element set {00, 11, 01} ⊂ {0, 1}^2 in a one-to-one fashion, and hence encode a length n string z ∈ Σ∗ as a length 2n string w ∈ {0, 1}∗.

Our final representation for rational numbers is obtained by composing the following steps:

1. Representing a non-negative rational number as a pair of natural numbers.
2. Representing a natural number by a string via the binary representation.
3. Combining 1 and 2 to obtain a representation of a rational number as a pair of strings.
4. Representing a pair of strings over {0, 1} as a single string over Σ = {0, 1, ‖}.
5. Representing a string over Σ as a longer string over {0, 1}.

Example 2.4 — Representing a rational number as a string. Consider the rational number r = −5/8. We represent −5 as 1101 and +8 as 01000, and so we can represent r as the pair of strings (1101, 01000) and represent this pair as the length 10 string 1101‖01000 over the alphabet {0, 1, ‖}. Now, applying the map 0 ↦ 00, 1 ↦ 11, ‖ ↦ 01, we can represent the latter string as the length 20 string s = 11110011010011000000 over the alphabet {0, 1}.

The same idea can be used to represent triples of strings, quadruples, and so on as a string. Indeed, this is one instance of a very general principle that we use time and again in both the theory and practice of computer science (for example, in Object Oriented programming):

Big Idea 1: If we can represent objects of type T as strings, then we can represent tuples of objects of type T as strings as well.

Repeating the same idea, once we can represent objects of type T, we can also represent lists of lists of such objects, and even lists of lists of lists and so on and so forth. We will come back to this point when we discuss prefix free encoding in Section 2.5.2.
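Steps 4 and 5 above can be sketched in Python (an illustration; we write the separator ‖ as the character "|", and use the map 0 ↦ 00, 1 ↦ 11, ‖ ↦ 01 from the text):

```python
def encode_pair(x, y):
    # steps 4 and 5: join x and y with a separator over the alphabet
    # {0, 1, |}, then expand each symbol to two bits via 0->00, 1->11, |->01
    table = {"0": "00", "1": "11", "|": "01"}
    return "".join(table[c] for c in x + "|" + y)

# the rational number -5/8 of Example 2.4: pair up 1101 (for -5) and 01000 (for +8)
print(encode_pair("1101", "01000"))  # 11110011010011000000
```

Decoding is just as easy: read the output two bits at a time and invert the table, which works because the encoding of each symbol has the same length.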


Figure 2.6: XKCD cartoon on floating-point arithmetic.

2.3 REPRESENTING REAL NUMBERS

The set of real numbers ℝ contains all numbers including positive, negative, and fractional, as well as irrational numbers such as π or e. Every real number can be approximated by a rational number, and thus we can represent every real number x by a rational number a/b that is very close to x. For example, we can represent π by 22/7 within an error of about 10^{−3}. If we want a smaller error (e.g., about 10^{−4}) then we can use 311/99, and so on and so forth.
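Python's standard library can find such best rational approximations for us (a small illustration; Fraction.limit_denominator returns the closest fraction whose denominator is at most the given bound):

```python
import math
from fractions import Fraction

# best rational approximations of pi with bounded denominators
print(Fraction(math.pi).limit_denominator(10))   # 22/7
print(Fraction(math.pi).limit_denominator(100))  # 311/99

print(abs(22 / 7 - math.pi) < 2e-3)    # True: error of about 10^-3
print(abs(311 / 99 - math.pi) < 2e-4)  # True: error of about 10^-4
```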

Figure 2.5: The floating point representation of a real number x ∈ ℝ is its approximation as a number of the form σb ⋅ 2^e where σ ∈ {±1}, e is a (potentially negative) integer, and b is a rational number between 1 and 2 expressed as a binary fraction 1.b_1 b_2 … b_k for some b_1, …, b_k ∈ {0, 1} (that is, b = 1 + b_1/2 + b_2/4 + … + b_k/2^k). Commonly-used floating point representations fix the numbers ℓ and k of bits to represent e and b respectively. In the example above, assuming we use two's complement representation for e, the number represented is −1 × 2^5 × (1 + 1/2 + 1/4 + 1/64 + 1/512) = −56.5625.

The above representation of real numbers via rational numbers that approximate them is a fine choice for a representation scheme. However, typically in computing applications, it is more common to use the floating point representation scheme (see Fig. 2.5) to represent real numbers. In the floating point representation scheme we represent x ∈ ℝ by the pair (b, e) of (positive or negative) integers of some prescribed sizes (determined by the desired accuracy) such that b × 2^e is closest to x. Floating point representation is the base-two version of scientific notation, where one represents a number y ∈ ℝ as its approximation of the form b × 10^e for b, e. It is called "floating point" because we can think of the number b as specifying a sequence of binary digits, and e as describing the location of the "binary point" within this sequence. The use of floating point representation is the reason why in many programming systems, printing the expression 0.1+0.2 will result in 0.30000000000000004 and not 0.3, see here, here and here for more.

The reader might be (rightly) worried about the fact that the floating point representation (or the rational number one) can only approximately represent real numbers. In many (though not all) computational applications, one can make the accuracy tight enough so that this does not affect the final result, though sometimes we do need to be careful. Indeed, floating-point bugs can sometimes be no joking matter. For example, floating point rounding errors have been implicated in the failure of a U.S. Patriot missile to intercept an Iraqi Scud missile, costing 28 lives, as well as a 100 million pound error in computing payouts to British pensioners.
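A quick Python illustration of the rounding behavior discussed above (a sketch; math.frexp decomposes a float into a pair (b, e) with x = b ⋅ 2^e and 1/2 ≤ |b| < 1):

```python
import math

print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# every float is stored internally as b * 2**e for a binary fraction b
# and an integer e; for 0.1, b is (a double very close to) 0.8 and e is -3
b, e = math.frexp(0.1)
print(b, e)  # 0.8 -3
```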


2 ๐‘…๐‘ก๐‘† stands for โ€œreal numbers to stringsโ€.

2.4 CANTOR'S THEOREM, COUNTABLE SETS, AND STRING REPRESENTATIONS OF THE REAL NUMBERS

"For any collection of fruits, we can make more fruit salads than there are fruits. If not, we could label each salad with a different fruit, and consider the salad of all fruits not in their salad. The label of this salad is in it if and only if it is not.", Martha Storey.

Given the issues with floating point approximations for real numbers, a natural question is whether it is possible to represent real numbers exactly as strings. Unfortunately, the following theorem shows that this cannot be done:

Theorem 2.5 — Cantor's Theorem. There does not exist a one-to-one function RtS : ℝ → {0, 1}∗.2

Countable sets. We say that a set S is countable if there is an onto map C : ℕ → S, or in other words, we can write S as the sequence C(0), C(1), C(2), …. Since the binary representation yields an onto map from {0, 1}∗ to ℕ, and the composition of two onto maps is onto, a set S is countable iff there is an onto map from {0, 1}∗ to S. Using the basic properties of functions (see Section 1.4.3), a set is countable if and only if there is a one-to-one function from S to {0, 1}∗. Hence, we can rephrase Theorem 2.5 as follows:

Theorem 2.6 — Cantor's Theorem (equivalent statement). The reals are uncountable. That is, there does not exist an onto function NtR : ℕ → ℝ.

Theorem 2.6 was proven by Georg Cantor in 1874. This result (and the theory around it) was quite shocking to mathematicians at the time. By showing that there is no one-to-one map from ℝ to {0, 1}∗ (or ℕ), Cantor showed that these two infinite sets have "different forms of infinity" and that the set of real numbers ℝ is in some sense "bigger" than the infinite set {0, 1}∗. The notion that there are "shades of infinity" was deeply disturbing to mathematicians and philosophers at the time. The philosopher Ludwig Wittgenstein (whom we mentioned before) called Cantor's results "utter nonsense" and "laughable." Others thought they were even worse than that. Leopold Kronecker called Cantor a "corrupter of youth," while Henri Poincaré said that Cantor's ideas "should be banished from mathematics once and for all." The tide eventually turned, and these days Cantor's work is universally accepted as the cornerstone of set theory and the foundations of mathematics. As David Hilbert said in 1925, "No one shall expel us from the paradise which Cantor has created for us." As we will see later in this book, Cantor's ideas also play a huge role in the theory of computation.

3 FtS stands for "functions to strings".
4 FtR stands for "functions to reals."

Now that we have discussed Theorem 2.5's importance, let us see the proof. It is achieved in two steps:

1. Define some infinite set 𝒳 for which it is easier for us to prove that 𝒳 is not countable (namely, it's easier for us to prove that there is no one-to-one function from 𝒳 to {0, 1}∗).

2. Prove that there is a one-to-one function G mapping 𝒳 to ℝ.

We can use a proof by contradiction to show that these two facts together imply Theorem 2.5. Specifically, if we assume (towards the sake of contradiction) that there exists some one-to-one F mapping ℝ to {0, 1}∗, then the function x ↦ F(G(x)) obtained by composing F with the function G from Step 2 above would be a one-to-one function from 𝒳 to {0, 1}∗, which contradicts what we proved in Step 1!

To turn this idea into a full proof of Theorem 2.5 we need to:

• Define the set 𝒳.

• Prove that there is no one-to-one function from 𝒳 to {0, 1}∗.
• Prove that there is a one-to-one function from 𝒳 to ℝ.

We now proceed to do precisely that. That is, we will define the set {0, 1}∞, which will play the role of 𝒳, and then state and prove two lemmas that show that this set satisfies our two desired properties.

Definition 2.7 We denote by {0, 1}∞ the set {f | f : ℕ → {0, 1}}.

That is, {0, 1}∞ is a set of functions, and a function f is in {0, 1}∞ iff its domain is ℕ and its codomain is {0, 1}. We can also think of {0, 1}∞ as the set of all infinite sequences of bits, since a function f : ℕ → {0, 1} can be identified with the sequence (f(0), f(1), f(2), …). The following two lemmas show that {0, 1}∞ can play the role of 𝒳 to establish Theorem 2.5.

Lemma 2.8 There does not exist a one-to-one map FtS : {0, 1}∞ → {0, 1}∗.3

Lemma 2.9 There does exist a one-to-one map FtR : {0, 1}∞ → ℝ.4

As we've seen above, Lemma 2.8 and Lemma 2.9 together imply Theorem 2.5. To repeat the argument more formally, suppose, for the sake of contradiction, that there did exist a one-to-one function RtS : ℝ → {0, 1}∗. By Lemma 2.9, there exists a one-to-one function FtR : {0, 1}∞ → ℝ. Thus, under this assumption, since the composition of two one-to-one functions is one-to-one (see Exercise 2.12), the function FtS : {0, 1}∞ → {0, 1}∗ defined as FtS(f) = RtS(FtR(f)) will be one to one, contradicting Lemma 2.8. See Fig. 2.7 for a graphical illustration of this argument.

Figure 2.7: We prove Theorem 2.5 by combining Lemma 2.8 and Lemma 2.9. Lemma 2.9, which uses standard calculus tools, shows the existence of a one-to-one map FtR from the set {0, 1}∞ to the real numbers. So, if a hypothetical one-to-one map RtS : ℝ → {0, 1}∗ existed, then we could compose them to get a one-to-one map FtS : {0, 1}∞ → {0, 1}∗. Yet this contradicts Lemma 2.8 (the heart of the proof), which rules out the existence of such a map.

Now all that is left is to prove these two lemmas. We start by proving Lemma 2.8, which is really the heart of Theorem 2.5.

Figure 2.8: We construct a function d such that d ≠ StF(x) for every x ∈ {0, 1}∗ by ensuring that d(n(x)) ≠ StF(x)(n(x)) for every x ∈ {0, 1}∗, where n(x) is the position of x in the lexicographic order. We can think of this as building a table where the columns correspond to numbers m ∈ ℕ and the rows correspond to x ∈ {0, 1}∗ (sorted according to n(x)). If the entry in the x-th row and the m-th column is g(m) where g = StF(x), then d is obtained by going over the "diagonal" elements in this table (the entries corresponding to the x-th row and n(x)-th column) and ensuring that d(n(x)) ≠ StF(x)(n(x)).

Warm-up: "Baby Cantor". The proof of Lemma 2.8 is rather subtle. One way to get intuition for it is to consider the following finite statement: "there is no onto function f : {0, …, 99} → {0, 1}^100". Of course we know it's true since the set {0, 1}^100 is bigger than the set [100], but let's see a direct proof. For every f : {0, …, 99} → {0, 1}^100, we can define the string d ∈ {0, 1}^100 as follows: d = (1 − f(0)_0, 1 − f(1)_1, …, 1 − f(99)_99). If f was onto, then there would exist some n ∈ [100] such that f(n) = d, but we claim that no such n exists. Indeed, if there was such n, then the n-th coordinate of d would equal f(n)_n, but by definition this coordinate equals 1 − f(n)_n. See also a "proof by code" of this statement.
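One possible rendering of such a "proof by code" in Python (a sketch; we model f as a list of 100 strings, each of length 100):

```python
import random

def diagonal(f):
    # build a string d whose n-th coordinate differs from the n-th
    # coordinate of f(n), for every n in {0, ..., 99}
    return "".join("1" if f[n][n] == "0" else "0" for n in range(100))

# no matter how f is chosen, the string d = diagonal(f) is not in its image,
# so f cannot be onto {0,1}^100
f = ["".join(random.choice("01") for _ in range(100)) for _ in range(100)]
d = diagonal(f)
print(all(d != f[n] for n in range(100)))  # True
```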

Proof of Lemma 2.8. We will prove that there does not exist an onto function StF : {0, 1}∗ → {0, 1}∞. This implies the lemma since for every two sets A and B, there exists an onto function from A to B if and only if there exists a one-to-one function from B to A (see Lemma 1.2).

The technique of this proof is known as the "diagonal argument" and is illustrated in Fig. 2.8. We assume, towards a contradiction, that there exists such a function StF : {0, 1}∗ → {0, 1}∞. We will show that StF is not onto by demonstrating a function d ∈ {0, 1}∞ such that d ≠ StF(x) for every x ∈ {0, 1}∗. Consider the lexicographic ordering of binary strings (i.e., "", 0, 1, 00, 01, …). For every n ∈ ℕ, we let x_n be the n-th string in this order. That is, x_0 = "", x_1 = 0, x_2 = 1 and so on and so forth. We define the function d ∈ {0, 1}∞ as follows:

d(n) = 1 − StF(x_n)(n)        (2.4)

for every n ∈ ℕ. That is, to compute d on input n ∈ ℕ, we first compute g = StF(x_n), where x_n ∈ {0, 1}∗ is the n-th string in the lexicographical ordering. Since g ∈ {0, 1}∞, it is a function mapping ℕ to {0, 1}. The value d(n) is defined to be the negation of g(n).

The definition of the function d is a bit subtle. One way to think about it is to imagine the function StF as being specified by an infinitely long table, in which every row corresponds to a string x ∈ {0, 1}∗ (with strings sorted in lexicographic order), and contains the sequence StF(x)(0), StF(x)(1), StF(x)(2), …. The diagonal elements in this table are the values

๐‘†๐‘ก๐น("")(0), ๐‘†๐‘ก๐น(0)(1), ๐‘†๐‘ก๐น(1)(2), ๐‘†๐‘ก๐น(00)(3), ๐‘†๐‘ก๐น(01)(4), โ€ฆ (2.5)

which correspond to the elements StF(x_n)(n) in the n-th row and n-th column of this table for n = 0, 1, 2, …. The function d we defined above maps every n ∈ ℕ to the negation of the n-th diagonal value.

To complete the proof that StF is not onto we need to show that d ≠ StF(x) for every x ∈ {0, 1}∗. Indeed, let x ∈ {0, 1}∗ be some string and let g = StF(x). If n is the position of x in the lexicographical order then by construction d(n) = 1 − g(n) ≠ g(n), which means that g ≠ d, which is what we wanted to prove.


Remark 2.10 — Generalizing beyond strings and reals. Lemma 2.8 doesn't really have much to do with the natural numbers or the strings. An examination of the proof shows that it really shows that for every set S, there is no one-to-one map F : {0, 1}^S → S, where {0, 1}^S denotes the set {f | f : S → {0, 1}} of all Boolean functions with domain S. Since we can identify a subset V ⊆ S with its characteristic function f = 1_V (i.e., 1_V(x) = 1 iff x ∈ V), we can think of {0, 1}^S also as the set of all subsets of S. This set is sometimes called the power set of S and denoted by 𝒫(S) or 2^S.

The proof of Lemma 2.8 can be generalized to show that there is no one-to-one map between a set and its power set. In particular, it means that the set {0, 1}^ℝ is "even bigger" than ℝ. Cantor used these ideas to construct an infinite hierarchy of shades of infinity. The number of such shades turns out to be much larger than |ℕ| or even |ℝ|. He denoted the cardinality of ℕ by ℵ_0 and denoted the next largest infinite number by ℵ_1. (ℵ is the first letter in the Hebrew alphabet.) Cantor also made the continuum hypothesis that |ℝ| = ℵ_1. We will come back to the fascinating story of this hypothesis later on in this book. This lecture of Aaronson mentions some of these issues (see also this Berkeley CS 70 lecture).

To complete the proof of Theorem 2.5, we need to show Lemma 2.9. This requires some calculus background but is otherwise straightforward. If you have not had much experience with limits of a real series before, then the formal proof below might be a little hard to follow. This part is not the core of Cantor's argument, nor are such limits important to the remainder of this book, so you can feel free to take Lemma 2.9 on faith and skip the proof.

Proof Idea:

We define FtR(f) to be the number between 0 and 2 whose decimal expansion is f(0).f(1)f(2)…, or in other words FtR(f) = ∑_{i=0}^{∞} f(i) ⋅ 10^{−i}. If f and g are two distinct functions in {0, 1}∞, then there must be some input k on which they disagree. If we take the minimum such k, then the numbers f(0).f(1)f(2)…f(k−1)f(k)… and g(0).g(1)g(2)…g(k)… agree with each other all the way up to the k−1-th digit after the decimal point, and disagree on the k-th digit. But then these numbers must be distinct. Concretely, if f(k) = 1 and g(k) = 0 then the first number is larger than the second, and otherwise (f(k) = 0 and g(k) = 1) the first number is smaller than the second. In the proof we have to be a little careful since these are numbers with infinite expansions. For example, the number one half has two decimal expansions 0.5 and 0.49999⋯. However, this issue does not come up here, since we restrict attention only to numbers with decimal expansions that do not involve the digit 9.

⋆

Proof of Lemma 2.9. For every f ∈ {0, 1}∞, we define FtR(f) to be the number whose decimal expansion is f(0).f(1)f(2)f(3)…. Formally,

FtR(f) = ∑_{i=0}^{∞} f(i) ⋅ 10^{−i}        (2.6)

It is a known result in calculus (whose proof we will not repeat here) that the series on the righthand side of (2.6) converges to a definite limit in ℝ.

We now prove that FtR is one to one. Let f, g be two distinct functions in {0,1}^∞. Since f and g are distinct, there must be some input on which they differ, and we define k to be the smallest such input and assume without loss of generality that f(k) = 0 and g(k) = 1. (Otherwise, if f(k) = 1 and g(k) = 0, then we can simply switch the roles of f and g.) The numbers FtR(f) and FtR(g) agree with each other up to the (k−1)-th digit after the decimal point. Since this digit equals 0 for FtR(f) and equals 1 for FtR(g), we claim that FtR(g) is bigger than FtR(f) by at least 0.5 · 10^{-k}. To see this, note that the difference FtR(g) − FtR(f) will be minimized if g(ℓ) = 0 for every ℓ > k and f(ℓ) = 1 for every ℓ > k, in which case (since f and g agree up to the (k−1)-th digit)

๐น๐‘ก๐‘…(๐‘”) โˆ’ ๐น๐‘ก๐‘…(๐‘“) = 10โˆ’๐‘˜ โˆ’ 10โˆ’๐‘˜โˆ’1 โˆ’ 10โˆ’๐‘˜โˆ’2 โˆ’ 10โˆ’๐‘˜โˆ’3 โˆ’ โ‹ฏ (2.7)

Since the infinite series ∑_{j=0}^∞ 10^{-j} converges to 10/9, it follows that for every such f and g, FtR(g) − FtR(f) ≥ 10^{-k} − 10^{-k-1} · (10/9) > 0. In particular we see that for every distinct f, g ∈ {0,1}^∞, FtR(f) ≠ FtR(g), implying that the function FtR is one to one.

Remark 2.11 — Using decimal expansion (optional). In the proof above we used the fact that 1 + 1/10 + 1/100 + ⋯ converges to 10/9, which plugging into (2.7) yields that the difference between FtR(g) and FtR(f) is at least 10^{-k} − 10^{-k-1} · (10/9) > 0. While the choice of the decimal representation for FtR was arbitrary, we could not have used the binary representation in its place. Had we used the binary expansion instead of decimal, the corresponding series 1 + 1/2 + 1/4 + ⋯ converges to 2, and since 2^{-k} = 2^{-k-1} · 2, we could not have deduced that FtR is one to one. Indeed there do exist pairs of distinct sequences f, g ∈ {0,1}^∞ such that ∑_{i=0}^∞ f(i)2^{-i} = ∑_{i=0}^∞ g(i)2^{-i}. (For example, the sequence 1, 0, 0, 0, … and the sequence 0, 1, 1, 1, … have this property.)
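The contrast between the two bases can be checked numerically. The following sketch is our own illustration (not from the text): it uses Python's exact fractions module, and the helper name FtR_partial is ours. It compares partial sums of the two sequences 1, 0, 0, 0, … and 0, 1, 1, 1, … in base 10 and base 2:

```python
from fractions import Fraction

def FtR_partial(f, base, n):
    """Exact partial sum of sum_{i < n} f(i) * base**(-i)."""
    return sum(Fraction(f(i), base**i) for i in range(n))

f = lambda i: 1 if i == 0 else 0   # the sequence 1, 0, 0, 0, ...
g = lambda i: 0 if i == 0 else 1   # the sequence 0, 1, 1, 1, ...

# In base 10 the gap tends to 1 - 1/9 = 8/9, so FtR(f) != FtR(g).
print(FtR_partial(f, 10, 20) - FtR_partial(g, 10, 20))
# In base 2 the gap is exactly 2**-(n-1), which vanishes as n grows:
# both sequences represent the same real number.
print(FtR_partial(f, 2, 20) - FtR_partial(g, 2, 20))
```

The exact arithmetic avoids floating-point rounding, so the base-2 gap is seen to be precisely 2^{-19} at n = 20, shrinking to zero, while the base-10 gap stays bounded away from zero.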

2.4.1 Corollary: Boolean functions are uncountable
Cantor's Theorem yields the following corollary that we will use several times in this book: the set of all Boolean functions (mapping {0,1}^* to {0,1}) is not countable:

Theorem 2.12 — Boolean functions are uncountable. Let ALL be the set of all functions F : {0,1}^* → {0,1}. Then ALL is uncountable. Equivalently, there does not exist an onto map StALL : {0,1}^* → ALL.

Proof Idea:

This is a direct consequence of Lemma 2.8, since we can use the binary representation to show a one-to-one map from {0,1}^∞ to ALL. Hence the uncountability of {0,1}^∞ implies the uncountability of ALL.

⋆

Proof of Theorem 2.12. Since {0,1}^∞ is uncountable, the result will follow by showing a one-to-one map from {0,1}^∞ to ALL. The reason is that the existence of such a map implies that if ALL was countable, and hence there was a one-to-one map from ALL to ℕ, then there would have been a one-to-one map from {0,1}^∞ to ℕ, contradicting Lemma 2.8.

We now show this one-to-one map. We simply map a function f ∈ {0,1}^∞ to the function F : {0,1}^* → {0,1} as follows. We let F(0) = f(0), F(1) = f(1), F(10) = f(2), F(11) = f(3), and so on and so forth. That is, for every x ∈ {0,1}^* that represents a natural number n in the binary basis, we define F(x) = f(n). If x does not represent such a number (e.g., it has a leading zero), then we set F(x) = 0.

This map is one-to-one since if f ≠ g are two distinct elements in {0,1}^∞, then there must be some input n ∈ ℕ on which f(n) ≠ g(n). But then if x ∈ {0,1}^* is the string representing n, we see that F(x) ≠ G(x), where F is the function in ALL that f is mapped to, and G is the function that g is mapped to.
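The map in the proof can be written out directly in Python. The following is our own sketch (the helper name seq_to_boolfunc is not from the text); it takes a function f on natural numbers and returns the function F on strings defined in the proof:

```python
def seq_to_boolfunc(f):
    """Given f: N -> {0,1}, returns F: {0,1}* -> {0,1} with
    F(x) = f(n) when x is the canonical binary representation
    of n (no leading zeros), and F(x) = 0 otherwise."""
    def F(x):
        if x == "" or (len(x) > 1 and x[0] == "0"):
            return 0          # not a canonical representation
        return f(int(x, 2))
    return F

parity = seq_to_boolfunc(lambda n: n % 2)
print(parity("11"))   # f(3) = 1
print(parity("011"))  # leading zero: not canonical, so 0
```

Note how distinct sequences f yield distinct functions F, exactly as the proof's one-to-one argument requires.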


2.4.2 Equivalent conditions for countability
The results above establish many equivalent ways to phrase the fact that a set is countable. Specifically, the following statements are all equivalent:

1. The set S is countable.

2. There exists an onto map from ℕ to S.

3. There exists an onto map from {0,1}^* to S.

4. There exists a one-to-one map from S to ℕ.

5. There exists a one-to-one map from S to {0,1}^*.

6. There exists an onto map from some countable set T to S.

7. There exists a one-to-one map from S to some countable set T.

Make sure you know how to prove the equivalence of all the results above.

2.5 REPRESENTING OBJECTS BEYOND NUMBERS

Numbers are of course by no means the only objects that we can represent as binary strings. A representation scheme for representing objects from some set 𝒪 consists of an encoding function that maps an object in 𝒪 to a string, and a decoding function that decodes a string back to an object in 𝒪. Formally, we make the following definition:

Definition 2.13 — String representation. Let 𝒪 be any set. A representation scheme for 𝒪 is a pair of functions E, D where E : 𝒪 → {0,1}^* is a total one-to-one function, D : {0,1}^* →_p 𝒪 is a (possibly partial) function, and such that D and E satisfy that D(E(o)) = o for every o ∈ 𝒪. E is known as the encoding function and D is known as the decoding function.

Note that the condition D(E(o)) = o for every o ∈ 𝒪 implies that D is onto (can you see why?). It turns out that to construct a representation scheme we only need to find an encoding function. That is, every one-to-one encoding function has a corresponding decoding function, as shown in the following lemma:

Lemma 2.14 Suppose that E : 𝒪 → {0,1}^* is one-to-one. Then there exists a function D : {0,1}^* → 𝒪 such that D(E(o)) = o for every o ∈ 𝒪.


Proof. Let o_0 be some arbitrary element of 𝒪. For every x ∈ {0,1}^*, there exists either zero or a single o ∈ 𝒪 such that E(o) = x (otherwise E would not be one-to-one). We will define D(x) to equal o_0 in the first case and this single object o in the second case. By definition, D(E(o)) = o for every o ∈ 𝒪.
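In Python, the proof of Lemma 2.14 amounts to inverting a lookup table. Here is a minimal sketch of ours (the name make_decoder is hypothetical, and it assumes 𝒪 is given as a finite iterable, while the lemma itself also holds for infinite sets):

```python
def make_decoder(E, objects, default):
    """Given a one-to-one encoder E over a finite collection of
    objects, returns D with D(E(o)) == o for every o; strings
    outside the image of E decode to the default object o_0."""
    table = {E(o): o for o in objects}
    return lambda x: table.get(x, default)

# Toy example: encode small integers as their decimal strings.
D = make_decoder(str, [1, 2, 3], 1)
print(D("2"))    # 2
print(D("xyz"))  # not an encoding: falls back to the default, 1
```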

Remark 2.15 — Total decoding functions. While the decoding function of a representation scheme can in general be a partial function, the proof of Lemma 2.14 implies that every representation scheme has a total decoding function. This observation can sometimes be useful.

2.5.1 Finite representations
If 𝒪 is finite, then we can represent every object in 𝒪 as a string of length at most some number n. What is the value of n? Let us denote by {0,1}^{≤n} the set {x ∈ {0,1}^* : |x| ≤ n} of strings of length at most n. The size of {0,1}^{≤n} is equal to

|{0,1}^0| + |{0,1}^1| + |{0,1}^2| + ⋯ + |{0,1}^n| = ∑_{i=0}^n 2^i = 2^{n+1} − 1    (2.8)

using the standard formula for summing a geometric progression.

To obtain a representation of objects in 𝒪 as strings of length at most n we need to come up with a one-to-one function from 𝒪 to {0,1}^{≤n}. We can do so if and only if |𝒪| ≤ 2^{n+1} − 1, as is implied by the following lemma:

Lemma 2.16 For every two finite sets S, T, there exists a one-to-one E : S → T if and only if |S| ≤ |T|.

Proof. Let k = |S| and m = |T| and so write the elements of S and T as S = {s_0, s_1, …, s_{k−1}} and T = {t_0, t_1, …, t_{m−1}}. We need to show that there is a one-to-one function E : S → T iff k ≤ m. For the "if" direction, if k ≤ m we can simply define E(s_i) = t_i for every i ∈ [k]. Clearly for i ≠ j, t_i = E(s_i) ≠ E(s_j) = t_j, and hence this function is one-to-one. In the other direction, suppose that k > m and E : S → T is some function. Then E cannot be one-to-one. Indeed, for i = 0, 1, …, m − 1 let us "mark" the element t_j = E(s_i) in T. If t_j was marked before, then we have found two objects in S mapping to the same element t_j. Otherwise, since T has m elements, when we get to i = m − 1 we mark all the objects in T. Hence, in this case, E(s_m) must map to an element that was already marked before. (This observation is sometimes known as the "Pigeonhole Principle": the principle that if you have a pigeon coop with m holes, and k > m pigeons, then there must be two pigeons in the same hole.)
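The marking argument in the proof is itself an algorithm. Here is a sketch of it in Python (our own code; find_collision is a hypothetical helper name): given E and a list S larger than E's image, it returns two distinct elements of S that map to the same element.

```python
def find_collision(E, S):
    """Scans S, marking E(s) for each s; by the Pigeonhole
    Principle a repeat must occur when |S| exceeds the number
    of distinct values E takes."""
    marked = {}
    for s in S:
        t = E(s)
        if t in marked:
            return marked[t], s   # two "pigeons" in hole t
        marked[t] = s
    return None                   # E was one-to-one on S

# Four pigeons into three holes:
print(find_collision(lambda s: s % 3, [0, 1, 2, 3]))  # (0, 3)
```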

2.5.2 Prefix-free encoding
When showing a representation scheme for rational numbers, we used the "hack" of encoding the alphabet {0, 1, ‖} to represent tuples of strings as a single string. This is a special case of the general paradigm of prefix-free encoding. The idea is the following: if our representation has the property that no string x representing an object o is a prefix (i.e., an initial substring) of a string y representing a different object o′, then we can represent a list of objects by merely concatenating the representations of all the list members. For example, because in English every sentence ends with a punctuation mark such as a period, exclamation, or question mark, no sentence can be a prefix of another and so we can represent a list of sentences by merely concatenating the sentences one after the other. (English has some complications such as periods used for abbreviations (e.g., "e.g.") or sentence quotes containing punctuation, but the high level point of a prefix-free representation for sentences still holds.)

It turns out that we can transform every representation to a prefix-free form. This justifies Big Idea 1, and allows us to transform a representation scheme for objects of a type T to a representation scheme of lists of objects of the type T. By repeating the same technique, we can also represent lists of lists of objects of type T, and so on and so forth. But first let us formally define prefix-freeness:

Definition 2.17 — Prefix-free encoding. For two strings y, y′, we say that y is a prefix of y′ if |y| ≤ |y′| and for every i < |y|, y′_i = y_i.

Let ๐’ช be a non-empty set and ๐ธ โˆถ ๐’ช โ†’ 0, 1โˆ— be a function. Wesay that ๐ธ is prefix-free if ๐ธ(๐‘œ) is non-empty for every ๐‘œ โˆˆ ๐’ช andthere does not exist a distinct pair of objects ๐‘œ, ๐‘œโ€ฒ โˆˆ ๐’ช such that๐ธ(๐‘œ) is a prefix of ๐ธ(๐‘œโ€ฒ).Recall that for every set ๐’ช, the set ๐’ชโˆ— consists of all finite length

tuples (i.e., lists) of elements in ๐’ช. The following theorem shows thatif ๐ธ is a prefix-free encoding of ๐’ช then by concatenating encodings wecan obtain a valid (i.e., one-to-one) representation of ๐’ชโˆ—:

Theorem 2.18 — Prefix-free implies tuple encoding. Suppose that E : 𝒪 → {0,1}^* is prefix-free. Then the following map Ē : 𝒪^* → {0,1}^* is one-to-one: for every o_0, …, o_{k−1} ∈ 𝒪^*, we define

Ē(o_0, …, o_{k−1}) = E(o_0)E(o_1) ⋯ E(o_{k−1}) .    (2.9)


Figure 2.9: If we have a prefix-free representation of each object then we can concatenate the representations of k objects to obtain a representation for the tuple (o_0, …, o_{k−1}).

Theorem 2.18 is an example of a theorem that is a little hard to parse, but in fact is fairly straightforward to prove once you understand what it means. Therefore, I highly recommend that you pause here to make sure you understand the statement of this theorem. You should also try to prove it on your own before proceeding further.

Proof Idea:

The idea behind the proof is simple. Suppose that for example we want to decode a triple (o_0, o_1, o_2) from its representation x = Ē(o_0, o_1, o_2) = E(o_0)E(o_1)E(o_2). We will do so by first finding the first prefix x_0 of x that is a representation of some object. Then we will decode this object, remove x_0 from x to obtain a new string x′, and continue onwards to find the first prefix x_1 of x′ and so on and so forth (see Exercise 2.9). The prefix-freeness property of E will ensure that x_0 will in fact be E(o_0), x_1 will be E(o_1), etc.

⋆

Proof of Theorem 2.18. We now show the formal proof. Suppose, towards the sake of contradiction, that there exist two distinct tuples (o_0, …, o_{k−1}) and (o′_0, …, o′_{k′−1}) such that

Ē(o_0, …, o_{k−1}) = Ē(o′_0, …, o′_{k′−1}) .    (2.10)

We will denote the string Ē(o_0, …, o_{k−1}) by x.

Let i be the first index such that o_i ≠ o′_i. (If o_i = o′_i for all i then, since we assume the two tuples are distinct, one of them must be longer than the other. In this case we assume without loss of generality that k′ > k and let i = k.) In the case that i < k, we see that the string x can be written in two different ways:

๐‘ฅ = ๐ธ(๐‘œ0, โ€ฆ , ๐‘œ๐‘˜โˆ’1) = ๐‘ฅ0 โ‹ฏ ๐‘ฅ๐‘–โˆ’1๐ธ(๐‘œ๐‘–)๐ธ(๐‘œ๐‘–+1) โ‹ฏ ๐ธ(๐‘œ๐‘˜โˆ’1) (2.11)

and

๐‘ฅ = ๐ธ(๐‘œโ€ฒ0, โ€ฆ , ๐‘œโ€ฒ๐‘˜โ€ฒโˆ’1) = ๐‘ฅ0 โ‹ฏ ๐‘ฅ๐‘–โˆ’1๐ธ(๐‘œโ€ฒ๐‘–)๐ธ(๐‘œโ€ฒ๐‘–+1) โ‹ฏ ๐ธ(๐‘œโ€ฒ๐‘˜โˆ’1) (2.12)

where ๐‘ฅ๐‘— = ๐ธ(๐‘œ๐‘—) = ๐ธ(๐‘œโ€ฒ๐‘—) for all ๐‘— < ๐‘–. Let ๐‘ฆ be the string obtainedafter removing the prefix ๐‘ฅ0 โ‹ฏ ๐‘ฅ๐‘–โˆ’๐‘– from ๐‘ฅ. We see that ๐‘ฆ can be writ-ten as both ๐‘ฆ = ๐ธ(๐‘œ๐‘–)๐‘  for some string ๐‘  โˆˆ 0, 1โˆ— and as ๐‘ฆ = ๐ธ(๐‘œโ€ฒ๐‘–)๐‘ โ€ฒfor some ๐‘ โ€ฒ โˆˆ 0, 1โˆ—. But this means that one of ๐ธ(๐‘œ๐‘–) and ๐ธ(๐‘œโ€ฒ๐‘–) mustbe a prefix of the other, contradicting the prefix-freeness of ๐ธ.


In the case that ๐‘– = ๐‘˜ and ๐‘˜โ€ฒ > ๐‘˜, we get a contradiction in thefollowing way. In this case

๐‘ฅ = ๐ธ(๐‘œ0) โ‹ฏ ๐ธ(๐‘œ๐‘˜โˆ’1) = ๐ธ(๐‘œ0) โ‹ฏ ๐ธ(๐‘œ๐‘˜โˆ’1)๐ธ(๐‘œโ€ฒ๐‘˜) โ‹ฏ ๐ธ(๐‘œโ€ฒ๐‘˜โ€ฒโˆ’1) (2.13)

which means that E(o′_k) ⋯ E(o′_{k′−1}) must correspond to the empty string "". But in such a case E(o′_k) must be the empty string, which in particular is the prefix of any other string, contradicting the prefix-freeness of E.

Remark 2.19 — Prefix-freeness of list representation. Even if the representation E of objects in 𝒪 is prefix-free, this does not mean that our representation Ē of lists of such objects will be prefix-free as well. In fact, it won't be: for every three objects o, o′, o″, the representation of the list (o, o′) will be a prefix of the representation of the list (o, o′, o″). However, as we see in Lemma 2.20 below, we can transform every representation into prefix-free form, and so will be able to use that transformation if needed to represent lists of lists, lists of lists of lists, and so on and so forth.

2.5.3 Making representations prefix-free
Some natural representations are prefix-free. For example, every fixed output length representation (i.e., a one-to-one function E : 𝒪 → {0,1}^n) is automatically prefix-free, since a string x can only be a prefix of an equal-length x′ if x and x′ are identical. Moreover, the approach we used for representing rational numbers can be used to show the following:

Lemma 2.20 Let E : 𝒪 → {0,1}^* be a one-to-one function. Then there is a one-to-one prefix-free encoding Ē such that |Ē(o)| ≤ 2|E(o)| + 2 for every o ∈ 𝒪.

For the sake of completeness, we will include the proof below, but it is a good idea for you to pause here and try to prove it on your own, using the same technique we used for representing rational numbers.

Proof of Lemma 2.20. The idea behind the proof is to use the map 0 ↦ 00, 1 ↦ 11 to "double" every bit in the string x and then mark the end of the string by concatenating to it the pair 01. If we encode a string x in this way, it ensures that the encoding of x is never a prefix of the encoding of a distinct string x′. Formally, we define the function PF : {0,1}^* → {0,1}^* as follows:

PF(๐‘ฅ) = ๐‘ฅ0๐‘ฅ0๐‘ฅ1๐‘ฅ1 โ€ฆ ๐‘ฅ๐‘›โˆ’1๐‘ฅ๐‘›โˆ’101 (2.14)

for every x ∈ {0,1}^*. If E : 𝒪 → {0,1}^* is the (potentially not prefix-free) representation for 𝒪, then we transform it into a prefix-free representation Ē : 𝒪 → {0,1}^* by defining Ē(o) = PF(E(o)).

To prove the lemma we need to show that (1) Ē is one-to-one and (2) Ē is prefix-free. In fact, prefix-freeness is a stronger condition than one-to-one (if two strings are equal then in particular one of them is a prefix of the other) and hence it suffices to prove (2), which we now do.

Let ๐‘œ โ‰  ๐‘œโ€ฒ in ๐’ช be two distinct objects. We will prove that ๐ธ(๐‘œ)is a not a prefix of ๐ธ(๐‘œโ€ฒ). Define ๐‘ฅ = ๐ธ(๐‘œ) and ๐‘ฅโ€ฒ = ๐ธ(๐‘œโ€ฒ). Since๐ธ is one-to-one, ๐‘ฅ โ‰  ๐‘ฅโ€ฒ. Under our assumption, PF(๐‘ฅ) is a prefixof PF(๐‘ฅโ€ฒ). If |๐‘ฅ| < |๐‘ฅโ€ฒ| then the two bits in positions 2|๐‘ฅ|, 2|๐‘ฅ| + 1in PF(๐‘ฅ) have the value 01 but the corresponding bits in PF(๐‘ฅโ€ฒ) willequal either 00 or 11 (depending on the |๐‘ฅ|-th bit of ๐‘ฅโ€ฒ) and hencePF(๐‘ฅ) cannot be a prefix of PF(๐‘ฅโ€ฒ). If |๐‘ฅ| = |๐‘ฅโ€ฒ| then, since ๐‘ฅ โ‰  ๐‘ฅโ€ฒ,there must be a coordinate ๐‘– in which they differ, meaning that thestrings PF(๐‘ฅ) and PF(๐‘ฅโ€ฒ) differ in the coordinates 2๐‘–, 2๐‘– + 1, whichagain means that PF(๐‘ฅ) cannot be a prefix of PF(๐‘ฅโ€ฒ). If |๐‘ฅ| > |๐‘ฅโ€ฒ|then |PF(๐‘ฅ)| = 2|๐‘ฅ| + 2 > |PF(๐‘ฅโ€ฒ)| = 2|๐‘ฅโ€ฒ| + 2 and hence PF(๐‘ฅ) islonger than (and cannot be a prefix of) PF(๐‘ฅโ€ฒ). In all cases we see thatPF(๐‘ฅ) = ๐ธ(๐‘œ) is not a prefix of PF(๐‘ฅโ€ฒ) = ๐ธ(๐‘œโ€ฒ), hence completing theproof.

The proof of Lemma 2.20 is not the only or even the best way to transform an arbitrary representation into prefix-free form. Exercise 2.10 asks you to construct a more efficient prefix-free transformation satisfying |Ē(o)| ≤ |E(o)| + O(log |E(o)|).

2.5.4 "Proof by Python" (optional)
The proofs of Theorem 2.18 and Lemma 2.20 are constructive in the sense that they give us:

• A way to transform the encoding and decoding functions of any representation of an object O to encoding and decoding functions that are prefix-free, and

• A way to extend prefix-free encoding and decoding of single objects to encoding and decoding of lists of objects by concatenation.


Specifically, we could transform any pair of Python functions encode and decode to functions pfencode and pfdecode that correspond to a prefix-free encoding and decoding. Similarly, given pfencode and pfdecode for single objects, we can extend them to encoding of lists. Let us show how this works for the case of the NtS and StN functions we defined above.

We start with the "Python proof" of Lemma 2.20: a way to transform an arbitrary representation into one that is prefix-free. The function prefixfree below takes as input a pair of encoding and decoding functions, and returns a triple of functions containing prefix-free encoding and decoding functions, as well as a function that checks whether a string is a valid encoding of an object.

# takes functions encode and decode mapping
# objects to lists of bits and vice versa,
# and returns functions pfencode and pfdecode that
# map objects to lists of bits and vice versa
# in a prefix-free way.
# Also returns a function pfvalid that says
# whether a list is a valid encoding
def prefixfree(encode, decode):

    def pfencode(o):
        L = encode(o)
        return [L[i//2] for i in range(2*len(L))]+[0,1]

    def pfdecode(L):
        return decode([L[j] for j in range(0,len(L)-2,2)])

    def pfvalid(L):
        return (len(L) % 2 == 0) and all(L[2*i]==L[2*i+1]
            for i in range((len(L)-2)//2)) and L[-2:]==[0,1]

    return pfencode, pfdecode, pfvalid

pfNtS, pfStN, pfvalidN = prefixfree(NtS, StN)

NtS(234)
# 11101010
pfNtS(234)
# 111111001100110001
pfStN(pfNtS(234))
# 234
pfvalidN(pfNtS(234))
# True


Note that the Python function prefixfree above takes two Python functions as input and outputs three Python functions as output. (When it's not too awkward, we use the term "Python function" or "subroutine" to distinguish between such snippets of Python programs and mathematical functions.) You don't have to know Python in this course, but you do need to get comfortable with the idea of functions as mathematical objects in their own right, that can be used as inputs and outputs of other functions.

We now show a "Python proof" of Theorem 2.18. Namely, we show a function represlists that takes as input a prefix-free representation scheme (implemented via encoding, decoding, and validity testing functions) and outputs a representation scheme for lists of such objects. If we want to make this representation prefix-free then we could fit it into the function prefixfree above.

def represlists(pfencode, pfdecode, pfvalid):
    """Takes functions pfencode, pfdecode and pfvalid,
    and returns functions encodelist, decodelist
    that can encode and decode lists of the objects
    respectively."""

    def encodelist(L):
        """Gets list of objects, encodes it as list of bits"""
        return "".join([pfencode(obj) for obj in L])

    def decodelist(S):
        """Gets list of bits, returns list of objects"""
        i = 0; j = 1; res = []
        while j <= len(S):
            if pfvalid(S[i:j]):
                res += [pfdecode(S[i:j])]
                i = j
            j += 1
        return res

    return encodelist, decodelist

LtS, StL = represlists(pfNtS, pfStN, pfvalidN)


Figure 2.10: The word "Binary" in "Grade 1" or "uncontracted" Unified English Braille. This word is encoded using seven symbols since the first one is a modifier indicating that the first letter is capitalized.

LtS([234, 12, 5])
# 111111001100110001111100000111001101
StL(LtS([234, 12, 5]))
# [234, 12, 5]

2.5.5 Representing letters and text
We can represent a letter or symbol by a string, and then if this representation is prefix-free, we can represent a sequence of symbols by merely concatenating the representation of each symbol. One such representation is ASCII, which represents 128 letters and symbols as strings of 7 bits. Since the ASCII representation is fixed-length, it is automatically prefix-free (can you see why?). Unicode is a representation of (at the time of this writing) about 128,000 symbols as numbers (known as code points) between 0 and 1,114,111. There are several types of prefix-free representations of the code points, a popular one being UTF-8, which encodes every code point into a string of length between 8 and 32 bits.
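The variable-length property of UTF-8 can be checked directly in Python. This is a small illustration of ours, not from the text; the four sample characters are chosen to hit the four possible byte lengths:

```python
# UTF-8 uses between one and four bytes (8 to 32 bits) per code point.
for ch in ["a", "é", "€", "🙂"]:
    b = ch.encode("utf-8")
    print(ch, "code point", ord(ch), "->", len(b) * 8, "bits")
```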

Example 2.21 — The Braille representation. The Braille system is another way to encode letters and other symbols as binary strings. Specifically, in Braille, every letter is encoded as a string in {0,1}^6, which is written using indented dots arranged in two columns and three rows, see Fig. 2.10. (Some symbols require more than one six-bit string to encode, and so Braille uses a more general prefix-free encoding.)

The Braille system was invented in 1821 by Louis Braille when he was just 12 years old (though he continued working on it and improving it throughout his life). Braille was a French boy who lost his eyesight at the age of 5 as the result of an accident.

Example 2.22 — Representing objects in C (optional). We can use programming languages to probe how our computing environment represents various values. This is easiest to do in "unsafe" programming languages such as C that allow direct access to the memory.

Using a simple C program we have produced the following representations of various values. One can see that for integers, multiplying by 2 corresponds to a "left shift" inside each byte. In contrast, for floating point numbers, multiplying by two corresponds to adding one to the exponent part of the representation. In the architecture we used, a negative number is represented using the two's complement approach. C represents strings in a prefix-free form by ensuring that a zero byte is at their end.

int 2       : 00000010 00000000 00000000 00000000
int 4       : 00000100 00000000 00000000 00000000
int 513     : 00000001 00000010 00000000 00000000
long 513    : 00000001 00000010 00000000 00000000
              00000000 00000000 00000000 00000000
int -1      : 11111111 11111111 11111111 11111111
int -2      : 11111110 11111111 11111111 11111111
string Hello: 01001000 01100101 01101100 01101100
              01101111 00000000
string abcd : 01100001 01100010 01100011 01100100
              00000000
float 33.0  : 00000000 00000000 00000100 01000010
float 66.0  : 00000000 00000000 10000100 01000010
float 132.0 : 00000000 00000000 00000100 01000011
double 132.0: 00000000 00000000 00000000 00000000
              00000000 10000000 01100000 01000000

2.5.6 Representing vectors, matrices, images
Once we can represent numbers and lists of numbers, then we can also represent vectors (which are just lists of numbers). Similarly, we can represent lists of lists, and thus, in particular, can represent matrices. To represent an image, we can represent the color at each pixel by a list of three numbers corresponding to the intensity of Red, Green and Blue. (We can restrict to three primary colors since most humans only have three types of cones in their retinas; we would have needed 16 primary colors to represent colors visible to the Mantis Shrimp.) Thus an image of n pixels would be represented by a list of n such length-three lists. A video can be represented as a list of images. Of course these representations are rather wasteful and much more compact representations are typically used for images and videos, though this will not be our concern in this book.

2.5.7 Representing graphs
A graph on n vertices can be represented as an n × n adjacency matrix whose (i, j)-th entry is equal to 1 if the edge (i, j) is present and is equal to 0 otherwise. That is, we can represent an n vertex directed graph G = (V, E) as a string A ∈ {0,1}^{n²} such that A_{i,j} = 1 iff the edge (i, j) ∈ E. We can transform an undirected graph to a directed graph by replacing every edge {i, j} with both edges (i, j) and (j, i).

Another representation for graphs is the adjacency list representation. That is, we identify the vertex set V of a graph with the set [n] where n = |V|, and represent the graph G = (V, E) as a list of n lists, where the i-th list consists of the out-neighbors of vertex i. The difference between these representations can be significant for some applications, though for us it would typically be immaterial.

Figure 2.11: Representing the graph G = ({0, 1, 2, 3, 4}, {(1, 0), (4, 0), (1, 4), (4, 1), (2, 1), (3, 2), (4, 3)}) in the adjacency matrix and adjacency list representations.

2.5.8 Representing lists and nested lists
If we have a way of representing objects from a set 𝒪 as binary strings, then we can represent lists of these objects by applying a prefix-free transformation. Moreover, we can use a trick similar to the above to handle nested lists. The idea is that if we have some representation E : 𝒪 → {0,1}^*, then we can represent nested lists of items from 𝒪 using strings over the five-element alphabet Σ = {0, 1, [, ], ,} (the digits zero and one, the two brackets, and the comma). For example, if o_1 is represented by 0011, o_2 is represented by 10011, and o_3 is represented by 00111, then we can represent the nested list (o_1, (o_2, o_3)) as the string "[0011,[10011,00111]]" over the alphabet Σ. By encoding every element of Σ itself as a three-bit string, we can transform any representation for objects 𝒪 into a representation that enables representing (potentially nested) lists of these objects.
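The nested-list encoding over Σ takes a few lines of Python. This is our own illustration (encode_nested is a hypothetical helper, and we take E to be the identity on already-encoded strings):

```python
def encode_nested(x, E):
    """Encodes a possibly nested list of objects as a string
    over the alphabet {0, 1, [, ], ,}; E encodes one object."""
    if isinstance(x, list):
        return "[" + ",".join(encode_nested(y, E) for y in x) + "]"
    return E(x)

# The example from the text, with E the identity on bit strings:
print(encode_nested(["0011", ["10011", "00111"]], lambda s: s))
# [0011,[10011,00111]]
```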

2.5.9 Notation
We will typically identify an object with its representation as a string. For example, if F : {0,1}^* → {0,1}^* is some function that maps strings to strings and n is an integer, we might make statements such as "F(n) + 1 is prime" to mean that if we represent n as a string x, then the integer m represented by the string F(x) satisfies that m + 1 is prime. (You can see how this convention of identifying objects with their representation can save us a lot of cumbersome formalism.) Similarly, if x, y are some objects and F is a function that takes strings as inputs, then by F(x, y) we will mean the result of applying F to the representation of the ordered pair (x, y). We use the same notation to invoke functions on k-tuples of objects for every k.

This convention of identifying an object with its representation as a string is one that we humans follow all the time. For example, when people say a statement such as "17 is a prime number", what they really mean is that the integer whose decimal representation is the string "17" is prime.

When we say

    A is an algorithm that computes the multiplication function on natural numbers.

what we really mean is that

    A is an algorithm that computes the function F : {0,1}^* → {0,1}^* such that for every pair a, b ∈ ℕ, if x ∈ {0,1}^* is a string representing the pair (a, b) then F(x) will be a string representing their product a · b.


2.6 DEFINING COMPUTATIONAL TASKS AS MATHEMATICAL FUNCTIONS

Abstractly, a computational process is some process that takes an input which is a string of bits and produces an output which is a string of bits. This transformation of input to output can be done using a modern computer, a person following instructions, the evolution of some natural system, or any other means.

In future chapters, we will turn to mathematically defining computational processes, but, as we discussed above, at the moment we focus on computational tasks. That is, we focus on the specification and not the implementation. Again, at an abstract level, a computational task can specify any relation that the output needs to have with the input. However, for most of this book, we will focus on the simplest and most common task of computing a function. Here are some examples:

• Given (a representation of) two integers x, y, compute the product x × y. Using our representation above, this corresponds to computing a function from {0,1}^* to {0,1}^*. We have seen that there is more than one way to solve this computational task, and in fact, we still do not know the best algorithm for this problem.

• Given (a representation of) an integer z > 1, compute its factorization; i.e., the list of primes p_1 ≤ ⋯ ≤ p_k such that z = p_1 ⋯ p_k. This again corresponds to computing a function from {0,1}^* to {0,1}^*. The gaps in our knowledge of the complexity of this problem are even larger.

โ€ข Given (a representation of) a graph ๐บ and two vertices ๐‘  and ๐‘ก, compute the length of the shortest path in ๐บ between ๐‘  and ๐‘ก, or do the same for the longest path (with no repeated vertices) between ๐‘  and ๐‘ก. Both these tasks correspond to computing a function from {0, 1}โˆ— to {0, 1}โˆ—, though it turns out that there is a vast difference in their computational difficulty.

โ€ข Given the code of a Python program, determine whether there is an input that would force it into an infinite loop. This task corresponds to computing a partial function from {0, 1}โˆ— to {0, 1}, since not every string corresponds to a syntactically valid Python program. We will see that we do understand the computational status of this problem, but the answer is quite surprising.

โ€ข Given (a representation of) an image ๐ผ, decide if ๐ผ is a photo of a cat or a dog. This corresponds to computing some (partial) function from {0, 1}โˆ— to {0, 1}.


Figure 2.12: A subset ๐ฟ โŠ† {0, 1}โˆ— can be identified with the function ๐น โˆถ {0, 1}โˆ— โ†’ {0, 1} such that ๐น(๐‘ฅ) = 1 if ๐‘ฅ โˆˆ ๐ฟ and ๐น(๐‘ฅ) = 0 if ๐‘ฅ โˆ‰ ๐ฟ. Functions with a single bit of output are called Boolean functions, while subsets of strings are called languages. The above shows that the two are essentially the same object, and we can identify the task of deciding membership in ๐ฟ (known as deciding a language in the literature) with the task of computing the function ๐น.

Remark 2.23 โ€” Boolean functions and languages. An important special case of computational tasks corresponds to computing Boolean functions, whose output is a single bit {0, 1}. Computing such functions corresponds to answering a YES/NO question, and hence this task is also known as a decision problem. Given any function ๐น โˆถ {0, 1}โˆ— โ†’ {0, 1} and ๐‘ฅ โˆˆ {0, 1}โˆ—, the task of computing ๐น(๐‘ฅ) corresponds to the task of deciding whether or not ๐‘ฅ โˆˆ ๐ฟ, where ๐ฟ = {๐‘ฅ โˆถ ๐น(๐‘ฅ) = 1} is known as the language that corresponds to the function ๐น. (The language terminology is due to historical connections between the theory of computation and formal linguistics as developed by Noam Chomsky.) Hence many texts refer to such a computational task as deciding a language.
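As a toy illustration of this correspondence (my own example, not from the text), take the parity function: deciding membership in the language of strings with an even number of 1s is exactly the same task as computing the corresponding Boolean function.

```python
def F(x):
    # Boolean function: F(x) = 1 iff the string x has an even number of 1s
    return 1 if x.count("1") % 2 == 0 else 0

def in_L(x):
    # membership in the language L = {x : F(x) = 1}
    return F(x) == 1

print(F("1101"))   # three 1s, so the output is 0
print(in_L("11"))  # two 1s, so True
```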

For every particular function ๐น, there can be several possible algorithms to compute ๐น. We will be interested in questions such as:

โ€ข For a given function ๐น, can it be the case that there is no algorithm to compute ๐น?

โ€ข If there is an algorithm, what is the best one? Could it be that ๐น is โ€œeffectively uncomputableโ€ in the sense that every algorithm for computing ๐น requires a prohibitively large amount of resources?

โ€ข If we cannot answer this question, can we show equivalence between different functions ๐น and ๐น โ€ฒ in the sense that either they are both easy (i.e., have fast algorithms) or they are both hard?

โ€ข Can a function being hard to compute ever be a good thing? Can we use it for applications in areas such as cryptography?

In order to do that, we will need to mathematically define the notion of an algorithm, which is what we will do in Chapter 3.

2.6.1 Distinguish functions from programs!

You should always watch out for potential confusions between specifications and implementations, or equivalently between mathematical functions and algorithms/programs. It does not help that programming languages (Python included) use the term โ€œfunctionsโ€ to denote (parts of) programs. This confusion also stems from thousands of years of mathematical history, where people typically defined functions by means of a way to compute them.

For example, consider the multiplication function on natural numbers. This is the function MULT โˆถ โ„• ร— โ„• โ†’ โ„• that maps a pair (๐‘ฅ, ๐‘ฆ) of natural numbers to the number ๐‘ฅ โ‹… ๐‘ฆ. As we mentioned, it can be implemented in more than one way:


Figure 2.13: A function is a mapping of inputs to outputs. A program is a set of instructions on how to obtain an output given an input. A program computes a function, but it is not the same as a function, popular programming language terminology notwithstanding.

def mult1(x,y):
    res = 0
    while y>0:
        res += x
        y -= 1
    return res

def mult2(x,y):
    a = str(x) # represent x as string in decimal notation
    b = str(y) # represent y as string in decimal notation
    res = 0
    for i in range(len(a)):
        for j in range(len(b)):
            res += int(a[len(a)-i-1])*int(b[len(b)-j-1])*(10**(i+j))
    return res

print(mult1(12,7))
# 84
print(mult2(12,7))
# 84

Both mult1 and mult2 produce the same output given the same pair of natural number inputs. (Though mult1 will take far longer to do so when the numbers become large.) Hence, even though these are two different programs, they compute the same mathematical function. This distinction between a program or algorithm ๐ด, and the function ๐น that ๐ด computes will be absolutely crucial for us in this course (see also Fig. 2.13).

Big Idea 2 A function is not the same as a program. A program computes a function.

Distinguishing functions from programs (or other ways for computing, including circuits and machines) is a crucial theme for this course. For this reason, this is often a running theme in questions that I (and many other instructors) assign in homework and exams (hint, hint).

Remark 2.24 โ€” Computation beyond functions (advanced, optional). Functions capture quite a lot of computational tasks, but one can consider more general settings as well. For starters, we can and will talk about partial functions, that are not defined on all inputs. When computing a partial function, we only


need to worry about the inputs on which the function is defined. Another way to say it is that we can design an algorithm for a partial function ๐น under the assumption that someone โ€œpromisedโ€ us that all inputs ๐‘ฅ would be such that ๐น(๐‘ฅ) is defined (as otherwise, we do not care about the result). Hence such tasks are also known as promise problems.

Another generalization is to consider relations that may have more than one possible admissible output. For example, consider the task of finding any solution for a given set of equations. A relation ๐‘… maps a string ๐‘ฅ โˆˆ {0, 1}โˆ— into a set of strings ๐‘…(๐‘ฅ) (for example, ๐‘ฅ might describe a set of equations, in which case ๐‘…(๐‘ฅ) would correspond to the set of all solutions to ๐‘ฅ). We can also identify a relation ๐‘… with the set of pairs of strings (๐‘ฅ, ๐‘ฆ) where ๐‘ฆ โˆˆ ๐‘…(๐‘ฅ). A computational process solves a relation if for every ๐‘ฅ โˆˆ {0, 1}โˆ—, it outputs some string ๐‘ฆ โˆˆ ๐‘…(๐‘ฅ).

Later in this book, we will consider even more general tasks, including interactive tasks, such as finding a good strategy in a game, tasks defined using probabilistic notions, and others. However, for much of this book, we will focus on the task of computing a function, and often even a Boolean function, that has only a single bit of output. It turns out that a great deal of the theory of computation can be studied in the context of this task, and the insights learned are applicable in the more general settings.

Chapter Recap

โ€ข We can represent objects we want to compute on using binary strings.

โ€ข A representation scheme for a set of objects ๐’ช is a one-to-one map from ๐’ช to {0, 1}โˆ—.

โ€ข We can use prefix-free encoding to โ€œboostโ€ a representation for a set ๐’ช into representations of lists of elements in ๐’ช.

โ€ข A basic computational task is the task of computing a function ๐น โˆถ {0, 1}โˆ— โ†’ {0, 1}โˆ—. This task encompasses not just arithmetical computations such as multiplication, factoring, etc. but a great many other tasks arising in areas as diverse as scientific computing, artificial intelligence, image processing, data mining and many many more.

โ€ข We will study the question of finding (or at least giving bounds on) what the best algorithm for computing ๐น for various interesting functions ๐น is.


2.7 EXERCISES

Exercise 2.1 Which one of these objects can be represented by a binary string?

a. An integer ๐‘ฅ.

b. An undirected graph ๐บ.

c. A directed graph ๐ป.

d. All of the above.

Exercise 2.2 โ€” Binary representation. a. Prove that the function ๐‘๐‘ก๐‘† โˆถ โ„• โ†’ {0, 1}โˆ— of the binary representation defined in (2.1) satisfies that for every ๐‘› โˆˆ โ„•, if ๐‘ฅ = ๐‘๐‘ก๐‘†(๐‘›) then |๐‘ฅ| = 1 + max(0, โŒŠlog2 ๐‘›โŒ‹) and ๐‘ฅ๐‘– = โŒŠ๐‘›/2^(โŒŠlog2 ๐‘›โŒ‹โˆ’๐‘–)โŒ‹ mod 2.

b. Prove that ๐‘๐‘ก๐‘† is a one to one function by coming up with a func-tion ๐‘†๐‘ก๐‘ โˆถ 0, 1โˆ— โ†’ โ„• such that ๐‘†๐‘ก๐‘(๐‘๐‘ก๐‘†(๐‘›)) = ๐‘› for every๐‘› โˆˆ โ„•.

Exercise 2.3 โ€” More compact than ASCII representation. The ASCII encoding can be used to encode a string of ๐‘› English letters as a 7๐‘›-bit binary string, but in this exercise, we ask about finding a more compact representation for strings of English lowercase letters.

1. Prove that there exists a representation scheme (๐ธ, ๐ท) for strings over the 26-letter alphabet {๐‘Ž, ๐‘, ๐‘, โ€ฆ , ๐‘ง} as binary strings such that for every ๐‘› > 0 and length-๐‘› string ๐‘ฅ โˆˆ {๐‘Ž, ๐‘, โ€ฆ , ๐‘ง}^๐‘›, the representation ๐ธ(๐‘ฅ) is a binary string of length at most 4.8๐‘› + 1000. In other words, prove that for every ๐‘›, there exists a one-to-one function ๐ธ โˆถ {๐‘Ž, ๐‘, โ€ฆ , ๐‘ง}^๐‘› โ†’ {0, 1}^โŒŠ4.8๐‘›+1000โŒ‹.

2. Prove that there exists no representation scheme for strings over the alphabet {๐‘Ž, ๐‘, โ€ฆ , ๐‘ง} as binary strings such that for every length-๐‘› string ๐‘ฅ โˆˆ {๐‘Ž, ๐‘, โ€ฆ , ๐‘ง}^๐‘›, the representation ๐ธ(๐‘ฅ) is a binary string of length โŒŠ4.6๐‘› + 1000โŒ‹. In other words, prove that there exists some ๐‘› > 0 such that there is no one-to-one function ๐ธ โˆถ {๐‘Ž, ๐‘, โ€ฆ , ๐‘ง}^๐‘› โ†’ {0, 1}^โŒŠ4.6๐‘›+1000โŒ‹.

3. Pythonโ€™s bz2.compress function is a mapping from strings to strings, which uses the lossless (and hence one-to-one) bzip2 algorithm for compression. After converting to lowercase, and truncating spaces and numbers, the text of Tolstoyโ€™s โ€œWar and Peaceโ€ contains ๐‘› = 2,517,262 letters. Yet, if we run bz2.compress on the string of the text of โ€œWar and Peaceโ€ we get a string of length ๐‘˜ = 6,274,768


5 Actually that particular fictional company uses a metric that focuses more on compression speed than ratio; see here and here.

bits, which is only 2.49๐‘› (and in particular much smaller than 4.6๐‘›). Explain why this does not contradict your answer to the previous question.

4. Interestingly, if we try to apply bz2.compress on a random string, we get much worse performance. In my experiments, I got a ratio of about 4.78 between the number of bits in the output and the number of characters in the input. However, one could imagine that one could do better and that there exists a company called โ€œPied Piperโ€ with an algorithm that can losslessly compress a string of ๐‘› random lowercase letters to fewer than 4.6๐‘› bits.5 Show that this is not the case by proving that for every ๐‘› > 100 and one-to-one function ๐ธ๐‘›๐‘๐‘œ๐‘‘๐‘’ โˆถ {๐‘Ž, โ€ฆ , ๐‘ง}^๐‘› โ†’ {0, 1}โˆ—, if we let ๐‘ be the random variable |๐ธ๐‘›๐‘๐‘œ๐‘‘๐‘’(๐‘ฅ)| (i.e., the length of ๐ธ๐‘›๐‘๐‘œ๐‘‘๐‘’(๐‘ฅ)) for ๐‘ฅ chosen uniformly at random from the set {๐‘Ž, โ€ฆ , ๐‘ง}^๐‘›, then the expected value of ๐‘ is at least 4.6๐‘›.
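The experiment described above is easy to reproduce. The following sketch (my own, with an arbitrary seed and string length) compresses a random lowercase string with bz2 and prints the number of output bits per input letter; on random data the ratio typically comes out above the 4.6 bound, consistent with the author's reported 4.78.

```python
import bz2
import random

random.seed(0)  # arbitrary seed, fixed for reproducibility
n = 100_000
s = "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(n))

compressed = bz2.compress(s.encode("ascii"))
ratio = 8 * len(compressed) / n  # bits of output per input letter
print(f"{ratio:.2f} bits per letter")
```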

Exercise 2.4 โ€” Representing graphs: upper bound. Show that there is a string representation of directed graphs with vertex set [๐‘›] and degree at most 10 that uses at most 1000๐‘› log ๐‘› bits. More formally, show the following: Suppose we define for every ๐‘› โˆˆ โ„• the set ๐บ๐‘› as the set containing all directed graphs (with no self loops) over the vertex set [๐‘›] where every vertex has degree at most 10. Then, prove that for every sufficiently large ๐‘›, there exists a one-to-one function ๐ธ โˆถ ๐บ๐‘› โ†’ {0, 1}^โŒŠ1000๐‘› log ๐‘›โŒ‹.

Exercise 2.5 โ€” Representing graphs: lower bound. 1. Define ๐‘†๐‘› to be the set of one-to-one and onto functions mapping [๐‘›] to [๐‘›]. Prove that there is a one-to-one mapping from ๐‘†๐‘› to ๐บ2๐‘›, where ๐บ2๐‘› is the set defined in Exercise 2.4 above.

2. In this question you will show that one cannot improve the representation of Exercise 2.4 to length ๐‘œ(๐‘› log ๐‘›). Specifically, prove that for every sufficiently large ๐‘› โˆˆ โ„• there is no one-to-one function ๐ธ โˆถ ๐บ๐‘› โ†’ {0, 1}^(โŒŠ0.001๐‘› log ๐‘›โŒ‹+1000).

Exercise 2.6 โ€” Multiplying in different representations. Recall that the grade-school algorithm for multiplying two numbers requires ๐‘‚(๐‘›^2) operations. Suppose that instead of using decimal representation, we use one of the following representations ๐‘…(๐‘ฅ) to represent a number ๐‘ฅ between 0 and 10^๐‘› โˆ’ 1. For which of these representations can you still multiply the numbers in ๐‘‚(๐‘›^2) operations?


a. The standard binary representation: ๐ต(๐‘ฅ) = (๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘˜) where ๐‘ฅ = โˆ‘_(๐‘–=0)^๐‘˜ ๐‘ฅ๐‘– 2^๐‘– and ๐‘˜ is the largest number s.t. ๐‘ฅ โ‰ฅ 2^๐‘˜.

b. The reverse binary representation: ๐ต(๐‘ฅ) = (๐‘ฅ๐‘˜, โ€ฆ , ๐‘ฅ0) where ๐‘ฅ๐‘– is defined as above for ๐‘– = 0, โ€ฆ , ๐‘˜ โˆ’ 1.

c. Binary coded decimal representation: ๐ต(๐‘ฅ) = (๐‘ฆ0, โ€ฆ , ๐‘ฆ๐‘›โˆ’1) where ๐‘ฆ๐‘– โˆˆ {0, 1}^4 represents the ๐‘–-th decimal digit of ๐‘ฅ, mapping 0 to 0000, 1 to 0001, 2 to 0010, etc. (i.e., 9 maps to 1001).

d. All of the above.

Exercise 2.7 Suppose that ๐‘… โˆถ โ„• โ†’ {0, 1}โˆ— corresponds to representing a number ๐‘ฅ as a string of ๐‘ฅ 1โ€™s (e.g., ๐‘…(4) = 1111, ๐‘…(7) = 1111111, etc.). If ๐‘ฅ, ๐‘ฆ are numbers between 0 and 10^๐‘› โˆ’ 1, can we still multiply ๐‘ฅ and ๐‘ฆ using ๐‘‚(๐‘›^2) operations if we are given them in the representation ๐‘…(โ‹…)?

Exercise 2.8 Recall that if ๐น is a one-to-one and onto function mapping elements of a finite set ๐‘ˆ into a finite set ๐‘‰ then the sizes of ๐‘ˆ and ๐‘‰ are the same. Let ๐ต โˆถ โ„• โ†’ {0, 1}โˆ— be the function such that for every ๐‘ฅ โˆˆ โ„•, ๐ต(๐‘ฅ) is the binary representation of ๐‘ฅ.

1. Prove that ๐‘ฅ < 2^๐‘˜ if and only if |๐ต(๐‘ฅ)| โ‰ค ๐‘˜.

2. Use 1. to compute the size of the set {๐‘ฆ โˆˆ {0, 1}โˆ— โˆถ |๐‘ฆ| โ‰ค ๐‘˜}, where |๐‘ฆ| denotes the length of the string ๐‘ฆ.

3. Use 1. and 2. to prove that 2^๐‘˜ โˆ’ 1 = 1 + 2 + 4 + โ‹ฏ + 2^(๐‘˜โˆ’1).

Exercise 2.9 โ€” Prefix-free encoding of tuples. Suppose that ๐น โˆถ โ„• โ†’ {0, 1}โˆ— is a one-to-one function that is prefix-free in the sense that there is no ๐‘Ž โ‰  ๐‘ s.t. ๐น(๐‘Ž) is a prefix of ๐น(๐‘).

a. Prove that ๐น2 โˆถ โ„• ร— โ„• โ†’ {0, 1}โˆ—, defined as ๐น2(๐‘Ž, ๐‘) = ๐น(๐‘Ž)๐น(๐‘) (i.e., the concatenation of ๐น(๐‘Ž) and ๐น(๐‘)), is a one-to-one function.

b. Prove that ๐นโˆ— โˆถ โ„•โˆ— โ†’ {0, 1}โˆ— defined as ๐นโˆ—(๐‘Ž1, โ€ฆ , ๐‘Ž๐‘˜) = ๐น(๐‘Ž1) โ‹ฏ ๐น(๐‘Ž๐‘˜) is a one-to-one function, where โ„•โˆ— denotes the set of all finite-length lists of natural numbers.

Exercise 2.10 โ€” More efficient prefix-free transformation. Suppose that ๐น โˆถ ๐‘‚ โ†’ {0, 1}โˆ— is some (not necessarily prefix-free) representation of the objects in the set ๐‘‚, and ๐บ โˆถ โ„• โ†’ {0, 1}โˆ— is a prefix-free representation of the natural numbers. Define ๐น โ€ฒ(๐‘œ) = ๐บ(|๐น(๐‘œ)|)๐น(๐‘œ) (i.e., the concatenation of the representation of the length of ๐น(๐‘œ) and ๐น(๐‘œ) itself).


6 Hint: Think recursively how to represent the length of the string.

a. Prove that ๐น โ€ฒ is a prefix-free representation of ๐‘‚.

b. Show that we can transform any representation to a prefix-free one by a modification that takes a ๐‘˜-bit string into a string of length at most ๐‘˜ + ๐‘‚(log ๐‘˜).

c. Show that we can transform any representation to a prefix-free one by a modification that takes a ๐‘˜-bit string into a string of length at most ๐‘˜ + log ๐‘˜ + ๐‘‚(log log ๐‘˜).6

Exercise 2.11 โ€” Kraftโ€™s Inequality. Suppose that ๐‘† โŠ† {0, 1}^โ‰ค๐‘› is some finite prefix-free set.

a. For every ๐‘˜ โ‰ค ๐‘› and length-๐‘˜ string ๐‘ฅ โˆˆ ๐‘†, let ๐ฟ(๐‘ฅ) โŠ† 0, 1๐‘› denoteall the length-๐‘› strings whose first ๐‘˜ bits are ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘˜โˆ’1. Prove that(1) |๐ฟ(๐‘ฅ)| = 2๐‘›โˆ’|๐‘ฅ| and (2) If ๐‘ฅ โ‰  ๐‘ฅโ€ฒ then ๐ฟ(๐‘ฅ) is disjoint from๐ฟ(๐‘ฅโ€ฒ).

b. Prove that โˆ‘_(๐‘ฅโˆˆ๐‘†) 2^(โˆ’|๐‘ฅ|) โ‰ค 1.

c. Prove that there is no prefix-free encoding of strings with less than logarithmic overhead. That is, prove that there is no function PF โˆถ {0, 1}โˆ— โ†’ {0, 1}โˆ— s.t. |PF(๐‘ฅ)| โ‰ค |๐‘ฅ| + 0.9 log |๐‘ฅ| for every ๐‘ฅ โˆˆ {0, 1}โˆ— and such that the set {PF(๐‘ฅ) โˆถ ๐‘ฅ โˆˆ {0, 1}โˆ—} is prefix-free. The factor 0.9 is arbitrary; all that matters is that it is less than 1.

Exercise 2.12 โ€” Composition of one-to-one functions. Prove that for every two one-to-one functions ๐น โˆถ ๐‘† โ†’ ๐‘‡ and ๐บ โˆถ ๐‘‡ โ†’ ๐‘ˆ, the function ๐ป โˆถ ๐‘† โ†’ ๐‘ˆ defined as ๐ป(๐‘ฅ) = ๐บ(๐น(๐‘ฅ)) is one-to-one.

Exercise 2.13 โ€” Natural numbers and strings. 1. We have shown that the natural numbers can be represented as strings. Prove that the other direction holds as well: that there is a one-to-one map ๐‘†๐‘ก๐‘ โˆถ {0, 1}โˆ— โ†’ โ„•. (๐‘†๐‘ก๐‘ stands for โ€œstrings to numbers.โ€)

2. Recall that Cantor proved that there is no one-to-one map ๐‘…๐‘ก๐‘ โˆถ โ„ โ†’ โ„•. Show that Cantorโ€™s result implies Theorem 2.5.

Exercise 2.14 โ€” Map lists of integers to a number. Recall that for every set ๐‘†, the set ๐‘†โˆ— is defined as the set of all finite sequences of members of ๐‘† (i.e., ๐‘†โˆ— = {(๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1) | ๐‘› โˆˆ โ„•, โˆ€๐‘–โˆˆ[๐‘›] ๐‘ฅ๐‘– โˆˆ ๐‘†}). Prove that there is a one-to-one map from โ„คโˆ— to โ„•, where โ„ค = {โ€ฆ , โˆ’3, โˆ’2, โˆ’1, 0, +1, +2, +3, โ€ฆ} is the set of all integers.


2.8 BIBLIOGRAPHICAL NOTES

The study of representing data as strings, including issues such as compression and error correction, falls under the purview of information theory, as covered in the classic textbook of Cover and Thomas [CT06]. Representations are also studied in the field of data structures design, as covered in texts such as [Cor+09].

The question of whether to represent integers with the most significant digit first or last is known as Big Endian vs. Little Endian representation. This terminology comes from Cohenโ€™s [Coh81] entertaining and informative paper about the conflict between adherents of both schools, which he compared to the warring tribes in Jonathan Swiftโ€™s โ€œGulliverโ€™s Travelsโ€. The twoโ€™s complement representation of signed integers was suggested in von Neumannโ€™s classic report [Neu45] that detailed the design approaches for a stored-program computer, though similar representations have been used even earlier in abacus and other mechanical computation devices.
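Both conventions are easy to see from within Python. The following sketch (an illustration, using the standard-library struct module) lays out the same integer with the most or least significant byte first, and packing a negative number shows the two's complement convention:

```python
import struct

n = 258  # 0x00000102 as a 32-bit unsigned integer

big = struct.pack(">I", n)     # big endian: most significant byte first
little = struct.pack("<I", n)  # little endian: least significant byte first
print(big.hex())     # 00000102
print(little.hex())  # 02010000

# two's complement: -1 packs to all-ones bits as a signed 32-bit integer
print(struct.pack(">i", -1).hex())  # ffffffff
```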

The idea that we should separate the definition or specification of a function from its implementation or computation might seem โ€œobvious,โ€ but it took quite a lot of time for mathematicians to arrive at this viewpoint. Historically, a function ๐น was identified by rules or formulas showing how to derive the output from the input. As we discuss in greater depth in Chapter 9, in the 1800s this somewhat informal notion of a function started โ€œbreaking at the seams,โ€ and eventually mathematicians arrived at the more rigorous definition of a function as an arbitrary assignment of inputs to outputs. While many functions may be described (or computed) by one or more formulas, today we do not consider that to be an essential property of functions, and also allow functions that do not correspond to any โ€œniceโ€ formula.

We have mentioned that all representations of the real numbers are inherently approximate. Thus an important endeavor is to understand what guarantees we can offer on the approximation quality of the output of an algorithm, as a function of the approximation quality of the inputs. This question is known as the question of determining the numerical stability of given equations. The Floating Points Guide website contains an extensive description of the floating point representation, as well as the many ways in which it could subtly fail; see also the website 0.30000000000000004.com.

Dauben [Dau90] gives a biography of Cantor with emphasis on the development of his mathematical ideas. [Hal60] is a classic textbook on set theory, including also Cantorโ€™s theorem. Cantorโ€™s Theorem is also covered in many texts on discrete mathematics, including [LLM18; LZ19].


The adjacency matrix representation of graphs is not merely a convenient way to map a graph into a binary string: it turns out that many natural notions and operations on matrices are useful for graphs as well. (For example, Googleโ€™s PageRank algorithm relies on this viewpoint.) The notes of Spielmanโ€™s course are an excellent source for this area, known as spectral graph theory. We will return to this view much later in this book when we talk about random walks.


I FINITE COMPUTATION


Figure 3.1: Calculating wheels by Charles Babbage. Image taken from the Mark I โ€˜operating manualโ€™.

Figure 3.2: A 1944 Popular Mechanics article on the Harvard Mark I computer.

3 Defining computation

โ€œthere is no reason why mental as well as bodily labor should not be economized by the aid of machineryโ€, Charles Babbage, 1852

โ€œIf, unwarned by my example, any man shall undertake and shall succeed in constructing an engine embodying in itself the whole of the executive department of mathematical analysis upon different principles or by simpler mechanical means, I have no fear of leaving my reputation in his charge, for he alone will be fully able to appreciate the nature of my efforts and the value of their results.โ€, Charles Babbage, 1864

โ€œTo understand a program you must become both the machine and the program.โ€, Alan Perlis, 1982

People have been computing for thousands of years, with aids that include not just pen and paper, but also abacus, slide rules, various mechanical devices, and modern electronic computers. A priori, the notion of computation seems to be tied to the particular mechanism that you use. You might think that the โ€œbestโ€ algorithm for multiplying numbers will differ if you implement it in Python on a modern laptop than if you use pen and paper. However, as we saw in the introduction (Chapter 0), an algorithm that is asymptotically better would eventually beat a worse one regardless of the underlying technology. This gives us hope for a technology-independent way of defining computation. This is what we do in this chapter. We will define the notion of computing an output from an input by applying a sequence of basic operations (see Fig. 3.3). Using this, we will be able to precisely define statements such as โ€œfunction ๐‘“ can be computed by model ๐‘‹โ€ or โ€œfunction ๐‘“ can be computed by model ๐‘‹ using ๐‘  operationsโ€.

This chapter: A non-mathy overview

The main takeaways from this chapter are:


Learning Objectives:
โ€ข See that computation can be precisely modeled.
โ€ข Learn the computational model of Boolean circuits / straight-line programs.
โ€ข Equivalence of circuits and straight-line programs.
โ€ข Equivalence of AND/OR/NOT and NAND.
โ€ข Examples of computing in the physical world.


Figure 3.3: A function mapping strings to strings specifies a computational task, i.e., describes what the desired relation between the input and the output is. In this chapter we define models for implementing computational processes that achieve the desired relation, i.e., describe how to compute the output from the input. We will see several examples of such models using both Boolean circuits and straight-line programming languages.

โ€ข We can use logical operations such as AND, OR, and NOT to compute an output from an input (see Section 3.2).

โ€ข A Boolean circuit is a way to compose the basic logical operations to compute a more complex function (see Section 3.3). We can think of Boolean circuits as both a mathematical model (which is based on directed acyclic graphs) as well as physical devices we can construct in the real world in a variety of ways, including not just silicon-based semi-conductors but also mechanical and even biological mechanisms (see Section 3.4).

โ€ข We can describe Boolean circuits also as straight-line programs, which are programs that do not have any looping constructs (i.e., no while / for / do .. until etc.), see Section 3.3.2.

โ€ข It is possible to implement the AND, OR, and NOT operations using the NAND operation (as well as vice versa). This means that circuits with AND/OR/NOT gates can compute the same functions as (i.e., are equivalent in power to) circuits with NAND gates, and we can use either model to describe computation based on our convenience, see Section 3.5. To give out a โ€œspoilerโ€, we will see in Chapter 4 that such circuits can compute all finite functions.
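Here is a quick sketch (in Python, for illustration) of the NAND direction of this equivalence: each of AND, OR, and NOT is built only from NAND, and then checked against the usual truth tables on all input pairs.

```python
def NAND(a, b):
    # NAND(a, b) = NOT(AND(a, b)); for bits this is 1 - a*b
    return 1 - a * b

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NOT(NAND(a, b))

def OR(a, b):
    # De Morgan: OR(a, b) = NAND(NOT(a), NOT(b))
    return NAND(NOT(a), NOT(b))

# check against the standard truth tables on all four input pairs
for a in [0, 1]:
    for b in [0, 1]:
        assert AND(a, b) == a & b
        assert OR(a, b) == a | b
    assert NOT(a) == 1 - a
print("AND/OR/NOT from NAND: all checks passed")
```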

One โ€œbig ideaโ€ of this chapter is the notion of equivalence between models (Big Idea 3). Two computational models are equivalent if they can compute the same set of functions.


Figure 3.4: Text pages from an Algebra manuscript with geometrical solutions to two quadratic equations. Shelfmark: MS. Huntington 214 fol. 004v-005r

Figure 3.5: An explanation for children of the two-digit addition algorithm

Boolean circuits with AND/OR/NOT gates are equivalent to circuits with NAND gates, but this is just one example of the more general phenomenon that we will see many times in this book.

3.1 DEFINING COMPUTATION

The name โ€œalgorithmโ€ is derived from the Latin transliteration of Muhammad ibn Musa al-Khwarizmiโ€™s name. Al-Khwarizmi was a Persian scholar during the 9th century whose books introduced the western world to the decimal positional numeral system, as well as to the solutions of linear and quadratic equations (see Fig. 3.4). However, Al-Khwarizmiโ€™s descriptions of algorithms were rather informal by todayโ€™s standards. Rather than use โ€œvariablesโ€ such as ๐‘ฅ, ๐‘ฆ, he used concrete numbers such as 10 and 39, and trusted the reader to be able to extrapolate from these examples, much as algorithms are still taught to children today.

Here is how Al-Khwarizmi described the algorithm for solving an equation of the form ๐‘ฅ^2 + ๐‘๐‘ฅ = ๐‘:

[How to solve an equation of the form] โ€œroots and squares are equal to numbersโ€: For instance, โ€œone square, and ten roots of the same, amount to thirty-nine dirhemsโ€; that is to say, what must be the square which, when increased by ten of its own roots, amounts to thirty-nine? The solution is this: you halve the number of the roots, which in the present instance yields five. This you multiply by itself; the product is twenty-five. Add this to thirty-nine; the sum is sixty-four. Now take the root of this, which is eight, and subtract from it half the number of roots, which is five; the remainder is three. This is the root of the square which you sought for; the square itself is nine.

For the purposes of this book, we will need a much more precise way to describe algorithms. Fortunately (or is it unfortunately?), at least at the moment, computers lag far behind school-age children in learning from examples. Hence in the 20th century, people came up with exact formalisms for describing algorithms, namely programming languages. Here is al-Khwarizmiโ€™s quadratic equation solving algorithm described in the Python programming language:

from math import sqrt
# Pythonspeak to enable use of the sqrt function to compute square roots.

def solve_eq(b,c):
    # return solution of x^2 + bx = c following Al Khwarizmi's instructions
    # Al Khwarizmi demonstrates this for the case b=10 and c=39
    val1 = b / 2.0 # "halve the number of the roots"
    val2 = val1 * val1 # "this you multiply by itself"
    val3 = val2 + c # "Add this to thirty-nine"
    val4 = sqrt(val3) # "take the root of this"
    val5 = val4 - val1 # "subtract from it half the number of roots"
    return val5 # "This is the root of the square which you sought for"

# Test: solve x^2 + 10*x = 39
print(solve_eq(10,39))
# 3.0

We can define algorithms informally as follows:

Informal definition of an algorithm: An algorithm is a set of instructions for how to compute an output from an input by following a sequence of โ€œelementary stepsโ€.

An algorithm ๐ด computes a function ๐น if for every input ๐‘ฅ, if we follow the instructions of ๐ด on the input ๐‘ฅ, we obtain the output ๐น(๐‘ฅ).

In this chapter we will make this informal definition precise using the model of Boolean Circuits. We will show that Boolean Circuits are equivalent in power to straight-line programs that are written in โ€œultra simpleโ€ programming languages that do not even have loops. We will also see that the particular choice of elementary operations is immaterial and many different choices yield models with equivalent power (see Fig. 3.6). However, it will take us some time to get there. We will start by discussing what are โ€œelementary operationsโ€ and how we map a description of an algorithm into an actual physical process that produces an output from an input in the real world.

Figure 3.6: An overview of the computational models defined in this chapter. We will show several equivalent ways to represent a recipe for performing a finite computation. Specifically, we will show that we can model such a computation using either a Boolean circuit or a straight-line program, and these two representations are equivalent to one another. We will also show that we can choose as our basic operations either the set {AND, OR, NOT} or the set {NAND}, and these two choices are equivalent in power. By making the choice of whether to use circuits or programs, and whether to use {AND, OR, NOT} or {NAND}, we obtain four equivalent ways of modeling finite computation. Moreover, there are many other choices of sets of basic operations that are equivalent in power.


3.2 COMPUTING USING AND, OR, AND NOT.

An algorithm breaks down a complex calculation into a series of simpler steps. These steps can be executed in a variety of different ways, including:

โ€ข Writing down symbols on a piece of paper.

โ€ข Modifying the current flowing on electrical wires.

โ€ข Binding a protein to a strand of DNA.

โ€ข Responding to a stimulus by a member of a collection (e.g., a bee in a colony, a trader in a market).

To formally define algorithms, let us try to โ€œerr on the side of simplicityโ€ and model our โ€œbasic stepsโ€ as truly minimal. For example, here are some very simple functions:

โ€ข OR โˆถ 0, 12 โ†’ 0, 1 defined as

OR(๐‘Ž, ๐‘) = โŽงโŽจโŽฉ0 ๐‘Ž = ๐‘ = 01 otherwise(3.1)

โ€ข AND โˆถ 0, 12 โ†’ 0, 1 defined as

AND(๐‘Ž, ๐‘) = โŽงโŽจโŽฉ1 ๐‘Ž = ๐‘ = 10 otherwise(3.2)

โ€ข NOT โˆถ 0, 1 โ†’ 0, 1 defined as

NOT(๐‘Ž) = โŽงโŽจโŽฉ0 ๐‘Ž = 11 ๐‘Ž = 0 (3.3)

The functions AND, OR and NOT are the basic logical operators used in logic and many computer systems. In the context of logic, it is common to use the notation a ∧ b for AND(a, b), a ∨ b for OR(a, b) and ¬a for NOT(a), and we will use this notation as well.

Each one of the functions AND, OR, NOT takes either one or two single bits as input, and produces a single bit as output. Clearly, it cannot get much more basic than that. However, the power of computation comes from composing such simple building blocks together.


Example 3.1 — Majority from AND, OR and NOT. Consider the function MAJ : {0, 1}³ → {0, 1} that is defined as follows:

MAJ(๐‘ฅ) = โŽงโŽจโŽฉ1 ๐‘ฅ0 + ๐‘ฅ1 + ๐‘ฅ2 โ‰ฅ 20 otherwise. (3.4)

That is, for every x ∈ {0, 1}³, MAJ(x) = 1 if and only if the majority (i.e., at least two out of the three) of x's elements are equal to 1. Can you come up with a formula involving AND, OR and NOT to compute MAJ? (It would be useful for you to pause at this point and work out the formula for yourself. As a hint, although the NOT operator is needed to compute some functions, you will not need to use it to compute MAJ.)

Let us first try to rephrase MAJ(x) in words: "MAJ(x) = 1 if and only if there exists some pair of distinct elements i, j such that both xᵢ and xⱼ are equal to 1." In other words it means that MAJ(x) = 1 iff either both x₀ = 1 and x₁ = 1, or both x₁ = 1 and x₂ = 1, or both x₀ = 1 and x₂ = 1. Since the OR of three conditions c₀, c₁, c₂ can be written as OR(c₀, OR(c₁, c₂)), we can now translate this into a formula as follows:

MAJ(๐‘ฅ0, ๐‘ฅ1, ๐‘ฅ2) = OR (AND(๐‘ฅ0, ๐‘ฅ1) , OR(AND(๐‘ฅ1, ๐‘ฅ2) , AND(๐‘ฅ0, ๐‘ฅ2)) ) .(3.5)

Recall that we can also write a ∨ b for OR(a, b) and a ∧ b for AND(a, b). With this notation, (3.5) can also be written as

MAJ(๐‘ฅ0, ๐‘ฅ1, ๐‘ฅ2) = ((๐‘ฅ0 โˆง ๐‘ฅ1) โˆจ (๐‘ฅ1 โˆง ๐‘ฅ2)) โˆจ (๐‘ฅ0 โˆง ๐‘ฅ2) . (3.6)

We can also write (3.5) in a "programming language" form, expressing it as a set of instructions for computing MAJ given the basic operations AND, OR, NOT:

def MAJ(X[0],X[1],X[2]):
    firstpair  = AND(X[0],X[1])
    secondpair = AND(X[1],X[2])
    thirdpair  = AND(X[0],X[2])
    temp       = OR(secondpair,thirdpair)
    return OR(firstpair,temp)
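The straight-line program above can be checked directly in ordinary Python (a minimal sketch; the parameter names are changed to valid Python identifiers, and AND/OR are given the same arithmetic definitions that this chapter uses later for the XOR example):

```python
def AND(a, b): return a * b
def OR(a, b): return 1 - (1 - a) * (1 - b)

def MAJ(x0, x1, x2):
    firstpair = AND(x0, x1)
    secondpair = AND(x1, x2)
    thirdpair = AND(x0, x2)
    temp = OR(secondpair, thirdpair)
    return OR(firstpair, temp)

# Exhaustive check against the definition: 1 iff at least two bits are 1
assert all(MAJ(a, b, c) == (1 if a + b + c >= 2 else 0)
           for a in (0, 1) for b in (0, 1) for c in (0, 1))
```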

3.2.1 Some properties of AND and OR
Like standard addition and multiplication, the functions AND and OR satisfy the properties of commutativity: a ∨ b = b ∨ a and a ∧ b = b ∧ a, and associativity: (a ∨ b) ∨ c = a ∨ (b ∨ c) and (a ∧ b) ∧ c = a ∧ (b ∧ c). As in


the case of addition and multiplication, we often drop the parentheses and write a ∨ b ∨ c ∨ d for ((a ∨ b) ∨ c) ∨ d, and similarly for ORs and ANDs of more terms. They also satisfy a variant of the distributive law:

Solved Exercise 3.1 — Distributive law for AND and OR. Prove that for every a, b, c ∈ {0, 1}, a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c).

Solution:

We can prove this by enumerating over all the 8 possible values for a, b, c ∈ {0, 1}, but it also follows from the standard distributive law. Suppose that we identify any positive integer with "true" and the value zero with "false". Then for every numbers u, v ∈ ℕ, u + v is positive if and only if u ∨ v is true, and u ⋅ v is positive if and only if u ∧ v is true. This means that for every a, b, c ∈ {0, 1}, the expression a ∧ (b ∨ c) is true if and only if a ⋅ (b + c) is positive, and the expression (a ∧ b) ∨ (a ∧ c) is true if and only if a ⋅ b + a ⋅ c is positive. But by the standard distributive law a ⋅ (b + c) = a ⋅ b + a ⋅ c, and hence the former expression is true if and only if the latter one is.
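The eight-case enumeration mentioned in the solution is easy to carry out mechanically; here is a minimal sketch in Python, using the arithmetic definitions of AND and OR that appear later in this chapter:

```python
def AND(a, b): return a * b
def OR(a, b): return 1 - (1 - a) * (1 - b)

# Enumerate all 8 assignments and check a AND (b OR c) == (a AND b) OR (a AND c)
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert AND(a, OR(b, c)) == OR(AND(a, b), AND(a, c))
```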

3.2.2 Extended example: Computing XOR from AND, OR, and NOT
Let us see how we can obtain a different function from the same building blocks. Define XOR : {0, 1}² → {0, 1} to be the function XOR(a, b) = a + b mod 2. That is, XOR(0, 0) = XOR(1, 1) = 0 and XOR(1, 0) = XOR(0, 1) = 1. We claim that we can construct XOR using only AND, OR, and NOT.

P  As usual, it is a good exercise to try to work out the algorithm for XOR using AND, OR and NOT on your own before reading further.

The following algorithm computes XOR using AND, OR, and NOT:

Algorithm 3.2 — XOR from AND/OR/NOT.

Input: a, b ∈ {0, 1}.
Output: XOR(a, b)
1: w1 ← AND(a, b)
2: w2 ← NOT(w1)
3: w3 ← OR(a, b)
4: return AND(w2, w3)

Lemma 3.3 For every a, b ∈ {0, 1}, on input a, b, Algorithm 3.2 outputs a + b mod 2.


Proof. For every a, b, XOR(a, b) = 1 if and only if a is different from b. On input a, b ∈ {0, 1}, Algorithm 3.2 outputs AND(w2, w3) where w2 = NOT(AND(a, b)) and w3 = OR(a, b).

• If a = b = 0 then w3 = OR(a, b) = 0 and so the output will be 0.

• If a = b = 1 then AND(a, b) = 1 and so w2 = NOT(AND(a, b)) = 0 and the output will be 0.

• If a = 1 and b = 0 (or vice versa) then both w3 = OR(a, b) = 1 and w2 = NOT(AND(a, b)) = 1, in which case the algorithm will output AND(w2, w3) = 1.

We can also express Algorithm 3.2 using a programming language. Specifically, the following is a Python program that computes the XOR function:

def AND(a,b): return a*b
def OR(a,b): return 1-(1-a)*(1-b)
def NOT(a): return 1-a

def XOR(a,b):
    w1 = AND(a,b)
    w2 = NOT(w1)
    w3 = OR(a,b)
    return AND(w2,w3)

# Test out the code
print([f"XOR({a},{b})={XOR(a,b)}" for a in [0,1] for b in [0,1]])
# ['XOR(0,0)=0', 'XOR(0,1)=1', 'XOR(1,0)=1', 'XOR(1,1)=0']

Solved Exercise 3.2 — Compute XOR on three bits of input. Let XOR3 : {0, 1}³ → {0, 1} be the function defined as XOR3(a, b, c) = a + b + c mod 2. That is, XOR3(a, b, c) = 1 if a + b + c is odd, and XOR3(a, b, c) = 0 otherwise. Show that you can compute XOR3 using AND, OR, and NOT. You can express it as a formula, use a programming language such as Python, or use a Boolean circuit.

Solution:

Addition modulo two satisfies the same properties of associativity ((a + b) + c = a + (b + c)) and commutativity (a + b = b + a) as standard addition. This means that, if we define a ⊕ b to equal a + b


mod 2, then

XOR3(a, b, c) = (a ⊕ b) ⊕ c   (3.7)

or in other words

XOR3(๐‘Ž, ๐‘, ๐‘) = XOR(XOR(๐‘Ž, ๐‘), ๐‘) . (3.8)

Since we know how to compute XOR using AND, OR, and NOT, we can compose this to compute XOR3 using the same building blocks. In Python this corresponds to the following program:

def XOR3(a,b,c):
    w1 = AND(a,b)
    w2 = NOT(w1)
    w3 = OR(a,b)
    w4 = AND(w2,w3)
    w5 = AND(w4,c)
    w6 = NOT(w5)
    w7 = OR(w4,c)
    return AND(w6,w7)

# Let's test this out
print([f"XOR3({a},{b},{c})={XOR3(a,b,c)}" for a in [0,1] for b in [0,1] for c in [0,1]])
# ['XOR3(0,0,0)=0', 'XOR3(0,0,1)=1', 'XOR3(0,1,0)=1', 'XOR3(0,1,1)=0', 'XOR3(1,0,0)=1', 'XOR3(1,0,1)=0', 'XOR3(1,1,0)=0', 'XOR3(1,1,1)=1']

P  Try to generalize the above examples to obtain a way to compute XORₙ : {0, 1}ⁿ → {0, 1} for every n using at most 4n basic steps involving applications of a function in {AND, OR, NOT} to outputs or previously computed values.
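One possible way to approach this exercise is to chain the two-bit XOR of Algorithm 3.2, spending four basic operations per additional input bit (a sketch of one solution, not the only one; the helper name XORn is ours, not notation from the book):

```python
def AND(a, b): return a * b
def OR(a, b): return 1 - (1 - a) * (1 - b)
def NOT(a): return 1 - a

def XOR(a, b):
    # The four basic operations of Algorithm 3.2
    return AND(NOT(AND(a, b)), OR(a, b))

def XORn(x):
    # Fold XOR over the input bits: roughly 4(n-1) basic operations in total
    result = x[0]
    for bit in x[1:]:
        result = XOR(result, bit)
    return result

assert XORn([1, 0, 1, 1]) == 1  # 1+0+1+1 = 3 is odd
```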

3.2.3 Informally defining "basic operations" and "algorithms"
We have seen that we can obtain at least some examples of interesting functions by composing together applications of AND, OR, and NOT. This suggests that we can use AND, OR, and NOT as our "basic operations", hence obtaining the following definition of an "algorithm":

Semi-formal definition of an algorithm: An algorithm consists of a sequence of steps of the form "compute a new value by applying AND, OR, or NOT to previously computed values".


An algorithm A computes a function F if, for every input x to F, when we feed x as input to the algorithm, the value computed in its last step is F(x).

There are several concerns that are raised by this definition:

1. First and foremost, this definition is indeed too informal. We do not specify exactly what each step does, nor what it means to "feed x as input".

2. Second, the choice of AND, OR or NOT seems rather arbitrary. Why not XOR and MAJ? Why not allow operations like addition and multiplication? What about other logical constructions such as if/then or while?

3. Third, do we even know that this definition has anything to do with actual computing? If someone gave us a description of such an algorithm, could we use it to actually compute the function in the real world?

P  These concerns will to a large extent guide us in the upcoming chapters. Thus you would be well advised to re-read the above informal definition and see what you think about these issues.

A large part of this book will be devoted to addressing the above issues. We will see that:

1. We can make the definition of an algorithm fully formal, and so give a precise mathematical meaning to statements such as "Algorithm A computes function f".

2. While the choice of AND/OR/NOT is arbitrary, and we could just as well have chosen other functions, we will also see this choice does not matter much. We will see that we would obtain the same computational power if we instead used addition and multiplication, and essentially every other operation that could be reasonably thought of as a basic step.

3. It turns out that we can and do compute such "AND/OR/NOT based algorithms" in the real world. First of all, such an algorithm is clearly well specified, and so can be executed by a human with a pen and paper. Second, there are a variety of ways to mechanize this computation. We've already seen that we can write Python code that corresponds to following such a list of instructions. But in fact we can directly implement operations such as AND, OR, and NOT


Figure 3.7: Standard symbols for the logical operations or "gates" of AND, OR, NOT, as well as the operation NAND discussed in Section 3.5.

Figure 3.8: A circuit with AND, OR and NOT gates for computing the XOR function.

via electronic signals using components known as transistors. This is how modern electronic computers operate.

In the remainder of this chapter, and the rest of this book, we will begin to answer some of these questions. We will see more examples of the power of simple operations to compute more complex operations including addition, multiplication, sorting and more. We will also discuss how to physically implement simple operations such as AND, OR and NOT using a variety of technologies.

3.3 BOOLEAN CIRCUITS

Boolean circuits provide a precise notion of "composing basic operations together". A Boolean circuit (see Fig. 3.9) is composed of gates and inputs that are connected by wires. The wires carry a signal that represents either the value 0 or 1. Each gate corresponds to either the OR, AND, or NOT operation. An OR gate has two incoming wires, and one or more outgoing wires. If these two incoming wires carry the signals a and b (for a, b ∈ {0, 1}), then the signal on the outgoing wires will be OR(a, b). AND and NOT gates are defined similarly. The inputs have only outgoing wires. If we set a certain input to a value a ∈ {0, 1}, then this value is propagated on all the wires outgoing from it. We also designate some gates as output gates, and their value corresponds to the result of evaluating the circuit. For example, Fig. 3.8 gives such a circuit for the XOR function, following Section 3.2.2. We evaluate an n-input Boolean circuit C on an input x ∈ {0, 1}ⁿ by placing the bits of x on the inputs, and then propagating the values on the wires until we reach an output, see Fig. 3.9.

R  Remark 3.4 — Physical realization of Boolean circuits. Boolean circuits are a mathematical model that does not necessarily correspond to a physical object, but they can be implemented physically. In physical implementations of circuits, the signal is often implemented by electric potential, or voltage, on a wire, where for example voltage above a certain level is interpreted as a logical value of 1, and below a certain level is interpreted as a logical value of 0. Section 3.4 discusses physical implementation of Boolean circuits (with examples including using electrical signals such as in silicon-based circuits, but also biological and mechanical implementations as well).

Solved Exercise 3.3 — All equal function. Define ALLEQ : {0, 1}⁴ → {0, 1} to be the function that on input x ∈ {0, 1}⁴ outputs 1 if and only if x₀ = x₁ = x₂ = x₃. Give a Boolean circuit for computing ALLEQ.


Figure 3.9: A Boolean Circuit consists of gates that are connected by wires to one another and the inputs. The left side depicts a circuit with 2 inputs and 5 gates, one of which is designated the output gate. The right side depicts the evaluation of this circuit on the input x ∈ {0, 1}² with x₀ = 1 and x₁ = 0. The value of every gate is obtained by applying the corresponding function (AND, OR, or NOT) to values on the wire(s) that enter it. The output of the circuit on a given input is the value of the output gate(s). In this case, the circuit computes the XOR function and hence it outputs 1 on the input 10.

Figure 3.10: A Boolean circuit for computing the all equal function ALLEQ : {0, 1}⁴ → {0, 1} that outputs 1 on x ∈ {0, 1}⁴ if and only if x₀ = x₁ = x₂ = x₃.

Solution:

Another way to describe the function ALLEQ is that it outputs 1 on an input x ∈ {0, 1}⁴ if and only if x = 0⁴ or x = 1⁴. We can phrase the condition x = 1⁴ as x₀ ∧ x₁ ∧ x₂ ∧ x₃, which can be computed using three AND gates. Similarly we can phrase the condition x = 0⁴ as ¬x₀ ∧ ¬x₁ ∧ ¬x₂ ∧ ¬x₃, which can be computed using four NOT gates and three AND gates. The output of ALLEQ is the OR of these two conditions, which results in the circuit of 4 NOT gates, 6 AND gates, and one OR gate presented in Fig. 3.10.
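The same construction can be transcribed into Python and verified exhaustively (a sketch using the chapter's arithmetic AND/OR/NOT helpers; the variable names are ours):

```python
from itertools import product

def AND(a, b): return a * b
def OR(a, b): return 1 - (1 - a) * (1 - b)
def NOT(a): return 1 - a

def ALLEQ(x0, x1, x2, x3):
    all_ones = AND(AND(x0, x1), AND(x2, x3))                       # x = 1111: three ANDs
    all_zeros = AND(AND(NOT(x0), NOT(x1)), AND(NOT(x2), NOT(x3)))  # x = 0000: four NOTs, three ANDs
    return OR(all_ones, all_zeros)                                 # one OR

# Exhaustive check on all 16 inputs
assert all(ALLEQ(*x) == (1 if len(set(x)) == 1 else 0)
           for x in product((0, 1), repeat=4))
```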

3.3.1 Boolean circuits: a formal definition
We defined Boolean circuits informally as obtained by connecting AND, OR, and NOT gates via wires so as to produce an output from an input. However, to be able to prove theorems about the existence or non-existence of Boolean circuits for computing various functions we need to:

1. Formally define a Boolean circuit as a mathematical object.

2. Formally define what it means for a circuit C to compute a function f.

We now proceed to do so. We will define a Boolean circuit as a

labeled Directed Acyclic Graph (DAG). The vertices of the graph correspond to the gates and inputs of the circuit, and the edges of the graph correspond to the wires. A wire from an input or gate u to a gate v in the circuit corresponds to a directed edge between the corresponding vertices. The inputs are vertices with no incoming edges, while each gate has the appropriate number of incoming edges based on the function it computes. (That is, AND and OR gates have two in-neighbors, while NOT gates have one in-neighbor.) The formal definition is as follows (see also Fig. 3.11):


Figure 3.11: A Boolean Circuit is a labeled directed acyclic graph (DAG). It has n input vertices, which are marked with X[0], …, X[n − 1] and have no incoming edges, and the rest of the vertices are gates. AND, OR, and NOT gates have two, two, and one incoming edges, respectively. If the circuit has m outputs, then m of the gates are known as outputs and are marked with Y[0], …, Y[m − 1]. When we evaluate a circuit C on an input x ∈ {0, 1}ⁿ, we start by setting the value of the input vertices to x₀, …, xₙ₋₁ and then propagate the values, assigning to each gate g the result of applying the operation of g to the values of g's in-neighbors. The output of the circuit is the value assigned to the output gates.

1 Having parallel edges means an AND or OR gate u can have both its in-neighbors be the same gate v. Since AND(a, a) = OR(a, a) = a for every a ∈ {0, 1}, such parallel edges don't help in computing new values in circuits with AND/OR/NOT gates. However, we will see circuits with more general sets of gates later on.

Definition 3.5 — Boolean Circuits. Let n, m, s be positive integers with s ≥ m. A Boolean circuit with n inputs, m outputs, and s gates, is a labeled directed acyclic graph (DAG) G = (V, E) with s + n vertices satisfying the following properties:

โ€ข Exactly ๐‘› of the vertices have no in-neighbors. These verticesare known as inputs and are labeled with the ๐‘› labels X[0], โ€ฆ,X[๐‘› โˆ’ 1]. Each input has at least one out-neighbor.

โ€ข The other ๐‘  vertices are known as gates. Each gate is labeled withโˆง, โˆจ or ยฌ. Gates labeled with โˆง (AND) or โˆจ (OR) have two in-neighbors. Gates labeled with ยฌ (NOT) have one in-neighbor.We will allow parallel edges. 1

โ€ข Exactly ๐‘š of the gates are also labeled with the ๐‘š labels Y[0], โ€ฆ,Y[๐‘š โˆ’ 1] (in addition to their label โˆง/โˆจ/ยฌ). These are known asoutputs.

The size of a Boolean circuit is the number of gates it contains.

P  This is a non-trivial mathematical definition, so it is worth taking the time to read it slowly and carefully. As in all mathematical definitions, we are using a known mathematical object, a directed acyclic graph (DAG), to define a new object, a Boolean circuit. This might be a good time to review some of the basic


properties of DAGs and in particular the fact that they can be topologically sorted, see Section 1.6.

If ๐ถ is a circuit with ๐‘› inputs and ๐‘š outputs, and ๐‘ฅ โˆˆ 0, 1๐‘›, thenwe can compute the output of ๐ถ on the input ๐‘ฅ in the natural way:assign the input vertices X[0], โ€ฆ, X[๐‘› โˆ’ 1] the values ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1,apply each gate on the values of its in-neighbors, and then output thevalues that correspond to the output vertices. Formally, this is definedas follows:

Definition 3.6 — Computing a function via a Boolean circuit. Let C be a Boolean circuit with n inputs and m outputs. For every x ∈ {0, 1}ⁿ, the output of C on the input x, denoted by C(x), is defined as the result of the following process:

We let h : V → ℕ be the minimal layering of C (a.k.a. topological sorting, see Theorem 1.26). We let L be the maximum layer of h, and for ℓ = 0, 1, …, L we do the following:

โ€ข For every ๐‘ฃ in the โ„“-th layer (i.e., ๐‘ฃ such that โ„Ž(๐‘ฃ) = โ„“) do:

โ€“ If ๐‘ฃ is an input vertex labeled with X[๐‘–] for some ๐‘– โˆˆ [๐‘›], thenwe assign to ๐‘ฃ the value ๐‘ฅ๐‘–.

โ€“ If ๐‘ฃ is a gate vertex labeled with โˆง and with two in-neighbors๐‘ข, ๐‘ค then we assign to ๐‘ฃ the AND of the values assigned to๐‘ข and ๐‘ค. (Since ๐‘ข and ๐‘ค are in-neighbors of ๐‘ฃ, they are in alower layer than ๐‘ฃ, and hence their values have already beenassigned.)

โ€“ If ๐‘ฃ is a gate vertex labeled with โˆจ and with two in-neighbors๐‘ข, ๐‘ค then we assign to ๐‘ฃ the OR of the values assigned to ๐‘ขand ๐‘ค.

โ€“ If ๐‘ฃ is a gate vertex labeled with ยฌ and with one in-neighbor ๐‘ขthen we assign to ๐‘ฃ the negation of the value assigned to ๐‘ข.

• The result of this process is the value y ∈ {0, 1}ᵐ such that for every j ∈ [m], yⱼ is the value assigned to the vertex with label Y[j].

Let f : {0, 1}ⁿ → {0, 1}ᵐ. We say that the circuit C computes f if

for every ๐‘ฅ โˆˆ 0, 1๐‘›, ๐ถ(๐‘ฅ) = ๐‘“(๐‘ฅ).R

R  Remark 3.7 — Boolean circuits nitpicks (optional). In phrasing Definition 3.5, we've made some technical


choices that are not very important, but will be convenient for us later on. Having parallel edges means an AND or OR gate u can have both its in-neighbors be the same gate v. Since AND(a, a) = OR(a, a) = a for every a ∈ {0, 1}, such parallel edges don't help in computing new values in circuits with AND/OR/NOT gates. However, we will see circuits with more general sets of gates later on. The condition that every input vertex has at least one out-neighbor is also not very important because we can always add "dummy gates" that touch these inputs. However, it is convenient since it guarantees (since every gate has at most two in-neighbors) that the number of inputs in a circuit is never larger than twice its size.

3.3.2 Equivalence of circuits and straight-line programs
We have seen two ways to describe how to compute a function f using AND, OR and NOT:

• A Boolean circuit, defined in Definition 3.5, computes f by connecting via wires AND, OR, and NOT gates to the inputs.

• We can also describe such a computation using a straight-line program that has lines of the form foo = AND(bar,blah), foo = OR(bar,blah) and foo = NOT(bar), where foo, bar and blah are variable names. (We call this a straight-line program since it contains no loops or branching (e.g., if/then) statements.)

We now formally define the AON-CIRC programming language ("AON" stands for AND/OR/NOT; "CIRC" stands for circuit), which has the above operations, and show that it is equivalent to Boolean circuits.

Definition 3.8 — AON-CIRC programming language. An AON-CIRC program is a string of lines of the form foo = AND(bar,blah), foo = OR(bar,blah) and foo = NOT(bar), where foo, bar and blah are variable names.² Variables of the form X[i] are known as input variables, and variables of the form Y[j] are known as output variables. In every line, the variables on the righthand side of the assignment operators must either be input variables or variables that have already been assigned a value.

A valid AON-CIRC program P includes input variables of the form X[0], …, X[n − 1] and output variables of the form Y[0], …, Y[m − 1] for some n, m ≥ 1. If P is a valid AON-CIRC program and x ∈ {0, 1}ⁿ, then we define the output of P on input x, denoted by P(x), to be the string y ∈ {0, 1}ᵐ corresponding to the values of the output variables Y[0], …, Y[m − 1] in the execution of P where


2 We follow the common programming language convention of using names such as foo, bar, baz, blah as stand-ins for generic identifiers. A variable identifier in our programming language can be any combination of letters, numbers, underscores, and brackets. The appendix contains a full formal specification of our programming language.

we initialize the input variables X[0], …, X[n − 1] to the values x₀, …, xₙ₋₁.

We say that such an AON-CIRC program P computes a function f : {0, 1}ⁿ → {0, 1}ᵐ if P(x) = f(x) for every x ∈ {0, 1}ⁿ.

AON-CIRC is not a practical programming language: it was designed for pedagogical purposes only, as a way to model computation as composition of AND, OR, and NOT. However, AON-CIRC can still be easily implemented on a computer. The following solved exercise gives an example of an AON-CIRC program.

Solved Exercise 3.4 Consider the following function CMP : {0, 1}⁴ → {0, 1} that on four input bits a, b, c, d ∈ {0, 1}, outputs 1 iff the number represented by (a, b) is larger than the number represented by (c, d). That is, CMP(a, b, c, d) = 1 iff 2a + b > 2c + d.

Write an AON-CIRC program to compute CMP.

Solution:

Writing such a program is tedious but not truly hard. To compare two numbers we first compare their most significant digit, and then go down to the next digit and so on and so forth. In this case where the numbers have just two binary digits, these comparisons are particularly simple. The number represented by (a, b) is larger than the number represented by (c, d) if and only if one of the following conditions happens:

1. The most significant bit a of (a, b) is larger than the most significant bit c of (c, d),

or

2. The two most significant bits a and c are equal, but b > d.

Another way to express the same condition is the following: the

number (๐‘Ž, ๐‘) is larger than (๐‘, ๐‘‘) iff ๐‘Ž > ๐‘ OR (๐‘Ž โ‰ฅ ๐‘ AND ๐‘ > ๐‘‘).For binary digits ๐›ผ, ๐›ฝ, the condition ๐›ผ > ๐›ฝ is simply that ๐›ผ = 1

and ๐›ฝ = 0 or AND(๐›ผ,NOT(๐›ฝ)) = 1, and the condition ๐›ผ โ‰ฅ ๐›ฝ is sim-ply OR(๐›ผ,NOT(๐›ฝ)) = 1. Together these observations can be used togive the following AON-CIRC program to compute CMP:

temp_1 = NOT(X[2])
temp_2 = AND(X[0],temp_1)
temp_3 = OR(X[0],temp_1)
temp_4 = NOT(X[3])


Figure 3.12: A circuit for computing the CMP function. The evaluation of this circuit on (1, 1, 1, 0) yields the output 1, since the number 3 (represented in binary as 11) is larger than the number 2 (represented in binary as 10).

temp_5 = AND(X[1],temp_4)
temp_6 = AND(temp_5,temp_3)
Y[0] = OR(temp_2,temp_6)

We can also present this straight-line program as a circuit, see Fig. 3.12.
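As a sanity check, the straight-line program above can be transcribed into Python and compared against the specification 2a + b > 2c + d on all 16 inputs (a sketch; the function CMP here takes a list X standing in for the input variables X[0], …, X[3]):

```python
def AND(a, b): return a * b
def OR(a, b): return 1 - (1 - a) * (1 - b)
def NOT(a): return 1 - a

def CMP(X):
    # The straight-line program above, transcribed line by line
    temp_1 = NOT(X[2])
    temp_2 = AND(X[0], temp_1)
    temp_3 = OR(X[0], temp_1)
    temp_4 = NOT(X[3])
    temp_5 = AND(X[1], temp_4)
    temp_6 = AND(temp_5, temp_3)
    return OR(temp_2, temp_6)

# Check against the specification 2a + b > 2c + d on all 16 inputs
assert all(CMP([a, b, c, d]) == (1 if 2*a + b > 2*c + d else 0)
           for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
```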

It turns out that AON-CIRC programs and Boolean circuits have exactly the same power:

Theorem 3.9 — Equivalence of circuits and straight-line programs. Let f : {0, 1}ⁿ → {0, 1}ᵐ and s ≥ m be some number. Then f is computable by a Boolean circuit of s gates if and only if f is computable by an AON-CIRC program of s lines.

Proof Idea:

The idea is simple: AON-CIRC programs and Boolean circuits are just different ways of describing the exact same computational process. For example, an AND gate in a Boolean circuit corresponds to computing the AND of two previously-computed values. In an AON-CIRC program this will correspond to the line that stores in a variable the AND of two previously-computed variables. ⋆

P  This proof of Theorem 3.9 is simple at heart, but all the details it contains can make it a little cumbersome to read. You might be better off trying to work it out yourself before reading it. Our GitHub repository contains a "proof by Python" of Theorem 3.9: implementations of functions circuit2prog and prog2circuits mapping Boolean circuits to AON-CIRC programs and vice versa.

Proof of Theorem 3.9. Let f : {0, 1}ⁿ → {0, 1}ᵐ. Since the theorem is an "if and only if" statement, to prove it we need to show both directions: translating an AON-CIRC program that computes f into a circuit that computes f, and translating a circuit that computes f into an AON-CIRC program that does so.

We start with the first direction. Let P be an s-line AON-CIRC program that computes f. We define a circuit C as follows: the circuit will have n inputs and s gates. For every i ∈ [s], if the i-th line has the form foo = AND(bar,blah) then the i-th gate in the circuit will be an AND gate that is connected to gates j and k, where j and k correspond to the last lines before i where the variables bar and blah (respectively)

Page 140: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

140 introduction to theoretical computer science

Figure 3.13: Two equivalent descriptions of the sameAND/OR/NOT computation as both an AON pro-gram and a Boolean circuit.

were written to. (For example, if i = 57 and the last line bar was written to is 35 and the last line blah was written to is 17, then the two in-neighbors of gate 57 will be gates 35 and 17.) If either bar or blah is an input variable then we connect the gate to the corresponding input vertex instead. If foo is an output variable of the form Y[j] then we add the same label to the corresponding gate to mark it as an output gate. We do the analogous operations if the i-th line involves an OR or a NOT operation (except that we use the corresponding OR or NOT gate, and in the latter case have only one in-neighbor instead of two). For every input x ∈ {0, 1}ⁿ, if we run the program P on x, then the value that is computed in the i-th line is exactly the value that will be assigned to the i-th gate if we evaluate the circuit C on x. Hence C(x) = P(x) for every x ∈ {0, 1}ⁿ.

For the other direction, let C be a circuit of s gates and n inputs that computes the function f. We sort the gates according to a topological order and write them as v₀, …, vₛ₋₁. We now can create a program P of s lines as follows. For every i ∈ [s], if vᵢ is an AND gate with in-neighbors vⱼ, vₖ then we will add a line to P of the form temp_i = AND(temp_j,temp_k), unless one of the vertices is an input vertex or an output gate, in which case we change this to the form X[.] or Y[.] appropriately. Because we work in topological ordering, we are guaranteed that the in-neighbors vⱼ and vₖ correspond to variables that have already been assigned a value. We do the same for OR and NOT gates. Once again, one can verify that for every input x, the value P(x) will equal C(x) and hence the program computes the same function as the circuit. (Note that since C is a valid circuit, per Definition 3.5, every input vertex of C has at least one out-neighbor and there are exactly m output gates labeled 0, …, m − 1; hence all the variables X[0], …, X[n − 1] and Y[0], …, Y[m − 1] will appear in the program P.)
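The correspondence in this proof can be made concrete with a small interpreter for straight-line programs (a simplified sketch in the spirit of, but not identical to, the circuit2prog/prog2circuits code mentioned above; error handling is omitted):

```python
import re

def run_aon_circ(program, x):
    # Evaluate an AON-CIRC program line by line; each line assigns the result
    # of AND/OR/NOT applied to already-assigned variables.
    ops = {"AND": lambda a, b: a & b,
           "OR":  lambda a, b: a | b,
           "NOT": lambda a: 1 - a}
    val = {f"X[{i}]": xi for i, xi in enumerate(x)}
    for line in program.strip().splitlines():
        target, op, args = re.match(r"(\S+) = (AND|OR|NOT)\((.*)\)", line.strip()).groups()
        val[target] = ops[op](*(val[a] for a in args.split(",")))
    # Collect output variables sorted by name (fine for fewer than ten outputs)
    return [val[v] for v in sorted(v for v in val if v.startswith("Y["))]

xor_prog = """
w1 = AND(X[0],X[1])
w2 = NOT(w1)
w3 = OR(X[0],X[1])
Y[0] = AND(w2,w3)
"""
assert [run_aon_circ(xor_prog, [a, b])[0]
        for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 0]
```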

3.4 PHYSICAL IMPLEMENTATIONS OF COMPUTING DEVICES (DIGRESSION)

Computation is an abstract notion that is distinct from its physical implementations. While most modern computing devices are obtained by mapping logical gates to semiconductor-based transistors, over history people have computed using a huge variety of mechanisms, including mechanical systems, gas and liquid (known as fluidics), biological and chemical processes, and even living creatures (e.g., see Fig. 3.14 or this video for how crabs or slime mold can be used to do computations).


Figure 3.14: Crab-based logic gates from the paper "Robust soldier-crab ball gate" by Gunji, Nishiyama and Adamatzky. This is an example of an AND gate that relies on the tendency of two swarms of crabs arriving from different directions to combine to a single swarm that continues in the average of the directions.

Figure 3.15: We can implement the logic of transistors using water. The water pressure from the gate closes or opens a faucet between the source and the sink.

In this section we will review some of these implementations, both so you can get an appreciation of how it is possible to directly translate Boolean circuits to the physical world, without going through the entire stack of architecture, operating systems, and compilers, as well as to emphasize that silicon-based processors are by no means the only way to perform computation. Indeed, as we will see in Chapter 23, a very exciting recent line of work involves using different media for computation that would allow us to take advantage of quantum mechanical effects to enable different types of algorithms.


3.4.1 Transistors

A transistor can be thought of as an electric circuit with two inputs, known as the source and the gate, and an output, known as the sink. The gate controls whether current flows from the source to the sink. In a standard transistor, if the gate is "ON" then current can flow from the source to the sink and if it is "OFF" then it can't. In a complementary transistor this is reversed: if the gate is "OFF" then current can flow from the source to the sink and if it is "ON" then it can't.

There are several ways to implement the logic of a transistor. For example, we can use faucets to implement it using water pressure (e.g., Fig. 3.15). This might seem like merely a curiosity, but there is a field known as fluidics concerned with implementing logical operations using liquids or gases. Some of the motivations include operating in extreme environmental conditions such as in space or a battlefield, where standard electronic equipment would not survive.

The standard implementations of transistors use electrical current. One of the original implementations used vacuum tubes. As its name implies, a vacuum tube is a tube containing nothing (i.e., a vacuum) in which a priori electrons could freely flow from the source (a wire) to the sink (a plate). However, there is a gate (a grid) between the two, and modulating its voltage can block the flow of electrons.

Early vacuum tubes were roughly the size of lightbulbs (and looked very much like them too). In the 1950s they were supplanted by transistors, which implement the same logic using semiconductors: materials that normally do not conduct electricity but whose conductivity can be modified and controlled by inserting impurities ("doping") and applying an external electric field (this is known as the field effect). In the 1960s computers started to be implemented using integrated circuits, which enabled much greater density. In 1965, Gordon Moore predicted that the number of transistors per integrated circuit would double every year (see Fig. 3.16), and that this would lead to "such wonders as home computers —or at least terminals connected to a central computer— automatic controls for automobiles, and personal portable communications equipment". Since then, (adjusted versions of) this so-called "Moore's law" have been running strong, though exponential growth cannot be sustained forever, and some physical limitations are already becoming apparent.

Figure 3.16: The number of transistors per integrated circuit from 1959 till 1965 and a prediction that exponential growth will continue for at least another decade. Figure taken from "Cramming More Components onto Integrated Circuits", Gordon Moore, 1965.

Figure 3.17: Cartoon from Gordon Moore's article "predicting" the implications of radically improving transistor density.

Figure 3.18: The exponential growth in computing power over the last 120 years. Graph by Steve Jurvetson, extending a prior graph of Ray Kurzweil.

Figure 3.19: Implementing logical gates using transistors. Figure taken from Rory Mangles' website.

3.4.2 Logical gates from transistors

We can use transistors to implement various Boolean functions such as AND, OR, and NOT. For each two-input gate G : {0,1}^2 → {0,1}, such an implementation would be a system with two input wires x, y and one output wire z, such that if we identify high voltage with "1" and low voltage with "0", then the wire z will equal "1" if and only if applying G to the values of the wires x and y is 1 (see Fig. 3.19 and Fig. 3.20). This means that if there exists an AND/OR/NOT circuit to compute a function g : {0,1}^n → {0,1}^m, then we can compute g in the physical world using transistors as well.

3.4.3 Biological computing

Computation can be based on biological or chemical systems. For example, the lac operon produces the enzymes needed to digest lactose only if the conditions x ∧ (¬y) hold, where x is "lactose is present" and y is "glucose is present". Researchers have managed to create transistors, and from them logic gates, based on DNA molecules (see also Fig. 3.21). One motivation for DNA computing is to achieve increased parallelism or storage density; another is to create "smart biological agents" that could perhaps be injected into bodies, replicate themselves, and fix or kill cells that were damaged by a disease such as cancer. Computing in biological systems is not restricted, of course, to DNA: even larger systems such as flocks of birds can be considered as computational processes.

3.4.4 Cellular automata and the game of life

Cellular automata is a model of a system composed of a sequence of cells, each of which can have a finite state. At each step, a cell updates its state based on the states of its neighboring cells and some simple rules. As we will discuss later in this book (see Section 8.4), cellular automata such as Conway's "Game of Life" can be used to simulate logic gates.

3.4.5 Neural networks

One computation device that we all carry with us is our own brain. Brains have served humanity throughout history, doing computations that range from distinguishing prey from predators, through making scientific discoveries and artistic masterpieces, to composing witty 280-character messages. The exact working of the brain is still not fully


Figure 3.20: Implementing a NAND gate (see Section 3.5) using transistors.

Figure 3.21: Performance of DNA-based logic gates. Figure taken from the paper of Bonnet et al, Science, 2013.

Figure 3.22: An AND gate using a "Game of Life" configuration. Figure taken from Jean-Philippe Rennard's paper.

understood, but one common mathematical model for it is a (very large) neural network.

A neural network can be thought of as a Boolean circuit that instead of AND/OR/NOT uses some other gates as the basic basis. For example, one particular basis we can use is threshold gates. For every vector w = (w_0, …, w_{k-1}) of integers and integer t (some or all of which could be negative), the threshold function corresponding to w, t is the function T_{w,t} : {0,1}^k → {0,1} that maps x ∈ {0,1}^k to 1 if and only if ∑_{i=0}^{k-1} w_i x_i ≥ t. For example, the threshold function T_{w,t} corresponding to w = (1,1,1,1,1) and t = 3 is simply the majority function MAJ_5 on {0,1}^5. Threshold gates can be thought of as an approximation for neuron cells that make up the core of human and animal brains. To a first approximation, a neuron has k inputs and a single output, and the neuron "fires" or "turns on" its output when those signals pass some threshold.
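The threshold function can be sketched in a few lines of Python; the helper name threshold is ours, for illustration only:

```python
from itertools import product

def threshold(w, t, x):
    """Threshold function T_{w,t}: output 1 iff sum_i w_i * x_i >= t."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

# Sanity check: w = (1,1,1,1,1) and t = 3 give the majority function on 5 bits.
for x in product([0, 1], repeat=5):
    assert threshold([1, 1, 1, 1, 1], 3, x) == (1 if sum(x) >= 3 else 0)
```

Note that the weights may be negative, which is what lets threshold gates express non-monotone behavior.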

Many machine learning algorithms use artificial neural networks whose purpose is not to imitate biology but rather to perform some computational tasks, and hence are not restricted to threshold or other biologically-inspired gates. Generally, a neural network is often described as operating on signals that are real numbers, rather than 0/1 values, and where the output of a gate on inputs x_0, …, x_{k-1} is obtained by applying f(∑_i w_i x_i) where f : ℝ → ℝ is an activation function such as rectified linear unit (ReLU), Sigmoid, or many others (see Fig. 3.23). However, for the purposes of our discussion, all of the above are equivalent (see also Exercise 3.13). In particular we can reduce the setting of real inputs to binary inputs by representing a real number in the binary basis, and multiplying the weight of the bit corresponding to the i-th digit by 2^i.

3.4.6 A computer made from marbles and pipes

We can implement computation using many other physical media, without any electronic, biological, or chemical components. Many suggestions for mechanical computers have been put forward, going back at least to Gottfried Leibniz' computing machines from the 1670s and Charles Babbage's 1837 plan for a mechanical "Analytical Engine". As one example, Fig. 3.24 shows a simple implementation of a NAND (negation of AND, see Section 3.5) gate using marbles going through pipes. We represent a logical value in {0,1} by a pair of pipes, such that there is a marble flowing through exactly one of the pipes. We call one of the pipes the "0 pipe" and the other the "1 pipe", and so the identity of the pipe containing the marble determines the logical value. A NAND gate corresponds to a mechanical object with two pairs of incoming pipes and one pair of outgoing pipes, such that for every a, b ∈ {0,1}, if two marbles are rolling toward the object in the


Figure 3.23: Common activation functions used in neural networks, including rectified linear units (ReLU), sigmoids, and hyperbolic tangent. All of those can be thought of as continuous approximations to the simple step function. All of these can be used to compute the NAND gate (see Exercise 3.13). This property enables neural networks to (approximately) compute any function that can be computed by a Boolean circuit.

Figure 3.24: A physical implementation of a NAND gate using marbles. Each wire in a Boolean circuit is modeled by a pair of pipes representing the values 0 and 1 respectively, and hence a gate has four input pipes (two for each logical input) and two output pipes. If one of the input pipes representing the value 0 has a marble in it then that marble will flow to the output pipe representing the value 1. (The dashed line represents a gadget that will ensure that at most one marble is allowed to flow onward in the pipe.) If both the input pipes representing the value 1 have marbles in them, then the first marble will be stuck but the second one will flow onwards to the output pipe representing the value 0.

Figure 3.25: A "gadget" in a pipe that ensures that at most one marble can pass through it. The first marble that passes causes the barrier to lift and block new ones.

a pipe of the first pair and the b pipe of the second pair, then a marble will roll out of the object in the NAND(a, b)-pipe of the outgoing pair. In fact, there is even a commercially-available educational game that uses marbles as a basis of computing, see Fig. 3.26.

3.5 THE NAND FUNCTION

The NAND function is another simple function that is extremely useful for defining computation. It is the function mapping {0,1}^2 to {0,1} defined by:

NAND(๐‘Ž, ๐‘) = โŽงโŽจโŽฉ0 ๐‘Ž = ๐‘ = 11 otherwise. (3.9)

As its name implies, NAND is the NOT of AND (i.e., NAND(a, b) = NOT(AND(a, b))), and so we can clearly compute NAND using AND and NOT. Interestingly, the opposite direction holds as well:

Theorem 3.10 — NAND computes AND, OR, NOT. We can compute AND, OR, and NOT by composing only the NAND function.

Proof. We start with the following observation. For every a ∈ {0,1}, AND(a, a) = a. Hence, NAND(a, a) = NOT(AND(a, a)) = NOT(a). This means that NAND can compute NOT. By the principle of "double negation", AND(a, b) = NOT(NOT(AND(a, b))), and hence we can use NAND to compute AND as well. Once we can compute AND and NOT, we can compute OR using "De Morgan's Law": OR(a, b) = NOT(AND(NOT(a), NOT(b))) (which can also be written as a ∨ b = ¬(¬a ∧ ¬b)) for every a, b ∈ {0,1}.
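The identities in this proof are easy to check exhaustively. Here is a short Python sketch (function names ours) that verifies all of them over every choice of inputs:

```python
def NAND(a, b):
    return 0 if a == b == 1 else 1

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    return NAND(NAND(a, b), NAND(a, b))   # double negation

def OR(a, b):
    return NAND(NAND(a, a), NAND(b, b))   # De Morgan's law

# Exhaustive check over all inputs in {0,1}.
for a in (0, 1):
    for b in (0, 1):
        assert NOT(a) == 1 - a
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
```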

Theorem 3.10's proof is very simple, but you should make sure that (i) you understand the statement of the theorem, and (ii) you follow its proof. In particular, you should make sure you understand why De Morgan's law is true.

We can use NAND to compute many other functions, as demonstrated in the following exercise.

Solved Exercise 3.5 — Compute majority with NAND. Let MAJ : {0,1}^3 → {0,1} be the function that on input a, b, c outputs 1 iff a + b + c ≥ 2. Show how to compute MAJ using a composition of NANDs.


Figure 3.26: The game "Turing Tumble" contains an implementation of logical gates using marbles.

Figure 3.27: A circuit with NAND gates to compute the Majority function on three bits.

Solution:

Recall that (3.5) states that

MAJ(x_0, x_1, x_2) = OR( AND(x_0, x_1) , OR( AND(x_1, x_2) , AND(x_0, x_2) ) ).   (3.10)

We can use Theorem 3.10 to replace all the occurrences of AND and OR with NANDs. Specifically, we can use the equivalences AND(a, b) = NOT(NAND(a, b)), OR(a, b) = NAND(NOT(a), NOT(b)), and NOT(a) = NAND(a, a) to replace the right-hand side of (3.10) with an expression involving only NAND, yielding that MAJ(a, b, c) is equivalent to the (somewhat unwieldy) expression

NAND( NAND( NAND(NAND(b, c), NAND(a, c)) , NAND(NAND(b, c), NAND(a, c)) ) , NAND(a, b) )   (3.11)

The same formula can also be expressed as a circuit with NAND gates, see Fig. 3.27.
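An expression like this can be verified by brute force over all eight inputs. The following Python sketch (names ours) checks one NAND-only expression for MAJ obtained from the substitutions above:

```python
def NAND(a, b):
    return 0 if a == b == 1 else 1

def MAJ_nand(a, b, c):
    # w computes OR(AND(b,c), AND(a,c)); NAND(w,w) is its negation,
    # and the outer NAND combines it with NOT(AND(a,b)) = NAND(a,b).
    w = NAND(NAND(b, c), NAND(a, c))
    return NAND(NAND(w, w), NAND(a, b))

# Exhaustive check against the definition of majority on three bits.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert MAJ_nand(a, b, c) == (1 if a + b + c >= 2 else 0)
```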

3.5.1 NAND Circuits

We define NAND circuits as circuits in which all the gates are NAND operations. Such a circuit again corresponds to a directed acyclic graph (DAG). Since all the gates correspond to the same function (i.e., NAND), we do not even need to label them, and all gates have in-degree exactly two. Despite their simplicity, NAND circuits can be quite powerful.

Example 3.11 — NAND circuit for XOR. Recall the XOR function which maps x_0, x_1 ∈ {0,1} to x_0 + x_1 mod 2. We have seen in Section 3.2.2 that we can compute XOR using AND, OR, and NOT, and so by Theorem 3.10 we can compute it using only NANDs. However, the following is a direct construction of computing XOR by a sequence of NAND operations:

1. Let u = NAND(x_0, x_1).
2. Let v = NAND(x_0, u).
3. Let w = NAND(x_1, u).
4. The XOR of x_0 and x_1 is y_0 = NAND(v, w).


Figure 3.28: A circuit with NAND gates to computethe XOR of two bits.

One can verify that this algorithm does indeed compute XOR by enumerating all the four choices for x_0, x_1 ∈ {0,1}. We can also represent this algorithm graphically as a circuit, see Fig. 3.28.
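This enumeration can itself be done in a few lines of Python (names ours), which render the four steps above directly:

```python
def NAND(a, b):
    return 0 if a == b == 1 else 1

def XOR(x0, x1):
    u = NAND(x0, x1)
    v = NAND(x0, u)
    w = NAND(x1, u)
    return NAND(v, w)

# Check all four input choices against x0 + x1 mod 2.
for x0 in (0, 1):
    for x1 in (0, 1):
        assert XOR(x0, x1) == (x0 + x1) % 2
```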

In fact, we can show the following theorem:

Theorem 3.12 — NAND is a universal operation. For every Boolean circuit C of s gates, there exists a NAND circuit C′ of at most 3s gates that computes the same function as C.

Proof Idea:

The idea of the proof is to just replace every AND, OR, and NOT gate with its NAND implementation following the proof of Theorem 3.10. ⋆

Proof of Theorem 3.12. If C is a Boolean circuit, then since, as we've seen in the proof of Theorem 3.10, for every a, b ∈ {0,1}:

• NOT(a) = NAND(a, a)
• AND(a, b) = NAND(NAND(a, b), NAND(a, b))
• OR(a, b) = NAND(NAND(a, a), NAND(b, b))

we can replace every gate of C with at most three NAND gates to obtain an equivalent circuit C′. The resulting circuit will have at most 3s gates.

Big Idea 3 Two models are equivalent in power if they can be used to compute the same set of functions.

3.5.2 More examples of NAND circuits (optional)

Here are some more sophisticated examples of NAND circuits:

Incrementing integers. Consider the task of computing, given as input a string x ∈ {0,1}^n that represents a natural number X ∈ ℕ, the representation of X + 1. That is, we want to compute the function INC_n : {0,1}^n → {0,1}^{n+1} such that for every x_0, …, x_{n-1}, INC_n(x) = y which satisfies ∑_{i=0}^{n} y_i 2^i = (∑_{i=0}^{n-1} x_i 2^i) + 1. (For simplicity of notation, in this example we use the representation where the least significant digit is first rather than last.)

The increment operation can be very informally described as follows: "Add 1 to the least significant bit and propagate the carry". A little


Figure 3.29: NAND circuit computing the increment function on 4 bits.

more precisely, in the case of the binary representation, to obtain the increment of x, we scan x from the least significant bit onwards, and flip all 1's to 0's until we encounter a bit equal to 0, in which case we flip it to 1 and stop.

Thus we can compute the increment of x_0, …, x_{n-1} by doing the following:

Algorithm 3.13 — Compute Increment Function.

Input: x_0, x_1, …, x_{n-1} representing the number ∑_{i=0}^{n-1} x_i · 2^i  # we use LSB-first representation

Output: y ∈ {0,1}^{n+1} such that ∑_{i=0}^{n} y_i · 2^i = ∑_{i=0}^{n-1} x_i · 2^i + 1

1: Let c_0 ← 1  # we pretend we have a "carry" of 1 initially
2: for i = 0, …, n−1 do
3:   Let y_i ← XOR(x_i, c_i).
4:   if c_i = x_i = 1 then
5:     c_{i+1} = 1
6:   else
7:     c_{i+1} = 0
8:   end if
9: end for
10: Let y_n ← c_n.

Algorithm 3.13 describes precisely how to compute the increment

operation, and can be easily transformed into Python code that performs the same computation, but it does not seem to directly yield a NAND circuit to compute this. However, we can transform this algorithm line by line to a NAND circuit. For example, since for every a, NAND(a, NOT(a)) = 1, we can replace the initial statement c_0 = 1 with c_0 = NAND(x_0, NAND(x_0, x_0)). We already know how to compute XOR using NAND and so we can use this to implement the operation y_i ← XOR(x_i, c_i). Similarly, we can write the "if" statement as saying c_{i+1} ← AND(c_i, x_i), or in other words c_{i+1} ← NAND(NAND(c_i, x_i), NAND(c_i, x_i)). Finally, the assignment y_n = c_n can be written as y_n = NAND(NAND(c_n, c_n), NAND(c_n, c_n)). Combining these observations yields for every n ∈ ℕ, a NAND circuit to compute INC_n. For example, Fig. 3.29 shows what this circuit looks like for n = 4.

From increment to addition. Once we have the increment operation, we can certainly compute addition by repeatedly incrementing (i.e., compute x + y by performing INC(x) y times). However, that would be quite inefficient and unnecessary. With the same idea of keeping track of carries we can implement the "grade-school" addition algorithm and compute the function ADD_n : {0,1}^{2n} → {0,1}^{n+1} that on


input ๐‘ฅ โˆˆ 0, 12๐‘› outputs the binary representation of the sum of thenumbers represented by ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1 and ๐‘ฅ๐‘›+1, โ€ฆ , ๐‘ฅ๐‘›:

Algorithm 3.14 — Addition using NAND.

Input: ๐‘ข โˆˆ 0, 1๐‘›, ๐‘ฃ โˆˆ 0, 1๐‘› representing numbers inLSB-first binary representation.

Output: LSB-first binary representation of ๐‘ฅ + ๐‘ฆ.1: Let ๐‘0 โ† 02: for ๐‘– = 0, โ€ฆ , ๐‘› โˆ’ 1 do3: Let ๐‘ฆ๐‘– โ† ๐‘ข๐‘– + ๐‘ฃ๐‘– mod 24: if ๐‘ข๐‘– + ๐‘ฃ๐‘– + ๐‘๐‘– โ‰ฅ 2 then5: ๐‘๐‘–+1 โ† 16: else7: ๐‘๐‘–+1 โ† 08: end if9: end for

10: Let ๐‘ฆ๐‘› โ† ๐‘๐‘›Once again, Algorithm 3.14 can be translated into a NAND cir-

cuit. The crucial observation is that the โ€œif/thenโ€ statement simplycorresponds to ๐‘๐‘–+1 โ† MAJ3(๐‘ข๐‘–, ๐‘ฃ๐‘–, ๐‘ฃ๐‘–) and we have seen in SolvedExercise 3.5 that the function MAJ3 โˆถ 0, 13 โ†’ 0, 1 can be computedusing NANDs.
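Both Algorithm 3.13 and Algorithm 3.14 translate directly into Python. The following sketch (the helper names inc, add, and to_int are ours) follows them line by line on LSB-first lists of bits:

```python
def inc(x):
    """Algorithm 3.13: increment a number given as an LSB-first bit list."""
    c = 1                                   # initial "carry" of 1
    y = []
    for i in range(len(x)):
        y.append(x[i] ^ c)                  # y_i = XOR(x_i, c_i)
        c = 1 if c == 1 and x[i] == 1 else 0
    y.append(c)                             # y_n = c_n
    return y

def add(u, v):
    """Algorithm 3.14: grade-school addition of two LSB-first n-bit lists."""
    c = 0
    y = []
    for i in range(len(u)):
        s = u[i] + v[i] + c
        y.append(s % 2)                     # y_i = u_i + v_i + c_i mod 2
        c = 1 if s >= 2 else 0              # c_{i+1} = MAJ(u_i, v_i, c_i)
    y.append(c)
    return y

def to_int(bits):
    """Interpret an LSB-first bit list as an integer."""
    return sum(b << i for i, b in enumerate(bits))

assert to_int(inc([1, 1, 0, 1])) == to_int([1, 1, 0, 1]) + 1
assert to_int(add([1, 0, 1], [1, 1, 0])) == 5 + 3
```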

3.5.3 The NAND-CIRC Programming language

Just like we did for Boolean circuits, we can define a programming-language analog of NAND circuits. It is even simpler than the AON-CIRC language since we only have a single operation. We define the NAND-CIRC Programming Language to be a programming language where every line has the following form:

foo = NAND(bar,blah)

where foo, bar and blah are variable identifiers.

Example 3.15 — Our first NAND-CIRC program. Here is an example of a NAND-CIRC program:

u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)


Figure 3.30: A NAND program and the corresponding circuit. Note how every line in the program corresponds to a gate in the circuit.

Do you know what function this program computes? Hint: you have seen it before.

Formally, just like we did in Definition 3.8 for AON-CIRC, we can define the notion of computation by a NAND-CIRC program in the natural way:

Definition 3.16 — Computing by a NAND-CIRC program. Let f : {0,1}^n → {0,1}^m be some function, and let P be a NAND-CIRC program. We say that P computes the function f if:

1. P has n input variables X[0], …, X[n−1] and m output variables Y[0], …, Y[m−1].

2. For every x ∈ {0,1}^n, if we execute P when we assign to X[0], …, X[n−1] the values x_0, …, x_{n-1}, then at the end of the execution, the output variables Y[0], …, Y[m−1] have the values y_0, …, y_{m-1} where y = f(x).

As before we can show that NAND circuits are equivalent to NAND-CIRC programs (see Fig. 3.30):

Theorem 3.17 — NAND circuits and straight-line program equivalence. For every f : {0,1}^n → {0,1}^m and s ≥ m, f is computable by a NAND-CIRC program of s lines if and only if f is computable by a NAND circuit of s gates.

We omit the proof of Theorem 3.17 since it follows along exactly the same lines as the equivalence of Boolean circuits and AON-CIRC programs (Theorem 3.9). Given Theorem 3.17 and Theorem 3.12, we know that we can translate every s-line AON-CIRC program P into an equivalent NAND-CIRC program of at most 3s lines. In fact, this translation can be easily done by replacing every line of the form foo = AND(bar,blah), foo = OR(bar,blah) or foo = NOT(bar) with the equivalent 1-3 lines that use the NAND operation. Our GitHub repository contains a "proof by code": a simple Python program AON2NAND that transforms an AON-CIRC into an equivalent NAND-CIRC program.
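In the same "proof by code" spirit, the semantics of Definition 3.16 can be made concrete with a minimal evaluator for NAND-CIRC programs. The eval_nand_circ sketch below is ours (not the code from the repository) and assumes one well-formed assignment per line:

```python
import re

def eval_nand_circ(program, x):
    """Evaluate a NAND-CIRC program (one `foo = NAND(bar,blah)` per line)
    on the list of input bits x, returning the output bits Y[0..m-1]."""
    env = {f"X[{i}]": xi for i, xi in enumerate(x)}
    for line in program.strip().splitlines():
        target, a, b = re.match(
            r"(\S+)\s*=\s*NAND\((\S+),(\S+)\)", line.strip()).groups()
        env[target] = 1 - (env[a] & env[b])   # NAND(a,b) = NOT(AND(a,b))
    m = sum(1 for name in env if name.startswith("Y["))
    return [env[f"Y[{j}]"] for j in range(m)]

# The program from Example 3.15:
xor_prog = """u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)"""

assert [eval_nand_circ(xor_prog, [a, b])[0]
        for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 0]
```

Running the program on all four inputs confirms the answer to the earlier hint: it computes XOR.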

Remark 3.18 — Is the NAND-CIRC programming language Turing Complete? (optional note). You might have heard of a term called "Turing Complete" that is sometimes


used to describe programming languages. (If you haven't, feel free to ignore the rest of this remark: we define this term precisely in Chapter 8.) If so, you might wonder if the NAND-CIRC programming language has this property. The answer is no, or perhaps more accurately, the term "Turing Completeness" is not really applicable to the NAND-CIRC programming language. The reason is that, by design, the NAND-CIRC programming language can only compute finite functions F : {0,1}^n → {0,1}^m that take a fixed number of input bits and produce a fixed number of output bits. The term "Turing Complete" is only applicable to programming languages for infinite functions that can take inputs of arbitrary length. We will come back to this distinction later on in this book.

3.6 EQUIVALENCE OF ALL THESE MODELS

If we put together Theorem 3.9, Theorem 3.12, and Theorem 3.17, we obtain the following result:

Theorem 3.19 — Equivalence between models of finite computation. For every sufficiently large s, n, m and f : {0,1}^n → {0,1}^m, the following conditions are all equivalent to one another:

โ€ข ๐‘“ can be computed by a Boolean circuit (with โˆง, โˆจ, ยฌ gates) of atmost ๐‘‚(๐‘ ) gates.

โ€ข ๐‘“ can be computed by an AON-CIRC straight-line program of atmost ๐‘‚(๐‘ ) lines.

โ€ข ๐‘“ can be computed by a NAND circuit of at most ๐‘‚(๐‘ ) gates.

โ€ข ๐‘“ can be computed by a NAND-CIRC straight-line program of atmost ๐‘‚(๐‘ ) lines.

By "O(s)" we mean that the bound is at most c · s where c is a constant that is independent of n. For example, if f can be computed by a Boolean circuit of s gates, then it can be computed by a NAND-CIRC program of at most 3s lines, and if f can be computed by a NAND circuit of s gates, then it can be computed by an AON-CIRC program of at most 2s lines.

Proof Idea:

We omit the formal proof, which is obtained by combining Theorem 3.9, Theorem 3.12, and Theorem 3.17. The key observation is that the results we have seen allow us to translate a program/circuit that computes f in one of the above models into a program/circuit that


computes ๐‘“ in another model by increasing the lines/gates by at mosta constant factor (in fact this constant factor is at most 3).โ‹†

Theorem 3.9 is a special case of a more general result. We can consider even more general models of computation, where instead of AND/OR/NOT or NAND, we use other operations (see Section 3.6.1 below). It turns out that Boolean circuits are equivalent in power to such models as well. The fact that all these different ways to define computation lead to equivalent models shows that we are "on the right track". It justifies the seemingly arbitrary choices that we've made of using AND/OR/NOT or NAND as our basic operations, since these choices do not affect the power of our computational model. Equivalence results such as Theorem 3.19 mean that we can easily translate between Boolean circuits, NAND circuits, NAND-CIRC programs and the like. We will use this ability later on in this book, often shifting to the most convenient formulation without making a big deal about it. Hence we will not worry too much about the distinction between, for example, Boolean circuits and NAND-CIRC programs.

In contrast, we will continue to take special care to distinguish between circuits/programs and functions (recall Big Idea 2). A function corresponds to a specification of a computational task, and it is a fundamentally different object than a program or a circuit, which corresponds to the implementation of the task.

3.6.1 Circuits with other gate sets

There is nothing special about AND/OR/NOT or NAND. For every set of functions 𝒢 = {G_0, …, G_{k-1}}, we can define a notion of circuits that use elements of 𝒢 as gates, and a notion of a "𝒢 programming language" where every line involves assigning to a variable foo the result of applying some G_i ∈ 𝒢 to previously defined or input variables. Specifically, we can make the following definition:

Definition 3.20 — General straight-line programs. Let ℱ = {f_0, …, f_{t-1}} be a finite collection of Boolean functions, such that f_i : {0,1}^{k_i} → {0,1} for some k_i ∈ ℕ. An ℱ program is a sequence of lines, each of which assigns to some variable the result of applying some f_i ∈ ℱ to k_i other variables. As above, we use X[i] and Y[j] to denote the input and output variables.

We say that ℱ is a universal set of operations (also known as a universal gate set) if there exists an ℱ program to compute the function NAND.


3 One can also define these functions as taking a length-zero input. This makes no difference for the computational power of the model.

AON-CIRC programs correspond to {AND, OR, NOT} programs, NAND-CIRC programs correspond to ℱ programs for the set ℱ that only contains the NAND function, but we can also define {IF, ZERO, ONE} programs (see below), or use any other set.

We can also define ℱ circuits, which will be directed graphs in which each gate corresponds to applying a function f_i ∈ ℱ, and will each have k_i incoming wires and a single outgoing wire. (If the function f_i is not symmetric, in the sense that the order of its input matters, then we need to label each wire entering a gate as to which parameter of the function it corresponds to.) As in Theorem 3.9, we can show that ℱ circuits and ℱ programs are equivalent. We have seen that for ℱ = {AND, OR, NOT}, the resulting circuits/programs are equivalent in power to the NAND-CIRC programming language, as we can compute NAND using AND/OR/NOT and vice versa. This turns out to be a special case of a general phenomenon (the universality of NAND and other gate sets) that we will explore more in depth later in this book.

Example 3.21 — {IF, ZERO, ONE} circuits. Let ℱ = {IF, ZERO, ONE} where ZERO : {0,1} → {0} and ONE : {0,1} → {1} are the constant zero and one functions,3 and IF : {0,1}^3 → {0,1} is the function that on input (a, b, c) outputs b if a = 1 and c otherwise. Then ℱ is universal.

Indeed, we can demonstrate that {IF, ZERO, ONE} is universal using the following formula for NAND:

NAND(๐‘Ž, ๐‘) = IF(๐‘Ž, IF(๐‘,ZERO,ONE),ONE) . (3.12)

There are also some sets ℱ that are more restricted in power. For example, it can be shown that if we use only AND or OR gates (without NOT) then we do not get an equivalent model of computation. The exercises cover several examples of universal and non-universal gate sets.

3.6.2 Specification vs. implementation (again)

As we discussed in Section 2.6.1, one of the most important distinctions in this book is that of specification versus implementation, or separating "what" from "how" (see Fig. 3.31). A function corresponds to the specification of a computational task, that is, what output should be produced for every particular input. A program (or circuit, or any other way to specify algorithms) corresponds to the implementation of how to compute the desired output from the input. That is, a program is a set of instructions for how to compute the output from the input. Even within the same computational model there can be many different ways to compute the same function. For example, there is more


Figure 3.31: It is crucial to distinguish between the specification of a computational task, namely what is the function that is to be computed, and the implementation of it, namely the algorithm, program, or circuit that contains the instructions defining how to map an input to an output. The same function could be computed in many different ways.

than one NAND-CIRC program that computes the majority function, more than one Boolean circuit to compute the addition function, and so on and so forth.

Confusing specification and implementation (or equivalently functions and programs) is a common mistake, and one that is unfortunately encouraged by the common programming-language terminology of referring to parts of programs as "functions". However, in both the theory and practice of computer science, it is important to maintain this distinction, and it is particularly important for us in this book.

Chapter Recap

• An algorithm is a recipe for performing a computation as a sequence of "elementary" or "simple" operations.

• One candidate definition for "elementary" operations is the set AND, OR and NOT.

• Another candidate definition for an "elementary" operation is the NAND operation. It is an operation that is easily implementable in the physical world in a variety of methods including by electronic transistors.

• We can use NAND to compute many other functions, including majority, increment, and others.

• There are other equivalent choices, including the sets {AND, OR, NOT} and {IF, ZERO, ONE}.

• We can formally define the notion of a function F : {0,1}^n → {0,1}^m being computable using the NAND-CIRC Programming language.

• For every set of basic operations, the notions of being computable by a circuit and being computable by a straight-line program are equivalent.


4 Hint: Use the fact that MAJ(a, b, c) = NOT(MAJ(NOT(a), NOT(b), NOT(c))) to prove that every f : {0,1}^n → {0,1} computable by a circuit with only MAJ and NOT gates satisfies f(0, 0, …, 0) ≠ f(1, 1, …, 1). Thanks to Nathan Brunelle and David Evans for suggesting this exercise.

3.7 EXERCISES

Exercise 3.1 — Compare 4-bit numbers. Give a Boolean circuit (with AND/OR/NOT gates) that computes the function CMP_8 : {0,1}^8 → {0,1} such that CMP_8(a_0, a_1, a_2, a_3, b_0, b_1, b_2, b_3) = 1 if and only if the number represented by a_0a_1a_2a_3 is larger than the number represented by b_0b_1b_2b_3.

Exercise 3.2 — Compare n-bit numbers. Prove that there exists a constant c such that for every n there is a Boolean circuit (with AND/OR/NOT gates) C of at most c ⋅ n gates that computes the function CMP_{2n} : {0,1}^{2n} → {0,1} such that CMP_{2n}(a_0 ⋯ a_{n−1} b_0 ⋯ b_{n−1}) = 1 if and only if the number represented by a_0 ⋯ a_{n−1} is larger than the number represented by b_0 ⋯ b_{n−1}.

Exercise 3.3 — {OR, NOT} is universal. Prove that the set {OR, NOT} is universal, in the sense that one can compute NAND using these gates.

Exercise 3.4 — {AND, OR} is not universal. Prove that for every n-bit input circuit C that contains only AND and OR gates, as well as gates that compute the constant functions 0 and 1, C is monotone, in the sense that if x, x′ ∈ {0,1}^n and x_i ≤ x′_i for every i ∈ [n], then C(x) ≤ C(x′).

Conclude that the set {AND, OR, 0, 1} is not universal.

Exercise 3.5 — XOR is not universal. Prove that for every n-bit input circuit C that contains only XOR gates, as well as gates that compute the constant functions 0 and 1, C is affine or linear modulo two, in the sense that there exist some a ∈ {0,1}^n and b ∈ {0,1} such that for every x ∈ {0,1}^n, C(x) = Σ_{i=0}^{n−1} a_i x_i + b mod 2.

Conclude that the set {XOR, 0, 1} is not universal.

Exercise 3.6 — {MAJ, NOT, 1} is universal. Let MAJ : {0,1}^3 → {0,1} be the majority function. Prove that {MAJ, NOT, 1} is a universal set of gates.

Exercise 3.7 — {MAJ, NOT} is not universal. Prove that {MAJ, NOT} is not a universal set. See footnote for hint.4

Exercise 3.8 — NOR is universal. Let NOR : {0,1}^2 → {0,1} be defined as NOR(a, b) = NOT(OR(a, b)). Prove that {NOR} is a universal set of gates.


5 Thanks to Alec Sun and Simon Fischer for comments on this problem.

6 Hint: Use the conditions of Definition 3.5 stipulating that every input vertex has at least one out-neighbor and there are exactly m output gates. See also Remark 3.7.

Exercise 3.9 — Lookup is universal. Prove that {LOOKUP_1, 0, 1} is a universal set of gates, where 0 and 1 are the constant functions and LOOKUP_1 : {0,1}^3 → {0,1} satisfies: LOOKUP_1(a, b, c) equals a if c = 0 and equals b if c = 1.

Exercise 3.10 — Bound on universal basis size (challenge). Prove that for every subset B of the functions from {0,1}^k to {0,1}, if B is universal then there is a B-circuit of at most O(1) gates to compute the NAND function (you can start by showing that there is a B-circuit of at most O(k^16) gates).5

Exercise 3.11 — Size and inputs / outputs. Prove that for every NAND circuit of size s with n inputs and m outputs, s ≥ min{n/2, m}. See footnote for hint.6

Exercise 3.12 — Threshold using NANDs. Prove that there is some constant c such that for every n > 1 and integers a_0, …, a_{n−1}, b ∈ {−2^n, −2^n + 1, …, −1, 0, +1, …, 2^n}, there is a NAND circuit with at most cn^4 gates that computes the threshold function f_{a_0,…,a_{n−1},b} : {0,1}^n → {0,1} that on input x ∈ {0,1}^n outputs 1 if and only if Σ_{i=0}^{n−1} a_i x_i > b.

Exercise 3.13 — NANDs from activation functions. We say that a function f : ℝ^2 → ℝ is a NAND approximator if it has the following property: for every a, b ∈ ℝ, if min{|a|, |1 − a|} ≤ 1/3 and min{|b|, |1 − b|} ≤ 1/3 then |f(a, b) − NAND(⌊a⌉, ⌊b⌉)| ≤ 1/3, where we denote by ⌊x⌉ the integer closest to x. That is, if a, b are within distance 1/3 of {0, 1} then we want f(a, b) to equal the NAND of the values in {0, 1} that are closest to a and b respectively. Otherwise, we do not care what the output of f is on a and b.

In this exercise you will show that you can construct a NAND approximator from many common activation functions used in deep neural networks. As a corollary you will obtain that deep neural networks can simulate NAND circuits. Since NAND circuits can also simulate deep neural networks, these two computational models are equivalent to one another.

1. Show that there is a NAND approximator f defined as f(a, b) = L(ReLU(L′(a, b))), where L′ : ℝ^2 → ℝ is an affine function (of the form L′(a, b) = αa + βb + γ for some α, β, γ ∈ ℝ), L is an affine function (of the form L(y) = αy + β for α, β ∈ ℝ), and ReLU : ℝ → ℝ is the function defined as ReLU(x) = max{0, x}.


7 One approach to solve this is using recursion and analyzing it using the so-called "Master Theorem".

8 Hint: Vertices in layers beyond the output can be safely removed without changing the functionality of the circuit.

2. Show that there is a NAND approximator f defined as f(a, b) = L(sigmoid(L′(a, b))), where L′, L are affine as above and sigmoid : ℝ → ℝ is the function defined as sigmoid(x) = e^x/(e^x + 1).

3. Show that there is a NAND approximator f defined as f(a, b) = L(tanh(L′(a, b))), where L′, L are affine as above and tanh : ℝ → ℝ is the function defined as tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}).

4. Prove that for every NAND-circuit C with n inputs and one output that computes a function g : {0,1}^n → {0,1}, if we replace every gate of C with a NAND approximator and then invoke the resulting circuit on some x ∈ {0,1}^n, the output will be a number y such that |y − g(x)| ≤ 1/3.

Exercise 3.14 — Majority with NANDs efficiently. Prove that there is some constant c such that for every n > 1, there is a NAND circuit of at most c ⋅ n gates that computes the majority function on n input bits MAJ_n : {0,1}^n → {0,1}. That is, MAJ_n(x) = 1 iff Σ_{i=0}^{n−1} x_i > n/2. See footnote for hint.7

Exercise 3.15 — Output at last layer. Prove that for every f : {0,1}^n → {0,1}, if there is a Boolean circuit C of s gates that computes f, then there is a Boolean circuit C′ of at most s gates such that in the minimal layering of C′, the output gate of C′ is placed in the last layer. See footnote for hint.8

3.8 BIOGRAPHICAL NOTES

The excerpt from Al-Khwarizmi's book is from "The Algebra of Ben-Musa", Fredric Rosen, 1831.

Charles Babbage (1791-1871) was a visionary scientist, mathematician, and inventor (see [Swa02; CM00]). More than a century before the invention of modern electronic computers, Babbage realized that computation can, in principle, be mechanized. His first design for a mechanical computer was the difference engine, which was designed to do polynomial interpolation. He then designed the analytical engine, which was a much more general machine and the first prototype for a programmable general-purpose computer. Unfortunately, Babbage was never able to complete the design of his prototypes. One of the earliest people to realize the engine's potential and far-reaching implications was Ada Lovelace (see the notes for Chapter 7).

Boolean algebra was first investigated by Boole and DeMorgan in the 1840s [Boo47; De 47]. The definition of Boolean circuits and


connection to electrical relay circuits was given in Shannon's Master's Thesis [Sha38]. (Howard Gardner called Shannon's thesis "possibly the most important, and also the most famous, master's thesis of the [20th] century".) Savage's book [Sav98], like this one, introduces the theory of computation starting with Boolean circuits as the first model. Jukna's book [Juk12] contains a modern in-depth exposition of Boolean circuits; see also [Weg87].

The NAND function was shown to be universal by Sheffer [She13], though this also appears in the earlier work of Peirce, see [Bur78]. Whitehead and Russell used NAND as the basis for their logic in their magnum opus Principia Mathematica [WR12]. In her Ph.D. thesis, Ernst [Ern09] investigates empirically the minimal NAND circuits for various functions. Nisan and Schocken's book [NS05] builds a computing system starting from NAND gates and ending with high-level programs and games ("NAND to Tetris"); see also the website nandtotetris.org.


4 Syntactic sugar, and computing every function

"[In 1951] I had a running compiler and nobody would touch it because, they carefully told me, computers could only do arithmetic; they could not do programs.", Grace Murray Hopper, 1986.

"Syntactic sugar causes cancer of the semicolon.", Alan Perlis, 1982.

The computational models we considered thus far are as "bare bones" as they come. For example, our NAND-CIRC "programming language" has only the single operation foo = NAND(bar,blah). In this chapter we will see that these simple models are actually equivalent to more sophisticated ones. The key observation is that we can implement more complex features using our basic building blocks, and then use these new features themselves as building blocks for even more sophisticated features. This is known as "syntactic sugar" in the field of programming language design, since we are not modifying the underlying programming model itself, but rather we merely implement new features by syntactically transforming a program that uses such features into one that doesn't.

This chapter provides a "toolkit" that can be used to show that many functions can be computed by NAND-CIRC programs, and hence also by Boolean circuits. We will also use this toolkit to prove a fundamental theorem: every finite function f : {0,1}^n → {0,1}^m can be computed by a Boolean circuit, see Theorem 4.13 below. While the syntactic sugar toolkit is important in its own right, Theorem 4.13 can also be proven directly without using this toolkit. We present this alternative proof in Section 4.5. See Fig. 4.1 for an outline of the results of this chapter.

This chapter: A non-mathy overview

In this chapter we will see our first major result: every finite function can be computed by some Boolean circuit (see Theorem 4.13 and Big Idea 5). This is sometimes known as


Learning Objectives:

• Get comfortable with syntactic sugar, or automatic translation of higher-level logic to low-level gates.

• Learn the proof of a major result: every finite function can be computed by a Boolean circuit.

• Start thinking quantitatively about the number of lines required for computation.


Figure 4.1: An outline of the results of this chapter. In Section 4.1 we give a toolkit of "syntactic sugar" transformations showing how to implement features such as programmer-defined functions and conditional statements in NAND-CIRC. We use these tools in Section 4.3 to give a NAND-CIRC program (or alternatively a Boolean circuit) to compute the LOOKUP function. We then build on this result to show in Section 4.4 that NAND-CIRC programs (or equivalently, Boolean circuits) can compute every finite function. An alternative direct proof of the same result is given in Section 4.5.

the "universality" of AND, OR, and NOT (and, using the equivalence of Chapter 3, of NAND as well).

Despite being an important result, Theorem 4.13 is actually not that hard to prove. Section 4.5 presents a relatively simple direct proof of this result. However, in Section 4.1 and Section 4.3 we derive this result using the concept of "syntactic sugar" (see Big Idea 4). This is an important concept for programming languages theory and practice. The idea behind "syntactic sugar" is that we can extend a programming language by implementing advanced features from its basic components. For example, we can take the AON-CIRC and NAND-CIRC programming languages we saw in Chapter 3, and extend them to achieve features such as user-defined functions (e.g., def Foo(...)), conditional statements (e.g., if blah ...), and more. Once we have these features, it is not that hard to show that we can take the "truth table" (table of all inputs and outputs) of any function, and use that to create an AON-CIRC or NAND-CIRC program that maps each input to its corresponding output.

We will also get our first glimpse of quantitative measures in this chapter. While Theorem 4.13 tells us that every function can be computed by some circuit, the number of gates in this circuit can be exponentially large. (We are not using here "exponentially" as some colloquial term for "very very big" but in a very precise mathematical sense, which also happens to coincide with being very very big.) It turns out that some functions (for example, integer addition and multiplication) can in fact be computed using far fewer gates. We will explore this issue of "gate complexity" more deeply in Chapter 5 and following chapters.

4.1 SOME EXAMPLES OF SYNTACTIC SUGAR

We now present some examples of "syntactic sugar" transformations that we can use in constructing straight-line programs or circuits. We focus on the straight-line programming language view of our computational models, and specifically (for the sake of concreteness) on the NAND-CIRC programming language. This is convenient because many of the syntactic sugar transformations we present are easiest to think about in terms of applying "search and replace" operations to the source code of a program. However, by Theorem 3.19, all of our results hold equally well for circuits, whether ones using NAND gates or Boolean circuits that use the AND, OR, and NOT operations. Enumerating the examples of such syntactic sugar transformations can be a little tedious, but we do it for two reasons:

1. To convince you that despite their seeming simplicity and limitations, simple models such as Boolean circuits or the NAND-CIRC programming language are actually quite powerful.

2. So you can realize how lucky you are to be taking a theory of computation course and not a compilers course… :)

4.1.1 User-defined procedures

One staple of almost any programming language is the ability to define and then execute procedures or subroutines. (These are often known as functions in some programming languages, but we prefer the name procedures to avoid confusion with the function that a program computes.) The NAND-CIRC programming language does not have this mechanism built in. However, we can achieve the same effect using the time-honored technique of "copy and paste". Specifically, we can replace code which defines a procedure such as

def Proc(a,b):
    proc_code
    return c

some_code
f = Proc(d,e)
some_more_code

with the following code where we "paste" the code of Proc


some_code
proc_code'
some_more_code

and where proc_code' is obtained by replacing all occurrences of a with d, b with e, and c with f. When doing that we will need to ensure that all other variables appearing in proc_code' don't interfere with other variables. We can always do so by renaming variables to new names that were not used before. The above reasoning leads to the proof of the following theorem:

Theorem 4.1 — Procedure definition syntactic sugar. Let NAND-CIRC-PROC be the programming language NAND-CIRC augmented with the syntax above for defining procedures. Then for every NAND-CIRC-PROC program P, there exists a standard (i.e., "sugar-free") NAND-CIRC program P′ that computes the same function as P.

Remark 4.2 — No recursive procedures. NAND-CIRC-PROC only allows non-recursive procedures. In particular, the code of a procedure Proc cannot call Proc but can only use procedures that were defined before it. Without this restriction, the above "search and replace" procedure might never terminate and Theorem 4.1 would not be true.

Theorem 4.1 can be proven using the transformation above, but since the formal proof is somewhat long and tedious, we omit it here.

Example 4.3 — Computing Majority from NAND using syntactic sugar. Procedures allow us to express NAND-CIRC programs much more cleanly and succinctly. For example, because we can compute AND, OR, and NOT using NANDs, we can compute the Majority function as follows:

def NOT(a):
    return NAND(a,a)

def AND(a,b):
    temp = NAND(a,b)
    return NOT(temp)

def OR(a,b):
    temp1 = NOT(a)
    temp2 = NOT(b)
    return NAND(temp1,temp2)

def MAJ(a,b,c):
    and1 = AND(a,b)
    and2 = AND(a,c)
    and3 = AND(b,c)
    or1 = OR(and1,and2)
    return OR(or1,and3)

print(MAJ(0,1,1))
# 1

Fig. 4.2 presents the "sugar-free" NAND-CIRC program (and the corresponding circuit) that is obtained by "expanding out" this program, replacing the calls to procedures with their definitions.
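For readers following along in code, the expansion can be sketched as runnable Python (our reconstruction in the spirit of Fig. 4.2; the exact variable names in the figure may differ). Each assignment below corresponds to one line of the sugar-free NAND-CIRC program, twelve in all:

```python
def NAND(a, b):
    # simulate the NAND gate on bits 0/1
    return 1 - (a & b)

def MAJ_sugar_free(a, b, c):
    # and1 = AND(a,b)
    t1 = NAND(a, b)
    and1 = NAND(t1, t1)
    # and2 = AND(a,c)
    t2 = NAND(a, c)
    and2 = NAND(t2, t2)
    # and3 = AND(b,c)
    t3 = NAND(b, c)
    and3 = NAND(t3, t3)
    # or1 = OR(and1,and2)
    n1 = NAND(and1, and1)
    n2 = NAND(and2, and2)
    or1 = NAND(n1, n2)
    # output = OR(or1,and3)
    n3 = NAND(or1, or1)
    n4 = NAND(and3, and3)
    return NAND(n3, n4)

# sanity check against the majority specification on all 8 inputs
assert all(
    MAJ_sugar_free(a, b, c) == (1 if a + b + c >= 2 else 0)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
)
```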

Big Idea 4 — Once we show that a computational model X is equivalent to a model that has feature Y, we can assume we have Y when showing that a function f is computable by X.

Figure 4.2: A standard (i.e., "sugar-free") NAND-CIRC program that is obtained by expanding out the procedure definitions in the program for Majority of Example 4.3. The corresponding circuit is on the right. Note that this is not the most efficient NAND circuit/program for majority: we can save on some gates by "short cutting" steps where a gate u computes NAND(v, v) and then a gate w computes NAND(u, u) (as indicated by the dashed green arrows in the above figure).

Remark 4.4 — Counting lines. While we can use syntactic sugar to present NAND-CIRC programs in more readable ways, we did not change the definition of the language itself. Therefore, whenever we say that some function f has an s-line NAND-CIRC program we mean a standard "sugar-free" NAND-CIRC program, where all syntactic sugar has been expanded out. For example, the program of Example 4.3 is a 12-line program for computing the MAJ function, even though it can be written in fewer lines using NAND-CIRC-PROC.

4.1.2 Proof by Python (optional)

We can write a Python program that implements the proof of Theorem 4.1. This is a Python program that takes a NAND-CIRC-PROC program P that includes procedure definitions and uses simple "search and replace" to transform P into a standard (i.e., "sugar-free") NAND-CIRC program P′ that computes the same function as P without using any procedures. The idea is simple: if the program P contains a definition of a procedure Proc of two arguments x and y, then whenever we see a line of the form foo = Proc(bar,blah), we can replace this line by:

1. The body of the procedure Proc (replacing all occurrences of x and y with bar and blah respectively).

2. A line foo = exp, where exp is the expression following the return statement in the definition of the procedure Proc.

To make this more robust we add a prefix to the internal variables used by Proc to ensure they don't conflict with the variables of P; for simplicity we ignore this issue in the code below, though it can be easily added.

The code of the Python function desugar below achieves such a transformation.

Fig. 4.2 shows the result of applying desugar to the program of Example 4.3 that uses syntactic sugar to compute the Majority function. Specifically, we first apply desugar to remove usage of the OR function, then apply it to remove usage of the AND function, and finally apply it a third time to remove usage of the NOT function.

Remark 4.5 — Parsing function definitions (optional). The function desugar in Fig. 4.3 assumes that it is given the procedure already split up into its name, arguments, and body. It is not crucial for our purposes to describe precisely how to scan a definition and split it up into these components, but in case you are curious, it can be achieved in Python via the following code:

def parse_func(code):
    """Parse a function definition into name,
       arguments and body"""
    lines = [l.strip() for l in code.split('\n')]
    regexp = r'def\s+([a-zA-Z\_0-9]+)\(([\sa-zA-Z0-9\_,]+)\)\s*:\s*'


Figure 4.3: Python code for transforming NAND-CIRC-PROC programs into standard sugar-free NAND-CIRC programs.

import re

def desugar(code, func_name, func_args, func_body):
    """Replaces all occurrences of
          foo = func_name(func_args)
       with
          func_body[x->a,y->b]
          foo = [result returned in func_body]
    """
    # Uses Python regular expressions to simplify the search and replace,
    # see https://docs.python.org/3/library/re.html and Chapter 9 of the book

    # regular expression for capturing a list of variable names separated by commas
    arglist = ",".join([r"([a-zA-Z0-9\_\[\]]+)" for i in range(len(func_args))])
    # regular expression for capturing a statement of the form
    # "variable = func_name(arguments)"
    regexp = fr'([a-zA-Z0-9\_\[\]]+)\s*=\s*{func_name}\({arglist}\)\s*$'
    while True:
        m = re.search(regexp, code, re.MULTILINE)
        if not m: break
        newcode = func_body
        # replace function arguments by the variables from the function invocation
        for i in range(len(func_args)):
            newcode = newcode.replace(func_args[i], m.group(i+2))
        # Splice the new code inside
        newcode = newcode.replace('return', m.group(1) + " = ")
        code = code[:m.start()] + newcode + code[m.end()+1:]
    return code


    m = re.match(regexp, lines[0])
    return (m.group(1), m.group(2).split(','),
            '\n'.join(lines[1:]))

4.1.3 Conditional statements

Another sorely missing feature in NAND-CIRC is a conditional statement such as the if/then constructs that are found in many programming languages. However, using procedures, we can obtain an ersatz if/then construct. First we can compute the function IF : {0,1}^3 → {0,1} such that IF(a, b, c) equals b if a = 1 and c if a = 0.

Before reading onward, try to see how you could compute the IF function using NANDs. Once you do that, see how you can use that to emulate if/then types of constructs.

The IF function can be implemented from NANDs as follows (see Exercise 4.2):

def IF(cond,a,b):
    notcond = NAND(cond,cond)
    temp = NAND(b,notcond)
    temp1 = NAND(a,cond)
    return NAND(temp,temp1)
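One quick way to convince yourself that this construction is correct is to simulate it in plain Python over all eight inputs (our sketch; here NAND is ordinary arithmetic on Python ints rather than a NAND-CIRC operation):

```python
def NAND(a, b):
    # NAND on bits 0/1
    return 1 - (a & b)

def IF(cond, a, b):
    # the four-NAND construction from the text
    notcond = NAND(cond, cond)
    temp = NAND(b, notcond)
    temp1 = NAND(a, cond)
    return NAND(temp, temp1)

# IF(cond, a, b) should return a when cond == 1 and b when cond == 0
for a in (0, 1):
    for b in (0, 1):
        assert IF(1, a, b) == a
        assert IF(0, a, b) == b
```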

The IF function is also known as a multiplexing function, since cond can be thought of as a switch that controls whether the output is connected to a or b. Once we have a procedure for computing the IF function, we can implement conditionals in NAND. The idea is that we replace code of the form

if (condition): assign blah to variable foo

with code of the form

foo = IF(condition, blah, foo)

that assigns to foo its old value when condition equals 0, and assigns to foo the value of blah otherwise. More generally, we can replace code of the form

if (cond):
    a = ...
    b = ...
    c = ...


with code of the form

temp_a = ...
temp_b = ...
temp_c = ...
a = IF(cond,temp_a,a)
b = IF(cond,temp_b,b)
c = IF(cond,temp_c,c)

Using such transformations, we can prove the following theorem. Once again we omit the (not too insightful) full formal proof, though see Section 4.1.2 for some hints on how to obtain it.

Theorem 4.6 — Conditional statements syntactic sugar. Let NAND-CIRC-IF be the programming language NAND-CIRC augmented with if/then/else statements allowing code to be conditionally executed based on whether a variable is equal to 0 or 1.

Then for every NAND-CIRC-IF program P, there exists a standard (i.e., "sugar-free") NAND-CIRC program P′ that computes the same function as P.

4.2 EXTENDED EXAMPLE: ADDITION AND MULTIPLICATION (OPTIONAL)

Using "syntactic sugar", we can write the integer addition function as follows:

# Add two n-bit integers
# Use LSB first notation for simplicity
def ADD(A,B):
    Result = [0]*(n+1)
    Carry = [0]*(n+1)
    Carry[0] = zero(A[0])
    for i in range(n):
        Result[i] = XOR(Carry[i],XOR(A[i],B[i]))
        Carry[i+1] = MAJ(Carry[i],A[i],B[i])
    Result[n] = Carry[n]
    return Result

ADD([1,1,1,0,0],[1,0,0,0,0])
# [0, 0, 0, 1, 0, 0]

where zero is the constant zero function, and MAJ and XOR correspond to the majority and XOR functions respectively. While we use Python syntax for convenience, in this example n is some fixed integer and so for every such n, ADD is a finite function that takes as input 2n


1 The value of c can be improved to 9, see Exercise 4.5.

Figure 4.5: The number of lines in our NAND-CIRC program to add two n-bit numbers, as a function of n, for n between 1 and 100. This is not the most efficient program for this task, but the important point is that it has the form O(n).

bits and outputs n + 1 bits. In particular, for every n we can remove the loop construct for i in range(n) by simply repeating the code n times, replacing the value of i with 0, 1, 2, …, n − 1. By expanding out all the features, for every value of n we can translate the above program into a standard ("sugar-free") NAND-CIRC program. Fig. 4.4 depicts what we get for n = 2.
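To actually run the ADD snippet, one needs to fill in the helpers it assumes. Here is a minimal sketch with n fixed to 5; the definitions of zero, XOR, and MAJ below are the obvious ones and are our additions, not part of the original program:

```python
def zero(a):
    # the constant zero function
    return 0

def XOR(a, b):
    return a ^ b

def MAJ(a, b, c):
    return 1 if a + b + c >= 2 else 0

n = 5  # fixed input length; a separate finite program exists for each n

def ADD(A, B):
    # ripple-carry addition on LSB-first bit lists
    Result = [0] * (n + 1)
    Carry = [0] * (n + 1)
    Carry[0] = zero(A[0])
    for i in range(n):
        Result[i] = XOR(Carry[i], XOR(A[i], B[i]))
        Carry[i + 1] = MAJ(Carry[i], A[i], B[i])
    Result[n] = Carry[n]
    return Result

print(ADD([1, 1, 1, 0, 0], [1, 0, 0, 0, 0]))  # [0, 0, 0, 1, 0, 0]
```

The example computes 7 + 1 = 8 in LSB-first binary, matching the output shown in the text.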

Figure 4.4: The NAND-CIRC program and corresponding NAND circuit for adding two-digit binary numbers, obtained by "expanding out" all the syntactic sugar. The program/circuit has 43 lines/gates, which is by no means necessary. It is possible to add n-bit numbers using 9n NAND gates, see Exercise 4.5.

By going through the above program carefully and accounting for the number of gates, we can see that it yields a proof of the following theorem (see also Fig. 4.5):

Theorem 4.7 — Addition using NAND-CIRC programs. For every n ∈ ℕ, let ADD_n : {0,1}^{2n} → {0,1}^{n+1} be the function that, given x, x′ ∈ {0,1}^n, computes the representation of the sum of the numbers that x and x′ represent. Then there is a constant c ≤ 30 such that for every n there is a NAND-CIRC program of at most cn lines computing ADD_n.1

Once we have addition, we can use the grade-school algorithm to obtain multiplication as well, thus obtaining the following theorem:

Theorem 4.8 — Multiplication using NAND-CIRC programs. For every n, let MULT_n : {0,1}^{2n} → {0,1}^{2n} be the function that, given x, x′ ∈ {0,1}^n, computes the representation of the product of the numbers that x and x′ represent. Then there is a constant c such that for every n, there is a NAND-CIRC program of at most cn^2 lines that computes the function MULT_n.

We omit the proof, though in Exercise 4.7 we ask you to supply a "constructive proof" in the form of a program (in your favorite


programming language) that on input a number n, outputs the code of a NAND-CIRC program of at most 1000n^2 lines that computes the MULT_n function. In fact, we can use Karatsuba's algorithm to show that there is a NAND-CIRC program of O(n^{log_2 3}) lines to compute MULT_n (and one can get even further asymptotic improvements using better algorithms).
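While the book omits the proof, the idea of the grade-school algorithm can be illustrated in plain Python (our sketch, not the requested NAND-CIRC output): on LSB-first bit lists, the product is a sum of shifted copies of one number, one copy per set bit of the other. The helpers add_bits, mult_bits, and to_int below are ours:

```python
def add_bits(A, B):
    """Add two LSB-first bit lists, returning an LSB-first bit list."""
    n = max(len(A), len(B))
    A = A + [0] * (n - len(A))
    B = B + [0] * (n - len(B))
    result, carry = [], 0
    for i in range(n):
        s = A[i] + B[i] + carry
        result.append(s % 2)
        carry = s // 2
    result.append(carry)
    return result

def mult_bits(A, B):
    """Grade-school multiplication: sum shifted copies of A, one per set bit of B."""
    result = [0]
    for j, bit in enumerate(B):
        if bit == 1:
            # shifting by j positions multiplies by 2^j in LSB-first notation
            result = add_bits(result, [0] * j + A)
    return result

def to_int(bits):
    # interpret an LSB-first bit list as an integer
    return sum(b * 2**i for i, b in enumerate(bits))
```

For n-bit inputs this performs at most n additions of O(n)-bit numbers, which is where the cn^2 bound in Theorem 4.8 comes from.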

4.3 THE LOOKUP FUNCTION

The LOOKUP function will play an important role in this chapter and later. It is defined as follows:

Definition 4.9 — Lookup function. For every k, the lookup function of order k, LOOKUP_k : {0,1}^{2^k + k} → {0,1}, is defined as follows: For every x ∈ {0,1}^{2^k} and i ∈ {0,1}^k,

LOOKUP_k(x, i) = x_i   (4.1)

where x_i denotes the i-th entry of x, using the binary representation to identify i with a number in {0, …, 2^k − 1}.

Figure 4.6: The LOOKUP_k function takes an input in {0,1}^{2^k + k}, which we denote by x, i (with x ∈ {0,1}^{2^k} and i ∈ {0,1}^k). The output is x_i: the i-th coordinate of x, where we identify i as a number in {0, …, 2^k − 1} using the binary representation. In the above example x ∈ {0,1}^16 and i ∈ {0,1}^4. Since i = 0110 is the binary representation of the number 6, the output of LOOKUP_4(x, i) in this case is x_6 = 1.

See Fig. 4.6 for an illustration of the LOOKUP function. It turns out that for every k, we can compute LOOKUP_k using a NAND-CIRC program:

Theorem 4.10 — Lookup function. For every k > 0, there is a NAND-CIRC program that computes the function LOOKUP_k : {0,1}^{2^k + k} → {0,1}. Moreover, the number of lines in this program is at most 4 ⋅ 2^k.

An immediate corollary of Theorem 4.10 is that for every k > 0, LOOKUP_k can be computed by a Boolean circuit (with AND, OR and NOT gates) of at most 8 ⋅ 2^k gates.

4.3.1 Constructing a NAND-CIRC program for LOOKUP

We prove Theorem 4.10 by induction. For the case k = 1, LOOKUP_1 maps (x_0, x_1, i) ∈ {0,1}^3 to x_i. In other words, if i = 0 then it outputs


x_0 and otherwise it outputs x_1, which (up to reordering variables) is the same as the IF function presented in Section 4.1.3, which can be computed by a 4-line NAND-CIRC program.

As a warm-up for the case of general k, let us consider the case of k = 2. Given input x = (x_0, x_1, x_2, x_3) for LOOKUP_2 and an index i = (i_0, i_1), if the most significant bit i_0 of the index is 0 then LOOKUP_2(x, i) will equal x_0 if i_1 = 0 and equal x_1 if i_1 = 1. Similarly, if the most significant bit i_0 is 1 then LOOKUP_2(x, i) will equal x_2 if i_1 = 0 and will equal x_3 if i_1 = 1. Another way to say this is that we can write LOOKUP_2 as follows:

def LOOKUP2(X[0],X[1],X[2],X[3],i[0],i[1]):
    if i[0]==1:
        return LOOKUP1(X[2],X[3],i[1])
    else:
        return LOOKUP1(X[0],X[1],i[1])

or in other words,

def LOOKUP2(X[0],X[1],X[2],X[3],i[0],i[1]):
    a = LOOKUP1(X[2],X[3],i[1])
    b = LOOKUP1(X[0],X[1],i[1])
    return IF(i[0],a,b)

More generally, as shown in the following lemma, we can compute LOOKUP_k using two invocations of LOOKUP_{k−1} and one invocation of IF:

Lemma 4.11 — Lookup recursion. For every k ≥ 2, LOOKUP_k(x_0, …, x_{2^k−1}, i_0, …, i_{k−1}) is equal to

IF(i_0, LOOKUP_{k−1}(x_{2^{k−1}}, …, x_{2^k−1}, i_1, …, i_{k−1}), LOOKUP_{k−1}(x_0, …, x_{2^{k−1}−1}, i_1, …, i_{k−1}))   (4.2)

Proof. If the most significant bit i_0 of i is zero, then the index i is in {0, …, 2^{k−1} − 1} and hence we can perform the lookup on the "first half" of x; the result of LOOKUP_k(x, i) will be the same as a = LOOKUP_{k−1}(x_0, …, x_{2^{k−1}−1}, i_1, …, i_{k−1}). On the other hand, if this most significant bit i_0 is equal to 1, then the index is in {2^{k−1}, …, 2^k − 1}, in which case the result of LOOKUP_k(x, i) is the same as b = LOOKUP_{k−1}(x_{2^{k−1}}, …, x_{2^k−1}, i_1, …, i_{k−1}). Thus we can compute LOOKUP_k(x, i) by first computing a and b and then outputting IF(i_0, b, a).
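The recursion of Lemma 4.11 can be illustrated with a short Python sketch (plain Python, not NAND-CIRC code; the list X plays the role of x_0, …, x_{2^k−1}, and i is the list of index bits, most significant bit first):

```python
def LOOKUP(X, i):
    """Return X[i] where i is given as a list of index bits, most significant first.

    Follows the recursion of Lemma 4.11: two half-size lookups plus one IF."""
    if len(i) == 1:
        return X[1] if i[0] == 1 else X[0]   # LOOKUP_1 is the IF/MUX function
    half = len(X) // 2
    a = LOOKUP(X[:half], i[1:])   # lookup in the first half (used when i[0] == 0)
    b = LOOKUP(X[half:], i[1:])   # lookup in the second half (used when i[0] == 1)
    return b if i[0] == 1 else a  # IF(i[0], b, a)
```

For example, with X = [0,1,1,0,1,0,0,1] (so k = 3) and i = [1,1,0], the recursion returns X[6] = 0.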

Proof of Theorem 4.10 from Lemma 4.11. Now that we have Lemma 4.11, we can complete the proof of Theorem 4.10. We will prove by induction on k that there is a NAND-CIRC program of at most 4·(2^k − 1)


Figure 4.7: The number of lines in our implementation of the LOOKUP_k function as a function of k (i.e., the length of the index). The number of lines in our implementation is roughly 3·2^k.

lines for LOOKUP_k. For k = 1 this follows by the four-line program for IF we've seen before. For k > 1, we use the following pseudocode:

a = LOOKUP_(k-1)(X[0],...,X[2^(k-1)-1],i[1],...,i[k-1])
b = LOOKUP_(k-1)(X[2^(k-1)],...,X[2^k-1],i[1],...,i[k-1])
return IF(i[0],b,a)

If we let L(k) be the number of lines required for LOOKUP_k, then the above pseudocode shows that

L(k) ≤ 2L(k−1) + 4 .   (4.3)
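Before completing the induction, the recurrence can be checked numerically against the closed form 4·(2^k − 1) (a plain Python sketch; L(1) = 4 is the line count of the IF program):

```python
def L(k):
    # lines used by the recursive construction:
    # two copies of LOOKUP_{k-1} plus the 4-line IF program
    if k == 1:
        return 4
    return 2 * L(k - 1) + 4

# the recurrence solves to exactly 4 * (2^k - 1):
for k in range(1, 12):
    assert L(k) == 4 * (2**k - 1)
```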

Since under our induction hypothesis L(k−1) ≤ 4·(2^{k−1} − 1), we get that L(k) ≤ 2·4·(2^{k−1} − 1) + 4 = 4·(2^k − 1), which is what we wanted to prove. See Fig. 4.7 for a plot of the actual number of lines in our implementation of LOOKUP_k.

4.4 COMPUTING EVERY FUNCTION

At this point we know the following facts about NAND-CIRC programs (and so equivalently about Boolean circuits and our other equivalent models):

1. They can compute at least some non-trivial functions.

2. Coming up with NAND-CIRC programs for various functions is a very tedious task.

Thus I would not blame the reader if they were not particularly looking forward to a long sequence of examples of functions that can be computed by NAND-CIRC programs. However, it turns out we are not going to need this, as we can show in one fell swoop that NAND-CIRC programs can compute every finite function:

Theorem 4.12 — Universality of NAND. There exists some constant c > 0 such that for every n, m > 0 and function f : {0,1}^n → {0,1}^m, there is a NAND-CIRC program with at most c·m·2^n lines that computes the function f.

By Theorem 3.19, the models of NAND circuits, NAND-CIRC programs, AON-CIRC programs, and Boolean circuits are all equivalent to one another, and hence Theorem 4.12 holds for all these models. In particular, the following theorem is equivalent to Theorem 4.12:

Theorem 4.13 — Universality of Boolean circuits. There exists some constant c > 0 such that for every n, m > 0 and function


2 In case you are curious, this is the function that on input i ∈ {0,1}^4 (which we interpret as a number in [16]) outputs the i-th digit of π in the binary basis.

f : {0,1}^n → {0,1}^m, there is a Boolean circuit with at most c·m·2^n gates that computes the function f.

Big Idea 5: Every finite function can be computed by a large enough Boolean circuit.

Improved bounds. Though it will not be of great importance to us, it is possible to improve on the proof of Theorem 4.12 and shave an extra factor of n, as well as optimize the constant c, and so prove that for every ε > 0, m ∈ ℕ and sufficiently large n, if f : {0,1}^n → {0,1}^m then f can be computed by a NAND circuit of at most (1 + ε)·(m·2^n)/n gates. The proof of this result is beyond the scope of this book, but we do discuss how to obtain a bound of the form O(m·2^n/n) in Section 4.4.2; see also the biographical notes.

4.4.1 Proof of NAND's Universality

To prove Theorem 4.12, we need to give a NAND circuit, or equivalently a NAND-CIRC program, for every possible function. We will restrict our attention to the case of Boolean functions (i.e., m = 1). Exercise 4.9 asks you to extend the proof for all values of m. A function F : {0,1}^n → {0,1} can be specified by a table of its values for each one of the 2^n inputs. For example, the table below describes one particular function G : {0,1}^4 → {0,1}:2

Table 4.1: An example of a function G : {0,1}^4 → {0,1}.

Input (x)   Output (G(x))
0000        1
0001        1
0010        0
0011        0
0100        1
0101        0
0110        0
0111        1
1000        0
1001        0
1010        0
1011        0
1100        1
1101        1
1110        1
1111        1


For every x ∈ {0,1}^4, G(x) = LOOKUP_4(1100100100001111, x), and so the following is NAND-CIRC "pseudocode" to compute G using syntactic sugar for the LOOKUP_4 procedure.

G0000 = 1
G1000 = 1
G0100 = 0
...
G0111 = 1
G1111 = 1
Y[0] = LOOKUP_4(G0000,G1000,...,G1111,
                X[0],X[1],X[2],X[3])

We can translate this pseudocode into an actual NAND-CIRC program by adding three lines to define variables zero and one that are initialized to 0 and 1 respectively, and then replacing a statement such as Gxxx = 0 with Gxxx = NAND(one,one) and a statement such as Gxxx = 1 with Gxxx = NAND(zero,zero). The call to LOOKUP_4 will be replaced by the NAND-CIRC program that computes LOOKUP_4, plugging in the appropriate inputs.

There was nothing about the above reasoning that was particular to the function G above. Given every function F : {0,1}^n → {0,1}, we can write a NAND-CIRC program that does the following:

1. Initialize 2^n variables of the form F00...0 till F11...1 so that for every z ∈ {0,1}^n, the variable corresponding to z is assigned the value F(z).

2. Compute LOOKUP_n on the 2^n variables initialized in the previous step, with the index being the input variables X[0], …, X[n−1]. That is, just like in the pseudocode for G above, we use Y[0] = LOOKUP(F00..00,...,F11..1,X[0],...,X[n-1]).

The total number of lines in the resulting program is the 3 + 2^n lines for initializing the variables plus the 4·2^n lines that we pay for computing LOOKUP_n. This completes the proof of Theorem 4.12.

Remark 4.14 — Result in perspective. While Theorem 4.12 seems striking at first, in retrospect it is perhaps not that surprising that every finite function can be computed with a NAND-CIRC program. After all, a finite function F : {0,1}^n → {0,1}^m can be represented by simply the list of its outputs for each one of the 2^n input values. So it makes sense that we could write a NAND-CIRC program of similar size to compute it. What is more interesting is that some functions, such as addition and multiplication, have


3 The constant c in this theorem is at most 10 and in fact can be arbitrarily close to 1; see Section 4.8.

Figure 4.8: We can compute f : {0,1}^n → {0,1} on input x = ab, where a ∈ {0,1}^k and b ∈ {0,1}^{n−k}, by first computing the 2^{n−k}-long string g(a) that corresponds to all of f's values on inputs that begin with a, and then outputting the b-th coordinate of this string.

a much more efficient representation: one that only requires O(n^2) or even fewer lines.

4.4.2 Improving by a factor of n (optional)

By being a little more careful, we can improve the bound of Theorem 4.12 and show that every function F : {0,1}^n → {0,1}^m can be computed by a NAND-CIRC program of at most O(m·2^n/n) lines. In other words, we can prove the following improved version:

Theorem 4.15 — Universality of NAND circuits, improved bound. There exists a constant c > 0 such that for every n, m > 0 and function f : {0,1}^n → {0,1}^m, there is a NAND-CIRC program with at most c·m·2^n/n lines that computes the function f.3

Proof. As before, it is enough to prove the case that m = 1. Hence we let f : {0,1}^n → {0,1}, and our goal is to prove that there exists a NAND-CIRC program of O(2^n/n) lines (or equivalently a Boolean circuit of O(2^n/n) gates) that computes f.

We let k = log(n − 2 log n) (the reasoning behind this choice will become clear later on). We define the function g : {0,1}^k → {0,1}^{2^{n−k}} as follows:

g(a) = f(a 0^{n−k}) f(a 0^{n−k−1} 1) ⋯ f(a 1^{n−k}) .   (4.4)

In other words, if we use the usual binary representation to identify the numbers {0, …, 2^{n−k} − 1} with the strings {0,1}^{n−k}, then for every a ∈ {0,1}^k and b ∈ {0,1}^{n−k},

g(a)_b = f(ab) .   (4.5)

Equation (4.5) means that for every x ∈ {0,1}^n, if we write x = ab with a ∈ {0,1}^k and b ∈ {0,1}^{n−k}, then we can compute f(x) by first computing the string T = g(a) of length 2^{n−k}, and then computing LOOKUP_{n−k}(T, b) to retrieve the element of T at the position corresponding to b (see Fig. 4.8). The cost to compute LOOKUP_{n−k} is O(2^{n−k}) lines/gates, and the cost in NAND-CIRC lines (or Boolean gates) to compute f is at most

cost(g) + O(2^{n−k}) ,   (4.6)

where ๐‘๐‘œ๐‘ ๐‘ก(๐‘”) is the number of operations (i.e., lines of NAND-CIRCprograms or gates in a circuit) needed to compute ๐‘”.

To complete the proof we need to give a bound on cost(g). Since g is a function mapping {0,1}^k to {0,1}^{2^{n−k}}, we can also think of it as a collection of 2^{n−k} functions g_0, …, g_{2^{n−k}−1} : {0,1}^k → {0,1}, where


Figure 4.9: If g_0, …, g_{N−1} is a collection of functions each mapping {0,1}^k to {0,1} such that at most S of them are distinct, then for every a ∈ {0,1}^k, we can compute all the values g_0(a), …, g_{N−1}(a) using at most O(S·2^k + N) operations by first computing the distinct functions and then copying the resulting values.

g_i(a) = g(a)_i for every a ∈ {0,1}^k and i ∈ [2^{n−k}]. (That is, g_i(a) is the i-th bit of g(a).) Naively, we could use Theorem 4.12 to compute each g_i in O(2^k) lines, but then the total cost is O(2^{n−k} · 2^k) = O(2^n), which does not save us anything. However, the crucial observation is that there are only 2^{2^k} distinct functions mapping {0,1}^k to {0,1}. For example, if g_17 is an identical function to g_67, that means that if we already computed g_17(a) then we can compute g_67(a) using only a constant number of operations: simply copy the same value! In general, if you have a collection of N functions g_0, …, g_{N−1} mapping {0,1}^k to {0,1}, of which at most S are distinct, then for every value a ∈ {0,1}^k we can compute the N values g_0(a), …, g_{N−1}(a) using at most O(S·2^k + N) operations (see Fig. 4.9).

In our case, because there are at most 2^{2^k} distinct functions mapping {0,1}^k to {0,1}, we can compute the function g (and hence by (4.5) also f) using at most

O(2^{2^k} · 2^k + 2^{n−k})   (4.7)

operations. Now all that is left is to plug into (4.7) our choice of k = log(n − 2 log n). By definition, 2^k = n − 2 log n, which means that (4.7) can be bounded by

O(2^{n−2 log n} · (n − 2 log n) + 2^{n−log(n−2 log n)}) ≤   (4.8)

O((2^n/n^2) · n + 2^n/(n − 2 log n)) ≤ O(2^n/n + 2^n/(0.5n)) = O(2^n/n) ,   (4.9)

which is what we wanted to prove. (We used above the fact that n − 2 log n ≥ 0.5n for sufficiently large n.)

Using the connection between NAND-CIRC programs and Boolean circuits, an immediate corollary of Theorem 4.15 is the following improvement to Theorem 4.13:

Theorem 4.16 — Universality of Boolean circuits, improved bound. There exists some constant c > 0 such that for every n, m > 0 and function f : {0,1}^n → {0,1}^m, there is a Boolean circuit with at most c·m·2^n/n gates that computes the function f.

4.5 COMPUTING EVERY FUNCTION: AN ALTERNATIVE PROOF

Theorem 4.13 is a fundamental result in the theory (and practice!) of computation. In this section we present an alternative proof of this basic fact that Boolean circuits can compute every finite function. This alternative proof gives a somewhat worse quantitative bound on the number of gates, but it has the advantage of being simpler, working


directly with circuits and avoiding the usage of all the syntactic sugar machinery. (However, that machinery is useful in its own right, and will find other applications later on.)

Theorem 4.17 — Universality of Boolean circuits (alternative phrasing). There exists some constant c > 0 such that for every n, m > 0 and function f : {0,1}^n → {0,1}^m, there is a Boolean circuit with at most c·m·n·2^n gates that computes the function f.

Figure 4.10: Given a function f : {0,1}^n → {0,1}, we let {x_0, x_1, …, x_{N−1}} ⊆ {0,1}^n be the set of inputs such that f(x_i) = 1, and note that N ≤ 2^n. We can express f as the OR of δ_{x_i} for i ∈ [N], where the function δ_α : {0,1}^n → {0,1} (for α ∈ {0,1}^n) is defined as follows: δ_α(x) = 1 iff x = α. We can compute the OR of N values using N two-input OR gates. Therefore if we have a circuit of size O(n) to compute δ_α for every α ∈ {0,1}^n, we can compute f using a circuit of size O(n·N) = O(n·2^n).

Proof Idea:

The idea of the proof is illustrated in Fig. 4.10. As before, it is enough to focus on the case that m = 1 (the function f has a single output), since we can always extend this to the case of m > 1 by looking at the composition of m circuits each computing a different output bit of the function f. We start by showing that for every α ∈ {0,1}^n, there is an O(n)-sized circuit that computes the function δ_α : {0,1}^n → {0,1} defined as follows: δ_α(x) = 1 iff x = α (that is, δ_α outputs 0 on all inputs except the input α). We can then write any function f : {0,1}^n → {0,1} as the OR of at most 2^n functions δ_α for the α's on which f(α) = 1.

⋆

Proof of Theorem 4.17. We prove the theorem for the case m = 1. The result can be extended for m > 1 as before (see also Exercise 4.9). Let f : {0,1}^n → {0,1}. We will prove that there is an O(n·2^n)-sized Boolean circuit to compute f in the following steps:


Figure 4.11: For every string α ∈ {0,1}^n, there is a Boolean circuit of O(n) gates to compute the function δ_α : {0,1}^n → {0,1} such that δ_α(x) = 1 if and only if x = α. The circuit is very simple. Given input x_0, …, x_{n−1}, we compute the AND of z_0, …, z_{n−1}, where z_i = x_i if α_i = 1 and z_i = NOT(x_i) if α_i = 0. While formally Boolean circuits only have a gate for computing the AND of two inputs, we can implement an AND of n inputs by composing n two-input ANDs.

1. We show that for every α ∈ {0,1}^n, there is an O(n)-sized circuit that computes the function δ_α : {0,1}^n → {0,1}, where δ_α(x) = 1 iff x = α.

2. We then show that this implies the existence of an O(n·2^n)-sized circuit that computes f, by writing f(x) as the OR of δ_α(x) for all α ∈ {0,1}^n such that f(α) = 1.

We start with Step 1:

CLAIM: For α ∈ {0,1}^n, define δ_α : {0,1}^n → {0,1} as follows:

δ_α(x) = 1 if x = α, and δ_α(x) = 0 otherwise.   (4.10)

Then there is a Boolean circuit using at most 2n gates that computes δ_α.

PROOF OF CLAIM: The proof is illustrated in Fig. 4.11. As an example, consider the function δ_011 : {0,1}^3 → {0,1}. This function outputs 1 on x if and only if x_0 = 0, x_1 = 1 and x_2 = 1, and so we can write δ_011(x) = NOT(x_0) ∧ x_1 ∧ x_2, which translates into a Boolean circuit with one NOT gate and two AND gates. More generally, for every α ∈ {0,1}^n, we can express δ_α(x) as (x_0 = α_0) ∧ (x_1 = α_1) ∧ ⋯ ∧ (x_{n−1} = α_{n−1}), where we replace the condition x_i = α_i with NOT(x_i) if α_i = 0 and with simply x_i if α_i = 1. This yields a circuit that computes δ_α using n AND gates and at most n NOT gates, so a total of at most 2n gates.

Now for every function f : {0,1}^n → {0,1}, we can write

f(x) = δ_{x_0}(x) ∨ δ_{x_1}(x) ∨ ⋯ ∨ δ_{x_{N−1}}(x) ,   (4.11)

where S = {x_0, …, x_{N−1}} is the set of inputs on which f outputs 1.

(To see this, you can verify that the right-hand side of (4.11) evaluates to 1 on x ∈ {0,1}^n if and only if x is in the set S.)

Therefore we can compute f using a Boolean circuit of at most 2n gates for each of the N functions δ_{x_i}, combined with at most N OR gates, thus obtaining a circuit of at most 2n·N + N gates. Since S ⊆ {0,1}^n, its size N is at most 2^n, and hence the total number of gates in this circuit is O(n·2^n).
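This "OR of δ's" construction (an OR of ANDs of literals, i.e., a DNF-style representation) is easy to express in Python (a sketch; S is the list of inputs on which f outputs 1):

```python
def delta(alpha, x):
    # AND of n literals: x_i when alpha_i = 1, NOT(x_i) when alpha_i = 0
    result = 1
    for a_i, x_i in zip(alpha, x):
        result &= x_i if a_i == 1 else 1 - x_i
    return result

def f_from_S(S, x):
    # f(x) as the OR of delta_alpha(x) over all alpha in S, as in (4.11)
    result = 0
    for alpha in S:
        result |= delta(alpha, x)
    return result
```

For example, taking S = [(0,1), (1,0)] recovers the XOR function on two bits.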

4.6 THE CLASS SIZE(T)

We have seen that every function f : {0,1}^n → {0,1}^m can be computed by a circuit of size O(m·2^n), and some functions (such as addition and multiplication) can be computed by much smaller circuits. We define SIZE(s) to be the set of functions that can be computed by NAND circuits of at most s gates (or equivalently, by NAND-CIRC programs of at most s lines). Formally, the definition is as follows:


4 The restriction that m, n ≤ 2^s makes no difference; see Exercise 3.11.

Definition 4.18 — Size class of functions. For every n, m ∈ {1, …, 2^s}, we let SIZE_{n,m}(s) denote the set of all functions f : {0,1}^n → {0,1}^m that can be computed by a NAND circuit of at most s gates.4 We denote by SIZE_n(s) the set SIZE_{n,1}(s). For every integer s ≥ 1, we let SIZE(s) = ∪_{n,m≤2^s} SIZE_{n,m}(s) be the set of all functions f for which there exists a NAND circuit of at most s gates that computes f.

Fig. 4.12 depicts the set SIZE_{n,1}(s). Note that SIZE_{n,m}(s) is a set of

functions, not of programs! (Asking if a program or a circuit is a member of SIZE_{n,m}(s) is a category error in the sense of Fig. 4.13.) As we discussed in Section 3.6.2 (and Section 2.6.1), the distinction between programs and functions is absolutely crucial. You should always remember that while a program computes a function, it is not equal to a function. In particular, as we've seen, there can be more than one program to compute the same function.

Figure 4.12: There are 2^{2^n} functions mapping {0,1}^n to {0,1}, and an infinite number of circuits with n-bit inputs and a single bit of output. Every circuit computes one function, but every function can be computed by many circuits. We say that f ∈ SIZE_{n,1}(s) if the smallest circuit that computes f has s or fewer gates. For example, XOR_n ∈ SIZE_{n,1}(4n). Theorem 4.12 shows that every function g is computable by some circuit of at most c·2^n/n gates, and hence SIZE_{n,1}(c·2^n/n) corresponds to the set of all functions from {0,1}^n to {0,1}.

While we defined SIZE(s) with respect to NAND gates, we would get essentially the same class if we defined it with respect to AND/OR/NOT gates:

Lemma 4.19. Let SIZE^{AON}_{n,m}(s) denote the set of all functions f : {0,1}^n → {0,1}^m that can be computed by an AND/OR/NOT Boolean circuit of at most s gates. Then,

SIZE_{n,m}(s/2) ⊆ SIZE^{AON}_{n,m}(s) ⊆ SIZE_{n,m}(3s) .   (4.12)

Proof. If f can be computed by a NAND circuit of at most s/2 gates, then by replacing each NAND with the two gates NOT and AND, we can obtain an AND/OR/NOT Boolean circuit of at most s gates that computes f. On the other hand, if f can be computed by a Boolean


Figure 4.13: A "category error" is a question such as "is a cucumber even or odd?" which does not even make sense. In this book, one type of category error you should watch out for is confusing functions and programs (i.e., confusing specifications and implementations). If C is a circuit or program, then asking if C ∈ SIZE_{n,1}(s) is a category error, since SIZE_{n,1}(s) is a set of functions and not programs or circuits.

AND/OR/NOT circuit of at most s gates, then by Theorem 3.12 it can be computed by a NAND circuit of at most 3s gates.
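Both directions of Lemma 4.19 are local gate substitutions, as in this Python sketch:

```python
def NAND(a, b):
    return 1 - (a & b)

def AND(a, b):
    return a & b

def NOT(a):
    return 1 - a

# Direction 1: one NAND gate becomes two AND/OR/NOT gates (AND, then NOT).
def NAND_from_aon(a, b):
    return NOT(AND(a, b))

# Direction 2: each AND/OR/NOT gate becomes at most three NAND gates.
def NOT_from_nand(a):
    return NAND(a, a)

def AND_from_nand(a, b):
    return NAND(NAND(a, b), NAND(a, b))

def OR_from_nand(a, b):
    return NAND(NAND(a, a), NAND(b, b))
```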

The results we have seen in this chapter can be phrased as showing that ADD_n ∈ SIZE_{2n,n+1}(100n) and MULT_n ∈ SIZE_{2n,2n}(10000·n^{log_2 3}). Theorem 4.12 shows that for some constant c, SIZE_{n,m}(c·m·2^n) is equal to the set of all functions from {0,1}^n to {0,1}^m.

Remark 4.20 — Finite vs infinite functions. A NAND-CIRC program P can only compute a function with a certain number n of inputs and a certain number m of outputs. Hence, for example, there is no single NAND-CIRC program that can compute the increment function INC : {0,1}* → {0,1}* that maps a string x (which we identify with a number via the binary representation) to the string that represents x + 1. Rather, for every n > 0, there is a NAND-CIRC program P_n that computes the restriction INC_n of the function INC to inputs of length n. Since it can be shown that for every n > 0 such a program P_n exists of length at most 10n, INC_n ∈ SIZE(10n) for every n > 0.

If T : ℕ → ℕ and F : {0,1}* → {0,1}*, we will write F ∈ SIZE*(T(n)) (or sometimes slightly abuse notation and write simply F ∈ SIZE(T(n))) to indicate that for every n the restriction F_n of F to inputs in {0,1}^n is in SIZE(T(n)). Hence we can write INC ∈ SIZE*(10n). We will come back to this issue of finite vs infinite functions later in this course.

Solved Exercise 4.1 — SIZE closed under complement. In this exercise we prove a certain "closure property" of the class SIZE_n(s). That is, we show that if f is in this class then (up to some small additive term) so is the complement of f, which is the function g(x) = 1 − f(x).

Prove that there is a constant c such that for every f : {0,1}^n → {0,1} and s ∈ ℕ, if f ∈ SIZE_n(s) then 1 − f ∈ SIZE_n(s + c).

Solution:

If ๐‘“ โˆˆ SIZE(๐‘ ) then there is an ๐‘ -line NAND-CIRC program๐‘ƒ that computes ๐‘“ . We can rename the variable Y[0] in ๐‘ƒ to avariable temp and add the line

Y[0] = NAND(temp,temp)

at the very end to obtain a program P′ that computes 1 − f.
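The transformation in the solution operates on the program text itself; here is a sketch of it in Python (the function name negate_program and the list-of-source-lines representation are my own choices):

```python
def negate_program(lines):
    """Given a NAND-CIRC program as a list of source lines computing f,
    return a program computing 1 - f: rename Y[0] to temp, then negate it
    with one extra NAND line (NAND(t, t) = 1 - t)."""
    renamed = [line.replace("Y[0]", "temp") for line in lines]
    return renamed + ["Y[0] = NAND(temp,temp)"]
```

Applying it to a 4-line program yields a 5-line program, matching the constant c = 1 additive overhead.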


Chapter Recap

• We can define the notion of computing a function via a simplified "programming language", where computing a function F in T steps would correspond to having a T-line NAND-CIRC program that computes F.

• While the NAND-CIRC programming language only has one operation, other operations such as functions and conditional execution can be implemented using it.

• Every function f : {0,1}^n → {0,1}^m can be computed by a circuit of at most O(m·2^n) gates (and in fact at most O(m·2^n/n) gates).

• Sometimes (or maybe always?) we can translate an efficient algorithm to compute f into a circuit that computes f with a number of gates comparable to the number of steps in this algorithm.

4.7 EXERCISES

Exercise 4.1 — Pairing. This exercise asks you to give a one-to-one map from ℕ² to ℕ. This can be useful to implement two-dimensional arrays as "syntactic sugar" in programming languages that only have one-dimensional arrays.

1. Prove that the map F(x, y) = 2^x 3^y is a one-to-one map from ℕ² to ℕ.

2. Show that there is a one-to-one map F : ℕ² → ℕ such that for every x, y, F(x, y) ≤ 100·max{x, y}² + 100.

3. For every k, show that there is a one-to-one map F : ℕ^k → ℕ such that for every x_0, …, x_{k−1} ∈ ℕ, F(x_0, …, x_{k−1}) ≤ 100·(x_0 + x_1 + … + x_{k−1} + 100k)^k.

Exercise 4.2 — Computing MUX. Prove that the NAND-CIRC program below computes the function MUX (or LOOKUP_1), where MUX(a, b, c) equals a if c = 0 and equals b if c = 1:

t = NAND(X[2],X[2])
u = NAND(X[0],t)
v = NAND(X[1],X[2])
Y[0] = NAND(u,v)
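One way to convince yourself before writing the proof the exercise asks for is to simulate the four lines exhaustively over all eight inputs, e.g. in Python:

```python
def NAND(a, b):
    return 1 - (a & b)

def mux_program(x0, x1, x2):
    # the four lines of the program above, with X[0]=x0, X[1]=x1, X[2]=x2
    t = NAND(x2, x2)
    u = NAND(x0, t)
    v = NAND(x1, x2)
    return NAND(u, v)

# exhaustive check against the MUX specification
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert mux_program(a, b, c) == (a if c == 0 else b)
```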

Exercise 4.3 — At least two / Majority. Give a NAND-CIRC program of at most 6 lines to compute the function MAJ : {0,1}^3 → {0,1} where MAJ(a, b, c) = 1 iff a + b + c ≥ 2.


5 You can start by transforming P into a NAND-CIRC-PROC program that uses procedure statements, and then use the code of Fig. 4.3 to transform the latter into a "sugar-free" NAND-CIRC program.

6 Use a "cascade" of adding the bits one after the other, starting with the least significant digit, just like in the elementary-school algorithm.

Exercise 4.4 — Conditional statements. In this exercise we will explore Theorem 4.6: transforming NAND-CIRC-IF programs that use code such as if .. then .. else .. to standard NAND-CIRC programs.

1. Give a "proof by code" of Theorem 4.6: a program in a programming language of your choice that transforms a NAND-CIRC-IF program P into a "sugar-free" NAND-CIRC program P′ that computes the same function. See footnote for hint.5

2. Prove the following statement, which is the heart of Theorem 4.6: suppose that there exists an s-line NAND-CIRC program to compute f : {0,1}^n → {0,1} and an s′-line NAND-CIRC program to compute g : {0,1}^n → {0,1}. Prove that there exists a NAND-CIRC program of at most s + s′ + 10 lines to compute the function h : {0,1}^{n+1} → {0,1} where h(x_0, …, x_{n−1}, x_n) equals f(x_0, …, x_{n−1}) if x_n = 0 and equals g(x_0, …, x_{n−1}) otherwise. (All programs in this item are standard "sugar-free" NAND-CIRC programs.)

Exercise 4.5 — Half and full adders. 1. A half adder is the function HA : {0,1}² → {0,1}² that corresponds to adding two binary bits. That is, for every a, b ∈ {0,1}, HA(a, b) = (e, f) where 2e + f = a + b. Prove that there is a NAND circuit of at most five NAND gates that computes HA.

2. A full adder is the function FA : {0,1}³ → {0,1}² that takes in two bits and a "carry" bit and outputs their sum. That is, for every a, b, c ∈ {0,1}, FA(a, b, c) = (e, f) such that 2e + f = a + b + c. Prove that there is a NAND circuit of at most nine NAND gates that computes FA.

3. Prove that if there is a NAND circuit of c gates that computes FA, then there is a circuit of c·n gates that computes ADD_n where (as in Theorem 4.7) ADD_n : {0,1}^{2n} → {0,1}^{n+1} is the function that outputs the addition of two input n-bit numbers. See footnote for hint.6

4. Show that for every n there is a NAND-CIRC program to compute ADD_n with at most 9n lines.
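The "cascade" from the hint to item 3 can be sketched in ordinary Python (not a NAND circuit; bits are given least significant digit first):

```python
def full_adder(a, b, c):
    # FA(a, b, c) = (e, f) with 2e + f = a + b + c
    s = a + b + c
    return s // 2, s % 2

def ripple_add(xs, ys):
    """Add two n-bit numbers by chaining n full adders through the carry bit."""
    carry, out = 0, []
    for a, b in zip(xs, ys):
        carry, bit = full_adder(a, b, carry)
        out.append(bit)
    return out + [carry]   # n + 1 output bits
```

For example, ripple_add([1,0,1], [1,1,0]) adds 5 and 3 (LSB first) and returns [0,0,0,1], i.e., 8.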

Exercise 4.6 — Addition. Write a program using your favorite programming language that on input of an integer n, outputs a NAND-CIRC program that computes ADD_n. Can you ensure that the program it outputs for ADD_n has fewer than 10n lines?


7 Hint: Use Karatsuba's algorithm.

Exercise 4.7 — Multiplication. Write a program using your favorite programming language that on input of an integer n, outputs a NAND-CIRC program that computes MULT_n. Can you ensure that the program it outputs for MULT_n has fewer than 1000·n² lines?

Exercise 4.8 — Efficient multiplication (challenge). Write a program using your favorite programming language that on input of an integer n, outputs a NAND-CIRC program that computes MULT_n and has at most 10000·n^{1.9} lines.7 What is the smallest number of lines you can use to multiply two 2048-bit numbers?

Exercise 4.9 — Multibit function. In the text, Theorem 4.12 is only proven for the case m = 1. In this exercise you will extend the proof for every m.

Prove that

1. If there is an s-line NAND-CIRC program to compute f : {0,1}^n → {0,1} and an s′-line NAND-CIRC program to compute f′ : {0,1}^n → {0,1}, then there is an (s + s′)-line program to compute the function g : {0,1}^n → {0,1}² such that g(x) = (f(x), f′(x)).

2. For every function f : {0,1}^n → {0,1}^m, there is a NAND-CIRC program of at most 10m·2^n lines that computes f. (You can use the m = 1 case of Theorem 4.12, as well as Item 1.)

Exercise 4.10 — Simplifying using syntactic sugar. Let P be the following NAND-CIRC program:

Temp[0] = NAND(X[0],X[0])
Temp[1] = NAND(X[1],X[1])
Temp[2] = NAND(Temp[0],Temp[1])
Temp[3] = NAND(X[2],X[2])
Temp[4] = NAND(X[3],X[3])
Temp[5] = NAND(Temp[3],Temp[4])
Temp[6] = NAND(Temp[2],Temp[2])
Temp[7] = NAND(Temp[5],Temp[5])
Y[0] = NAND(Temp[6],Temp[7])

1. Write a program P′ with at most three lines of code that uses both NAND as well as the syntactic sugar OR and computes the same function as P.


8 You can use the fact that (a + b) + c mod 2 = a + b + c mod 2. In particular it means that if you have the lines d = XOR(a,b) and e = XOR(d,c) then e gets the sum modulo 2 of the variables a, b and c.

2. Draw a circuit that computes the same function as P and uses only AND and NOT gates.

In the following exercises you are asked to compare the power of pairs of programming languages. By "comparing the power" of two programming languages X and Y we mean determining the relation between the set of functions that are computable using programs in X and Y respectively. That is, to answer such a question you need to do both of the following:

1. Either prove that for every program P in X there is a program P′ in Y that computes the same function as P, or give an example of a function that is computable by an X-program but not computable by a Y-program.

and

2. Either prove that for every program P in Y there is a program P′ in X that computes the same function as P, or give an example of a function that is computable by a Y-program but not computable by an X-program.

When you give an example as above of a function that is computable in one programming language but not the other, you need to prove that the function you showed is (1) computable in the first programming language and (2) not computable in the second programming language.

Exercise 4.11 — Compare IF and NAND. Let IF-CIRC be the programming language where we have the following operations: foo = 0, foo = 1, foo = IF(cond,yes,no) (that is, we can use the constants 0 and 1, and the function IF : {0,1}^3 → {0,1} such that IF(a, b, c) equals b if a = 1 and equals c if a = 0). Compare the power of the NAND-CIRC programming language and the IF-CIRC programming language.

Exercise 4.12 — Compare XOR and NAND. Let XOR-CIRC be the programming language where we have the following operations: foo = XOR(bar,blah), foo = 1 and bar = 0 (that is, we can use the constants 0, 1 and the XOR function that maps a, b ∈ {0,1} to a + b mod 2). Compare the power of the NAND-CIRC programming language and the XOR-CIRC programming language. See footnote for hint.8

Exercise 4.13 — Circuits for majority. Prove that there is some constant c such that for every n > 1, MAJ_n ∈ SIZE(cn), where MAJ_n : {0,1}^n → {0,1} is the majority function on n input bits. That is, MAJ_n(x) = 1 iff ∑_{i=0}^{n−1} x_i > n/2. See footnote for hint.9

9 One approach to solve this is using recursion and the so-called Master Theorem.

Exercise 4.14 — Circuits for threshold. Prove that there is some constant c such that for every n > 1, and integers a_0, …, a_{n−1}, b ∈ {−2^n, −2^n + 1, …, −1, 0, +1, …, 2^n}, there is a NAND circuit with at most n^c gates that computes the threshold function f_{a_0,…,a_{n−1},b} : {0,1}^n → {0,1} that on input x ∈ {0,1}^n outputs 1 if and only if ∑_{i=0}^{n−1} a_i x_i > b.

4.8 BIBLIOGRAPHICAL NOTES

See Jukna's and Wegener's books [Juk12; Weg87] for much more extensive discussion on circuits. Shannon showed that every Boolean function can be computed by a circuit of exponential size [Sha38]. The improved bound of c · 2^n/n (with the optimal value of c for many bases) is due to Lupanov [Lup58]. An exposition of this for the case of NAND (where c = 1) is given in Chapter 4 of his book [Lup84]. (Thanks to Sasha Golovnev for tracking down this reference!)

The concept of "syntactic sugar" is also known as "macros" or "meta-programming" and is sometimes implemented via a preprocessor or macro language in a programming language or a text editor. One modern example is the Babel JavaScript syntax transformer, which converts JavaScript programs written using the latest features into a format that older browsers can accept. It even has a plug-in architecture that allows users to add their own syntactic sugar to the language.

Page 185: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

5 Code as data, data as code

"The term code script is, of course, too narrow. The chromosomal structures are at the same time instrumental in bringing about the development they foreshadow. They are law-code and executive power - or, to use another simile, they are architect's plan and builder's craft - in one.", Erwin Schrödinger, 1944.

"A mathematician would hardly call a correspondence between the set of 64 triples of four units and a set of twenty other units, "universal", while such correspondence is, probably, the most fundamental general feature of life on Earth", Misha Gromov, 2013

A program is simply a sequence of symbols, each of which can be encoded as a string of 0's and 1's using (for example) the ASCII standard. Therefore we can represent every NAND-CIRC program (and hence also every Boolean circuit) as a binary string. This statement seems obvious but it is actually quite profound. It means that we can treat circuits or NAND-CIRC programs both as instructions for carrying out computation and also as data that could potentially be used as inputs to other computations.

Big Idea 6 A program is a piece of text, and so it can be fed as input to other programs.

This correspondence between code and data is one of the most fundamental aspects of computing. It underlies the notion of general purpose computers, that are not pre-wired to compute only one task, and also forms the basis of our hope for obtaining general artificial intelligence. This concept finds immense use in all areas of computing, from scripting languages to machine learning, but it is fair to say that we haven't yet fully mastered it. Many security exploits involve cases such as "buffer overflows" when attackers manage to inject code where the system expected only "passive" data (see Fig. 5.1). The relation between code and data reaches beyond the realm of electronic


Learning Objectives:
• See one of the most important concepts in computing: duality between code and data.
• Build up comfort in moving between different representations of programs.
• Follow the construction of a "universal circuit evaluator" that can evaluate other circuits given their representation.
• See a major result that complements the result of the last chapter: some functions require an exponential number of gates to compute.
• Discussion of the Physical Extended Church-Turing Thesis stating that Boolean circuits capture all feasible computation in the physical world, and its physical and philosophical implications.


Figure 5.1: As illustrated in this xkcd cartoon, many exploits, including buffer overflow, SQL injections, and more, utilize the blurry line between "active programs" and "static strings".

computers. For example, DNA can be thought of as both a program and data (in the words of Schrödinger, who wrote before the discovery of DNA's structure a book that inspired Watson and Crick, DNA is both "architect's plan and builder's craft").

This chapter: A non-mathy overview

In this chapter, we will begin to explore some of the many applications of the correspondence between code and data.

We start by using the representation of programs/circuits as strings to count the number of programs/circuits up to a certain size, and use that to obtain a counterpart to the result we proved in Chapter 4. There we proved that every function can be computed by a circuit, but that circuit could be exponentially large (see Theorem 4.16 for the precise bound). In this chapter we will prove that there are some functions for which we cannot do better: the smallest circuit that computes them is exponentially large.

We will also use the notion of representing programs/circuits as strings to show the existence of a "universal circuit" - a circuit that can evaluate other circuits. In programming languages, this is known as a "meta-circular evaluator" - a program in a certain programming language that can evaluate other programs in the same language. These results do have an important restriction: the universal circuit will have to be of bigger size than the circuits it evaluates. We will show how to get around this restriction in Chapter 7 where we introduce loops and Turing Machines.

See Fig. 5.2 for an overview of the results of this chapter.

Figure 5.2: Overview of the results in this chapter. We use the representation of programs/circuits as strings to derive two main results. First we show the existence of a universal program/circuit, and in fact (with more work) the existence of such a program/circuit whose size is at most polynomial in the size of the program/circuit it evaluates. We then use the string representation to count the number of programs/circuits of a given size, and use that to establish that some functions require an exponential number of lines/gates to compute.


5.1 REPRESENTING PROGRAMS AS STRINGS

We can represent programs or circuits as strings in a myriad of ways. For example, since Boolean circuits are labeled directed acyclic graphs, we can use the adjacency matrix or adjacency list representations for them. However, since the code of a program is ultimately just a sequence of letters and symbols, arguably the conceptually simplest representation of a program is as such a sequence. For example, the following NAND-CIRC program P

temp_0 = NAND(X[0],X[1])
temp_1 = NAND(X[0],temp_0)
temp_2 = NAND(X[1],temp_0)
Y[0] = NAND(temp_1,temp_2)

is simply a string of 107 symbols which include lower and upper case letters, digits, the underscore character _ and equality sign =, punctuation marks such as "(", ")", ",", spaces, and "new line" markers (often denoted as "\n" or "↵"). Each such symbol can be encoded as a string of 7 bits using the ASCII encoding, and hence the program P can be encoded as a string of length 7 · 107 = 749 bits.
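This encoding can be sketched in a few lines of Python (our own illustrative code, not the book's; the exact symbol count depends on conventions such as trailing newlines):

```python
# Encode the program P as a bit string using 7-bit ASCII, as described above.
P = """temp_0 = NAND(X[0],X[1])
temp_1 = NAND(X[0],temp_0)
temp_2 = NAND(X[1],temp_0)
Y[0] = NAND(temp_1,temp_2)"""

# Each symbol becomes exactly 7 bits, so the encoding has length 7 * len(P).
bits = "".join(format(ord(ch), "07b") for ch in P)
print(len(P), len(bits))
```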

Nothing in the above discussion was specific to the program P, and hence we can use the same reasoning to prove that every NAND-CIRC program can be represented as a string in {0,1}*. In fact, we can do a bit better. Since the names of the working variables of a NAND-CIRC program do not affect its functionality, we can always transform a program to have the form of P' where all variables apart from the inputs and outputs have the form temp_0, temp_1, temp_2, etc. Moreover, if the program has s lines, then we will never need to use an index larger than 3s (since each line involves at most three variables), and similarly the indices of the input and output variables will all be at most 3s. Since a number between 0 and 3s can be expressed using at most ⌈log₁₀(3s + 1)⌉ = O(log s) digits, each line in the program (which has the form foo = NAND(bar,blah)) can be represented using O(1) + O(log s) = O(log s) symbols, each of which can be represented by 7 bits. Hence an s-line program can be represented as a string of O(s log s) bits, resulting in the following theorem:

Theorem 5.1 — Representing programs as strings. There is a constant c such that for every f ∈ SIZE(s), there exists a program P computing f whose string representation has length at most cs log s.



1 The implicit constant in the O(·) notation is smaller than 10. That is, for all sufficiently large s, |SIZE(s)| < 2^{10 s log s}, see Remark 5.4. As discussed in Section 1.7, we use the bound 10 simply because it is a round number.

2 "Astronomical" here is an understatement: there are much fewer than 2^{2^{10}} stars, or even particles, in the observable universe.

We omit the formal proof of Theorem 5.1, but please make sure that you understand why it follows from the reasoning above.

5.2 COUNTING PROGRAMS, AND LOWER BOUNDS ON THE SIZE OF NAND-CIRC PROGRAMS

One consequence of the representation of programs as strings is that the number of programs of a certain length is bounded by the number of strings that represent them. This has consequences for the sets SIZE(s) that we defined in Section 4.6.

Theorem 5.2 — Counting programs. For every s ∈ ℕ,

|SIZE(s)| ≤ 2^{O(s log s)}.   (5.1)

That is, there are at most 2^{O(s log s)} functions computed by NAND-CIRC programs of at most s lines.1

Proof. We will show a one-to-one map E from SIZE(s) to the set of strings of length at most ℓ = cs log s for some constant c. This will conclude the proof, since it implies that |SIZE(s)| is smaller than the size of the set of all strings of length at most ℓ, which equals 1 + 2 + 4 + ⋯ + 2^ℓ = 2^{ℓ+1} − 1 by the formula for sums of geometric progressions.

The map E will simply map f to the representation of the program computing f. Specifically, we let E(f) be the representation of the program P computing f given by Theorem 5.1. This representation has size at most cs log s, and moreover the map E is one-to-one, since if f ≠ f' then every two programs computing f and f' respectively must have different representations.

A function mapping {0,1}^2 to {0,1} can be identified with the table of its four values on the inputs 00, 01, 10, 11. A function mapping {0,1}^3 to {0,1} can be identified with the table of its eight values on the inputs 000, 001, 010, 011, 100, 101, 110, 111. More generally, every function F : {0,1}^n → {0,1} can be identified with the table of its 2^n values on the inputs {0,1}^n. Hence the number of functions mapping {0,1}^n to {0,1} is equal to the number of such tables, which (since we can choose either 0 or 1 for every row) is exactly 2^{2^n}. Note that this is double exponential in n, and hence even for small values of n (e.g., n = 10) the number of functions from {0,1}^n to {0,1} is truly astronomical.2 This has the following important corollary:
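To make the identification of functions with truth tables concrete, here is a short sketch (ours, not the book's) that enumerates all functions mapping {0,1}^2 to {0,1}:

```python
from itertools import product

n = 2
inputs = list(product([0, 1], repeat=n))  # the 2^n rows of the truth table
# Each function corresponds to one choice of output bit per row,
# so there are 2^(2^n) functions in total.
tables = list(product([0, 1], repeat=len(inputs)))
print(len(tables))  # 16 = 2^(2^2)
```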


3 The constant δ is at least 0.1 and in fact can be improved to be arbitrarily close to 1/2, see Exercise 5.7.

Theorem 5.3 — Counting argument lower bound. There is a constant δ > 0, such that for every sufficiently large n, there is a function f : {0,1}^n → {0,1} such that f ∉ SIZE(δ2^n/n). That is, the shortest NAND-CIRC program to compute f requires more than δ · 2^n/n lines.3

Proof. The proof is simple. If we let c be the constant such that |SIZE(s)| ≤ 2^{cs log s} and δ = 1/c, then setting s = δ2^n/n we see that

|SIZE(δ2^n/n)| ≤ 2^{c · (δ2^n/n) · log s} < 2^{cδ2^n} = 2^{2^n}   (5.2)

using the fact that, since s < 2^n, log s < n and δ = 1/c. But since |SIZE(s)| is smaller than the total number of functions mapping n bits to 1 bit, there must be at least one such function not in SIZE(s), which is what we needed to prove.
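A quick numeric sanity check of this counting argument (our own sketch; the values c = 10 and n = 30 are illustrative assumptions, not from the proof) can be done by comparing the base-2 logarithms of the two quantities:

```python
import math

c = 10          # illustrative constant from the counting bound (an assumption)
delta = 1 / c
n = 30
s = delta * 2**n / n   # the program-size budget from the proof

# log2 of the number of functions computed by size-s programs is at most c*s*log2(s);
# log2 of the number of ALL functions on n bits is 2^n.
programs_exp = c * s * math.log2(s)
functions_exp = 2**n
print(programs_exp < functions_exp)  # True: fewer small programs than functions
```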

We have seen before that every function mapping {0,1}^n to {0,1} can be computed by an O(2^n/n) line program. Theorem 5.3 shows that this is tight in the sense that some functions do require such an astronomical number of lines to compute.

Big Idea 7 Some functions f : {0,1}^n → {0,1} cannot be computed by a Boolean circuit using fewer than an exponential (in n) number of gates.

In fact, as we explore in the exercises, this is the case for most functions. Hence functions that can be computed in a small number of lines (such as addition, multiplication, finding short paths in graphs, or even the EVAL function) are the exception, rather than the rule.

Remark 5.4 — More efficient representation (advanced, optional). The ASCII representation is not the shortest representation for NAND-CIRC programs. NAND-CIRC programs are equivalent to circuits with NAND gates, which means that a NAND-CIRC program of s lines, n inputs, and m outputs can be represented by a labeled directed graph of s + n vertices, of which n have in-degree zero, and the s others have in-degree at most two. Using the adjacency matrix representation for such graphs, we can reduce the implicit constant in Theorem 5.2 to be arbitrarily close to 5, see Exercise 5.6.


Figure 5.4: We prove Theorem 5.5 by coming up with a list f_0, …, f_{2^n} of functions such that f_0 is the all-zero function, f_{2^n} is a function (obtained from Theorem 5.3) outside of SIZE(0.1 · 2^n/n), and such that f_{i−1} and f_i differ from one another on at most one input. We can show that for every i, the number of gates to compute f_i is at most 10n larger than the number of gates to compute f_{i−1}, and so if we let i be the smallest number such that f_i ∉ SIZE(s), then f_i ∈ SIZE(s + 10n).

5.2.1 Size hierarchy theorem (optional)

By Theorem 4.15 the class SIZE_n(10 · 2^n/n) contains all functions from {0,1}^n to {0,1}, while by Theorem 5.3, there is some function f : {0,1}^n → {0,1} that is not contained in SIZE_n(0.1 · 2^n/n). In other words, for every sufficiently large n,

SIZE๐‘› (0.1 2๐‘›๐‘› ) โŠŠ SIZE๐‘› (10 2๐‘›๐‘› ) . (5.3)

It turns out that we can use Theorem 5.3 to show a more general result: whenever we increase our "budget" of gates we can compute new functions.

Theorem 5.5 — Size Hierarchy Theorem. For every sufficiently large n and 10n < s < 0.1 · 2^n/n,

SIZE๐‘›(๐‘ ) โŠŠ SIZE๐‘›(๐‘  + 10๐‘›) . (5.4)

Proof Idea:

To prove the theorem we need to find a function f : {0,1}^n → {0,1} such that f can be computed by a circuit of s + 10n gates but it cannot be computed by a circuit of s gates. We will do so by coming up with a sequence of functions f_0, f_1, f_2, …, f_N with the following properties: (1) f_0 can be computed by a circuit of at most 10n gates, (2) f_N cannot be computed by a circuit of 0.1 · 2^n/n gates, and (3) for every i ∈ {0, …, N}, if f_i can be computed by a circuit of size s, then f_{i+1} can be computed by a circuit of size at most s + 10n. Together these properties imply that if we let i be the smallest number such that f_i ∉ SIZE_n(s), then since f_{i−1} ∈ SIZE(s) it must hold that f_i ∈ SIZE(s + 10n), which is what we need to prove. See Fig. 5.4 for an illustration.

Proof of Theorem 5.5. Let f* : {0,1}^n → {0,1} be the function (whose existence we are guaranteed by Theorem 5.3) such that f* ∉ SIZE_n(0.1 · 2^n/n). We define the functions f_0, f_1, …, f_{2^n} mapping {0,1}^n to {0,1} as follows. For every x ∈ {0,1}^n, if lex(x) ∈ {0, 1, …, 2^n − 1} is x's order in the lexicographical order, then

f_i(x) = f*(x) if lex(x) < i, and f_i(x) = 0 otherwise.   (5.5)

The function f_0 is simply the constant zero function, while the function f_{2^n} is equal to f*. Moreover, for every i ∈ [2^n], the functions f_i and f_{i+1} differ on at most one input (i.e., the input x ∈ {0,1}^n such that lex(x) = i). Let 10n < s < 0.1 · 2^n/n, and let i be the first index such that f_i ∉ SIZE_n(s). Since f_{2^n} = f* ∉ SIZE(0.1 · 2^n/n) there


must exist such an index i, and moreover i > 0 since the constant zero function is a member of SIZE_n(10n).

By our choice of i, f_{i−1} is a member of SIZE_n(s). To complete the proof, we need to show that f_i ∈ SIZE_n(s + 10n). Let x* be the string such that lex(x*) = i − 1, and let b ∈ {0,1} be the value of f*(x*). Then we can define f_i also as follows:

f_i(x) = b if x = x*, and f_i(x) = f_{i−1}(x) if x ≠ x*.   (5.6)

or in other words

f_i(x) = (f_{i−1}(x) ∧ ¬EQUAL(x*, x)) ∨ (b ∧ EQUAL(x*, x))   (5.7)

where EQUAL : {0,1}^{2n} → {0,1} is the function that maps x, x' ∈ {0,1}^n to 1 if they are equal and to 0 otherwise. Since (by our choice of i) f_{i−1} can be computed using at most s gates and (as can be easily verified) EQUAL ∈ SIZE_n(9n), we can compute f_i using at most s + 9n + O(1) ≤ s + 10n gates, which is what we wanted to prove.

Figure 5.5: An illustration of some of what we know about the size complexity classes (not to scale!). This figure depicts classes of the form SIZE_{n,n}(s) but the state of affairs for other size complexity classes such as SIZE_{n,1}(s) is similar. We know by Theorem 4.12 (with the improvement of Section 4.4.2) that all functions mapping n bits to n bits can be computed by a circuit of size c · 2^n for c ≤ 10, while on the other hand the counting lower bound (Theorem 5.3, see also Exercise 5.4) shows that some such functions will require 0.1 · 2^n, and the size hierarchy theorem (Theorem 5.5) shows the existence of functions in SIZE(S) ∖ SIZE(s) whenever s = o(S), see also Exercise 5.5. We also consider some specific examples: addition of two n/2-bit numbers can be done in O(n) lines, while we don't know of such a program for multiplying two n-bit numbers, though we do know it can be done in O(n^2) and in fact even better size. In the above, FACTOR_n corresponds to the inverse problem of multiplying: finding the prime factorization of a given number. At the moment we do not know of any circuit with a polynomial (or even sub-exponential) number of lines that can compute FACTOR_n.

Remark 5.6 — Explicit functions. While the size hierarchy theorem guarantees that there exists some function that can be computed using, for example, n^2 gates, but not using 100n gates, we do not know of any explicit example of such a function. While we suspect that integer multiplication is such an example, we do not have any proof that this is the case.


5.3 THE TUPLES REPRESENTATION

ASCII is a fine representation of programs, but for some applications it is useful to have a more concrete representation of NAND-CIRC programs. In this section we describe a particular choice, which will be convenient for us later on. A NAND-CIRC program is simply a sequence of lines of the form

blah = NAND(baz,boo)

There is of course nothing special about the particular names we use for variables. Although they would be harder to read, we could write all our programs using only working variables such as temp_0, temp_1 etc. Therefore, our representation for NAND-CIRC programs ignores the actual names of the variables, and just associates a number with each variable. We encode a line of the program as a triple of numbers. If the line has the form foo = NAND(bar,blah) then we encode it with the triple (i, j, k) where i is the number corresponding to the variable foo and j and k are the numbers corresponding to bar and blah respectively.

More concretely, we will associate every variable with a number in the set [t] = {0, 1, …, t − 1}. The first n numbers {0, …, n − 1} correspond to the input variables, the last m numbers {t − m, …, t − 1} correspond to the output variables, and the intermediate numbers {n, …, t − m − 1} correspond to the remaining "workspace" variables. Formally, we define our representation as follows:

Definition 5.7 — List of tuples representation. Let P be a NAND-CIRC program of n inputs, m outputs, and s lines, and let t be the number of distinct variables used by P. The list of tuples representation of P is the triple (n, m, L) where L is a list of triples of the form (i, j, k) for i, j, k ∈ [t].

We assign a number to each variable of P as follows:

• For every i ∈ [n], the variable X[i] is assigned the number i.
• For every j ∈ [m], the variable Y[j] is assigned the number t − m + j.
• Every other variable is assigned a number in {n, n + 1, …, t − m − 1} in the order in which the variable appears in the program P.

The list of tuples representation is our default choice for representing NAND-CIRC programs. Since "list of tuples representation" is a bit of a mouthful, we will often call it simply "the representation" for a program P. Sometimes, when the number n of inputs and number


4 If you're curious what these few lines are, see our GitHub repository.

m of outputs are known from the context, we will simply represent a program as the list L instead of the triple (n, m, L).

Example 5.8 — Representing the XOR program. Our favorite NAND-CIRC program, the program

u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)

computing the XOR function is represented as the tuple (2, 1, L) where L = ((2, 0, 1), (3, 0, 2), (4, 1, 2), (5, 3, 4)). That is, the variables X[0] and X[1] are given the indices 0 and 1 respectively, the variables u, v, w are given the indices 2, 3, 4 respectively, and the variable Y[0] is given the index 5.

Transforming a NAND-CIRC program from its representation as

code to the representation as a list of tuples is a fairly straightforward programming exercise, and in particular can be done in a few lines of Python.4 The list-of-tuples representation loses information such as the particular names we used for the variables, but this is OK since these names do not make a difference to the functionality of the program.
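One possible way to carry out this transformation is sketched below (our own illustrative code, not the book's; the helper name parse_nandcirc is ours, and it assumes lines of the exact form foo = NAND(bar,blah) with no spaces inside the parentheses):

```python
import re

def parse_nandcirc(src):
    """Convert NAND-CIRC source code to the (n, m, L) list-of-tuples form."""
    lines = []
    for line in src.strip().splitlines():
        target, left, right = re.match(
            r"(\S+)\s*=\s*NAND\((\S+),(\S+)\)", line).groups()
        lines.append((target, left, right))
    # Inputs X[i] get numbers 0..n-1; outputs Y[j] get t-m..t-1.
    n = 1 + max(int(v[2:-1]) for _, a, b in lines for v in (a, b)
                if v.startswith("X["))
    m = 1 + max(int(tg[2:-1]) for tg, _, _ in lines if tg.startswith("Y["))
    # Workspace variables get numbers n, n+1, ... in order of appearance.
    var_ids, next_id = {}, n
    for names in lines:
        for v in names:
            if not v.startswith(("X[", "Y[")) and v not in var_ids:
                var_ids[v] = next_id
                next_id += 1
    t = next_id + m  # total number of distinct variables
    def num(v):
        if v.startswith("X["): return int(v[2:-1])
        if v.startswith("Y["): return t - m + int(v[2:-1])
        return var_ids[v]
    L = [(num(a), num(b), num(c)) for a, b, c in lines]
    return n, m, L

# The XOR program of Example 5.8 yields (2, 1, [(2,0,1), (3,0,2), (4,1,2), (5,3,4)]).
xor_src = """u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)"""
print(parse_nandcirc(xor_src))
```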

5.3.1 From tuples to strings

If P is a program of size s, then the number t of variables is at most 3s (as every line touches at most three variables). Hence we can encode every variable index in [t] as a string of length ℓ = ⌈log(3s)⌉, by adding leading zeroes as needed. Since this is a fixed-length encoding, it is prefix-free, and so we can encode the list L of s triples (corresponding to the encoding of the s lines of the program) as simply the string of length 3ℓs obtained by concatenating all of these encodings.

We define S(s) to be the length of the string representing the list L corresponding to a size-s program. By the above we see that

S(s) = 3s⌈log(3s)⌉.   (5.8)

We can represent P = (n, m, L) as a string by prepending a prefix-free representation of n and m to the list L. Since n, m ≤ 3s (a program must touch at least once all its input and output variables), those prefix-free representations can be encoded using strings of length O(log s). In particular, every program P of at most s lines can be represented by a string of length O(s log s). Similarly, every circuit C of at most s gates can be represented by a string of length O(s log s) (for example by translating C to the equivalent program P).
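The fixed-length encoding of the list L can be sketched as follows (our own illustrative code; the function name encode_tuples is ours, and logarithms are taken base 2):

```python
import math

def encode_tuples(s, L):
    """Encode a list L of s triples over [3s] as a bit string of length 3*ell*s."""
    ell = math.ceil(math.log2(3 * s))  # bits per variable index
    return "".join(format(v, f"0{ell}b") for triple in L for v in triple)

# The XOR program from Example 5.8: s = 4 lines, ell = ceil(log2(12)) = 4,
# so the encoding has length 3 * 4 * 4 = 48, matching S(s) = 3s*ceil(log(3s)).
L = [(2, 0, 1), (3, 0, 2), (4, 1, 2), (5, 3, 4)]
bits = encode_tuples(4, L)
print(len(bits))  # 48
```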


5.4 A NAND-CIRC INTERPRETER IN NAND-CIRC

Since we can represent programs as strings, we can also think of a program as an input to a function. In particular, for all natural numbers s, n, m > 0 we define the function EVAL_{s,n,m} : {0,1}^{S(s)+n} → {0,1}^m as follows:

EVAL๐‘ ,๐‘›,๐‘š(๐‘๐‘ฅ) = โŽงโŽจโŽฉ๐‘ƒ(๐‘ฅ) ๐‘ โˆˆ 0, 1๐‘†(๐‘ ) represents a size-๐‘  program ๐‘ƒ with ๐‘› inputs and ๐‘š outputs0๐‘š otherwise(5.9)

where ๐‘†(๐‘ ) is defined as in (5.8) and we use the concrete representa-tion scheme described in Section 5.1.

That is, EVAL_{s,n,m} takes as input the concatenation of two strings: a string p ∈ {0,1}^{S(s)} and a string x ∈ {0,1}^n. If p is a string that represents a list of triples L such that (n, m, L) is a list-of-tuples representation of a size-s NAND-CIRC program P, then EVAL_{s,n,m}(px) is equal to the evaluation P(x) of the program P on the input x. Otherwise, EVAL_{s,n,m}(px) equals 0^m (this case is not very important: you can simply think of 0^m as some "junk value" that indicates an error).

Take-away points. The fine details of EVAL_{s,n,m}'s definition are not very crucial. Rather, what you need to remember about EVAL_{s,n,m} is that:

• EVAL_{s,n,m} is a finite function taking a string of fixed length as input and outputting a string of fixed length as output.

• EVAL_{s,n,m} is a single function, such that computing EVAL_{s,n,m} allows one to evaluate arbitrary NAND-CIRC programs of a certain length on arbitrary inputs of the appropriate length.

• EVAL_{s,n,m} is a function, not a program (recall the discussion in Section 3.6.2). That is, EVAL_{s,n,m} is a specification of what output is associated with what input. The existence of a program that computes EVAL_{s,n,m} (i.e., an implementation for EVAL_{s,n,m}) is a separate fact, which needs to be established (and which we will do in Theorem 5.9, with a more efficient program shown in Theorem 5.10).

One of the first examples of self-circularity we will see in this book is the following theorem, which we can think of as showing a "NAND-CIRC interpreter in NAND-CIRC":

Theorem 5.9 — Bounded Universality of NAND-CIRC programs. For every s, n, m ∈ ℕ with s ≥ m there is a NAND-CIRC program U_{s,n,m} that computes the function EVAL_{s,n,m}.

That is, the NAND-CIRC program U_{s,n,m} takes the description of any other NAND-CIRC program P (of the right length and inputs/outputs) and any input x, and computes the result of evaluating the program P on the input x. Given the equivalence between NAND-CIRC programs and Boolean circuits, we can also think of U_{s,n,m} as a circuit that takes as input the description of other circuits and their inputs, and returns their evaluation, see Fig. 5.6. We call this NAND-CIRC program U_{s,n,m} that computes EVAL_{s,n,m} a bounded universal program (or a universal circuit, see Fig. 5.6). "Universal" stands for the fact that this is a single program that can evaluate arbitrary code, where "bounded" stands for the fact that U_{s,n,m} only evaluates programs of bounded size. Of course this limitation is inherent for the NAND-CIRC programming language, since a program of s lines (or, equivalently, a circuit of s gates) can take at most 2s inputs. Later, in Chapter 7, we will introduce the concept of loops (and the model of Turing Machines), that allow us to escape this limitation.

Figure 5.6: A universal circuit U is a circuit that gets as input the description of an arbitrary (smaller) circuit P as a binary string, and an input x, and outputs the string P(x) which is the evaluation of P on x. We can also think of U as a straight-line program that gets as input the code of a straight-line program P and an input x, and outputs P(x).

Proof. Theorem 5.9 is an important result, but it is actually not hard to prove. Specifically, since EVAL_{s,n,m} is a finite function, Theorem 5.9 is an immediate corollary of Theorem 4.12, which states that every finite function can be computed by some NAND-CIRC program.

Theorem 5.9 is simple but important. Make sure you understand what this theorem means, and why it is a corollary of Theorem 4.12.

5.4.1 Efficient universal programs

Theorem 5.9 establishes the existence of a NAND-CIRC program for computing EVAL_{s,n,m}, but it provides no explicit bound on the size of this program. Theorem 4.12, which we used to prove Theorem 5.9, guarantees the existence of a NAND-CIRC program whose size can be as large as exponential in the length of its input. This would mean that even for moderately small values of s, n, m (for example n = 100, s = 300, m = 1), computing EVAL_{s,n,m} might require a NAND program with more lines than there are atoms in the observable universe! Fortunately, we can do much better than that. In fact, for every s, n, m there exists a NAND-CIRC program for computing EVAL_{s,n,m} with size that is polynomial in its input length. This is shown in the following theorem.

Theorem 5.10 — Efficient bounded universality of NAND-CIRC programs. For every s, n, m ∈ ℕ there is a NAND-CIRC program of at most O(s^2 log s) lines that computes the function EVAL_{s,n,m} : {0,1}^{S+n} → {0,1}^m defined above (where S is the number of bits needed to represent programs of s lines).

P
If you haven't done so already, now might be a good time to review O notation in Section 1.4.8. In particular, an equivalent way to state Theorem 5.10 is that it says that there exists some number c > 0 such that for every s, n, m ∈ ℕ, there exists a NAND-CIRC program P of at most c·s^2 log s lines that computes the function EVAL_{s,n,m}.

Unlike Theorem 5.9, Theorem 5.10 is not a trivial corollary of the fact that every finite function can be computed by some circuit. Proving Theorem 5.10 requires us to present a concrete NAND-CIRC program for computing the function EVAL_{s,n,m}. We will do so in several stages.

1. First, we will describe the algorithm to evaluate EVAL_{s,n,m} in "pseudo code".

2. Then, we will show how we can write a program to compute EVAL_{s,n,m} in Python. We will not use much about Python, and a reader who has familiarity with programming in any language should be able to follow along.

3. Finally, we will show how we can transform this Python program into a NAND-CIRC program.

This approach yields much more than just proving Theorem 5.10: we will see that it is in fact always possible to transform (loop-free) code in high-level languages such as Python to NAND-CIRC programs (and hence to Boolean circuits as well).

5.4.2 A NAND-CIRC interpreter in "pseudocode"

To prove Theorem 5.10 it suffices to give a NAND-CIRC program of O(s^2 log s) lines that can evaluate NAND-CIRC programs of s lines. Let us start by thinking how we would evaluate such programs if we weren't restricted to only performing NAND operations. That is, let us describe informally an algorithm that on input n, m, s, a list of triples L, and a string x ∈ {0,1}^n, evaluates the program represented by (n, m, L) on the string x.

P
It would be highly worthwhile for you to stop here and try to solve this problem yourself. For example, you can try thinking how you would write a program NANDEVAL(n,m,s,L,x) that computes this function in the programming language of your choice.

We will now describe such an algorithm. We assume that we have access to a bit array data structure that can store for every i ∈ [t] a bit T_i ∈ {0,1}. Specifically, if Table is a variable holding this data structure, then we assume we can perform the operations:

• GET(Table,i) which retrieves the bit corresponding to i in Table. The value of i is assumed to be an integer in [t].

• Table = UPDATE(Table,i,b) which updates Table so that the bit corresponding to i is now set to b. The value of i is assumed to be an integer in [t] and b is a bit in {0,1}.

Algorithm 5.11 — Eval NAND-CIRC programs.

Input: Numbers n, m, s and t ≤ 3s, as well as a list L of s triples of numbers in [t], and a string x ∈ {0,1}^n.
Output: Evaluation of the program represented by (n, m, L) on the input x ∈ {0,1}^n.

1: Let Vartable be a table of size t
2: for i in [n] do
3:   Vartable = UPDATE(Vartable, i, x_i)
4: end for
5: for (i,j,k) in L do
6:   a ← GET(Vartable, j)
7:   b ← GET(Vartable, k)
8:   Vartable = UPDATE(Vartable, i, NAND(a,b))
9: end for
10: for j in [m] do
11:   y_j ← GET(Vartable, t − m + j)
12: end for
13: return y_0, …, y_{m−1}

Algorithm 5.11 evaluates the program given to it as input one line at a time, updating the Vartable table to contain the value of each variable. At the end of the execution it outputs the variables at positions t − m, t − m + 1, …, t − 1, which correspond to the output variables.

5.4.3 A NAND interpreter in Python

To make things more concrete, let us see how we would implement Algorithm 5.11 in the Python programming language. (There is nothing special about Python. We could have easily presented a corresponding function in JavaScript, C, OCaml, or any other programming language.) We will construct a function NANDEVAL that on input n, m, L, x will output the result of evaluating the program represented by (n, m, L) on x. To keep things simple, we will not worry about the case that L does not represent a valid program of n inputs and m outputs. The code is presented in Fig. 5.7.

5 Python does not distinguish between lists and arrays, but allows constant-time random access to indexed elements in both of them. One could argue that if we allowed programs of truly unbounded length (e.g., larger than 2^64) then the price would not be constant but logarithmic in the length of the array/list, but the difference between O(s) and O(s log s) will not be important for our discussions.

Accessing an element of the array Vartable at a given index takes a constant number of basic operations. Hence (since n, m ≤ s and t ≤ 3s), the program above will use O(s) basic operations.5

5.4.4 Constructing the NAND-CIRC interpreter in NAND-CIRC

We now turn to describing the proof of Theorem 5.10. To prove the theorem it is not enough to give a Python program. Rather, we need to show how we can compute the function EVAL_{s,n,m} using a NAND-CIRC program. In other words, our job is to transform, for every s, n, m, the Python code of Section 5.4.3 into a NAND-CIRC program U_{s,n,m} that computes the function EVAL_{s,n,m}.

P
Before reading further, try to think how you could give a "constructive proof" of Theorem 5.10. That is, think of how you would write, in the programming language of your choice, a function universal(s,n,m) that on input s, n, m outputs the code for the NAND-CIRC program U_{s,n,m} such that U_{s,n,m} computes EVAL_{s,n,m}. There is a subtle but crucial difference between this function and the Python NANDEVAL program described above. Rather than actually evaluating a given program P on some input x, the function universal should output the code of a NAND-CIRC program that computes the map (P, x) ↦ P(x).

Our construction will follow very closely the Python implementation of EVAL above. We will use variables Vartable[0], …, Vartable[2^ℓ − 1], where ℓ = ⌈log 3s⌉, to store our variables. However, NAND-CIRC doesn't have integer-valued variables, so we cannot write code such as Vartable[i] for some variable i. Nevertheless, we can implement the function GET(Vartable,i) that outputs the i-th bit of the array Vartable. Indeed, this is nothing but the function LOOKUP_ℓ that we have seen in Theorem 4.10!

P
Please make sure that you understand why GET and LOOKUP_ℓ are the same function.

We saw that we can compute LOOKUP_ℓ using O(2^ℓ) = O(s) lines for our choice of ℓ.
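As a sanity check, here is a small Python sketch of the recursive construction of LOOKUP_ℓ, built from a NAND helper and the NAND-based multiplexer IF. (The function names and the most-significant-bit-first ordering of the index bits are conventions of this sketch, not fixed by the text.)

```python
def NAND(a, b):
    return 1 - a * b

def IF(cond, a, b):
    # Multiplexer: IF(cond,a,b) = a if cond == 1 else b, using only NAND.
    notcond = NAND(cond, cond)
    return NAND(NAND(cond, a), NAND(notcond, b))

def LOOKUP(V, i):
    # V: list of 2^ell bits; i: list of ell bits (most significant first).
    # Size satisfies T(ell) = 2*T(ell-1) + O(1) = O(2^ell), matching the text.
    if len(i) == 0:
        return V[0]
    half = len(V) // 2
    low = LOOKUP(V[:half], i[1:])    # result if i[0] == 0
    high = LOOKUP(V[half:], i[1:])   # result if i[0] == 1
    return IF(i[0], high, low)
```

For example, LOOKUP([0,1,1,0], [1,0]) returns the bit at index 2 of the array, namely 1.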


Figure 5.7: Code for evaluating a NAND-CIRC program given in the list-of-tuples representation

def NAND(a,b): return 1-a*b  # NAND itself, so the snippet is self-contained

def NANDEVAL(n,m,L,X):
    # Evaluate a NAND-CIRC program from list of tuples representation.
    s = len(L) # num of lines
    t = max(max(a,b,c) for (a,b,c) in L)+1 # max index in L + 1
    Vartable = [0] * t # initialize array

    # helper functions
    def GET(V,i): return V[i]
    def UPDATE(V,i,b):
        V[i]=b
        return V

    # load input values to Vartable:
    for i in range(n):
        Vartable = UPDATE(Vartable,i,X[i])

    # Run the program
    for (i,j,k) in L:
        a = GET(Vartable,j)
        b = GET(Vartable,k)
        c = NAND(a,b)
        Vartable = UPDATE(Vartable,i,c)

    # Return outputs Vartable[t-m], Vartable[t-m+1],...,Vartable[t-1]
    return [GET(Vartable,t-m+j) for j in range(m)]

# Test on XOR (2 inputs, 1 output)
L = ((2, 0, 1), (3, 0, 2), (4, 1, 2), (5, 3, 4))
print(NANDEVAL(2,1,L,(0,1))) # XOR(0,1)
# [1]
print(NANDEVAL(2,1,L,(1,1))) # XOR(1,1)
# [0]


For every ℓ, let UPDATE_ℓ : {0,1}^{2^ℓ+ℓ+1} → {0,1}^{2^ℓ} correspond to the UPDATE function for arrays of length 2^ℓ. That is, on input V ∈ {0,1}^{2^ℓ}, i ∈ {0,1}^ℓ, b ∈ {0,1}, UPDATE_ℓ(V, i, b) is equal to V' ∈ {0,1}^{2^ℓ} such that

V'_j = V_j  if j ≠ i,  and  V'_j = b  if j = i    (5.10)

where we identify the string i ∈ {0,1}^ℓ with a number in {0, …, 2^ℓ − 1} using the binary representation. We can compute UPDATE_ℓ using an O(2^ℓ · ℓ) = O(s log s) line NAND-CIRC program as follows:

1. For every j ∈ [2^ℓ], there is an O(ℓ) line NAND-CIRC program to compute the function EQUALS_j : {0,1}^ℓ → {0,1} that on input i outputs 1 if and only if i is equal to (the binary representation of) j. (We leave verifying this as Exercise 5.2 and Exercise 5.3.)

2. We have seen that we can compute the function IF : {0,1}^3 → {0,1} such that IF(a,b,c) equals b if a = 1 and c if a = 0.

Together, this means that we can compute UPDATE (using some "syntactic sugar" for bounded length loops) as follows:

def UPDATE_ell(V,i,b):
    # Get V[0]...V[2^ell-1], i in {0,1}^ell, b in {0,1}
    # Return NewV[0],...,NewV[2^ell-1]
    # updated array with NewV[i]=b and all
    # else same as V
    for j in range(2**ell): # j = 0,1,2,...,2^ell - 1
        a = EQUALS_j(i)
        NewV[j] = IF(a,b,V[j])
    return NewV
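The pseudocode above can be fleshed out into a runnable Python sketch. Here EQUALS takes the target j as a plain Python parameter, mimicking the fact that each EQUALS_j is a separate hard-wired circuit; the least-significant-bit-first ordering of the index bits is an assumption of this sketch.

```python
def NAND(a, b): return 1 - a * b
def NOT(a): return NAND(a, a)
def AND(a, b): return NOT(NAND(a, b))

def IF(cond, a, b):
    # IF(cond,a,b) = a if cond == 1 else b, using only NAND operations.
    return NAND(NAND(cond, a), NAND(NOT(cond), b))

def EQUALS(j, i_bits):
    # 1 iff the bits i_bits (least significant first) encode the number j.
    # Uses O(ell) gates, matching step 1 above.
    result = 1
    for pos, bit in enumerate(i_bits):
        wanted = (j >> pos) & 1
        result = AND(result, bit if wanted else NOT(bit))
    return result

def UPDATE_ell(V, i_bits, b):
    # V: list of 2^ell bits; i_bits: ell bits; b: a bit.
    # 2^ell iterations, each costing O(ell) gates: O(2^ell * ell) in total.
    return [IF(EQUALS(j, i_bits), b, V[j]) for j in range(len(V))]
```

For instance, UPDATE_ell([0,0,0,0], [0,1], 1) sets the bit at index 2 (binary 10) and leaves the rest untouched.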

Since the loop over j in UPDATE is run 2^ℓ times, and computing EQUALS_j takes O(ℓ) lines, the total number of lines to compute UPDATE is O(2^ℓ · ℓ) = O(s log s). Once we can compute GET and UPDATE, the rest of the implementation amounts to "bookkeeping" that needs to be done carefully, but is not too insightful, and hence we omit the full details. Since we run GET and UPDATE s times, the total number of lines for computing EVAL_{s,n,m} is O(s^2) + O(s^2 log s) = O(s^2 log s). This completes (up to the omitted details) the proof of Theorem 5.10.
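To see where the O(s^2 log s) bound comes from, one can tabulate the per-step costs in a short script. The constants below are illustrative placeholders, not the actual constants of the construction; only the growth rate matters.

```python
import math

def universal_program_lines(s):
    # Crude line-count estimate mirroring the proof of Theorem 5.10.
    ell = math.ceil(math.log2(3 * s))
    lookup_cost = 2 ** ell            # GET = LOOKUP_ell: O(2^ell) = O(s) lines
    update_cost = (2 ** ell) * ell    # UPDATE_ell: O(2^ell * ell) lines
    # s iterations, each doing two GETs and one UPDATE:
    return s * (2 * lookup_cost + update_cost)
```

Doubling s roughly quadruples the count (up to the log factor), consistent with O(s^2 log s).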

R
Remark 5.12 — Improving to quasilinear overhead (advanced, optional note). The NAND-CIRC program above is less efficient than its Python counterpart, since NAND-CIRC does not offer arrays with efficient random access. Hence, for example, the LOOKUP operation


6 ARM stands for โ€œAdvanced RISC Machineโ€ whereRISC in turn stands for โ€œReduced instruction setcomputerโ€.

on an array of s bits takes Ω(s) lines in NAND-CIRC even though it takes O(1) steps (or maybe O(log s) steps, depending on how we count) in Python.
It turns out that it is possible to improve the bound of Theorem 5.10, and evaluate s-line NAND-CIRC programs using a NAND-CIRC program of O(s log s) lines. The key is to consider the description of NAND-CIRC programs as circuits, and in particular as directed acyclic graphs (DAGs) of bounded in-degree. A universal NAND-CIRC program U_s for s-line programs will correspond to a universal graph H_s for such s-vertex DAGs. We can think of such a graph H_s as fixed "wiring" for a communication network that should be able to accommodate any arbitrary pattern of communication between s vertices (where this pattern corresponds to an s-line NAND-CIRC program). It turns out that such efficient routing networks exist that allow embedding any s-vertex circuit inside a universal graph of size O(s log s); see the bibliographical notes (Section 5.9) for more on this issue.

5.5 A PYTHON INTERPRETER IN NAND-CIRC (DISCUSSION)

To prove Theorem 5.10 we essentially translated every line of the Python program for EVAL into an equivalent NAND-CIRC snippet. However, none of our reasoning was specific to the particular function EVAL. It is possible to translate every Python program into an equivalent NAND-CIRC program of comparable efficiency. (More concretely, if the Python program takes T(n) operations on inputs of length at most n, then there exists a NAND-CIRC program of O(T(n) log T(n)) lines that agrees with the Python program on inputs of length n.) Actually doing so requires taking care of many details and is beyond the scope of this book, but let me try to convince you why you should believe it is possible in principle.

For starters, one can use CPython (the reference implementation for Python) to evaluate every Python program using a C program. We can combine this with a C compiler to transform a Python program to various flavors of "machine language". So, to transform a Python program into an equivalent NAND-CIRC program, it is enough to show how to transform a machine language program into an equivalent NAND-CIRC program. One minimalistic (and hence convenient) family of machine languages is known as the ARM architecture, which powers many mobile devices including essentially all Android devices.6 There are even simpler machine languages, such as the LEG architecture, for which a backend for the LLVM compiler was implemented (and hence it can be the target of compiling any of the large and growing list of languages that this compiler supports). Other examples include the TinyRAM architecture (motivated by interactive proof systems that we will discuss in Chapter 22) and the teaching-oriented Ridiculously Simple Computer architecture. Going one by one over the instruction sets of such computers and translating them to NAND snippets is no fun, but it is a feasible thing to do. In fact, ultimately this is very similar to the transformation that takes place in converting our high-level code to actual silicon gates that are not so different from the operations of a NAND-CIRC program. Indeed, tools such as MyHDL that transform "Python to Silicon" can be used to convert a Python program to a NAND-CIRC program.

The NAND-CIRC programming language is just a teaching tool, and by no means do I suggest that writing NAND-CIRC programs, or compilers to NAND-CIRC, is a practical, useful, or enjoyable activity. What I do want is to make sure you understand why it can be done, and to have the confidence that if your life (or at least your grade) depended on it, then you would be able to do this. Understanding how programs in high-level languages such as Python are eventually transformed into concrete low-level representations such as NAND is fundamental to computer science.

The astute reader might notice that the above paragraphs only outlined why it should be possible to find, for every particular Python-computable function f, a particular comparably efficient NAND-CIRC program P that computes f. But this still seems to fall short of our goal of writing a "Python interpreter in NAND," which would mean that for every parameter s, we come up with a single NAND-CIRC program UNIV_s such that given a description of a Python program P, a particular input x, and a bound T on the number of operations (where the lengths of P and x and the value of T are all at most s) it returns the result of executing P on x for at most T steps. After all, the transformation above takes every Python program into a different NAND-CIRC program, and so does not yield "one NAND-CIRC program to rule them all" that can evaluate every Python program up to some given complexity. However, we can in fact obtain one NAND-CIRC program to evaluate arbitrary Python programs. The reason is that there exists a Python interpreter in Python: a Python program U that takes a bit string, interprets it as Python code, and then runs that code. Hence, we only need to show a NAND-CIRC program U* that computes the same function as the particular Python program U, and this will give us a way to evaluate all Python programs.
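The existence of a Python interpreter in Python is easy to demonstrate. Here is a minimal sketch using the built-in exec; a real interpreter such as the one inside CPython is of course far more involved, but the self-referential principle is the same.

```python
def run_python(src, env=None):
    # A toy "Python interpreter in Python": take Python source code
    # as a string, execute it, and return the resulting environment.
    env = {} if env is None else env
    exec(src, env)
    return env

env = run_python("y = sum(i*i for i in range(4))")
print(env["y"])  # prints: 14
```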

What we are seeing time and again is the notion of universality or self-reference of computation, which is the sense that all reasonably rich models of computation are expressive enough that they can "simulate themselves". The importance of this phenomenon to both the theory and practice of computing, as well as far beyond it, including the foundations of mathematics and basic questions in science, cannot be overstated.

5.6 THE PHYSICAL EXTENDED CHURCH-TURING THESIS (DISCUSSION)

We've seen that NAND gates (and other Boolean operations) can be implemented using very different systems in the physical world. What about the reverse direction? Can NAND-CIRC programs simulate any physical computer?

We can take a leap of faith and stipulate that Boolean circuits (or equivalently NAND-CIRC programs) do actually encapsulate every computation that we can think of. Such a statement (in the realm of infinite functions, which we'll encounter in Chapter 7) is typically attributed to Alonzo Church and Alan Turing, and in that context is known as the Church-Turing Thesis. As we will discuss in future lectures, the Church-Turing Thesis is not a mathematical theorem or conjecture. Rather, like theories in physics, the Church-Turing Thesis is about mathematically modeling the real world. In the context of finite functions, we can make the following informal hypothesis or prediction:

"Physical Extended Church-Turing Thesis" (PECTT): If a function F : {0,1}^n → {0,1}^m can be computed in the physical world using s amount of "physical resources" then it can be computed by a Boolean circuit of roughly s gates.

A priori it might seem rather extreme to hypothesize that our meager model of NAND-CIRC programs or Boolean circuits captures all possible physical computation. And yet, in more than a century of computing technologies, no one has yet built any scalable computing device that challenges this hypothesis.

We now discuss the "fine print" of the PECTT in more detail, as well as the (so far unsuccessful) challenges that have been raised against it. There is no single universally-agreed-upon formalization of "roughly s physical resources", but we can approximate this notion by considering the size of any physical computing device and the time it takes to compute the output, and ask that any such device can be simulated by a Boolean circuit with a number of gates that is a polynomial (with not too large an exponent) in the size of the system and the time it takes it to operate.

In other words, we can phrase the PECTT as stipulating that any function that can be computed by a device that takes a certain volume V of space and requires t time to complete the computation, must be computable by a Boolean circuit with a number of gates p(V, t) that is polynomial in V and t.


The exact form of the function p(V, t) is not universally agreed upon, but it is generally accepted that if f : {0,1}^n → {0,1} is an exponentially hard function, in the sense that it has no NAND-CIRC program of fewer than, say, 2^{n/2} lines, then a demonstration of a physical device that can compute f in the real world for moderate input lengths (e.g., n = 500) would be a violation of the PECTT.

R
Remark 5.13 — Advanced note: making PECTT concrete (advanced, optional). We can attempt a more exact phrasing of the PECTT as follows. Suppose that Z is a physical system that accepts n binary stimuli and has a binary output, and can be enclosed in a sphere of volume V. We say that the system Z computes a function f : {0,1}^n → {0,1} within t seconds if whenever we set the stimuli to some value x ∈ {0,1}^n, if we measure the output after t seconds then we obtain f(x).
One can then phrase the PECTT as stipulating that if there exists such a system Z that computes f within t seconds, then there exists a NAND-CIRC program that computes f and has at most α(Vt)^2 lines, where α is some normalization constant. (We can also consider variants where we use surface area instead of volume, or take (Vt) to a different power than 2. However, none of these choices makes a qualitative difference to the discussion below.) In particular, suppose that f : {0,1}^n → {0,1} is a function that requires 2^n/(100n) > 2^{0.8n} lines for any NAND-CIRC program (such a function exists by Theorem 5.3). Then the PECTT would imply that either the volume or the time of a system that computes f will have to be at least 2^{0.2n}/√α. Since this quantity grows exponentially in n, it is not hard to set parameters so that even for moderately large values of n, such a system could not fit in our universe.
To fully make the PECTT concrete, we need to decide on the units for measuring time and volume, and the normalization constant α. One conservative choice is to assume that we could squeeze computation to the absolute physical limits (which are many orders of magnitude beyond current technology). This corresponds to setting α = 1 and using the Planck units for volume and time. The Planck length ℓ_P (which is, roughly speaking, the shortest distance that can theoretically be measured) is roughly 2^{−120} meters. The Planck time t_P (which is the time it takes for light to travel one Planck length) is about 2^{−150} seconds. In the above setting, if a function f takes, say, 1KB of input (e.g., roughly 10^4 bits, which can encode a 100 by 100 bitmap image), and requires at least 2^{0.8n} = 2^{0.8·10^4} NAND lines to compute, then any physical system that computes it would require either a volume of


2^{0.2·10^4} Planck lengths cubed, which is more than 2^{1500} meters cubed, or take at least 2^{0.2·10^4} Planck time units, which is larger than 2^{1500} seconds. To get a sense of how big these numbers are, note that the universe is only about 2^{60} seconds old, and its observable radius is only roughly 2^{90} meters. The above discussion suggests that it is possible to empirically falsify the PECTT by presenting a smaller-than-universe-size system that computes such a function.
There are of course several hurdles to refuting the PECTT in this way, one of which is that we can't actually test the system on all possible inputs. However, it turns out that we can get around this issue using notions such as interactive proofs and program checking that we might encounter later in this book. Another, perhaps more salient, problem is that while we know many hard functions exist, at the moment there is no single explicit function F : {0,1}^n → {0,1} for which we can prove an ω(n) (let alone Ω(2^n/n)) lower bound on the number of lines that a NAND-CIRC program needs to compute it.
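The arithmetic in the remark can be checked with a short script. The quantities below follow the remark's assumptions (α = 1, Planck units, input length n ≈ 10^4, and an assumed 2^{0.8n} circuit lower bound); the variable names are ours.

```python
# Back-of-the-envelope check of the numbers in Remark 5.13.
n = 10 ** 4                          # input length: ~1KB of input
lines_needed = 2 ** (4 * n // 5)     # assumed lower bound of 2^{0.8n} lines
# PECTT with alpha = 1: (V*t)^2 >= lines_needed, so V*t >= 2^{0.4n},
# and hence either V or t must be at least 2^{0.2n} (in Planck units).
factor = 2 ** (n // 5)
age_of_universe = 2 ** 60            # age of the universe in seconds, roughly
print(factor > age_of_universe)      # prints: True -- the device cannot exist
```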

5.6.1 Attempts at refuting the PECTT

One of the admirable traits of mankind is the refusal to accept limitations. In the best case this is manifested by people achieving longstanding "impossible" challenges such as heavier-than-air flight, putting a person on the moon, circumnavigating the globe, or even resolving Fermat's Last Theorem. In the worst case it is manifested by people continually following in the footsteps of previous failures to try to do proven-impossible tasks such as build a perpetual motion machine, trisect an angle with a compass and straightedge, or refute Bell's inequality. The Physical Extended Church-Turing Thesis (in its various forms) has attracted both types of people. Here are some physical devices that have been speculated to achieve computational tasks that cannot be done by not-too-large NAND-CIRC programs:

• Spaghetti sort: One of the first lower bounds that Computer Science students encounter is that sorting n numbers requires making Ω(n log n) comparisons. The "spaghetti sort" is a description of a proposed "mechanical computer" that would do this faster. The idea is that to sort n numbers x_1, …, x_n, we could cut n spaghetti noodles into lengths x_1, …, x_n, and then if we simply hold them together in our hand and bring them down to a flat surface, they will emerge in sorted order. There are a great many reasons why this is not truly a challenge to the PECTT hypothesis, and I will not ruin the reader's fun in finding them out by her or himself.


Figure 5.8: Scott Aaronson tests a candidate device forcomputing Steiner trees using soap bubbles.

7 We were extremely conservative in the suggested parameters for the PECTT, having assumed that as many as ℓ_P^{−2}·10^{−6} ∼ 10^{61} bits could potentially be stored in a millimeter-radius region.

• Soap bubbles: One function F : {0,1}^n → {0,1} that is conjectured to require a large number of NAND lines to solve is the Euclidean Steiner Tree problem. This is the problem where one is given m points in the plane (x_1, y_1), …, (x_m, y_m) (say with integer coordinates ranging from 1 till m, and hence the list can be represented as a string of n = O(m log m) size) and some number K. The goal is to figure out whether it is possible to connect all the points by line segments of total length at most K. This function is conjectured to be hard because it is NP-complete - a concept that we'll encounter later in this course - and it is in fact reasonable to conjecture that as m grows, the number of NAND lines required to compute this function grows exponentially in m, meaning that the PECTT would predict that if m is sufficiently large (such as a few hundred or so) then no physical device could compute F. Yet, some people claimed that there is in fact a very simple physical device that could solve this problem, which can be constructed using some wooden pegs and soap. The idea is that if we take two glass plates, and put m wooden pegs between them in the locations (x_1, y_1), …, (x_m, y_m), then bubbles will form whose edges touch those pegs in a way that will minimize the total energy, which turns out to be a function of the total length of the line segments. The problem with this device is that nature, just like people, often gets stuck in "local optima". That is, the resulting configuration will not be one that achieves the absolute minimum of the total energy but rather one that can't be improved with local changes. Aaronson has carried out actual experiments (see Fig. 5.8), and saw that while this device often is successful for three or four pegs, it starts yielding suboptimal results once the number of pegs grows beyond that.

• DNA computing. People have suggested using the properties of DNA to do hard computational problems. The main advantage of DNA is the ability to potentially encode a lot of information in a relatively small physical space, as well as to compute on this information in a highly parallel manner. At the time of this writing, it was demonstrated that one can use DNA to store about 10^{16} bits of information in a region of radius about a millimeter, as opposed to about 10^{10} bits with the best known hard disk technology. This does not pose a real challenge to the PECTT but does suggest that one should be conservative about the choice of constant and not assume that current hard disk + silicon technologies are the absolute best possible.7

• Continuous/real computers. The physical world is often described using continuous quantities such as time and space, and people have suggested that analog devices might have direct access to computing with real-valued quantities and would be inherently more powerful than discrete models such as NAND machines. Whether the "true" physical world is continuous or discrete is an open question. In fact, we do not even know how to precisely phrase this question, let alone answer it. Yet, regardless of the answer, it seems clear that the effort to measure a continuous quantity grows with the level of accuracy desired, and so there is no "free lunch" or way to bypass the PECTT using such machines (see also this paper). Related to that are proposals known as "hypercomputing" or "Zeno's computers" which attempt to use the continuity of time by doing the first operation in one second, the second one in half a second, the third operation in a quarter second, and so on. These fail for a similar reason to the one guaranteeing that Achilles will eventually catch the tortoise despite Zeno's original paradox.

• Relativity computer and time travel. The formulation above assumed the notion of time, but under the theory of relativity time is in the eye of the observer. One approach to solving hard problems is to leave the computer to run for a lot of time from its perspective, but to ensure that this is actually a short while from our perspective. One approach to do so is for the user to start the computer and then go for a quick jog at close to the speed of light before checking on its status. Depending on how fast one goes, a few seconds from the point of view of the user might correspond to centuries in computer time (it might even finish updating its Windows operating system!). Of course the catch here is that the energy required from the user is proportional to how close one needs to get to the speed of light. A more interesting proposal is to use time travel via closed timelike curves (CTCs). In this case we could run an arbitrarily long computation by doing some calculations, remembering the current state, and then travelling back in time to continue where we left off. Indeed, if CTCs exist then we'd probably have to revise the PECTT (though in this case I will simply travel back in time and edit these notes, so I can claim I never conjectured it in the first place…)

• Humans. Another computing system that has been proposed as a counterexample to the PECTT is a 3-pound computer of about 0.1m radius, namely the human brain. Humans can walk around, talk, feel, and do other things that are not commonly done by NAND-CIRC programs, but can they compute partial functions that NAND-CIRC programs cannot? There are certainly computational tasks that at the moment humans do better than computers (e.g., play some video games), but based on our current understanding of the brain, humans (or other animals)


8 This is a very rough approximation that could be wrong by a few orders of magnitude in either direction. For one, there are other structures in the brain apart from neurons that one might need to simulate, hence requiring higher overhead. On the other hand, it is by no means clear that we need to fully clone the brain in order to achieve the same computational tasks that it does.

9 There are some well-known scientists who have advocated that humans have inherent computational advantages over computers. See also this.

have no inherent computational advantage over computers. The brain has about 10^{11} neurons, each operating at a speed of about 1000 operations per second. Hence a rough first approximation is that a Boolean circuit of about 10^{14} gates could simulate one second of a brain's activity.8 Note that the fact that such a circuit (likely) exists does not mean it is easy to find it. After all, constructing this circuit took evolution billions of years. Much of the recent effort in artificial intelligence research is focused on finding programs that replicate some of the brain's capabilities. While such programs take massive computational effort to discover, they often turn out to be much smaller than the pessimistic estimates above. For example, at the time of this writing, Google's neural network for machine translation has about 10^4 nodes (and can be simulated by a NAND-CIRC program of comparable size). Philosophers, priests, and many others have since time immemorial argued that there is something about humans that cannot be captured by mechanical devices such as computers; whether or not that is the case, the evidence is thin that humans can perform computational tasks that are inherently impossible to achieve by computers of similar complexity.9

• Quantum computation. The most compelling attack on the Physical Extended Church-Turing Thesis comes from the notion of quantum computing. The idea was initiated by the observation that systems with strong quantum effects are very hard to simulate on a computer. Turning this observation on its head, people have proposed using such systems to perform computations that we do not know how to do otherwise. At the time of this writing, scalable quantum computers have not yet been built, but it is a fascinating possibility, and one that does not seem to contradict any known law of nature. We will discuss quantum computing in much more detail in Chapter 23. Modeling quantum computation involves extending the model of Boolean circuits into quantum circuits that have one more (very special) gate. However, the main takeaway is that while quantum computing does suggest we need to amend the PECTT, it does not require a complete revision of our worldview. Indeed, almost all of the content of this book remains the same regardless of whether the underlying computational model is Boolean circuits or quantum circuits.

Remark 5.14 — Physical Extended Church-Turing Thesis and Cryptography. While even the precise phrasing of the PECTT, let alone understanding its correctness, is still a subject of active research, some variants of it are


already implicitly assumed in practice. Governments, companies, and individuals currently rely on cryptography to protect some of their most precious assets, including state secrets, control of weapon systems and critical infrastructure, securing commerce, and protecting the confidentiality of personal information. In applied cryptography, one often encounters statements such as "cryptosystem X provides 128 bits of security". What such a statement really means is that (a) it is conjectured that there is no Boolean circuit (or, equivalently, a NAND-CIRC program) of size much smaller than 2^128 that can break X, and (b) we assume that no other physical mechanism can do better, and hence it would take roughly a 2^128 amount of "resources" to break X. We say "conjectured" and not "proved" because, while we can phrase the statement that breaking the system cannot be done by an s-gate circuit as a precise mathematical conjecture, at the moment we are unable to prove such a statement for any non-trivial cryptosystem. This is related to the P vs NP question we will discuss in future chapters. We will explore Cryptography in Chapter 21.
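To get a feel for what 2^128 operations means, here is a back-of-envelope calculation in Python. The 10^18 operations-per-second rate is our own illustrative assumption (roughly the scale of today's largest supercomputers), not a figure from the text.

```python
# Back-of-envelope arithmetic (our illustrative numbers, not the
# book's): how long would 2^128 basic operations take at a generous
# 10^18 operations per second?

ops = 2 ** 128
rate = 10 ** 18                      # ops/second, roughly exascale
seconds_per_year = 3600 * 24 * 365
years = ops // (rate * seconds_per_year)
print(years)   # about 10^13 years, far beyond the age of the universe
```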

Chapter Recap

• We can think of programs both as describing a process, as well as simply a list of symbols that can be considered as data that can be fed as input to other programs.

• We can write a NAND-CIRC program that evaluates arbitrary NAND-CIRC programs (or equivalently a circuit that evaluates other circuits). Moreover, the efficiency loss in doing so is not too large.

• We can even write a NAND-CIRC program that evaluates programs in other programming languages such as Python, C, Lisp, Java, Go, etc.

• By a leap of faith, we could hypothesize that the number of gates in the smallest circuit that computes a function f captures roughly the amount of physical resources required to compute f. This statement is known as the Physical Extended Church-Turing Thesis (PECTT).

• Boolean circuits (or equivalently AON-CIRC or NAND-CIRC programs) capture a surprisingly wide array of computational models. The strongest currently known challenge to the PECTT comes from the potential for using quantum mechanical effects to speed up computation, a model known as quantum computers.


Figure 5.9: A finite computational task is specified by a function f : {0,1}^n → {0,1}^m. We can model a computational process using Boolean circuits (of varying gate sets) or straight-line programs. Every function can be computed by many programs. We say that f ∈ SIZE_{n,m}(s) if there exists a NAND circuit of at most s gates (equivalently a NAND-CIRC program of at most s lines) that computes f. Every function f : {0,1}^n → {0,1}^m can be computed by a circuit of O(m · 2^n/n) gates. Many functions such as multiplication, addition, solving linear equations, computing the shortest path in a graph, and others, can be computed by circuits with many fewer gates. In particular there is an O(s^2 log s)-size circuit that computes the map C, x ↦ C(x) where C is a string describing a circuit of s gates. However, the counting argument shows there do exist some functions f : {0,1}^n → {0,1}^m that require Ω(m · 2^n/n) gates to compute.

5.7 RECAP OF PART I: FINITE COMPUTATION

This chapter concludes the first part of this book, which deals with finite computation (computing functions that map a fixed number of Boolean inputs to a fixed number of Boolean outputs). The main take-aways from Chapter 3, Chapter 4, and Chapter 5 are as follows (see also Fig. 5.9):

• We can formally define the notion of a function f : {0,1}^n → {0,1}^m being computable using s basic operations. Whether these operations are AND/OR/NOT, NAND, or some other universal basis does not make much difference. We can describe such a computation either using a circuit or using a straight-line program.

• We define SIZE_{n,m}(s) to be the set of functions that are computable by NAND circuits of at most s gates. This set is equal to the set of functions computable by a NAND-CIRC program of at most s lines; up to a constant factor in s (which we will not care about), this is also the same as the set of functions that are computable by a Boolean circuit of at most s AND/OR/NOT gates. The class SIZE_{n,m}(s) is a set of functions, not of programs/circuits.

• Every function f : {0,1}^n → {0,1}^m can be computed using a circuit of at most O(m · 2^n/n) gates. Some functions require at least Ω(m · 2^n/n) gates. We define SIZE_{n,m}(s) to be the set of functions from {0,1}^n to {0,1}^m that can be computed using at most s gates.

• We can describe a circuit/program P as a string. For every s, there is a universal circuit/program U_s that can evaluate programs of length s given their description as strings. We can use this representation also to count the number of circuits of at most s gates and


hence prove that some functions cannot be computed by circuits of smaller-than-exponential size.

• If there is a circuit of s gates that computes a function f, then we can build a physical device to compute f using s basic components (such as transistors). The "Physical Extended Church-Turing Thesis" postulates that the reverse direction is true as well: if f is a function for which every circuit requires at least s gates, then every physical device that computes f will require about s "physical resources". The main challenge to the PECTT is quantum computing, which we will discuss in Chapter 23.
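The counting argument mentioned in the bullets above can be made concrete with a small Python sketch (ours, not the book's): we compare the number of functions on n bits with a crude upper bound of the form 2^(c·s·log s) on the number of circuit descriptions with s gates. The constant c = 10 and the size budget 2^n/(20n) are our own illustrative choices.

```python
import math

def num_functions(n):
    # Number of functions f : {0,1}^n -> {0,1}: one output bit
    # chosen independently for each of the 2^n inputs.
    return 2 ** (2 ** n)

def circuit_count_bound(s, c=10):
    # Crude bound of the form 2^(c*s*log2(s)) on the number of
    # descriptions of circuits with s gates (c=10 is illustrative).
    return 2 ** math.ceil(c * s * math.log2(s))

n = 10
s = 2 ** n // (20 * n)  # a size budget of 2^n/(20n), here s = 5
# There are fewer circuit descriptions than functions, so some
# function on n bits has no circuit within this size budget.
print(circuit_count_bound(s) < num_functions(n))  # True
```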

Sneak preview: In the next part we will discuss how to model computational tasks on unbounded inputs, which are specified using functions F : {0,1}* → {0,1}* (or F : {0,1}* → {0,1}) that can take an unbounded number of Boolean inputs.

5.8 EXERCISES

Exercise 5.1 Which one of the following statements is false:

a. There is an O(s^3)-line NAND-CIRC program that, given as input a program P of s lines in the list-of-tuples representation, computes the output of P when all its inputs are equal to 1.

b. There is an O(s^3)-line NAND-CIRC program that, given as input a program P of s characters encoded as a string of 7s bits using the ASCII encoding, computes the output of P when all its inputs are equal to 1.

c. There is an O(√s)-line NAND-CIRC program that, given as input a program P of s lines in the list-of-tuples representation, computes the output of P when all its inputs are equal to 1.

Exercise 5.2 — Equals function. For every k ∈ ℕ, show that there is an O(k)-line NAND-CIRC program that computes the function EQUALS_k : {0,1}^2k → {0,1} where EQUALS_k(x, x′) = 1 if and only if x = x′.

Exercise 5.3 — Equal to constant function. For every k ∈ ℕ and x′ ∈ {0,1}^k, show that there is an O(k)-line NAND-CIRC program that computes the function EQUALS_{x′} : {0,1}^k → {0,1} that on input x ∈ {0,1}^k outputs 1 if and only if x = x′.

Exercise 5.4 — Counting lower bound for multibit functions. Prove that there exists a number δ > 0 such that for every n, m there exists a function


10 How many functions from {0,1}^n to {0,1}^m exist?

11 Follow the proof of Theorem 5.5, replacing the use of the counting argument with Exercise 5.4.

12 Using the adjacency list representation, a graph with n in-degree zero vertices and s in-degree two vertices can be represented using roughly 2s log(s + n) ≤ 2s(log s + O(1)) bits. The labeling of the n input and m output vertices can be specified by a list of n labels in [n] and m labels in [m].

13 Hint: Use the results of Exercise 5.6 and the fact that in this regime m = 1 and n ≪ s.

14 Hint: An equivalent way to say this is that you need to prove that the set of functions that can be computed using at most 2^n/(1000n) lines has fewer than 2^(−100) · 2^(2^n) elements. Can you see why?

15 Note that if n is big enough, then it is easy to represent such a pair using n^2 bits, since we can represent the program using O(n^1.1 log n) bits, and we can always pad our representation to have exactly n^2 length.

๐‘“ โˆถ 0, 1๐‘› โ†’ 0, 1๐‘š that requires at least ๐›ฟ๐‘š โ‹… 2๐‘›/๐‘› NAND gates tocompute. See footnote for hint.10

Exercise 5.5 — Size hierarchy theorem for multibit functions. Prove that there exists a number C such that for every n, m and n + m < s < m · 2^n/(Cn) there exists a function f ∈ SIZE_{n,m}(C · s) \ SIZE_{n,m}(s). See footnote for hint.11

Exercise 5.6 — Efficient representation of circuits and a tighter counting upper bound. Use the ideas of Remark 5.4 to show that for every ε > 0 and sufficiently large s, n, m,

|SIZE_{n,m}(s)| < 2^((2+ε)s log s + n log n + m log s) .   (5.11)

Conclude that the implicit constant in Theorem 5.2 can be made arbitrarily close to 5. See footnote for hint.12

Exercise 5.7 — Tighter counting lower bound. Prove that for every δ < 1/2, if n is sufficiently large then there exists a function f : {0,1}^n → {0,1} such that f ∉ SIZE_{n,1}(δ · 2^n/n). See footnote for hint.13

Exercise 5.8 — Random functions are hard. Suppose n > 1000 and that we choose a function F : {0,1}^n → {0,1} at random, choosing for every x ∈ {0,1}^n the value F(x) to be the result of tossing an independent unbiased coin. Prove that the probability that there is a 2^n/(1000n)-line program that computes F is at most 2^(−100).14

Exercise 5.9 The following is a tuple representing a NAND program: (3, 1, ((3, 2, 2), (4, 1, 1), (5, 3, 4), (6, 2, 1), (7, 6, 6), (8, 0, 0), (9, 7, 8), (10, 5, 0), (11, 9, 10))).

1. Write a table with the eight values P(000), P(001), P(010), P(011), P(100), P(101), P(110), P(111) in this order.

2. Describe what the program does in words.

Exercise 5.10 — EVAL with XOR. For every sufficiently large n, let E_n : {0,1}^(n^2) → {0,1} be the function that takes an n^2-length string that encodes a pair (P, x) where x ∈ {0,1}^n and P is a NAND program of n inputs, a single output, and at most n^1.1 lines, and returns the output of P on x.15 That is, E_n(P, x) = P(x).

Prove that for every sufficiently large n, there does not exist an XOR circuit C that computes the function E_n, where an XOR circuit has the XOR gate as well as the constants 0 and 1 (see Exercise 3.5). That is,


16 Hint: Use our bound on the number of programs/circuits of size s (Theorem 5.2), as well as the Chernoff Bound (Theorem 18.12) and the union bound.

prove that there is some constant n_0 such that for every n > n_0 and XOR circuit C of n^2 inputs and a single output, there exists a pair (P, x) such that C(P, x) ≠ E_n(P, x).

Exercise 5.11 โ€” Learning circuits (challenge, optional, assumes more background).

(This exercise assumes background in probability theory and/or machine learning that you might not have at this point. Feel free to come back to it at a later point and in particular after going over Chapter 18.) In this exercise we will use our bound on the number of circuits of size s to show that (if we ignore the cost of computation) every such circuit can be learned from not too many training samples. Specifically, if we find a size-s circuit that classifies correctly a training set of O(s log s) samples from some distribution D, then it is guaranteed to do well on the whole distribution D. Since Boolean circuits model very many physical processes (maybe even all of them, if the (controversial) physical extended Church-Turing thesis is true), this shows that all such processes could be learned as well (again, ignoring the computational cost of finding a classifier that does well on the training data).

Let ๐ท be any probability distribution over 0, 1๐‘› and let ๐ถ be aNAND circuit with ๐‘› inputs, one output, and size ๐‘  โ‰ฅ ๐‘›. Prove thatthere is some constant ๐‘ such that with probability at least 0.999 thefollowing holds: if ๐‘š = ๐‘๐‘  log ๐‘  and ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘šโˆ’1 are chosen indepen-dently from ๐ท, then for every circuit ๐ถโ€ฒ such that ๐ถโ€ฒ(๐‘ฅ๐‘–) = ๐ถ(๐‘ฅ๐‘–) onevery ๐‘– โˆˆ [๐‘š], Pr๐‘ฅโˆผ๐ท[๐ถโ€ฒ(๐‘ฅ) โ‰ค ๐ถ(๐‘ฅ)] โ‰ค 0.99.

In other words, if C′ is a so-called "empirical risk minimizer" that agrees with C on all the training examples x_0, …, x_{m−1}, then it will also agree with C with high probability for samples drawn from the distribution D (i.e., it "generalizes", to use Machine-Learning lingo). See footnote for hint.16

5.9 BIBLIOGRAPHICAL NOTES

The EVAL function is usually known as a universal circuit. The implementation we describe is not the most efficient known. Valiant [Val76] first showed a universal circuit of O(n log n) size where n is the size of the input. Universal circuits have seen in recent years new motivations due to their applications for cryptography, see [LMS16; GKS17].

While we've seen that "most" functions mapping n bits to one bit require circuits of exponential size Ω(2^n/n), we actually do not know of any explicit function for which we can prove that it requires, say, at least n^100 or even 100n size. At the moment, the strongest such lower bound we know is that there are quite simple and explicit n-variable


functions that require at least (5 − o(1))n lines to compute; see this paper of Iwama et al as well as this more recent work of Kulikov et al. Proving lower bounds for restricted models of circuits is an extremely interesting research area, for which Jukna's book [Juk12] (see also Wegener [Weg87]) provides a very good introduction and overview. I learned of the proof of the size hierarchy theorem (Theorem 5.5) from Sasha Golovnev.

Scott Aaronson's blog post on how information is physical is a good discussion of issues related to the physical extended Church-Turing Thesis. Aaronson's survey on NP-complete problems and physical reality [Aar05] discusses these issues as well, though it might be easier to read after we reach Chapter 15 on NP and NP-completeness.

II
UNIFORM COMPUTATION


Figure 6.1: Once you know how to multiply multi-digit numbers, you can do so for every number n of digits, but if you had to describe multiplication using Boolean circuits or NAND-CIRC programs, you would need a different program/circuit for every length n of the input.

6
Functions with Infinite domains, Automata, and Regular expressions

"An algorithm is a finite answer to an infinite number of questions.", Attributed to Stephen Kleene.

The model of Boolean circuits (or equivalently, the NAND-CIRC programming language) has one very significant drawback: a Boolean circuit can only compute a finite function f, and in particular, since every gate has two inputs, a size-s circuit can compute on an input of length at most 2s. This does not capture our intuitive notion of an algorithm as a single recipe to compute a potentially infinite function. For example, the standard elementary school multiplication algorithm is a single algorithm that multiplies numbers of all lengths, yet we cannot express this algorithm as a single circuit; rather we need a different circuit (or equivalently, a NAND-CIRC program) for every input length (see Fig. 6.1).

In this chapter we extend our definition of computational tasks to consider functions with the unbounded domain of {0,1}*. We focus on the question of defining what tasks to compute, mostly leaving the question of how to do so to later chapters, where we will see Turing Machines and other computational models for computing on unbounded inputs. In this chapter we will, however, see one example of a simple and restricted model of computation: deterministic finite automata (DFAs).

This chapter: A non-mathy overview
In this chapter we discuss functions that take as input strings of arbitrary length. Such functions could have outputs that are long strings as well, but the case of Boolean functions, where the output is a single bit, is of particular importance. The task of computing a Boolean function is equivalent to the task


Learning Objectives:
• Define functions on unbounded length inputs, that cannot be described by a finite size table of inputs and outputs.
• Equivalence with the task of deciding membership in a language.
• Deterministic finite automata (optional): a simple example of a model for unbounded computation.
• Equivalence with regular expressions.


Figure 6.2: The NAND circuit and NAND-CIRC program for computing the XOR of 5 bits. Note how the circuit for XOR_5 merely repeats four times the circuit to compute the XOR of 2 bits.

of deciding a language. We will also define the restriction of a function over unbounded length strings to a finite length, and how this restriction can be computed by Boolean circuits. In the second half of this chapter we discuss finite automata, which is a model for computing functions of unbounded length. This model is not as powerful as Python or other general-purpose programming languages, but can serve as an introduction to these more general models. We also show a beautiful result: the functions computable by finite automata are exactly the ones that correspond to regular expressions. However, the reader can also feel free to skip automata and go straight to our discussion of Turing Machines in Chapter 7.

6.1 FUNCTIONS WITH INPUTS OF UNBOUNDED LENGTH

Up until now, we considered the computational task of mapping some string of length n into a string of length m. However, in general computational tasks can involve inputs of unbounded length. For example, the following Python function computes the function XOR : {0,1}* → {0,1}, where XOR(x) equals 1 iff the number of 1's in x is odd. (In other words, XOR(x) = Σ_{i=0}^{|x|−1} x_i mod 2 for every x ∈ {0,1}*.) As simple as it is, the XOR function cannot be computed by a Boolean circuit. Rather, for every n, we can compute XOR_n (the restriction of XOR to {0,1}^n) using a different circuit (e.g., see Fig. 6.2).

def XOR(X):
    '''Takes list X of 0's and 1's
       Outputs 1 if the number of 1's is odd and outputs 0 otherwise'''
    result = 0
    for i in range(len(X)):
        result = (result + X[i]) % 2
    return result

Previously in this book we studied the computation of finite functions f : {0,1}^n → {0,1}^m. Such a function f can always be described by listing all the 2^n values it takes on inputs x ∈ {0,1}^n. In this chapter we consider functions such as XOR that take inputs of unbounded size. While we can describe XOR using a finite number of symbols (in fact, we just did so above), it takes infinitely many possible inputs and so we cannot just write down all of its values. The same is true for many other functions capturing important computational tasks


including addition, multiplication, sorting, finding paths in graphs, fitting curves to points, and so on and so forth. To contrast with the finite case, we will sometimes call a function F : {0,1}* → {0,1} (or F : {0,1}* → {0,1}*) infinite. However, this does not mean that F takes as input strings of infinite length! It just means that F can take as input a string that can be arbitrarily long, and so we cannot simply write down a table of all the outputs of F on different inputs.

Big Idea 8 A function F : {0,1}* → {0,1}* specifies the computational task mapping an input x ∈ {0,1}* into the output F(x).

As we've seen before, restricting attention to functions that use binary strings as inputs and outputs does not detract from our generality, since other objects, including numbers, lists, matrices, images, videos, and more, can be encoded as binary strings.

As before, it is important to differentiate between specification and implementation. For example, consider the following function:

TWINP(๐‘ฅ) = โŽงโŽจโŽฉ1 โˆƒ๐‘โˆˆโ„• s.t.๐‘, ๐‘ + 2 are primes and ๐‘ > |๐‘ฅ|0 otherwise(6.1)

This is a mathematically well-defined function. For every x, TWINP(x) has a unique value which is either 0 or 1. However, at the moment no one knows of a Python program that computes this function. The Twin prime conjecture posits that for every n there exists p > n such that both p and p + 2 are primes. If this conjecture is true, then TWINP is easy to compute indeed - the program def T(x): return 1 will do the trick. However, mathematicians have tried unsuccessfully to prove this conjecture since 1849. That said, whether or not we know how to implement the function TWINP, the definition above provides its specification.
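To illustrate the gap between specification and implementation, here is a Python sketch (ours, not from the book) that tries to compute TWINP by brute-force search: on input x it scans p = |x| + 1, |x| + 2, … looking for a twin-prime pair. If the twin prime conjecture is true, this program always halts with output 1 (and so computes TWINP); but since no one has proved the conjecture, no one can prove that this search halts on every input.

```python
# A search-based attempt at TWINP (a sketch, not the book's code).
# It halts with 1 on input x exactly when some twin-prime pair
# exists above |x|; proving it halts on *every* input would prove
# the twin prime conjecture.

def is_prime(p):
    if p < 2:
        return False
    return all(p % d for d in range(2, int(p ** 0.5) + 1))

def TWINP_search(x):
    p = len(x) + 1
    while True:
        if is_prime(p) and is_prime(p + 2):
            return 1
        p += 1

print(TWINP_search("10110"))  # finds the pair 11, 13 above length 5
```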

6.1.1 Varying inputs and outputs
Many of the functions we are interested in take more than one input. For example the function

MULT(๐‘ฅ, ๐‘ฆ) = ๐‘ฅ โ‹… ๐‘ฆ (6.2)that takes the binary representation of a pair of integers ๐‘ฅ, ๐‘ฆ โˆˆ โ„•

and outputs the binary representation of their product x · y. However, since we can represent a pair of strings as a single string, we will consider functions such as MULT as mapping {0,1}* to {0,1}*. We will typically not be concerned with low-level details such as the precise way to represent a pair of integers as a string, since essentially all choices will be equivalent for our purposes.
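The book deliberately leaves the encoding choice open, but to see that pairs really can be packed into a single string, here is one standard sketch (our own choice, not the text's): double every bit of the first string, append the separator 01, then append the second string. Doubled bits come in blocks 00 or 11, so the separator can be located unambiguously.

```python
# One standard prefix-free pair encoding (a sketch; the book does
# not fix a particular encoding). Doubling the bits of x makes the
# separator "01" unambiguous, so the pair is uniquely recoverable.

def encode_pair(x, y):
    return "".join(b + b for b in x) + "01" + y

def decode_pair(z):
    i = 0
    x = []
    while z[i:i + 2] != "01":   # scan doubled bits until the separator
        x.append(z[i])
        i += 2
    return "".join(x), z[i + 2:]

print(encode_pair("101", "0011"))   # 110011010011
print(decode_pair("110011010011"))  # ('101', '0011')
```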


Another example of a function we want to compute is

PALINDROME(x) = 1 if x_i = x_{|x|−i} for every i ∈ [|x|], and PALINDROME(x) = 0 otherwise.   (6.3)

PALINDROME has a single bit as output. Functions with a single bit of output are known as Boolean functions. Boolean functions are central to the theory of computation, and we will discuss them often in this book. Note that even though Boolean functions have a single bit of output, their input can be of arbitrary length. Thus they are still infinite functions that cannot be described via a finite table of values.

"Booleanizing" functions. Sometimes it might be convenient to obtain a Boolean variant of a non-Boolean function. For example, the following is a Boolean variant of MULT.

BMULT(x, y, i) = the i-th bit of x · y if i < |x · y|, and BMULT(x, y, i) = 0 otherwise.   (6.4)

If we can compute BMULT via any programming language such as Python, C, Java, etc., then we can compute MULT as well, and vice versa.

Solved Exercise 6.1 — Booleanizing general functions. Show that for every function F : {0,1}* → {0,1}* there exists a Boolean function BF : {0,1}* → {0,1} such that a Python program to compute BF can be transformed into a program to compute F and vice versa.

Solution:

For every ๐น โˆถ 0, 1โˆ— โ†’ 0, 1โˆ—, we can define

BF(๐‘ฅ, ๐‘–, ๐‘) = โŽงโŽจโŽฉ๐น(๐‘ฅ)๐‘– ๐‘– < |๐น(๐‘ฅ)|, ๐‘ = 01 ๐‘– < |๐น(๐‘ฅ)|, ๐‘ = 10 ๐‘– โ‰ฅ |๐‘ฅ| (6.5)

to be the function that on input x ∈ {0,1}*, i ∈ ℕ, b ∈ {0,1} outputs the i-th bit of F(x) if b = 0 and i < |F(x)|. If b = 1 then BF(x, i, b) outputs 1 iff i < |F(x)|, and hence this allows us to compute the length of F(x).

Computing BF from F is straightforward. For the other direction, given a Python function BF that computes BF, we can compute F as follows:

def F(x):
    res = []
    i = 0
    while BF(x,i,1):
        res.append(BF(x,i,0))
        i += 1
    return res
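The same Booleanizing recipe can be instantiated concretely for MULT. The sketch below is our own (with most-significant-bit-first binary strings as the representation choice; the book does not fix one) and computes BMULT using Python's integer multiplication.

```python
# A concrete instance of Booleanization (our sketch, not the book's):
# BMULT computed via Python's integer arithmetic, with numbers
# represented as binary strings, most significant bit first.

def MULT(x, y):
    # Multiply two binary strings and return the product in binary.
    return bin(int(x, 2) * int(y, 2))[2:]

def BMULT(x, y, i):
    # The i-th bit of x*y if i < |x*y|, and 0 otherwise.
    prod = MULT(x, y)
    return int(prod[i]) if i < len(prod) else 0

# 6 * 5 = 30, which is 11110 in binary.
print(MULT("110", "101"))                          # 11110
print([BMULT("110", "101", i) for i in range(6)])  # [1, 1, 1, 1, 0, 0]
```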

6.1.2 Formal Languages
For every Boolean function F : {0,1}* → {0,1}, we can define the set L_F = {x | F(x) = 1} of strings on which F outputs 1. Such sets are known as languages. This name is rooted in formal language theory as pursued by linguists such as Noam Chomsky. A formal language is a subset L ⊆ {0,1}* (or more generally L ⊆ Σ* for some finite alphabet Σ). The membership or decision problem for a language L is the task of determining, given x ∈ {0,1}*, whether or not x ∈ L. If we can compute the function F then we can decide membership in the language L_F and vice versa. Hence, many texts such as [Sip97] refer to the task of computing a Boolean function as "deciding a language". In this book we mostly describe computational tasks using the function notation, which is easier to generalize to computation with more than one bit of output. However, since the language terminology is so popular in the literature, we will sometimes mention it.

6.1.3 Restrictions of functions
If F : {0,1}* → {0,1} is a Boolean function and n ∈ ℕ, then the restriction of F to inputs of length n, denoted F_n, is the finite function f : {0,1}^n → {0,1} such that f(x) = F(x) for every x ∈ {0,1}^n. That is, F_n is the finite function that is only defined on inputs in {0,1}^n, but agrees with F on those inputs. Since F_n is a finite function, it can be computed by a Boolean circuit, implying the following theorem:

Theorem 6.1 — Circuit collection for every infinite function. Let F : {0,1}* → {0,1}. Then there is a collection {C_n}_{n ∈ {1,2,…}} of circuits such that for every n > 0, C_n computes the restriction F_n of F to inputs of length n.

Proof. This is an immediate corollary of the universality of Boolean circuits. Indeed, since F_n maps {0,1}^n to {0,1}, Theorem 4.15 implies that there exists a Boolean circuit C_n to compute it. In fact, the size of this circuit is at most c · 2^n/n gates for some constant c ≤ 10.

In particular, Theorem 6.1 implies that there exists such a circuit collection {C_n} even for the TWINP function we described before,


despite the fact that we don't know of any program to compute it. Indeed, this is not that surprising: for every particular n ∈ ℕ, TWINP_n is either the constant zero function or the constant one function, both of which can be computed by very simple Boolean circuits. Hence a collection of circuits {C_n} that computes TWINP certainly exists. The difficulty in computing TWINP using Python or any other programming language arises from the fact that we don't know for each particular n what is the circuit C_n in this collection.
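The notion of a restriction can be made concrete in code. The sketch below (ours, not the book's) tabulates the restriction XOR_3 of the infinite XOR function: the result is a finite table with 2^3 entries, exactly the kind of finite object a Boolean circuit can encode.

```python
# Tabulating the restriction XOR_3 of the infinite function XOR
# (a sketch, not from the book). The table has 2^3 = 8 entries,
# one per input in {0,1}^3.
from itertools import product

def XOR(x):                      # the infinite function, any length
    return sum(x) % 2

def restriction_table(F, n):
    # Map each x in {0,1}^n to F(x): the finite function F_n.
    return {x: F(x) for x in product((0, 1), repeat=n)}

table = restriction_table(XOR, 3)
print(table[(1, 0, 1)])  # 0
print(len(table))        # 8
```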

6.2 DETERMINISTIC FINITE AUTOMATA (OPTIONAL)

All our computational models so far - Boolean circuits and straight-line programs - were only applicable to finite functions. In Chapter 7 we will present Turing Machines, which are the central models of computation for functions of unbounded input length. However, in this section we will present the more basic model of deterministic finite automata (DFAs). Automata can serve as a good stepping-stone for Turing machines, though they will not be used much in later parts of this book, and so the reader can feel free to skip ahead to Chapter 7. DFAs turn out to be equivalent in power to regular expressions, which are a powerful mechanism to specify patterns that is widely used in practice. Our treatment of automata is quite brief. There are plenty of resources that help you get more comfortable with DFAs. In particular, Chapter 1 of Sipser's book [Sip97] contains an excellent exposition of this material. There are also many websites with online simulators for automata, as well as translators from regular expressions to automata and vice versa (see for example here and here).

At a high level, an algorithm is a recipe for computing an output from an input via a combination of the following steps:

1. Read a bit from the input
2. Update the state (working memory)
3. Stop and produce an output

For example, recall the Python program that computes the XOR function:

def XOR(X):
    '''Takes list X of 0's and 1's
       Outputs 1 if the number of 1's is odd and outputs 0 otherwise'''
    result = 0
    for i in range(len(X)):
        result = (result + X[i]) % 2
    return result


Figure 6.3: A deterministic finite automaton that computes the XOR function. It has two states 0 and 1, and when it observes σ it transitions from v to v ⊕ σ.

In each step, this program reads a single bit X[i] and updates its state result based on that bit (flipping result if X[i] is 1 and keeping it the same otherwise). When it is done traversing the input, the program outputs result. In computer science, such a program is called a single-pass constant-memory algorithm, since it makes a single pass over the input and its working memory is of finite size. (Indeed, in this case result can either be 0 or 1.) Such an algorithm is also known as a Deterministic Finite Automaton or DFA (another name for DFAs is finite state machines). We can think of such an algorithm as a "machine" that can be in one of C states, for some constant C. The machine starts in some initial state, and then reads its input x ∈ {0,1}* one bit at a time. Whenever the machine reads a bit σ ∈ {0,1}, it transitions into a new state based on σ and its prior state. The output of the machine is based on the final state. Every single-pass constant-memory algorithm corresponds to such a machine. If an algorithm uses b bits of memory, then the contents of its memory are a string of length b. Since there are 2^b such strings, at any point in the execution, such an algorithm can be in one of 2^b states.

We can specify a DFA of ๐ถ states by a list of ๐ถ โ‹… 2 rules. Each rulewill be of the form โ€œIf the DFA is in state ๐‘ฃ and the bit read from theinput is ๐œŽ then the new state is ๐‘ฃโ€ฒโ€. At the end of the computationwe will also have a rule of the form โ€œIf the final state is one of thefollowing โ€ฆ then output 1, otherwise output 0โ€. For example, thePython program above can be represented by a two state automata forcomputing XOR of the following form:

• Initialize in state 0.
• For every state s ∈ {0,1} and input bit σ read, if σ = 1 then change to state 1 − s, otherwise stay in state s.
• At the end, output 1 iff s = 1.

We can also describe a C-state DFA as a labeled graph of C vertices. For every state s and bit σ, we add a directed edge labeled with σ between s and the state s' such that if the DFA is at state s and reads σ then it transitions to s'. (If the state stays the same then this edge will be a self loop; similarly, if s transitions to s' on both σ = 0 and σ = 1 then the graph will contain two parallel edges.) We also label the set 𝒮 of states on which the automaton will output 1 at the end of the computation. This set is known as the set of accepting states. See Fig. 6.3 for the graphical representation of the XOR automaton.

Formally, a DFA is specified by (1) the table of the C ⋅ 2 rules, which can be represented as a transition function T that maps a state s ∈ [C] and a bit σ ∈ {0,1} to the state s' ∈ [C] which the DFA will transition to from state s on input σ, and (2) the set 𝒮 of accepting states. This leads to the following definition.


Definition 6.2 — Deterministic Finite Automaton. A deterministic finite automaton (DFA) with C states over {0,1} is a pair (T, 𝒮) with T : [C] × {0,1} → [C] and 𝒮 ⊆ [C]. The finite function T is known as the transition function of the DFA, and the set 𝒮 is known as the set of accepting states.

Let F : {0,1}* → {0,1} be a Boolean function with the infinite domain {0,1}*. We say that (T, 𝒮) computes F if for every n ∈ ℕ and x ∈ {0,1}^n, if we define s_0 = 0 and s_{i+1} = T(s_i, x_i) for every i ∈ [n], then

s_n ∈ 𝒮 ⇔ F(x) = 1   (6.6)

Make sure not to confuse the transition function of an automaton (T in Definition 6.2), which is a finite function specifying the table of "rules" it follows, with the function the automaton computes (F in Definition 6.2), which is an infinite function.

Remark 6.3 — Definitions in other texts. Deterministic finite automata can be defined in several equivalent ways. In particular, Sipser [Sip97] defines a DFA as a five-tuple (Q, Σ, δ, q_0, F) where Q is the set of states, Σ is the alphabet, δ is the transition function, q_0 is the initial state, and F is the set of accepting states. In this book the set of states is always of the form Q = {0, …, C−1} and the initial state is always q_0 = 0, but this makes no difference to the computational power of these models. Also, we restrict our attention to the case that the alphabet Σ is equal to {0,1}.

Solved Exercise 6.2 — DFA for (010)*. Prove that there is a DFA that computes the following function F:

F(x) = 1 if 3 divides |x| and x_{3i} x_{3i+1} x_{3i+2} = 010 for every i ∈ [|x|/3], and F(x) = 0 otherwise.   (6.7)

Solution:

Figure 6.4: A DFA that outputs 1 only on inputs x ∈ {0,1}* that are a concatenation of zero or more copies of 010. The state 0 is both the starting state and the only accepting state. The table denotes the transition function T, which maps the current state and symbol read to the new state.

When asked to construct a deterministic finite automaton, it helps to start by thinking of it as a single-pass constant-memory algorithm (for example, a Python program) and then translate this program into a DFA. Here is a simple Python program for computing F:

def F(X):
    '''Return 1 iff X is a concatenation of zero/more copies of [0,1,0]'''
    if len(X) % 3 != 0:
        return False
    ultimate = 0
    penultimate = 1
    antepenultimate = 0
    for idx, b in enumerate(X):
        antepenultimate = penultimate
        penultimate = ultimate
        ultimate = b
        if idx % 3 == 2 and ((antepenultimate, penultimate, ultimate) != (0,1,0)):
            return False
    return True

Since we keep three Boolean variables, the working memory can be in one of 2^3 = 8 configurations, and so the program above can be directly translated into an 8-state DFA. While this is not needed to solve the question, by examining the resulting DFA we can see that we can merge some states together and obtain a 4-state automaton, described in Fig. 6.4. See also Fig. 6.5, which depicts the execution of this DFA on a particular input.

6.2.1 Anatomy of an automaton (finite vs. unbounded)

Now that we consider computational tasks with unbounded input sizes, it is crucially important to distinguish between the components of our algorithm that are fixed in size, and the components that grow with the size of the input. For the case of DFAs these are the following:

Constant size components: Given a DFA A, the following quantities are fixed independent of the input size:

• The number of states C in A.

• The transition function T (which has 2C inputs, and so can be specified by a table of 2C rows, each entry in which is a number in [C]).


• The set 𝒮 ⊆ [C] of accepting states. There are at most 2^C such sets, each of which can be described by a string in {0,1}^C specifying which states are in 𝒮 and which aren't.

Together, the above means that we can fully describe an automaton using finitely many symbols. This is a property we require of any notion of "algorithm": we should be able to write down a complete specification of how it produces an output from an input.

Components of unbounded size: The following quantities relating to a DFA are not bounded by any constant. We stress that these are still finite for any given input.

• The length of the input x ∈ {0,1}* that the DFA is provided with. It is always finite, but not bounded.

• The number of steps that the DFA takes can grow with the length of the input. Indeed, a DFA makes a single pass over the input and so it takes exactly |x| steps on an input x ∈ {0,1}*.

Figure 6.5: Execution of the DFA of Fig. 6.4. The number of states and the size of the transition function are bounded, but the input can be arbitrarily long. If the DFA is at state s and observes the value σ then it moves to the state T(s, σ). At the end of the execution the DFA accepts iff the final state is in 𝒮.

6.2.2 DFA-computable functions

We say that a function F : {0,1}* → {0,1} is DFA computable if there exists some DFA that computes F. In Chapter 4 we saw that every finite function is computable by some Boolean circuit. Thus, at this point you might expect that every infinite function is computable by some DFA. However, this is very much not the case. We will soon see some simple examples of infinite functions that are not computable by DFAs, but for starters, let's prove that such functions exist.


Theorem 6.4 — DFA-computable functions are countable. Let DFACOMP be the set of all Boolean functions F : {0,1}* → {0,1} such that there exists a DFA computing F. Then DFACOMP is countable.

Proof Idea:

Every DFA can be described by a finite-length string, which yields an onto map from {0,1}* to DFACOMP: namely, the function that maps a string describing an automaton A to the function that it computes. ⋆

Proof of Theorem 6.4. Every DFA can be described by a finite string, representing the transition function T and the set of accepting states, and every DFA A computes some function F : {0,1}* → {0,1}. Thus we can define the following function StDC : {0,1}* → DFACOMP:

StDC(a) = F    if a represents an automaton A and F is the function A computes
StDC(a) = ONE  otherwise   (6.8)

where ONE : {0,1}* → {0,1} is the constant function that outputs 1 on all inputs (and is a member of DFACOMP). Since by definition every function F in DFACOMP is computable by some automaton, StDC is an onto function from {0,1}* to DFACOMP, which means that DFACOMP is countable (see Section 2.4.2).

Since the set of all Boolean functions is uncountable, we get the following corollary:

Theorem 6.5 — Existence of DFA-uncomputable functions. There exists a Boolean function F : {0,1}* → {0,1} that is not computable by any DFA.

Proof. If every Boolean function F were computable by some DFA, then DFACOMP would equal the set ALL of all Boolean functions; but by Theorem 2.12 the latter set is uncountable, contradicting Theorem 6.4.

6.3 REGULAR EXPRESSIONS

Searching for a piece of text is a common task in computing. At its heart, the search problem is quite simple. We have a collection X = {x_0, …, x_k} of strings (e.g., files on a hard-drive, or student records in a database), and the user wants to find out the subset of all the x ∈ X that are matched by some pattern (e.g., all files whose names end with the string .txt). In full generality, we can allow the user to specify the pattern by specifying a (computable) function F : {0,1}* → {0,1}, where F(x) = 1 corresponds to the pattern matching x. That is, the user provides a program P in some Turing-complete programming language such as Python, and the system will return all the x ∈ X such that P(x) = 1. For example, one could search for all text files that contain the string important document or perhaps (letting P correspond to a neural-network based classifier) all images that contain a cat. However, we don't want our system to get into an infinite loop just trying to evaluate the program P! For this reason, typical systems for searching files or databases do not allow users to specify the patterns using full-fledged programming languages. Rather, such systems use restricted computational models that on the one hand are rich enough to capture many of the queries needed in practice (e.g., all filenames ending with .txt, or all phone numbers of the form (617) xxx-xxxx), but on the other hand are restricted enough so that queries can be evaluated very efficiently on huge files and in particular cannot result in an infinite loop.

One of the most popular such computational models is regular expressions. If you have ever used an advanced text editor, a command line shell, or have done any kind of manipulation of text files, then you have probably come across regular expressions.

A regular expression over some alphabet Σ is obtained by combining elements of Σ with the operation of concatenation, as well as | (corresponding to OR) and * (corresponding to repetition zero or more times). (Common implementations of regular expressions in programming languages and shells typically include some extra operations on top of | and *, but these operations can be implemented as "syntactic sugar" using the operators | and *.) For example, the following regular expression over the alphabet {0,1} corresponds to the set of all strings x ∈ {0,1}* where every digit is repeated at least twice:

(00(0*)|11(1*))* .   (6.9)

The following regular expression over the alphabet {a, …, z, 0, …, 9} corresponds to the set of all strings that consist of a sequence of one or more of the letters a-d followed by a sequence of one or more digits (without a leading zero):

(a|b|c|d)(a|b|c|d)*(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)* .   (6.10)

Formally, regular expressions are defined by the following recursive definition:


Definition 6.6 — Regular expression. A regular expression e over an alphabet Σ is a string over Σ ∪ {(, ), |, *, ∅, ""} that has one of the following forms:

1. e = σ where σ ∈ Σ

2. e = (e'|e'') where e', e'' are regular expressions.

3. e = (e')(e'') where e', e'' are regular expressions. (We often drop the parentheses when there is no danger of confusion and so write this as e' e''.)

4. e = (e')* where e' is a regular expression.

Finally, we also allow the following "edge cases": e = ∅ and e = "". These are the regular expressions corresponding to accepting no strings and accepting only the empty string, respectively.

We will drop parentheses when they can be inferred from the context. We also use the convention that OR and concatenation are left-associative, and we give highest precedence to *, then concatenation, and then OR. Thus for example we write 00*|11 instead of ((0)(0*))|((1)(1)).

Every regular expression e corresponds to a function Φ_e : Σ* → {0,1} where Φ_e(x) = 1 if x matches the regular expression. For example, if e = (00|11)* then Φ_e(110011) = 1 but Φ_e(101) = 0 (can you see why?).

The formal definition of Φ_e is one of those definitions that is more cumbersome to write than to grasp. Thus it might be easier for you to first work it out on your own and then check that your definition matches what is written below.

Definition 6.7 — Matching a regular expression. Let e be a regular expression over the alphabet Σ. The function Φ_e : Σ* → {0,1} is defined as follows:

1. If e = σ then Φ_e(x) = 1 iff x = σ.

2. If e = (e'|e'') then Φ_e(x) = Φ_{e'}(x) ∨ Φ_{e''}(x) where ∨ is the OR operator.

3. If e = (e')(e'') then Φ_e(x) = 1 iff there are some x', x'' ∈ Σ* such that x is the concatenation of x' and x'' and Φ_{e'}(x') = Φ_{e''}(x'') = 1.

4. If e = (e')* then Φ_e(x) = 1 iff there is some k ∈ ℕ and some x_0, …, x_{k−1} ∈ Σ* such that x is the concatenation x_0 ⋯ x_{k−1} and Φ_{e'}(x_i) = 1 for every i ∈ [k].

5. Finally, for the edge cases, Φ_∅ is the constant zero function, and Φ_"" is the function that only outputs 1 on the empty string "".

We say that a regular expression e over Σ matches a string x ∈ Σ* if Φ_e(x) = 1.

The definitions above are not inherently difficult, but are a bit cumbersome. So you should pause here and go over them again until you understand why they correspond to our intuitive notion of regular expressions. This is important not just for understanding regular expressions themselves (which are used time and again in a great many applications) but also for getting better at understanding recursive definitions in general.

A Boolean function is called "regular" if it outputs 1 on precisely the set of strings that are matched by some regular expression. That is,

Definition 6.8 — Regular functions / languages. Let Σ be a finite set and F : Σ* → {0,1} be a Boolean function. We say that F is regular if F = Φ_e for some regular expression e.

Similarly, for every formal language L ⊆ Σ*, we say that L is regular if and only if there is a regular expression e such that x ∈ L iff e matches x.

Example 6.9 — A regular function. Let Σ = {a, b, c, d, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and F : Σ* → {0,1} be the function such that F(x) outputs 1 iff x consists of one or more of the letters a-d followed by a sequence of one or more digits (without a leading zero). Then F is a regular function, since F = Φ_e where

e = (a|b|c|d)(a|b|c|d)*(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*   (6.11)

is the expression we saw in (6.10).


If we wanted to verify, for example, that Φ_e(abc12078) = 1, we can do so by noticing that the expression (a|b|c|d) matches the string a, (a|b|c|d)* matches bc, (1|2|3|4|5|6|7|8|9) matches the string 1, and the expression (0|1|2|3|4|5|6|7|8|9)* matches the string 2078. Each one of those boils down to a simpler expression. For example, the expression (a|b|c|d)* matches the string bc because both of the one-character strings b and c are matched by the expression a|b|c|d.

Regular expressions can be defined over any finite alphabet Σ, but as usual we will mostly focus our attention on the binary case, where Σ = {0,1}. Most (if not all) of the theoretical and practical general insights about regular expressions can be gleaned from studying the binary case.

6.3.1 Algorithms for matching regular expressions

Regular expressions would not be very useful for search if we could not actually evaluate, given a regular expression e, whether a string x is matched by e. Luckily, there is an algorithm to do so. Specifically, there is an algorithm (think "Python program", though later we will formalize the notion of algorithms using Turing machines) that on input a regular expression e over the alphabet {0,1} and a string x ∈ {0,1}*, outputs 1 iff e matches x (i.e., outputs Φ_e(x)).

Indeed, Definition 6.7 actually specifies a recursive algorithm for computing Φ_e. Specifically, each one of our operations (concatenation, OR, and star) can be thought of as reducing the task of testing whether an expression e matches a string x to testing whether some sub-expressions of e match substrings of x. Since these sub-expressions are always shorter than the original expression, this yields a recursive algorithm for checking if e matches x which will eventually terminate at the base cases of the expressions that correspond to a single symbol or the empty string.


Algorithm 6.10 — Regular expression matching.

Input: Regular expression e over Σ, x ∈ Σ*
Output: Φ_e(x)
1: procedure Match(e, x)
2:   if e = ∅ then return 0
3:   if x = "" then return MatchEmpty(e)
4:   if e ∈ Σ then return 1 iff x = e
5:   if e = (e'|e'') then return Match(e', x) or Match(e'', x)
6:   if e = (e')(e'') then
7:     for i ∈ [|x| + 1] do
8:       if Match(e', x_0 ⋯ x_{i−1}) and Match(e'', x_i ⋯ x_{|x|−1}) then return 1
9:     end for
10:  end if
11:  if e = (e')* then
12:    if e' = "" then return Match("", x)
13:    # ("")* is the same as ""
14:    for i ∈ [|x|] do
15:      # x_0 ⋯ x_{i−1} is shorter than x
16:      if Match(e, x_0 ⋯ x_{i−1}) and Match(e', x_i ⋯ x_{|x|−1}) then return 1
17:    end for
18:  end if
19:  return 0
20: end procedure

We assume above that we have a procedure MatchEmpty that on input a regular expression e outputs 1 if and only if e matches the empty string "".
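Algorithm 6.10 can be transcribed into Python almost line by line. The encoding of expressions as nested tuples below (('|', e1, e2) for OR, ('.', e1, e2) for concatenation, ('*', e1) for star, a one-character string for a symbol, "" for the empty string, and None for ∅) is my own choice for this sketch, not the book's:

```python
def match_empty(e):
    """Return 1 iff e matches "" (the MatchEmpty procedure; see Solved Exercise 6.3)."""
    if e is None or (isinstance(e, str) and e != ""):
        return 0          # the empty set, or a single alphabet symbol
    if e == "":
        return 1
    op = e[0]
    if op == '|':
        return 1 if match_empty(e[1]) or match_empty(e[2]) else 0
    if op == '.':
        return 1 if match_empty(e[1]) and match_empty(e[2]) else 0
    return 1              # (e')* always matches the empty string

def match(e, x):
    """Return Phi_e(x), following the recursion of Algorithm 6.10."""
    if e is None:
        return 0
    if x == "":
        return match_empty(e)
    if isinstance(e, str):
        return 1 if x == e else 0          # single symbol ("" was handled above)
    op = e[0]
    if op == '|':
        return 1 if match(e[1], x) or match(e[2], x) else 0
    if op == '.':
        for i in range(len(x) + 1):        # try every split x = x[:i] + x[i:]
            if match(e[1], x[:i]) and match(e[2], x[i:]):
                return 1
        return 0
    # op == '*'
    if e[1] == "":
        return match("", x)                # ("")* is the same as ""
    for i in range(len(x)):                # x[:i] is strictly shorter than x
        if match(e, x[:i]) and match(e[1], x[i:]):
            return 1
    return 0

e = ('*', ('|', ('.', '0', '0'), ('.', '1', '1')))  # the expression (00|11)*
print(match(e, '110011'), match(e, '101'))  # prints: 1 0
```

Each recursive call shrinks either the expression or the string, mirroring the termination argument below.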

The key observation is that in our recursive definition of regular expressions, whenever e is made up of one or two expressions e', e'', these expressions are smaller than e. Eventually (when they have size 1) they must correspond to the non-recursive case of a single alphabet symbol. Correspondingly, the recursive calls made in Algorithm 6.10 always involve a shorter expression or (in the case of an expression of the form (e')*) a shorter input string. Thus, we can prove the correctness of Algorithm 6.10 on inputs of the form (e, x) by induction over min{|e|, |x|}. The base case is when either x = "" or e is a single alphabet symbol, "" or ∅. In the case the expression is of the form e = (e'|e'') or e = (e')(e''), we make recursive calls with the shorter expressions e', e''. In the case the expression is of the form e = (e')*, we make recursive calls with either a shorter string x and the same expression, or with the shorter expression e' and a string x' that is equal in length or shorter than x.

Solved Exercise 6.3 — Match the empty string. Give an algorithm that on input a regular expression e, outputs 1 if and only if Φ_e("") = 1.

Solution:

We can obtain such a recursive algorithm by using the following observations:

1. An expression of the form "" or (e')* always matches the empty string.

2. An expression of the form σ, where σ ∈ Σ is an alphabet symbol, never matches the empty string.

3. The regular expression ∅ does not match the empty string.

4. An expression of the form e'|e'' matches the empty string if and only if one of e' or e'' matches it.

5. An expression of the form (e')(e'') matches the empty string if and only if both e' and e'' match it.

Given the above observations, we see that the following algorithm will check if e matches the empty string:

procedure MatchEmpty(e)
  if e = ∅ then return 0
  if e = "" then return 1
  if e ∈ Σ then return 0
  if e = (e'|e'') then return MatchEmpty(e') or MatchEmpty(e'')
  if e = (e')(e'') then return MatchEmpty(e') and MatchEmpty(e'')
  if e = (e')* then return 1
end procedure

6.4 EFFICIENT MATCHING OF REGULAR EXPRESSIONS (OPTIONAL)

Algorithm 6.10 is not very efficient. For example, given an expression involving concatenation or the "star" operation and a string of length n, it can make n recursive calls, and hence it can be shown that in the worst case Algorithm 6.10 can take time exponential in the length of the input string x. Fortunately, it turns out that there is a much more efficient algorithm that can match regular expressions in linear (i.e., O(n)) time. Since we have not yet covered the topics of time and space complexity, we describe this algorithm in high-level terms, without making the computational model precise, using the colloquial notion of O(n) running time as is used in introduction-to-programming courses and whiteboard coding interviews. We will see a formal definition of time complexity in Chapter 13.

Theorem 6.11 — Matching regular expressions in linear time. Let e be a regular expression. Then there is an O(n) time algorithm that computes Φ_e.

The implicit constant in the O(n) term of Theorem 6.11 depends on the expression e. Thus, another way to state Theorem 6.11 is that for every expression e, there is some constant c and an algorithm A that computes Φ_e on n-bit inputs using at most c ⋅ n steps. This makes sense, since in practice we often want to compute Φ_e(x) for a small regular expression e and a large document x. Theorem 6.11 tells us that we can do so with running time that scales linearly with the size of the document, even if it has (potentially) worse dependence on the size of the regular expression.

We prove Theorem 6.11 by obtaining a more efficient recursive algorithm, one that determines whether e matches a string x ∈ {0,1}^n by reducing this task to determining whether a related expression e' matches x_0, …, x_{n−2}. This results in a running-time recurrence of the form T(n) = T(n−1) + O(1), which solves to T(n) = O(n).

Restrictions of regular expressions. The central definition for the algorithm behind Theorem 6.11 is the notion of a restriction of a regular expression. The idea is that for every regular expression e and symbol σ in its alphabet, it is possible to define a regular expression e[σ] such that e[σ] matches a string x if and only if e matches the string xσ. For example, if e is the regular expression 01|(01)*(01) (i.e., one or more occurrences of 01) then e[1] is equal to 0|(01)*0 and e[0] will be ∅. (Can you see why?)

Algorithm 6.12 computes the restriction e[σ] given a regular expression e and an alphabet symbol σ. It always terminates, since the recursive calls it makes are always on expressions smaller than the input expression. Its correctness can be proven by induction on the length of the regular expression e, with the base cases being when e is "", ∅, or a single alphabet symbol τ.


Algorithm 6.12 — Restricting a regular expression.

Input: Regular expression e over Σ, symbol σ ∈ Σ
Output: Regular expression e' = e[σ] such that Φ_{e'}(x) = Φ_e(xσ) for every x ∈ Σ*
1: procedure Restrict(e, σ)
2:   if e = "" or e = ∅ then return ∅
3:   if e = τ for τ ∈ Σ then return "" if τ = σ and return ∅ otherwise
4:   if e = (e'|e'') then return (Restrict(e', σ)|Restrict(e'', σ))
5:   if e = (e')* then return (e')*(Restrict(e', σ))
6:   if e = (e')(e'') and Φ_{e''}("") = 0 then return (e')(Restrict(e'', σ))
7:   if e = (e')(e'') and Φ_{e''}("") = 1 then return (e')(Restrict(e'', σ)) | Restrict(e', σ)
8: end procedure

Using this notion of restriction, we can define the following recursive algorithm for regular expression matching:

Algorithm 6.13 — Regular expression matching in linear time.

Input: Regular expression e over Σ, x ∈ Σ^n where n ∈ ℕ
Output: Φ_e(x)
1: procedure FMatch(e, x)
2:   if x = "" then return MatchEmpty(e)
3:   Let e' ← Restrict(e, x_{n−1})
4:   return FMatch(e', x_0 ⋯ x_{n−2})
5: end procedure

By the definition of a restriction, for every σ ∈ Σ and x' ∈ Σ*, the expression e matches x'σ if and only if e[σ] matches x'. Hence for every e and x ∈ Σ^n, Φ_{e[x_{n−1}]}(x_0 ⋯ x_{n−2}) = Φ_e(x), and Algorithm 6.13 does return the correct answer. The only remaining task is to analyze its running time. Note that Algorithm 6.13 uses the MatchEmpty procedure of Solved Exercise 6.3 in the base case that x = "". However, this is OK since the running time of this procedure depends only on e and is independent of the length of the original input.
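Under the same hypothetical tuple encoding as before (('|', …) for OR, ('.', …) for concatenation, ('*', …) for star, one-character strings for symbols, None for ∅), Algorithms 6.12 and 6.13 can be sketched as follows; restrict implements e[σ] and fmatch peels off one symbol per call:

```python
def match_empty(e):
    # 1 iff e matches "" (same helper as in the sketch of Algorithm 6.10).
    if e is None or (isinstance(e, str) and e != ""):
        return 0
    if e == "" or e[0] == '*':
        return 1
    if e[0] == '|':
        return 1 if match_empty(e[1]) or match_empty(e[2]) else 0
    return 1 if match_empty(e[1]) and match_empty(e[2]) else 0

def restrict(e, sigma):
    """Return e[sigma], which matches x iff e matches x + sigma (Algorithm 6.12)."""
    if e is None or e == "":
        return None                        # "" and the empty set restrict to the empty set
    if isinstance(e, str):
        return "" if e == sigma else None  # a single symbol tau
    op = e[0]
    if op == '|':
        return ('|', restrict(e[1], sigma), restrict(e[2], sigma))
    if op == '*':
        return ('.', e, restrict(e[1], sigma))   # (e')* (e'[sigma])
    if match_empty(e[2]) == 0:                   # concatenation (e')(e'')
        return ('.', e[1], restrict(e[2], sigma))
    return ('|', ('.', e[1], restrict(e[2], sigma)), restrict(e[1], sigma))

def fmatch(e, x):
    """Algorithm 6.13: peel off the last symbol of x via restriction."""
    if x == "":
        return match_empty(e)
    return fmatch(restrict(e, x[-1]), x[:-1])

e = ('*', ('|', ('.', '0', '0'), ('.', '1', '1')))  # the expression (00|11)*
print(fmatch(e, '1100'), fmatch(e, '10'))  # prints: 1 0
```

Note that this naive sketch lets the restricted expressions grow syntactically; the claim below is what bounds their size (up to the identities the book uses) and hence the running time.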

For simplicity, let us restrict our attention to the case that the alphabet Σ is equal to {0,1}. Define C(ℓ) to be the maximum number of operations that Algorithm 6.12 takes when given as input a regular expression e over {0,1} of at most ℓ symbols. The value C(ℓ) can be shown to be polynomial in ℓ, though this is not important for this theorem, since we only care about the dependence of the time to compute Φ_e(x) on the length of x and not about the dependence of this time on the length of e.

Algorithm 6.13 is a recursive algorithm that, on input an expression e and a string x ∈ {0,1}^n, performs a computation of at most C(|e|) steps and then calls itself with some expression e' and a string x' of length n − 1. It will terminate after n steps when it reaches a string of length 0. So, the running time T(e, n) that it takes for Algorithm 6.13 to compute Φ_e for inputs of length n satisfies the recursive equation:

T(e, n) = max{T(e[0], n − 1), T(e[1], n − 1)} + C(|e|)   (6.12)

(In the base case n = 0, T(e, 0) is equal to some constant depending only on e.) To get some intuition for Eq. (6.12), let us open up the recursion for one level, writing T(e, n) as

T(e, n) = max{ T(e[0][0], n − 2) + C(|e[0]|),
               T(e[0][1], n − 2) + C(|e[0]|),
               T(e[1][0], n − 2) + C(|e[1]|),
               T(e[1][1], n − 2) + C(|e[1]|) } + C(|e|) .   (6.13)

Continuing this way, we can see that T(e, n) ≤ n ⋅ C(L) + O(1) where L is the largest length of any expression e' that we encounter along the way. Therefore, the following claim suffices to show that Algorithm 6.13 runs in O(n) time:

Claim: Let e be a regular expression over {0,1}. Then there is a number L(e) ∈ ℕ such that for every sequence of symbols α_0, …, α_{n−1}, if we define e' = e[α_0][α_1] ⋯ [α_{n−1}] (i.e., restricting e to α_0, then to α_1, and so on and so forth), then |e'| ≤ L(e).

Proof of claim: For a regular expression e over {0,1} and α ∈ {0,1}^m, we denote by e[α] the expression e[α_0][α_1] ⋯ [α_{m−1}] obtained by restricting e to α_0 and then to α_1 and so on. We let S(e) = {e[α] | α ∈ {0,1}*}. We will prove the claim by showing that for every e, the set S(e) is finite, and hence so is the number L(e), which is the maximum length of e' for e' ∈ S(e).

We prove this by induction on the structure of e. If e is a symbol, the empty string, or the empty set, then this is straightforward to show, as the only expressions S(e) can contain are the expression itself, "", and ∅. Otherwise we split into the two cases (i) e = (e')* and (ii) e = e'e'', where e', e'' are smaller expressions (and hence by the induction hypothesis S(e') and S(e'') are finite). In case (i), if e = (e')* then e[α] is either equal to (e')* e'[α], or it is simply the empty set if e'[α] = ∅. Since e'[α] is in the set S(e'), the number of distinct expressions in S(e) is at most |S(e')| + 1. In case (ii), if e = e'e'' then all the restrictions of e to strings α will either have the form e' e''[α] or the form e' e''[α] | e'[α'], where α' is some string such that α = α'α'' and e''[α''] matches the empty string. Since e''[α] ∈ S(e'') and e'[α'] ∈ S(e'), the number of the possible distinct expressions of the form e[α] is at most |S(e'')| + |S(e'')| ⋅ |S(e')|. This completes the proof of the claim.

The bottom line is that while running Algorithm 6.13 on a regular expression e, all the expressions we ever encounter are in the finite set S(e), no matter how large the input x is, and so the running time of Algorithm 6.13 satisfies the equation T(n) = T(n − 1) + C' for some constant C' depending on e. This solves to O(n), where the implicit constant in the O notation can (and will) depend on e but, crucially, not on the length of the input x.

6.4.1 Matching regular expressions using DFAs

Theorem 6.11 is already quite impressive, but we can do even better. Specifically, no matter how long the string x is, we can compute Φ_e(x) by maintaining only a constant amount of memory and moreover making a single pass over x. That is, the algorithm will scan the input x once from start to finish, and then determine whether or not x is matched by the expression e. This is important in the common case of trying to match a short regular expression over a huge file or document that might not even fit in our computer's memory. Of course, as we have seen before, a single-pass constant-memory algorithm is simply a deterministic finite automaton. As we will see in Theorem 6.16, a function can be computed by a regular expression if and only if it can be computed by a DFA. We start with showing the "only if" direction:

Theorem 6.14 — DFA for regular expression matching. Let e be a regular expression. Then there is an algorithm that on input x ∈ {0,1}* computes Φ_e(x) while making a single pass over x and maintaining a constant amount of memory.

Proof Idea:

The single-pass constant-memory algorithm for checking if a string matches a regular expression is presented in Algorithm 6.15. The idea is to replace the recursive algorithm of Algorithm 6.13 with a dynamic program, using the technique of memoization. If you haven't yet taken an algorithms course, you might not know these techniques. This is OK; while this more efficient algorithm is crucial for the many practical applications of regular expressions, it is not of great importance for this book.
⋆


Algorithm 6.15 — Regular expression matching by a DFA.

Input: Regular expression e over Σ, x ∈ Σ^n where n ∈ ℕ
Output: Φ_e(x)

1: procedure DFAMatch(e, x)
2:   Let S ← S(e) be the set { e[α] | α ∈ {0,1}* } as defined in the proof of Theorem 6.11.
3:   for e′ ∈ S do
4:     Let v_{e′} ← 1 if Φ_{e′}("") = 1 and v_{e′} ← 0 otherwise
5:   end for
6:   for i ∈ [n] do
7:     Let last_{e′} ← v_{e′} for all e′ ∈ S
8:     Let v_{e′} ← last_{e′[x_i]} for all e′ ∈ S
9:   end for
10:  return v_e
11: end procedure

Proof of Theorem 6.14. Algorithm 6.15 checks if a given string x ∈ Σ* is matched by the regular expression e. For every regular expression e, this algorithm maintains a constant number 2|S(e)| of Boolean variables (v_{e′} and last_{e′} for e′ ∈ S(e)), and it makes a single pass over the input string. Hence it corresponds to a DFA. We prove its correctness by induction on the length n of the input. Specifically, we will argue that before reading the i-th bit of x, the variable v_{e′} is equal to Φ_{e′}(x_0 ⋯ x_{i−1}) for every e′ ∈ S(e). In the case i = 0 this holds since we initialize v_{e′} = Φ_{e′}("") for all e′ ∈ S(e). For i > 0 this holds by induction, since the inductive hypothesis implies that last_{e′} = Φ_{e′}(x_0 ⋯ x_{i−2}) for all e′ ∈ S(e), and by the definition of the set S(e), for every e′ ∈ S(e) and x_{i−1} ∈ Σ, the expression e″ = e′[x_{i−1}] is in S(e) and Φ_{e′}(x_0 ⋯ x_{i−1}) = Φ_{e″}(x_0 ⋯ x_{i−2}).
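To make the single-pass structure concrete, here is a small Python sketch (not from the book). It uses the Brzozowski derivative, which restricts an expression on its first symbol (the text's e[σ] restricts on the last symbol instead), but the resulting algorithm has exactly the shape of Algorithm 6.15: a finite family of expressions, updated once per input symbol. The tuple encoding of expressions is a hypothetical choice for illustration.

```python
EMPTYSET = ('empty',)   # the expression ∅: matches nothing
EPSILON = ('eps',)      # the expression "": matches only the empty string

def nullable(e):
    """Return True iff e matches the empty string (i.e. Φ_e("") = 1)."""
    tag = e[0]
    if tag == 'empty': return False
    if tag == 'eps':   return True
    if tag == 'sym':   return False
    if tag == 'or':    return nullable(e[1]) or nullable(e[2])
    if tag == 'cat':   return nullable(e[1]) and nullable(e[2])
    if tag == 'star':  return True

def restrict(e, sigma):
    """The derivative e[sigma]: matches y iff e matches sigma + y."""
    tag = e[0]
    if tag in ('empty', 'eps'):
        return EMPTYSET
    if tag == 'sym':
        return EPSILON if e[1] == sigma else EMPTYSET
    if tag == 'or':
        return ('or', restrict(e[1], sigma), restrict(e[2], sigma))
    if tag == 'cat':
        left = ('cat', restrict(e[1], sigma), e[2])
        if nullable(e[1]):       # e[1] may match "", so sigma can start e[2]
            return ('or', left, restrict(e[2], sigma))
        return left
    if tag == 'star':
        return ('cat', restrict(e[1], sigma), e)

def matches(e, x):
    """Single pass over x, updating the current expression per symbol."""
    for sigma in x:
        e = restrict(e, sigma)
    return nullable(e)
```

For example, with e encoded as `('star', ('cat', ('sym', '0'), ('sym', '1')))` (i.e. (01)*), `matches` accepts "", "01" and "0101" but rejects "011".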

6.4.2 Equivalence of regular expressions and automata

Recall that a Boolean function F : {0,1}* → {0,1} is defined to be regular if it is equal to Φ_e for some regular expression e. (Equivalently, a language L ⊆ {0,1}* is defined to be regular if there is a regular expression e such that e matches x iff x ∈ L.) The following theorem is the central result of automata theory:

Theorem 6.16 — DFA and regular expression equivalency. Let F : {0,1}* → {0,1}. Then F is regular if and only if there exists a DFA (T, 𝒮) that computes F.

Proof Idea:


Figure 6.6: A deterministic finite automaton that computes the function Φ_{(01)*}.

Figure 6.7: Given a DFA of C states, for every v, w ∈ [C] and number t ∈ {0,…,C} we define the function F^t_{v,w} : {0,1}* → {0,1} to output one on input x ∈ {0,1}* if and only if, when the DFA is initialized in the state v and is given the input x, it will reach the state w while going only through the intermediate states {0,…,t−1}.

One direction follows from Theorem 6.14, which shows that for every regular expression e, the function Φ_e can be computed by a DFA (see for example Fig. 6.6). For the other direction, we show that given a DFA (T, 𝒮) with C states, for every v, w ∈ [C] we can find a regular expression that matches x ∈ {0,1}* if and only if the DFA, starting in state v, will end up in state w after reading x.
⋆

Proof of Theorem 6.16. Since Theorem 6.14 proves the "only if" direction, we only need to show the "if" direction. Let A = (T, 𝒮) be a DFA with C states that computes the function F. We need to show that F is regular.

For every v, w ∈ [C], we let F_{v,w} : {0,1}* → {0,1} be the function that maps x ∈ {0,1}* to 1 if and only if the DFA A, starting at the state v, will reach the state w if it reads the input x. We will prove that F_{v,w} is regular for every v, w. This will prove the theorem, since by Definition 6.2, F(x) is equal to the OR of F_{0,w}(x) for every w ∈ 𝒮. Hence if we have a regular expression for every function of the form F_{v,w}, then (using the | operation) we can obtain a regular expression for F as well.

To give regular expressions for the functions F_{v,w}, we start by defining the following functions F^t_{v,w}: for every v, w ∈ [C] and 0 ≤ t ≤ C, F^t_{v,w}(x) = 1 if and only if, starting from v and observing x, the automaton reaches w with all intermediate states being in the set [t] = {0,…,t−1} (see Fig. 6.7). That is, while v, w themselves might be outside [t], F^t_{v,w}(x) = 1 if and only if throughout the execution of the automaton on the input x (when initiated at v) it never enters any of the states outside [t] and still ends up at w. If t = 0 then [t] is the empty set, and hence F^0_{v,w}(x) = 1 if and only if the automaton reaches w from v directly on x, without any intermediate state. If t = C then all states are in [t], and hence F^t_{v,w} = F_{v,w}.

We will prove the theorem by induction on t, showing that F^t_{v,w} is regular for every v, w and t. For the base case of t = 0, F^0_{v,w} is regular for every v, w, since it can be described using the expressions "", ∅, 0, 1 and the | operation. Specifically, if v = w then F^0_{v,w}(x) = 1 if and only if x is the empty string or x is a single symbol σ ∈ {0,1} with T(v,σ) = v, and so F^0_{v,v} corresponds to one of the expressions "", ""|0, ""|1, or ""|0|1. If v ≠ w then F^0_{v,w}(x) = 1 if and only if x consists of a single symbol σ ∈ {0,1} and T(v,σ) = w. Therefore in this case F^0_{v,w} corresponds to one of the four regular expressions 0|1, 0, 1 or ∅, depending on whether A transitions from v to w when it reads either 0 or 1, only one of these symbols, or neither.

Inductive step: Now that we've seen the base case, let's prove the general case by induction. Assume, via the induction hypothesis, that for every v′, w′ ∈ [C], we have a regular expression R^t_{v′,w′} that computes F^t_{v′,w′}. We need to prove that F^{t+1}_{v,w} is regular for every v, w. If the


automaton arrives from v to w using the intermediate states [t + 1], then it visits the t-th state zero or more times. If the path labeled by x causes the automaton to get from v to w without visiting the t-th state at all, then x is matched by the regular expression R^t_{v,w}. If the path labeled by x causes the automaton to get from v to w while visiting the t-th state k > 0 times, then we can think of this path as follows:

• First travel from v to t using only intermediate states in {0,…,t−1}.

• Then go from t back to itself k − 1 times using only intermediate states in {0,…,t−1}.

• Then go from t to w using only intermediate states in {0,…,t−1}.

Therefore in this case the string x is matched by the regular expression R^t_{v,t}(R^t_{t,t})* R^t_{t,w}. (See also Fig. 6.8.)

Therefore we can compute F^{t+1}_{v,w} using the regular expression

R^t_{v,w} | R^t_{v,t}(R^t_{t,t})* R^t_{t,w} .   (6.14)

This completes the proof of the inductive step and hence of the theorem.
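The inductive construction above translates directly into code. The sketch below uses a hypothetical encoding: a DFA is a transition table T with T[v][s] the state reached from v on bit s, and state 0 is the starting state. It builds the expressions R^t_{v,w} bottom-up and returns a regex string in Python's re syntax (with '' standing for the empty-string expression), so the result can be checked with re.fullmatch:

```python
import re

def dfa_to_regex(T, accept_states):
    """Build a regex for the DFA (T, accept_states) by the inductive
    construction: R[v][w] starts as R^0 and is updated to R^{t+1}
    by allowing state t as an intermediate state."""
    C = len(T)
    EMPTY = None  # stands for the expression ∅ (matches nothing)

    def union(a, b):
        if a is EMPTY: return b
        if b is EMPTY: return a
        return f"({a}|{b})"

    def concat(a, b):
        return EMPTY if (a is EMPTY or b is EMPTY) else a + b

    # Base case t = 0: x takes v to w with no intermediate states,
    # so x is a single bit (or the empty string "" when v == w).
    R = [[EMPTY] * C for _ in range(C)]
    for v in range(C):
        for s in (0, 1):
            R[v][T[v][s]] = union(R[v][T[v][s]], str(s))
        R[v][v] = union(R[v][v], "")

    # Inductive step (6.14): R^{t+1}_{v,w} = R^t_{v,w} | R^t_{v,t}(R^t_{t,t})* R^t_{t,w}
    for t in range(C):
        loop = "" if R[t][t] in (EMPTY, "") else f"({R[t][t]})*"
        R = [[union(R[v][w], concat(concat(R[v][t], loop), R[t][w]))
              for w in range(C)] for v in range(C)]

    # F is the OR (|) of F_{0,w} over the accepting states w.
    result = EMPTY
    for w in accept_states:
        result = union(result, R[0][w])
    return result

# Example: a 3-state DFA for Φ_(01)* (state 2 is a dead state).
T = [[1, 2], [2, 0], [2, 2]]
pattern = dfa_to_regex(T, {0})
```

The resulting expression is far from minimal (the construction blows up the expression at every step), but it matches exactly the strings the DFA accepts.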

Figure 6.8: If we have regular expressions R^t_{v′,w′} corresponding to F^t_{v′,w′} for every v′, w′ ∈ [C], we can obtain a regular expression R^{t+1}_{v,w} corresponding to F^{t+1}_{v,w}. The key observation is that a path from v to w using {0,…,t} either does not touch t at all, in which case it is captured by the expression R^t_{v,w}, or it goes from v to t, comes back to t zero or more times, and then goes from t to w, in which case it is captured by the expression R^t_{v,t}(R^t_{t,t})* R^t_{t,w}.

6.4.3 Closure properties of regular expressions

If F and G are regular functions computed by the expressions e and f respectively, then the expression e|f computes the function H = F ∨ G defined as H(x) = F(x) ∨ G(x). Another way to say this is that the set of regular functions is closed under the OR operation. That is, if F and G are regular then so is F ∨ G. An important corollary of Theorem 6.16 is that this set is also closed under the NOT operation:


Lemma 6.17 — Regular expressions closed under complement. If F : {0,1}* → {0,1} is regular then so is the function F̄, where F̄(x) = 1 − F(x) for every x ∈ {0,1}*.

Proof. If F is regular then by Theorem 6.14 it can be computed by a DFA A = (T, 𝒜) with some C states. But then the DFA Ā = (T, [C]∖𝒜), which does the same computation but flips the set of accepted states, will compute F̄. By Theorem 6.16 this implies that F̄ is regular as well.
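The state-flipping argument is easy to see in code. In this sketch (hypothetical encoding: transition table T, accepting set, start state 0), the complemented DFA is literally the same table with the accepting set replaced by its complement:

```python
def run_dfa(T, accept, x):
    """Run the DFA (T, accept) on x ∈ {0,1}*; state 0 is the start state."""
    s = 0
    for bit in x:
        s = T[s][int(bit)]
    return 1 if s in accept else 0

def complement_dfa(T, accept):
    """Same transitions, flipped accepting set: computes 1 - F(x)."""
    return T, set(range(len(T))) - set(accept)

# Example: with accepting set {0}, this DFA computes Φ_(01)*.
T = [[1, 2], [2, 0], [2, 2]]
Tc, acc_c = complement_dfa(T, {0})
```

On every input, the complemented machine follows exactly the same sequence of states, so its output is always the negation of the original's.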

Since a ∧ b = ¬(¬a ∨ ¬b), Lemma 6.17 implies that the set of regular functions is closed under the AND operation as well. Moreover, since OR, NOT and AND are a universal basis, this set is also closed under NAND, XOR, and any other finite function. That is, we have the following corollary:

Theorem 6.18 — Closure of regular expressions. Let f : {0,1}^k → {0,1} be any finite Boolean function, and let F_0, …, F_{k−1} : {0,1}* → {0,1} be regular functions. Then the function G(x) = f(F_0(x), F_1(x), …, F_{k−1}(x)) is regular.

Proof. This is a direct consequence of the closure of regular functions under OR and NOT (and hence AND), together with Theorem 4.13, which states that every such f can be computed by a Boolean circuit (which is simply a combination of the AND, OR, and NOT operations).
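One concrete way to realize Theorem 6.18 for k = 2 on the automaton side is the product construction: run DFAs for the two functions in lockstep and accept according to f. A sketch (the table encoding and the two example DFAs are hypothetical illustrations):

```python
def product_dfa(T1, A1, T2, A2, f):
    """DFA over pairs of states computing f(F1(x), F2(x)); a pair
    (u, v) is flattened to the integer u*len(T2) + v."""
    C2 = len(T2)
    T, accept = [], set()
    for u in range(len(T1)):
        for v in range(C2):
            # Both machines step on the same input bit.
            T.append([T1[u][s] * C2 + T2[v][s] for s in (0, 1)])
            if f(1 if u in A1 else 0, 1 if v in A2 else 0) == 1:
                accept.add(u * C2 + v)
    return T, accept

def run_dfa(T, accept, x):
    s = 0
    for bit in x:
        s = T[s][int(bit)]
    return 1 if s in accept else 0

# XOR of Φ_(01)* and "the last bit of x is 1":
T1, A1 = [[1, 2], [2, 0], [2, 2]], {0}
T2, A2 = [[0, 1], [0, 1]], {1}
Tx, Ax = product_dfa(T1, A1, T2, A2, lambda a, b: a ^ b)
```

The product machine has C_1 · C_2 states, so this stays within the DFA model, and by Theorem 6.16 the resulting function is again regular.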

6.5 LIMITATIONS OF REGULAR EXPRESSIONS AND THE PUMPING LEMMA

The efficiency of regular expression matching makes them very useful. This is the reason why operating systems and text editors often restrict their search interface to regular expressions and don't allow searching by specifying an arbitrary function. But this efficiency comes at a cost. As we have seen, regular expressions cannot compute every function. In fact there are some very simple (and useful!) functions that they cannot compute. Here is one example:

Lemma 6.19 — Matching parenthesis. Let Σ = {⟨, ⟩} and MATCHPAREN : Σ* → {0,1} be the function that given a string of parentheses, outputs 1 if and only if every opening parenthesis is matched by a corresponding closed one. Then there is no regular expression over Σ that computes MATCHPAREN.

Lemma 6.19 is a consequence of the following result, which is known as the pumping lemma:


Theorem 6.20 — Pumping Lemma. Let e be a regular expression over some alphabet Σ. Then there is some number n_0 such that for every w ∈ Σ* with |w| > n_0 and Φ_e(w) = 1, we can write w = xyz for strings x, y, z ∈ Σ* satisfying the following conditions:

1. |y| ≥ 1.

2. |xy| ≤ n_0.

3. Φ_e(xy^k z) = 1 for every k ∈ ℕ.

Figure 6.9: To prove the "pumping lemma" we look at a word w that is much larger than the regular expression e that matches it. In such a case, part of w must be matched by some sub-expression of the form (e′)*, since this is the only operator that allows matching words longer than the expression. If we look at the "leftmost" such sub-expression and define y to be the string that is matched by it, we obtain the partition needed for the pumping lemma.

Proof Idea:

The idea behind the proof is the following. Let n_0 be twice the number of symbols that are used in the expression e. The only way that there is some w with |w| > n_0 and Φ_e(w) = 1 is that e contains the * (i.e., star) operator and that there is a nonempty substring y of w that was matched by (e′)* for some sub-expression e′ of e. We can now repeat y any number of times and still get a matching string. See also Fig. 6.9.
⋆

P
The pumping lemma is a bit cumbersome to state, but one way to remember it is that it simply says the following: "if a string matching a regular expression is long enough, one of its substrings must be matched using the * operator".
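As a quick sanity check of the lemma's conclusion, we can pump a concrete example with Python's re module: for e = (01)* and w = "01" repeated five times, the partition x = "", y = "01", z = "01" repeated four times satisfies all three conditions, and x y^k z matches for every k:

```python
import re

e = "(01)*"                      # the expression, in Python re syntax
x, y, z = "", "01", "01" * 4     # a partition of w = "01" * 5
assert len(y) >= 1               # condition 1
assert len(x + y) <= 2 * len(e)  # condition 2, with n_0 = 2|e|
for k in range(6):               # condition 3: x y^k z still matches
    assert re.fullmatch(e, x + y * k + z)
```

Here y is exactly a string matched by the starred sub-expression, which is why repeating it (or dropping it, k = 0) keeps the whole string matching.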


Proof of Theorem 6.20. To prove the lemma formally, we use induction on the length of the expression. Like all induction proofs, this is going to be somewhat lengthy, but at the end of the day it directly follows the intuition above that somewhere we must have used the star operation. Reading this proof, and in particular understanding how the formal proof below corresponds to the intuitive idea above, is a very good way to get more comfortable with inductive proofs of this form.

Our inductive hypothesis is that for an expression of length n, n_0 = 2n satisfies the conditions of the lemma. The base case is when the expression is a single symbol σ ∈ Σ, or the expression is ∅ or "". In all these cases the conditions of the lemma are satisfied simply because n_0 = 2 and there is no string x of length larger than n_0 that is matched by the expression.

We now prove the inductive step. Let e be a regular expression with n > 1 symbols. We set n_0 = 2n and let w ∈ Σ* be a string satisfying |w| > n_0. Since e has more than one symbol, it has one of the forms (a) e′|e″, (b) (e′)(e″), or (c) (e′)*, where in all these cases the subexpressions e′ and e″ have fewer symbols than e and hence satisfy the induction hypothesis.

In case (a), every string w matched by e must be matched by either e′ or e″. If e′ matches w then, since |w| > 2|e′|, by the induction hypothesis there exist x, y, z with |y| ≥ 1 and |xy| ≤ 2|e′| < n_0 such that e′ (and therefore also e = e′|e″) matches xy^k z for every k. The same argument works in the case that e″ matches w.

In case (b), if w is matched by (e′)(e″) then we can write w = w′w″ where e′ matches w′ and e″ matches w″. We split into subcases. If |w′| > 2|e′| then by the induction hypothesis there exist x, y, z′ with |y| ≥ 1 and |xy| ≤ 2|e′| < n_0 such that w′ = xyz′ and e′ matches xy^k z′ for every k ∈ ℕ. This completes the proof, since if we set z = z′w″ then we see that w = w′w″ = xyz and e = (e′)(e″) matches xy^k z for every k ∈ ℕ. Otherwise, if |w′| ≤ 2|e′|, then since |w| = |w′| + |w″| > n_0 = 2(|e′| + |e″|), it must be that |w″| > 2|e″|. Hence by the induction hypothesis there exist x′, y, z such that |y| ≥ 1, |x′y| ≤ 2|e″| and e″ matches x′y^k z for every k ∈ ℕ. But now if we set x = w′x′ we see that |xy| ≤ |w′| + |x′y| ≤ 2|e′| + 2|e″| = n_0, and on the other hand the expression e = (e′)(e″) matches xy^k z = w′x′y^k z for every k ∈ ℕ.

In case (c), if w is matched by (e′)* then w = w_0 ⋯ w_{t−1} where for every i ∈ [t], w_i is a nonempty string matched by e′. If |w_0| > 2|e′| then we can use the same approach as in the concatenation case above. Otherwise, we simply note that if x is the empty string, y = w_0, and z = w_1 ⋯ w_{t−1}, then |xy| ≤ n_0 and xy^k z is matched by (e′)* for every k ∈ ℕ.


Remark 6.21 — Recursive definitions and inductive proofs. When an object is recursively defined (as in the case of regular expressions) then it is natural to prove properties of such objects by induction. That is, if we want to prove that all objects of this type have property P, then it is natural to use an inductive step that says that if o′, o″, o‴ etc. have property P then so does an object o that is obtained by composing them.

Using the pumping lemma, we can easily prove Lemma 6.19 (i.e., the non-regularity of the "matching parenthesis" function):

Proof of Lemma 6.19. Suppose, towards the sake of contradiction, that there is an expression e such that Φ_e = MATCHPAREN. Let n_0 be the number obtained from Theorem 6.20 and let w = ⟨^{n_0} ⟩^{n_0} (i.e., n_0 left parentheses followed by n_0 right parentheses). Then we see that if we write w = xyz as in Theorem 6.20, the condition |xy| ≤ n_0 implies that y consists solely of left parentheses. Hence the string xy^2 z will contain more left parentheses than right parentheses. Hence MATCHPAREN(xy^2 z) = 0, but by the pumping lemma Φ_e(xy^2 z) = 1, contradicting our assumption that Φ_e = MATCHPAREN.

The pumping lemma is a very useful tool to show that certain functions are not computable by a regular expression. However, it is not an "if and only if" condition for regularity: there are non-regular functions that still satisfy the conditions of the pumping lemma. To understand the pumping lemma, it is important to follow the order of quantifiers in Theorem 6.20. In particular, the number n_0 in the statement of Theorem 6.20 depends on the regular expression (in the proof we chose n_0 to be twice the number of symbols in the expression). So, if we want to use the pumping lemma to rule out the existence of a regular expression e computing some function F, we need to be able to choose an appropriate input w ∈ {0,1}* that can be arbitrarily large and satisfies F(w) = 1. This makes sense if you think about the intuition behind the pumping lemma: we need w to be large enough as to force the use of the star operator.

Solved Exercise 6.4 — Palindromes are not regular. Prove that the following function over the alphabet {0, 1, ;} is not regular: PAL(w) = 1 if and only if w = u;u^R where u ∈ {0,1}* and u^R denotes u "reversed": the string u_{|u|−1} ⋯ u_0. (The palindrome function is most often defined without an explicit separator character ;, but the version with such a separator is a bit cleaner, and so we use it here. This does not make


Figure 6.10: A cartoon of a proof using the pumping lemma that a function F is not regular. The pumping lemma states that if F is regular then there exists a number n_0 such that for every large enough w with F(w) = 1, there exists a partition of w as w = xyz satisfying certain conditions, such that for every k ∈ ℕ, F(xy^k z) = 1. You can imagine a pumping-lemma based proof as a game between you and the adversary. Every there exists quantifier corresponds to an object you are free to choose on your own (and base your choice on previously chosen objects). Every for every quantifier corresponds to an object the adversary can choose arbitrarily (and again based on prior choices) as long as it satisfies the conditions. A valid proof corresponds to a strategy by which no matter what the adversary does, you can win the game by obtaining a contradiction: a choice of k that would result in F(xy^k z) = 0, hence violating the conclusion of the pumping lemma.


much difference, as one can easily encode the separator as a special binary string instead.)

Solution:

We use the pumping lemma. Suppose, towards the sake of contradiction, that there is a regular expression e computing PAL, and let n_0 be the number obtained by the pumping lemma (Theorem 6.20). Consider the string w = 0^{n_0};0^{n_0}. Since the reverse of the all-zero string is the all-zero string, PAL(w) = 1. Now, by the pumping lemma, if PAL is computed by e, then we can write w = xyz such that |xy| ≤ n_0, |y| ≥ 1 and PAL(xy^k z) = 1 for every k ∈ ℕ. In particular (taking k = 0), it must hold that PAL(xz) = 1. But this is a contradiction, since xz = 0^{n_0−|y|};0^{n_0}, and so its two parts are not of the same length and in particular are not the reverse of one another.

For yet another example of a pumping-lemma based proof, see Fig. 6.10, which illustrates a cartoon of the proof of the non-regularity of the function F : {0,1}* → {0,1} which is defined as F(x) = 1 iff x = 0^n 1^n for some n ∈ ℕ (i.e., x consists of a string of consecutive zeroes, followed by a string of consecutive ones of the same length).

6.6 ANSWERING SEMANTIC QUESTIONS ABOUT REGULAR EXPRESSIONS

Regular expressions have applications beyond search. For example, regular expressions are often used to define tokens (such as what is a valid variable identifier, or keyword) in the design of parsers, compilers and interpreters for programming languages. There are other applications too: for example, in recent years, the world of networking moved from fixed topologies to "software defined networks". Such networks are routed by programmable switches that can implement policies such as "if packet is secured by SSL then forward it to A, otherwise forward it to B". To represent such policies we need a language that is on one hand sufficiently expressive to capture the policies we want to implement, but on the other hand sufficiently restrictive so that we can quickly execute them at network speed and also be able to answer questions such as "can C see the packets moved from A to B?". The NetKAT network programming language uses a variant of regular expressions to achieve precisely that. For this application, it is important that we are able not merely to answer whether an expression e matches a string x, but also to answer semantic questions about regular expressions such as "do expressions e and e′ compute the same


function?" and "does there exist a string x that is matched by the expression e?". The following theorem shows that we can answer the latter question:

Theorem 6.22 — Emptiness of regular languages is computable. There is an algorithm that given a regular expression e, outputs 1 if and only if Φ_e is the constant zero function.

Proof Idea:

The idea is that we can directly observe this from the structure of the expression. The only way a regular expression e computes the constant zero function is if e has the form ∅ or is obtained by concatenating ∅ with other expressions.
⋆

Proof of Theorem 6.22. Define a regular expression to be "empty" if it computes the constant zero function. Given a regular expression e, we can determine if e is empty using the following rules:

• If e has the form σ or "" then it is not empty.

• If e is not empty then e|e′ is not empty for every e′.

• For every expression e′, the expression (e′)* is not empty, since it always matches the empty string "".

• If e and e′ are both not empty then e e′ is not empty.

• ∅ is empty.

Using these rules it is straightforward to come up with a recursive algorithm to determine emptiness.
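The rules translate into a short recursive procedure. The sketch below uses a hypothetical tuple encoding of expressions ('empty' for ∅, 'eps' for "", 'sym' for a single symbol, and 'or', 'cat', 'star' for the three operations):

```python
def is_empty(e):
    """Return True iff e computes the constant zero function."""
    tag = e[0]
    if tag == 'empty':                     # ∅ is empty
        return True
    if tag in ('eps', 'sym'):              # "" and single symbols are not
        return False
    if tag == 'or':                        # e'|e'' is empty iff both are
        return is_empty(e[1]) and is_empty(e[2])
    if tag == 'cat':                       # e'e'' is empty iff either is
        return is_empty(e[1]) or is_empty(e[2])
    if tag == 'star':                      # (e')* always matches ""
        return False
```

The procedure makes one pass over the expression tree, so it runs in time linear in the length of e.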

Using Theorem 6.22, we can obtain an algorithm that determines whether or not two regular expressions e and e′ are equivalent, in the sense that they compute the same function.

Theorem 6.23 — Equivalence of regular expressions is computable. Let REGEQ : {0,1}* → {0,1} be the function that on input (a string representing) a pair of regular expressions e, e′, REGEQ(e, e′) = 1 if and only if Φ_e = Φ_{e′}. Then there is an algorithm that computes REGEQ.

Proof Idea:

The idea is to show that given a pair of regular expressions e and e′, we can find an expression e″ such that Φ_{e″}(x) = 1 if and only if Φ_e(x) ≠ Φ_{e′}(x). Therefore Φ_{e″} is the constant zero function if and only


if ๐‘’ and ๐‘’โ€ฒ are equivalent, and thus we can test for emptiness of ๐‘’โ€ณ todetermine equivalence of ๐‘’ and ๐‘’โ€ฒ.โ‹†Proof of Theorem 6.23. We will prove Theorem 6.23 from Theorem 6.22.(The two theorems are in fact equivalent: it is easy to prove Theo-rem 6.22 from Theorem 6.23, since checking for emptiness is the sameas checking equivalence with the expression โˆ….) Given two regu-lar expressions ๐‘’ and ๐‘’โ€ฒ, we will compute an expression ๐‘’โ€ณ such thatฮฆ๐‘’โ€ณ(๐‘ฅ) = 1 if and only if ฮฆ๐‘’(๐‘ฅ) โ‰  ฮฆ๐‘’โ€ฒ(๐‘ฅ). One can see that ๐‘’ is equiva-lent to ๐‘’โ€ฒ if and only if ๐‘’โ€ณ is empty.

We start with the observation that for every two bits a, b ∈ {0,1}, a ≠ b if and only if

(a ∧ ¬b) ∨ (¬a ∧ b) .   (6.15)

Hence we need to construct e″ such that for every x,

Φ_{e″}(x) = (Φ_e(x) ∧ ¬Φ_{e′}(x)) ∨ (¬Φ_e(x) ∧ Φ_{e′}(x)) .   (6.16)

To construct the expression e″, we will show how given any pair of expressions e and e′, we can construct expressions e ∧ e′ and ¬e that compute the functions Φ_e ∧ Φ_{e′} and ¬Φ_e respectively. (Computing the expression for e ∨ e′ is straightforward using the | operation of regular expressions.)

Specifically, by Lemma 6.17, regular functions are closed under negation, which means that for every regular expression e, there is an expression ¬e such that Φ_{¬e}(x) = 1 − Φ_e(x) for every x ∈ {0,1}*. Now, for every two expressions e and e′, the expression

e ∧ e′ = ¬(¬e | ¬e′)   (6.17)

computes the AND of the two expressions. Given these two transformations, we see that for every pair of regular expressions e and e′ we can find a regular expression e″ satisfying (6.16), such that e″ is empty if and only if e and e′ are equivalent.
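Equivalence can also be decided on the automaton side: convert both expressions to DFAs (Theorem 6.14) and search the product automaton for a reachable pair of states on which the two machines disagree. A sketch (transition tables are a hypothetical encoding; state 0 is each machine's start state):

```python
from collections import deque

def dfas_equivalent(T1, A1, T2, A2):
    """Breadth-first search over pairs of states reachable on the same
    input; the DFAs differ iff some reachable pair (u, v) has exactly
    one of u ∈ A1, v ∈ A2."""
    start = (0, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        u, v = queue.popleft()
        if (u in A1) != (v in A2):
            return False           # a distinguishing string exists
        for s in (0, 1):
            nxt = (T1[u][s], T2[v][s])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

Since there are at most C_1 · C_2 pairs, the search always terminates, giving another way to see that REGEQ is computable.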

Chapter Recap

• We model computational tasks on arbitrarily large inputs using infinite functions F : {0,1}* → {0,1}*.

• Such functions take an arbitrarily long (but still finite!) string as input, and cannot be described by a finite table of inputs and outputs.

• A function with a single bit of output is known as a Boolean function, and the task of computing it is equivalent to deciding a language L ⊆ {0,1}*.


• Deterministic finite automata (DFAs) are one simple model for computing (infinite) Boolean functions.

• There are some functions that cannot be computed by DFAs.

• The set of functions computable by DFAs is the same as the set of functions computable by regular expressions.

6.7 EXERCISES

Exercise 6.1 — Closure properties of regular functions. Suppose that F, G : {0,1}* → {0,1} are regular. For each one of the following definitions of the function H, either prove that H is always regular or give a counterexample for regular F, G that would make H not regular.

1. ๐ป(๐‘ฅ) = ๐น(๐‘ฅ) โˆจ ๐บ(๐‘ฅ).2. ๐ป(๐‘ฅ) = ๐น(๐‘ฅ) โˆง ๐บ(๐‘ฅ)3. ๐ป(๐‘ฅ) = NAND(๐น(๐‘ฅ), ๐บ(๐‘ฅ)).4. ๐ป(๐‘ฅ) = ๐น(๐‘ฅ๐‘…) where ๐‘ฅ๐‘… is the reverse of ๐‘ฅ: ๐‘ฅ๐‘… = ๐‘ฅ๐‘›โˆ’1๐‘ฅ๐‘›โˆ’2 โ‹ฏ ๐‘ฅ๐‘œ for๐‘› = |๐‘ฅ|.5. ๐ป(๐‘ฅ) = โŽงโŽจโŽฉ1 ๐‘ฅ = ๐‘ข๐‘ฃ s.t. ๐น(๐‘ข) = ๐บ(๐‘ฃ) = 10 otherwise

6. ๐ป(๐‘ฅ) = โŽงโŽจโŽฉ1 ๐‘ฅ = ๐‘ข๐‘ข s.t. ๐น(๐‘ข) = ๐บ(๐‘ข) = 10 otherwise

7. ๐ป(๐‘ฅ) = โŽงโŽจโŽฉ1 ๐‘ฅ = ๐‘ข๐‘ข๐‘… s.t. ๐น(๐‘ข) = ๐บ(๐‘ข) = 10 otherwise

Exercise 6.2 One among the following two functions that map {0,1}* to {0,1} can be computed by a regular expression, and the other one cannot. For the one that can be computed by a regular expression, write the expression that does it. For the one that cannot, prove that this cannot be done using the pumping lemma.

โ€ข ๐น(๐‘ฅ) = 1 if 4 divides โˆ‘|๐‘ฅ|โˆ’1๐‘–=0 ๐‘ฅ๐‘– and ๐น(๐‘ฅ) = 0 otherwise.

โ€ข ๐บ(๐‘ฅ) = 1 if and only if โˆ‘|๐‘ฅ|โˆ’1๐‘–=0 ๐‘ฅ๐‘– โ‰ฅ |๐‘ฅ|/4 and ๐บ(๐‘ฅ) = 0 otherwise.

Exercise 6.3 — Non-regularity. 1. Prove that the following function F : {0,1}* → {0,1} is not regular. For every x ∈ {0,1}*, F(x) = 1 iff x is of the form x = 1^{3^i} for some i > 0.


2. Prove that the following function F : {0,1}* → {0,1} is not regular. For every x ∈ {0,1}*, F(x) = 1 iff ∑_j x_j = 3^i for some i > 0.

6.8 BIBLIOGRAPHICAL NOTES

The relation of regular expressions with finite automata is a beautiful topic, which we only touch upon in this text. It is covered more extensively in [Sip97; HMU14; Koz97]. These texts also discuss topics such as nondeterministic finite automata (NFAs) and the relation between context-free grammars and pushdown automata.

The automaton of Fig. 6.4 was generated using the FSM simulator of Ivan Zuzak and Vedrana Jankovic. Our proof of Theorem 6.11 is closely related to the Myhill–Nerode Theorem. One direction of the Myhill–Nerode theorem can be stated as saying that if e is a regular expression then there is at most a finite number of strings z_0, …, z_{k−1} such that Φ_{e[z_i]} ≠ Φ_{e[z_j]} for every 0 ≤ i ≠ j < k.


Figure 7.1: An algorithm is a finite recipe to compute on arbitrarily long inputs. The components of an algorithm include the instructions to be performed, finite state or "local variables", the memory to store the input and intermediate computations, as well as mechanisms to decide which part of the memory to access, and when to repeat instructions and when to halt.

7
Loops and infinity

"The bounds of arithmetic were however outstepped the moment the idea of applying the [punched] cards had occurred; and the Analytical Engine does not occupy common ground with mere 'calculating machines.' … In enabling mechanism to combine together general symbols, in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science.", Ada Augusta, countess of Lovelace, 1843

As the quote of Chapter 6 says, an algorithm is “a finite answer to an infinite number of questions”. To express an algorithm we need to write down a finite set of instructions that will enable us to compute on arbitrarily long inputs. To describe and execute an algorithm we need the following components (see Fig. 7.1):

โ€ข The finite set of instructions to be performed.

โ€ข Some โ€œlocal variablesโ€ or finite state used in the execution.

• A potentially unbounded working memory to store the input as well as any other values we may require later.

• While the memory is unbounded, at every single step we can only read and write to a finite part of it, and we need a way to specify which parts we want to read from and write to.

• If we only have a finite set of instructions but our input can be arbitrarily long, we will need to repeat instructions (i.e., loop back). We need a mechanism to decide when we will loop and when we will halt.

This chapter: A non-mathy overview
In this chapter we give a general model of an algorithm, which (unlike Boolean circuits) is not restricted to a fixed input length, and (unlike finite automata) is not restricted to

Compiled on 8.26.2020 18:10

Learning Objectives:
• Learn the model of Turing machines, which can compute functions of arbitrary input lengths.
• See a programming-language description of Turing machines, using NAND-TM programs, which add loops and arrays to NAND-CIRC.
• See some basic syntactic sugar and equivalence of variants of Turing machines and NAND-TM programs.


a finite amount of working memory. We will see two ways to model algorithms:

• Turing machines, invented by Alan Turing in 1936, are hypothetical abstract devices that yield a finite description of an algorithm that can handle arbitrarily long inputs.

• The NAND-TM programming language extends NAND-CIRC with the notion of loops and arrays to obtain finite programs that can compute a function with arbitrarily long inputs.

It turns out that these two models are equivalent, and in fact they are equivalent to many other computational models, including programming languages such as C, Lisp, Python, JavaScript, etc. This notion, known as Turing equivalence or Turing completeness, will be discussed in Chapter 8. See Fig. 7.2 for an overview of the models presented in this chapter and Chapter 8.

Figure 7.2: Overview of our models for finite and unbounded computation. In the previous chapters we studied the computation of finite functions, which are functions f ∶ {0, 1}^n → {0, 1}^m for some fixed n, m, and modeled computing these functions using circuits or straight-line programs. In this chapter we study computing unbounded functions of the form F ∶ {0, 1}∗ → {0, 1} or F ∶ {0, 1}∗ → {0, 1}∗. We model computing these functions using Turing Machines or (equivalently) NAND-TM programs, which add the notion of loops to the NAND-CIRC programming language. In Chapter 8 we will show that these models are equivalent to many other models, including RAM machines, the λ calculus, and all the common programming languages including C, Python, Java, JavaScript, etc.

7.1 TURING MACHINES

“Computing is normally done by writing certain symbols on paper. We may suppose that this paper is divided into squares like a child’s arithmetic book. The behavior of the [human] computer at any moment is determined by the symbols which he is observing, and of his ‘state of mind’ at that moment… We may suppose that in a simple operation not more than one symbol is altered.”, “We compare a man in the process of computing … to a machine which is only capable of a finite number of configurations… The machine is supplied with a ‘tape’ (the analogue of paper) … divided into sections (called ‘squares’) each capable of bearing a ‘symbol’ ”, Alan Turing, 1936


loops and infinity 253

Figure 7.3: Aside from his many other achievements, Alan Turing was an excellent long distance runner who just fell shy of making England’s Olympic team. A fellow runner once asked him why he punished himself so much in training. Alan said “I have such a stressful job that the only way I can get it out of my mind is by running hard; it’s the only way I can get some release.”

Figure 7.4: Until the advent of electronic computers, the word “computer” was used to describe a person that performed calculations. Most of these “human computers” were women, and they were absolutely essential to many achievements including mapping the stars, breaking the Enigma cipher, and the NASA space mission; see also the bibliographical notes. Photo from National Photo Company Collection; see also [Sob17].

Figure 7.5: Steam-powered Turing Machine mural, painted by CSE grad students at the University of Washington on the night before spring qualifying examinations, 1987. Image from https://www.cs.washington.edu/building/art/SPTM.

“What is the difference between a Turing machine and the modern computer? It’s the same as that between Hillary’s ascent of Everest and the establishment of a Hilton hotel on its peak.”, Alan Perlis, 1982.

The “granddaddy” of all models of computation is the Turing Machine. Turing machines were defined in 1936 by Alan Turing in an attempt to formally capture all the functions that can be computed by human “computers” (see Fig. 7.4) that follow a well-defined set of rules, such as the standard algorithms for addition or multiplication.

Turing thought of such a person as having access to as much “scratch paper” as they need. For simplicity we can think of this scratch paper as a one dimensional piece of graph paper (or tape, as it is commonly referred to), which is divided into “cells”, where each “cell” can hold a single symbol (e.g., one digit or letter, and more generally some element of a finite alphabet). At any point in time, the person can read from and write to a single cell of the paper, and based on the contents can update his/her finite mental state, and/or move to the cell immediately to the left or right of the current one.

Turing modeled such a computation by a “machine” that maintains one of k states. At each point in time the machine reads from its “work tape” a single symbol from a finite alphabet Σ and uses that to update its state, write to tape, and possibly move to an adjacent cell (see Fig. 7.7). To compute a function F using this machine, we initialize the tape with the input x ∈ {0, 1}∗ and our goal is to ensure that the tape will contain the value F(x) at the end of the computation. Specifically, a computation of a Turing Machine M with k states and alphabet Σ on input x ∈ {0, 1}∗ proceeds as follows:

• Initially the machine is at state 0 (known as the “starting state”) and the tape is initialized to ▷, x_0, …, x_{n−1}, ∅, ∅, …. We use the symbol ▷ to denote the beginning of the tape, and the symbol ∅ to denote an empty cell. We will always assume that the alphabet Σ is a (potentially strict) superset of {▷, ∅, 0, 1}.

โ€ข The location ๐‘– to which the machine points to is set to 0.โ€ข At each step, the machine reads the symbol ๐œŽ = ๐‘‡ [๐‘–] that is in the๐‘–๐‘กโ„Ž location of the tape, and based on this symbol and its state ๐‘ 

decides on:

– What symbol σ′ to write on the tape.
– Whether to move Left (i.e., i ← i − 1), Right (i.e., i ← i + 1), Stay in place, or Halt the computation.
– What is going to be the new state s ∈ [k].

• The set of rules the Turing machine follows is known as its transition function.


Figure 7.6: The components of a Turing Machine. Note how they correspond to the general components of algorithms as described in Fig. 7.1.

• When the machine halts, its output is the binary string obtained by reading the tape from the beginning until the head position, dropping all symbols such as ▷, ∅, etc. that are not either 0 or 1.
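This output convention can be sketched in a few lines of Python (a sketch of ours, with the strings ">" and "" standing in for ▷ and ∅):

```python
def tm_output(tape, head):
    # Output convention: scan tape[0..head] and keep only the 0/1 symbols,
    # dropping markers such as ">" (start-of-tape) and "" (blank).
    return "".join(s for s in tape[:head + 1] if s in ("0", "1"))
```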

7.1.1 Extended example: A Turing machine for palindromes

Let PAL (for palindromes) be the function that on input x ∈ {0, 1}∗, outputs 1 if and only if x is an (even length) palindrome, in the sense that x = w_0 ⋯ w_{n−1} w_{n−1} w_{n−2} ⋯ w_0 for some n ∈ ℕ and w ∈ {0, 1}^n.
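Before building the machine, it helps to pin down the function itself. Here is PAL in ordinary Python (a reference sketch of ours, not part of the machine's description):

```python
def PAL(x):
    # x is a string over {0,1}; output 1 iff x is an even-length palindrome,
    # i.e. x = w followed by the reverse of w for some string w.
    return 1 if len(x) % 2 == 0 and x == x[::-1] else 0
```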

We now show a Turing Machine M that computes PAL. To specify M we need to specify (i) M’s tape alphabet Σ, which should contain at least the symbols 0, 1, ▷ and ∅, and (ii) M’s transition function, which determines what action M takes when it reads a given symbol while it is in a particular state.

In our case, ๐‘€ will use the alphabet 0, 1, โ–ท,โˆ…, ร— and will have๐‘˜ = 14 states. Though the states are simply numbers between 0 and๐‘˜ โˆ’ 1, for convenience we will give them the following labels:

State  Label
0      START
1      RIGHT_0
2      RIGHT_1
3      LOOK_FOR_0
4      LOOK_FOR_1
5      RETURN
6      REJECT
7      ACCEPT
8      OUTPUT_0
9      OUTPUT_1
10     0_AND_BLANK
11     1_AND_BLANK
12     BLANK_AND_STOP

We describe the operation of our Turing Machine M in words:

โ€ข ๐‘€ starts in state START and will go right, looking for the first sym-bol that is 0 or 1. If we find โˆ… before we hit such a symbol then wewill move to the OUTPUT_1 state that we describe below.

โ€ข Once ๐‘€ finds such a symbol ๐‘ โˆˆ 0, 1, ๐‘€ deletes ๐‘ from the tapeby writing the ร— symbol, it enters either the RIGHT_0 or RIGHT_1mode according to the value of ๐‘ and starts moving rightwardsuntil it hits the first โˆ… or ร— symbol.


• Once we find this symbol we go into the state LOOK_FOR_0 or LOOK_FOR_1 depending on whether we were in the state RIGHT_0 or RIGHT_1, and make one left move.

• In the state LOOK_FOR_b, we check whether the value on the tape is b. If it is, then we delete it by changing its value to ×, and move to the state RETURN. Otherwise, we change to the OUTPUT_0 state.

• The RETURN state means we go back to the beginning. Specifically, we move leftward until we hit the first symbol that is not 0 or 1, in which case we change our state to START.

• The OUTPUT_b states mean that we are going to output the value b. In both these states we go left until we hit ▷. Once we do so, we make a right step, and change to the 1_AND_BLANK or 0_AND_BLANK states respectively. In the latter states, we write the corresponding value, and then move right and change to the BLANK_AND_STOP state, in which we write ∅ to the tape and halt.

The above description can be turned into a table describing, for each one of the 13 ⋅ 5 combinations of state and symbol, what the Turing machine will do when it is in that state and it reads that symbol. This table is known as the transition function of the Turing machine.
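To illustrate what such a table looks like, here are a few entries written as a Python dictionary mapping (state, symbol) to (new state, symbol to write, direction). The specific entries are our own plausible reading of the prose above, not the book's official table:

```python
# Symbolic names for a few states, matching the labels in the table above.
START, RIGHT_0, RIGHT_1 = 0, 1, 2

# A few illustrative transition-table entries (our own guess at a plausible
# encoding of the prose description, not the official table).
delta = {
    (START, "0"): (RIGHT_0, "x", "R"),    # delete a 0, remember it, go right
    (START, "1"): (RIGHT_1, "x", "R"),    # delete a 1, remember it, go right
    (RIGHT_0, "1"): (RIGHT_0, "1", "R"),  # keep scanning rightwards
}
```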

7.1.2 Turing machines: a formal definition

Figure 7.7: A Turing machine has access to a tape of unbounded length. At each point in the execution, the machine can read a single symbol of the tape, and based on that and its current state, write a new symbol, update the tape, and decide whether to move left, right, stay, or halt.

The formal definition of Turing machines is as follows:

Definition 7.1 — Turing Machine. A (one tape) Turing machine with k states and alphabet Σ ⊇ {0, 1, ▷, ∅} is represented by a transition function δ_M ∶ [k] × Σ → [k] × Σ × {L, R, S, H}.


For every ๐‘ฅ โˆˆ 0, 1โˆ—, the output of ๐‘€ on input ๐‘ฅ, denoted by๐‘€(๐‘ฅ), is the result of the following process:

โ€ข We initialize ๐‘‡ to be the sequence โ–ท, ๐‘ฅ0, ๐‘ฅ1, โ€ฆ , ๐‘ฅ๐‘›โˆ’1,โˆ…,โˆ…, โ€ฆ,where ๐‘› = |๐‘ฅ|. (That is, ๐‘‡ [0] = โ–ท, ๐‘‡ [๐‘– + 1] = ๐‘ฅ๐‘– for ๐‘– โˆˆ [๐‘›], and๐‘‡ [๐‘–] = โˆ… for ๐‘– > ๐‘›.)

• We also initialize i = 0 and s = 0.

• We then repeat the following process:

1. Let (s′, σ′, D) = δ_M(s, T[i]).
2. Set s → s′, T[i] → σ′.
3. If D = R then set i → i + 1; if D = L then set i → max{i − 1, 0}. (If D = S then we keep i the same.)
4. If D = H then halt.

• If the process above halts, then M’s output, denoted by M(x), is the string y ∈ {0, 1}∗ obtained by concatenating all the symbols in {0, 1} in positions T[0], …, T[i] where i is the final head position.

• If the Turing machine does not halt then we denote M(x) = ⊥.
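The process in Definition 7.1 translates almost line by line into Python. The sketch below is our own rendering, not code from the book: ">" and "" stand in for ▷ and ∅, and a step cap is added so the sketch always returns (returning None in place of ⊥).

```python
def run_TM(delta, x, max_steps=10_000):
    # delta maps (state, symbol) to (new state, new symbol, direction),
    # where direction is one of "L", "R", "S", "H".
    tape = [">"] + list(x)     # T[0] = start marker, then the input bits
    i, s = 0, 0                # head position and state, both start at 0
    for _ in range(max_steps):
        if i >= len(tape):
            tape.append("")    # grow the tape with blanks on demand
        s, sigma, D = delta[(s, tape[i])]
        tape[i] = sigma        # write before moving, as in the definition
        if D == "R":
            i += 1
        elif D == "L":
            i = max(i - 1, 0)
        elif D == "H":
            # output: concatenate the 0/1 symbols in positions 0..i
            return "".join(c for c in tape[:i + 1] if c in ("0", "1"))
    return None                # did not halt within max_steps (stands in for ⊥)
```

For instance, the one-state machine with δ(0, ▷) = (0, ▷, R) and δ(0, b) = (0, b, H) for b ∈ {0, 1} outputs the first bit of its input.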

You should make sure you see why this formal definition corresponds to our informal description of a Turing Machine. To get more intuition on Turing Machines, you can explore some of the simulators available online, such as Martin Ugarte’s, Anthony Morphett’s, or Paul Rendell’s.

One should not confuse the transition function δ_M of a Turing machine M with the function that the machine computes. The transition function δ_M is a finite function, with k|Σ| inputs and 4k|Σ| outputs. (Can you see why?) The machine can compute an infinite function F that takes as input a string x ∈ {0, 1}∗ of arbitrary length and might also produce an arbitrary length string as output.
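To make the counting concrete, using the palindrome machine's parameters as an example:

```python
k, sigma_size = 13, 5           # 13 states, 5-symbol alphabet {0, 1, >, blank, x}
inputs = k * sigma_size         # number of possible (state, symbol) pairs
outputs = 4 * k * sigma_size    # number of possible (state, symbol, direction) triples
```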

In our formal definition, we identified the machine M with its transition function δ_M since the transition function tells us everything we need to know about the Turing machine, and hence serves as a good mathematical representation of it. This choice of representation is somewhat arbitrary, and is based on our convention that the state space is always the numbers {0, …, k − 1} with 0 as the starting state. Other texts use different conventions and so their mathematical


definition of a Turing machine might look superficially different, but these definitions describe the same computational process and have the same computational power. See Section 7.7 for a comparison between Definition 7.1 and the way Turing Machines are defined in texts such as Sipser [Sip97]. These definitions are equivalent despite their superficial differences.

7.1.3 Computable functions

We now turn to making one of the most important definitions in this book, that of computable functions.

Definition 7.2 — Computable functions. Let F ∶ {0, 1}∗ → {0, 1}∗ be a (total) function and let M be a Turing machine. We say that M computes F if for every x ∈ {0, 1}∗, M(x) = F(x).

We say that a function F is computable if there exists a Turing machine M that computes it.

Defining a function to be “computable” if and only if it can be computed by a Turing machine might seem “reckless” but, as we’ll see in Chapter 8, it turns out that being computable in the sense of Definition 7.2 is equivalent to being computable in essentially any reasonable model of computation. This is known as the Church-Turing Thesis. (Unlike the extended Church-Turing Thesis which we discussed in Section 5.6, the Church-Turing Thesis itself is widely believed and there are no candidate devices that attack it.)

Big Idea 9  We can precisely define what it means for a function to be computable by any possible algorithm.

This is a good point to remind the reader that functions are not the same as programs:

Functions ≠ Programs.   (7.1)

A Turing machine (or program) M can compute some function F, but it is not the same as F. In particular there can be more than one program to compute the same function. Being computable is a property of functions, not of machines.

We will often pay special attention to functions F ∶ {0, 1}∗ → {0, 1} that have a single bit of output. Hence we give a special name to the set of functions of this form that are computable.

Definition 7.3 — The class R. We define R to be the set of all computable functions F ∶ {0, 1}∗ → {0, 1}.


1 A partial function F from a set A to a set B is a function that is only defined on a subset of A (see Section 1.4.3). We can also think of such a function as mapping A to B ∪ {⊥} where ⊥ is a special “failure” symbol such that F(a) = ⊥ indicates the function F is not defined on a.

Remark 7.4 — Functions vs. languages. As discussed in Section 6.1.2, many texts use the terminology of “languages” rather than functions to refer to computational tasks. A Turing machine M decides a language L if for every input x ∈ {0, 1}∗, M(x) outputs 1 if and only if x ∈ L. This is equivalent to computing the Boolean function F ∶ {0, 1}∗ → {0, 1} defined as F(x) = 1 iff x ∈ L. A language L is decidable if there is a Turing machine M that decides it. For historical reasons, some texts also call such a language recursive (which is the reason that the letter R is often used to denote the set of computable Boolean functions / decidable languages defined in Definition 7.3).
In this book we stick to the terminology of functions rather than languages, but all definitions and results can be easily translated back and forth by using the equivalence between the function F ∶ {0, 1}∗ → {0, 1} and the language L = {x ∈ {0, 1}∗ | F(x) = 1}.
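The back-and-forth translation in the remark can be sketched as follows (illustrative helper names of our own; the finite "universe" of strings is just for demonstration, since the real objects are infinite):

```python
def language_of(F, universe):
    # From a Boolean function F to the language L = { x : F(x) = 1 },
    # restricted to a finite universe of strings for illustration.
    return {x for x in universe if F(x) == 1}

def indicator_of(L):
    # From a language L back to its indicator (Boolean) function.
    return lambda x: 1 if x in L else 0
```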

7.1.4 Infinite loops and partial functions

One crucial difference between circuits/straight-line programs and Turing machines is the following. Looking at a NAND-CIRC program P, we can always tell how many inputs and how many outputs it has (by simply looking at the X and Y variables). Furthermore, we are guaranteed that if we invoke P on any input then some output will be produced.

In contrast, given any Turing machine M, we cannot determine a priori the length of the output. In fact, we don’t even know if an output would be produced at all! For example, it is very easy to come up with a Turing machine whose transition function never outputs H and hence never halts.

If a machine ๐‘€ fails to stop and produce an output on some aninput ๐‘ฅ, then it cannot compute any total function ๐น , since clearly oninput ๐‘ฅ, ๐‘€ will fail to output ๐น(๐‘ฅ). However, ๐‘€ can still compute apartial function.1

For example, consider the partial function DIV that on input a pair (a, b) of natural numbers, outputs ⌈a/b⌉ if b > 0, and is undefined otherwise. We can define a Turing machine M that computes DIV on input a, b by outputting the first c = 0, 1, 2, … such that bc ≥ a. If a > 0 and b = 0 then the machine M will never halt, but this is OK, since DIV is undefined on such inputs. If a = 0 and b = 0, the machine M will output 0, which is also OK, since we don’t care about what the program outputs on inputs on which DIV is undefined. Formally, we define computability of partial functions as follows:


Definition 7.5 — Computable (partial or total) functions. Let F be either a total or partial function mapping {0, 1}∗ to {0, 1}∗ and let M be a Turing machine. We say that M computes F if for every x ∈ {0, 1}∗ on which F is defined, M(x) = F(x). We say that a (partial or total) function F is computable if there is a Turing machine that computes it.

Note that if ๐น is a total function, then it is defined on every ๐‘ฅ โˆˆ0, 1โˆ— and hence in this case, Definition 7.5 is identical to Defini-tion 7.2.

Remark 7.6 — Bot symbol. We often use ⊥ as our special “failure symbol”. If a Turing machine M fails to halt on some input x ∈ {0, 1}∗ then we denote this by M(x) = ⊥. This does not mean that M outputs some encoding of the symbol ⊥ but rather that M enters into an infinite loop when given x as input.
If a partial function F is undefined on x then we can also write F(x) = ⊥. Therefore one might think that Definition 7.5 can be simplified to requiring that M(x) = F(x) for every x ∈ {0, 1}∗, which would imply that for every x, M halts on x if and only if F is defined on x. However this is not the case: for a Turing Machine M to compute a partial function F it is not necessary for M to enter an infinite loop on inputs x on which F is not defined. All that is needed is for M to output F(x) on x’s on which F is defined; on other inputs it is OK for M to output an arbitrary value such as 0, 1, or anything else, or not to halt at all. To borrow a term from the C programming language, on inputs x on which F is not defined, what M does is “undefined behavior”.

7.2 TURING MACHINES AS PROGRAMMING LANGUAGES

The name “Turing machine”, with its “tape” and “head”, evokes a physical object, while in contrast we think of a program as a piece of text. But we can think of a Turing machine as a program as well. For example, consider the Turing Machine M of Section 7.1.1 that computes the function PAL such that PAL(x) = 1 iff x is a palindrome. We can also describe this machine as a program using the Python-like pseudocode of the form below:

# Gets an array Tape initialized to
# [">", x_0 , x_1 , .... , x_(n-1), "", "", ...]
# At the end of the execution, Tape[1] is equal to 1
# if x is a palindrome and is equal to 0 otherwise


2 Most programming languages use arrays of fixed size, while a Turing machine’s tape is unbounded. But of course there is no need to store an infinite number of ∅ symbols. If you want, you can think of the tape as a list that starts off just long enough to store the input, but is dynamically grown in size as the Turing machine’s head explores new positions.

def PAL(Tape):
    head = 0
    state = 0 # START
    while (state != 12):
        if (state == 0 && Tape[head]=='0'):
            state = 3 # LOOK_FOR_0
            Tape[head] = 'x'
            head += 1 # move right
        if (state==0 && Tape[head]=='1'):
            state = 4 # LOOK_FOR_1
            Tape[head] = 'x'
            head += 1 # move right

        ... # more if statements here

The particular details of this program are not important. What matters is that we can describe Turing machines as programs. Moreover, note that when translating a Turing machine into a program, the tape becomes a list or array that can hold values from the finite set Σ.2 The head position can be thought of as an integer-valued variable that can hold integers of unbounded size. The state is a local register that can hold one of a fixed number of values in [k].

More generally we can think of every Turing Machine M as equivalent to a program similar to the following:

# Gets an array Tape initialized to
# [">", x_0 , x_1 , .... , x_(n-1), "", "", ...]
def M(Tape):
    state = 0
    i = 0 # holds head location
    while (True):
        # Move head, modify state, write to tape
        # based on current state and cell at head
        # below are just examples for how program looks
        # for a particular transition function
        if Tape[i]=="0" and state==7: # δ_M(7,"0")=(19,"1","R")
            Tape[i]="1"
            i += 1
            state = 19
        elif Tape[i]==">" and state == 13: # δ_M(13,">")=(15,"0","S")
            Tape[i]="0"
            state = 15
        elif ...
        ...


        elif Tape[i]==">" and state == 29: # δ_M(29,">")=(.,.,"H")
            break # Halt

If we wanted to use only Boolean (i.e., 0/1-valued) variables then we can encode the state variable using ⌈log k⌉ bits. Similarly, we can represent each element of the alphabet Σ using ℓ = ⌈log |Σ|⌉ bits and hence we can replace the Σ-valued array Tape[] with ℓ Boolean-valued arrays Tape0[], …, Tape(ℓ − 1)[].

7.2.1 The NAND-TM Programming language

We now introduce the NAND-TM programming language, which aims to capture the power of a Turing machine in a programming language formalism. Just like the difference between Boolean circuits and Turing Machines, the main difference between NAND-TM and NAND-CIRC is that NAND-TM models a single uniform algorithm that can compute a function that takes inputs of arbitrary lengths. To do so, we extend the NAND-CIRC programming language with two constructs:

• Loops: NAND-CIRC is a straight-line programming language - a NAND-CIRC program of s lines takes exactly s steps of computation and hence in particular cannot even touch more than 3s variables. Loops allow us to capture in a short program the instructions for a computation that can take an arbitrary amount of time.

• Arrays: A NAND-CIRC program of s lines touches at most 3s variables. While we can use variables with names such as Foo_17 or Bar[22], they are not true arrays, since the number in the identifier is a constant that is “hardwired” into the program.

Figure 7.8: A NAND-TM program has scalar variables that can take a Boolean value, array variables that hold a sequence of Boolean values, and a special index variable i that can be used to index the array variables. We refer to the i-th value of the array variable Spam using Spam[i]. At each iteration of the program the index variable can be incremented or decremented by one step using the MODANDJMP operation.

Thus a good way to remember NAND-TM is using the following informal equation:

NAND-TM = NAND-CIRC + loops + arrays (7.2)


Remark 7.7 — NAND-CIRC + loops + arrays = everything. As we will see, adding loops and arrays to NAND-CIRC is enough to capture the full power of all programming languages! Hence we could replace “NAND-TM” with any of Python, C, JavaScript, OCaml, etc. in the lefthand side of (7.2). But we’re getting ahead of ourselves: this issue will be discussed in Chapter 8.

Concretely, the NAND-TM programming language adds the following features on top of NAND-CIRC (see Fig. 7.8):

• We add a special integer-valued variable i. All other variables in NAND-TM are Boolean-valued (as in NAND-CIRC).

• Apart from i, NAND-TM has two kinds of variables: scalars and arrays. Scalar variables hold one bit (just as in NAND-CIRC). Array variables hold an unbounded number of bits. At any point in the computation we can access the array variables at the location indexed by i using Foo[i]. We cannot access the arrays at locations other than the one pointed to by i.

• We use the convention that arrays always start with a capital letter, and scalar variables (which are never indexed with i) start with lowercase letters. Hence Foo is an array and bar is a scalar variable.

• The input and output X and Y are now considered arrays with values of zeroes and ones. (There are also two other special arrays X_nonblank and Y_nonblank; see below.)

• We add a special MODANDJMP instruction that takes two Boolean variables a, b as input and does the following:

– If a = 1 and b = 1 then MODANDJMP(a, b) increments i by one and jumps to the first line of the program.

– If a = 0 and b = 1 then MODANDJMP(a, b) decrements i by one and jumps to the first line of the program. (If i is already equal to 0 then it stays at 0.)

– If a = 1 and b = 0 then MODANDJMP(a, b) jumps to the first line of the program without modifying i.

– If a = b = 0 then MODANDJMP(a, b) halts execution of the program.

• The MODANDJMP instruction always appears in the last line of a NAND-TM program and nowhere else.
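The four cases can be summarized in a small helper function (a hypothetical sketch; the name modandjmp and the returned "loop"/"halt" tags are our own convention, not part of NAND-TM):

```python
def modandjmp(a, b, i):
    # Returns the new value of the index i, and whether to jump back
    # to the first line ("loop") or to stop ("halt").
    if a == 1 and b == 1:
        return i + 1, "loop"          # increment i, jump to first line
    if a == 0 and b == 1:
        return max(i - 1, 0), "loop"  # decrement i (floored at 0), jump
    if a == 1 and b == 0:
        return i, "loop"              # jump without modifying i
    return i, "halt"                  # a = b = 0: halt the program
```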


Default values. We need one more convention to handle “default values”. Turing machines have the special symbol ∅ to indicate that a tape location is “blank” or “uninitialized”. In NAND-TM there is no such symbol, and all variables are Boolean, containing either 0 or 1. All variables and locations of arrays default to 0 if they have not been initialized to another value. To keep track of whether a 0 in an array corresponds to a true zero or to an uninitialized cell, a programmer can always add to an array Foo a “companion array” Foo_nonblank and set Foo_nonblank[i] to 1 whenever the i’th location is initialized. In particular we will use this convention for the input and output arrays X and Y. A NAND-TM program has four special arrays X, X_nonblank, Y, and Y_nonblank. When a NAND-TM program is executed on input x ∈ {0, 1}∗ of length n, the first n cells of the array X are initialized to x_0, …, x_{n−1} and the first n cells of the array X_nonblank are initialized to 1. (All uninitialized cells default to 0.) The output of a NAND-TM program is the string Y[0], …, Y[m − 1] where m is the smallest integer such that Y_nonblank[m] = 0. A NAND-TM program gets called with X and X_nonblank initialized to contain the input, and writes to Y and Y_nonblank to produce the output.
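The input and output conventions just described can be sketched as follows (hypothetical helper names of ours):

```python
def load_input(x):
    # Initialize X and X_nonblank from a string x over {0,1};
    # cells beyond the input keep their default value 0.
    X = [int(c) for c in x]
    X_nonblank = [1] * len(x)
    return X, X_nonblank

def decode_output(Y, Y_nonblank):
    # The output is Y[0..m-1] where m is the smallest index
    # with Y_nonblank[m] == 0.
    m = 0
    while m < len(Y_nonblank) and Y_nonblank[m] == 1:
        m += 1
    return Y[:m]
```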

Formally, NAND-TM programs are defined as follows:

Definition 7.8 — NAND-TM programs. A NAND-TM program consists of a sequence of lines of the form foo = NAND(bar,blah), ending with a line of the form MODANDJMP(foo,bar), where foo, bar, blah are either scalar variables (sequences of letters, digits, and underscores) or array variables of the form Foo[i] (starting with a capital letter and indexed by i). The program has the array variables X, X_nonblank, Y, Y_nonblank and the index variable i built in, and can use additional array and scalar variables.

If ๐‘ƒ is a NAND-TM program and ๐‘ฅ โˆˆ 0, 1โˆ— is an input then anexecution of ๐‘ƒ on ๐‘ฅ is the following process:

1. The arrays X and X_nonblank are initialized by X[i] = x_i and X_nonblank[i] = 1 for all i ∈ [|x|]. All other variables and cells are initialized to 0. The index variable i is also initialized to 0.

2. The program is executed line by line. When the last line MODANDJMP(foo,bar) is executed, we do as follows:

a. If foo = 1 and bar = 0 then jump to the first line without modifying the value of i.

b. If foo = 1 and bar = 1 then increment i by one and jump to the first line.


c. If foo = 0 and bar = 1 then decrement i by one (unless it is already zero) and jump to the first line.

d. If foo = 0 and bar = 0 then halt and output Y[0], …, Y[m − 1] where m is the smallest integer such that Y_nonblank[m] = 0.

7.2.2 Sneak peek: NAND-TM vs Turing machines

As the name implies, NAND-TM programs are a direct implementation of Turing machines in programming language form. We will show the equivalence below, but you can already see how the components of Turing machines and NAND-TM programs correspond to one another:

Table 7.2: Turing Machine and NAND-TM analogs

Turing Machine: State. A single register that takes values in [k].
NAND-TM program: Scalar variables. Several variables such as foo, bar, etc., each taking values in {0, 1}.

Turing Machine: Tape. One tape containing values in a finite set Σ. Potentially infinite, but T[t] defaults to ∅ for all locations t that have not been accessed.
NAND-TM program: Arrays. Several arrays such as Foo, Bar, etc. For each such array Arr and index j, the value of Arr at position j is either 0 or 1. The value defaults to 0 for positions that have not been written to.

Turing Machine: Head location. A number i ∈ ℕ that encodes the position of the head.
NAND-TM program: Index variable. The variable i that can be used to access the arrays.

Turing Machine: Accessing memory. At every step the Turing machine has access to its local state, but can only access the tape at the position of the current head location.
NAND-TM program: Accessing memory. At every step a NAND-TM program has access to all the scalar variables, but can only access the arrays at the location i of the index variable.

Turing Machine: Control of location. In each step the machine can move the head location by at most one position.
NAND-TM program: Control of index variable. In each iteration of its main loop the program can modify the index i by at most one.

7.2.3 Examples

We now present some examples of NAND-TM programs.


Example 7.9 — Increment in NAND-TM. The following is a NAND-TM program to compute the increment function. That is, INC : {0,1}* → {0,1}* such that for every x ∈ {0,1}^n, INC(x) is the (n+1)-bit long string y such that if X = ∑_{i=0}^{n−1} x_i · 2^i is the number represented by x, then y is the (least-significant digit first) binary representation of the number X + 1.

We start by showing the program using the “syntactic sugar” we have seen before: shorthand for NAND-CIRC programs that compute simple functions such as IF, XOR and AND (as well as the constant one function, and the function COPY that just maps a bit to itself).

carry = IF(started,carry,one(started))
started = one(started)
Y[i] = XOR(X[i],carry)
carry = AND(X[i],carry)
Y_nonblank[i] = one(started)
MODANDJUMP(X_nonblank[i],X_nonblank[i])

The above is not, strictly speaking, a valid NAND-TM program. If we “open up” all of the syntactic sugar, we get the following “sugar-free” valid program to compute the same function.

temp_0 = NAND(started,started)
temp_1 = NAND(started,temp_0)
temp_2 = NAND(started,started)
temp_3 = NAND(temp_1,temp_2)
temp_4 = NAND(carry,started)
carry = NAND(temp_3,temp_4)
temp_6 = NAND(started,started)
started = NAND(started,temp_6)
temp_8 = NAND(X[i],carry)
temp_9 = NAND(X[i],temp_8)
temp_10 = NAND(carry,temp_8)
Y[i] = NAND(temp_9,temp_10)
temp_12 = NAND(X[i],carry)
carry = NAND(temp_12,temp_12)
temp_14 = NAND(started,started)
Y_nonblank[i] = NAND(started,temp_14)
MODANDJUMP(X_nonblank[i],X_nonblank[i])
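As a sanity check, the sugared version of this program can be transcribed almost line by line into Python and executed. This simulation is our own (the explicit while loop and the final halting check play the role of MODANDJUMP; it is not part of the NAND-TM toolchain):

```python
def inc(x):
    """Simulate the sugared NAND-TM increment program on a
    least-significant-bit-first list of bits; returns len(x)+1 bits."""
    started, carry = 0, 0
    Y = {}
    i = 0
    while True:
        xi = x[i] if i < len(x) else 0        # X[i] defaults to 0
        carry = carry if started else 1       # carry = IF(started,carry,one(started))
        started = 1                           # started = one(started)
        Y[i] = xi ^ carry                     # Y[i] = XOR(X[i],carry)
        carry = xi & carry                    # carry = AND(X[i],carry)
        # MODANDJUMP(X_nonblank[i], X_nonblank[i]):
        if i >= len(x):                       # X_nonblank[i] == 0: halt
            return [Y[j] for j in range(i + 1)]
        i += 1                                # X_nonblank[i] == 1: increment i
```

For example, inc([1, 1]) simulates INC on the number 3 (bits are least-significant first) and yields the representation of 4.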


Example 7.10 — XOR in NAND-TM. The following is a NAND-TM program to compute the XOR function on inputs of arbitrary length. That is, XOR : {0,1}* → {0,1} such that XOR(x) = ∑_{i=0}^{|x|−1} x_i mod 2 for every x ∈ {0,1}*. Once again, we use a certain “syntactic sugar”. Specifically, we access the arrays X and Y at their zero-th entry, while NAND-TM only allows access to arrays at the coordinate of the variable i.

temp_0 = NAND(X[0],X[0])
Y_nonblank[0] = NAND(X[0],temp_0)
temp_2 = NAND(X[i],Y[0])
temp_3 = NAND(X[i],temp_2)
temp_4 = NAND(Y[0],temp_2)
Y[0] = NAND(temp_3,temp_4)
MODANDJUMP(X_nonblank[i],X_nonblank[i])

To transform the program above to a valid NAND-TM program, we can transform references such as X[0] and Y[0] to scalar variables x_0 and y_0 (similarly we can transform any reference of the form Foo[17] or Bar[15] to scalars such as foo_17 and bar_15). We then need to add code to load the value of X[0] into x_0, and similarly to write the value of y_0 to Y[0], but this is not hard to do. Using the fact that variables are initialized to zero by default, we can create a variable init which will be set to 1 at the end of the first iteration and not changed afterwards. We can then add an array Atzero and code that will set Atzero[i] to 1 if init is 0 and otherwise leave it as it is. This will ensure that Atzero[i] is equal to 1 if and only if i is set to zero, and allow the program to know when we are at the zero-th location. Thus we can add code to read and write to the corresponding scalars x_0, y_0 when we are at the zero-th location, and also code to move i to zero and then halt at the end. Working this out fully is somewhat tedious, but can be a good exercise.
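One way to convince yourself that the program above is correct is to simulate it directly. The following Python sketch is our own: the NAND helper and the loop stand in for the NAND-TM semantics, and the line setting Y_nonblank[0] (which always evaluates to 1 and only marks that the output has length one) is replaced by a comment.

```python
def NAND(a, b):
    return 1 - (a & b)

def xor_program(x):
    """Simulate the sugared NAND-TM XOR program on a list of bits."""
    Y0 = 0        # Y[0], initialized to 0 by default
    i = 0
    while True:
        xi = x[i] if i < len(x) else 0           # X[i] defaults to 0
        # Y_nonblank[0] = NAND(X[0], NAND(X[0], X[0])) is always 1; omitted.
        t2 = NAND(xi, Y0)
        Y0 = NAND(NAND(xi, t2), NAND(Y0, t2))    # Y[0] = XOR(X[i], Y[0])
        # MODANDJUMP(X_nonblank[i], X_nonblank[i]):
        if i >= len(x):                          # X_nonblank[i] == 0: halt
            return Y0
        i += 1                                   # X_nonblank[i] == 1: increment i
```

Note how the three NAND lines in the loop implement XOR of two bits, exactly as in the sugared program.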

Working out the above two examples can go a long way towards understanding the NAND-TM language. See our GitHub repository for a full specification of the NAND-TM language.


7.3 EQUIVALENCE OF TURING MACHINES AND NAND-TM PROGRAMS

Given the above discussion, it might not be surprising that Turing machines turn out to be equivalent to NAND-TM programs. Indeed, we designed the NAND-TM language to have this property. Nevertheless, this is an important result, and the first of many other such equivalence results we will see in this book.

Theorem 7.11 — Turing machines and NAND-TM programs are equivalent. For every F : {0,1}* → {0,1}*, F is computable by a NAND-TM program P if and only if there is a Turing machine M that computes F.

Proof Idea:

To prove such an equivalence theorem, we need to show two directions. We need to be able to (1) transform a Turing machine M into a NAND-TM program P that computes the same function as M, and (2) transform a NAND-TM program P into a Turing machine M that computes the same function as P.

The idea of the proof is illustrated in Fig. 7.9. To show (1), given a Turing machine M, we will create a NAND-TM program P that will have an array Tape for the tape of M and scalar (i.e., non-array) variables state for the state of M. Specifically, since the state of a Turing machine is not in {0,1} but rather in a larger set [k], we will use ⌈log k⌉ variables state_0, …, state_{⌈log k⌉−1} to store the representation of the state. Similarly, to encode the larger alphabet Σ of the tape, we will use ⌈log |Σ|⌉ arrays Tape_0, …, Tape_{⌈log |Σ|⌉−1}, such that the i-th location of these arrays encodes the i-th symbol of the tape. Using the fact that every finite function can be computed by a NAND-CIRC program, we will be able to compute the transition function of M, replacing moving left and right by decrementing and incrementing i respectively.

We show (2) using very similar ideas. Given a program P that uses a array variables and b scalar variables, we will create a Turing machine with about 2^b states to encode the values of the scalar variables, and an alphabet of size about 2^a so we can encode the arrays using our tape. (The reason the sizes are only “about” 2^a and 2^b is that we will need to add some symbols and steps for bookkeeping purposes.) The Turing machine M will simulate each iteration of the program P by updating its state and tape accordingly.

⋆

Proof of Theorem 7.11. We start by proving the “if” direction of Theorem 7.11. Namely, we show that given a Turing machine M, we can


Figure 7.9: Comparing a Turing machine to a NAND-TM program. Both have an unbounded memory component (the tape for a Turing machine, and the arrays for a NAND-TM program), as well as a constant local memory (the state for a Turing machine, and the scalar variables for a NAND-TM program). Both can access only one location of the unbounded memory at each step: this is the “head” location for a Turing machine, and the value of the index variable i for a NAND-TM program.

find a NAND-TM program P_M such that for every input x, if M halts on input x with output y then P_M(x) = y. Since our goal is just to show such a program P_M exists, we don't need to write out the full code of P_M line by line, and can take advantage of our various “syntactic sugar” in describing it.

The key observation is that by Theorem 4.12 we can compute every finite function using a NAND-CIRC program. In particular, consider the transition function δ_M : [k] × Σ → [k] × Σ × {L, R, S, H} of our Turing machine. We can encode its components as follows:

• We encode [k] using {0,1}^ℓ and Σ using {0,1}^{ℓ′}, where ℓ = ⌈log k⌉ and ℓ′ = ⌈log |Σ|⌉.

• We encode the set {L, R, S, H} using {0,1}^2. We choose the encoding L ↦ 01, R ↦ 11, S ↦ 10, H ↦ 00. (This conveniently corresponds to the semantics of the MODANDJUMP operation.)

Hence we can identify δ_M with a function M̄ : {0,1}^{ℓ+ℓ′} → {0,1}^{ℓ+ℓ′+2}, mapping strings of length ℓ + ℓ′ to strings of length ℓ + ℓ′ + 2. By Theorem 4.12 there exists a finite-length NAND-CIRC program ComputeM that computes this function M̄. The NAND-TM program to simulate M will essentially be the following:


Algorithm 7.12 — NAND-TM program to simulate TM M.

Input: x ∈ {0,1}*
Output: M(x) if M halts on x. Otherwise go into an infinite loop.

1: # We use variables state_0 … state_{ℓ−1} to encode M's state
2: # We use arrays Tape_0[] … Tape_{ℓ′−1}[] to encode M's tape
3: # We omit the initial and final "bookkeeping" to copy the input to Tape and copy the output from Tape
4: # Use the fact that the transition function is finite and computable by a NAND-CIRC program:
5: state_0 … state_{ℓ−1}, Tape_0[i] … Tape_{ℓ′−1}[i], dir0, dir1 ← TRANSITION(state_0 … state_{ℓ−1}, Tape_0[i] … Tape_{ℓ′−1}[i], dir0, dir1)
6: MODANDJUMP(dir0, dir1)

Every step of the main loop of the above program perfectly mimics the computation of the Turing machine M, and so the program carries out exactly the definition of computation by a Turing machine as per Definition 7.1.
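To make the encoding step concrete, here is a Python sketch (our own construction; the helper names wrap_transition, bits and val are hypothetical) that turns a transition function δ_M into the bit-string function on {0,1}^{ℓ+ℓ′} from the proof, using the direction encoding L ↦ 01, R ↦ 11, S ↦ 10, H ↦ 00:

```python
import math

DIR_BITS = {"L": (0, 1), "R": (1, 1), "S": (1, 0), "H": (0, 0)}

def bits(n, width):
    """Least-significant-bit-first binary representation of n."""
    return [(n >> j) & 1 for j in range(width)]

def val(bs):
    """Inverse of bits: the number represented by a bit list."""
    return sum(b << j for j, b in enumerate(bs))

def wrap_transition(delta, k, sigma):
    """Turn delta: (state, symbol) -> (state, symbol, direction) into a
    function from l + lp bits to l + lp + 2 bits, as in the proof,
    where l = ceil(log k) and lp = ceil(log |sigma|)."""
    l = max(1, math.ceil(math.log2(k)))
    lp = max(1, math.ceil(math.log2(len(sigma))))
    def m_bar(xs):
        state, sym = val(xs[:l]), sigma[val(xs[l:l + lp])]
        ns, nsym, d = delta(state, sym)
        return bits(ns, l) + bits(sigma.index(nsym), lp) + list(DIR_BITS[d])
    return m_bar
```

Since m_bar is a finite function on bit strings, Theorem 4.12 guarantees a NAND-CIRC program computing it, which is exactly the TRANSITION subroutine used in Algorithm 7.12.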

For the other direction, suppose that P is a NAND-TM program with s lines, ℓ scalar variables, and ℓ′ array variables. We will show that there exists a Turing machine M_P with 2^ℓ + C states and an alphabet Σ of size C′ + 2^{ℓ′} that computes the same function as P (where C, C′ are some constants to be determined later).

Specifically, consider the function P̄ : {0,1}^ℓ × {0,1}^{ℓ′} → {0,1}^ℓ × {0,1}^{ℓ′} that, on input the contents of P's scalar variables and the contents of the array variables at location i at the beginning of an iteration, outputs all the new values of these variables at the last line of the iteration, right before the MODANDJUMP instruction is executed.

If foo and bar are the two variables that are used as input to the MODANDJUMP instruction, then based on the values of these variables we can compute whether i will increase, decrease or stay the same, and whether the program will halt or jump back to the beginning. Hence a Turing machine can simulate an execution of P in one iteration using a finite function applied to its alphabet. The overall operation of the Turing machine will be as follows:

1. The machine M_P encodes the contents of the array variables of P in its tape, and the contents of the scalar variables in (part of) its state. Specifically, if P has ℓ local variables and t arrays, then the state space of M will be large enough to encode all 2^ℓ assignments


to the local variables, and the alphabet Σ of M will be large enough to encode all 2^t assignments for the array variables at each location. The head location corresponds to the index variable i.

2. Recall that every line of the program P corresponds to reading and writing either a scalar variable, or an array variable at the location i. In one iteration of P the value of i remains fixed, and so the machine M can simulate this iteration by reading the values of all array variables at i (which are encoded by the single symbol in the alphabet Σ located at the i-th cell of the tape), reading the values of all scalar variables (which are encoded by the state), and updating both. The transition function of M can output L, S, R depending on whether the values given to the MODANDJUMP operation are 01, 10 or 11 respectively.

3. When the program halts (i.e., MODANDJUMP gets 00), the Turing machine will enter into a special loop to copy the results of the Y array into the output and then halt. We can achieve this by adding a few more states.

The above is not a full formal description of a Turing machine, but our goal is just to show that such a machine exists. One can see that M_P simulates every step of P, and hence computes the same function as P.

Remark 7.13 — Running time equivalence (optional). If we examine the proof of Theorem 7.11 then we can see that every iteration of the loop of a NAND-TM program corresponds to one step in the execution of the Turing machine. We will come back to this question of measuring the number of computation steps later in this course. For now the main takeaway point is that NAND-TM programs and Turing machines are essentially equivalent in power even when taking running time into account.

7.3.1 Specification vs implementation (again)

Once you understand the definitions of both NAND-TM programs and Turing machines, Theorem 7.11 is fairly straightforward. Indeed, NAND-TM programs are not so much a different model from Turing machines as they are simply a reformulation of the same model using programming language notation. You can think of the difference between a Turing machine and a NAND-TM program as the difference between representing a number using decimal or binary notation. In


contrast, the difference between a function F and a Turing machine that computes F is much more profound: it is like the difference between the equation x² + x = 12 and the number 3 that is a solution to this equation. For this reason, while we take special care in distinguishing functions from programs or machines, we will often identify the two latter concepts. We will move freely between describing an algorithm as a Turing machine or as a NAND-TM program (as well as some of the other equivalent computational models we will see in Chapter 8 and beyond).

Table 7.3: Specification vs Implementation formalisms

Setting | Specification | Implementation
Finite computation | Functions mapping {0,1}^n to {0,1}^m | Circuits, straight-line programs
Infinite computation | Functions mapping {0,1}* to {0,1} or to {0,1}* | Algorithms, Turing machines, programs

7.4 NAND-TM SYNTACTIC SUGAR

Just like we did with NAND-CIRC in Chapter 4, we can use “syntactic sugar” to make NAND-TM programs easier to write. For starters, we can use all of the syntactic sugar of NAND-CIRC, and so have access to macro definitions and conditionals (i.e., if/then). But we can go beyond this and achieve for example:

• Inner loops such as the while and for operations common to many programming languages.

• Multiple index variables (e.g., not just i but we can add j, k, etc.).

• Arrays with more than one dimension (e.g., Foo[i][j], Bar[i][j][k], etc.)

In all of these cases (and many others) we can implement the new feature as mere “syntactic sugar” on top of standard NAND-TM, which means that the set of functions computable by NAND-TM with this feature is the same as the set of functions computable by standard NAND-TM. Similarly, we can show that the set of functions computable by Turing machines that have more than one tape, or tapes with more than one dimension, is the same as the set of functions computable by standard Turing machines.


7.4.1 “GOTO” and inner loops

We can implement more advanced looping constructs than the simple MODANDJUMP. For example, we can implement GOTO. A GOTO statement corresponds to jumping to a certain line in the execution. For example, if we have code of the form

"start": do fooGOTO("end")

"skip": do bar"end": do blah

then the program will only do foo and blah, since when it reaches the line GOTO("end") it will jump to the line labeled with "end". We can achieve the effect of GOTO in NAND-TM using conditionals. In the code below, we assume that we have a variable pc that can take strings of some constant length. This can be encoded using a finite number of Boolean variables pc_0, pc_1, …, pc_{k−1}, and so when we write below pc = "label" what we mean is something like pc_0 = 0, pc_1 = 1, … (where the bits 0, 1, … correspond to the encoding of the finite string "label" as a string of length k). We also assume that we have access to conditionals (i.e., if statements), which we can emulate using syntactic sugar in the same way as we did in NAND-CIRC.

To emulate a GOTO statement, we will first modify a program P of the form

do foo
do bar
do blah

to have the following form (using syntactic sugar for if):

pc = "line1"if (pc=="line1"):

do foopc = "line2"

if (pc=="line2"):do barpc = "line3"

if (pc=="line3"):do blah

These two programs do the same thing. The variable pc corresponds to the “program counter” and tells the program which line to execute next. We can see that if we wanted to emulate a GOTO("line3") then we could simply modify the instruction pc = "line2" to be pc = "line3".
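The program-counter transformation can be demonstrated in ordinary Python. In this sketch (our own illustration; the trace list and the string labels stand in for do foo, do bar and do blah), flipping the marked assignment emulates GOTO("line3") and skips bar:

```python
def run_with_pc(goto_line3=False):
    """Emulate straight-line code with a program-counter variable pc."""
    trace = []
    pc = "line1"
    while pc != "done":
        if pc == "line1":
            trace.append("foo")                      # do foo
            pc = "line3" if goto_line3 else "line2"  # GOTO via pc assignment
        elif pc == "line2":
            trace.append("bar")                      # do bar
            pc = "line3"
        elif pc == "line3":
            trace.append("blah")                     # do blah
            pc = "done"
    return trace
```

With goto_line3=False the trace is ["foo", "bar", "blah"]; with goto_line3=True it is ["foo", "blah"], mirroring the jump.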


In NAND-CIRC we could only have GOTOs that go forward in the code, but since in NAND-TM everything is encompassed within a large outer loop, we can use the same ideas to implement GOTOs that can go backwards, as well as conditional loops.

Other loops. Once we have GOTO, we can emulate all the standard loop constructs such as while, do .. until or for in NAND-TM as well. For example, we can replace the code

while foo:
    do blah
do bar

with

"loop":if NOT(foo): GOTO("next")do blahGOTO("loop")

"next":do bar

Remark 7.14 — GOTOs in programming languages. The GOTO statement was a staple of most early programming languages, but has largely fallen out of favor and is not included in many modern languages such as Python, Java, and JavaScript. In 1968, Edsger Dijkstra wrote a famous letter titled “Go to statement considered harmful.” (see also Fig. 7.10). The main trouble with GOTO is that it makes analysis of programs more difficult by making it harder to argue about invariants of the program. When a program contains a loop of the form:

for j in range(100):
    do something

do blah

you know that the line of code do blah can only be reached if the loop has ended, in which case you know that j is equal to 100, and you might also be able to argue other properties of the state of the program. In contrast, if the program might jump to do blah from any other point in the code, then it's very hard for you as the programmer to know what you can rely upon in this code. As Dijkstra said, such invariants are important because “our intellectual powers are rather geared to master static relations and … our powers to visualize processes evolving in time are relatively poorly developed”


Figure 7.10: XKCDโ€™s take on the GOTO statement.

and so “we should … do … our utmost best to shorten the conceptual gap between the static program and the dynamic process.”

That said, GOTO is still a major part of lower-level languages, where it is used to implement higher-level looping constructs such as while and for loops. For example, even though Java doesn't have a GOTO statement, the Java bytecode (which is a lower-level representation of Java) does have such a statement. Similarly, Python bytecode has instructions such as POP_JUMP_IF_TRUE that implement the GOTO functionality, and similar instructions are included in many assembly languages. The way we use GOTO to implement higher-level functionality in NAND-TM is reminiscent of the way these various jump instructions are used to implement higher-level looping constructs.

7.5 UNIFORMITY, AND NAND VS NAND-TM (DISCUSSION)

While NAND-TM adds extra operations over NAND-CIRC, it is not exactly accurate to say that NAND-TM programs or Turing machines are “more powerful” than NAND-CIRC programs or Boolean circuits. NAND-CIRC programs, having no loops, are simply not applicable for computing functions with an unbounded number of inputs. Thus, to compute a function F : {0,1}* → {0,1}* using NAND-CIRC (or equivalently, Boolean circuits) we need a collection of programs/circuits: one for every input length.

The key difference between NAND-CIRC and NAND-TM is that NAND-TM allows us to express the fact that the algorithm for computing parities of length-100 strings is really the same one as the algorithm for computing parities of length-5 strings (or similarly the fact that the algorithm for adding n-bit numbers is the same for every n, etc.). That is, one can think of the NAND-TM program for general parity as the “seed” out of which we can grow NAND-CIRC programs for length-10, length-100, or length-1000 parities as needed.

This notion of a single algorithm that can compute functions of all input lengths is known as uniformity of computation, and hence we think of Turing machines / NAND-TM as a uniform model of computation, as opposed to Boolean circuits or NAND-CIRC, which constitute a nonuniform model, where we have to specify a different program for every input length.

Looking ahead, we will see that this uniformity leads to another crucial difference between Turing machines and circuits. Turing machines can have inputs and outputs that are longer than the description of the machine as a string, and in particular there exists a Turing machine that can “self replicate” in the sense that it can print its own


code. This notion of “self replication”, and the related notion of “self reference”, is crucial to many aspects of computation, as well as, of course, to life itself, whether in the form of digital or biological programs.

For now, what you ought to remember are the following differences between uniform and nonuniform computational models:

• Nonuniform computational models: Examples are NAND-CIRC programs and Boolean circuits. These are models where each individual program/circuit can compute a finite function f : {0,1}^n → {0,1}^m. We have seen that every finite function can be computed by some program/circuit. To discuss computation of an infinite function F : {0,1}* → {0,1}* we need to allow a sequence {P_n}_{n∈ℕ} of programs/circuits (one for every input length), but this does not capture the notion of a single algorithm to compute the function F.

• Uniform computational models: Examples are Turing machines and NAND-TM programs. These are models where a single program/machine can take inputs of arbitrary length and hence compute an infinite function F : {0,1}* → {0,1}*. The number of steps that a program/machine takes on some input is not a priori bounded in advance, and in particular there is a chance that it will enter into an infinite loop. Unlike the nonuniform case, we have not shown that every infinite function can be computed by some NAND-TM program/Turing machine. We will come back to this point in Chapter 9.

Chapter Recap

• Turing machines capture the notion of a single algorithm that can evaluate functions of every input length.

• They are equivalent to NAND-TM programs, which add loops and arrays to NAND-CIRC.

• Unlike NAND-CIRC or Boolean circuits, the number of steps that a Turing machine takes on a given input is not fixed in advance. In fact, a Turing machine or a NAND-TM program can enter into an infinite loop on certain inputs, and not halt at all.

7.6 EXERCISES

Exercise 7.1 — Explicit NAND-TM programming. Produce the code of a (syntactic-sugar-free) NAND-TM program P that computes the (unbounded input length) Majority function Maj : {0,1}* → {0,1}, where for every x ∈ {0,1}*, Maj(x) = 1 if and only if ∑_{i=0}^{|x|−1} x_i > |x|/2. We say “produce” rather than “write” because you do not have to write


the code of P by hand, but rather can use the programming language of your choice to compute this code.

Exercise 7.2 — Computable functions examples. Prove that the following functions are computable. For all of these functions, you do not have to fully specify the Turing machine or the NAND-TM program that computes the function, but rather only prove that such a machine or program exists:

1. INC : {0,1}* → {0,1}*, which takes as input a representation of a natural number n and outputs the representation of n + 1.

2. ADD : {0,1}* → {0,1}*, which takes as input a representation of a pair of natural numbers (n, m) and outputs the representation of n + m.

3. MULT : {0,1}* → {0,1}*, which takes as input a representation of a pair of natural numbers (n, m) and outputs the representation of n · m.

4. SORT : {0,1}* → {0,1}*, which takes as input the representation of a list of natural numbers (a_0, …, a_{n−1}) and returns its sorted version (b_0, …, b_{n−1}), such that for every i ∈ [n] there is some j ∈ [n] with b_i = a_j, and b_0 ≤ b_1 ≤ ⋯ ≤ b_{n−1}.

Exercise 7.3 — Two index NAND-TM. Define NAND-TM' to be the variant of NAND-TM where there are two index variables i and j. Arrays can be indexed by either i or j. The operation MODANDJUMP takes four variables a, b, c, d, and uses the values of c, d to decide whether to increment j, decrement j, or keep it at the same value (corresponding to 01, 10, and 00 respectively). Prove that for every function F : {0,1}* → {0,1}*, F is computable by a NAND-TM program if and only if F is computable by a NAND-TM' program.

Exercise 7.4 — Two-tape Turing machines. Define a two-tape Turing machine to be a Turing machine which has two separate tapes and two separate heads. At every step, the transition function gets as input the contents of the cells at the two head locations, and can decide whether to move each head independently. Prove that for every function F : {0,1}* → {0,1}*, F is computable by a standard Turing machine if and only if F is computable by a two-tape Turing machine.

Exercise 7.5 — Two-dimensional arrays. Define NAND-TM'' to be the variant of NAND-TM where, just like NAND-TM' defined in Exercise 7.3,


3 You can use the sequence R, L, R, R, L, L, R, R, R, L, L, L, ….

there are two index variables i and j, but now the arrays are two-dimensional, and so we index an array Foo by Foo[i][j]. Prove that for every function F : {0,1}* → {0,1}*, F is computable by a NAND-TM program if and only if F is computable by a NAND-TM'' program.

Exercise 7.6 — Two-dimensional Turing machines. Define a two-dimensional Turing machine to be a Turing machine in which the tape is two-dimensional. At every step the machine can move Up, Down, Left, Right, or Stay. Prove that for every function F : {0,1}* → {0,1}*, F is computable by a standard Turing machine if and only if F is computable by a two-dimensional Turing machine.

Exercise 7.7 Prove the following closure properties of the set R defined in Definition 7.3:

1. If ๐น โˆˆ R then the function ๐บ(๐‘ฅ) = 1 โˆ’ ๐น(๐‘ฅ) is in R.

2. If ๐น, ๐บ โˆˆ R then the function ๐ป(๐‘ฅ) = ๐น(๐‘ฅ) โˆจ ๐บ(๐‘ฅ) is in R.

3. If F ∈ R then the function F* is in R, where F* is defined as follows: F*(x) = 1 iff there exist some strings w_0, …, w_{k−1} such that x = w_0 w_1 ⋯ w_{k−1} and F(w_i) = 1 for every i ∈ [k].

4. If F ∈ R then the function

G(x) = 1 if there exists y ∈ {0,1}^{|x|} such that F(xy) = 1, and G(x) = 0 otherwise    (7.3)

is in R.

Exercise 7.8 — Oblivious Turing machines (challenging). Define a Turing machine M to be oblivious if its head movements are independent of its input. That is, we say that M is oblivious if there exists an infinite sequence MOVE ∈ {L, R, S}^∞ such that for every x ∈ {0,1}*, the movements of M when given input x (up until the point it halts, if such a point exists) are given by MOVE_0, MOVE_1, MOVE_2, ….

Prove that for every function F : {0,1}* → {0,1}*, if F is computable then it is computable by an oblivious Turing machine. See footnote for hint.³

Exercise 7.9 — Single vs multiple bit. Prove that for every F : {0,1}* → {0,1}*, the function F is computable if and only if the following function G : {0,1}* → {0,1} is computable, where G is defined as follows:

G(x, i, σ) = F(x)_i if i < |F(x)| and σ = 0;  1 if i < |F(x)| and σ = 1;  0 if i ≥ |F(x)|.

Exercise 7.10 — Uncomputability via counting. Recall that R is the set of all total functions from {0,1}* to {0,1} that are computable by a Turing machine (see Definition 7.3). Prove that R is countable. That is, prove that there exists a one-to-one map Dtc : R → ℕ. You can use the equivalence between Turing machines and NAND-TM programs.

Exercise 7.11 — Not every function is computable. Prove that the set of all total functions from {0,1}* to {0,1} is not countable. You can use the results of Section 2.4. (We will see an explicit uncomputable function in Chapter 9.)

7.7 BIBLIOGRAPHICAL NOTES

Augusta Ada Byron, Countess of Lovelace (1815-1852) lived a short but turbulent life, though she is today most well known for her collaboration with Charles Babbage (see [Ste87] for a biography). Ada took an immense interest in Babbage's analytical engine, which we mentioned in Chapter 3. In 1842-3, she translated from Italian a paper of Menabrea on the engine, adding copious notes (longer than the paper itself). The quote in the chapter's beginning is taken from Nota A in this text. Lovelace's notes contain several examples of programs for the analytical engine, and because of this she has been called “the world's first computer programmer”, though it is not clear whether they were written by Lovelace or Babbage himself [Hol01]. Regardless, Ada was clearly one of very few people (perhaps the only one outside of Babbage himself) to fully appreciate how significant and revolutionary the idea of mechanizing computation truly is.

The books of Shetterly [She16] and Sobel [Sob17] discuss the history of human computers (who were female, more often than not) and their important contributions to scientific discoveries in astronomy and space exploration.

Alan Turing was one of the intellectual giants of the 20th century. He was not only the first person to define the notion of computation, but also invented and used some of the world's earliest computational devices as part of the effort to break the Enigma cipher during World War II, saving millions of lives. Tragically, Turing committed suicide in 1954, following his conviction in 1952 for homosexual acts and a court-mandated hormonal treatment. In 2009, British prime minister


Gordon Brown made an official public apology to Turing, and in 2013 Queen Elizabeth II granted Turing a posthumous pardon. Turing's life is the subject of a great book and a mediocre movie.

Sipser's text [Sip97] defines a Turing machine as a seven-tuple consisting of the state space, input alphabet, tape alphabet, transition function, starting state, accepting state, and rejecting state. Superficially this looks like a very different definition than Definition 7.1, but it is simply a different representation of the same concept, just as a graph can be represented in either adjacency list or adjacency matrix form.

One difference is that Sipser considers a general set of states Q that is not necessarily of the form Q = {0, 1, 2, …, k − 1} for some natural number k > 0. Sipser also restricts his attention to Turing machines that output only a single bit, and therefore designates two special halting states: the “0 halting state” (often known as the rejecting state) and the “1 halting state” (often known as the accepting state). Thus instead of writing 0 or 1 on an output tape, the machine will enter into one of these states and halt. This again makes no difference to the computational power, though we prefer to consider the more general model of multi-bit outputs. (Sipser presents the basic task of a Turing machine as that of deciding a language, as opposed to computing a function, but these are equivalent; see Remark 7.4.)

Sipser also considers functions with input in Σ* for an arbitrary alphabet Σ (and hence distinguishes between the input alphabet, which he denotes as Σ, and the tape alphabet, which he denotes as Γ), while we restrict attention to functions with binary strings as input. Again this is not a major issue, since we can always encode an element of Σ using a binary string of length ⌈log |Σ|⌉. Finally (and this is a very minor point) Sipser requires the machine to either move left or right in every step, without the Stay operation, though staying in place is very easy to emulate by simply moving right and then back left.
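The fixed-length encoding of an arbitrary alphabet into binary strings can be sketched in a few lines of Python. This is an illustration only (the function names `encoder`, `encode`, and `decode` are ours, not the book's): each symbol of Σ gets a distinct binary string of length ⌈log |Σ|⌉, so a word over Σ becomes a binary string that can be decoded unambiguously.

```python
import math

def encoder(sigma):
    """Return encode/decode functions for the alphabet sigma (a list of symbols),
    using fixed-length binary codewords of length ceil(log2(|sigma|))."""
    L = max(1, math.ceil(math.log2(len(sigma))))
    enc = {s: format(i, f"0{L}b") for i, s in enumerate(sigma)}
    dec = {v: k for k, v in enc.items()}

    def encode(word):
        return "".join(enc[s] for s in word)

    def decode(bits):
        # fixed length makes decoding trivial: read L bits at a time
        return "".join(dec[bits[i:i + L]] for i in range(0, len(bits), L))

    return encode, decode

encode, decode = encoder(["a", "b", "c"])
assert decode(encode("abcab")) == "abcab"
```

Because every codeword has the same length, no separator symbols are needed, which is why this encoding step does not change which functions are computable.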

Another definition used in the literature is that a Turing machine M recognizes a language L if for every x ∈ L, M(x) = 1 and for every x ∉ L, M(x) ∈ {0, ⊥}. A language L is recursively enumerable if there exists a Turing machine M that recognizes it, and the set of all recursively enumerable languages is often denoted by RE. We will not use this terminology in this book.

One of the first programming-language formulations of Turing machines was given by Wang [Wan57]. Our formulation of NAND-TM is aimed at making the connection with circuits more direct, with the eventual goal of using it for the Cook-Levin Theorem, as well as results such as P ⊆ P/poly and BPP ⊆ P/poly. The website esolangs.org features a large variety of esoteric Turing-complete programming languages. One of the most famous of them is Brainf*ck.


8 Equivalent models of computation

"All problems in computer science can be solved by another level of indirection", attributed to David Wheeler.

"Because we shall later compute with expressions for functions, we need a distinction between functions and forms and a notation for expressing this distinction. This distinction and a notation for describing it, from which we deviate trivially, is given by Church.", John McCarthy, 1960 (in paper describing the LISP programming language)

So far we have defined the notion of computing a function using Turing machines, which are not a close match to the way computation is done in practice. In this chapter we justify this choice by showing that the definition of computable functions will remain the same under a wide variety of computational models. This notion is known as Turing completeness or Turing equivalence and is one of the most fundamental facts of computer science. In fact, a widely believed claim known as the Church-Turing Thesis holds that every "reasonable" definition of computable function is equivalent to being computable by a Turing machine. We discuss the Church-Turing Thesis and the potential definitions of "reasonable" in Section 8.8.

Some of the main computational models we discuss in this chapter include:

• RAM Machines: Turing Machines do not correspond to standard computing architectures that have Random Access Memory (RAM). The mathematical model of RAM machines is much closer to actual computers, but we will see that it is equivalent in power to Turing Machines. We also discuss a programming language variant of RAM machines, which we call NAND-RAM. The equivalence of Turing Machines and RAM machines enables demonstrating the Turing Equivalence of many popular programming languages, including all general-purpose languages used in practice such as C, Python, JavaScript, etc.


Learning Objectives:
• Learn about RAM machines and the λ calculus.
• Equivalence between these and other models and Turing machines.
• Cellular automata and configurations of Turing machines.
• Understand the Church-Turing thesis.


• Cellular Automata: Many natural and artificial systems can be modeled as collections of simple components, each evolving according to simple rules based on its state and the state of its immediate neighbors. One well-known such example is Conway's Game of Life. To prove that cellular automata are equivalent to Turing machines we introduce the tool of configurations of Turing Machines. These have other applications, and in particular are used in Chapter 11 to prove Gödel's Incompleteness Theorem: a central result in mathematics.

โ€ข ๐œ† calculus: The ๐œ† calculus is a model for expressing computationthat originates from the 1930โ€™s, though it is closely connected tofunctional programming languages widely used today. Showingthe equivalence of ๐œ† calculus to Turing Machines involves a beauti-ful technique to eliminate recursion known as the โ€œY Combinatorโ€.

This chapter: A non-mathy overview

In this chapter we study equivalence between models. Two computational models are equivalent (also known as Turing equivalent) if they can compute the same set of functions. For example, we have seen that Turing Machines and NAND-TM programs are equivalent since we can transform every Turing Machine into a NAND-TM program that computes the same function, and similarly can transform every NAND-TM program into a Turing Machine that computes the same function.

In this chapter we show this extends far beyond Turing Machines. The techniques we develop allow us to show that all general-purpose programming languages (i.e., Python, C, Java, etc.) are Turing Complete, in the sense that they can simulate Turing Machines and hence compute all functions that can be computed by a TM. We will also show the other direction: Turing Machines can be used to simulate a program in any of these languages and hence compute any function computable by them. This means that all these programming languages are Turing equivalent: they are equivalent in power to Turing Machines and to each other. This is a powerful principle, which underlies the vast reach of Computer Science. Moreover, it enables us to "have our cake and eat it too": since all these models are equivalent, we can choose the model of our convenience for the task at hand.

To achieve this equivalence, we define a new computational model known as RAM machines. RAM Machines capture the


architecture of modern computers more closely than Turing Machines, but are still computationally equivalent to Turing machines.

Finally, we will show that Turing equivalence extends far beyond traditional programming languages. We will see that cellular automata, which are a mathematical model of extremely simple natural systems, are also Turing equivalent, and also see the Turing equivalence of the λ calculus: a logical system for expressing functions that is the basis for functional programming languages such as Lisp, OCaml, and more.

See Fig. 8.1 for an overview of the results of this chapter.

Figure 8.1: Some Turing-equivalent models. All of these are equivalent in power to Turing Machines (or equivalently NAND-TM programs) in the sense that they can compute exactly the same class of functions. All of these are models for computing infinite functions that take inputs of unbounded length. In contrast, Boolean circuits / NAND-CIRC programs can only compute finite functions and hence are not Turing complete.

8.1 RAM MACHINES AND NAND-RAM

One of the limitations of Turing Machines (and NAND-TM programs) is that we can only access one location of our arrays/tape at a time. If the head is at position 22 in the tape and we want to access the 957-th position then it will take us at least 935 steps to get there. In contrast, almost every programming language has a formalism for directly accessing memory locations. Actual physical computers also provide so called Random Access Memory (RAM) which can be thought of as a large array Memory, such that given an index p (i.e., memory address, or a pointer), we can read from and write to the p-th location of Memory. ("Random access memory" is quite a misnomer since it has nothing to do with probability, but since it is a standard term in both the theory and practice of computing, we will use it as well.)

The computational model that models access to such a memory is the RAM machine (sometimes also known as the Word RAM model), as depicted in Fig. 8.2. The memory of a RAM machine is an array of unbounded size where each cell can store a single word, which we think of as a string in {0,1}^w and also (equivalently) as a number in [2^w]. For example, many modern computing architectures


Figure 8.2: A RAM machine contains a finite number of local registers, each of which holds an integer, and an unbounded memory array. It can perform arithmetic operations on its registers as well as load to a register r the contents of the memory at the address indexed by the number in register r′.

Figure 8.3: Different aspects of RAM machines and Turing machines. RAM machines can store integers in their local registers, and can read and write to their memory at a location specified by a register. In contrast, Turing machines can only access their memory in the head location, which moves at most one position to the right or left in each step.

use 64 bit words, in which every memory location holds a string in {0,1}^64 which can also be thought of as a number between 0 and 2^64 − 1 = 18,446,744,073,709,551,615. The parameter w is known as the word size. In practice often w is a fixed number such as 64, but when doing theory we model w as a parameter that can depend on the input length or number of steps. (You can think of 2^w as roughly corresponding to the largest memory address that we use in the computation.) In addition to the memory array, a RAM machine also contains a constant number of registers r_0, ..., r_{k−1}, each of which can also contain a single word.
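The two views of a word — as a string in {0,1}^w and as a number in [2^w] — can be checked directly (a quick illustration of the arithmetic above, not part of the text):

```python
# the largest value representable in a 64-bit word
w = 64
assert 2**w - 1 == 18_446_744_073_709_551_615

# converting between the two equivalent views, for a small word size w = 4:
word = "1011"                   # a string in {0,1}^4 ...
assert int(word, 2) == 11       # ... read as a number in [2^4]
assert format(11, "04b") == word  # and back again
```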

The operations a RAM machine can carry out include:

• Data movement: Load data from a certain cell in memory into a register or store the contents of a register into a certain cell of memory. A RAM machine can directly access any cell of memory without having to move the "head" (as Turing machines do) to that location. That is, in one step a RAM machine can load into register r_i the contents of the memory cell indexed by register r_j, or store into the memory cell indexed by register r_j the contents of register r_i.

• Computation: RAM machines can carry out computation on registers such as arithmetic operations, logical operations, and comparisons.

• Control flow: As in the case of Turing machines, the choice of what instruction to perform next can depend on the state of the RAM machine, which is captured by the contents of its registers.

We will not give a formal definition of RAM Machines, though the bibliographical notes section (Section 8.10) contains sources for such definitions. Just as the NAND-TM programming language models Turing machines, we can also define a NAND-RAM programming language that models RAM machines. The NAND-RAM programming language extends NAND-TM by adding the following features:

• The variables of NAND-RAM are allowed to be (non-negative) integer valued rather than only Boolean as is the case in NAND-TM. That is, a scalar variable foo holds a non-negative integer in ℕ (rather than only a bit in {0,1}), and an array variable Bar holds an array of integers. As in the case of RAM machines, we will not allow integers of unbounded size. Concretely, each variable holds a number between 0 and T − 1, where T is the number of steps that have been executed by the program so far. (You can ignore this restriction for now: if we want to hold larger numbers, we can simply execute dummy instructions; it will be useful in later chapters.)


Figure 8.4: Overview of the steps in the proof of Theorem 8.1 simulating NAND-RAM with NAND-TM. We first use the inner loop syntactic sugar of Section 7.4.1 to enable loading an integer from an array to the index variable i of NAND-TM. Once we can do that, we can simulate indexed access in NAND-TM. We then use an embedding of ℕ² in ℕ to simulate two dimensional bit arrays in NAND-TM. Finally, we use the binary representation to encode one-dimensional arrays of integers as two dimensional arrays of bits, hence completing the simulation of NAND-RAM with NAND-TM.

• We allow indexed access to arrays. If foo is a scalar and Bar is an array, then Bar[foo] refers to the location of Bar indexed by the value of foo. (Note that this means we don't need to have a special index variable i anymore.)

• As is often the case in programming languages, we will assume that for Boolean operations such as NAND, a zero valued integer is considered as false, and a nonzero valued integer is considered as true.

• In addition to NAND, NAND-RAM also includes all the basic arithmetic operations of addition, subtraction, multiplication, (integer) division, as well as comparisons (equal, greater than, less than, etc.).

• NAND-RAM includes conditional statements if/then as part of the language.

• NAND-RAM contains looping constructs such as while and do as part of the language.

A full description of the NAND-RAM programming language is in the appendix. However, the most important fact you need to know about NAND-RAM is that you actually don't need to know much about NAND-RAM at all, since it is equivalent in power to Turing machines:

Theorem 8.1 — Turing Machines (aka NAND-TM programs) and RAM machines (aka NAND-RAM programs) are equivalent. For every function F : {0,1}* → {0,1}*, F is computable by a NAND-TM program if and only if F is computable by a NAND-RAM program.

Since NAND-TM programs are equivalent to Turing machines, and NAND-RAM programs are equivalent to RAM machines, Theorem 8.1 shows that all these four models are equivalent to one another.

Proof Idea:

Clearly NAND-RAM is only more powerful than NAND-TM, and so if a function F is computable by a NAND-TM program then it can be computed by a NAND-RAM program. The challenging direction is to transform a NAND-RAM program P to an equivalent NAND-TM program Q. To describe the proof in full we will need to cover the full formal specification of the NAND-RAM language, and show how we can implement every one of its features as syntactic sugar on top of NAND-TM.

This can be done, but going over all the operations in detail is rather tedious. Hence we will focus on describing the main ideas behind this transformation. (See also Fig. 8.4.) NAND-RAM generalizes NAND-TM in two main ways: (a) adding indexed access to the arrays (i.e., Foo[bar] syntax) and (b) moving from Boolean valued variables to integer valued ones. The transformation proceeds in three steps:

1. Indexed access of bit arrays: We start by showing how to handle (a). Namely, we show how we can implement in NAND-TM the operation Setindex(Bar) such that if Bar is an array that encodes some integer j, then after executing Setindex(Bar) the value of i will equal j. This will allow us to simulate syntax of the form Foo[Bar] by Setindex(Bar) followed by Foo[i].

2. Two dimensional bit arrays: We then show how we can use "syntactic sugar" to augment NAND-TM with two dimensional arrays. That is, have two indices i and j and two dimensional arrays, such that we can use the syntax Foo[i][j] to access the (i,j)-th location of Foo.

3. Arrays of integers: Finally we will encode a one dimensional array Arr of integers by a two dimensional array Arrbin of bits. The idea is simple: if a_{i,0}, ..., a_{i,ℓ} is a binary (prefix-free) representation of Arr[i], then Arrbin[i][j] will be equal to a_{i,j}.

Once we have arrays of integers, we can use our usual syntactic sugar for functions, GOTO etc. to implement the arithmetic and control flow operations of NAND-RAM.
⋆

The above approach is not the only way to obtain a proof of Theorem 8.1; see for example Exercise 8.1.

Remark 8.2 — RAM machines / NAND-RAM and assembly language (optional). RAM machines correspond quite closely to actual microprocessors such as those in the Intel x86 series that also contain a large primary memory and a constant number of small registers. This is of course no accident: RAM machines aim at modeling more closely than Turing machines the architecture of actual computing systems, which largely follows the so called von Neumann architecture as described in the report [Neu45]. As a result, NAND-RAM is similar in its general outline to assembly languages such as x86 or MIPS. These assembly languages all have instructions to (1) move data from registers to memory, (2) perform arithmetic or logical computations on registers, and (3) conditional execution and loops ("if" and "goto", commonly known as "branches" and "jumps" in the context of assembly languages).

The main difference between RAM machines and actual microprocessors (and correspondingly between NAND-RAM and assembly languages) is that actual microprocessors have a fixed word size w so that all registers and memory cells hold numbers in [2^w] (or equivalently strings in {0,1}^w). This number w can vary among different processors, but common values are either 32 or 64. As a theoretical model, RAM machines do not have this limitation, but we rather let w be the logarithm of our running time (which roughly corresponds to its value in practice as well). Actual microprocessors also have a fixed number of registers (e.g., 14 general purpose registers in x86-64) but this does not make a big difference with RAM machines. It can be shown that RAM machines with as few as two registers are as powerful as full-fledged RAM machines that have an arbitrarily large constant number of registers.

Of course actual microprocessors have many features not shared with RAM machines as well, including parallelism, memory hierarchies, and many others. However, RAM machines do capture actual computers to a first approximation and so (as we will see), the running time of an algorithm on a RAM machine (e.g., O(n) vs O(n²)) is strongly correlated with its practical efficiency.

8.2 THE GORY DETAILS (OPTIONAL)

We do not show the full formal proof of Theorem 8.1 but focus on the most important parts: implementing indexed access, and simulating two dimensional arrays with one dimensional ones. Even these are already quite tedious to describe, as will not be surprising to anyone that has ever written a compiler. Hence you can feel free to merely skim this section. The important point is not for you to know all details by heart but to be convinced that in principle it is possible to transform a NAND-RAM program to an equivalent NAND-TM program, and even be convinced that, with sufficient time and effort, you could do it if you wanted to.

8.2.1 Indexed access in NAND-TM

In NAND-TM we can only access our arrays in the position of the index variable i, while NAND-RAM has integer-valued variables and can use them for indexed access to arrays, of the form Foo[bar]. To implement indexed access in NAND-TM, we will encode integers in our arrays using some prefix-free representation (see Section 2.5.2), and then have a procedure Setindex(Bar) that sets i to the value encoded by Bar. We can simulate the effect of Foo[Bar] using Setindex(Bar) followed by Foo[i].

Implementing Setindex(Bar) can be achieved as follows:


1. We initialize an array Atzero such that Atzero[0] = 1 and Atzero[j] = 0 for all j > 0. (This can be easily done in NAND-TM as all uninitialized variables default to zero.)

2. Set i to zero, by decrementing it until we reach the point where Atzero[i] = 1.

3. Let Temp be an array encoding the number 0.

4. We use GOTO to simulate an inner loop of the form: while Temp ≠ Bar, increment Temp.

5. At the end of the loop, i is equal to the value encoded by Bar.

In NAND-TM code (using some syntactic sugar), we can implement the above operations as follows:

# assume Atzero is an array such that Atzero[0]=1
# and Atzero[j]=0 for all j>0

# set i to 0.
LABEL("zero_idx")
dir0 = zero
dir1 = one
# corresponds to i <- i-1
GOTO("zero_idx",NOT(Atzero[i]))
...
# zero out Temp
# (code below assumes a specific prefix-free encoding
#  in which 10 is the "end marker")
Temp[0] = 1
Temp[1] = 0
# set i to Bar, assume we know how to increment, compare
LABEL("increment_temp")
cond = NOT(EQUAL(Temp,Bar))
dir0 = one
dir1 = one
# corresponds to i <- i+1
INC(Temp)
GOTO("increment_temp",cond)
# if we reach this point, i is the number encoded by Bar
...
# final instruction of program
MODANDJUMP(dir0,dir1)

8.2.2 Two dimensional arrays in NAND-TM

Figure 8.5: Illustration of the map embed(x, y) = ½(x + y)(x + y + 1) + x for x, y ∈ [10]. One can see that for all distinct pairs (x, y) and (x′, y′), embed(x, y) ≠ embed(x′, y′).

To implement two dimensional arrays, we want to embed them in a one dimensional array. The idea is that we come up with a one to one function embed : ℕ × ℕ → ℕ, and so embed the location (i, j) of the two dimensional array Two in the location embed(i, j) of the array One.

Since the set ℕ × ℕ seems "much bigger" than the set ℕ, a priori it might not be clear that such a one to one mapping exists. However, once you think about it more, it is not that hard to construct. For example, you could ask a child to use scissors and glue to transform a 10" by 10" piece of paper into a 1" by 100" strip. This is essentially a one to one map from [10] × [10] to [100]. We can generalize this to obtain a one to one map from [n] × [n] to [n²] and more generally a one to one map from ℕ × ℕ to ℕ. Specifically, the following map embed would do (see Fig. 8.5):

embed(x, y) = ½(x + y)(x + y + 1) + x .   (8.1)
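The injectivity of this map (the classical Cantor pairing function) is easy to sanity-check by brute force on a finite range; the following quick sketch is an illustration, not a substitute for the proof asked for in Exercise 8.3:

```python
def embed(x, y):
    # embed(x, y) = (x+y)(x+y+1)/2 + x; the product (x+y)(x+y+1) of two
    # consecutive integers is always even, so integer division is exact
    return (x + y) * (x + y + 1) // 2 + x

# injectivity check on [100] x [100]:
# 10,000 distinct input pairs must produce 10,000 distinct values
values = {embed(x, y) for x in range(100) for y in range(100)}
assert len(values) == 100 * 100
```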

Exercise 8.3 asks you to prove that embed is indeed one to one, as well as computable by a NAND-TM program. (The latter can be done by simply following the grade-school algorithms for multiplication, addition, and division.) This means that we can replace code of the form Two[Foo][Bar] = something (i.e., access the two dimensional array Two at the integers encoded by the one dimensional arrays Foo and Bar) by code of the form:

Blah = embed(Foo,Bar)
Setindex(Blah)
Two[i] = something

8.2.3 All the rest

Once we have two dimensional arrays and indexed access, simulating NAND-RAM with NAND-TM is just a matter of implementing the standard algorithms for arithmetic operations and comparisons in NAND-TM. While this is cumbersome, it is not difficult, and the end result is to show that every NAND-RAM program P can be simulated by an equivalent NAND-TM program Q, thus completing the proof of Theorem 8.1.

Figure 8.6: A punched card corresponding to a Fortran statement.

1 Some programming languages have fixed (even if extremely large) bounds on the amount of memory they can access, which formally prevent them from being applicable to computing infinite functions and hence simulating Turing machines. We ignore such issues in this discussion and assume access to some storage device without a fixed upper bound on its capacity.

Remark 8.3 — Recursion in NAND-RAM (advanced). One concept that appears in many programming languages but we did not include in NAND-RAM programs is recursion. However, recursion (and function calls in general) can be implemented in NAND-RAM using the stack data structure. A stack is a data structure containing a sequence of elements, where we can "push" elements into it and "pop" them from it in "first in last out" order.

We can implement a stack using an array of integers Stack and a scalar variable stackpointer that will be the number of items in the stack. We implement push(foo) by

Stack[stackpointer] = foo
stackpointer += one

and implement bar = pop() by

stackpointer -= one
bar = Stack[stackpointer]

We implement a function call to F by pushing the arguments for F into the stack. The code of F will "pop" the arguments from the stack, perform the computation (which might involve making recursive or non recursive calls) and then "push" its return value into the stack. Because of the "first in last out" nature of a stack, we do not return control to the calling procedure until all the recursive calls are done.

The fact that we can implement recursion using a non-recursive language is not surprising. Indeed, machine languages typically do not have recursion (or function calls in general), and hence a compiler implements function calls using a stack and GOTO. You can find online tutorials on how recursion is implemented via a stack in your favorite programming language, whether it's Python, JavaScript, or Lisp/Scheme.
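The push/pop idea can be illustrated with an ordinary Python sketch (ours, not NAND-RAM code): computing Fibonacci numbers with an explicit stack and a loop, where each push plays the role of a recursive call.

```python
def fib_stack(n):
    # fib(n) = fib(n-1) + fib(n-2), computed without recursion:
    # each "recursive call" becomes a push onto an explicit stack
    stack = [n]
    total = 0
    while stack:
        m = stack.pop()
        if m <= 1:
            total += m           # base cases: fib(0) = 0, fib(1) = 1
        else:
            stack.append(m - 1)  # "call" fib(m-1)
            stack.append(m - 2)  # "call" fib(m-2)
    return total

print(fib_stack(10))  # 55
```

The "first in last out" order of the stack guarantees that every pending subproblem is fully processed before the loop terminates, mirroring how a compiler's call stack defers the caller until all its callees return.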

8.3 TURING EQUIVALENCE (DISCUSSION)

All of the standard programming languages such as C, Java, Python, Pascal, and Fortran have very similar operations to NAND-RAM. (Indeed, ultimately they can all be executed by machines which have a fixed number of registers and a large memory array.) Hence using Theorem 8.1, we can simulate any program in such a programming language by a NAND-TM program. In the other direction, it is a fairly easy programming exercise to write an interpreter for NAND-TM in any of the above programming languages. Hence we can also simulate NAND-TM programs (and so by Theorem 7.11, Turing machines) using these programming languages. This property of being equivalent in power to Turing Machines / NAND-TM is called Turing equivalence (or sometimes Turing completeness). Thus all programming languages we are familiar with are Turing equivalent.1

8.3.1 The "Best of both worlds" paradigm

The equivalence between Turing Machines and RAM machines allows us to choose the most convenient language for the task at hand:

• When we want to prove a theorem about all programs/algorithms, we can use Turing machines (or NAND-TM) since they are simpler and easier to analyze. In particular, if we want to show that a certain function can not be computed, then we will use Turing machines.

Figure 8.7: By having the two equivalent languages NAND-TM and NAND-RAM, we can "have our cake and eat it too", using NAND-TM when we want to prove that programs can't do something, and using NAND-RAM or other high level languages when we want to prove that programs can do something.

• When we want to show that a function can be computed we can use RAM machines or NAND-RAM, because they are easier to program in and correspond more closely to high level programming languages we are used to. In fact, we will often describe NAND-RAM programs in an informal manner, trusting that the reader can fill in the details and translate the high level description to the precise program. (This is just like the way people typically use informal or "pseudocode" descriptions of algorithms, trusting that their audience will know to translate these descriptions to code if needed.)

Our usage of Turing Machines / NAND-TM and RAM Machines / NAND-RAM is very similar to the way people use high and low level programming languages in practice. When one wants to produce a device that executes programs, it is convenient to do so for a very simple and "low level" programming language. When one wants to describe an algorithm, it is convenient to use as high level a formalism as possible.

Big Idea 10: Using equivalence results such as those between Turing and RAM machines, we can "have our cake and eat it too". We can use a simpler model such as Turing machines when we want to prove something can't be done, and use a feature-rich model such as RAM machines when we want to prove something can be done.

8.3.2 Let's talk about abstractions

"The programmer is in the unique position that ... he has to be able to think in terms of conceptual hierarchies that are much deeper than a single mind ever needed to face before.", Edsger Dijkstra, "On the cruelty of really teaching computing science", 1988.

At some point in any theory of computation course, the instructor and students need to have the talk. That is, we need to discuss the level of abstraction in describing algorithms. In algorithms courses, one typically describes algorithms in English, assuming readers can "fill in the details" and would be able to convert such an algorithm into an implementation if needed. For example, Algorithm 8.4 is a high level description of the breadth first search algorithm.


Algorithm 8.4 — Breadth First Search.

Input: Graph G, vertices u, v
Output: "connected" when u is connected to v in G, "disconnected" otherwise

1: Initialize empty queue Q.
2: Put u in Q
3: while Q is not empty do
4:   Remove top vertex w from Q
5:   if w = v then return "connected"
6:   end if
7:   Mark w
8:   Add all unmarked neighbors of w to Q.
9: end while
10: return "disconnected"
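For contrast with the high level description above, here is a direct Python rendering of Algorithm 8.4 (a sketch of ours, not the book's code; it assumes the graph G is given as a dict mapping each vertex to a list of its neighbors):

```python
from collections import deque

def bfs_connected(G, u, v):
    Q = deque([u])                # lines 1-2: initialize queue with u
    marked = set()
    while Q:                      # line 3: while Q is not empty
        w = Q.popleft()           # line 4: remove top vertex w
        if w == v:
            return "connected"    # line 5
        if w in marked:
            continue              # already processed via another path
        marked.add(w)             # line 7: mark w
        Q.extend(x for x in G[w] if x not in marked)  # line 8
    return "disconnected"         # line 10

G = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(bfs_connected(G, 0, 2))  # connected
print(bfs_connected(G, 0, 3))  # disconnected
```

Even this short translation already forces representation decisions (a deque for the queue, a set for markings) that the English description leaves implicit, which is exactly the point of the discussion that follows.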

If we wanted to give more details on how to implement breadth first search in a programming language such as Python or C (or NAND-RAM / NAND-TM for that matter), we would describe how we implement the queue data structure using an array, and similarly how we would use arrays to mark vertices. We call such an "intermediate level" description an implementation level or pseudocode description. Finally, if we want to describe the implementation precisely, we would give the full code of the program (or another fully precise representation, such as in the form of a list of tuples). We call this a formal or low level description.

Figure 8.8: We can describe an algorithm at different levels of granularity/detail and precision. At the highest level we just write the idea in words, omitting all details on representation and implementation. In the intermediate level (also known as implementation or pseudocode) we give enough details of the implementation that would allow someone to derive it, though we still fall short of providing the full code. The lowest level is where the actual code or mathematical description is fully spelled out. These different levels of detail all have their uses, and moving between them is one of the most important skills for a computer scientist.

While we started off by describing NAND-CIRC, NAND-TM, and NAND-RAM programs at the full formal level, as we progress in this book we will move to implementation and high level descriptions. After all, our goal is not to use these models for actual computation, but rather to analyze the general phenomenon of computation. That said, if you don't understand how the high level description translates to an actual implementation, going "down to the metal" is often an excellent exercise. One of the most important skills for a computer scientist is the ability to move up and down hierarchies of abstractions.

A similar distinction applies to the notion of representation of objects as strings. Sometimes, to be precise, we give a low level specification of exactly how an object maps into a binary string. For example, we might describe an encoding of n vertex graphs as length n² binary strings, by saying that we map a graph G over the vertices [n] to a string x ∈ {0,1}^(n²) such that the n·i + j-th coordinate of x is 1 if and only if the edge {i, j} is present in G. We can also use an intermediate or implementation level description, by simply saying that we represent a graph using the adjacency matrix representation.
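The low level encoding just described can be spelled out in a few lines of Python (a sketch of ours, assuming an undirected graph given as a list of edges over [n]):

```python
def encode_graph(n, edges):
    """Encode an n-vertex undirected graph as a string in {0,1}^(n^2),
    where coordinate n*i + j is 1 iff the edge {i, j} is present."""
    x = ["0"] * (n * n)
    for (i, j) in edges:
        x[n * i + j] = "1"
        x[n * j + i] = "1"  # undirected: record both orientations
    return "".join(x)

# a 3-vertex graph with the single edge {0, 1}
print(encode_graph(3, [(0, 1)]))  # 010100000
```

Note how much bookkeeping (symmetry, the n·i + j indexing) the phrase "adjacency matrix representation" quietly abstracts away.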

Finally, because translating between the various representations of graphs (and objects in general) can be done via a NAND-RAM (and hence a NAND-TM) program, when talking at a high level we also suppress discussion of representation altogether. For example, the fact that graph connectivity is a computable function is true regardless of whether we represent graphs as adjacency lists, adjacency matrices, lists of edge-pairs, and so on and so forth. Hence, in cases where the precise representation doesn't make a difference, we would often talk about our algorithms as taking as input an object X (that can be a graph, a vector, a program, etc.) without specifying how X is encoded as a string.

Defining "Algorithms". Up until now we have used the term "algorithm" informally. However, Turing Machines and the range of equivalent models yield a way to precisely and formally define algorithms. Hence whenever we refer to an algorithm in this book, we will mean that it is an instance of one of the Turing equivalent models, such as Turing machines, NAND-TM, RAM machines, etc. Because of the equivalence of all these models, in many contexts it will not matter which of these we use.

8.3.3 Turing completeness and equivalence, a formal definition (optional)

A computational model is some way to define what it means for a program (which is represented by a string) to compute a (partial) function. A computational model ℳ is Turing complete if we can map every Turing machine (or equivalently NAND-TM program) N into a program P for ℳ that computes the same function as N. It is Turing equivalent if the other direction holds as well (i.e., we can map every program in ℳ to a Turing machine that computes the same function).


We can define this notion formally as follows. (This formal definition is not crucial for the remainder of this book, so feel free to skip it as long as you understand the general concept of Turing equivalence; this notion is sometimes referred to in the literature as Gödel numbering or admissible numbering.)

Definition 8.5 — Turing completeness and equivalence (optional). Let ℱ be the set of all partial functions from {0, 1}* to {0, 1}*. A computational model is a map ℳ : {0, 1}* → ℱ.

We say that a program P ∈ {0, 1}* ℳ-computes a function F ∈ ℱ if ℳ(P) = F.

A computational model ℳ is Turing complete if there is a computable map ENCODE_ℳ : {0, 1}* → {0, 1}* such that for every Turing machine N (represented as a string), ℳ(ENCODE_ℳ(N)) is equal to the partial function computed by N.

A computational model ℳ is Turing equivalent if it is Turing complete and there exists a computable map DECODE_ℳ : {0, 1}* → {0, 1}* such that for every string P ∈ {0, 1}*, N = DECODE_ℳ(P) is a string representation of a Turing machine that computes the function ℳ(P).

Some examples of Turing equivalent models (some of which we have already seen, and some are discussed below) include:

• Turing machines
• NAND-TM programs
• NAND-RAM programs
• λ calculus
• Game of life (mapping programs and inputs/outputs to starting and ending configurations)
• Programming languages such as Python/C/JavaScript/OCaml… (allowing for unbounded storage)

8.4 CELLULAR AUTOMATA

Many physical systems can be described as consisting of a large number of elementary components that interact with one another. One way to model such systems is using cellular automata. This is a system that consists of a large (or even infinite) number of cells. Each cell only has a constant number of possible states. At each time step, a cell updates to a new state by applying some simple rule to the state of itself and its neighbors.

A canonical example of a cellular automaton is Conway's Game of Life. In this automaton the cells are arranged in an infinite two dimensional grid. Each cell has only two states: "dead" (which we can


Figure 8.9: Rules for Conway's Game of Life. Image from this blog post.

encode as 0 and identify with ∅) or "alive" (which we can encode as 1). The next state of a cell depends on its previous state and the states of its 8 vertical, horizontal and diagonal neighbors (see Fig. 8.9). A dead cell becomes alive only if exactly three of its neighbors are alive. A live cell continues to live if it has two or three live neighbors. Even though the number of cells is potentially infinite, we can encode the state using a finite-length string by only keeping track of the live cells. If we initialize the system in a configuration with a finite number of live cells, then the number of live cells will stay finite in all future steps. The Wikipedia page for the Game of Life contains some beautiful figures and animations of configurations that produce very interesting evolutions.
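The rules above, together with the observation that we only need to track the live cells, can be sketched in a few lines of Python. The function name `step` and the set-of-coordinates representation are our own choices:

```python
from collections import Counter

def step(live):
    """One step of Conway's Game of Life. `live` is the set of (i, j)
    coordinates of live cells; only finitely many cells are live."""
    neighbor_counts = Counter()
    for (i, j) in live:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if (di, dj) != (0, 0):
                    neighbor_counts[(i + di, j + dj)] += 1
    # A cell is alive next step if it has exactly 3 live neighbors,
    # or if it is currently alive and has exactly 2 live neighbors.
    return {c for c, k in neighbor_counts.items()
            if k == 3 or (k == 2 and c in live)}
```

Note that since a cell can only become alive if it has a live neighbor, a finite set of live cells always remains finite, matching the discussion above.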

Figure 8.10: In a two dimensional cellular automaton every cell is in position (i, j) for some integers i, j ∈ ℤ. The state of a cell is some value A_{i,j} ∈ Σ for some finite alphabet Σ. At a given time step, the state of the cell is adjusted according to some function applied to the state of (i, j) and all its neighbors (i ± 1, j ± 1). In a one dimensional cellular automaton every cell is in position i ∈ ℤ and the state A_i of i at the next time step depends on its current state and the state of its two neighbors i − 1 and i + 1.

Since the cells in the game of life are arranged in an infinite two-dimensional grid, it is an example of a two dimensional cellular automaton. We can also consider the even simpler setting of a one dimensional cellular automaton, where the cells are arranged in an infinite line, see Fig. 8.10. It turns out that even this simple model is enough to achieve


Figure 8.11: A Game-of-Life configuration simulating a Turing Machine. Figure by Paul Rendell.

Turing-completeness. We will now formally define one-dimensional cellular automata and then prove their Turing completeness.

Definition 8.6 — One dimensional cellular automata. Let Σ be a finite set containing the symbol ∅. A one dimensional cellular automaton over alphabet Σ is described by a transition rule r : Σ³ → Σ, which satisfies r(∅, ∅, ∅) = ∅.

A configuration of the automaton r is a function A : ℤ → Σ. If an automaton with rule r is in configuration A, then its next configuration, denoted by A′ = NEXT_r(A), is the function A′ such that A′(i) = r(A(i − 1), A(i), A(i + 1)) for every i ∈ ℤ. In other words, the next state of the automaton r at point i is obtained by applying the rule r to the values of A at i and its two neighbors.

Finite configuration. We say that a configuration of an automaton r is finite if there are only finitely many indices i_0, …, i_{n−1} in ℤ such that A(i) ≠ ∅. (That is, for every i ∉ {i_0, …, i_{n−1}}, A(i) = ∅.) Such a configuration can be represented using a finite string that encodes the indices i_0, …, i_{n−1} and the values A(i_0), …, A(i_{n−1}). Since r(∅, ∅, ∅) = ∅, if A is a finite configuration then NEXT_r(A) is finite as well. We will only be interested in studying cellular automata that are initialized in finite configurations, and hence remain in a finite configuration throughout their evolution.
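Definition 8.6 and the finite-configuration representation can be sketched directly in code. In this hedged Python sketch (names ours), a finite configuration is a dict mapping the non-empty indices to their symbols, and `"."` stands in for the ∅ symbol:

```python
def next_config(r, A, empty="."):
    """One step of a one dimensional cellular automaton.
    r is the transition rule, a function of three symbols satisfying
    r(empty, empty, empty) == empty.  A is a finite configuration,
    given as a dict mapping each index i with A(i) != empty to A(i)."""
    get = lambda i: A.get(i, empty)
    # Since r(empty, empty, empty) == empty, only cells at distance
    # at most 1 from a non-empty cell can be non-empty next step.
    candidates = {i + d for i in A for d in (-1, 0, 1)}
    B = {}
    for i in candidates:
        s = r(get(i - 1), get(i), get(i + 1))
        if s != empty:
            B[i] = s
    return B
```

For example, the rule that copies the left neighbor's symbol ("shift right") satisfies the ∅-preservation condition, and repeatedly applying it moves a finite configuration one position to the right per step.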

8.4.1 One dimensional cellular automata are Turing complete

We can write a program (for example using NAND-RAM) that simulates the evolution of any cellular automaton from an initial finite configuration by simply storing the values of the cells with state not equal to ∅ and repeatedly applying the rule r. Hence cellular automata can be simulated by Turing Machines. What is more surprising is that the other direction holds as well. For example, as simple as its rules seem, we can simulate a Turing machine using the game of life (see Fig. 8.11).

In fact, even one dimensional cellular automata can be Turing complete:

Theorem 8.7 — One dimensional automata are Turing complete. For every Turing machine M, there is a one dimensional cellular automaton that can simulate M on every input x.

To make the notion of "simulating a Turing machine" more precise

we will need to define configurations of Turing machines. We will do so in Section 8.4.2 below, but at a high level a configuration of a Turing machine is a string that encodes its full state at a given step in


its computation. That is, the contents of all (non empty) cells of its tape, its current state, as well as the head position.

The key idea in the proof of Theorem 8.7 is that at every point in the computation of a Turing machine M, the only cell in M's tape that can change is the one where the head is located, and the value this cell changes to is a function of its current state and the finite state of M. This observation allows us to encode the configuration of a Turing machine M as a finite configuration of a cellular automaton r, and ensure that a one-step evolution of this encoded configuration under the rules of r corresponds to one step in the execution of the Turing machine M.

8.4.2 Configurations of Turing machines and the next-step function

To turn the above ideas into a rigorous proof (and even statement!) of Theorem 8.7 we will need to precisely define the notion of configurations of Turing machines. This notion will be useful for us in later chapters as well.

Figure 8.12: A configuration of a Turing machine M with alphabet Σ and state space [k] encodes the state of M at a particular step in its execution as a string α over the alphabet Σ̄ = Σ × ({⋅} ∪ [k]). The string is of length t where t is such that M's tape contains ∅ in all positions t and larger and M's head is in a position smaller than t. If M's head is in the i-th position, then for j ≠ i, α_j encodes the value of the j-th cell of M's tape, while α_i encodes both this value as well as the current state of M. If the machine writes the value τ, changes state to t, and moves right, then the next configuration will contain at position i the value (τ, ⋅) and at position i + 1 the value (α_{i+1}, t).

Definition 8.8 — Configuration of Turing Machines. Let M be a Turing machine with tape alphabet Σ and state space [k]. A configuration of M is a string α ∈ Σ̄* where Σ̄ = Σ × ({⋅} ∪ [k]) that satisfies that there is exactly one coordinate i for which α_i = (σ, s) for some σ ∈ Σ and s ∈ [k]. For all other coordinates j, α_j = (σ′, ⋅) for some σ′ ∈ Σ.

A configuration α ∈ Σ̄* of M corresponds to the following state of its execution:

โ€ข ๐‘€ โ€™s tape contains ๐›ผ๐‘—,0 for all ๐‘— < |๐›ผ| and contains โˆ… for all po-sitions that are at least |๐›ผ|, where we let ๐›ผ๐‘—,0 be the value ๐œŽ suchthat ๐›ผ๐‘— = (๐œŽ, ๐‘ก) with ๐œŽ โˆˆ ฮฃ and ๐‘ก โˆˆ โ‹… โˆช [๐‘˜]. (In other words,

Page 298: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

298 introduction to theoretical computer science

since ๐›ผ๐‘— is a pair of an alphabet symbol ๐œŽ and either a state in [๐‘˜]or the symbol โ‹…, ๐›ผ๐‘—,0 is the first component ๐œŽ of this pair.)

โ€ข ๐‘€ โ€™s head is in the unique position ๐‘– for which ๐›ผ๐‘– has the form(๐œŽ, ๐‘ ) for ๐‘  โˆˆ [๐‘˜], and ๐‘€ โ€™s state is equal to ๐‘ .P

Definition 8.8 has some technical details, but is not actually that deep or complicated. Try to take a moment to stop and think how you would encode as a string the state of a Turing machine at a given point in an execution. Think what are all the components that you need to know in order to be able to continue the execution from this point onwards, and what is a simple way to encode them using a list of finite symbols. In particular, with an eye towards our future applications, try to think of an encoding which will make it as simple as possible to map a configuration at step t to the configuration at step t + 1.

Definition 8.8 is a little cumbersome, but ultimately a configuration is simply a string that encodes a snapshot of the Turing machine at a given point in the execution. (In operating-systems lingo, it is a "core dump".) Such a snapshot needs to encode the following components:

1. The current head position.

2. The full contents of the large scale memory, that is the tape.

3. The contents of the "local registers", that is the state of the machine.

The precise details of how we encode a configuration are not important, but we do want to record the following simple fact:

Lemma 8.9 Let M be a Turing machine and let NEXT_M : Σ̄* → Σ̄* be the function that maps a configuration of M to the configuration at the next step of the execution. Then for every i ∈ ℕ, the value of NEXT_M(α)_i only depends on the coordinates α_{i−1}, α_i, α_{i+1}.

(For simplicity of notation, above we use the convention that if i is "out of bounds", such as i < 0 or i > |α|, then we assume that α_i = (∅, ⋅).) We leave proving Lemma 8.9 as Exercise 8.7. The idea behind the proof is simple: if the head is neither in position i nor positions i − 1 and i + 1, then the next-step configuration at i will be the same as it was before. Otherwise, we can "read off" the state of the Turing machine and the value of the tape at the head location from the


Figure 8.13: Evolution of a one dimensional automaton. Each row in the figure corresponds to a configuration. The initial configuration corresponds to the top row and contains only a single "live" cell. This figure corresponds to the "Rule 110" automaton of Stephen Wolfram which is Turing complete. Figure taken from Wolfram MathWorld.

configuration at i or one of its neighbors and use that to update what the new state at i should be. Completing the full proof is not hard, but doing it is a great way to ensure that you are comfortable with the definition of configurations.

Completing the proof of Theorem 8.7. We can now restate Theorem 8.7 more formally, and complete its proof:

Theorem 8.10 — One dimensional automata are Turing complete (formal statement). For every Turing Machine M, if we denote by Σ̄ the alphabet of its configuration strings, then there is a one-dimensional cellular automaton r over the alphabet Σ̄ such that

NEXT_M(α) = NEXT_r(α)   (8.2)

for every configuration α ∈ Σ̄* of M (again using the convention that we consider α_i = ∅ if i is "out of bounds").

Proof. We consider the element (∅, ⋅) of Σ̄ to correspond to the ∅ element of the automaton r. In this case, by Lemma 8.9, the function NEXT_M that maps a configuration of M into the next one is in fact a valid rule for a one dimensional automaton.

The automaton arising from the proof of Theorem 8.10 has a large alphabet, and furthermore one whose size depends on the machine M that is being simulated. It turns out that one can obtain an automaton with an alphabet of fixed size that is independent of the program being simulated, and in fact the alphabet of the automaton can be the minimal set {0, 1}! See Fig. 8.13 for an example of such a Turing-complete automaton.

Remark 8.11 — Configurations of NAND-TM programs. We can use the same approach as Definition 8.8 to define configurations of a NAND-TM program. Such a configuration will need to encode:

1. The current value of the variable i.
2. For every scalar variable foo, the value of foo.
3. For every array variable Bar, the value Bar[j] for every j ∈ {0, …, t − 1} where t − 1 is the largest value that the index variable i ever achieved in the computation.


8.5 LAMBDA CALCULUS AND FUNCTIONAL PROGRAMMING LANGUAGES

The λ calculus is another way to define computable functions. It was proposed by Alonzo Church in the 1930s around the same time as Alan Turing's proposal of the Turing Machine. Interestingly, while Turing Machines are not used for practical computation, the λ calculus has inspired functional programming languages such as LISP, ML and Haskell, and indirectly the development of many other programming languages as well. In this section we will present the λ calculus and show that its power is equivalent to NAND-TM programs (and hence also to Turing machines). Our Github repository contains a Jupyter notebook with a Python implementation of the λ calculus that you can experiment with to get a better feel for this topic.

The λ operator. At the core of the λ calculus is a way to define "anonymous" functions. For example, instead of giving a name f to a function and defining it as

f(x) = x × x   (8.3)

we can write it as

λx.x × x   (8.4)

and so (λx.x × x)(7) = 49. That is, you can think of λx.exp(x), where exp is some expression, as a way of specifying the anonymous function x ↦ exp(x). Anonymous functions, using either λx.f(x), x ↦ f(x) or other closely related notation, appear in many programming languages. For example, in Python we can define the squaring function using lambda x: x*x while in JavaScript we can use x => x*x or (x) => x*x. In Scheme we would define it as (lambda (x) (* x x)). Clearly, the name of the argument to a function doesn't matter, and so λy.y × y is the same as λx.x × x, as both correspond to the squaring function.

Dropping parentheses. To reduce notational clutter, when writing λ calculus expressions we often drop the parentheses for function evaluation. Hence instead of writing f(x) for the result of applying the function f to the input x, we can also write this as simply f x. Therefore we can write (λx.x × x) 7 = 49. In this chapter, we will use both the f(x) and f x notations for function application. Function evaluations are associative and bind from left to right, and hence f g h is the same as (f g) h.

8.5.1 Applying functions to functions

A key feature of the λ calculus is that functions are "first-class objects" in the sense that we can use functions as arguments to other functions.


For example, can you guess what number is the following expression equal to?

(((λf.(λy.(f (f y)))) (λx.x × x)) 3)   (8.5)

The expression (8.5) might seem daunting, but before you look at the solution below, try to break it apart into its components, and evaluate each component one at a time. Working out this example would go a long way toward understanding the λ calculus.

Let's evaluate (8.5) one step at a time. As nice as it is for the λ calculus to allow anonymous functions, adding names can be very helpful for understanding complicated expressions. So, let us write F = λf.(λy.(f (f y))) and g = λx.x × x.

Therefore (8.5) becomes

((F g) 3) .   (8.6)

On input a function f, F outputs the function λy.(f (f y)), or in other words F f is the function y ↦ f(f(y)). Our function g is simply g(x) = x² and so (F g) is the function that maps y to (y²)² = y⁴. Hence ((F g) 3) = 3⁴ = 81.

Solved Exercise 8.1 What number does the following expression equal to?

(((λx.(λy.x)) 2) 9) .   (8.7)

Solution: λy.x is the function that on input y ignores its input and outputs x. Hence (λx.(λy.x)) 2 yields the function y ↦ 2 (or, using λ notation, the function λy.2). Hence (8.7) is equivalent to (λy.2) 9 = 2.
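Since Python's own lambda behaves like the λ operator on these examples, both calculations can be checked directly (this is our translation of the expressions, not the book's code):

```python
# Expression (8.5): λf.(λy.(f (f y))) applied to squaring, then to 3.
F = lambda f: lambda y: f(f(y))
g = lambda x: x * x
print(F(g)(3))                        # (3²)² = 81

# Solved Exercise 8.1: (λx.(λy.x)) 2 9.
print((lambda x: lambda y: x)(2)(9))  # the constant-2 function applied to 9
```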

8.5.2 Obtaining multi-argument functions via Currying

In a λ expression of the form λx.e, the expression e can itself involve the λ operator. Thus for example the function

λx.(λy.x + y)   (8.8)

maps ๐‘ฅ to the function ๐‘ฆ โ†ฆ ๐‘ฅ + ๐‘ฆ.In particular, if we invoke the function (8.8) on ๐‘Ž to obtain some

function ๐‘“ , and then invoke ๐‘“ on ๐‘, we obtain the value ๐‘Ž + ๐‘. We


Figure 8.14: In the "currying" transformation, we can create the effect of a two parameter function f(x, y) with the λ expression λx.(λy.f(x, y)) which on input x outputs a one-parameter function f_x that has x "hardwired" into it and such that f_x(y) = f(x, y). This can be illustrated by a circuit diagram; see Chelsea Voss's site.

can see that the one-argument function (8.8) corresponding to a ↦ (b ↦ a + b) can also be thought of as the two-argument function (a, b) ↦ a + b. Generally, we can use the λ expression λx.(λy.f(x, y)) to simulate the effect of a two argument function (x, y) ↦ f(x, y). This technique is known as Currying. We will use the shorthand λx, y.e for λx.(λy.e). If f = λx.(λy.e) then (f a) b corresponds to applying f to a and then invoking the resulting function on b, obtaining the result of replacing in e the occurrences of x with a and the occurrences of y with b. By our rules of associativity, this is the same as (f a b) which we'll sometimes also write as f(a, b).

8.5.3 Formal description of the λ calculus

We now provide a formal description of the λ calculus. We start with "basic expressions" that contain a single variable such as x or y and build more complex expressions of the form (e e′) and λx.e where e, e′ are expressions and x is a variable identifier. Formally, λ expressions are defined as follows:

Definition 8.12 — λ expression. A λ expression is either a single variable identifier or an expression e of one of the following forms:

• Application: e = (e′ e″), where e′ and e″ are λ expressions.

• Abstraction: e = λx.(e′) where e′ is a λ expression.

Definition 8.12 is a recursive definition since we defined the concept of λ expressions in terms of itself. This might seem confusing at first, but in fact you have known recursive definitions since you were an elementary school student. Consider how we define an arithmetic expression: it is an expression that is either just a number, or has one of the forms (e + e′), (e − e′), (e × e′), or (e ÷ e′), where e and e′ are other arithmetic expressions.

Free and bound variables. Variables in a λ expression can either be free or bound to a λ operator (in the sense of Section 1.4.7). In a single-variable λ expression var, the variable var is free. The set of free and bound variables in an application expression e = (e′ e″) is the same as that of the underlying expressions e′ and e″. In an abstraction expression e = λvar.(e′), all free occurrences of var in e′ are bound to the λ operator of e. If you find the notion of free and bound variables confusing, you can avoid all these issues by using unique identifiers for all variables.

Precedence and parentheses. We will use the following rules to allow us to drop some parentheses. Function application associates from left to right, and so f g h is the same as (f g) h. Function application has a higher precedence than the λ operator, and so λx.f g x is the same


as ๐œ†๐‘ฅ.((๐‘“๐‘”)๐‘ฅ). This is similar to how we use the precedence rules inarithmetic operations to allow us to use fewer parenthesis and so writethe expression (7ร—3)+2 as 7ร—3+2. As mentioned in Section 8.5.2, wealso use the shorthand ๐œ†๐‘ฅ, ๐‘ฆ.๐‘’ for ๐œ†๐‘ฅ.(๐œ†๐‘ฆ.๐‘’) and the shorthand ๐‘“(๐‘ฅ, ๐‘ฆ)for (๐‘“ ๐‘ฅ) ๐‘ฆ. This plays nicely with the โ€œCurryingโ€ transformation ofsimulating multi-input functions using ฮป expressions.

Equivalence of λ expressions. As we have seen in Solved Exercise 8.1, the rule that (λx.exp) exp′ is equivalent to exp[x → exp′] enables us to modify λ expressions and obtain simpler equivalent forms for them. Another rule that we can use is that the parameter name does not matter and hence for example λy.y is the same as λz.z. Together these rules define the notion of equivalence of λ expressions:

Definition 8.13 — Equivalence of λ expressions. Two λ expressions are equivalent if they can be made into the same expression by repeated applications of the following rules:

1. Evaluation (aka β reduction): The expression (λx.exp) exp′ is equivalent to exp[x → exp′].

2. Variable renaming (aka α conversion): The expression λx.exp is equivalent to λy.exp[x → y].

If ๐‘’๐‘ฅ๐‘ is a ฮป expression of the form ๐œ†๐‘ฅ.๐‘’๐‘ฅ๐‘โ€ฒ then it naturally corre-sponds to the function that maps any input ๐‘ง to ๐‘’๐‘ฅ๐‘โ€ฒ[๐‘ฅ โ†’ ๐‘ง]. Hencethe ฮป calculus naturally implies a computational model. Since in the ฮปcalculus the inputs can themselves be functions, we need to decide inwhat order we evaluate an expression such as(๐œ†๐‘ฅ.๐‘“)(๐œ†๐‘ฆ.๐‘”๐‘ง) . (8.9)There are two natural conventions for this:

• Call by name (aka "lazy evaluation"): We evaluate (8.9) by first plugging in the righthand expression (λy.g z) as input to the lefthand side function, obtaining f[x → (λy.g z)] and then continue from there.

• Call by value (aka "eager evaluation"): We evaluate (8.9) by first evaluating the righthand side and obtaining h = g[y → z], and then plugging this into the lefthand side to obtain f[x → h].

Because the λ calculus has only pure functions, that do not have

"side effects", in many cases the order does not matter. In fact, it can be shown that if we obtain a definite irreducible expression (for example, a number) in both strategies, then it will be the same one.
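Python itself evaluates eagerly (call by value), but call by name can be imitated by passing arguments as zero-argument functions ("thunks") that are only called when needed. This hedged sketch, with names of our own choosing, illustrates the difference:

```python
def diverge():
    # A computation with no value: unbounded recursion.
    # (In Python it eventually aborts with a RecursionError.)
    return diverge()

# Call by value would evaluate the argument first and never finish:
#   (lambda x: 1)(diverge())
# Call by name passes the argument unevaluated, as a thunk; since the
# constant function never uses its argument, the thunk is never forced:
const_one = lambda x: 1
result = const_one(lambda: diverge())  # result is 1
```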


However, for concreteness we will always use the "call by name" (i.e., lazy evaluation) order. (The same choice is made in the programming language Haskell, though many other programming languages use eager evaluation.) Formally, the evaluation of a λ expression using "call by name" is captured by the following process:

Definition 8.14 — Simplification of λ expressions. Let e be a λ expression. The simplification of e is the result of the following recursive process:

1. If e is a single variable x then the simplification of e is e.

2. If e has the form e = λx.e′ then the simplification of e is λx.f′ where f′ is the simplification of e′.

3. (Evaluation / β reduction.) If e has the form e = (λx.e′ e″) then the simplification of e is the simplification of e′[x → e″], which denotes replacing all copies of x in e′ bound to the λ operator with e″.

4. (Renaming / α conversion.) The canonical simplification of e is obtained by taking the simplification of e and renaming the variables so that the first bound variable in the expression is v_0, the second one is v_1, and so on and so forth.

We say that two λ expressions e and e′ are equivalent, denoted by e ≅ e′, if they have the same canonical simplification.

Solved Exercise 8.2 — Equivalence of λ expressions. Prove that the following two expressions e and f are equivalent:

๐‘’ = ๐œ†๐‘ฅ.๐‘ฅ (8.10)

๐‘“ = (๐œ†๐‘Ž.(๐œ†๐‘.๐‘))(๐œ†๐‘ง.๐‘ง๐‘ง) (8.11)

Solution:

The canonical simplification of e is simply λv_0.v_0. To do the canonical simplification of f we first use β reduction to plug in λz.z z instead of a in (λb.b), but since a is not used in this function at all, we simply obtain λb.b, which simplifies to λv_0.v_0 as well.


² In Lisp, the PAIR, HEAD and TAIL functions are traditionally called cons, car and cdr.

8.5.4 Infinite loops in the λ calculus

Like Turing machines and NAND-TM programs, the simplification process in the λ calculus can also enter into an infinite loop. For example, consider the λ expression

λx.x x λx.x x   (8.12)

If we try to simplify (8.12) by invoking the lefthand function on the righthand one, then we get another copy of (8.12) and hence this never ends. There are examples where the order of evaluation can matter for whether or not an expression can be simplified, see Exercise 8.9.
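The same self-application can be written with Python lambdas. Since each β reduction step reproduces the expression, evaluating it recurses without end; Python, being eager and stack-based, aborts with a RecursionError rather than literally running forever (this translation is our own illustration):

```python
# (λx.x x)(λx.x x): applying the lefthand copy to the righthand one
# yields the same expression again, so evaluation never produces a value.
loop = lambda x: x(x)
try:
    loop(loop)
    diverged = False
except RecursionError:
    diverged = True   # Python gives up after exhausting its stack
```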

8.6 THE "ENHANCED" λ CALCULUS

We now discuss the λ calculus as a computational model. We will start by describing an "enhanced" version of the λ calculus that contains some "superfluous features" but is easier to wrap your head around. We will first show how the enhanced λ calculus is equivalent to Turing machines in computational power. Then we will show how all the features of the "enhanced λ calculus" can be implemented as "syntactic sugar" on top of the "pure" (i.e., non enhanced) λ calculus. Hence the pure λ calculus is equivalent in power to Turing machines (and hence also to RAM machines and all other Turing-equivalent models).

The enhanced λ calculus includes the following set of objects and operations:

• Boolean constants and IF function: There are λ expressions 0, 1 and IF that satisfy the following conditions: for every λ expression e and f, IF 1 e f = e and IF 0 e f = f. That is, IF is the function that given three arguments a, e, f outputs e if a = 1 and f if a = 0.

• Pairs: There is a λ expression PAIR which we will think of as the pairing function. For every λ expressions e, f, PAIR e f is the pair ⟨e, f⟩ that contains e as its first member and f as its second member. We also have λ expressions HEAD and TAIL that extract the first and second member of a pair respectively. Hence, for every λ expressions e, f, HEAD (PAIR e f) = e and TAIL (PAIR e f) = f.²

• Lists and strings: There is a λ expression NIL that corresponds to the empty list, which we also denote by ⟨⊥⟩. Using PAIR and NIL we construct lists. The idea is that if L is a k element list of the form ⟨e_1, e_2, …, e_k, ⊥⟩ then for every λ expression e_0 we can obtain the k + 1 element list ⟨e_0, e_1, e_2, …, e_k, ⊥⟩ using the expression PAIR e_0 L. For example, for every three λ expressions e, f, g, the


following corresponds to the three element list ⟨e, f, g, ⊥⟩:

PAIR e (PAIR f (PAIR g NIL)) .   (8.13)

The λ expression ISEMPTY returns 1 on NIL and returns 0 on every other list. A string is simply a list of bits.

• List operations: The enhanced λ calculus also contains the list-processing functions MAP, REDUCE, and FILTER. Given a list L = ⟨x0, …, x_{n−1}, ⊥⟩ and a function f, MAP L f applies f to every member of the list to obtain the new list L′ = ⟨f(x0), …, f(x_{n−1}), ⊥⟩. Given a list L as above and an expression f whose output is either 0 or 1, FILTER L f returns the list ⟨x_i⟩_{f(x_i)=1} containing all the elements of L for which f outputs 1. The function REDUCE applies a "combining" operation to a list. For example, REDUCE L + 0 will return the sum of all the elements in the list L. More generally, REDUCE takes a list L, an operation f (which we think of as taking two arguments) and a λ expression z (which we think of as the "neutral element" for the operation f, such as 0 for addition and 1 for multiplication). The output is defined via

REDUCE L f z = { z                                     if L = NIL
               { f (HEAD L) (REDUCE (TAIL L) f z)      otherwise.   (8.14)

See Fig. 8.16 for an illustration of the three list-processing operations.

• Recursion: Finally, we want to be able to execute recursive functions. Since in the λ calculus functions are anonymous, we can't write a definition of the form f(x) = blah where blah includes calls to f. Instead we use functions f that take an additional input me as a parameter. The operator RECURSE will take such a function f as input and return a "recursive version" of f where all the calls to me are replaced by recursive calls to this function. That is, if we have a function F taking two parameters me and x, then RECURSE F will be the function f taking one parameter x such that f(x) = F(f, x) for every x.
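To make the semantics of the list operations above concrete, here is a small Python sketch (hypothetical illustration, not code from the book) that models a list ⟨x0, …, x_{n−1}, ⊥⟩ as nested Python pairs ending in NIL, and implements REDUCE, MAP and FILTER on top of that representation:

```python
# Sketch of the enhanced λ calculus list operations, modeling the
# list ⟨x0, …, x_{n-1}, ⊥⟩ as nested Python pairs ending in NIL.
NIL = None

def PAIR(h, t): return (h, t)
def HEAD(p): return p[0]
def TAIL(p): return p[1]
def ISEMPTY(p): return p is NIL

def REDUCE(L, f, z):
    # REDUCE L f z = z if L = NIL, else f (HEAD L) (REDUCE (TAIL L) f z)
    return z if ISEMPTY(L) else f(HEAD(L), REDUCE(TAIL(L), f, z))

def MAP(L, f):
    # Build the new list by folding PAIR over the mapped elements.
    return REDUCE(L, lambda x, rest: PAIR(f(x), rest), NIL)

def FILTER(L, f):
    # Keep x only when f(x) outputs 1.
    return REDUCE(L, lambda x, rest: PAIR(x, rest) if f(x) == 1 else rest, NIL)

L = PAIR(1, PAIR(2, PAIR(3, NIL)))        # the list ⟨1, 2, 3, ⊥⟩
print(REDUCE(L, lambda a, b: a + b, 0))   # sum of the list: 6
print(MAP(L, lambda x: x * x))            # ⟨1, 4, 9, ⊥⟩ as (1, (4, (9, None)))
```

Note that MAP and FILTER are themselves written in terms of REDUCE here, previewing the reductions between these operations carried out later in the chapter.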

Solved Exercise 8.3 — Compute NAND using λ calculus. Give a λ expression N such that N x y = NAND(x, y) for every x, y ∈ {0, 1}.


equivalent models of computation 307

Solution:

The NAND of x, y is equal to 1 unless x = y = 1. Hence we can write

N = λx, y. IF(x, IF(y, 0, 1), 1)   (8.15)
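The logic of this solution can be checked by transcribing it into Python, with IF modeled as an ordinary function (a sketch for verification, not the book's code):

```python
# Check of the λ expression N = λx,y. IF(x, IF(y, 0, 1), 1),
# with IF modeled as a plain Python function on bits.
def IF(cond, e, f):
    return e if cond == 1 else f

def N(x, y):
    return IF(x, IF(y, 0, 1), 1)

for x in [0, 1]:
    for y in [0, 1]:
        # Agrees with NAND: outputs 1 unless x = y = 1.
        print(x, y, N(x, y))
```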

Solved Exercise 8.4 — Compute XOR using λ calculus. Give a λ expression XOR such that for every list L = ⟨x0, …, x_{n−1}, ⊥⟩ where x_i ∈ {0, 1} for i ∈ [n], XOR L evaluates to ∑ x_i mod 2.

Solution:

First, we note that we can compute XOR of two bits as follows:

NOT = λa. IF(a, 0, 1)   (8.16)

and

XOR2 = λa, b. IF(b, NOT(a), a)   (8.17)

(We are using here a bit of syntactic sugar to describe the functions. To obtain the λ expression for XOR we will simply replace the expression (8.16) in (8.17).) Now we can recursively define the XOR of a list as follows:

XOR(๐ฟ) = โŽงโŽจโŽฉ0 ๐ฟ is emptyXOR2(HEAD(๐ฟ),XOR(TAIL(๐ฟ))) otherwise

(8.18)This means that XOR is equal to

RECURSE (๐œ†๐‘š๐‘’, ๐ฟ.IF(ISEMPTY(๐ฟ), 0,XOR2(HEAD ๐ฟ , ๐‘š๐‘’(TAIL ๐ฟ)))) .(8.19)

That is, XOR is obtained by applying the RECURSE operator to the function that on inputs me, L returns 0 if ISEMPTY(L) and otherwise returns XOR2 applied to HEAD(L) and to me(TAIL(L)).

We could also have computed XOR using the REDUCE operation; we leave working this out as an exercise for the reader.
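For instance, one such solution can be sketched in Python, using functools.reduce in place of the enhanced λ calculus REDUCE with neutral element 0 (a hypothetical illustration; XOR is associative and commutative, so the direction of the fold does not matter):

```python
from functools import reduce

# XOR of a list as a REDUCE: combine with XOR2, neutral element 0.
def xor2(a, b):
    return 1 - b if a else b

def xor(L):
    return reduce(xor2, L, 0)

print(xor([0, 1, 1, 0, 0, 1]))  # 1, since the sum 3 is odd
```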

8.6.1 Computing a function in the enhanced λ calculus

An enhanced λ expression is obtained by composing the objects above with the application and abstraction rules. The result of simplifying a λ expression is an equivalent expression, and hence if two expressions have the same simplification then they are equivalent.

Figure 8.15: A list ⟨x0, x1, x2⟩ in the λ calculus is constructed from the tail up, building the pair ⟨x2, NIL⟩, then the pair ⟨x1, ⟨x2, NIL⟩⟩ and finally the pair ⟨x0, ⟨x1, ⟨x2, NIL⟩⟩⟩. That is, a list is a pair where the first element of the pair is the first element of the list and the second element is the rest of the list. The figure on the left renders this "pairs inside pairs" construction, though it is often easier to think of a list as a "chain", as in the figure on the right, where the second element of each pair is thought of as a link, pointer or reference to the remainder of the list.

Figure 8.16: Illustration of the MAP, FILTER and REDUCE operations.

Definition 8.15 — Computing a function via λ calculus. Let F : {0, 1}* → {0, 1}*. We say that a λ expression exp computes F if for every x ∈ {0, 1}*,

exp ⟨x0, …, x_{n−1}, ⊥⟩ ≅ ⟨y0, …, y_{m−1}, ⊥⟩   (8.20)

where n = |x|, y = F(x), and m = |y|, and the notion of equivalence is defined as per Definition 8.14.

8.6.2 Enhanced λ calculus is Turing-complete

The basic operations of the enhanced λ calculus more or less amount to the Lisp or Scheme programming languages. Given that, it is perhaps not surprising that the enhanced λ calculus is equivalent to Turing machines:

Theorem 8.16 — Lambda calculus and NAND-TM. For every function F : {0, 1}* → {0, 1}*, F is computable in the enhanced λ calculus if and only if it is computable by a Turing machine.


Proof Idea:

To prove the theorem, we need to show that (1) if F is computable by a λ calculus expression then it is computable by a Turing machine, and (2) if F is computable by a Turing machine, then it is computable by an enhanced λ calculus expression.

Showing (1) is fairly straightforward. Applying the simplification rules to a λ expression basically amounts to "search and replace", which we can implement easily in, say, NAND-RAM, or for that matter Python (both of which are equivalent to Turing machines in power). Showing (2) essentially amounts to simulating a Turing machine (or writing a NAND-TM interpreter) in a functional programming language such as LISP or Scheme. We give the details below, but working out how this can be done is a good exercise in mastering some functional programming techniques that are useful in their own right.

⋆

Proof of Theorem 8.16. We only sketch the proof. Direction (1) is simple. As mentioned above, evaluating λ expressions basically amounts to "search and replace". It is also a fairly straightforward programming exercise to implement all the above basic operations in an imperative language such as Python or C, and using the same ideas we can do so in NAND-RAM as well, which we can then transform to a NAND-TM program.

For direction (2) we need to simulate a Turing machine using a λ expression. We will do so by first showing that for every Turing machine M there is a λ expression that computes the next-step function NEXT_M : Σ* → Σ* that maps a configuration of M to the next one (see Section 8.4.2).

A configuration of M is a string α ∈ Σ* for a finite set Σ. We can encode every symbol σ ∈ Σ by a string in {0, 1}^ℓ, and so we will encode a configuration α in the λ calculus as a list ⟨α0, α1, …, α_{m−1}, ⊥⟩ where α_i is an ℓ-length string (i.e., an ℓ-length list of 0's and 1's) encoding a symbol in Σ.

By Lemma 8.9, for every α ∈ Σ*, NEXT_M(α)_i is equal to r(α_{i−1}, α_i, α_{i+1}) for some finite function r : Σ³ → Σ. Using our encoding of Σ as {0, 1}^ℓ, we can also think of r as mapping {0, 1}^{3ℓ} to {0, 1}^ℓ. By Solved Exercise 8.3, we can compute the NAND function, and hence every finite function, including r, using the λ calculus. Using this insight, we can compute NEXT_M using the λ calculus as follows. Given a list L encoding the configuration α0 ⋯ α_{m−1}, we define the lists L_prev and L_next encoding the configuration α shifted by one step to the right and left respectively. The next configuration α′ is defined as α′_i = r(L_prev[i], L[i], L_next[i]), where we let L′[i] denote


the ๐‘–-th element of ๐ฟโ€ฒ. This can be computed by recursion (and henceusing the enhanced ฮป calculusโ€™ RECURSE operator) as follows:

Algorithm 8.17 — NEXT_M using the λ calculus.

Input: List L = ⟨α0, α1, …, α_{m−1}, ⊥⟩ encoding a configuration α.
Output: List L′ encoding NEXT_M(α)

1: procedure ComputeNext(L_prev, L, L_next)
2:   if ISEMPTY L_prev then
3:     return NIL
4:   end if
5:   a ← HEAD L_prev
6:   if ISEMPTY L then
7:     b ← ∅   # Encoding of ∅ in {0, 1}^ℓ
8:   else
9:     b ← HEAD L
10:  end if
11:  if ISEMPTY L_next then
12:    c ← ∅
13:  else
14:    c ← HEAD L_next
15:  end if
16:  return PAIR r(a, b, c) ComputeNext(TAIL L_prev, TAIL L, TAIL L_next)
17: end procedure
18: L_prev ← PAIR ∅ L   # L_prev = ⟨∅, α0, …, α_{m−1}, ⊥⟩
19: L_next ← TAIL L     # L_next = ⟨α1, …, α_{m−1}, ⊥⟩
20: return ComputeNext(L_prev, L, L_next)

Once we can compute NEXT_M, we can simulate the execution of M on input x using the following recursion. Define FINAL(α) to be the final configuration of M when initialized at configuration α. The function FINAL can be defined recursively as follows:

FINAL(๐›ผ) = โŽงโŽจโŽฉ๐›ผ ๐›ผ is halting configurationNEXT๐‘€(๐›ผ) otherwise

. (8.21)

Checking whether a configuration is halting (i.e., whether it is one in which the transition function would output Halt) can be easily implemented in the λ calculus, and hence we can use the RECURSE operator to compute FINAL. If we let α0 be the initial configuration of M on input x then we can obtain the output M(x) from FINAL(α0), hence completing the proof.
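In imperative terms, FINAL simply iterates the next-step function until a halting configuration is reached; a minimal Python sketch, with hypothetical next_step and is_halting functions supplied by the caller:

```python
def final(next_step, is_halting, alpha):
    # Iterate the next-step function until a halting configuration.
    # (In the λ calculus this loop is expressed via RECURSE instead.)
    while not is_halting(alpha):
        alpha = next_step(alpha)
    return alpha

# Toy "machine": a configuration is a number, a step decrements it,
# and the machine halts at 0.
print(final(lambda n: n - 1, lambda n: n == 0, 5))  # 0
```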


8.7 FROM ENHANCED TO PURE λ CALCULUS

While the collection of "basic" functions we allowed for the enhanced λ calculus is smaller than what's provided by most Lisp dialects, coming from NAND-TM it still seems a little "bloated". Can we make do with less? In other words, can we find a subset of these basic operations that can implement the rest?

It turns out that there is in fact a proper subset of the operations of the enhanced λ calculus that can be used to implement the rest. That subset is the empty set. That is, we can implement all the operations above using the λ formalism only, even without using 0's and 1's. It's λ's all the way down!

P  This is a good point to pause and think how you would implement these operations yourself. For example, start by thinking how you could implement MAP using REDUCE, and then REDUCE using RECURSE combined with 0, 1, IF, PAIR, HEAD, TAIL, NIL, ISEMPTY. You can also implement PAIR, HEAD and TAIL based on 0, 1, IF. The most challenging part is to implement RECURSE using only the operations of the pure λ calculus.

Theorem 8.18 — Enhanced λ calculus equivalent to pure λ calculus. There are λ expressions that implement the functions 0, 1, IF, PAIR, HEAD, TAIL, NIL, ISEMPTY, MAP, REDUCE, and RECURSE.

The idea behind Theorem 8.18 is that we encode 0 and 1 themselves as λ expressions, and build things up from there. This is known as Church encoding, as it originated with Church in his effort to show that the λ calculus can be a basis for all computation. We will not write the full formal proof of Theorem 8.18 but outline the ideas involved in it:

• We define 0 to be the function that on two inputs x, y outputs y, and 1 to be the function that on two inputs x, y outputs x. We use Currying to achieve the effect of two-input functions and hence 0 = λx.λy.y and 1 = λx.λy.x. (This representation scheme is the common convention for representing false and true, but there are many other alternative representations for 0 and 1 that would have worked just as well.)

• The above implementation makes the IF function trivial: IF(cond, a, b) is simply cond a b since 0 a b = b and 1 a b = a. We can write IF = λx.x to achieve IF(cond, a, b) = (((IF cond) a) b) = cond a b.


• To encode a pair (x, y) we will produce a function f_{x,y} that has x and y "in its belly" and satisfies f_{x,y} g = g x y for every function g. That is, PAIR = λx, y. (λg. g x y). We can extract the first element of a pair p by writing p 1 and the second element by writing p 0, and so HEAD = λp. p 1 and TAIL = λp. p 0.

• We define NIL to be the function that ignores its input and always outputs 1. That is, NIL = λx.1. The ISEMPTY function checks, given an input p, whether we get 1 if we apply p to the function zero = λx, y.0 that ignores both its inputs and always outputs 0. For every valid pair of the form p = PAIR x y, p zero = zero x y = 0, while NIL zero = 1. Formally, ISEMPTY = λp. p(λx, y.0).
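These Church encodings can be transcribed directly into Python lambdas to check the claimed identities (a sketch for experimentation; to_bool is a hypothetical helper for decoding a Church boolean into a Python one):

```python
# Church encodings from the bullet points above, as Python lambdas.
ONE  = lambda x: lambda y: x          # 1 = λx.λy.x
ZERO = lambda x: lambda y: y          # 0 = λx.λy.y
IF   = lambda cond: cond              # IF cond a b is simply cond a b
PAIR = lambda x: lambda y: lambda g: g(x)(y)
HEAD = lambda p: p(ONE)               # extract first element: p 1
TAIL = lambda p: p(ZERO)              # extract second element: p 0
NIL  = lambda x: ONE                  # ignores its input, outputs 1
ISEMPTY = lambda p: p(lambda x: lambda y: ZERO)

to_bool = lambda b: b(True)(False)    # decode a Church boolean

p = PAIR("h")("t")
print(HEAD(p), TAIL(p))                   # h t
print(to_bool(ISEMPTY(NIL)))              # True
print(to_bool(ISEMPTY(PAIR(ONE)(NIL))))   # False
```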

R  Remark 8.19 — Church numerals (optional). There is nothing special about Boolean values. You can use similar tricks to implement natural numbers using λ terms. The standard way to do so is to represent the number n by the function ITER_n that on input a function f outputs the function x ↦ f(f(⋯ f(x))) (n times). That is, we represent the natural number 1 as λf.f, the number 2 as λf.(λx.f(f x)), the number 3 as λf.(λx.f(f(f x))), and so on and so forth. (Note that this is not the same representation we used for 1 in the Boolean context: this is fine; we already know that the same object can be represented in more than one way.) The number 0 is represented by the function that maps any function f to the identity function λx.x. (That is, 0 = λf.(λx.x).)

In this representation, we can compute PLUS(n, m) as λf.λx.(n f)((m f) x) and TIMES(n, m) as λf.n(m f). Subtraction and division are trickier, but can be achieved using recursion. (Working this out is a great exercise.)
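The remark's arithmetic can likewise be checked in Python (a sketch; church and to_int are hypothetical encode/decode helpers, not part of the encoding itself):

```python
# Church numerals: n is the function f ↦ (x ↦ f applied n times to x).
def church(n):
    def numeral(f):
        def iterate(x):
            for _ in range(n):
                x = f(x)
            return x
        return iterate
    return numeral

to_int = lambda c: c(lambda k: k + 1)(0)   # decode by counting applications

PLUS  = lambda n: lambda m: lambda f: lambda x: n(f)(m(f)(x))
TIMES = lambda n: lambda m: lambda f: n(m(f))

print(to_int(PLUS(church(2))(church(3))))   # 5
print(to_int(TIMES(church(2))(church(3))))  # 6
```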

8.7.1 List processing

Now we come to a bigger hurdle, which is how to implement MAP, FILTER, REDUCE and RECURSE in the pure λ calculus. It turns out that we can build MAP and FILTER from REDUCE, and REDUCE from RECURSE. For example MAP(L, f) is the same as REDUCE(L, g) where g is the operation that on input x and y, outputs PAIR(f(x), NIL) if y is NIL and otherwise outputs PAIR(f(x), y). (I leave checking this as a (recommended!) exercise for you, the reader.)
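One way to carry out that check is the following Python sketch (hypothetical code), assuming the two-argument variant of REDUCE with REDUCE(NIL, g) = NIL:

```python
NIL = None

def PAIR(h, t): return (h, t)

def REDUCE(L, g):
    # Two-argument variant: REDUCE(NIL, g) = NIL, and for a non-empty
    # list L = (head, rest), REDUCE(L, g) = g(head, REDUCE(rest, g)).
    if L is NIL:
        return NIL
    head, rest = L
    return g(head, REDUCE(rest, g))

def MAP(L, f):
    # g outputs PAIR(f(x), NIL) if y is NIL and PAIR(f(x), y) otherwise.
    g = lambda x, y: PAIR(f(x), NIL) if y is NIL else PAIR(f(x), y)
    return REDUCE(L, g)

L = PAIR(1, PAIR(2, PAIR(3, NIL)))
print(MAP(L, lambda x: x + 10))   # (11, (12, (13, None)))
```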

We can define REDUCE(L, g) recursively, by setting REDUCE(NIL, g) = NIL and stipulating that given a non-empty list L, which we can think of as a pair (head, rest),


REDUCE(๐ฟ, ๐‘”) = ๐‘”(โ„Ž๐‘’๐‘Ž๐‘‘,REDUCE(๐‘Ÿ๐‘’๐‘ ๐‘ก, ๐‘”))). Thus, we mighttry to write a recursive ฮป expression for REDUCE as follows

REDUCE = ๐œ†๐ฟ, ๐‘”.IF(ISEMPTY(๐ฟ),NIL, ๐‘”HEAD(๐ฟ)REDUCE(TAIL(๐ฟ), ๐‘”)) .(8.22)

The only fly in this ointment is that the λ calculus does not have the notion of recursion, and so this is an invalid definition. But of course we can use our RECURSE operator to solve this problem. We will replace the recursive call to "REDUCE" with a call to a function me that is given as an extra argument, and then apply RECURSE to this. Thus REDUCE = RECURSE myREDUCE where

๐‘š๐‘ฆ๐‘…๐ธ๐ท๐‘ˆ๐ถ๐ธ = ๐œ†๐‘š๐‘’, ๐ฟ, ๐‘”.IF(ISEMPTY(๐ฟ),NIL, ๐‘”HEAD(๐ฟ)๐‘š๐‘’(TAIL(๐ฟ), ๐‘”)) .(8.23)

8.7.2 The Y combinator, or recursion without recursion

Eq. (8.23) means that implementing MAP, FILTER, and REDUCE boils down to implementing the RECURSE operator in the pure λ calculus. This is what we do now.

How can we implement recursion without recursion? We will illustrate this using a simple example: the XOR function. As shown in Solved Exercise 8.4, we can write the XOR function of a list recursively as follows:

XOR(๐ฟ) = โŽงโŽจโŽฉ0 ๐ฟ is emptyXOR2(HEAD(๐ฟ),XOR(TAIL(๐ฟ))) otherwise

(8.24)

where XOR2 : {0, 1}² → {0, 1} is the XOR on two bits. In Python we would write this as

def xor2(a,b): return 1-b if a else b
def head(L): return L[0]
def tail(L): return L[1:]
def xor(L): return xor2(head(L),xor(tail(L))) if L else 0

print(xor([0,1,1,0,0,1]))
# 1

Now, how could we eliminate this recursive call? The main idea is that since functions can take other functions as input, it is perfectly legal in Python (and the λ calculus of course) to give a function itself as input. So, our idea is to try to come up with a non-recursive function tempxor that takes two inputs: a function and a list, and such that tempxor(tempxor,L) will output the XOR of L!


P  At this point you might want to stop and try to implement this on your own in Python or any other programming language of your choice (as long as it allows functions as inputs).

Our first attempt might be to simply use the idea of replacing the recursive call by me. Let's define this function as myxor:

def myxor(me,L): return xor2(head(L),me(tail(L))) if L else 0

Let's test this out:

myxor(myxor,[1,0,1])

If you do this, you will get the following complaint from the interpreter:

TypeError: myxor() missing 1 required positional argument

The problem is that myxor expects two inputs - a function and a list - while in the call to me we only provided a list. To correct this, we modify the call to also provide the function itself:

def tempxor(me,L): return xor2(head(L),me(me,tail(L))) if L else 0

Note the call me(me,..) in the definition of tempxor: given a function me as input, tempxor will actually call the function me with itself as the first input. If we test this out now, we see that we actually get the right result!

tempxor(tempxor,[1,0,1])
# 0
tempxor(tempxor,[1,0,1,1])
# 1

and so we can define xor(L) as simply return tempxor(tempxor,L).

The approach above is not specific to XOR. Given a recursive function f that takes an input x, we can obtain a non-recursive version as follows:

1. Create the function myf that takes a pair of inputs me and x, and replaces recursive calls to f with calls to me.

2. Create the function tempf that converts calls in myf of the form me(x) to calls of the form me(me,x).


3 Because of specific issues of Python syntax, in this implementation we use f * g for applying f to g rather than f g, and use λx(exp) rather than λx.exp for abstraction. We also use _0 and _1 for the λ terms for 0 and 1, so as not to confuse them with the Python constants.

3. The function f(x) will be defined as tempf(tempf,x).

Here is the way we implement the RECURSE operator in Python. It will take a function myf as above, and replace it with a function g such that g(x)=myf(g,x) for every x.

def RECURSE(myf):
    def tempf(me,x): return myf(lambda y: me(me,y),x)
    return lambda x: tempf(tempf,x)

xor = RECURSE(myxor)

print(xor([0,1,1,0,0,1]))
# 1

print(xor([1,1,0,0,1,1,1,1]))
# 0

From Python to the λ calculus. In the λ calculus, a two-input function g that takes a pair of inputs me, y is written as λme.(λy.g). So the function y ↦ me(me, y) is simply written as me me, and similarly the function x ↦ tempf(tempf, x) is simply tempf tempf. (Can you see why?) Therefore the function tempf defined above can be written as λ me. myf(me me). This means that if we denote the input of RECURSE by f, then RECURSE f = tempf tempf where tempf = λm.f(m m), or in other words

RECURSE = ๐œ†๐‘“.((๐œ†๐‘š.๐‘“(๐‘š ๐‘š)) (๐œ†๐‘š.๐‘“(๐‘š ๐‘š))) (8.25)

The online appendix contains an implementation of the λ calculus using Python. Here is an implementation of the recursive XOR function from that appendix:3

# XOR of two bits
XOR2 = λ(a,b)(IF(a,IF(b,_0,_1),b))

# Recursive XOR with recursive calls replaced by the m parameter
myXOR = λ(m,l)(IF(ISEMPTY(l),_0,XOR2(HEAD(l),m(TAIL(l)))))

# Recurse operator (aka Y combinator)
RECURSE = λf((λm(f(m*m)))(λm(f(m*m))))

# XOR function


XOR = RECURSE(myXOR)

#TESTING:

XOR(PAIR(_1,NIL)) # List [1]
# equals 1

XOR(PAIR(_1,PAIR(_0,PAIR(_1,NIL)))) # List [1,0,1]
# equals 0

R  Remark 8.20 — The Y combinator. The RECURSE operator above is better known as the Y combinator. It is one of a family of fixed-point operators that, given a lambda expression F, find a fixed point f of F such that f = F f. If you think about it, XOR is the fixed point of myXOR above: XOR is the function such that if we plug in XOR as the first argument of myXOR then we get back XOR, or in other words XOR = myXOR XOR. Hence finding a fixed point for myXOR is the same as applying RECURSE to it.
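Concretely, the fixed-point property can be verified in two β-reduction steps from Eq. (8.25):

\begin{aligned}
\mathrm{RECURSE}\ f &= \bigl(\lambda m.\, f(m\ m)\bigr)\ \bigl(\lambda m.\, f(m\ m)\bigr) \\
&= f\Bigl(\bigl(\lambda m.\, f(m\ m)\bigr)\ \bigl(\lambda m.\, f(m\ m)\bigr)\Bigr) \\
&= f\ (\mathrm{RECURSE}\ f)\,,
\end{aligned}

so RECURSE f is indeed a fixed point of f.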

8.8 THE CHURCH-TURING THESIS (DISCUSSION)

"[In 1934], Church had been speculating, and finally definitely proposed, that the λ-definable functions are all the effectively calculable functions …. When Church proposed this thesis, I sat down to disprove it … but, quickly realizing that [my approach failed], I became overnight a supporter of the thesis.", Stephen Kleene, 1979.

"[The thesis is] not so much a definition or an axiom but … a natural law.", Emil Post, 1936.

We have defined functions to be computable if they can be computed by a NAND-TM program, and we've seen that the definition would remain the same if we replaced NAND-TM programs by Python programs, Turing machines, λ calculus, cellular automata, and many other computational models. The Church-Turing thesis is that this is the only sensible definition of "computable" functions. Unlike the "Physical Extended Church-Turing Thesis" (PECTT) which we saw before, the Church-Turing thesis does not make a concrete physical prediction that can be experimentally tested, but it certainly motivates predictions such as the PECTT. One can think of the Church-Turing Thesis as either advocating a definitional choice, making some prediction about all potential computing devices, or suggesting some laws of nature that constrain the natural world. In Scott Aaronson's


words, "whatever it is, the Church-Turing thesis can only be regarded as extremely successful". No candidate computing device (including quantum computers, and also much less reasonable models such as the hypothetical "closed time curve" computers we mentioned before) has so far mounted a serious challenge to the Church-Turing thesis. These devices might potentially make some computations more efficient, but they do not change the difference between what is finitely computable and what is not. (The extended Church-Turing thesis, which we discuss in Section 13.3, stipulates that Turing machines capture also the limit of what can be efficiently computed. Just like its physical version, quantum computing presents the main challenge to this thesis.)

8.8.1 Different models of computation

We can summarize the models we have seen in the following table:

Table 8.1: Different models for computing finite functions and functions with arbitrary input length.

Computational problems                               | Type of model                                                 | Examples
Finite functions f : {0,1}^n → {0,1}^m               | Non-uniform computation (algorithm depends on input length)   | Boolean circuits, NAND circuits, straight-line programs (e.g., NAND-CIRC)
Functions with unbounded inputs F : {0,1}* → {0,1}*  | Sequential access to memory                                   | Turing machines, NAND-TM programs
–                                                    | Indexed access / RAM                                          | RAM machines, NAND-RAM, modern programming languages
–                                                    | Other                                                         | Lambda calculus, cellular automata

Later on, in Chapter 17, we will study memory-bounded computation. It turns out that NAND-TM programs with a constant amount of memory are equivalent to the model of finite automata (the adjectives "deterministic" or "nondeterministic" are sometimes added as well; this model is also known as finite state machines), which in turn captures the notion of regular languages (those that can be described by regular expressions), a concept we will see in Chapter 10.


Chapter Recap

• While we defined computable functions using Turing machines, we could just as well have done so using many other models, including not just NAND-TM programs but also RAM machines, NAND-RAM, the λ-calculus, cellular automata and many other models.

• Very simple models turn out to be "Turing complete" in the sense that they can simulate arbitrarily complex computation.

8.9 EXERCISES

Exercise 8.1 — Alternative proof for TM/RAM equivalence. Let SEARCH : {0, 1}* → {0, 1}* be the following function. The input is a pair (L, k) where k ∈ {0, 1}* and L is an encoding of a list of key-value pairs (k0, v0), …, (k_{m−1}, v_{m−1}) where k0, …, k_{m−1}, v0, …, v_{m−1} are binary strings. The output is v_i for the smallest i such that k_i = k, if such i exists, and otherwise the empty string.

1. Prove that SEARCH is computable by a Turing machine.

2. Let UPDATE(L, k, v) be the function whose input is a list L of pairs, and whose output is the list L′ obtained by prepending the pair (k, v) to the beginning of L. Prove that UPDATE is computable by a Turing machine.

3. Suppose we encode the configuration of a NAND-RAM program by a list L of key/value pairs, where the key is either the name of a scalar variable foo or of the form Bar[<num>] for some number <num>, and the list contains all the nonzero values of variables. Let NEXT(L) be the function that maps a configuration of a NAND-RAM program at one step to the configuration in the next step. Prove that NEXT is computable by a Turing machine (you don't have to implement each one of the arithmetic operations: it is enough to implement addition and multiplication).

4. Prove that for every F : {0, 1}* → {0, 1}* that is computable by a NAND-RAM program, F is computable by a Turing machine.

Exercise 8.2 — NAND-TM lookup. This exercise shows part of the proof that NAND-TM can simulate NAND-RAM. Produce the code of a NAND-TM program that computes the function LOOKUP : {0, 1}* → {0, 1} that is defined as follows. On input pf(i)x, where pf(i) denotes a prefix-free encoding of an integer i, LOOKUP(pf(i)x) = x_i if i < |x|


4 You don't have to give a full description of a Turing machine: use our "have the cake and eat it too" paradigm to show the existence of such a machine by arguing from more powerful equivalent models.

5 The same hint as in Exercise 8.4 applies. Note that for showing that LONGPATH is computable you don't have to give an efficient algorithm.

and LOOKUP(pf(i)x) = 0 otherwise. (We don't care what LOOKUP outputs on inputs that are not of this form.) You can choose any prefix-free encoding of your choice, and also can use your favorite programming language to produce this code.

Exercise 8.3 — Pairing. Let embed : ℕ² → ℕ be the function defined as embed(x0, x1) = ½(x0 + x1)(x0 + x1 + 1) + x1.

1. Prove that for every x0, x1 ∈ ℕ, embed(x0, x1) is indeed a natural number.

2. Prove that embed is one-to-one.

3. Construct a NAND-TM program P such that for every x0, x1 ∈ ℕ, P(pf(x0)pf(x1)) = pf(embed(x0, x1)), where pf is the prefix-free encoding map defined above. You can use the syntactic sugar for inner loops, conditionals, and incrementing/decrementing the counter.

4. Construct NAND-TM programs P0, P1 such that for every x0, x1 ∈ ℕ and i ∈ {0, 1}, P_i(pf(embed(x0, x1))) = pf(x_i). You can use the syntactic sugar for inner loops, conditionals, and incrementing/decrementing the counter.

Exercise 8.4 — Shortest Path. Let SHORTPATH : {0, 1}* → {0, 1}* be the function that on input a string encoding a triple (G, u, v) outputs a string encoding ∞ if u and v are disconnected in G, or a string encoding the length k of the shortest path from u to v. Prove that SHORTPATH is computable by a Turing machine. See footnote for hint.4

Exercise 8.5 — Longest Path. Let LONGPATH : {0, 1}* → {0, 1}* be the function that on input a string encoding a triple (G, u, v) outputs a string encoding ∞ if u and v are disconnected in G, or a string encoding the length k of the longest simple path from u to v. Prove that LONGPATH is computable by a Turing machine. See footnote for hint.5

Exercise 8.6 — Shortest path λ expression. Let SHORTPATH be as in Exercise 8.4. Prove that there exists a λ expression that computes SHORTPATH. You can use Exercise 8.4.

Exercise 8.7 — Next-step function is local. Prove Lemma 8.9 and use it to complete the proof of Theorem 8.7.


6 Hint: You can reduce the number of variables a function takes by "pairing them up". That is, define a λ expression PAIR such that for every x, y, PAIR x y is some function f such that f 0 = x and f 1 = y. Then use PAIR to iteratively reduce the number of variables used.

7 Use structural induction on the expression e.

8 The name zip is a common name for this operation, for example in Python. It should not be confused with the zip compression file format.

9 Use MAP and REDUCE (and potentially FILTER). You might also find the function zip of Exercise 8.10 useful.

10 Try to set up a procedure such that if array Left contains an encoding of a λ expression λx.e and array Right contains an encoding of another λ expression e′, then the array Result will contain e[x → e′].

Exercise 8.8 — λ calculus requires at most three variables. Prove that for every λ-expression e with no free variables there is an equivalent λ-expression f that only uses the variables x, y, and z.6

Exercise 8.9 — Evaluation order example in λ calculus. 1. Let e = λx.7 ((λx.xx)(λx.xx)). Prove that the simplification process of e ends in a definite number if we use the "call by name" evaluation order, while it never ends if we use the "call by value" order.

2. (bonus, challenging) Let e be any λ expression. Prove that if the simplification process ends in a definite number if we use the "call by value" order, then it also ends in such a number if we use the "call by name" order. See footnote for hint.7

Exercise 8.10 — Zip function. Give an enhanced λ calculus expression to compute the function zip that, on input a pair of lists I and L of the same length n, outputs a list of n pairs M such that the j-th element of M (which we denote by M_j) is the pair (I_j, L_j). Thus zip "zips together" these two lists of elements into a single list of pairs.8

Exercise 8.11 — Next-step function without RECURSE. Let M be a Turing machine. Give an enhanced λ calculus expression to compute the next-step function NEXT_M of M (as in the proof of Theorem 8.16) without using RECURSE. See footnote for hint.9

Exercise 8.12 — λ calculus to NAND-TM compiler (challenging). Give a program in the programming language of your choice that takes as input a λ expression e and outputs a NAND-TM program P that computes the same function as e. For partial credit you can use GOTO and all NAND-CIRC syntactic sugar in your output program. You can use any encoding of λ expressions as binary strings that is convenient for you. See footnote for hint.10

Exercise 8.13 — At least two in λ calculus. Let 1 = λx,y.x and 0 = λx,y.y as before. Define

ALT = λa,b,c.(a (b 1 (c 1 0)) (b c 0))   (8.26)

Prove that ALT is a λ expression that computes the "at least two" function. That is, for every a, b, c ∈ {0,1} (as encoded above), ALT a b c = 1 if and only if at least two of a, b, c are equal to 1.
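Since the encodings 1 = λx,y.x and 0 = λx,y.y simply select their first or second argument, a candidate expression like ALT can be sanity-checked by transcribing it into any language with first-class functions. The Python sketch below is my own illustration (not part of the exercise), and it assumes the intended reading of Eq. (8.26) is λa,b,c.(a (b 1 (c 1 0)) (b c 0)), written in curried form:

```python
# Church booleans: 1 selects its first argument, 0 its second.
ONE = lambda x: lambda y: x
ZERO = lambda x: lambda y: y

# Curried transcription of the candidate expression
# λa,b,c. a (b 1 (c 1 0)) (b c 0)  (my assumed reading of Eq. (8.26)).
ALT = lambda a: lambda b: lambda c: a(b(ONE)(c(ONE)(ZERO)))(b(c)(ZERO))

def encode(bit):
    # Map a Python bit to its Church encoding.
    return ONE if bit else ZERO

def decode(churchbool):
    # Map a Church boolean back to a Python int.
    return churchbool(1)(0)

# Exhaustive sanity check over all 8 inputs (not a proof, of course).
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            want = 1 if a + b + c >= 2 else 0
            assert decode(ALT(encode(a))(encode(b))(encode(c))) == want
```

Note how the expression works: if a = 1 the result is b 1 (c 1 0), i.e. b OR c; if a = 0 the result is b c 0, i.e. b AND c.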


Exercise 8.14 — Locality of next-step function. This question will help you get a better sense of the notion of locality of the next-step function of Turing machines. This locality plays an important role in results such as the Turing completeness of the λ calculus and one-dimensional cellular automata, as well as results such as Gödel's Incompleteness Theorem and the Cook-Levin theorem that we will see later in this course. Define STRINGS to be a programming language that has the following semantics:

• A STRINGS program Q has a single string variable str that is both the input and the output of Q. The program has no loops and no other variables, but rather consists of a sequence of conditional search-and-replace operations that modify str.

• The operations of a STRINGS program are:

– REPLACE(pattern1,pattern2) where pattern1 and pattern2 are fixed strings. This replaces the first occurrence of pattern1 in str with pattern2.

– if search(pattern) code executes code if pattern is a substring of str. The code code can itself include nested if's. (One can also add an else ... to execute if pattern is not a substring of str.)

– The returned value is str.

• A STRINGS program Q computes a function F : {0,1}* → {0,1}* if for every x ∈ {0,1}*, if we initialize str to x and then execute the sequence of instructions in Q, then at the end of the execution str equals F(x).

For example, the following is a STRINGS program that computes the function F : {0,1}* → {0,1}* such that for every x ∈ {0,1}*, if x contains a substring of the form y = 11ab11 where a, b ∈ {0,1}, then F(x) = x′ where x′ is obtained by replacing the first occurrence of y in x with 00.

if search('110011')
    replace('110011','00')
else if search('110111')
    replace('110111','00')
else if search('111011')
    replace('111011','00')
else if search('111111')
    replace('111111','00')
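To make these semantics concrete, here is a short Python sketch of mine (not part of the language definition) of the two primitive operations and of the example program above, assuming patterns are plain substrings:

```python
def replace(s, pat1, pat2):
    # Replace only the first occurrence of pat1 in s, as in REPLACE.
    return s.replace(pat1, pat2, 1)

def search(s, pat):
    # True iff pat is a substring of s, as in "if search(pattern)".
    return pat in s

def example_program(x):
    # The STRINGS program above: replace the first substring of the
    # form 11ab11 (a, b in {0,1}) with 00, leaving other inputs alone.
    s = x
    if search(s, '110011'):
        s = replace(s, '110011', '00')
    elif search(s, '110111'):
        s = replace(s, '110111', '00')
    elif search(s, '111011'):
        s = replace(s, '111011', '00')
    elif search(s, '111111'):
        s = replace(s, '111111', '00')
    return s
```

Note that, like the program it transcribes, this checks the four patterns in a fixed order rather than by their position in str.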


Prove that for every Turing machine M, there exists a STRINGS program Q that computes the NEXT_M function that maps every string encoding a valid configuration of M to the string encoding the configuration of the next step of M's computation. (We don't care what the function will do on strings that do not encode a valid configuration.) You don't have to write the STRINGS program fully, but you do need to give a convincing argument that such a program exists.

8.10 BIBLIOGRAPHICAL NOTES

Chapter 7 in the wonderful book of Moore and Mertens [MM11] contains a great exposition of much of this material.

The RAM model can be very useful in studying the concrete complexity of practical algorithms. Its theoretical study was initiated in [CR73]. However, the exact set of operations that are allowed in the RAM model and their costs vary between texts and contexts. One needs to be careful in making such definitions, especially if the word size grows, as was already shown by Shamir [Sha79]. Chapter 3 in Savage's book [Sav98] contains a more formal description of RAM machines; see also the paper [Hag98]. A study of RAM algorithms that are independent of the input size (known as the "transdichotomous RAM model") was initiated by [FW93].

The models of computation we considered so far are inherently sequential, but these days much computation happens in parallel, whether using multi-core processors or in massively parallel distributed computation in data centers or over the Internet. Parallel computing is important in practice, but it does not really make much difference for the question of what can and can't be computed. After all, if a computation can be performed using m machines in t time, then it can be computed by a single machine in time mt.

The λ-calculus was described by Church in [Chu41]. Pierce's book [Pie02] is a canonical textbook; see also [Bar84]. The "Currying technique" is named after the logician Haskell Curry (the Haskell programming language is named after Haskell Curry as well). Curry himself attributed this concept to Moses Schönfinkel, though for some reason the term "Schönfinkeling" never caught on.

Unlike most programming languages, the pure λ-calculus doesn't have the notion of types. Every object in the λ calculus can also be thought of as a λ expression and hence as a function that takes one input and returns one output. All functions take one input and return one output, and if you feed a function an input of a form it didn't expect, it still evaluates the λ expression via "search and replace", replacing all instances of its parameter with copies of the input expression you fed it. Typed variants of the λ calculus are objects of intense research, and are strongly related to type systems for programming languages and computer-verifiable proof systems; see [Pie02]. Some of the typed variants of the λ calculus do not have infinite loops, which makes them very useful as ways of enabling static analysis of programs as well as computer-verifiable proofs. We will come back to this point in Chapter 10 and Chapter 22.

Tao has proposed showing the Turing completeness of fluid dynamics (a "water computer") as a way of settling the question of the behavior of the Navier-Stokes equations; see this popular article.


9 Universality and uncomputability

"A function of a variable quantity is an analytic expression composed in any way whatsoever of the variable quantity and numbers or constant quantities.", Leonhard Euler, 1748.

"The importance of the universal machine is clear. We do not need to have an infinity of different machines doing different jobs. … The engineering problem of producing various machines for various jobs is replaced by the office work of 'programming' the universal machine", Alan Turing, 1948.

One of the most significant results we showed for Boolean circuits (or equivalently, straight-line programs) is the notion of universality: there is a single circuit that can evaluate all other circuits. However, this result came with a significant caveat. To evaluate a circuit of s gates, the universal circuit needed to use a number of gates larger than s. It turns out that uniform models such as Turing machines or NAND-TM programs allow us to "break out of this cycle" and obtain a truly universal Turing machine U that can evaluate all other machines, including machines that are more complex (e.g., have more states) than U itself. (Similarly, there is a universal NAND-TM program U′ that can evaluate all NAND-TM programs, including programs that have more lines than U′.)

It is no exaggeration to say that the existence of such a universal program/machine underlies the information technology revolution that began in the latter half of the 20th century (and is still ongoing). Up to that point in history, people had produced various special-purpose calculating devices such as the abacus, the slide rule, and machines that compute various trigonometric series. But as Turing (who was perhaps the one to see most clearly the ramifications of universality) observed, a general-purpose computer is much more powerful. Once we build a device that can compute the single universal function, we have the ability, via software, to extend it to do arbitrary computations. For example, if we want to simulate a new Turing machine M, we do not need to build a new physical machine, but rather


Learning Objectives:
• The universal machine/program - "one program to rule them all"
• A fundamental result in computer science and mathematics: the existence of uncomputable functions.
• The halting problem: the canonical example of an uncomputable function.
• Introduction to the technique of reductions.
• Rice's Theorem: A "meta tool" for uncomputability results, and a starting point for much of the research on compilers, programming languages, and software verification.


can represent M as a string (i.e., using code) and then input M to the universal machine U.

Beyond the practical applications, the existence of a universal algorithm also has surprising theoretical ramifications, and in particular can be used to show the existence of uncomputable functions, upending the intuitions of mathematicians over the centuries from Euler to Hilbert. In this chapter we will prove the existence of the universal program, and also show its implications for uncomputability; see Fig. 9.1.

This chapter: A non-mathy overview
In this chapter we will see two of the most important results in Computer Science:

1. The existence of a universal Turing machine: a single algorithm that can evaluate all other algorithms.

2. The existence of uncomputable functions: functions (including the famous "Halting problem") that cannot be computed by any algorithm.

Along the way, we develop the technique of reductions as a way to show hardness of computing a function. A reduction gives a way to compute a certain function using "wishful thinking" and assuming that another function can be computed. Reductions are of course widely used in programming - we often obtain an algorithm for one task by using another task as a "black box" subroutine. However, we will use them in the "contrapositive": rather than using a reduction to show that the former task is "easy", we use them to show that the latter task is "hard". Don't worry if you find this confusing - reductions are initially confusing - but they can be mastered with time and practice.

9.1 UNIVERSALITY OR A META-CIRCULAR EVALUATOR

We start by proving the existence of a universal Turing machine. This is a single Turing machine U that can evaluate arbitrary Turing machines M on arbitrary inputs x, including machines M that can have more states and a larger alphabet than U itself. In particular, U can even be used to evaluate itself! This notion of self-reference will appear time and again in this book, and as we will see, leads to several counter-intuitive phenomena in computing.


Figure 9.1: In this chapter we will show the existence of a universal Turing machine and then use this to derive first the existence of some uncomputable function. We then use this to derive the uncomputability of Turing's famous "halting problem" (i.e., the HALT function), from which a host of other uncomputability results follow. We also introduce reductions, which allow us to use the uncomputability of a function F to derive the uncomputability of a new function G.

Figure 9.2: A universal Turing machine is a single Turing machine U that can evaluate, given as input the (description as a string of an) arbitrary Turing machine M and an input x, the output of M on x. In contrast to the universal circuit depicted in Fig. 5.6, the machine M can be much more complex (e.g., have more states or tape alphabet symbols) than U.

Theorem 9.1 — Universal Turing Machine. There exists a Turing machine U such that on every string M which represents a Turing machine, and every x ∈ {0,1}*, U(M, x) = M(x).

That is, if the machine M halts on x and outputs some y ∈ {0,1}* then U(M, x) = y, and if M does not halt on x (i.e., M(x) = ⊥) then U(M, x) = ⊥.

Big Idea 11 There is a "universal" algorithm that can evaluate arbitrary algorithms on arbitrary inputs.

Proof Idea:

Once you understand what the theorem says, it is not that hard to prove. The desired program U is an interpreter for Turing machines. That is, U gets a representation of the machine M (think of it as source code), and some input x, and needs to simulate the execution of M on x.

Think of how you would code U in your favorite programming language. First, you would need to decide on some representation scheme for M. For example, you can use an array or a dictionary to encode M's transition function. Then you would use some data structure, such as a list, to store the contents of M's tape. Now you can simulate M step by step, updating the data structure as you go along. The interpreter will continue the simulation until the machine halts.

Once you do that, translating this interpreter from your favorite programming language to a Turing machine can be done just as we have seen in Chapter 8. The end result is what's known as a "meta-circular evaluator": an interpreter for a programming language written in the same language. This is a concept that has a long history in computer science starting from the original universal Turing machine. See also Fig. 9.3. ⋆

9.1.1 Proving the existence of a universal Turing machine
To prove (and even properly state) Theorem 9.1, we need to fix some representation for Turing machines as strings. One potential choice for such a representation is to use the equivalence between Turing machines and NAND-TM programs and hence represent a Turing machine M using the ASCII encoding of the source code of the corresponding NAND-TM program P. However, we will use a more direct encoding.

Definition 9.2 — String representation of Turing machines. Let M be a Turing machine with k states and a size-ℓ alphabet Σ = {σ_0, …, σ_{ℓ−1}} (we use the convention σ_0 = 0, σ_1 = 1, σ_2 = ∅, σ_3 = ▷). We represent M as the triple (k, ℓ, T) where T is the table of values for δ_M:

T = (δ_M(0, σ_0), δ_M(0, σ_1), …, δ_M(k−1, σ_{ℓ−1})),   (9.1)

where each value δ_M(s, σ) is a triple (s′, σ′, d) with s′ ∈ [k], σ′ ∈ Σ, and d a number in {0,1,2,3} encoding one of {L, R, S, H}. Thus such a machine M is encoded by a list of 2 + 3k·ℓ natural numbers. The string representation of M is obtained by concatenating prefix-free representations of all these integers. If a string α ∈ {0,1}* does not represent a list of integers in the form above, then we treat it as representing the trivial Turing machine with one state that immediately halts on every input.
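As a small illustration of Definition 9.2 (with helper names of my own, and eliding the final prefix-free string encoding), flattening the triple (k, ℓ, T) yields the promised 2 + 3kℓ numbers:

```python
def represent(k, l, delta):
    # delta maps (state, symbol index) -> (state', symbol index', d),
    # where d in {0,1,2,3} encodes one of L, R, S, H.
    T = []
    for s in range(k):
        for sigma in range(l):
            T.extend(delta[(s, sigma)])  # flatten each triple
    return [k, l] + T  # a list of 2 + 3*k*l natural numbers

# A trivial 1-state machine over the 4-symbol alphabet {0, 1, emptyset,
# start} that halts immediately on every symbol (d = 3 encodes H):
delta = {(0, s): (0, s, 3) for s in range(4)}
rep = represent(1, 4, delta)
assert len(rep) == 2 + 3 * 1 * 4
```

The last step of Definition 9.2, concatenating prefix-free encodings of these integers into one binary string, is omitted here.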

Remark 9.3 — Take-away points of representation. The details of the representation scheme of Turing machines as strings are immaterial for almost all applications. What you need to remember are the following points:

1. We can represent every Turing machine as a string.

2. Given the string representation of a Turing machine M and an input x, we can simulate M's execution on the input x. (This is the content of Theorem 9.1.)

An additional minor issue is that for convenience we make the assumption that every string represents some Turing machine. This is very easy to ensure by just mapping strings that would otherwise not represent a Turing machine into some fixed trivial machine. This assumption is not very important, but does make a few results (such as Rice's Theorem: Theorem 9.15) a little less cumbersome to state.

Using this representation, we can formally prove Theorem 9.1.

Proof of Theorem 9.1. We will only sketch the proof, giving the major ideas. First, we observe that we can easily write a Python program that, on input a representation (k, ℓ, T) of a Turing machine M and an input x, evaluates M on x. Here is the code of this program for concreteness, though you can feel free to skip it if you are not familiar with (or interested in) Python:

def EVAL(δ, x):
    '''Evaluate TM given by transition table δ
       on input x'''
    Tape = ["▷"] + [a for a in x]
    i = 0; s = 0  # i = head pos, s = state
    while True:
        s, Tape[i], d = δ[(s, Tape[i])]
        if d == "H": break
        if d == "L": i = max(i-1, 0)
        if d == "R": i += 1
        if i >= len(Tape): Tape.append('Φ')

    j = 1; Y = []  # produce output
    while Tape[j] != 'Φ':
        Y.append(Tape[j])
        j += 1
    return Y

On input a transition table δ this program will simulate the corresponding machine M step by step, at each point maintaining the invariant that the array Tape contains the contents of M's tape, and the variable s contains M's current state.

The above does not prove the theorem as stated, since we need to show a Turing machine that computes EVAL rather than a Python program. With enough effort, we can translate this Python code line by line to a Turing machine. However, to prove the theorem we don't need to do this, but can use our "eat the cake and have it too" paradigm. That is, while we need to evaluate a Turing machine, in writing the code for the interpreter we are allowed to use a richer model such as NAND-RAM, since it is equivalent in power to Turing machines (per Theorem 8.1).


Translating the above Python code to NAND-RAM is truly straightforward. The only issue is that NAND-RAM doesn't have the dictionary data structure built in, which we have used above to store the transition function δ. However, we can represent a dictionary D of the form {key_0: val_0, …, key_{m−1}: val_{m−1}} as simply a list of pairs. To compute D[key] we can scan over all the pairs until we find one of the form (key, v), in which case we return v. Similarly we scan the list to update the dictionary with a new value, either modifying an existing pair or appending the pair (key, val) at the end.
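The scan-based dictionary described here is just an association list. A minimal Python sketch (with illustrative names of my own) of the two operations:

```python
def dict_get(D, key):
    # Scan all pairs until we find one of the form (key, v).
    for k, v in D:
        if k == key:
            return v
    raise KeyError(key)

def dict_set(D, key, val):
    # Modify an existing pair, or append (key, val) at the end.
    for i, (k, _) in enumerate(D):
        if k == key:
            D[i] = (key, val)
            return
    D.append((key, val))

# For example, storing one transition-table entry and overwriting it:
D = []
dict_set(D, (0, '1'), (1, '0', 'R'))
dict_set(D, (0, '1'), (1, '1', 'L'))
assert dict_get(D, (0, '1')) == (1, '1', 'L') and len(D) == 1
```

Both operations scan the whole list, which is exactly the Ω(m) cost that Remark 9.4 below points out.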

Remark 9.4 — Efficiency of the simulation. The argument in the proof of Theorem 9.1 is a very inefficient way to implement the dictionary data structure in practice, but it suffices for the purpose of proving the theorem. Reading and writing to a dictionary of m values in this implementation takes Ω(m) steps, but it is in fact possible to do this in O(log m) steps using a search tree data structure, or even O(1) steps (for "typical" instances) using a hash table. NAND-RAM and RAM machines correspond to the architecture of modern electronic computers, and so we can implement hash tables and search trees in NAND-RAM just as they are implemented in other programming languages.

The construction above yields a universal Turing machine with a very large number of states. However, since universal Turing machines have such a philosophical and technical importance, researchers have attempted to find the smallest possible universal Turing machines; see Section 9.7.

9.1.2 Implications of universality (discussion)
There is more than one Turing machine U that satisfies the conditions of Theorem 9.1, but the existence of even a single such machine is already extremely fundamental to both the theory and practice of computer science. Theorem 9.1's impact reaches beyond the particular model of Turing machines. Because we can simulate every Turing machine by a NAND-TM program and vice versa, Theorem 9.1 immediately implies there exists a universal NAND-TM program P_U such that P_U(P, x) = P(x) for every NAND-TM program P. We can also "mix and match" models. For example, since we can simulate every NAND-RAM program by a Turing machine, and every Turing machine by the λ calculus, Theorem 9.1 implies that there exists a λ expression e such that for every NAND-RAM program P and input x on which P(x) = y, if we encode (P, x) as a λ-expression f (using the


Figure 9.3: a) A particularly elegant example of a "meta-circular evaluator" comes from John McCarthy's 1960 paper, where he defined the Lisp programming language and gave a Lisp function that evaluates an arbitrary Lisp program (see above). Lisp was not initially intended as a practical programming language and this example was merely meant as an illustration that the Lisp universal function is more elegant than the universal Turing machine. It was McCarthy's graduate student Steve Russell who suggested that it can be implemented. As McCarthy later recalled, "I said to him, ho, ho, you're confusing theory with practice, this eval is intended for reading, not for computing. But he went ahead and did it. That is, he compiled the eval in my paper into IBM 704 machine code, fixing a bug, and then advertised this as a Lisp interpreter, which it certainly was". b) A self-replicating C program from the classic essay of Thompson [Tho84].

λ-calculus encoding of strings as lists of 0's and 1's) then (e f) evaluates to an encoding of y. More generally we can say that for every X and Y in the set {Turing machines, RAM machines, NAND-TM, NAND-RAM, λ-calculus, JavaScript, Python, …} of Turing-equivalent models, there exists a program/machine in X that computes the map (P, x) ↦ P(x) for every program/machine P ∈ Y.

The idea of a "universal program" is of course not limited to theory. For example, compilers for programming languages are often used to compile themselves, as well as programs more complicated than the compiler. (An extreme example of this is Fabrice Bellard's Obfuscated Tiny C Compiler, which is a C program of 2048 bytes that can compile a large subset of the C programming language, and in particular can compile itself.) This is also related to the fact that it is possible to write a program that can print its own source code; see Fig. 9.3. There are universal Turing machines known that require a very small number of states or alphabet symbols, and in particular there is a universal Turing machine (with respect to a particular choice of representing Turing machines as strings) whose tape alphabet is {▷, ∅, 0, 1} and that has fewer than 25 states (see Section 9.7).
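A program that prints its own source code can be surprisingly short. Here is one classic two-line Python example (a standard quine, included as my own illustration rather than one from the text):

```python
# The string s is a template for the whole program: %r inserts the
# quoted string itself, and %% escapes a literal percent sign, so
# printing s % s reproduces the two source lines exactly.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints the program's own two lines of source, comments excluded.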

9.2 IS EVERY FUNCTION COMPUTABLE?

In Theorem 4.12, we saw that NAND-CIRC programs can compute every finite function f : {0,1}^n → {0,1}. Therefore a natural guess is that NAND-TM programs (or equivalently, Turing machines) could compute every infinite function F : {0,1}* → {0,1}. However, this turns out to be false. That is, there exists a function F : {0,1}* → {0,1} that is uncomputable!


The existence of uncomputable functions is quite surprising. Our intuitive notion of a "function" (and the notion most mathematicians had until the 20th century) is that a function f defines some implicit or explicit way of computing the output f(x) from the input x. The notion of an "uncomputable function" thus seems to be a contradiction in terms, and yet the following theorem shows that such creatures do exist:

Theorem 9.5 — Uncomputable functions. There exists a function F* : {0,1}* → {0,1} that is not computable by any Turing machine.

Proof Idea:

The idea behind the proof follows quite closely Cantor's proof that the reals are uncountable (Theorem 2.5), and in fact the theorem can also be obtained fairly directly from that result (see Exercise 7.11). However, it is instructive to see the direct proof. The idea is to construct F* in a way that will ensure that every possible machine M will in fact fail to compute F*. We do so by defining F*(x) to equal 0 if x describes a Turing machine M which satisfies M(x) = 1, and defining F*(x) = 1 otherwise. By construction, if M is any Turing machine and x is the string describing it, then F*(x) ≠ M(x) and therefore M does not compute F*. ⋆

Proof of Theorem 9.5. The proof is illustrated in Fig. 9.4. We start by defining the following function G : {0,1}* → {0,1}:

For every string x ∈ {0,1}*, if x satisfies (1) x is a valid representation of some Turing machine M (per the representation scheme above) and (2) when the machine M is executed on the input x it halts and produces an output, then we define G(x) as the first bit of this output. Otherwise (i.e., if x is not a valid representation of a Turing machine, or the machine M_x never halts on x) we define G(x) = 0. We define F*(x) = 1 − G(x).

We claim that there is no Turing machine that computes F*. Indeed, suppose, towards the sake of contradiction, that there exists a machine M that computes F*, and let x be the binary string that represents the machine M. On one hand, since by our assumption M computes F*, on input x the machine M halts and outputs F*(x). On the other hand, by the definition of F*, since x is the representation of the machine M, F*(x) = 1 − G(x) = 1 − M(x), hence yielding a contradiction.
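A finite analogue of this diagonal argument may help build intuition: if row y of a table records the outputs M_y(0), M_y(1), …, then flipping the diagonal produces a row that disagrees with every row of the table. A small Python sketch of mine:

```python
def diagonal_flip(table):
    # table[y][x] plays the role of M_y(x); the returned row disagrees
    # with row y in position y, for every y, so it equals no row.
    return [1 - table[i][i] for i in range(len(table))]

table = [[0, 1, 1],
         [1, 1, 0],
         [0, 0, 0]]
flipped = diagonal_flip(table)
assert flipped == [1, 0, 1]
assert all(flipped != row for row in table)
```

The proof of Theorem 9.5 applies the same flip to the infinite table of all Turing machines, where no "extra row" can exist.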


Figure 9.4: We construct an uncomputable function by defining for every two strings x, y the value 1 − M_y(x), which equals 0 if the machine described by y outputs 1 on x, and 1 otherwise. We then define F*(x) to be the "diagonal" of this table, namely F*(x) = 1 − M_x(x) for every x. The function F* is uncomputable, because if it were computable by some machine whose string description is x* then we would get that M_{x*}(x*) = F*(x*) = 1 − M_{x*}(x*).

Big Idea 12 There are some functions that cannot be computed by any algorithm.

The proof of Theorem 9.5 is short but subtle. I suggest that you pause here and go back to read it again and think about it - this is a proof that is worth reading at least twice if not three or four times. It is not often the case that a few lines of mathematical reasoning establish a deeply profound fact - that there are problems we simply cannot solve.

The type of argument used to prove Theorem 9.5 is known as diagonalization since it can be described as defining a function based on the diagonal entries of a table as in Fig. 9.4. The proof can be thought of as an infinite version of the counting argument we used for showing lower bounds for NAND-CIRC programs in Theorem 5.3. Namely, we show that it's not possible to compute all functions from {0,1}* to {0,1} by Turing machines simply because there are more such functions than there are Turing machines.

As mentioned in Remark 7.4, many texts use the "language" terminology and so will call a set L ⊆ {0,1}* an undecidable or non-recursive language if the function F : {0,1}* → {0,1} such that F(x) = 1 ↔ x ∈ L is uncomputable.


9.3 THE HALTING PROBLEM

Theorem 9.5 shows that there is some function that cannot be computed. But is this function the equivalent of the "tree that falls in the forest with no one hearing it"? That is, perhaps it is a function that no one actually wants to compute. It turns out that there are natural uncomputable functions:

Theorem 9.6 — Uncomputability of the Halting function. Let HALT : {0,1}* → {0,1} be the function such that for every pair of strings (M, x), HALT(M, x) = 1 if the Turing machine M halts on the input x and HALT(M, x) = 0 otherwise. Then HALT is not computable.

Before turning to prove Theorem 9.6, we note that HALT is a very natural function to want to compute. For example, one can think of HALT as a special case of the task of managing an "App store". That is, given the code of some application, the gatekeeper for the store needs to decide if this code is safe enough to allow in the store or not. At a minimum, it seems that we should verify that the code would not go into an infinite loop.

Proof Idea:
One way to think about this proof is as follows:

Uncomputability of F* + Universality = Uncomputability of HALT   (9.2)

That is, we will use the universal Turing machine that computes EVAL to derive the uncomputability of HALT from the uncomputability of F* shown in Theorem 9.5. Specifically, the proof will be by contradiction. That is, we will assume towards a contradiction that HALT is computable, and use that assumption, together with the universal Turing machine of Theorem 9.1, to derive that F* is computable, which will contradict Theorem 9.5. ⋆

Big Idea 13 If a function F is uncomputable we can show that another function H is uncomputable by giving a way to reduce the task of computing F to computing H.

Proof of Theorem 9.6. The proof will use the previously established result Theorem 9.5. Recall that Theorem 9.5 shows that the following function F* : {0,1}* → {0,1} is uncomputable:

F*(x) = 1 if x(x) = 0, and F*(x) = 0 otherwise,   (9.3)

Page 335: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

universality and uncomputability 335

where ๐‘ฅ(๐‘ฅ) denotes the output of the Turing machine described by thestring ๐‘ฅ on the input ๐‘ฅ (with the usual convention that ๐‘ฅ(๐‘ฅ) = โŠฅ if thiscomputation does not halt).

We will show that the uncomputability of F* implies the uncomputability of HALT. Specifically, we will assume, towards a contradiction, that there exists a Turing machine M that can compute the HALT function, and use that to obtain a Turing machine M′ that computes the function F*. (This is known as a proof by reduction, since we reduce the task of computing F* to the task of computing HALT. By the contrapositive, this means the uncomputability of F* implies the uncomputability of HALT.)

Indeed, suppose that M is a Turing machine that computes HALT. Algorithm 9.7 describes a Turing machine M′ that computes F*. (We use a "high level" description of Turing machines, appealing to the "have your cake and eat it too" paradigm; see Big Idea 10.)

Algorithm 9.7 โ€” ๐น โˆ— to ๐ป๐ด๐ฟ๐‘‡ reduction.

Input: ๐‘ฅ โˆˆ 0, 1โˆ—Output: ๐น โˆ—(๐‘ฅ)1: # Assume T.M. ๐‘€๐ป๐ด๐ฟ๐‘‡ computes ๐ป๐ด๐ฟ๐‘‡2: Let ๐‘ง โ† ๐‘€๐ป๐ด๐ฟ๐‘‡ (๐‘ฅ, ๐‘ฅ). # Assume ๐‘ง = ๐ป๐ด๐ฟ๐‘‡ (๐‘ฅ, ๐‘ฅ).3: if ๐‘ง = 0 then4: return 05: end if6: Let ๐‘ฆ โ† ๐‘ˆ(๐‘ฅ, ๐‘ฅ) # ๐‘ˆ universal TM, i.e., ๐‘ฆ = ๐‘ฅ(๐‘ฅ)7: if ๐‘ฆ = 0 then8: return 19: end if

10: return 0We claim that Algorithm 9.7 computes the function ๐น โˆ—. In-

deed, suppose that ๐‘ฅ(๐‘ฅ) = 0 (and hence ๐น โˆ—(๐‘ฅ) = 1). In thiscase, HALT(๐‘ฅ, ๐‘ฅ) = 1 and hence, under our assumption that๐‘€(๐‘ฅ, ๐‘ฅ) = HALT(๐‘ฅ, ๐‘ฅ), the value ๐‘ง will equal 1, and hence Al-gorithm 9.7 will set ๐‘ฆ = ๐‘ฅ(๐‘ฅ) = 0, and output the correct value1.

Suppose otherwise that ๐‘ฅ(๐‘ฅ) โ‰  0 (and hence ๐น โˆ—(๐‘ฅ) = 0). In thiscase there are two possibilities:

โ€ข Case 1: The machine described by ๐‘ฅ does not halt on the input ๐‘ฅ.In this case, HALT(๐‘ฅ, ๐‘ฅ) = 0. Since we assume that ๐‘€ computesHALT it means that on input ๐‘ฅ, ๐‘ฅ, the machine ๐‘€ must halt andoutput the value 0. This means that Algorithm 9.7 will set ๐‘ง = 0and output 0.


[1] This argument has also been connected to the issues of consciousness and free will. I am personally skeptical of its relevance to these issues. Perhaps the reasoning is that humans have the ability to solve the halting problem but they exercise their free will and consciousness by choosing not to do so.

• Case 2: The machine described by x halts on the input x and outputs some y' ≠ 0. In this case, since HALT(x, x) = 1, under our assumptions, Algorithm 9.7 will set y = y' ≠ 0 and so output 0.

We see that in all cases, M'(x) = F*(x), which contradicts the fact that F* is uncomputable. Hence we reach a contradiction to our original assumption that M computes HALT.

Once again, this is a proof that's worth reading more than once. The uncomputability of the halting problem is one of the fundamental theorems of computer science, and is the starting point for many of the investigations we will see later. An excellent way to get a better understanding of Theorem 9.6 is to go over Section 9.3.2, which presents an alternative proof of the same result.

9.3.1 Is the Halting problem really hard? (discussion)

Many people's first instinct when they see the proof of Theorem 9.6 is to not believe it. That is, most people do believe the mathematical statement, but intuitively it doesn't seem that the Halting problem is really that hard. After all, being uncomputable only means that HALT cannot be computed by a Turing machine.

But programmers seem to solve HALT all the time by informally or formally arguing that their programs halt. It's true that their programs are written in C or Python, as opposed to Turing machines, but that makes no difference: we can easily translate back and forth between this model and any other programming language.

While every programmer encounters at some point an infinite loop, is there really no way to solve the halting problem? Some people argue that they personally can, if they think hard enough, determine whether any concrete program that they are given will halt or not. Some have even argued that humans in general have the ability to do that, and hence humans have inherently superior intelligence to computers or anything else modeled by Turing machines.[1]

The best answer we have so far is that there truly is no way to solve HALT, whether using Macs, PCs, quantum computers, humans, or any other combination of electronic, mechanical, and biological devices. Indeed this assertion is the content of the Church-Turing Thesis. This of course does not mean that for every possible program P it is hard to decide if P enters an infinite loop. Some programs don't even have loops at all (and hence trivially halt), and there are many other far less trivial examples of programs that we can certify to never


Figure 9.5: SMBC's take on solving the Halting problem.

enter an infinite loop (or programs that we know for sure will enter such a loop). However, there is no general procedure that would determine for an arbitrary program P whether it halts or not. Moreover, there are some very simple programs for which no one knows whether they halt or not. For example, the following Python program will halt if and only if Goldbach's conjecture is false:

def isprime(p):
    return all(p % i for i in range(2,p-1))

def Goldbach(n):
    return any( (isprime(p) and isprime(n-p))
                for p in range(2,n-1))

n = 4
while True:
    if not Goldbach(n): break
    n += 2

Given that Goldbach's Conjecture has been open since 1742, it is unclear that humans have any magical ability to say whether this (or other similar programs) will halt or not.
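As a sanity check (and emphatically not a proof of anything), one can confirm that the loop above would run for a very long time: the conjecture holds for all small even numbers. The helper names below are chosen for illustration.

```python
def isprime(p):
    # trial division; the p > 1 guard handles p = 0 and p = 1 correctly
    return p > 1 and all(p % i for i in range(2, p))

def goldbach_holds(n):
    # is n a sum of two primes?
    return any(isprime(p) and isprime(n - p) for p in range(2, n - 1))

# every even number 4 <= n < 2000 is a sum of two primes
print(all(goldbach_holds(n) for n in range(4, 2000, 2)))  # True
```

So the program in the text, started at n = 4, will certainly not break out of its loop before n reaches 2000.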

9.3.2 A direct proof of the uncomputability of HALT (optional)

It turns out that we can combine the ideas of the proofs of Theorem 9.5 and Theorem 9.6 to obtain a short proof of the latter theorem that does not appeal to the uncomputability of F*. This short proof appeared in print in a 1965 letter by Christopher Strachey to the editor of The Computer Journal:

To the Editor, The Computer Journal.

An Impossible Program

Sir,

A well-known piece of folk-lore among programmers holds that it is impossible to write a program which can examine any other program and tell, in every case, if it will terminate or get into a closed loop when it is run. I have never actually seen a proof of this in print, and though Alan Turing once gave me a verbal proof (in a railway carriage on the way to a Conference at the NPL in 1953), I unfortunately and promptly forgot the details. This left me with an uneasy feeling that the proof must be long or complicated, but in fact it is so short and simple that it may be of interest to casual readers. The version below uses CPL, but not in any essential way.

Suppose T[R] is a Boolean function taking a routine (or program) R with no formal or free variables as its argument and that for all R, T[R] = True if R terminates if run and that T[R] = False if R does not terminate.

Consider the routine P defined as follows


rec routine P
§L: if T[P] go to L
Return §

If T[P] = True the routine P will loop, and it will only terminate if T[P] = False. In each case "T[P]" has exactly the wrong value, and this contradiction shows that the function T cannot exist.

Yours faithfully,
C. Strachey
Churchill College, Cambridge

Try to stop and extract the argument for proving Theorem 9.6 from the letter above.

Since CPL is not as common today, let us reproduce this proof. The idea is the following: suppose for the sake of contradiction that there exists a program T such that T(f,x) equals True iff f halts on input x. (Strachey's letter considers the no-input variant of HALT, but as we'll see, this is an immaterial distinction.) Then we can construct a program P and an input x such that T(P,x) gives the wrong answer. The idea is that on input x, the program P will do the following: run T(x,x), and if the answer is True then go into an infinite loop, and otherwise halt. Now you can see that T(P,P) will give the wrong answer: if P halts when it gets its own code as input, then T(P,P) is supposed to be True, but then P(P) will go into an infinite loop. And if P does not halt, then T(P,P) is supposed to be False but then P(P) will halt. We can also code this up in Python:

def CantSolveMe(T):
    """Gets function T that claims to solve HALT.
    Returns a pair (P,x) of code and input on which
    T(P,x) ≠ HALT(x)"""
    def fool(x):
        if T(x,x):
            while True: pass
        return "I halted"

    return (fool, fool)

For example, consider the following naive Python program T that guesses that a given function does not halt if its source contains while or for:


def T(f,x):
    """Crude halting tester - decides that f doesn't halt
    if its source contains a loop."""
    import inspect
    source = inspect.getsource(f)
    if "while" in source: return False
    if "for" in source: return False
    return True

If we now set (f,x) = CantSolveMe(T), then T(f,x)=False but f(x) does in fact halt. This is of course not specific to this particular T: for every program T, if we run (f,x) = CantSolveMe(T) then we'll get an input on which T gives the wrong answer to HALT.
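The demonstration above relies on inspect being able to find the functions' source on disk. Here is a self-contained variant of the same experiment, in which "programs" are source strings and a minimal evaluator plays the role of running them; the names T, fool_src, and run are all illustrative.

```python
def T(src, x):
    # Crude halting "tester" on program text: guesses "does not halt"
    # whenever the source contains a loop keyword.
    return "while" not in src and "for" not in src

# A program that asks T about its own source and does the opposite:
fool_src = '''
if T(src, src):
    while True:   # T predicted halting, so loop forever
        pass
result = "I halted"  # T predicted looping, so halt
'''

def run(src):
    # Minimal evaluator: execute the program with T and its own source in scope.
    env = {"T": T, "src": src}
    exec(src, env)   # would loop forever if the program loops
    return env["result"]

print(T(fool_src, fool_src))  # False: T guesses that fool_src loops forever
print(run(fool_src))          # I halted: T's guess is wrong
```

Because fool_src contains the word "while", T predicts it loops; but fool_src branches on that very prediction and therefore halts.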

9.4 REDUCTIONS

The Halting problem turns out to be a linchpin of uncomputability, in the sense that Theorem 9.6 has been used to show the uncomputability of a great many interesting functions. We will see several examples of such results in this chapter and the exercises, but there are many more such results (see Fig. 9.6).

Figure 9.6: Some uncomputability results. An arrow from problem X to problem Y means that we use the uncomputability of X to prove the uncomputability of Y by reducing computing X to computing Y. All of these results except for the MRDP Theorem appear in either the text or exercises. The Halting Problem HALT serves as our starting point for all these uncomputability results as well as many others.

The idea behind such uncomputability results is conceptually simple but can at first be quite confusing. If we know that HALT is uncomputable, and we want to show that some other function BLAH is uncomputable, then we can do so via a contrapositive argument (i.e., proof by contradiction). That is, we show that if there exists a Turing machine that computes BLAH then there exists a Turing machine that computes HALT. (Indeed, this is exactly how we showed that HALT itself is uncomputable, by reducing this fact to the uncomputability of the function F* from Theorem 9.5.)


For example, to prove that BLAH is uncomputable, we could show that there is a computable function R : {0,1}* → {0,1}* such that for every pair M and x, HALT(M, x) = BLAH(R(M, x)). The existence of such a function R implies that if BLAH was computable then HALT would be computable as well, hence leading to a contradiction! The confusing part about reductions is that we are assuming something we believe is false (that BLAH has an algorithm) to derive something that we know is false (that HALT has an algorithm). Michael Sipser describes such results as having the form "If pigs could whistle then horses could fly".

A reduction-based proof has two components. For starters, since we need R to be computable, we should describe the algorithm to compute it. The algorithm to compute R is known as a reduction, since the transformation R modifies an input to HALT into an input to BLAH, and hence reduces the task of computing HALT to the task of computing BLAH. The second component of a reduction-based proof is the analysis of the algorithm R: namely, a proof that R does indeed satisfy the desired properties.

Reduction-based proofs are just like other proofs by contradiction, but the fact that they involve hypothetical algorithms that don't really exist tends to make reductions quite confusing. The one silver lining is that at the end of the day the notion of reductions is mathematically quite simple, and so it's not that bad even if you have to go back to first principles every time you need to remember which direction a reduction should go in.

Remark 9.8 — Reductions are algorithms. A reduction is an algorithm, which means that, as discussed in Remark 0.3, a reduction has three components:

• Specification (what): In the case of a reduction from HALT to BLAH, the specification is that the function R : {0,1}* → {0,1}* should satisfy HALT(M, x) = BLAH(R(M, x)) for every Turing machine M and input x. In general, to reduce a function F to G, the reduction should satisfy F(w) = G(R(w)) for every input w to F.

• Implementation (how): The algorithm's description: the precise instructions for how to transform an input w into the output R(w).

• Analysis (why): A proof that the algorithm meets the specification. In particular, in a reduction from F to G this is a proof that for every input w, the output y of the algorithm satisfies F(w) = G(y).
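In code, the specification of Remark 9.8 is just function composition: F is computed by feeding the output of the computable map R into the (hypothetical) algorithm for G. The toy instantiation below is purely illustrative.

```python
def reduce_via(R, G):
    """The shape shared by every reduction in this chapter:
    F(w) = G(R(w)), where R is computable and G is hypothetical."""
    def F(w):
        return G(R(w))
    return F

# Toy instantiation: reduce "does w have even length?" to
# "is n even?" via the computable map w -> len(w).
is_even = lambda n: n % 2 == 0
length_is_even = reduce_via(len, is_even)
print(length_is_even("abcd"), length_is_even("abc"))  # True False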


9.4.1 Example: Halting on the zero problem

Here is a concrete example of a proof by reduction. We define the function HALTONZERO : {0,1}* → {0,1} as follows. Given any string M, HALTONZERO(M) = 1 if and only if M describes a Turing machine that halts when it is given the string 0 as input. A priori HALTONZERO seems like a potentially easier function to compute than the full-fledged HALT function, and so we could perhaps hope that it is not uncomputable. Alas, the following theorem shows that this is not the case:

Theorem 9.9 — Halting without input. HALTONZERO is uncomputable.

The proof of Theorem 9.9 is below, but before reading it you might want to pause for a couple of minutes and think how you would prove it yourself. In particular, try to think of what a reduction from HALT to HALTONZERO would look like. Doing so is an excellent way to get some initial comfort with the notion of proofs by reduction, which is a technique we will be using time and again in this book.

Figure 9.7: To prove Theorem 9.9, we show that HALTONZERO is uncomputable by giving a reduction from the task of computing HALT to the task of computing HALTONZERO. This shows that if there were a hypothetical algorithm A computing HALTONZERO, then there would be an algorithm B computing HALT, contradicting Theorem 9.6. Since neither A nor B actually exists, this is an example of an implication of the form "if pigs could whistle then horses could fly".

Proof of Theorem 9.9. The proof is by reduction from HALT; see Fig. 9.7. We will assume, towards a contradiction, that HALTONZERO is computable by some algorithm A, and use this hypothetical algorithm A to construct an algorithm B to compute HALT, hence obtaining a contradiction to Theorem 9.6. (As discussed


in Big Idea 10, following our "have your cake and eat it too" paradigm, we just use the generic name "algorithm" rather than worrying whether we model them as Turing machines, NAND-TM programs, NAND-RAM, etc.; this makes no difference since all these models are equivalent to one another.)

Since this is our first proof by reduction from the Halting problem, we will spell it out in more detail than usual. Such a proof by reduction consists of two steps:

1. Description of the reduction: We will describe the operation of our algorithm B, and how it makes "function calls" to the hypothetical algorithm A.

2. Analysis of the reduction: We will then prove that under the hypothesis that Algorithm A computes HALTONZERO, Algorithm B will compute HALT.

Algorithm 9.10 — HALT to HALTONZERO reduction.

Input: Turing machine ๐‘€ and string ๐‘ฅ.Output: Turing machine ๐‘€ โ€ฒ such that ๐‘€ halts on ๐‘ฅ iff ๐‘€ โ€ฒ

halts on zero1: procedure ๐‘๐‘€,๐‘ฅ(๐‘ค) # Description of the T.M. ๐‘๐‘€,๐‘ฅ2: return ๐ธ๐‘‰ ๐ด๐ฟ(๐‘€, ๐‘ฅ) # Ignore the

Input: ๐‘ค, evaluate ๐‘€ on ๐‘ฅ.3: end procedure4: return ๐‘๐‘€,๐‘ฅ # We do not execute ๐‘๐‘€,๐‘ฅ: only return its

description

Our Algorithm ๐ต works as follows: on input ๐‘€, ๐‘ฅ, it runs Algo-rithm 9.10 to obtain a Turing Machine ๐‘€ โ€ฒ, and then returns ๐ด(๐‘€ โ€ฒ).The machine ๐‘€ โ€ฒ ignores its input ๐‘ง and simply runs ๐‘€ on ๐‘ฅ.

In pseudocode, the program ๐‘๐‘€,๐‘ฅ will look something like thefollowing:

def N(z):M = r'.......'# a string constant containing desc. of Mx = r'.......'# a string constant containing xreturn eval(M,x)# note that we ignore the input z

That is, if we think of ๐‘๐‘€,๐‘ฅ as a program, then it is a program thatcontains ๐‘€ and ๐‘ฅ as โ€œhardwired constantsโ€, and given any input ๐‘ง, itsimply ignores the input and always returns the result of evaluating


๐‘€ on ๐‘ฅ. The algorithm ๐ต does not actually execute the machine ๐‘๐‘€,๐‘ฅ.๐ต merely writes down the description of ๐‘๐‘€,๐‘ฅ as a string (just as wedid above) and feeds this string as input to ๐ด.

The above completes the description of the reduction. The analysis isobtained by proving the following claim:

Claim: For every strings ๐‘€, ๐‘ฅ, ๐‘ง, the machine ๐‘๐‘€,๐‘ฅ constructed byAlgorithm ๐ต in Step 1 satisfies that ๐‘๐‘€,๐‘ฅ halts on ๐‘ง if and only if theprogram described by ๐‘€ halts on the input ๐‘ฅ.

Proof of Claim: Since ๐‘๐‘€,๐‘ฅ ignores its input and evaluates ๐‘€ on ๐‘ฅusing the universal Turing machine, it will halt on ๐‘ง if and only if ๐‘€halts on ๐‘ฅ.

In particular if we instantiate this claim with the input ๐‘ง = 0 to๐‘๐‘€,๐‘ฅ, we see that HALTONZERO(๐‘๐‘€,๐‘ฅ) = HALT(๐‘€, ๐‘ฅ). Thus ifthe hypothetical algorithm ๐ด satisfies ๐ด(๐‘€) = HALTONZERO(๐‘€)for every ๐‘€ then the algorithm ๐ต we construct satisfies ๐ต(๐‘€, ๐‘ฅ) =HALT(๐‘€, ๐‘ฅ) for every ๐‘€, ๐‘ฅ, contradicting the uncomputability ofHALT.
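Since Algorithm B only writes down the code of N_{M,x} and never runs it, the heart of the reduction is ordinary string manipulation. A hedged Python sketch of this "write down the code" step (eval_machine names the universal evaluator assumed in the text; it is not implemented here):

```python
def halt_to_haltonzero(M, x):
    """Return the *source text* of the machine N_{M,x}: a program that
    ignores its input z and evaluates M on the hardwired input x."""
    return (
        "def N(z):\n"
        f"    M = {M!r}  # hardwired description of M\n"
        f"    x = {x!r}  # hardwired input x\n"
        "    return eval_machine(M, x)  # the input z is ignored\n"
    )

src = halt_to_haltonzero("<description of M>", "01101")
print(src)
```

The reduction then simply feeds the string src to the hypothetical algorithm A.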

Remark 9.11 — The hardwiring technique. In the proof of Theorem 9.9 we used the technique of "hardwiring" an input x to a program/machine P. That is, modifying a program P so that it uses "hardwired constants" for some or all of its input. This technique is quite common in reductions and elsewhere, and we will often use it again in this course.

9.5 RICE'S THEOREM AND THE IMPOSSIBILITY OF GENERAL SOFTWARE VERIFICATION

The uncomputability of the Halting problem turns out to be a special case of a much more general phenomenon. Namely, we cannot certify semantic properties of general-purpose programs. "Semantic properties" means properties of the function that the program computes, as opposed to properties that depend on the particular syntax used by the program.

An example of a semantic property of a program P is the property that whenever P is given an input string with an even number of 1's, it outputs 0. Another example is the property that P will always halt whenever the input ends with a 1. In contrast, the property that a C program contains a comment before every function declaration is not a semantic property, since it depends on the actual source code as opposed to the input/output relation.


Checking semantic properties of programs is of great interest, as it corresponds to checking whether a program conforms to a specification. Alas, it turns out that such properties are in general uncomputable. We have already seen some examples of uncomputable semantic functions, namely HALT and HALTONZERO, but these are just the "tip of the iceberg". We start by observing one more such example:

Theorem 9.12 — Computing the all-zero function. Let ZEROFUNC : {0,1}* → {0,1} be the function such that for every M ∈ {0,1}*, ZEROFUNC(M) = 1 if and only if M represents a Turing machine such that M outputs 0 on every input x ∈ {0,1}*. Then ZEROFUNC is uncomputable.

Despite the similarity in their names, ZEROFUNC and HALTONZERO are two different functions. For example, if M is a Turing machine that on input x ∈ {0,1}* halts and outputs the OR of all of x's coordinates, then HALTONZERO(M) = 1 (since M does halt on the input 0) but ZEROFUNC(M) = 0 (since M does not compute the constant zero function).

Proof of Theorem 9.12. The proof is by reduction from HALTONZERO. Suppose, towards a contradiction, that there was an algorithm A such that A(M) = ZEROFUNC(M) for every M ∈ {0,1}*. Then we will construct an algorithm B that solves HALTONZERO, contradicting Theorem 9.9.

Given a Turing machine N (which is the input to HALTONZERO), our Algorithm B does the following:

1. Construct a Turing machine M which on input x ∈ {0,1}*, first runs N(0) and then outputs 0.

2. Return A(M).

Now if N halts on the input 0 then the Turing machine M computes the constant zero function, and hence under our assumption that A computes ZEROFUNC, A(M) = 1. If N does not halt on the input 0, then the Turing machine M will not halt on any input, and so in particular will not compute the constant zero function. Hence under our assumption that A computes ZEROFUNC, A(M) = 0. We see that in both cases, ZEROFUNC(M) = HALTONZERO(N), and hence the value that Algorithm B returns in step 2 is equal to HALTONZERO(N), which is what we needed to prove.

Another result along similar lines is the following:


Theorem 9.13 — Uncomputability of verifying parity. The following function is uncomputable:

COMPUTES-PARITY(P) = 1 if P computes the parity function, and 0 otherwise.   (9.4)

We leave the proof of Theorem 9.13 as an exercise (Exercise 9.6). I strongly encourage you to stop here and try to solve this exercise.

9.5.1 Rice's Theorem

Theorem 9.13 can be generalized far beyond the parity function. In fact, this generalization rules out verifying any type of semantic specification on programs. We define a semantic specification on programs to be some property that does not depend on the code of the program but just on the function that the program computes.

For example, consider the following two C programs:

int First(int n) {
    if (n<0) return 0;
    return 2*n;
}

int Second(int n) {
    int i = 0;
    int j = 0;
    if (n<0) return 0;
    while (j<n) {
        i = i + 2;
        j = j + 1;
    }
    return i;
}

First and Second are two distinct C programs, but they compute the same function. A semantic property would be either true for both programs or false for both programs, since it depends on the function the programs compute and not on their code. An example of a semantic property that both First and Second satisfy is the following: "The program P computes a function f mapping integers to integers satisfying f(n) ≥ n for every input n".
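Translated into Python, the functional equivalence of the two programs can be spot-checked mechanically (a finite check is evidence of equivalence, not a proof):

```python
def first(n):
    # direct translation of the C program First
    return 0 if n < 0 else 2 * n

def second(n):
    # direct translation of the C program Second
    if n < 0:
        return 0
    i = j = 0
    while j < n:
        i += 2
        j += 1
    return i

# syntactically different, functionally identical on these inputs:
print(all(first(n) == second(n) for n in range(-50, 200)))  # True
```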


A property is not semantic if it depends on the source code rather than the input/output behavior. For example, properties such as "the program contains the variable k" or "the program uses the while operation" are not semantic. Such properties can be true for one of the programs and false for the other. Formally, we define semantic properties as follows:

Definition 9.14 — Semantic properties. A pair of Turing machines M and M' are functionally equivalent if for every x ∈ {0,1}*, M(x) = M'(x). (In particular, M(x) = ⊥ iff M'(x) = ⊥ for all x.)

A function F : {0,1}* → {0,1} is semantic if for every pair of strings M, M' that represent functionally equivalent Turing machines, F(M) = F(M'). (Recall that we assume that every string represents some Turing machine; see Remark 9.3.)

There are two trivial examples of semantic functions: the constant one function and the constant zero function. For example, if Z is the constant zero function (i.e., Z(M) = 0 for every M) then clearly Z(M) = Z(M') for every pair of Turing machines M and M' that are functionally equivalent. Here is a non-trivial example:

Solved Exercise 9.1 — ZEROFUNC is semantic. Prove that the function ZEROFUNC is semantic.

Solution:

Recall that ZEROFUNC(M) = 1 if and only if M(x) = 0 for every x ∈ {0,1}*. If M and M' are functionally equivalent, then for every x, M(x) = M'(x). Hence ZEROFUNC(M) = 1 if and only if ZEROFUNC(M') = 1.

Often the properties of programs that we are most interested in computing are the semantic ones, since we want to understand the programs' functionality. Unfortunately, Rice's Theorem tells us that these properties are all uncomputable:

Theorem 9.15 — Rice's Theorem. Let F : {0,1}* → {0,1}. If F is semantic and non-trivial then it is uncomputable.

Proof Idea:

The idea behind the proof is to show that every semantic non-trivial function F is at least as hard to compute as HALTONZERO. This will conclude the proof since by Theorem 9.9, HALTONZERO is uncomputable. If a function F is non-trivial then there are two


machines ๐‘€0 and ๐‘€1 such that ๐น(๐‘€0) = 0 and ๐น(๐‘€1) = 1. So,the goal would be to take a machine ๐‘ and find a way to map it intoa machine ๐‘€ = ๐‘…(๐‘), such that (i) if ๐‘ halts on zero then ๐‘€ isfunctionally equivalent to ๐‘€1 and (ii) if ๐‘ does not halt on zero then๐‘€ is functionally equivalent ๐‘€0.

Because ๐น is semantic, if we achieved this, then we would be guar-anteed that HALTONZERO(๐‘) = ๐น(๐‘…(๐‘)), and hence would showthat if ๐น was computable, then HALTONZERO would be computableas well, contradicting Theorem 9.9.โ‹†Proof of Theorem 9.15. We will not give the proof in full formality, butrather illustrate the proof idea by restricting our attention to a particu-lar semantic function ๐น . However, the same techniques generalize toall possible semantic functions. Define MONOTONE โˆถ 0, 1โˆ— โ†’ 0, 1as follows: MONOTONE(๐‘€) = 1 if there does not exist ๐‘› โˆˆ โ„• andtwo inputs ๐‘ฅ, ๐‘ฅโ€ฒ โˆˆ 0, 1๐‘› such that for every ๐‘– โˆˆ [๐‘›] ๐‘ฅ๐‘– โ‰ค ๐‘ฅโ€ฒ๐‘– but ๐‘€(๐‘ฅ)outputs 1 and ๐‘€(๐‘ฅโ€ฒ) = 0. That is, MONOTONE(๐‘€) = 1 if itโ€™s notpossible to find an input ๐‘ฅ such that flipping some bits of ๐‘ฅ from 0 to1 will change ๐‘€ โ€™s output in the other direction from 1 to 0. We willprove that MONOTONE is uncomputable, but the proof will easilygeneralize to any semantic function.

We start by noting that MONOTONE is neither the constant zeronor the constant one function:

โ€ข The machine INF that simply goes into an infinite loop on everyinput satisfies MONOTONE(INF) = 1, since INF is not definedanywhere and so in particular there are no two inputs ๐‘ฅ, ๐‘ฅโ€ฒ where๐‘ฅ๐‘– โ‰ค ๐‘ฅโ€ฒ๐‘– for every ๐‘– but INF(๐‘ฅ) = 0 and INF(๐‘ฅโ€ฒ) = 1.

โ€ข The machine PAR that computes the XOR or parity of its input, isnot monotone (e.g., PAR(1, 1, 0, 0, โ€ฆ , 0) = 0 but PAR(1, 0, 0, โ€ฆ , 0) =0) and hence MONOTONE(PAR) = 0.(Note that INF and PAR are machines and not functions.)We will now give a reduction from HALTONZERO to

MONOTONE. That is, we assume towards a contradiction thatthere exists an algorithm ๐ด that computes MONOTONE and we willbuild an algorithm ๐ต that computes HALTONZERO. Our algorithm ๐ตwill work as follows:

Algorithm ๐ต:Input: String ๐‘ describing a Turing machine. (Goal: ComputeHALTONZERO(๐‘))Assumption: Access to Algorithm ๐ด to compute MONOTONE.Operation:

1. Construct the following machine M: "On input z ∈ {0,1}* do: (a) Run N(0). (b) Return PAR(z)."

2. Return 1 − A(M).

To complete the proof we need to show that B outputs the correct answer, under our assumption that A computes MONOTONE. In other words, we need to show that HALTONZERO(N) = 1 − MONOTONE(M). Suppose that N does not halt on zero. In this case the program M constructed by Algorithm B enters into an infinite loop in step (a) and will never reach step (b). Hence in this case M is functionally equivalent to INF. (The machine M is not the same machine as INF: its description or code is different. But it does have the same input/output behavior (in this case) of never halting on any input. Also, while the program M will go into an infinite loop on every input, Algorithm B never actually runs M: it only produces its code and feeds it to A. Hence Algorithm B will not enter into an infinite loop even in this case.) Thus in this case, MONOTONE(M) = MONOTONE(INF) = 1.

If N does halt on zero, then step (a) in M will eventually conclude and M's output will be determined by step (b), where it simply outputs the parity of its input. Hence in this case, M computes the non-monotone parity function (i.e., it is functionally equivalent to PAR), and so we get that MONOTONE(M) = MONOTONE(PAR) = 0. In both cases, MONOTONE(M) = 1 − HALTONZERO(N), which is what we wanted to prove.

An examination of this proof shows that we did not use anything about MONOTONE beyond the fact that it is semantic and non-trivial. For every semantic non-trivial F, we can use the same proof, replacing PAR and INF with two machines M_0 and M_1 such that F(M_0) = 0 and F(M_1) = 1. Such machines must exist if F is non-trivial.
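As in the previous reductions, Algorithm B only produces source code and hands it to A; it never executes M. A sketch, with run again standing in for the assumed universal evaluator and the lambdas serving as stubs for the hypothetical A:

```python
def haltonzero_to_monotone(N):
    """Source of the machine M in the proof: step (a) runs N on 0,
    step (b) outputs the parity of the input z."""
    return (
        "def M(z):\n"
        f"    run({N!r}, '0')              # (a): may loop forever\n"
        "    return sum(map(int, z)) % 2  # (b): PAR(z)\n"
    )

def B(N, A):
    # A is the hypothetical algorithm for MONOTONE
    return 1 - A(haltonzero_to_monotone(N))

# Exercising the plumbing: if A answers 1 ("monotone"), B outputs 0,
# and if A answers 0 ("not monotone"), B outputs 1.
print(B("<machine N>", lambda src: 1), B("<machine N>", lambda src: 0))  # 0 1
```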

Remark 9.16 — Semantic is not the same as uncomputable. Rice's Theorem is so powerful and such a popular way of proving uncomputability that people sometimes get confused and think that it is the only way to prove uncomputability. In particular, a common misconception is that if a function F is not semantic then it is computable. This is not at all the case.

For example, consider the following function HALTNOYALE : {0,1}* → {0,1}. This is a function that, on input a string that represents a NAND-TM program P, outputs 1 if and only if both (i) P halts on the input 0, and (ii) the program P does not contain a variable with the identifier Yale. The function


HALTNOYALE is clearly not semantic, as it will output two different values when given as input one of the following two functionally equivalent programs:

Yale[0] = NAND(X[0],X[0])
Y[0] = NAND(X[0],Yale[0])

and

Harvard[0] = NAND(X[0],X[0])
Y[0] = NAND(X[0],Harvard[0])

However, HALTNOYALE is uncomputable, since every program P can be transformed into an equivalent (and in fact improved :)) program P' that does not contain the variable Yale. Hence if we could compute HALTNOYALE then we could determine halting on zero for NAND-TM programs (and hence for Turing machines as well).

Moreover, as we will see in Chapter 11, there are uncomputable functions whose inputs are not programs, and hence for which the adjective "semantic" is not applicable.

Properties such as "the program contains the variable Yale" are sometimes known as syntactic properties. The terms "semantic" and "syntactic" are used beyond the realm of programming languages: a famous example of a syntactically correct but semantically meaningless sentence in English is Chomsky's "Colorless green ideas sleep furiously." However, formally defining "syntactic properties" is rather subtle, and we will not use this terminology in this book, sticking to the terms "semantic" and "non-semantic" only.

9.5.2 Halting and Riceโ€™s Theorem for other Turing-complete modelsAs we saw before, many natural computational models turn out to beequivalent to one another, in the sense that we can transform a โ€œpro-gramโ€ of one model (such as a ๐œ† expression, or a game-of-life config-urations) into another model (such as a NAND-TM program). Thisequivalence implies that we can translate the uncomputability of theHalting problem for NAND-TM programs into uncomputability forHalting in other models. For example:

Theorem 9.17 — NAND-TM Machine Halting. Let NANDTMHALT : {0,1}* → {0,1} be the function that on input strings P ∈ {0,1}* and x ∈ {0,1}* outputs 1 if the NAND-TM program described by P halts on the input x and outputs 0 otherwise. Then NANDTMHALT is uncomputable.


Once again, this is a good point for you to stop and try to prove the result yourself before reading the proof below.

Proof. We have seen in Theorem 7.11 that for every Turing machine M, there is an equivalent NAND-TM program P_M such that for every x, P_M(x) = M(x). In particular this means that HALT(M) = NANDTMHALT(P_M).

The transformation M ↦ P_M that is obtained from the proof of Theorem 7.11 is constructive. That is, the proof yields a way to compute the map M ↦ P_M. This means that this proof yields a reduction from the task of computing HALT to the task of computing NANDTMHALT, which means that since HALT is uncomputable, neither is NANDTMHALT.
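The constructive nature of the transformation can be sketched as follows; both function arguments are placeholders for, respectively, the computable map M ↦ P_M from Theorem 7.11 and an (impossible) algorithm for NANDTMHALT.

```python
def halt_via(machine: str, x: str, tm_to_nandtm, nandtm_halt) -> bool:
    """Reduction sketch: if nandtm_halt computed NANDTMHALT, then
    composing it with the computable translation M -> P_M would
    compute HALT. Since HALT is uncomputable, nandtm_halt cannot
    exist; the arguments here are hypothetical placeholders."""
    program = tm_to_nandtm(machine)  # the constructive map M -> P_M
    return nandtm_halt(program, x)
```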

The same proof carries over to other computational models such as the λ calculus, two-dimensional (or even one-dimensional) automata, etc. Hence, for example, there is no algorithm to decide if a λ expression evaluates to the identity function, and no algorithm to decide whether an initial configuration of the game of life will result in eventually coloring the cell (0,0) black or not.

Indeed, we can generalize Rice's Theorem to all these models. For example, if F : {0,1}* → {0,1} is a non-trivial function such that F(P) = F(P') for every pair of functionally equivalent NAND-TM programs P, P', then F is uncomputable, and the same holds for NAND-RAM programs, λ-expressions, and all other Turing-complete models (as defined in Definition 8.5); see also Exercise 9.12.

9.5.3 Is software verification doomed? (discussion)

Programs are increasingly being used for mission-critical purposes, whether it's running our banking system, flying planes, or monitoring nuclear reactors. If we can't even give a certification algorithm that a program correctly computes the parity function, how can we ever be assured that a program does what it is supposed to do? The key insight is that while it is impossible to certify that a general program conforms with a specification, it is possible to write a program in the first place in a way that will make it easier to certify. As a trivial example, if you write a program without loops, then you can certify that it halts. Also, while it might not be possible to certify that an arbitrary program computes the parity function, it is quite possible to write a particular program P for which we can mathematically prove that P computes the parity. In fact, writing programs or algorithms and providing proofs for their correctness is what we do all the time in algorithms research.

The field of software verification is concerned with verifying that given programs satisfy certain conditions. These conditions can be that the program computes a certain function, that it never writes into a dangerous memory location, that it respects certain invariants, and others. While the general task of verifying this may be uncomputable, researchers have managed to do so for many interesting cases, especially if the program is written in the first place in a formalism or programming language that makes verification easier. That said, verification, especially of large and complex programs, remains a highly challenging task in practice as well, and the number of programs that have been formally proven correct is still quite small. Moreover, even phrasing the right theorem to prove (i.e., the specification) is often a highly non-trivial endeavor.

Figure 9.8: The set R of computable Boolean functions (Definition 7.3) is a proper subset of the set of all functions mapping {0,1}* to {0,1}. In this chapter we saw a few examples of elements in the latter set that are not in the former.

Chapter Recap

• There is a universal Turing machine (or NAND-TM program) U such that on input a description of a Turing machine M and some input x, U(M, x) halts and outputs M(x) if (and only if) M halts on input x. Unlike in the case of finite computation (i.e., NAND-CIRC programs / circuits), the input to the program U can be a machine M that has more states than U itself.

• Unlike the finite case, there are actually functions that are inherently uncomputable in the sense that they cannot be computed by any Turing machine.

2 A machine with alphabet Σ can have at most |Σ|^T choices for the contents of the first T locations of its tape. What happens if the machine repeats a previously seen configuration, in the sense that the tape contents, the head location, and the current state are all identical to what they were in some previous step of the execution?

• These include not only some "degenerate" or "esoteric" functions but also functions that people deeply care about and conjectured could be computed.

• If the Church-Turing thesis holds then a function F that is uncomputable according to our definition cannot be computed by any means in our physical world.

9.6 EXERCISES

Exercise 9.1 — NAND-RAM Halt. Let NANDRAMHALT : {0,1}* → {0,1} be the function such that on input (P, x) where P represents a NAND-RAM program, NANDRAMHALT(P, x) = 1 iff P halts on the input x. Prove that NANDRAMHALT is uncomputable.

Exercise 9.2 — Timed halting. Let TIMEDHALT : {0,1}* → {0,1} be the function that on input (a string representing) a triple (M, x, T), TIMEDHALT(M, x, T) = 1 iff the Turing machine M, on input x, halts within at most T steps (where a step is defined as one sequence of reading a symbol from the tape, updating the state, writing a new symbol and (potentially) moving the head).

Prove that TIMEDHALT is computable.

Exercise 9.3 — Space halting (challenging). Let SPACEHALT : {0,1}* → {0,1} be the function that on input (a string representing) a triple (M, x, T), SPACEHALT(M, x, T) = 1 iff the Turing machine M, on input x, halts before its head reaches the T-th location of its tape. (We don't care how many steps M makes, as long as the head stays inside locations 0, …, T − 1.)

Prove that SPACEHALT is computable. See footnote for hint.2

Exercise 9.4 — Computable compositions. Suppose that F : {0,1}* → {0,1} and G : {0,1}* → {0,1} are computable functions. For each one of the following functions H, either prove that H is necessarily computable or give an example of a pair F and G of computable functions such that H will not be computable. Prove your assertions.

1. ๐ป(๐‘ฅ) = 1 iff ๐น(๐‘ฅ) = 1 OR ๐บ(๐‘ฅ) = 1.2. ๐ป(๐‘ฅ) = 1 iff there exist two nonempty strings ๐‘ข, ๐‘ฃ โˆˆ 0, 1โˆ— such

that ๐‘ฅ = ๐‘ข๐‘ฃ (i.e., ๐‘ฅ is the concatenation of ๐‘ข and ๐‘ฃ), ๐น(๐‘ข) = 1 and๐บ(๐‘ฃ) = 1.3. ๐ป(๐‘ฅ) = 1 iff there exist a list ๐‘ข0, โ€ฆ , ๐‘ข๐‘กโˆ’1 of non empty strings such

that strings๐น(๐‘ข๐‘–) = 1 for every ๐‘– โˆˆ [๐‘ก] and ๐‘ฅ = ๐‘ข0๐‘ข1 โ‹ฏ ๐‘ข๐‘กโˆ’1.


3 Hint: You can use Rice's Theorem.

4 Hint: While it cannot be applied directly, with a little "massaging" you can prove this using Rice's Theorem.

4. ๐ป(๐‘ฅ) = 1 iff ๐‘ฅ is a valid string representation of a NAND++program ๐‘ƒ such that for every ๐‘ง โˆˆ 0, 1โˆ—, on input ๐‘ง the program๐‘ƒ outputs ๐น(๐‘ง).

5. ๐ป(๐‘ฅ) = 1 iff ๐‘ฅ is a valid string representation of a NAND++program ๐‘ƒ such that on input ๐‘ฅ the program ๐‘ƒ outputs ๐น(๐‘ฅ).

6. ๐ป(๐‘ฅ) = 1 iff ๐‘ฅ is a valid string representation of a NAND++program ๐‘ƒ such that on input ๐‘ฅ, ๐‘ƒ outputs ๐น(๐‘ฅ) after executing atmost 100 โ‹… |๐‘ฅ|2 lines.

Exercise 9.5 Prove that the following function FINITE : {0,1}* → {0,1} is uncomputable. On input P ∈ {0,1}*, we define FINITE(P) = 1 if and only if P is a string that represents a NAND++ program such that there is only a finite number of inputs x ∈ {0,1}* s.t. P(x) = 1.3

Exercise 9.6 — Computing parity. Prove Theorem 9.13 without using Rice's Theorem.

Exercise 9.7 — TM Equivalence. Let EQ : {0,1}* → {0,1} be the function defined as follows: given a string representing a pair (M, M') of Turing machines, EQ(M, M') = 1 iff M and M' are functionally equivalent as per Definition 9.14. Prove that EQ is uncomputable.

Note that you cannot use Rice's Theorem directly, as this theorem only deals with functions that take a single Turing machine as input, and EQ takes two machines.

Exercise 9.8 For each of the following two functions, say whether it is computable or not:

1. Given a NAND-TM program P, an input x, and a number k, when we run P on x, does the index variable i ever reach k?

2. Given a NAND-TM program P, an input x, and a number k, when we run P on x, does P ever write to an array at index k?

Exercise 9.9 Let F : {0,1}* → {0,1} be the function that is defined as follows. On input a string P that represents a NAND-RAM program and a string M that represents a Turing machine, F(P, M) = 1 if and only if there exists some input x such that P halts on x but M does not halt on x. Prove that F is uncomputable. See footnote for hint.4


5 HALT has this property.

6 You can either use the diagonalization method to prove this directly or show that the set of all recursively enumerable functions is countable.

7 HALT has this property: show that if both HALT and its complement (the function x ↦ 1 − HALT(x)) were recursively enumerable then HALT would in fact be computable.

8 Show that any G satisfying (b) must be semantic.

Exercise 9.10 — Recursively enumerable. Define a function F : {0,1}* → {0,1} to be recursively enumerable if there exists a Turing machine M such that for every x ∈ {0,1}*, if F(x) = 1 then M(x) = 1, and if F(x) = 0 then M(x) = ⊥ (i.e., if F(x) = 0 then M does not halt on x).

1. Prove that every computable F is also recursively enumerable.

2. Prove that there exists F that is not computable but is recursively enumerable. See footnote for hint.5

3. Prove that there exists a function F : {0,1}* → {0,1} such that F is not recursively enumerable. See footnote for hint.6

4. Prove that there exists a function F : {0,1}* → {0,1} such that F is recursively enumerable but the complement function F̄ defined as F̄(x) = 1 − F(x) is not recursively enumerable. See footnote for hint.7

Exercise 9.11 — Rice's Theorem: standard form. In this exercise we will prove Rice's Theorem in the form in which it is typically stated in the literature.

For a Turing machine M, define L(M) ⊆ {0,1}* to be the set of all x ∈ {0,1}* such that M halts on the input x and outputs 1. (The set L(M) is known in the literature as the language recognized by M. Note that M might either output a value other than 1 or not halt at all on inputs x ∉ L(M).)

1. Prove that for every Turing machine M, if we define F_M : {0,1}* → {0,1} to be the function such that F_M(x) = 1 iff x ∈ L(M), then F_M is recursively enumerable as defined in Exercise 9.10.

2. Use Theorem 9.15 to prove that for every G : {0,1}* → {0,1}, if (a) G is neither the constant zero nor the constant one function, and (b) for every M, M' such that L(M) = L(M'), G(M) = G(M'), then G is uncomputable. See footnote for hint.8

Exercise 9.12 — Rice's Theorem for general Turing-equivalent models (optional).

Let ℱ be the set of all partial functions from {0,1}* to {0,1} and let ℳ : {0,1}* → ℱ be a Turing-equivalent model as defined in Definition 8.5. We define a function F : {0,1}* → {0,1} to be ℳ-semantic if there exists some 𝒢 : ℱ → {0,1} such that F(P) = 𝒢(ℳ(P)) for every P ∈ {0,1}*.

Prove that for every ℳ-semantic F : {0,1}* → {0,1} that is neither the constant one nor the constant zero function, F is uncomputable.


9 You will not need to use very specific properties of the TOWER function in this exercise. For example, NBB(n) also grows faster than the Ackermann function.

Exercise 9.13 — Busy Beaver. In this question we define the NAND-TM variant of the busy beaver function (see Aaronson's 1999 essay, 2017 blog post and 2020 survey [Aar20]; see also Tao's highly recommended presentation on how civilization's scientific progress can be measured by the quantities we can grasp).

1. Let ๐‘‡๐ต๐ต โˆถ 0, 1โˆ— โ†’ โ„• be defined as follows. For every string ๐‘ƒ โˆˆ0, 1โˆ—, if ๐‘ƒ represents a NAND-TM program such that when ๐‘ƒ isexecuted on the input 0 then it halts within ๐‘€ steps then ๐‘‡๐ต๐ต(๐‘ƒ ) =๐‘€ . Otherwise (if ๐‘ƒ does not represent a NAND-TM program, or itis a program that does not halt on 0), ๐‘‡๐ต๐ต(๐‘ƒ ) = 0. Prove that ๐‘‡๐ต๐ตis uncomputable.

2. Let TOWER(n) denote the number 2^(2^(⋯^2)) with n twos (that is, a "tower of powers of two" of height n). To get a sense of how fast this function grows, TOWER(1) = 2, TOWER(2) = 2^2 = 4, TOWER(3) = 2^(2^2) = 16, TOWER(4) = 2^16 = 65536 and TOWER(5) = 2^65536, which is about 10^20000. TOWER(6) is already a number that is too big to write even in scientific notation. Define NBB : ℕ → ℕ (for "NAND-TM Busy Beaver") to be the function NBB(n) = max_{P ∈ {0,1}^n} TBB(P), where TBB is as defined in the previous item. Prove that NBB grows faster than TOWER, in the sense that TOWER(n) = o(NBB(n)). See footnote for hint.9
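To get a feel for the numbers in item 2, TOWER can be computed directly for small heights. The sketch below takes TOWER(0) = 1 as a convention (not stated in the text) so that each loop iteration adds one level to the tower:

```python
def tower(n: int) -> int:
    """Compute TOWER(n), a tower of powers of two of height n."""
    result = 1  # convention: an empty tower equals 1
    for _ in range(n):
        result = 2 ** result  # add one more level to the tower
    return result
```

For instance, tower(4) already equals 65536, matching the values listed in the exercise, and tower(5) has close to twenty thousand decimal digits.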

9.7 BIBLIOGRAPHICAL NOTES

The cartoon of the Halting problem in Fig. 9.1 is taken from Charles Cooper's website.

Section 7.2 in [MM11] gives a highly recommended overview of uncomputability. Gödel, Escher, Bach [Hof99] is a classic popular science book that touches on uncomputability and unprovability, and specifically Gödel's Theorem that we will see in Chapter 11. See also the recent book by Holt [Hol18].

The history of the definition of a function is intertwined with the development of mathematics as a field. For many years, a function was identified (as per Euler's quote above) with the means to calculate the output from the input. In the 1800's, with the invention of the Fourier series and with the systematic study of continuity and differentiability, people started looking at more general kinds of functions, but the modern definition of a function as an arbitrary mapping was not yet universally accepted. For example, in 1899 Poincaré wrote "we have seen a mass of bizarre functions which appear to be forced to resemble as little as possible honest functions which serve some purpose. … they are invented on purpose to show that our ancestor's reasoning was at fault, and we shall never get anything more than that out of them". Some of this fascinating history is discussed in [Gra83; Kle91; Lüt02; Gra05].

The existence of a universal Turing machine and the uncomputability of HALT were first shown by Turing in his seminal paper [Tur37], though closely related results were shown by Church a year before. These works built on Gödel's 1931 incompleteness theorem that we will discuss in Chapter 11.

Some universal Turing machines with a small alphabet and number of states are given in [Rog96], including a single-tape universal Turing machine with the binary alphabet and with less than 25 states; see also the survey [WN09]. Adam Yedidia has written software to help in producing Turing machines with a small number of states. This is related to the recreational pastime of "code golfing", which is about solving a certain computational task using as short a program as possible. Finding "highly complex" small Turing machines is also related to the "Busy Beaver" problem; see Exercise 9.13 and the survey [Aar20].

The diagonalization argument used to prove uncomputability of F* is derived from Cantor's argument for the uncountability of the reals discussed in Chapter 2.

Christopher Strachey was an English computer scientist and the inventor of the CPL programming language. He was also an early artificial intelligence visionary, programming a computer to play Checkers and even write love letters in the early 1950's; see this New Yorker article and this website.

Rice's Theorem was proven in [Ric53]. It is typically stated in a form somewhat different than the one we used; see Exercise 9.11.

We do not discuss in this chapter the concept of recursively enumerable languages, but it is covered briefly in Exercise 9.10. As usual, we use function, as opposed to language, notation.

The cartoon of the Halting problem in Fig. 9.1 is copyright 2019 Charles F. Cooper.


10 Restricted computational models

"Happy families are all alike; every unhappy family is unhappy in its own way", Leo Tolstoy (opening of the book "Anna Karenina").

We have seen that many models of computation are Turing equivalent, including Turing machines, NAND-TM/NAND-RAM programs, standard programming languages such as C/Python/JavaScript, as well as other models such as the λ calculus and even the game of life. The flip side of this is that for all these models, Rice's theorem (Theorem 9.15) holds as well, which means that any non-trivial semantic property of programs in such a model is uncomputable.

The uncomputability of halting and other semantic specification problems for Turing-equivalent models motivates restricted computational models that are (a) powerful enough to capture a set of functions useful for certain applications but (b) weak enough that we can still solve semantic specification problems on them. In this chapter we discuss several such examples.

Big Idea 14 We can use restricted computational models to bypass limitations such as uncomputability of the Halting problem and Rice's Theorem. Such models can compute only a restricted subclass of functions, but allow us to answer at least some semantic questions about programs.

10.1 TURING COMPLETENESS AS A BUG

Learning Objectives:
• See that Turing completeness is not always a good thing.
• Another example of an always-halting formalism: context-free grammars and simply typed λ calculus.
• The pumping lemma for non-context-free functions.
• Examples of computable and uncomputable semantic properties of regular expressions and context-free grammars.

We have seen that seemingly simple computational models or systems can turn out to be Turing complete. The following webpage lists several examples of formalisms that "accidentally" turned out to be Turing complete, including supposedly limited languages such as the C preprocessor, CSS, (certain variants of) SQL, sendmail configuration, as well as games such as Minecraft, Super Mario, and the card game "Magic: The Gathering". Turing completeness is not always a good thing, as it means that such formalisms can give rise to arbitrarily complex behavior. For example, the PostScript format (a precursor of PDF) is a Turing-complete programming language meant to describe documents for printing. The expressive power of PostScript can allow for short descriptions of very complex images, but it also gave rise to some nasty surprises, such as the attacks described in this page, ranging from using infinite loops as a denial of service attack to accessing the printer's file system.

Figure 10.1: Some restricted computational models. We have already seen two equivalent restricted models of computation: regular expressions and deterministic finite automata. We show a more powerful model: context-free grammars. We also present tools to demonstrate that some functions cannot be computed in these models.

Example 10.1 — The DAO Hack. An interesting recent example of the pitfalls of Turing-completeness arose in the context of the cryptocurrency Ethereum. The distinguishing feature of this currency is the ability to design "smart contracts" using an expressive (and in particular Turing-complete) programming language. In our current "human operated" economy, Alice and Bob might sign a contract to agree that if condition X happens then they will jointly invest in Charlie's company. Ethereum allows Alice and Bob to create a joint venture where Alice and Bob pool their funds together into an account that will be governed by some program P that decides under what conditions it disburses funds from it. For example, one could imagine a piece of code that interacts between Alice, Bob, and some program running on Bob's car that allows Alice to rent out Bob's car without any human intervention or overhead.

Specifically, Ethereum uses the Turing-complete programming language Solidity, which has a syntax similar to JavaScript. The flagship of Ethereum was an experiment known as the "Decentralized Autonomous Organization" or The DAO. The idea was to create a smart contract that would create an autonomously run decentralized venture capital fund, without human managers, where shareholders could decide on investment opportunities. The DAO was at the time the biggest crowdfunding success in history. At its height the DAO was worth 150 million dollars, which was more than ten percent of the total Ethereum market. Investing in the DAO (or entering any other "smart contract") amounts to providing your funds to be run by a computer program. I.e., "code is law", or to use the words with which the DAO described itself: "The DAO is borne from immutable, unstoppable, and irrefutable computer code". Unfortunately, it turns out that (as we saw in Chapter 9) understanding the behavior of computer programs is quite a hard thing to do. A hacker (or perhaps, some would say, a savvy investor) was able to fashion an input that caused the DAO code to enter into an infinite recursive loop in which it continuously transferred funds into the hacker's account, thereby cleaning out about 60 million dollars out of the DAO. While this transaction was "legal" in the sense that it complied with the code of the smart contract, it was obviously not what the humans who wrote this code had in mind. The Ethereum community struggled with the response to this attack. Some tried the "Robin Hood" approach of using the same loophole to drain the DAO funds into a secure account, but it only had limited success. Eventually, the Ethereum community decided that the code can be mutable, stoppable, and refutable. Specifically, the Ethereum maintainers and miners agreed on a "hard fork" (also known as a "bailout") to revert history to before the hacker's transaction occurred. Some community members strongly opposed this decision, and so an alternative currency called Ethereum Classic was created that preserved the original history.

10.2 CONTEXT FREE GRAMMARS

If you have ever written a program, you've experienced a syntax error. You probably also had the experience of your program entering into an infinite loop. What is less likely is that the compiler or interpreter entered an infinite loop while trying to figure out if your program has a syntax error.

When a person designs a programming language, they need to determine its syntax. That is, the designer decides which strings correspond to valid programs, and which ones do not (i.e., which strings contain a syntax error). To ensure that a compiler or interpreter always halts when checking for syntax errors, language designers typically do not use a general Turing-complete mechanism to express their syntax. Rather, they use a restricted computational model. One of the most popular choices for such models is context free grammars.


To explain context free grammars, let us begin with a canonical example. Consider the function ARITH : Σ* → {0,1} that takes as input a string x over the alphabet Σ = {(, ), +, −, ×, ÷, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and returns 1 if and only if the string x represents a valid arithmetic expression. Intuitively, we build expressions by applying an operation such as +, −, × or ÷ to smaller expressions, or enclosing them in parentheses, where the "base case" corresponds to expressions that are simply numbers. More precisely, we can make the following definitions:

• A digit is one of the symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

• A number is a sequence of digits. (For simplicity we drop the condition that the sequence does not have a leading zero, though it is not hard to encode it in a context-free grammar as well.)

• An operation is one of +, −, ×, ÷.

• An expression has either the form "number", the form "sub-expression1 operation sub-expression2", or the form "(sub-expression1)", where "sub-expression1" and "sub-expression2" are themselves expressions. (Note that this is a recursive definition.)

A context free grammar (CFG) is a formal way of specifying such conditions. A CFG consists of a set of rules that tell us how to generate strings from smaller components. In the above example, one of the rules is "if exp1 and exp2 are valid expressions, then exp1 × exp2 is also a valid expression"; we can also write this rule using the shorthand expression ⇒ expression × expression. As in the above example, the rules of a context-free grammar are often recursive: the rule expression ⇒ expression × expression defines valid expressions in terms of itself. We now formally define context-free grammars:

Definition 10.2 — Context Free Grammar. Let Σ be some finite set. A context free grammar (CFG) over Σ is a triple (V, R, s) such that:

• V, known as the variables, is a set disjoint from Σ.

• s ∈ V is known as the initial variable.

• R is a set of rules. Each rule is a pair (v, z) with v ∈ V and z ∈ (Σ ∪ V)*. We often write the rule (v, z) as v ⇒ z and say that the string z can be derived from the variable v.


Example 10.3 — Context free grammar for arithmetic expressions. The example above of well-formed arithmetic expressions can be captured formally by the following context free grammar:

• The alphabet Σ is {(, ), +, −, ×, ÷, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

• The variables are V = {expression, number, digit, operation}.

• The rules are the set R containing the following 19 rules:

– The 4 rules operation ⇒ +, operation ⇒ −, operation ⇒ ×, and operation ⇒ ÷.

– The 10 rules digit ⇒ 0, …, digit ⇒ 9.

– The rule number ⇒ digit.

– The rule number ⇒ digit number.

– The rule expression ⇒ number.

– The rule expression ⇒ expression operation expression.

– The rule expression ⇒ (expression).

• The starting variable is expression.

People use many different notations to write context free grammars. One of the most common notations is the Backus–Naur form. In this notation we write a rule of the form v ⇒ a (where v is a variable and a is a string) in the form <v> := a. If we have several rules of the form v ↦ a, v ↦ b, and v ↦ c then we can combine them as <v> := a|b|c. (In words we say that v can derive either a, b, or c.) For example, the Backus–Naur description for the context free grammar of Example 10.3 is the following (using ASCII equivalents for operations):

operation := +|-|*|/digit := 0|1|2|3|4|5|6|7|8|9number := digit|digit numberexpression := number|expression operation

expression|(expression)Another example of a context free grammar is the โ€œmatching paren-

thesisโ€ grammar, which can be represented in Backus-Naur as fol-lows:

match := ""|match match|(match)

A string over the alphabet {(, )} can be generated from this grammar (where match is the starting expression and "" corresponds to the empty string) if and only if it consists of a matching set of parentheses.
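A quick way to check membership in this particular grammar is the standard depth-counter test for balanced parentheses, which accepts exactly the strings derivable from match. This sketch is an illustration of ours, not part of the text:

```python
def is_matched(s: str) -> bool:
    """Return True iff s is a balanced string of parentheses,
    i.e. exactly the strings derivable from the `match` variable."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:      # a ')' with no matching '(' before it
                return False
        else:
            return False       # the alphabet is only '(' and ')'
    return depth == 0          # every '(' must have been closed
```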


1 As in the case of Definition 6.7 we can also use language rather than function notation and say that a language L ⊆ Σ* is context free if the function F such that F(x) = 1 iff x ∈ L is context free.

In contrast, by Lemma 6.19 there is no regular expression that matches a string x if and only if x contains a valid sequence of matching parentheses.

10.2.1 Context-free grammars as a computational model

We can think of a context-free grammar over the alphabet Σ as defining a function that maps every string x in Σ* to 1 or 0 depending on whether x can be generated by the rules of the grammar. We now make this definition formal.

Definition 10.4 — Deriving a string from a grammar. If G = (V, R, s) is a context-free grammar over Σ, then for two strings α, β ∈ (Σ ∪ V)* we say that β can be derived in one step from α, denoted by α ⇒_G β, if we can obtain β from α by applying one of the rules of G. That is, we obtain β by replacing in α one occurrence of the variable v with the string z, where v ⇒ z is a rule of G.

We say that β can be derived from α, denoted by α ⇒*_G β, if it can be derived by some finite number k of steps. That is, if there are α_1, …, α_{k−1} ∈ (Σ ∪ V)*, so that α ⇒_G α_1 ⇒_G α_2 ⇒_G ⋯ ⇒_G α_{k−1} ⇒_G β.

We say that x ∈ Σ* is matched by G = (V, R, s) if x can be derived from the starting variable s (i.e., if s ⇒*_G x). We define the function computed by (V, R, s) to be the map Φ_{V,R,s} : Σ* → {0,1} such that Φ_{V,R,s}(x) = 1 iff x is matched by (V, R, s). A function F : Σ* → {0,1} is context free if F = Φ_{V,R,s} for some CFG (V, R, s).1

A priori it might not be clear that the map Φ_{V,R,s} is computable, but it turns out that this is the case.

Theorem 10.5 โ€” Context-free grammars always halt. For every CFG(๐‘‰ , ๐‘…, ๐‘ ) over 0, 1, the function ฮฆ๐‘‰ ,๐‘…,๐‘  โˆถ 0, 1โˆ— โ†’ 0, 1 iscomputable.

As usual we restrict attention to grammars over {0, 1}, although the proof extends to any finite alphabet Σ.

Proof. We only sketch the proof. We start with the observation that we can convert every CFG to an equivalent version in Chomsky normal form, where all rules either have the form 𝑢 → 𝑣𝑤 for variables 𝑢, 𝑣, 𝑤 or the form 𝑢 → 𝜎 for a variable 𝑢 and symbol 𝜎 ∈ Σ, plus potentially the rule 𝑠 → "" where 𝑠 is the starting variable.

The idea behind such a transformation is to simply add new variables as needed, and so for example we can translate a rule such as 𝑣 → 𝑢𝜎𝑤 into the three rules 𝑣 → 𝑢𝑟, 𝑟 → 𝑡𝑤 and 𝑡 → 𝜎.


Using the Chomsky normal form we get a natural recursive algorithm for computing whether 𝑠 ⇒∗𝐺 𝑥 for a given grammar 𝐺 and string 𝑥. We simply try all possible guesses for the first rule 𝑠 → 𝑢𝑣 that is used in such a derivation, and then all possible ways to partition 𝑥 as a concatenation 𝑥 = 𝑥′𝑥″. If we guessed the rule and the partition correctly, then this reduces our task to checking whether 𝑢 ⇒∗𝐺 𝑥′ and 𝑣 ⇒∗𝐺 𝑥″, which (as it involves shorter strings) can be done recursively. The base cases are when 𝑥 is empty or a single symbol, and can be easily handled.
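The recursive procedure above can be sketched in Python. This is an illustrative sketch rather than code from the text: the grammar representation (a dictionary mapping each variable to a list of rule bodies) and the example grammar for matched parentheses are our own choices.

```python
from functools import lru_cache

def cnf_matches(rules, start, x):
    """Decide whether a grammar in Chomsky normal form derives x.

    `rules` maps each variable to a list of bodies, where a body is
    either a single terminal symbol (a string) or a pair of variables.
    This directly transcribes the recursive algorithm sketched above;
    memoization via lru_cache turns it into (essentially) the classical
    CYK dynamic program.
    """
    @lru_cache(maxsize=None)
    def derives(var, s):
        for body in rules[var]:
            if isinstance(body, str):          # rule  var -> sigma
                if s == body:
                    return True
            else:                              # rule  var -> u v
                u, v = body
                # guess how to split s into a nonempty prefix and suffix
                if any(derives(u, s[:i]) and derives(v, s[i:])
                       for i in range(1, len(s))):
                    return True
        return False

    return derives(start, x)

# A CNF grammar for nonempty strings of matched parentheses:
# S -> A B | S S | A C,   C -> S B,   A -> '(',   B -> ')'
PAREN_RULES = {
    "S": [("A", "B"), ("S", "S"), ("A", "C")],
    "C": [("S", "B")],
    "A": ["("],
    "B": [")"],
}
```

For example, `cnf_matches(PAREN_RULES, "S", "(())()")` returns `True` while `cnf_matches(PAREN_RULES, "S", "(()")` returns `False`.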

Remark 10.6 — Parse trees. While we focus on the task of deciding whether a CFG matches a string, the algorithm to compute Φ𝑉,𝑅,𝑠 actually gives more information than that. That is, on input a string 𝑥, if Φ𝑉,𝑅,𝑠(𝑥) = 1 then the algorithm yields the sequence of rules that one can apply from the starting vertex 𝑠 to obtain the final string 𝑥. We can think of these rules as determining a tree with 𝑠 being the root vertex and the sinks (or leaves) corresponding to the substrings of 𝑥 that are obtained by the rules that do not have a variable in their second element. This tree is known as the parse tree of 𝑥, and often yields very useful information about the structure of 𝑥.

Often the first step in a compiler or interpreter for a programming language is a parser that transforms the source into the parse tree (also known as the abstract syntax tree). There are also tools that can automatically convert a description of a context-free grammar into a parser algorithm that computes the parse tree of a given string. (Indeed, the above recursive algorithm can be used to achieve this, but there are much more efficient versions, especially for grammars that have particular forms, and programming language designers often try to ensure their languages have these more efficient grammars.)

10.2.2 The power of context free grammars
Context free grammars can capture every regular expression:

Theorem 10.7 — Context free grammars and regular expressions. Let 𝑒 be a regular expression over {0, 1}. Then there is a CFG (𝑉, 𝑅, 𝑠) over {0, 1} such that Φ𝑉,𝑅,𝑠 = Φ𝑒.

Proof. We prove the theorem by induction on the length of 𝑒. If 𝑒 is an expression of length one, then 𝑒 = 0 or 𝑒 = 1, in which case we leave it to the reader to verify that there is a (trivial) CFG that


computes it. Otherwise, we fall into one of the following cases: case 1: 𝑒 = 𝑒′𝑒″, case 2: 𝑒 = 𝑒′|𝑒″ or case 3: 𝑒 = (𝑒′)∗, where in all cases 𝑒′, 𝑒″ are shorter regular expressions. By the induction hypothesis, we have grammars (𝑉′, 𝑅′, 𝑠′) and (𝑉″, 𝑅″, 𝑠″) that compute Φ𝑒′ and Φ𝑒″ respectively. By renaming variables, we can also assume without loss of generality that 𝑉′ and 𝑉″ are disjoint.

In case 1, we can define the new grammar as follows: we add a new starting variable 𝑠 ∉ 𝑉′ ∪ 𝑉″ and the rule 𝑠 ↦ 𝑠′𝑠″. In case 2, we can define the new grammar as follows: we add a new starting variable 𝑠 ∉ 𝑉′ ∪ 𝑉″ and the rules 𝑠 ↦ 𝑠′ and 𝑠 ↦ 𝑠″. Case 3 will be the only one that uses recursion. As before we add a new starting variable 𝑠 ∉ 𝑉′, but now add the rules 𝑠 ↦ "" (i.e., the empty string) and also add, for every rule of the form (𝑠′, 𝛼) ∈ 𝑅′, the rule 𝑠 ↦ 𝑠𝛼 to 𝑅.

We leave it to the reader as (a very good!) exercise to verify that in all three cases the grammars we produce capture the same function as the original expression.
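The three cases of this construction can be written out as a short Python sketch. The representation of regular expressions as nested tuples and the variable-naming scheme are our own illustrative choices, not notation from the text.

```python
from itertools import count

_fresh = count()

def new_var():
    # generate a fresh variable name, ensuring disjointness of variable sets
    return f"V{next(_fresh)}"

def regex_to_cfg(e):
    """Translate a regular expression into CFG rules, following the proof.

    A regex is "0" or "1", or a tuple ("concat", e1, e2),
    ("union", e1, e2), or ("star", e1).  Returns (rules, start), where
    rules is a set of pairs (variable, body) and a body is a tuple over
    variables and terminal symbols (the empty tuple stands for "").
    """
    if e in ("0", "1"):                              # base case: one symbol
        s = new_var()
        return {(s, (e,))}, s
    if e[0] == "concat":                             # case 1: s -> s' s''
        r1, s1 = regex_to_cfg(e[1])
        r2, s2 = regex_to_cfg(e[2])
        s = new_var()
        return r1 | r2 | {(s, (s1, s2))}, s
    if e[0] == "union":                              # case 2: s -> s' | s''
        r1, s1 = regex_to_cfg(e[1])
        r2, s2 = regex_to_cfg(e[2])
        s = new_var()
        return r1 | r2 | {(s, (s1,)), (s, (s2,))}, s
    assert e[0] == "star"                            # case 3: s -> "" | s alpha
    r1, s1 = regex_to_cfg(e[1])
    s = new_var()
    extra = {(s, ())} | {(s, (s,) + body) for (v, body) in r1 if v == s1}
    return r1 | extra, s
```

For instance, `regex_to_cfg(("star", ("union", "0", "1")))` produces a grammar whose start variable 𝑠 has the rule 𝑠 → "" together with a rule 𝑠 → 𝑠𝛼 for each rule 𝑠′ → 𝛼 of the union grammar, exactly as in the proof.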

It turns out that CFGs are strictly more powerful than regular expressions. In particular, as we've seen, the "matching parenthesis" function MATCHPAREN can be computed by a context free grammar, whereas, as shown in Lemma 6.19, it cannot be computed by regular expressions. Here is another example:

Solved Exercise 10.1 — Context free grammar for palindromes. Let PAL ∶ {0, 1, ;}∗ → {0, 1} be the function defined in Solved Exercise 6.4 where PAL(𝑤) = 1 iff 𝑤 has the form 𝑢;𝑢𝑅. Then PAL can be computed by a context-free grammar.

Solution:

A simple grammar computing PAL can be described using Backus–Naur notation:

start := ; | 0 start 0 | 1 start 1

One can prove by induction that this grammar generates exactly the strings 𝑤 such that PAL(𝑤) = 1.
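Such an induction can be mirrored by a direct recursive recognizer. The following sketch (our own illustration, not from the text) follows the three rules of the grammar literally:

```python
def pal(w):
    """Recognizer for the grammar  start := ; | 0 start 0 | 1 start 1.

    A string is generated iff it is the single symbol ';' or has the
    same bit at both ends around a shorter generated string -- which is
    exactly the condition that w equals u;rev(u).
    """
    if w == ";":
        return True
    return (len(w) >= 2 and w[0] == w[-1] and w[0] in "01"
            and pal(w[1:-1]))
```

For example, `pal("01;10")` returns `True` while `pal("01;01")` returns `False`.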

A more interesting example is computing the strings of the form 𝑢;𝑣 that are not palindromes:

Solved Exercise 10.2 — Non palindromes. Prove that there is a context free grammar that computes NPAL ∶ {0, 1, ;}∗ → {0, 1} where NPAL(𝑤) = 1 if 𝑤 = 𝑢;𝑣 but 𝑣 ≠ 𝑢𝑅.


Solution:

Using Backus–Naur notation we can describe such a grammar as follows:

palindrome := ; | 0 palindrome 0 | 1 palindrome 1
different  := 0 palindrome 1 | 1 palindrome 0
start      := different | 0 start | 1 start | start 0 | start 1

In words, this means that we can characterize a string 𝑤 such that NPAL(𝑤) = 1 as having the following form

𝑤 = 𝛼𝑏𝑢;𝑢𝑅𝑏′𝛽   (10.1)

where 𝛼, 𝛽, 𝑢 are arbitrary strings and 𝑏 ≠ 𝑏′. Hence we can generate such a string by first generating a palindrome 𝑢;𝑢𝑅 (palindrome variable), then adding 0 on either the left or right and 1 on the opposite side to get something that is not a palindrome (different variable), and then we can add an arbitrary number of 0's and 1's on either end (the start variable).

10.2.3 Limitations of context-free grammars (optional)
Even though context-free grammars are more powerful than regular expressions, there are some simple languages that are not captured by context free grammars. One tool to show this is the context-free grammar analog of the "pumping lemma" (Theorem 6.20):

Theorem 10.8 — Context-free pumping lemma. Let (𝑉, 𝑅, 𝑠) be a CFG over Σ. Then there are some numbers 𝑛0, 𝑛1 ∈ ℕ such that for every 𝑥 ∈ Σ∗ with |𝑥| > 𝑛0, if Φ𝑉,𝑅,𝑠(𝑥) = 1 then 𝑥 = 𝑎𝑏𝑐𝑑𝑒 such that |𝑏| + |𝑐| + |𝑑| ≤ 𝑛1, |𝑏| + |𝑑| ≥ 1, and Φ𝑉,𝑅,𝑠(𝑎𝑏^𝑘𝑐𝑑^𝑘𝑒) = 1 for every 𝑘 ∈ ℕ.

The context-free pumping lemma is even more cumbersome to state than its regular analog, but you can remember it as saying the following: "If a long enough string is matched by a grammar, there must be a variable that is repeated in the derivation."

Proof of Theorem 10.8. We only sketch the proof. The idea is that if the total number of symbols in the rules of the grammar is 𝑘0, then the only way to get |𝑥| > 𝑛0 with Φ𝑉,𝑅,𝑠(𝑥) = 1 is to use recursion. That is, there must be some variable 𝑣 ∈ 𝑉 such that we are able to


derive from 𝑣 the string 𝑏𝑣𝑑 for some strings 𝑏, 𝑑 ∈ Σ∗, and then further on derive from 𝑣 some string 𝑐 ∈ Σ∗ such that 𝑏𝑐𝑑 is a substring of 𝑥 (in other words, 𝑥 = 𝑎𝑏𝑐𝑑𝑒 for some 𝑎, 𝑒 ∈ {0, 1}∗). If we take the variable 𝑣 satisfying this requirement with a minimum number of derivation steps, then we can ensure that |𝑏𝑐𝑑| is at most some constant depending on 𝑛0 and we can set 𝑛1 to be that constant (𝑛1 = 10 ⋅ |𝑅| ⋅ 𝑛0 will do, since we will not need more than |𝑅| applications of rules, and each such application can grow the string by at most 𝑛0 symbols).

Thus by the definition of the grammar, we can repeat the derivation to replace the substring 𝑏𝑐𝑑 in 𝑥 with 𝑏^𝑘𝑐𝑑^𝑘 for every 𝑘 ∈ ℕ while retaining the property that the output of Φ𝑉,𝑅,𝑠 is still one. Since 𝑏𝑐𝑑 is a substring of 𝑥, we can write 𝑥 = 𝑎𝑏𝑐𝑑𝑒 and are guaranteed that 𝑎𝑏^𝑘𝑐𝑑^𝑘𝑒 is matched by the grammar for every 𝑘.

Using Theorem 10.8 one can show that even the simple function 𝐹 ∶ {0, 1}∗ → {0, 1} defined as follows:

𝐹(𝑥) = 1 if 𝑥 = 𝑤𝑤 for some 𝑤 ∈ {0, 1}∗, and 𝐹(𝑥) = 0 otherwise   (10.2)

is not context free. (In contrast, the function 𝐺 ∶ {0, 1}∗ → {0, 1} defined as 𝐺(𝑥) = 1 iff 𝑥 = 𝑤0𝑤1 ⋯ 𝑤𝑛−1𝑤𝑛−1𝑤𝑛−2 ⋯ 𝑤0 for some 𝑤 ∈ {0, 1}∗ and 𝑛 = |𝑤| is context free; can you see why?)

Solved Exercise 10.3 — Equality is not context-free. Let EQ ∶ {0, 1, ;}∗ → {0, 1} be the function such that EQ(𝑥) = 1 if and only if 𝑥 = 𝑢;𝑢 for some 𝑢 ∈ {0, 1}∗. Then EQ is not context free.

Solution:

We use the context-free pumping lemma. Suppose towards the sake of contradiction that there is a grammar 𝐺 that computes EQ, and let 𝑛0 be the constant obtained from Theorem 10.8.

Consider the string 𝑥 = 1^{𝑛0} 0^{𝑛0} ; 1^{𝑛0} 0^{𝑛0}, and write it as 𝑥 = 𝑎𝑏𝑐𝑑𝑒 as per Theorem 10.8, with |𝑏𝑐𝑑| ≤ 𝑛0 and with |𝑏| + |𝑑| ≥ 1. By Theorem 10.8, it should hold that EQ(𝑎𝑐𝑒) = 1. However, by case analysis this can be shown to be a contradiction.

Firstly, unless 𝑏 is on the left side of the ; separator and 𝑑 is on the right side, dropping 𝑏 and 𝑑 will definitely make the two parts different. But if it is the case that 𝑏 is on the left side and 𝑑 is on the right side, then by the condition that |𝑏𝑐𝑑| ≤ 𝑛0 we know that 𝑏 is a string of only zeros and 𝑑 is a string of only ones. If we drop 𝑏 and 𝑑 then since one of them is nonempty, we get that there are either


fewer zeros on the left side than on the right side, or fewer ones on the right side than on the left side. In either case, we get that EQ(𝑎𝑐𝑒) = 0, obtaining the desired contradiction.

10.3 SEMANTIC PROPERTIES OF CONTEXT FREE LANGUAGES

As in the case of regular expressions, the limitations of context free grammars do provide some advantages. For example, emptiness of context free grammars is decidable:

Theorem 10.9 — Emptiness for CFGs is decidable. There is an algorithm that on input a context-free grammar 𝐺, outputs 1 if and only if Φ𝐺 is the constant zero function.

Proof Idea:

The proof is easier to see if we transform the grammar to Chomsky normal form as in Theorem 10.5. Given a grammar 𝐺, we can recursively define a non-terminal variable 𝑣 to be non-empty if there is either a rule of the form 𝑣 ⇒ 𝜎, or there is a rule of the form 𝑣 ⇒ 𝑢𝑤 where both 𝑢 and 𝑤 are non-empty. Then the grammar is non-empty if and only if the starting variable 𝑠 is non-empty.

⋆

Proof of Theorem 10.9. We assume that the grammar 𝐺 is in Chomsky normal form as in Theorem 10.5. We consider the following procedure for marking variables as "non-empty":

1. We start by marking all variables 𝑣 that are involved in a rule of the form 𝑣 ⇒ 𝜎 as non-empty.

2. We then continue to mark 𝑣 as non-empty if it is involved in a rule of the form 𝑣 ⇒ 𝑢𝑤 where 𝑢, 𝑤 have been marked before.

We continue this way until we cannot mark any more variables. We then declare that the grammar is empty if and only if 𝑠 has not been marked. To see why this is a valid algorithm, note that if a variable 𝑣 has been marked as "non-empty" then there is some string 𝛼 ∈ Σ∗ that can be derived from 𝑣. On the other hand, if 𝑣 has not been marked, then every sequence of derivations from 𝑣 will always have a variable that has not been replaced by alphabet symbols. Hence in particular Φ𝐺 is the all-zero function if and only if the starting variable 𝑠 is not marked "non-empty".
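This marking procedure translates directly into code. Here is a short sketch (our own illustration): each variable maps to a list of rule bodies, where a body is either a terminal symbol or a pair of variables, as in Chomsky normal form.

```python
def cfg_is_empty(rules, start):
    """Emptiness check for a grammar in Chomsky normal form, following
    the marking procedure above: a variable is non-empty if it has a
    rule v -> sigma, or a rule v -> u w with u and w both marked.
    """
    non_empty = set()
    changed = True
    while changed:              # repeat until no new variable gets marked
        changed = False
        for var, bodies in rules.items():
            if var in non_empty:
                continue
            for body in bodies:
                if isinstance(body, str):                     # v -> sigma
                    non_empty.add(var)
                    changed = True
                    break
                u, w = body                                   # v -> u w
                if u in non_empty and w in non_empty:
                    non_empty.add(var)
                    changed = True
                    break
    return start not in non_empty
```

For instance, the grammar with rules S ⇒ AB, A ⇒ 0, B ⇒ BB is empty: B can never be marked, and hence neither can S.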


10.3.1 Uncomputability of context-free grammar equivalence (optional)
By analogy to regular expressions, one might have hoped to get an algorithm for deciding whether two given context free grammars are equivalent. Alas, no such luck. It turns out that the equivalence problem for context free grammars is uncomputable. This is a direct corollary of the following theorem:

Theorem 10.10 — Fullness of CFGs is uncomputable. For every set Σ, let CFGFULLΣ be the function that on input a context-free grammar 𝐺 over Σ, outputs 1 if and only if 𝐺 computes the constant 1 function. Then there is some finite Σ such that CFGFULLΣ is uncomputable.

Theorem 10.10 immediately implies that equivalence for context-free grammars is uncomputable, since computing "fullness" of a grammar 𝐺 over some alphabet Σ = {𝜎0, …, 𝜎𝑘−1} corresponds to checking whether 𝐺 is equivalent to the grammar 𝑠 ⇒ "" | 𝑠𝜎0 | ⋯ | 𝑠𝜎𝑘−1. Note that Theorem 10.10 and Theorem 10.9 together imply that context-free grammars, unlike regular expressions, are not closed under complement. (Can you see why?) Since we can encode every element of Σ using ⌈log |Σ|⌉ bits (and this finite encoding can be easily carried out within a grammar), Theorem 10.10 implies that fullness is also uncomputable for grammars over the binary alphabet.

Proof Idea:

We prove the theorem by reducing from the Halting problem. To do that we use the notion of configurations of NAND-TM programs, as defined in Definition 8.8. Recall that a configuration of a program 𝑃 is a binary string 𝑠 that encodes all the information about the program in the current iteration.

We define Σ to be {0, 1} plus some separator characters and define INVALID𝑃 ∶ Σ∗ → {0, 1} to be the function that maps every string 𝐿 ∈ Σ∗ to 1 if and only if 𝐿 does not encode a sequence of configurations that correspond to a valid halting history of the computation of 𝑃 on the empty input.

The heart of the proof is to show that INVALID𝑃 is context-free. Once we do that, we see that 𝑃 does not halt on the empty input if and only if INVALID𝑃(𝐿) = 1 for every 𝐿. To show that, we will encode the list in a special way that makes it amenable to deciding via a context-free grammar. Specifically we will reverse all the odd-numbered strings.

⋆

Proof of Theorem 10.10. We only sketch the proof. We will show that if we can compute CFGFULL then we can solve HALTONZERO, which has been proven uncomputable in Theorem 9.9. Let 𝑀 be an input


Turing machine for HALTONZERO. We will use the notion of configurations of a Turing machine, as defined in Definition 8.8.

Recall that a configuration of a Turing machine 𝑀 and input 𝑥 captures the full state of 𝑀 at some point of the computation. The particular details of configurations are not so important, but what you need to remember is that:

• A configuration can be encoded by a binary string 𝜎 ∈ {0, 1}∗.

• The initial configuration of 𝑀 on the input 0 is some fixed string.

• A halting configuration will have the value of a certain state (which can be easily "read off" from it) set to 1.

โ€ข If ๐œŽ is a configuration at some step ๐‘– of the computation, we denoteby NEXT๐‘€(๐œŽ) as the configuration at the next step. NEXT๐‘€(๐œŽ) isa string that agrees with ๐œŽ on all but a constant number of coor-dinates (those encoding the position corresponding to the headposition and the two adjacent ones). On those coordinates, thevalue of NEXT๐‘€(๐œŽ) can be computed by some finite function.

We will let the alphabet Σ = {0, 1} ∪ {‖, #}. A computation history of 𝑀 on the input 0 is a string 𝐿 ∈ Σ∗ that corresponds to a list ‖𝜎0#𝜎1‖𝜎2#𝜎3 ⋯ 𝜎𝑡−2‖𝜎𝑡−1# (i.e., ‖ comes before an even-numbered block, and # comes before an odd-numbered one) such that if 𝑖 is even then 𝜎𝑖 is the string encoding the configuration of 𝑃 on input 0 at the beginning of its 𝑖-th iteration, and if 𝑖 is odd then it is the same except the string is reversed. (That is, for odd 𝑖, 𝑟𝑒𝑣(𝜎𝑖) encodes the configuration of 𝑃 on input 0 at the beginning of its 𝑖-th iteration.) Reversing the odd-numbered blocks is a technical trick to ensure that the function INVALID𝑀 we define below is context free.

We now define INVALID𝑀 ∶ Σ∗ → {0, 1} as follows:

INVALID𝑀(𝐿) = 0 if 𝐿 is a valid computation history of 𝑀 on 0, and INVALID𝑀(𝐿) = 1 otherwise.   (10.3)

We will show the following claim:

CLAIM: INVALID𝑀 is context-free.

The claim implies the theorem. Since 𝑀 halts on 0 if and only if

there exists a valid computation history, INVALID𝑀 is the constant one function if and only if 𝑀 does not halt on 0. In particular, this allows us to reduce determining whether 𝑀 halts on 0 to determining whether the grammar 𝐺𝑀 corresponding to INVALID𝑀 is full.

We now turn to the proof of the claim. We will not show all the details, but the main point is that INVALID𝑀(𝐿) = 1 if at least one of the following three conditions holds:


1. 𝐿 is not of the right format, i.e. not of the form ⟨binary-string⟩#⟨binary-string⟩‖⟨binary-string⟩# ⋯.

2. 𝐿 contains a substring of the form ‖𝜎#𝜎′‖ such that 𝜎′ ≠ 𝑟𝑒𝑣(NEXT𝑃(𝜎)).

3. 𝐿 contains a substring of the form #𝜎‖𝜎′# such that 𝜎′ ≠ NEXT𝑃(𝑟𝑒𝑣(𝜎)).

Since context-free functions are closed under the OR operation, the claim will follow if we show that we can verify conditions 1, 2 and 3 via a context-free grammar.

For condition 1 this is very simple: checking that 𝐿 is of the correct format can be done using a regular expression. Since regular expressions are closed under negation, this means that checking that 𝐿 is not of this format can also be done by a regular expression and hence by a context-free grammar.

For conditions 2 and 3, this follows via very similar reasoning to that showing that the function 𝐹 such that 𝐹(𝑢#𝑣) = 1 iff 𝑢 ≠ 𝑟𝑒𝑣(𝑣) is context-free; see Solved Exercise 10.2. After all, the NEXT𝑀 function only modifies its input in a constant number of places. We leave filling out the details as an exercise to the reader. Since INVALID𝑀(𝐿) = 1 if and only if 𝐿 satisfies one of the conditions 1, 2 or 3, and all three conditions can be tested for via a context-free grammar, this completes the proof of the claim and hence the theorem.

10.4 SUMMARY OF SEMANTIC PROPERTIES FOR REGULAR EXPRESSIONS AND CONTEXT-FREE GRAMMARS

To summarize, we can often trade expressiveness of the model for amenability to analysis. If we consider computational models that are not Turing complete, then we are sometimes able to bypass Rice's Theorem and answer certain semantic questions about programs in such models. Here is a summary of some of what is known about semantic questions for the different models we have seen.

Table 10.1: Computability of semantic properties

Model                    Halting        Emptiness      Equivalence
Regular expressions      Computable     Computable     Computable
Context free grammars    Computable     Computable     Uncomputable
Turing-complete models   Uncomputable   Uncomputable   Uncomputable


Chapter Recap

• The uncomputability of the Halting problem for general models motivates the definition of restricted computational models.

• In some restricted models we can answer semantic questions such as: does a given program terminate, or do two programs compute the same function?

• Regular expressions are a restricted model of computation that is often useful to capture tasks of string matching. We can test efficiently whether an expression matches a string, as well as answer questions such as Halting and Equivalence.

• Context free grammars are a stronger, yet still not Turing complete, model of computation. The halting problem for context free grammars is computable, but equivalence is not computable.

10.5 EXERCISES

Exercise 10.1 — Closure properties of context-free functions. Suppose that 𝐹, 𝐺 ∶ {0, 1}∗ → {0, 1} are context free. For each one of the following definitions of the function 𝐻, either prove that 𝐻 is always context free or give a counterexample for regular 𝐹, 𝐺 that would make 𝐻 not context free.

1. ๐ป(๐‘ฅ) = ๐น(๐‘ฅ) โˆจ ๐บ(๐‘ฅ).2. ๐ป(๐‘ฅ) = ๐น(๐‘ฅ) โˆง ๐บ(๐‘ฅ)3. ๐ป(๐‘ฅ) = NAND(๐น(๐‘ฅ), ๐บ(๐‘ฅ)).4. ๐ป(๐‘ฅ) = ๐น(๐‘ฅ๐‘…) where ๐‘ฅ๐‘… is the reverse of ๐‘ฅ: ๐‘ฅ๐‘… = ๐‘ฅ๐‘›โˆ’1๐‘ฅ๐‘›โˆ’2 โ‹ฏ ๐‘ฅ๐‘œ for๐‘› = |๐‘ฅ|.5. ๐ป(๐‘ฅ) = โŽงโŽจโŽฉ1 ๐‘ฅ = ๐‘ข๐‘ฃ s.t. ๐น(๐‘ข) = ๐บ(๐‘ฃ) = 10 otherwise

6. ๐ป(๐‘ฅ) = โŽงโŽจโŽฉ1 ๐‘ฅ = ๐‘ข๐‘ข s.t. ๐น(๐‘ข) = ๐บ(๐‘ข) = 10 otherwise

7. ๐ป(๐‘ฅ) = โŽงโŽจโŽฉ1 ๐‘ฅ = ๐‘ข๐‘ข๐‘… s.t. ๐น(๐‘ข) = ๐บ(๐‘ข) = 10 otherwise

Exercise 10.2 Prove that the function 𝐹 ∶ {0, 1}∗ → {0, 1} such that 𝐹(𝑥) = 1 if and only if |𝑥| is a power of two is not context free.


2 Try to see if you can "embed" in some way a function that looks similar to MATCHPAREN in SYN, so you can use a similar proof. Of course for a function to be non-regular, it does not need to utilize literal parentheses symbols.

Exercise 10.3 — Syntax for programming languages. Consider the following syntax of a "programming language" whose source can be written using the ASCII character set:

• Variables are obtained by a sequence of letters, numbers and underscores, but can't start with a number.

• A statement has either the form foo = bar; where foo and bar are variables, or the form IF (foo) BEGIN ... END where ... is a list of one or more statements, potentially separated by newlines.

A program in our language is simply a sequence of statements (possibly separated by newlines or spaces).

1. Let VAR ∶ {0, 1}∗ → {0, 1} be the function that given a string 𝑥 ∈ {0, 1}∗, outputs 1 if and only if 𝑥 corresponds to an ASCII encoding of a valid variable identifier. Prove that VAR is regular.

2. Let SYN ∶ {0, 1}∗ → {0, 1} be the function that given a string 𝑠 ∈ {0, 1}∗, outputs 1 if and only if 𝑠 is an ASCII encoding of a valid program in our language. Prove that SYN is context free. (You do not have to specify the full formal grammar for SYN, but you need to show that such a grammar exists.)

3. Prove that SYN is not regular. See the footnote for a hint.2

10.6 BIBLIOGRAPHICAL NOTES

As in the case of regular expressions, there are many resources available that cover context-free grammars in great detail. Chapter 2 of [Sip97] contains many examples of context-free grammars and their properties. There are also websites such as Grammophone where you can input grammars, and see what strings they generate, as well as some of the properties that they satisfy.

The adjective "context free" is used for CFGs because a rule of the form 𝑣 ↦ 𝑎 means that we can always replace 𝑣 with the string 𝑎, no matter what is the context in which 𝑣 appears. More generally, we might want to consider cases where the replacement rules depend on the context. This gives rise to the notion of general (aka "Type 0") grammars that allow rules of the form 𝑎 ⇒ 𝑏 where both 𝑎 and 𝑏 are strings over (𝑉 ∪ Σ)∗. The idea is that if, for example, we wanted to enforce the condition that we only apply some rule such as 𝑣 ↦ 0𝑤1 when 𝑣 is surrounded by three zeroes on both sides, then we could do so by adding a rule of the form 000𝑣000 ↦ 0000𝑤1000 (and of course we can add much more general conditions). Alas, this generality


comes at a cost: general grammars are Turing complete and hence their halting problem is uncomputable. That is, there is no algorithm 𝐴 that can determine for every general grammar 𝐺 and a string 𝑥, whether or not the grammar 𝐺 generates 𝑥.

The Chomsky Hierarchy is a hierarchy of grammars from the least restrictive (most powerful) Type 0 grammars, which correspond to recursively enumerable languages (see Exercise 9.10), to the most restrictive Type 3 grammars, which correspond to regular languages. Context-free languages correspond to Type 2 grammars. Type 1 grammars are context sensitive grammars. These are more powerful than context-free grammars but still less powerful than Turing machines. In particular functions/languages corresponding to context-sensitive grammars are always computable, and in fact can be computed by a linear bounded automaton, which is a non-deterministic algorithm that takes 𝑂(𝑛) space. For this reason, the class of functions/languages corresponding to context-sensitive grammars is also known as the complexity class NSPACE(𝑂(𝑛)); we discuss space-bounded complexity in Chapter 17. While Rice's Theorem implies that we cannot compute any non-trivial semantic property of Type 0 grammars, the situation is more complex for other types of grammars: some semantic properties can be determined and some cannot, depending on the grammar's place in the hierarchy.


11 Is every theorem provable?

"Take any definite unsolved problem, such as … the existence of an infinite number of prime numbers of the form 2^𝑛 + 1. However unapproachable these problems may seem to us and however helpless we stand before them, we have, nevertheless, the firm conviction that their solution must follow by a finite number of purely logical processes…"

"…This conviction of the solvability of every mathematical problem is a powerful incentive to the worker. We hear within us the perpetual call: There is the problem. Seek its solution. You can find it by pure reason, for in mathematics there is no ignorabimus.", David Hilbert, 1900.

"The meaning of a statement is its method of verification.", Moritz Schlick, 1938 (aka "The verification principle" of logical positivism)

The problems shown uncomputable in Chapter 9, while natural and important, still intimately involved NAND-TM programs or other computing mechanisms in their definitions. One could perhaps hope that as long as we steer clear of functions whose inputs are themselves programs, we can avoid the "curse of uncomputability". Alas, we have no such luck.

In this chapter we will see an example of a natural and seemingly "computation free" problem that nevertheless turns out to be uncomputable: solving Diophantine equations. As a corollary, we will see one of the most striking results of 20th century mathematics: Gödel's Incompleteness Theorem, which showed that there are some mathematical statements (in fact, in number theory) that are inherently unprovable. We will actually start with the latter result, and then show the former.

11.1 HILBERT'S PROGRAM AND GÖDEL'S INCOMPLETENESS THEOREM

"And what are these … vanishing increments? They are neither finite quantities, nor quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?", George Berkeley, Bishop of Cloyne, 1734.


Learning Objectives:
• More examples of uncomputable functions that are not as tied to computation.
• Gödel's incompleteness theorem - a result that shook the world of mathematics in the early 20th century.


Figure 11.1: Outline of the results of this chapter. One version of Gödel's Incompleteness Theorem is an immediate consequence of the uncomputability of the Halting problem. To obtain the theorem as originally stated (for statements about the integers) we first prove that the QMS problem of determining truth of quantified statements involving both integers and strings is uncomputable. We do so using the notion of Turing Machine configurations but there are alternative approaches to do so as well, see Remark 11.14.

The 1700's and 1800's were a time of great discoveries in mathematics but also of several crises. The discovery of calculus by Newton and Leibniz in the late 1600's ushered in a golden age of problem solving. Many longstanding challenges succumbed to the new tools that were discovered, and mathematicians got ever better at doing some truly impressive calculations. However, the rigorous foundations behind these calculations left much to be desired. Mathematicians manipulated infinitesimal quantities and infinite series cavalierly, and while most of the time they ended up with the correct results, there were a few strange examples (such as trying to calculate the value of the infinite series 1 − 1 + 1 − 1 + 1 + …) which seemed to give different answers depending on the method of calculation. This led to a growing sense of unease in the foundations of the subject which was addressed in the works of mathematicians such as Cauchy, Weierstrass, and Riemann, who eventually placed analysis on firmer foundations, giving rise to the 𝜖's and 𝛿's that students taking honors calculus grapple with to this day.

In the beginning of the 20th century, there was an effort to replicate this effort, in greater rigor, to all parts of mathematics. The hope was to show that all the true results of mathematics can be obtained by starting with a number of axioms, and deriving theorems from them using logical rules of inference. This effort was known as the Hilbert program, named after the influential mathematician David Hilbert.

Alas, it turns out the results we've seen dealt a devastating blow to this program, as was shown by Kurt Gödel in 1931:

Theorem 11.1 — Gödel's Incompleteness Theorem: informal version. For every sound proof system 𝑉 for sufficiently rich mathematical statements, there is a mathematical statement that is true but is not provable in 𝑉.


1 This happens to be a false statement.

2 It is unknown whether this statement is true or false.

11.1.1 Defining "Proof Systems"
Before proving Theorem 11.1, we need to define "proof systems" and even formally define the notion of a "mathematical statement". In geometry and other areas of mathematics, proof systems are often defined by starting with some basic assumptions or axioms and then deriving more statements by using inference rules such as the famous Modus Ponens, but what axioms shall we use? What rules? We will use an extremely general notion of proof systems, not even restricting ourselves to ones that have the form of axioms and inference.

Mathematical statements. At the highest level, a mathematical statement is simply a piece of text, which we can think of as a string 𝑥 ∈ {0, 1}∗. Mathematical statements contain assertions whose truth does not depend on any empirical fact, but rather only on properties of abstract objects. For example, the following is a mathematical statement:1

"The number 2,696,635,869,504,783,333,238,805,675,613,588,278,597,832,162,617,892,474,670,798,113 is prime".

Mathematical statements do not have to involve numbers. They can assert properties of any other mathematical object including sets, strings, functions, graphs and yes, even programs. Thus, another example of a mathematical statement is the following:²

The following Python function halts on every positive integer n:

def f(n):
    if n == 1: return 1
    return f(3*n+1) if n % 2 else f(n//2)

Proof systems. A proof for a statement x ∈ {0,1}* is another piece of text w ∈ {0,1}* that certifies the truth of the statement asserted in x. The conditions for a valid proof system are:

1. (Effectiveness) Given a statement x and a proof w, there is an algorithm to verify whether or not w is a valid proof for x. (For example, by going line by line and checking that each line follows from the preceding ones using one of the allowed inference rules.)

2. (Soundness) If there is a valid proof w for x, then x is true.

These are quite minimal requirements for a proof system. Requirement 2 (soundness) is the very definition of a proof system: you shouldn't be able to prove things that are not true. Requirement 1 is also essential. If there is no set of rules (i.e., an algorithm) to check that a proof is valid then in what sense is it a proof system? We could


replace it with a system where the "proof" for a statement x is "trust me: it's true".

We formally define a proof system as an algorithm V where V(x, w) = 1 holds if the string w is a valid proof for the statement x. Even if x is true, the string w does not have to be a valid proof for it (there are plenty of wrong proofs for true statements such as 4 = 2 + 2) but if w is a valid proof for x then x must be true.

Definition 11.2 — Proof systems. Let 𝒯 ⊆ {0,1}* be some set (which we consider the "true" statements). A proof system for 𝒯 is an algorithm V that satisfies:

1. (Effectiveness) For every x, w ∈ {0,1}*, V(x, w) halts with an output of either 0 or 1.

2. (Soundness) For every x ∉ 𝒯 and w ∈ {0,1}*, V(x, w) = 0.

A true statement x ∈ 𝒯 is unprovable (with respect to V) if for every w ∈ {0,1}*, V(x, w) = 0. We say that V is complete if there does not exist a true statement x that is unprovable with respect to V.

Big Idea 15: A proof is just a string of text whose meaning is given by a verification algorithm.
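To make Definition 11.2 concrete, here is a minimal toy verifier of our own invention (not from the text): a proof system V for the easily decidable set of true statements of the form "a+b=c". For such simple statements the "proof" string is not even needed, and soundness holds because V accepts only when the equation actually checks out.

```python
# A toy proof system in the sense of Definition 11.2, for the set of
# true statements of the form "a+b=c". This is a hypothetical example;
# interesting proof systems verify far richer statements.

def V(x, w):
    # The verifier gets a statement x and a purported proof w, and must
    # halt with output 0 or 1 (effectiveness). Here w is ignored: the
    # statement is so simple that V can check it directly, which
    # trivially gives soundness.
    try:
        lhs, c = x.split("=")
        a, b = lhs.split("+")
        return 1 if int(a) + int(b) == int(c) else 0
    except ValueError:
        return 0  # malformed statements are never "provable"
```

This V happens to be complete as well, since it accepts every true statement of this form (with any w). Gödel's theorem says that no such complete verifier can exist once the statements are rich enough to talk about, say, the halting of programs.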

11.2 GÖDEL'S INCOMPLETENESS THEOREM: COMPUTATIONAL VARIANT

Our first formalization of Theorem 11.1 involves statements about Turing machines. We let ℋ be the set of strings x ∈ {0,1}* that have the form "Turing machine M halts on the zero input".

Theorem 11.3 — Gödel's Incompleteness Theorem: computational variant.

There does not exist a complete proof system for ℋ.

Proof Idea:

If we had such a complete and sound proof system then we could solve the HALTONZERO problem. On input a Turing machine M, we would search all purported proofs w and halt as soon as we find a proof of either "M halts on zero" or "M does not halt on zero". If the system is sound and complete then we will eventually find such a proof, and it will provide us with the correct output. ⋆


Proof of Theorem 11.3. Assume for the sake of contradiction that there was such a proof system V. We will use V to build an algorithm A that computes HALTONZERO, hence contradicting Theorem 9.9. Our algorithm A will work as follows:

Algorithm 11.4 — Halting from proofs.

Input: Turing machine M
Output: 1 if M halts on the input 0; 0 otherwise.

1: for n = 1, 2, 3, … do
2:   for w ∈ {0,1}ⁿ do
3:     if V("M halts on 0", w) = 1 then
4:       return 1
5:     end if
6:     if V("M does not halt on 0", w) = 1 then
7:       return 0
8:     end if
9:   end for
10: end for

If ๐‘€ halts on 0 then under our assumption there exists ๐‘ค thatproves this fact, and so when Algorithm ๐ด reaches ๐‘› = |๐‘ค| we willeventually find this ๐‘ค and output 1, unless we already halted be-fore. But we cannot halt before and output a wrong answer becauseit would contradict the soundness of the proof system. Similarly, thisshows that if ๐‘€ does not halt on 0 then (since we assume there is aproof of this fact too) our algorithm ๐ด will eventually halt and output0.

Remark 11.5 — The Gödel statement (optional). One can extract from the proof of Theorem 11.3 a procedure that for every proof system V, yields a true statement x* that cannot be proven in V. But Gödel's proof gave a very explicit description of such a statement x*, which is closely related to the "Liar's paradox". That is, Gödel's statement x* was designed to be true if and only if ∀w∈{0,1}* V(x*, w) = 0. In other words, it satisfied the following property:

x* is true ⇔ x* does not have a proof in V   (11.1)

One can see that if x* is true, then it does not have a proof. On the other hand, if x* is false then (assuming the proof system is sound) it cannot have a proof, which by (11.1) would make x* true. Hence x*


must be both true and unprovable. One might wonder how it is possible to come up with an x* that satisfies a condition such as (11.1), where the same string x* appears on both the right-hand side and the left-hand side of the equation. The idea is that the proof of Theorem 11.3 yields a way to transform every statement x into a statement F(x) that is true if and only if x does not have a proof in V. Thus x* needs to be a fixed point of F: a sentence such that x* = F(x*). It turns out that we can always find such a fixed point of F. We've already seen this phenomenon in the λ calculus, where the Y combinator maps every F into a fixed point Y F of F. This is closely related to the idea of programs that can print their own code. Indeed, Scott Aaronson likes to describe Gödel's statement as follows:

The following sentence repeated twice, the second time in quotes, is not provable in the formal system V. "The following sentence repeated twice, the second time in quotes, is not provable in the formal system V."

In the argument above we actually showed that x* is true, under the assumption that V is sound. Since x* is true and does not have a proof in V, this means that we cannot carry out the above argument in the system V, which means that V cannot prove its own soundness (or even consistency: that there is no proof of both a statement and its negation). Using this idea, it's not hard to get Gödel's second incompleteness theorem, which says that every sufficiently rich V cannot prove its own consistency. That is, if we formalize the statement c* that is true if and only if V is consistent (i.e., V cannot prove both a statement and the statement's negation), then c* cannot be proven in V.

11.3 QUANTIFIED INTEGER STATEMENTS

There is something "unsatisfying" about Theorem 11.3. Sure, it shows there are statements that are unprovable, but they don't feel like "real" statements about math. After all, they talk about programs rather than numbers, matrices, or derivatives, or whatever it is they teach in math courses. It turns out that we can get an analogous result for statements such as "there are no positive integers x and y such that x² − 2 = y⁷", or "there are positive integers x, y, z such that x² + y⁶ = z¹¹" that only talk about natural numbers. It doesn't get much more "real math" than this. Indeed, the 19th century mathematician Leopold Kronecker famously said that "God made the integers, all else is the work of man." (By the way, the status of the above two statements is unknown.)


To make this more precise, let us define the notion of quantified integer statements:

Definition 11.6 — Quantified integer statements. A quantified integer statement is a well-formed statement with no unbound variables involving integers, variables, the operators >, <, ×, +, −, =, the logical operations ¬ (NOT), ∧ (AND), and ∨ (OR), as well as quantifiers of the form ∃x∈ℕ and ∀y∈ℕ where x, y are variable names.

We often care deeply about determining the truth of quantified integer statements. For example, the statement that Fermat's Last Theorem is true for n = 3 can be phrased as the quantified integer statement

¬∃a∈ℕ ∃b∈ℕ ∃c∈ℕ (a > 0) ∧ (b > 0) ∧ (c > 0) ∧ (a × a × a + b × b × b = c × c × c) .   (11.2)

The twin prime conjecture, which states that there is an infinite number of numbers p such that both p and p + 2 are primes, can be phrased as the quantified integer statement

∀n∈ℕ ∃p∈ℕ (p > n) ∧ PRIME(p) ∧ PRIME(p + 2)   (11.3)

where we replace an instance of PRIME(q) with the statement (q > 1) ∧ ∀a∈ℕ ∀b∈ℕ (a = 1) ∨ (a = q) ∨ ¬(a × b = q).

The claim (mentioned in Hilbert's quote above) that there are infinitely many primes of the form p = 2ⁿ + 1 can be phrased as follows:

โˆ€๐‘›โˆˆโ„•โˆƒ๐‘โˆˆโ„•(๐‘ > ๐‘›) โˆง PRIME(๐‘)โˆง(โˆ€๐‘˜โˆˆโ„•(๐‘˜ โ‰  2 โˆง PRIME(๐‘˜)) โ‡’ ยฌDIVIDES(๐‘˜, ๐‘ โˆ’ 1)) (11.4)

where DIVIDES(a, b) is the statement ∃c∈ℕ a × c = b (i.e., a divides b). In English, this corresponds to the claim that for every n there is some p > n such that all of p − 1's prime factors are equal to 2.
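While no algorithm can decide quantified integer statements in general, particular bounded instances can of course be tested. The following sketch (our own illustration, not from the text) checks the PRIME macro and a bounded version of statement (11.2) by brute force; it is the quantifiers over all of ℕ that put the general problem out of reach.

```python
def is_prime(q):
    # PRIME(q): q > 1 and there are no a, b with a != 1, a != q, a*b = q
    return q > 1 and all(q % a != 0 for a in range(2, q))

def fermat3_holds_up_to(bound):
    # Bounded check of statement (11.2): no positive a, b, c <= bound
    # with a^3 + b^3 = c^3. A result of True here is mere evidence,
    # not a proof: the real statement quantifies over all of N.
    return not any(a**3 + b**3 == c**3
                   for a in range(1, bound + 1)
                   for b in range(1, bound + 1)
                   for c in range(1, bound + 1))
```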

Remark 11.7 — Syntactic sugar for quantified integer statements. To make our statements more readable, we often use syntactic sugar and so write x ≠ y as shorthand for ¬(x = y), and so on. Similarly, the "implication operator" a ⇒ b is "syntactic sugar" or shorthand for ¬a ∨ b, and the "if and only if operator" a ⇔ b is shorthand for (a ⇒ b) ∧ (b ⇒ a). We will also allow ourselves the use of "macros": plugging in one quantified integer statement in another, as we did with DIVIDES and PRIME above.


Much of number theory is concerned with determining the truth of quantified integer statements. Since our experience has been that, given enough time (which could sometimes be several centuries) humanity has managed to do so for the statements that it cared enough about, one could (as Hilbert did) hope that eventually we would be able to prove or disprove all such statements. Alas, this turns out to be impossible:

Theorem 11.8 — Gödel's Incompleteness Theorem for quantified integer statements. Let V : {0,1}* → {0,1} be a computable purported verification procedure for quantified integer statements. Then either:

• V is not sound: There exists a false statement x and a string w ∈ {0,1}* such that V(x, w) = 1; or

• V is not complete: There exists a true statement x such that for every w ∈ {0,1}*, V(x, w) = 0.

Theorem 11.8 is a direct corollary of the following result, just as Theorem 11.3 was a direct corollary of the uncomputability of HALTONZERO:

Theorem 11.9 — Uncomputability of quantified integer statements. Let QIS : {0,1}* → {0,1} be the function that given a (string representation of a) quantified integer statement outputs 1 if it is true and 0 if it is false. Then QIS is uncomputable.

Since a quantified integer statement is simply a sequence of symbols, we can easily represent it as a string. For simplicity we will assume that every string represents some quantified integer statement, by mapping strings that do not correspond to such a statement to an arbitrary statement such as ∃x∈ℕ x = 1.

Please stop here and make sure you understand why the uncomputability of QIS (i.e., Theorem 11.9) means that there is no sound and complete proof system for proving quantified integer statements (i.e., Theorem 11.8). This follows in the same way that Theorem 11.3 followed from the uncomputability of HALTONZERO, but working out the details is a great exercise (see Exercise 11.1).

In the rest of this chapter, we will show the proof of Theorem 11.8, following the outline illustrated in Fig. 11.1.


Figure 11.2: Diophantine equations such as finding a positive integer solution to the equation a(a + b)(a + c) + b(b + a)(b + c) + c(c + a)(c + b) = 4(a + b)(a + c)(b + c) (depicted more compactly and whimsically above) can be surprisingly difficult. There are many equations for which we do not know if they have a solution, and there is no algorithm to solve them in general. The smallest solution for this equation has 80 digits! See this Quora post for more information, including the credits for this image.

³ This is a special case of what's known as "Fermat's Last Theorem", which states that aⁿ + bⁿ = cⁿ has no solution in positive integers for n > 2. This was conjectured in 1637 by Pierre de Fermat but only proven by Andrew Wiles in 1995. The case n = 11 (along with all other so-called "regular prime exponents") was established by Kummer in 1850.

11.4 DIOPHANTINE EQUATIONS AND THE MRDP THEOREM

Many of the functions people wanted to compute over the years involved solving equations. These have a much longer history than mechanical computers. The Babylonians already knew how to solve some quadratic equations in 2000 BC, and the formula for all quadratics appears in the Bakhshali Manuscript that was composed in India around the 3rd century. During the Renaissance, Italian mathematicians discovered generalizations of these formulas for cubic and quartic (degree 3 and 4) equations. Many of the greatest minds of the 17th and 18th century, including Euler, Lagrange, Leibniz and Gauss, worked on the problem of finding such a formula for quintic equations to no avail, until in the 19th century Ruffini, Abel and Galois showed that no such formula exists, along the way giving birth to group theory.

However, the fact that there is no closed-form formula does not mean we cannot solve such equations. People have been solving higher degree equations numerically for ages. The Chinese manuscript Jiuzhang Suanshu from the first century mentions such approaches. Solving polynomial equations is by no means restricted only to ancient history or to students' homework. The gradient descent method is the workhorse powering many of the machine learning tools that have revolutionized Computer Science over the last several years.
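As a small illustration of the numerical approach (our own example, not from the text), Newton's method approximates a root of the quintic x⁵ − x − 1, an equation for which no formula in radicals exists:

```python
def newton_root(f, df, x0, iters=60):
    # Newton's method: repeatedly replace x by x - f(x)/f'(x).
    x = x0
    for _ in range(iters):
        x = x - f(x) / df(x)
    return x

# Approximate the real root of x^5 - x - 1 = 0 (about 1.1673).
root = newton_root(lambda x: x**5 - x - 1, lambda x: 5 * x**4 - 1, 1.0)
```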

But there are some equations that we simply do not know how to solve by any means. For example, it took more than 200 years until people succeeded in proving that the equation a¹¹ + b¹¹ = c¹¹ has no solution in positive integers.³ The notorious difficulty of so-called Diophantine equations (i.e., finding integer roots of a polynomial) motivated the mathematician David Hilbert in 1900 to include the question of finding a general procedure for solving such equations in his famous list of twenty-three open problems for mathematics of the 20th century. I don't think Hilbert doubted that such a procedure exists. After all, the whole history of mathematics up to this point involved the discovery of ever more powerful methods, and even impossibility results such as the inability to trisect an angle with a straightedge and compass, or the non-existence of an algebraic formula for quintic equations, merely pointed out the need to use more general methods.

Alas, this turned out not to be the case for Diophantine equations. In 1970, Yuri Matiyasevich, building on a decades-long line of work by Martin Davis, Hilary Putnam and Julia Robinson, showed that there is simply no method to solve such equations in general:

Theorem 11.10 — MRDP Theorem. Let DIO : {0,1}* → {0,1} be the function that takes as input a string describing a 100-variable polynomial with integer coefficients P(x₀, …, x₉₉) and outputs 1 if and only if there exists z₀, …, z₉₉ ∈ ℕ s.t. P(z₀, …, z₉₉) = 0.

Then DIO is uncomputable.

As usual, we assume some standard way to express numbers and text as binary strings. The constant 100 is of course arbitrary; the problem is known to be uncomputable even for polynomials of degree four and at most 58 variables. In fact the number of variables can be reduced to nine, at the expense of the polynomial having a larger (but still constant) degree. See Jones's paper for more about this issue.
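One direction is easy and worth keeping in mind: DIO is semi-decidable. We can enumerate candidate roots and accept if we find one; the MRDP Theorem says that no procedure can ever certify that no root exists. Here is a hedged sketch of the search (our own illustration, with the polynomial given as a Python callable rather than a string):

```python
from itertools import count, product

def dio_search(P, nvars):
    """Search for a root of P over the natural numbers.

    Enumerates tuples whose largest entry is 1, then 2, then 3, ...
    and returns a root as soon as one is found. If P has no root,
    the search runs forever; that one-sided behavior is all the
    MRDP Theorem leaves us.
    """
    for bound in count(1):
        for z in product(range(bound + 1), repeat=nvars):
            if bound > 1 and max(z) < bound:
                continue  # already tried in an earlier round
            if P(*z) == 0:
                return z
```

For example, dio_search(lambda a, b: a*a + b*b - 25, 2) terminates with a root of a² + b² = 25, whereas calling it on a root-free polynomial such as x² + 1 would loop forever.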

Remark 11.11 — Active code vs. static data. The difficulty in finding a way to distinguish between "code" such as NAND-TM programs, and "static content" such as polynomials is just another manifestation of the phenomenon that code is the same as data. While a foolproof solution for distinguishing between the two is inherently impossible, finding heuristics that do a reasonable job keeps many firewall and anti-virus manufacturers very busy (and finding ways to bypass these tools keeps many hackers busy as well).

11.5 HARDNESS OF QUANTIFIED INTEGER STATEMENTS

We will not prove the MRDP Theorem (Theorem 11.10). However, as we mentioned, we will prove the uncomputability of QIS (i.e., Theorem 11.9), which is a special case of the MRDP Theorem. The reason is that a Diophantine equation is a special case of a quantified integer statement where the only quantifier is ∃. This means that deciding the truth of quantified integer statements is a potentially harder problem than solving Diophantine equations, and so it is potentially easier to prove that QIS is uncomputable.

If you find the last sentence confusing, it is worthwhile to reread it until you are sure you follow its logic. We are so accustomed to trying to find solutions for problems that it can sometimes be hard to follow the arguments for showing that problems are uncomputable.

Our proof of the uncomputability of QIS (i.e., Theorem 11.9) will, as usual, go by reduction from the Halting problem, but we will do so in two steps:


1. We will first use a reduction from the Halting problem to show that deciding the truth of quantified mixed statements is uncomputable. Quantified mixed statements involve both strings and integers. Since quantified mixed statements are a more general concept than quantified integer statements, it is easier to prove the uncomputability of deciding their truth.

2. We will then reduce the problem of quantified mixed statements to quantified integer statements.

11.5.1 Step 1: Quantified mixed statements and computation histories

We define quantified mixed statements as statements involving not just integers and the usual arithmetic operators, but also string variables.

Definition 11.12 — Quantified mixed statements. A quantified mixed statement is a well-formed statement with no unbound variables involving integers, variables, the operators >, <, ×, +, −, =, the logical operations ¬ (NOT), ∧ (AND), and ∨ (OR), as well as quantifiers of the form ∃x∈ℕ, ∃a∈{0,1}*, ∀y∈ℕ, ∀b∈{0,1}* where x, y, a, b are variable names. These also include the operator |a|, which returns the length of a string-valued variable a, as well as the operator aᵢ where a is a string-valued variable and i is an integer-valued expression, which is true if i is smaller than the length of a and the i-th coordinate of a is 1, and is false otherwise.

For example, the true statement that for every string a there is a string b that corresponds to a in reverse order can be phrased as the following quantified mixed statement

∀a∈{0,1}* ∃b∈{0,1}* (|a| = |b|) ∧ (∀i∈ℕ i < |a| ⇒ (aᵢ ⇔ b_{|a|−i})) .   (11.5)

Quantified mixed statements are more general than quantified integer statements, and so the following theorem is potentially easier to prove than Theorem 11.9:

Theorem 11.13 — Uncomputability of quantified mixed statements. Let QMS : {0,1}* → {0,1} be the function that given a (string representation of a) quantified mixed statement outputs 1 if it is true and 0 if it is false. Then QMS is uncomputable.

Proof Idea:

The idea behind the proof is similar to that used in showing that one-dimensional cellular automata are Turing complete (Theorem 8.7), as well as showing that equivalence (or even "fullness") of context-free grammars is uncomputable (Theorem 10.10). We use the notion


of a configuration of a NAND-TM program, as in Definition 8.8. Such a configuration can be thought of as a string α over some large-but-finite alphabet Σ describing its current state, including the values of all arrays, scalars, and the index variable i. It can be shown that if α is the configuration at a certain step of the execution and β is the configuration at the next step, then βⱼ = αⱼ for all j outside of {i − 1, i, i + 1} where i is the value of i. In particular, every value βⱼ is simply a function of αⱼ₋₁, αⱼ, αⱼ₊₁. Using these observations we can write a quantified mixed statement NEXT(α, β) that will be true if and only if β is the configuration encoding the next step after α. Since a program P halts on input x if and only if there is a sequence of configurations α₀, …, αₜ₋₁ (known as a computation history) starting with the initial configuration with input x and ending in a halting configuration, we can define a quantified mixed statement to determine if there is such a sequence by taking an existential quantifier over all strings H (for "history") that encode a tuple (α₀, α₁, …, αₜ₋₁) and then checking that α₀ and αₜ₋₁ are valid starting and halting configurations, and that NEXT(αⱼ, αⱼ₊₁) is true for every j ∈ {0, …, t − 2}. ⋆

Proof of Theorem 11.13. The proof is obtained by a reduction from the Halting problem. Specifically, we will use the notion of a configuration of a Turing machine (Definition 8.8) that we have seen in the context of proving that one-dimensional cellular automata are Turing complete. We need the following facts about configurations:

• For every Turing machine M, there is a finite alphabet Σ, and a configuration of M is a string α ∈ Σ*.

• A configuration α encodes the entire state of the program at a particular iteration, including the array, scalar, and index variables.

• If α is a configuration, then β = NEXT_M(α) denotes the configuration of the computation after one more iteration. β is a string over Σ of length either |α| or |α| + 1, and every coordinate of β is a function of just three coordinates in α. That is, for every j ∈ {0, …, |β| − 1}, βⱼ = MAP_M(αⱼ₋₁, αⱼ, αⱼ₊₁) where MAP_M : Σ³ → Σ is some function depending on M.

• There are simple conditions to check whether a string α is a valid starting configuration corresponding to an input x, as well as to check whether a string α is a halting configuration. In particular these conditions can be phrased as quantified mixed statements.

• A machine M halts on input x if and only if there exists a sequence of configurations H = (α₀, α₁, …, α_{T−1}) such that (i) α₀ is a valid


starting configuration of M with input x, (ii) α_{T−1} is a valid halting configuration of M, and (iii) αᵢ₊₁ = NEXT_M(αᵢ) for every i ∈ {0, …, T − 2}.

We can encode such a sequence H of configurations as a binary

string. For concreteness, we let ℓ = ⌈log(|Σ| + 1)⌉ and encode each symbol σ in Σ ∪ {";"} by a string in {0,1}ℓ. We use ";" as a "separator" symbol, and so encode H = (α₀, α₁, …, α_{T−1}) as the concatenation of the encodings of each configuration, using ";" to separate the encoding of αᵢ and αᵢ₊₁ for every i ∈ [T]. In particular, for every Turing machine M, M halts on the input 0 if and only if the following statement φ_M is true

โˆƒ๐ปโˆˆ0,1โˆ—๐ป encodes halting configuration sequence starting with input 0 .(11.6)

If we can encode the statement φ_M as a quantified mixed statement then, since φ_M is true if and only if HALTONZERO(M) = 1, this would reduce the task of computing HALTONZERO to computing QMS, and hence imply (using Theorem 9.9) that QMS is uncomputable, completing the proof. Indeed, φ_M can be encoded as a quantified mixed statement for the following reasons:

1. Let α, β ∈ {0,1}* be two strings that encode configurations of M. We can define a quantified mixed predicate NEXT(α, β) that is true if and only if β = NEXT_M(α) (i.e., β encodes the configuration obtained by proceeding from α in one computational step). Indeed NEXT(α, β) is true if for every i ∈ {0, …, |β|} which is a multiple of ℓ, β_{i,…,i+ℓ−1} = MAP_M(α_{i−ℓ,…,i+2ℓ−1}) where MAP_M : {0,1}^{3ℓ} → {0,1}^ℓ is the finite function above (identifying elements of Σ with their encoding in {0,1}ℓ). Since MAP_M is a finite function, we can express it using the logical operations AND, OR, NOT (for example by computing MAP_M with NANDs).

2. Using the above we can now write the condition that for every substring of H that has the form α ENC(;) β with α, β ∈ {0,1}ℓ and ENC(;) being the encoding of the separator ";", it holds that NEXT(α, β) is true.

3. Finally, if α₀ is a binary string encoding the initial configuration of M on input 0, checking that the first |α₀| bits of H equal α₀ can be expressed using ANDs, ORs, and NOTs. Similarly, checking that the last configuration encoded by H corresponds to a state in which M will halt can also be expressed as a quantified statement.

Together the above yields a computable procedure that maps every Turing machine M into a quantified mixed statement φ_M such that


HALTONZERO(M) = 1 if and only if QMS(φ_M) = 1. This reduces computing HALTONZERO to computing QMS, and hence the uncomputability of HALTONZERO implies the uncomputability of QMS.
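The heart of the argument, namely that a computation history can be verified by purely local conditions, can be rendered as a short sketch. This is our own toy illustration: map_M, is_start and is_halting are stand-ins for the machine-dependent definitions in the proof.

```python
def check_history(alphas, map_M, is_start, is_halting):
    """Check that a list of configurations is a valid computation history.

    alphas     -- the purported history: a list of configurations,
                  each a tuple of symbols over the alphabet Sigma
    map_M      -- local rule giving the next symbol from a 3-symbol window
    is_start   -- predicate for a valid starting configuration
    is_halting -- predicate for a valid halting configuration
    All four arguments are hypothetical stand-ins; the point is that
    every check below is local, so it can be phrased as a quantified
    mixed statement about the history string H.
    """
    if not (is_start(alphas[0]) and is_halting(alphas[-1])):
        return False
    for alpha, beta in zip(alphas, alphas[1:]):
        for j in range(len(beta)):
            # out-of-range neighbors are treated as a blank symbol "_"
            window = tuple(alpha[k] if 0 <= k < len(alpha) else "_"
                           for k in (j - 1, j, j + 1))
            if beta[j] != map_M(window):
                return False
    return True
```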

Remark 11.14 — Alternative proofs. There are several other ways to show that QMS is uncomputable. For example, we can express the condition that a one-dimensional cellular automaton eventually writes a "1" to a given cell from a given initial configuration as a quantified mixed statement over a string encoding the history of all configurations. We can then use the fact that cellular automata can simulate Turing machines (Theorem 8.7) to reduce the halting problem to QMS. We can also use other well-known uncomputable problems such as tiling or the Post Correspondence Problem. Exercise 11.5 and Exercise 11.6 explore two alternative proofs of Theorem 11.13.

11.5.2 Step 2: Reducing mixed statements to integer statements

We now show how to prove Theorem 11.9 using Theorem 11.13. The idea is again a proof by reduction. We will show a transformation of every quantified mixed statement φ into a quantified integer statement ξ that does not use string-valued variables, such that φ is true if and only if ξ is true.

To remove string-valued variables from a statement, we encode every string by a pair of integers. We will show that we can encode a string x ∈ {0,1}* by a pair of numbers (X, n) ∈ ℕ² s.t.

โ€ข ๐‘› = |๐‘ฅ|โ€ข There is a quantified integer statement COORD(๐‘‹, ๐‘–) that for every๐‘– < ๐‘›, will be true if ๐‘ฅ๐‘– = 1 and will be false otherwise.

This will mean that we can replace a "for all" quantifier over strings such as ∀x∈{0,1}* with a pair of quantifiers over integers of the form ∀X∈ℕ ∀n∈ℕ (and similarly replace an existential quantifier of the form ∃x∈{0,1}* with a pair of quantifiers ∃X∈ℕ ∃n∈ℕ). We can then replace all calls to |x| by n and all calls to xᵢ by COORD(X, i). This means that if we are able to define COORD via a quantified integer statement, then we obtain a proof of Theorem 11.9, since we can use it to map every quantified mixed statement φ to an equivalent quantified integer statement ξ such that ξ is true if and only if φ is true, and hence QMS(φ) = QIS(ξ). Such a procedure implies that the task of computing QMS reduces to the task of computing QIS, which means that the uncomputability of QMS implies the uncomputability of QIS.


The above shows that the proof of Theorem 11.9 all boils down to finding the right encoding of strings as integers, and the right way to implement COORD as a quantified integer statement. To achieve this we use the following technical result:

Lemma 11.15 — Constructible prime sequence. There is a sequence of prime numbers p₀ < p₁ < p₂ < ⋯ such that there is a quantified integer statement PSEQ(p, i) that is true if and only if p = pᵢ.

Using Lemma 11.15 we can encode a string x ∈ {0,1}* by the numbers (X, n) where X = ∏_{i s.t. xᵢ=1} pᵢ and n = |x|. We can then define the statement COORD(X, i) as

COORD(X, i) = ∃p∈ℕ PSEQ(p, i) ∧ DIVIDES(p, X)   (11.7)

where DIVIDES(a, b), as before, is defined as ∃c∈ℕ a × c = b. Note that indeed if (X, n) encodes the string x ∈ {0,1}*, then for every i < n, COORD(X, i) is true if and only if xᵢ = 1, since pᵢ divides X if and only if xᵢ = 1.
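The prime-product encoding is easy to play with. The sketch below (our own illustration) uses the ordinary primes 2, 3, 5, … in place of the constructible sequence p₀ < p₁ < ⋯ of Lemma 11.15; the arithmetic of COORD is the same.

```python
def primes(n):
    # first n primes by trial division (fine for a demo)
    ps = []
    q = 2
    while len(ps) < n:
        if all(q % p != 0 for p in ps):
            ps.append(q)
        q += 1
    return ps

def encode(x):
    # encode x in {0,1}* as (X, n): X is the product of p_i over
    # the positions i where x_i = 1, and n = |x|
    ps = primes(len(x))
    X = 1
    for i, bit in enumerate(x):
        if bit == "1":
            X *= ps[i]
    return X, len(x)

def coord(X, i, n):
    # COORD(X, i): does the i-th prime divide X?
    return X % primes(n)[i] == 0
```

For example, x = "10110" is encoded as X = 2 · 5 · 7 = 70, and coord recovers every bit of x from the pair (X, n).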

Thus all that is left to conclude the proof of Theorem 11.9 is to prove Lemma 11.15, which we now proceed to do.

Proof. The sequence of prime numbers we consider is the following: We fix C to be a sufficiently large constant (C = 2^(2^34) will do) and define pᵢ to be the smallest prime number in the interval [(i + C)³ + 1, (i + C + 1)³ − 1]. It is known that there exists such a prime number for every i ∈ ℕ. Given this, the definition of PSEQ(p, i) is simple:

PRIME(p) ∧ (p > (i + C) × (i + C) × (i + C)) ∧ (p < (i + C + 1) × (i + C + 1) × (i + C + 1)) ∧ (∀p′∈ℕ ¬PRIME(p′) ∨ (p′ ≤ (i + C) × (i + C) × (i + C)) ∨ (p′ ≥ p)) .   (11.8)

We leave it to the reader to verify that PSEQ(p, i) is true iff p = pᵢ.

To sum up, we have shown that for every quantified mixed statement φ, we can compute a quantified integer statement ξ such that QMS(φ) = 1 if and only if QIS(ξ) = 1. Hence the uncomputability of QMS (Theorem 11.13) implies the uncomputability of QIS, completing the proof of Theorem 11.9, and so also the proof of Gödel's Incompleteness Theorem for quantified integer statements (Theorem 11.8).

Chapter Recap

• Uncomputable functions include functions that seem to have nothing to do with NAND-TM programs or other computational models, such as determining the satisfiability of Diophantine equations.


4 Hint: think of x as saying “Turing Machine M halts on input u” and of the proof w as being the number of steps that it will take for this to happen. Can you find an always-halting V that will verify such statements?

Figure 11.3: In the puzzle problem, the input can be thought of as a finite collection Σ of types of puzzle pieces, and the goal is to find out whether or not there is a way to arrange pieces from these types in a rectangle. Formally, we model the input as a pair of functions match↔, match↕ ∶ Σ² → {0, 1} such that match↔(left, right) = 1 (respectively match↕(up, down) = 1) if the pair of pieces are compatible when placed in their respective positions. We assume Σ contains a special symbol ∅ corresponding to having no piece, and an arrangement of puzzle pieces in an (m − 2) × (n − 2) rectangle is modeled by a string x ∈ Σ^{m⋅n} whose “outer coordinates” are ∅ and such that for every i ∈ [n − 1], j ∈ [m − 1], match↕(x_{i,j}, x_{i+1,j}) = 1 and match↔(x_{i,j}, x_{i,j+1}) = 1.

โ€ข This also implies that for any sound proof system(and in particular every finite axiomatic system) ๐‘†,there are interesting statements ๐‘‹ (namely of theform โ€œ๐น(๐‘ฅ) = 0โ€ for an uncomputable function๐น) such that ๐‘† is not able to prove either ๐‘‹ or itsnegation.

11.6 EXERCISES

Exercise 11.1 — Gödel’s Theorem from uncomputability of QIS. Prove Theorem 11.8 using Theorem 11.9.

Exercise 11.2 — Proof systems and uncomputability. Let FINDPROOF ∶ {0, 1}∗ → {0, 1} be the following function. On input a Turing machine V (which we think of as the verifying algorithm for a proof system) and a string x ∈ {0, 1}∗, FINDPROOF(V, x) = 1 if and only if there exists w ∈ {0, 1}∗ such that V(x, w) = 1.

1. Prove that FINDPROOF is uncomputable.

2. Prove that there exists a Turing machine V such that V halts on every input (x, w), but the function FINDPROOF_V defined as FINDPROOF_V(x) = FINDPROOF(V, x) is uncomputable. See footnote for hint.4

Exercise 11.3 — Expression for floor. Let FSQRT(n, m) = ∀_{j∈ℕ} ((j × j) > m) ∨ (j ≤ n). Prove that FSQRT(n, m) is true if and only if n = ⌊√m⌋.
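A quick finite check of this statement can be done in code. The sketch below (the names are ours, not part of the exercise) verifies that FSQRT holds at n = ⌊√m⌋ and fails for every smaller n; truncating the quantifier to a finite range is safe here because the first disjunct (j × j) > m holds for every j beyond the range.

```python
import math

def fsqrt(n, m):
    # Evaluate FSQRT(n, m) = forall j: ((j*j > m) or (j <= n)).
    # Checking j up to m + 1 suffices, since j*j > m for every larger j.
    return all((j * j > m) or (j <= n) for j in range(m + 2))

# FSQRT is true at n = floor(sqrt(m)) and false for every smaller n.
for m in range(200):
    r = math.isqrt(m)
    assert fsqrt(r, m)
    assert not any(fsqrt(n, m) for n in range(r))
```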

Exercise 11.4 — Axiomatic proof systems. For every representation of logical statements as strings, we can define an axiomatic proof system to consist of a finite set of strings A (the axioms) and a finite set of rules I_0, … , I_{m−1} with I_j ∶ ({0, 1}∗)^{k_j} → {0, 1}∗, such that a proof (s_1, … , s_n) that s_n is true is valid if for every i, either s_i ∈ A or there are some j ∈ [m] and i_1, … , i_{k_j} < i such that s_i = I_j(s_{i_1}, … , s_{i_{k_j}}). A system is sound if there is no false s for which there is a valid proof that s is true. Prove that for every uncomputable function F ∶ {0, 1}∗ → {0, 1} and every sound axiomatic proof system S (that is, one characterized by a finite number of axioms and inference rules), there is some input x for which the proof system S is able to prove neither that F(x) = 0 nor that F(x) ≠ 0.

Exercise 11.5 — Post Correspondence Problem. In the Post Correspondence Problem the input is a set S = {(α_0, β_0), … , (α_{N−1}, β_{N−1})} where each α_i and β_i is a string in {0, 1}∗. We say that PCP(S) = 1 if and only if there exists a list (α_0, β_0), … , (α_{m−1}, β_{m−1}) of pairs in S such that

α_0 α_1 ⋯ α_{m−1} = β_0 β_1 ⋯ β_{m−1} .   (11.9)

(We can think of each pair (α, β) ∈ S as a “domino tile”, and the question is whether we can stack a list of such tiles so that the top and the bottom yield the same string.) It can be shown that PCP is uncomputable by a fairly straightforward though somewhat tedious proof (see for example the Wikipedia page for the Post Correspondence Problem or Section 5.2 in [Sip97]).

Use this fact to provide a direct proof that QMS is uncomputable, by showing that there exists a computable map R ∶ {0, 1}∗ → {0, 1}∗ such that PCP(S) = QMS(R(S)) for every string S encoding an instance of the Post Correspondence Problem.
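Although PCP is uncomputable, a bounded search conveys the “domino stacking” intuition. The sketch below (names are ours) tries every sequence of tile indices up to a fixed length; the instance in the comment is a classic example from [Sip97].

```python
from itertools import product

def pcp_bounded(pairs, max_len=6):
    # Bounded brute-force search for a Post Correspondence match: try
    # every sequence of tile indices of length 1..max_len and check
    # whether the concatenated tops equal the concatenated bottoms.
    # (Since PCP is uncomputable, any total algorithm can only search
    # up to some finite bound; "no match found" is inconclusive.)
    for m in range(1, max_len + 1):
        for seq in product(range(len(pairs)), repeat=m):
            top = "".join(pairs[i][0] for i in seq)
            bottom = "".join(pairs[i][1] for i in seq)
            if top == bottom:
                return list(seq)
    return None

# The instance {(1, 111), (10111, 10), (10, 0)} has the match
# 10111*1*1*10 = 10*111*111*0, so a sequence of length 4 exists.
tiles = [("1", "111"), ("10111", "10"), ("10", "0")]
```

Calling pcp_bounded(tiles) returns such an index sequence, while an instance like [("10", "101")], whose tops are always shorter than its bottoms, yields None.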

Exercise 11.6 — Uncomputability of puzzle. Let PUZZLE ∶ {0, 1}∗ → {0, 1} be the problem of determining, given a finite collection of types of “puzzle pieces”, whether it is possible to put them together in a rectangle; see Fig. 11.3. Formally, we think of such a collection as a finite set Σ (see Fig. 11.3). We model the criteria as to which pieces “fit together” by a pair of finite functions match↕, match↔ ∶ Σ² → {0, 1} such that a piece a fits above a piece b if and only if match↕(a, b) = 1, and a piece c fits to the left of a piece d if and only if match↔(c, d) = 1. To model the “straight edge” pieces that can be placed next to a “blank spot”, we assume that Σ contains the symbol ∅ and the matching functions are defined accordingly. A square tiling of Σ is an m × n string x ∈ Σ^{mn} such that for every i ∈ {1, … , m − 2} and j ∈ {1, … , n − 2}, match(x_{i,j}, x_{i−1,j}, x_{i+1,j}, x_{i,j−1}, x_{i,j+1}) = 1 (i.e., every “internal piece” fits in with the pieces adjacent to it, using match↕ for the vertical neighbors and match↔ for the horizontal ones). We also require that all of the “outer pieces” (i.e., x_{i,j} where i ∈ {0, m − 1} or j ∈ {0, n − 1}) are “blank”, i.e., equal to ∅. The function PUZZLE takes as input a string describing the set Σ and the functions match↕ and match↔, and outputs 1 if and only if there is some square tiling of Σ: some not-all-blank string x ∈ Σ^{mn} satisfying the above condition.

1. Prove that PUZZLE is uncomputable.

2. Give a reduction from PUZZLE to QMS.

Exercise 11.7 — MRDP exercise. The MRDP theorem states that the problem of determining, given a k-variable polynomial p with integer coefficients, whether there exist integers x_0, … , x_{k−1} such that p(x_0, … , x_{k−1}) = 0, is uncomputable. Consider the following quadratic


5 You can replace the equation y = x⁴ with the pair of equations y = z² and z = x². Also, you can replace the equation w = x⁶ with the three equations w = yu, y = x⁴ and u = x².

6 You will not need to use very specific properties of the TOWER function in this exercise. For example, NBB(n) also grows faster than the Ackermann function. You might find Aaronson’s blog post on the same topic to be quite interesting, and relevant to this book at large. If you like it then you might also enjoy this piece by Terence Tao.

integer equation problem: the input is a list of polynomials p_0, … , p_{m−1} over k variables with integer coefficients, where each of the polynomials is of degree at most two (i.e., it is a quadratic function). The goal is to determine whether there exist integers x_0, … , x_{k−1} that solve the equations p_0(x) = ⋯ = p_{m−1}(x) = 0.

Use the MRDP Theorem to prove that this problem is uncomputable. That is, show that the function QUADINTEQ ∶ {0, 1}∗ → {0, 1} is uncomputable, where this function gets as input a string describing the polynomials p_0, … , p_{m−1} (each with integer coefficients and degree at most two), and outputs 1 if and only if there exist x_0, … , x_{k−1} ∈ ℤ such that for every i ∈ [m], p_i(x_0, … , x_{k−1}) = 0. See footnote for hint.5
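The substitution trick from footnote 5 can be sanity-checked numerically. This sketch (illustrative only; the names are ours) confirms that chaining the quadratic equations pins down exactly the values y = x⁴ and w = x⁶:

```python
def degree_reduction_ok(x):
    # Each assignment below corresponds to a degree-at-most-two equation
    # in the auxiliary variables z, y, u, w.
    z = x * x      # z = x^2
    y = z * z      # y = z^2, hence y = x^4
    u = x * x      # u = x^2
    w = y * u      # w = y*u, hence w = x^6
    return y == x ** 4 and w == x ** 6

assert all(degree_reduction_ok(x) for x in range(-50, 51))
```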

Exercise 11.8 โ€” The Busy Beaver problem. In this question we define theNAND-TM variant of the busy beaver function.

1. We define the function T ∶ {0, 1}∗ → ℕ as follows: for every string P ∈ {0, 1}∗, if P represents a NAND-TM program such that when P is executed on the input 0 (i.e., the string of length 1 that is simply 0), a total of M lines are executed before the program halts, then T(P) = M. Otherwise (if P does not represent a NAND-TM program, or it is a program that does not halt on 0), T(P) = 0. Prove that T is uncomputable.

2. Let TOWER(n) denote the number 2^{2^{⋰^{2}}} with n twos in the tower (that is, a “tower of powers of two” of height n). To get a sense of how fast this function grows, TOWER(1) = 2, TOWER(2) = 2² = 4, TOWER(3) = 2^{2²} = 16, TOWER(4) = 2^{16} = 65536 and TOWER(5) = 2^{65536}, which is about 10^{20000}. TOWER(6) is already a number that is too big to write even in scientific notation. Define NBB ∶ ℕ → ℕ (for “NAND-TM Busy Beaver”) to be the function NBB(n) = max_{P∈{0,1}ⁿ} T(P), where T ∶ {0, 1}∗ → ℕ is the function defined in Item 1. Prove that NBB grows faster than TOWER, in the sense that TOWER(n) = o(NBB(n)) (i.e., for every ε > 0, there exists n_0 such that for every n > n_0, TOWER(n) < ε ⋅ NBB(n)).6
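To get a concrete feel for this growth, TOWER can be implemented directly (the function name is ours):

```python
def tower(n):
    # TOWER(n): a "tower of powers of two" of height n.
    result = 1
    for _ in range(n):
        result = 2 ** result
    return result
```

Here tower(1), …, tower(4) are 2, 4, 16 and 65536, while tower(5) = 2^65536 already has close to 20,000 decimal digits (Python's arbitrary-precision integers can hold it, but tower(6) is hopeless to write out).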

11.7 BIBLIOGRAPHICAL NOTES

As mentioned before, Gödel, Escher, Bach [Hof99] is a highly recommended book covering Gödel’s Theorem. A classic popular science book about Fermat’s Last Theorem is [Sin97].

Cantorโ€™s are used for both Turing and Gรถdelโ€™s theorems. In a twistof fate, using techniques originating from the works of Gรถdel and Tur-

Page 393: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

is every theorem provable? 393

ing, Paul Cohen showed in 1963 that Cantorโ€™s Continuum Hypothesisis independent of the axioms of set theory, which means that neitherit nor its negation is provable from these axioms and hence in somesense can be considered as โ€œneither true nor falseโ€ (see [Coh08]). TheContinuum Hypothesis is the conjecture that for every subset ๐‘† of โ„,either there is a one-to-one and onto map between ๐‘† and โ„• or thereis a one-to-one and onto map between ๐‘† and โ„. It was conjecturedby Cantor and listed by Hilbert in 1900 as one of the most importantproblems in mathematics. See also the non-conventional survey ofShelah [She03]. See here for recent progress on a related question.

Thanks to Alex Lombardi for pointing out an embarrassing mistakein the description of Fermatโ€™s Last Theorem. (I said that it was openfor exponent 11 before Wilesโ€™ work.)


III EFFICIENT ALGORITHMS


12 Efficient computation: An informal introduction

โ€œThe problem of distinguishing prime numbers from composite and of resolvingthe latter into their prime factors is โ€ฆ one of the most important and usefulin arithmetic โ€ฆ Nevertheless we must confess that all methods โ€ฆ are eitherrestricted to very special cases or are so laborious โ€ฆ they try the patience ofeven the practiced calculator โ€ฆ and do not apply at all to larger numbers.โ€,Carl Friedrich Gauss, 1798

โ€œFor practical purposes, the difference between algebraic and exponential orderis often more crucial than the difference between finite and non-finite.โ€, JackEdmunds, โ€œPaths, Trees, and Flowersโ€, 1963

โ€œWhat is the most efficient way to sort a million 32-bit integers?โ€, EricSchmidt to Barack Obama, 2008โ€œI think the bubble sort would be the wrong way to go.โ€, Barack Obama.

So far we have been concerned with which functions are computable and which ones are not. In this chapter we look at the finer question of the time that it takes to compute functions, as a function of their input length. Time complexity is extremely important to both the theory and practice of computing, but in introductory courses, coding interviews, and software development, terms such as “O(n) running time” are often used in an informal way. People don’t have a precise definition of what a linear-time algorithm is, but rather assume that “they’ll know it when they see it”. In this book we will define running time precisely, using the mathematical models of computation we developed in the previous chapters. This will allow us to ask (and sometimes answer) questions such as:

โ€ข โ€œIs there a function that can be computed in ๐‘‚(๐‘›2) time but not in๐‘‚(๐‘›) time?โ€

โ€ข โ€œAre there natural problems for which the best algorithm (and notjust the best known) requires 2ฮฉ(๐‘›) time?โ€


Learning Objectives:
• Describe at a high level some interesting computational problems.
• The difference between polynomial and exponential time.
• Examples of techniques for obtaining efficient algorithms.
• Examples of how seemingly small differences in problems can potentially make huge differences in their computational complexity.


Big Idea 16 The running time of an algorithm is not a number; it is a function of the length of the input.

This chapter: A non-mathy overview
In this chapter, we informally survey examples of computational problems. For some of these problems we know efficient (i.e., O(n^c)-time for a small constant c) algorithms, and for others the best known algorithms are exponential. We present these examples to get a feel for the kinds of problems that lie on each side of this divide, and also to see how sometimes seemingly minor changes in problem formulation can make the (known) complexity of a problem “jump” from polynomial to exponential. We do not formally define the notion of running time in this chapter, but use the same “I know it when I see it” notion of an O(n) or O(n²) time algorithm as the one you’ve seen in introduction to computer science courses. We will see the precise definition of running time (using Turing machines and RAM machines / NAND-RAM) in Chapter 13.

While the difference between O(n) and O(n²) time can be crucial in practice, in this book we focus on the even bigger difference between polynomial and exponential running time. As we will see, the difference between polynomial and exponential time is typically insensitive to the choice of the particular computational model: a polynomial-time algorithm is still polynomial whether you use Turing machines, RAM machines, or a parallel cluster as your model of computation, and similarly an exponential-time algorithm will remain exponential in all of these platforms. One of the interesting phenomena of computing is that there is often a kind of a “threshold phenomenon” or “zero-one law” for running time. Many natural problems can either be solved in polynomial running time with a not-too-large exponent (e.g., something like O(n²) or O(n³)), or require exponential (e.g., at least 2^Ω(n) or 2^Ω(√n)) time to solve. The reasons for this phenomenon are still not fully understood, but some light on it is shed by the concept of NP completeness, which we will see in Chapter 15.

This chapter is merely a tiny sample of the landscape of computational problems and efficient algorithms. If you want to explore the field of algorithms and data structures more deeply (which I very much hope you do!), the bibliographical notes contain references to some excellent texts, some of which are available freely on the web.


Remark 12.1 — Relations between parts of this book. Part I of this book contained a quantitative study of computation of finite functions. We asked what are the resources (in terms of gates of Boolean circuits or lines in straight-line programs) required to compute various finite functions.

Part II of the book contained a qualitative study of computation of infinite functions (i.e., functions of unbounded input length). In that part we asked the qualitative question of whether or not a function is computable at all, regardless of the number of operations.

Part III of the book, beginning with this chapter, merges the two approaches and contains a quantitative study of computation of infinite functions. In this part we ask how the resources for computing a function scale with the length of the input. In Chapter 13 we define the notion of running time, and the class P of functions that can be computed using a number of steps that scales polynomially with the input length. In Section 13.6 we will relate this class to the models of Boolean circuits and straight-line programs that we studied in Part I.

12.1 PROBLEMS ON GRAPHS

In this chapter we discuss several examples of important computational problems. Many of the problems will involve graphs. We have already encountered graphs before (see Section 1.4.4) but now quickly recall the basic notation. A graph G consists of a set of vertices V and edges E where each edge is a pair of vertices. We typically denote by n the number of vertices (and in fact often consider graphs where the set of vertices V equals the set [n] of the integers between 0 and n − 1). In a directed graph, an edge is an ordered pair (u, v), which we sometimes denote as u → v. In an undirected graph, an edge is an unordered pair (or simply a set) {u, v} which we sometimes denote as u–v or u ∼ v. An equivalent viewpoint is that an undirected graph corresponds to a directed graph satisfying the property that whenever the edge u → v is present then so is the edge v → u. In this chapter we restrict our attention to graphs that are undirected and simple (i.e., containing no parallel edges or self-loops). Graphs can be represented either in the adjacency list or adjacency matrix representation. We can transform between these two representations using O(n²) operations, and hence for our purposes we will mostly consider them as equivalent.
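For concreteness, the O(n²)-operation conversion between the two representations can be sketched as follows (function names are ours):

```python
def list_to_matrix(n, adj):
    # Adjacency lists -> n x n adjacency matrix, in O(n^2) operations.
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            A[u][v] = 1
    return A

def matrix_to_list(A):
    # Adjacency matrix -> adjacency lists, again scanning all n^2 entries.
    n = len(A)
    return [[v for v in range(n) if A[u][v] == 1] for u in range(n)]
```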

Graphs are so ubiquitous in computer science and other sciencesbecause they can be used to model a great many of the data that weencounter. These are not just the โ€œobviousโ€ data such as the road


Figure 12.1: Some examples of graphs found on theInternet.

1 A queue is a data structure for storing a list of elements in “First In First Out (FIFO)” order. Each “pop” operation removes an element from the queue in the order that they were “pushed” into it; see the Wikipedia page.

network (which can be thought of as a graph whose vertices are locations with edges corresponding to road segments), or the web (which can be thought of as a graph whose vertices are web pages with edges corresponding to links), or social networks (which can be thought of as a graph whose vertices are people and the edges correspond to the friend relation). Graphs can also denote correlations in data (e.g., a graph of observations of features with edges corresponding to features that tend to appear together), causal relations (e.g., gene regulatory networks, where a gene is connected to the gene products it derives), or the state space of a system (e.g., a graph of configurations of a physical system, with edges corresponding to states that can be reached from one another in one step).

12.1.1 Finding the shortest path in a graph
The shortest path problem is the task of finding, given a graph G = (V, E) and two vertices s, t ∈ V, the length of the shortest path between s and t (if such a path exists). That is, we want to find the smallest number k such that there are vertices v_0, v_1, … , v_k with v_0 = s, v_k = t and for every i ∈ {0, … , k − 1} an edge between v_i and v_{i+1}. Formally, we define MINPATH ∶ {0, 1}∗ → {0, 1}∗ to be the function that on input a triple (G, s, t) (represented as a string) outputs the number k which is the length of the shortest path in G between s and t, or a string representing no path if no such path exists. (In practice people often want to also find the actual path and not just its length; it turns out that the algorithms to compute the length of the path often yield the actual path itself as a byproduct, and so everything we say about the task of computing the length also applies to the task of finding the path.)

If each vertex has at least two neighbors then there can be an exponential number of paths from s to t, but fortunately we do not have to enumerate them all to find the shortest path. We can find the shortest path using a breadth first search (BFS), enumerating s’s neighbors, and then the neighbors’ neighbors, etc., in order. If we maintain the neighbors in a list we can perform a BFS in O(n²) time, while using a queue we can do this in O(m) time.1 Dijkstra’s algorithm is a well-known generalization of BFS to weighted graphs. More formally, the algorithm for computing the function MINPATH is described in Algorithm 12.2.


Algorithm 12.2 — Shortest path via BFS.

Input: Graph G = (V, E) and vertices s, t ∈ V. Assume V = [n].
Output: Length k of shortest path from s to t, or ∞ if no such path exists.

1: Let D be a length-n array.
2: Set D[s] = 0 and D[i] = ∞ for all i ∈ [n] ⧵ {s}.
3: Initialize queue Q to contain s.
4: while Q non-empty do
5:   Pop v from Q
6:   if v = t then
7:     return D[v]
8:   end if
9:   for u neighbor of v with D[u] = ∞ do
10:    Set D[u] ← D[v] + 1
11:    Add u to Q.
12:   end for
13: end while
14: return ∞

Since we only add to the queue vertices w with D[w] = ∞ (and then immediately set D[w] to an actual number), we never push a vertex to the queue more than once, and hence the algorithm makes at most n “push” and “pop” operations. For each vertex v, the number of times we run the inner loop is equal to the degree of v, and hence the total running time is proportional to the sum of all degrees, which equals twice the number m of edges. Algorithm 12.2 returns the correct answer since the vertices are added to the queue in the order of their distance from s, and hence we will reach t after we have explored all the vertices that are closer to s than t.
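Algorithm 12.2 translates almost line by line into code. A sketch in Python (names are ours; the graph is given as adjacency lists):

```python
from collections import deque

def min_path(n, adj, s, t):
    # BFS shortest path, following Algorithm 12.2. adj[v] lists the
    # neighbors of v; returns the distance from s to t, or infinity.
    INF = float("inf")
    D = [INF] * n
    D[s] = 0
    Q = deque([s])
    while Q:
        v = Q.popleft()
        if v == t:
            return D[v]
        for u in adj[v]:
            if D[u] == INF:       # u not yet discovered
                D[u] = D[v] + 1
                Q.append(u)
    return INF
```

On the path graph 0–1–2–3 (adjacency lists [[1], [0, 2], [1, 3], [2]]) this returns 3 for s = 0, t = 3; each vertex is pushed at most once, matching the O(m) analysis.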

Remark 12.3 — On data structures. If you’ve ever taken an algorithms course, you have probably encountered many data structures such as lists, arrays, queues, stacks, heaps, search trees, hash tables and many more. Data structures are extremely important in computer science, and each one of those offers different tradeoffs between overhead in storage, operations supported, cost in time for each operation, and more. For example, if we store n items in a list, we will need a linear (i.e., O(n) time) scan to retrieve an element, while we can achieve the same operation in O(1) time if we use a hash table. However, when we only care about polynomial-time algorithms, such factors of O(n) in the running time will not make much difference. Similarly, if we don’t care about the difference between O(n) and O(n²), then it doesn’t matter if we represent graphs as adjacency lists or adjacency matrices. Hence we will often describe our algorithms at a very high level, without specifying the particular data structures that are used to implement them. However, it will always be clear that there exists some data structure that is sufficient for our purposes.

Figure 12.2: A knight’s tour can be thought of as a maximally long path on the graph corresponding to a chessboard, where we put an edge between any two squares that can be reached by one step via a legal knight move.

12.1.2 Finding the longest path in a graph
The longest path problem is the task of finding the length of the longest simple (i.e., non-intersecting) path between a given pair of vertices s and t in a given graph G. If the graph is a road network, then the longest path might seem less motivated than the shortest path (unless you are the kind of person who always prefers the “scenic route”). But graphs can be and are used to model a variety of phenomena, and in many such cases finding the longest path (and some of its variants) can be very useful. In particular, finding the longest path is a generalization of the famous Hamiltonian path problem, which asks for a maximally long simple path (i.e., a path that visits all n vertices once) between s and t, as well as the notorious traveling salesman problem (TSP) of finding (in a weighted graph) a path visiting all vertices of cost at most w. TSP is a classical optimization problem, with applications ranging from planning and logistics to DNA sequencing and astronomy.

Surprisingly, while we can find the shortest path in O(m) time, there is no known algorithm for the longest path problem that significantly improves on the trivial “exhaustive search” or “brute force” algorithm that enumerates all the exponentially many possibilities for such paths. Specifically, the best known algorithms for the longest path problem take O(c^n) time for some constant c > 1. (At the moment the best record is c ∼ 1.65 or so; even obtaining an O(2^n) time bound is not that simple, see Exercise 12.1.)
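For contrast, here is the trivial exhaustive-search algorithm that those bounds improve upon: it explores every simple path from s, which in the worst case takes exponential time (a sketch; names are ours).

```python
def longest_path(n, adj, s, t):
    # Exhaustive search: explore every simple path starting at s and
    # record the longest one that ends at t. Worst-case exponential time.
    best = -1  # -1 means "no s-t path found"
    def dfs(v, visited, length):
        nonlocal best
        if v == t:
            best = max(best, length)
            return
        for u in adj[v]:
            if u not in visited:
                visited.add(u)
                dfs(u, visited, length + 1)
                visited.remove(u)
    dfs(s, {s}, 0)
    return best
```

On the complete graph on four vertices this finds a Hamiltonian path of length 3, but the recursion tree can have exponentially many branches in general.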

12.1.3 Finding the minimum cut in a graph
Given a graph G = (V, E), a cut of G is a subset S ⊆ V such that S is neither empty nor is it all of V. The edges cut by S are those edges where one of their endpoints is in S and the other is in S̄ = V ⧵ S. We denote this set of edges by E(S, S̄). If s, t ∈ V are a pair of vertices then an s, t cut is a cut such that s ∈ S and t ∈ S̄ (see Fig. 12.3). The minimum s, t cut problem is the task of finding, given s and t, the minimum number k such that there is an s, t cut cutting k edges (the problem is also sometimes phrased as finding the set that achieves this minimum; it turns out that algorithms to compute the number often yield the set as well). Formally, we define MINCUT ∶ {0, 1}∗ →


Figure 12.3: A cut in a graph G = (V, E) is simply a subset S of its vertices. The edges that are cut by S are all those whose one endpoint is in S and the other one is in S̄ = V ⧵ S. The cut edges are colored red in this figure.

{0, 1}∗ to be the function that on input a string representing a triple (G = (V, E), s, t) of a graph and two vertices, outputs the minimum number k such that there exists a set S ⊆ V with s ∈ S, t ∉ S and |E(S, S̄)| = k.

Computing minimum s, t cuts is useful in many applications since minimum cuts often correspond to bottlenecks. For example, in a communication or railroad network the minimum cut between s and t corresponds to the smallest number of edges that, if dropped, will disconnect s from t. (This was actually the original motivation for this problem; see Section 12.6.) Similar applications arise in scheduling and planning. In the setting of image segmentation, one can define a graph whose vertices are pixels and whose edges correspond to neighboring pixels of distinct colors. If we want to separate the foreground from the background then we can pick (or guess) a foreground pixel s and a background pixel t and ask for a minimum cut between them.

The naive algorithm for computing MINCUT will check all 2^n possible subsets of an n-vertex graph, but it turns out we can do much better than that. As we’ve seen in this book time and again, there is more than one algorithm to compute the same function, and some of those algorithms might be more efficient than others. Luckily the minimum cut problem is one of those cases. In particular, as we will see in the next section, there are algorithms that compute MINCUT in time which is polynomial in the number of vertices.

12.1.4 Min-Cut Max-Flow and Linear programming
We can obtain a polynomial-time algorithm for computing MINCUT using the Max-Flow Min-Cut Theorem. This theorem says that the minimum cut between s and t equals the maximum amount of flow we can send from s to t, if every edge has unit capacity. Specifically, imagine that every edge of the graph corresponds to a pipe that can carry one unit of fluid per unit of time (say 1 liter of water per second). The maximum s, t flow is the maximum number of units of water that we could transfer from s to t over these pipes. If there is an s, t cut of k edges, then the maximum flow is at most k. The reason is that such a cut S acts as a “bottleneck” since at most k units can flow from S to its complement at any given unit of time. This means that the maximum s, t flow is always at most the value of the minimum s, t cut. The surprising and non-trivial content of the Max-Flow Min-Cut Theorem is that the maximum flow is also at least the value of the minimum cut, and hence computing the cut is the same as computing the flow.

The Max-Flow Min-Cut Theorem reduces the task of computing a minimum cut to the task of computing a maximum flow. However, this still does not show how to compute such a flow. The Ford-Fulkerson


Algorithm is a direct way to compute a flow using incremental improvements. But computing flows in polynomial time is also a special case of a much more general tool known as linear programming.
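A minimal sketch of the Ford-Fulkerson approach for unit-capacity undirected graphs, using BFS to find augmenting paths (the Edmonds-Karp variant; the names and graph representation are ours). By the Max-Flow Min-Cut Theorem, the returned value also equals the minimum s, t cut:

```python
from collections import deque

def max_flow(n, edges, s, t):
    # Ford-Fulkerson with BFS augmenting paths on an undirected graph
    # with unit-capacity edges: each edge {u, v} contributes residual
    # capacity 1 in each direction.
    cap = [[0] * n for _ in range(n)]
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1
    flow = 0
    while True:
        # BFS for an augmenting path from s to t in the residual graph.
        parent = [-1] * n
        parent[s] = s
        Q = deque([s])
        while Q and parent[t] == -1:
            v = Q.popleft()
            for u in range(n):
                if cap[v][u] > 0 and parent[u] == -1:
                    parent[u] = v
                    Q.append(u)
        if parent[t] == -1:
            # No augmenting path remains; by Max-Flow Min-Cut, the flow
            # found equals the value of the minimum s,t cut.
            return flow
        # Augment one unit of flow along the path found.
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
```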

A flow on a graph G of m edges can be modeled as a vector x ∈ ℝ^m where for every edge e, x_e corresponds to the amount of water per time-unit that flows on e. We think of an edge e as an ordered pair (u, v) (we can choose the order arbitrarily) and let x_e be the amount of flow that goes from u to v. (If the flow is in the other direction then we make x_e negative.) Since every edge has capacity one, we know that −1 ≤ x_e ≤ 1 for every edge e. A valid flow has the property that the amount of water leaving the source s is the same as the amount entering the sink t, and that for every other vertex v, the amount of water entering and leaving v is the same.

Mathematically, we can write these conditions as follows:

∑_{e∋s} x_e + ∑_{e∋t} x_e = 0
∑_{e∋v} x_e = 0   ∀ v ∈ V ⧵ {s, t}
−1 ≤ x_e ≤ 1   ∀ e ∈ E    (12.1)

where for every vertex ๐‘ฃ, summing over ๐‘’ โˆ‹ ๐‘ฃ means summing over allthe edges that touch ๐‘ฃ.
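The conditions in (12.1) are easy to check mechanically. In this sketch (our names; each edge is an ordered pair fixing its orientation), a vertex's sum counts x_e positively when e leaves v under its chosen orientation and negatively when it enters, which is what conservation means here:

```python
def satisfies_flow_conditions(n, edges, x, s, t):
    # edges: list of ordered pairs (u, v); x[e] is the signed flow on
    # edge e (positive from u to v, negative in the other direction).
    if any(not (-1 <= xe <= 1) for xe in x):
        return False  # capacity constraints violated
    net = [0] * n  # net[v]: signed amount of flow leaving vertex v
    for (u, v), xe in zip(edges, x):
        net[u] += xe
        net[v] -= xe
    # Conservation at every vertex other than the source s and sink t.
    return all(net[v] == 0 for v in range(n) if v not in (s, t))
```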

The maximum flow problem can be thought of as the task of maximizing ∑_{e∋s} x_e over all the vectors x ∈ ℝ^m that satisfy the above conditions (12.1). Maximizing a linear function ℓ(x) over the set of x ∈ ℝ^m that satisfy certain linear equalities and inequalities is known as linear programming. Luckily, there are polynomial-time algorithms for solving linear programming, and hence we can solve the maximum flow (and so, equivalently, minimum cut) problem in polynomial time. In fact, there are much better algorithms for maximum-flow/minimum-cut, even for weighted directed graphs, with the current record standing at O(min{m^{10/7}, m√n}) time.

Solved Exercise 12.1 — Global minimum cut. Given a graph G = (V, E), define the global minimum cut of G to be the minimum, over all S ⊆ V with S ≠ ∅ and S ≠ V, of the number of edges cut by S. Prove that there is a polynomial-time algorithm to compute the global minimum cut of a graph.

Solution:

By the above we know that there is a polynomial-time algorithm A that on input (G, s, t) finds the minimum s,t cut in the graph

Page 405: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

efficient computation: an informal introduction 405

Figure 12.4: In a convex function f (left figure), for every x and y and p ∈ [0,1] it holds that f(px + (1−p)y) ≤ p·f(x) + (1−p)·f(y). In particular this means that every local minimum of f is also a global minimum. In contrast, in a non-convex function there can be many local minima.

Figure 12.5: In the high dimensional case, if f is a convex function (left figure) the global minimum is the only local minimum, and we can find it by a local-search algorithm which can be thought of as dropping a marble and letting it "slide down" until it reaches the global minimum. In contrast, a non-convex function (right figure) might have an exponential number of local minima in which any local-search algorithm could get stuck.

๐บ. Using ๐ด, we can obtain an algorithm ๐ต that on input a graph ๐บcomputes the global minimum cut as follows:

1. For every distinct pair s, t ∈ V, Algorithm B sets k_{s,t} ← A(G, s, t).
2. B returns the minimum of k_{s,t} over all distinct pairs s, t.

The running time of B will be O(n²) times the running time of A, and hence polynomial time. Moreover, if the global minimum cut is S, then when B reaches an iteration with s ∈ S and t ∉ S it will obtain the value of this cut, and hence the value output by B will be the value of the global minimum cut.

The above is our first example of a reduction in the context of polynomial-time algorithms. Namely, we reduced the task of computing the global minimum cut to the task of computing minimum s,t cuts.
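The reduction above can be sketched in code. The text does not fix a particular implementation of the algorithm A, so as one concrete (and non-optimized) choice the sketch below computes minimum s,t cuts via BFS augmenting paths on unit capacities (Edmonds–Karp style max-flow, using the max-flow min-cut equivalence); the function names are our own:

```python
from collections import deque

def min_st_cut(n, edges, s, t):
    # Algorithm A: the value of the minimum s,t cut, computed as the
    # maximum s,t flow via BFS augmenting paths on an undirected
    # unit-capacity graph with vertices 0..n-1.
    cap = {}
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap[(v, u)] = cap.get((v, u), 0) + 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent = {s: None}  # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: max flow = min cut
        v = t
        while parent[v] is not None:  # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

def global_min_cut(n, edges):
    # Algorithm B: minimize A(G, s, t) over all distinct pairs s, t.
    return min(min_st_cut(n, edges, s, t)
               for s in range(n) for t in range(n) if s != t)
```

For instance, on two triangles joined by a single bridge edge, the global minimum cut is 1 (cut the bridge), even though every vertex has degree at least 2.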

12.1.5 Finding the maximum cut in a graph
The maximum cut problem is the task of finding, given an input graph G = (V, E), the subset S ⊆ V that maximizes the number of edges cut by S. (We can also define an s,t-cut variant of the maximum cut like we did for minimum cut; the two variants have similar complexity but the global maximum cut is more common in the literature.) Like its cousin the minimum cut problem, the maximum cut problem is also very well motivated. For example, maximum cut arises in VLSI design, and also has some surprising relation to analyzing the Ising model in statistical physics.

Surprisingly, while (as we've seen) there is a polynomial-time algorithm for the minimum cut problem, there is no known algorithm solving maximum cut much faster than the trivial "brute force" algorithm that tries all 2^n possibilities for the set S.
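The brute-force algorithm mentioned above is easy to state precisely. A minimal sketch (the function name is our own), encoding each subset S as an n-bit mask and counting the edges with exactly one endpoint in S:

```python
def max_cut_value(n, edges):
    # Brute force: try all 2^n subsets S of the vertex set {0,...,n-1},
    # encoded as bitmasks, and count the edges cut by each one.
    best = 0
    for mask in range(2 ** n):
        cut = sum(1 for u, v in edges
                  if ((mask >> u) & 1) != ((mask >> v) & 1))
        best = max(best, cut)
    return best
```

On a triangle the maximum cut is 2 (any bipartition leaves one edge uncut), while on a 4-cycle all 4 edges can be cut.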

12.1.6 A note on convexity
There is an underlying reason for the sometimes radical difference between the difficulty of maximizing and minimizing a function over a domain. If D ⊆ ℝ^n, then a function f : D → ℝ is convex if for every x, y ∈ D and p ∈ [0,1], f(px + (1−p)y) ≤ pf(x) + (1−p)f(y). That is, f applied to the p-weighted midpoint between x and y is smaller than the p-weighted average value of f. If D itself is convex (which means that if x, y are in D then so is the line segment between them), then this means that if x is a local minimum of f then it is also a global minimum. The reason is that if f(y) < f(x) then every point z = px + (1−p)y on the line segment between x and y will satisfy f(z) ≤ pf(x) + (1−p)f(y) < f(x) and hence in particular x cannot be a local


406 introduction to theoretical computer science

minimum. Intuitively, local minima of functions are much easier to find than global ones: after all, any "local search" algorithm that keeps finding a nearby point on which the value is lower will eventually arrive at a local minimum. One example of such a local search algorithm is gradient descent, which takes a sequence of small steps, each one in the direction that would reduce the value by the most amount based on the current derivative.
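Gradient descent itself is only a few lines of code. As a minimal one-dimensional sketch (the function name and step parameters are our illustrative choices), on the convex function f(x) = (x − 3)², whose derivative is 2(x − 3), the iteration slides down to the global minimum at x = 3:

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    # Repeatedly take a small step against the derivative; for a
    # convex function this converges to the (global) minimum.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# minimize f(x) = (x - 3)^2, starting far from the minimum
x_min = gradient_descent(lambda x: 2 * (x - 3), 10.0)
```

For non-convex f the same loop can get stuck at whichever local minimum is nearest to the starting point, which is exactly the difficulty discussed above.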

Indeed, under certain technical conditions, we can often efficiently find the minimum of convex functions over a convex domain, and this is the reason why problems such as minimum cut and shortest path are easy to solve. On the other hand, maximizing a convex function over a convex domain (or equivalently, minimizing a concave function) can often be a hard computational task. A linear function is both convex and concave, which is the reason that both the maximization and minimization problems for linear functions can be done efficiently.

The minimum cut problem is not a priori a convex minimization task, because the set of potential cuts is discrete and not continuous. However, it turns out that we can embed it in a continuous and convex set via the (linear) maximum flow problem. The "max flow min cut" theorem ensures that this embedding is "tight" in the sense that the minimum "fractional cut" that we obtain through the maximum-flow linear program will be the same as the true minimum cut. Unfortunately, we don't know of such a tight embedding in the setting of the maximum cut problem.

Convexity arises time and again in the context of efficient computation. For example, one of the basic tasks in machine learning is empirical risk minimization. This is the task of finding a classifier for a given set of training examples. That is, the input is a list of labeled examples (x_0, y_0), …, (x_{m−1}, y_{m−1}), where each x_i ∈ {0,1}^n and y_i ∈ {0,1}, and the goal is to find a classifier h : {0,1}^n → {0,1} (or sometimes h : {0,1}^n → ℝ) that minimizes the number of errors. More generally, we want to find h that minimizes

∑_{i=0}^{m−1} L(y_i, h(x_i))      (12.2)

where ๐ฟ is some loss function measuring how far is the predicted la-bel โ„Ž(๐‘ฅ๐‘–) from the true label ๐‘ฆ๐‘–. When ๐ฟ is the square loss function๐ฟ(๐‘ฆ, ๐‘ฆโ€ฒ) = (๐‘ฆ โˆ’ ๐‘ฆโ€ฒ)2 and โ„Ž is a linear function, empirical risk mini-mization corresponds to the well-known convex minimization task oflinear regression. In other cases, when the task is non convex, there canbe many global or local minima. That said, even if we donโ€™t find theglobal (or even a local) minima, this continuous embedding can stillhelp us. In particular, when running a local improvement algorithm


such as Gradient Descent, we might still find a function h that is "useful" in the sense of having a small error on future examples from the same distribution.
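In the one-dimensional case, the convexity of the square loss means linear regression even has a closed-form solution. A minimal sketch (the function name is ours) that minimizes ∑_i (y_i − h(x_i))² over lines h(x) = a·x + b:

```python
def fit_line(xs, ys):
    # Ordinary least squares for h(x) = a*x + b: the closed-form
    # minimizer of the square loss sum_i (y_i - h(x_i))^2.
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b
```

Because the objective is convex in (a, b), this closed form and gradient descent both land on the same global minimum; for non-convex hypothesis classes no such guarantee exists.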

12.2 BEYOND GRAPHS

Not all computational problems arise from graphs. We now list some other examples of computational problems that are of great interest.

12.2.1 SAT
A propositional formula φ involves n variables x_1, …, x_n and the logical operators AND (∧), OR (∨), and NOT (¬, also denoted by an overline). We say that such a formula is in conjunctive normal form (CNF for short) if it is an AND of ORs of variables or their negations (we call a term of the form x_i or ¬x_i a literal). For example, this is a CNF formula:

(x_7 ∨ x_22 ∨ x_15) ∧ (x_37 ∨ x_22) ∧ (x_55 ∨ x_7)      (12.3)

The satisfiability problem is the task of determining, given a CNF formula φ, whether or not there exists a satisfying assignment for φ. A satisfying assignment for φ is a string x ∈ {0,1}^n such that φ evaluates to True if we assign its variables the values of x. The SAT problem might seem like an abstract question of interest only in logic, but in fact SAT is of huge interest in industrial optimization, with applications including manufacturing planning, circuit synthesis, software verification, air-traffic control, scheduling sports tournaments, and more.

2SAT. We say that a formula is a k-CNF if it is an AND of ORs where each OR involves exactly k literals. The k-SAT problem is the restriction of the satisfiability problem for the case that the input formula is a k-CNF. In particular, the 2SAT problem is to find out, given a 2-CNF formula φ, whether there is an assignment x ∈ {0,1}^n that satisfies φ, in the sense that it makes it evaluate to 1 or "True". The trivial, brute-force, algorithm for 2SAT will enumerate all the 2^n assignments x ∈ {0,1}^n, but fortunately we can do much better. The key is that we can think of every constraint of the form ℓ_i ∨ ℓ_j (where ℓ_i, ℓ_j are literals, corresponding to variables or their negations) as an implication ¬ℓ_i ⇒ ℓ_j, since it corresponds to the constraint that if the literal ℓ_i is false then it must be the case that ℓ_j is true. Hence we can think of φ as a directed graph on the 2n literals, with an edge from ℓ_i to ℓ_j corresponding to an implication from the former to the latter. It can be shown that φ is unsatisfiable if and only if there is a variable x_i such that there is a directed path from x_i to ¬x_i as well as a directed path from ¬x_i to x_i (see Exercise 12.2). This reduces 2SAT to the (efficiently solvable) problem of determining connectivity in directed graphs.
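This reduction can be sketched directly in code. In the sketch below (the encoding and function names are our choices) the literal x_i is encoded as the integer 2i and ¬x_i as 2i + 1, so that negation is "XOR with 1", and reachability is checked by a simple depth-first search:

```python
def two_sat_satisfiable(n, clauses):
    # clauses is a list of pairs of encoded literals; each clause
    # (a OR b) contributes the implications (not a => b), (not b => a).
    adj = {lit: [] for lit in range(2 * n)}
    for a, b in clauses:
        adj[a ^ 1].append(b)
        adj[b ^ 1].append(a)

    def reaches(src, dst):
        # DFS reachability in the implication graph
        seen, stack = {src}, [src]
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False

    # unsatisfiable iff some x_i reaches its negation and vice versa
    return not any(reaches(2 * i, 2 * i + 1) and reaches(2 * i + 1, 2 * i)
                   for i in range(n))
```

For example, (x_0 ∨ x_1) ∧ (¬x_0 ∨ x_1) ∧ (x_0 ∨ ¬x_1) is satisfiable (set both variables to True), while (x_0 ∨ x_0) ∧ (¬x_0 ∨ ¬x_0) is not.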


3SAT. The 3SAT problem is the task of determining satisfiability for 3CNFs. One might think that changing from two to three would not make that much of a difference for complexity. One would be wrong. Despite much effort, we do not know of a significantly better than brute force algorithm for 3SAT (the best known algorithms take roughly 1.3^n steps).

Interestingly, a similar issue arises time and again in computation, where the difference between two and three often corresponds to the difference between tractable and intractable. We do not fully understand the reasons for this phenomenon, though the notion of NP completeness we will see later does offer a partial explanation. It may be related to the fact that optimizing a polynomial often amounts to equations on its derivative. The derivative of a quadratic polynomial is linear, while the derivative of a cubic is quadratic, and, as we will see, the difference between solving linear and quadratic equations can be quite profound.

12.2.2 Solving linear equations
One of the most useful problems that people have been solving time and again is solving n linear equations in n variables. That is, solve equations of the form

a_{0,0}x_0 + a_{0,1}x_1 + ⋯ + a_{0,n−1}x_{n−1} = b_0
a_{1,0}x_0 + a_{1,1}x_1 + ⋯ + a_{1,n−1}x_{n−1} = b_1
    ⋮
a_{n−1,0}x_0 + a_{n−1,1}x_1 + ⋯ + a_{n−1,n−1}x_{n−1} = b_{n−1}      (12.4)

where ๐‘Ž๐‘–,๐‘—๐‘–,๐‘—โˆˆ[๐‘›] and ๐‘๐‘–๐‘–โˆˆ[๐‘›] are real (or rational) numbers. Morecompactly, we can write this as the equations ๐ด๐‘ฅ = ๐‘ where ๐ด is an๐‘› ร— ๐‘› matrix, and we think of ๐‘ฅ, ๐‘ are column vectors in โ„๐‘›.

The standard Gaussian elimination algorithm can be used to solve such equations in polynomial time (i.e., determine if they have a solution, and if so, to find it). As we discussed above, if we are willing to allow some loss in precision, we even have algorithms that handle linear inequalities, also known as linear programming. In contrast, if we insist on integer solutions, the task of solving for linear equalities or inequalities is known as integer programming, and the best known algorithms are exponential time in the worst case.
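A compact sketch of Gaussian elimination follows (the function name is ours). To sidestep floating-point precision issues in this illustration we work over exact rationals; the bit-complexity caveats of Remark 12.4 below still apply to a full analysis:

```python
from fractions import Fraction

def solve_linear(A, b):
    # Gaussian elimination over exact rationals; returns x with
    # A x = b, or None if A is singular.
    n = len(A)
    # augmented matrix [A | b]
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])]
         for i in range(n)]
    for col in range(n):
        # find a row with a nonzero pivot in this column
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None  # no pivot: A is singular
        M[col], M[piv] = M[piv], M[col]
        # eliminate this column from every other row
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]
```

For instance, the system 2x + y = 5, x + 3y = 10 has the unique solution x = 1, y = 3.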

Remark 12.4 — Bit complexity of numbers. Whenever we discuss problems whose inputs correspond to numbers, the input length corresponds to how many bits are needed to describe the number (or, as is equivalent up to a constant factor, the number of digits


in base 10, 16 or any other constant). The difference between the length of the input and the magnitude of the number itself can of course be quite profound. For example, most people would agree that there is a huge difference between having a billion (i.e., 10^9) dollars and having nine dollars. Similarly, there is a huge difference between an algorithm that takes n steps on an n-bit number and an algorithm that takes 2^n steps.

One example is the problem (discussed below) of finding the prime factors of a given integer N. The natural algorithm is to search for such a factor by trying all numbers from 1 to N, but that would take N steps, which is exponential in the input length, i.e., the number of bits needed to describe N. (The running time of this algorithm can be easily improved to roughly

โˆš๐‘ , but this is still exponential (i.e., 2๐‘›/2) inthe number ๐‘› of bits to describe ๐‘ .) It is an importantand long open question whether there is such an algo-rithm that runs in time polynomial in the input length(i.e., polynomial in log๐‘).

12.2.3 Solving quadratic equations
Suppose that we want to solve not just linear but also equations involving quadratic terms of the form a_{i,j,k}x_jx_k. That is, suppose that we are given a set of quadratic polynomials p_1, …, p_m and consider the equations {p_i(x) = 0}. To avoid issues with bit representations, we will always assume that the equations contain the constraints {x_i² − x_i = 0}_{i∈[n]}. Since only 0 and 1 satisfy the equation a² − a = 0, this assumption means that we can restrict attention to solutions in {0,1}^n. Solving quadratic equations in several variables is a classical and extremely well motivated problem. This is the generalization of the classical case of single-variable quadratic equations that generations of high school students grapple with. It also generalizes the quadratic assignment problem, introduced in the 1950s as a way to optimize assignment of economic activities. Once again, we do not know a much better algorithm for this problem than the one that enumerates over all the 2^n possibilities.
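The enumeration algorithm just mentioned is a one-liner in spirit. A minimal sketch (the function name is ours; each polynomial is given as a callable on 0/1 vectors, which covers the quadratic case):

```python
from itertools import product

def solve_boolean_system(n, polys):
    # Brute force over all 2^n points of {0,1}^n; x is a solution
    # if p(x) == 0 for every polynomial p in the system.
    for x in product((0, 1), repeat=n):
        if all(p(x) == 0 for p in polys):
            return x
    return None
```

For example, the system x_0·x_1 − 1 = 0, x_0 + x_2 − 2 = 0 forces x_0 = x_1 = x_2 = 1, which the search finds after trying all 8 candidates.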

12.3 MORE ADVANCED EXAMPLES

We now list a few more examples of interesting problems that are a little more advanced but are of significant interest in areas such as physics, economics, number theory, and cryptography.

12.3.1 Determinant of a matrix
The determinant of an n × n matrix A, denoted by det(A), is an extremely important quantity in linear algebra. For example, it is known


that det(A) ≠ 0 if and only if A is nonsingular, which means that it has an inverse A^{−1}, and hence we can always uniquely solve equations of the form Ax = b where x and b are n-dimensional vectors. More generally, the determinant can be thought of as a quantitative measure of how far A is from being singular. If the rows of A are "almost" linearly dependent (for example, if the third row is very close to being a linear combination of the first two rows) then the determinant will be small, while if they are far from it (for example, if they are orthogonal to one another) then the determinant will be large. In particular, for every matrix A, the absolute value of the determinant of A is at most the product of the norms (i.e., square root of sum of squares of entries) of the rows, with equality if and only if the rows are orthogonal to one another.

The determinant can be defined in several ways. One way to define the determinant of an n × n matrix A is:

det(A) = ∑_{π∈S_n} sign(π) ∏_{i∈[n]} A_{i,π(i)}      (12.5)

where ๐‘†๐‘› is the set of all permutations from [๐‘›] to [๐‘›] and the sign ofa permutation ๐œ‹ is equal to โˆ’1 raised to the power of the number ofinversions in ๐œ‹ (pairs ๐‘–, ๐‘— such that ๐‘– > ๐‘— but ๐œ‹(๐‘–) < ๐œ‹(๐‘—)).

This definition suggests that computing det(A) might require summing over |S_n| terms, which would take exponential time since |S_n| = n! > 2^n. However, there are other ways to compute the determinant. For example, it is known that det is the only function that satisfies the following conditions:

1. det(AB) = det(A) det(B) for all square matrices A, B.

2. For every n × n triangular matrix T with diagonal entries d_0, …, d_{n−1}, det(T) = ∏_{i=0}^{n−1} d_i. In particular det(I) = 1 where I is the identity matrix. (A triangular matrix is one in which either all entries below the diagonal, or all entries above the diagonal, are zero.)

3. det(S) = −1 where S is a "swap matrix" that corresponds to swapping two rows or two columns of I. That is, there are two coordinates a, b such that for every i, j:

S_{i,j} = 1 if i = j and i ∉ {a,b};  S_{i,j} = 1 if {i,j} = {a,b};  S_{i,j} = 0 otherwise.

Using these rules and the Gaussian elimination algorithm, it is possible to tell whether A is singular or not, and in the latter case, decompose A as a product of a polynomial number of swap matrices and triangular matrices. (Indeed one can verify that the row operations in Gaussian elimination correspond to either multiplying by a


swap matrix or by a triangular matrix.) Hence we can compute the determinant of an n × n matrix using a polynomial number of arithmetic operations.

12.3.2 Permanent of a matrix
Given an n × n matrix A, the permanent of A is defined as

perm(A) = ∑_{π∈S_n} ∏_{i∈[n]} A_{i,π(i)} .      (12.6)

That is, perm(A) is defined analogously to the determinant in (12.5) except that we drop the term sign(π). The permanent of a matrix is a natural quantity, and has been studied in several contexts including combinatorics and graph theory. It also arises in physics, where it can be used to describe the quantum state of multiple Boson particles (see here and here).

Permanent modulo 2. If the entries of A are integers, then we can define the Boolean function perm_2 which outputs, on input a matrix A, the result of the permanent of A modulo 2. It turns out that we can compute perm_2(A) in polynomial time. The key is that modulo 2, −x and +x are the same quantity and hence, since the only difference between (12.5) and (12.6) is that some terms are multiplied by −1, det(A) mod 2 = perm(A) mod 2 for every A.
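The mod-2 agreement of the two formulas is easy to check on small matrices by evaluating (12.5) and (12.6) directly. A brute-force sketch (exponential time, for illustration only; the function name is ours):

```python
from itertools import permutations

def det_and_perm(A):
    # Evaluate (12.5) and (12.6) directly: sum over all n!
    # permutations, with and without the sign term.
    n = len(A)
    det = perm = 0
    for pi in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][pi[i]]
        # sign(pi) = (-1)^(number of inversions)
        inversions = sum(1 for i in range(n) for j in range(i)
                         if pi[j] > pi[i])
        det += (-1) ** inversions * prod
        perm += prod
    return det, perm
```

For A = [[1, 2], [3, 4]] this gives det(A) = −2 and perm(A) = 10, and indeed −2 and 10 agree modulo 2.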

Permanent modulo 3. Emboldened by our good fortune above, we might hope to be able to compute the permanent modulo any prime p and perhaps in full generality. Alas, we have no such luck. In a similar "two to three" type of a phenomenon, we do not know of a much better than brute force algorithm to even compute the permanent modulo 3.

12.3.3 Finding a zero-sum equilibrium
A zero sum game is a game between two players where the payoff for one is the same as the penalty for the other. That is, whatever the first player gains, the second player loses. As much as we want to avoid them, zero sum games do arise in life, and the one good thing about them is that at least we can compute the optimal strategy.

A zero sum game can be specified by an n × n matrix A, where if player 1 chooses action i and player 2 chooses action j then player 1 gets A_{i,j} and player 2 loses the same amount. The famous Min Max Theorem by John von Neumann states that if we allow probabilistic or "mixed" strategies (where a player does not choose a single action but rather a distribution over actions) then it does not matter who plays first and the end result will be the same. Mathematically, the min max theorem says that if we let Δ_n be the set of probability distributions over


[n] (i.e., non-negative column vectors in ℝ^n whose entries sum to 1) then

max๐‘โˆˆโˆ†๐‘› min๐‘žโˆˆโˆ†๐‘› ๐‘โŠค๐ด๐‘ž = min๐‘žโˆˆโˆ†๐‘› max๐‘โˆˆโˆ†๐‘› ๐‘โŠค๐ด๐‘ž (12.7)

The min-max theorem turns out to be a corollary of linear programming duality, and indeed the value of (12.7) can be computed efficiently by a linear program.

12.3.4 Finding a Nash equilibrium
Fortunately, not all real-world games are zero sum, and we do have more general games, where the payoff of one player does not necessarily equal the loss of the other. John Nash won the Nobel prize for showing that there is a notion of equilibrium for such games as well. In many economic texts it is taken as an article of faith that when actual agents are involved in such a game then they reach a Nash equilibrium. However, unlike zero sum games, we do not know of an efficient algorithm for finding a Nash equilibrium given the description of a general (non zero sum) game. In particular this means that, despite economists' intuitions, there are games for which natural strategies will take an exponential number of steps to converge to an equilibrium.

12.3.5 Primality testing
Another classical computational problem, which has been of interest since the ancient Greeks, is to determine whether a given number N is prime or composite. Clearly we can do so by trying to divide it by all the numbers in {2, …, N−1}, but this would take at least N steps, which is exponential in its bit complexity n = log N. We can reduce these N steps to √N by observing that if N is a composite of the form N = PQ then either P or Q is smaller than √N. But this is still quite terrible. If N is a 1024-bit integer, √N is about 2^512, and so running this algorithm on such an input would take much more than the lifetime of the universe.

Luckily, it turns out we can do radically better. In the 1970s, Rabin and Miller gave probabilistic algorithms to determine whether a given number N is prime or composite in time poly(n) for n = log N. We will discuss the probabilistic model of computation later in this course. In 2002, Agrawal, Kayal, and Saxena found a deterministic poly(n) time algorithm for this problem. This is surely a development that mathematicians from Archimedes till Gauss would have found exciting.
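To give a flavor of how such a test works, here is a sketch of the Miller–Rabin test. In the Rabin–Miller probabilistic algorithm the bases a are chosen at random; the small fixed base set below is only for illustration (it happens to be deterministic for all N up to a large but bounded range, which is more than enough for this demo):

```python
def miller_rabin(n, bases=(2, 3, 5, 7, 11, 13, 17)):
    # Sketch of the Miller-Rabin primality test.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17):  # handle tiny factors first
        if n % p == 0:
            return n == p
    # write n - 1 = 2^r * d with d odd
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)  # modular exponentiation: poly(log n) time
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True
```

Note that the whole test runs in poly(n) = poly(log N) time, in contrast to the √N trial-division algorithm above: for instance it instantly classifies the 31-bit Mersenne prime 2^31 − 1.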


Figure 12.6: The current computational status of several interesting problems. For all of them we either know a polynomial-time algorithm or the known algorithms require at least 2^{n^c} for some c > 0. In fact for all except the factoring problem, we either know an O(n³) time algorithm or the best known algorithms require at least 2^{Ω(n)} time where n is a natural parameter such that there is a brute force algorithm taking roughly 2^n or n! time. Whether this "cliff" between the easy and hard problems is a real phenomenon or a reflection of our ignorance is still an open question.

12.3.6 Integer factoring
Given that we can efficiently determine whether a number N is prime or composite, we could expect that in the latter case we could also efficiently find the factorization of N. Alas, no such algorithm is known. In a surprising and exciting turn of events, the non-existence of such an algorithm has been used as a basis for encryptions, and indeed it underlies much of the security of the world wide web. We will return to the factoring problem later in this course. We remark that we do know much better than brute force algorithms for this problem. While the brute force algorithms would require 2^{Ω(n)} time to factor an n-bit integer, there are known algorithms running in time roughly 2^{O(√n)} and also algorithms that are widely believed (though not fully rigorously analyzed) to run in time roughly 2^{O(n^{1/3})}. (By "roughly" we mean that we neglect factors that are polylogarithmic in n.)

12.4 OUR CURRENT KNOWLEDGE

The difference between an exponential and polynomial time algorithm might seem merely "quantitative" but it is in fact extremely significant. As we've already seen, the brute force exponential time algorithm runs out of steam very very fast, and as Edmonds says, in practice there might not be much difference between a problem where the best algorithm is exponential and a problem that is not solvable at all. Thus the efficient algorithms we mentioned above are widely used and power many computer science applications. Moreover, a polynomial-time algorithm often arises out of significant insight into the problem at hand, whether it is the "max-flow min-cut" result, the solvability of the determinant, or the group theoretic structure that enables primality testing. Such insight can be useful regardless of its computational implications.

At the moment we do not know whether the "hard" problems are truly hard, or whether it is merely because we haven't yet found the right algorithms for them. However, we will now see that there are problems that do inherently require exponential time. We just don't know if any of the examples above fall into that category.

Chapter Recap

• There are many natural problems that have polynomial-time algorithms, and other natural problems that we'd love to solve, but for which the best known algorithms are exponential.

• Often a polynomial time algorithm relies on discovering some hidden structure in the problem, or finding a surprising equivalent formulation for it.


2 Hint: Use dynamic programming to compute for every s, t ∈ [n] and S ⊆ [n] the value P(s, t, S) which equals 1 if there is a simple path from s to t that uses exactly the vertices in S. Do this iteratively for S's of growing sizes.

• There are many interesting problems where there is an exponential gap between the best known algorithm and the best algorithm that we can rule out. Closing this gap is one of the main open questions of theoretical computer science.

12.5 EXERCISES

Exercise 12.1 — exponential time algorithm for longest path. The naive algorithm for computing the longest path in a given graph could take more than n! steps. Give a poly(n)·2^n time algorithm for the longest path problem in n-vertex graphs.²

Exercise 12.2 — 2SAT algorithm. For every 2CNF φ, define the graph G_φ on the 2n vertices corresponding to the literals x_1, …, x_n, ¬x_1, …, ¬x_n, such that there is an edge from ℓ_i to ℓ_j iff the constraint ¬ℓ_i ∨ ℓ_j is in φ. Prove that φ is unsatisfiable if and only if there is some i such that there is a path from x_i to ¬x_i and from ¬x_i to x_i in G_φ. Show how to use this to solve 2SAT in polynomial time.

Exercise 12.3 — Reductions for showing algorithms. The following fact is true: there is a polynomial-time algorithm BIP that on input a graph G = (V, E) outputs 1 if and only if the graph is bipartite: there is a partition of V into disjoint parts S and T such that every edge (u, v) ∈ E satisfies either u ∈ S and v ∈ T, or u ∈ T and v ∈ S. Use this fact to prove that there is a polynomial-time algorithm to compute the following function CLIQUEPARTITION that on input a graph G = (V, E) outputs 1 if and only if there is a partition of V into two parts S and T such that both S and T are cliques: for every pair of distinct vertices u, v ∈ S, the edge (u, v) is in E, and similarly for every pair of distinct vertices u, v ∈ T, the edge (u, v) is in E.

12.6 BIBLIOGRAPHICAL NOTES

The classic undergraduate introduction to algorithms text is [Cor+09]. Two texts that are less "encyclopedic" are Kleinberg and Tardos [KT06], and Dasgupta, Papadimitriou and Vazirani [DPV08]. Jeff Erickson's book is an excellent algorithms text that is freely available online.

The origins of the minimum cut problem date to the Cold War. Specifically, Ford and Fulkerson discovered their max-flow/min-cut algorithm in 1955 as a way to find out the minimum amount of train


tracks that would need to be blown up to disconnect Russia from the rest of Europe. See the survey [Sch05] for more.

Some algorithms for the longest path problem are given in [Wil09;Bjo14].

12.7 FURTHER EXPLORATIONS

Some topics related to this chapter that might be accessible to advanced students include: (to be completed)


13 Modeling running time

"When the measure of the problem-size is reasonable and when the sizes assume values arbitrarily large, an asymptotic estimate of … the order of difficulty of [an] algorithm … is theoretically important. It cannot be rigged by making the algorithm artificially difficult for smaller sizes", Jack Edmonds, "Paths, Trees, and Flowers", 1963

Max Newman: It is all very well to say that a machine could … do this or that, but … what about the time it would take to do it?
Alan Turing: To my mind this time factor is the one question which will involve all the real technical difficulty.
BBC radio panel on "Can Automatic Calculating Machines Be Said to Think?", 1952

In Chapter 12 we saw examples of efficient algorithms, and made some claims about their running time, but did not give a mathematically precise definition for this concept. We do so in this chapter, using the models of Turing machines and RAM machines (or equivalently NAND-TM and NAND-RAM) we have seen before. The running time of an algorithm is not a fixed number, since any non-trivial algorithm will take longer to run on longer inputs. Thus, what we want to measure is the dependence between the number of steps the algorithm takes and the length of the input. In particular we care about the distinction between algorithms that take at most polynomial time (i.e., O(n^c) time for some constant c) and problems for which every algorithm requires at least exponential time (i.e., Ω(2^{n^c}) for some c). As mentioned in Edmonds' quote in Chapter 12, the difference between these two can sometimes be as important as the difference between being computable and uncomputable.

This chapter: A non-mathy overview
In this chapter we formally define what it means for a function to be computable in a certain number of steps. As discussed in Chapter 12, running time is not a number; rather

Compiled on 8.26.2020 18:10

Learning Objectives:
• Formally modeling running time, and in particular notions such as O(n) or O(n³) time algorithms.

• The classes P and EXP modelling polynomial and exponential time respectively.

• The time hierarchy theorem, which in particular says that for every k ≥ 1 there are functions we can compute in O(n^{k+1}) time but cannot compute in O(n^k) time.

• The class P/poly of non uniform computation and the result that P ⊆ P/poly.


Figure 13.1: Overview of the results of this chapter.

what we care about is the scaling behavior of the number of steps as the input size grows. We can use either Turing machines or RAM machines to give such a formal definition; it turns out that this doesn't make a difference at the resolution we care about. We make several important definitions and prove some important theorems in this chapter. We will define the main time complexity classes we use in this book, and also show the Time Hierarchy Theorem, which states that given more resources (more time steps per input size) we can compute more functions.

To put this in more "mathy" language, in this chapter we define what it means for a function F : {0,1}^* → {0,1}^* to be computable in time T(n) steps, where T is some function mapping the length n of the input to the number of computation steps allowed. Using this definition we will do the following (see also Fig. 13.1):

• We define the class P of Boolean functions that can be computed in polynomial time and the class EXP of functions that can be computed in exponential time. Note that P ⊆ EXP: if we can compute a function in polynomial time, we can certainly compute it in exponential time.

• We show that the times to compute a function using a Turing machine and using a RAM machine (or NAND-RAM program) are polynomially related. In particular this means that the classes P and EXP are identical regardless of whether they are defined using Turing machines or RAM machines / NAND-RAM programs.


modeling running time 419

• We give an efficient universal NAND-RAM program and use this to establish the time hierarchy theorem, which in particular implies that P is a strict subset of EXP.

• We relate the notions defined here to the non-uniform models of Boolean circuits and NAND-CIRC programs defined in Chapter 3. We define P/poly to be the class of functions that can be computed by a sequence of polynomial-sized circuits. We prove that P ⊆ P/poly and that P/poly contains uncomputable functions.

13.1 FORMALLY DEFINING RUNNING TIME

Our models of computation, such as Turing machines, NAND-TM and NAND-RAM programs, and others, all operate by executing a sequence of instructions on an input one step at a time. We can define the running time of an algorithm M in one of these models by measuring the number of steps M takes on input x as a function of the length |x| of the input. We start by defining running time with respect to Turing machines:

Definition 13.1 — Running time (Turing machines). Let T : ℕ → ℕ be some function mapping natural numbers to natural numbers. We say that a function F : {0,1}^* → {0,1}^* is computable in T(n) Turing-machine time (TM-time for short) if there exists a Turing machine M such that for every sufficiently large n and every x ∈ {0,1}^n, when given input x, the machine M halts after executing at most T(n) steps and outputs F(x).

We define TIME_TM(T(n)) to be the set of Boolean functions (functions mapping {0,1}^* to {0,1}) that are computable in T(n) TM-time.

Big Idea 17 For a function F : {0,1}^* → {0,1} and T : ℕ → ℕ, we can formally define what it means for F to be computable in time at most T(n) where n is the size of the input.

P Definition 13.1 is not very complicated but is one of the most important definitions of this book. As usual, TIME_TM(T(n)) is a class of functions, not of machines. If M is a Turing machine then a statement such as "M is a member of TIME_TM(n^2)" does not make sense. The concept of TM-time as defined here is sometimes known as "single-tape Turing machine time" in the literature, since some texts consider Turing machines with more than one working tape.


Figure 13.2: Comparing T(n) = 10n^3 with T'(n) = 2^n (on the right figure the Y axis is in log scale). Since for every large enough n, T'(n) ≥ T(n), TIME_TM(T(n)) ⊆ TIME_TM(T'(n)).

The relaxation of considering only "sufficiently large" n's is not very important but it is convenient since it allows us to avoid dealing explicitly with uninteresting "edge cases". We will mostly anyway be interested in determining running time only up to constant and even polynomial factors.

While the notion of being computable within a certain running time can be defined for every function, the class TIME_TM(T(n)) is a class of Boolean functions that have a single bit of output. This choice is not very important, but is made for simplicity and convenience later on. In fact, every non-Boolean function has a computationally equivalent Boolean variant; see Exercise 13.3.

Solved Exercise 13.1 — Example of time bounds. Prove that TIME_TM(10·n^3) ⊆ TIME_TM(2^n).

Solution:

The proof is illustrated in Fig. 13.2. Suppose that F ∈ TIME_TM(10·n^3), and hence there are some number N_0 and a machine M such that for every n > N_0 and x ∈ {0,1}^n, M(x) outputs F(x) within at most 10·n^3 steps. Since 10·n^3 = o(2^n), there is some number N_1 such that for every n > N_1, 10·n^3 < 2^n. Hence for every n > max{N_0, N_1}, M(x) will output F(x) within at most 2^n steps, demonstrating that F ∈ TIME_TM(2^n).
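As a quick numerical sanity check on the step used in this solution, the following Python sketch (an illustration only; the helper name `crossover` and the search limit are my own, not the book's) finds the threshold N_1 beyond which 10·n^3 < 2^n always holds:

```python
def crossover(slow_growing, fast_growing, limit=10_000):
    """Return the smallest N such that slow_growing(n) < fast_growing(n)
    for every n in [N, limit], scanning downward from the limit."""
    N = limit + 1
    for n in range(limit, 0, -1):
        if slow_growing(n) < fast_growing(n):
            N = n
        else:
            break  # the condition fails at n, so the threshold is n + 1
    return N

# 10*n**3 versus 2**n: the exponential wins from n = 16 onward.
print(crossover(lambda n: 10 * n**3, lambda n: 2**n))  # → 16
```

Of course, a finite scan is only a sanity check; what the proof actually uses is the asymptotic fact 10·n^3 = o(2^n).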

13.1.1 Polynomial and Exponential Time
Unlike the notion of computability, the exact running time can be a function of the model we use. However, it turns out that if we only care about "coarse enough" resolution (as will most often be the case) then the choice of the model, whether Turing machines, RAM machines, NAND-TM/NAND-RAM programs, or C/Python programs, does not matter. This is known as the extended Church-Turing Thesis. Specifically we will mostly care about the difference between polynomial and exponential time.

The two main time complexity classes we will be interested in are the following:

• Polynomial time: A function F : {0,1}^* → {0,1} is computable in polynomial time if it is in the class P = ∪_{c∈{1,2,3,…}} TIME_TM(n^c). That is, F ∈ P if there is an algorithm to compute F that runs in time at most polynomial (i.e., at most n^c for some constant c) in the length of the input.

• Exponential time: A function F : {0,1}^* → {0,1} is computable in exponential time if it is in the class EXP = ∪_{c∈{1,2,3,…}} TIME_TM(2^(n^c)).


That is, F ∈ EXP if there is an algorithm to compute F that runs in time at most exponential (i.e., at most 2^(n^c) for some constant c) in the length of the input.

In other words, these are defined as follows:

Definition 13.2 — P and EXP. Let F : {0,1}^* → {0,1}. We say that F ∈ P if there is a polynomial p : ℕ → ℝ and a Turing machine M such that for every x ∈ {0,1}^*, when given input x, the Turing machine halts within at most p(|x|) steps and outputs F(x).

We say that F ∈ EXP if there is a polynomial p : ℕ → ℝ and a Turing machine M such that for every x ∈ {0,1}^*, when given input x, M halts within at most 2^(p(|x|)) steps and outputs F(x).

P Please take the time to make sure you understand these definitions. In particular, sometimes students think of the class EXP as corresponding to functions that are not in P. However, this is not the case. If F is in EXP then it can be computed in exponential time. This does not mean that it cannot be computed in polynomial time as well.
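To make this point concrete, here is a toy Python illustration (my own example, not from the book): the same Boolean function computed both by a linear-time algorithm and by a needlessly exponential one. Having some at-most-exponential algorithm places a function in EXP, but says nothing about whether a polynomial-time algorithm also exists.

```python
from itertools import permutations

def is_sorted_fast(xs):
    # Linear scan: witnesses that this function is in P.
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

def is_sorted_slow(xs):
    # Same function via brute force: enumerate all n! orderings,
    # collect those that sort xs, and ask if the identity is among them.
    n = len(xs)
    sorting = [p for p in permutations(range(n))
               if all(xs[p[i]] <= xs[p[i + 1]] for i in range(n - 1))]
    return tuple(range(n)) in sorting

xs = [3, 1, 4, 1, 5]
print(is_sorted_fast(xs), is_sorted_slow(xs))  # → False False
```

Both algorithms compute the same function; the existence of the slow one tells us nothing negative about the function's complexity.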

Solved Exercise 13.2 — Different definitions of P. Prove that P as defined in Definition 13.2 is equal to ∪_{c∈{1,2,3,…}} TIME_TM(n^c).

Solution:

To show these two sets are equal we need to show that P ⊆ ∪_{c∈{1,2,3,…}} TIME_TM(n^c) and ∪_{c∈{1,2,3,…}} TIME_TM(n^c) ⊆ P. We start with the former inclusion. Suppose that F ∈ P. Then there is some polynomial p : ℕ → ℝ and a Turing machine M such that M computes F and M halts on every input x within at most p(|x|) steps. We can write the polynomial p in the form p(n) = ∑_{i=0}^{d} a_i·n^i where a_0, …, a_d ∈ ℝ, and we assume that a_d is nonzero (otherwise we just let d correspond to the largest number such that a_d is nonzero). The degree of p is the number d. Since n^d = o(n^(d+1)), no matter what the coefficient a_d is, for large enough n, p(n) < n^(d+1), which means that the Turing machine M will halt on inputs of length n within fewer than n^(d+1) steps, and hence F ∈ TIME_TM(n^(d+1)) ⊆ ∪_{c∈{1,2,3,…}} TIME_TM(n^c).

For the second inclusion, suppose that F ∈ ∪_{c∈{1,2,3,…}} TIME_TM(n^c). Then there is some positive c ∈ ℕ such that F ∈ TIME_TM(n^c), which means that there is a Turing machine M and some number N_0 such


Figure 13.3: Some examples of problems that areknown to be in P and problems that are known tobe in EXP but not known whether or not they arein P. Since both P and EXP are classes of Booleanfunctions, in this figure we always refer to the Boolean(i.e., Yes/No) variant of the problems.

that ๐‘€ computes ๐น and for every ๐‘› > ๐‘0, ๐‘€ halts on length ๐‘›inputs within at most ๐‘›๐‘ steps. Let ๐‘‡0 be the maximum numberof steps that ๐‘€ takes on inputs of length at most ๐‘0. Then if wedefine the polynomial ๐‘(๐‘›) = ๐‘›๐‘ + ๐‘‡0 then we see that ๐‘€ halts onevery input ๐‘ฅ within at most ๐‘(|๐‘ฅ|) steps and hence the existence of๐‘€ demonstrates that ๐น โˆˆ P.

Since exponential time is much larger than polynomial time, P ⊆ EXP. All of the problems we listed in Chapter 12 are in EXP, but as we've seen, for some of them there are much better algorithms that demonstrate that they are in fact in the smaller class P.

P              | EXP (but not known to be in P)
Shortest path  | Longest path
Min cut        | Max cut
2SAT           | 3SAT
Linear eqs     | Quad. eqs
Zerosum        | Nash
Determinant    | Permanent
Primality      | Factoring

Table: A table of the examples from Chapter 12. All these problems are in EXP, but only the ones in the left column are currently known to be in P as well (i.e., they have a polynomial-time algorithm). See also Fig. 13.3.

R Remark 13.3 — Boolean versions of problems. Many of the problems defined in Chapter 12 correspond to non-Boolean functions (functions with more than one bit of output) while P and EXP are sets of Boolean functions. However, for every non-Boolean function F we can always define a computationally-equivalent Boolean function G by letting G(x, i) be the i-th bit of F(x) (see Exercise 13.3). Hence the table above, as well as Fig. 13.3, refer to the computationally-equivalent Boolean variants of these problems.

13.2 MODELING RUNNING TIME USING RAM MACHINES / NAND-RAM

Turing machines are a clean theoretical model of computation, but do not closely correspond to real-world computing architectures. The discrepancy between Turing machines and actual computers does not matter much when we consider the question of which functions


are computable, but can make a difference in the context of efficiency. Even a basic staple of undergraduate algorithms such as "merge sort" cannot be implemented on a Turing machine in O(n log n) time (see Section 13.8). RAM machines (or equivalently, NAND-RAM programs) match more closely actual computing architectures and what we mean when we say O(n) or O(n log n) algorithms in algorithms courses or whiteboard coding interviews. We can define running time with respect to NAND-RAM programs just as we did for Turing machines.

Definition 13.4 — Running time (RAM). Let T : ℕ → ℕ be some function mapping natural numbers to natural numbers. We say that a function F : {0,1}^* → {0,1}^* is computable in T(n) RAM time (RAM-time for short) if there exists a NAND-RAM program P such that for every sufficiently large n and every x ∈ {0,1}^n, when given input x, the program P halts after executing at most T(n) lines and outputs F(x).

We define TIME_RAM(T(n)) to be the set of Boolean functions (functions mapping {0,1}^* to {0,1}) that are computable in T(n) RAM time.

Because NAND-RAM programs correspond more closely to our natural notions of running time, we will use NAND-RAM as our "default" model of running time, and hence use TIME(T(n)) (without any subscript) to denote TIME_RAM(T(n)). However, it turns out that as long as we only care about the difference between exponential and polynomial time, this does not make much difference. The reason is that Turing machines can simulate NAND-RAM programs with at most a polynomial overhead (see also Fig. 13.4):

Theorem 13.5 — Relating RAM and Turing machines. Let T : ℕ → ℕ be a function such that T(n) ≥ n for every n and the map n ↦ T(n) can be computed by a Turing machine in time O(T(n)). Then

TIME_TM(T(n)) ⊆ TIME_RAM(10·T(n)) ⊆ TIME_TM(T(n)^4) .   (13.1)

P The technical details of Theorem 13.5, such as the condition that n ↦ T(n) is computable in O(T(n)) time or the constants 10 and 4 in (13.1) (which are not tight and can be improved), are not very important. In particular, all non-pathological time bound functions we encounter in practice, such as T(n) = n, T(n) = n log n, T(n) = 2^n, etc., will satisfy the conditions of Theorem 13.5; see also Remark 13.6.


Figure 13.4: The proof of Theorem 13.5 shows that we can simulate T steps of a Turing machine with T steps of a NAND-RAM program, and can simulate T steps of a NAND-RAM program with o(T^4) steps of a Turing machine. Hence TIME_TM(T(n)) ⊆ TIME_RAM(10·T(n)) ⊆ TIME_TM(T(n)^4).

The main message of the theorem is that Turing machines and RAM machines are "roughly equivalent" in the sense that one can simulate the other with polynomial overhead. Similarly, while the proof involves some technical details, it's not very deep or hard, and merely follows the simulation of RAM machines with Turing machines we saw in Theorem 8.1 with more careful "book keeping".

For example, by instantiating Theorem 13.5 with T(n) = n^a and using the fact that 10n^a = o(n^(a+1)), we see that TIME_TM(n^a) ⊆ TIME_RAM(n^(a+1)) ⊆ TIME_TM(n^(4a+4)), which means that (by Solved Exercise 13.2)

P = ∪_{a=1,2,…} TIME_TM(n^a) = ∪_{a=1,2,…} TIME_RAM(n^a) .   (13.2)

That is, we could have equally well defined P as the class of functions computable by NAND-RAM programs (instead of Turing machines) that run in time polynomial in the length of the input. Similarly, by instantiating Theorem 13.5 with T(n) = 2^(n^a), we see that the class EXP can also be defined as the set of functions computable by NAND-RAM programs in time at most 2^(p(n)) where p is some polynomial. Similar equivalence results are known for many models including cellular automata, C/Python/Javascript programs, parallel computers, and a great many other models, which justifies the choice of P as capturing a technology-independent notion of tractability. (See Section 13.3 for more discussion of this issue.) This equivalence between Turing machines and NAND-RAM (as well as other models) allows us to pick our favorite model depending on the task at hand (i.e., "have our cake and eat it too") even when we study questions of efficiency, as long as we only care about the gap between polynomial and exponential time. When we want to design an algorithm, we can use the extra power and convenience afforded by NAND-RAM. When we want to analyze a program or prove a negative result, we can restrict our attention to Turing machines.

Big Idea 18 All "reasonable" computational models are equivalent if we only care about the distinction between polynomial and exponential.

The adjective "reasonable" above refers to all scalable computational models that have been implemented, with the possible exception of quantum computers; see Section 13.3 and Chapter 23.

Proof Idea:

The direction TIME_TM(T(n)) ⊆ TIME_RAM(10·T(n)) is not hard to show, since a NAND-RAM program P can simulate a Turing machine


M with constant overhead by storing the transition table of M in an array (as is done in the proof of Theorem 9.1). Simulating every step of the Turing machine can be done in a constant number c of steps of RAM, and it can be shown that this constant c is smaller than 10. Thus the heart of the theorem is to prove that TIME_RAM(T(n)) ⊆ TIME_TM(T(n)^4). This proof closely follows the proof of Theorem 8.1, where we have shown that every function F that is computable by a NAND-RAM program P is computable by a Turing machine (or equivalently a NAND-TM program) M. To prove Theorem 13.5, we follow the exact same proof but just check that the overhead of the simulation of P by M is polynomial. The proof has many details, but is not deep. It is therefore much more important that you understand the statement of this theorem than its proof.

⋆

Proof of Theorem 13.5. We only focus on the non-trivial direction TIME_RAM(T(n)) ⊆ TIME_TM(T(n)^4). Let F ∈ TIME_RAM(T(n)). F can be computed in time T(n) by some NAND-RAM program P, and we need to show that it can also be computed in time T(n)^4 by a Turing machine M. This will follow from showing that F can be computed in time T(n)^4 by a NAND-TM program, since for every NAND-TM program Q there is a Turing machine M simulating it such that each iteration of Q corresponds to a single step of M.

As mentioned above, we follow the proof of Theorem 8.1 (simulation of NAND-RAM programs using NAND-TM programs) and use the exact same simulation, but with a more careful accounting of the number of steps that the simulation costs. Recall that the simulation of NAND-RAM works by "peeling off" features of NAND-RAM one by one, until we are left with NAND-TM.

We will not provide the full details but will present the main ideas used in showing that every feature of NAND-RAM can be simulated by NAND-TM with at most a polynomial overhead:

1. Recall that every NAND-RAM variable or array element can contain an integer between 0 and T where T is the number of lines that have been executed so far. Therefore if P is a NAND-RAM program that computes F in T(n) time, then on inputs of length n, all integers used by P are of magnitude at most T(n). This means that the largest value i can ever reach is at most T(n), and so each one of P's variables can be thought of as an array of at most T(n) indices, each of which holds a natural number of magnitude at most T(n). We let ℓ = ⌈log T(n)⌉ be the number of bits needed to encode such numbers. (We can start off the simulation by computing T(n) and ℓ.)


2. We can encode a NAND-RAM array of length ≤ T(n) containing numbers in {0, …, T(n)−1} as a Boolean (i.e., NAND-TM) array of T(n)·ℓ = O(T(n) log T(n)) bits, which we can also think of as a two-dimensional array as we did in the proof of Theorem 8.1. We encode a NAND-RAM scalar containing a number in {0, …, T(n)−1} simply by a shorter NAND-TM array of ℓ bits.

3. We can simulate the two-dimensional arrays using one-dimensional arrays of length T(n)·ℓ = O(T(n) log T(n)). All the arithmetic operations on integers use the grade-school algorithms, which take time polynomial in the number ℓ of bits of the integers, which is poly(log T(n)) in our case. Hence we can simulate T(n) steps of NAND-RAM with O(T(n)·poly(log T(n))) steps of a model that uses random access memory but only Boolean-valued one-dimensional arrays.

4. The most expensive step is to translate from random access memory to the sequential memory model of NAND-TM / Turing machines. As we did in the proof of Theorem 8.1 (see Section 8.2), we can simulate accessing an array Foo at some location encoded in an array Bar by:

a. Copying Bar to some temporary array Temp.

b. Having an array Index which is initially all zeros except 1 at the first location.

c. Repeating the following until Temp encodes the number 0 (the number of repetitions is at most T(n)):

• Decrease the number encoded by Temp by 1. (Takes a number of steps polynomial in ℓ = ⌈log T(n)⌉.)
• Decrease i until it is equal to 0. (Takes O(T(n)) steps.)
• Scan Index until we reach the point at which it equals 1, then change this 1 to 0, go one step further, and write 1 in this location. (Takes O(T(n)) steps.)

d. When we are done we know that if we scan Index until we reach the point at which Index[i] = 1, then i contains the value that was encoded by Bar. (Takes O(T(n)) steps.)

The total cost for each such operation is O(T(n)^2 + T(n)·poly(log T(n))) = O(T(n)^2) steps.

In sum, we simulate a single step of NAND-RAM using O(T(n)^2·poly(log T(n))) steps of NAND-TM, and hence the total simulation time is O(T(n)^3·poly(log T(n))), which is smaller than T(n)^4 for sufficiently large n.
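The heart of step 4 is that one random access costs O(T(n)) sequential head movements. A toy Python model (my own abstraction for illustration; the real simulation uses the Temp/Index bookkeeping described above) makes the cost explicit by only ever moving a head one cell per step:

```python
class SequentialTape:
    """A tape whose head moves one cell per step; `steps` counts the moves."""
    def __init__(self, cells):
        self.cells = list(cells)
        self.head = 0
        self.steps = 0

    def read_at(self, i):
        # Simulate one random access by walking the head to position i:
        # a single access costs up to O(T) steps on a length-T tape.
        while self.head != i:
            self.head += 1 if i > self.head else -1
            self.steps += 1
        return self.cells[self.head]

tape = SequentialTape(range(100))
value = tape.read_at(42)
print(value, tape.steps)  # → 42 42
```

Since each of the T(n) simulated RAM steps may trigger such an access, the quadratic O(T(n)^2) term in the analysis above is exactly this per-access walking cost.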


R Remark 13.6 — Nice time bounds. When considering general time bounds we need to make sure to rule out some "pathological" cases, such as functions T that don't give enough time for the algorithm to read the input, or functions where the time bound itself is uncomputable. We say that a function T : ℕ → ℕ is a nice time bound function (or nice function for short) if for every n ∈ ℕ, T(n) ≥ n (i.e., T allows enough time to read the input), for every n' ≥ n, T(n') ≥ T(n) (i.e., T allows more time on longer inputs), and the map F(x) = 1^(T(|x|)) (i.e., mapping a string of length n to a sequence of T(n) ones) can be computed by a NAND-RAM program in O(T(n)) time.

All the "normal" time complexity bounds we encounter in applications, such as T(n) = 100n, T(n) = n^2 log n, T(n) = 2^(√n), etc., are "nice". Hence from now on we will only care about the class TIME(T(n)) when T is a "nice" function. The computability condition in particular is typically easily satisfied. For example, for arithmetic functions such as T(n) = n^3, we can typically compute the binary representation of T(n) in time polynomial in the number of bits of T(n) and hence poly-logarithmic in T(n). Hence the time to write the string 1^(T(n)) in such cases will be T(n) + poly(log T(n)) = O(T(n)).
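The first two "niceness" conditions are finite-range checkable. This small Python sketch (illustrative only; the third, computability condition cannot be tested this way) screens candidate bounds:

```python
def looks_nice(T, limit=1_000):
    """Check T(n) >= n and that T is non-decreasing for n in [1, limit]:
    the finitely-checkable part of being a 'nice' time bound."""
    values = [T(n) for n in range(1, limit + 1)]
    reads_input = all(v >= n for n, v in enumerate(values, start=1))
    monotone = all(a <= b for a, b in zip(values, values[1:]))
    return reads_input and monotone

print(looks_nice(lambda n: 100 * n))   # → True
print(looks_nice(lambda n: n * n))     # → True
print(looks_nice(lambda n: n // 2))    # → False (not enough time to read input)
```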

13.3 EXTENDED CHURCH-TURING THESIS (DISCUSSION)

Theorem 13.5 shows that the computational models of Turing machines and RAM machines / NAND-RAM programs are equivalent up to polynomial factors in the running time. Other examples of polynomially equivalent models include:

• All standard programming languages, including C/Python/JavaScript/Lisp/etc.

• The λ calculus (see also Section 13.8).

• Cellular automata.

• Parallel computers.

• Biological computing devices such as DNA-based computers.

The Extended Church-Turing Thesis is the statement that this is true for all physically realizable computing models. In other words, the extended Church-Turing thesis says that for every scalable computing device C (which has a finite description but can in principle be used to run computation on arbitrarily large inputs), there is some constant a such that for every function F : {0,1}^* → {0,1} that C can


compute on length-n inputs using an S(n) amount of physical resources, F is in TIME(S(n)^a). This is a strengthening of the ("plain") Church-Turing Thesis, discussed in Section 8.8, which states that the set of computable functions is the same for all physically realizable models, but without requiring the overhead in the simulation between different models to be at most polynomial.

All the current constructions of scalable computational models and programming languages conform to the Extended Church-Turing Thesis, in the sense that they can be simulated with polynomial overhead by Turing machines (and hence also by NAND-TM or NAND-RAM programs). Consequently, the classes P and EXP are robust to the choice of model, and we can use the programming language of our choice, or high-level descriptions of an algorithm, to determine whether or not a problem is in P.

Like the Church-Turing thesis itself, the extended Church-Turing thesis is in the asymptotic setting and does not directly yield an experimentally testable prediction. However, it can be instantiated with more concrete bounds on the overhead, yielding experimentally-testable predictions such as the Physical Extended Church-Turing Thesis we mentioned in Section 5.6.

In the last hundred+ years of studying and mechanizing computation, no one has yet constructed a scalable computing device that violates the extended Church-Turing Thesis. However, quantum computing, if realized, will pose a serious challenge to it (see Chapter 23). That said, even if the promises of quantum computing are fully realized, the extended Church-Turing thesis is "morally" correct, in the sense that, while we do need to adapt the thesis to account for the possibility of quantum computing, its broad outline remains unchanged. We are still able to model computation mathematically, we can still treat programs as strings and have a universal program, we still have time hierarchy and uncomputability results, and there is still no reason to doubt the ("plain") Church-Turing thesis. Moreover, the prospect of quantum computing does not seem to make a difference for the time complexity of many (though not all!) of the concrete problems that we care about. In particular, as far as we know, out of all the example problems mentioned in Chapter 12, the complexity of only one, integer factoring, is affected by modifying our model to include quantum computers as well.


Figure 13.5: The universal NAND-RAM program U simulates an input NAND-RAM program P by storing all of P's variables inside a single array Vars of U. If P has t variables, then the array Vars is divided into blocks of length t, where the j-th coordinate of the i-th block contains the i-th element of the j-th array of P. If the j-th variable of P is scalar, then we just store its value in the zeroth block of Vars.

13.4 EFFICIENT UNIVERSAL MACHINE: A NAND-RAM INTERPRETER IN NAND-RAM

We have seen in Theorem 9.1 the "universal Turing machine". Examining that proof, and combining it with Theorem 13.5, we can see that the program U has a polynomial overhead, in the sense that it can simulate T steps of a given NAND-TM (or NAND-RAM) program P on an input x in O(T^4) steps. But in fact, by directly simulating NAND-RAM programs we can do better with only a constant multiplicative overhead. That is, there is a universal NAND-RAM program U such that for every NAND-RAM program P, U simulates T steps of P using only O(T) steps. (The implicit constant in the O notation can depend on the program P but does not depend on the length of the input.)

Theorem 13.7 — Efficient universality of NAND-RAM. There exists a NAND-RAM program U satisfying the following:

1. (U is a universal NAND-RAM program.) For every NAND-RAM program P and input x, U(P, x) = P(x), where by U(P, x) we denote the output of U on a string encoding the pair (P, x).

2. (U is efficient.) There are some constants a, b such that for every NAND-RAM program P, if P halts on input x after at most T steps, then U(P, x) halts after at most C·T steps where C ≤ a·|P|^b.

P As in the case of Theorem 13.5, the proof of Theorem 13.7 is not very deep and so it is more important to understand its statement. Specifically, if you understand how you would go about writing an interpreter for NAND-RAM using a modern programming language such as Python, then you know everything you need to know about the proof of this theorem.

Proof of Theorem 13.7. To present a universal NAND-RAM program in full we would need to describe a precise representation scheme, as well as the full NAND-RAM instructions for the program. While this can be done, it is more important to focus on the main ideas, and so we just sketch the proof here. A specification of NAND-RAM is given in the appendix, and for the purposes of this simulation, we can simply use the representation of the NAND-RAM code as an ASCII string.


The program U gets as input a NAND-RAM program P and an input x and simulates P one step at a time. To do so, U does the following:

1. U maintains variables program_counter and number_steps for the current line to be executed and the number of steps executed so far.

2. U initially scans the code of P to find the number t of unique variable names that P uses. It will translate each variable name into a number between 0 and t−1 and use an array Program to store P's code, where for every line ℓ, Program[ℓ] will store the ℓ-th line of P with the variable names translated to numbers. (More concretely, we will use a constant number of arrays to separately encode the operation used in this line, and the variable names and indices of the operands.)

3. U maintains a single array Vars that contains all the values of P's variables. We divide Vars into blocks of length t. If s is a number corresponding to an array variable Foo of P, then we store Foo[0] in Vars[s], Foo[1] in Vars[t + s], Foo[2] in Vars[2t + s], and so on and so forth (see Fig. 13.5). Generally, if the s-th variable of P is a scalar variable, then its value will be stored in location Vars[s]. If it is an array variable then the value of its i-th element will be stored in location Vars[t·i + s].

4. To simulate a single step of P, the program U recovers from Program the line corresponding to program_counter and executes it. Since NAND-RAM has a constant number of arithmetic operations, we can implement the logic of which operation to execute using a sequence of a constant number of if-then-elses. Retrieving from Vars the values of the operands of each instruction can be done using a constant number of arithmetic operations.

The setup stages take only a constant (depending on |P| but not on the input x) number of steps. Once we are done with the setup, to simulate a single step of P, we just need to retrieve the corresponding line and do a constant number of "if-elses" and accesses to Vars to simulate it. Hence the total running time to simulate T steps of the program P is at most O(T) when suppressing constants that depend on the program P.
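The variable-packing scheme of steps 2 and 3 can be sketched in ordinary Python. The mini-language below (just 'set' and 'inc' instructions) is a made-up stand-in for NAND-RAM, but the storage layout is the one from the proof: all variables live in one flat array, with element i of variable number s at position t·i + s:

```python
def run(program, max_steps=1_000):
    """Toy interpreter. Each instruction is ('set', name, i, value) or
    ('inc', name, i), acting on element i of the named array variable."""
    names = sorted({ins[1] for ins in program})
    t = len(names)                       # number of distinct variables
    num = {name: s for s, name in enumerate(names)}
    vars_ = [0] * (t * (max_steps + 1))  # one flat array for ALL variables
    for step, ins in enumerate(program):
        if step >= max_steps:
            break
        op, s, i = ins[0], num[ins[1]], ins[2]
        if op == 'set':
            vars_[t * i + s] = ins[3]
        elif op == 'inc':
            vars_[t * i + s] += 1
    return vars_, num, t

prog = [('set', 'Foo', 0, 5), ('inc', 'Foo', 0), ('set', 'Bar', 2, 7)]
vars_, num, t = run(prog)
print(vars_[t * 0 + num['Foo']], vars_[t * 2 + num['Bar']])  # → 6 7
```

Note that each simulated instruction costs a constant number of Python operations, matching the O(T) bound with a constant that depends on the program.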

13.4.1 Timed Universal Turing Machine
One corollary of the efficient universal machine is the following. Given any Turing machine M, input x, and "step budget" T, we can simulate the execution of M for T steps in time that is polynomial in


Figure 13.6: The timed universal Turing Machine takes as input a Turing machine M, an input x, and a time bound T, and outputs M(x) if M halts within at most T steps. Theorem 13.8 states that there is such a machine that runs in time polynomial in T.

T. Formally, we define a function TIMEDEVAL that takes the three parameters M, x, and the time budget, and outputs M(x) if M halts within at most T steps, and outputs 0 otherwise. The timed universal Turing Machine computes TIMEDEVAL in polynomial time (see Fig. 13.6). (Since we measure time as a function of the input length, we define TIMEDEVAL as taking the input T represented in unary: a string of T ones.)

Theorem 13.8 — Timed Universal Turing Machine. Let TIMEDEVAL : {0,1}* → {0,1}* be the function defined as

TIMEDEVAL(๐‘€, ๐‘ฅ, 1๐‘‡ ) = โŽงโŽจโŽฉ๐‘€(๐‘ฅ) ๐‘€ halts within โ‰ค ๐‘‡ steps on ๐‘ฅ0 otherwise.

(13.3)Then TIMEDEVAL โˆˆ P.

Proof. We only sketch the proof since the result follows fairly directly from Theorem 13.5 and Theorem 13.7. By Theorem 13.5, to show that TIMEDEVAL ∈ P, it suffices to give a polynomial-time NAND-RAM program to compute TIMEDEVAL.

Such a program can be obtained as follows. Given a Turing Machine M, by Theorem 13.5 we can transform it in time polynomial in its description into a functionally-equivalent NAND-RAM program P such that the execution of M for T steps can be simulated by the execution of P for c ⋅ T steps. We can then run the universal NAND-RAM machine of Theorem 13.7 to simulate P for c ⋅ T steps, using O(T) time, and output 0 if the execution did not halt within this budget. This shows that TIMEDEVAL can be computed by a NAND-RAM program in time polynomial in |M| and linear in T, which means TIMEDEVAL ∈ P.
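Abstracting away the concrete machine model, the logic of TIMEDEVAL is just step-limited simulation. A minimal Python sketch, where the step/is_done/output interface is a placeholder for whatever machine model is being simulated:

```python
def timed_eval(step, is_done, output, state, T):
    """Run a machine (given by its one-step transition `step`) from
    `state` for at most T steps; return its output if it halted,
    and 0 otherwise."""
    for _ in range(T):
        if is_done(state):
            return output(state)
        state = step(state)
    return output(state) if is_done(state) else 0

# toy machine: counts down to 0 and then outputs 1
halts_in_time = timed_eval(lambda s: s - 1, lambda s: s == 0,
                           lambda s: 1, 3, 10)   # 1: halts within budget
too_slow = timed_eval(lambda s: s - 1, lambda s: s == 0,
                      lambda s: 1, 3, 2)         # 0: budget exhausted
```

The point of Theorem 13.8 is that this loop can itself be carried out with only polynomial overhead over the budget T.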

13.5 THE TIME HIERARCHY THEOREM

Some functions are uncomputable, but are there functions that can be computed, but only at an exorbitant cost? For example, is there a function that can be computed in time 2^n, but cannot be computed in time 2^{0.9n}? It turns out that the answer is Yes:

Theorem 13.9 — Time Hierarchy Theorem. For every nice function T : ℕ → ℕ, there is a function F : {0,1}* → {0,1} in TIME(T(n) log n) ⧵ TIME(T(n)).

There is nothing special about log n, and we could have used any other efficiently computable function that tends to infinity with n.


Big Idea 19 If we have more time, we can compute more functions.

Remark 13.10 — Simpler corollary of the time hierarchy theorem. The generality of the time hierarchy theorem can make its proof a little hard to read. It might be easier to follow the proof if you first try to prove by yourself the easier statement P ⊊ EXP.

You can do so by showing that the following function F : {0,1}* → {0,1} is in EXP ⧵ P: for every Turing Machine M and input x, F(M, x) = 1 if and only if M halts on x within at most |x|^{log |x|} steps. One can show that F ∈ TIME(n^{O(log n)}) ⊆ EXP using the universal Turing machine (or the efficient universal NAND-RAM program of Theorem 13.7). On the other hand, we can use similar ideas to those used to show the uncomputability of HALT in Section 9.3.2 to prove that F ∉ P.

Figure 13.7: The Time Hierarchy Theorem (Theorem 13.9) states that all of these classes are distinct.

Proof Idea:

In the proof of Theorem 9.6 (the uncomputability of the Halting problem), we have shown that the function HALT cannot be computed in any finite time. An examination of the proof shows that it gives something stronger. Namely, the proof shows that if we fix our computational budget to be T steps, then not only can we not distinguish between programs that halt and those that do not, but we cannot even distinguish between programs that halt within at most T′ steps and those that take more than that (where T′ is some number depending on T). Therefore, the proof of Theorem 13.9 follows the ideas of the


uncomputability of the halting problem, but again with a more careful accounting of the running time.
⋆

Proof of Theorem 13.9. Our proof is inspired by the proof of the uncomputability of the halting problem. Specifically, for every function T as in the theorem's statement, we define the Bounded Halting function HALT_T as follows. The input to HALT_T is a pair (P, x) such that |P| ≤ log log |x| and P encodes some NAND-RAM program. We define

HALT๐‘‡ (๐‘ƒ , ๐‘ฅ) = โŽงโŽจโŽฉ1, ๐‘ƒ halts on ๐‘ฅ within โ‰ค 100 โ‹… ๐‘‡ (|๐‘ƒ | + |๐‘ฅ|) steps0, otherwise.

(13.4)(The constant 100 and the function log log๐‘› are rather arbitrary, andare chosen for convenience in this proof.)

Theorem 13.9 is an immediate consequence of the following two claims:

Claim 1: HALT_T ∈ TIME(T(n) ⋅ log n), and

Claim 2: HALT_T ∉ TIME(T(n)).

Please make sure you understand why indeed the theorem follows directly from the combination of these two claims. We now turn to proving them.

Proof of claim 1: We can easily check in linear time whether an input has the form (P, x) where |P| ≤ log log |x|. Since T(⋅) is a nice function, we can evaluate it in O(T(n)) time. Thus, we can compute HALT_T(P, x) as follows:

1. Compute T_0 = T(|P| + |x|) in O(T_0) steps.

2. Use the universal NAND-RAM program of Theorem 13.7 to simulate 100 ⋅ T_0 steps of P on the input x, using at most poly(|P|) ⋅ T_0 steps. (Recall that we use poly(ℓ) to denote a quantity that is bounded by a ⋅ ℓ^b for some constants a, b.)

3. If ๐‘ƒ halts within these 100 โ‹… ๐‘‡0 steps then output 1, else output 0.The length of the input is ๐‘› = |๐‘ƒ | + |๐‘ฅ|. Since |๐‘ฅ| โ‰ค ๐‘› and(log log |๐‘ฅ|)๐‘ = ๐‘œ(log |๐‘ฅ|) for every ๐‘, the running time will be๐‘œ(๐‘‡ (|๐‘ƒ | + |๐‘ฅ|) log๐‘›) and hence the above algorithm demonstrates that

HALT๐‘‡ โˆˆ TIME(๐‘‡ (๐‘›) โ‹… log๐‘›), completing the proof of Claim 1.Proof of claim 2: This proof is the heart of Theorem 13.9, and is

very reminiscent of the proof that HALT is not computable. Assume,for the sake of contradiction, that there is some NAND-RAM program๐‘ƒ โˆ— that computes HALT๐‘‡ (๐‘ƒ , ๐‘ฅ) within ๐‘‡ (|๐‘ƒ | + |๐‘ฅ|) steps. We are going


to show a contradiction by creating a program Q* and showing that under our assumptions, if Q* runs for less than T(n) steps when given (a padded version of) its own code as input then it actually runs for more than T(n) steps, and vice versa. (It is worth re-reading the last sentence twice or thrice to make sure you understand this logic. It is very similar to the direct proof of the uncomputability of the halting problem, where we obtained a contradiction by using an assumed "halting solver" to construct a program that, given its own code as input, halts if and only if it does not halt.)

We will define Q* to be the program that on input a string z does the following:

1. If ๐‘ง does not have the form ๐‘ง = ๐‘ƒ1๐‘š where ๐‘ƒ represents a NAND-RAM program and |๐‘ƒ | < 0.1 log log๐‘š then return 0. (Recall that1๐‘š denotes the string of ๐‘š ones.)

2. Compute b = P*(P, z) (at a cost of at most T(|P| + |z|) steps, under our assumptions).

3. If ๐‘ = 1 then ๐‘„โˆ— goes into an infinite loop, otherwise it halts.

Let ℓ be the length of the description of Q* as a string, and let m be larger than 2^(2^(1000ℓ)). We will reach a contradiction by splitting into cases according to whether HALT_T(Q*, Q*1^m) equals 0 or 1.

On the one hand, if HALT_T(Q*, Q*1^m) = 1, then under our assumption that P* computes HALT_T, Q* will go into an infinite loop on input z = Q*1^m, and hence in particular Q* does not halt within 100 ⋅ T(|Q*| + m) steps on the input z. But this contradicts our assumption that HALT_T(Q*, Q*1^m) = 1.

This means that it must hold that HALT_T(Q*, Q*1^m) = 0. But in this case, since we assume P* computes HALT_T, Q* does not do anything in phase 3 of its computation, and so the only computation costs come in phases 1 and 2 of the computation. It is not hard to verify that Phase 1 can be done in linear time, and in fact in less than 5|z| steps. Phase 2 involves executing P*, which under our assumption requires T(|Q*| + m) steps. In total we can perform both phases in less than 10 ⋅ T(|Q*| + m) steps, which by definition means that HALT_T(Q*, Q*1^m) = 1, but this is of course a contradiction. This completes the proof of Claim 2 and hence of Theorem 13.9.
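The diagonalizing program Q* can be sketched in Python. Everything here is a hypothetical illustration: P_star stands for the assumed T-time solver for HALT_T, and for simplicity the input is taken as a pair (P, m) instead of the actual string P1^m:

```python
import math

def make_Q_star(P_star):
    """Build Q* from an assumed solver P_star(P, z) -> 0/1 for HALT_T."""
    def Q_star(z):
        P, m = z  # stand-in for parsing z = P followed by m ones
        # Phase 1: reject inputs not of the required form.
        if m < 16 or len(P) >= 0.1 * math.log2(math.log2(m)):
            return 0
        # Phase 2: ask the assumed solver about (P, z).
        b = P_star(P, z)
        # Phase 3: diagonalize -- loop forever iff the solver said "halts".
        if b == 1:
            while True:
                pass
        return 0
    return Q_star

# If P_star answers 1 on (Q*, Q*1^m), then Q* loops, so the true answer
# was 0; if it answers 0, Q* halts quickly, so the true answer was 1.
Q = make_Q_star(lambda P, z: 0)
Q(("p", 2 ** 1025))  # phases 1-2 pass, solver says 0, so Q* halts with 0
```

Note how restrictive the length condition is: it only passes for astronomically large padding m, which is why the proof is free to pick m as large as it needs.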

Solved Exercise 13.3 — P vs. EXP. Prove that P ⊊ EXP.

Solution:


Figure 13.8: Some complexity classes and some of the functions we know (or conjecture) to be contained in them.

We show why this statement follows from the time hierarchy theorem, but it can be an instructive exercise to prove it directly, see Remark 13.10. We need to show that there exists F ∈ EXP ⧵ P. Let T(n) = n^{log n} and T′(n) = n^{log n / 2}. Both are nice functions. Since T(n)/T′(n) = ω(log n), by Theorem 13.9 there exists some F in TIME(T′(n) ⋅ log n) ⊆ TIME(T(n)) that is not in TIME(T′(n)). Since for sufficiently large n, 2^n > n^{log n}, we have F ∈ TIME(2^n) ⊆ EXP. On the other hand, F ∉ P. Indeed, suppose otherwise that there was a constant c > 0 and a Turing Machine computing F on n-length inputs in at most n^c steps for all sufficiently large n. Then since for n large enough n^c < n^{log n / 2}, it would have followed that F ∈ TIME(n^{log n / 2}), contradicting our choice of F.

The time hierarchy theorem tells us that there are functions we can compute in O(n²) time but not O(n), in 2^n time but not 2^{√n}, etc. In particular there are most definitely functions that we can compute in time 2^n but not O(n). We have seen that we have no shortage of natural functions for which the best known algorithm requires roughly 2^n time, and that many people have invested significant effort in trying to improve that. However, unlike in the finite vs. infinite case, for all of the examples above, at the moment we do not know how to rule out even an O(n) time algorithm. We will however see that there is a single unproven conjecture that would imply such a result for most of these problems.

The time hierarchy theorem relies on the existence of an efficient universal NAND-RAM program, as proven in Theorem 13.7. For other models such as Turing Machines, we have similar time hierarchy results showing that there are functions computable in time T(n) and not in time T(n)/f(n), where f(n) corresponds to the overhead in the corresponding universal machine.

13.6 NON UNIFORM COMPUTATION

We have now seen two measures of "computation cost" for functions. In Section 4.6 we defined the complexity of computing finite functions using circuits / straightline programs. Specifically, for a finite function g : {0,1}^n → {0,1} and number T ∈ ℕ, g ∈ SIZE(T) if there is a circuit of at most T NAND gates (or equivalently a T-line NAND-CIRC program) that computes g. To relate this to the classes TIME(T(n)) defined in this chapter we first need to extend the class SIZE(T(n)) from finite functions to functions with unbounded input length.

Definition 13.11 — Non uniform computation. Let F : {0,1}* → {0,1} and T : ℕ → ℕ be a nice time bound. For every n ∈ ℕ, define


Figure 13.9: We can think of an infinite function F : {0,1}* → {0,1} as a collection of finite functions F_0, F_1, F_2, … where F_n : {0,1}^n → {0,1} is the restriction of F to inputs of length n. We say F is in P/poly if for every n, the function F_n is computable by a polynomial size NAND-CIRC program, or equivalently, a polynomial sized Boolean circuit.

๐น๐‘› โˆถ 0, 1๐‘› โ†’ 0, 1 to be the restriction of ๐น to inputs of size ๐‘›.That is, ๐น๐‘› is the function mapping 0, 1๐‘› to 0, 1 such that forevery ๐‘ฅ โˆˆ 0, 1๐‘›, ๐น๐‘›(๐‘ฅ) = ๐น(๐‘ฅ).

We say that F is non-uniformly computable in at most T(n) size, denoted by F ∈ SIZE(T(n)), if there exists a sequence (C_0, C_1, C_2, …) of NAND circuits such that:

• For every n ∈ ℕ, C_n computes the function F_n.

• For every sufficiently large n, C_n has at most T(n) gates.

The non uniform analog to the class P is the class P/poly, defined as

P/poly = ∪_{c∈ℕ} SIZE(n^c).                                          (13.5)

There is a big difference between non uniform computation and uniform complexity classes such as TIME(T(n)) or P. The condition F ∈ P means that there is a single Turing machine M that computes F on all inputs in polynomial time. The condition F ∈ P/poly only means that for every input length n there can be a different circuit C_n that computes F using polynomially many gates on inputs of these lengths. As we will see, F ∈ P/poly does not necessarily imply that F ∈ P. However, the other direction is true:

Theorem 13.12 — Nonuniform computation contains uniform computation. There is some a ∈ ℕ s.t. for every nice T : ℕ → ℕ and F : {0,1}* → {0,1},

TIME(T(n)) ⊆ SIZE(T(n)^a).                                           (13.6)

In particular, Theorem 13.12 shows that for every c, TIME(n^c) ⊆ SIZE(n^{ca}) and hence P ⊆ P/poly.

Proof Idea:

The idea behind the proof is to "unroll the loop". Specifically, we will use the programming language variants of non-uniform and uniform computation: namely NAND-CIRC and NAND-TM. The main difference between the two is that NAND-TM has loops. However, for every fixed n, if we know that a NAND-TM program runs in at most T(n) steps, then we can replace its loop by simply "copying and pasting" its code T(n) times, similar to how in Python we can replace code such as

for i in range(4):
    print(i)

with the "loop free" code


print(0)
print(1)
print(2)
print(3)

To make this idea into an actual proof we need to tackle one technical difficulty, and this is to ensure that the NAND-TM program is oblivious, in the sense that the value of the index variable i in the j-th iteration of the loop will depend only on j and not on the contents of the input. We make a digression to do just that in Section 13.6.1 and then complete the proof of Theorem 13.12.
⋆

13.6.1 Oblivious NAND-TM programs

Our approach for proving Theorem 13.12 involves "unrolling the loop". For example, consider the following NAND-TM program to compute the XOR function on inputs of arbitrary length:

temp_0 = NAND(X[0],X[0])
Y_nonblank[0] = NAND(X[0],temp_0)
temp_2 = NAND(X[i],Y[0])
temp_3 = NAND(X[i],temp_2)
temp_4 = NAND(Y[0],temp_2)
Y[0] = NAND(temp_3,temp_4)
MODANDJUMP(X_nonblank[i],X_nonblank[i])

Setting (as an example) n = 3, we can attempt to translate this NAND-TM program into a NAND-CIRC program for computing XOR_3 : {0,1}^3 → {0,1} by simply "copying and pasting" the loop three times (dropping the MODANDJMP line):

temp_0 = NAND(X[0],X[0])
Y_nonblank[0] = NAND(X[0],temp_0)
temp_2 = NAND(X[i],Y[0])
temp_3 = NAND(X[i],temp_2)
temp_4 = NAND(Y[0],temp_2)
Y[0] = NAND(temp_3,temp_4)
temp_0 = NAND(X[0],X[0])
Y_nonblank[0] = NAND(X[0],temp_0)
temp_2 = NAND(X[i],Y[0])
temp_3 = NAND(X[i],temp_2)
temp_4 = NAND(Y[0],temp_2)
Y[0] = NAND(temp_3,temp_4)
temp_0 = NAND(X[0],X[0])
Y_nonblank[0] = NAND(X[0],temp_0)
temp_2 = NAND(X[i],Y[0])
temp_3 = NAND(X[i],temp_2)
temp_4 = NAND(Y[0],temp_2)
Y[0] = NAND(temp_3,temp_4)

Figure 13.10: A NAND circuit for XOR_3 obtained by "unrolling the loop" of the NAND-TM program for computing XOR three times.

However, the above is still not a valid NAND-CIRC program since it contains references to the special variable i. To make it into a valid NAND-CIRC program, we replace references to i in the first iteration with 0, references in the second iteration with 1, and references in the third iteration with 2. (We also create a variable zero and use it for the first time any variable is instantiated, as well as remove assignments to non-output variables that are never used later on.) The resulting program is a standard "loop free and index free" NAND-CIRC program that computes XOR_3 (see also Fig. 13.10):

temp_0 = NAND(X[0],X[0])
one = NAND(X[0],temp_0)
zero = NAND(one,one)
temp_2 = NAND(X[0],zero)
temp_3 = NAND(X[0],temp_2)
temp_4 = NAND(zero,temp_2)
Y[0] = NAND(temp_3,temp_4)
temp_2 = NAND(X[1],Y[0])
temp_3 = NAND(X[1],temp_2)
temp_4 = NAND(Y[0],temp_2)
Y[0] = NAND(temp_3,temp_4)
temp_2 = NAND(X[2],Y[0])
temp_3 = NAND(X[2],temp_2)
temp_4 = NAND(Y[0],temp_2)
Y[0] = NAND(temp_3,temp_4)
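We can sanity-check this unrolled program by transcribing it line-by-line into Python, with NAND(a, b) = 1 − a·b, and comparing against the parity of the three input bits:

```python
def NAND(a, b):
    return 1 - (a & b)

def xor3(x0, x1, x2):
    # Line-by-line transcription of the NAND-CIRC program above.
    temp_0 = NAND(x0, x0)
    one = NAND(x0, temp_0)          # NAND(x, NOT x) is always 1
    zero = NAND(one, one)
    temp_2 = NAND(x0, zero)
    temp_3 = NAND(x0, temp_2)
    temp_4 = NAND(zero, temp_2)
    y = NAND(temp_3, temp_4)        # y = x0
    temp_2 = NAND(x1, y)
    temp_3 = NAND(x1, temp_2)
    temp_4 = NAND(y, temp_2)
    y = NAND(temp_3, temp_4)        # y = x0 XOR x1
    temp_2 = NAND(x2, y)
    temp_3 = NAND(x2, temp_2)
    temp_4 = NAND(y, temp_2)
    y = NAND(temp_3, temp_4)        # y = x0 XOR x1 XOR x2
    return y

# agrees with parity on all 8 inputs
assert all(xor3(a, b, c) == (a + b + c) % 2
           for a in (0, 1) for b in (0, 1) for c in (0, 1))
```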

Key to this transformation was the fact that in our original NAND-TM program for XOR, regardless of whether the input is 011, 100, or any other string, the index variable i is guaranteed to equal 0 in the first iteration, 1 in the second iteration, 2 in the third iteration, and so on and so forth. The particular sequence 0, 1, 2, … is immaterial: the crucial property is that the NAND-TM program for XOR is oblivious in the sense that the value of the index i in the j-th iteration depends only on j and does not depend on the particular choice of the input. Luckily, it is possible to transform every NAND-TM program into a functionally equivalent oblivious program with at most quadratic overhead. (Similarly we can transform any Turing machine into a functionally equivalent oblivious Turing machine, see Exercise 13.6.)


Figure 13.11: We simulate a T(n)-time NAND-TM program P′ with an oblivious NAND-TM program P by adding special arrays Atstart and Atend to mark positions 0 and T − 1 respectively. The program P will simply "sweep" its arrays from right to left and back again. If the original program P′ would have moved i in a different direction then we wait O(T) steps until we reach the same point back again, and so P runs in O(T(n)²) time.

Theorem 13.13 — Making NAND-TM oblivious. Let T : ℕ → ℕ be a nice function and let F ∈ TIME_TM(T(n)). Then there is a NAND-TM program P that computes F in O(T(n)²) steps and satisfies the following: for every n ∈ ℕ there is a sequence i_0, i_1, …, i_{m−1} such that for every x ∈ {0,1}^n, if P is executed on input x then in the j-th iteration the variable i is equal to i_j.

In other words, Theorem 13.13 implies that if we can compute F in T(n) steps, then we can compute it in O(T(n)²) steps with a program P in which the position of i in the j-th iteration depends only on j

and the length of the input, and not on the contents of the input. Such a program can be easily translated into a NAND-CIRC program of O(T(n)²) lines by "unrolling the loop".

Proof Idea:

We can translate any NAND-TM program P′ into an oblivious program P by making P "sweep" its arrays. That is, the index i in P will always move all the way from position 0 to position T(n) − 1 and back again. We can then simulate the program P′ with at most T(n) overhead: if P′ wants to move i left when we are in a rightward sweep, then we simply wait the at most 2T(n) steps until the next time we are back in the same position while sweeping to the left.
⋆

Proof of Theorem 13.13. Let P′ be a NAND-TM program computing F in T(n) steps. We construct an oblivious NAND-TM program P for computing F as follows (see also Fig. 13.11).

1. On input x, P will compute T = T(|x|) and set up arrays Atstart and Atend satisfying Atstart[0] = 1 and Atstart[i] = 0 for i > 0, and Atend[T − 1] = 1 and Atend[i] = 0 for all i ≠ T − 1. We can do this because T is a nice function. Note that since this computation does not depend on x but only on its length, it is oblivious.

2. P will also have a special array Marker initialized to all zeroes.

3. The index variable of P will change direction of movement to the right whenever Atstart[i] = 1 and to the left whenever Atend[i] = 1.

4. The program P simulates the execution of P′. However, if the MODANDJMP instruction in P′ attempts to move to the right when P is moving left (or vice versa), then P will set Marker[i] to 1 and enter into a special "waiting mode". In this mode P will wait until the next time in which Marker[i] = 1 (at the next sweep), at which point P zeroes Marker[i] and continues with the simulation. In


Figure 13.12: The function UNROLL takes as input a Turing Machine M, an input length parameter n, a step budget parameter T, and outputs a circuit C of size poly(T) that takes n bits of input and outputs M(x) if M halts on x within at most T steps.

the worst case this will take 2T(n) steps (if P has to go all the way from one end to the other and back again).

5. We also modify P to ensure it ends the computation after simulating exactly T(n) steps of P′, adding "dummy steps" if P′ ends early.

We see that P simulates the execution of P′ with an overhead of O(T(n)) steps of P per one step of P′, hence completing the proof.
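The index sequence of the sweeping program is input-independent by construction. A small sketch of that sequence (purely illustrative; the book's program realizes it via the Atstart/Atend arrays):

```python
def sweep_indices(T, num_steps):
    """First num_steps values of the oblivious index i: it sweeps
    0, 1, ..., T-1 and back again, repeating, independent of the input."""
    period = list(range(T)) + list(range(T - 2, 0, -1))
    return [period[j % len(period)] for j in range(num_steps)]

sweep_indices(4, 8)  # [0, 1, 2, 3, 2, 1, 0, 1]
```

Because the j-th entry depends only on j and T, substituting these values for i in each copied iteration is exactly what makes the "unrolling" of the next section possible.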

Theorem 13.13 implies Theorem 13.12. Indeed, if P is a k-line oblivious NAND-TM program computing F in time T(n), then for every n we can obtain a NAND-CIRC program of (k − 1) ⋅ T(n) lines by simply making T(n) copies of P (dropping the final MODANDJMP line). In the j-th copy we replace all references of the form Foo[i] with foo_{i_j}, where i_j is the value of i in the j-th iteration.

13.6.2 "Unrolling the loop": algorithmic transformation of Turing Machines to circuits

The proof of Theorem 13.12 is algorithmic, in the sense that the proof yields a polynomial-time algorithm that, given a Turing Machine M and parameters T and n, produces a circuit of O(T²) gates that agrees with M on all inputs x ∈ {0,1}^n (as long as M runs for less than T steps on these inputs). We record this fact in the following theorem, since it will be useful for us later on:

Theorem 13.14 — Turing-machine to circuit compiler. There is an algorithm UNROLL such that for every Turing Machine M and numbers n, T, UNROLL(M, 1^T, 1^n) runs for poly(|M|, T, n) steps and outputs a NAND circuit C with n inputs, O(T²) gates, and one output, such that

C(x) = { y   if M halts in ≤ T steps on x and outputs y
       { 0   otherwise.                                              (13.7)

Proof. We only sketch the proof since it follows by directly translating the proof of Theorem 13.12 into an algorithm, together with the simulation of Turing machines by NAND-TM programs (see also Fig. 13.13). Specifically, UNROLL does the following:

1. Transform the Turing Machine M into an equivalent NAND-TM program P.

2. Transform the NAND-TM program P into an equivalent oblivious program P′ following the proof of Theorem 13.13. The program P′ takes T′ = O(T²) steps to simulate T steps of P.


3. "Unroll the loop" of P′ by obtaining a NAND-CIRC program of O(T′) lines (or equivalently a NAND circuit with O(T²) gates) corresponding to the execution of T′ iterations of P′.

Figure 13.13: We can transform a Turing Machine M, input length parameter n, and time bound T into an O(T²)-sized NAND circuit that agrees with M on all inputs x ∈ {0,1}^n on which M halts in at most T steps. The transformation is obtained by first using the equivalence of Turing Machines and NAND-TM programs P, then turning P into an equivalent oblivious NAND-TM program P′ via Theorem 13.13, then "unrolling" O(T²) iterations of the loop of P′ to obtain an O(T²)-line NAND-CIRC program that agrees with P′ on length-n inputs, and finally translating this program into an equivalent circuit.

Big Idea 20 By "unrolling the loop" we can transform an algorithm that takes T(n) steps to compute F into a circuit that uses poly(T(n)) gates to compute the restriction of F to {0,1}^n.

Reviewing the transformations described in Fig. 13.13, as well as solving the following two exercises, is a great way to get more comfort with non-uniform complexity, and in particular with P/poly and its relation to P.

Solved Exercise 13.4 — Alternative characterization of P. Prove that for every F : {0,1}* → {0,1}, F ∈ P if and only if there is a polynomial-time Turing Machine M such that for every n ∈ ℕ, M(1^n) outputs a description of an n-input circuit C_n that computes the restriction F_n of F to inputs in {0,1}^n.

Solution:

We start with the "if" direction. Suppose that there is a polynomial-time Turing Machine M that on input 1^n outputs a circuit C_n that


computes ๐น๐‘›. Then the following is a polynomial-time TuringMachine ๐‘€ โ€ฒ to compute ๐น . On input ๐‘ฅ โˆˆ 0, 1โˆ—, ๐‘€ โ€ฒ will:

1. Let ๐‘› = |๐‘ฅ| and compute ๐ถ๐‘› = ๐‘€(1๐‘›).2. Return the evaluation of ๐ถ๐‘› on ๐‘ฅ.

Since we can evaluate a Boolean circuit on an input in polynomial time, M′ runs in polynomial time and computes F(x) on every input x.

For the "only if" direction, if M′ is a Turing Machine that computes F in polynomial time, then (applying the equivalence of Turing Machines and NAND-TM as well as Theorem 13.13) there is also an oblivious NAND-TM program P that computes F in time p(n) for some polynomial p. We can now define M to be the Turing Machine that on input 1^n outputs the NAND circuit obtained by "unrolling the loop" of P for p(n) iterations. The resulting NAND circuit computes F_n and has O(p(n)) gates. It can also be transformed to a Boolean circuit with O(p(n)) AND/OR/NOT gates.
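The "if" direction's machine M′ is a two-step wrapper around the circuit generator. In this Python sketch, circuits are represented simply as Python callables, a hypothetical stand-in for actual circuit descriptions, so "evaluating C_n on x" is just a function call:

```python
def make_F(M):
    """Given M with M("1"*n) -> circuit for F_n, return an algorithm for F."""
    def F(x):
        C_n = M("1" * len(x))   # step 1: generate the circuit for length |x|
        return C_n(x)           # step 2: evaluate it on x
    return F

# toy generator: for every n, output the n-bit parity circuit
M = lambda unary: (lambda x: sum(x) % 2)
F = make_F(M)
F([1, 0, 1, 1])  # 1
```

The wrapper is a single uniform algorithm precisely because the generator M is: the same code handles every input length.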

Solved Exercise 13.5 — P/poly characterization by advice. Let F : {0,1}* → {0,1}. Then F ∈ P/poly if and only if there exists a polynomial p : ℕ → ℕ, a polynomial-time Turing Machine M, and a sequence {a_n}_{n∈ℕ} of strings, such that for every n ∈ ℕ:

โ€ข |๐‘Ž๐‘›| โ‰ค ๐‘(๐‘›)โ€ข For every ๐‘ฅ โˆˆ 0, 1๐‘›, ๐‘€(๐‘Ž๐‘›, ๐‘ฅ) = ๐น(๐‘ฅ).

Solution:

We only sketch the proof. For the "only if" direction, if F ∈ P/poly then we can use for a_n simply the description of the corresponding circuit C_n, and for M the program that computes in polynomial time the evaluation of a circuit on its input.

For the "if" direction, we can use the same "unrolling the loop" technique of Theorem 13.12 to show that if P is a polynomial-time NAND-TM program, then for every n ∈ ℕ, the map x ↦ P(a_n, x) can be computed by a polynomial size NAND-CIRC program Q_n.

13.6.3 Can uniform algorithms simulate non uniform ones?

Theorem 13.12 shows that every function in TIME(T(n)) is in SIZE(poly(T(n))). One can ask if there is an inverse relation. Suppose that F is such that F_n has a "short" NAND-CIRC program for every n. Can we say that it must be in TIME(T(n)) for some "small" T? The


answer is an emphatic no. Not only is P/poly not contained in P; in fact, P/poly contains functions that are uncomputable!

Theorem 13.15 — P/poly contains uncomputable functions. There exists an uncomputable function F : {0,1}* → {0,1} such that F ∈ P/poly.

Proof Idea:

Since P/poly corresponds to non uniform computation, a function F is in P/poly if for every n ∈ ℕ, the restriction F_n to inputs of length n has a small circuit/program, even if the circuits for different values of n are completely different from one another. In particular, if F has the property that for every pair of equal-length inputs x and x′, F(x) = F(x′), then this means that F_n is either the constant function zero or the constant function one for every n ∈ ℕ. Since the constant function has a (very!) small circuit, such a function F will always be in P/poly (indeed even in smaller classes). Yet by a reduction from the Halting problem, we can obtain a function with this property that is uncomputable.
⋆

Proof of Theorem 13.15. Consider the following "unary halting function" UH : {0,1}* → {0,1} defined as follows. We let S : ℕ → {0,1}* be the function that on input n ∈ ℕ outputs the string that corresponds to the binary representation of the number n without the most significant 1 digit. Note that S is onto. For every x ∈ {0,1}*, we define UH(x) = HALTONZERO(S(|x|)). That is, if n is the length of x, then UH(x) = 1 if and only if the string S(n) encodes a NAND-TM program that halts on the input 0.

UH is uncomputable, since otherwise we could compute HALTONZERO by transforming the input program P into the integer n such that P = S(n) and then running UH(1^n) (i.e., UH on the string of n ones). On the other hand, for every n, UH_n(x) is either equal to 0 for all inputs x or equal to 1 for all inputs x, and hence can be computed by a NAND-CIRC program with a constant number of lines.
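The map S used in this proof has a one-line Python rendering (using Python's bin, whose output always starts with "0b1" for n ≥ 1), together with its inverse, which witnesses that S is onto:

```python
def S(n):
    """Binary representation of n with the most significant 1 dropped;
    e.g. S(1) = "", S(2) = "0", S(6) = "10"."""
    return bin(n)[3:]  # strip the leading "0b1"

def S_inverse(s):
    """For every string s, S(S_inverse(s)) == s, so S is onto {0,1}*."""
    return int("1" + s, 2)

S(6), S_inverse("10")  # ("10", 6)
```

The inverse is exactly the transformation used in the uncomputability argument: given a program P, prepend a 1 to its encoding and read the result as an integer n with P = S(n).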

The issue here is of course uniformity. For a function F : {0,1}* → {0,1}, if F is in TIME(T(n)) then we have a single algorithm that can compute F_n for every n. On the other hand, F_n might be in SIZE(T(n)) for every n using a completely different algorithm for every input length. For this reason we typically use P/poly not as a model of efficient computation but rather as a way to model inefficient computation. For example, in cryptography people often define an encryption scheme to be secure if breaking it for a key of length n requires


more than a polynomial number of NAND lines. Since P ⊆ P/poly, this in particular precludes a polynomial time algorithm for doing so, but there are technical reasons why working in a non uniform model makes more sense in cryptography. It also allows us to talk about security in non-asymptotic terms, such as a scheme having "128 bits of security".

While it can sometimes be a real issue, in many natural settings the difference between uniform and non-uniform computation does not seem so important. In particular, all the examples of problems discussed before that are not known to be in P (longest path, 3SAT, factoring, etc.) are also not known to be in P/poly either. Thus, for "natural" functions, if you pretend that TIME(T(n)) is roughly the same as SIZE(T(n)), you will be right more often than wrong.

Figure 13.14: Relations between P, EXP, and P/poly. It is known that P ⊆ EXP, P ⊆ P/poly, and that P/poly contains uncomputable functions (which in particular are outside of EXP). It is not known whether or not EXP ⊆ P/poly, though it is believed that EXP ⊈ P/poly.

13.6.4 Uniform vs. Nonuniform computation: A recap
To summarize, the two models of computation we have described so far are:

• Uniform models: Turing machines, NAND-TM programs, RAM machines, NAND-RAM programs, C/JavaScript/Python, etc. These models include loops and unbounded memory, hence a single program can compute a function with unbounded input length.

• Non-uniform models: Boolean circuits or straight-line programs have no loops and can only compute finite functions. The time to execute them is exactly the number of lines or gates they contain.

For a function 𝐹 ∶ {0, 1}∗ → {0, 1} and some nice time bound 𝑇 ∶ ℕ → ℕ, we know that:

โ€ข If ๐น is uniformly computable in time ๐‘‡ (๐‘›) then there is a sequenceof circuits ๐ถ1, ๐ถ2, โ€ฆ where ๐ถ๐‘› has ๐‘๐‘œ๐‘™๐‘ฆ(๐‘‡ (๐‘›)) gates and computes๐น๐‘› (i.e., restriction of ๐น to 0, 1๐‘›) for every ๐‘›.

โ€ข The reverse direction is not necessarily true - there are examples offunctions ๐น โˆถ 0, 1๐‘› โ†’ 0, 1 such that ๐น๐‘› can be computed byeven a constant size circuit but ๐น is uncomputable.


This means that non-uniform complexity is more useful for establishing the hardness of a function than for establishing its easiness.

Chapter Recap

โ€ข We can define the time complexity of a functionusing NAND-TM programs, and similarly to thenotion of computability, this appears to capture theinherent complexity of the function.

โ€ข There are many natural problems that havepolynomial-time algorithms, and other naturalproblems that weโ€™d love to solve, but for which thebest known algorithms are exponential.

โ€ข The definition of polynomial time, and hence theclass P, is robust to the choice of model, whetherit is Turing machines, NAND-TM, NAND-RAM,modern programming languages, and many othermodels.

โ€ข The time hierarchy theorem shows that there aresome problems that can be solved in exponential,but not in polynomial time. However, we do notknow if that is the case for the natural examplesthat we described in this lecture.

โ€ข By โ€œunrolling the loopโ€ we can show that everyfunction computable in time ๐‘‡ (๐‘›) can be computedby a sequence of NAND-CIRC programs (one forevery input length) each of size at most ๐‘๐‘œ๐‘™๐‘ฆ(๐‘‡ (๐‘›))

13.7 EXERCISES

Exercise 13.1 โ€” Equivalence of different definitions of P and EXP.. Provethat the classes P and EXP defined in Definition 13.2 are equal toโˆช๐‘โˆˆ1,2,3,โ€ฆTIME(๐‘›๐‘) and โˆช๐‘โˆˆ1,2,3,โ€ฆTIME(2๐‘›๐‘) respectively. (If๐‘†1, ๐‘†2, ๐‘†3, โ€ฆ is a collection of sets then the set ๐‘† = โˆช๐‘โˆˆ1,2,3,โ€ฆ๐‘†๐‘ isthe set of all elements ๐‘’ such that there exists some ๐‘ โˆˆ 1, 2, 3, โ€ฆwhere ๐‘’ โˆˆ ๐‘†๐‘.)

Exercise 13.2 โ€” Robustness to representation. Theorem 13.5 shows that theclasses P and EXP are robust with respect to variations in the choiceof the computational model. This exercise shows that these classesare also robust with respect to our choice of the representation of theinput.

Specifically, let ๐น be a function mapping graphs to 0, 1, and let๐น โ€ฒ, ๐น โ€ณ โˆถ 0, 1โˆ— โ†’ 0, 1 be the functions defined as follows. For every๐‘ฅ โˆˆ 0, 1โˆ—:


1 Hint: This is the Turing machine analog of Theorem 13.13. We replace one step of the original TM 𝑀′ computing 𝐹 with a "sweep" of the oblivious TM 𝑀 in which it goes 𝑇 steps to the right and then 𝑇 steps to the left.

โ€ข ๐น โ€ฒ(๐‘ฅ) = 1 iff ๐‘ฅ represents a graph ๐บ via the adjacency matrixrepresentation such that ๐น(๐บ) = 1.

โ€ข ๐น โ€ณ(๐‘ฅ) = 1 iff ๐‘ฅ represents a graph ๐บ via the adjacency list represen-tation such that ๐น(๐บ) = 1.Prove that ๐น โ€ฒ โˆˆ P iff ๐น โ€ณ โˆˆ P.More generally, for every function ๐น โˆถ 0, 1โˆ— โ†’ 0, 1, the answer

to the question of whether ๐น โˆˆ P (or whether ๐น โˆˆ EXP) is unchangedby switching representations, as long as transforming one represen-tation to the other can be done in polynomial time (which essentiallyholds for all reasonable representations).

Exercise 13.3 โ€” Boolean functions. For every function ๐น โˆถ 0, 1โˆ— โ†’ 0, 1โˆ—,define ๐ต๐‘œ๐‘œ๐‘™(๐น) to be the function mapping 0, 1โˆ— to 0, 1 such thaton input a (string representation of a) triple (๐‘ฅ, ๐‘–, ๐œŽ) with ๐‘ฅ โˆˆ 0, 1โˆ—,๐‘– โˆˆ โ„• and ๐œŽ โˆˆ 0, 1,

๐ต๐‘œ๐‘œ๐‘™(๐น)(๐‘ฅ, ๐‘–, ๐œŽ) = โŽงโŽจโŽฉ๐น(๐‘ฅ)๐‘– ๐œŽ = 0, ๐‘– < |๐น(๐‘ฅ)|1 ๐œŽ = 1, ๐‘– < |๐น(๐‘ฅ)|0 otherwise

(13.8)

where ๐น(๐‘ฅ)๐‘– is the ๐‘–-th bit of the string ๐น(๐‘ฅ).Prove that ๐น โˆˆ P if and only if ๐ต๐‘œ๐‘œ๐‘™(๐น) โˆˆ P.

Exercise 13.4 โ€” Composition of polynomial time. Prove that if๐น, ๐บ โˆถ 0, 1โˆ— โ†’ 0, 1โˆ— are in P then their composition ๐น โˆ˜ ๐บ,which is the function ๐ป s.t. ๐ป(๐‘ฅ) = ๐น(๐บ(๐‘ฅ)), is also in P.

Exercise 13.5 โ€” Non composition of exponential time. Prove that there is some๐น, ๐บ โˆถ 0, 1โˆ— โ†’ 0, 1โˆ— s.t. ๐น, ๐บ โˆˆ EXP but ๐น โˆ˜ ๐บ is not in EXP.

Exercise 13.6 โ€” Oblivious Turing Machines. We say that a Turing machine ๐‘€is oblivious if there is some function ๐‘‡ โˆถ โ„• ร— โ„• โ†’ โ„ค such that for everyinput ๐‘ฅ of length ๐‘›, and ๐‘ก โˆˆ โ„• it holds that:

โ€ข If ๐‘€ takes more than ๐‘ก steps to halt on the input ๐‘ฅ, then in the ๐‘ก-th step ๐‘€ โ€™s head will be in the position ๐‘‡ (๐‘›, ๐‘ก). (Note that thisposition depends only on the length of ๐‘ฅ and not its contents.)

โ€ข If ๐‘€ halts before the ๐‘ก-th step then ๐‘‡ (๐‘›, ๐‘ก) = โˆ’1.Prove that if ๐น โˆˆ P then there exists an oblivious Turing machine ๐‘€

that computes ๐น in polynomial time. See footnote for hint.1


Exercise 13.7 Let EDGE ∶ {0, 1}∗ → {0, 1} be the function such that on input a string representing a triple (𝐿, 𝑖, 𝑗), where 𝐿 is the adjacency list representation of an 𝑛-vertex graph 𝐺, and 𝑖 and 𝑗 are numbers in [𝑛], EDGE(𝐿, 𝑖, 𝑗) = 1 if the edge {𝑖, 𝑗} is present in the graph. EDGE outputs 0 on all other inputs.

1. Prove that EDGE โˆˆ P.

2. Let PLANARMATRIX ∶ {0, 1}∗ → {0, 1} be the function that on input an adjacency matrix 𝐴 outputs 1 if and only if the graph represented by 𝐴 is planar (that is, can be drawn on the plane without edges crossing one another). For this question, you can use without proof the fact that PLANARMATRIX ∈ P. Prove that PLANARLIST ∈ P, where PLANARLIST ∶ {0, 1}∗ → {0, 1} is the function that on input an adjacency list 𝐿 outputs 1 if and only if 𝐿 represents a planar graph.

Exercise 13.8 โ€” Evaluate NAND circuits. Let NANDEVAL โˆถ 0, 1โˆ— โ†’0, 1 be the function such that for every string representing a pair(๐‘„, ๐‘ฅ) where ๐‘„ is an ๐‘›-input 1-output NAND-CIRC (not NAND-TM!)program and ๐‘ฅ โˆˆ 0, 1๐‘›, NANDEVAL(๐‘„, ๐‘ฅ) = ๐‘„(๐‘ฅ). On all otherinputs NANDEVAL outputs 0.

Prove that NANDEVAL โˆˆ P.

Exercise 13.9 โ€” Find hard function. Let NANDHARD โˆถ 0, 1โˆ— โ†’ 0, 1be the function such that on input a string representing a pair (๐‘“, ๐‘ )where

โ€ข ๐‘“ โˆˆ 0, 12๐‘› for some ๐‘› โˆˆ โ„•โ€ข ๐‘  โˆˆ โ„•

NANDHARD(๐‘“, ๐‘ ) = 1 if there is no NAND-CIRC program ๐‘„of at most ๐‘  lines that computes the function ๐น โˆถ 0, 1๐‘› โ†’ 0, 1whose truth table is the string ๐‘“ . That is, NANDHARD(๐‘“, ๐‘ ) = 1if for every NAND-CIRC program ๐‘„ of at most ๐‘  lines, there existssome ๐‘ฅ โˆˆ 0, 1๐‘› such that ๐‘„(๐‘ฅ) โ‰  ๐‘“๐‘ฅ where ๐‘“๐‘ฅ denote the ๐‘ฅ-thecoordinate of ๐‘“ , using the binary representation to identify 0, 1๐‘›with the numbers 0, โ€ฆ , 2๐‘› โˆ’ 1.1. Prove that NANDHARD โˆˆ EXP.

2. (Challenge) Prove that there is an algorithm FINDHARD such that if 𝑛 is sufficiently large, then FINDHARD(1^𝑛) runs in time 2^{2^{𝑂(𝑛)}} and outputs a string 𝑓 ∈ {0, 1}^{2^𝑛} that is the truth table of a function that is not contained in SIZE(2^𝑛/(1000𝑛)). (In other words, if 𝑓 is the string output by FINDHARD(1^𝑛) and we let 𝐹 ∶ {0, 1}^𝑛 → {0, 1} be the function such that 𝐹(𝑥) outputs the 𝑥-th coordinate of 𝑓, then 𝐹 ∉ SIZE(2^𝑛/(1000𝑛)).)²

2 Hint: Use Item 1, the existence of functions requiring exponentially hard NAND programs, and the fact that there are only finitely many functions mapping {0, 1}^𝑛 to {0, 1}.

Exercise 13.10 Suppose that you are in charge of scheduling courses in computer science in University X. In University X, computer science students wake up late, and have to work on their startups in the afternoon, and take long weekends with their investors. So you only have two possible slots: you can schedule a course either Monday-Wednesday 11am-1pm or Tuesday-Thursday 11am-1pm.

Let SCHEDULE ∶ {0, 1}∗ → {0, 1} be the function that takes as input a list of courses 𝐿 and a list of conflicts 𝐶 (i.e., a list of pairs of courses that cannot share the same time slot) and outputs 1 if and only if there is a "conflict free" scheduling of the courses in 𝐿, where no pair in 𝐶 is scheduled in the same time slot.

More precisely, the list 𝐿 is a list of strings (𝑐_0, … , 𝑐_{𝑛−1}) and the list 𝐶 is a list of pairs of the form (𝑐_𝑖, 𝑐_𝑗). SCHEDULE(𝐿, 𝐶) = 1 if and only if there exists a partition of 𝑐_0, … , 𝑐_{𝑛−1} into two parts so that there is no pair (𝑐_𝑖, 𝑐_𝑗) ∈ 𝐶 such that both 𝑐_𝑖 and 𝑐_𝑗 are in the same part.

Prove that SCHEDULE ∈ P. As usual, you do not have to provide the full code to show that this is the case, and can describe operations at a high level, as well as appeal to any data structures or other results mentioned in the book or in lecture. Note that to show that a function 𝐹 is in P you need to both (1) present an algorithm 𝐴 that computes 𝐹 in polynomial time, and (2) prove that 𝐴 does indeed run in polynomial time, and does indeed compute the correct answer.

Try to think whether or not your algorithm extends to the case where there are three possible time slots.

13.8 BIBLIOGRAPHICAL NOTES

Because we are interested in the maximum number of steps for inputs of a given length, running time as we defined it is often known as worst case complexity. The minimum number of steps (or "best case" complexity) to compute a function on length-𝑛 inputs is typically not a meaningful quantity since essentially every natural problem will have some trivially easy instances. However, the average case complexity (i.e., complexity on a "typical" or "random" input) is an interesting concept which we'll return to when we discuss cryptography. That said, worst-case complexity is the most standard and basic of the complexity measures, and will be our focus in most of this book.

Some lower bounds for single-tape Turing machines are given in[Maa85].


For defining efficiency in the 𝜆 calculus, one needs to be careful about the order of application of the reduction steps, which can matter for computational efficiency; see for example this paper.

The notation P/poly is used for historical reasons. It was introduced by Karp and Lipton, who considered this class as corresponding to functions that can be computed by polynomial-time Turing machines that are given for any input length 𝑛 an advice string of length polynomial in 𝑛.


14 Polynomial-time reductions

Consider some of the problems we have encountered in Chapter 12:

1. The 3SAT problem: deciding whether a given 3CNF formula has a satisfying assignment.

2. Finding the longest path in a graph.

3. Finding the maximum cut in a graph.

4. Solving quadratic equations over 𝑛 variables 𝑥_0, … , 𝑥_{𝑛−1} ∈ ℝ.

All of these problems have the following properties:

• These are important problems, and people have spent significant effort on trying to find better algorithms for them.

• Each one of these is a search problem, whereby we search for a solution that is "good" in some easy to define sense (e.g., a long path, a satisfying assignment, etc.).

• Each of these problems has a trivial exponential-time algorithm that involves enumerating all possible solutions.

• At the moment, for all these problems the best known algorithm is not much faster than the trivial one in the worst case.

In this chapter and in Chapter 15 we will see that, despite their apparent differences, we can relate the computational complexity of these and many other problems. In fact, it turns out that the problems above are computationally equivalent, in the sense that solving one of them immediately implies solving the others. This phenomenon, known as NP completeness, is one of the surprising discoveries of theoretical computer science, and we will see that it has far-reaching ramifications.


Learning Objectives:
• Introduce the notion of polynomial-time reductions as a way to relate the complexity of problems to one another.
• See several examples of such reductions.
• 3SAT as a basic starting point for reductions.


This chapter: A non-mathy overview
This chapter introduces the concept of a polynomial-time reduction, which is a central object in computational complexity and this book in particular. A polynomial-time reduction is a way to reduce the task of solving one problem to another. The way we use reductions in complexity is to argue that if the first problem is hard to solve efficiently, then the second must also be hard. We see several examples of such reductions in this chapter, and reductions will be the basis for the theory of NP completeness that we will develop in Chapter 15.

Figure 14.1: In this chapter we show that if the 3SAT problem cannot be solved in polynomial time, then neither can the QUADEQ, LONGESTPATH, ISET and MAXCUT problems. We do this by using the reduction paradigm, showing for example that "if pigs could whistle" (i.e., if we had an efficient algorithm for QUADEQ) then "horses could fly" (i.e., we would have an efficient algorithm for 3SAT).

In this chapter we will see that for each one of the problems of finding a longest path in a graph, solving quadratic equations, and finding the maximum cut, if there exists a polynomial-time algorithm for this problem then there exists a polynomial-time algorithm for the 3SAT problem as well. In other words, we will reduce the task of solving 3SAT to each one of the above tasks. Another way to interpret these results is that if there does not exist a polynomial-time algorithm for 3SAT, then there does not exist a polynomial-time algorithm for these other problems either. In Chapter 15 we will see evidence (though no proof!) that all of the above problems do not have polynomial-time algorithms and hence are inherently intractable.

14.1 FORMAL DEFINITIONS OF PROBLEMS

For reasons of technical convenience rather than anything substantial, we concern ourselves with decision problems (i.e., Yes/No questions) or in other words Boolean (i.e., one-bit output) functions. We model the problems above as functions mapping {0, 1}∗ to {0, 1} in the following way:

3SAT. The 3SAT problem can be phrased as the function 3SAT ∶ {0, 1}∗ → {0, 1} that takes as input a 3CNF formula 𝜑 (i.e., a formula of the form 𝐶_0 ∧ ⋯ ∧ 𝐶_{𝑚−1} where each 𝐶_𝑖 is the OR of three variables or their negation) and maps 𝜑 to 1 if there exists some assignment to the variables of 𝜑 that causes it to evaluate to true, and to 0 otherwise. For example,

3SAT("(𝑥_0 ∨ 𝑥_1 ∨ 𝑥_2) ∧ (𝑥_1 ∨ 𝑥_2 ∨ 𝑥_3) ∧ (𝑥_0 ∨ 𝑥_2 ∨ 𝑥_3)") = 1   (14.1)

since the assignment 𝑥 = 1101 satisfies the input formula. In the above we assume some representation of formulas as strings, and define the function to output 0 if its input is not a valid representation; we use the same convention for all the other functions below.
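To make this convention concrete, here is a small sketch (our own illustration, not the book's code) of evaluating a 3CNF formula at an assignment; the encoding of a literal as +𝑖 for 𝑥_{𝑖−1} and −𝑖 for its negation is an assumption of this sketch.

```python
# Sketch: evaluate a 3CNF formula at an assignment. A clause is a
# list of literals, where the literal +i stands for the variable
# x_{i-1} and -i for its negation (an encoding we chose here).
def eval_3cnf(clauses, x):
    # A formula is satisfied iff every clause has some true literal.
    return all(
        any(x[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
        for clause in clauses
    )

# The formula from (14.1), with x = 1101 as the satisfying assignment.
phi = [[1, 2, 3], [2, 3, 4], [1, 3, 4]]
assert eval_3cnf(phi, [1, 1, 0, 1])
```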

Quadratic equations. The quadratic equations problem corresponds to the function QUADEQ ∶ {0, 1}∗ → {0, 1} that maps a set of quadratic equations 𝐸 to 1 if there is an assignment 𝑥 that satisfies all equations, and to 0 otherwise.

Longest path. The longest path problem corresponds to the function LONGPATH ∶ {0, 1}∗ → {0, 1} that maps a graph 𝐺 and a number 𝑘 to 1 if there is a simple path in 𝐺 of length at least 𝑘, and maps (𝐺, 𝑘) to 0 otherwise. The longest path problem is a generalization of the well-known Hamiltonian Path Problem of determining whether a path of length 𝑛 exists in a given 𝑛-vertex graph.

Maximum cut. The maximum cut problem corresponds to the function MAXCUT ∶ {0, 1}∗ → {0, 1} that maps a graph 𝐺 and a number 𝑘 to 1 if there is a cut in 𝐺 that cuts at least 𝑘 edges, and maps (𝐺, 𝑘) to 0 otherwise.

All of the problems above are in EXP but it is not known whether or not they are in P. However, we will see in this chapter that if either QUADEQ, LONGPATH or MAXCUT are in P, then so is 3SAT.

14.2 POLYNOMIAL-TIME REDUCTIONS

Suppose that ๐น, ๐บ โˆถ 0, 1โˆ— โ†’ 0, 1 are two Boolean functions. Apolynomial-time reduction (or sometimes just โ€œreductionโ€ for short) from๐น to ๐บ is a way to show that ๐น is โ€œno harderโ€ than ๐บ, in the sensethat a polynomial-time algorithm for ๐บ implies a polynomial-timealgorithm for ๐น .


Figure 14.2: If 𝐹 ≤_𝑝 𝐺 then we can transform a polynomial-time algorithm 𝐵 that computes 𝐺 into a polynomial-time algorithm 𝐴 that computes 𝐹. To compute 𝐹(𝑥) we can run the reduction 𝑅 guaranteed by the fact that 𝐹 ≤_𝑝 𝐺 to obtain 𝑦 = 𝑅(𝑥) and then run our algorithm 𝐵 for 𝐺 to compute 𝐺(𝑦).

Definition 14.1 โ€” Polynomial-time reductions. Let ๐น, ๐บ โˆถ 0, 1โˆ— โ†’ 0, 1.We say that ๐น reduces to ๐บ, denoted by ๐น โ‰ค๐‘ ๐บ if there is apolynomial-time computable ๐‘… โˆถ 0, 1โˆ— โ†’ 0, 1โˆ— such thatfor every ๐‘ฅ โˆˆ 0, 1โˆ—, ๐น(๐‘ฅ) = ๐บ(๐‘…(๐‘ฅ)) . (14.2)

We say that ๐น and ๐บ have equivalent complexity if ๐น โ‰ค๐‘ ๐บ and ๐บ โ‰ค๐‘๐น .

The following exercise justifies our intuition that 𝐹 ≤_𝑝 𝐺 signifies that "𝐹 is no harder than 𝐺".

Solved Exercise 14.1 — Reductions and P. Prove that if 𝐹 ≤_𝑝 𝐺 and 𝐺 ∈ P then 𝐹 ∈ P.

As usual, solving this exercise on your own is an excellent way to make sure you understand Definition 14.1.

Solution:

Suppose there was an algorithm 𝐵 that computes 𝐺 in time 𝑝(𝑛), where 𝑝 is a polynomial and 𝑛 is its input size. Then, (14.2) directly gives an algorithm 𝐴 to compute 𝐹 (see Fig. 14.2). Indeed, on input 𝑥 ∈ {0, 1}∗, Algorithm 𝐴 will run the polynomial-time reduction 𝑅 to obtain 𝑦 = 𝑅(𝑥) and then return 𝐵(𝑦). By (14.2), 𝐺(𝑅(𝑥)) = 𝐹(𝑥) and hence Algorithm 𝐴 will indeed compute 𝐹.

We now show that ๐ด runs in polynomial time. By assumption, ๐‘…can be computed in time ๐‘ž(๐‘›) for some polynomial ๐‘ž. In particular,this means that |๐‘ฆ| โ‰ค ๐‘ž(|๐‘ฅ|) (as just writing down ๐‘ฆ takes |๐‘ฆ| steps).Computing ๐ต(๐‘ฆ) will take at most ๐‘(|๐‘ฆ|) โ‰ค ๐‘(๐‘ž(|๐‘ฅ|)) steps. Thusthe total running time of ๐ด on inputs of length ๐‘› is at most the timeto compute ๐‘ฆ, which is bounded by ๐‘ž(๐‘›), and the time to compute๐ต(๐‘ฆ), which is bounded by ๐‘(๐‘ž(๐‘›)), and since the composition oftwo polynomials is a polynomial, ๐ด runs in polynomial time.

Big Idea 21 A reduction 𝐹 ≤_𝑝 𝐺 shows that 𝐹 is "no harder than 𝐺" or, equivalently, that 𝐺 is "no easier than 𝐹".

14.2.1 Whistling pigs and flying horses
A reduction from 𝐹 to 𝐺 can be used for two purposes:


โ€ข If we already know an algorithm for ๐บ and ๐น โ‰ค๐‘ ๐บ then we canuse the reduction to obtain an algorithm for ๐น . This is a widelyused tool in algorithm design. For example in Section 12.1.4 we sawhow the Min-Cut Max-Flow theorem allows to reduce the task ofcomputing a minimum cut in a graph to the task of computing amaximum flow in it.

โ€ข If we have proven (or have evidence) that there exists no polynomial-time algorithm for ๐น and ๐น โ‰ค๐‘ ๐บ then the existence of this reductionallows us to conclude that there exists no polynomial-time algo-rithm for ๐บ. This is the โ€œif pigs could whistle then horses couldflyโ€ interpretation weโ€™ve seen in Section 9.4. We show that if therewas an hypothetical efficient algorithm for ๐บ (a โ€œwhistling pigโ€)then since ๐น โ‰ค๐‘ ๐บ then there would be an efficient algorithm for๐น (a โ€œflying horseโ€). In this book we often use reductions for thissecond purpose, although the lines between the two is sometimesblurry (see the bibliographical notes in Section 14.9).

The most crucial difference between the notion in Definition 14.1 and the reductions we saw in the context of uncomputability (e.g., in Section 9.4) is that for relating the time complexity of problems, we need the reduction to be computable in polynomial time, as opposed to merely computable. Definition 14.1 also restricts reductions to have a very specific format. That is, to show that 𝐹 ≤_𝑝 𝐺, rather than allowing a general algorithm for 𝐹 that uses a "magic box" that computes 𝐺, we only allow an algorithm that computes 𝐹(𝑥) by outputting 𝐺(𝑅(𝑥)). This restricted form is convenient for us, but people have defined and used more general reductions as well (see Section 14.9).

In this chapter we use reductions to relate the computational complexity of the problems mentioned above: 3SAT, Quadratic Equations, Maximum Cut, and Longest Path, as well as a few others. We will reduce 3SAT to the latter problems, demonstrating that solving any one of them efficiently will result in an efficient algorithm for 3SAT. In Chapter 15 we show the other direction: reducing each one of these problems to 3SAT in one fell swoop.

Transitivity of reductions. Since we think of 𝐹 ≤_𝑝 𝐺 as saying that (as far as polynomial-time computation is concerned) 𝐹 is "easier or equal in difficulty to" 𝐺, we would expect that if 𝐹 ≤_𝑝 𝐺 and 𝐺 ≤_𝑝 𝐻, then it would hold that 𝐹 ≤_𝑝 𝐻. Indeed this is the case:

Solved Exercise 14.2 — Transitivity of polynomial-time reductions. For every 𝐹, 𝐺, 𝐻 ∶ {0, 1}∗ → {0, 1}, if 𝐹 ≤_𝑝 𝐺 and 𝐺 ≤_𝑝 𝐻 then 𝐹 ≤_𝑝 𝐻.


1 If you are familiar with matrix notation you may note that such equations can be written as 𝐴𝑥 = 𝐛 where 𝐴 is an 𝑚 × 𝑛 matrix with entries in 0/1 and 𝐛 ∈ ℕ^𝑚.

Solution:

If ๐น โ‰ค๐‘ ๐บ and ๐บ โ‰ค๐‘ ๐ป then there exist polynomial-time com-putable functions ๐‘…1 and ๐‘…2 mapping 0, 1โˆ— to 0, 1โˆ— such thatfor every ๐‘ฅ โˆˆ 0, 1โˆ—, ๐น(๐‘ฅ) = ๐บ(๐‘…1(๐‘ฅ)) and for every ๐‘ฆ โˆˆ 0, 1โˆ—,๐บ(๐‘ฆ) = ๐ป(๐‘…2(๐‘ฆ)). Combining these two equalities, we see thatfor every ๐‘ฅ โˆˆ 0, 1โˆ—, ๐น(๐‘ฅ) = ๐ป(๐‘…2(๐‘…1(๐‘ฅ))) and so to show that๐น โ‰ค๐‘ ๐ป , it is sufficient to show that the map ๐‘ฅ โ†ฆ ๐‘…2(๐‘…1(๐‘ฅ)) iscomputable in polynomial time. But if there are some constants ๐‘, ๐‘‘such that ๐‘…1(๐‘ฅ) is computable in time |๐‘ฅ|๐‘ and ๐‘…2(๐‘ฆ) is computablein time |๐‘ฆ|๐‘‘ then ๐‘…2(๐‘…1(๐‘ฅ)) is computable in time (|๐‘ฅ|๐‘)๐‘‘ = |๐‘ฅ|๐‘๐‘‘which is polynomial.

14.3 REDUCING 3SAT TO ZERO ONE AND QUADRATIC EQUATIONS

We now show our first example of a reduction. The Zero-One Linear Equations problem corresponds to the function 01EQ ∶ {0, 1}∗ → {0, 1} whose input is a collection 𝐸 of linear equations in variables 𝑥_0, … , 𝑥_{𝑛−1}, and the output is 1 iff there is an assignment 𝑥 ∈ {0, 1}^𝑛 of 0/1 values to the variables that satisfies all the equations. For example, if the input 𝐸 is a string encoding the set of equations

𝑥_0 + 𝑥_1 + 𝑥_2 = 2
𝑥_0 + 𝑥_2 = 1
𝑥_1 + 𝑥_2 = 2   (14.3)

then 01EQ(๐ธ) = 1 since the assignment ๐‘ฅ = 011 satisfies all threeequations. We specifically restrict attention to linear equations invariables ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1 in which every equation has the form โˆ‘๐‘–โˆˆ๐‘† ๐‘ฅ๐‘– =๐‘ where ๐‘† โŠ† [๐‘›] and ๐‘ โˆˆ โ„•.1

If we asked the question of whether there is a solution 𝑥 ∈ ℝ^𝑛 of real numbers to 𝐸, then this can be solved using the famous Gaussian elimination algorithm in polynomial time. However, there is no known efficient algorithm to solve 01EQ. Indeed, such an algorithm would imply an algorithm for 3SAT, as shown by the following theorem:

Theorem 14.2 โ€” Hardness of 01๐ธ๐‘„. 3SAT โ‰ค๐‘ 01EQProof Idea:

A constraint ๐‘ฅ2 โˆจ ๐‘ฅ5 โˆจ ๐‘ฅ7 can be written as ๐‘ฅ2 + (1 โˆ’ ๐‘ฅ5) + ๐‘ฅ7 โ‰ฅ 1.This is a linear inequality but since the sum on the left-hand side isat most three, we can also turn it into an equality by adding two newvariables ๐‘ฆ, ๐‘ง and writing it as ๐‘ฅ2 + (1 โˆ’ ๐‘ฅ5) + ๐‘ฅ7 + ๐‘ฆ + ๐‘ง = 3. (We willuse fresh such variables ๐‘ฆ, ๐‘ง for every constraint.) Finally, for everyvariable ๐‘ฅ๐‘– we can add a variable ๐‘ฅโ€ฒ๐‘– corresponding to its negation by

Page 457: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

polynomial-time reductions 457

adding the equation ๐‘ฅ๐‘– + ๐‘ฅโ€ฒ๐‘– = 1, hence mapping the original constraint๐‘ฅ2 โˆจ ๐‘ฅ5 โˆจ ๐‘ฅ7 to ๐‘ฅ2 + ๐‘ฅโ€ฒ5 + ๐‘ฅ7 + ๐‘ฆ + ๐‘ง = 3. The main takeawaytechnique from this reduction is the idea of adding auxiliary variablesto replace an equation such as ๐‘ฅ1 + ๐‘ฅ2 + ๐‘ฅ3 โ‰ฅ 1 that is not quite in theform we want with the equivalent (for 0/1 valued variables) equation๐‘ฅ1 + ๐‘ฅ2 + ๐‘ฅ3 + ๐‘ข + ๐‘ฃ = 3 which is in the form we want.โ‹†

Figure 14.3: Left: Python code implementing the reduction of 3SAT to 01EQ. Right: Example output of the reduction. Code is in our repository.

Proof of Theorem 14.2. To prove the theorem we need to:

1. Describe an algorithm 𝑅 for mapping an input 𝜑 for 3SAT into an input 𝐸 for 01EQ.

2. Prove that the algorithm runs in polynomial time.

3. Prove that 01EQ(๐‘…(๐œ‘)) = 3SAT(๐œ‘) for every 3CNF formula ๐œ‘.

We now proceed to do just that. Since this is our first reduction, we will spell out this proof in detail. However, it straightforwardly follows the proof idea.


Algorithm 14.3 — 3SAT to 01EQ reduction.

Input: 3CNF formula 𝜑 with 𝑛 variables 𝑥_0, … , 𝑥_{𝑛−1} and 𝑚 clauses.
Output: Set 𝐸 of linear equations over 0/1 such that 3SAT(𝜑) = 1 iff 01EQ(𝐸) = 1.

1: Let 𝐸's variables be 𝑥_0, … , 𝑥_{𝑛−1}, 𝑥′_0, … , 𝑥′_{𝑛−1}, 𝑦_0, … , 𝑦_{𝑚−1}, 𝑧_0, … , 𝑧_{𝑚−1}.
2: for 𝑖 ∈ [𝑛] do
3:   Add to 𝐸 the equation 𝑥_𝑖 + 𝑥′_𝑖 = 1.
4: end for
5: for 𝑗 ∈ [𝑚] do
6:   Let the 𝑗-th clause be 𝑤_0 ∨ 𝑤_1 ∨ 𝑤_2 where 𝑤_0, 𝑤_1, 𝑤_2 are literals.
7:   for 𝑎 ∈ [3] do
8:     if 𝑤_𝑎 is a variable 𝑥_𝑖 then
9:       set 𝑡_𝑎 ← 𝑥_𝑖
10:    end if
11:    if 𝑤_𝑎 is a negation ¬𝑥_𝑖 then
12:      set 𝑡_𝑎 ← 𝑥′_𝑖
13:    end if
14:  end for
15:  Add to 𝐸 the equation 𝑡_0 + 𝑡_1 + 𝑡_2 + 𝑦_𝑗 + 𝑧_𝑗 = 3.
16: end for
17: return 𝐸

The reduction is described in Algorithm 14.3; see also Fig. 14.3. If the input formula has 𝑛 variables and 𝑚 clauses, Algorithm 14.3 creates a set 𝐸 of 𝑛 + 𝑚 equations over 2𝑛 + 2𝑚 variables. Algorithm 14.3 makes an initial loop of 𝑛 steps (each taking constant time) and then another loop of 𝑚 steps (each taking constant time) to create the equations, and hence it runs in polynomial time.
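In the spirit of the Python code shown in Fig. 14.3, Algorithm 14.3 can be sketched as follows (our own rendering, not the code from the book's repository; the +𝑖/−𝑖 literal encoding and the equation representation are assumptions of this sketch):

```python
# Sketch of Algorithm 14.3: map a 3CNF formula to zero-one linear
# equations. A clause is a list of literals (+i for x_{i-1}, -i for
# its negation); an equation (vars, b) asserts that the sum of the
# listed variables equals b.
def sat3_to_01eq(clauses, n):
    eqs = []
    for i in range(n):
        eqs.append(([f"x{i}", f"xp{i}"], 1))      # x_i + x'_i = 1
    for j, clause in enumerate(clauses):
        ts = []
        for lit in clause:
            i = abs(lit) - 1
            ts.append(f"x{i}" if lit > 0 else f"xp{i}")
        eqs.append((ts + [f"y{j}", f"z{j}"], 3))  # t_0+t_1+t_2+y_j+z_j = 3
    return eqs
```

On a formula with 𝑛 variables and 𝑚 clauses this produces 𝑛 + 𝑚 equations over 2𝑛 + 2𝑚 variables, matching the analysis above.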

Let ๐‘… be the function computed by Algorithm 14.3. The heart ofthe proof is to show that for every 3CNF ๐œ‘, 01EQ(๐‘…(๐œ‘)) = 3SAT(๐œ‘).We split the proof into two parts. The first part, traditionally knownas the completeness property, is to show that if 3SAT(๐œ‘) = 1 then๐‘‚1EQ(๐‘…(๐œ‘)) = 1. The second part, traditionally known as the sound-ness property, is to show that if 01EQ(๐‘…(๐œ‘)) = 1 then 3SAT(๐œ‘) = 1.(The names โ€œcompletenessโ€ and โ€œsoundnessโ€ derive viewing a so-lution to ๐‘…(๐œ‘) as a โ€œproofโ€ that ๐œ‘ is satisfiable, in which case theseconditions corresponds to completeness and soundness as definedin Section 11.1.1. However, if you find the names confusing you cansimply think of completeness as the โ€œ1-instance maps to 1-instanceโ€

Page 459: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

polynomial-time reductions 459

property and soundness as the โ€œ0-instance maps to 0-instanceโ€ prop-erty.)

We complete the proof by showing both parts:

โ€ข Completeness: Suppose that 3SAT(๐œ‘) = 1, which means thatthere is an assignment ๐‘ฅ โˆˆ 0, 1๐‘› that satisfies ๐œ‘. If we use theassignment ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1 and 1 โˆ’ ๐‘ฅ0, โ€ฆ , 1 โˆ’ ๐‘ฅ๐‘›โˆ’1 for the first 2๐‘›variables of ๐ธ = ๐‘…(๐œ‘) then we will satisfy all equations of the form๐‘ฅ๐‘– + ๐‘ฅโ€ฒ๐‘– = 1. Moreover, for every ๐‘— โˆˆ [๐‘›], if ๐‘ก0 + ๐‘ก1 + ๐‘ก2 + ๐‘ฆ๐‘— + ๐‘ง๐‘— =3(โˆ—) is the equation arising from the ๐‘—th clause of ๐œ‘ (with ๐‘ก0, ๐‘ก1, ๐‘ก2being variables of the form ๐‘ฅ๐‘– or ๐‘ฅโ€ฒ๐‘– depending on the literals of theclause) then our assignment to the first 2๐‘› variables ensures that๐‘ก0 + ๐‘ก1 + ๐‘ก2 โ‰ฅ 1 (since ๐‘ฅ satisfied ๐œ‘) and hence we can assign valuesto ๐‘ฆ๐‘— and ๐‘ง๐‘— that will ensure that the equation (โˆ—) is satisfied. Hencein this case ๐ธ = ๐‘…(๐œ‘) is satisfied, meaning that 01EQ(๐‘…(๐œ‘)) = 1.

โ€ข Soundness: Suppose that 01EQ(๐‘…(๐œ‘)) = 1, which means that theset of equations ๐ธ = ๐‘…(๐œ‘) has a satisfying assignment ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1,๐‘ฅโ€ฒ0, โ€ฆ , ๐‘ฅโ€ฒ๐‘›โˆ’1, ๐‘ฆ0, โ€ฆ , ๐‘ฆ๐‘šโˆ’1, ๐‘ง0, โ€ฆ , ๐‘ง๐‘šโˆ’1. Then, since the equationscontain the condition ๐‘ฅ๐‘– + ๐‘ฅโ€ฒ๐‘– = 1, for every ๐‘– โˆˆ [๐‘›], ๐‘ฅโ€ฒ๐‘– is the negationof ๐‘ฅ๐‘–, and morover, for every ๐‘— โˆˆ [๐‘š], if ๐ถ has the form ๐‘ค0 โˆจ ๐‘ค1 โˆจ ๐‘ค2is the ๐‘—-th clause of ๐ถ, then the corresponding assignment ๐‘ฅ willensure that ๐‘ค0 + ๐‘ค1 + ๐‘ค2 โ‰ฅ 1, implying that ๐ถ is satisfied. Hence inthis case 3SAT(๐œ‘) = 1.

14.3.1 Quadratic equations
Now that we reduced 3SAT to 01EQ, we can use this to reduce 3SAT to the quadratic equations problem. This is the function QUADEQ in which the input is a list of 𝑛-variate polynomials 𝑝_0, … , 𝑝_{𝑚−1} ∶ ℝ^𝑛 → ℝ that are all of degree at most two (i.e., they are quadratic) and with integer coefficients. (The latter condition is for convenience and can be achieved by scaling.) We define QUADEQ(𝑝_0, … , 𝑝_{𝑚−1}) to equal 1 if and only if there is a solution 𝑥 ∈ ℝ^𝑛 to the equations 𝑝_0(𝑥) = 0, 𝑝_1(𝑥) = 0, …, 𝑝_{𝑚−1}(𝑥) = 0.

For example, the following is a set of quadratic equations over thevariables ๐‘ฅ0, ๐‘ฅ1, ๐‘ฅ2: ๐‘ฅ20 โˆ’ ๐‘ฅ0 = 0๐‘ฅ21 โˆ’ ๐‘ฅ1 = 0๐‘ฅ22 โˆ’ ๐‘ฅ2 = 01 โˆ’ ๐‘ฅ0 โˆ’ ๐‘ฅ1 + ๐‘ฅ0๐‘ฅ1 = 0 (14.4)

You can verify that ๐‘ฅ โˆˆ โ„3 satisfies this set of equations if and only if๐‘ฅ โˆˆ 0, 13 and ๐‘ฅ0 โˆจ ๐‘ฅ1 = 1.
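To see this concretely, note that the first three equations force each x_i into {0,1}, while 1 − x_0 − x_1 + x_0·x_1 factors as (1 − x_0)(1 − x_1), which vanishes exactly when x_0 = 1 or x_1 = 1. A minimal brute-force check of this claim (our own illustrative code, not from the text):

```python
from itertools import product

def satisfies_example(x0, x1, x2):
    """Check the four quadratic equations of the example system (14.4)."""
    equations = [
        x0 ** 2 - x0,
        x1 ** 2 - x1,
        x2 ** 2 - x2,
        1 - x0 - x1 + x0 * x1,  # equals (1 - x0)(1 - x1)
    ]
    return all(e == 0 for e in equations)

# Enumerate all of {0,1}^3; the solutions are exactly those with x0 OR x1 true.
solutions = [x for x in product([0, 1], repeat=3) if satisfies_example(*x)]
```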

Page 460: INTRODUCTION TO THEORETICAL COMPUTER SCIENCE


Theorem 14.4 — Hardness of quadratic equations.

3SAT ≤_p QUADEQ    (14.5)

Proof Idea:

Using the transitivity of reductions (Solved Exercise 14.2), it is enough to show that 01EQ ≤_p QUADEQ, but this follows since we can phrase the constraint x_i ∈ {0,1} as the quadratic equation x_i² − x_i = 0. The takeaway technique of this reduction is that we can use nonlinearity to force continuous variables (e.g., variables taking values in ℝ) to be discrete (e.g., take values in {0,1}).
⋆

Proof of Theorem 14.4. By Theorem 14.2 and Solved Exercise 14.2, it is sufficient to prove that 01EQ ≤_p QUADEQ. Let E be an instance of 01EQ with variables x_0, …, x_{n−1}. We map E to the set of quadratic equations E′ that is obtained by taking the linear equations in E and adding to them the n quadratic equations x_i² − x_i = 0 for all i ∈ [n]. (See Algorithm 14.5.) The map E ↦ E′ can be computed in polynomial time. We claim that 01EQ(E) = 1 if and only if QUADEQ(E′) = 1. Indeed, the only difference between the two instances is that:

• In the 01EQ instance E, the equations are over variables x_0, …, x_{n−1} taking values in {0,1}.

• In the QUADEQ instance E′, the equations are over variables x_0, …, x_{n−1} ∈ ℝ, but we have the extra constraints x_i² − x_i = 0 for all i ∈ [n].

Since for every a ∈ ℝ, a² − a = 0 if and only if a ∈ {0,1}, the two sets of equations are equivalent and 01EQ(E) = QUADEQ(E′), which is what we wanted to prove.


polynomial-time reductions 461

Algorithm 14.5 — 01EQ to QUADEQ reduction.

Input: Set E of linear equations over n variables x_0, …, x_{n−1}.
Output: Set E′ of quadratic equations over m variables w_0, …, w_{m−1}, such that there is a 0/1 assignment x ∈ {0,1}^n satisfying the equations of E if and only if there is an assignment w ∈ ℝ^m satisfying the equations of E′. That is, 01EQ(E) = QUADEQ(E′).

1: Let m ← n.
2: Variables of E′ are set to be the same variables x_0, …, x_{n−1} as E.
3: for every equation e ∈ E do
4:   Add e to E′
5: end for
6: for i ∈ [n] do
7:   Add to E′ the equation x_i² − x_i = 0.
8: end for
9: return E′
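The steps of Algorithm 14.5 can be sketched in Python. Here each equation is encoded as a function that maps an assignment vector to a number which should equal zero; this encoding is our own choice for illustration, not the book's.

```python
def zero_one_eq_to_quadeq(linear_eqs, n):
    """Sketch of Algorithm 14.5: map a 01EQ instance (a list of linear
    equations over n variables) to a QUADEQ instance by adding the
    quadratic constraints x_i^2 - x_i = 0 for every variable."""
    quad_eqs = list(linear_eqs)  # keep every linear equation of E
    for i in range(n):
        # a^2 - a = 0 holds iff a is 0 or 1, forcing x_i to be discrete
        quad_eqs.append(lambda x, i=i: x[i] ** 2 - x[i])
    return quad_eqs
```

For instance, the 01EQ equation x_0 + x_1 = 1 maps to a system that the 0/1 assignments (1, 0) and (0, 1) still satisfy, while a real-valued solution such as (1/2, 1/2) is ruled out by the new quadratic constraints.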

14.4 THE INDEPENDENT SET PROBLEM

For a graph G = (V, E), an independent set (also known as a stable set) is a subset S ⊆ V such that there are no edges with both endpoints in S (in other words, E(S, S) = ∅). Every "singleton" (set consisting of a single vertex) is trivially an independent set, but finding larger independent sets can be challenging. The maximum independent set problem (henceforth simply "independent set") is the task of finding the largest independent set in the graph. The independent set problem is naturally related to scheduling problems: if we put an edge between two conflicting tasks, then an independent set corresponds to a set of tasks that can all be scheduled together without conflicts. The independent set problem has been studied in a variety of settings, including for example in the case of algorithms for finding structure in protein-protein interaction graphs.

As mentioned in Section 14.1, we think of the independent set problem as the function ISET : {0,1}* → {0,1} that on input a graph G and a number k outputs 1 if and only if the graph G contains an independent set of size at least k. We now reduce 3SAT to independent set.
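Since ISET is the target of this and several later reductions, it is useful to have a brute-force implementation for sanity-checking small examples. The (vertex list, edge list) graph encoding below is our own convention; the search is of course exponential time.

```python
from itertools import combinations

def iset(G, k):
    """Brute-force ISET(G, k): return True iff the graph G = (V, E) has
    an independent set of size at least k. For small examples only."""
    V, E = G
    edges = {frozenset(e) for e in E}
    return any(
        all(frozenset(pair) not in edges for pair in combinations(S, 2))
        for S in combinations(V, k)
    )
```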

Theorem 14.6 — Hardness of Independent Set. 3SAT ≤_p ISET.

Proof Idea:


Figure 14.4: An example of the reduction of 3SAT to ISET for the case where the original input formula is φ = (x_0 ∨ x_1 ∨ x_2) ∧ (x_0 ∨ x_1 ∨ x_2) ∧ (x_1 ∨ x_2 ∨ x_3). We map each clause of φ to a triangle of three vertices, each tagged above with "x_i = 0" or "x_i = 1" depending on the value of x_i that would satisfy the particular literal. We put an edge between every two conflicting literals (i.e., vertices tagged with "x_i = 0" and "x_i = 1" respectively).

The idea is that finding a satisfying assignment to a 3SAT formula corresponds to satisfying many local constraints without creating any conflicts. One can think of "x_17 = 0" and "x_17 = 1" as two conflicting events, and of a constraint such as x_17 ∨ ¬x_5 ∨ x_9 as creating a conflict between the events "x_17 = 0", "x_5 = 1" and "x_9 = 0", saying that these three cannot simultaneously co-occur. Using these ideas, we can think of solving a 3SAT problem as trying to schedule non-conflicting events, though the devil is, as usual, in the details. The takeaway technique here is to map each clause of the original formula into a gadget, which is a small subgraph (or more generally a "subinstance") satisfying some convenient properties. We will see these "gadgets" used time and again in the construction of polynomial-time reductions.
⋆

Algorithm 14.7 — 3SAT to ISET reduction.

Input: 3SAT formula φ with n variables and m clauses.
Output: Graph G = (V, E) and number k, such that G has an independent set of size k if and only if φ has a satisfying assignment. That is, 3SAT(φ) = ISET(G, k).

1: Initialize V ← ∅, E ← ∅
2: for every clause C = y ∨ y′ ∨ y″ of φ do
3:   Add three vertices (C, y), (C, y′), (C, y″) to V
4:   Add edges {(C, y), (C, y′)}, {(C, y′), (C, y″)}, {(C, y″), (C, y)} to E.
5: end for
6: for every pair of distinct clauses C, C′ in φ do
7:   for every i ∈ [n] do
8:     if C contains the literal x_i and C′ contains the literal ¬x_i then
9:       Add the edge {(C, x_i), (C′, ¬x_i)} to E
10:    end if
11:  end for
12: end for
13: return G = (V, E) and k = m

Proof of Theorem 14.6. Given a 3SAT formula φ on n variables and with m clauses, we will create a graph G with 3m vertices as follows. (See Algorithm 14.7; see also Fig. 14.4 for an example and Fig. 14.5 for Python code.)

• A clause C in φ has the form C = y ∨ y′ ∨ y″ where y, y′, y″ are literals (variables or their negations). For each such clause C, we will


add three vertices to G and label them (C, y), (C, y′), and (C, y″) respectively. We will also add the three edges between all pairs of these vertices, so they form a triangle. Since there are m clauses in φ, the graph G will have 3m vertices.

• In addition to the above edges, we also add an edge between every pair of vertices of the form (C, y) and (C′, y′) where y and y′ are conflicting literals. That is, we add an edge between (C, y) and (C′, y′) if there is an i such that y = x_i and y′ = ¬x_i or vice versa.

The algorithm constructing G from φ takes polynomial time since it involves two loops, the first taking O(m) steps and the second taking O(m²n) steps (see Algorithm 14.7). Hence to prove the theorem we need to show that φ is satisfiable if and only if G contains an independent set of m vertices. We now show both directions of this equivalence:

Part 1: Completeness. The "completeness" direction is to show that if φ has a satisfying assignment x*, then G has an independent set S* of m vertices. Let us now show this.

Indeed, suppose that φ has a satisfying assignment x* ∈ {0,1}^n. Then for every clause C = y ∨ y′ ∨ y″ of φ, one of the literals y, y′, y″ must evaluate to true under the assignment x* (as otherwise it would not satisfy φ). We let S be a set of m vertices obtained by choosing for every clause C one vertex of the form (C, y) such that y evaluates to true under x*. (If there is more than one such vertex for the same C, we arbitrarily choose one of them.)

We claim that S is an independent set. Indeed, suppose otherwise that there was a pair of vertices (C, y) and (C′, y′) in S with an edge between them. Since we picked one vertex out of each triangle corresponding to a clause, it must be that C ≠ C′. Hence the only way that there is an edge between (C, y) and (C′, y′) is if y and y′ are conflicting literals (i.e., y = x_i and y′ = ¬x_i for some i). But then they can't both evaluate to true under the assignment x*, which contradicts the way we constructed the set S. This completes the proof of the completeness condition.

Part 2: Soundness. The "soundness" direction is to show that if G has an independent set S* of m vertices, then φ has a satisfying assignment x* ∈ {0,1}^n. Let us now show this.

Indeed, suppose that G has an independent set S* with m vertices. We will define an assignment x* ∈ {0,1}^n for the variables of φ as follows. For every i ∈ [n], we set x*_i according to the following rules:

• If S* contains a vertex of the form (C, x_i) then we set x*_i = 1.

• If S* contains a vertex of the form (C, ¬x_i) then we set x*_i = 0.


• If S* does not contain a vertex of either of these forms, then it does not matter which value we give to x*_i, but for concreteness we'll set x*_i = 0.

The first observation is that x* is indeed well defined, in the sense that the rules above do not conflict with one another and ask us to set x*_i to be both 0 and 1. This follows from the fact that S* is an independent set, and hence if it contains a vertex of the form (C, x_i) then it cannot contain a vertex of the form (C′, ¬x_i).

We now claim that x* is a satisfying assignment for φ. Indeed, since S* is an independent set, it cannot have more than one vertex inside each one of the m triangles {(C, y), (C, y′), (C, y″)} corresponding to the clauses of φ. Hence since |S*| = m, it must have exactly one vertex in each such triangle. For every clause C of φ, if (C, y) is the vertex of S* in the triangle corresponding to C, then by the way we defined x* the literal y must evaluate to true, which means that x* satisfies this clause. Therefore x* satisfies all clauses of φ, which is the definition of a satisfying assignment.

This completes the proof of Theorem 14.6.
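The construction of Algorithm 14.7 can be sketched as follows. The encoding of a literal as a pair (i, b), meaning "the literal is satisfied when x_i = b", is our own choice and need not match the book's Python code in Fig. 14.5.

```python
def sat_to_iset(clauses):
    """Sketch of Algorithm 14.7: map a 3CNF formula, given as a list of
    3-literal clauses with each literal a pair (i, b), to a graph (V, E)
    and target k = number of clauses."""
    V, E = [], []
    for j, clause in enumerate(clauses):
        tri = [(j, lit) for lit in clause]  # one vertex per literal occurrence
        V.extend(tri)
        # the triangle edges inside the clause
        E.extend([(tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])])
    # conflict edges between occurrences of "x_i = 0" and "x_i = 1"
    for u in V:
        for v in V:
            if u[0] < v[0] and u[1][0] == v[1][0] and u[1][1] != v[1][1]:
                E.append((u, v))
    return (V, E), len(clauses)
```

For the two clauses (x_0 ∨ x_1 ∨ x_2) and (¬x_0 ∨ ¬x_1 ∨ x_2), this produces two triangles (6 vertices) plus two conflict edges, for 8 edges in total.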

Figure 14.5: The reduction of 3SAT to Independent Set. On the right-hand side is Python code that implements this reduction. On the left-hand side is a sample output of the reduction. We use black for the "triangle edges" and red for the "conflict edges". Note that the satisfying assignment x* = 0110 corresponds to the independent set {(0, ¬x_3), (1, ¬x_0), (2, x_2)}.

14.5 SOME EXERCISES AND ANATOMY OF A REDUCTION.

Reductions can be confusing, and working out exercises is a great way to gain more comfort with them. Here is one such example. As usual, I recommend you try it out yourself before looking at the solution.

Solved Exercise 14.3 — Vertex cover. A vertex cover in a graph G = (V, E) is a subset S ⊆ V of vertices such that every edge touches at least one vertex of S (see Fig. 14.6). The vertex cover problem is the task of determining, given a graph G and a number k, whether there exists a vertex cover in the graph with at most k vertices. Formally, this is the function VC : {0,1}* → {0,1} such that for every G = (V, E) and k ∈ ℕ,


Figure 14.6: A vertex cover in a graph is a subset ofvertices that touches all edges. In this 7-vertex graph,the 3 filled vertices are a vertex cover.

VC(G, k) = 1 if and only if there exists a vertex cover S ⊆ V such that |S| ≤ k.

Prove that 3SAT ≤_p VC.

Solution:

The key observation is that if S ⊆ V is a vertex cover, touching all edges, then there is no edge e such that both of e's endpoints are in the set S̄ = V ∖ S, and vice versa. In other words, S is a vertex cover if and only if S̄ is an independent set. Since the size of S̄ is |V| − |S|, we see that the polynomial-time map R(G, k) = (G, n − k) (where n is the number of vertices of G) satisfies VC(R(G, k)) = ISET(G, k), which means that it is a reduction from independent set to vertex cover. Since 3SAT ≤_p ISET by Theorem 14.6, the transitivity of reductions (Solved Exercise 14.2) implies that 3SAT ≤_p VC.
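The reduction map of this solution is a one-liner; here is a sketch using a (vertex list, edge list) graph encoding of our own choosing.

```python
def iset_to_vc(G, k):
    """The map R from the solution above: an n-vertex graph G has an
    independent set of size >= k iff it has a vertex cover of size
    <= n - k, so R(G, k) = (G, n - k)."""
    V, E = G
    return G, len(V) - k
```

On a triangle, whose largest independent set has size 1, the map asks for a vertex cover of size 2, and indeed any two vertices of a triangle touch all three edges.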

Solved Exercise 14.4 — Clique is equivalent to independent set. The maximum clique problem corresponds to the function CLIQUE : {0,1}* → {0,1} such that for a graph G and a number k, CLIQUE(G, k) = 1 iff there is a subset S of k vertices such that for every distinct u, v ∈ S, the edge {u, v} is in G. Such a set is known as a clique.

Prove that CLIQUE ≤_p ISET and ISET ≤_p CLIQUE.

Solution:

If ๐บ = (๐‘‰ , ๐ธ) is a graph, we denote by ๐บ its complement whichis the graph on the same vertices ๐‘‰ and such that for every distinct๐‘ข, ๐‘ฃ โˆˆ ๐‘‰ , the edge ๐‘ข, ๐‘ฃ is present in ๐บ if and only if this edge isnot present in ๐บ.

This means that for every set ๐‘†, ๐‘† is an independent set in ๐บ ifand only if ๐‘† is a clique in ๐บ. Therefore for every ๐‘˜, ISET(๐บ, ๐‘˜) =CLIQUE(๐บ, ๐‘˜). Since the map ๐บ โ†ฆ ๐บ can be computed efficiently,this yields a reduction ISET โ‰ค๐‘ CLIQUE. Moreover, since ๐บ = ๐บthis yields a reduction in the other direction as well.
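The complement map is equally short; a sketch with the same list-based graph encoding (our own convention):

```python
from itertools import combinations

def complement(G):
    """Return the complement graph: the same vertex set, with an edge
    {u, v} present exactly when it is absent in G. A set S is independent
    in G iff it is a clique in the complement, giving both reductions."""
    V, E = G
    present = {frozenset(e) for e in E}
    complement_edges = [(u, v) for u, v in combinations(V, 2)
                        if frozenset((u, v)) not in present]
    return (list(V), complement_edges)
```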

14.5.1 Dominating set

In the two examples above, the reduction was almost "trivial": the reduction from independent set to vertex cover merely changes the number k to n − k, and the reduction from independent set to clique flips edges to non-edges and vice versa. The following exercise requires a somewhat more interesting reduction.

Solved Exercise 14.5 — Dominating set. A dominating set in a graph G = (V, E) is a subset S ⊆ V of vertices such that every u ∈ V ∖ S is a neighbor in G of some s ∈ S (see Fig. 14.7). The dominating set problem


Figure 14.7: A dominating set is a subset S of vertices such that every vertex in the graph is either in S or a neighbor of S. The figure shows two copies of the same graph. The red vertices on the left are a vertex cover that is not a dominating set. The blue vertices on the right are a dominating set that is not a vertex cover.

is the task, given a graph G = (V, E) and a number k, of determining whether there exists a dominating set S ⊆ V with |S| ≤ k. Formally, this is the function DS : {0,1}* → {0,1} such that DS(G, k) = 1 iff there is a dominating set in G of at most k vertices.

Prove that ISET ≤_p DS.

Solution:

Since we know that ISET ≤_p VC, using transitivity it is enough to show that VC ≤_p DS. As Fig. 14.7 shows, a dominating set is not the same thing as a vertex cover. However, we can still relate the two problems. The idea is to map a graph G into a graph H such that a vertex cover in G translates into a dominating set in H and vice versa. We do so by including in H all the vertices and edges of G, but for every edge {u, v} of G we also add to H a new vertex w_{u,v} and connect it to both u and v. Let ℓ be the number of isolated vertices in G. The idea behind the proof is that we can transform a vertex cover S of k vertices in G into a dominating set of k + ℓ vertices in H by adding to S all the isolated vertices, and moreover we can transform every (k + ℓ)-sized dominating set in H into a k-sized vertex cover in G. We now give the details.

Description of the algorithm. Given an instance (G, k) of the vertex cover problem, we will map G into an instance (H, k′) of the dominating set problem as follows (see Fig. 14.8 for a Python implementation):

Algorithm 14.8 — VC to DS reduction.

Input: Graph G = (V, E) and number k.
Output: Graph H = (V′, E′) and number k′, such that G has a vertex cover of size k if and only if H has a dominating set of size k′. That is, DS(H, k′) = VC(G, k).

1: Initialize V′ ← V, E′ ← E
2: for every edge {u, v} ∈ E do
3:   Add vertex w_{u,v} to V′
4:   Add edges {u, w_{u,v}}, {v, w_{u,v}} to E′.
5: end for
6: Let ℓ ← number of isolated vertices in G
7: return (H = (V′, E′), k + ℓ)

Algorithm 14.8 runs in polynomial time, since the loop takes O(m) steps where m is the number of edges, and each step can be implemented in constant or at most linear time (depending on the


representation of the graph H). Counting the number of isolated vertices in an n-vertex graph G can be done in time O(n²) if G is represented in the adjacency matrix representation and in time O(n) if it is represented in the adjacency list representation. Regardless, the algorithm runs in polynomial time.

To complete the proof we need to prove that for every G, k, if (H, k′) is the output of Algorithm 14.8 on input (G, k), then DS(H, k′) = VC(G, k). We split the proof into two parts. The completeness part is that if VC(G, k) = 1 then DS(H, k′) = 1. The soundness part is that if DS(H, k′) = 1 then VC(G, k) = 1.

Completeness. Suppose that VC(G, k) = 1. Then there is a vertex cover S ⊆ V of at most k vertices. Let I be the set of isolated vertices in G and ℓ be their number. Then |S ∪ I| ≤ k + ℓ. We claim that S ∪ I is a dominating set in H. Indeed, for every vertex v of H there are three cases:

• Case 1: v is an isolated vertex of G. In this case v is in S ∪ I.

• Case 2: v is a non-isolated vertex of G, and hence there is an edge {u, v} of G for some u. In this case, since S is a vertex cover, one of u, v has to be in S, and hence either v or a neighbor of v is in S ⊆ S ∪ I.

• Case 3: v is of the form w_{u,u′} for some two neighbors u, u′ in G. But then since S is a vertex cover, one of u, u′ has to be in S, and hence S contains a neighbor of v.

We conclude that S ∪ I is a dominating set of size at most k′ = k + ℓ in H, and hence under the assumption that VC(G, k) = 1, DS(H, k′) = 1.

Soundness. Suppose that DS(H, k′) = 1. Then there is a dominating set D of size at most k′ = k + ℓ in H. For every edge {u, v} in the graph G, if D contains the vertex w_{u,v} then we remove this vertex and add u in its place. The only two neighbors of w_{u,v} are u and v, and since u is a neighbor of both w_{u,v} and of v, replacing w_{u,v} with u maintains the property that D is a dominating set. Moreover, this change cannot increase the size of D. Thus following this modification, we can assume that D is a dominating set of at most k + ℓ vertices that does not contain any vertices of the form w_{u,v}.

Let I be the set of isolated vertices in G. These vertices are also isolated in H and hence must be included in D (an isolated vertex must be in any dominating set, since it has no neighbors). We let S = D ∖ I. Then |S| ≤ k′ − ℓ = k. We claim that S is a vertex cover in G. Indeed, for every edge {u, v} of G, either the vertex w_{u,v} or


one of its neighbors must be in D by the dominating set property. But since we ensured that D doesn't contain any vertices of the form w_{u,v}, it must be the case that either u or v is in D, and (since neither is isolated) hence in S. This shows that S is a vertex cover of G of size at most k, hence proving that VC(G, k) = 1.
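Algorithm 14.8 can be sketched in Python as follows; the tuple encoding of the fresh vertices w_{u,v} is our own convention, and the book's implementation in Fig. 14.8 may differ.

```python
def vc_to_ds(G, k):
    """Sketch of Algorithm 14.8: for every edge {u, v} add a fresh vertex
    w_{u,v} adjacent to u and v, and raise the budget by the number ell
    of isolated vertices, so that VC(G, k) = DS(H, k + ell)."""
    V, E = G
    V2, E2 = list(V), list(E)
    for (u, v) in E:
        w = ("w", u, v)  # fresh gadget vertex for the edge {u, v}
        V2.append(w)
        E2.extend([(u, w), (v, w)])
    touched = {x for e in E for x in e}
    ell = sum(1 for v in V if v not in touched)  # isolated vertices of G
    return (V2, E2), k + ell
```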

A corollary of Algorithm 14.8 and the other reductions we have seen so far is that if DS ∈ P (i.e., dominating set has a polynomial-time algorithm) then 3SAT ∈ P (i.e., 3SAT has a polynomial-time algorithm). By the contrapositive, if 3SAT does not have a polynomial-time algorithm then neither does dominating set.

Figure 14.8: Python implementation of the reduction from vertex cover to dominating set, together with an example of an input graph and the resulting output graph. This reduction allows us to transform a hypothetical polynomial-time algorithm for dominating set (a "whistling pig") into a hypothetical polynomial-time algorithm for vertex cover (a "flying horse").

14.5.2 Anatomy of a reduction

Figure 14.9: The four components of a reduction, illustrated for the particular reduction of vertex cover to dominating set. A reduction from problem F to problem G is an algorithm that maps an input x for F into an input R(x) for G. To show that the reduction is correct we need to show the properties of efficiency: algorithm R runs in polynomial time; completeness: if F(x) = 1 then G(R(x)) = 1; and soundness: if G(R(x)) = 1 then F(x) = 1.

The reduction of Solved Exercise 14.5 gives a good illustration of the anatomy of a reduction. A reduction consists of four parts:

• Algorithm description: This is the description of how the algorithm maps an input into the output. For example, in Solved Exercise 14.5 this is the description of how we map an instance (G, k) of the vertex cover problem into an instance (H, k′) of the dominating set problem.


• Algorithm analysis: It is not enough to describe how the algorithm works; we also need to explain why it works. In particular, we need to provide an analysis explaining why the reduction is both efficient (i.e., runs in polynomial time) and correct (satisfies that G(R(x)) = F(x) for every x). Specifically, the components of the analysis of a reduction R include:

– Efficiency: We need to show that R runs in polynomial time. In most reductions we encounter this part is straightforward, as the reductions we typically use involve a constant number of nested loops, each involving a constant number of operations. For example, the reduction of Solved Exercise 14.5 just enumerates over the edges and vertices of the input graph.

– Completeness: In a reduction R demonstrating F ≤_p G, the completeness condition is the condition that for every x ∈ {0,1}*, if F(x) = 1 then G(R(x)) = 1. Typically we construct the reduction to ensure that this holds, by giving a way to map a "certificate/solution" certifying that F(x) = 1 into a solution certifying that G(R(x)) = 1. For example, in Solved Exercise 14.5 we constructed the graph H such that for every vertex cover S in G, the set S ∪ I (where I is the set of isolated vertices) would be a dominating set in H.

– Soundness: This is the condition that if F(x) = 0 then G(R(x)) = 0, or (taking the contrapositive) that if G(R(x)) = 1 then F(x) = 1. This is sometimes straightforward but can often be harder to show than the completeness condition, and in more advanced reductions (such as the reduction 3SAT ≤_p ISET of Theorem 14.6) demonstrating soundness is the main part of the analysis. For example, in Solved Exercise 14.5, to show soundness we needed to show that for every dominating set D in the graph H, there exists a vertex cover S of size at most |D| − ℓ in the graph G (where ℓ is the number of isolated vertices). This was challenging since the dominating set D might not necessarily be the one we "had in mind". In particular, in the proof above we needed to modify D to ensure that it does not contain vertices of the form w_{u,v}, and it was important to show that this modification still maintains the property that D is a dominating set, and also does not make it bigger.

Whenever you need to provide a reduction, you should make sure that your description has all these components. While it is sometimes tempting to weave together the description of the reduction and its analysis, it is usually clearer if you separate the two, and also break down the analysis into its three components of efficiency, completeness, and soundness.


14.6 REDUCING INDEPENDENT SET TO MAXIMUM CUT

We now show that the independent set problem reduces to the maximum cut (or "max cut") problem, modeled as the function MAXCUT that on input a pair (G, k) outputs 1 iff G contains a cut of at least k edges. Since both are graph problems, a reduction from independent set to max cut maps one graph into another, but as we will see, the output graph does not have to have the same vertices or edges as the input graph.

Theorem 14.9 — Hardness of Max Cut. ISET ≤_p MAXCUT

Proof Idea:

We will map a graph G into a graph H such that a large independent set in G becomes a partition cutting many edges in H. We can think of a cut in H as coloring each vertex either "blue" or "red". We will add a special "source" vertex s*, connect it to all other vertices, and assume without loss of generality that it is colored blue. Hence the more vertices we color red, the more edges from s* we cut. Now, for every edge {u, v} in the original graph G we will add a special "gadget", which is a small subgraph that involves u, v, the source s*, and two additional vertices. We design the gadget in such a way that if the red vertices are not an independent set in G, then the corresponding cut in H will be "penalized" in the sense that it will not cut as many edges. Once we set ourselves this objective, it is not hard to find a gadget that achieves it − see the proof below. Once again the takeaway technique is to use a (this time slightly more clever) gadget.
⋆

Figure 14.10: In the reduction of ISET to MAXCUT we map an n-vertex, m-edge graph G into the (n + 2m + 1)-vertex, (n + 5m)-edge graph H as follows. The graph H contains a special "source" vertex s*, n vertices v_0, …, v_{n−1}, and 2m vertices e⁰_0, e¹_0, …, e⁰_{m−1}, e¹_{m−1}, with each pair corresponding to an edge of G. We put an edge between s* and v_i for every i ∈ [n], and if the t-th edge of G was (v_i, v_j) then we add the five edges (s*, e⁰_t), (s*, e¹_t), (v_i, e⁰_t), (v_j, e¹_t), (e⁰_t, e¹_t). The intent is that if we cut at most one of v_i, v_j from s* then we'll be able to cut four out of these five edges, while if we cut both v_i and v_j from s* then we'll be able to cut at most three of them.

Proof of Theorem 14.9. We will transform a graph G of n vertices and m edges into a graph H of n + 1 + 2m vertices and n + 5m edges in the following way (see also Fig. 14.10). The graph H contains all vertices


Figure 14.11: In the reduction of independent set to max cut, for every t ∈ [m] we have a "gadget" corresponding to the t-th edge e = {v_i, v_j} of the original graph. If we think of the side of the cut containing the special source vertex s* as "white" and the other side as "blue", then the leftmost and center figures show that if v_i and v_j are not both blue, then we can cut four edges from the gadget. In contrast, by enumerating all possibilities one can verify that if both v_i and v_j are blue, then no matter how we color the intermediate vertices e⁰_t, e¹_t, we will cut at most three edges from the gadget. The figure contains only the gadget edges and ignores the edges connecting s* to the vertices v_0, …, v_{n−1}.

of ๐บ (though not the edges between them!) and in addition ๐ป alsohas:

* A special vertex ๐‘ โˆ— that is connected to all the vertices of ๐บ* For every edge ๐‘’ = ๐‘ข, ๐‘ฃ โˆˆ ๐ธ(๐บ), two vertices ๐‘’0, ๐‘’1 such that ๐‘’0

is connected to ๐‘ข and ๐‘’1 is connected to ๐‘ฃ, and moreover we add theedges ๐‘’0, ๐‘’1, ๐‘’0, ๐‘ โˆ—, ๐‘’1, ๐‘ โˆ— to ๐ป .

Theorem 14.9 will follow by showing that G contains an independent set of size at least k if and only if H has a cut of at least k + 4m edges. We now prove both directions of this equivalence:

Part 1: Completeness. If I is a k-sized independent set in G, then we can define S to be a cut in H of the following form: we let S contain all the vertices of I, and for every edge e = {u, v} ∈ E(G), if u ∈ I and v ∉ I then we add e¹ to S; if u ∉ I and v ∈ I then we add e⁰ to S; and if u ∉ I and v ∉ I then we add both e⁰ and e¹ to S. (We don't need to worry about the case that both u and v are in I, since I is an independent set.) One can verify that in all cases the number of edges cut by S in the gadget corresponding to e is four (see Fig. 14.11). Since s* is not in S, we also cut the k edges from s* to I, for a total of k + 4m edges.

Part 2: Soundness. Suppose that S is a cut in H that cuts at least C = k + 4m edges. We can assume that s* is not in S (otherwise we can "flip" S to its complement S̄, since this does not change the size of the cut). Now let I be the set of vertices in S that correspond to the original vertices of G. If I were an independent set of size k then we would be done. This might not always be the case, but we will see that if I is not an independent set then it is also larger than k. Specifically, we define m_in = |E(I, I)| to be the number of edges of G with both endpoints in I, and let m_out = m − m_in (i.e., if I is an independent set then m_in = 0 and m_out = m). By the properties of our gadget we know that for every edge {u, v} of G, we can cut at most three gadget edges when both u and v are in S, and at most four edges otherwise. Hence the number C of edges cut by S satisfies C ≤ |I| + 3m_in + 4m_out = |I| + 3m_in + 4(m − m_in) = |I| + 4m − m_in. Since C = k + 4m, we get that |I| − m_in ≥ k. Now we can transform I into an independent set I′ by going over every one of the m_in edges that are inside I and removing one of the endpoints of the edge from I. The resulting set I′ is an independent set in the graph G of size |I| − m_in ≥ k, and so this concludes the proof of the soundness condition.
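The construction of H in this proof can be sketched as follows; the names s*, ("e0", t), ("e1", t) are our own encoding of the vertices.

```python
def iset_to_maxcut(G, k):
    """Sketch of the Theorem 14.9 reduction: H has a source s*, the n
    original vertices, and a five-edge gadget per edge of G, for a total
    of n + 2m + 1 vertices and n + 5m edges; the target becomes k + 4m."""
    V, E = G
    s = "s*"
    VH = [s] + list(V)
    EH = [(s, v) for v in V]  # s* is connected to every original vertex
    for t, (u, v) in enumerate(E):
        e0, e1 = ("e0", t), ("e1", t)  # the two gadget vertices for edge t
        VH.extend([e0, e1])
        EH.extend([(s, e0), (s, e1), (u, e0), (v, e1), (e0, e1)])
    return (VH, EH), k + 4 * len(E)
```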


Figure 14.12: The reduction of independent set to max cut. On the righthand side is Python code implementing the reduction. On the lefthand side is an example output of the reduction where we apply it to the independent set instance that is obtained by running the reduction of Theorem 14.6 on the 3CNF formula (๐‘ฅ0 โˆจ ๐‘ฅ3 โˆจ ๐‘ฅ2) โˆง (๐‘ฅ0 โˆจ ๐‘ฅ1 โˆจ ๐‘ฅ2) โˆง (๐‘ฅ1 โˆจ ๐‘ฅ2 โˆจ ๐‘ฅ3).

Figure 14.13: We can transform a 3SAT formula ๐œ‘ into a graph ๐บ such that the longest path in the graph ๐บ would correspond to a satisfying assignment in ๐œ‘. In this graph, the black colored part corresponds to the variables of ๐œ‘ and the blue colored part corresponds to the clauses. A sufficiently long path would have to first โ€œsnakeโ€ through the black part, for each variable choosing either the โ€œupper pathโ€ (corresponding to assigning it the value True) or the โ€œlower pathโ€ (corresponding to assigning it the value False). Then to achieve maximum length the path would traverse through the blue part, where to go between two vertices corresponding to a clause such as ๐‘ฅ17 โˆจ ๐‘ฅ32 โˆจ ๐‘ฅ57, the corresponding vertices would have to have been not traversed before.

Figure 14.14: The graph above with the longest path marked on it; the part of the path corresponding to variables is in green and the part corresponding to the clauses is in pink.

14.7 REDUCING 3SAT TO LONGEST PATH

Note: This section is still a little messy; feel free to skip it or just read it without going into the proof details. The proof appears in Section 7.5 in Sipserโ€™s book.

One of the most basic algorithms in Computer Science is Dijkstraโ€™s algorithm for finding the shortest path between two vertices. We now show that in contrast, an efficient algorithm for the longest path problem would imply a polynomial-time algorithm for 3SAT.
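To see the contrast concretely, here is a hedged sketch (our own illustration, not the bookโ€™s code) of the brute-force approach to longest path: unlike Dijkstraโ€™s polynomial-time algorithm for shortest paths, it may explore exponentially many simple paths.

```python
def longest_path_bruteforce(adj, s, t):
    """Length of the longest simple path from s to t in a directed
    graph given as an adjacency dict, found by exhaustive search;
    returns -1 if no s-t path exists. Exponential time in general."""
    best = -1

    def dfs(v, visited, length):
        nonlocal best
        if v == t:
            best = max(best, length)
        for u in adj.get(v, []):
            if u not in visited:
                dfs(u, visited | {u}, length + 1)

    dfs(s, {s}, 0)
    return best
```

No algorithm substantially better than this kind of exhaustive search is known for the longest path problem.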

Theorem 14.10 โ€” Hardness of longest path. 3SAT โ‰ค๐‘ LONGPATH (14.6)

Proof Idea:

To prove Theorem 14.10 we need to show how to transform a 3CNF formula ๐œ‘ into a graph ๐บ and two vertices ๐‘ , ๐‘ก such that ๐บ has a path of length at least ๐‘˜ if and only if ๐œ‘ is satisfiable. The idea of the reduction is sketched in Fig. 14.13 and Fig. 14.14. We will construct a graph that contains a potentially long โ€œsnaking pathโ€ that corresponds to all variables in the formula. We will add a โ€œgadgetโ€ corresponding to each clause of ๐œ‘ in a way that we would only be able to use the gadgets if we have a satisfying assignment. โ‹†

def TSAT2LONGPATH(ฯ†):
    """Reduce 3SAT to LONGPATH"""
    def var(v):
        # return variable index and True/False depending on
        # whether the literal is positive or negated
        return (int(v[2:]), False) if v[0] == "ยฌ" else (int(v[1:]), True)
    n = numvars(ฯ†)
    clauses = getclauses(ฯ†)
    m = len(clauses)
    G = Graph()
    G.edge("start", "start_0")
    for i in range(n):  # add 2 length-m paths per variable
        G.edge(f"start_{i}", f"v_{i}_0_T")
        G.edge(f"start_{i}", f"v_{i}_0_F")
        for j in range(m-1):
            G.edge(f"v_{i}_{j}_T", f"v_{i}_{j+1}_T")
            G.edge(f"v_{i}_{j}_F", f"v_{i}_{j+1}_F")
        G.edge(f"v_{i}_{m-1}_T", f"end_{i}")
        G.edge(f"v_{i}_{m-1}_F", f"end_{i}")
        if i < n-1:
            G.edge(f"end_{i}", f"start_{i+1}")
    G.edge(f"end_{n-1}", "start_clauses")
    for j, C in enumerate(clauses):  # add gadget for each clause
        for v in enumerate(C):
            i, sign = var(v[1])
            s = "F" if sign else "T"
            G.edge(f"C_{j}_in", f"v_{i}_{j}_{s}")
            G.edge(f"v_{i}_{j}_{s}", f"C_{j}_out")
        if j < m-1:
            G.edge(f"C_{j}_out", f"C_{j+1}_in")
    G.edge("start_clauses", "C_0_in")
    G.edge(f"C_{m-1}_out", "end")
    return G, 1+n*(m+1)+1+2*m+1

Proof of Theorem 14.10. We build a graph ๐บ that โ€œsnakesโ€ from ๐‘  to ๐‘ก as follows. After ๐‘  we add a sequence of ๐‘› long loops. Each loop has an โ€œupper pathโ€ and a โ€œlower pathโ€. A simple path cannot take both the upper path and the lower path, and so it will need to take exactly one of them on its way from ๐‘  to ๐‘ก.

Our intention is that a path in the graph will correspond to an assignment ๐‘ฅ โˆˆ {0, 1}^๐‘› in the sense that taking the upper path in the ๐‘–-th loop corresponds to assigning ๐‘ฅ๐‘– = 1 and taking the lower path corresponds to assigning ๐‘ฅ๐‘– = 0. When we are done snaking through all the ๐‘› loops corresponding to the variables, to reach ๐‘ก we need to pass through ๐‘š โ€œobstaclesโ€: for each clause ๐‘— we will have a small gadget consisting of a pair of vertices ๐‘ ๐‘—, ๐‘ก๐‘— that have three paths between them. For example, if the ๐‘—-th clause had the form ๐‘ฅ17 โˆจ ๐‘ฅ55 โˆจ ๐‘ฅ72 then one path would go through a vertex in the lower loop corresponding to ๐‘ฅ17, one path would go through a vertex in the upper loop corresponding to ๐‘ฅ55, and the third would go through the lower loop corresponding to ๐‘ฅ72. We see that if we went in the first stage according to a satisfying assignment then we will be able to find a free vertex to travel from ๐‘ ๐‘— to ๐‘ก๐‘—. We link ๐‘ก1 to ๐‘ 2, ๐‘ก2 to ๐‘ 3, etc., and link ๐‘ก๐‘š to ๐‘ก. Thus


Figure 14.15: The result of applying the reduction of 3SAT to LONGPATH to the formula (๐‘ฅ0 โˆจ ยฌ๐‘ฅ3 โˆจ ๐‘ฅ2) โˆง (ยฌ๐‘ฅ0 โˆจ ๐‘ฅ1 โˆจ ยฌ๐‘ฅ2) โˆง (๐‘ฅ1 โˆจ ๐‘ฅ2 โˆจ ยฌ๐‘ฅ3).

a satisfying assignment would correspond to a path from ๐‘  to ๐‘ก that goes through one path in each loop corresponding to the variables, and one path in each gadget corresponding to the clauses. We can make the loops corresponding to the variables long enough so that we must take the entire path in each loop in order to have a fighting chance of getting a path as long as the one that corresponds to a satisfying assignment. But if we do that, then the only way we are able to reach ๐‘ก is if the paths we took corresponded to a satisfying assignment, since otherwise we will have some clause ๐‘— where we cannot reach ๐‘ก๐‘— from ๐‘ ๐‘— without using a vertex we already used before.

14.7.1 Summary of relations

We have shown that there are a number of functions ๐น for which we can prove a statement of the form โ€œIf ๐น โˆˆ P then 3SAT โˆˆ Pโ€. Hence coming up with a polynomial-time algorithm for even one of these problems would entail a polynomial-time algorithm for 3SAT (see for example Fig. 14.16). In Chapter 15 we will show the other direction (โ€œIf 3SAT โˆˆ P then ๐น โˆˆ Pโ€) for these functions, hence allowing us to conclude that they have equivalent complexity to 3SAT.

Figure 14.16: So far we have shown that P โŠ† EXP and that several problems we care about, such as 3SAT and MAXCUT, are in EXP, but it is not known whether or not they are in P. However, since 3SAT โ‰ค๐‘ MAXCUT we can rule out the possibility that MAXCUT โˆˆ P but 3SAT โˆ‰ P. The relation of P/poly to the class EXP is not known. We know that EXP does not contain P/poly, since the latter even contains uncomputable functions, but we do not know whether or not EXP โŠ† P/poly (though it is believed that this is not the case, and in particular that both 3SAT and MAXCUT are not in P/poly).

Chapter Recap

โ€ข The computational complexity of many seemingly unrelated computational problems can be related to one another through the use of reductions.

โ€ข If ๐น โ‰ค๐‘ ๐บ then a polynomial-time algorithm for ๐บ can be transformed into a polynomial-time algorithm for ๐น.

โ€ข Equivalently, if ๐น โ‰ค๐‘ ๐บ and ๐น does not have a polynomial-time algorithm then neither does ๐บ.

โ€ข Weโ€™ve developed many techniques to show that 3SAT โ‰ค๐‘ ๐น for interesting functions ๐น. Sometimes we can do so by using transitivity of reductions: if 3SAT โ‰ค๐‘ ๐บ and ๐บ โ‰ค๐‘ ๐น then 3SAT โ‰ค๐‘ ๐น.
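The transitivity property in the last bullet is just function composition. The following toy sketch (the names are our own, not the bookโ€™s) shows the idea:

```python
def compose_reductions(R1, R2):
    """If R1 reduces F to G and R2 reduces G to H, then the composed
    map reduces F to H, since F(x) = G(R1(x)) = H(R2(R1(x)));
    composing two polynomial-time maps yields a polynomial-time map."""
    return lambda x: R2(R1(x))
```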


14.8 EXERCISES

14.9 BIBLIOGRAPHICAL NOTES

Several notions of reductions are defined in the literature. The notion defined in Definition 14.1 is often known as a mapping reduction, many-to-one reduction, or a Karp reduction.

The maximal (as opposed to maximum) independent set problem is the task of finding a โ€œlocal maximumโ€ of an independent set: an independent set ๐‘† such that one cannot add a vertex to it without losing the independence property (the complement of such a set is a vertex cover). Unlike finding a maximum independent set, finding a maximal independent set can be done efficiently by a greedy algorithm, but this local maximum can be much smaller than the global maximum.

The reduction of independent set to max cut is taken from these notes. The image of the Hamiltonian path through the dodecahedron is by Christoph Sommer.

We have mentioned that the line between reductions used for algorithm design and those used for showing hardness is sometimes blurry. An excellent example of this is the area of SAT solvers (see [Gom+08]). In this field people use algorithms for SAT (which take exponential time in the worst case but are often much faster on many instances in practice) together with reductions of the form ๐น โ‰ค๐‘ SAT to derive algorithms for other functions ๐น of interest.


15 NP, NP completeness, and the Cook-Levin Theorem

โ€œIn this paper we give theorems that suggest, but do not imply, that these problems, as well as many others, will remain intractable perpetuallyโ€, Richard Karp, 1972

โ€œSad to say, but it will be many more years, if ever, before we really understand the Mystical Power of Twonessโ€ฆ 2-SAT is easy, 3-SAT is hard, 2-dimensional matching is easy, 3-dimensional matching is hard. Why? oh, Why?โ€, Eugene Lawler

So far we have shown that 3SAT is no harder than Quadratic Equations, Independent Set, Maximum Cut, and Longest Path. But to show that these problems are computationally equivalent we need to give reductions in the other direction, reducing each one of these problems to 3SAT as well. It turns out we can reduce all of these problems to 3SAT in one fell swoop.

In fact, this result extends far beyond these particular problems. All of the problems we discussed in Chapter 14, and a great many other problems, share the same commonality: they are all search problems, where the goal is to decide, given an instance ๐‘ฅ, whether there exists a solution ๐‘ฆ that satisfies some condition that can be verified in polynomial time. For example, in 3SAT, the instance is a formula and the solution is an assignment to the variables; in Max-Cut the instance is a graph and the solution is a cut in the graph; and so on and so forth. It turns out that every such search problem can be reduced to 3SAT.

15.1 THE CLASS NP

To make the above precise, we will make the following mathematical definition. We define the class NP to contain all Boolean functions that correspond to a search problem of the form above. That is, a Boolean function ๐น is in NP if ๐น has the form that on input a string ๐‘ฅ, ๐น(๐‘ฅ) = 1 if and only if there exists a โ€œsolutionโ€ string ๐‘ค such that the pair (๐‘ฅ, ๐‘ค) satisfies some polynomial-time checkable condition. Formally, NP is defined as follows:


Learning Objectives:
โ€ข Introduce the class NP, capturing a great many important computational problems.
โ€ข NP-completeness: evidence that a problem might be intractable.
โ€ข The P vs NP problem.


Figure 15.1: Overview of the results of this chapter. We define NP to contain all decision problems for which a solution can be efficiently verified. The main result of this chapter is the Cook-Levin Theorem (Theorem 15.6), which states that 3SAT has a polynomial-time algorithm if and only if every problem in NP has a polynomial-time algorithm. Another way to state this theorem is that 3SAT is NP complete. We will prove the Cook-Levin theorem by defining the two intermediate problems NANDSAT and 3NAND, proving that NANDSAT is NP complete, and then proving that NANDSAT โ‰ค๐‘ 3NAND โ‰ค๐‘ 3SAT.

Figure 15.2: The class NP corresponds to problems where solutions can be efficiently verified. That is, this is the class of functions ๐น such that ๐น(๐‘ฅ) = 1 if there is a โ€œsolutionโ€ ๐‘ค of length polynomial in |๐‘ฅ| that can be verified by a polynomial-time algorithm ๐‘‰.

Definition 15.1 โ€” NP. We say that ๐น โˆถ {0, 1}โˆ— โ†’ {0, 1} is in NP if there exists some integer ๐‘Ž > 0 and ๐‘‰ โˆถ {0, 1}โˆ— โ†’ {0, 1} such that ๐‘‰ โˆˆ P and for every ๐‘ฅ โˆˆ {0, 1}^๐‘›,

๐น(๐‘ฅ) = 1 โ‡” โˆƒ๐‘ค โˆˆ {0, 1}^(๐‘›^๐‘Ž) s.t. ๐‘‰(๐‘ฅ๐‘ค) = 1 . (15.1)

In other words, for ๐น to be in NP, there needs to exist some polynomial-time computable verification function ๐‘‰, such that if ๐น(๐‘ฅ) = 1 then there must exist ๐‘ค (of length polynomial in |๐‘ฅ|) such that ๐‘‰(๐‘ฅ๐‘ค) = 1, and if ๐น(๐‘ฅ) = 0 then for every such ๐‘ค, ๐‘‰(๐‘ฅ๐‘ค) = 0. Since the existence of this string ๐‘ค certifies that ๐น(๐‘ฅ) = 1, ๐‘ค is often referred to as a certificate, witness, or proof that ๐น(๐‘ฅ) = 1.

See also Fig. 15.2 for an illustration of Definition 15.1. The name NP stands for โ€œnondeterministic polynomial timeโ€ and is used for historical reasons; see the bibliographical notes. The string ๐‘ค in (15.1) is sometimes known as a solution, certificate, or witness for the instance ๐‘ฅ.

Solved Exercise 15.1 โ€” Alternative definition of NP. Show that the condition that |๐‘ค| = |๐‘ฅ|^๐‘Ž in Definition 15.1 can be replaced by the condition that |๐‘ค| โ‰ค ๐‘(|๐‘ฅ|) for some polynomial ๐‘. That is, prove that for every ๐น โˆถ {0, 1}โˆ— โ†’ {0, 1}, ๐น โˆˆ NP if and only if there is a polynomial-time Turing machine ๐‘‰ and a polynomial ๐‘ โˆถ โ„• โ†’ โ„• such that for every ๐‘ฅ โˆˆ {0, 1}โˆ—, ๐น(๐‘ฅ) = 1 if and only if there exists ๐‘ค โˆˆ {0, 1}โˆ— with |๐‘ค| โ‰ค ๐‘(|๐‘ฅ|) such that ๐‘‰(๐‘ฅ, ๐‘ค) = 1.


Solution:

The โ€œonly ifโ€ direction (namely that if ๐น โˆˆ NP then there is analgorithm ๐‘‰ and a polynomial ๐‘ as above) follows immediatelyfrom Definition 15.1 by letting ๐‘(๐‘›) = ๐‘›๐‘Ž. For the โ€œifโ€ direc-tion, the idea is that if a string ๐‘ค is of size at most ๐‘(๐‘›) for degree๐‘‘ polynomial ๐‘, then there is some ๐‘›0 such that for all ๐‘› > ๐‘›0,|๐‘ค| < ๐‘›๐‘‘+1. Hence we can encode ๐‘ค by a string of exactly length๐‘›๐‘‘+1 by padding it with 1 and an appropriate number of zeroes.Hence if there is an algorithm ๐‘‰ and polynomial ๐‘ as above, thenwe can define an algorithm ๐‘‰ โ€ฒ that does the following on input๐‘ฅ, ๐‘คโ€ฒ with |๐‘ฅ| = ๐‘› and |๐‘คโ€ฒ| = ๐‘›๐‘Ž:โ€ข If ๐‘› โ‰ค ๐‘›0 then ๐‘‰ โ€ฒ(๐‘ฅ, ๐‘คโ€ฒ) ignores ๐‘คโ€ฒ and enumerates over all ๐‘ค

of length at most ๐‘(๐‘›) and outputs 1 if there exists ๐‘ค such that๐‘‰ (๐‘ฅ, ๐‘ค) = 1. (Since ๐‘› < ๐‘›0, this only takes a constant number ofsteps.)

โ€ข If ๐‘› > ๐‘›0 then ๐‘‰ โ€ฒ(๐‘ฅ, ๐‘คโ€ฒ) โ€œstrips outโ€ the padding by droppingall the rightmost zeroes from ๐‘ค until it reaches out the first 1(which it drops as well) and obtains a string ๐‘ค. If |๐‘ค| โ‰ค ๐‘(๐‘›)then ๐‘‰ โ€ฒ outputs ๐‘‰ (๐‘ฅ, ๐‘ค).Since ๐‘‰ runs in polynomial time, ๐‘‰ โ€ฒ runs in polynomial time

as well, and by definition for every ๐‘ฅ, there exists ๐‘คโ€ฒ โˆˆ 0, 1|๐‘ฅ|๐‘Žsuch that ๐‘‰ โ€ฒ(๐‘ฅ๐‘คโ€ฒ) = 1 if and only if there exists ๐‘ค โˆˆ 0, 1โˆ— with|๐‘ค| โ‰ค ๐‘(|๐‘ฅ|) such that ๐‘‰ (๐‘ฅ๐‘ค) = 1.
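The padding trick in this solution can be sketched directly (a toy illustration with our own function names; witnesses are represented as bit strings):

```python
def pad(w, target_len):
    """Encode a witness w (a bit string with len(w) < target_len) as a
    string of exactly target_len bits: append a 1, then zeroes."""
    assert len(w) < target_len
    return w + "1" + "0" * (target_len - len(w) - 1)

def unpad(wprime):
    """Invert pad: drop the rightmost zeroes and the final 1."""
    return wprime.rstrip("0")[:-1]
```

Since the marker 1 separates the witness from the trailing zeroes, the encoding is uniquely decodable even when w itself ends in zeroes.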

The definition of NP means that for every ๐น โˆˆ NP and string ๐‘ฅ โˆˆ {0, 1}โˆ—, ๐น(๐‘ฅ) = 1 if and only if there is a short and efficiently verifiable proof of this fact. That is, we can think of the function ๐‘‰ in Definition 15.1 as a verifier algorithm, similar to what weโ€™ve seen in Section 11.1. The verifier checks whether a given string ๐‘ค โˆˆ {0, 1}โˆ— is a valid proof for the statement โ€œ๐น(๐‘ฅ) = 1โ€. Essentially all proof systems considered in mathematics involve line-by-line checks that can be carried out in polynomial time. Thus the heart of NP is asking for statements that have short (i.e., polynomial in the size of the statement) proofs. Indeed, as we will see in Chapter 16, Kurt Gรถdel phrased the question of whether NP = P as asking whether โ€œthe mental work of a mathematician [in proving theorems] could be completely replaced by a machineโ€.

Remark 15.2 โ€” NP not (necessarily) closed under complement. Definition 15.1 is asymmetric in the sense that


there is a difference between an output of 1 and an output of 0. You should make sure you understand why this definition does not guarantee that if ๐น โˆˆ NP then the function 1 โˆ’ ๐น (i.e., the map ๐‘ฅ โ†ฆ 1 โˆ’ ๐น(๐‘ฅ)) is in NP as well. In fact, it is believed that there do exist functions ๐น such that ๐น โˆˆ NP but 1 โˆ’ ๐น โˆ‰ NP. For example, as shown below, 3SAT โˆˆ NP, but the complement function (which on input a 3CNF formula ๐œ‘ outputs 1 if and only if ๐œ‘ is not satisfiable) is not known (nor believed) to be in NP. This is in contrast to the class P, which does satisfy that if ๐น โˆˆ P then 1 โˆ’ ๐น is in P as well.

15.1.1 Examples of functions in NP

We now present some examples of functions that are in the class NP. We start with the canonical example of the 3SAT function.

Example 15.3 โ€” 3SAT โˆˆ NP. 3SAT is in NP since for every โ„“-variable formula ๐œ‘, 3SAT(๐œ‘) = 1 if and only if there exists a satisfying assignment ๐‘ฅ โˆˆ {0, 1}^โ„“ such that ๐œ‘(๐‘ฅ) = 1, and we can check this condition in polynomial time.

The above reasoning explains why 3SAT is in NP, but since this is our first example, we will now belabor the point and expand out in full formality the precise representation of the witness ๐‘ค and the algorithm ๐‘‰ that demonstrate that 3SAT is in NP. Since demonstrating that functions are in NP is fairly straightforward, in future cases we will not use as much detail, and the reader can also feel free to skip the rest of this example.

Using Solved Exercise 15.1, it is OK if the witness is of size at most polynomial in the input length ๐‘›, rather than of precisely size ๐‘›^๐‘Ž for some integer ๐‘Ž > 0. Specifically, we can represent a 3CNF formula ๐œ‘ with ๐‘˜ variables and ๐‘š clauses as a string of length ๐‘› = ๐‘‚(๐‘š log ๐‘˜), since every one of the ๐‘š clauses involves three variables and their negations, and the identity of each variable can be represented using โŒˆlog2 ๐‘˜โŒ‰ bits. We assume that every variable participates in some clause (as otherwise it can be ignored) and hence that ๐‘š โ‰ฅ ๐‘˜, which in particular means that the input length ๐‘› is at least as large as ๐‘š and ๐‘˜.

We can represent an assignment to the ๐‘˜ variables using a ๐‘˜-length string ๐‘ค. The following algorithm checks whether a given ๐‘ค satisfies the formula ๐œ‘:


Algorithm 15.4 โ€” Verifier for 3SAT.

Input: 3CNF formula ๐œ‘ on ๐‘˜ variables and with ๐‘š clauses, string ๐‘ค โˆˆ {0, 1}^๐‘˜

Output: 1 iff ๐‘ค satisfies ๐œ‘
1: for ๐‘— โˆˆ [๐‘š] do
2:   Let โ„“1 โˆจ โ„“2 โˆจ โ„“3 be the ๐‘—-th clause of ๐œ‘
3:   if ๐‘ค violates all three literals then
4:     return 0
5:   end if
6: end for
7: return 1

Algorithm 15.4 takes ๐‘‚(๐‘š) time to enumerate over all clauses, and will return 1 if and only if ๐‘ค satisfies all the clauses.
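Algorithm 15.4 can be sketched in Python as follows; the clause representation (a list of (variable index, is-positive) pairs per clause) is our own assumption about the encoding, not the bookโ€™s.

```python
def verify_3sat(clauses, w):
    """Return 1 iff the 0/1 assignment w satisfies every clause; each
    clause is a list of (variable_index, is_positive) literals. Runs
    in time linear in the number of clauses."""
    for clause in clauses:
        if not any(w[i] == (1 if positive else 0) for i, positive in clause):
            return 0  # w violates all literals of this clause
    return 1
```

For instance, for the formula (๐‘ฅ0 โˆจ ยฌ๐‘ฅ1) โˆง (๐‘ฅ1 โˆจ ๐‘ฅ2), the assignment 001 is accepted while 100 is rejected.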

Here are some more examples of problems in NP. For each one of these problems we merely sketch how the witness is represented and why it is efficiently checkable, but working out the details can be a good way to get more comfortable with Definition 15.1:

โ€ข QUADEQ is in NP since for every โ„“-variable instance of quadratic equations ๐ธ, QUADEQ(๐ธ) = 1 if and only if there exists an assignment ๐‘ฅ โˆˆ {0, 1}^โ„“ that satisfies ๐ธ. We can check the condition that ๐‘ฅ satisfies ๐ธ in polynomial time by enumerating over all the equations in ๐ธ, and for each such equation ๐‘’, plugging in the values of ๐‘ฅ and verifying that ๐‘’ is satisfied.

โ€ข ISET is in NP since for every graph ๐บ and integer ๐‘˜, ISET(๐บ, ๐‘˜) = 1 if and only if there exists a set ๐‘† of ๐‘˜ vertices that contains no pair of neighbors in ๐บ. We can check the condition that ๐‘† is an independent set of size โ‰ฅ ๐‘˜ in polynomial time by first checking that |๐‘†| โ‰ฅ ๐‘˜ and then enumerating over all edges {๐‘ข, ๐‘ฃ} in ๐บ, and for each such edge verifying that either ๐‘ข โˆ‰ ๐‘† or ๐‘ฃ โˆ‰ ๐‘†.

โ€ข LONGPATH is in NP since for every graph ๐บ and integer ๐‘˜, LONGPATH(๐บ, ๐‘˜) = 1 if and only if there exists a simple path ๐‘ƒ in ๐บ that is of length at least ๐‘˜. We can check the condition that ๐‘ƒ is a simple path of length ๐‘˜ in polynomial time by checking that it has the form (๐‘ฃ0, ๐‘ฃ1, โ€ฆ, ๐‘ฃ๐‘˜) where each ๐‘ฃ๐‘– is a vertex in ๐บ, no ๐‘ฃ๐‘– is repeated, and for every ๐‘– โˆˆ [๐‘˜], the edge {๐‘ฃ๐‘–, ๐‘ฃ๐‘–+1} is present in the graph.

โ€ข MAXCUT is in NP since for every graph ๐บ and integer ๐‘˜, MAXCUT(๐บ, ๐‘˜) = 1 if and only if there exists a cut in ๐บ, given by a subset ๐‘† of the vertices, that cuts at least ๐‘˜ edges. We can check the condition that ๐‘† defines a cut of value at least ๐‘˜ in polynomial time by checking that ๐‘† is a subset of ๐บโ€™s vertices and enumerating over all the edges {๐‘ข, ๐‘ฃ} of ๐บ, counting those edges such that ๐‘ข โˆˆ ๐‘† and ๐‘ฃ โˆ‰ ๐‘†, or vice versa.

15.1.2 Basic facts about NP

The definition of NP is one of the most important definitions of this book, and it is worth taking the time to digest and internalize it. The following solved exercises establish some basic properties of this class. As usual, I highly recommend that you try to work out the solutions yourself.

Solved Exercise 15.2 โ€” Verifying is no harder than solving. Prove that P โŠ† NP.

Solution:

Suppose that ๐น โˆˆ P. Define the following function ๐‘‰ : ๐‘‰ (๐‘ฅ0๐‘›) =1 iff ๐‘› = |๐‘ฅ| and ๐น(๐‘ฅ) = 1. (๐‘‰ outputs 0 on all other inputs.) Since๐น โˆˆ P we can clearly compute ๐‘‰ in polynomial time as well.Let ๐‘ฅ โˆˆ 0, 1๐‘› be some string. If ๐น(๐‘ฅ) = 1 then ๐‘‰ (๐‘ฅ0๐‘›) = 1. On

the other hand, if ๐น(๐‘ฅ) = 0 then for every ๐‘ค โˆˆ 0, 1๐‘›, ๐‘‰ (๐‘ฅ๐‘ค) = 0.Therefore, setting ๐‘Ž = ๐‘ = 1, we see that ๐‘‰ satisfies (15.1), and es-tablishes that ๐น โˆˆ NP.
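The verifier from this proof can be sketched concretely (a toy illustration; the function names are ours, and strings stand in for {0, 1}โˆ—):

```python
def make_np_verifier(F):
    """Given a polynomial-time computable F, build the verifier V used
    in the proof that P is contained in NP: V(x, w) accepts iff w is
    the all-zeroes string of length |x| and F(x) = 1, so a witness
    for x exists exactly when F(x) = 1."""
    def V(x, w):
        return 1 if w == "0" * len(x) and F(x) == 1 else 0
    return V
```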

Remark 15.5 โ€” NP does not mean non-polynomial!. People sometimes think that NP stands for โ€œnon-polynomial timeโ€. As Solved Exercise 15.2 shows, this is far from the truth, and in fact every polynomial-time computable function is in NP as well. If ๐น is in NP it certainly does not mean that ๐น is hard to compute (though it does not, as far as we know, necessarily mean that itโ€™s easy to compute either). Rather, it means that ๐น is easy to verify, in the technical sense of Definition 15.1.

Solved Exercise 15.3 โ€” NP is in exponential time. Prove that NP โŠ† EXP.

Solution:

Suppose that ๐น โˆˆ NP and let ๐‘‰ be the polynomial-time com-putable function that satisfies (15.1) and ๐‘Ž the correspondingconstant. Then given every ๐‘ฅ โˆˆ 0, 1๐‘›, we can check whether๐น(๐‘ฅ) = 1 in time ๐‘๐‘œ๐‘™๐‘ฆ(๐‘›) โ‹… 2๐‘›๐‘Ž = ๐‘œ(2๐‘›๐‘Ž+1) by enumerating overall the 2๐‘›๐‘Ž strings ๐‘ค โˆˆ 0, 1๐‘›๐‘Ž and checking whether ๐‘‰ (๐‘ฅ๐‘ค) = 1,in which case we return 1. If ๐‘‰ (๐‘ฅ๐‘ค) = 0 for every such ๐‘ค then wereturn 0. By construction, the algorithm above will run in time at

Page 483: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

np, np completeness, and the cook-levin theorem 483

most exponential in its input length and by the definition of NP itwill return ๐น(๐‘ฅ) for every ๐‘ฅ.
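The brute-force algorithm in this solution can be sketched as follows (witnesses as bit strings; the representation and names are our own):

```python
from itertools import product

def brute_force_decide(V, x, a):
    """Decide F(x) for F in NP by enumerating all 2**(n**a) candidate
    witnesses of length n**a and running the verifier V on each;
    this takes time exponential in the input length."""
    n = len(x)
    for bits in product("01", repeat=n ** a):
        if V(x, "".join(bits)) == 1:
            return 1
    return 0
```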

Solved Exercise 15.2 and Solved Exercise 15.3 together imply that

P โŠ† NP โŠ† EXP . (15.2)

The time hierarchy theorem (Theorem 13.9) implies that P โŠŠ EXP, and hence at least one of the two inclusions P โŠ† NP or NP โŠ† EXP is strict. It is believed that both of them are in fact strict inclusions. That is, it is believed that there are functions in NP that cannot be computed in polynomial time (this is the P โ‰  NP conjecture) and that there are functions ๐น in EXP for which we cannot even efficiently certify that ๐น(๐‘ฅ) = 1 for a given input ๐‘ฅ. One function believed to lie in EXP โงต NP is the complement of 3SAT, which maps a 3CNF formula ๐œ‘ to 1 โˆ’ 3SAT(๐œ‘). The conjecture that this complement function is not in NP is known as the โ€œNP โ‰  co-NPโ€ conjecture. It implies the P โ‰  NP conjecture (see Exercise 15.2).

We have previously informally equated the notion of ๐น โ‰ค๐‘ ๐บ with ๐น being โ€œno harder than ๐บโ€, and in particular have seen in Solved Exercise 14.1 that if ๐บ โˆˆ P and ๐น โ‰ค๐‘ ๐บ, then ๐น โˆˆ P as well. The following exercise shows that if ๐น โ‰ค๐‘ ๐บ then ๐น is also โ€œno harder to verifyโ€ than ๐บ. That is, regardless of whether or not it is in P, if ๐บ has the property that solutions to it can be efficiently verified, then so does ๐น.

Solved Exercise 15.4 โ€” Reductions and NP. Let ๐น, ๐บ โˆถ {0, 1}โˆ— โ†’ {0, 1}. Show that if ๐น โ‰ค๐‘ ๐บ and ๐บ โˆˆ NP then ๐น โˆˆ NP.

Solution:

Suppose that ๐บ is in NP and in particular there exists ๐‘Ž and ๐‘‰ โˆˆP such that for every ๐‘ฆ โˆˆ 0, 1โˆ—, ๐บ(๐‘ฆ) = 1 โ‡” โˆƒ๐‘คโˆˆ0,1|๐‘ฆ|๐‘Ž ๐‘‰ (๐‘ฆ๐‘ค) = 1.Suppose also that ๐น โ‰ค๐‘ ๐บ and so in particular there is a ๐‘›๐‘-time computable function ๐‘… such that ๐น(๐‘ฅ) = ๐บ(๐‘…(๐‘ฅ)) for all๐‘ฅ โˆˆ 0, 1โˆ—. Define ๐‘‰ โ€ฒ to be a Turing Machine that on input a pair(๐‘ฅ, ๐‘ค) computes ๐‘ฆ = ๐‘…(๐‘ฅ) and returns 1 if and only if |๐‘ค| = |๐‘ฆ|๐‘Žand ๐‘‰ (๐‘ฆ๐‘ค) = 1. Then ๐‘‰ โ€ฒ runs in polynomial time, and for every๐‘ฅ โˆˆ 0, 1โˆ—, ๐น(๐‘ฅ) = 1 iff there exists ๐‘ค of size |๐‘…(๐‘ฅ)|๐‘Ž which is atmost polynomial in |๐‘ฅ| such that ๐‘‰ โ€ฒ(๐‘ฅ, ๐‘ค) = 1, hence demonstratingthat ๐น โˆˆ NP.


15.2 FROM NP TO 3SAT: THE COOK-LEVIN THEOREM

We have seen several examples of problems for which we do not know if their best algorithm is polynomial or exponential, but we can show that they are in NP. That is, we donโ€™t know if they are easy to solve, but we do know that it is easy to verify a given solution. There are many, many, many more examples of interesting functions we would like to compute that are easily shown to be in NP. What is quite amazing is that if we can solve 3SAT then we can solve all of them!

The following is one of the most fundamental theorems in Computer Science:

Theorem 15.6 โ€” Cook-Levin Theorem. For every ๐น โˆˆ NP, ๐น โ‰ค๐‘ 3SAT.

We will soon show the proof of Theorem 15.6, but note that it immediately implies that QUADEQ, LONGPATH, and MAXCUT all reduce to 3SAT. Combining it with the reductions weโ€™ve seen in Chapter 14, it implies that all these problems are equivalent! For example, to reduce QUADEQ to LONGPATH, we can first reduce QUADEQ to 3SAT using Theorem 15.6 and then use the reduction weโ€™ve seen in Theorem 14.10 from 3SAT to LONGPATH. That is, since QUADEQ โˆˆ NP, Theorem 15.6 implies that QUADEQ โ‰ค๐‘ 3SAT, and Theorem 14.10 implies that 3SAT โ‰ค๐‘ LONGPATH, which by the transitivity of reductions (Solved Exercise 14.2) means that QUADEQ โ‰ค๐‘ LONGPATH. Similarly, since LONGPATH โˆˆ NP, we can use Theorem 15.6 and Theorem 14.4 to show that LONGPATH โ‰ค๐‘ 3SAT โ‰ค๐‘ QUADEQ, concluding that LONGPATH and QUADEQ are computationally equivalent.

There is of course nothing special about QUADEQ and LONGPATH here: by combining Theorem 15.6 with the reductions we saw, we see that just like 3SAT, every ๐น โˆˆ NP reduces to LONGPATH, and the same is true for QUADEQ and MAXCUT. All these problems are in some sense โ€œthe hardest in NPโ€ since an efficient algorithm for any one of them would imply an efficient algorithm for all the problems in NP. This motivates the following definition:

Definition 15.7 โ€” NP-hardness and NP-completeness. Let ๐บ โˆถ {0, 1}โˆ— โ†’ {0, 1}. We say that ๐บ is NP hard if for every ๐น โˆˆ NP, ๐น โ‰ค๐‘ ๐บ. We say that ๐บ is NP complete if ๐บ is NP hard and ๐บ โˆˆ NP.

The Cook-Levin Theorem (Theorem 15.6) can be rephrased as saying that 3SAT is NP hard, and since it is also in NP, this means that 3SAT is NP complete. Together with the reductions of Chapter 14, Theorem 15.6 shows that despite their superficial differences, 3SAT, quadratic equations, longest path, independent set, and maximum


Figure 15.3: The world if P โ‰  NP (left) and P = NP (right). In the former case the set of NP-complete problems is disjoint from P, and Ladnerโ€™s theorem shows that there exist problems that are neither in P nor NP-complete. (There are remarkably few natural candidates for such problems, with some prominent examples being decision variants of problems such as integer factoring, lattice shortest vector, and finding Nash equilibria.) In the latter case that P = NP, the notion of NP-completeness loses its meaning, as essentially all functions in P (save for the trivial constant zero and constant one functions) are NP-complete.

Figure 15.4: A rough illustration of the (conjectured) status of problems in exponential time. Darker colors correspond to higher running time, and the circle in the middle is the problems in P. NP is a (conjectured to be proper) superclass of P, and the NP-complete problems (or NPC for short) are the โ€œhardestโ€ problems in NP, in the sense that a solution for one of them implies a solution for all other problems in NP. It is conjectured that all the NP-complete problems require at least exp(๐‘›^๐œ–) time to solve for a constant ๐œ– > 0, and many require exp(ฮฉ(๐‘›)) time. The permanent is not believed to be contained in NP though it is NP-hard, which means that a polynomial-time algorithm for it implies that P = NP.

cut, are all NP-complete. Many thousands of additional problems have been shown to be NP-complete, arising from all the sciences, mathematics, economics, engineering and many other fields. (For a few examples, see this Wikipedia page and this website.)

Big Idea 22 If a single NP-complete problem has a polynomial-time algorithm, then there is such an algorithm for every decision problem that corresponds to the existence of an efficiently-verifiable solution.

15.2.1 What does this mean?

As weโ€™ve seen in Solved Exercise 15.2, P โŠ† NP. The most famous conjecture in Computer Science is that this containment is strict. That is, it is widely conjectured that P โ‰  NP. One way to refute the conjecture that P โ‰  NP is to give a polynomial-time algorithm for even a single one of the NP-complete problems such as 3SAT, Max Cut, or the thousands of others that have been studied in all fields of human endeavor. The fact that these problems have been studied by so many people, and yet not a single polynomial-time algorithm for any of them has been found, supports the conjecture that indeed P โ‰  NP. In fact, for many of these problems (including all the ones we mentioned above), we donโ€™t even know of a 2^(๐‘œ(๐‘›))-time algorithm! However, to the frustration of computer scientists, we have not yet been able to prove that P โ‰  NP or even rule out the existence of an ๐‘‚(๐‘›)-time algorithm for 3SAT. Resolving whether or not P = NP is known as the P vs NP problem. A million-dollar prize has been offered for the solution of this problem, a popular book has been written, and every year a new paper comes out claiming a proof of P = NP or P โ‰  NP, only to wither under scrutiny.

One of the mysteries of computation is that people have observed a certain empirical โ€œzero-one lawโ€ or โ€œdichotomyโ€ in the computational complexity of natural problems, in the sense that many natural problems are either in P (often in TIME(๐‘‚(๐‘›)) or TIME(๐‘‚(๐‘›^2))), or they are NP hard. This is related to the fact that for most natural problems, the best known algorithm is either exponential or polynomial, with not too many examples where the best running time is some strange intermediate complexity such as 2^(2^โˆš(log ๐‘›)). However, it is believed that there exist problems in NP that are neither in P nor NP-complete, and in fact a result known as โ€œLadnerโ€™s Theoremโ€ shows that if P โ‰  NP then this is indeed the case (see also Exercise 15.1 and Fig. 15.3).


15.2.2 The Cook-Levin Theorem: Proof outline

We will now prove the Cook-Levin Theorem, which is the underpinning to a great web of reductions from 3SAT to thousands of problems across many great fields. Some problems that have been shown to be NP-complete include: minimum-energy protein folding, minimum surface-area foam configuration, map coloring, optimal Nash equilibrium, quantum state entanglement, minimum supersequence of a genome, minimum codeword problem, shortest vector in a lattice, minimum genus knots, positive Diophantine equations, integer programming, and many many more. The worst-case complexity of all these problems is (up to polynomial factors) equivalent to that of 3SAT, and through the Cook-Levin Theorem, to all problems in NP.

To prove Theorem 15.6 we need to show that F ≤p 3SAT for every F ∈ NP. We will do so in three stages. We define two intermediate problems: NANDSAT and 3NAND. We will shortly show the definitions of these two problems, but Theorem 15.6 will follow from combining the following three results:

1. NANDSAT is NP hard (Lemma 15.8).

2. NANDSAT โ‰ค๐‘ 3NAND (Lemma 15.9).

3. 3NAND โ‰ค๐‘ 3SAT (Lemma 15.10).

By the transitivity of reductions, it will follow that for every F ∈ NP,

F ≤p NANDSAT ≤p 3NAND ≤p 3SAT   (15.3)

hence establishing Theorem 15.6.

We will prove these three results (Lemma 15.8, Lemma 15.9 and Lemma 15.10) one by one, providing the requisite definitions as we go along.

15.3 THE NANDSAT PROBLEM, AND WHY IT IS NP HARD

The function NANDSAT : {0,1}* → {0,1} is defined as follows:

• The input to NANDSAT is a string Q representing a NAND-CIRC program (or equivalently, a circuit with NAND gates).

• The output of NANDSAT on input Q is 1 if and only if there exists a string w ∈ {0,1}^n (where n is the number of inputs to Q) such that Q(w) = 1.

Solved Exercise 15.5 — NANDSAT ∈ NP. Prove that NANDSAT ∈ NP.


Solution:

We have seen that the circuit (or straight-line program) evaluation problem can be computed in polynomial time. Specifically, given a NAND-CIRC program Q of s lines and n inputs, and w ∈ {0,1}^n, we can evaluate Q on the input w in time which is polynomial in s and hence verify whether or not Q(w) = 1.

We now prove that NANDSAT is NP hard.

Lemma 15.8 NANDSAT is NP hard.

Proof Idea:

The proof closely follows the proof that P ⊆ P/poly (Theorem 13.12, see also Section 13.6.2). Specifically, if F ∈ NP then there is a polynomial-time Turing machine M and positive integer a such that for every x ∈ {0,1}^n, F(x) = 1 iff there is some w ∈ {0,1}^{n^a} such that M(xw) = 1. The proof that P ⊆ P/poly gave us a way (via "unrolling the loop") to come up in polynomial time with a Boolean circuit C on n^a inputs that computes the function w ↦ M(xw). We can then translate C into an equivalent NAND circuit (or NAND-CIRC program) Q. We see that there is a string w ∈ {0,1}^{n^a} such that Q(w) = 1 if and only if there is such w satisfying M(xw) = 1, which (by definition) happens if and only if F(x) = 1. Hence the translation of x into the circuit Q is a reduction showing F ≤p NANDSAT. ⋆

P
The proof is a little bit technical but ultimately follows quite directly from the definition of NP, as well as the ability to "unroll the loop" of NAND-TM programs as discussed in Section 13.6.2. If you find it confusing, try to pause here and think how you would implement in your favorite programming language the function unroll which on input a NAND-TM program P and numbers T, n outputs an n-input NAND-CIRC program Q of O(|T|) lines such that for every input z ∈ {0,1}^n, if P halts on z within at most T steps and outputs y, then Q(z) = y.

Proof of Lemma 15.8. Let F ∈ NP. To prove Lemma 15.8 we need to give a polynomial-time computable function that will map every x* ∈ {0,1}* to a NAND-CIRC program Q such that F(x*) = NANDSAT(Q).

Let ๐‘ฅโˆ— โˆˆ 0, 1โˆ— be such a string and let ๐‘› = |๐‘ฅโˆ—| be its length. ByDefinition 15.1 there exists ๐‘‰ โˆˆ P and positive ๐‘Žโ„• such that ๐น(๐‘ฅโˆ—) = 1if and only if there exists ๐‘ค โˆˆ 0, 1๐‘›๐‘Ž satisfying ๐‘‰ (๐‘ฅโˆ—๐‘ค) = 1.


Let ๐‘š = ๐‘›๐‘Ž. Since ๐‘‰ โˆˆ P there is some NAND-TM program ๐‘ƒ โˆ— thatcomputes ๐‘‰ on inputs of the form ๐‘ฅ๐‘ค with ๐‘ฅ โˆˆ 0, 1๐‘› and ๐‘ค โˆˆ 0, 1๐‘šin at most (๐‘› + ๐‘š)๐‘ time for some constant ๐‘. Using our โ€œunrollingthe loop NAND-TM to NAND compilerโ€ of Theorem 13.14, we canobtain a NAND-CIRC program ๐‘„โ€ฒ that has ๐‘› + ๐‘š inputs and at most๐‘‚((๐‘› + ๐‘š)2๐‘) lines such that ๐‘„โ€ฒ(๐‘ฅ๐‘ค) = ๐‘ƒ โˆ—(๐‘ฅ๐‘ค) for every ๐‘ฅ โˆˆ 0, 1๐‘›and ๐‘ค โˆˆ 0, 1๐‘š.

We can then use a simple "hardwiring" technique, reminiscent of Remark 9.11, to map Q′ into a circuit/NAND-CIRC program Q on m inputs such that Q(w) = Q′(x*w) for every w ∈ {0,1}^m.

CLAIM: There is a polynomial-time algorithm that on input a NAND-CIRC program Q′ on n + m inputs and x* ∈ {0,1}^n, outputs a NAND-CIRC program Q such that for every w ∈ {0,1}^m, Q(w) = Q′(x*w).

PROOF OF CLAIM: We can do so by adding a few lines to ensure that the variables zero and one are 0 and 1 respectively, and then simply replacing any reference in Q′ to an input x_i with i ∈ [n] with the corresponding value based on x*_i. See Fig. 15.5 for an implementation of this reduction in Python.

Our final reduction maps an input x* into the NAND-CIRC program Q obtained above. By the above discussion, this reduction runs in polynomial time. Since we know that F(x*) = 1 if and only if there exists w ∈ {0,1}^m such that P*(x*w) = 1, this means that F(x*) = 1 if and only if NANDSAT(Q) = 1, which is what we wanted to prove.
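The book's Fig. 15.5 gives the actual Python implementation of this hardwiring step; the following is an independent minimal sketch of the same idea, using a made-up list-of-triples program format (the variable names and helper functions are illustrative assumptions, not the book's representation):

```python
def hardwire(prog, xstar):
    """Given prog on n+m inputs (named X[0..n+m-1], n = len(xstar))
    and x* in {0,1}^n, return a program Q on m >= 1 inputs with
    Q(w) = prog(x* w)."""
    n = len(xstar)
    # Three extra lines pin down the constants one and zero:
    out = [("notx", "X[0]", "X[0]"),   # NOT of the first remaining input
           ("one",  "X[0]", "notx"),   # NAND(b, NOT b) = 1 for any bit b
           ("zero", "one",  "one")]    # NAND(1, 1) = 0
    def sub(v):                        # rewrite a variable reference
        if v.startswith("X["):
            i = int(v[2:-1])
            if i < n:                  # hardwired input bit
                return "one" if xstar[i] else "zero"
            return "X[%d]" % (i - n)   # shift the remaining inputs
        return v
    out += [(t, sub(a), sub(b)) for (t, a, b) in prog]
    return out

def evaluate(prog, w):                 # straight-line evaluation
    env = {"X[%d]" % i: b for i, b in enumerate(w)}
    for t, a, b in prog:
        env[t] = 1 - env[a] * env[b]   # NAND
    return env["Y[0]"]

# XOR of two bits with the first bit hardwired to 1, so Q computes NOT:
xor = [("u", "X[0]", "X[1]"), ("v", "X[0]", "u"),
       ("w", "X[1]", "u"), ("Y[0]", "v", "w")]
Q = hardwire(xor, [1])
assert evaluate(Q, [0]) == 1 and evaluate(Q, [1]) == 0
```

Note that, matching the figure's caption, the transformation only adds three lines and replaces variable references, so it clearly runs in polynomial time.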

Figure 15.5: Given a T-line NAND-CIRC program Q that has n + m inputs and some x* ∈ {0,1}^n, we can transform Q into a T + 3 line NAND-CIRC program Q′ that computes the map w ↦ Q(x*w) for w ∈ {0,1}^m by simply adding code to compute the zero and one constants, replacing all references to X[i] with either zero or one depending on the value of x*_i, and then replacing the remaining references to X[j] with X[j − n]. Above is Python code that implements this transformation, as well as an example of its execution on a simple program.

15.4 THE 3NAND PROBLEM

The 3NAND problem is defined as follows:

• The input is a logical formula Ψ on a set of variables z_0, …, z_{r−1} which is an AND of constraints of the form z_i = NAND(z_j, z_k).


• The output is 1 if and only if there is an input z ∈ {0,1}^r that satisfies all of the constraints.

For example, the following is a 3NAND formula with 5 variables and 3 constraints:

ฮจ = (๐‘ง3 = NAND(๐‘ง0, ๐‘ง2))โˆง(๐‘ง1 = NAND(๐‘ง0, ๐‘ง2))โˆง(๐‘ง4 = NAND(๐‘ง3, ๐‘ง1)) .(15.4)

In this case 3NAND(Ψ) = 1 since the assignment z = 01010 satisfies it. Given a 3NAND formula Ψ on r variables and an assignment z ∈ {0,1}^r, we can check in polynomial time whether Ψ(z) = 1, and hence 3NAND ∈ NP. We now prove that 3NAND is NP hard:

Lemma 15.9 NANDSAT ≤p 3NAND.
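As a quick sanity check, here is a short Python sketch that evaluates a 3NAND formula on a given assignment; representing the formula as a list of index triples is an illustrative choice of ours:

```python
def eval_3nand(constraints, z):
    """Check whether assignment z satisfies every constraint
    (i, j, k), read as z_i = NAND(z_j, z_k)."""
    return all(z[i] == 1 - z[j] * z[k] for (i, j, k) in constraints)

# The formula Psi from Eq. (15.4), as index triples:
psi = [(3, 0, 2), (1, 0, 2), (4, 3, 1)]
assert eval_3nand(psi, [0, 1, 0, 1, 0])       # z = 01010 satisfies Psi
assert not eval_3nand(psi, [1, 1, 1, 1, 1])   # but z = 11111 does not
```

The check runs in time linear in the number of constraints, which is the polynomial-time verification placing 3NAND in NP.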

Proof Idea:

To prove Lemma 15.9 we need to give a polynomial-time map from every NAND-CIRC program Q to a 3NAND formula Ψ such that there exists w such that Q(w) = 1 if and only if there exists z satisfying Ψ. For every line i of Q, we define a corresponding variable z_i of Ψ. If the line i has the form foo = NAND(bar,blah) then we will add the clause z_i = NAND(z_j, z_k) where j and k are the last lines in which bar and blah were written to. We will also set variables corresponding to the input variables, as well as add a clause to ensure that the final output is 1. The resulting reduction can be implemented in about a dozen lines of Python, see Fig. 15.6. ⋆

Figure 15.6: Python code to reduce an instance Q of NANDSAT to an instance Ψ of 3NAND. In the example above we transform the NAND-CIRC program xor5, which has 5 input variables and 16 lines, into a 3NAND formula Ψ that has 24 variables and 20 clauses. Since xor5 outputs 1 on the input 1, 0, 0, 1, 1, there exists an assignment z ∈ {0,1}^24 to Ψ such that (z_0, z_1, z_2, z_3, z_4) = (1, 0, 0, 1, 1) and Ψ evaluates to true on z.


Proof of Lemma 15.9. To prove Lemma 15.9 we need to give a reduction from NANDSAT to 3NAND. Let Q be a NAND-CIRC program with n inputs, one output, and m lines. We can assume without loss of generality that Q contains the variables one and zero as usual.

We map ๐‘„ to a 3NAND formula ฮจ as follows:

• Ψ has m + n variables z_0, …, z_{m+n−1}.

• The first n variables z_0, …, z_{n−1} will correspond to the inputs of Q. The next m variables z_n, …, z_{n+m−1} will correspond to the m lines of Q.

• For every ℓ ∈ {n, n + 1, …, n + m − 1}, if the (ℓ − n)-th line of the program Q is foo = NAND(bar,blah) then we add to Ψ the constraint z_ℓ = NAND(z_j, z_k) where j − n and k − n correspond to the last lines in which the variables bar and blah (respectively) were written to. If one or both of bar and blah was not written to before then we use z_{ℓ_0} instead of the corresponding value z_j or z_k in the constraint, where ℓ_0 − n is the line in which zero is assigned a value. If one or both of bar and blah is an input variable X[i] then we use z_i in the constraint.

• Let ℓ* be the last line in which the output y_0 is assigned a value. Then we add the constraint z_{ℓ*} = NAND(z_{ℓ_0}, z_{ℓ_0}) where ℓ_0 − n is as above the last line in which zero is assigned a value. Note that this is effectively the constraint z_{ℓ*} = NAND(0, 0) = 1.

To complete the proof we need to show that there exists w ∈ {0,1}^n s.t. Q(w) = 1 if and only if there exists z ∈ {0,1}^{n+m} that satisfies all constraints in Ψ. We now show both sides of this equivalence.

Part I: Completeness. Suppose that there is w ∈ {0,1}^n s.t. Q(w) = 1. Let z ∈ {0,1}^{n+m} be defined as follows: for i ∈ [n], z_i = w_i, and for i ∈ {n, n + 1, …, n + m − 1}, z_i equals the value that is assigned in the (i − n)-th line of Q when executed on w. Then by construction z satisfies all of the constraints of Ψ (including the constraint that z_{ℓ*} = NAND(0, 0) = 1, since Q(w) = 1).

Part II: Soundness. Suppose that there exists z ∈ {0,1}^{n+m} satisfying Ψ. Soundness will follow by showing that Q(z_0, …, z_{n−1}) = 1 (and hence in particular there exists w ∈ {0,1}^n, namely w = z_0 ⋯ z_{n−1}, such that Q(w) = 1). To do this we will prove the following claim (∗): for every ℓ ∈ [m], z_{ℓ+n} equals the value assigned in the ℓ-th step of the execution of the program Q on z_0, …, z_{n−1}. Note that because z satisfies the constraints of Ψ, (∗) is sufficient to prove the soundness condition, since these constraints imply that the last value assigned to the variable y_0 in the execution of Q on z_0 ⋯ z_{n−1} is equal to 1. To prove (∗) suppose, towards a contradiction, that it is false, and let ℓ be


the smallest number such that z_{ℓ+n} is not equal to the value assigned in the ℓ-th step of the execution of Q on z_0, …, z_{n−1}. But since z satisfies the constraints of Ψ, we get that z_{ℓ+n} = NAND(z_i, z_j) where (by the assumption above that ℓ is smallest with this property) these values do correspond to the values last assigned to the variables on the right-hand side of the assignment operator in the ℓ-th line of the program. But this means that the value assigned in the ℓ-th step is indeed simply the NAND of z_i and z_j, contradicting our assumption on the choice of ℓ.

Figure 15.7: A 3NAND instance that is obtained by taking a NAND-TM program for computing the AND function, unrolling it to obtain a NANDSAT instance, and then composing it with the reduction of Lemma 15.9.
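Fig. 15.6 contains the book's Python implementation of this reduction; the following is a simplified sketch of the same idea, restricted (for brevity) to programs in which every variable is written before it is read. The triple-based program format is a hypothetical stand-in for the book's representation:

```python
from itertools import product

def nandsat_to_3nand(prog, n):
    """Sketch of the reduction of Lemma 15.9. prog is a list of
    (target, a, b) triples with inputs X[0..n-1], output Y[0], and
    (as the proof assumes) a line assigning the variable zero.
    Returns constraints as triples (i, j, k): z_i = NAND(z_j, z_k)."""
    cons = []
    idx = {"X[%d]" % i: i for i in range(n)}    # variable -> z-index
    for l, (target, a, b) in enumerate(prog):
        cons.append((n + l, idx[a], idx[b]))
        idx[target] = n + l                     # last line writing target
    # Force the output to 1: z_out = NAND(zero, zero) = 1.
    cons.append((idx["Y[0]"], idx["zero"], idx["zero"]))
    return cons

# A NOT gate preceded by the standard one/zero preamble:
prog = [("t", "X[0]", "X[0]"), ("one", "X[0]", "t"),
        ("zero", "one", "one"), ("Y[0]", "X[0]", "X[0]")]
cons = nandsat_to_3nand(prog, 1)
# Psi is satisfiable, matching the fact that Q(0) = NOT(0) = 1:
sat = any(all(z[i] == 1 - z[j] * z[k] for (i, j, k) in cons)
          for z in product([0, 1], repeat=1 + len(prog)))
assert sat
```

The full reduction in the text also handles variables read before being written (by substituting z_{ℓ_0}); this sketch omits that case to stay short.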

15.5 FROM 3NAND TO 3SAT

The final step in the proof of Theorem 15.6 is the following:

Lemma 15.10 3NAND ≤p 3SAT.

Proof Idea:

To prove Lemma 15.10 we need to map a 3NAND formula φ into a 3SAT formula ψ such that φ is satisfiable if and only if ψ is. The idea is that we can transform every NAND constraint of the form a = NAND(b, c) into the AND of ORs involving the variables a, b, c and their negations, where each of the ORs contains at most three terms. The construction is fairly straightforward, and the details are given below. ⋆

P
It is a good exercise for you to try to find a 3CNF formula ξ on three variables a, b, c such that ξ(a, b, c) is true if and only if a = NAND(b, c). Once you do so, try to see why this implies a reduction from 3NAND to 3SAT, and hence completes the proof of Lemma 15.10.

Figure 15.8: Code and example output for the reduction given in Lemma 15.10 of 3NAND to 3SAT.


1 The resulting formula will have some of the ORs involving only two variables. If we wanted to insist on each formula involving three distinct variables we can always add a "dummy variable" z_{n+m} and include it in all the ORs involving only two variables, and add a constraint requiring this dummy variable to be zero.

Proof of Lemma 15.10. The constraint

z_i = NAND(z_j, z_k)   (15.5)

is satisfied if z_i = 1 whenever (z_j, z_k) ≠ (1, 1), and z_i = 0 when z_j = z_k = 1. By going through all cases, we can verify that (15.5) is equivalent to the constraint

(¬z_i ∨ ¬z_j ∨ ¬z_k) ∧ (z_i ∨ z_j) ∧ (z_i ∨ z_k) .   (15.6)

Indeed, if z_j = z_k = 1 then the first constraint of Eq. (15.6) is only true if z_i = 0. On the other hand, if either of z_j or z_k equals 0 then unless z_i = 1 either the second or third constraints will fail. This means that, given any 3NAND formula φ over n variables z_0, …, z_{n−1}, we can obtain a 3SAT formula ψ over the same variables by replacing every 3NAND constraint of φ with three 3OR constraints as in Eq. (15.6).1 Because of the equivalence of (15.5) and (15.6), the formula ψ satisfies that ψ(z_0, …, z_{n−1}) = φ(z_0, …, z_{n−1}) for every assignment z_0, …, z_{n−1} ∈ {0,1}^n to the variables. In particular ψ is satisfiable if and only if φ is, thus completing the proof.
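The "going through all cases" step can be checked mechanically; this Python snippet brute-forces all eight assignments to confirm that the three clauses of Eq. (15.6) hold exactly when z_i = NAND(z_j, z_k):

```python
from itertools import product

# Exhaustive check of the equivalence between (15.5) and (15.6).
for zi, zj, zk in product([0, 1], repeat=3):
    nand_holds = (zi == 1 - zj * zk)            # z_i = NAND(z_j, z_k)
    clauses_hold = bool(((1 - zi) or (1 - zj) or (1 - zk))
                        and (zi or zj) and (zi or zk))
    assert nand_holds == clauses_hold
```

Running the loop without an assertion failure is exactly the case analysis the proof appeals to.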

Figure 15.9: An instance of the independent set problem obtained by applying the reductions NANDSAT ≤p 3NAND ≤p 3SAT ≤p ISET starting with the xor5 NAND-CIRC program.

15.6 WRAPPING UP

We have shown that for every function F in NP, F ≤p NANDSAT ≤p 3NAND ≤p 3SAT, and so 3SAT is NP-hard. Since in Chapter 14 we saw that 3SAT ≤p QUADEQ, 3SAT ≤p ISET, 3SAT ≤p MAXCUT and 3SAT ≤p LONGPATH, all these problems are NP-hard as well. Finally, since all the aforementioned problems are in NP, they are all in fact NP-complete and have equivalent complexity. There are thousands of other natural problems that are NP-complete as well. Finding a polynomial-time algorithm for any one of them will imply a polynomial-time algorithm for all of them.


Figure 15.10: We believe that P ≠ NP and all NP-complete problems lie outside of P, but we cannot rule out the possibility that P = NP. However, we can rule out the possibility that some NP-complete problems are in P and others are not, since we know that if even one NP-complete problem is in P then P = NP. The relation between P/poly and NP is not known, though it can be shown that if one NP-complete problem is in P/poly then NP ⊆ P/poly.

2 Hint: Use the function F that on input a formula φ and a string of the form 1^t, outputs 1 if and only if φ is satisfiable and t = |φ|^{log |φ|}.

3 Hint: Prove and then use the fact that P is closed under complement.

Chapter Recap

• Many of the problems for which we don't know polynomial-time algorithms are NP-complete, which means that finding a polynomial-time algorithm for one of them would imply a polynomial-time algorithm for all of them.

• It is conjectured that NP ≠ P, which means that we believe that polynomial-time algorithms for these problems are not merely unknown but are nonexistent.

• While an NP-hardness result means for example that a full-fledged "textbook" solution to a problem such as MAX-CUT that is as clean and general as the algorithm for MIN-CUT probably does not exist, it does not mean that we need to give up whenever we see a MAX-CUT instance. Later in this course we will discuss several strategies to deal with NP-hardness, including average-case complexity and approximation algorithms.

15.7 EXERCISES

Exercise 15.1 — Poor man's Ladner's Theorem. Prove that if there is no n^{O(log² n)} time algorithm for 3SAT then there is some F ∈ NP such that F ∉ P and F is not NP complete.²

Exercise 15.2 — NP ≠ coNP ⇒ NP ≠ P. Let ¬3SAT be the function that on input a 3CNF formula φ returns 1 − 3SAT(φ). Prove that if ¬3SAT ∉ NP then P ≠ NP. See footnote for hint.³

Exercise 15.3 Define WSAT to be the following function: the input is a CNF formula φ where each clause is the OR of one to three variables (without negations), and a number k ∈ ℕ. For example, the following formula can be used for a valid input to WSAT: φ = (x_5 ∨ x_2 ∨ x_1) ∧ (x_1 ∨ x_3 ∨ x_0) ∧ (x_2 ∨ x_4 ∨ x_0). The output WSAT(φ, k) = 1 if and only if there exists a satisfying assignment to φ in which exactly k of the variables get the value 1. For example for the formula above


WSAT(φ, 2) = 1 since the assignment (1, 1, 0, 0, 0, 0) satisfies all the clauses. However WSAT(φ, 1) = 0 since there is no single variable appearing in all clauses.
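A short Python check of this example; representing each clause as a triple of variable indices is an illustrative choice of ours:

```python
from itertools import product

def wsat_check(clauses, assignment, k):
    """Verify a WSAT witness: every clause (a triple of variable
    indices; the CNF is monotone, with no negations) must contain a
    variable set to 1, and exactly k variables are set to 1."""
    return (sum(assignment) == k and
            all(any(assignment[i] for i in c) for c in clauses))

phi = [(5, 2, 1), (1, 3, 0), (2, 4, 0)]          # the formula above
assert wsat_check(phi, [1, 1, 0, 0, 0, 0], 2)    # so WSAT(phi, 2) = 1
# And no weight-1 assignment works, so WSAT(phi, 1) = 0:
assert not any(wsat_check(phi, list(a), 1)
               for a in product([0, 1], repeat=6))
```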

Prove that WSAT is NP-complete.

Exercise 15.4 In the employee recruiting problem we are given a list of potential employees, each of whom has some subset of m potential skills, and a number k. We need to assemble a team of k employees such that for every skill there would be one member of the team with this skill.

For example, if Alice has the skills "C programming", "NAND programming" and "Solving Differential Equations", Bob has the skills "C programming" and "Solving Differential Equations", and Charlie has the skills "NAND programming" and "Coffee Brewing", then if we want a team of two people that covers all the four skills, we would hire Alice and Charlie.

Define the function EMP s.t. on input the skills L of all potential employees (in the form of a sequence L of n lists L_1, …, L_n, each containing distinct numbers between 0 and m), and a number k, EMP(L, k) = 1 if and only if there is a subset S of k potential employees such that for every skill j in [m], there is an employee in S that has the skill j.

Prove that EMP is NP complete.

Exercise 15.5 — Balanced max cut. Prove that the "balanced variant" of the maximum cut problem is NP-complete, where this is defined as BMC : {0,1}* → {0,1} where for every graph G = (V, E) and k ∈ ℕ, BMC(G, k) = 1 if and only if there exists a cut S in G cutting at least k edges such that |S| = |V|/2.

Exercise 15.6 — Regular expression intersection. Let MANYREGS be the following function: On input a list of regular expressions exp_0, …, exp_m (represented as strings in some standard way), output 1 if and only if there is a single string x ∈ {0,1}* that matches all of them. Prove that MANYREGS is NP-hard.

15.8 BIBLIOGRAPHICAL NOTES

Aaronson's 120-page survey [Aar16] is a beautiful and extensive exposition of the P vs NP problem, its importance and status. See also Chapter 3 in Wigderson's excellent book [Wig19]. Johnson [Joh12] gives a survey of the historical development of the theory of NP completeness. The following web page keeps a catalog of failed


attempts at settling P vs NP. At the time of this writing, it lists about 110 papers claiming to resolve the question, of which about 60 claim to prove that P = NP and about 50 claim to prove that P ≠ NP.

Eugene Lawler's quote on the "mystical power of twoness" was taken from the wonderful book "The Nature of Computation" by Moore and Mertens. See also this memorial essay on Lawler by Lenstra.


1 Paul Erdős (1913-1996) was one of the most prolific mathematicians of all times. Though he was an atheist, Erdős often referred to "The Book" in which God keeps the most elegant proof of each mathematical theorem.

2 The k-th Ramsey number, denoted as R(k, k), is the smallest number n such that for every graph G on n vertices, both G and its complement contain a k-sized independent set. If P = NP then we can compute R(k, k) in time polynomial in 2^k, while otherwise it can potentially take closer to 2^{2^{2k}} steps.

16 What if P equals NP?

"You don't have to believe in God, but you should believe in The Book.", Paul Erdős, 1985.¹

"No more half measures, Walter", Mike Ehrmantraut in "Breaking Bad", 2010.

"The evidence in favor of [P ≠ NP] and [its algebraic counterpart] is so overwhelming, and the consequences of their failure are so grotesque, that their status may perhaps be compared to that of physical laws rather than that of ordinary mathematical conjectures.", Volker Strassen, laudation for Leslie Valiant, 1986.

"Suppose aliens invade the earth and threaten to obliterate it in a year's time unless human beings can find the [fifth Ramsey number]. We could marshal the world's best minds and fastest computers, and within a year we could probably calculate the value. If the aliens demanded the [sixth Ramsey number], however, we would have no choice but to launch a preemptive attack.", Paul Erdős, as quoted by Graham and Spencer, 1990.²

We have mentioned that the question of whether P = NP, which is equivalent to whether there is a polynomial-time algorithm for 3SAT, is the great open question of Computer Science. But why is it so important? In this chapter, we will try to figure out the implications of such an algorithm.

First, let us get one qualm out of the way. Sometimes people say, "What if P = NP but the best algorithm for 3SAT takes n^1000 time?" Well, n^1000 is much larger than, say, 2^{0.001√n} for any input smaller than 2^50, as large as a hard drive as you will encounter, and so another way to phrase this question is to say "what if the complexity of 3SAT is exponential for all inputs that we will ever encounter, but then grows much smaller than that?" To me this sounds like the computer science equivalent of asking, "what if the laws of physics change completely once they are out of the range of our telescopes?". Sure, this is a valid possibility, but wondering about it does not sound like the most productive use of our time.


Learning Objectives:
• Explore the consequences of P = NP
• Search-to-decision reduction: transform algorithms that solve the decision version to the search version for NP-complete problems.
• Optimization and learning problems
• Quantifier elimination and solving problems in the polynomial hierarchy.
• What is the evidence for P = NP vs P ≠ NP?


So, as the saying goes, we'll keep an open mind, but not so open that our brains fall out, and assume from now on that:

โ€ข There is a mathematical god,

and

• She does not "beat around the bush" or take "half measures".

What we mean by this is that we will consider two extreme scenarios:

• 3SAT is very easy: 3SAT has an O(n) or O(n^2) time algorithm with a not too huge constant (say smaller than 10^6).

• 3SAT is very hard: 3SAT is exponentially hard and cannot be solved faster than 2^{εn} for some not too tiny ε > 0 (say at least 10^{−6}). We can even make the stronger assumption that for every sufficiently large n, the restriction of 3SAT to inputs of length n cannot be computed by a circuit of fewer than 2^{εn} gates.

At the time of writing, the fastest known algorithm for 3SAT requires more than 2^{0.35n} steps to solve n variable formulas, while we do not even know how to rule out the possibility that we can compute 3SAT using 10n gates. To put it in perspective, for the case n = 1000 our lower and upper bounds for the computational costs are apart by a factor of about 10^100. As far as we know, it could be the case that 1000-variable 3SAT can be solved in a millisecond on a first-generation iPhone, and it can also be the case that such instances require more than the age of the universe to solve on the world's fastest supercomputer.

So far, most of our evidence points to the latter possibility of 3SAT being exponentially hard, but we have not ruled out the former possibility either. In this chapter we will explore some of the consequences of the "3SAT easy" scenario.

16.1 SEARCH-TO-DECISION REDUCTION

A priori, having a fast algorithm for 3SAT might not seem so impressive. Sure, such an algorithm allows us to decide the satisfiability of not just 3CNF formulas but also of quadratic equations, as well as find out whether there is a long path in a graph, and solve many other decision problems. But this is not typically what we want to do. It's not enough to know if a formula is satisfiable: we want to discover the actual satisfying assignment. Similarly, it's not enough to find out if a graph has a long path: we want to actually find the path.

It turns out that if we can solve these decision problems, we can solve the corresponding search problems as well:


Theorem 16.1 — Search vs Decision. Suppose that P = NP. Then for every polynomial-time algorithm V and a, b ∈ ℕ, there is a polynomial-time algorithm FIND_V such that for every x ∈ {0,1}^n, if there exists y ∈ {0,1}^{an^b} satisfying V(xy) = 1, then FIND_V(x) finds some string y′ satisfying this condition.

P
To understand what the statement of Theorem 16.1 means, let us look at the special case of the MAXCUT problem. It is not hard to see that there is a polynomial-time algorithm VERIFYCUT such that VERIFYCUT(G, k, S) = 1 if and only if S is a subset of G's vertices that cuts at least k edges. Theorem 16.1 implies that if P = NP then there is a polynomial-time algorithm FINDCUT that on input G, k outputs a set S such that VERIFYCUT(G, k, S) = 1 if such a set exists. This means that if P = NP, by trying all values of k we can find in polynomial time a maximum cut in any given graph. We can use a similar argument to show that if P = NP then we can find a satisfying assignment for every satisfiable 3CNF formula, find the longest path in a graph, solve integer programming, and so on and so forth.

Proof Idea:

The idea behind the proof of Theorem 16.1 is simple; let us demonstrate it for the special case of 3SAT. (In fact, this case is not so "special": since 3SAT is NP-complete, we can reduce the task of solving the search problem for MAXCUT or any other problem in NP to the task of solving it for 3SAT.) Suppose that P = NP and we are given a satisfiable 3CNF formula φ, and we now want to find a satisfying assignment y for φ. Define 3SAT_0(φ) to output 1 if there is a satisfying assignment y for φ such that its first bit is 0, and similarly define 3SAT_1(φ) = 1 if there is a satisfying assignment y with y_0 = 1. The key observation is that both 3SAT_0 and 3SAT_1 are in NP, and so if P = NP then we can compute them in polynomial time as well. Thus we can use this to find the first bit of the satisfying assignment. We can continue in this way to recover all the bits. ⋆

Proof of Theorem 16.1. Let V be some polynomial-time algorithm and a, b ∈ ℕ some constants. Define the function STARTSWITH_V as follows: For every x ∈ {0,1}* and z ∈ {0,1}*, STARTSWITH_V(x, z) = 1 if and only if there exists some y ∈ {0,1}^{an^b − |z|} (where n = |x|) such that V(xzy) = 1. That is, STARTSWITH_V(x, z) outputs 1 if there is some string w of length a|x|^b such that V(xw) = 1 and the first |z|


bits of w are z_0, …, z_{ℓ−1} (where ℓ = |z|). Since, given x, y, z as above, we can check in polynomial time if V(xzy) = 1, the function STARTSWITH_V is in NP and hence if P = NP we can compute it in polynomial time.

Now for every such polynomial-time V and a, b ∈ ℕ, we can implement FIND_V(x) as follows:

Algorithm 16.2 — FIND_V: Search to decision reduction.

Input: x ∈ {0,1}^n
Output: z ∈ {0,1}^{an^b} s.t. V(xz) = 1, if such z exists. Otherwise output the empty string.

1: Initially z_0 = z_1 = ⋯ = z_{an^b−1} = 0.
2: for ℓ = 0, …, an^b − 1 do
3:   Let b_0 ← STARTSWITH_V(x, z_0 ⋯ z_{ℓ−1}0).
4:   Let b_1 ← STARTSWITH_V(x, z_0 ⋯ z_{ℓ−1}1).
5:   if b_0 = b_1 = 0 then
6:     return ""
7:     # Can't extend z_0 … z_{ℓ−1} to an input V accepts
8:   end if
9:   if b_0 = 1 then
10:    z_ℓ ← 0
11:    # Can extend z_0 … z_{ℓ−1} with 0 to an accepting input
12:  else
13:    z_ℓ ← 1
14:    # Can extend z_0 … z_{ℓ−1} with 1 to an accepting input
15:  end if
16: end for
17: return z_0, …, z_{an^b−1}

To analyze Algorithm 16.2, note that it makes 2an^b invocations to STARTSWITH_V and hence if the latter is polynomial-time, then so is Algorithm 16.2. Now suppose that x is such that there exists some y satisfying V(xy) = 1. We claim that at every step ℓ = 0, …, an^b − 1, we maintain the invariant that there exists y ∈ {0,1}^{an^b} whose first ℓ bits are z s.t. V(xy) = 1. Note that this claim implies the theorem, since in particular it means that for ℓ = an^b − 1, z satisfies V(xz) = 1.

We prove the claim by induction. For ℓ = 0, it holds vacuously. Now for every ℓ > 0, if the call STARTSWITH_V(x, z_0 ⋯ z_(ℓ−1) 0) returns 1, then we are guaranteed the invariant by the definition of STARTSWITH_V. Otherwise, by our inductive hypothesis there are y_ℓ, …, y_(an^b − 1) such that V(x z_0 ⋯ z_(ℓ−1) y_ℓ ⋯ y_(an^b − 1)) = 1. If the call to STARTSWITH_V(x, z_0 ⋯ z_(ℓ−1) 0) returns 0, then it must be the case that y_ℓ = 1, and hence when we set z_ℓ = 1 we maintain the invariant. ∎
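The bit-by-bit recovery of Algorithm 16.2 can be sketched in Python. This is only an illustration: `startswith` plays the role of STARTSWITH_V, and since we of course do not have a polynomial-time routine for it, we realize it here by brute-force enumeration over the remaining bits (so the sketch runs in exponential time). The toy verifier `V` at the bottom is our own invention for the demonstration.

```python
from itertools import product

def find_witness(V, x, m):
    """Search-to-decision: recover z with V(x, z) == 1, one bit at a time.

    `startswith(prefix)` decides whether some witness of length m extends
    `prefix`; under P = NP this decision would take polynomial time, but
    here it is simulated by brute force.
    """
    def startswith(prefix):
        rest = m - len(prefix)
        return any(V(x, prefix + list(y)) for y in product([0, 1], repeat=rest))

    z = []
    for _ in range(m):
        if startswith(z + [0]):
            z.append(0)
        elif startswith(z + [1]):
            z.append(1)
        else:
            return None  # no witness extends the current prefix
    return z

# Toy verifier: accepts z iff z is the bitwise complement of x.
V = lambda x, z: all(zi == 1 - xi for xi, zi in zip(x, z))
print(find_witness(V, [1, 0, 1], 3))  # -> [0, 1, 0]
```

Note that, exactly as in the proof, the searcher only ever asks yes/no questions about prefixes, never inspecting a witness directly.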


16.2 OPTIMIZATION

Theorem 16.1 allows us to find solutions for NP problems if P = NP, but it is not immediately clear that we can find the optimal solution. For example, suppose that P = NP, and you are given a graph G. Can you find the longest simple path in G in polynomial time?

P: This is actually an excellent question for you to attempt on your own. That is, assuming P = NP, give a polynomial-time algorithm that on input a graph G, outputs a maximally long simple path in the graph G.

The answer is Yes. The idea is simple: if P = NP then we can find out in polynomial time if an n-vertex graph G contains a simple path of length n, and moreover, by Theorem 16.1, if G does contain such a path, then we can find it. (Can you see why?) If G does not contain a simple path of length n, then we will check if it contains a simple path of length n − 1, and continue in this way to find the largest k such that G contains a simple path of length k.

The above reasoning was not specifically tailored to finding paths in graphs. In fact, it can be vastly generalized to prove the following result:

Theorem 16.3 — Optimization from P = NP. Suppose that P = NP. Then for every polynomial-time computable function f : {0,1}* → ℕ (identifying f(x) with natural numbers via the binary representation) there is a polynomial-time algorithm OPT such that on input x ∈ {0,1}* and 1^m,

OPT(x, 1^m) = max_(y ∈ {0,1}^m) f(x, y) .   (16.1)

Moreover, under the same assumption there is a polynomial-time algorithm FINDOPT such that for every x ∈ {0,1}*, FINDOPT(x, 1^m) outputs y* ∈ {0,1}* such that f(x, y*) = max_(y ∈ {0,1}^m) f(x, y).

P: The statement of Theorem 16.3 is a bit cumbersome. To understand it, think how it would subsume the example above of a polynomial-time algorithm for finding the maximum-length path in a graph. In this case the function f would be the map that on input a pair (x, y) outputs 0 if the pair does not represent a graph together with a simple path inside that graph; otherwise f(x, y) would equal the length of the path y in the graph x. Since a path in an n-vertex graph can be represented by at most n log n bits, for every x representing a graph of n vertices, finding max_(y ∈ {0,1}^(n log n)) f(x, y) corresponds to finding the length of the maximum simple path in the graph corresponding to x, and finding the string y* that achieves this maximum corresponds to actually finding the path.

Proof Idea:

The proof follows by generalizing our ideas from the longest-path example above. Let f be as in the theorem statement. If P = NP then for every string x ∈ {0,1}* and number k, we can test in poly(|x|, m) time whether there exists y such that f(x, y) ≥ k, or in other words test whether max_(y ∈ {0,1}^m) f(x, y) ≥ k. If f(x, y) is an integer between 0 and poly(|x| + |y|) (as is the case in the example of longest path), then we can just try out all possibilities for k to find the maximum number k for which max_y f(x, y) ≥ k. Otherwise, we can use binary search to home in on the right value. Once we do so, we can use search-to-decision to actually find the string y* that achieves the maximum.

Proof of Theorem 16.3. For every f as in the theorem statement, we can define the Boolean function F : {0,1}* → {0,1} as follows:

๐น(๐‘ฅ, 1๐‘š, ๐‘˜) = โŽงโŽจโŽฉ1 โˆƒ๐‘ฆโˆˆ0,1๐‘š๐‘“(๐‘ฅ, ๐‘ฆ) โ‰ฅ ๐‘˜0 otherwise(16.2)

Since ๐‘“ is computable in polynomial time, ๐น is in NP, and so underour assumption that P = NP, ๐น itself can be computed in polynomialtime. Now, for every ๐‘ฅ and ๐‘š, we can compute the largest ๐‘˜ such that๐น(๐‘ฅ, 1๐‘š, ๐‘˜) = 1 by a binary search. Specifically, we will do this asfollows:

1. We maintain two numbers a, b such that we are guaranteed that a ≤ max_(y ∈ {0,1}^m) f(x, y) < b.

2. Initially we set a = 0 and b = 2^(T(n)), where T(n) is the running time of f. (A function with T(n) running time can't output more than T(n) bits and so can't output a number larger than 2^(T(n)).)

3. At each point in time, we compute the midpoint c = ⌊(a + b)/2⌋ and let y = F(x, 1^m, c).

   a. If y = 1 then we set a = c and leave b as it is.

   b. If y = 0 then we set b = c and leave a as it is.

4. We then go back to step 3, until b ≤ a + 1.


Since |๐‘ โˆ’ ๐‘Ž| shrinks by a factor of 2, within log2 2๐‘‡ (๐‘›) = ๐‘‡ (๐‘›)steps, we will get to the point at which ๐‘ โ‰ค ๐‘Ž + 1, and then we cansimply output ๐‘Ž. Once we find the maximum value of ๐‘˜ such that๐น(๐‘ฅ, 1๐‘š, ๐‘˜) = 1, we can use the search to decision reduction of Theo-rem 16.1 to obtain the actual value ๐‘ฆโˆ— โˆˆ 0, 1๐‘š such that ๐‘“(๐‘ฅ, ๐‘ฆโˆ—) = ๐‘˜.

Example 16.4 — Integer programming. One application of Theorem 16.3 is in solving optimization problems. For example, the task of linear programming is to find y ∈ ℝ^n that maximizes some linear objective ∑_(i=0)^(n−1) c_i y_i subject to the constraint that y satisfies linear inequalities of the form ∑_(i=0)^(n−1) a_i y_i ≤ b. As we discussed in Section 12.1.3, there is a known polynomial-time algorithm for linear programming. However, if we want to place additional constraints on y, such as requiring the coordinates of y to be integer or 0/1 valued, then the best-known algorithms run in exponential time in the worst case. However, if P = NP then Theorem 16.3 tells us that we would be able to solve all problems of this form in polynomial time. For every string x that describes a set of constraints and an objective, we will define a function f such that if y satisfies the constraints of x then f(x, y) is the value of the objective, and otherwise we set f(x, y) = −M where M is some large number. We can then use Theorem 16.3 to compute the y that maximizes f(x, y), and that will give us the assignment for the variables that satisfies our constraints and maximizes the objective. (If the computation results in y such that f(x, y) = −M then we can double M and try again; if the true maximum objective is achieved by some string y*, then eventually M will be large enough so that −M is smaller than the objective achieved by y*, and hence when we run the procedure of Theorem 16.3 we will get a value larger than −M.)

Remark 16.5 — Need for binary search. In many examples, such as the case of finding the longest path, we don't need the binary search step in Theorem 16.3, and can simply enumerate over all possible values for k until we find the correct one. One example where we do need the binary search step is the problem of finding a maximum-weight path in a weighted graph. This is the problem where G is a weighted graph, every edge of G is given a weight which is a number between 0 and 2^k, and we want to find the simple path in G maximizing the sum of the weights of its edges. Theorem 16.3 shows that we can find this maximum-weight simple path in time polynomial in the number of vertices and in k.

Beyond just this example, there is a vast field of mathematical optimization that studies problems of the same form as in Theorem 16.3. In the context of optimization, x typically denotes a set of constraints over some variables (that can be Boolean, integer, or real valued), y encodes an assignment to these variables, and f(x, y) is the value of some objective function that we want to maximize. Given that we don't know efficient algorithms for NP-complete problems, researchers in optimization study special cases of functions f (such as linear programming and semidefinite programming) where it is possible to optimize the value efficiently. Optimization is widely used in a great many scientific areas including machine learning, engineering, economics and operations research.

[3] This is often known as Empirical Risk Minimization.

16.2.1 Example: Supervised learning

One classical optimization task is supervised learning. In supervised learning we are given a list of examples x_0, x_1, …, x_(m−1) (where we can think of each x_i as a string in {0,1}^n for some n) and the labels for them y_0, …, y_(m−1) (which we will think of as simply bits, i.e., y_i ∈ {0,1}). For example, we can think of the x_i's as images of either dogs or cats, for which y_i = 1 in the former case and y_i = 0 in the latter case. Our goal is to come up with a hypothesis or predictor h : {0,1}^n → {0,1} such that if we are given a new example x that has an (unknown to us) label y, then with high probability h will predict the label, i.e., with high probability it will hold that h(x) = y. The idea in supervised learning is to use Occam's Razor: the simplest hypothesis that explains the data is likely to be correct. There are several ways to model this, but one popular approach is to pick some fairly simple function H : {0,1}^(k+n) → {0,1}. We think of the first k inputs as the parameters and the last n inputs as the example data. (For example, we can think of the first k inputs of H as specifying the weights and connections for some neural network that will then be applied on the latter n inputs.) We can then phrase the supervised learning problem as finding, given a set of labeled examples S = {(x_0, y_0), …, (x_(m−1), y_(m−1))}, the set of parameters θ_0, …, θ_(k−1) ∈ {0,1} that minimizes the number of errors made by the predictor x ↦ H(θ, x).[3]

In other words, for every set S as above we can define the function F_S : {0,1}^k → [m + 1] such that F_S(θ) = ∑_((x,y) ∈ S) |H(θ, x) − y|. Now, finding the value θ that minimizes F_S(θ) is equivalent to solving the supervised learning problem with respect to H. For every polynomial-time computable H : {0,1}^(k+n) → {0,1}, the task of minimizing F_S(θ) can be "massaged" to fit the form of Theorem 16.3, and hence if P = NP then we can solve the supervised learning problem in great generality. In fact, this observation extends to essentially any learning model, and allows for finding the optimal predictor given the minimum number of examples. (This is in contrast to many current learning algorithms, which often rely on having access to an extremely large number of examples, far beyond the minimum needed, and in particular far beyond the number of examples humans use for the same tasks.)
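For concreteness, here is the empirical-error objective F_S for a toy hypothesis class, with the minimization done by brute force over the parameters; this exponential search over θ is exactly the step that Theorem 16.3 would let us do in polynomial time if P = NP. The hypothesis class `H` below is our own invention, chosen only to keep the example tiny.

```python
from itertools import product

def F_S(H, theta, S):
    """Number of examples in S mislabeled by the predictor x -> H(theta, x)."""
    return sum(abs(H(theta, x) - y) for x, y in S)

def erm(H, k, S):
    """Empirical risk minimization: brute-force search over theta in {0,1}^k.
    Under P = NP, Theorem 16.3 would replace this exponential search."""
    return min(product([0, 1], repeat=k), key=lambda th: F_S(H, th, S))

# Toy class: H(theta, x) = 1 iff the first bit of x agrees with theta.
H = lambda theta, x: 1 if x[0] == theta[0] else 0
S = [((0, 1), 1), ((0, 0), 1), ((1, 1), 0)]
best = erm(H, 1, S)
print(best, F_S(H, best, S))  # theta = (0,) classifies all three correctly
```

The choice of hypothesis class is what encodes the "simplicity" prior; the search for the error-minimizing θ is the computationally hard part.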

16.2.2 Example: Breaking cryptosystems

We will discuss cryptography later in this course, but it turns out that if P = NP then almost every cryptosystem can be efficiently broken. One approach is to treat finding an encryption key as an instance of a supervised learning problem. If there is an encryption scheme that maps a "plaintext" message p and a key θ to a "ciphertext" c, then given examples of ciphertext/plaintext pairs of the form (c_0, p_0), …, (c_(m−1), p_(m−1)), our goal is to find the key θ such that E(θ, p_i) = c_i, where E is the encryption algorithm. While you might think getting such "labeled examples" is unrealistic, it turns out (as many amateur home-brew crypto designers learn the hard way) that this is actually quite common in real-life scenarios, and that it is also possible to relax the assumption to having more minimal prior information about the plaintext (e.g., that it is English text). We defer a more formal treatment to Chapter 21.

16.3 FINDING MATHEMATICAL PROOFS

In the context of Gödel's Theorem, we discussed the notion of a proof system (see Section 11.1). Generally speaking, a proof system can be thought of as an algorithm V : {0,1}* → {0,1} (known as the verifier) such that given a statement x ∈ {0,1}* and a candidate proof w ∈ {0,1}*, V(x, w) = 1 if and only if w encodes a valid proof for the statement x. Any type of proof system that is used in mathematics for geometry, number theory, analysis, etc., is an instance of this form. In fact, standard mathematical proof systems have an even simpler form where the proof w encodes a sequence of lines w_0, …, w_m (each of which is itself a binary string) such that each line w_i is either an axiom or follows from some prior lines through an application of some inference rule. For example, Peano's axioms encode a set of axioms and rules for the natural numbers, and one can use them to formalize proofs in number theory. There are also even stronger axiomatic systems, the most popular one being Zermelo–Fraenkel with the Axiom of Choice, or ZFC for short. Thus, although mathematicians typically write their papers in natural language, proofs in number theory can typically be translated to ZFC or similar systems, and so in particular the existence of an n-page proof for a statement x implies that there exists a string w of length poly(n) (in fact often O(n) or O(n^2)) that encodes the proof in such a system. Moreover, because verifying a proof simply involves going over each line and checking that it does indeed follow from the prior lines, it is fairly easy to do so in O(|w|) or O(|w|^2) time, where as usual |w| denotes the length of the proof w. This means that for every reasonable proof system V, the following function SHORTPROOF_V : {0,1}* → {0,1} is in NP: for every input of the form (x, 1^m), SHORTPROOF_V(x, 1^m) = 1 if and only if there exists w ∈ {0,1}* with |w| ≤ m such that V(xw) = 1. That is, SHORTPROOF_V(x, 1^m) = 1 if there is a proof (in the system V) of length at most m bits that x is true. Thus, if P = NP, then despite Gödel's Incompleteness Theorems, we can still automate mathematics in the sense of finding proofs that are not too long for every statement that has one. (Frankly speaking, if the shortest proof for some statement requires a terabyte, then human mathematicians won't ever find this proof either.) For this reason, Gödel himself felt that the question of whether SHORTPROOF_V has a polynomial-time algorithm is of great interest. As Gödel wrote in a letter to John von Neumann in 1956 (before the concept of NP or even "polynomial time" was formally defined):

[4] The undecidability of the Entscheidungsproblem refers to the uncomputability of the function that maps a statement in first order logic to 1 if and only if that statement has a proof.

One can obviously easily construct a Turing machine, which for every formula F in first order predicate logic and every natural number n, allows one to decide if there is a proof of F of length n (length = number of symbols). Let ψ(F, n) be the number of steps the machine requires for this and let φ(n) = max_F ψ(F, n). The question is how fast φ(n) grows for an optimal machine. One can show that φ(n) ≥ k · n [for some constant k > 0]. If there really were a machine with φ(n) ∼ k · n (or even ∼ k · n^2), this would have consequences of the greatest importance. Namely, it would obviously mean that in spite of the undecidability of the Entscheidungsproblem,[4] the mental work of a mathematician concerning Yes-or-No questions could be completely replaced by a machine. After all, one would simply have to choose the natural number n so large that when the machine does not deliver a result, it makes no sense to think more about the problem.

For many reasonable proof systems (including the one that Gödel referred to), SHORTPROOF_V is in fact NP-complete, and so Gödel can be thought of as the first person to formulate the P vs NP question. Unfortunately, the letter was only discovered in 1988.


[5] Since NAND-CIRC programs are equivalent to Boolean circuits, the search problem corresponding to (16.6) is known as the circuit minimization problem and is widely studied in engineering. You can skip ahead to Section 16.4.1 to see a particularly compelling application of this.

16.4 QUANTIFIER ELIMINATION (ADVANCED)

If P = NP then we can solve all NP search and optimization problems in polynomial time. But can we do more? It turns out that the answer is Yes, we can!

An NP decision problem can be thought of as the task of deciding, given some string x ∈ {0,1}*, the truth of a statement of the form

∃_(y ∈ {0,1}^(p(|x|))) V(xy) = 1   (16.3)

for some polynomial-time algorithm V and polynomial p : ℕ → ℕ. That is, we are trying to determine, given some string x, whether there exists a string y such that x and y satisfy some polynomial-time checkable condition V. For example, in the independent set problem, the string x represents a graph G and a number k, the string y represents some subset S of G's vertices, and the condition that we check is whether |S| ≥ k and there is no edge {u, v} in G such that both u ∈ S and v ∈ S.

We can consider more general statements such as checking, given a string x ∈ {0,1}*, the truth of a statement of the form

∃_(y ∈ {0,1}^(p_0(|x|))) ∀_(z ∈ {0,1}^(p_1(|x|))) V(xyz) = 1 ,   (16.4)

which in words corresponds to checking, given some string x, whether there exists a string y such that for every string z, the triple (x, y, z) satisfies some polynomial-time checkable condition. We can also consider more levels of quantifiers, such as checking the truth of the statement

∃_(y ∈ {0,1}^(p_0(|x|))) ∀_(z ∈ {0,1}^(p_1(|x|))) ∃_(w ∈ {0,1}^(p_2(|x|))) V(xyzw) = 1   (16.5)

and so on and so forth.

For example, given an n-input NAND-CIRC program P, we might want to find the smallest NAND-CIRC program P′ that computes the same function as P. The question of whether there is such a P′ that can be described by a string of at most s bits can be phrased as

∃_(P′ ∈ {0,1}^s) ∀_(x ∈ {0,1}^n) P(x) = P′(x) ,   (16.6)

which has the form (16.4).[5] Another example of a statement involving a levels of quantifiers would be to check, given a chess position x, whether there is a strategy that guarantees that White wins within a steps. For example, if a = 3 we would want to check if, given the board position x, there exists a move y for White such that for every move z for Black there exists a move w for White that ends in a checkmate.

It turns out that if P = NP then we can solve these kinds of problems as well:


[6] For ease of notation, we assume that all the strings we quantify over have the same length m = p(n), but using simple padding one can show that this captures the general case of strings of different polynomial lengths.

Theorem 16.6 — Polynomial hierarchy collapse. If P = NP then for every a ∈ ℕ, polynomial p : ℕ → ℕ, and polynomial-time algorithm V, there is a polynomial-time algorithm SOLVE_(V,a) that on input x ∈ {0,1}^n returns 1 if and only if

∃_(y_0 ∈ {0,1}^m) ∀_(y_1 ∈ {0,1}^m) ⋯ 𝒬_(y_(a−1) ∈ {0,1}^m) V(x y_0 y_1 ⋯ y_(a−1)) = 1   (16.7)

where m = p(n) and 𝒬 is either ∃ or ∀ depending on whether a is odd or even, respectively.[6]

Proof Idea:

To understand the idea behind the proof, consider the special case where we want to decide, given x ∈ {0,1}^n, whether for every y ∈ {0,1}^n there exists z ∈ {0,1}^n such that V(xyz) = 1. Consider the function F such that F(xy) = 1 if there exists z ∈ {0,1}^n such that V(xyz) = 1. Since V runs in polynomial time, F ∈ NP, and hence if P = NP then there is a polynomial-time algorithm V′ that on input x, y outputs 1 if and only if there exists z ∈ {0,1}^n such that V(xyz) = 1. Now the original statement we consider is true if and only if for every y ∈ {0,1}^n, V′(xy) = 1, which means it is false if and only if the following condition (∗) holds: there exists some y ∈ {0,1}^n such that V′(xy) = 0. But for every x ∈ {0,1}^n, the question of whether the condition (∗) holds is itself in NP (as we assumed V′ can be computed in polynomial time), and hence under the assumption that P = NP we can determine in polynomial time whether the condition (∗), and hence our original statement, is true.

Proof of Theorem 16.6. We prove the theorem by induction. We assume that there is a polynomial-time algorithm SOLVE_(V,a−1) that can solve the problem (16.7) with a − 1 quantifiers, and use that to solve the problem with a quantifiers. In the base case a = 1, SOLVE_(V,0)(x) = 1 iff V(x) = 1, which is a polynomial-time computation since V runs in polynomial time. For every x, y_0, define the statement φ_(x,y_0) to be the following:

๐œ‘๐‘ฅ,๐‘ฆ0 = โˆ€๐‘ฆ1โˆˆ0,1๐‘šโˆƒ๐‘ฆ2โˆˆ0,1๐‘š โ‹ฏ ๐’ฌ๐‘ฆ๐‘Žโˆ’1โˆˆ0,1๐‘š๐‘‰ (๐‘ฅ๐‘ฆ0๐‘ฆ1 โ‹ฏ ๐‘ฆ๐‘Žโˆ’1) = 1 (16.8)

By the definition of SOLVE_(V,a), for every x ∈ {0,1}^n, our goal is that SOLVE_(V,a)(x) = 1 if and only if there exists y_0 ∈ {0,1}^m such that φ_(x,y_0) is true.

The negation of φ_(x,y_0) is the statement

๐œ‘๐‘ฅ,๐‘ฆ0 = โˆƒ๐‘ฆ1โˆˆ0,1๐‘šโˆ€๐‘ฆ2โˆˆ0,1๐‘š โ‹ฏ ๐’ฌ๐‘ฆ๐‘Žโˆ’1โˆˆ0,1๐‘š๐‘‰ (๐‘ฅ๐‘ฆ0๐‘ฆ1 โ‹ฏ ๐‘ฆ๐‘Žโˆ’1) = 0 (16.9)


[7] We do not know whether such loss is inherent. As far as we can tell, it's possible that the quantified Boolean formula problem has a linear-time algorithm. We will, however, see later in this course that it satisfies a notion known as PSPACE-hardness that is even stronger than NP-hardness.

where ๐’ฌ is โˆƒ if ๐’ฌ was โˆ€ and ๐’ฌ is โˆ€ otherwise. (Please stop and verifythat you understand why this is true, this is a generalization of the factthat if ฮจ is some logical condition then the negation of โˆƒ๐‘ฆโˆ€๐‘งฮจ(๐‘ฆ, ๐‘ง) isโˆ€๐‘ฆโˆƒ๐‘งยฌฮจ(๐‘ฆ, ๐‘ง).)

The crucial observation is that ¬φ_(x,y_0) is exactly a statement of the form we consider with a − 1 quantifiers instead of a, and hence by our inductive hypothesis there is some polynomial-time algorithm S′ that on input x y_0 outputs 1 if and only if ¬φ_(x,y_0) is true. If we let S be the algorithm that on input x, y_0 outputs 1 − S′(x y_0), then we see that S outputs 1 if and only if φ_(x,y_0) is true. Hence we can rephrase the original statement (16.7) as follows:

∃_(y_0 ∈ {0,1}^m) S(x y_0) = 1   (16.10)

but since S is a polynomial-time algorithm, (16.10) is clearly a statement in NP, and hence under our assumption that P = NP there is a polynomial-time algorithm that on input x ∈ {0,1}^n will determine if (16.10) is true, and so also if the original statement (16.7) is true. ∎
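The induction in the proof of Theorem 16.6 can be sketched as a recursion: to handle a quantifiers, negate the inner statement and recurse with a − 1 quantifiers. In this illustration the outer existential search, which under P = NP would be a single polynomial-time NP-oracle call, is realized by brute force, so the sketch runs in exponential time.

```python
from itertools import product

def solve(V, x, a, m):
    """Decide the truth of  exists y0 forall y1 exists y2 ...
    V(x, (y0, ..., y_{a-1})) = 1,  with each y_i ranging over {0,1}^m.

    Mirrors the proof of Theorem 16.6: the statement holds iff there is
    some y0 for which the NEGATED inner statement (which again starts
    with an exists, and has a-1 quantifiers) is false.
    """
    if a == 0:
        return V(x, ())
    return any(
        not solve(lambda x_, ys: not V(x_, (y0,) + ys), x, a - 1, m)
        for y0 in product([0, 1], repeat=m)
    )

# exists y0 forall y1: first bit of y0 is 1  (true: take y0 = (1,))
print(solve(lambda x, ys: ys[0][0] == 1, None, 2, 1))
# exists y0 forall y1: y0 and y1 agree on their first bit  (false)
print(solve(lambda x, ys: ys[0][0] == ys[1][0], None, 2, 1))
```

Each level of recursion costs one pass over {0,1}^m here; in the proof, each level instead costs one application of the assumed polynomial-time algorithm for NP, which is why the exponent of the polynomial grows with a.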

The algorithm of Theorem 16.6 can solve the search problem as well: find the value y_0 that certifies the truth of (16.7). We note that while this algorithm is in polynomial time, the exponent of this polynomial blows up quite fast. If the original NANDSAT algorithm required Ω(n^2) time, solving a levels of quantifiers would require time Ω(n^(2^a)).[7]

16.4.1 Application: self-improving algorithm for 3SAT

Suppose that we found a polynomial-time algorithm A for 3SAT that is "good but not great". For example, maybe our algorithm runs in time c·n^2 for some not too small constant c. However, it's possible that the best possible SAT algorithm is actually much more efficient than that. Perhaps, as we guessed before, there is a circuit C* of at most 10^6·n gates that computes 3SAT on n variables, and we simply haven't discovered it yet. We can use Theorem 16.6 to "bootstrap" our original "good but not great" 3SAT algorithm to discover the optimal one. The idea is that we can phrase the question of whether there exists a size-s circuit that computes 3SAT for all length-n inputs as follows: there exists a size ≤ s circuit C such that for every formula φ described by a string of length at most n, if C(φ) = 1 then there exists an assignment x to the variables of φ that satisfies it. One can see that this is a statement of the form (16.5), and hence if P = NP we can solve it in polynomial time as well. We can therefore imagine investing huge computational resources in running A one time to discover the circuit C* and then using C* for all further computation.


16.5 APPROXIMATING COUNTING PROBLEMS AND POSTERIOR SAMPLING (ADVANCED, OPTIONAL)

Given a Boolean circuit C, if P = NP then we can find an input x (if one exists) such that C(x) = 1. But what if there is more than one x like that? Clearly we can't efficiently output all such x's; there might be exponentially many. But we can get an arbitrarily good multiplicative approximation (i.e., a 1 ± ε factor for arbitrarily small ε > 0) for the number of such x's, as well as output a (nearly) uniform member of this set. The details are beyond the scope of this book, but this result is formally stated in the following theorem (whose proof is omitted).

Theorem 16.7 — Approximate counting if P = NP. Let V : {0,1}* → {0,1} be some polynomial-time algorithm, and suppose that P = NP. Then there exists an algorithm COUNT_V that on input x, 1^m, ε runs in time polynomial in |x|, m, 1/ε and outputs a number in [2^m + 1] satisfying

(1 − ε) · COUNT_V(x, 1^m, ε) ≤ |{y ∈ {0,1}^m : V(xy) = 1}| ≤ (1 + ε) · COUNT_V(x, 1^m, ε) .   (16.11)

In other words, the algorithm COUNT_V gives an approximation up to a factor of 1 ± ε for the number of witnesses for x with respect to the verifying algorithm V. Once again, to understand this theorem it can be useful to see how it implies that if P = NP then there is a polynomial-time algorithm that, given a graph G and a number k, can compute a number K that is within a 1 ± 0.01 factor of the number of simple paths in G of length k. (That is, K is between 0.99 and 1.01 times the number of such paths.)
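To make the statement concrete, here is the quantity that COUNT_V approximates, computed exactly by brute-force enumeration (which takes time exponential in m); Theorem 16.7 says that under P = NP a 1 ± ε multiplicative approximation of this number can be obtained in polynomial time. The toy verifier `V` is our own invention for the demonstration.

```python
from itertools import product

def count_witnesses(V, x, m):
    """Exact value of |{y in {0,1}^m : V(x, y) = 1}|, by enumeration.
    This is the quantity that COUNT_V approximates up to 1 +/- epsilon."""
    return sum(1 for y in product([0, 1], repeat=m) if V(x, y))

# Toy verifier: y is a witness iff it has at least as many ones as x.
V = lambda x, y: sum(y) >= sum(x)
print(count_witnesses(V, (1, 0), 2))  # witnesses: 01, 10, 11 -> 3
```

Note that even deciding whether this count is nonzero is the NP problem itself, which is why an approximation guarantee for the count is a strictly stronger conclusion than Theorem 16.1.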

Posterior sampling and probabilistic programming. The algorithm for counting can also be extended to sampling from a given posterior distribution. That is, if C : {0,1}^n → {0,1}^m is a Boolean circuit and y ∈ {0,1}^m, then if P = NP we can sample from (a close approximation of) the distribution of a uniform x ∈ {0,1}^n conditioned on C(x) = y. This task is known as posterior sampling and is crucial for Bayesian data analysis. These days it is known how to achieve posterior sampling only for circuits C of very special form, and even in these cases more often than not we do not have guarantees on the quality of the sampling algorithm. The field of making inferences by sampling from posterior distributions specified by circuits or programs is known as probabilistic programming.


[8] One interesting theory is that P = NP and evolution has already discovered this algorithm, which we are already using without realizing it. At the moment, there seems to be very little evidence for such a scenario. In fact, we have some partial results in the other direction showing that, regardless of whether P = NP, many types of "local search" or "evolutionary" algorithms require exponential time to solve 3SAT and other NP-hard problems.

16.6 WHAT DOES ALL OF THIS IMPLY?

So, what will happen if we have a 10^6·n algorithm for 3SAT? We have mentioned that NP-hard problems arise in many contexts, and indeed scientists, engineers, programmers and others routinely encounter such problems in their daily work. A better 3SAT algorithm will probably make their lives easier, but that is the wrong place to look for the most foundational consequences. Indeed, while the invention of electronic computers did of course make it easier to do calculations that people were already doing with mechanical devices and pen and paper, the main applications computers are used for today were not even imagined before their invention.

An exponentially faster algorithm for all NP problems would be no less radical an improvement (and indeed, in some sense would be more) than the computer itself, and it is as hard for us to imagine what it would imply as it was for Babbage to envision today's world. For starters, such an algorithm would completely change the way we program computers. Since we could automatically find the "best" (in any measure we chose) program that achieves a certain task, we would not need to define how to achieve a task, but only specify tests as to what would be a good solution, and could also ensure that a program satisfies an exponential number of tests without actually running them.

The possibility that P = NP is often described as "automating creativity". There is something to that analogy, as we often think of a creative solution as one that is hard to discover but that, once the "spark" hits, is easy to verify. But there is also an element of hubris to that statement, implying that the most impressive consequence of such an algorithmic breakthrough would be that computers could succeed in doing something that humans already do today. Nevertheless, artificial intelligence, like many other fields, will clearly be greatly impacted by an efficient 3SAT algorithm. For example, it is clearly much easier to find a better chess-playing algorithm when, given any algorithm P, you can find the smallest algorithm P′ that plays chess better than P. Moreover, as we mentioned above, much of machine learning (and statistical reasoning in general) is about finding "simple" concepts that explain the observed data, and if NP = P, we could search for such concepts automatically for any notion of "simplicity" we see fit. In fact, we could even "skip the middle man" and do an automatic search for the learning algorithm with the smallest generalization error. Ultimately the field of artificial intelligence is about trying to "shortcut" billions of years of evolution to obtain artificial programs that match (or beat) the performance of natural ones, and a fast algorithm for NP would provide the ultimate shortcut.[8]

More generally, a faster algorithm for NP problems would be immensely useful in any field where one is faced with computational or quantitative problems, which is basically all fields of science, math, and engineering. This will not only help with concrete problems such as designing a better bridge, or finding a better drug, but also with addressing basic mysteries such as trying to find scientific theories or "laws of nature". In a fascinating talk, physicist Nima Arkani-Hamed discusses the effort of finding scientific theories in much the same language as one would describe solving an NP problem, for which the solution is easy to verify or seems "inevitable", once found, but that requires searching through a huge landscape of possibilities to reach, and that often can get "stuck" at local optima:

"the laws of nature have this amazing feeling of inevitability… which is associated with local perfection."

"The classical picture of the world is the top of a local mountain in the space of ideas. And you go up to the top and it looks amazing up there and absolutely incredible. And you learn that there is a taller mountain out there. Find it, Mount Quantum…. they're not smoothly connected … you've got to make a jump to go from classical to quantum … This also tells you why we have such major challenges in trying to extend our understanding of physics. We don't have these knobs, and little wheels, and twiddles that we can turn. We have to learn how to make these jumps. And it is a tall order. And that's why things are difficult."

Finding an efficient algorithm for NP amounts to always being able to search through an exponential space and find not just the "local" mountain, but the tallest peak.

But perhaps more than any computational speedups, a fast algorithm for NP problems would bring about a new type of understanding. In many of the areas where NP-completeness arises, it is not as much a barrier for solving computational problems as it is a barrier for obtaining "closed-form formulas" or other types of more constructive descriptions of the behavior of natural, biological, social and other systems. A better algorithm for NP, even if it is "merely" 2^{√n}-time, seems to require obtaining a new way to understand these types of systems, whether it is characterizing Nash equilibria, spin-glass configurations, entangled quantum states, or any of the other questions where NP is currently a barrier for analytical understanding. Such new insights would be very fruitful regardless of their computational utility.

Big Idea 23 If P = NP, we can efficiently solve a fantastic number of decision, search, optimization, counting, and sampling problems from all areas of human endeavors.

9 This inefficiency is not necessarily inherent. Later in this course we may discuss results in program-checking, interactive proofs, and average-case complexity, that can be used for efficient verification of proofs of related statements. In contrast, the inefficiency of verifying failure of all programs could well be inherent.

16.7 CAN P ≠ NP BE NEITHER TRUE NOR FALSE?

The Continuum Hypothesis is a conjecture made by Georg Cantor in 1878, positing the non-existence of a certain type of infinite cardinality. (One way to phrase it is that for every infinite subset S of the real numbers ℝ, either there is a one-to-one and onto function f : S → ℝ or there is a one-to-one and onto function f : S → ℕ.) This was considered one of the most important open problems in set theory, and settling its truth or falseness was the first problem put forward by Hilbert in the 1900 address we mentioned before. However, using the theories developed by Gödel and Turing, in 1963 Paul Cohen proved that both the Continuum Hypothesis and its negation are consistent with the standard axioms of set theory (i.e., the Zermelo-Fraenkel axioms + the Axiom of Choice, or "ZFC" for short). Formally, what he proved is that if ZFC is consistent, then so is ZFC when we assume either the continuum hypothesis or its negation.

Today, many (though not all) mathematicians interpret this result as saying that the Continuum Hypothesis is neither true nor false, but rather is an axiomatic choice that we are free to make one way or the other. Could the same hold for P ≠ NP?

In short, the answer is No. For example, suppose that we are trying to decide between the "3SAT is easy" conjecture (there is a 10^6·n time algorithm for 3SAT) and the "3SAT is hard" conjecture (for every n, any NAND-CIRC program that solves n variable 3SAT takes 2^{10^{-6}·n} lines). Then, since for n = 10^8, 2^{10^{-6}·n} > 10^6·n, this boils down to the finite question of deciding whether or not there is a 10^{13}-line NAND-CIRC program deciding 3SAT on formulas with 10^8 variables. If there is such a program then there is a finite proof of its existence, namely the approximately 1TB file describing the program, and for which the verification is the (finite in principle though infeasible in practice) process of checking that it succeeds on all inputs.9 If there isn't such a program, then there is also a finite proof of that, though any such proof would take longer since we would need to enumerate over all programs as well. Ultimately, since it boils down to a finite statement about bits and numbers, either the statement or its negation must follow from the standard axioms of arithmetic in a finite number of arithmetic steps. Thus, we cannot justify our ignorance in distinguishing between the "3SAT easy" and "3SAT hard" cases by claiming that this might be an inherently ill-defined question. Similar reasoning (with different numbers) applies to other variants of the P vs NP question. We note that in the case that 3SAT is hard, it may well be that there is no short proof of this fact using the standard axioms, and this is a question that people have been studying in various restricted forms of proof systems.
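The numeric comparison in this paragraph is easy to check directly with exact integer arithmetic; a quick sketch:

```python
# Compare the two hypothetical bounds at n = 10**8.
# "3SAT easy": a 10**6 * n time algorithm.
# "3SAT hard": any NAND-CIRC program takes 2**(10**-6 * n) lines;
# since n here is a multiple of 10**6, this is exactly 2**(n // 10**6).
n = 10**8
easy = 10**6 * n        # = 10**14 steps
hard = 2**(n // 10**6)  # = 2**100 lines, roughly 1.3 * 10**30

# The "hard" bound dwarfs the "easy" one, so the two conjectures
# disagree on this single finite instance size.
assert hard > easy
```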

10 Actually, the computational difficulty of problems in economics such as finding optimal (or any) equilibria is quite subtle. Some variants of such problems are NP-hard, while others have a certain "intermediate" complexity.

11 Talk more about coping with NP hardness. Main two approaches are heuristics such as SAT solvers that succeed on some instances, and proxy measures such as mathematical relaxations that instead of solving problem X (e.g., an integer program) solve program X′ (e.g., a linear program) that is related to that. Maybe give compressed sensing as an example, and least square minimization as a proxy for maximum a posteriori probability.

16.8 IS P = NP "IN PRACTICE"?

The fact that a problem is NP-hard means that we believe there is no efficient algorithm that solves it in the worst case. It does not, however, mean that every single instance of the problem is hard. For example, if all the clauses in a 3SAT instance φ contain the same variable x_i (possibly in negated form), then by guessing a value to x_i we can reduce φ to a 2SAT instance which can then be efficiently solved. Generalizations of this simple idea are used in "SAT solvers", which are algorithms that have solved certain specific interesting SAT formulas with thousands of variables, despite the fact that we believe SAT to be exponentially hard in the worst case. Similarly, a lot of problems arising in economics and machine learning are NP-hard.10 And yet vendors and customers manage to figure out market-clearing prices (as economists like to point out, there is milk on the shelves) and mice succeed in distinguishing cats from dogs. Hence people (and machines) seem to regularly succeed in solving interesting instances of NP-hard problems, typically by using some combination of guessing while making local improvements.

It is also true that there are many interesting instances of NP-hard problems that we do not currently know how to solve. Across all application areas, whether it is scientific computing, optimization, control or more, people often encounter hard instances of NP problems on which our current algorithms fail. In fact, as we will see, all of our digital security infrastructure relies on the fact that some concrete and easy-to-generate instances of, say, 3SAT (or, equivalently, any other NP-hard problem) are exponentially hard to solve.

Thus it would be wrong to say that NP is easy "in practice", nor would it be correct to take NP-hardness as the "final word" on the complexity of a problem, particularly when we have more information about how any given instance is generated. Understanding both the "typical complexity" of NP problems, as well as the power and limitations of certain heuristics (such as various local-search based algorithms) is a very active area of research. We will see more on these topics later in this course.

16.9 WHAT IF P ≠ NP?

So, P = NP would give us all kinds of fantastical outcomes. But we strongly suspect that P ≠ NP, and moreover that there is no much-better-than-brute-force algorithm for 3SAT. If indeed that is the case, is it all bad news?

One might think that impossibility results, telling you that you cannot do something, are the kind of cloud that does not have a silver lining. But in fact, as we already alluded to before, it does. A hard (in a sufficiently strong sense) problem in NP can be used to create a code that cannot be broken, a task that for thousands of years has been the dream of not just spies but of many scientists and mathematicians over the generations. But the complexity viewpoint turned out to yield much more than simple codes, achieving tasks that people had previously not even dared to dream of. These include the notion of public key cryptography, allowing two people to communicate securely without ever having exchanged a secret key; electronic cash, allowing private and secure transactions without a central authority; and secure multiparty computation, enabling parties to compute a joint function on private inputs without revealing any extra information about it. Also, as we will see, computational hardness can be used to replace the role of randomness in many settings.

Furthermore, while it is often convenient to pretend that computational problems are simply handed to us, and that our job as computer scientists is to find the most efficient algorithm for them, this is not how things work in most computing applications. Typically even formulating the problem to solve is a highly non-trivial task. When we discover that the problem we want to solve is NP-hard, this might be a useful sign that we used the wrong formulation for it.

Beyond all these, the quest to understand computational hardness (including the discoveries of lower bounds for restricted computational models, as well as new types of reductions, such as those arising from "probabilistically checkable proofs") has already had surprising positive applications to problems in algorithm design, as well as in coding for both communication and storage. This is not surprising since, as we mentioned before, from group theory to the theory of relativity, the pursuit of impossibility results has often been one of the most fruitful enterprises of mankind.

Chapter Recap

• The question of whether P = NP is one of the most important and fascinating questions of computer science and science at large, touching on all fields of the natural and social sciences, as well as mathematics and engineering.

• Our current evidence and understanding supports the "SAT hard" scenario that there is no much-better-than-brute-force algorithm for 3SAT or many other NP-hard problems.

• We are very far from proving this, however. Researchers have studied proving lower bounds on the number of gates to compute explicit functions in restricted forms of circuits, and have made some advances in this effort, along the way generating mathematical tools that have found other uses. However, we have made essentially no headway in proving lower bounds for general models of computation such as Boolean circuits and Turing machines. Indeed, we currently do not even know how to rule out the possibility that for every n ∈ ℕ, SAT restricted to n-length inputs has a Boolean circuit of less than 10n gates (even though there exist n-input functions that require at least 2^n/(10n) gates to compute).

• Understanding how to cope with this computational intractability, and even benefit from it, comprises much of the research in theoretical computer science.

16.10 EXERCISES

16.11 BIBLIOGRAPHICAL NOTES

As mentioned before, Aaronson's survey [Aar16] is a great exposition of the P vs NP problem. Another recommended survey by Aaronson is [Aar05] which discusses the question of whether NP complete problems could be computed by any physical means.

The paper [BU11] discusses some results about problems in the polynomial hierarchy.

17 Space bounded computation

PLAN: Example of space bounded algorithms, importance of preserving space. The classes L and PSPACE, space hierarchy theorem, PSPACE = NPSPACE, constant space = regular languages.

17.1 EXERCISES

17.2 BIBLIOGRAPHICAL NOTES

IV RANDOMIZED COMPUTATION

18 Probability Theory 101

"God doesn't play dice with the universe", Albert Einstein

"Einstein was doubly wrong … not only does God definitely play dice, but He sometimes confuses us by throwing them where they can't be seen.", Stephen Hawking

"The probability of winning a battle has no place in our theory because it does not belong to any [random experiment]. Probability cannot be applied to this problem any more than the physical concept of work can be applied to the 'work' done by an actor reciting his part.", Richard Von Mises, 1928 (paraphrased)

"I am unable to see why 'objectivity' requires us to interpret every probability as a frequency in some random experiment; particularly when in most problems probabilities are frequencies only in an imaginary universe invented just for the purpose of allowing a frequency interpretation.", E.T. Jaynes, 1976

Before we show how to use randomness in algorithms, let us do a quick review of some basic notions in probability theory. This is not meant to replace a course on probability theory, and if you have not seen this material before, I highly recommend you look at additional resources to get up to speed. Fortunately, we will not need many of the advanced notions of probability theory, but, as we will see, even the so-called "simple" setting of tossing n coins can lead to very subtle and interesting issues.

18.1 RANDOM COINS

The nature of randomness and probability is a topic of great philosophical, scientific and mathematical depth. Is there actual randomness in the world, or does it proceed in a deterministic clockwork fashion from some initial conditions set at the beginning of time? Does probability refer to our uncertainty of beliefs, or to the frequency of occurrences in repeated experiments? How can we define probability over infinite sets?

Learning Objectives:
• Review the basic notions of probability theory that we will use.
• Sample spaces, and in particular the space {0,1}^n.
• Events, probabilities of unions and intersections.
• Random variables and their expectation, variance, and standard deviation.
• Independence and correlation for both events and random variables.
• Markov, Chebyshev and Chernoff tail bounds (bounding the probability that a random variable will deviate from its expectation).

Figure 18.1: The probabilistic experiment of tossing three coins corresponds to making 2 × 2 × 2 = 8 choices, each with equal probability. In this example, the blue set corresponds to the event A = {x ∈ {0,1}^3 | x_0 = 0} where the first coin toss is equal to 0, and the pink set corresponds to the event B = {x ∈ {0,1}^3 | x_1 = 1} where the second coin toss is equal to 1 (with their intersection having a purplish color). As we can see, each of these events contains 4 elements (out of 8 total) and so has probability 1/2. The intersection of A and B contains two elements, and so the probability that both of these events occur is 2/8 = 1/4.

Figure 18.2: The event that if we toss three coins x_0, x_1, x_2 ∈ {0,1} then the sum of the x_i's is even has probability 1/2 since it corresponds to exactly 4 out of the 8 possible strings of length 3.

These are all important questions that have been studied and debated by scientists, mathematicians, statisticians and philosophers. Fortunately, we will not need to deal directly with these questions here. We will be mostly interested in the setting of tossing n random, unbiased and independent coins. Below we define the basic probabilistic objects of events and random variables when restricted to this setting. These can be defined for much more general probabilistic experiments or sample spaces, and later on we will briefly discuss how this can be done. However, the n-coin case is sufficient for almost everything we'll need in this course.

If instead of "heads" and "tails" we encode the sides of each coin by "zero" and "one", we can encode the result of tossing n coins as a string in {0,1}^n. Each particular outcome x ∈ {0,1}^n is obtained with probability 2^{-n}. For example, if we toss three coins, then we obtain each of the 8 outcomes 000, 001, 010, 011, 100, 101, 110, 111 with probability 2^{-3} = 1/8 (see also Fig. 18.1). We can describe the experiment of tossing n coins as choosing a string x uniformly at random from {0,1}^n, and hence we'll use the shorthand x ∼ {0,1}^n for x that is chosen according to this experiment.
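This encoding is concrete enough to program directly; a minimal sketch (variable names are just for illustration):

```python
import itertools
import random

n = 3
# All 2**n possible outcomes of tossing n coins, encoded as bit-tuples.
outcomes = list(itertools.product([0, 1], repeat=n))
assert len(outcomes) == 2**n  # each outcome has probability 2**-n

# Choosing x uniformly from {0,1}^n: one independent fair coin per bit.
x = tuple(random.randint(0, 1) for _ in range(n))
assert x in outcomes
```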

An event is simply a subset A of {0,1}^n. The probability of A, denoted by Pr_{x∼{0,1}^n}[A] (or Pr[A] for short, when the sample space is understood from the context), is the probability that an x chosen uniformly at random will be contained in A. Note that this is the same as |A|/2^n (where |A| as usual denotes the number of elements in the set A). For example, the probability that x has an even number of ones is Pr[A] where A = {x : ∑_{i=0}^{n−1} x_i = 0 mod 2}. In the case n = 3, A = {000, 011, 101, 110}, and hence Pr[A] = 4/8 = 1/2 (see Fig. 18.2). It turns out this is true for every n:

Lemma 18.1 For every n > 0,

Pr๐‘ฅโˆผ0,1๐‘›[๐‘›โˆ’1โˆ‘๐‘–=0 ๐‘ฅ๐‘– is even ] = 1/2 (18.1)

P: To test your intuition on probability, try to stop here and prove the lemma on your own.

Proof of Lemma 18.1. We prove the lemma by induction on n. For the case n = 1 it is clear since x = 0 is even and x = 1 is odd, and hence the probability that x ∈ {0,1} is even is 1/2. Let n > 1. We assume by induction that the lemma is true for n − 1 and we will prove it for n. We split the set {0,1}^n into four disjoint sets E_0, E_1, O_0, O_1, where for b ∈ {0,1}, E_b is defined as the set of x ∈ {0,1}^n such that x_0 ⋯ x_{n−2} has an even number of ones and x_{n−1} = b, and similarly O_b is the set of x ∈ {0,1}^n such that x_0 ⋯ x_{n−2} has an odd number of ones and x_{n−1} = b. Since E_0 is obtained by simply extending an (n−1)-length string with an even number of ones by the digit 0, the size of E_0 is simply the number of such (n−1)-length strings, which by the induction hypothesis is 2^{n−1}/2 = 2^{n−2}. The same reasoning applies for E_1, O_0, and O_1. Hence each one of the four sets E_0, E_1, O_0, O_1 is of size 2^{n−2}. Since x ∈ {0,1}^n has an even number of ones if and only if x ∈ E_0 ∪ O_1 (i.e., either the first n − 1 coordinates sum up to an even number and the final coordinate is 0, or the first n − 1 coordinates sum up to an odd number and the final coordinate is 1), we get that the probability that x satisfies this property is

|E_0 ∪ O_1| / 2^n = (2^{n−2} + 2^{n−2}) / 2^n = 1/2 ,   (18.2)

using the fact that E_0 and O_1 are disjoint and hence |E_0 ∪ O_1| = |E_0| + |O_1|.
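Since for each n the lemma is a statement about a finite sample space, it can also be sanity-checked by brute-force enumeration; a sketch:

```python
from itertools import product

def prob_even_parity(n):
    """Exact Pr[sum of the n coin tosses is even], by enumerating all of {0,1}^n."""
    count = sum(1 for x in product([0, 1], repeat=n) if sum(x) % 2 == 0)
    return count / 2**n

# Lemma 18.1 holds for every n > 0; check the first few cases exhaustively.
for n in range(1, 12):
    assert prob_even_parity(n) == 1/2
```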

We can also use the intersection (∩) and union (∪) operators to talk about the probability of both event A and event B happening, or the probability of event A or event B happening. For example, the probability p that x has an even number of ones and x_0 = 1 is the same as Pr[A ∩ B] where A = {x ∈ {0,1}^n : ∑_{i=0}^{n−1} x_i = 0 mod 2} and B = {x ∈ {0,1}^n : x_0 = 1}. This probability is equal to 1/4 for n > 1. (It is a great exercise for you to pause here and verify that you understand why this is the case.)

Because intersection corresponds to considering the logical AND of the conditions that two events happen, while union corresponds to considering the logical OR, we will sometimes use the ∧ and ∨ operators instead of ∩ and ∪, and so write this probability p = Pr[A ∩ B] defined above also as

Pr๐‘ฅโˆผ0,1๐‘› [โˆ‘๐‘– ๐‘ฅ๐‘– = 0 mod 2 โˆง ๐‘ฅ0 = 1] . (18.3)

If ๐ด โŠ† 0, 1๐‘› is an event, then ๐ด = 0, 1๐‘› โงต ๐ด corresponds to theevent that ๐ด does not happen. Since |๐ด| = 2๐‘› โˆ’ |๐ด|, we get that

Pr[๐ด] = |๐ด|2๐‘› = 2๐‘›โˆ’|๐ด|2๐‘› = 1 โˆ’ |๐ด|2๐‘› = 1 โˆ’ Pr[๐ด] (18.4)

This makes sense: since Ā happens if and only if A does not happen, the probability of Ā should be one minus the probability of A.

Remark 18.2 — Remember the sample space. While the above definition might seem very simple and almost trivial, the human mind seems not to have evolved for probabilistic reasoning, and it is surprising how often people can get even the simplest settings of probability wrong. One way to make sure you don't get confused when trying to calculate probability statements is to always ask yourself the following two questions: (1) Do I understand what is the sample space that this probability is taken over?, and (2) Do I understand what is the definition of the event that we are analyzing?

For example, suppose that I were to randomize seating in my course, and then it turned out that students sitting in row 7 performed better on the final: how surprising should we find this? If we started out with the hypothesis that there is something special about the number 7 and chose it ahead of time, then the event that we are discussing is the event A that students sitting in row 7 had better performance on the final, and we might find it surprising. However, if we first looked at the results and then chose the row whose average performance is best, then the event we are discussing is the event B that there exists some row where the performance is higher than the overall average. B is a superset of A, and its probability (even if there is no correlation between sitting and performance) can be quite significant.

18.1.1 Random variables

Events correspond to Yes/No questions, but often we want to analyze finer questions. For example, if we make a bet at the roulette wheel, we don't want to just analyze whether we won or lost, but also how much we've gained. A (real valued) random variable is simply a way to associate a number with the result of a probabilistic experiment. Formally, a random variable is a function X : {0,1}^n → ℝ that maps every outcome x ∈ {0,1}^n to an element X(x) ∈ ℝ. For example, the function SUM : {0,1}^n → ℝ that maps x to the sum of its coordinates (i.e., to ∑_{i=0}^{n−1} x_i) is a random variable.

The expectation of a random variable X, denoted by 𝔼[X], is the average value that this number takes, taken over all draws from the probabilistic experiment. In other words, the expectation of X is defined as follows:

𝔼[X] = ∑_{x∈{0,1}^n} 2^{-n} X(x) .   (18.5)

If ๐‘‹ and ๐‘Œ are random variables, then we can define ๐‘‹ + ๐‘Œ assimply the random variable that maps a point ๐‘ฅ โˆˆ 0, 1๐‘› to ๐‘‹(๐‘ฅ) +๐‘Œ (๐‘ฅ). One basic and very useful property of the expectation is that itis linear:

Lemma 18.3 — Linearity of expectation.

𝔼[X + Y] = 𝔼[X] + 𝔼[Y]   (18.6)

Proof.

𝔼[X + Y] = ∑_{x∈{0,1}^n} 2^{-n} (X(x) + Y(x)) = ∑_{x∈{0,1}^n} 2^{-n} X(x) + ∑_{x∈{0,1}^n} 2^{-n} Y(x) = 𝔼[X] + 𝔼[Y]   (18.7)

Similarly, 𝔼[kX] = k·𝔼[X] for every k ∈ ℝ.

Solved Exercise 18.1 — Expectation of sum. Let X : {0,1}^n → ℝ be the random variable that maps x ∈ {0,1}^n to x_0 + x_1 + … + x_{n−1}. Prove that 𝔼[X] = n/2.

Solution:

We can solve this using the linearity of expectation. We can define random variables X_0, X_1, …, X_{n−1} such that X_i(x) = x_i. Since each x_i equals 1 with probability 1/2 and 0 with probability 1/2, 𝔼[X_i] = 1/2. Since X = ∑_{i=0}^{n−1} X_i, by the linearity of expectation

𝔼[X] = 𝔼[X_0] + 𝔼[X_1] + ⋯ + 𝔼[X_{n−1}] = n/2 .   (18.8)

P: If you have not seen discrete probability before, please go over this argument again until you are sure you follow it; it is a prototypical simple example of the type of reasoning we will employ again and again in this course.
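The answer in Solved Exercise 18.1 can also be confirmed numerically, computing the expectation directly from definition (18.5) by enumeration; a sketch:

```python
from itertools import product

def expectation(n, X):
    """E[X] = sum over x in {0,1}^n of 2**-n * X(x), as in (18.5)."""
    return sum(X(x) for x in product([0, 1], repeat=n)) / 2**n

# SUM(x) = x_0 + ... + x_{n-1}; Python's built-in sum on a bit-tuple
# computes exactly this, and its expectation should be n/2.
for n in range(1, 9):
    assert expectation(n, sum) == n / 2
```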

If ๐ด is an event, then 1๐ด is the random variable such that 1๐ด(๐‘ฅ)equals 1 if ๐‘ฅ โˆˆ ๐ด, and 1๐ด(๐‘ฅ) = 0 otherwise. Note that Pr[๐ด] = ๐”ผ[1๐ด](can you see why?). Using this and the linearity of expectation, wecan show one of the most useful bounds in probability theory:Lemma 18.4 โ€” Union bound. For every two events ๐ด, ๐ต, Pr[๐ด โˆช ๐ต] โ‰คPr[๐ด] + Pr[๐ต]

Figure 18.3: The union bound tells us that the probability of A or B happening is at most the sum of the individual probabilities. We can see it by noting that for every two sets |A ∪ B| ≤ |A| + |B| (with equality only if A and B have no intersection).

P: Before looking at the proof, try to see why the union bound makes intuitive sense. We can also prove it directly from the definition of probabilities and the cardinality of sets, together with the equation |A ∪ B| ≤ |A| + |B|. Can you see why the latter equation is true? (See also Fig. 18.3.)

Proof of Lemma 18.4. For every x, 1_{A∪B}(x) ≤ 1_A(x) + 1_B(x). Hence, Pr[A ∪ B] = 𝔼[1_{A∪B}] ≤ 𝔼[1_A + 1_B] = 𝔼[1_A] + 𝔼[1_B] = Pr[A] + Pr[B].

The way we often use this in theoretical computer science is to argue that, for example, if there is a list of 100 bad events that can happen, and each one of them happens with probability at most 1/10000, then with probability at least 1 − 100/10000 = 0.99, no bad event happens.
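Both the inequality and the "100 bad events" calculation can be replayed by direct computation; a small sketch using exact rational arithmetic (the particular events here are just for illustration):

```python
from fractions import Fraction
from itertools import product

space = list(product([0, 1], repeat=3))

def Pr(event):
    return Fraction(len(event), len(space))

# Two overlapping events over three coin tosses.
A = {x for x in space if x[0] == 0}   # first toss is 0
B = {x for x in space if x[1] == 1}   # second toss is 1
assert Pr(A | B) <= Pr(A) + Pr(B)    # union bound: here 6/8 <= 4/8 + 4/8

# 100 bad events, each of probability at most 1/10000: the union bound
# says some bad event occurs with probability at most 100/10000, so no
# bad event happens with probability at least 99/100.
assert 1 - 100 * Fraction(1, 10000) == Fraction(99, 100)
```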

18.1.2 Distributions over strings

While most of the time we think of random variables as having as output a real number, we sometimes consider random variables whose output is a string. That is, we can think of a map Y : {0,1}^n → {0,1}^* and consider the "random variable" Y such that for every y ∈ {0,1}^*, the probability that Y outputs y is equal to (1/2^n)·|{x ∈ {0,1}^n | Y(x) = y}|. To avoid confusion, we will typically refer to such string-valued random variables as distributions over strings. So, a distribution Y over strings {0,1}^* can be thought of as a finite collection of strings y_0, …, y_{M−1} ∈ {0,1}^* and probabilities p_0, …, p_{M−1} (which are non-negative numbers summing up to one), so that Pr[Y = y_i] = p_i.

Two distributions Y and Y′ are identical if they assign the same probability to every string. For example, consider the following two functions Y, Y′ : {0,1}^2 → {0,1}^2. For every x ∈ {0,1}^2, we define Y(x) = x and Y′(x) = x_0(x_0 ⊕ x_1) where ⊕ is the XOR operation. Although these are two different functions, they induce the same distribution over {0,1}^2 when invoked on a uniform input. The distribution Y(x) for x ∼ {0,1}^2 is of course the uniform distribution over {0,1}^2. On the other hand Y′ is simply the map 00 ↦ 00, 01 ↦ 01, 10 ↦ 11, 11 ↦ 10, which is a permutation of Y.
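This example is small enough to check exhaustively: apply Y and Y′ to all four inputs and compare the induced distributions (a sketch; bit-tuples stand in for the two-bit strings):

```python
from collections import Counter
from itertools import product

def Y(x):
    return x                       # the identity map on {0,1}^2

def Y_prime(x):
    return (x[0], x[0] ^ x[1])     # output is x_0 followed by x_0 XOR x_1

inputs = list(product([0, 1], repeat=2))
dist_Y = Counter(Y(x) for x in inputs)
dist_Y_prime = Counter(Y_prime(x) for x in inputs)

# Different as functions, but identical as distributions on a uniform input:
assert any(Y(x) != Y_prime(x) for x in inputs)
assert dist_Y == dist_Y_prime      # both uniform over {0,1}^2
```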

18.1.3 More general sample spaces

While throughout most of this book we assume that the underlying probabilistic experiment corresponds to tossing n independent coins, all the claims we make easily generalize to sampling x from a more general finite or countable set S (and not-so-easily generalize to uncountable sets S as well). A probability distribution over a finite set S is simply a function μ : S → [0,1] such that ∑_{x∈S} μ(x) = 1. We think of this as the experiment where we obtain every x ∈ S with

Figure 18.4: Two events A and B are independent if Pr[A ∩ B] = Pr[A] · Pr[B]. In the two figures above, the empty x × x square is the sample space, and A and B are two events in this sample space. In the left figure, A and B are independent, while in the right figure they are negatively correlated, since B is less likely to occur if we condition on A (and vice versa). Mathematically, one can see this by noticing that in the left figure the areas of A and B respectively are a·x and b·x, and so their probabilities are a·x/x² = a/x and b·x/x² = b/x respectively, while the area of A ∩ B is a·b, which corresponds to the probability a·b/x². In the right figure, the area of the triangle B is b·x/2, which corresponds to a probability of b/(2x), but the area of A ∩ B is b′·a/2 for some b′ < b. This means that the probability of A ∩ B is b′·a/(2x²) < b/(2x) · a/x, or in other words Pr[A ∩ B] < Pr[A] · Pr[B].

probability ๐œ‡(๐‘ฅ), and sometimes denote this as ๐‘ฅ โˆผ ๐œ‡. In particular,tossing ๐‘› random coins corresponds to the probability distribution๐œ‡ โˆถ 0, 1๐‘› โ†’ [0, 1] defined as ๐œ‡(๐‘ฅ) = 2โˆ’๐‘› for every ๐‘ฅ โˆˆ 0, 1๐‘›. Anevent ๐ด is a subset of ๐‘†, and the probability of ๐ด, which we denote byPr๐œ‡[๐ด], is โˆ‘๐‘ฅโˆˆ๐ด ๐œ‡(๐‘ฅ). A random variable is a function ๐‘‹ โˆถ ๐‘† โ†’ โ„, wherethe probability that ๐‘‹ = ๐‘ฆ is equal to โˆ‘๐‘ฅโˆˆ๐‘† s.t. ๐‘‹(๐‘ฅ)=๐‘ฆ ๐œ‡(๐‘ฅ).18.2 CORRELATIONS AND INDEPENDENCE

One of the most delicate but important concepts in probability is the notion of independence (and the opposing notion of correlations). Subtle correlations are often behind surprises and errors in probability and statistical analysis, and several mistaken predictions have been blamed on miscalculating the correlations between, say, housing prices in Florida and Arizona, or voter preferences in Ohio and Michigan. See also Joe Blitzstein's aptly named talk "Conditioning is the Soul of Statistics". (Another thorny issue is of course the difference between correlation and causation. Luckily, this is another point we don't need to worry about in our clean setting of tossing n coins.)

Two events ๐ด and ๐ต are independent if the fact that ๐ด happensmakes ๐ต neither more nor less likely to happen. For example, if wethink of the experiment of tossing 3 random coins ๐‘ฅ โˆˆ 0, 13, and welet ๐ด be the event that ๐‘ฅ0 = 1 and ๐ต the event that ๐‘ฅ0 + ๐‘ฅ1 + ๐‘ฅ2 โ‰ฅ 2,then if ๐ด happens it is more likely that ๐ต happens, and hence theseevents are not independent. On the other hand, if we let ๐ถ be the eventthat ๐‘ฅ1 = 1, then because the second coin toss is not affected by theresult of the first one, the events ๐ด and ๐ถ are independent.

The formal definition is that events ๐ด and ๐ต are independent if Pr[๐ด โˆฉ ๐ต] = Pr[๐ด] โ‹… Pr[๐ต]. If Pr[๐ด โˆฉ ๐ต] > Pr[๐ด] โ‹… Pr[๐ต] then we say that ๐ด and ๐ต are positively correlated, while if Pr[๐ด โˆฉ ๐ต] < Pr[๐ด] โ‹… Pr[๐ต] then we say that ๐ด and ๐ต are negatively correlated (see Fig. 18.4).

If we consider the above examples on the experiment of choosing ๐‘ฅ โˆˆ {0,1}^3 then we can see that

Pr[๐‘ฅ0 = 1] = 12Pr[๐‘ฅ0 + ๐‘ฅ1 + ๐‘ฅ2 โ‰ฅ 2] = Pr[011, 101, 110, 111] = 48 = 12 (18.9)

but

Pr[๐‘ฅ0 = 1 โˆง ๐‘ฅ0 +๐‘ฅ1 +๐‘ฅ2 โ‰ฅ 2] = Pr[101, 110, 111] = 38 > 12 โ‹… 12 (18.10)

and hence, as we already observed, the events {๐‘ฅ0 = 1} and {๐‘ฅ0 + ๐‘ฅ1 + ๐‘ฅ2 โ‰ฅ 2} are not independent and in fact are positively correlated. On the other hand, Pr[๐‘ฅ0 = 1 โˆง ๐‘ฅ1 = 1] = Pr[{110, 111}] = 2/8 = 1/2 โ‹… 1/2, and hence the events {๐‘ฅ0 = 1} and {๐‘ฅ1 = 1} are indeed independent.
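These calculations are easy to check by brute force over the eight outcomes. The following sketch (not from the book; the event names follow the example above) verifies them exhaustively:

```python
from fractions import Fraction
from itertools import product

# Uniform sample space {0,1}^3, with outcomes written as strings x0 x1 x2.
space = ["".join(bits) for bits in product("01", repeat=3)]

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return Fraction(len(event), len(space))

A = {x for x in space if x[0] == "1"}            # x0 = 1
B = {x for x in space if x.count("1") >= 2}      # x0 + x1 + x2 >= 2
C = {x for x in space if x[1] == "1"}            # x1 = 1

assert prob(A & B) == Fraction(3, 8)             # > Pr[A]*Pr[B] = 1/4: positively correlated
assert prob(A & C) == prob(A) * prob(C)          # = 1/4: independent
```

Using exact rational arithmetic (`Fraction`) avoids any floating-point rounding when comparing probabilities for equality.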


Figure 18.5: Consider the sample space {0,1}^๐‘› and the events ๐ด, ๐ต, ๐ถ, ๐ท, ๐ธ corresponding to ๐ด: ๐‘ฅ0 = 1, ๐ต: ๐‘ฅ1 = 1, ๐ถ: ๐‘ฅ0 + ๐‘ฅ1 + ๐‘ฅ2 โ‰ฅ 2, ๐ท: ๐‘ฅ0 + ๐‘ฅ1 + ๐‘ฅ2 = 0 mod 2, and ๐ธ: ๐‘ฅ0 + ๐‘ฅ1 = 0 mod 2. We can see that ๐ด and ๐ต are independent, ๐ถ is positively correlated with ๐ด and positively correlated with ๐ต, the three events ๐ด, ๐ต, ๐ท are mutually independent, and while every pair out of ๐ด, ๐ต, ๐ธ is independent, the three events ๐ด, ๐ต, ๐ธ are not mutually independent, since their intersection has probability 2/8 = 1/4 instead of 1/2 โ‹… 1/2 โ‹… 1/2 = 1/8.

Remark 18.5 โ€” Disjointness vs independence. People sometimes confuse the notions of disjointness and independence, but these are actually quite different. Two events ๐ด and ๐ต are disjoint if ๐ด โˆฉ ๐ต = โˆ…, which means that if ๐ด happens then ๐ต definitely does not happen. They are independent if Pr[๐ด โˆฉ ๐ต] = Pr[๐ด] Pr[๐ต], which means that knowing that ๐ด happens gives us no information about whether ๐ต happened or not. If ๐ด and ๐ต have nonzero probability, then being disjoint implies that they are not independent, since in particular it means that they are negatively correlated.

Conditional probability: If ๐ด and ๐ต are events, and ๐ด happens with nonzero probability, then we define the probability that ๐ต happens conditioned on ๐ด to be Pr[๐ต|๐ด] = Pr[๐ด โˆฉ ๐ต]/Pr[๐ด]. This corresponds to calculating the probability that ๐ต happens if we already know that ๐ด happened. Note that ๐ด and ๐ต are independent if and only if Pr[๐ต|๐ด] = Pr[๐ต].

More than two events: We can generalize this definition to more than two events. We say that events ๐ด1, โ€ฆ , ๐ด๐‘˜ are mutually independent if knowing that any set of them occurred or didnโ€™t occur does not change the probability that an event outside the set occurs. Formally, the condition is that for every subset ๐ผ โŠ† [๐‘˜],

Pr[โˆง๐‘–โˆˆ๐ผ๐ด๐‘–] = โˆ๐‘–โˆˆ๐ผ Pr[๐ด๐‘–]. (18.11)

For example, if ๐‘ฅ โˆผ 0, 13, then the events ๐‘ฅ0 = 1, ๐‘ฅ1 = 1 and๐‘ฅ2 = 1 are mutually independent. On the other hand, the events๐‘ฅ0 = 1, ๐‘ฅ1 = 1 and ๐‘ฅ0 + ๐‘ฅ1 = 0 mod 2 are not mutuallyindependent, even though every pair of these events is independent(can you see why? see also Fig. 18.5).

18.2.1 Independent random variables

We say that two random variables ๐‘‹ โˆถ {0,1}^๐‘› โ†’ โ„ and ๐‘Œ โˆถ {0,1}^๐‘› โ†’ โ„ are independent if for every ๐‘ข, ๐‘ฃ โˆˆ โ„, the events {๐‘‹ = ๐‘ข} and {๐‘Œ = ๐‘ฃ} are independent. (We use {๐‘‹ = ๐‘ข} as shorthand for {๐‘ฅ | ๐‘‹(๐‘ฅ) = ๐‘ข}.) In other words, ๐‘‹ and ๐‘Œ are independent if Pr[๐‘‹ = ๐‘ข โˆง ๐‘Œ = ๐‘ฃ] = Pr[๐‘‹ = ๐‘ข] Pr[๐‘Œ = ๐‘ฃ] for every ๐‘ข, ๐‘ฃ โˆˆ โ„. For example, if two random variables depend on the result of tossing different coins then they are independent:

Lemma 18.6 Suppose that ๐‘† = {๐‘ 0, โ€ฆ , ๐‘ ๐‘˜โˆ’1} and ๐‘‡ = {๐‘ก0, โ€ฆ , ๐‘ก๐‘šโˆ’1} are disjoint subsets of {0, โ€ฆ , ๐‘› โˆ’ 1} and let ๐‘‹, ๐‘Œ โˆถ {0,1}^๐‘› โ†’ โ„ be random variables such that ๐‘‹ = ๐น(๐‘ฅ_{๐‘ 0}, โ€ฆ , ๐‘ฅ_{๐‘ ๐‘˜โˆ’1}) and ๐‘Œ = ๐บ(๐‘ฅ_{๐‘ก0}, โ€ฆ , ๐‘ฅ_{๐‘ก๐‘šโˆ’1}) for


some functions ๐น โˆถ 0, 1๐‘˜ โ†’ โ„ and ๐บ โˆถ 0, 1๐‘š โ†’ โ„. Then ๐‘‹ and ๐‘Œare independent.

The notation in the lemmaโ€™s statement is a bit cumbersome, but at the end of the day, it simply says that if ๐‘‹ and ๐‘Œ are random variables that depend on two disjoint sets ๐‘† and ๐‘‡ of coins (for example, ๐‘‹ might be the sum of the first ๐‘›/2 coins, and ๐‘Œ might be the largest consecutive stretch of zeroes in the second ๐‘›/2 coins), then they are independent.

Proof of Lemma 18.6. Let ๐‘Ž, ๐‘ โˆˆ โ„, and let ๐ด = {๐‘ฅ โˆˆ {0,1}^๐‘˜ โˆถ ๐น(๐‘ฅ) = ๐‘Ž} and ๐ต = {๐‘ฅ โˆˆ {0,1}^๐‘š โˆถ ๐บ(๐‘ฅ) = ๐‘}. Since ๐‘† and ๐‘‡ are disjoint, we can reorder the indices so that ๐‘† = {0, โ€ฆ , ๐‘˜ โˆ’ 1} and ๐‘‡ = {๐‘˜, โ€ฆ , ๐‘˜ + ๐‘š โˆ’ 1} without affecting any of the probabilities. Hence we can write Pr[๐‘‹ = ๐‘Ž โˆง ๐‘Œ = ๐‘] = |๐ถ|/2^๐‘› where ๐ถ = {๐‘ฅ0 โ‹ฏ ๐‘ฅ๐‘›โˆ’1 โˆถ (๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘˜โˆ’1) โˆˆ ๐ด โˆง (๐‘ฅ๐‘˜, โ€ฆ , ๐‘ฅ๐‘˜+๐‘šโˆ’1) โˆˆ ๐ต}. Another way to write this using string concatenation is ๐ถ = {๐‘ฅ๐‘ฆ๐‘ง โˆถ ๐‘ฅ โˆˆ ๐ด, ๐‘ฆ โˆˆ ๐ต, ๐‘ง โˆˆ {0,1}^{๐‘›โˆ’๐‘˜โˆ’๐‘š}}, and hence |๐ถ| = |๐ด||๐ต|2^{๐‘›โˆ’๐‘˜โˆ’๐‘š}, which means that

|๐ถ|/2^๐‘› = (|๐ด|/2^๐‘˜) โ‹… (|๐ต|/2^๐‘š) โ‹… (2^{๐‘›โˆ’๐‘˜โˆ’๐‘š}/2^{๐‘›โˆ’๐‘˜โˆ’๐‘š}) = Pr[๐‘‹ = ๐‘Ž] Pr[๐‘Œ = ๐‘].   (18.12)

If ๐‘‹ and ๐‘Œ are independent random variables then (letting ๐‘†๐‘‹, ๐‘†๐‘Œdenote the sets of all numbers that have positive probability of beingthe output of ๐‘‹ and ๐‘Œ , respectively):๐”ผ[XY] = โˆ‘๐‘Žโˆˆ๐‘†๐‘‹,๐‘โˆˆ๐‘†๐‘Œ Pr[๐‘‹ = ๐‘Ž โˆง ๐‘Œ = ๐‘] โ‹… ๐‘Ž๐‘ =(1) โˆ‘๐‘Žโˆˆ๐‘†๐‘‹,๐‘โˆˆ๐‘†๐‘Œ Pr[๐‘‹ = ๐‘Ž]Pr[๐‘Œ = ๐‘] โ‹… ๐‘Ž๐‘ =(2)

( โˆ‘๐‘Žโˆˆ๐‘†๐‘‹ Pr[๐‘‹ = ๐‘Ž] โ‹… ๐‘Ž) ( โˆ‘๐‘โˆˆ๐‘†๐‘Œ Pr[๐‘Œ = ๐‘] โ‹… ๐‘) =(3)๐”ผ[๐‘‹] ๐”ผ[๐‘Œ ](18.13)

where the first equality (=^{(1)}) follows from the independence of ๐‘‹ and ๐‘Œ, the second equality (=^{(2)}) follows by โ€œopening the parenthesesโ€ of the righthand side, and the third equality (=^{(3)}) follows from the definition of expectation. (This is not an โ€œif and only ifโ€; see Exercise 18.3.)
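The identity ๐”ผ[๐‘‹๐‘Œ] = ๐”ผ[๐‘‹] ๐”ผ[๐‘Œ] can be sanity-checked by enumeration. A minimal sketch, with two arbitrary (hypothetical) variables that depend on disjoint coins and so are independent by Lemma 18.6:

```python
from fractions import Fraction
from itertools import product

# Two coins; X depends only on the first coin, Y only on the second.
space = list(product([0, 1], repeat=2))

def expectation(f):
    """E[f] under the uniform distribution on the sample space."""
    return Fraction(sum(f(x) for x in space), len(space))

X = lambda x: 3 * x[0]       # our hypothetical choice of variable
Y = lambda x: 1 + x[1]       # our hypothetical choice of variable

assert expectation(lambda x: X(x) * Y(x)) == expectation(X) * expectation(Y)
```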

Another useful fact is that if ๐‘‹ and ๐‘Œ are independent random variables, then so are ๐น(๐‘‹) and ๐บ(๐‘Œ) for all functions ๐น, ๐บ โˆถ โ„ โ†’ โ„. This is intuitively true since learning ๐น(๐‘‹) can only provide us with less information than learning ๐‘‹ itself. Hence, if learning ๐‘‹ does not teach us anything about ๐‘Œ (and so also about ๐บ(๐‘Œ)) then


neither will learning ๐น(๐‘‹). Indeed, to prove this we can write for every ๐‘Ž, ๐‘ โˆˆ โ„:

Pr[๐น (๐‘‹) = ๐‘Ž โˆง ๐บ(๐‘Œ ) = ๐‘] = โˆ‘๐‘ฅ s.t.๐น(๐‘ฅ)=๐‘Ž,๐‘ฆ s.t. ๐บ(๐‘ฆ)=๐‘ Pr[๐‘‹ = ๐‘ฅ โˆง ๐‘Œ = ๐‘ฆ] =โˆ‘๐‘ฅ s.t.๐น(๐‘ฅ)=๐‘Ž,๐‘ฆ s.t. ๐บ(๐‘ฆ)=๐‘ Pr[๐‘‹ = ๐‘ฅ]Pr[๐‘Œ = ๐‘ฆ] =โŽ›โŽœโŽ โˆ‘๐‘ฅ s.t.๐น(๐‘ฅ)=๐‘Ž Pr[๐‘‹ = ๐‘ฅ]โŽžโŽŸโŽ  โ‹… โŽ›โŽœโŽ โˆ‘๐‘ฆ s.t.๐บ(๐‘ฆ)=๐‘ Pr[๐‘Œ = ๐‘ฆ]โŽžโŽŸโŽ  =

Pr[๐น (๐‘‹) = ๐‘Ž]Pr[๐บ(๐‘Œ ) = ๐‘].(18.14)

18.2.2 Collections of independent random variables

We can extend the notions of independence to more than two random variables: we say that the random variables ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 are mutually independent if for every ๐‘Ž0, โ€ฆ , ๐‘Ž๐‘›โˆ’1 โˆˆ โ„,

Pr [๐‘‹0 = ๐‘Ž0 โˆง โ‹ฏ โˆง ๐‘‹๐‘›โˆ’1 = ๐‘Ž๐‘›โˆ’1] = Pr[๐‘‹0 = ๐‘Ž0] โ‹ฏPr[๐‘‹๐‘›โˆ’1 = ๐‘Ž๐‘›โˆ’1].(18.15)

And similarly, we have that

Lemma 18.7 โ€” Expectation of product of independent random variables. If ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 are mutually independent then

๐”ผ[โˆ_{๐‘–=0}^{๐‘›โˆ’1} ๐‘‹๐‘–] = โˆ_{๐‘–=0}^{๐‘›โˆ’1} ๐”ผ[๐‘‹๐‘–].   (18.16)

Lemma 18.8 โ€” Functions preserve independence. If ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 are mutually independent, and ๐‘Œ0, โ€ฆ , ๐‘Œ๐‘›โˆ’1 are defined as ๐‘Œ๐‘– = ๐น๐‘–(๐‘‹๐‘–) for some functions ๐น0, โ€ฆ , ๐น๐‘›โˆ’1 โˆถ โ„ โ†’ โ„, then ๐‘Œ0, โ€ฆ , ๐‘Œ๐‘›โˆ’1 are mutually independent as well.

We leave proving Lemma 18.7 and Lemma 18.8 as Exercise 18.6 and Exercise 18.7. It is a good idea for you to stop now and do these exercises to make sure you are comfortable with the notion of independence, as we will use it heavily later on in this course.

18.3 CONCENTRATION AND TAIL BOUNDS

Figure 18.6: The probabilities that we obtain a particular sum when we toss ๐‘› = 10, 20, 100, 1000 coins converge quickly to the Gaussian/normal distribution.

Figure 18.7: Markovโ€™s Inequality tells us that a non-negative random variable ๐‘‹ cannot be much larger than its expectation, with high probability. For example, if the expectation of ๐‘‹ is ๐œ‡, then the probability that ๐‘‹ > 4๐œ‡ must be at most 1/4, as otherwise just the contribution from this part of the sample space would be too large.

The name โ€œexpectationโ€ is somewhat misleading. For example, suppose that you and I place a bet on the outcome of 10 coin tosses, where if they all come out to be 1โ€™s then I pay you 100,000 dollars and otherwise you pay me 10 dollars. If we let ๐‘‹ โˆถ {0,1}^10 โ†’ โ„ be the random variable denoting your gain, then we see that

๐”ผ[๐‘‹] = 2^{โˆ’10} โ‹… 100000 โˆ’ (1 โˆ’ 2^{โˆ’10}) โ‹… 10 โˆผ 90.   (18.17)

But we donโ€™t really โ€œexpectโ€ the result of this experiment to be for you to gain 90 dollars. Rather, 99.9% of the time you will pay me 10 dollars, and you will hit the jackpot about 0.1% of the time.

However, if we repeat this experiment again and again (with fresh and hence independent coins), then in the long run we do expect your average earning to be close to 90 dollars, which is the reason why casinos can make money in a predictable way even though every individual bet is random. For example, if we toss ๐‘› independent and unbiased coins, then as ๐‘› grows, the number of coins that come up ones will be more and more concentrated around ๐‘›/2 according to the famous โ€œbell curveโ€ (see Fig. 18.6).
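The expectation in (18.17) and the โ€œtypical outcomeโ€ can both be computed exactly. A quick sketch of the calculation above:

```python
from fractions import Fraction

# The bet: you win 100,000 dollars if all 10 coins come up 1, and you
# lose 10 dollars otherwise.
p_jackpot = Fraction(1, 2) ** 10                       # 1/1024
expected_gain = p_jackpot * 100_000 - (1 - p_jackpot) * 10

assert expected_gain == Fraction(89_770, 1_024)        # about 87.7 dollars
assert 1 - p_jackpot > Fraction(999, 1000)             # but you lose ~99.9% of the time
```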

Much of probability theory is concerned with so-called concentration or tail bounds, which are upper bounds on the probability that a random variable ๐‘‹ deviates too much from its expectation. The first and simplest one of them is Markovโ€™s inequality:

Theorem 18.9 โ€” Markovโ€™s inequality. If ๐‘‹ is a non-negative random variable then for every ๐‘˜ > 1, Pr[๐‘‹ โ‰ฅ ๐‘˜ ๐”ผ[๐‘‹]] โ‰ค 1/๐‘˜.

Markovโ€™s Inequality is actually a very natural statement (see also Fig. 18.7). For example, if you know that the average (not the median!) household income in the US is 70,000 dollars, then in particular you can deduce that at most 25 percent of households make more than 280,000 dollars, since otherwise, even if the remaining 75 percent had zero income, the top 25 percent alone would cause the average income to be larger than 70,000 dollars. From this example you can already see that in many situations, Markovโ€™s inequality will not be tight and the probability of deviating from the expectation will be much smaller: see the Chebyshev and Chernoff inequalities below.

Proof of Theorem 18.9. Let ๐œ‡ = ๐”ผ[๐‘‹] and define ๐‘Œ = 1_{๐‘‹โ‰ฅ๐‘˜๐œ‡}. That is, ๐‘Œ(๐‘ฅ) = 1 if ๐‘‹(๐‘ฅ) โ‰ฅ ๐‘˜๐œ‡ and ๐‘Œ(๐‘ฅ) = 0 otherwise. Note that by definition, for every ๐‘ฅ, ๐‘Œ(๐‘ฅ) โ‰ค ๐‘‹(๐‘ฅ)/(๐‘˜๐œ‡). We need to show ๐”ผ[๐‘Œ] โ‰ค 1/๐‘˜. But this follows since ๐”ผ[๐‘Œ] โ‰ค ๐”ผ[๐‘‹/(๐‘˜๐œ‡)] = ๐”ผ[๐‘‹]/(๐‘˜๐œ‡) = ๐œ‡/(๐‘˜๐œ‡) = 1/๐‘˜.
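We can check Markovโ€™s inequality exhaustively on a small example (our choice, not the bookโ€™s): let ๐‘‹ be the number of ones among 4 fair coins, so ๐”ผ[๐‘‹] = 2.

```python
from fractions import Fraction
from itertools import product

# X = number of ones in 4 fair coins; check Pr[X >= k*E[X]] <= 1/k exactly.
space = list(product([0, 1], repeat=4))
X = lambda x: sum(x)

mu = Fraction(sum(X(x) for x in space), len(space))      # E[X] = 2
for k in [Fraction(3, 2), Fraction(2), Fraction(3)]:
    tail = Fraction(sum(1 for x in space if X(x) >= k * mu), len(space))
    assert tail <= 1 / k    # Markov's inequality
```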


The averaging principle. While the expectation of a random variable ๐‘‹ is hardly always the โ€œtypical valueโ€, we can show that ๐‘‹ is guaranteed to achieve a value that is at least its expectation with positive probability. For example, if the average grade in an exam is 87 points, at least one student got a grade of 87 or more on the exam. This is known as the averaging principle, and despite its simplicity it is surprisingly useful.

Lemma 18.10 Let ๐‘‹ be a random variable. Then Pr[๐‘‹ โ‰ฅ ๐”ผ[๐‘‹]] > 0.

Proof. Suppose towards the sake of contradiction that Pr[๐‘‹ < ๐”ผ[๐‘‹]] = 1. Then the random variable ๐‘Œ = ๐”ผ[๐‘‹] โˆ’ ๐‘‹ is always positive. By linearity of expectation ๐”ผ[๐‘Œ] = ๐”ผ[๐‘‹] โˆ’ ๐”ผ[๐‘‹] = 0. Yet by Markov, a non-negative random variable ๐‘Œ with ๐”ผ[๐‘Œ] = 0 must equal 0 with probability 1, since the probability that ๐‘Œ > ๐‘˜ โ‹… 0 = 0 is at most 1/๐‘˜ for every ๐‘˜ > 1. Hence we get a contradiction to the assumption that ๐‘Œ is always positive.

18.3.1 Chebyshevโ€™s Inequality

Markovโ€™s inequality says that a (non-negative) random variable ๐‘‹ canโ€™t go too crazy and be, say, a million times its expectation with significant probability. But ideally we would like to say that with high probability, ๐‘‹ should be very close to its expectation, e.g., in the range [0.99๐œ‡, 1.01๐œ‡] where ๐œ‡ = ๐”ผ[๐‘‹]. In such a case we say that ๐‘‹ is concentrated, and hence its expectation (i.e., mean) will be close to its median and other ways of measuring ๐‘‹โ€™s โ€œtypical valueโ€. Chebyshevโ€™s inequality can be thought of as saying that ๐‘‹ is concentrated if it has a small standard deviation.

A standard way to measure the deviation of a random variable from its expectation is by using its standard deviation. For a random variable ๐‘‹, we define the variance of ๐‘‹ as Var[๐‘‹] = ๐”ผ[(๐‘‹ โˆ’ ๐œ‡)^2] where ๐œ‡ = ๐”ผ[๐‘‹]; i.e., the variance is the average squared distance of ๐‘‹ from its expectation. The standard deviation of ๐‘‹ is defined as ๐œŽ[๐‘‹] = โˆšVar[๐‘‹]. (This is well-defined since the variance, being an average of a square, is always a non-negative number.)
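As a concrete check of these definitions, here is a sketch computing Var[๐‘‹] and ๐œŽ[๐‘‹] from first principles for ๐‘‹ = number of ones in 3 fair coins (an example of our choosing):

```python
from fractions import Fraction
from itertools import product
import math

# X = number of ones in 3 fair coins; compute Var[X] = E[(X - mu)^2] directly.
space = list(product([0, 1], repeat=3))
X = lambda x: sum(x)

mu = Fraction(sum(X(x) for x in space), len(space))
var = Fraction(sum((X(x) - mu) ** 2 for x in space), len(space))
sigma = math.sqrt(var)

assert mu == Fraction(3, 2)
assert var == Fraction(3, 4)    # equals n/4 for n fair coins
```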

Using Chebyshevโ€™s inequality, we can control the probability that a random variable is too many standard deviations away from its expectation.

Theorem 18.11 โ€” Chebyshevโ€™s inequality. Suppose that ๐œ‡ = ๐”ผ[๐‘‹] and ๐œŽ^2 = Var[๐‘‹]. Then for every ๐‘˜ > 0, Pr[|๐‘‹ โˆ’ ๐œ‡| โ‰ฅ ๐‘˜๐œŽ] โ‰ค 1/๐‘˜^2.

Proof. The proof follows from Markovโ€™s inequality. We define the random variable ๐‘Œ = (๐‘‹ โˆ’ ๐œ‡)^2. Then ๐”ผ[๐‘Œ] = Var[๐‘‹] = ๐œŽ^2, and hence


Figure 18.8: In the normal distribution or the Bell curve, the probability of deviating ๐‘˜ standard deviations from the expectation shrinks exponentially in ๐‘˜^2, and specifically with probability at least 1 โˆ’ 2๐‘’^{โˆ’๐‘˜^2/2}, a random variable ๐‘‹ of expectation ๐œ‡ and standard deviation ๐œŽ satisfies ๐œ‡ โˆ’ ๐‘˜๐œŽ โ‰ค ๐‘‹ โ‰ค ๐œ‡ + ๐‘˜๐œŽ. This figure gives more precise bounds for ๐‘˜ = 1, 2, 3, 4, 5, 6. (Image credit: Imran Baghirov)

by Markov the probability that ๐‘Œ > ๐‘˜^2๐œŽ^2 is at most 1/๐‘˜^2. But clearly (๐‘‹ โˆ’ ๐œ‡)^2 โ‰ฅ ๐‘˜^2๐œŽ^2 if and only if |๐‘‹ โˆ’ ๐œ‡| โ‰ฅ ๐‘˜๐œŽ.
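Chebyshevโ€™s inequality can be verified exhaustively for a small sum of coins (an illustrative choice, not from the text):

```python
from fractions import Fraction
from itertools import product
import math

# X = number of ones in 10 fair coins; check Pr[|X - mu| >= k*sigma] <= 1/k^2.
n = 10
space = list(product([0, 1], repeat=n))
X = lambda x: sum(x)

mu = Fraction(sum(X(x) for x in space), len(space))               # n/2
var = Fraction(sum((X(x) - mu) ** 2 for x in space), len(space))  # n/4
sigma = math.sqrt(var)

for k in [1, 2, 3]:
    tail = sum(1 for x in space if abs(X(x) - mu) >= k * sigma) / len(space)
    assert tail <= 1 / k ** 2   # Chebyshev's inequality
```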

One example of how to use Chebyshevโ€™s inequality is the setting when ๐‘‹ = ๐‘‹1 + โ‹ฏ + ๐‘‹๐‘› where the ๐‘‹๐‘–โ€™s are independent and identically distributed (i.i.d. for short) variables with values in [0, 1] where each has expectation 1/2. Since ๐”ผ[๐‘‹] = โˆ‘_๐‘– ๐”ผ[๐‘‹๐‘–] = ๐‘›/2, we would like to say that ๐‘‹ is very likely to be in, say, the interval [0.499๐‘›, 0.501๐‘›]. Using Markovโ€™s inequality directly will not help us, since it will only tell us that ๐‘‹ is very likely to be at most 100๐‘› (which we already knew, since it always lies between 0 and ๐‘›). However, since ๐‘‹1, โ€ฆ , ๐‘‹๐‘› are independent,

Var[๐‘‹1 + โ‹ฏ + ๐‘‹๐‘›] = Var[๐‘‹1] + โ‹ฏ + Var[๐‘‹๐‘›] . (18.18)

(We leave showing this to the reader as Exercise 18.8.)

For every random variable ๐‘‹๐‘– in [0, 1], Var[๐‘‹๐‘–] โ‰ค 1 (if the variable is always in [0, 1], it canโ€™t be more than 1 away from its expectation), and hence (18.18) implies that Var[๐‘‹] โ‰ค ๐‘› and hence ๐œŽ[๐‘‹] โ‰ค โˆš๐‘›. For large ๐‘›, โˆš๐‘› โ‰ช 0.001๐‘›, and in particular if โˆš๐‘› โ‰ค 0.001๐‘›/๐‘˜, we can use Chebyshevโ€™s inequality to bound the probability that ๐‘‹ is not in [0.499๐‘›, 0.501๐‘›] by 1/๐‘˜^2.

18.3.2 The Chernoff bound

Chebyshevโ€™s inequality already shows a connection between independence and concentration, but in many cases we can hope for a quantitatively much stronger result. If, as in the example above, ๐‘‹ = ๐‘‹1 + โ€ฆ + ๐‘‹๐‘› where the ๐‘‹๐‘–โ€™s are bounded i.i.d. random variables of mean 1/2, then as ๐‘› grows, the distribution of ๐‘‹ would be roughly the normal or Gaussian distribution โˆ’ that is, distributed according to the bell curve (see Fig. 18.6 and Fig. 18.8). This distribution has the property of being very concentrated in the sense that the probability of deviating ๐‘˜ standard deviations from the mean is not merely 1/๐‘˜^2 as is guaranteed by Chebyshev, but rather is roughly ๐‘’^{โˆ’๐‘˜^2}. Specifically, for a normal random variable ๐‘‹ of expectation ๐œ‡ and standard deviation ๐œŽ, the probability that |๐‘‹ โˆ’ ๐œ‡| โ‰ฅ ๐‘˜๐œŽ is at most 2๐‘’^{โˆ’๐‘˜^2/2}. That is, we have an exponential decay of the probability of deviation.

The following extremely useful theorem shows that such exponential decay occurs every time we have a sum of independent and bounded variables. This theorem is known under many names in different communities, though it is mostly called the Chernoff bound in the computer science literature:


Theorem 18.12 โ€” Chernoff/Hoeffding bound. If ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 are i.i.d. random variables such that ๐‘‹๐‘– โˆˆ [0, 1] and ๐”ผ[๐‘‹๐‘–] = ๐‘ for every ๐‘–, then for every ๐œ– > 0

Pr[โˆฃ๐‘›โˆ’1โˆ‘๐‘–=0 ๐‘‹๐‘– โˆ’ ๐‘๐‘›โˆฃ > ๐œ–๐‘›] โ‰ค 2 โ‹… ๐‘’โˆ’2๐œ–2๐‘›. (18.19)

We omit the proof, which appears in many texts, and uses Markovโ€™s inequality on i.i.d. random variables ๐‘Œ0, โ€ฆ , ๐‘Œ๐‘›โˆ’1 that are of the form ๐‘Œ๐‘– = ๐‘’^{๐œ†๐‘‹๐‘–} for some carefully chosen parameter ๐œ†. See Exercise 18.11 for a proof of the simple (but highly useful and representative) case where each ๐‘‹๐‘– is {0,1}-valued and ๐‘ = 1/2. (See also Exercise 18.12 for a generalization.)

Remark 18.13 โ€” Slight simplification of Chernoff. Since ๐‘’ is roughly 2.7 (and in particular larger than 2), (18.19) would still be true if we replaced its righthand side with ๐‘’^{โˆ’2๐œ–^2๐‘›+1}. For ๐‘› > 1/๐œ–^2, the equation will still be true if we replaced the righthand side with the simpler ๐‘’^{โˆ’๐œ–^2๐‘›}. Hence we will sometimes use the Chernoff bound as stating that for ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 and ๐‘ as above, if ๐‘› > 1/๐œ–^2 then

Pr[โˆฃ๐‘›โˆ’1โˆ‘๐‘–=0 ๐‘‹๐‘– โˆ’ ๐‘๐‘›โˆฃ > ๐œ–๐‘›] โ‰ค ๐‘’โˆ’๐œ–2๐‘›. (18.20)

18.3.3 Application: Supervised learning and empirical risk minimization

Here is a nice application of the Chernoff bound. Consider the task of supervised learning. You are given a set ๐‘† of ๐‘› samples of the form (๐‘ฅ0, ๐‘ฆ0), โ€ฆ , (๐‘ฅ๐‘›โˆ’1, ๐‘ฆ๐‘›โˆ’1) drawn from some unknown distribution ๐ท over pairs (๐‘ฅ, ๐‘ฆ). For simplicity we will assume that ๐‘ฅ๐‘– โˆˆ {0,1}^๐‘š and ๐‘ฆ๐‘– โˆˆ {0,1}. (We use here the concept of a general distribution over the finite set {0,1}^{๐‘š+1} as discussed in Section 18.1.3.) The goal is to find a classifier โ„Ž โˆถ {0,1}^๐‘š โ†’ {0,1} that will minimize the test error, which is the probability ๐ฟ(โ„Ž) that โ„Ž(๐‘ฅ) โ‰  ๐‘ฆ where (๐‘ฅ, ๐‘ฆ) is drawn from the distribution ๐ท. That is, ๐ฟ(โ„Ž) = Pr_{(๐‘ฅ,๐‘ฆ)โˆผ๐ท}[โ„Ž(๐‘ฅ) โ‰  ๐‘ฆ].

One way to find such a classifier is to consider a collection ๐’ž of potential classifiers and look at the classifier โ„Ž in ๐’ž that does best on the training set ๐‘†. The classifier โ„Ž is known as the empirical risk minimizer (see also Section 12.1.6). The Chernoff bound can be used to show that as long as the number ๐‘› of samples is sufficiently larger than the logarithm of |๐’ž|, the test error ๐ฟ(โ„Ž) will be close to its training error


๐ฟ_๐‘†(โ„Ž), which is defined as the fraction of pairs (๐‘ฅ๐‘–, ๐‘ฆ๐‘–) โˆˆ ๐‘† that it fails to classify. (Equivalently, ๐ฟ_๐‘†(โ„Ž) = (1/๐‘›) โˆ‘_{๐‘–โˆˆ[๐‘›]} |โ„Ž(๐‘ฅ๐‘–) โˆ’ ๐‘ฆ๐‘–|.)

Theorem 18.14 โ€” Generalization of ERM. Let ๐ท be any distribution over pairs (๐‘ฅ, ๐‘ฆ) โˆˆ {0,1}^{๐‘š+1} and ๐’ž be any set of functions mapping {0,1}^๐‘š to {0,1}. Then for every ๐œ–, ๐›ฟ > 0, if ๐‘› > log |๐’ž| log(1/๐›ฟ)/๐œ–^2 and ๐‘† is a set of (๐‘ฅ0, ๐‘ฆ0), โ€ฆ , (๐‘ฅ๐‘›โˆ’1, ๐‘ฆ๐‘›โˆ’1) samples that are drawn independently from ๐ท then

Pr๐‘† [โˆ€โ„Žโˆˆ๐’ž|๐ฟ(โ„Ž) โˆ’ ๐‘†(โ„Ž)| โ‰ค ๐œ–] > 1 โˆ’ ๐›ฟ , (18.21)

where the probability is taken over the choice of the set of samples ๐‘†. In particular, if |๐’ž| โ‰ค 2^๐‘˜ and ๐‘› > ๐‘˜ log(1/๐›ฟ)/๐œ–^2, then with probability at least 1 โˆ’ ๐›ฟ, the classifier โ„Ž* โˆˆ ๐’ž that minimizes the empirical training error ๐ฟ_๐‘†(โ„Ž) satisfies ๐ฟ(โ„Ž*) โ‰ค ๐ฟ_๐‘†(โ„Ž*) + ๐œ–, and hence its test error is at most ๐œ– worse than its training error.

Proof Idea:

The idea is to combine the Chernoff bound with the union bound. Let ๐‘˜ = log |๐’ž|. We first use the Chernoff bound to show that for every fixed โ„Ž โˆˆ ๐’ž, if we choose ๐‘† at random then the probability that |๐ฟ(โ„Ž) โˆ’ ๐ฟ_๐‘†(โ„Ž)| > ๐œ– will be smaller than ๐›ฟ/2^๐‘˜. We can then use the union bound over all the 2^๐‘˜ members of ๐’ž to show that this will be the case for every โ„Ž. โ‹†

Proof of Theorem 18.14. Set ๐‘˜ = log |๐’ž| and so ๐‘› > ๐‘˜ log(1/๐›ฟ)/๐œ–^2. We start by making the following claim.

CLAIM: For every โ„Ž โˆˆ ๐’ž, the probability over ๐‘† that |๐ฟ(โ„Ž) โˆ’ ๐ฟ_๐‘†(โ„Ž)| โ‰ฅ ๐œ– is smaller than ๐›ฟ/2^๐‘˜.

We prove the claim using the Chernoff bound. Specifically, for every such โ„Ž, let us define a collection of random variables ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 as follows:

๐‘‹๐‘– = 1 if โ„Ž(๐‘ฅ๐‘–) โ‰  ๐‘ฆ๐‘–, and ๐‘‹๐‘– = 0 otherwise.   (18.22)

Since the samples (๐‘ฅ0, ๐‘ฆ0), โ€ฆ , (๐‘ฅ๐‘›โˆ’1, ๐‘ฆ๐‘›โˆ’1) are drawn independently from the same distribution ๐ท, the random variables ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 are independently and identically distributed. Moreover, for every ๐‘–, ๐”ผ[๐‘‹๐‘–] = ๐ฟ(โ„Ž). Hence by the Chernoff bound (see (18.20)), the probability that |โˆ‘_{๐‘–=0}^{๐‘›โˆ’1} ๐‘‹๐‘– โˆ’ ๐‘› โ‹… ๐ฟ(โ„Ž)| โ‰ฅ ๐œ–๐‘› is at most ๐‘’^{โˆ’๐œ–^2๐‘›} < ๐‘’^{โˆ’๐‘˜ log(1/๐›ฟ)} < ๐›ฟ/2^๐‘˜ (using the fact that ๐‘’ > 2). Since ๐ฟ_๐‘†(โ„Ž) = (1/๐‘›) โˆ‘_{๐‘–โˆˆ[๐‘›]} ๐‘‹๐‘–, this completes the proof of the claim.


Given the claim, the theorem follows from the union bound. Indeed, for every โ„Ž โˆˆ ๐’ž, define the โ€œbad eventโ€ ๐ต_โ„Ž to be the event (over the choice of ๐‘†) that |๐ฟ(โ„Ž) โˆ’ ๐ฟ_๐‘†(โ„Ž)| > ๐œ–. By the claim Pr[๐ต_โ„Ž] < ๐›ฟ/2^๐‘˜, and hence by the union bound the probability that the union of the events ๐ต_โ„Ž for all โ„Ž โˆˆ ๐’ž happens is smaller than |๐’ž|๐›ฟ/2^๐‘˜ โ‰ค ๐›ฟ. If for every โ„Ž โˆˆ ๐’ž, ๐ต_โ„Ž does not happen, it means that for every โ„Ž โˆˆ ๐’ž, |๐ฟ(โ„Ž) โˆ’ ๐ฟ_๐‘†(โ„Ž)| โ‰ค ๐œ–, and so the probability of the latter event is larger than 1 โˆ’ ๐›ฟ, which is what we wanted to prove.
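The requirement in Theorem 18.14 translates into a concrete sample-size calculation. A sketch (the function name is ours; we take log |๐’ž| to base 2 and ignore constant-factor improvements):

```python
import math

def samples_needed(num_classifiers, eps, delta):
    """Sample size from Theorem 18.14: n > k*log(1/delta)/eps^2,
    where k = log2(number of candidate classifiers)."""
    k = math.log2(num_classifiers)
    return math.ceil(k * math.log(1 / delta) / eps ** 2)

# E.g. 2^20 candidate classifiers, gap eps = 1%, confidence 1 - delta = 99%:
n = samples_needed(2 ** 20, eps=0.01, delta=0.01)
assert 900_000 < n < 1_000_000   # roughly 9.2 * 10^5 samples suffice
```

Note the pleasant feature the theorem highlights: the number of samples grows only with the logarithm of the number of candidate classifiers.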

Chapter Recap

โ€ข A basic probabilistic experiment corresponds to tossing ๐‘› coins or choosing ๐‘ฅ uniformly at random from {0,1}^๐‘›.

โ€ข Random variables assign a real number to every result of a coin toss. The expectation of a random variable ๐‘‹ is its average value.

โ€ข There are several concentration results, also known as tail bounds, showing that under certain conditions, random variables deviate significantly from their expectation only with small probability.

18.4 EXERCISES

Exercise 18.1 Suppose that we toss three independent fair coins ๐‘Ž, ๐‘, ๐‘ โˆˆ {0,1}. What is the probability that the XOR of ๐‘Ž, ๐‘, and ๐‘ is equal to 1? What is the probability that the AND of these three values is equal to 1? Are these two events independent?

Exercise 18.2 Give an example of random variables ๐‘‹, ๐‘Œ โˆถ {0,1}^3 โ†’ โ„ such that ๐”ผ[๐‘‹๐‘Œ] โ‰  ๐”ผ[๐‘‹] ๐”ผ[๐‘Œ].

Exercise 18.3 Give an example of random variables ๐‘‹, ๐‘Œ โˆถ {0,1}^3 โ†’ โ„ such that ๐‘‹ and ๐‘Œ are not independent but ๐”ผ[๐‘‹๐‘Œ] = ๐”ผ[๐‘‹] ๐”ผ[๐‘Œ].

Exercise 18.4 Let ๐‘› be an odd number, and let ๐‘‹ โˆถ {0,1}^๐‘› โ†’ โ„ be the random variable defined as follows: for every ๐‘ฅ โˆˆ {0,1}^๐‘›, ๐‘‹(๐‘ฅ) = 1 if โˆ‘_{๐‘–=0}^{๐‘›โˆ’1} ๐‘ฅ๐‘– > ๐‘›/2 and ๐‘‹(๐‘ฅ) = 0 otherwise. Prove that ๐”ผ[๐‘‹] = 1/2.

Exercise 18.5 โ€” standard deviation. 1. Give an example of a random variable ๐‘‹ such that ๐‘‹โ€™s standard deviation is equal to ๐”ผ[|๐‘‹ โˆ’ ๐”ผ[๐‘‹]|].


1 While you donโ€™t need this to solve this exercise, this is the function that maps ๐‘ to the entropy (as defined in Exercise 18.9) of the ๐‘-biased coin distribution over {0,1}, which is the function ๐œ‡ โˆถ {0,1} โ†’ [0,1] s.t. ๐œ‡(0) = 1 โˆ’ ๐‘ and ๐œ‡(1) = ๐‘.

2 Hint: Use Stirlingโ€™s formula for approximating the factorial function.

2. Give an example of a random variable ๐‘‹ such that ๐‘‹โ€™s standard deviation is not equal to ๐”ผ[|๐‘‹ โˆ’ ๐”ผ[๐‘‹]|].

Exercise 18.6 โ€” Product of expectations. Prove Lemma 18.7.

Exercise 18.7 โ€” Transformations preserve independence. Prove Lemma 18.8.

Exercise 18.8 โ€” Variance of independent random variables. Prove that if ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 are independent random variables then Var[๐‘‹0 + โ‹ฏ + ๐‘‹๐‘›โˆ’1] = โˆ‘_{๐‘–=0}^{๐‘›โˆ’1} Var[๐‘‹๐‘–].

Exercise 18.9 โ€” Entropy (challenge). Recall the definition of a distribution ๐œ‡ over some finite set ๐‘†. Shannon defined the entropy of a distribution ๐œ‡, denoted by ๐ป(๐œ‡), to be โˆ‘_{๐‘ฅโˆˆ๐‘†} ๐œ‡(๐‘ฅ) log(1/๐œ‡(๐‘ฅ)). The idea is that if ๐œ‡ is a distribution of entropy ๐‘˜, then encoding members of ๐œ‡ will require ๐‘˜ bits, in an amortized sense. In this exercise we justify this definition. Let ๐œ‡ be such that ๐ป(๐œ‡) = ๐‘˜.

1. Prove that for every one-to-one function ๐น โˆถ ๐‘† โ†’ {0,1}*, ๐”ผ_{๐‘ฅโˆผ๐œ‡} |๐น(๐‘ฅ)| โ‰ฅ ๐‘˜.

2. Prove that for every ๐œ–, there is some ๐‘› and a one-to-one function ๐น โˆถ ๐‘†^๐‘› โ†’ {0,1}*, such that ๐”ผ_{๐‘ฅโˆผ๐œ‡^๐‘›} |๐น(๐‘ฅ)| โ‰ค ๐‘›(๐‘˜ + ๐œ–), where ๐‘ฅ โˆผ ๐œ‡^๐‘› denotes the experiment of choosing ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1 each independently from ๐‘† using the distribution ๐œ‡.

Exercise 18.10 โ€” Entropy approximation to binomial. Let ๐ป(๐‘) = ๐‘ log(1/๐‘) + (1 โˆ’ ๐‘) log(1/(1 โˆ’ ๐‘)).1 Prove that for every ๐‘ โˆˆ (0, 1) and ๐œ– > 0, if ๐‘› is large enough then2

2^{(๐ป(๐‘)โˆ’๐œ–)๐‘›} โ‰ค (๐‘› choose ๐‘๐‘›) โ‰ค 2^{(๐ป(๐‘)+๐œ–)๐‘›}   (18.23)

where (๐‘›๐‘˜) is the binomial coefficient ๐‘›!๐‘˜!(๐‘›โˆ’๐‘˜)! which is equal to thenumber of ๐‘˜-size subsets of 0, โ€ฆ , ๐‘› โˆ’ 1.

Exercise 18.11 โ€” Chernoff using Stirling. 1. Prove that Pr_{๐‘ฅโˆผ{0,1}^๐‘›}[โˆ‘ ๐‘ฅ๐‘– = ๐‘˜] = (๐‘› choose ๐‘˜) โ‹… 2^{โˆ’๐‘›}.

2. Use this and Exercise 18.10 to prove (an approximate version of) the Chernoff bound for the case that ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 are i.i.d. random variables over {0,1}, each equaling 0 and 1 with probability 1/2. That is, prove that for every ๐œ– > 0, and ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 as above, Pr[|โˆ‘_{๐‘–=0}^{๐‘›โˆ’1} ๐‘‹๐‘– โˆ’ ๐‘›/2| > ๐œ–๐‘›] < 2^{โˆ’0.1โ‹…๐œ–^2๐‘›}.


3 Hint: Bound the number of tuples ๐‘—0, โ€ฆ , ๐‘—๐‘›โˆ’1 such that every ๐‘—๐‘– is even and โˆ‘ ๐‘—๐‘– = ๐‘˜ using the binomial coefficient and the fact that in any such tuple there are at most ๐‘˜/2 distinct indices.

4 Hint: Set ๐‘˜ = 2โŒˆ๐œ–^2๐‘›/1000โŒ‰ and then show that if the event |โˆ‘ ๐‘Œ๐‘–| โ‰ฅ ๐œ–๐‘› happens then the random variable (โˆ‘ ๐‘Œ๐‘–)^๐‘˜ is a factor of ๐œ–^{โˆ’๐‘˜} larger than its expectation.

Exercise 18.12 โ€” Poor manโ€™s Chernoff. Exercise 18.11 establishes the Chernoff bound for the case that ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 are i.i.d. variables over {0,1} with expectation 1/2. In this exercise we use a slightly different method (bounding the moments of the random variables) to establish a version of Chernoff where the random variables range over [0, 1] and their expectation is some number ๐‘ โˆˆ [0, 1] that may be different from 1/2. Let ๐‘‹0, โ€ฆ , ๐‘‹๐‘›โˆ’1 be i.i.d. random variables with ๐”ผ[๐‘‹๐‘–] = ๐‘ and Pr[0 โ‰ค ๐‘‹๐‘– โ‰ค 1] = 1. Define ๐‘Œ๐‘– = ๐‘‹๐‘– โˆ’ ๐‘.

1. Prove that for every ๐‘—0, โ€ฆ , ๐‘—๐‘›โˆ’1 โˆˆ โ„•, if there exists one ๐‘– such that ๐‘—๐‘– is odd then ๐”ผ[โˆ_{๐‘–=0}^{๐‘›โˆ’1} ๐‘Œ๐‘–^{๐‘—๐‘–}] = 0.

2. Prove that for every ๐‘˜, ๐”ผ[(โˆ‘_{๐‘–=0}^{๐‘›โˆ’1} ๐‘Œ๐‘–)^๐‘˜] โ‰ค (10๐‘˜๐‘›)^{๐‘˜/2}.3

3. Prove that for every ๐œ– > 0, Pr[|โˆ‘_๐‘– ๐‘Œ๐‘–| โ‰ฅ ๐œ–๐‘›] โ‰ค 2^{โˆ’๐œ–^2๐‘›/(10000 log 1/๐œ–)}.4

Exercise 18.13 โ€” Sampling. Suppose that a country has 300,000,000 citizens, 52 percent of which prefer the color โ€œgreenโ€ and 48 percent of which prefer the color โ€œorangeโ€. Suppose we sample ๐‘› random citizens and ask them their favorite color (assume they will answer truthfully). What is the smallest value ๐‘› among the following choices so that the probability that the majority of the sample answers โ€œgreenโ€ is at most 0.05?

a. 1,000

b. 10,000

c. 100,000

d. 1,000,000

Exercise 18.14 Would the answer to Exercise 18.13 change if the country had 300,000,000,000 citizens?

Exercise 18.15 โ€” Sampling (2). Under the same assumptions as Exercise 18.13, what is the smallest value ๐‘› among the following choices so that the probability that the majority of the sample answers โ€œgreenโ€ is at most 2^{โˆ’100}?

a. 1,000

b. 10,000

c. 100,000


d. 1,000,000

e. It is impossible to get such low probability since there are fewer than 2^100 citizens.

18.5 BIBLIOGRAPHICAL NOTES

There are many sources for more information on discrete probability, including the texts referenced in Section 1.9. One particularly recommended source for probability is Harvardโ€™s STAT 110 class, whose lectures are available on youtube and whose book is available online.

The version of the Chernoff bound that we stated in Theorem 18.12 is sometimes known as Hoeffdingโ€™s Inequality. Other variants of the Chernoff bound are known as well, but all of them are equally good for the applications of this book.


Figure 19.1: A 1947 entry in the log book of the Harvard MARK II computer containing an actual bug that caused a hardware malfunction. Courtesy of the Naval Surface Warfare Center.

1 Some texts also talk about โ€œLas Vegas algorithmsโ€ that always return the right answer but whose running time is only polynomial on the average. Since this Monte Carlo vs Las Vegas terminology is confusing, we will not use these terms anymore, and simply talk about randomized algorithms.

19 Probabilistic computation

โ€œin 1946 .. (I asked myself) what are the chances that a Canfield solitaire laidout with 52 cards will come out successfully? After spending a lot of timetrying to estimate them by pure combinatorial calculations, I wondered whethera more practical method โ€ฆ might not be to lay it our say one hundred times andsimple observe and countโ€, Stanislaw Ulam, 1983

โ€œThe salient features of our method are that it is probabilistic โ€ฆ and with acontrollable miniscule probability of error.โ€, Michael Rabin, 1977

In early computer systems, much effort was devoted to driving out randomness and noise. Hardware components were prone to non-deterministic behavior from a number of causes, whether it was vacuum tubes overheating or actual physical bugs causing short circuits (see Fig. 19.1). This motivated John von Neumann, one of the early computing pioneers, to write a paper on how to error correct computation, introducing the notion of redundancy.

So it is quite surprising that randomness turned out to be not just a hindrance but also a resource for computation, enabling us to achieve tasks much more efficiently than previously known. One of the first applications involved the very same John von Neumann. While he was sick in bed and playing cards, Stan Ulam came up with the observation that calculating statistics of a system could be done much faster by running several randomized simulations. He mentioned this idea to von Neumann, who became very excited about it; indeed, it turned out to be crucial for the neutron transport calculations that were needed for the development of the atom bomb and later on the hydrogen bomb. Because this project was highly classified, Ulam, von Neumann and their collaborators came up with the codeword “Monte Carlo” for this approach (based on the famous casinos where Ulam’s uncle gambled). The name stuck, and randomized algorithms are known as Monte Carlo algorithms to this day.1

In this chapter, we will see some examples of randomized algorithms that use randomness to compute a quantity in a faster or

Compiled on 8.26.2020 18:10

Learning Objectives:
• See examples of randomized algorithms
• Get more comfort with analyzing probabilistic processes and tail bounds
• Success amplification using tail bounds

Page 542: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

542 introduction to theoretical computer science

simpler way than was known otherwise. We will describe the algorithms in an informal / “pseudo-code” way, rather than as NAND-TM or NAND-RAM programs or Turing machines. In Chapter 20 we will discuss how to augment the computational models we saw before to incorporate the ability to “toss coins”.

19.1 FINDING APPROXIMATELY GOOD MAXIMUM CUTS

We start with the following example. Recall the maximum cut problem of finding, given a graph 𝐺 = (𝑉 , 𝐸), the cut that maximizes the number of edges cut. This problem is NP-hard, which means that we do not know of any efficient algorithm that can solve it, but randomization enables a simple algorithm that can cut at least half of the edges:

Theorem 19.1 โ€” Approximating max cut. There is an efficient probabilis-tic algorithm that on input an ๐‘›-vertex ๐‘š-edge graph ๐บ, outputs acut (๐‘†, ๐‘†) that cuts at least ๐‘š/2 of the edges of ๐บ in expectation.

Proof Idea:

We simply choose a random cut: we choose a subset 𝑆 of vertices by choosing every vertex 𝑣 to be a member of 𝑆 with probability 1/2 independently. It’s not hard to see that each edge is cut with probability 1/2 and so the expected number of cut edges is 𝑚/2.

Proof of Theorem 19.1. The algorithm is extremely simple:

Algorithm Random Cut:
Input: Graph 𝐺 = (𝑉 , 𝐸) with 𝑛 vertices and 𝑚 edges. Denote 𝑉 = {𝑣0, 𝑣1, … , 𝑣𝑛−1}.
Operation:

1. Pick 𝑥 uniformly at random in {0, 1}^𝑛.
2. Let 𝑆 ⊆ 𝑉 be the set {𝑣𝑖 ∶ 𝑥𝑖 = 1, 𝑖 ∈ [𝑛]} that includes all vertices corresponding to coordinates of 𝑥 where 𝑥𝑖 = 1.
3. Output the cut (𝑆, 𝑆̄).

We claim that the expected number of edges cut by the algorithm is 𝑚/2. Indeed, for every edge 𝑒 ∈ 𝐸, let 𝑋𝑒 be the random variable such that 𝑋𝑒(𝑥) = 1 if the edge 𝑒 is cut by 𝑥, and 𝑋𝑒(𝑥) = 0 otherwise. For every such edge 𝑒 = {𝑖, 𝑗}, 𝑋𝑒(𝑥) = 1 if and only if 𝑥𝑖 ≠ 𝑥𝑗. Since the pair (𝑥𝑖, 𝑥𝑗) obtains each of the values 00, 01, 10, 11 with probability 1/4, the probability that 𝑥𝑖 ≠ 𝑥𝑗 is 1/2 and hence 𝔼[𝑋𝑒] = 1/2. If we let


๐‘‹ be the random variable corresponding to the total number of edgescut by ๐‘†, then ๐‘‹ = โˆ‘๐‘’โˆˆ๐ธ ๐‘‹๐‘’ and hence by linearity of expectation๐”ผ[๐‘‹] = โˆ‘๐‘’โˆˆ๐ธ ๐”ผ[๐‘‹๐‘’] = ๐‘š(1/2) = ๐‘š/2 . (19.1)

Randomized algorithms work in the worst case. It is tempting to think of a randomized algorithm such as the one of Theorem 19.1 as an algorithm that works for a “random input graph” but it is actually much better than that. The expectation in this theorem is not taken over the choice of the graph, but rather only over the random choices of the algorithm. In particular, for every graph 𝐺, the algorithm is guaranteed to cut half of the edges of the input graph in expectation. That is,

Big Idea 24 A randomized algorithm outputs the correct value with good probability on every possible input.

We will define more formally what “good probability” means in Chapter 20 but the crucial point is that this probability is always only taken over the random choices of the algorithm, while the input is not chosen at random.

19.1.1 Amplifying the success of randomized algorithms
Theorem 19.1 gives us an algorithm that cuts 𝑚/2 edges in expectation. But, as we saw before, expectation does not immediately imply concentration, and so a priori, it may be the case that when we run the algorithm, most of the time we don’t get a cut matching the expectation. Luckily, we can amplify the probability of success by repeating the process several times and outputting the best cut we find. We start by arguing that the probability that the algorithm above succeeds in cutting at least 𝑚/2 edges is not too tiny.

Lemma 19.2 The probability that a random cut in an 𝑚 edge graph cuts at least 𝑚/2 edges is at least 1/(2𝑚).

Proof Idea:

To see the idea behind the proof, think of the case that 𝑚 = 1000. In this case one can show that we will cut at least 500 edges with probability at least 0.001 (and so in particular larger than 1/(2𝑚) = 1/2000). Specifically, if we assume otherwise, then this means that with probability more than 0.999 the algorithm cuts 499 or fewer edges. But since we can never cut more than the total of 1000 edges, given this assumption, the highest possible value of the expected number of cut edges is achieved if we cut exactly 499 edges with probability 0.999 and cut 1000 edges with probability 0.001. Yet even in this case the expected number of edges will


be 0.999 ⋅ 499 + 0.001 ⋅ 1000 < 500, which contradicts the fact that we’ve calculated the expectation to be at least 500 in Theorem 19.1.

Proof of Lemma 19.2. Let 𝑝 be the probability that we cut at least 𝑚/2 edges and suppose, towards a contradiction, that 𝑝 < 1/(2𝑚). Since the number of edges cut is an integer, and 𝑚/2 is a multiple of 0.5, by definition of 𝑝, with probability 1 − 𝑝 we cut at most 𝑚/2 − 0.5 edges. Moreover, since we can never cut more than 𝑚 edges, under our assumption that 𝑝 < 1/(2𝑚), we can bound the expected number of edges cut by

𝑝𝑚 + (1 − 𝑝)(𝑚/2 − 0.5) ≤ 𝑝𝑚 + 𝑚/2 − 0.5 .   (19.2)

But if 𝑝 < 1/(2𝑚) then 𝑝𝑚 < 0.5 and so the right-hand side is smaller than 𝑚/2, which contradicts the fact that (as proven in Theorem 19.1) the expected number of edges cut is at least 𝑚/2.

19.1.2 Success amplification
Lemma 19.2 shows that our algorithm succeeds at least some of the time, but we’d like to succeed almost all of the time. The approach to do that is to simply repeat our algorithm many times, with fresh randomness each time, and output the best cut we get in one of these repetitions. It turns out that with extremely high probability we will get a cut of size at least 𝑚/2. For example, if we repeat this experiment 2000𝑚 times, then (using the inequality (1 − 1/𝑘)^𝑘 ≤ 1/𝑒 ≤ 1/2) we can show that the probability that we will never cut at least 𝑚/2 edges is at most

(1 − 1/(2𝑚))^(2000𝑚) ≤ 2^−1000 .   (19.3)

More generally, the same calculations can be used to show the following lemma:

Lemma 19.3 There is an algorithm that on input a graph 𝐺 = (𝑉 , 𝐸) and a number 𝑘, runs in time polynomial in |𝑉 | and 𝑘 and outputs a cut (𝑆, 𝑆̄) such that

Pr[number of edges cut by (𝑆, 𝑆̄) ≥ |𝐸|/2] ≥ 1 − 2^−𝑘 .   (19.4)

Proof of Lemma 19.3. The algorithm will work as follows:

Algorithm AMPLIFY RANDOM CUT:
Input: Graph 𝐺 = (𝑉 , 𝐸) with 𝑛 vertices and 𝑚 edges. Denote 𝑉 = {𝑣0, 𝑣1, … , 𝑣𝑛−1}. Number 𝑘 > 0.
Operation:


1. Repeat the following 200๐‘˜๐‘š times:

a. Pick ๐‘ฅ uniformly at random in 0, 1๐‘›.b. Let ๐‘† โŠ† ๐‘‰ be the set ๐‘ฃ๐‘– โˆถ ๐‘ฅ๐‘– = 1 , ๐‘– โˆˆ [๐‘›] that includes all

vertices corresponding to coordinates of ๐‘ฅ where ๐‘ฅ๐‘– = 1.c. If (๐‘†, ๐‘†) cuts at least ๐‘š/2 then halt and output (๐‘†, ๐‘†).

2. Output โ€œfailedโ€

We leave completing the analysis as an exercise to the reader (see Exercise 19.1).
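The amplified algorithm can be sketched in Python as follows (our own illustrative code, mirroring the pseudocode above):

```python
import random

def amplify_random_cut(n, edges, k):
    # Repeat the random-cut experiment 200*k*m times and return the first
    # cut that cuts at least m/2 edges, or None ("failed"); Lemma 19.3
    # bounds the failure probability by 2**-k.
    m = len(edges)
    for _ in range(200 * k * m):
        x = [random.randrange(2) for _ in range(n)]
        cut_edges = [(u, v) for (u, v) in edges if x[u] != x[v]]
        if 2 * len(cut_edges) >= m:
            return {i for i in range(n) if x[i] == 1}, cut_edges
    return None

# Example run on a triangle graph.
result = amplify_random_cut(3, [(0, 1), (1, 2), (0, 2)], k=10)
```

With 𝑘 = 10 and 𝑚 = 3 this performs up to 6000 independent repetitions, so in practice it essentially always returns a cut of size at least 𝑚/2.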

19.1.3 Two-sided amplification
The analysis above relied on the fact that the maximum cut algorithm has one-sided error. By this we mean that if we get a cut of size at least 𝑚/2 then we know we have succeeded. This is common for randomized algorithms, but is not the only case. In particular, consider the task of computing some Boolean function 𝐹 ∶ {0, 1}^∗ → {0, 1}. A randomized algorithm 𝐴 for computing 𝐹, given input 𝑥, might toss coins and succeed in outputting 𝐹(𝑥) with probability, say, 0.9. We say that 𝐴 has two-sided errors if there is positive probability that 𝐴(𝑥) outputs 1 when 𝐹(𝑥) = 0, and positive probability that 𝐴(𝑥) outputs 0 when 𝐹(𝑥) = 1. In such a case, to amplify 𝐴’s success, we cannot simply repeat it 𝑘 times and output 1 if a single one of those repetitions resulted in 1, nor can we output 0 if a single one of the repetitions resulted in 0. But we can output the majority value of these repetitions. By the Chernoff bound (Theorem 18.12), with probability exponentially close to 1 (i.e., 1 − 2^−Ω(𝑘)), the fraction of the repetitions where 𝐴 will output 𝐹(𝑥) will be at least, say, 0.89, and in such cases we will of course output the correct answer.

The above translates into the following theorem:

Theorem 19.4 โ€” Two-sided amplification. If ๐น โˆถ 0, 1โˆ— โ†’ 0, 1 is a func-tion such that there is a polynomial-time algorithm ๐ด satisfying

Pr[๐ด(๐‘ฅ) = ๐น(๐‘ฅ)] โ‰ฅ 0.51 (19.5)

for every ๐‘ฅ โˆˆ 0, 1โˆ—, then there is a polynomial time algorithm ๐ตsatisfying

Pr[๐ต(๐‘ฅ) = ๐น(๐‘ฅ)] โ‰ฅ 1 โˆ’ 2โˆ’|๐‘ฅ| (19.6)for every ๐‘ฅ โˆˆ 0, 1โˆ—.We omit the proof of Theorem 19.4, since we will prove a more

general result later on in Theorem 20.5.
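To make the majority-vote idea concrete, here is an illustrative Python sketch (our own; the noisy_parity example is a hypothetical algorithm invented for illustration, not from the text):

```python
import random
from collections import Counter

def amplify_two_sided(A, x, k):
    # Run the two-sided-error algorithm A on x for k independent repetitions
    # and output the majority answer; by the Chernoff bound the majority is
    # wrong with probability only 2**(-Omega(k)).
    votes = Counter(A(x) for _ in range(k))
    return votes.most_common(1)[0][0]

# Hypothetical example: an algorithm for the parity of x that
# errs with probability 0.3 on every input.
def noisy_parity(x):
    correct = sum(x) % 2
    return correct if random.random() < 0.7 else 1 - correct
```

For instance, amplify_two_sided(noisy_parity, (1, 0, 1), 1001) outputs the correct parity 0 except with exponentially small probability.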


2 This question does have some significance to practice, since hardware that generates high quality randomness at speed is nontrivial to construct.

19.1.4 What does this mean?
We have shown a probabilistic algorithm that on any 𝑚 edge graph 𝐺, will output a cut of at least 𝑚/2 edges with probability at least 1 − 2^−1000. Does it mean that we can consider this problem as “easy”? Should we be somewhat wary of using a probabilistic algorithm, since it can sometimes fail?

First of all, it is important to emphasize that this is still a worst-case guarantee. That is, we are not assuming anything about the input graph: the probability is only due to the internal randomness of the algorithm. While a probabilistic algorithm might not seem as nice as a deterministic algorithm that is guaranteed to give an output, to get a sense of what a failure probability of 2^−1000 means, note that:

โ€ข The chance of winning the Massachusetts Mega Million lottery isone over (75)5 โ‹… 15 which is roughly 2โˆ’35. So 2โˆ’1000 corresponds towinning the lottery about 300 times in a row, at which point youmight not care so much about your algorithm failing.

โ€ข The chance for a U.S. resident to be struck by lightning is about1/700000, which corresponds to about 2โˆ’45 chance that youโ€™ll bestruck by lightning the very second that youโ€™re reading this sen-tence (after which again you might not care so much about thealgorithmโ€™s performance).

โ€ข Since the earth is about 5 billion years old, we can estimate thechance that an asteroid of the magnitude that caused the dinosaursโ€™extinction will hit us this very second to be about 2โˆ’60. It is quitelikely that even a deterministic algorithm will fail if this happens.

So, in practical terms, a probabilistic algorithm is just as good as a deterministic one. But it is still a theoretically fascinating question whether randomized algorithms actually yield more power, or whether it is the case that for any computational problem that can be solved by a probabilistic algorithm, there is a deterministic algorithm with nearly the same performance.2 For example, we will see in Exercise 19.2 that there is in fact a deterministic algorithm that can cut at least 𝑚/2 edges in an 𝑚-edge graph. We will discuss this question in generality in Chapter 20. For now, let us see a couple of examples where randomization leads to algorithms that are better in some sense than the known deterministic algorithms.

19.1.5 Solving SAT through randomization
The 3SAT problem is NP-hard, and so it is unlikely that it has a polynomial (or even subexponential) time algorithm. But this does not mean that we can’t do at least somewhat better than the trivial 2^𝑛 algorithm for 𝑛-variable 3SAT. The best known worst-case algorithms


3 At the time of this writing, the best known randomized algorithms for 3SAT run in time roughly 𝑂(1.308^𝑛), and the best known deterministic algorithms run in time 𝑂(1.3303^𝑛) in the worst case.

for 3SAT are randomized, and are related to the following simple algorithm, variants of which are also used in practice:

Algorithm WalkSAT:
Input: An 𝑛 variable 3CNF formula 𝜑.
Parameters: 𝑇 , 𝑆 ∈ ℕ
Operation:

1. Repeat the following 𝑇 steps:

   a. Choose a random assignment 𝑥 ∈ {0, 1}^𝑛 and repeat the following for 𝑆 steps:

      i. If 𝑥 satisfies 𝜑 then output 𝑥.
      ii. Otherwise, choose a random clause (ℓ𝑖 ∨ ℓ𝑗 ∨ ℓ𝑘) that 𝑥 does not satisfy, choose a random literal in {ℓ𝑖, ℓ𝑗, ℓ𝑘} and modify 𝑥 to satisfy this literal.

2. If all the 𝑇 ⋅ 𝑆 repetitions above did not result in a satisfying assignment then output Unsatisfiable.

The running time of this algorithm is 𝑆 ⋅ 𝑇 ⋅ 𝑝𝑜𝑙𝑦(𝑛), and so the key question is how small we can make 𝑆 and 𝑇 so that the probability that WalkSAT outputs Unsatisfiable on a satisfiable formula 𝜑 is small. It is known that we can do so with 𝑆 ⋅ 𝑇 = 𝑂((4/3)^𝑛) = 𝑂(1.333…^𝑛) (see Exercise 19.4 for a weaker result), but we’ll show below a simpler analysis yielding 𝑆 ⋅ 𝑇 = 𝑂(√3^𝑛) = 𝑂(1.74^𝑛), which is still much better than the trivial 2^𝑛 bound.3
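An illustrative Python rendering of WalkSAT (our own sketch; literals are encoded as signed integers in the DIMACS style, with 𝑖 standing for the variable 𝑥𝑖−1 and −𝑖 for its negation):

```python
import random

def walksat(clauses, n, T, S):
    # Literal i > 0 stands for the variable x_{i-1}; -i for its negation.
    def sat(lit, x):
        return x[lit - 1] == 1 if lit > 0 else x[-lit - 1] == 0

    def satisfies(x):
        return all(any(sat(l, x) for l in c) for c in clauses)

    for _ in range(T):                      # T random restarts ...
        x = [random.randrange(2) for _ in range(n)]
        for _ in range(S):                  # ... of S local steps each
            if satisfies(x):
                return x
            unsat = [c for c in clauses if not any(sat(l, x) for l in c)]
            lit = random.choice(random.choice(unsat))
            x[abs(lit) - 1] = 1 - x[abs(lit) - 1]  # make the chosen literal true
        if satisfies(x):
            return x
    return None  # "Unsatisfiable"
```

For example, walksat([(1, 2, 3), (-1, -2, 3), (-3, 1, 2)], 3, T=20, S=6) almost surely returns a satisfying assignment of this small formula.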

Theorem 19.5 โ€” WalkSAT simple analysis. If we set ๐‘‡ = 100 โ‹… โˆš3๐‘› and๐‘† = ๐‘›/2, then the probability we output Unsatisfiable for asatisfiable ๐œ‘ is at most 1/2.

Proof. Suppose that ๐œ‘ is a satisfiable formula and let ๐‘ฅโˆ— be a satisfyingassignment for it. For every ๐‘ฅ โˆˆ 0, 1๐‘›, denote by ฮ”(๐‘ฅ, ๐‘ฅโˆ—) the num-ber of coordinates that differ between ๐‘ฅ and ๐‘ฅโˆ—. The heart of the proofis the following claim:

Claim I: For every 𝑥, 𝑥∗ as above, in every local improvement step, the value Δ(𝑥, 𝑥∗) is decreased by one with probability at least 1/3.

Proof of Claim I: Since 𝑥∗ is a satisfying assignment, if 𝐶 is a clause that 𝑥 does not satisfy, then at least one of the variables involved in 𝐶 must get different values in 𝑥 and 𝑥∗. Thus when we change 𝑥 by one of the three literals in the clause, we have probability at least 1/3 of decreasing the distance.

The second claim is that our starting point is not that bad:

Claim II: With probability at least 1/2 over a random 𝑥 ∈ {0, 1}^𝑛, Δ(𝑥, 𝑥∗) ≤ 𝑛/2.

Proof of Claim II: Consider the map FLIP ∶ {0, 1}^𝑛 → {0, 1}^𝑛 that simply “flips” all the bits of its input from 0 to 1 and vice versa. That


Figure 19.2: For every 𝑥∗ ∈ {0, 1}^𝑛, we can sort all strings in {0, 1}^𝑛 according to their distance from 𝑥∗ (top to bottom in the above figure), where we let 𝐴 = {𝑥 ∈ {0, 1}^𝑛 | 𝑑𝑖𝑠𝑡(𝑥, 𝑥∗) ≤ 𝑛/2} be the “top half” of strings. If we define FLIP ∶ {0, 1}^𝑛 → {0, 1}^𝑛 to be the map that “flips” the bits of a given string 𝑥 then it maps every 𝑥 ∈ 𝐴̄ to an output FLIP(𝑥) ∈ 𝐴 in a one-to-one way, and so it demonstrates that |𝐴̄| ≤ |𝐴|, which implies that Pr[𝐴] ≥ Pr[𝐴̄] and hence Pr[𝐴] ≥ 1/2.

Figure 19.3: The bipartite matching problem in the graph 𝐺 = (𝐿 ∪ 𝑅, 𝐸) can be reduced to the minimum 𝑠, 𝑡 cut problem in the graph 𝐺′ obtained by adding vertices 𝑠, 𝑡 to 𝐺, connecting 𝑠 with 𝐿 and connecting 𝑡 with 𝑅.

is, FLIP(๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1) = (1โˆ’๐‘ฅ0, โ€ฆ , 1โˆ’๐‘ฅ๐‘›โˆ’1). Clearly FLIP is one to one.Moreover, if ๐‘ฅ is of distance ๐‘˜ to ๐‘ฅโˆ—, then FLIP(๐‘ฅ) is distance ๐‘› โˆ’ ๐‘˜ to๐‘ฅโˆ—. Now let ๐ต be the โ€œbad eventโ€ in which ๐‘ฅ is of distance > ๐‘›/2 from๐‘ฅโˆ—. Then the set ๐ด = FLIP(๐ต) = FLIP(๐‘ฅ) โˆถ ๐‘ฅ โˆˆ 0, 1๐‘› satisfies|๐ด| = |๐ต| and that if ๐‘ฅ โˆˆ ๐ด then ๐‘ฅ is of distance < ๐‘›/2 from ๐‘ฅโˆ—. Since๐ด and ๐ต are disjoint events, Pr[๐ด] + Pr[๐ต] โ‰ค 1. Since they have thesame cardinality, they have the same probability and so we get that2Pr[๐ต] โ‰ค 1 or Pr[๐ต] โ‰ค 1/2. (See also Fig. 19.2).

Claims I and II imply that each of the 𝑇 iterations of the outer loop succeeds with probability at least 1/2 ⋅ √3^−𝑛. Indeed, by Claim II, the original guess 𝑥 will satisfy Δ(𝑥, 𝑥∗) ≤ 𝑛/2 with probability Pr[Δ(𝑥, 𝑥∗) ≤ 𝑛/2] ≥ 1/2. By Claim I, even conditioned on all the history so far, for each of the 𝑆 = 𝑛/2 steps of the inner loop we have probability at least 1/3 of being “lucky” and decreasing the distance (i.e., the output of Δ) by one. The chance we will be lucky in all 𝑛/2 steps is hence at least (1/3)^(𝑛/2) = √3^−𝑛.

Since any single iteration of the outer loop succeeds with probability at least 1/2 ⋅ √3^−𝑛, the probability that we never do so in 𝑇 = 100 ⋅ √3^𝑛 repetitions is at most (1 − 1/(2√3^𝑛))^(100⋅√3^𝑛) ≤ (1/𝑒)^50.

19.1.6 Bipartite matching
The matching problem is one of the canonical optimization problems, arising in all kinds of applications: matching residents with hospitals, kidney donors with patients, flights with crews, and many others. One prototypical variant is bipartite perfect matching. In this problem, we are given a bipartite graph 𝐺 = (𝐿 ∪ 𝑅, 𝐸) which has 2𝑛 vertices partitioned into 𝑛-sized sets 𝐿 and 𝑅, where all edges have one endpoint in 𝐿 and the other in 𝑅. The goal is to determine whether there is a perfect matching: a subset 𝑀 ⊆ 𝐸 of 𝑛 disjoint edges. That is, 𝑀 matches every vertex in 𝐿 to a unique vertex in 𝑅.

The bipartite matching problem turns out to have a polynomial-time algorithm, since we can reduce finding a matching in 𝐺 to finding a maximum flow (or equivalently, minimum cut) in a related graph 𝐺′ (see Fig. 19.3). However, we will see a different probabilistic algorithm to determine whether a graph contains such a matching.

Let us label 𝐺’s vertices as 𝐿 = {ℓ0, … , ℓ𝑛−1} and 𝑅 = {𝑟0, … , 𝑟𝑛−1}. A matching 𝑀 corresponds to a permutation 𝜋 ∈ 𝑆𝑛 (i.e., a one-to-one and onto function 𝜋 ∶ [𝑛] → [𝑛]) where for every 𝑖 ∈ [𝑛], we define 𝜋(𝑖) to be the unique 𝑗 such that 𝑀 contains the edge {ℓ𝑖, 𝑟𝑗}. Define an 𝑛 × 𝑛 matrix 𝐴 = 𝐴(𝐺) where 𝐴𝑖,𝑗 = 1 if and only if the edge {ℓ𝑖, 𝑟𝑗} is present and 𝐴𝑖,𝑗 = 0 otherwise. The correspondence between matchings and permutations implies the following claim:


4 The sign of a permutation 𝜋 ∶ [𝑛] → [𝑛], denoted by 𝑠𝑖𝑔𝑛(𝜋), can be defined in several equivalent ways, one of which is that 𝑠𝑖𝑔𝑛(𝜋) = (−1)^INV(𝜋) where INV(𝜋) = |{(𝑥, 𝑦) ∈ [𝑛]^2 | 𝑥 < 𝑦 ∧ 𝜋(𝑥) > 𝜋(𝑦)}| (i.e., INV(𝜋) is the number of pairs of elements that are inverted by 𝜋). The importance of the term 𝑠𝑖𝑔𝑛(𝜋) is that it makes 𝑃 equal to the determinant of the matrix (𝑥𝑖,𝑗) and hence efficiently computable.

Figure 19.4: A degree 𝑑 curve in one variable can have at most 𝑑 roots. In higher dimensions, an 𝑛-variate degree-𝑑 polynomial can have an infinite number of roots, though the set of roots will be an (𝑛 − 1)-dimensional surface. Over a finite field 𝔽, an 𝑛-variate degree-𝑑 polynomial has at most 𝑑|𝔽|^(𝑛−1) roots.

Lemma 19.6 โ€” Matching polynomial. Define ๐‘ƒ = ๐‘ƒ(๐บ) to be the polynomialmapping โ„๐‘›2 to โ„ where๐‘ƒ(๐‘ฅ0,0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1,๐‘›โˆ’1) = โˆ‘๐œ‹โˆˆ๐‘†๐‘› (๐‘›โˆ’1โˆ๐‘–=0 ๐‘ ๐‘–๐‘”๐‘›(๐œ‹)๐ด๐‘–,๐œ‹(๐‘–)) ๐‘›โˆ’1โˆ๐‘–=0 ๐‘ฅ๐‘–,๐œ‹(๐‘–) (19.7)

Then ๐บ has a perfect matching if and only if ๐‘ƒ is not identically zero.That is, ๐บ has a perfect matching if and only if there exists some as-signment ๐‘ฅ = (๐‘ฅ๐‘–,๐‘—)๐‘–,๐‘—โˆˆ[๐‘›] โˆˆ โ„๐‘›2 such that ๐‘ƒ(๐‘ฅ) โ‰  0.4Proof. If ๐บ has a perfect matching ๐‘€โˆ—, then let ๐œ‹โˆ— be the permutationcorresponding to ๐‘€ and let ๐‘ฅโˆ— โˆˆ โ„ค๐‘›2 defined as follows: ๐‘ฅ๐‘–,๐‘— = 1 if๐‘— = ๐œ‹(๐‘–) and ๐‘ฅ๐‘–,๐‘— = 0. Note that for every ๐œ‹ โ‰  ๐œ‹โˆ—, โˆ๐‘›โˆ’1๐‘–=0 ๐‘ฅ๐‘–,๐œ‹(๐‘–) = 0 butโˆ๐‘›โˆ’1๐‘–=0 ๐‘ฅโˆ—๐‘–,๐œ‹โˆ—(๐‘–) = 1. Hence ๐‘ƒ(๐‘ฅโˆ—) will equal โˆ๐‘›โˆ’1๐‘–=0 ๐ด๐‘–,๐œ‹โˆ—(๐‘–). But since ๐‘€โˆ— isa perfect matching in ๐บ, โˆ๐‘›โˆ’1๐‘–=0 ๐ด๐‘–,๐œ‹โˆ—(๐‘–) = 1.

On the other hand, suppose that 𝑃 is not identically zero. By (19.7), this means that at least one of the terms ∏_{𝑖=0}^{𝑛−1} 𝐴𝑖,𝜋(𝑖) is not equal to zero. But then this permutation 𝜋 must be a perfect matching in 𝐺.

As weโ€™ve seen before, for every ๐‘ฅ โˆˆ โ„๐‘›2 , we can compute ๐‘ƒ(๐‘ฅ)by simply computing the determinant of the matrix ๐ด(๐‘ฅ), which isobtained by replacing ๐ด๐‘–,๐‘— with ๐ด๐‘–,๐‘—๐‘ฅ๐‘–,๐‘—. This reduces testing perfectmatching to the zero testing problem for polynomials: given some poly-nomial ๐‘ƒ(โ‹…), test whether ๐‘ƒ is identically zero or not. The intuitionbehind our randomized algorithm for zero testing is the following:

If a polynomial is not identically zero, then it canโ€™t have โ€œtoo manyโ€ roots.

This intuition sort of makes sense. For one-variable polynomials, we know that a nonzero linear function has at most one root, a quadratic function (e.g., a parabola) has at most two roots, and generally a degree 𝑑 equation has at most 𝑑 roots. While in more than one variable there can be an infinite number of roots (e.g., the polynomial 𝑥0 + 𝑦0 vanishes on the line 𝑦 = −𝑥) it is still the case that the set of roots is very “small” compared to the set of all inputs. For example, the roots of a bivariate polynomial form a curve, the roots of a three-variable polynomial form a surface, and more generally the roots of an 𝑛-variable polynomial are a space of dimension 𝑛 − 1.

This intuition leads to the following simple randomized algorithm:

To decide if ๐‘ƒ is identically zero, choose a โ€œrandomโ€ input ๐‘ฅ and check if๐‘ƒ(๐‘ฅ) โ‰  0.This makes sense: if there are only โ€œfewโ€ roots, then we expect that

with high probability the random input 𝑥 is not going to be one of those roots. However, to transform this into an actual algorithm, we


need to make both the intuition and the notion of a “random” input precise. Choosing a random real number is quite problematic, especially when you have only a finite number of coins at your disposal, and so we start by reducing the task to a finite setting. We will use the following result:

Theorem 19.7 โ€” Schwartzโ€“Zippel lemma. For every integer ๐‘ž, and poly-nomial ๐‘ƒ โˆถ โ„๐‘› โ†’ โ„ with integer coefficients. If ๐‘ƒ has degree atmost ๐‘‘ and is not identically zero, then it has at most ๐‘‘๐‘ž๐‘›โˆ’1 roots inthe set [๐‘ž]๐‘› = (๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1) โˆถ ๐‘ฅ๐‘– โˆˆ 0, โ€ฆ , ๐‘ž โˆ’ 1.We omit the (not too complicated) proof of Theorem 19.7. We

remark that it holds not just over the real numbers but over any field as well. Since the matching polynomial 𝑃 of Lemma 19.6 has degree at most 𝑛, Theorem 19.7 leads directly to a simple algorithm for testing if it is nonzero:

Algorithm Perfect-Matching:
Input: Bipartite graph 𝐺 on 2𝑛 vertices {ℓ0, … , ℓ𝑛−1, 𝑟0, … , 𝑟𝑛−1}.
Operation:

1. For every 𝑖, 𝑗 ∈ [𝑛], choose 𝑥𝑖,𝑗 independently at random from [2𝑛] = {0, … , 2𝑛 − 1}.
2. Compute the determinant of the matrix 𝐴(𝑥) whose (𝑖, 𝑗)-th entry equals 𝑥𝑖,𝑗 if the edge {ℓ𝑖, 𝑟𝑗} is present and 0 otherwise.
3. Output no perfect matching if this determinant is zero, and output perfect matching otherwise.
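Here is a self-contained Python sketch of this algorithm (our own; for simplicity it computes the determinant exactly over the rationals via Gaussian elimination, rather than the faster approach of working modulo a prime suggested in Exercise 19.5):

```python
import random
from fractions import Fraction

def determinant(M):
    # Exact determinant by Gaussian elimination over the rationals.
    M = [[Fraction(v) for v in row] for row in M]
    n, det = len(M), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            det = -det
        det *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return det

def has_perfect_matching(n, edges, trials=1):
    # One-sided error test: substitute independent random values from [2n]
    # for the edge variables and check whether det(A(x)) is nonzero.
    # "True" is always correct; by the Schwartz-Zippel lemma a graph that
    # does have a perfect matching is missed with probability at most
    # n/(2n) = 1/2 per trial.
    edge_set = set(edges)
    for _ in range(trials):
        A = [[random.randrange(2 * n) if (i, j) in edge_set else 0
              for j in range(n)] for i in range(n)]
        if determinant(A) != 0:
            return True
    return False
```

Repeating for several trials drives the one-sided error down exponentially, in the spirit of the amplification arguments earlier in the chapter.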

This algorithm can be improved further (e.g., see Exercise 19.5). While it is not necessarily faster than the cut-based algorithms for perfect matching, it does have some advantages. In particular, it is more amenable to parallelization. (However, it also has the significant disadvantage that it does not produce a matching but only states that one exists.) The Schwartz–Zippel Lemma, and the associated zero testing algorithm for polynomials, is widely used across computer science, including in several settings where we have no known deterministic algorithm matching their performance.

Chapter Recap

โ€ข Using concentration results, we can amplify inpolynomial time the success probability of a prob-abilistic algorithm from a mere 1/๐‘(๐‘›) to 1 โˆ’ 2โˆ’๐‘ž(๐‘›)for every polynomials ๐‘ and ๐‘ž.

โ€ข There are several randomized algorithms that arebetter in various senses (e.g., simpler, faster, orother advantages) than the best known determinis-tic algorithm for the same problem.


5 TODO: add exercise to give a deterministic max cut algorithm that gives 𝑚/2 edges. Talk about greedy approach.

6 Hint: Think of 𝑥 ∈ {0, 1}^𝑛 as choosing 𝑘 numbers 𝑦1, … , 𝑦𝑘 ∈ {0, … , 2^⌈log 𝑀⌉ − 1}. Output the first such number that is in {0, … , 𝑀 − 1}.

19.2 EXERCISES

Exercise 19.1 โ€” Amplification for max cut. Prove Lemma 19.3

Exercise 19.2 โ€” Deterministic max cut algorithm. 5

Exercise 19.3 โ€” Simulating distributions using coins. Our model for proba-bility involves tossing ๐‘› coins, but sometimes algorithm require sam-pling from other distributions, such as selecting a uniform number in0, โ€ฆ , ๐‘€ โˆ’ 1 for some ๐‘€ . Fortunately, we can simulate this with anexponentially small probability of error: prove that for every ๐‘€ , if ๐‘› >๐‘˜โŒˆlog๐‘€โŒ‰, then there is a function ๐น โˆถ 0, 1๐‘› โ†’ 0, โ€ฆ , ๐‘€ โˆ’ 1 โˆช โŠฅsuch that (1) The probability that ๐น(๐‘ฅ) = โŠฅ is at most 2โˆ’๐‘˜ and (2) thedistribution of ๐น(๐‘ฅ) conditioned on ๐น(๐‘ฅ) โ‰  โŠฅ is equal to the uniformdistribution over 0, โ€ฆ , ๐‘€ โˆ’ 1.6

Exercise 19.4 โ€” Better walksat analysis. 1. Prove that for every ๐œ– > 0, if๐‘› is large enough then for every ๐‘ฅโˆ— โˆˆ 0, 1๐‘› Pr๐‘ฅโˆผ0,1๐‘› [ฮ”(๐‘ฅ, ๐‘ฅโˆ—) โ‰ค๐‘›/3] โ‰ค 2โˆ’(1โˆ’๐ป(1/3)โˆ’๐œ–)๐‘› where ๐ป(๐‘) = ๐‘ log(1/๐‘)+(1โˆ’๐‘) log(1/(1โˆ’๐‘))is the same function as in Exercise 18.10.

2. Prove that 21โˆ’๐ป(1/4)+(1/4) log3 = (3/2).3. Use the above to prove that for every ๐›ฟ > 0 and large enough ๐‘›, if

we set ๐‘‡ = 1000 โ‹… (3/2 + ๐›ฟ)๐‘› and ๐‘† = ๐‘›/4 in the WalkSAT algorithmthen for every satisfiable 3CNF ๐œ‘, the probability that we outputunsatisfiable is at most 1/2.

Exercise 19.5 โ€” Faster bipartite matching (challenge). (to be completed: im-prove the matching algorithm by working modulo a prime)

19.3 BIBLIOGRAPHICAL NOTES

The books of Motwani and Raghavan [MR95] and Mitzenmacher and Upfal [MU17] are two excellent resources for randomized algorithms. Some of the history of the discovery of the Monte Carlo algorithm is covered here.

19.4 ACKNOWLEDGEMENTS


Figure 20.1: A mechanical coin tosser built for Persi Diaconis by Harvard technicians Steve Sansone and Rick Haggerty.

20 Modeling randomized computation

โ€œAny one who considers arithmetical methods of producing random digits is, ofcourse, in a state of sin.โ€ John von Neumann, 1951.

So far we have described randomized algorithms in an informal way, assuming that an operation such as “pick a string 𝑥 ∈ {0, 1}^𝑛” can be done efficiently. We have neglected to address two questions:

1. How do we actually efficiently obtain random strings in the physical world?

2. What is the mathematical model for randomized computations, and is it more powerful than deterministic computation?

The first question is of both practical and theoretical importance, but for now let’s just say that there are various physical sources of “random” or “unpredictable” data. A user’s mouse movements and typing pattern, (non solid state) hard drive and network latency, thermal noise, and radioactive decay have all been used as sources for randomness (see discussion in Section 20.8). For example, many Intel chips come with a random number generator built in. One can even build mechanical coin tossing machines (see Fig. 20.1).

In this chapter we focus on the second question: formally modeling probabilistic computation and studying its power. We will show that:

1. We can define the class BPP that captures all Boolean functions that can be computed in polynomial time by a randomized algorithm. Crucially, BPP is still very much a worst-case class of computation: the probability is only over the choice of the random coins of the algorithm, as opposed to the choice of the input.

2. We can amplify the success probability of randomized algorithms, and as a result the class BPP would be identical if we changed the required success probability to any number 𝑝 that lies strictly between 1/2 and 1 (and in fact any number in the range 1/2 + 1/𝑞(𝑛) to 1 − 2^−𝑞(𝑛) for any polynomial 𝑞(𝑛)).


Learning Objectives:
• Formal definition of probabilistic polynomial time: the class BPP.
• Proof that every function in BPP can be computed by 𝑝𝑜𝑙𝑦(𝑛)-sized NAND-CIRC programs/circuits.
• Relations between BPP and NP.
• Pseudorandom generators


3. Though, as is the case for P and NP, there is much we do not know about the class BPP, we can establish some relations between BPP and the other complexity classes we saw before. In particular we will show that P ⊆ BPP ⊆ EXP and BPP ⊆ P/poly.

4. While the relation between BPP and NP is not known, we can show that if P = NP then BPP = P.

5. We also show that the concept of NP completeness applies equally well if we use randomized algorithms as our model of “efficient computation”. That is, if a single NP complete problem has a randomized polynomial-time algorithm, then all of NP can be computed in polynomial time by randomized algorithms.

6. Finally, we will discuss the question of whether BPP = P and show some of the intriguing evidence that the answer might actually be “Yes”, using the concept of pseudorandom generators.

20.1 MODELING RANDOMIZED COMPUTATION

Modeling randomized computation is actually quite easy. We can add the following operation to any programming language such as NAND-TM, NAND-RAM, NAND-CIRC, etc.:

foo = RAND()

where foo is a variable. The result of applying this operation is that foo is assigned a random bit in {0,1}. (Every time the RAND operation is invoked it returns a fresh independent random bit.) We call the programming languages that are augmented with this extra operation RNAND-TM, RNAND-RAM, and RNAND-CIRC respectively.
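For concreteness, here is one way to emulate the RAND() primitive in Python (a toy sketch, not part of the formal NAND-TM model; note that Python's random module is itself a pseudorandom generator rather than a true physical coin):

```python
import random

def RAND():
    # Return a fresh independent random bit in {0,1}.
    # A stand-in for the RAND() operation described above.
    return random.getrandbits(1)

# Each invocation yields a fresh bit, so "foo = RAND()" assigns
# foo a value in {0,1}.
sample = [RAND() for _ in range(100)]
assert all(b in (0, 1) for b in sample)
```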

Similarly, we can easily define randomized Turing machines as Turing machines in which the transition function δ gets as an extra input (in addition to the current state and symbol read from the tape) a bit b that in each step is chosen at random from {0,1}. Of course the function can ignore this bit (and have the same output regardless of whether b = 0 or b = 1), and hence randomized Turing machines generalize deterministic Turing machines.

We can use the RAND() operation to define the notion of a function being computed by a randomized T(n) time algorithm for every nice time bound T : ℕ → ℕ, as well as the notion of a finite function being computed by a size S randomized NAND-CIRC program (or, equivalently, a randomized circuit with S gates that correspond to either the NAND or coin-tossing operations). However, for simplicity we will not define randomized computation in full generality, but simply focus on the class of functions that are computable by randomized algorithms running in polynomial time, which by historical convention is known as BPP:


Definition 20.1 — The class BPP. Let F : {0,1}* → {0,1}. We say that F ∈ BPP if there exist constants a, b ∈ ℕ and an RNAND-TM program P such that for every x ∈ {0,1}*, on input x, the program P halts within at most a·|x|^b steps and

Pr[P(x) = F(x)] ≥ 2/3   (20.1)

where this probability is taken over the result of the RAND operations of P.

Note that the probability in (20.1) is taken only over the random choices in the execution of P and not over the choice of the input x. In particular, as discussed in Big Idea 24, BPP is still a worst case complexity class, in the sense that if F is in BPP then there is a polynomial-time randomized algorithm that computes F with probability at least 2/3 on every possible (and not just random) input.

The same polynomial-overhead simulation of NAND-RAM programs by NAND-TM programs we saw in Theorem 13.5 extends to randomized programs as well. Hence the class BPP is the same regardless of whether it is defined via RNAND-TM or RNAND-RAM programs. Similarly, we could have just as well defined BPP using randomized Turing machines.

Because of these equivalences, below we will use the name “polynomial time randomized algorithm” to denote a computation that can be modeled by a polynomial-time RNAND-TM program, RNAND-RAM program, or a randomized Turing machine (or any programming language that includes a coin tossing operation). Since all these models are equivalent up to polynomial factors, you can use your favorite model to capture polynomial-time randomized algorithms without any loss in generality.

Solved Exercise 20.1 — Choosing from a set. Modern programming languages often involve not just the ability to toss a random coin in {0,1} but also to choose an element at random from a set S. Show that you can emulate this primitive using coin tossing. Specifically, show that there is a randomized algorithm A that on input a set S of m strings of length n, runs in time poly(n, m) and outputs either an element x ∈ S or “fail” such that

1. Let p be the probability that A outputs “fail”; then p < 2^{−n} (a number small enough that it can be ignored).

2. For every x ∈ S, the probability that A outputs x is exactly (1 − p)/m (and so the output is uniform over S if we ignore the tiny probability of failure).


Figure 20.2: The two equivalent views of randomized algorithms. We can think of such an algorithm as having access to an internal RAND() operation that outputs a random independent value in {0,1} whenever it is invoked, or we can think of it as a deterministic algorithm that in addition to the standard input x ∈ {0,1}^n obtains an additional auxiliary input r ∈ {0,1}^m that is chosen uniformly at random.

Solution:

If the size of S is a power of two, that is m = 2^ℓ for some ℓ ∈ ℕ, then we can choose a random element in S by tossing ℓ coins to obtain a string w ∈ {0,1}^ℓ and then outputting the i-th element of S, where i is the number whose binary representation is w.

If ๐‘† is not a power of two, then our first attempt will be to letโ„“ = โŒˆlog๐‘šโŒ‰ and do the same, but then output the ๐‘–-th element of๐‘† if ๐‘– โˆˆ [๐‘š] and output โ€œfailโ€ otherwise. Conditioned on not out-putting โ€œfailโ€, this element is distributed uniformly in ๐‘†. However,in the worst case, 2โ„“ can be almost 2๐‘š and so the probability of failmight be close to half. To reduce the failure probability, we canrepeat the experiment above ๐‘› times. Specifically, we will use thefollowing algorithm

Algorithm 20.2 — Sample from set.

Input: Set S = {x_0, …, x_{m−1}} with x_i ∈ {0,1}^n for all i ∈ [m].
Output: Either x ∈ S or “fail”

1: Let ℓ ← ⌈log m⌉
2: for j = 0, 1, …, n − 1 do
3:   Pick w ∼ {0,1}^ℓ
4:   Let i ∈ [2^ℓ] be the number whose binary representation is w.
5:   if i < m then
6:     return x_i
7:   end if
8: end for
9: return “fail”

Conditioned on not failing, the output of Algorithm 20.2 is uniformly distributed in S. However, since 2^ℓ < 2m, the probability of failure in each iteration is less than 1/2, and so the probability of failure in all of them is at most (1/2)^n = 2^{−n}.
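A direct Python rendering of Algorithm 20.2 (a sketch; the variable names follow the pseudocode above, and `getrandbits(ell)` plays the role of picking w ∼ {0,1}^ℓ):

```python
import random

def sample_from_set(S, n):
    # Rejection sampling: output a uniform element of S, or "fail".
    # S is a list of m strings; we repeat the basic experiment n
    # times, so the failure probability is at most 2**(-n).
    m = len(S)
    ell = max(1, (m - 1).bit_length())  # ell = ceil(log2(m))
    for _ in range(n):
        i = random.getrandbits(ell)     # uniform over [2**ell]
        if i < m:
            return S[i]
    return "fail"

S = ["a", "b", "c"]  # m = 3, not a power of two
out = sample_from_set(S, n=64)
assert out in S or out == "fail"
```

When m is a power of two every iteration succeeds immediately, matching the simpler first case discussed in the solution.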

20.1.1 An alternative view: random coins as an “extra input”

While we presented randomized computation as adding an extra “coin tossing” operation to our programs, we can also model this as being given an additional extra input. That is, we can think of a randomized algorithm A as a deterministic algorithm A′ that takes two inputs x and r, where the second input r is chosen at random from {0,1}^m for some m ∈ ℕ (see Fig. 20.2). The equivalence to Definition 20.1 is shown in the following theorem:


Theorem 20.3 — Alternative characterization of BPP. Let F : {0,1}* → {0,1}. Then F ∈ BPP if and only if there exist a, b ∈ ℕ and G : {0,1}* → {0,1} such that G is in P and for every x ∈ {0,1}*,

Pr_{r∼{0,1}^{a|x|^b}} [G(xr) = F(x)] ≥ 2/3 .   (20.2)

Proof Idea:

The idea behind the proof is that, as illustrated in Fig. 20.2, we can simply replace sampling a random coin with reading a bit from the extra “random input” r, and vice versa. To prove this rigorously we need to work through some slightly cumbersome formal notation. This might be one of those proofs that is easier to work out on your own than to read.

Proof of Theorem 20.3. We start by showing the “only if” direction. Let F ∈ BPP and let P be an RNAND-TM program that computes F as per Definition 20.1, and let a, b ∈ ℕ be such that on every input of length n, the program P halts within at most a·n^b steps. We will construct a polynomial-time algorithm P′ such that for every x ∈ {0,1}^n, if we set m = a·n^b, then

Pr_{r∼{0,1}^m} [P′(xr) = 1] = Pr[P(x) = 1] ,   (20.3)

where the probability on the righthand side is taken over the RAND() operations in P. In particular this means that if we define G(xr) = P′(xr), then the function G satisfies the conditions of (20.2).

The algorithm P′ will be very simple: it simulates the program P, maintaining a counter i initialized to 0. Every time that P makes a RAND() operation, the program P′ will supply the result from r_i and increment i by one. We will never “run out” of bits, since the running time of P is at most a·n^b and hence it can make at most this number of RAND() calls. The output of P′(xr) for a random r ∼ {0,1}^m will be distributed identically to the output of P(x).

For the other direction, given a function G ∈ P satisfying condition (20.2) and a NAND-TM program P′ that computes G in polynomial time, we can construct an RNAND-TM program P that computes F in polynomial time. On input x ∈ {0,1}^n, the program P will simply use the RAND() instruction a·n^b times to fill an array R[0], …, R[a·n^b − 1] and then execute the original program P′ on input xr, where r_i is the i-th element of the array R. Once again, it is clear that if P′ runs in polynomial time then so will P, and for every input x and r ∈ {0,1}^{a·n^b}, the output of P on input x where the coin tosses outcome is r is equal to P′(xr).


Remark 20.4 — Definitions of BPP and NP. The characterization of BPP in Theorem 20.3 is reminiscent of the characterization of NP in Definition 15.1, with the randomness in the case of BPP playing the role of the solution in the case of NP. However, there are important differences between the two:

• The definition of NP is “one sided”: F(x) = 1 if there exists a solution w such that G(xw) = 1, and F(x) = 0 if for every string w of the appropriate length, G(xw) = 0. In contrast, the characterization of BPP is symmetric with respect to the cases F(x) = 0 and F(x) = 1.

• The relation between NP and BPP is not immediately clear. It is not known whether BPP ⊆ NP, NP ⊆ BPP, or whether these two classes are incomparable. It is however known (with a non-trivial proof) that if P = NP then BPP = P (see Theorem 20.11).

• Most importantly, the definition of NP is “ineffective,” since it does not yield a way of actually finding whether there exists a solution among the exponentially many possibilities. By contrast, the definition of BPP gives us a way to compute the function in practice by simply choosing the second input at random.

โ€Random tapesโ€. Theorem 20.3 motivates sometimes considering therandomness of an RNAND-TM (or RNAND-RAM) program as anextra input. As such, if ๐ด is a randomized algorithm that on inputs oflength ๐‘› makes at most ๐‘š coin tosses, we will often use the notation๐ด(๐‘ฅ; ๐‘Ÿ) (where ๐‘ฅ โˆˆ 0, 1๐‘› and ๐‘Ÿ โˆˆ 0, 1๐‘š) to refer to the result ofexecuting ๐‘ฅ when the coin tosses of ๐ด correspond to the coordinatesof ๐‘Ÿ. This second, or โ€œauxiliary,โ€ input is sometimes referred to asa โ€œrandom tape.โ€ This terminology originates from the model ofrandomized Turing machines.

20.1.2 Success amplification of two-sided error algorithms

The number 2/3 might seem arbitrary, but as we've seen in Chapter 19 it can be amplified to our liking:

Theorem 20.5 — Amplification. Let F : {0,1}* → {0,1} be a Boolean function such that there is a polynomial p : ℕ → ℕ and a polynomial-time randomized algorithm A satisfying that for every x ∈ {0,1}^n,

Pr[A(x) = F(x)] ≥ 1/2 + 1/p(n) .   (20.4)

Then for every polynomial q : ℕ → ℕ there is a polynomial-time randomized algorithm B satisfying for every x ∈ {0,1}^n,

Pr[B(x) = F(x)] ≥ 1 − 2^{−q(n)} .   (20.5)

Figure 20.3: If F ∈ BPP then there is a randomized polynomial-time algorithm P with the following property: in the case F(x) = 0 two thirds of the “population” of random choices satisfy P(x; r) = 0, and in the case F(x) = 1 two thirds of the population satisfy P(x; r) = 1. We can think of amplification as a form of “polling” of the choices of randomness. By the Chernoff bound, if we poll a sample of O(log(1/δ)/ε²) random choices r, then with probability at least 1 − δ, the fraction of r's in the sample satisfying P(x; r) = 1 will give us an estimate of the fraction of the population within an ε margin of error. This is the same calculation used by pollsters to determine the needed sample size in their polls.

Big Idea 25: We can amplify the success of randomized algorithms to a value that is arbitrarily close to 1.

Proof Idea:

The proof is the same as we've seen before in the case of maximum cut and other examples. We use the Chernoff bound to argue that if A computes F with probability at least 1/2 + ε and we run it O(k/ε²) times, each time using fresh and independent random coins, then the probability that the majority of the answers will not be correct will be less than 2^{−k}. Amplification can be thought of as a “polling” of the choices for randomness for the algorithm (see Fig. 20.3).

Proof of Theorem 20.5. Let A be an algorithm satisfying (20.4). Set ε = 1/p(n) and k = q(n), where p, q are the polynomials in the theorem statement. We run A on input x for t = 10k/ε² times, using fresh randomness in each execution, and compute the outputs y_0, …, y_{t−1}. We output the value y that appeared the largest number of times. Let X_i be the random variable that is equal to 1 if y_i = F(x) and equal to 0 otherwise. The random variables X_0, …, X_{t−1} are i.i.d. and satisfy E[X_i] = Pr[X_i = 1] ≥ 1/2 + ε, and hence by linearity of expectation E[∑_{i=0}^{t−1} X_i] ≥ t(1/2 + ε). For the plurality value to be incorrect, it must hold that ∑_{i=0}^{t−1} X_i ≤ t/2, which means that ∑_{i=0}^{t−1} X_i is at least εt far from its expectation. Hence by the Chernoff bound (Theorem 18.12), the probability that the plurality value is not correct is at most 2e^{−ε²t}, which is smaller than 2^{−k} for our choice of t.
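The repetition-and-majority construction in the proof can be sketched as follows (a toy simulation; the algorithm `A` below is a hypothetical stand-in that is correct with probability 0.6, i.e., 1/2 + ε with ε = 0.1):

```python
import random

def amplify(A, x, t):
    # Run A on x for t independent executions and output the
    # majority answer, as in the proof of Theorem 20.5.
    ones = sum(A(x) for _ in range(t))
    return 1 if ones > t / 2 else 0

# Hypothetical noisy algorithm for a function with F(x) = 1:
# correct with probability only 0.6.
def A(x):
    return 1 if random.random() < 0.6 else 0

random.seed(0)  # fixed seed for a deterministic demo
# With t = 1001 repetitions, the Chernoff bound makes an incorrect
# majority astronomically unlikely.
votes = [amplify(A, 0, t=1001) for _ in range(20)]
assert all(v == 1 for v in votes)
```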

20.2 BPP AND NP COMPLETENESS

Since “noisy processes” abound in nature, randomized algorithms can be realized physically, and so it is reasonable to propose BPP rather than P as our mathematical model for “feasible” or “tractable” computation. One might wonder if this makes all the previous chapters


irrelevant, and in particular if the theory of NP completeness still applies to probabilistic algorithms. Fortunately, the answer is Yes:

Theorem 20.6 — NP hardness and BPP. Suppose that F is NP-hard and F ∈ BPP. Then NP ⊆ BPP.

Before seeing the proof, note that Theorem 20.6 implies that if there was a randomized polynomial time algorithm for any NP-complete problem such as 3SAT, ISET, etc., then there would be such an algorithm for every problem in NP. Thus, regardless of whether our model of computation is deterministic or randomized algorithms, NP complete problems retain their status as the “hardest problems in NP.”

Proof Idea:

The idea is to simply run the reduction as usual, and plug it into the randomized algorithm instead of a deterministic one. It would be an excellent exercise, and a way to reinforce the definitions of NP-hardness and randomized algorithms, for you to work out the proof for yourself. However, for the sake of completeness, we include this proof below.

Proof of Theorem 20.6. Suppose that F is NP-hard and F ∈ BPP. We will now show that this implies that NP ⊆ BPP. Let G ∈ NP. By the definition of NP-hardness, it follows that G ≤_p F, or in other words that there exists a polynomial-time computable function R : {0,1}* → {0,1}* such that G(x) = F(R(x)) for every x ∈ {0,1}*. Now if F is in BPP then there is a polynomial-time RNAND-TM program P such that

Pr[P(y) = F(y)] ≥ 2/3   (20.6)

for every y ∈ {0,1}* (where the probability is taken over the random coin tosses of P). Hence we can get a polynomial-time RNAND-TM program P′ to compute G by setting P′(x) = P(R(x)). By (20.6), Pr[P′(x) = F(R(x))] ≥ 2/3, and since F(R(x)) = G(x) this implies that Pr[P′(x) = G(x)] ≥ 2/3, which proves that G ∈ BPP.
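The construction in the proof is just composition of the reduction R with the randomized algorithm P; schematically (with toy placeholder functions, not a real NP-complete problem):

```python
def compose(P, R):
    # Given a reduction R with G(x) = F(R(x)) and an algorithm P
    # computing F with probability >= 2/3, the algorithm
    # P'(x) = P(R(x)) computes G with the same probability.
    def P_prime(x):
        return P(R(x))
    return P_prime

# Toy instance: G(x) = parity of the bit-list x, via the reduction
# R(x) = sum(x) % 2 to the (trivial) function F(y) = y.
R = lambda x: sum(x) % 2
P = lambda y: y
P_prime = compose(P, R)
assert P_prime([1, 0, 1]) == 0
assert P_prime([1, 1, 1]) == 1
```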

Most of the results we've seen about NP hardness, including the search to decision reduction of Theorem 16.1, the decision to optimization reduction of Theorem 16.3, and the quantifier elimination result of Theorem 16.6, all carry over in the same way if we replace P with BPP as our model of efficient computation. Thus if NP ⊆ BPP then we get essentially all of the strange and wonderful consequences of P = NP. Unsurprisingly, we cannot rule out this possibility. In fact, unlike P = EXP, which is ruled out by the time hierarchy theorem, we


1 At the time of this writing, the largest “natural” complexity class which we can't rule out being contained in BPP is the class NEXP, which we did not define in this course, but which corresponds to non-deterministic exponential time. See this paper for a discussion of this question.

Figure 20.4: Some possibilities for the relations between BPP and other complexity classes. Most researchers believe that BPP = P and that these classes are not powerful enough to solve NP-complete problems, let alone all problems in EXP. However, we have not even been able yet to rule out the possibility that randomness is a “silver bullet” that allows exponential speedup on all problems, and hence BPP = EXP. As we've already seen, we also can't rule out that P = NP. Interestingly, in the latter case, P = BPP.

don't even know how to rule out the possibility that BPP = EXP! Thus a priori it's possible (though it seems highly unlikely) that randomness is a magical tool that allows us to speed up arbitrary exponential time computation.¹ Nevertheless, as we discuss below, it is believed that randomization's power is much weaker and BPP lies in much more “pedestrian” territory.

20.3 THE POWER OF RANDOMIZATION

A major question is whether randomization can add power to computation. Mathematically, we can phrase this as the following question: does BPP = P? Given what we've seen so far about the relations of other complexity classes such as P and NP, or NP and EXP, one might guess that:

1. We do not know the answer to this question.

2. But we suspect that BPP is different from P.

One would be correct about the former, but wrong about the latter. As we will see, we do in fact have reasons to believe that BPP = P. This can be thought of as supporting the extended Church-Turing hypothesis that deterministic polynomial-time Turing machines capture what can be feasibly computed in the physical world.

We now survey some of the relations that are known between BPP and other complexity classes we have encountered. (See also Fig. 20.4.)

20.3.1 Solving BPP in exponential time

It is not hard to see that if F is in BPP then it can be computed in exponential time.

Theorem 20.7 — Simulating randomized algorithms in exponential time.

BPP ⊆ EXP

The proof of Theorem 20.7 readily follows by enumerating over all the (exponentially many) choices for the random coins. We omit the formal proof, as doing it by yourself is an excellent way to get comfortable with Definition 20.1.
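The enumeration argument can be made concrete: to evaluate a randomized algorithm deterministically, compute Pr[P(x; r) = 1] exactly by iterating over all 2^m random tapes, which takes exponential time in m (a toy sketch with a hypothetical two-coin algorithm):

```python
from itertools import product

def exact_acceptance_probability(P, x, m):
    # Deterministically compute Pr[P(x; r) = 1] by enumerating all
    # 2**m random tapes r -- exponential time, as in Theorem 20.7.
    accepts = sum(P(x, r) for r in product([0, 1], repeat=m))
    return accepts / 2 ** m

# Toy algorithm: accept iff at least one of its two coins equals x.
def P(x, r):
    return 1 if x in r else 0

# To decide a BPP function we would output 1 iff this probability
# is at least 1/2.
assert exact_acceptance_probability(P, 1, m=2) == 0.75
assert exact_acceptance_probability(P, 0, m=2) == 0.75
```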

20.3.2 Simulating randomized algorithms by circuits

We have seen in Theorem 13.12 that if F is in P, then there is a polynomial p : ℕ → ℕ such that for every n, the restriction F_n of F to inputs


in {0,1}^n is in SIZE(p(n)). (In other words, P ⊆ P/poly.) A priori it is not at all clear that the same holds for a function in BPP, but this does turn out to be the case.

Figure 20.5: The possible guarantees for a randomized algorithm A computing some function F. In the tables above, the columns correspond to different inputs and the rows to different choices of the random tape. A cell at position r, x is colored green if A(x; r) = F(x) (i.e., the algorithm outputs the correct answer) and red otherwise. The standard BPP guarantee corresponds to the middle figure, where for every input x, at least two thirds of the choices r for a random tape will result in A computing the correct value. That is, every column is colored green in at least two thirds of its coordinates. In the left figure we have an “average case” guarantee where the algorithm is only guaranteed to output the correct answer with probability two thirds over a random input (i.e., at most one third of the total entries of the table are colored red, but there could be an all-red column). The right figure corresponds to the “offline BPP” case: with probability at least two thirds over the random choice r, r will be good for every input. That is, at least two thirds of the rows are all green. Theorem 20.8 (BPP ⊆ P/poly) is proven by amplifying the success of a BPP algorithm until we have the “offline BPP” guarantee, and then hardwiring the choice of the randomness r to obtain a nonuniform deterministic algorithm.

Theorem 20.8 — Randomness does not help for non-uniform computation. BPP ⊆ P/poly.

That is, for every F ∈ BPP, there exist some a, b ∈ ℕ such that for every n > 0, F_n ∈ SIZE(a·n^b), where F_n is the restriction of F to inputs in {0,1}^n.

Proof Idea:

The idea behind the proof is that we can first amplify by repetition the probability of success from 2/3 to 1 − 0.1·2^{−n}. This will allow us to show that for every n ∈ ℕ there exists a single fixed choice of “favorable coins”, which is a string r of length polynomial in n, such that if r is used for the randomness then we output the right answer on all of the possible 2^n inputs. We can then use the standard “unravelling the loop” technique to transform an RNAND-TM program to an RNAND-CIRC program, and “hardwire” the favorable choice of random coins to transform the RNAND-CIRC program into a plain old deterministic NAND-CIRC program.

Proof of Theorem 20.8. Suppose that F ∈ BPP. Let P be a polynomial-time RNAND-TM program that computes F as per Definition 20.1. Using Theorem 20.5, we can amplify the success probability of P to obtain an RNAND-TM program P′ that is at most a factor of O(n)


slower (and hence still polynomial time) such that for every x ∈ {0,1}^n,

Pr_{r∼{0,1}^m} [P′(x; r) = F(x)] ≥ 1 − 0.1·2^{−n} ,   (20.7)

where m is the number of coin tosses that P′ uses on inputs of length n. We use the notation P′(x; r) to denote the execution of P′ on input x when the result of the coin tosses corresponds to the string r.

For every x ∈ {0,1}^n, define the “bad” event B_x to hold if P′(x) ≠ F(x), where the sample space for this event consists of the coins of P′. Then by (20.7), Pr[B_x] ≤ 0.1·2^{−n} for every x ∈ {0,1}^n. Since there are 2^n many such x's, by the union bound the probability of the union of the events {B_x}_{x∈{0,1}^n} is at most 0.1. This means that if we choose r ∼ {0,1}^m, then with probability at least 0.9 it will be the case that for every x ∈ {0,1}^n, F(x) = P′(x; r). (Indeed, otherwise the event B_x would hold for some x.) In particular, from the mere fact that the probability of ∪_{x∈{0,1}^n} B_x is smaller than 1, it follows that there exists a particular r* ∈ {0,1}^m such that

P′(x; r*) = F(x)   (20.8)

for every x ∈ {0,1}^n.

Now let us use the standard “unravelling the loop” technique and transform P′ into a NAND-CIRC program Q of size polynomial in n, such that Q(xr) = P′(x; r) for every x ∈ {0,1}^n and r ∈ {0,1}^m. Then by “hardwiring” the values r*_0, …, r*_{m−1} in place of the last m inputs of Q, we obtain a new NAND-CIRC program Q_{r*} that satisfies, by (20.8), that Q_{r*}(x) = F(x) for every x ∈ {0,1}^n. This demonstrates that F_n has a polynomial sized NAND-CIRC program, hence completing the proof of Theorem 20.8.
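The final “hardwiring” step of the proof amounts to fixing the second argument of P′ to the favorable string r*; a schematic sketch with a hypothetical toy P′:

```python
def hardwire(P_prime, r_star):
    # Fix the random tape of P' to the favorable string r*, yielding
    # a deterministic function Q(x) = P'(x; r*), as in the final
    # step of the proof of Theorem 20.8.
    return lambda x: P_prime(x, r_star)

# Toy P' computing F(x) = x % 2: correct whenever the first bit of
# its tape is 0 (so r* = (0, 1) is a "favorable" choice of coins).
F = lambda x: x % 2
P_prime = lambda x, r: (x % 2) if r[0] == 0 else 1 - (x % 2)
Q = hardwire(P_prime, r_star=(0, 1))
assert all(Q(x) == F(x) for x in range(8))
```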

20.4 DERANDOMIZATION

The proof of Theorem 20.8 can be summarized as follows: we can replace a poly(n)-time algorithm that tosses coins as it runs with an algorithm that uses a single set of coin tosses r* ∈ {0,1}^{poly(n)} which will be good enough for all inputs of size n. Another way to say it is that for the purposes of computing functions, we do not need “online” access to random coins and can generate a set of coins “offline” ahead of time, before we see the actual input.

But this does not really help us with answering the question of whether BPP equals P, since we still need to find a way to generate these “offline” coins in the first place. To derandomize an RNAND-TM program we will need to come up with a single deterministic


2 One amusing anecdote is a recent case where scammers managed to predict the imperfect “pseudorandom generator” used by slot machines to cheat casinos. Unfortunately we don't know the details of how they did it, since the case was sealed.

algorithm that will work for all input lengths. That is, unlike in the case of RNAND-CIRC programs, we cannot choose for every input length n some string r* ∈ {0,1}^{poly(n)} to use as our random coins.

Can we derandomize randomized algorithms, or does randomness add an inherent extra power for computation? This is a fundamentally interesting question, but it is also of practical significance. Ever since people started to use randomized algorithms during the Manhattan project, they have been trying to remove the need for randomness and replace it with numbers that are selected through some deterministic process. Throughout the years this approach has often been used successfully, though there have been a number of failures as well.²

A common approach people used over the years was to replace the random coins of the algorithm by a “randomish looking” string that they generated through some arithmetical process. For example, one can use the digits of π for the random tape. Using these types of methods corresponds to what von Neumann referred to as a “state of sin”. (Though this is a sin that he himself frequently committed, as generating true randomness in sufficient quantity was and still is often too expensive.) The reason that this is considered a “sin” is that such a procedure will not work in general. For example, it is easy to modify any probabilistic algorithm A, such as the ones we have seen in Chapter 19, to an algorithm A′ that is guaranteed to fail if the random tape happens to equal the digits of π. This means that the procedure “replace the random tape by the digits of π” does not yield a general way to transform a probabilistic algorithm to a deterministic one that will solve the same problem. Of course, this procedure does not always fail, but we have no good way to determine when it fails and when it succeeds. This reasoning is not specific to π and holds for every deterministically produced string, whether it is obtained from π, e, the Fibonacci sequence, or anything else.

An algorithm that checks if its random tape is equal to π and then fails seems quite silly, but this is but the “tip of the iceberg” for a very serious issue. Time and again people have learned the hard way that one needs to be very careful about producing random bits using deterministic means. As we will see when we discuss cryptography, many spectacular security failures and break-ins were the result of using “insufficiently random” coins.

20.4.1 Pseudorandom generators

So, we can't use any single string to “derandomize” a probabilistic algorithm. It turns out, however, that we can use a collection of strings to do so. Another way to think about it is that rather than trying to eliminate the need for randomness, we start by focusing on reducing the


Figure 20.6: A pseudorandom generator G maps a short string s ∈ {0,1}^ℓ into a long string r ∈ {0,1}^m such that a small program/circuit P cannot distinguish between the case that it is provided a random input r ∼ {0,1}^m and the case that it is provided a “pseudorandom” input of the form r = G(s) where s ∼ {0,1}^ℓ. The short string s is sometimes called the seed of the pseudorandom generator, as it is a small object that can be thought of as yielding a large “tree of randomness”.

amount of randomness needed. (Though we will see that if we reduce the randomness sufficiently, we can eventually get rid of it altogether.)

We make the following definition:

Definition 20.9 — Pseudorandom generator. A function G : {0,1}^ℓ → {0,1}^m is a (T, ε)-pseudorandom generator if for every circuit C with m inputs, one output, and at most T gates,

|Pr_{s∼{0,1}^ℓ} [C(G(s)) = 1] − Pr_{r∼{0,1}^m} [C(r) = 1]| < ε   (20.9)

This is a definition that's worth reading more than once, and spending some time to digest. Note that it takes several parameters:

โ€ข ๐‘‡ is the limit on the number of gates of the circuit๐ถ that the generator needs to โ€œfoolโ€. The larger ๐‘‡is, the stronger the generator.

โ€ข ๐œ– is how close is the output of the pseudorandomgenerator to the true uniform distribution over0, 1๐‘š. The smaller ๐œ– is, the stronger the generator.

• ℓ is the input length and m is the output length. If ℓ ≥ m then it is trivial to come up with such a generator: on input s ∈ {0,1}^ℓ, we can output s_0, …, s_{m−1}. In this case Pr_{s∼{0,1}^ℓ} [C(G(s)) = 1] will simply equal Pr_{r∼{0,1}^m} [C(r) = 1], no matter how many gates C has. So, the smaller ℓ is and the larger m is, the stronger the generator, and to get anything non-trivial, we need m > ℓ.

Furthermore, note that although our eventual goal is to fool probabilistic randomized algorithms that take an unbounded number of inputs, Definition 20.9 refers to finite and deterministic NAND-CIRC programs.
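The trivial case $\ell \ge m$ discussed above can be checked concretely. The following small Python sketch (ours, not part of the book's NAND formalism) confirms that truncating a uniform seed yields exactly the uniform distribution, so any test function behaves identically on the two distributions:

```python
from itertools import product

ell, m = 4, 3

def G(s):
    # The trivial "generator" for ell >= m: output the first m bits of the seed.
    return s[:m]

def P(r):
    # An arbitrary test function (stands in for a circuit/program).
    return r[0] ^ r[2]

# Average of P over all seeds s in {0,1}^ell, pushed through G ...
p_pseudo = sum(P(G(s)) for s in product([0, 1], repeat=ell)) / 2 ** ell
# ... equals the average of P over all r in {0,1}^m exactly.
p_true = sum(P(r) for r in product([0, 1], repeat=m)) / 2 ** m

print(p_pseudo == p_true)  # True: a truncated uniform seed is uniform
```

This is why only the regime $m > \ell$ is interesting: there the output distribution of $G$ is supported on at most $2^\ell$ of the $2^m$ strings, yet must still "look uniform" to small circuits.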

We can think of a pseudorandom generator as a "randomness amplifier." It takes an input $s$ of $\ell$ bits chosen at random and expands these $\ell$ bits into an output $r$ of $m > \ell$ pseudorandom bits. If $\epsilon$ is small enough then the pseudorandom bits will "look random" to any NAND-CIRC program that is not too big. Still, there are two questions we haven't answered:

• What reason do we have to believe that pseudorandom generators with non-trivial parameters exist?

• Even if they do exist, why would such generators be useful to derandomize randomized algorithms? After all, Definition 20.9 does not involve RNAND-TM or RNAND-RAM programs, but rather deterministic NAND-CIRC programs with no randomness and no loops.

We will now (partially) answer both questions. For the first question, let us come clean and confess we do not know how to prove that interesting pseudorandom generators exist. By interesting we mean pseudorandom generators that satisfy that $\epsilon$ is some small constant (say $\epsilon < 1/3$), $m > \ell$, and the function $G$ itself can be computed in $poly(m)$ time. Nevertheless, Lemma 20.12 (whose statement and proof are deferred to the end of this chapter) shows that if we only drop the last condition (polynomial-time computability), then there do in fact exist pseudorandom generators where $m$ is exponentially larger than $\ell$.

P: At this point you might want to skip ahead and look at the statement of Lemma 20.12. However, since its proof is somewhat subtle, I recommend you defer reading it until you've finished reading the rest of this chapter.

20.4.2 From existence to constructivity

The fact that there exists a pseudorandom generator does not mean that there is one that can be efficiently computed. However, it turns out that we can turn complexity "on its head" and use the assumed non-existence of fast algorithms for problems such as 3SAT to obtain pseudorandom generators, which can then be used to transform randomized algorithms into deterministic ones. This is known as the Hardness vs. Randomness paradigm. A number of results along those lines, most of which are outside the scope of this course, have led researchers to believe the following conjecture:

Optimal PRG conjecture: There is a polynomial-time computable function $PRG : \{0,1\}^* \to \{0,1\}$ that yields an exponentially secure pseudorandom generator.

Specifically, there exists a constant $\delta > 0$ such that for every $\ell$ and $m < 2^{\delta \ell}$, if we define $G : \{0,1\}^\ell \to \{0,1\}^m$ as $G(s)_i = PRG(s, i)$ for every $s \in \{0,1\}^\ell$ and $i \in [m]$, then $G$ is a $(2^{\delta \ell}, 2^{-\delta \ell})$ pseudorandom generator.
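To make the shape of the conjectured object concrete, here is a Python sketch of the construction $G(s)_i = PRG(s,i)$. The SHA-256-based stand-in for $PRG$ is purely our illustration of the interface (one output bit, computable in time polynomial in the seed length); nothing about it is proven, or even conjectured in this book, to be pseudorandom:

```python
import hashlib

def prg_bit(seed: bytes, i: int) -> int:
    # Hypothetical stand-in for the conjectured PRG: returns one bit,
    # computed in time polynomial in len(seed). SHA-256 is a heuristic
    # placeholder only, not a proven pseudorandom function.
    digest = hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
    return digest[0] & 1

def G(seed: bytes, m: int) -> list:
    # The conjecture's construction: the i-th output bit of G is PRG(s, i).
    # Any single bit of the (possibly exponentially long) output can thus
    # be computed without ever writing the whole output down.
    return [prg_bit(seed, i) for i in range(m)]

print(G(b"short-seed", 16))
```

Note how the interface sidesteps the objection below about the output being too long to write down: a consumer of $G$ only ever asks for individual bits.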

P: The "optimal PRG conjecture" is worth reading more than once. What it posits is that we can obtain a $(T, \epsilon)$ pseudorandom generator $G$ such that every output bit of $G$ can be computed in time polynomial in the length $\ell$ of the input, where $T$ is exponentially large in $\ell$ and $\epsilon$ is exponentially small in $\ell$. (Note that we could not hope for the entire output to be computable in time polynomial in $\ell$, as just writing the output down would take too long.)

To understand why we call such a pseudorandom generator "optimal," it is a great exercise to convince yourself that, for example, there does not exist a $(2^{1.1\ell}, 2^{-1.1\ell})$ pseudorandom generator (in fact, the number $\delta$ in the conjecture must be smaller than 1). To see that we can't have $T \gg 2^\ell$, note that if we allow a NAND-CIRC program with much more than $2^\ell$ lines then this NAND-CIRC program could "hardwire" inside it all the outputs of $G$ on all its $2^\ell$ inputs, and use that to distinguish between a string of the form $G(s)$ and a uniformly chosen string in $\{0,1\}^m$. To see that we can't have $\epsilon \ll 2^{-\ell}$, note that by guessing the input $s$ (which will be successful with probability $2^{-\ell}$), we can obtain a small (i.e., $O(\ell)$-line) NAND-CIRC program that achieves a $2^{-\ell}$ advantage in distinguishing a pseudorandom and uniform input. Working out these details is a highly recommended exercise.

3: A pseudorandom generator of the form we posit, where each output bit can be computed individually in time polynomial in the seed length, is commonly known as a pseudorandom function generator. For more on the many interesting results and connections in the study of pseudorandomness, see this monograph of Salil Vadhan.

We emphasize again that the optimal PRG conjecture is, as its name implies, a conjecture, and we still do not know how to prove it. In particular, it is stronger than the conjecture that P ≠ NP. But we do have some evidence for its truth. There is a spectrum of different types of pseudorandom generators, and there are weaker assumptions than the optimal PRG conjecture that suffice to prove that BPP = P. In particular, this is known to hold under the assumption that there exists a function $F \in TIME(2^{O(n)})$ and $\epsilon > 0$ such that for every sufficiently large $n$, $F_n$ is not in $SIZE(2^{\epsilon n})$. The name "optimal PRG conjecture" is non-standard. This conjecture is sometimes known in the literature as the existence of exponentially strong pseudorandom functions.³

20.4.3 Usefulness of pseudorandom generators

We now show that optimal pseudorandom generators are indeed very useful, by proving the following theorem:

Theorem 20.10 — Derandomization of BPP. Suppose that the optimal PRG conjecture is true. Then BPP = P.

Proof Idea:

The optimal PRG conjecture tells us that we can achieve exponential expansion of $\ell$ truly random coins into as many as $2^{\delta \ell}$ "pseudorandom coins." Looked at from the other direction, it allows us to reduce the need for randomness by taking an algorithm that uses $m$ coins and converting it into an algorithm that only uses $O(\log m)$ coins. Now an algorithm of the latter type can be made fully deterministic by enumerating over all the $2^{O(\log m)}$ (which is polynomial in $m$) possibilities for its random choices.


⋆

We now proceed with the proof details.

Proof of Theorem 20.10. Let $F \in BPP$, and let $P$ be a NAND-TM program and $a, b, c, d$ constants such that for every $x \in \{0,1\}^n$, $P(x)$ runs in at most $c \cdot n^d$ steps and $\Pr_{r \sim \{0,1\}^m}[P(x; r) = F(x)] \ge 2/3$. By "unrolling the loop" and hardwiring the input $x$, we can obtain for every input $x \in \{0,1\}^n$ a NAND-CIRC program $Q_x$ of at most, say, $T = 10c \cdot n^d$ lines, that takes $m$ bits of input and such that $Q_x(r) = P(x; r)$.

Now suppose that $G : \{0,1\}^\ell \to \{0,1\}^m$ is a $(T, 0.1)$ pseudorandom generator. Then we could deterministically estimate the probability $p(x) = \Pr_{r \sim \{0,1\}^m}[Q_x(r) = 1]$ up to $0.1$ accuracy in time $O(T \cdot 2^\ell \cdot m \cdot cost(G))$, where $cost(G)$ is the time that it takes to compute a single output bit of $G$.

The reason is that we know that the quantity $\Pr_{s \sim \{0,1\}^\ell}[Q_x(G(s)) = 1]$ will give us such an estimate for $p(x)$, and we can compute this quantity by simply trying all $2^\ell$ possibilities for $s$. Now, under the optimal PRG conjecture we can set $T = 2^{\delta \ell}$, or equivalently $\ell = \frac{1}{\delta} \log T$, and our total computation time is polynomial in $2^\ell = T^{1/\delta}$. Since $T \le 10c \cdot n^d$, this running time will be polynomial in $n$.

This completes the proof, since we are guaranteed that $\Pr_{r \sim \{0,1\}^m}[Q_x(r) = F(x)] \ge 2/3$, and hence estimating the probability $p(x)$ to within $0.1$ accuracy is sufficient to compute $F(x)$.
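The seed-enumeration step of this proof can be phrased in a few lines of code. In the sketch below (ours; plain Python functions stand in for the circuit $Q_x$ and the generator $G$), we compute the deterministic estimate by brute force over all $2^\ell$ seeds and apply the $2/3$-vs-$1/3$ threshold:

```python
from itertools import product

def derandomize(Q, G, ell):
    # Average Q over G(s) for all 2**ell seeds s (instead of over all 2**m
    # coin strings) and round the estimate. This is correct whenever G
    # fools circuits of Q's size to within 0.1.
    estimate = sum(Q(G(s)) for s in product([0, 1], repeat=ell)) / 2 ** ell
    return 1 if estimate >= 1 / 2 else 0

# Toy stand-ins (hypothetical): a 2-bit seed stretched to 4 output bits,
# and a "circuit" Q that accepts when either of its first two bits is set.
def G(s):
    return (s[0], s[1], s[0] ^ s[1], s[0] & s[1])

def Q(r):
    return r[0] | r[1]

print(derandomize(Q, G, 2))  # 1, since Q accepts on 3 of the 4 seeds
```

The point of the theorem is that the loop has only $2^\ell = T^{1/\delta}$ iterations, which is polynomial in the input length, rather than the $2^m$ iterations naive enumeration of the coins would need.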

20.5 P = NP AND BPP VS P

Two computational complexity questions that we cannot settle are:

• Is P = NP? (We believe the answer is negative.)

• Is BPP = P? (We believe the answer is positive.)

However, we can say that the "conventional wisdom" is correct on at least one of these questions. Namely, if we're wrong on the first count, then we'll be right on the second one:

Theorem 20.11 — Sipser–Gács Theorem. If P = NP then BPP = P.

Figure 20.7: If $F \in BPP$ then through amplification we can ensure that there is an algorithm $A$ to compute $F$ on $n$-length inputs using $m$ coins such that $\Pr_{r \sim \{0,1\}^m}[A(xr) \ne F(x)] \ll 1/poly(m)$. Hence if $F(x) = 1$ then almost all of the $2^m$ choices for $r$ will cause $A(xr)$ to output 1, while if $F(x) = 0$ then $A(xr) = 0$ for almost all $r$'s. To prove the Sipser–Gács Theorem we consider several "shifts" of the set $S \subseteq \{0,1\}^m$ of the coins $r$ such that $A(xr) = 1$. If $F(x) = 1$ then we can find a set of $k$ shifts $s_0, \ldots, s_{k-1}$ for which $\cup_{i \in [k]}(S \oplus s_i) = \{0,1\}^m$. If $F(x) = 0$ then for every such set $|\cup_{i \in [k]}(S \oplus s_i)| \le k|S| \ll 2^m$. We can phrase the question of whether there is such a set of shifts using a constant number of quantifiers, and so can solve it in polynomial time if P = NP.

P: Before reading the proof, it is instructive to think why this result is not "obvious." If P = NP then given any randomized algorithm $A$ and input $x$, we will be able to figure out in polynomial time if there is a string $r \in \{0,1\}^m$ of random coins for $A$ such that $A(xr) = 1$. The problem is that even if $\Pr_{r \sim \{0,1\}^m}[A(xr) = F(x)] \ge 0.9999$, it can still be the case that even when $F(x) = 0$ there exists a string $r$ such that $A(xr) = 1$. The proof is rather subtle. It is much more important that you understand the statement of the theorem than that you follow all the details of the proof.

Proof Idea:

The construction follows the "quantifier elimination" idea which we have seen in Theorem 16.6. We will show that for every $F \in BPP$, we can reduce the question of whether an input $x$ satisfies $F(x) = 1$ to the question of whether a formula of the form $\exists_{u \in \{0,1\}^m} \forall_{v \in \{0,1\}^k} P(u, v)$ is true, where $m, k$ are polynomial in the length of $x$ and $P$ is polynomial-time computable. By Theorem 16.6, if P = NP then we can decide in polynomial time whether such a formula is true or false.

The idea behind this construction is that using amplification we can obtain a randomized algorithm $A$ for computing $F$ using $m$ coins such that for every $x \in \{0,1\}^n$, if $F(x) = 0$ then the set $S \subseteq \{0,1\}^m$ of coins that make $A$ output 1 is extremely tiny, and if $F(x) = 1$ then it is very large. Now in the case $F(x) = 1$, one can show that this means that there exists a small number $k$ of "shifts" $s_0, \ldots, s_{k-1}$ such that the union of the sets $S \oplus s_i$ (i.e., sets of the form $\{s \oplus s_i \mid s \in S\}$) covers $\{0,1\}^m$, while in the case $F(x) = 0$ this union will always be of size at most $k|S|$, which is much smaller than $2^m$. We can express the condition that there exist $s_0, \ldots, s_{k-1}$ such that $\cup_{i \in [k]}(S \oplus s_i) = \{0,1\}^m$ as a statement with a constant number of quantifiers.

⋆

Proof of Theorem 20.11. Let $F \in BPP$. Using Theorem 20.5, there exists a polynomial-time algorithm $A$ such that for every $x \in \{0,1\}^n$, $\Pr_{r \sim \{0,1\}^m}[A(xr) = F(x)] \ge 1 - 2^{-n}$, where $m$ is polynomial in $n$. In particular (since an exponential dominates a polynomial, and we can always assume $n$ is sufficiently large), it holds that

Pr๐‘ฅโˆˆ0,1๐‘š[๐ด(๐‘ฅ๐‘Ÿ) = ๐น(๐‘ฅ)] โ‰ฅ 1 โˆ’ 110๐‘š2 . (20.10)

Let ๐‘ฅ โˆˆ 0, 1๐‘›, and let ๐‘†๐‘ฅ โŠ† 0, 1๐‘š be the set ๐‘Ÿ โˆˆ 0, 1๐‘š โˆถ๐ด(๐‘ฅ๐‘Ÿ) = 1. By our assumption, if ๐น(๐‘ฅ) = 0 then |๐‘†๐‘ฅ| โ‰ค 110๐‘š2 2๐‘š and if๐น(๐‘ฅ) = 1 then |๐‘†๐‘ฅ| โ‰ฅ (1 โˆ’ 110๐‘š2 )2๐‘š.For a set ๐‘† โŠ† 0, 1๐‘š and a string ๐‘  โˆˆ 0, 1๐‘š, we define the set๐‘† โŠ• ๐‘  to be ๐‘Ÿ โŠ• ๐‘  โˆถ ๐‘Ÿ โˆˆ ๐‘† where โŠ• denotes the XOR operation. That

is, ๐‘† โŠ• ๐‘  is the set ๐‘† โ€œshiftedโ€ by ๐‘ . Note that |๐‘† โŠ• ๐‘ | = |๐‘†|. (Pleasemake sure that you see why this is true.)

The heart of the proof is the following two claims:


CLAIM I: For every subset $S \subseteq \{0,1\}^m$, if $|S| \le \tfrac{1}{1000m} 2^m$, then for every $s_0, \ldots, s_{100m-1} \in \{0,1\}^m$, $\cup_{i \in [100m]}(S \oplus s_i) \subsetneq \{0,1\}^m$.

CLAIM II: For every subset $S \subseteq \{0,1\}^m$, if $|S| \ge \tfrac{1}{2} 2^m$, then there exists a set of strings $s_0, \ldots, s_{100m-1}$ such that $\cup_{i \in [100m]}(S \oplus s_i) = \{0,1\}^m$.

CLAIM I and CLAIM II together imply the theorem. Indeed, they mean that under our assumptions, for every $x \in \{0,1\}^n$, $F(x) = 1$ if and only if

$$\exists_{s_0, \ldots, s_{100m-1} \in \{0,1\}^m} \; \cup_{i \in [100m]} (S_x \oplus s_i) = \{0,1\}^m \qquad (20.11)$$

which we can re-write as

โˆƒ๐‘ 0,โ€ฆ,๐‘ 100๐‘šโˆ’1โˆˆ0,1๐‘šโˆ€๐‘คโˆˆ0,1๐‘š(๐‘ค โˆˆ (๐‘†๐‘ฅโŠ•๐‘ 0)โˆจ๐‘ค โˆˆ (๐‘†๐‘ฅโŠ•๐‘ 1)โˆจโ‹ฏ ๐‘ค โˆˆ (๐‘†๐‘ฅโŠ•๐‘ 100๐‘šโˆ’1))(20.12)

or equivalently

โˆƒ๐‘ 0,โ€ฆ,๐‘ 100๐‘šโˆ’1โˆˆ0,1๐‘šโˆ€๐‘คโˆˆ0,1๐‘š(๐ด(๐‘ฅ(๐‘คโŠ•๐‘ 0)) = 1โˆจ๐ด(๐‘ฅ(๐‘คโŠ•๐‘ 1)) = 1โˆจโ‹ฏโˆจ๐ด(๐‘ฅ(๐‘คโŠ•๐‘ 100๐‘šโˆ’1)) = 1)(20.13)

which (since ๐ด is computable in polynomial time) is exactly thetype of statement shown in Theorem 16.6 to be decidable in polyno-mial time if P = NP.

We see that all that is left is to prove CLAIM I and CLAIM II. CLAIM I follows immediately from the fact that

โˆช๐‘–โˆˆ[100๐‘šโˆ’1]|๐‘†๐‘ฅ โŠ• ๐‘ ๐‘–| โ‰ค 100๐‘šโˆ’1โˆ‘๐‘–=0 |๐‘†๐‘ฅ โŠ• ๐‘ ๐‘–| = 100๐‘šโˆ’1โˆ‘๐‘–=0 |๐‘†๐‘ฅ| = 100๐‘š|๐‘†๐‘ฅ| .(20.14)

To prove CLAIM II, we will use a technique known as the probabilistic method (see the proof of Lemma 20.12 for a more extensive discussion). Note that this is a completely different use of probability than in the theorem statement; we just use the methods of probability to prove an existential statement.

Proof of CLAIM II: Let $S \subseteq \{0,1\}^m$ with $|S| \ge 0.5 \cdot 2^m$ be as in the claim's statement. Consider the following probabilistic experiment: we choose $100m$ random shifts $s_0, \ldots, s_{100m-1}$ independently at random in $\{0,1\}^m$, and consider the event GOOD that $\cup_{i \in [100m]}(S \oplus s_i) = \{0,1\}^m$. To prove CLAIM II it is enough to show that $\Pr[\text{GOOD}] > 0$, since that means that in particular there must exist shifts $s_0, \ldots, s_{100m-1}$ that satisfy this condition.

For every ๐‘ง โˆˆ 0, 1๐‘š, define the event BAD๐‘ง to hold if ๐‘ง โˆ‰โˆช๐‘–โˆˆ[100๐‘šโˆ’1](๐‘† โŠ• ๐‘ ๐‘–). The event GOOD holds if BAD๐‘ง fails for every๐‘ง โˆˆ 0, 1๐‘š, and so our goal is to prove that Pr[โˆช๐‘งโˆˆ0,1๐‘šBAD๐‘ง] < 1. Bythe union bound, to show this, it is enough to show that Pr[BAD๐‘ง] <

Page 571: INTRODUCTIONTO THEORETICAL COMPUTERSCIENCE

modeling randomized computation 571

4 The condition of independence here is subtle. Itis not the case that all of the 2๐‘š ร— 100๐‘š eventsBAD๐‘–๐‘ง๐‘งโˆˆ0,1๐‘š,๐‘–โˆˆ[100๐‘š] are mutually independent.Only for a fixed ๐‘ง โˆˆ 0, 1๐‘š, the 100๐‘š events of theform BAD๐‘–๐‘ง are mutually independent.

5 There is a whole (highly recommended) book byAlon and Spencer devoted to this method.

2โˆ’๐‘š for every ๐‘ง โˆˆ 0, 1๐‘š. Define the event BAD๐‘–๐‘ง to hold if ๐‘ง โˆ‰ ๐‘† โŠ• ๐‘ ๐‘–.Since every shift ๐‘ ๐‘– is chosen independently, for every fixed ๐‘ง theevents BAD0๐‘ง, โ€ฆ ,BAD100๐‘šโˆ’1๐‘ง are mutually independent,4 and hence

$$\Pr[\text{BAD}_z] = \Pr\left[ \cap_{i \in [100m]} \text{BAD}_z^i \right] = \prod_{i=0}^{100m-1} \Pr[\text{BAD}_z^i]. \qquad (20.15)$$

So this means that the result will follow by showing that $\Pr[\text{BAD}_z^i] \le \tfrac{1}{2}$ for every $z \in \{0,1\}^m$ and $i \in [100m]$ (as that would allow us to bound the righthand side of (20.15) by $2^{-100m}$). In other words, we need to show that for every $z \in \{0,1\}^m$ and set $S \subseteq \{0,1\}^m$ with $|S| \ge \tfrac{1}{2} 2^m$,

Pr๐‘ โˆˆ0,1๐‘š[๐‘ง โˆˆ ๐‘† โŠ• ๐‘ ] โ‰ฅ 12 . (20.16)

To show this, we observe that $z \in S \oplus s$ if and only if $s \in S \oplus z$ (can you see why?). Hence we can rewrite the probability on the lefthand side of (20.16) as $\Pr_{s \sim \{0,1\}^m}[s \in S \oplus z]$, which simply equals $|S \oplus z|/2^m = |S|/2^m \ge 1/2$! This concludes the proof of CLAIM II and hence of Theorem 20.11.
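The probabilistic argument behind CLAIM II is easy to check empirically on toy parameters. The following sketch (ours; it uses far fewer than the proof's $100m$ shifts, which already makes the union bound $2^m \cdot 2^{-k}$ tiny at this size) picks a random half-density set $S \subseteq \{0,1\}^m$ and random shifts, and verifies that the shifted copies cover the whole cube:

```python
import random

m = 8
universe = list(range(2 ** m))  # strings in {0,1}^m, encoded as integers

# A set S containing exactly half of {0,1}^m.
S = set(random.sample(universe, 2 ** (m - 1)))

# Pick k random shifts. Each fixed z is missed by one shifted copy with
# probability 1/2, so it is missed by all of them with probability 2**(-k);
# by the union bound the cover fails with probability at most 2**m * 2**(-k).
k = 4 * m
shifts = [random.randrange(2 ** m) for _ in range(k)]

covered = set()
for s in shifts:
    covered |= {r ^ s for r in S}  # the shifted set S xor s

print(len(covered) == 2 ** m)  # True except with probability ~ 2**m * 2**(-k)
```

Re-running the experiment with a set $S$ of density $\tfrac{1}{1000m}$ instead shows the flip side (CLAIM I): the union then has at most $k|S| \ll 2^m$ elements, so coverage is impossible.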

20.6 NON-CONSTRUCTIVE EXISTENCE OF PSEUDORANDOM GENERATORS (ADVANCED, OPTIONAL)

We now show that, if we don't insist on constructivity of pseudorandom generators, then we can show that there exist pseudorandom generators whose output is exponentially larger than the input length.

Lemma 20.12 — Existence of inefficient pseudorandom generators. There is some absolute constant $C$ such that for every $\epsilon, T$, if $\ell > C(\log T + \log(1/\epsilon))$ and $m \le T$, then there is a $(T, \epsilon)$ pseudorandom generator $G : \{0,1\}^\ell \to \{0,1\}^m$.

Proof Idea:

The proof uses an extremely useful technique known as the "probabilistic method," which is not too hard mathematically but can be confusing at first.⁵ The idea is to give a "non-constructive" proof of existence of the pseudorandom generator $G$ by showing that if $G$ were chosen at random, then the probability that it would be a valid $(T, \epsilon)$ pseudorandom generator is positive. In particular this means that there exists a single $G$ that is a valid $(T, \epsilon)$ pseudorandom generator. The probabilistic method is just a proof technique to demonstrate the existence of such a function. Ultimately, our goal is to show the existence of a deterministic function $G$ that satisfies the condition.

⋆


The above discussion might be rather abstract at this point, but should become clearer after seeing the proof.

Proof of Lemma 20.12. Let $\epsilon, T, \ell, m$ be as in the lemma's statement. We need to show that there exists a function $G : \{0,1\}^\ell \to \{0,1\}^m$ that "fools" every $T$-line program $P$ in the sense of (20.9). We will show that this follows from the following claim:

Claim I: For every fixed NAND-CIRC program $P$, if we pick $G : \{0,1\}^\ell \to \{0,1\}^m$ at random then the probability that (20.9) is violated is at most $2^{-T^2}$.

Before proving Claim I, let us see why it implies Lemma 20.12. We can identify a function $G : \{0,1\}^\ell \to \{0,1\}^m$ with its "truth table," or simply the list of evaluations on all its possible $2^\ell$ inputs. Since each output is an $m$-bit string, we can also think of $G$ as a string in $\{0,1\}^{m \cdot 2^\ell}$. We define $\mathcal{F}_\ell^m$ to be the set of all functions from $\{0,1\}^\ell$ to $\{0,1\}^m$. As discussed above, we can identify $\mathcal{F}_\ell^m$ with $\{0,1\}^{m \cdot 2^\ell}$, and choosing a random function $G \sim \mathcal{F}_\ell^m$ corresponds to choosing a random $m \cdot 2^\ell$-long bit string.

For every NAND-CIRC program $P$, let $B_P$ be the event that, if we choose $G$ at random from $\mathcal{F}_\ell^m$, then (20.9) is violated with respect to the program $P$. It is important to understand what the sample space is that the event $B_P$ is defined over: this event depends on the choice of $G$, and so $B_P$ is a subset of $\mathcal{F}_\ell^m$. An equivalent way to define the event $B_P$ is that it is the subset of all functions mapping $\{0,1\}^\ell$ to $\{0,1\}^m$ that violate (20.9), or in other words:

๐ต๐‘ƒ = โŽงโŽจโŽฉ๐บ โˆˆ โ„ฑ๐‘šโ„“ โˆฃ โˆฃ 12โ„“ โˆ‘๐‘ โˆˆ0,1โ„“ ๐‘ƒ(๐บ(๐‘ )) โˆ’ 12๐‘š โˆ‘๐‘Ÿโˆˆ0,1๐‘š ๐‘ƒ(๐‘Ÿ)โˆฃ > ๐œ–โŽซโŽฌโŽญ(20.17)

(We've replaced here the probability statements in (20.9) with the equivalent sums so as to reduce confusion as to what the sample space is that $B_P$ is defined over.)

To understand this proof it is crucial that you pause here and see how the definition of $B_P$ above corresponds to (20.17). This may well take re-reading the above text once or twice, but it is a good exercise in parsing probabilistic statements and learning how to identify the sample space that these statements correspond to.

Now, we've shown in Theorem 5.2 that up to renaming variables (which makes no difference to a program's functionality) there are $2^{O(T \log T)}$ NAND-CIRC programs of at most $T$ lines. Since $T \log T < T^2$ for sufficiently large $T$, this means that if Claim I is true, then by the union bound the probability of the union of $B_P$ over all NAND-CIRC programs of at most $T$ lines is at most $2^{O(T \log T)} 2^{-T^2} < 0.1$ for sufficiently large $T$. What is important for us about the number $0.1$ is that it is smaller than $1$. In particular this means that there exists a single $G^* \in \mathcal{F}_\ell^m$ such that $G^*$ does not violate (20.9) with respect to any NAND-CIRC program of at most $T$ lines, but that precisely means that $G^*$ is a $(T, \epsilon)$ pseudorandom generator.

Hence to conclude the proof of Lemma 20.12, it suffices to prove Claim I. Choosing a random $G : \{0,1\}^\ell \to \{0,1\}^m$ amounts to choosing $L = 2^\ell$ random strings $y_0, \ldots, y_{L-1} \in \{0,1\}^m$ and letting $G(x) = y_x$ (identifying $\{0,1\}^\ell$ and $[L]$ via the binary representation). This means that proving the claim amounts to showing that for every fixed function $P : \{0,1\}^m \to \{0,1\}$, if $L > 2^{C(\log T + \log(1/\epsilon))}$ (which by setting $C > 4$ we can ensure is larger than $10T^2/\epsilon^2$) then the probability that

$$\left| \frac{1}{L} \sum_{i=0}^{L-1} P(y_i) - \Pr_{s \sim \{0,1\}^m}[P(s) = 1] \right| > \epsilon \qquad (20.18)$$

is at most $2^{-T^2}$.

(20.18) follows directly from the Chernoff bound. Indeed, if we let for every $i \in [L]$ the random variable $X_i$ denote $P(y_i)$, then since $y_0, \ldots, y_{L-1}$ are chosen independently at random, these are independently and identically distributed random variables with mean $\mathbb{E}_{y \sim \{0,1\}^m}[P(y)] = \Pr_{y \sim \{0,1\}^m}[P(y) = 1]$, and hence the probability that they deviate from their expectation by $\epsilon$ is at most $2 \cdot 2^{-\epsilon^2 L / 2}$.

Figure 20.8: The relation between BPP and the other complexity classes that we have seen. We know that P ⊆ BPP ⊆ EXP and BPP ⊆ P/poly, but we don't know how BPP compares with NP and can't rule out even BPP = EXP. Most evidence points to the possibility that BPP = P.

Chapter Recap

• We can model randomized algorithms by either adding a special "coin toss" operation or assuming an extra randomly chosen input.

• The class BPP contains the set of Boolean functions that can be computed by polynomial-time randomized algorithms.

• BPP is a worst-case class of computation: a randomized algorithm to compute a function must compute it correctly with high probability on every input.


• We can amplify the success probability of a randomized algorithm from any value strictly larger than 1/2 into a success probability that is exponentially close to 1.

• We know that P ⊆ BPP ⊆ EXP.

• We also know that BPP ⊆ P/poly.

• The relation between BPP and NP is not known, but we do know that if P = NP then BPP = P.

• Pseudorandom generators are objects that take a short random "seed" and expand it to a much longer output that "appears random" to efficient algorithms. We conjecture that exponentially strong pseudorandom generators exist. Under this conjecture, BPP = P.

20.7 EXERCISES

20.8 BIBLIOGRAPHICAL NOTES

In this chapter we ignore the issue of how we actually get random bits in practice. The output of many physical processes, whether it is thermal heat, network and hard drive latency, user typing patterns and mouse movements, and more, can be thought of as a binary string sampled from some distribution $\mu$ that might have significant unpredictability (or entropy) but is not necessarily the uniform distribution over $\{0,1\}^n$. Indeed, as this paper shows, even (real-world) coin tosses do not have exactly the distribution of a uniformly random string. Therefore, to use the resulting measurements for randomized algorithms, one typically needs to apply a "distillation" or randomness extraction process to the raw measurements to transform them into the uniform distribution. Vadhan's book [Vad+12] is an excellent source for more discussion on both randomness extractors and pseudorandom generators.

The name BPP stands for "bounded probability polynomial time". This is an historical accident: this class probably should have been called RP or PP, but both names were taken by other classes.

The proof of Theorem 20.8 actually yields more than its statement. We can use the same "unrolling the loop" arguments we've used before to show that the restriction to $\{0,1\}^n$ of every function in BPP is also computable by a polynomial-size RNAND-CIRC program (i.e., a NAND-CIRC program with the RAND operation). Like in the P vs SIZE($poly(n)$) case, there are also functions outside BPP whose restrictions can be computed by polynomial-size RNAND-CIRC programs. Nevertheless the proof of Theorem 20.8 shows that even such functions can be computed by polynomial-sized NAND-CIRC programs without using the rand operations. This can be phrased as saying that $BPSIZE(T(n)) \subseteq SIZE(O(n \cdot T(n)))$ (where BPSIZE is defined in the natural way using RNAND-CIRC programs). The stronger version of Theorem 20.8 we mentioned can be phrased as saying that BPP/poly = P/poly.


V ADVANCED TOPICS


21 Cryptography

"Human ingenuity cannot concoct a cipher which human ingenuity cannot resolve.", Edgar Allan Poe, 1841

"A good disguise should not reveal the person's height", Shafi Goldwasser and Silvio Micali, 1982

"'Perfect Secrecy' is defined by requiring of a system that after a cryptogram is intercepted by the enemy the a posteriori probabilities of this cryptogram representing various messages be identically the same as the a priori probabilities of the same messages before the interception. It is shown that perfect secrecy is possible but requires, if the number of messages is finite, the same number of possible keys.", Claude Shannon, 1945

"We stand today on the brink of a revolution in cryptography.", Whitfield Diffie and Martin Hellman, 1976

Cryptography - the art or science of "secret writing" - has been around for several millennia, and for almost all of that time Edgar Allan Poe's quote above held true. Indeed, the history of cryptography is littered with the figurative corpses of cryptosystems believed secure and then broken, and sometimes with the actual corpses of those who have mistakenly placed their faith in these cryptosystems.

Yet, something changed in the last few decades, which is the "revolution" alluded to (and to a large extent initiated by) Diffie and Hellman's 1976 paper quoted above. New cryptosystems have been found that have not been broken despite being subjected to immense efforts involving both human ingenuity and computational power on a scale that completely dwarfs the "code breakers" of Poe's time. Even more amazingly, these cryptosystems are not only seemingly unbreakable, but they also achieve this under much harsher conditions. Not only do today's attackers have more computational power, but they also have more data to work with. In Poe's age, an attacker would be lucky if they got access to more than a few encryptions of known messages. These days attackers might have massive amounts of data - terabytes


Learning Objectives:
• Definition of perfect secrecy
• The one-time pad encryption scheme
• Necessity of long keys for perfect secrecy
• Computational secrecy and the derandomized one-time pad
• Public key encryption
• A taste of advanced topics


Figure 21.1: Snippet from encrypted communication between queen Mary and Sir Babington

Figure 21.2: XKCD's take on the added security of using uncommon symbols

or more - at their disposal. In fact, with public key encryption, an attacker can generate as many ciphertexts as they wish.

The key to this success has been a clearer understanding of both how to define security for cryptographic tools and how to relate this security to concrete computational problems. Cryptography is a vast and continuously changing topic, but we will touch on some of these issues in this chapter.

21.1 CLASSICAL CRYPTOSYSTEMS

A great many cryptosystems have been devised and broken throughout the ages. Let us recount just some of these stories. In 1587, Mary the queen of Scots, and the heir to the throne of England, wanted to arrange the assassination of her cousin, queen Elisabeth I of England, so that she could ascend to the throne and finally escape the house arrest under which she had been for the last 18 years. As part of this complicated plot, she sent a coded letter to Sir Anthony Babington.

Mary used what's known as a substitution cipher, where each letter is transformed into a different obscure symbol (see Fig. 21.1). At a first look, such a letter might seem rather inscrutable - a meaningless sequence of strange symbols. However, after some thought, one might recognize that these symbols repeat several times and moreover that different symbols repeat with different frequencies. Now it doesn't take a large leap of faith to assume that perhaps each symbol corresponds to a different letter and the more frequent symbols correspond to letters that occur in the alphabet with higher frequency. From this observation, there is a short gap to completely breaking the cipher, which was in fact done by queen Elisabeth's spies who used the decoded letters to learn of all the co-conspirators and to convict queen Mary of treason, a crime for which she was executed. Trusting in superficial security measures (such as using "inscrutable" symbols) is a trap that users of cryptography have been falling into again and again over the years. (As in many things, this is the subject of a great XKCD cartoon, see Fig. 21.2.)
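The frequency analysis attack sketched above is easy to carry out mechanically. The snippet below (ours, not from the book) counts symbol frequencies in a ciphertext and pairs the most common ciphertext symbols with the most common English letters as a first guess at the substitution table:

```python
from collections import Counter

# English letters ordered (roughly) from most to least frequent.
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"

def first_guess(ciphertext):
    # Count how often each symbol appears, then tentatively map the most
    # frequent ciphertext symbol to 'e', the next to 't', and so on.
    counts = Counter(ch for ch in ciphertext if ch.isalpha())
    ranked = [sym for sym, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))

# Example ciphertext: 'the quick brown fox jumps over the lazy dog',
# with every letter shifted forward by 4 positions.
guess = first_guess("xli uymgo fvsar jsb nyqtw sziv xli pedc hsk")
print(guess)
```

On such a short text the guess is only a starting point; an attacker (like Elisabeth's spies) would refine it using letter pairs, word boundaries, and guessed plaintext words.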

The Vigenère cipher is named after Blaise de Vigenère, who described it in a book in 1586 (though it was invented earlier by Bellaso). The idea is to use a collection of substitution ciphers - if there are $n$ different ciphers then the first letter of the plaintext is encoded with the first cipher, the second with the second cipher, the $n$-th with the $n$-th cipher, and then the $n+1$-st letter is again encoded with the first cipher. The key is usually a word or a phrase of $n$ letters, and the $i$-th substitution cipher is obtained by shifting each letter $k_i$ positions in the alphabet. This "flattens" the frequencies and makes it much harder to do frequency analysis, which is why this cipher was considered "unbreakable" for 300+ years and got the nickname "le chiffre


Figure 21.3: Confederate Cipher Disk for implementing the Vigenère cipher

Figure 21.4: Confederate encryption of the message "Gen'l Pemberton: You can expect no help from this side of the river. Let Gen'l Johnston know, if possible, when you can attack the same point on the enemy's lines. Inform me also and I will endeavor to make a diversion. I have sent some caps. I subjoin a despatch from General Johnston."

Figure 21.5: In the Enigma mechanical cipher the secret key would be the settings of the rotors and internal wires. As the operator typed up their message, the encrypted version appeared in the display area above, and the internal state of the cipher was updated (so typing the same letter twice would generally result in two different letters output). Decrypting follows the same process: if the sender and receiver are using the same key then typing the ciphertext would result in the plaintext appearing in the display.

1. Here is a nice exercise: compute (up to an order of magnitude) the probability that a 50-letter long message composed of random letters will end up not containing the letter "L".

indéchiffrable" ("the unbreakable cipher"). Nevertheless, Charles Babbage cracked the Vigenère cipher in 1854 (though he did not publish it). In 1863 Friedrich Kasiski broke the cipher and published the result. The idea is that once you guess the length of the cipher, you can reduce the task to breaking a simple substitution cipher, which can be done via frequency analysis (can you see why?). Confederate generals used Vigenère regularly during the civil war, and their messages were routinely cryptanalyzed by Union officers.
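The shifting scheme described above is easy to state in code. The following is a minimal illustrative sketch (restricted to the uppercase letters A-Z; the key "LEMON" and message below are the standard textbook example, not from this chapter):

```python
# A minimal sketch of the Vigenere cipher: the i-th letter is shifted
# by k_(i mod n) positions, where n is the key length.

def vigenere_encrypt(plaintext: str, key: str) -> str:
    shifts = [ord(c) - ord('A') for c in key]
    n = len(shifts)
    return ''.join(
        chr((ord(c) - ord('A') + shifts[i % n]) % 26 + ord('A'))
        for i, c in enumerate(plaintext)
    )

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    # Decrypting shifts each letter back, i.e., encrypts with the
    # "negated" key.
    neg = ''.join(chr((26 - (ord(c) - ord('A'))) % 26 + ord('A')) for c in key)
    return vigenere_encrypt(ciphertext, neg)

c = vigenere_encrypt("ATTACKATDAWN", "LEMON")
print(c)                              # LXFOPVEFRNHR
print(vigenere_decrypt(c, "LEMON"))   # ATTACKATDAWN
```

Note how the two A's in "ATTACK" encrypt to different letters (L and O), which is exactly the "flattening" of frequencies that defeats naive frequency analysis.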

The Enigma cipher was a mechanical cipher (looking like a typewriter, see Fig. 21.5) where each letter typed would get mapped into a different letter depending on the (rather complicated) key and current state of the machine, which had several rotors that rotated at different paces. An identically wired machine at the other end could be used to decrypt. Like many ciphers in history, Enigma too was believed by the Germans to be "impossible to break", and even quite late in the war they refused to believe it was broken despite mounting evidence to that effect. (In fact, some German generals refused to believe it was broken even after the war.) Breaking Enigma was a heroic effort which was initiated by the Poles and then completed by the British at Bletchley Park, with Alan Turing (of the Turing machines) playing a key role. As part of this effort the Brits built arguably the world's first large scale mechanical computation devices (though they looked more similar to washing machines than to iPhones). They were also helped along the way by some quirks and errors of the German operators. For example, the fact that their messages ended with "Heil Hitler" turned out to be quite useful.

Here is one entertaining anecdote: the Enigma machine would never map a letter to itself. In March 1941, Mavis Batey, a cryptanalyst at Bletchley Park, received a very long message that she tried to decrypt. She then noticed a curious property: the message did not contain the letter "L".¹ She realized that the probability that no "L"'s appeared in the message is too small for this to happen by chance. Hence she surmised that the original message must have been composed only of L's. That is, it must have been the case that the operator, perhaps to test the machine, had simply sent out a message where he repeatedly pressed the letter "L". This observation helped her decode the next message, which helped inform of a planned Italian attack and secure a resounding British victory in what became known as "the Battle of Cape Matapan". Mavis also helped break another Enigma machine. Using the information she provided, the Brits were able to feed the Germans with the false information that the main allied invasion would take place in Pas de Calais rather than in Normandy.

In the words of General Eisenhower, the intelligence from Bletchley Park was of "priceless value". It made a huge difference for the Allied


Figure 21.6: A private-key encryption scheme is a pair of algorithms $E, D$ such that for every key $k \in \{0,1\}^n$ and plaintext $x \in \{0,1\}^{L(n)}$, $y = E_k(x)$ is a ciphertext of length $C(n)$. The encryption scheme is valid if for every such $y$, $D_k(y) = x$. That is, the decryption of an encryption of $x$ is $x$, as long as both encryption and decryption use the same key.

war effort, thereby shortening World War II and saving millions of lives. See also this interview with Sir Harry Hinsley.

21.2 DEFINING ENCRYPTION

Many of the troubles that cryptosystem designers faced over history (and still face!) can be attributed to not properly defining or understanding what goals they want to achieve in the first place. Let us focus on the setting of private key encryption. (This is also known as "symmetric encryption"; for thousands of years, "private key encryption" was synonymous with encryption, and only in the 1970s was the concept of public key encryption invented, see Definition 21.11.) A sender (traditionally called "Alice") wants to send a message (known also as a plaintext) $x \in \{0,1\}^*$ to a receiver (traditionally called "Bob"). They would like their message to be kept secret from an adversary who listens in or "eavesdrops" on the communication channel (and is traditionally called "Eve").

Alice and Bob share a secret key $k \in \{0,1\}^*$. (While the letter $k$ is often used elsewhere in the book to denote a natural number, in this chapter we use it to denote the string corresponding to a secret key.) Alice uses the key $k$ to "scramble" or encrypt the plaintext $x$ into a ciphertext $y$, and Bob uses the key $k$ to "unscramble" or decrypt the ciphertext $y$ back into the plaintext $x$. This motivates the following definition, which attempts to capture what it means for an encryption scheme to be valid or "make sense", regardless of whether or not it is secure:

Definition 21.1 — Valid encryption scheme. Let $L : \mathbb{N} \to \mathbb{N}$ and $C : \mathbb{N} \to \mathbb{N}$ be two functions mapping natural numbers to natural numbers. A pair of polynomial-time computable functions $(E, D)$ mapping strings to strings is a valid private key encryption scheme (or encryption scheme for short) with plaintext length function $L(\cdot)$ and ciphertext length function $C(\cdot)$ if for every $n \in \mathbb{N}$, $k \in \{0,1\}^n$ and $x \in \{0,1\}^{L(n)}$, $|E_k(x)| = C(n)$ and

$$D(k, E(k, x)) = x . \qquad (21.1)$$

We will often write the first input (i.e., the key) to the encryption and decryption as a subscript, and so we can write (21.1) also as $D_k(E_k(x)) = x$.

Solved Exercise 21.1 — Lengths of ciphertext and plaintext. Prove that for every valid encryption scheme $(E, D)$ with functions $L, C$, it holds that $C(n) \geq L(n)$ for every $n$.


2. The actual quote is "Il faut qu'il n'exige pas le secret, et qu'il puisse sans inconvénient tomber entre les mains de l'ennemi", loosely translated as "The system must not require secrecy and can be stolen by the enemy without causing trouble". According to Steve Bellovin the NSA version is "assume that the first copy of any device we make is shipped to the Kremlin".

Solution:

For every fixed key $k \in \{0,1\}^n$, the equation (21.1) implies that the map $y \mapsto D_k(y)$ inverts the map $x \mapsto E_k(x)$, which in particular means that the map $x \mapsto E_k(x)$ must be one to one. Hence its codomain must be at least as large as its domain, and since its domain is $\{0,1\}^{L(n)}$ and its codomain is $\{0,1\}^{C(n)}$, it follows that $C(n) \geq L(n)$.
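The argument can be checked concretely by brute force. The sketch below uses a toy XOR scheme with $n = L(n) = C(n) = 3$ as a stand-in for an arbitrary valid scheme (the choice of XOR is illustrative, not from the text): for each fixed key the encryption map is one-to-one, which is what forces the ciphertext space to be at least as large as the plaintext space.

```python
from itertools import product

n = 3
keys = plaintexts = list(product([0, 1], repeat=n))

def E(k, x):  # toy encryption: bitwise XOR of key and plaintext
    return tuple(ki ^ xi for ki, xi in zip(k, x))

def D(k, y):  # XOR is its own inverse, so decryption is the same map
    return E(k, y)

# Validity: D_k(E_k(x)) = x for all keys and plaintexts.
assert all(D(k, E(k, x)) == x for k in keys for x in plaintexts)

# Injectivity: for each fixed key all ciphertexts are distinct, so the
# number of possible ciphertexts is at least the number of plaintexts.
for k in keys:
    assert len({E(k, x) for x in plaintexts}) == len(plaintexts)
print("validity and injectivity hold for all", len(keys), "keys")
```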

Since the ciphertext length is always at least the plaintext length (and in most applications it is not much longer than that), we typically focus on the plaintext length as the quantity to optimize in an encryption scheme. The larger $L(n)$ is, the better the scheme, since it means we need a shorter secret key to protect messages of the same length.

21.3 DEFINING SECURITY OF ENCRYPTION

Definition 21.1 says nothing about the security of $E$ and $D$, and even allows the trivial encryption scheme that ignores the key altogether and sets $E_k(x) = x$ for every $x$. Defining security is not a trivial matter.

P
You would appreciate the subtleties of defining security of encryption more if at this point you take a five minute break from reading, and try (possibly with a partner) to brainstorm on how you would mathematically define the notion that an encryption scheme is secure, in the sense that it protects the secrecy of the plaintext $x$.

Throughout history, many attacks on cryptosystems were rooted in the cryptosystem designers' reliance on "security through obscurity": trusting that the fact their methods are not known to their enemy will protect them from being broken. This is a faulty assumption: if you reuse a method again and again (even with a different key each time) then eventually your adversaries will figure out what you are doing. And if Alice and Bob meet frequently in a secure location to decide on a new method, they might as well take the opportunity to exchange their secrets. These considerations led Auguste Kerckhoffs in 1883 to state the following principle:

A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.²

Why is it OK to assume the key is secret and not the algorithm? Because we can always choose a fresh key. But of course that won't


help us much if our key is "1234" or "passw0rd!". In fact, if you use any deterministic algorithm to choose the key then eventually your adversary will figure this out. Therefore for security we must choose the key at random, and we can restate Kerckhoffs's principle as follows:

There is no secrecy without randomness

This is such a crucial point that it is worth repeating:

Big Idea 26 There is no secrecy without randomness.

At the heart of every cryptographic scheme there is a secret key, and the secret key is always chosen at random. A corollary of that is that to understand cryptography, you need to know probability theory.

R
Remark 21.2 — Randomness in the real world. Choosing the secrets for cryptography requires generating randomness, which is often done by measuring some "unpredictable" or "high entropy" data, and then applying hash functions to the result to "extract" a uniformly random string. Great care must be taken in doing this, and randomness generators often turn out to be the Achilles heel of secure systems.

In 2006 a programmer removed a line of code from the procedure to generate entropy in the OpenSSL package distributed by Debian, since it caused a warning in some automatic verification code. As a result, for two years (until this was discovered) all the randomness generated by this procedure used only the process ID as an "unpredictable" source. This means that all communication done by users in that period is fairly easily breakable (and in particular, if some entities recorded that communication they could break it also retroactively). See XKCD's take on that incident.

In 2012 two separate teams of researchers scanned a large number of RSA keys on the web and found out that about 4 percent of them are easy to break. The main issue was devices such as routers, internet-connected printers and such. These devices sometimes run variants of Linux, a desktop operating system, but without a hard drive, mouse or keyboard, so they don't have access to many of the entropy sources that desktops have. Coupled with some good old fashioned ignorance of cryptography and software bugs, this led to many keys that are downright trivial to break; see this blog post and this web page for more details.

Since randomness is so crucial to security, breaking the procedure to generate randomness can lead to a complete break of the system that uses this randomness. Indeed, the Snowden documents, combined with


observations of Shumow and Ferguson, strongly suggest that the NSA has deliberately inserted a trapdoor in one of the pseudorandom generators published by the National Institute of Standards and Technology (NIST). Fortunately, this generator wasn't widely adopted, but apparently the NSA did pay 10 million dollars to RSA Security so the latter would make this generator their default option in their products.

21.4 PERFECT SECRECY

If you think about encryption scheme security for a while, you might come up with the following principle for defining security: "An encryption scheme is secure if it is not possible to recover the key $k$ from $E_k(x)$". However, a moment's thought shows that the key is not really what we're trying to protect. After all, the whole point of an encryption is to protect the confidentiality of the plaintext $x$. So, we can try to define that "an encryption scheme is secure if it is not possible to recover the plaintext $x$ from $E_k(x)$". Yet it is not clear what this means either. Suppose that an encryption scheme reveals the first 10 bits of the plaintext $x$. It might still not be possible to recover $x$ completely, but on an intuitive level, this seems like it would be extremely unwise to use such an encryption scheme in practice. Indeed, often even partial information about the plaintext is enough for the adversary to achieve its goals.

The above thinking led Shannon in 1945 to formalize the notion of perfect secrecy, which is that an encryption reveals absolutely nothing about the message. There are several equivalent ways to define it, but perhaps the cleanest one is the following:

Definition 21.3 — Perfect secrecy. A valid encryption scheme $(E, D)$ with plaintext length $L(\cdot)$ is perfectly secret if for every $n \in \mathbb{N}$ and plaintexts $x, x' \in \{0,1\}^{L(n)}$, the following two distributions $Y$ and $Y'$ over $\{0,1\}^*$ are identical:

โ€ข ๐‘Œ is obtained by sampling ๐‘˜ โˆผ 0, 1๐‘› and outputting ๐ธ๐‘˜(๐‘ฅ).โ€ข ๐‘Œ โ€ฒ is obtained by sampling ๐‘˜ โˆผ 0, 1๐‘› and outputting ๐ธ๐‘˜(๐‘ฅโ€ฒ).

P
This definition might take more than one reading to parse. Try to think of how this condition would correspond to your intuitive notion of "learning no information" about $x$ from observing $E_k(x)$, and to Shannon's quote in the beginning of this chapter.


Figure 21.7: For any key length $n$, we can visualize an encryption scheme $(E, D)$ as a graph with a vertex for every one of the $2^{L(n)}$ possible plaintexts and for every one of the ciphertexts in $\{0,1\}^*$ of the form $E_k(x)$ for $k \in \{0,1\}^n$ and $x \in \{0,1\}^{L(n)}$. For every plaintext $x$ and key $k$, we add an edge labeled $k$ between $x$ and $E_k(x)$. By the validity condition, if we pick any fixed key $k$, the map $x \mapsto E_k(x)$ must be one-to-one. The condition of perfect secrecy simply corresponds to requiring that every two plaintexts $x$ and $x'$ have exactly the same set of neighbors (or multi-set, if there are parallel edges).

In particular, suppose that you knew ahead of time that Alice sent either an encryption of $x$ or an encryption of $x'$. Would you learn anything new from observing the encryption of the message that Alice actually sent? It may help you to look at Fig. 21.7.
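For small parameters, Definition 21.3 can be checked by brute force: a scheme is perfectly secret if and only if every plaintext induces the same distribution over ciphertexts under a uniform key. The sketch below tests two toy schemes chosen for illustration (they are not from the text): the XOR scheme, which is perfectly secret, and the trivial "identity" scheme $E_k(x) = x$, which is not.

```python
from collections import Counter
from itertools import product

def ciphertext_dist(E, x, n):
    # Distribution (as a multiset) of E_k(x) over uniform k in {0,1}^n.
    return Counter(E(k, x) for k in product([0, 1], repeat=n))

def perfectly_secret(E, n):
    # Perfect secrecy: all plaintexts induce identical distributions.
    plaintexts = list(product([0, 1], repeat=n))
    dists = [ciphertext_dist(E, x, n) for x in plaintexts]
    return all(d == dists[0] for d in dists)

xor_scheme = lambda k, x: tuple(a ^ b for a, b in zip(k, x))
identity_scheme = lambda k, x: x  # ignores the key entirely

print(perfectly_secret(xor_scheme, 2))       # True
print(perfectly_secret(identity_scheme, 2))  # False
```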

21.4.1 Example: Perfect secrecy in the battlefield

To understand Definition 21.3, suppose that Alice sends only one of two possible messages: "attack" or "retreat", which we denote by $x_0$ and $x_1$ respectively, and that she sends each one of those messages with probability $1/2$. Let us put ourselves in the shoes of Eve, the eavesdropping adversary. A priori we would have guessed that Alice sent either $x_0$ or $x_1$ with probability $1/2$. Now we observe $y = E_k(x_i)$ where $k$ is a uniformly chosen key in $\{0,1\}^n$. How does this new information cause us to update our beliefs on whether Alice sent the plaintext $x_0$ or the plaintext $x_1$?

P
Before reading the next paragraph, you might want to try the analysis yourself. You may find it useful to look at the Wikipedia entry on Bayesian Inference or these MIT lecture notes.

Let us define $p_0(y)$ to be the probability (taken over $k \sim \{0,1\}^n$) that $y = E_k(x_0)$ and similarly $p_1(y)$ to be $\Pr_{k \sim \{0,1\}^n}[y = E_k(x_1)]$. Note that, since Alice chooses the message to send at random, our a priori probability for observing $y$ is $\frac{1}{2} p_0(y) + \frac{1}{2} p_1(y)$. However, as per Definition 21.3, the perfect secrecy condition guarantees that $p_0(y) = p_1(y)$! Let us denote the number $p_0(y) = p_1(y)$ by $p$. By the formula for conditional probability, the probability that Alice sent the message $x_0$ conditioned on our observation $y$ is simply

Pr[๐‘– = 0|๐‘ฆ = ๐ธ๐‘˜(๐‘ฅ๐‘–)] = Pr[๐‘– = 0 โˆง ๐‘ฆ = ๐ธ๐‘˜(๐‘ฅ๐‘–)]Pr[๐‘ฆ = ๐ธ๐‘˜(๐‘ฅ)] . (21.2)

(The equation (21.2) is a special case of Bayes' rule which, although a simple restatement of the formula for conditional probability, is an extremely important and widely used tool in statistics and data analysis.)

Since the probability that $i = 0$ and $y$ is the ciphertext $E_k(x_0)$ is equal to $\frac{1}{2} \cdot p_0(y)$, and the a priori probability of observing $y$ is $\frac{1}{2} p_0(y) + \frac{1}{2} p_1(y)$, we can rewrite (21.2) as

Pr[๐‘– = 0|๐‘ฆ = ๐ธ๐‘˜(๐‘ฅ๐‘–)] = 12 ๐‘0(๐‘ฆ)12 ๐‘0(๐‘ฆ) + 12 ๐‘1(๐‘ฆ) = ๐‘๐‘ + ๐‘ = 12 (21.3)


Figure 21.8: A perfectly secret encryption scheme for two-bit keys and messages. The blue vertices represent plaintexts and the red vertices represent ciphertexts; each edge mapping a plaintext $x$ to a ciphertext $y = E_k(x)$ is labeled with the corresponding key $k$. Since there are four possible keys, the degree of the graph is four and it is in fact a complete bipartite graph. The encryption scheme is valid in the sense that for every $k \in \{0,1\}^2$, the map $x \mapsto E_k(x)$ is one-to-one, which in other words means that the set of edges labeled with $k$ is a matching.

using the fact that $p_0(y) = p_1(y) = p$. This means that observing the ciphertext $y$ did not help us at all! We still would not be able to guess whether Alice sent "attack" or "retreat" with better than 50/50 odds!
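The calculation in (21.3) can be verified numerically. The sketch below uses the XOR scheme on two-bit strings as a concrete perfectly secret scheme (an illustrative choice; the particular plaintexts standing in for "attack" and "retreat" are arbitrary) and checks that the posterior probability of $x_0$ is exactly $1/2$ for every observed ciphertext:

```python
from itertools import product

n = 2
E = lambda k, x: tuple(a ^ b for a, b in zip(k, x))
keys = list(product([0, 1], repeat=n))

x0, x1 = (0, 0), (1, 1)  # stand-ins for "attack" and "retreat"
for y in product([0, 1], repeat=n):
    # p_i(y) = Pr over uniform k that E_k(x_i) = y
    p0 = sum(E(k, x0) == y for k in keys) / len(keys)
    p1 = sum(E(k, x1) == y for k in keys) / len(keys)
    posterior = (0.5 * p0) / (0.5 * p0 + 0.5 * p1)  # Bayes' rule (21.2)
    print(y, posterior)  # posterior is 0.5 for every y
```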

This example can be vastly generalized to show that perfect secrecy is indeed "perfect" in the sense that observing a ciphertext gives Eve no additional information about the plaintext beyond her a priori knowledge.

21.4.2 Constructing perfectly secret encryption

Perfect secrecy is an extremely strong condition, and implies that an eavesdropper does not learn any information from observing the ciphertext. You might think that an encryption scheme satisfying such a strong condition will be impossible, or at least extremely complicated, to achieve. However, it turns out we can in fact obtain a perfectly secret encryption scheme fairly easily. Such a scheme for two-bit messages is illustrated in Fig. 21.8.

In fact, this can be generalized to any number of bits:

Theorem 21.4 — One Time Pad (Vernam 1917, Shannon 1949). There is a perfectly secret valid encryption scheme $(E, D)$ with $L(n) = C(n) = n$.

Proof Idea:

Our scheme is the one-time pad, also known as the "Vernam cipher", see Fig. 21.9. The encryption is exceedingly simple: to encrypt a message $x \in \{0,1\}^n$ with a key $k \in \{0,1\}^n$ we simply output $x \oplus k$, where $\oplus$ is the bitwise XOR operation that outputs the string corresponding to XORing each coordinate of $x$ and $k$.

⋆

Proof of Theorem 21.4. For two binary strings $a$ and $b$ of the same length $n$, we define $a \oplus b$ to be the string $c \in \{0,1\}^n$ such that $c_i = a_i + b_i \mod 2$ for every $i \in [n]$. The encryption scheme $(E, D)$ is defined as follows: $E_k(x) = x \oplus k$ and $D_k(y) = y \oplus k$. By the associative law of addition (which works also modulo two), $D_k(E_k(x)) = (x \oplus k) \oplus k = x \oplus (k \oplus k) = x \oplus 0^n = x$, using the fact that for every bit $\sigma \in \{0,1\}$, $\sigma + \sigma \mod 2 = 0$ and $\sigma + 0 = \sigma \mod 2$. Hence $(E, D)$ form a valid encryption.

To analyze the perfect secrecy property, we claim that for every $x \in \{0,1\}^n$, the distribution $Y_x = E_k(x)$ where $k \sim \{0,1\}^n$ is simply the uniform distribution over $\{0,1\}^n$, and hence in particular the distributions $Y_x$ and $Y_{x'}$ are identical for every $x, x' \in \{0,1\}^n$. Indeed, for every particular $y \in \{0,1\}^n$, the value $y$ is output by $Y_x$ if and only if $y = x \oplus k$, which holds if and only if $k = x \oplus y$. Since $k$ is chosen uniformly at random in $\{0,1\}^n$, the probability that $k$ happens


Figure 21.9: In the one-time pad encryption scheme we encrypt a plaintext $x \in \{0,1\}^n$ with a key $k \in \{0,1\}^n$ by the ciphertext $x \oplus k$, where $\oplus$ denotes the bitwise XOR operation.

to equal $x \oplus y$ is exactly $2^{-n}$, which means that every string $y$ is output by $Y_x$ with probability $2^{-n}$.
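The one-time pad is as simple in code as in the proof. Below is a minimal sketch on byte strings (a byte-level rather than bit-level presentation, for convenience; the key is drawn uniformly at random as the theorem requires):

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Bitwise XOR of two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

def otp_encrypt(key: bytes, plaintext: bytes) -> bytes:
    assert len(key) == len(plaintext), "key must be as long as the message"
    return xor_bytes(key, plaintext)

otp_decrypt = otp_encrypt  # XOR is an involution: D_k(E_k(x)) = x

message = b"attack at dawn"
key = secrets.token_bytes(len(message))  # uniformly random, used once
ciphertext = otp_encrypt(key, message)
assert otp_decrypt(key, ciphertext) == message
```

The `secrets` module is used rather than `random` because, as discussed in Remark 21.2, cryptographic keys must come from a source suitable for secrets, not from a general-purpose pseudorandom number generator.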

P
The argument above is quite simple but is worth reading again. To understand why the one-time pad is perfectly secret, it is useful to envision it as a bipartite graph as we've done in Fig. 21.8. (In fact the encryption scheme of Fig. 21.8 is precisely the one-time pad for $n = 2$.) For every $n$, the one-time pad encryption scheme corresponds to a bipartite graph with $2^n$ vertices on the "left side" corresponding to the plaintexts in $\{0,1\}^n$ and $2^n$ vertices on the "right side" corresponding to the ciphertexts $\{0,1\}^n$. For every $x \in \{0,1\}^n$ and $k \in \{0,1\}^n$, we connect $x$ to the vertex $y = E_k(x)$ with an edge that we label with $k$. One can see that this is the complete bipartite graph, where every vertex on the left is connected to all vertices on the right. In particular this means that for every left vertex $x$, the distribution on the ciphertexts obtained by taking a random $k \in \{0,1\}^n$ and going to the neighbor of $x$ on the edge labeled $k$ is the uniform distribution over $\{0,1\}^n$. This ensures the perfect secrecy condition.

21.5 NECESSITY OF LONG KEYS

So, does Theorem 21.4 give the final word on cryptography, meaning that we can all communicate with perfect secrecy and live happily ever after? No, it doesn't. While the one-time pad is efficient and gives perfect secrecy, it has one glaring disadvantage: to communicate $n$ bits you need to store a key of length $n$. In contrast, practically used cryptosystems such as AES-128 have a short key of 128 bits (i.e., 16 bytes) that can be used to protect terabytes or more of communication! Imagine that we all needed to use the one-time pad. If that was the case, then if you had to communicate with $m$ people, you would have to maintain (securely!) $m$ huge files that are each as long as the length of the maximum total communication you expect with that person. Imagine that every time you opened an account with Amazon, Google, or any other service, they would need to send you in the mail (ideally with a secure courier) a DVD full of random numbers, and every time you suspected a virus, you'd need to ask all these services for a fresh DVD. This doesn't sound so appealing.

This is not just a theoretical issue. The Soviets used the one-time pad for their confidential communication since before the 1940s. In fact, even before Shannon's work, U.S. intelligence already


Figure 21.10: Gene Grabeel, who founded the U.S. Russian SigInt program on 1 Feb 1943. Photo taken in 1942, see Page 7 in the Venona historical study.

Figure 21.11: An encryption scheme where the number of keys is smaller than the number of plaintexts corresponds to a bipartite graph where the degree is smaller than the number of vertices on the left side. Together with the validity condition this implies that there will be two left vertices $x, x'$ with non-identical neighborhoods, and hence the scheme does not satisfy perfect secrecy.

knew in 1941 that the one-time pad is in principle "unbreakable" (see page 32 in the Venona document). However, it turned out that the hassle of manufacturing so many keys for all the communication took its toll on the Soviets, and they ended up reusing the same keys for more than one message. They did try to use them for completely different receivers in the (false) hope that this wouldn't be detected. The Venona Project of the U.S. Army was founded in February 1943 by Gene Grabeel (see Fig. 21.10), a former home economics teacher from Madison Heights, Virginia, and Lt. Leonard Zubko. In October 1943, they had their breakthrough when it was discovered that the Russians were reusing their keys. In the 37 years of its existence, the project resulted in a treasure chest of intelligence, exposing hundreds of KGB agents and Russian spies in the U.S. and other countries, including Julius Rosenberg, Harry Gold, Klaus Fuchs, Alger Hiss, Harry Dexter White and many others.

Unfortunately, it turns out that such long keys are necessary for perfect secrecy:

Theorem 21.5 — Perfect secrecy requires long keys. For every perfectly secret encryption scheme $(E, D)$ the length function $L$ satisfies $L(n) \leq n$.

Proof Idea:

The idea behind the proof is illustrated in Fig. 21.11. We define a graph between the plaintexts and ciphertexts, where we put an edge between plaintext $x$ and ciphertext $y$ if there is some key $k$ such that $y = E_k(x)$. The degree of this graph is at most the number of potential keys. The fact that the degree is smaller than the number of plaintexts (and hence of ciphertexts) implies that there would be two plaintexts $x$ and $x'$ with different sets of neighbors, and hence the distribution of a ciphertext corresponding to $x$ (with a random key) will not be identical to the distribution of a ciphertext corresponding to $x'$.

⋆

Proof of Theorem 21.5. Let $(E, D)$ be a valid encryption scheme with messages of length $L$ and key of length $n < L$. We will show that $(E, D)$ is not perfectly secret by providing two plaintexts $x_0, x_1 \in \{0,1\}^L$ such that the distributions $Y_{x_0}$ and $Y_{x_1}$ are not identical, where $Y_x$ is the distribution obtained by picking $k \sim \{0,1\}^n$ and outputting $E_k(x)$.

We choose $x_0 = 0^L$. Let $S_0 \subseteq \{0,1\}^*$ be the set of all ciphertexts that have nonzero probability of being output in $Y_{x_0}$. That is, $S_0 = \{ y \mid \exists_{k \in \{0,1\}^n} y = E_k(x_0) \}$. Since there are only $2^n$ keys, we know that $|S_0| \leq 2^n$.


We will show the following claim:

Claim I: There exists some $x_1 \in \{0,1\}^L$ and $k \in \{0,1\}^n$ such that $E_k(x_1) \notin S_0$.

Claim I implies that the string $E_k(x_1)$ has positive probability of

being output by $Y_{x_1}$ and zero probability of being output by $Y_{x_0}$, and hence in particular $Y_{x_0}$ and $Y_{x_1}$ are not identical. To prove Claim I, just choose a fixed $k \in \{0,1\}^n$. By the validity condition, the map $x \mapsto E_k(x)$ is a one-to-one map of $\{0,1\}^L$ to $\{0,1\}^*$ and hence in particular the image of this map, which is the set $I_k = \{ y \mid \exists_{x \in \{0,1\}^L} y = E_k(x) \}$, has size at least (in fact exactly) $2^L$. Since $|S_0| \leq 2^n < 2^L$, this means that $|I_k| > |S_0|$ and so in particular there exists some string $y$ in $I_k \setminus S_0$. But by the definition of $I_k$ this means that there is some $x \in \{0,1\}^L$ such that $E_k(x) \notin S_0$, which concludes the proof of Claim I and hence of Theorem 21.5.
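The proof is constructive, and for small parameters the counting argument can be run as an actual attack. The sketch below uses a hypothetical toy scheme with key length $n = 2$ and plaintext length $L = 3$ (the scheme, which XORs the plaintext with the key padded by a zero bit, is only a stand-in for any valid scheme with short keys):

```python
from itertools import product

n, L = 2, 3
# Toy scheme: pad the 2-bit key with a zero and XOR with the plaintext.
E = lambda k, x: tuple(a ^ b for a, b in zip(k + (0,), x))

keys = list(product([0, 1], repeat=n))
x0 = (0,) * L
S0 = {E(k, x0) for k in keys}  # all possible ciphertexts of x0; |S0| <= 2^n

k = keys[0]                    # any fixed key works, as in the proof
image = {E(k, x): x for x in product([0, 1], repeat=L)}  # 2^L ciphertexts
y = next(c for c in image if c not in S0)  # exists since 2^n < 2^L
x1 = image[y]
# y can be output when encrypting x1, but never when encrypting x0,
# so the ciphertext distributions of x0 and x1 differ.
print("distinguishing plaintext:", x1, "witness ciphertext:", y)
```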

21.6 COMPUTATIONAL SECRECY

To sum up the previous episodes, we now know that:

• It is possible to obtain a perfectly secret encryption scheme with key length the same as the plaintext.

and

• It is not possible to obtain such a scheme with a key that is even a single bit shorter than the plaintext.

How does this mesh with the fact that, as we've already seen, people routinely use cryptosystems with a 16 byte (i.e., 128 bit) key but many terabytes of plaintext? The proof of Theorem 21.5 does in fact give a way to break all these cryptosystems, but an examination of this proof shows that it only yields an algorithm with time exponential in the length of the key. This motivates the following relaxation of perfect secrecy to a condition known as "computational secrecy". Intuitively, an encryption scheme is computationally secret if no polynomial time algorithm can break it. The formal definition is below:

Definition 21.6 — Computational secrecy. Let $(E, D)$ be a valid encryption scheme where for keys of length $n$, the plaintexts are of length $L(n)$ and the ciphertexts are of length $m(n)$. We say that $(E, D)$ is computationally secret if for every polynomial $p : \mathbb{N} \to \mathbb{N}$, and large


enough $n$, if $P$ is an $m(n)$-input and single output NAND-CIRC program of at most $p(n)$ lines, and $x_0, x_1 \in \{0,1\}^{L(n)}$, then

โˆฃ ๐”ผ๐‘˜โˆผ0,1๐‘›[๐‘ƒ (๐ธ๐‘˜(๐‘ฅ0))] โˆ’ ๐”ผ๐‘˜โˆผ0,1๐‘›[๐‘ƒ (๐ธ๐‘˜(๐‘ฅ1))]โˆฃ < 1๐‘(๐‘›) (21.4)

P
Definition 21.6 requires a second or third read and some practice to truly understand. One excellent exercise to make sure you follow it is to see that if we allow $P$ to be an arbitrary function mapping $\{0,1\}^{m(n)}$ to $\{0,1\}$, and we replace the condition in (21.4) that the lefthand side is smaller than $\frac{1}{p(n)}$ with the condition that it is equal to $0$, then we get the perfect secrecy condition of Definition 21.3. Indeed, if the distributions $E_k(x_0)$ and $E_k(x_1)$ are identical then applying any function $P$ to them we get the same expectation. On the other hand, if the two distributions above give a different probability for some element $y^* \in \{0,1\}^{m(n)}$, then the function $P(y)$ that outputs $1$ iff $y = y^*$ will have a different expectation under the former distribution than under the latter.

Definition 21.6 raises two natural questions:

• Is it strong enough to ensure that a computationally secret encryption scheme protects the secrecy of messages that are encrypted with it?

• Is it weak enough that, unlike perfect secrecy, it is possible to obtain a computationally secret encryption scheme where the key is much smaller than the message?

To the best of our knowledge, the answer to both questions is Yes. This is just one example of a much broader phenomenon. We can use computational hardness to achieve many cryptographic goals, including some goals that have been dreamed about for millennia, and other goals that people have not even dared to imagine.

Big Idea 27 Computational hardness is necessary and sufficient for almost all cryptographic applications.

Regarding the first question, it is not hard to show that if, for example, Alice uses a computationally secret encryption algorithm to encrypt either "attack" or "retreat" (each chosen with probability


Figure 21.12: In a stream cipher or "derandomized one-time pad" we use a pseudorandom generator $G : \{0,1\}^n \to \{0,1\}^L$ to obtain an encryption scheme with a key length of $n$ and plaintexts of length $L$. We encrypt the plaintext $x \in \{0,1\}^L$ with key $k \in \{0,1\}^n$ by the ciphertext $x \oplus G(k)$.

1/2), then as long as she's restricted to polynomial-time algorithms, an adversary Eve will not be able to guess the message with probability better than, say, 0.51, even after observing its encrypted form. (We omit the proof, but it is an excellent exercise for you to work it out on your own.)

To answer the second question we will show that under the same assumption we used for derandomizing BPP, we can obtain a computationally secret cryptosystem where the key is almost exponentially smaller than the plaintext.

21.6.1 Stream ciphers or the "derandomized one-time pad"

It turns out that if pseudorandom generators exist as in the optimal PRG conjecture, then there exists a computationally secret encryption scheme with keys that are much shorter than the plaintext. The construction below is known as a stream cipher, though perhaps a better name is the "derandomized one-time pad". It is widely used in practice with keys on the order of a few tens or hundreds of bits protecting many terabytes or even petabytes of communication.

We start by recalling the notion of a pseudorandom generator, as defined in Definition 20.9. For this chapter, we will fix a special case of the definition:

Definition 21.7 — Cryptographic pseudorandom generator. Let L: ℕ → ℕ be some function. A cryptographic pseudorandom generator with stretch L(·) is a polynomial-time computable function G: {0,1}* → {0,1}* such that:

โ€ข For every ๐‘› โˆˆ โ„• and ๐‘  โˆˆ 0, 1๐‘›, |๐บ(๐‘ )| = ๐ฟ(๐‘›).โ€ข For every polynomial ๐‘ โˆถ โ„• โ†’ โ„• and ๐‘› large enough, if ๐ถ is a cir-

cuit of ๐ฟ(๐‘›) inputs, one output, and at most ๐‘(๐‘›) gates thenโˆฃ Pr๐‘ โˆผ0,1โ„“[๐ถ(๐บ(๐‘ )) = 1] โˆ’ Pr๐‘Ÿโˆผ0,1๐‘š[๐ถ(๐‘Ÿ) = 1]โˆฃ < 1๐‘(๐‘›) . (21.5)

In this chapter we will call a cryptographic pseudorandom generator simply a pseudorandom generator or PRG for short. The optimal PRG conjecture of Section 20.4.2 implies that there is a pseudorandom generator that can "fool" circuits of exponential size and where the gap in probabilities is at most one over an exponential quantity. Since exponentials grow faster than every polynomial, the optimal PRG conjecture implies the following:

The crypto PRG conjecture: For every a ∈ ℕ, there is a cryptographic pseudorandom generator with L(n) = n^a.


cryptography 593

The crypto PRG conjecture is a weaker conjecture than the optimal PRG conjecture, but it too (as we will see) is still stronger than the conjecture that P ≠ NP.

Theorem 21.8 — Derandomized one-time pad. Suppose that the crypto PRG conjecture is true. Then for every constant a ∈ ℕ there is a computationally secret encryption scheme (E, D) with plaintext length L(n) at least n^a.

Proof Idea:

The proof is illustrated in Fig. 21.12. We simply take the one-time pad on L bit plaintexts, but replace the key with G(k) where k is a string in {0,1}^n and G: {0,1}^n → {0,1}^L is a pseudorandom generator. Since the one-time pad cannot be broken, an adversary that breaks the derandomized one-time pad can be used to distinguish between the output of the pseudorandom generator and the uniform distribution.

Proof of Theorem 21.8. Let G: {0,1}^n → {0,1}^L for L = n^a be the restriction to input length n of the pseudorandom generator G whose existence we are guaranteed from the crypto PRG conjecture. We now define our encryption scheme as follows: given key k ∈ {0,1}^n and plaintext x ∈ {0,1}^L, the encryption E_k(x) is simply x ⊕ G(k). To decrypt a string y ∈ {0,1}^L we output y ⊕ G(k). This is a valid encryption since G is computable in polynomial time and (x ⊕ G(k)) ⊕ G(k) = x ⊕ (G(k) ⊕ G(k)) = x for every x ∈ {0,1}^L.
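The scheme just defined can be sketched in a few lines of code. The generator G below stretches the key using SHA-256 in counter mode; treating this particular G as a cryptographic PRG is an assumption made purely for illustration, since the theorem only requires some PRG as in Definition 21.7.

```python
# Sketch of the "derandomized one-time pad" from the proof of
# Theorem 21.8.  G stretches an n-byte key via SHA-256 in counter mode;
# its pseudorandomness is an ASSUMPTION made for this sketch only.
import hashlib

def G(key: bytes, L: int) -> bytes:
    """Stretch `key` to L pseudorandom bytes (stand-in PRG)."""
    out = b""
    counter = 0
    while len(out) < L:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:L]

def E(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt: XOR the plaintext with G(key)."""
    pad = G(key, len(plaintext))
    return bytes(x ^ g for x, g in zip(plaintext, pad))

D = E   # decryption is the same map: (x XOR G(k)) XOR G(k) = x

key = b"short 16-byte k."           # the key: just 16 bytes
msg = b"a plaintext much longer than the key" * 4
assert D(key, E(key, msg)) == msg   # validity: D_k(E_k(x)) = x
```

Note that the key stays fixed while the plaintext can be arbitrarily long, which is exactly the separation between key length and plaintext length that perfect secrecy forbids.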

Computational secrecy follows from the condition of a pseudorandom generator. Suppose, towards a contradiction, that there is a polynomial p, a NAND-CIRC program Q of at most p(L) lines, and x, x′ ∈ {0,1}^L(n) such that

| 𝔼_{k∼{0,1}^n}[Q(E_k(x))] − 𝔼_{k∼{0,1}^n}[Q(E_k(x′))] | > 1/p(L) .   (21.6)

(We use here the simple fact that for a {0,1}-valued random variable X, Pr[X = 1] = 𝔼[X].) By the definition of our encryption scheme, this means that

| 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x)] − 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x′)] | > 1/p(L) .   (21.7)

Now, as we saw in the security analysis of the one-time pad, for all strings x, x′ ∈ {0,1}^L, the distributions r ⊕ x and r ⊕ x′ are identical, where r ∼ {0,1}^L. Hence

𝔼_{r∼{0,1}^L}[Q(r ⊕ x)] = 𝔼_{r∼{0,1}^L}[Q(r ⊕ x′)] .   (21.8)


By plugging (21.8) into (21.7) we can derive that

| 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x)] − 𝔼_{r∼{0,1}^L}[Q(r ⊕ x)] + 𝔼_{r∼{0,1}^L}[Q(r ⊕ x′)] − 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x′)] | > 1/p(L) .   (21.9)

(Please make sure that you can see why this is true.) Now we can use the triangle inequality that |A + B| ≤ |A| + |B| for every two numbers A, B, applying it for A = 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x)] − 𝔼_{r∼{0,1}^L}[Q(r ⊕ x)] and B = 𝔼_{r∼{0,1}^L}[Q(r ⊕ x′)] − 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x′)] to derive

| 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x)] − 𝔼_{r∼{0,1}^L}[Q(r ⊕ x)] | + | 𝔼_{r∼{0,1}^L}[Q(r ⊕ x′)] − 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x′)] | > 1/p(L) .   (21.10)

In particular, either the first term or the second term of the lefthand side of (21.10) must be at least 1/(2p(L)). Let us assume the first case holds (the second case is analyzed in exactly the same way). Then we get that

| 𝔼_{k∼{0,1}^n}[Q(G(k) ⊕ x)] − 𝔼_{r∼{0,1}^L}[Q(r ⊕ x)] | > 1/(2p(L)) .   (21.11)

But if we now define the NAND-CIRC program P_x that on input r ∈ {0,1}^L outputs Q(r ⊕ x), then (since XOR of L bits can be computed in O(L) lines) we get that P_x has p(L) + O(L) lines, and by (21.11) it can distinguish between an input of the form G(k) and an input of the form r ∼ {0,1}^L with advantage better than 1/(2p(L)). Since a polynomial is dominated by an exponential, if we make L large enough, this will contradict the (2^{δn}, 2^{−δn}) security of the pseudorandom generator G.

Remark 21.9 — Stream ciphers in practice. The two most widely used forms of (private key) encryption schemes in practice are stream ciphers and block ciphers. (To make things more confusing, a block cipher is always used in some mode of operation and some of these modes effectively turn a block cipher into a stream cipher.) A block cipher can be thought of as a sort of a "random invertible map" from {0,1}^n to {0,1}^n, and can be used to construct a pseudorandom generator and from it a stream cipher, or to encrypt data directly using other modes of operations. There are a great many other security notions and considerations for encryption schemes beyond computational secrecy. Many of those involve handling scenarios such as chosen plaintext, man in the middle, and chosen ciphertext attacks, where the adversary is not merely a passive eavesdropper but can influence the communication in some way. While this chapter is


meant to give you some taste of the ideas behind cryptography, there is much more to know before applying it correctly to obtain secure applications, and a great many people have managed to get it wrong.

21.7 COMPUTATIONAL SECRECY AND NP

We've also mentioned before that an efficient algorithm for NP could be used to break all cryptography. We now give an example of how this can be done:

Theorem 21.10 — Breaking encryption using NP algorithm. If P = NP then there is no computationally secret encryption scheme with L(n) > n.

Furthermore, for every valid encryption scheme (E, D) with L(n) > n + 100 there is a polynomial p such that for every large enough n there exist x_0, x_1 ∈ {0,1}^L(n) and a p(n)-line NAND-CIRC program EVE s.t.

Pr๐‘–โˆผ0,1,๐‘˜โˆผ0,1๐‘›[EVE(๐ธ๐‘˜(๐‘ฅ๐‘–)) = ๐‘–] โ‰ฅ 0.99 . (21.12)

Note that the "furthermore" part is extremely strong. It means that if the plaintext is even a little bit larger than the key, then we can already break the scheme in a very strong way. That is, there will be a pair of messages x_0, x_1 (think of x_0 as "sell" and x_1 as "buy") and an efficient strategy for Eve such that if Eve gets a ciphertext y then she will be able to tell whether y is an encryption of x_0 or x_1 with probability very close to 1. (We model breaking the scheme as Eve outputting 0 or 1 corresponding to whether the message sent was x_0 or x_1. Note that we could have just as well modified Eve to output x_0 instead of 0 and x_1 instead of 1. The key point is that a priori Eve only had a 50/50 chance of guessing whether Alice sent x_0 or x_1, but after seeing the ciphertext this chance increases to better than 99%.) The condition P = NP can be relaxed to NP ⊆ BPP and even the weaker condition NP ⊆ P/poly with essentially the same proof.

Proof Idea:

The proof follows along the lines of Theorem 21.5 but this time paying attention to the computational aspects. If P = NP then for every plaintext x and ciphertext y, we can efficiently tell whether there exists k ∈ {0,1}^n such that E_k(x) = y. So, to prove this result we need to show that if the plaintexts are long enough, there would exist a pair x_0, x_1 such that the probability that a random encryption of x_1 is also a valid encryption of x_0 will be very small. The details of how to show this are below.


Proof of Theorem 21.10. We focus on showing only the "furthermore" part since it is the more interesting, and the other part follows by essentially the same proof.

Suppose that (๐ธ, ๐ท) is such an encryption, let ๐‘› be large enough,and let ๐‘ฅ0 = 0๐ฟ(๐‘›). For every ๐‘ฅ โˆˆ 0, 1๐ฟ(๐‘›) we define ๐‘†๐‘ฅ to be the setof all valid encryption of ๐‘ฅ. That is ๐‘†๐‘ฅ = ๐‘ฆ | โˆƒ๐‘˜โˆˆ0,1๐‘›๐‘ฆ = ๐ธ๐‘˜(๐‘ฅ). Asin the proof of Theorem 21.5, since there are 2๐‘› keys ๐‘˜, |๐‘†๐‘ฅ| โ‰ค 2๐‘› forevery ๐‘ฅ โˆˆ 0, 1๐ฟ(๐‘›).

We denote by ๐‘†0 the set ๐‘†๐‘ฅ0 . We define our algorithm EVE to out-put 0 on input ๐‘ฆ โˆˆ 0, 1โˆ— if ๐‘ฆ โˆˆ ๐‘†0 and to output 1 otherwise. Thiscan be implemented in polynomial time if P = NP, since the key ๐‘˜can serve the role of an efficiently verifiable solution. (Can you seewhy?) Clearly Pr[EVE(๐ธ๐‘˜(๐‘ฅ0)) = 0] = 1 and so in the case that EVEgets an encryption of ๐‘ฅ0 then she guesses correctly with probability1. The remainder of the proof is devoted to showing that there ex-ists ๐‘ฅ1 โˆˆ 0, 1๐ฟ(๐‘›) such that Pr[EVE(๐ธ๐‘˜(๐‘ฅ1)) = 0] โ‰ค 0.01, whichwill conclude the proof by showing that EVE guesses wrongly withprobability at most 12 0 + 12 0.01 < 0.01.

Consider now the following probabilistic experiment (which we define solely for the sake of analysis). We consider the sample space of choosing x uniformly in {0,1}^L(n) and define the random variable Z_k(x) to equal 1 if and only if E_k(x) ∈ S_0. For every k, the map x ↦ E_k(x) is one-to-one, which means that the probability that Z_k = 1 is equal to the probability that x ∈ E_k^{−1}(S_0), which is |S_0|/2^L(n). So by the linearity of expectation, 𝔼[∑_{k∈{0,1}^n} Z_k] ≤ 2^n |S_0| / 2^L(n) ≤ 2^{2n}/2^L(n).

We will now use the following extremely simple but useful fact known as the averaging principle (see also Lemma 18.10): for every random variable Z, if 𝔼[Z] = μ, then with positive probability Z ≤ μ. (Indeed, if Z > μ with probability one, then the expected value of Z would have to be larger than μ, just like you can't have a class in which all students got A or A- and yet the overall average is B+.) In our case it means that with positive probability ∑_{k∈{0,1}^n} Z_k ≤ 2^{2n}/2^L(n). In other words, there exists some x_1 ∈ {0,1}^L(n) such that ∑_{k∈{0,1}^n} Z_k(x_1) ≤ 2^{2n}/2^L(n). Yet this means that if we choose a random k ∼ {0,1}^n, then the probability that E_k(x_1) ∈ S_0 is at most (1/2^n) · 2^{2n}/2^L(n) = 2^{n−L(n)}. So, in particular, if we have an algorithm EVE that outputs 0 if x ∈ S_0 and outputs 1 otherwise, then Pr[EVE(E_k(x_0)) = 0] = 1 and Pr[EVE(E_k(x_1)) = 0] ≤ 2^{n−L(n)}, which will be smaller than 2^{−10} < 0.01 if L(n) ≥ n + 10.
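The counting argument above can be checked numerically on toy parameters. The map E below (with made-up parameters n = 8 key bits and L = 12 plaintext bits) is an arbitrary injective-per-key stand-in, not a real encryption scheme; the point is only to watch the bound 2^{n−L} emerge.

```python
# Toy numerical check of the counting argument in the proof of
# Theorem 21.10, with made-up parameters n = 8 (key bits) and
# L = 12 (plaintext bits).  E_k(x) = (x*(2k+1) + k) mod 2^L is an
# arbitrary toy map that is injective in x for each fixed key k.
n, L = 8, 12
K, M = 2 ** n, 2 ** L

def E(k, x):
    return (x * (2 * k + 1) + k) % M   # multiplying by an odd number
                                       # is a bijection mod 2^L

S0 = {E(k, 0) for k in range(K)}       # all valid encryptions of x0 = 0
assert len(S0) <= K                    # at most 2^n ciphertexts

# For each candidate x1, how often does a random-key encryption of x1
# land in S0 (i.e., fool EVE into answering "x0")?
probs = [sum(E(k, x1) in S0 for k in range(K)) / K for x1 in range(1, M)]
assert min(probs) <= K / M   # some x1 has Pr[E_k(x1) in S0] <= 2^(n-L)
```

Averaging over all x_1 gives exactly the |S_0|/2^L bound from the proof, and the averaging principle then guarantees some x_1 at or below the average.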


In retrospect Theorem 21.10 is perhaps not surprising. After all, as we've mentioned before, it is known that the optimal PRG conjecture (which is the basis for the derandomized one-time pad encryption) is false if P = NP (and in fact even if NP ⊆ BPP or even NP ⊆ P/poly).

21.8 PUBLIC KEY CRYPTOGRAPHY

People have been dreaming about heavier-than-air flight since at least the days of Leonardo Da Vinci (not to mention Icarus from Greek mythology). Jules Verne wrote with rather insightful details about going to the moon in 1865. But, as far as I know, in all the thousands of years people have been using secret writing, until about 50 years ago no one had considered the possibility of communicating securely without first exchanging a shared secret key.

Yet in the late 1960's and early 1970's, several people started to question this "common wisdom". Perhaps the most surprising of these visionaries was an undergraduate student at Berkeley named Ralph Merkle. In the fall of 1974 Merkle wrote in a project proposal for his computer security course that while "it might seem intuitively obvious that if two people have never had the opportunity to prearrange an encryption method, then they will be unable to communicate securely over an insecure channel… I believe it is false". The project proposal was rejected by his professor as "not good enough". Merkle later submitted a paper to the Communications of the ACM where he apologized for the lack of references, since he was unable to find any mention of the problem in the scientific literature, and the only source where he saw the problem even raised was in a science fiction story. The paper was rejected with the comment that "Experience shows that it is extremely dangerous to transmit key information in the clear." Merkle showed that one can design a protocol where Alice and Bob can use T invocations of a hash function to exchange a key, but an adversary (in the random oracle model, though he of course didn't use this name) would need roughly T^2 invocations to break it. He conjectured that it may be possible to obtain such protocols where breaking is exponentially harder than using them, but could not think of any concrete way of doing so.

We only found out much later that in the late 1960's, a few years before Merkle, James Ellis of the British intelligence agency GCHQ was having similar thoughts. His curiosity was spurred by an old World War II manuscript from Bell Labs that suggested the following way that two people could communicate securely over a phone line. Alice would inject noise to the line, Bob would relay his messages, and then Alice would subtract the noise to get the signal. The idea is that an adversary over the line sees only the sum of Alice's and Bob's signals, and doesn't know what came from what. This got James Ellis


thinking whether it would be possible to achieve something like that digitally. As Ellis later recollected, in 1970 he realized that in principle this should be possible, since he could think of a hypothetical black box B that on input a "handle" α and plaintext x would give a "ciphertext" y, and that there would be a secret key β corresponding to α, such that feeding β and y to the box would recover x. However, Ellis had no idea how to actually instantiate this box. He and others kept giving this question as a puzzle to bright new recruits until one of them, Clifford Cocks, came up in 1973 with a candidate solution loosely based on the factoring problem; in 1974 another GCHQ recruit, Malcolm Williamson, came up with a solution using modular exponentiation.

But among all those thinking of public key cryptography, probably the people who saw the furthest were two researchers at Stanford, Whit Diffie and Martin Hellman. They realized that with the advent of electronic communication, cryptography would find new applications beyond the military domain of spies and submarines, and they understood that in this new world of many users and point to point communication, cryptography would need to scale up. Diffie and Hellman envisioned an object which we now call a "trapdoor permutation", though they called it a "one-way trapdoor function" or sometimes simply "public key encryption". Though they didn't have full formal definitions, their idea was that this is an injective function that is easy (e.g., polynomial-time) to compute but hard (e.g., exponential-time) to invert. However, there is a certain trapdoor, knowledge of which would allow polynomial time inversion. Diffie and Hellman argued that using such a trapdoor function, it would be possible for Alice and Bob to communicate securely without ever having exchanged a secret key. But they didn't stop there. They realized that protecting the integrity of communication is no less important than protecting its secrecy. Thus they imagined that Alice could "run encryption in reverse" in order to certify or sign messages.

At that point, Diffie and Hellman were in a position not unlike physicists who predicted that a certain particle should exist but without any experimental verification. Luckily they met Ralph Merkle, and his ideas about a probabilistic key exchange protocol, together with a suggestion from their Stanford colleague John Gill, inspired them to come up with what today is known as the Diffie-Hellman Key Exchange (which unbeknownst to them was found two years earlier at GCHQ by Malcolm Williamson). They published their paper "New Directions in Cryptography" in 1976, and it is considered to have brought about the birth of modern cryptography.

The Diffie-Hellman Key Exchange is still widely used today for secure communication. However, it still fell short of providing Diffie


Figure 21.13: Top left: Ralph Merkle, Martin Hellman and Whit Diffie, who together came up in 1976 with the concept of public key encryption and a key exchange protocol. Bottom left: Adi Shamir, Ron Rivest, and Leonard Adleman who, following Diffie and Hellman's paper, discovered the RSA function that can be used for public key encryption and digital signatures. Interestingly, one can see the equation P = NP on the blackboard behind them. Right: John Gill, who was the first person to suggest to Diffie and Hellman that they use modular exponentiation as an easy-to-compute but hard-to-invert function.

Figure 21.14: In a public key encryption, Alice generates a private/public key pair (e, d), publishes e and keeps d secret. To encrypt a message for Alice, one only needs to know e. To decrypt it we need to know d.

and Hellman's elusive trapdoor function. This was done the next year by Rivest, Shamir and Adleman, who came up with the RSA trapdoor function, which through the framework of Diffie and Hellman yielded not just encryption but also signatures. (A close variant of the RSA function was discovered earlier by Clifford Cocks at GCHQ, though as far as I can tell Cocks, Ellis and Williamson did not realize the application to digital signatures.) From this point on began a flurry of advances in cryptography which hasn't died down till this day.

21.8.1 Defining public key encryption

A public key encryption consists of a triple of algorithms:

• The key generation algorithm, which we denote by KeyGen or KG for short, is a randomized algorithm that outputs a pair of strings (e, d) where e is known as the public (or encryption) key, and d is known as the private (or decryption) key. The key generation algorithm gets as input 1^n (i.e., a string of ones of length n). We refer to n as the security parameter of the scheme. The bigger we make n, the more secure the encryption will be, but also the less efficient it will be.

• The encryption algorithm, which we denote by E, takes the encryption key e and a plaintext x, and outputs the ciphertext y = E_e(x).

• The decryption algorithm, which we denote by D, takes the decryption key d and a ciphertext y, and outputs the plaintext x = D_d(y).

We now make this a formal definition:

Definition 21.11 — Public Key Encryption. A computationally secret public key encryption with plaintext length L: ℕ → ℕ is a triple of randomized polynomial-time algorithms (KG, E, D) that satisfy the following conditions:

โ€ข For every ๐‘›, if (๐‘’, ๐‘‘) is output by KG(1๐‘›) with positive proba-bility, and ๐‘ฅ โˆˆ 0, 1๐ฟ(๐‘›), then ๐ท๐‘‘(๐ธ๐‘’(๐‘ฅ)) = ๐‘ฅ with probabilityone.

• For every polynomial p, and sufficiently large n, if P is a NAND-CIRC program of at most p(n) lines, then for every x, x′ ∈ {0,1}^L(n), |𝔼[P(e, E_e(x))] − 𝔼[P(e, E_e(x′))]| < 1/p(n), where this probability is taken over the coins of KG and E.

Definition 21.11 allows E and D to be randomized algorithms. In fact, it turns out that it is necessary for E to be randomized to obtain computational secrecy. It also turns out that, unlike the private key case, we can transform a public-key encryption that works for messages that are only one bit long into a public-key encryption scheme


that can encrypt arbitrarily long messages, and in particular messages that are longer than the key. In particular this means that we cannot obtain a perfectly secret public-key encryption scheme even for one-bit long messages (since it would imply a perfectly secret public-key, and hence in particular private-key, encryption with messages longer than the key).

We will not give full constructions for public key encryption schemes in this chapter, but will mention some of the ideas that underlie the most widely used schemes today. These generally belong to one of two families:

• Group theoretic constructions based on problems such as integer factoring and the discrete logarithm over finite fields or elliptic curves.

• Lattice/coding based constructions based on problems such as the closest vector in a lattice or bounded distance decoding.

Group-theory based encryptions such as the RSA cryptosystem, the Diffie-Hellman protocol, and Elliptic-Curve Cryptography are currently more widely implemented. But the lattice/coding schemes are recently on the rise, particularly because the known group theoretic encryption schemes can be broken by quantum computers, which we discuss in Chapter 23.

21.8.2 Diffie-Hellman key exchange

As just one example of how public key encryption schemes are constructed, let us now describe the Diffie-Hellman key exchange. We describe the Diffie-Hellman protocol at a somewhat informal level, without presenting a full security analysis.

The computational problem underlying the Diffie-Hellman protocol is the discrete logarithm problem. Let's suppose that g is some integer. We can compute the map x ↦ g^x and also its inverse y ↦ log_g y. (For example, we can compute a logarithm by binary search: start with some interval [x_min, x_max] that is guaranteed to contain log_g y. We can then test whether the interval's midpoint x_mid satisfies g^{x_mid} > y, and based on that halve the size of the interval.)

However, suppose now that we use modular arithmetic and work modulo some prime number p. If p has n binary digits and g is in [p], then we can compute the map x ↦ g^x mod p in time polynomial in n. (This is not trivial, and it is a great exercise for you to work this out; as a hint, start by showing that one can compute the map k ↦ g^{2^k} mod p using k modular multiplications modulo p. If you're stumped, you can look up this Wikipedia entry.) On the other hand, because of the "wraparound" property of modular arithmetic, we cannot run binary search to find the inverse of this map (known as the discrete logarithm).
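The hint above can be sketched concretely. Repeated squaring computes g^x mod p using a number of modular multiplications proportional to the bit length of x (Python's built-in three-argument pow performs the same computation).

```python
# Repeated squaring: computes g^x mod p with O(len(x)) modular
# multiplications, illustrating why x -> g^x mod p is easy even though
# no polynomial-time algorithm is known for the inverse (discrete log).
def mod_exp(g, x, p):
    result = 1
    base = g % p
    while x > 0:
        if x & 1:                 # current bit of x is 1
            result = (result * base) % p
        base = (base * base) % p  # square: g, g^2, g^4, g^8, ...
        x >>= 1
    return result

p = 1000003                       # toy-sized modulus (real n is ~2048 bits)
assert mod_exp(5, 123456, p) == pow(5, 123456, p)
```

Note the contrast: the exponent 123456 is handled in about 17 squarings, while no analogous shortcut is known for recovering the exponent from g^x mod p.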


In fact, there is no known polynomial-time algorithm for computing this discrete logarithm map (g, x, p) ↦ log_g x mod p, where we define log_g x mod p as the number a ∈ [p] such that g^a = x mod p.

The Diffie-Hellman protocol for Bob to send a message to Alice is as follows:

• Alice: Chooses p to be a random n bit long prime (which can be done by choosing random numbers and running a primality testing algorithm on them), and g and a at random in [p]. She sends to Bob the triple (p, g, g^a mod p).

• Bob: Given the triple (p, g, h), Bob sends a message x ∈ {0,1}^L to Alice by choosing b at random in [p], and sending to Alice the pair (g^b mod p, rep(h^b mod p) ⊕ x) where rep: [p] → {0,1}* is some "representation function" that maps [p] to {0,1}^L. (The function rep does not need to be one-to-one and you can think of rep(z) as simply outputting L of the bits of z in the natural binary representation. It does need to satisfy certain technical conditions which we omit in this description.)

• Alice: Given (g′, z), Alice recovers x by outputting rep(g′^a mod p) ⊕ z.

The correctness of the protocol follows from the simple fact that (g^a)^b = (g^b)^a for every g, a, b, and this still holds if we work modulo p. Its security relies on the computational assumption that computing

this map is hard, even in a certain "average case" sense (this computational assumption is known as the Decisional Diffie-Hellman assumption). The Diffie-Hellman key exchange protocol can be thought of as a public key encryption where Alice's first message is the public key, and Bob's message is the encryption.
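Here is a toy run of the three steps above. The prime and the rep function below are stand-ins chosen only so the code is self-contained: the prime is far too small to be secure, and rep just truncates to L bits, ignoring the technical conditions mentioned above.

```python
# A toy run of the Diffie-Hellman protocol described above.  The prime
# p is far too small to be secure and rep() just truncates to L bits;
# both are stand-ins for illustration only.
import random

p = 104729                      # toy prime (real deployments use ~2048 bits)
L = 8                           # plaintext length in bits

def rep(z):
    return z % (2 ** L)         # naive representation function [p] -> {0,1}^L

# Alice: picks g and secret a, publishes (p, g, g^a mod p)
g = random.randrange(2, p)
a = random.randrange(1, p)
h = pow(g, a, p)

# Bob: encrypts x using Alice's public triple (p, g, h)
x = 0b10110011
b = random.randrange(1, p)
msg = (pow(g, b, p), rep(pow(h, b, p)) ^ x)

# Alice: recovers x from Bob's pair (g', z)
g_prime, z = msg
recovered = rep(pow(g_prime, a, p)) ^ x ^ x  # placeholder line removed below
recovered = rep(pow(g_prime, a, p)) ^ z
assert recovered == x           # correctness: (g^a)^b = (g^b)^a mod p
```

The assert at the end is exactly the correctness fact from the text: both parties compute rep(g^{ab} mod p), so the XOR mask cancels.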

One can think of the Diffie-Hellman protocol as being based on a "trapdoor pseudorandom generator": the triple g^a, g^b, g^{ab} looks "random" to someone that doesn't know a, but someone that does know a can see that raising the second element to the a-th power yields the third element. The Diffie-Hellman protocol can be described abstractly in the context of any finite Abelian group for which we can efficiently compute the group operation. It has been implemented on groups other than numbers modulo p, and in particular Elliptic Curve Cryptography (ECC) is obtained by basing the Diffie-Hellman protocol on elliptic curve groups, which gives some practical advantages. Another common group theoretic basis for key-exchange/public key encryption protocols is the RSA function. A big disadvantage of Diffie-Hellman (both the modular arithmetic and elliptic curve variants) and RSA is that both schemes can be broken in polynomial time by a


quantum computer. We will discuss quantum computing later in this course.

21.9 OTHER SECURITY NOTIONS

There is a great deal to cryptography beyond just encryption schemes, and beyond the notion of a passive adversary. A central objective is integrity or authentication: protecting communications from being modified by an adversary. Integrity is often more fundamental than secrecy: whether it is a software update or viewing the news, you might often not care about the communication being secret as much as that it indeed came from its claimed source. Digital signature schemes are the analog of public key encryption for authentication, and are widely used (in particular as the basis for public key certificates) to provide a foundation of trust in the digital world.

Similarly, even for encryption, we often need to ensure security against active attacks, and so notions such as non-malleability and adaptive chosen ciphertext security have been proposed. An encryption scheme is only as secure as the secret key, and mechanisms to make sure the key is generated properly, and is protected against refresh or even compromise (i.e., forward secrecy), have been studied as well. Hopefully this chapter provides you with some appreciation for cryptography as an intellectual field, but does not imbue you with a false self-confidence in implementing it.

Cryptographic hash functions are another widely used tool with a variety of uses, including extracting randomness from high entropy sources, achieving hard-to-forge short "digests" of files, protecting passwords, and much more.

21.10 MAGIC

Beyond encryption and signature schemes, cryptographers have managed to obtain objects that truly seem paradoxical and "magical". We briefly discuss some of these objects. We do not give any details, but hopefully this will spark your curiosity to find out more.

21.10.1 Zero knowledge proofs

On October 31, 1903, the mathematician Frank Nelson Cole gave an hour-long lecture to a meeting of the American Mathematical Society where he did not speak a single word. Rather, he calculated on the board the value 2^67 − 1, which is equal to 147,573,952,589,676,412,927, and then showed that this number is equal to 193,707,721 × 761,838,257,287. Cole's proof showed that 2^67 − 1 is not a prime, but it also revealed additional information, namely its actual factors. This is often the case with proofs: they teach us more than just the validity of the statements.
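Cole reportedly spent years of Sunday afternoons on this computation; today his blackboard arithmetic takes two lines to check:

```python
# Checking Cole's 1903 blackboard computation.
assert 2 ** 67 - 1 == 147_573_952_589_676_412_927
assert 147_573_952_589_676_412_927 == 193_707_721 * 761_838_257_287
```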


In Zero Knowledge Proofs we try to achieve the opposite effect. We want a proof for a statement X where we can rigorously show that the proof reveals absolutely no additional information about X beyond the fact that it is true. This turns out to be an extremely useful object for a variety of tasks including authentication, secure protocols, voting, anonymity in cryptocurrencies, and more. Constructing these objects relies on the theory of NP completeness. Thus this theory that originally was designed to give a negative result (show that some problems are hard) ended up yielding positive applications, enabling us to achieve tasks that were not possible otherwise.

21.10.2 Fully homomorphic encryption

Suppose that we are given a bit-by-bit encryption of a string E_k(x_0), …, E_k(x_{n−1}). By design, these ciphertexts are supposed to be "completely inscrutable" and we should not be able to extract any information about the x_i's from them. However, already in 1978, Rivest, Adleman and Dertouzos observed that this does not imply that we could not manipulate these encryptions. For example, it turns out the security of an encryption scheme does not immediately rule out the ability to take a pair of encryptions E_k(a) and E_k(b) and compute from them E_k(a NAND b) without knowing the secret key k. But do there exist encryption schemes that allow such manipulations? And if so, is this a bug or a feature?

Rivest et al. already showed that such encryption schemes could be immensely useful, and their utility has only grown in the age of cloud computing. After all, if we can compute NAND then we can use this to run any algorithm P on the encrypted data, and map E_k(x_0), …, E_k(x_{n−1}) to E_k(P(x_0, …, x_{n−1})). For example, a client could store their secret data x in encrypted form on the cloud, and have the cloud provider perform all sorts of computation on these data without ever revealing to the provider the private key, and so without the provider ever learning any information about the secret data.
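Since NAND is a universal gate, a single homomorphic NAND operation is enough to evaluate arbitrary circuits. The following toy sketch illustrates only the *interface* of this idea: the `Ciphertext` class, `encrypt`, `decrypt` and `eval_nand` are hypothetical stand-ins of our own making (this mock "scheme" hides nothing and is of course not secure); a real FHE scheme implements `eval_nand` without any access to the plaintext or key.

```python
# Toy sketch of the interface of homomorphic evaluation. NOT a secure or
# real FHE construction: it only shows how a single Eval-NAND capability
# lets a server run an arbitrary circuit on ciphertexts.

class Ciphertext:
    def __init__(self, hidden_bit):
        # In a real scheme the bit would be masked by noise and unreadable;
        # here it is merely tucked behind an attribute for illustration.
        self._hidden = hidden_bit

def encrypt(bit):
    return Ciphertext(bit)

def decrypt(ct):
    return ct._hidden

def eval_nand(c1, c2):
    # The assumed capability: computing E_k(a NAND b) from E_k(a), E_k(b)
    # without knowing the secret key.
    return Ciphertext(1 - (c1._hidden & c2._hidden))

def eval_circuit(gates, inputs):
    """gates: list of (i, j) wire pairs; each gate appends NAND(wire_i, wire_j)."""
    wires = list(inputs)
    for i, j in gates:
        wires.append(eval_nand(wires[i], wires[j]))
    return wires[-1]

# XOR built from four NANDs, evaluated entirely over ciphertexts:
xor_gates = [(0, 1), (0, 2), (1, 2), (3, 4)]
ct = eval_circuit(xor_gates, [encrypt(1), encrypt(0)])
print(decrypt(ct))  # 1, i.e. XOR(1, 0), computed without "decrypting"
```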

The question of the existence of such a scheme took much longer to resolve. Only in 2009 did Craig Gentry give the first construction of an encryption scheme that allows computing a universal basis of gates on the data (known as a Fully Homomorphic Encryption scheme in crypto parlance). Gentry's scheme left much to be desired in terms of efficiency, and improving upon it has been the focus of an intensive research program that has already seen significant improvements.

21.10.3 Multiparty secure computation

Cryptography is about enabling mutually distrusting parties to achieve a common goal. Perhaps the most general primitive achieving this objective is secure multiparty computation. The idea in secure multiparty computation is that n parties interact together to compute some function F(x_0, …, x_{n−1}) where x_i is the private input of the i-th party. The crucial point is that there is no commonly trusted party or authority, and that nothing is revealed about the secret data beyond the function's output. One example is an electronic voting protocol where only the total vote count is revealed, with the privacy of the individual voters protected, but without having to trust any authority to either count the votes correctly or to keep information confidential. Another example is implementing a second price (aka Vickrey) auction where n − 1 parties submit bids for an item owned by the n-th party, and the item goes to the highest bidder but at the price of the second highest bid. Using secure multiparty computation we can implement a second price auction in a way that ensures the secrecy of the numerical values of all bids (including even the top one) except the second highest one, and the secrecy of the identity of all bidders (including even the second highest bidder) except the top one. We emphasize that such a protocol requires no trust even in the auctioneer itself, which also does not learn any additional information. Secure multiparty computation can be used even for computing randomized processes, with one example being playing poker over the net without having to trust any server to correctly shuffle the cards or to keep information hidden.
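To make the auction example concrete, here is a sketch (with names of our own choosing) of the *ideal functionality* F that the parties jointly compute: the output a fully trusted auctioneer would produce from the private bids. A secure multiparty computation protocol emulates exactly this computation with no trusted party, revealing nothing beyond the output.

```python
# The "ideal functionality" F(x_0, ..., x_{n-1}) of a second-price
# (Vickrey) auction: what a trusted auctioneer would compute from the
# private bids. An MPC protocol produces exactly this output while
# revealing nothing else about the bids.

def vickrey_auction(bids):
    # bids[i] is the private input of party i; ties go to the lower index.
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = order[0]        # the winner's identity is part of the output...
    price = bids[order[1]]   # ...but they pay the second-highest bid
    return winner, price

print(vickrey_auction([30, 80, 55]))  # (1, 55): party 1 wins, pays 55
```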

Chapter Recap

• We can formally define the notion of security of an encryption scheme.

• Perfect secrecy ensures that an adversary does not learn anything about the plaintext from the ciphertext, regardless of their computational powers.

• The one-time pad is a perfectly secret encryption with the length of the key equaling the length of the message. No perfectly secret encryption can have a key shorter than the message.

• Computational secrecy can be as good as perfect secrecy since it ensures that the advantage that computationally bounded adversaries gain from observing the ciphertext is exponentially small. If the optimal PRG conjecture is true then there exists a computationally secret encryption scheme with messages that can be (almost) exponentially bigger than the key.

• There are many cryptographic tools that go well beyond private key encryption. These include public key encryption, digital signatures and hash functions, as well as more "magical" tools such as multiparty secure computation, fully homomorphic encryption, zero knowledge proofs, and many others.


21.11 EXERCISES

21.12 BIBLIOGRAPHICAL NOTES

Much of this text is taken from my lecture notes on cryptography. Shannon's manuscript was written in 1945 but was classified, and a partial version was only published in 1949. Still it has revolutionized cryptography, and is the forerunner to much of what followed.

The Venona project's history is described in this document. Aside from Grabeel and Zubko, credit for the discovery that the Soviets were reusing keys is shared by Lt. Richard Hallock, Carrie Berry, Frank Lewis, and Lt. Karl Elmquist, and there are others who made important contributions to this project. See pages 27 and 28 in the document.

In a 1955 letter to the NSA that only recently came to light, John Nash proposed an "unbreakable" encryption scheme. He wrote "I hope my handwriting, etc. do not give the impression I am just a crank or circle-squarer…. The significance of this conjecture [that certain encryption schemes are exponentially secure against key recovery attacks] … is that it is quite feasible to design ciphers that are effectively unbreakable." John Nash made seminal contributions in mathematics and game theory, and was awarded both the Abel Prize in mathematics and the Nobel Memorial Prize in Economic Sciences. However, he struggled with mental illness throughout his life. His biography, A Beautiful Mind, was made into a popular movie. It is natural to compare Nash's 1955 letter to the NSA to Gödel's letter to von Neumann we mentioned before. From the theoretical computer science point of view, the crucial difference is that while Nash informally talks about exponential vs. polynomial computation time, he does not mention the word "Turing machine" or other models of computation, and it is not clear whether or not he was aware that his conjecture can be made mathematically precise (assuming a formalization of "sufficiently complex types of enciphering").

The definition of computational secrecy we use is the notion of computational indistinguishability (known to be equivalent to semantic security) that was given by Goldwasser and Micali in 1982.

Although they used a different terminology, Diffie and Hellman already made clear in their paper that their protocol can be used as a public key encryption scheme, with the first message being put in a "public file". In 1985, ElGamal showed how to obtain a signature scheme based on the Diffie-Hellman ideas, and since he described the Diffie-Hellman encryption scheme in the same paper, the public key encryption scheme originally proposed by Diffie and Hellman is sometimes also known as ElGamal encryption.

My survey contains a discussion on the different types of public key assumptions. While the standard elliptic curve cryptographic schemes are as susceptible to quantum computers as Diffie-Hellman and RSA, their main advantage is that the best known classical algorithms for computing discrete logarithms over elliptic curve groups take time 2^{εn} for some ε > 0, where n is the number of bits needed to describe a group element. In contrast, for the multiplicative group modulo a prime p the best algorithms take time 2^{O(n^{1/3} polylog(n))}, which means that (assuming the known algorithms are optimal) we need to set the prime to be bigger (and so have larger key sizes, with corresponding overhead in communication and computation) to get the same level of security.

Zero-knowledge proofs were constructed by Goldwasser, Micali, and Rackoff in 1982, and their wide applicability was shown (using the theory of NP completeness) by Goldreich, Micali, and Wigderson in 1986.

Two party and multiparty secure computation protocols were constructed (respectively) by Yao in 1982 and Goldreich, Micali, and Wigderson in 1987. The latter work gave a general transformation from security against passive adversaries to security against active adversaries using zero knowledge proofs.


22 Proofs and algorithms

"Let's not try to define knowledge, but try to define zero-knowledge.", Shafi Goldwasser.

Proofs have captured human imagination for thousands of years, ever since the publication of Euclid's Elements, a book second only to the Bible in the number of editions printed.

Plan:

โ€ข Proofs and algorithms

โ€ข Interactive proofs

โ€ข Zero knowledge proofs

โ€ข Propositions as types, Coq and other proof assistants.

22.1 EXERCISES

22.2 BIBLIOGRAPHICAL NOTES


23 Quantum computing

"We always have had (secret, secret, close the doors!) … a great deal of difficulty in understanding the world view that quantum mechanics represents … It has not yet become obvious to me that there's no real problem. … Can I learn anything from asking this question about computers–about this may or may not be mystery as to what the world view of quantum mechanics is?", Richard Feynman, 1981

"The only difference between a probabilistic classical world and the equations of the quantum world is that somehow or other it appears as if the probabilities would have to go negative", Richard Feynman, 1981

There were two schools of natural philosophy in ancient Greece. Aristotle believed that objects have an essence that explains their behavior, and that a theory of the natural world has to refer to the reasons (or "final cause" to use Aristotle's language) as to why they exhibit certain phenomena. Democritus believed in a purely mechanistic explanation of the world. In his view, the universe was ultimately composed of elementary particles (or atoms) and our observed phenomena arise from the interactions between these particles according to some local rules. Modern science (arguably starting with Newton) has embraced Democritus' point of view of a mechanistic or "clockwork" universe of particles and forces acting upon them.

While the classification of particles and forces evolved with time, to a large extent the "big picture" has not changed from Newton till Einstein. In particular it was held as an axiom that if we knew fully the current state of the universe (i.e., the particles and their properties such as location and velocity) then we could predict its future state at any point in time. In computational language, in all these theories the state of a system with n particles could be stored in an array of O(n) numbers, and predicting the evolution of the system can be done by running some efficient (e.g., poly(n) time) deterministic computation on this array.


Learning Objectives:
• See the main aspects in which quantum mechanics differs from local deterministic theories.
• The model of quantum circuits, or equivalently QNAND-CIRC programs.
• The complexity class BQP and what we know about its relation to other classes.
• Ideas behind Shor's Algorithm and the Quantum Fourier Transform.


Figure 23.1: In the "double baseball experiment" we shoot baseballs from a gun at a soft wall through a hard barrier that has one or two slits open in it. There is only "constructive interference" in the sense that the dent in each position in the wall when both slits are open is the sum of the dents when each slit is open on its own.

Figure 23.2: The setup of the double slit experiment in the case of photon or electron guns. We also see destructive interference in the sense that there are some positions on the wall that get fewer hits when both slits are open than they get when only one of the slits is open. See also this video.

23.1 THE DOUBLE SLIT EXPERIMENT

Alas, at the beginning of the 20th century, several experimental results were calling into question this "clockwork" or "billiard ball" theory of the world. One such experiment is the famous double slit experiment. Here is one way to describe it. Suppose that we buy one of those baseball pitching machines, and aim it at a soft plastic wall, but put a metal barrier with a single slit between the machine and the plastic wall (see Fig. 23.1). If we shoot baseballs at the plastic wall, then some of the baseballs would bounce off the metal barrier, while some would make it through the slit and dent the wall. If we now carve out an additional slit in the metal barrier then more balls would get through, and so the plastic wall would be even more dented.

So far this is pure common sense, and it is indeed (to my knowledge) an accurate description of what happens when we shoot baseballs at a plastic wall. However, this is not the same when we shoot photons. Amazingly, if we shoot with a "photon gun" (i.e., a laser) at a wall equipped with photon detectors through some barrier, then (as shown in Fig. 23.2) in some positions of the wall we will see fewer hits when the two slits are open than when only one of them is! In particular there are positions in the wall that are hit when the first slit is open, hit when the second slit is open, but are not hit at all when both slits are open!

It seems as if each photon coming out of the gun is aware of the global setup of the experiment, and behaves differently if two slits are open than if only one is. If we try to "catch the photon in the act" and place a detector right next to each slit so we can see exactly the path each photon takes, then something even more bizarre happens. The mere fact that we measure the path changes the photon's behavior: now this "destructive interference" pattern is gone, and the number of times a position is hit when two slits are open is the sum of the number of times it is hit when each slit is open.

P: You should read the paragraphs above more than once and make sure you appreciate how truly mind boggling these results are.

23.2 QUANTUM AMPLITUDES

The double slit and other experiments ultimately forced scientists to accept a very counterintuitive picture of the world. It is not merely about nature being randomized, but rather it is about the probabilities in some sense "going negative" and cancelling each other!


To see what we mean by this, let us go back to the baseball experiment. Suppose that the probability a ball passes through the left slit is p_L and the probability that it passes through the right slit is p_R. Then, if we shoot N balls out of each gun, we expect the wall will be hit (p_L + p_R)N times. In contrast, in the quantum world of photons instead of baseballs, it can sometimes be the case that in both the first and second case the wall is hit with positive probabilities p_L and p_R respectively, but somehow when both slits are open the wall (or a particular position in it) is not hit at all. It's almost as if the probabilities can "cancel each other out".

To understand the way we model this in quantum mechanics, it is helpful to think of a "lazy evaluation" approach to probability. We can think of a probabilistic experiment such as shooting a baseball through two slits in two different ways:

• When a ball is shot, "nature" tosses a coin and decides if it will go through the left slit (which happens with probability p_L), the right slit (which happens with probability p_R), or bounce back. If it passes through one of the slits then it will hit the wall. Later we can look at the wall and find out whether or not this event happened, but the fact that the event happened or not is determined independently of whether or not we look at the wall.

• The other viewpoint is that when a ball is shot, "nature" computes the probabilities p_L and p_R as before, but does not yet "toss the coin" and determine what happened. Only when we actually look at the wall does nature toss a coin, and with probability p_L + p_R ensure that we see a dent. That is, nature uses "lazy evaluation", and only determines the result of a probabilistic experiment when we decide to measure it.

While the first scenario seems much more natural, the end result in both is the same (the wall is hit with probability p_L + p_R), and so the question of whether we should model nature as following the first scenario or the second one seems like asking about the proverbial tree that falls in the forest with no one hearing it.

However, when we want to describe the double slit experiment with photons rather than baseballs, it is the second scenario that lends itself better to a quantum generalization. Quantum mechanics associates a number α known as an amplitude with each probabilistic experiment. This number α can be negative, and in fact even complex. We never observe the amplitudes directly, since whenever we measure an event with amplitude α, nature tosses a coin and determines that the event happens with probability |α|². However, the sign (or in the complex case, phase) of the amplitudes can affect whether two different events have constructive or destructive interference.


Specifically, consider an event that can either occur or not (e.g. "detector number 17 was hit by a photon"). In classical probability, we model this by a probability distribution over the two outcomes: a pair of non-negative numbers p and q such that p + q = 1, where p corresponds to the probability that the event occurs and q corresponds to the probability that the event does not occur. In quantum mechanics, we model this also by a pair of numbers, which we call amplitudes. This is a pair of (potentially negative or even complex) numbers α and β such that |α|² + |β|² = 1. The probability that the event occurs is |α|² and the probability that it does not occur is |β|². In isolation, these negative or complex numbers don't matter much, since we anyway square them to obtain probabilities. But the interaction of positive and negative amplitudes can result in surprising cancellations where somehow combining two scenarios where an event happens with positive probability results in a scenario where it never does.

P: If you don't find the above description confusing and unintuitive, you probably didn't get it. Please make sure to re-read the above paragraphs until you are thoroughly confused.

Quantum mechanics is a mathematical theory that allows us to calculate and predict the results of the double-slit and many other experiments. If you think of quantum mechanics as an explanation as to what "really" goes on in the world, it can be rather confusing. However, if you simply "shut up and calculate" then it works amazingly well at predicting experimental results. In particular, in the double slit experiment, for any position in the wall, we can compute numbers α and β such that photons from the first and second slit hit that position with probabilities |α|² and |β|² respectively. When we open both slits, the probability that the position will be hit is proportional to |α + β|², and so in particular, if α = −β then it will be the case that, despite being hit when either slit one or slit two is open, the position is not hit at all when both are. If you are confused by quantum mechanics, you are not alone: for decades people have been trying to come up with explanations for "the underlying reality" behind quantum mechanics, including Bohmian Mechanics, Many Worlds and others. However, none of these interpretations have gained universal acceptance, and all of them (by design) yield the same experimental predictions. Thus at this point many scientists prefer to just ignore the question of what is the "true reality" and go back to simply "shutting up and calculating".
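We can redo this calculation numerically. In the sketch below the amplitude values 0.6 and −0.6 are made up for illustration; the point is only that |α|² and |β|² are both positive while |α + β|² vanishes.

```python
# Illustrative amplitudes for "the photon reaches this wall position via
# slit 1 / slit 2". The values 0.6 and -0.6 are made-up example numbers.
alpha, beta = 0.6, -0.6

p_slit1_only = abs(alpha) ** 2              # 0.36: hit probability, slit 1 open
p_slit2_only = abs(beta) ** 2               # 0.36: hit probability, slit 2 open
p_baseballs = p_slit1_only + p_slit2_only   # 0.72: what baseballs would do
p_photons = abs(alpha + beta) ** 2          # 0.0: amplitudes cancel, never hit!

print(p_slit1_only, p_slit2_only, p_baseballs, p_photons)
```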


Remark 23.1 — Complex vs real, other simplifications. If (like the author) you are a bit intimidated by complex numbers, don't worry: you can think of all amplitudes as real (though potentially negative) numbers without loss of understanding. All the "magic" of quantum computing already arises in this case, and so we will often restrict attention to real amplitudes in this chapter.

We will also only discuss so-called pure quantum states, and not the more general notion of mixed states. Pure states turn out to be sufficient for understanding the algorithmic aspects of quantum computing.

More generally, this chapter is not meant to be a complete description of quantum mechanics, quantum information theory, or quantum computing, but rather to illustrate the main points where these differ from classical computing.

23.2.1 Linear algebra quick review

Linear algebra underlies much of quantum mechanics, and so you would do well to review some of the basic notions such as vectors, matrices, and linear subspaces. The operations in quantum mechanics can be represented as linear functions over the complex numbers, but we stick to the real numbers in this chapter. This does not cause much loss in understanding but does allow us to simplify our notation and eliminate the use of the complex conjugate.

The main notions we use are:

• A function F : ℝᴺ → ℝᴺ is linear if F(αu + βv) = αF(u) + βF(v) for every α, β ∈ ℝ and u, v ∈ ℝᴺ.

• The inner product of two vectors u, v ∈ ℝᴺ can be defined as ⟨u, v⟩ = ∑_{i∈[N]} u_i v_i. (There can be different inner products but we stick to this one.) The norm of a vector u ∈ ℝᴺ is defined as ‖u‖ = √⟨u, u⟩ = √(∑_{i∈[N]} u_i²). We say that u is a unit vector if ‖u‖ = 1.

• Two vectors u, v ∈ ℝᴺ are orthogonal if ⟨u, v⟩ = 0. An orthonormal basis for ℝᴺ is a set of N vectors v_0, v_1, …, v_{N−1} such that ‖v_i‖ = 1 for every i ∈ [N] and ⟨v_i, v_j⟩ = 0 for every i ≠ j. A canonical example is the standard basis e_0, …, e_{N−1}, where e_i is the vector that has zeroes in all coordinates except the i-th coordinate, in which its value is 1. A quirk of the quantum mechanics literature is that e_i is often denoted by |i⟩. We often consider the case N = 2ⁿ, in which case we identify [N] with {0,1}ⁿ, and for every x ∈ {0,1}ⁿ we denote the standard basis element corresponding to the x-th coordinate by |x⟩.


1 If you are extremely paranoid about Alice and Bob communicating with one another, you can coordinate with your assistant to perform the experiment exactly at the same time, and make sure that the rooms are sufficiently far apart (e.g., are on two different continents, or maybe even one is on the moon and another is on earth) so that Alice and Bob couldn't communicate to each other the results of their respective coins in time even if they did so at the speed of light.

โ€ข If ๐‘ข is a vector in โ„๐‘› and ๐‘ฃ0, โ€ฆ , ๐‘ฃ๐‘โˆ’1 is an orthonormal basis forโ„๐‘ , then there are coefficients ๐›ผ0, โ€ฆ , ๐›ผ๐‘โˆ’1 such that ๐‘ข = ๐›ผ0๐‘ฃ0 +โ‹ฏ + ๐›ผ๐‘โˆ’1๐‘ฃ๐‘โˆ’1. Consequently, the value ๐น(๐‘ข) is determined by thevalues ๐น(๐‘ฃ0), โ€ฆ, ๐น(๐‘ฃ๐‘โˆ’1). Moreover, โ€–๐‘ขโ€– = โˆšโˆ‘๐‘–โˆˆ[๐‘] ๐›ผ2๐‘– .

• We can represent a linear function F : ℝᴺ → ℝᴺ as an N × N matrix M(F) where the coordinate in the i-th row and j-th column of M(F) (that is, M(F)_{i,j}) is equal to ⟨e_i, F(e_j)⟩, or equivalently the i-th coordinate of F(e_j).

• A linear function F : ℝᴺ → ℝᴺ such that ‖F(u)‖ = ‖u‖ for every u is called unitary. It can be shown that F is unitary if and only if M(F)M(F)ᵀ = I, where ᵀ is the transpose operator (in the complex case, the conjugate transpose) and I is the N × N identity matrix that has 1's on the diagonal and zeroes everywhere else. (For every two matrices A, B, we use AB to denote the matrix product of A and B.) Another equivalent characterization of this condition is that M(F)ᵀ = M(F)⁻¹, and yet another is that both the rows and the columns of M(F) form an orthonormal basis.
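The equivalent characterizations above are easy to sanity-check numerically. The sketch below uses a 2 × 2 rotation matrix as an example of a real unitary map (our own choice of example, not one from the text):

```python
import numpy as np

# Sanity-check the equivalent characterizations of a (real) unitary map,
# using a 2x2 rotation matrix as the example.
theta = 0.3
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(M @ M.T, np.eye(2))       # M M^T = I
assert np.allclose(M.T, np.linalg.inv(M))    # M^T = M^{-1}
assert np.allclose(M.T @ M, np.eye(2))       # columns are orthonormal too

u = np.array([0.8, -0.6])                    # a unit vector: 0.64 + 0.36 = 1
assert np.isclose(np.linalg.norm(M @ u), np.linalg.norm(u))  # norm preserved
print("all unitarity checks passed")
```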

23.3 BELLโ€™S INEQUALITY

There is something weird about quantum mechanics. In 1935 Einstein, Podolsky and Rosen (EPR) tried to pinpoint this issue by highlighting a previously unrealized corollary of the theory. They showed that the idea that nature does not determine the results of an experiment until it is measured results in so-called "spooky action at a distance". Namely, making a measurement of one object may instantaneously affect the state (i.e., the vector of amplitudes) of another object at the other end of the universe.

Since the vector of amplitudes is just a mathematical abstraction, the EPR paper was considered to be merely a thought experiment for philosophers to be concerned about, without bearing on experiments. This changed when in 1964 John Bell showed an actual experiment to test the predictions of EPR and hence pit intuitive common sense against quantum mechanics. Quantum mechanics won: it turns out that it is in fact possible to use measurements to create correlations between the states of objects far removed from one another that cannot be explained by any prior theory. Nonetheless, since the results of these experiments are so obviously wrong to anyone who has ever sat in an armchair, there are still a number of Bell denialists arguing that this can't be true and quantum mechanics is wrong.

So, what is this Bell's Inequality? Suppose that Alice and Bob try to convince you they have telepathic abilities, and they aim to prove it via the following experiment. Alice and Bob will be in separate closed rooms.¹ You will interrogate Alice and your associate will interrogate Bob. You choose a random bit x ∈ {0,1} and your associate chooses a random y ∈ {0,1}. We let a be Alice's response and b be Bob's response. We say that Alice and Bob win this experiment if a ⊕ b = x ∧ y. In other words, Alice and Bob need to output two bits that disagree if x = y = 1 and agree otherwise.

Now if Alice and Bob are not telepathic, then they need to agree in advance on some strategy. It's not hard for Alice and Bob to succeed with probability 3/4: just always output the same bit. Moreover, by doing some case analysis, we can show that no matter what strategy they use, Alice and Bob cannot succeed with higher probability than that:

Theorem 23.2 — Bell's Inequality. For every two functions f, g : {0,1} → {0,1}, Pr_{x,y∈{0,1}}[f(x) ⊕ g(y) = x ∧ y] ≤ 3/4.

Proof. Since the probability is taken over all four choices of x, y ∈ {0,1}, the only way the theorem can be violated is if there exist two functions f, g that satisfy

f(x) ⊕ g(y) = x ∧ y   (23.1)

for all four choices of (x, y) ∈ {0,1}². Let's plug in all these four choices and see what we get (below we use the equalities z ⊕ 0 = z, z ∧ 0 = 0 and z ∧ 1 = z):

f(0) ⊕ g(0) = 0   (plugging in x = 0, y = 0)
f(0) ⊕ g(1) = 0   (plugging in x = 0, y = 1)
f(1) ⊕ g(0) = 0   (plugging in x = 1, y = 0)
f(1) ⊕ g(1) = 1   (plugging in x = 1, y = 1)   (23.2)

If we XOR together the first and second equalities we get g(0) ⊕ g(1) = 0, while if we XOR together the third and fourth equalities we get g(0) ⊕ g(1) = 1, thus obtaining a contradiction.
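Since there are only 2² = 4 functions from {0,1} to {0,1}, we can also verify Theorem 23.2 by brute force, checking all 16 pairs of deterministic strategies:

```python
from itertools import product

# Brute-force verification of Theorem 23.2: over all 4 x 4 = 16 pairs of
# deterministic strategies f, g : {0,1} -> {0,1}, the probability (over
# uniform x, y) that f(x) XOR g(y) = x AND y never exceeds 3/4.
best = 0.0
for f in product([0, 1], repeat=2):       # f[x] encodes the value f(x)
    for g in product([0, 1], repeat=2):   # g[y] encodes the value g(y)
        wins = sum(1 for x in (0, 1) for y in (0, 1) if f[x] ^ g[y] == x & y)
        best = max(best, wins / 4)

print(best)  # 0.75: no deterministic strategy beats 3/4
```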

Remark 23.3 — Randomized strategies. Theorem 23.2 above assumes that Alice and Bob use deterministic strategies f and g respectively. More generally, Alice and Bob could use a randomized strategy, or equivalently, each could choose f and g from some distributions ℱ and 𝒢 respectively. However, the averaging principle (Lemma 18.10) implies that if all possible deterministic strategies succeed with probability at most 3/4, then the same is true for all randomized strategies.


2 More accurately, one either has to give up on a "billiard ball type" theory of the universe or believe in telepathy (believe it or not, some scientists went for the latter option).

An amazing, experimentally verified fact is that quantum mechanics allows for "telepathy".² Specifically, it has been shown that using the weirdness of quantum mechanics, there is in fact a strategy for Alice and Bob to succeed in this game with probability larger than 3/4 (in fact, they can succeed with probability about 0.85, see Lemma 23.5).

23.4 QUANTUM WEIRDNESS

Some of the counterintuitive properties that arise from quantum mechanics include:

• Interference - As we've seen, quantum amplitudes can "cancel each other out".

• Measurement - The idea that amplitudes are negative as long as "no one is looking" and "collapse" (by squaring them) to positive probabilities when they are measured is deeply disturbing. Indeed, as shown by EPR and Bell, this leads to various strange outcomes such as "spooky actions at a distance", where we can create correlations between the results of measurements in places far removed. Unfortunately (or fortunately?) these strange outcomes have been confirmed experimentally.

• Entanglement - The notion that two parts of the system could be connected in this weird way where measuring one will affect the other is known as quantum entanglement.

As counter-intuitive as these concepts are, they have been experimentally confirmed, so we just have to live with them.

The discussion in this chapter of quantum mechanics in general, and quantum computing in particular, is quite brief and superficial; the "bibliographical notes" section (Section 23.13) contains references and links to many other resources that cover this material in more depth.

23.5 QUANTUM COMPUTING AND COMPUTATION - AN EXECUTIVE SUMMARY

One of the strange aspects of the quantum-mechanical picture of the world is that, unlike in the billiard ball example, there is no obvious algorithm to simulate the evolution of n particles over t time periods in poly(n, t) steps. In fact, the natural way to simulate n quantum particles will require a number of steps that is exponential in n. This is a huge headache for scientists who actually need to do these calculations in practice.
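A minimal state-vector simulation makes this exponential cost concrete: an n-particle quantum state is a unit vector of 2ⁿ amplitudes, and even a single elementary operation touches all of them. (The `apply_gate` helper below is an illustrative sketch of our own, not code from the book.)

```python
import numpy as np

# Minimal state-vector simulation: an n-qubit state is a unit vector of
# 2**n amplitudes, so even one elementary gate touches exponentially
# many numbers.

def apply_gate(state, gate, target, n):
    """Apply a 2x2 matrix `gate` to qubit `target` of an n-qubit state."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    return np.moveaxis(psi, 0, target).reshape(-1)

n = 3
state = np.zeros(2 ** n)
state[0] = 1.0                                # the |00...0> basis state
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate (real, unitary)
for q in range(n):                            # n gates, each costing O(2**n)
    state = apply_gate(state, H, q, n)

print(state)  # all 2**3 = 8 amplitudes equal 1/sqrt(8)
```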

In 1981, the physicist Richard Feynman proposed to "turn this lemon into lemonade" by making the following almost tautological observation:


3 As its title suggests, Feynman's lecture was actually focused on the other side of simulating physics with a computer. However, he mentioned that as a "side remark" one could wonder if it's possible to simulate physics with a new kind of computer - a "quantum computer" which would "not [be] a Turing machine, but a machine of a different kind". As far as I know, Feynman did not suggest that such a computer could be useful for computations completely outside the domain of quantum simulation. Indeed, he was more interested in the question of whether quantum mechanics could be simulated by a classical computer.

4 This โ€œ95 percentโ€ is a figure of speech, but not com-pletely so. At the time of this writing, cryptocurrencymining electricity consumption is estimated to useup at least 70Twh or 0.3 percent of the worldโ€™s pro-duction, which is about 2 to 5 percent of the totalenergy usage for the computing industry. All thecurrent cryptocurrencies will be broken by quantumcomputers. Also, for many web servers the TLS pro-tocol (which is based on the current non-lattice basedsystems would be completely broken by quantumcomputing) is responsible for about 1 percent of theCPU usage.

If a physical system cannot be simulated by a computer in T steps, the system can be considered as performing a computation that would take more than T steps.

So, he asked whether one could design a quantum system such that its outcome y based on the initial condition x would be some function y = f(x) such that (a) we don't know how to compute it efficiently in any other way, and (b) is actually useful for something.3 In 1985, David Deutsch formally suggested the notion of a quantum Turing machine, and the model has been since refined in works of Deutsch, Jozsa, Bernstein and Vazirani. Such a system is now known as a quantum computer.

For a while these hypothetical quantum computers seemed useful for one of two things. First, to provide a general-purpose mechanism to simulate a variety of the real quantum systems that people care about, such as various interactions inside molecules in quantum chemistry. Second, as a challenge to the Physical Extended Church-Turing Thesis, which says that every physically realizable computation device can be modeled (up to polynomial overhead) by Turing machines (or equivalently, NAND-TM / NAND-RAM programs).

Quantum chemistry is important (and in particular understanding it can be a bottleneck for designing new materials, drugs, and more), but it is still a rather niche area within the broader context of computing (and even scientific computing) applications. Hence for a while most researchers (to the extent they were aware of it) thought of quantum computers as a theoretical curiosity that has little bearing on practice, given that this theoretical "extra power" of quantum computers seemed to offer little advantage in the majority of the problems people want to solve in areas such as combinatorial optimization, machine learning, data structures, etc.

To some extent this is still true today. As far as we know, quantum computers, if built, will not provide exponential speed ups for 95% of the applications of computing.4 In particular, as far as we know, quantum computers will not help us solve NP-complete problems in polynomial or even sub-exponential time, though Grover's algorithm (Remark 23.4) does yield a quadratic advantage in many cases.

However, there is one cryptography-sized exception: In 1994 Peter Shor showed that quantum computers can solve the integer factoring and discrete logarithm problems in polynomial time. This result has captured the imagination of a great many people, and completely energized research into quantum computing. This is both because the hardness of these particular problems provides the foundations for securing such a huge part of our communications (and these days, our economy), and because it was a powerful demonstration that quantum computers


5 Of course, given that we're still hearing of attacks exploiting "export grade" cryptography that was supposed to disappear in the 1990's, I imagine that we'll still have products running 1024 bit RSA when everyone has a quantum laptop.

could turn out to be useful for problems that a-priori seemed to have nothing to do with quantum physics.

As we'll discuss later, at the moment there are several intensive efforts to construct large scale quantum computers. It seems safe to say that, as far as we know, in the next five years or so there will not be a quantum computer large enough to factor, say, a 1024 bit number. On the other hand, it does seem quite likely that in the very near future there will be quantum computers which achieve some task exponentially faster than the best-known way to achieve the same task with a classical computer. When and if a quantum computer is built that is strong enough to break reasonable parameters of Diffie-Hellman, RSA and elliptic curve cryptography is anybody's guess. It could also be a "self destroying prophecy" whereby the existence of a small-scale quantum computer would cause everyone to shift away to lattice-based crypto, which in turn will diminish the motivation to invest the huge resources needed to build a large scale quantum computer.5

Remark 23.4 — Quantum computing and NP. Despite popular accounts of quantum computers as having variables that can take "zero and one at the same time" and therefore can "explore an exponential number of possibilities simultaneously", their true power is much more subtle and nuanced. In particular, as far as we know, quantum computers do not enable us to solve NP-complete problems such as 3SAT in polynomial or even sub-exponential time. However, Grover's search algorithm does give a more modest advantage (namely, quadratic) for quantum computers over classical ones for problems in NP. In particular, due to Grover's search algorithm, we know that the k-SAT problem for n variables can be solved in time O(2^{n/2} poly(n)) on a quantum computer for every k. In contrast, the best known algorithms for k-SAT on a classical computer take roughly 2^{(1 − 1/k)n} steps.

Big Idea 28 Quantum computers are not a panacea and are unlikely to solve NP-complete problems, but they can provide exponential speedups to certain structured problems.

23.6 QUANTUM SYSTEMS

Before we talk about quantum computing, let us recall how we physically realize "vanilla" or classical computing. We model a logical bit


that can equal 0 or 1 by some physical system that can be in one of two states. For example, it might be a wire with high or low voltage, a charged or uncharged capacitor, or even (as we saw) a pipe with or without a flow of water, or the presence or absence of a soldier crab. A classical system of n bits is composed of n such "basic systems", each of which can be in either a "zero" or "one" state. We can model the state of such a system by a string s ∈ {0,1}^n. If we perform an operation such as writing to the 17-th bit the NAND of the 3rd and 5th bits, this corresponds to applying a local function to s, such as setting s_17 = 1 − s_3 ⋅ s_5.

In the probabilistic setting, we would model the state of the system by a distribution. For an individual bit, we could model it by a pair of non-negative numbers α, β such that α + β = 1, where α is the probability that the bit is zero and β is the probability that the bit is one. For example, applying the negation (i.e., NOT) operation to this bit corresponds to mapping the pair (α, β) to (β, α), since the probability that NOT(σ) is equal to 1 is the same as the probability that σ is equal to 0. This means that we can think of the NOT function as the linear map N : ℝ² → ℝ² such that N(α, β) = (β, α), or equivalently as the matrix (0 1; 1 0).

If we think of the n-bit system as a whole, then since the n bits can take one of 2^n possible values, we model the state of the system as a vector p of 2^n probabilities. For every s ∈ {0,1}^n, we denote by e_s the 2^n dimensional vector that has 1 in the coordinate corresponding to s (identifying it with a number in [2^n]), and so can write p as ∑_{s∈{0,1}^n} p_s e_s, where p_s is the probability that the system is in the state s.

Applying the operation above of setting the 17-th bit to the NAND of the 3rd and 5th bits corresponds to transforming the vector p to the vector Fp, where F : ℝ^{2^n} → ℝ^{2^n} is the linear map that maps e_s to e_{s_0⋯s_16(1−s_3⋅s_5)s_18⋯s_{n−1}}. (Since {e_s}_{s∈{0,1}^n} is a basis for ℝ^{2^n}, it suffices to define the map F on vectors of this form.)
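As a concrete illustration, here is a short Python sketch (the function name and the bit-ordering convention, where bit i is the i-th binary digit of the index, are ours and not the book's) of this probabilistic view for n = 3 bits: the state is a vector of 2³ = 8 probabilities, and the deterministic operation "set bit k to the NAND of bits i and j" acts on it as a linear (in fact stochastic) map.

```python
# A sketch of the probabilistic-state view for n = 3 bits: the state is a vector
# of 2^3 = 8 probabilities, and "set bit k to NAND(bit i, bit j)" is a linear map
# sending each basis vector e_s to a basis vector e_{s'}.

n = 3

def apply_nand_write(p, i, j, k):
    """Return Fp, where F maps e_s to the basis vector of s with bit k set to NAND(s_i, s_j)."""
    q = [0.0] * (2 ** n)
    for s in range(2 ** n):
        si = (s >> i) & 1
        sj = (s >> j) & 1
        nand = 1 - si * sj
        t = (s & ~(1 << k)) | (nand << k)  # s with bit k overwritten
        q[t] += p[s]
    return q

# Uniform distribution over {0,1}^3:
p = [1 / 8] * 8
q = apply_nand_write(p, 0, 1, 2)
# F is a stochastic matrix, so q is again a probability vector:
print(sum(q))  # 1.0
```

After the operation, all probability mass sits on states s whose bit 2 equals NAND of bits 0 and 1, mirroring how F collapses pairs of basis vectors together.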

Please make sure you understand why performing the operation will take a system in state p to a system in the state Fp. Understanding the evolution of probabilistic systems is a prerequisite to understanding the evolution of quantum systems.

If your linear algebra is a bit rusty, now would be a good time to review it, and in particular make sure you are comfortable with the notions of matrices, vectors, (orthogonal and orthonormal) bases, and norms.


23.6.1 Quantum amplitudes

In the quantum setting, the state of an individual bit (or "qubit", to use quantum parlance) is modeled by a pair of numbers (α, β) such that |α|² + |β|² = 1. While in general these numbers can be complex, for the rest of this chapter, we will often assume they are real (though potentially negative), and hence often drop the absolute value operator. (This turns out not to make much of a difference in explanatory power.) As before, we think of α² as the probability that the bit equals 0 and β² as the probability that the bit equals 1. As we did before, we can model the NOT operation by the map N : ℝ² → ℝ² where N(α, β) = (β, α).

Following quantum tradition, instead of using e_0 and e_1 as we did above, from now on we will denote the vector (1, 0) by |0⟩ and the vector (0, 1) by |1⟩ (and moreover, think of these as column vectors). This is known as the Dirac "ket" notation. This means that NOT is the unique linear map N : ℝ² → ℝ² that satisfies N|0⟩ = |1⟩ and N|1⟩ = |0⟩. In other words, in the quantum case, as in the probabilistic case, NOT corresponds to the matrix

๐‘ = (0 11 0) . (23.3)

In classical computation, we typically think that there are only two operations that we can do on a single bit: keep it the same or negate it. In the quantum setting, a single bit operation corresponds to any linear map OP : ℝ² → ℝ² that is norm preserving, in the sense that for every α, β, if we apply OP to the vector (α, β) then we obtain a vector (α′, β′) such that α′² + β′² = α² + β². Such a linear map OP corresponds to a unitary two-by-two matrix. (As we mentioned, quantum mechanics actually models states as vectors with complex coordinates; however, this does not make any qualitative difference to our discussion.) Keeping the bit the same corresponds to the matrix I = (1 0; 0 1) and (as we've seen) the NOT operation corresponds to the matrix N = (0 1; 1 0). But there are other operations we can use as well. One such useful operation is the Hadamard operation, which corresponds to the matrix

H = (1/√2) (+1 +1; +1 −1) .   (23.4)


In fact it turns out that Hadamard is all that we need to add to a classical universal basis to achieve the full power of quantum computing.
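To make the Hadamard operation concrete, the following Python sketch (helper names are ours) checks that H is norm preserving - indeed its own inverse - and that applying it to |0⟩ yields amplitudes whose squares are 1/2 each:

```python
import math

# The Hadamard matrix H = (1/sqrt(2)) * (+1 +1; +1 -1) as in (23.4).
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

# H is its own inverse: H H = I, so it is unitary (real orthogonal).
HH = matmul(H, H)
print([[round(x, 10) for x in row] for row in HH])  # [[1.0, 0.0], [0.0, 1.0]]

# H|0> = (|0> + |1>)/sqrt(2): measuring gives 0 or 1 with probability 1/2 each.
ket0 = [1.0, 0.0]
print([round(x ** 2, 3) for x in matvec(H, ket0)])  # [0.5, 0.5]
```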

23.6.2 Quantum systems: an executive summary

If you ignore the physics and philosophy, for the purposes of understanding the model of quantum computers, all you need to know about quantum systems is the following. The state of a quantum system of n qubits is modeled by a 2^n dimensional vector ψ of unit norm (i.e., the squares of all coordinates sum up to 1), which we write as ψ = ∑_{x∈{0,1}^n} ψ_x |x⟩, where |x⟩ is the column vector that has 0 in all coordinates except the one corresponding to x (identifying {0,1}^n with the numbers {0, …, 2^n − 1}). We use the convention that if a, b are strings of lengths k and ℓ respectively then we can write the 2^{k+ℓ} dimensional vector with 1 in the ab-th coordinate and zero elsewhere not just as |ab⟩ but also as |a⟩|b⟩. In particular, for every x ∈ {0,1}^n, we can write the vector |x⟩ also as |x_0⟩|x_1⟩⋯|x_{n−1}⟩. This notation satisfies certain nice distributive laws such as |a⟩(|b⟩ + |b′⟩)|c⟩ = |abc⟩ + |ab′c⟩.
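The convention |ab⟩ = |a⟩|b⟩ is just the Kronecker (tensor) product of vectors. A small Python sketch (helper names are ours; we read the string x as a binary number with the most significant bit first, matching the identification of {0,1}^n with {0, …, 2^n − 1}):

```python
def kron(u, v):
    """Tensor product of two state vectors: |a>|b> = |ab>."""
    return [x * y for x in u for y in v]

ket = {"0": [1.0, 0.0], "1": [0.0, 1.0]}

def basis(x):
    """|x> for a bit-string x, built as |x_0>|x_1>...|x_{n-1}>."""
    v = [1.0]
    for b in x:
        v = kron(v, ket[b])
    return v

# |"10"> is the 4-dimensional vector with 1 in coordinate 2 (binary 10 = 2):
print(basis("10"))  # [0.0, 0.0, 1.0, 0.0]
```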

A quantum operation on such a system is modeled by a 2^n × 2^n unitary matrix U (one that satisfies UU^⊤ = I, where U^⊤ is the transpose operation, or conjugate transpose for complex matrices). If the system is in state ψ and we apply to it the operation U, then the new state of the system is Uψ.

When we measure an n-qubit system in a state ψ = ∑_{x∈{0,1}^n} ψ_x|x⟩, we observe the value x ∈ {0,1}^n with probability |ψ_x|². In this case, the system collapses to the state |x⟩.

23.7 ANALYSIS OF BELL'S INEQUALITY (OPTIONAL)

Now that we have the notation in place, we can show a strategy for Alice and Bob to display "quantum telepathy" in Bell's Game. Recall that in the classical case, Alice and Bob can succeed in the "Bell Game" with probability at most 3/4 = 0.75. We now show that quantum mechanics allows them to succeed with probability at least 0.8. (The strategy we show is not the best one. Alice and Bob can in fact succeed with probability cos²(π/8) ∼ 0.854.)

Lemma 23.5 There is a 2-qubit quantum state ψ ∈ ℂ⁴ so that if Alice has access to the first qubit of ψ, can manipulate and measure it and output a ∈ {0,1}, and Bob has access to the second qubit of ψ and can manipulate and measure it and output b ∈ {0,1}, then Pr[a ⊕ b = x ∧ y] ≥ 0.8.


Proof. Alice and Bob will start by preparing a 2-qubit quantum system in the state

ψ = (1/√2)|00⟩ + (1/√2)|11⟩   (23.5)

(this state is known as an EPR pair). Alice takes the first qubit of the system to her room, and Bob takes the second qubit to his room. Now, when Alice receives x, if x = 0 she does nothing and if x = 1 she applies the unitary map R_{−π/8} to her qubit, where

R_θ = (cos θ  −sin θ; sin θ  cos θ)

is the unitary operation corresponding to rotation in the plane by angle θ. When Bob receives y, if y = 0 he does nothing and if y = 1 he applies the unitary map R_{π/8} to his qubit. Then each one of them measures their qubit and sends this as their response.

Recall that to win the game Bob and Alice want their outputs to be more likely to differ if x = y = 1 and to be more likely to agree otherwise. We will split the analysis into one case for each of the four possible values of x and y.

Case 1: x = 0 and y = 0. If x = y = 0 then the state does not change. Because the state ψ is proportional to |00⟩ + |11⟩, the measurements of Bob and Alice will always agree (if Alice measures 0 then the state collapses to |00⟩ and so Bob measures 0 as well, and similarly for 1). Hence in the case x = y = 0, Alice and Bob always win.

Case 2: x = 0 and y = 1. If x = 0 and y = 1 then after Alice measures her bit, if she gets 0 then the system collapses to the state |00⟩, in which case after Bob performs his rotation, his qubit is in the state cos(π/8)|0⟩ + sin(π/8)|1⟩. Thus, when Bob measures his qubit, he will get 0 (and hence agree with Alice) with probability cos²(π/8) ≥ 0.85. Similarly, if Alice gets 1 then the system collapses to |11⟩, in which case after rotation Bob's qubit will be in the state −sin(π/8)|0⟩ + cos(π/8)|1⟩, and so once again he will agree with Alice with probability cos²(π/8).

The analysis for Case 3, where x = 1 and y = 0, is completely analogous to Case 2. Hence Alice and Bob will agree with probability cos²(π/8) in this case as well. (To show this we use the observation that the result of this experiment is the same regardless of the order in which Alice and Bob apply their rotations and measurements; this requires a proof but is not very hard to show.)

Case 4: x = 1 and y = 1. For the case that x = 1 and y = 1, after both Alice and Bob perform their rotations, the state will be proportional to

R_{−π/8}|0⟩ R_{π/8}|0⟩ + R_{−π/8}|1⟩ R_{π/8}|1⟩ .   (23.6)


Intuitively, since we rotate one state by 45 degrees and the other state by −45 degrees, they will become orthogonal to each other, and the measurements will behave like independent coin tosses that agree with probability 1/2. However, for the sake of completeness, we now show the full calculation.

Opening up the coefficients and using cos(−x) = cos(x) and sin(−x) = −sin(x), we can see that (23.6) is proportional to

cos²(π/8)|00⟩ + cos(π/8) sin(π/8)|01⟩ − sin(π/8) cos(π/8)|10⟩ − sin²(π/8)|11⟩
− sin²(π/8)|00⟩ + sin(π/8) cos(π/8)|01⟩ − cos(π/8) sin(π/8)|10⟩ + cos²(π/8)|11⟩ .   (23.7)

Using the trigonometric identities 2 sin(α) cos(α) = sin(2α) and cos²(α) − sin²(α) = cos(2α), we see that the magnitude of the coefficient of each of |00⟩, |01⟩, |10⟩, |11⟩ in (23.7) is proportional to cos(π/4) = sin(π/4) = 1/√2. Hence all four options for (a, b) are equally likely, which means that in this case a = b with probability 0.5.

Taking all four cases together, the overall probability of winning the game is (1/4) ⋅ 1 + (1/2) ⋅ 0.85 + (1/4) ⋅ 0.5 = 0.8.
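The four cases can also be checked numerically. The following Python sketch (helper names are ours; amplitudes are kept real) simulates the strategy from the proof directly on 4-dimensional state vectors and recovers the overall winning probability (1 + 2cos²(π/8) + 0.5)/4 ≈ 0.802:

```python
import math

def kron2(A, B):
    """Kronecker product of two 2x2 matrices (a 4x4 matrix)."""
    return [[A[i // 2][j // 2] * B[i % 2][j % 2] for j in range(4)] for i in range(4)]

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def R(theta):
    """Rotation of the plane by angle theta, as in the proof."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

I = [[1.0, 0.0], [0.0, 1.0]]
s = 1 / math.sqrt(2)
epr = [s, 0.0, 0.0, s]  # (|00> + |11>)/sqrt(2)

def win_prob(x, y):
    A = R(-math.pi / 8) if x == 1 else I   # Alice rotates her qubit iff x = 1
    B = R(math.pi / 8) if y == 1 else I    # Bob rotates his qubit iff y = 1
    psi = matvec(kron2(A, B), epr)
    total = 0.0
    for idx, amp in enumerate(psi):
        a, b = idx >> 1, idx & 1           # measurement outcome |ab>
        if (a ^ b) == (x & y):
            total += amp * amp
    return total

avg = sum(win_prob(x, y) for x in (0, 1) for y in (0, 1)) / 4
print(round(avg, 3))  # 0.802
```

Per case, this gives 1 for x = y = 0, cos²(π/8) ≈ 0.854 for the two mixed cases, and 0.5 for x = y = 1, matching the analysis.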

Remark 23.6 — Quantum vs probabilistic strategies. It is instructive to understand what it is about quantum mechanics that enabled this gain in Bell's Inequality. For this, consider the following analogous probabilistic strategy for Alice and Bob. They agree that each of them outputs 0 if they get 0 as input and outputs 1 with probability p if they get 1 as input. In this case one can see that their success probability would be (1/4) ⋅ 1 + (1/2)(1 − p) + (1/4)[2p(1 − p)] = 0.75 − 0.5p² ≤ 0.75. The quantum strategy we described above can be thought of as a variant of the probabilistic strategy for parameter p set to sin²(π/8) = 0.15. But in the case x = y = 1, instead of disagreeing only with probability 2p(1 − p) = 1/4, the existence of the so-called "negative probabilities" in the quantum world allowed us to rotate the state in opposing directions to achieve destructive interference and hence a higher probability of disagreement, namely sin²(π/4) = 0.5.
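The algebra in the remark is easy to double-check by exact enumeration of the four input pairs; the Python sketch below (the function name is ours) confirms the closed form 0.75 − 0.5p²:

```python
# Exact success probability of the probabilistic strategy from the remark:
# output 0 on input 0, and on input 1 output 1 with probability p.
def classical_win_prob(p):
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            pa = p if x == 1 else 0.0          # Pr[a = 1]
            pb = p if y == 1 else 0.0          # Pr[b = 1]
            p_diff = pa * (1 - pb) + (1 - pa) * pb   # Pr[a xor b = 1]
            target = x & y
            total += p_diff if target == 1 else (1 - p_diff)
    return total / 4

# Matches the closed form 0.75 - 0.5 p^2; maximized at p = 0, giving 0.75.
for p in (0.0, 0.15, 0.5, 1.0):
    assert abs(classical_win_prob(p) - (0.75 - 0.5 * p * p)) < 1e-12
print(classical_win_prob(0.0))  # 0.75
```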


6 Readers familiar with quantum computing should note that U_NAND is a close variant of the so-called Toffoli gate, and so QNAND-CIRC programs correspond to quantum circuits with the Hadamard and Toffoli gates.

23.8 QUANTUM COMPUTATION

Recall that in the classical setting, we modeled computation as obtained by a sequence of basic operations. We had two types of computational models:

• Non-uniform models of computation such as Boolean circuits and NAND-CIRC programs, where a finite function f : {0,1}^n → {0,1} is computable in size T if it can be expressed as a combination of T basic operations (gates in a circuit or lines in a NAND-CIRC program).

• Uniform models of computation such as Turing machines and NAND-TM programs, where an infinite function F : {0,1}* → {0,1} is computable in time T(n) if there is a single algorithm that on input x ∈ {0,1}^n evaluates F(x) using at most T(n) basic steps.

When considering efficient computation, we defined the class P to consist of all infinite functions F : {0,1}* → {0,1} that can be computed by a Turing machine or NAND-TM program in time p(n) for some polynomial p(⋅). We defined the class P/poly to consist of all infinite functions F : {0,1}* → {0,1} such that for every n, the restriction F_n of F to {0,1}^n can be computed by a Boolean circuit or NAND-CIRC program of size at most p(n) for some polynomial p(⋅).

We will do the same for quantum computation, focusing mostly on the non-uniform setting of quantum circuits, since that is simpler, and already illustrates the important differences with classical computing.

23.8.1 Quantum circuits

A quantum circuit is analogous to a Boolean circuit, and can be described as a directed acyclic graph. One crucial difference is that the out-degree of every vertex in a quantum circuit is at most one. This is because we cannot "reuse" quantum states without measuring them (which collapses their "probabilities"). Therefore, we cannot use the same qubit as input for two different gates. (This is known as the No Cloning Theorem.) Another more technical difference is that to express our operations as unitary matrices, we will need to make sure all our gates are reversible. This is not hard to ensure. For example, in the quantum context, instead of thinking of NAND as a (non-reversible) map from {0,1}² to {0,1}, we will think of it as the reversible map on three qubits that maps a, b, c to a, b, c ⊕ NAND(a, b) (i.e., flip the last bit if the NAND of the first two bits is 1). Equivalently, the NAND operation corresponds to the 8 × 8 unitary matrix U_NAND such that (identifying {0,1}³ with [8]) for every a, b, c ∈ {0,1}, if |abc⟩ is the basis element with 1 in the abc-th coordinate and zero elsewhere, then U_NAND|abc⟩ = |ab(c ⊕ NAND(a, b))⟩.6 If we order the rows and columns as 000, 001, 010, …, 111, then U_NAND can be written as the following matrix:

๐‘ˆ๐‘๐ด๐‘๐ท =โŽ›โŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽœโŽ

0 1 0 0 0 0 0 01 0 0 0 0 0 0 00 0 0 1 0 0 0 00 0 1 0 0 0 0 00 0 0 0 0 1 0 00 0 0 0 1 0 0 00 0 0 0 0 0 1 00 0 0 0 0 0 0 1

โŽžโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽŸโŽ (23.8)

If we have an n-qubit system, then for i, j, k ∈ [n], we denote by U^{i,j,k}_NAND the 2^n × 2^n unitary matrix that corresponds to applying U_NAND to the i-th, j-th, and k-th bits, leaving the others intact. That is, for every v = ∑_{x∈{0,1}^n} v_x|x⟩, U^{i,j,k}_NAND v = ∑_{x∈{0,1}^n} v_x |x_0 ⋯ x_{k−1}(x_k ⊕ NAND(x_i, x_j))x_{k+1} ⋯ x_{n−1}⟩.
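The matrix (23.8) can be reconstructed mechanically from the rule U_NAND|abc⟩ = |ab(c ⊕ NAND(a, b))⟩. The Python sketch below (the helper name is ours) builds it and checks that, being a permutation matrix that is its own inverse, it is unitary:

```python
# Build the 8x8 permutation matrix U_NAND from the rule
# U_NAND |abc> = |ab (c xor NAND(a,b))>, ordering basis states as 000, 001, ..., 111.
def u_nand():
    U = [[0] * 8 for _ in range(8)]
    for src in range(8):
        a, b, c = (src >> 2) & 1, (src >> 1) & 1, src & 1
        nand = 1 - a * b
        dst = (a << 2) | (b << 1) | (c ^ nand)
        U[dst][src] = 1
    return U

U = u_nand()
for row in U:        # the rows match the matrix in (23.8)
    print(row)

# U_NAND is its own inverse (applying the gate twice undoes it), hence unitary:
UU = [[sum(U[i][k] * U[k][j] for k in range(8)) for j in range(8)] for i in range(8)]
assert UU == [[1 if i == j else 0 for j in range(8)] for i in range(8)]
```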

As mentioned above, we will also use the Hadamard or HAD operation. A quantum circuit is obtained by applying a sequence of U_NAND and HAD gates, where a HAD gate corresponds to applying the matrix

H = (1/√2) (+1 +1; +1 −1) .   (23.9)

Another way to define H is that for b ∈ {0,1}, H|b⟩ = (1/√2)|0⟩ + (1/√2)(−1)^b|1⟩. We define HAD_i to be the 2^n × 2^n unitary matrix that applies HAD to the i-th qubit and leaves the others intact. Using the ket notation, we can write this as

HAD_i ∑_{x∈{0,1}^n} v_x|x⟩ = (1/√2) ∑_{x∈{0,1}^n} v_x |x_0 ⋯ x_{i−1}⟩ (|0⟩ + (−1)^{x_i}|1⟩) |x_{i+1} ⋯ x_{n−1}⟩ .   (23.10)

A quantum circuit is obtained by composing these basic operations on some m qubits. If m ≥ n, we use a circuit to compute a function f : {0,1}^n → {0,1}:

• On input x, we initialize the system to hold x_0, …, x_{n−1} in the first n qubits, and initialize all remaining m − n qubits to zero.

• We execute each elementary operation one by one: at every step we apply to the current state either an operation of the form U^{i,j,k}_NAND or an operation of the form HAD_i for i, j, k ∈ [m].

• At the end of the computation, we measure the system, and output the result of the last qubit (i.e., the qubit in location m − 1). (For simplicity we restrict attention to functions with a single bit of output, though the definition of quantum circuits naturally extends to circuits with multiple outputs.)


• We say that the circuit computes the function f if the probability that this output equals f(x) is at least 2/3. Note that this probability is obtained by summing up the squares of the amplitudes of all coordinates in the final state of the system corresponding to vectors |y⟩ where y_{m−1} = f(x).

Formally we define quantum circuits as follows:

Definition 23.7 — Quantum circuit. Let s ≥ m ≥ n. A quantum circuit of n inputs, m − n auxiliary bits, and s gates over the {U_NAND, HAD} basis is a sequence of s unitary 2^m × 2^m matrices U_0, …, U_{s−1} such that each matrix U_ℓ is either of the form U^{i,j,k}_NAND for i, j, k ∈ [m] or HAD_i for i ∈ [m].

A quantum circuit computes a function f : {0,1}^n → {0,1} if the following is true for every x ∈ {0,1}^n:

Let ๐‘ฃ be the vector๐‘ฃ = ๐‘ˆ๐‘ โˆ’1๐‘ˆ๐‘ โˆ’2 โ‹ฏ ๐‘ˆ1๐‘ˆ0|๐‘ฅ0๐‘šโˆ’๐‘›โŸฉ (23.11)

and write v as ∑_{y∈{0,1}^m} v_y|y⟩. Then

∑_{y∈{0,1}^m s.t. y_{m−1}=f(x)} |v_y|² ≥ 2/3 .   (23.12)

Please stop here and see that this definition makes sense to you.
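Definition 23.7 can be turned into a tiny simulator. The Python sketch below (helper names and conventions are ours; amplitudes are kept real) applies a list of U^{i,j,k}_NAND and HAD_i gates to a 2^m-dimensional state vector and returns the probability that measuring the last qubit yields 1. As a sanity check, the one-gate circuit U^{0,0,1}_NAND computes NOT of the input bit with certainty, since NAND(x_0, x_0) = 1 − x_0:

```python
import math

def apply_nand(state, m, i, j, k):
    """Apply U^{i,j,k}_NAND to an m-qubit state vector (qubit 0 = leftmost bit)."""
    new = [0.0] * (1 << m)
    for x in range(1 << m):
        xi = (x >> (m - 1 - i)) & 1
        xj = (x >> (m - 1 - j)) & 1
        nand = 1 - xi * xj
        y = x ^ (nand << (m - 1 - k))      # flip qubit k iff NAND(x_i, x_j) = 1
        new[y] += state[x]
    return new

def apply_had(state, m, i):
    """Apply HAD_i to an m-qubit state vector."""
    s = 1 / math.sqrt(2)
    new = [0.0] * (1 << m)
    for x in range(1 << m):
        xi = (x >> (m - 1 - i)) & 1
        x0 = x & ~(1 << (m - 1 - i))       # x with qubit i cleared
        x1 = x | (1 << (m - 1 - i))        # x with qubit i set
        new[x0] += s * state[x]
        new[x1] += s * (-1) ** xi * state[x]
    return new

def run(gates, m, x, n):
    """Run a circuit (list of ("NAND", i, j, k) / ("HAD", i)) on input bits x;
    return the probability that measuring the last qubit gives 1."""
    idx = 0
    for b in x:                            # first n qubits hold the input,
        idx = (idx << 1) | b
    idx <<= (m - n)                        # remaining m - n qubits start at 0
    state = [0.0] * (1 << m)
    state[idx] = 1.0
    for g in gates:
        if g[0] == "NAND":
            state = apply_nand(state, m, g[1], g[2], g[3])
        else:
            state = apply_had(state, m, g[1])
    return sum(a * a for i, a in enumerate(state) if i & 1)

# One-gate circuit: qubit 1 <- qubit 1 xor NAND(qubit 0, qubit 0) = NOT(input bit).
for bit in (0, 1):
    print(run([("NAND", 0, 0, 1)], m=2, x=[bit], n=1))  # 1.0 then 0.0
```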

Big Idea 29 Just as we did with classical computation, we can define mathematical models for quantum computation, and represent quantum algorithms as binary strings.

Once we have the notion of quantum circuits, we can define the quantum analog of P/poly (i.e., the class of functions computable by polynomial-size quantum circuits) as follows:

Definition 23.8 — BQP/poly. Let F : {0,1}* → {0,1}. We say that F ∈ BQP/poly if there exists some polynomial p : ℕ → ℕ such that for every n ∈ ℕ, if F_n is the restriction of F to inputs of length n, then there is a quantum circuit of size at most p(n) that computes F_n.


Remark 23.9 — The obviously exponential fallacy. A priori it might seem "obvious" that quantum computing is exponentially powerful, since to perform a quantum computation on n bits we need to maintain the 2^n dimensional state vector and apply 2^n × 2^n matrices to it. Indeed, popular descriptions of quantum computing (too) often say something along the lines that the difference between quantum and classical computers is that a classical bit can either be zero or one while a qubit can be in both states at once, and so in many qubits a quantum computer can perform exponentially many computations at once.

Depending on how you interpret it, this description is either false or would apply equally well to probabilistic computation, even though we've already seen that every randomized algorithm can be simulated by a similar-sized circuit, and in fact we conjecture that BPP = P.

Moreover, this "obvious" approach for simulating a quantum computation will take not just exponential time but exponential space as well, while it can be shown that using a simple recursive formula one can calculate the final quantum state using polynomial space (in physics this is known as "Feynman path integrals"). So, the exponentially long vector description by itself does not imply that quantum computers are exponentially powerful. Indeed, we cannot prove that they are (i.e., we have not been able to rule out the possibility that every QNAND-CIRC program could be simulated by a NAND-CIRC program / Boolean circuit with polynomial overhead), but we do have some problems (integer factoring most prominently) for which they do provide exponential speedup over the currently best known classical (deterministic or probabilistic) algorithms.

23.8.2 QNAND-CIRC programs (optional)

Just like in the classical case, there is an equivalence between circuits and straight-line programs, and so we can define the programming language QNAND-CIRC that is the quantum analog of our NAND-CIRC programming language. To do so, we only add a single operation: HAD(foo), which applies the single-bit operation H to the variable foo. We also use the following interpretation to make NAND reversible: foo = NAND(bar,blah) means that we modify foo to be the XOR of its original value and the NAND of bar and blah. (In other words, apply the 8 by 8 unitary transformation U_NAND defined above to the three qubits corresponding to foo, bar and blah.) If foo is initialized to zero then this makes no difference.


If ๐‘ƒ is a QNAND-CIRC program with ๐‘› input variables, โ„“workspace variables, and ๐‘š output variables, then running it on theinput ๐‘ฅ โˆˆ 0, 1๐‘› corresponds to setting up a system with ๐‘› + ๐‘š + โ„“qubits and performing the following process:1. We initialize the input variables X[0] โ€ฆ X[๐‘› โˆ’ 1] to ๐‘ฅ0, โ€ฆ , ๐‘ฅ๐‘›โˆ’1 and

all other variables to 0.2. We execute the program line by line, applying the corresponding

physical operation ๐ป or ๐‘ˆ๐‘๐ด๐‘๐ท to the qubits that are referred to bythe line.

3. We measure the output variables Y[0], โ€ฆ, Y[๐‘š โˆ’ 1] and outputthe result (if there is more than one output then we measure morevariables).

23.8.3 Uniform computation

Just as in the classical case, we can define uniform computational models for quantum computing as well. We will let BQP be the quantum analog to P and BPP: the class of all Boolean functions F : {0,1}* → {0,1} that can be computed by quantum algorithms in polynomial time. There are several equivalent ways to define BQP. For example, there is a computational model of Quantum Turing Machines that can be used to define BQP just as standard Turing machines are used to define P. Another alternative is to define the QNAND-TM programming language to be QNAND-CIRC augmented with loops and arrays, just like NAND-TM is obtained from NAND-CIRC. Once again, we can define BQP using QNAND-TM programs analogously to the way P can be defined using NAND-TM programs. However, we use the following equivalent definition (which is also the one most popular in the literature):

Definition 23.10 — The class BQP. Let F : {0,1}* → {0,1}. We say that F ∈ BQP if there exists a polynomial-time NAND-TM program P such that for every n, P(1^n) is the description of a quantum circuit C_n that computes the restriction of F to {0,1}^n.

Definition 23.10 is the quantum analog of the alternative characterization of P that appears in Solved Exercise 13.4. One way to verify that you've understood Definition 23.10 is to see that you can prove (1) P ⊆ BQP and in fact the stronger statement BPP ⊆ BQP, (2) BQP ⊆ EXP, and (3) for every NP-complete function F, if F ∈ BQP then NP ⊆ BQP. Exercise 23.1 asks you to work these out.


Figure 23.3: Superconducting quantum computer prototype at Google. Image credit: Google / MIT Technology Review.

The relation between NP and BQP is not known (see also Remark 23.4). It is widely believed that NP ⊈ BQP, but there is no consensus whether or not BQP ⊆ NP. It is quite possible that these two classes are incomparable, in the sense that NP ⊈ BQP (and in particular no NP-complete function belongs to BQP) but also BQP ⊈ NP (and there are some interesting candidates for such problems).

It can be shown that QNANDEVAL (evaluating a quantum circuit on an input) is computable by a polynomial-size QNAND-CIRC program, and moreover this program can even be generated uniformly, and hence QNANDEVAL is in BQP. This allows us to "port" many of the results of classical computational complexity into the quantum realm, including the notions of a universal quantum Turing machine, as well as all of the uncomputability results. There is even a quantum analog of the Cook-Levin Theorem.

Remark 23.11 — Restricting attention to circuits. Because the non-uniform model is a little cleaner to work with, in the rest of this chapter we mostly restrict attention to this model, though all the algorithms we discuss can be implemented using uniform algorithms as well.

23.9 PHYSICALLY REALIZING QUANTUM COMPUTATION

To realize quantum computation one needs to create a system with n independent binary states (i.e., "qubits"), and be able to manipulate small subsets of two or three of these qubits to change their state. While by the way we defined operations above it might seem that one needs to be able to perform arbitrary unitary operations on these two or three qubits, it turns out that there are several choices for universal sets: a small constant number of gates that generate all others. The biggest challenge is to keep the system from being measured and collapsing to a single classical combination of states; the amount of time for which the system maintains its quantum state is sometimes known as the coherence time of the system. The threshold theorem says that there is some absolute constant level of errors τ so that if errors are created at every gate at a rate smaller than τ then we can recover from those and perform arbitrarily long computations. (Of course there are different ways to model the errors, and so there are actually several threshold theorems corresponding to various noise models.)

There have been several proposals to build quantum computers:

• Superconducting quantum computers use superconducting electric circuits to do quantum computation. This is the direction where there has been the most recent progress towards "beating" classical computers.


Figure 23.4: Top: A periodic function. Bottom: An a-periodic function.

• Trapped ion quantum computers use the states of an ion to simulate a qubit. People have made some recent advances on these computers too. While it's not at all clear that's the right measuring yard, the current best implementation of Shor's algorithm (for factoring 15) is done using an ion-trap computer.

• Topological quantum computers use a different technology. Topological qubits are more stable by design and hence error correction is less of an issue, but constructing them is extremely challenging.

These approaches are not mutually exclusive and it could be that ultimately quantum computers are built by combining all of them together. In the near future, it seems that we will not be able to achieve full-fledged large-scale universal quantum computers, but rather more restricted machines, sometimes called "Noisy Intermediate-Scale Quantum Computers" or "NISQ". See this article by John Preskill for some of the progress and applications of such more restricted devices.

23.10 SHOR'S ALGORITHM: HEARING THE SHAPE OF PRIME FACTORS

Bell's Inequality is a powerful demonstration that there is something very strange going on with quantum mechanics. But could this "strangeness" be of any use to solve computational problems not directly related to quantum systems? A priori, one could guess the answer is no. In 1994 Peter Shor showed that one would be wrong:

Theorem 23.12 — Shor's Algorithm. There is a polynomial-time quantum algorithm that on input an integer M (represented in base two), outputs the prime factorization of M.

Another way to state Theorem 23.12 is that if we define FACTORING ∶ {0, 1}* → {0, 1} to be the function that on input a pair of numbers (M, X) outputs 1 if and only if M has a factor P such that 2 ≤ P ≤ X, then FACTORING is in BQP. This is an exponential improvement over the best known classical algorithms, which take roughly 2^Õ(n^{1/3}) time, where the Õ notation hides factors that are polylogarithmic in n. While we will not prove Theorem 23.12 in this chapter, we will sketch some of the ideas behind the proof.

23.10.1 Period finding

At the heart of Shor's Theorem is an efficient quantum algorithm for finding periods of a given function. For example, a function f ∶ ℝ → ℝ is periodic if there is some h > 0 such that f(x + h) = f(x) for every x (e.g., see Fig. 23.4).


Figure 23.5: Left: The air pressure when playing a "C Major" chord as a function of time. Right: The coefficients of the Fourier transform of the same function; we can see that it is the sum of three frequencies corresponding to the C, E and G notes (261.63, 329.63 and 392 Hertz respectively). Credit: Bjarke Mønsted's Quora answer.

Figure 23.6: If f is a periodic function then when we represent it in the Fourier transform, we expect the coefficients corresponding to wavelengths that do not evenly divide the period to be very small, as they would tend to "cancel out".

Musical notes yield one type of periodic function. When you pull on a string on a musical instrument, it vibrates in a repeating pattern. Hence, if we plot the speed of the string (and so also the speed of the air around it) as a function of time, it will correspond to some periodic function. The length of the period is known as the wave length of the note. The frequency is the number of times the function repeats itself within a unit of time. For example, the "Middle C" note has a frequency of 261.63 Hertz, which means its period is 1/(261.63) seconds.

If we play a chord by playing several notes at once, we get a more complex periodic function obtained by combining the functions of the individual notes (see Fig. 23.5). The human ear contains many small hairs, each of which is sensitive to a narrow band of frequencies. Hence when we hear the sound corresponding to a chord, the hairs in our ears actually separate it out to the components corresponding to each frequency.

It turns out that (essentially) every periodic function f ∶ ℝ → ℝ can be decomposed into a sum of simple wave functions (namely functions of the form x ↦ sin(θx) or x ↦ cos(θx)). This is known as the Fourier Transform (see Fig. 23.6). The Fourier transform makes it easy to compute the period of a given function: it will simply be the least common multiple of the periods of the constituent waves.
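To make this concrete, here is a small numerical illustration (using numpy; the code and the specific signal are our own, not part of the text) of how the Fourier transform exposes the frequencies that make up a chord-like signal:

```python
import numpy as np

# Sample a "chord": the sum of two waves with frequencies 3 Hz and 5 Hz.
n = 256  # number of samples over one second
t = np.arange(n) / n
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)

# The Fourier coefficients: large magnitudes appear exactly at the
# constituent frequencies (and at their mirror images n - 3 and n - 5).
coeffs = np.abs(np.fft.fft(signal))
peaks = set(int(k) for k in np.argsort(coeffs)[-4:])
print(peaks)  # the four largest coefficients sit at bins 3, 5, n - 5, n - 3
```

The same principle underlies period finding: the period of the combined signal can be read off from the frequencies at which the large coefficients sit.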

23.10.2 Shor's Algorithm: A bird's eye view

On input an integer M, Shor's algorithm outputs the prime factorization of M in time that is polynomial in log M. The main steps in the algorithm are the following:

Step 1: Reduce to period finding. The first step in the algorithm is to pick a random A ∈ {0, 1, … , M − 1} and define the function F_A ∶ {0, 1}^m → {0, 1}^m as F_A(x) = A^x (mod M), where we identify the string x ∈ {0, 1}^m with an integer using the binary representation, and similarly represent the integer A^x (mod M) as a string. (We will choose m to be some polynomial in log M, and so in particular {0, 1}^m is a large enough set to represent all the numbers in {0, 1, … , M − 1}.)

Some not-too-hard (though somewhat technical) calculations show that: (1) the function F_A is periodic (i.e., there is some integer p_A such that F_A(x + p_A) = F_A(x) for "almost" every x) and, more importantly, (2) if we can recover the period p_A of F_A for several randomly chosen A's, then we can recover the factorization of M. (We'll ignore the "almost" qualifier in the discussion below; it causes some annoying, yet ultimately manageable, technical issues in the full-fledged algorithm.) Hence, factoring M reduces to finding out the period of the function F_A. Exercise 23.2 asks you to work this out for the related task of computing the discrete logarithm (which underlies the security of the Diffie-Hellman key exchange and elliptic curve cryptography).

Step 2: Period finding via the Quantum Fourier Transform. Using a simple trick known as "repeated squaring", it is possible to compute the map x ↦ F_A(x) in time polynomial in m, which means we can also compute this map using a polynomial number of NAND gates, and so in particular we can generate in polynomial quantum time a quantum state ρ that is (up to normalization) equal to

∑_{x∈{0,1}^m} |x⟩|F_A(x)⟩ . (23.13)

In particular, if we were to measure the state ρ, we would get a random pair of the form (x, y) where y = F_A(x). So far, this is not at all impressive. After all, we did not need the power of quantum computing to generate such pairs: we could simply generate a random x and then compute F_A(x).
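As an aside, the classical ingredients of this step are easy to see in code. The following sketch (our own illustration, not the book's code; it assumes A is coprime to M and uses tiny numbers) computes F_A with the repeated-squaring trick and finds its period by brute force:

```python
def modexp(A, x, M):
    # Repeated squaring: O(log x) modular multiplications.
    result, base = 1, A % M
    while x > 0:
        if x & 1:
            result = (result * base) % M
        base = (base * base) % M
        x >>= 1
    return result

def period(A, M):
    # Smallest p > 0 with A^p = 1 (mod M); assumes gcd(A, M) = 1,
    # so that F_A(x + p) = F_A(x) for every x.
    p, v = 1, A % M
    while v != 1:
        v = (v * A) % M
        p += 1
    return p

M, A = 15, 7
p = period(A, M)  # the multiplicative order of 7 modulo 15, namely 4
assert all(modexp(A, x, M) == modexp(A, x + p, M) for x in range(20))
```

Of course, the brute-force `period` loop takes time exponential in the bit length of M; the whole point of the quantum algorithm is to find this period efficiently.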

Another way to describe the state ρ is that the coefficient of |x⟩|y⟩ in ρ is proportional to f_{A,y}(x), where f_{A,y} ∶ {0, 1}^m → ℝ is the function such that

f_{A,y}(x) = 1 if y = A^x (mod M), and 0 otherwise. (23.14)

The magic of Shor's algorithm comes from a procedure known as the Quantum Fourier Transform. It allows us to change the state ρ into a state ρ̂ where the coefficient of |x⟩|y⟩ is now proportional to the x-th Fourier coefficient of f_{A,y}. In other words, if we measure the state ρ̂, we will obtain a pair (x, y) such that the probability of choosing x is proportional to the square of the weight of the frequency x in the representation of the function f_{A,y}. Since for every y, the function f_{A,y} has the period p_A, it can be shown that the frequency x will be (almost) a multiple of p_A. If we make several such samples y_1, … , y_k and obtain the frequencies x_1, … , x_k, then the true period p_A divides all of them, and it can be shown that it is going to be in fact the greatest common divisor (g.c.d.) of all these frequencies: a value which can be computed in polynomial time.
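The classical post-processing just described can be sketched as follows (an illustration with made-up numbers, not the book's code): if each sample is a multiple of the hidden period, their greatest common divisor recovers it.

```python
import math
from functools import reduce

# Each measured frequency is (ideally) a multiple of the hidden period p_A.
p_A = 12
multipliers = [35, 22, 9]  # hypothetical random multipliers; gcd(35, 22, 9) = 1
samples = [p_A * k for k in multipliers]

# The gcd of the samples is p_A times the gcd of the multipliers, i.e. p_A here.
recovered = reduce(math.gcd, samples)
assert recovered == p_A
```

With randomly chosen multipliers the gcd of the multipliers is 1 with high probability, which is why a few samples suffice.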

As mentioned above, we can recover the factorization of M from the periods of F_{A_0}, … , F_{A_t} for some randomly chosen A_0, … , A_t in {0, … , M − 1} and t which is polynomial in log M.

The resulting algorithm can be described at a high (and somewhat inaccurate) level as follows:

Shor's Algorithm (sketch):
Input: Number M ∈ ℕ.
Output: Prime factorization of M.
Operations:


1. Repeat the following k = poly(log M) number of times:

a. Choose A ∈ {0, … , M − 1} at random, and let f_A ∶ ℤ_M → ℤ_M be the map x ↦ A^x mod M.

b. For t = poly(log M), repeat t times the following step: use the Quantum Fourier Transform to create a quantum state |ψ⟩ over poly(log M) qubits, such that if we measure |ψ⟩ we obtain a pair of strings (j, y) with probability proportional to the square of the coefficient corresponding to the wave function x ↦ cos(xπj/M) or x ↦ sin(xπj/M) in the Fourier transform of the function f_{A,y} ∶ ℤ_M → {0, 1} defined as f_{A,y}(x) = 1 iff f_A(x) = y.

c. If j_1, … , j_t are the coefficients we obtained in the previous step, then the least common multiple of M/j_1, … , M/j_t is likely to be the period of the function f_A.

2. If we let A_0, … , A_{k−1} and p_0, … , p_{k−1} be the numbers we chose in the previous step and the corresponding periods of the functions f_{A_0}, … , f_{A_{k−1}}, then we can use classical results in number theory to obtain from these a non-trivial prime factor Q of M (if such exists). We can now run the algorithm again with the (smaller) input M/Q to obtain all other factors.
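For intuition, here is how the number-theoretic part of step 2 can play out on a toy example (our own sketch; the general algorithm needs more care, e.g. when the period is odd or the gcd is trivial): from an even period p of x ↦ A^x mod M one can often extract a factor of M via a gcd computation.

```python
import math

# Toy example: M = 15, A = 7. The period of x -> 7^x mod 15 is p = 4.
M, A, p = 15, 7, 4
assert pow(A, p, M) == 1 and p % 2 == 0

# Since A^p = 1 (mod M), M divides (A^(p/2) - 1) * (A^(p/2) + 1), so a
# non-trivial factor of M often shows up as gcd(A^(p/2) - 1, M).
factor = math.gcd(pow(A, p // 2, M) - 1, M)
print(factor)  # 3, a non-trivial factor of 15
```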

Reducing factoring to order finding is cumbersome, but can be done in polynomial time using a classical computer. The key quantum ingredient in Shor's algorithm is the Quantum Fourier Transform.

Remark 23.13 — Quantum Fourier Transform. Despite its name, the Quantum Fourier Transform does not actually give a way to compute the Fourier Transform of a function f ∶ {0, 1}^m → ℝ. This would be impossible to do in time polynomial in m, as simply writing down the Fourier Transform would require 2^m coefficients. Rather, the Quantum Fourier Transform gives a quantum state where the amplitude corresponding to an element (think: frequency) h is equal to the corresponding Fourier coefficient. This allows us to sample from a distribution where h is drawn with probability proportional to the square of its Fourier coefficient. This is not the same as computing the Fourier transform, but is good enough for recovering the period.

23.11 QUANTUM FOURIER TRANSFORM (ADVANCED, OPTIONAL)

The above description of Shor's algorithm skipped over the implementation of the main quantum ingredient: the Quantum Fourier Transform algorithm. In this section we discuss the ideas behind this algorithm. We will be rather brief and imprecise. Section 23.13 contains references to sources of more information about this topic.


To understand the Quantum Fourier Transform, we need to better understand the Fourier Transform itself. In particular, we will need to understand how it applies not just to functions whose input is a real number but to functions whose domain can be any arbitrary commutative group. Therefore we now take a short detour to (very basic) group theory, and define the notion of periodic functions over groups.

Remark 23.14 — Group theory. While we define the concepts we use, some background in group or number theory will be very helpful for fully understanding this section. In particular we will use the notion of finite commutative (a.k.a. Abelian) groups. These are defined as follows.

• A finite group 𝔾 is a pair (G, ⋆) where G is a finite set of elements and ⋆ is a binary operation mapping a pair g, h of elements in G to the element g ⋆ h in G. We often identify 𝔾 with the set G of its elements, and so use notation such as g ∈ 𝔾 to indicate that g is an element of 𝔾 and |𝔾| to denote the number of elements in 𝔾.

• The operation ⋆ satisfies the sort of properties that a product operation does, namely, it is associative (i.e., (g ⋆ h) ⋆ f = g ⋆ (h ⋆ f)), there is some element 1 such that g ⋆ 1 = g for all g, and for every g ∈ 𝔾 there exists an element g^{−1} such that g ⋆ g^{−1} = 1.

• A group is called commutative (also known as Abelian) if g ⋆ h = h ⋆ g for all g, h ∈ 𝔾.

The Fourier transform is a deep and vast topic, which we will barely touch upon here. Over the real numbers, the Fourier transform of a function f is obtained by expressing f in the form ∑ f̂(α)χ_α where the χ_α's are "wave functions" (e.g. sines and cosines). However, it turns out that the same notion exists for every Abelian group 𝔾. Specifically, for every such group 𝔾, if f is a function mapping 𝔾 to ℂ, then we can write f as

f = ∑_{g∈𝔾} f̂(g)χ_g , (23.15)

where the χ_g's are functions mapping 𝔾 to ℂ that are analogs of the "wave functions" for the group 𝔾 and for every g ∈ 𝔾, f̂(g) is a complex number known as the Fourier coefficient of f corresponding to g. Specifically, the equation (23.15) means that if we think of f as a |𝔾|-dimensional vector over the complex numbers, then we can write this vector as a sum (with certain coefficients) of the vectors {χ_g}_{g∈𝔾}.


The representation (23.15) is known as the Fourier expansion or Fourier transform of f, the numbers (f̂(g))_{g∈𝔾} are known as the Fourier coefficients of f, and the functions (χ_g)_{g∈𝔾} are known as the Fourier characters. The central property of the Fourier characters is that they are homomorphisms of the group into the complex numbers, in the sense that for every x, x′ ∈ 𝔾, χ_g(x ⋆ x′) = χ_g(x)χ_g(x′), where ⋆ is the group operation. One corollary of this property is that if χ_g(h) = 1 then χ_g is h-periodic in the sense that χ_g(x ⋆ h) = χ_g(x) for every x. It turns out that if f is periodic with minimal period h, then the only Fourier characters that have nonzero coefficients in the expression (23.15) are those that are h-periodic as well. This can be used to recover the period of f from its Fourier expansion.
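As a concrete sanity check (our own illustration, using the group ℤ_L of addition modulo L that appears later in this chapter), one can verify the homomorphism and periodicity properties of the characters χ_g(x) = e^{2πigx/L} numerically:

```python
import cmath

L = 12

def chi(g, x):
    # The g-th Fourier character of the group Z_L (addition modulo L).
    return cmath.exp(2j * cmath.pi * g * x / L)

for g in range(L):
    for x in range(L):
        for xp in range(L):
            # Homomorphism: chi_g(x + x') = chi_g(x) * chi_g(x').
            assert abs(chi(g, (x + xp) % L) - chi(g, x) * chi(g, xp)) < 1e-9

# Periodicity: chi_3(4) = e^{2*pi*i} = 1, so chi_3 is 4-periodic in Z_12.
assert all(abs(chi(3, (x + 4) % L) - chi(3, x)) < 1e-9 for x in range(L))
```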

23.11.1 Quantum Fourier Transform over the Boolean Cube: Simon's Algorithm

We now describe the simplest setting of the Quantum Fourier Transform: the group {0, 1}^n with the XOR operation, which we'll denote by ({0, 1}^n, ⊕). (Since XOR is equal to addition modulo two, this group is also often denoted as (ℤ_2)^n.) It can be shown that the Fourier transform over ({0, 1}^n, ⊕) corresponds to expressing f ∶ {0, 1}^n → ℂ as

f = ∑_{y∈{0,1}^n} f̂(y)χ_y (23.16)

where χ_y ∶ {0, 1}^n → ℂ is defined as χ_y(x) = (−1)^{∑_i y_i x_i} and

f̂(y) = (1/√(2^n)) ∑_{x∈{0,1}^n} f(x)(−1)^{∑_i y_i x_i} .

The Quantum Fourier Transform over ({0, 1}^n, ⊕) is actually quite simple:

Theorem 23.15 — QFT Over the Boolean Cube. Let ρ = ∑_{x∈{0,1}^n} f(x)|x⟩ be a quantum state where f ∶ {0, 1}^n → ℂ is some function satisfying ∑_{x∈{0,1}^n} |f(x)|^2 = 1. Then we can use n gates to transform ρ to the state

∑_{y∈{0,1}^n} f̂(y)|y⟩ (23.17)

where f = ∑_y f̂(y)χ_y and χ_y ∶ {0, 1}^n → ℂ is the function χ_y(x) = (−1)^{∑ x_i y_i}.

Proof Idea:

The idea behind the proof is that the Hadamard operation corresponds to the Fourier transform over the group {0, 1}^n (with the XOR operations). To show this, we just need to do the calculations. ⋆


Proof of Theorem 23.15. We can express the Hadamard operation HAD as follows:

HAD|a⟩ = (1/√2)(|0⟩ + (−1)^a|1⟩) . (23.18)

We are given the state

ρ = ∑_{x∈{0,1}^n} f(x)|x⟩ . (23.19)

Now suppose that we apply the HAD operation to each of the n qubits. We can see that we get the state

2^{−n/2} ∑_{x∈{0,1}^n} f(x) ∏_{i=0}^{n−1}(|0⟩ + (−1)^{x_i}|1⟩) . (23.20)

We can now use the distributive law and open up a term of the form

f(x)(|0⟩ + (−1)^{x_0}|1⟩) ⋯ (|0⟩ + (−1)^{x_{n−1}}|1⟩) (23.21)

to the following sum over 2^n terms:

f(x) ∑_{y∈{0,1}^n} (−1)^{∑ y_i x_i}|y⟩ . (23.22)

(If you find the above confusing, try to work out explicitly this calculation for n = 3; namely show that (|0⟩ + (−1)^{x_0}|1⟩)(|0⟩ + (−1)^{x_1}|1⟩)(|0⟩ + (−1)^{x_2}|1⟩) is the same as the sum over 2^3 terms |000⟩ + (−1)^{x_2}|001⟩ + ⋯ + (−1)^{x_0+x_1+x_2}|111⟩.)

By changing the order of summations, we see that the final state is

∑_{y∈{0,1}^n} 2^{−n/2}(∑_{x∈{0,1}^n} f(x)(−1)^{∑ x_i y_i})|y⟩ (23.23)

which exactly corresponds to the desired state ∑_{y∈{0,1}^n} f̂(y)|y⟩.
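The calculation in this proof can be checked numerically. The following sketch (our own, using numpy) applies the matrix H⊗n to a random state vector and compares the result with the directly computed Fourier coefficients:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)
f = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
f /= np.linalg.norm(f)  # normalize so f is a valid quantum state

# Build H tensor H tensor ... tensor H (n times).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)

applied = Hn @ f  # the state after applying HAD to every qubit

def sign(x, y):
    # (-1)^{<x, y>} where <x, y> is the bitwise inner product mod 2.
    return -1 if bin(x & y).count("1") % 2 else 1

# Direct computation of fhat(y) = 2^{-n/2} * sum_x f(x) * (-1)^{<x, y>}.
fhat = np.array([sum(f[x] * sign(x, y) for x in range(2**n))
                 for y in range(2**n)]) / 2**(n / 2)

assert np.allclose(applied, fhat)
```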

23.11.2 From Fourier to Period finding: Simon's Algorithm (advanced, optional)

Using Theorem 23.15 it is not hard to get an algorithm that can recover a string h* ∈ {0, 1}^n given a circuit that computes a function F ∶ {0, 1}^n → {0, 1}* that is h*-periodic in the sense that F(x) = F(x′) for distinct x, x′ if and only if x′ = x ⊕ h*. The key observation is that if we compute the state ∑_{x∈{0,1}^n} |x⟩|F(x)⟩, and perform the Quantum Fourier transform on the first n qubits, then we would get a state such that the only basis elements with nonzero coefficients would be of the form |y⟩ where

∑ y_i h*_i = 0 (mod 2) (23.24)


So, by measuring the state, we can obtain a sample of a random y satisfying (23.24). But since (23.24) is a linear equation modulo 2 about the unknown n variables h*_0, … , h*_{n−1}, if we repeat this procedure to get n such equations, we will have at least as many equations as variables and (it can be shown that) this will suffice to recover h*.

This result is known as Simon's Algorithm, and it preceded and inspired Shor's algorithm.
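The key observation above can be verified by a classical simulation of the quantum state (our own illustration, feasible only for tiny n): prepare the amplitudes of ∑_x |x⟩|F(x)⟩ for an h*-periodic F, apply the Hadamard transform to the first register, and check that every basis element with nonzero amplitude satisfies (23.24).

```python
import numpy as np

n, h_star = 3, 0b101

# Build an h*-periodic F: F(x) = F(x') iff x' = x or x' = x XOR h*.
labels = {}
F = []
for x in range(2**n):
    key = min(x, x ^ h_star)
    labels.setdefault(key, len(labels))
    F.append(labels[key])

# Amplitudes of sum_x |x>|F(x)>, as a 2^n-by-(number of labels) array.
state = np.zeros((2**n, len(labels)))
for x in range(2**n):
    state[x, F[x]] = 1 / np.sqrt(2**n)

# Fourier (Hadamard) transform on the first register only.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)
state = Hn @ state

# Every outcome y with nonzero amplitude satisfies <y, h*> = 0 (mod 2).
for y in range(2**n):
    for lab in range(len(labels)):
        if abs(state[y, lab]) > 1e-9:
            assert bin(y & h_star).count("1") % 2 == 0
```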

23.11.3 From Simon to Shor (advanced, optional)

Theorem 23.15 seemed to really use the special bit-wise structure of the group {0, 1}^n, and so one could wonder if it can be extended to other groups. However, it turns out that we can in fact achieve such a generalization.

The key step in Shor's algorithm is to implement the Fourier transform for the group ℤ_L, which is the set of numbers {0, … , L − 1} with the operation being addition modulo L. In this case it turns out that the Fourier characters are the functions χ_y(x) = ω^{yx} where ω = e^{2πi/L} (i here denotes the complex number √−1). The y-th Fourier coefficient of a function f ∶ ℤ_L → ℂ is

f̂(y) = (1/√L) ∑_{x∈ℤ_L} f(x)ω^{xy} . (23.25)

The key to implementing the Quantum Fourier Transform for such groups is to use the same recursive equations that enable the classical Fast Fourier Transform (FFT) algorithm. Specifically, consider the case that L = 2^ℓ. We can separate the sum over x in (23.25) to the terms corresponding to even x's (of the form x = 2z) and odd x's (of the form x = 2z + 1) to obtain

f̂(y) = (1/√L) ∑_{z∈ℤ_{L/2}} f(2z)(ω^2)^{yz} + (ω^y/√L) ∑_{z∈ℤ_{L/2}} f(2z + 1)(ω^2)^{yz} (23.26)

which reduces computing the Fourier transform of f over the group ℤ_{2^ℓ} to computing the Fourier transform of the functions f_even and f_odd (corresponding to applying f to only the even and odd x's respectively), which have 2^{ℓ−1} inputs that we can identify with the group ℤ_{2^{ℓ−1}} = ℤ_{L/2}.

Specifically, the Fourier characters of the group ℤ_{L/2} are the functions χ_y(x) = e^{2πi·yx/(L/2)} = (ω^2)^{yx} for every x, y ∈ ℤ_{L/2}. Moreover, since ω^L = 1, (ω^2)^y = (ω^2)^{y mod L/2} for every y ∈ ℕ. Thus (23.26) translates into

f̂(y) = f̂_even(y mod L/2) + ω^y f̂_odd(y mod L/2) . (23.27)
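The recursion (23.27) translates directly into the classical FFT. Here is a minimal sketch (our own code, following the chapter's ω^{xy} convention and deferring the 1/√L normalization to the caller):

```python
import cmath

def fft(f):
    # Recursive FFT over Z_L following fhat(y) = fhat_even(y mod L/2)
    # + w^y * fhat_odd(y mod L/2), for L a power of two.
    L = len(f)
    if L == 1:
        return list(f)
    even = fft(f[0::2])  # Fourier transform of z -> f(2z), over Z_{L/2}
    odd = fft(f[1::2])   # Fourier transform of z -> f(2z + 1), over Z_{L/2}
    w = cmath.exp(2j * cmath.pi / L)
    return [even[y % (L // 2)] + w**y * odd[y % (L // 2)] for y in range(L)]

# Compare against the defining sum fhat(y) = sum_x f(x) * w^{xy}.
f = [1, 2, 3, 4, 5, 6, 7, 8]
L = len(f)
w = cmath.exp(2j * cmath.pi / L)
direct = [sum(f[x] * w**(x * y) for x in range(L)) for y in range(L)]
assert all(abs(a - b) < 1e-6 for a, b in zip(fft(f), direct))
```

The recursion makes two half-size calls plus O(L) work per level, giving the familiar O(L log L) running time.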

This observation is usually used to obtain a fast (e.g., O(L log L) time) algorithm to compute the Fourier transform in a classical setting, but it can be used to obtain a quantum circuit of poly(log L) gates to transform a state of the form ∑_{x∈ℤ_L} f(x)|x⟩ to a state of the form ∑_{y∈ℤ_L} f̂(y)|y⟩.

The case that L is not an exact power of two causes some complications in both the classical case of the Fast Fourier Transform and the quantum setting of Shor's algorithm. However, it is possible to handle these. The idea is that we can embed ℤ_L in the group ℤ_{A⋅L} for any integer A, and we can find an integer A such that A ⋅ L will be close enough to a power of 2 (i.e., a number of the form 2^m for some m), so that if we do the Fourier transform over the group ℤ_{2^m} then we will not introduce too many errors.

Figure 23.7: Conjectured status of BQP with respect to other complexity classes. We know that P ⊆ BPP ⊆ BQP and BQP ⊆ PSPACE ⊆ EXP. It is not known if any of these inclusions are strict though it is believed that they are. The relation between BQP and NP is unknown but they are believed to be incomparable, and that NP-complete problems can not be solved in polynomial time by quantum computers. However, it is possible that BQP contains NP and even PSPACE, and it is also possible that quantum computers offer no super-polynomial speedups and that P = BQP. The class "NISQ" above is not a well defined complexity class, but rather captures the current status of quantum devices, which seem to be able to solve a set of computational tasks that is incomparable with the set of tasks solvable by classical computers. The diagram is also inaccurate in the sense that at the moment the "quantum supremacy" tasks on which such devices seem to offer exponential speedups do not correspond to Boolean functions/decision problems.

Chapter Recap

• The state of an n-qubit quantum system can be modeled as a 2^n-dimensional vector.

• An operation on the state corresponds to applying a unitary matrix to this vector.

• Quantum circuits are obtained by composing basic operations such as HAD and U_NAND.

• We can use quantum circuits to define the classes BQP/poly and BQP, which are the quantum analogs of P/poly and BPP respectively.

• There are some problems for which the best known quantum algorithm is exponentially faster than the best known classical algorithm, but quantum computing is not a panacea. In particular, as far as we know, quantum computers could still require exponential time to solve NP-complete problems such as SAT.


7 You can use U_NAND to simulate NAND gates.

8 Use the alternative characterization of P as in Solved Exercise 13.4.

9 You can use the HAD gate to simulate a coin toss.

10 In exponential time, simulating quantum computation boils down to matrix multiplication.

11 If a reduction can be implemented in P, it can be implemented in BQP as well.

12 We are given h = g^x and need to recover x. To do so we can compute the order of various elements of the form h^a g^b. The order of such an element is a number c satisfying c(xa + b) = 0 (mod |𝔾|). With a few random examples we will get a non-trivial equation on x (where c is not zero modulo |𝔾|) and then we can use our knowledge of a, b, c to recover x.

23.12 EXERCISES

Exercise 23.1 — Quantum and classical complexity class relations. Prove the following relations between quantum complexity classes and classical ones:

1. P/poly ⊆ BQP/poly. See footnote 7 for a hint.

2. P ⊆ BQP. See footnote 8 for a hint.

3. BPP ⊆ BQP. See footnote 9 for a hint.

4. BQP ⊆ EXP. See footnote 10 for a hint.

5. If SAT ∈ BQP then NP ⊆ BQP. See footnote 11 for a hint.

Exercise 23.2 — Discrete logarithm from order finding. Show a probabilistic polynomial-time classical algorithm that given an Abelian finite group 𝔾 (in the form of an algorithm that computes the group operation), a generator g for the group, and an element h ∈ 𝔾, as well as access to a black box that on input f ∈ 𝔾 outputs the order of f (the smallest a such that f^a = 1), computes the discrete logarithm of h with respect to g. That is, the algorithm should output a number x such that g^x = h. See footnote 12 for a hint.

23.13 BIBLIOGRAPHICAL NOTES

An excellent gentle introduction to quantum computation is given in Mermin's book [Mer07]. In particular the first 100 pages (Chapters 1 to 4) of [Mer07] cover all the material of this chapter in a much more comprehensive way. This material is also covered in the first 5 chapters of de Wolf's online lecture notes. For a more condensed exposition, the chapter on quantum computation in my book with Arora (see draft here) is one relatively short source that contains full descriptions of Grover's, Simon's and Shor's algorithms. This blog post of Aaronson contains a high-level explanation of Shor's algorithm which ends with links to several more detailed expositions. Chapters 9 and 10 in Aaronson's book [Aar13] give an informal but highly informative introduction to the topics of this chapter and much more. Chapter 10 in Avi Wigderson's book also provides a high-level overview of quantum computing. Other recommended resources include Andrew Childs' lecture notes on quantum algorithms, as well as the lecture notes of Umesh Vazirani, John Preskill, and John Watrous.

There are many excellent videos available online covering some of these materials. The videos of Umesh Vazirani's EdX course are an accessible and recommended introduction to quantum computing. Regarding quantum mechanics in general, this video illustrates the double slit experiment, and this Scientific American video is a nice exposition of Bell's Theorem. This talk and panel moderated by Brian Greene discusses some of the philosophical and technical issues around quantum mechanics and its so-called "measurement problem". The Feynman lectures on the Fourier Transform and quantum mechanics in general are very much worth reading. The Fourier transform is covered in these videos of Dr. Chris Geoscience, Clare Zhang and Vi Hart. See also Kelsey Houston-Edwards's video on Shor's Algorithm.

The form of Bell's game we discuss in Section 23.3 was given by Clauser, Horne, Shimony, and Holt.

The Fast Fourier Transform, used as a component in Shor's algorithm, is one of the most widely used algorithms ever invented. The stories of its discovery by Gauss in trying to calculate asteroid orbits and its rediscovery by Tukey during the Cold War are fascinating as well.

The image in Fig. 23.2 is taken from Wikipedia.

Thanks to Scott Aaronson for many helpful comments about this chapter.


VI APPENDICES


Bibliography

[Aar05] Scott Aaronson. "NP-complete problems and physical reality". In: ACM SIGACT News 36.1 (2005). Available on https://arxiv.org/abs/quant-ph/0502072, pp. 30–52.

[Aar13] Scott Aaronson. Quantum computing since Democritus. Cambridge University Press, 2013.

[Aar16] Scott Aaronson. "P =? NP". In: Open problems in mathematics. Available on https://www.scottaaronson.com/papers/pnp.pdf. Springer, 2016, pp. 1–122.

[Aar20] Scott Aaronson. "The Busy Beaver Frontier". In: SIGACT News (2020). Available on https://www.scottaaronson.com/papers/bb.pdf.

[AKS04] Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. "PRIMES is in P". In: Annals of Mathematics (2004), pp. 781–793.

[Asp18] James Aspnes. Notes on Discrete Mathematics. Online textbook for CS 202. Available on http://www.cs.yale.edu/homes/aspnes/classes/202/notes.pdf. 2018.

[Bar84] H. P. Barendregt. The lambda calculus: its syntax and semantics. Amsterdam; New York, N.Y.: North-Holland; Sole distributors for the U.S.A. and Canada, Elsevier Science Pub. Co, 1984. isbn: 0444875085.

[Bjo14] Andreas Bjorklund. "Determinant sums for undirected hamiltonicity". In: SIAM Journal on Computing 43.1 (2014), pp. 280–299.

[Blä13] Markus Bläser. Fast Matrix Multiplication. Graduate Surveys 5. Theory of Computing Library, 2013, pp. 1–60. doi: 10.4086/toc.gs.2013.005. url: http://www.theoryofcomputing.org/library.html.

[Boo47] George Boole. The mathematical analysis of logic. Philosophical Library, 1847.



[BU11] David Buchfuhrer and Christopher Umans. "The complexity of Boolean formula minimization". In: Journal of Computer and System Sciences 77.1 (2011), pp. 142–153.

[Bur78] Arthur W Burks. "Book review: Charles S. Peirce, The new elements of mathematics". In: Bulletin of the American Mathematical Society 84.5 (1978), pp. 913–918.

[Cho56] Noam Chomsky. "Three models for the description of language". In: IRE Transactions on Information Theory 2.3 (1956), pp. 113–124.

[Chu41] Alonzo Church. The calculi of lambda-conversion. Princeton; London: Princeton University Press; H. Milford, Oxford University Press, 1941. isbn: 978-0-691-08394-0.

[CM00] Bruce Collier and James MacLachlan. Charles Babbage: And the engines of perfection. Oxford University Press, 2000.

[Coh08] Paul Cohen. Set theory and the continuum hypothesis. Mineola, N.Y.: Dover Publications, 2008. isbn: 0486469212.

[Coh81] Danny Cohen. "On holy wars and a plea for peace". In: Computer 14.10 (1981), pp. 48–54.

[Coo66] SA Cook. "On the minimum computation time for multiplication". In: Doctoral dissertation, Harvard University Department of Mathematics, Cambridge, Mass. 1 (1966).

[Coo87] James W Cooley. "The re-discovery of the fast Fourier transform algorithm". In: Microchimica Acta 93.1-6 (1987), pp. 33–45.

[Cor+09] Thomas H. Cormen et al. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. isbn: 978-0-262-03384-8. url: http://mitpress.mit.edu/books/introduction-algorithms.

[CR73] Stephen A Cook and Robert A Reckhow. "Time bounded random access machines". In: Journal of Computer and System Sciences 7.4 (1973), pp. 354–375.

[CRT06] Emmanuel J Candes, Justin K Romberg, and Terence Tao. "Stable signal recovery from incomplete and inaccurate measurements". In: Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 59.8 (2006), pp. 1207–1223.

[CT06] Thomas M. Cover and Joy A. Thomas. "Elements of information theory 2nd edition". In: Wiley-Interscience: NJ (2006).

[Dau90] Joseph Warren Dauben. Georg Cantor: His mathematics and philosophy of the infinite. Princeton University Press, 1990.


[De 47] Augustus De Morgan. Formal logic: or, the calculus of inference, necessary and probable. Taylor and Walton, 1847.

[Don06] David L. Donoho. "Compressed sensing". In: IEEE Transactions on Information Theory 52.4 (2006), pp. 1289–1306.

[DPV08] Sanjoy Dasgupta, Christos H. Papadimitriou, and Umesh V. Vazirani. Algorithms. McGraw-Hill, 2008. isbn: 978-0-07-352340-8.

[Ell10] Jordan Ellenberg. "Fill in the blanks: Using math to turn lo-res datasets into hi-res samples". In: Wired Magazine 18.3 (2010), pp. 501–509.

[Ern09] Elizabeth Ann Ernst. "Optimal combinational multi-level logic synthesis". Available on https://deepblue.lib.umich.edu/handle/2027.42/62373. PhD thesis. University of Michigan, 2009.

[Fle18] Margaret M. Fleck. Building Blocks for Theoretical Computer Science. Online book, available at http://mfleck.cs.illinois.edu/building-blocks/. 2018.

[Für07] Martin Fürer. "Faster integer multiplication". In: Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11–13, 2007. 2007, pp. 57–66.

[FW93] Michael L. Fredman and Dan E. Willard. "Surpassing the information theoretic bound with fusion trees". In: Journal of Computer and System Sciences 47.3 (1993), pp. 424–436.

[GKS17] Daniel Günther, Ágnes Kiss, and Thomas Schneider. "More efficient universal circuit constructions". In: International Conference on the Theory and Application of Cryptology and Information Security. Springer. 2017, pp. 443–470.

[Gom+08] Carla P. Gomes et al. "Satisfiability solvers". In: Foundations of Artificial Intelligence 3 (2008), pp. 89–134.

[Gra05] Judith Grabiner. The origins of Cauchy's rigorous calculus. Mineola, N.Y.: Dover Publications, 2005. isbn: 9780486438153.

[Gra83] Judith V. Grabiner. "Who gave you the epsilon? Cauchy and the origins of rigorous calculus". In: The American Mathematical Monthly 90.3 (1983), pp. 185–194.

[Hag98] Torben Hagerup. "Sorting and searching on the word RAM". In: Annual Symposium on Theoretical Aspects of Computer Science. Springer. 1998, pp. 366–398.


[Hal60] Paul R. Halmos. Naive set theory. Republished in 2017 by Courier Dover Publications. 1960.

[Har41] G. H. Hardy. "A Mathematician's Apology". In: (1941).

[HJB85] Michael T. Heideman, Don H. Johnson, and C. Sidney Burrus. "Gauss and the history of the fast Fourier transform". In: Archive for History of Exact Sciences 34.3 (1985), pp. 265–277.

[HMU14] John Hopcroft, Rajeev Motwani, and Jeffrey Ullman. Introduction to automata theory, languages, and computation. Harlow, Essex: Pearson Education, 2014. isbn: 1292039051.

[Hof99] Douglas Hofstadter. Gödel, Escher, Bach: an eternal golden braid. New York: Basic Books, 1999. isbn: 0465026567.

[Hol01] Jim Holt. "The Ada perplex: how Byron's daughter came to be celebrated as a cybervisionary". In: The New Yorker 5 (2001), pp. 88–93.

[Hol18] Jim Holt. When Einstein walked with Gödel: excursions to the edge of thought. New York: Farrar, Straus and Giroux, 2018. isbn: 0374146705.

[HU69] John E. Hopcroft and Jeffrey D. Ullman. "Formal languages and their relation to automata". In: (1969).

[HU79] John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979. isbn: 0-201-02988-X.

[HV19] David Harvey and Joris Van Der Hoeven. "Integer multiplication in time O(n log n)". Working paper or preprint. Mar. 2019. url: https://hal.archives-ouvertes.fr/hal-02070778.

[Joh12] David S. Johnson. "A brief history of NP-completeness, 1954–2012". In: Documenta Mathematica (2012), pp. 359–376.

[Juk12] Stasys Jukna. Boolean function complexity: advances and frontiers. Vol. 27. Springer Science & Business Media, 2012.

[Kar+97] David R. Karger et al. "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". In: Proceedings of the Twenty-Ninth Annual ACM Symposium on the Theory of Computing, El Paso, Texas, USA, May 4–6, 1997. 1997, pp. 654–663. doi: 10.1145/258533.258660. url: https://doi.org/10.1145/258533.258660.


[Kar95] Anatolii Alexeevich Karatsuba. "The complexity of computations". In: Proceedings of the Steklov Institute of Mathematics-Interperiodica Translation 211 (1995), pp. 169–183.

[Kle91] Israel Kleiner. "Rigor and Proof in Mathematics: A Historical Perspective". In: Mathematics Magazine 64.5 (1991), pp. 291–314. issn: 0025570X, 19300980. url: http://www.jstor.org/stable/2690647.

[Kle99] Jon M. Kleinberg. "Authoritative sources in a hyperlinked environment". In: Journal of the ACM (JACM) 46.5 (1999), pp. 604–632.

[Koz97] Dexter Kozen. Automata and computability. New York: Springer, 1997. isbn: 978-3-642-85706-5.

[KT06] Jon M. Kleinberg and Éva Tardos. Algorithm design. Addison-Wesley, 2006. isbn: 978-0-321-37291-8.

[Kun18] Jeremy Kun. A programmer's introduction to mathematics. Middletown, DE: CreateSpace Independent Publishing Platform, 2018. isbn: 1727125452.

[Liv05] Mario Livio. The equation that couldn't be solved: how mathematical genius discovered the language of symmetry. New York: Simon & Schuster, 2005. isbn: 0743258207.

[LLM18] Eric Lehman, F. Thomson Leighton, and Albert R. Meyer. Mathematics for Computer Science. 2018.

[LMS16] Helger Lipmaa, Payman Mohassel, and Seyed Saeed Sadeghian. "Valiant's Universal Circuit: Improvements, Implementation, and Applications". In: IACR Cryptology ePrint Archive 2016 (2016), p. 17.

[Lup58] O. Lupanov. "A circuit synthesis method". In: Izv. Vuzov, Radiofizika 1.1 (1958), pp. 120–130.

[Lup84] Oleg B. Lupanov. Asymptotic complexity bounds for control circuits. In Russian. MSU, 1984.

[Lus+08] Michael Lustig et al. "Compressed sensing MRI". In: IEEE Signal Processing Magazine 25.2 (2008), pp. 72–82.

[Lüt02] Jesper Lützen. "Between Rigor and Applications: Developments in the Concept of Function in Mathematical Analysis". In: The Cambridge History of Science. Ed. by Mary Jo Nye. Vol. 5. The Cambridge History of Science. Cambridge University Press, 2002, pp. 468–487. doi: 10.1017/CHOL9780521571999.026.


[LZ19] Harry Lewis and Rachel Zax. Essential Discrete Mathematics for Computer Science. Princeton University Press, 2019. isbn: 9780691190617.

[Maa85] Wolfgang Maass. "Combinatorial lower bound arguments for deterministic and nondeterministic Turing machines". In: Transactions of the American Mathematical Society 292.2 (1985), pp. 675–693.

[Mer07] N. David Mermin. Quantum computer science: an introduction. Cambridge University Press, 2007.

[MM11] Cristopher Moore and Stephan Mertens. The nature of computation. Oxford University Press, 2011.

[MP43] Warren S. McCulloch and Walter Pitts. "A logical calculus of the ideas immanent in nervous activity". In: The Bulletin of Mathematical Biophysics 5.4 (1943), pp. 115–133.

[MR95] Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. Cambridge University Press, 1995.

[MU17] Michael Mitzenmacher and Eli Upfal. Probability and computing: randomization and probabilistic techniques in algorithms and data analysis. Cambridge University Press, 2017.

[Neu45] John von Neumann. "First Draft of a Report on the EDVAC". In: (1945). Reprinted in the IEEE Annals of the History of Computing journal, 1993.

[NS05] Noam Nisan and Shimon Schocken. The elements of computing systems: building a modern computer from first principles. MIT Press, 2005.

[Pag+99] Lawrence Page et al. The PageRank citation ranking: Bringing order to the web. Tech. rep. Stanford InfoLab, 1999.

[Pie02] Benjamin Pierce. Types and programming languages. Cambridge, Mass.: MIT Press, 2002. isbn: 0262162091.

[Ric53] Henry Gordon Rice. "Classes of recursively enumerable sets and their decision problems". In: Transactions of the American Mathematical Society 74.2 (1953), pp. 358–366.

[Rog96] Yurii Rogozhin. "Small universal Turing machines". In: Theoretical Computer Science 168.2 (1996), pp. 215–240.

[Ros19] Kenneth Rosen. Discrete mathematics and its applications. New York, NY: McGraw-Hill, 2019. isbn: 125967651x.

[RS59] Michael O. Rabin and Dana Scott. "Finite automata and their decision problems". In: IBM Journal of Research and Development 3.2 (1959), pp. 114–125.


[Sav98] John E. Savage. Models of computation. Vol. 136. Available electronically at http://cs.brown.edu/people/jsavage/book/. Addison-Wesley, Reading, MA, 1998.

[Sch05] Alexander Schrijver. "On the history of combinatorial optimization (till 1960)". In: Handbooks in Operations Research and Management Science 12 (2005), pp. 1–68.

[Sha38] Claude E. Shannon. "A symbolic analysis of relay and switching circuits". In: Electrical Engineering 57.12 (1938), pp. 713–723.

[Sha79] Adi Shamir. "Factoring numbers in O(log n) arithmetic steps". In: Information Processing Letters 8.1 (1979), pp. 28–31.

[She03] Saharon Shelah. "Logical dreams". In: Bulletin of the American Mathematical Society 40.2 (2003), pp. 203–228.

[She13] Henry Maurice Sheffer. "A set of five independent postulates for Boolean algebras, with application to logical constants". In: Transactions of the American Mathematical Society 14.4 (1913), pp. 481–488.

[She16] Margot Shetterly. Hidden figures: the American dream and the untold story of the Black women mathematicians who helped win the space race. New York, NY: William Morrow, 2016. isbn: 9780062363602.

[Sin97] Simon Singh. Fermat's enigma: the quest to solve the world's greatest mathematical problem. New York: Walker, 1997. isbn: 0385493622.

[Sip97] Michael Sipser. Introduction to the theory of computation. PWS Publishing Company, 1997. isbn: 978-0-534-94728-6.

[Sob17] Dava Sobel. The Glass Universe: How the Ladies of the Harvard Observatory Took the Measure of the Stars. New York: Penguin Books, 2017. isbn: 9780143111344.

[Sol14] Daniel Solow. How to read and do proofs: an introduction to mathematical thought processes. Hoboken, New Jersey: John Wiley & Sons, Inc., 2014. isbn: 9781118164020.

[SS71] Arnold Schönhage and Volker Strassen. "Schnelle Multiplikation großer Zahlen" [Fast multiplication of large numbers]. In: Computing 7.3–4 (1971), pp. 281–292.

[Ste87] Dorothy Stein. Ada: a life and a legacy. Cambridge, Mass.: MIT Press, 1987. isbn: 0262691167.

[Str69] Volker Strassen. "Gaussian elimination is not optimal". In: Numerische Mathematik 13.4 (1969), pp. 354–356.


[Swa02] Doron Swade. The difference engine: Charles Babbage and the quest to build the first computer. New York: Penguin Books, 2002. isbn: 0142001449.

[Tho84] Ken Thompson. "Reflections on trusting trust". In: Communications of the ACM 27.8 (1984), pp. 761–763.

[Too63] Andrei L. Toom. "The complexity of a scheme of functional elements realizing the multiplication of integers". In: Soviet Mathematics Doklady. Vol. 3. 4. 1963, pp. 714–716.

[Tur37] Alan M. Turing. "On computable numbers, with an application to the Entscheidungsproblem". In: Proceedings of the London Mathematical Society 2.1 (1937), pp. 230–265.

[Vad+12] Salil P. Vadhan et al. "Pseudorandomness". In: Foundations and Trends® in Theoretical Computer Science 7.1–3 (2012), pp. 1–336.

[Val76] Leslie G. Valiant. "Universal circuits (preliminary report)". In: Proceedings of the Eighth Annual ACM Symposium on Theory of Computing. ACM. 1976, pp. 196–203.

[Wan57] Hao Wang. "A variant to Turing's theory of computing machines". In: Journal of the ACM (JACM) 4.1 (1957), pp. 63–92.

[Weg87] Ingo Wegener. The complexity of Boolean functions. Vol. 1. B. G. Teubner, Stuttgart, 1987.

[Wer74] P. J. Werbos. "Beyond regression: New tools for prediction and analysis in the behavioral sciences". Ph.D. thesis, Harvard University, Cambridge, MA, 1974.

[Wig19] Avi Wigderson. Mathematics and Computation. Draft available on https://www.math.ias.edu/avi/book. Princeton University Press, 2019.

[Wil09] Ryan Williams. "Finding paths of length k in O*(2^k) time". In: Information Processing Letters 109.6 (2009), pp. 315–318.

[WN09] Damien Woods and Turlough Neary. "The complexity of small universal Turing machines: A survey". In: Theoretical Computer Science 410.4–5 (2009), pp. 443–450.

[WR12] Alfred North Whitehead and Bertrand Russell. Principia mathematica. Vol. 2. University Press, 1912.