Julian Bradfield [email protected]
TRANSCRIPT
Course Aims
- understanding of computability, complexity and intractability;
- knowledge of lambda calculus, types, and type safety.
Course Outcomes
By the end of the course you should be able to
- Explain decidability, undecidability and the halting problem.
- Demonstrate the use of reductions for undecidability proofs.
- Explain the notions of P, NP, NP-complete.
- Use reductions to show problems to be NP-hard.
- Write short programs in lambda-calculus.
- Explain and demonstrate type-inference for simple programs.
Course Outline
- Introduction. The meaning of computation.
- Register machines and their programming.
- Universal machines and the halting problem.
- Decision problems and reductions.
- Undecidability and semi-decidability.
- Complexity of algorithms and problems.
- The class P.
- Non-determinism and NP.
- NP-completeness.
- Beyond NP.
- Lambda-calculus.
- Recursion.
- Types.
- Polymorphism.
- Type inference.
Assessment
The course is assessed by a written examination (70%) and two coursework exercises (15% each).
Coursework deadlines: 16:00 on 14 February and 20 March (end of weeks 5 and 9 – remember the gap week isn’t numbered).
The courseworks will, roughly, contain one question for each week of the course. Please do the questions in/following the week to which they apply, rather than waiting till the deadline to do them all!
Lectures
I see the purpose of lectures as providing the key concepts on to which you can hook the rest of the material.
Slides are few in number, and text-dense: I will spend a lot of time talking about things and using the board, with the slides providing the reminder of what we’re doing.
Interaction in lectures is encouraged!
From time to time, we’ll do relatively detailed proofs, but mostly this will be left to you in exercises and independent study.
Note: The last examinable lecture is (as currently planned) the Monday of week 8, but there are two further lectures offered.
Textbooks
It will be useful, but not absolutely necessary, to have access to:
- Michael Sipser, Introduction to the Theory of Computation, PWS Publishing (International Thomson Publishing)
- Benjamin C. Pierce, Types and Programming Languages, MIT Press
There is also much information on the Web, and in particular Wikipedia articles are generally fairly good in this area.
Generally I will refer to textbooks for the detail of material I discuss on slides.
What is computation? What are computers?
Some computing devices:
The abacus – some millennia BP.
[Association pour le musee international du calcul de l’informatique et de l’automatique de Valbonne Sophia Antipolis (AMISA)]
First mechanical digital calculator – 1642 Pascal
[original source unknown]
The Difference Engine, [The Analytical Engine] – 1812, 1832, Babbage / Lovelace.
[ Science Museum ?? ]
Analytical Engine (never built) anticipated many modern aspects of computers. See http://www.fourmilab.ch/babbage/.
ENIAC – 1945, Eckert & Mauchley
[University of Pennsylvania]
What do computers manipulate?
Symbols? Numbers? Bits? Does it matter?
What about real numbers? Physical quantities? Proofs? Emotions?
Do we buy that numbers are enough? If we buy that, are bits enough?
How much memory do we need?
What can we compute?
If we can cast a problem in terms that our computers manipulate, can we solve it? Always? Sometimes? With how much time? With how much memory?
Register Machines
At the end of Inf2A, you saw Turing machines and undecidability. We’ll reprise this using a different model a bit closer to (y)our ideas of what a computer is.
The simplest Register Machine has:
- a fixed number of registers R0, ..., Rm−1, which each hold a natural number;
- a fixed program that is a sequence P = I0, I1, ..., In−1 of instructions.
- Each instruction is one of:
    inc(i): add 1 to Ri, or
    decjz(i, j): if Ri = 0 then goto Ij, else subtract 1 from Ri.
- The machine executes instructions in order, except when decjz causes a jump.
CLAIM: Register Machines can compute anything any other computer can.
What is unrealistic about these machines?
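The execution rule is simple enough to capture in a few lines of code. The following Python interpreter is an illustrative sketch, not part of the course materials: it takes the machine to halt when the program counter leaves the program, and adds a `fuel` bound (my own safeguard) so that a non-terminating program still returns.

```python
# Sketch of a register-machine interpreter.
# A program is a list of instructions:
#   ('inc', i)       -- add 1 to register i
#   ('decjz', i, j)  -- if register i is 0 jump to instruction j,
#                       else subtract 1 and fall through
# The machine halts when the program counter leaves the program.

def run(program, registers, fuel=100_000):
    regs = list(registers)
    pc = 0
    while 0 <= pc < len(program) and fuel > 0:
        instr = program[pc]
        if instr[0] == 'inc':
            regs[instr[1]] += 1
            pc += 1
        else:
            _, i, j = instr
            if regs[i] == 0:
                pc = j
            else:
                regs[i] -= 1
                pc += 1
        fuel -= 1
    return regs

# Example: add R1 into R0, destroying R1 (R2 is never touched, so it
# stays 0 and decjz on it acts as an unconditional 'goto').
add = [
    ('decjz', 1, 3),  # 0: if R1 = 0, jump past the end (halt)
    ('inc', 0),       # 1: R0 := R0 + 1
    ('decjz', 2, 0),  # 2: goto 0
]

print(run(add, [5, 7, 0]))  # [12, 0, 0]
```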
Programming RMs
RMs are very simple, so programs become very large.
First of all, we’ll define ‘macros’ to make programs more understandable.
These are not functions or subroutines as in C or Java: they are like #define in C, human abbreviations for longer bits of code.
We’ll write them in English, e.g. ‘add Ri to Rj clearing Ri’.
When we define a macro, we’ll number instructions from zero. When the macro is used, the numbers must be updated appropriately (‘relocated’).
We’ll also use symbolic labels for instructions, and use them in jumps.
We can then build macros into the hardware by adding new instructions that may have access to special registers: we’ll use negative indices for registers that are not to be used by normal programs.
Convention: macros must leave all special registers they use at zero – and hence the special registers can be assumed to be zero on entry.
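Relocation can be shown concretely. In this illustrative Python sketch (my own, with instructions encoded as tuples ('inc', i) and ('decjz', i, j), which is an assumption rather than the course’s notation), splicing a macro in at position `base` shifts every jump target inside it by `base`; register renaming is left out for simplicity.

```python
# Sketch of macro relocation: a macro is numbered from 0, so when it is
# spliced into a program at position `base`, each internal jump target
# must be shifted by `base`.

def relocate(macro, base):
    out = []
    for instr in macro:
        if instr[0] == 'decjz':
            _, i, j = instr
            out.append(('decjz', i, j + base))
        else:
            out.append(instr)
    return out

# 'clear R0', numbered from zero; register 9 here plays the role of a
# special always-zero register, giving the embedded 'goto'.
clear = [
    ('decjz', 0, 2),  # 0: if R0 = 0, jump just past the macro
    ('decjz', 9, 0),  # 1: goto 0
]

print(relocate(clear, 5))  # [('decjz', 0, 7), ('decjz', 9, 5)]
```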
Easy RM programs
- ‘goto Ij’ using R−1 as temp:
    0: decjz(−1, j)
- ‘clear Ri’:
    0: decjz(i, 2)
    1: goto 0
- ‘copy Ri to Rj’ using R−2 as temp (the clear macro occupies instructions 0–1):
           0: clear Rj
    loop1: 2: decjz(i, loop2)
           3: inc(j)
           4: inc(−2)
           5: goto loop1
    loop2: 6: decjz(−2, end)
           7: inc(i)
           8: goto loop2
    end:   9:
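Expanding the copy macro’s own macros (‘clear’, ‘goto’) into raw decjz instructions and running it makes the bookkeeping concrete. This Python sketch is illustrative only: the slides’ special registers are mapped to ordinary ones (R2 as the R−2 temporary, R3 as a permanently-zero register providing ‘goto’), an assumption of the sketch rather than the course’s convention.

```python
# 'copy R0 to R1' fully expanded into inc/decjz, plus a tiny interpreter.

def run(program, registers, fuel=100_000):
    regs = list(registers)
    pc = 0
    while 0 <= pc < len(program) and fuel > 0:
        instr = program[pc]
        if instr[0] == 'inc':
            regs[instr[1]] += 1
            pc += 1
        else:
            _, i, j = instr
            if regs[i] == 0:
                pc = j
            else:
                regs[i] -= 1
                pc += 1
        fuel -= 1
    return regs

copy = [
    ('decjz', 1, 2),  # 0: clear R1: if R1 = 0 goto 2...
    ('decjz', 3, 0),  # 1: ...else goto 0
    ('decjz', 0, 6),  # 2: loop1: if R0 = 0 goto loop2
    ('inc', 1),       # 3: R1 += 1
    ('inc', 2),       # 4: temp += 1
    ('decjz', 3, 2),  # 5: goto loop1
    ('decjz', 2, 9),  # 6: loop2: if temp = 0 goto end (halt)
    ('inc', 0),       # 7: restore R0
    ('decjz', 3, 6),  # 8: goto loop2
]

print(run(copy, [4, 9, 0, 0]))  # [4, 4, 0, 0]
```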
RM programming exercises
- Addition/subtraction of registers
- Comparison of registers
- Multiplication of registers
- (integer) Division/remainder of registers
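As a sketch of one of these exercises, here is multiplication written directly in inc/decjz. This is my own illustrative solution, not a model answer: instructions are tuples, a small interpreter is included, and R4 is assumed permanently zero to provide ‘goto’. It computes R2 := R0 × R1 by adding R1 into R2, R0 times, using R3 as scratch to restore R1 after each pass.

```python
# R2 := R0 * R1, destroying R0 and preserving R1 (via scratch R3).

def run(program, registers, fuel=1_000_000):
    regs = list(registers)
    pc = 0
    while 0 <= pc < len(program) and fuel > 0:
        instr = program[pc]
        if instr[0] == 'inc':
            regs[instr[1]] += 1
            pc += 1
        else:
            _, i, j = instr
            if regs[i] == 0:
                pc = j
            else:
                regs[i] -= 1
                pc += 1
        fuel -= 1
    return regs

mult = [
    ('decjz', 0, 8),  # 0: outer: if R0 = 0, halt
    ('decjz', 1, 5),  # 1: inner: move R1 into R2 and R3
    ('inc', 2),       # 2:   result += 1
    ('inc', 3),       # 3:   scratch += 1
    ('decjz', 4, 1),  # 4:   goto 1
    ('decjz', 3, 0),  # 5: move scratch back into R1
    ('inc', 1),       # 6:
    ('decjz', 4, 5),  # 7:   goto 5
]

print(run(mult, [3, 4, 0, 0, 0]))  # [0, 4, 12, 0, 0]
```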
How many registers?
So far, we have introduced registers as needed. Do we need machines with arbitrarily many registers?
A pairing function is an injective function N × N → N.
An easy mathematical example is (x, y) ↦ 2^x · 3^y.
Given a pairing function f, write 〈x, y〉_2 for f(x, y). If z = 〈x, y〉_2, let z_0 = x and z_1 = y.
Exercise: program pairing and unpairing functions in an RM.
Exercise: design (or look up) a surjective pairing function.
Can generalize to 〈...〉_n : N^n → N. (Exercise: give two different ways.)
And thence to 〈...〉 : N* → N.
Thus we can compute with arbitrarily many numbers with a fixed number of registers.
With a couple of extra instructions (‘goto’ and ‘clear’), just two user registers are enough. (Think about what this means.)
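The example pairing function is easy to check in code. A Python sketch (illustrative; `unpair` assumes its argument really is a code, i.e. of the form 2^x · 3^y):

```python
# The slide's pairing function (x, y) -> 2^x * 3^y and its inverse.
# Injective by unique factorisation, but not surjective: 5, say, is not
# a code, since it has a prime factor other than 2 and 3.

def pair(x, y):
    return 2 ** x * 3 ** y

def unpair(z):
    x = 0
    while z % 2 == 0:
        z //= 2
        x += 1
    y = 0
    while z % 3 == 0:
        z //= 3
        y += 1
    return (x, y)

print(unpair(pair(5, 7)))  # (5, 7)
```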
Towards a universal RM
Programs are data.
Can we build an RM that can interpret any other RM?
Need to encode RMs as numbers. What do we need for a machine M?
We need the contents R of the registers R0, ..., Rm−1, the program P = I0 ... In−1 itself, and the ‘program counter’ C giving the current instruction.
Let ⌜...⌝ be the coding function given thus:
⌜inc(i)⌝ = 〈0, i〉
⌜decjz(i, j)⌝ = 〈1, i, j〉
⌜P⌝ = 〈⌜I0⌝, ..., ⌜In−1⌝〉
⌜R⌝ = 〈R0, ..., Rm−1〉
⌜M⌝ = 〈⌜P⌝, ⌜R⌝, C〉
Routine but very tedious Exercise: design an RM that takes an RM coding in R0, simulates it, and leaves the final state (if any) in R0.
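One concrete way to realize the coding (an illustrative choice of mine, not necessarily the one intended): pair with 2^x · 3^y, code sequences by iterated pairing with 0 for the empty sequence (safe, since a pair code is never 0), and code each instruction as the sequence the slide gives.

```python
# Coding programs as numbers, one workable scheme:
#   <x, y> = 2^x * 3^y;  <> = 0;  <x0, x1, ...> = pair(x0, <x1, ...>).
# Codes explode in size, so only tiny programs are feasible -- the point
# is that the coding is effective, not that it is efficient.

def pair(x, y):
    return 2 ** x * 3 ** y

def unpair(z):
    x = 0
    while z % 2 == 0:
        z //= 2
        x += 1
    y = 0
    while z % 3 == 0:
        z //= 3
        y += 1
    return (x, y)

def code_list(xs):
    z = 0
    for x in reversed(xs):
        z = pair(x, z)
    return z

def decode_list(z):
    xs = []
    while z != 0:
        x, z = unpair(z)
        xs.append(x)
    return xs

def code_instr(instr):  # inc(i) -> <0, i>; decjz(i, j) -> <1, i, j>
    if instr[0] == 'inc':
        return code_list([0, instr[1]])
    return code_list([1, instr[1], instr[2]])

def decode_instr(z):
    xs = decode_list(z)
    return ('inc', xs[1]) if xs[0] == 0 else ('decjz', xs[1], xs[2])

def code_program(prog):
    return code_list([code_instr(i) for i in prog])

def decode_program(z):
    return [decode_instr(c) for c in decode_list(z)]

prog = [('inc', 0), ('inc', 0)]
print(code_program(prog))  # 52488
print(decode_program(code_program(prog)))  # [('inc', 0), ('inc', 0)]
```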
and on to the halting problem
“the final state (if any)” . . .
Can we tell computationally whether (any) simulated machine halts, by some clever analysis of the program?
▶ Suppose H is an RM (PH, R0, . . .) which takes a machine coding ⌜M⌝ in R0, and terminates with 1 in R0 if M halts, and 0 in R0 if M runs forever.
▶ It’s then easy to construct L = (PL, R0, . . .), which takes a program (code) ⌜P⌝ and terminates with 1 if H returns 0 on the machine (P, ⌜P⌝), and itself goes into an infinite loop if H returns 1 on (P, ⌜P⌝).
▶ (L takes a program, and runs the halting test on the program with itself as input (diagonalization), and loops iff it halts.)
▶ What happens if we run L with input ⌜PL⌝?
▶ If L halts on ⌜PL⌝, that means that H says that (PL, ⌜PL⌝) loops; and if L loops on ⌜PL⌝, that means that H says that (PL, ⌜PL⌝) halts. So either way, contradiction.
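The contradiction in the last bullet is a mechanical two-case check. The sketch below (names are ours, not the notes’) encodes just the logic: L is built to do the opposite of whatever H predicts about L run on its own code, so H’s verdict is wrong in both cases:

```python
# Whatever verdict a purported halting-tester H gives on (PL, code of PL),
# L is built to do the opposite -- so H is wrong either way.
def L_actually_halts(h_says_halts: bool) -> bool:
    # L loops if H says 'halts', and halts if H says 'loops'
    return not h_says_halts

for verdict in (True, False):
    # a correct H would need verdict == L_actually_halts(verdict): impossible
    assert verdict != L_actually_halts(verdict)
print("H is wrong in both cases")
```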
remarks
That (sketch) proof assumed very little – it doesn’t actually need RMs, just a function of two inputs that can be understood as program and data. So why is it such a big deal?
Because we convinced ourselves (?) that RMs can compute anything that is computable in any reasonable sense. So we’ve shown that some things can’t be computed in any reasonable sense – which was not obvious.
But are Register Machines the right thing? Maybe other machines can do more?
Turing machines
(Reprise from Inf2A) Turing’s original paper (quite readable – read it!) used what we now call Turing machines.
A Turing machine comprises:
▶ a tape t = t0t1 . . . with an unlimited number of cells, each of which holds a symbol ti from a finite alphabet A;
▶ a head which can read and write one cell of the tape, and be moved one step left or right; let h be the position of the head;
▶ a finite set S of control states;
▶ a next function n : S × A → (S × A × {+1, −1}) ∪ {halt}.
The execution of the machine is:
▶ Suppose the current state is s, and the head position is h; then let (s′, a′, k) = n(s, th): cell h is written with the symbol a′, the head moves to position h + k, and the next state is s′;
▶ if n(s, th) = halt, or h + k < 0, the machine halts.
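The definition above is directly executable. A minimal interpreter following it, with a toy machine (the bit-flipper and the blank symbol 'B' are our own illustrative assumptions, not from the notes):

```python
# A minimal Turing-machine interpreter following the slide's definition:
# next function n : S x A -> (S x A x {+1, -1}) | halt.
def run_tm(n, tape, state, max_steps=10_000):
    """Run until n returns 'halt', the head falls off the left end,
    or max_steps is exceeded (a budget so this sketch always returns)."""
    tape = dict(enumerate(tape))       # sparse tape, default blank 'B'
    h = 0
    for _ in range(max_steps):
        step = n(state, tape.get(h, 'B'))
        if step == 'halt':
            return tape, h
        state, a, k = step
        tape[h] = a                    # write, then move the head
        h += k
        if h < 0:                      # moving off the left end halts it
            return tape, h
    raise RuntimeError('step budget exhausted')

# Example machine (an assumption): flip 0/1 bits rightwards until a blank.
def flip(state, sym):
    if sym == 'B':
        return 'halt'
    return ('s', {'0': '1', '1': '0'}[sym], +1)

tape, _ = run_tm(flip, list('0110'), 's')
print(''.join(tape[i] for i in range(4)))  # → 1001
```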
Programming Turing machines
How to represent numbers? In binary, with 0 and 1? In unary, with just 1?
Typically use binary. Usually convenient to have a ‘blank’ symbol, and a few marker symbols such as $.
For further information and examples, see Sipser ch. 3.
Exercise: Design a TM (with any convenient alphabet) to interpret an RM program.
Exercise: Design an RM to interpret a TM specification.
(Don’t really do these – just think about how they can be done.)
A predecessor course
http://www.inf.ed.ac.uk/teaching/courses/ci/
provides a Java TM simulator and a bunch of TMs, including a universal TM!
TM variations
Bi-infinite tapes, multiple tapes, etc. may make programming easier, but aren’t necessary.
One semi-infinite tape and two symbols are all that is needed.
Computable functions
Much of what we say henceforth could be about any reasonable ‘domain’: graphs, integers, rationals, trees. We will use N as the canonical domain, and rely on the (usually obvious) fact that any reasonable domain can be encoded into N.
A (total) function f : N → N is computable if there is an RM/TM which computes f (e.g. by taking x in R0 and leaving f (x) in R0) and always terminates.
Note that predicates (e.g. ‘does the machine halt?’) are functions, viewing 0 as ‘no’ and 1 as ‘yes’.
You will also see the term ‘recursive function’. This is confusing, because it doesn’t mean ‘recursive’ in the sense you know, though there is a strong connection. This terminology is slowly going out of use. We may come to it . . .
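As a concrete instance of a predicate-as-function: primality is a total computable function N → {0, 1}. A sketch (trial division is our illustrative choice of algorithm):

```python
# A predicate as a total function N -> {0, 1}: 0 = 'no', 1 = 'yes'.
def is_prime(n: int) -> int:
    if n < 2:
        return 0
    d = 2
    while d * d <= n:      # trial division always terminates,
        if n % d == 0:     # so the predicate is total and computable
            return 0
        d += 1
    return 1

print([n for n in range(20) if is_prime(n)])  # → [2, 3, 5, 7, 11, 13, 17, 19]
```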
Proving (un)decidability
Is the halting problem the only natural (?) undecidable problem?
How can we show that a ‘real’ problem is decidable or undecidable?
To show a problem decidable: write a program to solve it, prove the program terminates. Alternatively, translate it to a problem you already know how to decide.
Why doesn’t this contradict the halting theorem?
To show a problem undecidable: prove that if you could solve it, you could also solve the halting problem.
Decision problems and oracles
A decision problem is a set D (domain) and a subset Q (query) of D. The problem is ‘is d ∈ Q?’, or ‘is Q(d) 0 or 1?’. The problem is computable or decidable iff the predicate Q is computable.
For example:
▶ the domain N and the subset Primes;
▶ the domain RM of register machine (encodings), and the subset H that halt.
Given a decision problem (D, Q), an oracle for Q is a ‘magic’ RM instruction oracleQ(i) which assumes that Ri contains (an encoding of) d ∈ D, and sets Ri to contain Q(d).
If Q is itself decidable, oracleQ(i) can be replaced by a call to an RM which computes Q – thus a decidable oracle adds nothing.
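In programming terms an oracle is a black-box subroutine with the interface d ↦ Q(d); when Q is decidable the box can be an ordinary program, which is why such an oracle adds nothing. A sketch (all names are our own assumptions):

```python
from typing import Callable

Oracle = Callable[[int], int]        # d |-> Q(d) in {0, 1}

def evens_oracle(d: int) -> int:
    # Q = 'even numbers' is decidable, so this 'oracle' is an ordinary
    # computable function and adds no power
    return 1 if d % 2 == 0 else 0

def machine_with_oracle(d: int, oracle: Oracle) -> int:
    # corresponds to an RM executing oracleQ(i) on register contents d
    return oracle(d)

print(machine_with_oracle(6, evens_oracle))  # → 1
```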
Reductions
Reductions are a key technique. Simple concept: turn one problem into another. But there are subtleties.
A Turing transducer is an RM that takes an instance d of a problem (D, Q) in R0 and halts with an instance d′ = f (d) of (D′, Q′) in R0. (Thus f is a computable function D → D′.)
A mapping reduction or many–one reduction from Q to Q′ is a Turing transducer f as above such that d ∈ Q iff f (d) ∈ Q′.
Equivalently, an m-reduction is a Turing transducer which after putting f (d) in R0 then calls oracleQ′(0) and halts.
NOTE that there is no post-processing of the oracle’s answer!
Is this intuitively reasonable?
If you allow the transducer to call the oracle freely, you get a Turing reduction. These are also useful; but they’re more powerful, so don’t make such fine distinctions of computing power.
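On decidable toy problems the shape of an m-reduction is easy to exhibit. In this example of ours, ‘is n even?’ m-reduces to ‘is m divisible by 4?’ via the transducer f(n) = 2n: one oracle call at the end, no post-processing of its answer:

```python
# An m-reduction is a computable map f with: d in Q  iff  f(d) in Q'.
def f(n: int) -> int:
    return 2 * n            # the Turing transducer (n even iff 2n div. by 4)

def Qprime(m: int) -> int:  # the target problem's decider, playing the oracle
    return 1 if m % 4 == 0 else 0

def Q_via_reduction(n: int) -> int:
    # put f(d) 'in R0', call the oracle once, halt -- no post-processing
    return Qprime(f(n))

assert all(Q_via_reduction(n) == (1 if n % 2 == 0 else 0) for n in range(50))
```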
Showing undecidability
Reductions are the key (really the only) tool for showing undecidability, thus:
▶ Given (D, Q), construct a reduction Red(H, Q) from (RM/TM, H) to (D, Q).
▶ Suppose Q is decidable. Then given M ∈ RM, feed it to Red(H, Q) to decide M ∈ H. But if Q is decidable, we can replace the oracle, and Red(H, Q) is just an ordinary RM. Hence H is decidable – contradiction. So Q must be undecidable.
Constructing Red is usually either very easy, or rather long and difficult!
For a relatively simple example of a long and complicated reduction, see https://hal.archives-ouvertes.fr/hal-00204625v2, a simplified proof by Nicolas Ollinger of the famous ‘tiling problem’ due to Hao Wang.
The Uniform Halting Problem
A simple example:
The Uniform Halting Problem (UH) asks, given a machine M, does M halt on all inputs?
‘Clearly’ at least as hard as H, so must be undecidable. Proof:
▶ We need an m-reduction from H to UH:
▶ given machine M, input R, build a machine M′ which ignores its input and overwrites it with R, then behaves as M.
▶ Then M′ halts on any input iff M halts on R.
Check that this really is a transducer.
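At a high level the construction is just baking the fixed input in. The sketch below uses Python functions as stand-ins for machine codes, purely for illustration; a real transducer manipulates the numbers ⌜M⌝:

```python
# Sketch of the H-to-UH transducer: from (M, R) build M' which ignores
# its own input and runs M on the fixed input R. Representing machines
# as Python functions is an assumption for illustration only.
def build_M_prime(M, R):
    def M_prime(_ignored_input):
        return M(R)          # behaviour is independent of the actual input
    return M_prime

# M' halts on every input iff M halts on R:
M = lambda r: r + 1          # a 'machine' that always halts
Mp = build_M_prime(M, 5)
print(Mp(0) == Mp(999) == 6) # → True
```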
The Looping Problem
Let L be the subset of RMs (or TMs) that go into an infinite loop. Show that L is undecidable.
Since L is just the complement of H, this seems equally easy: just flip the answer. But m-reductions can’t post-process the answer.
Can you devise an m-reduction from H to L?
No! You can’t.
Can you devise a Turing reduction from H to L?
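A Turing reduction from H to L is immediate, precisely because post-processing is allowed. A sketch (the oracle is hypothetical; the toy stand-in exists only so the demo runs):

```python
# With a (hypothetical) oracle for L, deciding H is one call plus a bit-flip
# -- exactly the post-processing an m-reduction forbids.
def decide_H(m, L_oracle):
    return 1 - L_oracle(m)

# demo with a stand-in 'oracle' on toy data: pretend machine 0 loops
toy_L_oracle = lambda m: 1 if m == 0 else 0
print(decide_H(0, toy_L_oracle), decide_H(7, toy_L_oracle))  # → 0 1
```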
When ‘yes’ is easier than ‘no’
The halting problem is not symmetrical:
I if M ∈ H, then we can determine this: run M, and when it halts, say ‘yes’;
I if M ∉ H, then we can’t determine this: run M, but it never stops, and we can never say ‘no’.
Such problems are called semi-decidable.
The problem L is the opposite: we can determine ‘no’, but not ‘yes’. It is called co-semi-decidable.
There are two ways of exploiting this asymmetry – we’ll see them quickly now, and in more detail in the second section of the course.
32
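The asymmetry can be made concrete by modelling a run of M as an iterator of computation steps, finite exactly when M halts – an assumption of this sketch:

```python
def semi_decide_H(steps):
    """Semi-decider for H: consume the steps of M; if M halts, the loop
    finishes and we say 'yes'.  If M runs forever, we never return."""
    for _ in steps:
        pass
    return "yes"

# A halting run: finitely many steps.
print(semi_decide_H(iter(range(10))))  # yes
# semi_decide_H(itertools.count()) would never return: we can't say 'no'.
```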
Enumeration
It’s convenient to talk of RMs that ‘output an infinite list’ – as in i = 0; while ( true ) print i++; – which can be formalized in several ways. (Exercise: think of a few.)
Suppose (N,Q) is decidable. Then we can write a machine which outputs the set Q:
I for each n ∈ {0, 1, 2, . . .}, compute Q(n) and output n iff Q(n);
I thus every n ∈ Q will eventually be output.
This is an enumeration of Q, and we say that Q is computably enumerable (c.e. for short).
H is not decidable – but can we still enumerate it?
33
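The two bullets above can be sketched directly as a generator; any Python predicate on naturals stands in for the decidable Q:

```python
from itertools import count, islice

def enumerate_Q(Q):
    """Given a total computable predicate Q, output n for each n with
    Q(n) - an enumeration of the set Q."""
    for n in count():
        if Q(n):
            yield n

# Enumerate the even numbers (a decidable set):
print(list(islice(enumerate_Q(lambda n: n % 2 == 0), 5)))  # [0, 2, 4, 6, 8]
```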
Enumeration by interleaving
Observe that the set of valid register machine encodings is decidable: given n, we can check whether n is ⌜M⌝ for a syntactically valid machine M.
Therefore we can enumerate ⌜M0⌝, ⌜M1⌝, . . .
Now consider the following pseudo-code:
machineList := 〈〉    (a sequence storing a list of machine (code)s)
for (i := 0; true; i++):
    add ⌜Mi⌝ to machineList
    foreach ⌜M⌝ in machineList:
        run M for one step and update in machineList
        if M has halted:
            output ⌜M⌝
            delete ⌜M⌝ from machineList
This program outputs H: every halting machine is eventually output.
H is computably enumerable.
34
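The pseudo-code above can be sketched concretely if we model each machine as a pair (code, step-iterator), the iterator being finite exactly when the machine halts – an illustrative assumption:

```python
from itertools import count, islice

def enumerate_H(machines):
    """Interleaving: repeatedly admit the next machine, then run every
    live machine for one step; output a machine's code when it halts."""
    live = []
    src = iter(machines)
    while True:
        live.append(next(src))              # admit the next machine code
        survivors = []
        for code, run in live:
            try:
                next(run)                   # run M for one step
                survivors.append((code, run))
            except StopIteration:           # M has halted
                yield code
        live = survivors

# Toy world: machine i halts after i steps if i is even, else loops forever.
def machine(i):
    return (i, iter(range(i)) if i % 2 == 0 else iter(count()))

print(list(islice(enumerate_H(machine(i) for i in count()), 5)))  # [0, 2, 4, 6, 8]
```

Every halting machine is eventually output, and the non-halting ones merely soak up one step per round without blocking anything.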
Semi-decidability and enumerability
The interleaving technique lets us see that
I Any semi-decidable problem is also computably enumerable
Exercise: Show that the converse is true:
I Any c.e. problem is semi-decidable.
Now we’ve shown that semi-decidability is the same as c.e.-ness.
Note that if (D,Q) is semi-decidable, then (D,D\Q) is co-semi-decidable.
35
Back to Looping
Suppose that (D,Q) is both semi-decidable and co-semi-decidable.
I So Q and D \ Q are both semi-decidable.
I So run a semi-deciding machine for Q in parallel with a semi-decider for D \ Q, using interleaving.
I Whatever instance d we look at, one of the two will halt – so stop then.
So (D,Q) is decidable.
Now we know that L cannot be semi-decidable, as it’s the complement of semi-decidable H.
(Exercise: When I say the complement of L is H, what is the domain D? Do I need to take care?)
36
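The parallel run can be sketched with the same step-iterator model as before: each semi-decider's run is an iterator that is finite iff that side halts, and by assumption exactly one of them does.

```python
from itertools import count

def decide(yes_run, no_run):
    """Alternate one step of the Q-semi-decider and one step of the
    (D \\ Q)-semi-decider; answer according to whichever halts first."""
    while True:
        try:
            next(yes_run)
        except StopIteration:
            return True     # the 'yes' side halted: d is in Q
        try:
            next(no_run)
        except StopIteration:
            return False    # the 'no' side halted: d is not in Q

print(decide(iter(range(3)), iter(count())))   # True
print(decide(iter(count()), iter(range(3))))   # False
```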
Uniform Halting again
We showed, easily, that UH was undecidable. Is it semi-decidable?
We used reductions from H to show un-decidability. Do reductions from L show un-semi-decidability? Yes!
Suppose f reduces Y to X, and suppose that X is semi-decidable by machine MX. Then since Y(y) = X(f(y)), if Y(y) = 1 we can translate y to f(y), run MX, and get 1. If Y(y) = 0, then MX applied to f(y) either gives 0, or doesn’t halt.
So if X is semi-decidable, so is Y; or if Y is not semi-decidable, then X can’t be.
37
Reducing L to UH
We want a transducer f(M,R) such that f(M,R) halts on all inputs iff M loops on R. Not obvious . . . we could ignore the inputs to f(M,R), but how do we make it stop?
By a clever trick: f(M,R) will measure how long M runs for, with a timeout as its input.
I Let M′ = f(M,R) be a machine which takes a number n as input, and then simulates M on R for (at most) n steps.
I If M halts on R before n steps, then M′ goes into a loop;
I while if M hasn’t halted, M′ just halts.
I Hence if M loops on R, M′ halts on all inputs n; and if M halts, then M′ loops on all sufficiently large n, and so doesn’t halt on all inputs.
38
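A sketch of the timeout trick, with a run of M on R modelled as a step-iterator produced by run_on (a hypothetical name for the simulator):

```python
from itertools import count

def reduce_L_to_UH(run_on, M, R):
    """Build M': on input n, simulate M on R for at most n steps.
    If M halted within n steps, M' loops; otherwise M' halts."""
    def M_prime(n):
        steps = run_on(M, R)       # a fresh simulation of M on R
        for _ in range(n):
            try:
                next(steps)
            except StopIteration:  # M halted within n steps...
                while True:        # ...so M' goes into a loop
                    pass
        return "halt"              # M still running after n steps: M' halts
    return M_prime

# Toy case: M loops on R, so M' halts on every input n.
looping_run = lambda M, R: iter(count())
M_prime = reduce_L_to_UH(looping_run, "M", "R")
print(M_prime(0), M_prime(1000))  # halt halt
```

(The other direction cannot be demonstrated by running it: if M halts on R, then M' on a large enough n genuinely diverges.)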
So UH is not . . .
The reduction from L shows that UH is not semi-decidable.
Is UH co-semi-decidable (like L)?
No! The (trivial) reduction from H to UH also shows that UH is not co-semi-decidable.
[Diagram: UH lies in Π^0_2, strictly above both semi-decidable (Σ^0_1), which contains H, and co-semi-decidable (Π^0_1), which contains L; these two classes meet in decidable (∆^0_1).]
H has form ‘∃n. M halts after n’; L has form ‘∀n. M doesn’t halt after n’; UH has form ‘∀R. ∃n. M halts after n on R’. Really the diagram should be . . .
39
Harder and harder
Suppose we consider RM^H, the class of register machines with an oracle for H.
We can easily adapt our universal machine to produce a universal machine for this class.
So we can apply diagonalization, and show that:
I There is no RM^H machine that computes the halting problem for RM^H.
We can keep doing this, and generate harder and harder problems – the arithmetical hierarchy. Sadly, we don’t have (much) time to explore this fascinating topic.
40
Complexity of algorithms and problems
Recall that in Inf2B (and in ADS if you did it) we calculated the running cost of algorithms: typically as a function f(n) of the size n of the input.
Recall big-O notation: f ∈ O(g) means that f is asymptotically bounded above by g, or formally
∃c, n0. ∀n > n0. f(n) ≤ c · g(n)
Similarly f ∈ Ω(g) means f is bounded below by g:
∃c, n0. ∀n > n0. f(n) ≥ c · g(n)
Recall that for some problems, we could show that any algorithm must take at least a certain time. E.g. any comparison-based sorting algorithm must be Ω(n lg n), and we have an O(n lg n) algorithm.
41
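The quantifier structure of the definition can be spot-checked for a claimed witness pair (c, n0) – a finite numeric check only, not a proof; the function name is illustrative:

```python
def witnesses_O(f, g, c, n0, trials=1000):
    """Check f(n) <= c*g(n) for n0 < n <= n0 + trials: a finite spot
    check of the definition of f in O(g) for the claimed witnesses."""
    return all(f(n) <= c * g(n) for n in range(n0 + 1, n0 + 1 + trials))

# 3n^2 + 5 is O(n^2), witnessed by c = 4, n0 = 3:
print(witnesses_O(lambda n: 3*n*n + 5, lambda n: n*n, c=4, n0=3))  # True
# n^2 is not O(n): the claimed witnesses c = 100, n0 = 10 fail at n = 101:
print(witnesses_O(lambda n: n*n, lambda n: n, c=100, n0=10))       # False
```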
Variations in models of computing
In Inf2/ADS, there was the question of what counts as a step of computation.
E.g. in sorting, we counted the number of comparisons – and assumed that it costs as much to compare a small number as a big number.
Reasonable, for numbers that fit in a machine word!
What about the cost of control flow etc.?
Do we care about whether we’re using Turing machines, register machines,random access machines, or whatever?
On a register machine, adding two numbers is an O(n) operation (because of the decrement/increment loop). But if we add addition as a primitive, it’s O(1). On a Turing machine, it’s O(lg n) (work in binary, and do primary-school long addition).
It would be nice to ignore these differences.
42
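The O(n) register-machine addition is just the decrement/increment loop; here is a sketch that also counts loop iterations (the step-counting convention is an assumption of the sketch):

```python
def rm_add(ri, rj):
    """Basic-RM addition: move Rj into Ri one unit at a time.
    Takes Rj loop iterations: linear in the value, hence exponential
    in its bit-length."""
    steps = 0
    while rj > 0:
        rj -= 1      # dec(j)
        ri += 1      # inc(i)
        steps += 1
    return ri, steps

print(rm_add(3, 4))  # (7, 4): result 7, after 4 loop iterations
```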
Time to extend our RMs a bit
Our very basic RMs essentially work in unary: that’s why addition is O(n) instead of O(lg n).
This is too big a slow-down to ignore – it’s exponential in the size (number of bits = lg n) of the input!
So from now on we assume instructions add(i, j) and sub(i, j), which add (respectively truncated-subtract) Rj to (from) Ri, putting the result in Ri. (This is enough to do everything else.)
Now our machines are too fast, because we’re adding in time 1, instead of in time O(lg n) = O(bits). For our purposes now, this is just a linear factor, so doesn’t matter. If you prefer, you can refine the machine to have registers implemented as bit sequences, with bit manipulation instructions.
43
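A minimal sketch of the two new instructions acting on a register file, with a plain Python list standing in for the registers:

```python
def add(R, i, j):
    """add(i, j): Ri := Ri + Rj, counted as one step."""
    R[i] = R[i] + R[j]

def sub(R, i, j):
    """sub(i, j): Ri := max(Ri - Rj, 0) - truncated subtraction, one step."""
    R[i] = max(R[i] - R[j], 0)

R = [5, 3]
add(R, 0, 1)   # R == [8, 3]
sub(R, 0, 1)   # R == [5, 3]
sub(R, 1, 0)   # R == [5, 0]: 3 - 5 truncates to 0
print(R)
```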
What counts as seriously different?
Can we separate easy problems from hard problems?
If a problem is O(n) on some model, it’s surely easy on any model.
Actually, O(n) is not ‘easy’ when n is a petabyte. There’s a whole field of ‘sub-linear’ algorithms.
If a problem is Ω(2^n) on some (non-unary) model, it’s surely hard on any model.
Actually, there are problems that are much worse than Ω(2^n), but still quite solvable for real examples. We’ll see one later.
What about something that is O(n^10) or Ω(n^10)?
An n^10 problem is probably practically insoluble – but maybe there’s a trick or a fancy new model that makes it O(n^2)?
There’s also that constant c . . . if f(n) ≥ 10^100 · lg n, that’s not very tractable. But this doesn’t often happen.
44
P – the class of polynomial-time problems
A problem (decision problem or function) or algorithm is in P or PTime if it can be computed in time f(n) ∈ O(n^k) for some k.
The class P is robust: any reasonable change of model doesn’t change it, and moreover any reasonable translation of one problem into another preserves membership in P.
Problems in P are called tractable.
This is somewhat bogus (see previous slide).
Nonetheless, a problem in P is generally easy-ish in practice, and a problem not in P is generally hard in practice.
Most of the algorithms you’ve seen are in P.
Note that if a problem is not in P, it’s Ω(n^k) for every k (and so ω(n^k) for every k). E.g. 2^n, or 2^√n.
45
Polynomial-time reductions
Just as with computability, reductions are the key tool for comparing the complexity of problems.
A PTime reduction or polynomial reduction from (D,Q) to (D′,Q′) is:
I a PTime-computable function r : D → D′, such that
I d ∈ Q ⇔ r(d) ∈ Q′
We say Q ≤P Q′.
Clearly, if Q′ ∈ P and Q ≤P Q′, then also Q ∈ P.
Exercise: If we replace ‘PTime-computable’ by ‘computable’, do we get the notion of many–one reduction we used before, or the notion of Turing reduction? Does it matter?
For function problems, it makes sense to use a Turing reduction. Exercise: Express this in terms of oracles.
46
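As a toy illustration (the problems here are invented for this sketch, not from the course): take Q = ‘is n odd?’ and Q′ = ‘is m even?’ over the natural numbers. The map r(n) = n + 1 is PTime-computable and satisfies n ∈ Q ⇔ r(n) ∈ Q′, so Q ≤P Q′, and a decider for Q′ immediately yields one for Q:

```python
def r(n):
    # PTime-computable reduction: ODD <=_P EVEN
    return n + 1

def in_Q_prime(m):
    # decider for Q' = 'is m even?'
    return m % 2 == 0

def in_Q(n):
    # decide Q via the reduction: n in Q  <=>  r(n) in Q'
    return in_Q_prime(r(n))

assert in_Q(3) and not in_Q(4)
```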
Outside P
What does it mean to say that a problem cannot be computed by a polynomial algorithm? What is an algorithm?
A polynomially-bounded RM is an RM together with a polynomial (wlog n^k for some k), such that if the size of the input (e.g. the number of bits of the number in R0) is n_0, then the machine halts after executing n_0^k instructions.
Then formally a problem Q is in P iff it can be computed by some polynomially-bounded RM.
This avoids the problem of having to analyse the running time of the machine (which we know is impossible in general!).
To show that Q ∉ P, we therefore have to show that every polynomially-bounded RM claiming to solve Q will time out on some input.
This is not all that easy . . .
47
Some apparently intractable problems
Practical problems that are hard to solve typically seem to need exploring many (exponentially many) possible answers.
The Hamiltonian Path Problem takes a graph G = (V, E), and asks if there is a Hamiltonian path, i.e. a path that visits every vertex exactly once.
Naively, the only way to solve it is to explore all paths, and there could be up to |V|! of them. Factorial grows slightly faster than exponential, so this is bad . . .
Timetabling is the following: given students taking exams, and timetable slots for exams, is it possible to schedule the exams so that there are no clashes? It also apparently requires looking at exponentially many possible assignments. (That’s why Registry starts timetabling exams 9 weeks in advance. . . )
48
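A minimal sketch of the naive HPP algorithm, trying all |V|! vertex orderings (the graph encoding and function names are mine):

```python
from itertools import permutations

def has_hamiltonian_path(vertices, edges):
    """Naive HPP: try every ordering of the vertices — up to |V|! candidates."""
    edge_set = {frozenset(e) for e in edges}
    for order in permutations(vertices):
        # a Hamiltonian path uses an edge between each consecutive pair
        if all(frozenset((a, b)) in edge_set
               for a, b in zip(order, order[1:])):
            return True
    return False

# the path graph 1-2-3 has a Hamiltonian path;
# the star on 4 vertices (centre 1) does not
assert has_hamiltonian_path([1, 2, 3], [(1, 2), (2, 3)])
assert not has_hamiltonian_path([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4)])
```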
Checking vs solving
Both HPP and Timetabling have the property that they are easy to check: if you have a claimed solution, it’s quick to verify that it is a solution.
This is a feature of many practical intractable problems.
Can we study and explore this property?
49
Non-deterministic register machines
In Inf2A, you saw non-deterministic finite automata: multiple possible transitions from each (state, input) pair. These were equivalent to deterministic automata, albeit with an exponential blow-up in states.
We can do the same with RMs (or TMs). For RMs, probably easiest to add a new instruction maybe(j), which non-deterministically either does nothing or jumps to Ij:
      clear R0
beg:  maybe(end)
      inc(0)
      goto beg
end:
puts a non-deterministically chosen number in R0.
There is a path which just loops for ever – this doesn’t matter. Usually we want to choose from a finite set, so not an issue anyway.
There is no notion of probability here. Probabilistic RMs (e.g. maybe chooses with equal prob.) are another extension.
50
An NRM accepts if some run (sequence of instructions through the choices) halts and accepts.
‘run accepts’ could mean: halts; halts with 1 in R0; or whatever.
Exercise: explore a few possibilities – does it matter?
Thus infinite runs are ignored.
I sometimes prefer to think of maybe as fork: the machine forks a copy of itself which takes the jump. If any copy accepts, it signals the OS, which kills off all the others.
How does this work for problems with multiple correct answers?
We can use an RM to simulate all the runs of an NRM using the interleaving technique. Thus:
I NRMs have the same deciding power as RMs.
Exercise: Read the details of the TM version in Sipser.
51
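The interleaving idea can be sketched for the tiny maybe-loop above: keep a queue of configurations and advance each run one step in turn, so an accepting run is found even though one branch loops for ever (the instruction encoding below is mine, not the course’s RM syntax):

```python
from collections import deque

# the slide's program — beg: maybe(end); inc(0); goto beg — with R0 cleared first.
# a configuration is (program counter, value of R0); pc 3 is the 'end' label.
def step(config):
    pc, r0 = config
    if pc == 0:                      # beg: maybe(end)
        return [(1, r0), (3, r0)]    # two successors: fall through, or jump to end
    if pc == 1:                      # inc(0)
        return [(2, r0 + 1)]
    if pc == 2:                      # goto beg
        return [(0, r0)]
    return []                        # end: halted

def accepts(target):
    """Interleave all runs breadth-first; True if some run halts with R0 == target."""
    queue = deque([(0, 0)])
    while queue:
        pc, r0 = queue.popleft()
        if pc == 3:                  # this run halted
            if r0 == target:
                return True
            continue
        queue.extend(step((pc, r0)))
    return False

assert accepts(0) and accepts(3)     # every natural number is reachable in R0
```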
NPTime
NRMs are, however, potentially exponentially faster than RMs: in time n, an NRM can fork/explore 2^O(n) possibilities.
We say Q ∈ NP if there is a polynomially-bounded NRM that computes Q.
I HPP is in NP: guess the Hamiltonian path (if there is one), and check it (in linear time).
I Timetabling is in NP: guess a timetable, and check it for clashes (in maybe O(n^2) time?).
Is this just theory? We can’t do non-determinism in practice . . . can we? Quantum computers can achieve a similar effect: an n-qubit computer computes on all 2^n values simultaneously. But it’s hard to get many qubits; and there are subtleties – not every NP algorithm is quantum-computable (as far as we know).
52
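The ‘check’ half of guess-and-check is just certificate verification. For HPP, a claimed path is quick to verify: each vertex appears exactly once and consecutive vertices are adjacent (a sketch; the encoding and names are mine):

```python
def verify_hamiltonian_path(vertices, edges, path):
    """Verify a claimed Hamiltonian path certificate."""
    if sorted(path) != sorted(vertices):      # every vertex exactly once
        return False
    edge_set = {frozenset(e) for e in edges}  # O(|E|) preprocessing
    return all(frozenset((a, b)) in edge_set  # each step uses a real edge
               for a, b in zip(path, path[1:]))

# on the path graph 1-2-3: the certificate [1,2,3] checks out, [1,3,2] does not
assert verify_hamiltonian_path([1, 2, 3], [(1, 2), (2, 3)], [1, 2, 3])
assert not verify_hamiltonian_path([1, 2, 3], [(1, 2), (2, 3)], [1, 3, 2])
```

Solving means finding such a certificate among exponentially many candidates; checking one is cheap.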
Is NP all we need?
Is every apparently exponential problem really an NP problem?
No! Or at least, it will be very surprising if they are.
I Consider the problem of determining whether an exponentially bounded RM accepts.
I There is no shortcut, and nothing to guess. You just have to run it and see.
I Alas, this isn’t a proof – we don’t have a proof that there isn’t an NP algorithm to do it, but (almost) nobody believes there is.
To say nothing of doubly-exponential, non-elementary, etc. (See later.)
53
Are all NP problems equally hard?
What does this mean? How do we measure hardness?
We’ll measure hardness relative to ≤P.
I Q is easier than (or equally hard as) Q′ iff Q ≤P Q′.
I We say Q is NP-hard iff Q ≥P Q′ for every Q′ ∈ NP.
Truly exponential problems are NP-hard, but that’s boring.
I We say Q is NP-complete iff Q is NP-hard and Q ∈ NP.
I Obviously if Q is NP-complete, and NP ∋ Q′ ≥P Q, then Q′ is also NP-complete.
Do NP-complete problems necessarily exist?
They do exist, and there are many, including HPP and Timetabling. In fact, almost all NP problems encountered in practice are either in P, or NP-complete.
54
The Cook–Levin theorem
states that a particular NP problem, SAT, is NP-complete.
The theorem is usually proved for TMs; we shall do it for RMs.
The concept is simple, but there are details. Most details are left for you.
Why ‘Cook–Levin’? The notion of NP-completeness, and the theorem, were due to Stephen Cook (and partly Richard Karp) – in the West. But as with many major mathematical results of the mid-20th century, they were discovered independently in the Soviet Union, by Leonid Levin. Since the fall of the Iron Curtain made Soviet maths more accessible, we try to attribute results to both discoverers.
55
SAT
SAT is a simple problem about boolean logic:
Given a boolean formula φ over a set of boolean variables Xi, is there an assignment of values to the Xi which satisfies φ (i.e. makes φ true)?
Equivalently, is φ non-contradictory?
I (A ∨ B) ∧ (¬B ∨ C) ∧ (A ∨ C) is satisfiable, e.g. by making A and C true.
I (A ∧ B) ∧ (¬B ∧ C) ∧ (A ∧ C) is not satisfiable.
The size of a SAT problem is simply the number of symbols in φ.
SAT is obviously in NP: just guess an assignment and check it.
It’s also apparently exponential in reality: no obvious way to avoid checking all possible assignments (the truth table method).
56
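The truth-table method in miniature, applied to the two example formulas above (formulas are encoded here as Python predicates for the sketch; this is not the course’s representation):

```python
from itertools import product

def brute_force_sat(formula, variables):
    """Truth-table method: try all 2^n assignments — exponential in the number of variables."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment      # satisfying assignment found
    return None                    # unsatisfiable

# (A or B) and (not B or C) and (A or C) — satisfiable, e.g. A and C true
phi = lambda v: (v['A'] or v['B']) and (not v['B'] or v['C']) and (v['A'] or v['C'])
assert brute_force_sat(phi, ['A', 'B', 'C']) is not None

# (A and B) and (not B and C) and (A and C) — needs both B and not B: unsatisfiable
psi = lambda v: (v['A'] and v['B']) and (not v['B'] and v['C']) and (v['A'] and v['C'])
assert brute_force_sat(psi, ['A', 'B', 'C']) is None
```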
SAT is NP-complete
Suppose (D,Q) ∈ NP. We shall construct a reduction Q ≤P SAT.
Outline of proof:
Given an instance d ∈ D, design a boolean formula φd which can be satisfied if its variables describe the successful executions of an NRM checking Q. The machine can be polynomially bounded, so the size of φd is polynomial in the size of d.
The rest is detail.
57
Detail
We have an instance d ∈ D.
Since (D,Q) ∈ NP, there is a polynomially bounded NRM M = (R0, . . . , Rm−1, I0 . . . In−1) that computes Q. Let p(x) be the polynomial bound, and let s = p(|d|). (s for ‘number of steps’.) (Note: the empty instruction In is the halting state.)
So M takes at most s steps.
How big can the values in the registers get? Worst case: start with 2^|d|, and execute s add(0, 0) instructions, giving 2^(|d|+s). I.e. at most |d| + s bits. Assume wlog s ≥ |d|, so at most 2s bits.
We now define a large, but O(poly(s)), number of variables, with intended interpretations.
‘Intended’ because we will design φd so that these interpretations do what we want.
58
Variables

Name   Meaning                               How many?
Ctj    program counter at step t is on Ij    s · n
Rtik   kth bit of Ri at step t               s · m · 2s

Formulas

Name    Meaning                           How big?
χone    prog cntr in one place            s · n^2
χt      step t + 1 follows from step t    s^2 · m
ρinit   initial register values           m · n
α       machine accepts                   s

The formula C00 ∧ ρinit ∧ χone ∧ (∧t χt) ∧ α is then satisfied by assignments corresponding to valid executions of the machine.

The χone formula is easy: ∧t ∨j (Ctj ∧ ∧j′≠j ¬Ctj′)
59
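The ‘exactly one program counter position’ constraint χone can be sanity-checked in code; a small sketch (function name `chi_one` is mine), treating the variables Ctj as a boolean matrix C[t][j]:

```python
def chi_one(C):
    """C[t][j] == True means the program counter is on I_j at step t.
    Returns True iff at every step the counter is in exactly one place:
    AND over t of OR over j of (C[t][j] AND all C[t][j'] false for j' != j),
    i.e. exactly the structure of the chi_one formula."""
    return all(
        any(row[j] and all(not row[jp] for jp in range(len(row)) if jp != j)
            for j in range(len(row)))
        for row in C)

assert chi_one([[True, False], [False, True]])       # one position per step
assert not chi_one([[True, True], [False, True]])    # two positions at step 0
assert not chi_one([[False, False], [True, False]])  # no position at step 0
```

Note the size matches the table: for each of s steps, n disjuncts each containing n − 1 negations, so O(s · n^2).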
more details
Coding χt is a bit tedious . . . step t + 1 follows from step t if the control flow is correct and if the registers have changed correctly.

For control flow: the formula is ∨j (Ctj ∧ νtj), where νtj is:

Ct+1,j+1                                           if Ij is inc, add or sub
Ct+1,j+1 ∨ Ct+1,j′                                 if Ij is maybe(j′)
((∨k Rtik) ∧ Ct+1,j+1) ∨ ((∧k ¬Rtik) ∧ Ct+1,j′)    if Ij is dec(i, j′)

To code the register changes, we need to express the addition of (2s)-bit numbers by boolean operations on the bits. See your hardware course!

Exercise: write the formula ρ+tii′ which says that at step t + 1, the value of Ri is the sum of the values of Ri and Ri′ at step t. (Answer in the notes.)

Despite the very tedious nature of these formulas, we can see that the whole formula is O(s^4).

And so the theorem is proved.
60
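The bit-level addition that ρ+ must encode is just a ripple-carry adder. As a sanity check, here is that boolean logic executed directly in a sketch (all names mine); each line of the loop corresponds to one layer of the formula:

```python
def ripple_add(a_bits, b_bits):
    """Add two equal-length little-endian bit vectors using only
    AND/OR/XOR/NOT on bits -- the same boolean structure the formula
    rho+ has to write down. The final carry is dropped (the registers
    are assumed wide enough, as in the 2s-bit bound)."""
    carry = False
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                 # sum bit: a XOR b XOR carry
        carry = (a and b) or (carry and (a ^ b))  # carry-out
    return out

def to_bits(n, width):
    """Little-endian bit encoding of a non-negative integer."""
    return [bool((n >> k) & 1) for k in range(width)]

def from_bits(bits):
    return sum(1 << k for k, b in enumerate(bits) if b)
```

For 2s-bit registers this contributes O(s) literals per addition, consistent with the overall O(s^4) size claimed above.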
More NP-complete problems
There are many hundreds of natural problems that are NP-complete. Look at the Wikipedia article ‘List of NP-complete problems’ for a range of examples, or Sipser for more details on a smaller range.
Starting from SAT, we use reductions to build a library of NP-complete problems, for use in tackling new potential problems.
Sometimes, the reductions from SAT (or another known problem) require considerable ingenuity. E.g., showing NP-completeness of HPP is quite tricky.
We’ll do a couple of examples.
61
3SAT
There are special forms of SAT that turn out to be particularly handy.
A literal is either a propositional variable P or the negation ¬P of one.
φ is in conjunctive normal form (CNF) if it is of the form ∧i ∨j pij where each pij is a literal.

φ is in k-CNF if each clause ∨j pij has at most k literals.

3SAT is the problem of whether a satisfying assignment exists for a formula in 3-CNF.

The reduction from unrestricted SAT to 3SAT is also a bit tricky, mostly because the usual boolean-logic conversion to CNF introduces exponential blowup. The Tseitin encoding is used to produce a not equivalent but equisatisfiable CNF formula. (See Note 3.)

We are paying a price here for having done Cook–Levin with RMs. The original version with TMs manages to produce a CNF formula describing machine executions. In fact, SAT is often defined to mean CNF-SAT.
62
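A rough sketch of the Tseitin idea (the AST representation and all names are mine, not from the notes): introduce one fresh variable per compound subformula, and emit clauses forcing that variable to track the subformula’s value. Each gate contributes a constant number of clauses of at most 3 literals, so the output is 3-CNF of linear size, and it is equisatisfiable with the input rather than equivalent.

```python
import itertools

def neg(lit):
    """Negate a literal; a negative literal is the pair ('-', v)."""
    return lit[1] if isinstance(lit, tuple) else ('-', lit)

def tseitin(node, clauses, fresh):
    """node is a variable name, ('not', x), ('and', l, r) or ('or', l, r).
    Returns a literal standing for node and appends its defining clauses
    (each with at most 3 literals) to `clauses`."""
    if isinstance(node, str):
        return node
    if node[0] == 'not':
        return neg(tseitin(node[1], clauses, fresh))
    a = tseitin(node[1], clauses, fresh)
    b = tseitin(node[2], clauses, fresh)
    v = next(fresh)
    if node[0] == 'and':   # clauses encoding v <-> (a and b)
        clauses += [[neg(v), a], [neg(v), b], [v, neg(a), neg(b)]]
    else:                  # clauses encoding v <-> (a or b)
        clauses += [[neg(v), a, b], [v, neg(a)], [v, neg(b)]]
    return v

fresh = (f'z{i}' for i in itertools.count())   # fresh Tseitin variables
clauses = []
root = tseitin(('or', ('and', 'A', 'B'), ('not', 'A')), clauses, fresh)
clauses.append([root])   # finally, assert the whole formula is true
```

Here (A ∧ B) ∨ ¬A yields two fresh variables and seven clauses; a naive distribution to CNF can instead square the formula size at every ∨ over ∧.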
CLIQUE
Given a graph G = (V, E) and a number k, a k-clique is a k-sized subset C of V such that every vertex in C has an edge to every other. (C forms a complete subgraph.) The CLIQUE problem is to decide whether G has a k-clique.

The problem has applications in chemistry and biology (and of course sociology).

It is NP-complete, by the following reduction from 3SAT:

Let φ = ∧1≤i≤k (xi1 ∨ xi2 ∨ xi3) be an instance of 3SAT, so each xij is a literal. Construct a graph thus: each xij is a vertex. Make an edge between xij and xi′j′ just in case i ≠ i′ and xi′j′ is not the negation of xij. (I.e., we connect literals in different clauses just when they are not inconsistent with each other.)

Since the vertices in one clause are disconnected, finding a k-clique amounts to finding one literal for each clause, such that they are all consistent – and so represent a satisfying assignment. Conversely, any satisfying assignment generates a k-clique.
63
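The reduction above is mechanical enough to execute; a sketch (the literal representation and names are mine), with a brute-force clique search standing in for the NP part:

```python
from itertools import combinations

def clique_from_3sat(clauses):
    """clauses: list of clauses, each a list of literals like 'A' or '-A'.
    Returns (vertices, edges, k): the graph has a k-clique iff the
    formula is satisfiable, where k = number of clauses."""
    vertices = [(i, lit) for i, clause in enumerate(clauses) for lit in clause]
    def clash(x, y):                     # complementary literals: A vs -A
        return x == '-' + y or y == '-' + x
    edges = {frozenset([u, v])
             for u, v in combinations(vertices, 2)
             if u[0] != v[0] and not clash(u[1], v[1])}
    return vertices, edges, len(clauses)

def has_k_clique(vertices, edges, k):
    """Brute-force CLIQUE check (exponential, as expected for NP search)."""
    return any(all(frozenset([u, v]) in edges for u, v in combinations(c, 2))
               for c in combinations(vertices, k))

# (A ∨ B) ∧ (¬B ∨ C) ∧ (A ∨ C) is satisfiable, so a 3-clique exists
v, e, k = clique_from_3sat([['A', 'B'], ['-B', 'C'], ['A', 'C']])
```

A vertex is a (clause index, literal) pair, so the same literal in two clauses gives two vertices; this is what keeps same-clause vertices disconnected.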
P ?= NP

We have said that NP-complete problems are effectively exponential.

But do we know this? Is it possible that we’re just stupid, and there’s a PTime algorithm for SAT?

We don’t know.

It is possible that NP = P. If you find a polynomial algorithm for SAT or any other NP-complete problem . . . hire bodyguards, because most web/banking security depends on such problems being hard.

Solving the problem is worth a million US dollars: it is one of the seven problems chosen by the Clay Institute for their Millennium Prizes.

Many of the results in complexity theory are therefore conditional: ‘if P ≠ NP, then very difficult theorem’.

E.g. if P ≠ NP, then there are problems that are neither in P nor NP-complete. There are very few candidates: best known is the Graph Isomorphism Problem.
64
Handling NP
As far as we know, NP problems are just hard: they need exponential search, so O(poly(n) · 2^n). So how do we solve them in practice? Huge industry . . .

Randomized algorithms are often useful. Allow algorithms to toss a coin. Surprisingly, one can get randomized algorithms that solve e.g. 3SAT in time O(poly(n) · (4/3)^n). (Why is this useful? 2^100 ≈ 10^31, while 1.33^100 ≈ 10^12.) Catch: (really) small probability of error!

In many special classes (e.g. sparse graphs, or almost-complete graphs), heuristics lead to fast results.
See http://satcompetition.org/ for the state of the art.
65
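The O(poly(n) · (4/3)^n) bound mentioned is achieved by Schöning’s random-walk algorithm; a simplified sketch (names and the fixed retry count are mine): start from a random assignment and, for about 3n steps, pick an unsatisfied clause and flip one of its variables at random, restarting as needed.

```python
import random

def schoening_3sat(variables, clauses, tries=200):
    """Schoening's randomized 3SAT algorithm (sketch). Each literal is
    'A' or '-A'. One-sided error: a returned assignment is always
    correct; a None answer is only *probably* 'unsatisfiable'."""
    def value(lit, a):
        return not a[lit[1:]] if lit.startswith('-') else a[lit]
    for _ in range(tries):
        a = {v: random.random() < 0.5 for v in variables}  # random restart
        for _ in range(3 * len(variables)):
            bad = [c for c in clauses if not any(value(l, a) for l in c)]
            if not bad:
                return a                       # satisfying assignment found
            lit = random.choice(random.choice(bad))
            v = lit.lstrip('-')
            a[v] = not a[v]                    # flip a variable of a bad clause
    return None

random.seed(0)
result = schoening_3sat('ABC', [['A', 'B'], ['-B', 'C'], ['A', 'C']])
```

Each restart succeeds with probability roughly (3/4)^n, which is where the (4/3)^n expected number of restarts – and the small error probability of a fixed-budget run – comes from.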
Beyond NP
So what do we know about hard problems?
As noted earlier, ExpTime is strictly harder than P (and, we believe, NP): a polynomially bounded machine simply doesn’t have time or memory to explore 2^n steps. Formal proof via the Time Hierarchy Theorem.

Similarly 2-ExpTime ⊋ ExpTime.

This also applies to the N (nondeterministic) versions.
66
Space-bounded machines
An RM/TM is f(n)-space-bounded if it may use only f(input size) space. For TMs, space means cells on tape; for RMs, number of bits in registers.
PSpace is the class of problems solvable by polynomially-space-boundedmachines.
The following are obvious (Exercise: why?):
I PSpace ⊇ PTime
I PSpace ⊇ NPTime
I PSpace ⊆ ExpTime
67
The polynomial hierarchy
Recall the notion of oracle. Suppose we have an oracle for NP problems. What then can we compute?
The polynomial hierarchy is defined thus:
I Let Δ^P_0 = Σ^P_0 = Π^P_0 = P.
I Let Δ^P_{n+1} = P^{Σ^P_n}, Σ^P_{n+1} = NP^{Σ^P_n}, Π^P_{n+1} = co-NP^{Σ^P_n},
where co-X is the class of problems whose negation is in X.
So Σ^P_1 = NP^P = NP, and Σ^P_2 = NP^NP. Why is NP^NP not just NP?
This hierarchy could collapse at any level – a generalization of P =? NP.
We don’t know.
The whole hierarchy is inside PSpace.
68
Lambda-Calculus
While Turing was working with machines, Alonzo Church was computing differently – with the precursor of Haskell!
You saw Haskell as a programming language for a computer: Church saw it as the computer.
Computation with lambda-calculus works by reduction (syntactic manipulation) of expressions or terms.
69
Lambda-calculus syntax
The terms of lambda-calculus are:
I There is a set of variables x, y, . . . . Every variable is a term.
I If x is a variable and t is a term, then (λx.t) is a term. (abstraction)
I If s and t are terms, then (st) is a term. (application)
I Nothing else is a term.
To avoid writing Lots of Irritating Spurious Parentheses, we adopt the conventions:
I Omit outermost parentheses. (E.g. λx.t)
I Application associates to the left: stu means (st)u.
I The body of an abstraction extends as far to the right as possible: λx.sλy.xty means λx.(s(λy.((xt)y)))
70
Intuition
Variables are . . . variables. Abstraction makes anonymous functions: λx.t is a function that takes an argument (s, say), binds s to the variable x, and evaluates (?) t.
Application is application of a function to an argument.
Examples:
I λx.x is the identity function
I (λx.x)(λx.x) is the identity function applied to itself – which is (will turn out to be) the identity function
I λx.λy.wxy is a function that takes an argument x, and gives back a function that takes an argument y and applies w to x, giving a result that is applied to y. A curried function of two arguments.
This should all look completely familiar from Haskell – except there are no types here! Also no integers, booleans, data types, pattern matching, . . .
71
Free and bound variables; α-conversion
If a variable x occurs inside a term t, the occurrence is free iff x is not inside a subterm λx.t′ of t.
If x is free in t′, it is bound in λx.t′.
This is just like variable binding in all civilized programming languages.
Exercise: Write a formal definition of ‘free’ and ‘bound’ by induction on the structure of terms.
The terms λx.x and λy.y are not interestingly different.
Changing the name of a bound variable is α-conversion (historical reasons. . . ).
Care is needed: we can’t just change x to y in λx.λy.xy, or in λx.yx.
Exercise: Write a precise and safe definition of α-conversion.
We write s →α t if s can be α-converted to t.
72
Evaluating terms: β-reduction
The main computation is done by β-reduction (historical reasons. . . ), which ‘evaluates’ a function application:
(λx.t)(s) →β t[s/x]
where t[s/x] is the result of (appropriately) substituting s for every free occurrence of x in t.
Examples:
(λx.x)(λy.y) →β λy.y
(λx.xx)(λy.y)w →β (λy.y)(λy.y)w →β (λy.y)w →β w
73
Variable capture and substitution
The lambda calculus is purely text (symbol) manipulation – naive substitution can go wrong:
(λx.λy.xy)yw → (λy.yy)w → ww
because the free y was captured by λy.
So we define →β to do the right thing, by (e.g.) renaming bound variables to avoid capture:
(λx.λy.xy)yw →β (λy′.yy′)w →β yw
Surprisingly tricky to get right. See Pierce §5.3 for details.
74
Eta conversion
Sometimes it’s useful to wrap a function up in another abstraction (e.g. to delay its evaluation). η-conversion allows us to do this: provided x is not free in f, then
(λx.fx) ↔η f
75
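Viewed operationally, η-conversion says the wrapper is indistinguishable from the function it wraps. A toy illustration in Python (f and g are arbitrary names of mine):

```python
# If x is not free in f, then lambda x. fx behaves like f on every argument;
# in a strict language the wrapper also delays evaluating f's body until an
# argument actually arrives.
f = lambda n: n + 1
g = lambda x: f(x)    # the eta-expansion of f

# g(41) == f(41) == 42
```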
Evaluation order
Does it matter what order we evaluate things in?
(λz.w)((λx.xx)(λx.xx)) →β w
or
(λz.w)((λx.xx)(λx.xx))
→β (λz.w)((λx.xx)(λx.xx))
→β (λz.w)((λx.xx)(λx.xx))
. . .
‘Call by name’ (nearly but not quite what Haskell does) is the first: reduce the outer left redex first, and don’t go inside lambda abstractions.
‘Call by value’ is the second: reduce function arguments before reducing the application.
For others, see Pierce §5.1.
76
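The difference can be mimicked in Python with thunks: Python itself is call-by-value, so passing the looping term directly would diverge, while passing it unforced imitates call by name. (A sketch; `omega` and `const_w` are my names for the two terms above.)

```python
def omega():
    # stands in for (lambda x. xx)(lambda x. xx), which reduces to itself forever
    while True:
        pass

def const_w(arg_thunk):
    # stands in for lambda z. w: the argument is ignored, so the thunk is never forced
    return "w"

# Call by value: const_w(omega()) evaluates omega() first, and loops forever.
# Call by name: pass the argument unevaluated, and the answer comes back at once.
result = const_w(lambda: omega())
# result == "w"
```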
Adding stuff
So far, this doesn’t look like much of a computer.
We could imagine adding basic numerals 0, 1, 2, . . . , and arithmetic operations with rules like
+ n1 n2 →β n where n = n1 + n2
and booleans true, false and conditionals with rules like
if b s t →β s if b is true
if b s t →β t if b is false
but we still need some way to achieve iteration or looping.
It’s not actually necessary to add numerical primitives: we can define λ-terms to represent them. Look up Church numerals in the textbook.
77
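Following that pointer, here is a sketch of Church numerals in Python: n is encoded as the function that applies its first argument n times (zero, succ, add and to_int are my names; the textbook gives the λ-terms themselves):

```python
zero = lambda f: lambda x: x                    # apply f zero times
succ = lambda n: lambda f: lambda x: f(n(f)(x)) # one more application of f
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))  # f applied m+n times

def to_int(n):
    """Decode a Church numeral by applying (+1) to 0 the encoded number of times."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
# to_int(add(two)(three)) == 5
```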
Recursion in lambda calculus
Haskellers will expect that recursion is the trick. But without names for functions it’s . . . tricky.
Example: the factorial function is
(λF.(λX.F(XX))(λX.F(XX)))(λf.λx.if (= 0 x) 1 (× x (f (− x 1))))
If you can understand that . . .
The λf. . . . is basically factorial, but we have to find a way for the embedded f to refer to itself.
Exercise: work through the evaluation applied to 3. (Use call by name.)
At this point, you may wish to find a lambda-calculus evaluator. (There are many. Explore!)
Exercise: can you do this in Haskell? If not, why not?
78
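For the last exercise: in a strict language the self-application trick loops as written, because F(XX) evaluates XX immediately. A Python sketch using the η-expanded variant (usually called Z; the η-expansion delays XX exactly as on the η-conversion slide — this workaround is not on the slide itself):

```python
# Z = lambda F. (lambda X. F(lambda v. XX v)) (lambda X. F(lambda v. XX v))
Z = lambda F: (lambda X: F(lambda v: X(X)(v)))(lambda X: F(lambda v: X(X)(v)))

# The lambda f. lambda x. ... part of the slide's factorial, transcribed
# with Python's own conditionals and arithmetic standing in for if/=/x/-:
fact = Z(lambda f: lambda x: 1 if x == 0 else x * f(x - 1))
# fact(3) == 6
```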
Y?
The magic term that made λf. . . . recursive was
Y =def λF.(λX.F(XX))(λX.F(XX))
Why does it work?
Suppose we give Y an argument G, a function that takes two arguments f and x. Then, in call-by-name,
YG →β (λX.G(XX))(λX.G(XX))
→β G((λX.G(XX))(λX.G(XX)))
Observe that the last term is just (one reduction of) G(YG).
Now if G itself applies its first argument (YG here) to its second argument, YG will get expanded to G(YG) again. If it doesn’t use its first argument, the expansion stops.
79
Putting G =def λf.λx.if (= 0 x) 1 (× x (f (− x 1))), and evaluating Y G 3, say, we get:
Y G 3
→β G (YG) 3
→β (λx.if (= 0 x) 1 (× x (YG (− x 1)))) 3
→β if (= 0 3) 1 (× 3 (YG (− 3 1)))
→β+ × 3 (YG (− 3 1))
. . .
→β+ 6
80
Types
In λ-calculus, we can write crazy things like
(λx.xxx)(0)
or
(+ (λx.xx) 42)
Simply typed λ-calculus adds simple (!) types to the basic λ-calculus syntax, to prevent such nonsense.
I Fix some set of base types, e.g. nat, bool, for the constants.
I If σ, τ are types, then σ → τ is a type, the function type of abstractions that take an argument of type σ and return a value of type τ. Convention: → associates to the right, so σ → τ → υ means σ → (τ → υ).
I The syntax of abstractions is changed to (λx:σ.t), giving the type of the argument.
Note that σ, τ are meta-variables, not symbols in the language itself.
81
Typing
A legitimate simply typed λ-calculus term must be well typed. This is determined by inferring types.
For a term t and type τ, we write t : τ for ‘t is well typed with type τ’.
I Constants (individuals or functions) are well typed with their assigned type.
I If t : τ under the assumption that x : σ, then (λx:σ.t) : σ → τ.
I If t : σ → τ and s : σ, then (ts) : τ.
Examples:
I A function that takes a function f on integers and returns a function that applies f twice:
(λf:nat→nat. λx:nat. f(fx)) : (nat→nat) → (nat→nat)
Exercise: Work it through.
I λx:nat.xx cannot be well typed.
82
Proofs for types
It’s common to present type inferences as proof trees, as with proofs in propositional or predicate logic.
We assume that all formulae are α-converted to avoid clashes of bound variables.
Γ ` t : τ is a judgement that t has type τ under the set of assumptions Γ. The following rules (premises above the line, conclusion below; written here as ‘if . . . then’) prove judgements:
I Γ ` c : τ for each constant c of base type τ .
I If x : σ ∈ Γ, then Γ ` x : σ.
I If Γ, x : σ ` t : τ , then Γ ` λx :σ.t : σ→ τ .
I If Γ ` t : σ→ τ and Γ ` s : σ, then Γ ` ts : τ .
Then t : τ iff ` t : τ is provable.
83
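The four rules translate almost line for line into a checker. A minimal sketch in Python (not from the course; the term and type encodings are illustrative assumptions):

```python
# Minimal checker for the simply typed lambda calculus.
# Types: "nat" is the base type; ("->", a, b) is the arrow type a -> b.
# Terms: ("const", c), ("var", x), ("lam", x, ty, body), ("app", s, t).
CONSTS = {"1": "nat", "+": ("->", "nat", ("->", "nat", "nat"))}

def typecheck(gamma, term):
    """Return the type of term under context gamma, or None if untypable."""
    tag = term[0]
    if tag == "const":                     # constants carry their assigned type
        return CONSTS.get(term[1])
    if tag == "var":                       # rule: x : sigma in Gamma
        return gamma.get(term[1])
    if tag == "lam":                       # rule: Gamma, x:sigma |- t : tau
        _, x, sigma, body = term
        tau = typecheck({**gamma, x: sigma}, body)
        return ("->", sigma, tau) if tau is not None else None
    if tag == "app":                       # rule: s : sigma -> tau, t : sigma
        fun, arg = typecheck(gamma, term[1]), typecheck(gamma, term[2])
        if isinstance(fun, tuple) and fun[0] == "->" and fun[1] == arg:
            return fun[2]
        return None

# lambda x:nat. ((+ 1) x)  has type nat -> nat
plus1 = ("lam", "x", "nat",
         ("app", ("app", ("const", "+"), ("const", "1")), ("var", "x")))
# lambda x:nat. x x  is untypable
selfapp = ("lam", "x", "nat", ("app", ("var", "x"), ("var", "x")))

print(typecheck({}, plus1))    # ("->", "nat", "nat")
print(typecheck({}, selfapp))  # None
```

The two test terms are exactly the examples worked on the next slide: the first succeeds, and the self-application fails at the application rule because x has the base type nat, not an arrow type.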
Example: to show that λx :nat.(+ 1 x) : nat→ nat:
x : nat ` + : nat→ (nat→ nat) and x : nat ` 1 : nat (constants)
x : nat ` (+ 1) : nat→ nat (application)
x : nat ` x : nat (variable, since x : nat ∈ {x : nat})
x : nat ` ((+ 1) x) : nat (application)
` λx :nat.((+ 1) x) : nat→ nat (abstraction)
What if we try to type λx :nat.xx? Working upwards:
` λx :nat.xx : nat→ τ requires x : nat ` xx : τ.
The application rule then requires x : nat ` x : σ→ τ and x : nat ` x : σ for some σ.
The variable rule gives x : nat ` x : nat, so σ = nat; but then we need x : nat ` x : nat→ τ , which is not provable: x can only be given type nat.
It should be reasonably obvious that we can quickly well-type (or prove untypable) any simply typed term. (Did the stage when σ turned into nat remind you of anything?)
84
Properties of simple types
I Uniqueness of types. In a given context (types for free variables), any simply typed lambda term has at most one type (and this is decidable, in small polynomial time). (This should be fairly obvious from what we’ve just done; formal proof takes a little work, but not much.)
I Type safety. The type of a term remains unchanged under α, β, η-conversion. (Also straightforward, though with some lemmas (e.g. to handle substitution).)
I Strong normalization. A well typed term evaluates under β-reduction in finitely many steps to a unique irreducible term. If the type is a base type, then the irreducible term is a constant. (A bit harder to prove.)
I Corollary: Simply typed lambda calculus cannot be Turing-complete! What have we lost?
85
Recursion lost
Just as λx .xx can’t be typed, nor can Y, nor any other potentially non-terminating term. (Exercise: convince yourself that Y can’t be given a simple type.)
What can we do? General recursion is bound to allow non-termination.
86
Recursion re-gained
If your tool-kit doesn’t have a gadget: make one and put it in the kit!
I We add the term constructor fix to the language: if t is a term, then (fix t) is a term.
I We extend β-reduction to say that:
fix(λx :τ.t) →β t[fix(λx :τ.t)/x ]
(often called the unfolding rule).
I We add a new typing rule: if Γ ` t : τ → τ , then Γ ` fix t : τ .
Now we can use fix just as we used Y.
Exercise: Define addition of (base type nat) numbers using fix, the successor and predecessor functions suc, prd : nat→ nat, and the equality of numbers function = : nat→ nat→ nat.
87
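The unfolding rule can be imitated in a strict host language, provided fix is eta-expanded so that evaluation terminates. A sketch in Python of one possible answer to the addition exercise (not from the course; Python’s == and arithmetic stand in for the = , suc, prd constants):

```python
def fix(f):
    """Unfolding rule: fix f  ->  f (fix f), eta-expanded for strict evaluation."""
    return lambda *args: f(fix(f))(*args)

suc = lambda n: n + 1    # successor
prd = lambda n: n - 1    # predecessor

# add = fix (lambda add. lambda x. lambda y.
#              if x = 0 then y else suc (add (prd x) y))
add = fix(lambda add: lambda x: lambda y:
          y if x == 0 else suc(add(prd(x))(y)))

print(add(3)(4))  # 7
```

The key point mirrors the typing rule: the functional passed to fix has type (nat→ nat→ nat)→ (nat→ nat→ nat), so fix of it has type nat→ nat→ nat.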
More bells and whistles
We can further extend the syntax of simply typed lambda with constructors and types for the familiar apparatus of programming languages:
I tuples (products)
I records (named products)
I sums (union types)
I lists, trees etc. (N.B. one list type for each base type!)
See Pierce ch. 11 for gory (rather uninteresting) details.
More interesting is the let statement. Simple (non-recursive) lets can be done by
let x :τ = t in t ′ ≡ (λx :τ.t ′)t
and fully recursive let definitions by
letrec x :τ = t in t ′ ≡ let x :τ = fix(λx :τ.t) in t ′
Exercise: Think carefully about the last! How does it work?
88
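Both desugarings can be checked in any language with lambdas. A sketch in Python (not from the course; fix is the eta-expanded version needed under strict evaluation):

```python
def fix(f):
    return lambda *args: f(fix(f))(*args)

# let x = 5 in x + 1   ===   (lambda x. x + 1) 5
assert (lambda x: x + 1)(5) == 6

# letrec fact = lambda n. if n = 0 then 1 else n * fact (n - 1) in fact 5
#   ===   let fact = fix (lambda fact. lambda n. ...) in fact 5
body = lambda fact: lambda n: 1 if n == 0 else n * fact(n - 1)
print((lambda fact: fact(5))(fix(body)))  # 120
```

The point of the letrec translation: inside t the bound name refers to fix(λfact.t), i.e. to the fixpoint of the definition itself, which is what allows the recursive call to go through.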
Type reconstruction
It’s rather boring having to type all these types! Can we leave them out, write in the untyped language, and let the computer work out the types?
Essentially we program up the inference we did ‘in our heads’ to make type proofs.
Give every (ordinary) variable x a different type variable, so x :αx . Then see what relations have to hold between the α.
Before, we were using meta-variables σ, τ to do human reasoning; now we’re going to promote these to actual variables ranging over types.
This is a constraint solving problem: e.g. we have α, β and we know that α = β→ β.
Where do the constraints come from? From the typing rules.
First, some formalities . . .
89
Type variables
We add type variables to our typed language. Convention: I use the start of the Greek alphabet for type variables. Now types include:
I every type variable α is a type.
Eventually, type variables need to turn into actual types: consider (λx :α.x)1.
A type substitution gives a substitution of types for type variables, e.g. [nat/α, bool/β, (α→ β)/γ]. We’ll write type substitution before a term, e.g. [nat/α](λx :α.x) = λx :nat.x .
Substitutions apply to terms, contexts, etc. in the obvious way.
I Theorem: type substitution preserves typing: if Γ ` t : τ , and T is a type substitution, then T Γ ` T t : T τ .
I The reverse is not true: x : α ` x : nat is not provable, but [nat/α](x : α) ` [nat/α]x : [nat/α]nat.
90
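Type substitution is just simultaneous replacement of type variables inside a type expression. A minimal sketch in Python (not from the course; arrow types as pairs, names illustrative):

```python
# Types: base types and type variables are strings; ("->", a, b) is a -> b.
def subst(T, ty):
    """Apply type substitution T (a dict: variable -> type) to a type."""
    if isinstance(ty, tuple):
        return ("->", subst(T, ty[1]), subst(T, ty[2]))
    return T.get(ty, ty)   # variables not in T (and base types) are left alone

T = {"alpha": "nat", "beta": "bool"}          # the substitution [nat/a, bool/b]
print(subst(T, ("->", "alpha", "beta")))      # ("->", "nat", "bool")
print(subst(T, ("->", "gamma", "alpha")))     # gamma is untouched
```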
Reconstruction example
Find types for λf .λx .λy .suc(f (+ x y)).
First annotate: λf :γ.λx :α.λy :β.suc(f (+ x y)).
Now make the proof tree, using additional type variables and recording equations instead of working things out:
. . . ` suc : nat→ nat
. . . ` f : γ, hence . . . ` f : η→ nat [γ = η→ nat]
. . . ` + : nat→ nat→ nat, hence . . . ` + : α→ (θ→ η) [α→ θ→ η = nat→ nat→ nat]
. . . ` x : α, so . . . ` (+ x) : θ→ η
. . . ` y : β, hence . . . ` y : θ [θ = β]
. . . ` (+ x y) : η
. . . ` f (+ x y) : nat [ζ = nat]
f : γ, x : α, y : β ` suc(f (+ x y)) : ζ
f : γ, x : α ` λy :β.suc(f (+ x y)) : β→ ζ [ε = β→ ζ]
f : γ ` λx :α.λy :β.suc(f (+ x y)) : α→ ε [δ = α→ ε]
` λf :γ.λx :α.λy :β.suc(f (+ x y)) : γ→ δ
Solve the equations to get α = β = nat and γ = nat→ nat, so the whole term has type (nat→ nat)→ nat→ nat→ nat.
91
Solving the type equations
The equations are very simple: just equalities involving variables and →.
We can solve them by unification (a technique from logic programming). Standard algorithm: look it up in Pierce for this case, or anywhere for the general case.
Theorem: if an untyped λ-term can be simply typed, this procedure will find a simple type for it, and return ‘fail’ otherwise (if the equations have no solution).
Proof by a somewhat technically involved induction on terms. (Full details in Pierce.)
So type inference is decidable (with reasonable polynomial complexity – in all practical cases, it’s even linear).
92
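A sketch of first-order unification in Python (not the course’s or Pierce’s algorithm verbatim; the encodings and variable names are illustrative assumptions). Running it on the equations from the reconstruction example recovers α = β = nat and γ = nat→ nat:

```python
# Types: strings are base types or type variables; variables are the names
# listed in VARS; ("->", a, b) is the arrow type a -> b.
VARS = {"alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"}

def subst(T, ty):
    """Apply substitution T, chasing bindings until a final type is reached."""
    if isinstance(ty, tuple):
        return ("->", subst(T, ty[1]), subst(T, ty[2]))
    return subst(T, T[ty]) if ty in T else ty

def occurs(v, ty):
    """Occurs check: does variable v appear inside type ty?"""
    return v == ty if not isinstance(ty, tuple) else occurs(v, ty[1]) or occurs(v, ty[2])

def unify(equations):
    """Solve type equations by unification; return a substitution or None."""
    T, eqs = {}, list(equations)
    while eqs:
        s, t = eqs.pop()
        s, t = subst(T, s), subst(T, t)
        if s == t:
            continue
        if s in VARS and not occurs(s, t):
            T[s] = t
        elif t in VARS and not occurs(t, s):
            T[t] = s
        elif isinstance(s, tuple) and isinstance(t, tuple):
            eqs += [(s[1], t[1]), (s[2], t[2])]   # decompose arrow = arrow
        else:
            return None                            # clash, e.g. nat = nat -> nat
    return T

# The equations recorded on the previous slide:
nat3 = ("->", "nat", ("->", "nat", "nat"))
eqs = [("gamma", ("->", "eta", "nat")),
       (("->", "alpha", ("->", "theta", "eta")), nat3),
       ("theta", "beta"), ("zeta", "nat"),
       ("epsilon", ("->", "beta", "zeta")),
       ("delta", ("->", "alpha", "epsilon"))]
T = unify(eqs)
print(subst(T, "alpha"), subst(T, "beta"))   # nat nat
print(subst(T, "gamma"))                     # ("->", "nat", "nat")
```

The occurs check is what rejects circular equations such as α = α→ β, which is exactly where self-application fails.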
General solutions
I Consider the identity function λx .x . Type reconstruction gives (λx :α.x) : α→ α with no further equations!
I What does an unsolved variable type mean? We can type λx .x with any substitution [τ/α].
I In general, the reconstruction algorithm gives us the most general type, leaving free variables.
I What about λf :α.λx :β.f (fx)? We get α = β→ β, so type (β→ β)→ β→ β.
I What about λf :α.λx :β.f (x)? We get α = β→ γ, so type (β→ γ)→ β→ γ.
I But still, to get a real simply typed term, we must substitute.
93
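The reconstruction algorithm itself can be sketched: walk the term, give each λ-binder a fresh type variable, collect one equation per application, then unify. A compressed Python sketch (terms encoded as tagged tuples of our own devising; the occurs check is omitted for brevity):

```python
# Type reconstruction for pure λ-terms.
# Terms: ('var', x) | ('lam', x, body) | ('app', f, a)
# Types: strings (variables / base types) or ('->', dom, cod)
from itertools import count

BASE = {'nat', 'bool'}
fresh = (f't{i}' for i in count())       # endless supply of fresh type variables

def is_var(t):
    return isinstance(t, str) and t not in BASE

def walk(s, t):
    """Fully apply substitution s to type t."""
    if is_var(t):
        return walk(s, s[t]) if t in s else t
    if isinstance(t, tuple):
        return ('->', walk(s, t[1]), walk(s, t[2]))
    return t

def unify(eqs):
    """Solve the collected equations (occurs check omitted for brevity)."""
    s = {}
    while eqs:
        a, b = (walk(s, t) for t in eqs.pop())
        if a == b:
            continue
        if is_var(a):
            s[a] = b
        elif is_var(b):
            s[b] = a
        elif isinstance(a, tuple) and isinstance(b, tuple):
            eqs += [(a[1], b[1]), (a[2], b[2])]
        else:
            return None
    return s

def reconstruct(term, env, eqs):
    if term[0] == 'var':
        return env[term[1]]
    if term[0] == 'lam':                 # annotate the binder with a fresh variable
        a = next(fresh)
        return ('->', a, reconstruct(term[2], {**env, term[1]: a}, eqs))
    if term[0] == 'app':                 # one equation per application
        f = reconstruct(term[1], env, eqs)
        x = reconstruct(term[2], env, eqs)
        r = next(fresh)
        eqs.append((f, ('->', x, r)))
        return r

def principal(term):
    eqs = []
    t = reconstruct(term, {}, eqs)
    return walk(unify(eqs), t)

# λf.λx.f (f x): the most general type comes out as (β→β)→β→β
twice = ('lam', 'f', ('lam', 'x',
         ('app', ('var', 'f'), ('app', ('var', 'f'), ('var', 'x')))))
```

`principal(twice)` returns an arrow type of the shape (β→β)→β→β, with β an unsolved (free) variable, exactly as on the slide.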
Polymorphism
In the simply-typed λ-calculus, we have infinitely many distinct identity functions: λx:nat.x, λx:bool.x, λx:nat→ nat→ nat.x, and so on.
Wouldn’t it be nice if we could write a polymorphic λx.x that was actually the identity for all the types at once?
How did we avoid giving a type to the recursion ‘combinator’ fix?
Note that if we write it out each time, we can do this, as type reconstruction works separately on each instance:
〈(λx.x)(1), (λx.x)(true)〉
What we can’t do is:
let id = λx.x in 〈id(1), id(true)〉
Why not? Un-sugar the let into a λ-application, and work through the reconstruction. How does it fail?
94
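To see where it gets stuck: un-sugaring gives (λid.〈id(1), id(true)〉)(λx.x), so the single λ-bound id gets one type β→β, and the two uses then force β = nat and β = bool at once. A tiny sketch of that clash (the encoding is ours, not the lecture’s):

```python
# The two uses of the single λ-bound id : β→β produce these equations:
eqs = [('b', 'nat'), ('b', 'bool')]      # from id(1) and from id(true)

def solve(eqs):
    """Tiny solver for variable = base-type equations; None means 'fail'."""
    s = {}
    for v, t in eqs:
        if s.get(v, t) != t:
            return None                  # clash: β cannot be both nat and bool
        s[v] = t
    return s
```

`solve(eqs)` fails; with only one use (say id(1)) it would succeed with β = nat.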
Polymorphism more formally
We have several possibilities for types with free type variables:
I some terms, like λx:α.x, are well-typed ‘with the variables themselves’, and so are well-typed for any type substitution;
I some terms, like λx:α.(+ 1 x), are ill-typed, and only become well-typed for a specific substitution [nat/α];
I and some become well-typed for many substitutions: λf:α.λx:β.f (fx) well-types under T whenever T(α) = T(β)→ T(β).
So what is the best type for λx.x? Our typing rules tell us that for any α, we have ⊢ (λx:α.x) : α→ α. Perhaps we could say that λx.x has the type scheme ∀α.α→ α?
λx.(+ 1 x), however, must have type nat→ nat.
And λf.λx.f (fx) could have the type (scheme) ∀β.(β→ β)→ β→ β?
The universal quantifier binds the free type variables.
Now go back to our original simply typed proof system, when we tried to type
λx:nat.xx – did anything happen in there that looked like type substitution?
95
Let-polymorphism
Allowing type schemes seems natural, but . . .
I could we still do type inference?
I are we going to get more complicated types?
We’ll come back to this, but just think about trying to type
(λx.x)(λx.x)
It turns out that there is a natural ‘sweet spot’ if we allow type schemes in let-bindings, but not in λ-abstractions. (N.B. This means let has to be promoted from syntactic sugar to a primitive.) Then
let id = λx.x in 〈id(1), id(true)〉
will work as desired.
The following will not work!
let id = λx.x in (λf.〈f (1), f (true)〉)(id)
Exercise: Verify that indeed it fails in Haskell.
96
Hindley–Milner type inference
First, we add let as a primitive:
I If x is a variable, and t and t′ are terms, then let x = t in t′ is a term.
I It evaluates just as λ does:
let x = t in t′ →β t′[t/x]
The difference comes in the typing rule:
Γ ⊢ t : σ    Γ, x : Γ(σ) ⊢ t′ : τ
Γ ⊢ let x = t in t′ : τ
What is Γ(σ)? It is the result of generalizing (binding with ∀) all free type variables in σ that do not occur free in Γ.
E.g.: if x : α ⊢ t : α→ β, then (x : α)(α→ β) = ∀β.α→ β
97
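The generalization step Γ(σ) can be sketched directly: collect the type variables free in σ, subtract those free in Γ, and quantify the rest. A small illustrative Python encoding (schemes as `('forall', vars, type)`; this encoding is ours, not the lecture’s):

```python
# Types: strings (variables, or the base types below) and ('->', dom, cod).
# Schemes: ('forall', frozenset_of_bound_vars, type).
BASE = {'nat', 'bool'}

def ftv(t):
    """Free type variables of a (quantifier-free) type."""
    if isinstance(t, str):
        return set() if t in BASE else {t}
    return ftv(t[1]) | ftv(t[2])

def generalize(env, t):
    """Γ(σ): quantify the variables free in t but not free anywhere in env."""
    env_vars = set().union(set(), *(ftv(u) for u in env.values()))
    return ('forall', frozenset(ftv(t) - env_vars), t)

# With Γ = {x : α}, generalizing α→β binds β but leaves α free:
scheme = generalize({'x': 'a'}, ('->', 'a', 'b'))
```

This reproduces the slide’s example: the result quantifies β only, since α occurs free in Γ.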
Generalization and Specialization
Generalization gave variables type schemes (or polytypes). For type inference to work, we have to un-generalize, or specialize, by modifying the rule for variables:
x : σ ∈ Γ    σ ⊑ τ
Γ ⊢ x : τ
What does σ ⊑ τ mean? ‘τ is more special than σ’, i.e. τ is got by substituting types (no quantifiers!) for the quantified variables of σ.
E.g. ∀β.α→ β ⊑ α→ γ and ∀β.α→ β ⊑ α→ (nat→ bool).
This lets every use of a polymorphic identifier get a new bunch of type variables to label it.
98
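The relation σ ⊑ τ can be checked by matching σ’s body against τ, binding only the quantified variables and insisting each is substituted consistently. A sketch in the same toy encoding (names and representation are our own):

```python
# A scheme is ('forall', set_of_bound_vars, body); types are strings or
# ('->', dom, cod).  specializes(σ, τ) checks σ ⊑ τ: τ must arise from the
# body by substituting (quantifier-free) types for the bound variables only.
def specializes(scheme, tau):
    _, bound, body = scheme
    s = {}
    def match(a, b):
        if isinstance(a, str) and a in bound:
            if a in s:
                return s[a] == b          # same bound variable, same substitute
            s[a] = b                      # record the chosen substitute
            return True
        if isinstance(a, str) or isinstance(b, str):
            return a == b                 # free variables and base types must agree
        return match(a[1], b[1]) and match(a[2], b[2])
    return match(body, tau)

# ∀β.α→β ⊑ α→γ and ∀β.α→β ⊑ α→(nat→bool); but nat→γ is not a specialization,
# since α is free (not bound) in the scheme.
sch = ('forall', {'b'}, ('->', 'a', 'b'))
```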
Polymorphic type inference example
Let’s look at our motivating example
let id = λx .x in 〈id(1), id(true)〉
The inference tree is (with some abbreviation)
x : β ∈ x : β
x : β ⊢ x : β
⊢ λx.x : β→ β
… ∈ …    [ε = γ]
… ⊑ ε→ ε
… ⊢ id : ε→ γ
[γ = nat]
… ⊢ 1 : nat
… ⊢ id(1) : γ
… ∈ …    [ζ = δ]
… ⊑ ζ→ ζ
… ⊢ id : ζ→ δ
[δ = bool]
… ⊢ true : bool
… ⊢ id(true) : δ
id : ∀β.β→ β ⊢ 〈id(1), id(true)〉 : α    [α = γ × δ]
⊢ let id = λx.x in 〈id(1), id(true)〉 : α
99
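This whole derivation can be replayed mechanically with an algorithm-W-style sketch: generalization at let, specialization (fresh instances) at variables, unification at applications. The encoding below is our own compact choice (occurs check omitted for brevity; `('lit', 'nat')` and `('lit', 'bool')` stand for the literals 1 and true):

```python
# Terms: ('lit', basetype) | ('var', x) | ('lam', x, b) | ('app', f, a)
#      | ('pair', a, b) | ('let', x, bound_term, body)
# Types: strings, ('->', dom, cod), ('pair', a, b); schemes: (bound_vars, type)
from itertools import count

BASE = {'nat', 'bool'}
fresh = (f't{i}' for i in count())

def is_var(t):
    return isinstance(t, str) and t not in BASE

def walk(s, t):
    """Fully apply substitution s to type t."""
    if is_var(t):
        return walk(s, s[t]) if t in s else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(walk(s, u) for u in t[1:])
    return t

def unify(a, b, s):
    a, b = walk(s, a), walk(s, b)
    if a == b:
        return
    if is_var(a):
        s[a] = b
    elif is_var(b):
        s[b] = a
    elif isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        for u, v in zip(a[1:], b[1:]):
            unify(u, v, s)
    else:
        raise TypeError(f'cannot unify {a} with {b}')

def ftv(t):
    if isinstance(t, str):
        return set() if t in BASE else {t}
    return set().union(*(ftv(u) for u in t[1:]))

def infer(term, env, s):
    kind = term[0]
    if kind == 'lit':
        return term[1]
    if kind == 'var':                    # specialize: fresh copies of the ∀-bound vars
        bound, t = env[term[1]]
        inst = {v: next(fresh) for v in bound}
        def ren(u):
            if isinstance(u, str):
                return inst.get(u, u)
            return (u[0],) + tuple(ren(w) for w in u[1:])
        return ren(t)
    if kind == 'lam':                    # λ-bound variables stay monomorphic
        a = next(fresh)
        return ('->', a, infer(term[2], {**env, term[1]: (set(), a)}, s))
    if kind == 'app':
        f, x, r = infer(term[1], env, s), infer(term[2], env, s), next(fresh)
        unify(f, ('->', x, r), s)
        return r
    if kind == 'pair':
        return ('pair', infer(term[1], env, s), infer(term[2], env, s))
    if kind == 'let':                    # generalize before typing the body
        t = walk(s, infer(term[2], env, s))
        used = set().union(set(), *(ftv(walk(s, u)) for _, u in env.values()))
        return infer(term[3], {**env, term[1]: (ftv(t) - used, t)}, s)

def typeof(term):
    s = {}
    return walk(s, infer(term, {}, s))

# let id = λx.x in 〈id(1), id(true)〉
good = ('let', 'id', ('lam', 'x', ('var', 'x')),
        ('pair', ('app', ('var', 'id'), ('lit', 'nat')),
                 ('app', ('var', 'id'), ('lit', 'bool'))))
```

`typeof(good)` gives `('pair', 'nat', 'bool')`, i.e. nat × bool; wrapping id in a λ instead of a let makes unification clash on nat = bool, as on the previous slides.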
Remarks on Hindley–Milner
I The proof-tree plus unification description is convenient for proving properties of the system. Implementations do more or less the same (as the quite efficient version we’ve given).
I Type inference for let-polymorphism runs quickly (linearly) on every program a real programmer has ever written.
I But in fact it is ExpTime-complete!
I We can imagine generalizing to full polymorphism: allow all variables and terms to have actual types like ∀α.α→ α. Then (λf.〈f (1), f (true)〉)(λx.x) is fine.
I But type inference becomes undecidable!
I So we write the types explicitly, as in Java generics, and introduce type abstraction and type application:
if t : τ, then (Λα.t)〈〈σ〉〉 → [σ/α]t
E.g. (λf:∀α.α→ α.〈(f 〈〈bool〉〉)(true), (f 〈〈nat〉〉)(1)〉)(Λα.λx:α.x)
I System F (Girard), or the polymorphic λ-calculus (Reynolds).
100