CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Page 1: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing

R. Rao, Week 3: Information Theory, Thermodynamics, and Reversible Computing

What have we done so far?

Theoretical computer science: Abstract models of computing Turing machines, computability, time and space complexity

Physical Instantiations

1. Digital Computing: Silicon switches manipulate binary variables with near-zero error

2. DNA Computing: Massive parallelism and the biochemical properties of organic molecules allow fast solutions to hard search problems

3. Neural Computing: Distributed networks of neurons compute fast, parallel, adaptive, and fault-tolerant solutions to hard pattern recognition and motor control problems

Page 2: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Overview of Today’s Lecture

Information theory and Kolmogorov complexity
  What is information? A definition based on probability theory
  Error-correcting codes and compression
  An algorithmic definition of information (Kolmogorov complexity)

Thermodynamics
  The physics of computation
  Relation to information theory
  Energy requirements for computing

Reversible Computing
  Computing without energy consumption?
  A biological example
  Reversible logic gates
  Quantum computing (next week!)

Page 3: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Information and Algorithmic Complexity

Three principal results:

Shannon's source-coding theorem
  The main theorem of information content
  A measure of the number of bits needed to specify the expected outcome of an experiment

Shannon's noisy-channel coding theorem
  Describes how much information we can transmit over a channel
  A strict bound on information transfer

Kolmogorov complexity
  Measures the algorithmic information content of a string
  An uncomputable function

Page 4: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


What is information?

First try at a definition…

Suppose you have stored n different bookmarks on your web browser.

What is the minimum number of bits you need to store these as binary numbers?

Let I be the minimum number of bits needed. Then,

2^I >= n, so I >= log2 n

So, the “information” contained in your collection of n bookmarks is I0 = log2 n

Page 5: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Deterministic information I0

Consider a set of alternatives: X = {a1, a2, a3, …aK} When the outcome is a3, we say x = a3

I0(X) is the amount of information needed to specify the outcome of X: I0(X) = log2 |X|

We will assume base 2 from now on (unless stated otherwise); units are bits (binary digits)

Relationship between bits and binary digits: B = {0, 1}, X = B^M = the set of all binary strings of length M, so I0(X) = log |B^M| = log 2^M = M bits

Page 6: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Is this definition satisfactory?

Appeal to your intuition…

Which of these two messages contains more “information”?

“Dog bites man”

or

“Man bites dog”

Page 7: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Is this definition satisfactory?

Appeal to your intuition…

Which of these two messages contains more “information”?

“Dog bites man”

or

“Man bites dog”

Same number of bits to represent each message!

But, it seems like the second message contains a lot more information than the first. Why?

Page 8: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Enter probability theory…

Surprising events (unexpected messages) contain more information than ordinary or expected events: "Dog bites man" occurs much more frequently than "Man bites dog"

Messages about less frequent events carry more information

So, information about an event varies inversely with the probability of that event

But, we also want information to be additive If message xy contains sub-parts x and y, we want:

I(xy) = I(x) + I(y)

Use the logarithm function: log(xy) = log(x) + log(y)

Page 9: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


New Definition of Information

Define the information contained in a message x in terms of log of the inverse probability of that message: I(x) = log(1/P(x)) = - log P(x)

First defined rigorously and studied by Shannon (1948) “A mathematical theory of communication” – electronic handout

(PDF file) on class website.

Our previous definition is a special case:
  Suppose you had n equally likely items (e.g. bookmarks)
  For any item x, P(x) = 1/n, so I(x) = log(1/P(x)) = log n
  Same as before (the minimum number of bits needed to store n items)
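A quick numerical check in Python (a minimal sketch; the headline probabilities below are made up for illustration, not values from the lecture):

    import math

    def self_information(p):
        # Self-information I(x) = -log2 P(x), in bits
        return -math.log2(p)

    print(self_information(0.1))     # hypothetical P("Dog bites man"): ~3.3 bits
    print(self_information(0.0001))  # hypothetical P("Man bites dog"): ~13.3 bits (rarer => more informative)
    print(self_information(1/8))     # n = 8 equally likely bookmarks: log2(8) = 3 bits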

Page 10: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Review: Axioms of probability theory

Kolmogorov, 1933:
  P(a) >= 0, where a is an event
  P(Ω) = 1, where Ω is the certain event
  P(a + b) = P(a) + P(b), where a and b are mutually exclusive

The Kolmogorov (axiomatic) definition is computable, and probability theory forms the basis for information theory. The classical definition based on event frequencies (Bernoulli) is uncomputable:

  P(a) = lim (n -> infinity) n_a / n

where n_a is the number of occurrences of a in n trials.

Page 11: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Review: Results from probability theory

Joint probability of two events a and b: P(ab)

Independence Events a and b are independent if P(ab) = P(a)P(b)

Conditional probability: P(a|b) = probability that event a happens given that b has happened P(a|b) = P(ab)/P(b) P(b|a) = P(ba)/P(a) = P(ab)/P(a)

We just proved Bayes' Theorem:

  P(a|b) = P(b|a) P(a) / P(b)

P(a) is called the a priori probability of a; P(a|b) is called the a posteriori probability of a

Page 12: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Summary: Postulates of information theory

1. Information is defined in the context of a set of alternatives. The amount of information quantifies the number of bits needed to specify an outcome from the alternatives

2. The amount of information is independent of the semantics (only depends on probability)

3. Information is always positive

4. Information is measured on a logarithmic scale: probabilities are multiplicative, but information is additive

Page 13: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


In-Class Example

Message y contains duplicates: y = xx

Message x has probability P(x)

What is the information content of y? Is I(y) = 2 I(x)?

Page 14: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


In-Class Example

Message y contains duplicates: y = xx

Message x has probability P(x)

What is the information content of y? Is I(y) = 2 I(x)?

I(y) = log(1/P(xx)) = log[1/(P(x|x)P(x))] = log(1/P(x|x)) + log(1/P(x))

= 0 + log(1/P(x)) = I(x)    (since P(x|x) = 1)

Duplicates convey no additional information!

Page 15: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Definition: Entropy

The average self-information or entropy of an ensemble X= {a1, a2, a3, …aK}

  H(X) = E[ log(1/P(x)) ] = Σ_{k=1..K} P(a_k) log(1/P(a_k))

where E denotes the expected (or average) value

Page 16: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Properties of Entropy

0 <= H(X) <= I0(X)
  Equals I0(X) = log |X| if all the a_k's are equally probable
  Equals 0 if only one a_k is possible

Consider the case where K = 2: X = {a1, a2}, P(a1) = p, P(a2) = 1 - p

  H(X) = p log(1/p) + (1 - p) log(1/(1 - p))

Page 17: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Examples

Entropy is a measure of randomness of the source producing the events

Example 1 : Coin toss: Heads or tails with equal probability H = -(½ log ½ + ½ log ½) = -(½ (-1) + ½ (-1)) = 1 bit per coin toss

Example 2 : P(heads) = ¾ and P(tails) = ¼ H = -(¾ log ¾ + ¼ log ¼) = 0.811 bits per coin toss

As things get less random, entropy decreases; redundancy and regularity increase
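A short Python sketch (not from the slides) that reproduces both coin-toss numbers directly from the entropy formula:

    import math

    def entropy(probs):
        # H(X) = sum_k P(a_k) log2(1/P(a_k)), skipping zero-probability outcomes
        return sum(p * math.log2(1 / p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))    # Example 1: fair coin -> 1.0 bit per toss
    print(entropy([0.75, 0.25]))  # Example 2: biased coin -> ~0.811 bits per toss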

Page 18: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Question

If we have N different symbols, we can encode them in log(N) bits. Example: English has 26 letters, so we need ceil(log2 26) = 5 bits

So, over many, many messages, the average cost/symbol is still 5 bits.

But, letters occur with very different probabilities! “A” and “E” much more common than “X” and “Q”. The log(N) estimate assumes equal probabilities.

Question: Can we encode symbols based on probabilities so that the average cost/symbol is minimized?

Page 19: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Shannon’s noiseless source-coding theorem

Also called the fundamental theorem. In words:
  You can compress N independent, identically distributed (i.i.d.) random variables, each with entropy H, down to NH bits with negligible loss of information (as N -> infinity)
  If you compress them into fewer than NH bits you will dramatically lose information

The theorem: Let X be an ensemble with H(X) = H bits. Let H_δ(X^N) be the entropy of an encoding of X^N with allowable probability of error δ. Given any ε > 0 and 0 < δ < 1, there exists a positive integer N0 such that, for N > N0,

  | (1/N) H_δ(X^N) - H | < ε

Page 20: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Comments on the theorem

What do the two inequalities tell us?

The number of bits per outcome that we need to specify x^N with vanishingly small error probability does not exceed H + ε:

  (1/N) H_δ(X^N) < H + ε

If we accept a vanishingly small error, the number of bits we need to specify x^N drops to N(H + ε)

The number of bits per outcome that we need to specify x^N, even with a large allowable error probability, is at least H - ε:

  (1/N) H_δ(X^N) > H - ε

Page 21: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Source coding (data compression)

Question: How do we compress the outcomes X^N?
  With vanishingly small probability of error
  How do we assign codes to the elements of X such that the number of bits we need to encode X^N drops to N(H + ε)?

Symbol coding: Given x = a3 a2 a7 … a5, generate the codeword φ(x) = 01 1010 00. Want I0(φ(x)) ~ H(X)

Well-known coding examples Zip, gzip, compress, etc. The performance of these algorithms is, in general, poor when

compared to the Shannon limit

Page 22: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Source-coding definitions

A code is a function φ: X -> B+
  B = {0, 1}; B+ is the set of finite strings over B
  B+ = {0, 1, 00, 01, 10, 11, 000, 001, …}
  φ(x) = φ(x1) φ(x2) φ(x3) … φ(xN)

A code is uniquely decodable (UD) iff φ: X+ -> B+ is one-to-one

A code is instantaneous iff no codeword is the prefix of another, i.e. φ(x1) is not a prefix of φ(x2)

Page 23: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Huffman coding

Given X = {a1, a2, …aK}, with associated probabilities P(ak)

Given a code with codeword lengths n1, n2, … nK

The expected code length is

  n̄ = Σ_{k=1..K} P(a_k) n_k,  and  n̄ >= H(X)

No instantaneous, UD code can achieve a smaller n̄ than a Huffman code

Page 24: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Constructing a Huffman code

Feynman example: Encoding an alphabet
  The code is instantaneous and UD: 00100001101010 = ANOTHER
  The code achieves close to the Shannon limit: H(X) = 2.06 bits; n̄ = 2.13 bits
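A minimal Huffman-construction sketch in Python; the four-symbol alphabet and probabilities below are illustrative stand-ins, not the table from Feynman's example:

    import heapq

    def huffman_code(probs):
        # probs: dict mapping symbol -> probability; returns dict mapping symbol -> codeword.
        # Standard construction: repeatedly merge the two least-probable subtrees.
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        counter = len(heap)   # unique tie-breaker so equal probabilities compare cleanly
        while len(heap) > 1:
            p1, _, codes1 = heapq.heappop(heap)
            p2, _, codes2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (p1 + p2, counter, merged))
            counter += 1
        return heap[0][2]

    probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}   # hypothetical source
    code = huffman_code(probs)
    nbar = sum(probs[s] * len(code[s]) for s in probs)
    print(code)   # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
    print(nbar)   # 1.75 bits, which equals H(X) for this dyadic source

For this (dyadic) source the Huffman code meets the entropy exactly; in general n̄ >= H(X).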

Page 25: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Information channels

I(X;Y) is the average mutual information between X and Y

Definition: Channel capacity The information capacity of a channel

is: C = max[I(X;Y)]

The channel may add noise, corrupting our symbols

[Diagram: input x -> channel -> output y]

H(X) is the entropy of the input ensemble X; I(X;Y) is what we know about X given Y

  I(X;Y) = H(X) - H(X|Y) = Σ_{x,y} P(xy) log[ P(xy) / (P(x) P(y)) ] = H(Y) - H(Y|X)

Page 26: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Example: Channel capacity

Problem: A binary source sends equiprobable messages in a time T, using the alphabet {0, 1} with a symbol rate R. As a result of noise, a “0” may be mistaken for a “1”, and a “1” for a “0”, both with probability q. What is the channel capacity C?

[Diagram: X -> channel -> Y]  The channel is discrete and memoryless

Page 27: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Example: Channel capacity (con’t)

Assume no noise (no errors). T is the time to send the string, R is the rate
  The number of possible message strings is 2^(RT)
  The maximum entropy of the source is H0 = log(2^(RT)) = RT bits
  The source rate is (1/T) H0 = R bits per second

The entropy of the noise (per transmitted bit) is Hn = q log[1/q] + (1 - q) log[1/(1 - q)]

The channel capacity C (bits/sec) = R - R·Hn = R(1 - Hn)
  C is always less than R (a fixed fraction of R)!
  We must add code bits to correct the received message
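A small Python check of this capacity formula for the binary symmetric channel; the symbol rate R and flip probability q below are hypothetical values:

    import math

    def binary_entropy(q):
        # Hn = q log2(1/q) + (1-q) log2(1/(1-q)); zero when q is 0 or 1
        if q in (0.0, 1.0):
            return 0.0
        return q * math.log2(1 / q) + (1 - q) * math.log2(1 / (1 - q))

    R = 1000.0   # symbols per second (assumed)
    q = 0.1      # probability of a bit flip (assumed)
    Hn = binary_entropy(q)
    print(Hn)                # ~0.469 bits of noise entropy per transmitted bit
    print(R * (1 - Hn))      # ~531 bits/sec of usable capacity, always less than R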

Page 28: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


How many code bits must we add?

We want to send a message string of length M. We add code bits to M, thereby increasing its length to Mc

How are M, Mc, and q related?

M = Mc(1 – Hn) Intuitively, from our example Also see pgs. 106 – 110 of Feynman Note: this is an asymptotic limit

May require a huge Mc

Page 29: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Shannon’s Channel-Coding Theorem

The Theorem: There is a nonnegative channel capacity C associated with each discrete memoryless channel, with the following property: for any symbol rate R < C and any error rate ε > 0, there is a protocol that achieves a rate >= R and a probability of error <= ε

In words: If the entropy of our symbol stream is equal to or less than the

channel capacity, then there exists a coding technique that enables transmission over the channel with arbitrarily small error

Can transmit information at a rate H(X) <= C

Shannon’s theorem tells us the asymptotically maximum rate It does not tell us the code that we must use to obtain this rate Achieving a high rate may require a prohibitively long code

Page 30: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Error-correction codes

Error-correcting codes allow us to detect and correct errors in symbol streams Used in all signal communications (digital phones, etc) Used in quantum computing to ameliorate effects of decoherence

Many techniques and algorithms Block codes Hamming codes BCH codes Reed-Solomon codes Turbo codes

Page 31: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Hamming codes

An example: Construct a code that corrects a single error We add m check bits to our message

Can encode at most (2^m - 1) error positions; errors can occur in the message bits and/or in the check bits

If n is the length of the original message, then we need 2^m - 1 >= (n + m). Examples:
  If n = 11, m = 4: 2^4 - 1 = 15 >= (n + m) = 15
  If n = 1013, m = 10: 2^10 - 1 = 1023 >= (n + m) = 1023

Page 32: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Hamming codes (cont.)

Example: An 11/15 SEC (single-error-correcting) Hamming code
  Idea: Calculate parity over subsets of the input bits. Four subsets: four parity bits

Check bit x stores the parity of the bit positions whose binary representation holds a "1" in position x:
  Check bit c1: Bits 1,3,5,7,9,11,13,15
  Check bit c2: Bits 2,3,6,7,10,11,14,15
  Check bit c3: Bits 4,5,6,7,12,13,14,15
  Check bit c4: Bits 8,9,10,11,12,13,14,15

The parity-check bits are called a syndrome The syndrome tells us the location of the error

Position in message (binary = decimal):
  0001 =  1    0110 =  6    1011 = 11
  0010 =  2    0111 =  7    1100 = 12
  0011 =  3    1000 =  8    1101 = 13
  0100 =  4    1001 =  9    1110 = 14
  0101 =  5    1010 = 10    1111 = 15

Page 33: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Hamming codes (con’t)

The check bits specify the error location

Suppose check bits turn out to be as follows: Check c1 = 1 (Bits 1,3,5,7,9,11,13,15)

Error is in one of bits 1,3,5,7,9,11,13,15 Check c2 = 1 (Bits 2,3,6,7,10,11,14,15)

Error is in one of bits 3,7,11,15 Check c3 = 0 (Bits 4,5,6,7,12,13,14,15)

Error is in one of bits 3,11 Check c4 = 0 (Bits 8,9,10,11,12,13,14,15)

So error is in bit 3!!

Page 34: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Hamming codes (cont.)

Example: Encode 10111011011
  Code position: 15 14 13 12 11 10 9 8  7 6 5 4  3 2  1
  Code symbol:    1  0  1  1  1  0 1 c4 1 0 1 c3 1 c2 c1
  Codeword:       1  0  1  1  1  0 1 1  1 0 1 1  1 0  1

Notice that we can generate the code bits on the fly!

What if we receive 101100111011101?
  c4 = 1, c3 = 0, c2 = 1, c1 = 1
  The error is in location 1011 (binary) = 11 (decimal)
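A compact Python sketch of this 15-bit single-error-correcting scheme (a generic illustration of the idea, not code from the lecture):

    def hamming15_syndrome(bits):
        # bits[i] is the bit at code position i+1 (positions 1..15, as in the table above).
        # The syndrome is the XOR of the positions holding a 1; 0 means no single-bit error.
        syndrome = 0
        for pos in range(1, 16):
            if bits[pos - 1]:
                syndrome ^= pos
        return syndrome

    # The codeword from the example, listed from position 1 up to position 15
    codeword = [1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
    print(hamming15_syndrome(codeword))   # 0: a valid codeword

    received = codeword[:]
    received[11 - 1] ^= 1                 # flip position 11, as in the received string above
    syndrome = hamming15_syndrome(received)
    print(syndrome)                       # 11: the syndrome points at the error location
    received[syndrome - 1] ^= 1           # correct the error
    print(received == codeword)           # True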

Page 35: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Kolmogorov Complexity (Algorithmic Information)

Computers represent information as stored symbols
  Not probabilistic (in the Shannon sense)
  Can we quantify information from an algorithmic standpoint?

Kolmogorov complexity K(s) of a finite binary string s is the single, natural number representing the minimum length (in bits) of a program p that generates s when run on a Universal Turing machine U K(s) is the algorithmic information content of s Quantifies the “algorithmic randomness” of the string

K(s) is an uncomputable function Similar argument to the halting problem

How do we know when we have the shortest program?

Page 36: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Kolmogorov Complexity: Example

Randomness of a string defined by shortest algorithm that can print it out.

Suppose you were given the binary string x:“11111111111111….11111111111111111111111” (1000 1’s)

Instead of 1000 bits, you can compress this string to a few tens of bits, representing the length |P| of the program: For I = 1 to 1000

Print “1”

So, K(x) <= |P|
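K(x) is uncomputable, but any compressor gives a computable upper bound on algorithmic information content (up to the constant cost of the decompressor). A hedged Python sketch using zlib as such a proxy:

    import os
    import zlib

    x = b"1" * 1000          # the highly regular string of 1000 1's from the example
    r = os.urandom(125)      # 1000 random bits, for comparison

    print(len(zlib.compress(x)))   # roughly a dozen bytes: the regularity compresses away
    print(len(zlib.compress(r)))   # ~130 bytes: random data does not compress
    # This only bounds K from above; since K is uncomputable, we can never be sure a bound is tight.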

Possible project topic: Quantum Kolmogorov complexity?

Page 37: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


5-minute break…

Next: Thermodynamics and Reversible Computing

Page 38: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Thermodynamics and the Physics of Computation

Physics imposes fundamental limitations on computing Computers are physical machines Computers manipulate physical quantities Physical quantities represent information

The limitations are both technological and theoretical Physical limitations on what we can build

Example: Silicon-technology scaling Major limiting factor in the future: Power Consumption Theoretical limitations of energy consumed during computation

Thermodynamics and computation

Page 39: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Principal Questions of Interest

How much energy must we use to carry out a computation? The theoretical, minimum energy

Is there a minimum energy for a certain rate of computation? A relationship between computing speed and energy consumption

What is the link between energy and information? Between information–entropy and thermodynamic–entropy

Is there a physical definition for information content? The information content of a message in physical units

Page 40: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Main Results

Computation has no inherent thermodynamic cost
  A reversible computation that proceeds at an infinitesimal rate consumes no energy

Destroying information requires kT ln 2 joules per bit
  Information-theoretic bits (not binary digits)

Driving a computation forward requires kT ln(r) joules per step
  r is the rate of going forward rather than backward
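To put kT ln 2 into concrete units, a one-line Python calculation at room temperature (T = 300 K is an assumed value for illustration):

    import math

    k = 1.380649e-23            # Boltzmann's constant, joules per kelvin
    T = 300.0                   # assumed room temperature, kelvin
    print(k * T * math.log(2))  # ~2.87e-21 J: minimum cost of erasing one bit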

Page 41: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Basic thermodynamics

First law: Conservation of energy
  (heat put into system) + (work done on system) = increase in energy of the system: Q + W = ΔU
  The total energy of the universe is constant

Second law: It is not possible to have heat flow from a colder region to a hotter region, i.e. ΔS >= 0, where the change in entropy is ΔS = Q/T
  Equality holds only for reversible processes
  The entropy of the universe is always increasing

Page 42: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Heat engines

A basic heat engine: Q2 = Q1 – W T1 and T2 are temperatures T1 > T2

Reversible heat engines are those that have: No friction Infinitesimal heat gradients

The Carnot cycle:
  Motivation was the steam engine
  Reversible
  Pumps heat Q from T1 to T2
  Does work W = Q1 - Q2

Page 43: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Heat engines (cont.)

Page 44: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


The Second Law

No engine that takes heat Q1 at T1 and delivers heat Q2 at T2 can do more work than a reversible engine W = Q1 – Q2 = Q1(T1 – T2) / T1

Heat will not, by itself, flow from a cold object to a hot object

Page 45: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Thermodynamic entropy

If we add heat Q reversibly to a system at fixed temperature T, the increase in entropy of the system is ΔS = Q/T

S is a measure of degrees of freedom The probability of a configuration

The probability of a point in phase space

In a reversible system, the total entropy is constant

In an irreversible system, the total entropy always increases

Page 46: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Thermodynamic versus Information Entropy

Assume a gas containing N atoms Occupies a volume V1

Ideal gas: No attraction or repulsion between particles

Now shrink the volume Isothermally (at constant temperature, immerse in a bath) Reversibly, with no friction

How much work does this require?

Page 47: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Compressing the gas

From mechanics:
  work = force × distance:  W = F Δx
  force = pressure × (area of piston):  F = pA
  volume change = (area of piston) × distance:  ΔV = A Δx
  Solving:  W = p ΔV

From gas theory, the ideal gas law: pV = NkT
  N is the number of molecules, k is Boltzmann's constant (in joules/kelvin)

Solving:

  W = ∫_{V1}^{V2} (NkT / V) dV = NkT ln(V2 / V1)
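A quick numerical check of this formula in Python, for the single-molecule case discussed on a later slide; T = 300 K and a halving of the volume are assumed values:

    import math

    k = 1.380649e-23      # Boltzmann's constant, J/K
    T = 300.0             # assumed temperature, K
    N = 1                 # a single molecule
    V1, V2 = 1.0, 0.5     # halve the volume (only the ratio V2/V1 matters)

    W = N * k * T * math.log(V2 / V1)
    print(W)              # ~ -2.87e-21 J: negative, i.e. we do kT ln 2 of work on the gas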

Page 48: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


A few notes

W is negative because we are doing work on the gas: V2 < V1

W would be positive if the gas did work for us

Where did the work go? Isothermal compression

The temperature is constant (same before and after)

First law: The work went into heating the bath

Second law: We decreased the entropy of the gas and increased the entropy of the bath

Page 49: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Free energy and entropy

The total energy of the gas, U, remains unchanged Same number of particles Same temperature

The “free energy” Fe, and the entropy S both change Both are related to the number of states (degrees of freedom)

Fe = U – TS

For our experiment, change in free energy is equal to the work done on the gas and U remains unchanged

  ΔS = Nk ln(V2 / V1)

  ΔFe = ΔU - T ΔS = -T ΔS = -NkT ln(V2 / V1)

ΔFe is the (negative) heat siphoned off into the bath

Page 50: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Special Case: N = 1

Imagine that our gas contains only one molecule
  Take statistical averages of the same molecule over time, rather than over a population of particles
  Halve the volume

Fe increases by +kT ln 2; S decreases by k ln 2; but U is constant

What's going on? Our knowledge of the possible locations of the particle has changed!
  There are fewer places that the molecule can be in, now that the volume has been halved
  The entropy, a measure of the uncertainty of a configuration, has decreased

Page 51: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Thermodynamic entropy revisited

Take the probability of a gas configuration to be P. Then S ~ k ln P

Random configurations (molecules moving haphazardly) have large P and large S

Ordered configurations (all molecules moving in one direction) have small P and small S

The less we know about a gas… the more states it could be in and the greater the entropy

A clear analogy with information theory

Page 52: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


The fuel value of knowledge

Analysis is from Bennett: Tape cells with particles coding 0 (left side) or 1 (right side)

If we know the message on a tape Then randomizing the tape can do useful work

Increasing the tape’s entropy

What is the fuel value of the tape (i.e. what is the fuel value of our knowledge)?

Page 53: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Bennett’s idea

The procedure:
  Tape cell comes in with a known particle location
  Orient a piston depending on whether the cell is a 0 or a 1
  Particle pushes the piston outward, increasing the entropy by k ln 2 and providing free energy of kT ln 2 joules per bit
  Tape cell goes out with a randomized particle location

Page 54: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


The energy value of knowledge

Define the fuel value of the tape = (N - I) kT ln 2
  N is the number of tape cells
  I is the information content (Shannon)

Examples Random tape (I = N) has no fuel value Known tape (I = 0) has maximum fuel value

Page 55: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Feynman’s tape-erasing machine

Define the information in the tape to be the amount of free energy required to reset the tape
  The energy required to compress each bit to a known state
  Only the "surprise" bits cost us energy; it doesn't take any energy to reset known bits

Cost to erase the tape: I·kT ln 2 joules

For known bits, just move the partition (without changing the volume)

Page 56: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Reversible Computing

A reversible computation that proceeds at an infinitesimal rate, destroying no information, consumes no energy
  Regardless of the complexity of the computation
  The only cost is in resetting the machine at the end: erasing information costs energy

Reversible computers are like heat engines If we run a reversible heat engine at an infinitesimal pace, it

consumes no energy other than the work that it does

Page 57: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Energy cost versus speed

We want our computations to run in finite time We need to drive the computation forward

Dissipates energy (kinetic, thermal, etc.)

Assume we are driving the computation forward at a rate r The computation is r times as likely to go forward as go backward

What is the minimum energy per computational step?

Page 58: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Energy-driven computation

Computation is a transition between states State transitions have an associated energy diagram

Assume forward state E2 has a lower energy than backward state E1

“A” is the activation energy for a state transition Thermal fluctuations cause the computer to move between states

Whenever the energy exceeds “A”

We also used this model in neural

networks (e.g. Hopfield networks)

Page 59: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


State transitions

The probability of a transition between states differing in positive energy E is proportional to exp(–E/kT)

Our state transitions have unequal probabilities The energy required for a forward step is (A – E1) The energy required for a backward step is (A – E2)

  forward rate = C e^(-(A - E1)/kT),  backward rate = C e^(-(A - E2)/kT)

  r = (forward rate) / (backward rate) = e^((E1 - E2)/kT)

Page 60: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Driving computation by energy differences

The (reaction) rate r depends only on the energy difference between successive states The bigger (E1 – E2), the more likely the state transitions, and the

faster the computation

Energy expended per step = E1 - E2 = kT ln r
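A small numerical illustration in Python (T = 300 K and the rate ratios r are assumed values): the energy per step grows only logarithmically with how strongly we bias the computation forward.

    import math

    k = 1.380649e-23   # Boltzmann's constant, J/K
    T = 300.0          # assumed temperature, K

    for r in (2, 10, 1000):             # hypothetical forward-to-backward rate ratios
        print(r, k * T * math.log(r))   # r=2 costs ~2.9e-21 J per step; r=1000 still only ~2.9e-20 J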

Page 61: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Driving computation by state availability

We can drive a computation even if the forward and backward states have the same energy As long as there are more forward states than backward states

The computation proceeds by diffusion More likely to move into a state with greater availability Thermodynamic entropy drives the computation

  r = (forward rate) / (backward rate) = n2 / n1

  kT ln r = kT (ln n2 - ln n1) = (S2 - S1) T

Page 62: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Rate-Driven Reversible Computing: A Biological Example

Protein synthesis is an example…
  of (nearly) reversible computation
  of the copy computation
  of a computation driven forward by thermodynamic entropy

Protein synthesis is a 2-stage process 1. DNA forms mRNA 2. mRNA forms a protein

We will consider step 1

Page 63: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


DNA

DNA comprises a double-stranded helix
  Each strand comprises alternating phosphate and sugar groups
  One of four bases attaches to each sugar: Adenine (A), Thymine (T), Cytosine (C), Guanine (G)

(base + sugar + phosphate) group is called a nucleotide

DNA provides a template for protein synthesis The sequence of nucleotides forms a code

Page 64: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


RNA polymerase

RNA polymerase attaches itself to a DNA strand Moves along, building an mRNA strand one base at a time

RNA polymerase catalyzes the copying reaction
  Within the nucleus there is DNA, RNA polymerase, and triphosphates (nucleotides with 2 extra phosphates), plus other stuff
  The triphosphates are adenosine triphosphate (ATP), cytosine triphosphate (CTP), guanine triphosphate (GTP), and uracil triphosphate (UTP)

Page 65: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


mRNA

The mRNA strand is complementary to the DNA The matching pairs are

  DNA   RNA
   A     U
   T     A
   C     G
   G     C

As each nucleotide is added, two phosphates are released Bound as a pyrophosphate

Page 66: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


The process

Page 67: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


RNA polymerase is a catalyst

Catalysts influence the rate of a biochemical reaction But not the direction

Chemical reactions are reversible
  RNA polymerase can unmake an mRNA strand just as easily as it can make one
  Grab a pyrophosphate, attach it to a base, and release

The direction of the reaction depends on the relative concentrations of the pyrophosphates and triphosphates More triphosphates than pyrophosphates: Make RNA More pyrophosphates than triphosphates: Unmake RNA

Page 68: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


DNA, entropy, and states

The relative concentrations of pyrophosphate and triphosphate define the number of states available Cells hydrolyze pyrophosphate to keep the reactions going forward

How much energy does a cell use to drive this reaction?
  Energy = kT ln r = (S2 - S1) T ~ 100 kT per bit

Page 69: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Efficiency of a representation

Cells create protein engines (mRNA) for 100kT/bit

0.03µm transistors consume 100kT per switching event

Think of representational efficiency What does each system get for 100kT?

Digital logic uses an impoverished representation: ~10^4 switching events to perform an 8-bit multiply

Semiconductor scaling doesn’t improve the representation We pay a huge thermodynamic cost to use discrete math

Page 70: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Example 2: Computing using Reversible Logic Gates

Two reversible gates: the controlled-NOT (CN) and the controlled-controlled-NOT (CCN).

A CN gate (inputs A, B; outputs A' = A, B' = A XOR B):

  A B | A' B'
  0 0 | 0  0
  0 1 | 0  1
  1 0 | 1  1
  1 1 | 1  0

A CCN gate (inputs A, B, C; outputs A' = A, B' = B, C' = C XOR (A AND B)):

  A B C | A' B' C'
  0 0 0 | 0  0  0
  0 0 1 | 0  0  1
  0 1 0 | 0  1  0
  0 1 1 | 0  1  1
  1 0 0 | 1  0  0
  1 0 1 | 1  0  1
  1 1 0 | 1  1  1
  1 1 1 | 1  1  0

CCN is complete: we can form any Boolean function using only CCN gates; e.g., with C = 0 the output C' = A AND B
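A minimal Python sketch of these two gates (illustrative, not from the slides), showing both their reversibility and the AND trick with C = 0:

    def cn(a, b):
        # Controlled-NOT: flips B when the control A is 1
        return a, a ^ b

    def ccn(a, b, c):
        # Controlled-controlled-NOT (Toffoli): flips C when both A and B are 1
        return a, b, c ^ (a & b)

    # Reversibility: applying a gate twice returns the original inputs
    print(cn(*cn(1, 1)))         # (1, 1)
    print(ccn(*ccn(1, 1, 0)))    # (1, 1, 0)

    # Completeness example: with C = 0, the third output of CCN is A AND B
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, ccn(a, b, 0)[2])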

Page 71: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Next Week: Quantum Computing

Reversible Logic Gates and Quantum Computing Quantum versions of CN and CCN gates Quantum superposition of states allows exponential speedup

Shor’s fast algorithm for factoring and breaking the RSA cryptosystem

Grover’s database search algorithm

Physical substrates for quantum computing

Page 72: CSE 599 Lecture 7: Information Theory, Thermodynamics and Reversible Computing


Next Week…

Guest Lecturer: Dan Simon, Microsoft Research Introductory lecture on quantum computing and Shor’s algorithm

Discussion and review afterwards

Homework # 4 due: submit code and results electronically by Thursday (let us know if you have problems meeting the deadline)

Sign up for project and presentation times

Feel free to contact instructor and TA if you want to discuss your project

Have a great weekend!