
THE FOUR C's OF NEUROINFORMATION THEORY:

CODING, COMPUTING,

CONTROL AND COGNITION

IBM Almaden: Institute on Cognitive Computing, May 10-11, 2006

Toby Berger

University of Virginia

Charlottesville, VA 22903

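[Figure: block diagram with blocks labeled Brain, Sensory System, Motor System, Environment and Selector. Block transition laws: $p_1(v(k) \mid s(k-1), v(k-1), m(k-1))$, $p_2(m(k) \mid v(k), m(k-1))$, $p_3(e(k) \mid m(k), e(k-1))$. Signals shown: $s(k-1)$, $v(k)$, $v(k-1)$, $m(k)$, $m(k-1)$, $e(k)$, $e(k-1)$.]

FIG. 1 BLOCK DIAGRAM OF MARKOV-MARKO BRAIN MODEL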

[Figure: Fig. 1 block diagram repeated, annotated with the labels (Control), (Coding), (Cognition) and (Computation)]

MY BIO-IT COLLABORATORS

• Prof. William B. “Chip” Levy, UVA Med, neuroscientist and my prime bio-collaborator.

• Former Grad Students: Zhen Zhang, Yuzheng Ying, Jun Chen

• PhD Candidate: Prapun Suksompong

FIGURE 1 OF EVERY INFORMATION THEORY TEXTBOOK

• Channel is “fixed” and “given.”

• Future source data does not depend on past outputs to user (open loop).

• Channel behavior is independent of source statistics.

• Good performance usually requires computationally intense, long-delay source and channel codes.

• Source and user must exchange coding rules a priori and must share a common “language.”

[Figure: Source → Source Encoder → Channel Encoder → Channel → Channel Decoder → Source Decoder → User]

BUT, IN THE INTRA-ORGANISM COMMUNICATION THAT NEUROSCIENTISTS STUDY,

• Channels are not fixed. They adapt their transition probabilities over eons, or over milliseconds, in response to the empirical distribution of the source.

• Future source data depends on past outputs to user.

• Time-varying joint source-channel coding often can be efficiently performed by biochemical subsystems of appropriate topology via simple probabilistic transformations. No coding occurs in the classical sense of information theory.

WAIT! What about DNA?

Long block code, discrete alphabet, extensive redundancy, perhaps to guard against the infiltration of errors.

But DNA enables two organisms to communicate; it’s designed for inter-organism communication.

DNA also controls gene expression, an intra-organism process, so a comprehensive theory of intra-organism communication eventually needs to address it.

ROBUST SHANNON-OPTIMAL PERFORMANCE WITHOUT CODING

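Ex. 1: IID Gaussian Source, MSE; AWGN Channel

$R(D) = \frac{1}{2}\log\frac{\sigma^2}{D}, \qquad C = \frac{1}{2}\log\left(1 + \frac{S}{N}\right)$

Equating R(D) to C yields the Shannon-optimum mean distortion: $D = \sigma^2\left(1 + \frac{S}{N}\right)^{-1}$

But this minimum possible MSE per unit variance can be achieved simply by scaling the signal to the available channel input power level and then scaling the channel output to produce the MMSE estimate!

$E(X - Y)^2 = \sigma^2\left(1 + \frac{S}{N}\right)^{-1}$

[Figure: Source $X \sim \mathcal{N}(0, \sigma^2)$ → scale to channel input power $S$ → AWGN channel, noise $\mathcal{N}(0, N)$ → MMSE scaling → User $Y$]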
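A minimal Monte Carlo sketch of Ex. 1, assuming a zero-mean Gaussian source of variance σ² and a real-valued AWGN channel; the function names and the parameter values (σ² = 2, S = 3, N = 1) are illustrative, not from the slide. The transmitter gain sets the channel input power to S, and the receiver gain is the linear MMSE coefficient Cov(X, W)/Var(W) for channel output W.

```python
import math
import random

def uncoded_gaussian_mse(sigma2, S, N, n=100_000, seed=1):
    """Monte Carlo MSE of the uncoded scheme: scale, send over AWGN, MMSE-scale."""
    rng = random.Random(seed)
    a = math.sqrt(S / sigma2)            # transmitter gain: channel input power becomes S
    b = math.sqrt(S * sigma2) / (S + N)  # receiver gain: Cov(X, W) / Var(W), the linear MMSE coefficient
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, math.sqrt(sigma2))      # IID N(0, sigma2) source sample
        w = a * x + rng.gauss(0.0, math.sqrt(N))   # AWGN channel output
        y = b * w                                  # estimate delivered to the user
        total += (x - y) ** 2
    return total / n

def shannon_optimum_mse(sigma2, S, N):
    """D obtained by equating R(D) = 0.5*log(sigma2/D) with C = 0.5*log(1 + S/N)."""
    return sigma2 / (1.0 + S / N)

sigma2, S, N = 2.0, 3.0, 1.0
print(shannon_optimum_mse(sigma2, S, N), uncoded_gaussian_mse(sigma2, S, N))
```

With σ² = 2, S = 3 and N = 1 the Shannon optimum is D = 2/(1 + 3) = 0.5, and the simulated MSE should agree up to Monte Carlo error, with no block coding and no delay.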

ROBUST SHANNON-OPTIMAL PERFORMANCE WITHOUT CODING

Ex. 2: Bern-1/2 Source, Hamming Distance; BSC(p) Channel

$R(D) = 1 - h(D), \qquad C = 1 - h(p)$, where $h(\cdot)$ is the binary entropy function

Equating R(D) to C yields the Shannon-optimum Hamming distortion: $D = p$

This minimum possible Hamming distortion obviously can be achieved simply by feeding the source output directly into the channel and sending the channel output directly to the user – no delay, no coding!!

$E\,d_H(X, Y) = P(X \neq Y) = p$

[Figure: Source, Bern-1/2 → $X$ → BSC(p) → $Y$ → User]
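The same point for Ex. 2, as a minimal Python sketch assuming p = 0.1 (an illustrative value). The uncoded error rate estimates D, and h2 is the binary entropy h used in R(D) = 1 − h(D) and C = 1 − h(p), so D = p makes the two equal.

```python
import math
import random

def h2(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def uncoded_bsc_error_rate(p, n=100_000, seed=2):
    """Fraction of bits the user receives in error when source bits go straight into a BSC(p)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n):
        x = rng.getrandbits(1)          # Bern-1/2 source bit
        y = x ^ (rng.random() < p)      # BSC(p): flip the bit with probability p
        errors += x != y
    return errors / n

p = 0.1
D = uncoded_bsc_error_rate(p)
print(D, 1 - h2(D), 1 - h2(p))   # empirical D, R(D) at that D, and C
```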

SHANNON OPTIMALITY IS ACHIEVED WITHOUT CODING OR DELAY IN THESE TWO EXAMPLES BECAUSE:

Source is matched to the channel. Source outputs are distributed over channel input space in a way that maximizes the mutual info rate between the channel input and output subject to operative constraint(s), thereby achieving capacity.

Channel is matched to the source. The channel transition probability structure is optimum for the source and distortion measure; i.e., it achieves the point on their rate-distortion function at which the rate equals the channel’s capacity.

[INSPIRED BY MY ABOVE EXAMPLES 1 AND 2, B. RIMOLDI, M. GASTPAR AND M. VETTERLI HAVE DETERMINED A BROAD CLASS OF EXAMPLES THAT EXHIBIT SUCH DOUBLE MATCHING, FIRST WITHOUT AND LATER WITH NOISELESS FEEDBACK OF THE CHANNEL OUTPUTS TO THE ENCODER.]

I CONTEND THAT MOST BIOLOGICAL SYSTEMS HAVE EVOLVED TO BE NEARLY DOUBLY MATCHED LIKE THIS. THUS, THEY HANDLE DATA OPTIMALLY WITH MINIMAL IF ANY CODING AND NEGLIGIBLE DELAY.

Information theorists recently have come to appreciate that near-optimum performance can be obtained in many situations via relatively simple probabilistic methods that employ feedback in the source encoder and/or around the channel, and/or in the channel decoder. Biology has known this for eons.

BUT THERE’S MORE! LIVING ORGANISMS ARE INGENIOUSLY ENERGY-AWARE*.

THEY’RE OPTIMALLY DOUBLY MATCHED OVER A WIDE RANGE OF POWER CONSUMPTION LEVELS.

THEY HAVE EVOLVED THE ABILITY TO CHANGE THEIR INTERNAL CHANNEL TRANSITION FUNCTIONS, OVER BOTH THE LONG RUN AND THE SHORT RUN, TO MEET THE INFORMATION RATE NEEDS OF THE APPLICATION AT HAND.

*The brain consumes 25-50% of the total metabolic energy budget of a sedentary human. (L. Sokoloff (1989), “Circulation and energy metabolism of the brain,” in Basic Neurochemistry: Molecular, Cellular and Medical Aspects, 4th ed., G. Siegel et al., Eds.)

[Figure: capacity C (bits/s) versus average power S (joules/s), the concave curve $C = \frac{1}{2}\log\left(1 + \frac{S}{N}\right)$; slope = (bits/s)/(joules/s) = bits/joule]

N.B. Increasing joules/s to get more bits/s requires expending more joules/bit!!
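The N.B. can be checked numerically. This sketch assumes the slide's formula with logarithms to base 2, one channel use per unit time (so bits per use stand in for bits/s) and noise power N = 1; all three are illustrative assumptions. Energy per bit is S/C(S), which grows with S because C is concave with C(0) = 0.

```python
import math

def capacity_bits(S, N=1.0):
    """C = 0.5*log2(1 + S/N): bits per channel use, read as bits/s at one use per second."""
    return 0.5 * math.log2(1.0 + S / N)

def energy_per_bit(S, N=1.0):
    """Average power divided by capacity: joules/s over bits/s = joules/bit."""
    return S / capacity_bits(S, N)

for S in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"S = {S:7.2f}   C = {capacity_bits(S):.4f}   joules/bit = {energy_per_bit(S):.3f}")
```

As S → 0 the energy per bit approaches its floor 2N ln 2 (about 1.386 for N = 1), and every increase in power raises it.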

NEURON CARDINALITY

There are approximately $10^{11}$ neurons in the human brain.

Each neuron forms synapses with between 10 and $10^5$ others, resulting in a total of circa $10^{15}$ synapses.

From age -1/2 to age +2, the number of synapses increases at a net rate of a million per second, day and night; many are abandoned, too.

It had long been believed that neuron and synapse formation effectively cease after age 1 and age 2, respectively, but recent studies have shown that they continue until at least age 6.

MULTICASTING:

• Viewed as a network, the human brain simultaneously multicasts $10^{11}$ messages that have an average of $10^4$ recipients each. Each of these $10^{11} \times 10^4 = 10^{15}$ destinations receives a new binary digit (spike or no spike) once every 2.5 ms, which is the effective spike width.

• Moreover, 2.5 ms later another petabit that depends on the outcome of processing the previous one has been multicast. (The Internet pales by comparison!)

• The brain does not simply use store-and-forward routing. Rather, it uses an intensive form of network coding, the exciting new information-theoretic discipline recently introduced by Raymond Yeung and Bob Li. (See, e.g., the latest IT Outstanding Paper Award winning article by Yeung, Li, Ahlswede, and Cai.)
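The multicast arithmetic can be checked in a few lines of Python. The inputs are the slide's order-of-magnitude figures, and the bits-per-second rate is simply one petabit every 2.5 ms; it ignores the network coding and the asynchronous firing discussed here.

```python
neurons = 10**11   # approximate number of neurons (slide figure)
fan_out = 10**4    # average recipients per multicast message (slide figure)
slot_s = 2.5e-3    # effective spike width, 2.5 ms

destinations = neurons * fan_out           # binary digits delivered per 2.5 ms slot
bits_per_second = destinations / slot_s    # one petabit every 2.5 ms

print(destinations == 10**15, f"{bits_per_second:.1e} bits/s")   # True 4.0e+17 bits/s
```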

Time permitting, we shall see below that the fact that neurons actually fire asynchronously in continuous time may enable them to send considerably more bits per second than their relatively low firing rates would suggest.

DEFINITION OF A “TEAM” OF SENSORY NEURONS

THE AXONS IN A TEAM OF SENSORY NEURONS FORM MANY OF THEIR SYNAPSES WITH OTHER NEURONS IN THE TEAM (HORIZONTAL, FEEDBACK). SOMETIMES THE LOCAL CONNECTIVITY IS CLOSE TO 50%, AS OPPOSED TO ONLY $10^{-7}$ BRAINWIDE.

THE REMAINDER OF THE SYNAPSES TO WHICH A TEAM’S AXONS ARE EFFERENT ARE SPLIT BETWEEN “LOWER” NEURONS (TOP-DOWN FEEDBACK) AND “HIGHER” NEURONS (BOTTOM-UP FEEDFORWARD).

TIME-DISCRETE MODEL OF A “TEAM” OF NEURONS

[Figure: time-discrete model of a team of neurons; label: PSPs]

MAXIMUM INFORMATION RATE HYPOTHESIS

The process {X(k)} afferent to a team of neurons has the property that it maximizes the directed mutual information rate from {X(k)} to the efferent process {Y(k)} that it generates, where the maximization is over all processes that lead to the same or smaller energy expenditure in the Y-neurons.

Remarks: 1) Energy is expended in the synapses, both in receiving and in responding to afferent excitation, and in the axons, both to restore chemical concentrations during refractory periods following action potential generation and, to a lesser extent, to drive spikes down the axonal ‘transmission lines’.

2) Time permitting, directed information will be defined in a subsequent slide.

The Brain as a Markov Chain

MAIN THEOREM:

IF THE MAXIMUM INFORMATION RATE HYPOTHESIS IS TRUE, THEN:

• $\{(X_k, Y_k)\}$ is a first-order (non-homogeneous) Markov chain

• $\{Y_k\}$ is a first-order (non-homogeneous) Markov chain

• $\{X_k\}$ is not necessarily Markovian

PROOF: Via the Berger-Ying lemmas: Joint work with Yuzheng Ying, to appear in IEEE IT Trans.

REMARKS:

• The max info rate hypothesis says the source {X(k)} is robustly “matched” to the channel’s transition matrix, P(y|x).

• If double matching prevails, as we suspect it does, then the QSF rate parameterizes the rate-distortion function, and distortion is measured by a Weber-Fechner fidelity criterion of the form

$\frac{1}{n}\sum_{k=1}^{n} d\big(x(k), y(k), y(k-1)\big)$

• The Markovianness of the Main Theorem is essential to the brain’s low-latency processing of sensory information. Without it, bottom-up delay would accumulate too fast to allow for the number of hierarchy levels needed to achieve the sophisticated distinctions of which the brain is capable.

NEURAL CODING AND SYNAPTIC CLOCKS

It is widely held that the principal, if not the only, information transmission task a neuron is called upon to perform is to convey continually to its efferent cohort the value of the afferent excitation intensity (a.k.a. the “bombardment”) it has recently been experiencing.

Several investigators have studied the statistics of the durations of interspike intervals (ISI’s) for mathematical models of leaky, fixed-threshold neurons. Both with and without a refractory period included in the model, the ISI’s coefficient of variation (i.e., the ratio σ/μ of its standard deviation to its mean) is greater than 1 over almost the entire range of afferent excitation levels of practical interest; the only exception is at the highest excitation levels, which result in the neuron firing about as fast as it can (saturation).
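To make the σ/μ criterion concrete, here is a Python sketch computing the ISI coefficient of variation. The two synthetic spike trains (Poisson at 20 Hz and perfectly periodic at 20 Hz) are illustrative reference points, not models of the leaky fixed-threshold neurons cited above. A Poisson train has CV = 1, so a CV above 1 means ISIs that vary even more, relative to their mean, than a Poisson train's.

```python
import random
import statistics

def coefficient_of_variation(isis):
    """CV = sigma / mu of a sequence of interspike intervals."""
    return statistics.pstdev(isis) / statistics.fmean(isis)

rng = random.Random(3)
poisson_isis = [rng.expovariate(20.0) for _ in range(100_000)]  # Poisson spiking at 20 Hz (seconds)
regular_isis = [0.05] * 1000                                    # clock-like spiking at 20 Hz

print(round(coefficient_of_variation(poisson_isis), 2))   # close to 1
print(coefficient_of_variation(regular_isis))             # 0.0
```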

FIXED THRESHOLD NEURAL MODELS ARE PLAGUED BY LARGE COEFFICIENTS OF VARIATION

This renders timing codes virtually useless, leaving rate codes as the only means by which a neuron can reliably communicate information to its efferent neighbors about the bombardment intensity it is currently experiencing.

However, that is in direct conflict with numerous recent experiments which convincingly demonstrate that many neurons in cortex and elsewhere exhibit reliable ISI’s in response to repetitions of investigator-controlled stimuli. Also, animals can respond intelligently at latencies which are substantially lower than the time it would take for a hierarchy of rate codes to achieve a useful level of statistical reliability.

A compelling (?) case for this has been made by Berger and Levy, “Encoding of Excitation via Dynamic Thresholding,” NEUROSCIENCE 2004, San Diego, CA, 10/23-28/2004.

[Figure: MEAN PSP v. TIME FOR VARIOUS BOMBARDMENT INTENSITIES; axes PSP vs. Time, ms; curves labeled by increasing intensity]

[Figure: Filtered Poisson PSP’s v. Time; axes PSP vs. Time, ms; a fixed threshold and a descending threshold, with panels showing the spiking times of the red and blue PSPs for the descending threshold and for the fixed threshold]

DYNAMICALLY DESCENDING THRESHOLDS ENABLE TIMING CODES

A descending threshold can serve as a simple mechanism by means of which a neuron can accurately convert (i.e., encode) the value of the excitation intensity it has experienced during an ISI into the duration of that ISI, the interval between two of its successive AP’s. This statement is true regardless of whether the intensity in question is strong, moderate, or weak. A neuron that possesses a fixed threshold cannot accomplish this.
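A deterministic sketch of the encoding idea, under loudly stated assumptions: the mean PSP after a spike is modeled as a leaky integrator driven at a constant bombardment rate, rate·τ_m·(1 − e^{−t/τ_m}), and the descending threshold is taken to be e^{−t/τ_th}. Neither form, nor the constants, comes from the slides (which say the exact shape of T(t) does not matter), and Poisson fluctuations are ignored. The ISI is the first time the mean PSP meets the threshold; because that map is monotone, the efferent side can invert it.

```python
import math

TAU_M = 10.0  # hypothetical membrane time constant, ms

def mean_psp(t, rate):
    """Hypothetical mean PSP at time t (ms) after a spike, for a constant bombardment rate."""
    return rate * TAU_M * (1.0 - math.exp(-t / TAU_M))

def fixed_threshold(t):
    return 1.0

def descending_threshold(t, tau_th=5.0):
    # Hypothetical exponential decay toward zero.
    return math.exp(-t / tau_th)

def isi(rate, threshold, t_max=1000.0, tol=1e-9):
    """First t with mean_psp(t) >= threshold(t), found by bisection; None if never reached."""
    gap = lambda t: mean_psp(t, rate) - threshold(t)   # increasing in t for both thresholds
    if gap(t_max) < 0:
        return None
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return hi

def decode_rate(t, threshold):
    """Invert the encoding: the rate whose mean PSP meets the threshold exactly at ISI t."""
    return threshold(t) / (TAU_M * (1.0 - math.exp(-t / TAU_M)))

for rate in (0.05, 0.2, 1.0):
    t_fix = isi(rate, fixed_threshold)
    t_desc = isi(rate, descending_threshold)
    print(rate, t_fix, round(t_desc, 3), round(decode_rate(t_desc, descending_threshold), 6))
```

With these numbers the fixed threshold never fires for rate 0.05, since the PSP saturates at 0.5, below the threshold of 1, while the descending threshold still produces a finite ISI from which decode_rate recovers the rate.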

It is also known that synapses possess chemical “clocks” that enable them to “remember,” even for hundreds of milliseconds, how long ago their most recent and next-to-most-recent afferent spikes arrived.

ALL THIS LEADS ME TO BELIEVE THAT NEURONS DO INDEED IMPLEMENT ACCURATE, LOW-LATENCY TIMING CODES BY MEANS OF DYNAMIC POST-SYNAPTIC POTENTIAL THRESHOLDS THAT DECAY WITH TIME.

Alternatively, a neuron also can achieve much the same result by having a post-synaptic leakage conductance that varies inversely with PSP. (See, e.g., Brette and Gensler, 2005.)

It may well be that neurons employ a combination of threshold decay and variable leakage conductance. However, in what follows we use only threshold decay terminology.

1. The precise shape of the threshold decay curve is not important; the neurons in the efferent cohort can readily adapt to the shape of T(t).

2. The resulting variance in estimating λ has the form

3. If instead you are interested in estimating log λ,

4. To estimate the accuracy of ISI encoding of bombardment intensity, one must take into account at least the following three sources of imprecision:

i) Imprecision in the instant of generation of an AP

ii) Imprecision in the rates of axonal propagation along the axon for two successive action potentials

(Var .) 1 c

./]log)[( 2log cVar

log

iii) Imprecision in the estimate of the AP’s time of arrival at the synapse. (See Berger and Suksompong, IEEE ISIT, Seattle, July 9-15, 2006.) Doing so shows that neural encoding bit rates can be meaningfully higher than previously had been thought!

5. If the excitation is a time-varying Poisson process, then its intensity function λ(t) is a sufficient statistic for stochastically describing it, so it is the only thing that needs to be communicated.

6. The excitation of a (cortical) neuron is indeed robustly a time-varying Poisson process, despite the individual spike trains of which it is composed not being Poisson and possibly being highly correlated. (This is a consequence of Stein-Chen Poisson approximation theory; cf. C. Stein, IMS Lecture Notes, vol. 78, Lecture VIII, IMS, Hayward, CA, 1986, and subsequent work of Barbour et al., among others.)
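The Poisson-approximation claim can be illustrated in its simplest setting, assumed here: n independent afferent trains, each spiking in a short window with probability 10/n, so the total count has mean 10. The sketch computes the exact total variation distance to a Poisson law with the same mean and compares it with the Barbour–Hall (Stein–Chen) bound (1 − e^{−λ})λ^{−1}Σp_i². The dependent, non-Poisson trains described above need the more general Stein–Chen results; this sketch does not cover them.

```python
import math

def bernoulli_sum_pmf(probs):
    """Exact pmf of the total spike count from independent trains with per-window spike probs."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def poisson_pmf(lam, k):
    # Computed in log space so large k cannot overflow.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_to_poisson(probs):
    """Total variation distance between the spike-count law and Poisson with the same mean."""
    lam = sum(probs)
    pmf = bernoulli_sum_pmf(probs)
    q = [poisson_pmf(lam, k) for k in range(len(pmf))]
    tail = max(0.0, 1.0 - sum(q))   # Poisson mass beyond the largest possible count
    return 0.5 * (sum(abs(a - b) for a, b in zip(pmf, q)) + tail)

def barbour_hall_bound(probs):
    lam = sum(probs)
    return (1.0 - math.exp(-lam)) / lam * sum(p * p for p in probs)

for n in (100, 500, 1000):
    probs = [10.0 / n] * n   # n afferent trains, total mean of 10 spikes per window
    print(n, round(tv_to_poisson(probs), 4), round(barbour_hall_bound(probs), 4))
```

Spreading the same total rate over more, individually sparser trains drives the distance to Poisson toward zero, which is the mechanism behind the robustness claimed above.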


A CHALLENGING, IMPORTANT QUESTION ABOUT RNN’s

Consider a sparsely connected, feedback-heavy network of hundreds of millions of neurons most of which have an in-degree and out-degree of circa 10,000. When galvanized by sensory inputs and exchanging their excitation histories in the manner described above, what kinds of decisions, computations, and responses can such a network generate? (N.B. The excitation history that a neuron communicates does not directly propagate beyond its first-tier neighbors.)


ACTIVITY DURING TIME SLOT k

AT THE START OF TIME SLOT k, e(k-1), s(k-1), v(k-1) AND m(k-1) ALL EXIST ALREADY.

AS SLOT k PROGRESSES, FIRST v(k), NEXT m(k), NEXT e(k), AND FINALLY s(k) GET PRODUCED IN THAT ORDER.
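A toy simulation of the Fig. 1 loop, following the within-slot order v(k), m(k), e(k), s(k). Everything quantitative is hypothetical: all signals are single bits, the kernels p1_sensory, p2_motor and p3_environment are arbitrary probabilities that merely take the conditioning arguments of the Fig. 1 boxes, and the Selector is assumed to pass e(k) through as s(k).

```python
import random

# Hypothetical binary kernels standing in for the boxes of Fig. 1.
# Each returns the probability that the box's output in slot k equals 1.
def p1_sensory(s_prev, v_prev, m_prev):
    return 0.1 + 0.6 * s_prev + 0.2 * v_prev * (1 - m_prev)

def p2_motor(v, m_prev):
    return 0.2 + 0.6 * v + 0.1 * m_prev

def p3_environment(m, e_prev):
    return 0.15 + 0.5 * m + 0.3 * e_prev

def selector(e):
    # Hypothetical: the sensed stimulus s(k) is simply the current environment e(k).
    return e

def run_slot(prev, rng):
    """One time slot: v(k), then m(k), then e(k), then s(k), in that order."""
    s_prev, v_prev, m_prev, e_prev = prev
    v = int(rng.random() < p1_sensory(s_prev, v_prev, m_prev))
    m = int(rng.random() < p2_motor(v, m_prev))
    e = int(rng.random() < p3_environment(m, e_prev))
    s = selector(e)
    return (s, v, m, e)

def simulate(n_slots, seed=0):
    rng = random.Random(seed)
    state = (0, 0, 0, 0)                # (s, v, m, e) at slot 0
    trajectory = [state]
    for _ in range(n_slots):
        state = run_slot(state, rng)    # reads only the previous slot's values
        trajectory.append(state)
    return trajectory

print(simulate(5))
```

Because run_slot reads only the previous (s, v, m, e), the joint process is first-order Markov by construction, which is the structural property the following slides examine.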

MARKOV REVISITED

THE NOTATION USED IN THE BOXES IN FIG. 1, e.g., $p_1(v(k) \mid s(k-1), v(k-1), m(k-1))$,

IMPLIES THAT THE CONDITIONAL PROBABILITY OF THE RANDOM VECTOR APPEARING BEFORE THE CONDITIONING BAR WOULD NOT CHANGE IF ONE WERE TO INCLUDE AFTER THE CONDITIONING BAR TIME-PREDECESSORS OF ONE OR MORE OF THE VECTORS THAT CURRENTLY APPEAR THERE. THAT IS, THE MODEL TREATS THE (SENSORY, MOTOR, ENVIRONMENT)-DYNAMIC SYSTEM AS JOINTLY FIRST-ORDER MARKOV. SURELY, THIS IS ONLY AN APPROXIMATION TO REALITY. HOWEVER, THE NEXT TWO SLIDES DISCUSS HOW TO BUILD THE SENSORY PORTION OF THE MODEL SO THAT IT ACCURATELY RESPECTS THE NEUROBIOLOGY WHILE AT THE SAME TIME BEING FIRST-ORDER MARKOV.

BRAIN STATE AS A FIRST-ORDER MARKOV PROCESS

IT DOES NOT SUFFICE TO USE AS THE STATE OF THE BRAIN AT TIME k A BINARY VECTOR WHOSE jth COMPONENT EQUALS 1 IF NEURON j HAS FIRED DURING SLOT k-1 AND 0 IF IT HAS NOT. THAT’S BECAUSE THE NEURONS THAT HAVE NOT FIRED DURING THE LAST SLOT CARRY OVER INTO THE NEXT SLOT INFORMATION ABOUT THE SIZE OF THEIR SUB-THRESHOLD PSP’s AND THE STATUS OF CERTAIN OF THEIR SYNAPTIC CLOCKS.

INSTEAD, WE INTRODUCE A STATE VECTOR L(k) WHOSE jth COMPONENT IS THE NUMBER OF TIME SLOTS THAT HAVE TRANSPIRED SINCE THE LAST SLOT IN WHICH NEURON j GENERATED A SPIKE. THE COMPONENTS OF L(k) THAT ARE ZERO INDEX THE SET OF NEURONS THAT HAVE JUST FIRED IN THE PREVIOUS SLOT, SO THIS SUBSUMES THE USUAL STATE VECTOR. MOREOVER, IT ALLOWS US TO TAKE DYNAMIC THRESHOLDS INTO ACCOUNT, WITH ABSOLUTE REFRACTORINESS CORRESPONDING TO A THRESHOLD THAT IS INFINITELY HIGH DURING THE SLOT IMMEDIATELY FOLLOWING ONE IN WHICH A NEURON HAS FIRED. L(k) CAPTURES EVERYTHING THAT MATTERS EXCEPT QUANTAL SYNAPTIC FAILURE (QSF), WHICH WE ADDRESS ON THE NEXT SLIDE.
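The state update just described fits in a few lines of Python. In this sketch the geometric threshold decay (theta0, decay) is a hypothetical choice; only the update rule for L(k), the recovery of the usual binary fired vector from the zero components, and the infinite threshold in the slot right after a spike come from the slide.

```python
import math

def next_age_vector(L_prev, fired):
    """L(k) from L(k-1) and the spikes of slot k-1: 0 if neuron j just fired, else one slot older."""
    return [0 if spiked else age + 1 for age, spiked in zip(L_prev, fired)]

def fired_in_previous_slot(L):
    """The usual binary state vector is recovered as the zero components of L(k)."""
    return [age == 0 for age in L]

def threshold(age, theta0=1.0, decay=0.8):
    """Hypothetical dynamic threshold indexed by L_j(k)."""
    if age == 0:
        return math.inf          # absolute refractoriness in the slot right after a spike
    return theta0 * decay ** (age - 1)

L = next_age_vector([3, 0, 5], fired=[False, True, True])
print(L)                          # [4, 0, 0]
print(fired_in_previous_slot(L))  # [False, True, True]
```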

BRAIN STATE AUGMENTED BY QSF DATA

QSF’s PROVIDE A POTENT MECHANISM FOR MAKING THE CONDITIONAL DISTRIBUTIONS IN THE THREE MAIN BOXES OF OUR MODEL GENUINELY PROBABILISTIC. THIS IS CRUCIAL TO MANY PHENOMENA OF NEUROSCIENTIFIC INTEREST, INCLUDING THE BUILDING OF AN INTERNAL STOCHASTIC MODEL OF THE ENVIRONMENT, THE RANDOM NATURE OF WHICH CAN BE VARIED RAPIDLY OVER A LARGE DYNAMIC RANGE.

INCORPORATING QSF’s NECESSITATES INCREASING THE SIZE OF THE STATE VECTOR FROM THE NUMBER OF NEURONS TO THE NUMBER OF SYNAPSES, A FACTOR OF ABOUT $10^4$ IN THE CASE OF THE HUMAN BRAIN. THE COMPONENTS THUS ADDED ARE BINARY, EQUALING 1 IF THE LAST SPIKE AFFERENT TO NEURON j FROM NEURON i WAS FAILED AND 0 IF IT WASN’T. THIS IS BECAUSE THE CONDITIONAL PROBABILITY THAT THE NEXT SPIKE TO ARRIVE AT SYNAPSE (i,j) WILL BE FAILED DEPENDS BOTH ON HOW LONG IT HAS BEEN SINCE A SPIKE LAST ARRIVED THERE AND ON WHETHER OR NOT THAT SPIKE WAS FAILED. WITH THIS AUGMENTATION WE GET A HIGHLY ACCURATE FIRST-ORDER MARKOV MODEL OF THE BRAIN.
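A sketch of the per-synapse augmentation, assuming a made-up failure law. The slide states only that the failure probability at synapse (i, j) depends on the time since the last arrival and on whether that spike was failed; the specific form in failure_prob (a depletion bump after a successful release that fades with time constant tau) and its constants are hypothetical.

```python
import math
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SynapseState:
    slots_since_arrival: int  # slots since a spike last arrived at synapse (i, j)
    last_failed: bool         # True if that spike was failed (the slide's binary component)

def failure_prob(state, base=0.5, depletion=0.4, tau=5.0):
    """Hypothetical QSF law: right after a successful release the synapse is partly
    depleted, so failure is more likely; the extra risk fades with time."""
    if state.last_failed:
        return base
    return base + depletion * math.exp(-state.slots_since_arrival / tau)

def advance_slot(state):
    """No spike arrived this slot: only the elapsed-time component changes."""
    return SynapseState(state.slots_since_arrival + 1, state.last_failed)

def spike_arrives(state, rng):
    """A spike arrives: sample whether it is failed and reset the elapsed time."""
    failed = rng.random() < failure_prob(state)
    return SynapseState(0, failed)

rng = random.Random(0)
state = SynapseState(slots_since_arrival=0, last_failed=False)
for slot in range(6):
    state = spike_arrives(state, rng) if slot % 3 == 0 else advance_slot(state)
    print(slot, state)
```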

MARKO REVISITED

NEURONS IN A CORTICAL REGION, SAY V2, RECEIVE SOME OF THEIR INPUTS DIRECTLY FROM OTHERS IN V2 (HORIZONTAL), SOME FROM OTHERS IN V3 AND ABOVE (TOP DOWN), AND SOME FROM OTHERS IN V1 AND BELOW (BOTTOM UP).

AS A CONSEQUENCE, THE INFORMATION THESE NEURONS TRANSMIT TO OTHERS VIA THEIR AXONAL SPIKES IS DYNAMICALLY DETERMINED IN REAL TIME BY THE INPUTS THEY ARE STEADILY RECEIVING. THE NEURONS THAT CONSTITUTE V2 THEREFORE ARE NOT INFORMATION SOURCES IN THE SHANNON SENSE. THAT IS, THEY DO NOT GENERATE DATA A PRIORI AND INDEPENDENTLY OF WHAT THEY HEAR FROM THOSE WITH WHOM THEY ARE CONVERSING. THEIR OUTPUTS ARE INSTEAD HEAVILY INFLUENCED BY INPUTS THEY HAVE RECEIVED FROM OTHERS IN BOTH THE RECENT AND THE DISTANT PAST. SUCH SOURCES THUS SUBSCRIBE TO THE COMMUNICATION MODEL INTRODUCED BY MARKO. (H. Marko, “The bidirectional communication theory: A generalization of information theory,” IEEE Trans. Comm., vol. COM-21, pp. 1345-1351, December 1973.)

CANONICAL REMOTE CONTROL PROBLEM

[Figure: NASA Houston linked by three Comm Links and one Control Link]

REPRESENTATION OF THE ENVIRONMENT

We subscribe to the view that, within its brain, a healthy organism steadily builds, refines, extends and modifies a model of its environment. We view this model not as some mystical or metaphysical construct but rather as being instantiated as a collection of interacting neurons. The model may be located in a particular region or regions of the brain, but its crucial importance militates for it being widely distributed over much if not all of the brain. Most of the basic infrastructure of the model is forged during gestation according to genetic prescriptions, including the design of the fundamental mechanisms by means of which the model subsequently will be extended and modified based on acquired experience.

The posited model constitutes an internal representation of the external environment. As such, it is the mechanism by which the organism persistently seeks to solve the “representation problem” of neuropsychology with ever-increasing sophistication.

THE REASON FOR MODEL BUILDING

An organism’s principal reason for constructing and continually updating its internal model of the environment is to learn how to better control that environment. If no physical actions are taken, the organism effectively defaults on any attempt at environmental control. The sine qua non, then, is to learn to generate the most effective motor responses possible based on the environmental stimuli acquired by the sensory organs.

ESTIMATING ENVIRONMENTAL RESPONSE

An organism can use its internal model of the environment to generate estimates of how the environment will react to prospective motor controls. Depending upon the amounts of time, computational ability, and energy consumption that are permissible in a given situation, the organism may be able to input many prospective motor controls to the environmental model. In this connection, since the actual environment contains sources of randomness due both to stochastic natural phenomena and to the usually unpredictable actions of other denizens of the environment, an organism’s model of it should be similarly stochastic. (QSF’s may play a major role in producing this stochasticity.) Therefore, better estimates may result if a given prospective control is put into the model more than once and statistics are gathered about the set of resulting responses of the model.
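The repeated-sampling idea in the last sentence can be sketched as a Monte Carlo evaluation of candidate controls. The internal model env_model, its optimum at 0.3, its noise level and the scoring by mean response are all hypothetical; the point is only that averaging several runs of a stochastic model per prospective control gives a more reliable choice than a single run.

```python
import random
import statistics

def env_model(control, rng):
    """Hypothetical stochastic internal model: mean response peaks at control = 0.3."""
    return -(control - 0.3) ** 2 + rng.gauss(0.0, 0.2)

def evaluate(control, n_samples, rng):
    """Run the model n_samples times on the same prospective control; average the responses."""
    return statistics.fmean(env_model(control, rng) for _ in range(n_samples))

def choose_control(candidates, n_samples, seed=0):
    rng = random.Random(seed)
    scores = {c: evaluate(c, n_samples, rng) for c in candidates}
    return max(scores, key=scores.get), scores

best, scores = choose_control([0.0, 0.3, 0.6, 0.9], n_samples=2000)
print(best, {c: round(v, 3) for c, v in scores.items()})
```

With n_samples=1 the choice rests on a single noisy draw per control; raising n_samples trades time and energy for a more trustworthy estimate, as the paragraph above describes.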

THE PERFORMANCE CRITERION

Adopting the block diagram of Figure 1, and also subscribing to the view that an organism is always engaged in building and exercising a model of its environment in the manner described in the preceding slides, leads to the following conclusion:

The purpose of processing sensory stimuli is less to convey to the top brain what stimuli have been sensed in the past than it is to enable the brain to better predict what stimuli will be sensed in the future.

THE PERFORMANCE CRITERION (Cont.)

IN SYMBOLS, THE SENTIMENT EXPRESSED IN THE PREVIOUS SLIDE IS THAT THE DISTORTION MEASURE TO BE APPLIED IN TIME SLOT k IS NOT OF THE FORM

$d\big(s(k-1), v(k)\big)$

BUT INSTEAD IS OF THE FORM

$d\big(s(k), \hat{s}(k)\big)$

WHERE $\hat{s}(k)$ IS THE BRAIN’S ESTIMATE OF WHAT $s(k)$ WILL BE, BASED ON THE $m(k)$ IT INPUTS TO THE ENVIRONMENT, AS CALCULATED DURING SLOT k ON THE BASIS OF THE $v(k)$ DERIVED FROM PROCESSING $s(k-1)$.

MASSEY REVISITED

DIRECTED INFORMATION WAS INTRODUCED IN A PAIR OF CHARACTERISTICALLY BEAUTIFUL PAPERS BY JIM MASSEY.* AMONG OTHER THINGS, MASSEY SHOWED THAT THE CAPACITY OF A CHANNEL WITH MEMORY AND FEEDBACK IS GIVEN BY THE SUPREMUM OF THE DIRECTED INFORMATION RATE (WHICH HE INTRODUCED THEREIN) FROM THE CHANNEL’S INPUT TO ITS OUTPUT, AS OPPOSED TO THE SUPREMUM OF SHANNON’S MUTUAL INFORMATION RATE, WHICH HE SHOWED IS IN GENERAL STRICTLY GREATER. (S. TATIKONDA HAS SINCE PROVED THE CORRESPONDING CONVERSE THEOREM.)

*1. J. L. Massey, “Causality, feedback and directed information,” Proceedings of the International Symposium on Information Theory and its Applications, Honolulu, HI, Nov. 27-30, 1990.

2. J. L. Massey, “Network information theory – some tentative definitions,” DIMACS Workshop on Network Information Theory, March 17, 2003.

MASSEY REVISITED (Cont.)

BUT THE MASSEY-TATIKONDA THEOREM ASSUMES A SHANNON-STYLE SOURCE – ONE OF $2^{TR}$ PRE-GENERATED MESSAGES TO BE SENT DURING AN INTERVAL OF DURATION T. SINCE OUR (S,M,E)-MODEL USES MARKO-STYLE SOURCES, THE M-T THEOREM IS NOT APPLICABLE TO IT.

PERHAPS IT WILL TURN OUT THAT DIRECTED INFORMATION IS RELEVANT TO THE PROBLEM OF NEURAL CODING AND LEARNING, BUT AT PRESENT I SEE NO COMPELLING REASON TO BELIEVE THAT IS THE CASE.
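For reference, the standard form of Massey's definition of directed information, written with $X^i = (X_1, \ldots, X_i)$; the directed information rate is the limit of $\frac{1}{n} I(X^n \to Y^n)$. The chain rule for mutual information gives the inequality, with equality when the channel is used without feedback.

```latex
I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})
\;\le\; \sum_{i=1}^{n} I(X^n; Y_i \mid Y^{i-1}) = I(X^n; Y^n)
```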

BERGER-YING LEMMAS

REGARDLESS OF WHETHER MUTUAL INFORMATION OR DIRECTED INFORMATION IS USED, THE BERGER-YING LEMMAS WILL APPLY. THE B-Y LEMMAS SAY THAT, IF IT IS DESIRED TO MAXIMIZE THE RATE AT WHICH EITHER INFORMATION OR DIRECTED INFORMATION IS SENT PART WAY OR ALL THE WAY AROUND THE LOOP FROM {s(k)} TO {v(k)} TO {m(k)} TO {e(k)}, THEN THE PROCESSES INVOLVED IN THAT PORTION OF THE LOOP WILL BE JOINTLY FIRST-ORDER MARKOV. MOREOVER, EACH OF THEM, EXCEPT PERHAPS {s(k)}, WILL BE INDIVIDUALLY FIRST-ORDER MARKOV. THESE FACTS REMAIN TRUE EVEN IF CONSTRAINTS ARE IMPOSED ON THE EXPECTED VALUES OF ONE OR MORE FUNCTIONS OF (s(k-1), v(k), m(k), e(k), v(k-1), m(k-1), e(k-1)); THIS INCLUDES CONSTRAINTS ON ENERGY USAGE.

“We have knowledge of the past, but we can’t control it. We can control the future, but we have no knowledge of it.” CLAUDE E. SHANNON, 1960

“THE BRAIN IS A WONDERFUL ORGAN. IT STARTS WORKING THE MOMENT YOU GET UP IN THE MORNING AND DOES NOT STOP UNTIL YOU GET TO THE OFFICE.” Robert Frost (1874-1963)

THE END

Temporal Dynamics of a V1 Neuron’s Response to Real and Illusory Contours

From T. S. Lee and M. Nguyen, “Dynamics of subjective contour formation in the early visual cortex,” PNAS 98(4):1907-1911, 2001.
