the free-energy principle : a rough guide to the brain ? k friston

The free-energy principle: a rough guide to the brain?

K Friston

Computational Modeling of Intelligence

11.03.04. (Fri)

Summarized by Joon Shik Kim

Sufficient Statistics

• Quantities which are sufficient to parameterise a probability density (e.g., mean and covariance of a Gaussian density).

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
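As a minimal sketch of the Gaussian example, the Python below estimates the two sufficient statistics (mean and variance) from a handful of samples and then evaluates the density they parameterise. The function names and the sample values are illustrative assumptions, not part of the original slides.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of a univariate Gaussian with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def fit_sufficient_statistics(samples):
    """Return the sample mean and the maximum-likelihood variance."""
    n = len(samples)
    mu = sum(samples) / n
    sigma2 = sum((s - mu) ** 2 for s in samples) / n
    return mu, sigma2

# Hypothetical data: once mu and sigma2 are known, the samples are no longer needed.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
mu, sigma2 = fit_sufficient_statistics(samples)
density_at_mean = gaussian_pdf(mu, mu, sigma2)
```

The two numbers returned by `fit_sufficient_statistics` are all that `gaussian_pdf` needs, which is the sense in which they are sufficient.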

Surprise

• or self-information is the negative log-probability of an outcome. An improbable outcome is therefore surprising.

$$-\ln p(y \mid \alpha)$$

$y$: sensory input

$\alpha$: action
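A short sketch of surprise as a negative log-probability, measured in nats. The probabilities below are made-up values chosen only to contrast a likely and an unlikely outcome.

```python
import math

def surprise(probability):
    """Self-information (in nats) of an outcome with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(probability)

likely = surprise(0.9)     # small surprise
unlikely = surprise(0.01)  # large surprise
```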

Kullback-Leibler Divergence

• Information divergence, information gain, cross or relative entropy is a non-commutative measure of the difference between two probability distributions.

$$D_{KL}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)}\,dx$$
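The non-commutativity is easy to see numerically. The sketch below uses the discrete analogue of the integral (a sum over outcomes) and assumes that q is non-zero wherever p is non-zero; the two distributions are arbitrary illustrative choices.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = kl_divergence(p, q)  # divergence of Q from P
d_qp = kl_divergence(q, p)  # divergence of P from Q: a different number
```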

Conditional Density

• or posterior density is the probability distribution of causes or model parameters, given some data; i.e., a probabilistic mapping from observed data to causes.

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

$$P(E) = \sum_i p(E \mid H_i)\,p(H_i)$$

$E$: evidence; $H$: hypothesis; $p(H)$: prior; $p(H \mid E)$: posterior
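A minimal sketch of this mapping from data to causes for a finite set of hypotheses. The prior and likelihood values are hypothetical.

```python
def posterior(priors, likelihoods):
    """Return (P(H_i | E) for each i, P(E)) from P(H_i) and P(E | H_i)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(E) = sum_i P(E | H_i) P(H_i)
    return [j / evidence for j in joint], evidence

priors = [0.5, 0.5]       # two equally plausible hypotheses
likelihoods = [0.8, 0.2]  # the data are four times more likely under the first
post, evidence = posterior(priors, likelihoods)
```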

Generative Model

• or forward model is a probabilistic mapping from causes to observed consequences (data). It is usually specified in terms of the likelihood of getting some data given their causes (parameters of a model) and priors on the parameters.

$$p(w \mid D) \propto p(D \mid w)\,p(w)$$

Prior

• The probability distribution or density on the causes of data that encode beliefs about those causes prior to observing the data.

$p(H)$ or $p(w)$

Empirical Priors

• Priors that are induced by hierarchical models; they provide constraints on the recognition density in the usual way but depend on the data.

Bayesian Surprise

• A measure of salience based on the divergence between the recognition and prior densities. It measures the information in the data that can be recognised.
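As a rough sketch, Bayesian surprise can be computed as the Kullback-Leibler divergence from the prior to the updated beliefs. Here the recognition density is taken to be the exact posterior, which is a simplifying assumption, and all numbers are hypothetical. The first case shows data that are rare but leave beliefs unchanged, so they carry no Bayesian surprise.

```python
import math

def bayes_update(prior, likelihood):
    """Posterior over hypotheses after observing data with the given likelihoods."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def bayesian_surprise(prior, likelihood):
    """KL divergence (nats) of the posterior from the prior."""
    post = bayes_update(prior, likelihood)
    return sum(r * math.log(r / p) for r, p in zip(post, prior) if r > 0.0)

prior = [0.5, 0.5]
# Rare data (probability 0.01) that are equally likely under both hypotheses.
rare_but_uninformative = bayesian_surprise(prior, [0.01, 0.01])
# Data that favour the first hypothesis and therefore shift beliefs.
informative = bayesian_surprise(prior, [0.8, 0.2])
```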

Entropy

• The average surprise of outcomes sampled from a probability distribution or density. A density with low entropy means, on average, the outcome is relatively predictable.

$$S = -\int p(x) \ln p(x)\,dx$$
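The sketch below uses the discrete analogue of this integral to compare a flat distribution with a sharply peaked one. Both distributions are illustrative assumptions.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution: the average surprise."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

uniform = entropy([0.25, 0.25, 0.25, 0.25])  # every outcome equally surprising
peaked = entropy([0.97, 0.01, 0.01, 0.01])   # the outcome is usually predictable
```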

Ergodic

• A process is ergodic if its long term time-average converges to its ensemble average. Ergodic processes that evolve for a long time forget their initial states.

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = \frac{1}{\mu(X)} \int f \, d\mu$$
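A small numerical illustration, using a standard textbook example that is not taken from the slides: rotation of the unit circle by an irrational angle. The observable and the rotation angle are assumptions chosen so that the ensemble average is exactly zero; the time average approaches it from any starting point.

```python
import math

ALPHA = math.sqrt(2.0) % 1.0  # irrational rotation angle (illustrative choice)

def rotate(x):
    """One step of the map T(x) = x + ALPHA (mod 1) on the unit circle."""
    return (x + ALPHA) % 1.0

def observable(x):
    """f(x) = cos(2 pi x); its average over [0, 1) is exactly 0."""
    return math.cos(2.0 * math.pi * x)

def time_average(x0, n):
    """(1/n) * sum_{k=0}^{n-1} f(T^k x0)."""
    total, x = 0.0, x0
    for _ in range(n):
        total += observable(x)
        x = rotate(x)
    return total / n

# Two different initial states give (nearly) the same long-run average.
average_a = time_average(0.1, 10000)
average_b = time_average(0.7, 10000)
```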

Free-energy

• An information theory measure that bounds (is greater than) the surprise on sampling some data, given a generative model.

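$$F(y, \mu \mid \alpha) = E - TS$$

$$= -\langle \ln p(y, \vartheta \mid \alpha) \rangle_q + \langle \ln q(\vartheta; \mu) \rangle_q$$

$$F = D(q(\vartheta; \mu) \,\|\, p(\vartheta \mid y)) - \ln p(y \mid m)$$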
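The bound can be checked on a toy problem. The sketch below assumes a single observation and a hidden cause with two possible values; the prior and likelihood are hypothetical numbers. Free-energy is computed as expected energy minus entropy, and it equals the surprise only when the recognition density is the true posterior.

```python
import math

def free_energy(q, joint):
    """F = <ln q(theta)>_q - <ln p(y, theta)>_q for a discrete cause theta."""
    return sum(qi * (math.log(qi) - math.log(ji))
               for qi, ji in zip(q, joint) if qi > 0.0)

prior = [0.5, 0.5]       # p(theta), hypothetical
likelihood = [0.8, 0.2]  # p(y | theta) for the observed y, hypothetical
joint = [l * p for l, p in zip(likelihood, prior)]  # p(y, theta)
evidence = sum(joint)                                # p(y)
surprise = -math.log(evidence)
posterior = [j / evidence for j in joint]            # p(theta | y)

f_guess = free_energy([0.5, 0.5], joint)  # an arbitrary recognition density
f_exact = free_energy(posterior, joint)   # the recognition density equals the posterior
```

The gap `f_guess - surprise` is the divergence between the recognition density and the posterior, so reducing free-energy with respect to q pulls q towards the posterior.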

Generalised Coordinates

• of motion cover the value of a variable, its motion, acceleration, jerk and higher orders of motion. A point in generalised coordinates corresponds to a path or trajectory over time.

$$\tilde{u} = (u, u', u'', \ldots)$$
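One way to read the correspondence between a point and a trajectory is through a Taylor expansion: the value and its temporal derivatives at one instant determine a local path. The sketch below makes that reading explicit; it is an interpretation offered for illustration, and the coordinate values are hypothetical.

```python
import math

def local_trajectory(generalised, t):
    """Taylor-series path u(t) implied by (u, u', u'', ...) evaluated at t = 0."""
    return sum(d * t ** k / math.factorial(k) for k, d in enumerate(generalised))

# Position 1, velocity 2, acceleration -1 (hypothetical values).
u_tilde = [1.0, 2.0, -1.0]
path = [local_trajectory(u_tilde, t) for t in (0.0, 1.0, 2.0)]
```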

Gradient Descent

• An optimization scheme that finds a minimum of a function by changing its arguments in proportion to the negative of the gradient of the function at the current value.

$$w(t+1) = w(t) - \eta \frac{\partial E}{\partial w}$$
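A minimal sketch of the update rule on a one-dimensional quadratic. The objective, the learning rate and the step count are illustrative assumptions.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Iterate w(t+1) = w(t) - learning_rate * grad(w(t))."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# E(w) = (w - 3)^2 has gradient 2 (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```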

Helmholtz Machine

• Device or scheme that uses a generative model to furnish a recognition density. Helmholtz machines learn hidden structure in data by optimising the parameters of generative models.

Stochastic

• The successive states of stochastic processes are governed by random effects.

Free Energy

Dynamic Model of World and Recognition

Neuronal Architecture

What is the computational role of neuromodulation?

• Previous treatments suggest that modulatory neurotransmitters have distinct roles; for example, ‘dopamine signals the error in reward prediction, serotonin controls the time scale of reward prediction, noradrenalin controls the randomness in action selection, and acetylcholine controls the speed of memory update’. This contrasts with a single role in encoding precision above. Can the apparently diverse functions of these neurotransmitters be understood in terms of one role (encoding precision) in different parts of the brain?

Can we entertain ambiguous percepts?

• Although not an integral part of the free-energy principle, we claim the brain uses unimodal recognition densities to represent one thing at a time. Although there is compelling evidence for bimodal ‘priors’ in sensorimotor learning, people usually assume the ‘recognition’ density collapses to a single percept when sensory information becomes available. The implicit challenge here is to find any electrophysiological or psychological evidence for multimodal recognition densities.

Does avoiding surprise suppress salient information?

• No; a careful analysis of visual search and attention suggests that: ‘only data observations which substantially affect the observer’s beliefs yield (Bayesian) surprise, irrespectively of how rare or informative in Shannon’s sense these observations are.’ This is consistent with active sampling of things we recognize (to reduce free-energy). However, it remains an interesting challenge to formally relate Bayesian surprise to the free-energy bound on (Shannon) surprise. A key issue here is whether saliency can be shown to depend on top-down perceptual expectations.

Which optimisation schemes does the brain use?

• We have assumed that the brain uses a deterministic gradient descent on free-energy to optimise action and perception. However, it might also use stochastic searches, sampling the sensorium randomly for a percept with low free-energy. Indeed, there is compelling evidence that our eye movements implement an optimal stochastic strategy. This raises interesting questions about the role of stochastic searches, from visual search to foraging, in both perception and action.
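To make the contrast between the two kinds of scheme concrete, the toy sketch below minimises a hypothetical double-well function standing in for free-energy. Deterministic gradient descent settles in whichever well it starts near, whereas a simple random search samples the whole interval and keeps the best candidate. Nothing here is a claim about how the brain implements either scheme; the objective, step size, sample count and seed are all assumptions.

```python
import random

def energy(x):
    """Hypothetical double-well objective standing in for free-energy."""
    return x ** 4 - 3.0 * x ** 2 + x

def energy_gradient(x):
    """Derivative of the objective above."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def gradient_descent(x0, learning_rate=0.01, steps=500):
    """Deterministic descent: follows the local slope from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * energy_gradient(x)
    return x

def random_search(low, high, samples=2000, seed=0):
    """Stochastic search: draw candidates uniformly and keep the lowest-energy one."""
    rng = random.Random(seed)
    candidates = (rng.uniform(low, high) for _ in range(samples))
    return min(candidates, key=energy)

x_descent = gradient_descent(2.0)    # ends in the shallower well on the right
x_random = random_search(-2.0, 2.0)  # finds the deeper well on the left
```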
