
Page 1

Stochastic Processes for Physicists
Understanding Noisy Systems

Chapter 1: A review of probability theory

Paul Kirk, Division of Molecular Biosciences, Imperial College London

19/03/2013

Page 2

1.1 Random variables and mutually exclusive events

Random variables

• Suppose we do not know the precise value of a variable, but may have an idea of the relative likelihood that it will have one of a number of possible values.

• Let us call the unknown quantity X.

• This quantity is referred to as a random variable.

Probability

• 6-sided die. Let X be the value we get when we roll the die.

• Describe the likelihood that X will have each of the values 1, . . . , 6 by a number between 0 and 1: the probability.

• If Prob(X = 3) = 1, then we will always get a 3.

• If Prob(X = 3) = 2/3, then we expect to get a 3 about two-thirds of the time.
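As a quick numerical aside (my sketch, not part of the original slides; it assumes NumPy is available), we can estimate Prob(X = 3) for a fair die by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # 100,000 rolls of a fair 6-sided die

# Empirical frequency of rolling a 3; should be close to 1/6 ≈ 0.167
print((rolls == 3).mean())
```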


Page 3

1.1 Random variables and mutually exclusive events

Mutually exclusive events

• The various values of X are an example of mutually exclusive events.

• X can take precisely one of the values between 1 and 6.

• “Mutually exclusive probabilities sum”
  ◦ Prob(X = 3 or X = 4) = Prob(X = 3) + Prob(X = 4).

Figure 1.1. An illustration of summing the probabilities of mutually exclusive events, both for discrete and continuous random variables.

If we want to know the probability for X to be in the range from 3 to 4, we sum all the probabilities for the values from 3 to 4. This is illustrated in Figure 1.1. Since X always takes a value between 1 and 6, the probability for it to take a value in this range must be unity. Thus, the sum of the probabilities for all the mutually exclusive possible values must always be unity. If the die is fair, then all the possible values are equally likely, and each is therefore equal to 1/6.

Note: in mathematics texts it is customary to denote the unknown quantity using a capital letter, say X, and a variable that specifies one of the possible values that X may have as the equivalent lower-case letter, x. We will use this convention in this chapter, but in the following chapters we will use a lower-case letter for both the unknown quantity and the values it can take, since it causes no confusion.

In the above example, X is a discrete random variable, since it takes the discrete set of values 1, . . . , 6. If instead the value of X can be any real number, then we say that X is a continuous random variable. Once again we assign a number to each of these values to describe their relative likelihoods. This number is now a function of x (where x ranges over the values that X can take), called the probability density, and is usually denoted by $P_X(x)$ (or just $P(x)$). The probability for X to be in the range from x = a to x = b is now the area under P(x) from x = a to x = b. That is

$$\mathrm{Prob}(a < X < b) = \int_a^b P(x)\,dx. \qquad (1.1)$$
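As an optional illustration of Eq. (1.1) (my sketch; it assumes SciPy, and the standard normal is just an example choice of P(x)):

```python
import numpy as np
from scipy import integrate, stats

a, b = -1.0, 2.0
pdf = stats.norm.pdf  # example density P(x): standard normal

# Prob(a < X < b) is the area under P(x) from a to b
prob, _ = integrate.quad(pdf, a, b)
print(prob, stats.norm.cdf(b) - stats.norm.cdf(a))  # the two should agree
```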


Page 4

1.1 Random variables and mutually exclusive events

“Note: in mathematics texts it is customary to denote the unknown quantity using a capital letter, say X, and a variable that specifies one of the possible values that X may have as the equivalent lower-case letter, x. We will use this convention in this chapter, but in the following chapters we will use a lower-case letter for both the unknown quantity and the values it can take, since it causes no confusion.”

So, rather than writing Prob(X = 3) or Prob(X = x), we will (in later chapters) tend to write Prob(3) or Prob(x).

Warning: may cause confusion.



Page 6

1.1 Random variables and mutually exclusive events

Continuous random variables

• For continuous random variables, the probability for X to be within a range is found by integrating the probability density function.

• $\mathrm{Prob}(a < X < b) = \int_a^b P(x)\,dx$



Page 7

1.1 Random variables and mutually exclusive events

Expectation

• The expectation of an arbitrary function, $f(X)$, with respect to the probability density function $P(X)$ is

$$\langle f(X) \rangle_{P(X)} = \int_{-\infty}^{\infty} P(x)\,f(x)\,dx.$$

• The mean or expected value of $X$ is $\langle X \rangle$.
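To make the definition concrete, a small sketch (mine; SciPy assumed, with f(X) = X² and a standard normal density as example choices):

```python
import numpy as np
from scipy import integrate, stats

f = lambda x: x**2  # example choice of f

# <f(X)> = integral of P(x) f(x) dx; for a standard normal, <X^2> = 1
val, _ = integrate.quad(lambda x: stats.norm.pdf(x) * f(x), -np.inf, np.inf)
print(val)  # ≈ 1.0
```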


Page 8

1.1 Random variables and mutually exclusive events

Variance

• The variance of X is the expectation of the squared difference from the mean:

$$
\begin{aligned}
V[X] &= \int_{-\infty}^{\infty} P(x)\,(x - \langle X \rangle)^2\,dx \\
&= \int_{-\infty}^{\infty} P(x)\,\big(x^2 + \langle X \rangle^2 - 2x\langle X \rangle\big)\,dx \\
&= \int_{-\infty}^{\infty} P(x)\,x^2\,dx + \int_{-\infty}^{\infty} P(x)\,\langle X \rangle^2\,dx - \int_{-\infty}^{\infty} P(x)\,2x\langle X \rangle\,dx \\
&= \langle X^2 \rangle + \left(\langle X \rangle^2 \int_{-\infty}^{\infty} P(x)\,dx\right) - \left(2\langle X \rangle \int_{-\infty}^{\infty} P(x)\,x\,dx\right) \\
&= \langle X^2 \rangle + \langle X \rangle^2 - 2\langle X \rangle^2 \\
&= \langle X^2 \rangle - \langle X \rangle^2
\end{aligned}
$$
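A one-line numerical check of the identity $V[X] = \langle X^2 \rangle - \langle X \rangle^2$ (my sketch; the exponential distribution is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)  # any distribution will do

# Both expressions estimate the same variance (here sigma^2 = 4)
print(np.mean(x**2) - np.mean(x)**2, np.var(x))
```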


Page 9

1.2 Independence

Independence

• “Independent probabilities multiply”

• For independent variables, $P_{X,Y}(x, y) = P_X(x)\,P_Y(y)$

• For independent variables, $\langle XY \rangle = \langle X \rangle\langle Y \rangle$
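A sampling check (my sketch) that $\langle XY \rangle$ factorises into $\langle X \rangle\langle Y \rangle$ when X and Y are drawn independently:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=1_000_000)
y = rng.uniform(0.0, 2.0, size=1_000_000)  # generated independently of x

# For independent X and Y, <XY> = <X><Y>
print(np.mean(x * y), np.mean(x) * np.mean(y))
```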


Page 10

1.3 Dependent random variables

Dependence

• If $X$ and $Y$ are dependent, then $P_{X,Y}(x, y)$ does not factor as the product of $P_X(x)$ and $P_Y(y)$.

• If we know $P_{X,Y}(x, y)$ and want to know $P_X(x)$, then it is obtained by “integrating out” (or “marginalising”) the other variable:

$$P_X(x) = \int_{-\infty}^{\infty} P_{X,Y}(x, y)\,dy$$
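Marginalisation can be checked numerically (my sketch; the correlated bivariate normal is my example choice, and its x-marginal is known to be standard normal):

```python
import numpy as np
from scipy import integrate, stats

# Example joint density: bivariate normal with correlation 0.8
joint = stats.multivariate_normal(mean=[0, 0], cov=[[1, 0.8], [0.8, 1]])

x0 = 0.5
# P_X(x0) = integral over y of P_{X,Y}(x0, y)
marginal, _ = integrate.quad(lambda y: joint.pdf([x0, y]), -np.inf, np.inf)
print(marginal, stats.norm.pdf(x0))  # the two should agree
```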


Page 11

1.3 Dependent random variables

Conditional probability densities

• The probability density for $X$ given that we know that $Y = y$ is written $P(X = x \mid Y = y)$ or $P(x \mid y)$, and is referred to as the conditional probability density for $X$ given $Y$.

$$P_{X|Y}(X = x \mid Y = y) = \frac{P_{X,Y}(X = x, Y = y)}{P_Y(Y = y)}.$$

Explanation: “To see how to calculate this conditional probability, we note first that P(x, y) with y = a gives the relative probability for different values of x given that Y = a. To obtain the conditional probability density for X given that Y = a, all we have to do is divide P(x, a) by its integral over all values of x.”
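The quoted recipe, done numerically (my sketch, reusing the bivariate normal example above; for that joint, the exact conditional is N(0.8a, 1 − 0.8²)):

```python
import numpy as np
from scipy import integrate, stats

joint = stats.multivariate_normal(mean=[0, 0], cov=[[1, 0.8], [0.8, 1]])
a = 1.0  # condition on Y = a

slice_pdf = lambda x: joint.pdf([x, a])               # relative probabilities P(x, a)
norm, _ = integrate.quad(slice_pdf, -np.inf, np.inf)  # integral over x, equal to P_Y(a)

x0 = 0.0
# P(x0 | Y = a) versus the known exact conditional density
print(slice_pdf(x0) / norm,
      stats.norm.pdf(x0, loc=0.8 * a, scale=np.sqrt(1 - 0.8**2)))
```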


Page 12

1.4 Correlations and correlation coefficients

• The covariance of $X$ and $Y$ is:

$$\mathrm{cov}(X, Y) = \langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle = \langle XY \rangle - \langle X \rangle\langle Y \rangle.$$

Idea:

1. How can we define what it means for a value $x$ to be “bigger than usual”? Well, we can see if $x > \langle X \rangle$, i.e. if $x - \langle X \rangle > 0$.

2. Similarly, we can say that a value $x$ is “smaller than usual” if $x < \langle X \rangle$, i.e. if $x - \langle X \rangle < 0$.

3. If $x$ tends to be “bigger than usual” when $y$ is “bigger than usual”, then $\langle (X - \langle X \rangle)(Y - \langle Y \rangle) \rangle$ will be $> 0$.

• The correlation is just a normalised version of the covariance, which takes values in the range $-1$ to $1$:

$$C_{XY} = \frac{\langle XY \rangle - \langle X \rangle\langle Y \rangle}{\sqrt{V[X]\,V[Y]}}$$
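A sample-based check (my sketch) that the normalised covariance matches NumPy's built-in correlation estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.6 * x + 0.8 * rng.normal(size=100_000)  # built to have correlation 0.6 with x

cov = np.mean(x * y) - np.mean(x) * np.mean(y)
corr = cov / np.sqrt(np.var(x) * np.var(y))
print(corr, np.corrcoef(x, y)[0, 1])  # both ≈ 0.6
```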


Page 13

1.5 Adding random variables together

When we have two continuous random variables, $X$ and $Y$, with probability densities $P_X$ and $P_Y$, it is often useful to be able to calculate the probability density of the random variable whose value is the sum of them: $Z = X + Y$. It turns out that the probability density for $Z$ is given by

$$P_Z(z) = \int_{-\infty}^{\infty} P_{X,Y}(z - s, s)\,ds = \int_{-\infty}^{\infty} P_{X,Y}(s, z - s)\,ds$$

If $X$ and $Y$ are independent, this becomes:

$$P_Z(z) = \int_{-\infty}^{\infty} P_X(z - s)\,P_Y(s)\,ds = P_X * P_Y$$
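As a check of the convolution formula (my sketch): the sum of two independent uniforms on [0, 1] should have the triangular density on [0, 2]:

```python
import numpy as np
from scipy import integrate, stats

pX = stats.uniform(0, 1).pdf
pY = stats.uniform(0, 1).pdf

def pZ(z):
    # P_Z(z) = integral of P_X(z - s) P_Y(s) ds  (the convolution P_X * P_Y)
    val, _ = integrate.quad(lambda s: pX(z - s) * pY(s), 0, 1)
    return val

print(pZ(0.5), pZ(1.0), pZ(1.5))  # triangular density: 0.5, 1.0, 0.5
```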



Page 17

1.5 Adding random variables together

If $X_1$ and $X_2$ are random variables and $X = X_1 + X_2$, then

$$\langle X \rangle = \langle X_1 \rangle + \langle X_2 \rangle,$$

and if $X_1$ and $X_2$ are independent, then

$$V[X] = V[X_1] + V[X_2].$$

Mysterious (?) assertion

“Averaging the results of a number of independent measurements produces a more accurate result. This is because the variances of the different measurements add together.” — does this make sense?


Page 18

1.5 Adding random variables together

Explanation

Assume all measurements have expectation, $\mu$, and variance, $\sigma^2$. By the independence assumption, the variance of the average is:

$$V\left[\sum_{n=1}^{N} \frac{X_n}{N}\right] = \sum_{n=1}^{N} V\left[\frac{X_n}{N}\right].$$

Moreover,

$$V\left[\frac{X_n}{N}\right] = E\left[\frac{X_n^2}{N^2}\right] - \left(E\left[\frac{X_n}{N}\right]\right)^2 = \frac{E[X_n^2] - (E[X_n])^2}{N^2} = \frac{V[X_n]}{N^2}.$$

So,

$$V\left[\sum_{n=1}^{N} \frac{X_n}{N}\right] = \sum_{n=1}^{N} \frac{V[X_n]}{N^2} = \frac{1}{N^2}\sum_{n=1}^{N} \sigma^2 = \frac{\sigma^2}{N}.$$
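A simulation (my sketch) confirming the $\sigma^2/N$ scaling of the variance of the average:

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials, sigma = 25, 100_000, 2.0

# Each row is one experiment: N independent measurements with variance sigma^2
averages = rng.normal(0.0, sigma, size=(trials, N)).mean(axis=1)
print(np.var(averages), sigma**2 / N)  # both ≈ 0.16
```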


Page 19

1.6 Transformations of a random variable

Key assertion: If $Y = g(X)$, then:

$$\langle f(Y) \rangle = \int_{x=a}^{x=b} P_X(x)\,f(g(x))\,dx = \int_{y=g(a)}^{y=g(b)} P_Y(y)\,f(y)\,dy.$$

Given this assumption, everything else falls out automatically:

$$
\begin{aligned}
\langle f(Y) \rangle = \int_{x=a}^{x=b} P_X(x)\,f(g(x))\,dx
&= \int_{y=g(a)}^{y=g(b)} P_X(g^{-1}(y))\,f(y)\,\frac{dx}{dy}\,dy \\
&= \int_{y=g(a)}^{y=g(b)} \frac{P_X(g^{-1}(y))}{g'(g^{-1}(y))}\,f(y)\,dy.
\end{aligned}
$$

General result (for invertible $g$):

$$P_Y(y) = \frac{P_X(g^{-1}(y))}{\left|g'(g^{-1}(y))\right|}.$$
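A numerical check of the general result (my sketch): with X standard normal and g(x) = eˣ, the formula should reproduce SciPy's log-normal density:

```python
import numpy as np
from scipy import stats

g_inv = np.log    # inverse of g(x) = exp(x)
g_prime = np.exp  # g'(x) = exp(x)

def pY(y):
    # P_Y(y) = P_X(g^{-1}(y)) / |g'(g^{-1}(y))|
    x = g_inv(y)
    return stats.norm.pdf(x) / np.abs(g_prime(x))

y = 1.7
print(pY(y), stats.lognorm.pdf(y, s=1.0))  # the two should agree
```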



Page 22

1.7 The distribution function

The probability distribution function, which we will call $D(x)$, of a random variable $X$ is defined as the probability that $X$ is less than or equal to $x$. Thus

$$D(x) = \mathrm{Prob}(X \leq x) = \int_{-\infty}^{x} P(z)\,dz$$

In addition, the fundamental theorem of calculus tells us that

$$P(x) = \frac{d}{dx} D(x).$$
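A finite-difference sketch (mine) of the relation P(x) = dD/dx for a standard normal:

```python
from scipy import stats

D = stats.norm.cdf  # distribution function D(x) of a standard normal
x, h = 0.7, 1e-6

# Central-difference approximation to dD/dx, compared with the density P(x)
print((D(x + h) - D(x - h)) / (2 * h), stats.norm.pdf(x))
```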


Page 23

1.8 The characteristic function

“The characteristic function is defined as the Fourier transform of the probability density.”

$$\chi(s) = \int_{-\infty}^{\infty} P(x)\,\exp(isx)\,dx.$$

The inverse transform gives:

$$P(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \chi(s)\,\exp(-isx)\,ds.$$

The Fourier transform of the convolution of two functions, $P(x)$ and $Q(x)$, is the product of their Fourier transforms, $\chi_P(s)$ and $\chi_Q(s)$. For discrete random variables, the characteristic function is a sum. In general (for both discrete and continuous r.v.’s), we have:

$$\chi_P(s) = \langle \exp(isX) \rangle_{P(X)}.$$
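Since Fourier transforms turn convolutions into products, the characteristic function of Z = X + Y (independent) is χ_X(s)χ_Y(s). A Monte Carlo check (my sketch; the normal and exponential are example choices):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=500_000)
y = rng.exponential(size=500_000)  # independent of x

chi = lambda samples, s: np.mean(np.exp(1j * s * samples))  # <exp(isX)>
s = 0.8
print(chi(x + y, s), chi(x, s) * chi(y, s))  # agree up to Monte Carlo error
```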


Page 24

1.9 Moments and cumulants

Moment generating function (departure from the book)

The moment generating function is defined as:

$$M(t) = \langle \exp(tX) \rangle,$$

where $X$ is a random variable, and the expectation is with respect to some density $P(X)$, so that

$$
\begin{aligned}
M(t) &= \int_{-\infty}^{\infty} \exp(tx)\,P(x)\,dx \\
&= \int_{-\infty}^{\infty} \left( 1 + tx + \frac{1}{2!}t^2x^2 + \ldots \right) P(x)\,dx \\
&= 1 + t m_1 + \frac{1}{2!}t^2 m_2 + \ldots + \frac{1}{r!}t^r m_r + \ldots
\end{aligned}
$$

where $m_r = \langle X^r \rangle$ is the $r$-th (raw) moment.
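A symbolic check (my sketch, assuming SymPy; the unit-rate exponential density is my example, for which m_r = r!):

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)
P = sp.exp(-x)  # example density on [0, oo); its raw moments are m_r = r!

# M(t) = <exp(tX)> = 1/(1 - t) for t < 1
M = sp.integrate(sp.exp(t * x) * P, (x, 0, sp.oo), conds="none")
for r in range(1, 4):
    print(r, sp.diff(M, t, r).subs(t, 0))  # M^(r)(0) = m_r: prints 1, 2, 6
```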


Page 25

1.9 Moments and cumulants

Moment generating function (continued)

$$M(t) = 1 + t m_1 + \frac{1}{2!}t^2 m_2 + \ldots + \frac{1}{r!}t^r m_r + \ldots$$

It follows from the above expansion that:

$$M(0) = 1, \quad M'(0) = m_1, \quad M''(0) = m_2, \quad \ldots, \quad M^{(r)}(0) = m_r$$


Page 26

1.9 Moments and cumulants

Cumulant generating function (continued departure from book)

The log of the moment generating function is called the cumulant generating function,

$$R(t) = \ln(M(t)).$$

By the chain rule of differentiation, we can write down the derivatives of $R(t)$ in terms of the derivatives of $M(t)$, e.g.

$$R'(t) = \frac{M'(t)}{M(t)}, \qquad R''(t) = \frac{M(t)\,M''(t) - (M'(t))^2}{(M(t))^2}$$

Note that $R(0) = \ln(M(0)) = 0$, $R'(0) = M'(0) = m_1 = \mu$, and $R''(0) = M''(0) - (M'(0))^2 = m_2 - m_1^2 = \sigma^2$, .... These are the cumulants.
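Continuing the SymPy sketch from the moment generating function (same exponential example, so mean and variance are both 1):

```python
import sympy as sp

t = sp.symbols("t", real=True)
M = 1 / (1 - t)  # MGF of the unit-rate exponential
R = sp.log(M)    # cumulant generating function

print(sp.diff(R, t, 1).subs(t, 0))  # first cumulant: the mean, 1
print(sp.diff(R, t, 2).subs(t, 0))  # second cumulant: the variance, 1
```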


Page 27

1.9 Moments and cumulants

The moments can be calculated from the derivatives of the characteristic function, evaluated at $s = 0$. We can see this by expanding the characteristic function as a Taylor series:

$$\chi(s) = \sum_{n=0}^{\infty} \chi^{(n)}(0)\,\frac{s^n}{n!}$$

where $\chi^{(n)}(s)$ is the $n$-th derivative of $\chi(s)$. But we also have:

$$\chi(s) = \langle e^{isX} \rangle = \left\langle \sum_{n=0}^{\infty} \frac{(isX)^n}{n!} \right\rangle = \sum_{n=0}^{\infty} \frac{i^n \langle X^n \rangle\, s^n}{n!}$$

Equating the two expressions, we get: $\langle X^n \rangle = \chi^{(n)}(0)/i^n$.


Page 28

1.9 Moments and cumulants

Cumulants

The $n$-th order cumulant of $X$ is the $n$-th derivative of the log of the characteristic function, evaluated at $s = 0$ (and divided by $i^n$, as for the moments above).

For independent random variables, $X$ and $Y$, if $Z = X + Y$ then the $n$-th cumulant of $Z$ is the sum of the $n$-th cumulants of $X$ and $Y$.

The Gaussian distribution is also the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e. other than the mean and variance) are zero.


Page 29

1.10 The multivariate Gaussian

Let $\mathbf{x} = [x_1, \ldots, x_N]^{\top}$; then the general form of the Gaussian pdf is:

$$P(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^N \det(\Sigma)}}\,\exp\!\left( -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}) \right),$$

where $\boldsymbol{\mu}$ is the mean vector and $\Sigma$ is the covariance matrix.

All higher moments of a Gaussian can be written in terms of the means and covariances. Defining $\Delta X \equiv X - \langle X \rangle$, for a 1-dimensional Gaussian we have:

$$\langle \Delta X^{2n} \rangle = \frac{(2n-1)!\,(V[X])^n}{2^{n-1}(n-1)!}, \qquad \langle \Delta X^{2n-1} \rangle = 0$$
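A closing sketch (mine): sample a 2-D Gaussian, recover μ and Σ, and check the central-moment formula in one dimension (n = 2 gives ⟨ΔX⁴⟩ = 3(V[X])²):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=1_000_000)
print(x.mean(axis=0), np.cov(x.T))  # should approach mu and Sigma

# 1-D check: <dX^{2n}> = (2n-1)! (V[X])^n / (2^{n-1} (n-1)!) = 3 V^2 for n = 2
dx = x[:, 0] - x[:, 0].mean()
print(np.mean(dx**4), 3 * np.var(x[:, 0]) ** 2)
```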
