
ESTIMATION THEORY

Outline

1. Random Variables

2. Introduction

3. Estimation techniques

4. Extensions to Complex Vector Parameters

5. Application to communication systems

[Kay'93] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New Jersey, 1993.

[Cover-Thomas'91] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.


Random Variables

Definitions

A random variable $X$ is a function that assigns a number to every outcome of an experiment.

A random variable $X$ is completely characterized by:

Its cumulative distribution function (cdf): $F_X(x) = \Pr(X \le x)$

Its probability density function (pdf): $p_X(x) = \dfrac{dF_X(x)}{dx}$

Properties

The probability that $X$ lies between $x_1$ and $x_2$ is then
$$\Pr(x_1 < X \le x_2) = \int_{x_1}^{x_2} p_X(x)\, dx = F_X(x_2) - F_X(x_1)$$

The mean of $X$ is given by
$$m_X = E\{X\} = \int_{-\infty}^{\infty} x\, p_X(x)\, dx$$

The variance of $X$ is given by
$$\mathrm{var}_X = E\{(X - m_X)^2\} = \int_{-\infty}^{\infty} (x - m_X)^2\, p_X(x)\, dx$$

Random Variables

Examples

Uniform random variable:

pdf: $p_X(x) = \dfrac{1}{b-a}$ for $a \le x \le b$, and $0$ otherwise

mean and variance: $m_X = \dfrac{a+b}{2}$, $\mathrm{var}_X = \dfrac{(b-a)^2}{12}$

Gaussian random variable:

pdf: $p_X(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(x-m)^2}{2\sigma^2}\right)$

mean and variance: $m_X = m$, $\mathrm{var}_X = \sigma^2$

Random Variables

Two random variables

For two random variables $X$ and $Y$, we can define

The joint cdf: $F_{X,Y}(x,y) = \Pr(X \le x, Y \le y)$

The joint pdf: $p_{X,Y}(x,y) = \dfrac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}$

The marginal pdfs $p_X(x)$ and $p_Y(y)$ can then be determined by
$$p_X(x) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\, dy, \qquad p_Y(y) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\, dx$$

The conditional pdfs $p_{X|Y}(x|y)$ and $p_{Y|X}(y|x)$ are given by
$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}, \qquad p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}$$

From this follows the popular Bayes' rule
$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)} = \frac{p_{Y|X}(y|x)\, p_X(x)}{\int_{-\infty}^{\infty} p_{Y|X}(y|x)\, p_X(x)\, dx}$$

For independent random variables $X$ and $Y$ we have
$$p_{X,Y}(x,y) = p_X(x)\, p_Y(y)$$

Random Variables

Function of random variables

Suppose $Z$ is a function of the random variables $X$ and $Y$, e.g., $Z = g(X, Y)$.

Corresponding increments in the cdf of $Z$ and the joint cdf of $X$ and $Y$ are the same:
$$dF_Z(z) = dF_{X,Y}(x,y) \quad \text{for } z = g(x,y)$$

Hence, the expectation over $Z$ equals the joint expectation over $X$ and $Y$.

The mean of $Z$ is given by
$$m_Z = E\{g(X,Y)\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, p_{X,Y}(x,y)\, dx\, dy$$

The variance of $Z$ is given by
$$\mathrm{var}_Z = E\{(g(X,Y) - m_Z)^2\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (g(x,y) - m_Z)^2\, p_{X,Y}(x,y)\, dx\, dy$$

Random Variables

Vector random variables

A vector random variable $\mathbf{x}$ is a vector of random variables $x_i$:
$$\mathbf{x} = [x_1, x_2, \dots, x_N]^T$$

Its cdf/pdf is the joint cdf/pdf of all these random variables.

The mean of $\mathbf{x}$ is given by
$$\mathbf{m_x} = E\{\mathbf{x}\}, \qquad [\mathbf{m_x}]_i = E\{x_i\}$$

The covariance matrix of $\mathbf{x}$ is given by
$$\mathrm{cov}_{\mathbf{x}} = E\{(\mathbf{x} - \mathbf{m_x})(\mathbf{x} - \mathbf{m_x})^T\}, \qquad [\mathrm{cov}_{\mathbf{x}}]_{i,j} = E\{(x_i - [\mathbf{m_x}]_i)(x_j - [\mathbf{m_x}]_j)\}$$
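To connect these definitions to something executable, here is a minimal numpy sketch (an illustration, not part of the original slides) that draws realizations of a Gaussian vector random variable and checks the sample mean and sample covariance against the definitions above; the particular values of $\mathbf{m_x}$ and $\mathrm{cov}_{\mathbf{x}}$ are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mean and covariance of a 3-dimensional Gaussian vector x
m_x = np.array([1.0, -2.0, 0.5])
C_x = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.5]])

# Draw many realizations of x
X = rng.multivariate_normal(m_x, C_x, size=100_000)   # shape (100000, 3)

# Sample mean approximates E{x}
print(X.mean(axis=0))            # close to m_x

# Sample covariance approximates E{(x - m_x)(x - m_x)^T}
Xc = X - X.mean(axis=0)
print(Xc.T @ Xc / len(X))        # close to C_x
```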

Introduction

Problem Statement

Suppose we have an unknown scalar parameter $\theta$ that we want to estimate from an observed vector $\mathbf{x}$, which is related to $\theta$ through the following relationship
$$\mathbf{x} = \mathbf{g}(\theta) + \mathbf{n}$$

where $\mathbf{n}$ is a random noise vector with probability density function (pdf) $p_{\mathbf{n}}(\mathbf{n})$.

The estimator is of the form
$$\hat{\theta} = f(\mathbf{x})$$

Note that $\hat{\theta}$ itself is a random variable. Hence, the performance of the estimator $\hat{\theta}$ should be described statistically.

Introduction

Special Models

To solve any estimation problem, we need a model. Here, we will look deeper into two specific models:

The linear model: The relationship between $\mathbf{x}$ and $\theta$ is then given by
$$\mathbf{x} = \mathbf{h}\theta + \mathbf{n}$$

where $\mathbf{h}$ is the model vector and $\mathbf{n}$ is the noise vector, which is assumed to have mean $\mathbf{0}$, $\mathbf{m_n} = \mathbf{0}$, and covariance matrix $\mathbf{C_n}$, $\mathrm{cov}_{\mathbf{n}} = \mathbf{C_n}$.

The linear Gaussian model: This model is a special case of the linear model, where the noise vector $\mathbf{n}$ is assumed to be Gaussian (or normal) distributed:
$$\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \mathbf{C_n}): \quad p_{\mathbf{n}}(\mathbf{n}) = \frac{1}{(2\pi)^{N/2}\, \det^{1/2}(\mathbf{C_n})} \exp\left(-\tfrac{1}{2}\, \mathbf{n}^T \mathbf{C_n}^{-1} \mathbf{n}\right)$$

Estimation Techniques

We can view the unknown parameter $\theta$ as a deterministic variable:

- Minimum Variance Unbiased (MVU) Estimator
- Best Linear Unbiased Estimator (BLUE)
- Maximum Likelihood Estimator (MLE)
- Least Squares Estimator (LSE)

The Bayesian philosophy: $\theta$ is viewed as a random variable:

- Minimum Mean Square Error (MMSE) Estimator
- Linear Minimum Mean Square Error (LMMSE) Estimator

Minimum Variance Unbiased Estimation

A natural criterion that comes to mind is the Mean Square Error (MSE):

$$\mathrm{mse}_{\hat{\theta}} = E\{(\hat{\theta} - \theta)^2\} = E\{[(\hat{\theta} - m_{\hat{\theta}}) + (m_{\hat{\theta}} - \theta)]^2\} = E\{(\hat{\theta} - m_{\hat{\theta}})^2\} + (m_{\hat{\theta}} - \theta)^2 = \mathrm{var}_{\hat{\theta}} + (m_{\hat{\theta}} - \theta)^2$$

The MSE does not only depend on the variance but also on the bias. This means that an estimator that tries to minimize the MSE will often depend on the parameter $\theta$, and is therefore unrealizable.

Solution: constrain the bias to zero and minimize the variance, which leads to the so-called Minimum Variance Unbiased (MVU) estimator:

unbiased: $m_{\hat{\theta}} = \theta$ for all $\theta$

minimum variance: $\mathrm{var}_{\hat{\theta}}$ is minimal for all $\theta$

Remark: The MVU does not always exist and is generally difficult to find.

Minimum Variance Unbiased Estimation (Linear Gaussian Model)

For the linear Gaussian model the MVU exists and its solution can be found by means of the Cramer-Rao lower bound (see notes, [Kay'93], [Cover-Thomas'91]):
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$$

Properties:

$m_{\hat{\theta}} = \theta$

$\mathrm{var}_{\hat{\theta}} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h})^{-1}$

$\hat{\theta}$ is Gaussian distributed, i.e., $\hat{\theta} \sim \mathcal{N}\left(\theta, (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h})^{-1}\right)$
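To make this concrete, the following Monte Carlo sketch (illustrative values, not from the slides) simulates the linear Gaussian model, applies the MVU formula above (which is also the BLUE formula of the next slides), and checks the stated mean and variance properties.

```python
import numpy as np

rng = np.random.default_rng(1)

N, theta = 4, 3.0                          # illustrative dimensions and true parameter
h = np.array([1.0, 0.5, -0.3, 2.0])        # model vector
A = rng.standard_normal((N, N))
C_n = A @ A.T + N * np.eye(N)              # an arbitrary positive definite noise covariance
C_inv = np.linalg.inv(C_n)

# MVU weights: theta_hat = w^T x with w = C_n^{-1} h / (h^T C_n^{-1} h)
w = C_inv @ h / (h @ C_inv @ h)

# Monte Carlo over many noise realizations of x = h*theta + n
n = rng.multivariate_normal(np.zeros(N), C_n, size=200_000)
x = theta * h + n                          # each row is one realization of x
est = x @ w

print(est.mean())                          # ~ theta: the estimator is unbiased
print(est.var())                           # sample variance ...
print(1 / (h @ C_inv @ h))                 # ... matches (h^T C_n^{-1} h)^{-1}
```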

Best Linear Unbiased Estimation

In this case we constrain the estimator to have the form
$$\hat{\theta} = \mathbf{a}^T \mathbf{x}$$

Unbiased:
$$m_{\hat{\theta}} = E\{\mathbf{a}^T \mathbf{x}\} = \mathbf{a}^T E\{\mathbf{x}\} = \mathbf{a}^T \mathbf{m_x} = \theta \quad \text{for all } \theta$$

Minimum variance:
$$\mathrm{var}_{\hat{\theta}} = E\{(\mathbf{a}^T \mathbf{x} - m_{\hat{\theta}})^2\} = E\{(\mathbf{a}^T \mathbf{x} - \mathbf{a}^T \mathbf{m_x})^2\} = \mathbf{a}^T E\{(\mathbf{x} - \mathbf{m_x})(\mathbf{x} - \mathbf{m_x})^T\}\, \mathbf{a} = \mathbf{a}^T \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} \quad \text{is minimal for all } \theta$$

The first condition can only be satisfied if we assume a linear model for $\mathbf{m_x}$:
$$\mathbf{m_x} = \mathbf{h}\theta$$

Hence, we have to solve
$$\min_{\mathbf{a}}\; \mathbf{a}^T \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} \quad \text{subject to} \quad \mathbf{a}^T \mathbf{h} = 1$$

Best Linear Unbiased Estimation

Problem:
$$\min_{\mathbf{a}}\; \mathbf{a}^T \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} \quad \text{subject to} \quad \mathbf{a}^T \mathbf{h} = 1$$

Solution:
$$\mathbf{a} = \frac{\mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h}}{\mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h}} \quad \Longrightarrow \quad \hat{\theta} = (\mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h})^{-1}\, \mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{x}$$

Proof: Using the method of the Lagrange multipliers, we obtain
$$J = \mathbf{a}^T \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} + \lambda(\mathbf{a}^T \mathbf{h} - 1)$$

Setting the gradient with respect to $\mathbf{a}$ to zero we get
$$\frac{\partial J}{\partial \mathbf{a}} = 2\, \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} + \lambda \mathbf{h} = \mathbf{0} \quad \Longrightarrow \quad \mathbf{a} = -\frac{\lambda}{2}\, \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h}$$

The Lagrange multiplier $\lambda$ is obtained from the constraint
$$\mathbf{a}^T \mathbf{h} = -\frac{\lambda}{2}\, \mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h} = 1 \quad \Longrightarrow \quad \lambda = \frac{-2}{\mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h}}$$

Properties: $m_{\hat{\theta}} = \theta$, $\quad \mathrm{var}_{\hat{\theta}} = (\mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h})^{-1}$

Best Linear Unbiased Estimation (Linear Model)

For the linear model the BLUE is given by
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$$

Remark: For the linear model the BLUE equals the MVU only when the noise is Gaussian.

Maximum Likelihood Estimation

Since the pdf of $\mathbf{x}$ depends on $\theta$, we often write it as a function that is parametrized on $\theta$: $p_{\mathbf{x}}(\mathbf{x}; \theta)$. This function can also be interpreted as the likelihood function, since it tells us how likely it is to observe a certain $\mathbf{x}$. The Maximum Likelihood Estimator (MLE) finds the $\theta$ that maximizes $p_{\mathbf{x}}(\mathbf{x}; \theta)$ for a certain $\mathbf{x}$.

The MLE is generally easy to derive.

Asymptotically, the MLE has the same mean and variance as the MVU (but it is not asymptotically equivalent to the MVU).

Maximum Likelihood Estimation (Linear Gaussian Model)

For the linear Gaussian model, the likelihood function is given by
$$p_{\mathbf{x}}(\mathbf{x}; \theta) = \frac{1}{(2\pi)^{N/2}\, \det^{1/2}(\mathbf{C_n})} \exp\left(-\tfrac{1}{2}\, (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C_n}^{-1} (\mathbf{x} - \mathbf{h}\theta)\right)$$

It is clear that this function is maximized by solving
$$\min_{\theta}\; (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C_n}^{-1} (\mathbf{x} - \mathbf{h}\theta)$$

Maximum Likelihood Estimation (Linear Gaussian Model)

Problem:
$$\min_{\theta}\; (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C_n}^{-1} (\mathbf{x} - \mathbf{h}\theta)$$

Solution:
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$$

Proof: Rewriting the cost function that we have to minimize, we get
$$(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C_n}^{-1} (\mathbf{x} - \mathbf{h}\theta) = \mathbf{x}^T \mathbf{C_n}^{-1} \mathbf{x} - 2\theta\, \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x} + \theta^2\, \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h}$$

Setting the gradient with respect to $\theta$ to zero we get
$$-2\, \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x} + 2\theta\, \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h} = 0$$

Remark: For the linear Gaussian model, the MLE is equivalent to the MVU estimator.
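As a sanity check on the proof (again with made-up numbers, not from the slides), the sketch below minimizes the quadratic cost by brute-force grid search and confirms that the minimizer agrees with the closed-form MLE.

```python
import numpy as np

rng = np.random.default_rng(2)

h = np.array([1.0, -0.5, 2.0])             # model vector (illustrative)
C_n = np.diag([0.5, 1.0, 2.0])             # illustrative noise covariance
C_inv = np.linalg.inv(C_n)
theta_true = 1.7

x = h * theta_true + rng.multivariate_normal(np.zeros(3), C_n)

# Closed-form MLE: theta_hat = (h^T C_n^{-1} h)^{-1} h^T C_n^{-1} x
theta_mle = (h @ C_inv @ x) / (h @ C_inv @ h)

# Grid search over the cost (x - h*theta)^T C_n^{-1} (x - h*theta)
grid = np.linspace(-5, 5, 100_001)
r = x[:, None] - np.outer(h, grid)         # residual vectors for every grid value
cost = np.sum(r * (C_inv @ r), axis=0)     # quadratic form, evaluated per grid value
print(grid[np.argmin(cost)], theta_mle)    # the two agree up to the grid resolution
```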

Least Squares Estimation

The Least Squares Estimator (LSE) finds the $\theta$ for which
$$\|\mathbf{x} - \mathbf{g}(\theta)\|^2 \quad \text{is minimal}$$

Properties:

No probabilistic assumptions are required.

The performance highly depends on the noise.

Least Squares Estimation (Linear Model)

For the linear model, the LSE solves

Problem:
$$\min_{\theta}\; \|\mathbf{x} - \mathbf{h}\theta\|^2$$

Solution:
$$\hat{\theta} = (\mathbf{h}^T \mathbf{h})^{-1} \mathbf{h}^T \mathbf{x}$$

Proof: As before.

Remark: For the linear model the LSE corresponds to the BLUE when the noise is white, and to the MVU when the noise is Gaussian and white.

Least Squares Estimation (Linear Model)

Orthogonality Condition

Let us compute $\mathbf{h}^T(\mathbf{x} - \mathbf{h}\hat{\theta})$:
$$\mathbf{h}^T(\mathbf{x} - \mathbf{h}\hat{\theta}) = \mathbf{h}^T \mathbf{x} - \mathbf{h}^T \mathbf{h}\, (\mathbf{h}^T \mathbf{h})^{-1} \mathbf{h}^T \mathbf{x} = 0$$

For the linear model the LSE leads to the following orthogonality condition: the residual $\mathbf{x} - \mathbf{h}\hat{\theta}$ is orthogonal to the model vector $\mathbf{h}$.

[Figure: the LSE as the orthogonal projection of $\mathbf{x}$ onto $\mathbf{h}$.]
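A compact numerical illustration (made-up numbers, not from the slides) of the LSE and its orthogonality condition:

```python
import numpy as np

rng = np.random.default_rng(3)

h = rng.standard_normal(5)                  # model vector
x = 2.0 * h + 0.1 * rng.standard_normal(5)  # noisy observation, true theta = 2

# LSE for the linear model: theta_hat = (h^T h)^{-1} h^T x
theta_hat = (h @ x) / (h @ h)
print(theta_hat)                            # close to 2

# Orthogonality: the residual x - h*theta_hat is orthogonal to h
print(h @ (x - h * theta_hat))              # ~ 0 up to round-off
```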

The Bayesian Philosophy

$\theta$ is viewed as a random variable and we must estimate its particular realization.

This allows us to use prior knowledge about $\theta$, i.e., its prior pdf $p_{\theta}(\theta)$.

Again, we would like to minimize the MSE
$$\mathrm{Bmse}_{\hat{\theta}} = E\{(\hat{\theta} - \theta)^2\}$$

but this time both $\mathbf{x}$ and $\theta$ are random, hence the notation Bmse for Bayesian MSE.

Note the difference between these two MSEs:
$$\mathrm{mse}_{\hat{\theta}} = E\{(\hat{\theta} - \theta)^2\} = E_{\mathbf{x}}\{(\hat{\theta} - \theta)^2\} = \int (\hat{\theta} - \theta)^2\, p_{\mathbf{x}}(\mathbf{x}; \theta)\, d\mathbf{x}$$
$$\mathrm{Bmse}_{\hat{\theta}} = E\{(\hat{\theta} - \theta)^2\} = E_{\mathbf{x},\theta}\{(\hat{\theta} - \theta)^2\} = \int\!\!\int (\hat{\theta} - \theta)^2\, p_{\mathbf{x},\theta}(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta$$

Whereas the first MSE depends on $\theta$, the second MSE does not depend on $\theta$.

Minimum Mean Square Error Estimator

We know that $p_{\mathbf{x},\theta}(\mathbf{x},\theta) = p_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, p_{\mathbf{x}}(\mathbf{x})$, so that
$$\mathrm{Bmse}_{\hat{\theta}} = \int \left[ \int (\hat{\theta} - \theta)^2\, p_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta \right] p_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x}$$

Since $p_{\mathbf{x}}(\mathbf{x}) \ge 0$ for all $\mathbf{x}$, we have to minimize the inner integral for each $\mathbf{x}$.

Problem:
$$\min_{\hat{\theta}} \int (\hat{\theta} - \theta)^2\, p_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta$$

Solution: mean of the posterior pdf of $\theta$:
$$\hat{\theta} = \int \theta\, p_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta = E\{\theta|\mathbf{x}\}$$

Proof: Setting the derivative with respect to $\hat{\theta}$ to zero we obtain:
$$\frac{\partial}{\partial \hat{\theta}} \int (\hat{\theta} - \theta)^2\, p_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta = \int 2(\hat{\theta} - \theta)\, p_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta = 2\hat{\theta} - 2\int \theta\, p_{\theta|\mathbf{x}}(\theta|\mathbf{x})\, d\theta = 0$$

Remarks:

In contrast to the MVU estimator, the MMSE estimator always exists.

The MMSE estimator has a smaller average MSE (Bayesian MSE) than the MVU, but the MMSE estimator is biased whereas the MVU estimator is unbiased.
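For a feel of the posterior-mean solution, here is a scalar sketch (illustrative values, not from the slides, assuming the model $x = \theta + n$ with Gaussian prior and noise, for which the closed-form MMSE estimate is $\frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2}\, x$): it computes the posterior mean by numerical integration on a grid and compares it with that closed form.

```python
import numpy as np

sigma_theta2, sigma_n2 = 2.0, 0.5            # prior and noise variances (illustrative)
x = 1.3                                      # a single scalar observation of theta + n

# Posterior p(theta | x) ∝ p(x | theta) p(theta), evaluated on a grid
theta = np.linspace(-10, 10, 200_001)
dtheta = theta[1] - theta[0]
post = np.exp(-(x - theta)**2 / (2 * sigma_n2)) * np.exp(-theta**2 / (2 * sigma_theta2))
post /= post.sum() * dtheta                  # normalize the posterior pdf

# MMSE estimate = mean of the posterior pdf
theta_mmse = (theta * post).sum() * dtheta
print(theta_mmse)                                          # numerical posterior mean
print(sigma_theta2 / (sigma_theta2 + sigma_n2) * x)        # closed form: the two agree
```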

Minimum Mean Square Error Estimator (Linear Gaussian Model)

For the linear Gaussian model where $\theta$ is assumed to be Gaussian with mean $0$ and variance $\sigma_{\theta}^2$: $\theta \sim \mathcal{N}(0, \sigma_{\theta}^2)$, the MMSE estimator can be found by means of the conditional pdf of a Gaussian vector random variable [Kay'93]:
$$\hat{\theta} = \sigma_{\theta}^2\, \mathbf{h}^T (\sigma_{\theta}^2\, \mathbf{h}\mathbf{h}^T + \mathbf{C_n})^{-1} \mathbf{x} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h} + \sigma_{\theta}^{-2})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$$

where the last equality is due to the matrix inversion lemma (see notes):
$$(\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}\, (\mathbf{C}^{-1} + \mathbf{D}\mathbf{A}^{-1}\mathbf{B})^{-1}\, \mathbf{D}\mathbf{A}^{-1}$$

Remark: Compare this with the MVU for the linear Gaussian model.
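The equivalence of the two expressions above can be verified numerically; a quick sketch (illustrative values) with an arbitrary positive definite $\mathbf{C_n}$:

```python
import numpy as np

rng = np.random.default_rng(4)

N, sigma_theta2 = 4, 2.0
h = rng.standard_normal(N)
A = rng.standard_normal((N, N))
C_n = A @ A.T + N * np.eye(N)               # positive definite noise covariance
C_inv = np.linalg.inv(C_n)

# Form 1: sigma_theta^2 h^T (sigma_theta^2 h h^T + C_n)^{-1}
w1 = sigma_theta2 * h @ np.linalg.inv(sigma_theta2 * np.outer(h, h) + C_n)

# Form 2: (h^T C_n^{-1} h + sigma_theta^{-2})^{-1} h^T C_n^{-1}
w2 = (h @ C_inv) / (h @ C_inv @ h + 1 / sigma_theta2)

print(np.allclose(w1, w2))                  # True: the matrix inversion lemma at work
```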

Linear Minimum Mean Square Error Estimator

As for the BLUE, we now constrain the estimator to have the form $\hat{\theta} = \mathbf{a}^T \mathbf{x}$.

The Bayesian MSE can then be written as
$$\mathrm{Bmse}_{\hat{\theta}} = E\{(\mathbf{a}^T \mathbf{x} - \theta)^2\} = \mathbf{a}^T E\{\mathbf{x}\mathbf{x}^T\}\, \mathbf{a} - 2\, \mathbf{a}^T E\{\mathbf{x}\theta\} + E\{\theta^2\}$$

Setting the derivative with respect to $\mathbf{a}$ to zero, we obtain
$$2\, E\{\mathbf{x}\mathbf{x}^T\}\, \mathbf{a} - 2\, E\{\mathbf{x}\theta\} = \mathbf{0}$$

The LMMSE estimator is therefore given by
$$\hat{\theta} = \mathbf{a}^T \mathbf{x} = E\{\theta\mathbf{x}^T\}\, \left(E\{\mathbf{x}\mathbf{x}^T\}\right)^{-1} \mathbf{x}$$

Linear Minimum Mean Square Error Estimator

Orthogonality Condition

Let us compute $E\{(\theta - \hat{\theta})\, \mathbf{x}^T\}$:
$$E\{(\theta - \hat{\theta})\, \mathbf{x}^T\} = E\{\theta\mathbf{x}^T\} - E\{\theta\mathbf{x}^T\}\, \left(E\{\mathbf{x}\mathbf{x}^T\}\right)^{-1} E\{\mathbf{x}\mathbf{x}^T\} = \mathbf{0}$$

The LMMSE leads to the following orthogonality condition: the estimation error $\theta - \hat{\theta}$ is orthogonal to the data $\mathbf{x}$.

[Figure: orthogonality of the LMMSE estimation error to the data.]

Linear Minimum Mean Square Error Estimator (Linear Model)

For the linear model where $\theta$ is assumed to have mean $0$ and variance $\sigma_{\theta}^2$, the LMMSE estimator is given by
$$\hat{\theta} = \sigma_{\theta}^2\, \mathbf{h}^T (\sigma_{\theta}^2\, \mathbf{h}\mathbf{h}^T + \mathbf{C_n})^{-1} \mathbf{x} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h} + \sigma_{\theta}^{-2})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$$

where the last equality is again due to the matrix inversion lemma.

Remark: The LMMSE estimator is equivalent to the MMSE estimator when the noise and the unknown parameter are Gaussian.
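To see the Bayesian MSE advantage in numbers, a Monte Carlo sketch (illustrative values, not from the slides) that draws a random $\theta$, compares the LMMSE estimator with the LSE on the linear model with white noise, and estimates both Bayesian MSEs:

```python
import numpy as np

rng = np.random.default_rng(5)

N, sigma_theta2, sigma_n2, M = 4, 1.0, 0.8, 200_000
h = np.array([1.0, 0.5, -1.0, 0.2])

theta = np.sqrt(sigma_theta2) * rng.standard_normal(M)   # random parameter realizations
n = np.sqrt(sigma_n2) * rng.standard_normal((M, N))      # white Gaussian noise
x = theta[:, None] * h + n

# LSE: (h^T h)^{-1} h^T x
theta_ls = x @ h / (h @ h)

# LMMSE with C_n = sigma_n^2 I: (h^T h / sigma_n2 + 1/sigma_theta2)^{-1} h^T x / sigma_n2
theta_lmmse = (x @ h / sigma_n2) / (h @ h / sigma_n2 + 1 / sigma_theta2)

print(np.mean((theta_ls - theta)**2))       # Bayesian MSE of the LSE
print(np.mean((theta_lmmse - theta)**2))    # smaller Bayesian MSE for the LMMSE
```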

Summary

$\theta$ deterministic:

MVU: linear model: ?; linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$

BLUE: linear model: $\hat{\theta} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$; linear Gaussian model: same as linear model

MLE: linear model: ?; linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$

LSE: linear model: $\hat{\theta} = (\mathbf{h}^T \mathbf{h})^{-1} \mathbf{h}^T \mathbf{x}$; linear Gaussian model: same as linear model

$\theta$ stochastic with mean $0$ and variance $\sigma_{\theta}^2$ ($\theta$ Gaussian with mean $0$ and variance $\sigma_{\theta}^2$ in the linear Gaussian model):

MMSE: linear model: ?; linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h} + \sigma_{\theta}^{-2})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$

LMMSE: linear model: $\hat{\theta} = (\mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{h} + \sigma_{\theta}^{-2})^{-1} \mathbf{h}^T \mathbf{C_n}^{-1} \mathbf{x}$; linear Gaussian model: same as linear model

Extensions to Complex Vector Parameters

For a complex vector parameter $\boldsymbol{\theta}$ in the linear model $\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{n}$, the estimators take the same form, with transposes replaced by Hermitian transposes:

$\boldsymbol{\theta}$ deterministic:

MVU: linear model: ?; linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{H})^{-1} \mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{x}$

BLUE: linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{H})^{-1} \mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{x}$; linear Gaussian model: same as linear model

MLE: linear model: ?; linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{H})^{-1} \mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{x}$

LSE: linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H \mathbf{x}$; linear Gaussian model: same as linear model

$\boldsymbol{\theta}$ stochastic with mean $\mathbf{0}$ and covariance $\mathbf{C_{\theta}}$ ($\boldsymbol{\theta}$ Gaussian with mean $\mathbf{0}$ and covariance $\mathbf{C_{\theta}}$ in the linear Gaussian model):

MMSE: linear model: ?; linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{H} + \mathbf{C_{\theta}}^{-1})^{-1} \mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{x}$

LMMSE: linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{H} + \mathbf{C_{\theta}}^{-1})^{-1} \mathbf{H}^H \mathbf{C_n}^{-1} \mathbf{x}$; linear Gaussian model: same as linear model

Application to Communications

[Figure: the symbol sequence $s(n)$ passes through the channel $h(n)$ and is corrupted by additive noise $n(n)$, producing the received sequence $x(n)$.]

$$x(n) = \sum_{k=0}^{L-1} h(k)\, s(n-k) + n(n)$$

The channel $\mathbf{h} = [h(0), h(1), \dots, h(L-1)]^T$ has length $L$.

The symbol block $\mathbf{s} = [s(0), s(1), \dots, s(K-1)]^T$ has length $K$.

Application to Communications

Defining $\mathbf{x} = [x(0), x(1), \dots, x(K+L-2)]^T$ and $\mathbf{n} = [n(0), n(1), \dots, n(K+L-2)]^T$, we obtain

Symbol estimation model: $\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{n}$, with the $(K+L-1) \times K$ Toeplitz channel matrix
$$\mathbf{H} = \begin{bmatrix} h(0) & & \mathbf{0} \\ \vdots & \ddots & \\ h(L-1) & & h(0) \\ & \ddots & \vdots \\ \mathbf{0} & & h(L-1) \end{bmatrix}$$

Channel estimation model: $\mathbf{x} = \mathbf{S}\mathbf{h} + \mathbf{n}$, with the $(K+L-1) \times L$ Toeplitz symbol matrix
$$\mathbf{S} = \begin{bmatrix} s(0) & & \mathbf{0} \\ \vdots & \ddots & \\ s(K-1) & & s(0) \\ & \ddots & \vdots \\ \mathbf{0} & & s(K-1) \end{bmatrix}$$

Application to Communications

Most communications systems (GSM, UMTS, WLAN, ...) consist of two periods:

Training period: During this period we try to estimate the channel by transmitting some known symbols, also known as training symbols or pilots.

Data period: During this period we use the estimated channel to recover the unknown data symbols that convey useful information.

What kind of processing do we use in each of these periods?

During the training period we use one of the previously developed estimation techniques on the channel estimation model, $\mathbf{x} = \mathbf{S}\mathbf{h} + \mathbf{n}$, assuming that $\mathbf{S}$ is known.

During the data period we use one of the previously developed estimation techniques on the symbol estimation model, $\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{n}$, assuming that $\mathbf{H}$ is known.

Application to Communications

Channel estimation

Let us assume that $\mathrm{cov}_{\mathbf{n}} = \mathbf{C_n} = \sigma_n^2 \mathbf{I}$.

BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):
$$\hat{\mathbf{h}} = (\mathbf{S}^H \mathbf{S})^{-1} \mathbf{S}^H \mathbf{x}$$

LMMSE (or when the noise and channel are Gaussian also the MMSE):
$$\hat{\mathbf{h}} = (\mathbf{S}^H \mathbf{S} + \sigma_n^2\, \mathbf{C_h}^{-1})^{-1} \mathbf{S}^H \mathbf{x}$$

Remark: Note that the LMMSE estimator requires the knowledge of $\mathbf{C_h} = E\{\mathbf{h}\mathbf{h}^H\}$, which is generally not available.
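A minimal training-period sketch (illustrative sizes, with made-up QPSK-like pilots; not from the slides): build $\mathbf{S}$ from the known pilots, pass them through a random channel, and apply the LS/BLUE formula above.

```python
import numpy as np

rng = np.random.default_rng(7)

L, K, sigma_n = 3, 20, 0.05
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)     # unknown channel
pilots = (rng.integers(0, 2, K) * 2 - 1) + 1j * (rng.integers(0, 2, K) * 2 - 1)

# Toeplitz convolution matrix built from the *known* pilot symbols
S = np.zeros((K + L - 1, L), dtype=complex)
for j in range(L):
    S[j:j + K, j] = pilots

x = S @ h + sigma_n * (rng.standard_normal(K + L - 1)
                       + 1j * rng.standard_normal(K + L - 1))

# LS / BLUE channel estimate: h_hat = (S^H S)^{-1} S^H x
h_hat = np.linalg.solve(S.conj().T @ S, S.conj().T @ x)
print(np.abs(h_hat - h).max())               # small estimation error at this SNR
```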

Application to Communications

Symbol estimation

Let us assume that $\mathrm{cov}_{\mathbf{n}} = \mathbf{C_n} = \sigma_n^2 \mathbf{I}$.

BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):
$$\hat{\mathbf{s}} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H \mathbf{x}$$

LMMSE (or when the noise and symbols are Gaussian also the MMSE):
$$\hat{\mathbf{s}} = (\mathbf{H}^H \mathbf{H} + \sigma_n^2\, \mathbf{C_s}^{-1})^{-1} \mathbf{H}^H \mathbf{x}$$

Remark: Note that the LMMSE estimator requires the knowledge of $\mathbf{C_s} = E\{\mathbf{s}\mathbf{s}^H\}$, which can be set to $\sigma_s^2 \mathbf{I}$ if the data symbols have energy $\sigma_s^2$ and are uncorrelated.
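A matching data-period sketch (illustrative values, not from the slides, assuming uncorrelated unit-energy BPSK symbols so that $\mathbf{C_s} = \sigma_s^2 \mathbf{I}$): apply the LMMSE formula above with the channel assumed known.

```python
import numpy as np

rng = np.random.default_rng(8)

L, K, sigma_n2, sigma_s2 = 3, 8, 0.01, 1.0
h = rng.standard_normal(L)

# Toeplitz convolution matrix built from the *known* channel
H = np.zeros((K + L - 1, K))
for j in range(K):
    H[j:j + L, j] = h

s = rng.choice([-1.0, 1.0], K)               # BPSK symbols with energy sigma_s2 = 1
x = H @ s + np.sqrt(sigma_n2) * rng.standard_normal(K + L - 1)

# LMMSE symbol estimate with C_s = sigma_s^2 I:
# s_hat = (H^H H + (sigma_n^2 / sigma_s^2) I)^{-1} H^H x
s_hat = np.linalg.solve(H.T @ H + (sigma_n2 / sigma_s2) * np.eye(K), H.T @ x)

# Hard decisions on the soft LMMSE estimates
correct = int(np.sum(np.sign(s_hat) == s))
print(correct, "of", K, "symbols detected correctly")
```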