
EE 552A: Detection and Estimation Theory
Spring 2001, Joseph A. O'Sullivan

Exam 1
March 28, 2001

This is a one and a half hour in class exam. It is closed book and closed notes. No computers (including calculators) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.

NAME:

Basic Estimation Theory

Suppose that X1, X2, . . . , Xn is a set of zero mean Gaussian random variables. The covariance between any two random variables is

E\{X_i X_j\} = 0.9^{|i-j|}.  (1)

a. Derive an expression for E{X2|X1}.
b. Derive an expression for E{X3|X2, X1}.
c. Find the conditional mean squared error

E\{(X_3 - E\{X_3|X_2, X_1\})^2\}.  (2)

d. Can you generalize this to a conclusion about the form of E{Xn|X1, X2, . . . , X_{n-1}}?

Basic Detection Theory

Suppose that under hypothesis H1, the random variable X has probability density function

p_X(x) = \frac{3}{2} x^2, \quad \text{for } -1 \le x \le 1.  (3)

Under hypothesis H0, the random variable X is uniformly distributed on [-1, 1].

a. Use the Neyman-Pearson lemma to determine the decision rule to maximize the probability of detection subject to the constraint that the false alarm probability is less than or equal to 0.1. Find the resulting probability of detection.
b. Plot the receiver operating characteristic for this problem. Make your plot as good as possible.
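A quick Matlab sketch of part b (the closed forms here are my own computation, not from the exam: the likelihood ratio 3x^2 is monotone in |x|, so the Neyman-Pearson test thresholds |x| at some gamma in [0, 1], giving P_F = 1 - gamma and P_D = 1 - gamma^3):

% ROC sketch for (3/2)x^2 versus uniform on [-1,1] (hedged: my own closed forms)
gamma = linspace(0, 1, 200);   % candidate thresholds on |x|
PF = 1 - gamma;                % P(|X| > gamma | H0), X uniform on [-1,1]
PD = 1 - gamma.^3;             % P(|X| > gamma | H1), density (3/2)x^2
plot(PF, PD); grid on;
xlabel('Probability of false alarm'); ylabel('Probability of detection');

Under this computation, P_F = 0.1 corresponds to gamma = 0.9 and P_D = 1 - 0.9^3 = 0.271.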

Estimation Theory Problem:

The Expectation-Maximization Algorithm.

Suppose that the random variable R is a sum of two exponentially distributed random variables, X and Y ,

R = X + Y, (4)


where p_X(x) = α exp(−αx), x ≥ 0, and p_Y(y) = β exp(−βy), y ≥ 0. The value of β is known, but α is not known. The goal of the problem is to derive an algorithm to estimate α.
a. Write down the loglikelihood function for the data R (this is the incomplete data loglikelihood function). Find a first order necessary condition for α to be a maximum likelihood estimate. Is this equation easy to solve for α?
b. Define the complete data to be the pair (X, R), and write down the complete data loglikelihood function.
c. Determine the conditional probability density function on X given R, as a function of a nominal value ᾱ. Denote this probability density function (pdf) p(x|r, ᾱ).
d. Using the pdf p(x|r, ᾱ), determine the conditional mean of X given R and ᾱ.
e. Determine the function Q(α|ᾱ), the expected value of the complete data loglikelihood function given the incomplete data and ᾱ.
f. Derive the expectation-maximization algorithm for estimating α given R.
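For intuition (an added sketch, not part of the exam): for α ≠ β the incomplete-data density is the convolution p_R(r) = αβ/(β−α)(e^{−αr} − e^{−βr}), r ≥ 0, and the maximum likelihood estimate can be located numerically as a check on the EM iterations derived here. All numerical values below are assumed for illustration:

% Simulate R = X + Y and numerically maximize the incomplete-data loglikelihood.
alpha_true = 2; beta = 5; n = 5000;                    % assumed example values
R = -log(rand(n,1))/alpha_true - log(rand(n,1))/beta;  % exponentials via inverse CDF
negll = @(a) -sum(log(a*beta/(beta - a)) + log(exp(-a*R) - exp(-beta*R)));
alpha_hat = fminbnd(negll, 1e-3, beta - 1e-3)  % search kept below beta so logs stay real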


EE 552A: Detection and Estimation Theory
Spring 2003, Joseph A. O'Sullivan

Exam 1
March 25, 2003

This is a one and a half hour in class exam. It is open book and open notes. No computers (including calculators) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.

NAME:

Basic Estimation Theory

Suppose that (X1, Y1), (X2, Y2), . . . , (Xn, Yn) are pairwise independent and identically distributed jointly Gaussian random variables. The mean of Xi is 3. The mean of Yi is 5. The variance of Xi is 17. The variance of Yi is 11. The covariance between Xi and Yi is 7.

a. Find the expression for E[Xi|Yi] in terms of Yi.
b. Find the expression for

E\left[ \frac{1}{n}\sum_{i=1}^{n} X_i \,\middle|\, Y_1, Y_2, \ldots, Y_n \right].  (5)

What is the sufficient statistic of Y1, Y2, . . . , Yn?
c. Now define two new random variables

T = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S = \frac{1}{n}\sum_{i=1}^{n} Y_i.  (6)

Find the joint distribution for the random variables T and S.
d. Find the expression for E[T|S] in terms of S. Compare this to your answer in part b. Comment on this result.

Basic Detection Theory

Problem motivation. In some problems of a particle moving in space, it is not clear whether the motion is random in three dimensions, random but restricted to a two-dimensional surface, or somewhere in between (fractal motion of smoke, for example). In order to decide between these hypotheses, the positions of particles are measured and candidate density functions can be compared. While this can be turned into a dimensionality estimation problem, here we treat the simpler case of deciding between the two extreme cases of random motion in two or three dimensions.

It is assumed that only the distance of the particles relative to an origin is measurable, not the positions in three dimensions. In three dimensions, the distance at any time is Maxwell distributed. In two dimensions, the distance is Rayleigh distributed. The particles are assumed to be independent and identically distributed under either hypothesis.


Suppose that under hypothesis H1, each random variable Xi has a Maxwell probability density function

p_X(x) = \frac{x^2 \sqrt{2/\pi}}{\sigma^3} \exp\left(-\frac{x^2}{2\sigma^2}\right), \quad x \ge 0.  (7)

Under hypothesis H0, each random variable Xi has a Rayleigh probability density function

p_X(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right), \quad x \ge 0.  (8)

a. Given N independent and identically distributed measurements, determine the optimal Bayes test.
b. Determine the probability of false alarm for N = 1 measurement.
c. Determine the threshold used in the Neyman-Pearson test for N = 1.
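A simulation sketch (my own illustration with assumed values): dividing (7) by (8) gives the per-sample likelihood ratio \sqrt{2/\pi}\, x/\sigma, so the loglikelihood ratio for N i.i.d. samples depends on the data only through \sum \log X_i. Distances can be generated as norms of 3-D (Maxwell) or 2-D (Rayleigh) Gaussian positions, matching the problem motivation:

% Compare LLR histograms under the two hypotheses (requires a recent MATLAB).
sigma = 1; N = 100; trials = 2000;                         % assumed values
llr = @(x) N*log(sqrt(2/pi)/sigma) + sum(log(x), 1);       % x is N-by-trials
x1 = squeeze(sqrt(sum(randn(3, N, trials).^2, 1)))*sigma;  % H1: Maxwell distances
x0 = squeeze(sqrt(sum(randn(2, N, trials).^2, 1)))*sigma;  % H0: Rayleigh distances
histogram(llr(x1)); hold on; histogram(llr(x0)); legend('H_1', 'H_0');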

Estimation Theory

Suppose that an experiment has three possible outcomes, denoted 1, 2, and 3. Each run of the experiment consists of two parts. In the first part, a biased coin is flipped, and heads occurs with unknown probability p. If a head occurs, then the three outcomes have probabilities [1/2 1/3 1/6]. Otherwise, the three outcomes have probabilities [1/6 1/3 1/2]. In the second part of the experiment, one of the three outcomes is drawn from the distribution determined in the first part.

This experiment is run n independent times.
a. Find the probabilities for outcomes 1, 2, and 3 in any run of the experiment in terms of the probability p. You must do this part correctly to solve this problem.
b. Find the probability distribution for the n runs of the experiment.
c. From the distribution you determined in part b, determine a sufficient statistic for this problem.
d. Find the maximum likelihood estimate for p.
e. Is the estimate that you found in part d biased?

Cramer-Rao Bound; problem 12, p. 83, Hero

Let X1, X2, . . . , Xn be i.i.d. drawn from the Gamma density

p(x|\theta) = \frac{1}{\Gamma(\theta)} x^{\theta-1} e^{-x}, \quad x \ge 0,  (9)

where θ is an unknown nonnegative parameter and Γ(θ) is the Gamma function. Note that Γ(θ) is the normalizing constant,

\Gamma(\theta) = \int_0^{\infty} x^{\theta-1} e^{-x}\, dx.  (10)

The Gamma function satisfies the recurrence formula Γ(θ + 1) = θΓ(θ).
a. Find the Cramer-Rao lower bound on unbiased estimators of θ using X1, X2, . . . , Xn. You may leave your answer in terms of the first and second derivatives of the Gamma function.


EE 552A: Detection and Estimation Theory
Spring 1999, Joseph A. O'Sullivan

Final Exam
April 28, 1999

This is a take-home exam. It is due on Wednesday, May 5 at 5 p.m. in my office. It is open book and open notes. Students in the class may not communicate with each other regarding the exam. This is a strict rule for this exam. This exam does require the use of computers. In particular, Matlab is required for the first problem.

You are not allowed to seek help from any person other than me. This includes everyone. If you do not understand a problem, then you may only ask me about it; no one else.

Please think carefully about your answers and turn in a concise description of the theoretical derivations, of the Matlab code, and of the results. Part of the grade will be based on the quality of your write-up.

NAME:

Estimation Theory Problem:
The Expectation-Maximization Algorithm.

Suppose that an observed, continuous-time signal consists of a sum of signals of a given parameterized form, plus noise. The problem is to estimate the parameters in each of the signals using the expectation-maximization algorithm.

To be more specific, assume that the real-valued signal is

r(t) = \sum_{k=1}^{N} s(t; \theta_k) + w(t), \quad 0 \le t \le T,  (11)

where s(t; θ) is a signal of a given type, and θ_k, k = 1, 2, . . . , N are the parameters; w(t) is white Gaussian noise with intensity N_0/2. Example signals (with Matlab code fragments) include:

1. Sinusoidal signals: θ′ = (f, phase, amplitudes) and
   s = amplitudes'*cos(2*pi*f*t + phase*ones(size(t)));

2. Exponential signals: θ′ = (exponents, amplitudes) and
   s = amplitudes'*exp(exponents*t);

3. Exponentially decaying sinusoids: θ′ = (exponents, f, phase, amplitudes) and
   s = amplitudes'*(exp(exponents*t).*cos(2*pi*f*t + phase*ones(size(t))));

Define the vector of all parameters to be estimated by

\Theta = (\theta_1, \theta_2, \ldots, \theta_N)'.  (12)

a. Derive the log-likelihood ratio functional for Θ. This is the incomplete-data log-likelihood ratio functional, where we refer to r(t) as the incomplete data. The ratio is obtained as in class relative to a null hypothesis of white noise only.

b. For the expectation-maximization algorithm, define the complete-data signals

r_k(t) = s(t; \theta_k) + w_k(t), \quad 0 \le t \le T, \ k = 1, 2, \ldots, N,  (13)


where w_k(t) is white Gaussian noise with intensity σ_k^2, where

\sum_{k=1}^{N} \sigma_k^2 = N_0/2.  (14)

Using these complete-data signals, we have

r(t) = \sum_{k=1}^{N} r_k(t).  (15)

The standard choice for the intensities assigned to the components is σ_k^2 = N_0/(2N), but other choices may yield better convergence. Derive the complete-data log-likelihood ratio functional; notice that it is written as a sum of log-likelihood ratio functionals for the r_k (which are again relative to the noise-only case). Denote the log-likelihood ratio functional for r_k given θ_k by l(r_k|θ_k). Note that this log-likelihood ratio functional is linear in r_k.

c. Compute the expected value of the complete-data log-likelihood function given the incomplete data and a previous estimate for the parameters; denote this by Q. Suppose the previous estimate of the parameters is Θ^{(m)}, where m denotes the iteration number in the EM algorithm. So Q(Θ|Θ^{(m)}) is a function of Θ and of the previous estimate. Note that Q can be decomposed as a sum

Q(\Theta|\Theta^{(m)}) = \sum_{k=1}^{N} Q_k(\theta_k|\Theta^{(m)}).  (16)

d. Conclude from the derivation in parts b and c that only the conditional mean of r_k given r and Θ^{(m)} is needed to find Q_k. Explicitly compute this expected value,

\hat{r}_k(t) = E\{r_k(t) \,|\, r, \Theta^{(m)}\}.  (17)

e. The maximization step of the EM algorithm consists of maximizing Q(Θ|Θ^{(m)}) over Θ. The EM algorithm effectively decouples the problem at every iteration into a set of independent maximization problems. Show that to maximize Q over Θ it suffices to maximize each Q_k over θ_k. Derive the necessary equations for θ_k^{(m+1)} by taking the gradient of Q_k(θ_k|Θ^{(m)}) with respect to θ_k. Write the result in terms of \hat{r}_k. Note that this equation depends on θ_k through two terms, s(t; θ_k) and the gradient of s(t; θ_k) with respect to θ_k.
f. In this part, you will develop Matlab code for the general problem described above and apply it to two of the cases listed above. From the derivation of the EM algorithm, it is clear that there are two critical components. The first is the maximization of Q_k and the second is the conditional expected value of r_k. For the maximization step, we only need to analyze a simpler problem. The simpler problem has data

ρ(t) = s(t; θ) + w(t), 0 ≤ t ≤ T, (18)

where s is of the same form as above, and w(t) is white Gaussian noise with intensity σ^2. Write a Matlab program to simulate (18), taking into account the following guidelines. Note that the code developed earlier in the semester should help significantly here.

f.i. To simulate the continuous-time case using discrete-time data, some notion of sampling must be used. We know from class that white Gaussian noise cannot be sampled in the usual sense. The discrete-time processing must be accomplished so that the resulting implementations converge to the solution of the continuous-time problem in a mean-square sense as the sampling interval converges to zero. To accomplish this, we model the data as being an integral of ρ(t) over a small time interval,

\rho_i = \int_{i\Delta}^{(i+1)\Delta} \rho(t)\, dt  (19)
       = s(i\Delta; \theta)\Delta + w_i + g_i,  (20)


where Δ is the sampling interval, and the term g_i at the end is small and can be ignored (it is of order Δ^2). Show that the discrete-time noise terms w_i are i.i.d. zero mean Gaussian random variables with variance σ^2 Δ. In order to avoid having both the signal part (which is multiplied by the small factor Δ) and the noise going to zero, it is equivalent to assume that the measured data are η_i = ρ_i/Δ, so

\eta_i = s(i\Delta; \theta) + n_i,  (21)

where n_i are i.i.d. zero mean Gaussian with variance σ^2/Δ. Given the signal, σ^2, and Δ, the following Matlab code fragment implements this model.

size_s = size(signal);          % dimensions of the sampled signal
noise = randn(size_s);          % unit-variance Gaussian samples
variance = sigma2/Delta;        % noise variance sigma^2/Delta from (21)
noise = noise*sqrt(variance);   % scale to the model variance
eta = signal + noise;           % measured data eta_i

Note that the noise is multiplied by σ and divided by the square root of Δ. Using this concept, the simulation is flexible in the number of samples. Only in the limit as Δ goes to zero does this truly approximate the continuous-time signal.

f.ii. In this part, you will write Matlab code to find the maximum likelihood estimates for parameters for one sinusoid in noise. Assume that

s(t; \theta) = a\cos(2\pi f_0 t + \phi),  (22)

and that the three parameters (a, f_0, φ) are to be estimated. For f_0 much larger than 2π/T, show that

\int_0^T \cos^2(2\pi f_0 t + \phi)\, dt  (23)

is approximately equal to T/2. For this situation, show that the maximum likelihood estimates are found by the following algorithm:
Step 1: Compute the Fourier transform of the data

R(f) = \int_0^T \rho(t) e^{-j2\pi f t}\, dt.  (24)

Step 2: Find the maximum over all f of |R(f)|; set f_0 equal to that frequency.
Step 3: Find a > 0 and φ so that

a\cos\phi + j a\sin\phi = R(f_0).  (25)

Implement this algorithm. Perform some experiments that demonstrate your algorithm working. Derive the Fisher Information Matrix for estimating the three parameters (a, f_0, φ).
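A minimal sketch of Steps 1-3 on sampled data (my own fill-in, using the η_i model of f.i; the 2/T factor is an assumption on my part, rescaling R(f_0) ≈ (aT/2)e^{jφ} into amplitude and phase):

T = 1; Delta = 1e-3; t = 0:Delta:T-Delta;      % assumed example values
a = 1.3; f0 = 40; phi = 0.7; sigma2 = 0.5;
eta = a*cos(2*pi*f0*t + phi) + sqrt(sigma2/Delta)*randn(size(t));
R = Delta*fft(eta);                        % Riemann-sum approximation of (24)
f = (0:numel(t)-1)/T;                      % FFT frequency grid, f = k/T
[~, idx] = max(abs(R(1:floor(end/2))));    % Step 2: peak over positive frequencies
f0_hat = f(idx);
z = (2/T)*R(idx);                          % Step 3: z is approximately a*exp(j*phi)
a_hat = abs(z); phi_hat = angle(z);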

f.iii. Implement Matlab code for estimating two parameters for one decaying exponential. That is, assume that the signal is

s(t; \theta) = A e^{-\alpha t}.  (26)

Write Matlab code for estimating (A, α). Perform some experiments that demonstrate that your algorithm is working. Derive the Fisher Information Matrix for estimating the two parameters (A, α).

f.iv. Implement general Matlab code to compute the estimate of r_k(t) given r(t) and previous guesses for the parameters.

f.v. To demonstrate that the code in f.iv works, implement it on a sum of two sinusoids and, separately, on a sum of two decaying exponentials. Show that it works.

f.vi. Implement the full-blown EM algorithm for both a sum of two sinusoids and a sum of two decaying exponentials. Run the algorithm many times for one set of parameters and compute the sample variance of the estimates. Compare the sample variances to the entries in the inverse of the Fisher Information Matrices computed above. Run your algorithm for selected choices of signal-to-noise ratio and parameters. Briefly comment on your conclusions.


M-ary Detection Problem

There are many different ways to explore the performance of M-ary detection problems.
Suppose that there are M random vectors s_m, m = 1, 2, . . . , M, each vector having N components (dimension N × 1). These vectors are independent and identically distributed (i.i.d.). Furthermore, the components of the vectors are i.i.d. Gaussian with zero mean and variance σ_s^2. Thus, the probability density function for a vector s_m is

p(s_m) = \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi\sigma_s^2}}\, e^{-\frac{s_{mk}^2}{2\sigma_s^2}}.  (27)

The model for the data, conditioned on the signal being s_m, is

r = s_m + w,  (28)

where w is a noise vector with i.i.d. zero mean, Gaussian distributed components whose variance is σ_w^2.

a. Let's first assume that the signal vectors are not known at the receiver. Suppose that given r it is desired to estimate the random vector s_m. Find the minimum mean-squared error estimate; denote this estimate by \hat{s}(r). Find the mean-squared error of the estimate.

b. Given that s_m is the signal and given your MMSE estimate \hat{s}(r), compute the average mean-squared error

\xi_{ave}^2 = \frac{1}{N} E\{(s_m - \hat{s}(r))'(s_m - \hat{s}(r))\}.  (29)

Show that for any ε,

P\left( \left| \frac{1}{N} (s_m - \hat{s}(r))'(s_m - \hat{s}(r)) - \xi_{ave}^2 \right| > \epsilon \right)  (30)

goes to zero exponentially fast as N gets large. HINT: Use a Chernoff bound.
c. Assume that we have a channel, N uses of which may be modeled by (28), when the transmitted signal is s_m. Assume a receiver knows all of the possible transmitted signals S = {s_m, m = 1, 2, . . . , M}, and that the receiver structure is of the following form. First, the receiver computes \hat{s}(r). Second, the receiver looks through the vectors in S and finds all m such that s_m satisfies

\left| \frac{1}{N} (s_m - \hat{s}(r))'(s_m - \hat{s}(r)) - \xi_{ave}^2 \right| < \epsilon.  (31)

If there is only one, then it is decided to be the transmitted signal. If there is more than one, the receiver randomly chooses any signal. To evaluate the probability of error, we do the following. Suppose that s_m was the signal sent and let s_l be any other signal. Show that the probability

P\left( \left| \frac{1}{N} (s_l - \hat{s}(r))'(s_l - \hat{s}(r)) - \xi_{ave}^2 \right| < \epsilon \right)  (32)

goes to zero exponentially fast as N gets large; find the exponent. HINT: Use a Chernoff bound.
d. To finish the performance analysis, note that the probability that s_{l_1} satisfies (31) is independent of the probability that s_{l_2} satisfies (31), for l_1 ≠ l_2. Use the union bound to find an expression for the probability that no l other than l = m satisfies (31). Then, find an exponential bound on the following expression in terms of K = \log_2 M:

P(error|m) < P(s_m does not satisfy (31)) + (1 - P(all l ≠ m do not satisfy (31)))  (33)

If you get this done, then you have an exponential rate on the error in terms of K, ε, σ_s^2, and σ_w^2. Note that Shannon derived the capacity of this channel to be

C = \frac{1}{2} \log\left( 1 + \frac{\sigma_s^2}{\sigma_w^2} \right).  (34)

Is C related to your exponent at all? It would relate to the largest possible K/N at which exponential error rates begin. Do not spend a lot of time trying to relate this K/N to C if it does not drop out, since it probably will not. However, you surely made a mistake if your largest rate is greater than C.


EE 552A: Detection and Estimation Theory
Spring 2001, Joseph A. O'Sullivan

Final Exam
May 7, 2001

This is an all day exam, starting at 8:30 a.m. and lasting until 5 p.m. It is open book and open notes. All sources of written material are allowed, but you must adequately and carefully describe any sources that you use. You may use a computer if you wish, but no problem explicitly (or implicitly) calls for such usage.

If you have any questions, you have two options. First, you may contact me and ask the question(s). Second, you may write down what your understanding of the question is and work from there.

Under no circumstances can you ask any other person any question relating to this exam. Of course this includes all forms of communication and particularly includes students in the class this year.

By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.

NAME:

Two Dimensional Random Walk

Suppose that a random walk takes place in two dimensions. At time t = 0, a particle starts at (x, y) = (0, 0). The random walk begins at t = 0. At any time t > 0, the probability density function for the location of the particle is

p(x, y; t, D) = \frac{1}{2\pi D t} \exp\left( -\frac{1}{2Dt} [x^2 + y^2] \right),  (35)

where D is the diffusion constant. If x and y have units of length, and t has units of time, then D has units of length squared per unit time.

Parts a and b

Suppose that the random walk is started at time t = 0 and then at some fixed time t = T, the position is measured exactly. Let this be repeated N independent times, yielding data {(X_i, Y_i), i = 1, 2, . . . , N}.

a. Find the maximum likelihood estimate for the diffusion constant D from these N measurements.
b. Find the Cramer-Rao lower bound on estimating D. Is the maximum-likelihood estimator efficient?
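A quick Matlab check of part a (hedged: my own computation, setting the derivative of the loglikelihood of (35) to zero, gives D_hat = sum(X_i^2 + Y_i^2)/(2NT); values below are assumed):

D = 0.8; T = 2; N = 10000;                           % assumed example values
X = sqrt(D*T)*randn(N,1); Y = sqrt(D*T)*randn(N,1);  % per (35), each coordinate has variance D*T
D_hat = sum(X.^2 + Y.^2)/(2*N*T)                     % should be close to D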

Parts c, d, e, and f

Now let us consider a more practical situation. The device making the measurement of position is of finite size. Thus, for

x^2 + y^2 \le r,  (36)

the measurement is perfect (where r is the radius of the device). For

x^2 + y^2 \ge r,  (37)

there is no measurement; that is, no particle is detected.


Suppose that this experiment is run N independent times. Out of those N times, only M, where M ≤ N, runs yield particle measurements.

c. Find the probability for any value of M. That is, find P(M = m), for each 0 ≤ m ≤ N.
d. Find the conditional distribution on the particles measured given M. In order to fix notation, the particles that are measured are relabeled from 1 to M. The set of measurements, given M, is {(X_i, Y_i), i = 1, 2, . . . , M}.

e. Find the maximum likelihood estimate for D.
f. Find the Cramer-Rao bound for estimating D given these data. Compare this bound to the original bound. Under what conditions is the maximum likelihood estimate for D approximately efficient?

Detection for Two Measured Functions

Suppose that two random processes, r1(t) and r2(t), are measured. Under hypothesis H1, there is a signal present in both, while under hypothesis H0 (the null hypothesis), there is no signal present. More specifically, under H1,

r_1(t) = X e^{-\alpha t} \cos\omega t + w_1(t), \quad 0 \le t \le T  (38)
r_2(t) = Y e^{-\alpha t} \sin\omega t + w_2(t), \quad 0 \le t \le T,  (39)

where

• X and Y are independent zero mean Gaussian random variables with variances σ^2

• α and ω are known

• w1(t) and w2(t) are independent realizations of white Gaussian noise, both with intensity N0/2

• w1(t) and w2(t) are independent of X and Y .

Under H0,

r_1(t) = w_1(t), \quad 0 \le t \le T  (40)
r_2(t) = w_2(t), \quad 0 \le t \le T,  (41)

where w_1(t) and w_2(t) are as above.
a. Find the decision rule that maximizes the probability of detection given an upper bound on the probability of false alarm.
b. Find expressions for the probability of false alarm and the probability of detection using the optimal decision rule. In order to simplify this, you may need to assume that the two exponentially damped sinusoids are orthogonal over the interval of length T.
c. Analyze the expressions obtained in part b. How does the performance depend on α, ω, T, σ^2, and N_0/2?

Autoregressive Models, Detection Theory, Estimation Theory

Background on Autoregressive Models

Suppose that R1, R2, . . . is a stationary sequence of Gaussian random variables with zero mean. The covariance function is determined by an autoregressive model which the random variables satisfy. The autoregressive model is an mth order Markov model, meaning that the probability density function of Rn given Rn−1, Rn−2, . . . , R1 equals the probability density function of Rn given Rn−1, Rn−2, . . . , Rn−m.


More specifically, suppose that

R_n = -a_1 R_{n-1} - a_2 R_{n-2} - \ldots - a_m R_{n-m} + W_n,  (42)

where W_n are independent and identically distributed Gaussian random variables with zero mean and variance σ^2. Let the covariance function for the random process be C_k, so

C_k = E\{R_n R_{n-k}\}.  (43)

Comment: In order for this equation to model a stationary random process and to be viewed as a generative model for the data, the corresponding discrete time system must be stable. That is, if one were to compute the transfer function in the Z-transform domain, then all of the poles of the transfer function must be inside of the unit disk in the complex plane. These poles are obviously the roots of the characteristic equation with coefficients a_j.

a. Using the autoregressive model in equation (42), show that the covariance function satisfies the equations

C_0 + a_1 C_1 + a_2 C_2 + \ldots + a_m C_m = \sigma^2  (44)
C_k + a_1 C_{k-1} + a_2 C_{k-2} + \ldots + a_m C_{k-m} = 0,  (45)

where the second equation holds for all k > 0. Hint: Multiply both sides of (42) by a value of the random sequence and take expected values. Use the symmetry property of covariance functions for the first equality.

b. Derive a recursive structure for computing the logarithm of the probability density function of R_n, R_{n-1}, . . . , R_1. More specifically, let

v_n = \ln p(r_1, r_2, \ldots, r_n).  (46)

Derive an expression for v_n in terms of v_{n-1} and an update. Focus on the case where n > m.
Hint: This is a key part of the problem, so make sure you do it correctly. It obviously relates to the Markov property expressed through the autoregressive model in (42).

c. Consider the special case of m = 1. Suppose that C_0 = 1. Find a relationship between a_1 and σ^2 (essentially you must solve (45) in this general case).
Comment: Note that the stability requirement implies that |a_1| < 1.

Recursive Detection for Autoregressive Models

Suppose that one has to decide whether data arise from an autoregressive model or from white noise. In this problem, the log-likelihood ratio is computed recursively.

Under hypothesis H1, the data arise from the autoregressive model (42). Under hypothesis H0, the data R_n are i.i.d. Gaussian with zero mean and variance C_0. That is, under either hypothesis the marginal distribution on any sample R_n is the same. The only difference between the two models is in the covariance structure.

a. Find the log-likelihood ratio for n samples. Call this loglikelihood ratio l_n. Derive a recursive expression for l_n in terms of l_{n-1} and an update. Focus on the case n > m.

b. Consider the special case of m = 1. Write down the recursive structure for this case.
c. The performance increases as n grows. This can be quantified in various ways. One way is to compute the information rate functions for each n. In this problem, you will compute a special case.
Consider again m = 1. Find the log-moment generating function for the difference between l_n and l_{n-1} conditioned on each hypothesis, and conditioned on previous measurements; call these two log-moment generating functions m_0(s) and m_1(s):

m_1(s) = \ln E\{ e^{s(l_n - l_{n-1})} \,|\, H_1, r_1, r_2, \ldots, r_{n-1} \}.  (47)

Compute and plot the information rate functions I_0(x) and I_1(x) for these two log-moment generating functions.
Comment: These two functions quantify the increase in information for detection provided by the new measurement.


Recursive Estimation for Autoregressive Models

In this problem, you will estimate the parameters in an autoregressive model given observations of the data R_n, R_{n-1}, . . . , R_1.

a. First, assume that the maximum likelihood estimate for the parameters given data R_{n-1}, R_{n-2}, . . . , R_1 satisfies

B_{n-1} \hat{a}_{n-1} = d_{n-1},  (48)

where the vector \hat{a}_{n-1} is the maximum likelihood estimate of the parameter vector

a = [a_1\ a_2\ \ldots\ a_m]^T.  (49)

Find the update equations for B_n and d_n. These may be obtained by writing down the likelihood equation using the recursive update for the log-likelihood function, and taking the derivative with respect to the parameter vector.

b. The computation for \hat{a}_n may also be written in recursive form. This is accomplished using the matrix inversion lemma. The matrix inversion lemma states that a rank one update to a matrix yields a rank one update to its inverse. More specifically, if A is an m × m symmetric, invertible matrix and f is an m × 1 vector, then

(A + f f^T)^{-1} = A^{-1} - \frac{A^{-1} f f^T A^{-1}}{1 + f^T A^{-1} f}.  (50)

Use this equation to derive an equation for the estimate \hat{a}_n in terms of \hat{a}_{n-1}. Hint: The final form should look like

\hat{a}_n = \hat{a}_{n-1} + g_n [ r_n + \hat{a}_{n-1}^T (r_{n-1}\ r_{n-2}\ \ldots\ r_{n-m})^T ],  (51)

where an auxiliary equation defines the vector g_n in terms of B_{n-1}^{-1} and the appropriate definition of f.


EE 552A: Detection and Estimation Theory
Spring 2003, Joseph A. O'Sullivan

Final Exam
May 6, 2003

This is a three hour exam in class. It is open book and open notes. No computers (including calculators) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.

NAME:

1 Detection Theory (25 points)

Suppose that three hypotheses are equally likely. The hypotheses are H1, H2, and H3 with corresponding data models

H_1 : r(t) = a\cos(100t) + w(t), \quad 0 \le t \le \pi  (52)
H_2 : r(t) = a\cos(100t + 2\pi/3) + w(t), \quad 0 \le t \le \pi  (53)
H_3 : r(t) = a\cos(100t - 2\pi/3) + w(t), \quad 0 \le t \le \pi,  (54)

where w(t) is white Gaussian noise with mean zero and intensity N0/2,

E[w(t)w(\tau)] = \frac{N_0}{2} \delta(t - \tau).  (55)

a. Assume that a > 0 is known and fixed. Determine the optimal receiver for the minimum probability of error decision rule. Draw a block diagram of this receiver. Does your receiver depend on a? Comment on the receiver design.

b. Derive an expression for the probability of error as a function of a. Simplify the expression if possible.

2 Estimation Theory (25 points)

Two jointly Gaussian random processes a(t) and r(t) have zero mean and are stationary. The covariance functions are

E[r(t)r(\tau)] = K_{rr}(t - \tau) = 7e^{-3|t-\tau|} + 6\delta(t - \tau)  (56)
E[a(t)a(\tau)] = K_{aa}(t - \tau) = 7e^{-3|t-\tau|}  (57)
E[a(t)r(\tau)] = K_{ar}(t - \tau) = 7e^{-3|t-\tau|}  (58)

a. Find the probability density function for a(1) given all values of r(t) for −∞ < t < ∞.
b. Interpret the optimal minimum mean square error (MMSE) estimator for a(t) given r(u) for −∞ < u < ∞ in terms of power spectra.
c. Now consider a measurement over a finite time interval, r(u), 0 ≤ u ≤ 2. Find the form of the optimal MMSE estimator for a(t) for 0 ≤ t ≤ 2. Do not work through all of the details on this part; just convince me that you could if you had enough time.


3 Estimation Theory (20 points)

In this problem, we use the same data model as in the previous problem. Two jointly Gaussian random processes a(t) and r(t) have zero mean and are stationary. The covariance functions are

E[r(t)r(\tau)] = K_{rr}(t - \tau) = 7e^{-3|t-\tau|} + 6\delta(t - \tau)  (60)
E[a(t)a(\tau)] = K_{aa}(t - \tau) = 7e^{-3|t-\tau|}  (61)
E[a(t)r(\tau)] = K_{ar}(t - \tau) = 7e^{-3|t-\tau|}  (62)

Suppose that a causal estimator is desired. Design an optimal MMSE estimator of the form

\frac{d\hat{a}}{dt} = -\lambda \hat{a}(t) + g r(t).  (64)

Note that this estimator model has two parameters, λ and g.
HINT: Consider the state space system

\frac{da}{dt} = -3a(t) + u(t)  (65)
r(t) = a(t) + w(t),  (66)

where u(t) and w(t) are appropriately chosen white noise processes. Do you know an optimal causal MMSE estimator for a(t)?

4 Basic Estimation Theory (30 points)

In order to understand the difficulty of finding a needle in a haystack, one first must understand the statistics of the haystack. In this problem, you will derive a model for a haystack and then derive an algorithm for estimating how many pieces of hay are in the stack.

Suppose that a piece of hay is measured using hay units, abbreviated hu. On average a piece of hay has length 1 hu and width 0.02 hu. The standard deviation of each of these dimensions is 20%. Assume a fill factor of f = 50% of a volume; that is, when the hay is stacked, on average 50% of the volume is occupied by hay.

A stack of hay is typically much higher in the middle than around the edges. For simplicity, assume the stack is approximately circularly symmetric.

a. Write down a reasonable model for the shape of a haystack. Give the model in hay units (hu). Assume there is an overall scale factor that determines the size, A, given in hu; thus the volume of the haystack scales as A^3, so a doubling of A increases the volume by a factor of 8.

b. Write down a reasonable model for N pieces of hay stacked up. That is, assume a distribution on hay shapes consistent with the statistics above. Determine the distribution on the volume occupied by N pieces of hay.

c. For your model of the shape of a haystack, derive a reasonable estimator on the number of pieces of hay. Justify your model based on an optimality criterion and your statistical models above.

d. Evaluate the performance of your estimator as a function of A.
e. As A gets very large, what is the form of your estimator?
f. Comment on the difficulty of finding a needle in a haystack. You may assume that the length of the needle is much smaller than 1 hu.


EE 552A: Detection and Estimation Theory
Spring 2004, Joseph A. O'Sullivan

Exam 1
March 23, 2004

This is a one and a half hour in class exam. One sheet of notes, front and back, is allowed. No computers (including calculators) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.

NAME:

Problem    Score
1.
2.
3.
4.
Total

Potentially useful stuff:

\sum_{k=0}^{\infty} \alpha^k = \frac{1}{1-\alpha}, \quad |\alpha| < 1  (67)

\sum_{k=0}^{\infty} k\alpha^{k-1} = \frac{1}{(1-\alpha)^2}, \quad |\alpha| < 1  (68)


Basic Detection Theory (20 points)

Suppose that two data models are

Hypothesis 1: r(n) = (1/4)^n + w(n)  (69)
Hypothesis 2: r(n) = c(1/3)^n + w(n),  (70)

where under either hypothesis w(n) is a sequence of independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and variance σ^2; under each hypothesis, the noise w(n) is independent of the signal. The variable c and the variance σ^2 are known.

Assume that measurements are available for n = 0, 1, . . . , N − 1.
a. Find the loglikelihood ratio test.
b. What single quantity parameterizes performance?
c. What is the limiting signal to noise ratio as N goes to infinity?
d. What is the value of the variable c that minimizes performance?

Basic Detection Theory (25 points)

Suppose that under hypothesis H1, the random variable x has a Cauchy probability density function with mean 1,

p_x(X) = \frac{1}{\pi[1 + (X-1)^2]}.  (71)

Under hypothesis H0, the random variable x has a Cauchy probability density function with mean 0,

p_x(X) = \frac{1}{\pi[1 + X^2]}.  (72)

a. Given a single measurement, find the likelihood ratio test.
b. For a single measurement, sketch the receiver operating characteristic.
In your calculations, you may need the following indefinite integral:

\int \frac{1}{1 + X^2}\, dX = \tan^{-1}(X) + \text{constant}.  (73)

Basic Estimation Theory (20 points)

Suppose that x_i, i = 1, 2, . . . , N are i.i.d. Poisson distributed random variables with mean λ.
a. Find the maximum likelihood estimate for λ.
b. Find the Cramer-Rao lower bound on the variance of any unbiased estimator.
c. Compute the bias of the maximum likelihood estimator.
d. Compute the variance of the maximum likelihood estimator. Compare this result to the Cramer-Rao lower bound from part b, and comment on the result.


Expectation-Maximization (EM) Algorithm (35 points)

In this problem, you will solve a maximum likelihood estimation problem in two ways. First, you will solve it directly, obtaining a closed form solution. Given a closed form solution, the derivation of an iterative algorithm for this problem is not fundamental. However, in the second part of this problem you will derive an expectation-maximization (EM) algorithm for it. This algorithm may be extended to more complicated scenarios where it would be useful.

Assume that λ is a random variable drawn from a gamma density function

p(\lambda|\theta) = \frac{\theta^M}{\Gamma(M)} \lambda^{M-1} e^{-\lambda\theta}, \quad \lambda \ge 0.  (74)

Here θ is an unknown nonnegative parameter and Γ(M) is the Gamma function. Note that Γ(M) is the normalizing constant,

\Gamma(M) = \int_0^{\infty} x^{M-1} e^{-x}\, dx.  (75)

The Gamma function satisfies the recurrence formula Γ(M+1) = MΓ(M). For M an integer, Γ(M) = (M−1)!. Note that M is known. This implies that the mean of this gamma density function is E[λ|θ] = M/θ.

The random variable λ is not directly observable. Given the random variable λ, the observations x1, x2, . . . , xN are i.i.d. with probability density functions

p_{x_i|\lambda}(X_i|\lambda) = \lambda e^{-\lambda X_i}, \quad X_i \ge 0.  (76)

Note that there is one random variable λ and there are N random variables x_i that are i.i.d. given λ.
a. Find the joint probability density function for x_1, x_2, . . . , x_N conditioned on θ.
b. Directly from the probability density function from part a, find the maximum likelihood estimate of θ. Note that this estimate does not depend on λ.
c. To start the derivation of the EM algorithm, write down the complete data loglikelihood function, keeping only terms that depend on θ. Denote this function l_{cd}(\lambda|\theta).
d. Compute the expected value of the complete data loglikelihood function given the observations x_1, x_2, . . . , x_N, and the previous estimate for θ denoted θ^{(k)}. Denote this function by Q(θ|θ^{(k)}). Hint: This step requires a little thought. The posterior density function on λ given the measurements is in a familiar form.
e. Maximize the Q function over θ to obtain θ^{(k+1)}. Write down the resulting recursion with θ^{(k+1)} as a function of θ^{(k)}.
f. Verify that the maximum likelihood estimate derived in part b is a fixed point of the iterations derived in part e.
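A sketch of the recursion that parts c-e lead to under my own derivation (hedged; verify against your answer: the posterior on λ is Gamma(M+N, θ^{(k)} + ΣX_i), so the E-step mean is (M+N)/(θ^{(k)}+S) and the M-step gives θ^{(k+1)} = M(θ^{(k)}+S)/(M+N)):

M = 4; N = 20; S = 30;                 % assumed values; S is the sum of the x_i
theta = 1;                             % assumed initial guess
for k = 1:50
    lambda_bar = (M + N)/(theta + S);  % E-step: E[lambda | x, theta^(k)]
    theta = M/lambda_bar;              % M-step: maximize Q(theta | theta^(k))
end
theta                                  % the fixed point is the direct ML, M*S/N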


EE 552A: Detection and Estimation Theory
Spring 2004, Joseph A. O'Sullivan

Final Exam
May 6, 2004

This is a three hour exam in class. It is open book and open notes. No computers (including calculators) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.

NAME:

Problem    Score
1
2
3
4
Total

5 Detection Theory (25 points)

In this problem, you will compare the performance of three detection problems and determine which of the three performs the best. The following assumptions are the same for each problem:

• The prior probabilities are equal: P1 = P2 = 0.5.

• The performance is measured by the probability of error.

• The models are known signal plus additive white Gaussian noise.

• As shown below, for each problem the average signal energy (averaged over the two hypotheses) is E.

• The signals are given in terms of s1(t) and s2(t) plotted in the figure below.

• The noise w(t) is white Gaussian noise with intensity N0/2, and is independent of the signal under each hypothesis.

Problem 1:

H_1: r(t) = \sqrt{2E}\, s_1(t) + w(t), \quad 0 \le t \le 4,
H_2: r(t) = w(t), \quad 0 \le t \le 4.


Problem 2:

H_1: r(t) = \sqrt{E}\, s_1(t) + w(t), \quad 0 \le t \le 4,
H_2: r(t) = \sqrt{E}\, s_2(t) + w(t), \quad 0 \le t \le 4.

Problem 3:

H_1: r(t) = \sqrt{E}\, s_1(t) + w(t), \quad 0 \le t \le 4,
H_2: r(t) = -\sqrt{E}\, s_1(t) + w(t), \quad 0 \le t \le 4.

Determine the optimal performance for each of the three problems. Compare these performances.

[Figure omitted: two panels plotting signal value versus time in seconds, each signal taking values between −1 and 1.]

Figure 1: Signals s_1(t) and s_2(t).


6 Detection Theory, continued (30 points)

In this problem, you will compare the performance of two problems, only this time, the signals are random. Let a be a Gaussian random variable with mean 0 and variance σ^2. The random variable a is independent of the noise w(t). The two signals s1(t) and s2(t) are the same as in the first detection problem, as shown in Figure 1. As before, the hypotheses are equally likely, the goal is to minimize the probability of error, and the independent additive white Gaussian noise w(t) has intensity N0/2.

Problem A:

H_1: r(t) = a\sqrt{E}\, s_1(t) + w(t), \quad 0 \le t \le 4,
H_2: r(t) = a\sqrt{E}\, s_2(t) + w(t), \quad 0 \le t \le 4.

Problem B:

H_1: r(t) = a\sqrt{E}\, s_1(t) + w(t), \quad 0 \le t \le 4,
H_2: r(t) = -a\sqrt{E}\, s_1(t) + w(t), \quad 0 \le t \le 4.

a. Find the probability of error for the minimum probability of error decision rule for Problem B.
b. Find an expression (given as an integral, but without solving the integral) for the probability of error for Problem A. Show that the performance is determined by the ratio SNR = 2Eσ^2/N_0.
c. Which of these two problems has better performance and why?
d. This part is difficult, so you may want to save it for last. Solve the integral for the minimum probability of error for Problem A. Show that the performance is determined by the fraction of an ellipse that is in a specific quadrant. Show that this fraction equals

\tan^{-1} \frac{1}{\sqrt{\mathrm{SNR} + 1}},  (77)

and therefore that the probability of error is

P(\mathrm{error}) = \frac{2}{\pi} \tan^{-1} \frac{1}{\sqrt{\mathrm{SNR} + 1}}.  (78)


7 Estimation Theory (25 points)

Suppose that the random variable a is known to be Gaussian distributed with mean 0, but it has unknown variance σ^2. The problem here is to find the maximum likelihood estimate for σ^2 given a realization of a random process r(t), where

r(t) = a\sqrt{E}\, s(t) + w(t), \quad 0 \le t \le T.  (79)

In this equation, s(t) is a known function that has unit energy in the interval [0, T], the energy E is known, and the additive white Gaussian noise w(t) is independent of a and the signal and has intensity N0/2.

a. Find the maximum likelihood estimate for σ^2.
b. Find the Cramer-Rao lower bound for estimating σ^2. State what the Cramer-Rao lower bound signifies.
c. Find the bias of the maximum-likelihood estimator (if you cannot find it in closed form, it suffices to give an integral).
d. Is the variance of the maximum-likelihood estimator greater than the Cramer-Rao lower bound, equal to it, or less than it? Explain your answer.

8 Estimation Theory (20 points)

Suppose there are three random variables x, y, and z. x is Gaussian distributed with zero mean and variance 4. Given x, the pair of random variables [y, z]^T is jointly Gaussian with mean [x/2 x/6]^T and covariance matrix

K = \begin{bmatrix} 2 & 2/3 \\ 2/3 & 17/9 \end{bmatrix}.  (80)

a. In terms of a realization of the pair of random variables [y, z] = [Y, Z], find the conditional mean of x: E[x|Y, Z].
b. Comment on the special form of the result in part a.
c. What is the variance of x given [y, z] = [Y, Z]?


EE 552A: Detection and Estimation Theory
Spring 2005, Joseph A. O'Sullivan

Exam 1
March 17, 2005

This is a one and a half hour in class exam. One sheet, front and back (or two sides), of 8.5 by 11 inch notes is allowed. With the exception of your brain, no computers (including calculators, cell phones, and slide rules) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.

NAME:

Problem    Score
1.
2.
3.
4.
Total

Potentially useful stuff:

\sum_{k=0}^{\infty} \alpha^k = \frac{1}{1-\alpha}, \quad |\alpha| < 1  (81)

\sum_{k=0}^{\infty} k\alpha^{k-1} = \frac{1}{(1-\alpha)^2}, \quad |\alpha| < 1  (82)

9 Basic Detection Theory

Two observations (r1, r2) are measured. Assume that under Hypothesis 0, the two random variables are independent Gaussian random variables with mean zero and variance 2. Under Hypothesis 1, the two random variables are independent Gaussian random variables with mean zero and variance 3.

a. Find the decision rule that maximizes the probability of detection subject to a constraint on the probability of false alarm, P_F ≤ α.

b. Derive an equation for the probability of detection as a function of α.


10 Basic Estimation Theory

Suppose that x, y, and z are jointly distributed Gaussian random variables. To establish notation, suppose that

E[x\ y\ z] = [\mu_x\ \mu_y\ \mu_z],  (83)

and that

E\left( \begin{bmatrix} x \\ y \\ z \end{bmatrix} [x\ y\ z] \right) = \begin{bmatrix} R_{xx} & R_{xy} & R_{xz} \\ R_{xy} & R_{yy} & R_{yz} \\ R_{xz} & R_{yz} & R_{zz} \end{bmatrix}  (84)

a. Show that x and y − E[y|x] are independent Gaussian random variables.
b. Show that x, y − E[y|x], and z − E[z|(x, y − E[y|x])] are independent Gaussian random variables.
Hint: This problem is easy.

11 Basic Estimation Theory

Suppose that the random variable s is uniformly distributed between −3 and +3, denoted U[−3, +3]. The data r is a noisy measurement of s,

r = s + n, (85)

where n is independent of s and is Laplacian distributed with probability density function

p_n(n) = \frac{1}{2} e^{-|n|}.  (86)

a. Find the minimum mean-square error estimate of s given r. Show your work.
b. Find the maximum a posteriori estimate of s given r.
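A numeric sketch (my own illustration): the posterior is proportional to e^{−|r−s|} on [−3, 3], so both estimates can be read off by quadrature and a grid search for a given observation:

r = 1.2;                                    % assumed example observation
s = linspace(-3, 3, 10001);
post = exp(-abs(r - s));                    % unnormalized posterior on [-3, 3]
s_mmse = trapz(s, s.*post)/trapz(s, post)   % posterior mean (part a)
[~, idx] = max(post); s_map = s(idx)        % posterior mode (part b)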

12 Basic Detection Theory

Suppose that x1 and x2 are jointly distributed Gaussian random variables. There are two hypotheses for their joint distribution. Under either hypothesis they are both zero mean. Under hypothesis H1, they are independent with variances 20/9 and 5, respectively. Under hypothesis H2,

E\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} [x_1\ x_2] \right) = \begin{bmatrix} 4 & 4 \\ 4 & 9 \end{bmatrix}  (87)

Determine the optimal Neyman-Pearson test. Sketch the form of the corresponding decision region.


ESE 524: Detection and Estimation Theory
Spring 2005, Joseph A. O'Sullivan

Final Exam
May 5, 2005

This is a three hour exam in class. Two sheets, or four sides, of 8.5 × 11 notes are allowed. No computers (including calculators and cell phones) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.

NAME:

Problem    Score
1
2
3
4
Total

Potentially useful information:

\int_0^{\infty} e^{-\alpha t} e^{-j2\pi f t}\, dt = \frac{1}{\alpha + j2\pi f}, \quad \mathrm{Re}\{\alpha\} > 0,  (88)

\int_{-\infty}^{0} e^{\alpha t} e^{-j2\pi f t}\, dt = \frac{1}{\alpha - j2\pi f}, \quad \mathrm{Re}\{\alpha\} > 0.  (89)


13 Linear Estimation Theory (25 points)

Suppose that n(t), −∞ < t < ∞, is a stationary Gaussian random process with covariance function

E[n(t)n(t - \tau)] = \delta(\tau) + \frac{5}{4} e^{-2|\tau|} = K_n(\tau).  (90)

a. Assume that n(t) = n_c(t) + w(t), where w(t) and n_c(t) are independent stationary Gaussian random processes, w(t) is white Gaussian noise, and n_c(t) has finite mean energy. Find the covariance functions for w(t) and n_c(t). Denote the covariance function for n_c(t) by K_c(τ).

b. Find an equation for the optimal estimate of n_c(t) given n(u), −∞ < u ≤ t, in terms of K_n and K_c. Be as specific as you can, that is, make sure that the equations define the unique solution to the problem. Make sure that you account for the causality, that is, the estimate of n_c(t) at time t depends only on current and previous values of n(t).

c. Examine the equations from part b carefully and argue that the unique solution for the estimator is a linear, causal, time-invariant filter. Find the Fourier transform of the impulse response and find the impulse response.


14 EM Algorithm (25 points)

The probability density function for the beta distribution is of the form

f(x : a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1}, \quad 0 \le x \le 1,  (91)

where Γ(t) is the Gamma function

\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx.  (92)

a. Suppose that N independent and identically distributed (i.i.d.) realizations X_i are drawn from the probability density function f(x : a, b). Find equations for the maximum likelihood estimates of the parameters a and b in terms of the observations. DO NOT SOLVE the equations, but represent the solution in terms of the Gamma function and the derivative of the Gamma function, Γ′(t).

b. For the estimation problem in part a, what are the sufficient statistics?
c. Now assume that the true distribution is a mixture of two beta distributions,

f(x : \pi, a, b, \alpha, \beta) = \pi \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1}(1-x)^{b-1} + (1-\pi) \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 \le x \le 1.  (93)

The generation of a realization from this probability density function can be viewed as a two-step process. In the first step, the first density is selected with probability π and the second is selected with probability 1 − π. In the second step, a realization of the appropriate density function is selected. Define the complete data for a realization as the pairs (j_i, X_i), i = 1, 2, . . . , N, where j_i ∈ {0, 1} indicates which density is selected in the first step, and X_i is the realization from that density. For this complete data, write down the complete data loglikelihood function.

d. Find the expected value of the complete data loglikelihood function given the incomplete data {X_1, X_2, . . . , X_N}; call this function Q.

e. Maximize Q over the variables π, a, b, α, β. Write down the equations that the maximum likelihood estimates satisfy in terms of the functions Γ(t) and Γ′(t).

f. If the maximum likelihood estimator defined in part a is given by (a_{ML}, b_{ML}) = h(x_1, x_2, . . . , x_N), express the result of the maximization step in part e in terms of h.
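An E-step sketch for part d (the standard two-component mixture responsibilities; my own illustration, with betapdf and betarnd from the Statistics Toolbox and all parameter values assumed):

X = betarnd(2, 5, 1000, 1);                      % assumed example data
piw = 0.5; a = 1; b = 1; alpha = 2; beta_ = 2;   % assumed current parameter values
num = piw*betapdf(X, a, b);
gam = num./(num + (1 - piw)*betapdf(X, alpha, beta_));  % P(j_i = 1 | X_i)
piw_next = mean(gam);                            % the pi-update from maximizing Q
% The remaining updates solve weighted versions of the part a equations,
% which is the structure part f asks you to express through h.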


15 Detection Theory (25 points)

In this problem, we look at detection in the presence of colored noise. Under hypothesis H0,

r(t) = n(t), 0 ≤ t ≤ 3, (94)

while under hypothesis H1,

r(t) = s(t) + n(t), \quad 0 \le t \le 3,  (95)

where the noise n(t) is independent of the signal s(t). Suppose that n(t), 0 ≤ t ≤ 3, is a Gaussian random process with covariance function

K_n(t, \tau) = \delta(t-\tau) + 2s_1(t)s_1(\tau) + 3s_2(t)s_2(\tau) + 5s_3(t)s_3(\tau) + 7s_4(t)s_4(\tau), \quad 0 \le t \le 3, \ 0 \le \tau \le 3,

where

s_1(t) = \sqrt{2/3}\, \cos(2\pi t/3), \quad 0 \le t \le 3,
s_2(t) = \sqrt{2/3}\, \sin(2\pi t/3), \quad 0 \le t \le 3,
s_3(t) = \sqrt{2/3}\, \cos(4\pi t/3), \quad 0 \le t \le 3,
s_4(t) = \sqrt{2/3}\, \sin(4\pi t/3), \quad 0 \le t \le 3.

These signals that parameterize the covariance function for the noise are shown in the upper plot in Figure 2. The signal s(t) is shown in the lower plot in Figure 2 and is equal to

s(t) = 11\cos(4\pi t/3 + \pi/3), \quad 0 \le t \le 3.  (96)

a. Derive the optimal Bayes decision rule, assuming prior probabilities P_0 and P_1, and costs of false alarm and miss C_F and C_M, respectively.
b. What are the sufficient statistics? Find the joint probability density function for the sufficient statistics.
c. Derive an expression for the probability of detection for a fixed threshold.

[Figure omitted: the upper panel plots the four noise functions versus time in seconds with values in [−1, 1]; the lower panel plots the signal for hypothesis H1 with values in [−15, 15].]

Figure 2: Upper Plot: Signals s_1(t), s_2(t), s_3(t), and s_4(t). Lower Plot: Signal s(t) for hypothesis H_1.


16 Estimation Theory (25 points)

16.1 Background Motivation

In ultrasonic imaging, the substance through which the sound waves pass determines the speed of sound. Local variations in the speed of sound, if detectable, may be used to infer properties of the medium. On a grand scale, whales are able to communicate in the ocean over long distances because variations in the speed of sound with depth create effective waveguides near the surface, thereby enabling sound to propagate over long distances. On a smaller scale, variations in the speed of sound in human tissue may confound inferences about other structures. The speed of sound in a uniform medium causes an ultrasound signal to be scaled in time. Thus, an estimate of a time-scale factor, as discussed below, may be used to derive an estimate of the speed of sound.

A second example in which time-scale estimation plays a role is in estimating the speed of an emitter or a reflector whose velocity is such that the usual narrowband assumption does not hold. This is the case if the velocity is significant relative to the speed of propagation. This is also the case if the bandwidth of the signal is comparable to its largest frequency, so the notion of a carrier frequency does not make sense.

16.2 Problem Statement

Suppose that a real-valued signal with an unknown time-scale factor is observed in white Gaussian noise:

r(t) = √(aE) s(at) + w(t), −T/2 ≤ t ≤ T/2, (97)

where w(t) is white Gaussian noise with intensity N0/2, w(t) is independent of the signal and the unknown time-scale factor a, s(t) is a signal of unit energy, and E is the energy of the transmitted signal. This problem relates to finding the maximum likelihood estimate for the time-scale factor a subject to the constraint that a > 0.

Assume that T is large enough so that, whatever value of a is considered,

∫_{−T/2}^{T/2} |√a s(at)|² dt = ∫_{−aT/2}^{aT/2} |s(τ)|² dτ = ∫_{−T/2}^{T/2} |s(t)|² dt = ∫_{−∞}^{∞} |s(t)|² dt = 1. (98)

Essentially, this assumption allows us to make some of the integrals that are relevant for the problem go over an infinite time interval. Do not use this assumption in part a below.

a. Find the log-likelihood functional for estimating a.
b. Derive an equation that the maximum likelihood estimate for a must satisfy. What role does the constraint a > 0 play?
c. For a general estimation problem, state in words what the Cramer-Rao bound is.
d. Assume that the function s(t) is twice continuously differentiable. Find the Cramer-Rao lower bound for any estimate of a. Hint: For a > 0 and α > 0,

∫_{−∞}^{∞} √a s(at) √α s(αt) dt = ∫_{−∞}^{∞} √(α/a) s(τ) s((α/a)τ) dτ = C(α/a). (99)


The function C is a time-scale correlation function. This correlation between two time-scaled signals depends only on the ratio of the time scales. This correlation is symmetric in a and α, so C(ρ) = C(1/ρ). Note that the maximum of C is obtained at C(1) = 1. C(ρ) is differentiable at ρ = 1 if s(t) is differentiable. If, in addition to being differentiable, ts²(t) goes to zero as t gets large, then dC/dρ(1) = 0. Derive the Cramer-Rao lower bound in terms of C(ρ).
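To build intuition for C(ρ), the sketch below evaluates it numerically for an assumed unit-energy Gaussian pulse (my choice of s(t), not part of the problem) and checks the stated properties C(1) = 1, C(ρ) = C(1/ρ), and dC/dρ(1) = 0.

    import numpy as np

    t = np.linspace(-30, 30, 200001)
    dt = t[1] - t[0]
    s = np.pi ** -0.25 * np.exp(-t ** 2 / 2)   # unit-energy Gaussian pulse (assumed)

    def C(rho):
        # Time-scale correlation: C(rho) = integral of sqrt(rho) s(tau) s(rho*tau) dtau.
        srho = np.pi ** -0.25 * np.exp(-(rho * t) ** 2 / 2)
        return np.sqrt(rho) * np.sum(s * srho) * dt

    for rho in [1.0, 1.5, 1 / 1.5]:
        print(rho, C(rho))                      # C(1) = 1 and C(rho) = C(1/rho)
    h = 1e-4
    print("dC/drho at 1:", (C(1 + h) - C(1 - h)) / (2 * h))   # approximately 0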


Qualifying Examination Questions 2005

17 Basic Estimation Theory

Suppose that the random variable s is uniformly distributed between -3 and +3, denoted U[−3, +3]. The data y is a noisy measurement of s,

y = s + n, (100)

where n is independent of s and is Laplacian distributed with probability density function

pn(n) = (1/2) e^{−|n|}. (101)

a. Find the minimum mean-square error estimate of s given y. Show your work.
b. Find the maximum a posteriori estimate of s given y.
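Both parts can be sanity-checked by putting the posterior on a grid: with a flat prior on [−3, 3], p(s|y) is proportional to pn(y − s) there. The sketch below (the grid size and the test values of y are arbitrary choices) reads off the posterior mean for part a and the posterior mode for part b.

    import numpy as np

    def posterior_estimates(y, m=200001):
        s = np.linspace(-3, 3, m)               # uniform prior support
        w = 0.5 * np.exp(-np.abs(y - s))        # likelihood p_n(y - s); prior is flat
        w /= w.sum()
        mmse = np.sum(s * w)                    # posterior mean
        smap = s[np.argmax(w)]                  # posterior mode
        return mmse, smap

    for y in [0.0, 1.0, 2.5, 4.0]:
        print(y, posterior_estimates(y))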

18 Basic Detection Theory

Suppose that x1 and x2 are jointly distributed Gaussian random variables. There are two hypotheses for their joint distribution. Under either hypothesis they are both zero mean. Under hypothesis H1, they are independent with variances 20/9 and 5, respectively. Under hypothesis H2,

E{[x1; x2][x1 x2]} = [4 4; 4 9]. (102)

Determine the optimal Neyman-Pearson test. Sketch the form of the corresponding decision region.
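Since both hypotheses are zero mean, the log-likelihood ratio reduces to a quadratic form in (x1, x2), which is the natural object for sketching the decision region. The sketch below (illustrative only) evaluates that statistic from the two covariance matrices.

    import numpy as np

    C1 = np.diag([20 / 9, 5.0])                 # H1 covariance
    C2 = np.array([[4.0, 4.0], [4.0, 9.0]])     # H2 covariance

    # log p2(x)/p1(x) = 0.5 x^T (C1^{-1} - C2^{-1}) x + 0.5 ln(det C1 / det C2)
    Q = np.linalg.inv(C1) - np.linalg.inv(C2)
    const = 0.5 * np.log(np.linalg.det(C1) / np.linalg.det(C2))

    def llr(x):
        x = np.asarray(x, float)
        return 0.5 * x @ Q @ x + const

    print(llr([1.0, 1.0]), llr([2.0, -1.0]))    # compare against a threshold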


Special Probability, Detection, and Estimation Qualifying Examination
August 25, 2005

This is a two hour exam. No computers (including calculators and cell phones) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.
NAME:

[Score table: problems 1–4 and Total.]

Potentially useful information:

∫_0^∞ e^{−αt} e^{−j2πft} dt = 1/(α + j2πf), Re{α} > 0, (103)

∫_{−∞}^0 e^{αt} e^{−j2πft} dt = 1/(α − j2πf), Re{α} > 0. (104)

∑_{k=0}^∞ α^k = 1/(1 − α), |α| < 1, (105)

∑_{k=0}^∞ k α^{k−1} = 1/(1 − α)², |α| < 1. (106)

19 Linear Estimation Theory (25 points)

Suppose that n(t), −∞ < t < ∞, is a stationary Gaussian random process with covariance function

E[n(t)n(t − τ)] = δ(τ) + (5/4) e^{−2|τ|} = Kn(τ). (107)

a. Assume that n(t) = nc(t) + w(t), where w(t) and nc(t) are independent stationary Gaussian random processes, w(t) is white Gaussian noise, and nc(t) has finite mean energy over any finite interval. Find the covariance functions for w(t) and nc(t). Denote the covariance function for nc(t) by Kc(τ).

b. Find an equation for the optimal estimate of nc(t) given n(u), −∞ < u ≤ t, in terms of Kn and Kc. Be as specific as you can; that is, make sure that the equations define the unique solution to the problem. Make sure that you account for causality; that is, the estimate of nc(t) at time t depends only on current and previous values of n(t).

c. Examine the equations from part b carefully and argue that the unique solution for the estimator is a linear, causal, time-invariant filter. Find the Fourier transform of the impulse response of this filter.
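A numerical check is a useful companion here. Assuming the natural split suggested by (107), a δ(τ) white part and Kc(τ) = (5/4)e^{−2|τ|}, the corresponding power spectra are 1 and 5/(4 + ω²) with ω = 2πf, and the sketch below verifies on a grid that Sn(ω) = 1 + 5/(4 + ω²) factors as |(3 + jω)/(2 + jω)|², the kind of spectral factorization that part c turns on.

    import numpy as np

    w = np.linspace(-50, 50, 10001)             # radian frequency grid

    Sn = 1 + 5 / (4 + w ** 2)                   # 1 <-> delta(tau); 5/(4+w^2) <-> (5/4) e^{-2|tau|}
    factor = (3 + 1j * w) / (2 + 1j * w)        # candidate causal, minimum-phase factor
    print(np.max(np.abs(Sn - np.abs(factor) ** 2)))   # ~0: Sn = |(3+jw)/(2+jw)|^2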


20 Detection Theory

In this problem, we look at detection in the presence of colored noise. Under hypothesis H0,

r(t) = n(t), 0 ≤ t ≤ 3, (108)

while under hypothesis H1,

r(t) = s(t) + n(t), 0 ≤ t ≤ 3, (109)

where the noise n(t) is independent of the signal s(t). Suppose that n(t), 0 ≤ t ≤ 3, is a Gaussian random process with covariance function

Kn(t, τ) = δ(t − τ) + 2s1(t)s1(τ) + 3s2(t)s2(τ) + 5s3(t)s3(τ) + 7s4(t)s4(τ), 0 ≤ t ≤ 3, 0 ≤ τ ≤ 3,

where

s1(t) = √(2/3) cos(2πt/3), 0 ≤ t ≤ 3,
s2(t) = √(2/3) sin(2πt/3), 0 ≤ t ≤ 3,
s3(t) = √(2/3) cos(4πt/3), 0 ≤ t ≤ 3,
s4(t) = √(2/3) sin(4πt/3), 0 ≤ t ≤ 3.

These signals that parameterize the covariance function for the noise are shown in the upper plot in Figure 3. The signal s(t) is shown in the lower plot in Figure 3 and is equal to

s(t) = 11 cos(4πt/3 + π/3), 0 ≤ t ≤ 3. (110)

a. Derive the optimal Bayes decision rule, assuming prior probabilities P0 and P1, and costs of false alarm and miss CF and CM, respectively.
b. What are the sufficient statistics? Find the joint probability density function for the sufficient statistics.
c. Derive an expression for the probability of detection for a fixed threshold.

[Figure 3 appears here. Upper panel, "Four Noise Functions": noise signal values versus time (s). Lower panel: signal value under H1 versus time (s).]

Figure 3: Upper Plot: Signals s1(t), s2(t), s3(t), and s4(t). Lower Plot: Signal s(t) for hypothesis H1.


Qualifying Examination Questions 2006

Point Process Parameter Estimation

Many radioactive decay problems can be modeled as Poisson processes with intensity (or rate) functions that decay exponentially over time. That is, all radioactive decay events are independent and each such event decreases the total amount of the material. In this problem, you will estimate both the total amount of the material and the decay rate using a simplified, two-parameter model.

Assume that 0 < X1 < X2 < X3 < . . . < Xn < . . . is a set of points (a realization) drawn from a Poisson process with intensity function

λ(t) = a c e^{−ct}, t ≥ 0. (111)

Thus the two parameters are a and c. Denote such a realization by X. We know that the number of points in any interval [T0, T1) is Poisson distributed with mean μ equal to

μ = ∫_{T0}^{T1} λ(t) dt. (112)

The numbers of points in two nonoverlapping intervals are independent.

The derivation of the loglikelihood function for a Poisson process is technically involved. A simplified derivation starts by placing intervals of width ε around each point Xi. The probability of getting one point in the interval around Xi is

ε λ(Xi) e^{−ελ(Xi)}, (113)

which for small ε is close to ε λ(Xi). The probability of getting two or more points in an ε interval is negligible for small ε. The probability of getting no points between Xi−1 and Xi is

e^{−∫_{Xi−1}^{Xi} λ(t) dt}. (114)

The likelihood function for the first N points is then proportional to (taking X0 = 0)

L(X) = ∏_{i=1}^{N} e^{−∫_{Xi−1}^{Xi} λ(t) dt} λ(Xi) (115)
= e^{−∫_0^{XN} λ(t) dt} ∏_{i=1}^{N} λ(Xi), (116)

and the loglikelihood function is the natural logarithm of L(X).

a. What is the probability distribution on the total number of points? Argue that the total number is finite with probability one. For a finite number of points, the likelihood function is modified by multiplying by the term

e^{−∫_{XN}^{∞} λ(t) dt} (117)

that corresponds to getting zero points after the last point XN. Thus the likelihood function for N total points becomes

L(X) = e^{−∫_0^∞ λ(t) dt} ∏_{i=1}^{N} λ(Xi). (118)

b. Argue that a corresponds to the total amount of material. Find the maximum likelihood estimate for a.

c. The parameter c determines the rate of decay. Find the maximum likelihood estimate for c. Argue that 1/cML is the maximum likelihood estimate of the time constant of the decay.
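The model is easy to simulate because ∫_0^∞ λ(t) dt = a is finite: the total count is Poisson with mean a, and given the count the points are i.i.d. with density λ(t)/a = c e^{−ct}. The sketch below (parameter values and optimizer are my choices) simulates a realization and maximizes the loglikelihood from (118) numerically, which gives a check on the closed-form answers to parts b and c.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    a_true, c_true = 50.0, 0.3

    N = rng.poisson(a_true)                      # total count ~ Poisson(a)
    X = np.sort(rng.exponential(1 / c_true, N))  # given N, points i.i.d. ~ c e^{-ct}

    def nll(theta):
        a, c = theta
        # -ln L(X) from (118): integral of lambda is a; ln lambda(X_i) = ln(ac) - c X_i.
        return a - N * np.log(a * c) + c * X.sum()

    res = minimize(nll, x0=[10.0, 1.0], bounds=[(1e-9, None), (1e-9, None)])
    print("ML estimates (a, c):", res.x)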


Detection Theory

A decision must be made between two models for a sequence of Gaussian distributed random variables. Each model is an autoregressive model. The first model is autoregressive of order one, while the second is autoregressive of order two. There are two goals here as outlined below. First, the optimal test statistic for a Neyman-Pearson test must be computed for a fixed number N of consecutive samples of a realization. Second, an efficient update of this test statistic to the case with N + 1 samples must be derived.

Consider the following two hypotheses. Under H1, the model for the measurements is

Yi = 0.75Yi−1 + Wi, (119)

where Wi are independent and identically distributed Gaussian random variables with zero mean and variance equal to 7/4 = 1.75; Wi are independent of Y0 for all i; and Y0 is Gaussian distributed with zero mean and variance 4.

Under H2, the model for the measurements is

Yi = 0.75Yi−1 + 0.2Yi−2 + Wi, (120)

where Wi are independent and identically distributed Gaussian random variables with zero mean and variance equal to 1.75; Wi are independent of Y0 for all i; and Y0 is Gaussian distributed with zero mean and variance 4. Y1 = 0.75Y0 + W1, where W1 is a zero-mean Gaussian random variable with variance 1.75.

a. Given Y0, Y1, . . . , YN, find the optimal test statistic for a Neyman-Pearson test. Simplify the expression as much as possible. Interpret your answer.

b. Denote the test statistic computed in part a by lN. The optimal test statistic for N + 1 measurements is lN+1. Find an efficient update rule for computing lN+1 from lN.
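Per the problem statement, Y0 and Y1 have the same distributions under both hypotheses, so those terms cancel in the likelihood ratio and each new sample contributes one ratio of conditional Gaussian densities. The sketch below (with data simulated from H2 for illustration) implements that one-sample update.

    import numpy as np

    rng = np.random.default_rng(3)
    var = 1.75

    # Simulate from H2 for illustration.
    y = [rng.normal(0, 2.0)]                    # Y0 ~ N(0, 4)
    y.append(0.75 * y[0] + rng.normal(0, np.sqrt(var)))
    for _ in range(500):
        y.append(0.75 * y[-1] + 0.2 * y[-2] + rng.normal(0, np.sqrt(var)))
    y = np.array(y)

    def step(l_prev, yi, yi1, yi2):
        # One-sample update: add ln p2(yi | past) - ln p1(yi | past); the
        # Gaussian normalizers are equal, so only the squared residuals differ.
        e2 = yi - 0.75 * yi1 - 0.2 * yi2        # H2 one-step prediction error
        e1 = yi - 0.75 * yi1                    # H1 one-step prediction error
        return l_prev + (e1 ** 2 - e2 ** 2) / (2 * var)

    l = 0.0                                     # Y0 and Y1 terms cancel
    for i in range(2, len(y)):
        l = step(l, y[i], y[i - 1], y[i - 2])
    print("l_N:", l)                            # tends to be positive under H2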


ESE 524: Detection and Estimation Theory
Spring 2006, Joseph A. O'Sullivan

Exam 1
March 22, 2006

This is a one and a half hour in class exam. One sheet front and back (or two sides of) 8.5 by 11 inch notes is allowed. With the exception of your brain, no computers (including calculators, cell phones, and slide rules) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.
NAME:

[Score table: problems 1–3 and Total.]

Potentially useful stuff:

∑_{k=0}^∞ α^k = 1/(1 − α), |α| < 1,

∑_{k=0}^∞ k α^{k−1} = 1/(1 − α)², |α| < 1. (121)

21 Basic Detection Theory

Suppose that under each of two hypotheses the three random variables X1, X2, and X3 are independent Poisson random variables. Under hypothesis H0, they all have mean 2. Under hypothesis H1, they have means 1, 2, and 3, respectively. Assume that the prior probabilities on the two hypotheses are each 0.5.

a. Find the minimum probability of error decision rule.
b. Draw a picture of the decision region.
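With equal priors, the minimum-probability-of-error rule compares the Poisson log-likelihood ratio to zero. The sketch below (illustrative) evaluates that statistic directly from the two mean vectors; note that X2 drops out, since its mean is the same under both hypotheses.

    import numpy as np

    mu0 = np.array([2.0, 2.0, 2.0])             # means under H0
    mu1 = np.array([1.0, 2.0, 3.0])             # means under H1

    def decide(x):
        x = np.asarray(x, float)
        # log LR for independent Poissons: sum x_i ln(mu1_i/mu0_i) - (mu1_i - mu0_i).
        llr = np.sum(x * np.log(mu1 / mu0) - (mu1 - mu0))
        return "H1" if llr >= 0.0 else "H0"     # equal priors: threshold ln(1) = 0

    print(decide([0, 2, 5]), decide([4, 2, 1]))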

22 Basic Estimation Theory

Suppose that

R1 = 5 cos θ + W1, (122)
R2 = 5 sin θ + W2, (123)

where θ is a deterministic parameter to be estimated and W1 and W2 are independent and identically distributed Gaussian random variables, with zero mean and variance 3.

a. Find the maximum likelihood estimate for θ.
b. Find the Cramer-Rao lower bound for the estimate of θ. Does this bound depend on the true value of θ? Comment on this. The variable θ has units of radians.
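A short Monte Carlo run makes part b concrete. The sketch below uses the estimator θ̂ = atan2(R2, R1), which one can check maximizes the likelihood for this model, and compares the empirical mean squared error against σ²/25 = 0.12, the value the Cramer-Rao bound works out to here (a claim worth re-deriving, as the problem asks).

    import numpy as np

    rng = np.random.default_rng(4)
    theta, sigma2, trials = 0.7, 3.0, 200000

    w = rng.normal(0, np.sqrt(sigma2), (trials, 2))
    r1 = 5 * np.cos(theta) + w[:, 0]
    r2 = 5 * np.sin(theta) + w[:, 1]

    est = np.arctan2(r2, r1)                    # ML estimate of the phase
    err = np.angle(np.exp(1j * (est - theta)))  # wrap errors to (-pi, pi]
    print("empirical MSE:", np.mean(err ** 2))  # compare with sigma2/25 = 0.12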


23 Information Rate Functions: Exponential Densities with Different Means

Consider the hypothesis testing problem with hypotheses:

H0 : Ri ∼ p0(r) = (1/λ0) e^{−r/λ0}, r ≥ 0, i = 1, 2, . . . , n,
H1 : Ri ∼ p1(r) = (1/λ1) e^{−r/λ1}, r ≥ 0, i = 1, 2, . . . , n,

and under either hypothesis, the random variables Ri are independent.

a. Derive the log-likelihood ratio test for this problem and denote the loglikelihood ratio by l(r).
b. Find the log-moment generating function for the loglikelihood ratio, φ0(s). Recall that Φ0(s) is the moment generating function for the loglikelihood ratio (and is the normalizing factor for the tilted density function) and φ0(s) = ln Φ0(s). Let l(r) be the loglikelihood function derived in part a; then

Φ0(s) = E[e^{s l(R)} | H0]. (124)

c. Consider a threshold in the loglikelihood ratio test that gets smaller with n. In particular, consider the test that compares l(r) to a threshold γ/n. Find, as a function of γ, an upper bound on the probability of false alarm

PF ≤ e^{−nI(γ)}. (125)

In particular, I(γ) is the information rate function. In fact, for all problems like this,

I0(γ) = lim_{n→∞} −(1/n) ln P(l(R) > γ/n | H0). (126)

d. Find the tilted probability density function for this problem. Show that this tilted density function corresponds to independent exponentially distributed random variables with mean λs, where

1/λs = (1 − s)/λ0 + s/λ1. (127)

e. Let s correspond to γ as in the computation of the information rate function in part c. Show that the relative entropy between the tilted density function using s and the density function under hypothesis H0 equals I(γ). That is,

D(ps||p0) = I(γ(s)). (128)
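Parts d and e lend themselves to a quadrature check: the tilted density is proportional to p0(r)^{1−s} p1(r)^s, and its mean should match λs from (127). The sketch below (grid and parameter values are arbitrary) performs that check.

    import numpy as np

    lam0, lam1, s = 1.0, 3.0, 0.4
    r = np.linspace(0, 60, 600001)
    dr = r[1] - r[0]

    p0 = np.exp(-r / lam0) / lam0
    p1 = np.exp(-r / lam1) / lam1
    tilted = p0 ** (1 - s) * p1 ** s            # unnormalized tilted density
    tilted /= tilted.sum() * dr                 # normalize by quadrature

    mean = np.sum(r * tilted) * dr
    lam_s = 1 / ((1 - s) / lam0 + s / lam1)     # predicted mean from (127)
    print(mean, lam_s)                          # should agree closely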


ESE 524: Detection and Estimation Theory
Spring 2006, Joseph A. O'Sullivan

Final Exam
May 4, 2006

This is a three hour final exam. Two sheets front and back (or four sides of) 8.5 by 11 inch notes are allowed. With the exception of your brain, no computers (including calculators, i-pods, and cell phones) are allowed. By signing this exam, you are confirming your agreement to abide by the exam rules including the honor code of Washington University's School of Engineering and Applied Science.

Sign your name below.
NAME:

[Score table: problems 1–4 and Total.]

Potentially useful stuff:

∑_{k=0}^∞ α^k = 1/(1 − α), |α| < 1,

∑_{k=0}^∞ k α^{k−1} = 1/(1 − α)², |α| < 1. (129)

A Poisson distribution for a random variable X with mean λ > 0 has probabilities

P(X = k) = (λ^k / k!) e^{−λ}, k = 0, 1, 2, . . . (130)

24 3-ary Detection Theory

Suppose that there are three hypotheses, H1, H2, and H3. The prior probabilities of the hypotheses are each 1/3. There are three observed random variables, X1, X2, and X3. Under each hypothesis, the random variables are independent and equal

[X1; X2; X3] = [S1; S2; S3] + [W1; W2; W3]. (131)

The random variables W1, W2, and W3 are independent noise random variables. They are Poisson distributed with means equal to 1. The noise random variables Wi are all independent of the signal random variables Sk. Under each hypothesis, the signal random variables S1, S2, and S3 are independent Poisson random variables. The means of the signal random variables depend on the hypotheses:

H1 : E{[S1; S2; S3]} = [1; 1; 0], H2 : E{[S1; S2; S3]} = [1; 0; 1], (132)

H3 : E{[S1; S2; S3]} = [0; 1; 1]. (133)

a. Find the optimal decision rule.
b. Find an expression for the probability of error.
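Since a sum of independent Poisson random variables is again Poisson, under each hypothesis Xi is Poisson with the signal mean plus 1. The sketch below (illustrative) uses that fact to evaluate the three log-likelihoods and report the largest, which with equal priors is the optimal rule.

    import numpy as np
    from scipy.stats import poisson

    # Means of X = S + W under each hypothesis (signal means plus noise mean 1).
    means = np.array([[2, 2, 1], [2, 1, 2], [1, 2, 2]], dtype=float)

    def decide(x):
        x = np.asarray(x)
        ll = poisson.logpmf(x, means).sum(axis=1)   # log-likelihood per hypothesis
        return int(np.argmax(ll)) + 1               # equal priors: max likelihood

    print(decide([3, 3, 0]), decide([0, 2, 4]))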


25 Estimation Theory

[Figure 4 appears here: three panels plotting signal value versus time (s) for s1(t), s2(t), and s3(t).]

Figure 4: Signals s1(t), s2(t), and s3(t).

Suppose that

r(t) = as1(t) + bs2(t) + cs3(t) + w(t), 0 ≤ t ≤ 6, (134)

where

• w(t) is white Gaussian noise, independent of the signals, with intensity N0/2,

• the signals s1(t), s2(t), and s3(t) are shown in Figure 4, and

• the scale factors a, b, and c are independent and identically distributed Gaussian random variables with zero mean and variance 9.

a. Find the maximum a posteriori (MAP) estimates for a, b, and c given r(t), 0 ≤ t ≤ 6.
b. Find the Fisher information for estimating a, b, and c given r(t), 0 ≤ t ≤ 6 (take into account that they are random variables).
c. Do the MAP estimates achieve the Cramer-Rao lower bound consistent with the Fisher information from part b? Comment.
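Discretized, this is a linear-Gaussian model, so the MAP estimate of (a, b, c) is the posterior mean. The sketch below is a minimal illustration: the stand-in signals are hypothetical (the exam's s1, s2, s3 are the ones in Figure 4, not reproduced here), and the posterior-mean formula is the standard one for Gaussian priors in white Gaussian noise.

    import numpy as np

    rng = np.random.default_rng(5)
    dt = 0.01
    t = np.arange(0, 6, dt)
    N0 = 2.0

    # Stand-in signals (hypothetical; the exam's s_i are shown in Figure 4).
    S = np.stack([(t < 2).astype(float), ((t >= 2) & (t < 4)).astype(float),
                  (t >= 4).astype(float)])

    abc = rng.normal(0, 3.0, 3)                  # a, b, c ~ N(0, 9)
    r = abc @ S + rng.normal(0, np.sqrt(N0 / (2 * dt)), len(t))  # white noise, intensity N0/2

    # Posterior mean: solve (G/(N0/2) + P^{-1}) est = y/(N0/2), where G is the
    # signal Gram matrix, y the correlator outputs, and P the prior covariance.
    G = dt * S @ S.T
    y = dt * S @ r
    P_inv = np.eye(3) / 9.0
    map_est = np.linalg.solve(G / (N0 / 2) + P_inv, y / (N0 / 2))
    print("true:", abc, "MAP:", map_est)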


26 Detection Problem

[Adapted from a problem from the University of Maryland Detection and Estimation Theory course, spring 2005]

Consider the binary hypothesis testing problem over the interval −1 ≤ t ≤ 1,

H0 : r(t) = n(t), −1 ≤ t ≤ 1, (135)
H1 : r(t) = s(t) + n(t), −1 ≤ t ≤ 1, (136)

where s(t) is a known (real-valued) signal and n(t) is a (real-valued) zero mean Gaussian random process with covariance function

E[n(t)n(u)] = 1 + tu, −1 ≤ t ≤ 1, −1 ≤ u ≤ 1, (137)
= Kn(t, u). (138)

The noise n(t) is assumed to be independent of the signal s(t).

a. Find the eigenfunctions and eigenvalues for Kn(t, u) over the interval −1 ≤ t ≤ 1.
b. Suppose that s(t) = 2 − 3t, for −1 ≤ t ≤ 1. Find the likelihood ratio test. What is the signal to noise ratio?
c. Note that there are only a finite number of nonzero eigenvalues. Comment on the implications for the performance if the signal s(t) is not in the subspace of signal space spanned by the corresponding eigenfunctions. That is, what can be said about the performance if s(t) is not a linear combination of the eigenfunctions corresponding to the nonzero eigenvalues?
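A numerical eigendecomposition of the kernel supports parts a and c. The sketch below discretizes Kn(t, u) = 1 + tu on [−1, 1] and prints the largest eigenvalues of the resulting integral operator; only finitely many are numerically nonzero.

    import numpy as np

    m = 2001
    t = np.linspace(-1, 1, m)
    dt = t[1] - t[0]

    K = 1 + np.outer(t, t)                      # K_n(t, u) = 1 + tu
    vals = np.linalg.eigvalsh(K * dt)           # discretized integral operator
    print(np.sort(vals)[::-1][:5])              # only two eigenvalues are nonzero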

27 Sequential Estimation Problem

Suppose that X and Y are independent Gaussian random variables with means 0 and variances 3 and 5, respectively. Define the random variables

R1 = 5X + 3Y + W1, (139)
R2 = 3X + Y + W2, (140)
R3 = X − Y + W3, (141)

where W1, W2, and W3 are identically distributed Gaussian random variables with zero mean and variance 1. The random variables X, Y, W1, W2, and W3 are all independent.

a. Find

[X(1); Y(1)] = E{[X; Y] | R1 = r1}. (142)

b. Derive an expression for

[X(2); Y(2)] = E{[X; Y] | R1 = r1, R2 = r2} (143)

that only depends on X(1), Y(1), and r2.

c. Derive an expression for

[X(3); Y(3)] = E{[X; Y] | R1 = r1, R2 = r2, R3 = r3} (144)

that only depends on X(2), Y(2), and r3.
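All of the variables are jointly Gaussian, so each part is one linear measurement update of the current posterior, as in a Kalman filter measurement step. The sketch below (measurement values are arbitrary) runs the three scalar updates recursively; the successive conditional means correspond to (X(1), Y(1)), (X(2), Y(2)), and (X(3), Y(3)).

    import numpy as np

    P = np.diag([3.0, 5.0])                     # prior covariance of (X, Y)
    xhat = np.zeros(2)                          # prior mean
    H = np.array([[5.0, 3.0], [3.0, 1.0], [1.0, -1.0]])
    r = [2.0, 1.0, -0.5]                        # example measurements (arbitrary)

    for h, ri in zip(H, r):
        # One scalar measurement update: R_i = h^T [X, Y] + W_i, W_i ~ N(0, 1).
        g = P @ h / (h @ P @ h + 1.0)           # gain
        xhat = xhat + g * (ri - h @ xhat)       # new conditional mean
        P = P - np.outer(g, h) @ P              # new conditional covariance
        print(xhat)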
