Probability and Random Processes skoglund/edu/probability/slides/lec12.pdf · Probability and Random…

Download Probability and Random Processes skoglund/edu/probability/slides/lec12.pdf · Probability and Random…

Post on 30-Aug-2018




0 download

Embed Size (px)


<ul><li><p>Probability and Random ProcessesLecture 12</p><p> Detection Estimation Capacity Information</p><p>Mikael Skoglund, Probability and random processes 1/24</p><p>Detection</p><p> A random process on (ET , ET ) with two possible distributions0 and 1</p><p> Assume that 0 1 and 1 0 (the distributions areequivalent)</p><p> Observe f ET and based on the observation</p><p>decide H0 : 0 or H1 : 1</p><p> design a measurable mapping g : ET {0, 1}</p><p>Mikael Skoglund, Probability and random processes 2/24</p></li><li><p>Criteria</p><p> Classical: minimize</p><p>P (g(f) = 0|H1)</p><p>subject to P (g(f) = 1|H0) Bayesian: minimize</p><p>Pe = P (g(f) = 0|H1)P (H1) + P (g(f) = 1|H0)P (H0)</p><p>Mikael Skoglund, Probability and random processes 3/24</p><p>Bayesian detection</p><p> Let G1 = g1({1}), G0 = g1({0}), assuming G1 G0 = ETand G1 G0 = , then</p><p>Pe = P (H1)</p><p>G0</p><p>d1 + P (H0)</p><p>G1</p><p>d0</p><p>= P (H1)</p><p>G0</p><p>(d1d0 P (H0)P (H1)</p><p>)d0 + P (H0)</p><p> Hence, we should set</p><p>G0 =</p><p>{f :</p><p>d1d0</p><p>(f) </p><p>P (H0)</p><p>P (H1)</p><p>}</p><p>Mikael Skoglund, Probability and random processes 4/24</p></li><li><p> Compare the likelihood ratio</p><p>(f) =d1d0</p><p>(f)</p><p>to a threshold</p><p> Classical NeymanPearson: also based on comparing toa threshold</p><p> Given f ET , how do we compute (f)?</p><p>Mikael Skoglund, Probability and random processes 5/24</p><p>Grenanders theorem</p><p> Look at (E, E) = (R,B) and T = R+ X : (,A, P ) (RT ,BT , ) is separable if there is a setN for which P (N) = 0 and a sequence {tk} T suchthat for any open interval I and closed set C</p><p>{ : tk(X()) C, tk I}\{ : t(X()) C, t I} N</p><p> i.e., f(t) RT can be sampled without loss at the tks For any X : (,A, P ) (RT ,BT , ) there is aX : (,A, P ) (RT ,BT , ) such that X is separable and</p><p>P ({ : t(X) = t(X), t T}) = 1</p><p>Mikael Skoglund, Probability and random processes 6/24</p></li><li><p> Consider the detection problem, and assume that 1 and 2when restricted to {tk}n1 , for any finite n, are both absolutelycontinuous with respect to Lebesgue measure on Rn</p><p> Observe f(t), sample as fk = f(tk) (with {tk} as in thedefinition of separability), let fn = (f1, . . . , fn) and denotethe densities g1(f</p><p>n) and g2(fn) (corresponding to 1 and 2)</p><p> Then the entitygn =</p><p>g1(fn)</p><p>g2(fn)</p><p>converges with probability one to (f) under both H0 and H1</p><p>Mikael Skoglund, Probability and random processes 7/24</p><p>Gaussian waveforms</p><p> Consider the continuous-time Gaussian example with twopossible mean-value functions, that is, f(t) is Gaussian withE[f(t)] = mi(t) under Hi, and has a positive-definitecovariance kernel V (s, u) (under both H0 and H1)</p><p> Without loss we can assume m0(t) = 0 and m1(t) = m(t) Assume that m(t) can be expressed as</p><p>m(t) =</p><p>V (t, s)h(s)ds</p><p>for some h(t), then</p><p>ln(f) =</p><p>f(t)h(t)dt 1</p><p>2</p><p>m(t)h(t)dt</p><p>(with probability one under H0 and H1)</p><p>Mikael Skoglund, Probability and random processes 8/24</p></li><li><p>Estimation</p><p>Bayesian</p><p> Two random objects X and Y on (,A, P ) with range spaces(R,B) and (E, E) (standard)</p><p> Estimate X R from observing Y = y E; MMSE </p><p>X(y) = E[X|Y = y]</p><p> That is,X(y) =</p><p>xdy</p><p>where y is the regular conditional distribution for X givenY = y</p><p>Mikael Skoglund, Probability and random processes 9/24</p><p>Classical</p><p> For an absolutely continuous random variable X with pdff(x); given the observation X = x we have the traditionalML estimate</p><p> = arg max</p><p>f(x)</p><p> A pdf f(x) is the RadonNikodym derivative of thedistribution on (R,B) w.r.t. Lebesgue measure ; that is, itcan be interpreted as the likelihood ratio between thehypothesis H0 : = and H1 : = X (the correctdistribution)</p><p> In the case of a general random object X : (,A, P )(E, E , ), we can choose a dummy hypothesis H0 : = 0as a reference to H1 : = , where is the correctdistribution, with an unknown parameter R</p><p>Mikael Skoglund, Probability and random processes 10/24</p></li><li><p> Then, based on the observation X = x, the ML estimate canbe computed as</p><p> = arg max</p><p>dd0</p><p>(x)</p><p> The reference distribution can be chosen e.g. such thatcomputing the likelihood ratio is feasible</p><p> Note that in general, the estimate = arg max does notmake sense; is a mapping from sets in E</p><p>Mikael Skoglund, Probability and random processes 11/24</p><p>Channels</p><p> Given two measurable spaces (,A) and (,S), a mappingg : S R+ is called a transition kernel (from to ) if</p><p>1 f() = g(, S) is measurable for any fixed S S2 h(S) = g(, S) is a measure on (,S) for any fixed </p><p>If the measure h(S) in 2. is a probability measure, then g iscalled a stochastic kernel</p><p> If Y is a random object on (,A, P ) with values in (E, E),then a stochastic kernel g from to E is the regularconditional distribution of Y given G A if</p><p>g(, F ) = P ({Y F}|G)()</p><p>with probability one w.r.t. P and , and for all F E A regular conditional distribution for Y exists if (E, E) is</p><p>standard</p><p>Mikael Skoglund, Probability and random processes 12/24</p></li><li><p> The factorization lemma: Assume two measurable spaces(1,A1) and (2,A2) and a measurable mappingu : 1 2 are given. A function v : (1,A1) (R,B) ismeasurable w.r.t (u) A1 iff there is a measurable mapping : (2,A2) (R,B) such that v = u</p><p> Let X be an arbitrary random object on (,A, P ), and let Ybe as before, with (E, E) standard. Let</p><p>g(, F ) = P ({Y F}|(X))()</p><p>and let F be the mapping in the factorization lemmag(, F ) = F (X()) (w.r.t. for a fixed F ), then</p><p>P ({Y F}|X = x) = F (x)</p><p>is the conditional distribution of Y given X = x</p><p>Mikael Skoglund, Probability and random processes 13/24</p><p> Assume (,A, P ), a parameter set T and a processX : (,A, P ) (ET , ET , X) are given</p><p> Given another function/sequence space (F T ,FT ), a channelis a regular conditional distribution from to F T given aspecific value X = x ET ; that is, for any x ET thedistribution</p><p>x(F ) = P ({Y F}|X = x), F FT</p><p> Interpretation: A random channel input X is generated and isthen transmitted over the channel, resulting in the channeloutput Y ; given X = x the distribution for Y is x</p><p> A channel exists if the relevant spaces are standard</p><p>Mikael Skoglund, Probability and random processes 14/24</p></li><li><p> Given (ET , ET ) and (F T ,FT ), let ET FT be the product-algebra on ET F T</p><p> Let be defined by</p><p>(A,B) =</p><p>AP (B|X = x)dX(x)</p><p>on rectangles, A ET , B FT Standard spaces unique extension of from rectangles toET FT ; a joint distribution on (ET F T , ET FT )</p><p> Also define the corresponding product distribution ,generated as the extension of X(A)Y (B), with</p><p>Y (B) =</p><p>P (B|X = x)dX(x)</p><p>Mikael Skoglund, Probability and random processes 15/24</p><p>Channel Capacity</p><p> Focus on T = N+ and E = F = R; a channel x() withinput x RT and output y RT (sequences), resulting in thejoint distribution </p><p> A rate R [bits per channel use] is achievable if informationcan be transmitted at R with error probability below for any &gt; 0</p><p> The capacity C of the channel = sup{R : R is achievable }</p><p>Mikael Skoglund, Probability and random processes 16/24</p></li><li><p> Let Sn = {1, 2, . . . , n}, define the information density</p><p>i(x, y) = logd</p><p>d</p><p>(assuming ), and the corresponding restricted version</p><p>in(xn, yn) = log</p><p>d|Snd|Sn</p><p> Let(X) = sup</p><p>{ : lim</p><p>nP(n1in </p><p>)= 0}</p><p>Mikael Skoglund, Probability and random processes 17/24</p><p> A general formula for channel capacity [VerduHan, 94]:</p><p>C = supX</p><p>(X)</p><p> Computing C involves the problem of characterizing the limit (for each fixed X) ergodic theory</p><p>Mikael Skoglund, Probability and random processes 18/24</p></li><li><p>Information Measures</p><p> Given (,A), a measurable partition of is a finite collectionG1, . . . , Gn, Gi A, such that Gk Gl = for k 6= l andiGi = </p><p> Given two probability measures P and Q on (,A) and ameasurable partition G = {Gi}ni=1 of , define</p><p>D(PQ)(G) =</p><p>GGP (G) log</p><p>P (G)</p><p>Q(G)</p><p> Then, the relative entropy between P and Q is defined as</p><p>D(PQ) = supGD(PQ)(G)</p><p>Mikael Skoglund, Probability and random processes 19/24</p><p> If P Q, then we get</p><p>D(PQ) =</p><p>logdP</p><p>dQ() dP ()</p><p>Mikael Skoglund, Probability and random processes 20/24</p></li><li><p> Let X and Y be two random objects on (,A, P ) with rangespaces (,S) and (,U), and a joint distribution XY on( ,S U) corresponding to the marginal distributionsX and Y</p><p> Let XY be the corresponding product distribution The mutual information between X and Y is then defined as</p><p>I(X;Y ) = supFD(XY XY )(F)</p><p>over measurable partitions F of The entropy of the single variable X is defined as</p><p>H(X) = I(X;X)</p><p>Mikael Skoglund, Probability and random processes 21/24</p><p> If XY XY , then</p><p>I(X;Y ) =</p><p>log</p><p>dXYdXY</p><p>dXY</p><p>Mikael Skoglund, Probability and random processes 22/24</p></li><li><p> Returning to the setup of transmission over a channel (withthe previous notation), if we had</p><p>in(xn, yn) = log</p><p>d|Snd|Sn</p><p> If for any fixed stationary and ergodic input, with distributionX , the channel is such that the joint inputoutput process on(RT RT ,BT BT ) is stationary and ergodic, and in additionsatisfies the finite-gap information property (below), then</p><p>1</p><p>nin lim</p><p>n1</p><p>nI(Xn;Y n)</p><p>with probability one</p><p> finite-gap information: for any n &gt; 0 there is a k n such thatI(Xk;X</p><p>n|Xk) and I(Yk;Y n|Y k) are both finite</p><p>Mikael Skoglund, Probability and random processes 23/24</p><p> Lettingi(X) = lim</p><p>n1</p><p>nI(Xn;Y n)</p><p>for any fixed X , we get</p><p>C = supX</p><p>i(X)</p><p> Channels that result in this formula for C have been calledinformation stable</p><p> To prove this, one first needs to see that = i for any fixed Xsuch that the input, output and joint inputoutput are stationary</p><p>and ergodic. Then one needs to show that the supremum is</p><p>achieved in this class.</p><p>Mikael Skoglund, Probability and random processes 24/24</p></li></ul>