A general duality approach to I-projections


Journal of Statistical Planning and Inference 47 (1995) 203-216, Elsevier

Bhaskar Bhattacharya a,*, Richard Dykstra b

a Department of Mathematics, Mailcode 4408, Southern Illinois University, Carbondale, IL 62901, USA
b Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA

Received 1 August 1994; revised 23 November 1994

Abstract

I-projection problems arise in a myriad of situations and settings. In this paper, it is shown that under reasonable assumptions, a Fenchel type dual optimization problem exists which is equivalent to the stated I-projection problem for very general probability measures. This dual problem is often much more tractable than the original I-projection problem. I-projection problems which are equivalent to least squares problems are also identified, primarily through the dual formulation. Several examples are examined to illustrate the duality structure and theorems.

AMS 1991 Subject Classification: Primary 65K10; secondary 90C25

Keywords: I-divergence; I-projection; Kullback-Leibler information number; Fenchel duality; Dual space; Convexity; Inequality constraints

1. Introduction

For two probability measures (PM) $P$ and $Q$ defined on an arbitrary measurable space $(\Omega, \mathcal{F})$, the Kullback-Leibler information number, or the I-divergence between $P$ and $Q$, is defined as

$$I(P|Q) = \begin{cases} \int \ln(dP/dQ)\,dP, & P \ll Q, \\ +\infty, & \text{otherwise.} \end{cases}$$

Although $I(P|Q)$ is not a metric, it is always nonnegative and equals 0 if and only if $P = Q$. Hence it is often interpreted as a measure of 'divergence' or 'distance' between $P$ and $Q$. Several other names are known for the quantity $I(P|Q)$, for example, information for discrimination, cross-entropy, information gain, etc.
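For a finite sample space the definition reduces to a sum, which can be sketched as follows. This is only an illustration under the assumption that $P$ and $Q$ are given as lists of point masses; the function name `i_divergence` and the numbers are hypothetical and do not come from the paper.

```python
import math

def i_divergence(p, q):
    """I(P|Q) for two finite distributions given as lists of probabilities."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue            # the term 0 * ln(0/q) is taken to be 0
        if qi == 0.0:
            return math.inf     # P is not absolutely continuous w.r.t. Q
        total += pi * math.log(pi / qi)
    return total

# Illustrative numbers only: the divergence is nonnegative but not symmetric.
p = [0.5, 0.5]
q = [0.9, 0.1]
d_pq = i_divergence(p, q)
d_qp = i_divergence(q, p)
```

The two values `d_pq` and `d_qp` differ, which is one way to see that $I(P|Q)$ is not a metric.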

* Corresponding author. Tel.: 618 453 6503. Fax: 618 453 5300. E-mail: [email protected].

0378-3758/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved. SSDI 0378-3758(94)00142-1


For a given $Q$ and a specified set of PM's $\mathcal{C}$, it is often of interest to find the $R \in \mathcal{C}$ which is 'closest' to $Q$ in an I-divergence sense. Thus an $R \in \mathcal{C}$ that satisfies

$$I(R|Q) = \inf_{P \in \mathcal{C}} I(P|Q) \quad (< \infty) \tag{1.1}$$

is called the I-projection of $Q$ onto $\mathcal{C}$. Csiszar (1975) has shown that $R$ exists uniquely if $\mathcal{C}$ is convex and variation-closed and if there exists a $P$ in $\mathcal{C}$ such that $I(P|Q) < \infty$. I-projections play a key role in the information theoretic approach to statistics (e.g., Kullback, 1959; Good, 1963). They also occur in other areas such as the maximization of entropy (Jaynes, 1957; Rao, 1965) and the theory of large deviations (Sanov, 1957). Maximum likelihood estimation in log-linear models subject to cone constraints under a multinomial sampling scheme has been shown to be equivalent to solving an I-projection problem (Dykstra and Lemke, 1988). The iterative proportional fitting procedure commonly used in log-linear models is actually an iterative scheme of successive I-projection problems (Csiszar, 1975, 1989; Darroch and Ratcliff, 1972; Ireland and Kullback, 1968).

Unfortunately, I-projection problems are often difficult to solve. In this paper, we propose a generalized Fenchel duality theorem to develop dual problems to (possibly) infinite dimensional I-projection problems. This theorem also suggests a simple relationship between the solutions of the primal and dual problems. As the dual problems are often easier to solve, this theorem can yield a tractable approach to I-projection problems. In Section 2, we describe the convex conjugate of the I-divergence functional and a Fenchel duality type theorem applicable in an infinite dimensional space. In Section 3 we describe situations where infinite dimensional I-projection problems are solved by infinite dimensional least squares problems. In Section 4, several examples are presented to illustrate the applicability of the duality approach.

2. Fenchel duality

Convex functions on $\mathbb{R}^n$ have the special property that they are totally determined by the collection of supporting hyperplanes of their epigraph. The Fenchel duality theorem which we state here essentially uses this characterization to express optimization problems in a different manner. Before we present this theorem and discuss its utility we will establish some necessary notation.

We will denote the underlying probability space by $(\Omega, \mathcal{F}, Q)$, where $\Omega$ is an arbitrary sample space, $\mathcal{F}$ is a $\sigma$-field of subsets of $\Omega$ and $Q$ is a given PM defined on elements of $\mathcal{F}$. We will use the notation $\int f\,dQ$ to indicate $\int f(\omega)\,dQ(\omega)$.

Although we are really interested in finding the closest PM to $Q$ (in the I-projection sense) within the subclass $\mathcal{C}$, we can work with the normed, linear vector space $L_1(Q)$ since $I(P|Q) < \infty$ implies $dP(\omega)/dQ$ exists as an integrable function.


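It easily follows that if the convex function $f$ is defined on the linear vector space $L_1(Q)$ as

$$f(x) = \begin{cases} \int x \ln x\,dQ & \text{if } x \ge 0,\ \int x\,dQ = 1, \\ +\infty & \text{elsewhere,} \end{cases} \tag{2.1}$$

and

$$\mathcal{C}_0 = \left\{ x \in L_1(Q):\ x = \frac{dP}{dQ} \text{ for some } P \in \mathcal{C} \right\}, \tag{2.2}$$

then (1.1) and

$$\inf_{x \in \mathcal{C}_0} f(x) \tag{2.3}$$

are equivalent problems.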

In a standard Fenchel setting (e.g. Luenberger, 1969), the dual normed, linear vector space would consist of the bounded linear operators on the primal space. In our case, we would like to take $L_1(Q)$ as our primal space, which would imply that $L_\infty(Q)$ would be the natural dual space. However, this dual space may be too restrictive on two counts. First, restricting the dual space to a.s. bounded functions ($L_\infty(Q)$) may imply that the dual problem does not exist (see Example 4.1a). Secondly, if the PM's in $\mathcal{C}$ are restricted to have support $B$ ($0 < Q(B) < 1$), the standard dual approach will be too restrictive (see Example 4.1c). Thus we suggest a more general dual space.

We define our dual space to be $\mathcal{M}(\Omega, \mathcal{F})$, the set of extended valued, $\mathcal{F}$-measurable functions on $\Omega$. The set $\mathrm{dom}(f) = \{x \in L_1(Q):\ f(x) < \infty\}$ is useful for defining the convex conjugate of $f$.

We define the convex conjugate of $f$ for $y \in \mathcal{M}(\Omega, \mathcal{F})$ as

$$f^*(y) = \sup_{x \in \mathrm{dom}(f)} \left[ \int xy\,dQ - f(x) \right] = \sup_{x \in \mathrm{dom}(f)} \left[ \int (y - \ln x)\,x\,dQ \right], \tag{2.4}$$

with the understanding that only those $x \in \mathrm{dom}(f)$ where the integral is well defined ($+\infty$ and $-\infty$ are feasible values) are used in determining the supremum. The form of $f^*$ is remarkably tractable.

Theorem 2.1. For $y \in \mathcal{M}(\Omega, \mathcal{F})$,

$$f^*(y) = \ln\left( \int e^y\,dQ \right).$$

Proof. First assume that $y$ is not $-\infty$ a.s. ($Q$) and that $\int y e^y\,dQ < \infty$, so that $x_0 = (e^y / \int e^y\,dQ) \in \mathrm{dom}(f)$. Since the functions

$$g(t) = \begin{cases} t(k - \ln t), & t \ge 0, \\ -\infty, & t < 0, \end{cases}$$

are concave for every $k$, we may say

$$\frac{g(t + \alpha h) - g(t)}{\alpha} \le (k - \ln t - 1)h, \quad \forall \text{ real } h, k \text{ and } t > 0,\ \alpha > 0. \tag{2.5}$$

If $x \in \mathrm{dom}(f)$ and $0 \le \alpha \le 1$, we can write $(1 - \alpha)x_0 + \alpha x = x_0 + \alpha h$, where $\int h\,dQ = 0$. From (2.5) we may write (with $k = y(\omega)$)

$$\frac{g(x_0(\omega) + \alpha h(\omega)) - g(x_0(\omega))}{\alpha} \le (y(\omega) - \ln x_0(\omega) - 1)\,h(\omega), \quad \forall \omega \in \Omega.$$

Integrating both sides with respect to $Q$ yields

$$G(x_0 + \alpha h) - G(x_0) \le 0,$$

where $G(x) = \int x(y - \ln x)\,dQ$. This implies that the supremum in (2.4) is attained at $x_0$ and that $f^*(y) = G(x_0) = \ln \int e^y\,dQ$.

If $\int y e^y\,dQ = \infty$, but $\int e^y\,dQ < \infty$, a similar argument will imply

$$f^*(y) \le \ln \int e^y\,dQ$$

since $x_0 = (e^y / \int e^y\,dQ) \notin \mathrm{dom}(f)$. However, if

$$x_n = \begin{cases} \left[ \int e^y I(|y| \le n)\,dQ \right]^{-1} e^y, & |y| \le n, \\ 0, & |y| > n, \end{cases}$$

then $G(x_n) \to \ln \int e^y\,dQ$, and $x_n \in \mathrm{dom}(f)$, which implies the desired result. The case when $\int e^y\,dQ = \infty$ can be handled by limiting arguments using the previous sequence of $x_n$'s. It is easily seen that if $y = -\infty$ a.s. ($Q$), then $f^*(y) = -\infty$. □
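On a finite sample space the statement of Theorem 2.1 can be checked numerically. The sketch below is an illustration only: the four-point space, the weights `q`, the function `y` and the helper name `G` are assumptions chosen for the example, and the random search merely samples feasible densities rather than proving anything.

```python
import math
import random

def G(x, y, q):
    """G(x) = sum_i x_i (y_i - ln x_i) q_i, with the convention 0 ln 0 = 0."""
    return sum(xi * (yi - math.log(xi)) * qi
               for xi, yi, qi in zip(x, y, q) if xi > 0)

q = [0.1, 0.2, 0.3, 0.4]          # point masses of Q (illustrative)
y = [1.0, -0.5, 0.3, 2.0]         # an arbitrary dual element (illustrative)

z = sum(math.exp(yi) * qi for yi, qi in zip(y, q))   # integral of e^y dQ
x0 = [math.exp(yi) / z for yi in y]                  # candidate maximizer
conjugate = math.log(z)                              # claimed value of f*(y)

# Sample random densities x (x >= 0, sum x_i q_i = 1) and record the best G.
random.seed(0)
best_random = -math.inf
for _ in range(2000):
    w = [random.random() for _ in q]
    norm = sum(wi * qi for wi, qi in zip(w, q))
    x = [wi / norm for wi in w]
    best_random = max(best_random, G(x, y, q))
```

Here `G(x0, y, q)` agrees with `conjugate` up to rounding, and no sampled density exceeds it, in line with the supremum in (2.4) being attained at $x_0$.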

Fenchel duality theorems (e.g. Luenberger, 1969, p. 201, Theorem 1) essentially state that the minimum vertical separation between disjoint convex sets in a normed, linear vector space is equal to the maximum vertical separation of two parallel hyperplanes that separate the sets. Though we will not phrase our theorems in this terminology, it is still the underlying principle which drives our results.

A subset $K$ of a vector space is said to be a cone if $x \in K \Rightarrow \alpha x \in K$, $\forall \alpha \ge 0$. For an arbitrary subset $S$ of $L_1(Q)$, we define

$$S^{\oplus} = \left\{ y \in \mathcal{M}(\Omega, \mathcal{F}):\ \int xy\,dQ \ge 0 \text{ for all } x \in S \right\}$$

as the positive conjugate cone of $S$, and

$$S^{\ominus} = \left\{ y \in \mathcal{M}(\Omega, \mathcal{F}):\ \int xy\,dQ \le 0 \text{ for all } x \in S \right\}$$

as the negative conjugate cone of $S$.

Though quite straightforward, the following theorem is sometimes very useful for solving I-projection problems.


Theorem 2.2. Assume $S$ is a subset of $L_1(Q)$ which intersects $\mathrm{dom}(f)$ and that $T \subset S^{\oplus}$. Then $x_0 \in S$ and $y_0 \in T$ are respective solutions of

$$\inf_{x \in S} f(x) = \inf_{x \in S \cap \mathrm{dom}(f)} \int x \ln x\,dQ$$

and

$$\inf_{y \in T} f^*(y) = \ln\left[ \inf_{y \in T} \int e^y\,dQ \right]$$

if

$$f(x_0) + f^*(y_0) \le 0. \tag{2.6}$$

Moreover, in this situation,

$$\inf_{x \in S} f(x) = -\inf_{y \in T} f^*(y).$$

Proof. Since $x_0 \in \mathrm{dom}(f)$, the function which is $-\infty$ a.s. ($Q$) cannot be in $S^{\oplus}$. Now, note that

$$f(x) + f^*(y) = f(x) + \sup_{z \in \mathrm{dom}(f)} \left[ \int yz\,dQ - f(z) \right] \ge \int yx\,dQ \ge 0, \quad \forall x \in S,\ y \in S^{\oplus}.$$

Thus if (2.6) holds,

$$0 \ge f(x_0) + f^*(y_0) \ge \inf_{x \in S} f(x) + \inf_{y \in T} f^*(y) \ge 0,$$

which implies

$$f(x_0) = \inf_{x \in S} f(x) = -\inf_{y \in T} f^*(y) = -f^*(y_0). \tag{2.7}$$

□

Although Theorem 2.2 makes no mention of convexity, much stronger results follow when one imposes convexity and closure conditions on $S$. Under these conditions, solution of an appropriate dual problem immediately solves the primal problem. The following theorem elaborates.

Theorem 2.3. Suppose $S$ is a nonempty subset of $L_1(Q)$ and that $y_0$ is a solution to

$$\inf_{y \in S^{\oplus}} \int e^y\,dQ < \infty. \tag{2.8}$$

Then $x_0 = e^{y_0} / \int e^{y_0}\,dQ$ is the solution to

$$\inf_{x \in S} f(x) = \inf_{x \in S \cap \mathrm{dom}(f)} \int x \ln x\,dQ, \tag{2.9}$$

if $x_0 \in S$. If $S$ is either (1) convex, closed (in $L_1(Q)$ norm) and contained in $\mathrm{dom}(f)$ or (2) a closed (in $L_1(Q)$ norm), convex cone, then $x_0$ must lie in $S$, and hence $x_0$ must solve (2.9).

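Proof. Note that

$$f^*(y_0) + f\left( \frac{e^{y_0}}{\int e^{y_0}\,dQ} \right) = \ln \int e^{y_0}\,dQ + \int \frac{e^{y_0}}{\int e^{y_0}\,dQ} \left\{ \ln e^{y_0} - \ln \int e^{y_0}\,dQ \right\} dQ = \frac{\int y_0 e^{y_0}\,dQ}{\int e^{y_0}\,dQ}.$$

It easily follows that $y_0 \in S^{\oplus}$ implies $\alpha y_0 \in S^{\oplus}$ for $0 \le \alpha \le 1$. From convexity, we can say

$$\frac{e^{y_0(\omega)} - e^{\alpha y_0(\omega)}}{1 - \alpha} \ge y_0(\omega)\,e^{\alpha y_0(\omega)}, \quad \forall \omega \text{ a.s. } (Q)$$

for $0 < \alpha < 1$, from which it follows by integration with respect to $Q$ of both sides that

$$0 \ge \int e^{y_0}\,dQ - \int e^{\alpha y_0}\,dQ \ge (1 - \alpha) \int y_0 e^{\alpha y_0}\,dQ$$

for $0 < \alpha < 1$ (since $y_0$ solves (2.8)). If $\alpha$ is bounded away from 0, $y_0 e^{\alpha y_0}$ is bounded below. Then, using a variant of Fatou's lemma, we obtain

$$0 \ge \liminf_{\alpha \to 1} \int y_0 e^{\alpha y_0}\,dQ \ge \int \liminf_{\alpha \to 1} y_0 e^{\alpha y_0}\,dQ = \int y_0 e^{y_0}\,dQ > -\infty.$$

If $x_0 = (e^{y_0} / \int e^{y_0}\,dQ) \in S$, the first part of Theorem 2.3 then follows by Theorem 2.2.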

For the last part of Theorem 2.3, let $\bar{S}$ denote the closed convex cone $\{\alpha x:\ x \in S,\ \alpha \ge 0\}$. Now the standard positive conjugate cone of $\bar{S}$ with respect to the standard dual space $L_\infty(Q)$ is $S^{\oplus} \cap L_\infty(Q)$. However

$$S^{\oplus\oplus} = \left\{ x \in L_1(Q):\ \int xy\,dQ \ge 0\ \ \forall y \in S^{\oplus} \cap L_\infty(Q) \right\} = \bar{S}$$

(Barbu and Precupanu, 1986, Ch. 2).

Let $y \in S^{\oplus} \cap L_\infty(Q)$. Since

$$\frac{e^{y_0(\omega) + \alpha y(\omega)} - e^{y_0(\omega)}}{\alpha} \le y(\omega)\,e^{y_0(\omega) + \alpha y(\omega)}$$

(by convexity), we can integrate both sides with respect to $Q$ and use a dominated convergence theorem ($e^{y_0}$ is integrable and $y$ is bounded a.s. ($Q$)) as $\alpha \to 0$ to obtain

$$0 \le \int y e^{y_0}\,dQ.$$

Thus $x_0 \in S^{\oplus\oplus} = \bar{S}$. However it easily follows that under conditions (1) or (2) of Theorem 2.3, $x_0 \in S$.

3. Least squares solutions

Robertson et al. (1988, Ch. 8) have discussed the theory of isotonic regression procedures applied to an infinite dimensional $L_2$-space. Using the same framework we will show that some infinite dimensional I-projection problems can also be solved by infinite dimensional least squares projections onto isotonic cones.

We let $(\Omega, \mathcal{F}, Q)$ be an arbitrary probability space and let $L_2(\Omega, \mathcal{F}, Q) = L_2$ denote the collection of square integrable random variables on $(\Omega, \mathcal{F}, Q)$. A real-valued (measurable) function $f$ on $\Omega$ is said to be isotonic with respect to the quasi-ordering $\lesssim$ on $\Omega$ if $\omega_1 \lesssim \omega_2 \Rightarrow f(\omega_1) \le f(\omega_2)$. We define the collection $\mathcal{U}$ of upper sets corresponding to $\lesssim$ as follows: $U$ in $\mathcal{F}$ is a member of $\mathcal{U}$ if and only if $x \lesssim y$ and $x \in U$ imply that $y \in U$. Given the quasi-order $\lesssim$ on $\Omega$, a random variable $f$ on $\Omega$ is isotonic with respect to $\lesssim$ if it is measurable with respect to the collection of upper sets $\mathcal{U}$ (see Theorem 8.1.2 of Robertson et al., 1988; also Brunk, 1965).

For a fixed $g \in L_2(Q)$, the isotonic regression of $g$ with respect to the quasi-order $\lesssim$ is the solution to the least squares problem

$$\min_{f \in J} \int (g(\omega) - f(\omega))^2\,dQ, \tag{3.1}$$

where $J$ consists of $L_2$-functions which are isotonic with respect to $\lesssim$. The isotonic regression exists uniquely a.e. ($Q$) and is denoted by $g^* = E_Q(g|J)$.

The following two theorems are extensions to the infinite dimensional case of theorems by Barlow and Brunk (1972) and can be proved using similar arguments.

Theorem 3.1. Suppose $g \in L_2(Q)$ is given, $\Phi$ is a real valued convex function defined on $\mathbb{R}$, $J$ denotes the class of functions on $\Omega$ which are isotonic with respect to a given quasi-order on $\Omega$, and $J^{\ominus}$ is the negative conjugate cone of $J$ defined by $J^{\ominus} = \{y \in L_2(Q):\ \int xy\,dQ \le 0,\ \forall x \in J\}$. Then the isotonic regression $g^* = E_Q(g|J)$ solves

$$\min_{f:\ g - f \in J^{\ominus}} \int \Phi(f)\,dQ \tag{3.2}$$

and is the unique solution if $\Phi$ is strictly convex.
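In the simplest finite setting, where $\Omega$ is a finite totally ordered set, the isotonic regression in (3.1) can be computed with the pool-adjacent-violators algorithm. The sketch below is an illustration under that assumption; the function name `isotonic_regression` and the data are hypothetical, and the algorithm is a standard one rather than something taken from this paper. The final lines check the constraint of (3.2) on indicators of upper sets, which here are the tail sets $\{k, k+1, \ldots\}$.

```python
def isotonic_regression(g, w):
    """Weighted least squares fit of a nondecreasing sequence to g (PAVA)."""
    blocks = []  # each block holds [weighted mean, total weight, length]
    for gi, wi in zip(g, w):
        blocks.append([gi, wi, 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            tot = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / tot, tot, n1 + n2])
    fit = []
    for mean, _, length in blocks:
        fit.extend([mean] * length)
    return fit

g = [1.0, 3.0, 2.0, 4.0, 3.5]     # illustrative data
q = [0.2, 0.2, 0.2, 0.2, 0.2]     # point masses of Q (illustrative)
g_star = isotonic_regression(g, q)

# Integrals of (g - g*) against indicators of the upper sets {k, k+1, ...}.
upper_sums = [sum((g[i] - g_star[i]) * q[i] for i in range(k, len(g)))
              for k in range(len(g))]
```

Each entry of `upper_sums` is nonpositive (up to rounding), which is the finite analogue of $g - g^* \in J^{\ominus}$ tested on the nonnegative isotonic indicator functions.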


Theorem 3.2. Suppose $\Phi$ is a differentiable, real valued convex function defined on $\mathbb{R}$ with derivative $\varphi$. For a real number $t$, let $\varphi^{-1}(t)$ be defined as $\inf\{u \in \mathbb{R}:\ \varphi(u) \ge t\}$ and assume that $g_1$ is a specified function in $L_2(Q)$ such that the range of $g_1$ is in the domain of $\varphi^{-1}$. Finally suppose that $J$ is the collection of functions isotonic with respect to a given quasi-order on $\Omega$. Then a solution to the problem

$$\min_{f \in J} \int [\Phi(f) - f g_1]\,dQ \tag{3.3}$$

is $\varphi^{-1}(g^*)$, where $g^*$ is the isotonic regression of $g_1$ onto $J$. The solution is unique a.s. ($Q$) if $\Phi$ is strictly convex.

There are subtle connections between I-projections and regular least squares projections. We offer the following theorems as an indication of these connections.

Theorem 3.3. Suppose $J$ is an isotonic cone contained in $L_2(Q)$ and $R \ll Q$ is a fixed measure such that $(dR/dQ) \in L_2(Q)$. Then if $\mathcal{C} = \{P:\ (dR/dQ) - (dP/dQ) \in J^{\ominus}\}$, the solution to the I-projection problem

$$\inf_{P \in \mathcal{C}} I(P|Q) \tag{3.4}$$

is given by the measure

$$P_0(A) = \int_A E_Q\left( \frac{dR}{dQ} \,\Big|\, J \right) dQ, \quad A \in \mathcal{F},$$

where $E_Q(dR/dQ \,|\, J)$ is as defined earlier.

Proof. If $S = \{(dP/dQ):\ P \in \mathcal{C}\}$, then the problem

$$\inf_{x \in S} \int x \ln x\,dQ = \inf_{x:\ (dR/dQ) - x \in J^{\ominus}} \int x \ln x\,dQ \tag{3.5}$$

is equivalent to (3.4). However, from Theorem 3.1, (3.5) is solved by $E_Q(dR/dQ \,|\, J)$, from which the desired conclusion easily follows. □

A similar appearing problem can have a very different solution. We illustrate this fact in the following theorem. Though more general versions of this theorem are possible, we forbear from the additional complexities involved.

Theorem 3.4. Assume that $J$ is an isotonic cone contained in $L_2(Q)$ and that $0 < t < (dR/dQ) < T < \infty$, a.s. ($Q$) for constants $t$ and $T$ and a fixed probability measure $R$. Then the problem

$$\inf_{P \in \mathcal{C}} I(P|Q), \tag{3.6}$$

where

$$\mathcal{C} = \left\{ P:\ \frac{dP}{dR} \in J \right\}, \tag{3.7}$$

has the solution

$$\hat{P}(A) = \frac{\int_A \exp\left( E_R(\ln(dQ/dR) \,|\, J) \right) dR}{\int_\Omega \exp\left( E_R(\ln(dQ/dR) \,|\, J) \right) dR}.$$

Proof. Note that $L_2(Q)$ and $L_2(R)$ are identical, and that problem (3.6) is equivalent to

$$\inf_{x \in S(Q)} f(x), \tag{3.8}$$

where

$$S(Q) = \left\{ x \in L_2(Q):\ x\,\frac{dQ}{dR} \in J \right\}.$$

Since we will not need the earlier defined sets $S^{\oplus}$, we form the standard conjugate region:

$$S^*(Q) = \left\{ y \in L_2(Q):\ \int xy\,dQ \ge 0,\ \forall x \in S(Q) \right\}$$

$$= \left\{ y \in L_2(Q):\ \int x\,\frac{dQ}{dR}\,y\,dR \ge 0,\ \forall x \in S(Q) \right\}$$

$$= \left\{ y \in L_2(R):\ \int zy\,dR \ge 0,\ \forall z \in J \cap L_2(R) \right\}$$

$$= J^*(R).$$

Thus the problem dual to (3.8) is

$$\inf_{y \in S^*(Q)} \int e^y\,dQ = \inf_{y \in J^*(R)} \int e^y\,\frac{dQ}{dR}\,dR \tag{3.9}$$

$$= \inf_{y \in J^*(R)} \int e^{y + \ln(dQ/dR)}\,dR. \tag{3.10}$$

Making the change of variable $z = y + \ln(dQ/dR)$, we may write (3.10) as

$$\inf_{z:\ z - \ln(dQ/dR) \in J^*(R)} \int e^z\,dR,$$

which by Theorem 3.1 has the solution

$$\hat{z} = E_R\left( \ln\frac{dQ}{dR} \,\Big|\, J \right).$$

Thus (3.9) has solution

$$\hat{y} = E_R\left( \ln\frac{dQ}{dR} \,\Big|\, J \right) + \ln\left( \frac{dR}{dQ} \right).$$

Convexity will imply that $e^{\hat{y}} \in L_2(Q)$, and the desired result then follows from Theorem 2.3. □

4. Applications

In this section we provide some examples to illustrate the duality approach for solving I-projection problems.

Example 4.1. Let $Q$ be a given PM on the measurable space $(\Omega, \mathcal{F})$. We assume $T$ is a measurable function on $(\Omega, \mathcal{F})$ and wish to find the I-projection of $Q$ onto the set of PM's $\mathcal{C} = \{P:\ \int T\,dP \le t\}$ (with the understanding that the integral must be well-defined for $P$ to belong to $\mathcal{C}$).

In terms of (2.1), we could write our problem as

$$\inf_{x \in S} f(x), \tag{4.1}$$

where $S = \{x \in L_1:\ \int (t - T)\,x\,dQ \ge 0\}$. Since $S$ is a closed, convex cone, and

$$S^{\oplus} = \{\alpha(t - T):\ \alpha \ge 0\},$$

the dual problem may be phrased as

$$\inf_{\alpha \ge 0} \int e^{\alpha(t - T)}\,dQ. \tag{4.2}$$

The optimization problem in (4.2) is one-dimensional. Although it may not be solvable in closed form, a Newton-Raphson scheme typically works well and provides convergence at a quadratic rate (Luenberger, 1984). If $\hat{\alpha}$ solves the unrestricted version of (4.2), then $\hat{\alpha}^+ = \max(\hat{\alpha}, 0)$ solves (4.2) and the measure determined by

$$\frac{d\hat{P}}{dQ} = \frac{e^{\hat{\alpha}^+(t - T)}}{\int e^{\hat{\alpha}^+(t - T)}\,dQ}$$

must solve the original problem. Of course,

$$\frac{d\tilde{P}}{dQ} = \frac{e^{\hat{\alpha}(t - T)}}{\int e^{\hat{\alpha}(t - T)}\,dQ}$$

solves the problem when $S$ is defined with an equality present rather than the inequality. Multiple (finite) constraints of this form can be handled in a similar fashion although finding the appropriate vector $\hat{\alpha}$ may be significantly more difficult.
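The Newton-Raphson step for (4.2) can be sketched for a finite sample space as follows. This is a minimal illustration, assuming $Q$ has finitely many atoms and that the unrestricted minimizer exists; the function names `solve_dual` and `project` and the die example are hypothetical and not part of the paper.

```python
import math

def solve_dual(T, q, t, tol=1e-12, max_iter=100):
    """Newton-Raphson on phi(alpha) = sum_i q_i exp(alpha (t - T_i)),
    followed by clipping at zero as in alpha_hat^+ = max(alpha_hat, 0)."""
    alpha = 0.0
    for _ in range(max_iter):
        e = [qi * math.exp(alpha * (t - Ti)) for Ti, qi in zip(T, q)]
        d1 = sum(ei * (t - Ti) for ei, Ti in zip(e, T))        # phi'(alpha)
        d2 = sum(ei * (t - Ti) ** 2 for ei, Ti in zip(e, T))   # phi''(alpha)
        step = d1 / d2
        alpha -= step
        if abs(step) < tol:
            break
    return max(alpha, 0.0)

def project(T, q, t):
    """Return (alpha_hat^+, point masses of the exponentially tilted measure)."""
    alpha = solve_dual(T, q, t)
    w = [qi * math.exp(alpha * (t - Ti)) for Ti, qi in zip(T, q)]
    z = sum(w)
    return alpha, [wi / z for wi in w]

# Illustrative case: a fair die, with the mean face value constrained to be <= 3.
T = [1, 2, 3, 4, 5, 6]
q = [1 / 6] * 6
alpha, p = project(T, q, 3.0)
mean_p = sum(Ti * pi for Ti, pi in zip(T, p))
```

In this example the constraint is active, so `alpha` is positive and `mean_p` equals the bound 3 up to rounding. If the bound is raised to a value that $Q$ already satisfies (for instance 4), the clipped solution is zero and the projection is $Q$ itself.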


Example 4.1a. As a special case of Example 4.1, we let $\Omega = (0, 1)$, $\mathcal{F} = \{\text{Borel sets in } (0, 1)\}$, and let $Q$ be the uniform distribution on $\Omega$. We wish to find the I-projection of $Q$ onto the set of PM's

$$\mathcal{C} = \left\{ P:\ \int_0^1 \ln s\,dP(s) \ge -0.5 \right\}.$$

Writing our problem as

$$\inf_{x \in S} f(x),$$

where

$$S = \left\{ x \in L_1(Q):\ \int_0^1 (0.5 + \ln s)\,x(s)\,dQ(s) \ge 0 \right\},$$

the dual problem

$$\inf_{\alpha \ge 0} \int_0^1 e^{\alpha(0.5 + \ln s)}\,ds = \inf_{\alpha \ge 0} \frac{e^{\alpha/2}}{\alpha + 1}$$

is solved by $\hat{\alpha} = 1$, which implies that the PM $\hat{P}$ defined by

$$\frac{d\hat{P}}{dQ}(s) = \frac{e^{0.5 + \ln s}}{\int_0^1 e^{0.5 + \ln s}\,dQ} = 2s, \quad 0 < s < 1$$

(which is the density of a beta distribution with $\alpha = 2$, $\beta = 1$) solves the original problem. Note that had we tried to use the standard positive conjugate cone

$$\left\{ y \in L_\infty(Q):\ \int yx\,dQ \ge 0\ \ \forall x \in S \right\}$$

in the dual problem, this would not have contained $y(s) = 0.5 + \ln s$, and the duality approach would have been unsuccessful.
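The two numerical claims in Example 4.1a can be checked with a few lines of code. The sketch below is only a sanity check under simple assumptions: a grid search over $\alpha \in [0, 5]$ stands in for the exact minimization, and a midpoint rule approximates the integral of $\ln s$ against the density $2s$. The names `dual_objective`, `alpha_hat` and `mean_log` are hypothetical.

```python
import math

def dual_objective(alpha):
    # Closed form of the integral of exp(alpha * (0.5 + ln s)) over (0, 1),
    # valid for alpha > -1.
    return math.exp(alpha / 2.0) / (alpha + 1.0)

# Crude grid search over alpha in [0, 5] with spacing 0.001.
grid = [i / 1000.0 for i in range(5001)]
alpha_hat = min(grid, key=dual_objective)

# Midpoint rule for the integral of ln(s) * 2s over (0, 1).
n = 200000
mean_log = sum(math.log((k + 0.5) / n) * 2.0 * (k + 0.5) / n
               for k in range(n)) / n
```

The grid minimizer lands at $\alpha = 1$, and `mean_log` is close to $-0.5$, so the density $2s$ meets the constraint with equality.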

Many of the standard entropy characterizations of families of distributions, such as those obtained by Kagan et al. (1973) can be easily obtained in this manner.

Example 4.1b. Suppose $\Omega = \mathbb{R}^+$, $\mathcal{F} = \{\text{Borel sets in } \Omega\}$, and $Q$ has a gamma($a$, $b$) distribution (with probability density function $q(s) = b^a [\Gamma(a)]^{-1} e^{-bs} s^{a-1} I(s > 0)$). We wish to find the I-projection of $Q$ onto

$$\mathcal{C} = \left\{ P:\ \int s\,dP(s) \le c \right\}.$$

It easily follows that the dual problem can be expressed as

$$\inf_{\alpha \ge 0} \int_0^\infty e^{\alpha(c - s)}\,q(s)\,ds,$$

where $q$ is the appropriate gamma density. It is straightforward to show that the solution to the original problem is a gamma($a$, $ac^{-1}$) PM, if $b \le ac^{-1}$. Note that the I-projection does not depend on $b$ in this case. That is, all gamma distributions in the family $\{\text{gamma}(a, b):\ b \le ac^{-1}\}$ project onto the gamma($a$, $ac^{-1}$) distribution.
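The claim that the projection does not depend on $b$ can be illustrated with the closed form of the dual objective. The sketch below assumes the standard gamma moment generating function, so that the dual objective equals $e^{\alpha c}\,(b/(b+\alpha))^a$ for $\alpha > -b$; the function names and the parameter values are hypothetical.

```python
import math

def dual_objective(alpha, a, b, c):
    # Integral of exp(alpha * (c - s)) against the gamma(a, b) density,
    # valid for alpha > -b.
    return math.exp(alpha * c) * (b / (b + alpha)) ** a

def dual_minimizer(a, b, c):
    # Stationary point of alpha*c + a*ln(b) - a*ln(b + alpha), clipped at 0.
    return max(a / c - b, 0.0)

a, c = 2.0, 1.0
rates = []
for b in (0.5, 1.0, 1.5):
    alpha = dual_minimizer(a, b, c)
    # Tilting the gamma(a, b) density by exp(-alpha * s) gives gamma(a, b + alpha).
    rates.append(b + alpha)
```

For every starting rate $b \le a/c$ in the loop, the tilted rate `b + alpha` equals $a/c$, in agreement with the gamma($a$, $ac^{-1}$) solution stated above.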

Example 4.1c. In the event that we wish to restrict $Q$ to have support $B$ ($Q(B) > 0$), a natural suggestion would be to consider the conditional distribution of $Q$ conditioned to have support $B$. It is easily seen that this distribution is also the I-projection of $Q$ onto

$$S = \{P:\ P(B) = 1\}$$

by our duality results. By Example 4.1 (with $T = I_{B^c}$ and $t = 0$), the dual problem is

$$\inf_{\alpha \ge 0} \int e^{-\alpha I_{B^c}}\,dQ,$$

which is clearly attained at $\alpha = +\infty$ (with $\infty \cdot 0 = 0$). Theorem 2.2 then implies that the I-projection of $Q$ onto $S$ is the aforementioned conditional distribution.

Example 4.2. We consider the situation where $Q$ is a fixed, bivariate probability distribution on $\mathbb{R}^2$ and $G_X$ is a particular univariate distribution which is absolutely continuous with respect to $Q_X$ (the $X$-marginal distribution of $Q$). We consider the problem of finding the I-projection of $Q$ onto the family of bivariate distributions with $X$-marginal equal to $G_X$. We could state our problem as

$$\inf_{P \in \mathcal{C}} I(P|Q), \tag{4.3}$$

where

$$\mathcal{C} = \left\{ P:\ \int (I_{A_x} - G_X(x))\,\frac{dP}{dQ}\,dQ = 0,\ \forall x \in \mathbb{R} \right\}$$

for $A_x = \{(u, v):\ -\infty < u \le x,\ -\infty < v < \infty\}$ and $G_X(x) = G_X(-\infty, x]$.

Careful scrutiny will show that an equivalent dual problem can be expressed as

$$\inf \int e^y\,dQ, \tag{4.4}$$

where the infimum is taken over the set $S^{\oplus} = \{y:\ y(u, v) = \tilde{y}(u),\ \int \tilde{y}(u)\,dG_X(u) = 0\}$. However, if

$$y_0(u, v) = \tilde{y}_0(u) = \ln\frac{dG_X}{dQ_X}(u) - \int \ln\frac{dG_X}{dQ_X}(t)\,dG_X(t),$$

it can be seen that for all other $y(u, v)$ in the restriction region $S^{\oplus}$

$$\frac{d}{d\alpha} \int e^{y_0 + \alpha(y - y_0)}\,dQ \,\Big|_{\alpha = 0} = \int (y - y_0)\,e^{y_0}\,dQ = e^{-I(G_X|Q_X)} \int (\tilde{y}(t) - \tilde{y}_0(t))\,dG_X(t) = 0,$$

which implies that $y_0(u, v)$ solves the dual problem (4.4). Thus if $\hat{P}$ is the solution to (4.3), $d\hat{P}(u, v)/dQ$ must be proportional to $dG_X(u)/dQ_X$. Moreover, since the value of (4.4) is

$$\exp\left[ -\int \ln\frac{dG_X}{dQ_X}(t)\,dG_X(t) \right],$$

the value of (4.3) (see Theorem 2.2) must be given by

$$\int \ln\left( \frac{dG_X}{dQ_X}(t) \right) dG_X(t) = I(G_X|Q_X).$$

In other words, the I-projection is the distribution determined by the conditional distributions ($Y$ given $X = x$) from the $Q$-measure and the $X$-marginal from the $G_X$ measure.
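A small discrete analogue makes the conclusion of Example 4.2 concrete. The sketch below assumes a joint distribution on a $2 \times 3$ grid; the tables `Q` and `G` and the helper names are illustrative assumptions, not data from the paper.

```python
import math

def i_divergence(p, q):
    """I(P|Q) for finite distributions given as flat lists (assumes P << Q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def flatten(table):
    return [v for row in table for v in row]

# Joint Q on a 2 x 3 grid: Q[i][j] = Q(X = i, Y = j). Illustrative numbers.
Q = [[0.10, 0.20, 0.10],
     [0.30, 0.10, 0.20]]
G = [0.7, 0.3]                       # prescribed X-marginal

Qx = [sum(row) for row in Q]         # X-marginal of Q
# Keep Q's conditional distributions of Y given X, but use G as the X-marginal.
P = [[G[i] * Q[i][j] / Qx[i] for j in range(3)] for i in range(2)]

div_joint = i_divergence(flatten(P), flatten(Q))
div_marginal = i_divergence(G, Qx)
```

Here `div_joint` and `div_marginal` coincide up to rounding, matching the statement that the value of (4.3) is $I(G_X|Q_X)$. Any other table with rows summing to `G` gives a divergence from `Q` that is at least as large.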

Of course these results extend to more general measures. For example, if $Q$ is the measure corresponding to the Brownian motion process $W(t)$, $0 \le t \le 1$, the I-projection of $Q$ onto those measures with marginal distribution $G_1$ at $t = 1$ would have conditional distributions (given $W(1) = x$) of a Brownian bridge around the line from $(0, 0)$ to $(1, x)$ and marginal distribution $G_1$ for $W(1)$. The I-divergence between this measure and Brownian motion is the same as the I-divergence between $G_1$ and an $N(0, 1)$-distribution.

Of course, I-projection problems with more than one marginal specified extend in a natural way.

If we were to project onto the class of distributions with $X$-marginal stochastically greater than (or equal to) $G_X$, we would add the constraint that $y$ must be nondecreasing. This is essentially the same problem as projecting $Q_X$ onto the class of univariate distributions stochastically greater than $G_X$ and is in essence solved by Theorem 3.3 where $J$ is the cone of nondecreasing functions, $G_X$ plays the role of $R$ and $Q_X$ plays the role of $Q$.

Example 4.3. Let $Q_1 \times Q_2$ be a fixed bivariate product measure probability distribution (PD) on $\mathbb{R}^2$, and consider the problem of finding the I-projection of $Q_1 \times Q_2$ onto the class of all bivariate PD's whose $X$-marginal is stochastically larger than the $Y$-marginal. Thus our problem can be stated as

$$\min_{P \in \mathcal{C}} I(P \,|\, Q_1 \times Q_2), \tag{4.5}$$

where

$$\mathcal{C} = \{P:\ P(A_x) \le P(B_x),\ \forall x \in \mathbb{R}\}$$

and $P$ is a bivariate PD on $\mathbb{R}^2$, $A_x = (-\infty, x] \times \mathbb{R}$ and $B_x = \mathbb{R} \times (-\infty, x]$.

It can be shown that the equivalent dual problem is given by

$$\inf_{y \in S^{\oplus}} \int e^y\,d(Q_1 \times Q_2),$$

where

$$S^{\oplus} = \{f(u) - f(v):\ f \text{ is nonpositive and nondecreasing}\}.$$

The dual problem can be simplified to

$$\min_{y \in S^{\oplus}} \int e^y\,d(Q_1 \times Q_2) = \min_{f \le 0,\ f \uparrow} \int e^{f(u) - f(v)}\,d(Q_1(u) \times Q_2(v))$$

$$= \min_{h \le 1,\ h \uparrow} \int h(u)\,dQ_1(u) \int [h(v)]^{-1}\,dQ_2(v). \tag{4.6}$$

If $h^*$ solves the right-hand side of (4.6), then the I-projection problem (4.5) is solved by

$$P^*(A) = \frac{\iint_A h^*(u)\,[h^*(v)]^{-1}\,d(Q_1(u) \times Q_2(v))}{\int h^*(u)\,dQ_1(u) \int [h^*(v)]^{-1}\,dQ_2(v)}$$

for Borel sets $A \subset \mathbb{R}^2$. Note that $P^*$ is also a product measure.

References

Barbu, V. and Th. Precupanu (1986). Convexity and Optimization in Banach Spaces. Reidel, Boston.
Barlow, R.E. and H.D. Brunk (1972). The isotonic regression problem and its dual. J. Amer. Statist. Assoc. 67, 140-147.
Brunk, H.D. (1965). Conditional expectation given a σ-lattice and applications. Ann. Math. Statist. 36, 1339-1350.
Csiszar, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3, 146-159.
Csiszar, I. (1989). A geometric interpretation of Darroch and Ratcliff's generalized iterative scaling. Ann. Statist. 17, 1409-1413.
Darroch, J.N. and D. Ratcliff (1972). Generalized iterative scaling for log-linear models. Ann. Math. Statist. 43, 1470-1480.
Dykstra, R.L. and J.H. Lemke (1988). Duality of I-projections and maximum likelihood estimation for log-linear models under cone constraints. J. Amer. Statist. Assoc. 402, 546-554.
Good, I.J. (1963). Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. Ann. Math. Statist. 34, 911-934.
Ireland, C.T. and S. Kullback (1968). Contingency tables with given marginals. Biometrika 55, 179-188.
Jaynes, E.T. (1957). Information theory and statistical mechanics. Phys. Rev. 106, 620-630.
Kagan, A.M., Y.V. Linnik and C.R. Rao (1973). Characterization Problems in Mathematical Statistics. Translated from Russian text by B. Ramachandran. Wiley, New York.
Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.
Luenberger, D.G. (1969). Optimization by Vector Space Methods. Wiley, New York.
Luenberger, D.G. (1984). Linear and Nonlinear Programming. Addison-Wesley, Reading, MA.
Rao, C.R. (1965). Linear Statistical Inference and Its Applications. Wiley, New York.
Robertson, T., F.T. Wright and R.L. Dykstra (1988). Order Restricted Statistical Inference. Wiley, New York.
Sanov, I.N. (1957). On the probability of large deviations of random variables. Mat. Sbornik 42, 11-44.