Constrained Least Squares (folk.uio.no, 2005-05-11)
UNIVERSITETET I OSLO
INSTITUTT FOR INFORMATIKK, CICN, May 2005
Constrained Least Squares
Authors: G.H. Golub and C.F. Van Loan
Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp. 580-587
Background
The least squares problem:
min_x ‖Ax − b‖₂
Sometimes we want x to be chosen from some proper subset S ⊂ R^n. Example: S = {x ∈ R^n : ‖x‖₂ = 1}. Such problems can be solved using the QR factorization and the singular value decomposition (SVD).
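As a minimal point of reference before the constrained variants, the unconstrained problem can be solved with `numpy.linalg.lstsq` (a sketch assuming NumPy; the data here are arbitrary):

```python
import numpy as np

# Overdetermined system: 5 equations, 2 unknowns (arbitrary example data).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Unconstrained least squares: min_x ||Ax - b||_2
x, residual_ss, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# The residual Ax - b is orthogonal to the column space of A,
# i.e. the normal equations A^T (Ax - b) = 0 hold.
print(np.allclose(A.T @ (A @ x - b), 0))  # True
```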
Least Squares with a Quadratic Inequality Constraint (LSQI)
General problem:
min_x ‖Ax − b‖₂  subject to  ‖Bx − d‖₂ ≤ α
where:
A ∈ R^{m×n} (m ≥ n), b ∈ R^m, B ∈ R^{p×n}, d ∈ R^p, α ≥ 0
Assume the generalized SVD of the matrices A and B is given as:
UᵀAX = D_A = diag(α₁, ..., α_n), UᵀU = I_m
VᵀBX = D_B = diag(β₁, ..., β_q), VᵀV = I_p, q = min{p, n}
Assume also the following definitions:
b̃ ≜ Uᵀb, d̃ ≜ Vᵀd, y ≜ X⁻¹x
Then the problem becomes:
min_y ‖D_A y − b̃‖₂  subject to  ‖D_B y − d̃‖₂ ≤ α
min_y ‖D_A y − b̃‖₂  subject to  ‖D_B y − d̃‖₂ ≤ α
Correctness: by inserting the definitions we get:
‖D_A y − b̃‖₂ = ‖UᵀAX X⁻¹x − Uᵀb‖₂ = ‖Uᵀ(Ax − b)‖₂
Multiplication by an orthogonal matrix does not affect the 2-norm. (The same argument applies to the inequality constraint.)
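This invariance is easy to check numerically (a small NumPy sketch; the orthogonal matrix Q here is an arbitrary one obtained from a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(1)
# A random orthogonal matrix: the Q factor of a random square matrix.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
v = rng.standard_normal(6)

# ||Qv||_2^2 = v^T Q^T Q v = v^T v = ||v||_2^2, since Q^T Q = I.
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True
```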
The objective function becomes:
Σ_{i=1}^{n} (α_i y_i − b̃_i)² + Σ_{i=n+1}^{m} b̃_i²
The constraint becomes:
Σ_{i=1}^{r} (β_i y_i − d̃_i)² + Σ_{i=r+1}^{p} d̃_i² ≤ α²
where:
r = rank(B)
β_{r+1} = β_{r+2} = ... = β_q = 0
We have a solution if and only if:
Σ_{i=r+1}^{p} d̃_i² ≤ α²
Otherwise, there is obviously no way to satisfy the constraint: this tail sum does not depend on y at all.
Special Case: Σ_{i=r+1}^{p} d̃_i² = α²
In this case the first sum in the constraint must equal zero, which means:
y_i = d̃_i / β_i, i ∈ [1, r]
The remaining variables can be chosen to minimize the first sum in the objective:
y_i = b̃_i / α_i, i ∈ [r+1, n]
(Of course, if α_i = 0 for some i ∈ [r+1, n], this quotient is undefined; we then choose y_i = 0.)
The General Case: Σ_{i=r+1}^{p} d̃_i² < α²
The minimizer (without regard to the constraint) is given by:
y_i = b̃_i / α_i if α_i ≠ 0, and y_i = d̃_i / β_i if α_i = 0
This may or may not be a feasible solution, depending on whether it lies in S.
The Method of Lagrange Multipliers
h(λ, y) = ‖D_A y − b̃‖₂² + λ(‖D_B y − d̃‖₂² − α²)
Solving ∂h/∂y_i = 0 for i = 1, ..., n yields:
(D_AᵀD_A + λ D_BᵀD_B) y = D_Aᵀ b̃ + λ D_Bᵀ d̃
Solution using Lagrange multipliers:
y_i(λ) = (α_i b̃_i + λ β_i d̃_i) / (α_i² + λ β_i²), i = 1, 2, ..., q
y_i(λ) = b̃_i / α_i, i = q+1, ..., n
Determining the Lagrange parameter λ
Define:
φ(λ) ≜ ‖D_B y(λ) − d̃‖₂² = Σ_{i=1}^{r} ( α_i(β_i b̃_i − α_i d̃_i) / (α_i² + λ β_i²) )² + Σ_{i=r+1}^{p} d̃_i²
Solve φ(λ) = α². Because φ(0) > α² (otherwise the unconstrained solution would already be feasible) and the function is monotonically decreasing for λ > 0, there must be a unique positive solution λ* with φ(λ*) = α².
Algorithm: Spherical Constraint
The special case B = I_n, d = 0, α > 0 can be interpreted as selecting x from within an n-dimensional sphere of radius α. It can be solved using the following algorithm:
• [U, Σ, V] ← SVD(A)
• b̃ ← Uᵀb
• r ← rank(A)
Algorithm: Spherical Constraint (continued)
Here b̃ = Uᵀb, and σ_i, v_i denote the singular values and right singular vectors of A:
• if Σ_{i=1}^{r} (b̃_i / σ_i)² > α²:
  • λ* ← solve( Σ_{i=1}^{r} (σ_i b̃_i / (σ_i² + λ*))² = α² )
  • x ← Σ_{i=1}^{r} (σ_i b̃_i / (σ_i² + λ*)) v_i
• else:
  • x ← Σ_{i=1}^{r} (b̃_i / σ_i) v_i
• end if
Computing the SVD is the most computationally intensive operation in the above algorithm.
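The algorithm above can be sketched in NumPy (a sketch, not a reference implementation: `ls_sphere` is a name chosen here, the rank tolerance is an ad hoc assumption, and the scalar equation for λ* is solved by plain bisection, exploiting that the left-hand side is monotonically decreasing):

```python
import numpy as np

def ls_sphere(A, b, alpha):
    """Sketch of: min ||Ax - b||_2 subject to ||x||_2 <= alpha (B = I, d = 0)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > s[0] * 1e-12))        # numerical rank (ad hoc tolerance)
    bt = U.T @ b                             # b~ = U^T b
    # Minimum-norm unconstrained solution uses y_i = b~_i / sigma_i.
    if np.sum((bt[:r] / s[:r]) ** 2) <= alpha ** 2:
        return Vt[:r].T @ (bt[:r] / s[:r])   # already feasible
    # Otherwise find lam* > 0 with sum_i (sigma_i b~_i/(sigma_i^2+lam))^2 = alpha^2.
    def phi(lam):
        return np.sum((s[:r] * bt[:r] / (s[:r] ** 2 + lam)) ** 2) - alpha ** 2
    lo, hi = 0.0, 1.0
    while phi(hi) > 0.0:                     # bracket the root (phi is decreasing)
        hi *= 2.0
    for _ in range(200):                     # bisection to machine precision
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > 0.0 else (lo, mid)
    lam_star = 0.5 * (lo + hi)
    return Vt[:r].T @ (s[:r] * bt[:r] / (s[:r] ** 2 + lam_star))

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
alpha = 0.5 * np.linalg.norm(x_ls)           # radius chosen so the constraint binds
x = ls_sphere(A, b, alpha)
print(np.isclose(np.linalg.norm(x), alpha))  # True: the constraint is active
```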
Spherical Constraint as a Ridge Regression Problem
Using Lagrange multipliers to solve the spherical constraint problem results in:
(AᵀA + λI) x = Aᵀb
where λ > 0 and ‖x‖₂ = α.
This is the solution to the ridge regression problem:
min_x ‖Ax − b‖₂² + λ‖x‖₂²
We need some procedure for selecting a suitable λ.
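The ridge system can also be posed as an ordinary LS problem by stacking √λ·I under A and zeros under b, which is numerically preferable to forming AᵀA. A NumPy sketch with arbitrary data, checking that both routes give the same minimizer:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
lam = 0.7

# Ridge solution via the normal equations (A^T A + lam I) x = A^T b.
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ b)

# The same minimizer from an ordinary LS problem on an augmented system:
# min || [A; sqrt(lam) I] x - [b; 0] ||_2
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(4)])
b_aug = np.concatenate([b, np.zeros(4)])
x_stacked, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

print(np.allclose(x_normal, x_stacked))  # True
```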
Define the problem:
x_k(λ) = argmin_x ‖D_k(Ax − b)‖₂² + λ‖x‖₂²
where D_k = I − e_k e_kᵀ is the matrix operator that zeroes out the kth row, i.e. removes the kth observation.
Select λ to minimize the cross-validation weighted squared error:
C(λ) = (1/m) Σ_{k=1}^{m} w_k (a_kᵀ x_k(λ) − b_k)²
This means choosing a λ that does not make the final model rely too much on any one observation.
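One simple, if expensive, procedure is a grid search over candidate λ values, computing C(λ) by explicitly re-solving the ridge problem with each observation removed (a NumPy sketch with unit weights w_k = 1; `loo_score` and the grid are names and choices made here, not from the text):

```python
import numpy as np

def loo_score(A, b, lam):
    """C(lam): leave-one-out squared prediction error, by m explicit ridge solves."""
    m, n = A.shape
    err = 0.0
    for k in range(m):
        mask = np.arange(m) != k               # D_k: drop observation k
        Ak, bk = A[mask], b[mask]
        xk = np.linalg.solve(Ak.T @ Ak + lam * np.eye(n), Ak.T @ bk)
        err += (A[k] @ xk - b[k]) ** 2
    return err / m

rng = np.random.default_rng(4)
A = rng.standard_normal((12, 3))
b = rng.standard_normal(12)
lams = [0.01, 0.1, 1.0, 10.0]
scores = {lam: loo_score(A, b, lam) for lam in lams}
best = min(scores, key=scores.get)             # grid point minimizing C(lam)
```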
Through some calculation, we find that:
C(λ) = (1/m) Σ_{k=1}^{m} w_k ( r_k / (∂r_k/∂b_k) )²
where r_k is an element of the residual vector r = b − Ax(λ). The expression inside the parentheses can be interpreted as an inverse measure of the impact of the kth observation on the model.
Using the SVD, the minimization problem reduces to:
C(λ) = (1/m) Σ_{k=1}^{m} w_k [ ( b_k − Σ_{j=1}^{r} u_{kj} b̃_j (σ_j² / (σ_j² + λ)) ) / ( 1 − Σ_{j=1}^{r} u_{kj}² (σ_j² / (σ_j² + λ)) ) ]²
where b̃ = Uᵀb as before; the numerator is exactly the residual r_k and the denominator is ∂r_k/∂b_k.
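This closed form needs one SVD instead of m separate ridge solves. A sketch that implements it and cross-checks it against the naive leave-one-out computation (NumPy assumed, unit weights; both function names are chosen here):

```python
import numpy as np

def loo_score_svd(A, b, lam):
    """C(lam) via the SVD formula: numerator r_k, denominator 1 - sum_j u_kj^2 f_j."""
    m = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    f = s ** 2 / (s ** 2 + lam)            # filter factors sigma_j^2/(sigma_j^2+lam)
    bt = U.T @ b                           # b~ = U^T b
    num = b - U @ (f * bt)                 # r_k = b_k - sum_j u_kj f_j b~_j
    den = 1.0 - (U ** 2) @ f               # 1 - sum_j u_kj^2 f_j
    return np.sum((num / den) ** 2) / m

def loo_score_naive(A, b, lam):
    """C(lam) by m explicit leave-one-out ridge solves (reference computation)."""
    m, n = A.shape
    err = 0.0
    for k in range(m):
        mask = np.arange(m) != k
        xk = np.linalg.solve(A[mask].T @ A[mask] + lam * np.eye(n),
                             A[mask].T @ b[mask])
        err += (A[k] @ xk - b[k]) ** 2
    return err / m

rng = np.random.default_rng(5)
A = rng.standard_normal((12, 3))
b = rng.standard_normal(12)
print(np.isclose(loo_score_svd(A, b, 0.5), loo_score_naive(A, b, 0.5)))  # True
```

The agreement is exact (up to rounding) because for ridge regression the leave-one-out residual equals r_k divided by one minus the kth diagonal entry of the hat matrix.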
Equality Constrained Least Squares (LSE)
We consider a problem similar to LSQI, but with an equality constraint, i.e. an ordinary least squares problem
min_x ‖Ax − b‖₂
with the constraint that:
Bx = d
We assume the following dimensions:
A ∈ R^{m×n}, B ∈ R^{p×n}, b ∈ R^m, d ∈ R^p, rank(B) = p
We start by computing the QR factorization of Bᵀ:
Bᵀ = Q [R; 0]
(semicolons denote vertical stacking) with
Q ∈ R^{n×n}, R ∈ R^{p×p}, 0 ∈ R^{(n−p)×p}
and then add the following definitions:
AQ = [A₁ A₂], Qᵀx = [y; z]
This gives us:
Bx = (Q [R; 0])ᵀ x = [Rᵀ 0] Qᵀx = [Rᵀ 0] [y; z] = Rᵀy
We also get (because QQᵀ = I):
Ax = (AQ)(Qᵀx) = [A₁ A₂] [y; z] = A₁y + A₂z
So the problem becomes:
min ‖A₁y + A₂z − b‖₂  subject to  Rᵀy = d
where y is determined directly from the constraint, and then inserted into the LS problem:
min_z ‖A₂z − (b − A₁y)‖₂
giving us a vector z, which can be used to form the final answer:
x = Q [y; z]
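This null-space procedure can be sketched in NumPy (`lse_nullspace` is a name chosen here; the example data are arbitrary, and B is assumed to have full row rank p):

```python
import numpy as np

def lse_nullspace(A, b, B, d):
    """Sketch of: min ||Ax - b||_2 s.t. Bx = d, via the QR factorization of B^T."""
    p, n = B.shape
    Q, R = np.linalg.qr(B.T, mode="complete")  # B^T = Q [R1; 0], R1 is p x p
    R1 = R[:p, :]
    y = np.linalg.solve(R1.T, d)               # the constraint R1^T y = d fixes y
    A1, A2 = A @ Q[:, :p], A @ Q[:, p:]        # AQ = [A1 A2]
    z, *_ = np.linalg.lstsq(A2, b - A1 @ y, rcond=None)
    return Q @ np.concatenate([y, z])          # x = Q [y; z]

rng = np.random.default_rng(6)
A = rng.standard_normal((9, 5))
B = rng.standard_normal((2, 5))                # full row rank p = 2 (almost surely)
b = rng.standard_normal(9)
d = rng.standard_normal(2)
x = lse_nullspace(A, b, B, d)
print(np.allclose(B @ x, d))  # True: the constraint is satisfied
```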
The Method of Weighting
A method for approximating the solution of the LSE problem (minimize ‖Ax − b‖₂ s.t. Bx = d) through an ordinary, unconstrained LS problem:
min_x ‖ [A; λB] x − [b; λd] ‖₂
for large values of λ.
The exact solution to the LSE problem (x_i denoting the columns of X from the generalized SVD of A and B):
x = Σ_{i=1}^{p} (v_iᵀd / β_i) x_i + Σ_{i=p+1}^{n} (u_iᵀb / α_i) x_i
The approximation:
x(λ) = Σ_{i=1}^{p} ( (α_i u_iᵀb + λ² β_i v_iᵀd) / (α_i² + λ² β_i²) ) x_i + Σ_{i=p+1}^{n} (u_iᵀb / α_i) x_i
The difference:
x(λ) − x = Σ_{i=1}^{p} ( α_i (β_i u_iᵀb − α_i v_iᵀd) / (β_i (α_i² + λ² β_i²)) ) x_i
It is apparent that as λ grows larger, the approximation error is reduced (like 1/λ²). This method is attractive because it only requires ordinary LS solving.
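The convergence of the weighting method can be observed numerically: as λ grows, the constraint residual ‖Bx(λ) − d‖₂ of the stacked LS solution shrinks (a NumPy sketch with arbitrary data; the grid of λ values is an ad hoc choice):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((9, 5))
B = rng.standard_normal((2, 5))
b = rng.standard_normal(9)
d = rng.standard_normal(2)

def weighted_ls(lam):
    """Approximate the LSE solution via ordinary LS on stacked blocks."""
    M = np.vstack([A, lam * B])                # [A; lam B]
    rhs = np.concatenate([b, lam * d])         # [b; lam d]
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return x

# Constraint residual for increasing weights: expected to decrease like 1/lam^2.
res = [np.linalg.norm(B @ weighted_ls(lam) - d) for lam in (1e1, 1e3, 1e5)]
print(res[0] > res[1] > res[2])  # True
```

Extreme weights eventually make the stacked matrix ill-conditioned, so in practice λ is chosen large but not arbitrarily large.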