Constrained Least Squares
Authors: G.H. Golub and C.F. Van Loan
Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp. 580-587

Universitetet i Oslo, Institutt for informatikk, CICN may05 (folk.uio.no, 2005-05-11)

Background

The least squares problem:

$$\min_x \|Ax - b\|_2$$

Sometimes we want $x$ to be chosen from some proper subset $S \subset \mathbb{R}^n$.

Example: $S = \{x \in \mathbb{R}^n : \|x\|_2 = 1\}$

Such problems can be solved using the QR factorization and the singular value decomposition (SVD).

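Least Squares with a Quadratic Inequality Constraint (LSQI)

General problem:

$$\min_x \|Ax - b\|_2 \quad \text{s.t.} \quad \|Bx - d\|_2 \le \alpha$$

where:

$$A \in \mathbb{R}^{m \times n} \ (m \ge n), \quad b \in \mathbb{R}^m, \quad B \in \mathbb{R}^{p \times n}, \quad d \in \mathbb{R}^p, \quad \alpha \ge 0$$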

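Assume the generalized SVD of the matrices $A$ and $B$ is given as:

$$U^T A X = D_A = \mathrm{diag}(\alpha_1, \dots, \alpha_n), \quad U^T U = I_m$$

$$V^T B X = D_B = \mathrm{diag}(\beta_1, \dots, \beta_q), \quad V^T V = I_p, \quad q = \min\{p, n\}$$

with $X$ nonsingular. Assume also the following definitions:

$$\tilde b \equiv U^T b, \quad \tilde d \equiv V^T d, \quad y \equiv X^{-1} x$$

Then the problem becomes:

$$\min_y \|D_A y - \tilde b\|_2 \quad \text{s.t.} \quad \|D_B y - \tilde d\|_2 \le \alpha$$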

Correctness: inserting the definitions into the transformed objective gives:

$$\|D_A y - \tilde b\|_2 = \|U^T A X X^{-1} x - U^T b\|_2 = \|U^T (Ax - b)\|_2 = \|Ax - b\|_2$$

The last equality holds because multiplication by an orthogonal matrix does not change the 2-norm. (The same argument applies to the inequality constraint.)

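The objective function becomes:

$$\|D_A y - \tilde b\|_2^2 = \sum_{i=1}^{n} \left( \alpha_i y_i - \tilde b_i \right)^2 + \sum_{i=n+1}^{m} \tilde b_i^2 \tag{12.1.4}$$

The constraint becomes:

$$\|D_B y - \tilde d\|_2^2 = \sum_{i=1}^{r} \left( \beta_i y_i - \tilde d_i \right)^2 + \sum_{i=r+1}^{p} \tilde d_i^2 \le \alpha^2 \tag{12.1.5}$$

We have:

$$r = \mathrm{rank}(B), \qquad \beta_{r+1} = \beta_{r+2} = \dots = \beta_q = 0$$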

We have a solution if and only if:

$$\sum_{i=r+1}^{p} \tilde d_i^2 \le \alpha^2$$

Otherwise the constraint cannot be satisfied, because this sum does not depend on $y$.

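Special Case: $\sum_{i=r+1}^{p} \tilde d_i^2 = \alpha^2$

The first sum in the constraint (12.1.5) must equal zero, which means:

$$y_i = \frac{\tilde d_i}{\beta_i}, \quad i = 1, \dots, r$$

The remaining variables can be chosen to minimize the first sum in the objective (12.1.4):

$$y_i = \frac{\tilde b_i}{\alpha_i}, \quad i = r+1, \dots, n$$

(If $\alpha_i = 0$ for some $i \in [r+1, n]$, this quotient is undefined and $y_i$ has no influence on the objective. We then choose $y_i = 0$.)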

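The General Case: $\sum_{i=r+1}^{p} \tilde d_i^2 < \alpha^2$

The minimizer (without regard to the constraint) is given by:

$$y_i = \begin{cases} \tilde b_i / \alpha_i, & \alpha_i \ne 0 \\ \tilde d_i / \beta_i, & \alpha_i = 0 \end{cases}$$

This may or may not be a feasible solution, depending on whether it satisfies the constraint, i.e. whether it lies in the feasible set $S$.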

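The Method of Lagrange Multipliers

$$h(\lambda, y) = \|D_A y - \tilde b\|_2^2 + \lambda \left( \|D_B y - \tilde d\|_2^2 - \alpha^2 \right)$$

Setting $\partial h / \partial y_i = 0$ for $i = 1, \dots, n$ yields:

$$\left( D_A^T D_A + \lambda D_B^T D_B \right) y = D_A^T \tilde b + \lambda D_B^T \tilde d$$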

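Solution using Lagrange multipliers:

$$y_i(\lambda) = \begin{cases} \dfrac{\alpha_i \tilde b_i + \lambda \beta_i \tilde d_i}{\alpha_i^2 + \lambda \beta_i^2}, & i = 1, 2, \dots, q \\[2ex] \dfrac{\tilde b_i}{\alpha_i}, & i = q+1, \dots, n \end{cases}$$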
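The componentwise formula can be checked numerically. The following is a minimal sketch, assuming the diagonals of $D_A$ and $D_B$ are already available as arrays (NumPy has no generalized SVD routine, so the data below is hypothetical); the function name `y_of_lambda` and all numbers are illustrative only.

```python
import numpy as np

def y_of_lambda(alpha, beta, b_t, d_t, lam):
    """Evaluate y(lam) componentwise.

    alpha : diagonal of D_A (length n); entries q..n-1 are assumed nonzero
    beta  : diagonal of D_B (length q <= n)
    b_t   : first n entries of U^T b
    d_t   : first q entries of V^T d
    """
    n, q = len(alpha), len(beta)
    y = np.empty(n)
    a_q = alpha[:q]
    y[:q] = (a_q * b_t[:q] + lam * beta * d_t[:q]) / (a_q**2 + lam * beta**2)
    y[q:] = b_t[q:n] / alpha[q:]
    return y

# Hypothetical data with n = 3, q = 2 (square D_A for simplicity).
alpha = np.array([3.0, 2.0, 1.0])
beta = np.array([1.0, 0.5])
b_t = np.array([3.0, 1.0, 2.0])
d_t = np.array([0.5, 0.2])
lam = 0.7
y = y_of_lambda(alpha, beta, b_t, d_t, lam)

# Check against (D_A^T D_A + lam D_B^T D_B) y = D_A^T b~ + lam D_B^T d~.
D_A = np.diag(alpha)
D_B = np.hstack([np.diag(beta), np.zeros((2, 1))])
lhs = (D_A.T @ D_A + lam * D_B.T @ D_B) @ y
rhs = D_A.T @ b_t + lam * D_B.T @ d_t
residual = np.linalg.norm(lhs - rhs)
```

Because both matrices are diagonal, the linear system decouples into scalar equations, which is why no matrix solve is needed inside `y_of_lambda`.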

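Determining the Lagrange parameter $\lambda$

Define:

$$\phi(\lambda) \equiv \|D_B y(\lambda) - \tilde d\|_2^2 = \sum_{i=1}^{r} \left( \alpha_i \frac{\beta_i \tilde b_i - \alpha_i \tilde d_i}{\alpha_i^2 + \lambda \beta_i^2} \right)^2 + \sum_{i=r+1}^{p} \tilde d_i^2$$

Solve $\phi(\lambda) = \alpha^2$. Because $\phi(0) > \alpha^2$ (the unconstrained minimizer is infeasible) and $\phi$ is monotonically decreasing for $\lambda > 0$, there must be a unique positive solution $\lambda^*$ with $\phi(\lambda^*) = \alpha^2$.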
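Since $\phi$ is monotonically decreasing, any bracketing root finder will locate $\lambda^*$. The sketch below uses plain bisection, which is a choice made here for simplicity and not necessarily the method used in the source. It assumes $\alpha_i > 0$ and $\beta_i > 0$ for $i = 1, \dots, r$; the names `phi`, `solve_secular`, `radius` (the constraint bound $\alpha$) and `tail` (the constant sum $\sum_{i>r} \tilde d_i^2$) are hypothetical.

```python
import numpy as np

def phi(lam, alpha, beta, b_t, d_t, tail=0.0):
    """phi(lam) for the first r components, plus the constant tail sum."""
    num = alpha * (beta * b_t - alpha * d_t)
    return np.sum((num / (alpha**2 + lam * beta**2)) ** 2) + tail

def solve_secular(alpha, beta, b_t, d_t, radius, tail=0.0, tol=1e-12):
    """Find lam > 0 with phi(lam) = radius^2, assuming phi(0) > radius^2 > tail."""
    target = radius**2
    lo, hi = 0.0, 1.0
    # Grow the bracket until phi(hi) drops to or below the target.
    while phi(hi, alpha, beta, b_t, d_t, tail) > target:
        hi *= 2.0
    # Bisection: phi is decreasing, so the root lies where phi crosses target.
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if phi(mid, alpha, beta, b_t, d_t, tail) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data with r = 2.
alpha = np.array([3.0, 2.0])
beta = np.array([1.0, 0.5])
b_t = np.array([3.0, 1.0])
d_t = np.array([0.5, 0.2])
radius = 0.3

phi_at_zero = phi(0.0, alpha, beta, b_t, d_t)
lam_star = solve_secular(alpha, beta, b_t, d_t, radius)
phi_at_root = phi(lam_star, alpha, beta, b_t, d_t)
```

Newton-type iterations converge faster on this kind of equation, but bisection needs only the monotonicity stated above.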

Algorithm: Spherical Constraint

The special case $B = I_n$, $d = 0$, $\alpha > 0$ can be interpreted as selecting $x$ from within an $n$-dimensional sphere of radius $\alpha$. It can be solved using the following algorithm:

• $[U, \Sigma, V] \leftarrow \mathrm{SVD}(A)$
• $b \leftarrow U^T b$
• $r \leftarrow \mathrm{rank}(A)$
• if $\sum_{i=1}^{r} \left( b_i / \sigma_i \right)^2 > \alpha^2$:
    • $\lambda^* \leftarrow$ solve $\sum_{i=1}^{r} \left( \dfrac{\sigma_i b_i}{\sigma_i^2 + \lambda^*} \right)^2 = \alpha^2$
    • $x \leftarrow \sum_{i=1}^{r} \left( \dfrac{\sigma_i b_i}{\sigma_i^2 + \lambda^*} \right) v_i$
• else:
    • $x \leftarrow \sum_{i=1}^{r} \left( b_i / \sigma_i \right) v_i$
• end if

Computing the SVD is the most computationally intensive operation in the above algorithm.
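The steps above can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `lsq_sphere`, the numerical rank threshold, and the use of bisection to solve for $\lambda^*$ are all choices made here, and $A$ is assumed to be nonzero.

```python
import numpy as np

def lsq_sphere(A, b, radius, tol=1e-12):
    """Minimize ||Ax - b||_2 subject to ||x||_2 <= radius (radius > 0)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    bt = U.T @ b
    # Numerical rank: count singular values above a relative threshold.
    r = int(np.sum(s > s[0] * max(A.shape) * np.finfo(float).eps))
    s, bt, V = s[:r], bt[:r], Vt[:r].T

    if np.sum((bt / s) ** 2) > radius**2:
        # Unconstrained minimizer lies outside the sphere: solve for lam.
        def f(lam):
            return np.sum((s * bt / (s**2 + lam)) ** 2) - radius**2
        lo, hi = 0.0, 1.0
        while f(hi) > 0:          # f is decreasing; grow the bracket
            hi *= 2.0
        while hi - lo > tol * max(1.0, hi):
            mid = 0.5 * (lo + hi)
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        lam = 0.5 * (lo + hi)
        coeff = s * bt / (s**2 + lam)
    else:
        # Unconstrained minimizer is already feasible.
        coeff = bt / s
    return V @ coeff

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
x_small = lsq_sphere(A, b, 0.5 * np.linalg.norm(x_ls))   # constraint active
x_large = lsq_sphere(A, b, 2.0 * np.linalg.norm(x_ls))   # constraint inactive
```

When the constraint is active the returned solution sits on the boundary of the sphere; when it is inactive the result coincides with the ordinary least squares solution.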

Spherical Constraint as Ridge Regression Problem

Using Lagrange multipliers to solve the spherical constraint problem results in:

$$\left( A^T A + \lambda I \right) x = A^T b$$

where $\lambda > 0$ and $\|x\|_2 = \alpha$.

These are the normal equations of the ridge regression problem:

$$\min_x \|Ax - b\|_2^2 + \lambda \|x\|_2^2$$

We need some procedure for selecting a suitable $\lambda$.
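To make the connection concrete, the sketch below solves a small ridge problem in three equivalent ways: through the normal equations, through an ordinary least squares problem on a stacked system, and through the SVD. The data and the value of `lam` are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
lam = 0.3
n = A.shape[1]

# 1) Normal equations: (A^T A + lam I) x = A^T b
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# 2) Ordinary LS on the stacked system [A; sqrt(lam) I] x ~ [b; 0]
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])
x_stacked = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# 3) SVD: x = sum_i sigma_i (u_i^T b) / (sigma_i^2 + lam) v_i
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ (s * (U.T @ b) / (s**2 + lam))
```

The SVD form is the one that makes it cheap to evaluate the solution for many values of $\lambda$, since the factorization is computed only once.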

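Define the problem:

$$x_k(\lambda) = \arg\min_x \|D_k (Ax - b)\|_2^2 + \lambda \|x\|_2^2$$

where $D_k = I - e_k e_k^T$ is the matrix operator that zeroes out the $k$th row, i.e. removes the $k$th observation.

Select $\lambda$ to minimize the cross-validation weighted square error:

$$C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( a_k^T x_k(\lambda) - b_k \right)^2$$

where $a_k^T$ is the $k$th row of $A$. This means choosing a $\lambda$ that does not make the final model rely too much on any one observation.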
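A direct, brute-force evaluation of $C(\lambda)$ solves one ridge problem per left-out observation. The sketch below is illustrative only: the helper names `ridge` and `cv_brute_force`, the unit weights, and the candidate grid are assumptions made here.

```python
import numpy as np

def ridge(A, b, lam):
    """Solve (A^T A + lam I) x = A^T b for lam > 0."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def cv_brute_force(A, b, lam, w=None):
    """C(lam): weighted mean squared prediction error on left-out rows."""
    m = A.shape[0]
    w = np.ones(m) if w is None else np.asarray(w)
    total = 0.0
    for k in range(m):
        keep = np.arange(m) != k          # D_k zeroes row k; same as deleting it
        x_k = ridge(A[keep], b[keep], lam)
        total += w[k] * (A[k] @ x_k - b[k]) ** 2
    return total / m

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 3))
b = rng.standard_normal(12)

grid = [0.01, 0.1, 1.0, 10.0]             # hypothetical candidate values
scores = [cv_brute_force(A, b, lam) for lam in grid]
best_lam = grid[int(np.argmin(scores))]
```

This costs $m$ linear solves per candidate $\lambda$, which motivates a closed-form expression for $C(\lambda)$.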

Through some calculation, we find that:

$$C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( \frac{r_k}{\partial r_k / \partial b_k} \right)^2$$

where $r_k$ is an element of the residual vector $r = b - A x(\lambda)$. The expression inside the parentheses can be interpreted as an inverse measure of the impact of the $k$th observation on the model.

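Using the SVD, the function to minimize becomes:

$$C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left[ \frac{ b_k - \sum_{j=1}^{r} u_{kj} \tilde b_j \left( \dfrac{\sigma_j^2}{\sigma_j^2 + \lambda} \right) }{ 1 - \sum_{j=1}^{r} u_{kj}^2 \left( \dfrac{\sigma_j^2}{\sigma_j^2 + \lambda} \right) } \right]^2$$

where $\tilde b = U^T b$ as before. The numerator is the residual $r_k$ and the denominator is $\partial r_k / \partial b_k$.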
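The SVD expression can be compared against an explicit leave-one-out computation. In the sketch below the ratio $r_k / (\partial r_k / \partial b_k)$ is evaluated from a thin SVD and checked against the brute-force definition of $C(\lambda)$; the function names and test data are hypothetical, and $A$ is assumed to have full column rank.

```python
import numpy as np

def cv_svd(A, b, lam, w=None):
    """C(lam) from one thin SVD of A."""
    m = A.shape[0]
    w = np.ones(m) if w is None else np.asarray(w)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    f = s**2 / (s**2 + lam)               # sigma_j^2 / (sigma_j^2 + lam)
    r = b - U @ (f * (U.T @ b))           # residual r = b - A x(lam)
    dr_db = 1.0 - (U**2) @ f              # d r_k / d b_k for each k
    return np.mean(w * (r / dr_db) ** 2)

def cv_leave_one_out(A, b, lam, w=None):
    """C(lam) by solving one ridge problem per deleted row."""
    m, n = A.shape
    w = np.ones(m) if w is None else np.asarray(w)
    total = 0.0
    for k in range(m):
        keep = np.arange(m) != k
        Ak, bk = A[keep], b[keep]
        x_k = np.linalg.solve(Ak.T @ Ak + lam * np.eye(n), Ak.T @ bk)
        total += w[k] * (A[k] @ x_k - b[k]) ** 2
    return total / m

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 3))
b = rng.standard_normal(12)
c_fast = cv_svd(A, b, 0.5)
c_slow = cv_leave_one_out(A, b, 0.5)
```

After the SVD is available, each additional value of $\lambda$ costs only a few matrix-vector products, which makes a search over $\lambda$ inexpensive.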

Equality Constrained Least Squares

We consider a problem similar to LSQI, but with an equality constraint, i.e. an ordinary least squares problem:

$$\min_x \|Ax - b\|_2$$

with the constraint that:

$$Bx = d$$

We assume the following dimensions:

$$A \in \mathbb{R}^{m \times n}, \quad B \in \mathbb{R}^{p \times n}, \quad b \in \mathbb{R}^m, \quad d \in \mathbb{R}^p, \quad \mathrm{rank}(B) = p$$

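We start by calculating the QR factorization of $B^T$:

$$B^T = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$$

with

$$Q \in \mathbb{R}^{n \times n}, \quad R \in \mathbb{R}^{p \times p}, \quad 0 \in \mathbb{R}^{(n-p) \times p}$$

and then add the following definitions:

$$AQ = [A_1 \; A_2], \qquad Q^T x = \begin{bmatrix} y \\ z \end{bmatrix}$$

where $A_1$ has $p$ columns and $y \in \mathbb{R}^p$.

This gives us:

$$Bx = \left( Q \begin{bmatrix} R \\ 0 \end{bmatrix} \right)^T x = [R^T \; 0] \, Q^T x = [R^T \; 0] \begin{bmatrix} y \\ z \end{bmatrix} = R^T y$$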

We also get (because $Q Q^T = I$):

$$Ax = (AQ)(Q^T x) = [A_1 \; A_2] \begin{bmatrix} y \\ z \end{bmatrix} = A_1 y + A_2 z$$

So the problem becomes:

$$\min \|A_1 y + A_2 z - b\|_2 \quad \text{subject to} \quad R^T y = d$$

where $y$ is determined directly from the constraint, and then inserted into the LS problem:

$$\min_z \|A_2 z - (b - A_1 y)\|_2$$

giving us a vector $z$ which can be used to find the final answer:

$$x = Q \begin{bmatrix} y \\ z \end{bmatrix}$$
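The QR-based procedure can be sketched in NumPy as follows. This is an illustrative implementation under the stated assumption $\mathrm{rank}(B) = p$; the function name `lse_qr` and the test data are hypothetical, and a general solver is used for the triangular system $R^T y = d$ to keep the example short.

```python
import numpy as np

def lse_qr(A, b, B, d):
    """Minimize ||Ax - b||_2 subject to Bx = d, assuming rank(B) = p."""
    p, n = B.shape
    Q, R_full = np.linalg.qr(B.T, mode="complete")   # B^T = Q [R; 0], Q is n x n
    R = R_full[:p, :]                                # p x p upper triangular
    y = np.linalg.solve(R.T, d)                      # constraint: R^T y = d
    AQ = A @ Q
    A1, A2 = AQ[:, :p], AQ[:, p:]
    z = np.linalg.lstsq(A2, b - A1 @ y, rcond=None)[0]
    return Q @ np.concatenate([y, z])

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
B = rng.standard_normal((2, 4))
d = rng.standard_normal(2)
x = lse_qr(A, b, B, d)

# Independent reference: solve the KKT system [[A^T A, B^T], [B, 0]].
K = np.block([[A.T @ A, B.T], [B, np.zeros((2, 2))]])
x_kkt = np.linalg.solve(K, np.concatenate([A.T @ b, d]))[:4]
```

The KKT system appears here only as a cross-check; the QR approach avoids forming $A^T A$.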

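The Method of Weighting

A method for approximating the solution of the LSE problem (minimize $\|Ax - b\|_2$ s.t. $Bx = d$) through an ordinary, unconstrained LS problem:

$$\min_x \left\| \begin{bmatrix} A \\ \lambda B \end{bmatrix} x - \begin{bmatrix} b \\ \lambda d \end{bmatrix} \right\|_2$$

for large values of $\lambda$.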

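The exact solution to the LSE problem, with $u_i$, $v_i$ and $x_i$ denoting the columns of $U$, $V$ and $X$ from the generalized SVD:

$$x = \sum_{i=1}^{p} \frac{v_i^T d}{\beta_i} x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i} x_i$$

The approximation follows from the componentwise normal equations of the weighted problem, $(\alpha_i^2 + \lambda^2 \beta_i^2) y_i = \alpha_i u_i^T b + \lambda^2 \beta_i v_i^T d$:

$$x(\lambda) = \sum_{i=1}^{p} \frac{\alpha_i u_i^T b + \lambda^2 \beta_i v_i^T d}{\alpha_i^2 + \lambda^2 \beta_i^2} x_i + \sum_{i=p+1}^{n} \frac{u_i^T b}{\alpha_i} x_i$$

The difference:

$$x(\lambda) - x = \sum_{i=1}^{p} \frac{\alpha_i \left( \beta_i u_i^T b - \alpha_i v_i^T d \right)}{\beta_i \left( \alpha_i^2 + \lambda^2 \beta_i^2 \right)} x_i$$

It is apparent that as $\lambda$ grows larger, the approximation error is reduced. This method is attractive because it only uses ordinary LS solving.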
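The behaviour of the weighting method can be observed on a small example. The sketch below stacks the weighted system, solves it with an ordinary LS routine, and measures the distance to a reference LSE solution obtained from the KKT system. The data, the chosen values of $\lambda$, and the helper name `weighted_solution` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
B = rng.standard_normal((2, 4))
d = rng.standard_normal(2)

# Reference LSE solution from the KKT system (used only for comparison).
K = np.block([[A.T @ A, B.T], [B, np.zeros((2, 2))]])
x_exact = np.linalg.solve(K, np.concatenate([A.T @ b, d]))[:4]

def weighted_solution(lam):
    """Solve min || [A; lam B] x - [b; lam d] ||_2 with an ordinary LS solver."""
    M = np.vstack([A, lam * B])
    rhs = np.concatenate([b, lam * d])
    return np.linalg.lstsq(M, rhs, rcond=None)[0]

lams = (1e1, 1e2, 1e3)
errors = [np.linalg.norm(weighted_solution(lam) - x_exact) for lam in lams]
constraint_gap = np.linalg.norm(B @ weighted_solution(1e3) - d)
```

The error terms carry a factor $1/(\alpha_i^2 + \lambda^2 \beta_i^2)$, so the error shrinks roughly like $1/\lambda^2$. In floating point, very large $\lambda$ makes the stacked matrix badly scaled, so $\lambda$ cannot be increased without limit.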