Constrained Least Squares (folk.uio.no, 2005-05-11)
UNIVERSITETET I OSLO
INSTITUTT FOR INFORMATIKK, CICN, May 2005
Constrained Least Squares
Authors: G.H. Golub and C.F. Van Loan
Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp. 580-587
Background
The least squares problem:
min_x ‖Ax − b‖₂
Sometimes we want x to be chosen from some proper subset S ⊂ R^n. Example: S = {x ∈ R^n : ‖x‖₂ = 1}. Such problems can be solved using the QR factorization and the singular value decomposition (SVD).
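As a minimal point of reference before the constrained variants, the unconstrained problem can be solved with `numpy.linalg.lstsq` (a sketch assuming NumPy; the data here are arbitrary):

```python
import numpy as np

# Overdetermined system: 5 equations, 2 unknowns (arbitrary example data).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Unconstrained least squares: min_x ||Ax - b||_2
x, residual_ss, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# The residual Ax - b is orthogonal to the column space of A,
# i.e. the normal equations A^T (Ax - b) = 0 hold.
print(np.allclose(A.T @ (A @ x - b), 0))  # True
```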
Least Squares with a Quadratic Inequality Constraint (LSQI)
General problem:
min_x ‖Ax − b‖₂  subject to  ‖Bx − d‖₂ ≤ α
where:
A ∈ R^{m×n} (m ≥ n), b ∈ R^m, B ∈ R^{p×n}, d ∈ R^p, α ≥ 0
Assume the generalized SVD of the matrices A and B is given as:
UᵀAX = D_A = diag(α₁, ..., α_n), UᵀU = I_m
VᵀBX = D_B = diag(β₁, ..., β_q), VᵀV = I_p, q = min{p, n}
Assume also the following definitions:
b̃ ≜ Uᵀb, d̃ ≜ Vᵀd, y ≜ X⁻¹x
Then the problem becomes:
min_y ‖D_A y − b̃‖₂  subject to  ‖D_B y − d̃‖₂ ≤ α
min_y ‖D_A y − b̃‖₂  subject to  ‖D_B y − d̃‖₂ ≤ α
Correctness: by inserting the definitions we get:
‖D_A y − b̃‖₂ = ‖UᵀAX X⁻¹x − Uᵀb‖₂ = ‖Uᵀ(Ax − b)‖₂
Multiplication by an orthogonal matrix does not affect the 2-norm. (The same argument applies to the inequality constraint.)
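This invariance is easy to check numerically (a small NumPy sketch; the orthogonal matrix Q here is an arbitrary one obtained from a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(1)
# A random orthogonal matrix: the Q factor of a random square matrix.
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
v = rng.standard_normal(6)

# ||Qv||_2^2 = v^T Q^T Q v = v^T v = ||v||_2^2, since Q^T Q = I.
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))  # True
```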
The objective function becomes:
Σ_{i=1}^{n} (α_i y_i − b̃_i)² + Σ_{i=n+1}^{m} b̃_i²
The constraint becomes:
Σ_{i=1}^{r} (β_i y_i − d̃_i)² + Σ_{i=r+1}^{p} d̃_i² ≤ α²
where:
r = rank(B)
β_{r+1} = β_{r+2} = ... = β_q = 0
We have a solution if and only if:
Σ_{i=r+1}^{p} d̃_i² ≤ α²
Otherwise, there is obviously no way to satisfy the constraint: this tail sum does not depend on y at all.
Special Case: Σ_{i=r+1}^{p} d̃_i² = α²
In this case the first sum in the constraint must equal zero, which means:
y_i = d̃_i / β_i, i ∈ [1, r]
The remaining variables can be chosen to minimize the first sum in the objective:
y_i = b̃_i / α_i, i ∈ [r+1, n]
(Of course, if α_i = 0 for some i ∈ [r+1, n], this quotient is undefined; we then choose y_i = 0.)
The General Case: Σ_{i=r+1}^{p} d̃_i² < α²
The minimizer (without regard to the constraint) is given by:
y_i = b̃_i / α_i if α_i ≠ 0, and y_i = d̃_i / β_i if α_i = 0
This may or may not be a feasible solution, depending on whether it lies in S.
The Method of Lagrange Multipliers
h(λ, y) = ‖D_A y − b̃‖₂² + λ(‖D_B y − d̃‖₂² − α²)
Solving ∂h/∂y_i = 0 for i = 1, ..., n yields:
(D_AᵀD_A + λ D_BᵀD_B) y = D_Aᵀ b̃ + λ D_Bᵀ d̃
Solution using Lagrange multipliers:
y_i(λ) = (α_i b̃_i + λ β_i d̃_i) / (α_i² + λ β_i²), i = 1, 2, ..., q
y_i(λ) = b̃_i / α_i, i = q+1, ..., n
Determining the Lagrange parameter λ
Define:
φ(λ) ≜ ‖D_B y(λ) − d̃‖₂² = Σ_{i=1}^{r} ( α_i(β_i b̃_i − α_i d̃_i) / (α_i² + λ β_i²) )² + Σ_{i=r+1}^{p} d̃_i²
Solve φ(λ) = α². Because φ(0) > α² (otherwise the unconstrained solution would already be feasible) and the function is monotonically decreasing for λ > 0, there must be a unique positive solution λ* with φ(λ*) = α².
Algorithm: Spherical Constraint
The special case B = I_n, d = 0, α > 0 can be interpreted as selecting x from within an n-dimensional sphere of radius α. It can be solved using the following algorithm:
• [U, Σ, V] ← SVD(A)
• b̃ ← Uᵀb
• r ← rank(A)
Algorithm: Spherical Constraint (continued)
Here b̃ = Uᵀb, and σ_i, v_i denote the singular values and right singular vectors of A:
• if Σ_{i=1}^{r} (b̃_i / σ_i)² > α²:
  • λ* ← solve( Σ_{i=1}^{r} (σ_i b̃_i / (σ_i² + λ*))² = α² )
  • x ← Σ_{i=1}^{r} (σ_i b̃_i / (σ_i² + λ*)) v_i
• else:
  • x ← Σ_{i=1}^{r} (b̃_i / σ_i) v_i
• end if
Computing the SVD is the most computationally intensive operation in the above algorithm.
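The algorithm above can be sketched in NumPy (a sketch, not a reference implementation: `ls_sphere` is a name chosen here, the rank tolerance is an ad hoc assumption, and the scalar equation for λ* is solved by plain bisection, exploiting that the left-hand side is monotonically decreasing):

```python
import numpy as np

def ls_sphere(A, b, alpha):
    """Sketch of: min ||Ax - b||_2 subject to ||x||_2 <= alpha (B = I, d = 0)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > s[0] * 1e-12))        # numerical rank (ad hoc tolerance)
    bt = U.T @ b                             # b~ = U^T b
    # Minimum-norm unconstrained solution uses y_i = b~_i / sigma_i.
    if np.sum((bt[:r] / s[:r]) ** 2) <= alpha ** 2:
        return Vt[:r].T @ (bt[:r] / s[:r])   # already feasible
    # Otherwise find lam* > 0 with sum_i (sigma_i b~_i/(sigma_i^2+lam))^2 = alpha^2.
    def phi(lam):
        return np.sum((s[:r] * bt[:r] / (s[:r] ** 2 + lam)) ** 2) - alpha ** 2
    lo, hi = 0.0, 1.0
    while phi(hi) > 0.0:                     # bracket the root (phi is decreasing)
        hi *= 2.0
    for _ in range(200):                     # bisection to machine precision
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) > 0.0 else (lo, mid)
    lam_star = 0.5 * (lo + hi)
    return Vt[:r].T @ (s[:r] * bt[:r] / (s[:r] ** 2 + lam_star))

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
alpha = 0.5 * np.linalg.norm(x_ls)           # radius chosen so the constraint binds
x = ls_sphere(A, b, alpha)
print(np.isclose(np.linalg.norm(x), alpha))  # True: the constraint is active
```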
Spherical Constraint as a Ridge Regression Problem
Using Lagrange multipliers to solve the spherical constraint problem results in:
(AᵀA + λI) x = Aᵀb
where λ > 0 and ‖x‖₂ = α.
This is the solution to the ridge regression problem:
min_x ‖Ax − b‖₂² + λ‖x‖₂²
We need some procedure for selecting a suitable λ.
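The ridge system can also be posed as an ordinary LS problem by stacking √λ·I under A and zeros under b, which is numerically preferable to forming AᵀA. A NumPy sketch with arbitrary data, checking that both routes give the same minimizer:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
lam = 0.7

# Ridge solution via the normal equations (A^T A + lam I) x = A^T b.
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ b)

# The same minimizer from an ordinary LS problem on an augmented system:
# min || [A; sqrt(lam) I] x - [b; 0] ||_2
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(4)])
b_aug = np.concatenate([b, np.zeros(4)])
x_stacked, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

print(np.allclose(x_normal, x_stacked))  # True
```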
Define the problem:
x_k(λ) = argmin_x ‖D_k(Ax − b)‖₂² + λ‖x‖₂²
where D_k = I − e_k e_kᵀ is the matrix operator that zeroes out the kth row, i.e. removes the kth observation.
Select λ to minimize the cross-validation weighted squared error:
C(λ) = (1/m) Σ_{k=1}^{m} w_k (a_kᵀ x_k(λ) − b_k)²
This means choosing a λ that does not make the final model rely too much on any one observation.
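One simple, if expensive, procedure is a grid search over candidate λ values, computing C(λ) by explicitly re-solving the ridge problem with each observation removed (a NumPy sketch with unit weights w_k = 1; `loo_score` and the grid are names and choices made here, not from the text):

```python
import numpy as np

def loo_score(A, b, lam):
    """C(lam): leave-one-out squared prediction error, by m explicit ridge solves."""
    m, n = A.shape
    err = 0.0
    for k in range(m):
        mask = np.arange(m) != k               # D_k: drop observation k
        Ak, bk = A[mask], b[mask]
        xk = np.linalg.solve(Ak.T @ Ak + lam * np.eye(n), Ak.T @ bk)
        err += (A[k] @ xk - b[k]) ** 2
    return err / m

rng = np.random.default_rng(4)
A = rng.standard_normal((12, 3))
b = rng.standard_normal(12)
lams = [0.01, 0.1, 1.0, 10.0]
scores = {lam: loo_score(A, b, lam) for lam in lams}
best = min(scores, key=scores.get)             # grid point minimizing C(lam)
```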
Through some calculation, we find that:
C(λ) = (1/m) Σ_{k=1}^{m} w_k ( r_k / (∂r_k/∂b_k) )²
where r_k is an element of the residual vector r = b − Ax(λ). The expression inside the parentheses can be interpreted as an inverse measure of the impact of the kth observation on the model.
Using the SVD, the minimization problem reduces to:
C(λ) = (1/m) Σ_{k=1}^{m} w_k [ ( b_k − Σ_{j=1}^{r} u_{kj} b̃_j (σ_j² / (σ_j² + λ)) ) / ( 1 − Σ_{j=1}^{r} u_{kj}² (σ_j² / (σ_j² + λ)) ) ]²
where b̃ = Uᵀb as before; the numerator is exactly the residual r_k and the denominator is ∂r_k/∂b_k.
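This closed form needs one SVD instead of m separate ridge solves. A sketch that implements it and cross-checks it against the naive leave-one-out computation (NumPy assumed, unit weights; both function names are chosen here):

```python
import numpy as np

def loo_score_svd(A, b, lam):
    """C(lam) via the SVD formula: numerator r_k, denominator 1 - sum_j u_kj^2 f_j."""
    m = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    f = s ** 2 / (s ** 2 + lam)            # filter factors sigma_j^2/(sigma_j^2+lam)
    bt = U.T @ b                           # b~ = U^T b
    num = b - U @ (f * bt)                 # r_k = b_k - sum_j u_kj f_j b~_j
    den = 1.0 - (U ** 2) @ f               # 1 - sum_j u_kj^2 f_j
    return np.sum((num / den) ** 2) / m

def loo_score_naive(A, b, lam):
    """C(lam) by m explicit leave-one-out ridge solves (reference computation)."""
    m, n = A.shape
    err = 0.0
    for k in range(m):
        mask = np.arange(m) != k
        xk = np.linalg.solve(A[mask].T @ A[mask] + lam * np.eye(n),
                             A[mask].T @ b[mask])
        err += (A[k] @ xk - b[k]) ** 2
    return err / m

rng = np.random.default_rng(5)
A = rng.standard_normal((12, 3))
b = rng.standard_normal(12)
print(np.isclose(loo_score_svd(A, b, 0.5), loo_score_naive(A, b, 0.5)))  # True
```

The agreement is exact (up to rounding) because for ridge regression the leave-one-out residual equals r_k divided by one minus the kth diagonal entry of the hat matrix.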
Equality Constrained Least Squares (LSE)
We consider a problem similar to LSQI, but with an equality constraint, i.e. an ordinary least squares problem
min_x ‖Ax − b‖₂
with the constraint that:
Bx = d
We assume the following dimensions:
A ∈ R^{m×n}, B ∈ R^{p×n}, b ∈ R^m, d ∈ R^p, rank(B) = p
We start by computing the QR factorization of Bᵀ:
Bᵀ = Q [R; 0]
(semicolons denote vertical stacking) with
Q ∈ R^{n×n}, R ∈ R^{p×p}, 0 ∈ R^{(n−p)×p}
and then add the following definitions:
AQ = [A₁ A₂], Qᵀx = [y; z]
This gives us:
Bx = (Q [R; 0])ᵀ x = [Rᵀ 0] Qᵀx = [Rᵀ 0] [y; z] = Rᵀy
We also get (because QQᵀ = I):
Ax = (AQ)(Qᵀx) = [A₁ A₂] [y; z] = A₁y + A₂z
So the problem becomes:
min ‖A₁y + A₂z − b‖₂  subject to  Rᵀy = d
where y is determined directly from the constraint, and then inserted into the LS problem:
min_z ‖A₂z − (b − A₁y)‖₂
giving us a vector z, which can be used to form the final answer:
x = Q [y; z]
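This null-space procedure can be sketched in NumPy (`lse_nullspace` is a name chosen here; the example data are arbitrary, and B is assumed to have full row rank p):

```python
import numpy as np

def lse_nullspace(A, b, B, d):
    """Sketch of: min ||Ax - b||_2 s.t. Bx = d, via the QR factorization of B^T."""
    p, n = B.shape
    Q, R = np.linalg.qr(B.T, mode="complete")  # B^T = Q [R1; 0], R1 is p x p
    R1 = R[:p, :]
    y = np.linalg.solve(R1.T, d)               # the constraint R1^T y = d fixes y
    A1, A2 = A @ Q[:, :p], A @ Q[:, p:]        # AQ = [A1 A2]
    z, *_ = np.linalg.lstsq(A2, b - A1 @ y, rcond=None)
    return Q @ np.concatenate([y, z])          # x = Q [y; z]

rng = np.random.default_rng(6)
A = rng.standard_normal((9, 5))
B = rng.standard_normal((2, 5))                # full row rank p = 2 (almost surely)
b = rng.standard_normal(9)
d = rng.standard_normal(2)
x = lse_nullspace(A, b, B, d)
print(np.allclose(B @ x, d))  # True: the constraint is satisfied
```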
The Method of Weighting
A method for approximating the solution of the LSE problem (minimize ‖Ax − b‖₂ s.t. Bx = d) through an ordinary, unconstrained LS problem:
min_x ‖ [A; λB] x − [b; λd] ‖₂
for large values of λ.
The exact solution to the LSE problem (x_i denoting the columns of X from the generalized SVD of A and B):
x = Σ_{i=1}^{p} (v_iᵀd / β_i) x_i + Σ_{i=p+1}^{n} (u_iᵀb / α_i) x_i
The approximation:
x(λ) = Σ_{i=1}^{p} ( (α_i u_iᵀb + λ² β_i v_iᵀd) / (α_i² + λ² β_i²) ) x_i + Σ_{i=p+1}^{n} (u_iᵀb / α_i) x_i
The difference:
x(λ) − x = Σ_{i=1}^{p} ( α_i (β_i u_iᵀb − α_i v_iᵀd) / (β_i (α_i² + λ² β_i²)) ) x_i
It is apparent that as λ grows larger, the approximation error is reduced (like 1/λ²). This method is attractive because it only requires ordinary LS solving.
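The convergence of the weighting method can be observed numerically: as λ grows, the constraint residual ‖Bx(λ) − d‖₂ of the stacked LS solution shrinks (a NumPy sketch with arbitrary data; the grid of λ values is an ad hoc choice):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((9, 5))
B = rng.standard_normal((2, 5))
b = rng.standard_normal(9)
d = rng.standard_normal(2)

def weighted_ls(lam):
    """Approximate the LSE solution via ordinary LS on stacked blocks."""
    M = np.vstack([A, lam * B])                # [A; lam B]
    rhs = np.concatenate([b, lam * d])         # [b; lam d]
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return x

# Constraint residual for increasing weights: expected to decrease like 1/lam^2.
res = [np.linalg.norm(B @ weighted_ls(lam) - d) for lam in (1e1, 1e3, 1e5)]
print(res[0] > res[1] > res[2])  # True
```

Extreme weights eventually make the stacked matrix ill-conditioned, so in practice λ is chosen large but not arbitrarily large.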