University of California, Davis
Department of Agricultural and Resource Economics
ARE 252 – Optimization with Economic Applications – Lecture Notes 1
Quirino Paris
1. Economic Equilibrium
2. Opportunity Cost
3. Primal – Dual specification of production
4. Vector Notation
5. Derivatives in vector notation
6. Tangent line
7. Implicit Function Theorem
8. Curvature – Unconstrained maximization – Quadratic forms
9. Determinants
10. Rules to test the definiteness of a symmetric matrix – Quadratic form
11. Solution of linear systems of equations
12. Inverse of a matrix
13. More on solving systems of linear equations
Economic Equilibrium
This set of relations is the most important structure of microeconomics. You can find the discussion of the Economic Equilibrium problem in the Linear Programming¹ textbook (pages 2-4) and the Symmetric Programming textbook (pages 42-44).
Notation:
D ≡ Demand (quantity)
S ≡ Supply (quantity) availability
P ≡ Price
MC ≡ Marginal Cost
MR ≡ Marginal Revenue
Q ≡ Quantity
Definition 1: For any commodity (or service), equilibrium requires that

Primal (Quantity Side):  D ≤ S,  P ≥ 0,  (S − D)P = 0
Dual (Price Side):       MC ≥ MR,  Q ≥ 0,  (MC − MR)Q = 0
The relation D > S is not an equilibrium (solution) condition because, in a free market, there will always be economic agents willing to satisfy the demand. Also MR > MC is not an equilibrium condition. Even within a single firm, an economic agent will never stop production at the point Q where MR > MC because
¹ Paris, Q., An Economic Interpretation of Linear Programming, Palgrave/Macmillan, 2016. Paris, Q., Economic Foundations of Symmetric Programming, Cambridge University Press, 2011.
she would lose the profit associated with the difference (MR − MC)(Q* − Q) where Q* is the profit maximizing quantity at MR = MC .
As we will discuss soon, the Economic Equilibrium relations are a generalization of the Karush-Kuhn-Tucker conditions because they require neither differentiability nor optimization.
(S − D)P = 0 is called the Primal Complementary Slackness Condition (PCSC) because S − D is the slack (surplus) and P (price) and S − D (quantity) are complements. It contains three propositions:

1. if S > D then P = 0 (free good)
2. if P > 0 then S − D = 0
3. P = 0 and S − D = 0 indicate MOS

MOS = Multiple Optimal Solutions.
(MC − MR)Q = 0 is called the Dual Complementary Slackness Condition (DCSC) for analogous reasons. It contains three propositions:

1. if MC > MR then Q = 0
2. if Q > 0 then MC − MR = 0
3. Q = 0 and MC − MR = 0 indicate MOS
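These complementary slackness propositions can be checked mechanically. The sketch below (Python; the function name and tolerance are my own, not from the notes) classifies a primal quantity-side outcome according to Definition 1:

```python
# Hypothetical helper (not from the notes): classify a market outcome
# using Definition 1's primal complementary slackness propositions.
def classify_primal(S, D, P, tol=1e-9):
    """Return which PCSC proposition describes the pair (S - D, P)."""
    assert D <= S + tol and P >= -tol, "not an equilibrium candidate"
    slack = S - D
    assert abs(slack * P) <= tol, "PCSC (S - D)P = 0 violated"
    if slack > tol and abs(P) <= tol:
        return "free good (S > D, P = 0)"
    if P > tol and abs(slack) <= tol:
        return "scarce good (P > 0, S = D)"
    return "possible multiple optimal solutions (S = D, P = 0)"

print(classify_primal(S=10, D=7, P=0))   # proposition 1: free good
print(classify_primal(S=10, D=10, P=3))  # proposition 2: scarce good
```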
Opportunity Cost
The opportunity cost is the single most important notion of economics. Every decision (of your life) can be explained (taken) with the corresponding opportunity cost. This course will deal with opportunity cost in the presence of many inputs and many outputs.
Definition 2:

(Opportunity Cost of a given action) = (Forgone Net Benefits of the best alternative action) − (Gained Net Benefits of the given action)

Therefore, when referred to the action of producing (or not producing) a given commodity, we have

(Opportunity Cost of producing a given commodity) = (Marginal Cost in terms of the Forgone Revenue of the best alternative commodities) − (Gained Marginal Revenue of the given commodity)

In our notation, the opportunity cost of a given commodity is defined as OC = MC − MR.

Example 1. A farmer considers cultivating a combination of wheat, corn and tomatoes. His inputs are land and labor. The technology (input requirements per unit of output) is given as

Input prices            Wheat   Corn   Tomatoes
$10    Land               1       2       3
$12    Labor              2       1       3
Output prices            $30     $40     $66
Decide which crops to produce. (Use the opportunity cost notion: for each commodity, the farmer gives up (foregoes) land and labor.)
OCW = MCW − MRW = (10 + 24) − 30 = 4 > 0    not profitable
OCC = MCC − MRC = (20 + 12) − 40 = −8 < 0   profitable
OCT = MCT − MRT = (30 + 36) − 66 = 0        profitable
Hence, produce corn and tomatoes.
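The same arithmetic can be reproduced with the technology matrix of Example 1. A short numpy sketch (variable names are mine):

```python
import numpy as np

# Technology from Example 1 (rows: land, labor; columns: wheat, corn, tomatoes).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 3.0]])
input_prices = np.array([10.0, 12.0])      # land, labor
output_prices = np.array([30.0, 40.0, 66.0])

MC = input_prices @ A                      # marginal cost of each crop
OC = MC - output_prices                    # opportunity cost OC = MC - MR
for crop, oc in zip(["wheat", "corn", "tomatoes"], OC):
    verdict = "not profitable" if oc > 0 else "profitable"
    print(f"{crop}: OC = {oc:+.0f} -> {verdict}")
```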
Primal and Dual Specifications of Production
In this course, we will deal with linear demand and linear supply functions. We will also deal with a linear technology. This choice is sufficient to understand the structure of production. After all, a nonlinear function can be linearized to any degree of accuracy. This selection allows us to explicitly decompose a profit-maximizing problem into two sub-problems, the first one dealing with maximizing total revenue (TR) and the second one with minimizing total cost (TC). The first problem takes the name of Primal and the second one of Dual.
A general specification of a Primal and Dual problem of a price-taking entrepreneur is stated as

Primal: max TR subject to D ≤ S      (note the formalism: [max, ≤])
Dual:   min TC subject to MC ≥ MR    (note the formalism: [min, ≥])

Required (known) information:
1. Technology: Given I inputs and J outputs, aij is a constant and known technical coefficient (known as Marginal Rate of Technical Transformation – MRTT). It measures the quantity of input i required for the production of one unit of output j, i = 1,…,I, j = 1,…,J.
2. Limiting input supplies: known quantities bi, i = 1,…,I.
3. Output market prices: known prices pj, j = 1,…,J.

Unknown information (prior to solving the problem):
4. Output quantities: xj, j = 1,…,J.
5. Input shadow prices: yi, i = 1,…,I.
Vector Notation
A vector is a carrier of information. It is a box (either vertical or horizontal) containing scalars (mathematical objects). A box containing boxes is called a matrix. We must become familiar with vector and matrix notation for two important reasons: 1. it speeds up the writing of a problem; 2. most importantly, it allows a much better understanding of the geometric structure of a problem. By convention, a vector a is defined as a vertical box. A horizontal box is the original vertical box transposed and written as a′. Consider the following polynomial function of order 2:
z = a1x1 + a2x2 + b11x1² + b12x1x2 + b21x2x1 + b22x2²
To rewrite the above polynomial in vector notation we group similar information into horizontal and vertical boxes and name the boxes (vectors):
z = [a1 a2]⎡x1⎤ + [x1 x2]⎡b11 b12⎤⎡x1⎤ = a′x + x′Bx
           ⎣x2⎦          ⎣b21 b22⎦⎣x2⎦
The symbol “x′” means “transposition” of the vector x. To unbox the information it is sufficient to multiply the elements of two boxes (one horizontal – in front – and one vertical) that bear the same index and add up the products. Note the horizontal-vertical requirement for unboxing the information in the expression above.
The multiplication of two vectors is called the inner (or dot) product: a′x ≡ inner product.
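A minimal numpy illustration of boxing and unboxing (the numeric values of a, B and x are arbitrary, chosen only for the demonstration):

```python
import numpy as np

a = np.array([2.0, 3.0])                 # illustrative coefficient vector (assumed values)
B = np.array([[1.0, 4.0],
              [2.0, 5.0]])               # illustrative coefficient matrix
x = np.array([1.0, 2.0])

inner = a @ x                            # a'x: multiply same-index elements, add the products
quad = x @ B @ x                         # x'Bx: the quadratic part
z = inner + quad                         # z = a'x + x'Bx
print(inner, quad, z)
```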
Derivatives in vector notation.
It is very important to know how to take derivatives of a differentiable function using vector notation. We begin with taking derivatives of the above polynomial function in extended form:
∂z/∂x1 = a1 + 2b11x1 + b12x2 + b21x2
∂z/∂x2 = a2 + b12x1 + b21x1 + 2b22x2

which can be boxed as

⎡∂z/∂x1⎤   ⎡a1⎤   ⎡b11 b12⎤⎡x1⎤   ⎡b11 b21⎤⎡x1⎤
⎢      ⎥ = ⎢  ⎥ + ⎢       ⎥⎢  ⎥ + ⎢       ⎥⎢  ⎥
⎣∂z/∂x2⎦   ⎣a2⎦   ⎣b21 b22⎦⎣x2⎦   ⎣b12 b22⎦⎣x2⎦

∂z/∂x = a + Bx + B′x = a + (B + B′)x
Pay great attention to conformability, the rule of adding up vectors of the same dimension. If B is a symmetric matrix, B = B′, and the derivative simplifies to

∂z/∂x = a + (B + B′)x = a + (B + B)x = a + 2Bx.
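The gradient formula a + (B + B′)x can be verified numerically against central finite differences; the values below are assumed for illustration:

```python
import numpy as np

a = np.array([2.0, 3.0])
B = np.array([[1.0, 4.0],
              [2.0, 5.0]])               # deliberately non-symmetric

def z(x):
    return a @ x + x @ B @ x

x0 = np.array([1.0, 2.0])
analytic = a + (B + B.T) @ x0            # the boxed derivative a + (B + B')x

# Central finite differences as an independent check.
eps = 1e-6
numeric = np.array([
    (z(x0 + eps * np.eye(2)[i]) - z(x0 - eps * np.eye(2)[i])) / (2 * eps)
    for i in range(2)
])
print(analytic, numeric)
```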
In a more general notation, the derivative of the differentiable function f(x1, x2, …, xn) = f(x) is called the gradient and is defined as the vertical vector of all first partial derivatives of the given function, that is

∇f(x) = ⎡∂f(x)/∂x1⎤
        ⎢    ⋮    ⎥
        ⎣∂f(x)/∂xn⎦
where the derivatives are evaluated at the point x. The geometric meaning of the gradient is the slope of a function, just like the slope of a hill at a given point on the hill. To make sense, the gradient (slope) must be defined in one agreed-upon direction. Such a direction is orthogonal (perpendicular) to the tangent line of the function f(x1, x2, …, xn) = f(x) evaluated at the point x.
(For a detailed discussion of orthogonality you can consult pages 119-124 of the Linear Programming textbook.)
Tangent line: Example 2. Consider the function y = f(x) = 15 − 2x² and x = 1. It follows that y = f(x = 1) = 13. A known point on the function, therefore, is (ỹ = 13, x̃ = 1). The slope (gradient) of the function at x = 1 is
Gradient of the function: ∂f(x)/∂x = −4x; at x = 1, ∂f/∂x = −4.

The equation of the tangent line uses the slope formula RISE/RUN, that is

rise/run = (y − ỹ)/(x − x̃) = (y − 13)/(x − 1) = −4

y − 13 = −4(x − 1)
y = 17 − 4x     tangent line at (ỹ = 13, x̃ = 1)
Note that the slope of the tangent line is collinear to the slope (gradient) of the function.
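Example 2 can be reproduced in a few lines (a sketch; the function and evaluation point come from the example):

```python
def f(x):
    return 15 - 2 * x**2

x0 = 1.0
slope = -4 * x0                          # f'(x) = -4x evaluated at x = 1
y0 = f(x0)                               # the point on the curve: 13
intercept = y0 - slope * x0              # rearranging y - y0 = slope*(x - x0)
print(f"tangent line: y = {intercept:.0f} {slope:+.0f}x")
```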
Linear function. Consider the linear function

z = c1x1 + c2x2 + … + cnxn = [c1 c2 … cn]⎡x1⎤ = c′x
                                         ⎢x2⎥
                                         ⎢⋮ ⎥
                                         ⎣xn⎦
The gradient of this function is the vector of first partial derivatives (as stated above). Then

∇f(x) = ∂z(x)/∂x = c

and it was stated also that the gradient is always orthogonal to the tangent line of the function evaluated at a given point, say x̃. But, in a linear function, the tangent line at x̃ (take many different points x̃) is identical to the function itself. Conclusion: the vector of coefficients, c, defining the linear function z = c′x is always orthogonal (perpendicular) to the function itself. This conclusion applies to the cases of a line, plane, and hyperplane.
It is well known that two points define a line, three points (not on the same line) define a plane, and n points (not on the same (n − 1)-dimensional hyperplane) define a hyperplane of n dimensions. An equivalent way to define a line, plane, or hyperplane is to choose a vector c of 2, 3, …, n dimensions and draw the line, plane, or hyperplane orthogonal to it.
The simplicity of example 2 has not fully illustrated the relationship between the gradient of a function, ∇f(⋅), and the vector of coefficients, c, defining the tangent line at a given point on the function. This relationship will be illustrated by example 3.
Example 3. Consider the equation of a circle f(x, y) = r² − (x − h)² − (y − k)² = 0 where (h, k) is the center of the circle and r is the radius. Let the center be (h = 5, k = 5) and the radius be r = 4. Then, the equation of this circle is f(x, y) = 16 − (x − 5)² − (y − 5)² = 0. Choose x = 7 and find the corresponding value of y:
16 − (7 − 5)² − (y − 5)² = 0
16 − 4 − y² + 10y − 25 = 0
−y² + 10y − 13 = 0

Recall the quadratic formula for ay² + by + c = 0, that is, y = (−b ± √(b² − 4ac)) / 2a. Therefore,

y = (−10 ± √(100 − 52)) / (−2) = (−10 ± √48) / (−2) = (−10 ± 6.928) / (−2)
  = −16.928/(−2) = 8.464   or   −3.072/(−2) = 1.536
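The same roots follow from a direct application of the quadratic formula (a small Python check):

```python
import math

# Circle 16 - (x-5)^2 - (y-5)^2 = 0 at x = 7 gives -y^2 + 10y - 13 = 0.
a, b, c = -1.0, 10.0, -13.0
disc = math.sqrt(b**2 - 4 * a * c)       # sqrt(100 - 52) = 6.928...
roots = sorted([(-b + disc) / (2 * a), (-b - disc) / (2 * a)])
print(roots)                             # approximately [1.536, 8.464]
```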
Choose the point (ỹ = 1.536, x̃ = 7) on the given circle. To find the slope of the equation of the circle at the given point (tangent line) we need the Implicit Function Theorem.
Implicit Function Theorem.
Given f: ℝ² → ℝ continuously differentiable and a point (ᾱ, z̄) ∈ ℝ², if ∂f(ᾱ, z̄)/∂z ≠ 0, then there exist neighborhoods Uα of ᾱ and Uz of z̄ and a continuously differentiable function g: Uα → Uz such that, for all α ∈ Uα,

f(α, g(α)) = f(ᾱ, z̄)    (i.e., (α, g(α)) is on the level set through (ᾱ, z̄))

g′(α) = − [∂f(α, g(α))/∂α] / [∂f(α, g(α))/∂z].
The significance of the Implicit Function Theorem is that it gives the conditions for the formula of the slope of the implicit function even though such a function may not be expressed explicitly.
To be clearer, consider the following implicit linear function: f (x, y) = ax + by + c = 0, a ≠ 0,b ≠ 0 .
To make the correspondence with the symbols of the Implicit Function Theorem, note that α ≡ x, z ≡ y . In this case, it is “easy” to find the g(⋅) function and its slope:
y = −(a/b)x − c/b = g(x)                          explicit function

g′(x) = dy/dx = −a/b = −fx(x, y)/fy(x, y)         slope of g(⋅)

In this linear-function example we can find the g(⋅) function and, of course, its slope.
Consider now the more involved implicit function f(x, y) = x³ − 2x²y + 3xy² − 22 = 0.
Let us choose a point (x, y) = (1, 3) and verify that f(1, 3) = 1³ − 2(1)²(3) + 3(1)(3)² − 22 = 1 − 6 + 27 − 22 = 0.
Hence, the point (x, y) = (1, 3) lies on the zero level set defined by f(x, y) = 0. Now, the partial derivative

fy(x, y) = −2x² + 6xy
fy(1, 3) = −2(1)² + 6(1)(3) = −2 + 18 = 16 ≠ 0

is different from zero. Therefore, according to the Implicit Function Theorem, there exists a function y = g(x) near the point (x, y) = (1, 3) which is a solution of f(x, y) = 0. The problem is that it is not possible to find (try!) an explicit formula for the function g(⋅). However, we can find the slope of the function g(⋅) by applying the slope formula given by the IFT:

dy/dx |(x=1, y=3) = g′(x = 1) = −fx(x, y)/fy(x, y) |(x=1, y=3) = −(3x² − 4xy + 3y²)/(−2x² + 6xy) |(x=1, y=3) = −18/16 = −9/8.
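A quick numerical check of the IFT slope at (1, 3) (the lambda helpers are my own shorthand):

```python
def f(x, y):
    return x**3 - 2 * x**2 * y + 3 * x * y**2 - 22

fx = lambda x, y: 3 * x**2 - 4 * x * y + 3 * y**2   # partial derivative in x
fy = lambda x, y: -2 * x**2 + 6 * x * y             # partial derivative in y

x0, y0 = 1.0, 3.0
assert abs(f(x0, y0)) < 1e-12            # (1, 3) lies on the zero level set
slope = -fx(x0, y0) / fy(x0, y0)         # IFT slope formula: -fx/fy
print(slope)                             # -18/16 = -9/8 = -1.125
```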
Back to the equation of the circle. To find the slope of the tangent line to the circle “local function” at the point (ỹ = 1.536, x̃ = 7) we apply the implicit function theorem:

g′(x) = −fx(x, y)/fy(x, y) = −[−2(x − 5)]/[−2(y − 5)] = 4/6.928 = 0.577

The equation of the tangent line to the circle “local function” at (ỹ = 1.536, x̃ = 7) begins with its slope

rise/run = (y − ỹ)/(x − x̃) = (y − 1.536)/(x − 7) = 0.577

Then, the tangent line is

y − 1.536 = 0.577(x − 7)
y = 1.536 − 4.039 + 0.577x
y = −2.503 + 0.577x
Rewrite the tangent line as −0.577x + y = −2.503, or

[−0.577 1]⎡x⎤ = −2.503
          ⎣y⎦
The vector c′ = [−0.577 1] is orthogonal to the tangent line for the reasons explained above.
Now compute the gradient of the circle “local function” at the point (ỹ = 1.536, x̃ = 7):

∇f(x̃, ỹ) = ⎡fx(x̃, ỹ)⎤ = ⎡−2(x̃ − 5)⎤ = ⎡−2(7 − 5)    ⎤ = ⎡−4   ⎤
           ⎣fy(x̃, ỹ)⎦   ⎣−2(ỹ − 5)⎦   ⎣−2(1.536 − 5)⎦   ⎣6.928⎦

But note that

∇f(x̃, ỹ) = ⎡−4   ⎤ = 6.928⎡−0.577⎤ = 6.928c
           ⎣6.928⎦        ⎣1.0   ⎦

That is, the gradient of a function is collinear (up to a scalar) with the vector of coefficients of the tangent line. Figure 1 illustrates this example.
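Collinearity of the gradient and the tangent-line coefficient vector c can be checked with a 2 × 2 determinant (zero up to the rounding in 0.577):

```python
import numpy as np

x0, y0 = 7.0, 1.536
grad = np.array([-2 * (x0 - 5), -2 * (y0 - 5)])   # gradient of the circle: [-4, 6.928]
c = np.array([-0.577, 1.0])                       # tangent-line coefficient vector

# Two vectors are collinear iff the determinant of [grad c] vanishes.
det = grad[0] * c[1] - grad[1] * c[0]
print(grad, det)
```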
Figure 1. The gradient of a function and the c vector of the tangent line
Curvature – Unconstrained maximization – Quadratic forms
(See also chapter 2, pages 16-17, Symmetric Programming textbook)
The maximization (minimization) of a function requires the proper curvature of the function. For example, the function y = f(x) = 16x − 2x² has a maximum at x* = 4. Note that the second derivative of this function is negative, f ′′(x* = 4) = −4 < 0. We say that the function is concave. Figure 2 illustrates the direction of the first and second derivatives in a concave function.
Figure 2. Direction of first and second derivatives of a concave function
Theorem 1. First Order Necessary Conditions (FONC). Let f(⋅): D ⊆ ℝ → ℝ be continuously differentiable throughout D (the domain of the function). If f(⋅) has a local maximum at x = x*, then f ′(x*) = 0.
Theorem 2. Second Order Sufficient Conditions (SOSC). Let f(⋅): D ⊆ ℝ → ℝ be twice continuously differentiable throughout D. If f ′(x*) = 0 and f ′′(x*) < 0 at x = x*, then f(⋅) has a strict local maximum at x = x*.
When dealing with a function of two or more variables to be maximized we must generalize the notion of concavity in terms of second partial derivatives of the given function.
Two-Variable Unconstrained Maximization. We want to use the development of theorems 1 and 2 to discuss this case. The scope of this section is to study the conditions for achieving the maximization of a function of two variables, stated as
max_{x1, x2} f(x1, x2).
In order to use the results of theorems 1 and 2 (one-variable maximization), we state that if f(x1, x2) has a maximum at x* = (x1*, x2*), then y(t) = f[x1(t), x2(t)] = g(t) has a maximum at x* = (x1*, x2*) for all the curves x(t) = (x1(t), x2(t)) that go through x* = (x1*, x2*), that is, x1* = x1(t = 0) and x2* = x2(t = 0). The advantage of this reformulation of the problem consists in transforming a maximization problem in two variables, (x1, x2), into a maximization problem of one variable, t. The previous development (theorems 1 and 2) found that
FONC: dy/dx (x*) = f ′(x*) = 0   and   SOSC: d²y/dx² (x*) = f ′′(x*) < 0.

Now that we have transformed the two-variable problem into a one-variable specification, we look at the following expressions:

dy/dt (t = 0) = y′(0) = g′(0) = 0   and   d²y/dt² (t = 0) = y′′(0) = g′′(0) < 0.

Theorem 3. (FONC) Let f(⋅): D → ℝ be C² on domain D ⊆ ℝ². If f(⋅) has a local maximum at (x1*, x2*) and (x1*, x2*) ∈ int D, then f1(x1*, x2*) = 0 and f2(x1*, x2*) = 0. (int D stands for the interior of domain D.)
Proof: Consider f(⋅) evaluated along the lines x1(t) = x1* + h1t and x2(t) = x2* + h2t, where h1 and h2 are fixed but arbitrary constants. Thus, the function f(⋅), when evaluated along these lines, is only a function of the parameter t. We may, therefore, define the function y(⋅) of the parameter t as

y(t) = f(x1* + h1t, x2* + h2t).

By hypothesis, (x1*, x2*) is the point at which f(⋅) attains a local maximum. In view of the fact that (x1(t), x2(t)) = (x1*, x2*) at t = 0, it follows that y(0) ≥ y(t) for all t ≠ 0 and sufficiently small, so that y(t) = f(x1* + h1t, x2* + h2t) attains a local maximum at t = 0 by construction. Moreover, because f(⋅) ∈ C² on the domain D, it follows that y(⋅) ∈ C² for all t such that (x1* + h1t, x2* + h2t) ∈ D. Hence, by theorem 1, y′(0) = 0 necessarily holds.

Now compute y′(t) by the chain rule to get

y′(t) = f1(x1* + h1t, x2* + h2t)h1 + f2(x1* + h1t, x2* + h2t)h2.    (A)

Evaluating equation (A) at t = 0 and setting y′(0) = 0 thus yields

y′(0) = f1(x1*, x2*)h1 + f2(x1*, x2*)h2 = 0.    (B)

Equation (B) holds for all values of h1 and h2 that, by assumption, are fixed but arbitrary. Therefore, it must be possible to put any values of h1 and h2 into equation (B) and still obtain y′(0) = 0. As a result, the only way y′(0) = 0 can hold for all values of h1 and h2 is for the coefficients of h1 and h2 to vanish. This implies that f1(x1*, x2*) = 0 and f2(x1*, x2*) = 0.
To see this result from an alternative viewpoint, recall that equation (B) must hold for any values of h1 and h2; it must therefore hold also for h1 = f1(x1*, x2*) and h2 = 0. Substituting these values into equation (B) we obtain

y′(0) = [f1(x1*, x2*)]² = 0

which in turn implies that f1(x1*, x2*) = 0. A symmetric reasoning results in f2(x1*, x2*) = 0.

Another equivalent way is to let h1 = f1(x1*, x2*) and h2 = f2(x1*, x2*). Substituting these values into equation (B) we obtain

y′(0) = [f1(x1*, x2*)]² + [f2(x1*, x2*)]² = 0

which in turn implies that f1(x1*, x2*) = 0 and f2(x1*, x2*) = 0. Q.E.D.
Theorem 4. (SOSC) If the function y(t) = f[x1(t), x2(t)] = g(t) is C² and if dy/dt = 0 and d²y/dt² < 0, then the function has a strict local maximum at x* = (x1*, x2*).

Proof: By hypothesis, the second derivative of the function y(⋅) is strictly negative and develops (using the chain rule) as

d²y/dt² = f1 (d²x1/dt²) + f2 (d²x2/dt²) + f11 (dx1/dt)² + 2f12 (dx1/dt)(dx2/dt) + f22 (dx2/dt)² < 0

        = f11 (dx1/dt)² + 2f12 (dx1/dt)(dx2/dt) + f22 (dx2/dt)² < 0

because f1 = 0, f2 = 0 by theorem 3. Simplifying notation, let h1 ≡ dx1/dt and h2 ≡ dx2/dt. Then

f11h1² + 2f12h1h2 + f22h2² < 0    (2)

for all values of h1, h2 except for the trivial values h1 = 0, h2 = 0. We now prove that f11 < 0 and f22 < 0. In fact, let h1 be any arbitrary number while h2 = 0. Then f11h1² < 0 → f11 < 0. Similarly, let h1 = 0 while h2 may be any arbitrary value. Then f22h2² < 0 → f22 < 0.

Still, these conditions are not sufficient to guarantee the strict inequality of equation (2). What role does the cross-partial second derivative, f12, play? To answer this question we must take a detour and study how to complete the square of the expression z² + 2bz. By adding and subtracting the quantity b², the given expression is equivalent to

z² + 2bz ≡ z² + 2bz + b² − b² ≡ (z + b)² − b².

By factoring out f11 from equation (2) we obtain

f11 [h1² + 2(f12h2/f11)h1 + f22h2²/f11] < 0.

Note the correspondence z = h1 and b ≡ f12h2/f11. Then, by adding and subtracting (f12h2/f11)², we write

f11 [h1² + 2(f12h2/f11)h1 + (f12h2/f11)² − (f12h2/f11)² + f22h2²/f11] < 0

f11 [(h1 + f12h2/f11)² − (f12h2/f11)² + f22h2²/f11] < 0

f11 [(h1 + f12h2/f11)² + (h2²/f11²)(f11f22 − f12²)] < 0

Since f11 < 0, the term in the square bracket must be positive for all the non-trivial values of h1, h2. In particular, (f11f22 − f12²) > 0. Q.E.D.
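The conclusion f11 < 0 and f11f22 − f12² > 0 can be probed numerically: with illustrative (assumed) Hessian entries satisfying both conditions, equation (2) is negative in every sampled direction:

```python
import numpy as np

# Hypothetical Hessian values satisfying f11 < 0 and f11*f22 - f12^2 > 0.
f11, f12, f22 = -2.0, 1.0, -3.0
assert f11 < 0 and f11 * f22 - f12**2 > 0

rng = np.random.default_rng(0)
for _ in range(1000):
    h1, h2 = rng.normal(size=2)          # random non-trivial direction
    Q = f11 * h1**2 + 2 * f12 * h1 * h2 + f22 * h2**2
    assert Q < 0                         # equation (2) holds for every (h1, h2) != 0
print("Q(h1, h2) < 0 for all sampled directions")
```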
An equivalent way to express the condition that equation (2) be strictly negative is the following:
11
Q(h1, h2) ≡ [h1 h2]⎡f11 f12⎤⎡h1⎤ < 0    (3)
                   ⎣f12 f22⎦⎣h2⎦

for all the values of h1, h2 not all zero. The function Q(⋅) is called a quadratic form. We will have to fully understand the properties of quadratic forms. Note that the square matrix of Q(⋅) is symmetric and, indeed, it is called the Hessian matrix of the function f(⋅), which is the matrix of second partial derivatives of f(⋅). By Young's theorem, f12 = f21. Hence,

H ≡ ⎡f11 f12⎤
    ⎣f12 f22⎦
A function that fulfills the condition of equation (3) is called a strictly concave function.
Determinants
A matrix is an arrangement of either row or column vectors. Consider the 2 × 2 matrix A
A = ⎡a b⎤    (4)
    ⎣c d⎦

• Definition 3. The determinant (of matrix A) of order 2, written with straight vertical lines around the matrix A, |A|, is defined as

D2 = |a b| = ad − bc
     |c d|

that is, the product of the elements on the main diagonal (north-west/south-east) minus the product of the elements on the other diagonal (north-east/south-west). Consider now the 3 × 3 matrix A

A = ⎡a11 a12 a13⎤    (5)
    ⎢a21 a22 a23⎥
    ⎣a31 a32 a33⎦
• Definition 4. The determinant of order 3 of matrix A is defined as

D3 = |a11 a12 a13|
     |a21 a22 a23| = a11 |a22 a23| − a12 |a21 a23| + a13 |a21 a22|
     |a31 a32 a33|       |a32 a33|       |a31 a33|       |a31 a32|

   = a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31

The determinant of order 3, D3, is defined in terms of determinants of order 2 multiplied by the elements of row 1. The determinants of order 2 are extracted from matrix A in equation (5) by eliminating the elements of the i-th row and j-th column. For clarity, I repeat here the computation of the above determinant:

D3 = (−1)^(1+1) a11 |a22 a23| + (−1)^(1+2) a12 |a21 a23| + (−1)^(1+3) a13 |a21 a22|
                    |a32 a33|                  |a31 a33|                  |a31 a32|

   = (a22a33 − a23a32)a11 − (a21a33 − a23a31)a12 + (a21a32 − a22a31)a13
   = a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31

Another way to compute the determinant of a 3 × 3 matrix is to copy the first two columns to the right of the matrix coefficients and apply the same rule (+ main diagonals / − secondary diagonals) stated in the definition:

a11 a12 a13 | a11 a12
a21 a22 a23 | a21 a22
a31 a32 a33 | a31 a32

Now multiply the coefficients on the 3 main diagonals (north-west/south-east) with a (+) sign and multiply the coefficients on the 3 other diagonals (north-east/south-west) with a (−) sign to obtain a11a22a33 + a12a23a31 + a13a21a32 − a13a22a31 − a11a23a32 − a12a21a33, the same products and signs as in D3.

• Definition 5. The minor of aij is that determinant, Mij, that remains after row i and column j are eliminated from the original determinant of matrix A.

Note that in D3 three products exhibit a positive sign and three products exhibit a negative sign. This fact leads to the following definition:

• Definition 6. The cofactor of aij, written Aij, is defined as Aij = (−1)^(i+j) Mij.

In terms of the D3 determinant written in definition 4, D3 = a11A11 + a12A12 + a13A13. We say that the determinant D3 was expanded in terms of the elements of row 1. It could have been expanded in terms of the elements of any other row or column. Hence, we can write D3 = Σ_{j=1}^{3} aij Aij for any i, and D3 = Σ_{i=1}^{3} aij Aij for any j. This result applies to a square matrix of any order.

• Theorem 5. If all the elements in a row or column of Dn are 0, then Dn = 0.
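Definition 4's row-1 expansion can be coded directly and compared with numpy's determinant (the helper det3_cofactor is mine, not from the notes):

```python
import numpy as np

def det3_cofactor(A):
    """Expand a 3x3 determinant along row 1 (Definition 4)."""
    def minor(i, j):
        # Eliminate row i and column j, then take the 2x2 determinant.
        M = np.delete(np.delete(A, i, axis=0), j, axis=1)
        return M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]
    return sum((-1) ** j * A[0, j] * minor(0, j) for j in range(3))

A = np.array([[6.0, 1.0, 2.0],
              [-3.0, 4.0, -1.0],
              [1.0, 5.0, 7.0]])
print(det3_cofactor(A), np.linalg.det(A))
```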
• Theorem 6. If Dn′ is obtained from Dn by interchanging any two rows (or columns), then Dn′ = −Dn.

A1 = ⎡ 4 3⎤ ,  A2 = ⎡3  4⎤ ,  |A1| = 20 − (−3) = 23 ,  |A2| = −3 − 20 = −23  ⇒  |A2| = −|A1|
     ⎣−1 5⎦        ⎣5 −1⎦
• Theorem 7. If Dn′ is obtained from Dn by multiplying any row (or column) by some scalar number k, then Dn′ = kDn.

A1 = ⎡ 4 3⎤ ,  A2 = ⎡12 3⎤ ,  |A1| = 20 − (−3) = 23 ,  |A2| = 60 − (−9) = 69
     ⎣−1 5⎦        ⎣−3 5⎦

|A2| = k|A1| with k = 3: one column of A2 is a multiple (k = 3) of the corresponding column of A1.
• Theorem 8. If Dn has 2 rows that are identical, then Dn = 0. If one row (column) is proportional to another row (column), then Dn = 0.

A = ⎡1  3⎤ ,  |A| = 27 − 27 = 0
    ⎣9 27⎦

Columns (rows) are not independent (are collinear). The converse of this theorem is a test for linear independence: if Dn ≠ 0, the column (or row) vectors of the A matrix are linearly independent. In this case, the matrix A is called nonsingular or has rank n.
• Theorem 9. If Dn′ is obtained by adding, term by term, a multiple of any row (column) of Dn to another row (column), then Dn′ = Dn.

A1 = ⎡ 4 3⎤ ,  A2 = ⎡(4 + 9)   3⎤ = ⎡13 3⎤
     ⎣−1 5⎦        ⎣(−1 + 15) 5⎦   ⎣14 5⎦

|A1| = 20 − (−3) = 23 ,  |A2| = 65 − 42 = 23  ⇒  |A1| = |A2|
• Theorem 10. Very important. If the elements of any row (column) are multiplied by the cofactors of some other row (column), the resulting sum is equal to zero. This process is called expansion by alien cofactors.

A = ⎡ 6 1  2⎤ ,  minors of row 1:  M11 = |4 −1| ,  M12 = |−3 −1| ,  M13 = |−3 4|
    ⎢−3 4 −1⎥                            |5  7|          | 1  7|          | 1 5|
    ⎣ 1 5  7⎦

Multiply the cofactors of row 1 by the elements of row 2:

(−1)^(1+1)(−3)|4 −1| + (−1)^(1+2)(4)|−3 −1| + (−1)^(1+3)(−1)|−3 4|
              |5  7|                | 1  7|                 | 1 5|

= (−3)(28 − (−5)) − 4(−21 − (−1)) + (−1)(−15 − 4) = (−3)(33) + 4(20) + 19 = −99 + 80 + 19 = 0
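Theorem 10 can be confirmed numerically on the same matrix: expansion by own cofactors returns the determinant, while expansion by alien cofactors returns zero:

```python
import numpy as np

A = np.array([[6.0, 1.0, 2.0],
              [-3.0, 4.0, -1.0],
              [1.0, 5.0, 7.0]])

def cofactor(A, i, j):
    """Cofactor Aij = (-1)^(i+j) * Mij for a 3x3 matrix."""
    M = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * (M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0])

own = sum(A[0, j] * cofactor(A, 0, j) for j in range(3))    # row 1 elements x row 1 cofactors
alien = sum(A[1, j] * cofactor(A, 0, j) for j in range(3))  # row 2 elements x row 1 cofactors
print(own, alien)   # the determinant, then 0
```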
Rules to test the definiteness of a symmetric matrix (quadratic form)
(See also chapter 5, pages 66-68, Symmetric Programming textbook)

Let A = ⎡a b⎤ and compute [x y]⎡a b⎤⎡x⎤ = ax² + 2bxy + cy². Complete the square on y
        ⎣b c⎦                  ⎣b c⎦⎣y⎦

by adding and subtracting (b²/a)y²:

ax² + 2bxy + (b²/a)y² − (b²/a)y² + cy² = a(x² + (2b/a)xy + (b²/a²)y²) + ((ac − b²)/a)y²

ax² + 2bxy + cy² = a(x + (b/a)y)² + ((ac − b²)/a)y²

Therefore, the 2 × 2 matrix A will be positive definite iff a > 0 and ac − b² > 0. It will be negative definite iff a < 0 and ac − b² > 0. Extrapolating (without proof; if interested, see
https://econ.ucsb.edu/~tedb/Courses/GraduateTheoryUCSB/BlumeSimonCh16.PDF,
http://www2.econ.iastate.edu/classes/econ501/Hallam/documents/Quad_Forms_000.pdf), given a symmetric matrix

A = ⎡a11 a12 … a1n⎤
    ⎢a12 a22 … a2n⎥
    ⎢ ⋮   ⋮  ⋱  ⋮ ⎥
    ⎣a1n a2n … ann⎦

• A is positive definite iff the leading principal minors are positive (principal minors are on the main diagonal – north-west/south-east; leading principal minors start at the north-west tip of the main diagonal):

a11 > 0 ,  |a11 a12| > 0 ,  |a11 a12 a13| > 0 , ……
           |a12 a22|        |a12 a22 a23|
                            |a13 a23 a33|

• A is negative definite iff the leading principal minors alternate in sign:

a11 < 0 ,  |a11 a12| > 0 ,  |a11 a12 a13| < 0 , ……
           |a12 a22|        |a12 a22 a23|
                            |a13 a23 a33|

that is, iff the leading principal minors of order k have sign (−1)^k, k = 1, …, n.

• A is positive semidefinite iff all the principal minors (not only the leading ones) are nonnegative:

a11 ≥ 0 ,  a22 ≥ 0 ,  … ,  ann ≥ 0 ;

|a11 a12| ≥ 0 ,  |a22 a23| ≥ 0 ,  |a11 a13| ≥ 0 , ….. ,  |a(n−1)(n−1) a(n−1)n| ≥ 0 ;
|a12 a22|        |a23 a33|        |a13 a33|              |a(n−1)n     ann    |

|a11 a12 a13|        |a11 a12 a14|        |a22 a23 a24|
|a12 a22 a23| ≥ 0 ,  |a12 a22 a24| ≥ 0 ,  |a23 a33 a34| ≥ 0 , ……
|a13 a23 a33|        |a14 a24 a44|        |a24 a34 a44|

• A is negative semidefinite iff all the principal minors of order k alternate in sign, that is, iff the principal minors of order k have sign (−1)^k (or are zero), k = 1, …, n:

a11 ≤ 0 ,  a22 ≤ 0 ,  … ,  ann ≤ 0 ;

|a11 a12| ≥ 0 ,  |a22 a23| ≥ 0 ,  |a11 a13| ≥ 0 , ….. ,  |a(n−1)(n−1) a(n−1)n| ≥ 0 ;
|a12 a22|        |a23 a33|        |a13 a33|              |a(n−1)n     ann    |

|a11 a12 a13|        |a11 a12 a14|        |a22 a23 a24|
|a12 a22 a23| ≤ 0 ,  |a12 a22 a24| ≤ 0 ,  |a23 a33 a34| ≤ 0 , ……
|a13 a23 a33|        |a14 a24 a44|        |a24 a34 a44|
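The leading-principal-minor tests for strict definiteness can be sketched as follows (classify is a hypothetical helper; remember that for semidefiniteness one must check all principal minors, not only the leading ones):

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the north-west k x k submatrices, k = 1..n."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def classify(A, tol=1e-10):
    """Test strict definiteness of a symmetric matrix via leading principal minors."""
    m = leading_principal_minors(A)
    if all(d > tol for d in m):
        return "positive definite"
    if all((d < -tol if k % 2 == 1 else d > tol) for k, d in enumerate(m, start=1)):
        return "negative definite"
    return "indefinite or semidefinite (check ALL principal minors)"

print(classify(np.array([[2.0, 1.0], [1.0, 3.0]])))     # positive definite
print(classify(np.array([[-2.0, 1.0], [1.0, -3.0]])))   # negative definite
```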
Solution of linear systems of equations
(See also chapter 4, pages 59-60, Symmetric Programming textbook)
Consider the following system of linear equations Ax = b:
⎡a11 a12⎤⎡x1⎤ = ⎡b1⎤        a11x1 + a12x2 = b1
⎣a21 a22⎦⎣x2⎦   ⎣b2⎦        a21x1 + a22x2 = b2

To solve these equations for x1, we multiply the first equation by a22 and the second equation by a12 and subtract the second equation from the first:

a22(a11x1 + a12x2) = a22b1
a12(a21x1 + a22x2) = a12b2
(a11a22 − a12a21)x1 = b1a22 − b2a12

Assuming that a11a22 − a12a21 ≠ 0, the solution for x1 is

x1 = (b1a22 − b2a12) / (a11a22 − a12a21)

The denominator and the numerator of the above fraction correspond to the determinants of the following matrices:

D2 = |a11 a12| = a11a22 − a12a21          Db1 = |b1 a12| = b1a22 − b2a12
     |a21 a22|                                  |b2 a22|

The numerator is created by replacing the column of coefficients associated with x1 by the RHS column of coefficients b of the linear system of equations. This example inspired the development of Cramer's Rule for solving linear systems of equations.
• Theorem 11. Cramer's Rule. Consider a system of n linear equations in n unknowns, Ax = b:

⎡a11 … a1n⎤⎡x1⎤   ⎡b1⎤
⎢ ⋮  ⋱  ⋮ ⎥⎢ ⋮⎥ = ⎢ ⋮⎥
⎣an1 … ann⎦⎣xn⎦   ⎣bn⎦

If the determinant |A| ≠ 0, a unique solution exists for each xi. In particular,

     |a11 … b1 … a1n|
xi = | ⋮    ⋮     ⋮ | / |A|
     |an1 … bn … ann|

where the RHS vector b of the linear system of equations replaces the i-th column in the A matrix.

Proof for a 3 × 3 system of equations:
a11x1 + a12 x2 + a13x3 = b1
a21x1 + a22 x2 + a23x3 = b2
a31x1 + a32 x2 + a33x3 = b3
Solving for x1, multiply the first equation by A11, the cofactor of a11; multiply the second equation by A21, the cofactor of a21; multiply the third equation by A31, the cofactor of a31. Add the three equations and factor out the unknown variables to obtain:
(a11A11 + a21A21 + a31A31)x1 + (a12 A11 + a22 A21 + a32 A31)x2
+ (a13A11 + a23A21 + a33A31)x3 = b1A11 + b2 A21 + b3A31
which reduces to |A| x1 = b1A11 + b2A21 + b3A31
because the term in the first parenthesis is the determinant of the A matrix while the other terms in parenthesis are expansions by alien cofactors and by theorem 10 are equal to zero. Hence,
     |b1 a12 a13|
x1 = |b2 a22 a23| / |A|          Q.E.D.
     |b3 a32 a33|

Similar computations apply to all the other components of the solution vector x.
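Theorem 11 translates directly into code (a sketch; cramer is my own helper, checked against numpy's solver):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's Rule (Theorem 11); requires |A| != 0."""
    detA = np.linalg.det(A)
    assert abs(detA) > 1e-12, "singular matrix: no unique solution"
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                     # replace the i-th column by the RHS vector b
        x[i] = np.linalg.det(Ai) / detA
    return x

# Illustrative (assumed) nonsingular system.
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 0.0]])
b = np.array([4.0, 5.0, 6.0])
print(cramer(A, b), np.linalg.solve(A, b))
```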
Geometric Interpretation of the Determinant and Cramer’s Rule
The determinant of the A matrix in Ax = b ∈ ℝⁿ is equal to the volume of the parallelepiped (whose faces are parallelograms) constructed by adding up the column vectors of the A matrix. The verification of this proposition will be done only for 2 × 2 matrices. Consider the matrix

A = [ a1  a2 ] = | 5  0 |
                 | 0  2 |

Its determinant is |A| = 5 · 2 − 0 = 10. By plotting the column vectors a1 and a2 in a two-dimensional Cartesian diagram with coordinates (5, 0) and (0, 2), respectively, the area of the parallelogram resulting from the addition of the two vectors,

a1 + a2 = | 5 | + | 0 | = | 5 |
          | 0 |   | 2 |   | 2 |

is that of the rectangle with sides of length 5 and 2, whose area equals 5 × 2 = 10.
Consider now the matrix

A = [ a1  a2 ] = | 5  3 |
                 | 1  2 |

Its determinant is |A| = 5 · 2 − 3 · 1 = 7. By using theorem 9 (adding to any row (column) a multiple of another row (column) does not change the determinant of a matrix) twice, it is possible to make the off-diagonal terms equal to zero. It will then be apparent that the area of the resulting matrix is equal to the original determinant. The first operation is to subtract the multiple 3/2 of the second row from the first row:

| 5  3 |  ⇒  | 5 − (3/2) · 1   3 − (3/2) · 2 |  =  | 7/2  0 |
| 1  2 |     | 1               2             |     | 1    2 |

The second operation is to subtract the multiple 2/7 of the first row of the resulting matrix from the second row:

| 7/2  0 |  ⇒  | 7/2                 0 |  =  | 7/2  0 |
| 1    2 |     | 1 − (2/7) · (7/2)   2 |     | 0    2 |

The determinant of the resulting matrix is equal to 7 (as for the original matrix) and, furthermore, the two vectors of the resulting matrix, plotted in a two-dimensional Cartesian diagram, produce a rectangle with sides (7/2, 0) and (0, 2) and an area of (7/2) × 2 = 7. These computations can be extended to a matrix of any dimension. It should not be surprising that we speak here of possibly negative areas (volumes), since the volume spanned by the matrix depends on the disposition of its vectors in the Cartesian diagram.
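The two row operations above can be checked numerically. A small sketch, using the 2 × 2 matrix from the text, verifying that each operation leaves the determinant at 7:

```python
# Subtracting a multiple of one row from another leaves the determinant
# unchanged (theorem 9), illustrated on the matrix [[5, 3], [1, 2]].

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

a = [[5.0, 3.0], [1.0, 2.0]]             # the matrix from the text, |A| = 7

# Step 1: subtract 3/2 of row 2 from row 1 -> first row becomes [7/2, 0]
step1 = [[a[0][0] - 1.5 * a[1][0], a[0][1] - 1.5 * a[1][1]], a[1]]

# Step 2: subtract 2/7 of the new row 1 from row 2 -> second row becomes [0, 2]
step2 = [step1[0],
         [step1[1][0] - (2 / 7) * step1[0][0],
          step1[1][1] - (2 / 7) * step1[0][1]]]

print(det2(a), det2(step1), det2(step2))  # -> 7.0 7.0 7.0
```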
Figure 3. Illustration of Cramer’s Rule
Intuition of Cramer’s Rule. Given Ax = b with A = [ v1  v2 ], define the matrices A1 = [ b  v2 ] and A2 = [ v1  b ]. Cramer’s Rule states that x1 = |A1| / |A| and x2 = |A2| / |A|. Figure 3 illustrates a possible arrangement of the vectors v1, v2, b. The parallelograms (areas, determinants) associated with matrices A and A1 are similar in size; hence, the value of x1 is close to 1. On the contrary, the area of the parallelogram (determinant) associated with matrix A2 is close to zero, and the value of x2 will also be close to zero. This is a consequence of the fact that vector v1 is almost collinear with vector b. In other words, vector v1 can represent vector b almost all by itself, without the contribution of vector v2, as indicated in figure 3. This is why x2 ≈ 0, that is, very small.
Inverse of a Matrix
Theorem 12. Inverse of a square matrix. Given a system of linearly independent equations Ax = b ∈ ℝⁿ, with a square matrix A of order n, the inverse of matrix A is the “reciprocal” of matrix A and is indicated by A⁻¹. Furthermore, AA⁻¹ = A⁻¹A = I, where I is the identity matrix. Proof. Since the equations are linearly independent by assumption, |A| ≠ 0 by the converse of theorem 8. Form the adjoint matrix A*:
A* = | A11  A21  ...  An1 |
     | A12  A22  ...  An2 |
     |  ⋮    ⋮          ⋮ |
     | A1n  A2n  ...  Ann |

which collects the transposed cofactors of the elements aij. Now analyze the product AA*:

| a11  a12  ...  a1n | | A11  A21  ...  An1 |   | |A|   0   ...   0  |
| a21  a22  ...  a2n | | A12  A22  ...  An2 |   |  0   |A|  ...   0  |
|  ⋮    ⋮          ⋮ | |  ⋮    ⋮          ⋮ | = |  ⋮    ⋮    ⋱    ⋮  | = |A| I
| an1  an2  ...  ann | | A1n  A2n  ...  Ann |   |  0    0   ...  |A| |
The inner products on the main diagonal are all equal to |A| by definition 6. The off-diagonal inner products are equal to zero by theorem 10, expansion by alien cofactors. Hence, from AA* = |A| I:

A⁻¹ = A*/|A| = | A11/|A|  ...  An1/|A| |
               |    ⋮              ⋮   |
               | A1n/|A|  ...  Ann/|A| |

Also:
AA⁻¹ = I
A⁻¹AA⁻¹ = A⁻¹I
A⁻¹AA⁻¹A = A⁻¹IA = A⁻¹A
A⁻¹A = I = AA⁻¹

The inverse of a matrix is unique, and it does not matter on which side matrix A is multiplied by its inverse. Q.E.D.
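Theorem 12 translates directly into code: build the adjoint from transposed cofactors and divide by |A|, then check that AA⁻¹ = I. A minimal sketch with an illustrative 3 × 3 matrix (the helper names and test values are mine, not from the notes):

```python
# Inverse via the adjoint matrix: entry (i, j) of A^{-1} is the cofactor
# A_{ji} divided by |A| (note the transposition of indices).

def det(m):
    """Determinant by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] *
               det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(n))

def inverse(a):
    n = len(a)
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # minor of a_{ji}: delete row j and column i
            minor = [row[:i] + row[i + 1:]
                     for k, row in enumerate(a) if k != j]
            inv[i][j] = (-1) ** (i + j) * det(minor) / d
    return inv

a = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]
ainv = inverse(a)
prod = [[sum(a[i][k] * ainv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
print(prod)  # -> the 3x3 identity matrix
```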
More on Solving Systems of Linear Equations
There is more to discuss about a system of linear equations in order to understand its meaning and its geometric solution in an intuitive but important way.
• A system of linear equations Ax = b ∈ ℝⁿ, x ∈ ℝⁿ, has three components: a vector of known coefficients b ∈ ℝⁿ constitutes the RHS (by convention) of the system Ax = b. Hence, the vector b is a point in ℝⁿ that should be expressed equivalently by what is written as the LHS of the system, that is, Ax.
• This operation is analogous to measuring an object with a ruler. For example, the length of a table can be measured with a ruler defined in inches (or centimeters) to obtain a measurement of the table’s length. Hence, if the vector b is the “mathematical object” to be measured, matrix A is the “mathematical ruler” to use, and the vector x will be the resulting “measurement.” In fact, if |A| ≠ 0, matrix A constitutes a basis (“mathematical ruler”) for the ℝⁿ space.
• Definition 7. In a vector space V ⊆ ℝⁿ, a basis is a linearly independent subset of vectors {v1, ..., vm} ⊂ V that spans V. The spanning property states that every vector x ∈ V can be expressed as a linear combination of the basis vectors:

x = Σ_{k=1}^{m} λk vk,  λk ∈ ℝ.
The Cartesian coordinate system is a basis with vectors {e1, e2, ..., em}, where each vector ek, k = 1, ..., m, exhibits a unit value in location k and zeros everywhere else. In general, a vector space may have many bases. Without loss of generality, therefore, it is possible to say that any system of linear equations Ax = b with |A| ≠ 0 has the following intuitive interpretation:
        A          ·        x         =          b
[ math ruler ]      [ measurement ]     [ math object to ]
[  (basis)   ]      [ (solution)  ]     [  be measured   ]
Example 6. This example, in ℝ², shows how to find the solution of a system of linear equations without manipulating any coefficient of the problem. It follows the discussion developed above about a) the mathematical object to be measured, b) the mathematical ruler to use, and c) finding the measurement of the given object. In order to make sure that no coefficient manipulation is performed, the problem is given in the diagram of figure 3(a), where the “object to be measured” (vector b) is located between the two vectors A = [ a1  a2 ] that constitute the “mathematical ruler” (basis) to use in the “measurement.” In its algebraic form, the given system of linear equations is stated as

[ a1  a2 ] | x1 | = a1 x1 + a2 x2 = b
           | x2 |

The crucial idea is to construct the parallelogram using vectors a1 and a2 as the sides of the parallelogram, so that the vector b is its main diagonal.
Panel (a)                                Panel (b)
Figure 3. Solution of a linear system of equations by parallelogram
Panel (a) of figure 3 shows the problem as given and described above. Panel (b) exhibits the solution (measurement) achieved by constructing the relevant parallelogram. First, extend as necessary the half rays (dotted lines) along vectors a1 and a2 (in this case it is necessary to extend only vector a1). Vectors a1 and a2 constitute the “mathematical ruler” (basis) to be used to “measure” vector b. Second, from the tip of vector b, draw the dashed lines parallel to the opposite sides. The intersection points of the dashed lines with the half rays determine the vectors a1 x1 and a2 x2 of the given system of equations. Third, read the values of x1 and x2 by evaluating how long (or how short) vectors a1 x1 and a2 x2 are. In this case, a1 x1 is twice as long (“measurement” or solution) as vector a1, and vector a2 x2 is half of vector a2. Hence, x1 = 2 and x2 = 1/2. No manipulation of coefficients was required.
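The same “measurement” can be computed rather than read off the diagram. The vectors below are hypothetical stand-ins for those of figure 3 (the figure’s exact coordinates are not given in the notes); they are chosen so that the resulting measurements match the example’s x1 = 2 and x2 = 1/2:

```python
# Express b in the basis [a1 a2] by solving a1*x1 + a2*x2 = b
# with the 2x2 Cramer formulas. Vectors are illustrative values.

a1, a2 = (1.0, 1.0), (2.0, 0.0)           # the "mathematical ruler" (basis)
b = (3.0, 2.0)                            # the "object to be measured"

d = a1[0] * a2[1] - a2[0] * a1[1]         # |A| with columns a1, a2
x1 = (b[0] * a2[1] - a2[0] * b[1]) / d    # Cramer: b replaces column 1
x2 = (a1[0] * b[1] - b[0] * a1[1]) / d    # Cramer: b replaces column 2
print(x1, x2)                             # -> 2.0 0.5
```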
Graphical Appendix on Quadratic Forms
Q(x) = x′Ax is called a quadratic form, where A is a symmetric matrix of dimension (n × n).
If Q(x) = x′Ax < 0 for all vectors x ≠ 0, the quadratic form is negative definite. Example: in a 3-dimensional space, the quadratic form lies entirely below the zero plane except for the point (x1 = 0, x2 = 0). In other words, the graph of the quadratic form Q(x1, x2) = a11 x1² + 2 a12 x1 x2 + a22 x2² touches the zero plane only at the point (x1 = 0, x2 = 0). Figure 4 illustrates a negative definite quadratic form for specific values of the coefficients aij, i, j = 1, 2. The quadratic form looks like a half ball that touches the zero plane only at one point.
Figure 4. Negative Definite Quadratic Form: −2x1² − 2x2²
If Q(x) = x′Ax ≤ 0 for all vectors x, the quadratic form is negative semidefinite. Example: in a 3-dimensional space, the quadratic form lies entirely below the zero plane except for all the points along a line. Figure 5 illustrates a negative semidefinite quadratic form for specific values of the coefficients aij, i, j = 1, 2. The quadratic form looks like half a cylinder that touches the zero plane along a line.
Figure 5. Negative Semidefinite Quadratic Form: −2x1² + 4x1x2 − 2x2²
If Q(x) = x′Ax > 0 for all vectors x ≠ 0, the quadratic form is positive definite. Example: in a 3-dimensional space, the quadratic form lies entirely above the zero plane except for the point (x1 = 0, x2 = 0). In other words, the graph of the quadratic form Q(x1, x2) = a11 x1² + 2 a12 x1 x2 + a22 x2² touches the zero plane only at the point (x1 = 0, x2 = 0). Figure 6 illustrates a positive definite quadratic form for specific values of the coefficients aij, i, j = 1, 2. The quadratic form looks like a half ball that touches the zero plane only at one point.
Figure 6. Positive Definite Quadratic Form: 3x1² + 3x2²
If Q(x) = x′Ax ≥ 0 for all vectors x, the quadratic form is positive semidefinite. Example: in a 3-dimensional space, the quadratic form lies entirely above the zero plane except for all the points along a line. Figure 7 illustrates a positive semidefinite quadratic form for specific values of the coefficients aij, i, j = 1, 2. The quadratic form looks like half a cylinder that touches the zero plane along a line.
Figure 7. Positive Semidefinite Quadratic Form: 2x1² + 4x1x2 + 2x2²
If Q(x) = x′Ax ≥ 0 for some vectors x and Q(x) = x′Ax ≤ 0 for some other vectors x, the quadratic form is indefinite. Example: in a 3-dimensional space, the quadratic form lies partly above and partly below the zero plane. Figure 8 illustrates an indefinite quadratic form for specific values of the coefficients aij, i, j = 1, 2.
Figure 8. Indefinite Quadratic Form: −2x1² + 4x1x2 + 2x2²
Test for Quadratic Forms
The decision whether a symmetric matrix A of dimension (n × n) is negative or positive definite relies on the evaluation of the principal minors of matrix A. Principal minors are determinants of submatrices taken along the main diagonal of the A matrix. Leading principal minors are determinants of submatrices of increasing size along the main diagonal, starting from the top-left corner. Figure 9 illustrates the structure of the A matrix in relation to leading principal minors.
Figure 9. Leading Principal Minors of the A Matrix
The quadratic form Q(x) = x′Ax is negative definite if and only if

a11 < 0,   | a11  a12 | > 0,   | a11  a12  a13 |
           | a21  a22 |        | a21  a22  a23 | < 0,  …,  (−1)^k Mk > 0
                               | a31  a32  a33 |

where Mk is the leading principal minor of order k, k = 1, ..., n. That is, the leading principal minors alternate in sign, starting with a negative one.
The quadratic form Q(x) = x′Ax is positive definite if and only if

a11 > 0,   | a11  a12 | > 0,   | a11  a12  a13 |
           | a21  a22 |        | a21  a22  a23 | > 0,  …,  Mk > 0
                               | a31  a32  a33 |

where Mk is the leading principal minor of order k, k = 1, ..., n.
Example: the least-squares matrix (X′X) of a linear statistical model, where X is the matrix of explanatory variables, is a positive definite matrix when X has full column rank. If it were not positive definite, its inverse would not exist and no least-squares parameters could be estimated. Hence, you have already dealt with positive definite quadratic forms.
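The sign rules above translate into a small test on leading principal minors. A sketch in pure Python (the helper names are mine, and the recursive determinant is practical only for small matrices), checked on the matrices of figures 4 and 6:

```python
# Definiteness tests via leading principal minors, matching the sign
# rules stated above.

def det(m):
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] *
               det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(n))

def leading_minors(a):
    """M_1, ..., M_n: determinants of the top-left k x k submatrices."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]

def is_positive_definite(a):
    return all(mk > 0 for mk in leading_minors(a))

def is_negative_definite(a):
    # leading minors alternate: M_1 < 0, M_2 > 0, ..., i.e. (-1)^k M_k > 0
    return all((-1) ** k * mk > 0
               for k, mk in enumerate(leading_minors(a), start=1))

print(is_positive_definite([[3, 0], [0, 3]]))    # figure 6 matrix -> True
print(is_negative_definite([[-2, 0], [0, -2]]))  # figure 4 matrix -> True
```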
The quadratic form Q(x) = x′Ax is negative semidefinite if and only if all principal minors of order k, k = 1, ..., n, alternate in sign:

a11 ≤ 0,  a22 ≤ 0,  a33 ≤ 0,  ...,  akk ≤ 0,

| a11  a12 | ≥ 0,  | a11  a13 | ≥ 0,  | a22  a23 | ≥ 0,  ...,  all other 2 × 2 principal minors ≥ 0,
| a21  a22 |       | a31  a33 |       | a32  a33 |

| a11  a12  a13 |       | a11  a12  a14 |
| a21  a22  a23 | ≤ 0,  | a21  a22  a24 | ≤ 0,  ...,  all other principal minors of order 3 ≤ 0,
| a31  a32  a33 |       | a41  a42  a44 |

and continue as above, alternating in sign, with all principal minors of order k, k = 4, ..., n.
The quadratic form Q(x) = x′Ax is positive semidefinite if and only if all principal minors of order k, k = 1, ..., n, are nonnegative:

a11 ≥ 0,  a22 ≥ 0,  a33 ≥ 0,  ...,  akk ≥ 0,

| a11  a12 | ≥ 0,  | a11  a13 | ≥ 0,  | a22  a23 | ≥ 0,  ...,  all other 2 × 2 principal minors ≥ 0,
| a21  a22 |       | a31  a33 |       | a32  a33 |

| a11  a12  a13 |       | a11  a12  a14 |
| a21  a22  a23 | ≥ 0,  | a21  a22  a24 | ≥ 0,  ...,  all other principal minors of order 3 ≥ 0,
| a31  a32  a33 |       | a41  a42  a44 |

and continue as above with all principal minors of order k, k = 4, ..., n, being nonnegative.
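The semidefiniteness tests differ from the definite case in that they require all principal minors, one for every subset of diagonal indices, not only the leading ones. A sketch (helper names are mine), checked on the matrices of figures 5 and 7:

```python
# Semidefiniteness tests over ALL principal minors: for each subset of
# indices, form the corresponding principal submatrix and take its
# determinant by cofactor expansion.
from itertools import combinations

def det(m):
    n = len(m)
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] *
               det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(n))

def principal_minors(a):
    """Yield (order k, determinant) for every principal submatrix."""
    n = len(a)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            yield k, det([[a[i][j] for j in idx] for i in idx])

def is_positive_semidefinite(a):
    return all(mk >= 0 for _, mk in principal_minors(a))

def is_negative_semidefinite(a):
    # minors of odd order <= 0, even order >= 0, i.e. (-1)^k M_k >= 0
    return all((-1) ** k * mk >= 0 for k, mk in principal_minors(a))

print(is_positive_semidefinite([[2, 2], [2, 2]]))    # figure 7 matrix -> True
print(is_negative_semidefinite([[-2, 2], [2, -2]]))  # figure 5 matrix -> True
```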