linear algebra - quantitative economics with juliaย ยท 1. no vector in ๐ดcan be formed as a linear...
TRANSCRIPT
Linear Algebra
Jesse Perla, Thomas J. Sargent and John Stachurski
May 20, 2020
1 Contents
โข Overview 2โข Vectors 3โข Matrices 4โข Solving Systems of Equations 5โข Eigenvalues and Eigenvectors 6โข Further Topics 7โข Exercises 8โข Solutions 9
2 Overview
Linear algebra is one of the most useful branches of applied mathematics for economists toinvest in.
For example, many applied problems in economics and finance require the solution of a linearsystem of equations, such as
๐ฆ1 = ๐๐ฅ1 + ๐๐ฅ2๐ฆ2 = ๐๐ฅ1 + ๐๐ฅ2
or, more generally,
๐ฆ1 = ๐11๐ฅ1 + ๐12๐ฅ2 + โฏ + ๐1๐๐ฅ๐โฎ
๐ฆ๐ = ๐๐1๐ฅ1 + ๐๐2๐ฅ2 + โฏ + ๐๐๐๐ฅ๐
(1)
The objective here is to solve for the โunknownsโ ๐ฅ1, โฆ , ๐ฅ๐ given ๐11, โฆ , ๐๐๐ and ๐ฆ1, โฆ , ๐ฆ๐.
When considering such problems, it is essential that we first consider at least some of the fol-lowing questions
โข Does a solution actually exist?โข Are there in fact many solutions, and if so how should we interpret them?โข If no solution exists, is there a best โapproximateโ solution?
1
โข If a solution exists, how should we compute it?
These are the kinds of topics addressed by linear algebra.
In this lecture we will cover the basics of linear and matrix algebra, treating both theory andcomputation.
We admit some overlap with this lecture, where operations on Julia arrays were first ex-plained.
Note that this lecture is more theoretical than most, and contains background material thatwill be used in applications as we go along.
3 Vectors
A vector is an element of a vector space.
Vectors can be added together and scaled (multiplied) by scalars.
Vectors can be written as ๐ฅ = [๐ฅ1, โฆ , ๐ฅ๐].
The set of all ๐-vectors is denoted by โ๐.
For example, โ2 is the plane, and a vector in โ2 is just a point in the plane.
Traditionally, vectors are represented visually as arrows from the origin to the point.
The following figure represents three vectors in this manner.
3.1 Setup
In [1]: using InstantiateFromURL# optionally add arguments to force installation: instantiate = true,๏ฟฝ
โชprecompile = truegithub_project("QuantEcon/quantecon-notebooks-julia", version = "0.7.0")
In [2]: using LinearAlgebra, Statistics, Plotsgr(fmt=:png);
In [3]: x_vals = [0 0 0 ; 2 -3 -4]y_vals = [0 0 0 ; 4 3 -3.5]
plot(x_vals, y_vals, arrow = true, color = :blue,legend = :none, xlims = (-5, 5), ylims = (-5, 5),annotations = [(2.2, 4.4, "[2, 4]"),
(-3.3, 3.3, "[-3, 3]"),(-4.4, -3.85, "[-4, -3.5]")],
xticks = -5:1:5, yticks = -5:1:5,framestyle = :origin)
Out[3]:
2
3.2 Vector Operations
The two most common operators for vectors are addition and scalar multiplication, which wenow describe.
As a matter of definition, when we add two vectors, we add them element by element
๐ฅ + ๐ฆ =โกโขโขโฃ
๐ฅ1๐ฅ2โฎ
๐ฅ๐
โคโฅโฅโฆ
+โกโขโขโฃ
๐ฆ1๐ฆ2โฎ
๐ฆ๐
โคโฅโฅโฆ
โถ=โกโขโขโฃ
๐ฅ1 + ๐ฆ1๐ฅ2 + ๐ฆ2
โฎ๐ฅ๐ + ๐ฆ๐
โคโฅโฅโฆ
Scalar multiplication is an operation that takes a number ๐พ and a vector ๐ฅ and produces
๐พ๐ฅ โถ=โกโขโขโฃ
๐พ๐ฅ1๐พ๐ฅ2
โฎ๐พ๐ฅ๐
โคโฅโฅโฆ
Scalar multiplication is illustrated in the next figure
In [4]: # illustrate scalar multiplication
x = [2]scalars = [-2 1 2]vals = [0 0 0; x * scalars]labels = [(-3.6, -4.2, "-2x"), (2.4, 1.8, "x"), (4.4, 3.8, "2x")]
plot(vals, vals, arrow = true, color = [:red :red :blue],legend = :none, xlims = (-5, 5), ylims = (-5, 5),annotations = labels, xticks = -5:1:5, yticks = -5:1:5,framestyle = :origin)
Out[4]:
In Julia, a vector can be represented as a one dimensional Array.
Julia Arrays allow us to express scalar multiplication and addition with a very natural syntax
In [5]: x = ones(3)
Out[5]: 3-element Array{Float64,1}:1.01.01.0
In [6]: y = [2, 4, 6]
Out[6]: 3-element Array{Int64,1}:246
3
In [7]: x + y
Out[7]: 3-element Array{Float64,1}:3.05.07.0
In [8]: 4x # equivalent to 4 * x and 4 .* x
Out[8]: 3-element Array{Float64,1}:4.04.04.0
3.3 Inner Product and Norm
The inner product of vectors ๐ฅ, ๐ฆ โ โ๐ is defined as
๐ฅโฒ๐ฆ โถ=๐
โ๐=1
๐ฅ๐๐ฆ๐
Two vectors are called orthogonal if their inner product is zero.
The norm of a vector ๐ฅ represents its โlengthโ (i.e., its distance from the zero vector) and isdefined as
โ๐ฅโ โถ=โ
๐ฅโฒ๐ฅ โถ= (๐
โ๐=1
๐ฅ2๐ )
1/2
The expression โ๐ฅ โ ๐ฆโ is thought of as the distance between ๐ฅ and ๐ฆ.
Continuing on from the previous example, the inner product and norm can be computed asfollows
In [9]: using LinearAlgebra
In [10]: dot(x, y) # Inner product of x and y
Out[10]: 12.0
In [11]: sum(prod, zip(x, y)) # Gives the same result
Out[11]: 12.0
In [12]: norm(x) # Norm of x
Out[12]: 1.7320508075688772
In [13]: sqrt(sum(abs2, x)) # Gives the same result
Out[13]: 1.7320508075688772
4
3.4 Span
Given a set of vectors ๐ด โถ= {๐1, โฆ , ๐๐} in โ๐, itโs natural to think about the new vectors wecan create by performing linear operations.New vectors created in this manner are called linear combinations of ๐ด.In particular, ๐ฆ โ โ๐ is a linear combination of ๐ด โถ= {๐1, โฆ , ๐๐} if
๐ฆ = ๐ฝ1๐1 + โฏ + ๐ฝ๐๐๐ for some scalars ๐ฝ1, โฆ , ๐ฝ๐
In this context, the values ๐ฝ1, โฆ , ๐ฝ๐ are called the coefficients of the linear combination.The set of linear combinations of ๐ด is called the span of ๐ด.The next figure shows the span of ๐ด = {๐1, ๐2} in โ3.The span is a 2 dimensional plane passing through these two points and the origin.
In [14]: # fixed linear function, to generate a planef(x, y) = 0.2x + 0.1y
# lines to vectorsx_vec = [0 0; 3 3]y_vec = [0 0; 4 -4]z_vec = [0 0; f(3, 4) f(3, -4)]
# draw the planen = 20grid = range(-5, 5, length = n)z2 = [ f(grid[row], grid[col]) for row in 1:n, col in 1:n ]wireframe(grid, grid, z2, fill = :blues, gridalpha =1 )plot!(x_vec, y_vec, z_vec, color = [:blue :green], linewidth = 3, labels๏ฟฝ
โช= "",colorbar = false)
Out[14]:
3.4.1 Examples
If ๐ด contains only one vector ๐1 โ โ2, then its span is just the scalar multiples of ๐1, which isthe unique line passing through both ๐1 and the origin.If ๐ด = {๐1, ๐2, ๐3} consists of the canonical basis vectors of โ3, that is
๐1 โถ= โกโขโฃ
100
โคโฅโฆ
, ๐2 โถ= โกโขโฃ
010
โคโฅโฆ
, ๐3 โถ= โกโขโฃ
001
โคโฅโฆ
then the span of ๐ด is all of โ3, because, for any ๐ฅ = (๐ฅ1, ๐ฅ2, ๐ฅ3) โ โ3, we can write
๐ฅ = ๐ฅ1๐1 + ๐ฅ2๐2 + ๐ฅ3๐3
Now consider ๐ด0 = {๐1, ๐2, ๐1 + ๐2}.If ๐ฆ = (๐ฆ1, ๐ฆ2, ๐ฆ3) is any linear combination of these vectors, then ๐ฆ3 = 0 (check it).Hence ๐ด0 fails to span all of โ3.
5
3.5 Linear Independence
As weโll see, itโs often desirable to find families of vectors with relatively large span, so thatmany vectors can be described by linear operators on a few vectors.
The condition we need for a set of vectors to have a large span is whatโs called linear inde-pendence.
In particular, a collection of vectors ๐ด โถ= {๐1, โฆ , ๐๐} in โ๐ is said to be
โข linearly dependent if some strict subset of ๐ด has the same span as ๐ดโข linearly independent if it is not linearly dependent
Put differently, a set of vectors is linearly independent if no vector is redundant to the span,and linearly dependent otherwise.
To illustrate the idea, recall the figure that showed the span of vectors {๐1, ๐2} in โ3 as aplane through the origin.
If we take a third vector ๐3 and form the set {๐1, ๐2, ๐3}, this set will be
โข linearly dependent if ๐3 lies in the planeโข linearly independent otherwise
As another illustration of the concept, since โ๐ can be spanned by ๐ vectors (see the discus-sion of canonical basis vectors above), any collection of ๐ > ๐ vectors in โ๐ must be linearlydependent.
The following statements are equivalent to linear independence of ๐ด โถ= {๐1, โฆ , ๐๐} โ โ๐.
1. No vector in ๐ด can be formed as a linear combination of the other elements.
2. If ๐ฝ1๐1 + โฏ ๐ฝ๐๐๐ = 0 for scalars ๐ฝ1, โฆ , ๐ฝ๐, then ๐ฝ1 = โฏ = ๐ฝ๐ = 0.
(The zero in the first expression is the origin of โ๐)
3.6 Unique Representations
Another nice thing about sets of linearly independent vectors is that each element in the spanhas a unique representation as a linear combination of these vectors.
In other words, if ๐ด โถ= {๐1, โฆ , ๐๐} โ โ๐ is linearly independent and
๐ฆ = ๐ฝ1๐1 + โฏ ๐ฝ๐๐๐
then no other coefficient sequence ๐พ1, โฆ , ๐พ๐ will produce the same vector ๐ฆ.
Indeed, if we also have ๐ฆ = ๐พ1๐1 + โฏ ๐พ๐๐๐, then
(๐ฝ1 โ ๐พ1)๐1 + โฏ + (๐ฝ๐ โ ๐พ๐)๐๐ = 0
Linear independence now implies ๐พ๐ = ๐ฝ๐ for all ๐.
6
4 Matrices
Matrices are a neat way of organizing data for use in linear operations.
An ๐ ร ๐ matrix is a rectangular array ๐ด of numbers with ๐ rows and ๐ columns:
๐ด =โกโขโขโฃ
๐11 ๐12 โฏ ๐1๐๐21 ๐22 โฏ ๐2๐
โฎ โฎ โฎ๐๐1 ๐๐2 โฏ ๐๐๐
โคโฅโฅโฆ
Often, the numbers in the matrix represent coefficients in a system of linear equations, as dis-cussed at the start of this lecture.
For obvious reasons, the matrix ๐ด is also called a vector if either ๐ = 1 or ๐ = 1.
In the former case, ๐ด is called a row vector, while in the latter it is called a column vector.
If ๐ = ๐, then ๐ด is called square.
The matrix formed by replacing ๐๐๐ by ๐๐๐ for every ๐ and ๐ is called the transpose of ๐ด, anddenoted ๐ดโฒ or ๐ดโค.
If ๐ด = ๐ดโฒ, then ๐ด is called symmetric.
For a square matrix ๐ด, the ๐ elements of the form ๐๐๐ for ๐ = 1, โฆ , ๐ are called the principaldiagonal.
๐ด is called diagonal if the only nonzero entries are on the principal diagonal.
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then๐ด is called the identity matrix, and denoted by ๐ผ .
4.1 Matrix Operations
Just as was the case for vectors, a number of algebraic operations are defined for matrices.
Scalar multiplication and addition are immediate generalizations of the vector case:
๐พ๐ด = ๐พ โกโขโฃ
๐11 โฏ ๐1๐โฎ โฎ โฎ
๐๐1 โฏ ๐๐๐
โคโฅโฆ
โถ= โกโขโฃ
๐พ๐11 โฏ ๐พ๐1๐โฎ โฎ โฎ
๐พ๐๐1 โฏ ๐พ๐๐๐
โคโฅโฆ
and
๐ด + ๐ต = โกโขโฃ
๐11 โฏ ๐1๐โฎ โฎ โฎ
๐๐1 โฏ ๐๐๐
โคโฅโฆ
+ โกโขโฃ
๐11 โฏ ๐1๐โฎ โฎ โฎ
๐๐1 โฏ ๐๐๐
โคโฅโฆ
โถ= โกโขโฃ
๐11 + ๐11 โฏ ๐1๐ + ๐1๐โฎ โฎ โฎ
๐๐1 + ๐๐1 โฏ ๐๐๐ + ๐๐๐
โคโฅโฆ
In the latter case, the matrices must have the same shape in order for the definition to makesense.
We also have a convention for multiplying two matrices.
The rule for matrix multiplication generalizes the idea of inner products discussed above, andis designed to make multiplication play well with basic linear operations.
7
If ๐ด and ๐ต are two matrices, then their product ๐ด๐ต is formed by taking as its ๐, ๐-th elementthe inner product of the ๐-th row of ๐ด and the ๐-th column of ๐ต.
There are many tutorials to help you visualize this operation, such as this one, or the discus-sion on the Wikipedia page.
If ๐ด is ๐ ร ๐ and ๐ต is ๐ ร ๐, then to multiply ๐ด and ๐ต we require ๐ = ๐, and the resultingmatrix ๐ด๐ต is ๐ ร ๐.
As perhaps the most important special case, consider multiplying ๐ ร ๐ matrix ๐ด and ๐ ร 1column vector ๐ฅ.
According to the preceding rule, this gives us an ๐ ร 1 column vector
๐ด๐ฅ = โกโขโฃ
๐11 โฏ ๐1๐โฎ โฎ โฎ
๐๐1 โฏ ๐๐๐
โคโฅโฆ
โกโขโฃ
๐ฅ1โฎ
๐ฅ๐
โคโฅโฆ
โถ= โกโขโฃ
๐11๐ฅ1 + โฏ + ๐1๐๐ฅ๐โฎ
๐๐1๐ฅ1 + โฏ + ๐๐๐๐ฅ๐
โคโฅโฆ
(2)
Note๐ด๐ต and ๐ต๐ด are not generally the same thing.
Another important special case is the identity matrix.
You should check that if ๐ด is ๐ ร ๐ and ๐ผ is the ๐ ร ๐ identity matrix, then ๐ด๐ผ = ๐ด.
If ๐ผ is the ๐ ร ๐ identity matrix, then ๐ผ๐ด = ๐ด.
4.2 Matrices in Julia
Julia arrays are also used as matrices, and have fast, efficient functions and methods for allthe standard matrix operations.
You can create them as follows
In [15]: A = [1 23 4]
Out[15]: 2ร2 Array{Int64,2}:1 23 4
In [16]: typeof(A)
Out[16]: Array{Int64,2}
In [17]: size(A)
Out[17]: (2, 2)
8
The size function returns a tuple giving the number of rows and columns.
To get the transpose of A, use transpose(A) or, more simply, A'.
There are many convenient functions for creating common matrices (matrices of zeros, ones,etc.) โ see here.
Since operations are performed elementwise by default, scalar multiplication and additionhave very natural syntax
In [18]: A = ones(3, 3)
Out[18]: 3ร3 Array{Float64,2}:1.0 1.0 1.01.0 1.0 1.01.0 1.0 1.0
In [19]: 2I
Out[19]: UniformScaling{Int64}2*I
In [20]: A + I
Out[20]: 3ร3 Array{Float64,2}:2.0 1.0 1.01.0 2.0 1.01.0 1.0 2.0
To multiply matrices we use the * operator.
In particular, A * B is matrix multiplication, whereas A .* B is element by element multi-plication.
4.3 Matrices as Maps
Each ๐ ร ๐ matrix ๐ด can be identified with a function ๐(๐ฅ) = ๐ด๐ฅ that maps ๐ฅ โ โ๐ into๐ฆ = ๐ด๐ฅ โ โ๐.
These kinds of functions have a special property: they are linear.
A function ๐ โถ โ๐ โ โ๐ is called linear if, for all ๐ฅ, ๐ฆ โ โ๐ and all scalars ๐ผ, ๐ฝ, we have
๐(๐ผ๐ฅ + ๐ฝ๐ฆ) = ๐ผ๐(๐ฅ) + ๐ฝ๐(๐ฆ)
You can check that this holds for the function ๐(๐ฅ) = ๐ด๐ฅ + ๐ when ๐ is the zero vector, andfails when ๐ is nonzero.
In fact, itโs known that ๐ is linear if and only if there exists a matrix ๐ด such that ๐(๐ฅ) = ๐ด๐ฅfor all ๐ฅ.
9
5 Solving Systems of Equations
Recall again the system of equations (1).
If we compare (1) and (2), we see that (1) can now be written more conveniently as
๐ฆ = ๐ด๐ฅ (3)
The problem we face is to determine a vector ๐ฅ โ โ๐ that solves (3), taking ๐ฆ and ๐ด as given.
This is a special case of a more general problem: Find an ๐ฅ such that ๐ฆ = ๐(๐ฅ).Given an arbitrary function ๐ and a ๐ฆ, is there always an ๐ฅ such that ๐ฆ = ๐(๐ฅ)?If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
In [21]: f(x) = 0.6cos(4x) + 1.3grid = range(-2, 2, length = 100)y_min, y_max = extrema( f(x) for x in grid )plt1 = plot(f, xlim = (-2, 2), label = "f")hline!(plt1, [f(0.5)], linestyle = :dot, linewidth = 2, label = "")vline!(plt1, [-1.07, -0.5, 0.5, 1.07], linestyle = :dot, linewidth = 2,๏ฟฝ
โชlabel = "")plot!(plt1, fill(0, 2), [y_min y_min; y_max y_max], lw = 3, color = :blue,
label = ["range of f" ""])plt2 = plot(f, xlim = (-2, 2), label = "f")hline!(plt2, [2], linestyle = :dot, linewidth = 2, label = "")plot!(plt2, fill(0, 2), [y_min y_min; y_max y_max], lw = 3, color = :blue,
label = ["range of f" ""])plot(plt1, plt2, layout = (2, 1), ylim = (0, 3.5))
Out[21]:
In the first plot there are multiple solutions, as the function is not one-to-one, while in thesecond there are no solutions, since ๐ฆ lies outside the range of ๐ .
Can we impose conditions on ๐ด in (3) that rule out these problems?
In this context, the most important thing to recognize about the expression ๐ด๐ฅ is that it cor-responds to a linear combination of the columns of ๐ด.
In particular, if ๐1, โฆ , ๐๐ are the columns of ๐ด, then
๐ด๐ฅ = ๐ฅ1๐1 + โฏ + ๐ฅ๐๐๐
Hence the range of ๐(๐ฅ) = ๐ด๐ฅ is exactly the span of the columns of ๐ด.
We want the range to be large, so that it contains arbitrary ๐ฆ.
As you might recall, the condition that we want for the span to be large is linear indepen-dence.
A happy fact is that linear independence of the columns of ๐ด also gives us uniqueness.
Indeed, it follows from our earlier discussion that if {๐1, โฆ , ๐๐} are linearly independent and๐ฆ = ๐ด๐ฅ = ๐ฅ1๐1 + โฏ + ๐ฅ๐๐๐, then no ๐ง โ ๐ฅ satisfies ๐ฆ = ๐ด๐ง.
10
5.1 The ๐ ร ๐ Case
Letโs discuss some more details, starting with the case where ๐ด is ๐ ร ๐.
This is the familiar case where the number of unknowns equals the number of equations.
For arbitrary ๐ฆ โ โ๐, we hope to find a unique ๐ฅ โ โ๐ such that ๐ฆ = ๐ด๐ฅ.
In view of the observations immediately above, if the columns of ๐ด are linearly independent,then their span, and hence the range of ๐(๐ฅ) = ๐ด๐ฅ, is all of โ๐.
Hence there always exists an ๐ฅ such that ๐ฆ = ๐ด๐ฅ.
Moreover, the solution is unique.
In particular, the following are equivalent
1. The columns of ๐ด are linearly independent
2. For any ๐ฆ โ โ๐, the equation ๐ฆ = ๐ด๐ฅ has a unique solution
The property of having linearly independent columns is sometimes expressed as having fullcolumn rank.
5.1.1 Inverse Matrices
Can we give some sort of expression for the solution?
If ๐ฆ and ๐ด are scalar with ๐ด โ 0, then the solution is ๐ฅ = ๐ดโ1๐ฆ.
A similar expression is available in the matrix case.
In particular, if square matrix ๐ด has full column rank, then it possesses a multiplicative in-verse matrix ๐ดโ1, with the property that ๐ด๐ดโ1 = ๐ดโ1๐ด = ๐ผ .
As a consequence, if we pre-multiply both sides of ๐ฆ = ๐ด๐ฅ by ๐ดโ1, we get ๐ฅ = ๐ดโ1๐ฆ.
This is the solution that weโre looking for.
5.1.2 Determinants
Another quick comment about square matrices is that to every such matrix we assign aunique number called the determinant of the matrix โ you can find the expression for ithere.
If the determinant of ๐ด is not zero, then we say that ๐ด is nonsingular.
Perhaps the most important fact about determinants is that ๐ด is nonsingular if and only if ๐ดis of full column rank.
This gives us a useful one-number summary of whether or not a square matrix can be in-verted.
11
5.2 More Rows than Columns
This is the ๐ ร ๐ case with ๐ > ๐.
This case is very important in many settings, not least in the setting of linear regression(where ๐ is the number of observations, and ๐ is the number of explanatory variables).
Given arbitrary ๐ฆ โ โ๐, we seek an ๐ฅ โ โ๐ such that ๐ฆ = ๐ด๐ฅ.
In this setting, existence of a solution is highly unlikely.
Without much loss of generality, letโs go over the intuition focusing on the case where thecolumns of ๐ด are linearly independent.
It follows that the span of the columns of ๐ด is a ๐-dimensional subspace of โ๐.
This span is very โunlikelyโ to contain arbitrary ๐ฆ โ โ๐.
To see why, recall the figure above, where ๐ = 2 and ๐ = 3.
Imagine an arbitrarily chosen ๐ฆ โ โ3, located somewhere in that three dimensional space.
Whatโs the likelihood that ๐ฆ lies in the span of {๐1, ๐2} (i.e., the two dimensional planethrough these points)?
In a sense it must be very small, since this plane has zero โthicknessโ.
As a result, in the ๐ > ๐ case we usually give up on existence.
However, we can still seek a best approximation, for example an ๐ฅ that makes the distanceโ๐ฆ โ ๐ด๐ฅโ as small as possible.
To solve this problem, one can use either calculus or the theory of orthogonal projections.
The solution is known to be ฬ๐ฅ = (๐ดโฒ๐ด)โ1๐ดโฒ๐ฆ โ see for example chapter 3 of these notes
5.3 More Columns than Rows
This is the ๐ ร ๐ case with ๐ < ๐, so there are fewer equations than unknowns.
In this case there are either no solutions or infinitely many โ in other words, uniquenessnever holds.
For example, consider the case where ๐ = 3 and ๐ = 2.
Thus, the columns of ๐ด consists of 3 vectors in โ2.
This set can never be linearly independent, since it is possible to find two vectors that spanโ2.
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two.
For example, letโs say that ๐1 = ๐ผ๐2 + ๐ฝ๐3.
Then if ๐ฆ = ๐ด๐ฅ = ๐ฅ1๐1 + ๐ฅ2๐2 + ๐ฅ3๐3, we can also write
๐ฆ = ๐ฅ1(๐ผ๐2 + ๐ฝ๐3) + ๐ฅ2๐2 + ๐ฅ3๐3 = (๐ฅ1๐ผ + ๐ฅ2)๐2 + (๐ฅ1๐ฝ + ๐ฅ3)๐3
In other words, uniqueness fails.
12
5.4 Linear Equations with Julia
Hereโs an illustration of how to solve linear equations with Juliaโs built-in linear algebra facili-ties
In [22]: A = [1.0 2.0; 3.0 4.0];
In [23]: y = ones(2, 1); # A column vector
In [24]: det(A)
Out[24]: -2.0
In [25]: A_inv = inv(A)
Out[25]: 2ร2 Array{Float64,2}:-2.0 1.01.5 -0.5
In [26]: x = A_inv * y # solution
Out[26]: 2ร1 Array{Float64,2}:-0.99999999999999980.9999999999999999
In [27]: A * x # should equal y (a vector of ones)
Out[27]: 2ร1 Array{Float64,2}:1.01.0000000000000004
In [28]: A \ y # produces the same solution
Out[28]: 2ร1 Array{Float64,2}:-1.01.0
Observe how we can solve for ๐ฅ = ๐ดโ1๐ฆ by either via inv(A) * y, or using A \ y.
The latter method is preferred because it automatically selects the best algorithm for theproblem based on the types of A and y.
If A is not square then A \ y returns the least squares solution ฬ๐ฅ = (๐ดโฒ๐ด)โ1๐ดโฒ๐ฆ.
13
6 Eigenvalues and Eigenvectors
Let ๐ด be an ๐ ร ๐ square matrix.
If ๐ is scalar and ๐ฃ is a non-zero vector in โ๐ such that
๐ด๐ฃ = ๐๐ฃ
then we say that ๐ is an eigenvalue of ๐ด, and ๐ฃ is an eigenvector.
Thus, an eigenvector of ๐ด is a vector such that when the map ๐(๐ฅ) = ๐ด๐ฅ is applied, ๐ฃ ismerely scaled.
The next figure shows two eigenvectors (blue arrows) and their images under ๐ด (red arrows).
As expected, the image ๐ด๐ฃ of each ๐ฃ is just a scaled version of the original
In [29]: A = [1 22 1]
evals, evecs = eigen(A)
a1, a2 = evalseig_1 = [0 0; evecs[:,1]']eig_2 = [0 0; evecs[:,2]']x = range(-5, 5, length = 10)y = -x
plot(eig_1[:, 2], a1 * eig_2[:, 2], arrow = true, color = :red,legend = :none, xlims = (-3, 3), ylims = (-3, 3), xticks = -3:3,๏ฟฝ
โชyticks = -3:3,framestyle = :origin)
plot!(a2 * eig_1[:, 2], a2 * eig_2, arrow = true, color = :red)plot!(eig_1, eig_2, arrow = true, color = :blue)plot!(x, y, color = :blue, lw = 0.4, alpha = 0.6)plot!(x, x, color = :blue, lw = 0.4, alpha = 0.6)
Out[29]:
The eigenvalue equation is equivalent to (๐ด โ ๐๐ผ)๐ฃ = 0, and this has a nonzero solution ๐ฃ onlywhen the columns of ๐ด โ ๐๐ผ are linearly dependent.
This in turn is equivalent to stating that the determinant is zero.
Hence to find all eigenvalues, we can look for ๐ such that the determinant of ๐ด โ ๐๐ผ is zero.
This problem can be expressed as one of solving for the roots of a polynomial in ๐ of degree๐.
This in turn implies the existence of ๐ solutions in the complex plane, although some mightbe repeated.
Some nice facts about the eigenvalues of a square matrix ๐ด are as follows
1. The determinant of ๐ด equals the product of the eigenvalues.
2. The trace of ๐ด (the sum of the elements on the principal diagonal) equals the sum ofthe eigenvalues.
14
3. If ๐ด is symmetric, then all of its eigenvalues are real.
4. If ๐ด is invertible and ๐1, โฆ , ๐๐ are its eigenvalues, then the eigenvalues of ๐ดโ1 are1/๐1, โฆ , 1/๐๐.
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvaluesare nonzero.
Using Julia, we can solve for the eigenvalues and eigenvectors of a matrix as follows
In [30]: A = [1.0 2.0; 2.0 1.0];
In [31]: evals, evecs = eigen(A);
In [32]: evals
Out[32]: 2-element Array{Float64,1}:-1.03.0
In [33]: evecs
Out[33]: 2ร2 Array{Float64,2}:-0.707107 0.7071070.707107 0.707107
Note that the columns of evecs are the eigenvectors.
Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (checkit), the eig routine normalizes the length of each eigenvector to one.
6.1 Generalized Eigenvalues
It is sometimes useful to consider the generalized eigenvalue problem, which, for given matri-ces ๐ด and ๐ต, seeks generalized eigenvalues ๐ and eigenvectors ๐ฃ such that
๐ด๐ฃ = ๐๐ต๐ฃ
This can be solved in Julia via eigen(A, B).
Of course, if ๐ต is square and invertible, then we can treat the generalized eigenvalue problemas an ordinary eigenvalue problem ๐ตโ1๐ด๐ฃ = ๐๐ฃ, but this is not always the case.
7 Further Topics
We round out our discussion by briefly mentioning several other important topics.
15
7.1 Series Expansions
Recall the usual summation formula for a geometric progression, which states that if |๐| < 1,then โโ
๐=0 ๐๐ = (1 โ ๐)โ1.
A generalization of this idea exists in the matrix setting.
7.1.1 Matrix Norms
Let ๐ด be a square matrix, and let
โ๐ดโ โถ= maxโ๐ฅโ=1
โ๐ด๐ฅโ
The norms on the right-hand side are ordinary vector norms, while the norm on the left-handside is a matrix norm โ in this case, the so-called spectral norm.
For example, for a square matrix ๐, the condition โ๐โ < 1 means that ๐ is contractive, in thesense that it pulls all vectors towards the origin Section ??.
7.1.2 Neumannโs Theorem
Let ๐ด be a square matrix and let ๐ด๐ โถ= ๐ด๐ด๐โ1 with ๐ด1 โถ= ๐ด.
In other words, ๐ด๐ is the ๐-th power of ๐ด.
Neumannโs theorem states the following: If โ๐ด๐โ < 1 for some ๐ โ โ, then ๐ผ โ ๐ด is invertible,and
(๐ผ โ ๐ด)โ1 =โ
โ๐=0
๐ด๐ (4)
7.1.3 Spectral Radius
A result known as Gelfandโs formula tells us that, for any square matrix ๐ด,
๐(๐ด) = lim๐โโ
โ๐ด๐โ1/๐
Here ๐(๐ด) is the spectral radius, defined as max๐ |๐๐|, where {๐๐}๐ is the set of eigenvalues of๐ด.
As a consequence of Gelfandโs formula, if all eigenvalues are strictly less than one in modulus,there exists a ๐ with โ๐ด๐โ < 1.
In which case (4) is valid.
16
7.2 Positive Definite Matrices
Let ๐ด be a symmetric ๐ ร ๐ matrix.
We say that ๐ด is
1. positive definite if ๐ฅโฒ๐ด๐ฅ > 0 for every ๐ฅ โ โ๐ {0}
2. positive semi-definite or nonnegative definite if ๐ฅโฒ๐ด๐ฅ โฅ 0 for every ๐ฅ โ โ๐
Analogous definitions exist for negative definite and negative semi-definite matrices.
It is notable that if ๐ด is positive definite, then all of its eigenvalues are strictly positive, andhence ๐ด is invertible (with positive definite inverse).
7.3 Differentiating Linear and Quadratic forms
The following formulas are useful in many economic contexts. Let
โข ๐ง, ๐ฅ and ๐ all be ๐ ร 1 vectorsโข ๐ด be an ๐ ร ๐ matrixโข ๐ต be an ๐ ร ๐ matrix and ๐ฆ be an ๐ ร 1 vector
Then
1. ๐๐โฒ๐ฅ๐๐ฅ = ๐
2. ๐๐ด๐ฅ๐๐ฅ = ๐ดโฒ
3. ๐๐ฅโฒ๐ด๐ฅ๐๐ฅ = (๐ด + ๐ดโฒ)๐ฅ
4. ๐๐ฆโฒ๐ต๐ง๐๐ฆ = ๐ต๐ง
5. ๐๐ฆโฒ๐ต๐ง๐๐ต = ๐ฆ๐งโฒ
Exercise 1 below asks you to apply these formulas.
7.4 Further Reading
The documentation of the linear algebra features built into Julia can be found here.
Chapters 2 and 3 of the Econometric Theory contains a discussion of linear algebra along thesame lines as above, with solved exercises.
If you donโt mind a slightly abstract approach, a nice intermediate-level text on linear algebrais [1].
17
8 Exercises
8.1 Exercise 1
Let ๐ฅ be a given ๐ ร 1 vector and consider the problem
๐ฃ(๐ฅ) = max๐ฆ,๐ข
{โ๐ฆโฒ๐๐ฆ โ ๐ขโฒ๐๐ข}
subject to the linear constraint
๐ฆ = ๐ด๐ฅ + ๐ต๐ข
Here
โข ๐ is an ๐ ร ๐ matrix and ๐ is an ๐ ร ๐ matrixโข ๐ด is an ๐ ร ๐ matrix and ๐ต is an ๐ ร ๐ matrixโข both ๐ and ๐ are symmetric and positive semidefinite
(What must the dimensions of ๐ฆ and ๐ข be to make this a well-posed problem?)
One way to solve the problem is to form the Lagrangian
โ = โ๐ฆโฒ๐๐ฆ โ ๐ขโฒ๐๐ข + ๐โฒ [๐ด๐ฅ + ๐ต๐ข โ ๐ฆ]
where ๐ is an ๐ ร 1 vector of Lagrange multipliers.
Try applying the formulas given above for differentiating quadratic and linear forms to ob-tain the first-order conditions for maximizing โ with respect to ๐ฆ, ๐ข and minimizing it withrespect to ๐.
Show that these conditions imply that
1. ๐ = โ2๐ ๐ฆ
2. The optimizing choice of ๐ข satisfies ๐ข = โ(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด๐ฅ
3. The function ๐ฃ satisfies ๐ฃ(๐ฅ) = โ๐ฅโฒ ฬ๐ ๐ฅ where ฬ๐ = ๐ดโฒ๐๐ด โ ๐ดโฒ๐๐ต(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด
As we will see, in economic contexts Lagrange multipliers often are shadow prices
NoteIf we donโt care about the Lagrange multipliers, we can substitute the constraintinto the objective function, and then just maximize โ(๐ด๐ฅ+๐ต๐ข)โฒ๐(๐ด๐ฅ+๐ต๐ข)โ๐ขโฒ๐๐ขwith respect to ๐ข. You can verify that this leads to the same maximizer.
9 Solutions
Thanks to Willem Hekman and Guanlong Ren for providing this solution.
18
9.1 Exercise 1
We have an optimization problem:
๐ฃ(๐ฅ) = max๐ฆ,๐ข
{โ๐ฆโฒ๐๐ฆ โ ๐ขโฒ๐๐ข}
s.t.
๐ฆ = ๐ด๐ฅ + ๐ต๐ข
with primitives
โข ๐ be a symmetric and positive semidefinite ๐ ร ๐ matrix.โข ๐ be a symmetric and positive semidefinite ๐ ร ๐ matrix.โข ๐ด an ๐ ร ๐ matrix.โข ๐ต an ๐ ร ๐ matrix.
The associated Lagrangian is :
๐ฟ = โ๐ฆโฒ๐๐ฆ โ ๐ขโฒ๐๐ข + ๐โฒ[๐ด๐ฅ + ๐ต๐ข โ ๐ฆ]
9.1.1 1.
Differentiating Lagrangian equation w.r.t y and setting its derivative equal to zero yields
๐๐ฟ๐๐ฆ = โ(๐ + ๐ โฒ)๐ฆ โ ๐ = โ2๐๐ฆ โ ๐ = 0 ,
since P is symmetric.
Accordingly, the first-order condition for maximizing L w.r.t. y implies
๐ = โ2๐๐ฆ .
9.1.2 2.
Differentiating Lagrangian equation w.r.t. u and setting its derivative equal to zero yields
๐๐ฟ๐๐ข = โ(๐ + ๐โฒ)๐ข โ ๐ตโฒ๐ = โ2๐๐ข + ๐ตโฒ๐ = 0 .
Substituting ๐ = โ2๐๐ฆ gives
๐๐ข + ๐ตโฒ๐๐ฆ = 0 .
Substituting the linear constraint ๐ฆ = ๐ด๐ฅ + ๐ต๐ข into above equation gives
๐๐ข + ๐ตโฒ๐(๐ด๐ฅ + ๐ต๐ข) = 0
19
(๐ + ๐ตโฒ๐๐ต)๐ข + ๐ตโฒ๐๐ด๐ฅ = 0
which is the first-order condition for maximizing L w.r.t. u.
Thus, the optimal choice of u must satisfy
๐ข = โ(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด๐ฅ ,
which follows from the definition of the first-oder conditions for Lagrangian equation.
9.1.3 3.
Rewriting our problem by substituting the constraint into the objective function, we get
๐ฃ(๐ฅ) = max๐ข
{โ(๐ด๐ฅ + ๐ต๐ข)โฒ๐(๐ด๐ฅ + ๐ต๐ข) โ ๐ขโฒ๐๐ข} .
Since we know the optimal choice of u satisfies ๐ข = โ(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด๐ฅ, then
๐ฃ(๐ฅ) = โ(๐ด๐ฅ + ๐ต๐ข)โฒ๐(๐ด๐ฅ + ๐ต๐ข) โ ๐ขโฒ๐๐ข ๐ค๐๐กโ ๐ข = โ(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด๐ฅ
To evaluate the function
๐ฃ(๐ฅ) = โ(๐ด๐ฅ + ๐ต๐ข)โฒ๐(๐ด๐ฅ + ๐ต๐ข) โ ๐ขโฒ๐๐ข= โ(๐ฅโฒ๐ดโฒ + ๐ขโฒ๐ตโฒ)๐ (๐ด๐ฅ + ๐ต๐ข) โ ๐ขโฒ๐๐ข= โ๐ฅโฒ๐ดโฒ๐๐ด๐ฅ โ ๐ขโฒ๐ตโฒ๐๐ด๐ฅ โ ๐ฅโฒ๐ดโฒ๐๐ต๐ข โ ๐ขโฒ๐ตโฒ๐๐ต๐ข โ ๐ขโฒ๐๐ข= โ๐ฅโฒ๐ดโฒ๐๐ด๐ฅ โ 2๐ขโฒ๐ตโฒ๐๐ด๐ฅ โ ๐ขโฒ(๐ + ๐ตโฒ๐๐ต)๐ข
For simplicity, denote by ๐ โถ= (๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด, then ๐ข = โ๐๐ฅ.
Regarding the second term โ2๐ขโฒ๐ตโฒ๐๐ด๐ฅ,
โ2๐ขโฒ๐ตโฒ๐๐ด๐ฅ = โ2๐ฅโฒ๐โฒ๐ตโฒ๐๐ด๐ฅ= 2๐ฅโฒ๐ดโฒ๐๐ต(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด๐ฅ
Notice that the term (๐ + ๐ตโฒ๐๐ต)โ1 is symmetric as both P and Q are symmetric.
Regarding the third term โ๐ขโฒ(๐ + ๐ตโฒ๐๐ต)๐ข,
โ๐ขโฒ(๐ + ๐ตโฒ๐๐ต)๐ข = โ๐ฅโฒ๐โฒ(๐ + ๐ตโฒ๐๐ต)๐๐ฅ= โ๐ฅโฒ๐ดโฒ๐๐ต(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด๐ฅ
Hence, the summation of second and third terms is ๐ฅโฒ๐ดโฒ๐๐ต(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด๐ฅ.
This implies that
๐ฃ(๐ฅ) = โ๐ฅโฒ๐ดโฒ๐๐ด๐ฅ โ 2๐ขโฒ๐ตโฒ๐๐ด๐ฅ โ ๐ขโฒ(๐ + ๐ตโฒ๐๐ต)๐ข= โ๐ฅโฒ๐ดโฒ๐๐ด๐ฅ + ๐ฅโฒ๐ดโฒ๐๐ต(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด๐ฅ= โ๐ฅโฒ[๐ดโฒ๐๐ด โ ๐ดโฒ๐๐ต(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด]๐ฅ
20
Therefore, the solution to the optimization problem ๐ฃ(๐ฅ) = โ๐ฅโฒ ฬ๐ ๐ฅ follows the above result bydenoting ฬ๐ โถ= ๐ดโฒ๐๐ด โ ๐ดโฒ๐๐ต(๐ + ๐ตโฒ๐๐ต)โ1๐ตโฒ๐๐ด.
Footnotes
[1] Suppose that โ๐โ < 1. Take any nonzero vector ๐ฅ, and let ๐ โถ= โ๐ฅโ. We have โ๐๐ฅโ =๐โ๐(๐ฅ/๐)โ โค ๐โ๐โ < ๐ = โ๐ฅโ. Hence every point is pulled towards the origin.
References[1] K Jรคnich. Linear Algebra. Springer Undergraduate Texts in Mathematics and Technology.
Springer, 1994.
21