
Additional Topics for Chapter 4: Linear Algebra and Differential Equations [1]

[1] Material from Falvo, David C. and Larson, Ron. Elementary Linear Algebra, 6th ed. Brooks/Cole, 2010.

    Matrix Factorization

    Review of Elementary Matrices

Definition 1: An elementary matrix is an $n \times n$ matrix that can be obtained by performing a single elementary row operation on the identity matrix $I_n$. (Note that the identity matrix itself is an elementary matrix because we could multiply any row of $I_n$ by the scalar 1.)

    Recall that the elementary row operations are:

    1. Swap two rows

    2. Multiply a row by a nonzero constant

    3. Add a multiple of one row to another row

Example 2 (Row swap): Multiplying matrix $A$ by the elementary matrix $E_1$, in which rows 1 and 2 of $I_3$ are swapped, produces a matrix in which rows 1 and 2 of $A$ have also been swapped:

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 6 \\ 3 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 5 & 6 \\ 1 & 4 & 7 \\ 3 & 1 & 2 \end{bmatrix}$$

Example 3 (Multiplication of a row by a scalar): Multiplying matrix $A$ by the elementary matrix $E_2$, in which the second row of $I_3$ has been multiplied by $\frac{1}{3}$, produces a new matrix in which the second row of $A$ has been multiplied by $\frac{1}{3}$:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{3} & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 6 \\ 3 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 7 \\ \frac{2}{3} & \frac{5}{3} & 2 \\ 3 & 1 & 2 \end{bmatrix}$$

Example 4 (Adding a multiple of one row to another): Multiplying matrix $A$ by the elementary matrix $E_3$, in which two times the first row has been subtracted from the second row of $I_3$, produces a new matrix in which two times the first row of $A$ has been subtracted from the second row of $A$:

$$\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 6 \\ 3 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -8 \\ 3 & 1 & 2 \end{bmatrix}$$

This leads us to the following theorems, the second of which is a direct result of the fact that elementary row operations are reversible.

Theorem 5: If an elementary row operation is performed on a matrix $A$, the resulting matrix can also be obtained by multiplying $A$ (on the left) by the corresponding elementary matrix $E$.

Theorem 6: If $E$ is an elementary matrix, then $E^{-1}$ exists and is also an elementary matrix.

As confirmation of the previous theorem, note that the elementary matrices $E_1$, $E_2$, and $E_3$ from above have inverses

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

because

$$E_1 E_1^{-1} = E_1^{-1} E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

while

$$E_2 E_2^{-1} = E_2^{-1} E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{3} & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

and

$$E_3 E_3^{-1} = E_3^{-1} E_3 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
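These row operations and their inverses are easy to check numerically. Below is a minimal NumPy sketch (NumPy is assumed; the arrays are the matrices from Examples 2-4):

```python
import numpy as np

# A and the elementary matrices E1 (row swap), E2 (scale row 2 by 1/3),
# and E3 (add -2 * row 1 to row 2) from Examples 2-4.
A = np.array([[1.0, 4.0, 7.0],
              [2.0, 5.0, 6.0],
              [3.0, 1.0, 2.0]])

E1 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])
E2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1/3, 0.0],
               [0.0, 0.0, 1.0]])
E3 = np.array([[1.0, 0.0, 0.0],
               [-2.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])

print(E1 @ A)   # rows 1 and 2 of A swapped
print(E2 @ A)   # second row of A scaled by 1/3
print(E3 @ A)   # -2 * (row 1) added to row 2

# Each inverse undoes its row operation (Theorem 6).
for E in (E1, E2, E3):
    assert np.allclose(np.linalg.inv(E) @ E, np.eye(3))
```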

Theorem 7: Two matrices $A$ and $B$ are row equivalent if there exists a finite number of elementary matrices $E_1, E_2, \ldots, E_k$ such that $B = E_k E_{k-1} \cdots E_2 E_1 A$. (In other words, $A$ and $B$ are row equivalent if we can get from $A$ to $B$ via a finite number of elementary row operations.)

Following is an example of elementary matrices in use to reduce a $2 \times 2$ matrix to reduced row-echelon form (i.e., $I_2$ in this case).

Example 8: Start with $A = \begin{bmatrix} 5 & 18 \\ 1 & 4 \end{bmatrix}$.

| Matrix | Elementary Row Operation | Elementary Matrix | Inverse Elementary Matrix |
|---|---|---|---|
| $\begin{bmatrix} 5 & 18 \\ 1 & 4 \end{bmatrix}$ | Swap $R_1$ and $R_2$ | $E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ | $E_1^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ |
| $\begin{bmatrix} 1 & 4 \\ 5 & 18 \end{bmatrix}$ | Add $-5R_1$ to $R_2$ | $E_2 = \begin{bmatrix} 1 & 0 \\ -5 & 1 \end{bmatrix}$ | $E_2^{-1} = \begin{bmatrix} 1 & 0 \\ 5 & 1 \end{bmatrix}$ |
| $\begin{bmatrix} 1 & 4 \\ 0 & -2 \end{bmatrix}$ | Multiply $R_2$ by $-\frac{1}{2}$ | $E_3 = \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{2} \end{bmatrix}$ | $E_3^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}$ |
| $\begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$ | Add $-4R_2$ to $R_1$ | $E_4 = \begin{bmatrix} 1 & -4 \\ 0 & 1 \end{bmatrix}$ | $E_4^{-1} = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$ |

Then $E_4 E_3 E_2 E_1 A = I$. Since each of the $E_i$ is invertible, we also see that

$$E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} E_4 E_3 E_2 E_1 A = E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} I \quad\Longrightarrow\quad A = E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1}.$$

In other words,

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 5 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix},$$

or, $A$ is the product of the inverses of the elementary matrices that were used to reduce $A$ to $I$.

    The LU-Factorization (without row interchanges)

There are a number of "matrix factorizations" in frequent use. Perhaps the most basic of these is what is known as the "LU-factorization." To motivate its development, let us consider an example.

Example 9: Start with $A = \begin{bmatrix} 2 & 1 \\ 8 & 7 \end{bmatrix}$. We can accomplish row-echelon form with only one row operation. Here is that row operation and its associated elementary matrix:

| Matrix | Elementary Row Operation | Elementary Matrix | Inverse Elementary Matrix |
|---|---|---|---|
| $\begin{bmatrix} 2 & 1 \\ 8 & 7 \end{bmatrix}$ | Add $-4R_1$ to $R_2$ | $E_1 = \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix}$ | $E_1^{-1} = \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix}$ |

The result is the row-echelon form $U = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}$.


The above example shows that $E_1 A = U$, so the relation $A = LU$ implies that $L$ must actually be $E_1^{-1}$, or

$$A = \begin{bmatrix} 2 & 1 \\ 8 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix} = LU.$$

What is the significance of this factorization? First of all, we use the letters $L$ and $U$ for a reason. Note that $L$ is lower triangular (any nonzero elements are on or below the diagonal) and $U$ is upper triangular (any nonzero elements are on or above the diagonal). Additionally, the diagonal elements of the $L$ matrix are 1s. Once we have an LU-factorization of a matrix, we can generate an algorithm to easily solve numerous systems involving that same coefficient matrix. The practical significance of this is that it is even more efficient than Gaussian elimination when we need to reuse a coefficient matrix with varying right-hand sides (i.e., what we've been calling the $\mathbf{b}$ vector). [2]

[2] For $n \times n$ systems, LU-factorization requires $(4n^3 - 3n^2 - n)/6$ arithmetic operations for the factorization itself (which only has to be done once and can then be reused). Then each solution of the two resulting triangular systems (more on this later) can be carried out in $2n^2 - n$ operations per system. On the other hand, Gaussian elimination uses $(4n^3 + 9n^2 - 7n)/6$ arithmetic operations to arrive at a solution, and it requires this many operations for each system.

Before we proceed, we need to mention an important "lemma" (a lemma is a sort of warm-up to a theorem):

Lemma 10: If $L$ and $\hat{L}$ are lower triangular matrices of the same size, so is their product $L\hat{L}$. Furthermore, if both of the matrices have ones on their diagonals, then so does their product. If $U$ and $\hat{U}$ are upper triangular matrices of the same size, so is their product $U\hat{U}$.

    Let us illustrate with another example, this time taking note of the result of the above lemma.

Example 11: Find an LU-factorization of the matrix $A = \begin{bmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{bmatrix}$.

Here is the procedure (Gaussian elimination) and its associated elementary matrices.

| Matrix | Elementary Row Operation | Elementary Matrix | Inverse Elementary Matrix |
|---|---|---|---|
| $\begin{bmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{bmatrix}$ | Add $-2R_1$ to $R_2$ | $E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ | $E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ |
| $\begin{bmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 2 & -2 & 0 \end{bmatrix}$ | Add $-R_1$ to $R_3$ | $E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}$ | $E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$ |
| $\begin{bmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & -3 & -1 \end{bmatrix}$ | Add $R_2$ to $R_3$ | $E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$ | $E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$ |

The result is the row-echelon form $U = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{bmatrix}$.

Just as in the earlier $2 \times 2$ example, we have

$$E_3 E_2 E_1 A = U,$$

so

$$E_1^{-1} E_2^{-1} E_3^{-1} E_3 E_2 E_1 A = A = E_1^{-1} E_2^{-1} E_3^{-1} U.$$


But note that each of the $E_i^{-1}$ is lower triangular with ones on its diagonal. According to the previous lemma, their product will also have this form. Indeed,

$$E_1^{-1} E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}, \tag{1}$$

and we realize that $E_1^{-1} E_2^{-1} E_3^{-1} = L$, and that $A = LU$, as desired. In other words, $A$ can be "factored" into

$$A = \begin{bmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{bmatrix} = LU, \tag{2}$$

which again is a product of a lower and an upper triangular matrix. Note too that the result of the multiplication in (1) is a matrix whose diagonal elements are ones and whose other elements are the individual elements of the elementary matrices "condensed" into one matrix. We can look directly at $L$ (at least in this case) and see exactly what row operations were performed to get from $A$ to $U$.

    Using A = LU to Solve Systems

So how do we use this factorization to solve a system $A\mathbf{x} = \mathbf{b}$? We can use a simple two-stage process:

1. Solve the lower triangular system $L\mathbf{y} = \mathbf{b}$ for the vector $\mathbf{y}$ by forward substitution.

2. Solve the resulting upper triangular system $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$ by back substitution.

The above two-stage process works because if $U\mathbf{x} = \mathbf{y}$ and $L\mathbf{y} = \mathbf{b}$, then $A\mathbf{x} = LU\mathbf{x} = L\mathbf{y} = \mathbf{b}$.
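Here is a minimal NumPy sketch of the two-stage solve, assuming the $L$ and $U$ from (2) are already in hand (the right-hand side is arbitrary; SciPy's lu_factor/lu_solve would do the same job in practice):

```python
import numpy as np

# L and U from the factorization in (2).
L = np.array([[1.0,  0.0, 0.0],
              [2.0,  1.0, 0.0],
              [1.0, -1.0, 1.0]])
U = np.array([[2.0, 1.0,  1.0],
              [0.0, 3.0,  0.0],
              [0.0, 0.0, -1.0]])
b = np.array([1.0, 2.0, 2.0])   # any right-hand side

# Stage 1: forward substitution, solve L y = b.
n = len(b)
y = np.zeros(n)
for i in range(n):
    y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]

# Stage 2: back substitution, solve U x = y.
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]

assert np.allclose(L @ U @ x, b)   # A x = b, since A = L U
```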

As an example, consider the LU-factorization we found in (2) above, namely

$$\begin{bmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 1 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

Suppose we seek to find the solution to the system

$$\begin{bmatrix} 2 & 1 & 1 \\ 4 & 5 & 2 \\ 2 & -2 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}.$$

We first solve the lower triangular system

$$\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$$

by forward substitution, and then the upper triangular system with that result as its right-hand side by back substitution.

Inner Product Spaces

Weighted sums of products

$$\langle \mathbf{u}, \mathbf{v} \rangle = c_1 u_1 v_1 + c_2 u_2 v_2 + \cdots + c_n u_n v_n$$

with positive scalars $c_i$ (called weights) are all inner products on $\mathbb{R}^n$. Note the condition $c_i > 0$. If any of the $c_i$ are zero or negative, the product is no longer an inner product.


Example 17: Consider the real-valued continuous functions in the vector space $C[a,b]$ (the space of all continuous functions on the interval $[a,b]$). Then $\langle f, g \rangle = \int_a^b f(x)\, g(x)\, dx$ is an inner product on $C[a,b]$:

(1) $\langle f, g \rangle = \int_a^b f(x)\, g(x)\, dx = \int_a^b g(x)\, f(x)\, dx = \langle g, f \rangle$

(2) $\langle f, g + h \rangle = \int_a^b f(x)\,[g(x) + h(x)]\, dx = \int_a^b f(x)\, g(x)\, dx + \int_a^b f(x)\, h(x)\, dx = \langle f, g \rangle + \langle f, h \rangle$

(3) $c\,\langle f, g \rangle = c \int_a^b f(x)\, g(x)\, dx = \int_a^b \big(c f(x)\big)\, g(x)\, dx = \langle cf, g \rangle$

(4) $\langle f, f \rangle = \int_a^b f(x)\, f(x)\, dx \geq 0$ because $(f(x))^2 \geq 0$ for all $x$. Additionally, $\langle f, f \rangle = 0$ if and only if $f(x) = 0$ or if $a = b$.

    Orthogonal Projections

    Review of Dot Products and Orthogonality

    Recall the following:

- Two vectors are said to be orthogonal if their dot product is zero, namely $\mathbf{u} \cdot \mathbf{v} = 0$ or $\mathbf{u}^T\mathbf{v} = 0$, where $\mathbf{u}$ and $\mathbf{v}$ are column vectors. By definition, the zero vector is orthogonal to all other vectors.

- The angle $\theta$ between two vectors is given by the relation $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta$, or $\cos\theta = \dfrac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$.

- The length or norm of a vector is given by $\|\mathbf{v}\|^2 = \mathbf{v} \cdot \mathbf{v}$. The distance between two points (or vectors) is given by $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \|\mathbf{v} - \mathbf{u}\|$.

- A set of vectors is said to be mutually orthogonal if every pair of vectors in the set is orthogonal. Additionally, if all of the vectors are unit vectors (i.e., have length one), the set is said to be orthonormal.

- An orthogonal set of nonzero vectors is linearly independent.

- A basis that is an orthogonal set is called an orthogonal basis. If the vectors in the basis are all of length one, the basis is called an orthonormal basis. (All of the familiar "standard" bases are orthonormal, e.g. $\{(1,0,0), (0,1,0), (0,0,1)\}$.)

    Orthogonal and Orthonormal Bases

Why make a big deal out of orthogonal and orthonormal bases? It turns out that the orthonormal bases of a vector space are quite useful because there is a simple formula for writing any vector in the vector space as a linear combination of those orthonormal basis vectors. We do not have to start over and solve a system of equations just to determine the coefficients of the given vector relative to the basis every single time. Here is the derivation of that formula.

Suppose we have an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ for a vector space $V$. If $\mathbf{v}$ is a vector in $V$, there must exist scalars $c_1, \ldots, c_n$ such that

$$\mathbf{v} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_n\mathbf{u}_n. \tag{3}$$

We seek a formula to determine each of the $c_i$. Start with the $i$th basis vector, namely $\mathbf{u}_i$. If we take the dot product of $\mathbf{u}_i$ with both sides of (3), we have

$$\mathbf{v} \cdot \mathbf{u}_i = (c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_n\mathbf{u}_n) \cdot \mathbf{u}_i,$$


and using the properties of dot products, this leads to

$$\mathbf{v} \cdot \mathbf{u}_i = (c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_n\mathbf{u}_n) \cdot \mathbf{u}_i = c_1\,\mathbf{u}_1 \cdot \mathbf{u}_i + c_2\,\mathbf{u}_2 \cdot \mathbf{u}_i + \cdots + c_n\,\mathbf{u}_n \cdot \mathbf{u}_i.$$

Now, since the basis vectors are mutually orthogonal, we must have $\mathbf{u}_i \cdot \mathbf{u}_j = 0$ for any two distinct vectors in the set $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ (i.e., $\mathbf{u}_i \cdot \mathbf{u}_j = 0$ unless $i = j$). Therefore,

$$\mathbf{v} \cdot \mathbf{u}_i = 0 + 0 + \cdots + c_i\,\mathbf{u}_i \cdot \mathbf{u}_i + \cdots + 0 + 0.$$

Since the basis vectors are orthonormal, we know their lengths are all one, so $\mathbf{u}_i \cdot \mathbf{u}_i = \|\mathbf{u}_i\|^2 = 1$, and

$$\mathbf{v} \cdot \mathbf{u}_i = c_i\,(\mathbf{u}_i \cdot \mathbf{u}_i) = c_i.$$

We have therefore found a formula for the $i$th coefficient $c_i$. As $i$ ranges from 1 to $n$, we find that $c_1 = \mathbf{v} \cdot \mathbf{u}_1$, $c_2 = \mathbf{v} \cdot \mathbf{u}_2$, ..., $c_n = \mathbf{v} \cdot \mathbf{u}_n$. Consequently, we have proven the following theorem.

Theorem 18: If $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$ is an orthonormal basis for a vector space $V$, any vector $\mathbf{v}$ in $V$ can be written as a linear combination of these basis vectors as follows:

$$\mathbf{v} = c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_n\mathbf{u}_n = (\mathbf{v} \cdot \mathbf{u}_1)\mathbf{u}_1 + (\mathbf{v} \cdot \mathbf{u}_2)\mathbf{u}_2 + \cdots + (\mathbf{v} \cdot \mathbf{u}_n)\mathbf{u}_n.$$

Example 19: The vectors $\mathbf{u}_1 = (0, 1, 0)$, $\mathbf{u}_2 = \left(\tfrac{3}{5}, 0, -\tfrac{4}{5}\right)$, and $\mathbf{u}_3 = \left(\tfrac{4}{5}, 0, \tfrac{3}{5}\right)$ form an orthonormal basis $B$ for $\mathbb{R}^3$. Express the vector $\mathbf{v} = (2, 3, 1)$ as a linear combination of these basis vectors.

Solution 20: Take the three required dot products:

$$\mathbf{v} \cdot \mathbf{u}_1 = (2, 3, 1) \cdot (0, 1, 0) = 3$$
$$\mathbf{v} \cdot \mathbf{u}_2 = (2, 3, 1) \cdot \left(\tfrac{3}{5}, 0, -\tfrac{4}{5}\right) = \tfrac{2}{5}$$
$$\mathbf{v} \cdot \mathbf{u}_3 = (2, 3, 1) \cdot \left(\tfrac{4}{5}, 0, \tfrac{3}{5}\right) = \tfrac{11}{5}$$

These scalars represent the "coordinates of $\mathbf{v}$ relative to the basis $B$," and

$$\mathbf{v} = 3\,(0, 1, 0) + \tfrac{2}{5}\left(\tfrac{3}{5}, 0, -\tfrac{4}{5}\right) + \tfrac{11}{5}\left(\tfrac{4}{5}, 0, \tfrac{3}{5}\right).$$

(Multiply it out to confirm this!)

Furthermore, note that taking dot products in this manner, with the first vector the same each time, is equivalent to the following matrix multiplications:

$$\begin{bmatrix} 2 & 3 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = 3, \quad \begin{bmatrix} 2 & 3 & 1 \end{bmatrix}\begin{bmatrix} \tfrac{3}{5} \\ 0 \\ -\tfrac{4}{5} \end{bmatrix} = \tfrac{2}{5}, \quad \text{and} \quad \begin{bmatrix} 2 & 3 & 1 \end{bmatrix}\begin{bmatrix} \tfrac{4}{5} \\ 0 \\ \tfrac{3}{5} \end{bmatrix} = \tfrac{11}{5},$$

and we can combine all of them into a single matrix multiplication:

$$\begin{bmatrix} 2 & 3 & 1 \end{bmatrix}\begin{bmatrix} 0 & \tfrac{3}{5} & \tfrac{4}{5} \\ 1 & 0 & 0 \\ 0 & -\tfrac{4}{5} & \tfrac{3}{5} \end{bmatrix} = \begin{bmatrix} 3 & \tfrac{2}{5} & \tfrac{11}{5} \end{bmatrix},$$

yielding the desired coefficients of $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$, respectively. (Compare this to the technique we had to use to find the coordinates of a vector relative to a nonstandard basis.)
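As a quick numerical check of Theorem 18 (using the basis of Example 19, with the signs as reconstructed above), in NumPy:

```python
import numpy as np

# Orthonormal basis and vector from Example 19.
u1 = np.array([0.0, 1.0, 0.0])
u2 = np.array([3/5, 0.0, -4/5])
u3 = np.array([4/5, 0.0, 3/5])
v  = np.array([2.0, 3.0, 1.0])

# Coordinates relative to the basis are just dot products (Theorem 18).
coords = np.array([v @ u1, v @ u2, v @ u3])
print(coords)                      # [3.  0.4  2.2], i.e. 3, 2/5, 11/5

# Reassemble v from the coordinates to confirm.
assert np.allclose(coords[0]*u1 + coords[1]*u2 + coords[2]*u3, v)
```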


    Distance and Projections

We quite often need to determine the distance between a point $\mathbf{b}$ and a line in the direction of vector $\mathbf{a}$, as shown in the figure below. Or, we might want to determine "how much" of the force vector $\mathbf{b}$ is pointing in the direction of $\mathbf{a}$. (We have probably all done this with respect to the coordinate axes in the former case or horizontal and vertical vector components in the latter.) Regardless of the question, the approach is the same. We need to determine the projection of $\mathbf{b}$ onto $\mathbf{a}$, denoted by $\mathrm{proj}_{\mathbf{a}}\mathbf{b}$ and represented by $\mathbf{p}$ in the figure.

[Figure: the projection $\mathbf{p}$ of $\mathbf{b}$ onto $\mathbf{a}$, with the error vector $\mathbf{e} = \mathbf{b} - \mathbf{p}$ drawn from the tip of $\mathbf{p}$ to $\mathbf{b}$.]

It might help to think of $\mathrm{proj}_{\mathbf{a}}\mathbf{b}$ as what $\mathbf{b}$ would look like if you were "above" it and looking directly down at $\mathbf{a}$, with a line of sight perpendicular to $\mathbf{a}$.

We will now derive the formula for $\mathbf{p}$. Note that $\mathbf{p}$ must be some scalar multiple of vector $\mathbf{a}$ because it is in the same direction (or the opposite direction if the angle is obtuse). Therefore, $\mathbf{p} = c\mathbf{a}$, and we need to solve for $c$. Of course, the point on the vector $\mathbf{a}$ that is closest to $\mathbf{b}$ is the point at the foot of the perpendicular dropped from $\mathbf{b}$ onto $\mathbf{a}$. In other words, the line from $\mathbf{b}$ to the closest point $\mathbf{p}$ on $\mathbf{a}$ is perpendicular to $\mathbf{a}$. Note that in terms of vector subtraction, the side opposite angle $O$ (denoted $\mathbf{e}$ in the figure) represents the vector subtraction $\mathbf{e} = \mathbf{b} - \mathbf{p}$, or, because $\mathbf{p} = c\mathbf{a}$, $\mathbf{e} = \mathbf{b} - c\mathbf{a}$. Since vector $\mathbf{e}$ is perpendicular to $\mathbf{a}$, we must have

$$\mathbf{a} \cdot \mathbf{e} = 0, \quad \text{or} \quad \mathbf{a} \cdot (\mathbf{b} - c\mathbf{a}) = 0, \quad \text{or} \quad \mathbf{a} \cdot \mathbf{b} - c\,\mathbf{a} \cdot \mathbf{a} = 0,$$

which in turn leads to the solution

$$c = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}.$$

Therefore, the projection $\mathbf{p}$ of vector $\mathbf{b}$ onto $\mathbf{a}$ is given by

$$\mathbf{p} = \mathrm{proj}_{\mathbf{a}}\mathbf{b} = c\mathbf{a} = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}\,\mathbf{a}. \tag{4}$$

If we rewrite the dot products in (4) in the equivalent form $\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b}$ and $\mathbf{a} \cdot \mathbf{a} = \mathbf{a}^T\mathbf{a}$, we have

$$\mathrm{proj}_{\mathbf{a}}\mathbf{b} = \frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}.$$

Realizing that this is a scalar $\frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}$ multiplied by the vector $\mathbf{a}$, and rearranging, we have [3]

$$\mathrm{proj}_{\mathbf{a}}\mathbf{b} = \frac{\mathbf{a}\,\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}} = \frac{\mathbf{a}\mathbf{a}^T}{\mathbf{a}^T\mathbf{a}}\,\mathbf{b}.$$

Note that the quantity $\frac{\mathbf{a}\mathbf{a}^T}{\mathbf{a}^T\mathbf{a}}$ actually represents a matrix, called the projection matrix $P$. (It is a matrix because $\mathbf{a}\mathbf{a}^T$ is a column times a row, say an $n \times 1$ times a $1 \times n$, so the product is an $n \times n$ matrix, and $\mathbf{a}^T\mathbf{a}$ is the familiar dot product of $\mathbf{a}$ with itself.) Thus we conclude that the projection of $\mathbf{b}$ onto $\mathbf{a}$ can be found by multiplying the projection matrix $P = \frac{\mathbf{a}\mathbf{a}^T}{\mathbf{a}^T\mathbf{a}}$ by the vector $\mathbf{b}$:

$$\mathbf{p} = P\mathbf{b}.$$

[3] The $1 \times 1$ "matrix" (i.e., scalar) $\mathbf{a}^T\mathbf{a}$ is called an "inner product," while the $n \times n$ matrix $\mathbf{a}\mathbf{a}^T$ is called the "outer product."


Example 21: The matrix that projects any vector onto the line through the point $\mathbf{a} = (1, 1, 1)$ is given by

$$P = \frac{\mathbf{a}\mathbf{a}^T}{\mathbf{a}^T\mathbf{a}} = \frac{1}{3}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}.$$

For example, to determine the projection of $(2, 1, 5)$ onto the line through $(1, 1, 1)$, we would simply calculate

$$\begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}\begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \tfrac{8}{3} \\ \tfrac{8}{3} \\ \tfrac{8}{3} \end{bmatrix}.$$

Note again the ease with which the projections can be found if the vector $\mathbf{a}$ has unit length. The dot product $\mathbf{a} \cdot \mathbf{a}$ would be 1, and the resulting formulas would become

$$\mathrm{proj}_{\mathbf{a}}\mathbf{b} = (\mathbf{a} \cdot \mathbf{b})\,\mathbf{a} \quad \text{and} \quad P = \mathbf{a}\mathbf{a}^T.$$

Example 22: Determine the projection of the vector $\mathbf{v} = (6, 7)$ onto the vector $\mathbf{u} = (1, 4)$.

Method 1: Using the formula $\mathrm{proj}_{\mathbf{a}}\mathbf{b} = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}}\,\mathbf{a}$, we have

$$\mathrm{proj}_{\mathbf{u}}\mathbf{v} = \frac{34}{17}(1, 4) = (2, 8).$$

Method 2: Using the projection matrix $P = \frac{\mathbf{u}\mathbf{u}^T}{\mathbf{u}^T\mathbf{u}}$, we find

$$P = \frac{1}{17}\begin{bmatrix} 1 \\ 4 \end{bmatrix}\begin{bmatrix} 1 & 4 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{17} & \tfrac{4}{17} \\ \tfrac{4}{17} & \tfrac{16}{17} \end{bmatrix}.$$

Then

$$\mathrm{proj}_{\mathbf{u}}\mathbf{v} = P\mathbf{v} = \begin{bmatrix} \tfrac{1}{17} & \tfrac{4}{17} \\ \tfrac{4}{17} & \tfrac{16}{17} \end{bmatrix}\begin{bmatrix} 6 \\ 7 \end{bmatrix} = \begin{bmatrix} 2 \\ 8 \end{bmatrix},$$

both of which appear to agree with the figure shown below.

[Figure: the vectors $\mathbf{u} = (1, 4)$ and $\mathbf{v} = (6, 7)$, with the projection $\mathbf{p} = (2, 8)$ of $\mathbf{v}$ onto the line through $\mathbf{u}$.]
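Both methods of Example 22 are one-liners in NumPy; the helper name projection_matrix below is mine, not from the notes:

```python
import numpy as np

def projection_matrix(a):
    """Rank-one matrix P = (a a^T) / (a^T a) projecting onto the line through a."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)   # column vector
    return (a @ a.T) / (a.T @ a)[0, 0]

u = np.array([1.0, 4.0])
v = np.array([6.0, 7.0])

# Method 1: scalar formula (u.v / u.u) u
p1 = (u @ v) / (u @ u) * u

# Method 2: projection matrix P applied to v
P = projection_matrix(u)
p2 = P @ v

print(p1, p2)                            # both give [2. 8.]
assert np.allclose(p1, p2)
assert np.allclose(u @ (v - p1), 0.0)    # the error v - p is orthogonal to u
```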


    Gram-Schmidt Orthonormalization

Recall that, in $\mathbb{R}^2$, the projection of a vector $\mathbf{v}$ onto a nonzero vector $\mathbf{u}$ is given by

$$\mathrm{proj}_{\mathbf{u}}\mathbf{v} = \frac{\mathbf{u} \cdot \mathbf{v}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u}.$$

If the vector $\mathbf{u}$ is of unit length, this projection becomes

$$\mathrm{proj}_{\mathbf{u}}\mathbf{v} = \frac{\mathbf{u} \cdot \mathbf{v}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u} = (\mathbf{u} \cdot \mathbf{v})\,\mathbf{u}. \tag{5}$$

Now suppose we have a basis $\{\mathbf{w}_1, \ldots, \mathbf{w}_n\}$ for some vector space $V$ and we wish to use this basis to construct an orthogonal (or orthonormal) basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ for $V$. Start by choosing

$$\mathbf{v}_1 = \mathbf{w}_1$$

(where $\mathbf{v}_1 \neq \mathbf{0}$ because $\mathbf{w}_1$ was a member of the original basis). We then require that the second vector be orthogonal to the first, or $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. We've seen previously that at least one way to obtain an orthogonal vector is to consider the perpendicular dropped from $\mathbf{v}$ onto $\mathbf{u}$ in the projection $\mathrm{proj}_{\mathbf{u}}\mathbf{v}$:

[Figure: a vector $\mathbf{v}$, its projection $\mathrm{proj}_{\mathbf{u}}\mathbf{v}$ along $\mathbf{u}$, and the perpendicular $\mathbf{v} - \mathrm{proj}_{\mathbf{u}}\mathbf{v}$.]

So let's take the next vector, $\mathbf{v}_2$, to be the perpendicular dropped from $\mathbf{w}_2$ onto $\mathbf{v}_1$, i.e.,

$$\mathbf{v}_2 = \mathbf{w}_2 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_2. \tag{6}$$

As confirmation of this choice, note that this will satisfy the orthogonality requirement because

$$\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot \left(\mathbf{w}_2 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_2\right) = \mathbf{v}_1 \cdot \mathbf{w}_2 - \frac{\mathbf{v}_1 \cdot \mathbf{w}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1 \cdot \mathbf{v}_1 = \mathbf{v}_1 \cdot \mathbf{w}_2 - \mathbf{v}_1 \cdot \mathbf{w}_2 = 0.$$

Because $\mathbf{v}_1 = \mathbf{w}_1$ and $\mathbf{w}_2$ are members of the original basis, we know they are linearly independent, and therefore $\mathbf{v}_1$ and $\mathbf{v}_2$ are also linearly independent; thus $\mathbf{v}_2 = \mathbf{w}_2 - \frac{\mathbf{v}_1 \cdot \mathbf{w}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\,\mathbf{v}_1 \neq \mathbf{0}$.

Now we need the third basis vector to be perpendicular to the first two. Note from Eq. (6) that in order to construct a new orthogonal basis vector (i.e., $\mathbf{v}_2$), we took the next given basis vector (i.e., $\mathbf{w}_2$) and removed the component of $\mathbf{w}_2$ that pointed in the direction of $\mathbf{v}_1$, our already settled basis vector. If we continue in this manner, to find $\mathbf{v}_3$ we subtract the components of $\mathbf{w}_3$ in the directions of $\mathbf{v}_1$ and $\mathbf{v}_2$ to obtain a vector that is perpendicular to both $\mathbf{v}_1$ and $\mathbf{v}_2$; then to find $\mathbf{v}_4$ we subtract the components of $\mathbf{w}_4$ in the directions of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$; and so on. In other words, we will take

$$\mathbf{v}_3 = \mathbf{w}_3 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_3 - \mathrm{proj}_{\mathbf{v}_2}\mathbf{w}_3,$$

and then

$$\mathbf{v}_4 = \mathbf{w}_4 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_4 - \mathrm{proj}_{\mathbf{v}_2}\mathbf{w}_4 - \mathrm{proj}_{\mathbf{v}_3}\mathbf{w}_4,$$

and so on. This leads to the following generalization:


Theorem 23 (Gram-Schmidt Orthogonalization): Let $W = \{\mathbf{w}_1, \ldots, \mathbf{w}_n\}$ be a basis for a vector space $V$. To create a set of orthogonal basis vectors $B = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ from $W$, construct the $\mathbf{v}_i$ as follows:

$$\begin{aligned}
\mathbf{v}_1 &= \mathbf{w}_1 \\
\mathbf{v}_2 &= \mathbf{w}_2 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_2 \\
\mathbf{v}_3 &= \mathbf{w}_3 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_3 - \mathrm{proj}_{\mathbf{v}_2}\mathbf{w}_3 \\
&\;\;\vdots \\
\mathbf{v}_n &= \mathbf{w}_n - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_n - \mathrm{proj}_{\mathbf{v}_2}\mathbf{w}_n - \cdots - \mathrm{proj}_{\mathbf{v}_{n-1}}\mathbf{w}_n
\end{aligned}$$

To create an orthonormal basis, normalize each of the vectors $\mathbf{v}_i$.

If we normalize the vectors as we go through the process, all of the dot products, as we are reminded in (5), are easier to calculate. However, the normalization usually introduces many square roots into the calculation, which may be cumbersome to work with.

    Here are some examples of this process.

Example 24: Apply the Gram-Schmidt process to the following basis for $\mathbb{R}^2$: $B = \{(1, 1), (0, 1)\}$.

Solution: Choose $\mathbf{v}_1 = (1, 1)$. Then remove the component of $\mathbf{w}_2 = (0, 1)$ that points in the direction of $\mathbf{v}_1$:

$$\mathbf{v}_2 = \mathbf{w}_2 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_2 = (0, 1) - \frac{(1, 1) \cdot (0, 1)}{(1, 1) \cdot (1, 1)}\,(1, 1) = (0, 1) - \left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \left(-\tfrac{1}{2}, \tfrac{1}{2}\right).$$

Therefore an orthogonal basis for $\mathbb{R}^2$ based on the two vectors $(1, 1)$ and $(0, 1)$ would be $(1, 1)$ and $\left(-\tfrac{1}{2}, \tfrac{1}{2}\right)$. If we desire an orthonormal basis, divide each vector by its respective length, namely $\|\mathbf{v}_1\| = \sqrt{2}$ and $\|\mathbf{v}_2\| = \tfrac{1}{\sqrt{2}}$, so the basis would be $\left(\tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}\right)$ and $\left(-\tfrac{\sqrt{2}}{2}, \tfrac{\sqrt{2}}{2}\right)$.

Note: Had we chosen $\mathbf{v}_1 = (0, 1)$, we would have found

$$\mathbf{v}_2 = (1, 1) - \frac{(0, 1) \cdot (1, 1)}{(0, 1) \cdot (0, 1)}\,(0, 1) = (1, 0),$$

which we should have been able to guess in the first place, since $(1, 0)$ and $(0, 1)$ make up the standard basis for $\mathbb{R}^2$!

Example 25: Apply the Gram-Schmidt process to the following basis for a three-dimensional subspace of $\mathbb{R}^4$: $B = \{(1, 2, 0, 3),\ (4, 0, 5, 8),\ (8, 1, 5, 6)\}$.

Solution: Choose $\mathbf{v}_1 = (1, 2, 0, 3)$. Then remove the component of $\mathbf{w}_2 = (4, 0, 5, 8)$ that points in the direction of $\mathbf{v}_1$:

$$\begin{aligned}
\mathbf{v}_2 &= \mathbf{w}_2 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_2 = (4, 0, 5, 8) - \frac{(1, 2, 0, 3) \cdot (4, 0, 5, 8)}{(1, 2, 0, 3) \cdot (1, 2, 0, 3)}\,(1, 2, 0, 3) \\
&= (4, 0, 5, 8) - (2, 4, 0, 6) = (2, -4, 5, 2).
\end{aligned}$$

Now remove the components of $\mathbf{w}_3 = (8, 1, 5, 6)$ that point in the directions of $\mathbf{v}_1$ and $\mathbf{v}_2$:

$$\begin{aligned}
\mathbf{v}_3 &= \mathbf{w}_3 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_3 - \mathrm{proj}_{\mathbf{v}_2}\mathbf{w}_3 \\
&= (8, 1, 5, 6) - \frac{(1, 2, 0, 3) \cdot (8, 1, 5, 6)}{(1, 2, 0, 3) \cdot (1, 2, 0, 3)}\,(1, 2, 0, 3) - \frac{(2, -4, 5, 2) \cdot (8, 1, 5, 6)}{(2, -4, 5, 2) \cdot (2, -4, 5, 2)}\,(2, -4, 5, 2) \\
&= (8, 1, 5, 6) - (2, 4, 0, 6) - (2, -4, 5, 2) = (4, 1, 0, -2).
\end{aligned}$$

We conclude that the set $\{(1, 2, 0, 3),\ (2, -4, 5, 2),\ (4, 1, 0, -2)\}$ constitutes an orthogonal basis for this particular subspace. We get an orthonormal basis by dividing each vector by its length:

$$\|(1, 2, 0, 3)\| = \sqrt{14}, \quad \|(2, -4, 5, 2)\| = 7, \quad \|(4, 1, 0, -2)\| = \sqrt{21},$$

so the orthonormal basis is given by

$$\left(\tfrac{1}{\sqrt{14}}, \tfrac{2}{\sqrt{14}}, 0, \tfrac{3}{\sqrt{14}}\right), \quad \left(\tfrac{2}{7}, -\tfrac{4}{7}, \tfrac{5}{7}, \tfrac{2}{7}\right), \quad \left(\tfrac{4}{\sqrt{21}}, \tfrac{1}{\sqrt{21}}, 0, -\tfrac{2}{\sqrt{21}}\right).$$
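Here is a sketch of the process in Theorem 23 in NumPy (classical Gram-Schmidt, normalizing only at the end; gram_schmidt is a helper name of my own, not from the notes):

```python
import numpy as np

def gram_schmidt(W):
    """Classical Gram-Schmidt (Theorem 23) on the rows of W; returns orthonormal rows."""
    V = []
    for w in np.asarray(W, dtype=float):
        v = w.copy()
        for u in V:                          # subtract projections onto earlier vectors
            v -= (u @ w) / (u @ u) * u
        V.append(v)
    return np.array([v / np.linalg.norm(v) for v in V])

# Basis from Example 25, one vector per row.
W = [[1, 2, 0, 3],
     [4, 0, 5, 8],
     [8, 1, 5, 6]]
Q = gram_schmidt(W)
print(np.round(Q @ Q.T, 10))   # identity matrix: the rows are orthonormal
```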

    Projection and Distances on Subspaces; QR-Factorization

    Quick Review

We now know how to project one vector onto another vector, namely via any of the following formulas:

$$\mathrm{proj}_{\mathbf{u}}\mathbf{v} = \frac{\mathbf{u} \cdot \mathbf{v}}{\mathbf{u} \cdot \mathbf{u}}\,\mathbf{u} \quad \text{or} \quad \mathrm{proj}_{\mathbf{u}}\mathbf{v} = \frac{\mathbf{u}^T\mathbf{v}}{\mathbf{u}^T\mathbf{u}}\,\mathbf{u} \quad \text{or} \quad \mathrm{proj}_{\mathbf{u}}\mathbf{v} = \frac{\mathbf{u}\mathbf{u}^T}{\mathbf{u}^T\mathbf{u}}\,\mathbf{v}.$$

We also know how to write any vector $\mathbf{w}$ in a vector space $V$ in terms of its orthonormal basis vectors $\{\mathbf{u}_1, \ldots, \mathbf{u}_n\}$:

$$\mathbf{w} = (\mathbf{w} \cdot \mathbf{u}_1)\mathbf{u}_1 + (\mathbf{w} \cdot \mathbf{u}_2)\mathbf{u}_2 + \cdots + (\mathbf{w} \cdot \mathbf{u}_n)\mathbf{u}_n.$$

Finally, we've devised a way to generate an orthogonal (or, after normalization, orthonormal) basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ from another basis $\{\mathbf{w}_1, \ldots, \mathbf{w}_n\}$ via the Gram-Schmidt process:

$$\begin{aligned}
\mathbf{v}_1 &= \mathbf{w}_1 \\
\mathbf{v}_2 &= \mathbf{w}_2 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_2 \\
\mathbf{v}_3 &= \mathbf{w}_3 - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_3 - \mathrm{proj}_{\mathbf{v}_2}\mathbf{w}_3 \\
&\;\;\vdots \\
\mathbf{v}_n &= \mathbf{w}_n - \mathrm{proj}_{\mathbf{v}_1}\mathbf{w}_n - \mathrm{proj}_{\mathbf{v}_2}\mathbf{w}_n - \cdots - \mathrm{proj}_{\mathbf{v}_{n-1}}\mathbf{w}_n.
\end{aligned}$$

    Projection onto a Subspace

The projection of a vector $\mathbf{v}$ onto a subspace tells us "how much" of the given vector $\mathbf{v}$ lies in that particular subspace. Put another way (and rather non-rigorously), the projection of $\mathbf{v}$ onto the subspace tells us "how many" of each of the subspace's orthonormal basis vectors we would need to represent $\mathbf{v}$. We have met this quantity before, and you should recognize the right-hand side of the following.

Definition 26: Consider the subspace $W$ of $\mathbb{R}^n$ and let $\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}$ be an orthonormal basis for $W$. If $\mathbf{v}$ is a vector in $\mathbb{R}^n$, the projection of vector $\mathbf{v}$ onto the subspace $W$, denoted $\mathrm{proj}_W\mathbf{v}$, is defined as

$$\mathrm{proj}_W\mathbf{v} = (\mathbf{v} \cdot \mathbf{u}_1)\mathbf{u}_1 + (\mathbf{v} \cdot \mathbf{u}_2)\mathbf{u}_2 + \cdots + (\mathbf{v} \cdot \mathbf{u}_k)\mathbf{u}_k.$$


This is the exact same formula we encountered when writing a vector in terms of orthonormal basis vectors of a particular subspace! In addition, it would make sense (and we accept without proof) that every vector in $\mathbb{R}^n$ can be "decomposed" into a vector $\mathbf{w}$ within a subspace $W$ and a vector $\mathbf{w}^\perp$ orthogonal to $W$. In symbols,

$$\mathbf{v} = \mathbf{w} + \mathbf{w}^\perp, \quad \text{where } \mathbf{w} \text{ is in } W \text{ and } \mathbf{w}^\perp \text{ is in } W^\perp.$$

It should come as no surprise, especially if one considers the two-dimensional case, that

$$\mathbf{w} = \mathrm{proj}_W\mathbf{v},$$

and because $\mathbf{v} = \mathbf{w} + \mathbf{w}^\perp$, we must have

$$\mathbf{w}^\perp = \mathbf{v} - \mathrm{proj}_W\mathbf{v}.$$

Example 27: Suppose we have the vector $\mathbf{v} = (3, 2, 6)$ in $\mathbb{R}^3$, and we wish to decompose $\mathbf{v}$ into the sum of a vector that lies in the subspace $W$ consisting of all vectors of the form $(a, b, b)$ and a vector orthogonal to that subspace.

Solution: The vectors $(1, 0, 0)$ and $(0, 1, 1)$ span all of $W$ and are orthogonal (hence linearly independent), and therefore form a basis for $W$. Normalizing, we find orthonormal basis vectors

$$\mathbf{u}_1 = (1, 0, 0) \quad \text{and} \quad \mathbf{u}_2 = \left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right).$$

Then

$$\begin{aligned}
\mathbf{w} = \mathrm{proj}_W\mathbf{v} &= (\mathbf{v} \cdot \mathbf{u}_1)\mathbf{u}_1 + (\mathbf{v} \cdot \mathbf{u}_2)\mathbf{u}_2 \\
&= \big((3, 2, 6) \cdot (1, 0, 0)\big)\,(1, 0, 0) + \left((3, 2, 6) \cdot \left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)\right)\left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \\
&= (3, 0, 0) + (0, 4, 4) = (3, 4, 4).
\end{aligned}$$

Now,

$$\mathbf{w}^\perp = \mathbf{v} - \mathrm{proj}_W\mathbf{v} = (3, 2, 6) - (3, 4, 4) = (0, -2, 2).$$

We can then conclude that $(3, 4, 4)$ is a vector in $W$ while $(0, -2, 2)$ is a vector that is orthogonal to $W$.

    Distance from a Point to a Subspace

Again, it would seem reasonable to extend the concept of "distance between points" to "distance from a point to a line" to "distance from a point to a subspace" by realizing that the latter is simply the distance of the point from its projection in the subspace. In symbols,

$$d(\mathbf{x}, W) = \left\|\mathbf{x} - \mathrm{proj}_W\mathbf{x}\right\|.$$

Example 28: Determine the distance of the point $\mathbf{x} = (4, -1, 7)$ from the subspace $W$ discussed in the previous example.

Solution: We have already found an orthonormal basis for $W$, namely

$$\mathbf{u}_1 = (1, 0, 0) \quad \text{and} \quad \mathbf{u}_2 = \left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right).$$

Then

$$\begin{aligned}
\mathrm{proj}_W\mathbf{x} &= (\mathbf{x} \cdot \mathbf{u}_1)\mathbf{u}_1 + (\mathbf{x} \cdot \mathbf{u}_2)\mathbf{u}_2 \\
&= \big((4, -1, 7) \cdot (1, 0, 0)\big)\,(1, 0, 0) + \left((4, -1, 7) \cdot \left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)\right)\left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right) \\
&= (4, 0, 0) + (0, 3, 3) = (4, 3, 3).
\end{aligned}$$

The distance of the point $\mathbf{x}$ from the subspace $W$ is then

$$\left\|\mathbf{x} - \mathrm{proj}_W\mathbf{x}\right\| = \|(4, -1, 7) - (4, 3, 3)\| = \|(0, -4, 4)\| = \sqrt{32}.$$
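A small NumPy check of Examples 27 and 28, assuming the orthonormal basis found above (proj_onto_subspace is a helper name of my own):

```python
import numpy as np

# Orthonormal basis for W, the subspace of vectors (a, b, b), from Example 27.
U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1/np.sqrt(2), 1/np.sqrt(2)]])

def proj_onto_subspace(v, U):
    """Projection of v onto the subspace spanned by the orthonormal rows of U."""
    return sum((v @ u) * u for u in U)

v = np.array([3.0, 2.0, 6.0])
w = proj_onto_subspace(v, U)
print(w, v - w)                       # (3, 4, 4) and (0, -2, 2), as in Example 27

x = np.array([4.0, -1.0, 7.0])
dist = np.linalg.norm(x - proj_onto_subspace(x, U))
print(dist, np.sqrt(32))              # both about 5.657, as in Example 28
```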

    Orthogonal Matrices

Definition 29: An orthogonal matrix is a square matrix with orthonormal columns. Denoting this matrix $Q$, it is easy to determine that $Q^TQ = I$, and therefore $Q^T = Q^{-1}$. In other words, the transpose of an orthogonal matrix is its inverse. [4]

[4] The $Q^TQ = I$ relation still works even if $Q$ is not square. If $Q$ is an $m \times n$ matrix with orthonormal columns, $Q^T$ would be an $n \times m$ matrix, and their product $Q^TQ$ would be the $n \times n$ identity matrix.

Example 30: Consider the rotation matrix $Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Then $Q^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$, and it is easy to verify that $Q^TQ = I$. This type of matrix is called an isometry because it represents a length-preserving transformation. We can calculate the length of $(1, 2)^T$ to be $\sqrt{5}$. Then

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} \cos\theta - 2\sin\theta \\ 2\cos\theta + \sin\theta \end{bmatrix},$$

which still has a length of $\sqrt{5}$.

Example 31: All permutation matrices are orthogonal; hence we confirm that the inverse of a permutation matrix is actually its transpose.

Another important property of orthogonal matrices is that multiplication by $Q$ preserves lengths, inner products, and angles (i.e., lengths, inner products, and angles that existed before multiplication by $Q$ will be the same after multiplication by $Q$). For instance, lengths are preserved (i.e., $\|Q\mathbf{x}\|^2 = \|\mathbf{x}\|^2$) because $(Q\mathbf{x})^T(Q\mathbf{x}) = \mathbf{x}^TQ^TQ\mathbf{x} = \mathbf{x}^T\mathbf{x}$, and inner products are preserved because $(Q\mathbf{x})^T(Q\mathbf{y}) = \mathbf{x}^TQ^TQ\mathbf{y} = \mathbf{x}^T\mathbf{y}$. Therefore, the following statements are equivalent for an $n \times n$ matrix $Q$:

1. $Q$ is orthogonal.

2. $\|Q\mathbf{x}\| = \|\mathbf{x}\|$ for every $\mathbf{x}$ in $\mathbb{R}^n$.

3. $Q\mathbf{x} \cdot Q\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for every $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$.

Note that the discussion earlier regarding the expression of a vector $\mathbf{v}$ as a linear combination of a subspace's orthonormal basis vectors can be reinterpreted here if we consider again the system $A\mathbf{x} = \mathbf{b}$. This time, however, we will consider $Q\mathbf{x} = \mathbf{b}$, where the columns of $Q$ are the orthonormal basis vectors. Then writing $\mathbf{b}$ as a linear combination of the basis vectors $\{\mathbf{q}_1, \ldots, \mathbf{q}_n\}$ simply equates to solving the system

$$x_1\mathbf{q}_1 + x_2\mathbf{q}_2 + \cdots + x_n\mathbf{q}_n = \mathbf{b}, \quad \text{or} \quad Q\mathbf{x} = \mathbf{b}.$$

The solution to this system is $\mathbf{x} = Q^{-1}\mathbf{b}$, and since $Q^{-1} = Q^T$, this becomes

$$\mathbf{x} = Q^T\mathbf{b} = \begin{bmatrix} \mathbf{q}_1^T \\ \vdots \\ \mathbf{q}_n^T \end{bmatrix}\mathbf{b} = \begin{bmatrix} \mathbf{q}_1^T\mathbf{b} \\ \vdots \\ \mathbf{q}_n^T\mathbf{b} \end{bmatrix}, \tag{7}$$

where the components of $\mathbf{x}$ are the dot products of the orthonormal basis vectors with $\mathbf{b}$, as we would expect.

Note: When we projected a vector $\mathbf{b}$ onto a line, we ended up with the expression $\frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}$. Note here that $\mathbf{a}$ is actually $\mathbf{q}_i$, and because of the unit lengths, the denominator is 1. What Eq. (7) then shows is that every vector $\mathbf{b}$ is the sum of its one-dimensional projections onto the lines spanned by each of the orthonormal vectors $\mathbf{q}_i$.

Note: Furthermore, because $Q^T = Q^{-1}$, we have $QQ^T = I$ (in addition to $Q^TQ = I$). This leads to the somewhat remarkable conclusion that the rows of a square matrix are orthonormal whenever the columns are!
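A quick numerical illustration of these properties for the rotation matrix of Example 30 (any angle works):

```python
import numpy as np

theta = 0.7                      # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 2.0])

# Q^T Q = I, so Q^T is the inverse, and lengths are preserved.
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x))   # both sqrt(5)

# Solving Q x = b needs only a transpose, no elimination.
b = Q @ x
assert np.allclose(Q.T @ b, x)
```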

    QR-Factorization

In the Gram-Schmidt process, we start with independent vectors in $\mathbb{R}^m$, namely $\{\mathbf{a}_1, \ldots, \mathbf{a}_n\}$, and end with orthonormal vectors $\{\mathbf{q}_1, \ldots, \mathbf{q}_n\}$ (again in $\mathbb{R}^m$). If we make these vectors the columns of matrices $A$ and $Q$, respectively, we have two $m \times n$ matrices. Is there a third matrix that connects these two?

Recall that we can easily write vectors in a space as linear combinations of the vectors in any orthonormal basis of that space. Since the $\mathbf{q}_i$ constitute an orthonormal basis, we have

$$\begin{aligned}
\mathbf{a}_1 &= \left(\mathbf{q}_1^T\mathbf{a}_1\right)\mathbf{q}_1 + \left(\mathbf{q}_2^T\mathbf{a}_1\right)\mathbf{q}_2 + \cdots + \left(\mathbf{q}_n^T\mathbf{a}_1\right)\mathbf{q}_n \\
\mathbf{a}_2 &= \left(\mathbf{q}_1^T\mathbf{a}_2\right)\mathbf{q}_1 + \left(\mathbf{q}_2^T\mathbf{a}_2\right)\mathbf{q}_2 + \cdots + \left(\mathbf{q}_n^T\mathbf{a}_2\right)\mathbf{q}_n \\
\mathbf{a}_3 &= \left(\mathbf{q}_1^T\mathbf{a}_3\right)\mathbf{q}_1 + \left(\mathbf{q}_2^T\mathbf{a}_3\right)\mathbf{q}_2 + \cdots + \left(\mathbf{q}_n^T\mathbf{a}_3\right)\mathbf{q}_n \\
&\;\;\vdots \\
\mathbf{a}_n &= \left(\mathbf{q}_1^T\mathbf{a}_n\right)\mathbf{q}_1 + \left(\mathbf{q}_2^T\mathbf{a}_n\right)\mathbf{q}_2 + \cdots + \left(\mathbf{q}_n^T\mathbf{a}_n\right)\mathbf{q}_n.
\end{aligned}$$

However, because of the manner in which the Gram-Schmidt process is performed, we know that vector $\mathbf{a}_1$ is orthogonal to the vectors $\mathbf{q}_2, \mathbf{q}_3, \mathbf{q}_4, \ldots$, the vector $\mathbf{a}_2$ is orthogonal to the vectors $\mathbf{q}_3, \mathbf{q}_4, \mathbf{q}_5, \ldots$, the vector $\mathbf{a}_3$ is orthogonal to the vectors $\mathbf{q}_4, \mathbf{q}_5, \mathbf{q}_6, \ldots$, and so on. Therefore, all of the dot products $\mathbf{q}_j^T\mathbf{a}_i$ with $j > i$ will equal zero, yielding the following:

$$\begin{aligned}
\mathbf{a}_1 &= \left(\mathbf{q}_1^T\mathbf{a}_1\right)\mathbf{q}_1 \\
\mathbf{a}_2 &= \left(\mathbf{q}_1^T\mathbf{a}_2\right)\mathbf{q}_1 + \left(\mathbf{q}_2^T\mathbf{a}_2\right)\mathbf{q}_2 \\
\mathbf{a}_3 &= \left(\mathbf{q}_1^T\mathbf{a}_3\right)\mathbf{q}_1 + \left(\mathbf{q}_2^T\mathbf{a}_3\right)\mathbf{q}_2 + \left(\mathbf{q}_3^T\mathbf{a}_3\right)\mathbf{q}_3 \\
&\;\;\vdots \\
\mathbf{a}_n &= \left(\mathbf{q}_1^T\mathbf{a}_n\right)\mathbf{q}_1 + \left(\mathbf{q}_2^T\mathbf{a}_n\right)\mathbf{q}_2 + \cdots + \left(\mathbf{q}_n^T\mathbf{a}_n\right)\mathbf{q}_n.
\end{aligned}$$

Of course, this corresponds exactly to the following system:

$$A = \underbrace{\begin{bmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{bmatrix}}_{m \times n} = \underbrace{\begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_n \end{bmatrix}}_{m \times n}\underbrace{\begin{bmatrix} \mathbf{q}_1^T\mathbf{a}_1 & \mathbf{q}_1^T\mathbf{a}_2 & \mathbf{q}_1^T\mathbf{a}_3 & \cdots & \mathbf{q}_1^T\mathbf{a}_n \\ 0 & \mathbf{q}_2^T\mathbf{a}_2 & \mathbf{q}_2^T\mathbf{a}_3 & \cdots & \mathbf{q}_2^T\mathbf{a}_n \\ 0 & 0 & \mathbf{q}_3^T\mathbf{a}_3 & \cdots & \mathbf{q}_3^T\mathbf{a}_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \mathbf{q}_n^T\mathbf{a}_n \end{bmatrix}}_{n \times n} = QR,$$

and we have arrived at the QR-factorization of matrix $A$, in which $Q$ has orthonormal columns and $R$ is upper triangular (because of how Gram-Schmidt is performed: we start with vector $\mathbf{a}_1$, which falls on the same line as $\mathbf{q}_1$; then vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ are in the same plane as $\mathbf{q}_1$ and $\mathbf{q}_2$; and so on). Thus matrix $R$ is the matrix that connects $Q$ back to $A$, and we have the following theorem:

Theorem 32: Let $A$ be an $m \times n$ matrix with linearly independent columns. Then $A$ can be factored as $A = QR$, where $Q$ is an $m \times n$ matrix with orthonormal columns and $R$ is an invertible upper triangular matrix.


Example 33: Find a QR factorization of

$$A = \begin{bmatrix} 1 & 2 & 2 \\ -1 & 1 & 2 \\ -1 & 0 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$

Solution: It is easy to determine that the columns of $A$ are linearly independent, so they form a basis for the subspace spanned by those columns (i.e., the column space of $A$). Start the Gram-Schmidt process by setting $\mathbf{v}_1 = \mathbf{a}_1$:

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix}.$$

Then

$$\mathbf{v}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix} - \frac{\mathbf{v}_1 \cdot \mathbf{a}_2}{\mathbf{v}_1 \cdot \mathbf{v}_1}\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix} - \frac{2}{4}\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac{3}{2} \\ \tfrac{3}{2} \\ \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix}.$$

Note: Since we will be normalizing later, we can "rescale" $\mathbf{v}_2$ without changing any orthogonality relationships to make future calculations easier. So we'll replace $\mathbf{v}_2$ with $\mathbf{v}_2' = (3, 3, 1, 1)$. Finally,

$$\begin{aligned}
\mathbf{v}_3 &= \begin{bmatrix} 2 \\ 2 \\ 1 \\ 2 \end{bmatrix} - \frac{\mathbf{v}_1 \cdot \mathbf{a}_3}{\mathbf{v}_1 \cdot \mathbf{v}_1}\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} - \frac{\mathbf{v}_2' \cdot \mathbf{a}_3}{\mathbf{v}_2' \cdot \mathbf{v}_2'}\begin{bmatrix} 3 \\ 3 \\ 1 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 \\ 2 \\ 1 \\ 2 \end{bmatrix} - \frac{1}{4}\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix} - \frac{15}{20}\begin{bmatrix} 3 \\ 3 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} \\ 0 \\ \tfrac{1}{2} \\ 1 \end{bmatrix}.
\end{aligned}$$

We can again rescale $\mathbf{v}_3$ to obtain $\mathbf{v}_3' = (-1, 0, 1, 2)$. We now have an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2', \mathbf{v}_3'\}$ for the subspace $W$. Now, to obtain an orthonormal basis, normalize each vector (the details are left to you):

$$\{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\} = \left\{\begin{bmatrix} 1/2 \\ -1/2 \\ -1/2 \\ 1/2 \end{bmatrix},\ \begin{bmatrix} 3\sqrt{5}/10 \\ 3\sqrt{5}/10 \\ \sqrt{5}/10 \\ \sqrt{5}/10 \end{bmatrix},\ \begin{bmatrix} -\sqrt{6}/6 \\ 0 \\ \sqrt{6}/6 \\ \sqrt{6}/3 \end{bmatrix}\right\}.$$

Now, to obtain a QR factorization for $A$, we have

$$Q = \begin{bmatrix} 1/2 & 3\sqrt{5}/10 & -\sqrt{6}/6 \\ -1/2 & 3\sqrt{5}/10 & 0 \\ -1/2 & \sqrt{5}/10 & \sqrt{6}/6 \\ 1/2 & \sqrt{5}/10 & \sqrt{6}/3 \end{bmatrix}.$$

Because $Q$ has orthonormal columns, we know that $Q^TQ = I$. Therefore, if $A = QR$,

$$Q^TA = Q^TQR = IR = R.$$


So to find $R$, just calculate $Q^TA$:

$$Q^TA = \begin{bmatrix} \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ 3\sqrt{5}/10 & 3\sqrt{5}/10 & \sqrt{5}/10 & \sqrt{5}/10 \\ -\sqrt{6}/6 & 0 & \sqrt{6}/6 & \sqrt{6}/3 \end{bmatrix}\begin{bmatrix} 1 & 2 & 2 \\ -1 & 1 & 2 \\ -1 & 0 & 1 \\ 1 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 1 & \tfrac{1}{2} \\ 0 & \sqrt{5} & \tfrac{3}{2}\sqrt{5} \\ 0 & 0 & \tfrac{1}{2}\sqrt{6} \end{bmatrix} = R.$$

Note that the diagonal entries of $R$ contain the lengths of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.
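For comparison, here is a short sketch using NumPy's built-in QR routine. Note that np.linalg.qr fixes signs only up to a convention, so its $Q$ and $R$ may differ from the hand computation above by a sign in each column of $Q$ (and the corresponding row of $R$):

```python
import numpy as np

A = np.array([[ 1.0, 2.0, 2.0],
              [-1.0, 1.0, 2.0],
              [-1.0, 0.0, 1.0],
              [ 1.0, 1.0, 2.0]])

# 'reduced' gives the m x n Q and n x n R of Theorem 32.
Q, R = np.linalg.qr(A, mode='reduced')

print(np.round(R, 4))                    # upper triangular; compare with the R above
assert np.allclose(Q @ R, A)             # A = QR
assert np.allclose(Q.T @ Q, np.eye(3))   # orthonormal columns
```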

    Using the QR Factorization to Solve Systems

Note that the system $A\mathbf{x} = \mathbf{b}$ becomes $QR\mathbf{x} = \mathbf{b}$, and hence

$$R\mathbf{x} = Q^T\mathbf{b} \tag{8}$$

(because $Q^TQ = I$). Because $R$ is upper triangular, the equation in (8) can be solved easily via back substitution. For example, given the system $A\mathbf{x} = (0, 4, 5)$ and the fact that the $A = QR$ factorization yields

$$\begin{bmatrix} 1 & -1 & -2 \\ 1 & 0 & 2 \\ 1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{\sqrt{3}} & -\tfrac{4}{\sqrt{42}} & -\tfrac{2}{\sqrt{14}} \\ \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{42}} & \tfrac{3}{\sqrt{14}} \\ \tfrac{1}{\sqrt{3}} & \tfrac{5}{\sqrt{42}} & -\tfrac{1}{\sqrt{14}} \end{bmatrix}\begin{bmatrix} \sqrt{3} & \tfrac{1}{\sqrt{3}} & \sqrt{3} \\ 0 & \tfrac{\sqrt{14}}{\sqrt{3}} & \tfrac{\sqrt{21}}{\sqrt{2}} \\ 0 & 0 & \tfrac{\sqrt{7}}{\sqrt{2}} \end{bmatrix},$$

we find

$$Q^T\mathbf{b} = \begin{bmatrix} \tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{3} \\ -\tfrac{2}{21}\sqrt{42} & -\tfrac{1}{42}\sqrt{42} & \tfrac{5}{42}\sqrt{42} \\ -\tfrac{1}{7}\sqrt{14} & \tfrac{3}{14}\sqrt{14} & -\tfrac{1}{14}\sqrt{14} \end{bmatrix}\begin{bmatrix} 0 \\ 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 3\sqrt{3} \\ \tfrac{1}{2}\sqrt{42} \\ \tfrac{1}{2}\sqrt{14} \end{bmatrix}.$$

Then solve

$$R\mathbf{x} = \begin{bmatrix} \sqrt{3} & \tfrac{1}{\sqrt{3}} & \sqrt{3} \\ 0 & \tfrac{\sqrt{14}}{\sqrt{3}} & \tfrac{\sqrt{21}}{\sqrt{2}} \\ 0 & 0 & \tfrac{\sqrt{7}}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3\sqrt{3} \\ \tfrac{1}{2}\sqrt{42} \\ \tfrac{1}{2}\sqrt{14} \end{bmatrix}$$

by back substitution to obtain

$$\mathbf{x} = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}.$$
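The same solve, sketched in code, using SciPy's triangular solver for the back-substitution step (one could equally well back-substitute by hand, as in the LU sketch earlier):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, -1.0, -2.0],
              [1.0,  0.0,  2.0],
              [1.0,  2.0,  3.0]])
b = np.array([0.0, 4.0, 5.0])

Q, R = np.linalg.qr(A)
# R x = Q^T b, solved by back substitution since R is upper triangular.
x = solve_triangular(R, Q.T @ b, lower=False)
print(x)                      # [2. 0. 1.]
assert np.allclose(A @ x, b)
```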

    Least Squares and the QR Factorization

    Review of Least Squares and the Normal Equations

This topic builds off of what we did in Computer Lab #10. In the lab, we learned:

- In a least-squares situation, in order to minimize all of the errors (specifically, the sum of the squared distances between the "best-fit" line and the actual data points), we needed to determine the vector in $A\mathbf{x}$ that was closest to the vector $\mathbf{b}$.

- This is the same as determining the projection of $\mathbf{b}$ onto a subspace, and that subspace was actually the column space of $A$.

- Typically, in a least squares setting, we have many more data points than variables, so if $A$ is $m \times n$, then $m > n$, and we most likely do not have an exact solution (i.e., rarely will all the points follow the mathematical model exactly).


- In terms of matrix subspaces, the vector $\mathbf{b}$ will most likely be outside the column space of $A$.

- However, the point $\mathbf{p}$ in the subspace that is closest to $\mathbf{b}$ would be in the column space of $A$, so it can be written as $\mathbf{p} = A\hat{\mathbf{x}}$, where $\hat{\mathbf{x}}$ represents the "best estimate" vector for the "almost" solution vector $\mathbf{x}$.

- Since $\mathbf{p}$ is the projection of $\mathbf{b}$ onto the column space, the error vector we wish to minimize, i.e., $\mathbf{e} = \mathbf{b} - A\hat{\mathbf{x}}$, will be orthogonal to that space.

- However, if a vector is orthogonal to the column space of the matrix $A$, it is also orthogonal to the row space of the transpose $A^T$, and any vector orthogonal to the row space of a matrix is in the null space of that matrix.

- Therefore, because $\mathbf{e}$ is orthogonal to the column space of $A$, we can conclude that it is in the null space of $A^T$. This is what finally allowed us to make the following important connection:

$$A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0} \quad\Longrightarrow\quad A^T\mathbf{b} - A^TA\hat{\mathbf{x}} = \mathbf{0} \quad\Longrightarrow\quad A^TA\hat{\mathbf{x}} = A^T\mathbf{b}, \tag{9}$$

the last line of which describes what are called the normal equations.

- Finally, the matrix $A^TA$ is invertible exactly when the columns of $A$ are linearly independent. [5] Then the best estimate $\hat{\mathbf{x}}$, which gives us the coefficients in the mathematical model (or "line" of best fit), [6] can be found as $\hat{\mathbf{x}} = \left(A^TA\right)^{-1}A^T\mathbf{b}$.

[5] Be careful here: because $A$ might be rectangular, we are actually dealing with what is called a "left inverse," and the relation $\left(A^TA\right)^{-1} = A^{-1}\left(A^T\right)^{-1}$ does not hold as it does with square matrices.

[6] I use quotes here because we are not limited to linear models with this technique.

Example 34: Find a least squares solution to the inconsistent system $A\mathbf{x} = \mathbf{b}$, where

$$A = \begin{bmatrix} 1 & -5 \\ 2 & 2 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 3 \\ 2 \\ -5 \end{bmatrix}.$$

Solution: Compute

$$A^TA = \begin{bmatrix} 1 & 2 & 1 \\ -5 & 2 & 1 \end{bmatrix}\begin{bmatrix} 1 & -5 \\ 2 & 2 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 0 \\ 0 & 30 \end{bmatrix}$$

and

$$A^T\mathbf{b} = \begin{bmatrix} 1 & 2 & 1 \\ -5 & 2 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \\ -5 \end{bmatrix} = \begin{bmatrix} 2 \\ -16 \end{bmatrix}.$$

Then the normal equations are

$$A^TA\hat{\mathbf{x}} = A^T\mathbf{b} \quad\Longrightarrow\quad \begin{bmatrix} 6 & 0 \\ 0 & 30 \end{bmatrix}\hat{\mathbf{x}} = \begin{bmatrix} 2 \\ -16 \end{bmatrix},$$

from which it is easy to see that $\hat{\mathbf{x}} = \left(\tfrac{1}{3}, -\tfrac{8}{15}\right)^T$.

Example 35: Find the least squares approximating line for the data points $(1, 2)$, $(2, 2)$, and $(3, 4)$.

Solution: We want the line $y = a + bx$ that best fits these three points. The appropriate system would be

$$\begin{aligned}
a + b(1) &= 2 \\
a + b(2) &= 2 \\
a + b(3) &= 4,
\end{aligned}$$

which can be reformed into $A\mathbf{x} = \mathbf{b}$ as

$$\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}.$$

Again, compute

$$A^TA = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}$$

and

$$A^T\mathbf{b} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 8 \\ 18 \end{bmatrix}.$$

Solving

$$\begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}\hat{\mathbf{x}} = \begin{bmatrix} 8 \\ 18 \end{bmatrix}$$

leads to the solution $\hat{\mathbf{x}} = \left(\tfrac{2}{3}, 1\right)^T$, so the equation for the line of best fit would be $y = \tfrac{2}{3} + x$, shown in the plot below along with the three data points.

[Figure: plot of the line $y = \tfrac{2}{3} + x$ together with the data points $(1, 2)$, $(2, 2)$, and $(3, 4)$.]

While we're at it, we can also calculate the actual least squares error. If $\hat{\mathbf{x}}$ represents the least squares solution of $A\mathbf{x} = \mathbf{b}$, then $A\hat{\mathbf{x}}$ is the vector in the column space of $A$ that is closest to $\mathbf{b}$. The actual distance from $\mathbf{b}$ to the column space is simply the length of the perpendicular component of the projection of $\mathbf{b}$ onto that space. In symbols,

$$\|\mathbf{e}\| = \|\mathbf{b} - A\hat{\mathbf{x}}\|.$$

Now,

$$\mathbf{e} = \mathbf{b} - A\hat{\mathbf{x}} = \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix} - \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} \tfrac{2}{3} \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3} \\ -\tfrac{2}{3} \\ \tfrac{1}{3} \end{bmatrix},$$

and the length of $\mathbf{e}$ is then

$$\sqrt{\left(\tfrac{1}{3}\right)^2 + \left(-\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2} = \sqrt{\tfrac{2}{3}} \approx 0.816.$$
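Example 35 in NumPy, once via the normal equations (9) and once via np.linalg.lstsq, which solves the same minimization without explicitly forming $A^TA$:

```python
import numpy as np

# Data points from Example 35 and the design matrix for y = a + b x.
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 2.0, 4.0])
A = np.column_stack([np.ones_like(xs), xs])

# Normal equations: (A^T A) x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ ys)
print(x_hat)                               # approximately [0.667, 1], i.e. y = 2/3 + x

# lstsq solves the same problem, generally more stably.
x_lstsq, *_ = np.linalg.lstsq(A, ys, rcond=None)
assert np.allclose(x_hat, x_lstsq)

print(np.linalg.norm(ys - A @ x_hat))      # least squares error, about 0.816
```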

    Least Squares and the QR Factorization

One major advantage of orthogonalization is that it greatly simplifies the least squares problem $A\mathbf{x} = \mathbf{b}$. The normal equations from (9) are still

$$A^TA\hat{\mathbf{x}} = A^T\mathbf{b},$$

but with the QR factorization, $A^TA$ becomes

$$A^TA = (QR)^T(QR) = R^TQ^TQR = R^TR \quad \text{(because } Q^TQ = I\text{)}.$$

Then the equations in (9) become

$$A^TA\hat{\mathbf{x}} = A^T\mathbf{b} \quad\Longrightarrow\quad R^TR\hat{\mathbf{x}} = R^TQ^T\mathbf{b},$$

or,

$$R\hat{\mathbf{x}} = Q^T\mathbf{b}. \tag{10}$$

Although this may not look like much of an improvement, it most certainly is, particularly because $R$ is upper triangular. Therefore, the solution to (10) can be found via back substitution. We still need to use Gram-Schmidt to produce $Q$ and $R$, but the payoff is that the equations in (10) are less prone to numerical inaccuracies such as round-off error.

Example 36: Consider the previous example, in which we found the line of best fit for the points $(1, 2)$, $(2, 2)$, and $(3, 4)$. If we instead find the QR factorization, we have

$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{3}\sqrt{3} & -\tfrac{1}{2}\sqrt{2} \\ \tfrac{1}{3}\sqrt{3} & 0 \\ \tfrac{1}{3}\sqrt{3} & \tfrac{1}{2}\sqrt{2} \end{bmatrix}\begin{bmatrix} \sqrt{3} & 2\sqrt{3} \\ 0 & \sqrt{2} \end{bmatrix} = QR.$$

Then $R\hat{\mathbf{x}} = Q^T\mathbf{b}$ becomes

$$\begin{bmatrix} \sqrt{3} & 2\sqrt{3} \\ 0 & \sqrt{2} \end{bmatrix}\hat{\mathbf{x}} = \begin{bmatrix} \tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{3} \\ -\tfrac{1}{2}\sqrt{2} & 0 & \tfrac{1}{2}\sqrt{2} \end{bmatrix}\begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} \tfrac{8}{3}\sqrt{3} \\ \sqrt{2} \end{bmatrix}.$$

Hence, $\sqrt{2}\,b = \sqrt{2} \Rightarrow b = 1$, and so $\sqrt{3}\,a + 2\sqrt{3}\,(1) = \tfrac{8}{3}\sqrt{3} \Rightarrow a = \tfrac{2}{3}$, as we found earlier.
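The same fit via Eq. (10), assuming SciPy is available for the triangular solve:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Same data as Examples 35 and 36.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([2.0, 2.0, 4.0])

Q, R = np.linalg.qr(A, mode='reduced')               # Q is 3x2, R is 2x2 upper triangular
x_hat = solve_triangular(R, Q.T @ b, lower=False)    # R x_hat = Q^T b, Eq. (10)
print(x_hat)                                         # approximately [0.667, 1], i.e. a = 2/3, b = 1
```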

    An Aside: Least Squares and Calculus

Consider the simple system

$$\begin{aligned}
a_1 x &= b_1 \\
a_2 x &= b_2 \\
a_3 x &= b_3.
\end{aligned}$$

This is solvable only if $b_1$, $b_2$, and $b_3$ are in the ratio $a_1 : a_2 : a_3$. In practice, this would rarely be the case if the above equations came from "real" data. So, instead of trying to solve the unsolvable, we proceed by choosing an $x$ that minimizes the average error $E$ in the equations. A convenient error measurement to use is the "sum of squares," namely

$$E^2 = (a_1 x - b_1)^2 + (a_2 x - b_2)^2 + (a_3 x - b_3)^2.$$

If there were an exact solution, we would have $E = 0$. If there is not an exact solution, we can find the minimum error by setting the derivative of $E^2$ to zero,

$$\frac{dE^2}{dx} = 2\left[(a_1 x - b_1)a_1 + (a_2 x - b_2)a_2 + (a_3 x - b_3)a_3\right] = 0,$$

and then solving for $x$:

$$0 = 2\left(x a_1^2 + x a_2^2 + x a_3^2 - a_1 b_1 - a_2 b_2 - a_3 b_3\right) \quad\Longrightarrow\quad x = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{a_1^2 + a_2^2 + a_3^2} = \frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}.$$

This result, which you should recognize as the coefficient in the projection calculations, gives us the least-squares solution to a problem $\mathbf{a}x = \mathbf{b}$ in one variable $x$.
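A quick numerical check of this one-variable formula against the general least squares routine (the data vectors here are made up for illustration):

```python
import numpy as np

# One-variable least squares: minimize ||a x - b||^2 over the scalar x.
a = np.array([1.0, 2.0, 3.0])        # hypothetical data
b = np.array([2.0, 2.0, 4.0])

x = (a @ b) / (a @ a)                # the a.b / a.a formula derived above
print(x)                             # 18/14 = 9/7

# Same answer from the general least squares machinery.
x_lstsq, *_ = np.linalg.lstsq(a.reshape(-1, 1), b, rcond=None)
assert np.allclose(x, x_lstsq[0])
```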
