g mclelland notes

8/18/2019 G McLelland Notes


CHAPTER 1

Linear Equations

1.1 The Method of Elimination.

The solution of problems which arise in the real world frequently requires the calculation of unknown quantities. In the simplest cases, the unknowns are numbers and are found by solving equations which the unknown numbers must satisfy. We begin with an example.

A mining company owns three mines which produce silver, lead and zinc. The ores from the three mines produce different quantities of each of the metals. The following table gives the quantities per tonne of ore.

Mine    Silver (100 gm)    Lead (100 kg)    Zinc (100 kg)
1       1                  2                3
2       2                  1                1
3       1                  3                1

The mining company has an order for 11,000 gm of silver, 17,000 kg of lead and 10,000 kg of zinc. How much ore must be taken from each mine in order to fill the order?

Clearly the company could fill the order by taking say 110 tonnes of ore from the first mine. This would produce 11,000 gm of silver, 22,000 kg of lead and 33,000 kg of zinc. This is a very wasteful solution and the company would wish to avoid such a course of action. What is required is an extraction regime from the three mines which fills the order exactly. It is not obvious what the solution might be or even whether there is a solution.

The first step in dealing with a problem such as this is to formulate the problem mathematically. This step is often referred to as formulating a mathematical model of the problem. Mathematical calculations can then be carried out to find the solution. Such calculations may not be easy, but they are almost always very much simpler than attempting to solve the problem in any other way.

The problem requires some unknowns to be found. These are the amounts of ore which must be taken from each mine and so we begin by giving symbolic names to these quantities. Let the amount of ore taken from the first mine be x tonnes, the amount from the second mine be y tonnes and the amount from the third be z tonnes. The method used in mathematics to find unknowns is to formulate equations which the unknowns must satisfy and then to solve these equations. The equations constitute the mathematical model of the real world problem.

Consider first the quantity of silver which must be produced. Since x tonnes of ore are taken from the first mine and each tonne of ore produces 100 gm of silver, the first mine will give 1 × x × 100 gm of silver. Similarly the second mine will give 2 × y × 100 gm of silver and the third will give 1 × z × 100 gm. The total amount of silver produced is then 100x + 200y + 100z gm and this is required to be 11,000 gm. We thus require that the equation

100x + 200y + 100z = 11,000 or x + 2y + z = 110

be satisfied by x, y and z. By similar considerations with the amounts of lead and zinc, we obtain two further equations

2x + 1y + 3z = 170 and 3x + 1y + 1z = 100.

To avoid the large numbers on the right hand sides of these equations, we change the definitions of x, y and z to be the number of tens of tonnes of ore taken from each mine. The amount of ore taken from the first mine is now 10x tonnes and so the amount of silver obtained is 10x × 100 gm. With similar calculations for the other mines, the equation for the production of silver becomes

1,000x + 2,000y + 1,000z = 11,000 or x + 2y + z = 11.

The other two equations change in a similar way and so the set of equations to be solved is

x + 2y + z = 11,

2x + y + 3z = 17,

3x + y + z = 10.
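The step from the table and the order to this set of equations can be sketched in Python. This is only an illustration under stated assumptions: the names yields, order, UNIT, TONNES_PER_X and augmented_rows are our own, and each equation is stored as a list holding the coefficients of x, y and z followed by the right hand side.

```python
# Quantities per tonne of ore, copied from the table in the text
# (silver in units of 100 gm, lead and zinc in units of 100 kg).
yields = {
    1: {"silver": 1, "lead": 2, "zinc": 3},
    2: {"silver": 2, "lead": 1, "zinc": 1},
    3: {"silver": 1, "lead": 3, "zinc": 1},
}
order = {"silver": 11_000, "lead": 17_000, "zinc": 10_000}  # gm, kg, kg

UNIT = 100          # table entries are multiples of 100 gm or 100 kg
TONNES_PER_X = 10   # x, y and z count tens of tonnes of ore


def augmented_rows(yields, order):
    """Return one row per metal: coefficients of x, y, z, then the right hand side."""
    rows = []
    for metal, amount in order.items():
        coefficients = [yields[mine][metal] for mine in sorted(yields)]
        rows.append(coefficients + [amount // (UNIT * TONNES_PER_X)])
    return rows


rows = augmented_rows(yields, order)
for row in rows:
    print(row)
```

Note that each equation collects one column of the table: the rows of the table describe mines, while the equations describe metals.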

These equations are called algebraic as the unknowns are numbers and the only operations which occur are those of simple arithmetic. Further, they are linear because only first powers of the unknowns occur and there are no products of unknowns. These definitions can be made more precise, but as yet we have no reason to do so. For any equations, there are three questions which can be asked.

1. Are there any solutions of the equations? In the example above it will certainly be possible for the company to fill its order by taking out very large amounts of ore from each mine, but the question we are asking is whether it is possible to fill the order with no waste.

2. If there are solutions, how many are there? In the example the question is whether the company can fill the order in a variety of different ways or in just one way.

3. Given that there are one or more solutions of the equations, how do we find them?

These questions can be asked about any equations and the way in which they are answered varies with the type and complexity of the equations in question. For a set of linear equations, such as the one formulated above, one method of solution is the method of elimination. In this method we first eliminate one of the variables between two of the equations and then eliminate the same variable between another two of the equations. To describe the method, we must number the equations.

x + 2y + z = 11, (1)

2x + y + 3z = 17, (2)

3x + y + z = 10. (3)

We can eliminate z between the first two equations by multiplying the first by 3 and subtracting the result from the second to obtain

−x − 5y = −16. (4)

To eliminate z between the first and third equations, we simply subtract equation (1) from equation (3) to obtain

2x − y = −1. (5)

We now have two equations in the two unknowns x and y. To eliminate y, we multiply equation (5) by five and subtract equation (4). We then obtain

11x = 11. (6)

From (6), we obtain x = 1 and then from (5) we obtain y = 3. Finally from (1), we find z = 4. The solution which the company is seeking is that it should take 10 tonnes of ore from its first mine, 30 tonnes from its second mine and 40 tonnes from the third mine.
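The elimination just carried out can be replayed step by step as a check. The following is a minimal Python sketch; the helper name combine and the representation of each equation as a tuple (coefficient of x, coefficient of y, coefficient of z, right hand side) are assumptions made for illustration.

```python
from fractions import Fraction


def combine(p, eq_a, q, eq_b):
    """Return the equation p*eq_a + q*eq_b, term by term."""
    return tuple(p * a + q * b for a, b in zip(eq_a, eq_b))


eq1 = (1, 2, 1, 11)
eq2 = (2, 1, 3, 17)
eq3 = (3, 1, 1, 10)

eq4 = combine(1, eq2, -3, eq1)   # (2) - 3*(1): eliminates z
eq5 = combine(1, eq3, -1, eq1)   # (3) - (1):   eliminates z
eq6 = combine(5, eq5, -1, eq4)   # 5*(5) - (4): eliminates y

x = Fraction(eq6[3], eq6[0])                     # from 11x = 11
y = (eq5[3] - eq5[0] * x) / eq5[1]               # from 2x - y = -1
z = (eq1[3] - eq1[0] * x - eq1[1] * y) / eq1[2]  # from x + 2y + z = 11
print(x, y, z)
```

Exact fractions are used so that the check does not depend on rounding.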

These calculations have answered the three questions simultaneously. There is a solution, there is only one and we have computed it. For sets of linear equations, it may appear that there is little more to be said, but this is far from the case. There are a variety of complications which can occur.

1. In the example above, we have only three unknowns. In more complex problems there may be hundreds or perhaps thousands of unknowns. In order to solve such problems computers must be used and a simple and efficient method of automating the method of elimination must be used.

2. In some problems which arise in practice there are unequal numbers of equations and unknowns. Our methods of solution should be able to handle these situations as well as the simple case of equal numbers of each.

3. In the problem outlined above, the mining company will receive many orders over time and so it will have to solve the given equations many times. The sets of equations however will always have the same left hand sides as the left hand side of each equation is determined only by the properties of the mine. The right hand sides are determined by the orders and so will vary. In solving the sets of equations we should be able to avoid performing the same calculations on the left hand sides every time. How can we achieve this?

1.2 Gaussian Reduction.

The first problem to be considered is the automation of the elimination method described in the previous section. In the earlier calculation we eliminated z first and then y. It is irrelevant which variable is eliminated first and for hand calculations we choose a sequence of eliminations which produces the simplest numbers. For a computer however a fixed sequence of eliminations must be specified. The order which is usually used is to eliminate x first and then y as in the following calculation.

x + 2y + z = 11, (1)

2x + y + 3z = 17, (2)

3x + y + z = 10. (3)

We subtract twice the first equation from the second and three times the first equation from the third to obtain

−3y + z = −5, (4)

−5y − 2z = −23. (5)




To eliminate y , we multiply equation (5) by three and subtract five times equation (4). This gives

−11z = −44. (6)

From (6), we find z = 4, then from (5), y = 3 and finally from (1) we obtain x = 1.

This calculation involves only the coefficients of x, y and z, together with the numbers on the right hand sides of the equations. It can thus be written out as a calculation with this collection of numbers, with the unknowns omitted. To do this we begin with the array of numbers obtained from the equations.

\[
\begin{array}{rrrr}
1 & 2 & 1 & 11 \\
2 & 1 & 3 & 17 \\
3 & 1 & 1 & 10
\end{array}
\]

It is useful to separate the right hand sides of the equations and we shall do this by inserting a vertical line in the array, although this line is of no significance in performing the calculations.

\[
\begin{array}{rrr|r}
1 & 2 & 1 & 11 \\
2 & 1 & 3 & 17 \\
3 & 1 & 1 & 10
\end{array}
\]

Definition 1.1

The array of numbers obtained from a set of simultaneous linear equations by removing the unknowns and separating the right hand sides by a vertical stroke is called the augmented array of the set of equations.

The operations previously carried out on the equations can now be performed on the rows of numbers in the array. To describe these operations, we number each row in the array. The first step is to subtract twice row 1 from row 2. This operation does not change row 1. We shall denote the operation by R2 → R2 − 2R1. Thus the array becomes

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & -3 & 1 & -5 & (R_2 \to R_2 - 2R_1) \\
3 & 1 & 1 & 10
\end{array}
\]

A similar operation is performed to produce a zero as the first element of the third row.

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & -3 & 1 & -5 \\
0 & -5 & -2 & -23 & (R_3 \to R_3 - 3R_1)
\end{array}
\]

The next step is to obtain a zero as the second element of the third row. We can do this by multiplying the third row by 3 to obtain

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & -3 & 1 & -5 \\
0 & -15 & -6 & -69 & (R_3 \to 3R_3)
\end{array}
\]

and then subtracting 5 times row 2 from row 3.

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & -3 & 1 & -5 \\
0 & 0 & -11 & -44 & (R_3 \to R_3 - 5R_2)
\end{array}
\]

The solution can be found from this array. From the third row we find −11z = −44 and so z = 4. The second row gives −3y + z = −5 and so y = 3. Finally, from the first row x + 2y + z = 11 and so x = 1. There is some terminology used to describe the procedure we have just outlined.

Definition 1.2

1. In an array, the element in the i’th row and j’th column is called the (i,j) element.

2. The main diagonal of an augmented array is the diagonal beginning at the top left hand corner and proceeding downwards and to the right, but not crossing the vertical stroke.

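The following diagram shows an array with the elements of the main diagonal marked by boxes. This array is square to the left of the vertical line.

\[
\begin{array}{ccccc|c}
\boxed{a_{11}} & \cdots & a_{1i} & \cdots & a_{1n} & b_1 \\
\vdots & & \vdots & & \vdots & \vdots \\
a_{i1} & \cdots & \boxed{a_{ii}} & \cdots & a_{in} & b_i \\
\vdots & & \vdots & & \vdots & \vdots \\
a_{n1} & \cdots & a_{ni} & \cdots & \boxed{a_{nn}} & b_n
\end{array}
\]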

To solve a set of simultaneous linear equations by elimination, we seek to transform the augmented array into an array in which all elements below the main diagonal are zero. From this new array, the unknowns can be easily found.

Definition 1.3

1. The process of transforming an augmented array into an array in which all elements below the main diagonal are zero is called Gaussian Reduction.

2. The process of finding the unknowns successively from the reduced array is called back substitution.
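The two processes named in Definition 1.3 can be sketched in Python as follows. This is an illustrative sketch only: the function names gaussian_reduce and back_substitute are our own, exact fractions are used in place of the scaling step R3 → 3R3 of the hand calculation, and the code assumes that the array is square to the left of the line and that no zero ever appears on the main diagonal.

```python
from fractions import Fraction


def gaussian_reduce(array):
    """Make all elements below the main diagonal zero.

    Assumes every main diagonal element met along the way is non-zero.
    """
    rows = [[Fraction(v) for v in row] for row in array]
    n = len(rows)
    for col in range(n):
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            # Row operation: R_r -> R_r - factor * R_col
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return rows


def back_substitute(rows):
    """Find the unknowns successively, starting from the last row."""
    n = len(rows)
    solution = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(rows[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (rows[i][n] - known) / rows[i][i]
    return solution


augmented = [[1, 2, 1, 11],
             [2, 1, 3, 17],
             [3, 1, 1, 10]]
reduced = gaussian_reduce(augmented)
x, y, z = back_substitute(reduced)
print(x, y, z)
```

Because the third row is not multiplied by 3 first, the last row of the reduced array here is a multiple of the one obtained by hand, but the solution is the same.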

Gaussian reduction does not always work in such a straightforward way as it did in the above example and we shall progressively refine the procedure as we consider examples in which difficulties of various types arise.

The first difficulty that we shall consider is the appearance of a zero on the main diagonal at some stage of the calculation. This will of course prevent the other elements in the column being reduced to zero. This difficulty is easily dealt with. We simply interchange rows so that a non-zero element appears on the main diagonal. This operation corresponds to writing the original equations in a different order and this will certainly not change the solutions.

Example 1.1

We consider an example with four equations in four unknowns.

x + 2y + z + 3t = 4

2x + 4y + 3z + t = 5

−x + 2y − z + 2t = −3

2x + y + 3z − t = 6

The augmented array of the set of equations is

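\[
\begin{array}{rrrr|r}
1 & 2 & 1 & 3 & 4 \\
2 & 4 & 3 & 1 & 5 \\
-1 & 2 & -1 & 2 & -3 \\
2 & 1 & 3 & -1 & 6
\end{array}
\]

The reduction of the array is carried out column by column beginning at the left hand side of the array. The reduction however is performed by using row operations and is often referred to as row reduction.

\[
\begin{array}{rrrr|rl}
1 & 2 & 1 & 3 & 4 \\
0 & 0 & 1 & -5 & -3 & (R_2 \to R_2 - 2R_1) \\
0 & 4 & 0 & 5 & 1 & (R_3 \to R_3 + R_1) \\
0 & -3 & 1 & -7 & -2 & (R_4 \to R_4 - 2R_1)
\end{array}
\]

We now encounter the problem that the second element of the second row is zero and so cannot be used to reduce the elements below it. This can be overcome by interchanging the second and third rows.

\[
\begin{array}{rrrr|rl}
1 & 2 & 1 & 3 & 4 \\
0 & 4 & 0 & 5 & 1 & (R_2 \leftrightarrow R_3) \\
0 & 0 & 1 & -5 & -3 \\
0 & -3 & 1 & -7 & -2
\end{array}
\]

The reduction of the second column can now be carried out.

\[
\begin{array}{rrrr|rl}
1 & 2 & 1 & 3 & 4 \\
0 & 4 & 0 & 5 & 1 \\
0 & 0 & 1 & -5 & -3 \\
0 & -12 & 4 & -28 & -8 & (R_4 \to 4R_4)
\end{array}
\]

\[
\begin{array}{rrrr|rl}
1 & 2 & 1 & 3 & 4 \\
0 & 4 & 0 & 5 & 1 \\
0 & 0 & 1 & -5 & -3 \\
0 & 0 & 4 & -13 & -5 & (R_4 \to R_4 + 3R_2)
\end{array}
\]

The third column is easily reduced.

\[
\begin{array}{rrrr|rl}
1 & 2 & 1 & 3 & 4 \\
0 & 4 & 0 & 5 & 1 \\
0 & 0 & 1 & -5 & -3 \\
0 & 0 & 0 & 7 & 7 & (R_4 \to R_4 - 4R_3)
\end{array}
\]

From this array, the solution of the set of equations can be obtained by back substitution as t = 1, z = 2, y = −1 and x = 1.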
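For a computer, the row interchange used in this example has to be built into the reduction. One possible way of doing so is sketched below in Python. The function names are our own, exact fractions are used instead of the scaling step R4 → 4R4, and the choice of the first suitable row below the diagonal is an assumption of this sketch rather than a rule stated in the text.

```python
from fractions import Fraction


def reduce_with_interchange(array):
    """Gaussian reduction which interchanges rows when a diagonal element is zero."""
    rows = [[Fraction(v) for v in row] for row in array]
    n = len(rows)
    for col in range(n):
        if rows[col][col] == 0:
            # Look below the diagonal for a row with a non-zero element in this column.
            for r in range(col + 1, n):
                if rows[r][col] != 0:
                    rows[col], rows[r] = rows[r], rows[col]
                    break
            else:
                raise ValueError("no suitable row to interchange")
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return rows


def back_substitute(rows):
    """Find the unknowns successively, starting from the last row."""
    n = len(rows)
    solution = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(rows[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (rows[i][n] - known) / rows[i][i]
    return solution


example = [[1, 2, 1, 3, 4],
           [2, 4, 3, 1, 5],
           [-1, 2, -1, 2, -3],
           [2, 1, 3, -1, 6]]
reduced = reduce_with_interchange(example)
x, y, z, t = back_substitute(reduced)
print(x, y, z, t)
```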

The method used in this example will fail if the column in which the zero appears contains only zeros below the main diagonal. In this case there is no suitable row to be interchanged with the one in question. Because of this possibility, the general description of Gaussian reduction is more elaborate than we have given here. We shall not however consider sets of equations which give rise to these more complicated situations.

The method of Gaussian reduction is simply a systematic and concise way of writing out the method of elimination. The method can be described as the use of row operations on the augmented array in order to transform it to the reduced array in which all elements below the main diagonal are zero. We begin at the top left hand corner and use the element there to reduce all elements below it to zero. We then use the second element in the second column to reduce all elements below it to zero and so on.

In reducing an array in the manner just described, three types of row operation are permitted. Each corresponds to an operation on the original set of equations. These operations do not change the solutions of the equations and this is the reason why they can be used in transforming the augmented array. The operations are as follows.

1. Multiplication or division of a row by a non-zero constant.
2. Interchange of two rows.
3. Addition or subtraction of one row multiplied by a constant to another row.
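The three row operations can each be written as a small Python function, and the hand reduction of Example 1.1 can then be replayed with them. The function names scale, interchange and add_multiple are our own, and rows are counted from 0 in the code, so that R1 in the text is rows[0].

```python
from fractions import Fraction


def scale(rows, i, c):
    """Type 1: multiply row i by the non-zero constant c."""
    rows[i] = [c * v for v in rows[i]]


def interchange(rows, i, j):
    """Type 2: interchange rows i and j."""
    rows[i], rows[j] = rows[j], rows[i]


def add_multiple(rows, i, c, j):
    """Type 3: add c times row j to row i."""
    rows[i] = [a + c * b for a, b in zip(rows[i], rows[j])]


rows = [[Fraction(v) for v in row] for row in [
    [1, 2, 1, 3, 4],
    [2, 4, 3, 1, 5],
    [-1, 2, -1, 2, -3],
    [2, 1, 3, -1, 6],
]]
add_multiple(rows, 1, -2, 0)   # R2 -> R2 - 2R1
add_multiple(rows, 2, 1, 0)    # R3 -> R3 + R1
add_multiple(rows, 3, -2, 0)   # R4 -> R4 - 2R1
interchange(rows, 1, 2)        # R2 <-> R3
scale(rows, 3, 4)              # R4 -> 4R4
add_multiple(rows, 3, 3, 1)    # R4 -> R4 + 3R2
add_multiple(rows, 3, -4, 2)   # R4 -> R4 - 4R3
for row in rows:
    print([str(v) for v in row])
```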

The first of these operations was used in Example 1.1, but an array can be reduced without it. It was used above to avoid fractions, but in a calculation carried out on a computer, it is irrelevant whether or not fractions or decimal numbers appear. In the previous example, the step of multiplying the fourth row by 4 could be omitted. The reduction of the second column would then be carried out by adding (3/4)R2 to R4.

\[
\begin{array}{rrrr|rl}
1 & 2 & 1 & 3 & 4 \\
0 & 4 & 0 & 5 & 1 \\
0 & 0 & 1 & -5 & -3 \\
0 & 0 & 1 & -13/4 & -5/4 & (R_4 \to R_4 + (3/4)R_2)
\end{array}
\]

The appearance of fractions makes the hand calculation very tedious but makes no difference to the computer. This example also illustrates the fact that there are many ways to carry out a reduction, and many different reduced arrays, for a given set of equations. All reduced arrays will of course be equivalent, in the sense that they give the same solutions to the original set of equations.

In some cases the reduction of the array can be carried further than is the case with Gaussian reduction. If the array to the left of the line is square and all elements on the main diagonal in the Gaussian reduction are non-zero, then all elements above the main diagonal can also be reduced to zero. The solutions of the equations can then simply be read from the final array.

Example 1.2

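We carry on the reduction of the last example. Again the reduction is done column by column using row operations. We begin with the Gaussian array.

\[
\begin{array}{rrrr|r}
1 & 2 & 1 & 3 & 4 \\
0 & 4 & 0 & 5 & 1 \\
0 & 0 & 1 & -5 & -3 \\
0 & 0 & 0 & 7 & 7
\end{array}
\]

\[
\begin{array}{rrrr|rl}
1 & 2 & 1 & 3 & 4 \\
0 & 1 & 0 & 5/4 & 1/4 & (R_2 \to R_2/4) \\
0 & 0 & 1 & -5 & -3 \\
0 & 0 & 0 & 1 & 1 & (R_4 \to R_4/7)
\end{array}
\]

\[
\begin{array}{rrrr|rl}
1 & 0 & 1 & 1/2 & 7/2 & (R_1 \to R_1 - 2R_2) \\
0 & 1 & 0 & 5/4 & 1/4 \\
0 & 0 & 1 & -5 & -3 \\
0 & 0 & 0 & 1 & 1
\end{array}
\]

\[
\begin{array}{rrrr|rl}
1 & 0 & 0 & 11/2 & 13/2 & (R_1 \to R_1 - R_3) \\
0 & 1 & 0 & 5/4 & 1/4 \\
0 & 0 & 1 & -5 & -3 \\
0 & 0 & 0 & 1 & 1
\end{array}
\]

\[
\begin{array}{rrrr|rl}
1 & 0 & 0 & 0 & 1 & (R_1 \to R_1 - (11/2)R_4) \\
0 & 1 & 0 & 0 & -1 & (R_2 \to R_2 - (5/4)R_4) \\
0 & 0 & 1 & 0 & 2 & (R_3 \to R_3 + 5R_4) \\
0 & 0 & 0 & 1 & 1
\end{array}
\]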

From this array, the solution can be read off immediately and back substitution is not needed.

Definition 1.4

The process of transforming an array which is square to the left of the vertical line into one in which all elements on the main diagonal are one and all elements above and below the main diagonal are zero is called Gauss-Jordan reduction.
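The second stage of a Gauss-Jordan reduction, starting from an array which has already been reduced by Gaussian reduction, can be sketched in Python as follows. The function name gauss_jordan is our own, and the sketch assumes, as in Example 1.2, that the array is square to the left of the line with non-zero elements on the main diagonal. It follows the same column by column order as the hand calculation.

```python
from fractions import Fraction


def gauss_jordan(gaussian_array):
    """Continue from a Gaussian-reduced array to Gauss-Jordan form."""
    rows = [[Fraction(v) for v in row] for row in gaussian_array]
    n = len(rows)
    # Make every main diagonal element equal to one.
    for i in range(n):
        pivot = rows[i][i]
        rows[i] = [v / pivot for v in rows[i]]
    # Reduce the elements above the main diagonal, column by column.
    for col in range(n):
        for r in range(col):
            factor = rows[r][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return rows


gaussian_array = [[1, 2, 1, 3, 4],
                  [0, 4, 0, 5, 1],
                  [0, 0, 1, -5, -3],
                  [0, 0, 0, 7, 7]]
result = gauss_jordan(gaussian_array)
solution = [row[-1] for row in result]   # values of x, y, z, t read off directly
print(solution)
```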

Gauss-Jordan reduction appears to be a more efficient method than Gaussian reduction. It is plausible however, and can be demonstrated, that it requires more operations to reduce the array from Gauss to Gauss-Jordan form than it does to carry out the back substitution. It is thus more efficient to solve sets of equations by Gaussian rather than Gauss-Jordan reduction. There are however other uses of Gauss-Jordan reduction and we shall later consider one of these.

1.3 Special Cases.

We return to the example of the mining company with three mines each producing silver, lead and zinc. We shall consider the same problem as before, with the company having to find the amount of ore to be extracted from each mine to fill an order without wastage. We now suppose that the amounts of metal produced by the ore from each mine are given by the following table.

Mine    Silver (100 gm)    Lead (100 kg)    Zinc (100 kg)
1       2                  4                3
2       1                  5                3
3       4                  6                5


The order to be filled is for 2,000 gm of silver, 3,000 kg of lead and 5,000 kg of zinc. We introduce the same unknowns x, y and z as before and construct the equations which must be solved.

2x + y + 4z = 2,

4x + 5y + 6z = 3,

3x + 3y + 5z = 5.

The augmented array of the set of equations is easily written down.

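\[
\begin{array}{rrr|r}
2 & 1 & 4 & 2 \\
4 & 5 & 6 & 3 \\
3 & 3 & 5 & 5
\end{array}
\]

At this stage there appears to be little difference between this case and the one presented earlier. When we carry out the reduction however a significant difference becomes apparent.

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 3 & (R_1 \to R_3 - R_1) \\
4 & 5 & 6 & 3 \\
3 & 3 & 5 & 5
\end{array}
\]

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 3 \\
0 & -3 & 2 & -9 & (R_2 \to R_2 - 4R_1) \\
0 & -3 & 2 & -4 & (R_3 \to R_3 - 3R_1)
\end{array}
\]

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 3 \\
0 & -3 & 2 & -9 \\
0 & 0 & 0 & 5 & (R_3 \to R_3 - R_2)
\end{array}
\]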

The array has been reduced to the required form, but the final array has a row of zeros to the left of the line. This cannot be reduced further because we can only exchange a row with a row below it in the array. If we seek to use a row above it, then the reduction already carried out will be destroyed. Each row in any array corresponds to an equation and the equation corresponding to the last row of the above array is 0x + 0y + 0z = 5. For any values of x, y and z, this gives 0 = 5, which is impossible. As a consequence, the original set of equations has no solutions. In terms of the original problem, this means that the mining company cannot fill the given order without wastage. It can of course fill the order by extracting a sufficiently large amount of ore from any one of its mines, but there will always be product left over unless the equations are satisfied.

Definition 1.5

A set of simultaneous linear equations for which the reduced array contains a row with zeros to the left of the line and a non-zero number to the right is called inconsistent.

It should be noted that it is not possible for a set of equations to be inconsistent if every equation has right hand side zero. A set of equations with this property is given a particular name.

Definition 1.6

1. A set of simultaneous linear equations is called homogeneous if every equation in the set has right hand side zero.

2. If at least one right hand side is non-zero, then the set is called non-homogeneous.

Homogeneous equations do not of course occur in the example of the mining company, because in any real order, a non-zero amount of at least one of the metals will be required. We shall however encounter homogeneous sets of equations in another context later and the fact that such equations cannot be inconsistent will be important.

An inconsistent set of equations has no solutions. We have now seen that a set of simultaneous linear equations may have one solution or it may have no solutions. There is one other possibility and again we illustrate it with an example from the mining company. Suppose that the three mines produce ore as in the previous case but the order is now for 2,000 gm of silver, 8,000 kg of lead and 5,000 kg of zinc. The calculations are almost the same as before, the only difference being in the elements on the right hand side of the dividing line. The equations are

2x + y + 4z = 2,

4x + 5y + 6z = 8,

3x + 3y + 5z = 5.


and the augmented array is

\[
\begin{array}{rrr|r}
2 & 1 & 4 & 2 \\
4 & 5 & 6 & 8 \\
3 & 3 & 5 & 5
\end{array}
\]

The Gaussian reduction is carried out in the same way as before. For example, the operations R2 → R2 − 2R1, R3 → 2R3 − 3R1 and R3 → R3 − R2 give

\[
\begin{array}{rrr|r}
2 & 1 & 4 & 2 \\
0 & 3 & -2 & 4 \\
0 & 0 & 0 & 0
\end{array}
\]

Again we have a row of zeros to the left of the line, but the corresponding element to the right of the line is also zero. The equation corresponding to this row is 0 = 0 and this is satisfied for any values of x, y and z. There are then only two equations to be solved for the three unknowns. In fact we can let z have any value at all and we can find the corresponding values for x and y. Let z = a. Then from the second row of the array we find y = (4 + 2a)/3 and from the first row we obtain x = (1 − 7a)/3. Some particular solutions are x = −2, y = 2, z = 1, given by a = 1 and x = 0.1, y = 1.4, z = 0.1, given by a = 0.1. There are in fact an infinite number of solutions of the equations.
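The family of solutions just obtained can be checked by substituting it back into the three original equations for several values of a. The following Python sketch does this with exact fractions; the function names solution and satisfies are our own.

```python
from fractions import Fraction


def solution(a):
    """The solution x = (1 - 7a)/3, y = (4 + 2a)/3, z = a for a given value of a."""
    a = Fraction(a)
    return (1 - 7 * a) / 3, (4 + 2 * a) / 3, a


def satisfies(x, y, z):
    """Check the three original equations."""
    return (2 * x + y + 4 * z == 2
            and 4 * x + 5 * y + 6 * z == 8
            and 3 * x + 3 * y + 5 * z == 5)


# A few sample values of a, chosen arbitrarily for the check.
samples = [0, 1, Fraction(1, 10), -3, Fraction(22, 7)]
checks = [satisfies(*solution(a)) for a in samples]
print(checks)
```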

Definition 1.7

A set of simultaneous linear equations for which the reduced array contains a row of zeros to the left of the line and zero to the right is said to be redundant.

If a set of equations is redundant, then at least one of the equations can be obtained by adding together suitable multiples of the others. The number of distinct equations is thus less than the number of equations originally given. In the above example, if we subtract the second equation from twice the third, then we get the first equation and so there are really only two distinct equations. To put this in another way, once we have found a solution of two of the equations, then it will automatically satisfy the third equation. This is not true of any of the earlier examples.

It is quite possible for the reduced array to contain several rows of zeros to the left of the line. For some of these, the element on the right may be zero and for some it may be non-zero. Thus it is possible for a set of equations to be both inconsistent and redundant. This occurs if some equations can be obtained as combinations of others, but the set of equations has no solutions.
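The tests in Definitions 1.5 and 1.7 can be applied mechanically to a reduced array. The Python sketch below is one way of doing this. The function names reduce_rows and classify are our own, and the reduction used is slightly more general than the one described in the text: when a column has no non-zero element available it simply moves on to the next column, which is an assumption of this sketch.

```python
from fractions import Fraction


def reduce_rows(array):
    """Row-reduce an augmented array, interchanging rows where necessary."""
    rows = [[Fraction(v) for v in row] for row in array]
    n_rows, n_unknowns = len(rows), len(rows[0]) - 1
    pivot_row = 0
    for col in range(n_unknowns):
        candidates = [r for r in range(pivot_row, n_rows) if rows[r][col] != 0]
        if not candidates:
            continue   # nothing to use in this column, move on
        first = candidates[0]
        rows[pivot_row], rows[first] = rows[first], rows[pivot_row]
        for r in range(pivot_row + 1, n_rows):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return rows


def classify(array):
    """Return the pair (inconsistent, redundant) for a set of equations."""
    reduced = reduce_rows(array)
    zero_left = [row for row in reduced if all(v == 0 for v in row[:-1])]
    inconsistent = any(row[-1] != 0 for row in zero_left)
    redundant = any(row[-1] == 0 for row in zero_left)
    return inconsistent, redundant


print(classify([[2, 1, 4, 2], [4, 5, 6, 3], [3, 3, 5, 5]]))      # first order of this section
print(classify([[2, 1, 4, 2], [4, 5, 6, 8], [3, 3, 5, 5]]))      # second order of this section
print(classify([[1, 2, 1, 11], [2, 1, 3, 17], [3, 1, 1, 10]]))   # example of Section 1.1
```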

In relation to the above example, it should be observed that not every value of a gives a solution which the mining company can use. Clearly the mining company cannot extract a negative amount of ore from a mine and so the values of x, y and z must all be non-negative. Since x = (1 − 7a)/3 and z = a, this requires a ≤ 1/7 and a ≥ 0, and y = (4 + 2a)/3 is then automatically positive. In order to obtain a solution to the original real world problem we can therefore only allow values of a between 0 and 1/7. This restriction illustrates the fact that when we construct a mathematical model of a real world problem, the mathematical properties of the model may not all correspond to properties of the original problem. In particular, solutions of equations may not always be able to be implemented in practice. In solving problems with mathematics, we must always be aware of the interpretation of the results and not simply accept all mathematical results as solutions.
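The restriction on a can also be seen numerically by scanning a range of values and keeping only those for which no amount of ore is negative. In the following Python sketch the function name usable and the grid of trial values (steps of 1/70 from −2/7 to 3/7) are assumptions made for illustration.

```python
from fractions import Fraction


def usable(a):
    """True when x = (1 - 7a)/3, y = (4 + 2a)/3 and z = a are all non-negative."""
    a = Fraction(a)
    x, y, z = (1 - 7 * a) / 3, (4 + 2 * a) / 3, a
    return x >= 0 and y >= 0 and z >= 0


# Trial values of a from -2/7 to 3/7 in steps of 1/70.
candidates = [Fraction(k, 70) for k in range(-20, 31)]
feasible = [a for a in candidates if usable(a)]
print(min(feasible), max(feasible))
```

The smallest and largest usable trial values are the end points of the permitted range for a.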




There is a simple geometric interpretation of the solution of sets of simultaneous linear equations, at least in two and three dimensions. The general form of a linear equation in two unknowns is

ax + by = c

and such an equation is represented graphically by a line in the plane. A set of two simultaneous linear equations in two unknowns thus represents a pair of lines in the plane and a solution of the set of equations represents a point where the lines intersect. Similar considerations apply to an equation with three unknowns. The general form is

ax + by + cz = d

and this is represented graphically as a plane in three dimensional space. Each solution of a set of equations in three unknowns represents a point where all of the planes represented by the equations intersect.

We shall consider the two dimensional case where we have two equations in two unknowns. Consider first the equations

x + y = 2,

2x + y = 3.

The augmented array of the equations is

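\[
\begin{array}{rr|r}
1 & 1 & 2 \\
2 & 1 & 3
\end{array}
\]

The Gaussian reduction is easily obtained and the result is

\[
\begin{array}{rr|r}
1 & 1 & 2 \\
0 & -1 & -1
\end{array}
\]

The solution of the set of equations is x = 1, y = 1. Each of the equations represents a line in the plane and the solution of the set of equations is the point where the two lines intersect.

[Figure: the lines x + y = 2 and 2x + y = 3 in the plane, intersecting at the point (1, 1).]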

In two dimensions two lines will fail to intersect if they are parallel and in this case the corresponding equations will have no solutions. An example is provided by the equations

x − 2y = 1,

−2x + 4y = 6.


The augmented array of the equations is

\[
\begin{array}{rr|r}
1 & -2 & 1 \\
-2 & 4 & 6
\end{array}
\]

The Gaussian reduction is easily obtained and the result is

\[
\begin{array}{rr|r}
1 & -2 & 1 \\
0 & 0 & 8
\end{array}
\]

The reduction shows that the equations are inconsistent and this can be seen graphically by the fact that the two lines are parallel.

[Figure: the parallel lines x − 2y = 1 and −2x + 4y = 6 in the plane.]

The final case is when the equations have an infinite number of solutions. This occurs when the two lines are the same and so every point on the line is a solution of the set of equations.

In two dimensions the geometrical possibilities are easy to understand but trivial. In three dimensions there are clearly more possibilities and in more than three dimensions we can no longer draw pictures to see what is happening.

1.4 Unequal Numbers of Equations and Unknowns.

We have used the mining company example to illustrate various features of the solutions of sets of simultaneous linear equations. At the end of Section 1.1 we listed three problems to be dealt with. The first concerned the formulation of an efficient procedure for solving sets of equations by elimination. We presented such a method, the method of reduction, in Sections 1.2 and 1.3. In these Sections however, we considered only situations where the number of equations is the same as the number of unknowns. The next task is to examine cases where there are more unknowns than equations or more equations than unknowns.

Suppose we consider the original mining company which has three mines, with the production of metal per tonne of ore from each mine given by the following table.

Mine    Silver (100 gm)    Lead (100 kg)    Zinc (100 kg)
1       1                  2                3
2       2                  1                1
3       1                  3                1

We suppose the company has an order for 11,000 gm of silver and 17,000 kg of lead, but has no order for zinc. Letting x, y and z be the amounts of ore taken from each mine in tens of tonnes, we obtain the following equations to solve.

x + 2y + z = 11

2x + y + 3z = 17

The augmented array is

\[
\begin{array}{rrr|r}
1 & 2 & 1 & 11 \\
2 & 1 & 3 & 17
\end{array}
\]

The main diagonal of this array does not reach the vertical line and to reduce the array, we again seek to reduce all elements below the main diagonal to zero. This is easily done and requires only one operation.

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & -3 & 1 & -5 & (R_2 \to R_2 - 2R_1)
\end{array}
\]

We can allow z to have any real value and so we put z = a. Then from the second row of the array, y = (5 + a)/3. From the first row, x = (23 − 5a)/3. A mathematical solution of the equations is obtained for any value of a, but in the original problem we must restrict a to the range 0 ≤ a ≤ 23/5. Possible solutions for the company are x = 6, y = 2, z = 1, given by a = 1, and x = 23/3, y = 5/3, z = 0, given by a = 0, but not x = 28/3, y = 4/3, z = −1, given by a = −1. We have an infinite number of solutions in this situation but the equations are not redundant. The company would decide between the various solutions on some other grounds than those we have so far considered. Such criteria may involve the minimisation of costs. The application of such extra criteria takes us into optimisation theory, which is of great importance for rational decision making, but which is beyond the mathematical methods we are considering here.

Example 1.3

We consider an example with three equations and four unknowns.

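x + y + z + t = 2

2x − y + 3z − t = 1

x + 2y + 4z + 3t = 4

The augmented array is

\[
\begin{array}{rrrr|r}
1 & 1 & 1 & 1 & 2 \\
2 & -1 & 3 & -1 & 1 \\
1 & 2 & 4 & 3 & 4
\end{array}
\]

The Gaussian reduction is as follows.

\[
\begin{array}{rrrr|rl}
1 & 1 & 1 & 1 & 2 \\
0 & -3 & 1 & -3 & -3 & (R_2 \to R_2 - 2R_1) \\
0 & 1 & 3 & 2 & 2 & (R_3 \to R_3 - R_1)
\end{array}
\]

\[
\begin{array}{rrrr|rl}
1 & 1 & 1 & 1 & 2 \\
0 & 1 & 3 & 2 & 2 & (R_2 \leftrightarrow R_3) \\
0 & -3 & 1 & -3 & -3
\end{array}
\]

\[
\begin{array}{rrrr|rl}
1 & 1 & 1 & 1 & 2 \\
0 & 1 & 3 & 2 & 2 \\
0 & 0 & 10 & 3 & 3 & (R_3 \to R_3 + 3R_2)
\end{array}
\]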


The last unknown t may be assigned any value, t = a, and then

z = 3(1 − a)/10,

y = 11(1 − a)/10,

x = (3 + 2a)/5.
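Back substitution with one unknown left free can be sketched in Python as follows. The function name solve_with_free_last is our own, and the sketch assumes that the reduced array has exactly one more unknown than it has rows and that its main diagonal elements are non-zero, as is the case for the reduced array of Example 1.3.

```python
from fractions import Fraction


def solve_with_free_last(reduced, a):
    """Back substitution when the last unknown is assigned the value a."""
    n = len(reduced)                            # number of equations
    values = [Fraction(0)] * n + [Fraction(a)]  # last entry is the free unknown
    for i in range(n - 1, -1, -1):
        row = reduced[i]
        known = sum(row[j] * values[j] for j in range(i + 1, n + 1))
        values[i] = (row[-1] - known) / row[i]
    return values


reduced = [[1, 1, 1, 1, 2],
           [0, 1, 3, 2, 2],
           [0, 0, 10, 3, 3]]

for a in (0, 1, Fraction(1, 2)):
    x, y, z, t = solve_with_free_last(reduced, a)
    # Substitute back into the three original equations of Example 1.3.
    ok = (x + y + z + t == 2
          and 2 * x - y + 3 * z - t == 1
          and x + 2 * y + 4 * z + 3 * t == 4)
    print(a, [str(v) for v in (x, y, z, t)], ok)
```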

The opposite case is when there are more equations than unknowns. Suppose that our company discovers that there are traces of gold in each of its mines and that the content of the ore per tonne is given by the following table.

Mine    Silver (100 gm)    Lead (100 kg)    Zinc (100 kg)    Gold (gm)
1       1                  2                3                1
2       2                  1                1                3
3       1                  3                1                2

Suppose that the company obtains an order for 11,000 gm of silver, 13,000 kg of lead, 10,000 kg of zinc and 100 gm of gold. Can it fill this order without wastage from its mines? There will now be four equations in the three unknowns x, y and z. The equations are easy to formulate.

x + 2y + z = 11

2x + y + 3z = 13

3x + y + z = 10

x + 3y + 2z = 10

We begin the solution of the problem with the augmented array.

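\[
\begin{array}{rrr|r}
1 & 2 & 1 & 11 \\
2 & 1 & 3 & 13 \\
3 & 1 & 1 & 10 \\
1 & 3 & 2 & 10
\end{array}
\]

The Gaussian reduction of this array is carried out as follows.

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & -3 & 1 & -9 & (R_2 \to R_2 - 2R_1) \\
0 & -5 & -2 & -23 & (R_3 \to R_3 - 3R_1) \\
0 & 1 & 1 & -1 & (R_4 \to R_4 - R_1)
\end{array}
\]

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & 1 & 1 & -1 & (R_2 \leftrightarrow R_4) \\
0 & -5 & -2 & -23 \\
0 & -3 & 1 & -9
\end{array}
\]

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & 1 & 1 & -1 \\
0 & 0 & 3 & -28 & (R_3 \to R_3 + 5R_2) \\
0 & 0 & 4 & -12 & (R_4 \to R_4 + 3R_2)
\end{array}
\]

\[
\begin{array}{rrr|rl}
1 & 2 & 1 & 11 \\
0 & 1 & 1 & -1 \\
0 & 0 & 3 & -28 \\
0 & 0 & 0 & 76/3 & (R_4 \to R_4 - (4/3)R_3)
\end{array}
\]

The reduced form shows that the equations are inconsistent, that is they have no solutions. The interpretation of this result is that it is impossible for the company to fill the given order from its mines without wastage.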

Example 1.4

Consider the following set of 4 equations in three unknowns.

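2x − y + 3z = 9

−x + y + z = 0

3x + 2y − z = −1

−2x + 3y + 2z = −1

The augmented array is

\[
\begin{array}{rrr|r}
-1 & 1 & 1 & 0 \\
2 & -1 & 3 & 9 \\
3 & 2 & -1 & -1 \\
-2 & 3 & 2 & -1
\end{array}
\]

where we have written the second equation as the first row of the array to simplify the calculation. The Gaussian reduction of this array is carried out as follows.

\[
\begin{array}{rrr|rl}
-1 & 1 & 1 & 0 \\
0 & 1 & 5 & 9 & (R_2 \to R_2 + 2R_1) \\
0 & 5 & 2 & -1 & (R_3 \to R_3 + 3R_1) \\
0 & 1 & 0 & -1 & (R_4 \to R_4 - 2R_1)
\end{array}
\]

\[
\begin{array}{rrr|rl}
-1 & 1 & 1 & 0 \\
0 & 1 & 5 & 9 \\
0 & 0 & -23 & -46 & (R_3 \to R_3 - 5R_2) \\
0 & 0 & -5 & -10 & (R_4 \to R_4 - R_2)
\end{array}
\]

\[
\begin{array}{rrr|rl}
-1 & 1 & 1 & 0 \\
0 & 1 & 5 & 9 \\
0 & 0 & -23 & -46 \\
0 & 0 & 0 & 0 & (R_4 \to R_4 - (5/23)R_3)
\end{array}
\]

The final row of the reduced array shows that the equations are redundant. The solution of the set of equations is

z = 2, y = −1, x = 1.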

In the various examples above, we have seen that a set of simultaneous linear equations may have a unique solution, no solution or an infinite number of solutions. There are no other possibilities and the various cases can be decided by reducing the augmented array of the set of equations to the form in which all elements below the main diagonal are zero. All of the cases arise in practical problems and so all are needed if we are to be able to deal with applications.



1.5 Problems.

Solve each of the following sets of equations by

1. Gaussian reduction,
2. Gauss–Jordan reduction,

where possible.

(i)     x + 3y = 2
        2x − y = −3

(ii)    5x − 6y = 4
        8x − 9y = 7

(iii)   x + 2y − z = 5
        2x − y − z = 0
        −x − y + 3z = 8

(iv)    2x − y + 2z = 12
        3x + 4y − 3z = 5
        4x − 3y + 2z = 14

(v)     x − y + 4z = 1
        −2x + 7y − 6z = 2
        x + 9y + 8z = 3

(vi)    3x + y − 3z = 0
        x + 4y + 2z = 0
        3x − 10y − 12z = 0

(vii)   7x − 8y + 9z = 33
        9x + 8y − z = −1
        x − 7y + 9z = 26

(viii)  x − y + 2z = 4
        3x + y + 4z = 6
        x + y + z = 1

(ix)    2x + y − 5z = 0
        x − y − z = −3
        −3x − 3y + 9z = −3

(x)     4x − 3y + 2z = 8
        3x + y − 4z = 2
        5x − 7y + 8z = 1

(xi)    x + y − 2z − t = 1
        2x − y + 3z + t = 2
        x + 4y − 9z − 4t = 2

(xii)   2x + y + z − 3t = 1
        x + 2y − z + 3t = 2
        −x − 2y + 2z − t = −3

(xiii)  2x − y = 5
        3x + 2y = 4
        7x + 8y = 6

(xiv)   7x + 3y = 2
        4x − 2y = 8
        x + y = 1

(xv)    2x − y + z + 2t = 5
        −x + 2y + 2z − 3t = −3
        3x − y + z + t = 6
        −2x + y − 2z − 3t = −6

(xvi)   2x + 2y − z = 2
        −3x − y + 2z = 3
        −2x + 3y + z = 0
        4x − y − z = 0



CHAPTER 2

Matrices

2.1 Definitions.

The methods presented in Chapter 1 are very effective for solving a single set of simultaneous equations. If these were the only problems which were encountered then we would not need to develop other methods. One of the situations in which other methods are needed was referred to in Chapter 1. Consider again the mining company. Over time it will receive many orders and each time it receives an order it will have to solve a set of simultaneous linear equations. As it is only the orders which change and not the ores in the mines, these sets of equations all have the same left hand sides. It is only the right hand sides which change. In reducing each array, the calculations on the left hand side of the line are the same every time and it is wasteful to have to repeat them. We should be able to devise methods which avoid this. There are several such methods and in this chapter we consider one of them.

In this method we treat the set of equations as a single equation. Consider the simplest possible case of one equation in one unknown. The solution is easy to obtain. The equation is

\[ ax = b, \quad \text{with } a \neq 0. \tag{1} \]

The solution, $x = b/a$, is easily obtained by dividing both sides by a, but we shall write this out in a way which we shall later be able to generalise to more than one equation. First multiply both sides by $a^{-1}$, the inverse of a, and then we obtain

\[ a^{-1}ax = a^{-1}b, \]

\[ 1x = a^{-1}b, \]

\[ x = a^{-1}b. \]

It is this pattern of solution which we seek to imitate for a set of simultaneous equations. We note that the left hand side of the equation determines $a^{-1}$. Once this has been computed, we need only multiply it by the right hand side b to obtain the solution. If the right hand side changes this does not affect $a^{-1}$.

The first problem is to write a set of equations as a single equation. We begin with two equations in two unknowns and we shall use subscript notation.

\[ \begin{aligned} a_{11}x + a_{12}y &= b_1, \\ a_{21}x + a_{22}y &= b_2. \end{aligned} \tag{2} \]

Equation (1) has the form

constant times unknown = constant

and we seek to write the equations (2) in this form. We have two unknowns and so we group these into an array.

\[ \begin{pmatrix} x \\ y \end{pmatrix}. \]

Definition 2.1

A rectangular array of numbers is called a matrix. The element in the i’th row and j’th column of a matrix is called the (i, j) element of the matrix.

Matrices will be fundamental to our work in this chapter. Returning to the set of simultaneous equations (2), on the right hand sides of the equations we have two constants and we group these into a matrix

\[ \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}. \]

There are four constants on the left hand side of the equations and, following the pattern of the arrays used in the previous chapter, these are grouped into a matrix with two rows and two columns.

\[ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. \]

A matrix is always enclosed in parentheses to indicate that the array is being treated as a single object.

The equations (2) can now be written in the form

constant times unknown = constant,

by using the three matrices which we have introduced.

\[ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \tag{3} \]

This is a single equation, but the constants and the unknown are matrices rather than numbers. Before the equation can be solved however there are several aspects of it which require clarification.

1. What does it mean to say that two matrices are equal?
2. How are matrices multiplied together?

There are no predetermined answers to these questions. The answers are definitions, but the definitions are adopted because of their usefulness in applications.

Equation (3) contains matrices of several different shapes and the following terminology is used to describe the shape of a matrix.

Definition 2.2

A matrix containing m rows and n columns is said to be of size m × n. An n × n matrix is called square. An n × 1 or 1 × n matrix is also called a vector. No distinction is usually made between a 1 × 1 matrix and a number.



For example, the matrices

\[ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \begin{pmatrix} 2 & 3 & 4 \\ 5 & 6 & 7 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix} \]

are of sizes 3 × 1, 2 × 3 and 1 × 4 respectively.

Matrices are usually denoted by capital letters such as A and B. Some care however is needed when using computer algebra systems such as Mathematica which use capital letters to begin internal commands. In Mathematica, problems will be encountered if we attempt to use C, D and some other capital letters as names of matrices. These letters are reserved for internal use in Mathematica, and we get an error message if we attempt to use them for some other purpose, such as naming a matrix.

If A is a matrix, then the elements of A are denoted by $a_{ij}$ or by $A_{ij}$. We shall usually use the first of these notations when the matrix is named by a single capital letter, but for matrices with composite names, such as $A + B$ or $A - B^T$, we shall use the second notation, referring to the elements of $A + B$ for example by $(A + B)_{ij}$.

The definition of equality of matrices can now be given. It is the obvious one to adopt.

Definition 2.3

Two matrices are equal if they have the same size and if the elements in corresponding positions are equal.

Example 2.1

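1. \[ \begin{pmatrix} 1 & 2 \\ -1 & 1 \end{pmatrix} \neq \begin{pmatrix} 1 & 2 \end{pmatrix} \quad \text{(different sizes)} \]

2. \[ \begin{pmatrix} 1 & 2 \\ -1 & 1 \end{pmatrix} \neq \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix} \quad \text{(same size but unequal elements)} \]

3. \[ \begin{pmatrix} 1 & 2 \end{pmatrix} \neq \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{(different sizes)} \]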

2.2 Multiplication of Matrices.

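The second of the two questions we must deal with concerns multiplication of matrices. The definition is complicated, but its form is dictated by its applications. In our case we must look at equation (2) from the previous section. As an alternative to equation (3), we can write equation (2) in matrix form as

\[ \begin{pmatrix} a_{11}x + a_{12}y \\ a_{21}x + a_{22}y \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}. \]

Comparing this equation with equation (3), we see that

\[ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a_{11}x + a_{12}y \\ a_{21}x + a_{22}y \end{pmatrix}. \]

This gives the rule for multiplying a 2 × 2 matrix by a 2 × 1 matrix to produce a 2 × 1 matrix. Applying this rule to an example gives

\[ \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix} \cdot \begin{pmatrix} 6 \\ 7 \end{pmatrix} = \begin{pmatrix} 2 \cdot 6 + 3 \cdot 7 \\ 4 \cdot 6 + 5 \cdot 7 \end{pmatrix} = \begin{pmatrix} 33 \\ 59 \end{pmatrix}. \]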

This rule is sufficiently unusual that it may be useful to give some further examples of its use. The first example is one that we shall use again later.

Suppose we have a country in which an epidemic is raging. The country will contain people who are well, people who are sick and people who are dead. Suppose that each month 40% of the well people become sick, 40% of the sick die and 20% of the sick recover. There are of course no changes from being dead. What is the state of the population after 12 months? What is it in the long run? We shall later be able to answer these questions in a very efficient way, but for the moment we shall be content with formulating the problem in matrix form.

Let the numbers of well, sick and dead people after n months be $w_n$, $s_n$ and $d_n$ respectively. The initial numbers in each group are $w_0$, $s_0$ and $d_0$ respectively. After one month, 40% of the well people have become sick and 20% of the sick have become well. Thus the number of well people at the end of one month is 60% of the number originally well added to 20% of the number originally sick. In symbols,

\[ w_1 = \tfrac{3}{5} w_0 + \tfrac{1}{5} s_0. \]

Similarly, the number of sick at the end of one month is given by

\[ s_1 = \tfrac{2}{5} w_0 + \tfrac{2}{5} s_0 \]

and the number of dead is given by

\[ d_1 = \tfrac{2}{5} s_0 + d_0. \]

These results can be written in matrix form as follows,

\[ \begin{pmatrix} w_1 \\ s_1 \\ d_1 \end{pmatrix} = \begin{pmatrix} 3/5 & 1/5 & 0 \\ 2/5 & 2/5 & 0 \\ 0 & 2/5 & 1 \end{pmatrix} \cdot \begin{pmatrix} w_0 \\ s_0 \\ d_0 \end{pmatrix}, \]

and we see that the same rule for matrix multiplication applies as in the previous case. The result we have obtained in this case is not a set of equations to be solved. Rather it is a rule telling us how to compute the state of the population after one month, assuming we know the initial state. We shall return to this problem in Chapter 4.
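As an illustration only, the one-month rule can be applied with a short Python sketch. The initial population of 1000 well people and the helper name mat_vec are assumptions made for this example and do not come from the text. Exact fractions are used so that no rounding intrudes.

```python
from fractions import Fraction as F

# Transition matrix from the text: the rows give the new well, sick and dead counts.
T = [
    [F(3, 5), F(1, 5), F(0)],
    [F(2, 5), F(2, 5), F(0)],
    [F(0),    F(2, 5), F(1)],
]

def mat_vec(matrix, vector):
    """Tip each row of the matrix down the column vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Hypothetical initial state: 1000 well, 0 sick, 0 dead.
state0 = [F(1000), F(0), F(0)]
state1 = mat_vec(T, state0)   # state after one month
state2 = mat_vec(T, state1)   # state after two months

print([int(v) for v in state1])  # [600, 400, 0]
print([int(v) for v in state2])  # [440, 400, 160]
```

Applying the same matrix repeatedly gives the state after any number of months, although this is not the efficient method promised for later.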

Both in the case of simultaneous equations and in the case of the epidemic problem, the rule which is used to multiply a matrix by a vector may best be described as “tipping rows of the first matrix down the columns of the second”. This same rule is used to multiply matrices of other sizes, but it is easily seen that there must be restrictions on the sizes of the matrices being multiplied. Suppose we multiply matrices A and B to obtain matrix C, that is

\[ A \cdot B = C. \]

In order to apply the rule of “tipping rows of A down columns of B”, the number of elements in a row of A must equal the number of elements in a column of B. This can be expressed as

number of columns in A = number of rows in B.

It is only when this condition is satisfied that the matrices can be multiplied together.

In the examples considered above, a square matrix and a vector were multiplied together. We next consider an application where a product of matrices of other sizes arises.

A manufacturing company sells three products X, Y and Z. Suppose there is a sales staff of four and that their names are A, B, C and D. The selling prices of the products and the commissions received by each member of the sales staff are given in thousands of dollars, ($K), in the following table.

Product   Price ($K)   Commission ($K)
X         2.50         0.25
Y         3.80         0.35
Z         4.05         0.38

Each month the sales manager is presented with the quantities of each product sold by each salesperson and the manager must compute the total sales and the commission to be paid to each of them. Suppose that in some particular month the manager is presented with the following figures.

Salesperson   Product X   Product Y   Product Z
A             6           4           5
B             5           7           8
C             8           3           6
D             7           6           4

Each of the tables contains an array of numbers, and these arrays can be written as matrices. We shall call these the Sales and Prices matrices, and denote them by S and P respectively.

\[ S = \begin{pmatrix} 6 & 4 & 5 \\ 5 & 7 & 8 \\ 8 & 3 & 6 \\ 7 & 6 & 4 \end{pmatrix}, \quad P = \begin{pmatrix} 2.50 & 0.25 \\ 3.80 & 0.35 \\ 4.05 & 0.38 \end{pmatrix} \]

We wish to calculate the total value of the sales of each salesperson. For A we calculate the total sales in $K as

\[ 6 \times 2.50 + 4 \times 3.80 + 5 \times 4.05. \]

This is easily recognised as the (1, 1) entry in the matrix product which is obtained by tipping the first row of S down the first column of P. Similarly, the total sales of B is given by the (2, 1) element of the product,

\[ 5 \times 2.50 + 7 \times 3.80 + 8 \times 4.05, \]

obtained by tipping the second row of S down the first column of P. The total sales of C and D are given by the (3, 1) and (4, 1) elements, respectively. The total sales of each member of the sales staff can thus be obtained from the first column of the matrix product S · P.

\[ S \cdot P = \begin{pmatrix} 6 & 4 & 5 \\ 5 & 7 & 8 \\ 8 & 3 & 6 \\ 7 & 6 & 4 \end{pmatrix} \cdot \begin{pmatrix} 2.50 & 0.25 \\ 3.80 & 0.35 \\ 4.05 & 0.38 \end{pmatrix} \]

\[ = \begin{pmatrix} 6 \cdot 2.50 + 4 \cdot 3.80 + 5 \cdot 4.05 & 6 \cdot 0.25 + 4 \cdot 0.35 + 5 \cdot 0.38 \\ 5 \cdot 2.50 + 7 \cdot 3.80 + 8 \cdot 4.05 & 5 \cdot 0.25 + 7 \cdot 0.35 + 8 \cdot 0.38 \\ 8 \cdot 2.50 + 3 \cdot 3.80 + 6 \cdot 4.05 & 8 \cdot 0.25 + 3 \cdot 0.35 + 6 \cdot 0.38 \\ 7 \cdot 2.50 + 6 \cdot 3.80 + 4 \cdot 4.05 & 7 \cdot 0.25 + 6 \cdot 0.35 + 4 \cdot 0.38 \end{pmatrix} \]

\[ = \begin{pmatrix} 50.45 & 4.80 \\ 71.50 & 6.74 \\ 55.70 & 5.33 \\ 56.50 & 5.37 \end{pmatrix}. \]

The commission in $K paid to A is given by

\[ 6 \times 0.25 + 4 \times 0.35 + 5 \times 0.38. \]

This is the (1, 2) element in the matrix product, obtained by tipping the first row of S down the second column of P. Similar calculations show that the commissions earned by the other salespersons are given by the remaining elements of the second column of the matrix product.

We have shown that the total sales and the commissions for each member of the sales staff can be read from the product matrix. The total sales of A for example are $50,450 and the commission paid is $4,800. All computer algebra systems can perform matrix multiplications, and so the identification of the solution of a problem as a matrix product will enable the problem to be solved with such software.
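The same product can be sketched in Python rather than in a computer algebra system. This is only an illustration: the function name mat_mul is an assumption, and decimal arithmetic is chosen here so that the prices are held exactly.

```python
from decimal import Decimal as D

# Sales matrix S (one row per salesperson) and Prices matrix P from the text.
S = [[6, 4, 5], [5, 7, 8], [8, 3, 6], [7, 6, 4]]
P = [[D("2.50"), D("0.25")], [D("3.80"), D("0.35")], [D("4.05"), D("0.38")]]

def mat_mul(a, b):
    """Tip each row of a down each column of b (sizes assumed compatible)."""
    columns_of_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
            for row in a]

SP = mat_mul(S, P)
# Column 1 holds total sales in $K, column 2 holds commission in $K.
for name, (sales, commission) in zip("ABCD", SP):
    print(name, sales, commission)
```

The first column of SP reproduces the total sales and the second column the commissions found by hand above.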

The rule for matrix multiplication can be expressed in general terms in the following way. If the first matrix has size m × n and the second has size n × p, then the product will have size m × p. To obtain the (i, j) element of the product of two matrices, row i of the first matrix is tipped down column j of the second, corresponding elements are multiplied together and the products are added. The result is

\[ a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}. \]

The operation is illustrated diagrammatically in the following figure.

\[ \begin{pmatrix} a_{11} & \cdots & a_{1k} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i1} & \cdots & a_{ik} & \cdots & a_{in} \\ \vdots & & \vdots & & \vdots \\ a_{m1} & \cdots & a_{mk} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1p} \\ \vdots & & \vdots & & \vdots \\ b_{k1} & \cdots & b_{kj} & \cdots & b_{kp} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nj} & \cdots & b_{np} \end{pmatrix} \]

The precise definition of matrix multiplication can now be formulated.



Definition 2.4

Let A be an m × n matrix and B be an n × p matrix. The product of A and B is the m × p matrix whose (i, j) element is obtained by tipping the i’th row of A down the j’th column of B, multiplying together each pair of elements and adding the results. The product is denoted by A · B or simply by AB. This can be expressed formally in summation notation as

\[ (A \cdot B)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}. \]

Note that we are using lower case letters to denote elements on the right hand side of this definition, but not on the left hand side where the matrix has a composite name. We shall adopt the same convention for the other operations on matrices in the next section.

We consider some examples and in these examples we shall write out the calculations in detail, although in hand calculations much of the arithmetic would be done mentally.

Example 2.2

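1. \[ \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} \cdot \begin{pmatrix} 1 & -2 \\ 2 & -3 \end{pmatrix} = \begin{pmatrix} 3 \cdot 1 + 4 \cdot 2 & 3 \cdot (-2) + 4 \cdot (-3) \\ 1 \cdot 1 + 2 \cdot 2 & 1 \cdot (-2) + 2 \cdot (-3) \end{pmatrix} = \begin{pmatrix} 11 & -18 \\ 5 & -8 \end{pmatrix} \]

2. \[ \begin{pmatrix} 2 & 1 \end{pmatrix} \cdot \begin{pmatrix} 3 & 8 \\ 7 & 6 \end{pmatrix} = \begin{pmatrix} 2 \cdot 3 + 1 \cdot 7 & 2 \cdot 8 + 1 \cdot 6 \end{pmatrix} = \begin{pmatrix} 13 & 22 \end{pmatrix} \]

3. \[ \begin{pmatrix} 1 & -1 & 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix} = 1 \cdot 2 + (-1) \cdot (-1) + 2 \cdot 3 = 9 \]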
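The summation in Definition 2.4 translates almost directly into code. The following Python sketch is one possible rendering, not the only one: the name mat_mul and the choice to raise ValueError when the sizes do not match are assumptions of this example. It reproduces the three products of Example 2.2.

```python
def mat_mul(a, b):
    """Product of an m x n matrix a and an n x p matrix b, given as lists of rows."""
    m, n, p = len(a), len(b), len(b[0])
    # The number of columns in a must equal the number of rows in b.
    if any(len(row) != n for row in a):
        raise ValueError("number of columns in A must equal number of rows in B")
    # (A.B)_ij is the sum over k of a_ik * b_kj.
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

print(mat_mul([[3, 4], [1, 2]], [[1, -2], [2, -3]]))  # [[11, -18], [5, -8]]
print(mat_mul([[2, 1]], [[3, 8], [7, 6]]))            # [[13, 22]]
print(mat_mul([[1, -1, 2]], [[2], [-1], [3]]))        # [[9]]
```

Note that the third product is returned as a 1 × 1 matrix, whereas the text makes no distinction between a 1 × 1 matrix and a number.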

Matrix multiplication differs from ordinary multiplication of numbers in two ways. First, because of the size restrictions it is not always possible to multiply two matrices together. Second, and more important, is the fact that the order of the factors usually affects the result. There are five possibilities. We shall give examples of each.

1. The multiplication is not possible in either direction. For example, let

\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}. \]

Since A is 2 × 2, while B is 1 × 3, neither A · B nor B · A can be calculated.

2. The multiplication is possible in one direction but not the other. For example, let

\[ A = \begin{pmatrix} 1 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}. \]

Then $A \cdot B = \begin{pmatrix} 8 & 5 \end{pmatrix}$, but B · A is undefined because B is 2 × 2, while A is 1 × 2.

3. The multiplication is possible in both directions, but matrices of different sizes are produced. For example, let

\[ A = \begin{pmatrix} 3 \\ 1 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 3 \end{pmatrix}. \]

Then

\[ A \cdot B = \begin{pmatrix} 6 & 9 \\ 2 & 3 \end{pmatrix}, \quad B \cdot A = 9. \]

4. The multiplication is possible in both directions and the resulting matrices have the same dimensions, but are unequal. For example, let

\[ A = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} -1 & 2 \\ 3 & -2 \end{pmatrix}. \]

Then

\[ A \cdot B = \begin{pmatrix} 7 & -2 \\ 11 & -6 \end{pmatrix}, \quad B \cdot A = \begin{pmatrix} 0 & 5 \\ 4 & 1 \end{pmatrix}. \]

This case can only occur when the matrices are square and of the same size. This possibility is important in algebraic calculations with matrices and we give several examples below.

5. The multiplication is possible in both directions and the resulting matrices are equal. Again this can only occur when the matrices are square and of the same size. For example, let

\[ A = \begin{pmatrix} 3 & -1 \\ 2 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} -5 & 4 \\ -8 & 7 \end{pmatrix}. \]

Then

\[ A \cdot B = \begin{pmatrix} -7 & 5 \\ -10 & 8 \end{pmatrix}, \quad B \cdot A = \begin{pmatrix} -7 & 5 \\ -10 & 8 \end{pmatrix}. \]

The following terminology is used to describe these possibilities.

Definition 2.5

1. Two matrices A and B are said to commute if

\[ A \cdot B = B \cdot A. \]

2. Because there are pairs of matrices which do not commute, matrix multiplication is said to be non-commutative.
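Whether a particular pair of square matrices commutes can be tested by forming both products and comparing them. The Python sketch below does this for the matrices of possibilities 4 and 5 above. The function names are assumptions of the example, and the check assumes both matrices are square and of the same size.

```python
def mat_mul(a, b):
    """Tip each row of a down each column of b (sizes assumed compatible)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def commute(a, b):
    """True when A.B equals B.A; a and b are assumed square and of the same size."""
    return mat_mul(a, b) == mat_mul(b, a)

# Possibility 4 from the text: same size, different products.
print(commute([[2, 3], [1, 4]], [[-1, 2], [3, -2]]))   # False
# Possibility 5 from the text: the two products agree.
print(commute([[3, -1], [2, 0]], [[-5, 4], [-8, 7]]))  # True
```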



Throughout this section, we have denoted matrix multiplication by a dot, writing the product of the matrices A and B as A · B. This is the notation used by Mathematica for matrix multiplication, but it is not usually used in printed text. From here on we shall usually omit the dot and simply write the product of A and B as AB.

Finally in this section, we return to the representation of a set of simultaneous linear equations in matrix form. Having defined matrix equality and matrix multiplication, we can write the set of equations in the form

\[ AX = B, \]

where A is the matrix of coefficients from the left hand side, X is the column matrix or vector of unknowns and B is the column matrix or vector of constants from the right hand side. In the case of two equations in two unknowns for example we would have

\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad X = \begin{pmatrix} x \\ y \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}. \]

2.3 Other Operations with Matrices.

There are several other operations which can be performed on matrices in addition to the operation of multiplication. In this section we shall introduce some of these, in particular sums, multiples and transposes. As with multiplication, the definitions of these operations are chosen because of their usefulness in particular applications. While the definition of multiplication is complicated and unintuitive, the definitions of the present operations are quite straightforward.

Consider the mining company which we introduced in the previous chapter. Suppose the company receives two orders, one for 5,000 gm of silver, 7,000 kg of lead and 6,000 kg of zinc, while the other is for 3,000 gm of silver, 5,000 kg of lead and 5,000 kg of zinc. The total order is for 8,000 gm of silver, 12,000 kg of lead and 11,000 kg of zinc. Each order is represented by a 3 × 1 matrix, with entries in thousands, and the fact that the total order is the sum of the two individual orders suggests that we should write

\[ \begin{pmatrix} 5 \\ 7 \\ 6 \end{pmatrix} + \begin{pmatrix} 3 \\ 5 \\ 5 \end{pmatrix} = \begin{pmatrix} 8 \\ 12 \\ 11 \end{pmatrix}. \]

This in turn suggests that the sum of two matrices is obtained by adding the elements in corresponding positions. In order for this to be possible, the two matrices must have the same size.

Definition 2.6

Let A and B be two matrices of the same size. The sum of A and B, denoted by A + B, is the matrix obtained by adding the elements in corresponding positions. In symbols,

\[ (A + B)_{ij} = a_{ij} + b_{ij}. \]

We next consider some examples. In these examples, as with our earlier examples of matrix multiplication, we shall write out the calculations in detail, although in hand calculations much of the arithmetic would be done mentally.



Example 2.3

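1. \[ \begin{pmatrix} 2 & 1 \\ 3 & 6 \end{pmatrix} + \begin{pmatrix} -1 & 2 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 2 - 1 & 1 + 2 \\ 3 + 1 & 6 + 4 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 4 & 10 \end{pmatrix} \]

2. \[ \begin{pmatrix} 1 & 2 \end{pmatrix} + \begin{pmatrix} 4 & 3 \end{pmatrix} = \begin{pmatrix} 1 + 4 & 2 + 3 \end{pmatrix} = \begin{pmatrix} 5 & 5 \end{pmatrix} \]

3. \[ \begin{pmatrix} -2 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 3 \\ 4 & 5 \end{pmatrix} \] is undefined.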
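Definition 2.6 and the three cases of Example 2.3 can be mirrored in a few lines of Python. This is a sketch only. The name mat_add and the use of ValueError to signal an undefined sum are assumptions of the example.

```python
def mat_add(a, b):
    """Add two matrices (lists of rows) element by element."""
    same_size = len(a) == len(b) and all(len(r) == len(s) for r, s in zip(a, b))
    if not same_size:
        raise ValueError("matrices must have the same size")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

print(mat_add([[2, 1], [3, 6]], [[-1, 2], [1, 4]]))  # [[1, 3], [4, 10]]
print(mat_add([[1, 2]], [[4, 3]]))                   # [[5, 5]]
try:
    mat_add([[-2, 1]], [[1, 3], [4, 5]])             # sizes 1 x 2 and 2 x 2
except ValueError as error:
    print("undefined:", error)
```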

The second operation on matrices to be considered in this section is multiplication of a matrix by a number. This is carried out element by element and there are no size restrictions. The definition is suggested by considerations similar to those used above for addition of matrices.

Definition 2.7

Let A be a matrix and c be a number. The product of A by c, denoted by cA, is the matrix obtained by multiplying each element of A by c. In symbols

\[ (cA)_{ij} = c\,a_{ij}. \]

This definition is consistent with the definition of addition, as it justifies results of the form

\[ A + A = 2A, \quad 3A - 4A = -A. \]

Example 2.4

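1. \[ 3 \begin{pmatrix} 1 & 2 \\ 4 & 5 \end{pmatrix} = \begin{pmatrix} 3 \cdot 1 & 3 \cdot 2 \\ 3 \cdot 4 & 3 \cdot 5 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ 12 & 15 \end{pmatrix} \]

2. \[ -1 \begin{pmatrix} 1 & -4 & 5 \\ -2 & 1 & 6 \end{pmatrix} = \begin{pmatrix} (-1) \cdot 1 & (-1) \cdot (-4) & (-1) \cdot 5 \\ (-1) \cdot (-2) & (-1) \cdot 1 & (-1) \cdot 6 \end{pmatrix} = \begin{pmatrix} -1 & 4 & -5 \\ 2 & -1 & -6 \end{pmatrix} \]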
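The consistency between multiplication by a number and addition can be checked on a particular matrix. The Python sketch below uses the matrix of Example 2.4, part 1. The function names scale and mat_add are assumptions of the example, and a single numerical check illustrates the identities without proving them.

```python
def scale(c, a):
    """Multiply every element of the matrix a by the number c."""
    return [[c * x for x in row] for row in a]

def mat_add(a, b):
    """Add two matrices of the same size element by element."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

A = [[1, 2], [4, 5]]
print(scale(3, A))                                         # [[3, 6], [12, 15]]
print(mat_add(A, A) == scale(2, A))                        # A + A = 2A
print(mat_add(scale(3, A), scale(-4, A)) == scale(-1, A))  # 3A - 4A = -A
```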

The operations we have considered so far have analogies with the operations of ordinary arithmetic with numbers. The next operation however has no analogue in ordinary arithmetic. It is an operation which does not appear to have any obvious application, but we shall use it in the next chapter when discussing the inverse of a matrix.

Definition 2.8

Let A be a matrix. The transpose of A, denoted by $A^T$, is the matrix obtained by interchanging the rows and columns of A. Thus

\[ (A^T)_{ij} = a_{ji}. \]

There are no size restrictions on the operation of transposing a matrix. If A is m × n, then $A^T$ is n × m. If A is square, then $A^T$ has the same size as A.

Example 2.5

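1. \[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}^T = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}. \]

2. \[ \begin{pmatrix} 1 \\ 2 \end{pmatrix}^T = \begin{pmatrix} 1 & 2 \end{pmatrix}. \]

3. \[ \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}. \]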
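Interchanging rows and columns is a one-line operation in Python when a matrix is held as a list of rows. The sketch below, with the assumed function name transpose, reproduces the three parts of Example 2.5.

```python
def transpose(a):
    """Return the transpose: column j of a becomes row j of the result."""
    return [list(column) for column in zip(*a)]

print(transpose([[1, 2], [3, 4]]))          # [[1, 3], [2, 4]]
print(transpose([[1], [2]]))                # [[1, 2]]
print(transpose([[1, 4], [2, 5], [3, 6]]))  # [[1, 2, 3], [4, 5, 6]]
```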

There are many frequently used properties of the operations used in elementary arithmetic of real numbers. They are all obvious and are not usually explicitly mentioned, but these properties are not obvious when we consider other types of entities besides numbers. Indeed, the properties might not even hold. We have seen one example of this already with the fact that matrix multiplication is not commutative.

In ordinary arithmetic with numbers we have two main operations, addition and multiplication. There are also inverses to both of these operations. The principal properties of the two operations are as follows.

1. Both addition and multiplication are commutative, that is

a + b = b + a, ab = ba.

2. Both addition and multiplication are associative, that is

(a + b) + c = a + (b + c), (ab)c = a(bc).

3. Multiplication is distributive over addition, that is

a(b + c) = ab + ac.



For matrix addition and multiplication, there are size restrictions which result in the operations being not always defined. In many cases however, we apply these operations to square matrices of the same size and then there are no problems with sizes. If we consider only matrices of appropriate sizes, then the only one of the above properties which does not hold is that multiplication is not commutative. Thus for matrices of appropriate sizes, we have

\[ A + B = B + A, \quad (A + B) + C = A + (B + C), \]

\[ (AB)C = A(BC), \quad A(B + C) = AB + AC, \]

but not AB = BA. Powers of a matrix can be defined in the obvious way as

\[ A^2 = A \cdot A, \quad A^3 = A^2 \cdot A, \quad \text{and so on.} \]

The fact that matrix multiplication is not commutative implies that care is needed in applying familiar algebraic results to matrices. Two particular examples are that $(AB)^2$ may not equal $A^2B^2$ and $(A + B)^2$ may not be equal to $A^2 + 2AB + B^2$. What is true in the first case is that by the associative rule for matrix multiplication,

\[ (AB)^2 = (AB)(AB) = A(BA)B, \]

but it will usually not be correct to replace BA by AB in order to get (AA)(BB). In the second case, we can use the distributive rule to obtain

\[ (A + B)^2 = (A + B)(A + B) = A^2 + AB + BA + B^2 \]

and again we cannot usually replace BA by AB to obtain 2AB.

The inverse operation to addition is subtraction and this works in a straightforward manner for matrices. Matrix multiplication however is a complicated operation and we have to be careful with its inverse. This is the topic of the next section.
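The warning about the square of A + B can be checked numerically. The Python sketch below uses the non-commuting pair from possibility 4 of Section 2.2. The function names are assumptions of the example.

```python
def mat_mul(a, b):
    """Tip each row of a down each column of b (sizes assumed compatible)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mat_add(*matrices):
    """Add any number of matrices of the same size element by element."""
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*matrices)]

A = [[2, 3], [1, 4]]
B = [[-1, 2], [3, -2]]
AB, BA = mat_mul(A, B), mat_mul(B, A)
A2, B2 = mat_mul(A, A), mat_mul(B, B)

lhs = mat_mul(mat_add(A, B), mat_add(A, B))  # (A + B)^2
correct = mat_add(A2, AB, BA, B2)            # A^2 + AB + BA + B^2
naive = mat_add(A2, AB, AB, B2)              # A^2 + 2AB + B^2

print(lhs == correct)  # True
print(lhs == naive)    # False
```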

2.4 The Inverse of a Matrix.

In the arithmetic of real numbers, the inverse of the operation of multiplication is the operation of division, but division is not a useful concept for matrices because matrix multiplication is not commutative. There is however another way of thinking about division of numbers. If we take a nonzero number b, then it has a reciprocal or inverse $b^{-1}$. The quotient a/b can then be regarded as $b^{-1}a$. It could equally well be regarded as $ab^{-1}$. For real numbers, it does not matter which is used because multiplication is commutative, but this is not the case for matrices. Thus for matrices we cannot meaningfully define A/B and we must use the terminology of inverses.

At the beginning of this chapter we wrote out the solution of an equation in one unknown using the terminology of inverses, rather than division. We must now consider this in more detail and then apply it to the matrix equation

\[ AX = B. \]

The inverse of a nonzero number a is the number $a^{-1}$ with the property that

\[ a^{-1}a = 1. \]



To apply this same idea to matrices, the first thing we require is a matrix analogue of the number 1. This number has one very simple property, namely 1 · x = x, for every number x. This is another property which is so simple that it is not usually mentioned in ordinary arithmetic. In the case of matrices however, there cannot be a matrix, I, which has the property that IA = A, for every matrix A. This is prevented by the size restrictions on matrix multiplication. A more modest requirement is that IA = A, for all matrices of a given size.

There is however still a problem because of the fact that matrix multiplication is noncommutative. Should we require IA = A or AI = A or both? The solution is to require both.

Definition 2.9

The n × n matrix $I_n$ with the property that

\[ AI_n = I_nA = A, \]

for every n × n matrix A, is called the n × n unit matrix.

From our discussion so far, there is no guarantee that such matrices exist. However some experimentation quickly shows that a square matrix in which every element on the main diagonal is 1 and all other elements are 0 has the required property. It can also be shown that these matrices are the only ones with the property. If the size of the unit matrix is obvious from the context, then the unit matrix is denoted simply by I.

Example 2.6

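Let

\[ A = \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix}. \]

Then a simple calculation gives

\[ I_2A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix} = A, \]

\[ AI_2 = \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix} = A. \]

A non-square matrix can be multiplied by a unit matrix provided the size restrictions are satisfied, and the matrix will remain unchanged. However the multiplication can be carried out in only one direction.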

Example 2.7

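Let

\[ A = \begin{pmatrix} 1 & 3 & 4 \\ -2 & 1 & 2 \end{pmatrix}. \]

Then

\[ I_2A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 3 & 4 \\ -2 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 4 \\ -2 & 1 & 2 \end{pmatrix} = A, \]

but $AI_2$ is undefined. However

\[ AI_3 = \begin{pmatrix} 1 & 3 & 4 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 4 \\ -2 & 1 & 2 \end{pmatrix} = A, \]

but of course $I_3A$ is undefined.

We shall be particularly concerned with the situation where the n × n unit matrix multiplies an n × 1 column vector. The result is simply the vector. Thus if n = 2,

\[ I_2X = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} = X. \]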
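The behaviour of unit matrices in Example 2.7 can be reproduced with a short Python sketch. The function names identity and mat_mul are assumptions of the example, as is the use of ValueError to signal an undefined product.

```python
def identity(n):
    """The n x n unit matrix: 1 on the main diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    """Matrix product, raising ValueError when the sizes do not allow it."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("number of columns in A must equal number of rows in B")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 3, 4], [-2, 1, 2]]            # the 2 x 3 matrix of Example 2.7
print(mat_mul(identity(2), A) == A)    # True
print(mat_mul(A, identity(3)) == A)    # True
try:
    mat_mul(A, identity(2))            # a 2 x 3 matrix times a 2 x 2 matrix
except ValueError as error:
    print("undefined:", error)
```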

Having defined unit matrices, we can consider the inverse of a square matrix. The definition is straightforward, given our earlier discussion.

Definition 2.10

Let A be a square matrix. The matrix $A^{-1}$ is the inverse of A if

\[ A^{-1}A = AA^{-1} = I, \]

where I is the unit matrix of the same size as A.

We restrict the discussion here to square matrices. For nonsquare matrices, we would have to define left and right inverses and these would not be equal. Indeed, they would have different sizes. There is no assurance that any given square matrix has an inverse, but it can be shown that a square matrix can have at most one inverse. We shall find in fact that many matrices do not have inverses and that the possession of an inverse by a square matrix A is related to whether or not the equations AX = B have a unique solution.

The basic problem facing us is to find the inverse of a given square matrix. Before we tackle this problem however, let us consider the method for solving a set of equations, assuming that the inverse is known. The equations are written as AX = B. We multiply both sides by $A^{-1}$ and then the solution is obtained as follows.

\[ AX = B \]

\[ A^{-1}AX = A^{-1}B \]

\[ IX = A^{-1}B \]

\[ X = A^{-1}B \]

Once the inverse of the matrix is known, then the solution can be obtained by a single matrix multiplication.



Example 2.8

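Consider the set of equations

\[ \begin{aligned} 2x - y &= 2, \\ -5x + 3y &= -1. \end{aligned} \]

These equations can be written in matrix form as AX = B, where

\[ A = \begin{pmatrix} 2 & -1 \\ -5 & 3 \end{pmatrix}, \quad X = \begin{pmatrix} x \\ y \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 \\ -1 \end{pmatrix}. \]

We shall later show that the inverse of A is

\[ A^{-1} = \begin{pmatrix} 3 & 1 \\ 5 & 2 \end{pmatrix}. \]

It is easily checked that $AA^{-1} = A^{-1}A = I$. Given this result, the solution of the set of equations is

\[ X = A^{-1}B = \begin{pmatrix} 3 & 1 \\ 5 & 2 \end{pmatrix} \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 5 \\ 8 \end{pmatrix}, \]

that is x = 5 and y = 8.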
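The advantage of knowing the inverse is that each new right hand side costs only one multiplication. The Python sketch below repeats Example 2.8 and then reuses the same inverse for a second right hand side. That second right hand side and the function name mat_mul are hypothetical choices made for the illustration.

```python
def mat_mul(a, b):
    """Tip each row of a down each column of b (sizes assumed compatible)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[2, -1], [-5, 3]]
A_inv = [[3, 1], [5, 2]]  # the inverse quoted in Example 2.8

# Check the defining property of the inverse in both directions.
print(mat_mul(A, A_inv), mat_mul(A_inv, A))  # [[1, 0], [0, 1]] [[1, 0], [0, 1]]

# Right hand side from Example 2.8.
print(mat_mul(A_inv, [[2], [-1]]))  # [[5], [8]]

# A hypothetical second right hand side, solved without any further reduction.
X2 = mat_mul(A_inv, [[1], [0]])
print(X2)                           # [[3], [5]]
print(mat_mul(A, X2))               # [[1], [0]], so the equations are satisfied
```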

The only remaining problem is to find a method for calculating the inverse matrix. There are several methods for doing this, but the one we consider here is based on Gaussian reduction. Consider the case of a 2 × 2 matrix

\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. \]

Let the inverse matrix be

\[ A^{-1} = \begin{pmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{pmatrix}. \]

Suppose we solve the equation AX = B, where $B = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. The result is

\[ X = A^{-1}B = \begin{pmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} z_{11} \\ z_{21} \end{pmatrix}. \]

The solution is thus the first column of the inverse matrix. If we solve the equation with $B = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, then we obtain the second column of the inverse matrix. In both cases the steps in the reduction are determined by the elements of the matrix A and so the steps are the same. Because of this we can carry out both reductions at once by beginning with the array

\[ \begin{array}{cc|cc} a_{11} & a_{12} & 1 & 0 \\ a_{21} & a_{22} & 0 & 1 \end{array} \]

and reducing it to an array of the form

\[ \begin{array}{cc|cc} 1 & 0 & z_{11} & z_{12} \\ 0 & 1 & z_{21} & z_{22} \end{array} \]

The array to the right of the vertical line gives the elements of the matrix $A^{-1}$.

Example 2.9

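Let

\[ A = \begin{pmatrix} 2 & -1 \\ -5 & 3 \end{pmatrix}. \]

The array to be reduced is obtained by writing the elements of A to the left of the vertical line and the elements of the 2 × 2 unit matrix to the right of the line. We then reduce the array to obtain the elements of the unit matrix to the left of the line. The reduction is as follows.

\[ \begin{array}{rr|rrl} 2 & -1 & 1 & 0 & \\ -5 & 3 & 0 & 1 & \end{array} \]

\[ \begin{array}{rr|rrl} 6 & -3 & 3 & 0 & (R_1 \to 3R_1) \\ -5 & 3 & 0 & 1 & \end{array} \]

\[ \begin{array}{rr|rrl} 1 & 0 & 3 & 1 & (R_1 \to R_1 + R_2) \\ -5 & 3 & 0 & 1 & \end{array} \]

\[ \begin{array}{rr|rrl} 1 & 0 & 3 & 1 & \\ 0 & 3 & 15 & 6 & (R_2 \to R_2 + 5R_1) \end{array} \]

\[ \begin{array}{rr|rrl} 1 & 0 & 3 & 1 & \\ 0 & 1 & 5 & 2 & (R_2 \to R_2/3) \end{array} \]

From this array we see that the inverse matrix is given by

\[ A^{-1} = \begin{pmatrix} 3 & 1 \\ 5 & 2 \end{pmatrix}. \]

This result can be easily checked. Matrix multiplication shows that

\[ A^{-1}A = AA^{-1} = I. \]

A similar procedure is followed for a square matrix of any size. We shall consider one further example, in this case a 3 × 3 matrix.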

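Example 2.10

Consider the 3 × 3 matrix A defined by

\[ A = \begin{pmatrix} 1 & -2 & 1 \\ 2 & -1 & 3 \\ 3 & 2 & 4 \end{pmatrix}. \]

The initial array is obtained by writing the array of elements of A to the left of the line and the 3 × 3 unit array to the right of the line.

\[ \begin{array}{rrr|rrr} 1 & -2 & 1 & 1 & 0 & 0 \\ 2 & -1 & 3 & 0 & 1 & 0 \\ 3 & 2 & 4 & 0 & 0 & 1 \end{array} \]

The reduction can be carried out as follows. In this hand calculation, we have sought to avoid fractions where possible.

\[ \begin{array}{rrr|rrrl} 1 & -2 & 1 & 1 & 0 & 0 & \\ 0 & 3 & 1 & -2 & 1 & 0 & (R_2 \to R_2 - 2R_1) \\ 0 & 8 & 1 & -3 & 0 & 1 & (R_3 \to R_3 - 3R_1) \end{array} \]

\[ \begin{array}{rrr|rrrl} 1 & -2 & 1 & 1 & 0 & 0 & \\ 0 & 9 & 3 & -6 & 3 & 0 & (R_2 \to 3R_2) \\ 0 & 8 & 1 & -3 & 0 & 1 & \end{array} \]

\[ \begin{array}{rrr|rrrl} 1 & -2 & 1 & 1 & 0 & 0 & \\ 0 & 1 & 2 & -3 & 3 & -1 & (R_2 \to R_2 - R_3) \\ 0 & 8 & 1 & -3 & 0 & 1 & \end{array} \]

\[ \begin{array}{rrr|rrrl} 1 & 0 & 5 & -5 & 6 & -2 & (R_1 \to R_1 + 2R_2) \\ 0 & 1 & 2 & -3 & 3 & -1 & \\ 0 & 0 & -15 & 21 & -24 & 9 & (R_3 \to R_3 - 8R_2) \end{array} \]

\[ \begin{array}{rrr|rrrl} 1 & 0 & 5 & -5 & 6 & -2 & \\ 0 & 1 & 2 & -3 & 3 & -1 & \\ 0 & 0 & 1 & -7/5 & 8/5 & -3/5 & (R_3 \to -(1/15)R_3) \end{array} \]

\[ \begin{array}{rrr|rrrl} 1 & 0 & 0 & 2 & -2 & 1 & (R_1 \to R_1 - 5R_3) \\ 0 & 1 & 0 & -1/5 & -1/5 & 1/5 & (R_2 \to R_2 - 2R_3) \\ 0 & 0 & 1 & -7/5 & 8/5 & -3/5 & \end{array} \]

The inverse matrix can be read from this array as

\[ A^{-1} = \begin{pmatrix} 2 & -2 & 1 \\ -1/5 & -1/5 & 1/5 \\ -7/5 & 8/5 & -3/5 \end{pmatrix} = -\frac{1}{5} \begin{pmatrix} -10 & 10 & -5 \\ 1 & 1 & -1 \\ 7 & -8 & 3 \end{pmatrix}. \]

Again this result is easily checked by matrix multiplication.

Calculations such as this are extremely tedious to carry out by hand for matrices any larger than that considered here, but the calculations are ideally suited to a computer. Mathematica has a command Inverse which performs the required calculation. For the above example, Mathematica gives the result in the first form without taking the fraction outside the matrix.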
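For readers without access to Mathematica, the reduction of the array with A on the left and the unit matrix on the right can be sketched in Python. This is an illustrative implementation under stated assumptions, not the routine any particular package uses: the function name inverse is hypothetical, exact fractions replace hand arithmetic, and the row operations are chosen mechanically (divide by the pivot, then clear the column, swapping rows if a zero pivot is met), so the intermediate arrays differ from those in the hand calculations above.

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix by reducing the array [A | I] to [I | A^-1]."""
    n = len(matrix)
    # Build the augmented array with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the diagonal entry becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Subtract multiples of the pivot row to clear the rest of the column.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right hand half of the reduced array is the inverse.
    return [row[n:] for row in aug]

for row in inverse([[1, -2, 1], [2, -1, 3], [3, 2, 4]]):
    print([str(x) for x in row])
```

The printed rows agree with the first form of the inverse found in Example 2.10.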

2.5 Singular Matrices.

In the arithmetic of real numbers, every number except zero has an inverse. The number 0 has the property that a + 0 = a and it is not difficult to find a matrix analogue of this property.



Definition 2.11

The m × n matrix $\Theta$ with the property that

\[ A + \Theta = \Theta + A = A, \]

for every m × n matrix A, is called the m × n zero matrix. The n × n zero matrix is denoted by $\Theta_n$.

A little experimentation shows that $\Theta$ is the m × n matrix with all elements zero. The number 0 has no inverse and we would expect the zero matrix $\Theta_n$ to similarly have no inverse. It is easily shown that this is indeed the case. Since

\[ A \cdot \Theta_n = \Theta_n \cdot A = \Theta_n, \]

for every n × n matrix A, it is not possible to have a matrix B with the property

\[ B \cdot \Theta_n = \Theta_n \cdot B = I_n. \]

Thus $\Theta_n$ has no inverse. The new feature of matrix arithmetic however is that there are non-zero n × n matrices which do not have an inverse. Consider the matrix

\[ A = \begin{pmatrix} 1 & -2 \\ -1 & 2 \end{pmatrix}. \]

Suppose that A has an inverse B. Then $A \cdot B = I_2$ and so

\[ \begin{pmatrix} 1 & -2 \\ -1 & 2 \end{pmatrix} \cdot \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} b_{11} - 2b_{21} & b_{12} - 2b_{22} \\ -b_{11} + 2b_{21} & -b_{12} + 2b_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]

Using the definition of equality of matrices this requires

\[ \begin{aligned} b_{11} - 2b_{21} &= 1, \\ -b_{11} + 2b_{21} &= 0. \end{aligned} \]

This is impossible, since adding the two equations gives 0 = 1, and so we have found a non-zero matrix which has no inverse.

The question of whether or not a square matrix has an inverse is closely related to whether or not the set of simultaneous equations AX = B has a unique solution. If the matrix has an inverse, then the equations have the unique solution $X = A^{-1}B$. The statement

“if the matrix has an inverse then the equations have a unique solution”

is the same as the statement

“if the equations do not have a unique solution, then the matrix does not have an inverse”.

In Chapter 1, we found that the equations do not have a unique solution if a row of zeros appears to the left of the line in a Gaussian reduction. When this happens the equations either have no solution because they are inconsistent or they have an infinite number of solutions because they are redundant. Thus we can determine whether a matrix has an inverse by performing a Gaussian reduction on it. In the previous section we used Gaussian reduction to find the inverse of a square matrix and we can now conclude that if a row of zeros appears to the left of the line during the reduction, then the matrix has no inverse. In technical terms, the matrix is singular.

This gives us a method for determining whether or not a square matrix is singular. In fact we don’t need to have any equations at all. We only need to carry out a Gaussian reduction on the array of numbers in the matrix itself. We usually carry out the reduction using matrix notation, but it is important to remember that the reduced matrix is not equal to the original matrix. The reduced matrix tells us a lot about the original matrix, but the two are not equal. In the situation here, the reduced matrix tells us whether or not the original matrix is singular.
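The test just described can be sketched in Python as follows. The function name is_singular is hypothetical, and the sketch rests on one assumption worth stating: instead of waiting for a complete row of zeros, it stops as soon as some column has no nonzero entry on or below the diagonal, which for a square matrix is an equivalent sign that a zero row must appear. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def is_singular(matrix):
    """Gaussian reduction of a square matrix; True if no full set of pivots exists."""
    n = len(matrix)
    rows = [[Fraction(x) for x in row] for row in matrix]
    for col in range(n):
        # Look for a nonzero entry on or below the diagonal in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return True  # the reduction will end with a row of zeros
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Subtract multiples of the pivot row to produce zeros below the diagonal.
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return False

print(is_singular([[1, -2], [-1, 2]]))  # True
print(is_singular([[2, -1], [-5, 3]]))  # False
```

The first matrix is the non-zero matrix shown above to have no inverse, and the second is the matrix whose inverse was found in Example 2.9.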

Example 2.11

Consider the matrix

\[ A = \begin{pmatrix} 1 & -2 \\ -1 & 2 \end{pmatrix}. \]

The reduction requires only one step, $R_2 \to R_2 + R_1$, and the reduced matrix is

\[ \begin{pmatrix} 1 & -2 \\ 0 & 0 \end{pmatrix}. \]

Thus A is singular.

Example 2.12

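Consider the matrix

\[
A = \begin{pmatrix} 1 & -2 & 5 \\ 2 & 3 & -1 \\ 5 & 4 & 3 \end{pmatrix}.
\]

The reduction of this matrix is carried out as follows.

\[
\begin{pmatrix} 1 & -2 & 5 \\ 0 & 7 & -11 \\ 0 & 14 & -22 \end{pmatrix}
\qquad
\begin{array}{l} \\ (R_2 \to R_2 - 2R_1) \\ (R_3 \to R_3 - 5R_1) \end{array}
\]

\[
\begin{pmatrix} 1 & -2 & 5 \\ 0 & 7 & -11 \\ 0 & 0 & 0 \end{pmatrix}
\qquad
\begin{array}{l} \\ \\ (R_3 \to R_3 - 2R_2) \end{array}
\]

Thus the matrix $A$ is singular.

2.6 Problems.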

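1. Let

\[
A = \begin{pmatrix} 1 & 2 \\ 3 & -1 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 & -1 \\ -1 & 4 \end{pmatrix}, \quad
C = \begin{pmatrix} 1 & 3 \end{pmatrix},
\]

\[
D = \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \quad
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\Theta = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]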



Find where possible

(i) $AB$   (ii) $A + B$   (iii) $AI$
(iv) $A + \Theta$   (v) $C - \Theta$   (vi) $BC$
(vii) $2\Theta$   (viii) $B^T C$   (ix) $BA$
(x) $CI$   (xi) $2A - B^T$   (xii) $IC$
(xiii) $\Theta C$   (xiv) $B - DC$   (xv) $\Theta D$
(xvi) $A(DC)$   (xvii) $\Theta(IC)$   (xviii) $A^T + CD$
(xix) $A^2$   (xx) $A^3$

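2. Let

\[
A = \begin{pmatrix} 1 & 3 & -2 \\ 4 & 1 & -1 \\ 1 & 3 & 2 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & 0 & 2 \\ -1 & 3 & 1 \\ -2 & -1 & 2 \end{pmatrix}, \quad
C = \begin{pmatrix} 0 & 1 \\ 3 & -1 \\ 4 & 2 \end{pmatrix},
\]

\[
D = \begin{pmatrix} 1 & 1 & 4 \\ -2 & 3 & 1 \end{pmatrix}, \quad
I = I_3, \quad
E = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}.
\]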

Find where possible

(i) $AB$   (ii) $BC$   (iii) $A - B$
(iv) $IE$   (v) $C - 2D$   (vi) $AD$
(vii) $A - 2I$   (viii) $IB$   (ix) $ID$
(x) $CD$   (xi) $DC$   (xii) $A^T B$
(xiii) $CE$   (xiv) $C^T B$   (xv) $DA$
(xvi) $ED$   (xvii) $DE$   (xviii) $C + D^T$
(xix) $(CE)D$   (xx) $D^T C^T$

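3. Let

\[
A = \begin{pmatrix} 1 & -1 & 2 \\ 2 & 1 & 3 \\ 1 & -2 & 1 \end{pmatrix}, \quad
B = \begin{pmatrix} 2 & -2 & 1 \\ 1 & 1 & -1 \\ 3 & 1 & 2 \end{pmatrix}, \quad
C = \begin{pmatrix} 1 & 1 & -1 \\ 2 & -2 & 3 \\ 3 & 1 & 2 \end{pmatrix}.
\]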

Verify each of the following eight properties for these matrices. The first six equalities can be shown to be true for any square matrices of the same size. The final two results are inequalities which hold for these particular matrices. In cases where the matrices commute, however, these results become equalities.

(i) $A(BC) = (AB)C$

(ii) $A + (B + C) = (A + B) + C$

(iii) $(A + B)^T = A^T + B^T$

(iv) $A(B + C) = AB + AC$

(v) $(AB)^T = B^T A^T$

(vi) $(A^T)^2 = (A^2)^T$

(vii) $(A + B)^2 \neq A^2 + 2AB + B^2$

(viii) $(AB)^2 \neq A^2 B^2$



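4. Determine whether each of the following matrices is singular or nonsingular. For those which are nonsingular, find the inverse and check whether the inverse is correct by multiplying it by the original matrix.

(i) $\begin{pmatrix} 3 & -1 \\ -5 & 4 \end{pmatrix}$

(ii) $\begin{pmatrix} 1 & 2 & 4 \\ 3 & -2 & -1 \\ 1 & 3 & 5 \end{pmatrix}$

(iii) $\begin{pmatrix} 1 & -1 & 4 \\ -2 & 7 & -6 \\ 1 & 9 & 8 \end{pmatrix}$

(iv) $\begin{pmatrix} 1 & 3 & 1 \\ 2 & 5 & 3 \\ 4 & 3 & 2 \end{pmatrix}$

(v) $\begin{pmatrix} 1 & 3 & 1 \\ 2 & 7 & 4 \\ 1 & 1 & -4 \end{pmatrix}$

(vi) $\begin{pmatrix} 3 & 5 & -1 & 1 \\ 2 & 1 & -3 & -2 \\ -2 & 4 & 1 & 7 \\ 1 & 9 & 0 & 8 \end{pmatrix}$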

5. Let $A$ be a nonsingular square matrix and let $c$ be a nonzero number. Show that

(i) $(A^{-1})^{-1} = A$,   (ii) $(A^2)^{-1} = (A^{-1})^2$,

(iii) $(cA)^{-1} = \frac{1}{c}A^{-1}$,   (iv) $(A^T)^{-1} = (A^{-1})^T$.

6. Use the method of inverse matrices, where possible, to solve each of the systems in Problem 1 of Chapter 1.

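7. Find the inverse of the matrix

\[
\begin{pmatrix} 1 & 4 & 1 \\ 1 & 1 & 1 \\ 2 & 3 & 1 \end{pmatrix}
\]

and use it to solve each of the following sets of simultaneous linear equations.

(i)
\[
\begin{aligned}
x + 4y + z &= 3 \\
x + y + z &= 6 \\
2x + 3y + z &= 6
\end{aligned}
\]

(ii)
\[
\begin{aligned}
x + 4y + z &= 7 \\
x + y + z &= 1 \\
2x + 3y + z &= 6
\end{aligned}
\]

(iii)
\[
\begin{aligned}
2x + 3y + z &= -1 \\
x + y + z &= -2 \\
x + 4y + z &= 4
\end{aligned}
\]

(iv)
\[
\begin{aligned}
4x + y + z &= 12 \\
x + y + z &= 6 \\
3x + y + 2z &= 13
\end{aligned}
\]

(v)
\[
\begin{aligned}
x + y + z &= -1 \\
x + 2y + 3z &= 3 \\
x + y + 4z &= 2
\end{aligned}
\]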

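8. Find the inverse of the matrix

\[
\begin{pmatrix} 2 & 4 & -1 \\ 1 & 3 & -2 \\ -3 & -2 & -4 \end{pmatrix}
\]

and use it to solve each of the following sets of simultaneous linear equations.

(i)
\[
\begin{aligned}
2x + 4y - z &= -9 \\
x + 3y - 2z &= -10 \\
-3x - 2y - 4z &= -6
\end{aligned}
\]

(ii)
\[
\begin{aligned}
2x + 4y - z &= 3 \\
x + 3y - 2z &= 3 \\
-3x - 2y - 4z &= 1
\end{aligned}
\]

(iii)
\[
\begin{aligned}
-3x - 2y - 4z &= -16 \\
2x + 4y - z &= -3 \\
x + 3y - 2z &= -7
\end{aligned}
\]

(iv)
\[
\begin{aligned}
-4x - 2y - 3z &= -8 \\
-2x + 3y + z &= -13 \\
-x + 4y + 2z &= -12
\end{aligned}
\]

(v)
\[
\begin{aligned}
3x - 2y + z &= -15 \\
4x - y + 2z &= -16 \\
-2x - 4y - 3z &= -3
\end{aligned}
\]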



CHAPTER 3

Determinants

3.1 Definitions.

In the previous chapter, we used Gaussian reduction to determine whether or not a matrix is singular, but there is another method which can be used for such calculations, and one which introduces an important new property of matrices. To introduce the method, we return to the solution of sets of simultaneous equations. In solving such sets of equations there are patterns in the solutions which we have not yet examined. In order to see these patterns we shall use double subscript notation for the equations. Consider first two equations in two unknowns,

\[
\begin{aligned}
a_{11}x + a_{12}y &= b_1, \\
a_{21}x + a_{22}y &= b_2.
\end{aligned}
\]

To solve for $x$, we multiply the first equation by $a_{22}$, the second by $a_{12}$ and subtract to obtain

\[
(a_{11}a_{22} - a_{21}a_{12})\,x = b_1a_{22} - b_2a_{12}. \tag{1}
\]

There is a different way to look at this calculation, and it is one we shall use later for three equations in three unknowns. In this approach, we use the second equation to express $y$ in terms of $x$:

\[
a_{22}y = b_2 - a_{21}x.
\]

We multiply the first equation by $a_{22}$ and then substitute for $a_{22}y$. The result is

\[
a_{11}a_{22}x + a_{12}(b_2 - a_{21}x) = b_1a_{22},
\]

from which we obtain

\[
(a_{11}a_{22} - a_{21}a_{12})\,x = b_1a_{22} - b_2a_{12},
\]

as before. Similarly we can eliminate $x$ to obtain

\[
(a_{11}a_{22} - a_{21}a_{12})\,y = a_{11}b_2 - a_{21}b_1. \tag{2}
\]

The original equations will have a unique solution provided

\[
a_{11}a_{22} - a_{21}a_{12} \neq 0.
\]

If we write the equations in the form $AX = B$, then the expression $a_{11}a_{22} - a_{21}a_{12}$ is constructed from the elements of the matrix $A$, and it is this expression which determines whether the equations have a unique solution and hence whether the matrix is singular.
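Equations (1) and (2) translate directly into a short calculation. The following Python sketch is an illustration only: the function name `solve_2x2` and the first sample system are assumptions made here, not part of the text.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 using equations (1) and (2)."""
    # The common factor multiplying x in (1) and y in (2).
    d = a11 * a22 - a21 * a12
    if d == 0:
        # The equations do not have a unique solution.
        return None
    x = Fraction(b1 * a22 - b2 * a12, d)  # equation (1) divided by d
    y = Fraction(a11 * b2 - a21 * b1, d)  # equation (2) divided by d
    return x, y


# Illustrative system (not from the text): x + 2y = 5, 3x - y = 1.
print(solve_2x2(1, 2, 3, -1, 5, 1))

# Coefficients taken from the singular matrix of Example 2.11.
print(solve_2x2(1, -2, -1, 2, 1, 0))
```

The first call gives $x = 1$, $y = 2$. The second returns `None`, because the expression $a_{11}a_{22} - a_{21}a_{12}$ vanishes for that matrix.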



Definition 3.1

Let $A$ be a $2 \times 2$ matrix. The number

\[
a_{11}a_{22} - a_{21}a_{12}
\]

is called the determinant of the matrix and is written as $\det A$ or as

\[
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.
\]

Using this terminology, the $2 \times 2$ matrix $A$ is singular if $\det A = 0$. This criterion is very easy to apply. We compute the determinant by multiplying the elements on the main diagonal and subtracting the product of the elements on the other diagonal.
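As a quick illustration of the criterion, the matrix of Example 2.11 has determinant

\[
\begin{vmatrix} 1 & -2 \\ -1 & 2 \end{vmatrix} = (1)(2) - (-1)(-2) = 2 - 2 = 0,
\]

which agrees with the earlier finding that this matrix is singular. By contrast, for a matrix chosen here purely for illustration (it does not appear in the text),

\[
\begin{vmatrix} 2 & 1 \\ 1 & 3 \end{vmatrix} = (2)(3) - (1)(1) = 5 \neq 0,
\]

so that matrix is nonsingular.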

The problem now is to extend this criterion to larger matrices. It is by no means obvious how to do this. We shall begin with the $3 \times 3$ case and shall again consider a set of simultaneous equations written in subscript notation.

\begin{align}
a_{11}x + a_{12}y + a_{13}z &= b_1 \tag{3} \\
a_{21}x + a_{22}y + a_{23}z &= b_2 \tag{4} \\
a_{31}x + a_{32}y + a_{33}z &= b_3 \tag{5}
\end{align}

We could simply solve these by elimination and look for patterns in the solutions, but this generates a very large amount of algebra and the pattern is difficult to extract. Instead, we shall use a method similar to that used above for two equations in two unknowns. Some properties of $2 \times 2$ determinants will also be required. We shall later extend these properties to determinants of any size, but they are easy to check in the $2 \times 2$ case.

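1. If each element in one column of the determinant is a sum, then the determinant is a sum of determinants. Thus

\[
\begin{vmatrix} a_{11} + b_1 & a_{12} \\ a_{21} + b_2 & a_{22} \end{vmatrix}
= \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
+ \begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}.
\]

2. If each element in one column of a determinant is multiplied by a constant, then the determinant is multiplied by the constant. Thus

\[
\begin{vmatrix} ka_{11} & a_{12} \\ ka_{21} & a_{22} \end{vmatrix}
= k \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.
\]

3. If the columns of a determinant are interchanged, then the determinant changes sign. Thus

\[
\begin{vmatrix} a_{12} & a_{11} \\ a_{22} & a_{21} \end{vmatrix}
= - \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}.
\]

We return to the three equations in three unknowns. If we assume $x$ is known, we can use equations (4) and (5) to solve for $y$ and $z$ in terms of $x$. The equations to be solved are

\[
\begin{aligned}
a_{22}y + a_{23}z &= b_2 - a_{21}x, \\
a_{32}y + a_{33}z &= b_3 - a_{31}x.
\end{aligned}
\]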



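These can be solved by elimination as a set of two equations in two unknowns. Using our earlier results together with the three properties of $2 \times 2$ determinants, we obtain

\[
\begin{aligned}
\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} y
&= \begin{vmatrix} b_2 - a_{21}x & a_{23} \\ b_3 - a_{31}x & a_{33} \end{vmatrix}
= \begin{vmatrix} b_2 & a_{23} \\ b_3 & a_{33} \end{vmatrix}
- \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} x, \\
\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} z
&= \begin{vmatrix} a_{22} & b_2 - a_{21}x \\ a_{32} & b_3 - a_{31}x \end{vmatrix}
= \begin{vmatrix} a_{22} & b_2 \\ a_{32} & b_3 \end{vmatrix}
+ \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} x.
\end{aligned}
\]

Notice that in the last line, we have interchanged the columns in the final determinant and so the sign has changed. If the equations are written in matrix form as $AX = B$, then the elements in each determinant multiplying $x$, $y$ or $z$ in these expressions are in the same column order as they are in $A$. To achieve this requires the interchange of columns in the final determinant.

To use these results, we return to equation (3) and multiply it by

\[
\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}.
\]

The above two results can then be substituted into the resulting equation to obtain

\[
\left( a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \right) x
= b_1 \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} b_2 & a_{23} \\ b_3 & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} b_2 & a_{22} \\ b_3 & a_{32} \end{vmatrix}.
\]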

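Similar equations can be obtained for $y$ and $z$. These results are the three-dimensional analogue of equations (1) and (2). The results are much more complicated than before, but the interpretation is the same. The equations will have a unique solution provided

\[
a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \neq 0.
\]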

This expression provides us with the definition of the determinant of a $3 \times 3$ matrix $A$.

\[
\det A =
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.
\]

The pattern in this definition can be described as follows. To calculate the determinant, we work our way across the top row. We multiply each element by the $2 \times 2$ determinant obtained by crossing out the row and column containing that element. Notice that this requires that the columns in each $2 \times 2$ determinant maintain the same order of elements as the columns of $A$. We then add the results together, but with alternating signs for the terms. For example, for the element $a_{12}$ we cross out the first row and the second column of the determinant to obtain the required $2 \times 2$ determinant. The element $a_{12}$ is multiplied by the value of this determinant with an appropriate sign attached. In this case the sign is negative.

\[
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
\]
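The expansion along the top row can be written out as a short Python sketch. The function names `det2` and `det3` and the last sample matrix are assumptions introduced here for illustration only.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 array with rows (a, b) and (c, d)."""
    return a * d - c * b


def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the top row."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (
        a11 * det2(a22, a23, a32, a33)    # cross out row 1, column 1
        - a12 * det2(a21, a23, a31, a33)  # cross out row 1, column 2 (negative sign)
        + a13 * det2(a21, a22, a31, a32)  # cross out row 1, column 3
    )


print(det3([[1, -2, 5], [2, 3, -1], [5, 4, 3]]))  # matrix of Example 2.12
print(det3([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))    # 3 x 3 identity matrix
print(det3([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))    # illustrative matrix, not from the text
```

Each call to `det2` receives its entries in the same column order as they appear in the matrix, which is the point made above about maintaining the order of the columns.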



Example 3.1

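Consider the matrix $A$ given by

\[
A = \begin{pmatrix} 1 & -2 & 5 \\ 2 & 3 & -1 \\ 5 & 4 & 3 \end{pmatrix}.
\]

The determinant is calculated as follows.

\[
\begin{aligned}
\det A &= \begin{vmatrix} 1 & -2 & 5 \\ 2 & 3 & -1 \\ 5 & 4 & 3 \end{vmatrix} \\
&= 1 \begin{vmatrix} 3 & -1 \\ 4 & 3 \end{vmatrix}
- (-2) \begin{vmatrix} 2 & -1 \\ 5 & 3 \end{vmatrix}
+ 5 \begin{vmatrix} 2 & 3 \\ 5 & 4 \end{vmatrix} \\
&= 1 \cdot 13 + 2 \cdot 11 + 5 \cdot (-7) \\
&= 0.
\end{aligned}
\]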

In Example 2.12, we showed that the matrix $A$ is singular. We shall later show that a matrix is singular precisely when its determinant is zero and so the present result verifies the earlier one. The calculations in the two cases are, however, quite different in appearance.

It is important to remember the alternating signs in the expansion of the determinant. It is also important to remember that a determinant is a number and not an array of numbers. To describe the method for calculating a determinant, the following terminology is used.

Definition 3.2

Let $A$ be an $n \times n$ matrix and let $a_{ij}$ be an element of $A$. The cofactor of $a_{ij}$, denoted by $A_{ij}$, is the $(n-1) \times (n-1)$ determinant obtained by

1. crossing out the $i$'th row and $j$'th column of $\det A$, and

2. multiplying the resulting determinant by $(-1)^{i+j}$.

There is some ambiguity in the notation, as there are situations where the elements of the matrix $A$ are denoted by $A_{ij}$. The context, however, always makes clear which meaning is intended.

Using this notation, the definition of the determinant of a $3 \times 3$ matrix $A$ can be written as

\[
\det A = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13}.
\]

This same pattern of definition of the determinant in terms of cofactors persists for square matrices of any size. Just as in the $3 \times 3$ case, the required definition can be found by analysing the solutions of sets of simultaneous linear equations.
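The cofactor pattern lends itself to a recursive calculation. The Python sketch below assumes that the general definition is the expansion along the top row, $\det A = a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n}$, and that the determinant of a $1 \times 1$ matrix is its single entry. The function names are hypothetical.

```python
def minor(m, i, j):
    """Array obtained by crossing out row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def cofactor(m, i, j):
    """Cofactor of the element in row i, column j (0-based indices)."""
    # With 0-based indices, (-1)**(i + j) has the same value as the
    # text's (-1)^(i+j) for the corresponding 1-based indices.
    return (-1) ** (i + j) * det(minor(m, i, j))


def det(m):
    """Determinant of a square matrix by cofactor expansion along the top row."""
    if len(m) == 1:
        # Assumed base case: a 1 x 1 determinant is its single entry.
        return m[0][0]
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(len(m)))


print(det([[1, -2], [-1, 2]]))                    # matrix of Example 2.11
print(det([[1, -2, 5], [2, 3, -1], [5, 4, 3]]))   # matrix of Example 3.1
```

Both calls return zero, in agreement with the examples. The number of multiplications grows very rapidly with the size of the matrix, so this direct approach is only practical for small matrices.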


