Formal Computational Skills
Matrices 1
Overview
Motivation: many mathematical uses eg
• Writing network operations
• Solving linear equations
• Calculating transformations
• Changing coordinate systems
By the end you should:
• Be able to add/subtract/multiply 2 matrices
• Be able to add/subtract/multiply 2 vectors
• Use matrix inverses to solve linear equations
• Write network operations with matrices
Advanced topics - will also discuss:
• Matrices as transformations
• Eigenvectors and eigenvalues of a matrix
Today's Topics
Matrix/Vector Basics
• Matrix definitions (square matrix, identity etc)
• Matrix addition/subtraction/multiplication
• Matrix inverse
• Vector definitions
• Vectors as geometric objects
Tomorrow's Topics
Uses of Matrices
• Matrices as sets of linear equations
• Networks as matrices
• Solving sets of linear equations
• (Briefly) Matrix operations as transformations
• (Briefly) Eigenvectors and eigenvalues of a matrix
Matrices
A matrix W is a 2D array of m x n numbers, often denoted by a capital (and sometimes bold) letter.
eg P = [1 2 3; 4 5 6] is a 2 x 3 matrix

In general W = [w11 w12 … w1n; w21 w22 … w2n; … ; wm1 wm2 … wmn]
W has m rows and n columns and we say W has dimensions m x n.
Each number or element in W is usually represented by a lowercase wij, indexed with the row and column it is in, ie wij is in the i'th row and j'th column.
eg for P = [1 2 3; 4 5 6], p12 = 2

Now let W = [1 0 2; 2 3 4; 3 3 5; 4 2 2]

What is w32 + w21? Is it a) 4 b) 5 c) 6
Answer: w32 = 3 and w21 = 2, so b) 5

What are the dimensions of W? 3 x 4 or 4 x 3 …
Answer: 4 x 3

Note in both (all) cases indexed as row then column.
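These indexing rules map directly onto array libraries. A quick sketch in Python/numpy (my illustration, not part of the slides - note that numpy counts rows and columns from 0, so the maths notation w32 becomes W[2, 1]):

```python
import numpy as np

# The 4 x 3 quiz matrix from above
W = np.array([[1, 0, 2],
              [2, 3, 4],
              [3, 3, 5],
              [4, 2, 2]])

print(W.shape)             # (4, 3): 4 rows, 3 columns
# Maths indexing is 1-based (row, column); numpy is 0-based,
# so w32 is W[2, 1] and w21 is W[1, 0]
print(W[2, 1] + W[1, 0])   # 3 + 2 = 5
```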
Square and Diagonal Matrices
A diagonal matrix D is a square matrix where all off-diagonal elements (ie elements where i is not equal to j) equal zero
eg Q = [1 0 0; 0 2 0; 0 0 5]

Mathematically we say: qij = 0 ∀ i ≠ j (where the upside-down A, ∀, means "for all")

In general a diagonal matrix is D = [d11 0 … 0; 0 d22 … 0; … ; 0 0 … dnn]

If m = n, we say W is a square matrix, eg S = [1 3; 2 4]
Identity Matrix
The identity matrix I is a diagonal matrix where all the diagonal elements are 1
It is ALWAYS represented by a capital I and it plays the role of the number 1 in multiplication and ‘division’
eg I = [1 0 0; 0 1 0; 0 0 1]
If you multiply a matrix by I you get that matrix: IA = AI = A
just as if you multiply a number by 1: 1 x 1.6 = 1.6 x 1 = 1.6
MATHS FACT: all maths 'objects' obey the same rules. One is that they have to have a 1 (known as the identity) which:
a) leaves things unchanged when you multiply by it
b) is what you get when you divide something by itself
Matrix Transpose
The transpose of a matrix P is written as PT and is P with the rows and columns swapped round
eg for P = [1 2 3; 4 5 6], PT = [1 4; 2 5; 3 6], so [PT]ij = [P]ji = pji

Qu: if P has dimensions m x n what dimensions does PT have?
Answer: n x m
One useful equation: (AB)T = BTAT
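That transpose rule is easy to check numerically. A minimal numpy sketch (the matrices are my own illustrative choices):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2 x 3
B = np.array([[1, 0],
              [2, 1],
              [0, 3]])       # 3 x 2

# (AB)^T = B^T A^T -- note the order reverses
lhs = (A @ B).T
rhs = B.T @ A.T
print(np.array_equal(lhs, rhs))   # True
```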
Addition/Subtraction
Adding/subtracting a scalar k (ie a constant): A = W + k = k + W means W with k added to each element, ie aij = wij + k

eg [1 2 3; 4 5 6] + 1 = [2 3 4; 5 6 7]

Matrix plus/minus a matrix: both must be of the same size and we add each element together (a point-by-point or point-wise operation), ie if C = A + B then cij = aij + bij

eg [1 2 3; 4 5 6] + [1 1 1; 2 2 2] = [2 3 4; 6 7 8]
Matrix-Constant Multiplication
if A = Bk = kB, where k is a constant: multiply all elements in B by k, ie aij = kbij

eg 3[1 2 3; 4 5 6] = [3·1 3·2 3·3; 3·4 3·5 3·6] = [3 6 9; 12 15 18]
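The element-wise operations above can be sketched in numpy, where `+` and `*` with a scalar act on every element exactly as described:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A + 1)    # scalar added to every element -> rows (2 3 4), (5 6 7)
print(3 * A)    # every element times 3 -> rows (3 6 9), (12 15 18)

B = np.array([[1, 1, 1],
              [2, 2, 2]])
print(A + B)    # point-wise matrix addition -> rows (2 3 4), (6 7 8)
```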
Matrix-Matrix Multiplication
if A = BC, to get the value in the i'th row and j'th column in A, take the i'th row in B, multiply each element by the corresponding element in the j'th column in C, and add them together

eg [1 2 3; 4 5 6][1 1 2; 2 1 2; 3 1 2] = [(1·1)+(2·2)+(3·3), (1·1)+(2·1)+(3·1), (1·2)+(2·2)+(3·2); (4·1)+(5·2)+(6·3), (4·1)+(5·1)+(6·1), (4·2)+(5·2)+(6·2)] = [14 6 12; 32 15 30]

Formula: aij = Σk=1..n bik ckj

So if B has dimensions m x n (m rows, n columns) and C has dimensions n x p (n rows, p columns), then A = BC has dimensions m x p: (m x n)(n x p) → m x p, ie the inner dimensions must match.
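The worked product above can be checked with numpy's `@` operator (a sketch, not part of the slides):

```python
import numpy as np

# Matrices from the worked example above
B = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
C = np.array([[1, 1, 2],
              [2, 1, 2],
              [3, 1, 2]])          # 3 x 3

# Inner dimensions match: (2 x 3)(3 x 3) gives a 2 x 3 result
A = B @ C
print(A)   # rows (14 6 12) and (32 15 30), as in the worked example
```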
Matrix Inverses
Given the complications of matrix multiplication, what about division?!?!
Use a matrix inverse: Inverse of A is written A-1
Analogy to numbers – inverse of 2 is 2-1 = ½. Just as:
2 x 2-1 = 2-1x 2 = 1, A A-1 = A-1A = I (the identity matrix)
This defines the inverse ie:
if AB = I and BA = I then B = A-1
It is used for division since dividing by 2 is like multiplying by 2-1 ie
if: y = 2z, then: 2-1y = 2-12z = 1z = z. Similarly
if: Y = WX then: W-1Y = W-1WX = IX = X (more later ...)
Calculating A-1 is a pain: use a computer. Note however, that to do it, need to know the determinant of A which is written as |A|: analogous to ‘size’ of a matrix
Now try questions 1-4 off the work sheet.
Powers and Things
Finally, it is often said that matrix multiplication is associative but not commutative. What does this mean?
A(BC) = (AB)C (associative)
but, in general: BC ≠ CB (not commutative)

A2 = AA but how about A1/2 ??
There are ways but we shouldn't worry apart from the special case of diagonal matrices, which are easy:
if D = [d1 0 0; 0 d2 0; 0 0 d3] then Dn = [d1^n 0 0; 0 d2^n 0; 0 0 d3^n]
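Why diagonal matrices make powers easy: Dn (and even D1/2) just raises each diagonal element to that power. A numpy sketch of this claim:

```python
import numpy as np

D = np.diag([1.0, 4.0, 9.0])

# D^2 is just the diagonal entries squared
print(np.allclose(D @ D, np.diag([1.0, 16.0, 81.0])))   # True

# ... and D^(1/2) is the diagonal entries square-rooted
D_half = np.diag(np.sqrt(np.diag(D)))
print(np.allclose(D_half @ D_half, D))                  # True
```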
Vectors
A vector is a special matrix where m or n is 1.

Column vector if n is 1: eg v = (1, 2, 5)T
Row vector if m is 1: eg v = (1, 2, 3)

Usually denoted by a small letter: sometimes bold v, sometimes underlined v, sometimes bold and underlined v, sometimes with an arrow over it, and sometimes just with a square-ish font. I'm going to (pretty much) stick with an underlined letter.

Strictly (and I'm pretty strict) all vectors are column vectors, so unless told otherwise assume a column vector; if you see vT then it is a row vector.
A vector with n elements v = (v1, v2, …, vn)T is said to be n-dimensional and is a point or position ie an object in nD space.
eg here we have 2D vectors v1 = (1, 6)T and v2 = (3, 2)T, and the elements specify the coordinates
By convention, in 2D 1st element refers to the horizontal (x-) axis and 2nd to the vertical (y-) axis
Vectors have both direction and length and so often used for eg speed (think of arrows representing wind on weather maps)
Often visualised as a line/arrow joining the origin and that point
Vector Addition/Subtraction
As they are matrices, vectors follow all the rules of matrices so are eg added and subtracted in a point-wise way:
v3 = v1 + v2 = (1, 6)T + (3, 2)T = (1+3, 6+2)T = (4, 8)T

However it can also be viewed geometrically: if you were at v1, then added v2 on to the end, you would be at v3 = (4, 8)T.
Can also use this to get the vector between 2 vectors. Geometrically, subtracting a vector is the equivalent of going backwards along the arrow, so: v3 - v1 = v2

eg what is the vector u from v1 to v3? Get to v3 from v1 by going backwards along v1 (ie -v1) then forwards along v3 (+v3), so: u = -v1 + v3

Organisms (ants, us etc) use this for path integration to get back home: imagine a path of (random) vectors out to some goal. At any point in the path, the sum of the vectors points to the current position, so minus the sum points back to the start. Rather than have to 'remember' all the vectors of the path, just keep adding the last bit walked to a 'home' vector. Then return home directly by following minus the home vector.
Length of Vectors
Length of v is written |v| (or ||v||)

From Pythagoras: for v = (3, 4)T, |v| = √(3² + 4²) = 5

Similarly if v = (v1, v2)T then |v| = √(v1² + v2²), and in general: |v| = √(Σi=1..n vi²)

Note if |v| = 1, v is known as a unit vector

As the vector between u and v is v - u, the distance between 2 points is: |v - u| = √((v1-u1)² + (v2-u2)²)
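The length formula in numpy (`np.linalg.norm` computes exactly this square root of summed squares):

```python
import numpy as np

v = np.array([3.0, 4.0])
print(np.sqrt(np.sum(v ** 2)))   # 5.0, straight from Pythagoras
print(np.linalg.norm(v))         # 5.0, the same via numpy

# Distance between two points is the length of the difference vector
u = np.array([0.0, 0.0])
print(np.linalg.norm(v - u))     # 5.0 again, since u is the origin
```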
Vector Multiplication
Like matrices, but can be viewed geometrically. Eg multiplying by a constant k makes v k times longer: 3v = 3(1, 2)T = (3, 6)T

Standard vector-vector multiplication: vectors must be the same length, the 1st one a row vector, the 2nd a column vector, eg
uTv = u1v1 + u2v2 and in general: uTv = Σi=1..n uivi

Note vTv = |v|² since eg in 2D vTv = v1² + v2² = |v|²

Vector-vector multiplication is also known as the dot product (u.v) and the inner product <u,v>. The result is a single number.
What does inner product mean??
uTv = |u| |v| cos(t) where t is the angle between the 2 vectors

So uTv = 0 if the vectors are orthogonal (perpendicular: t = 90 so cos(t) = 0) and maximised if the vectors are parallel (t = 0 so cos(t) = 1)

It can also be interpreted as the projection of one vector onto another (usually unit) vector (cf principal component analysis): if |u| = 1, then uTv = |v| cos(t)

Question: what is an angle in 10 dimensions? In n dimensions use the dot product to define the angle between 2 vectors.
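A numpy sketch of the dot-product facts above (the vectors are my own illustrative choices):

```python
import numpy as np

u = np.array([1.0, 0.0])       # unit vector along the x-axis
v = np.array([3.0, 4.0])

# u^T v = |u||v|cos(t): recover the angle between the vectors
cos_t = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_t)))   # ~53.13 degrees

# Since |u| = 1, u^T v is the projection of v onto u
print(u @ v)                          # 3.0

# Orthogonal vectors give a zero dot product
print(np.array([1.0, 0.0]) @ np.array([0.0, 2.0]))   # 0.0

# And v^T v = |v|^2
print(v @ v)                          # 25.0
```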
Summary of Main Points
• How to multiply 2 matrices
• What a matrix inverse is
• How adding vectors can be interpreted geometrically
• How to calculate vector length |v| and distance between 2 vectors
• vTv = |v|2
• uTv = |u||v|cos(angle between vectors)
• To project u onto v if |v|=1 do uTv
Formal Computational Skills
Matrices 2
Yesterday's Topics
Matrix/Vector Basics
• Matrix definitions (square matrix, identity etc)
• Matrix addition/subtraction/multiplication
• Matrix inverse
• Vector definitions
• Vectors as geometric objects
Today's Topics
Uses of Matrices
• Matrices as sets of equations
• Networks written as matrices
• Solving sets of linear equations
• (Briefly) Matrix operations as transformations
• (Briefly) Eigenvectors and eigenvalues of a matrix
Equations as Matrices
Suppose we have the following: 2x1 + 3x2 + 4x3 = y1

Can write this as a matrix operation: [2 3 4](x1; x2; x3) = y1

Similarly, can write: w1x1 + w2x2 + w3x3 = y1 as [w1 w2 w3](x1; x2; x3) = y1

Or, using vectors: with wT = (w1, w2, w3) and x = (x1, x2, x3)T we get: wTx = y1

A bit more concise but not great
Sets of Equations as Matrices
However, suppose we have several equations involving x eg
2x1 + 3x2 + 4x3 = y1
4x1 + 3x2 + 8x3 = y2

This becomes: [2 3 4; 4 3 8](x1; x2; x3) = (y1; y2)

Or: Wx = y where W = [2 3 4; 4 3 8] and y = (y1, y2)T

Similarly:
w11x1 + w12x2 + w13x3 = y1
w21x1 + w22x2 + w23x3 = y2

becomes: [w11 w12 w13; w21 w22 w23](x1; x2; x3) = y

Or: Wx = y where: W = [w11 w12 w13; w21 w22 w23]
Matrices as Neural Networks
Will encounter this notation when dealing with Artificial Neural Networks (ANNs).

[Network diagram: inputs x1, x2, x3 connected via weights w1, w2, w3 to a single output y = w1x1 + w2x2 + w3x3]

Comes from the connectionist picture of the brain as electrical impulses travelling along axons, modulated by synaptic strengths and being summed at synapses.

Can think of networks as functions which transform input vectors (x) into output vectors (y) via a set (matrix) of numbers known as weights (W) associated with connections from inputs to outputs.

• The above ANN takes a 3D input vector xT = (x1, x2, x3).
• It therefore has 3 connections from output to input, each with an associated weight. Thus w is a 3D vector wT = (w1, w2, w3).
• Inputs travel along connections and are multiplied by weights and summed to give the (1D) output y (we say y is a weighted sum of the inputs).

Thus the sum is the same as if we multiplied w and x, so can write the output as: y = wTx = [w1 w2 w3](x1; x2; x3) = Σi wixi
If we have more than one output, need a matrix of weights W. Effectively we have one weight vector for each output.

[Network diagram: inputs x1, x2, x3 connected to outputs y1, y2 via weights w11, w12, w13, w21, w22, w23]

Represent all weights by a matrix W where each weight vector is a row of W and the ij'th element of W is the weight wij:
W = [w11 w12 w13; w21 w22 w23]

Each output is a weighted sum of the input and the corresponding weight vector, ie a row of W multiplied by x:
y1 = w11x1 + w12x2 + w13x3 = Σi=1..3 w1ixi
y2 = w21x1 + w22x2 + w23x3 = Σi=1..3 w2ixi

ie (y1; y2) = [w11 w12 w13; w21 w22 w23](x1; x2; x3) = Wx

Thus, writing the output as a vector y, the network operation is: Wx = y

So for n-dimensional input data the i'th output is: yi = Σj=1..n wijxj

Note weights are indexed (oddly) as w_to,from so that the matrix multiplication works.
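The network operation Wx = y is literally one matrix-vector product. A sketch with made-up weights (the numbers are mine, purely illustrative):

```python
import numpy as np

# Weight matrix: one row per output, w_ij connects input j to output i
W = np.array([[0.5, -1.0, 2.0],
              [1.0,  0.0, 0.5]])   # 2 outputs, 3 inputs
x = np.array([1.0, 2.0, 3.0])      # 3D input vector

y = W @ x    # each output is a weighted sum of the inputs
print(y)     # [4.5 2.5]
```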
Finally, suppose we have many input vectors xi = (x1i, x2i, x3i)T

Make a matrix X where the i'th column of X is the i'th input vector:
X = (x1 x2) = [x11 x12; x21 x22; x31 x32]

Each input generates a different output vector, so make a matrix Y where the i'th column is the output due to the application of the i'th input vector to the network: Y = (y1 y2)

Since we know that a single output is y = Wx, we have: Y = (y1 y2) = (Wx1 Wx2) = WX

Might not seem like a great leap but very convenient for mathematical manipulation (and matlab programming) eg …
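Batching inputs as columns of X is one line in numpy (same made-up illustrative weights as above; the point is that WX computes every output at once):

```python
import numpy as np

W = np.array([[0.5, -1.0, 2.0],
              [1.0,  0.0, 0.5]])   # 2 outputs, 3 inputs

# Two input vectors stacked as the COLUMNS of X
X = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [3.0, 0.0]])

Y = W @ X        # i'th column of Y is the output for the i'th input
print(Y[:, 0])   # network output for input (1, 2, 3)^T
print(Y[:, 1])   # network output for input (0, 1, 0)^T
```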
Solving Linear Equations
Suppose we have to solve:
x1 + 2x2 = 0
3x1 + 7x2 = 1

After some (tedious) calculations solve to get: x1 = -2, x2 = 1.

Instead write as: Wx = y where: W = [1 2; 3 7] and y = (0, 1)T

then solve the equations by multiplying both sides by W-1 since:
W-1Wx = Ix = x = W-1y

Giving: x = W-1y = [7 -2; -3 1](0; 1) = (-2; 1)

So if Wx = y, solve via x = W-1y using a computer ... But it is not always so simple …
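The worked system above in numpy (note `np.linalg.solve` is generally preferred over forming the inverse explicitly, for speed and numerical stability):

```python
import numpy as np

# x1 + 2*x2 = 0,  3*x1 + 7*x2 = 1
W = np.array([[1.0, 2.0],
              [3.0, 7.0]])
y = np.array([0.0, 1.0])

x = np.linalg.inv(W) @ y        # x = W^-1 y, as in the slides
print(x)                        # [-2.  1.]

print(np.linalg.solve(W, y))    # same answer without an explicit inverse
```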
Sometimes W-1 does not exist, ie W is singular (or near-singular). Why? The problem could be underdetermined eg
[2 1; 4 2](x1; x2) = (8; 16)   (Row 2 = 2 x Row 1)
(usually) no unique solution [eg x1=1, x2=6 or x1=2, x2=4 etc]. The same problem occurs if one equation is written twice.

Or one row is a linear combination of (ie made up out of) the others eg
W = [1 1 1; 1 2 2; 2 3 3]   (Row 3 = Row 1 + Row 2)

The problem is that we need more data (number of unknowns = number of equations/bits of info needed).

Or the problem could be overdetermined eg
(2; 2)x = (4; 2)
more outputs than inputs: contradictory solutions (x=2 and x=1)

For W-1 to exist, W must be square and its rows must not duplicate info, ie they must be linearly independent.

This is often not the case, so to avoid problems we use the pseudoinverse of W. If the problem is underdetermined it finds the solution with the smallest sum of squares of elements of x, and if overdetermined it finds an approximate solution (What sort? Could be investigated…)

In networks, used to find weights for Radial Basis Function networks and single layer networks (eg Bishop p.92-95).
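A sketch of the pseudoinverse on the overdetermined example above (2x = 4 and 2x = 2 cannot both hold; `np.linalg.pinv` returns the least-squares compromise):

```python
import numpy as np

W = np.array([[2.0],
              [2.0]])
y = np.array([4.0, 2.0])

x = np.linalg.pinv(W) @ y
print(x)    # [1.5], splitting the difference between x = 2 and x = 1
```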
Matrix Vector Multiplication: Matrices as Transformations
If the dimensionality is right, as above, can view a matrix-vector multiplication as a transformation of one vector into another:
Uv = [u11 u12; u21 u22](v1; v2) = (u11v1 + u12v2; u21v1 + u22v2)

If U is diagonal, get expansion/contraction along the axes:
Dv = [d1 0; 0 d2](v1; v2) = (d1v1; d2v2)

eg for D = [5 0; 0 2]: D(1; 1) = (5; 2) and D(1; 0) = (5; 0) etc

Get a rotation anticlockwise through t by:
Rv = [cos t  -sin t; sin t  cos t](v1; v2) = (v1cos t - v2sin t; v1sin t + v2cos t)

So we see that (1, 0)T goes to (cos t, sin t)T and (0, 1)T goes to (-sin t, cos t)T

In general, transformations produced by matrix multiplication can be broken down into a rotation followed by an expansion-contraction followed by another rotation.
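The two transformations above, sketched in numpy (t = 90 degrees chosen so the answer is clean):

```python
import numpy as np

t = np.radians(90)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

# (1, 0)^T rotates anticlockwise to (cos t, sin t)^T = (0, 1)^T
print(R @ np.array([1.0, 0.0]))   # [0. 1.] up to float rounding

# A diagonal matrix expands/contracts along the axes
D = np.diag([5.0, 2.0])
print(D @ np.array([1.0, 1.0]))   # [5. 2.]
```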
Eigenvectors and Eigenvalues
If: Ax = λx for some scalar λ ≠ 0, then we say that x is an eigenvector of A with eigenvalue λ.

Turns out eigenvectors are VERY useful.

Clearly, x is not unique since if: Ax = λx, then: A(2x) = 2Ax = 2λx = λ(2x), so it is usual to scale x so that it has length 1.

Intuition: the direction of x is unchanged by being transformed by A, so it in some sense reflects the principal direction (or axis) of the transformation.

Repeatedly transform v by A: start at v, then Av, then AAv = A2v etc … Most starting points result in curved trajectories.

Starting from an eigenvector x, however, get: Ax = λx, A2x = λ2x, A3x = λ3x, A4x = λ4x, … so the trajectory is a straight line.

Note if |λ| > 1, x expands. If not, it will contract.
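The eigenvector property Ax = λx can be sketched with `np.linalg.eig` (the matrix here is my own small example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eig(A)   # columns of vecs are the eigenvectors
lam, x = vals[0], vecs[:, 0]

print(np.allclose(A @ x, lam * x))           # True: Ax = lambda x
print(np.allclose(A @ (A @ x), lam**2 * x))  # True: A^2 x = lambda^2 x
print(np.linalg.norm(x))                     # scaled to unit length
```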
Eigenvector Facts
If A is symmetric, ie aij = aji, eg A = [1 10 20; 10 2 30; 20 30 3] - true if A is a covariance matrix and many other important matrices - the (unit length) eigenvectors will be orthogonal (we say they are orthonormal):
xiTxj = 1 if i = j
xiTxj = 0 otherwise

This means that the eigenvectors form a set of basis vectors and any vector can be expressed as a linear sum of the eigenvectors: v = Σi=1..d zixi

ie they form a new rotated co-ordinate system.

If the data is d-dimensional there will be d eigenvectors.
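These facts can be checked on the symmetric example matrix above with `np.linalg.eigh` (numpy's eigen-routine for symmetric matrices):

```python
import numpy as np

A = np.array([[ 1., 10., 20.],
              [10.,  2., 30.],
              [20., 30.,  3.]])   # symmetric: a_ij = a_ji

vals, vecs = np.linalg.eigh(A)    # columns of vecs are unit eigenvectors

# Orthonormal: x_i^T x_j = 1 if i = j, 0 otherwise, so V^T V = I
print(np.allclose(vecs.T @ vecs, np.eye(3)))   # True

# Any vector is a linear sum of the eigenvectors: v = sum_i z_i x_i
v = np.array([1.0, 2.0, 3.0])
z = vecs.T @ v                    # coordinates in the eigenvector basis
print(np.allclose(vecs @ z, v))   # True
```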
Summary of Main Points
• When dealing with networks, Wx = y means "outputs y are a weighted sum of input x. That is: y is the network output given input x"
• Similarly, WX = Y means "each column of Y is the network output operating on the corresponding column of X"
• Matrix inverse can be used to solve linear equations, but the pseudoinverse is more robust
• In networks, use the pseudoinverse to calculate 'best' weights that will transform training input vectors into known target vectors
• Matrix-vector multiplication can be seen as a transformation (eg rotations and expansions)
• If Ax = λx, x is an eigenvector, with eigenvalue λ
• Eigenvectors and eigenvalues tell us about the main axes and behaviour of matrix transformations