Formal Computational Skills
Matrices 1
Overview
Motivation: many mathematical uses eg
• Writing network operations
• Solving linear equations
• Calculating transformations
• Changing coordinate systems
By the end you should:
• Be able to add/subtract/multiply 2 matrices
• Be able to add/subtract/multiply 2 vectors
• Use matrix inverses to solve linear equations
• Write network operations with matrices
Advanced topics - will also discuss:
• Matrices as transformations
• Eigenvectors and eigenvalues of a matrix
Today's Topics
Matrix/Vector Basics
• Matrix definitions (square matrix, identity etc)
• Matrix addition/subtraction/multiplication
• Matrix inverse
• Vector definitions
• Vectors as geometric objects
Tomorrow's Topics
Uses of Matrices
• Matrices as sets of linear equations
• Networks as matrices
• Solving sets of linear equations
• (Briefly) Matrix operations as transformations
• (Briefly) Eigenvectors and eigenvalues of a matrix
Matrices
A matrix W is a 2D array of m x n numbers, often denoted by a capital (and sometimes bold) letter.
eg P = [1 2 3; 4 5 6] is a 2 x 3 matrix

In general W = [w11 w12 … w1n; w21 w22 … w2n; … ; wm1 wm2 … wmn]
W has m rows and n columns and we say W has dimensions m x n.
Each number or element in W is usually represented by a lowercase wij, indexed with the row and column it is in, ie wij is in the i'th row and j'th column.
eg for P = [1 2 3; 4 5 6], p12 = 2

Now let W = [1 0 2; 2 3 4; 3 3 5; 4 2 2]

What is w32 + w21? Is it a) 4 b) 5 c) 6
Answer: w32 = 3 and w21 = 2, so b) 5

What are the dimensions of W? 3 x 4 or 4 x 3 …
Answer: 4 x 3

Note in both (all) cases indexed as row then column.
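These indexing rules map directly onto array libraries. A quick sketch in Python/numpy (my illustration, not part of the slides - note that numpy counts rows and columns from 0, so the maths notation w32 becomes W[2, 1]):

```python
import numpy as np

# The 4 x 3 quiz matrix from above
W = np.array([[1, 0, 2],
              [2, 3, 4],
              [3, 3, 5],
              [4, 2, 2]])

print(W.shape)             # (4, 3): 4 rows, 3 columns
# Maths indexing is 1-based (row, column); numpy is 0-based,
# so w32 is W[2, 1] and w21 is W[1, 0]
print(W[2, 1] + W[1, 0])   # 3 + 2 = 5
```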
Square and Diagonal Matrices
A diagonal matrix D is a square matrix where all off-diagonal elements (ie elements where i is not equal to j) equal zero
eg Q = [1 0 0; 0 2 0; 0 0 5]

Mathematically we say: qij = 0 ∀ i ≠ j (where the upside-down A, ∀, means "for all")

In general a diagonal matrix is D = [d11 0 … 0; 0 d22 … 0; … ; 0 0 … dnn]

If m = n, we say W is a square matrix, eg S = [1 3; 2 4]
Identity Matrix
The identity matrix I is a diagonal matrix where all the diagonal elements are 1
It is ALWAYS represented by a capital I and it plays the role of the number 1 in multiplication and ‘division’
eg I = [1 0 0; 0 1 0; 0 0 1]
If you multiply a matrix by I you get that matrix: IA = AI = A
just as if you multiply a number by 1: 1 x 1.6 = 1.6 x 1 = 1.6
MATHS FACT: all maths 'objects' obey the same rules. One is that they have to have a 1 (known as the identity) which:
a) leaves things unchanged when you multiply by it
b) is what you get when you divide something by itself
Matrix Transpose
The transpose of a matrix P is written as PT and is P with the rows and columns swapped round
eg for P = [1 2 3; 4 5 6], PT = [1 4; 2 5; 3 6], so [PT]ij = [P]ji = pji

Qu: if P has dimensions m x n what dimensions does PT have?
Answer: n x m
One useful equation: (AB)T = BTAT
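That transpose rule is easy to check numerically. A minimal numpy sketch (the matrices are my own illustrative choices):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2 x 3
B = np.array([[1, 0],
              [2, 1],
              [0, 3]])       # 3 x 2

# (AB)^T = B^T A^T -- note the order reverses
lhs = (A @ B).T
rhs = B.T @ A.T
print(np.array_equal(lhs, rhs))   # True
```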
Addition/Subtraction
Adding/subtracting a scalar k (ie a constant): A = W + k = k + W means W with k added to each element, ie aij = wij + k

eg [1 2 3; 4 5 6] + 1 = [2 3 4; 5 6 7]

Matrix plus/minus a matrix: both must be of the same size and we add each element together (a point-by-point or point-wise operation), ie if C = A + B then cij = aij + bij

eg [1 2 3; 4 5 6] + [1 1 1; 2 2 2] = [2 3 4; 6 7 8]
Matrix-Constant Multiplication
if A = Bk = kB, where k is a constant: multiply all elements in B by k, ie aij = kbij

eg 3[1 2 3; 4 5 6] = [3·1 3·2 3·3; 3·4 3·5 3·6] = [3 6 9; 12 15 18]
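The element-wise operations above can be sketched in numpy, where `+` and `*` with a scalar act on every element exactly as described:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A + 1)    # scalar added to every element -> rows (2 3 4), (5 6 7)
print(3 * A)    # every element times 3 -> rows (3 6 9), (12 15 18)

B = np.array([[1, 1, 1],
              [2, 2, 2]])
print(A + B)    # point-wise matrix addition -> rows (2 3 4), (6 7 8)
```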
Matrix-Matrix Multiplication
if A = BC, to get the value in the i'th row and j'th column in A, take the i'th row in B, multiply each element by the corresponding element in the j'th column in C, and add them together

eg [1 2 3; 4 5 6][1 1 2; 2 1 2; 3 1 2] = [(1·1)+(2·2)+(3·3), (1·1)+(2·1)+(3·1), (1·2)+(2·2)+(3·2); (4·1)+(5·2)+(6·3), (4·1)+(5·1)+(6·1), (4·2)+(5·2)+(6·2)] = [14 6 12; 32 15 30]

Formula: aij = Σk=1..n bik ckj

So if B has dimensions m x n (m rows, n columns) and C has dimensions n x p (n rows, p columns), then A = BC has dimensions m x p: (m x n)(n x p) → m x p, ie the inner dimensions must match.
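The worked product above can be checked with numpy's `@` operator (a sketch, not part of the slides):

```python
import numpy as np

# Matrices from the worked example above
B = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
C = np.array([[1, 1, 2],
              [2, 1, 2],
              [3, 1, 2]])          # 3 x 3

# Inner dimensions match: (2 x 3)(3 x 3) gives a 2 x 3 result
A = B @ C
print(A)   # rows (14 6 12) and (32 15 30), as in the worked example
```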
Matrix Inverses
Given the complications of matrix multiplication, what about division?!?!
Use a matrix inverse: Inverse of A is written A-1
Analogy to numbers – inverse of 2 is 2-1 = ½. Just as:
2 x 2-1 = 2-1x 2 = 1, A A-1 = A-1A = I (the identity matrix)
This defines the inverse ie:
if AB = I and BA = I then B = A-1
It is used for division since dividing by 2 is like multiplying by 2-1 ie
if: y = 2z, then: 2-1y = 2-12z = 1z = z. Similarly
if: Y = WX then: W-1Y = W-1WX = IX = X (more later ...)
Calculating A-1 is a pain: use a computer. Note however, that to do it, need to know the determinant of A which is written as |A|: analogous to ‘size’ of a matrix
Now try questions 1-4 off the work sheet.
Powers and Things
Finally, it is often said that matrix multiplication is associative but not commutative. What does this mean?
A(BC) = (AB)C (associative)
but, in general: BC ≠ CB (not commutative)

A2 = AA but how about A1/2 ??
There are ways but we shouldn't worry apart from the special case of diagonal matrices, which are easy:
if D = [d1 0 0; 0 d2 0; 0 0 d3] then Dn = [d1^n 0 0; 0 d2^n 0; 0 0 d3^n]
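Why diagonal matrices make powers easy: Dn (and even D1/2) just raises each diagonal element to that power. A numpy sketch of this claim:

```python
import numpy as np

D = np.diag([1.0, 4.0, 9.0])

# D^2 is just the diagonal entries squared
print(np.allclose(D @ D, np.diag([1.0, 16.0, 81.0])))   # True

# ... and D^(1/2) is the diagonal entries square-rooted
D_half = np.diag(np.sqrt(np.diag(D)))
print(np.allclose(D_half @ D_half, D))                  # True
```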
Vectors
A vector is a special matrix where m or n is 1.

Column vector if n is 1: eg v = (1, 2, 5)T
Row vector if m is 1: eg v = (1, 2, 3)

Usually denoted by a small letter: sometimes bold v, sometimes underlined v, sometimes bold and underlined v, sometimes with an arrow over it, and sometimes just with a square-ish font. I'm going to (pretty much) stick with an underlined letter.

Strictly (and I'm pretty strict) all vectors are column vectors, so unless told otherwise assume a column vector; if you see vT then it is a row vector.
A vector with n elements v = (v1, v2, …, vn)T is said to be n-dimensional and is a point or position ie an object in nD space.
eg here we have 2D vectors v1 = (1, 6)T and v2 = (3, 2)T, and the elements specify the coordinates
By convention, in 2D 1st element refers to the horizontal (x-) axis and 2nd to the vertical (y-) axis
Vectors have both direction and length and so often used for eg speed (think of arrows representing wind on weather maps)
Often visualised as a line/arrow joining the origin and that point
Vector Addition/Subtraction
As they are matrices, vectors follow all the rules of matrices so are eg added and subtracted in a point-wise way:
v3 = v1 + v2 = (1, 6)T + (3, 2)T = (1+3, 6+2)T = (4, 8)T

However it can also be viewed geometrically: if you were at v1, then added v2 on to the end, you would be at v3 = (4, 8)T.
Can also use this to get the vector between 2 vectors. Geometrically, subtracting a vector is the equivalent of going backwards along the arrow, so: v3 - v1 = v2

eg what is the vector u from v1 to v3? Get to v3 from v1 by going backwards along v1 (ie -v1) then forwards along v3 (+v3), so: u = -v1 + v3

Organisms (ants, us etc) use this for path integration to get back home: imagine a path of (random) vectors out to some goal. At any point in the path, the sum of the vectors points to the current position, so minus the sum points back to the start. Rather than have to 'remember' all the vectors of the path, just keep adding the last bit walked to a 'home' vector. Then return home directly by following minus the home vector.
Length of Vectors
Length of v is written |v| (or ||v||)

From Pythagoras: for v = (3, 4)T, |v| = √(3² + 4²) = 5

Similarly if v = (v1, v2)T then |v| = √(v1² + v2²), and in general: |v| = √(Σi=1..n vi²)

Note if |v| = 1, v is known as a unit vector

As the vector between u and v is v - u, the distance between 2 points is: |v - u| = √((v1-u1)² + (v2-u2)²)
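The length formula in numpy (`np.linalg.norm` computes exactly this square root of summed squares):

```python
import numpy as np

v = np.array([3.0, 4.0])
print(np.sqrt(np.sum(v ** 2)))   # 5.0, straight from Pythagoras
print(np.linalg.norm(v))         # 5.0, the same via numpy

# Distance between two points is the length of the difference vector
u = np.array([0.0, 0.0])
print(np.linalg.norm(v - u))     # 5.0 again, since u is the origin
```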
Vector Multiplication
Like matrices, but can be viewed geometrically. Eg multiplying by a constant k makes v k times longer: 3v = 3(1, 2)T = (3, 6)T

Standard vector-vector multiplication: vectors must be the same length, the 1st one a row vector, the 2nd a column vector, eg
uTv = u1v1 + u2v2 and in general: uTv = Σi=1..n uivi

Note vTv = |v|² since eg in 2D vTv = v1² + v2² = |v|²

Vector-vector multiplication is also known as the dot product (u.v) and the inner product <u,v>. The result is a single number.
What does inner product mean??
uTv = |u| |v| cos(t) where t is the angle between the 2 vectors

So uTv = 0 if the vectors are orthogonal (perpendicular: t = 90 so cos(t) = 0) and maximised if the vectors are parallel (t = 0 so cos(t) = 1)

It can also be interpreted as the projection of one vector onto another (usually unit) vector (cf principal component analysis): if |u| = 1, then uTv = |v| cos(t)

Question: what is an angle in 10 dimensions? In n dimensions use the dot product to define the angle between 2 vectors.
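A numpy sketch of the dot-product facts above (the vectors are my own illustrative choices):

```python
import numpy as np

u = np.array([1.0, 0.0])       # unit vector along the x-axis
v = np.array([3.0, 4.0])

# u^T v = |u||v|cos(t): recover the angle between the vectors
cos_t = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_t)))   # ~53.13 degrees

# Since |u| = 1, u^T v is the projection of v onto u
print(u @ v)                          # 3.0

# Orthogonal vectors give a zero dot product
print(np.array([1.0, 0.0]) @ np.array([0.0, 2.0]))   # 0.0

# And v^T v = |v|^2
print(v @ v)                          # 25.0
```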
Summary of Main Points
• How to multiply 2 matrices
• What a matrix inverse is
• How adding vectors can be interpreted geometrically
• How to calculate vector length |v| and distance between 2 vectors
• vTv = |v|2
• uTv = |u||v|cos(angle between vectors)
• To project u onto v if |v|=1 do uTv
Formal Computational Skills
Matrices 2
Yesterday's Topics
Matrix/Vector Basics
• Matrix definitions (square matrix, identity etc)
• Matrix addition/subtraction/multiplication
• Matrix inverse
• Vector definitions
• Vectors as geometric objects
Today's Topics
Uses of Matrices
• Matrices as sets of equations
• Networks written as matrices
• Solving sets of linear equations
• (Briefly) Matrix operations as transformations
• (Briefly) Eigenvectors and eigenvalues of a matrix
Equations as Matrices
Suppose we have the following: 2x1 + 3x2 + 4x3 = y1

Can write this as a matrix operation: [2 3 4](x1; x2; x3) = y1

Similarly, can write: w1x1 + w2x2 + w3x3 = y1 as [w1 w2 w3](x1; x2; x3) = y1

Or, using vectors: with wT = (w1, w2, w3) and x = (x1, x2, x3)T we get: wTx = y1

A bit more concise but not great
Sets of Equations as Matrices
However, suppose we have several equations involving x eg
2x1 + 3x2 + 4x3 = y1
4x1 + 3x2 + 8x3 = y2

This becomes: [2 3 4; 4 3 8](x1; x2; x3) = (y1; y2)

Or: Wx = y where W = [2 3 4; 4 3 8] and y = (y1, y2)T

Similarly:
w11x1 + w12x2 + w13x3 = y1
w21x1 + w22x2 + w23x3 = y2

becomes: [w11 w12 w13; w21 w22 w23](x1; x2; x3) = y

Or: Wx = y where: W = [w11 w12 w13; w21 w22 w23]
Matrices as Neural Networks
Will encounter this notation when dealing with Artificial Neural Networks (ANNs).

[Network diagram: inputs x1, x2, x3 connected via weights w1, w2, w3 to a single output y = w1x1 + w2x2 + w3x3]

Comes from the connectionist picture of the brain as electrical impulses travelling along axons, modulated by synaptic strengths and being summed at synapses.

Can think of networks as functions which transform input vectors (x) into output vectors (y) via a set (matrix) of numbers known as weights (W) associated with connections from inputs to outputs.

• The above ANN takes a 3D input vector xT = (x1, x2, x3).
• It therefore has 3 connections from output to input, each with an associated weight. Thus w is a 3D vector wT = (w1, w2, w3).
• Inputs travel along connections and are multiplied by weights and summed to give the (1D) output y (we say y is a weighted sum of the inputs).

Thus the sum is the same as if we multiplied w and x, so can write the output as: y = wTx = [w1 w2 w3](x1; x2; x3) = Σi wixi
If we have more than one output, need a matrix of weights W. Effectively we have one weight vector for each output.

[Network diagram: inputs x1, x2, x3 connected to outputs y1, y2 via weights w11, w12, w13, w21, w22, w23]

Represent all weights by a matrix W where each weight vector is a row of W and the ij'th element of W is the weight wij:
W = [w11 w12 w13; w21 w22 w23]

Each output is a weighted sum of the input and the corresponding weight vector, ie a row of W multiplied by x:
y1 = w11x1 + w12x2 + w13x3 = Σi=1..3 w1ixi
y2 = w21x1 + w22x2 + w23x3 = Σi=1..3 w2ixi

ie (y1; y2) = [w11 w12 w13; w21 w22 w23](x1; x2; x3) = Wx

Thus, writing the output as a vector y, the network operation is: Wx = y

So for n-dimensional input data the i'th output is: yi = Σj=1..n wijxj

Note weights are indexed (oddly) as w_to,from so that the matrix multiplication works.
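The network operation Wx = y is literally one matrix-vector product. A sketch with made-up weights (the numbers are mine, purely illustrative):

```python
import numpy as np

# Weight matrix: one row per output, w_ij connects input j to output i
W = np.array([[0.5, -1.0, 2.0],
              [1.0,  0.0, 0.5]])   # 2 outputs, 3 inputs
x = np.array([1.0, 2.0, 3.0])      # 3D input vector

y = W @ x    # each output is a weighted sum of the inputs
print(y)     # [4.5 2.5]
```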
Finally, suppose we have many input vectors xi = (x1i, x2i, x3i)T

Make a matrix X where the i'th column of X is the i'th input vector:
X = (x1 x2) = [x11 x12; x21 x22; x31 x32]

Each input generates a different output vector, so make a matrix Y where the i'th column is the output due to the application of the i'th input vector to the network: Y = (y1 y2)

Since we know that a single output is y = Wx, we have: Y = (y1 y2) = (Wx1 Wx2) = WX

Might not seem like a great leap but very convenient for mathematical manipulation (and matlab programming) eg …
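Batching inputs as columns of X is one line in numpy (same made-up illustrative weights as above; the point is that WX computes every output at once):

```python
import numpy as np

W = np.array([[0.5, -1.0, 2.0],
              [1.0,  0.0, 0.5]])   # 2 outputs, 3 inputs

# Two input vectors stacked as the COLUMNS of X
X = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [3.0, 0.0]])

Y = W @ X        # i'th column of Y is the output for the i'th input
print(Y[:, 0])   # network output for input (1, 2, 3)^T
print(Y[:, 1])   # network output for input (0, 1, 0)^T
```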
Solving Linear Equations
Suppose we have to solve:
x1 + 2x2 = 0
3x1 + 7x2 = 1

After some (tedious) calculations solve to get: x1 = -2, x2 = 1.

Instead write as: Wx = y where: W = [1 2; 3 7] and y = (0, 1)T

then solve the equations by multiplying both sides by W-1 since:
W-1Wx = Ix = x = W-1y

Giving: x = W-1y = [7 -2; -3 1](0; 1) = (-2; 1)

So if Wx = y, solve via x = W-1y using a computer ... But it is not always so simple …
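The worked system above in numpy (note `np.linalg.solve` is generally preferred over forming the inverse explicitly, for speed and numerical stability):

```python
import numpy as np

# x1 + 2*x2 = 0,  3*x1 + 7*x2 = 1
W = np.array([[1.0, 2.0],
              [3.0, 7.0]])
y = np.array([0.0, 1.0])

x = np.linalg.inv(W) @ y        # x = W^-1 y, as in the slides
print(x)                        # [-2.  1.]

print(np.linalg.solve(W, y))    # same answer without an explicit inverse
```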
Sometimes W-1 does not exist, ie W is singular (or near-singular). Why? The problem could be underdetermined eg
[2 1; 4 2](x1; x2) = (8; 16)   (Row 2 = 2 x Row 1)
(usually) no unique solution [eg x1=1, x2=6 or x1=2, x2=4 etc]. The same problem occurs if one equation is written twice.

Or one row is a linear combination of (ie made up out of) the others eg
W = [1 1 1; 1 2 2; 2 3 3]   (Row 3 = Row 1 + Row 2)

The problem is that we need more data (number of unknowns = number of equations/bits of info needed).

Or the problem could be overdetermined eg
(2; 2)x = (4; 2)
more outputs than inputs: contradictory solutions (x=2 and x=1)

For W-1 to exist, W must be square and its rows must not duplicate info, ie they must be linearly independent.

This is often not the case, so to avoid problems we use the pseudoinverse of W. If the problem is underdetermined it finds the solution with the smallest sum of squares of elements of x, and if overdetermined it finds an approximate solution (What sort? Could be investigated…)

In networks, used to find weights for Radial Basis Function networks and single layer networks (eg Bishop p.92-95).
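A sketch of the pseudoinverse on the overdetermined example above (2x = 4 and 2x = 2 cannot both hold; `np.linalg.pinv` returns the least-squares compromise):

```python
import numpy as np

W = np.array([[2.0],
              [2.0]])
y = np.array([4.0, 2.0])

x = np.linalg.pinv(W) @ y
print(x)    # [1.5], splitting the difference between x = 2 and x = 1
```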
Matrix Vector Multiplication: Matrices as Transformations
If the dimensionality is right, as above, can view a matrix-vector multiplication as a transformation of one vector into another:
Uv = [u11 u12; u21 u22](v1; v2) = (u11v1 + u12v2; u21v1 + u22v2)

If U is diagonal, get expansion/contraction along the axes:
Dv = [d1 0; 0 d2](v1; v2) = (d1v1; d2v2)

eg for D = [5 0; 0 2]: D(1; 1) = (5; 2) and D(1; 0) = (5; 0) etc

Get a rotation anticlockwise through t by:
Rv = [cos t  -sin t; sin t  cos t](v1; v2) = (v1cos t - v2sin t; v1sin t + v2cos t)

So we see that (1, 0)T goes to (cos t, sin t)T and (0, 1)T goes to (-sin t, cos t)T

In general, transformations produced by matrix multiplication can be broken down into a rotation followed by an expansion-contraction followed by another rotation.
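The two transformations above, sketched in numpy (t = 90 degrees chosen so the answer is clean):

```python
import numpy as np

t = np.radians(90)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

# (1, 0)^T rotates anticlockwise to (cos t, sin t)^T = (0, 1)^T
print(R @ np.array([1.0, 0.0]))   # [0. 1.] up to float rounding

# A diagonal matrix expands/contracts along the axes
D = np.diag([5.0, 2.0])
print(D @ np.array([1.0, 1.0]))   # [5. 2.]
```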
Eigenvectors and Eigenvalues
If: Ax = λx for some scalar λ ≠ 0, then we say that x is an eigenvector of A with eigenvalue λ.

Turns out eigenvectors are VERY useful.

Clearly, x is not unique since if: Ax = λx, then: A(2x) = 2Ax = 2λx = λ(2x), so it is usual to scale x so that it has length 1.

Intuition: the direction of x is unchanged by being transformed by A, so it in some sense reflects the principal direction (or axis) of the transformation.

Repeatedly transform v by A: start at v, then Av, then AAv = A2v etc … Most starting points result in curved trajectories.

Starting from an eigenvector x, however, get: Ax = λx, A2x = λ2x, A3x = λ3x, A4x = λ4x, … so the trajectory is a straight line.

Note if |λ| > 1, x expands. If not, it will contract.
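The eigenvector property Ax = λx can be sketched with `np.linalg.eig` (the matrix here is my own small example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

vals, vecs = np.linalg.eig(A)   # columns of vecs are the eigenvectors
lam, x = vals[0], vecs[:, 0]

print(np.allclose(A @ x, lam * x))           # True: Ax = lambda x
print(np.allclose(A @ (A @ x), lam**2 * x))  # True: A^2 x = lambda^2 x
print(np.linalg.norm(x))                     # scaled to unit length
```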
Eigenvector Facts
If A is symmetric, ie aij = aji, eg A = [1 10 20; 10 2 30; 20 30 3] - true if A is a covariance matrix and many other important matrices - the (unit length) eigenvectors will be orthogonal (we say they are orthonormal):
xiTxj = 1 if i = j
xiTxj = 0 otherwise

This means that the eigenvectors form a set of basis vectors and any vector can be expressed as a linear sum of the eigenvectors: v = Σi=1..d zixi

ie they form a new rotated co-ordinate system.

If the data is d-dimensional there will be d eigenvectors.
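These facts can be checked on the symmetric example matrix above with `np.linalg.eigh` (numpy's eigen-routine for symmetric matrices):

```python
import numpy as np

A = np.array([[ 1., 10., 20.],
              [10.,  2., 30.],
              [20., 30.,  3.]])   # symmetric: a_ij = a_ji

vals, vecs = np.linalg.eigh(A)    # columns of vecs are unit eigenvectors

# Orthonormal: x_i^T x_j = 1 if i = j, 0 otherwise, so V^T V = I
print(np.allclose(vecs.T @ vecs, np.eye(3)))   # True

# Any vector is a linear sum of the eigenvectors: v = sum_i z_i x_i
v = np.array([1.0, 2.0, 3.0])
z = vecs.T @ v                    # coordinates in the eigenvector basis
print(np.allclose(vecs @ z, v))   # True
```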
Summary of Main Points
• When dealing with networks, Wx = y means "outputs y are a weighted sum of input x. That is: y is the network output given input x"
• Similarly, WX = Y means "each column of Y is the network output operating on the corresponding column of X"
• Matrix inverse can be used to solve linear equations, but the pseudoinverse is more robust
• In networks, use the pseudoinverse to calculate 'best' weights that will transform training input vectors into known target vectors
• Matrix-vector multiplication can be seen as a transformation (eg rotations and expansions)
• If Ax = λx, x is an eigenvector, with eigenvalue λ
• Eigenvectors and eigenvalues tell us about the main axes and behaviour of matrix transformations