Classic Estimation
Jorge Salvador Marques, 2002
users.isr.ist.utl.pt/~jsm/teaching/ec/2.pdf

Page 1

Classic Estimation

Page 2

Summary

•Motivation

•Deterministic Methods

•Classic Probabilistic Methods

•Examples

Page 3

Challenge

Given the data points shown in the figure, we wish to approximate them by a 2nd-order polynomial

y = c2 x² + c1 x + c0

Problem: how to compute c0, c1, c2.

[Figure: data points to be fitted, x ∈ [−1, 1], y ∈ [0, 3]]

Note: we assume that the x values are accurately known but the y values are corrupted by measurement noise.

Page 4

Gauss (1777-1855)

Page 5

Parabolic Fit

The absolute value of the errors ei must be small.

[Figure: data points yi, fitted parabola, and the errors ei]

Quadratic cost:  E = Σi ei² = <e, e>

Page 6

Estimation of the coefficients

What coefficients minimize E ?

Stationarity condition:

∂E/∂cp = Σi 2 ei ∂ei/∂cp = −2 Σi xi^p (yi − c2 xi² − c1 xi − c0) = 0,   p = 0, 1, 2

[ Σi 1     Σi xi    Σi xi²  ] [ c0 ]   [ Σi yi     ]
[ Σi xi    Σi xi²   Σi xi³  ] [ c1 ] = [ Σi xi yi  ]
[ Σi xi²   Σi xi³   Σi xi⁴  ] [ c2 ]   [ Σi xi² yi ]

left-hand matrix: inner products between the basis functions

right-hand side: inner products between the observations and the basis functions
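
As an illustration (not part of the original slides), a short NumPy sketch that forms these normal equations and solves them; the data x, y are synthetic and the names are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 21)                                   # x values, accurately known
y = 2.0*x**2 + 0.5*x + 1.0 + 0.2*rng.standard_normal(x.size)     # noisy y values

# Normal equations: entries are inner products of the basis functions 1, x, x^2
A = np.array([[np.sum(x**(i + j)) for j in range(3)] for i in range(3)])
b = np.array([np.sum(y * x**i) for i in range(3)])

c0, c1, c2 = np.linalg.solve(A, b)                               # LS coefficients
print(c0, c1, c2)
```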

Page 7

Results

[Figure: fitted 2nd-order polynomial overlaid on the data points, x ∈ [−1, 1], y ∈ [0, 3]]

Page 8

Least Squares Method

Least squares method:

[Figure: quadratic cost as a function of ||e||]

Given a set of observations (xi, yi), we wish to approximate yi by φ(xi, θ), where θ is an unknown vector of parameters. The error is

ei = yi − φ(xi,θ)

How to obtain θ ?

Hypothesis: it is assumed that xi is accurately known.

θ̂ = argminθ Σi ||ei||²

where ||·|| is the Euclidean norm.

Page 9

Linear Model

Model:  yi = φ(xi)'θ + ei

LS estimate:  θ̂ = A⁻¹ b

A = Σi φ(xi) φ(xi)',   b = Σi φ(xi) yi,   i = 1, ..., n
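
A sketch of this closed-form estimate for a generic basis map φ, with made-up data; phi and ls_estimate are illustrative names, not from the slides:

```python
import numpy as np

def phi(x):
    """Basis vector for one scalar input; here [x^2, x, 1]' (an assumed choice)."""
    return np.array([x**2, x, 1.0])

def ls_estimate(xs, ys):
    A = sum(np.outer(phi(x), phi(x)) for x in xs)   # A = sum_i phi(x_i) phi(x_i)'
    b = sum(phi(x) * y for x, y in zip(xs, ys))     # b = sum_i phi(x_i) y_i
    return np.linalg.solve(A, b)                    # theta_hat = A^-1 b

xs = np.linspace(-1, 1, 15)
ys = 1.5*xs**2 - 0.3*xs + 0.7                       # noiseless test data
print(ls_estimate(xs, ys))                          # ~ [1.5, -0.3, 0.7]
```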

Page 10

Proof

E = Σi (yi − φ(xi)'θ)' (yi − φ(xi)'θ)

Computing the derivative with respect to θ and using the appropriate properties:

dE/dθ = 0  ⇒  −2 Σi φ(xi) (yi − φ(xi)'θ) = 0  ⇒  Σi φ(xi) yi = [Σi φ(xi) φ(xi)'] θ

so θ̂ = A⁻¹ b is a stationary point. If the matrix

d²E/dθ² = 2 Σi φ(xi) φ(xi)'

is positive definite, E is minimized by θ̂.

Page 11

Geometric Interpretation

Observations y1, ..., yn belong to a vector space E of dimension nm, where m is the dimension of each vector yi.

The model sequence φ(x1)'θ, ..., φ(xn)'θ belongs to a subspace S ⊂ E of lower dimension.

The least squares method computes the coefficients of the orthogonal projection of y onto the subspace S, using the inner product

<x, y> = Σi xi' yi

The projection ŷ of y is such that the projection error is orthogonal to all vectors in S:

<e,b>=0, ∀ b∈ S (orthogonality condition)
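
A quick numerical check of the orthogonality condition (illustrative only, with an assumed polynomial basis): the LS residual is orthogonal to every column of the design matrix, hence to every vector of S.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 30)
Phi = np.column_stack([x**2, x, np.ones_like(x)])              # basis functions, spanning S
y = Phi @ np.array([2.0, 0.5, 1.0]) + 0.1*rng.standard_normal(x.size)

theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)                # coefficients of the projection
e = y - Phi @ theta                                            # projection error
print(Phi.T @ e)                                               # ~ 0: <e, b> = 0 for all b in S
```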

Page 12

Geometric Interpretation

[Figure: observation y in E, its projection ŷ onto the space of signals generated by the model, and the projection error e]

If the space of signals generated by the model is a linear subspace S (linear model), ŷ is the orthogonal projection of y onto S.

Orthogonality conditions:

<e, b> = 0,   b – any vector of S

Page 13

Learning vs Inference

The least squares method allows the solution of several problems.

[Diagram: training data → learning (model identification); learned model → inference (?)]

Note: it does not provide an uncertainty measure!

Page 14

Estimation


General problem: how to estimate x, θ, given y ?

Subproblems: obtain x from y, θ (inference); obtain θ from y, x (learning, identification).

y depends on x and on a vector θ; x is known or unknown.

The learning problem is often based on observations obtained at different instants of time or even in different experiments. The estimation of the unknown variable x is usually performed in each experiment.

Page 15

Regression

Given (xi ,yi ), we wish to approximate yi by f(xi ,θ) with θ unknown.

Least squares estimate:

θ̂ = argminθ Σi ||ei||²,   ei = yi − f(xi, θ)

Examples: linear combination of basis functions, neural networks.


Hypothesis (asymmetric): x is accurately known.

Page 16

Prediction

Given y = (y1, ..., yt−1), we wish to predict yt using a linear combination of past observations:

yt = a1 yt−1 + ... + ap yt−p + et

The coefficients ai are estimated by least squares.


Page 17

Frequency Estimation

Given the model

yi = A cos(ω ti + φ) + ei

we wish to estimate A, ω, φ.

Available data: (ti, yi) i=1, ...,N

Idea: choose A, ω, φ minimizing E = Σi ei²

(nonlinear problem)
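
A rough sketch of this nonlinear LS problem solved by a coarse grid search over (A, ω, φ); the data, grids and names are assumptions for illustration, the slides do not prescribe an algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 7, 40)
y = 1.3*np.cos(2.0*t + 0.4) + 0.5*rng.standard_normal(t.size)   # noisy sinusoid (assumed data)

def energy(A, w, phi):
    e = y - A*np.cos(w*t + phi)          # e_i = y_i - A cos(w t_i + phi)
    return np.sum(e**2)                  # E = sum_i e_i^2

grid = [(A, w, phi)
        for A in np.linspace(0, 5, 26)
        for w in np.linspace(0, 10, 101)
        for phi in np.linspace(0, 2*np.pi, 37)]
A_hat, w_hat, phi_hat = min(grid, key=lambda p: energy(*p))
print(A_hat, w_hat, phi_hat)
```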

Page 18

Estimation of a Sinusoid

[Figure: estimation of a sinusoid from 3 points and from 7 points, with and without noise (white noise with σ = 0.5)]

Optimization over the interval:  (A, ω, φ) ∈ [0, 5] × [0, 10] × [0, 2π]

Page 19

Limitations

The least squares method has the following limitations:

• does not allow a statistical description of the error: nothing is known about the error we obtain in the next experiment;

• not robust;

• leads to difficult optimization problems when nonlinear models are used. In general, only local minima of E can be obtained.

Page 20

Robustness

[Figure: LS fit dragged away from the bulk of the data by spurious observations]

Not robust (why?)

Alternatives:
• weighted LS (errors are weighted by confidence degrees)
• robust estimation methods

Page 21

RANSAC

Procedure:
• choose a minimal set of data allowing the parameters to be estimated (generation of a random hypothesis);
• compute the number of observations well approximated by the model (its support);
• repeat the previous steps N times and, at the end, choose the estimate with the largest support;
• refinement: the estimate is refined using the support observations.

[Figure: three random hypotheses with support = 2, support = 3 and support = 5]
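
A compact sketch of the procedure for the parabola-fitting problem; the threshold, iteration count and data are illustrative choices, not values from the slides:

```python
import numpy as np

def ransac_parabola(x, y, n_iter=200, thr=0.3, seed=3):
    rng = np.random.default_rng(seed)
    best_c, best_inliers, best_support = None, None, -1
    for _ in range(n_iter):
        idx = rng.choice(x.size, size=3, replace=False)      # minimal set (3 points)
        c = np.polyfit(x[idx], y[idx], 2)                    # random hypothesis
        inliers = np.abs(y - np.polyval(c, x)) < thr         # well-approximated observations
        if inliers.sum() > best_support:
            best_c, best_inliers, best_support = c, inliers, inliers.sum()
    return np.polyfit(x[best_inliers], y[best_inliers], 2)   # refinement on the support

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 40)
y = 2*x**2 + 1 + 0.1*rng.standard_normal(x.size)
y[::7] += 3.0                                                # spurious observations
print(ransac_parabola(x, y))                                 # ~ [2, 0, 1]
```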

Page 22

Robust Methods

Robust estimator:

x̂ = argminx Σi ρ(||ei||)

[Figure: robust cost ρ as a function of ||e||]

• requires recursive numerical optimization

Example: LMedS (least median of squares)
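
LMedS replaces the sum above by the median of the squared errors and is usually computed by random sampling of minimal sets; a rough sketch under those assumptions (data and names are illustrative):

```python
import numpy as np

def lmeds_parabola(x, y, n_iter=300, seed=5):
    rng = np.random.default_rng(seed)
    best_c, best_med = None, np.inf
    for _ in range(n_iter):
        idx = rng.choice(x.size, size=3, replace=False)
        c = np.polyfit(x[idx], y[idx], 2)                # hypothesis from a minimal set
        med = np.median((y - np.polyval(c, x))**2)       # robust cost: median of e_i^2
        if med < best_med:
            best_c, best_med = c, med
    return best_c

rng = np.random.default_rng(6)
x = np.linspace(-1, 1, 40)
y = 2*x**2 + 1 + 0.1*rng.standard_normal(x.size)
y[:10] -= 4.0                                            # a block of spurious observations
print(lmeds_parabola(x, y))                              # ~ [2, 0, 1]
```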

Page 23

Exercises

1. Given a signal y = (y1, ..., yN), determine the coefficients of the linear predictor

ŷt = a1 yt−1 + ... + ap yt−p,   N >> p,

using the least squares method.

2. Generalize the previous problem to the case in which we want to predict the signal k steps ahead.

Page 24

Work

Consider two parabolas and observations belonging to each of them; however, we do not know to which parabola each observation belongs. Estimate the coefficients of the parabolas using the least squares method and robust methods.

Characterize the performance of both methods.

Page 25

Classic Probabilistic Methods

Page 26

Classic vs Bayesian Methods

y – realization of a random variable
x – unknown

Classic methods consider x as a deterministic variable.

Bayesian methods consider x as a random variable characterized by an a posteriori distribution p(x/y).

Estimation methods:
• classic: LS, EM
• Bayesian: MAP, MSE

Page 27

Classic Estimation

• Classic estimation methods are not based on general principles.

• An estimator is a map θ̂(y) from the observation space into the parameter space.

• Each estimator is a random variable which can be statistically characterized. In general we wish to define estimators with a set of desired properties (e.g., unbiased and with low variance).

Page 28

Maximum Likelihood Method

Maximum likelihood estimate (ML)

x̂ = argmaxx L(x),   L(x) = p(y/x)

x̂ = argmaxx l(x),   l(x) = log p(y/x)

(Fisher, 1921)

If y= y1,..., yN is a set of independent and identically distributed (iid) observations, then

L(x) = Πi p(yi/x)          (likelihood function)

l(x) = Σi log p(yi/x)      (log likelihood function)

Page 29

Normal Distribution

Problem: estimate the mean and covariance matrix of a normal distribution N(µ,R) from a set of n independent random variables y1, ..., yn.

Log likelihood function:

l(µ, R) = K − (n/2) log |R| − (1/2) Σi (yi − µ)' R⁻¹ (yi − µ)

(the quadratic term (yi − µ)' R⁻¹ (yi − µ) is the squared Mahalanobis distance)

Optimizing l we get

µ̂ = (1/n) Σi yi        R̂ = (1/n) Σi (yi − µ̂)(yi − µ̂)'
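
A minimal sketch of these ML estimates on synthetic data (the values and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
mu_true = np.array([1.0, -2.0])
R_true = np.array([[1.0, 0.3],
                   [0.3, 0.5]])
y = rng.multivariate_normal(mu_true, R_true, size=500)   # y_1, ..., y_n, one per row

mu_hat = y.mean(axis=0)                                   # (1/n) sum_i y_i
d = y - mu_hat
R_hat = (d.T @ d) / y.shape[0]                            # (1/n) sum_i (y_i - mu_hat)(y_i - mu_hat)'
print(mu_hat)
print(R_hat)
```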

Page 30

Properties of ML estimates

• Invariance to nonlinear transformations of the data: if z = f(y), then ẑML = f(ŷML).

In the sequel we consider that we observe a sequence of n iid variables such that the density p of each variable belongs to the family of functions adopted for the model, i.e. p = px0, where x0 is the correct value of the parameter x.

• the ML estimator is consistent (x̂ converges to x0 when n tends to infinity);
• it has an asymptotically normal distribution:

√n (x̂ − x0) →d N(0, J⁻¹)

where J is the Fisher information matrix:

J = E{ (dl/dx)(dl/dx)' }

Page 31

Approximation

How does the ML method work if the model is wrong?

Proposition

Let y be a sequence of n iid variables with density p and {px} a family of densities depending on x (the model). If the ML estimates converge to x* when n tends to infinity, then px* minimizes the Kullback-Leibler divergence between p and the model:

D[p, px] = ∫ p(y) log( p(y) / px(y) ) dy

The divergence is a non-symmetric distance.

[Figure: the true density p and the closest model density px*, at divergence D]

Page 32

ML vs LS

When the observations are normally distributed

p(y/θ) = C exp(−E(θ))

The maximum likelihood method is equivalent to the least squares method.

Page 33

Polynomial Approximation

Model:

yi = φ(xi)'θ + wi,   φ(xi) = [xi²  xi  1]',   wi ~ N(0, σ²)

Data: (x1, y1), (x2, y2), ..., (xn, yn)

Log likelihood function:

l(θ) = C − (1/(2σ²)) Σi [yi − φ(xi)'θ]² = C − (1/(2σ²)) E(θ)

where E is the least squares energy. The maximum likelihood estimate is equal to the least squares estimate:

θ̂ = A⁻¹ b,   A = Σi φ(xi) φ(xi)',   b = Σi φ(xi) yi,   i = 1, ..., n

Did we gain anything? Yes: θ̂ can be statistically characterized.

Page 34

Example – Frequency estimation

[Figure: noisy sinusoidal observations; energy E(ω) and likelihood function l(ω) computed from n = 1, 2 and 3 observations and from all observations]

yi = cos(ω ti) + wi,   wi ~ N(0, 0.01),   i = 1, ..., N

ω̂ = 1 rad/s

Page 35

Binary Detection

A binary symbol was transmitted in an analog channel in base band.

θ ∈ {−A, A}

At the receiver, the observed signal is

yi = θ + wi,   i = 1, ..., n

We wish to estimate θ assuming that the noise is uncorrelated and wi ~ N(0, σ²).

The log likelihood function is

l(θ) = −(1/(2σ²)) Σi (yi − θ)² = C + (1/σ²) θ Σi yi

This is the expression of the matched filter.
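
Since l(θ) grows with θ Σi yi, the ML decision over {−A, A} is given by the sign of that correlation sum; a small illustrative sketch with assumed values:

```python
import numpy as np

rng = np.random.default_rng(8)
A, sigma, n = 1.0, 2.0, 50
theta = -A                                     # transmitted symbol
y = theta + sigma*rng.standard_normal(n)       # received samples

theta_hat = A if np.sum(y) >= 0 else -A        # maximizes l(theta) over {-A, A}
print(theta_hat)
```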

Page 36

Proof

Mean estimation:

∂l/∂µ = 0  ⇒  ∂/∂µ Σi (yi − µ)' R⁻¹ (yi − µ) = 0  ⇒  −2 Σi R⁻¹ (yi − µ) = 0

Multiplying by R we get

µ̂ = (1/n) Σi yi

Note: see the derivative rules presented in Lecture 1

Covariance estimation:

∂l/∂R = 0  ⇒  −(n/2) ∂/∂R log |R| − (1/2) ∂/∂R Σi (yi − µ)' R⁻¹ (yi − µ) = 0

⇒  −n R⁻¹ + R⁻¹ Σi (yi − µ)(yi − µ)' R⁻¹ = 0

Multiplying by R on the left and on the right:

R̂ = (1/n) Σi (yi − µ)(yi − µ)'

Page 37

Linear Prediction

Prediction is a regression problem. In linear prediction the regression vector consists of the past values of the signal.

yi = a1 yi−1 + ... + ap yi−p + vi,   i = p+1, ..., n

yi = φi'θ + vi,   vi ~ N(0, σ²),   θ = [a1 ... ap]',   φi = [yi−1 ... yi−p]'

The solution of this problem is similar to the estimation of parameters with a linear model (see polynomial approximation).
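
A sketch of the LS estimation of the predictor coefficients on a synthetic AR(2) signal; the signal, order and names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 500, 2
y = np.zeros(n)
for t in range(p, n):                                    # synthetic AR(2) signal
    y[t] = 1.5*y[t-1] - 0.7*y[t-2] + rng.standard_normal()

# Row i of Phi is phi_i' = [y_{i-1} ... y_{i-p}], for i = p, ..., n-1
Phi = np.column_stack([y[p-1-k : n-1-k] for k in range(p)])
theta_hat, *_ = np.linalg.lstsq(Phi, y[p:], rcond=None)  # LS estimate of [a_1 ... a_p]'
print(theta_hat)                                         # ~ [1.5, -0.7]
```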

Page 38

Characterization of Estimators

An estimator is a random variable which depends on x, y.

2nd order properties

Mean:  µ = E{x̂}

Bias:
B = E{x̂} − x         (x deterministic)
B = E{x̂} − E{x}      (x random)

Covariance matrix:  R = E{ (x̂ − µ)(x̂ − µ)' }

Notes:
• ideal goal: B = 0 and zero covariance; this goal is impossible in most problems.
• Expected values are computed using p(y/x) when x is deterministic, and using p(x,y) when x is random.

Page 39

Cramér-Rao bound

Can the covariance matrix of an estimator be the null matrix?

Cramér-Rao theorem (x deterministic):

Cov{x̂} ≥ J⁻¹

J⁻¹ is the Cramér-Rao bound; J is the Fisher information matrix

J = E{ (dl/dx)(dl/dx)' },   l – log likelihood function

When Cov{x̂} = J⁻¹ the estimator is called efficient.

Definition: A > B if and only if A − B is positive definite.

Page 40

Example

In practice it is not always easy to compute the Cramér-Rao bound. One case which can be easily addressed concerns the estimation of the mean vector of a normal distribution, given N observations y1, ..., yN.

Cov{µ̂} ≥ (1/N) R

Proof: since p(y/x) = N(µ, R),

l = C − (1/2) Σi (yi − µ)' R⁻¹ (yi − µ)

dl/dµ = Σi R⁻¹ (yi − µ)

J = E{ (dl/dµ)(dl/dµ)' } = R⁻¹ E{ Σi (yi − µ) Σj (yj − µ)' } R⁻¹ = R⁻¹ (N R) R⁻¹ = N R⁻¹

CRB = J⁻¹ = (1/N) R

Note: show that the ML estimator of µ is efficient.

Page 41

Monte Carlo Method

Monte Carlo method: numerical evaluation of an estimator based on a large number of estimation experiments and statistical analysis of the results.
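
A Monte Carlo sketch for the sample-mean estimator (illustrative values): repeat the experiment many times and compare the empirical bias and covariance with the Cramér-Rao bound R/N from the previous page.

```python
import numpy as np

rng = np.random.default_rng(10)
mu = np.array([0.0, 1.0])
R = np.array([[1.0, 0.2],
              [0.2, 0.4]])
N, runs = 25, 5000

# One estimation experiment = draw N samples and compute the sample mean
estimates = np.array([rng.multivariate_normal(mu, R, size=N).mean(axis=0)
                      for _ in range(runs)])

print(estimates.mean(axis=0) - mu)      # empirical bias, ~ 0
print(np.cov(estimates.T))              # empirical covariance, ~ R/N
print(R / N)                            # Cramer-Rao bound
```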

Page 42

Exercises

1. Let x be a binary variable such that P(0) = Po, where Po is an unknown

parameter. Determine the ML estimate of Po from n independent realizations of x.

2. Determine the ML estimator of the mean and covariance matrix of a normal distribution N(µ,R), given n independent observations (see Appendix).

3. Given n samples of a random signal y described by an autoregressive model yt= a yt-1+ b wt, where w is a white process with distribution wt~N(0,1), determine coefficients a, b by the ML method.

4. Compute the Cramér-Rao bound for the estimation of the parameter of a Rayleigh distribution, knowing N realizations y1, ..., yN.

Page 43

Appendix – Matrices (I)

Transpose: C=A’Sum: C=A+BMultiplication: C=ABTrace: c=tr(A)Inversion: C=A-1

eigen values and eigen vectors: iiiii

k ii

k kjikij

ijijij

ijij

vAvvIAAAAAC

acAtrcbacABC

bacBACacAC

λλ ====

∑==∑==

+=+===

−−−

,

)(

'

111

Properties

• a matrix is nonsingular if all its eigenvalues are nonzero;
• the rank of a matrix is equal to the number of eigenvalues different from zero; a nonsingular matrix has full rank;
• the eigenvalues of a symmetric matrix are real.


Page 44

Appendix – Matrices (II)

• the trace of a matrix is equal to the sum of all its eigenvalues;
• the determinant of a matrix is equal to the product of all its eigenvalues;
• the inverse of a matrix has the following properties:

(AB)⁻¹ = B⁻¹ A⁻¹

[ A  B ]⁻¹    [ E  F ]
[ C  D ]    = [ G  H ]

in which A [n1×n1], B [n1×n2], C [n2×n1], D [n2×n2], and

E = (A − B D⁻¹ C)⁻¹
F = −E B D⁻¹
G = −D⁻¹ C E
H = D⁻¹ + D⁻¹ C E B D⁻¹

Page 45

Derivatives

The derivative of a scalar function f(X) with respect to a matrix X is a matrix:

[ df/dX ]ij = df / dXij

If f = f(X, Y) and Y = Y(X),

df/dX = ∂f/∂X + (dY/dX)' ∂f/∂Y

Properties

Trace:
d/dX tr{AX}     = A'
d/dX tr{AX'}    = A
d/dX tr{AXB}    = A'B'
d/dX tr{AX'B}   = BA
d/dX tr{X'X}    = 2X
d/dX tr{XAXB}   = A'X'B' + B'X'A'
d/dX tr{X'AXB}  = AXB + A'XB'
d/dX tr{AX⁻¹B}  = −(X⁻¹ B A X⁻¹)'

Determinant:
d/dX |X|      = |X| (X⁻¹)'
d/dX log|X|   = (X⁻¹)'
d/dX |AXB|    = |AXB| (X⁻¹)'
d/dX |X|ⁿ     = n |X|ⁿ (X⁻¹)'

Note: extracted from Sage&Melsa
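
An illustrative finite-difference check (not from Sage & Melsa) of one rule above, d/dX tr{AX} = A':

```python
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
eps = 1e-6

num = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        dX = np.zeros_like(X)
        dX[i, j] = eps
        num[i, j] = (np.trace(A @ (X + dX)) - np.trace(A @ X)) / eps   # d tr{AX} / dX_ij

print(np.max(np.abs(num - A.T)))   # ~ 0, i.e. the derivative is A'
```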

Page 46

Bibliography

Duda, Hart, Stork, Pattern Classification, Wiley, 2001.

J. Marques, Reconhecimento de Padrões. Métodos Estatísticos e Neuronais, IST Press, 1999