Introduction to Machine Learning
Multivariate Methods
Name: 李政軒
Multiple measurements: d inputs/features/attributes (d-variate), N instances/observations/examples.
Multivariate Data

$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$

an $N \times d$ data matrix: $N$ instances, each described by $d$ attributes.
Multivariate Parameters

Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \dots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)$

Correlation: $\mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$
Parameter Estimation

Sample mean $\mathbf{m}$: $m_i = \dfrac{\sum_{t=1}^{N} x_i^t}{N}, \quad i = 1, \dots, d$

Covariance matrix $\mathbf{S}$: $s_{ij} = \dfrac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}$

Correlation matrix $\mathbf{R}$: $r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
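The sample estimates above can be sketched with numpy; the data matrix here is a hypothetical toy example (note the 1/N normalization, matching the slides rather than numpy's default 1/(N-1)):

```python
import numpy as np

# Toy data matrix X: N = 4 instances (rows), d = 2 features (columns).
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0],
              [4.0, 8.0]])
N = X.shape[0]

m = X.mean(axis=0)           # sample mean vector m
Xc = X - m                   # centered data
S = Xc.T @ Xc / N            # sample covariance S (1/N normalization)
s = np.sqrt(np.diag(S))      # per-feature standard deviations s_i
R = S / np.outer(s, s)       # correlation matrix: r_ij = s_ij / (s_i s_j)
```

Since the second feature is exactly twice the first, the two features are perfectly correlated and every entry of R is 1.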
Estimation of Missing Values
- What to do if certain instances have missing attributes?
- Ignore those instances: not a good idea if the sample is small.
- Use "missing" as an attribute: may give information.
- Imputation: fill in the missing value.
  - Mean imputation: use the most likely value.
  - Imputation by regression: predict based on other attributes.
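Both imputation strategies can be sketched in a few lines of numpy; the data and the choice of which feature predicts which are hypothetical:

```python
import numpy as np

# Toy data with one missing entry (np.nan) in feature 0.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [np.nan, 30.0],
              [3.0, 40.0]])

# Mean imputation: replace the missing value with the column mean.
X_mean = X.copy()
X_mean[np.isnan(X_mean[:, 0]), 0] = np.nanmean(X[:, 0])

# Imputation by regression: predict feature 0 from feature 1,
# fitting a line on the rows where feature 0 is observed.
obs = ~np.isnan(X[:, 0])
slope, intercept = np.polyfit(X[obs, 1], X[obs, 0], deg=1)
X_reg = X.copy()
X_reg[~obs, 0] = intercept + slope * X[~obs, 1]
```

Regression imputation exploits the correlation between attributes, so it can give a more plausible fill-in than the column mean when the attributes are strongly related.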
Multivariate Normal Distribution

$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$

Mahalanobis distance: $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\boldsymbol{\Sigma}$ (it normalizes for differences in variances and correlations).
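A minimal sketch of the Mahalanobis distance, using a hypothetical diagonal covariance to show how it differs from the Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])   # x1 has 4x the variance of x2

# Two points at the same Euclidean distance (2.0) from mu...
a = np.array([2.0, 0.0])
b = np.array([0.0, 2.0])
# ...but a lies along the high-variance direction, so its
# Mahalanobis distance is smaller than b's.
```

Here `a` gives a squared distance of 1 (2²/4) and `b` gives 4 (2²/1): the same Euclidean displacement counts for less along the direction with more variance.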
Bivariate: $d = 2$

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix}$$

$$p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)} \left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]$$

where $z_i = \dfrac{x_i - \mu_i}{\sigma_i}$ are the standardized variables.
Bivariate Normal
- Isoprobability contours of the bivariate normal distribution: the center is given by the mean, and the shape and orientation depend on the covariance matrix.
- If the $x_i$ are independent, the off-diagonals of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance.
- If the variances are also equal, it reduces to the Euclidean distance.
Independent Inputs: Naive Bayes

If the $x_i$ are independent, the joint density is the product of the univariate densities:

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[-\frac{1}{2} \sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$
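The factorization can be checked numerically: the $d$-variate normal with diagonal covariance equals the product of its univariate marginals. The parameter values below are arbitrary:

```python
import numpy as np

def univariate_normal(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2), applied elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def diag_mvn(x, mu, sigma):
    """d-variate normal density with diagonal covariance diag(sigma**2)."""
    d = len(mu)
    z = (x - mu) / sigma
    norm = (2 * np.pi) ** (d / 2) * np.prod(sigma)
    return float(np.exp(-0.5 * np.sum(z ** 2)) / norm)

mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([1.0, 2.0, 0.5])
x = np.array([0.5, 0.0, -1.2])
```

For any `x`, `diag_mvn(x, mu, sigma)` matches `np.prod(univariate_normal(x, mu, sigma))`, which is exactly the independence assumption naive Bayes makes.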
Parametric Classification

If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

Discriminant functions:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2} \log 2\pi - \frac{1}{2} \log |\boldsymbol{\Sigma}_i| - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)$$
Estimation of Parameters

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N} \qquad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t} \qquad \mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

Dropping the constant $-\frac{d}{2}\log 2\pi$:

$$g_i(\mathbf{x}) = -\frac{1}{2} \log |\mathbf{S}_i| - \frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$
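A sketch of these per-class estimates and the resulting discriminant-based classifier, using numpy on a hypothetical two-class toy set:

```python
import numpy as np

def fit_class_params(X, y):
    """Per-class estimates: prior P(C_i), mean m_i, covariance S_i (1/N_i)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)
        m = Xc.mean(axis=0)
        S = (Xc - m).T @ (Xc - m) / len(Xc)
        params[c] = (prior, m, S)
    return params

def g(x, prior, m, S):
    """g_i(x) = -1/2 log|S_i| - 1/2 (x-m_i)^T S_i^{-1} (x-m_i) + log P(C_i)."""
    d = x - m
    return (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * d @ np.linalg.solve(S, d)
            + np.log(prior))

# Hypothetical toy data: two well-separated classes in 2D.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
params = fit_class_params(X, y)

def predict(x):
    """Choose the class with the largest discriminant g_i(x)."""
    return int(max(params, key=lambda c: g(x, *params[c])))
```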
Different $\mathbf{S}_i$

Quadratic discriminant:

$$g_i(\mathbf{x}) = -\frac{1}{2} \log |\mathbf{S}_i| - \frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2} \mathbf{S}_i^{-1} \qquad \mathbf{w}_i = \mathbf{S}_i^{-1} \mathbf{m}_i \qquad w_{i0} = -\frac{1}{2} \mathbf{m}_i^T \mathbf{S}_i^{-1} \mathbf{m}_i - \frac{1}{2} \log |\mathbf{S}_i| + \log \hat{P}(C_i)$$

(Figure: the likelihoods, the posterior for $C_1$, and the discriminant $P(C_1 \mid \mathbf{x}) = 0.5$.)
Common Covariance Matrix $\mathbf{S}$

Shared common sample covariance $\mathbf{S}$:

$$\mathbf{S} = \sum_i \hat{P}(C_i) \, \mathbf{S}_i$$

The discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$

which is a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0} \qquad \text{where} \quad \mathbf{w}_i = \mathbf{S}^{-1} \mathbf{m}_i, \quad w_{i0} = -\frac{1}{2} \mathbf{m}_i^T \mathbf{S}^{-1} \mathbf{m}_i + \log \hat{P}(C_i)$$
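The pooled covariance and the linear discriminant can be sketched as follows; the means, covariances, and priors are hypothetical placeholders:

```python
import numpy as np

# Hypothetical per-class estimates (two classes, d = 2).
priors = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0],
                  [4.0, 4.0]])
S0 = np.array([[1.0, 0.2], [0.2, 1.0]])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S = priors[0] * S0 + priors[1] * S1      # pooled covariance S = sum_i P(C_i) S_i

S_inv = np.linalg.inv(S)
W = means @ S_inv                        # row i is w_i = S^{-1} m_i (S_inv symmetric)
w0 = -0.5 * np.einsum('ij,ij->i', W, means) + np.log(priors)

def linear_g(x):
    """Linear discriminants g_i(x) = w_i^T x + w_i0, one value per class."""
    return W @ x + w0

# With equal priors and a shared S, the boundary g_1 = g_2 passes
# through the midpoint between the two class means.
mid = means.mean(axis=0)
```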
Common Covariance Matrix $\mathbf{S}$

Covariances may be arbitrary but shared by both classes.

Diagonal $\mathbf{S}$

When the $x_j$, $j = 1, \dots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal: $p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i)$. Classify based on weighted Euclidean distance (in $s_j$ units) to the nearest mean:

$$g_i(\mathbf{x}) = -\frac{1}{2} \sum_{j=1}^{d} \left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$
Diagonal $\mathbf{S}$

The variances may be different.

Diagonal $\mathbf{S}$, Equal Variances

Nearest mean classifier: classify based on Euclidean distance to the nearest mean:

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mathbf{m}_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2} \sum_{j=1}^{d} (x_j - m_{ij})^2 + \log \hat{P}(C_i)$$

Each mean can be considered a prototype or template, and this is template matching.

(Figure: all classes have equal, diagonal covariance matrices with equal variances on both dimensions.)
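The nearest mean classifier above reduces to a few lines; the class "templates" here are hypothetical:

```python
import numpy as np

# One mean (prototype/template) per class -- hypothetical values.
means = np.array([[0.0, 0.0],
                  [3.0, 0.0],
                  [0.0, 3.0]])

def nearest_mean(x):
    """Template matching: pick the class whose mean is closest in Euclidean distance."""
    dists = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(dists))
```

With equal priors and equal spherical variances, maximizing $g_i(\mathbf{x})$ is exactly minimizing this Euclidean distance.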
Model Selection
- As we increase complexity (a less restricted $\mathbf{S}$), bias decreases and variance increases.
- Assume simple models (allow some bias) to control variance (regularization).
- Different cases of the covariance matrices fitted to the same data lead to different boundaries.
Discrete Features

Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$

If the $x_j$ are independent (naive Bayes'):

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}$$

the discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \left[x_j \log p_{ij} + (1 - x_j) \log (1 - p_{ij})\right] + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
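A sketch of the binary-feature naive Bayes estimates and discriminant; the toy data and the `eps` clipping (to avoid log of zero) are assumptions of this example:

```python
import numpy as np

# Hypothetical binary feature matrix X (N x d) and class labels y.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [0, 0, 0],
              [1, 0, 0]])
y = np.array([0, 0, 0, 1, 1, 1])

def fit_bernoulli_nb(X, y, eps=1e-9):
    """Estimate priors and p_ij = fraction of class-i instances with x_j = 1."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    p = np.array([X[y == c].mean(axis=0) for c in classes])
    p = np.clip(p, eps, 1 - eps)   # keep the log terms finite
    return classes, priors, p

def predict(x, classes, priors, p):
    """g_i(x) = sum_j [x_j log p_ij + (1-x_j) log(1-p_ij)] + log P(C_i)."""
    g = (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1) + np.log(priors)
    return int(classes[np.argmax(g)])
```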
Discrete Features

Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \dots, v_{n_j}\}$

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

If the $x_j$ are independent:

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$
Multivariate Regression

$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \dots, w_d) + \varepsilon$$

Multivariate linear model:

$$g(\mathbf{x}^t \mid w_0, w_1, \dots, w_d) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$

$$E(w_0, w_1, \dots, w_d \mid \mathcal{X}) = \frac{1}{2} \sum_t \left(r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t\right)^2$$

Multivariate polynomial model: define new higher-order variables $z_1 = x_1$, $z_2 = x_2$, $z_3 = x_1^2$, $z_4 = x_2^2$, $z_5 = x_1 x_2$, and use the linear model in this new $\mathbf{z}$ space.
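Minimizing the squared error above is an ordinary least-squares problem; a sketch with numpy, where the noiseless data generated from $r = 1 + 2x_1 - 3x_2$ is a hypothetical example:

```python
import numpy as np

# Hypothetical data: r = 1 + 2*x1 - 3*x2, N = 6 instances, d = 2 features.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
r = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Augment with a column of ones so w[0] plays the role of the intercept w_0.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution minimizing E(w | X) = 1/2 * sum_t (r^t - w^T x^t)^2.
w, *_ = np.linalg.lstsq(A, r, rcond=None)
```

For the polynomial model, the same call works after augmenting `A` with the extra columns `X[:, 0]**2`, `X[:, 1]**2`, and `X[:, 0] * X[:, 1]` (the z space): the model stays linear in the weights.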