Introduction to Machine Learning
Multivariate Methods
Name: 李政軒
Multiple measurements: d inputs/features/attributes (d-variate), N instances/observations/examples.
Multivariate Data

$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$

an $N \times d$ data matrix: $N$ instances, each described by $d$ attributes.
Multivariate Parameters

Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \dots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)$

Correlation: $\mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$
Parameter Estimation

Sample mean $\mathbf{m}$: $m_i = \dfrac{\sum_{t=1}^{N} x_i^t}{N}, \quad i = 1, \dots, d$

Covariance matrix $\mathbf{S}$: $s_{ij} = \dfrac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}$

Correlation matrix $\mathbf{R}$: $r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
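The sample estimates above can be sketched with numpy; the data matrix here is a hypothetical toy example (note the 1/N normalization, matching the slides rather than numpy's default 1/(N-1)):

```python
import numpy as np

# Toy data matrix X: N = 4 instances (rows), d = 2 features (columns).
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0],
              [4.0, 8.0]])
N = X.shape[0]

m = X.mean(axis=0)           # sample mean vector m
Xc = X - m                   # centered data
S = Xc.T @ Xc / N            # sample covariance S (1/N normalization)
s = np.sqrt(np.diag(S))      # per-feature standard deviations s_i
R = S / np.outer(s, s)       # correlation matrix: r_ij = s_ij / (s_i s_j)
```

Since the second feature is exactly twice the first, the two features are perfectly correlated and every entry of R is 1.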
Estimation of Missing Values
- What to do if certain instances have missing attributes?
- Ignore those instances: not a good idea if the sample is small.
- Use "missing" as an attribute: may give information.
- Imputation: fill in the missing value.
  - Mean imputation: use the most likely value.
  - Imputation by regression: predict based on other attributes.
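Both imputation strategies can be sketched in a few lines of numpy; the data and the choice of which feature predicts which are hypothetical:

```python
import numpy as np

# Toy data with one missing entry (np.nan) in feature 0.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [np.nan, 30.0],
              [3.0, 40.0]])

# Mean imputation: replace the missing value with the column mean.
X_mean = X.copy()
X_mean[np.isnan(X_mean[:, 0]), 0] = np.nanmean(X[:, 0])

# Imputation by regression: predict feature 0 from feature 1,
# fitting a line on the rows where feature 0 is observed.
obs = ~np.isnan(X[:, 0])
slope, intercept = np.polyfit(X[obs, 1], X[obs, 0], deg=1)
X_reg = X.copy()
X_reg[~obs, 0] = intercept + slope * X[~obs, 1]
```

Regression imputation exploits the correlation between attributes, so it can give a more plausible fill-in than the column mean when the attributes are strongly related.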
Multivariate Normal Distribution

$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$

Mahalanobis distance: $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\boldsymbol{\Sigma}$ (it normalizes for differences in variances and correlations).
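A minimal sketch of the Mahalanobis distance, using a hypothetical diagonal covariance to show how it differs from the Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])   # x1 has 4x the variance of x2

# Two points at the same Euclidean distance (2.0) from mu...
a = np.array([2.0, 0.0])
b = np.array([0.0, 2.0])
# ...but a lies along the high-variance direction, so its
# Mahalanobis distance is smaller than b's.
```

Here `a` gives a squared distance of 1 (2²/4) and `b` gives 4 (2²/1): the same Euclidean displacement counts for less along the direction with more variance.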
Bivariate: $d = 2$

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{bmatrix}$$

$$p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)} \left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]$$

where $z_i = \dfrac{x_i - \mu_i}{\sigma_i}$ are the standardized variables.
Bivariate Normal
- Isoprobability contours of the bivariate normal distribution: the center is given by the mean, and the shape and orientation depend on the covariance matrix.
- If the $x_i$ are independent, the off-diagonals of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance.
- If the variances are also equal, it reduces to the Euclidean distance.
Independent Inputs: Naive Bayes

If the $x_i$ are independent, the joint density is the product of the univariate densities:

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[-\frac{1}{2} \sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$
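The factorization can be checked numerically: the $d$-variate normal with diagonal covariance equals the product of its univariate marginals. The parameter values below are arbitrary:

```python
import numpy as np

def univariate_normal(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2), applied elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def diag_mvn(x, mu, sigma):
    """d-variate normal density with diagonal covariance diag(sigma**2)."""
    d = len(mu)
    z = (x - mu) / sigma
    norm = (2 * np.pi) ** (d / 2) * np.prod(sigma)
    return float(np.exp(-0.5 * np.sum(z ** 2)) / norm)

mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([1.0, 2.0, 0.5])
x = np.array([0.5, 0.0, -1.2])
```

For any `x`, `diag_mvn(x, mu, sigma)` matches `np.prod(univariate_normal(x, mu, sigma))`, which is exactly the independence assumption naive Bayes makes.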
Parametric Classification

If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

Discriminant functions:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2} \log 2\pi - \frac{1}{2} \log |\boldsymbol{\Sigma}_i| - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)$$
Estimation of Parameters

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N} \qquad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t} \qquad \mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

Dropping the constant $-\frac{d}{2}\log 2\pi$:

$$g_i(\mathbf{x}) = -\frac{1}{2} \log |\mathbf{S}_i| - \frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$
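A sketch of these per-class estimates and the resulting discriminant-based classifier, using numpy on a hypothetical two-class toy set:

```python
import numpy as np

def fit_class_params(X, y):
    """Per-class estimates: prior P(C_i), mean m_i, covariance S_i (1/N_i)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)
        m = Xc.mean(axis=0)
        S = (Xc - m).T @ (Xc - m) / len(Xc)
        params[c] = (prior, m, S)
    return params

def g(x, prior, m, S):
    """g_i(x) = -1/2 log|S_i| - 1/2 (x-m_i)^T S_i^{-1} (x-m_i) + log P(C_i)."""
    d = x - m
    return (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * d @ np.linalg.solve(S, d)
            + np.log(prior))

# Hypothetical toy data: two well-separated classes in 2D.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
params = fit_class_params(X, y)

def predict(x):
    """Choose the class with the largest discriminant g_i(x)."""
    return int(max(params, key=lambda c: g(x, *params[c])))
```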
Different $\mathbf{S}_i$

Quadratic discriminant:

$$g_i(\mathbf{x}) = -\frac{1}{2} \log |\mathbf{S}_i| - \frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2} \mathbf{S}_i^{-1} \qquad \mathbf{w}_i = \mathbf{S}_i^{-1} \mathbf{m}_i \qquad w_{i0} = -\frac{1}{2} \mathbf{m}_i^T \mathbf{S}_i^{-1} \mathbf{m}_i - \frac{1}{2} \log |\mathbf{S}_i| + \log \hat{P}(C_i)$$

(Figure: the likelihoods, the posterior for $C_1$, and the discriminant $P(C_1 \mid \mathbf{x}) = 0.5$.)
Common Covariance Matrix $\mathbf{S}$

Shared common sample covariance $\mathbf{S}$:

$$\mathbf{S} = \sum_i \hat{P}(C_i) \, \mathbf{S}_i$$

The discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$

which is a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0} \qquad \text{where} \quad \mathbf{w}_i = \mathbf{S}^{-1} \mathbf{m}_i, \quad w_{i0} = -\frac{1}{2} \mathbf{m}_i^T \mathbf{S}^{-1} \mathbf{m}_i + \log \hat{P}(C_i)$$
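The pooled covariance and the linear discriminant can be sketched as follows; the means, covariances, and priors are hypothetical placeholders:

```python
import numpy as np

# Hypothetical per-class estimates (two classes, d = 2).
priors = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0],
                  [4.0, 4.0]])
S0 = np.array([[1.0, 0.2], [0.2, 1.0]])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S = priors[0] * S0 + priors[1] * S1      # pooled covariance S = sum_i P(C_i) S_i

S_inv = np.linalg.inv(S)
W = means @ S_inv                        # row i is w_i = S^{-1} m_i (S_inv symmetric)
w0 = -0.5 * np.einsum('ij,ij->i', W, means) + np.log(priors)

def linear_g(x):
    """Linear discriminants g_i(x) = w_i^T x + w_i0, one value per class."""
    return W @ x + w0

# With equal priors and a shared S, the boundary g_1 = g_2 passes
# through the midpoint between the two class means.
mid = means.mean(axis=0)
```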
Common Covariance Matrix $\mathbf{S}$

Covariances may be arbitrary but shared by both classes.

Diagonal $\mathbf{S}$

When the $x_j$, $j = 1, \dots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal: $p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i)$. Classify based on weighted Euclidean distance (in $s_j$ units) to the nearest mean:

$$g_i(\mathbf{x}) = -\frac{1}{2} \sum_{j=1}^{d} \left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$
Diagonal $\mathbf{S}$

The variances may be different.

Diagonal $\mathbf{S}$, Equal Variances

Nearest mean classifier: classify based on Euclidean distance to the nearest mean:

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mathbf{m}_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2} \sum_{j=1}^{d} (x_j - m_{ij})^2 + \log \hat{P}(C_i)$$

Each mean can be considered a prototype or template, and this is template matching.

(Figure: all classes have equal, diagonal covariance matrices with equal variances on both dimensions.)
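The nearest mean classifier above reduces to a few lines; the class "templates" here are hypothetical:

```python
import numpy as np

# One mean (prototype/template) per class -- hypothetical values.
means = np.array([[0.0, 0.0],
                  [3.0, 0.0],
                  [0.0, 3.0]])

def nearest_mean(x):
    """Template matching: pick the class whose mean is closest in Euclidean distance."""
    dists = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(dists))
```

With equal priors and equal spherical variances, maximizing $g_i(\mathbf{x})$ is exactly minimizing this Euclidean distance.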
Model Selection
- As we increase complexity (a less restricted $\mathbf{S}$), bias decreases and variance increases.
- Assume simple models (allow some bias) to control variance (regularization).
- Different cases of the covariance matrices fitted to the same data lead to different boundaries.
Discrete Features

Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$

If the $x_j$ are independent (naive Bayes'):

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}$$

the discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \left[x_j \log p_{ij} + (1 - x_j) \log (1 - p_{ij})\right] + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
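A sketch of the binary-feature naive Bayes estimates and discriminant; the toy data and the `eps` clipping (to avoid log of zero) are assumptions of this example:

```python
import numpy as np

# Hypothetical binary feature matrix X (N x d) and class labels y.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [0, 0, 0],
              [1, 0, 0]])
y = np.array([0, 0, 0, 1, 1, 1])

def fit_bernoulli_nb(X, y, eps=1e-9):
    """Estimate priors and p_ij = fraction of class-i instances with x_j = 1."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    p = np.array([X[y == c].mean(axis=0) for c in classes])
    p = np.clip(p, eps, 1 - eps)   # keep the log terms finite
    return classes, priors, p

def predict(x, classes, priors, p):
    """g_i(x) = sum_j [x_j log p_ij + (1-x_j) log(1-p_ij)] + log P(C_i)."""
    g = (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1) + np.log(priors)
    return int(classes[np.argmax(g)])
```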
Discrete Features

Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \dots, v_{n_j}\}$

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

If the $x_j$ are independent:

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$
Multivariate Regression

$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \dots, w_d) + \varepsilon$$

Multivariate linear model:

$$g(\mathbf{x}^t \mid w_0, w_1, \dots, w_d) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$

$$E(w_0, w_1, \dots, w_d \mid \mathcal{X}) = \frac{1}{2} \sum_t \left(r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t\right)^2$$

Multivariate polynomial model: define new higher-order variables $z_1 = x_1$, $z_2 = x_2$, $z_3 = x_1^2$, $z_4 = x_2^2$, $z_5 = x_1 x_2$, and use the linear model in this new $\mathbf{z}$ space.
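Minimizing the squared error above is an ordinary least-squares problem; a sketch with numpy, where the noiseless data generated from $r = 1 + 2x_1 - 3x_2$ is a hypothetical example:

```python
import numpy as np

# Hypothetical data: r = 1 + 2*x1 - 3*x2, N = 6 instances, d = 2 features.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
r = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Augment with a column of ones so w[0] plays the role of the intercept w_0.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution minimizing E(w | X) = 1/2 * sum_t (r^t - w^T x^t)^2.
w, *_ = np.linalg.lstsq(A, r, rcond=None)
```

For the polynomial model, the same call works after augmenting `A` with the extra columns `X[:, 0]**2`, `X[:, 1]**2`, and `X[:, 0] * X[:, 1]` (the z space): the model stays linear in the weights.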