CHAPTER 4: Parametric Methods

Page 1: CHAPTER 4: Parametric Methods

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Page 2: Parametric Estimation

X = { x^t }_t where x^t ~ p(x)

Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X.
E.g., N(μ, σ²), where θ = { μ, σ² }

Problem: How can we obtain θ from X?

Assumption: X contains samples of a one-dimensional random variable. Later, in multivariate estimation, each example in X contains multiple measurements, not just a single one.

Page 3: Maximum Likelihood Estimation

Density function p with parameters θ is given, and x^t ~ p(x|θ)

Likelihood of θ given the sample X:
l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)

We look for the θ that "maximizes the likelihood of the sample"!

Log likelihood:
L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)

Maximum likelihood estimator (MLE):
θ* = argmax_θ L(θ|X)

Homework: Given the sample 0, 3, 3, 4, 5 and x ~ N(μ, σ²), use MLE to find (μ, σ²)!
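As a sanity check for the homework, here is a minimal Python sketch (assuming NumPy is available) of the closed-form ML estimates m = ∑_t x^t / N and s² = ∑_t (x^t − m)² / N derived on Page 5:

    import numpy as np

    def gaussian_mle(x):
        """ML estimates of a univariate Gaussian: sample mean and
        (biased, divide-by-N) sample variance."""
        x = np.asarray(x, dtype=float)
        m = x.mean()                # m = sum_t x^t / N
        s2 = ((x - m) ** 2).mean()  # s^2 = sum_t (x^t - m)^2 / N
        return m, s2

    m, s2 = gaussian_mle([0, 3, 3, 4, 5])
    print(m, s2)  # 3.0 2.8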

Page 4: Examples: Bernoulli/Multinomial

Bernoulli: two states, failure/success, x in {0, 1}
P(x) = p₀^x (1 − p₀)^(1−x)
L(p₀|X) = log ∏_t p₀^(x^t) (1 − p₀)^(1−x^t)
MLE: p₀ = ∑_t x^t / N

Multinomial: K > 2 states, x_i in {0, 1}
P(x₁, x₂, ..., x_K) = ∏_i p_i^(x_i)
L(p₁, p₂, ..., p_K|X) = log ∏_t ∏_i p_i^(x_i^t)
MLE: p_i = ∑_t x_i^t / N
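Both estimates are just frequency counts; a small NumPy sketch with made-up samples:

    import numpy as np

    # Bernoulli MLE: fraction of successes.
    x = np.array([1, 0, 1, 1, 0])
    p0 = x.mean()            # sum_t x^t / N

    # Multinomial MLE: each x^t is a 1-of-K indicator vector.
    X = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1]])
    p = X.mean(axis=0)       # p_i = sum_t x_i^t / N
    print(p0, p)             # 0.6 [0.5 0.25 0.25]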

Page 5: Gaussian (Normal) Distribution

p(x) = N(μ, σ²):
p(x) = [1 / (√(2π) σ)] exp[−(x − μ)² / (2σ²)]

MLE for μ and σ²:
m = ∑_t x^t / N
s² = ∑_t (x^t − m)² / N

Page 6: Bias and Variance

Unknown parameter θ
Estimator d_i = d(X_i) on sample X_i

Bias: b_θ(d) = E[d] − θ
Variance: E[(d − E[d])²]

Mean square error of the estimator d:
r(d, θ) = E[(d − θ)²]
        = (E[d] − θ)² + E[(d − E[d])²]
        = Bias² + Variance

Bias²: error in the model itself. Variance: variation/randomness of the model.

Page 7: Bayes' Estimator

Treat θ as a random variable with prior p(θ)
Bayes' rule: p(θ|X) = p(X|θ) p(θ) / p(X)

Maximum a Posteriori (MAP): θ_MAP = argmax_θ p(θ|X)
Maximum Likelihood (ML): θ_ML = argmax_θ p(X|θ)
Bayes' estimator: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ

Comments: ML just takes the maximum of the density function. Compared with ML, MAP additionally considers the prior. The Bayes' estimator averages over all possible values of θ, weighted by how likely they are to occur (as measured by the posterior p(θ|X)).

For MAP see: http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation

Page 8: Bayes' Estimator: Example

x^t ~ N(θ, σ₀²) and θ ~ N(μ, σ²)

θ_ML = m

θ_MAP = θ_Bayes = E[θ|X]
      = [ (N/σ₀²) / (N/σ₀² + 1/σ²) ] m + [ (1/σ²) / (N/σ₀² + 1/σ²) ] μ

As N grows (or as the prior variance σ² grows), the estimate converges to the sample average m.
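A small numeric sketch of this shrinkage formula (assuming NumPy; the noise variance σ₀², prior mean μ, prior variance σ², and the data are made-up illustration values):

    import numpy as np

    x = np.array([0., 3., 3., 4., 5.])       # sample, x^t ~ N(theta, sigma0^2)
    sigma0_sq, mu, sigma_sq = 4.0, 0.0, 1.0  # assumed known noise var and prior
    N, m = len(x), x.mean()

    w = (N / sigma0_sq) / (N / sigma0_sq + 1 / sigma_sq)
    theta_bayes = w * m + (1 - w) * mu       # posterior mean: between m and mu
    print(m, theta_bayes)                    # 3.0  1.666...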

Page 9: Parametric Classification

g_i(x) = p(x|C_i) P(C_i)
or equivalently
g_i(x) = log p(x|C_i) + log P(C_i)

With class-conditional densities p(x|C_i) = [1 / (√(2π) σ_i)] exp[−(x − μ_i)² / (2σ_i²)]:

g_i(x) = −(1/2) log 2π − log σ_i − (x − μ_i)² / (2σ_i²) + log P(C_i)

g_i(x) is a kind of (unnormalized) p(C_i|x).

Page 10: Parametric Classification (cont.)

Given the sample X = { x^t, r^t }_{t=1..N}, where
r_i^t = 1 if x^t ∈ C_i, and 0 otherwise,

the ML estimates are:
P̂(C_i) = ∑_t r_i^t / N
m_i = ∑_t x^t r_i^t / ∑_t r_i^t
s_i² = ∑_t (x^t − m_i)² r_i^t / ∑_t r_i^t

The discriminant becomes:
g_i(x) = −(1/2) log 2π − log s_i − (x − m_i)² / (2s_i²) + log P̂(C_i)
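These estimates and the discriminant are a few lines of Python (a sketch assuming NumPy; the inputs x and class labels y are made-up data):

    import numpy as np

    x = np.array([1.0, 1.2, 0.8, 3.0, 3.5, 2.9])  # 1-D inputs
    y = np.array([0, 0, 0, 1, 1, 1])              # class indices

    def fit_class(x, y, i):
        xi = x[y == i]
        prior = len(xi) / len(x)        # P^(C_i)
        m = xi.mean()                   # m_i
        s2 = ((xi - m) ** 2).mean()     # s_i^2
        return prior, m, s2

    def g(xq, prior, m, s2):
        # -1/2 log 2pi - log s - (x - m)^2 / (2 s^2) + log prior
        return (-0.5 * np.log(2 * np.pi) - 0.5 * np.log(s2)
                - (xq - m) ** 2 / (2 * s2) + np.log(prior))

    params = [fit_class(x, y, i) for i in (0, 1)]
    print(np.argmax([g(2.0, *p) for p in params]))  # predicted class for x = 2.0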

Page 11: [Figure] Equal variances: a single boundary at halfway between the means.

Page 12: [Figure] Variances are different: two boundaries. Homework!

Page 13: Regression

r = f(x) + ε, with ε ~ N(0, σ²)

Estimator: g(x|θ), so that p(r|x) ~ N(g(x|θ), σ²)

Log likelihood:
L(θ|X) = log ∏_{t=1..N} p(x^t, r^t)
       = log ∏_{t=1..N} p(r^t|x^t) + log ∏_{t=1..N} p(x^t)

Maximizing the probability of the sample again!

Page 14: Regression: From LogL to Error

L(θ|X) = log ∏_{t=1..N} [1 / (√(2π) σ)] exp[−(r^t − g(x^t|θ))² / (2σ²)]
       = −N log(√(2π) σ) − [1 / (2σ²)] ∑_{t=1..N} (r^t − g(x^t|θ))²

Maximizing L(θ|X) is therefore equivalent to minimizing the error
E(θ|X) = (1/2) ∑_{t=1..N} (r^t − g(x^t|θ))²

Skip to 20!

Page 15: Linear Regression

g(x^t|w₁, w₀) = w₁ x^t + w₀

Setting the derivatives of E to zero gives two equations in two unknowns:
∑_t r^t = N w₀ + w₁ ∑_t x^t
∑_t r^t x^t = w₀ ∑_t x^t + w₁ ∑_t (x^t)²

In matrix form, A w = y with
A = [ N         ∑_t x^t
      ∑_t x^t   ∑_t (x^t)² ],   w = [ w₀, w₁ ]ᵀ,   y = [ ∑_t r^t, ∑_t r^t x^t ]ᵀ

w = A⁻¹ y

Relationship to what we discussed in Topic 2??
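A minimal sketch of solving A w = y with NumPy (made-up data; np.linalg.solve is preferable to forming A⁻¹ explicitly):

    import numpy as np

    x = np.array([0., 1., 2., 3., 4.])
    r = np.array([1.1, 2.9, 5.2, 7.1, 8.8])   # roughly r = 2x + 1

    A = np.array([[len(x),       x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    y = np.array([r.sum(), (r * x).sum()])

    w0, w1 = np.linalg.solve(A, y)             # w = A^{-1} y
    print(w0, w1)                              # close to 1 and 2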

Page 16: Polynomial Regression

g(x^t|w_k, ..., w₂, w₁, w₀) = w_k (x^t)^k + ... + w₂ (x^t)² + w₁ x^t + w₀

With the design matrix and target vector
D = [ 1   x¹    (x¹)²   ...  (x¹)^k
      1   x²    (x²)²   ...  (x²)^k
      ...
      1   x^N   (x^N)²  ...  (x^N)^k ],   r = [ r¹, r², ..., r^N ]ᵀ

the solution is w = (DᵀD)⁻¹ Dᵀ r

Here we get k+1 equations with k+1 unknowns!
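A sketch in NumPy (made-up data; np.vander builds D, and least squares is numerically safer than inverting DᵀD):

    import numpy as np

    x = np.array([0., 0.5, 1., 1.5, 2., 2.5])
    r = np.array([0.1, 0.4, 1.1, 2.2, 4.1, 6.3])   # roughly quadratic

    k = 2
    D = np.vander(x, k + 1, increasing=True)   # columns: 1, x, x^2
    w, *_ = np.linalg.lstsq(D, r, rcond=None)  # solves min ||D w - r||^2
    print(w)                                   # [w0, w1, w2]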

Page 17: Other Error Measures

Square error:
E(θ|X) = (1/2) ∑_{t=1..N} (r^t − g(x^t|θ))²

Relative square error:
E(θ|X) = ∑_{t=1..N} (r^t − g(x^t|θ))² / ∑_{t=1..N} (r^t − r̄)²

Absolute error:
E(θ|X) = ∑_t |r^t − g(x^t|θ)|

ε-sensitive error:
E(θ|X) = ∑_t 1(|r^t − g(x^t|θ)| > ε) (|r^t − g(x^t|θ)| − ε)
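Each of these is a one-liner in NumPy (a sketch with made-up values; pred stands for g(x^t|θ)):

    import numpy as np

    r = np.array([1.0, 2.0, 3.0, 4.0])
    pred = np.array([1.1, 1.8, 3.3, 3.9])
    eps = 0.15

    sq  = 0.5 * ((r - pred) ** 2).sum()                        # square error
    rel = ((r - pred) ** 2).sum() / ((r - r.mean()) ** 2).sum()  # relative
    ab  = np.abs(r - pred).sum()                               # absolute
    d   = np.abs(r - pred)
    eps_sensitive = ((d > eps) * (d - eps)).sum()              # eps-sensitive
    print(sq, rel, ab, eps_sensitive)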

Page 18: Bias and Variance

E[(r − g(x))² | x] = E[(r − E[r|x])² | x] + (E[r|x] − g(x))²
                     (noise)               (squared error)

E_X[(E[r|x] − g(x))² | x] = (E[r|x] − E_X[g(x)])² + E_X[(g(x) − E_X[g(x)])²]
                            (bias²)                 (variance)

To be revisited next week!

Page 19: Estimating Bias and Variance

M samples X_i = { x^t_i, r^t_i }, i = 1, ..., M,
are used to fit g_i(x), i = 1, ..., M

ḡ(x) = (1/M) ∑_i g_i(x)

Bias²(g) = (1/N) ∑_t (ḡ(x^t) − f(x^t))²
Variance(g) = (1/(N M)) ∑_t ∑_i (g_i(x^t) − ḡ(x^t))²

Initially skip!
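To make these estimators concrete, a simulation sketch (assumptions: f(x) = sin x, Gaussian noise with σ = 0.5, and a deliberately simple constant model g_i; all choices are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    f = np.sin
    x = np.linspace(0, 3, 25)                 # fixed evaluation points, N = 25
    M = 100                                   # number of training samples

    # Each sample X_i yields one fitted model; here g_i(x) = mean of its targets.
    g = np.empty((M, len(x)))
    for i in range(M):
        r_i = f(x) + rng.normal(0, 0.5, size=x.shape)
        g[i, :] = r_i.mean()                  # constant model

    g_bar = g.mean(axis=0)                    # average model over the M fits
    bias2 = ((g_bar - f(x)) ** 2).mean()      # (1/N) sum_t (g_bar - f)^2
    var = ((g - g_bar) ** 2).mean()           # (1/(NM)) sum_t sum_i (g_i - g_bar)^2
    print(bias2, var)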

Page 20: Bias/Variance Dilemma

Example: g_i(x) = 2 has no variance and high bias;
g_i(x) = ∑_t r^t_i / N has lower bias but some variance.

As we increase complexity, bias decreases (a better fit to the data) and variance increases (the fit varies more with the data).

Bias/Variance dilemma (Geman et al., 1992)

Page 21: [Figure] Bias and variance of the fits g_i and their average ḡ relative to f. Already visited as Topic 4!

Page 22: [Figure] Polynomial Regression: the best fit is the one with minimum error.

Page 23: Model Selection

Cross-validation: measure generalization accuracy by testing on data unused during training

Regularization: penalize complex models
E' = error on data + λ · model complexity

Akaike's information criterion (AIC), Bayesian information criterion (BIC)

Minimum description length (MDL): Kolmogorov complexity, shortest description of the data

Structural risk minimization (SRM)

Remark: will be discussed in more depth later (Topic 11)

Page 24: Bayesian Model Selection

Prior on models, p(model):
p(model|data) = p(data|model) p(model) / p(data)

Regularization, when the prior favors simpler models
Bayes: MAP of the posterior, p(model|data)
Average over a number of models with high posterior (voting, ensembles: Chapter 15)

Page 25: CHAPTER 5: Multivariate Methods

Page 26: Multivariate Data

Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples

X = [ X₁¹   X₂¹   ...  X_d¹
      X₁²   X₂²   ...  X_d²
      ...
      X₁^N  X₂^N  ...  X_d^N ]

Page 27: Multivariate Parameters

Mean: E[x] = μ = [μ₁, ..., μ_d]ᵀ
Covariance: σ_ij ≡ Cov(X_i, X_j)
Correlation: Corr(X_i, X_j) ≡ ρ_ij = σ_ij / (σ_i σ_j)

Σ ≡ Cov(X) = E[(X − μ)(X − μ)ᵀ] =
    [ σ₁²   σ₁₂   ...  σ_1d
      σ₂₁   σ₂²   ...  σ_2d
      ...
      σ_d1  σ_d2  ...  σ_d² ]

Page 28: Parameter Estimation

Sample mean m: m_i = ∑_{t=1..N} x_i^t / N,  i = 1, ..., d
Covariance matrix S: s_ij = (1/N) ∑_{t=1..N} (x_i^t − m_i)(x_j^t − m_j)
Correlation matrix R: r_ij = s_ij / (s_i s_j)
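These estimators map directly onto NumPy (a sketch with made-up data; note that np.cov defaults to the unbiased divide-by-(N−1) estimate, so bias=True gives the divide-by-N version above):

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [2.0, 3.5],
                  [3.0, 5.1],
                  [4.0, 6.8]])               # N = 4 instances, d = 2 features

    m = X.mean(axis=0)                       # sample mean, one entry per feature
    S = np.cov(X, rowvar=False, bias=True)   # covariance matrix, divide by N
    R = np.corrcoef(X, rowvar=False)         # correlation matrix
    print(m, S, R, sep="\n")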

Page 29: Multivariate Normal Distribution

x ~ N_d(μ, Σ)

p(x) = [1 / ((2π)^(d/2) |Σ|^(1/2))] exp[−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ)]

The quadratic form (x − μ)ᵀ Σ⁻¹ (x − μ) is the Mahalanobis distance between x and μ.

http://www.analyzemath.com/Calculators/inverse_matrix_3by3.html

Page 30: Multivariate Normal Distribution

Mahalanobis distance: (x − μ)ᵀ Σ⁻¹ (x − μ)
measures the distance from x to μ in terms of Σ (normalizes for differences in variances and correlations)

Bivariate case, d = 2:
p(x₁, x₂) = [1 / (2π σ₁ σ₂ √(1 − ρ²))] exp[−(1 / (2(1 − ρ²))) (z₁² − 2ρ z₁ z₂ + z₂²)]

where z_i = (x_i − μ_i) / σ_i is called the z-score for x_i, and ρ is the correlation between the two variables.
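A sketch of the Mahalanobis distance in NumPy (made-up μ and Σ; np.linalg.solve avoids forming Σ⁻¹ explicitly):

    import numpy as np

    mu = np.array([0.0, 0.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])
    x = np.array([1.0, 1.5])

    d = x - mu
    maha_sq = d @ np.linalg.solve(Sigma, d)   # (x - mu)^T Sigma^{-1} (x - mu)
    print(np.sqrt(maha_sq))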

Page 31: [Figure] Bivariate Normal

Page 32: [Figure]

Page 33: Independent Inputs: Naive Bayes

If x_i are independent, the off-diagonals of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σ_i) Euclidean distance:

p(x) = ∏_{i=1..d} p_i(x_i) = [1 / ((2π)^(d/2) ∏_i σ_i)] exp[−(1/2) ∑_{i=1..d} ((x_i − μ_i)/σ_i)²]

If the variances are also equal, it reduces to the Euclidean distance.

Page 34: Parametric Classification

If p(x|C_i) ~ N(μ_i, Σ_i):
p(x|C_i) = [1 / ((2π)^(d/2) |Σ_i|^(1/2))] exp[−(1/2)(x − μ_i)ᵀ Σ_i⁻¹ (x − μ_i)]

Discriminant functions are:
g_i(x) = log p(x|C_i) + log P(C_i)
       = −(d/2) log 2π − (1/2) log |Σ_i| − (1/2)(x − μ_i)ᵀ Σ_i⁻¹ (x − μ_i) + log P(C_i)

Page 35: Estimation of Parameters

P̂(C_i) = ∑_t r_i^t / N
m_i = ∑_t r_i^t x^t / ∑_t r_i^t
S_i = ∑_t r_i^t (x^t − m_i)(x^t − m_i)ᵀ / ∑_t r_i^t

Plugging these in:
g_i(x) = −(1/2) log |S_i| − (1/2)(x − m_i)ᵀ S_i⁻¹ (x − m_i) + log P̂(C_i)
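A compact sketch of these estimators and the resulting discriminant (assuming NumPy; the data is made up):

    import numpy as np

    X = np.array([[1.0, 2.0], [1.4, 1.8], [0.8, 2.3], [1.2, 2.4],   # class 0
                  [3.0, 0.5], [3.4, 0.9], [2.8, 0.6], [3.2, 1.0]])  # class 1
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    def fit(X, y, i):
        Xi = X[y == i]
        prior = len(Xi) / len(X)                 # P^(C_i)
        m = Xi.mean(axis=0)                      # m_i
        S = (Xi - m).T @ (Xi - m) / len(Xi)      # S_i, divide by class count
        return prior, m, S

    def g(x, prior, m, S):
        d = x - m
        return (-0.5 * np.linalg.slogdet(S)[1]          # -1/2 log|S_i|
                - 0.5 * d @ np.linalg.solve(S, d)       # -1/2 Mahalanobis^2
                + np.log(prior))

    params = [fit(X, y, i) for i in (0, 1)]
    x_new = np.array([2.0, 1.5])
    print(np.argmax([g(x_new, *p) for p in params]))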

Page 36: Different S_i: Quadratic Discriminant

Expanding the quadratic form gives
g_i(x) = xᵀ W_i x + w_iᵀ x + w_i0
where
W_i = −(1/2) S_i⁻¹
w_i = S_i⁻¹ m_i
w_i0 = −(1/2) m_iᵀ S_i⁻¹ m_i − (1/2) log |S_i| + log P̂(C_i)

skip

Page 37: [Figure] Likelihoods, posterior for C₁, and the discriminant P(C₁|x) = 0.5.

Page 38: Common Covariance Matrix S

Shared common sample covariance:
S = ∑_i P̂(C_i) S_i

The discriminant reduces to
g_i(x) = −(1/2)(x − m_i)ᵀ S⁻¹ (x − m_i) + log P̂(C_i)

which is a linear discriminant:
g_i(x) = w_iᵀ x + w_i0
where
w_i = S⁻¹ m_i
w_i0 = −(1/2) m_iᵀ S⁻¹ m_i + log P̂(C_i)

Initially skip!

Page 39: [Figure] Common Covariance Matrix S. Initially skip!

Page 40: Diagonal S

When x_j, j = 1, ..., d, are independent, Σ is diagonal:
p(x|C_i) = ∏_j p(x_j|C_i)  (Naive Bayes' assumption)

g_i(x) = −(1/2) ∑_{j=1..d} ((x_j^t − m_ij)/s_j)² + log P̂(C_i)

Classify based on weighted Euclidean distance (in s_j units) to the nearest mean.

Likely covered in April!

Page 41: [Figure] Diagonal S: variances may be different.

Page 42: Diagonal S, Equal Variances

Nearest mean classifier: classify based on Euclidean distance to the nearest mean:
g_i(x) = −‖x − m_i‖² / (2s²) + log P̂(C_i)
       = −(1/(2s²)) ∑_{j=1..d} (x_j^t − m_ij)² + log P̂(C_i)

Each mean can be considered a prototype or template, and this is template matching.
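A nearest-mean classifier is a few lines (a sketch assuming NumPy and made-up class means; with equal priors the log P̂(C_i) term can be dropped):

    import numpy as np

    means = np.array([[1.0, 2.0],      # m_0
                      [3.0, 0.5]])     # m_1

    def nearest_mean(x, means):
        # Equivalent to maximizing g_i when variances and priors are equal.
        d2 = ((means - x) ** 2).sum(axis=1)   # squared Euclidean distances
        return int(np.argmin(d2))

    print(nearest_mean(np.array([2.8, 0.9]), means))   # 1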

Page 43: [Figure] Diagonal S, equal variances.

Page 44: Model Selection

As we increase complexity (less restricted S), bias decreases and variance increases.
Assume simple models (allow some bias) to control variance (regularization).

Assumption                  | Covariance matrix       | No. of parameters
Shared, hyperspheric        | S_i = S = s²I           | 1
Shared, axis-aligned        | S_i = S, with s_ij = 0  | d
Shared, hyperellipsoidal    | S_i = S                 | d(d+1)/2
Different, hyperellipsoidal | S_i                     | K · d(d+1)/2

Page 45: Discrete Features

Binary features: p_ij ≡ p(x_j = 1 | C_i)

If the x_j are independent (Naive Bayes'):
p(x|C_i) = ∏_{j=1..d} p_ij^(x_j) (1 − p_ij)^(1−x_j)

the discriminant is linear:
g_i(x) = log p(x|C_i) + log P(C_i)
       = ∑_j [ x_j log p_ij + (1 − x_j) log(1 − p_ij) ] + log P(C_i)

Estimated parameters: p̂_ij = ∑_t x_j^t r_i^t / ∑_t r_i^t

skip!

Page 46: Discrete Features

Multinomial (1-of-n_j) features: x_j ∈ {v₁, v₂, ..., v_nj}
p_ijk ≡ p(z_jk = 1 | C_i) = p(x_j = v_k | C_i)

If the x_j are independent:
p(x|C_i) = ∏_{j=1..d} ∏_{k=1..nj} p_ijk^(z_jk)
g_i(x) = ∑_j ∑_k z_jk log p_ijk + log P(C_i)

Estimated parameters: p̂_ijk = ∑_t z_jk^t r_i^t / ∑_t r_i^t

skip!

Page 47: Multivariate Regression

Multivariate linear model:
r^t = g(x^t | w₀, w₁, ..., w_d) + ε
g(x^t) = w₀ + w₁ x₁^t + w₂ x₂^t + ... + w_d x_d^t
E(w₀, w₁, ..., w_d | X) = (1/2) ∑_t (r^t − w₀ − w₁ x₁^t − ... − w_d x_d^t)²

Multivariate polynomial model: define new higher-order variables
z₁ = x₁, z₂ = x₂, z₃ = x₁², z₄ = x₂², z₅ = x₁x₂
and use the linear model in this new z space (basis functions, kernel trick, SVM: Chapter 10)

skip!
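A sketch of this polynomial trick (made-up 2-feature data; we expand into the z space and reuse ordinary least squares):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(50, 2))       # N = 50, d = 2
    x1, x2 = X[:, 0], X[:, 1]
    r = 1 + 2*x1 - x2 + 0.5*x1*x2 + rng.normal(0, 0.05, 50)

    # z space: bias, x1, x2, x1^2, x2^2, x1*x2
    Z = np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1*x2])
    w, *_ = np.linalg.lstsq(Z, r, rcond=None)
    print(w.round(2))    # close to [1, 2, -1, 0, 0, 0.5]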