MACHINE LEARNING 5. Parametric Methods


Page 1: Machine Learning 5. Parametric Methods

MACHINE LEARNING 5. Parametric Methods

Page 2: Machine Learning 5. Parametric Methods

Parametric Methods

Need probabilities to make decisions (prior, evidence, likelihood).

Probability is a function of the input (the observables).

Represent this function by:

Selecting its general form (model) with several unknown parameters

Finding (estimating) the parameters from data so that they optimize a certain criterion (e.g. minimize generalization error)

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

2

Page 3: Machine Learning 5. Parametric Methods

Parametric Estimation

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

3

Assume the sample comes from a distribution known up to its parameters.

Sufficient statistics: quantities that completely define the distribution (e.g. mean and variance).

X = { x^t }_t where x^t ~ p(x)

Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X.

e.g., N(μ, σ²) where θ = { μ, σ² }

Page 4: Machine Learning 5. Parametric Methods

Estimation

Assume a form for the distribution.

Estimate its parameters (sufficient statistics) from a sample.

Use the distribution in classification or regression.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

4

Page 5: Machine Learning 5. Parametric Methods

Maximum Likelihood Estimation

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

5

X consists of independent and identically distributed (iid) samples

Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)

Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)

Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)
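A minimal sketch of what maximizing L(θ|X) looks like in code, assuming a Gaussian model with unknown mean and standard deviation (data and function names are illustrative, not from the notes):

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(theta, x):
        # theta = (mu, log_sigma); optimizing log_sigma keeps sigma positive
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)
        # L(theta|X) = sum_t log p(x^t | theta) for a Gaussian density
        log_p = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
        return -log_p.sum()

    x = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=200)
    result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
    mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
    # mu_hat and sigma_hat come out close to the sample mean and standard deviation

For the Gaussian the maximum is also available in closed form (see the later slides); the numerical route above is the fallback when no closed form exists.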

Page 6: Machine Learning 5. Parametric Methods

Why use the log likelihood?

Log is an increasing function: increasing the input increases the output, so maximizing the log of a function is equivalent to maximizing the function itself.

Log converts products to sums: log(abc) = log(a) + log(b) + log(c).

This makes analysis and computation simpler.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

6

Page 7: Machine Learning 5. Parametric Methods

Examples: Bernoulli/Multinomial

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

7

Bernoulli: two states, failure/success, x in {0,1}

P(x) = p_o^x (1 − p_o)^(1−x)

Solving dL(p_o|X)/dp_o = 0 gives the MLE: p_o = ∑_t x^t / N

Page 8: Machine Learning 5. Parametric Methods

Examples: Multinomial

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

8

Multinomial: generalization of Bernoulli to K mutually exclusive states, with state i occurring with probability p_i

P(x_1, x_2, ..., x_K) = ∏_i p_i^(x_i)

L(p_1, p_2, ..., p_K|X) = log ∏_t ∏_i p_i^(x_i^t)

MLE: p_i = ∑_t x_i^t / N
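Both MLEs are just relative frequencies. A small sketch, assuming one-hot encoded multinomial samples (variable names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    # Bernoulli: p_o estimated by the fraction of successes
    x = rng.binomial(n=1, p=0.3, size=500)
    p_hat = x.sum() / len(x)                            # close to 0.3

    # Multinomial: p_i estimated by the relative frequency of state i
    true_p = np.array([0.2, 0.5, 0.3])
    X = rng.multinomial(n=1, pvals=true_p, size=1000)   # one-hot rows x^t
    p_hat_multi = X.sum(axis=0) / len(X)                # p_i = sum_t x_i^t / N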

Page 9: Machine Learning 5. Parametric Methods

Gaussian (Normal) Distribution

p(x) = N(μ, σ²):

p(x) = 1/(√(2π) σ) exp[ −(x − μ)² / (2σ²) ]

MLEs of the sufficient statistics:

m = ∑_t x^t / N

s² = ∑_t (x^t − m)² / N

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

9
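A quick numerical check of these two estimates (a sketch with made-up data):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(loc=5.0, scale=2.0, size=1000)

    N = len(x)
    m = x.sum() / N                       # MLE of the mean
    s2 = ((x - m) ** 2).sum() / N         # MLE of the variance (divides by N, not N - 1)
    # np.mean(x) equals m, and np.var(x) (default ddof=0) equals s2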

Page 10: Machine Learning 5. Parametric Methods

Bias and Variance of an Estimator

The actual value of an estimator depends on the data: different sets of N samples drawn from the true distribution give different values of the estimator.

The value of an estimator is therefore a random variable, and we can ask about its mean and variance.

The difference between the true value of the parameter and the mean of the estimator is the bias of the estimator.

Usually we look for a formula that gives an unbiased estimator with small variance.

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

10

Page 11: Machine Learning 5. Parametric Methods

Bias and Variance

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

11

Unknown parameter θ; estimator d_i = d(X_i) on sample X_i

Bias: b_θ(d) = E[d] − θ

Variance: E[(d − E[d])²]

Mean square error: r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²]

= Bias² + Variance
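A small simulation of this decomposition for the ML variance estimator s² = ∑_t (x^t − m)² / N (a sketch with illustrative numbers, not part of the original notes):

    import numpy as np

    rng = np.random.default_rng(2)
    true_var, N, trials = 4.0, 10, 100_000

    # draw many samples of size N and compute the estimator d on each one
    samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, N))
    d = samples.var(axis=1, ddof=0)

    bias = d.mean() - true_var                 # about -true_var/N: this estimator is biased low
    variance = ((d - d.mean()) ** 2).mean()
    mse = ((d - true_var) ** 2).mean()
    # numerically, mse is approximately bias**2 + variance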

Page 12: Machine Learning 5. Parametric Methods

Mean Estimator

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

12

Page 13: Machine Learning 5. Parametric Methods

Variance Estimator

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

13

Page 14: Machine Learning 5. Parametric Methods

Optimal Estimators

Estimators are random variables because they depend on a random sample from the distribution.

We look for estimators (functions of the sample) that have minimum expected square error.

The expected square error can be decomposed into bias (drift) and variance.

Sometimes we accept larger error but require an unbiased estimator.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

14

Page 15: Machine Learning 5. Parametric Methods

Bayes’ Estimator

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

15

Treat θ as a random variable with prior p(θ)

Bayes’ rule: p(θ|X) = p(X|θ) p(θ) / p(X)

Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ

Maximum a Posteriori (MAP): θ_MAP = argmax_θ p(θ|X)

Maximum Likelihood (ML): θ_ML = argmax_θ p(X|θ)

Bayes’: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ

Page 16: Machine Learning 5. Parametric Methods

Bayes Estimator

Sometimes we have prior information on the range of values the parameter may take.

We cannot give an exact value, but we can give a probability (density) for each value.

This is the probability of the parameter before looking at the data.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

16

Page 17: Machine Learning 5. Parametric Methods

Bayes Estimator

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

17

Page 18: Machine Learning 5. Parametric Methods

Bayes Estimator

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

18

• Assume this posterior density has a peak around the true value of the parameter.

• The maximum a posteriori estimate then minimizes the error.

Page 19: Machine Learning 5. Parametric Methods

Bayes’ Estimator: Example

x^t ~ N(θ, σ_o²) and θ ~ N(μ, σ²)

θ_ML = m

θ_MAP = θ_Bayes = [ (N/σ_o²) / (N/σ_o² + 1/σ²) ] m + [ (1/σ²) / (N/σ_o² + 1/σ²) ] μ

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

19
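A sketch of this example in code, with illustrative numbers for the prior and the data:

    import numpy as np

    rng = np.random.default_rng(3)
    theta_true, sigma_o = 3.0, 1.0      # data model: x^t ~ N(theta, sigma_o^2)
    mu, sigma = 0.0, 2.0                # prior: theta ~ N(mu, sigma^2)
    N = 5

    x = rng.normal(theta_true, sigma_o, size=N)
    m = x.mean()                        # ML estimate

    w = (N / sigma_o**2) / (N / sigma_o**2 + 1 / sigma**2)
    theta_map = w * m + (1 - w) * mu    # MAP/Bayes estimate: m shrunk toward the prior mean

As N grows, w approaches 1 and the MAP estimate approaches the ML estimate; with little data it stays closer to the prior mean μ.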

Page 20: Machine Learning 5. Parametric Methods

Parametric Classification Bayes

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

20

Page 21: Machine Learning 5. Parametric Methods

Assumptions

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

21

Assume p(x|C_i) is Gaussian:

p(x|C_i) = 1/(√(2π) σ_i) exp[ −(x − μ_i)² / (2σ_i²) ]

Discriminant:

g_i(x) = −(1/2) log 2π − log σ_i − (x − μ_i)² / (2σ_i²) + log P(C_i)
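A minimal sketch of the resulting classifier, estimating each class's mean, variance and prior from labelled one-dimensional data (function names are illustrative):

    import numpy as np

    def fit(x, y, K):
        """Per-class ML estimates of (mean, variance, prior)."""
        params = []
        for i in range(K):
            xi = x[y == i]
            params.append((xi.mean(), xi.var(), len(xi) / len(x)))
        return params

    def discriminant(x, mean, var, prior):
        # g_i(x) = log p(x|C_i) + log P(C_i), dropping the constant -(1/2) log 2*pi
        return -0.5 * np.log(var) - (x - mean) ** 2 / (2 * var) + np.log(prior)

    def predict(x, params):
        scores = [discriminant(x, m, v, p) for (m, v, p) in params]
        return int(np.argmax(scores))    # class with the largest discriminant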

Page 22: Machine Learning 5. Parametric Methods

Example

Input: customer income. Output: which of K car models the customer will prefer.

Assume that each model has a group of customers with a certain income, and that income within a single class is normally distributed.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

22

Page 23: Machine Learning 5. Parametric Methods

Example: True distributions

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

23

Page 24: Machine Learning 5. Parametric Methods

Example: Estimate parameters

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

24

Page 25: Machine Learning 5. Parametric Methods

Example: Discriminant

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

25

• Evaluate on a new input
• Select the class with the largest discriminant
• Assume: priors and variances are equal

Page 26: Machine Learning 5. Parametric Methods

Boundary between classes

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

26

Page 27: Machine Learning 5. Parametric Methods

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

27

Equal variances

Single boundary at halfway between the means
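A quick check of where that boundary comes from, assuming equal priors as well as equal variances (as on the earlier discriminant slide):

    g_1(x) = g_2(x)
    −(x − μ_1)²/(2σ²) + log P(C_1) = −(x − μ_2)²/(2σ²) + log P(C_2)
    (x − μ_1)² = (x − μ_2)²          (using P(C_1) = P(C_2) and equal σ)
    x = (μ_1 + μ_2)/2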

Page 28: Machine Learning 5. Parametric Methods

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

28

Variances are different

Two boundaries

Page 29: Machine Learning 5. Parametric Methods

Two Approaches

Likelihood-based approach (so far): calculate probabilities using Bayes' rule, then compute the discriminant function.

Discriminant-function approach (later): estimate the discriminant directly, bypassing probability estimation. Where is the boundary between the classes?

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

29

Page 30: Machine Learning 5. Parametric Methods

Regression

x is the independent variable, r is the dependent variable.

f is unknown; we want to approximate it in order to predict future values.

Parametric approach: assume a model y = g(x|θ) with a small number of parameters θ.

Find the best parameters from the data. We also have to make an assumption about the noise.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

30

Page 31: Machine Learning 5. Parametric Methods

Regression

We have training data (x, r). Find the parameters that maximize the likelihood, i.e. the parameters that make the data most probable.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

31

Page 32: Machine Learning 5. Parametric Methods

Regression

Ignore the last term (it does not depend on the parameters).

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

32
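Spelling out the step behind the last few slides (a sketch, assuming the usual Gaussian noise model r^t = g(x^t|θ) + ε with ε ~ N(0, σ²)):

    L(θ|X) = log ∏_t p(x^t, r^t) = ∑_t log p(r^t|x^t) + ∑_t log p(x^t)

    With r^t|x^t ~ N(g(x^t|θ), σ²):

    L(θ|X) = −(N/2) log(2πσ²) − (1/(2σ²)) ∑_t [r^t − g(x^t|θ)]² + ∑_t log p(x^t)

Neither the constant first term nor ∑_t log p(x^t) depends on θ, so maximizing the likelihood is equivalent to minimizing E(θ|X) = (1/2) ∑_t [r^t − g(x^t|θ)]², the least squares criterion.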

Page 33: Machine Learning 5. Parametric Methods

Regression

Minimize the last term.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

33

Page 34: Machine Learning 5. Parametric Methods

Least Square Estimate

Minimize this.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

34

Page 35: Machine Learning 5. Parametric Methods

Linear Regression

g(x^t | w_1, w_0) = w_1 x^t + w_0

Setting the derivatives of the squared error to zero gives the normal equations:

∑_t r^t = N w_0 + w_1 ∑_t x^t

∑_t r^t x^t = w_0 ∑_t x^t + w_1 ∑_t (x^t)²

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

35

Assume a linear model. We need to minimize the squared error, so we set the derivatives to zero. This gives 2 linear equations in 2 unknowns, which can be solved easily; a sketch follows below.
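A minimal sketch solving those two equations directly (illustrative data):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 10, size=50)
    r = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)    # noisy line, true w1 = 2, w0 = 1

    N = len(x)
    # normal equations:  sum_t r^t     = N w0         + w1 sum_t x^t
    #                    sum_t r^t x^t = w0 sum_t x^t + w1 sum_t (x^t)^2
    A = np.array([[N, x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    b = np.array([r.sum(), (r * x).sum()])
    w0, w1 = np.linalg.solve(A, b)                   # w1 close to 2, w0 close to 1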

Page 36: Machine Learning 5. Parametric Methods

Polynomial Regression

g(x^t | w_k, ..., w_2, w_1, w_0) = w_k (x^t)^k + ... + w_2 (x^t)² + w_1 x^t + w_0

In matrix form, with

D = [ 1   x^1   (x^1)²   ...   (x^1)^k ]
    [ 1   x^2   (x^2)²   ...   (x^2)^k ]
    [ ...                              ]
    [ 1   x^N   (x^N)²   ...   (x^N)^k ]

r = [ r^1, r^2, ..., r^N ]ᵀ

the least squares solution is

w = (Dᵀ D)⁻¹ Dᵀ r

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

36
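A sketch of the same computation in code (degree and data are illustrative):

    import numpy as np

    rng = np.random.default_rng(5)
    x = np.linspace(-1, 1, 30)
    r = 1 - 2 * x + 0.5 * x**2 + rng.normal(0, 0.1, size=30)

    k = 2
    D = np.vander(x, N=k + 1, increasing=True)   # columns: 1, x, x^2, ..., x^k
    w = np.linalg.solve(D.T @ D, D.T @ r)        # w = (D^T D)^-1 D^T r
    # np.linalg.lstsq(D, r, rcond=None) gives the same answer more stably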

Page 37: Machine Learning 5. Parametric Methods

Bias and Variance

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

37

E[(r − g(x))² | x] = E[(r − E[r|x])² | x] + (E[r|x] − g(x))²
                            noise               squared error

E_X[(E[r|x] − g(x))² | x] = (E[r|x] − E_X[g(x)])² + E_X[(g(x) − E_X[g(x)])²]
                                   bias²                  variance

Page 38: Machine Learning 5. Parametric Methods

Bias and Variance

Given a single sample (x, r), what is the expected error? Variation comes from the noise and from the training sample.

The first term is due to noise: it does not depend on the estimator and cannot be removed.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

38

Page 39: Machine Learning 5. Parametric Methods

Variance

The second term is the deviation of the estimator from the regression function. It depends on the estimator and on the training set, and is averaged over all possible training samples.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

39

Page 40: Machine Learning 5. Parametric Methods

Example: polynomial regression. As we increase the degree of the polynomial:

Bias decreases, because a higher-degree polynomial can fit the points better.

Variance increases, because a small change in the training sample can cause a large change in the model parameters.

The bias/variance dilemma holds for any machine learning system. We need a way to find the optimal model complexity that balances bias against variance.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

40

Page 41: Machine Learning 5. Parametric Methods

Bias/Variance Dilemma

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

41

Example: g_i(x) = 2 has no variance but high bias; g_i(x) = ∑_t r_i^t / N has lower bias, with some variance.

As we increase complexity, bias decreases (a better fit to data) and variance increases (fit varies more with data)

Bias/Variance dilemma: (Geman et al., 1992)

Page 42: Machine Learning 5. Parametric Methods

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

42


Bias/Variance Dilemma

Page 43: Machine Learning 5. Parametric Methods

Polynomial Regression

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

43

Best fit “min error”

Page 44: Machine Learning 5. Parametric Methods

Model Selection

How do we select the right model complexity? This is different from estimating the model parameters. There are several procedures.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

44

Page 45: Machine Learning 5. Parametric Methods

Cross-Validation

We cannot calculate bias and variance, since we do not know the true model, but we can estimate the total generalization error.

Set aside a portion of the data (the validation set). Increase model complexity, fit the parameters, and calculate the error on the validation set. Stop when the error ceases to decrease, or even starts to increase. A sketch follows below.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

45
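A minimal sketch of that procedure for choosing a polynomial degree (data and split are illustrative):

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.uniform(-1, 1, size=100)
    r = np.sin(3 * x) + rng.normal(0, 0.2, size=100)

    x_tr, r_tr = x[:70], r[:70]      # training set
    x_va, r_va = x[70:], r[70:]      # validation set

    val_error = {}
    for k in range(1, 9):                         # increasing model complexity
        w = np.polyfit(x_tr, r_tr, deg=k)         # fit parameters on the training set
        pred = np.polyval(w, x_va)
        val_error[k] = np.mean((r_va - pred) ** 2)

    best_k = min(val_error, key=val_error.get)    # in practice, stop at the "elbow"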

Page 46: Machine Learning 5. Parametric Methods

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

46

Best fit, “elbow”

Cross-Validation

Page 47: Machine Learning 5. Parametric Methods

Regularization

Introduce a penalty for model complexity into the error function. Find the model complexity (e.g. degree of the polynomial) and the parameters (coefficients) that minimize this augmented error.

Lambda (λ) is the penalty for model complexity: if λ is too large, only very simple models will be admitted.

Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

47
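A sketch of one common form of this idea, ridge-style regularization of the polynomial fit; the penalty E'(w) = ∑_t [r^t − g(x^t|w)]² + λ‖w‖² is an assumption here, since the notes leave the exact formula to an image:

    import numpy as np

    def ridge_polyfit(x, r, k, lam):
        """Degree-k polynomial fit with penalty lam * ||w||^2 on the coefficients."""
        D = np.vander(x, N=k + 1, increasing=True)
        return np.linalg.solve(D.T @ D + lam * np.eye(k + 1), D.T @ r)

    # larger lam shrinks the coefficients toward zero, favouring simpler models;
    # lam = 0 recovers the unregularized least squares solution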