eel 851: biometrics · 2009. 7. 5. · eel 851 4 ¾feature relevant characteristics that make...

52
EEL 851 1 EEL 851: Biometrics An Overview of Statistical Pattern Recognition

Upload: others

Post on 24-Aug-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 1

EEL 851: Biometrics

An Overview ofStatistical Pattern Recognition

Page 2: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 2

OutlineIntroduction

PatternFeatureNoise

ExampleProblem Analysis

SegmentationFeature ExtractionClassification

Design CycleBayesian Decision Theory

Discriminant FunctionsDecision RegionsMinimum Distance Classification

Nearest Neighbor ClassificationParameter EstimationDecision Boundaries

Page 3: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 3

What is a Pattern?Object process of event consisting of both deterministic/stochastic componentsRecord of dynamic occurrences influenced by both deterministic and stochastic factorsExamples – voice, image, characters

Kind of PatternsVisual patternsTemporal patternsLogical patterns

What is Pattern Class?Set of patterns sharing set of common attributes (or features)Usually originating from the same source

Introduction

Page 4: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 4

FeatureRelevant characteristics that make patterns apart from each otherData extractable through measurements or processing

ExamplesPatterns

Speech waveforms, crystals, textures, weather patternsFeatures

Age, color, height, width

ClassificationsAssigning patterns into classes based on features

NoiseDistortions associated with pattern processing and/or training samples that effect the classification performance of system

Introduction

Page 5: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 5

ExampleAutomation in fish packing plant

Sort incoming fish on a conveyor according to species using optical sensing

Sorting speciesSea bassSalmon

Page 6: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 6

Setup cameraTake some sample imagesFeatures?

Suggested featuresLengthLightnessWidthNumber and shape of finsPosition of mouth, etc..

Explore above suggested features!

Problem Analysis

Page 7: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 7

SegmentationIsolate fishes from backgroundIsolate fishes from one another

Feature extractionReduce the data by measuring certain features

ClassifierEvaluates the evidence presented → Makes final decision

Preprocessing

Page 8: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 8

Preprocessing

Page 9: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 9

Selecting length of fish → Possible feature for classification

Classification

Page 10: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 10

Length is a poor feature!Select brightness as a possible feature

Classification

Page 11: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 11

CostConsequences of our decisionEqual?

Task of decision theoryMove the decision boundary towards smaller values of lightness, Cost ↓Reduce number of sea brass classified as salmon

Threshold Decision Boundary

Page 12: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 12

Adopt lightness and the width of fish

Fish x = [x1, x2]

Classification

Lightness Width

Page 13: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 13

Classification

Page 14: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 14

Classification

Our satisfaction → PrematureCentral aim → Correctly classify a novel inputIssue of Generalization!

Page 15: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 15

Classification

Syntactic Pattern RecognitionRecursive description using method of formal languages

Structural Pattern RecognitionDerivation of descriptions using formal rules

Page 16: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 16

Pattern Recognition Systems

Invariant Features• Translation• Rotation• Scale

Page 17: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 17

Data Collection

Feature Choice

Model Choice

Training

Evaluation

Computational Complexity

Design Cycle

Page 18: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 18

Design Cycle

Page 19: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 19

Supervised LearningTeacher → Category label or cost for each label in training setDesired input → Desired output

Goals → Produce a correct output given a new input

Goals of Supervised LearningClassification

Desired output → Discrete class labels

RegressionDesired output → Continuous valued

Unsupervised LearningThe system forms clusters or natural groupings of input patternGoals → Build a model or find useful representations of dataUsage → Reasoning, finding clusters, dimensionality reduction, decision making, prediction, etc.

Data compression, Outlier detection, Help classification

Types of Learning

Page 20: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 20

AssumptionsProblem → Probabilistic termsKnowledge of relevant probabilities

Sea Bass/Salmon exampleState of nature

Random Variableω → ω1 (sea bass)ω → ω1 (salmon)

Priori probabilityP(ω1) → sea bass; P(ω2) → salmonP(ω1) + P( ω2) = 1 (exclusivity and exhaustivity)

Bayesian Decision Theory

Page 21: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 21

Decision RuleOnly prior information, Cannot see!Decide ω1 if P(ω1) > P(ω2) otherwise decide ω2More info in most cases

Class Conditional InfoClass-conditional pdf → p(x|ω)p(x|ω1) and p(x|ω2)

Difference in lightness between populations of sea and salmon

Bayesian Decision Theory

Page 22: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 22

Bayesian Decision Theory

Class-conditional pdfx → Lightness of fishNormalized

Page 23: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 23

Bayes FormulaLightness of fish → x, Class?

P(ωj | x) = p(x | ωj) . P (ωj) / p(x)

where in case of two categories

Posterior = (Likelihood × Prior) / Evidencep(ωj | x) → Likelihoodp(x) → Evidence (scale factor)

∑=

=

=2

1

)()|()(j

jjj Pxpxp ωω

Page 24: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 24

Posterior Probabilities

Prior probabilitiesP(ω1) = 2/3P(ω2) = 1/3

Page 25: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 25

Decision given posterior probabilitiesObservation → xif P(ω1 | x) > P(ω2 | x) → True state of nature = ω1

if P(ω1 | x) < P(ω2 | x) → True state of nature = ω2

Probability of errorP(error | x) = P(ω1 | x) if we decide ω2

P(error | x) = P(ω2 | x) if we decide ω1

Bayes Decision Rule

Page 26: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 26

Minimize probability of errorDecide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

Bayes decisionP(error | x) = min [P(ω1 | x), P(ω2 | x)]

Bayes Decision Rule

Page 27: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 27

Generalization of Preceding IdeasMore than one featureMore than two state of natureActions not only deciding state of natureLoss function → More general than probability of error

Bayes Decision RuleInput pattern C is classified into class ωk , for a given feature vector x, maximizes the posterior probability;

P(ωk| x) ≥ P(ωj| x) for ∀ j ≠ kLikelihood conditions p(x |ωk) → Training data measurementsPrior probabilities P(ωk) → Supposed to known within given population of sample patternsp(x) is same for all class alternatives → Ignored, normalization factor

Bayesian Classifier

Page 28: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 28

Certain classification errors may be costly than othersFalse Negative may be much more costly than False PositiveExamples → Fire alarm, Medical diagnosisΩ = ω1,ω1, … ωn → Possible set of nature/classes∆ = α1,α2, … αn → Possible decisions/actions

Loss Functionλ(αi|ωj)Loss incurred by taking action αi when true state of nature is ωj

Conditional RiskR(αi|x)Loss expected by taking action αi when observed evidence is x

Loss Function

Page 29: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 29

Minimum Error Rate ClassificationAction αi is associated with class ωjAll errors are equally likelyZero-One Loss

Classification decision αi is correct only if state of nature is ωjSymmetrical function

Conditional RiskR(αi|x) = ∑ λ(αi|ωj) P(ωj|x) = 1 – P(ωi|x)

Minimize RiskSelect the class maximizing the posterior probabilitySuggested by Bayes Decision Rule, Minimum error rate classification

⎩⎨⎧

≠=

=jiji

ji for 1for 0

)|( ωαλ

Page 30: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 30

Minimize overall risk subject to constraints∫ R(αi|x) dx < constantMisclassification limited to a frequencyFish example → Do not misclassify more than 1% of salmon as bassMinimize the chance of classifying sea bass as salmon

Neyman-Pearson Criterion

Page 31: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 31

Classifier RepresentationFunction employed for discriminating among classesgi(x), i = 1, …,kClassifier → Assign x to class ωi if

gi(x) ≥ gj(x) for ∀ j ≠ i

Possible Choicesgi(x) = P(ωj|x)gi(x) = p(x|ωj)P(ωj)gi(x) = ln p(x|ωj) + ln P(ωj)

Discriminant Functions

Page 32: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 32

Decision RegionEffect of Decision Rule

Feature space → Decision regionsDecision Boundaries

Surface in feature space →Ties occur among largest discriminant functions

Page 33: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 33

Normal DensityUnivariate Density

Analytically tractable, Maximum entropyContinuous-valued densityA lot of processes are asymptotically GaussianCentral Limit Theorem

Aggregate effect of independent random disturbances → GaussianMany patterns → Prototype corrupted by large number of RP

µ = mean (or expected value) of xσ2 = expected squared deviation or variance

,x21exp

21)x(P

2

⎥⎥⎦

⎢⎢⎣

⎡⎟⎠⎞

⎜⎝⎛

σµ−

−σπ

=

Page 34: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 34

Multivariate density

Multivariate normal density in d dimensions is:

where:x = (x1, x2, …, xd)t feature vectorµ = (µ1, µ2, …, µd)t mean vectorΣ = d*d covariance matrix|Σ| and Σ-1 are determinant and inverse respectively

Σ → Shape of Gaussian curve(x - µ)t Σ-1(x - µ) → Mahalanobis distance

⎥⎦⎤

⎢⎣⎡ µ−Σµ−−

Σπ= − )x()x(

21exp

)2(1)x(P 1t

2/12/d

Normal Density

Page 35: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 35

⎥⎦⎤

⎢⎣⎡ µ−Σµ−−

Σπ= − )x()x(

21exp

)2(1)x(P 1t

2/12/d

Normal Density

Page 36: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 36

Discriminant FunctionsMinimum Error Rate Classification

gi(x) = ln p(x | ωi) + ln P(ωi)

Case → Σi = σ2.IFeatures are statistically independent, same variance σ2

Diagonal covariance matrix

Linear machinesDecision surface → Pieces of hyperplanesgi(x) = gj(x)

)(Plnln212ln

2d)x()x(

21)x(g ii

1

ii

tii ω+Σ−π∑ −µ−µ−−=

0itii wg += xwx)(

ii µ21σ

=w )(ln iitii Pw ω

σ+

−= µµ20

21

Page 37: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 37

Discriminant Functions

Page 38: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 38

Discriminant Functions

Page 39: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 39

Case → Σi = ΣCovariance matrices of all classes → Identical but arbitrary

Hyperplane separating Ri and Rj

Hyperplane separating Ri and Rj → Not orthogonal to the line between the means

Discriminant Functions

[ ]).(

)()(

)(/)(ln)( ji

jit

ji

jiji

PPx µµ

µµµµ

ωωµµ −

−Σ−−+=

−10 21

)(ln)( ii Pg ωµµ +Σ−= − )-(x)-(xx it

i1

21

Page 40: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 40

Discriminant Functions

Page 41: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 41

Discriminant Functions

Page 42: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 42

Case → Σi = arbitrary

The covariance matrices are different for each category

HyperquadricsHyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids

)(lnln21

21 w

w21 W

:)(

10

1i

1i

0

iiiitii

ii

i

itii

ti

P

wherewxwxWxxg

ωµµ

µ

+Σ−Σ−=

Σ=

Σ−=

=+=

Discriminant Functions

Page 43: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 43

Discriminant Functions

Page 44: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 44

Decision Boundary - ExampleCompute Bayes decision boundary

Gaussian Distributions

Means and Covariance's → Discrete versions

⎥⎦

⎤⎢⎣

⎡=

63

1µ ⎟⎟⎠

⎞⎜⎜⎝

⎛=∑

2 00 21

1/

⎥⎦

⎤⎢⎣

⎡−

=23

2µ ⎟⎟⎠

⎞⎜⎜⎝

⎛=∑

2 00 2

2

211 1875012515143 xxx ... +−=2

Page 45: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 45

Supervised LearningParametric Approaches

Bayesian parameter estimationMaximum likelihood estimation

Estimation ProblemEstimate P(ωj) Estimate p(ωj | x) → Tough

High dimensional feature spaces, small number of training samples

Simplifying AssumptionsFeature IndependenceIndependently drawn samples → I. I. D. modelAssume that p(ωj | x) is Gaussian

Estimation problem → Parameters of normality

Page 46: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 46

Estimation MethodsBayesian Estimation (MAP)

Distribution parameters are random values that follow a known (i.e. Gaussian) distributionBehavior of training data helps in revising parameter valuesLarge training samples → Better chances of refining posterior probabilities (parameter peaking)

Maximum-Likelihood (ML) Estimation Parameters of probabilistic distributions are fixed but unknown values Parameters → Unknown constants, Identify using training dataBest estimates of parameter values

Class-conditional probabilities are maximized over the available samples

Page 47: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 47

Parametric ApproachesCurse of dimensionality

Estimate the parameters known distributionSmaller number of samplesSome priori information

Page 48: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 48

Non-parametric ApproachesParzen window pdf estimation (KDE)

Estimate p(ωj | x) directly from sample patterns

Kn nearest-neighborDirectly construct the decision boundary based on training data

Page 49: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 49

Statistical Approaches

Page 50: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 50

Nearest-Neighbor MethodsStatistical, nearest-neighbor or memory-based methodsk-nearest-neighbor

New pattern category → Plurality of its k closest neighborLarge k → Decreases the chance of undue influence by noisy training patternSmall k → Reduces acuity of methodDistance metric usually Euclidean

In practice features are scaled → aj

Advantage → Does not require training

Page 51: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 51

Nearest-Neighbor DecisionExample

Class of training patterns → 4

Page 52: EEL 851: Biometrics · 2009. 7. 5. · EEL 851 4 ¾Feature Relevant characteristics that make patterns apart from each other Data extractable through measurements or processing ¾Examples

EEL 851 52

Richard O. Duda, Peter E. Hart, and David G.Stork, Pattern Classification, 2nd edition,John Wiley & Sons, 2001 (most contents)Nils J. Nilsson, Introduction to Machine Learning, http://ai.stanford.edu/people/nilsson/mlbook.htmlhttp://www.rii.ricoh.com/%7Estork/DHS.htmlA. K. Jain, R. P. W. Duin, and J. Mao, “Statistical Pattern Recognition: A Review,” IEEE Trans PAMI, pp. 4-37, Jan. 2000

Other Useful References

Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, NY, 1972. Therrien, Decision Estimation and Classification, John-Wiley & sons, New York, NY, 1989. Hertz, Krogh, and Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991. Bose and Liang, Neural Network Fundamentals with Graphs, Algorithms, and Applications, McGrawll-Hill, New York, NY, 1996

References