
EEL 851 1

EEL 851: Biometrics

An Overview of Statistical Pattern Recognition

EEL 851 2

Outline
• Introduction
  - Pattern, Feature, Noise
• Example: Problem Analysis
  - Segmentation, Feature Extraction, Classification
• Design Cycle
• Bayesian Decision Theory
  - Discriminant Functions, Decision Regions, Minimum Distance Classification
• Nearest Neighbor Classification
• Parameter Estimation
• Decision Boundaries

EEL 851 3

Introduction

What is a Pattern?
• An object, process, or event consisting of both deterministic and stochastic components
• A record of dynamic occurrences influenced by both deterministic and stochastic factors
• Examples: voice, image, characters

Kinds of Patterns
• Visual patterns
• Temporal patterns
• Logical patterns

What is a Pattern Class?
• A set of patterns sharing a set of common attributes (or features)
• Usually originating from the same source

EEL 851 4

Introduction

Feature
• Relevant characteristics that set patterns apart from each other
• Data extractable through measurements or processing

Examples
• Patterns: speech waveforms, crystals, textures, weather patterns
• Features: age, color, height, width

Classification
• Assigning patterns to classes based on their features

Noise
• Distortions associated with pattern processing and/or the training samples that affect the classification performance of the system

EEL 851 5

Example: Automation in a fish-packing plant
• Sort incoming fish on a conveyor according to species using optical sensing
• Species to sort: sea bass, salmon

EEL 851 6

Problem Analysis
• Set up a camera and take some sample images
• Features?

Suggested features
• Length
• Lightness
• Width
• Number and shape of fins
• Position of mouth, etc.

Explore the suggested features!

EEL 851 7

Preprocessing

Segmentation
• Isolate fish from the background
• Isolate fish from one another

Feature extraction
• Reduce the data by measuring certain features

Classifier
• Evaluates the evidence presented → makes the final decision

EEL 851 8

Preprocessing

EEL 851 9

Classification
• Selecting the length of the fish → a possible feature for classification

EEL 851 10

Classification
• Length is a poor feature!
• Select lightness as a possible feature

EEL 851 11

Threshold Decision Boundary

Cost
• Consequences of our decision
• Are all consequences equal?

Task of decision theory
• Move the decision boundary toward smaller values of lightness → cost ↓
• Reduce the number of sea bass classified as salmon

EEL 851 12

Classification
• Adopt the lightness and the width of the fish
• Fish x = [x1, x2]: x1 = lightness, x2 = width

EEL 851 13

Classification

EEL 851 14

Classification

• Our satisfaction → premature
• Central aim → correctly classify a novel input
• The issue of generalization!

EEL 851 15

Classification

Syntactic Pattern Recognition
• Recursive descriptions using the methods of formal languages

Structural Pattern Recognition
• Derivation of descriptions using formal rules

EEL 851 16

Pattern Recognition Systems

Invariant Features
• Translation
• Rotation
• Scale

EEL 851 17

Design Cycle
• Data Collection
• Feature Choice
• Model Choice
• Training
• Evaluation
• Computational Complexity

EEL 851 18

Design Cycle

EEL 851 19

Types of Learning

Supervised Learning
• Teacher → category label or cost for each pattern in the training set
• Desired input → desired output
• Goal → produce the correct output given a new input

Goals of supervised learning
• Classification: desired output → discrete class labels
• Regression: desired output → continuous valued

Unsupervised Learning
• The system forms clusters or natural groupings of the input patterns
• Goal → build a model or find useful representations of the data
• Usage → reasoning, finding clusters, dimensionality reduction, decision making, prediction, etc.
• Data compression, outlier detection, aiding classification

EEL 851 20

Bayesian Decision Theory

Assumptions
• Problem posed in probabilistic terms
• Knowledge of the relevant probabilities

Sea bass/salmon example
• State of nature → random variable ω
• ω = ω1 (sea bass); ω = ω2 (salmon)

Prior probabilities
• P(ω1) → sea bass; P(ω2) → salmon
• P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)

EEL 851 21

Bayesian Decision Theory

Decision rule with only prior information (cannot see the fish!)
• Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
• More information is available in most cases

Class-conditional information
• Class-conditional pdf → p(x | ω): p(x | ω1) and p(x | ω2)
• Describe the difference in lightness between the populations of sea bass and salmon

EEL 851 22

Bayesian Decision Theory

Class-conditional pdfs
• x → lightness of the fish
• Normalized (each density integrates to one)

EEL 851 23

Bayes Formula
• Lightness of fish → x; which class?

P(ωj | x) = p(x | ωj) P(ωj) / p(x)

• Posterior = (Likelihood × Prior) / Evidence
• p(x | ωj) → likelihood
• p(x) → evidence (a scale factor)

where, in the case of two categories,

p(x) = ∑j p(x | ωj) P(ωj),  j = 1, 2
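As a minimal numerical sketch of Bayes' formula above (the Gaussian class-conditional densities and the test point are illustrative assumptions, not from the slides; the priors are the 2/3 and 1/3 used on the next slide):

```python
import numpy as np

# Illustrative class-conditional pdfs p(x|w1), p(x|w2) for lightness x:
# assumed Gaussian here purely for demonstration.
def p_x_given_w1(x):  # sea bass, assumed N(4, 1)
    return np.exp(-0.5 * (x - 4.0) ** 2) / np.sqrt(2 * np.pi)

def p_x_given_w2(x):  # salmon, assumed N(7, 1.5^2)
    return np.exp(-0.5 * ((x - 7.0) / 1.5) ** 2) / (1.5 * np.sqrt(2 * np.pi))

P_w1, P_w2 = 2 / 3, 1 / 3                   # priors from the example slides

x = 5.5                                      # observed lightness (arbitrary)
evidence = p_x_given_w1(x) * P_w1 + p_x_given_w2(x) * P_w2   # p(x), two-category sum
post1 = p_x_given_w1(x) * P_w1 / evidence    # P(w1|x)
post2 = p_x_given_w2(x) * P_w2 / evidence    # P(w2|x)

decision = "w1 (sea bass)" if post1 > post2 else "w2 (salmon)"
print(post1, post2, "->", decision)
print("P(error|x) =", min(post1, post2))     # Bayes error at this x
```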

EEL 851 24

Posterior Probabilities

Prior probabilities
• P(ω1) = 2/3
• P(ω2) = 1/3

EEL 851 25

Bayes Decision Rule

Decision given the posterior probabilities, for an observation x
• If P(ω1 | x) > P(ω2 | x) → true state of nature = ω1
• If P(ω1 | x) < P(ω2 | x) → true state of nature = ω2

Probability of error
• P(error | x) = P(ω1 | x) if we decide ω2
• P(error | x) = P(ω2 | x) if we decide ω1

EEL 851 26

Bayes Decision Rule

Minimize the probability of error
• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

Bayes decision
• P(error | x) = min [P(ω1 | x), P(ω2 | x)]

EEL 851 27

Bayesian Classifier

Generalization of the preceding ideas
• More than one feature
• More than two states of nature
• Actions beyond merely deciding the state of nature
• Loss function → more general than the probability of error

Bayes decision rule
• An input pattern is classified into the class ωk that, for a given feature vector x, maximizes the posterior probability: P(ωk | x) ≥ P(ωj | x) for ∀ j ≠ k
• Likelihoods p(x | ωk) → from training data measurements
• Prior probabilities P(ωk) → assumed to be known within the given population of sample patterns
• p(x) is the same for all class alternatives → ignored as a normalization factor

EEL 851 28

Loss Function

• Certain classification errors may be more costly than others
• A false negative may be much more costly than a false positive
• Examples → fire alarm, medical diagnosis
• Ω = {ω1, ω2, …, ωn} → possible states of nature/classes
• ∆ = {α1, α2, …, αn} → possible decisions/actions

Loss function λ(αi | ωj)
• Loss incurred by taking action αi when the true state of nature is ωj

Conditional risk R(αi | x)
• Expected loss incurred by taking action αi when the observed evidence is x

EEL 851 29

Minimum Error Rate Classification
• Action αi is associated with class ωi
• All errors are equally costly → zero-one loss
• Classification decision αi is correct only if the true state of nature is ωj with j = i
• Symmetrical loss function

Conditional risk
• R(αi | x) = ∑j λ(αi | ωj) P(ωj | x) = 1 − P(ωi | x)

Minimize risk
• Select the class maximizing the posterior probability
• Suggested by the Bayes decision rule → minimum error rate classification

λ(αi | ωj) = 0 for i = j,  1 for i ≠ j
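A small sketch of conditional-risk minimization under the definitions above; the posterior and loss values are invented for illustration, contrasting the zero-one loss with an asymmetric loss:

```python
import numpy as np

posteriors = np.array([0.7, 0.3])        # P(w1|x), P(w2|x): illustrative values

# Loss matrices: lam[i, j] = loss lambda(a_i | w_j) for action a_i, true class w_j.
zero_one = np.array([[0, 1],
                     [1, 0]])
asymmetric = np.array([[0, 10],          # deciding w1 when truth is w2 costs 10
                       [1,  0]])         # deciding w2 when truth is w1 costs 1

for lam in (zero_one, asymmetric):
    risk = lam @ posteriors              # R(a_i|x) = sum_j lam(a_i|w_j) P(w_j|x)
    print("risks:", risk, "-> choose action", risk.argmin() + 1)

# Zero-one loss picks the max posterior (action 1); a heavy enough
# asymmetric loss flips the decision to action 2.
```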

EEL 851 30

Neyman-Pearson Criterion
• Minimize the overall risk subject to a constraint: ∫ R(αi | x) dx < constant
• Misclassification limited to a given frequency
• Fish example → do not misclassify more than 1% of salmon as sea bass
• Minimize the chance of classifying sea bass as salmon
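A rough brute-force sketch of this idea under assumed Gaussian lightness distributions (the distributions and the decide-salmon-above-threshold rule are illustrative assumptions): pick the largest threshold that keeps salmon-as-bass errors at or below 1%, then read off the resulting bass-as-salmon rate.

```python
import math

# Assumed lightness distributions (illustrative): salmon ~ N(7, 1.5^2), bass ~ N(4, 1).
# Rule: decide "salmon" when x > t. Find the largest t such that at most 1% of
# salmon fall below t (misclassified as sea bass).
def gauss_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

best_t = None
for i in range(1000):                              # scan candidate thresholds
    t = i * 0.01
    salmon_as_bass = gauss_cdf(t, 7.0, 1.5)        # P(x < t | salmon)
    if salmon_as_bass <= 0.01:
        best_t = t                                 # largest feasible threshold so far

bass_as_salmon = 1 - gauss_cdf(best_t, 4.0, 1.0)   # P(x > t | bass)
print("threshold:", best_t, "bass misclassified as salmon:", bass_as_salmon)
```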

EEL 851 31

Discriminant Functions

Classifier representation
• Functions employed for discriminating among classes: gi(x), i = 1, …, k
• Classifier → assign x to class ωi if gi(x) ≥ gj(x) for ∀ j ≠ i

Possible choices
• gi(x) = P(ωi | x)
• gi(x) = p(x | ωi) P(ωi)
• gi(x) = ln p(x | ωi) + ln P(ωi)

EEL 851 32

Decision Regions
• Effect of the decision rule → the feature space is divided into decision regions

Decision boundaries
• Surfaces in feature space where ties occur among the largest discriminant functions

EEL 851 33

Normal Density

Univariate density
• Analytically tractable, maximum entropy
• Continuous-valued density
• Many processes are asymptotically Gaussian → Central Limit Theorem
• Aggregate effect of independent random disturbances → Gaussian
• Many patterns → a prototype corrupted by a large number of random processes
• µ = mean (or expected value) of x
• σ² = expected squared deviation, or variance

P(x) = (1 / (√(2π) σ)) exp[ −(1/2) ((x − µ) / σ)² ]

EEL 851 34

Multivariate density

The multivariate normal density in d dimensions is

P(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp[ −(1/2) (x − µ)^t Σ⁻¹ (x − µ) ]

where:
• x = (x1, x2, …, xd)^t → feature vector
• µ = (µ1, µ2, …, µd)^t → mean vector
• Σ → d×d covariance matrix
• |Σ| and Σ⁻¹ → its determinant and inverse, respectively

• Σ → shape of the Gaussian curve
• (x − µ)^t Σ⁻¹ (x − µ) → Mahalanobis distance

Normal Density
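A sketch evaluating this density and the Mahalanobis distance with NumPy; the mean and covariance are borrowed from the decision-boundary example later in the deck, and the test point is arbitrary:

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Multivariate normal density, straight from the formula above."""
    d = len(mu)
    diff = x - mu
    maha2 = diff @ np.linalg.inv(sigma) @ diff      # squared Mahalanobis distance
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * maha2) / norm

mu = np.array([3.0, 6.0])                           # example values
sigma = np.array([[0.5, 0.0],
                  [0.0, 2.0]])
x = np.array([3.5, 5.0])                            # arbitrary test point
print("p(x) =", mvn_pdf(x, mu, sigma))
print("Mahalanobis^2 =", (x - mu) @ np.linalg.inv(sigma) @ (x - mu))
```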

EEL 851 35


Normal Density

EEL 851 36

Discriminant Functions

Minimum error rate classification
• gi(x) = ln p(x | ωi) + ln P(ωi)

Case Σi = σ²·I
• Features are statistically independent, each with the same variance σ²
• Diagonal covariance matrix

Linear machines
• Decision surfaces → pieces of hyperplanes defined by gi(x) = gj(x)

gi(x) = −(1/2) (x − µi)^t Σi⁻¹ (x − µi) − (d/2) ln 2π − (1/2) ln |Σi| + ln P(ωi)

For Σi = σ²·I this reduces to the linear form

gi(x) = wi^t x + wi0,  with  wi = µi / σ²  and  wi0 = −(1/(2σ²)) µi^t µi + ln P(ωi)
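A minimal sketch of this linear machine; the shared variance, class means, and priors below are made-up values:

```python
import numpy as np

sigma2 = 1.0                                             # shared variance (assumed)
means = [np.array([4.0, 10.0]), np.array([7.0, 6.0])]    # mu_i, illustrative
priors = [2 / 3, 1 / 3]

def linear_discriminant(i, x):
    w = means[i] / sigma2                                         # w_i = mu_i / sigma^2
    w0 = -means[i] @ means[i] / (2 * sigma2) + np.log(priors[i])  # w_i0
    return w @ x + w0                                             # g_i(x) = w_i^t x + w_i0

x = np.array([5.0, 8.0])                                 # arbitrary test point
scores = [linear_discriminant(i, x) for i in range(2)]
print("g1, g2 =", scores, "-> class", int(np.argmax(scores)) + 1)
```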

EEL 851 37

Discriminant Functions

EEL 851 38

Discriminant Functions

EEL 851 39

Discriminant Functions

Case Σi = Σ
• Covariance matrices of all classes → identical but arbitrary
• Hyperplane separating Ri and Rj → not orthogonal to the line between the means

gi(x) = −(1/2) (x − µi)^t Σ⁻¹ (x − µi) + ln P(ωi)

The hyperplane separating Ri and Rj passes through

x0 = (1/2)(µi + µj) − [ ln(P(ωi)/P(ωj)) / ((µi − µj)^t Σ⁻¹ (µi − µj)) ] (µi − µj)

EEL 851 40

Discriminant Functions

EEL 851 41

Discriminant Functions

EEL 851 42

Case → Σi = arbitrary

The covariance matrices are different for each category

Hyperquadric decision surfaces
• Hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids

gi(x) = x^t Wi x + wi^t x + wi0

where:
• Wi = −(1/2) Σi⁻¹
• wi = Σi⁻¹ µi
• wi0 = −(1/2) µi^t Σi⁻¹ µi − (1/2) ln |Σi| + ln P(ωi)

Discriminant Functions

EEL 851 43

Discriminant Functions

EEL 851 44

Decision Boundary Example
• Compute the Bayes decision boundary
• Gaussian distributions
• Means and covariances → discrete versions

µ1 = (3, 6)^t,  Σ1 = [[1/2, 0], [0, 2]]

µ2 = (3, −2)^t,  Σ2 = [[2, 0], [0, 2]]

Bayes decision boundary (with equal priors):

x2 = 3.514 − 1.125 x1 + 0.1875 x1²
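A quick numerical check of this boundary, evaluating the arbitrary-Σ discriminants from the previous slides at points on the curve (equal priors assumed, as in the textbook example); the difference g1 − g2 should come out near zero:

```python
import numpy as np

mu = [np.array([3.0, 6.0]), np.array([3.0, -2.0])]
cov = [np.diag([0.5, 2.0]), np.diag([2.0, 2.0])]
priors = [0.5, 0.5]                      # equal priors assumed

def g(i, x):
    """Quadratic Gaussian discriminant g_i(x) for arbitrary covariance.
    The class-independent -(d/2) ln 2pi term is dropped (it cancels)."""
    inv = np.linalg.inv(cov[i])
    diff = x - mu[i]
    return (-0.5 * diff @ inv @ diff
            - 0.5 * np.log(np.linalg.det(cov[i]))
            + np.log(priors[i]))

# On the curve x2 = 3.514 - 1.125 x1 + 0.1875 x1^2 the two discriminants tie.
for x1 in (0.0, 2.0, 4.0):
    x = np.array([x1, 3.514 - 1.125 * x1 + 0.1875 * x1 ** 2])
    print(x1, g(0, x) - g(1, x))         # approximately 0 along the boundary
```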

EEL 851 45

Supervised Learning

Parametric approaches
• Bayesian parameter estimation
• Maximum likelihood estimation

Estimation problem
• Estimate P(ωj); estimate p(x | ωj) → tough
• High-dimensional feature spaces, small number of training samples

Simplifying assumptions
• Feature independence
• Independently drawn samples → i.i.d. model
• Assume that p(x | ωj) is Gaussian
• Estimation problem → parameters of normality

EEL 851 46

Estimation Methods

Bayesian estimation (MAP)
• Distribution parameters are random variables that follow a known (e.g., Gaussian) distribution
• The behavior of the training data helps in revising the parameter values
• Large training samples → better chances of refining the posterior probabilities (parameter peaking)

Maximum-likelihood (ML) estimation
• Parameters of the probability distributions are fixed but unknown values
• Parameters → unknown constants, identified using training data
• Best estimates of the parameter values
• Class-conditional probabilities are maximized over the available samples
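A short sketch of ML estimation for a Gaussian class-conditional density: the ML estimates are the sample mean and the (1/n) sample covariance. The synthetic data below reuse the example's µ1 and Σ1 purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = np.array([3.0, 6.0])
true_cov = np.array([[0.5, 0.0], [0.0, 2.0]])
samples = rng.multivariate_normal(true_mu, true_cov, size=500)

mu_ml = samples.mean(axis=0)                  # ML estimate of the mean
diff = samples - mu_ml
cov_ml = diff.T @ diff / len(samples)         # ML (1/n) covariance estimate
print("mu_ml =", mu_ml)
print("cov_ml =\n", cov_ml)                   # both approach the true values
```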

EEL 851 47

Parametric Approaches
• Curse of dimensionality
• Estimate the parameters of a known distribution
• Smaller number of samples required
• Some a priori information

EEL 851 48

Non-parametric Approaches

Parzen-window pdf estimation (KDE)
• Estimate p(x | ωj) directly from the sample patterns

kn nearest-neighbor
• Directly construct the decision boundary based on the training data
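A minimal Parzen-window (Gaussian kernel) sketch in one dimension; the training distribution and window width h are hand-picked assumptions:

```python
import numpy as np

def parzen_gauss(x, samples, h):
    """Parzen-window estimate p(x) = (1/n) sum_k N(x; x_k, h^2)."""
    z = (x - samples) / h
    kernels = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean()

rng = np.random.default_rng(1)
train = rng.normal(loc=7.0, scale=1.5, size=200)   # e.g. salmon lightness samples
for x in (4.0, 7.0, 10.0):
    print(x, parzen_gauss(x, train, h=0.5))        # density peaks near 7
```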

EEL 851 49

Statistical Approaches

EEL 851 50

Nearest-Neighbor Methods
• Statistical, nearest-neighbor, or memory-based methods

k-nearest-neighbor
• New pattern's category → plurality vote of its k closest neighbors
• Large k → decreases the chance of undue influence by noisy training patterns
• Small k → reduces the acuity of the method
• Distance metric → usually Euclidean
• In practice, features are scaled by factors aj
• Advantage → does not require training
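A compact k-nearest-neighbor sketch (Euclidean metric, plurality vote); the two-class training data and k are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=5):
    """Classify x by a plurality vote of its k nearest training patterns."""
    dists = np.linalg.norm(X_train - x, axis=1)     # Euclidean distances
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(2)
bass = rng.normal([4.0, 10.0], 1.0, size=(50, 2))   # (lightness, width), made up
salmon = rng.normal([7.0, 6.0], 1.0, size=(50, 2))
X = np.vstack([bass, salmon])
y = np.array(["bass"] * 50 + ["salmon"] * 50)

print(knn_predict(np.array([5.0, 8.0]), X, y, k=5))
```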

EEL 851 51

Nearest-Neighbor Decision
• Example with four classes of training patterns

EEL 851 52

References

• Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, 2nd ed., John Wiley & Sons, 2001 (most contents)
• Nils J. Nilsson, Introduction to Machine Learning, http://ai.stanford.edu/people/nilsson/mlbook.html
• http://www.rii.ricoh.com/%7Estork/DHS.html
• A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical Pattern Recognition: A Review," IEEE Trans. PAMI, pp. 4–37, Jan. 2000

Other Useful References

• Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, NY, 1972
• Therrien, Decision Estimation and Classification, John Wiley & Sons, New York, NY, 1989
• Hertz, Krogh, and Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991
• Bose and Liang, Neural Network Fundamentals with Graphs, Algorithms, and Applications, McGraw-Hill, New York, NY, 1996
