8/11/2019 Commonly Used Classifiers
Commonly Used Classification
Techniques and Recent Developments
Presented by Ke-Shiuan Lynn
Terminology (cont.)
In practice, input vectors of different classes are rarely so neatly
distinguishable. Samples of different classes may share the same input
vector. Because of this uncertainty, regions of the input space can be
clouded by a mixture of samples from different classes.
[Figure: samples of different classes mixed in a 2-D input space
(Input #1 vs. Input #2)]
Terminology (cont.)
The optimal classifier is the one expected to produce the fewest
misclassifications. Such misclassifications are due to uncertainty
inherent in the problem rather than a deficiency in the decision regions.
[Figure: optimal decision regions in a 2-D input space
(Input #1 vs. Input #2)]
Types of Models
Decision-Region Boundaries
Probability Density Functions
Posterior Probabilities
Decision-Region Boundaries
This type of model defines decision regions
by explicitly constructing boundaries in the
input space.
These models attempt to minimize the
number of expected misclassifications by
placing boundaries appropriately in the
input space.
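A minimal sketch of this idea (the weights and threshold below are made up
for illustration, not taken from the slides): a straight line in a 2-D
input space serves as the decision-region boundary, and each input is
classified by the side of the line it falls on.

```python
# Sketch: a hand-placed linear boundary w1*x1 + w2*x2 + b = 0 in a
# 2-D input space. Inputs on one side are class "A", the other "B".

def linear_boundary_classify(x1, x2, w=(1.0, -1.0), b=0.0):
    """Classify a 2-D input by the sign of w1*x1 + w2*x2 + b."""
    score = w[0] * x1 + w[1] * x2 + b
    return "A" if score >= 0.0 else "B"

print(linear_boundary_classify(2.0, 1.0))  # above the line x1 = x2 -> "A"
print(linear_boundary_classify(1.0, 2.0))  # below it -> "B"
```

Minimizing expected misclassifications then amounts to choosing w and b
(or a more flexible boundary) so that as few samples as possible land on
the wrong side.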
Probability Density Functions (PDFs)
Models of this type attempt to construct a class-conditional probability
density function, p(x|C), giving the density of input vector x under
class C. Prior probabilities p(C) are estimated from the given database.
The model assigns the most probable class to an input vector x by
selecting the class that maximizes p(C)p(x|C).
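A small sketch of this rule, assuming one-dimensional Gaussian
class-conditional densities with made-up priors, means, and standard
deviations (none of these numbers come from the slides):

```python
import math

# Sketch: classify x by maximizing p(C) * p(x|C), with each p(x|C)
# modeled as a 1-D Gaussian (hypothetical parameters).

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical classes: (prior p(C), mean, std) per class.
classes = {
    "C1": (0.4, 0.0, 1.0),
    "C2": (0.6, 3.0, 1.0),
}

def classify(x):
    # Pick the class maximizing p(C) * p(x|C).
    return max(classes,
               key=lambda c: classes[c][0] * gaussian_pdf(x, classes[c][1], classes[c][2]))

print(classify(0.5))  # near C1's mean -> "C1"
print(classify(2.5))  # near C2's mean -> "C2"
```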
Posterior Probabilities
Let there be m possible classes, denoted C1, C2, ..., Cm. Models of this
type attempt to generate m posterior probabilities p(Ci|x),
i = 1, 2, ..., m, for any input vector x.
The classification is made by assigning the input vector to the class
with the maximal p(Ci|x).
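A tiny numeric sketch (the scores are invented): posteriors are the
per-class joint scores p(Ci)p(x|Ci) normalized to sum to one, and the
class with the largest posterior wins.

```python
# Sketch: hypothetical unnormalized scores p(Ci)*p(x|Ci) for one input x.
joint = {"C1": 0.02, "C2": 0.06, "C3": 0.12}

# Normalize to obtain posteriors p(Ci|x) that sum to 1.
total = sum(joint.values())
posterior = {c: v / total for c, v in joint.items()}

print(round(posterior["C3"], 3))          # 0.12 / 0.20 -> 0.6
print(max(posterior, key=posterior.get))  # "C3"
```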
Approaches to Modeling
Fixed models
Parametric models
Nonparametric models
Fixed models
A fixed model is used when the exact input-output relationship is known.
Decision-region boundary: a known threshold value (e.g., a particular
BMI value for defining obesity)
PDF: when each class's PDF can be obtained
Posterior probability: when the probability that any observation belongs
to each class is known.
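The BMI example above can be sketched as a fixed decision-region
boundary. The cutoff of 30 is the common WHO convention, used here only
as an illustrative threshold; nothing is learned from data.

```python
# Sketch of a fixed model: the boundary (BMI threshold) is known in
# advance, so no training is involved.

def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2

def is_obese(weight_kg, height_m, threshold=30.0):
    # Fixed, pre-specified decision boundary in BMI space.
    return bmi(weight_kg, height_m) >= threshold

print(is_obese(95.0, 1.75))  # BMI ~31.0 -> True
print(is_obese(70.0, 1.75))  # BMI ~22.9 -> False
```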
Parametric Models (cont.)
Decision-region boundary: linear discriminant function, e.g.
    y = a*x1 + b*x2 + c*x3 + d
PDF: multivariate Gaussian function
Posterior probability: logistic regression
Nonparametric Models
Nonparametric model is used when the
relationships between input vectors and
their associated classes are not well
understood.
Models of varying smoothness and complexity are generated, and the one
with the best generalization is chosen.
Nonparametric Models (cont.)
Decision-region boundary: Learning Vector Quantization (LVQ), K-nearest-
neighbor classifier, decision tree.
PDF: Gaussian mixture methods, Parzen's window.
Posterior probability: artificial neural network (ANN), radial basis
function (RBF), group method of data handling (GMDH)
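Of the nonparametric PDF methods listed, Parzen's window is the simplest
to sketch. Assuming a 1-D Gaussian kernel and made-up sample data: the
density estimate at x is the average of kernels centered on the training
samples, with bandwidth h controlling smoothness.

```python
import math

# Sketch of a Parzen-window density estimate with a Gaussian kernel.

def parzen_density(x, samples, h=0.5):
    def kernel(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    # Average of kernels centered on each sample, scaled by bandwidth h.
    return sum(kernel((x - s) / h) for s in samples) / (len(samples) * h)

samples = [0.0, 0.2, 0.4, 3.0]  # made-up 1-D training data
# Density is higher where samples cluster:
print(parzen_density(0.2, samples) > parzen_density(2.0, samples))  # True
```

No functional form is assumed for the density; the model's complexity
grows with the data, which is the defining trait of this family.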
Commonly Used Algorithms

Parametric:
Linear regression
Logistic regression
Unimodal Gaussian

Nonparametric:
Backpropagation
Radial basis function
K nearest neighbor
Gaussian mixture
Nearest clustering
Binary/Linear decision tree
Projection pursuit
Estimate-Maximize clustering
Multivariate Adaptive Regression Splines (MARS)
Group Method of Data Handling (GMDH)
Parzen's window
Learning Vector Quantization (LVQ)
Memory Usage

Algorithm                       Memory Usage
Linear / Logistic regression    Very low
Unimodal Gaussian               Very low
Backpropagation                 Low
Radial basis function           Medium
K nearest neighbor              High
Gaussian mixture                Medium
Nearest clustering              Medium
Binary / Linear decision tree   Low
Projection pursuit              Low
Estimate-Maximize clustering    Medium
MARS                            Low
GMDH                            Low
Parzen's window                 High
LVQ                             Medium
Training Time

Algorithm                       Training Time
Linear / Logistic regression    Fast-Medium
Unimodal Gaussian               Fast-Medium
Backpropagation                 Slow
Radial basis function           Medium
K nearest neighbor              No training required
Gaussian mixture                Medium-Slow
Nearest clustering              Medium
Binary / Linear decision tree   Fast
Projection pursuit              Medium
Estimate-Maximize clustering    Medium
MARS                            Medium
GMDH                            Fast-Medium
Parzen's window                 Fast
LVQ                             Slow
Classification Time

Algorithm                       Classification Time
Linear / Logistic regression    Very fast
Unimodal Gaussian               Fast
Backpropagation                 Very fast
Radial basis function           Medium
K nearest neighbor              Slow
Gaussian mixture                Medium
Nearest clustering              Fast-Medium
Binary / Linear decision tree   Very fast
Projection pursuit              Fast
Estimate-Maximize clustering    Medium
MARS                            Fast
GMDH                            Fast
Parzen's window                 Slow
LVQ                             Medium
Comparison of Algorithms
Linear regression: y = w0 + w1*x1 + w2*x2 + ... + wN*xN
Logistic regression: y = 1 / (1 + exp(-(w0 + sum_{i=1..N} wi*xi)))
Linear and logistic regression both tend to explicitly construct the
decision-region boundaries.
Advantages: easy implementation; easy explanation of the input-output
relationship
Disadvantages: limited complexity of the constructed boundary
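The logistic-regression form can be sketched directly; the weights below
are made up for illustration. Thresholding the output at 0.5 yields a
class decision, which corresponds to the linear boundary
w0 + sum(wi*xi) = 0.

```python
import math

# Sketch: logistic regression output y = 1 / (1 + exp(-z)),
# z = w0 + sum(wi * xi), with hypothetical weights.

def logistic(x, w0, w):
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

y = logistic([1.0, 2.0], w0=-3.0, w=[1.0, 1.0])  # z = -3 + 1 + 2 = 0
print(y)                                         # 0.5, i.e. on the boundary
print("class 1" if y >= 0.5 else "class 0")
```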
Comparison of Algorithms (cont)
Binary decision tree:
Binary and linear decision trees also tend to explicitly construct the
decision-region boundaries.
Advantages: easy implementation; easy explanation of the input-output
relationship
Disadvantages: limited complexity of the constructed boundary; the tree
structure may not be globally optimal.
[Figure: binary decision tree with a root node and internal nodes
applying threshold tests on inputs xi, xj, xk against constants c1-c3]
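A tiny hand-built tree of the kind diagrammed above (the thresholds c1,
c2 and class labels are hypothetical): each internal node compares one
input against a fixed constant, and each leaf is a class.

```python
# Sketch: a two-level binary decision tree over two inputs.

def tree_classify(x1, x2, c1=1.0, c2=2.0):
    if x1 >= c1:        # root node test
        return "A"
    elif x2 >= c2:      # second-level test on the other input
        return "B"
    else:
        return "C"

print(tree_classify(1.5, 0.0))  # root test passes -> "A"
print(tree_classify(0.5, 3.0))  # second test passes -> "B"
print(tree_classify(0.5, 0.0))  # both fail -> "C"
```

Each path from root to leaf carves out an axis-aligned rectangle of the
input space, which is why the boundary complexity is limited.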
Comparison of Algorithms (cont)
Neural Network:
Feedforward neural networks and radial-basis-function networks both tend
to implicitly construct the decision-region boundaries.
Advantages: They can both approximate any
complex decision boundaries provided that enough
nodes are used.
Disadvantages: Long training time
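A minimal forward pass of such a network (one hidden layer, made-up
weights; training by backpropagation is omitted): the smooth hidden
units are what let the network represent curved decision boundaries.

```python
import math

# Sketch: forward pass of a tiny feedforward network with one hidden
# layer of two sigmoid units and a sigmoid output.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W1, b1, w2, b2):
    # Hidden activations, then a single output in (0, 1).
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

y = mlp_forward([1.0, 0.0],
                W1=[[2.0, -1.0], [-1.0, 2.0]], b1=[0.0, 0.0],
                w2=[1.5, -1.5], b2=0.0)
print(0.0 < y < 1.0)  # True: output can be read as a posterior estimate
```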
Comparison of Algorithms (cont)
Support vector machine:
The support vector machine also tends to implicitly construct the
decision-region boundaries.
Advantages: this type of classifier has been shown to have good
generalization capability.
Comparison of Algorithms (cont)
K nearest neighbor classifier:
K nearest neighbor tends to construct posterior probabilities P(Cj|x).
Advantages: no training is required; a confidence level can be obtained
Disadvantages: classification accuracy is low if a complex
decision-region boundary exists; large storage is required.
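A small sketch on made-up 2-D data: the fraction of the K neighbors
belonging to class Cj serves as a rough estimate of the posterior
P(Cj|x), i.e. the confidence level mentioned above. Note the whole
training set must be stored, which is the storage disadvantage.

```python
from collections import Counter

# Sketch: K-nearest-neighbor classification with a posterior estimate.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((3.0, 3.0), "B"), ((3.1, 2.9), "B")]  # hypothetical samples

def knn(x, k=3):
    # Squared Euclidean distance to a stored (point, label) pair.
    def dist(pair):
        return (pair[0][0] - x[0]) ** 2 + (pair[0][1] - x[1]) ** 2
    votes = Counter(label for _, label in sorted(train, key=dist)[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k  # class and its estimated posterior P(Cj|x)

print(knn((0.1, 0.1)))  # ('A', 1.0): all 3 neighbors are class A
print(knn((2.0, 2.0)))  # class 'B' with a lower confidence
```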
Other Useful Classifiers
Projection Pursuit: aims to decompose the task of high-dimensional
modeling into a sequence of low-dimensional modeling tasks.
The algorithm consists of two stages: the first stage projects the input
data onto a one-dimensional space, while the second stage constructs the
mapping from the projected space to the output space.
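The two-stage idea can be sketched as follows (the projection direction,
threshold, and data are invented for illustration): stage one reduces
each input to a single number; stage two applies a simple 1-D model, here
just a threshold, in the projected space.

```python
# Sketch: projection pursuit in miniature.

def project(x, direction):
    # Stage 1: project the high-dimensional input onto one dimension.
    return sum(xi * di for xi, di in zip(x, direction))

def classify_projected(x, direction=(1.0, 1.0, 1.0), threshold=3.0):
    # Stage 2: a simple 1-D decision rule in the projected space.
    return "A" if project(x, direction) >= threshold else "B"

print(classify_projected((2.0, 2.0, 2.0)))  # projection 6.0 -> "A"
print(classify_projected((0.5, 0.5, 0.5)))  # projection 1.5 -> "B"
```

In the full algorithm both the direction and the 1-D mapping are fitted
to the data, and several such projections can be combined.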
Other Useful Classifiers (cont)
Multivariate adaptive regression splines (MARS) tend to approximate the
decision-region boundaries in two stages.
In the first stage, the algorithm partitions the state space into small
portions.
In the second stage, the algorithm constructs a low-order polynomial to
approximate the decision-region boundary within each partition.
Disadvantage: the algorithm is intractable for problems with
high-dimensional (> 10) inputs.
Other Useful Classifiers (cont)
Group method of data handling (GMDH) also aims to approximate the
decision-region boundaries, using high-order polynomial functions.
The modeling process begins with a low-order polynomial and then
iteratively combines terms to produce higher-order polynomials until the
modeling accuracy saturates.
Keep The Following In Mind
Use multiple algorithms without bias, and let your specific data help
determine which model is best suited for your problem.
Occam's Razor: "Entities should not be multiplied unnecessarily" -- when
two competing models make exactly the same predictions on the data, the
simpler one is the better.
A New Member In Our Group