TRANSCRIPT
Generalizing Linear Discriminant Analysis
Linear Discriminant Analysis Objective
-Project a feature space (a dataset of n-dimensional samples) onto a smaller subspace
-Maintain the class separation
Reasons
-Reduce computational costs
-Minimize overfitting
Linear Discriminant Analysis We want to reduce the dimensionality while preserving the ability to discriminate between classes
Figures from [1]
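(The figures from [1] are not reproduced here. In standard LDA notation, which I am assuming rather than quoting from [1], the projection is

y = w^{\top} x, \qquad x \in \mathbb{R}^{n}, \; y \in \mathbb{R}

and the goal is to choose the direction w so that the projected samples of the classes overlap as little as possible.)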
Linear Discriminant Analysis Could just look at the class means and find the projection direction that separates the means most:
Equations from [1]
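The equations from [1] are not reproduced in the transcript; the standard statement of this means-only criterion (my reconstruction, using the notation above, not a quote from [1]) is

\tilde{\mu}_i = \frac{1}{N_i} \sum_{y \in \omega_i} y = w^{\top} \mu_i,
\qquad
J(w) = |\tilde{\mu}_1 - \tilde{\mu}_2| = |w^{\top} (\mu_1 - \mu_2)|

i.e. pick the direction w that pushes the projected class means furthest apart. This ignores the spread of each class around its mean, which is what motivates Fisher's solution below.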
Linear Discriminant Analysis
Figure from [1]
Linear Discriminant Analysis Fisher’s solution…
Scatter:
Maximize:
Equations from [1]
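The scatter of each projected class and the objective being maximized are, in the usual formulation (assumed to match the images in [1]):

\tilde{s}_i^{\,2} = \sum_{y \in \omega_i} (y - \tilde{\mu}_i)^{2},
\qquad
J(w) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^{2}}{\tilde{s}_1^{\,2} + \tilde{s}_2^{\,2}}

i.e. maximize the separation of the projected means relative to the total within-class scatter of the projections.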
Linear Discriminant Analysis Fisher’s solution…
Figure from [1]
Linear Discriminant Analysis How to get optimum w*?
◦ Must express J(w) as a function of w.
Equation from [1]
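The standard way to do this (my reconstruction of the steps whose equations appear only as images from [1]) is to introduce the within-class and between-class scatter matrices in the original feature space:

S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^{\top},
\qquad S_W = S_1 + S_2,
\qquad S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^{\top}

so that \tilde{s}_1^{\,2} + \tilde{s}_2^{\,2} = w^{\top} S_W w and (\tilde{\mu}_1 - \tilde{\mu}_2)^{2} = w^{\top} S_B w, giving

J(w) = \frac{w^{\top} S_B w}{w^{\top} S_W w}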
Linear Discriminant Analysis How to get optimum w*…
Equation from [1]
Linear Discriminant Analysis How to get optimum w*…
Equations modified from [1]
Linear Discriminant Analysis How to get optimum w*…
Equation from [1]
Linear Discriminant Analysis How to get optimum w*…
Equation from [1]
Linear Discriminant Analysis How to get optimum w*…
Equations from [1]
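The end point of this derivation, in its standard form (again a reconstruction, not a quote from [1]): setting the derivative of J(w) to zero gives the generalized eigenvalue problem

S_B w = \lambda\, S_W w

and for two classes, since S_B w always points along (\mu_1 - \mu_2), the optimum has the closed form

w^{*} = \arg\max_{w} \frac{w^{\top} S_B w}{w^{\top} S_W w} \;\propto\; S_W^{-1} (\mu_1 - \mu_2)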
Linear Discriminant Analysis How to generalize for >2 classes:
-Instead of a single projection, we calculate a matrix of projections.
-Within-class scatter becomes:
-Between-class scatter becomes:
Equations from [1]
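In the usual multi-class notation (assumed, not quoted from [1]), with C classes, class means \mu_c, class sizes N_c, and overall mean \mu:

S_W = \sum_{c=1}^{C} \sum_{x \in \omega_c} (x - \mu_c)(x - \mu_c)^{\top},
\qquad
S_B = \sum_{c=1}^{C} N_c\, (\mu_c - \mu)(\mu_c - \mu)^{\top}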
Linear Discriminant Analysis How to generalize for >2 classes…
Here, W is a projection matrix.
Equation from [1]
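As a hedged illustration of the generalized criterion (in the usual determinant-ratio form J(W) = |W^T S_B W| / |W^T S_W W|, which I am assuming is the equation referenced from [1]), here is a minimal NumPy sketch: the columns of W are the leading eigenvectors of S_W^{-1} S_B, of which at most C-1 correspond to nonzero eigenvalues. The function and variable names are my own, not from [1] or [3].

import numpy as np

def lda_projection(X, y, n_components):
    # Compute within-class (S_W) and between-class (S_B) scatter matrices,
    # then take the leading eigenvectors of S_W^{-1} S_B as the columns of W.
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for c in classes:
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)
        S_W += (X_c - mu_c).T @ (X_c - mu_c)
        diff = (mu_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # At most C-1 eigenvalues are nonzero, hence at most C-1 useful projections.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_components]]

# Usage sketch: W = lda_projection(X, y, 2); X_projected = X @ W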
Linear Discriminant Analysis Limitations of LDA:
-Parametric method
-Produces at most (C-1) projections, where C is the number of classes
Benefits of LDA:
-Linear decision boundaries
◦ Human interpretation
◦ Implementation
-Good classification results
Flexible Discriminant Analysis
Flexible Discriminant Analysis -Turns the LDA problem into a linear regression problem.
-“Differences between LDA and FDA and what criteria can be used to pick one for a given task?” (Tavish)
◦ Linear regression can be generalized into more flexible, nonparametric forms of regression.
◦ (Parametric – mean, variance…)
◦ Expands the set of predictors via basis expansions
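A minimal sketch of the basis-expansion idea, assuming scikit-learn is available. FDA as described in [2] is built on optimal scoring with flexible regression; the pipeline below is only my illustration of the same flavor (expand the predictors, then fit a linear discriminant in the expanded space), not the exact FDA algorithm:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Quadratic basis expansion followed by LDA: the decision boundaries are
# linear in the expanded features but nonlinear in the original inputs.
fda_like = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearDiscriminantAnalysis(),
)
# fda_like.fit(X_train, y_train); fda_like.predict(X_test)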
Flexible Discriminant Analysis
Figure from [2]
Penalized Discriminant Analysis
Penalized Discriminant Analysis -Fit an LDA model, but ‘penalize’ the coefficients to be more smooth.
◦ Directly curbing ‘overfitting’ problem
-Positively correlated predictors lead to noisy, negatively correlated coefficient estimates, and this noise results in unwanted sampling variance.
◦ Example: images
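The penalized fit itself is shown only as images from [2]; schematically, and hedging on the exact notation, the coefficients \beta of each discriminant direction are chosen to minimize a penalized (optimal-scoring) regression criterion of the form

\min_{\beta} \; \lVert y^{*} - X \beta \rVert^{2} + \lambda\, \beta^{\top} \Omega\, \beta

where y^{*} are the class scores used in the regression, \Omega is a penalty matrix encoding smoothness (e.g. over neighboring pixels in an image), and \lambda controls how strongly rough, high-variance coefficient patterns are shrunk.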
Penalized Discriminant Analysis
Images from [2]
Mixture Discriminant Analysis
Mixture Discriminant Analysis -Instead of enlarging the set of predictors (FDA) or smoothing the coefficients of the predictors (PDA), and instead of modeling each class with a single Gaussian:
-Model each class as a mixture of two or more Gaussian components.
-All components share the same covariance matrix.
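The resulting class-conditional density, in standard mixture notation (the exact symbols in [2] may differ), with R_c components for class c, mixing proportions \pi_{cr}, and a single covariance \Sigma shared by every component:

P(X \mid G = c) = \sum_{r=1}^{R_c} \pi_{cr}\, \phi(X;\, \mu_{cr},\, \Sigma),
\qquad
\sum_{r=1}^{R_c} \pi_{cr} = 1

The parameters are typically fit with an EM-style procedure, and classification uses the class posteriors via Bayes' rule.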
Mixture Discriminant Analysis
Image from [2]
Sources
1. Gutierrez-Osuna, Ricardo. "CSCE 666 Pattern Analysis – Lecture 10." http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf
2. Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
3. Raschka, Sebastian. "Linear Discriminant Analysis bit by bit." http://sebastianraschka.com/Articles/2014_python_lda.html
END.