INF3490 - Biologically inspired computing: Support Vector Machines, Ensemble Learning, and Dimensionality Reduction
Weria Khaksar, October 17, 2018
Support Vector Machines (SVM)
Support Vector Machines (SVM): Background
SVM is used for extreme classification cases, e.g. deciding whether an image shows a cat or a dog.
Remember the inefficiency of the Perceptron?
Linear Separability
A trick to solve it …
It is always possible to separate out two classes with a linear function, provided that you project the data into the correct set of dimensions.
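As a concrete illustration (with made-up one-dimensional data, not an example from the slides): points that cannot be split by a single threshold on the number line become linearly separable once we add the extra feature x².

import numpy as np

# Hypothetical 1-D data: the "inner" points (|x| < 1) and the "outer" points
# (|x| > 1) cannot be separated by a single threshold on the number line.
x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
labels = (np.abs(x) > 1).astype(int)      # 1 = outer class, 0 = inner class

# Project into two dimensions with phi(x) = (x, x^2): now the horizontal
# line x^2 = 1 separates the two classes perfectly.
projected = np.column_stack([x, x ** 2])
print(projected)
print(labels)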
Support Vector Machines (SVM): The Margin
Which line is the best separator?
Why do we need the best line?
Which line is the best separator? The one with the largest margin.
Support Vector Machines (SVM): Support Vectors
Which data points are important?
The data points in each class that lie closest to the classification line are called Support Vectors.
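A minimal sketch, assuming scikit-learn (the library is not named in the slides): after fitting a linear SVM on two well-separated point clouds, only a handful of points, the support vectors, end up defining the separating line.

import numpy as np
from sklearn.svm import SVC

# Two made-up, well-separated 2-D classes.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(20, 2))
class_b = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(20, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The points closest to the separating line; the rest of the data could be
# thrown away without changing the classifier.
print(clf.support_vectors_)
print(len(clf.support_vectors_), "of", len(X), "points are support vectors")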
Support Vector Machines (SVM): Optimal Separation
The margin should be as large as possible.
The best classifier is the one that passes through the middle of the margin region.
We can throw away the other data and use only the support vectors for classification.
Support Vector Machines (SVM): The Math.
$$\text{Maximize } |M| \quad \text{s.t.}\quad t_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n$$
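The margin of a separator with weights $\mathbf{w}$ works out to $2/\|\mathbf{w}\|$, so the standard, equivalent way to write this optimisation (not taken from the slide itself) is a minimisation:

$$\min_{\mathbf{w},\, b}\ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.}\quad t_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n$$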
Support Vector Machines (SVM): Slack Variables for Non-Linearly Separable Problems:
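A sketch of the standard soft-margin formulation: slack variables $\xi_i \ge 0$ allow individual points to violate the margin, and the parameter $C$ controls how heavily violations are penalised.

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}}\ \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad t_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0, \qquad i = 1, \dots, n$$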
Support Vector Machines (SVM): KERNELS
The trick is to modify the input features in some way so that the data can be classified linearly.
The main idea is to replace the input feature vector, 𝐱, with some function of it, 𝜙(𝐱).
The main challenge is to automate the algorithm so that it finds a suitable function without requiring domain knowledge.
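A minimal sketch of why this works without ever building 𝜙(𝐱) explicitly (the kernel trick), using a made-up polynomial kernel: the kernel value computed in the original input space equals the inner product of the explicitly projected features.

import numpy as np

def phi(v):
    # Explicit feature map for 2-D input: phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2).
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def poly_kernel(x, z):
    # Polynomial kernel K(x, z) = (x . z)^2, computed directly in input space.
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

print(np.dot(phi(x), phi(z)))   # inner product in the projected space
print(poly_kernel(x, z))        # same value, no explicit projection needed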
Support Vector Machines (SVM): SVM Algorithm:
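As a rough sketch of what an SVM solver actually optimises (the standard kernelised soft-margin dual, stated here as background rather than taken from the slide): one multiplier $\lambda_i$ per training point, and only the support vectors end up with $\lambda_i > 0$.

$$\max_{\boldsymbol{\lambda}}\ \sum_{i=1}^{n}\lambda_i - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j\, t_i t_j\, K(\mathbf{x}_i,\mathbf{x}_j) \quad \text{s.t.}\quad \sum_{i=1}^{n}\lambda_i t_i = 0,\quad 0 \le \lambda_i \le C$$

New points are then classified by $\operatorname{sign}\left(\sum_i \lambda_i t_i K(\mathbf{x}_i, \mathbf{x}) + b\right)$.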
Support Vector Machines (SVM): SVM Examples:
Performing nonlinear classification via linear separation in a higher-dimensional space.
The SVM learning about a linearly separable dataset (top row) and a dataset that needs two straight lines to separate in 2D (bottom row), with the linear kernel on the left, the polynomial kernel of degree 3 in the middle, and the RBF kernel on the right.
The effects of different kernels when learning a version of XOR.
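A minimal sketch, assuming scikit-learn, of roughly what this comparison looks like on an XOR-style dataset: a linear kernel cannot separate the quadrants, while an inhomogeneous polynomial kernel and the RBF kernel can.

import numpy as np
from sklearn.svm import SVC

# Made-up XOR-style data: the label is determined by the sign of x1 * x2.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

kernels = [
    ("linear", SVC(kernel="linear")),
    ("poly, degree 3", SVC(kernel="poly", degree=3, coef0=1.0)),
    ("rbf", SVC(kernel="rbf", gamma="scale")),
]
for name, clf in kernels:
    clf.fit(X, y)
    print(name, clf.score(X, y))   # training accuracy: near chance for linear, much higher for the others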
Ensemble Learning
Ensemble Learning: Background
Have lots of simple learners that each give slightly different results.
Put them together in a proper way.
The combined results are significantly better (see the sketch below).
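A minimal sketch, assuming scikit-learn: three different simple learners are trained on the same data and combined by majority vote.

from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# A small made-up dataset with a non-linear class boundary.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("logistic", LogisticRegression()),
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("naive_bayes", GaussianNB()),
    ],
    voting="hard",   # each learner gets one vote; the majority decides
)
ensemble.fit(X, y)
print(ensemble.score(X, y))   # typically better than the weakest member alone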
Ensemble Learning: Important Considerations
Which learners should we use?
How should we ensure that they learn different things?
How should we combine their results?
Ensemble Learning: Background
If we take a collection of very poor learners, each performing only just better than chance, then by putting them together it is possible to make an ensemble learner that can perform arbitrarily well.
We just need lots of low-quality learners, and a way to put them together usefully, and we can make a learner that will do very well.
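A back-of-the-envelope illustration of that claim, under the idealised assumption that the learners make independent errors: each learner is right only 55% of the time, yet a majority vote over enough of them is right almost always.

from math import comb

def majority_correct(n_learners, p_correct):
    # Probability that strictly more than half of n independent learners are correct.
    k_needed = n_learners // 2 + 1
    return sum(
        comb(n_learners, k) * p_correct ** k * (1 - p_correct) ** (n_learners - k)
        for k in range(k_needed, n_learners + 1)
    )

for n in (1, 11, 101, 1001):
    print(n, round(majority_correct(n, 0.55), 4))   # accuracy of the majority vote grows with n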
Ensemble Learning: How it works?
Ensemble Learning: BOOSTING
As points are misclassified, their weights increase in boosting (shown by the data points getting larger), which increases their importance and makes subsequent classifiers pay more attention to them.
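The standard AdaBoost weight update matching this description (one common formulation, with labels and predictions in {−1, +1}) is:

$$\epsilon_t = \sum_{i=1}^{n} w_i^{(t)}\,\mathbb{1}\!\left[h_t(\mathbf{x}_i) \ne y_i\right], \qquad \alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}, \qquad w_i^{(t+1)} = \frac{w_i^{(t)}\, e^{-\alpha_t y_i h_t(\mathbf{x}_i)}}{Z_t}$$

so misclassified points have their weight multiplied by $e^{\alpha_t}$, correctly classified points by $e^{-\alpha_t}$, and $Z_t$ renormalises the weights to sum to one.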
AdaBoost: How it works?
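A minimal sketch, assuming scikit-learn: AdaBoostClassifier runs the reweighting loop described above, fitting a sequence of weak learners (by default one-split decision stumps) on reweighted versions of the training set.

from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# The default weak learner is a one-split decision stump; each boosting round
# refits it on the reweighted data and adds it to the ensemble with weight alpha_t.
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X, y)
print(ada.score(X, y))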
Ensemble Learning: BAGGING
Bagging (Bootstrap Aggregating): How it works?
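A minimal from-scratch sketch (using scikit-learn decision trees as the base learners, an assumption of this example): each tree is trained on a bootstrap sample, drawn with replacement, and the ensemble predicts by majority vote.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    # Bootstrap sample: draw n indices with replacement from the training set.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregate: majority vote over the individual trees (labels are 0/1).
votes = np.stack([tree.predict(X) for tree in trees])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print((majority == y).mean())   # ensemble accuracy on the training set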
Ensemble Learning: Summary
Boosting trains its learners sequentially, increasing the weights of the points that earlier learners misclassified; bagging trains its learners in parallel on bootstrap samples and combines their outputs by voting.
Dimensionality reduction
Dimensionality reduction: Why?
When looking at data and plotting results, we can never go beyond three dimensions.
The higher the number of dimensions we have, the more training data we need.
The dimensionality is an explicit factor in the computational cost of many algorithms.
Reducing the dimensionality can remove noise, significantly improve the results of the learning algorithm, make the dataset easier to work with, and make the results easier to understand.
Dimensionality reduction: How?
Feature Selection: looking through the features that are available and seeing whether or not they are actually useful (see the sketch below).
Feature Derivation: deriving new features from the old ones, generally by applying transforms to the dataset.
Clustering: grouping together similar data points and seeing whether this allows fewer features to be used.
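A minimal sketch of the simplest possible feature selection (an illustration with made-up data, not a method prescribed by the slides): drop features whose variance is essentially zero, since they cannot help distinguish the classes.

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=100),     # a feature that varies
    rng.normal(size=100),     # another varying feature
    np.full(100, 3.0),        # a constant feature: variance 0, carries no information
])

keep = X.var(axis=0) > 1e-8   # keep only the features that actually vary
X_reduced = X[:, keep]
print(X.shape, "->", X_reduced.shape)   # (100, 3) -> (100, 2)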
Dimensionality reduction: Principal Components Analysis (PCA)
The principal component is the direction in the data with the largest variance.
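A minimal sketch of PCA with plain NumPy (made-up data): centre the data, form the covariance matrix, and the eigenvector with the largest eigenvalue is the principal component, i.e. the direction of largest variance.

import numpy as np

# Made-up 2-D data that varies mostly along one diagonal direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.8], [0.0, 0.3]])

X_centred = X - X.mean(axis=0)
cov = np.cov(X_centred, rowvar=False)    # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order

principal_component = eigvecs[:, np.argmax(eigvals)]
print(principal_component)               # the direction of largest variance

# Projecting onto the principal component reduces the data from 2-D to 1-D.
projected = X_centred @ principal_component
print(projected.shape)                   # (200,)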
PCA is a linear transformation:
• Does not directly help with data that is not linearly separable.
• However, it may make learning easier because of the reduced complexity.
PCA removes some information from the data:
• It might just be noise.
• It might provide helpful nuances that may be of help to some classifiers.
Dimensionality reduction: Principal Components Analysis (PCA) Example
How to project samples into the variable space.