Statistical Learning Theory & Classifications Based on Support Vector Machines

2014: Anders Melen
2015: Rachel Temple

The Nature of Statistical Learning Theory by V. Vapnik

Post on 19-Dec-2015


Table of Contents

• Empirical Data Modeling
• What is Statistical Learning Theory
• Model of Supervised Learning
• Risk Minimization
• Vapnik-Chervonenkis Dimensions
• Structural Risk Minimization (SRM)
• Support Vector Machines (SVM)
• Exam Questions
• Q & A Session

Empirical Data Modeling

• Observations of a system are collected
• Induction on the observations is used to build up a model of the system
• The model is then used to deduce responses of an unobserved system
• Sampling is typically non-uniform
• High-dimensional problems will form a sparse distribution in the input space


Modeling Error

• Approximation error is the consequence of the hypothesis space not fitting the target space

[Figure: the globally optimal model, the best reachable model, and the selected model]

● Goal
  ○ Choose the model from the hypothesis space that is closest (with respect to some error measure) to the target function


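• Estimation error is the error between the best model in our hypothesis space and the model we actually selected from that space.

● Together, the approximation error and the estimation error form the generalization error

[Figure: globally optimal model, best reachable model, and selected model. Approximation error separates the globally optimal and best reachable models, estimation error separates the best reachable and selected models, and generalization error spans the globally optimal and selected models.]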


• The gap between the globally optimal model and the selected model is the generalization error, which measures how well our model adapts to new, unobserved data
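The three errors can be made concrete with a small numerical sketch. Everything specific here is an assumption for illustration and not from the slides: the target is sin(x) on [0, π], the hypothesis space is straight lines, the error measure is mean squared error on a dense grid, and the "selected" model is a line fit to eight noisy observations.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares line y = a*x + b (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of a line (a, b) against reference values."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Assumed target ("globally optimal model") and a dense evaluation grid.
grid = [i * math.pi / 200 for i in range(201)]
truth = [math.sin(x) for x in grid]

# Best reachable model: the best line when the target itself is known.
best_reachable = fit_line(grid, truth)

# Selected model: a line fit to a few noisy observations only.
rng = random.Random(0)
sample_x = [rng.uniform(0, math.pi) for _ in range(8)]
sample_y = [math.sin(x) + rng.gauss(0, 0.1) for x in sample_x]
selected = fit_line(sample_x, sample_y)

approximation_error = mse(best_reachable, grid, truth)
generalization_error = mse(selected, grid, truth)
estimation_error = generalization_error - approximation_error
print(approximation_error, estimation_error, generalization_error)
```

With squared error the estimation error is defined here as the excess over the best reachable line, so it cannot be negative: no line beats the least-squares line on the grid.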


Statistical Learning Theory

Definition: “Consider the learning problem as a problem of finding a desired dependence using a limited number of observations.” (Vapnik 17)


Model of Supervised Learning

• Training
  o The supervisor takes each generated x value and returns an output value y.
  o Each (x, y) pair is part of the training set, which is drawn according to F(x, y) = F(x) F(y|x):

    (x1, y1), (x2, y2), … , (xl, yl)


Risk Minimization

• To find the best function, we need to measure loss

• L is the discrepancy (loss) function, which compares the y’s generated by the supervisor with the ŷ’s generated by the estimating functions

• F is a predictor chosen so that the expected loss is minimized

L(y, F(x, 𝛂))


• Pattern Recognition
  o With pattern recognition, the supervisor’s output y can take only two values, y ∈ {0, 1}, and the loss takes the following values:

    L(y, F(x, 𝛂)) = 0 if y = F(x, 𝛂)
    L(y, F(x, 𝛂)) = 1 if y ≠ F(x, 𝛂)

  o So the risk functional gives the probability that the supervisor and the estimating function give different answers.
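A minimal sketch of this 0/1 loss, estimated on a finite sample as the fraction of disagreements. The training pairs and the threshold rule below are hypothetical and chosen only for illustration.

```python
def zero_one_loss(y, y_hat):
    """0 when the estimate matches the supervisor's answer, 1 otherwise."""
    return 0 if y == y_hat else 1

def empirical_risk(predict, training_set):
    """Average loss over (x, y) pairs: the fraction of different answers."""
    losses = [zero_one_loss(y, predict(x)) for x, y in training_set]
    return sum(losses) / len(losses)

# Hypothetical training set and estimating function (assumptions).
training_set = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.5, 1)]

def predict(x):
    return 1 if x > 0.55 else 0

risk = empirical_risk(predict, training_set)
print(risk)  # 0.2: the rule disagrees with the supervisor only on x = 0.5
```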


Some Simplifications From Here On

● Training Set
  {(x1, y1), … , (xl, yl)} → {z1, … , zl}

● Loss Function
  L(y, F(x, 𝛂)) → Q(z, 𝛂)


Empirical Risk Minimization (ERM)

● We want to measure the risk over the training set rather than the set of all


● The empirical risk must converge to the actual risk over the set of loss functions


● In both directions!
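The convergence requirement can be watched numerically on a toy problem. The setup is an assumption, not from the slides: x is uniform on [0, 1], the supervisor answers y = 1 when x > 0.5, and the set of functions is threshold rules "predict 1 when x > t", whose actual risk is |t - 0.5|. The sketch reports the largest two-sided gap between empirical and actual risk over a grid of thresholds.

```python
import bisect
import random

def sup_deviation(l, thresholds, rng):
    """Largest |R_emp(t) - R(t)| over the given threshold rules.

    Assumed toy setup: x ~ Uniform[0, 1], supervisor y = 1 if x > 0.5,
    candidate rule predicts 1 if x > t, so its actual risk is |t - 0.5|.
    """
    xs = sorted(rng.random() for _ in range(l))
    worst = 0.0
    for t in thresholds:
        lo, hi = min(t, 0.5), max(t, 0.5)
        # Exactly the points with lo < x <= hi are misclassified.
        errors = bisect.bisect_right(xs, hi) - bisect.bisect_right(xs, lo)
        emp_risk = errors / l
        true_risk = hi - lo
        # abs() makes the check two-sided ("in both directions").
        worst = max(worst, abs(emp_risk - true_risk))
    return worst

rng = random.Random(0)
grid = [i / 100 for i in range(101)]
for l in (10, 100, 1000, 10000):
    print(l, round(sup_deviation(l, grid, rng), 4))
```

The printed gap should shrink as l grows; taking the maximum over all thresholds, rather than checking one fixed rule, is what makes this a statement about the whole set of functions.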


Vapnik-Chervonenkis Dimensions

• Let's just call them VC dimensions

• Developed by Alexey Jakovlevich Chervonenkis & Vladimir Vapnik

• The VC dimension is a scalar value that measures the capacity of a set of functions


• The VC dimension of a set of functions is responsible for the generalization ability of learning machines

• The VC dimension of a set of indicator functions Q(z, 𝛂), 𝛂 ∈ 𝞚, is the maximum number h of vectors z1, …, zh that can be separated into two classes in all 2^h possible ways using functions of the set.
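The definition can be checked by brute force for a very small set of functions. As an assumed example (not from the slides), take threshold rules on the real line in both orientations: they separate two points in all 2^2 ways, but not three points in all 2^3 ways, which is consistent with a VC dimension of 2. Any three distinct points on a line behave the same way, since only their order matters.

```python
def threshold_rules(points):
    """Candidate rules (s, t): label 1 when s * (x - t) > 0, else 0.

    One cut below, between, and above the sorted points, in both
    orientations, covers every labeling these rules can produce.
    """
    pts = sorted(points)
    cuts = [pts[0] - 1.0]
    cuts += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    cuts += [pts[-1] + 1.0]
    return [(s, t) for s in (1, -1) for t in cuts]

def shatters(points):
    """True if the rules realize all 2^h labelings of the h points."""
    achievable = {
        tuple(1 if s * (x - t) > 0 else 0 for x in points)
        for s, t in threshold_rules(points)
    }
    return len(achievable) == 2 ** len(points)

print(shatters([0.0, 1.0]))       # True: all 4 labelings are reachable
print(shatters([0.0, 1.0, 2.0]))  # False: only 6 of the 8 labelings
```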


Upper Bound For Risk

• It can be shown that

R(𝛂) ≤ Remp(𝛂) + Φ(l/h)

where Φ(l/h) is the confidence interval, l is the number of training examples, and h is the VC dimension


• ERM only minimizes the empirical risk Remp(𝛂); the confidence interval is fixed by the VC dimension of the set of functions, which is determined a priori

• With ERM, the confidence interval must be tuned to the problem at hand to avoid overfitting and underfitting
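To get a feel for how the confidence interval behaves, the sketch below evaluates one commonly quoted form of the VC confidence term for a 0/1 loss. The exact expression is an assumption here, since the formula on the slide did not survive: sqrt((h (ln(2l/h) + 1) - ln(η/4)) / l), holding with probability at least 1 - η. The function name `vc_confidence` is hypothetical.

```python
import math

def vc_confidence(h, l, eta=0.05):
    """Assumed VC confidence term for VC dimension h and l examples.

    sqrt((h * (ln(2l / h) + 1) - ln(eta / 4)) / l), which is one
    commonly quoted form and not necessarily the one on the slide.
    """
    if not 0 < h <= l:
        raise ValueError("expected 0 < h <= l")
    inner = h * (math.log(2 * l / h) + 1) - math.log(eta / 4)
    return math.sqrt(inner / l)

# Fixed capacity (h = 10): the term shrinks as the sample grows.
for l in (100, 1000, 10000):
    print(l, round(vc_confidence(10, l), 3))
```

For a fixed set of functions the only way to shrink this term is more data, which is why ERM alone cannot trade capacity against fit.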


Structural Risk Minimization (SRM)

• SRM attempts to minimize the right-hand side of the inequality over both terms simultaneously


• The empirical risk term Remp(𝛂) depends on a specific function’s error, while the confidence interval term depends on the VC dimension of the space that the function lives in.

• The VC dimension is the controlling variable


• We define the hypothesis space S to be the set of functions Q(z, 𝛂), 𝛂 ∈ 𝞚

• We say that Sk = {Q(z, 𝛂), 𝛂 ∈ 𝞚k} is the hypothesis space of VC dimension k, such that the spaces are nested:

S1 ⊂ S2 ⊂ … ⊂ Sk ⊂ …
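A rough sketch of the selection step SRM performs over such a nested structure: for each Sk, add the empirical risk of its best function to a confidence term that grows with the VC dimension, and keep the k with the smallest sum. The confidence formula is an assumed, commonly quoted form, and the empirical risks and VC dimensions are made-up numbers for illustration only.

```python
import math

def guaranteed_risk(emp_risk, h, l, eta=0.05):
    """Empirical risk plus an assumed VC confidence term."""
    inner = h * (math.log(2 * l / h) + 1) - math.log(eta / 4)
    return emp_risk + math.sqrt(inner / l)

def srm_select(emp_risks, vc_dims, l):
    """Index of the nested set with the smallest bound, plus all bounds."""
    bounds = [guaranteed_risk(r, h, l) for r, h in zip(emp_risks, vc_dims)]
    best = min(range(len(bounds)), key=bounds.__getitem__)
    return best, bounds

# Hypothetical nested structure: richer sets fit the data better
# (lower empirical risk) but pay a larger confidence term.
vc_dims = [2, 5, 20, 100]
emp_risks = [0.30, 0.12, 0.08, 0.07]
best, bounds = srm_select(emp_risks, vc_dims, l=1000)
print(best, [round(b, 3) for b in bounds])
```

In this made-up example the second set wins: the richest set has the lowest empirical risk but the largest bound.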


Support Vector Machines (SVM)

• Map input vectors x into a high-dimensional feature space, where inner products are computed with a kernel function:

(zi · z) = K(x, xi)
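The identity can be verified directly for one simple kernel. As an assumed example (the slides do not fix a kernel here), take the degree-2 polynomial kernel K(x, xi) = (x · xi)^2 on 2-D inputs, whose explicit feature map is z = (x1^2, √2 x1 x2, x2^2). The inner product of the mapped vectors equals the kernel value computed in the input space.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, xi):
    """Degree-2 homogeneous polynomial kernel (x . xi)^2."""
    return dot(x, xi) ** 2

def feature_map(x):
    """Explicit feature-space image z of a 2-D input x for this kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, xi = (1.0, 2.0), (3.0, -1.0)
in_feature_space = dot(feature_map(xi), feature_map(x))
in_input_space = poly_kernel(x, xi)
print(in_feature_space, in_input_space)
```

The kernel side never builds z at all, which is the point: the feature space can be very high-dimensional while the computation stays in the input space.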


• Feature space… Optimal hyperplane…

What are you talking about...


● Let's try a basic one dimensional example!


● Aw snap, that was easy!


● Ok, what about a harder one dimensional example?


● Project the lower dimensional data into a higher dimensional space just like in the animation!
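A small sketch of that idea with made-up data: one class sits in the middle of the line, so no single threshold on x separates the classes, but after mapping x to (x, x^2) a threshold on the new x^2 coordinate does. The quadratic map is an assumption chosen for illustration.

```python
def separable_by_threshold(values, labels):
    """True if one threshold on `values` (either side) reproduces `labels`."""
    pts = sorted(set(values))
    cuts = [pts[0] - 1]
    cuts += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    cuts += [pts[-1] + 1]
    for t in cuts:
        for sign in (1, -1):
            predicted = [1 if sign * (v - t) > 0 else 0 for v in values]
            if predicted == list(labels):
                return True
    return False

# Hypothetical 1-D data: class 1 in the middle, class 0 on both sides.
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [0, 0, 1, 1, 1, 0, 0]

# In the original space no threshold works.
print(separable_by_threshold(xs, labels))       # False

# Project to (x, x^2); a horizontal line in that plane is a threshold on x^2.
squares = [x * x for x in xs]
print(separable_by_threshold(squares, labels))  # True
```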


● There are several ways to implement an SVM

○ Polynomial Learning Machine (Like the animation)

○ Radial Basis Function Machines

○ Two-Layer Neural Networks


Simple Neural Network

● Neural Networks are computer science models inspired by nature!

● The brain is a massive natural neural network consisting of neurons and synapses

● Neural networks can be modeled using a graphical model


● Neurons → Nodes
● Synapses → Edges

[Figure: "Molecular Form" and "Neural Network Model" panels]


Two-Layer Neural Network

Kernel is a sigmoid function

Implementing the rules


● Using this technique the following are found automatically:

i. Architecture of a two-layer machine

ii. The number N of units in the first layer (the number of support vectors)

iii. The vectors of the weights wi = xi in the first layer

iv. The vector of weights for the second layer (values of 𝛂)
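The resulting machine can be sketched as a forward pass. This assumes a tanh sigmoid kernel of the form tanh(v (x · xi) + c) and a sign output; the support vectors, weights, and the parameters v, c, and b below are hypothetical values, not the result of any training.

```python
import math

def two_layer_decision(x, support_vectors, alphas, b=0.0, v=1.0, c=0.0):
    """sign(sum_i alpha_i * tanh(v * (x . x_i) + c) + b).

    First layer: one unit per support vector, with weights w_i = x_i.
    Second layer: the weights alpha_i. Returns +1 or -1.
    """
    hidden = [
        math.tanh(v * sum(a * s for a, s in zip(x, sv)) + c)
        for sv in support_vectors
    ]
    total = sum(a * h for a, h in zip(alphas, hidden)) + b
    return 1 if total > 0 else -1

# Hypothetical machine with N = 2 first-layer units.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
alphas = [1.0, -1.0]

print(two_layer_decision((2.0, 0.0), support_vectors, alphas))   # 1
print(two_layer_decision((-2.0, 0.0), support_vectors, alphas))  # -1
```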


Conclusion

● The quality of a learning machine is characterized by three main components

a. How rich and universal is the set of functions that the LM can approximate?

b. How well can the machine generalize?

c. How fast does the learning process for this machine converge?


Exam Question #1

• What is the main difference between polynomial, radial basis function, and neural network learning machines? Also provide that difference for the neural network learning machine.
  o The kernel function. For the two-layer neural network machine, the kernel is a sigmoid function.


Exam Question #2

• What is empirical data modeling? Give a summary of the main concept and its components.
  o Empirical data modeling uses induction on observations to build up a model. The model is then used to deduce responses of an unobserved system.


Exam Question #3

• What must Remp(𝛂) do over the set of loss functions?
  o It must converge to R(𝛂)


Table of Contents

• Empirical Data Modeling
• What is Statistical Learning Theory
• Model of Supervised Learning
• Risk Minimization
• Vapnik-Chervonenkis Dimensions
• Structural Risk Minimization (SRM)
• Support Vector Classification
  o Optimal Separating Hyperplane & Quadratic Programming
• Support Vector Machines (SVM)
• Exam Questions
• Q & A Session


End

Any questions?
