
Support Vector Machines: Optimization of Decision Making

Christopher Katinas March 10, 2016

Overview
• Background of Support Vector Machines
• Segregation Functions/Problem Statement
• Methodology
• Training/Testing Results
• Conclusions

Support Vector Machines (SVMs)
• Goal: Maximize the margin between two distinct groups via a segregation function
• Certain engineering problems require high-certainty estimation of the equation separating two data sets (ex. phase diagrams in thermodynamics)
• Distinct phases are separated by functions which may not be described easily in closed form

https://en.wikipedia.org/wiki/Phase_diagram

Can the liquid/vapor line be recreated using only select points, with an SVM identifying the function?

Segregation Functions to Match

$y(x) = 0.01x^2 + 5$ (Parabola)

$y(x) = 0.5x + 25$ (Line)

$y(x) = 10^{\,8.07131 - \frac{1730.63}{x + 233.426}} \cdot \frac{101.325}{760}$ (Antoine equation for the vapor pressure of water; the factor 101.325/760 converts mmHg to kPa)

$y(x) = \pm\sqrt{23^2 - (x - 54)^2} + 50$ (Circle of radius 23 centered at (54, 50))

$y(x) = \pm\sqrt{6^2 - \frac{(x - 84)^2}{6}} + 20$ (Ellipse centered at (84, 20))
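For concreteness, here is a minimal Matlab sketch of how these five boundaries could be encoded; the function names are illustrative, and the ellipse line follows the reconstruction above:

```matlab
% The five segregation functions, as read from the slides (names illustrative)
parabola = @(x) 0.01*x.^2 + 5;
lineFcn  = @(x) 0.5*x + 25;
antoine  = @(x) 10.^(8.07131 - 1730.63./(x + 233.426)) * (101.325/760); % mmHg -> kPa
circle   = @(x,s) s.*sqrt(23^2 - (x - 54).^2) + 50;   % s = +1/-1 picks the branch
ellipse  = @(x,s) s.*sqrt(6^2 - (x - 84).^2/6) + 20;  % reconstruction assumed above
```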

Methodology
• Solve the Lagrangian dual problem:

$$\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$

such that $C \geq \alpha_i \geq 0$ and $\sum_{i=1}^{n} y_i \alpha_i = 0$

Equivalently, as the minimization that 'quadprog' expects:

$$\min_{\alpha}\; -\sum_{i=1}^{n} \alpha_i + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$

such that $C \geq \alpha_i \geq 0$ and $\sum_{i=1}^{n} y_i \alpha_i = 0$

Kernel function: $K(\mathbf{x}_i, \mathbf{x}_j) = \left(P + A\,\mathbf{x}_i^T \mathbf{x}_j\right)^d$

Matlab 'quadprog' can solve this!

• Select A to prevent numerical overflow for a given d; P should be large to force the optimizer to solve for the correct weights
– $A^{-1} = \max(\mathbf{x}_i^T \mathbf{x}_j)$ [normalize inputs]
– $P = (1\mathrm{e}10)^{1/d}$

• C was set to 1.0 for all simulations performed in this study (hence the choice of A and P)
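As a rough illustration rather than the author's actual code, the dual problem with this kernel can be assembled for Matlab's 'quadprog' as below; the training matrix X, labels y, and the support-vector tolerance are assumptions:

```matlab
% Sketch: solve the dual SVM with quadprog.
% Assumes X is an n-by-2 matrix of inputs and y an n-by-1 vector in {-1,+1}.
n = size(X, 1);
d = 8;                        % polynomial kernel order used in the study
A = 1 / max(max(X * X'));     % A^-1 = max(xi'*xj), i.e., normalize the inputs
P = 1e10^(1/d);               % large bias constant (value as reconstructed above)
C = 1.0;                      % box constraint used for all simulations

K = (P + A * (X * X')).^d;    % kernel matrix K(xi,xj) = (P + A*xi'*xj)^d
H = (y * y') .* K;            % quadratic term of the dual objective
f = -ones(n, 1);              % linear term: minimize -sum(alpha) + quadratic part

% Constraints: 0 <= alpha_i <= C and sum_i y_i*alpha_i = 0
alpha = quadprog(H, f, [], [], y', 0, zeros(n, 1), C * ones(n, 1));
sv = alpha > 1e-6;            % support vectors (tolerance is an assumption)
```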

Training Method
• Use Delaunay triangulation to identify the most critical points and query the function close to the boundary (see the sketch after this list)
– RED line segments denote where the segregation function must reside
– Specify a maximum number of refinements
– Keep only the points which bound the function, for faster optimization
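A minimal sketch of one refinement step under this scheme is below; 'labelfun', the oracle that labels a queried point, is hypothetical, and querying edge midpoints is an assumption about how points near the boundary are generated:

```matlab
% Sketch: one Delaunay refinement step. X, y are the current training data;
% labelfun is a hypothetical oracle returning +1/-1 for queried points.
tri   = delaunay(X(:,1), X(:,2));        % triangulate the current points
edges = [tri(:,[1 2]); tri(:,[2 3]); tri(:,[3 1])];
edges = unique(sort(edges, 2), 'rows');  % unique undirected edges
mixed = y(edges(:,1)) ~= y(edges(:,2));  % "red" edges straddling the boundary
newX  = (X(edges(mixed,1),:) + X(edges(mixed,2),:)) / 2;  % query edge midpoints
X = [X; newX];
y = [y; labelfun(newX)];                 % label the new points, then repeat
```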

Training/Testing Results
• All results shown for 8 refinements and five random seeding points on each side of the function
– An eighth-order polynomial kernel was used; all training points were kept

Testing-plot legend: Magenta X = Group 1 test points; Blue area = testing Group 1; Green X = Group 2 test points; Maroon area = testing Group 2; Cyan circles = support vectors; Yellow line = actual boundary

Training-plot legend: Magenta X = Group 1 test points; Green X = Group 2 test points; Blue lines = segregation function anti-gate; Red lines = segregation function gate; Black circles = desired new points
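Continuing the assumptions from the 'quadprog' sketch above, classifying the test points from the trained alphas could be sketched as follows, with the bias recovered from the on-margin support vectors:

```matlab
% Sketch: classify test points Xtest using the trained alphas (see above).
kfun = @(Xa, Xb) (P + A * (Xa * Xb')).^d;   % same polynomial kernel
free = alpha > 1e-6 & alpha < C - 1e-6;     % on-margin support vectors
b = mean(y(free) - kfun(X(free,:), X(sv,:)) * (alpha(sv) .* y(sv)));
scores = kfun(Xtest, X(sv,:)) * (alpha(sv) .* y(sv)) + b;
group  = sign(scores);                      % +1 = Group 1, -1 = Group 2
```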

Training/Testing Results
• Parabola – 0.68% error
• Line – 0.54% error
• Antoine – 0.94% error
• Circle – 0.13% error
• Circle/Ellipse – 0.60% error

Training/Testing Results
• Error in the Antoine equation case was due to having no test points at the boundaries
– Created one point at each corner of the domain [pre-seeding]
• Antoine – 0.94% error with no pre-seeding
• Antoine – 0.26% error with pre-seeded boundary points only
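The pre-seeding step itself is small; a sketch, with the domain bounds and 'labelfun' carried over as assumptions from the earlier sketches:

```matlab
% Sketch: pre-seed one labeled point at each corner of the domain.
corners = [xmin ymin; xmin ymax; xmax ymin; xmax ymax];  % assumed domain bounds
X = [X; corners];
y = [y; labelfun(corners)];   % labelfun is the hypothetical labeling oracle
```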

Training/Testing Results with Noise
• Slack variables are automatically included by the methodology shown earlier (the box constraint $C \geq \alpha_i \geq 0$ already allows a soft margin)
• More support vectors than in the no-noise case, due to the higher difficulty of fitting the segregation function
• 5 units of uniform random noise prescribed in each input variable
• Antoine – 0.26% error with zero noise
• Antoine – 0.70% error with noise
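If "5 units" means uniform noise drawn from [-5, 5] in each input variable (an assumption about the slide's wording), the perturbation could be sketched as:

```matlab
% Sketch: add uniform random noise of +/-5 units to each input coordinate
Xnoisy = X + (2 * rand(size(X)) - 1) * 5;  % rand is U(0,1); scaled to [-5, 5]
```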

Conclusions
• SVMs are extremely versatile in allowing for quantifiable decision-making strategies
• The capability of support vector machines was successfully demonstrated via five examples
• Care must be taken in selecting the parameters and training points
– A poor choice of the number of training points can lead to an improper bounding function and, ultimately, higher error
– Delaunay triangulation is a new method to acquire more desirable training points over random domain space
– The modified kernel function constants were based on optimization versatility and general convergence
– Noise can be included, and the SVM is capable of creating a reasonable segregation function