An Introduction to Machine Learning
Introduction
Supervised Learning
  Generalized Linear Models
  Support Vector Machines
  Decision Trees
Unsupervised Learning
  Manifold learning
  Clustering
  Neural Networks
Model Selection and Evaluation
Applications
Pitfalls to avoid
Machine Learning that Matters
Bibliography
An Introduction to Machine Learning
Research Group Meeting 2017
Melissa R. McGuirl
February 22, 2017
An Introduction to Machine Learning February 22, 2017 1 / 45
Table of Contents
1 Introduction
2 Supervised Learning
  Generalized Linear Models
  Support Vector Machines
  Decision Trees
3 Unsupervised Learning
  Manifold learning
  Clustering
  Neural Networks
4 Model Selection and Evaluation
5 Applications
6 Pitfalls to avoid
7 Machine Learning that Matters
8 Bibliography
Introduction
General Set-up
Set-up and Goal
Suppose we have data samples $X_1, X_2, \ldots, X_n$. Can we predict properties of any given $X_{n+1}, X_{n+2}, \ldots, X_N$?
Machine learning systems attempt to predict properties of unknown data based on the attributes or features of the data.
Supervised learning
1. Data comes with attributes that we want our algorithm to predict
2. The machine learning algorithm is given data attributes and desired outputs, and the goal is to learn a general way to map inputs to outputs
3. Two main types of learning:
   1. Classification: data belongs to different classes/groups, and we want to predict which class/group unlabeled data belongs to
   2. Regression: data is labeled with one or more continuous variables (parameters), and the task is to predict the value of these variables for unknown data
Image source: http://ipython-books.github.io/featured-04/
Unsupervised Learning
1. No labels are given to the algorithm
2. Training data consists of a set of input vectors $\{X_1, X_2, \ldots, X_n\}$ with no target outputs
3. Three types of goals in this setting:
   1. Clustering: discover groups with similar features within the data
   2. Density Estimation: determine the distribution of data within the input space
   3. Dimensionality Reduction: project data into a lower-dimensional space than the input space
Supervised Learning Generalized Linear Models
Generalized Linear Models
Set-up: Data in the form of $X_i = (x^i_1, x^i_2, \ldots, x^i_p)$
Goal: Regression
Find $\hat{y}$:
$\hat{y}(w, X_i) = w_0 + w_1 x^i_1 + w_2 x^i_2 + \ldots + w_p x^i_p$,
where $w_0$ is called the intercept and $w_1, \ldots, w_p$ are the coefficients.
GLM: Ordinary Least Squares
Fits the linear model $\hat{y}(w, X_i) = w_0 + w_1 x^i_1 + w_2 x^i_2 + \ldots + w_p x^i_p$,
where the coefficients $w = (w_1, \ldots, w_p)$ are obtained by solving
$\min_w \|Xw - y\|_2^2$
- The model relies on independence of the model terms
- If terms are correlated, the least-squares estimate is highly sensitive to random errors in the observed response (large variance)
- Complexity: $O(np^2)$
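As a minimal sketch of the least-squares fit, assume a single feature so that the minimizer has a simple closed form. The function name `fit_ols_1d` and the toy data are illustrative and not taken from the slides.

```python
def fit_ols_1d(xs, ys):
    """Least-squares fit of y ~ w0 + w1 * x for one feature (closed form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept follows from the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1


# Toy data generated from y = 1 + 2x, so the fit should recover w0 = 1, w1 = 2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_ols_1d(xs, ys)
```

With several features the same minimization is usually handed to a linear-algebra routine rather than written out by hand.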
GLM: Ridge Regression
Fits the linear model $\hat{y}(w, X_i) = w_0 + w_1 x^i_1 + w_2 x^i_2 + \ldots + w_p x^i_p$,
where the coefficients $w = (w_1, \ldots, w_p)$ are obtained by solving
$\min_w \|Xw - y\|_2^2 + \alpha \|w\|_2^2$
- $\alpha > 0$ is a complexity parameter that controls how robust the coefficients are to collinearity (larger $\alpha$ means more shrinkage)
- The penalty on the size of the coefficients addresses the issues with the ordinary least squares method
- Complexity: $O(np^2)$
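The shrinkage effect of the penalty can be seen in a one-feature sketch. It assumes no intercept, so the minimizer of $\sum_i (w x_i - y_i)^2 + \alpha w^2$ is $w = \sum_i x_i y_i / (\sum_i x_i^2 + \alpha)$. The name `fit_ridge_1d` and the data are hypothetical.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Minimize sum((w * x - y)^2) + alpha * w^2 for one feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w * (sxx + alpha) = sxy.
    return sxy / (sxx + alpha)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

w_ols = fit_ridge_1d(xs, ys, alpha=0.0)     # alpha = 0 recovers least squares
w_ridge = fit_ridge_1d(xs, ys, alpha=14.0)  # a larger alpha shrinks w toward 0
```

Here `w_ols` is 2 and `w_ridge` is 1: the penalty halves the coefficient even though the data are fit exactly by $w = 2$.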
GLM: Lasso
Fits the linear model $\hat{y}(w, X_i) = w_0 + w_1 x^i_1 + w_2 x^i_2 + \ldots + w_p x^i_p$,
where the coefficients $w = (w_1, \ldots, w_p)$ are obtained by solving
$\min_w \frac{1}{2n} \|Xw - y\|_2^2 + \alpha \|w\|_1$
- The $\ell_1$ penalty means this linear model estimates sparse coefficients
- This method reduces the number of variables upon which the solution depends
- Useful in compressed sensing and can be used for feature selection
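To illustrate why the $\ell_1$ penalty produces exact zeros, here is a sketch for a single feature with no intercept. In that case the Lasso objective is minimized by soft-thresholding. The helper names and the data are assumptions made for the example.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha, and clamp to exactly zero inside [-alpha, alpha]."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def fit_lasso_1d(xs, ys, alpha):
    """Minimize 1/(2n) * sum((w * x - y)^2) + alpha * |w| for one feature, no intercept."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n  # correlation of feature and response
    z = sum(x * x for x in xs) / n                # scale of the feature
    return soft_threshold(rho, alpha) / z


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x

w_small = fit_lasso_1d(xs, ys, alpha=14.0 / 3.0)  # shrunk from 2 down to 1
w_large = fit_lasso_1d(xs, ys, alpha=10.0)        # penalty dominates: exactly 0
```

A coefficient that is set exactly to zero drops its feature from the model, which is the feature-selection behaviour described above.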
Supervised Learning Support Vector Machines
SVM: Mathematical Setup
An SVM constructs a hyperplane (or set of hyperplanes) in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks.
SVM: Linear, separable case
We want to find the hyperplane that maximizes the margin, as follows:
Mathematical formulation
Given $X_1, X_2, \ldots, X_n \in \mathbb{R}^p$ with labels $Y_i \in \{-1, 1\}$, minimize $\|w\|^2$ subject to
$w \cdot X_i + b \geq 1$ if $Y_i = 1$,
$w \cdot X_i + b \leq -1$ if $Y_i = -1$.
Equivalently,
$Y_i (w \cdot X_i + b) \geq 1$
Decision function:
$f(X, w, b) = \mathrm{sign}(w \cdot X + b)$
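The constraint and the decision function can be checked directly on a toy problem. In this sketch the hyperplane is hand-picked rather than produced by a solver, and all names and data points are illustrative. It uses the standard fact that the margin width is $2 / \|w\|$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decision(x, w, b):
    """f(X, w, b) = sign(w . X + b), returned as +1 or -1 (ties go to +1)."""
    return 1 if dot(w, x) + b >= 0 else -1


def satisfies_margin(points, labels, w, b):
    """Check the constraint Y_i (w . X_i + b) >= 1 for every training point."""
    return all(y * (dot(w, x) + b) >= 1 for x, y in zip(points, labels))


# Hypothetical toy data: two points per class, separable along the first axis.
points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]
w, b = (0.5, 0.0), 0.0  # hand-picked hyperplane, not the output of a solver

feasible = satisfies_margin(points, labels, w, b)
margin_width = 2.0 / math.sqrt(dot(w, w))  # distance between the two margin planes
```

Minimizing $\|w\|^2$ subject to the constraint is what makes this width as large as possible.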
SVM: Linear, non-separable case
Rewrite the problem as
Mathematical formulation
Given $X_1, X_2, \ldots, X_n \in \mathbb{R}^p$ with labels $Y_i \in \{-1, 1\}$, minimize $\|w\|^2 + C \sum_{i=1}^{n} \xi_i$ subject to
$Y_i (w \cdot X_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0$
This adds a hinge-loss term.
General
Decision: $f(X) = w \cdot X + b$
Solve:
$\min P(w, b) = \frac{1}{2} \|w\|^2 + C \sum_i H_1[Y_i f(X_i)]$
= maximize margin + minimize error,
where $H_1(z) = \max(0, 1 - z)$ is the hinge loss.
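The objective $P(w, b)$ can be minimized in many ways. The sketch below uses plain batch subgradient descent, which is one simple option and not necessarily what an SVM library does. The step size, epoch count, function names and toy data are all assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge(z):
    """H_1(z) = max(0, 1 - z): zero once a point clears the margin."""
    return max(0.0, 1.0 - z)


def objective(w, b, points, labels, C):
    """P(w, b) = 1/2 ||w||^2 + C * sum_i H_1[Y_i f(X_i)] with f(X) = w . X + b."""
    margin_term = 0.5 * dot(w, w)
    error_term = sum(hinge(y * (dot(w, x) + b)) for x, y in zip(points, labels))
    return margin_term + C * error_term


def train(points, labels, C=1.0, lr=0.01, epochs=200):
    """Batch subgradient descent on P(w, b); lr and epochs are arbitrary choices."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of 1/2 ||w||^2 is w itself
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1:  # hinge is active for this point
                grad_w = [g - C * y * xj for g, xj in zip(grad_w, x)]
                grad_b -= C * y
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: the last point carries label -1 but sits between the
# two positive points, so no hyperplane separates the classes perfectly.
points = [(2.0, 1.0), (3.0, 2.0), (-2.0, -1.0), (-3.0, -2.0), (2.5, 1.5)]
labels = [1, 1, -1, -1, -1]

start = objective([0.0, 0.0], 0.0, points, labels, C=1.0)
w, b = train(points, labels)
final = objective(w, b, points, labels, C=1.0)
```

The parameter `C` sets the trade-off: a large value penalizes margin violations heavily, a small value favours a wide margin.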
SVM: non-linear case
- For some data, linear classifiers are not complex enough
- The solution is to map the data into a feature space, and then construct a hyperplane in the feature space as before:
  $X \mapsto \Phi(X)$
  Learn $f(X) = w \cdot \Phi(X) + b$
SVM: Kernel Trick
- Kernel trick: if $\Phi(X)$ is high-dimensional, it can be hard to solve for $w$
- By the representer theorem (Kimeldorf & Wahba, 1971),
  $w = \sum_i \alpha_i \Phi(X_i)$
  for some $\alpha_i$
- Optimize $\alpha_i$ instead of $w$:
  $f(X) = \sum_i \alpha_i K(X_i, X) + b, \quad K(X_i, X) = \Phi(X_i) \cdot \Phi(X)$
- Rewrite all SVM equations as before but with $w = \sum_i \alpha_i \Phi(X_i)$
- Dual:
  $\min P(w, b) = \frac{1}{2} \left\| \sum_i \alpha_i \Phi(X_i) \right\|^2 + C \sum_i H_1[Y_i f(X_i)]$
Common kernel examples
RBF-SVM:
$K(X, X') = \exp(-\gamma \|X - X'\|^2)$
- Adds a bump around each data point
Polynomial-SVM:
$K(X, X') = (X \cdot X')^d$
- When $d$ is large, the kernel still only requires $n$ computations (with $n$ here the dimension of the inputs), whereas the explicit representation may not fit in memory
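Both kernels are short to write down. The sketch below also checks, for $d = 2$ in two dimensions, that the polynomial kernel equals a dot product under an explicit feature map. The map `phi_degree2` is the standard one for this case, and the function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(x, xp, gamma):
    """K(X, X') = exp(-gamma * ||X - X'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, xp, d):
    """K(X, X') = (X . X')^d, computed without ever building the feature space."""
    return dot(x, xp) ** d


def phi_degree2(x):
    """Explicit feature map for d = 2 in two dimensions."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, xp = (1.0, 2.0), (3.0, 4.0)
k_implicit = poly_kernel(x, xp, 2)                  # (1*3 + 2*4)^2 = 121
k_explicit = dot(phi_degree2(x), phi_degree2(xp))   # same value via Phi
k_rbf = rbf_kernel(x, xp, gamma=0.5)
```

The explicit map grows quickly with $d$ and the input dimension, while `poly_kernel` always costs one dot product and one power.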
Pros and Cons of SVM
Advantages
- Effective in high-dimensional spaces
- Can be used for classification or regression
- Memory efficient since it only uses a subset of training points in the decision function
- Versatile since we can use different kernel functions for the decision function
Disadvantages
- Non-probabilistic: SVMs do not directly provide probability estimates
- The method will likely not do well if the number of features is much greater than the number of samples
Supervised Learning Decision Trees
Decision Trees formulation
Set-up: training vectors $X_i \in \mathbb{R}^n$, $i = 1, \ldots, p$, and a label vector $Y \in \mathbb{R}^p$
- A decision tree recursively partitions the space such that samples with the same labels are grouped together
- Denote the data at node $m$ by $Q$
- Consider all possible splits of $Q$ into left and right groups
- Choose the split that minimizes some measure of impurity
- Recurse until the maximum tree depth (a parameter) is reached
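A single split step can be sketched as follows. The slides leave the impurity measure open. This example assumes Gini impurity, one common choice, and searches thresholds on one feature only. The function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2, where p_k is the share of class k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Try every threshold on one feature; return (threshold, weighted impurity)."""
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue  # a split must send samples to both sides
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(values, labels)
```

A full tree applies `best_split` over every feature, then repeats on the left and right groups until the depth limit is hit or a node is pure.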
Decision Tree Example
Image source: https://www.projectrhea.org/rhea/index.php/Lecture_21_-_Decision_Trees(Continued)_Old_Kiwi
Pros and Cons of Decision Trees
Advantages
- Simple to understand and to interpret. Trees can be visualized
- The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points used to train the tree
- Possible to validate a model using statistical tests
- Able to handle both numerical and categorical data
Disadvantages
- Decision-tree learners can create over-complex trees that do not generalize well
- Decision-tree learners create biased trees if some classes dominate
- The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality and even for simple concepts
Other Supervised Learning Methods
- Stochastic gradient descent
- Nearest neighbors
- Gaussian processes
- Naive Bayes
- Ensemble methods
Unsupervised Learning Manifold learning
Manifold Learning
- A dimension-reduction approach that attempts to generalize linear frameworks like PCA to be sensitive to non-linear structure in data
- Examples:
  - Isomap: seeks a lower-dimensional embedding which maintains geodesic distances between all points
  - Locally linear embedding: seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods
  - Multidimensional scaling: seeks a low-dimensional representation of the data in which the distances closely match the distances in the original high-dimensional space
Unsupervised Learning Clustering
Clustering: K-means
Goal: Separate samples into $K$ clusters $C$ by solving
$\sum_{i=0}^{n} \min_{\mu_j \in C} \left( \|x_i - \mu_j\|^2 \right)$,
where $\mu_j$ is the centroid of the $j$th cluster
- The within-cluster sum-of-squares criterion is referred to as inertia, and it measures how internally coherent clusters are
- Inertia makes the assumption that clusters are convex and isotropic
- Inertia is not a normalized metric; in a high-dimensional space it is better to run PCA before K-means clustering (curse of dimensionality)
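The usual way to attack this objective is Lloyd's algorithm, which alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below assumes the starting centroids are passed in explicitly, and the names and data are illustrative.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    # Inertia: sum over samples of the squared distance to the closest centroid.
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, inertia = kmeans(points, centroids=[points[0], points[3]])
```

Lloyd's algorithm only finds a local minimum of the inertia, so the result depends on the starting centroids.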
K-means clustering
Image source: http://webstaff.itn.liu.se/~reile/edu/TNM025/Matlab/html/KMeansDemo.html
Clustering: Spectral
- Spectral clustering does a low-dimension embedding of the affinity matrix between samples, followed by K-means in the low-dimensional space
- The general approach to spectral clustering is to use a standard clustering method on relevant eigenvectors of a Laplacian matrix of the similarity matrix $A$, where $A_{ij} \geq 0$ represents a measure of the similarity between data points with indexes $i$ and $j$
- Works well for a small number of clusters but is not advised when using many clusters
- Very useful when the structure of the individual clusters is highly non-convex, or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster
Unsupervised Learning Neural Networks
Neural Networks
- A neural network is a system that receives an input, processes the data, and provides an output
- The algorithm mimics the biological processes of a neuron
- Simplest case: a “neuron” is a computational unit that takes as input $x_1, x_2, x_3, \ldots$ and outputs
  $h_{W,b}(x) = f(W^T x + b) = f\left(\sum_i W_i x_i + b\right)$,
  where $f : \mathbb{R} \to \mathbb{R}$ is called the activation function, $W_i$ are the weights, and $b$ is the bias
- A common choice for the activation function is tanh
- Add hidden layers to create a more sophisticated network
- Need to learn the interconnection pattern between the different layers of neurons and the weights of the interconnections based on a cost function
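The single-neuron formula translates almost directly into code. This sketch covers only the forward pass with hand-picked, untrained weights. The function names and the small 3-2-1 layout are assumptions for illustration, and learning the weights from a cost function is not shown.

```python
import math


def neuron(x, weights, bias, activation=math.tanh):
    """Compute h_{W,b}(x) = f(sum_i W_i x_i + b) for one unit."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return activation(z)


def layer(x, weight_rows, biases):
    """A layer is several neurons reading the same input vector."""
    return [neuron(x, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical 3-2-1 network: three inputs, two hidden units, one output unit.
x = [1.0, 2.0, 3.0]
hidden = layer(x, [[0.5, -0.25, 0.1], [0.1, 0.1, 0.1]], [0.0, -0.2])
output = neuron(hidden, [1.0, -1.0], 0.0)
```

Stacking more calls to `layer` between the input and the output is what adding hidden layers amounts to.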
Neural Networks
Image source: https://visualstudiomagazine.com/articles/2014/11/01/use-python-with-your-neural-networks.aspx
Pros and Cons of Neural Networks
Advantages
- Can be trained directly on data with thousands of inputs
- Once trained, predictions are fast
- Performs tasks that linear programs cannot
- Can be used with supervised or unsupervised learning
- Can be done in parallel
Disadvantages
- Training is computationally expensive
- Complex and can be hard to interpret
- Could require a lot of parameter tweaking
Model Selection and Evaluation
Things to consider:
- What is your goal? Classify? Parameter prediction?
- Dimension of your data
- Variability of data
- Pre-existing knowledge of data
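Classification Metrics:
$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} 1(\hat{y}_i = y_i)$
Regression Metrics:
$\mathrm{MAE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} |y_i - \hat{y}_i|$
$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2$
$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \bar{y})^2}$
Here $y_i$ is the true value for sample $i$, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values.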
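The four metrics are short enough to write out by hand, which makes the definitions concrete. This is a plain-Python sketch with hypothetical function names and made-up numbers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up regression targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
mae_value = mae(y_true, y_pred)  # 0.5
mse_value = mse(y_true, y_pred)  # 0.375
r2_value = r2(y_true, y_pred)    # about 0.949
```

MSE punishes the single error of size 1 more heavily than MAE does, which is the usual reason to prefer one over the other.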
Applications
Applications of Machine Learning
- Spam email detection
- Improving weather prediction
- Targeted advertising and web searches
- Predicting emergency room wait times using staffing levels, patient data, charts, and layout of ER
- Identifying heart failure from physician’s notes
- Predicting hospital readmissions
- Learning dynamical systems models directly from high-dimensional sensor data (Byron Boots, Georgia Tech)
Useful Software Packages
- PyML, pyMC, scikit-learn in Python
- TensorFlow (Google)
- MATLAB’s Statistics and Machine Learning toolbox
- Spider (MATLAB)
- Shogun
- mlpack in C++
- Torch (C++)
- Weka (Java)
- Orange (Open source machine learning and data visualization)
Pitfalls to avoid
Contamination of classifier
- Goal: General classifier
- Problem: Illusion of success (over-tuning parameters)
- Use cross-validation to avoid this:
  - Randomly divide the training data into multiple subsets
  - Hold out one subset at a time and train on the rest
  - Test each classifier on the data not used for training
  - Average the results to see how well the classifiers do
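The cross-validation recipe can be sketched in a few lines. The helper names are hypothetical, and the majority-label "classifier" is only a stand-in so that the example runs without any library.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random division into subsets
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(fit, score, xs, ys, k=5):
    """Average the held-out score of `fit` over k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


# Hypothetical stand-in classifier: always predict the most common training label.
def fit_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def score_accuracy(label, xs, ys):
    return sum(1 for y in ys if y == label) / len(ys)


xs = list(range(10))
ys = [0] * 10
mean_score = cross_validate(fit_majority, score_accuracy, xs, ys, k=5)
```

Any parameter tuning should happen inside the training portion of each fold. Otherwise the held-out score is contaminated in the way this slide warns about.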
Overfitting
“Hallucinating a classifier” occurs when the data are not sufficient to determine the correct classifier
In this case the classifier encodes random quirks of the data that are not grounded in reality
One way to understand overfitting is to decompose generalization error into bias and variance:
  Bias is the learner’s tendency to consistently learn the same (wrong) thing
  Variance is the tendency to learn random things irrespective of the real signal
  Linear classifiers have high bias
  Decision trees have low bias but high variance
A more powerful algorithm is not necessarily better
Use cross-validation or add a regularization term to avoid overfitting
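A small experiment can make the low-bias, high-variance behavior of decision trees visible. The sketch below is illustrative only: the noisy synthetic dataset and the depth limit of 3 are assumptions, and capping tree depth stands in here for the regularization term mentioned above (it restricts model capacity rather than adding a penalty to the loss).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data in which 20% of samples get a randomly assigned label
# (an assumption chosen so that there is noise to overfit).
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)


def train_and_cv_accuracy(model):
    """Return (accuracy on the training data, mean 5-fold CV accuracy)."""
    train_acc = model.fit(X, y).score(X, y)
    cv_acc = cross_val_score(model, X, y, cv=5).mean()
    return train_acc, cv_acc


# An unrestricted tree keeps splitting until every leaf is pure.
deep = DecisionTreeClassifier(random_state=0)
# A depth-limited tree has less capacity to memorize noise.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0)

deep_train, deep_cv = train_and_cv_accuracy(deep)
shallow_train, shallow_cv = train_and_cv_accuracy(shallow)

print(f"deep tree:    train={deep_train:.2f}  cv={deep_cv:.2f}")
print(f"shallow tree: train={shallow_train:.2f}  cv={shallow_cv:.2f}")
```

The gap between training accuracy and cross-validated accuracy is the quantity to watch: a large gap suggests the model has fit noise in the training set.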
Bias and Variance
[Figure: illustration of bias and variance; image not reproduced in this transcript]
Image source: [1]
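For reference, the standard bias-variance decomposition can be written out in the squared-error regression setting. This setting is an assumption made for clarity; the talk discusses classifiers, where the decomposition takes a different form. Here $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, $\hat{f}$ is the model learned from a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\!\left[\bigl(y - \hat{f}(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term is large when the learner consistently misses the true function, and the second is large when the learned model changes a lot from one training set to another.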
Curse of Dimensionality
Second biggest problem in machine learning, after overfitting
Expression coined by Richard Bellman in 1961 to refer to the fact that many algorithms work well in low dimensions but fail in high dimensions
Generalizing correctly becomes exponentially more difficult as the dimensionality, or number of features, increases
Similarity-based reasoning fails in high dimensions
Good news: high-dimensional data are usually concentrated on or near a lower-dimensional manifold, so we can use dimension reduction techniques to avoid this “curse”
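One way to see why similarity-based reasoning breaks down is to measure how much the nearest and farthest neighbors of a point differ as the dimension grows. The sketch below is a rough illustration under assumed conditions (points drawn uniformly from the unit cube, Euclidean distance, 500 points per trial); the helper `relative_contrast` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(dim, n_points=500):
    """Return (farthest - nearest) / nearest distance from one random query."""
    points = rng.random((n_points, dim))   # uniform samples in the unit cube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# As the dimension grows, all points end up at nearly the same distance,
# so "nearest" carries less and less information.
contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrast.items():
    print(f"dim={dim:5d}  relative contrast={value:.2f}")
```

When the contrast approaches zero, the nearest neighbor is barely closer than the farthest one, which is why methods built on distances tend to degrade unless the data are first reduced to a lower-dimensional representation.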
Machine Learning that Matters
Ways Machine Learning Limits Its Impact on the World
C. Rudin and K. Wagstaff argue that most people do not do ML with the primary intention of having a significant impact on the world
Many papers in the ML community are rejected solely because their algorithms and analyses are not novel, even if their scientific contribution could have an important impact on society
In a survey of 152 papers published at ICML 2011, only 1% of papers interpreted results in domain context, whereas 39% used synthetic data and 37% used standard data from the UCI archive
Abstract performance metrics for algorithms do not measure the impact of the results
Theoretical advances are not connected back to real-world impact
Making Machine Learning Matter
Step 1: Define or select evaluation methods that will allow you to measure the impact of your results
Step 2: Collaborate with experts in other fields who can help define the ML problem and label data for classification and regression tasks
Step 3: Consider the potential impact when deciding which research problem to work on
Wagstaff’s Impact Challenges
(Original list from Carbonell, 1992)
1 A law passed or legal decision made that relies on the results of an ML analysis
2 Save $100M through an improved decision-making algorithm provided by an ML system
3 Avert a conflict between nations through high-quality translation provided by an ML system
4 Save a human life through a diagnosis or intervention recommended by an ML system
5 Reduce cyber security break-ins by 50% through ML defenses
6 Improve the Human Development Index by 10% in a country using an ML system
Obstacles to ML Impact
1 Jargon
  ML vocabulary creates barriers between experts and society, or even experts in other fields
  Replace “feature extraction” with “representation,” “variance” with “instability,” and so on
2 Risk
  “With great power comes great responsibility”
  Who is at fault for errors when errors have a significant impact?
  High concerns in fields such as medicine, spacecraft, finance, etc.
3 Complexity
  ML is not simple enough for researchers across fields to use freely
  Simplifying, maturing, and “robustifying” ML tools will promote wider, more independent uses of ML
Thanks for listening!!!
Bibliography
REFERENCES
[1] Domingos, Pedro. A Few Useful Things to Know about Machine Learning.
[2] Rudin, Cynthia and Kiri L. Wagstaff. Machine Learning for Science and Society. Mach Learn, Springer. (2013)
[3] Wagstaff, Kiri L. Machine Learning that Matters. Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland. (2012)
[4] scikit-learn.org/stable/user_guide.html
[5] https://www.quantstart.com/articles/Support-Vector-Machines-A-Guide-for-Beginners
[6] www.cs.columbia.edu/~kathy/cs4701/documents/jason_svm_tutorial.pdf