
An Introduction to Machine Learning
Research Group Meeting 2017

Melissa R. McGuirl

February 22, 2017


Table of Contents

1 Introduction
2 Supervised Learning
    Generalized Linear Models
    Support Vector Machines
    Decision Trees
3 Unsupervised Learning
    Manifold learning
    Clustering
    Neural Networks
4 Model Selection and Evaluation
5 Applications
6 Pitfalls to avoid
7 Machine Learning that Matters
8 Bibliography


Introduction

General Set-up

Set-up and Goal: Suppose we have data samples $X_1, X_2, \dots, X_n$. Can we predict properties of any given $X_{n+1}, X_{n+2}, \dots, X_N$?

Machine learning systems attempt to predict properties of unknown data based on the attributes or features of the data.


Introduction

Supervised learning

1 Data come with attributes that we want our algorithm to predict.

2 The machine learning algorithm is given data attributes and desired outputs, and the goal is to learn a way to map inputs to outputs in a general way.

3 Two main types of learning (see the sketch below):
    1 Classification: the data belong to different classes/groups, and we want to be able to predict which class/group unlabeled data belongs to.
    2 Regression: the data are labeled with one or more continuous variables (parameters), and the task is to predict the values of these variables for unknown data.
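To make the two task types concrete, here is a minimal scikit-learn sketch (not from the slides; the toy data and model choices are assumptions for illustration):

```python
# A classifier predicts a discrete class label; a regressor predicts a
# continuous value. Toy data below are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])             # one feature per sample

clf = LogisticRegression().fit(X, [0, 0, 1, 1])        # classification: class labels
print(clf.predict([[1.8]]))                            # -> a class label

reg = LinearRegression().fit(X, [0.1, 0.9, 2.1, 2.9])  # regression: continuous targets
print(reg.predict([[1.8]]))                            # -> a number, roughly 1.8
```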


Introduction

Image source: http://ipython-books.github.io/featured-04/


Introduction

Unsupervised Learning

1 No labels are given to the algorithm.

2 The training data consist of a set of input vectors $\{X_1, X_2, \dots, X_n\}$ with no target outputs.

3 Three types of goals in this setting:
    1 Clustering: discover groups with similar features within the data.
    2 Density estimation: determine the distribution of the data within the input space.
    3 Dimensionality reduction: project the data into a lower-dimensional space than the input space.


Supervised Learning: Generalized Linear Models

Generalized Linear Models

Set-up: data in the form $X_i = (x_{i1}, x_{i2}, \dots, x_{ip})$.

Goal: regression. Find $y$ with

$$y(w, X_i) = w_0 + w_1 x_{i1} + w_2 x_{i2} + \dots + w_p x_{ip},$$

where $w_0$ is called the intercept and $w_1, \dots, w_p$ are the coefficients.


Supervised Learning: Generalized Linear Models

GLM: Ordinary Least Squares

Fits the linear model $y(w, X_i) = w_0 + w_1 x_{i1} + w_2 x_{i2} + \dots + w_p x_{ip}$, where the coefficients $w = (w_1, \dots, w_p)$ are obtained by solving

$$\min_w \|Xw - y\|_2^2$$

The model relies on the independence of the model terms.
If terms are correlated, then the least-squares estimate is highly sensitive to random errors in the observed response (large variance).
Complexity: $O(np^2)$
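As a concrete illustration, a minimal ordinary-least-squares fit with scikit-learn; the toy data and true coefficients are assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))               # n = 100 samples, p = 3 features
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=100)

model = LinearRegression().fit(X, y)        # solves min_w ||Xw - y||_2^2
print(model.intercept_)                     # estimate of w0 (about 3.0)
print(model.coef_)                          # estimates of w1, ..., wp
```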


Supervised Learning: Generalized Linear Models

GLM: Ridge Regression

Fits the linear model $y(w, X_i) = w_0 + w_1 x_{i1} + w_2 x_{i2} + \dots + w_p x_{ip}$, where the coefficients $w = (w_1, \dots, w_p)$ are obtained by solving

$$\min_w \|Xw - y\|_2^2 + \alpha \|w\|_2^2$$

$\alpha > 0$ is a complexity parameter that controls how robust the coefficients are to collinearity.
The penalty on the size of the coefficients addresses the issues with the ordinary least squares method.
Complexity: $O(np^2)$
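A minimal ridge sketch, assuming toy data with nearly collinear columns to show the stabilizing effect of the penalty:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.001 * rng.normal(size=100)   # nearly collinear features
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)    # min_w ||Xw - y||_2^2 + alpha ||w||_2^2
print(ols.coef_)                      # can be large and unstable on cols 0 and 2
print(ridge.coef_)                    # shrunken, stabler entries
```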


Supervised Learning: Generalized Linear Models

GLM: Lasso

Fits the linear model $y(w, X_i) = w_0 + w_1 x_{i1} + w_2 x_{i2} + \dots + w_p x_{ip}$, where the coefficients $w = (w_1, \dots, w_p)$ are obtained by solving

$$\min_w \frac{1}{2n} \|Xw - y\|_2^2 + \alpha \|w\|_1$$

The $\ell_1$ norm means this linear model estimates sparse coefficients.
This method reduces the number of variables upon which the solution is dependent.
Useful in compressed sensing, and can be used for feature selection.
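A minimal lasso sketch, assuming a toy problem where only two of ten features matter, to show the sparsity of the estimated coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                                   # 10 candidate features
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=100)   # only 2 matter

lasso = Lasso(alpha=0.1).fit(X, y)    # min_w (1/2n)||Xw - y||_2^2 + alpha ||w||_1
print(lasso.coef_)                    # sparse: nonzero mainly at indices 0 and 3
```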


Supervised Learning: Support Vector Machines

SVM: Mathematical Setup

An SVM constructs a hyperplane (or set of hyperplanes) in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks.


Supervised Learning: Support Vector Machines

SVM: Linear, separable case

We want to find the hyperplane that maximizes the margin, as follows:

Mathematical formulation: given $X_1, X_2, \dots, X_n \in \mathbb{R}^p$ with labels $Y_i \in \{-1, 1\}$, minimize $\|w\|^2$ subject to

$$\begin{cases} w \cdot X_i + b \ge 1, & Y_i = 1 \\ w \cdot X_i + b \le -1, & Y_i = -1 \end{cases}$$

Equivalently,

$$Y_i (w \cdot X_i + b) \ge 1$$

Decision function:

$$f(X, w, b) = \operatorname{sign}(w \cdot X + b)$$
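A minimal sketch of this separable case with scikit-learn; the blob data and the large-C choice (to approximate a hard margin) are assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 0.5, size=(50, 2)),    # class +1
               rng.normal(-2.0, 0.5, size=(50, 2))])   # class -1
Y = np.array([1] * 50 + [-1] * 50)

clf = SVC(kernel="linear", C=1e6).fit(X, Y)            # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print(np.all(np.sign(X @ w + b) == Y))                 # True: f = sign(w.X + b)
```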


Supervised Learning: Support Vector Machines

SVM: Linear, non-separable case

Rewrite the problem as:

Mathematical formulation: given $X_1, X_2, \dots, X_n \in \mathbb{R}^p$ with labels $Y_i \in \{-1, 1\}$, minimize $\|w\|^2 + C \sum_{i=1}^{n} \xi_i$ subject to

$$Y_i (w \cdot X_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$

This adds a hinge-loss term.

General decision: $f(X) = w \cdot X + b$. Solve:

$$\min P(w, b) = \frac{1}{2}\|w\|^2 + C \sum_i H_1[Y_i f(X_i)]$$

= maximize margin + minimize error.
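A sketch of the role of $C$ on overlapping, non-separable toy data (the data are assumptions): small $C$ tolerates more slack, large $C$ penalizes margin violations harder:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, size=(50, 2)),   # overlapping classes:
               rng.normal(-1.0, 1.0, size=(50, 2))])  # not linearly separable
Y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, Y)
    # more support vectors at small C: more points sit inside the margin
    print(C, clf.score(X, Y), clf.support_.size)
```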


Supervised Learning: Support Vector Machines

SVM: non-linear case

For some data, linear classifiers are not complex enough.
The solution is to map the data into a feature space, and then construct a hyperplane in the feature space as before:

$$X \mapsto \Phi(X); \qquad \text{learn } f(X) = w \cdot \Phi(X) + b$$


Supervised Learning: Support Vector Machines

SVM: Kernel Trick

Kernel trick: if $\Phi(X)$ is high-dimensional, it can be hard to solve for $w$.
By the representer theorem (Kimeldorf & Wahba, 1971),

$$w = \sum_i \alpha_i \Phi(X_i)$$

for some $\alpha_i$. Optimize the $\alpha_i$ instead of $w$:

$$f(X) = \sum_i \alpha_i K(X_i, X) + b, \qquad K(X_i, X) = \Phi(X_i) \cdot \Phi(X)$$

Rewrite all the SVM equations as before, but with $w = \sum_i \alpha_i \Phi(X_i)$.

Dual:

$$\min P(w, b) = \frac{1}{2}\Big\|\sum_i \alpha_i \Phi(X_i)\Big\|^2 + C \sum_i H_1[Y_i f(X_i)]$$


Supervised Learning: Support Vector Machines

Common kernel examples

RBF-SVM:

$$K(X, X') = \exp(-\gamma \|X - X'\|^2)$$

Adds a bump around each data point.

Polynomial-SVM:

$$K(X, X') = (X \cdot X')^d$$

When $d$ is large, the kernel still only requires $n$ computations, whereas an explicit representation may not fit in memory.
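A sketch of these kernels on toy data that no linear separator can handle (two concentric rings; the dataset choice is an assumption). Note that scikit-learn's polynomial kernel is the slight generalization $(\gamma X \cdot X' + c_0)^d$:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, Y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for name, clf in [("linear", SVC(kernel="linear")),
                  ("rbf",    SVC(kernel="rbf", gamma=1.0)),  # exp(-gamma ||X - X'||^2)
                  ("poly",   SVC(kernel="poly", degree=2))]: # (gamma X.X' + c0)^d
    print(name, clf.fit(X, Y).score(X, Y))  # linear struggles; rbf and poly separate the rings
```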


Supervised Learning: Support Vector Machines

Pros and Cons of SVM

Advantages:
Effective in high-dimensional spaces.
Can be used for classification or regression.
Memory efficient, since it only uses a subset of the training points in the decision function.
Versatile, since we can use different kernel functions for the decision function.


Supervised Learning: Support Vector Machines

Pros and Cons of SVM

Disadvantages:
Non-probabilistic: SVMs do not directly provide probability estimates.
The method will likely not do well if the number of features is much greater than the number of samples.


Supervised Learning: Decision Trees

Decision Trees formulation

Set-up: training vectors $X_i \in \mathbb{R}^n$, $i = 1, \dots, p$, and a label vector $Y \in \mathbb{R}^p$.

A decision tree recursively partitions the space such that samples with the same labels are grouped together:
Denote the data at node $m$ by $Q$.
Consider all possible splits of $Q$ into left and right groups.
Choose the split which minimizes some measure of impurity.
A maximum tree depth parameter bounds the recursion.
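A minimal decision-tree sketch; Iris is a stand-in dataset, and Gini impurity and depth 3 are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)
print(export_text(tree))     # the learned recursive partition, printed as text
print(tree.score(X, y))      # training accuracy
```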


Supervised Learning: Decision Trees

Decision Tree Example

Image source: https://www.projectrhea.org/rhea/index.php/Lecture_21_-_Decision_Trees(Continued)_Old_Kiwi


Supervised Learning: Decision Trees

Pros and Cons of Decision Trees

Advantages:
Simple to understand and to interpret; trees can be visualized.
The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points used to train the tree.
Possible to validate a model using statistical tests.
Able to handle both numerical and categorical data.


Supervised Learning: Decision Trees

Pros and Cons of Decision Trees

Disadvantages:
Decision-tree learners can create over-complex trees that do not generalize the data well.
Decision-tree learners create biased trees if some classes dominate.
The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality, even for simple concepts.


Supervised Learning: Decision Trees

Other Supervised Learning Methods

Stochastic gradient descent
Nearest neighbors
Gaussian processes
Naive Bayes
Ensemble methods


Unsupervised Learning: Manifold learning

Manifold Learning

A dimension-reduction approach that attempts to generalize linear frameworks like PCA to be sensitive to non-linear structure in data. Examples (see the sketch below):

Isomap: seeks a lower-dimensional embedding which maintains geodesic distances between all points.
Locally linear embedding: seeks a lower-dimensional projection of the data which preserves distances within local neighborhoods.
Multidimensional scaling: seeks a low-dimensional representation of the data in which the distances respect well the distances in the original high-dimensional space.
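A sketch of the three methods on the S-curve dataset (3-D points lying on a 2-D manifold; the dataset and parameter values are assumptions):

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import MDS, Isomap, LocallyLinearEmbedding

X, _ = make_s_curve(n_samples=500, random_state=0)

for embed in (Isomap(n_components=2),                                  # geodesic distances
              LocallyLinearEmbedding(n_components=2, n_neighbors=10),  # local neighborhoods
              MDS(n_components=2, random_state=0)):                    # pairwise distances
    X2 = embed.fit_transform(X)
    print(type(embed).__name__, X2.shape)   # (500, 2) embeddings
```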


Unsupervised Learning: Clustering

Clustering: K-means

Goal: separate the samples into $K$ clusters $C$ by solving

$$\sum_{i=0}^{n} \min_{\mu_j \in C} \|x_i - \mu_j\|^2,$$

where $\mu_j$ is the centroid of the $j$th cluster.

The within-cluster sum-of-squares criterion is referred to as inertia, and it measures how internally coherent the clusters are.
Inertia makes the assumption that clusters are convex and isotropic.
Inertia is not a normalized metric; it is better to run PCA before doing K-means clustering if you are in a high-dimensional space (curse of dimensionality).
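A minimal K-means sketch; the blob data are an assumption, and `inertia_` is scikit-learn's name for the within-cluster sum of squares above:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)    # the centroids mu_j
print(km.inertia_)            # sum_i min_j ||x_i - mu_j||^2
```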


Unsupervised Learning: Clustering

K-means clustering

Image source: http://webstaff.itn.liu.se/~reile/edu/TNM025/Matlab/html/KMeansDemo.html


Unsupervised Learning: Clustering

Clustering: Spectral

Spectral clustering does a low-dimension embedding of the affinity matrix between samples, followed by K-means in the low-dimensional space.
The general approach is to use a standard clustering method on the relevant eigenvectors of a Laplacian matrix of the similarity matrix $A$, where $A_{ij} \ge 0$ represents a measure of the similarity between data points with indexes $i$ and $j$.
Works well for a small number of clusters, but is not advised when using many clusters.
Very useful when the structure of the individual clusters is highly non-convex, or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster.
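A sketch of that advantage on two-moons data (an assumption), where K-means on the raw coordinates fails but spectral clustering succeeds:

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

spec = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                          n_neighbors=10, random_state=0).fit_predict(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# agreement with the true labels, up to a possible label swap
print(max((spec == y).mean(), (spec != y).mean()))   # ~1.0
print(max((km == y).mean(), (km != y).mean()))       # noticeably lower
```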


Unsupervised Learning: Neural Networks

Neural Networks

A neural network is a system that receives an input, processes the data, and provides an output.
The algorithm mimics the biological processes of a neuron.
Simplest case: a "neuron" is a computational unit that takes as input $x_1, x_2, x_3, \dots$ and outputs

$$h_{W,b}(x) = f(W^T x) = f\Big(\sum_i W_i x_i + b\Big),$$

where $f : \mathbb{R} \to \mathbb{R}$ is called the activation function, the $W_i$ are weights, and $b$ is the bias.
A common choice for the activation function is $\tanh$.
Add hidden layers to create a more sophisticated network.
One needs to learn the interconnection pattern between the different layers of neurons, and the weights of the interconnections, based on a cost function.
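A minimal sketch of the single neuron defined above; the input, weight, and bias values are assumptions:

```python
import numpy as np

def neuron(x, W, b, f=np.tanh):
    """One computational unit: activation of a weighted sum plus bias."""
    return f(W @ x + b)

x = np.array([0.5, -1.0, 2.0])    # inputs x1, x2, x3
W = np.array([0.1, 0.4, -0.3])    # weights W_i
b = 0.2                           # bias
print(neuron(x, W, b))            # tanh(0.05 - 0.4 - 0.6 + 0.2) = tanh(-0.75)
```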


Unsupervised Learning: Neural Networks

Neural Networks

Image source: https://visualstudiomagazine.com/articles/2014/11/01/use-python-with-your-neural-networks.aspx


Unsupervised Learning: Neural Networks

Pros and Cons of Neural Networks

Advantages:
Can be trained directly on data with thousands of inputs.
Once trained, predictions are fast.
Performs tasks that linear programs cannot.
Can be used with supervised or unsupervised learning.
Can be done in parallel.


Unsupervised Learning: Neural Networks

Pros and Cons of Neural Networks

Disadvantages:
Training is computationally expensive.
Complex and can be hard to interpret.
Could require a lot of parameter tweaking.


Model Selection and Evaluation

Things to consider:

What is your goal? Classify? Parameter prediction?
Dimension of your data
Variability of data
Pre-existing knowledge of data


Model Selection and Evaluation

Classification metrics:

$$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} 1(\hat{y}_i = y_i)$$

Regression metrics:

$$\mathrm{MAE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} |y_i - \hat{y}_i|$$

$$\mathrm{MSE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2$$

$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \bar{y})^2}$$
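A sketch of the four metrics via scikit-learn; the y values are assumptions chosen so the results are easy to check by hand:

```python
from sklearn.metrics import (accuracy_score, mean_absolute_error,
                             mean_squared_error, r2_score)

# classification
print(accuracy_score([0, 1, 1, 0], [0, 1, 0, 0]))   # 3 of 4 correct -> 0.75

# regression
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]
print(mean_absolute_error(y_true, y_pred))          # MAE = 0.5
print(mean_squared_error(y_true, y_pred))           # MSE = 0.375
print(r2_score(y_true, y_pred))                     # R^2, about 0.95
```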


Applications

Applications of Machine Learning

Spam email detection
Improving weather prediction
Targeted advertising and web searches
Predicting emergency room wait times using staffing levels, patient data, charts, and the layout of the ER
Identifying heart failure from physician's notes
Predicting hospital readmissions
Learning dynamical systems models directly from high-dimensional sensor data (Byron Boots, Georgia Tech)


Applications

Useful Software Packages

PyML, PyMC, scikit-learn in Python
TensorFlow (Google)
MATLAB's Statistics and Machine Learning Toolbox
Spider (MATLAB)
Shogun
mlpack in C++
Torch (C++)
Weka (Java)
Orange (open-source machine learning and data visualization)


Pitfalls to avoid

Contamination of classifier

Goal: a general classifier.
Problem: the illusion of success (over-tuning parameters).
Use cross-validation to avoid this (see the sketch below):

Randomly divide the training data into multiple subsets.
Only use one subset for training at a time.
Test each classifier on the data not used for training.
Average the results to see how well the classifiers do.
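A minimal cross-validation sketch; the dataset and classifier are stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# 5 folds: each fold is held out once, the model trains on the rest
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(scores.mean(), scores.std())   # average held-out accuracy and its spread
```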


Pitfalls to avoid

Overfitting

"Hallucinating a classifier" occurs when the data are not sufficient to determine the correct classifier.
In this case the classifiers will be encoding random features in the data which are not grounded in reality.
Overfitting can be decomposed into bias and variance:

Bias is the learner's tendency to consistently learn the same (wrong) thing.
Variance is the tendency to learn random things irrespective of the real signal.
Linear classifiers have high bias.
Decision trees have low bias but high variance.

A more powerful algorithm is not necessarily better.
Use cross-validation and add a regularization term to avoid overfitting.


Pitfalls to avoid

Bias and Variance

Image source: [1]


Pitfalls to avoid

Curse of Dimensionality

The second biggest problem in machine learning.
The expression was coined in 1961 to refer to the fact that many algorithms work well in low dimensions but fail in high dimensions.
Generalizing correctly becomes exponentially more difficult as the dimensionality, or number of features, increases.
Similarity-based reasoning fails in high dimensions.
Good news: high-dimensional data are usually concentrated on or near a lower-dimensional manifold, so we can use dimension-reduction techniques to avoid this "curse," as in the sketch below.
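A sketch of that escape route, assuming synthetic data embedded in 100 dimensions but concentrated near a 3-dimensional subspace:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))          # intrinsic dimension 3
X = latent @ rng.normal(size=(3, 100))      # ambient dimension 100
X += 0.01 * rng.normal(size=X.shape)        # small off-manifold noise

pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_.sum())  # close to 1.0: 3 components suffice
X_low = pca.transform(X)                    # (500, 3): use downstream, e.g. for K-means
```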


Machine Learning that Matters

Ways Machine Learning Limits Its Impact on the World

C. Rudin and K. Wagstaff argue that most people do not do ML with the primary intention of having a significant impact on the world.
Many papers in the ML community are rejected solely because their algorithms and analyses are not novel, even if their scientific contribution could have an important impact on society.
In a survey of 152 papers published at ICML 2011, only 1% of papers interpreted results in a domain context, whereas 39% used synthetic data and 37% used standard data from the UCI archive.
Abstract metrics for the performance of algorithms do not measure the impact of the results.
Theoretical advances are not connected back to real-world impact.


Machine Learning that Matters

Making Machine Learning Matter

1 Step 1: Define or select evaluation methods that will allow you to measure the impact of your results.

2 Step 2: Collaborate with experts in other fields who can help define the ML problem and label data for classification and regression tasks.

3 Step 3: Consider the potential impact when deciding which research problem to work on.


Machine Learning that Matters

Wagstaff’s Impact Challenges

(Original list from Carbonell, 1992)
1 A law passed or legal decision made that relies on the results of an ML analysis.
2 Save $100M through an improved decision-making algorithm provided by an ML system.
3 Avert a conflict between nations through high-quality translation provided by an ML system.
4 Save a human life through a diagnosis or intervention recommended by an ML system.
5 Reduce cybersecurity break-ins by 50% through ML defenses.
6 Improve the Human Development Index by 10% in a country using an ML system.


Machine Learning that Matters

Obstacles to ML Impact

1 Jargon
ML vocabulary creates barriers between experts and society, or even experts in other fields.
Replace "feature extraction" with "representation," "variance" with "instability," and so on.

2 Risk
"With great power comes great responsibility."
Who is at fault when errors have a significant impact?
Concerns are high in fields such as medicine, spacecraft, finance, etc.

3 Complexity
ML is not simple enough for researchers across fields to use freely.
Simplifying, maturing, and "robustifying" ML tools will promote wider, more independent uses of ML.


Machine Learning that Matters

Thanks for listening!!!


Bibliography

REFERENCES

[1] Domingos, Pedro. A Few Useful Things to Know about Machine Learning.
[2] Rudin, Cynthia and Kiri L. Wagstaff. Machine Learning for Science and Society. Machine Learning, Springer (2013).
[3] Wagstaff, Kiri L. Machine Learning that Matters. Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland (2012).
[4] scikit-learn.org/stable/user_guide.html
[5] https://www.quantstart.com/articles/Support-Vector-Machines-A-Guide-for-Beginners
[6] www.cs.columbia.edu/~kathy/cs4701/documents/jason_svm_tutorial.pdf