Machine Learning LW/OB presentation

Uploaded by arthur-breitman, 02-Nov-2014

DESCRIPTION

Simple presentation explaining what machine learning is.

TRANSCRIPT

Page 1: Machine Learning

Machine Learning

LW/OB presentation

Page 2: Machine Learning

Machine learning (ML) is the field concerned with studying and developing algorithms that perform better at a task as they gain experience.

(But mostly I wanted to use this cool picture.)

Page 3: Machine Learning

WARNING: This presentation is seriously lacking in slides, preparation, and cool running examples.

That being said, I know what I’m talking about ;)

Pages 4–8: Machine Learning

What ML is really about…

• ML is about data, and modeling its distribution

• ML is about a tradeoff between model accuracy and predictive power

• ML is about finding simple yet expressive classes of distributions

• ML is about using approximate numerical methods to perform a Bayesian update on the training data
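
A quick illustration of the accuracy/prediction tradeoff (my toy example, not from the slides): a high-degree polynomial fits the training data better but predicts worse on held-out data.

```python
# A minimal sketch of the accuracy vs. predictive power tradeoff, assuming numpy:
# higher-degree polynomials lower training error yet generalize worse.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.2, 40)          # noisy ground truth
x_train, y_train, x_test, y_test = x[:20], y[:20], x[20:], y[20:]

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)    # least-squares polynomial fit
    train_err = np.sqrt(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_err = np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(f"degree {degree}: train RMSE {train_err:.3f}, test RMSE {test_err:.3f}")
```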

Page 9: Machine Learning

ML = intersection of

Pages 10–12: Machine Learning

Data sizes vary…

From a couple of kilobytes to petabytes

Pages 13–16: Machine Learning

Type of problems solved

• Supervised

– Classification

– Regression

• Unsupervised

– Clustering

– Discovering causal links

• Reinforcement learning

– Learn to perform a task, only from the final result

• (transduction)

– Not discussed; improves supervised learning with unsupervised samples

Page 17: Machine Learning

Typical applications

• Image, speech, pattern recognition

• Collaborative filtering

• Time series forecasting

• Game playing

• Denoising

• Any task where experience is valuable

Pages 18–25: Machine Learning

Common ML techniques

• Linear regression

• Factor models

• Decision trees

• Neural networks

– perceptron, multilayer perceptron with backpropagation, Hebbian autoassociative memory, Boltzmann machine, spiking neurons…

• SVMs

• Bayesian networks, white box models…
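
For the simplest entry on the list, a minimal linear-regression sketch (mine, assuming numpy), solving least squares in closed form:

```python
# Ordinary least squares via numpy's least-squares solver.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                      # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(0, 0.1, 100)           # noisy linear data

Xb = np.hstack([X, np.ones((100, 1))])             # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)         # least-squares solution
print("recovered weights:", w)                     # ≈ [2, -1, 0.5, 0]
```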

Pages 26–32: Machine Learning

Meta-Methods

– Ensemble forecasting

– Bootstrapping, bagging, model averaging

– Boosting

– Inductive bias through

• Out-of-sample testing

• Minimum description length

Pages 33–41: Machine Learning

Neural networks demystified

• Perceptron (1957)

THIS IS… LINEAR ALGEBRA!

• Linear separability

8 binary inputs => only 1/2^212 of classifications are linearly separable

• Multilayered perceptron + backpropagation (1969 ~ 1986)

• Smooth interpolation
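
To back up the "linear algebra" jab, a minimal perceptron sketch (mine, not the slide's): a thresholded dot product trained with the classic perceptron update rule.

```python
# The 1957 perceptron: predict sign(w·x + b); on a mistake, nudge (w, b)
# toward the misclassified point. Converges on linearly separable data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 2 * X[:, 1] > 0, 1, -1)     # a linearly separable labeling

w, b = np.zeros(2), 0.0
for _ in range(20):                                 # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:                  # misclassified point
            w += yi * xi                            # move boundary toward it
            b += yi

accuracy = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {accuracy:.2%}")
```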

Page 42: Machine Learning

Many more types…

Pages 43–45: Machine Learning

SVM in a nutshell

• Maximize margin

• Embed in a high dimensional space
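
A minimal sketch of both bullets (mine, assuming scikit-learn is available): an RBF kernel does the high-dimensional embedding implicitly, and SVC finds the maximum-margin separator in that space.

```python
# Kernel SVM on data that is not linearly separable in the original plane.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
y = (np.hypot(X[:, 0], X[:, 1]) < 1).astype(int)   # a circle: not linearly separable

clf = SVC(kernel="rbf", C=1.0)                     # kernel trick = implicit embedding
clf.fit(X[:200], y[:200])
print("held-out accuracy:", clf.score(X[200:], y[200:]))
print("support vectors kept:", len(clf.support_))  # only margin-defining points
```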

Pages 46–48: Machine Learning

Ensemble learning

• Combine predictions through voting (with classifiers) or regression to improve prediction

• Train on random (with replacement) subsets of the data (bootstrapping)

• Or weight the data according to the quality of prediction, and train new weak classifiers accordingly (boosting)
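
A minimal bootstrapping-plus-voting sketch (mine, assuming scikit-learn's decision trees as the weak learners):

```python
# Bagging: train each tree on a with-replacement resample, combine by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)            # a nonlinear target
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

votes = np.zeros(len(X_test))
for _ in range(25):                                 # 25 bootstrap replicates
    idx = rng.integers(0, len(X_train), len(X_train))   # sample with replacement
    tree = DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx])
    votes += tree.predict(X_test)

ensemble_pred = (votes > 12.5).astype(int)          # majority of 25 votes
print("ensemble accuracy:", np.mean(ensemble_pred == y_test))
```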

Pages 49–52: Machine Learning

Numerical tricks

• Optimization of fit with standard operational search techniques

• EM algorithm

• MCMC methods (Gibbs sampling, Metropolis algorithm…)
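
As a taste of the MCMC bullet, a minimal Metropolis sketch (mine, with a made-up bimodal target): sampling a distribution from nothing but pointwise evaluations of an unnormalized density.

```python
# Random-walk Metropolis: propose a move, accept with probability
# min(1, p(proposal)/p(current)); the chain's samples follow the target.
import numpy as np

def log_density(x):
    # unnormalized log-density of a two-bump target (chosen for the demo)
    return np.logaddexp(-0.5 * (x - 2) ** 2, -0.5 * (x + 2) ** 2)

rng = np.random.default_rng(5)
x, samples = 0.0, []
for _ in range(10_000):
    proposal = x + rng.normal(0, 1.0)               # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_density(proposal) - log_density(x):
        x = proposal                                # accept; otherwise keep x
    samples.append(x)

print("sample mean ≈", np.mean(samples[1000:]))    # ≈ 0 for this symmetric target
```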

Pages 53–55: Machine Learning

A fundamental Bayesian model: the Hidden Markov Model

• Hidden states produce observed states

• Billions of applications

– Finance

– Speech recognition

– Swype

– Kinect

– Open heart surgery

– Airplane navigation
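
A minimal HMM sketch (mine, with made-up transition and emission matrices): the forward algorithm scores an observed sequence by marginalizing over the hidden states.

```python
# Forward algorithm: alpha[j] accumulates P(observations so far, hidden state j).
import numpy as np

T = np.array([[0.7, 0.3],                           # hidden-state transitions
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],                           # emission probabilities
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])                           # initial state distribution
obs = [0, 0, 1, 0, 1]                               # an observed symbol sequence

alpha = pi * E[:, obs[0]]                           # initialize with first symbol
for o in obs[1:]:
    alpha = (alpha @ T) * E[:, o]                   # propagate, then weight by emission

print("P(observations) =", alpha.sum())
```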

Page 56: Machine Learning

Questions I was asked

• How does boosting work?

• What is the No Free Lunch Theorem?

• Writing style recognition

• Signature recognition

• Rule extraction

• Moving odds in response to informed gamblers

• BellKor’s Pragmatic Chaos and the Netflix prize

Pages 57–61: Machine Learning

Writing style recognition

• Naïve Bayes (similar to spam filtering; a bag-of-words approach)

• Clustering of HMM model parameters

• Simple statistics on the text corpus (sentence length distribution, word length distribution, density of punctuation)

• Combine with a logistic regression
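
A minimal naïve Bayes sketch for the first bullet (mine, with an invented two-author toy corpus): per-author word frequencies with Laplace smoothing, then a log-posterior vote.

```python
# Bag-of-words naïve Bayes: pick the author maximizing the sum of
# smoothed log word probabilities.
import math
from collections import Counter

corpus = {                                          # toy training data (invented)
    "austen": "truth universally acknowledged single man fortune wife",
    "doyle": "elementary deduction observed singular case evidence",
}
counts = {a: Counter(text.split()) for a, text in corpus.items()}
vocab = {w for c in counts.values() for w in c}

def log_posterior(author, words):
    total = sum(counts[author].values())
    return sum(math.log((counts[author][w] + 1) / (total + len(vocab)))
               for w in words)                      # +1 = Laplace smoothing

sample = "singular evidence observed".split()
print(max(counts, key=lambda a: log_posterior(a, sample)))   # -> doyle
```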

Pages 62–67: Machine Learning

Signature recognition

• Depends on whether the input is raster or vector

• The post office uses neural networks, but its corpus is gigantic

• Dimensionality reduction is key

• Wavelets on the raster image for feature extraction

• Path following, then learning on path features (total variation, average curvature, etc.)
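
For the dimensionality-reduction bullet, a minimal PCA-via-SVD sketch (mine, with made-up feature vectors): compress high-dimensional features before any classifier sees them.

```python
# PCA by SVD: project centered data onto its top principal components.
import numpy as np

rng = np.random.default_rng(6)
features = rng.normal(size=(100, 64))               # e.g. 64 raw signature features
features -= features.mean(axis=0)                   # center before PCA

U, S, Vt = np.linalg.svd(features, full_matrices=False)
reduced = features @ Vt[:8].T                       # keep the top 8 components
explained = (S[:8] ** 2).sum() / (S ** 2).sum()
print(f"kept 8/64 dims, {explained:.1%} of variance")
```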

Page 68: Machine Learning

Rule extraction

• Hard: the hypothesis space is not smooth

• Decision tree regression

• Genetic programming (Koza)

Pages 69–70: Machine Learning

Netflix prize

• The baseline (Cinematch) = a latent semantic model

• The defining characteristic of the winners: ensemble prediction, with neural networks to combine predictors

• The best teams were mergers of good teams

Page 71: Machine Learning

Latent semantic model

• There is a set of K “features”. Each movie has a score for each feature; each user has a weight for each feature.

• Features are latent; we only assume the value of K.

• Equivalent to representing the rating matrix as the product of a score matrix and a preference matrix. SVD minimizes RMSE.
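
A minimal sketch of the idea (mine, on a dense toy matrix; the real ratings matrix is mostly missing entries, which is why the prize entries used iterative factorization rather than a plain SVD): truncated SVD gives the RMSE-minimizing rank-K approximation.

```python
# K-factor latent model: ratings ≈ user_weights @ movie_scores, both rank K.
import numpy as np

rng = np.random.default_rng(7)
users, movies, K = 50, 30, 4
ratings = rng.integers(1, 6, size=(users, movies)).astype(float)

U, S, Vt = np.linalg.svd(ratings, full_matrices=False)
user_weights = U[:, :K] * S[:K]                     # per-user feature weights
movie_scores = Vt[:K]                               # per-movie feature scores
approx = user_weights @ movie_scores                # rank-K reconstruction

rmse = np.sqrt(np.mean((ratings - approx) ** 2))
print(f"rank-{K} RMSE: {rmse:.3f}")
```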

Pages 72–73: Machine Learning

Poker is hard…

• Gigantic, yet not continuous, state space

• Dimensionality reduction isn’t easy

• High variance

• Possible to make parametric strategies and optimize them with ML

• Inputs such as pot odds are trivial to compute

Page 74: Machine Learning

Uhuh, slides end here

Page 75: Machine Learning

Sort of… Questions?