
DESCRIPTION

Simple presentation explaining what machine learning is.

TRANSCRIPT

Machine Learning

LW/OB presentation

Machine learning ( ML ) is a field concerned with studying and developing algorithms that perform better at a task as they gain experience.

( but mostly I wanted to use this cool picture )

WARNING: this presentation is seriously lacking slides, preparation and cool running examples.

That being said, I know what I’m talking about ;)

What ML is really about…

• ML is about data, and modeling its distribution

• ML is about the tradeoff between accuracy on the training data and predictive power on unseen data ( sketch below )

• ML is about finding simple yet expressive classes of distributions

• ML is about using approximate numerical methods to perform a Bayesian update on the training data
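Not in the original slides: a minimal sketch of the accuracy-vs-predictive-power tradeoff mentioned above ( the sine-plus-noise data and the polynomial degrees are chosen purely for illustration ) — higher-degree fits keep lowering the training error while the held-out error eventually gets worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a smooth function plus noise.
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Hold out half the points to measure predictive power.
train, test = x[::2], x[1::2]
y_train, y_test = y[::2], y[1::2]

for degree in (1, 3, 9, 15):
    coeffs = np.polyfit(train, y_train, degree)  # fit on training data only
    mse_train = np.mean((np.polyval(coeffs, train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coeffs, test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {mse_train:.3f}, test MSE {mse_test:.3f}")
```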

ML = intersection of… ( diagram slide )

Data sizes vary…

From a couple of kilobytes to petabytes

Type of problems solved

• Supervised

– Classification

– Regression

• Unsupervised

– Clustering

– Discovering causal links

• Reinforcement learning

– Learn to perform a task from the final result only

• ( Transduction )

– Not discussed; improves supervised learning with unlabeled samples
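Not part of the original slides: a rough scikit-learn sketch of the supervised/unsupervised split ( the blob dataset and model choices are mine, purely for illustration ) — a classifier needs the labels y, while clustering only ever sees X.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Three labeled Gaussian blobs in 2D.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Supervised: learn a mapping X -> y from labeled examples.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("classification accuracy:", clf.score(X, y))

# Unsupervised: group the same points without ever seeing y.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((km.labels_ == k).sum()) for k in range(3)])
```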

Typical applications

• Image, speech, pattern recognition

• Collaborative filtering

• Time series forecasting

• Game playing

• Denoising

• Any task where experience is valuable

Common ML techniques

• Linear regression ( sketch below )

• Factor models

• Decision trees

• Neural networks

( perceptron, multilayer perceptron with backpropagation, Hebbian autoassociative memory, Boltzmann machines, spiking neurons… )

• SVMs

• Bayesian networks, white-box models…
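Not in the slides: a minimal sketch of the first item on the list, ordinary least-squares linear regression solved directly with numpy ( the synthetic data and the true coefficients are made up for illustration ).

```python
import numpy as np

rng = np.random.default_rng(1)

# y = 2 + 3*x + noise; recover the coefficients by least squares.
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=x.size)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated intercept and slope:", beta)
```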

Meta-Methods

– Ensemble forecasting

– Bootstrapping, bagging, model averaging

– Boosting

– Inductive bias through

• Out-of-sample testing

• Minimum description length

Neural networks demystified

• Perceptron ( 1957 ) ( sketch below )

THIS IS… LINEAR ALGEBRA!

• Linear separability

8 binary inputs => only ~1/2^212 of all possible classifications are linearly separable

• Multilayer perceptron + backpropagation ( 1969 ~ 1986 )

• Smooth interpolation

Many more types…
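Not in the original deck: a minimal perceptron sketch ( the linearly separable toy data is my own making ). The learning rule really is just linear algebra: nudge the weight vector by y·x on every misclassified point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: label = sign of (x1 + x2 - 1), with a visible gap.
X = rng.uniform(-1, 2, size=(300, 2))
X = X[np.abs(X[:, 0] + X[:, 1] - 1.0) > 0.2]
y = np.where(X[:, 0] + X[:, 1] - 1.0 > 0, 1, -1)

# Add a bias column so the threshold is learned as an ordinary weight.
Xb = np.column_stack([X, np.ones(len(X))])
w = np.zeros(3)

# Perceptron learning rule: update w on every misclassified point.
for _ in range(20):
    errors = 0
    for xi, yi in zip(Xb, y):
        if yi * np.dot(w, xi) <= 0:
            w += yi * xi
            errors += 1
    if errors == 0:
        break

print("learned weights:", w, "errors in last pass:", errors)
```

On data that is not linearly separable ( e.g. XOR ) the loop above never reaches zero errors, which is exactly the limitation the linear-separability bullet alludes to.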

SVM in a nutshell

• Maximize the margin

• Embed in a high-dimensional space
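Again not in the slides, a hedged scikit-learn sketch of both bullets ( the concentric-circles dataset is my choice ): a linear SVM maximizes the margin but cannot separate the circles, while the RBF kernel implicitly embeds the points in a high-dimensional space where they become separable.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2D space.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # maximum-margin hyperplane in 2D
rbf = SVC(kernel="rbf").fit(X, y)        # implicit high-dimensional embedding

print("linear-kernel accuracy:", linear.score(X, y))
print("rbf-kernel accuracy:", rbf.score(X, y))
```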

Ensemble learning

• Combine predictions through voting ( with classifiers ) or regression to improve prediction

• Train on random subsets of the data, drawn with replacement ( bootstrapping )

• Or weight the data according to the quality of prediction, and train new weak classifiers accordingly ( boosting )
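A minimal sketch of the bootstrapping idea ( my own example; the dataset, tree depth and number of trees are arbitrary ): train many small trees on resampled-with-replacement copies of the training set and combine them by majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
votes = np.zeros((len(X_te), 2))

# Bootstrap: each tree sees a random sample of the training set, with replacement.
for _ in range(25):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    tree = DecisionTreeClassifier(max_depth=3).fit(X_tr[idx], y_tr[idx])
    preds = tree.predict(X_te)
    votes[np.arange(len(X_te)), preds] += 1

ensemble_pred = votes.argmax(axis=1)
print("bagged accuracy:", (ensemble_pred == y_te).mean())
```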

Numerical tricks

• Optimization of fit with standard operational search techniques

• EM algorithm ( sketch below )

• MCMC methods ( Gibbs sampling, Metropolis algorithm… )
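Not in the slides: a rough sketch of the EM algorithm on a two-component 1D Gaussian mixture ( the data and initial guesses are made up ). The E-step computes responsibilities, the M-step re-estimates the parameters from them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data drawn from two Gaussians; EM must recover their parameters.
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

# Initial guesses for the means, standard deviations and mixing weights.
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

for _ in range(50):
    # E-step: responsibility of each component for each data point.
    dens = weights * gauss(data[:, None], mu, sigma)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the weighted points.
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
    weights = nk / len(data)

print("means:", mu, "stds:", sigma, "weights:", weights)
```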

A fundamental Bayesian model: the Hidden Markov Model

• Hidden states produce observed states

• Billions of applications

– Finance

– Speech recognition

– Swype

– Kinect

– Open-heart surgery

– Airplane navigation
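As a hedged illustration of "hidden states produce observed states" ( all transition and emission numbers below are invented ), the forward algorithm keeps a running belief over the hidden state as observations arrive.

```python
import numpy as np

# Toy HMM: 2 hidden states, 2 possible observations (all numbers made up).
A = np.array([[0.9, 0.1],   # hidden-state transition probabilities
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],   # emission probabilities P(obs | hidden state)
              [0.3, 0.7]])
init = np.array([0.5, 0.5])

observations = [0, 0, 1, 1, 1]

# Forward algorithm: propagate and renormalize the belief over hidden states.
belief = init * B[:, observations[0]]
belief /= belief.sum()
for obs in observations[1:]:
    belief = (belief @ A) * B[:, obs]
    belief /= belief.sum()

print("P(hidden state | observations so far):", belief)
```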

Questions I was asked

• How does boosting work?

• What is the No Free Lunch Theorem?

• Writing style recognition

• Signature recognition

• Rule extraction

• Moving odds in response to informed gamblers

• BellKor’s Pragmatic Chaos and the Netflix prize

Writing style recognition

• Naïve Bayes ( similar to spam filtering, bag-of-words approach ) ( sketch below )

• Clustering of HMM model parameters

• Simple statistics on the text corpus ( sentence-length distribution, word-length distribution, density of punctuation )

• Combine with a logistic regression
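Not in the slides: a toy sketch of the first bullet — a bag-of-words naive Bayes classifier, the same recipe as spam filtering, applied to telling two "authors" apart ( the four-sentence corpus is made up for illustration ).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny toy corpus: two "authors" with different word habits.
texts = [
    "it is a truth universally acknowledged",
    "a single man in possession of a good fortune",
    "the whale swam ever onward through the dark sea",
    "call me anything but do not call me late to sea",
]
authors = ["austen", "austen", "melville", "melville"]

# Bag of words: each text becomes a vector of word counts.
vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB().fit(X, authors)
print(clf.predict(vec.transform(["the dark sea and the whale"])))
```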

Signature recognition

• Depends on whether the input is raster or vector

• The post office uses neural networks, but its corpus is gigantic

• Dimensionality reduction is key ( sketch below )

• Wavelets on the raster image for feature extraction

• Path following, then learning on path features ( total variation, average curvature, etc. )
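The deck shows no code; as a rough stand-in for the dimensionality-reduction point ( the random "images" below are placeholders, not real signatures ), PCA compresses 256 raw pixels into 10 learned features before any classifier sees them.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for raster signatures: 200 flattened 16x16 "images".
images = rng.normal(size=(200, 16 * 16))

# Reduce 256 raw pixels to 10 learned features.
pca = PCA(n_components=10).fit(images)
features = pca.transform(images)

print("input dimension:", images.shape[1])
print("reduced dimension:", features.shape[1])
print("variance explained:", pca.explained_variance_ratio_.sum().round(3))
```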

Rule extraction

• Hard: the hypothesis space is not smooth

• Decision tree regression

• Genetic Programming ( Koza )
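Not in the slides: a minimal sketch of the decision-tree route to rule extraction ( the hidden rule and data are my own toy example ) — fit a shallow regression tree and print it back as human-readable if/else rules.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)

# Hidden rule to recover: y is large when x0 > 0.5, small otherwise.
X = rng.uniform(0, 1, size=(300, 2))
y = np.where(X[:, 0] > 0.5, 10.0, 1.0) + rng.normal(scale=0.1, size=300)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)

# export_text renders the fitted tree as nested if/else rules.
print(export_text(tree, feature_names=["x0", "x1"]))
```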

Netflix prize

• The base ( Cinematch ) = a latent semantic model

• The defining characteristic of the winners: ensemble prediction, with neural networks to combine the predictors

• The best teams were mergers of good teams

Latent semantic model

• There is a set of K “features”: each movie has a score for each feature, and each user has a weight for each feature.

• The features are latent; we only assume the value of K.

• Equivalent to representing the rating matrix as the product of a score matrix and a preference matrix; the SVD minimizes RMSE.
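A hedged sketch of the latent-factor idea with K = 2 ( the 4×4 rating matrix is made up, and unlike the real Netflix data it has no missing entries ): the truncated SVD factors the ratings into a user-preference part and a movie-score part, and is the rank-K approximation that minimizes RMSE.

```python
import numpy as np

# Toy rating matrix: rows = users, columns = movies (every rating observed
# here, to keep the sketch simple).
R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [2, 1, 4, 5]], dtype=float)

# Truncated SVD with K = 2 latent features.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
K = 2
R_hat = U[:, :K] * s[:K] @ Vt[:K, :]

rmse = np.sqrt(np.mean((R - R_hat) ** 2))
print("rank-2 reconstruction RMSE:", round(rmse, 3))
```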

Poker is hard…

• Gigantic, yet not continuous, state space

• Dimensionality reduction isn’t easy

• High variance

• Possible to build parametric strategies and optimize them with ML

• Inputs such as pot odds are trivial to compute
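A tiny sketch of the last bullet ( my own numbers ): pot odds are simply the call amount as a fraction of the pot after the call, i.e. the break-even winning probability.

```python
def pot_odds(pot: float, to_call: float) -> float:
    """Fraction of the final pot you must invest to call."""
    return to_call / (pot + to_call)

# Example: a 100-chip pot and a 25-chip bet to call.
print(pot_odds(pot=100, to_call=25))  # 0.2 -> calling is profitable if win prob > 20%
```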

Uhuh, slides end here

Sort of… Questions?
