Machine Learning
LW/OB presentation
Machine learning (ML) is a field concerned with studying and developing algorithms that perform better at a task as they gain experience
(but mostly I wanted to use this cool picture)
WARNING: This presentation is seriously lacking slides, preparation and cool running examples.
That being said, I know what I'm talking about ;)
What ML is really about…
• ML is about data, and modeling its distribution
• ML is about a tradeoff between model accuracy and predictive power
• ML is about finding simple yet expressive classes of distributions
• ML is about using approximate numerical methods to perform a Bayesian update on the training data
ML = intersection of… (diagram slide; image not preserved in the transcript)
Data sizes vary…
From a couple of kilobytes to petabytes
Types of problems solved
• Supervised
– Classification
– Regression
• Unsupervised
– Clustering
– Discovering causal links
• Reinforcement learning
– Learn to perform a task, only from the final result
• (Transduction)
– Not discussed; improves supervised learning with unsupervised samples
Typical applications
• Image, speech, pattern recognition
• Collaborative filtering
• Time series forecasting
• Game playing
• Denoising
• Any task where experience is valuable
Common ML techniques
• Linear regression (minimal sketch below)
• Factor models
• Decision trees
• Neural networks
– perceptron, multilayer perceptron with backpropagation, Hebbian autoassociative memory, Boltzmann machine, spiking neurons…
• SVMs
• Bayesian networks, white-box models…
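The slide lists these techniques by name only. As a hedged illustration of the first bullet, here is a minimal ordinary-least-squares linear regression; the data is synthetic and numpy-only, not anything from the talk:

```python
# Minimal OLS linear regression sketch (synthetic data, numpy only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])       # assumed "ground truth" weights
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Append an intercept column and solve the least-squares problem.
Xb = np.hstack([X, np.ones((100, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print("fitted weights:", w[:-1], "intercept:", w[-1])
```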
Meta-Methods
– Ensemble forecasting
– Bootstrapping, bagging, model averaging
– Boosting
– Inductive bias through
• Out-of-sample testing (sketch below)
• Minimum description length
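Out-of-sample testing is simple enough to show concretely. A minimal sketch, assuming synthetic data and a plain least-squares model (none of this is from the talk): hold out part of the data, fit on the rest, and judge the model only on the held-out part.

```python
# Out-of-sample testing sketch: fit on a training split, score on a held-out split.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)

split = 150                                # first 150 train, last 50 held out
w, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
mse_in = np.mean((X[:split] @ w - y[:split]) ** 2)
mse_out = np.mean((X[split:] @ w - y[split:]) ** 2)
print(f"in-sample MSE {mse_in:.3f}, out-of-sample MSE {mse_out:.3f}")
```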
Neural networks demystified
• Perceptron (1957): THIS IS… LINEAR ALGEBRA! (minimal sketch below)
• Linear separability: 8 binary inputs => only about 1/2^212 of the 2^256 possible classifications are linearly separable
• Multilayered perceptron + backpropagation (1969 ~ 1986)
• Smooth interpolation
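To back up the "this is linear algebra" point: the perceptron is a weight vector, prediction is a dot product, and the 1957-style update is vector addition. A minimal sketch on made-up, linearly separable toy data:

```python
# Perceptron sketch: dot-product prediction, additive updates on mistakes.
import numpy as np

def perceptron(X, y, epochs=100):
    """X: (n, d) inputs; y: labels in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:     # misclassified: nudge the hyperplane
                w += yi * xi
                b += yi
    return w, b

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
y = np.where(X[:, 0] > 0, 1, -1)
X[:, 0] += 0.5 * y                         # push the classes apart (clear margin)
w, b = perceptron(X, y)
print("training errors:", np.sum(np.sign(X @ w + b) != y))
```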
Many more types…
SVM in a nutshell
• Maximize margin
• Embed in a high dimensional space (sketch below)
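A hedged sketch of both bullets, using scikit-learn (my choice; the talk names no library). The toy XOR-style labels are not linearly separable in the original 2-D space; the RBF kernel supplies the implicit high-dimensional embedding, and C controls the margin trade-off:

```python
# Kernel SVM sketch: margin maximization + implicit high-dimensional embedding.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] * X[:, 1])             # XOR-style: not linearly separable

clf = SVC(kernel="rbf", C=1.0)             # RBF kernel = implicit embedding
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```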
Ensemble learning
• Combine predictions through voting (with classifiers) or regression to improve prediction
• Train on random (with replacement) subsets of the data (bootstrapping; sketch below)
• Or weight the data according to the quality of prediction, and train new weak classifiers accordingly (boosting)
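A minimal bagging sketch of the second bullet: train weak learners on bootstrap samples (drawn with replacement), then combine by majority vote. The decision-stump weak learner and toy data are illustrative choices, not from the talk:

```python
# Bagging sketch: bootstrap samples + majority vote over decision stumps.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])

def fit_stump(Xs, ys):
    j = rng.integers(Xs.shape[1])          # pick a random feature
    t = np.median(Xs[:, j])                # naive threshold at the median
    pred = np.where(Xs[:, j] > t, 1, -1)
    s = 1 if np.mean(pred == ys) >= 0.5 else -1   # flip if worse than chance
    return j, t, s

stumps = []
for _ in range(50):
    idx = rng.integers(0, len(X), len(X))  # bootstrap: sample with replacement
    stumps.append(fit_stump(X[idx], y[idx]))

votes = sum(s * np.where(X[:, j] > t, 1, -1) for j, t, s in stumps)
print("bagged training accuracy:", np.mean(np.sign(votes) == y))
```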
Numerical tricks
• Optimization of fit with standard operational search techniques
• EM algorithm
• MCMC methods (Gibbs sampling, the Metropolis algorithm…; sketch below)
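The Metropolis algorithm fits in a dozen lines, so here is a minimal sketch: random-walk proposals, accepted with probability min(1, p(x')/p(x)). The target is a standard 1-D Gaussian so the output is easy to check; everything else is an illustrative choice:

```python
# Metropolis sampler sketch: symmetric random-walk proposals, accept/reject.
import numpy as np

def metropolis(log_p, x0, steps=10_000, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, lp, samples = x0, log_p(x0), []
    for _ in range(steps):
        prop = x + rng.normal(scale=scale)         # symmetric proposal
        lp_prop = log_p(prop)
        if np.log(rng.random()) < lp_prop - lp:    # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        samples.append(x)
    return np.array(samples)

samples = metropolis(lambda x: -0.5 * x * x, x0=0.0)  # standard normal target
print("mean ~ 0:", samples.mean(), "  var ~ 1:", samples.var())
```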
A fundamental Bayesian model: the Hidden Markov Model
• Hidden states produce observed states (forward-algorithm sketch below)
• Billions of applications
– Finance
– Speech recognition
– Swype
– Kinect
– Open heart surgery
– Airplane navigation
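As a sketch of the basic HMM computation, here is the forward algorithm, which gives the likelihood of an observation sequence under the model. The two-state transition, emission and initial distributions below are made up for illustration:

```python
# HMM forward algorithm sketch: P(observations | model) for a toy 2-state HMM.
import numpy as np

A = np.array([[0.9, 0.1],                  # hidden-state transition matrix
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],                  # emission probabilities per state
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])                  # initial state distribution

def forward(obs):
    alpha = pi * B[:, obs[0]]              # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # propagate states, weight by emission
    return alpha.sum()                     # total likelihood of the sequence

print(forward([0, 0, 1, 1, 1]))
```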
Questions I was asked
• How does Boosting work?
• What is the No Free Lunch Theorem?
• Writing style recognition
• Signature recognition
• Rule extraction
• Moving odds in response to informed gamblers
• BellKor's Pragmatic Chaos and the Netflix prize
Writing style recognition
• Naïve Bayes (similar to spam filtering; bag-of-words approach)
• Clustering of HMM model parameters
• Simple statistics on the text corpus (sentence length distribution, word length distribution, density of punctuation; sketch below)
• Combine with a logistic regression
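The "simple statistics" bullet is easy to make concrete. A minimal sketch of such features (the exact feature set is my guess, not the talk's), producing one vector per text that a logistic regression could then combine:

```python
# Style-statistics sketch: sentence length, word length, punctuation density.
import numpy as np

def style_features(text):
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = text.split()
    return np.array([
        np.mean([len(s.split()) for s in sentences]),       # mean sentence length
        np.mean([len(w) for w in words]),                   # mean word length
        sum(c in ",;:" for c in text) / max(len(words), 1), # punctuation density
    ])

print(style_features("Call me Ishmael. Some years ago, never mind how long, "
                     "I went to sea."))
```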
Signature recognition
• Depends on whether the input is raster or vector
• The post office uses neural networks, but its corpus is gigantic
• Dimensionality reduction is key
• Wavelets on the raster image for feature extraction
• Path following, then learning on path features (total variation, average curvature, etc.)
Rule extraction
• Hard; the hypothesis space is not smooth
• Decision tree regression (sketch below)
• Genetic Programming (Koza)
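A minimal sketch of the decision-tree approach to rule extraction: a shallow tree's branches read directly as if/then rules. scikit-learn and the toy step-function data are my choices, not the talk's:

```python
# Rule-extraction sketch: fit a shallow regression tree, print it as rules.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

X = np.linspace(0, 1, 200).reshape(-1, 1)
y = np.where(X[:, 0] < 0.5, 1.0, 3.0) \
    + 0.1 * np.random.default_rng(5).normal(size=200)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree))                   # the fitted tree as readable rules
```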
Netflix prize
• The baseline (Cinematch) = latent semantic model
• The defining characteristic of the winners: ensemble prediction, with neural networks to combine predictors
• The best teams were mergers of good teams
Latent semantic model
• There is a set of K "features". Each movie has a score on each feature; each user has a weight for each feature.
• The features are latent; we only assume the value of K.
• Equivalent to representing the rating matrix as the product of a score matrix and a preference matrix; SVD minimizes the RMSE (sketch below).
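A hedged sketch of the latent-factor idea: approximate the rating matrix R by a rank-K product of user-weight and movie-score matrices. On a fully observed matrix, truncated SVD gives the rank-K approximation with minimum RMSE; the toy R below is made up, and the real Netflix matrix is mostly missing entries, which calls for iterative methods instead:

```python
# Latent-factor sketch: rank-K SVD approximation of a toy rating matrix.
import numpy as np

R = np.array([[5., 4., 1., 1.],            # rows = users, columns = movies
              [4., 5., 1., 2.],
              [1., 1., 5., 4.],
              [2., 1., 4., 5.]])
K = 2                                      # number of assumed latent features

U, s, Vt = np.linalg.svd(R, full_matrices=False)
users = U[:, :K] * s[:K]                   # user weights on the K features
movies = Vt[:K, :]                         # movie scores on the K features
R_hat = users @ movies                     # rank-K reconstruction
print("RMSE of rank-2 fit:", np.sqrt(np.mean((R - R_hat) ** 2)))
```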
Poker is hard…
• Gigantic, yet not continuous, state space
• Dimensionality reduction isn't easy
• High variance
• Possible to make parametric strategies and optimize them with ML
• Inputs such as pot odds are trivial to compute
Uh-huh, the slides end here.
Sort of… Questions?